Symbolic Music Generation with Diffusion Models

Last update: Jan 07, 2023

Supplementary code release for our work Symbolic Music Generation with Diffusion Models.

Installation

All code is written in Python 3 (Anaconda recommended). To install the dependencies:

pip install -r requirements.txt

A copy of the Magenta codebase is required for access to MusicVAE and related components. Installation instructions can be found on the Magenta public repository. You will also need to download pretrained MusicVAE checkpoints. For our experiments, we use the 2-bar melody model.

Datasets

We use the Lakh MIDI Dataset to train our models. Follow these instructions to download and build the Lakh MIDI Dataset.

To encode the Lakh dataset with MusicVAE, use scripts/generate_song_data_beam.py:

python scripts/generate_song_data_beam.py \
  --checkpoint=/path/to/musicvae-ckpt \
  --input=/path/to/lakh_tfrecords \
  --output=/path/to/encoded_tfrecords

To preprocess and generate fixed-length latent sequences for training diffusion and autoregressive models, refer to scripts/transform_encoded_data.py:

python scripts/transform_encoded_data.py \
  --encoded_data=/path/to/encoded_tfrecords \
  --output_path=/path/to/preprocess_tfrecords \
  --mode=sequences \
  --context_length=32
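
To illustrate what "fixed-length latent sequences" means here, the following is a minimal sketch and not the code in scripts/transform_encoded_data.py. It assumes each song has already been encoded as an ordered list of MusicVAE latent vectors (one per 2-bar chunk) and cuts that list into windows of context_length entries, so with context_length=32 each window spans 64 bars. The function name split_into_windows, the stride parameter, and the choice to drop an incomplete trailing window are all assumptions made for illustration.

```python
def split_into_windows(latents, context_length=32, stride=None):
    """Cut one song's latent sequence into fixed-length windows.

    `latents` is any sliceable sequence (a list of vectors, a NumPy array, ...)
    with one entry per 2-bar chunk. A trailing remainder shorter than
    `context_length` is dropped (an assumption of this sketch).
    """
    if context_length <= 0:
        raise ValueError("context_length must be positive")
    if stride is None:
        # Default to non-overlapping windows.
        stride = context_length
    if stride <= 0:
        raise ValueError("stride must be positive")

    windows = []
    last_start = len(latents) - context_length
    for start in range(0, last_start + 1, stride):
        windows.append(latents[start:start + context_length])
    return windows


# Toy example: a "song" of 70 chunks, each with a 4-dimensional latent.
song = [[float(i)] * 4 for i in range(70)]
windows = split_into_windows(song, context_length=32)
bars_per_window = 32 * 2  # 32 latents, each covering 2 bars
```

Passing stride=1 instead produces overlapping windows, which yields more training examples per song at the cost of highly correlated data.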

Training

Diffusion

python train_ncsn.py --flagfile=configs/ddpm-mel-32seq-512.cfg
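
For readers new to diffusion training, the sketch below shows the generic DDPM forward (noising) step that produces a training input from a clean latent vector: x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise. It is a standalone illustration, not an excerpt from train_ncsn.py. The function names are hypothetical, and the linear schedule values (1e-4 to 0.02 over 1000 steps) are the commonly cited DDPM defaults rather than values read from configs/ddpm-mel-32seq-512.cfg.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (illustrative values)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0, alpha_bar_t, rng):
    """Sample x_t from q(x_t | x_0) and return it with the noise that was used.

    A denoising network would be trained to predict `noise` from `x_t`.
    """
    signal_scale = math.sqrt(alpha_bar_t)
    noise_scale = math.sqrt(1.0 - alpha_bar_t)
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [signal_scale * x + noise_scale * e for x, e in zip(x0, noise)]
    return xt, noise


betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)
rng = random.Random(0)
# Noise a toy 3-dimensional "latent" at the midpoint of the schedule.
xt, noise = add_noise([1.0, -1.0, 0.5], alpha_bars[500], rng)
```

In a real training loop the timestep would be drawn at random for every example, and x0 would be one of the preprocessed latent sequences rather than a toy vector.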

TransformerMDN

python train_mdn.py --flagfile=configs/mdn-mel-32seq-512.cfg

Sampling and Generation

Diffusion

python sample_ncsn.py \
  --flagfile=configs/ddpm-mel-32seq-512.cfg \
  --sample_seed=42 \
  --sample_size=1000 \
  --sampling_dir=/path/to/latent-samples

TransformerMDN

python sample_mdn.py \
  --flagfile=configs/mdn-mel-32seq-512.cfg \
  --sample_seed=42 \
  --sample_size=1000 \
  --sampling_dir=/path/to/latent-samples

Decoding sequences

To convert sequences of embeddings (generated by diffusion or TransformerMDN models) to sequences of MIDI events, refer to scripts/sample_audio.py.

python scripts/sample_audio.py \
  --input=/path/to/latent-samples/[ncsn|mdn] \
  --output=/path/to/audio-midi \
  --n_synth=1000 \
  --include_wav=True

Citing

If you use this code, please cite it as:

@inproceedings{
  mittal2021symbolicdiffusion,
  title={Symbolic Music Generation with Diffusion Models},
  author={Gautam Mittal and Jesse Engel and Curtis Hawthorne and Ian Simon},
  booktitle={Proceedings of the 22nd International Society for Music Information Retrieval Conference},
  year={2021},
  url={https://archives.ismir.net/ismir2021/paper/000058.pdf}
}

Note

This is not an official Google product.

Owner

Magenta
