A minimalist implementation of score-based diffusion model

Last update: Dec 20, 2022

Related tags

Overview

sdeflow-light

This is a minimalist codebase for training score-based diffusion models (supporting MNIST and CIFAR-10) used in the following paper

"A Variational Perspective on Diffusion-Based Generative Models and Score Matching" by Chin-Wei Huang, Jae Hyun Lim and Aaron Courville [arXiv]

Also see the concurrent work by Yang Song & Conor Durkan where they used the same idea to obtain state-of-the-art likelihood estimates.

Experiments on Swissroll

Here's a Colab notebook which contains an example for training a model on the Swissroll dataset.

In this notebook, you'll see how to train the model using score matching loss, how to evaluate the ELBO of the plug-in reverse SDE, and how to sample from it. It also includes a snippet to sample from a family of plug-in reverse SDEs (parameterized by λ) mentioned in Appendix C of the paper.

Below are the trajectories of λ=0 (the reverse SDE used in Song et al.) and λ=1 (equivalent ODE) when we plug in the learned score / drift function. This corresponds to Figure 5 of the paper.

Experiments on MNIST and CIFAR-10

This repository contains one main training loop (train_img.py). The model is trained to minimize the denoising score matching loss by calling the .dsm(x) loss function, and evaluated using the following ELBO, by calling .elbo_random_t_slice(x)

where the divergence (sum of the diagonal entries of the Jacobian) is estimated using the Hutchinson trace estimator.

It's a minimalist codebase in the sense that we do not use fancy optimizer (we only use Adam with the default setup) or learning rate scheduling. We use the modified U-net architecture from Denoising Diffusion Probabilistic Models by Jonathan Ho.

A key difference from Song et al. is that instead of parameterizing the score function s, here we parameterize the drift term a (where they are related by a=gs and g is the diffusion coefficient). That is, a is the U-net.

Parameterization: Our original generative & inference SDEs are

dX = mu dt + sigma dBt
dY = (-mu + sigma*a) ds + sigma dBs

We reparameterize it as

dX = (ga - f) dt + g dBt
dY = f ds + g dBs

by letting mu = ga - f, and sigma = g. (since f and g are fixed, we only have one degree of freedom, which is a). Alternatively, one can parameterize s (e.g. using the U-net), and just let a=gs.

How it works

Here's an example command line for running an experiment

python train_img.py --dataroot=[DATAROOT] --saveroot=[SAVEROOT] --expname=[EXPNAME] \
    --dataset=cifar --print_every=2000 --sample_every=2000 --checkpoint_every=2000 --num_steps=1000 \
    --batch_size=128 --lr=0.0001 --num_iterations=100000 --real=True --debias=False

Setting --debias to be False uses uniform sampling for the time variable, whereas setting it to be True uses a non-uniform sampling strategy to debias the gradient estimate described in the paper. Below are the bits-per-dim and the corresponding standard error of the test set recorded during training (orange for --debias=True and blue for --debias=False).

Here are some samples (debiased on the right)

It takes about 14 hrs to finish 100k iterations on a V100 GPU.

A minimalist implementation of score-based diffusion model

Related tags

Overview

sdeflow-light

Experiments on Swissroll

Experiments on MNIST and CIFAR-10

How it works

Owner

Chin-Wei Huang

3D detection and tracking viewer (visualization) for kitti & waymo dataset

Multi-resolution SeqMatch based long-term Place Recognition

Demo for the paper "Overlap-aware low-latency online speaker diarization based on end-to-end local segmentation"

3D Avatar Lip Syncronization from speech (JALI based face-rigging)

Multi agent DDPG algorithm written in Python + Pytorch

Diverse Branch Block: Building a Convolution as an Inception-like Unit

Your interactive network visualizing dashboard

A Pytorch implementation of SMU: SMOOTH ACTIVATION FUNCTION FOR DEEP NETWORKS USING SMOOTHING MAXIMUM TECHNIQUE

Applying PVT to Semantic Segmentation

A deep learning network built with TensorFlow and Keras to classify gender and estimate age.

An open software package to develop BCI based brain and cognitive computing technology for recognizing user's intention using deep learning

Portfolio asset allocation strategies: from Markowitz to RNNs

This repository is the code of the paper "Sparse Spatial Transformers for Few-Shot Learning".

Package to compute Mauve, a similarity score between neural text and human text. Install with `pip install mauve-text`.

Production First and Production Ready End-to-End Speech Recognition Toolkit

Waymo motion prediction challenge 2021: 3rd place solution

这是一个利用facenet和retinaface实现人脸识别的库，可以进行在线的人脸识别。

Official implementation of EdiTTS: Score-based Editing for Controllable Text-to-Speech

Novel and high-performance medical image classification pipelines are heavily utilizing ensemble learning strategies

Code for "LoFTR: Detector-Free Local Feature Matching with Transformers", CVPR 2021