Clockwork Variational Autoencoders (CW-VAE)

Vaibhav Saxena, Jimmy Ba, Danijar Hafner

If you find this code useful, please cite it in your paper:

@article{saxena2021clockworkvae,
  title={Clockwork Variational Autoencoders}, 
  author={Saxena, Vaibhav and Ba, Jimmy and Hafner, Danijar},
  journal={arXiv preprint arXiv:2102.09532},
  year={2021},
}

Method

Clockwork VAEs are deep generative models that learn long-term dependencies in video by leveraging hierarchies of representations that progress at different clock speeds. In contrast to prior video prediction methods, which typically focus on predicting sharp but short sequences into the future, Clockwork VAEs can accurately predict high-level content, such as object positions and identities, for 1000 frames.
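To make "different clock speeds" concrete, here is a minimal sketch of one possible update schedule. It assumes that level 0 updates on every frame and that each higher level updates a fixed factor less often. The function name, the number of levels, and the factor are illustrative assumptions and are not taken from this repository.

```python
def active_levels(step, num_levels=3, factor=4):
    """Return the indices of the levels that update at this time step.

    Assumption for illustration: level 0 ticks every step and level l
    ticks every factor**l steps. The defaults are hypothetical values.
    """
    return [level for level in range(num_levels) if step % (factor ** level) == 0]
```

With these assumed defaults, only the lowest level updates on most frames, the middle level joins every 4th frame, and the top level joins every 16th frame, so higher levels are free to track slowly changing content.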

Clockwork VAEs build upon the Recurrent State Space Model (RSSM), so each state contains a deterministic component for long-term memory and a stochastic component for sampling diverse plausible futures. Clockwork VAEs are trained end-to-end to optimize the evidence lower bound (ELBO) that consists of a reconstruction term for each image and a KL regularizer for each stochastic variable in the model.
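The structure of this objective can be sketched in a few lines. The sketch below assumes diagonal Gaussian posteriors and priors and a unit-variance Gaussian decoder, which are simplifying assumptions made here for illustration. The helper names are hypothetical and this is not the repository's training code.

```python
import math


def gaussian_kl(mean_q, std_q, mean_p, std_p):
    """KL(q || p) between diagonal Gaussians, summed over dimensions."""
    total = 0.0
    for mq, sq, mp, sp in zip(mean_q, std_q, mean_p, std_p):
        total += math.log(sp / sq) + (sq ** 2 + (mq - mp) ** 2) / (2.0 * sp ** 2) - 0.5
    return total


def gaussian_log_prob(image, prediction, std=1.0):
    """Log-likelihood of a flattened image under an independent Gaussian decoder."""
    total = 0.0
    for x, mu in zip(image, prediction):
        total += -0.5 * ((x - mu) / std) ** 2 - math.log(std) - 0.5 * math.log(2.0 * math.pi)
    return total


def elbo(images, predictions, posteriors, priors):
    """One reconstruction term per image minus one KL term per stochastic variable.

    posteriors and priors are lists of (mean, std) pairs, one pair per
    stochastic variable in the model.
    """
    reconstruction = sum(gaussian_log_prob(x, mu) for x, mu in zip(images, predictions))
    regularizer = sum(
        gaussian_kl(q_mean, q_std, p_mean, p_std)
        for (q_mean, q_std), (p_mean, p_std) in zip(posteriors, priors)
    )
    return reconstruction - regularizer
```

Training maximizes this quantity: the first sum rewards accurate reconstructions, while the second penalizes posteriors that drift far from what the prior predicted.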

Instructions

This repository contains the code for training the Clockwork VAE model on the datasets minerl, mazes, and mmnist.

The datasets will automatically be downloaded into the --datadir directory.

python3 train.py --logdir /path/to/logdir --datadir /path/to/datasets --config configs/<dataset>.yml

The evaluation script writes open-loop video predictions in both PNG and NPZ format and plots of PSNR and SSIM to the data directory.

python3 eval.py --logdir /path/to/logdir
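For readers unfamiliar with the reported metrics, PSNR can be sketched as follows. This is the standard definition applied to flattened frames, with pixel values assumed to lie in [0, 1]. It is an illustrative helper and not the code used by eval.py.

```python
import math


def psnr(target, prediction, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized flattened frames.

    max_value is the largest possible pixel value; 1.0 is an assumption here.
    """
    if len(target) != len(prediction) or not target:
        raise ValueError("frames must be non-empty and the same size")
    mse = sum((t - p) ** 2 for t, p in zip(target, prediction)) / len(target)
    if mse == 0.0:
        return math.inf  # identical frames
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher values mean the predicted frame is closer to the ground truth, and averaging the metric per time step shows how prediction quality changes over the predicted sequence.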
