PyTorch implementation of Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors

Last update: Dec 28, 2022

Overview

Make-A-Scene - PyTorch

Unofficial PyTorch implementation of Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors (https://arxiv.org/pdf/2203.13131.pdf)

Figure 1 from the paper.

Note: this is a work in progress.

Everyone is welcome to contribute. Join the Discord channel: https://discord.gg/hCRMGRZkC6

We would love to open-source a trained model, but the model has a billion parameters and training it requires a lot of compute. If you can provide computational resources, let us know.

Paper Description:

Make-A-Scene modifies the VQGAN framework. It relies heavily on semantic segmentation maps as extra conditioning, which gives more control over the generation process. Moreover, it also conditions on text. The main improvements are the following:

Segmentation condition: a separate VQ-VAE (VQ-SEG) is trained, with the loss changed to a weighted binary cross-entropy (Section 3.4).
VQGAN training (VQ-IMG) is extended with a face loss and an object loss (Sections 3.3 and 3.5).
Classifier-free guidance for the autoregressive transformer (Section 3.7).
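
To make two of the points above more concrete, here is a minimal sketch in PyTorch. It is not taken from this repository or from the paper's code. The function names, the tensor shapes, and the use of one weight per segmentation channel are assumptions made for illustration. The guidance step is written in the commonly used classifier-free form, which extrapolates from unconditional logits toward text-conditioned logits.

```python
import torch
import torch.nn.functional as F


def weighted_bce_loss(logits, target, channel_weights):
    """Weighted binary cross-entropy over segmentation channels.

    logits, target: (batch, channels, height, width); target holds 0/1 masks
    stored as floats.
    channel_weights: (channels,) hypothetical importance weight per channel,
    e.g. larger values for channels that should be emphasized.
    """
    # Reshape (C,) to (1, C, 1, 1) so every pixel of a channel gets its weight.
    weight = channel_weights.view(1, -1, 1, 1)
    return F.binary_cross_entropy_with_logits(logits, target, weight=weight)


def guided_logits(cond_logits, uncond_logits, guidance_scale):
    """Combine conditional and unconditional next-token logits.

    A guidance_scale of 1.0 returns the conditional logits unchanged;
    larger values push the prediction further toward the condition.
    """
    return uncond_logits + guidance_scale * (cond_logits - uncond_logits)
```

In an autoregressive sampling loop, guided_logits would be applied at every step to the outputs of two forward passes, one with the text prompt and one without it, before sampling the next token.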

Training Pipeline

Figure 6 from the paper.

What needs to be done?

See the individual folders for details.

Citation

@misc{https://doi.org/10.48550/arxiv.2203.13131,
  doi = {10.48550/ARXIV.2203.13131},
  url = {https://arxiv.org/abs/2203.13131},
  author = {Gafni, Oran and Polyak, Adam and Ashual, Oron and Sheynin, Shelly and Parikh, Devi and Taigman, Yaniv},
  title = {Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors},
  publisher = {arXiv},
  year = {2022},
  copyright = {arXiv.org perpetual, non-exclusive license}
}


Owner

Casual GAN Papers
