MaskGIT-pytorch

PyTorch implementation of MaskGIT: Masked Generative Image Transformer (https://arxiv.org/pdf/2202.04200.pdf)

Note: this is a work in progress.

MaskGIT builds on the VQGAN paper: it improves the second-stage transformer and leaves the first stage untouched. It replaces the unidirectional transformer with a bidirectional one. The second-stage training is similar to BERT: tokens are randomly masked out and the bidirectional transformer is trained to predict them (the original work used a GPT architecture and randomly replaced tokens with other tokens). Unlike BERT, the masking percentage is not fixed; it is drawn uniformly between 0 and 1 for each batch. Furthermore, a new inference algorithm is proposed: start from a completely masked-out image and then iteratively sample vectors where the model has high confidence.
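The two ideas above (per-batch random masking during training, confidence-based iterative filling at inference) can be sketched without any framework. This is a minimal illustration in plain Python lists, not the code in transformer.py. The names `MASK_ID`, `mask_batch`, `iterative_decode` and the `predict` callback are hypothetical, and the linear unmasking schedule is an assumption chosen for simplicity.

```python
import random

# Hypothetical id for the mask token (an assumption for this sketch);
# a real codebook would reserve its own index.
MASK_ID = -1


def mask_batch(batch, rng):
    """Mask every sequence in the batch using one ratio drawn uniformly from [0, 1)."""
    ratio = rng.random()  # a single masking ratio shared by the whole batch
    masked_batch, masked_positions = [], []
    for tokens in batch:
        n_mask = round(ratio * len(tokens))
        positions = set(rng.sample(range(len(tokens)), n_mask))
        masked_batch.append(
            [MASK_ID if i in positions else t for i, t in enumerate(tokens)]
        )
        masked_positions.append(sorted(positions))
    return masked_batch, masked_positions, ratio


def iterative_decode(predict, length, steps):
    """Start fully masked and, at each step, keep only the most confident predictions.

    `predict(tokens)` is a hypothetical callback that returns one
    (token, confidence) pair per position.
    """
    tokens = [MASK_ID] * length
    for step in range(1, steps + 1):
        masked = [i for i, t in enumerate(tokens) if t == MASK_ID]
        if not masked:
            break
        preds = predict(tokens)
        # Assumed linear schedule: after this step, length * (1 - step / steps)
        # positions remain masked; the final step fills everything.
        n_keep_masked = int(length * (1 - step / steps))
        n_fill = len(masked) - n_keep_masked
        # Fill the masked positions where the model is most confident.
        ranked = sorted(masked, key=lambda i: preds[i][1], reverse=True)
        for i in ranked[:n_fill]:
            tokens[i] = preds[i][0]
    return tokens
```

During training, the loss would be computed only at the positions returned in `masked_positions`. At inference, `predict` would wrap the bidirectional transformer and return, for each position, a sampled token together with its probability.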

If you are only interested in the part of the code that comes from this paper, check out transformer.py.

Run the code

The code is ready for training both the VQGAN and the bidirectional transformer, and it can also be used for inference.

python training_vqgan.py

python training_transformer.py

(Make sure to edit the path for the dataset etc.)

TODO

Implement the gamma functions
Implement functions for image editing tasks: inpainting, extrapolation, image manipulation
Tune hyperparameters
(Provide visual results)
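For the first TODO item, here is a rough sketch of what the gamma (mask scheduling) functions might look like. It assumes the convention that gamma maps decoding progress r in [0, 1] to the fraction of tokens still masked, with gamma(0) = 1 and gamma(1) = 0. The function names and the `masked_count` helper are hypothetical and are not taken from this repository.

```python
import math


def linear(r):
    """Linear schedule: the masked fraction drops at a constant rate."""
    return 1.0 - r


def cosine(r):
    """Cosine schedule: few tokens are revealed early, many late."""
    return math.cos(math.pi / 2 * r)


def square(r):
    """Square schedule: another concave choice."""
    return 1.0 - r ** 2


def cubic(r):
    """Cubic schedule: even more of the unmasking is pushed to late steps."""
    return 1.0 - r ** 3


def masked_count(gamma, step, total_steps, n_tokens):
    """Number of tokens that stay masked after `step` of `total_steps` decoding steps."""
    r = step / total_steps
    return math.floor(gamma(r) * n_tokens)
```

For example, `masked_count(linear, 4, 8, 256)` leaves 128 tokens masked halfway through decoding, and every schedule above reaches 0 at the final step.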

Owner

Dominic Rampas
