MUGE Text To Image Generation Baseline

Requirements and Installation

See fairseq for more details. Briefly:

python == 3.6.4
pytorch == 1.7.1
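
A quick way to confirm the environment matches the versions pinned above is a small check script. This is only a sketch and not part of the repository: the helper names `version_matches` and `report` are hypothetical, and the script assumes that a local build suffix such as `+cu110` on the PyTorch version can be ignored.

```python
import sys


def version_matches(actual, required):
    """Return True if `actual` begins with the dot-separated parts of `required`.

    A local build suffix after "+" (for example "1.7.1+cu110") is ignored.
    """
    actual_parts = actual.split("+")[0].split(".")
    required_parts = required.split(".")
    return actual_parts[:len(required_parts)] == required_parts


def report(name, actual, required):
    """Print one line saying whether the installed version matches the pinned one."""
    status = "ok" if version_matches(actual, required) else "MISMATCH"
    print(f"{name}: found {actual}, expected {required} [{status}]")


if __name__ == "__main__":
    report("python", "{}.{}.{}".format(*sys.version_info[:3]), "3.6.4")
    try:
        import torch  # only needed for the version string
        report("pytorch", torch.__version__, "1.7.1")
    except ImportError:
        print("pytorch: not installed")
```

Components are compared one by one rather than with a plain string prefix test, so that `1.7.10` is not mistaken for `1.7.1`.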

Installing fairseq and other requirements

git clone https://github.com/MUGE-2021/image-caption-baseline
cd muge_baseline/
pip install -r requirements.txt
cd fairseq/
pip install --editable .

Download the data and place it in the dataset/ directory. The file structure is:

text2image-baseline
    - dataset
        - ECommerce-T2I
            - T2I_train.img.tsv
            - T2I_train.text.tsv
            - ...
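
Before training, it can help to verify that the files landed where the layout above expects them. The following is a minimal sketch, not a script shipped with the repository. The function name `missing_dataset_files` is hypothetical, and only the two file names listed above are checked, since the remaining files are elided in the listing.

```python
from pathlib import Path

# Only the files named explicitly in the layout above; the others are elided there.
EXPECTED_FILES = ["T2I_train.img.tsv", "T2I_train.text.tsv"]


def missing_dataset_files(repo_root, expected=EXPECTED_FILES):
    """Return the expected file names that are absent from dataset/ECommerce-T2I."""
    data_dir = Path(repo_root) / "dataset" / "ECommerce-T2I"
    return [name for name in expected if not (data_dir / name).is_file()]


if __name__ == "__main__":
    # Run from the repository root.
    missing = missing_dataset_files(".")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All expected dataset files found.")
```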

Getting Started

The model is a BART-like model that uses VQGAN as an image tokenizer. See models/t2i_baseline.py for the detailed model structure.
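
To illustrate what "image tokenizer" means for a sequence-to-sequence model, the sketch below shows how a 2D grid of discrete codebook indices can be flattened into a 1D token sequence and restored afterwards. It is an assumption-laden illustration only: the raster (row-by-row) ordering, the function names, and the toy 2x2 grid are all hypothetical, and the actual layout used in models/t2i_baseline.py may differ.

```python
def flatten_codes(grid):
    """Flatten a 2D grid of codebook indices into a 1D sequence, row by row."""
    return [code for row in grid for code in row]


def unflatten_codes(sequence, width):
    """Rebuild the 2D grid from a flat sequence, given the number of columns."""
    if width <= 0 or len(sequence) % width != 0:
        raise ValueError("sequence length must be a positive multiple of width")
    return [sequence[i:i + width] for i in range(0, len(sequence), width)]


if __name__ == "__main__":
    # Toy example: a 2x2 grid of made-up codebook indices.
    grid = [[17, 4], [256, 9]]
    tokens = flatten_codes(grid)
    print(tokens)
    print(unflatten_codes(tokens, width=2) == grid)
```

Under this view, a decoder generates the flat sequence one token at a time, and the grid is recovered before an image decoder turns the codes back into pixels.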

Training

cd run_scripts/; bash train_t2i_vqgan.sh

Model training takes about 5 hours.

Inference

cd run_scripts/; bash generate_t2i_vqgan.sh

See the results in the results/ directory.

Reference

@inproceedings{M6,
  author    = {Junyang Lin and
               Rui Men and
               An Yang and
               Chang Zhou and
               Ming Ding and
               Yichang Zhang and
               Peng Wang and
               Ang Wang and
               Le Jiang and
               Xianyan Jia and
               Jie Zhang and
               Jianwei Zhang and
               Xu Zou and
               Zhikang Li and
               Xiaodong Deng and
               Jie Liu and
               Jinbao Xue and
               Huiling Zhou and
               Jianxin Ma and
               Jin Yu and
               Yong Li and
               Wei Lin and
               Jingren Zhou and
               Jie Tang and
               Hongxia Yang},
  title     = {{M6:} {A} Chinese Multimodal Pretrainer},
  year      = {2021},
  booktitle = {Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery \& Data Mining},
  pages     = {3251--3261},
  numpages  = {11},
  location  = {Virtual Event, Singapore},
}

@article{M6-T,
  author    = {An Yang and
               Junyang Lin and
               Rui Men and
               Chang Zhou and
               Le Jiang and
               Xianyan Jia and
               Ang Wang and
               Jie Zhang and
               Jiamang Wang and
               Yong Li and
               Di Zhang and
               Wei Lin and
               Lin Qu and
               Jingren Zhou and
               Hongxia Yang},
  title     = {{M6-T:} Exploring Sparse Expert Models and Beyond},
  journal   = {CoRR},
  volume    = {abs/2105.15082},
  year      = {2021}
}
