🇰🇷 Text to Image in Korean

Overview

KoDALLE

KoDALLE uses a pretrained language model’s token embedding layer and position embedding layer as DALLE’s text encoder.
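A minimal sketch of the idea, assuming the text encoder simply adds a token embedding (wte) and a position embedding (wpe) for each input token. The tiny tables and the `encode_text` helper are hypothetical stand-ins for illustration, not the repository's code; in KoDALLE the tables would come from the pretrained language model.

```python
# Toy embedding tables (hypothetical values, 4 dimensions).
# wte maps a token id to a vector; wpe maps a position to a vector.
wte = [
    [1.0, 0.0, 0.0, 0.0],  # token id 0
    [0.0, 1.0, 0.0, 0.0],  # token id 1
    [0.0, 0.0, 1.0, 0.0],  # token id 2
]
wpe = [
    [0.5, 0.5, 0.5, 0.5],      # position 0
    [0.25, 0.25, 0.25, 0.25],  # position 1
]


def encode_text(token_ids, wte, wpe):
    """Return one vector per token: token embedding + position embedding."""
    if len(token_ids) > len(wpe):
        raise ValueError("sequence is longer than the position table")
    return [
        [t + p for t, p in zip(wte[tok], wpe[pos])]
        for pos, tok in enumerate(token_ids)
    ]


if __name__ == "__main__":
    # Token 2 at position 0, token 0 at position 1.
    print(encode_text([2, 0], wte, wpe))
```

Because both tables are already trained, the text side starts from embeddings that cover the full pretrained vocabulary rather than only the words seen in the image captions.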

Background

  • Training a DALLE model from scratch demands a large paired dataset of images and captions. For example, OpenAI's DALLE was trained with more than 250 million text-image pairs.
  • If the dataset isn’t large enough or is limited to specific domains, the vocabulary of the trained DALLE model is insufficient. For instance, the 1 million text captions of the K-Fashion dataset consist of only around 300 tokens.
  • Therefore, inference with such DALLE models can be problematic if the given query sentence is unrelated to the captions in the original training data.
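The vocabulary problem above can be illustrated with a small sketch. It assumes whitespace tokenization and a handful of made-up English captions, both simplifications chosen for readability; `build_vocab` and `oov_rate` are hypothetical helpers, and a real model would use a subword tokenizer.

```python
def build_vocab(captions):
    """Collect every whitespace-separated token seen in the training captions."""
    return {tok for cap in captions for tok in cap.split()}


def oov_rate(query, vocab):
    """Fraction of query tokens that never appeared in the training captions."""
    tokens = query.split()
    if not tokens:
        return 0.0
    unseen = [tok for tok in tokens if tok not in vocab]
    return len(unseen) / len(tokens)


# Made-up, domain-limited captions (assumption for illustration only).
captions = ["white blouse short sleeve", "blue denim skirt wide fit"]
vocab = build_vocab(captions)

if __name__ == "__main__":
    print(oov_rate("white denim skirt", vocab))   # in-domain query -> 0.0
    print(oov_rate("red leather jacket", vocab))  # out-of-domain query -> 1.0
```

A query made entirely of unseen tokens gives a caption-only model nothing to condition on, which is the gap a pretrained text encoder is meant to close.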

KoDALLE's Result on Small Size Fashion Dataset

| | OpenAI’s DALLE | KoDALLE of HappyFace |
|---|---|---|
| Train Dataset Size | 250 Million Pairs | 0.8 Million Pairs |
| #Params | 12 Billion | 428 Million |
| #Layers | 64 Layers | 16 Layers |
| Computing Resource | 1024 x V100 16GB | 1 x V100 32GB |
| Text Encoder | 16384 Vocab x 512 Dim BPE | 32000 Vocab x 1024 Dim klue/roberta-large |
| Image Encoder | VQVAE | VQGAN |
| Optimizer | AdamW | AdamW |
| Learning Rate | 4.5e-5 | 3.0e-5 |
| Weight Decay | 4.5e-3 | 3.0e-3 |
| LR Scheduler | ReduceLROnPlateau | - |

The team built a text-to-fashion-design DALLE model for the Korean language with fewer than 100k sampled text-image pairs.

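Caption: 하의에서 색상은 스카이블루이다. 상의에서 기장은 롱이다. 색상은 화이트이다. 카테고리는 블라우스이다. 디테일에는 셔링이다. 소매기장은 반팔이다. 소재에는 실크이다. 프린트에는 무지이다. 넥라인은 브이넥이다. 핏은 노멀
Caption (English): For the bottom, the color is sky blue. For the top, the length is long. The color is white. The category is blouse. The detail is shirring. The sleeve length is short-sleeved. The material is silk. The print is plain. The neckline is V-neck. The fit is normal
Generated Image: [image]

Caption: 아우터는 색상이 카키 소재가 우븐 핏이 루즈인 코트이다. 하의는 색상이 네이비 소재가 데님 핏이 스키니인 청바지이다.
Caption (English): The outerwear is a coat with khaki color, woven material, and a loose fit. The bottom is a pair of jeans with navy color, denim material, and a skinny fit.
Generated Image: [image]

Caption: 하의에서 기장은 발목이다. 색상은 블루이다. 카테고리는 스커트이다. 소재에는 데님이다. 핏은 와이드이다. 상의에서 색상은 화이트이다. 카테고리는 블라우스이다. 디테일에는 셔링이다. 소매기장은 반팔이다. 소재에는 우븐이다.
Caption (English): For the bottom, the length is ankle-length. The color is blue. The category is skirt. The material is denim. The fit is wide. For the top, the color is white. The category is blouse. The detail is shirring. The sleeve length is short-sleeved. The material is woven.
Generated Image: [image]

Caption: 상의에서 기장은 노멀이다. 상의에서 색상은 화이트이다. 상의에서 서브색상은 블랙이다. 상의에서 카테고리는 티셔츠이다. 상의에서 소매기장은 반팔이다. 상의에서 소재에는 저지이다. 상의에서 프린트에는 레터링이다. 상의에서 넥라인은 라운드넥이다. 상의에서 핏은 루즈이다.
Caption (English): For the top, the length is normal. For the top, the color is white. For the top, the sub-color is black. For the top, the category is T-shirt. For the top, the sleeve length is short-sleeved. For the top, the material is jersey. For the top, the print is lettering. For the top, the neckline is round neck. For the top, the fit is loose.
Generated Image: [image]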

Methodology

Experiments were conducted with the embedding layers of Korean Transformer models. The team selected klue/roberta-large as the baseline in the repository, considering the size of the model.

KoDALLE with klue/roberta-large's wpe (position embeddings) and wte (token embeddings) is trainable in a 16GB GPU Google Colab environment. The hyperparameters related to DALLE's model size are as follows.

'BATCH_SIZE': 32
'DEPTH': 2
'TEXT_SEQ_LEN': 128
'VOCAB_SIZE': 32000
'MODEL_DIM': 1024
'ATTN_TYPES': 'full'
'DIM_HEAD': 64
'HEADS': 8
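As a rough sanity check on these values, the sketch below derives a few sizes directly from the listed hyperparameters. It is plain arithmetic, not a full parameter count, and the `embedding_sizes` helper is a hypothetical name introduced here for illustration.

```python
# Hyperparameters copied from the list above.
hparams = {
    "BATCH_SIZE": 32,
    "DEPTH": 2,
    "TEXT_SEQ_LEN": 128,
    "VOCAB_SIZE": 32000,
    "MODEL_DIM": 1024,
    "ATTN_TYPES": "full",
    "DIM_HEAD": 64,
    "HEADS": 8,
}


def embedding_sizes(hp):
    """Derive simple sizes implied by the hyperparameters (illustrative only)."""
    return {
        # One MODEL_DIM-sized vector per vocabulary entry.
        "token_embedding_weights": hp["VOCAB_SIZE"] * hp["MODEL_DIM"],
        # Position vectors actually indexed by a TEXT_SEQ_LEN-token caption.
        "position_embedding_weights_used": hp["TEXT_SEQ_LEN"] * hp["MODEL_DIM"],
        # Width of the concatenated attention heads.
        "attention_inner_dim": hp["HEADS"] * hp["DIM_HEAD"],
    }


if __name__ == "__main__":
    for name, value in embedding_sizes(hparams).items():
        print(f"{name}: {value:,}")
```

Under these numbers the token embedding table alone holds about 32.8 million weights, all of which arrive pretrained instead of being learned from the caption data.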

Significance

  • Offers promising results for training from scratch on specific domains with a small dataset.
  • Introduces a solution that makes domain-specific DALLE & CLIP models robust to the input sentence.
  • Recommends an adequate text-to-image model size for a given computation resource.
  • Suggests an effortless method of creating DALLE & CLIP models for one's own language if a pretrained language model is available.

WIP

  • Add image-caption reranker (EfficientNet + klue/roberta-large).
  • Model trained with 500k text-image pairs.
  • Modularize in Python code.
  • Update inference code.
  • Update FID and IS metrics on test and validation datasets.
Comments
  • Koclip apply in KoDALLE

    Changes

    add) model.py

    This file modifies the code so that Hyeonsoo's KoCLIP works with DALLE Roberta.

    It still needs to be revised by comparing it with the model.py on the dev branch.

    add) generate.ipynb

    Code written to show KoCLIP working.

    opened by JoonHong-Kim 1
  • add: KoCLIP codes

    Changes:

    refactor) clipmodel.py

    • Updated CLIPModel to the final version
    • Moved to the clip folder

    add) clip/train_clip.py

    • The code used to train the CLIP model.

    add) clip/dataloader.py

    • The dataloader function used to train the CLIP model.
    opened by shawnhyeonsoo 0
  • add skip_sample in TextImageDataset

    Changes

    modify) loader.py

    • Handles the error raised in TextImageDataset when texts or images are loaded and the data is missing
    • Uses the skip_sample function so that, when an error occurs, the loader switches to a random or the next index and skips the sample
    • Modified based on the existing train_dalle_gpt_roberta.py
    opened by jjonhwa 0
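The skip behaviour described in this comment can be sketched roughly as follows. `ToyTextImageDataset` is a hypothetical stand-in written for illustration, not the repository's loader.py: it assumes a missing sample is marked by an empty caption or a `None` image, and that at least one valid sample exists.

```python
import random


class ToyTextImageDataset:
    """Hypothetical dataset of (caption, image) pairs that skips missing data."""

    def __init__(self, items, shuffle=False, seed=0):
        self.items = items
        self.shuffle = shuffle
        self.rng = random.Random(seed)

    def __len__(self):
        return len(self.items)

    def skip_sample(self, index):
        # With shuffling, jump to a random index; otherwise move to the
        # next index, wrapping around at the end of the dataset.
        if self.shuffle:
            return self[self.rng.randrange(len(self))]
        return self[(index + 1) % len(self)]

    def __getitem__(self, index):
        caption, image = self.items[index]
        # Treat an empty caption or a missing image as a broken sample.
        if not caption or image is None:
            return self.skip_sample(index)
        return caption, image


if __name__ == "__main__":
    dataset = ToyTextImageDataset(
        [("a", "img0"), ("", "img1"), ("c", None), ("d", "img3")]
    )
    print(dataset[1])  # indices 1 and 2 are broken, so this prints ('d', 'img3')
```

Returning a neighbouring or random valid sample keeps batches full during training instead of aborting on a single bad file.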
Releases(v0.1.0-beta)