INTRODUCTION

This is a modification of the OpenAI-CLIP repo of moein-shariatnia(https://github.com/moein-shariatnia/OpenAI-CLIP).

The current training dataset supports flicker-8k or flicker-30k, and the image encoder supports Resnet50 or ViT(vit_base_patch16_384).

Text encoder supports only DistilBert like moein-shariatnia.

ENVIRONTMENT SETTING

$ virtualenv .venv --python=python3.6
$ source .venv/bin/activate
$ pip install -r requirements.txt

EXECUTTION

Pretrain

$ python3 pretrain.py

Inference

$ python3 inference.py --qeury={YOUR QUERY}

CAUTION

You must set(or check) some options in config.py before pretrain & inference

ex1) dataset("8k" or "30k"): Train dataset(flicker-8k or flicker-30k)

ex2) model_name("resnet50" or "vit_base_patch16_384"): Type of image encoder

ex3) pretrained(True or False): Decide whether to learn by loading pretrain versions of text encoder(DistilBert) and image encoder(resnet50 or ViT)

ex4) batch_size: Set according to the capacity of the machine

This is a modification of the OpenAI-CLIP repository of moein-shariatnia

Related tags

Overview

INTRODUCTION

ENVIRONTMENT SETTING

EXECUTTION

CAUTION

Owner

Sangwon Beak

Constituency Tree Labeling Tool

Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis (SV2TTS)

This repository contains the code for "Generating Datasets with Pretrained Language Models".

Pytorch version of BERT-whitening

A method to generate speech across multiple speakers

GSoC'2021 | TensorFlow implementation of Wav2Vec2

ProteinBERT is a universal protein language model pretrained on ~106M proteins from the UniRef90 dataset.

Blazing fast language detection using fastText model

The model is designed to train a single and large neural network in order to predict correct translation by reading the given sentence.

Neural-Machine-Translation - Implementation of revolutionary machine translation models

Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing

Code and data accompanying Natural Language Processing with PyTorch

KakaoBrain KoGPT (Korean Generative Pre-trained Transformer)

Create a semantic search engine with a neural network (i.e. BERT) whose knowledge base can be updated

The (extremely) naive sentiment classification function based on NBSVM trained on wisesight_sentiment

Multilingual word vectors in 78 languages

Wrapper to display a script output or a text file content on the desktop in sway or other wlroots-based compositors

This project uses word frequency and Term Frequency-Inverse Document Frequency to summarize a text.

Google and Stanford University released a new pre-trained model called ELECTRA

Repo for Enhanced Seq2Seq Autoencoder via Contrastive Learning for Abstractive Text Summarization