CLOOB training (JAX) and inference (JAX and PyTorch)

Last update: Nov 27, 2022


cloob-training

Pretrained models

There are currently two pretrained CLOOB models in this repo: a 16-epoch and a 32-epoch ViT-B/16 checkpoint, both trained on LAION-400M.

Zero-shot ImageNet validation set accuracy (using OpenCLIP's code):

Model name	Top 1	Top 5
cloob_laion_400m_vit_b_16_16_epochs	0.61238	0.8492
cloob_laion_400m_vit_b_16_32_epochs	0.62816	0.85964
OpenAI CLIP ViT-B/32	0.6327	0.88772
OpenAI CLIP ViT-B/16	0.68132	0.91768
OpenAI CLIP ViT-L/14	0.75388	0.9454
OpenAI CLIP ViT-L/14 @ 336 px	0.76564	0.9515
OpenAI CLIP RN50	0.59806	0.86498
OpenAI CLIP RN101	0.62296	0.88106
OpenAI CLIP RN50x4	0.66268	0.9046
OpenAI CLIP RN50x16	0.70754	0.92822
OpenAI CLIP RN50x64	0.74134	0.94146
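Zero-shot accuracy here means each image is assigned the class whose text embedding is most similar to the image embedding. Top-1 counts the image as correct only if the best match is the true class; top-5 counts it as correct if the true class is among the five best matches. The numbers above come from OpenCLIP's evaluation code. The following is only a simplified NumPy sketch of how top-k accuracy is computed from embeddings; `zero_shot_topk_accuracy` and `l2_normalize` are hypothetical helpers, not part of this repo or of OpenCLIP.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit L2 norm so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def zero_shot_topk_accuracy(image_embeds, class_embeds, labels, ks=(1, 5)):
    """Fraction of images whose true class is among the k most similar classes.

    image_embeds: (num_images, dim) array of image embeddings.
    class_embeds: (num_classes, dim) array, one text embedding per class.
    labels: (num_images,) integer class indices.
    """
    # Cosine similarity between every image and every class: (num_images, num_classes).
    sims = l2_normalize(image_embeds) @ l2_normalize(class_embeds).T
    # Class indices sorted by descending similarity for each image.
    ranked = np.argsort(-sims, axis=1)
    labels = np.asarray(labels)
    return {
        k: float(np.mean(np.any(ranked[:, :k] == labels[:, None], axis=1)))
        for k in ks
    }
```

With real models, `image_embeds` would come from the image encoder applied to the ImageNet validation images and `class_embeds` from the text encoder applied to class-name prompts. OpenCLIP's evaluation additionally builds each class embedding from several prompt templates, which this sketch leaves out.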

PyTorch

from cloob_training import model_pt, pretrained

pretrained.list_configs()

This returns:

['cloob_laion_400m_vit_b_16_16_epochs', 'cloob_laion_400m_vit_b_16_32_epochs']

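A model can be loaded and prepared for inference with:

config = pretrained.get_config('cloob_laion_400m_vit_b_16_16_epochs')  # any name from pretrained.list_configs()
model = model_pt.get_pt_model(config)  # build the PyTorch model from its config
checkpoint = pretrained.download_checkpoint(config)  # fetch the pretrained checkpoint
model.load_state_dict(model_pt.get_pt_params(config, checkpoint))  # load the checkpoint weights as PyTorch params
model.eval().requires_grad_(False).to('cuda')  # inference mode, no gradients, move to GPU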
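Once loaded, the encoders can be combined to score images against text. The sketch below is a hypothetical helper, `image_text_similarity`, built only on the attributes documented below. It assumes (not confirmed by this README) that `images` is a float NCHW tensor already resized to the model's image size with values in [0, 1], that `model.normalize` accepts a batched tensor, and that `model.tokenize` accepts a list of strings and returns a tensor.

```python
import torch
import torch.nn.functional as F


@torch.no_grad()
def image_text_similarity(model, images, texts, device="cuda"):
    """Cosine similarity between every image and every text (hypothetical helper).

    Assumptions: `images` is a float NCHW tensor in [0, 1] at the model's
    input size; `model.tokenize` takes a list of strings and returns a tensor.
    """
    # Normalize pixels with the model's preprocessor, then embed.
    image_embeds = model.image_encoder(model.normalize(images.to(device)))
    # Tokenize with the model's tokenizer, then embed.
    text_embeds = model.text_encoder(model.tokenize(texts).to(device))
    # Unit-normalize in case the encoders do not already return unit vectors.
    image_embeds = F.normalize(image_embeds, dim=-1)
    text_embeds = F.normalize(text_embeds, dim=-1)
    # Shape: (num_images, num_texts); larger values mean a closer match.
    return image_embeds @ text_embeds.T
```

For example, `image_text_similarity(model, images, ['a photo of a cat', 'a photo of a dog']).argmax(dim=1)` would pick the closer caption for each image, given the assumptions above.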

Model class attributes:

model.config: the model config dict.

model.image_encoder: the image encoder, which expects NCHW batches of normalized images (preprocessed by model.normalize), where C = model.config['image_encoder']['input_channels'] and H, W = model.config['image_encoder']['image_size'].

model.text_encoder: the text encoder, which expects text tokenized by model.tokenize.

model.normalize: the preprocessor for image tensors.

model.tokenize: the preprocessor for text.
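Because `model.image_encoder` expects NCHW input whose channel count and spatial size come from `model.config`, a small shape check can catch common mistakes such as passing NHWC images. The helpers below, `expected_image_batch_shape` and `check_image_batch_shape`, are hypothetical and not part of this repo. They assume, as the H, W unpacking above implies, that `config['image_encoder']['image_size']` is a two-element (H, W) sequence; the config in the usage note is made up for illustration.

```python
from typing import Sequence


def expected_image_batch_shape(config: dict, batch_size: int) -> tuple:
    """(N, C, H, W) shape that model.image_encoder expects, read from the config."""
    encoder_cfg = config["image_encoder"]
    channels = encoder_cfg["input_channels"]
    # Assumed to be a 2-element (H, W) sequence.
    height, width = encoder_cfg["image_size"]
    return (batch_size, channels, height, width)


def check_image_batch_shape(shape: Sequence[int], config: dict) -> None:
    """Raise ValueError if `shape` is not a valid NCHW batch for this config."""
    if len(shape) != 4:
        raise ValueError(f"expected a 4-D NCHW batch, got {len(shape)} dims")
    expected = expected_image_batch_shape(config, shape[0])
    if tuple(shape) != expected:
        raise ValueError(f"expected shape {expected}, got {tuple(shape)}")
```

With a loaded model this would be called as `check_image_batch_shape(images.shape, model.config)`; with a made-up config such as `{'image_encoder': {'input_channels': 3, 'image_size': [224, 224]}}`, a batch of shape `(2, 3, 224, 224)` passes while an NHWC batch of shape `(2, 224, 224, 3)` raises `ValueError`.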

JAX

Coming soon...

Training (JAX only)

Coming soon...

Owner

Katherine Crowson