Multi Task Vision and Language

Last update: Dec 19, 2022

Overview

12-in-1: Multi-Task Vision and Language Representation Learning

Code and pre-trained models for 12-in-1: Multi-Task Vision and Language Representation Learning. Please cite the following if you use this code:

@InProceedings{Lu_2020_CVPR,
  author = {Lu, Jiasen and Goswami, Vedanuj and Rohrbach, Marcus and Parikh, Devi and Lee, Stefan},
  title = {12-in-1: Multi-Task Vision and Language Representation Learning},
  booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month = {June},
  year = {2020}
}

and ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks:

@inproceedings{lu2019vilbert,
  title={Vilbert: Pretraining task-agnostic visiolinguistic representations for vision-and-language tasks},
  author={Lu, Jiasen and Batra, Dhruv and Parikh, Devi and Lee, Stefan},
  booktitle={Advances in Neural Information Processing Systems},
  pages={13--23},
  year={2019}
}

Repository Setup

Create a fresh conda environment, and install all dependencies.

conda create -n vilbert-mt python=3.6
conda activate vilbert-mt
git clone --recursive https://github.com/facebookresearch/vilbert-multi-task.git
cd vilbert-multi-task
pip install -r requirements.txt

Install PyTorch

conda install pytorch torchvision cudatoolkit=10.0 -c pytorch
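Before continuing, it can help to confirm that PyTorch is importable and can see a GPU. The following is a minimal sketch, not part of the repository; the function name `describe_torch_install` is hypothetical, and it only uses `torch.__version__` and `torch.cuda.is_available()`.

```python
def describe_torch_install():
    """Return a small report on whether PyTorch is importable and sees a GPU."""
    report = {
        "torch_installed": False,
        "torch_version": None,
        "cuda_available": False,
    }
    try:
        import torch  # imported lazily so the check works even without PyTorch
    except ImportError:
        return report  # PyTorch is not installed in this environment

    report["torch_installed"] = True
    report["torch_version"] = torch.__version__
    report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    print(describe_torch_install())
```

If `cuda_available` is reported as `False` on a GPU machine, the installed `cudatoolkit` version may not match the driver.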

Install apex by following the instructions at https://github.com/NVIDIA/apex

Install this codebase as a package in this environment.

python setup.py develop

Data Setup

Check README.md under data for more details.

Visiolinguistic Pre-training and Multi Task Training

Pretraining on Conceptual Captions

python train_concap.py --bert_model bert-base-uncased --config_file config/bert_base_6layer_6conect.json --train_batch_size 512 --objective 1 --file_path <path_to_extracted_cc_features>

Download link

Multi-task Training

python train_tasks.py --bert_model bert-base-uncased --from_pretrained <pretrained_model_path> --config_file config/bert_base_6layer_6conect.json --tasks 1-2-4-7-8-9-10-11-12-13-15-17 --lr_scheduler 'warmup_linear' --train_iter_gap 4 --task_specific_tokens --save_name multi_task_model

Download link
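The `--tasks` value in the multi-task command is a hyphen-separated list of task IDs. As a rough illustration only, assuming the IDs are plain integers joined by `-` (how the script maps each ID to a dataset is not shown here), the string can be split and counted like this. The helper name `parse_task_ids` is hypothetical.

```python
def parse_task_ids(tasks_arg):
    """Split a hyphen-separated --tasks value into a list of integer task IDs."""
    return [int(part) for part in tasks_arg.split("-") if part]


if __name__ == "__main__":
    # The value used in the multi-task training command above.
    multi_task_ids = parse_task_ids("1-2-4-7-8-9-10-11-12-13-15-17")
    print(len(multi_task_ids), multi_task_ids)
```

The multi-task command lists twelve IDs, which matches the "12-in-1" in the paper title.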

Fine-tune from Multi-task trained model

python train_tasks.py --bert_model bert-base-uncased --from_pretrained <multi_task_model_path> --config_file config/bert_base_6layer_6conect.json --tasks 1 --lr_scheduler 'warmup_linear' --train_iter_gap 4 --task_specific_tokens --save_name finetune_from_multi_task_model
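To fine-tune on several tasks one after another, the command above can be generated per task. The sketch below only assembles the argument list from the flags shown in the fine-tuning command; the helper `build_finetune_command` and the per-task `save_name` suffix are assumptions for illustration, and `<multi_task_model_path>` remains a placeholder to fill in.

```python
import shlex


def build_finetune_command(task_id, multi_task_model_path, save_name=None):
    """Assemble the train_tasks.py argument list for fine-tuning on one task."""
    if save_name is None:
        # Hypothetical naming scheme: one output name per task ID.
        save_name = "finetune_from_multi_task_model_task{}".format(task_id)
    return [
        "python", "train_tasks.py",
        "--bert_model", "bert-base-uncased",
        "--from_pretrained", multi_task_model_path,
        "--config_file", "config/bert_base_6layer_6conect.json",
        "--tasks", str(task_id),
        "--lr_scheduler", "warmup_linear",
        "--train_iter_gap", "4",
        "--task_specific_tokens",
        "--save_name", save_name,
    ]


if __name__ == "__main__":
    for task_id in (1, 2):
        cmd = build_finetune_command(task_id, "<multi_task_model_path>")
        # Print a shell-safe version of the command instead of running it.
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

The returned list can be passed to `subprocess.run` once the placeholder path is replaced with a real checkpoint.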

License

vilbert-multi-task is licensed under the MIT license, available in the LICENSE file.

Owner

Facebook Research
