PyTorch version of VidLanKD: Improving Language Understanding via Video-Distilled Knowledge Transfer

Last update: Dec 20, 2022

VidLanKD

Implementation of VidLanKD: Improving Language Understanding via Video-Distilled Knowledge Transfer by Zineng Tang, Jaemin Cho, Hao Tan, Mohit Bansal.

Setup

# Create python environment (optional)
conda create -n vidlankd python=3.7
conda activate vidlankd

# Install python dependencies
pip install -r requirements.txt

To speed up the training, we use mixed precision with Apex.

git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
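Before launching training, it can help to confirm that Apex is importable in the active environment. The following is a minimal sketch using only the standard library; `has_module` is a hypothetical helper, not part of this repository, and the training scripts may perform their own checks.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module named `name` can be found."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # "apex" is the package installed by the commands above.
    if has_module("apex"):
        print("Apex found: mixed-precision training should be available.")
    else:
        print("Apex not found: re-run the installation steps above.")
```

This only checks that the package can be located; it does not verify that the C++ and CUDA extensions were compiled successfully.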

Dataset Preparation

Text Dataset

We provide scripts to obtain datasets "wiki103" and "wiki".

Wiki103, a selected subset of English Wikipedia.

bash data/wiki103/get_data_cased.bash

English Wikipedia. The scripts are modified from XLM.

bash data/wiki/get_data_cased.bash en

Video Dataset

HowTo100M, from which you can download the official captions and video features.

Video Features Extraction Code

To be updated.

We extracted our 2D-level video features with ResNet152 from torchvision.
We extracted our 3D-level video features with 3D-ResNeXt.

Downstream tasks

GLUE dataset

Download dataset

python download_glue_data.py --data_dir data/glue --tasks all
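To confirm that the download succeeded, one option is to list the sub-directories created under `data/glue`. This sketch assumes the script writes one directory per task beneath `--data_dir`; `list_task_dirs` is a hypothetical helper, and the exact directory names depend on the download script.

```python
from pathlib import Path
from typing import List


def list_task_dirs(data_dir: str) -> List[str]:
    """Return the sorted names of sub-directories under `data_dir`.

    Returns an empty list if `data_dir` does not exist.
    """
    root = Path(data_dir)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if p.is_dir())


if __name__ == "__main__":
    tasks = list_task_dirs("data/glue")
    print(f"Found {len(tasks)} task directories: {tasks}")
```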

Training

Teacher model pre-training

# Usage: bash scripts/small_vlm_howto100m.bash $GPUS $TEACHER_SNAP_PATH
bash scripts/small_vlm_howto100m.bash 0,1,2,3 howto100m_bert_small_vokenhinge
# Usage: bash scripts/base_vlm_howto100m.bash $GPUS $TEACHER_SNAP_PATH
bash scripts/base_vlm_howto100m.bash 0,1,2,3 howto100m_bert_base_vokenhinge

Knowledge transfer to student model

# Usage: bash scripts/small_vlm_wiki103.bash $GPUS $TEACHER_SNAP_PATH $STUDENT_SNAP_PATH
bash scripts/small_vlm_wiki103.bash 0,1,2,3 howto100m_bert_small_vokenhinge/checkpoint-epoch0019 wiki103_bert_small_vokenmmd
# Usage: bash scripts/base_vlm_wiki.bash $GPUS $TEACHER_SNAP_PATH $STUDENT_SNAP_PATH
bash scripts/base_vlm_wiki.bash 0,1,2,3 howto100m_bert_base_vokenhinge/checkpoint-epoch0019 wiki_bert_base_vokenmmd

Finetuning on GLUE tasks

# Usage: bash scripts/run_glue_at_epoch.bash $GPUS $NumTrainEpochs $SNAP_PATH
bash scripts/run_glue_at_epoch.bash 0,1,2,3 3 snap/vlm/wiki103_bert_small_vokenmmd/checkpoint-epoch0019
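The commands above refer to checkpoints by a zero-padded epoch suffix such as `checkpoint-epoch0019`. As an illustration, the sketch below builds such a path and the matching fine-tuning command. The four-digit padding and the `snap/vlm` root are inferred from the examples in this README, and `checkpoint_path` and `glue_command` are hypothetical helpers, not part of the repository.

```python
import shlex
from typing import List


def checkpoint_path(snap_name: str, epoch: int, root: str = "snap/vlm") -> str:
    """Build a checkpoint directory path, e.g. <root>/<snap_name>/checkpoint-epoch0019.

    The zero-padded, four-digit epoch suffix is an assumption based on the README examples.
    """
    return f"{root}/{snap_name}/checkpoint-epoch{epoch:04d}"


def glue_command(gpus: List[int], num_epochs: int, snap_path: str) -> List[str]:
    """Assemble the argument list for scripts/run_glue_at_epoch.bash."""
    gpu_arg = ",".join(str(g) for g in gpus)
    return ["bash", "scripts/run_glue_at_epoch.bash", gpu_arg, str(num_epochs), snap_path]


if __name__ == "__main__":
    path = checkpoint_path("wiki103_bert_small_vokenmmd", 19)
    cmd = glue_command([0, 1, 2, 3], 3, path)
    # Print a copy-pastable shell command; pass `cmd` to subprocess.run to execute it.
    print(" ".join(shlex.quote(part) for part in cmd))
```

Building the command as a list rather than a single string keeps each argument separate, which avoids quoting problems if it is later passed to `subprocess.run`.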

Acknowledgements

Part of the code is built based on vokenization, huggingface transformers, and facebook faiss.

Owner

Zineng Tang
