This repo contains the official code and pre-trained models for the Dynamic Vision Transformer (DVT).

Last update: Dec 18, 2022

Overview

Dynamic-Vision-Transformer (Pytorch)

Not All Images are Worth 16x16 Words: Dynamic Vision Transformers with Adaptive Sequence Length

Update on 2021/06/01: Release Pre-trained Models and the Inference Code on ImageNet.

Introduction

We develop a Dynamic Vision Transformer (DVT) to automatically configure a proper number of tokens for each individual image, leading to a significant improvement in computational efficiency, both theoretically and empirically.

Results

Top-1 accuracy on ImageNet vs. GFLOPs

Top-1 accuracy on CIFAR vs. GFLOPs

Top-1 accuracy on ImageNet vs. Throughput

Visualization

Pre-trained Models

Backbone	# of Exits	# of Tokens	Links
T2T-ViT-12	3	7x7-10x10-14x14	Tsinghua Cloud / Google Drive

Each checkpoint contains:

**.pth.tar
├── model_state_dict: state dictionaries of the model
├── flops: a list containing the GFLOPs corresponding to exiting at each exit
├── anytime_classification: Top-1 accuracy of each exit
├── dynamic_threshold: the confidence thresholds used in budgeted batch classification
└── budgeted_batch_classification: results of budgeted batch classification (a two-item list; items [0] and [1] hold the two coordinates of the curve)
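
The per-exit fields listed above could be inspected along these lines. This is a minimal sketch, assuming the checkpoint has already been loaded into a plain Python dict (for example with `torch.load(path, map_location="cpu")`) and that `flops` and `anytime_classification` are equal-length sequences indexed by exit. The helper name `summarize_checkpoint` and all sample values are hypothetical; only the key names come from the list above.

```python
def summarize_checkpoint(ckpt):
    """Return a per-exit list of (GFLOPs, top-1 accuracy) pairs.

    Assumes `flops` and `anytime_classification` are equal-length
    sequences indexed by exit, as the field descriptions suggest.
    """
    flops = list(ckpt["flops"])
    acc = list(ckpt["anytime_classification"])
    if len(flops) != len(acc):
        raise ValueError("flops and anytime_classification differ in length")
    return [(float(f), float(a)) for f, a in zip(flops, acc)]


# Hypothetical stand-in for a loaded checkpoint; the numbers are made up.
example_ckpt = {
    "flops": [1.0, 2.0, 4.0],
    "anytime_classification": [70.0, 75.0, 80.0],
}

for exit_idx, (gflops, top1) in enumerate(summarize_checkpoint(example_ckpt)):
    print(f"exit {exit_idx}: {gflops:.2f} GFLOPs, top-1 {top1:.2f}")
```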

Requirements

python 3.7.7
pytorch 1.3.1
torchvision 0.4.2

Evaluate Pre-trained Models

Read the evaluation results saved in pre-trained models

CUDA_VISIBLE_DEVICES=0 python inference.py --batch_size 128 --model DVT_T2t_vit_12 --checkpoint_path PATH_TO_CHECKPOINTS  --eval_mode 0

Read the confidence thresholds saved in the pre-trained models and run inference on the validation set

CUDA_VISIBLE_DEVICES=0 python inference.py --data_url PATH_TO_DATASET --batch_size 128 --model DVT_T2t_vit_12 --checkpoint_path PATH_TO_CHECKPOINTS  --eval_mode 1
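
The role of the confidence thresholds can be illustrated with a small sketch. It assumes the common early-exit rule: a sample leaves at the first exit whose top softmax probability reaches that exit's threshold, and the last exit always accepts. This is an assumption about the procedure, not code taken from inference.py; `choose_exit`, `average_gflops`, and all numbers below are hypothetical.

```python
def choose_exit(confidences, thresholds):
    """Return the index of the first exit whose confidence meets its threshold.

    confidences: max softmax probability at each exit, for one sample.
    thresholds:  one threshold per exit; the final exit is always accepted.
    """
    if len(confidences) != len(thresholds):
        raise ValueError("need one threshold per exit")
    for idx, (conf, thr) in enumerate(zip(confidences, thresholds)):
        if conf >= thr:
            return idx
    return len(confidences) - 1


def average_gflops(exit_indices, flops):
    """Mean cost over samples, where flops[i] is the cost of exiting at exit i."""
    return sum(flops[i] for i in exit_indices) / len(exit_indices)


# Made-up thresholds and per-sample confidences for a three-exit model.
thresholds = [0.9, 0.8, 0.0]
samples = [[0.95, 0.97, 0.99], [0.50, 0.85, 0.90], [0.30, 0.40, 0.60]]
exits = [choose_exit(c, thresholds) for c in samples]
print(exits)
print(average_gflops(exits, [1.0, 2.0, 4.0]))
```

Raising a threshold sends more samples to later, more expensive exits, which is the trade-off that budgeted batch classification tunes.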

Determine the confidence thresholds on the training set and run inference on the validation set

CUDA_VISIBLE_DEVICES=0 python inference.py --data_url PATH_TO_DATASET --batch_size 128 --model DVT_T2t_vit_12 --checkpoint_path PATH_TO_CHECKPOINTS  --eval_mode 2

The dataset is expected to be prepared as follows:

ImageNet
├── train
│   ├── folder 1 (class 1)
│   ├── folder 2 (class 2)
│   ├── ...
├── val
│   ├── folder 1 (class 1)
│   ├── folder 2 (class 2)
│   ├── ...
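
Before running evaluation, it may help to confirm that the directory passed as `--data_url` follows this layout. The sketch below uses only the standard library; the helper name `count_class_folders` and the temporary demo directory are hypothetical and not part of this repo.

```python
import tempfile
from pathlib import Path


def count_class_folders(root):
    """Count class sub-folders under root/train and root/val.

    Raises FileNotFoundError if either split directory is missing.
    """
    root = Path(root)
    counts = {}
    for split in ("train", "val"):
        split_dir = root / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing split directory: {split_dir}")
        counts[split] = sum(1 for p in split_dir.iterdir() if p.is_dir())
    return counts


# Demo on a throwaway directory with two class folders per split.
with tempfile.TemporaryDirectory() as tmp:
    for split in ("train", "val"):
        for name in ("class_a", "class_b"):
            (Path(tmp) / split / name).mkdir(parents=True)
    demo_counts = count_class_folders(tmp)
print(demo_counts)
```

For a real dataset, call `count_class_folders` on the dataset root and check that both splits report the same number of classes.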

Contact

If you have any questions, please feel free to contact the authors. Yulin Wang: [email protected].

Acknowledgment

Our T2T-ViT code is from here.

To Do

Update the code for training.
