This is the official implementation of "ResT: An Efficient Transformer for Visual Recognition".

Last update: Dec 13, 2022

Overview

ResT

By Qing-Long Zhang and Yu-Bin Yang

[State Key Laboratory for Novel Software Technology at Nanjing University]

This repo is the official implementation of "ResT: An Efficient Transformer for Visual Recognition". It currently includes code and models for the following tasks:

Image Classification: Included in this repo. See get_started.md for a quick start.

Object Detection and Instance Segmentation: Based on detectron2, coming soon.

ResT was first described in an arXiv paper (arXiv:2105.13677) and serves as a general-purpose backbone for computer vision. It can handle input images of arbitrary size. In addition, ResT compresses the memory of standard multi-head self-attention (MSA) and models the interaction between attention heads while preserving their diversity.
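To give a feel for what compressing the memory of MSA can buy, here is a rough back-of-the-envelope sketch. It assumes the keys and values are spatially downsampled by some stride before attention is computed, so the attention map shrinks from n x n to n x (n / stride^2). The function name, the 4x stem downsampling, and the stride of 8 are illustrative assumptions, not values read from this repo.

```python
def attention_entries(height, width, kv_stride=1):
    """Count entries in one head's attention map for an H x W token grid.

    kv_stride is a hypothetical spatial reduction applied to keys/values
    only; kv_stride=1 corresponds to standard MSA.
    """
    num_queries = height * width
    # Ceil division so grids not divisible by the stride are still covered.
    kv_height = -(-height // kv_stride)
    kv_width = -(-width // kv_stride)
    num_keys = kv_height * kv_width
    return num_queries * num_keys


# Assumed setup: a 224x224 image downsampled 4x by a stem gives a 56x56 grid.
grid = 224 // 4
standard = attention_entries(grid, grid)                  # 3136 queries x 3136 keys
compressed = attention_entries(grid, grid, kv_stride=8)   # 3136 queries x 49 keys
print(standard, compressed, standard // compressed)
```

Under these assumptions the attention map for a single head drops from 9,834,496 entries to 153,664, a 64x reduction. The actual savings in ResT depend on the per-stage settings used in the paper and code.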

Main Results on ImageNet with Pretrained Models

ImageNet-1K Pretrained Models

name	resolution	acc@1	acc@5	#params	FLOPs	FPS	1K model
ResT-Lite	224x224	77.2	93.7	10.5M	1.4G	1246	baidu
ResT-Small	224x224	79.6	94.9	13.7M	1.9G	1043	baidu
ResT-Base	224x224	81.6	95.7	30.3M	4.3G	673	baidu
ResT-Large	224x224	83.6	96.3	51.6M	7.9G	429	baidu

Note: the access code for the baidu downloads is "rest".

Citing ResT

@article{zhql2021ResT,
  title={ResT: An Efficient Transformer for Visual Recognition},
  author={Zhang, Qinglong and Yang, Yubin},
  journal={arXiv preprint arXiv:2105.13677v2},
  year={2021}
}

Owner

zhql
