Vision Transformer with Deformable Attention

This repository contains the code for the paper "Vision Transformer with Deformable Attention" (arXiv:2201.00520).

Introduction

We propose deformable attention to model the relations among tokens effectively, guided by the important regions of the feature maps. This flexible scheme lets the self-attention module focus on relevant regions and capture more informative features. Building on it, we present the Deformable Attention Transformer (DAT), a general backbone with deformable attention for both image classification and dense prediction tasks.
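
The idea can be illustrated with a minimal, single-head sketch in PyTorch. It is a simplification under stated assumptions, not this repository's implementation: queries are pooled onto a small uniform grid of reference points, a hypothetical offset network (`offset_net`) predicts a bounded 2-D offset for each point, the input features are bilinearly sampled at the shifted points with `F.grid_sample`, and every query attends only to keys and values projected from those sampled features. The class name `SimpleDeformableAttention` and the `grid_size` and `offset_scale` parameters are illustrative; multi-head attention, offset groups, and positional bias are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleDeformableAttention(nn.Module):
    """Hypothetical single-head deformable attention sketch (not the DAT code)."""

    def __init__(self, dim, grid_size=4, offset_scale=0.5):
        super().__init__()
        self.grid_size = grid_size        # reference points form a grid_size x grid_size grid
        self.offset_scale = offset_scale  # max offset, in normalized [-1, 1] coordinates
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        self.out_proj = nn.Linear(dim, dim)
        # Offset network: maps pooled queries to one (x, y) offset per reference point.
        self.offset_net = nn.Sequential(
            nn.Conv2d(dim, dim, kernel_size=3, padding=1),
            nn.GELU(),
            nn.Conv2d(dim, 2, kernel_size=1),
        )

    def forward(self, x):
        # x: feature map of shape (B, C, H, W)
        B, C, H, W = x.shape
        G = self.grid_size
        q = self.q_proj(x.flatten(2).transpose(1, 2))  # (B, H*W, C)

        # Pool queries onto the reference grid and predict bounded offsets there.
        q_map = q.transpose(1, 2).reshape(B, C, H, W)
        q_grid = F.adaptive_avg_pool2d(q_map, G)  # (B, C, G, G)
        offsets = torch.tanh(self.offset_net(q_grid)) * self.offset_scale  # (B, 2, G, G)

        # Uniform reference points in (x, y) order, as F.grid_sample expects.
        lin = torch.linspace(-1.0, 1.0, G, device=x.device, dtype=x.dtype)
        ref = torch.stack((lin.view(1, G).expand(G, G), lin.view(G, 1).expand(G, G)), dim=-1)
        pos = (ref.unsqueeze(0) + offsets.permute(0, 2, 3, 1)).clamp(-1.0, 1.0)  # (B, G, G, 2)

        # Bilinear sampling is differentiable with respect to pos,
        # so offset_net is trained by the ordinary task loss.
        sampled = F.grid_sample(x, pos, mode="bilinear", align_corners=True)  # (B, C, G, G)
        kv_tokens = sampled.flatten(2).transpose(1, 2)  # (B, G*G, C)
        k, v = self.k_proj(kv_tokens), self.v_proj(kv_tokens)

        # Every query attends to the G*G deformed keys instead of all H*W tokens.
        attn = torch.softmax(q @ k.transpose(1, 2) / C ** 0.5, dim=-1)  # (B, H*W, G*G)
        out = self.out_proj(attn @ v)  # (B, H*W, C)
        return out.transpose(1, 2).reshape(B, C, H, W)


if __name__ == "__main__":
    torch.manual_seed(0)
    layer = SimpleDeformableAttention(dim=32, grid_size=4)
    y = layer(torch.randn(2, 32, 8, 8))
    print(y.shape)  # torch.Size([2, 32, 8, 8])
```

Because the sampled set has only grid_size * grid_size keys, the attention map has shape (H*W, grid_size * grid_size) rather than (H*W, H*W); the learned offsets let those few keys move toward informative regions of the feature map.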

Dependencies

NVIDIA GPU + CUDA 11.1
Python 3.8 (Anaconda recommended)
PyTorch == 1.8.0
timm
einops
yacs
termcolor
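
To confirm that an environment matches the list above, a small standard-library script can report the installed versions. This is a hypothetical helper (`check_environment` is not part of the repository); it only compares the pinned PyTorch version, ignores local build tags such as `+cu111`, and does not check the GPU or the CUDA toolkit.

```python
import sys
from importlib import metadata

# Hypothetical helper, not shipped with this repository.
# Packages from the Dependencies list; only PyTorch has a pinned version there.
REQUIRED = {"torch": "1.8.0", "timm": None, "einops": None, "yacs": None, "termcolor": None}


def check_environment(required=REQUIRED):
    """Return {package: (installed_version_or_None, ok)} for each listed package."""
    report = {}
    for name, pinned in required.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
            continue
        # Drop a local build tag such as "+cu111" before comparing with the pin.
        ok = pinned is None or installed.split("+")[0] == pinned
        report[name] = (installed, ok)
    return report


if __name__ == "__main__":
    print("Python", sys.version.split()[0], "(3.8 recommended)")
    for pkg, (ver, ok) in check_environment().items():
        print(f"{pkg:10s} {ver or 'missing':15s} {'OK' if ok else 'CHECK'}")
```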

TODO

Classification pretrained models.
Object Detection codebase & models.
Semantic Segmentation codebase & models.
CUDA operators to accelerate sampling operations.

Acknowledgement

This code is built on top of Swin Transformer; we thank its authors for their efficient and neat codebase.

Citation

If you find our work useful in your research, please consider citing:

@misc{xia2022vision,
      title={Vision Transformer with Deformable Attention}, 
      author={Zhuofan Xia and Xuran Pan and Shiji Song and Li Erran Li and Gao Huang},
      year={2022},
      eprint={2201.00520},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Contact

[email protected]
