Twins: Revisiting the Design of Spatial Attention in Vision Transformers

Related tags

Deep LearningTwins
Overview

Twins: Revisiting the Design of Spatial Attention in Vision Transformers

Very recently, a variety of vision transformer architectures for dense prediction tasks have been proposed and they show that the design of spatial attention is critical to their success in these tasks. In this work, we revisit the design of the spatial attention and demonstrate that a carefully-devised yet simple spatial attention mechanism performs favourably against the state-of-the-art schemes. As a result, we propose two vision transformer architectures, namely, Twins- PCPVT and Twins-SVT. Our proposed architectures are highly-efficient and easy to implement, only involving matrix multiplications that are highly optimized in modern deep learning frameworks. More importantly, the proposed architectures achieve excellent performance on a wide range of visual tasks including image- level classification as well as dense detection and segmentation. The simplicity and strong performance suggest that our proposed architectures may serve as stronger backbones for many vision tasks.

Twins-SVT-S Figure 1. Twins-SVT-S Architecture (Right side shows the inside of two consecutive Transformer Encoders).

Model Zoo

Image Classification

We provide baseline Twins models pretrained on ImageNet 2012.

Name Alias in paper [email protected] FLOPs(G) #params (M) url
PVT+CPVT-Small Twins-PCPVT-S 81.2 3.7 24.1 pcpvt_small.pth
PVT+CPVT-Base Twins-PCPVT-B 82.7 6.4 43.8 pcpvt_base.pth
ALT-GVT-Small Twins-SVT-S 81.3 2.8 24 alt_gvt_small.pth
ALT-GVT-Base Twins-SVT-B 83.1 8.3 56 alt_gvt_base.pth
ALT-GVT-Large Twins-SVT-L 83.3 14.8 99.2 alt_gvt_large.pth

^ Note: Our code will be released soon.

Citation

@article{chu2021Twins,
	title={Twins: Revisiting the Design of Spatial Attention in Vision Transformers},
	author={Xiangxiang Chu and Zhi Tian and Yuqing Wang and Bo Zhang and Haibing Ren and Xiaolin Wei and Huaxia Xia and Chunhua Shen},
	journal={Arxiv preprint 2104.13840},
	url={https://arxiv.org/pdf/2104.13840.pdf},
	year={2021}
}
Multiview 3D object detection on MultiviewC dataset through moft3d.

Voxelized 3D Feature Aggregation for Multiview Detection [arXiv] Multiview 3D object detection on MultiviewC dataset through VFA. Introduction We prop

Jiahao Ma 20 Dec 21, 2022
The materials used in the SaxonJS tutorial presented at Declarative Amsterdam, 2021

SaxonJS-Tutorial-2021, version 1.0.4 Last updated on 4 November, 2021. Table of contents Background Prerequisites Starting a web server Running a Java

Saxonica 11 Oct 23, 2022
Anti-UAV base on PaddleDetection

Paddle-Anti-UAV Anti-UAV base on PaddleDetection Background UAVs are very popular and we can see them in many public spaces, such as parks and playgro

Qingzhong Wang 2 Apr 20, 2022
The code for our paper CrossFormer: A Versatile Vision Transformer Based on Cross-scale Attention.

CrossFormer This repository is the code for our paper CrossFormer: A Versatile Vision Transformer Based on Cross-scale Attention. Introduction Existin

cheerss 238 Jan 06, 2023
Implementation of Shape and Electrostatic similarity metric in deepFMPO.

DeepFMPO v3D Code accompanying the paper "On the value of using 3D-shape and electrostatic similarities in deep generative methods". The paper can be

34 Nov 28, 2022
Unsupervised Image Generation with Infinite Generative Adversarial Networks

Unsupervised Image Generation with Infinite Generative Adversarial Networks Here is the implementation of MICGANs using DCGAN architecture on MNIST da

16 Dec 24, 2021
Implementation of Wasserstein adversarial attacks.

Stronger and Faster Wasserstein Adversarial Attacks Code for Stronger and Faster Wasserstein Adversarial Attacks, appeared in ICML 2020. This reposito

21 Oct 06, 2022
Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams

Adversarial Robustness Toolbox (ART) is a Python library for Machine Learning Security. ART provides tools that enable developers and researchers to defend and evaluate Machine Learning models and ap

3.4k Jan 04, 2023
PyTorch Personal Trainer: My framework for deep learning experiments

Alex's PyTorch Personal Trainer (ptpt) (name subject to change) This repository contains my personal lightweight framework for deep learning projects

Alex McKinney 8 Jul 14, 2022
Seeing All the Angles: Learning Multiview Manipulation Policies for Contact-Rich Tasks from Demonstrations

Seeing All the Angles: Learning Multiview Manipulation Policies for Contact-Rich Tasks from Demonstrations Trevor Ablett, Daniel (Yifan) Zhai, Jonatha

STARS Laboratory 3 Feb 01, 2022
PyTorch Implementation of Small Lesion Segmentation in Brain MRIs with Subpixel Embedding (ORAL, MICCAIW 2021)

Small Lesion Segmentation in Brain MRIs with Subpixel Embedding PyTorch implementation of Small Lesion Segmentation in Brain MRIs with Subpixel Embedd

22 Oct 21, 2022
Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering

Graph ConvNets in PyTorch October 15, 2017 Xavier Bresson http://www.ntu.edu.sg/home/xbresson https://github.com/xbresson https://twitter.com/xbresson

Xavier Bresson 287 Jan 04, 2023
Dynamic Token Normalization Improves Vision Transformers

Dynamic Token Normalization Improves Vision Transformers This is the PyTorch implementation of the paper Dynamic Token Normalization Improves Vision T

Wenqi Shao 20 Oct 09, 2022
An end-to-end regression problem of predicting the price of properties in Bangalore.

Bangalore-House-Price-Prediction An end-to-end regression problem of predicting the price of properties in Bangalore. Deployed in Heroku using Flask.

Shruti Balan 1 Nov 25, 2022
This is the code for ACL2021 paper A Unified Generative Framework for Aspect-Based Sentiment Analysis

This is the code for ACL2021 paper A Unified Generative Framework for Aspect-Based Sentiment Analysis Install the package in the requirements.txt, the

108 Dec 23, 2022
Official code repository for A Simple Long-Tailed Rocognition Baseline via Vision-Language Model.

BALLAD This is the official code repository for A Simple Long-Tailed Rocognition Baseline via Vision-Language Model. Requirements Python3 Pytorch(1.7.

peng gao 42 Nov 26, 2022
🏆 The 1st Place Submission to AICity Challenge 2021 Natural Language-Based Vehicle Retrieval Track (Alibaba-UTS submission)

AI City 2021: Connecting Language and Vision for Natural Language-Based Vehicle Retrieval 🏆 The 1st Place Submission to AICity Challenge 2021 Natural

82 Dec 29, 2022
Network Pruning That Matters: A Case Study on Retraining Variants (ICLR 2021)

Network Pruning That Matters: A Case Study on Retraining Variants (ICLR 2021)

Duong H. Le 18 Jun 13, 2022
Code implementation of "Sparsity Probe: Analysis tool for Deep Learning Models"

Sparsity Probe: Analysis tool for Deep Learning Models This repository is a limited implementation of Sparsity Probe: Analysis tool for Deep Learning

3 Jun 09, 2021
Convolutional Neural Network for 3D meshes in PyTorch

MeshCNN in PyTorch SIGGRAPH 2019 [Paper] [Project Page] MeshCNN is a general-purpose deep neural network for 3D triangular meshes, which can be used f

Rana Hanocka 1.4k Jan 04, 2023