DeepViT

This repo is the official implementation of "DeepViT: Towards Deeper Vision Transformer". It is based on the timm library (https://github.com/rwightman/pytorch-image-models) by Ross Wightman.

Introduction

DeepViT was first described in an arXiv preprint (arXiv:2103.11886), which reports the attention collapse phenomenon that appears when training deep vision transformers: In this paper, we show that, unlike convolutional neural networks (CNNs), which can be improved by stacking more convolutional layers, the performance of ViTs saturates quickly as they are made deeper. More specifically, we empirically observe that this scaling difficulty is caused by the attention collapse issue: as the transformer goes deeper, the attention maps gradually become similar and, after certain layers, nearly identical. In other words, the feature maps tend to be identical in the top layers of deep ViT models. This shows that in the deeper layers of ViTs, the self-attention mechanism fails to learn effective concepts for representation learning, which prevents the model from achieving the expected performance gain.

Based on this observation, we propose a simple yet effective method, named Re-attention, that regenerates the attention maps to increase their diversity across layers at negligible computation and memory cost. The proposed method makes it feasible to train deeper ViT models with consistent performance improvements through minor modifications to existing ViT models. Notably, when training a deep ViT model with 32 transformer blocks, the Top-1 classification accuracy on ImageNet improves by 1.6%.
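As formulated in the arXiv paper, Re-attention keeps the usual scaled dot-product attention but mixes the per-head attention maps with a learnable H × H matrix Θ before applying them to the values, roughly Re-Attention(Q, K, V) = Norm(Θᵀ Softmax(QKᵀ / √d)) V. The NumPy sketch below illustrates that idea for a single example. It is not the repo's PyTorch/timm code. The function names are hypothetical, `theta` is random rather than learned, and the per-head standardization is a simplified stand-in for the paper's normalization step.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max along the axis for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def re_attention(q, k, v, theta, normalize=True, eps=1e-5):
    """Sketch of Re-attention for one example (hypothetical helper).

    q, k, v: arrays of shape (H, N, d): H heads, N tokens, head dim d.
    theta:   (H, H) head-mixing matrix (learned in practice, random here).
    """
    d = q.shape[-1]
    # Standard per-head attention maps, shape (H, N, N).
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d), axis=-1)
    # Apply theta^T along the head axis: map h = sum_g theta[g, h] * attn[g].
    mixed = np.einsum("gh,gij->hij", theta, attn)
    if normalize:
        # Simplified stand-in for the paper's Norm(.): standardize each head's
        # map (like training-mode BatchNorm over heads, batch size 1, no affine).
        mean = mixed.mean(axis=(1, 2), keepdims=True)
        var = mixed.var(axis=(1, 2), keepdims=True)
        mixed = (mixed - mean) / np.sqrt(var + eps)
    # Aggregate the values with the regenerated maps, shape (H, N, d).
    return mixed @ v


# Usage: 4 heads, 6 tokens, head dim 8.
rng = np.random.default_rng(0)
H, N, d = 4, 6, 8
q, k, v = (rng.standard_normal((H, N, d)) for _ in range(3))
out = re_attention(q, k, v, rng.standard_normal((H, H)))
print(out.shape)  # (4, 6, 8)
```

Setting `theta` to the identity and disabling the normalization recovers ordinary multi-head attention, which is a quick sanity check that the only change is the cross-head mixing.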

2. DeepViT Models

| Model | Re-attention | Top-1 Acc (%) | #Params | #Similar Blocks | Checkpoint |
|---|---|---|---|---|---|
| ViT-16 | N/A | 78.88 | 24.5M | 5 | coming soon |
| DeepViT-16 | FC | 79.10 | 24.5M | 0 | coming soon |
| ViT-24 | N/A | 79.35 | 36.3M | 11 | coming soon |
| DeepViT-24 | FC | 79.99 | 36.3M | 0 | coming soon |
| ViT-32 | N/A | 79.27 | 48.1M | 15 | coming soon |
| DeepViT_t-32 | FC | 80.90 | 48.1M | 0 | coming soon |
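This README does not define the #Similar Blocks column. The paper quantifies attention collapse with a cross-layer similarity between attention maps, and the NumPy sketch below shows one way such a count could be computed. It assumes the metric is the cosine similarity between matching columns of two layers' attention maps, averaged over heads and tokens, and that a block counts as "similar" when its similarity to the preceding block exceeds a threshold of 0.9. That threshold and the helper names (`cross_layer_similarity`, `count_similar_blocks`) are our own assumptions, not taken from the repo.

```python
import numpy as np


def cross_layer_similarity(a_p, a_q, eps=1e-8):
    """Mean cosine similarity between two layers' attention maps.

    a_p, a_q: arrays of shape (H, N, N), one map per head (rows = queries).
    Column t of a head's map says how much token t is attended to by every
    query; columns are compared per head and averaged over heads and tokens.
    """
    dot = (a_p * a_q).sum(axis=1)  # (H, N): per-head, per-column dot products
    norms = np.linalg.norm(a_p, axis=1) * np.linalg.norm(a_q, axis=1)
    return float((dot / (norms + eps)).mean())


def count_similar_blocks(attn_maps, threshold=0.9):
    """Count blocks whose maps nearly match the previous block's maps.

    attn_maps: list of (H, N, N) arrays, one per transformer block.
    threshold: hypothetical cut-off chosen for illustration only.
    """
    return sum(
        cross_layer_similarity(prev, cur) > threshold
        for prev, cur in zip(attn_maps, attn_maps[1:])
    )


# Usage: block 0 attends to itself, blocks 1-4 all attend to the previous
# token, so blocks 2, 3 and 4 repeat block 1's maps (a toy "collapse").
eye = np.stack([np.eye(4)] * 2)                           # 2 heads, 4 tokens
shift = np.stack([np.roll(np.eye(4), 1, axis=1)] * 2)
print(count_similar_blocks([eye, shift, shift, shift, shift]))  # 3
```

Permutation-style maps are used in the example so the similarities are exactly 0 or 1; real attention maps are dense, so the threshold choice matters much more in practice.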

Citing DeepViT

@article{zhou2021deepvit,
  title={DeepViT: Towards Deeper Vision Transformer},
  author={Zhou, Daquan and Kang, Bingyi and Jin, Xiaojie and Yang, Linjie and Lian, Xiaochen and Hou, Qibin and Feng, Jiashi},
  journal={arXiv preprint arXiv:2103.11886},
  year={2021}
}


