Swin-Transformer is a hierarchical Transformer whose representation is computed with shifted windows.

Last update: Mar 14, 2022

Overview

Swin-Transformer

Swin-Transformer is a hierarchical Transformer whose representation is computed with shifted windows. For more details, please refer to the paper "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows".
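To make "shifted windows" concrete, here is a minimal pure-Python sketch of the window bookkeeping only. Tokens on an H x W grid are split into non-overlapping M x M windows. In alternating blocks the grid is first cyclically rolled by M // 2, so the new windows straddle the old window borders. The helper names `cyclic_shift` and `window_partition` are illustrative and are not functions from this repo. The sketch leaves out the attention computation itself and the attention mask the paper applies to tokens that wrapped around the grid edge.

```python
from typing import List

Grid = List[List[int]]


def cyclic_shift(grid: Grid, shift: int) -> Grid:
    """Roll the grid up and to the left by `shift` along both axes, wrapping at the edges."""
    h, w = len(grid), len(grid[0])
    return [[grid[(i + shift) % h][(j + shift) % w] for j in range(w)] for i in range(h)]


def window_partition(grid: Grid, m: int) -> List[List[int]]:
    """Split an H x W grid into non-overlapping m x m windows, each flattened row by row."""
    h, w = len(grid), len(grid[0])
    if h % m or w % m:
        raise ValueError("H and W must be divisible by the window size m")
    return [
        [grid[i][j] for i in range(top, top + m) for j in range(left, left + m)]
        for top in range(0, h, m)
        for left in range(0, w, m)
    ]


# A 4 x 4 grid of token ids, window size 2, shift 2 // 2 = 1.
tokens = [[4 * r + c for c in range(4)] for r in range(4)]
print(window_partition(tokens, 2))                   # regular windows
print(window_partition(cyclic_shift(tokens, 1), 2))  # shifted windows
```

On this 4 x 4 grid the first shifted window holds tokens 5, 6, 9 and 10, one from each of the four regular windows. This is how information flows across window borders between consecutive blocks.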

This repo is a MegEngine implementation of Swin-Transformer. It also showcases training on GPUs with less memory by leveraging MegEngine's DTR (Dynamic Tensor Rematerialization) technique.

There is also an official PyTorch implementation.

Usage

Install

Clone this repo:

git clone https://github.com/MegEngine/swin-transformer.git
cd swin-transformer

Install MegEngine 1.6.0:

pip3 install megengine==1.6.0 -f https://megengine.org.cn/whl/mge.html

Training

To train a Swin Transformer using random data, run:

python3 train_random.py -n <num-of-gpus-to-use> -b <batch-size-per-gpu> -s <num-of-train-steps>

To train a Swin Transformer using AMP (Automatic Mixed Precision), run:

python3 train_random.py -n <num-of-gpus-to-use> -b <batch-size-per-gpu> -s <num-of-train-steps> --mode mp

To train a Swin Transformer using DTR in dynamic graph mode, run:

python3 train_random.py -n <num-of-gpus-to-use> -b <batch-size-per-gpu> -s <num-of-train-steps> --dtr [--dtr-thd <eviction-threshold-of-dtr>]
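The `--dtr-thd` option sets DTR's eviction threshold: roughly, once tensor memory exceeds it, some tensors are evicted and recomputed later if they are needed again. The toy model below illustrates only the eviction choice. It uses a simplified, local version of the cost / (memory x staleness) score described in the Dynamic Tensor Rematerialization paper. The `Tensor` fields, `dtr_score` and the greedy `pick_evictions` policy are hypothetical and for illustration only. They do not describe MegEngine's actual implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Tensor:
    name: str
    size_bytes: int      # m(t): memory footprint
    compute_cost: float  # c(t): cost to recompute it from its inputs
    last_access: int     # time step of the most recent use


def dtr_score(t: Tensor, now: int) -> float:
    """Lower score = better eviction candidate (cheap, large, long unused)."""
    staleness = max(now - t.last_access, 1)  # s(t), clamped to avoid division by zero
    return t.compute_cost / (t.size_bytes * staleness)


def pick_evictions(resident: List[Tensor], threshold: int, now: int) -> List[str]:
    """Greedily evict the lowest-score tensors until resident memory fits under threshold."""
    used = sum(t.size_bytes for t in resident)
    evicted = []
    for t in sorted(resident, key=lambda t: dtr_score(t, now)):
        if used <= threshold:
            break
        evicted.append(t.name)
        used -= t.size_bytes
    return evicted


resident = [
    Tensor("a", size_bytes=4, compute_cost=1.0, last_access=9),
    Tensor("b", size_bytes=4, compute_cost=10.0, last_access=9),
    Tensor("c", size_bytes=2, compute_cost=1.0, last_access=0),
]
print(pick_evictions(resident, threshold=5, now=10))  # ['c', 'a']
```

The long-unused tensor `c` goes first. The cheap tensor `a` follows. The expensive, recently used tensor `b` stays resident. A lower threshold trades more recomputation (slower steps) for a larger batch that fits in memory.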

To train a Swin Transformer using DTR in static graph mode, run:

python3 train_random.py -n <num-of-gpus-to-use> -b <batch-size-per-gpu> -s <num-of-train-steps> --trace --symbolic --dtr --dtr-thd <eviction-threshold-of-dtr>

For example, to train a Swin Transformer on a single GPU using DTR in static graph mode with an eviction threshold of 8 GB and AMP, run:

python3 train_random.py -n 1 -b 340 -s 10 --trace --symbolic --dtr --dtr-thd 8 --mode mp

For more options, run:

python3 train_random.py -h

Benchmark

Testing Devices
- 2080Ti @ cuda-10.1-cudnn-v7.6.3-TensorRT-5.1.5.0 @ Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz
- All CUDA memory is reserved up front by setting MGB_CUDA_RESERVE_MEMORY=1 to alleviate memory fragmentation

Settings	Maximum Batch Size	Speed (s/step)	Throughput (images/s)
None	68	0.490	139
AMP	100	0.494	202
DTR in static graph mode	300	2.592	116
DTR in static graph mode + AMP	340	1.944	175
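The throughput column is consistent with dividing the maximum batch size by the step time. Here is a quick check in Python, using only the numbers from the table. The helper `throughput` is just for illustration.

```python
# (setting, maximum batch size, speed in s/step), copied from the benchmark table
ROWS = [
    ("None", 68, 0.490),
    ("AMP", 100, 0.494),
    ("DTR in static graph mode", 300, 2.592),
    ("DTR in static graph mode + AMP", 340, 1.944),
]


def throughput(batch_size: int, sec_per_step: float) -> int:
    """Images processed per second, rounded to the nearest integer."""
    return round(batch_size / sec_per_step)


for setting, batch, speed in ROWS:
    print(f"{setting}: {throughput(batch, speed)} images/s")
```

By the same arithmetic, DTR in static graph mode + AMP fits 5x the baseline batch (340 vs. 68). It also delivers higher throughput than the baseline (175 vs. 139 images/s), even though each step takes about four times as long.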

Acknowledgement

We are inspired by the Swin-Transformer repository; many thanks to Microsoft!


Owner

旷视天元 MegEngine
