Unofficial PyTorch reimplementation of the paper Swin Transformer V2: Scaling Up Capacity and Resolution

Overview

Swin Transformer V2: Scaling Up Capacity and Resolution

Unofficial PyTorch reimplementation of the paper "Swin Transformer V2: Scaling Up Capacity and Resolution" by Ze Liu, Han Hu et al. (Microsoft Research Asia).

This repository includes a pure PyTorch implementation of the Swin Transformer V2.

The official Swin Transformer V1 implementation is available here. As of 10.01.2022, no official implementation of the Swin Transformer V2 is publicly available.

Installation

You can simply install the Swin Transformer V2 implementation as a Python package by using pip.

pip install git+https://github.com/ChristophReich1996/Swin-Transformer-V2

Alternatively, you can clone the repository and use the implementation in swin_transformer_v2 directly in your project.

Usage

This implementation provides the configurations reported in the paper (SwinV2-T, SwinV2-S, etc.). You can build a model by calling the corresponding function. Please note that the SwinTransformerV2 class does not return class logits; it returns the feature maps of each stage of the network (List[torch.Tensor]). If you want to use this implementation for image classification, wrap the model and take the final feature map.

from swin_transformer_v2 import SwinTransformerV2

from swin_transformer_v2 import swin_transformer_v2_t, swin_transformer_v2_s, swin_transformer_v2_b, \
    swin_transformer_v2_l, swin_transformer_v2_h, swin_transformer_v2_g

# SwinV2-T
swin_transformer: SwinTransformerV2 = swin_transformer_v2_t(in_channels=3,
                                                            window_size=8,
                                                            input_resolution=(256, 256),
                                                            sequential_self_attention=False,
                                                            use_checkpoint=False)
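Since the model returns a list of feature maps rather than logits, a classification wrapper could look roughly like the following. This is a minimal sketch and not part of the package: `ClassificationWrapper`, `DummyBackbone`, `feature_channels`, and `number_of_classes` are hypothetical names, and the sketch assumes the final feature map has the shape [batch, channels, height, width]. The dummy backbone only stands in for a `SwinTransformerV2` instance so that the example runs on its own.

```python
from typing import List

import torch
import torch.nn as nn


class ClassificationWrapper(nn.Module):
    """Hypothetical wrapper: average-pools the last feature map and applies a linear head."""

    def __init__(self, backbone: nn.Module, feature_channels: int, number_of_classes: int) -> None:
        super().__init__()
        self.backbone = backbone
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.head = nn.Linear(feature_channels, number_of_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The backbone is expected to return one feature map per stage
        features: List[torch.Tensor] = self.backbone(x)
        last = features[-1]  # assumed shape: [batch, channels, height, width]
        pooled = self.pool(last).flatten(start_dim=1)  # [batch, channels]
        return self.head(pooled)  # [batch, number_of_classes]


class DummyBackbone(nn.Module):
    """Stand-in for SwinTransformerV2; returns a list of two feature maps."""

    def __init__(self) -> None:
        super().__init__()
        self.stage_1 = nn.Conv2d(3, 8, kernel_size=4, stride=4)
        self.stage_2 = nn.Conv2d(8, 16, kernel_size=2, stride=2)

    def forward(self, x: torch.Tensor) -> List[torch.Tensor]:
        f1 = self.stage_1(x)
        f2 = self.stage_2(f1)
        return [f1, f2]


model = ClassificationWrapper(DummyBackbone(), feature_channels=16, number_of_classes=10)
logits = model(torch.rand(2, 3, 64, 64))
print(logits.shape)
```

To use this with a real backbone, pass the built Swin Transformer V2 model instead of `DummyBackbone()` and set `feature_channels` to the channel count of its last stage, which depends on the chosen configuration.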

If you want to change the resolution and/or the window size for fine-tuning or inference, please use the update_resolution method.

# Change resolution and window size of the model
swin_transformer.update_resolution(new_window_size=16, new_input_resolution=(512, 512))
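When picking a new resolution and window size, it can help to check beforehand that every stage's feature map divides evenly into windows. The helper below is a simplified sketch under stated assumptions, not the package's own logic: it assumes a patch size of 4, four stages, and a halving of the resolution between stages (as in the Swin Transformer papers). `stage_resolutions` and `is_compatible` are hypothetical names.

```python
from typing import List, Tuple


def stage_resolutions(input_resolution: Tuple[int, int],
                      patch_size: int = 4,
                      number_of_stages: int = 4) -> List[Tuple[int, int]]:
    """Feature-map resolution per stage (assumes halving between stages)."""
    height = input_resolution[0] // patch_size
    width = input_resolution[1] // patch_size
    resolutions = []
    for _ in range(number_of_stages):
        resolutions.append((height, width))
        height, width = height // 2, width // 2
    return resolutions


def is_compatible(input_resolution: Tuple[int, int],
                  window_size: int,
                  patch_size: int = 4,
                  number_of_stages: int = 4) -> bool:
    """True if each stage's feature map splits evenly into windows."""
    return all(
        height % window_size == 0 and width % window_size == 0
        for height, width in stage_resolutions(input_resolution, patch_size, number_of_stages)
    )


print(stage_resolutions((512, 512)))   # [(128, 128), (64, 64), (32, 32), (16, 16)]
print(is_compatible((512, 512), 16))   # True
print(is_compatible((256, 256), 16))   # False: the last stage is only 8 x 8
```

Under these assumptions, both pairs used above, (256, 256) with window size 8 and (512, 512) with window size 16, pass the check.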

In case you want to use a custom configuration you can use the SwinTransformerV2 class. The constructor method takes the following parameters.

| Parameter | Description | Type |
| --- | --- | --- |
| in_channels | Number of input channels | int |
| depth | Depth of the stage (number of layers) | int |
| downscale | If true input is downsampled (see Fig. 3 or V1 paper) | bool |
| input_resolution | Input resolution | Tuple[int, int] |
| number_of_heads | Number of attention heads to be utilized | int |
| window_size | Window size to be utilized | int |
| shift_size | Shifting size to be used | int |
| ff_feature_ratio | Ratio of the hidden dimension in the FFN to the input channels | int |
| dropout | Dropout in input mapping | float |
| dropout_attention | Dropout rate of attention map | float |
| dropout_path | Dropout in main path | float |
| use_checkpoint | If true checkpointing is utilized | bool |
| sequential_self_attention | If true sequential self-attention is performed | bool |

This file includes a full example of how to use this implementation.

Disclaimer

This is a very experimental implementation based on the Swin Transformer V2 paper and the official implementation of the Swin Transformer V1. Since an official implementation of the Swin Transformer V2 has not yet been published, it is not possible to say to what extent this implementation differs from the original one. If you run into problems with this implementation, please raise an issue.

Reference

@article{Liu2021,
    title={{Swin Transformer V2: Scaling Up Capacity and Resolution}},
    author={Liu, Ze and Hu, Han and Lin, Yutong and Yao, Zhuliang and Xie, Zhenda and Wei, Yixuan and Ning, Jia and Cao, 
            Yue and Zhang, Zheng and Dong, Li and others},
    journal={arXiv preprint arXiv:2111.09883},
    year={2021}
}
Owner
Christoph Reich
Autonomous systems and electrical engineering student @ Technical University of Darmstadt