[CVPR 2021] Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers

Last update: Jan 05, 2023

Overview

SEgmentation TRansformers -- SETR

Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers,
Sixiao Zheng, Jiachen Lu, Hengshuang Zhao, Xiatian Zhu, Zekun Luo, Yabiao Wang, Yanwei Fu, Jianfeng Feng, Tao Xiang, Philip HS Torr, Li Zhang,
CVPR 2021

Installation

This project is built on mmsegmentation. Follow the official mmsegmentation INSTALL.md and getting_started.md for installation and dataset preparation.

Main results

Cityscapes

Method	Crop size	Batch size	Iterations	Set	mIoU	Links
SETR-Naive	768x768	8	40k	val	77.37	model config
SETR-Naive	768x768	8	80k	val	77.90	model config
SETR-MLA	768x768	8	40k	val	76.65	model config
SETR-MLA	768x768	8	80k	val	77.24	model config
SETR-PUP	768x768	8	40k	val	78.39	model config
SETR-PUP	768x768	8	80k	val	79.34	model config
SETR-Naive-DeiT	768x768	8	40k	val	77.85	model config
SETR-Naive-DeiT	768x768	8	80k	val	78.66	model config
SETR-MLA-DeiT	768x768	8	40k	val	78.04	model config
SETR-MLA-DeiT	768x768	8	80k	val	78.98	model config
SETR-PUP-DeiT	768x768	8	40k	val	78.79	model config
SETR-PUP-DeiT	768x768	8	80k	val	79.45	model config

ADE20K

Method	Crop size	Batch size	Iterations	Set	mIoU	mIoU (ms+flip)	Links
SETR-Naive	512x512	16	160k	val	48.06	48.80	model config
SETR-MLA	512x512	8	160k	val	48.27	50.03	model config
SETR-MLA	512x512	16	160k	val	48.64	50.28	model config
SETR-PUP	512x512	16	160k	val	48.58	50.09	model config

Pascal Context

Method	Crop size	Batch size	Iterations	Set	mIoU	mIoU (ms+flip)	Links
SETR-Naive	480x480	16	80k	val	52.89	53.61	model config
SETR-MLA	480x480	8	80k	val	54.39	55.39	model config
SETR-MLA	480x480	16	80k	val	54.87	55.83	model config
SETR-PUP	480x480	16	80k	val	54.40	55.27	model config
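
For readers unfamiliar with the metric, the mIoU columns above report mean intersection-over-union in percent. The following is a minimal pure-Python sketch of how such a score is commonly computed from a confusion matrix. It is not the evaluation code used by this repository; the function names are hypothetical, and the ignore label of 255 is an assumption based on common segmentation practice.

```python
from typing import List, Sequence


def confusion_matrix(gt: Sequence[int], pred: Sequence[int], num_classes: int,
                     ignore_index: int = 255) -> List[List[int]]:
    """Count (ground-truth class, predicted class) pairs over flattened label maps."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt, pred):
        if g == ignore_index:  # assumed convention: skip unlabeled pixels
            continue
        matrix[g][p] += 1
    return matrix


def mean_iou(matrix: List[List[int]]) -> float:
    """Average per-class IoU = TP / (TP + FP + FN), in percent.

    Classes that appear in neither the ground truth nor the prediction
    are left out of the average.
    """
    n = len(matrix)
    ious = []
    for c in range(n):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp                        # class c pixels predicted as another class
        fp = sum(matrix[r][c] for r in range(n)) - tp   # other pixels predicted as class c
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return 100.0 * sum(ious) / len(ious) if ious else 0.0


# Toy example: six pixels, three classes, the last pixel is unlabeled.
gt = [0, 0, 1, 1, 2, 255]
pred = [0, 1, 1, 1, 2, 0]
print(round(mean_iou(confusion_matrix(gt, pred, num_classes=3)), 2))  # 72.22
```

In the toy example the per-class IoUs are 1/2, 2/3, and 1, which average to 72.22.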

Get Started

Train

./tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM} 
# For example, train SETR-PUP on the Cityscapes dataset with 8 GPUs
./tools/dist_train.sh configs/SETR/SETR_PUP_768x768_40k_cityscapes_bs_8.py 8

Single-scale testing

./tools/dist_test.sh ${CONFIG_FILE} ${CHECKPOINT_FILE} ${GPU_NUM} [--eval ${EVAL_METRICS}]
# For example, test SETR-PUP on the Cityscapes dataset with 8 GPUs
./tools/dist_test.sh configs/SETR/SETR_PUP_768x768_40k_cityscapes_bs_8.py \
work_dirs/SETR_PUP_768x768_40k_cityscapes_bs_8/iter_40000.pth \
8 --eval mIoU
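
In the example above, the checkpoint path follows the pattern work_dirs/<config name>/iter_<iterations>.pth. The sketch below derives that path from a config file, which can be handy when scripting several evaluations. The helper name `default_checkpoint` and the `work_root` parameter are hypothetical, and the pattern only holds if training wrote to the default work directory.

```python
from pathlib import PurePosixPath


def default_checkpoint(config_file: str, iteration: int,
                       work_root: str = "work_dirs") -> str:
    """Guess the checkpoint path from the config name (assumes the default work dir)."""
    stem = PurePosixPath(config_file).stem  # config file name without ".py"
    return str(PurePosixPath(work_root) / stem / f"iter_{iteration}.pth")


print(default_checkpoint(
    "configs/SETR/SETR_PUP_768x768_40k_cityscapes_bs_8.py", 40000))
# work_dirs/SETR_PUP_768x768_40k_cityscapes_bs_8/iter_40000.pth
```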

Multi-scale testing

Use the config file ending in _MS.py in configs/SETR.

./tools/dist_test.sh ${CONFIG_FILE} ${CHECKPOINT_FILE} ${GPU_NUM} [--eval ${EVAL_METRICS}]
# For example, test SETR-PUP on the Cityscapes dataset with 8 GPUs using multi-scale inference
./tools/dist_test.sh configs/SETR/SETR_PUP_768x768_40k_cityscapes_bs_8_MS.py \
work_dirs/SETR_PUP_768x768_40k_cityscapes_bs_8/iter_40000.pth \
8 --eval mIoU
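
As the example shows, multi-scale testing swaps in the _MS.py config but reuses the checkpoint from the single-scale training run. A small sketch of assembling that command from a single-scale config follows. The function name `multi_scale_test_command` is hypothetical, and it assumes a matching _MS.py file exists in configs/SETR for the chosen config.

```python
from pathlib import PurePosixPath
from typing import List


def multi_scale_test_command(config_file: str, checkpoint_file: str,
                             gpu_num: int, metric: str = "mIoU") -> List[str]:
    """Build the dist_test.sh argument list using the _MS.py variant of a config."""
    config = PurePosixPath(config_file)
    if not config.stem.endswith("_MS"):
        # Assumed naming rule: append "_MS" before the ".py" extension.
        config = config.with_name(f"{config.stem}_MS.py")
    return ["./tools/dist_test.sh", str(config), checkpoint_file,
            str(gpu_num), "--eval", metric]


cmd = multi_scale_test_command(
    "configs/SETR/SETR_PUP_768x768_40k_cityscapes_bs_8.py",
    "work_dirs/SETR_PUP_768x768_40k_cityscapes_bs_8/iter_40000.pth",
    gpu_num=8,
)
print(" ".join(cmd))
```

The list form can be passed directly to `subprocess.run` from the repository root if running the evaluation from Python is preferred.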

See getting_started.md for more basic usage of training and testing.

Reference

@inproceedings{SETR,
    title={Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers}, 
    author={Zheng, Sixiao and Lu, Jiachen and Zhao, Hengshuang and Zhu, Xiatian and Luo, Zekun and Wang, Yabiao and Fu, Yanwei and Feng, Jianfeng and Xiang, Tao and Torr, Philip H.S. and Zhang, Li},
    booktitle={CVPR},
    year={2021}
}

License

MIT

Acknowledgement

Thanks to the following open-source repositories:
mmsegmentation
pytorch-image-models

Owner

Fudan Zhang Vision Group
