The implementation of "Bootstrapping Semantic Segmentation with Regional Contrast".

Last update: Dec 30, 2022

Overview

ReCo - Regional Contrast

This repository contains the source code of ReCo and baselines from the paper, Bootstrapping Semantic Segmentation with Regional Contrast, introduced by Shikun Liu, Shuaifeng Zhi, Edward Johns, and Andrew Davison.

Check out our project page for more qualitative results.

Datasets

ReCo is evaluated with three datasets: CityScapes, PASCAL VOC and SUN RGB-D in the full label mode, among which CityScapes and PASCAL VOC are additionally evaluated in the partial label mode.

For CityScapes, please download the original dataset from the official CityScapes site: leftImg8bit_trainvaltest.zip and gtFine_trainvaltest.zip. Create and extract them to the corresponding dataset/cityscapes folder.
For Pascal VOC, please download the original training images from the official PASCAL site: VOCtrainval_11-May-2012.tar and the augmented labels here: SegmentationClassAug.zip. Extract the folders JPEGImages and SegmentationClassAug into the corresponding dataset/pascal folder.
For SUN RGB-D, please download the train dataset here: SUNRGBD-train_images.tgz, test dataset here: SUNRGBD-test_images.tgz and labels here: sunrgbd_train_test_labels.tar.gz. Extract and place them into the corresponding dataset/sun folder.

After making sure all datasets have been downloaded and placed correctly, run each processing file with python dataset/{DATASET}_preprocess.py to pre-process each dataset for the experiments. The preprocessing files also generate the partial labels for the Cityscapes and Pascal datasets with three random seeds. Feel free to modify the partial label size and random seed to suit your own research setting.
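Before running the preprocessing files, it can help to confirm that the folders described above exist. The following is a minimal sketch, not part of the repository: the helper name `missing_paths` and the `EXPECTED` mapping are hypothetical, and only the folder names stated above are checked (the internal layout of dataset/cityscapes and dataset/sun is not assumed).

```python
from pathlib import Path

# Folder names taken from the instructions above. Sub-folders are only listed
# where the text names them explicitly; everything else is left unchecked.
EXPECTED = {
    "cityscapes": [],
    "pascal": ["JPEGImages", "SegmentationClassAug"],
    "sun": [],
}


def missing_paths(root="dataset"):
    """Return the expected dataset folders that do not exist under `root`."""
    root = Path(root)
    missing = []
    for name, subfolders in EXPECTED.items():
        base = root / name
        for path in [base, *(base / sub for sub in subfolders)]:
            if not path.is_dir():
                missing.append(str(path))
    return missing
```

Calling `missing_paths()` from the repository root returns an empty list when every folder is in place, and otherwise lists the folders that still need to be created or extracted.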

For the lazy ones: just download the off-the-shelf pre-processed datasets here: CityScapes, Pascal VOC and SUN RGB-D.

Training Supervised and Semi-supervised Models

In this paper, we introduce two novel training modes for semi-supervised learning.

Full Labels Partial Dataset: A sparse subset of training images has full ground-truth labels, with the remaining data unlabelled.
Partial Labels Full Dataset: All images have some labels, but covering only a sparse subset of pixels.
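To make the difference between the two modes concrete, here is a small NumPy sketch on toy data. It is only an illustration under stated assumptions, not the repository's preprocessing: the function names, the `IGNORE` value used to mark unlabelled pixels, and the per-class random sampling are all hypothetical choices.

```python
import numpy as np

IGNORE = -1  # hypothetical marker for unlabelled pixels


def select_labelled_images(num_images, num_labels, seed=0):
    """Full Labels Partial Dataset: pick which images keep their full labels."""
    rng = np.random.default_rng(seed)
    return np.sort(rng.choice(num_images, size=num_labels, replace=False))


def sparsify_labels(label, fraction, seed=0):
    """Partial Labels Full Dataset: keep a fraction of pixels per class.

    At least one pixel is kept for every class present in the image, so
    fraction=0.0 corresponds to one labelled pixel per class.
    """
    rng = np.random.default_rng(seed)
    sparse = np.full_like(label, IGNORE)
    for cls in np.unique(label):
        if cls == IGNORE:
            continue
        ys, xs = np.nonzero(label == cls)
        keep = max(1, int(round(fraction * len(ys))))
        idx = rng.choice(len(ys), size=keep, replace=False)
        sparse[ys[idx], xs[idx]] = cls
    return sparse
```

In the first mode the remaining images are treated as unlabelled; in the second mode every image is used, but the loss would only see the pixels that are not marked `IGNORE`.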

The following four scripts train each mode with supervised or semi-supervised methods respectively:

python train_sup.py             # Supervised learning with full labels.
python train_semisup.py         # Semi-supervised learning with full labels.
python train_sup_partial.py     # Supervised learning with partial labels.
python train_semisup_partial.py # Semi-supervised learning with partial labels.

Important Flags

All supervised and semi-supervised methods can be trained with different flags (hyper-parameters) when running each training script. We briefly introduce some important flags for the experiments below.

Flag Name	Usage	Comments
`num_labels`	number of labelled images in the training set; use `0` to train on all labelled images	only available in the full label mode
`partial`	amount of labelled pixels for each class in the training set; choose from `p0`, `p1`, `p5`, `p25` to train with 1 labelled pixel, or 1%, 5%, 25% labelled pixels, respectively	only available in the partial label mode
`num_negatives`	number of negative keys sampled for each class in each mini-batch	only applied when training with ReCo loss
`num_queries`	number of queries sampled for each class in each mini-batch	only applied when training with ReCo loss
`output_dim`	dimensionality for pixel-level representation	only applied when training with ReCo loss
`temp`	temperature used in contrastive learning	only applied when training with ReCo loss
`apply_aug`	data augmentation used by the semi-supervised methods; choose from `cutout`, `cutmix`, `classmix`	only available in the semi-supervised methods; uses our implementations of CutOut, CutMix and ClassMix
`weak_threshold`	weak threshold `delta_w` in active sampling	only applied when training with ReCo loss
`strong_threshold`	strong threshold `delta_s` in active sampling	only applied when training with ReCo loss
`apply_reco`	toggle on or off	apply our proposed ReCo loss
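The flags `temp`, `num_queries` and `num_negatives` all parameterise a contrastive loss. As a rough illustration of how a temperature and a set of negative keys enter such a loss, here is a generic InfoNCE-style sketch in NumPy for a single query. It is not the repository's ReCo loss: the function name, the toy inputs and the value `temp=0.5` are assumptions chosen purely for illustration.

```python
import numpy as np


def contrastive_loss(query, positive, negatives, temp=0.5):
    """Generic InfoNCE-style loss for one query (illustrative only).

    query:     (D,)   representation of the sampled query
    positive:  (D,)   positive key, e.g. a class-level representation
    negatives: (N, D) negative keys sampled from other classes
    temp:      temperature; smaller values sharpen the softmax
    """
    def normalise(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    q = normalise(query)
    pos = normalise(positive)
    neg = normalise(negatives)

    # Cosine similarities scaled by the temperature; the positive comes first.
    logits = np.concatenate([[q @ pos], neg @ q]) / temp
    logits -= logits.max()  # subtract the max for numerical stability
    log_prob_positive = logits[0] - np.log(np.exp(logits).sum())
    return float(-log_prob_positive)
```

In this sketch, D plays the role of `output_dim`, N the role of `num_negatives`, and averaging the loss over several sampled queries would correspond to `num_queries`.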

Training ReCo + ClassMix in the smallest full label setting for each dataset (the least frequent classes in each dataset appear in 5 training images):

python train_semisup.py --dataset pascal --num_labels 60 --apply_aug classmix --apply_reco
python train_semisup.py --dataset cityscapes --num_labels 20 --apply_aug classmix --apply_reco
python train_semisup.py --dataset sun --num_labels 50 --apply_aug classmix --apply_reco

Training ReCo + ClassMix in the smallest partial label setting for each dataset (each class in each training image has only 1 labelled pixel):

python train_semisup_partial.py --dataset pascal --partial p0 --apply_aug classmix --apply_reco
python train_semisup_partial.py --dataset cityscapes --partial p0 --apply_aug classmix --apply_reco
python train_semisup_partial.py --dataset sun --partial p0 --apply_aug classmix --apply_reco

Training ReCo + Supervised with all labelled data:

python train_sup.py --dataset {DATASET} --num_labels 0 --apply_reco

Training with ReCo is expected to require 12-16 GB of GPU memory in a single GPU setting. All the other baselines can be trained with under 12 GB in a single GPU setting.

Visualisation on Pre-trained Models

We additionally provide the pre-trained baselines and our method for Cityscapes with 20 labelled images and Pascal VOC with 60 labelled images, as examples for visualisation. The precise mIoU performance of each model is listed in the following table. The pre-trained models will produce the exact same qualitative results presented in the original paper.

	Supervised	ClassMix	ReCo + ClassMix
CityScapes (20 Labels)	38.10 [link]	45.13 [link]	50.14 [link]
Pascal VOC (60 Labels)	36.06 [link]	53.71 [link]	57.12 [link]
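For readers unfamiliar with the metric in the table, mIoU (mean intersection over union) is commonly computed from a confusion matrix as sketched below. This is the standard definition written in NumPy, not necessarily the repository's evaluation code; the function name and the `ignore_index` value are assumptions.

```python
import numpy as np


def mean_iou(pred, target, num_classes, ignore_index=-1):
    """Mean IoU over the classes that appear in the prediction or the target."""
    valid = target != ignore_index
    pred, target = pred[valid], target[valid]

    # Confusion matrix: rows are ground-truth classes, columns are predictions.
    conf = np.bincount(
        target * num_classes + pred, minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)

    intersection = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - intersection
    present = union > 0  # skip classes absent from both pred and target
    return float((intersection[present] / union[present]).mean())
```

The table reports this value as a percentage, so a return value of 0.5014 would correspond to 50.14.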

Download the pre-trained models with the links above, then create a model_weights folder in this repository and place them in it. Run python visual.py to visualise the results.

Other Notices

We observe that performance in the full label semi-supervised setting on the CityScapes dataset is not stable across different machines: all methods may drop by 2-5%, although their ranking remains the same. Using different GPUs in the same machine does not affect the performance. The performance on the other datasets in the full label mode, and on all datasets in the partial label mode, is consistent.
Please use --seed 0, 1, 2 to accurately reproduce or compare against our results with the exact same labelled and unlabelled splits we used in our experiments.
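One way to run the three seeds back to back is a small launcher like the sketch below. It only uses flags that appear in this README and assumes that `--seed` takes a single integer per run; the helper names `build_commands` and `run_all` are hypothetical.

```python
import subprocess
import sys


def build_commands(script, dataset, seeds=(0, 1, 2), extra_flags=()):
    """Build one command line per seed for the given training script."""
    return [
        [sys.executable, script, "--dataset", dataset, "--seed", str(seed), *extra_flags]
        for seed in seeds
    ]


def run_all(commands):
    """Run the commands one after another, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

For example, `run_all(build_commands("train_semisup.py", "pascal", extra_flags=("--num_labels", "60", "--apply_aug", "classmix", "--apply_reco")))` would launch the Pascal VOC run shown earlier once for each seed.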

Citation

If you found this code/work to be useful in your own research, please consider citing the following:

@article{liu2021reco,
    title={Bootstrapping Semantic Segmentation with Regional Contrast},
    author={Liu, Shikun and Zhi, Shuaifeng and Johns, Edward and Davison, Andrew J},
    journal={arXiv preprint arXiv:2104.04465},
    year={2021}
}

Contact

If you have any questions, please contact [email protected].
