Crossover Learning for Fast Online Video Instance Segmentation (ICCV 2021)

Last update: Nov 25, 2022

Overview

TL;DR: CrossVIS (Crossover Learning for Fast Online Video Instance Segmentation) proposes a novel crossover learning paradigm that leverages the rich contextual information across video frames, and achieves a strong trade-off between accuracy and speed for video instance segmentation.

Crossover Learning for Fast Online Video Instance Segmentation

by Shusheng Yang*, Yuxin Fang*, Xinggang Wang†, Yu Li, Chen Fang, Ying Shan, Bin Feng, Wenyu Liu.

(*) equal contribution, (†) corresponding author.

ICCV2021 Paper

Main Results on YouTube-VIS 2019 Dataset

We provide both checkpoints and CodaLab server submissions at the links below.

Name	AP	AP@50	AP@75	AR@1	AR@10	download
CrossVIS_R_50_1x	35.5	55.1	39.0	35.4	42.2	baidu (keycode: `a0j0`) | google
CrossVIS_R_101_1x	36.9	57.8	41.4	36.2	43.9	baidu (keycode: `iwwo`) | google

Getting Started

Installation

First, clone the repository locally:

git clone https://github.com/hustvl/CrossVIS.git

Then create a Python virtual environment with conda:

conda create --name crossvis python=3.7.2
conda activate crossvis

Install torch 1.7.0 and torchvision 0.8.1:

pip install torch==1.7.0 torchvision==0.8.1
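
As a quick sanity check after this step, the installed versions can be compared against the pinned ones. The following is only a sketch: the helper names `version_matches` and `check_environment` are hypothetical and not part of the repository, and the comparison assumes that a local build suffix such as `+cu110` in `torch.__version__` should be ignored.

```python
import importlib

# Versions pinned by the installation step above.
PINNED = {"torch": "1.7.0", "torchvision": "0.8.1"}


def version_matches(installed, expected):
    """Compare version strings, ignoring a local build suffix such as '+cu110'."""
    return installed.split("+")[0] == expected


def check_environment():
    """Return {package: (installed_version_or_None, matches_pin)} for each pinned package."""
    report = {}
    for name, expected in PINNED.items():
        try:
            installed = importlib.import_module(name).__version__
        except ImportError:
            # Package is not installed in the active environment.
            installed = None
        ok = installed is not None and version_matches(installed, expected)
        report[name] = (installed, ok)
    return report
```

Running `print(check_environment())` inside the activated `crossvis` environment shows, for each package, the detected version and whether it matches the pin.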

Install detectron2 by following its installation instructions. If you run into any detectron2-related issues, install detectron2 at commit id 9eb4831.

Then install AdelaiDet by:

cd CrossVIS
python setup.py develop

Preparation

Download the YouTube-VIS 2019 dataset from here. The overall directory structure should be:

CrossVIS
├── datasets
│   ├── youtubevis
│   │   ├── train
│   │   │   ├── 003234408d
│   │   │   ├── ...
│   │   ├── val
│   │   │   ├── ...
│   │   ├── annotations
│   │   │   ├── train.json
│   │   │   ├── valid.json
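
Before training, it can help to confirm that the dataset was unpacked into the layout shown above. The following is a minimal sketch, not part of the repository: `missing_dataset_paths` is a hypothetical helper, and it only checks that the listed directories and annotation files exist, not their contents.

```python
from pathlib import Path

# Entries taken from the directory tree above, relative to the dataset root.
EXPECTED_PATHS = [
    "train",
    "val",
    "annotations/train.json",
    "annotations/valid.json",
]


def missing_dataset_paths(root):
    """Return the expected entries that do not exist under `root`."""
    root = Path(root)
    return [p for p in EXPECTED_PATHS if not (root / p).exists()]


if __name__ == "__main__":
    # Run from the CrossVIS repository root.
    missing = missing_dataset_paths("datasets/youtubevis")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Dataset layout looks complete.")
```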

Download the CondInst 1x pretrained model from here.

Training

Train CrossVIS R-50 on a single GPU:

python tools/train_net.py --config configs/CrossVIS/R_50_1x.yaml MODEL.WEIGHTS $PATH_TO_CondInst_MS_R_50_1x

Train CrossVIS R-50 on multiple GPUs:

python tools/train_net.py --config configs/CrossVIS/R_50_1x.yaml --num-gpus $NUM_GPUS MODEL.WEIGHTS $PATH_TO_CondInst_MS_R_50_1x

Inference

python tools/test_vis.py --config-file configs/CrossVIS/R_50_1x.yaml --json-file datasets/youtubevis/annotations/valid.json --opts MODEL.WEIGHTS $PATH_TO_CHECKPOINT

The final results are stored in results.json. Compress this file into a zip archive and upload it to the CodaLab server to get the performance on the validation set.
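
The compression step can also be done from Python with the standard `zipfile` module. This is a sketch under assumptions: `zip_results` is a hypothetical helper, the archive name `results.zip` is arbitrary, and placing `results.json` at the root of the archive is an assumption about what the server expects.

```python
import zipfile
from pathlib import Path


def zip_results(json_path="results.json", zip_path="results.zip"):
    """Compress the results file into a zip archive and return the archive path."""
    json_path = Path(json_path)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # Store the file at the archive root under its bare file name
        # (assumed layout; adjust if the server expects something else).
        zf.write(json_path, arcname=json_path.name)
    return Path(zip_path)
```

Calling `zip_results()` from the directory that contains results.json writes results.zip next to it.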

Acknowledgement ❤️

This code is mainly based on detectron2 and AdelaiDet. Thanks for their awesome work and great contributions to the computer vision community!

Citation

If you find our paper and code useful in your research, please consider giving a star ⭐ and citation 📝 :

@InProceedings{Yang_2021_ICCV,
    author    = {Yang, Shusheng and Fang, Yuxin and Wang, Xinggang and Li, Yu and Fang, Chen and Shan, Ying and Feng, Bin and Liu, Wenyu},
    title     = {Crossover Learning for Fast Online Video Instance Segmentation},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {8043-8052}
}

Owner

Hust Visual Learning Team