Implementation of "Efficient Regional Memory Network for Video Object Segmentation" (Xie et al., CVPR 2021).

Last update: Dec 14, 2022

Related tags

Overview

RMNet

This repository contains the source code for the paper Efficient Regional Memory Network for Video Object Segmentation.

Cite this work

@inproceedings{xie2021efficient,
  title={Efficient Regional Memory Network for Video Object Segmentation},
  author={Xie, Haozhe and 
          Yao, Hongxun and 
          Zhou, Shangchen and 
          Zhang, Shengping and 
          Sun, Wenxiu},
  booktitle={CVPR},
  year={2021}
}

Datasets

We use the ECSSD, COCO, PASCAL VOC, MSRA10K, DAVIS, and YouTube-VOS datasets in our experiments, which are available below:

Pretrained Models

The pretrained models for DAVIS and YouTube-VOS are available as follows:

RMNet for DAVIS (202 MB)
RMNet for YouTube-VOS (202 MB)

Prerequisites

Clone the Code Repository

git clone https://github.com/hzxie/RMNet.git

Install Python Denpendencies

cd RMNet
pip install -r requirements.txt

Build PyTorch Extensions

NOTE: PyTorch >= 1.4, CUDA >= 9.0 and GCC >= 4.9 are required.

RMNET_HOME=`pwd`

cd $RMNET_HOME/extensions/reg_att_map_generator
python setup.py install --user

cd $RMNET_HOME/extensions/flow_affine_transformation
python setup.py install --user

Precompute the Optical Flow

For the DAVIS dataset, the optical flows are computed by FlowNet2-CSS with the model pretrained on FlyingThings3D.
For the YouTube-VOS dataset, the optical flows are computed by RAFT with the model pretrained on Sintel.

Update Settings in `config.py`

You need to update the file path of the datasets:

__C.DATASETS                                     = edict()
__C.DATASETS.DAVIS                               = edict()
__C.DATASETS.DAVIS.INDEXING_FILE_PATH            = './datasets/DAVIS.json'
__C.DATASETS.DAVIS.IMG_FILE_PATH                 = '/path/to/Datasets/DAVIS/JPEGImages/480p/%s/%05d.jpg'
__C.DATASETS.DAVIS.ANNOTATION_FILE_PATH          = '/path/to/Datasets/DAVIS/Annotations/480p/%s/%05d.png'
__C.DATASETS.DAVIS.OPTICAL_FLOW_FILE_PATH        = '/path/to/Datasets/DAVIS/OpticalFlows/480p/%s/%05d.flo'
__C.DATASETS.YOUTUBE_VOS                         = edict()
__C.DATASETS.YOUTUBE_VOS.INDEXING_FILE_PATH      = '/path/to/Datasets/YouTubeVOS/%s/meta.json'
__C.DATASETS.YOUTUBE_VOS.IMG_FILE_PATH           = '/path/to/Datasets/YouTubeVOS/%s/JPEGImages/%s/%s.jpg'
__C.DATASETS.YOUTUBE_VOS.ANNOTATION_FILE_PATH    = '/path/to/Datasets/YouTubeVOS/%s/Annotations/%s/%s.png'
__C.DATASETS.YOUTUBE_VOS.OPTICAL_FLOW_FILE_PATH  = '/path/to/Datasets/YouTubeVOS/%s/OpticalFlows/%s/%s.flo'
__C.DATASETS.PASCAL_VOC                          = edict()
__C.DATASETS.PASCAL_VOC.INDEXING_FILE_PATH       = '/path/to/Datasets/voc2012/trainval.txt'
__C.DATASETS.PASCAL_VOC.IMG_FILE_PATH            = '/path/to/Datasets/voc2012/images/%s.jpg'
__C.DATASETS.PASCAL_VOC.ANNOTATION_FILE_PATH     = '/path/to/Datasets/voc2012/masks/%s.png'
__C.DATASETS.ECSSD                               = edict()
__C.DATASETS.ECSSD.N_IMAGES                      = 1000
__C.DATASETS.ECSSD.IMG_FILE_PATH                 = '/path/to/Datasets/ecssd/images/%s.jpg'
__C.DATASETS.ECSSD.ANNOTATION_FILE_PATH          = '/path/to/Datasets/ecssd/masks/%s.png'
__C.DATASETS.MSRA10K                             = edict()
__C.DATASETS.MSRA10K.INDEXING_FILE_PATH          = './datasets/msra10k.txt'
__C.DATASETS.MSRA10K.IMG_FILE_PATH               = '/path/to/Datasets/msra10k/images/%s.jpg'
__C.DATASETS.MSRA10K.ANNOTATION_FILE_PATH        = '/path/to/Datasets/msra10k/masks/%s.png'
__C.DATASETS.MSCOCO                              = edict()
__C.DATASETS.MSCOCO.INDEXING_FILE_PATH           = './datasets/mscoco.txt'
__C.DATASETS.MSCOCO.IMG_FILE_PATH                = '/path/to/Datasets/coco2017/images/train2017/%s.jpg'
__C.DATASETS.MSCOCO.ANNOTATION_FILE_PATH         = '/path/to/Datasets/coco2017/masks/train2017/%s.png'
__C.DATASETS.ADE20K                              = edict()
__C.DATASETS.ADE20K.INDEXING_FILE_PATH           = './datasets/ade20k.txt'
__C.DATASETS.ADE20K.IMG_FILE_PATH                = '/path/to/Datasets/ADE20K_2016_07_26/images/training/%s.jpg'
__C.DATASETS.ADE20K.ANNOTATION_FILE_PATH         = '/path/to/Datasets/ADE20K_2016_07_26/images/training/%s_seg.png'

# Dataset Options: DAVIS, DAVIS_FRAMES, YOUTUBE_VOS, ECSSD, MSCOCO, PASCAL_VOC, MSRA10K, ADE20K
__C.DATASET.TRAIN_DATASET                        = ['ECSSD', 'PASCAL_VOC', 'MSRA10K', 'MSCOCO']  # Pretrain
__C.DATASET.TRAIN_DATASET                        = ['YOUTUBE_VOS', 'DAVISx5']                    # Fine-tune
__C.DATASET.TEST_DATASET                         = 'DAVIS'

# Network Options: RMNet, TinyFlowNet
__C.TRAIN.NETWORK                                = 'RMNet'

Get Started

To train RMNet, you can simply use the following command:

python3 runner.py

To test RMNet, you can use the following command:

python3 runner.py --test --weights=/path/to/pretrained/model.pth

License

This project is open sourced under MIT license.

Implementation of "Efficient Regional Memory Network for Video Object Segmentation" (Xie et al., CVPR 2021).

Related tags

Overview

RMNet

Cite this work

Datasets

Pretrained Models

Prerequisites

Clone the Code Repository

Install Python Denpendencies

Build PyTorch Extensions

Precompute the Optical Flow

Update Settings in `config.py`

Get Started

License

Owner

Haozhe Xie

Implementation of Invariant Point Attention, used for coordinate refinement in the structure module of Alphafold2, as a standalone Pytorch module

The repository forked from NVlabs uses our data. (Differentiable rasterization applied to 3D model simplification tasks)

PyTorch implementation for "Sharpness-aware Quantization for Deep Neural Networks".

Minimal deep learning library written from scratch in Python, using NumPy/CuPy.

[2021][ICCV][FSNet] Full-Duplex Strategy for Video Object Segmentation

Api for getting bin info and getting encrypted card details for adyen.

Visyerres sgdf woob - Modules Woob pour l'intranet et autres sites Scouts et Guides de France

Code release for NeX: Real-time View Synthesis with Neural Basis Expansion

Public repo for the ICCV2021-CVAMD paper "Is it Time to Replace CNNs with Transformers for Medical Images?"

[ICCV2021] Official Pytorch implementation for SDGZSL (Semantics Disentangling for Generalized Zero-Shot Learning)

K-PLUG: Knowledge-injected Pre-trained Language Model for Natural Language Understanding and Generation in E-Commerce (EMNLP Founding 2021)

SCALE: Modeling Clothed Humans with a Surface Codec of Articulated Local Elements (CVPR 2021)

CrossNorm and SelfNorm for Generalization under Distribution Shifts (ICCV 2021)

Diverse Branch Block: Building a Convolution as an Inception-like Unit

Mememoji - A facial expression classification system that recognizes 6 basic emotions: happy, sad, surprise, fear, anger and neutral.

This is an official implementation of the CVPR2022 paper "Blind2Unblind: Self-Supervised Image Denoising with Visible Blind Spots".

Implementation of CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification

Happywhale - Whale and Dolphin Identification Silver🥈 Solution (26/1588)

PyJokes - Joking around with Python library pyjokes

This solves the autonomous driving issue which is supported by deep learning technology. Given a video, it splits into images and predicts the angle of turning for each frame.

Implementation of "Efficient Regional Memory Network for Video Object Segmentation" (Xie et al., CVPR 2021).

Related tags

Overview

RMNet

Cite this work

Datasets

Pretrained Models

Prerequisites

Clone the Code Repository

Install Python Denpendencies

Build PyTorch Extensions

Precompute the Optical Flow

Update Settings in config.py

Get Started

License

Owner

Haozhe Xie

Implementation of Invariant Point Attention, used for coordinate refinement in the structure module of Alphafold2, as a standalone Pytorch module

The repository forked from NVlabs uses our data. (Differentiable rasterization applied to 3D model simplification tasks)

PyTorch implementation for "Sharpness-aware Quantization for Deep Neural Networks".

Minimal deep learning library written from scratch in Python, using NumPy/CuPy.

[2021][ICCV][FSNet] Full-Duplex Strategy for Video Object Segmentation

Api for getting bin info and getting encrypted card details for adyen.

Visyerres sgdf woob - Modules Woob pour l'intranet et autres sites Scouts et Guides de France

Code release for NeX: Real-time View Synthesis with Neural Basis Expansion

Public repo for the ICCV2021-CVAMD paper "Is it Time to Replace CNNs with Transformers for Medical Images?"

[ICCV2021] Official Pytorch implementation for SDGZSL (Semantics Disentangling for Generalized Zero-Shot Learning)

K-PLUG: Knowledge-injected Pre-trained Language Model for Natural Language Understanding and Generation in E-Commerce (EMNLP Founding 2021)

SCALE: Modeling Clothed Humans with a Surface Codec of Articulated Local Elements (CVPR 2021)

CrossNorm and SelfNorm for Generalization under Distribution Shifts (ICCV 2021)

Diverse Branch Block: Building a Convolution as an Inception-like Unit

Mememoji - A facial expression classification system that recognizes 6 basic emotions: happy, sad, surprise, fear, anger and neutral.

This is an official implementation of the CVPR2022 paper "Blind2Unblind: Self-Supervised Image Denoising with Visible Blind Spots".

Implementation of CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification

Happywhale - Whale and Dolphin Identification Silver🥈 Solution (26/1588)

PyJokes - Joking around with Python library pyjokes

This solves the autonomous driving issue which is supported by deep learning technology. Given a video, it splits into images and predicts the angle of turning for each frame.

Update Settings in `config.py`