[NeurIPS 2021] ORL: Unsupervised Object-Level Representation Learning from Scene Images

Overview

Unsupervised Object-Level Representation Learning from Scene Images

This repository contains the official PyTorch implementation of the ORL algorithm for self-supervised representation learning.

Unsupervised Object-Level Representation Learning from Scene Images,
Jiahao Xie, Xiaohang Zhan, Ziwei Liu, Yew Soon Ong, Chen Change Loy
In NeurIPS 2021
[Paper][Project Page][Bibtex]

highlights

Updates

  • [12/2021] Code and pre-trained models of ORL are released.

Installation

Please refer to INSTALL.md for installation and dataset preparation.

Models

Pre-trained models can be downloaded from Google Drive. Please see our paper for transfer learning results on different benchmarks.

Usage

Stage 1: Image-level pre-training

You need to pre-train an image-level contrastive learning model in this stage. Take BYOL as an example:

bash tools/dist_train.sh configs/selfsup/orl/coco/stage1/r50_bs512_ep800.py 8

This stage can be freely replaced with other image-level contrastive learning models.

Stage 2: Correspondence discovery

  • KNN image retrieval

First, extract all features in the training set using the pre-trained model weights in Stage 1:

bash tools/dist_train.sh configs/selfsup/orl/coco/stage1/r50_bs512_ep800_extract_feature.py 8 --resume_from work_dirs/selfsup/orl/coco/stage1/r50_bs512_ep800/epoch_800.pth

Second, retrieve KNN for each image using tools/coco_knn_image_retrieval.ipynb. The corresponding KNN image ids will be saved as a json file train2017_knn_instance.json under data/coco/meta/.

  • RoI generation

Apply selective search to generate region proposals for all images in the training set:

bash tools/dist_selective_search_single_gpu.sh configs/selfsup/orl/coco/stage2/selective_search_train2017.py data/coco/meta/train2017_selective_search_proposal.json

The script and config only support single-image single-gpu inference since different images can have different number of generated region proposals by selective search, which cannot be gathered if distributed in multiple gpus. You can also directly download here under data/coco/meta/ if you want to skip this step.

  • RoI pair retrieval

Retrieve top-ranked RoI pairs:

bash tools/dist_generate_correspondence_single_gpu.sh configs/selfsup/orl/coco/stage2/r50_bs512_ep800_generate_all_correspondence.py work_dirs/selfsup/orl/coco/stage1/r50_bs512_ep800/epoch_800.pth data/coco/meta/train2017_knn_instance.json data/coco/meta/train2017_knn_instance_correspondence.json

The script and config also only support single-image single-gpu inference since different image pairs can have different number of generated inter-RoI pairs, which cannot be gathered if distributed in multiple gpus. A workaround to speed up the retrieval process is to split the whole dataset into several parts and process each part on each gpu in parallel. We provide an example of these configs (10 parts in total) in configs/selfsup/orl/coco/stage2/r50_bs512_ep800_generate_partial_correspondence/. After generating each part, you can use tools/merge_partial_correspondence_files.py to merge them together and save the final correspondence json file train2017_knn_instance_correspondence.json under data/coco/meta/.

Stage 3: Object-level pre-training

After obtaining the correspondence file in Stage 2, you can then perform object-level pre-training:

bash tools/dist_train.sh configs/selfsup/orl/coco/stage3/r50_bs512_ep800.py 8

Transferring to downstream tasks

Please refer to GETTING_STARTED.md for transferring to various downstream tasks.

Acknowledgement

We would like to thank the OpenSelfSup for its open-source project and PyContrast for its detection evaluation configs.

Citation

Please consider citing our paper in your publications if the project helps your research. BibTeX reference is as follows:

@inproceedings{xie2021unsupervised,
  title={Unsupervised Object-Level Representation Learning from Scene Images},
  author={Xie, Jiahao and Zhan, Xiaohang and Liu, Ziwei and Ong, Yew Soon and Loy, Chen Change},
  booktitle={NeurIPS},
  year={2021}
}
Owner
Jiahao Xie
Jiahao Xie
Official Implementation of Neural Splines

Neural Splines: Fitting 3D Surfaces with Inifinitely-Wide Neural Networks This repository contains the official implementation of the CVPR 2021 (Oral)

Francis Williams 56 Nov 29, 2022
GBIM(Gesture-Based Interaction map)

手势交互地图 GBIM(Gesture-Based Interaction map),基于视觉深度神经网络的交互地图,通过电脑摄像头观察使用者的手势变化,进而控制地图进行简单的交互。网络使用PaddleX提供的轻量级模型PPYOLO Tiny以及MobileNet V3 small,使得整个模型大小约10MB左右,即使在CPU下也能快速定位和识别手势。

8 Feb 10, 2022
Neural-fractal - Create Fractals Using Complex-Valued Neural Networks!

Neural Fractal Create Fractals Using Complex-Valued Neural Networks! Home Page Features Define Dynamical Systems Using Complex-Valued Neural Networks

Amirabbas Asadi 10 Dec 17, 2022
113 Nov 28, 2022
STRIVE: Scene Text Replacement In Videos

STRIVE: Scene Text Replacement In Videos Dataset Types: RoboText SynthText RealWorld videos RoboText : Videos of texts collected using navigation robo

15 Jul 11, 2022
Sound Event Detection with FilterAugment

Sound Event Detection with FilterAugment Official implementation of Heavily Augmented Sound Event Detection utilizing Weak Predictions (DCASE2021 Chal

43 Aug 28, 2022
PyTorch implementation of residual gated graph ConvNets, ICLR’18

Residual Gated Graph ConvNets April 24, 2018 Xavier Bresson http://www.ntu.edu.sg/home/xbresson https://github.com/xbresson https://twitter.com/xbress

Xavier Bresson 112 Aug 10, 2022
DataCLUE: 国内首个以数据为中心的AI测评(含模型分析报告)

DataCLUE: A Benchmark Suite for Data-centric NLP You can get the english version of README. 以数据为中心的AI测评(DataCLUE) 内容导引 章节 描述 简介 介绍以数据为中心的AI测评(DataCLUE

CLUE benchmark 135 Dec 22, 2022
ZSL-KG is a general-purpose zero-shot learning framework with a novel transformer graph convolutional network (TrGCN) to learn class representation from common sense knowledge graphs.

ZSL-KG is a general-purpose zero-shot learning framework with a novel transformer graph convolutional network (TrGCN) to learn class representa

Bats Research 94 Nov 21, 2022
RL-GAN: Transfer Learning for Related Reinforcement Learning Tasks via Image-to-Image Translation

RL-GAN: Transfer Learning for Related Reinforcement Learning Tasks via Image-to-Image Translation RL-GAN is an official implementation of the paper: T

42 Nov 10, 2022
Semi-Supervised 3D Hand-Object Poses Estimation with Interactions in Time

Semi Hand-Object Semi-Supervised 3D Hand-Object Poses Estimation with Interactions in Time (CVPR 2021).

96 Dec 27, 2022
This package implements the algorithms introduced in Smucler, Sapienza, and Rotnitzky (2020) to compute optimal adjustment sets in causal graphical models.

optimaladj: A library for computing optimal adjustment sets in causal graphical models This package implements the algorithms introduced in Smucler, S

Facundo Sapienza 6 Aug 04, 2022
Measures input lag without dedicated hardware, performing motion detection on recorded or live video

What is InputLagTimer? This tool can measure input lag by analyzing a video where both the game controller and the game screen can be seen on a webcam

Bruno Gonzalez 4 Aug 18, 2022
Multi-label Co-regularization for Semi-supervised Facial Action Unit Recognition (NeurIPS 2019)

MLCR This is the source code for paper Multi-label Co-regularization for Semi-supervised Facial Action Unit Recognition. Xuesong Niu, Hu Han, Shiguang

Edson-Niu 60 Nov 29, 2022
Code for our method RePRI for Few-Shot Segmentation. Paper at http://arxiv.org/abs/2012.06166

Region Proportion Regularized Inference (RePRI) for Few-Shot Segmentation In this repo, we provide the code for our paper : "Few-Shot Segmentation Wit

Malik Boudiaf 138 Dec 12, 2022
A large dataset of 100k Google Satellite and matching Map images, resembling pix2pix's Google Maps dataset.

Larger Google Sat2Map dataset This dataset extends the aerial ⟷ Maps dataset used in pix2pix (Isola et al., CVPR17). The provide script download_sat2m

34 Dec 28, 2022
Segmentation vgg16 fcn - cityscapes

VGGSegmentation Segmentation vgg16 fcn - cityscapes Priprema skupa skripta prepare_dataset_downsampled.py Iz slika cityscapesa izrezuje haubu automobi

6 Oct 24, 2020
ML-based medical imaging using Azure

Disclaimer This code is provided for research and development use only. This code is not intended for use in clinical decision-making or for any other

Microsoft Azure 68 Dec 23, 2022
Best Practices on Recommendation Systems

Recommenders What's New (February 4, 2021) We have a new relase Recommenders 2021.2! It comes with lots of bug fixes, optimizations and 3 new algorith

Microsoft 14.8k Jan 03, 2023
DatasetGAN: Efficient Labeled Data Factory with Minimal Human Effort

DatasetGAN This is the official code and data release for: DatasetGAN: Efficient Labeled Data Factory with Minimal Human Effort Yuxuan Zhang*, Huan Li

302 Jan 05, 2023