Open-World Entity Segmentation

Last update: Dec 29, 2022

Overview

Open-World Entity Segmentation Project Website

Lu Qi*, Jason Kuen*, Yi Wang, Jiuxiang Gu, Hengshuang Zhao, Zhe Lin, Philip Torr, Jiaya Jia

This project provides an implementation of the paper "Open-World Entity Segmentation", based on Detectron2. Entity segmentation is a segmentation task that aims to segment everything in an image into semantically meaningful regions without considering category labels. Our entity segmentation models perform exceptionally well in a cross-dataset setting, where the model is trained only on COCO and tested on images from other datasets. Please refer to the project website for more details and visualizations.

Installation

This project is based on Detectron2 and can be set up as follows.

Install Detectron2 following its installation instructions. Note that our code was developed against Detectron2 commit 28174e932c534f841195f02184dc67b941c65a67 and PyTorch 1.8.
Set up the COCO dataset, including the instance and panoptic annotations, following the required directory structure. The code for the entity evaluation metric is in modified_cocoapi. You can directly replace the coco.py of your installed pycocotools with modified_cocoapi/PythonAPI/pycocotools/coco.py.
Copy this project to /path/to/detectron2/projects/EntitySeg.
Set find_unused_parameters=True for distributed training in your own copy of Detectron2. You can modify it in detectron2/engine/defaults.py.
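The coco.py replacement mentioned in the steps above can be scripted. The following is a minimal sketch, not part of this project: the helper names replace_coco_py and installed_pycocotools_dir are hypothetical, and it assumes pycocotools is installed as a regular package in the active Python environment. It keeps a backup of the original file so the change can be undone.

```python
import importlib.util
import shutil
from pathlib import Path


def installed_pycocotools_dir() -> Path:
    """Locate the pycocotools package directory of the current environment."""
    spec = importlib.util.find_spec("pycocotools")
    if spec is None or spec.origin is None:
        raise RuntimeError("pycocotools is not installed in this environment")
    # spec.origin points at pycocotools/__init__.py
    return Path(spec.origin).parent


def replace_coco_py(modified_coco: Path, pycocotools_dir: Path) -> Path:
    """Copy the modified coco.py over the installed one, keeping a backup."""
    target = pycocotools_dir / "coco.py"
    backup = pycocotools_dir / "coco.py.bak"
    # Save the original once; never overwrite an existing backup.
    if target.exists() and not backup.exists():
        shutil.copy2(target, backup)
    shutil.copy2(modified_coco, target)
    return target


# Example call (adjust the source path to where modified_cocoapi lives):
# replace_coco_py(
#     Path("modified_cocoapi/PythonAPI/pycocotools/coco.py"),
#     installed_pycocotools_dir(),
# )
```

To revert, copy coco.py.bak back over coco.py in the same directory.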

Data pre-processing

(1) Generate the entity information for each image from the instance and panoptic annotations. Please change the paths of the COCO annotation files in the following script.

cd /path/to/detectron2/projects/EntitySeg/make_data
bash make_entity_mask.sh

(2) Convert the generated entity information to JSON files.

cd /path/to/detectron2/projects/EntitySeg/make_data
python3 entity_to_json.py

Training

To train a model with 8 GPUs, run:

cd /path/to/detectron2
python3 projects/EntitySeg/train_net.py --config-file <projects/EntitySeg/configs/config.yaml> --num-gpus 8

For example, to launch entity segmentation training (1x schedule) with a ResNet-50 backbone on 8 GPUs and save the model to "/data/entity_model", run:

cd /path/to/detectron2
python3 projects/EntitySeg/train_net.py --config-file projects/EntitySeg/configs/entity_default.yaml --num-gpus 8 OUTPUT_DIR /data/entity_model
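The trailing OUTPUT_DIR /data/entity_model in the command above is a config override. Detectron2's default argument parser collects such trailing tokens as KEY VALUE pairs, and training scripts typically merge them into the config with cfg.merge_from_list. Assuming train_net.py follows that convention, the pairing can be pictured with the standard-library sketch below. The function pair_overrides is hypothetical and only illustrates the grouping; it is not Detectron2's implementation.

```python
from typing import Dict, List


def pair_overrides(opts: List[str]) -> Dict[str, str]:
    """Group trailing command-line tokens into KEY -> VALUE overrides."""
    if len(opts) % 2 != 0:
        raise ValueError(f"expected KEY VALUE pairs, got {len(opts)} tokens: {opts}")
    # Even positions are keys, odd positions are the matching values.
    return dict(zip(opts[0::2], opts[1::2]))


overrides = pair_overrides(["OUTPUT_DIR", "/data/entity_model"])
print(overrides)  # {'OUTPUT_DIR': '/data/entity_model'}
```

The same mechanism is what lets MODEL.WEIGHTS and other keys be set on the command line without editing the YAML file.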

Evaluation

To evaluate a pre-trained model with 8 GPUs, run:

cd /path/to/detectron2
python3 projects/EntitySeg/train_net.py --config-file <config.yaml> --num-gpus 8 --eval-only MODEL.WEIGHTS model_checkpoint

Visualization

To visualize the results of a pre-trained model on some images, run:

cd /path/to/detectron2
python3 projects/EntitySeg/demo_result_and_vis.py --config-file <config.yaml> --input <input_path> --output <output_path> MODEL.WEIGHTS model_checkpoint MODEL.CONDINST.MASK_BRANCH.USE_MASK_RESCORE "True"

For example,

python3 projects/EntitySeg/demo_result_and_vis.py --config-file projects/EntitySeg/configs/entity_swin_lw7_1x.yaml --input /data/input/*.jpg --output /data/output MODEL.WEIGHTS /data/pretrained_model/R_50.pth MODEL.CONDINST.MASK_BRANCH.USE_MASK_RESCORE "True"

Pretrained weights of Swin Transformers

Use tools/convert_swin_to_d2.py to convert the pretrained Swin Transformer weights to the Detectron2 format. For example,

pip install timm
wget https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_tiny_patch4_window7_224.pth
python tools/convert_swin_to_d2.py swin_tiny_patch4_window7_224.pth swin_tiny_patch4_window7_224_trans.pth

Pretrained weights of SegFormer Backbone

Use tools/convert_mit_to_d2.py to convert the pretrained SegFormer backbone weights to the Detectron2 format. For example,

pip install timm
python tools/convert_mit_to_d2.py mit_b0.pth mit_b0_trans.pth

Results

We provide the results of several pretrained models on the COCO val set. The method is easy to extend to other backbones. We first report the results with CNN backbones.

Method	Backbone	Sched	Entity AP	Download
Baseline	R50	1x	28.3	model | metrics
Ours	R50	1x	29.8	model | metrics
Ours	R50	3x	31.8	model | metrics
Ours	R101	1x	31.0	model | metrics
Ours	R101	3x	33.2	model | metrics
Ours	R101-DCNv2	3x	35.5	model | metrics

The results with transformer backbones are as follows. The Mask Rescore column reports the Entity AP when mask rescoring is used at inference, which is enabled by setting MODEL.CONDINST.MASK_BRANCH.USE_MASK_RESCORE to True.

Method	Backbone	Sched	Entity AP	Mask Rescore	Download
Ours	Swin-T	1x	33.0	34.6	model | metrics
Ours	Swin-L-W7	1x	37.8	39.3	model | metrics
Ours	Swin-L-W7	3x	38.6	40.0	model | metrics
Ours	Swin-L-W12	3x	TBD	TBD	model | metrics
Ours	MiT-b0	1x	28.8	30.4	model | metrics
Ours	MiT-b2	1x	35.1	36.6	model | metrics
Ours	MiT-b3	1x	36.9	38.5	model | metrics
Ours	MiT-b5	1x	37.2	38.7	model | metrics
Ours	MiT-b5	3x	TBD	TBD	model | metrics
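To read the effect of mask rescoring off the transformer table above, the short script below computes the Entity AP difference for each reported row. The numbers are copied from the table; rows marked TBD are omitted, and the helper name rescore_gain is only illustrative.

```python
# (backbone, schedule, Entity AP, Entity AP with mask rescoring),
# copied from the transformer-backbone table; TBD rows are left out.
results = [
    ("Swin-T", "1x", 33.0, 34.6),
    ("Swin-L-W7", "1x", 37.8, 39.3),
    ("Swin-L-W7", "3x", 38.6, 40.0),
    ("MiT-b0", "1x", 28.8, 30.4),
    ("MiT-b2", "1x", 35.1, 36.6),
    ("MiT-b3", "1x", 36.9, 38.5),
    ("MiT-b5", "1x", 37.2, 38.7),
]


def rescore_gain(ap: float, ap_rescored: float) -> float:
    """Entity AP gained by mask rescoring, rounded to one decimal."""
    return round(ap_rescored - ap, 1)


gains = {(b, s): rescore_gain(ap, r) for b, s, ap, r in results}
for (backbone, sched), gain in gains.items():
    print(f"{backbone:10s} {sched}  +{gain:.1f}")
```

For the rows listed, the gain from mask rescoring lies between 1.4 and 1.6 Entity AP.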

Citing Ours

Please consider citing Open-World Entity Segmentation if it helps your research.

@inproceedings{qi2021open,
  title={Open-World Entity Segmentation},
  author={Lu Qi and Jason Kuen and Yi Wang and Jiuxiang Gu and Hengshuang Zhao and Zhe Lin and Philip Torr and Jiaya Jia},
  booktitle={arxiv},
  year={2021}
}

Owner

DV Lab
