Vision-Language Transformer and Query Generation for Referring Segmentation (ICCV 2021)

Last update: Dec 23, 2022

Overview

Vision-Language Transformer and Query Generation for Referring Segmentation

Please consider citing our paper in your publications if the project helps your research.

```
@inproceedings{vision-language-transformer,
  title={Vision-Language Transformer and Query Generation for Referring Segmentation},
  author={Ding, Henghui and Liu, Chang and Wang, Suchen and Jiang, Xudong},
  booktitle={Proceedings of the IEEE International Conference on Computer Vision},
  year={2021}
}
```

Installation

Environment:
- Python 3.6
- TensorFlow 1.15
- Other dependencies listed in requirements.txt
- spaCy model for word embeddings:
```
python -m spacy download en_vectors_web_lg
```

Dataset preparation
- Place the COCO training set folder ("train2014") under data/images/.
- Download the RefCOCO dataset from here and extract it to data/. Then run the data preparation script from within data/:
```
cd data
python data_process_v2.py --data_root . --output_dir data_v2 --dataset [refcoco/refcoco+/refcocog] --split [unc/umd/google] --generate_mask
```
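The bracketed values in the command above are alternatives: pick one dataset and one split per run. A minimal sketch that builds one command per dataset/split pair follows. The helper data_process_command and the VALID_SPLITS table are hypothetical, and the table assumes the pairings commonly used in referring segmentation work (the UNC split for RefCOCO and RefCOCO+, the UMD or Google split for RefCOCOg); check it against the data you downloaded.

```python
import shlex

# Assumption: dataset/split pairs this sketch accepts. The README itself
# only lists the allowed values of each flag, not which pairs are valid.
VALID_SPLITS = {
    "refcoco": ("unc",),
    "refcoco+": ("unc",),
    "refcocog": ("umd", "google"),
}


def data_process_command(dataset, split, data_root=".", output_dir="data_v2"):
    """Build the argument list for data_process_v2.py (run from data/)."""
    if split not in VALID_SPLITS.get(dataset, ()):
        raise ValueError("unsupported dataset/split pair: %s/%s" % (dataset, split))
    return [
        "python", "data_process_v2.py",
        "--data_root", data_root,
        "--output_dir", output_dir,
        "--dataset", dataset,
        "--split", split,
        "--generate_mask",
    ]


if __name__ == "__main__":
    # Print one shell command per pair; passing each list to
    # subprocess.run(cmd, cwd="data", check=True) would execute it instead.
    for dataset, splits in VALID_SPLITS.items():
        for split in splits:
            cmd = data_process_command(dataset, split)
            print(" ".join(shlex.quote(arg) for arg in cmd))
```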

Evaluating

Download the pretrained models and config files from here.
In the config file, set:
- evaluate_model: path to the pretrained weights
- evaluate_set: path to the dataset used for evaluation
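The two keys above can be edited by hand or set from a script. The sketch below assumes the config stores each setting on its own unindented key: value line (flat YAML style); that file format is an assumption, so if the downloaded config files are nested or use another syntax, a proper parser is the safer choice. set_config_values, the demo file name, the batch_size key and both paths are hypothetical placeholders.

```python
import re
import tempfile
from pathlib import Path


def set_config_values(config_path, updates):
    """Overwrite or append top-level `key: value` lines in a config file.

    Assumption: every setting sits on its own unindented `key: value` line.
    Nested configs would need a real parser instead of this line rewrite.
    """
    path = Path(config_path)
    lines = path.read_text().splitlines()
    pending = dict(updates)
    for i, line in enumerate(lines):
        match = re.match(r"^(\w+)\s*:", line)
        if match and match.group(1) in pending:
            key = match.group(1)
            lines[i] = "%s: %s" % (key, pending.pop(key))
    # Keys that were not already present are appended at the end.
    for key, value in pending.items():
        lines.append("%s: %s" % (key, value))
    path.write_text("\n".join(lines) + "\n")


if __name__ == "__main__":
    # Demo on a throwaway file; every name and path here is a placeholder.
    with tempfile.TemporaryDirectory() as tmp:
        cfg = Path(tmp) / "example_config.yaml"
        cfg.write_text("evaluate_model: TODO\nbatch_size: 8\n")
        set_config_values(cfg, {
            "evaluate_model": "./pretrained/model.h5",
            "evaluate_set": "./data/data_v2/val.json",
        })
        print(cfg.read_text())
```

Existing keys are rewritten in place so the rest of the file, including unrelated settings, keeps its original order.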

Run:

```
python vlt.py test [PATH_TO_CONFIG_FILE]
```

Training

Pretrained Backbones: We use the backbone weights provided by MCN.

Note: we use the backbone weights that exclude all images appearing in the val/test splits of RefCOCO, RefCOCO+ and RefCOCOg.
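Conceptually, excluding evaluation images amounts to a set difference over image IDs. The sketch below illustrates only that idea under assumed inputs (plain integer IDs grouped by split name); exclude_eval_images is a hypothetical helper, the IDs are toy values, and this is not MCN's actual preprocessing code.

```python
def exclude_eval_images(train_image_ids, eval_splits):
    """Drop every image ID that appears in any evaluation split.

    train_image_ids: iterable of COCO image IDs considered for pretraining.
    eval_splits: mapping from a split name to a set of image IDs.
    """
    held_out = set()
    for ids in eval_splits.values():
        held_out.update(ids)
    # Keep the original order of the training list.
    return [image_id for image_id in train_image_ids if image_id not in held_out]


if __name__ == "__main__":
    # Toy IDs only; real IDs would come from the RefCOCO/RefCOCO+/RefCOCOg annotations.
    train_ids = [1, 2, 3, 4, 5, 6]
    splits = {
        "refcoco/val": {2},
        "refcoco+/testA": {2, 4},
        "refcocog/test": {6},
    }
    print(exclude_eval_images(train_ids, splits))  # [1, 3, 5]
```

Taking the union over all three datasets matters because the same COCO image can appear in more than one of them.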
Specify the hyperparameters, dataset path and pretrained weight path in the configuration file. Please refer to the examples under /config or to the config files of our pretrained models.

Run:

```
python vlt.py train [PATH_TO_CONFIG_FILE]
```
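Both entry points above share the form python vlt.py MODE CONFIG, with MODE set to test or train. If you want to launch them from a script, a minimal sketch follows; vlt_command and run_vlt are hypothetical helpers, and the sketch assumes vlt.py sits in repo_dir and that the current interpreter (sys.executable) is the project's Python environment, used in place of the bare python from the README.

```python
import subprocess
import sys


def vlt_command(mode, config_path, python=sys.executable):
    """Build the command line for `python vlt.py {train,test} CONFIG`."""
    if mode not in ("train", "test"):
        raise ValueError("mode must be 'train' or 'test', got %r" % mode)
    return [python, "vlt.py", mode, str(config_path)]


def run_vlt(mode, config_path, repo_dir="."):
    """Run vlt.py from the repository root and raise if it exits non-zero."""
    subprocess.run(vlt_command(mode, config_path), cwd=repo_dir, check=True)
```

For example, run_vlt("train", "path/to/your_config") would start training from the repository root (the config path is a placeholder). Because check=True is set, a failed run raises subprocess.CalledProcessError instead of being silently ignored.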

Acknowledgement

We borrowed a lot of code from MCN, keras-transformer, the RefCOCO API and keras-yolo3. Thanks for their excellent work!


Owner

Henghui Ding
