Language-Driven Semantic Segmentation

Last update: Jan 03, 2023

Overview

Language-driven Semantic Segmentation (LSeg)

This repository contains the official PyTorch implementation of the paper Language-driven Semantic Segmentation.

Authors: Boyi Li, Kilian Q. Weinberger, Serge Belongie, Vladlen Koltun, Rene Ranftl

Overview

We present LSeg, a novel model for language-driven semantic image segmentation. LSeg uses a text encoder to compute embeddings of descriptive input labels (e.g., "grass" or "building") together with a transformer-based image encoder that computes dense per-pixel embeddings of the input image. The image encoder is trained with a contrastive objective to align pixel embeddings to the text embedding of the corresponding semantic class. The text embeddings provide a flexible label representation in which semantically similar labels map to similar regions in the embedding space (e.g., "cat" and "furry"). This allows LSeg to generalize to previously unseen categories at test time, without retraining or even requiring a single additional training sample. We demonstrate that our approach achieves highly competitive zero-shot performance compared to existing zero- and few-shot semantic segmentation methods, and even matches the accuracy of traditional segmentation algorithms when a fixed label set is provided.
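The matching step described above can be illustrated with a small, dependency-free sketch. It is not the repository's code: the function names, the toy 2-D embeddings, and the temperature value are assumptions made for illustration, and the loss shown (softmax cross-entropy over cosine similarities) is only one common way to write a contrastive objective of this kind.

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def similarities(pixel, text_embeddings):
    """Cosine similarity between one pixel embedding and every label embedding."""
    p = normalize(pixel)
    return [sum(a * b for a, b in zip(p, normalize(t))) for t in text_embeddings]

def segment(pixel_embeddings, text_embeddings, labels):
    """Assign each pixel the label whose text embedding is most similar.

    pixel_embeddings: H x W grid of C-dimensional vectors (stand-in for image encoder output).
    text_embeddings: one C-dimensional vector per label (stand-in for text encoder output).
    """
    result = []
    for row in pixel_embeddings:
        out_row = []
        for pixel in row:
            scores = similarities(pixel, text_embeddings)
            out_row.append(labels[scores.index(max(scores))])
        result.append(out_row)
    return result

def pixel_loss(pixel, text_embeddings, target_index, temperature=0.07):
    """Softmax cross-entropy over similarities (temperature is an assumed value)."""
    logits = [s / temperature for s in similarities(pixel, text_embeddings)]
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_sum - logits[target_index]

# Toy embeddings standing in for real encoder outputs (hypothetical values).
labels = ["grass", "building"]
text_embeddings = [[1.0, 0.0], [0.0, 1.0]]
pixel_embeddings = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.7, 0.3], [0.1, 0.9]],
]
label_map = segment(pixel_embeddings, text_embeddings, labels)
```

Because the label set enters only through `text_embeddings`, a new category can be tried at test time by appending its embedding and name to the two lists, with no change to the segmentation function.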

Please check out our Video Demo (4k), which further showcases the capabilities of LSeg.

Usage

Installation

Option 1:

pip install -r requirements.txt

Option 2:

conda install ipython
pip install torch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2
pip install git+https://github.com/zhanghang1989/PyTorch-Encoding/
pip install pytorch-lightning==1.3.5
pip install opencv-python
pip install imageio
pip install ftfy regex tqdm
pip install git+https://github.com/openai/CLIP.git
pip install altair
pip install streamlit
pip install --upgrade protobuf
pip install timm
pip install tensorboardX
pip install matplotlib
pip install test-tube
pip install wandb
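Since Option 2 pins several package versions, it can help to confirm that the environment matches them before training. The following is a minimal sketch using only the standard library; the helper name `find_mismatches` and the choice to ignore local build suffixes such as "+cu110" are assumptions, not part of the repository.

```python
from importlib import metadata

# Pins copied from the Option 2 commands; unpinned packages are not checked.
PINNED = {
    "torch": "1.7.1",
    "torchvision": "0.8.2",
    "torchaudio": "0.7.2",
    "pytorch-lightning": "1.3.5",
}

def find_mismatches(pinned=PINNED, lookup=metadata.version):
    """Return {package: installed version or None} for every pin that is not met."""
    mismatches = {}
    for name, wanted in pinned.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Local build suffixes such as "+cu110" are ignored when comparing.
        if installed is None or installed.split("+")[0] != wanted:
            mismatches[name] = installed
    return mismatches
```

Calling `find_mismatches()` in the target environment returns an empty dictionary when every pinned package is installed at the listed version.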

Running interactive app

Download the demo model and place it in the checkpoints folder as checkpoints/demo_e200.ckpt.

Then run streamlit run lseg_app.py
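A quick check that the checkpoint is in place before launching the app might look like the sketch below. The path comes from the instructions above; the helper name `demo_checkpoint_ready` is hypothetical and not part of the repository.

```python
from pathlib import Path

# Location given in the instructions above, relative to the repository root.
DEMO_CHECKPOINT = Path("checkpoints") / "demo_e200.ckpt"

def demo_checkpoint_ready(repo_root="."):
    """Return True if the demo checkpoint file exists under repo_root."""
    return (Path(repo_root) / DEMO_CHECKPOINT).is_file()

if __name__ == "__main__":
    if demo_checkpoint_ready():
        print("Checkpoint found; run: streamlit run lseg_app.py")
    else:
        print(f"Missing {DEMO_CHECKPOINT}; download the demo model first.")
```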

Training

Backbone = ViT-L/16, Text Encoder from CLIP ViT-B/32

bash train.sh

Testing

Backbone = ViT-L/16, Text Encoder from CLIP ViT-B/32

bash test.sh

Try demo model

Download the demo model and place it in the checkpoints folder as checkpoints/demo_e200.ckpt.

Then follow lseg_demo.ipynb to play around with LSeg. Enjoy!

Model Zoo

name	backbone	text encoder	url
Model for demo	ViT-L/16	CLIP ViT-B/32	download

If you find this repo useful, please cite:

@article{li2022lan,
  title={Language-driven Semantic Segmentation},
  author={Li, Boyi and Weinberger, Kilian Q and Belongie, Serge and Koltun, Vladlen and Ranftl, Rene},
  journal={arXiv preprint},
  year={2022}
}

Acknowledgement

Thanks to the code bases of DPT, PyTorch Lightning, CLIP, PyTorch Encoding, Streamlit, and Wandb.

Owner

Intelligent Systems Lab Org
