Segmentation in Style: Unsupervised Semantic Image Segmentation with Stylegan and CLIP

Last update: Dec 19, 2022

Abstract: We introduce a method that automatically segments images into semantically meaningful regions without human supervision. The derived regions are consistent across different images and coincide with human-defined semantic classes on some datasets. In cases where semantic regions might be hard for humans to define and consistently label, our method is still able to find meaningful and consistent semantic classes. In our work, we use a pretrained StyleGAN2 generative model: clustering in the feature space of the generative model allows us to discover semantic classes. Once the classes are discovered, a synthetic dataset of generated images and corresponding segmentation masks can be created. A segmentation model is then trained on the synthetic dataset and is able to generalize to real images. Additionally, by using CLIP we are able to use prompts written in natural language to discover desired semantic classes. We test our method on publicly available datasets and show state-of-the-art results.
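The clustering step described in the abstract can be illustrated with a small sketch. This is not the code from this repository: it assumes plain k-means over per-pixel feature vectors, uses a synthetic feature map in place of real StyleGAN2 activations, and the names `kmeans`, `segment_feature_map`, and `toy_feature_map` are hypothetical. It uses only the Python standard library so it can be run as is.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means (Lloyd's algorithm) over equal-length feature vectors."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


def segment_feature_map(feature_map, k, seed=0):
    """Cluster the per-pixel features of an H x W x C map into a label mask."""
    h, w = len(feature_map), len(feature_map[0])
    pixels = [feature_map[y][x] for y in range(h) for x in range(w)]
    labels, _ = kmeans(pixels, k, seed=seed)
    # Reshape the flat label list back into an H x W mask.
    return [labels[y * w:(y + 1) * w] for y in range(h)]


def toy_feature_map(h=4, w=6, channels=3, seed=0):
    """Assumption: synthetic stand-in for generator activations.

    The left half of the map has features near 0 and the right half
    near 5, so the two halves play the role of two semantic regions.
    """
    rng = random.Random(seed)
    return [
        [
            [(0.0 if x < w // 2 else 5.0) + rng.gauss(0, 0.1) for _ in range(channels)]
            for x in range(w)
        ]
        for y in range(h)
    ]


if __name__ == "__main__":
    mask = segment_feature_map(toy_feature_map(), k=2)
    for row in mask:
        print(row)
```

In the sketch, every pixel in the left half receives one cluster index and every pixel in the right half receives the other; which index goes to which half depends on the random initialization. Pairing such masks with the corresponding generated images is what would yield the synthetic training set mentioned in the abstract.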

This repository contains the official PyTorch implementation of the following paper:

Segmentation in Style: Unsupervised Semantic Image Segmentation with Stylegan and CLIP
Daniil Pakhomov, Sanchit Hira, Narayani Wagle, Kemar E. Green, Nassir Navab
https://arxiv.org/abs/2107.12518

Owner

Daniil Pakhomov
