Puzzle-CAM: Improved localization via matching partial and full features.

Last update: Nov 14, 2022

Overview

Puzzle-CAM

The official implementation of "Puzzle-CAM: Improved localization via matching partial and full features".

Citation

Please cite our paper if the code is helpful to your research. arxiv

@article{jo2021puzzle,
  title={Puzzle-CAM: Improved localization via matching partial and full features},
  author={Jo, Sanhyun and Yu, In-Jae},
  journal={arXiv preprint arXiv:2101.11253},
  year={2021}
}

Abstract

Weakly-supervised semantic segmentation (WSSS) is introduced to narrow the gap for semantic segmentation performance from pixel-level supervision to image-level supervision. Most advanced approaches are based on class activation maps (CAMs) to generate pseudo-labels to train the segmentation network. The main limitation of WSSS is that the process of generating pseudo-labels from CAMs which use an image classifier is mainly focused on the most discriminative parts of the objects. To address this issue, we propose Puzzle-CAM, a process minimizes the differences between the features from separate patches and the whole image. Our method consists of a puzzle module (PM) and two regularization terms to discover the most integrated region of in an object. Without requiring extra parameters, Puzzle-CAM can activate the overall region of an object using image-level supervision. In experiments, Puzzle-CAM outperformed previous state-of-the-art methods using the same labels for supervision on the PASCAL VOC 2012 test dataset.

Overview

Prerequisite

Python 3.8, PyTorch 1.7.0, and more in requirements.txt
CUDA 10.1, cuDNN 7.6.5
4 x Titan RTX GPUs

Usage

Install python dependencies

python3 -m pip install -r requirements.txt

Download PASCAL VOC 2012 devkit

Follow instructions in http://host.robots.ox.ac.uk/pascal/VOC/voc2012/#devkit

1. Train an image classifier for generating CAMs

CUDA_VISIBLE_DEVICES=0,1,2,3 python3 train_classification_with_puzzle.py --architecture resnest101 --re_loss_option masking --re_loss L1_Loss --alpha_schedule 0.50 --alpha 4.00 --tag [email protected]@optimal --data_dir $your_dir

2. Apply Random Walk (RW) to refine the generated CAMs

2.1. Make affinity labels to train AffinityNet.

CUDA_VISIBLE_DEVICES=0 python3 inference_classification.py --architecture resnest101 --tag [email protected]@optimal --domain train_aug --data_dir $your_dir
python3 make_affinity_labels.py --experiment_name [email protected]@[email protected]@scale=0.5,1.0,1.5,2.0 --domain train_aug --fg_threshold 0.40 --bg_threshold 0.10 --data_dir $your_dir

2.2. Train AffinityNet.

CUDA_VISIBLE_DEVICES=0 python3 train_affinitynet.py --architecture resnest101 --tag [email protected]@Puzzle --label_name [email protected]@opt[email protected]@scale=0.5,1.0,1.5,[email protected]_fg=0.40_bg=0.10 --data_dir $your_dir

3. Train the segmentation model using the pseudo-labels

3.1. Make segmentation labels to train segmentation model.

CUDA_VISIBLE_DEVICES=0 python3 inference_rw.py --architecture resnest101 --model_name [email protected]@Puzzle --cam_dir [email protected]@op[email protected]@scale=0.5,1.0,1.5,2.0 --domain train_aug --data_dir $your_dir
python3 make_pseudo_labels.py --experiment_name [email protected]@[email protected]@[email protected][email protected] --domain train_aug --threshold 0.35 --crf_iteration 1 --data_dir $your_dir

3.2. Train segmentation model.

CUDA_VISIBLE_DEVICES=0,1,2,3 python3 train_segmentation.py --backbone resnest101 --mode fix --use_gn True --tag [email protected]@[email protected] --label_name [email protected]@[email protected]@[email protected][email protected]@crf=1 --data_dir $your_dir

4. Evaluate the models

CUDA_VISIBLE_DEVICES=0 python3 inference_segmentation.py --backbone resnest101 --mode fix --use_gn True --tag [email protected]@[email protected] --scale 0.5,1.0,1.5,2.0 --iteration 10

python3 evaluate.py --experiment_name [email protected]@[email protected]@[email protected]=0.5,1.0,1.5,[email protected]=10 --domain val --data_dir $your_dir/SegmentationClass

5. Results

Qualitative segmentation results on the PASCAL VOC 2012 validation set. Top: original images. Middle: ground truth. Bottom: prediction of the segmentation model trained using the pseudo-labels from Puzzle-CAM.

Methods	background	aeroplane	bicycle	bird	boat	bottle	bus	car	cat	chair	cow	diningtable	dog	horse	motorbike	person	pottedplant	sheep	sofa	train	tvmonitor	mIoU
Puzzle-CAM with ResNeSt-101	88.9	87.1	38.7	89.2	55.8	72.8	89.8	78.9	91.3	26.8	84.4	40.3	88.9	81.9	83.1	34.0	60.1	83.6	47.3	59.6	38.8	67.7
Puzzle-CAM with ResNeSt-269	91.1	87.2	37.3	86.8	61.4	71.2	92.2	86.2	91.8	28.6	85.0	64.1	91.8	82.0	82.5	70.7	69.4	87.7	45.4	67.0	37.7	72.2

For any issues, please contact Sanghyun Jo, [email protected]

Comments

ModuleNotFoundError: No module named 'core.sync_batchnorm'

`

ModuleNotFoundError Traceback (most recent call last) in 1 from core.puzzle_utils import * ----> 2 from core.networks import * 3 from core.datasets import * 4 5 from tools.general.io_utils import *

/working/PuzzleCAM/core/networks.py in 24 # Normalization 25 ####################################################################### ---> 26 from .sync_batchnorm.batchnorm import SynchronizedBatchNorm2d 27 28 class FixedBatchNorm(nn.BatchNorm2d):

ModuleNotFoundError: No module named 'core.sync_batchnorm' `

opened by Ashneo07 2
performance issue

When I used the released weights for inference phase and evaluation, I found that the mIoU I got was different from the mIoU reported in the paper. I would like to ask whether this weight is corresponding to the paper, if it is, how to reproduce the result in your paper. Looking forward to your reply.

opened by linjiatai 0
Evaluation in classifier training is using supervised segmentation maps?

Hello, thank you for the great repository! It's pretty impressive how organized it is.

I have a critic (or maybe a question, in case I got it wrong) regarding the training of the classifier, though: I understand the importance of measuring and logging the mIoU during training (specially when creating the ablation section in your paper), however it doesn't strike me as correct to save the model with best mIoU. This procedural decision is based on fully supervised segmentation information, which should not be available for a truly weakly supervised problem; while resulting in a model better suited for segmentation. The paper doesn't address this. Am I right to assume all models were trained like this? Were there any trainings where other metrics were considered when saving the model (e.g. classification loss or Eq (7) in the paper)?

opened by lucasdavid 0
error occured when image-size isn't 512 * n

dear author: I notice that if the image size isn't 512 x 512, it will have some error. I use image size 1280 x 496 and i got tensor size error at calculate puzzle module:the original feature is 31 dims and re_feature is 32 dims. So i have to change image size to 1280 x 512 and i work. So i think this maybe a little bug. It will better that you fixed it or add a notes in code~ Thanks for your job!

opened by hazy-wu 0
the backbone of Affinitynet is resnet38. Why did you write resnet50?

In Table 2 of your paper, the backbone of Affinitynet is resnet38. Why did you write resnet50? After my experiment, I found that RW result reached 65.42% for Affinitynet which is based on resnet50 and higher than yours.

opened by songyukino1 0
Ask for details of the training process！

I am trying to train with ResNest101, and I also added affinity and RW. When I try to train, it runs according to the specified code. It is found that the obtained affinity labels are not effective, and the effect of pseudo_labels is almost invisible, which is close to the effect of all black. I don't know where the problem is, who can explain the details. help!

opened by YuYue26 1

Releases(v1.0)

v1.0(Jan 25, 2021)

Source code(tar.gz)
Source code(zip)

Owner

Sanghyun Jo

e-mail : [email protected] # DeepLearning #Computer Vision #AutoML #Se

GitHub Repository

Code for the paper Learning the Predictability of the Future

Learning the Predictability of the Future Code from the paper Learning the Predictability of the Future. Website of the project in hyperfuture.cs.colu

139 Nov 18, 2022

Code for the paper "On the Power of Edge Independent Graph Models"

Edge Independent Graph Models Code for the paper: "On the Power of Edge Independent Graph Models" Sudhanshu Chanpuriya, Cameron Musco, Konstantinos So

0 Oct 26, 2021

NVIDIA Merlin is an open source library providing end-to-end GPU-accelerated recommender systems, from feature engineering and preprocessing to training deep learning models and running inference in production.

NVIDIA Merlin NVIDIA Merlin is an open source library designed to accelerate recommender systems on NVIDIA’s GPUs. It enables data scientists, machine

419 Jan 03, 2023

Official Code for ICML 2021 paper "Revisiting Point Cloud Shape Classification with a Simple and Effective Baseline"

Revisiting Point Cloud Shape Classification with a Simple and Effective Baseline Ankit Goyal, Hei Law, Bowei Liu, Alejandro Newell, Jia Deng Internati

115 Jan 04, 2023

FedML: A Research Library and Benchmark for Federated Machine Learning

FedML: A Research Library and Benchmark for Federated Machine Learning 📄 https://arxiv.org/abs/2007.13518 News 2021-02-01 (Award): #NeurIPS 2020# Fed

2.3k Jan 08, 2023

Small little script to scrape, parse and check for active tor nodes. Can be used as proxies.

TorScrape TorScrape is a small but useful script made in python that scrapes a website for active tor nodes, parse the html and then save the nodes in

5 Dec 04, 2022

Efficient face emotion recognition in photos and videos

This repository contains code of face emotion recognition that was developed in the RSF (Russian Science Foundation) project no. 20-71-10010 (Efficien

239 Jan 04, 2023

A weakly-supervised scene graph generation codebase. The implementation of our CVPR2021 paper ``Linguistic Structures as Weak Supervision for Visual Scene Graph Generation''

README.md shall be finished soon. WSSGG 0 Overview 1 Installation 1.1 Faster-RCNN 1.2 Language Parser 1.3 GloVe Embeddings 2 Settings 2.1 VG-GT-Graph

35 Nov 20, 2022

[CVPR'2020] DeepDeform: Learning Non-rigid RGB-D Reconstruction with Semi-supervised Data

DeepDeform (CVPR'2020) DeepDeform is an RGB-D video dataset containing over 390,000 RGB-D frames in 400 videos, with 5,533 optical and scene flow imag

165 Jan 09, 2023

Self-supervised Multi-modal Hybrid Fusion Network for Brain Tumor Segmentation

JBHI-Pytorch This repository contains a reference implementation of the algorithms described in our paper "Self-supervised Multi-modal Hybrid Fusion N

5 Dec 13, 2021

All of the figures and notebooks for my deep learning book, for free!

"Deep Learning - A Visual Approach" by Andrew Glassner This is the official repo for my book from No Starch Press. Ordering the book My book is called

227 Jan 04, 2023

Resources related to our paper "CLIN-X: pre-trained language models and a study on cross-task transfer for concept extraction in the clinical domain"

CLIN-X (CLIN-X-ES) & (CLIN-X-EN) This repository holds the companion code for the system reported in the paper: "CLIN-X: pre-trained language models a

4 Dec 05, 2022

Puzzle-CAM: Improved localization via matching partial and full features.

Related tags

Overview

Puzzle-CAM

Citation

Abstract

Overview

Prerequisite

Usage

Install python dependencies

Download PASCAL VOC 2012 devkit

1. Train an image classifier for generating CAMs

2. Apply Random Walk (RW) to refine the generated CAMs

3. Train the segmentation model using the pseudo-labels

4. Evaluate the models

5. Results

Comments

ModuleNotFoundError: No module named 'core.sync_batchnorm'

`

performance issue

Evaluation in classifier training is using supervised segmentation maps?

error occured when image-size isn't 512 * n

the backbone of Affinitynet is resnet38. Why did you write resnet50?

Ask for details of the training process！

Releases(v1.0)

v1.0(Jan 25, 2021)

Owner

Sanghyun Jo

Code for the paper Learning the Predictability of the Future

Code for the paper "On the Power of Edge Independent Graph Models"

NVIDIA Merlin is an open source library providing end-to-end GPU-accelerated recommender systems, from feature engineering and preprocessing to training deep learning models and running inference in production.

Official Code for ICML 2021 paper "Revisiting Point Cloud Shape Classification with a Simple and Effective Baseline"

FedML: A Research Library and Benchmark for Federated Machine Learning

Small little script to scrape, parse and check for active tor nodes. Can be used as proxies.

Efficient face emotion recognition in photos and videos

A weakly-supervised scene graph generation codebase. The implementation of our CVPR2021 paper ``Linguistic Structures as Weak Supervision for Visual Scene Graph Generation''

[CVPR'2020] DeepDeform: Learning Non-rigid RGB-D Reconstruction with Semi-supervised Data

Self-supervised Multi-modal Hybrid Fusion Network for Brain Tumor Segmentation

All of the figures and notebooks for my deep learning book, for free!

Resources related to our paper "CLIN-X: pre-trained language models and a study on cross-task transfer for concept extraction in the clinical domain"

Using Self-Supervised Pretext Tasks for Active Learning - Official Pytorch Implementation

Consensus score for tripadvisor

TensorFlow for Raspberry Pi

The Official Implementation of the ICCV-2021 Paper: Semantically Coherent Out-of-Distribution Detection.

Torchyolo - Yolov3 ve Yolov4 modellerin Pytorch uygulamasıdır

Python code for loading the Aschaffenburg Pose Dataset.

⚾🤖⚾ Automatic baseball pitching overlay in realtime

Pytorch implementation of "Training a 85.4% Top-1 Accuracy Vision Transformer with 56M Parameters on ImageNet"