PyTorch implementation of ShapeConv: Shape-aware Convolutional Layer for RGB-D Indoor Semantic Segmentation.

Overview

Shape-aware Convolutional Layer (ShapeConv)

Introduction

We design a Shape-aware Convolutional (ShapeConv) layer that explicitly models shape information to improve RGB-D semantic segmentation accuracy. Specifically, we decompose the depth feature into a shape component and a value component, and then introduce two learnable weights that process the shape and the value separately. Extensive experiments on three challenging indoor RGB-D semantic segmentation benchmarks, i.e., NYU-Dv2 (-13, -40), SUN RGB-D, and SID, demonstrate the effectiveness of ShapeConv when it is applied to five popular architectures.
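
As a rough illustration of the decomposition described above, the sketch below splits one flattened depth patch into a value component and a shape component, re-weights each, and recombines them before applying a kernel. It is not the repository's implementation: it assumes the value component is the patch mean and the shape component is the residual, and the names decompose_patch, shape_aware_response, w_value, and w_shape are hypothetical.

```python
def decompose_patch(patch):
    """Split a flat patch into (value, shape).

    Assumption: value is the patch mean, shape is the residual around it.
    """
    value = sum(patch) / len(patch)
    shape = [p - value for p in patch]
    return value, shape


def identity(n):
    """n x n identity matrix as a list of rows."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def shape_aware_response(patch, kernel, w_value, w_shape):
    """Re-weight both components, recombine them, then apply the kernel.

    w_value is a scalar weight for the value component (hypothetical name).
    w_shape is a square matrix acting on the shape component (hypothetical name).
    """
    value, shape = decompose_patch(patch)
    # Matrix-vector product: each row of w_shape mixes the residual entries.
    reweighted_shape = [sum(w * s for w, s in zip(row, shape)) for row in w_shape]
    recombined = [w_value * value + rs for rs in reweighted_shape]
    # Ordinary convolution response at one location: dot product with the kernel.
    return sum(k * r for k, r in zip(kernel, recombined))
```

Under these assumptions, setting w_value = 1 and w_shape to the identity matrix reproduces the plain dot product between kernel and patch, and adding a constant depth offset to the patch changes only the value component, not the shape component.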

Usage

Installation

  1. Requirements
  • Linux
  • Python 3.6+
  • PyTorch 1.7.0 or higher
  • CUDA 10.0 or higher

We have tested the following OS and software versions:

  • OS: Ubuntu 16.04.6 LTS
  • CUDA: 10.0
  • PyTorch 1.7.0
  • Python 3.6.9
  2. Install dependencies.
pip install -r requirements.txt
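
The requirements above can be checked with a short script before running anything else. This is a minimal sketch, not part of the repository: version_tuple, meets_minimum, and check_environment are hypothetical helper names, and the CUDA value reported is the version PyTorch was built against (torch.version.cuda), which may differ from the system toolkit.

```python
import re
import sys

# Minimum versions copied from the requirements list above.
MINIMUMS = {"python": "3.6", "torch": "1.7.0", "cuda": "10.0"}


def version_tuple(text):
    """Turn a string such as '1.7.0+cu110' into (1, 7, 0)."""
    core = text.split("+", 1)[0]  # drop local build suffixes like '+cu110'
    parts = tuple(int(n) for n in re.findall(r"\d+", core))
    return parts + (0,) * (3 - len(parts))  # pad '3.6' to (3, 6, 0)


def meets_minimum(found, minimum):
    """Compare numerically, so that 1.10.0 counts as newer than 1.7.0."""
    return version_tuple(found) >= version_tuple(minimum)


def check_environment():
    """Return {name: (found_version, ok)} for whatever can be detected."""
    found = {"python": "%d.%d.%d" % sys.version_info[:3]}
    try:
        import torch  # imported lazily so the check also runs before installation
        found["torch"] = torch.__version__
        found["cuda"] = torch.version.cuda  # None for CPU-only builds
    except ImportError:
        found["torch"] = None
    return {
        name: (value, value is not None and meets_minimum(value, MINIMUMS[name]))
        for name, value in found.items()
    }


if __name__ == "__main__":
    for name, (value, ok) in check_environment().items():
        print(name, value, "OK" if ok else "missing or too old")
```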

Dataset

Download the official dataset and convert it to the format this project expects. See here.

Or download the converted dataset:

Evaluation

  1. Model

    Download a trained model and put it in the ./model_zoo folder. See all trained models here.

  2. Config

    Edit the config file in ./configs. The config files in ./configs correspond to the model files in ./models.

    1. Set inference.gpu_id = CUDA_VISIBLE_DEVICES, where CUDA_VISIBLE_DEVICES specifies which GPUs are visible to a CUDA application, e.g., inference.gpu_id = "0,1,2,3".
    2. Set dataset_root = path_to_dataset, where path_to_dataset is the path to the dataset, e.g., dataset_root = "/home/shape_conv/nyu_v2".
  3. Run

    1. Distributed evaluation
    ./tools/dist_test.sh config_path checkpoint_path gpu_num
    • config_path is the path of the config file;
    • checkpoint_path is the path of the model file;
    • gpu_num is the number of GPUs used; note that gpu_num must not exceed the number of GPU IDs listed in inference.gpu_id.

    E.g., to evaluate the shape-conv model on NYU-V2 (40 categories), run:

    ./tools/dist_test.sh configs/nyu/nyu40_deeplabv3plus_resnext101_shape.py model_zoo/nyu40_deeplabv3plus_resnext101_shape.pth 4
    2. Non-distributed evaluation
    python tools/test.py config_path checkpoint_path
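
The constraint on gpu_num can be checked before launching distributed evaluation. The sketch below is a hypothetical wrapper, not part of the repository: gpu_ids and build_dist_test_command are made-up names, and it assumes inference.gpu_id is a comma-separated string as in the example above. It only builds the argument list; running it is left to the caller.

```python
def gpu_ids(gpu_id):
    """Split a string such as "0,1,2,3" into its individual GPU IDs."""
    return [part.strip() for part in gpu_id.split(",") if part.strip()]


def build_dist_test_command(config_path, checkpoint_path, gpu_num, gpu_id):
    """Return the dist_test.sh argument list, or raise if gpu_num is too large.

    gpu_id is the value set as inference.gpu_id in the config file.
    """
    available = len(gpu_ids(gpu_id))
    if not 1 <= gpu_num <= available:
        raise ValueError(
            "gpu_num=%d, but inference.gpu_id lists %d GPU(s)" % (gpu_num, available)
        )
    return ["./tools/dist_test.sh", config_path, checkpoint_path, str(gpu_num)]


if __name__ == "__main__":
    command = build_dist_test_command(
        "configs/nyu/nyu40_deeplabv3plus_resnext101_shape.py",
        "model_zoo/nyu40_deeplabv3plus_resnext101_shape.pth",
        gpu_num=4,
        gpu_id="0,1,2,3",
    )
    # To launch it: import subprocess; subprocess.run(command, check=True)
    print(" ".join(command))
```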

Train

  1. Config

    Edit the config file in ./configs.

    1. Set inference.gpu_id = CUDA_VISIBLE_DEVICES.

      E.g., inference.gpu_id = "0,1,2,3".

    2. Set dataset_root = path_to_dataset.

      E.g., dataset_root = "/home/shape_conv/nyu_v2".

  2. Run

    1. Distributed training
    ./tools/dist_train.sh config_path gpu_num

    E.g., to train the shape-conv model on NYU-V2 (40 categories) with 4 GPUs, run:

    ./tools/dist_train.sh configs/nyu/nyu40_deeplabv3plus_resnext101_shape.py 4
    2. Non-distributed training
    python tools/train.py config_path
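
Because the two config settings above are easy to get wrong, a quick pre-flight check can save a failed launch. The following is a minimal sketch under stated assumptions: the excerpt shows only the two settings named in this section (a real config file contains many more fields), the dict layout of inference is an assumption, and config_problems is a hypothetical helper that is not part of the repository.

```python
import os

# Hypothetical excerpt of a config file; only the two settings from this
# section are shown, and the dict layout is an assumption.
dataset_root = "/home/shape_conv/nyu_v2"
inference = dict(gpu_id="0,1,2,3")


def config_problems(dataset_root, gpu_id):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if not os.path.isdir(dataset_root):
        problems.append("dataset_root is not a directory: %s" % dataset_root)
    ids = [part.strip() for part in gpu_id.split(",")]
    if not all(part.isdigit() for part in ids):
        problems.append("gpu_id must be comma-separated integers: %r" % gpu_id)
    if len(set(ids)) != len(ids):
        problems.append("gpu_id lists the same GPU more than once: %r" % gpu_id)
    return problems


if __name__ == "__main__":
    for problem in config_problems(dataset_root, inference["gpu_id"]):
        print("config problem:", problem)
```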

Result

For more results, see the model zoo.

NYU-V2 (40 categories)

Architecture   Backbone     MS & Flip  Shape Conv  mIOU
DeepLabv3plus  ResNeXt-101  False      False       48.9%
DeepLabv3plus  ResNeXt-101  False      True        50.2%
DeepLabv3plus  ResNeXt-101  True       False       50.3%
DeepLabv3plus  ResNeXt-101  True       True        51.3%

SUN-RGBD

Architecture   Backbone    MS & Flip  Shape Conv  mIOU
DeepLabv3plus  ResNet-101  False      False       46.9%
DeepLabv3plus  ResNet-101  False      True        47.6%
DeepLabv3plus  ResNet-101  True       False       47.6%
DeepLabv3plus  ResNet-101  True       True        48.6%

SID (Stanford Indoor Dataset)

Architecture   Backbone    MS & Flip  Shape Conv  mIOU
DeepLabv3plus  ResNet-101  False      False       54.55%
DeepLabv3plus  ResNet-101  False      True        60.6%
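
The mIOU column in the tables above reports mean intersection-over-union. As a reminder of how that metric is commonly computed, the sketch below builds a confusion matrix from flat label lists and averages the per-class IoU. It follows the standard definition and is not taken from this repository's evaluation code; the function names and the ignore_label value of 255 are assumptions.

```python
def confusion_matrix(y_true, y_pred, num_classes, ignore_label=255):
    """Count (true class, predicted class) pairs, skipping ignored pixels."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        if t == ignore_label:
            continue
        matrix[t][p] += 1
    return matrix


def mean_iou(matrix):
    """Average IoU = TP / (TP + FP + FN) over classes that occur at all."""
    n = len(matrix)
    ious = []
    for c in range(n):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp                        # true c, predicted other
        fp = sum(matrix[r][c] for r in range(n)) - tp   # predicted c, true other
        denom = tp + fp + fn
        if denom:  # skip classes absent from both labels and predictions
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    labels = [0, 0, 1, 1, 255]
    preds = [0, 1, 1, 1, 0]
    print(mean_iou(confusion_matrix(labels, preds, num_classes=2)))
```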

Acknowledgments

This repo was developed based on vedaseg.
