Official PyTorch code of Holistic 3D Scene Understanding from a Single Image with Implicit Representation (CVPR 2021)

Overview

Implicit3DUnderstanding (Im3D) [Project Page]

Holistic 3D Scene Understanding from a Single Image with Implicit Representation

Cheng Zhang, Zhaopeng Cui, Yinda Zhang, Shuaicheng Liu, Bing Zeng, Marc Pollefeys

img.jpg 3dbbox.png recon.png
img.jpg 3dbbox.png recon.png
img.jpg 3dbbox.png recon.png

pipeline

Introduction

This repo contains training, testing, evaluation, visualization code of our CVPR 2021 paper. Specially, the repo contains our PyTorch implementation of the decoder of LDIF, which can be extracted and used in other projects. We are expecting to release a refactored version of our pipeline and a PyTorch implementation of the full LDIF model in the future.

Install

sudo apt install xvfb ninja-build
conda env create -f environment.yml
conda activate Im3D
python project.py build

Demo

  1. Download the pretrained checkpoint and unzip it into out/total3d/20110611514267/

  2. Change current directory to Implicit3DUnderstanding/ and run the demo, which will generate 3D detection result and rendered scene mesh to demo/output/1/

    CUDA_VISIBLE_DEVICES=0 python main.py out/total3d/20110611514267/out_config.yaml --mode demo --demo_path demo/inputs/1
    
  3. In case you want to run it off screen (for example, with SSH)

    CUDA_VISIBLE_DEVICES=0 xvfb-run -a -s "-screen 0 800x600x24" python main.py out/total3d/20110611514267/out_config.yaml --mode demo --demo_path demo/inputs/1
    
  4. If you want to run it interactively, change the last line of demo.py

    scene_box.draw3D(if_save=True, save_path = '%s/recon.png' % (save_path))
    

    to

    scene_box.draw3D(if_save=False, save_path = '%s/recon.png' % (save_path))
    

Data preparation

We follow Total3DUnderstanding to use SUN-RGBD to train our Scene Graph Convolutional Network (SGCN), and use Pix3D to train our Local Implicit Embedding Network (LIEN) with Local Deep Implicit Functions (LDIF) decoder.

Preprocess SUN-RGBD data

Please follow Total3DUnderstanding to directly download the processed train/test data.

In case you prefer processing by yourself or want to evaluate 3D detection with our code (To ultilize the evaluation code of Coop, we modified the data processing code of Total3DUnderstanding to save parameters for transforming the coordinate system from Total3D back to Coop), please follow these steps:

  1. Follow Total3DUnderstanding to download the raw data.

  2. According to issue #6 of Total3DUnderstanding, there are a few typos in json files of SUNRGBD dataset, which is mostly solved by the json loader. However, one typo still needs to be fixed by hand. Please find {"name":""propulsion"tool"} in data/sunrgbd/Dataset/SUNRGBD/kv2/kinect2data/002922_2014-06-26_15-43-16_094959634447_rgbf000089-resize/annotation2Dfinal/index.json and remove ""propulsion.

  3. Process the data by

    python -m utils.generate_data
    

Preprocess Pix3D data

We use a different data process pipeline with Total3DUnderstanding. Please follow these steps to generate the train/test data:

  1. Download the Pix3D dataset to data/pix3d/metadata

  2. Run below to generate the train/test data into 'data/pix3d/ldif'

    python utils/preprocess_pix3d4ldif.py
    

Training and Testing

We use wandb for logging and visualization. You can register a wandb account and login before training by wandb login. In case you don't need to visualize the training process, you can put WANDB_MODE=dryrun before the commands bellow.

Thanks to the well-structured code of Total3DUnderstanding, we use the same method to manage parameters of each experiment with configuration files (configs/****.yaml). We first follow Total3DUnderstanding to pretrain each individual module, then jointly finetune the full model with additional physical violation loss.

Pretraining

We use the pretrained checkpoint of Total3DUnderstanding to load weights for ODN. Please download and rename the checkpoint to out/pretrained_models/total3d/model_best.pth. Other modules can be trained then tested with the following steps:

  1. Train LEN by:

    python main.py configs/layout_estimation.yaml
    

    The pretrained checkpoint can be found at out/layout_estimation/[start_time]/model_best.pth

  2. Train LIEN + LDIF by:

    python main.py configs/ldif.yaml
    

    The pretrained checkpoint can be found at out/ldif/[start_time]/model_best.pth

    The training process is followed with a quick test without ICP and Chamfer distance evaluated. In case you want to align mesh and evaluate the Chamfer distance during testing:

    python main.py configs/ldif.yaml --mode train
    

    The generated object meshes can be found at out/ldif/[start_time]/visualization

  3. Replace the checkpoint directories of LEN and LIEN in configs/total3d_ldif_gcnn.yaml with the checkpoints trained above, then train SGCN by:

    python main.py configs/total3d_ldif_gcnn.yaml
    

    The pretrained checkpoint can be found at out/total3d/[start_time]/model_best.pth

Joint finetune

  1. Replace the checkpoint directory in configs/total3d_ldif_gcnn_joint.yaml with the one trained in the last step above, then train the full model by:

    python main.py configs/total3d_ldif_gcnn_joint.yaml
    

    The trained model can be found at out/total3d/[start_time]/model_best.pth

  2. The training process is followed with a quick test without scene mesh generated. In case you want to generate the scene mesh during testing (which will cost a day on 1080ti due to the unoptimized interface of LDIF CUDA kernel):

    python main.py configs/total3d_ldif_gcnn_joint.yaml --mode train
    

    The testing resaults can be found at out/total3d/[start_time]/visualization

Testing

  1. The training process above already include a testing process. In case you want to test LIEN+LDIF or full model by yourself:

    python main.py out/[ldif/total3d]/[start_time]/model_best.pth --mode test
    

    The results will be saved to out/total3d/[start_time]/visualization and the evaluation metrics will be logged to wandb as run summary.

  2. Evaluate 3D object detection with our modified matlab script from Coop:

    external/cooperative_scene_parsing/evaluation/detections/script_eval_detection.m
    

    Before running the script, please specify the following parameters:

    SUNRGBD_path = 'path/to/SUNRGBD';
    result_path = 'path/to/experiment/results/visualization';
    
  3. Visualize the i-th 3D scene interacively by

    python utils/visualize.py --result_path out/total3d/[start_time]/visualization --sequence_id [i]
    

    or save the 3D detection result and rendered scene mesh by

    python utils/visualize.py --result_path out/total3d/[start_time]/visualization --sequence_id [i] --save_path []
    

    In case you do not have a screen:

    python utils/visualize.py --result_path out/total3d/[start_time]/visualization --sequence_id [i] --save_path [] --offscreen
    

    If nothing goes wrong, you should get results like:

    camera view 3D bbox scene reconstruction

  4. Visualize the detection results from a third person view with our modified matlab script from Coop:

    external/cooperative_scene_parsing/evaluation/vis/show_result.m
    

    Before running the script, please specify the following parameters:

    SUNRGBD_path = 'path/to/SUNRGBD';
    save_root = 'path/to/save/the/detection/results';
    paths = {
        {'path/to/save/detection/results', 'path/to/experiment/results/visualization'}, ...
        {'path/to/save/gt/boundingbox/results'}
    };
    vis_pc = false; % or true, if you want to show cloud point ground truth
    views3d = {'oblique', 'top'}; % choose prefered view
    dosave = true; % or false, please place breakpoints to interactively view the results.
    

    If nothing goes wrong, you should get results like:

    oblique view 3D bbox

About the testing speed

Thanks to the simplicity of LIEN+LDIF, the pretrain takes only about 8 hours on a 1080Ti. However, although we used the CUDA kernel of LDIF to optimize the speed, the file-based interface of the kernel still bottlenecked the mesh reconstruction. This is the main reason why our method takes much more time in object and scene mesh reconstruction. If you want speed over mesh quality, please lower the parameter data.marching_cube_resolution in the configuration file.

Citation

If you find our work and code helpful, please consider cite:

@article{zhang2021holistic,
  title={Holistic 3D Scene Understanding from a Single Image with Implicit Representation},
  author={Zhang, Cheng and Cui, Zhaopeng and Zhang, Yinda and Zeng, Bing and Pollefeys, Marc and Liu, Shuaicheng},
  journal={arXiv preprint arXiv:2103.06422},
  year={2021}
}

We thank the following great works:

  • Total3DUnderstanding for their well-structured code. We construct our network based on their well-structured code.
  • Coop for their dataset. We used their processed dataset with 2D detector prediction.
  • LDIF for their novel representation method. We ported their LDIF decoder from Tensorflow to PyTorch.
  • Graph R-CNN for their scene graph design. We adopted their GCN implemention to construct our SGCN.
  • Occupancy Networks for their modified version of mesh-fusion pipeline.

If you find them helpful, please cite:

@InProceedings{Nie_2020_CVPR,
author = {Nie, Yinyu and Han, Xiaoguang and Guo, Shihui and Zheng, Yujian and Chang, Jian and Zhang, Jian Jun},
title = {Total3DUnderstanding: Joint Layout, Object Pose and Mesh Reconstruction for Indoor Scenes From a Single Image},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}
@inproceedings{huang2018cooperative,
  title={Cooperative Holistic Scene Understanding: Unifying 3D Object, Layout, and Camera Pose Estimation},
  author={Huang, Siyuan and Qi, Siyuan and Xiao, Yinxue and Zhu, Yixin and Wu, Ying Nian and Zhu, Song-Chun},
  booktitle={Advances in Neural Information Processing Systems},
  pages={206--217},
  year={2018}
}	
@inproceedings{genova2020local,
    title={Local Deep Implicit Functions for 3D Shape},
    author={Genova, Kyle and Cole, Forrester and Sud, Avneesh and Sarna, Aaron and Funkhouser, Thomas},
    booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
    pages={4857--4866},
    year={2020}
}
@inproceedings{yang2018graph,
    title={Graph r-cnn for scene graph generation},
    author={Yang, Jianwei and Lu, Jiasen and Lee, Stefan and Batra, Dhruv and Parikh, Devi},
    booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
    pages={670--685},
    year={2018}
}
@inproceedings{mescheder2019occupancy,
  title={Occupancy networks: Learning 3d reconstruction in function space},
  author={Mescheder, Lars and Oechsle, Michael and Niemeyer, Michael and Nowozin, Sebastian and Geiger, Andreas},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  pages={4460--4470},
  year={2019}
}
Owner
Cheng Zhang
Cheng Zhang of UESTC 电子科技大学 通信学院 章程
Cheng Zhang
Real-ESRGAN aims at developing Practical Algorithms for General Image Restoration.

Real-ESRGAN Colab Demo for Real-ESRGAN . Portable Windows executable file. You can find more information here. Real-ESRGAN aims at developing Practica

Xintao 17.2k Jan 02, 2023
Code for the paper "PortraitNet: Real-time portrait segmentation network for mobile device" @ CAD&Graphics2019

PortraitNet Code for the paper "PortraitNet: Real-time portrait segmentation network for mobile device". @ CAD&Graphics 2019 Introduction We propose a

265 Dec 01, 2022
Yolox-bytetrack-sample - Python sample of MOT (Multiple Object Tracking) using YOLOX and ByteTrack

yolox-bytetrack-sample YOLOXとByteTrackを用いたMOT(Multiple Object Tracking)のPythonサン

KazuhitoTakahashi 12 Nov 09, 2022
(JMLR'19) A Python Toolbox for Scalable Outlier Detection (Anomaly Detection)

Python Outlier Detection (PyOD) Deployment & Documentation & Stats Build Status & Coverage & Maintainability & License PyOD is a comprehensive and sca

Yue Zhao 6.6k Jan 03, 2023
PyTorch implementation of our method for adversarial attacks and defenses in hyperspectral image classification.

Self-Attention Context Network for Hyperspectral Image Classification PyTorch implementation of our method for adversarial attacks and defenses in hyp

22 Dec 02, 2022
Automatic detection and classification of Covid severity degree in LUS (lung ultrasound) scans

Final-Project Final project in the Technion, Biomedical faculty, by Mor Ventura, Dekel Brav & Omri Magen. Subproject 1: Automatic Detection of LUS Cha

Mor Ventura 1 Dec 18, 2021
Code/data of the paper "Hand-Object Contact Prediction via Motion-Based Pseudo-Labeling and Guided Progressive Label Correction" (BMVC2021)

Hand-Object Contact Prediction (BMVC2021) This repository contains the code and data for the paper "Hand-Object Contact Prediction via Motion-Based Ps

Takuma Yagi 13 Nov 07, 2022
This repository contains the code for TABS, a 3D CNN-Transformer hybrid automated brain tissue segmentation algorithm using T1w structural MRI scans

This repository contains the code for TABS, a 3D CNN-Transformer hybrid automated brain tissue segmentation algorithm using T1w structural MRI scans. TABS relies on a Res-Unet backbone, with a Vision

6 Nov 07, 2022
An unofficial PyTorch implementation of a federated learning algorithm, FedAvg.

Federated Averaging (FedAvg) in PyTorch An unofficial implementation of FederatedAveraging (or FedAvg) algorithm proposed in the paper Communication-E

Seok-Ju Hahn 123 Jan 06, 2023
Compute execution plan: A DAG representation of work that you want to get done. Individual nodes of the DAG could be simple python or shell tasks or complex deeply nested parallel branches or embedded DAGs themselves.

Hello from magnus Magnus provides four capabilities for data teams: Compute execution plan: A DAG representation of work that you want to get done. In

12 Feb 08, 2022
An implementation of "MixHop: Higher-Order Graph Convolutional Architectures via Sparsified Neighborhood Mixing" (ICML 2019).

MixHop and N-GCN ⠀ A PyTorch implementation of "MixHop: Higher-Order Graph Convolutional Architectures via Sparsified Neighborhood Mixing" (ICML 2019)

Benedek Rozemberczki 393 Dec 13, 2022
A Python Reconnection Tool for alt:V

altv-reconnect What? It invokes a reconnect in the altV Client Dev Console. You get to determine when your local client should reconnect when developi

8 Jun 30, 2022
Wider-Yolo Kütüphanesi ile Yüz Tespit Uygulamanı Yap

WIDER-YOLO : Yüz Tespit Uygulaması Yap Wider-Yolo Kütüphanesinin Kullanımı 1. Wider Face Veri Setini İndir Train Dataset Val Dataset Test Dataset Not:

Kadir Nar 6 Aug 22, 2022
Code for Ditto: Building Digital Twins of Articulated Objects from Interaction

Ditto: Building Digital Twins of Articulated Objects from Interaction Zhenyu Jiang, Cheng-Chun Hsu, Yuke Zhu CVPR 2022, Oral Project | arxiv News 2022

UT Robot Perception and Learning Lab 78 Dec 22, 2022
This application explain how we can easily integrate Deepface framework with Python Django application

deepface_suite This application explain how we can easily integrate Deepface framework with Python Django application install redis cache install requ

Mohamed Naji Aboo 3 Apr 18, 2022
SimDeblur is a simple framework for image and video deblurring, implemented by PyTorch

SimDeblur (Simple Deblurring) is an open source framework for image and video deblurring toolbox based on PyTorch, which contains most deep-learning based state-of-the-art deblurring algorithms. It i

220 Jan 07, 2023
The official implementation for ACL 2021 "Challenges in Information Seeking QA: Unanswerable Questions and Paragraph Retrieval".

Code for "Challenges in Information Seeking QA: Unanswerable Questions and Paragraph Retrieval" (ACL 2021, Long) This is the repository for baseline m

Akari Asai 25 Oct 30, 2022
codes for Image Inpainting with External-internal Learning and Monochromic Bottleneck

Image Inpainting with External-internal Learning and Monochromic Bottleneck This repository is for the CVPR 2021 paper: 'Image Inpainting with Externa

97 Nov 29, 2022
This project is a re-implementation of MASTER: Multi-Aspect Non-local Network for Scene Text Recognition by MMOCR

This project is a re-implementation of MASTER: Multi-Aspect Non-local Network for Scene Text Recognition by MMOCR,which is an open-source toolbox based on PyTorch. The overall architecture will be sh

Jianquan Ye 82 Nov 17, 2022
Code to reproduce the results for Statistically Robust Neural Network Classification, published in UAI 2021

Code to reproduce the results for Statistically Robust Neural Network Classification, published in UAI 2021

1 Jun 02, 2022