MVS2D: Efficient Multi-view Stereo via Attention-Driven 2D Convolutions

Last update: Jan 04, 2023

Project Page | Paper

If you find our work useful for your research, please consider citing our paper:

@article{DBLP:journals/corr/abs-2104-13325,
  author    = {Zhenpei Yang and
               Zhile Ren and
               Qi Shan and
               Qixing Huang},
  title     = {{MVS2D:} Efficient Multi-view Stereo via Attention-Driven 2D Convolutions},
  journal   = {CoRR},
  volume    = {abs/2104.13325},
  year      = {2021},
  url       = {https://arxiv.org/abs/2104.13325},
  eprinttype = {arXiv},
  eprint    = {2104.13325},
  timestamp = {Tue, 04 May 2021 15:12:43 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2104-13325.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

✏️ Changelog

Nov 27 2021

Initial release. Note that our released code achieves better results than those reported in the initial arXiv preprint. In addition, we include an evaluation on the DTU dataset. We will update our paper soon.

⚙️ Installation

The code has been tested with CUDA 10.1. Use the following commands to install the dependencies:

conda create --name mvs2d python=3.7
conda activate mvs2d

pip install -r requirements.txt

Once you have downloaded all data and pretrained models, the folder structure should look like the following. Download links are provided in the Download part of each dataset section below.

.
├── configs
├── datasets
├── demo
├── networks
├── scripts
├── pretrained_model
│   ├── demon
│   ├── dtu
│   └── scannet
├── data
│   ├── DeMoN
│   ├── DTU_hr
│   ├── SampleSet
│   ├── ScanNet
│   └── ScanNet_3_frame_jitter_pose.npy
├── splits
│   ├── DeMoN_samples_test_2_frame.npy
│   ├── DeMoN_samples_train_2_frame.npy
│   ├── ScanNet_3_frame_test.npy
│   ├── ScanNet_3_frame_train.npy
│   └── ScanNet_3_frame_val.npy
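To check a local checkout against the tree above, a small stdlib-only script like the following can list what is still missing. It is a convenience sketch, not part of the repository: the `missing_paths` helper and the `EXPECTED` list are hypothetical names, and the script only checks that each path exists, not that its contents are complete.

```python
from pathlib import Path

# Paths copied from the folder tree above, relative to the repository root.
EXPECTED = [
    "configs", "datasets", "demo", "networks", "scripts",
    "pretrained_model/demon", "pretrained_model/dtu", "pretrained_model/scannet",
    "data/DeMoN", "data/DTU_hr", "data/SampleSet", "data/ScanNet",
    "data/ScanNet_3_frame_jitter_pose.npy",
    "splits/DeMoN_samples_test_2_frame.npy",
    "splits/DeMoN_samples_train_2_frame.npy",
    "splits/ScanNet_3_frame_test.npy",
    "splits/ScanNet_3_frame_train.npy",
    "splits/ScanNet_3_frame_val.npy",
]


def missing_paths(root=".", expected=EXPECTED):
    """Return the entries of `expected` that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_paths()
    print("Layout complete." if not missing else "Missing:\n  " + "\n  ".join(missing))
```

Run it from the repository root; anything it prints under "Missing:" still needs to be downloaded or extracted.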

🎬 Demo

After downloading the pretrained models for ScanNet, run the following command to make a prediction on sample data:

python demo.py --cfg configs/scannet/release.conf

The output is saved as demo.png.

⏳ Training & Testing

We use 4 NVIDIA V100 GPUs for training. You may need to modify CUDA_VISIBLE_DEVICES and the batch size to accommodate your GPU resources.
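When moving from 4 GPUs to fewer, one common heuristic is to keep the per-GPU share of the batch fixed and scale the total batch linearly. The sketch below assumes that heuristic; it does not know where the training scripts set the batch size, and `visible_gpu_count` and `scaled_batch_size` are hypothetical helpers, not part of the repository.

```python
import os


def visible_gpu_count(env=None):
    """Count the device IDs listed in CUDA_VISIBLE_DEVICES, e.g. "0,1" -> 2.

    Returns None if the variable is unset (then every GPU is visible).
    Simplification: CUDA's rule that parsing stops at the first invalid ID
    is not modelled.
    """
    env = os.environ if env is None else env
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return None
    return len([dev for dev in value.split(",") if dev.strip()])


def scaled_batch_size(reference_batch, reference_gpus, gpus):
    """Scale a batch size tuned for `reference_gpus` GPUs linearly to `gpus` GPUs.

    Assumes memory use per sample is constant, so the per-GPU share stays fixed.
    Always returns at least 1.
    """
    per_gpu = reference_batch / reference_gpus
    return max(1, int(per_gpu * gpus))
```

For example, with a placeholder reference batch of 16 on 4 GPUs, `scaled_batch_size(16, 4, 2)` gives 8 for two GPUs. Other hyperparameters such as the learning rate may also need adjusting when the batch size changes.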

ScanNet

Download

data 🔗 split 🔗 pretrained models 🔗 noisy pose 🔗

Training

First, download and extract the ScanNet training data and split. Then run the following command to train our model:

bash scripts/scannet/train.sh

To train the multi-scale attention model, add --robust 1 to the training command in scripts/scannet/train.sh.

To train our model with noisy input poses, add --perturb_pose 1 to the training command in scripts/scannet/train.sh.
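The README does not describe the noise model behind --perturb_pose or the precomputed ScanNet_3_frame_jitter_pose.npy file. As an illustration only, the sketch below jitters a camera pose (rotation R, translation t) with a small random rotation about a random axis plus Gaussian translation noise. The noise model, the standard deviations, and the helper names are assumptions, not the authors' implementation.

```python
import math
import random


def axis_angle_to_matrix(axis, angle):
    """Rodrigues' formula: rotation of `angle` radians about unit vector `axis`."""
    x, y, z = axis
    c, s = math.cos(angle), math.sin(angle)
    C = 1.0 - c
    return [
        [c + x * x * C, x * y * C - z * s, x * z * C + y * s],
        [y * x * C + z * s, c + y * y * C, y * z * C - x * s],
        [z * x * C - y * s, z * y * C + x * s, c + z * z * C],
    ]


def matmul3(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def perturb_pose(R, t, rot_std_deg=1.0, trans_std=0.01, rng=random):
    """Return (R', t') with a small random rotation and translation offset.

    Hypothetical noise model: angle ~ N(0, rot_std_deg) degrees about a
    uniformly random axis; translation noise ~ N(0, trans_std) per axis.
    """
    # A normalized Gaussian vector gives a uniformly distributed unit axis.
    v = [rng.gauss(0.0, 1.0) for _ in range(3)]
    norm = math.sqrt(sum(c * c for c in v)) or 1.0
    axis = [c / norm for c in v]
    angle = math.radians(rng.gauss(0.0, rot_std_deg))
    R_new = matmul3(axis_angle_to_matrix(axis, angle), R)
    t_new = [ti + rng.gauss(0.0, trans_std) for ti in t]
    return R_new, t_new
```

Pass a seeded `random.Random(seed)` as `rng` to make the perturbation reproducible across runs.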

Testing

First, download and extract the data, split, and pretrained models.

Then run:

bash scripts/scannet/test.sh

You should get results similar to the following:

abs_rel	sq_rel	log10	rmse	rmse_log	a1	a2	a3	abs_diff	abs_diff_median	thre1	thre3	thre5
0.059	0.016	0.026	0.157	0.084	0.964	0.995	0.999	0.108	0.079	0.856	0.974	0.996
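Most of these columns follow the standard depth-estimation metrics. The sketch below computes them for flat lists of ground-truth and predicted depths using the usual definitions; it is not the repository's evaluation code. Reading abs_diff and abs_diff_median as mean and median absolute error is inferred from the column names, and thre1/thre3/thre5 are left out because their definition is not given here. Any depth-range cropping or masking the real evaluation applies is also omitted.

```python
import math
import statistics


def depth_metrics(gt, pred):
    """Standard depth metrics over valid pairs (gt > 0 and pred > 0).

    a1/a2/a3: fraction of pixels with max(gt/pred, pred/gt) below 1.25, 1.25^2, 1.25^3.
    """
    pairs = [(g, p) for g, p in zip(gt, pred) if g > 0 and p > 0]
    if not pairs:
        raise ValueError("no valid depth pairs")
    n = len(pairs)
    abs_err = [abs(g - p) for g, p in pairs]
    ratio = [max(g / p, p / g) for g, p in pairs]
    return {
        "abs_rel": sum(abs(g - p) / g for g, p in pairs) / n,
        "sq_rel": sum((g - p) ** 2 / g for g, p in pairs) / n,
        "log10": sum(abs(math.log10(g) - math.log10(p)) for g, p in pairs) / n,
        "rmse": math.sqrt(sum((g - p) ** 2 for g, p in pairs) / n),
        "rmse_log": math.sqrt(sum((math.log(g) - math.log(p)) ** 2 for g, p in pairs) / n),
        "a1": sum(r < 1.25 for r in ratio) / n,
        "a2": sum(r < 1.25 ** 2 for r in ratio) / n,
        "a3": sum(r < 1.25 ** 3 for r in ratio) / n,
        "abs_diff": sum(abs_err) / n,
        "abs_diff_median": statistics.median(abs_err),
    }
```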

SUN3D/RGBD/Scenes11

Download

data 🔗 split 🔗 pretrained models 🔗

Training

First, download and extract the DeMoN training data and split. Then run the following command to train our model:

bash scripts/demon/train.sh

Testing

First, download and extract the data, split, and pretrained models.

Then run:

bash scripts/demon/test.sh

You should get results similar to the following:

dataset rgbd: 160

abs_rel	sq_rel	log10	rmse	rmse_log	a1	a2	a3	abs_diff	abs_diff_median	thre1	thre3	thre5
0.082	0.165	0.047	0.440	0.147	0.921	0.939	0.948	0.325	0.284	0.753	0.894	0.933

dataset scenes11: 256

abs_rel	sq_rel	log10	rmse	rmse_log	a1	a2	a3	abs_diff	abs_diff_median	thre1	thre3	thre5
0.046	0.080	0.018	0.439	0.107	0.976	0.989	0.993	0.155	0.058	0.822	0.945	0.979

dataset sun3d: 160

abs_rel	sq_rel	log10	rmse	rmse_log	a1	a2	a3	abs_diff	abs_diff_median	thre1	thre3	thre5
0.099	0.055	0.044	0.304	0.137	0.893	0.970	0.993	0.224	0.171	0.649	0.890	0.969

-> Done!

depth

abs_rel	sq_rel	log10	rmse	rmse_log	a1	a2	a3	abs_diff	abs_diff_median	thre1	thre3	thre5
0.071	0.096	0.033	0.402	0.127	0.938	0.970	0.981	0.222	0.152	0.755	0.915	0.963
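The combined depth row appears consistent with averaging the three per-dataset rows weighted by their sample counts, assuming the number after each dataset name (160, 256, 160) is that count. The check below reproduces this for a subset of columns. It is an arithmetic sketch, not the repository's aggregation code.

```python
# Per-dataset numbers copied from the log above (a subset of the columns).
# Assumption: the integer after each dataset name is its number of test samples.
PER_DATASET = {
    "rgbd": (160, {"abs_rel": 0.082, "sq_rel": 0.165, "log10": 0.047,
                   "rmse": 0.440, "a1": 0.921, "abs_diff": 0.325}),
    "scenes11": (256, {"abs_rel": 0.046, "sq_rel": 0.080, "log10": 0.018,
                       "rmse": 0.439, "a1": 0.976, "abs_diff": 0.155}),
    "sun3d": (160, {"abs_rel": 0.099, "sq_rel": 0.055, "log10": 0.044,
                    "rmse": 0.304, "a1": 0.893, "abs_diff": 0.224}),
}


def sample_weighted_mean(per_dataset):
    """Average each metric over datasets, weighting by sample count."""
    total = sum(count for count, _ in per_dataset.values())
    names = next(iter(per_dataset.values()))[1].keys()
    return {
        name: sum(count * metrics[name] for count, metrics in per_dataset.values()) / total
        for name in names
    }


if __name__ == "__main__":
    for name, value in sample_weighted_mean(PER_DATASET).items():
        print(f"{name}\t{value:.3f}")
```

Up to rounding of the three-decimal inputs, the results match the combined row. For example, abs_rel comes out at 0.071, a1 at 0.938, and rmse at 0.402.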

DTU

Download

data 🔗 eval data 🔗 pretrained models 🔗

Training

First, download and extract the DTU training data. Then run the following command to train our model:

bash scripts/dtu/train.sh

Testing

First, download and extract the DTU evaluation data and pretrained models.

The following command performs three steps:
1. Generate depth predictions on the DTU test set.
2. Fuse the depth predictions into a final point cloud.
3. Evaluate the predicted point cloud.

Note that we re-implemented the original MATLAB evaluation of the DTU dataset in Python.

bash scripts/dtu/test.sh
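The first operation in step 2 is to lift every predicted depth map into 3D. The sketch below back-projects a depth map through pinhole intrinsics K, using X = d · K⁻¹ [u, v, 1]ᵀ, and then moves the points into world coordinates. It assumes R and t map camera to world coordinates; DTU cameras are often stored as world-to-camera extrinsics, which would need inverting first. The fusion code in this repository builds on PatchMatchNet, which also filters points by cross-view consistency before merging. That filtering is not shown here, and the helper names are hypothetical.

```python
def backproject(depth, K):
    """Lift a depth map (list of rows) to camera-frame 3D points.

    depth[v][u] is the depth at pixel (u, v); K = [[fx, 0, cx], [0, fy, cy], [0, 0, 1]].
    Pixels with depth <= 0 are treated as invalid and skipped.
    """
    fx, fy = K[0][0], K[1][1]
    cx, cy = K[0][2], K[1][2]
    points = []
    for v, row in enumerate(depth):
        for u, d in enumerate(row):
            if d <= 0:
                continue
            points.append(((u - cx) * d / fx, (v - cy) * d / fy, d))
    return points


def camera_to_world(points, R, t):
    """Apply X_w = R X_c + t, with R and t mapping camera to world coordinates."""
    return [tuple(sum(R[i][k] * p[k] for k in range(3)) + t[i] for i in range(3)) for p in points]
```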

You should get results similar to the following:

Acc 0.4051747996189477
Comp 0.2776021161518006
F-score 0.34138845788537414
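The printed F-score equals (Acc + Comp) / 2 to floating-point precision. The sum 0.40517... + 0.27760... is 0.68278..., and half of that is 0.34139. It therefore appears to be the mean of accuracy and completeness, the value usually reported as "overall" on DTU, where distances are in millimetres.

The sketch below illustrates that style of scoring on small point lists. It makes several assumptions:
- It uses brute-force nearest neighbours. Real evaluations use a spatial index.
- It treats distances of max_dist = 20 or more as outliers and drops them.
- It omits the observation masks and downsampling of the official protocol.

It is not the repository's evaluation script.

```python
import math


def _nearest(p, cloud):
    """Brute-force Euclidean distance from point p to its nearest neighbour in cloud."""
    return min(math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q))) for q in cloud)


def _inlier_mean(dists, max_dist):
    """Mean of the distances below max_dist (larger ones are treated as outliers)."""
    kept = [d for d in dists if d < max_dist]
    if not kept:
        raise ValueError("no distances below max_dist")
    return sum(kept) / len(kept)


def dtu_style_scores(pred, gt, max_dist=20.0):
    """Return (acc, comp, overall) for two point lists in the same units.

    acc:  mean distance from predicted points to the ground truth
    comp: mean distance from ground-truth points to the prediction
    overall = (acc + comp) / 2
    """
    acc = _inlier_mean([_nearest(p, gt) for p in pred], max_dist)
    comp = _inlier_mean([_nearest(g, pred) for g in gt], max_dist)
    return acc, comp, (acc + comp) / 2
```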

Acknowledgement

The fusion code for the DTU dataset is heavily built upon PatchMatchNet.
