Code for the SIGGRAPH 2021 paper "Consistent Depth of Moving Objects in Video".

Last update: Jan 05, 2023

Overview

Consistent Depth of Moving Objects in Video

This repository contains training code for the SIGGRAPH 2021 paper "Consistent Depth of Moving Objects in Video".

This is not an officially supported Google product.

Installing Dependencies

We provide both conda and pip installation options for the dependencies.

To install with conda, run:

conda create --name dynamic-video-depth --file ./dependencies/conda_packages.txt

To install with pip, run:

pip install -r ./dependencies/requirements.txt

Training

We provide two preprocessed video tracks from the DAVIS dataset. To download the pre-trained single-image depth prediction checkpoints, as well as the example data, run:

bash ./scripts/download_data_and_depth_ckpt.sh

This script will automatically download and unzip the checkpoints and data. If you would like to download manually

To train using the example data, run:

bash ./experiments/davis/train_sequence.sh 0 --track_id dog

The first argument is the GPU ID used for training, and --track_id is the name of the track ('dog' and 'train' are provided).

After training, the results should look like:

(Figure: side-by-side comparison of the input Video, Our Depth, and Single Image Depth.)

Dataset Preparation

To help with generating custom datasets for training, we provide examples of preparing the dataset from DAVIS and from two ShutterStock sequences, which are showcased in our paper.

The general workflow for preprocessing the dataset is:

Calibrate the scale of camera translation, transform the camera matrices into camera-to-world convention, and save as individual files.
Calculate flow between pairs of frames, as well as occlusion estimates.
Pack flow and per-frame data into training batches.
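The first step can be illustrated with a minimal sketch. It assumes (this is not confirmed by the repository) that the triangulation gives world-to-camera extrinsics as a rotation R and translation t, so that x_cam = R x_world + t, and that the scale calibration amounts to multiplying translations by one scalar. Inverting a rigid transform then gives the camera-to-world pose R_c2w = R^T and t_c2w = -R^T t. All function names are hypothetical.

```python
# Hypothetical helpers: matrices are nested lists so the sketch has no dependencies.

def transpose3(m):
    """Transpose a 3x3 matrix."""
    return [[m[j][i] for j in range(3)] for i in range(3)]


def matvec3(m, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]


def invert_pose_to_c2w(R, t, scale=1.0):
    """Turn a world-to-camera pose (x_cam = R x_world + t) into a 4x4 camera-to-world matrix.

    `scale` multiplies the translation, i.e. a global rescaling of the scene
    (our assumed form of the translation-scale calibration).
    """
    R_c2w = transpose3(R)  # the inverse of a rotation is its transpose
    # the camera centre in world coordinates is -R^T t; rescale it with the scene
    t_c2w = [-c * scale for c in matvec3(R_c2w, t)]
    rows = [R_c2w[i] + [t_c2w[i]] for i in range(3)]
    return rows + [[0.0, 0.0, 0.0, 1.0]]
```

Because a global rescaling multiplies both t and the camera centre by the same factor, it does not matter whether the scale is applied before or after the inversion.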

Example code for these steps is provided in ./scripts/preprocess.
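Occlusion estimates such as those in the second step of the workflow are commonly computed with a forward-backward flow consistency check; whether the provided scripts use exactly this test is an assumption. A pixel p is treated as occluded when following the forward flow F and then the backward flow B does not return close to p, i.e. when |F(p) + B(p + F(p))| exceeds a threshold. This sketch uses nearest-neighbour lookup, a hypothetical threshold of 1 pixel, and plain nested lists of (dx, dy) vectors.

```python
def occlusion_mask(fwd, bwd, thresh=1.0):
    """Forward-backward consistency check.

    fwd[y][x] is the (dx, dy) flow from frame i to frame j; bwd[y][x] is the flow
    from frame j back to frame i. Returns a mask where True marks pixels treated
    as occluded (or leaving the image). `thresh` is in pixels and is illustrative.
    """
    h, w = len(fwd), len(fwd[0])
    mask = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = fwd[y][x]
            # nearest-neighbour position of this pixel in frame j
            tx, ty = int(round(x + dx)), int(round(y + dy))
            if not (0 <= tx < w and 0 <= ty < h):
                mask[y][x] = True  # flow leaves the image: no backward estimate
                continue
            bx, by = bwd[ty][tx]
            # a consistent round trip should give forward + backward close to zero
            err = ((dx + bx) ** 2 + (dy + by) ** 2) ** 0.5
            mask[y][x] = err > thresh
    return mask
```

Bilinear sampling of the backward flow and a threshold that grows with flow magnitude are common refinements of this basic test.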

We provide the triangulation results here and here. You can download them in a single script by running:

bash ./scripts/download_triangulation_files.sh

Davis data preparation

Download the DAVIS dataset here, and unzip it under ./datafiles.
Run python ./scripts/preprocess/davis/generate_frame_midas.py. This requires trimesh to be installed (pip install trimesh should do the trick). This script projects the triangulated 3D points to calibrate camera translation scales.
Run python ./scripts/preprocess/davis/generate_flows.py to generate optical flows between pairs of images. This stage requires RAFT, which is included as a submodule in this repo.
Run python ./scripts/preprocess/davis/generate_sequence_midas.py to pack camera calibrations and images into training batches.
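The DAVIS steps above only say that generate_frame_midas.py projects the triangulated 3D points to calibrate camera translation scales. One plausible way to do this, sketched here purely as an assumption (the function names, the pinhole model without skew, and the median-ratio estimator are ours), is to project each point into a frame, compare its triangulated depth with the predicted depth at the projected pixel, and take the median ratio as the scale for the camera translations. The sketch assumes the prediction has already been converted to depth that is correct up to one global scale; MiDaS itself outputs relative inverse depth, so this is a simplification.

```python
import statistics


def project(K, R, t, X):
    """Project world point X using a world-to-camera pose (R, t) and pinhole intrinsics K.

    Returns (u, v, z) with z the depth in the camera frame, or None if the point
    is not in front of the camera.
    """
    xc = [sum(R[i][k] * X[k] for k in range(3)) + t[i] for i in range(3)]
    z = xc[2]
    if z <= 0:
        return None
    u = K[0][0] * xc[0] / z + K[0][2]
    v = K[1][1] * xc[1] / z + K[1][2]
    return u, v, z


def estimate_scale(K, R, t, points, depth_map):
    """Median ratio of predicted depth to triangulated depth over visible points.

    depth_map[row][col] is assumed to be depth correct up to one global scale.
    Raises statistics.StatisticsError if no point projects into the image.
    """
    h, w = len(depth_map), len(depth_map[0])
    ratios = []
    for X in points:
        proj = project(K, R, t, X)
        if proj is None:
            continue  # behind the camera
        u, v, z = proj
        col, row = int(round(u)), int(round(v))
        if 0 <= row < h and 0 <= col < w:
            ratios.append(depth_map[row][col] / z)
    return statistics.median(ratios)  # the median is robust to outlier points
```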

ShutterStock Videos

Download the ShutterStock videos here and here.
Extract the frames of the videos as images, put them under ./datafiles/shutterstock/images, and rename them to match the file names in ./datafiles/shutterstock/triangulation. Note that not all frames are triangulated; the timestamps of valid frames are recorded in the triangulation file names.
Run python ./scripts/preprocess/shutterstock/generate_frame_midas.py to pack per-frame data.
Run python ./scripts/preprocess/shutterstock/generate_flows.py to generate optical flows between pairs of images.
Run python ./scripts/preprocess/shutterstock/generate_sequence_midas.py to pack flows and per-frame data into training batches.
An example training script is located at ./experiments/shutterstock/train_sequence.sh.
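Both preprocessing pipelines compute flow between pairs of frames. The training command reproduced in the comments below passes --gaps 1,2,4,6,8, which suggests that pairs are formed at several temporal offsets. The following hypothetical sketch enumerates forward pairs for a given set of gaps; whether the scripts use this exact ordering or also include backward pairs is an assumption.

```python
def parse_gaps(arg):
    """Parse a comma-separated string such as "1,2,4,6,8" into a tuple of ints."""
    return tuple(int(g) for g in arg.split(","))


def frame_pairs(num_frames, gaps=(1, 2, 4, 6, 8)):
    """List (i, j) frame-index pairs with j - i in `gaps`, staying inside the clip."""
    pairs = []
    for gap in gaps:
        for i in range(num_frames - gap):
            pairs.append((i, i + gap))
    return pairs
```

For example, a 10-frame clip with gaps 1,2,4,6,8 yields 9 + 8 + 6 + 4 + 2 = 29 pairs, so larger gaps contribute fewer pairs.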

Comments

Question about the pre-processing

Can you provide the code for the preprocessing part? For dynamic videos, how do you get accurate camera poses and intrinsics (K)? I see you use DAVIS as an example; I want to know how to handle the other videos in this dataset.

opened by Robertwyq 11
Parameter finetuning vs Output finetuning

It seems that running gradient descent for the depth prediction network makes up the majority of the runtime of this method. The current MiDaS implementation (v3?) contains 1.3 GB of parameters, most of which are for the DPT-Large (https://github.com/isl-org/DPT) backbone.

In your research, did you experiment with performance differences between 'parameter finetuning' and just simple 'output finetuning' for the depth predictions (like as discussed in the GLNet paper (https://arxiv.org/pdf/1907.05820.pdf))?

I would also be curious whether, as a middle ground, fine-tuning only the 'head' of the MiDaS network would be sufficient, leaving the much larger set of backbone parameters frozen.

Thanks!

opened by carsonswope 0
How to get the triangulation files for customized videos?

Thanks for sharing this great work!

I was wondering how to obtain the triangulation files when using my own videos. For example, the dog.intrinsics.txt, dog.matrices.txt, and the dog.obj.

Are they calculated from colmap? Could you please provide some instructions to get them?

opened by Cogito2012 0
Question about COLMAP parameter settings and whether resizing images requires converting the camera pose

This is very useful work, thanks. I used colmap automatic_reconstructor --camera_model FULL_OPENCV to process the dog sequence from DAVIS to get the camera poses, then replaced the contents of ./datafiles/DAVIS/triangulation/ with the results, leaving the rest of the training code unchanged. However, the depth result for each frame became much worse. How should the COLMAP preprocessing parameters be set? In addition, images are resized to a smaller resolution during training; does the camera pose information obtained by COLMAP need to be transformed to account for the resize?
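As general background rather than an answer from the maintainers: under a pinhole model, resizing an image leaves the camera extrinsics (rotation and translation) unchanged, but the intrinsics must be scaled by the resize factors. A minimal sketch, with a hypothetical function name:

```python
def scale_intrinsics(fx, fy, cx, cy, old_size, new_size):
    """Rescale pinhole intrinsics when an image is resized.

    old_size and new_size are (width, height). The camera pose is unaffected by
    resizing; only the focal lengths and principal point change.
    """
    sx = new_size[0] / old_size[0]
    sy = new_size[1] / old_size[1]
    return fx * sx, fy * sy, cx * sx, cy * sy
```

This uses the simple convention that pixel coordinates scale linearly with the image; under a pixel-centre convention the principal point is sometimes adjusted as (c + 0.5) * s - 0.5 instead.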

opened by mayunchao1994 2
Question about triangulation results file

This is a great project; thanks for your work. I downloaded the triangulation results from your link, but I only found dog.intrinsics.txt and train.intrinsics.txt. The DAVIS-2017-trainval-Full-Resolution.zip file contains 90 files. I was wondering if you could share all the triangulation files for the DAVIS and ShutterStock datasets. Thanks very much.

opened by aiforworlds 0

Can not reproduce training result

As mentioned in issue #9, "DAVIS datafiles uncomplete": "datafiles.tar in provided "Google Drive" download link consists only triangulation data. There are no "JPEGImages/1080p" and "Annotation//1080p" folders that "python ./scripts/preprocess/davis/generate_frame_midas.py" refers to." So I manually downloaded the missing data from https://data.vision.ee.ethz.ch/csergi/share/davis/DAVIS-2017-Unsupervised-trainval-Full-Resolution.zip. After that, the structure was as follows:

├── datafiles
    ├── DAVIS
        ├── Annotations  --- missing in supplied download links, downloaded manually from DAVIS datasets 
            ├── 1080p
                ├── dog
                ├── train
        ├── JPEGImages  --- missing in supplied download links, downloaded manually from DAVIS datasets 
            ├── 1080p
                ├── dog
                ├── train
        ├── triangulation -- data from supplied link
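Before running the DAVIS preprocessing, a quick check of this layout can catch missing folders early. The paths below are taken from the variables quoted in issue #9 (data_list_root, camera_path, mask_path); the helper name and the per-track subfolder check are illustrative assumptions.

```python
from pathlib import Path

# Paths referenced by ./scripts/preprocess/davis/generate_frame_midas.py (as quoted in issue #9).
DATA_LIST_ROOT = Path("./datafiles/DAVIS/JPEGImages/1080p")
CAMERA_PATH = Path("./datafiles/DAVIS/triangulation")
MASK_PATH = Path("./datafiles/DAVIS/Annotations/1080p")


def missing_davis_paths(root=Path("."), tracks=("dog", "train")):
    """Return the expected DAVIS directories that do not exist under `root`."""
    expected = [root / CAMERA_PATH]
    for track in tracks:
        expected.append(root / DATA_LIST_ROOT / track)  # RGB frames per track
        expected.append(root / MASK_PATH / track)       # segmentation masks per track
    return [str(p) for p in expected if not p.is_dir()]


if __name__ == "__main__":
    missing = missing_davis_paths()
    print("All DAVIS folders found." if not missing else "Missing:\n" + "\n".join(missing))
```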

Only after that could I successfully perform all of the steps suggested in "Davis data preparation":

Run python ./scripts/preprocess/davis/generate_frame_midas.py.
Run python ./scripts/preprocess/davis/generate_flows.py
Run python ./scripts/preprocess/davis/generate_sequence_midas.py

However, I still couldn't reproduce the presented result when running: bash ./experiments/davis/train_sequence.sh 0 --track_id dog

Output & Stacktrace:


D:\dynamic-video-depth-main>bash ./experiments/davis/train_sequence.sh 0 --track_id dog
python train.py --net scene_flow_motion_field --dataset davis_sequence --track_id train --log_time --epoch_batches 2000 --epoch 20 --lr 1e-6 --html_logger --vali_batches 150 --batch_size 1 --optim adam --vis_batches_vali 4 --vis_every_vali 1 --vis_every_train 1 --vis_batches_train 5 --vis_at_start --tensorboard --gpu 0 --save_net 1 --workers 4 --one_way --loss_type l1 --l1_mul 0 --acc_mul 1 --disp_mul 1 --warm_sf 5 --scene_lr_mul 1000 --repeat 1 --flow_mul 1 --sf_mag_div 100 --time_dependent --gaps 1,2,4,6,8 --midas --use_disp --logdir './checkpoints/davis/sequence/' --suffix 'track_{track_id}_{loss_type}_wreg_{warm_reg}_acc_{acc_mul}_disp_{disp_mul}_flowmul_{flow_mul}_time_{time_dependent}_CNN_{use_cnn}_gap_{gaps}_Midas_{midas}_ud_{use_disp}' --test_template './experiments/davis/test_cmd.txt' --force_overwrite --track_id dog
  File "train.py", line 106
    str_warning, f'ignoring the gpu set up in opt: {opt.gpu}. Will use all gpus in each node.')
                                                                                             ^
SyntaxError: invalid syntax

I noticed that there is no ./checkpoints folder.

A similar issue has been mentioned in issue #8, "SyntaxError: invalid syntax".

Specs:
Windows 10
Anaconda: conda 4.11.0
Python 3.7.10
GPU: 12 GB Quadro M6000
All specified dependencies, including RAFT, are installed.
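One possible cause, offered as a hedged suggestion rather than a confirmed diagnosis: f-string literals such as the one on line 106 of train.py are a SyntaxError on Python versions older than 3.6, so the error may mean that the python invoked from bash is not the Python 3.7.10 interpreter from the conda environment. Running this small check with the same python command that train_sequence.sh uses shows which interpreter is picked up; the function name is hypothetical, and the script deliberately avoids f-strings so it also runs on old interpreters.

```python
import sys


def supports_fstrings(version_info=sys.version_info):
    """f-strings (PEP 498) were added in Python 3.6."""
    return tuple(version_info[:2]) >= (3, 6)


if __name__ == "__main__":
    print(sys.executable, sys.version.split()[0])
    if not supports_fstrings():
        print("This interpreter cannot parse f-strings; train.py needs Python 3.6+.")
```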

opened by makemota 0

DAVIS datafiles uncomplete?
"datafiles.tar" in the provided "Google Drive" download link contains only triangulation data. There are no "JPEGImages/1080p" and "Annotations/1080p" folders, which "python ./scripts/preprocess/davis/generate_frame_midas.py" refers to:

data_list_root = "./datafiles/DAVIS/JPEGImages/1080p"
camera_path = "./datafiles/DAVIS/triangulation"
mask_path = './datafiles/DAVIS/Annotations/1080p'
opened by semel1 1

Releases (sig2021_code_release)

sig2021_code_release (Aug 11, 2021)

Source code (tar.gz)
Source code (zip)

Owner

Google

Google ❤️ Open Source

GitHub Repository https://dynamic-video-depth.github.io
