Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language (NeurIPS 2021)

Last update: Sep 20, 2022

VRDP (NeurIPS 2021)

Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language
Mingyu Ding, Zhenfang Chen, Tao Du, Ping Luo, Joshua B. Tenenbaum, and Chuang Gan

More details can be found at the Project Page.

If you find our work useful in your research, please consider citing our paper:

@inproceedings{ding2021dynamic,
  author = {Ding, Mingyu and Chen, Zhenfang and Du, Tao and Luo, Ping and Tenenbaum, Joshua B and Gan, Chuang},
  title = {Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language},
  booktitle = {Advances In Neural Information Processing Systems},
  year = {2021}
}

Prerequisites

Python 3
PyTorch 1.3 or higher
All required packages can be installed with Miniconda
Both CPUs and GPUs are supported

Dataset preparation

Download the videos, video annotations, questions and answers, and object proposals from the official website.
Convert the videos into ".png" frames with ffmpeg.
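The frame-extraction step can be scripted. The snippet below is a minimal sketch, not part of the VRDP code base: the helper names and the example paths are hypothetical, and it assumes ffmpeg is on the PATH and that each video gets its own folder of frames named 1.png, 2.png, and so on, as in the layout shown in this README.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(video_path, frame_dir):
    """Return the ffmpeg argument list that writes every frame as <n>.png."""
    # "%d.png" makes ffmpeg number the frames 1.png, 2.png, ...
    pattern = Path(frame_dir) / "%d.png"
    return ["ffmpeg", "-i", str(video_path), str(pattern)]


def extract_frames(video_path, frame_dir):
    """Create frame_dir if needed, then run ffmpeg (hypothetical helper)."""
    Path(frame_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_ffmpeg_command(video_path, frame_dir), check=True)


if __name__ == "__main__":
    # Hypothetical paths; adjust them to wherever the videos were downloaded.
    print(build_ffmpeg_command("video_00000.mp4", "frames/video_00000"))
```

Calling extract_frames once per downloaded video would produce one folder of frames per video; the exact folder names expected by the training scripts should be checked against the layout below.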

Organize the data as shown below.

clevrer
├── annotation_00000-01000
│   ├── annotation_00000.json
│   ├── annotation_00001.json
│   └── ...
├── ...
├── image_00000-01000
│   │   ├── 1.png
│   │   ├── 2.png
│   │   └── ...
│   └── ...
├── ...
├── questions
│   ├── train.json
│   ├── validation.json
│   └── test.json
├── proposals
│   ├── proposal_00000.json
│   ├── proposal_00001.json
│   └── ...
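For scripts that need to locate files in this layout, a small path helper can be convenient. The following is a sketch under stated assumptions: the function names are hypothetical, and the bucket size of 1000 with half-open ranges (annotation_00000-01000 holding indices 0 to 999) is inferred from the folder names above rather than confirmed by the repository.

```python
from pathlib import PurePosixPath

BUCKET_SIZE = 1000  # assumed from folder names such as annotation_00000-01000


def bucket_name(prefix, index):
    """Name of the folder that groups 1000 consecutive indices (assumption)."""
    start = (index // BUCKET_SIZE) * BUCKET_SIZE
    return f"{prefix}_{start:05d}-{start + BUCKET_SIZE:05d}"


def annotation_path(root, index):
    """Path of the annotation JSON for one video index."""
    folder = PurePosixPath(root) / bucket_name("annotation", index)
    return (folder / f"annotation_{index:05d}.json").as_posix()


def proposal_path(root, index):
    """Path of the object-proposal JSON for one video index."""
    return (PurePosixPath(root) / "proposals" / f"proposal_{index:05d}.json").as_posix()


if __name__ == "__main__":
    print(annotation_path("clevrer", 1))
    print(proposal_path("clevrer", 1))
```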

We also provide data for physics learning and program execution on Google Drive. These downloads are optional; if you use them, put them in the ./data/ folder.
For the executor, download the processed data executor_data.zip and unzip it into ./executor/data/.

Get Object Dictionaries (Concepts and Trajectories)

Download the object proposals from the region proposal network and follow the Step-by-step Training in DCL to get object concepts and trajectories.

The above process includes:

trajectory extraction
concept learning
trajectory refinement

Alternatively, download our extracted object dictionaries (object_dicts.zip) directly from Google Drive.

Learning

1. Differentiable Physics Learning

After we get the above object dictionaries, we learn physical parameters from object properties and trajectories.

cd dynamics/
python3 learn_dynamics.py 10000 15000
# Here argv[1] and argv[2] represent the start and end processing index respectively.

The resulting object physical parameters (object_dicts_with_physics.zip) can also be downloaded from Google Drive.
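To give a feel for what learning physical parameters from trajectories means, here is a deliberately tiny toy example. It is not the model in learn_dynamics.py: it assumes a single object moving in 1D with an unknown initial velocity and a constant deceleration, and it recovers both by gradient descent on the squared error between simulated and observed positions. All names and the motion model are assumptions made for illustration.

```python
def simulate(v0, a, times, x0=0.0):
    """Toy simulator: position under initial velocity v0 and deceleration a."""
    return [x0 + v0 * t - 0.5 * a * t * t for t in times]


def fit(times, observed, lr=0.3, steps=5000):
    """Recover (v0, a) by gradient descent on the mean squared position error."""
    v0, a = 0.0, 0.0
    n = len(times)
    for _ in range(steps):
        predicted = simulate(v0, a, times)
        errors = [p - o for p, o in zip(predicted, observed)]
        # Analytic gradients of the loss through the simulator:
        # d(pos)/d(v0) = t and d(pos)/d(a) = -0.5 * t^2.
        grad_v0 = sum(2.0 * e * t for e, t in zip(errors, times)) / n
        grad_a = sum(2.0 * e * (-0.5 * t * t) for e, t in zip(errors, times)) / n
        v0 -= lr * grad_v0
        a -= lr * grad_a
    return v0, a


if __name__ == "__main__":
    times = [0.1 * k for k in range(20)]
    observed = simulate(2.0, 0.5, times)  # synthetic "observed" trajectory
    print(fit(times, observed))
```

Because the simulator is differentiable in its parameters, the loss gradient can be followed directly. The real pipeline works with multiple objects and collisions, which this sketch does not attempt to model.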

2. Physics Simulation (counterfactual)

Run the physics simulation for counterfactual questions using the learned physical parameters.

cd dynamics/
python3 physics_simulation.py 10000 15000
# Here argv[1] and argv[2] represent the start and end processing index respectively.

The resulting simulated trajectories/events (object_simulated.zip) can also be downloaded from Google Drive.

3. Physics Simulation (predictive)

Correct the long-range predictions according to the video observations.

cd dynamics/
python3 refine_prediction.py 10000 15000
# Here argv[1] and argv[2] represent the start and end processing index respectively.

The resulting refined trajectories/events (object_updated_results.zip) can also be downloaded from Google Drive.
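Since learn_dynamics.py, physics_simulation.py, and refine_prediction.py each take a start and end index, the index range can be split into chunks and run as separate jobs. The helper below is a hypothetical sketch that only builds the command lines. The README does not say whether the end index is inclusive, so each boundary is reused as the next chunk's start; at worst a boundary item is processed twice.

```python
def shard_ranges(start, end, num_shards):
    """Split [start, end] into at most num_shards contiguous (start, end) pairs."""
    if num_shards < 1 or end <= start:
        raise ValueError("need num_shards >= 1 and end > start")
    step = -(-(end - start) // num_shards)  # ceiling division
    return [(s, min(s + step, end)) for s in range(start, end, step)]


def build_commands(script, start, end, num_shards):
    """One python3 command per shard, in the same form as the commands above."""
    return [
        ["python3", script, str(s), str(e)]
        for s, e in shard_ranges(start, end, num_shards)
    ]


if __name__ == "__main__":
    for command in build_commands("learn_dynamics.py", 10000, 15000, 4):
        print(" ".join(command))
```

Each printed command can then be launched from the dynamics/ folder, for example in separate terminals or through a job scheduler.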

Evaluation

After we get the final trajectories/events, we perform the neuro-symbolic execution and evaluate the performance on the validation set.

cd executor/
python3 evaluation.py

The test JSON file for evaluation on EvalAI can be generated with:

cd executor/
python3 get_results.py

The Generalized CLEVRER Dataset (counterfactual_mass)

Download causal_mass.zip and counterfactual_mass.zip from Google Drive.
Generate counterfactual data on the collision event by running python3 counterfactual_mass/generate_data.py.

Examples

Predictive question
Counterfactual question

Acknowledgements

For questions regarding VRDP, feel free to post here or directly contact the author ([email protected]).
