Disagreement-Regularized Imitation Learning

Overview

Due to a normalization bug, the expert trajectories have lower performance than the experts reported in rl-baselines-zoo. Please see the following link in the codebase for where the bug was fixed: [link]


Code to train the models described in the paper "Disagreement-Regularized Imitation Learning", by Kianté Brantley, Wen Sun and Mikael Henaff.

Usage:

Install using pip

Install the DRIL package from the root of the repository:

pip install -e .

Software Dependencies

"stable-baselines", "rl-baselines-zoo", "baselines", "gym", "pytorch", "pybullet"

Data

We provide a Python script to generate expert data from pre-trained models using the "rl-baselines-zoo" repository. Click "Here" to see all of the pre-trained agents available and their respective performance. Replace <name-of-environment> with the name of the pre-trained agent environment you would like to collect expert data for.

python -u generate_demonstration_data.py --seed <seed-number> --env-name <name-of-environment> --rl_baseline_zoo_dir <location-to-top-level-directory>
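
For example, assuming the rl-baselines-zoo repository is checked out at ~/rl-baselines-zoo, a command like the following (the environment name and seed are illustrative) would collect demonstrations from the pre-trained Breakout agent:

python -u generate_demonstration_data.py --seed 66 --env-name BreakoutNoFrameskip-v4 --rl_baseline_zoo_dir ~/rl-baselines-zoo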

Training

DRIL requires a pre-trained ensemble model and a pre-trained behavior-cloning model.
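
The ensemble is what provides the disagreement signal: DRIL turns the variance of the ensemble members' predictions into a clipped cost that the policy then minimizes with RL, alongside the behavior-cloning regularization. Below is a minimal sketch of that idea, assuming an ensemble of PyTorch policy callables; the names and interface are illustrative, not the repository's API.

import torch

def clipped_disagreement_cost(ensemble, state, quantile_cutoff):
    """Illustrative clipped disagreement cost: the variance of the ensemble's
    predicted actions at a state, thresholded at a cutoff (e.g. a quantile of
    the costs observed on the expert demonstrations)."""
    with torch.no_grad():
        # Each ensemble member maps a state to an action prediction.
        predictions = torch.stack([policy(state) for policy in ensemble])  # (E, action_dim)
        disagreement = predictions.var(dim=0).sum().item()
    # Low disagreement (expert-like regions) yields a negative cost (a reward);
    # high disagreement is penalized, pushing the learner back toward the expert.
    return -1.0 if disagreement <= quantile_cutoff else +1.0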

Note that <location-to-rl-baseline-zoo-directory> is the full path to the top-level directory of the rl-baselines-zoo repository.

To train only a behavior-cloning model, run:

python -u main.py --env-name <name-of-environment> --num-trajs <number-of-trajectories> --behavior_cloning --rl_baseline_zoo_dir <location-to-rl-baseline-zoo-directory> --seed <seed-number>

To train only an ensemble model, run:

python -u main.py --env-name <name-of-environment> --num-trajs <number-of-trajectories> --pretrain_ensemble_only --rl_baseline_zoo_dir <location-to-rl-baseline-zoo-directory> --seed <seed-number>
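
For example, with illustrative values (one expert trajectory on Breakout, seed 66, zoo checkout at ~/rl-baselines-zoo), the two pre-training steps would look like:

python -u main.py --env-name BreakoutNoFrameskip-v4 --num-trajs 1 --behavior_cloning --rl_baseline_zoo_dir ~/rl-baselines-zoo --seed 66
python -u main.py --env-name BreakoutNoFrameskip-v4 --num-trajs 1 --pretrain_ensemble_only --rl_baseline_zoo_dir ~/rl-baselines-zoo --seed 66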

To train a DRIL model, run the command below. The script first checks that both the behavior-cloning model and the ensemble model have been trained; if they have not, it automatically trains both before training DRIL.

python -u main.py --env-name <name-of-environment> --default_experiment_params <type-of-env>  --num-trajs <number-of-trajectories> --rl_baseline_zoo_dir <location-to-rl-baseline-zoo-directory> --seed <seed-number>  --dril 

--default_experiment_params sets the default parameters we use in the DRIL experiments and has two options: atari and continous-control
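
For example, a full DRIL run on an Atari game with illustrative values might look like:

python -u main.py --env-name BreakoutNoFrameskip-v4 --default_experiment_params atari --num-trajs 1 --rl_baseline_zoo_dir ~/rl-baselines-zoo --seed 66 --dril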

Visualization

After training the models, the results are stored in a folder called trained_results. Run the command below to reproduce the plots in our paper. If you change any of the hyperparameters, you will need to update the hyperparameter values in the plot file naming convention to match.

python -u plot.py -env <name-of-environment>

Empirical evaluation

Atari

Results on Atari environments. [figure]

Continuous Control

Results on continuous control tasks. [figure]

Acknowledgement:

We would like to thank Ilya Kostrikov for creating this "repo" that our codebase builds on.
