Learning Domain Invariant Representations in Goal-conditioned Block MDPs

Last update: Apr 12, 2022

Beining Han, Chongyi Zheng, Harris Chan, Keiran Paster, Michael R. Zhang, Jimmy Ba

Summary: Deep Reinforcement Learning agents often face unanticipated environmental changes after deployment in the real world. These changes are often spurious and unrelated to the underlying problem, such as background shifts for visual input agents. Unfortunately, deep RL policies are usually sensitive to these changes and fail to act robustly against them. This resembles the problem of domain generalization in supervised learning. In this work, we study this problem for goal-conditioned RL agents. We propose a theoretical framework in the Block MDP setting that characterizes the generalizability of goal-conditioned policies to new environments. Under this framework, we develop a practical method PA-SkewFit (PASF) that enhances domain generalization.

@article{han2021learning,
  title={Learning Domain Invariant Representations in Goal-conditioned Block MDPs},
  author={Han, Beining and Zheng, Chongyi and Chan, Harris and Paster, Keiran and Zhang, Michael and Ba, Jimmy},
  journal={Advances in Neural Information Processing Systems},
  volume={34},
  year={2021}
}

Installation

Our code was adapted from rlkit and was tested on an Ubuntu 20.04 server.

These instructions assume that you have already installed the NVIDIA driver, Anaconda, and MuJoCo.

You'll need to get your own MuJoCo key if you want to use MuJoCo.

1. Create Anaconda environment

Install the included Anaconda environment

$ conda env create -f environment/pasf_env.yml
$ source activate pasf_env
(pasf_env) $ python
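
Once the environment is active, a quick import check can confirm that the key packages resolve before launching a long run. The following is a minimal sketch, not part of the repository: the module list is an assumption (rlkit builds on PyTorch, and mujoco-py is mentioned below), so adjust it to match environment/pasf_env.yml.

```python
import importlib.util

# Assumed package list for illustration; edit to match environment/pasf_env.yml.
REQUIRED_MODULES = ["numpy", "torch", "mujoco_py"]


def missing_modules(names):
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages were found.")
```

The check only locates the modules and does not import them, so it will not catch a broken MuJoCo installation or a missing license key.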

2. Download the goals

Download the goals from the following link and put them in (PASF DIR)/multiworld/envs/mujoco.

https://drive.google.com/drive/folders/1L9SYFADWmFzdP1c6wf2yo2WjOlXJh8Iu?usp=sharing

$ ls (PASF DIR)/multiworld/envs/mujoco
... goals ...
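
The same check can be done from Python, which is convenient at the start of a script. This sketch assumes the download is a non-empty folder named goals, as the `ls` output above suggests; the helper names are hypothetical and not part of the repository.

```python
from pathlib import Path


def goals_dir(pasf_dir):
    """Build the expected location of the goals folder under the PASF checkout."""
    return Path(pasf_dir) / "multiworld" / "envs" / "mujoco" / "goals"


def goals_present(pasf_dir):
    """Return True if the goals folder exists and contains at least one entry."""
    directory = goals_dir(pasf_dir)
    return directory.is_dir() and any(directory.iterdir())
```

Calling `goals_present("/path/to/pasf")` with your own checkout path returns False if the folder is missing or empty.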

3. (Optional) Speed-up with GPU rendering

Note: GPU rendering for mujoco-py speeds up training a lot but consumes more GPU memory at the same time.

Check this issue:

Remember to apply these changes to the mujoco-py package installed inside your pasf_env environment.

Running Experiments

The following commands run the PASF learning-curve (LC) experiments for the four tasks, Reach, Door, Push, and Pickup, respectively.

$ source activate pasf_env
(pasf_env) $ bash (PASF DIR)/bash_scripts/pasf_reach_lc_exp.bash
(pasf_env) $ bash (PASF DIR)/bash_scripts/pasf_door_lc_exp.bash
(pasf_env) $ bash (PASF DIR)/bash_scripts/pasf_push_lc_exp.bash
(pasf_env) $ bash (PASF DIR)/bash_scripts/pasf_pickup_lc_exp.bash
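
To queue the four scripts one after another, a small launcher can wrap the commands above. This is a sketch only: the function names and the `dry_run` flag are hypothetical, and it assumes it is run from inside the activated pasf_env.

```python
import subprocess
from pathlib import Path

# Task names taken from the script file names listed above.
TASKS = ["reach", "door", "push", "pickup"]


def lc_script(pasf_dir, task):
    """Return the path of the learning-curve bash script for one task."""
    return Path(pasf_dir) / "bash_scripts" / f"pasf_{task}_lc_exp.bash"


def run_all(pasf_dir, dry_run=True):
    """Run (or, with dry_run=True, only print) the four scripts in order."""
    commands = [["bash", str(lc_script(pasf_dir, task))] for task in TASKS]
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # check=True stops the queue if a script exits with an error.
            subprocess.run(command, check=True)
    return commands
```

Call `run_all("/path/to/pasf", dry_run=False)` to execute the scripts; the default only prints the commands so the paths can be inspected first.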

The bash scripts set only three hyperparameters, using the exact values we used for LC. You can experiment with other hyperparameters in the Python scripts under (PASF DIR)/experiment.
Training and evaluation environments are chosen in the Python script for each task. You can find the backgrounds in (PASF DIR)/multiworld/core/background and the domains in (PASF DIR)/multiworld/envs/assets/sawyer_xyz.
Results are recorded in progress.csv under (PASF DIR)/data/, and variant.json contains the configuration for each experiment.
We simply set the random seeds to 0, 1, 2, etc., and run experiments with 6-9 different seeds for each task.
Error and output logs can be found in (PASF DIR)/terminal_log.
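
Because each seed writes its own progress.csv, comparing runs usually means collecting the final value of one metric across seeds. The sketch below shows one way to do that with the standard library. It assumes that each run folder under (PASF DIR)/data/ holds a progress.csv with a variant.json next to it, and the column name is supplied by the caller because the actual column names are not listed here; all function names are hypothetical.

```python
import csv
import json
import statistics
from pathlib import Path


def final_value(progress_csv, column):
    """Return the last non-empty value of `column` in one progress.csv, or None."""
    last = None
    with open(progress_csv, newline="") as handle:
        for row in csv.DictReader(handle):
            if row.get(column) not in (None, ""):
                last = float(row[column])
    return last


def load_variant(run_dir):
    """Load the configuration stored in variant.json (assumed to sit in the run folder)."""
    with open(Path(run_dir) / "variant.json") as handle:
        return json.load(handle)


def summarize(data_dir, column):
    """Aggregate the final value of `column` over every progress.csv under data_dir."""
    values = []
    for path in sorted(Path(data_dir).rglob("progress.csv")):
        value = final_value(path, column)
        if value is not None:
            values.append(value)
    if not values:
        return None
    std = statistics.stdev(values) if len(values) > 1 else 0.0
    return {"runs": len(values), "mean": statistics.mean(values), "std": std}
```

For example, `summarize("/path/to/pasf/data", "some_metric")` returns the number of runs found along with the mean and sample standard deviation of that metric's final value, where `some_metric` stands in for a real column header from your progress.csv.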

Questions

If you have any questions, comments, or suggestions, please reach out to Beining Han and Chongyi Zheng.

Owner

Chongyi Zheng