Official Repository for "Robust On-Policy Data Collection for Data-Efficient Policy Evaluation" (NeurIPS 2021 Workshop on OfflineRL).

Overview

Source code for the paper Robust On-Policy Data Collection for Data-Efficient Policy Evaluation (NeurIPS 2021 Workshop on Offline Reinforcement Learning).

The code is written in Python 3, using PyTorch for the implementation of the deep networks and OpenAI Gym for the experiment domains.

Requirements

To install the required dependencies, we recommend creating a conda or virtual environment. Then run the following command:

pip install -r requirements.txt

Preparation

To conduct policy evaluation, we first need a set of pretrained policies. You can skip this section if you already have the pretrained models in policy_models/ and the corresponding policy values in experiments/policy_info.py.

Pretrained Policy

Train the policy models using REINFORCE in different domains by running:

python policy/reinfoce.py --exp_name {exp_name}

where {exp_name} can be MultiBandit, GridWorld, CartPole or CartPoleContinuous. The parameterized epsilon-greedy policies for MultiBandit and GridWorld can be obtained by running:

python policy/handmade_policy.py
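For intuition, a parameterized epsilon-greedy policy of the kind this script produces for the tabular domains can be sketched as follows. This is a minimal illustration only; the class and attribute names are hypothetical and not taken from policy/handmade_policy.py.

import numpy as np

class EpsilonGreedyPolicy:
    """Illustrative epsilon-greedy policy: with probability epsilon act uniformly
    at random, otherwise take the greedy action under a fixed Q-table."""

    def __init__(self, q_table, epsilon):
        self.q_table = q_table  # array of shape (n_states, n_actions), assumed given
        self.epsilon = epsilon

    def action_probs(self, state):
        n_actions = self.q_table.shape[1]
        probs = np.full(n_actions, self.epsilon / n_actions)
        probs[np.argmax(self.q_table[state])] += 1.0 - self.epsilon
        return probs

    def sample(self, state, rng=np.random):
        probs = self.action_probs(state)
        return rng.choice(len(probs), p=probs)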

Policy Value

Option 1: Run in sequence

For each policy model, the true policy value is estimated with $10^6$ Monte Carlo roll-outs by running:

python experiments/policy_value.py --policy_name {policy_name} --seed {seed} --n 10e6

This will print the average number of steps, the true policy value, and the variance of returns. Make sure you copy these results into experiments/policy_info.py.
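Conceptually, the script averages returns over independent roll-outs. Below is a minimal sketch of that computation, assuming the classic Gym step API and a policy object with a sample(state) method; these names are illustrative, not the exact interfaces in experiments/policy_value.py.

import numpy as np

def monte_carlo_policy_value(env, policy, n_rollouts, gamma=1.0):
    """Estimate the policy value as the mean (discounted) return over n roll-outs."""
    returns, steps = [], []
    for _ in range(n_rollouts):
        state, done, g, t = env.reset(), False, 0.0, 0
        while not done:
            action = policy.sample(state)
            state, reward, done, _ = env.step(action)  # classic Gym 4-tuple API
            g += (gamma ** t) * reward
            t += 1
        returns.append(g)
        steps.append(t)
    returns = np.asarray(returns)
    # Average steps, value estimate, and variance of returns, matching the quantities
    # the script prints.
    return np.mean(steps), returns.mean(), returns.var()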

Option 2: Run in parallel

If you can use qsub or sbatch, you can also run jobs/jobs_value.py with different seeds in parallel and merge the results by running experiments/merge_values.py, which again yields $10^6$ Monte Carlo roll-outs in total. The policy values reported in the paper were obtained this way.
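The merge step amounts to pooling the per-seed statistics as if all roll-outs had come from a single run. A minimal sketch of the pooling arithmetic follows; the function and argument names are hypothetical and not those of experiments/merge_values.py.

import numpy as np

def pool_estimates(counts, means, variances):
    """Pool per-seed Monte Carlo estimates (n_i, mean_i, var_i) into one estimate,
    treating the per-seed variances as population variances."""
    counts = np.asarray(counts, dtype=float)
    means = np.asarray(means)
    variances = np.asarray(variances)
    n = counts.sum()
    pooled_mean = np.sum(counts * means) / n
    # Law of total variance: within-seed variance plus between-seed spread.
    pooled_var = np.sum(counts * (variances + (means - pooled_mean) ** 2)) / n
    return n, pooled_mean, pooled_var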

Evaluation

Option 1: Run in sequence

The main script for policy evaluation is experiments/evaluate.py. The following command is an example of Monte Carlo estimation with Robust On-policy Acting ($\rho=1.0$) for the policy model_GridWorld_5000.pt, using seeds 0 to 199.

python experiments/evaluate.py --policy_name GridWorld_5000 --ros_epsilon 1.0 --collectors RobustOnPolicyActing --estimators MonteCarlo --eval_steps "7,14,29,59,118,237,475,951,1902,3805,7610,15221,30443,60886" --seeds "0,199"
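For reference, the Monte Carlo estimator here is simply the mean return of the trajectories collected so far, and accuracy is naturally summarized by the squared error of that estimate against the true policy value at each evaluation step. The sketch below is illustrative only and is not the exact code in experiments/evaluate.py.

import numpy as np

def monte_carlo_estimate(returns):
    """Monte Carlo estimate of the policy value: the mean return of the
    trajectories collected so far."""
    return float(np.mean(returns))

def squared_error(returns, true_value):
    """Squared error of the estimate against the true policy value (e.g. the one
    recorded in experiments/policy_info.py); averaging this over seeds gives an
    error curve over the evaluation steps."""
    return (monte_carlo_estimate(returns) - true_value) ** 2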

To conduct policy evaluation with off-policy data, add the following arguments to the command above:

--combined_trajectories 100 --combined_ops_epsilon 0.10 

Option 2: Run in parallel

If you can use qsub or sbatch, you can simply run the script jobs/jobs.py, where all experiments from the paper are arranged. Logs are saved in log/ and per-seed results in results/seeds. Note that the data-collection cache is saved in results/data and re-used across different value estimations. To merge the results of different seeds, run experiments/merge_results.py; the merged results are saved in results/.

Plotting

Once the experiments have finished, all figures in the paper can be reproduced by running:

python drawing/draw.py

Citing

If you use this repository in your work, please consider citing the paper:

@inproceedings{zhong2021robust,
    title = {Robust On-Policy Data Collection for Data-Efficient Policy Evaluation},
    author = {Rujie Zhong and Josiah P. Hanna and Lukas Schäfer and Stefano V. Albrecht},
    booktitle = {NeurIPS Workshop on Offline Reinforcement Learning (OfflineRL)},
    year = {2021}
}