Implement A3C for Mujoco gym envs

Last update: Dec 12, 2022

Overview

pytorch-a3c-mujoco

Disclaimer: my implementation is currently unstable (you can refer to the learning curves below), and I'm not sure whether the problem lies in my code. All comments are welcome, and feel free to contact me!

This code aims to solve control problems, especially in Mujoco, and is heavily based on pytorch-a3c. The differences between this repo and pytorch-a3c:

compatible with Mujoco environments
the policy network outputs mu and sigma
a Gaussian distribution is constructed from mu and sigma
actions are sampled from the Gaussian distribution
the entropy computation is modified
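
The steps listed above (mu and sigma, a Gaussian built from them, sampling, and the entropy term) can be sketched in plain Python. This is only an illustration under stated assumptions, not the code in this repo: the function names are hypothetical, the softplus mapping used to keep sigma positive is an assumption, and the real implementation works on PyTorch tensors rather than lists.

```python
import math
import random


def softplus(x):
    """Map an unconstrained network output to a strictly positive sigma.

    Assumption: the README does not say how sigma is kept positive;
    softplus is one common choice.
    """
    # Numerically stable form of log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sample_action(mu, sigma, rng=random):
    """Draw one action from a diagonal Gaussian, one value per action dimension."""
    return [rng.gauss(m, s) for m, s in zip(mu, sigma)]


def log_prob(action, mu, sigma):
    """Log-density of the action under the diagonal Gaussian (summed over dimensions)."""
    return sum(
        -((a - m) ** 2) / (2.0 * s * s) - math.log(s) - 0.5 * math.log(2.0 * math.pi)
        for a, m, s in zip(action, mu, sigma)
    )


def entropy(sigma):
    """Differential entropy of the diagonal Gaussian; it depends only on sigma."""
    return sum(0.5 * math.log(2.0 * math.pi * math.e * s * s) for s in sigma)


if __name__ == "__main__":
    mu = [0.0]                 # hypothetical network output for the mean
    sigma = [softplus(0.0)]    # hypothetical raw output passed through softplus
    action = sample_action(mu, sigma)
    print(action, log_prob(action, mu, sigma), entropy(sigma))
```

In an actor-critic loss, the log-probability typically weights the advantage and the entropy is added as an exploration bonus. Unlike the categorical entropy used for discrete Atari actions, the Gaussian entropy has a closed form in sigma, which is presumably why the entropy term had to change.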

Note that this repo is only compatible with Mujoco environments in OpenAI gym. If you want to train an agent in the Atari domain, please refer to pytorch-a3c.

Usage

There are three tasks/modes: train, eval, and develop.

train:

python main.py --env-name InvertedPendulum-v1 --num-processes 16 --task train

eval:

python main.py --env-name InvertedPendulum-v1 --task eval --display True --load_ckpt ckpt/a3c/InvertedPendulum-v1.a3c.100

Use the --display flag to choose whether to render the environment.

develop:

python main.py --env-name InvertedPendulum-v1 --num-processes 16 --task develop

If you want to check whether your code runs as expected, you might resort to pdb. For this, I provide a develop mode, which runs in a single thread and is therefore easy to debug.

Experiment results

learning curve

The plot of total reward/episode length in 1000 steps:

InvertedPendulum-v1

In InvertedPendulum-v1, the total reward exactly equals the episode length.

InvertedDoublePendulum-v1

Note that the x-axis denotes time in minutes.

The curve above was plotted with python plot.py --log_path ./logs/a3c/InvertedPendulum-v1.a3c.log

video

InvertedPendulum-v1

InvertedDoublePendulum-v1

Requirements

gym
mujoco-py
pytorch
matplotlib (optional)
seaborn (optional)

TODO

I have implemented ShareRMSProp in my_optim.py, but I haven't tried it yet.

Reference

pytorch-a3c

Owner

Andrew
