PyTorch implementations of popular off-policy multi-agent reinforcement learning algorithms, including QMIX, VDN, MADDPG, and MATD3.

Last update: Dec 28, 2022


Overview

Off-Policy Multi-Agent Reinforcement Learning (MARL) Algorithms

This repository contains implementations of various off-policy multi-agent reinforcement learning (MARL) algorithms.

Authors: Akash Velu and Chao Yu

Algorithms supported:

MADDPG (MLP and RNN)
MATD3 (MLP and RNN)
QMIX (MLP and RNN)
VDN (MLP and RNN)
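VDN and QMIX differ mainly in how per-agent Q-values are combined into a joint value Q_tot. The plain-Python sketch below illustrates the idea and assumes the per-agent Q-values have already been computed. The two-layer mixer is a simplified stand-in: in QMIX the mixing weights are produced by hypernetworks conditioned on the global state, whereas here they are passed in directly. All function names are hypothetical and are not taken from the offpolicy package.

```python
import math
from typing import List, Sequence


def vdn_mix(agent_qs: Sequence[float]) -> float:
    """VDN: the joint value is the plain sum of per-agent Q-values."""
    return sum(agent_qs)


def elu(x: float) -> float:
    """ELU activation; monotonically increasing, so it preserves monotonicity."""
    return x if x > 0 else math.exp(x) - 1.0


def qmix_mix(agent_qs: Sequence[float],
             w1: List[List[float]], b1: List[float],
             w2: List[float], b2: float) -> float:
    """QMIX-style mixing (simplified): abs() makes every mixing weight
    non-negative, so Q_tot never decreases when one agent's Q-value increases.

    w1: one row per agent, one column per hidden unit.
    b1: one bias per hidden unit; w2: one weight per hidden unit; b2: scalar.
    """
    hidden = [
        elu(sum(abs(row[h]) * q for row, q in zip(w1, agent_qs)) + b1[h])
        for h in range(len(b1))
    ]
    return sum(abs(w) * z for w, z in zip(w2, hidden)) + b2


# Hypothetical example values (negative weights are made non-negative by abs()).
qs = [1.0, -0.5]
w1 = [[0.5, -1.0], [2.0, 0.3]]
b1 = [0.0, 0.1]
w2 = [1.0, -0.7]
print(vdn_mix(qs))  # 0.5
print(qmix_mix(qs, w1, b1, w2, 0.2))
```

The non-negative weights are what let each agent act greedily on its own Q-value while still being consistent with the greedy joint action under Q_tot.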

Environments supported:

StarCraftII (SMAC)
Multiagent Particle Environments (MPEs)

1. Usage

WARNING #1: by default, all experiments assume a shared policy, i.e., a single neural network whose parameters are shared by all agents.
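A shared policy means every agent queries the same parameters. The sketch below illustrates this with a toy linear scorer standing in for the shared network. Appending a one-hot agent ID to each observation is a common way to let a shared network still behave differently per agent; whether this repository does so is an assumption, and the class and function names are hypothetical.

```python
from typing import List, Sequence


def one_hot(index: int, size: int) -> List[float]:
    """Agent-ID encoding appended to each observation (an assumed convention)."""
    return [1.0 if i == index else 0.0 for i in range(size)]


class SharedLinearPolicy:
    """Toy stand-in for the shared network: a single weight vector."""

    def __init__(self, weights: Sequence[float]) -> None:
        self.weights = list(weights)

    def score(self, features: Sequence[float]) -> float:
        return sum(w * x for w, x in zip(self.weights, features))


def act_all(policy: SharedLinearPolicy,
            observations: List[List[float]]) -> List[float]:
    """Run the same policy object (same parameters) for every agent."""
    n_agents = len(observations)
    return [policy.score(obs + one_hot(i, n_agents))
            for i, obs in enumerate(observations)]


policy = SharedLinearPolicy([1.0, 0.5, 10.0, 20.0])
print(act_all(policy, [[1.0, 2.0], [1.0, 2.0]]))  # [12.0, 22.0]
```

Both agents see identical observations, yet the agent-ID features let the single shared weight vector produce different outputs for each of them.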

WARNING #2: only QMIX and MADDPG are thoroughly tested. However, our VDN and MATD3 implementations make only small modifications to QMIX and MADDPG, respectively. The results in the Results section were obtained with our implementation.
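For reference, the modifications that distinguish TD3 from DDPG are clipped double-Q targets, target policy smoothing, and delayed actor updates. The sketch below shows these three pieces in plain Python. It assumes MATD3 applies them per agent on top of MADDPG; the default values (noise 0.2, clip 0.5, delay 2) are the common TD3 defaults, not values confirmed for this repository, and all names are hypothetical.

```python
import random


def clip(x: float, low: float, high: float) -> float:
    return max(low, min(high, x))


def smoothed_target_action(target_action: float, rng: random.Random,
                           noise_std: float = 0.2, noise_clip: float = 0.5,
                           low: float = -1.0, high: float = 1.0) -> float:
    """Target policy smoothing: add clipped Gaussian noise to the target action."""
    noise = clip(rng.gauss(0.0, noise_std), -noise_clip, noise_clip)
    return clip(target_action + noise, low, high)


def clipped_double_q_target(reward: float, done: float, next_q1: float,
                            next_q2: float, gamma: float = 0.99) -> float:
    """Bootstrap from the smaller of the two target critics' estimates."""
    return reward + gamma * (1.0 - done) * min(next_q1, next_q2)


def should_update_actor(step: int, policy_delay: int = 2) -> bool:
    """Delayed policy updates: update the actor every policy_delay critic steps."""
    return step % policy_delay == 0


rng = random.Random(0)
a_next = smoothed_target_action(0.9, rng)
print(clipped_double_q_target(1.0, 0.0, 5.0, 3.0, gamma=0.5))  # 2.5
```

Taking the minimum of two critics counteracts the overestimation bias that a single bootstrapped critic tends to accumulate.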

All core code is located within the offpolicy folder. The algorithms/ subfolder contains algorithm-specific code for all methods. RMADDPG and RMATD3 refer to the RNN implementations of MADDPG and MATD3, while mQMIX and mVDN refer to the MLP implementations of QMIX and VDN. We additionally support prioritized experience replay (PER).
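Prioritized experience replay samples transitions in proportion to a priority (typically the absolute TD error) and corrects the resulting bias with importance-sampling weights. The list-based sketch below illustrates the proportional variant; it is not the repository's PER buffer, which is likely backed by a more efficient structure such as a sum-tree. The class name and the alpha/beta defaults are assumptions for illustration.

```python
import random
from typing import Any, List, Tuple


class ProportionalReplay:
    """Toy proportional prioritized replay (O(N) sampling, no sum-tree)."""

    def __init__(self, alpha: float = 0.6, eps: float = 1e-6) -> None:
        self.alpha = alpha  # how strongly priorities skew sampling
        self.eps = eps      # keeps every priority strictly positive
        self.data: List[Any] = []
        self.priorities: List[float] = []

    def add(self, transition: Any) -> None:
        # New transitions get the current max priority so they are replayed soon.
        self.priorities.append(max(self.priorities, default=1.0))
        self.data.append(transition)

    def probabilities(self) -> List[float]:
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        return [s / total for s in scaled]

    def sample(self, batch_size: int, beta: float,
               rng: random.Random) -> Tuple[List[int], List[Any], List[float]]:
        probs = self.probabilities()
        n = len(self.data)
        idxs = rng.choices(range(n), weights=probs, k=batch_size)
        # Importance-sampling weights, normalised by the largest possible weight.
        max_w = (n * min(probs)) ** (-beta)
        weights = [(n * probs[i]) ** (-beta) / max_w for i in idxs]
        return idxs, [self.data[i] for i in idxs], weights

    def update_priorities(self, idxs: List[int], td_errors: List[float]) -> None:
        for i, err in zip(idxs, td_errors):
            self.priorities[i] = abs(err) + self.eps


buf = ProportionalReplay(alpha=1.0)
for t in ["s0", "s1", "s2"]:
    buf.add(t)
buf.update_priorities([0, 1, 2], [1.0, 1.0, 2.0])
idxs, batch, weights = buf.sample(4, beta=0.4, rng=random.Random(0))
```

High-priority transitions are drawn more often but receive smaller weights, so their contribution to the loss is scaled back toward what uniform sampling would give.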

The envs/ subfolder contains environment wrapper implementations for the MPEs and SMAC.
Code to perform training rollouts and policy updates is contained within the runner/ folder; there is a runner for each environment.
Executable scripts for training with default hyperparameters can be found in the scripts/ folder. The files are named train_<environment>_<algorithm>.sh (e.g., train_mpe_maddpg.sh). Within each file, the map name (for SMAC) or scenario name (for the MPEs) can be changed.
Python training scripts for each environment can be found in the scripts/train/ folder.
The config.py file contains the relevant hyperparameter and environment settings. Most hyperparameters default to the values used in the paper; please refer to the paper's appendix for the full list of hyperparameters used.
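Settings of this kind are typically exposed as command-line flags with defaults, so a training script only needs to pass the values it wants to change. The argparse sketch below illustrates that pattern under the assumption that config.py follows it; the flag names shown are a hypothetical subset and the numeric defaults are placeholders, not the values used in the paper.

```python
import argparse


def get_config() -> argparse.ArgumentParser:
    """Hypothetical subset of flags; the real names/defaults live in config.py."""
    parser = argparse.ArgumentParser(description="off-policy MARL (illustrative)")
    parser.add_argument("--env_name", type=str, default="MPE")
    parser.add_argument("--algorithm_name", type=str, default="rmaddpg")
    parser.add_argument("--lr", type=float, default=1e-3)     # placeholder value
    parser.add_argument("--gamma", type=float, default=0.99)  # placeholder value
    return parser


# Flags passed on the command line (e.g. from a train_*.sh script) override defaults.
args = get_config().parse_args(["--env_name", "StarCraft2", "--lr", "5e-4"])
print(args.env_name, args.lr, args.gamma)  # StarCraft2 0.0005 0.99
```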

2. Installation

Here we give an example installation for CUDA 10.1. For CPU-only installations or other CUDA versions, please refer to the PyTorch website.

# create conda environment
conda create -n marl python==3.6.1
conda activate marl
pip install torch==1.5.1+cu101 torchvision==0.6.1+cu101 -f https://download.pytorch.org/whl/torch_stable.html

# install off-policy package
cd off-policy
pip install -e .

Even though we provide requirement.txt, it may contain redundant packages. We recommend installing the remaining dependencies as needed: run the code and install whichever required package turns out to be missing.
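Rather than discovering missing dependencies one import error at a time, a quick check with the standard library can list them up front. The sketch below uses importlib.util.find_spec; the module list is only an example, and note that a module's import name can differ from its pip package name.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(module_names: Iterable[str]) -> List[str]:
    """Return the top-level modules that cannot be found in this environment."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


# Example list only; adjust it to the imports the code actually uses.
print(missing_modules(["torch", "numpy", "seaborn", "wandb"]))
```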

2.1 Install StarCraftII 4.10

unzip SC2.4.10.zip
# password is iagreetotheeula
# append with >> so the existing ~/.bashrc is not overwritten
echo "export SC2PATH=~/StarCraftII/" >> ~/.bashrc

Download the SMAC maps and move them to ~/StarCraftII/Maps/.
To use a stableid, copy stableid.json from https://github.com/Blizzard/s2client-proto.git to ~/StarCraftII/.
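Before launching SMAC experiments, it can help to confirm that SC2PATH is set and that the Maps folder is in place. The sketch below is a hypothetical helper, not part of the repository; it assumes the layout described above (SC2PATH pointing at the StarCraftII folder, with maps under Maps/).

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional


def check_sc2_install(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return a list of problems with the StarCraftII/SMAC setup (empty if OK)."""
    env = os.environ if env is None else env
    sc2path = env.get("SC2PATH")
    if not sc2path:
        return ["SC2PATH is not set"]
    root = Path(sc2path).expanduser()
    problems = []
    if not root.is_dir():
        problems.append("{} does not exist".format(root))
    maps = root / "Maps"
    if not maps.is_dir():
        problems.append("{} is missing; copy the SMAC maps there".format(maps))
    return problems


print(check_sc2_install())
```

Remember that a new SC2PATH entry in ~/.bashrc only takes effect in new shells (or after sourcing the file).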

2.2 Install MPE

# install this package first
pip install seaborn

There are three cooperative scenarios in MPE:

simple_spread
simple_speaker_listener, which is the 'Comm' scenario in the paper
simple_reference

3. Train

Here we use train_mpe_maddpg.sh as an example:

cd offpolicy/scripts
chmod +x ./train_mpe_maddpg.sh
./train_mpe_maddpg.sh

Local results are stored in the scripts/results subfolder. Note that we use Weights & Biases as the default visualization platform; to use it, please register and log in to the platform first. More instructions for using Weights & Biases can be found in its official documentation. Adding the --use_wandb flag on the command line or in the .sh file switches logging to TensorBoard instead of Weights & Biases.
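The behaviour described for --use_wandb (W&B is on by default and passing the flag turns it off) matches an argparse option declared with action="store_false". The sketch below assumes that declaration to show why the flag name reads backwards; check config.py for the actual definition.

```python
import argparse

parser = argparse.ArgumentParser()
# Assumption: declared as store_false, so the value is True unless the flag is given.
parser.add_argument("--use_wandb", action="store_false", default=True)

print(parser.parse_args([]).use_wandb)               # True  -> Weights & Biases
print(parser.parse_args(["--use_wandb"]).use_wandb)  # False -> TensorBoard
```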

4. Results

Results for the performance of RMADDPG and QMIX on the Particle Envs and of QMIX on SMAC are depicted here. These results were obtained using a standard (non-prioritized) replay buffer.

