A parallel framework for population-based multi-agent reinforcement learning.

Last update: Jan 08, 2023

Overview

MALib: A parallel framework for population-based multi-agent reinforcement learning

MALib is a parallel framework of population-based learning nested with (multi-agent) reinforcement learning (RL) methods, such as Policy Space Response Oracle, Self-Play and Neural Fictitous Self-Play. MALib provides higher-level abstractions of MARL training paradigms, which enables efficient code reuse and flexible deployments on different distributed computing paradigms. The design of MALib also strives to promote the research of other multi-agent learning, including multi-agent imitation learning and model-based MARL.

Installation

The installation of MALib is very easy. We've tested MALib on Python 3.6 and 3.7. This guide is based on ubuntu 18.04 and above. We strongly recommend using conda to manage your dependencies, and avoid version conflicts. Here we show the example of building python 3.7 based conda environment.

conda create -n malib python==3.7 -y
conda activate malib

# install dependencies
./install_deps.sh

# install malib
pip install -e .

External environments are integrated in MALib, such as StarCraftII and vizdoom, you can install them via pip install -e .[envs]. For users who wanna contribute to our repository, run pip install -e .[dev] to complete the development dependencies.

optional: if you wanna use alpha-rank to solve meta-game, install open-spiel with its installation guides

Quick Start

"""PSRO with PPO for Leduc Holdem"""

from malib.envs.poker import poker_aec_env as leduc_holdem
from malib.runner import run
from malib.rollout import rollout_func


env = leduc_holdem.env(fixed_player=True)

run(
    agent_mapping_func=lambda agent_id: agent_id,
    env_description={
        "creator": leduc_holdem.env,
        "config": {"fixed_player": True},
        "id": "leduc_holdem",
        "possible_agents": env.possible_agents,
    },
    training={
        "interface": {
            "type": "independent",
            "observation_spaces": env.observation_spaces,
            "action_spaces": env.action_spaces
        },
    },
    algorithms={
        "PSRO_PPO": {
            "name": "PPO",
            "custom_config": {
                "gamma": 1.0,
                "eps_min": 0,
                "eps_max": 1.0,
                "eps_decay": 100,
            },
        }
    },
    rollout={
        "type": "async",
        "stopper": "simple_rollout",
        "callback": rollout_func.sequential
    }
)

Citing MALib

If you use MALib in your work, please cite the accompanying paper.

@misc{zhou2021malib,
      title={MALib: A Parallel Framework for Population-based Multi-agent Reinforcement Learning}, 
      author={Ming Zhou and Ziyu Wan and Hanjing Wang and Muning Wen and Runzhe Wu and Ying Wen and Yaodong Yang and Weinan Zhang and Jun Wang},
      year={2021},
      eprint={2106.07551},
      archivePrefix={arXiv},
      primaryClass={cs.MA}
}

A parallel framework for population-based multi-agent reinforcement learning.

Related tags

Overview

MALib: A parallel framework for population-based multi-agent reinforcement learning

Installation

Quick Start

Citing MALib

Owner

MARL @ SJTU

Code for "R-GCN: The R Could Stand for Random"

Learning Confidence for Out-of-Distribution Detection in Neural Networks

GLaRA: Graph-based Labeling Rule Augmentation for Weakly Supervised Named Entity Recognition

Baseline powergrid model for NY

This repository compare a selfie with images from identity documents and response if the selfie match.

🚗 INGI Dakar 2K21 - Be the first one on the finish line ! 🚗

[ICME 2021 Oral] CORE-Text: Improving Scene Text Detection with Contrastive Relational Reasoning

Implementation of Online Label Smoothing in PyTorch

CoaT: Co-Scale Conv-Attentional Image Transformers

[CVPR 2021] MiVOS - Mask Propagation module. Reproduced STM (and better) with training code :star2:. Semi-supervised video object segmentation evaluation.

A PyTorch Toolbox for Face Recognition

Everything you need to know about NumPy( Creating Arrays, Indexing, Math,Statistics,Reshaping).

2021 CCF BDCI 全国信息检索挑战杯（CCIR-Cup）智能人机交互自然语言理解赛道第二名参赛解决方案

YOLOX + ROS(1, 2) object detection package

Source code for "OmniPhotos: Casual 360° VR Photography"

Official implementation of the paper WAV2CLIP: LEARNING ROBUST AUDIO REPRESENTATIONS FROM CLIP

Code accompanying the paper "How Tight Can PAC-Bayes be in the Small Data Regime?"

Animal Sound Classification (Cats Vrs Dogs Audio Sentiment Classification)

Bolt Online Learning Toolbox

Language Models for the legal domain in Spanish done @ BSC-TEMU within the "Plan de las Tecnologías del Lenguaje" (Plan-TL).