Awesome Model-Based Reinforcement Learning

This is a collection of research papers on model-based reinforcement learning (MBRL). The repository will be continuously updated to track the frontier of model-based RL.

You are welcome to follow and star the repository!

A Taxonomy of Model-Based RL Algorithms
Papers

A Taxonomy of Model-Based RL Algorithms

We’ll start this section with a disclaimer: it’s really quite hard to draw an accurate, all-encompassing taxonomy of algorithms in the Model-Based RL space, because the modularity of algorithms is not well represented by a tree structure. We will therefore publish a series of related blog posts that explain more Model-Based RL algorithms.

Figure: A non-exhaustive but useful taxonomy of algorithms in modern Model-Based RL.

We simply divide Model-Based RL into two categories: Learn the Model and Given the Model.

- Learn the Model focuses on how to build the environment model.
- Given the Model focuses on how to use the learned model.

Examples of each category are shown in the figure above. Links to the algorithms in the taxonomy are listed below.

[1] World Models: Ha and Schmidhuber, 2018
[2] I2A (Imagination-Augmented Agents): Weber et al., 2017
[3] MBMF (Model-Based RL with Model-Free Fine-Tuning): Nagabandi et al., 2017
[4] MBVE (Model-Based Value Expansion): Feinberg et al., 2018
[5] ExIt (Expert Iteration): Anthony et al., 2017
[6] AlphaZero: Silver et al., 2017
[7] POPLIN (Model-Based Policy Planning): Wang et al., 2019
[8] M2AC (Masked Model-based Actor-Critic): Pan et al., 2020
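
To make the two categories above concrete, here is a minimal toy sketch, not taken from any of the listed papers. It assumes a hypothetical 1-D linear environment (`true_step`), fits a linear dynamics model by least squares (Learn the Model), and then uses that model in a simple random-shooting planner (Given the Model). All names, constants, and the quadratic cost are illustrative assumptions.

```python
import random

random.seed(0)


def true_step(s, a):
    """Hypothetical toy dynamics (assumption); the agent only sees samples of it."""
    return 0.9 * s + 0.5 * a


def fit_linear_model(data):
    """Learn the Model: least-squares fit of s' ~ w_s * s + w_a * a."""
    sss = sum(s * s for s, a, n in data)
    ssa = sum(s * a for s, a, n in data)
    saa = sum(a * a for s, a, n in data)
    ssn = sum(s * n for s, a, n in data)
    san = sum(a * n for s, a, n in data)
    det = sss * saa - ssa * ssa
    # Solve the 2x2 normal equations with Cramer's rule.
    w_s = (ssn * saa - ssa * san) / det
    w_a = (sss * san - ssa * ssn) / det
    return w_s, w_a


def plan(s0, model, horizon=5, n_candidates=256):
    """Given the Model: random shooting, return the first action of the best sequence."""
    best_cost, best_first = float("inf"), 0.0
    for _ in range(n_candidates):
        seq = [random.uniform(-1.0, 1.0) for _ in range(horizon)]
        s, cost = s0, 0.0
        for a in seq:
            s = model(s, a)   # imagined rollout inside the learned model
            cost += s * s     # illustrative cost: drive the state to zero
        if cost < best_cost:
            best_cost, best_first = cost, seq[0]
    return best_first


# Collect random transitions from the environment and fit the model.
data = []
for _ in range(200):
    s, a = random.uniform(-1.0, 1.0), random.uniform(-1.0, 1.0)
    data.append((s, a, true_step(s, a)))
w_s, w_a = fit_linear_model(data)


def learned_step(s, a):
    return w_s * s + w_a * a


# Control the real environment by replanning with the learned model at every step.
final_state = 1.0
for _ in range(10):
    final_state = true_step(final_state, plan(final_state, learned_step))
```

Real methods replace each half with something far richer, for example probabilistic ensembles for the model or MCTS for planning, but the division of labour is the same.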

Papers

format:
- [title](paper link) [links]
  - author1, author2, and author3.
  - openreview [if the score is public]
  - key 
  - experiment environment
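
As an illustration of this entry format, the following small sketch renders one entry as text. The helper `format_entry` and its parameters are hypothetical and not part of the repository, and the URL in the usage example is a placeholder.

```python
def format_entry(title, link, authors, key, exp_env, scores=None):
    """Render one paper entry in the list format (hypothetical helper)."""
    if len(authors) == 1:
        author_line = authors[0]
    elif len(authors) == 2:
        author_line = f"{authors[0]} and {authors[1]}"
    else:
        author_line = ", ".join(authors[:-1]) + f", and {authors[-1]}"

    lines = [f"- [{title}]({link})", f"  - {author_line}."]
    # OpenReview scores are listed only when they are public.
    if scores:
        lines.append("  - OpenReview: " + ", ".join(str(x) for x in scores))
    lines.append(f"  - Key: {key}")
    lines.append(f"  - ExpEnv: {exp_env}")
    return "\n".join(lines)


# Usage with a placeholder link (assumption, not the real paper URL).
entry = format_entry(
    "Mastering Atari Games with Limited Data",
    "https://example.org/paper",
    ["Weirui Ye", "Shaohuai Liu", "Thanard Kurutach", "Pieter Abbeel", "Yang Gao"],
    key="muzero + self-supervised consistency loss",
    exp_env="atari 100k, deepmind control suite",
    scores=[7, 7, 7, 5],
)
print(entry)
```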

Classic Model-Based RL Papers

Dyna, an integrated architecture for learning, planning, and reacting
- Richard S. Sutton. ACM 1991
- Key: dyna architecture
- ExpEnv: None
PILCO: A Model-Based and Data-Efficient Approach to Policy Search
- Marc Peter Deisenroth, Carl Edward Rasmussen. ICML 2011
- Key: probabilistic dynamics model
- ExpEnv: cart-pole system, robotic unicycle
Learning Complex Neural Network Policies with Trajectory Optimization
- Sergey Levine, Vladlen Koltun. ICML 2014
- Key: guided policy search
- ExpEnv: mujoco
Learning Continuous Control Policies by Stochastic Value Gradients
- Nicolas Heess, Greg Wayne, David Silver, Timothy Lillicrap, Yuval Tassa, Tom Erez. NIPS 2015
- Key: backpropagation through paths + gradient on real trajectory
- ExpEnv: mujoco
Value Prediction Network
- Junhyuk Oh, Satinder Singh, Honglak Lee. NIPS 2017
- Key: value-prediction model
- ExpEnv: collect domain, atari
Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion
- Jacob Buckman, Danijar Hafner, George Tucker, Eugene Brevdo, Honglak Lee. NIPS 2018
- Key: ensemble model and Qnet + value expansion
- ExpEnv: mujoco, roboschool
Recurrent World Models Facilitate Policy Evolution
- David Ha, Jürgen Schmidhuber. NIPS 2018
- Key: vae(representation) + rnn(predictive model)
- ExpEnv: car racing, vizdoom
Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models
- Kurtland Chua, Roberto Calandra, Rowan McAllister, Sergey Levine. NIPS 2018
- Key: probabilistic ensembles with trajectory sampling
- ExpEnv: cartpole, mujoco
When to Trust Your Model: Model-Based Policy Optimization
- Michael Janner, Justin Fu, Marvin Zhang, Sergey Levine. NeurIPS 2019
- Key: ensemble model + sac + k-branched rollout
- ExpEnv: mujoco
Algorithmic Framework for Model-based Deep Reinforcement Learning with Theoretical Guarantees
- Yuping Luo, Huazhe Xu, Yuanzhi Li, Yuandong Tian, Trevor Darrell, Tengyu Ma. ICLR 2019
- Key: Discrepancy Bounds Design + ME-TRPO with multi-step + Entropy regularization
- ExpEnv: mujoco
Model-Ensemble Trust-Region Policy Optimization
- Thanard Kurutach, Ignasi Clavera, Yan Duan, Aviv Tamar, Pieter Abbeel. ICLR 2018
- Key: ensemble model + TRPO
- ExpEnv: mujoco
Dream to Control: Learning Behaviors by Latent Imagination
- Danijar Hafner, Timothy Lillicrap, Jimmy Ba, Mohammad Norouzi. ICLR 2020
- Key: latent space imagination
- ExpEnv: deepmind control suite, atari, deepmind lab
Exploring Model-based Planning with Policy Networks
- Tingwu Wang, Jimmy Ba. ICLR 2020
- Key: model-based policy planning in action space and parameter space
- ExpEnv: mujoco
Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model
- Julian Schrittwieser, Ioannis Antonoglou, Thomas Hubert, Karen Simonyan, Laurent Sifre, Simon Schmitt, Arthur Guez, Edward Lockhart, Demis Hassabis, Thore Graepel, Timothy Lillicrap, David Silver. Nature 2020
- Key: MCTS + value equivalence
- ExpEnv: chess, shogi, go, atari

NeurIPS 2021

On Effective Scheduling of Model-based Reinforcement Learning
- Hang Lai, Jian Shen, Weinan Zhang, Yimin Huang, Xing Zhang, Ruiming Tang, Yong Yu, Zhenguo Li
- Key: extension of mbpo + hyper-controller learning
- OpenReview: 8, 6, 6
- ExpEnv: mujoco, pybullet
Model-Based Reinforcement Learning via Imagination with Derived Memory
- Yao Mu, Yuzheng Zhuang, Bin Wang, Guangxiang Zhu, Wulong Liu, Jianyu Chen, Ping Luo, Shengbo Eben Li, Chongjie Zhang, Jianye HAO
- Key: extension of dreamer + prediction-reliability weight
- OpenReview: 6, 6, 6, 6
- ExpEnv: deepmind control suite
MobILE: Model-Based Imitation Learning From Observation Alone
- Rahul Kidambi, Jonathan Chang, Wen Sun
- Key: imitation learning from observations alone + mbrl
- OpenReview: 6, 6, 6, 4
- ExpEnv: cartpole, mujoco
Model-Based Episodic Memory Induces Dynamic Hybrid Controls
- Hung Le, Thommen Karimpanal George, Majid Abdolshah, Truyen Tran, Svetha Venkatesh
- Key: model-based + episodic control
- OpenReview: 7, 7, 6, 6
- ExpEnv: 2D maze navigation, cartpole, mountaincar, lunarlander, atari, 3D navigation (gym-miniworld)
A Consciousness-Inspired Planning Agent for Model-Based Reinforcement Learning
- Mingde Zhao, Zhen Liu, Sitao Luan, Shuyuan Zhang, Doina Precup, Yoshua Bengio
- Key: mbrl + set representation
- OpenReview: 7, 7, 7, 6
- ExpEnv: MiniGrid-BabyAI framework
Mastering Atari Games with Limited Data
- Weirui Ye, Shaohuai Liu, Thanard Kurutach, Pieter Abbeel, Yang Gao
- Key: muzero + self-supervised consistency loss
- OpenReview: 7, 7, 7, 5
- ExpEnv: atari 100k, deepmind control suite
Self-Consistent Models and Values
- Gregory Farquhar, Kate Baumli, Zita Marinho, Angelos Filos, Matteo Hessel, Hado van Hasselt, David Silver
- Key: new model-learning approach (self-consistency between model and values)
- OpenReview: 7, 7, 7, 6
- ExpEnv: tabular MDP, Sokoban, atari
MOPO: Model-based Offline Policy Optimization
- Tianhe Yu, Garrett Thomas, Lantao Yu, Stefano Ermon, James Zou, Sergey Levine, Chelsea Finn, Tengyu Ma. NeurIPS 2020
- Key: model-based + offline
- OpenReview: None
- ExpEnv: d4rl dataset, halfcheetah-jump and ant-angle
RoMA: Robust Model Adaptation for Offline Model-based Optimization
- Sihyun Yu, Sungsoo Ahn, Le Song, Jinwoo Shin
- Key: model-based + offline
- OpenReview: 7, 6, 6
- ExpEnv: design-bench
Offline Reinforcement Learning with Reverse Model-based Imagination
- Jianhao Wang, Wenzhe Li, Haozhe Jiang, Guangxiang Zhu, Siyuan Li, Chongjie Zhang
- Key: model-based + offline
- OpenReview: 7, 6, 6, 5
- ExpEnv: d4rl dataset
Offline Model-based Adaptable Policy Learning
- Xiong-Hui Chen, Yang Yu, Qingyang Li, Fan-Ming Luo, Zhiwei Tony Qin, Shang Wenjie, Jieping Ye
- Key: model-based + offline
- OpenReview: 6, 6, 6, 4
- ExpEnv: d4rl dataset
Weighted model estimation for offline model-based reinforcement learning
- Toru Hishinuma, Kei Senda
- Key: model-based + offline
- OpenReview: 7, 6, 6, 6
- ExpEnv: pendulum, d4rl dataset
Reward-Free Model-Based Reinforcement Learning with Linear Function Approximation
- Weitong Zhang, Dongruo Zhou, Quanquan Gu
- Key: learning theory + model-based reward-free RL + linear function approximation
- OpenReview: 6, 6, 5, 5
- ExpEnv: None
Provable Model-based Nonlinear Bandit and Reinforcement Learning: Shelve Optimism, Embrace Virtual Curvature
- Kefan Dong, Jiaqi Yang, Tengyu Ma
- Key: learning theory + model-based bandit RL + nonlinear function approximation
- OpenReview: 7, 7, 7, 6
- ExpEnv: None

ICLR 2021

Deployment-Efficient Reinforcement Learning via Model-Based Offline Optimization
- Tatsuya Matsushima, Hiroki Furuta, Yutaka Matsuo, Ofir Nachum, Shixiang Gu
- Key: model-based + behavior cloning (warmup) + trpo
- OpenReview: 8, 7, 7, 5
- ExpEnv: d4rl dataset
Control-Aware Representations for Model-based Reinforcement Learning
- Brandon Cui, Yinlam Chow, Mohammad Ghavamzadeh
- Key: representation learning + model-based soft actor-critic
- OpenReview: 6, 6, 6
- ExpEnv: planar system, inverted pendulum (swingup), cartpole, 3-link manipulator (swingup & balance)
Mastering Atari with Discrete World Models
- Danijar Hafner, Timothy Lillicrap, Mohammad Norouzi, Jimmy Ba
- Key: DreamerV1 + many tricks (multiple categorical variables, KL balancing, etc.)
- OpenReview: 9, 8, 5, 4
- ExpEnv: atari
Model-Based Visual Planning with Self-Supervised Functional Distances
- Stephen Tian, Suraj Nair, Frederik Ebert, Sudeep Dasari, Benjamin Eysenbach, Chelsea Finn, Sergey Levine
- Key: goal-reaching task + dynamics learning + distance learning (goal-conditioned Q-function)
- OpenReview: 7, 7, 7, 7
- ExpEnv: sawyer, door sliding
Model-Based Offline Planning
- Arthur Argenson, Gabriel Dulac-Arnold
- Key: model-based + offline
- OpenReview: 8, 7, 5, 5
- ExpEnv: RL Unplugged(RLU), d4rl dataset
Offline Model-Based Optimization via Normalized Maximum Likelihood Estimation
- Justin Fu, Sergey Levine
- Key: model-based + offline
- OpenReview: 8, 6, 6
- ExpEnv: design-bench
On the role of planning in model-based deep reinforcement learning
- Jessica B. Hamrick, Abram L. Friesen, Feryal Behbahani, Arthur Guez, Fabio Viola, Sims Witherspoon, Thomas Anthony, Lars Buesing, Petar Veličković, Théophane Weber
- Key: discussion about planning in MuZero
- OpenReview: 7, 7, 6, 5
- ExpEnv: atari, go, deepmind control suite
Representation Balancing Offline Model-based Reinforcement Learning
- Byung-Jun Lee, Jongmin Lee, Kee-Eung Kim
- Key: Representation Balancing MDP + model-based + offline
- OpenReview: 7, 7, 7, 6
- ExpEnv: d4rl dataset
Model-based micro-data reinforcement learning: what are the crucial model properties and which model to choose?
- Balázs Kégl, Gabriel Hurtado, Albert Thomas
- Key: mixture density nets + heteroscedasticity
- OpenReview: 7, 7, 7, 6, 5
- ExpEnv: acrobot system

ICML 2021

Conservative Objective Models for Effective Offline Model-Based Optimization
- Brandon Trabucco, Aviral Kumar, Xinyang Geng, Sergey Levine
- Key: conservative objective model + offline mbrl
- ExpEnv: design-bench
Continuous-Time Model-Based Reinforcement Learning
- Çağatay Yıldız, Markus Heinonen, Harri Lähdesmäki
- Key: continuous-time
- ExpEnv: pendulum, cartpole, acrobot
Model-Based Reinforcement Learning via Latent-Space Collocation
- Oleh Rybkin, Chuning Zhu, Anusha Nagabandi, Kostas Daniilidis, Igor Mordatch, Sergey Levine
- Key: latent space collocation
- ExpEnv: sparse metaworld tasks
Model-Free and Model-Based Policy Evaluation when Causality is Uncertain
- David A Bruns-Smith
- Key: worst-case bounds
- ExpEnv: ope-tools
Muesli: Combining Improvements in Policy Optimization
- Matteo Hessel, Ivo Danihelka, Fabio Viola, Arthur Guez, Simon Schmitt, Laurent Sifre, Theophane Weber, David Silver, Hado van Hasselt
- Key: value equivalence
- ExpEnv: atari
PC-MLP: Model-based Reinforcement Learning with Policy Cover Guided Exploration
- Yuda Song, Wen Sun
- Key: sample complexity + kernelized nonlinear regulators + linear MDPs
- ExpEnv: mountain car, antmaze, mujoco
Temporal Predictive Coding For Model-Based Planning In Latent Space
- Tung Nguyen, Rui Shu, Tuan Pham, Hung Bui, Stefano Ermon
- Key: temporal predictive coding with a RSSM + latent space
- ExpEnv: deepmind control suite
Model-based Reinforcement Learning for Continuous Control with Posterior Sampling
- Ying Fan, Yifei Ming
- Key: regret bound of psrl + mpc
- ExpEnv: continuous cartpole, pendulum swingup, mujoco
A Sharp Analysis of Model-based Reinforcement Learning with Self-Play
- Qinghua Liu, Tiancheng Yu, Yu Bai, Chi Jin
- Key: learning theory + multi-agent + model-based self play + two-player zero-sum Markov games
- ExpEnv: None

License

Awesome Model-Based RL is released under the Apache 2.0 license.

Owner

OpenDILab
