batch-bandits

Implementation of popular bandit algorithms in batch environments.

Source code for our paper "The Impact of Batch Learning in Stochastic Bandits", accepted at the workshop on the Ecological Theory of Reinforcement Learning, NeurIPS 2021.

Overview

The repository makes it possible to run simulations or replay logged datasets in a sequential batch manner: the agent interacts with the environment sequentially, but responses are grouped into batches and observed by the agent only at the end of each batch. Broadly speaking, sequential batch learning is a more general way of learning that covers both the offline and online settings as special cases and brings together their advantages.

Framework

Two particularly useful versions of the multi-armed bandit problem are implemented: Stochastic Multi-Armed Bandit (MAB) and Contextual Multi-Armed Bandit (CMAB). The key feature of the project is that both versions support a batch_size parameter: the length of the period during which the agent interacts with the environment "blindly", without observing responses. Although the batch setting is a property of the environment, the project treats this limitation from the policy's perspective. In other words, rather than an online agent working with a batch environment, a batch policy interacts with an online environment.
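The policy-side view of batching can be illustrated with a minimal sketch. It is not the repository's code: the class name BatchEpsilonGreedy, the run helper, and the Bernoulli reward model are all assumptions made for illustration. The agent acts at every step, but its reward estimates change only once batch_size responses have accumulated.

```python
import random


class BatchEpsilonGreedy:
    """Hypothetical epsilon-greedy agent that learns only at batch boundaries."""

    def __init__(self, n_arms, epsilon=0.1, batch_size=10, seed=0):
        self.n_arms = n_arms
        self.epsilon = epsilon
        self.batch_size = batch_size
        self.rng = random.Random(seed)
        self.counts = [0] * n_arms    # pulls per arm that the policy has seen
        self.values = [0.0] * n_arms  # running mean reward per arm
        self.buffer = []              # (arm, reward) pairs not yet seen by the policy
        self.updates = 0              # number of completed batch updates

    def act(self):
        # Explore with probability epsilon, otherwise exploit current estimates.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_arms)
        return max(range(self.n_arms), key=lambda a: self.values[a])

    def observe(self, arm, reward):
        # Responses are stored, not learned from, until the batch is full.
        self.buffer.append((arm, reward))
        if len(self.buffer) == self.batch_size:
            self._flush()

    def _flush(self):
        # End of batch: fold every buffered response into the estimates.
        for arm, reward in self.buffer:
            self.counts[arm] += 1
            self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
        self.buffer.clear()
        self.updates += 1


def run(agent, means, n_steps, seed=1):
    """Simulate an online Bernoulli environment (assumed reward model)."""
    env_rng = random.Random(seed)
    total = 0.0
    for _ in range(n_steps):
        arm = agent.act()
        reward = 1.0 if env_rng.random() < means[arm] else 0.0
        agent.observe(arm, reward)
        total += reward
    return total
```

In this sketch, setting batch_size to 1 recovers the fully online setting, while setting it to the whole horizon means the policy never updates during the run, which corresponds to the offline extreme.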

The project is built upon the RL-Glue framework, which provides an interface to connect agents, environments, and experiment programs. Note that MAB/rl_glue.py and CMAB/rl_glue.py were adapted to make batch interaction possible.

Implemented algorithms

Version	Algorithm	Comment
MAB	ε-greedy	-
MAB	Thompson Sampling	-
MAB	UCB	-
CMAB	LinTS	see link (and references therein) for more details
CMAB	LinUCB	see article for theoretical description
CMAB	Offline evaluator	policy evaluation technique; see article for theoretical guarantees
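As a rough illustration of offline policy evaluation on logged data, the sketch below uses the replay idea: only logged events where the evaluated policy picks the same arm as the logging policy count towards the estimate. Whether the repository's evaluator works exactly this way is an assumption, as are the function name replay_evaluate and the (context, arm, reward) event format. The estimate is only meaningful if the logged arms were chosen uniformly at random.

```python
def replay_evaluate(policy, logged_events):
    """Estimate a fixed policy's mean reward from logged bandit data.

    policy: callable mapping a context to an arm index (hypothetical interface).
    logged_events: iterable of (context, logged_arm, reward) tuples,
        assumed to be collected by a uniformly random logging policy.
    Returns (estimate, matched), where estimate is None if nothing matched.
    """
    matched = 0
    total_reward = 0.0
    for context, logged_arm, reward in logged_events:
        # Keep the event only when the policy agrees with the logged choice;
        # otherwise the reward for the policy's arm is unknown.
        if policy(context) == logged_arm:
            matched += 1
            total_reward += reward
    if matched == 0:
        return None, 0
    return total_reward / matched, matched
```

For example, with events [(0, 0, 1.0), (0, 1, 0.0), (1, 1, 1.0), (1, 0, 0.0)] and a policy that returns the context itself as the arm, two events match and both carry reward 1.0.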

Owner

Danil Provodin
