Exploration-Exploitation Dilemma Solving Methods

Medium article for this repo - HERE

In this repo I implemented two techniques for tackling the exploration-exploitation tradeoff. The methods are:

Epsilon Greedy (With different epsilons)
Thompson Sampling (also known as posterior sampling)

I chose only these two to show the lower and upper ends of the range: epsilon-greedy is a common starting point for dealing with this tradeoff, while Thompson Sampling is considered a recent state-of-the-art approach in this field.
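For readers new to Thompson Sampling, the idea can be sketched as follows: keep a posterior over each arm's mean reward, draw one sample per arm, and play the arm with the largest sample. The sketch below is an illustration and not necessarily the code used in this repo. It assumes Gaussian rewards with unit variance and a N(0, 1) prior on each arm's mean, and the class name `GaussianThompson` is hypothetical.

```python
import math
import random


class GaussianThompson:
    """Thompson Sampling sketch for Gaussian rewards.

    Assumptions (not taken from the repo): reward noise has unit variance
    and each arm's mean has a N(0, 1) prior. Under those assumptions the
    posterior after n rewards summing to s is N(s / (n + 1), 1 / (n + 1)).
    """

    def __init__(self, n_arms, seed=None):
        self.rng = random.Random(seed)
        self.counts = [0] * n_arms   # number of pulls per arm
        self.sums = [0.0] * n_arms   # sum of observed rewards per arm

    def select(self):
        # Draw one sample from each arm's posterior and play the largest.
        samples = [
            self.rng.gauss(s / (n + 1), math.sqrt(1.0 / (n + 1)))
            for s, n in zip(self.sums, self.counts)
        ]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, reward):
        # Only the pulled arm's posterior changes.
        self.counts[arm] += 1
        self.sums[arm] += reward
```

Exploration here comes from posterior uncertainty rather than from a fixed epsilon: arms that have been pulled rarely have wide posteriors and are still sampled occasionally, while well-explored poor arms are quickly abandoned.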

ENV SPECIFICATIONS - A 10-armed testbed is simulated, the same as the one demonstrated in the Sutton-Barto book.
True reward distribution (here Action-2 is the best)
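A testbed of this kind can be sketched in a few lines. The version below follows the setup in Sutton and Barto's book (true action values drawn from N(0, 1), rewards drawn from a unit-variance Gaussian around the chosen arm's value). The class name `TenArmedTestbed` and its interface are assumptions made for illustration, and the best arm depends on the random draw, so it will not necessarily be Action-2.

```python
import random


class TenArmedTestbed:
    """k-armed Gaussian bandit in the style of the Sutton-Barto testbed."""

    def __init__(self, n_arms=10, seed=None):
        self.rng = random.Random(seed)
        # True action values q*(a), fixed for the lifetime of the testbed.
        self.q_true = [self.rng.gauss(0.0, 1.0) for _ in range(n_arms)]
        # Index of the arm with the highest true value.
        self.best_arm = max(range(n_arms), key=self.q_true.__getitem__)

    def pull(self, arm):
        # Reward is the arm's true value plus unit-variance Gaussian noise.
        return self.rng.gauss(self.q_true[arm], 1.0)
```

A single interaction is then `reward = bed.pull(arm)`, and `bed.best_arm` gives the reference needed to measure how often an agent picks the optimal action.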

Comparison of Greedy (or Epsilon-Greedy) and TS

We used three different epsilons here for testing:

epsilon = 0 => Greedy Agent
epsilon = 0.01 => exploration with 1% probability
epsilon = 0.1 => exploration with 10% probability

and TS
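The epsilon settings above differ only in how often the agent ignores its current estimates. A minimal sketch of the action-selection rule and the usual sample-average update is shown below. The function names `epsilon_greedy_action` and `update_estimate` are hypothetical, and random tie-breaking among equally good arms is an assumption rather than something taken from the repo.

```python
import random


def epsilon_greedy_action(q_est, epsilon, rng=random):
    """Pick a random arm with probability epsilon, else a greedy arm."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_est))
    best = max(q_est)
    # Break ties among equally good arms at random (an assumption).
    return rng.choice([a for a, q in enumerate(q_est) if q == best])


def update_estimate(q_est, counts, arm, reward):
    """Incremental sample-average update of the pulled arm's estimate."""
    counts[arm] += 1
    q_est[arm] += (reward - q_est[arm]) / counts[arm]
```

With `epsilon = 0` the first branch is never taken, which gives the purely greedy agent. With `epsilon = 0.1` roughly one step in ten is spent on a uniformly random arm.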

Averaged over 2500 independent runs of 1500 timesteps each
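Averaging over independent runs means that each run gets a freshly drawn bandit, and the curve at timestep t is the fraction of runs in which the agent picked the optimal arm at t. The sketch below shows this for an epsilon-greedy agent. It is an illustration under the same testbed assumptions as the Sutton-Barto setup, with hypothetical function names, and is not the repo's actual experiment code.

```python
import random


def run_epsilon_greedy(epsilon, n_steps, rng, n_arms=10):
    """One run on a fresh bandit; returns per-step optimal-action flags."""
    q_true = [rng.gauss(0.0, 1.0) for _ in range(n_arms)]
    best = max(range(n_arms), key=q_true.__getitem__)
    q_est = [0.0] * n_arms
    counts = [0] * n_arms
    hits = []
    for _ in range(n_steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)
        else:
            # Ties go to the lowest index here, for brevity.
            arm = max(range(n_arms), key=q_est.__getitem__)
        reward = rng.gauss(q_true[arm], 1.0)
        counts[arm] += 1
        q_est[arm] += (reward - q_est[arm]) / counts[arm]
        hits.append(arm == best)
    return hits


def percent_optimal(epsilon, n_runs=2500, n_steps=1500, seed=0):
    """Percentage of runs choosing the optimal arm at each timestep."""
    rng = random.Random(seed)
    totals = [0] * n_steps
    for _ in range(n_runs):
        for t, hit in enumerate(run_epsilon_greedy(epsilon, n_steps, rng)):
            totals[t] += hit
    return [100.0 * c / n_runs for c in totals]
```

The defaults mirror the 2500 runs and 1500 timesteps used here. In pure Python that full setting can take a while, so smaller values such as `percent_optimal(0.1, n_runs=200, n_steps=300)` are convenient for a quick check.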

Comparison

Percentage of actions selected for epsilon = 0.01 and TS

Conclusion -> epsilon = 0.01 can be considered the best of the epsilon-greedy agents: its percentage of optimal actions keeps increasing, though quite slowly, and reaches around 80% in the later stages. Thompson Sampling, on the other hand, shows a significant improvement over these results: it explores quickly and then exploits the optimal action, with the percentage of optimal actions reaching almost 100% very early.

If you want to know more about TS, see this Reference.

Owner

Aman Mishra
