An introduction of Markov decision process (MDP) and two algorithms that solve MDPs (value iteration, policy iteration) along with their Python implementations.

Overview

Markov Decision Process

A Markov decision process (MDP), by definition, is a sequential decision problem for a fully observable, stochastic environment with a Markovian transition model and additive rewards. It consists of a set of states, a set of actions, a transition model, and a reward function. Here's an example.

This is a simple 4 x 3 environment, and each block represents a state. The agent can move left, right, up, or down from a state. The "intended" outcome occurs with probability 0.8, but with probability 0.2 the agent moves at right angles to the intended direction. A collision with the wall or boundary results in no movement. The two terminal states have reward +1 and -1, respectively, and all other states have a constant reward (e.g. of -0.01).

Also, a MDP usually has a discount factor γ , a number between 0 and 1, that describes the preference of an agent for current rewards over future rewards.

Policy

A solution to a MDP is called a policy π(s). It specifies an action for each state s. In a MDP, we aim to find the optimal policy that yields the highest expected utility. Here's an example of a policy.

Value Iteration

Value iteration is an algorithm that gives an optimal policy for a MDP. It calculates the utility of each state, which is defined as the expected sum of discounted rewards from that state onward.

This is called the Bellman equation. For example, the utility of the state (1, 1) in the MDP example shown above is:

For n states, there are n Bellman equations with n unknowns (the utilities of states). To solve this system of equations, value iteration uses an iterative approach that repeatedly updates the utility of each state (starting from zero) until an equilibrium is reached (converge). The iteration step, called a Bellman update, looks like this:

Here's the pseudocode for calculating the utilities of states.

Then, after the utilities of states are calculated, we can use them to select an optimal action for each state.

valueIteration.py contains the Python implementation of this algorithm for the MDP example shown above.

(Modified) Policy Iteration

Policy iteration is another algorithm that solves MDPs. It starts with a random policy and alternates the following two steps until the policy improvement step yields no change:

(1) Policy evaluation: given a policy, calculate the utility U(s) of each state s if the policy is executed;

(2) Policy improvement: update the policy based on U(s).

For the policy evaluation step, we use a simplified version of the Bellman equation to calculate the utility of each state.

For n states, there are n linear equations with n unknowns (the utilities of states), which can be solved in O(n^3) time. To make the algorithm more efficient, we can perform some number of simplified Bellman updates (simplified because the policy is fixed) to get an approximation of the utilities instead of calculating the exact solutions.

Here's the pseudocode for policy iteration.

policyIteration.py contains the Python implementation of this algorithm for the MDP example shown above.

Reference

Stuart Russell and Peter Norvig. Artificial Intelligence: A Modern Approach (3rd ed.).

Owner
Yu Shen
UCLA '24
Yu Shen
A Python tool to generate and refresh Amazon access tokens.

amazon_auth A Python tool to generate and refresh Amazon access tokens. Description This tool generates and outputs Amazon access and refresh tokens f

15 Nov 21, 2022
Local server that gives you your OAuth 2.0 tokens needed to interact with the Conta Azul's API

What's this? This is a django project meant to be run locally that gives you your OAuth 2.0 tokens needed to interact with Conta Azul's API Prerequisi

Fábio David Freitas 3 Apr 13, 2022
Awesome Django authorization, without the database

rules rules is a tiny but powerful app providing object-level permissions to Django, without requiring a database. At its core, it is a generic framew

1.6k Dec 30, 2022
A recipe sharing API built using Django rest framework.

Recipe Sharing API This is the backend API for the recipe sharing platform at https://mesob-recipe.netlify.app/ This API allows users to share recipes

Hannah 21 Dec 30, 2022
Luca Security Concept

Luca Security Concept This is the document source of luca's security concept. Please go here for the HTML version: https://luca-app.de/securityconcept

luca 43 Oct 22, 2022
Django Rest Framework App wih JWT Authentication and other DRF stuff

Django Queries App with JWT authentication, Class Based Views, Serializers, Swagger UI, CI/CD and other cool DRF stuff API Documentaion /swagger - Swa

Rafael Salimov 4 Jan 29, 2022
Implements authentication and authorization as FastAPI dependencies

FastAPI Security Implements authentication and authorization as dependencies in FastAPI. Features Authentication via JWT-based OAuth 2 access tokens a

Jacob Magnusson 111 Jan 07, 2023
Official implementation of the AAAI 2022 paper "Learning Token-based Representation for Image Retrieval"

Token: Token-based Representation for Image Retrieval PyTorch training code for Token-based Representation for Image Retrieval. We propose a joint loc

Hui Wu 42 Dec 06, 2022
A module making it easier to manage Discord oAuth with Quart

quart_discord A module making it easier to manage Discord oAuth with Quart Install pip install git+https://github.com/xelA/ 5 Oct 27, 2022

A full Rest-API With Oauth2 and JWT for request & response a JSON file Using FastAPI and SQLAlchemy 🔑

Pexon-Rest-API A full Rest-API for request & response a JSON file, Building a Simple WorkFlow that help you to Request a JSON File Format and Handling

Yasser Tahiri 15 Jul 22, 2022
Flask user session management.

Flask-Login Flask-Login provides user session management for Flask. It handles the common tasks of logging in, logging out, and remembering your users

Max Countryman 3.2k Dec 28, 2022
Flask JWT Router is a Python library that adds authorised routes to a Flask app.

Read the docs: Flask-JWT-Router Flask JWT Router Flask JWT Router is a Python library that adds authorised routes to a Flask app. Both basic & Google'

Joe Gasewicz 52 Jan 03, 2023
Multi-user accounts for Django projects

django-organizations Summary Groups and multi-user account management Author Ben Lopatin (http://benlopatin.com) Status Separate individual user ident

Ben Lopatin 1.1k Jan 02, 2023
RSA Cryptography Authentication Proof-of-Concept

RSA Cryptography Authentication Proof-of-Concept This project was a request by Structured Programming lectures in Computer Science college. It runs wi

Dennys Marcos 1 Jan 22, 2022
A simple Boilerplate to Setup Authentication using Django-allauth 🚀

A simple Boilerplate to Setup Authentication using Django-allauth, with a custom template for login and registration using django-crispy-forms.

Yasser Tahiri 13 May 13, 2022
Imia is an authentication library for Starlette and FastAPI (python 3.8+).

Imia Imia (belarussian for "a name") is an authentication library for Starlette and FastAPI (python 3.8+). Production status The library is considered

Alex Oleshkevich 91 Nov 24, 2022
Flask App With Login

Flask App With Login by FranciscoCharles Este projeto basico é o resultado do estudos de algumas funcionalidades do micro framework Flask do Python. O

Charles 3 Nov 14, 2021
Authentication, JWT, and permission scoping for Sanic

Sanic JWT Sanic JWT adds authentication protection and endpoints to Sanic. It is both easy to get up and running, and extensible for the developer. It

Adam Hopkins 229 Jan 05, 2023
This script helps you log in to your LMS account and enter the currently running session

This script helps you log in to your LMS account and enter the currently running session, all in a second

Ali Ebrahimi 5 Sep 01, 2022
A host-guest based app in which host can CREATE the room. and guest can join room with room code and vote for song to skip. User is authenticated using Spotify API

A host-guest based app in which host can CREATE the room. and guest can join room with room code and vote for song to skip. User is authenticated using Spotify API

Aman Raj 5 May 10, 2022