
Chapter 3: Finite Markov Decision Processes
Seungjae Ryan Lee

Markov Decision Process (MDP)
● Simplified, flexible formulation of the reinforcement learning problem
● Consists of States, Actions, and Rewards

● States: information available to the agent
● Actions: choices made by the agent
● Rewards: basis for evaluating choices

Agent
● The learner
● Takes actions

Environment
● Everything outside the agent
● Returns state and reward

Agent-Environment Boundary
● Anything the agent cannot arbitrarily change is part of the environment
  ○ The agent might still know everything about the environment
● Different boundaries for different purposes

(Figure: robot example with the "brain" as the agent and the machinery, sensors, and battery as the environment)

Agent-Environment Interactions
1. Agent observes a state
2. Agent takes an action
3. Agent receives a reward and a new state
4. Agent takes another action
5. Repeat
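A minimal sketch of this loop; env and agent are hypothetical objects with reset/step and act methods, not from the slides:

# Minimal agent-environment interaction loop (sketch).
def run_episode(env, agent):
    state = env.reset()                      # 1. agent observes the initial state
    done = False
    total_reward = 0.0
    while not done:
        action = agent.act(state)            # 2. agent takes an action
        state, reward, done = env.step(action)  # 3. environment returns reward and new state
        total_reward += reward               # 4.-5. repeat until a terminal state
    return total_reward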

Transition Probability
● p(s', r | s, a): probability of reaching state s' with reward r by taking action a in state s
  p(s', r | s, a) = Pr{S_t = s', R_t = r | S_{t-1} = s, A_{t-1} = a}
● Fully describes the dynamics of a finite MDP
● Other properties of the environment can be deduced from it
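As an illustration, the dynamics p(s', r | s, a) of a finite MDP can be stored as a plain lookup table; the states, actions, and numbers below are placeholders, not from the slides:

# Dynamics table: (state, action) -> list of (next_state, reward, probability).
p = {
    ("s0", "a0"): [("s0", 1.0, 0.7), ("s1", 0.0, 0.3)],
    ("s0", "a1"): [("s1", 2.0, 1.0)],
    ("s1", "a0"): [("s0", 0.0, 1.0)],
}

# Sanity check: outcome probabilities for each (s, a) pair sum to 1.
for (s, a), outcomes in p.items():
    assert abs(sum(prob for _, _, prob in outcomes) - 1.0) < 1e-9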

Expected Rewards
● r(s, a): expected reward of taking action a in state s
  r(s, a) = E[R_t | S_{t-1} = s, A_{t-1} = a] = Σ_r r Σ_{s'} p(s', r | s, a)
● r(s, a, s'): expected reward of arriving in state s' after taking action a in state s
  r(s, a, s') = E[R_t | S_{t-1} = s, A_{t-1} = a, S_t = s'] = Σ_r r · p(s', r | s, a) / p(s' | s, a)
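A small sketch of how these quantities can be deduced from the dynamics table above (the dictionary layout and function names are assumptions for illustration, not from the slides):

def expected_reward(p, s, a):
    """r(s, a): expected reward for taking action a in state s."""
    return sum(prob * r for _, r, prob in p[(s, a)])

def state_transition_prob(p, s, a, s_next):
    """p(s' | s, a): marginal probability of reaching s'."""
    return sum(prob for s2, _, prob in p[(s, a)] if s2 == s_next)

def expected_reward_given_next(p, s, a, s_next):
    """r(s, a, s'): expected reward given that s' is actually reached."""
    marginal = state_transition_prob(p, s, a, s_next)  # assumed nonzero
    return sum(prob * r for s2, r, prob in p[(s, a)] if s2 == s_next) / marginal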

Recycling Robot Example
● States: battery status (high or low)
● Actions
  ○ Search: high reward. Battery status can be lowered or depleted.
  ○ Wait: low reward. Battery status does not change.
  ○ Recharge: no reward. Battery status changed to high.
● If the battery is depleted, the robot receives -3 reward and the battery status is changed to high.
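A sketch of the recycling robot in the same table format. Only the structure (which actions are available in which state, and the -3 depletion penalty) comes from the slide; the probabilities and reward magnitudes below are assumed for illustration:

ALPHA, BETA = 0.8, 0.6        # assumed probabilities of keeping charge while searching
R_SEARCH, R_WAIT = 2.0, 1.0   # assumed reward levels (search > wait)

recycling_robot = {
    ("high", "search"):  [("high", R_SEARCH, ALPHA), ("low", R_SEARCH, 1 - ALPHA)],
    ("high", "wait"):    [("high", R_WAIT, 1.0)],
    ("low", "search"):   [("low", R_SEARCH, BETA), ("high", -3.0, 1 - BETA)],  # depleted: -3, recharged
    ("low", "wait"):     [("low", R_WAIT, 1.0)],
    ("low", "recharge"): [("high", 0.0, 1.0)],
}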

Transition Graph
● Graphical summary of MDP dynamics

Designing Rewards
● Reward hypothesis
  ○ Goals and purposes can be represented by the maximization of expected cumulative reward
● Tell the agent what you want to achieve, not how to achieve it

(Example reward signals: +1 for each box; proportional to forward motion; always -1)

Episodic Tasks
● Interactions can be broken into episodes
● Episodes end in a special terminal state
● Each episode is independent

(Examples: an episode finishes when the game ends, or when the agent is out of the maze)

Return for Episodic Tasks
● G_t: sum of rewards from time step t
  G_t = R_{t+1} + R_{t+2} + ... + R_T
● T: time of termination
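For an episodic task the return is just the plain sum of the rewards received up to termination, e.g.:

def episodic_return(rewards):
    """rewards[k] is R_{t+1+k}; the list ends at the terminal time T."""
    return sum(rewards)

episodic_return([1.0, 0.0, -1.0, 5.0])  # G_t = 5.0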

Continuing Tasks
● Cannot be naturally broken into episodes
● Go on without limit

(Example: stock trading)

Return for Continuing Tasks
● Sum of rewards is almost always infinite
● Need to discount future rewards by a factor γ (0 ≤ γ ≤ 1)
  G_t = R_{t+1} + γ R_{t+2} + γ² R_{t+3} + ... = Σ_{k=0}^{∞} γ^k R_{t+k+1}
  ○ If γ = 0, the return only considers the immediate reward (myopic)
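A sketch of the discounted return over a (truncated) reward stream:

def discounted_return(rewards, gamma):
    """Discounted sum of rewards: rewards[k] is R_{t+1+k}."""
    return sum(gamma ** k * r for k, r in enumerate(rewards))

discounted_return([1.0] * 100, gamma=0.9)      # approaches 1 / (1 - 0.9) = 10
discounted_return([1.0, 5.0, 2.0], gamma=0.0)  # = 1.0: myopic, only R_{t+1} counts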

Unified Notation for Return
● Cumulative reward: G_t = Σ_{k=t+1}^{T} γ^{k-t-1} R_k
● T can be a finite number or infinity
● Future rewards can be discounted with factor γ
  ○ If T = ∞, then γ must be less than 1

Policy
● Mapping from states to probabilities of selecting each possible action
● π(a|s): probability of selecting action a in state s

State-value function
● v_π(s): expected return from state s when following policy π
  v_π(s) = E_π[G_t | S_t = s]

Action-value function
● q_π(s, a): expected return from taking action a in state s and thereafter following policy π
  q_π(s, a) = E_π[G_t | S_t = s, A_t = a]
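The two value functions are related by averaging action values under the policy, v_π(s) = Σ_a π(a|s) q_π(s, a). A sketch, assuming policy[s] maps actions to probabilities and q maps (s, a) to action values (both hypothetical containers):

def state_value_from_action_values(policy, q, s):
    """v_pi(s) as the policy-weighted average of q_pi(s, a)."""
    return sum(prob * q[(s, a)] for a, prob in policy[s].items())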

Bellman Equation
● Recursive relationship between v_π(s) and the values v_π(s') of its successor states
  v_π(s) = Σ_a π(a|s) Σ_{s', r} p(s', r | s, a) [r + γ v_π(s')]
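The Bellman equation can also be read as an update rule; sweeping it over all states is the policy-evaluation idea developed in the next chapter of the book. A sketch using the dynamics-table format from the earlier examples:

def evaluate_policy(p, policy, states, gamma=0.9, sweeps=1000):
    """Iteratively apply the Bellman equation to estimate v_pi (sketch)."""
    v = {s: 0.0 for s in states}
    for _ in range(sweeps):
        for s in states:
            v[s] = sum(
                prob_a * prob * (r + gamma * v[s2])
                for a, prob_a in policy[s].items()
                for s2, r, prob in p[(s, a)]
            )
    return v

# e.g. evaluate_policy(recycling_robot, policy, ["high", "low"]) for a policy that always searches.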

Optimal Policies and Value Functions
● An optimal policy π_* is at least as good as any policy π: v_{π_*}(s) ≥ v_π(s) for all states s
● There can be multiple optimal policies
● All optimal policies share the same optimal value functions:
  v_*(s) = max_π v_π(s)    q_*(s, a) = max_π q_π(s, a)

Bellman Optimality Equation
● Bellman Equation for the optimal value functions
  v_*(s) = max_a Σ_{s', r} p(s', r | s, a) [r + γ v_*(s')]
  q_*(s, a) = Σ_{s', r} p(s', r | s, a) [r + γ max_{a'} q_*(s', a')]
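The same equation read as a one-step backup; applying it repeatedly over all states is value iteration. A sketch, assuming actions(s) is a hypothetical helper listing the actions available in state s:

def optimal_backup(p, v, s, actions, gamma=0.9):
    """Bellman optimality backup: best expected one-step return from state s."""
    return max(
        sum(prob * (r + gamma * v[s2]) for s2, r, prob in p[(s, a)])
        for a in actions(s)
    )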

Solving the Bellman Optimality Equation
● Nonlinear system of equations: one equation and one unknown per state
● Possible to find the exact optimal policy
● Impractical in most environments
  ○ Need to know the dynamics of the environment
  ○ Need extreme computational power
  ○ Need the Markov property

→ In most cases, approximation is the best possible solution.

Approximation
● Does not require complete knowledge of the environment
● Less memory and computational power needed
● Can focus learning on frequently encountered states

Thank you!

Original content from

● Reinforcement Learning: An Introduction by Sutton and Barto

You can find more content in

● github.com/seungjaeryanlee
● www.endtoend.ai