Value Function Approximation via Low-Rank Models

Hao Yi Ong

AA 222, Stanford University

May 28, 2015

Outline

Introduction

Formulation

Approach

Numerical experiments

Introduction

Value function approximation

• a Markov decision process can be solved optimally given the state-action value function

– value function gives utility for taking an action given a state; want to find action that maximizes utility

– can be represented as a matrix for discrete problems

– typically millions or billions of dimensions for practical problems

• value function approximation finds compact alternative

– basis functions used widely in reinforcement learning (RL)

– e.g., Gaussian radial basis function, neural network
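To make the basis-function idea concrete, here is a minimal sketch of a linear approximation with Gaussian radial basis functions, fitted by least squares. The one-dimensional state grid, the stand-in target `values`, and the names `rbf_features`, `centers`, and `width` are all illustrative assumptions, not part of the talk.

```python
import numpy as np

def rbf_features(states, centers, width):
    """Gaussian RBF feature matrix: one row per state, one column per center."""
    diff = states[:, None] - centers[None, :]
    return np.exp(-0.5 * (diff / width) ** 2)

# Hypothetical 1-D state grid and a smooth stand-in for a value function.
states = np.linspace(-1.0, 1.0, 201)
values = np.sin(3.0 * states)

# 12 weights stand in for 201 tabulated values.
centers = np.linspace(-1.0, 1.0, 12)
Phi = rbf_features(states, centers, width=0.2)

# Least-squares fit of the weights, then the compact approximation.
weights, *_ = np.linalg.lstsq(Phi, values, rcond=None)
approx = Phi @ weights
max_error = np.max(np.abs(approx - values))
```

The compact representation here is the weight vector plus the fixed centers; how well it works depends on how smooth the true value function is.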

Value function decomposition

idea: approximate value function as low-rank plus sparse components

• assumes intrinsic low-dimensionality

– i.e., value function can be captured by small set of features

– hinted by success of basis function approximation in RL

• falls under category of Robust Principal Component Analysis (PCA)

– widely used in image/video analysis and collaborative filtering; e.g., Netflix challenge

– novel application of Robust PCA as far as author is aware

Formulation

Markov decision process

defined by the tuple $(S, A, T, R)$

• $S$ and $A$ are the sets of all possible states and actions, respectively

• $T$ gives the probability of transitioning into state $s'$ from taking action $a$ at the current state $s$, and is often denoted $T(s, a, s')$

• $R$ gives a scalar value indicating the immediate reward received for taking action $a$ at the current state $s$, and is denoted $R(s, a)$
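For a discrete problem the tuple can be stored as plain arrays. The sketch below is one possible layout, assuming $m$ states and $n$ actions indexed by integers; the function name `random_mdp` and the random contents are illustrative only.

```python
import numpy as np

def random_mdp(m, n, seed=0):
    """Return (T, R) for a random MDP with m states and n actions.

    T[s, a, s2] is the probability of moving from state s to state s2
    under action a; R[s, a] is the immediate reward.
    """
    rng = np.random.default_rng(seed)
    T = rng.random((m, n, m))
    T /= T.sum(axis=2, keepdims=True)  # each T[s, a, :] is a distribution
    R = rng.standard_normal((m, n))
    return T, R

T, R = random_mdp(m=5, n=3)
```

With this layout, `T` has $m \cdot n \cdot m$ entries and `R` has $m \cdot n$ entries, the same shape as the state-action value matrix discussed later.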

Value iteration

want to find the optimal policy $\pi^\star(s)$

• returns action that maximizes the utility from any given state

• related to state-action value function $Q^\star(s, a)$

$$\pi^\star(s) = \operatorname*{argmax}_{a \in A} Q^\star(s, a)$$

• value iteration updates value function guess $\hat{Q}$ until convergence

$$\hat{Q}(s, a) := R(s, a) + \sum_{s' \in S} T(s, a, s') \max_{a' \in A} \hat{Q}(s', a')$$
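The update and the greedy policy can be sketched in a few lines of NumPy. This is a minimal sketch, not the author's implementation. The discount factor `gamma` is an assumption added here so that the iteration is guaranteed to converge; setting `gamma=1.0` gives the update exactly as written on the slide. The two-state example MDP is also hypothetical.

```python
import numpy as np

def value_iteration(T, R, gamma=0.95, tol=1e-8, max_iter=10_000):
    """Iterate Q(s,a) := R(s,a) + gamma * sum_s' T(s,a,s') max_a' Q(s',a').

    T has shape (m, n, m) and R has shape (m, n).
    gamma is an assumed discount factor, not part of the slide's update.
    """
    Q = np.zeros_like(R, dtype=float)
    for _ in range(max_iter):
        V = Q.max(axis=1)            # max over a' of Q(s', a'), shape (m,)
        Q_new = R + gamma * (T @ V)  # T @ V sums over s', shape (m, n)
        if np.max(np.abs(Q_new - Q)) < tol:
            return Q_new
        Q = Q_new
    return Q

def greedy_policy(Q):
    """pi(s) = argmax over a of Q(s, a)."""
    return Q.argmax(axis=1)

# Hypothetical example: action 0 stays put, action 1 switches state,
# and every action taken in state 1 earns reward 1.
T = np.zeros((2, 2, 2))
T[0, 0, 0] = T[1, 0, 1] = 1.0
T[0, 1, 1] = T[1, 1, 0] = 1.0
R = np.array([[0.0, 0.0], [1.0, 1.0]])

Q = value_iteration(T, R, gamma=0.9)
policy = greedy_policy(Q)
```

In this example the greedy policy moves from state 0 to state 1 and then stays there.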

Matrix decomposition

• suppose matrix $M \in \mathbb{R}^{m \times n}$ encodes $Q^\star(s, a)$

– $m$ and $n$ are the cardinalities of the state and action spaces

• approximate with decomposition $M = L_0 + S_0$

– $L_0$ and $S_0$ are the true low-rank and sparse components

• why should this work?

– implicit assumption about correlation of utility values across actions

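$$\underbrace{M}_{(m \times n)} = \underbrace{A_{L_0}}_{(m \times r)} \, \underbrace{B_{L_0}^T}_{(r \times n)} + \underbrace{S_0}_{(m \times n)}$$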
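The appeal of the factored form is storage: when $r \ll \min(m, n)$ and $S_0$ has few nonzeros, the factors are much smaller than the full matrix. The sketch below builds a synthetic example and counts stored numbers. The sizes, the 5% corruption rate, and the coordinate-format cost of three numbers per nonzero are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 200, 10, 2  # hypothetical numbers of states, actions, and rank

# Low-rank component L0 = A B^T with A (m x r) and B (n x r).
A = rng.standard_normal((m, r))
B = rng.standard_normal((n, r))
L0 = A @ B.T

# Sparse component: roughly 5% of the entries are nonzero.
mask = rng.random((m, n)) < 0.05
S0 = np.zeros((m, n))
S0[mask] = 10.0 * rng.standard_normal(mask.sum())

M = L0 + S0

# Numbers stored: full matrix versus the two factors plus the
# nonzeros of S0 kept as (row, column, value) triples.
dense_storage = m * n
factored_storage = r * (m + n) + 3 * int(mask.sum())
```

The gap between the two counts grows with $m$ and $n$ as long as the rank and the number of sparse entries stay small.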

Approach

Principal Component Pursuit (PCP)

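• best known convex relaxation of Robust PCA

$$\begin{array}{ll} \text{minimize} & \|L\|_* + \lambda \|S\|_1 \\ \text{subject to} & L + S = M \end{array}$$

• intuitively

– nuclear norm $\|\cdot\|_*$ (sum of singular values) is the best convex approximation to minimizing rank

– entrywise $\ell_1$-norm has sparsifying property

• remarkably, under suitable conditions (incoherent $L_0$, sufficiently sparse $S_0$), the solution to PCP recovers $L_0$ and $S_0$ exactly with high probability [CLMW11]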
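One common way to solve PCP numerically is an alternating-direction (augmented Lagrangian) scheme that alternates singular value thresholding for $L$ with entrywise soft thresholding for $S$, as described in [CLMW11]. The sketch below is a minimal version of that scheme and is not necessarily what the linked implementation does. The defaults $\lambda = 1/\sqrt{\max(m, n)}$ and $\mu = mn / (4 \|M\|_1)$ follow the suggestions in [CLMW11]; the function names and the synthetic test matrix are assumptions.

```python
import numpy as np

def shrink(X, tau):
    """Entrywise soft thresholding."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svd_threshold(X, tau):
    """Soft-threshold the singular values of X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * shrink(s, tau)) @ Vt

def pcp(M, lam=None, mu=None, tol=1e-7, max_iter=2000):
    """Approximately solve: minimize ||L||_* + lam * ||S||_1 s.t. L + S = M."""
    m, n = M.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))
    if mu is None:
        mu = m * n / (4.0 * np.abs(M).sum())
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    Y = np.zeros_like(M)  # Lagrange multiplier for L + S = M
    norm_M = np.linalg.norm(M)
    for _ in range(max_iter):
        L = svd_threshold(M - S + Y / mu, 1.0 / mu)
        S = shrink(M - L + Y / mu, lam / mu)
        residual = M - L - S
        Y = Y + mu * residual
        if np.linalg.norm(residual) <= tol * norm_M:
            break
    return L, S

# Synthetic check: rank-2 matrix plus about 5% sparse corruptions.
rng = np.random.default_rng(0)
m, n, r = 50, 50, 2
L0 = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
mask = rng.random((m, n)) < 0.05
S0 = np.zeros((m, n))
S0[mask] = rng.uniform(-5.0, 5.0, size=mask.sum())
M = L0 + S0

L_hat, S_hat = pcp(M)
rel_residual = np.linalg.norm(M - L_hat - S_hat) / np.linalg.norm(M)
rel_error_L = np.linalg.norm(L_hat - L0) / np.linalg.norm(L0)
```

Each iteration costs one SVD of an $m \times n$ matrix, so for very large state spaces a partial or randomized SVD would likely be needed.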

Numerical experiments

Mountain car


Inverted pendulum


Implementation

https://github.com/haoyio/LowRankMDP


https://github.com/haoyio/LowRankMDP

References

[CLMW11] Emmanuel J. Candès, Xiaodong Li, Yi Ma, and John Wright. Robust principal component analysis? Journal of the Association for Computing Machinery, 58(3), 2011.
