Value Function Approximation via Low-Rank Models


Transcript of Value Function Approximation via Low-Rank Models

Page 1: Value Function Approximation via Low-Rank Models

Value Function Approximation via Low-Rank Models

Hao Yi Ong

AA 222, Stanford University

May 28, 2015

Page 2: Value Function Approximation via Low-Rank Models

Outline

Introduction

Formulation

Approach

Numerical experiments


Page 3: Value Function Approximation via Low-Rank Models

Value function approximation

I a Markov decision process (MDP) can be solved optimally given the state-action value function

– value function gives utility for taking an action given a state; want to find the action that maximizes utility

– can be represented as a matrix for discrete problems

– typically millions or billions of dimensions for practical problems

I value function approximation finds compact alternative

– basis functions used widely in reinforcement learning (RL)

– e.g., Gaussian radial basis function, neural network
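As a concrete illustration of the basis-function idea, a minimal Gaussian RBF feature map might look like the following sketch; the centers and width are arbitrary choices for illustration, not from the slides:

```python
import numpy as np

def rbf_features(s, centers, sigma=1.0):
    """Gaussian radial basis features: phi_i(s) = exp(-||s - c_i||^2 / (2 sigma^2))."""
    d2 = np.sum((centers - s) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

# A linear value approximation would then be Q(s, a) ≈ w_a · phi(s),
# with one weight vector w_a learned per action.
centers = np.array([[0.0], [0.5], [1.0]])  # hand-picked centers (illustrative)
phi = rbf_features(np.array([0.4]), centers)
```

The state nearest a center activates that center's feature most strongly, so a small number of weights can describe a smooth value surface.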


Page 4: Value Function Approximation via Low-Rank Models

Value function decomposition

idea: approximate value function as low-rank plus sparse components

I assumes intrinsic low-dimensionality

– i.e., value function can be captured by a small set of features

– hinted by success of basis function approximation in RL

I falls under category of Robust Principal Component Analysis (PCA)

– widely used in image/video analysis and collaborative filtering, e.g., the Netflix challenge

– novel application of Robust PCA as far as the author is aware


Page 5: Value Function Approximation via Low-Rank Models

Outline

Introduction

Formulation

Approach

Numerical experiments


Page 6: Value Function Approximation via Low-Rank Models

Markov decision process

defined by the tuple (S, A, T, R)

I S and A are the sets of all possible states and actions, respectively

I T gives the probability of transitioning into state s′ from taking action a at the current state s, and is often denoted T(s, a, s′)

I R gives a scalar value indicating the immediate reward received for taking action a at the current state s, and is denoted R(s, a)


Page 7: Value Function Approximation via Low-Rank Models

Value iteration

want to find the optimal policy π*(s)

I returns action that maximizes the utility from any given state

I related to the state-action value function Q*(s, a)

π*(s) = argmax_{a ∈ A} Q*(s, a)

I value iteration updates value function guess Q̂ until convergence

Q̂(s, a) := R(s, a) + Σ_{s′ ∈ S} T(s, a, s′) max_{a′ ∈ A} Q̂(s′, a′)
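A tabular sketch of this update, assuming T and R are given as NumPy arrays; note the slide's update has no discount factor, so a γ < 1 is added here as a standard assumption to guarantee convergence:

```python
import numpy as np

def value_iteration(T, R, gamma=0.95, tol=1e-8, max_iter=2000):
    """Tabular value iteration implementing the Q update on this slide.

    T[s, a, s2] = probability of moving to s2 after action a in state s
    R[s, a]     = immediate reward
    gamma is a discount factor added for convergence (not on the slide).
    """
    n_states, n_actions, _ = T.shape
    Q = np.zeros((n_states, n_actions))
    for _ in range(max_iter):
        # Bellman backup: R(s, a) + gamma * sum_s' T(s, a, s') max_a' Q(s', a')
        Q_new = R + gamma * T @ Q.max(axis=1)
        if np.max(np.abs(Q_new - Q)) < tol:
            break
        Q = Q_new
    return Q_new
```

The greedy policy is then recovered as π*(s) = argmax over a of Q[s, a], matching the equation above.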


Page 8: Value Function Approximation via Low-Rank Models

Matrix decomposition

I suppose matrix M ∈ R^(m×n) encodes Q*(s, a)

– m and n are the cardinalities of the state and action spaces

I approximate with decomposition M = L0 + S0

– L0 and S0 are the true low-rank and sparse components

I why should this work?

– implicit assumption about correlation of utility values across actions


Page 9: Value Function Approximation via Low-Rank Models

Matrix decomposition

M = A_L0 B_L0ᵀ + S0, where M, S0 ∈ R^(m×n), A_L0 ∈ R^(m×r), and B_L0ᵀ ∈ R^(r×n)
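A toy construction of such a matrix makes the two components concrete; the sizes and sparsity level below are hypothetical, chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 50, 10, 3           # toy |S|, |A|, and rank (assumptions, not from slides)

# Low-rank component: utilities across actions driven by r shared features
A_L0 = rng.normal(size=(m, r))
B_L0 = rng.normal(size=(n, r))
L0 = A_L0 @ B_L0.T            # rank <= r

# Sparse component: a handful of isolated deviations from the low-rank pattern
S0 = np.zeros((m, n))
idx = rng.choice(m * n, size=25, replace=False)
S0.flat[idx] = rng.normal(scale=10.0, size=25)

M = L0 + S0                   # stands in for the Q*(s, a) matrix
```

Only m·r + r·n numbers describe L0, versus m·n for the full matrix, which is where the compactness comes from.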


Page 10: Value Function Approximation via Low-Rank Models

Outline

Introduction

Formulation

Approach

Numerical experiments


Page 11: Value Function Approximation via Low-Rank Models

Principal Component Pursuit (PCP)

I the best known convex relaxation of the Robust PCA problem

minimize   ‖L‖∗ + λ‖S‖1
subject to L + S = M

I intuitively

– nuclear norm ‖·‖∗ is the best convex approximation to minimizing rank

– ℓ1-norm has a sparsifying property

I remarkably, under mild conditions the solution to PCP recovers L0 and S0 exactly [CLMW11]


Page 12: Value Function Approximation via Low-Rank Models

Outline

Introduction

Formulation

Approach

Numerical experiments


Page 13: Value Function Approximation via Low-Rank Models

Mountain car


Page 14: Value Function Approximation via Low-Rank Models

Inverted pendulum


Page 15: Value Function Approximation via Low-Rank Models

Implementation

https://github.com/haoyio/LowRankMDP


Page 16: Value Function Approximation via Low-Rank Models

References

I Emmanuel J. Candès, Xiaodong Li, Yi Ma, and John Wright. Robust principal component analysis? Journal of the ACM, 58(3), 2011.
