Stochastic Dynamic Programming with Factored Representations Presentation by Dafna Shahaf (Boutilier, Dearden, Goldszmidt 2000)

Page 1:

Stochastic Dynamic Programming with Factored Representations

Presentation by Dafna Shahaf (Boutilier, Dearden, Goldszmidt 2000)

Page 2:

The Problem

Standard MDP algorithms require explicit state space enumeration
Curse of dimensionality
Need: compact representation (intuition: STRIPS)
Need: versions of standard dynamic programming algorithms for it

Page 3:

A Glimpse of the Future

[Figures: Policy Tree, Value Tree]

Page 4:

A Glimpse of the Future: Some Experimental Results

Page 5:

Roadmap

MDPs: Reminder
Structured Representation for MDPs: Bayesian Nets, Decision Trees
Algorithms for Structured Representation
Experimental Results
Extensions

Page 6:

MDPs: Reminder

$\langle S, A, T, R \rangle$ (states, actions, transitions, rewards)
Discounted infinite-horizon
Stationary policies $\pi : S \to A$ (an action to take at state s)
Value functions: $V^k_\pi(s)$ is the k-stage-to-go value function for $\pi$
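For reference, the k-stage-to-go value of a stationary policy can be written as the usual recursion. This is a sketch in the notation used on the later slides, assuming a state-based reward $R(s)$, a discount factor $\gamma$, transition probabilities written $\Pr(s, a, s')$, and the convention $V^0_\pi = R$:

```latex
V^0_\pi(s) = R(s), \qquad
V^k_\pi(s) = R(s) + \gamma \sum_{s'} \Pr(s, \pi(s), s')\, V^{k-1}_\pi(s')
```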

Page 7:

Roadmap

MDPs: Reminder
Structured Representation for MDPs: Bayesian Nets, Decision Trees
Algorithms for Structured Representation
Experimental Results
Extensions

Page 8:

Representing MDPs as Bayesian Networks: Coffee world

Variables:
O: Robot is in office
W: Robot is wet
U: Has umbrella
R: It is raining
HCR: Robot has coffee
HCO: Owner has coffee

Actions:
Go: Switch location
BuyC: Buy coffee
DelC: Deliver coffee
GetU: Get umbrella

The effect of the actions might be noisy. Need to provide a distribution for each effect.

Page 9:

Representing Actions: DelC

[Figure: network and conditional probability trees for action DelC]

Page 10:

Representing Actions: Interesting Points

No need to provide marginal distribution over pre-action variables
Markov Property: we need only the previous state
For now, no synchronic arcs
Frame Problem?
Single network vs. a network for each action
Why Decision Trees?
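One way to read the points above: with one conditional probability tree per post-action variable and no synchronic arcs, the probability of a full next state is just the product of the per-variable probabilities. The following is a minimal sketch of that idea, not the presenter's code. The tuple encoding `(variable, if_true, if_false)`, the helper names, and all numbers in `CPT_DELC` are assumptions made for illustration, and only three of the six Coffee world variables are modeled.

```python
from itertools import product


def lookup(tree, state):
    """Walk a tree (variable, if_true, if_false) down to a leaf for `state`."""
    while isinstance(tree, tuple):
        var, if_true, if_false = tree
        tree = if_true if state[var] else if_false
    return tree


# Hypothetical CPT trees for DelC. Each leaf is Pr(variable is true AFTER the
# action), conditioned on PRE-action variables. Numbers are illustrative only.
CPT_DELC = {
    "O": ("O", 1.0, 0.0),                                # location is unchanged
    "HCR": ("HCR", ("O", 0.2, 1.0), 0.0),                # coffee usually leaves the robot in the office
    "HCO": ("HCO", 1.0, ("O", ("HCR", 0.8, 0.0), 0.0)),  # delivery needs office + coffee
}


def transition_prob(cpts, state, next_state):
    """Pr(next_state | state, action) as a product of per-variable terms."""
    p = 1.0
    for var, tree in cpts.items():
        p_true = lookup(tree, state)
        p *= p_true if next_state[var] else 1.0 - p_true
    return p


state = {"O": True, "HCR": True, "HCO": False}
next_states = [
    dict(zip(CPT_DELC, values))
    for values in product([True, False], repeat=len(CPT_DELC))
]
# The factored distribution sums to one over all next states.
total = sum(transition_prob(CPT_DELC, state, s2) for s2 in next_states)
# Probability that the robot ends up without coffee and the owner with coffee.
p_delivered = transition_prob(
    CPT_DELC, state, {"O": True, "HCR": False, "HCO": True}
)
print(total, p_delivered)
```

Each tree only mentions the variables that matter for its effect, which is one possible answer to "Why Decision Trees?": a full table for HCO over three parents would need eight rows, while the tree above has four leaves.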

Page 11:

Representing Reward

Generally determined by a subset of features.

Page 12:

Policies and Value Functions

The optimal choice may depend only on certain variables (given some others).

[Figures: Policy Tree (internal nodes test features, with branches such as HCR=T / HCR=F; leaves are actions) and Value Tree (internal nodes test features; leaves are values)]
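To make the tree idea concrete, here is a small sketch of how a policy tree and a value tree could be stored and queried. The tuple encoding, the helper names, and the specific trees and numbers are assumptions for illustration; they are not the trees shown on the slide.

```python
def lookup(tree, state):
    """Follow the tests in a tree (variable, if_true, if_false) to a leaf."""
    while isinstance(tree, tuple):
        var, if_true, if_false = tree
        tree = if_true if state[var] else if_false
    return tree


def count_leaves(tree):
    """Number of distinct regions of the state space the tree distinguishes."""
    if not isinstance(tree, tuple):
        return 1
    _, if_true, if_false = tree
    return count_leaves(if_true) + count_leaves(if_false)


# Hypothetical policy tree: internal nodes test features, leaves are actions.
POLICY_TREE = (
    "HCO",
    "Go",
    ("HCR", ("O", "DelC", "Go"), ("O", "Go", "BuyC")),
)

# Hypothetical value tree: same shape of tests, leaves are values.
VALUE_TREE = ("HCO", 10.0, ("HCR", 8.0, 6.0))

state = {"O": False, "W": False, "U": False, "R": True, "HCR": False, "HCO": False}
action = lookup(POLICY_TREE, state)
value = lookup(VALUE_TREE, state)
print(action, value, count_leaves(POLICY_TREE), 2 ** len(state))
```

With six boolean variables a flat table has 64 entries, while this hypothetical policy needs only five leaves, because W, U and R never influence the choice.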

Page 13:

Roadmap

MDPs: Reminder
Structured Representation for MDPs: Bayesian Nets, Decision Trees
Algorithms for Structured Representation
Experimental Results
Extensions

Page 14:

Bellman Backup

Q-Function: the value of performing a in s, given value function V

$$Q_a^V(s) = R(s) + \gamma \sum_{s'} \Pr(s, a, s')\, V(s')$$

Value Iteration: Reminder

$$V^{k+1}(s) = R(s) + \max_a \Big\{ \gamma \sum_{s'} \Pr(s, a, s')\, V^k(s') \Big\}$$

$$V^{k+1}(s) = \max_a Q_a^{k+1}(s), \qquad Q_a^{k+1} := Q_a^{V^k}$$
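The backup above can be sketched on an explicitly enumerated (flat) MDP, which is the baseline the structured algorithm is meant to improve on. This is a minimal illustration: the two-state MDP, the action names, and the stopping tolerance are all assumptions, and $V^0$ is initialised to $R$ to match the structured version on the next slide.

```python
def q_value(P, R, V, s, a, gamma):
    """Q_a^V(s) = R(s) + gamma * sum_{s'} Pr(s, a, s') * V(s')."""
    return R[s] + gamma * sum(p * V[s2] for s2, p in P[a][s].items())


def value_iteration(states, actions, P, R, gamma=0.9, tol=1e-9, max_iter=10_000):
    """Repeat V^{k+1}(s) = max_a Q_a^{V^k}(s) until the values stop changing."""
    V = dict(R)  # V^0 = R
    for _ in range(max_iter):
        V_next = {
            s: max(q_value(P, R, V, s, a, gamma) for a in actions)
            for s in states
        }
        if max(abs(V_next[s] - V[s]) for s in states) < tol:
            return V_next
        V = V_next
    return V


# Hypothetical toy MDP: reward 10 whenever "z" holds; "try" reaches z w.p. 0.9.
STATES = ["z", "not_z"]
ACTIONS = ["wait", "try"]
P = {
    "wait": {"z": {"z": 1.0}, "not_z": {"not_z": 1.0}},
    "try": {"z": {"z": 1.0}, "not_z": {"z": 0.9, "not_z": 0.1}},
}
R = {"z": 10.0, "not_z": 0.0}

V_star = value_iteration(STATES, ACTIONS, P, R)
print(V_star)
```

Every sweep touches every state, which is exactly the cost that explodes when the state space is the cross product of many variables.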

Page 15:

Structured Value Iteration: Overview

Input: Tree($R$). Output: Tree($V^*$).

1. Set Tree($V^0$) = Tree($R$)
2. Repeat
   (a) Compute Tree($Q_a^{V^k}$) = Regress(Tree($V^k$), a) for each action a
   (b) Merge (via maximization) trees Tree($Q_a^{V^k}$) to obtain Tree($V^{k+1}$)
   Until termination criterion. Return Tree($V^{k+1}$).

Page 16:

Example World

Page 17:

Step 2a: Calculating Q-Functions

$$Q_a^V(s) = R(s) + \gamma \sum_{s'} \Pr(s, a, s')\, V(s')$$

1. Expected future value: $\sum_{s'} \Pr(s, a, s')\, V(s')$
2. Discounting future value: multiply by $\gamma$
3. Adding immediate reward: $R(s)$

How to use the structure of the trees?
Tree($Q_a^V$) should distinguish only conditions under which a makes a branch of Tree(V) true with different odds.

Page 18:

Calculating $Q_a^1$:

Tree($V^0$)
PTree($Q_a^1$): finding conditions under which a will have distinct expected value, with respect to $V^0$
FVTree($Q_a^1$): undiscounted expected future value for performing action a with one stage to go
Tree($Q_a^1$): discounting FVTree (by 0.9), and adding the immediate reward function

[Figure: the four trees over variable Z; sample FVTree leaf computation: 1*10 + 0*0]

Page 19:

An Alternative View:

Page 20:

(a more complicated example)

[Figures: Tree($V^1$), Partial PTree($Q_a^2$), Unsimplified PTree($Q_a^2$), PTree($Q_a^2$), FVTree($Q_a^2$), Tree($Q_a^2$)]

Page 23:

The Algorithm: Regress

Input: Tree(V), action a. Output: Tree($Q_a^V$)

1. PTree($Q_a^V$) = PRegress(Tree(V), a) (simplified)
2. Construct FVTree($Q_a^V$): for each branch b of PTree, with leaf node l(b)
   (a) $\Pr_b$ = the product of the individual distributions from l(b)
   (b) $v_b = \sum_{b' \in \mathrm{Tree}(V)} \Pr_b(b')\, V(b')$
   (c) Re-label leaf l(b) with $v_b$.
3. Discount FVTree($Q_a^V$) with $\gamma$, append Tree(R)
4. Return FVTree($Q_a^V$)
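Steps 2 and 3 above can be sketched as follows, taking the PTree as given. This is an illustrative sketch under assumptions: trees are tuples `(variable, if_true, if_false)`, a PTree leaf is a dict mapping each relevant variable to the probability that it is true after the action, and the example PTree is hypothetical. The value and reward tree (Z gives 10, otherwise 0) and the discount 0.9 follow the numbers visible on the earlier $Q_a^1$ slide.

```python
def is_leaf(tree):
    return not isinstance(tree, tuple)


def branches(tree, path=()):
    """Yield (assignment, leaf) for every root-to-leaf branch of a tree."""
    if is_leaf(tree):
        yield dict(path), tree
    else:
        var, if_true, if_false = tree
        yield from branches(if_true, path + ((var, True),))
        yield from branches(if_false, path + ((var, False),))


def map_leaves(tree, fn):
    if is_leaf(tree):
        return fn(tree)
    var, if_true, if_false = tree
    return (var, map_leaves(if_true, fn), map_leaves(if_false, fn))


def branch_prob(dist, assignment):
    """Pr_b(b'): product of the individual distributions stored at a PTree leaf."""
    p = 1.0
    for var, val in assignment.items():
        p *= dist[var] if val else 1.0 - dist[var]
    return p


def add_tree(tree, other, path=None):
    """Append `other` below every leaf of `tree`, summing leaf values.
    Tests already decided on the current branch are skipped (simplification)."""
    path = path or {}
    if not is_leaf(tree):
        var, t, f = tree
        return (var, add_tree(t, other, {**path, var: True}),
                add_tree(f, other, {**path, var: False}))
    if is_leaf(other):
        return tree + other
    var, t, f = other
    if var in path:
        return add_tree(tree, t if path[var] else f, path)
    return (var, add_tree(tree, t, {**path, var: True}),
            add_tree(tree, f, {**path, var: False}))


def regress_from_ptree(p_tree, v_tree, r_tree, gamma):
    # Step 2: v_b = sum over branches b' of Tree(V) of Pr_b(b') * V(b').
    fv_tree = map_leaves(
        p_tree,
        lambda dist: sum(branch_prob(dist, b) * v for b, v in branches(v_tree)),
    )
    # Step 3: discount the future value, then append the reward tree.
    discounted = map_leaves(fv_tree, lambda v: gamma * v)
    return fv_tree, add_tree(discounted, r_tree)


V0 = R_TREE = ("Z", 10.0, 0.0)
# Hypothetical PTree: Z becomes true w.p. 0.9 if Y holds, otherwise it persists.
P_TREE = ("Y", {"Z": 0.9}, ("Z", {"Z": 1.0}, {"Z": 0.0}))

fv_tree, q_tree = regress_from_ptree(P_TREE, V0, R_TREE, gamma=0.9)
print(fv_tree)
print(q_tree)
```

The middle leaf of `fv_tree` is computed as 1*10 + 0*0, the same arithmetic that appears as a figure annotation on the $Q_a^1$ slide.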

Page 26:

The Algorithm: PRegress

Input: Tree(V), action a. Output: PTree($Q_a^V$)

1. If Tree(V) is a single node, return emptyTree
2. X = the variable at the root of Tree(V)
   $T^P_X$ = the tree for CPT(X) (label leaves with X)
3. $T^V_{X=t}, T^V_{X=f}$ = the subtrees of Tree(V) for X=t, X=f
4. $T^P_{X=t}, T^P_{X=f}$ = call PRegress on $T^V_{X=t}, T^V_{X=f}$
5. For each leaf l in $T^P_X$, add $T^P_{X=t}$, $T^P_{X=f}$, or both (according to the distribution at l; use union to combine labels)
6. Return $T^P_X$
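The six steps can be followed almost line by line in code. The sketch below is one possible reading, not the authors' implementation. It assumes trees are tuples `(variable, if_true, if_false)`, CPT leaves are Pr(variable is true after the action), PTree leaves are dicts of such probabilities, the empty tree is an empty dict, and the CPTs for a three-variable world (X, Y, Z) are hypothetical. Pre-action and post-action variables share names here for brevity.

```python
def graft(label, sub, path):
    """Append PTree `sub` below a leaf labelled `label`, merging labels by union.
    Tests on variables already decided on the branch are dropped."""
    if not isinstance(sub, tuple):
        return {**label, **sub}
    var, t, f = sub
    if var in path:
        return graft(label, t if path[var] else f, path)
    return (var, graft(label, t, {**path, var: True}),
            graft(label, f, {**path, var: False}))


def append(tree, sub, path):
    """Graft `sub` at every leaf of `tree`, which hangs below `path`."""
    if isinstance(tree, tuple):
        var, t, f = tree
        return (var, append(t, sub, {**path, var: True}),
                append(f, sub, {**path, var: False}))
    return graft(tree, sub, path)


def pregress(v_tree, cpts):
    if not isinstance(v_tree, tuple):        # step 1: single node
        return {}
    x, v_true, v_false = v_tree              # steps 2 and 3
    p_true = pregress(v_true, cpts)          # step 4
    p_false = pregress(v_false, cpts)

    def expand(node, path):                  # step 5, over the CPT tree of X
        if isinstance(node, tuple):
            var, t, f = node
            return (var, expand(t, {**path, var: True}),
                    expand(f, {**path, var: False}))
        out = {x: node}                      # label the leaf with Pr(X' = true)
        if node > 0.0:                       # X may become true
            out = append(out, p_true, path)
        if node < 1.0:                       # X may become false
            out = append(out, p_false, path)
        return out

    return expand(cpts[x], {})               # step 6


# Hypothetical CPT trees for one action over variables X, Y, Z.
CPTS = {
    "Y": ("X", 0.9, ("Y", 1.0, 0.0)),
    "Z": ("Y", 0.9, ("Z", 1.0, 0.0)),
}
V1 = ("Z", ("Y", 18.1, 19.0), ("Y", 8.1, 0.0))

p_tree = pregress(V1, CPTS)
print(p_tree)
```

Because the value tree tests Y below both branches of Z, any CPT leaf where Z is uncertain receives both sub-PTrees, and the union of labels leaves each leaf with one distribution per variable.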

Page 27:

Step 2b. Maximization

Value Iteration Complete.
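A sketch of how the maximization merge could work on trees: the second tree is appended below each leaf of the first, the leaf values are combined with max, and a test is collapsed when both of its subtrees come out identical. The tuple encoding, the function names, and the two Q-trees are assumptions for illustration. Only values are tracked here; recording the maximizing action at each leaf would give the policy tree.

```python
def simplify(node):
    """Drop a test whose two subtrees are identical."""
    _, if_true, if_false = node
    return if_true if if_true == if_false else node


def merge(t1, t2, op, path=None):
    """Combine two value trees leaf-wise with `op` (for example max)."""
    path = path or {}
    if isinstance(t1, tuple):
        var, t, f = t1
        return simplify((var,
                         merge(t, t2, op, {**path, var: True}),
                         merge(f, t2, op, {**path, var: False})))
    if isinstance(t2, tuple):
        var, t, f = t2
        if var in path:  # test already decided on this branch
            return merge(t1, t if path[var] else f, op, path)
        return simplify((var,
                         merge(t1, t, op, {**path, var: True}),
                         merge(t1, f, op, {**path, var: False})))
    return op(t1, t2)


# Hypothetical Q-trees for two actions.
Q_A = ("Y", ("Z", 18.1, 8.1), ("Z", 19.0, 0.0))
Q_B = ("Z", 19.0, 8.1)

v_next = merge(Q_A, Q_B, max)
print(v_next)
```

In this example the merged tree no longer needs to test Y at all, so the result is smaller than either a naive product of the two trees or the first input.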

Page 28:

Roadmap

MDPs: Reminder
Structured Representation for MDPs: Bayesian Nets, Decision Trees
Algorithms for Structured Representation
Experimental Results
Extensions

Page 29:

Experimental Results

[Figures: worst case, best case]

Page 30:

Roadmap

MDPs: Reminder
Structured Representation for MDPs: Bayesian Nets, Decision Trees
Algorithms for Structured Representation
Experimental Results
Extensions

Page 31:

Extensions

Synchronic edges
POMDPs
Rewards
Approximation

Page 32:

Questions?

Page 33:

Backup slides

Here be dragons.

Page 34:

Regression through a Policy

Page 35:

Improving Policies: Example

Page 36:

Maximization Step, Improved Policy