Partially Observable Markov Decision Process (Chapter 15 & 16)


Description

Partially Observable Markov Decision Process (Chapter 15 & 16). José Luis Peralta. Contents: POMDP; Example POMDP; Finite World POMDP algorithm; Practical Considerations; Approximate POMDP Techniques.

Transcript of Partially Observable Markov Decision Process (Chapter 15 & 16)

Page 1: Partially Observable Markov Decision Process  (Chapter 15 & 16)

TKK | Automation Technology Laboratory

Partially Observable Markov Decision Process

(Chapter 15 & 16)

José Luis Peralta

Page 2: Partially Observable Markov Decision Process  (Chapter 15 & 16)

Contents

• POMDP
• Example POMDP
• Finite World POMDP algorithm
• Practical Considerations
• Approximate POMDP Techniques

Page 3: Partially Observable Markov Decision Process  (Chapter 15 & 16)

Partially Observable Markov Decision Processes (POMDP)

• POMDP: uncertainty in measurements and uncertainty in control effects lead to state uncertainty.

• Adapt the previous Value Iteration Algorithm (VIA).

Page 4: Partially Observable Markov Decision Process  (Chapter 15 & 16)

Partially Observable Markov Decision Processes (POMDP)

• POMDP: the world cannot be sensed directly.

• Measurements: incomplete, noisy, etc.

• Partial observability: the robot has to estimate a posterior distribution over the possible world states.

Page 5: Partially Observable Markov Decision Process  (Chapter 15 & 16)

Partially Observable Markov Decision Processes (POMDP)

• POMDP: an algorithm exists to find the optimal control policy for a FINITE WORLD:
  • State space
  • Action space
  • Space of observations
  • Planning horizon
  all finite.

• Computation is complex; for the continuous case there are approximations.

Page 6: Partially Observable Markov Decision Process  (Chapter 15 & 16)

Partially Observable Markov Decision Processes (POMDP)

• The algorithms we are going to study are all based on Value Iteration (VI):

V_T(x) = \gamma \max_u [ r(x, u) + \int V_{T-1}(x') \, p(x' | u, x) \, dx' ]

the same as before, but now the state x is not observable.

• The robot has to make decisions in the BELIEF STATE b(x): the robot's internal knowledge about the state of the environment, i.e. the space of posterior distributions over the state x.

Page 7: Partially Observable Markov Decision Process  (Chapter 15 & 16)

Partially Observable Markov Decision Processes (POMDP)

• So, in belief space:

V_T(b) = \gamma \max_u [ r(b, u) + \int V_{T-1}(b') \, p(b' | u, b) \, db' ]

with r(b, u) = \int r(x, u) \, b(x) \, dx

• Control policy:

\pi_T(b) = \arg\max_u [ r(b, u) + \int V_{T-1}(b') \, p(b' | u, b) \, db' ]

Page 8: Partially Observable Markov Decision Process  (Chapter 15 & 16)

Partially Observable Markov Decision Processes (POMDP)

• Belief bel(x) = b(x): each value in a POMDP is a function of the entire probability distribution.

• Problems:
  • Finite state space → continuous belief space
  • Continuous state space → infinite-dimensional belief space continuum

• There is also complexity in calculating the value function, because of the integral over all distributions b.

Page 9: Partially Observable Markov Decision Process  (Chapter 15 & 16)

Partially Observable Markov Decision Processes (POMDP)

• In the end, an exact optimal solution exists for the interesting special case of a finite world: state space, action space, space of observations and planning horizon all finite.

• The solutions of the value function are piecewise linear functions over the belief space.

This arises because:
• Expectation is a linear operation
• The robot can select different controls in different parts of the belief space

Page 10: Partially Observable Markov Decision Process  (Chapter 15 & 16)

Example POMDP

2 states: x_1, x_2
3 control actions: u_1, u_2, u_3

Page 11: Partially Observable Markov Decision Process  (Chapter 15 & 16)

Example POMDP

When executing u_1 or u_2, the payoff is:

r(x_1, u_1) = -100    r(x_2, u_1) = +100
r(x_1, u_2) = +100    r(x_2, u_2) = -50

Dilemma: opposite payoffs in each state, so knowledge of the state translates directly into payoff.

Page 12: Partially Observable Markov Decision Process  (Chapter 15 & 16)

Example POMDP

r(x_1, u_3) = r(x_2, u_3) = -1

(cost of waiting, cost of sensing, etc.)

To acquire knowledge, the robot has the control u_3, which affects the state of the world in a non-deterministic manner:

p(x'_1 | x_1, u_3) = 0.2    p(x'_2 | x_1, u_3) = 0.8
p(x'_1 | x_2, u_3) = 0.8    p(x'_2 | x_2, u_3) = 0.2

Page 13: Partially Observable Markov Decision Process  (Chapter 15 & 16)

Example POMDP

• Benefit: before each control decision, the robot can sense. By sensing, the robot gains knowledge about the state, makes better control decisions, and obtains a higher payoff expectation.

• In the case of control action u_3, the robot senses without taking a terminal action.

Page 14: Partially Observable Markov Decision Process  (Chapter 15 & 16)

Example POMDP

• The measurement model is governed by the following probability distribution:

p(z_1 | x_1) = 0.7    p(z_2 | x_1) = 0.3
p(z_1 | x_2) = 0.3    p(z_2 | x_2) = 0.7
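To make the example concrete, here is a minimal Python sketch (not from the slides) that encodes the payoffs, the transition model of u_3 and the measurement model defined above; the dictionary names are illustrative only:

```python
# Two-state example POMDP from the slides (sketch).
# States x1, x2; terminal actions u1, u2; sensing/waiting action u3; measurements z1, z2.

# Immediate payoff r(x, u)
R = {
    ("x1", "u1"): -100, ("x2", "u1"): +100,
    ("x1", "u2"): +100, ("x2", "u2"): -50,
    ("x1", "u3"): -1,   ("x2", "u3"): -1,
}

# Transition model p(x' | x, u3), keyed as (x', x)
T_u3 = {
    ("x1", "x1"): 0.2, ("x2", "x1"): 0.8,
    ("x1", "x2"): 0.8, ("x2", "x2"): 0.2,
}

# Measurement model p(z | x), keyed as (z, x)
Z = {
    ("z1", "x1"): 0.7, ("z2", "x1"): 0.3,
    ("z1", "x2"): 0.3, ("z2", "x2"): 0.7,
}
```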

Page 15: Partially Observable Markov Decision Process  (Chapter 15 & 16)

Example POMDP

This example is easy to graph over the belief space (2 states).

• Belief state: p_1 = b(x_1), p_2 = b(x_2), but p_2 = 1 - p_1, so we just graph p_1.

Page 16: Partially Observable Markov Decision Process  (Chapter 15 & 16)

Example POMDP

• Control policy: a function that maps the unit interval [0;1] to the space of all actions,

\pi : [0;1] \rightarrow U

Example:

Page 17: Partially Observable Markov Decision Process  (Chapter 15 & 16)

Example POMDP – Control Choice

• Control choice (when to execute what control?)

First consider the immediate payoff of u_1, u_2, u_3. The payoff is now a function of the belief state b = (p_1, p_2), so the expected payoff is

r(b, u) = p_1 r(x_1, u) + p_2 r(x_2, u)

This is the payoff in POMDPs.
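As a minimal sketch (reusing the R table from the earlier code block), the expected immediate payoff for a belief parameterized by p_1 can be computed as:

```python
# r(b, u) = p1 * r(x1, u) + (1 - p1) * r(x2, u)
def expected_payoff(p1, u, R):
    return p1 * R[("x1", u)] + (1.0 - p1) * R[("x2", u)]

# Example: the expected payoffs of u1, u2, u3 at p1 = 0.5
for u in ("u1", "u2", "u3"):
    print(u, expected_payoff(0.5, u, R))   # 0.0, 25.0, -1.0
```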

Page 18: Partially Observable Markov Decision Process  (Chapter 15 & 16)

Example POMDP – Control Choice

Page 19: Partially Observable Markov Decision Process  (Chapter 15 & 16)

Example POMDP – Control Choice

Page 20: Partially Observable Markov Decision Process  (Chapter 15 & 16)

Example POMDP – Control Choice

Page 21: Partially Observable Markov Decision Process  (Chapter 15 & 16)

Example POMDP – Control Choice

• First we calculate V_1 at horizon T = 1: the robot simply selects the action of highest expected payoff,

V_1(b) = \max_u r(b, u)

a piecewise linear, convex function: the maximum of the individual payoff functions.

Page 22: Partially Observable Markov Decision Process  (Chapter 15 & 16)

Example POMDP – Control Choice

• First we calculate V_1 (T = 1): the robot simply selects the action of highest expected payoff. V_1(b) = \max_u r(b, u) is a piecewise linear, convex function, the maximum of the individual payoff functions.

Page 23: Partially Observable Markov Decision Process  (Chapter 15 & 16)

Example POMDP – Control Choice

• First we calculate V_1 (T = 1): the robot simply selects the action of highest expected payoff.

The transition in the optimal policy occurs where r(b, u_1) = r(b, u_2), i.e. at p_1 = 3/7.

Optimal policy (for T = 1): select u_1 for p_1 < 3/7 and u_2 for p_1 > 3/7.
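A small check of V_1 and the policy switch, building on the expected_payoff sketch above (illustrative, not the slides' code):

```python
# Horizon-1 value function: V1(p1) = max_u r(b, u)
def V1(p1, R):
    return max(expected_payoff(p1, u, R) for u in ("u1", "u2", "u3"))

# r(b,u1) = 100 - 200*p1 and r(b,u2) = 150*p1 - 50 intersect at p1 = 3/7
p_star = 3 / 7
print(V1(p_star, R))   # 100/7 ≈ 14.3; u1 is optimal below p_star, u2 above
```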

Page 24: Partially Observable Markov Decision Process  (Chapter 15 & 16)

Example POMDP - Sensing

• Now we have perception. What if the robot can sense before it chooses a control? How does this affect the optimal value function?

Sensing information about the state enables the robot to choose a better control action.

In the previous example, at p_1 = 3/7 the expected payoff is 100/7 ≈ 14.3.

How much better will this be after sensing?

Page 25: Partially Observable Markov Decision Process  (Chapter 15 & 16)

Example POMDP – Control Choice

The belief after sensing, as a function of the belief before sensing, is given by Bayes rule. For measurement z_1:

p'_1 = \frac{p(z_1 | x_1) \, p_1}{p(z_1 | x_1) \, p_1 + p(z_1 | x_2) \, p_2} = \frac{0.7 \, p_1}{0.4 \, p_1 + 0.3}

Finally, for example with p_1 = 0.4:

p'_1 = \frac{0.7 \cdot 0.4}{0.4 \cdot 0.4 + 0.3} = 0.6087
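A sketch of this Bayes-rule update (reusing the Z table from the first code block); it also returns p(z | b), which is needed later for the expectation over measurements:

```python
# Posterior belief after sensing z, plus the measurement probability p(z | b)
def belief_after_sensing(p1, z, Z):
    num = Z[(z, "x1")] * p1
    den = Z[(z, "x1")] * p1 + Z[(z, "x2")] * (1.0 - p1)
    return num / den, den

p1_post, pz = belief_after_sensing(0.4, "z1", Z)
print(round(p1_post, 4))   # 0.6087, as computed above
```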

Page 26: Partially Observable Markov Decision Process  (Chapter 15 & 16)

Example POMDP – Control Choice

How does this affect the value function?

Page 27: Partially Observable Markov Decision Process  (Chapter 15 & 16)

Example POMDP – Control Choice

Mathematically, this is just replacing p_1 by p'_1 in the value function V_1.

Page 28: Partially Observable Markov Decision Process  (Chapter 15 & 16)

Example POMDP – Control Choice

However, our interest is the complete expected value function after sensing, which also considers the probability of sensing the other measurement z_2. This is given by:

E_z[ V_1(b'_z) ] = \sum_z p(z | b) \, V_1(b'_z)

Page 29: Partially Observable Markov Decision Process  (Chapter 15 & 16)

Example POMDP – Control Choice

And this results in:

Page 30: Partially Observable Markov Decision Process  (Chapter 15 & 16)

Example POMDP – Control Choice

Mathematically:
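Combining the pieces above (V1 and belief_after_sensing from the earlier sketches), a sketch of the expected value after sensing is:

```python
# E_z[ V1(b'_z) ] = sum_z p(z | b) * V1(p1'(z))
def expected_value_after_sensing(p1, R, Z):
    total = 0.0
    for z in ("z1", "z2"):
        p1_post, pz = belief_after_sensing(p1, z, Z)
        total += pz * V1(p1_post, R)
    return total

print(expected_value_after_sensing(3 / 7, R, Z))   # well above the 14.3 obtained without sensing
```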

Page 31: Partially Observable Markov Decision Process  (Chapter 15 & 16)

Example POMDP - Prediction

To plan at a horizon larger than T = 1, we have to take the state transition into consideration and project our value function accordingly.

According to our transition probability model, after executing u_3:

p'_1 = p(x'_1 | x_1, u_3) \, p_1 + p(x'_1 | x_2, u_3) \, (1 - p_1) = 0.2 \, p_1 + 0.8 \, (1 - p_1)

If p_1 = 0, then p'_1 = 0.8; if p_1 = 1, then p'_1 = 0.2. In between, the expectation is linear.
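A one-line sketch of this projection through the transition model of u_3 (reusing T_u3 from the first code block):

```python
# p1' = p(x1' | x1, u3) * p1 + p(x1' | x2, u3) * (1 - p1) = 0.2*p1 + 0.8*(1 - p1)
def project_belief_u3(p1, T_u3):
    return T_u3[("x1", "x1")] * p1 + T_u3[("x1", "x2")] * (1.0 - p1)

print(project_belief_u3(0.0, T_u3), project_belief_u3(1.0, T_u3))   # 0.8 0.2
```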

Page 32: Partially Observable Markov Decision Process  (Chapter 15 & 16)

Example POMDP – Prediction

And this results in:

Page 33: Partially Observable Markov Decision Process  (Chapter 15 & 16)

Example POMDP – Prediction

And adding u_1 and u_2 we have:

Page 34: Partially Observable Markov Decision Process  (Chapter 15 & 16)

Example POMDP – Prediction

Mathematically (fix: include the cost of u_3, which is -1):

Page 35: Partially Observable Markov Decision Process  (Chapter 15 & 16)

Example POMDP – Pruning

Full backup:

At T = 20, the value function is defined over 10^547,864 linear functions.
At T = 30, the value function is defined over 10^561,012,337 linear functions.

Impractical!!!

Efficient approximate POMDP techniques are needed.
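To see why this explodes, here is a rough growth model: under a naive backup each new linear function combines one control with one previous linear function per measurement, giving N_{t+1} = |U| * N_t^{|Z|}. This is a simplification and does not reproduce the slide's exact figures:

```python
import math

# Double-exponential growth of linear constraints without pruning
n_controls, n_measurements = 3, 2
log10_n = math.log10(n_controls)              # N_1 = |U|
for t in range(2, 21):
    log10_n = math.log10(n_controls) + n_measurements * log10_n
print(f"T = 20: roughly 10^{log10_n:,.0f} linear functions")
```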

Page 36: Partially Observable Markov Decision Process  (Chapter 15 & 16)

Finite World POMDP algorithm

To understand this, read the Mathematical Derivation of POMDPs, pp. 531-536 in [1].

Page 37: Partially Observable Markov Decision Process  (Chapter 15 & 16)

Finite World POMDP algorithm

To understand this, read the Mathematical Derivation of POMDPs, pp. 531-536 in [1].

Page 38: Partially Observable Markov Decision Process  (Chapter 15 & 16)

Example POMDP – Practical Considerations

It looks easy, so let's try something more "real"…

Probabilistic Robot "RoboProb"

Page 39: Partially Observable Markov Decision Process  (Chapter 15 & 16)

Example POMDP – Practical Considerations

It looks easy, so let's try something more "real"… Probabilistic Robot "RoboProb"

11 states: x_1, x_2, …, x_11
5 control actions: u_1, u_2, u_3, u_4, u_5 (u_5 = sense without moving)

Transition model: in the grid world of cells 1-11, the commanded motion succeeds with probability 0.8 and slips to each side with probability 0.1.

Page 40: Partially Observable Markov Decision Process  (Chapter 15 & 16)

Example POMDP – Practical Considerations

It looks easy, so let's try something more "real"… Probabilistic Robot "RoboProb"

"Reward" payoff (grid cells):

-0.04  -0.04  -0.04   +1
-0.04  -0.04   -1
-0.04  -0.04  -0.04  -0.04

The same set holds for all control actions. Examples:

r(x_1, u_1) = -0.04    r(x_8, u_2) = -0.04
r(x_7, u_5) = -1       r(x_7, u_3) = -1

Page 41: Partially Observable Markov Decision Process  (Chapter 15 & 16)

Example POMDP – Practical Considerations

It's getting kind of hard :S… Probabilistic Robot "RoboProb"

Transition probability p(x_j | x_i, u_k). Example: p(x_j | x_i, u_1) (rows = current state, columns = posterior state):

      1    2    3    4    5    6    7    8    9    10   11
 1   0.9  0.1  0    0    0    0    0    0    0    0    0
 2   0.1  0.8  0.1  0    0    0    0    0    0    0    0
 3   0    0.1  0.8  0.1  0    0    0    0    0    0    0
 4   0    0    0    1    0    0    0    0    0    0    0
 5   0.8  0    0    0    0.2  0    0    0    0    0    0
 6   0    0    0.8  0    0    0.1  0.1  0    0    0    0
 7   0    0    0    0    0    0    1    0    0    0    0
 8   0    0    0    0    0.8  0    0    0.1  0.1  0    0
 9   0    0    0    0    0    0    0    0.1  0.8  0.1  0
10   0    0    0    0    0    0.8  0    0    0.1  0    0.1
11   0    0    0    0    0    0    0.8  0    0    0.1  0.1

(Grid world with cells 1-4 / 5-7 / 8-11; intended motion succeeds with probability 0.8, slips sideways with probability 0.1 each.)

Page 42: Partially Observable Markov Decision Process  (Chapter 15 & 16)

Example POMDP – Practical Considerations

It's getting kind of hard :S… Probabilistic Robot "RoboProb"

Transition probability p(x_j | x_i, u_k). Example: p(x_j | x_i, u_5) is the 11 × 11 identity matrix (u_5 senses without moving, so the state never changes).

Page 43: Partially Observable Markov Decision Process  (Chapter 15 & 16)

Example POMDP – Practical Considerations

It's getting kind of hard :S… Probabilistic Robot "RoboProb"

Measurement probability p(z_j | x_i): p(z_i | x_i) = 0.7 for the correct measurement and p(z_j | x_i) = 0.03 for each of the ten other measurements (rows = current state, columns = probability of measuring z_j). A small sketch of these matrices follows below.
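A small NumPy sketch of the RoboProb matrices (the array names and the 1-based cell layout are assumptions for illustration):

```python
import numpy as np

N = 11                                       # number of states
Z_MAT = np.full((N, N), 0.03)                # p(z_j | x_i) = 0.03 for j != i
np.fill_diagonal(Z_MAT, 0.7)                 # p(z_i | x_i) = 0.7
assert np.allclose(Z_MAT.sum(axis=1), 1.0)   # each row sums to 1 (0.7 + 10*0.03)

T_U5 = np.eye(N)                             # u5 senses without moving: identity transitions

reward = np.full(N, -0.04)                   # -0.04 everywhere ...
reward[3], reward[6] = +1.0, -1.0            # ... except +1 in cell 4 and -1 in cell 7
```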

Page 44: Partially Observable Markov Decision Process  (Chapter 15 & 16)

Example POMDP – Practical Considerations

It's getting kind of hard :S… Probabilistic Robot "RoboProb"

Belief states: p_1 = b(x_1), p_2 = b(x_2), p_3 = b(x_3), …, p_11 = b(x_11) = 1 - p_1 - p_2 - … - p_10.

Impossible to graph!!

Page 45: Partially Observable Markov Decision Process  (Chapter 15 & 16)

Example POMDP – Practical Considerations

It's getting kind of hard :S… Probabilistic Robot "RoboProb"

Each linear function results from executing a control u, followed by observing a measurement z, and then executing a control u'.

Page 46: Partially Observable Markov Decision Process  (Chapter 15 & 16)

Example POMDP – Practical Considerations

It's getting kind of hard :S… Probabilistic Robot "RoboProb"

• Defining the measurement probability
• Defining the "reward" payoff
• Defining the transition probability
• Merging the transition (control) probability

Page 47: Partially Observable Markov Decision Process  (Chapter 15 & 16)

Example POMDP – Practical Considerations

It's getting kind of hard :S… Probabilistic Robot "RoboProb"

With N the number of states (one measurement per state in this example) and N_C the number of controls, each backup involves:

• Setting beliefs
• Executing u: N_C times
• Sensing z: N times
• Executing u': N_C times

Page 48: Partially Observable Markov Decision Process  (Chapter 15 & 16)

Example POMDP – Practical Considerations

Now what…? Probabilistic Robot "RoboProb"

Calculating r(b, u): N times (N = number of states, N_C = number of controls).

The real problem is to compute p(b' | u, b).

Page 49: Partially Observable Markov Decision Process  (Chapter 15 & 16)

Example POMDP – Practical Considerations

The real problem is to compute p(b' | u, b).

Given a belief b and a control action u, the outcome is a distribution over distributions: because the next belief also depends on the next measurement, and the measurement itself is generated stochastically.

The key factor in this update is the conditional probability p(b' | u, b). This probability specifies a distribution over probability distributions.

Page 50: Partially Observable Markov Decision Process  (Chapter 15 & 16)

Example POMDP – Practical Considerations

The real problem is to compute p(b' | u, b). So we write

p(b' | u, b) = \int p(b' | u, b, z) \, p(z | u, b) \, dz

where p(b' | u, b, z) contains only one non-zero term: the belief b' obtained from b, u and z by the Bayes filter.

Page 51: Partially Observable Markov Decision Process  (Chapter 15 & 16)

Example POMDP – Practical Considerations

The real problem is to compute p(b' | u, b). Arriving at: since p(b' | u, b, z) is a point mass at the belief b'_{u,z} given by the Bayes filter, we can just integrate over measurements z instead of over beliefs b'. Because our space is finite, we have

V_T(b) = \gamma \max_u [ r(b, u) + \sum_z V_{T-1}(b'_{u,z}) \, p(z | u, b) ]

with b'_{u,z} the belief that results from b after executing u and observing z. A sketch composing the earlier helper functions follows below.
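Putting the finite-sum backup together for the two-state example (a sketch composing the earlier helper functions; u_1 and u_2 are treated as terminal, u_3 incurs its cost, projects the belief and then senses, with discount γ = 1):

```python
# V2(b) = max( max_{u in {u1,u2}} r(b,u),  r(b,u3) + sum_z p(z|b') * V1(b'_z) )
def V2(p1, R, T_u3, Z):
    p1_pred = project_belief_u3(p1, T_u3)            # belief after executing u3
    future = 0.0
    for z in ("z1", "z2"):
        p1_post, pz = belief_after_sensing(p1_pred, z, Z)
        future += pz * V1(p1_post, R)                # expectation over measurements
    value_u3 = expected_payoff(p1, "u3", R) + future
    value_terminal = max(expected_payoff(p1, u, R) for u in ("u1", "u2"))
    return max(value_terminal, value_u3)
```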

Page 52: Partially Observable Markov Decision Process  (Chapter 15 & 16)

Example POMDP – Practical Considerations

The real problem is to compute p(b' | u, b). At the end we have something like this.

So, this VIA is far from practical: for any reasonable number of distinct states, measurements and controls, the complexity of the value function is prohibitive, even for relatively benign planning horizons.

Need for approximations.

Page 53: Partially Observable Markov Decision Process  (Chapter 15 & 16)

Approximate POMDP Techniques

• Here we have 3 approximate probabilistic planning and control algorithms: QMDP, AMDP, MC-POMDP.

• They have varying degrees of practical applicability.
• All 3 algorithms rely on approximations of the POMDP value function.
• They differ in the nature of their approximations.

Page 54: Partially Observable Markov Decision Process  (Chapter 15 & 16)

Approximate POMDP Techniques - QMDP

• The QMDP framework considers uncertainty only for a single action choice: it assumes that after the immediate next control action, the state of the world suddenly becomes observable.

Full observability makes it possible to use the MDP-optimal value function. QMDP generalizes the MDP value function to belief spaces through the mathematical expectation operator (see the sketch below).

Planning in QMDPs is as efficient as in MDPs, but the value function generally overestimates the true value of a belief state.
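A minimal QMDP sketch, assuming a generic tabular MDP given as NumPy arrays (this is an illustration of the idea, not the algorithm listing from [1]):

```python
import numpy as np

def qmdp_action(belief, T, R, gamma=0.95, iters=200):
    """belief: (N,), T: (A, N, N) with T[a, x, x'] = p(x'|x,a), R: (N, A) rewards."""
    A, N, _ = T.shape
    V = np.zeros(N)
    for _ in range(iters):                                 # plain MDP value iteration
        Q = R + gamma * np.einsum("anm,m->na", T, V)       # Q[x, a]
        V = Q.max(axis=1)
    Q = R + gamma * np.einsum("anm,m->na", T, V)
    return int(np.argmax(belief @ Q))                      # argmax_u sum_x b(x) Q(x, u)
```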

Page 55: Partially Observable Markov Decision Process  (Chapter 15 & 16)

Approximate POMDP Techniques - QMDP

• Algorithm

• The QMDP framework considers uncertainty only for a single action choice.

Page 56: Partially Observable Markov Decision Process  (Chapter 15 & 16)

Approximate POMDP Techniques - AMDP

• The Augmented MDP (AMDP) maps the belief into a lower-dimensional representation, over which it then performs exact value iteration.

• The "classical" representation consists of the most likely state under a belief, along with the belief entropy.

• AMDPs are like MDPs with one added dimension in the state representation that measures the global degree of uncertainty.

• To implement an AMDP, it is necessary to learn the state transition and the reward function in the low-dimensional belief space (a sketch of the belief compression follows below).
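A sketch of the belief compression step (the function name is illustrative):

```python
import numpy as np

def compress_belief(belief):
    """Map a belief over N states to (most likely state, entropy)."""
    b = np.asarray(belief, dtype=float)
    b = b / b.sum()
    most_likely = int(np.argmax(b))
    entropy = -np.sum(b * np.log(b + 1e-12))   # global degree of uncertainty
    return most_likely, entropy

print(compress_belief([0.7, 0.1, 0.1, 0.1]))   # (0, ≈0.94)
```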

Page 57: Partially Observable Markov Decision Process  (Chapter 15 & 16)

Approximate POMDP Techniques - AMDP

• The "classical" representation consists of the most likely state under a belief, along with the belief entropy.

Page 58: Partially Observable Markov Decision Process  (Chapter 15 & 16)

Approximate POMDP Techniques - AMDP

[Figure: true vs. estimated mean and covariance of the belief.]

Page 59: Partially Observable Markov Decision Process  (Chapter 15 & 16)

Approximate POMDP Techniques - AMDP

• The application of AMDPs to mobile robot navigation is called coastal navigation.
• It anticipates uncertainty.
• It selects motions that trade off overall path length with the uncertainty accrued along the path.
• The resulting trajectories differ significantly from any non-probabilistic solution.
• Being temporarily lost is acceptable if the robot can later re-localize with sufficiently high probability.

Page 60: Partially Observable Markov Decision Process  (Chapter 15 & 16)

Approximate POMDP Techniques - AMDP

• AMDP Algorithm

Page 61: Partially Observable Markov Decision Process  (Chapter 15 & 16)

Approximate POMDP Techniques - AMDP

Page 62: Partially Observable Markov Decision Process  (Chapter 15 & 16)

Approximate POMDP Techniques - MC-POMDP

• The Monte Carlo POMDP (MC-POMDP):
• A particle filter version of POMDPs.
• It calculates a value function defined over sets of particles.
• MC-POMDPs use a local learning technique: a locally weighted learning rule in combination with a proximity test based on KL divergence.
• MC-POMDPs then apply Monte Carlo sampling to implement an approximate value backup.
• The resulting algorithm is a full-fledged POMDP algorithm whose computational complexity and accuracy are both functions of the parameters of the learning algorithm.

Page 63: Partially Observable Markov Decision Process  (Chapter 15 & 16)

Approximate POMDP Techniques - MC-POMDP

• Particle set representing the belief b
• Value function defined over particle sets

Page 64: Partially Observable Markov Decision Process  (Chapter 15 & 16)

Approximate POMDP Techniques - MC-POMDP

• MC-POMDP Algorithm

Page 65: Partially Observable Markov Decision Process  (Chapter 15 & 16)

Approximate POMDP Techniques - MC-POMDP

Particle filter update (Monte Carlo representation of the belief):

• Discrete Monte Carlo representation of p(x_{k-1} | y_{1:k-1}): a set of N particles x_{k-1}^{(i)}
• Draw new particles from the proposal distribution p(x_k^{(i)} | x_{k-1}^{(i)})
• Given the new observation y_k, evaluate the importance weights using the likelihood function: w_k^{(i)} = p(y_k | x_k^{(i)})
• Resample the particles
• Result: a discrete Monte Carlo representation (approximation) of p(x_k | y_{1:k})
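A minimal particle filter step matching the list above; sample_motion and likelihood stand in for the user's motion and measurement models (illustrative names):

```python
import numpy as np

def particle_filter_step(particles, y_k, sample_motion, likelihood, rng=np.random):
    """particles: array of x_{k-1}^(i); returns a resampled particle set for step k."""
    predicted = np.array([sample_motion(x) for x in particles])         # draw from proposal
    weights = np.array([likelihood(y_k, x) for x in predicted], float)  # importance weights
    weights /= weights.sum()
    idx = rng.choice(len(predicted), size=len(predicted), p=weights)    # resample
    return predicted[idx]
```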

Page 66: Partially Observable Markov Decision Process  (Chapter 15 & 16)

References and Links

• References
[1] Thrun, Burgard, Fox. Probabilistic Robotics. MIT Press, 2005.

• Links
http://en.wikipedia.org/wiki/Partially_observable_Markov_decision_process
http://www.cs.cmu.edu/~trey/zmdp/
http://www.cassandra.org/pomdp/index.shtml
http://www.cs.duke.edu/~mlittman/topics/pomdp-page.html

Page 67: Partially Observable Markov Decision Process  (Chapter 15 & 16)

Exercise

Exercise 1 in [1], Chapter 15. A person faces two doors. Behind one is a tiger, behind the other a reward of +10. The person can either listen or open one of the doors. When opening the door with the tiger, the person will be eaten, which has an associated cost of -20. Listening costs -1. When listening, the person will hear a roaring noise that indicates the presence of the tiger, but only with 0.85 probability will the person be able to localize the noise correctly. With 0.15 probability, the noise will appear as if it came from the door hiding the reward.

Your questions:

(a) Provide the formal model of the POMDP, in which you define the state, action, and measurement spaces, the cost function, and the associated probability functions.

(b) What is the expected cumulative payoff/cost of the open-loop action sequence: "Listen, listen, open door 1"? Explain your calculation.

(c) What is the expected cumulative payoff/cost of the open-loop action sequence: "Listen, then open the door for which we did not hear a noise"? Again, explain your calculation.