Partial Plan Development for Hierarchical POMDPs
Philipp Heeg
19. November 2013
Philipp Heeg Partial Plan Development for HPOMDPs 19. November 2013 1 / 32
Contents
1 Motivation and Introduction
2 Background Information
  POMDPs and FSCs
  HPOMDPs and PAFSCs
  The UCT-Algorithm
3 Approach
  Partial Plan Development for HPOMDPs
  Partial Plan Development with UCT
4 Evaluation
5 Summary
1. Motivation and Introduction
Planning with Uncertainty and Partial Observability
POMDPs can be used to model partially observable, uncertain domains, but solving them is PSPACE-complete
in Hierarchical POMDPs, expert knowledge is introduced to optimize planning
Partial Plan Development
usually, the whole plan is developed before execution is started
=⇒ all eventualities have to be accounted for
Idea: alternating between partial execution and planning, so the information gained in execution can be used to guide further planning
2.1 POMDPs and FSCs
POMDP: Partially Observable Markov Decision Process
A POMDP has 7 components: S, A, O, T, Z, R, H
a finite set of states S
a finite set of actions A
a finite set of observations O
a transition function T with T (s, a, s ′) ∈ [0, 1]
an observation function Z with Z (s, a, o) ∈ [0, 1]
a reward function R with R(s, a) ∈ ℝ
a horizon H ∈ ℕ
Solution: A policy that maximizes the total expected reward
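The seven components can be sketched as a small Python structure. The container and the tiny two-state example below are my own illustration, not part of the talk; the function signatures mirror the definitions above.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical container for the 7 POMDP components (S, A, O, T, Z, R, H).
@dataclass
class POMDP:
    states: list[str]                          # S
    actions: list[str]                         # A
    observations: list[str]                    # O
    T: Callable[[str, str, str], float]        # T(s, a, s') in [0, 1]
    Z: Callable[[str, str, str], float]        # Z(s, a, o) in [0, 1]
    R: Callable[[str, str], float]             # R(s, a) real-valued
    H: int                                     # horizon

# Tiny deterministic two-state example: "go" switches the state, "stay" keeps it.
example = POMDP(
    states=["left", "right"],
    actions=["go", "stay"],
    observations=["none"],
    T=lambda s, a, s2: 1.0 if (a == "go") == (s != s2) else 0.0,
    Z=lambda s, a, o: 1.0,                     # a single, uninformative observation
    R=lambda s, a: 1.0 if (s, a) == ("left", "go") else 0.0,
    H=5,
)
```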
2.1 POMDPs and FSCs
FSC: Finite State Controller
Policy representation that uses internal states instead of belief states
An FSC has 3 components: N, α, δ
a set of controller-nodes N
an action association function α with α(n) ∈ A
a transition function δ with δ(n, n′) ∈ 2^O
[Figure: example FSC: four controller-nodes with associated actions av, aw, ax, ay, a start node, and transitions labelled with the observations o1 and o2]
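As a sketch of how such a controller is executed (the concrete nodes, actions, and transitions below are illustrative, loosely modelled on the figure, not taken from it):

```python
# alpha maps each controller-node to its action; delta[(n, n2)] is the set of
# observations that move the controller from n to n2.
alpha = {"nv": "av", "nw": "aw", "nx": "ax", "ny": "ay"}
delta = {
    ("nv", "nw"): {"o1"},
    ("nv", "nx"): {"o2"},
    ("nw", "ny"): {"o1", "o2"},
    ("nx", "ny"): {"o1", "o2"},
    ("ny", "ny"): {"o1", "o2"},
}

def step(node: str, observation: str) -> str:
    """Follow the transition out of `node` labelled with `observation`."""
    for (n, n2), obs in delta.items():
        if n == node and observation in obs:
            return n2
    raise ValueError(f"no transition from {node} on {observation}")

node, trace = "nv", []            # "nv" is the start node
for obs in ["o1", "o2", "o1"]:    # observations received during execution
    trace.append(alpha[node])     # action executed in the current node
    node = step(node, obs)
# trace is now ["av", "aw", "ay"] and the controller rests in node "ny"
```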
2.2 HPOMDPs and PAFSCs
HPOMDP: Hierarchical POMDP
Extension to POMDPs that allows for exploitation of expert knowledge:
a new set of abstract actions Aa
a new set of abstract observations Oa
PAFSC: Partially abstract FSC
controller-nodes can be associated with either primitive or abstract actions
abstract nodes can be decomposed using Method-FSCs (MFSCs)
2.2 HPOMDPs and PAFSCs
[Figure: decomposition example: the initial PAFSC (start node analyzeObject followed by the abstract node findObject) is refined by the Method-FSC for findObject, whose nodes moveLeft and findObject branch on foundObject / ¬foundObject; substituting this MFSC for the abstract node yields the decomposed PAFSC]
2.3 The UCT-Algorithm
UCT: Upper Confidence Bound for Trees
UCT is an instance of Monte Carlo Tree Search (MCTS)
MCTS builds a partial search tree by interacting with a domain simulator
A domain simulator consists of 4 components:
a set of states S with initial state s0
a set of actions A
a transition simulator T
a reward simulator R
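A minimal simulator with these four components might look as follows; the chain domain and all names here are hypothetical, purely for illustration:

```python
import random

# Hypothetical domain simulator: states are chain positions, the initial state
# is s0, and T / R are the transition and reward simulators of the slide.
class ChainSimulator:
    s0 = 0
    actions = ["advance", "reset"]

    def T(self, s: int, a: str, rng: random.Random) -> int:
        """Sample a successor state (the transition simulator)."""
        if a == "reset":
            return self.s0
        return s + 1 if rng.random() < 0.8 else s  # advance succeeds w.p. 0.8

    def R(self, s: int, a: str) -> float:
        """Immediate reward for executing a in s (the reward simulator)."""
        return 1.0 if a == "advance" else 0.0

# One simulated episode of length 3, as MCTS would run it:
sim = ChainSimulator()
rng = random.Random(0)
s, total = sim.s0, 0.0
for _ in range(3):
    a = "advance"
    total += sim.R(s, a)
    s = sim.T(s, a, rng)
```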
2.3 The UCT-Algorithm
[Figure: three UCT iterations building the search tree: iteration 1 expands one path and backs up a total reward of 2; iteration 2 adds a second branch with total reward ‒3; iteration 3 descends the first branch further, receives total reward ‒4, and updates the visit counts n(s), n(s,a) and the estimates Q_UCT(s,a) along the visited path]
2.3 The UCT-Algorithm
Applying UCT to HPOMDP problems
states are reachable PAFSCs
actions are decompositions
state transitions are deterministic
rewards are generated by simulating an execution of the final primitive FSC
since the order of decompositions is irrelevant, the generation of decompositions at each node in the search tree can be limited to 1 controller-node
3.1 Partial Plan Development for HPOMDPs
Partial executability of PAFSCs
PAFSCs can be partially executed until the current controller-node is associated with an abstract action
Partial Plan Development: alternating between an execution phase and a planning phase. Total planning time is distributed over all planning phases.
Execution phase: partially executing the current PAFSC until an abstract controller-node is reached
Planning phase: refining the current PAFSC by applying a decomposition to the abstract controller-node that was reached in the last execution phase
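The alternation can be sketched as a small driver loop. All function names below are placeholders of mine; the actual planner refines PAFSCs with UCT rather than an abstract `plan` callback.

```python
# Sketch of the alternating execution/planning scheme (hypothetical interface).
def partial_plan_development(start_node, is_abstract, execute_node, plan, budgets):
    """Alternate execution and planning phases.

    start_node   -- initial controller-node of the current PAFSC
    is_abstract  -- predicate: is this controller-node abstract?
    execute_node -- executes a primitive node, returns the successor node
                    (or None when the controller terminates)
    plan         -- decomposes an abstract node within a planning-time budget,
                    returning the node that replaces it
    budgets      -- planning time per phase, e.g. from a geometric series
    """
    node = start_node
    for budget in budgets:
        # Execution phase: run primitive nodes until an abstract one is hit.
        while node is not None and not is_abstract(node):
            node = execute_node(node)
        if node is None:                       # controller terminated
            break
        # Planning phase: decompose exactly the abstract node just reached.
        node = plan(node, budget)
    return node
```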
3.1 Partial Plan Development for HPOMDPs
[Figure: partial execution of a PAFSC with nodes n0, n1 and abstract node na1: at the beginning of execution phase 1, after one executed action, and after two executed actions, when the abstract node na1 is reached]
3.1 Partial Plan Development for HPOMDPs
[Figure: the MFSC that was selected in planning phase 1 (nodes n2, na2 and terminal node nt1) is substituted into the PAFSC; the refined PAFSC is shown at the beginning and at the end of execution phase 2]
3.1 Partial Plan Development for HPOMDPs
Goal: higher total reward with same total planning time
Disadvantage: less time to plan for earlier decompositions
Advantage: planning specifically for the controller-node that was reached in the last execution phase
3.2 Partial Plan Development with UCT
Idea: reusing the same search tree over all planning phases
in each planning phase, the latest PAFSC is used as root node
only one decomposition applied at plan extraction
as first decomposition in each planning phase, only decompositions for the abstract node that was reached in the latest execution phase are allowed
3.2 Partial Plan Development with UCT
First Planning Phase
[Figure: UCT tree rooted at C0 (Q_UCT(C0)=6.50, n(C0)=500), with children C1 (Q_UCT(C1)=7.54, n(C1)=346) and C2 (Q_UCT(C2)=4.17, n(C2)=154) reached via decompositions d1 and d2, and further nodes C3 to C8 reached via d3 to d8]
3.2 Partial Plan Development with UCT
Second Planning Phase
[Figure: the tree is re-rooted at C1 (Q_UCT(C1)=7.54, n(C1)=346); the statistics gathered below C1 in the first planning phase are reused]
3.2 Partial Plan Development with UCT
Problem: limiting decompositions to 1 abstract controller-node for each tree-node
[Figure: PAFSC fragment in which node n1 branches on o1 / o2 to two abstract nodes na1 and na2]
[Figure: the same fragment after a decomposition of na1 that introduces a further abstract node na3 (transition true)]
Therefore: allow decompositions for all abstract controller-nodes for which there's a primitive path from the initial controller-node
4 Evaluation
comparing the partial planner to a non-partial planner (Christian Spath's Master's thesis), using the same total planning time
for the partial planner, the total time Z is distributed over the planning phases by a geometric series:
t(n) = (1 − q) q^(n−1) Z
3 different evaluation domains with several instances each
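A quick sketch of this budget schedule (variable names are mine); the geometric series guarantees the phase budgets sum to Z in the limit:

```python
# t(n) = (1 - q) * q**(n - 1) * Z: the share of the total planning time Z
# allotted to the n-th planning phase.
def phase_budget(n: int, q: float, Z: float) -> float:
    return (1 - q) * q ** (n - 1) * Z

# With q = 0.5 and Z = 100 s the first phases get 50 s, 25 s, 12.5 s, ...
budgets = [phase_budget(n, 0.5, 100.0) for n in range(1, 6)]

# Summing many phases recovers (essentially all of) the total time Z:
total = sum(phase_budget(n, 0.5, 100.0) for n in range(1, 1000))
```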
4 Evaluation
Results for the Reconnaissance domain, 100s planning time
              non-partial  q=0.1  q=0.3  q=0.5  q=0.7  q=0.9     Ø
Instance 1       0.58      0.52   0.52   0.64   0.52   0.64    0.57
Instance 2       1.81      1.16   1.64   1.27   1.66   1.49    1.45
Instance 3       0.92      0.67   1.23   0.69   0.56   1.09    0.85
Instance 4       0.77      0.54   0.95   0.50   0.65   0.73    0.67
Instance 5       0.92      0.80   0.93   1.01   0.43   1.17    0.87
Instance 6       1.66      1.13   0.52   0.52   0.32   1.76    0.85
Instance 7       0.88      0.47   0.41   0.81   0.68   0.49    0.57
Instance 8       0.39      0.24   0.16   0.24   0.40   0.26    0.26
Instance 9       0.61      0.70   0.58   0.40   0.50   0.26    0.49
Instance 10      0.66      0.79   0.69   0.48   0.83   0.54    0.66
Øw               1.28      0.99   1.02   0.92   0.96   1.11    1.00
5 Summary
Partial Plan Development for Hierarchical POMDPs using the UCT-Algorithm
alternating between execution phases and planning phases
the same UCT-Tree is used for all planning phases, with changing root node
branching factor had to be increased to allow for directed plan development
therefore slightly worse performance in the Reconnaissance domain compared to the non-partial planner
2.3 The UCT-Algorithm
MCTS uses a Tree policy and a Rollout policy for selecting actions during simulations.
UCT uses an adapted Upper Confidence Bound algorithm as Tree policy. The selected action a∗ is determined by:

a∗ = argmax_{a ∈ A} [ Q_UCT(s, a) + c √( ln n(s) / n(s, a) ) ]

Q_UCT(s, a) = average simulation reward when a was selected in s
n(s) = number of visits of s in previous simulations
n(s, a) = number of times a was executed in s in previous simulations
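A direct transcription of this selection rule; the infinite bonus for untried actions is a common convention that the slides do not spell out, and the example values are mine:

```python
import math

def uct_select(Q, n_s, n_sa, actions, c=1.0):
    """Pick a* = argmax_a [ Q_UCT(s, a) + c * sqrt(ln n(s) / n(s, a)) ].

    Q    -- dict: action -> average simulation reward Q_UCT(s, a)
    n_s  -- visit count n(s) of the current state
    n_sa -- dict: action -> count n(s, a)
    Untried actions (n(s, a) = 0) get an infinite bonus, so they are tried first.
    """
    def value(a):
        if n_sa.get(a, 0) == 0:
            return math.inf
        return Q[a] + c * math.sqrt(math.log(n_s) / n_sa[a])
    return max(actions, key=value)

# Example: "b" has the lower average reward but far fewer visits, so the
# exploration bonus makes it the UCT choice here.
best = uct_select(Q={"a": 2.0, "b": 1.5}, n_s=100, n_sa={"a": 90, "b": 10},
                  actions=["a", "b"], c=2.0)
```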