Probabilistic Temporal Planning with Uncertain Durations
Probabilistic Temporal Planning with Uncertain Durations
Mausam (joint work with Daniel S. Weld), University of Washington, Seattle
Motivation
Three features of real-world planning domains:
- Concurrency: calibrate while the rover moves
- Uncertain effects: 'Grip a rock' may fail
- Uncertain durative actions: wheels spin, so speed is uncertain
Contributions
Novel challenges:
- Large number of decision epochs: results to manage this blowup in different cases
- Large branching factors: approximation algorithms
Five planning algorithms:
- DURprun: optimal
- DURsamp: near-optimal
- DURhyb: anytime, with user-defined error
- DURexp: super-fast
- DURarch: a balance between speed and quality
Identify fundamental issues for future research
Outline of the talk
- Background
- Theory
- Algorithms and Experiments
- Summary and Future Work
Outline of the talk
- Background
  - MDP
  - Decision epochs: happenings, pivots
- Theory
- Algorithms and Experiments
- Summary and Future Work
Markov Decision Process
- S: a set of states, factored into Boolean variables
- A: a set of actions
- Pr: S × A × S → [0,1], the transition model
- C: A → R, the cost model
- s0: the start state
- G: a set of absorbing goals
All actions have unit duration.
GOAL of an MDP
Find a policy π: S → A that minimises the expected cost of reaching a goal, for a fully observable Markov decision process in which the agent executes over an indefinite horizon.
Algorithms: value iteration, real-time dynamic programming (RTDP), and other iterative dynamic-programming algorithms.
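As a concrete illustration of the iterative dynamic-programming algorithms mentioned above, here is a minimal value-iteration sketch for a goal-directed MDP. All names and the toy interface (`trans`, `cost` as callables) are hypothetical, not from the talk:

```python
def value_iteration(states, actions, trans, cost, goals, eps=1e-6):
    """Value iteration for a goal-directed MDP.

    actions(s) -> available actions; trans(s, a) -> list of (s', prob);
    cost(a) -> positive cost. Returns J (expected cost-to-goal) and a
    greedy policy minimising expected cost, as in the talk's MDP setup.
    """
    J = {s: 0.0 for s in states}  # goal states stay at cost 0
    while True:
        delta = 0.0
        for s in states:
            if s in goals:
                continue
            # Bellman backup: best one-step cost plus expected cost-to-go.
            best = min(cost(a) + sum(p * J[s2] for s2, p in trans(s, a))
                       for a in actions(s))
            delta = max(delta, abs(best - J[s]))
            J[s] = best
        if delta < eps:
            break
    policy = {s: min(actions(s),
                     key=lambda a: cost(a) +
                     sum(p * J[s2] for s2, p in trans(s, a)))
              for s in states if s not in goals}
    return J, policy
```

On a trivial deterministic chain 0 → 1 → 2 with unit costs and goal 2, this converges to J(0) = 2, J(1) = 1.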
Definitions (Durative Actions)
Assumption: (probabilistic) TGP action model.
- Preconditions must hold until the end of the action.
- Effects are usable only at the end of the action.
Decision epoch: a time point when a new action may be started.
Happening: a point when an action finishes.
Pivot: a point when an action could finish.
Outline of the talk
- Background
- Theory
  - Explosion of decision epochs
- Algorithms and Experiments
- Summary and Future Work
Decision Epochs (TGP Action Model)
Deterministic durations [Mausam & Weld '05]: decision epochs = set of happenings.
Uncertain durations: non-termination carries information!
Theorem: decision epochs = set of pivots.
Illustration: A bimodal distribution
[Figure: duration distribution of a vs. expected completion time]
Conjecture
If all actions have
- duration distributions independent of effects, and
- unimodal duration distributions,
then decision epochs = set of happenings.
Outline of the talk
- Background
- Theory
- Algorithms and Experiments
  - Expected Durations Planner
  - Archetypal Durations Planner
- Summary and Future Work
Planning with Durative Actions
MDP in an augmented state space: a state pairs the world state with the set of currently executing actions and their elapsed times, e.g. ⟨X, ∅⟩ or ⟨X1, {(a,4), (c,4)}⟩, where X1 is the result of applying b to X.
[Figure: timeline 0-6 showing actions a, b, c executing over X]
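The augmented state above can be sketched as a small data structure. The representation and function names here are hypothetical, chosen only to mirror the slide's ⟨X, {(a,4), (c,4)}⟩ notation:

```python
from typing import FrozenSet, Tuple

# Augmented state: world state X plus the set of executing actions,
# each paired with the time units it has already run.
WorldState = FrozenSet[str]             # set of true Boolean variables
Executing = FrozenSet[Tuple[str, int]]  # {(action, elapsed_time), ...}
AugState = Tuple[WorldState, Executing]

def start_action(s: AugState, a: str) -> AugState:
    """Start a new action: it joins the executing set with elapsed time 0."""
    X, execing = s
    return (X, execing | {(a, 0)})

def advance(s: AugState, dt: int) -> AugState:
    """Advance time by dt with no action finishing (a pivot passed by)."""
    X, execing = s
    return (X, frozenset((a, t + dt) for a, t in execing))
```

Starting a and c in ⟨X, ∅⟩ and advancing 4 time units reproduces the slide's ⟨X, {(a,4), (c,4)}⟩ (before b's effects are applied).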
Uncertain Durations: Transition Function
With action a: uniform(1,2) and action b: uniform(1,2), starting both in ⟨X, ∅⟩ gives four equally likely (0.25) branches at the first pivot: ⟨Xa, {(b,1)}⟩ (only a finished), ⟨Xb, {(a,1)}⟩ (only b finished), ⟨Xab, ∅⟩ (both finished), and the neither-finished branch, which also reaches ⟨Xab, ∅⟩, since both actions must then finish at time 2.
[Figure: transition diagram with the four 0.25-probability branches]
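The equal 0.25 branches in the figure above can be reproduced by enumerating, per executing action, whether it finishes at the current pivot. This is a sketch with hypothetical names, assuming independent finish events:

```python
from itertools import product

def successors(executing, finish_prob):
    """Enumerate joint finish/continue outcomes at a pivot.

    executing: list of action names; finish_prob[a] = P(a finishes at this
    pivot, given it has not finished yet). Returns a map from the frozenset
    of actions that finished to the probability of that joint outcome.
    """
    out = {}
    for outcome in product([True, False], repeat=len(executing)):
        p = 1.0
        done = set()
        for a, finished in zip(executing, outcome):
            p *= finish_prob[a] if finished else 1.0 - finish_prob[a]
            if finished:
                done.add(a)
        key = frozenset(done)
        out[key] = out.get(key, 0.0) + p
    return out

# Both a and b are uniform(1, 2): each finishes at time 1 with prob 0.5,
# giving four joint outcomes of probability 0.25 each, as on the slide.
branches = successors(['a', 'b'], {'a': 0.5, 'b': 0.5})
```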
Branching Factor
If there are n actions, m possible durations, and r probabilistic effects, then the number of potential successors is (m−1)[(r+1)^n − r^n − 1] + r^n.
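The successor count can be transcribed directly (a sketch; the function name is hypothetical):

```python
def potential_successors(n: int, m: int, r: int) -> int:
    """Potential successors for n concurrent actions, each with m possible
    durations and r probabilistic effects, per the slide's formula:
    (m-1)[(r+1)^n - r^n - 1] + r^n.
    """
    return (m - 1) * ((r + 1) ** n - r ** n - 1) + r ** n
```

For example, one action with one duration and one effect gives 1 successor, while the count grows exponentially in n, which is the branching-factor blowup the approximation algorithms target.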
Algorithms
Five planning algorithms:
- DURprun: optimal
- DURsamp: near-optimal
- DURhyb: anytime, with user-defined error
- DURexp: super-fast
- DURarch: a balance between speed and quality
Expected Durations Planner (DURexp)
1. Assign each action a deterministic duration equal to the expected value of its distribution.
2. Build a deterministic-duration policy for this domain.
3. Repeat: execute this policy and wait for an interrupt:
   (a) action terminated as expected: do nothing
   (b) action terminated early: replan from this state
   (c) action terminated late: revise the action's deterministic duration and replan for this domain
   until the goal is reached.
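The loop above can be sketched as a harness around an unspecified deterministic planner. Everything here (the callback interface, names) is hypothetical scaffolding, not the talk's implementation:

```python
import statistics

def dur_exp(domain, plan_deterministic, execute, goal_reached):
    """DURexp sketch: plan with expected durations, repair on surprises.

    domain: {action: list of sampled/possible durations};
    plan_deterministic(domain, durations) -> policy;
    execute(policy) -> (state, action, actual, expected) at each interrupt.
    """
    # Step 1: collapse each duration distribution to its expected value.
    durations = {a: statistics.mean(dist) for a, dist in domain.items()}
    # Step 2: build a deterministic-duration policy.
    policy = plan_deterministic(domain, durations)
    # Step 3: execute, repairing on early/late terminations.
    while not goal_reached():
        state, action, actual, expected = execute(policy)
        if actual == expected:
            continue                       # (a) terminated as expected
        if actual < expected:
            policy = plan_deterministic(domain, durations)  # (b) early
        else:
            durations[action] = actual     # (c) late: revise duration
            policy = plan_deterministic(domain, durations)
```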
Planning Time
[Figure: planning time in seconds (0-6000) for Rover and Machine-Shop, problems 1-10; curves for DURprun (Pruned), DURsamp (Sampled), DURhyb (Hybrid), and DURexp (Exp-Dur)]
Multi-modal distributions
Recall: the conjecture holds only for unimodal distributions.
Decision epochs = happenings if unimodal; pivots if multimodal.
Multi-modal Durations: Transition Fn
With action a: uniform(1,2) and action b: 50% duration 1, 50% duration 3, the branches at the first pivot (probability 0.25 each) are ⟨Xa, {(b,1)}⟩, ⟨Xb, {(a,1)}⟩, ⟨Xab, ∅⟩, and ⟨X, {(a,1), (b,1)}⟩: because b is multi-modal, the neither-finished branch is now a distinct state that must be tracked.
[Figure: transition diagram with the four 0.25-probability branches]
Multi-modal Distributions
Expected Durations Planner (DURexp): one deterministic duration per action, a big approximation for multi-modal distributions.
Archetypal Durations Planner (DURarch): limited uncertainty in durations; one duration per mode of the distribution.
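One way to read "one duration per mode" is collapsing a discrete duration distribution to its local maxima. The talk does not specify how modes are extracted, so this is only an illustrative sketch:

```python
def archetypal_durations(dist):
    """Collapse a discrete duration distribution {duration: prob} to its
    modes (local maxima of probability mass), renormalised.

    DURexp would instead keep a single expected value; keeping one
    duration per mode retains the shape of a multi-modal distribution.
    """
    ds = sorted(dist)
    modes = {}
    for i, d in enumerate(ds):
        left = dist[ds[i - 1]] if i > 0 else 0.0
        right = dist[ds[i + 1]] if i < len(ds) - 1 else 0.0
        if dist[d] >= left and dist[d] >= right:
            modes[d] = dist[d]
    total = sum(modes.values())
    return {d: p / total for d, p in modes.items()}

# Bimodal b from the earlier slide: finishes at 1 (50%) or 3 (50%).
# archetypal_durations({1: 0.5, 3: 0.5}) keeps both modes.
```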
Planning Time (multi-modal)
[Figure: planning time (log scale, 100-10000 sec) in Machine-Shop (multi-modal), problems 11-16; curves for DURprun (Pruned), DURsamp (Sampled), DURhyb (Hybrid), DURarch (Arch-Dur), and DURexp (Exp-Dur)]
Expected Make-span (multi-modal)
[Figure: expected make-span J*(s0), range 14-28, in Machine-Shop (multi-modal), problems 11-16; curves for DURprun, DURsamp, DURhyb, DURarch, and DURexp]
Outline of the talk
- Background
- Theory
- Algorithms and Experiments
- Summary and Future Work
Observations on Concurrency
Summary
- Large number of decision epochs: results to manage the explosion in specific cases
- Large branching factors: Expected Durations Planner; Archetypal Durations Planner (multi-modal)
Handling Complex Action Models
So far: probabilistic TGP. Preconditions hold over-all; effects are usable only at the end.
What about probabilistic PDDL2.1? Preconditions at-start, over-all, and at-end; effects at-start and at-end.
Then decision epochs must be arbitrary time points.
Ramifications
The result is independent of uncertainty! Existing decision-epoch planners are incomplete: SAPA, Prottle, etc., including all IPC winners.
[Figure: counterexample with actions a and b, showing their preconditions and effects over p and q and the goal G]
Related Work
- Tempastic (Younes and Simmons '04): generate, test, and debug
- Prottle (Little, Aberdeen, and Thiebaux '05): planning-graph-based heuristics
- Uncertain durations without concurrency: Foss and Onder '05; Boyan and Littman '00; Bresina et al. '02; Dearden et al. '03