Concurrent Probabilistic Temporal Planning (CPTP)
![Page 1: Concurrent Probabilistic Temporal Planning (CPTP)](https://reader035.fdocuments.in/reader035/viewer/2022062323/56815249550346895dc08699/html5/thumbnails/1.jpg)
Concurrent Probabilistic Temporal Planning (CPTP)
Mausam
Joint work with Daniel S. Weld
University of Washington, Seattle
![Page 2: Concurrent Probabilistic Temporal Planning (CPTP)](https://reader035.fdocuments.in/reader035/viewer/2022062323/56815249550346895dc08699/html5/thumbnails/2.jpg)
Motivation
Three features of real-world planning domains:
• Durative actions: all actions (navigation between sites, placing instruments, etc.) take time.
• Concurrency: some instruments may warm up while others perform their tasks and others shut down to save power.
• Uncertainty: all actions (pick up the rock, send data, etc.) have a probability of failure.
![Page 3: Concurrent Probabilistic Temporal Planning (CPTP)](https://reader035.fdocuments.in/reader035/viewer/2022062323/56815249550346895dc08699/html5/thumbnails/3.jpg)
Motivation (contd.)
Concurrent temporal planning (widely studied with deterministic effects):
• Extends classical planning
• Doesn't easily extend to probabilistic outcomes.
Concurrent planning with uncertainty (Concurrent MDPs, AAAI'04):
• Handles combinations of actions over an MDP
• Actions take unit time.
Few planners handle the three in concert!
![Page 4: Concurrent Probabilistic Temporal Planning (CPTP)](https://reader035.fdocuments.in/reader035/viewer/2022062323/56815249550346895dc08699/html5/thumbnails/4.jpg)
Outline of the talk
MDP and CoMDP
Concurrent Probabilistic Temporal Planning
• Concurrent MDP in augmented state space.
Solution Methods for CPTP
• Two heuristics to guide the search
• Hybridisation
Experiments & Conclusions
Related & Future Work
![Page 5: Concurrent Probabilistic Temporal Planning (CPTP)](https://reader035.fdocuments.in/reader035/viewer/2022062323/56815249550346895dc08699/html5/thumbnails/5.jpg)
Markov Decision Process
S: a set of states, factored into Boolean variables.
A: a set of actions (unit duration).
Pr (S × A × S → [0,1]): the transition model.
C (A → ℝ): the cost model.
s0: the start state.
G: a set of absorbing goals.
![Page 6: Concurrent Probabilistic Temporal Planning (CPTP)](https://reader035.fdocuments.in/reader035/viewer/2022062323/56815249550346895dc08699/html5/thumbnails/6.jpg)
GOAL of an MDP
Find a policy (S → A) which:
• minimises the expected cost of reaching a goal
• for a fully observable Markov decision process
• if the agent executes for an indefinite horizon.
![Page 7: Concurrent Probabilistic Temporal Planning (CPTP)](https://reader035.fdocuments.in/reader035/viewer/2022062323/56815249550346895dc08699/html5/thumbnails/7.jpg)
Equations : optimal policy
Define J*(s) {optimal cost} as the minimum expected cost to reach a goal from s.
J* should satisfy:
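For the model defined on the previous slide, the standard Bellman optimality condition for goal-directed MDPs can be written as below. This is a reconstruction under the slide's own symbols (Ap(s) is the set of actions applicable in s, as in the backup diagram), not a copy of the original slide.

$$
J^*(s) =
\begin{cases}
0 & \text{if } s \in G \\
\min_{a \in Ap(s)} \Big[\, C(a) + \sum_{s' \in S} \Pr(s' \mid s, a)\, J^*(s') \Big] & \text{otherwise}
\end{cases}
$$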
![Page 8: Concurrent Probabilistic Temporal Planning (CPTP)](https://reader035.fdocuments.in/reader035/viewer/2022062323/56815249550346895dc08699/html5/thumbnails/8.jpg)
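Bellman Backup
[Figure: Bellman backup at state s. Each applicable action a1, a2, a3 in Ap(s) leads to successor states valued by Jn; Qn+1(s,a) is computed per action, and Jn+1(s) is the min over the actions.]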
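The backup in the figure can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name, the dictionary-based model and the toy numbers are all assumptions made for the example.

```python
def bellman_backup(s, J, applicable, cost, transitions, goals):
    """Return (J_{n+1}(s), greedy action) computed from the current estimates J_n."""
    if s in goals:
        return 0.0, None  # goals are absorbing and cost nothing
    best_value, best_action = float("inf"), None
    for a in applicable(s):
        # Q_{n+1}(s, a) = C(a) + sum over s' of Pr(s' | s, a) * J_n(s')
        q = cost[a] + sum(p * J.get(s2, 0.0) for s2, p in transitions(s, a))
        if q < best_value:
            best_value, best_action = q, a
    return best_value, best_action


# Toy model (invented numbers): a1 reaches the goal for cost 3,
# a2 costs 1 but only succeeds half of the time.
T = {
    ("s", "a1"): [("g", 1.0)],
    ("s", "a2"): [("g", 0.5), ("s", 0.5)],
}
cost = {"a1": 3.0, "a2": 1.0}
J = {"s": 0.0, "g": 0.0}
value, action = bellman_backup(
    "s", J, lambda s: ["a1", "a2"], cost, lambda s, a: T[(s, a)], {"g"}
)
```

With the all-zero initial estimate, the backup prefers a2 because its one-step lookahead cost is lower.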
![Page 9: Concurrent Probabilistic Temporal Planning (CPTP)](https://reader035.fdocuments.in/reader035/viewer/2022062323/56815249550346895dc08699/html5/thumbnails/9.jpg)
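RTDP Trial
[Figure: RTDP trial. At state s a Bellman backup over Ap(s) = {a1, a2, a3} computes Qn+1(s,a) and Jn+1(s); the greedy action amin = a2 is then simulated, and the trial continues towards the Goal.]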
![Page 10: Concurrent Probabilistic Temporal Planning (CPTP)](https://reader035.fdocuments.in/reader035/viewer/2022062323/56815249550346895dc08699/html5/thumbnails/10.jpg)
Real Time Dynamic Programming (Barto, Bradtke and Singh '95)
Trial: simulate the greedy policy; perform a Bellman backup on visited states.
Repeat RTDP trials until the cost function converges.
• Anytime behaviour
• Only expands reachable state space
• Complete convergence is slow
Labeled RTDP (Bonet & Geffner '03)
• Admissible, if started with an admissible (optimistic, lower bound) cost function.
• Monotonic; converges quickly
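A single RTDP trial, as described above, might look like the following sketch. The data layout (dictionaries for the applicability set and transition model) and the toy problem are assumptions for illustration; labelling, as in Labeled RTDP, is not included.

```python
import random


def rtdp_trial(s0, J, applicable, cost, transitions, goals, rng, max_steps=1000):
    """Run one trial: back up each visited state, then simulate the greedy action."""
    s = s0
    for _ in range(max_steps):
        if s in goals:
            break
        # Bellman backup on the visited state
        q = {
            a: cost[a] + sum(p * J.get(s2, 0.0) for s2, p in transitions[(s, a)])
            for a in applicable[s]
        }
        a_min = min(q, key=q.get)
        J[s] = q[a_min]
        # Simulate the greedy action by sampling a successor
        successors, probs = zip(*transitions[(s, a_min)])
        s = rng.choices(successors, weights=probs)[0]
    return J


# Toy model (invented numbers); the optimal cost from "s" is 2.0
transitions = {
    ("s", "a1"): [("g", 1.0)],
    ("s", "a2"): [("g", 0.5), ("s", 0.5)],
}
applicable = {"s": ["a1", "a2"]}
cost = {"a1": 3.0, "a2": 1.0}
J = {"s": 0.0, "g": 0.0}  # zero is an admissible (optimistic) start
rng = random.Random(0)
for _ in range(50):
    rtdp_trial("s", J, applicable, cost, transitions, {"g"}, rng)
```

Because the initial cost function is a lower bound, the estimate for "s" rises monotonically towards the optimal value and never overshoots it.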
![Page 11: Concurrent Probabilistic Temporal Planning (CPTP)](https://reader035.fdocuments.in/reader035/viewer/2022062323/56815249550346895dc08699/html5/thumbnails/11.jpg)
Concurrent MDP (CoMDP) (Mausam & Weld '04)
Allows concurrent combinations of actions.
Safe execution: inherit mutex definitions from classical planning:
• Conflicting preconditions
• Conflicting effects
• Interfering preconditions and effects
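The three mutex conditions can be illustrated with a small pairwise check. This sketch assumes a simplified, deterministic action representation (preconditions and effects as variable-to-value dictionaries); the `Action` class, the helper names and the rover-style actions are hypothetical, and a probabilistic model would need to consider every possible outcome of each action.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Action:
    name: str
    pre: dict  # variable -> required Boolean value
    eff: dict  # variable -> resulting Boolean value


def conflicts(d1, d2):
    """True if the two assignments give some shared variable different values."""
    return any(var in d2 and d2[var] != val for var, val in d1.items())


def mutex(a, b):
    return (
        conflicts(a.pre, b.pre)      # conflicting preconditions
        or conflicts(a.eff, b.eff)   # conflicting effects
        or conflicts(a.eff, b.pre)   # a's effect interferes with b's precondition
        or conflicts(b.eff, a.pre)   # b's effect interferes with a's precondition
    )


def safe(combo):
    """A combination is safe to execute if no pair of its actions is mutex."""
    return all(not mutex(a, b) for a, b in combinations(combo, 2))


warm_up = Action("warm_up", {"power_on": True}, {"camera_warm": True})
send_data = Action("send_data", {"power_on": True}, {"data_sent": True})
shutdown = Action("shutdown", {}, {"power_on": False})
```

Here `warm_up` and `send_data` can run together, while `shutdown` is mutex with both because it falsifies a precondition they rely on.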
![Page 12: Concurrent Probabilistic Temporal Planning (CPTP)](https://reader035.fdocuments.in/reader035/viewer/2022062323/56815249550346895dc08699/html5/thumbnails/12.jpg)
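Bellman Backup (CoMDP)
[Figure: Bellman backup at state s in a CoMDP. The backup ranges over every combination of applicable actions in Ap(s): a1, a2, a3, {a1,a2}, {a1,a3}, {a2,a3} and {a1,a2,a3}, each leading to successor states valued by Jn; Jn+1(s) is the min over all combinations.]
Exponential blowup to calculate a Bellman backup!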
![Page 13: Concurrent Probabilistic Temporal Planning (CPTP)](https://reader035.fdocuments.in/reader035/viewer/2022062323/56815249550346895dc08699/html5/thumbnails/13.jpg)
Sampled RTDP
RTDP with stochastic (partial) backups:
• Approximate
• Always try the last best combination
• Randomly sample a few other combinations
In practice:
• Close to optimal solutions
• Converges very fast
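A stochastic backup of this kind could be sketched as follows. The function signature, the uniform sampling scheme and the toy Q-values are assumptions for illustration; the slide does not specify how combinations are sampled.

```python
import random
from itertools import combinations


def sampled_backup(s, candidates, q_value, last_best, n_samples, rng):
    """Partial backup: evaluate the last best combination plus a few random ones."""
    tried = set(rng.sample(candidates, min(n_samples, len(candidates))))
    if s in last_best:
        tried.add(last_best[s])  # always retry the previous winner
    best = min(tried, key=lambda combo: q_value(s, combo))
    last_best[s] = best
    return q_value(s, best), best


actions = ["a1", "a2", "a3"]
# All 2^3 - 1 = 7 non-empty combinations (assumed mutually safe in this toy case)
candidates = [
    frozenset(c) for n in range(1, len(actions) + 1) for c in combinations(actions, n)
]


def q_value(s, combo):
    # Invented Q-values: more concurrency helps, but a3 is expensive
    return 10 - 2 * len(combo) + (4 if "a3" in combo else 0)


rng = random.Random(1)
last_best = {}
full = sampled_backup("s", candidates, q_value, last_best, len(candidates), rng)
partial = sampled_backup("s", candidates, q_value, last_best, 1, rng)
```

Once a good combination has been found, later partial backups can only match or improve on it, since it is always among the candidates tried.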
![Page 14: Concurrent Probabilistic Temporal Planning (CPTP)](https://reader035.fdocuments.in/reader035/viewer/2022062323/56815249550346895dc08699/html5/thumbnails/14.jpg)
![Page 15: Concurrent Probabilistic Temporal Planning (CPTP)](https://reader035.fdocuments.in/reader035/viewer/2022062323/56815249550346895dc08699/html5/thumbnails/15.jpg)
Modelling CPTP as CoMDP
CoMDP → CPTP:
• Model explicit action durations
• Minimise expected make-span.
If we initialise C(a) as its duration, Δ(a):
[Figure: two execution models, aligned epochs and interwoven epochs.]
![Page 16: Concurrent Probabilistic Temporal Planning (CPTP)](https://reader035.fdocuments.in/reader035/viewer/2022062323/56815249550346895dc08699/html5/thumbnails/16.jpg)
Augmented state space
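[Figure: timeline (time axis marked 0, 3, 6, 9) of actions a to h executing concurrently, starting from world state X.]
Example augmented states:
• <X, ∅>
• <X1, {(a,1), (c,3)}>, where X1 is the application of b on X.
• <X2, {(h,1)}>, where X2 is the application of a, b, c, d and e over X.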
![Page 17: Concurrent Probabilistic Temporal Planning (CPTP)](https://reader035.fdocuments.in/reader035/viewer/2022062323/56815249550346895dc08699/html5/thumbnails/17.jpg)
Simplifying assumptions
All actions have deterministic durations.
All action durations are integers.
Action model:
• Preconditions must hold until the end of the action.
• Effects are usable only at the end of the action.
Properties:
• Mutex rules are still required.
• It is sufficient to consider only epochs when an action ends.
![Page 18: Concurrent Probabilistic Temporal Planning (CPTP)](https://reader035.fdocuments.in/reader035/viewer/2022062323/56815249550346895dc08699/html5/thumbnails/18.jpg)
Completing the CoMDP
Redefine:
• Applicability set
• Transition function
• Start and goal states.
Example: the transition function is redefined so that the agent moves forward in time to an epoch where some action completes.
Start state: <s0, ∅>, etc.
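The redefined transition can be sketched as a function that jumps to the next epoch at which an executing action completes. Several things here are assumptions for illustration: the number paired with each action is treated as its remaining time, effects are deterministic (the real model is probabilistic), and the function and variable names are hypothetical.

```python
def advance(world, executing, effects):
    """Move to the next epoch at which some executing action completes.

    world     : dict, variable -> value (the world state X)
    executing : dict, action name -> time remaining (assumed meaning)
    effects   : dict, action name -> dict of variable updates (deterministic here)
    Returns (elapsed_time, new_world, new_executing).
    """
    if not executing:
        return 0, dict(world), {}
    elapsed = min(executing.values())
    new_world = dict(world)
    new_executing = {}
    for action, remaining in sorted(executing.items()):
        if remaining == elapsed:
            # Effects become usable only when the action ends
            new_world.update(effects[action])
        else:
            new_executing[action] = remaining - elapsed
    return elapsed, new_world, new_executing


effects = {"a": {"a_done": True}, "c": {"c_done": True}}
elapsed, world, executing = advance(
    {"b_done": True}, {"a": 1, "c": 3}, effects
)
```

Starting from an augmented state like <X1, {(a,1), (c,3)}>, one step of this sketch completes a after 1 time unit and leaves c with 2 units remaining.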
![Page 19: Concurrent Probabilistic Temporal Planning (CPTP)](https://reader035.fdocuments.in/reader035/viewer/2022062323/56815249550346895dc08699/html5/thumbnails/19.jpg)
Solution
CPTP = CoMDP in interwoven state space.
Thus one may use our sampled RTDP (etc.).
PROBLEM: Exponential blowup in the size of the state space.
![Page 20: Concurrent Probabilistic Temporal Planning (CPTP)](https://reader035.fdocuments.in/reader035/viewer/2022062323/56815249550346895dc08699/html5/thumbnails/20.jpg)
Outline of the talk
MDP and CoMDP
Concurrent Probabilistic Temporal Planning
• Concurrent MDP in augmented state space.
Solution Methods for CPTP
• Solution 1: Two heuristics to guide the search
• Solution 2: Hybridisation
Experiments & Conclusions
Related & Future Work
![Page 21: Concurrent Probabilistic Temporal Planning (CPTP)](https://reader035.fdocuments.in/reader035/viewer/2022062323/56815249550346895dc08699/html5/thumbnails/21.jpg)
Max Concurrency Heuristic (MC)
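Define c: the maximum number of actions executable concurrently in the domain.
Example with c = 2:
[Figure: a concurrent plan from X to G in which a runs alongside b and c, with J*(<X,∅>) = 10. Serialisation into the sequence a, b, c gives a sequential plan from X to G with J*(X) ≤ 20.]
• J*(X) ≤ 2 × J*(<X,∅>)
• J*(<X,∅>) ≥ J*(X)/2
Admissible heuristic.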
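The bound above suggests a simple heuristic: divide the cost of the serialised (single-action) problem by the concurrency limit. The sketch below assumes the slide's factor of 2 generalises to c, and that J*(X) has already been computed for the serial MDP with C(a) set to the action's duration; the function name is hypothetical.

```python
def max_concurrency_heuristic(serial_cost, c):
    """Lower bound on J*(<X, {}>) obtained from the serial cost J*(X) and limit c."""
    if c < 1:
        raise ValueError("c must be at least 1")
    return serial_cost / c


# Slide example: a serial cost of 20 with c = 2 gives a bound of 10
bound = max_concurrency_heuristic(20, 2)
```

The value never overestimates the true make-span under this assumption, because serialising any concurrent plan stretches it by at most a factor of c.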
![Page 22: Concurrent Probabilistic Temporal Planning (CPTP)](https://reader035.fdocuments.in/reader035/viewer/2022062323/56815249550346895dc08699/html5/thumbnails/22.jpg)
Eager Effects Heuristic : Solving a relaxed problem
State space: S × Z
Let (X, δ) be a state, where
• X is the world state;
• δ is the time remaining for all actions (started any time in the history) to complete execution.
Start state: (s0, 0)
Goal states: { (X, 0) | X ∈ G }
![Page 23: Concurrent Probabilistic Temporal Planning (CPTP)](https://reader035.fdocuments.in/reader035/viewer/2022062323/56815249550346895dc08699/html5/thumbnails/23.jpg)
Eager Effects Heuristic (contd.)
[Figure: actions a, b and c are started in world state X, with the values 8, 2 and 4 marked against them. After 2 units the relaxed state is (V, 6).]
• Allow all actions, even when mutex with a or c! Allowing inapplicable actions to execute, thus optimistic!
• Assuming information of action effects ahead of time, thus optimistic!
Hence the name: Eager Effects!
Admissible heuristic.
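The bookkeeping of the relaxed state (X, δ) can be sketched as follows, reading the figure's numbers as durations a = 8, b = 2 and c = 4. The rule used here (time advances by the shortest newly started action, and δ tracks the longest outstanding one) is an assumption inferred from the figure's example; in particular, the treatment of a non-zero incoming δ is a guess, and the function name is hypothetical.

```python
def eager_step(delta, durations):
    """Advance the relaxed problem by one decision.

    delta     : time remaining for previously started actions
    durations : dict, newly started action -> duration
    Returns (elapsed_time, new_delta).
    """
    if not durations:
        raise ValueError("at least one action must be started")
    elapsed = min(durations.values())  # stop when the quickest action ends
    new_delta = max(delta, max(durations.values())) - elapsed
    return elapsed, new_delta


# Figure example: start a, b, c from (X, 0)
elapsed, new_delta = eager_step(0, {"a": 8, "b": 2, "c": 4})
```

The world-state component is omitted: in the relaxation, the effects of all started actions are applied eagerly at that point, which is what makes the estimate optimistic.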
![Page 24: Concurrent Probabilistic Temporal Planning (CPTP)](https://reader035.fdocuments.in/reader035/viewer/2022062323/56815249550346895dc08699/html5/thumbnails/24.jpg)
Solution2 : Hybridisation
Observations:
• The aligned epoch policy is sub-optimal but fast to compute.
• The interwoven epoch policy is optimal but slow to compute.
Solution: produce a hybrid policy, i.e.:
• Output the interwoven policy for probable states.
• Output the aligned policy for improbable states.
![Page 25: Concurrent Probabilistic Temporal Planning (CPTP)](https://reader035.fdocuments.in/reader035/viewer/2022062323/56815249550346895dc08699/html5/thumbnails/25.jpg)
Path to goals
[Figure: paths from the start state s to goal states G; one branch is marked as low probability.]
![Page 26: Concurrent Probabilistic Temporal Planning (CPTP)](https://reader035.fdocuments.in/reader035/viewer/2022062323/56815249550346895dc08699/html5/thumbnails/26.jpg)
Hybrid algorithm (contd.)
Observation: RTDP explores probable branches much more than others.
Algorithm(m, k, r):
Loop
• Do m RTDP trials; let the current value of the start state be J(s0). (Less than optimal.)
• Output a hybrid policy (π): the interwoven policy for states visited > k times, the aligned policy for other states.
• Evaluate the policy: Jπ(s0). (Greater than optimal.)
• Stop if {Jπ(s0) − J(s0)} < r J(s0).
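The control loop of the algorithm can be sketched with the three steps passed in as callables. Everything about the interface is an assumption for illustration (the callable names, the `max_rounds` safeguard and the stub values); only the loop structure and the stopping test come from the slide.

```python
def hybrid_loop(run_rtdp_trials, build_hybrid_policy, evaluate_policy,
                m, k, r, max_rounds=100):
    """Repeat: m RTDP trials, build hybrid policy, evaluate, test the gap."""
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    for rounds in range(1, max_rounds + 1):
        lower = run_rtdp_trials(m)        # J(s0): below optimal
        policy = build_hybrid_policy(k)   # interwoven if visited > k times, else aligned
        upper = evaluate_policy(policy)   # J_pi(s0): above optimal
        if upper - lower < r * lower:
            break
    return policy, lower, upper, rounds


# Stubs standing in for the real planner (invented values)
lower_bounds = iter([5.0, 8.0, 9.5])
policy, lower, upper, rounds = hybrid_loop(
    run_rtdp_trials=lambda m: next(lower_bounds),
    build_hybrid_policy=lambda k: "hybrid-policy",
    evaluate_policy=lambda policy: 10.0,
    m=100, k=5, r=0.1,
)
```

Since the optimal cost lies between the two values, stopping when their gap falls below r × J(s0) bounds how far the returned policy can be from optimal.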
![Page 27: Concurrent Probabilistic Temporal Planning (CPTP)](https://reader035.fdocuments.in/reader035/viewer/2022062323/56815249550346895dc08699/html5/thumbnails/27.jpg)
Hybridisation
Outputs a proper policy:
• Policy defined at all reachable policy states
• Policy guaranteed to take the agent to a goal.
Has an optimality ratio (r) parameter:
• Controls the balance between optimality and running time.
Can be used as an anytime algorithm.
Is general: we can hybridise two algorithms in other cases, e.g. in solving the original concurrent MDP.
![Page 28: Concurrent Probabilistic Temporal Planning (CPTP)](https://reader035.fdocuments.in/reader035/viewer/2022062323/56815249550346895dc08699/html5/thumbnails/28.jpg)
![Page 29: Concurrent Probabilistic Temporal Planning (CPTP)](https://reader035.fdocuments.in/reader035/viewer/2022062323/56815249550346895dc08699/html5/thumbnails/29.jpg)
Experiments
Domains:
• Rover
• MachineShop
• Artificial
State variables: 14-26
Durations: 1-20
![Page 30: Concurrent Probabilistic Temporal Planning (CPTP)](https://reader035.fdocuments.in/reader035/viewer/2022062323/56815249550346895dc08699/html5/thumbnails/30.jpg)
Speedups in Rover domain
[Figure: Efficiency of different methods. x-axis: different Rover problems (1 to 6); y-axis: time in sec, logarithmic scale (1 to 10000). Series: Interwoven Epoch, Max Concurrency, Eager Effects, Hybrid Algorithm, Aligned epochs.]
![Page 31: Concurrent Probabilistic Temporal Planning (CPTP)](https://reader035.fdocuments.in/reader035/viewer/2022062323/56815249550346895dc08699/html5/thumbnails/31.jpg)
Qualities of solution
[Figure: Solution quality of different methods. x-axis: different Rover problems (1 to 6); y-axis: ratio of make-span to the optimal (0.8 to 1.7). Series: Interwoven Epoch, Max Concurrency, Eager Effects, Hybrid Algorithm, Aligned epochs.]
![Page 32: Concurrent Probabilistic Temporal Planning (CPTP)](https://reader035.fdocuments.in/reader035/viewer/2022062323/56815249550346895dc08699/html5/thumbnails/32.jpg)
Experiments : Summary
Max Concurrency heuristic:
• Fast to compute
• Speeds up the search.
Eager Effects heuristic:
• High quality
• Can be expensive in some domains.
Hybrid algorithm:
• Very fast
• Produces good quality solutions.
Aligned epoch model:
• Superfast
• Outputs poor quality solutions at times.
![Page 33: Concurrent Probabilistic Temporal Planning (CPTP)](https://reader035.fdocuments.in/reader035/viewer/2022062323/56815249550346895dc08699/html5/thumbnails/33.jpg)
Related Work
Prottle (Little, Aberdeen, Thiebaux’05)
Generate, test and debug paradigm (Younes & Simmons’04)
Concurrent options (Rohanimanesh & Mahadevan’04)
![Page 34: Concurrent Probabilistic Temporal Planning (CPTP)](https://reader035.fdocuments.in/reader035/viewer/2022062323/56815249550346895dc08699/html5/thumbnails/34.jpg)
Future Work
Other applications of hybridisation:
• CoMDP
• MDP
• OverSubscription Planning
Relaxing the assumptions:
• Handling mixed costs
• Extending to PDDL2.1
• Stochastic action durations
Extensions to metric resources
State space compression/aggregation