Date posted: 21-Dec-2015
Metric/Temporal Planning
Metric Temporal Planning (MTP)
-- Adds time and resources to planning
-- Special cases: TP (temporal planning) and RP (resource planning)
Issues with Time
Changes brought by the introduction of time into planning can be grouped into two categories:
-- Changes brought by having a metric (clock) time, i.e., there is a clock with respect to which we can specify events
-- Changes brought by durations of actions, i.e., actions are not instantaneous
Without metric time, a plan has just a beginning and an ending point. Metric time allows us to talk about all time points (and intervals) during the execution of the plan.
Changes brought by metric time include:
-- Exogenous events. Special case: timed initial literals (in the initial state we can state that some fluent becomes true at a specific time in the future during execution)
-- Deadline goals: we can state that different goals need to be made true by different deadline times (instead of all goals being true at the end)
-- Durative goals: we can state that a certain fluent must have a specific value over an entire interval
Issues with Time (contd.)
Durations of actions may be static or “dynamic”
-- A dynamic duration depends on the context, e.g. the time to fill your gas tank depends on how empty the tank is to begin with
-- Advanced issues: uncertain durations…
With instantaneous actions, an action has just “before” and “after”: preconditions must hold “before” and effects will hold “after”. Durative actions have “before” and “after” as well as “during”. With metric time (i.e., an external clock), we can refer to all these points. We can now ask:
-- When are preconditions needed? Are they needed at a single point or over a duration?
-- When are effects given? Are they point effects or “durative” effects (which are guaranteed over a certain duration)?
Note that because actions have durations, they can have multiple effects on a single fluent at different times
-- E.g. the action can make fluent P true at start, false after 10 sec, true again after another 10 sec, etc.
A default assumption is that all preconditions are needed at the beginning and must hold during the action’s entire duration, and that all effects will be available at the end of the action
-- E.g. consider the “Grading homeworks” action: when are the homeworks needed? When are the grades available? What does your teacher tell you?
Issues with Time (contd.)
Durative actions bring more pointed meaning to “concurrency”.
-- Concurrency is not just a luxury (to reduce makespan), but is often a necessity (e.g. burn a match, and cross the dark corridor while the match is burning)
Suppose I tell you that a plan P contains actions A1…A10, with durations d1…d10. What is the makespan (execution duration) of P?
-- Makespan(P) >= max(d1…d10)
-- If Makespan(P) = Sum(d1…d10), then it is a strictly serial plan
-- If Makespan(P) > Sum(d1…d10), then there is idle time in the plan
-- If Makespan(P) < Sum(d1…d10), then there is concurrency
Actions don’t need to start right after the preceding action
-- Think of the bank teller gossiping with his colleague in between servicing each customer
Planned idle/slack time may not always be a bad thing: it can sometimes improve the robustness of the plan
-- Think of three travel plans involving connections in Minneapolis: plan 1 schedules 5 min for connection time; plan 2 schedules 1 hour; plan 3 schedules 2 days. Which one is better (all else being equal)?
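The makespan relations on this slide can be turned into a small check. The following is a minimal sketch, not part of the original slides; the function name `classify_plan` and the sample durations are hypothetical, and the "strictly serial" case follows the slide's reading of Makespan(P) = Sum(d1…dn).

```python
def classify_plan(durations, makespan):
    """Classify a plan by comparing its makespan with its action durations."""
    # A plan can never finish before its longest action does.
    if makespan < max(durations):
        raise ValueError("makespan cannot be shorter than the longest action")
    total = sum(durations)
    if makespan < total:
        return "concurrent"          # some actions must overlap
    if makespan == total:
        return "strictly serial"     # the slide's reading of the equality case
    return "has idle time"           # makespan exceeds the total work


durations = [3, 4, 5]  # hypothetical durations d1..d3
print(classify_plan(durations, 5))
print(classify_plan(durations, 12))
print(classify_plan(durations, 20))
```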
Issues with Resources (continuous quantities)
Resources: actions may consume/produce (continuous quantity) “resources”
-- The main consequence is that we have numeric state variables, instead of just boolean (or multi-valued) ones (multi-valued does not mean numeric: a variable can take red, blue, green as values)
-- Actions can update a numeric state variable (whereas they just assign a non-numeric one)
   Resource qty after action := Some-function-of(Resource qty before action, action parameters)
-- Updates can be linear OR non-linear
-- When combined with durative actions, updates can be discrete (i.e., happen all at once at the end of the action) OR continuous (i.e., happen at some given rate during the action)
Planning issues: how to efficiently reason with continuous quantities during planning
PDDL 2.1 Standard: Summary
Durations
-- Static and dynamic durations allowed
-- Also allows duration inequalities
Preconditions
-- Can be “at start”, “at end” or “over all” (throughout the duration)
-- Doesn’t model preconditions being needed for arbitrary durations in the middle
Effects
-- Can be “at start” or “at end”
-- This makes effects “discrete”
Numeric quantities
-- Can be present in the preconditions or effects
-- Presence in the effects can be “discrete” (“at start”/“at end”) or continuous
-- Continuous change is specified by giving a “rate” at which the quantity changes
-- Non-linear rates are harder
PDDL 2.1 (Level 2): Pure Durative Actions

(:durative-action burn_match
  :parameters ()
  :duration (= ?duration 15)
  :condition (and (at start have_match) (at start have_strikepad))
  :effect (and (at start have_light) (at end (not have_light))))

[Figure: BURN_MATCH (dur: 15). Conditions: have_match, have_strikepad. Effects: have_light, ~have_light.]

(:durative-action cross_cellar
  :parameters ()
  :duration (= ?duration 10)
  :condition (and (at start have_light) (over all have_light) (at start at_steps))
  :effect (and (at start (not at_steps)) (at start crossing) (at end at_fuse_box)))

[Figure: CROSS_CELLAR (dur: 10). Conditions: have_light, at_steps. Effects: ~at_steps, crossing, at_fuse_box.]
PDDL 2.1 Level 3: Durative actions and numeric quantities (but discrete effects)
The entire energy to be consumed is “encumbered” at the very beginning (even though it gets consumed slowly over the full duration).
PDDL 2.1 Level 4: Durative actions and numeric quantities (with continuous effects)
Issues in modeling continuous change by discrete vs. continuous effects
Consider the action of boiling a pan of water
-- The quantity “temperature of water” changes continuously over the duration of the action
We can ignore continuous effects by specifying that the temperature is 100 °C at the end
-- Easy to handle, but we can only access the temperature at the end of the action
-- Reduces concurrency (what if we also put a blow torch to the pan to “hasten” the process?)
Or we can specify that the temperature of the water rises at a linear rate until it becomes 100 °C
-- Harder to handle, but allows more concurrency (the total rate of increase is the sum of all the individual rates of increase)
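To see why continuous effects allow more concurrency, here is a minimal sketch of the "rates add up" idea. It is an illustration only: the function name `time_to_reach`, the starting temperature and the heating rates (degrees per second) are all made-up assumptions, and real heating is of course not linear.

```python
def time_to_reach(target, start, rates):
    """Time for a quantity to rise from start to target when several
    concurrent actions each contribute a constant (linear) rate."""
    total_rate = sum(rates)  # concurrent linear effects simply add up
    if total_rate <= 0:
        raise ValueError("the quantity never reaches the target")
    return (target - start) / total_rate


# Hypothetical numbers: water starts at 20 degrees C, the stove adds
# 2 degrees/sec, and a blow torch adds another 6 degrees/sec.
stove_only = time_to_reach(100.0, 20.0, [2.0])
stove_and_torch = time_to_reach(100.0, 20.0, [2.0, 6.0])
print(stove_only, stove_and_torch)
```

With a discrete "temperature is 100 at the end" effect, the second heat source could not shorten the boil action; with rates, the end time falls out of the sum.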
Compiling durative actions into instantaneous ones
A durative action A that has only at-start, at-end and over-all conditions can be modeled in terms of two coupled instantaneous actions As and Ae
-- As gets all the at-start conditions and effects
-- Ae gets all the at-end conditions and effects
-- An “invariant” (think of it as an Interval Preservation Constraint) from As to Ae holds for all the “over all” preconditions
[Figure: durative action A split into As and Ae. Labels: ps, +es on As; pe, +ee on Ae; po spanning the interval between them.]
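The compilation above can be sketched as follows. This is a minimal illustration under assumptions, not the encoding used by any particular planner: the `DurativeAction` class, its field names and the dictionary layout of the result are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DurativeAction:
    # Hypothetical container for a PDDL 2.1 style durative action.
    name: str
    duration: float
    at_start_cond: frozenset
    over_all_cond: frozenset
    at_end_cond: frozenset
    at_start_eff: frozenset
    at_end_eff: frozenset


def compile_durative(a):
    """Split A into instantaneous As and Ae plus an invariant between them."""
    start = {"name": a.name + "_start", "pre": a.at_start_cond, "eff": a.at_start_eff}
    end = {"name": a.name + "_end", "pre": a.at_end_cond, "eff": a.at_end_eff}
    # The over-all conditions must be preserved from As to Ae, and the two
    # instantaneous actions must be exactly `duration` apart.
    invariant = {"from": start["name"], "to": end["name"],
                 "holds": a.over_all_cond, "separation": a.duration}
    return start, end, invariant


cross_cellar = DurativeAction(
    name="cross_cellar", duration=10,
    at_start_cond=frozenset({"have_light", "at_steps"}),
    over_all_cond=frozenset({"have_light"}),
    at_end_cond=frozenset(),
    at_start_eff=frozenset({"not at_steps", "crossing"}),
    at_end_eff=frozenset({"at_fuse_box"}),
)
for part in compile_durative(cross_cellar):
    print(part)
```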
Plan representation
[Figure: Gantt chart of a plan with actions A1, A2, A3; labels include Drive(cityA,cityB), Q and At(truck,B)]
An executable plan must provide
-- the actions that need to be executed
-- the start times for each of the actions
Or a set of simple temporal constraints (STCs) on the set of actions (STCs are a generalization of partial orders)
-- E.g. A1—[4,5]A2 means 4 <= ST(A2) – ST(A1) <= 5
Plan views: PERT and Gantt charts
-- The Gantt chart is what is shown on the right
-- PERT shows the causal links
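A simple temporal constraint of the form lo <= ST(B) – ST(A) <= hi is easy to check against a concrete schedule. The sketch below is only an illustration; the function name `satisfies` and the tuple layout `(a, b, lo, hi)` are assumptions made here.

```python
def satisfies(start_times, constraints):
    """Check a schedule against simple temporal constraints.

    Each constraint (a, b, lo, hi) means lo <= ST(b) - ST(a) <= hi.
    """
    return all(lo <= start_times[b] - start_times[a] <= hi
               for a, b, lo, hi in constraints)


# The constraint from the slide: 4 <= ST(A2) - ST(A1) <= 5
constraints = [("A1", "A2", 4, 5)]
print(satisfies({"A1": 0, "A2": 4.5}, constraints))
print(satisfies({"A1": 0, "A2": 6}, constraints))
```

Any assignment of start times that passes this check is one executable instance of the constrained plan, which is why a set of STCs is more flexible than a single fixed schedule.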
Problem Representation
Achievement goals are specified as a list <pi,ti> where pi needs to hold by time ti
-- ti is the deadline by which the goal pi must hold. It can be metric time (e.g. make clear(b) true by 2pm)
-- If ti is omitted, we assume that pi is a non-deadline goal (it must be true by the time the plan is done)
“Persist goals” are specified as a condition and an interval over which it must hold
-- A persist goal may be supported by different actions for the different parts of the duration (“goal reduction” a la ZENO)
-- E.g. striking multiple matches to have light over a duration
Plan Quality Measures
Makespan: clock time for the execution of the plan (more concurrency means lower makespan)
Slack: the difference between the deadline for a goal and the time by which the plan achieves it
-- Tardiness is negative slack
-- Optimize max/min/average slack/tardiness measures
Cost: sum of the costs of all the actions
-- Can be split into multiple dimensions, one corresponding to each resource
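The slack and tardiness measures can be computed directly from goal deadlines and achievement times. This is a minimal sketch; the function name `slack_report`, the goal names and the numbers are hypothetical.

```python
def slack_report(goals):
    """goals maps a goal name to (deadline, time the plan achieves it)."""
    # Slack is deadline minus achievement time; negative slack is tardiness.
    slacks = {g: deadline - achieved for g, (deadline, achieved) in goals.items()}
    tardiness = {g: max(0, -s) for g, s in slacks.items()}
    return {
        "slack": slacks,
        "min_slack": min(slacks.values()),
        "avg_slack": sum(slacks.values()) / len(slacks),
        "max_tardiness": max(tardiness.values()),
    }


# Hypothetical goals: one met 2 time units early, one met 3 units late.
goals = {"clear(b)": (14, 12), "at(truck,B)": (10, 13)}
report = slack_report(goals)
print(report)
```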
Concurrency
Two actions are concurrent if their execution durations overlap in time
A plan is concurrent if it has concurrently executing actions
-- If the makespan of a plan is less than the sum of the durations of the actions in the plan, then the plan has concurrency
A problem requires concurrency if every solution plan for the problem is concurrent
-- Note that a problem may have sequential solutions and still need concurrent solutions for optimality reasons
A domain requires concurrency if any of its problems requires concurrency
-- One distinguishing feature of temporal planning domains is that they may have problems that require concurrency
Interesting factoid: several of the planners that won the temporal planning competition could not actually solve problems requiring concurrency
Another interesting factoid: most of the benchmark domains actually didn’t have problems that required concurrency
[Cushing et al. IJCAI-07; ICAPS-07]
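The first two definitions on this slide amount to an interval-overlap test. The sketch below assumes a plan is given as a mapping from action name to a hypothetical `(start, duration)` pair; actions that merely touch end-to-start are treated as not overlapping.

```python
from itertools import combinations


def overlaps(a, b):
    """True if two (start, duration) intervals overlap in time."""
    (a_start, a_dur), (b_start, b_dur) = a, b
    return a_start < b_start + b_dur and b_start < a_start + a_dur


def is_concurrent(plan):
    """A plan is concurrent if any two of its actions overlap."""
    return any(overlaps(a, b) for a, b in combinations(plan.values(), 2))


match_plan = {"burn_match": (0, 15), "cross_cellar": (0, 10)}
serial_plan = {"A1": (0, 5), "A2": (5, 3)}
print(is_concurrent(match_plan))
print(is_concurrent(serial_plan))
```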
Looking at STRIPS Actions from PDDL2.1 Vantage Point
How best to view non-durative actions?
-- Instantaneous: makes it hard to provide physical semantics (no change is instantaneous)
-- Epsilon duration, with only over-all preconditions and at-end postconditions: we can show that domains with this type of actions can never have problems that require concurrency
TGP-style durative actions
A PDDL-2.1 action is a TGP-style durative action if
-- All preconditions are “over all” preconditions
-- All effects are “at-end” effects
It can be shown that domains in which all actions are TGP-style will not require concurrency
-- Concurrency may still be needed for makespan optimization
Temporal Gap
A PDDL-2.1 style action is said to have a temporal gap if there is no single time point in the action where all the preconditions and effects of the action must hold
-- Epsilon-duration STRIPS actions have no temporal gap
-- TGP-style actions have no temporal gap: all the preconditions and effects must hold together at the end point of the action
If none of the actions in a domain have a temporal gap, then that domain cannot have problems with required concurrency
-- “Duration” is like a cost measure
Add…
-- The issue of time: dense vs. integer
-- Rintanen’s complexity issue: R.C. with the same action..
-- Non-RC plans can be compiled 1-1
-- A huge modeling jump
Ended here..
Some Brand Names
Planners that can handle similar types of temporal and resource constraints: TLPlan, HSTS, IxTeT, Zeno, SAPA, LPG
-- TLPlan and SAPA are progression-based planners
-- HSTS, IxTeT and Zeno are partial-order-based planners
-- TLPlan and HSTS are domain-customized planners; the rest are domain independent
Planners that can handle a subset of constraints:
-- Only temporal: TGP, TPG, LPGP
-- Only resources: LPSAT, GRT-R, Kautz-Walser, Metric-FF
-- Subset of temporal and resource constraints: TP4, Resource-IPP
LPGP and LPSAT are “loosely-coupled” systems. LPSAT connects SAT and LP solvers; LPGP connects Graphplan and an LP solver
-- Issues of how “tight” the loose connection is
TGP, TPG and LPGP are Graphplan-based
LPSAT is based on SAT encodings being sent to LP solvers
Kautz-Walser is based solely on LP encodings
State of the Art (as of IPC 2002) (revised for IPC 2004)
At IPC 2002, the PDDL 2.1 standard had three levels
-- Level 1: STRIPS/ADL
-- Level 2: + Durative actions: FF, LPG, SAPA, SGPlan (extends LPG)
-- Level 3: + Numeric quantities with discrete change: Sapa, LPG, SGPlan (extends LPG)
-- Level 4: + Continuous change: none at IPC. Some planners can handle it “in theory” but none are scalable
Approaches for MTP
In theory, pretty much every one of the approaches we saw for classical planning can be (and has been) extended to MTP (with varying degrees of scalability)
There are some interesting tradeoffs
-- PO planners are easiest to extend to support the concurrency needed for durative actions, but have a harder time handling resources (because resource consumption depends on exactly what actions occurred before this time point)
-- Progression planners are easiest to extend to support resource-consuming actions, but have a harder time handling concurrency (need to consider “advancing the clock” as a separate option in addition to applying one of the actions)
Our Road Map
Will focus on conjunctive planning approaches, with special attention to Sapa
-- action models: using the PDDL2.1 standard
-- how to model the search: progression; regression; PO planning
-- how to extract good heuristics
Done
Action Representation
[Figure: Flying action. Labels: (in-city ?airplane ?city1); (fuel ?airplane) > 0; (in-city ?airplane ?city1), (in-city ?airplane ?city2); consume (fuel ?airplane)]
-- Durative, with EA = SA + DA (the action’s end time is its start time plus its duration)
-- Instantaneous effects e at time te = SA + d, 0 <= d <= DA
-- Preconditions need to be true at the starting point, and protected during a period of time d, 0 <= d <= DA
-- Action can consume or produce a continuous amount of some resource
Action conflicts:
-- Consuming the same resource
-- One action’s effect conflicting with another’s precondition or effect
Digression: Concurrent vs. Parallel plans
The main difference with temporal planning is that we need to produce concurrent plans
-- In the context of classical planning, concurrent plans are akin to parallel plans (a la Graphplan)
-- This analogy is not complete, of course. For every solvable problem in classical planning, there is guaranteed to be a sequential plan. This guarantee does not hold for temporal planning (which means we have to search in the space of concurrent plans)
Progression planners that we have seen until now produce sequential plans (FF does not produce parallel plans!)
-- FF is still complete because in classical planning, there is always a sequential plan for every solvable problem
So, we can start by asking what we need to do to make progression produce parallel plans.
Digression: How to produce parallel plans with progression?
The naïve idea is to project over subsets of non-interfering actions (rather than single actions).
-- Problem: exponential branching factor
A better idea: consider “fattening” as well as “lengthening” the current partial plan as two options.
We start by representing the state of a partial plan prefix as [S, {A1…Ak}] where S is the current state, and {A1…Ak} are the mutually non-interfering actions that we have already committed to applying at S.
-- Notice that this is just a generalization of the normal progression state, in which the action set {A1…Ak} is always a singleton
Given a state [S, {A1…Ak}] to expand, we have two (backtrackable) choices:
-- Fatten: consider applying another action B in state S [one branch for each possible action B]
   For this to be feasible, B should be applicable in S and B should not interfere with A1…Ak. The resulting state will be [S, {A1…Ak, B}]
-- Lengthen: consider applying an action C in the state S’ which is obtained by applying actions {A1…Ak} in S [one branch for each applicable action]
   For this to be feasible, C should be applicable in S’. The resulting state is [S’, {C}]
Notice that
-- Fattening is only done at the current state (once lengthening is done, the current state changes, so any new fattening will be done at the new state)
-- Normal progression always selects “Lengthen”. The addition needed to support parallel plans is the “Fatten” branch.
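The fatten/lengthen branching can be sketched as a successor generator over [S, {A1…Ak}] pairs. This is a minimal illustration with STRIPS-style actions; the `Action` tuple, the `interferes` test (one action deletes something the other needs or adds) and the toy actions are assumptions made for this sketch.

```python
from typing import FrozenSet, NamedTuple


class Action(NamedTuple):
    name: str
    pre: FrozenSet[str]
    add: FrozenSet[str]
    delete: FrozenSet[str]


def interferes(a, b):
    # Assumed notion of interference: one deletes what the other needs or adds.
    return bool(a.delete & (b.pre | b.add) or b.delete & (a.pre | a.add))


def apply_all(state, actions):
    # The committed actions are mutually non-interfering, so order is irrelevant.
    for a in actions:
        state = (state - a.delete) | a.add
    return state


def successors(state, committed, actions):
    """Yield (kind, new_state, new_committed) for [state, committed]."""
    # Fatten: commit one more non-interfering action at the same state.
    for b in actions:
        if (b not in committed and b.pre <= state
                and not any(interferes(b, a) for a in committed)):
            yield "fatten", state, committed | {b}
    # Lengthen: apply the committed set, then start a new set with one action.
    next_state = apply_all(state, committed)
    for c in actions:
        if c.pre <= next_state:
            yield "lengthen", next_state, frozenset({c})


fs = frozenset
a = Action("a", fs({"p"}), fs({"q"}), fs())
b = Action("b", fs({"p"}), fs({"r"}), fs())
c = Action("c", fs({"q", "r"}), fs({"g"}), fs())
for kind, s, acts in successors(fs({"p"}), fs({a}), [a, b, c]):
    print(kind, sorted(s), sorted(x.name for x in acts))
```

Dropping the "fatten" loop gives back ordinary sequential progression.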
Digression: Generating concurrent plans is similar to generating parallel plans…
To generate concurrent plans using progression, we start with the idea of generating parallel plans with progression
-- For parallel plans, the “state of the partial plan” is represented by [S, {A1…Ak}]
For temporal concurrent plans, we need to generalize this to consider the fact that
1. Each action may have a different duration
2. Actions may have effects that are realized at different time points in the future
   This means that some actions that we have committed to applying at previous states may wind up posting their effects now.
The solution is to start thinking in terms of a “current time stamp”, and information about the set of durative actions that we have committed to apply whose effects have not yet been realized.
-- We can either add additional non-interfering actions at the current time stamp OR advance the time stamp (to the nearest future time where new effects of already committed actions can be realized).
State-Space Search: Search is through time-stamped states
Search states should have information about
-- what conditions hold at the current time slice (P, M below)
-- what actions we have already committed to put into the plan (Π, Q below)
S = (P, M, Π, Q, t)
-- P: set <pi,ti> of predicates pi and the time of their last achievement ti < t
-- M: set of functions representing resource values
-- Π: set of protected persistent conditions (could be binary or resource conditions)
-- Q: event queue (contains resource as well as binary fluent events)
-- t: time stamp of S
In the initial state, P and M are non-empty; Q is non-empty if we have exogenous events
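One way to picture the time-stamped state S = (P, M, Π, Q, t) is as a small record. The sketch below is an assumption about layout only (field types, tuple shapes and the name `State` are not from the slides); the sample values describe a state in which the match has just been lit.

```python
from dataclasses import dataclass, field


@dataclass
class State:
    P: dict = field(default_factory=dict)   # predicate -> time of last achievement
    M: dict = field(default_factory=dict)   # resource function -> current value
    Pi: list = field(default_factory=list)  # protected conditions: (cond, start, end)
    Q: list = field(default_factory=list)   # event queue: (time, effect)
    t: float = 0                            # time stamp of the state


# State after lighting the match at time 0: light holds now, and the event
# queue records that it will go out at time 15.
s = State(P={"have_light": 0, "at_steps": 0}, Q=[(15, "~have_light")], t=0)
print(s)
```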
(:durative-action cross_cellar
  :parameters ()
  :duration (= ?duration 10)
  :condition (and (at start have_light) (over all have_light) (at start at_steps))
  :effect (and (at start (not at_steps)) (at start crossing) (at end at_fuse_box)))

Let the current state S be P:{have_light@0; at_steps@0}; Q:{~have_light@15}; t: 0 (presumably after doing the burn_match, i.e. light-match, action)
Applying cross_cellar to this state gives
S’ = P:{have_light@0; crossing@0}; Π:{have_light,<0,10>}; Q:{at_fuse_box@10; ~have_light@15}; t: 0

(:durative-action burn_match
  :parameters ()
  :duration (= ?duration 15)
  :condition (and (at start have_match) (at start have_strikepad))
  :effect (and (at start have_light) (at end (not have_light))))
[Figure: timeline of Light-match and Cross-cellar, with tick marks at 10 and 15 and the current time stamp]
“Advancing” the clock as a device for concurrency control
To support concurrency, we need to consider advancing the clock
-- How far to advance the clock?
-- One shortcut is to advance the clock to the time of the earliest event in the event queue, since this is the least advance needed to make changes to P and M of S.
-- At this point, all the events happening at that time point are transferred from Q to P and M (to signify that they have happened)
This strategy will find “a” plan for every problem, but it will have the effect of enforcing concurrency by making the concurrent actions “align on the left end”
-- In the candle/cellar example, we will find plans where the cross-cellar action starts right when the light-match action starts
-- If we need slack in the start times, we will have to post-process the plan
-- If we want plans with arbitrary slack on start times to appear in the search space, we will have to consider advancing the clock by arbitrary amounts (even if it changes nothing in the state other than the clock time itself).
[Figure: timeline with Light-match, Cross-cellar and the ~have-light event; tick marks at 10 and 15]
In the cellar plan above, the clock, if advanced, will be advanced to 15, where an event (~have-light) will occur. This means cross-cellar can either be done at 0 or at 15 (and the latter makes no sense).
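The "advance to the earliest event" shortcut can be sketched as follows. This is a simplified illustration that tracks only the binary fluents in P (resource values in M are left out); the function name `advance_clock`, the `(time, effect)` event tuples and the `~` prefix for deletions are assumptions of this sketch.

```python
def advance_clock(P, Q, t):
    """Move to the earliest event time in Q and fire every event due then."""
    if not Q:
        return P, Q, t                      # nothing pending, nothing changes
    next_t = min(time for time, _ in Q)
    P = dict(P)                             # do not mutate the caller's state
    for time, effect in Q:
        if time == next_t:
            if effect.startswith("~"):
                P.pop(effect[1:], None)     # delete event
            else:
                P[effect] = next_t          # add event, stamped with its time
    remaining = [(time, e) for time, e in Q if time != next_t]
    return P, remaining, next_t


# The state reached after burn_match and cross_cellar both start at time 0.
P = {"have_light": 0, "crossing": 0}
Q = [(10, "at_fuse_box"), (15, "~have_light")]
P2, Q2, t2 = advance_clock(P, Q, 0)
P3, Q3, t3 = advance_clock(P2, Q2, t2)
print(t2, P2, Q2)
print(t3, P3, Q3)
```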
Search Algorithm (cont.)
Goal satisfaction: S = (P, M, Π, Q, t) satisfies G if for every <pi,ti> in G either:
-- there is a <pi,tj> in P with tj < ti and no event in Q deletes pi, or
-- there is an event e in Q that adds pi at time te < ti.
Action application: action A is applicable in S if:
-- All instantaneous preconditions of A are satisfied by P and M.
-- A’s effects do not interfere with Π and Q.
-- No event in Q interferes with the persistent preconditions of A.
-- A does not lead to concurrent resource change
When A is applied to S:
-- P is updated according to A’s instantaneous effects.
-- Persistent preconditions of A are put in Π
-- Delayed effects of A are put in Q.
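The goal test and action application rules can be sketched for binary fluents only. This is a simplified illustration, not Sapa's implementation: resources (M) and the concurrent-resource-change check are omitted, the dictionary layout of an action is hypothetical, and protection intervals are treated as half-open [start, end).

```python
def satisfies_goals(P, Q, goals):
    """goals: list of (predicate, deadline) pairs."""
    def ok(p, deadline):
        held = p in P and P[p] < deadline and all(e != "~" + p for _, e in Q)
        pending = any(e == p and te < deadline for te, e in Q)
        return held or pending
    return all(ok(p, d) for p, d in goals)


def applicable(a, P, Pi, Q, t):
    end = t + a["duration"]
    if any(c not in P for c in a["at_start"]):
        return False
    # No queued deletion may hit an over-all condition while A is running.
    if any(e.startswith("~") and e[1:] in a["over_all"] and t <= te < end
           for te, e in Q):
        return False
    # A's own deletions must not break conditions already protected in Pi.
    timed = [(t, e) for e in a["start_eff"]] + [(end, e) for e in a["end_eff"]]
    return not any(e.startswith("~") and c == e[1:] and lo <= te < hi
                   for te, e in timed for c, lo, hi in Pi)


def apply_action(a, P, Pi, Q, t):
    P = dict(P)
    for e in a["start_eff"]:                      # instantaneous effects go to P
        if e.startswith("~"):
            P.pop(e[1:], None)
        else:
            P[e] = t
    end = t + a["duration"]
    Pi = Pi + [(c, t, end) for c in a["over_all"]]        # protect over-all conds
    Q = sorted(Q + [(end, e) for e in a["end_eff"]])      # delayed effects go to Q
    return P, Pi, Q


cross_cellar = {"duration": 10, "at_start": ["have_light", "at_steps"],
                "over_all": ["have_light"],
                "start_eff": ["~at_steps", "crossing"], "end_eff": ["at_fuse_box"]}
P, Pi, Q, t = {"have_light": 0, "at_steps": 0}, [], [(15, "~have_light")], 0
if applicable(cross_cellar, P, Pi, Q, t):
    P, Pi, Q = apply_action(cross_cellar, P, Pi, Q, t)
print(P, Pi, Q)
print(satisfies_goals(P, Q, [("at_fuse_box", 20)]))
```

Run on the cellar example, this reproduces the state S’ shown earlier: crossing holds at 0, have_light is protected over <0,10>, and at_fuse_box@10 joins ~have_light@15 in the queue.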
S = (P, M, Π, Q, t)
Search:
-- Pick a state S from the queue. If S satisfies the goals, end
-- Else non-deterministically do one of
   -- Advance the clock (by executing the earliest event in Q)
   -- Apply one of the applicable actions to S
[TLplan; Sapa; 2001]
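The search loop can be sketched end to end on the match/cellar example. This is a toy illustration under several assumptions, not how TLPlan or Sapa are implemented: the non-deterministic choice is replaced by breadth-first search with a depth bound, resources are ignored, the goal test only looks at facts that already hold, and the `ACTIONS` table and state tuple layout are made up for this sketch.

```python
from collections import deque

ACTIONS = {
    "burn_match": dict(dur=15, start_cond={"have_match", "have_strikepad"},
                       over_all=set(), start_eff={"have_light"},
                       end_eff={"~have_light"}),
    "cross_cellar": dict(dur=10, start_cond={"have_light", "at_steps"},
                         over_all={"have_light"},
                         start_eff={"~at_steps", "crossing"},
                         end_eff={"at_fuse_box"}),
}


def update(facts, effects):
    facts = set(facts)
    for e in effects:
        if e.startswith("~"):
            facts.discard(e[1:])
        else:
            facts.add(e)
    return frozenset(facts)


def successors(state):
    facts, protected, queue, t = state
    if queue:  # option 1: advance the clock to the earliest queued event
        nt = min(te for te, _ in queue)
        due = [e for te, e in queue if te == nt]
        rest = frozenset((te, e) for te, e in queue if te != nt)
        live = frozenset((c, end) for c, end in protected if end > nt)
        yield f"advance to {nt}", (update(facts, due), live, rest, nt)
    for name, a in ACTIONS.items():  # option 2: apply an action at time t
        end = t + a["dur"]
        if not a["start_cond"] <= facts:
            continue
        if any(e == "~" + c and te < end for c in a["over_all"] for te, e in queue):
            continue  # a queued event would break an over-all condition
        timed = [(t, e) for e in a["start_eff"]] + [(end, e) for e in a["end_eff"]]
        if any(e == "~" + c and te < p_end for te, e in timed for c, p_end in protected):
            continue  # this action would break a protected condition
        yield f"{name}@{t}", (update(facts, a["start_eff"]),
                              protected | {(c, end) for c in a["over_all"]},
                              queue | {(end, e) for e in a["end_eff"]}, t)


def plan(init_facts, goal, max_depth=6):
    start = (frozenset(init_facts), frozenset(), frozenset(), 0)
    frontier, seen = deque([(start, [])]), {start}
    while frontier:
        state, steps = frontier.popleft()
        if goal <= state[0]:
            return steps
        if len(steps) < max_depth:
            for label, nxt in successors(state):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, steps + [label]))
    return None


print(plan({"have_match", "have_strikepad", "at_steps"}, {"at_fuse_box"}))
```

Note how both actions start at time 0 in the plan found: this is the "align on the left end" behavior described on the clock-advancing slide.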