A Modular Architecture for Hybrid Planning with Theories (CP 2014)
Maria Fox
Planning Group, Dept of Informatics
King’s College London, UK
The Topic of This Talk
• Planning is moving towards ever more demanding applications:
• What challenges arise for planning in the physical world?
– Time, numeric quantities, continuous change etc
• How do inference and relaxation go together in hybrid planning?
• How does a planner reason about structured types?
Outline
• Quick introduction to temporal Planning and Relaxed Plan Search
• A challenging application for planning and constraint reasoning
• How physical dynamics complicate planning
• Planning with structured types
• Overall framework: Planning Modulo Theories
“Planning Modulo Theories”, Peter Gregory, Derek Long, Maria Fox and J. Christopher Beck, ICAPS 2012
Planning
• Planning is the problem of finding a sequence of concurrent collections of actions to transform an initial state into a goal state
• Suitable when there are long causal chains and inter-dependencies
• Assumes the world can be modelled as a finite collection of state variables and that actions cause changes in the values of those variables
Actions: preconditions determine whether transitions are possible; effects assign values to state variables
The search space is enormous, so it is searched using relaxations of the problem
……Until a plan is found that transforms the initial state into one satisfying the goal
Heuristic Forward Search for Temporal Planning
[Figure: heuristic forward search from the current state, through possible next states, to the goal condition; relaxed plans are generated for each evaluated candidate next state. Panel labels: state progression; heuristic function computation; abstracted reachability and relaxed plan extraction (e.g. ignore negative effects); state variable assignments and temporal constraints; temporal reasoning and constraint propagation]
“The FF Planning System: Fast Plan Generation through Heuristic Search”, Joerg Hoffmann and Bernhard Nebel, JAIR 2001
“Forward-Chaining Partial-Order Planning (POPF)”, A. J. Coles, A. I. Coles, M. Fox and D. Long, ICAPS 2010
Relaxing Plan Search Space
Initial state
Reachable in 1 action
Reachable in 2 actions
Reachable in n actions
Relaxation: collect the individual states at each step into a single abstract state at that step
How many steps does it take to reach an abstract state that satisfies the goal?
A state is a valuation for a finite set of variables; an abstract state is an abstract valuation
We have to construct an abstract domain for each of the variables in the state
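For Boolean state variables, the layer-counting idea above can be sketched as follows. This is a minimal illustration rather than the planner's implementation: the `Action` record, the function name and the kettle facts used in the usage note are hypothetical, and the abstract state is simply the set of all facts reached so far (negative effects are ignored).

```python
from typing import FrozenSet, List, NamedTuple, Optional


class Action(NamedTuple):
    """Hypothetical relaxed action: negative (delete) effects are dropped."""
    name: str
    preconditions: FrozenSet[str]
    add_effects: FrozenSet[str]


def relaxed_steps_to_goal(initial: FrozenSet[str],
                          goal: FrozenSet[str],
                          actions: List[Action]) -> Optional[int]:
    """Count relaxed steps until the abstract state satisfies the goal.

    Returns None if a fixed point is reached without satisfying the goal.
    """
    abstract_state = set(initial)
    steps = 0
    while not goal <= abstract_state:
        new_facts = set()
        for action in actions:
            # An action is applicable if the abstract state contains its preconditions.
            if action.preconditions <= abstract_state:
                new_facts |= action.add_effects
        if new_facts <= abstract_state:
            return None  # nothing new is reachable, even in the relaxation
        abstract_state |= new_facts  # the abstract state only ever grows
        steps += 1
    return steps
```

With three hypothetical actions chained as fill, heat and pour, starting from `{"have-kettle"}` with goal `{"tea"}`, the function returns 3, one relaxed step per link in the causal chain.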
(:durative-action boilwater
  :parameters (?w - water)
  :duration (= ?duration 93)
  :condition (and (over all (heating ?w))
                  (at start (= (temperature ?w) 7)))
  :effect (and (at start (heating ?w))
               (at end (assign (temperature ?w) 100))
               (at end (not (heating ?w)))
               (at end (boiled ?w)))
)
?duration is fixed, assuming that water starts at cold tap temperature
The action starts the heating process
The action has the discrete effect of setting the temperature of the water to 100 degrees
“PDDL: The Planning Domain Definition Language”, D. McDermott, M. Ghallab, A. Howe, C. Knoblock, A. Ram, M. Veloso, D. Weld, D. Wilkins (The Rules Committee for the First International Planning Competition, 1998)
“PDDL2.1: An extension to PDDL for Expressing Temporal Planning Domains”, Maria Fox and Derek Long, JAIR 2003
over all is used to express invariant conditions
[Figure: the durative action construct of PDDL2.1: start conditions and start effects, invariants holding over ?duration, end conditions and end effects]
Numeric state variable
Abstraction to Semi-lattice
Consider variable: V ∈ R
Domain Abstraction
Applying relaxed steps in relaxed plan construction always causes the variable value to climb up the lattice
New assignments combine the original value with all newly achieved values at each relaxed step: this is a lattice join operation
[Figure: semi-lattice of intervals over R. Bottom level: point intervals [x1,x1], [x2,x2], …, [xn,xn]; above them, wider intervals such as [l2,u2]; top element ⊤]
“The Metric-FF Planning System: Translating Ignoring Delete Lists to Numeric State Variables “, Jörg Hoffmann. JAIR 2003
Adding Constraints
(:action increment
  :precondition ()
  :effect (increase (x) 1))
(:action decrement
  :precondition ()
  :effect (decrease (x) 1))
(:action double
  :precondition (and (<= (x) 3)
                     (>= (x) 2))
  :effect (scale-up (x) 2))
Initial: (= (x) 2)
Goal: (>= (x) 8)
x = 2, abstracted as x = [2,2]
After {increment,decrement,double}: x = [1,4]
After {increment,decrement,double}: x = [0,8]
Goal achieved in 2 steps
Adding Constraints
(:action increment...)
(:action decrement...)
(:action double
  :precondition (and (<= (x) 3)
                     (>= (x) 2))
  :effect (scale-up (x) 2))
x = 2, abstracted as x = [2,2]
After {increment,decrement,double}: x = [1,4]
After {increment,decrement,double}: x = [0,6]
After {increment,decrement,double}: x = [-1,7]
After {increment,decrement,double}: x = [-2,8]
Goal achieved in 4 steps
At the second step, the precondition of double (2 <= x <= 3) is combined with x = [1,4]: the lattice meet operation leads to x = [2,3], so double gives x = [4,6]
Also: increment gives x = [2,5] and decrement gives x = [0,3], so the join gives x = [0,6]
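The two traces above can be reproduced with a small interval sketch. It is an illustration under stated assumptions rather than the planner's code: the function names are hypothetical, the three actions are hard-coded, and `use_meet` switches between doubling the whole interval (no constraint reasoning) and doubling only the part of the interval that satisfies the precondition of double.

```python
from typing import List, Optional, Tuple

Interval = Tuple[float, float]


def join(a: Interval, b: Interval) -> Interval:
    """Lattice join: the smallest interval containing both arguments."""
    return (min(a[0], b[0]), max(a[1], b[1]))


def meet(a: Interval, b: Interval) -> Optional[Interval]:
    """Lattice meet: the overlap of the two intervals, or None if they are disjoint."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def relaxed_step(x: Interval, use_meet: bool) -> Interval:
    """Apply increment, decrement and double to x and join the results with x."""
    results = [x, (x[0] + 1, x[1] + 1), (x[0] - 1, x[1] - 1)]
    guard = meet(x, (2, 3))  # values of x satisfying 2 <= x <= 3
    if guard is not None:    # double is applicable somewhere in x
        source = guard if use_meet else x
        results.append((2 * source[0], 2 * source[1]))
    out = results[0]
    for r in results[1:]:
        out = join(out, r)
    return out


def layers_until_goal(use_meet: bool,
                      start: Interval = (2, 2),
                      goal_min: float = 8,
                      max_steps: int = 20) -> List[Interval]:
    """Return the abstract value of x at each relaxed step until x >= goal_min is reachable."""
    x = start
    layers = [x]
    while x[1] < goal_min and len(layers) <= max_steps:
        x = relaxed_step(x, use_meet)
        layers.append(x)
    return layers
```

Calling `layers_until_goal(False)` gives the layers [2,2], [1,4], [0,8] (goal in 2 steps), while `layers_until_goal(True)` gives [2,2], [1,4], [0,6], [-1,7], [-2,8] (goal in 4 steps), matching the slides.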
A Challenging Problem
• Sellafield is the site of a nuclear fuel reprocessing plant and also of two old nuclear plants, Windscale and Calder Hall, which are being decommissioned
• Around 240 of Sellafield's 1,400 buildings are nuclear facilities. All have to be decommissioned within 100 years at an estimated cost of £50bn
• Processing of waste includes remote operations on old fuel rods, which are stored in water for cooling
A Challenging Problem
• When removed from the water, a process of heating starts
• If rods overheat a chain reaction could occur, releasing a huge amount of radioactive gas
• The interactions with the rods are temperature dependent and constrained by the heating process
• The rods and components of the rods can be partially cooled during the processing
• Treatment of key elements of the rods must be completed within a time window determined by the combined effects of heating and cooling
A Challenging Problem
Modelling
Planning
• Planners need to combine discrete decisions, temporal and resource reasoning with awareness of continuous change
• All activities are time-dependent and time-critical
• Richer relaxations are required for search control
• Stronger inference is needed for pruning search and propagating consequences of decisions
• Modelling languages have to capture mixed discrete-continuous interactions
Related Work in Hybrid Planning
• Model-Directed Autonomous Systems, Nayak and Williams, AI Magazine 1998
• Sapa: A Multi-objective Metric Temporal Planner, Do, Kambhampati, JAIR 2003
• Integrated AI in Space: The Autonomous Sciencecraft on Earth Observing One, Chien, AAAI 2006
• Generative Planning for Hybrid Systems based on Flow Tubes, Li and Williams, ICAPS 2008
• UPMurphi: A Tool for Universal Planning on PDDL+ Problems, Della Penna, Magazzeni, Mercorio, Intrigila, ICAPS 2009
• Temporal Planning with Problems Requiring Concurrency through Action Graphs and Local Search, Gerevini, Saetti and Serina, ICAPS 2010
• A Planning-based Framework for Controlling Hybrid Systems, Lohr, Eyerich, Keller and Nebel, ICAPS 2012
• Planning with MIP for Supply Restoration in Power Distribution Systems, Thiébaux, Coffrin, Hijazi and Slaney, IJCAI 2013
Planning with Continuous Change
(:durative-action boilwater
  :parameters (?w - water)
  :duration (> ?duration 0)
  :condition (and (over all (heating ?w))
                  (at end (= (temperature ?w) 100)))
  :effect (and (at start (heating ?w))
               (at end (boiled ?w))
               (increase (temperature ?w) (* #t 1))
               (at end (not (heating ?w))))
)
?duration is a numeric parameter, whose value is chosen by the planner
The action has the continuous effect of increasing the temperature linearly with rate 1
$\frac{d\,\mathit{temperature}}{dt} = 1$
“PDDL2.1: An extension to PDDL for Expressing Temporal Planning Domains”, Maria Fox and Derek Long, JAIR 2003
“COLIN: Planning with Continuous Linear Numeric Change”, Coles, Coles, Fox, Long, JAIR 2012
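With a linear continuous effect, the duration the planner must choose follows directly from the end condition. A minimal sketch, assuming a constant heating rate and a hypothetical helper name `boil_duration`:

```python
def boil_duration(start_temperature: float,
                  target: float = 100.0,
                  rate: float = 1.0) -> float:
    """Duration needed for a linear increase at `rate` to reach `target`.

    Hypothetical helper: solves start_temperature + rate * duration = target.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    if start_temperature > target:
        raise ValueError("start temperature is already above the target")
    return (target - start_temperature) / rate
```

For water starting at 7 degrees, `boil_duration(7)` gives 93, the value that the fixed-duration version of boilwater hard-codes; with the continuous model, other start temperatures simply yield other durations.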
Durative Actions
[Figure: the boilwater durative action with ?duration > 0. Start effect: (heating water1). Invariant (open interval): (heating water1). Continuous effect: increase (temperature water1) (* #t 1). End condition: (= (temperature water1) 100). End effects: (boiled water1), (not (heating water1))]
Temporal Reachability
Current state
Goal condition
“Planning with Problems Requiring Temporal Coordination”, Coles, Fox, Long, Smith, AAAI 2008
“COLIN: Planning with Continuous Linear Numeric Change”, Coles, Coles, Fox, Long, JAIR 2012
[Figure: a durative action A (start conditions, start effects, invariants, end conditions, end effects, ?duration) is split into Astart (start conditions, start effects) and Aend (end conditions, end effects)]
In relaxation:
• Ensure that Aend cannot be applied before Astart
• Aend effects are separated from Astart by ?duration
• Ignore conflicts with invariants
In state progression:
• Prune states that violate invariants
Continuous Processes
• Physical processes, such as boiling water, can be modelled directly in PDDL+
(:process boiling
  :parameters (?w - water)
  :precondition (heating ?w)
  :effect (increase (temperature ?w) (* #t 1))
)
(:event boiled
  :parameters (?w - water)
  :precondition (and (heating ?w)
                     (= (temperature ?w) 100))
  :effect (and (not (heating ?w)) (boiled ?w))
)
“Modelling Mixed Discrete-Continuous Domains for Planning” Maria Fox and Derek Long, JAIR 2006
[Figure: the two models of boiling water compared.
Durative action model: boilwater with ?duration > 0; start effect (heating water1); continuous effect increase (temperature water) (* #t 1); end condition (= (temperature water1) 100); end effects (boiled water1) and (not (heating water1)).
Process model: while heating water holds and (< (temperature water) 100), the process boiling applies increase (temperature water) (* #t 1); when (>= (temperature water) 100), the event boiled is triggered, giving boiled water and (not (heating water1))]
Continuous Processes
• Physical processes in the nuclear decommissioning domain:
(:process heating
  :parameters (?r - rod ?w - water)
  :precondition (removedfrom ?r ?w)
  :effect (and (unstable ?r)
               (increase (temperature ?r)
                         (* #t heatingrate)))
)
(:event explosion
  :parameters (?r - rod)
  :precondition (and (unstable ?r)
                     (>= (temperature ?r) critical))
  :effect (and (not (unstable ?r)) (nucleardisaster))
)
Concurrent Continuous Processes
• The cooling rate depends on the current temperature and the room temperature:
(:process cooling
  :parameters (?w - water)
  :precondition (> (temperature ?w) (roomtemp))
  :effect (decrease (temperature ?w)
                    (* #t (- (temperature ?w) (roomtemp))))
)
• Since cooling is triggered whenever the water is heating, the rate of change of the water temperature will be given by the sum of the process effects:
$\frac{d\,\mathit{temperature}}{dt} = \mathit{heatingrate} - (\mathit{temperature} - \mathit{roomtemp})$
nonlinear rate of change
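Because the combined rate depends on the current temperature, the temperature no longer changes linearly with time. Assuming both processes stay active, heatingrate and roomtemp are constant, and the start temperature is above room temperature, the equation has the closed-form solution temperature(t) = (roomtemp + heatingrate) + (temperature(0) - roomtemp - heatingrate) * e^(-t). The sketch below (function names are hypothetical) evaluates that solution alongside the rate given by the sum of the process effects.

```python
import math


def rate_of_change(temperature: float,
                   heating_rate: float,
                   room_temperature: float) -> float:
    """Sum of the process effects: heating minus temperature-dependent cooling."""
    return heating_rate - (temperature - room_temperature)


def temperature_at(t: float,
                   start_temperature: float,
                   heating_rate: float,
                   room_temperature: float) -> float:
    """Closed-form solution of d(temperature)/dt = heatingrate - (temperature - roomtemp).

    Assumes both processes are active for the whole interval [0, t].
    """
    equilibrium = room_temperature + heating_rate  # where the rate of change is zero
    return equilibrium + (start_temperature - equilibrium) * math.exp(-t)
```

The temperature approaches roomtemp + heatingrate exponentially rather than growing without bound, which is why a planner restricted to linear continuous change cannot represent this trajectory exactly.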
More Complex Models
Suppose we want to model some physical process that the planner needs to interact with, such as an alarm system.
When the window is opened a circuit is made, leading to the capacitor charging. When the required voltage is reached, the alarm is set off.
Goal1: awake
Plan:
0: (openwindow)
……
Goal2: (and (deeplyasleep) (freshair))
Plan:
0: (openwindow)
t: (closewindow)
t must be late enough to get the fresh air, and early enough to avoid the alarm going off
The PDDL+ Model
(:action openwindow
  :parameters ()
  :precondition (and (windowclosed)
                     (magnetoperational))
  :effect (and (not (magnetoperational))
               (not (windowclosed))
               (windowopen) (freshair))
)
(:event makecircuit
  :parameters ()
  :precondition (and (not (magnetoperational))
                     (not (circuit)))
  :effect (circuit)
)
Cascading Events
The capacitor starts to store charge as soon as the circuit is made, continuing till the circuit voltage is reached
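[Figure: timeline from time 0. Initially windowclosed, magnetoperational and deeplyasleep hold. The action openwindow gives windowopen and not (magnetoperational); the event makecircuit gives circuit; the process chargecapacitor applies increase (charge) (* #t (/ 1 (resistance))) until (>= (charge) circuitvoltage), when the event voltageavailable gives voltage]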
(:process chargecapacitor
  :parameters ()
  :precondition (and (circuit) (not (voltage)))
  :effect (increase (charge) (* #t (/ 1 (resistance))))
)
Cascading Events
(:event voltageavailable
  :parameters ()
  :precondition (and (>= (charge) 5)
                     (not (voltage)))
  :effect (and (voltage))
)
(:event alarmtriggered
  :parameters ()
  :precondition (and (circuit)
                     (alarmdisabled)
                     (voltage))
  :effect (and (alarmenabled)
               (not (alarmdisabled))
               (ringing))
)
Circuit voltage = 5V
Resistance = 2Ω
As soon as the circuit voltage is reached, the event of voltageavailable is triggered, which in turn triggers the alarm
(:process ring
  :parameters ()
  :precondition (ringing)
  :effect (increase (ringtime) (* #t 1))
)
(:event rouseprincess
  :parameters ()
  :precondition (and (ringing)
                     (>= (ringtime) 0.001)
                     (deeplyasleep))
  :effect (and (almostawake)
               (not (deeplyasleep)))
)
Non-zero reaction time
Exploiting Event Effects
(:action kiss
:parameters ( )
:precondition (almostawake)
:effect (and (awake) (not (almostawake)))
)
When should the Prince do the Kiss?
• To wake her up, the planner has only to exploit the fact that opening the window will cause a circuit resulting in the alarm going off.
• The kiss action can then be timed to occur when the capacitor has had time to charge to the full circuit voltage, and the alarm has had time to ring.
• The capacitor is fully charged when charge = 5.
• The time it takes for the charge to reach 5 (given that resistance = 2) is 2*circuit voltage = 10. It will take an additional 0.001 time unit to rouse the princess.
• The kiss must take place no earlier than 10.002 to guarantee that the princess is fully awoken.
A Linear Program Constructed Alongside the Developing Plan
[Figure: the planner passes an LP built from plan choices to the LP solver; the solution determines the timing of actions]
A Linear Program Constructed Alongside the Developing Plan
minimise timeofkiss
Subject to:
openwindow >= 0
makecircuit = openwindow
chargestart = makecircuit
chargeend - chargestart = 2*charge
charge >= 5
chargeend = voltageavailable
triggeredalarm = voltageavailable
ringingstart = triggeredalarm
rouseprincess - ringingstart >= 0.001
timeofkiss >= rouseprincess + 0.001
Time variables
Find the earliest time at which to do the kiss action
Resistance = 2
Circuit voltage = 5
Reaction time = 0.001
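Since every constraint in this LP is a simple chain from one time point to the next, its minimum can be checked by propagating the earliest times forward. The sketch below does that with exact fractions; it is not an LP solver, the function name is hypothetical, and `separation` stands for the 0.001 gap between rouseprincess and the kiss that appears in the last constraint.

```python
from fractions import Fraction
from typing import Dict


def earliest_times(circuit_voltage: Fraction,
                   resistance: Fraction,
                   reaction_time: Fraction,
                   separation: Fraction) -> Dict[str, Fraction]:
    """Propagate the earliest time of each time variable along the constraint chain."""
    times: Dict[str, Fraction] = {}
    times["openwindow"] = Fraction(0)
    times["makecircuit"] = times["openwindow"]
    times["chargestart"] = times["makecircuit"]
    # Charge grows at rate 1/resistance, so reaching circuit_voltage takes
    # resistance * circuit_voltage time units.
    times["chargeend"] = times["chargestart"] + resistance * circuit_voltage
    times["voltageavailable"] = times["chargeend"]
    times["triggeredalarm"] = times["voltageavailable"]
    times["ringingstart"] = times["triggeredalarm"]
    times["rouseprincess"] = times["ringingstart"] + reaction_time
    times["timeofkiss"] = times["rouseprincess"] + separation
    return times
```

With circuit voltage 5, resistance 2 and both small constants set to 1/1000, charging ends at time 10 and the earliest kiss is at 5001/500, that is 10.002.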
[Figure: timeline of the plan from time 0. Initially windowclosed, magnetoperational and deeplyasleep hold. openwindow gives windowopen and not (magnetoperational); makecircuit gives circuit; chargecapacitor applies increase (capacitance) (* #t (/ 1 (resistance))) until (>= (capacitance) 5); voltageavailable gives voltage; alarmtriggered gives ringing; ring runs for 0.001 time units; rouseprincess gives almostawake; kiss gives awake]
Avoiding Event Effects
• To give her fresh air without waking her up, the planner must choose the moment at which to close the opened window
• Let x be the control parameter: the amount of charge in the capacitor
• From initial facts we have that x <= 5 and resistance is 2.
• The window is open for non-zero time, so x > 0
• The window must be closed while x ∈ (0, 5)
• The time it takes for the charge to reach x is 2x.
• To avoid rousing her the planner must close the window in the interval t ∈ (0,10)
We don’t want the voltageavailable event so the interval is open on the right
x is strictly greater than zero so the interval is open on the left
Closing the window will break the circuit, causing a mutex with the alarmtriggered event, so it must come earlier than 10
t is strictly greater than zero so the interval is open on the left
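The interval reasoning can be written down directly. A minimal sketch with hypothetical function names, assuming the window is opened at time 0 and the charge grows at rate 1/resistance:

```python
from typing import Tuple


def charge_at(t: float, resistance: float) -> float:
    """Charge stored t time units after the circuit is made."""
    return t / resistance


def closing_window(circuit_voltage: float, resistance: float) -> Tuple[float, float]:
    """End points of the open interval of safe closing times."""
    return (0.0, resistance * circuit_voltage)


def is_safe_closing_time(t: float,
                         circuit_voltage: float = 5.0,
                         resistance: float = 2.0) -> bool:
    """True if closing at t gets fresh air (t > 0) and precedes voltageavailable."""
    lo, hi = closing_window(circuit_voltage, resistance)
    return lo < t < hi  # strict on both sides: the interval is open
```

Both end points are excluded: closing at time 0 gives no fresh air, and closing at time 10 coincides with the voltageavailable event.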
Increasing Complexity
• Everything so far can be modelled in PDDL+
• All the state variables are Boolean or Numeric
• At least two generic planners exist that can solve PDDL+ problems:
UPMurphi: Della Penna, Magazzeni, Mercorio and Intrigila
POPF: latest version by Coles and Coles (ICAPS 2014)
• In more realistic domains there are structured types that encapsulate specialised behaviours
• Planning Modulo Theories is a planning framework designed for managing structured types in hybrid domains
33kV network, load and supply
Supply Profile
Profiles modelled using Timed Initial Fluents:
(at 5 (= (load b1) 3.5))
(at 10 (= (load b2) 6))
(at 17 (= (supply g1) 20))
… etc, added to the initial state
Planning Problem: to maintain voltages within bounds over a period of time (eg: 24 hours) given demand and supply at busbars in the network
[Figures: Tap Changes plots (tap ratio against time, 0 to 25; series tap0, tap6, tap9, tap14, tap15, tap16) and Voltage Profile plots (voltage against time, 0 to 25; series busbar6, busbar7, busbar32, busbar33 and threshold), shown for the Planner and Reactive cases]
Temporal Voltage Control
• Requires solution of AC power flow equations
• Local changes have global effects
• Requires an external solver to find network properties at time points
• Solver computes voltages at busbars in context of current settings
[Equations not recovered; annotations: (real), (reactive), (complex), (phase)]
Planner choice: setTap
Consequences on voltages: are all constraints satisfied?
[Figure: the planner sends proposed settings to the AC power flow solver, which returns Accept/Reject + constraints; the planner prunes and searches, producing a temporal plan]
Planner choice: setTap
Consequences on voltages: how good was the choice?
[Figure: the network and proposed settings are passed to the AC power flow solver, which returns network + constraints; the planner evaluates the choice on an abstract network while building the temporal plan]
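The accept/reject interaction between the planner and the external solver can be sketched as below. Everything here is an assumption for illustration: `toy_power_flow` is a made-up linear stand-in and does not solve AC power flow, the voltage bounds 0.95 and 1.05 are hypothetical, and the function names are not taken from the talk.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Settings = Dict[str, int]    # tap name -> tap position
Voltages = Dict[str, float]  # busbar name -> voltage


def toy_power_flow(settings: Settings) -> Voltages:
    """Hypothetical stand-in for the external solver (NOT real AC power flow)."""
    tap = settings["tap0"]
    return {"busbar6": 0.92 + 0.02 * tap, "busbar7": 0.98 + 0.02 * tap}


def check_settings(settings: Settings,
                   solve: Callable[[Settings], Voltages],
                   lower: float = 0.95,
                   upper: float = 1.05) -> Tuple[bool, List[str]]:
    """Ask the solver for voltages; accept if all lie within bounds.

    Returns (accepted, names of busbars that violate the bounds).
    """
    voltages = solve(settings)
    violations = [name for name, v in sorted(voltages.items())
                  if not lower <= v <= upper]
    return (not violations, violations)


def first_accepted(candidates: Iterable[Settings],
                   solve: Callable[[Settings], Voltages]) -> Optional[Settings]:
    """Try candidate settings in order, pruning those the solver rejects."""
    for settings in candidates:
        accepted, _ = check_settings(settings, solve)
        if accepted:
            return settings
    return None
```

The point of the design is the separation of concerns: the planner only proposes settings and reacts to accept/reject answers, so the solver can be replaced without changing the search.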
Types and their Functions
• Complex type: Network
(define (module Network)
  (:type 3-tuple Configuration PQvalues Voltages)
  (:functions
    (setTap ?n - Network ?t - TapIndex ?s - TapSetting) - Network
    ……)
)
• Depends on:
(define (module Voltages)
  (:type VectorOf Real)
  (:functions
    (index ?vs - Voltages ?bbi - BusBarIndex) - Real
    ……)
)
(:action tapchange
  :parameters (?t - TapIndex ?s - TapSetting)
  :condition (available ?t ?s (theNetwork))
  :effect (assign (theNetwork)
                  (setTap (theNetwork) ?t ?s))
)
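One way to read the module definitions is as an immutable value type whose functions return new values, so that the action's assign effect replaces the whole network. The sketch below is an analogy in Python, not the module language itself: the field layouts are assumptions, and `set_tap` leaves the voltages unchanged, whereas in the architecture described they would be recomputed by the external solver.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Network:
    """Hypothetical 3-tuple of Configuration, PQvalues and Voltages."""
    configuration: Tuple[int, ...]               # tap setting per tap index
    pq_values: Tuple[Tuple[float, float], ...]   # (P, Q) per busbar
    voltages: Tuple[float, ...]                  # voltage per busbar


def set_tap(network: Network, tap_index: int, tap_setting: int) -> Network:
    """Return a new Network with one tap changed; the original is untouched."""
    configuration = list(network.configuration)
    configuration[tap_index] = tap_setting
    return replace(network, configuration=tuple(configuration))


def index(voltages: Tuple[float, ...], busbar_index: int) -> float:
    """Voltage at one busbar, mirroring the index function of the Voltages module."""
    return voltages[busbar_index]
```

For example, applying `set_tap(n, 1, 3)` to a network with configuration (0, 0) yields a network with configuration (0, 3), while `n` itself still has (0, 0).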
Processes over Networks
(:process rampingUp
  :parameters (?g - generator)
  :precondition (currentlyRampingUp ?g)
  :effect (increase (theNetwork) (* #t (rampUpRate ?g))))
(:process rampingDown
  :parameters (?g - generator)
  :precondition (currentlyRampingDown ?g)
  :effect (decrease (theNetwork) (* #t (rampDownRate ?g))))
[Figure: voltage at some busbar against time, with markers for Start Ramping Up G1 and Start Ramping Down G2]
Planning Modulo Theories
• MDDL: language for defining structured types and their functions as modules
• CDDL: language for defining actions, processes and events using structured types
• Now we have a range of types beyond Boolean and Numeric
• Core Planner
Abstract Network
Type Abstraction: projection onto proportional effects of tap changes on busbar voltages
[Figure: semi-lattice over D = R^n. Bottom level: vectors of point values ([x1],[x2],…[xn]); next level: vectors of intervals ([x1,px1],[x2,p2x2],…); higher levels: alternative real ranges for each busbar; top element ⊤]
pi is a real-valued proportion, which decays outwards from the tap
Each value in the ordering is obtained by combining the previous values with the generated new values
“Combining a Temporal Planner with an External Solver for the Power Balancing Problem in an Electricity Network”, Chiara Piacentini, Maria Fox and Derek Long, ICAPS 2013
Relaxed Reachability Analysis
[Figure: relaxed reachability over busbar voltages v1, v2, v3, v4, …, v10, with v10 = 1.6kV initially. Successive relaxed tap changes (tap1, tap3, tap7, tap10) widen the reachable range at each busbar:
[v1,pv1] [v2,pv2] [v3,pv3] [v4,pv4] [v5,v5] …
[v1,pv1] [v2,pv2] [p2v3,pv3] [p2v4,pv4] [p2v5,v5] [p2v6,v6] [p2v7,v7] …
[v1,pv1] [v2,pv2] [p2v3,pv3] [p2v4,pv4] [p2v5,v5] [p2v6,p3v6] [p2v7,p3v7] [v8,p3v8] [v9,p3v9] [v10,p3v10] …
[v1,pv1] [v2,pv2] [p2v3,pv3] [p2v4,pv4] [p2v5,v5] [p2v6,p3v6] [p2v7,p3v7] [v8,p3v8] [v9,p3v9] [v10,pnv10] …
Goal: v10 >= 3.5kV … the goal is in range]
Meet operation
• Load and Supply profiles imply voltage constraints at busbars
• A lattice meet operation can be applied to reduce the range of reachable voltages at individual bus bars to ensure operational ranges are maintained
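For a vector of per-busbar ranges, the meet can be applied component by component. A minimal sketch, assuming each busbar has a reachable interval and an operational interval (the function name and the numbers in the usage note are illustrative only):

```python
from typing import List, Optional, Sequence, Tuple

Range = Tuple[float, float]


def meet_ranges(reachable: Sequence[Range],
                operational: Sequence[Range]) -> List[Optional[Range]]:
    """Component-wise meet of reachable voltage ranges with operational ranges.

    An entry is None when no reachable voltage lies inside the operational range.
    """
    result: List[Optional[Range]] = []
    for (lo, hi), (min_ok, max_ok) in zip(reachable, operational):
        new_lo, new_hi = max(lo, min_ok), min(hi, max_ok)
        result.append((new_lo, new_hi) if new_lo <= new_hi else None)
    return result
```

For instance, with an operational range of (0.95, 1.05) at every busbar, a reachable range of (0.9, 1.15) is reduced to (0.95, 1.05), (1.0, 1.2) is reduced to (1.0, 1.05), and (1.2, 1.3) has an empty meet.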
[Figure: the same relaxed reachability analysis over busbar voltages v1, v2, v3, v4, …, v10 (v10 = 1.6kV initially) for tap1, tap3 and tap7, ending with the row
[v1,pv1] [v2,pv2] [p2v3,pv3] [p2v4,pv4] [p2v5,v5] [p2v6,p3v6] [p2v7,p3v7] [v8,p3v8] [v9,p3v9] [v10,pkv10] …
The goal is in range]
Planning Modulo Theories
• Identify the structured types required in the domain
• Decide on appropriate abstractions for these types
• For each one, build the join operation and the meet operation
• Combine all of the domain lattices into a single heuristic function for the planning domain
• Evaluate the informativeness of the heuristic
• If not good enough, go back and revise the abstractions of the types
Core temporal and numeric Planner