CISC453 Winter 2010
Planning & Acting in the Real World (AIMA3e Ch 11)
Time & Resources, Hierarchical Techniques, Relaxing Environmental Assumptions
2
Overview
extending planning language & algorithms:
1. allow actions that have durations & resource constraints: yields a new "scheduling problem" paradigm incorporating action durations & timing, and required resources
2. hierarchical planning techniques: control the complexity of large-scale plans by hierarchical structuring of actions
3. uncertain environments: non-deterministic domains
4. multiagent environments
3
Scheduling versus Planning
recall from classical planning (Ch 10): PDDL representations only allowed us to decide the relative ordering among planning actions; up till now we've concentrated on what actions to do, given their PRECONDs & EFFECTs
in the real world, other properties must be considered: actions occur at particular moments in time, have a beginning and an end, & occupy or require a certain amount of time
for a new category of Scheduling Problems we need to consider the absolute times when an event or action will occur & the durations of the events or actions
typically these are solved in 2 phases: planning, then scheduling
a planning phase selects actions, respecting ordering constraints; this might be done by a human expert, and automated planners are suitable if they yield minimal ordering constraints
then a scheduling phase incorporates temporal information so that the result meets resource & deadline constraints
4
Time, Schedules & Resources
the Job-Shop Scheduling (JSS) paradigm includes the requirement to complete a set of jobs
each job consists of a sequence of actions with ordering constraints
each action has a given duration and may also require some resources
resource constraints indicate the type of resource, the number of it that are required, and whether the resource is consumed in the action or is reusable
the goal is to determine a schedule: one that minimizes the total time required to complete all jobs (the makespan) while respecting resource requirements & constraints
5
Job-Shop Scheduling Problem (JSSP)
JSSP involves a list of jobs to do, where a job is a fixed sequence of actions
actions have quantitative time durations & ordering constraints; actions use resources (which may be shared among jobs)
to solve the JSSP: find a schedule that determines a start time for each action
1. that obeys all hard constraints, e.g. no temporal overlap between mutex actions (those using the same one-action-at-a-time resource)
2. for our purposes, we'll operationalize cost as the total time to perform all actions and jobs
note that the cost function could be more complex (it could include the resources used, time delays incurred, ...)
our example: automobile assembly scheduling
the jobs: assemble two cars
each job has 3 actions: add the engine, add the wheels, inspect the whole car
a resource constraint is that we do the engine & wheel actions at a special one-car-only work station
6
Ex: Car Construction Scheduling
the job-shop scheduling problem of assembling 2 cars includes required times & resource constraints
notation: A < B indicates action A must precede action B
Jobs({AddEngine1 < AddWheels1 < Inspect1}, {AddEngine2 < AddWheels2 < Inspect2})
Resources(EngineHoists(1), WheelStations(1), Inspectors(2), LugNuts(500))
Action(AddEngine1, DURATION: 30, USE: EngineHoists(1))
Action(AddEngine2, DURATION: 60, USE: EngineHoists(1))
Action(AddWheels1, DURATION: 30, CONSUME: LugNuts(20), USE: WheelStations(1))
Action(AddWheels2, DURATION: 15, CONSUME: LugNuts(20), USE: WheelStations(1))
Action(Inspecti, DURATION: 10, USE: Inspectors(1))
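to make the encoding concrete, here is a minimal sketch of how this specification might be held as plain Python data (the dict shapes & names are illustrative choices, not from any planning library):

# hypothetical encoding of the two-car assembly problem
jobs = {
    "Job1": ["AddEngine1", "AddWheels1", "Inspect1"],
    "Job2": ["AddEngine2", "AddWheels2", "Inspect2"],
}
actions = {
    "AddEngine1": {"duration": 30, "use": {"EngineHoists": 1}, "consume": {}},
    "AddEngine2": {"duration": 60, "use": {"EngineHoists": 1}, "consume": {}},
    "AddWheels1": {"duration": 30, "use": {"WheelStations": 1}, "consume": {"LugNuts": 20}},
    "AddWheels2": {"duration": 15, "use": {"WheelStations": 1}, "consume": {"LugNuts": 20}},
    "Inspect1":   {"duration": 10, "use": {"Inspectors": 1}, "consume": {}},
    "Inspect2":   {"duration": 10, "use": {"Inspectors": 1}, "consume": {}},
}
# aggregated resource pools: quantities, not named individuals
resources = {"EngineHoists": 1, "WheelStations": 1, "Inspectors": 2, "LugNuts": 500}
# ordering constraints A < B read off the job sequences
order = [(a, b) for seq in jobs.values() for a, b in zip(seq, seq[1:])]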
7
Car Construction Scheduling
note that the action schemas list resources as numerical quantities, not named entities: so Inspectors(2), rather than Inspector(I1) & Inspector(I2)
this process of aggregation is a general one: it groups objects that are indistinguishable with respect to the current purpose
this can help reduce the complexity of the solution
for example, a candidate schedule that (concurrently) requires more than the number of aggregated resources can be rejected without having to exhaustively try assignments of individuals to actions
8
Planning + Scheduling for JSSP
Planning + Scheduling for Job-Shop Problems
scheduling differs from the standard planning problem: it considers when an action starts and when it ends, so in addition to order (planning), duration is also considered
we begin by ignoring the resource constraints, solving the temporal domain issues to minimize the makespan
this requires finding the earliest start times for all actions consistent with the problem's ordering constraints
we create a partially-ordered plan, representing ordering constraints in a directed graph of actions
then we apply the critical path method to determine the start and end times for each action
9
Graph of POP + Critical Path
the critical path is the path with the longest total duration
it is "critical" in that it sets the duration for the whole plan, and delaying the start of any action on it extends the whole plan
it is the sequence of actions, each of which has no slack: each must begin at a particular time, otherwise the whole plan is delayed
actions off the critical path have a window of time given by the earliest possible start time ES & the latest possible start time LS
the illustrated solution assumes no resource constraints; note that the 2 engines are being added simultaneously
the figure shows [ES, LS] for each action, & slack is LS − ES; the time required is indicated below the action name & bold links mark the critical path
10
JSSP: (1) Temporal Constraints
schedule for the problem is given by ES & LS times for all actions
note the 15 minutes slack for each action in the top job, versus 0 (by definition) in the critical path job
formulas for ES & LS also outline a dynamic-programming algorithm for computing them
A, B are actions; A < B indicates A must come before B
ES(Start) = 0
ES(B) = max_{A < B} [ES(A) + Duration(A)]
LS(Finish) = ES(Finish)
LS(A) = min_{B > A} [LS(B) − Duration(A)]
complexity is O(Nb), where N is the number of actions and b is the maximum branching factor into or out of an action
so without resource constraints, given a partial ordering of actions, finding the minimum duration schedule is (a pleasant surprise!) computationally easy
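to make the dynamic program concrete, here is a small runnable sketch that computes ES, LS & slack for the car problem (a pass forward in topological order, then a pass backward; data shapes repeat the earlier sketch so this stands alone):

durations = {"AddEngine1": 30, "AddWheels1": 30, "Inspect1": 10,
             "AddEngine2": 60, "AddWheels2": 15, "Inspect2": 10}
order = [("AddEngine1", "AddWheels1"), ("AddWheels1", "Inspect1"),
         ("AddEngine2", "AddWheels2"), ("AddWheels2", "Inspect2")]
preds = {a: [x for x, y in order if y == a] for a in durations}
succs = {a: [y for x, y in order if x == a] for a in durations}

# the job sequences already list actions in topological order
topo = ["AddEngine1", "AddWheels1", "Inspect1",
        "AddEngine2", "AddWheels2", "Inspect2"]

es = {}                     # ES(B) = max over A<B of ES(A) + Duration(A)
for a in topo:
    es[a] = max((es[p] + durations[p] for p in preds[a]), default=0)
makespan = max(es[a] + durations[a] for a in durations)   # = ES(Finish) = 85

ls = {}                     # LS(A) = min over B>A of LS(B) - Duration(A)
for a in reversed(topo):
    ls[a] = min((ls[b] for b in succs[a]), default=makespan) - durations[a]

for a in topo:              # zero-slack actions form the critical path
    print(a, "ES =", es[a], "LS =", ls[a], "slack =", ls[a] - es[a])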
11
JSSP: (1) Temporal Constraints
timeline for the solution: grey rectangles give intervals for actions; empty portions show slack
12
Solution from POP + Critical Path
1. the partially-ordered plan (above)
2. the schedule from the critical-path method (below)
notice that this solution still omits resource constraints: for example, the 2 engines are being added simultaneously
13
Scheduling with Resources
including resource constraints
critical path calculations involve conjunctions of linear inequalities over action start & end times
they become more complicated when resource constraints are included (for example, each AddEngine action requires the single EngineHoist, so they cannot overlap)
resource constraints introduce disjunctions of linear inequalities for the possible orderings & as a result the problem becomes NP-hard!!
here's a solution accounting for resource constraints: reusable resources are in the left column, actions align with resources; this shortest solution schedule requires 115 minutes
14
Scheduling with Resources
including resource constraints, notice:
that the shortest solution is 30 minutes longer than the critical path without resource constraints
that multiple inspector resource units are not needed for this job, indicating the possibility of reallocating this resource
that the "critical path" now is: AddEngine1, AddEngine2, AddWheels2, Inspect2
the remaining actions have considerable slack time; they can begin much later without affecting the total plan time
15
Scheduling with Resources
for including resource constraints, a variety of solution techniques have been tested
one simple approach uses the minimum slack heuristic: at each step, schedule next the unscheduled action that has all its predecessors scheduled & has the least slack; then update ES & LS for impacted actions & repeat (a rough sketch follows below)
note the similarity to the minimum-remaining-values (MRV) heuristic of CSPs
applied to this example, it yields a 130 minute solution, 15 minutes longer than the optimal solution
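here is a rough sketch of that greedy loop, with two simplifications flagged in the comments: slack comes from a single resource-free critical-path pass rather than being recomputed after each commitment, & every resource is assumed to have unit capacity (names & data shapes follow the earlier sketches):

def min_slack_schedule(durations, preds, uses, slack):
    start, busy = {}, {}      # action -> start time; resource -> [(s, e)] reservations
    while len(start) < len(durations):
        ready = [a for a in durations if a not in start
                 and all(p in start for p in preds[a])]
        a = min(ready, key=lambda x: slack[x])        # least-slack action first
        t = max((start[p] + durations[p] for p in preds[a]), default=0)
        pushed = True          # push t past conflicting reservations
        while pushed:
            pushed = False
            for r in uses[a]:
                for (s, e) in busy.get(r, []):
                    if t < e and t + durations[a] > s:   # intervals overlap
                        t, pushed = e, True
        start[a] = t           # a feasible (not necessarily earliest) slot
        for r in uses[a]:
            busy.setdefault(r, []).append((t, t + durations[a]))
    return start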
difficult scheduling problems may require a different approach they may involve reconsidering actions & constraints, integrating
the planning & scheduling phases by including durations & overlaps in constructing the POP
this approach is a focus of current research interest
16
Time & Resource Constraints
summary: alternative approaches to planning with time & resource constraints
1. serial: plan, then schedule; use a partial or full-order planner, then schedule to determine actual start times
2. interleaved: mix planning and scheduling; for example, include resource constraints during partial planning, as these can determine conflicts between actions
notes: remember that so far we are still working in classical planning environments, so: fully observable, deterministic, static & discrete
17
Hierarchical Planning
next we add techniques to handle the plan complexity issue: HTN (hierarchical task network) planning, which works in a top-down fashion similar to the stepwise refinement approach to programming
plans that are built from a fixed set of small atomic actions will become unwieldy as the planning problem grows large; we need to plan at a higher level of abstraction
reduce complexity by hierarchical decomposition of plan steps: at each level of the hierarchy, a planning task is reduced to a small number of activities at the next lower level
the low number of activities means the computational cost of arranging these activities can be lowered
18
Hierarchical Planning
an example: the Hawaiian vacation plan (recall: the AIMA authors live/work in the San Francisco Bay area)
go to SFO airport; take flight to Honolulu; do vacation stuff for 2 weeks; take flight back to SFO; go home
each action in this plan actually embodies another planning task
for example, the go to SFO airport action might be expanded: drive to long term parking at SFO; park; take shuttle to passenger terminal
& each action can be decomposed until the level consists of actions that can be executed without deliberation
note: some component actions might not be refined until plan execution time (interleaving: a somewhat different topic)
19
Hierarchical Planning
basic approach: at each level, each component is reduced to a small number of activities at the next lower level
this keeps the computational cost of arranging them low; otherwise, there are too many individual atomic actions for non-trivial problems (yielding high branching factor & depth)
the formalism is HTN planning: Hierarchical Task Network planning
notes:
we retain the same basic environmental assumptions as for classical planning
what we previously simply called actions are now "primitive actions"
we add HLAs: High Level Actions (like go to SFO airport); each has 1 or more possible refinements
refinements are sequences of actions, either HLAs or primitive actions
20
Hierarchical Task Network
alternative refinements: notation for the HLA Go(Home, SFO)
Refinement(Go(Home, SFO),
  STEPS: [Drive(Home, SFOLongTermParking), Shuttle(SFOLongTermParking, SFO)])
Refinement(Go(Home, SFO),
  STEPS: [Taxi(Home, SFO)])
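one way such a refinement library might be held in code, together with the recursive enumeration of implementations it licenses (a sketch; the names are illustrative & the library is assumed acyclic so the recursion terminates):

refinements = {
    "Go(Home, SFO)": [
        ["Drive(Home, SFOLongTermParking)", "Shuttle(SFOLongTermParking, SFO)"],
        ["Taxi(Home, SFO)"],
    ],
}

def is_hla(action):
    return action in refinements    # high-level iff the library can refine it

def implementations(plan):
    """yield every fully primitive action sequence the plan refines to"""
    if not plan:
        yield []
        return
    first, rest = plan[0], plan[1:]
    if is_hla(first):
        for steps in refinements[first]:
            yield from implementations(list(steps) + list(rest))
    else:
        for tail in implementations(list(rest)):
            yield [first] + tail

# list(implementations(["Go(Home, SFO)"])) yields the two implementations above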
the HLAs and their refinements capture knowledge about how to do things
terminology: if the HLA refines to only primitive actions, it is called an implementation
the implementation of a high-level plan (sequence of HLAs) concatenates the implementations for each HLA
the preconditions/effects representation of primitive action schemas allows a decision about whether an implementation of a high-level plan achieves the goal
21
Hierarchical Task Network
HLAs & refinements & plan goals
in the HTN approach, the goal is achieved if any implementation achieves it
this is the case since an agent may choose the implementation to execute (unlike non-deterministic environments where "nature" chooses)
in the simplest case there's a single implementation of an HLA: we get preconds/effects from the implementation, and then treat the HLA as a primitive action
where there are multiple implementations, either
1. search over implementations for 1 that solves the problem, OR
2. reason over HLAs directly, deriving provably correct abstract plans independent of the specific implementations
22
Search Over Implementations
1. the search approach
this involves generation of refinements, replacing an HLA in the current plan with a candidate refinement until the plan achieves the goal
the algorithm on the next slide shows a version using breadth-first tree search, considering plans in order of the depth of nesting of refinements
note that other search versions (graph search) and strategies (depth-first, iterative deepening) may be formulated by re-designing the algorithm
it explores the space of sequences derived from knowledge in the HLA library about how things should be done
the action sequences of refinements & their preconditions encode knowledge about the planning domain
HTN planners can generate very large plans with little search
23
Search Over Implementations
the search algorithm for refinements of HLAs
function HIERARCHICAL-SEARCH(problem, hierarchy) returns a solution or failure
  frontier ← a FIFO queue with [Act] as the only element
  loop do
    if EMPTY?(frontier) then return failure
    plan ← POP(frontier)  /* chooses the shallowest plan in frontier */
    hla ← the first HLA in plan, or null if none
    prefix, suffix ← the action subsequences before and after hla in plan
    outcome ← RESULT(problem.INITIAL-STATE, prefix)
    if hla is null then  /* so plan is primitive & outcome is its result */
      if outcome satisfies problem.GOAL then return plan
    else  /* insert all refinements of the current hla into the queue */
      for each sequence in REFINEMENTS(hla, outcome, hierarchy) do
        frontier ← INSERT(APPEND(prefix, sequence, suffix), frontier)
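the same breadth-first search over refinements as a runnable Python sketch (the refinement library, transition function & goal test are assumptions supplied by the caller; unlike the pseudocode, this version skips the precondition filtering of REFINEMENTS):

from collections import deque

def hierarchical_search(initial_state, goal_test, result, refinements, initial_plan):
    frontier = deque([list(initial_plan)])
    while frontier:
        plan = frontier.popleft()                 # shallowest plan first (BFS)
        i = next((j for j, a in enumerate(plan) if a in refinements), None)
        if i is None:                             # plan is fully primitive
            state = initial_state
            for a in plan:
                state = result(state, a)          # simulate the whole plan
            if goal_test(state):
                return plan
        else:                                     # refine the first HLA found
            prefix, hla, suffix = plan[:i], plan[i], plan[i + 1:]
            for sequence in refinements[hla]:
                frontier.append(prefix + list(sequence) + suffix)
    return None                                   # failure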
24
HTN Examples
O-PLAN: an example of a real-world system
the O-PLAN system does both planning & scheduling, commercially, for the Hitachi company
one specific sample problem concerns a product line of 350 items involving 35 machines and 2000+ different operations
for this problem, the planner produces a 30-day schedule of three 8-hour shifts, with 10s of millions of steps
a major benefit of the hierarchical structure of the HTN approach is that the results are often easily understood by humans
abstracting away from excessive detail (1) makes large scale planning/scheduling feasible & (2) enhances comprehensibility
25
HTN Efficiency
computational comparisons for a hypothetical domain
assumption 1: a non-hierarchical progression planner with d primitive actions & b possibilities at each state: O(b^d)
assumption 2: an HTN planner with r refinements of each non-primitive action, each refinement containing k actions, at each level
how many different refinement trees does this yield?
depth: number of levels below the root = log_k d
then the number of internal refinement nodes = 1 + k + k^2 + … + k^{(log_k d) − 1} = (d − 1)/(k − 1)
each internal node has r possible refinements, so there are r^{(d − 1)/(k − 1)} possible regular decomposition trees
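a hypothetical worked instance (numbers chosen purely for illustration): with d = 16 primitive actions, k = 4 & r = 10, the depth is log_4 16 = 2 & the number of internal nodes is (16 − 1)/(4 − 1) = 5, so only 10^5 = 100,000 decomposition trees need be considered, versus the b^16 action sequences a flat planner faces (already 10^16 at b = 10)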
the message: keeping r small & k large yields big savings (roughly the k-th root of the non-hierarchical cost, if b & r are comparable)
nice as a goal, but long action sequences that are useful over a range of problems are rare
26
HTN Efficiency
HTN computational efficiency
building the plan library is critically important to achieving efficiency gains in HTN planning; so, might the refinements be learned?
as one example, an agent could build plans conventionally, then save them as a refinement of an HLA defined as the current task/problem
one goal is "generalizing" the methods that are built: eliminating problem-instance-specific detail, keeping only the key plan components
27
Hierarchical Planning
we've just looked at the approach of searching over fully refined plans, that is, full implementations
the algorithm refines plans to primitive actions in order to check whether they achieve the problem goal
now we move on to searching for abstract solutions
the checking occurs at the level of HLAs, possibly with preconditions/effects descriptions for HLAs
the result is that search is in the much smaller HLA space, after which we refine the resulting plan
28
Hierarchical Planning
searching for abstract solutions
this approach requires that HLA descriptions have the downward refinement property: every high-level plan that apparently solves the problem (from the description of its steps) has at least 1 implementation that achieves the goal
since search is not at the level of sequences of primitive actions, a core issue is describing the effects of actions (HLAs) with multiple implementations
assuming a problem description with only +ve preconds & goals, we might describe an HLA's +ve effects in terms of those achieved by every implementation, and its -ve effects in terms of those resulting from any implementation
this would satisfy the downward refinement property
however, requiring an effect to be true for every implementation is too restrictive: it assumes that an adversary chooses the implementation (an underlying non-deterministic model)
29
Plan Search in HLA Space
plan search in HLA space
there are alternative models for which implementation is chosen, either
(1) demonic non-determinism, where some adversary makes the choice
(2) angelic non-determinism, where the agent chooses
if we adopt angelic semantics for HLA descriptions, the resulting notation uses simple set operations
the key concept is that of the reachable set for some HLA h & state s, notation: Reach(s, h)
this is the set of states reachable by any implementation of h (since under angelic semantics, the agent gets to choose)
for a sequence of HLAs [h1, h2], the reachable set is the union of the sets reached by applying h2 in each state in the reachable set of h1: Reach(s, [h1, h2]) = ⋃_{s′ ∈ Reach(s, h1)} Reach(s′, h2) (for notation details see p 411)
a sequence of HLAs forming a high-level plan is a solution if its reachable set intersects the set of goal states
30
Plan Search in HLA Space
illustration of reachable sets for sequences of HLAs: dots are states, shaded areas = goal states
darker arrows: possible implementations of h1; lighter arrows: possible implementations of h2
(a) reachable set for HLA h1
(b) reachable set for the sequence [h1, h2]; circled dots show the sequence achieving the goal
31
Planning in HLA Space
using this model, planning consists of searching in HLA space for a sequence with a reachable set that intersects the goal, then refining that abstract plan
note: we haven't yet considered the issue of representing reachable sets as the effects of HLAs
our basic planning model has states as conjunctions of fluents
if we treat the fluents of a planning problem as state variables, then under angelic semantics an HLA controls the values of these variables, depending on which implementation is actually selected
an HLA may have 9 different effects on a given variable: if it starts true, the HLA can always keep it true, always make it false, or have a choice; & similarly for a variable that is initially false
any combination of the 3 choices for each case is possible, yielding 3^2 = 9 effects
32
Planning in HLA Space
using this model, there are 9 possible combinations of choices for the effects on variables
we introduce some additional notation to capture this idea (note some slight formatting differences between the notation used here and in the textbook)
~ indicates possibility, the dependence on the agent's choice of implementation
~+A indicates the possibility of adding A
~−A represents the possibility of deleting A
~±A stands for possibly adding or deleting A
33
Planning in HLA Space
possible effects of HLAs: a simple example uses the HLA for going to the airport
Go(Home, SFO)
Refinement(Go(Home, SFO),
  STEPS: [Drive(Home, SFOLongTermParking), Shuttle(SFOLongTermParking, SFO)])
Refinement(Go(Home, SFO),
  STEPS: [Taxi(Home, SFO)])
this HLA has ~−Cash as a possible effect, since the agent may choose the refinement of going by taxi & have to pay
we can use this notation & angelic reachable-state semantics to illustrate how an HLA sequence [h1, h2] reaches a goal
it's often the case that an HLA's effects can only be approximated (since it may have infinitely many implementations & produce arbitrarily "wiggly" reachable sets)
we use approximate descriptions of result states of HLAs that are optimistic, REACH+(s, h), or pessimistic, REACH−(s, h): one may overestimate, the other underestimate
here's the definition of the relationship: REACH−(s, h) ⊆ REACH(s, h) ⊆ REACH+(s, h)
34
Planning in HLA Space
possible effects of HLAs using approximate descriptions of result states
with approximate descriptions, we need to reconsider how to apply/interpret the goal test:
(1) if the optimistic reachable set for a plan does not intersect the goal, then the plan is not a solution
(2) if the pessimistic reachable set for a plan intersects the goal, then the plan is a solution
(3) if the optimistic set intersects but the pessimistic set does not, the goal test is not decided & we need to refine the plan to resolve the residual ambiguity
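the three cases as a minimal Python sketch over explicitly enumerated state sets (a real system would work with the approximate 1-CNF descriptions instead; REACH−(s, plan) ⊆ REACH(s, plan) ⊆ REACH+(s, plan) is assumed to hold):

def classify_plan(reach_plus, reach_minus, goals):
    if not (reach_plus & goals):
        return "not a solution"            # (1) optimistic set misses the goal
    if reach_minus & goals:
        return "a solution"                # (2) pessimistic set hits the goal
    return "undecided: refine further"     # (3) only the optimistic set hits

print(classify_plan({"s1", "s2", "s3"}, {"s1"}, {"s3"}))   # undecided: refine further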
35
Planning in HLA Space
illustration: shading shows the set of goal states
reachable sets: R+ (optimistic) shown by a dashed boundary, R− (pessimistic) by a solid boundary
in (a), the plan shown by the dark arrow achieves the goal & the plan shown by the lighter arrow does not
in (b), the plan needs further refinement, since the R+ (optimistic) set intersects the goal but the R− (pessimistic) set does not
36
Planning in HLA Space
the algorithm: hierarchical planning with approximate angelic descriptions

function ANGELIC-SEARCH(problem, hierarchy, initialPlan) returns solution or fail
  frontier ← a FIFO queue with initialPlan as the only element
  loop do
    if EMPTY?(frontier) then return fail
    plan ← POP(frontier)  /* chooses shallowest node in frontier */
    if REACH+(problem.INITIAL-STATE, plan) intersects problem.GOAL then  /* optimistic */
      if plan is primitive then return plan  /* REACH+ is exact for primitive plans */
      guaranteed ← REACH−(problem.INITIAL-STATE, plan) ∩ problem.GOAL  /* pessimistic */
      /* pessimistic set includes a goal state & we're not in an infinite regress of refinements */
      if guaranteed ≠ {} and MAKING-PROGRESS(plan, initialPlan) then
        finalState ← any element of guaranteed
        return DECOMPOSE(hierarchy, problem.INITIAL-STATE, plan, finalState)
      hla ← some HLA in plan
      prefix, suffix ← the action subsequences before & after hla in plan
      for each sequence in REFINEMENTS(hla, outcome, hierarchy) do
        frontier ← INSERT(APPEND(prefix, sequence, suffix), frontier)
37
Planning in HLA Space
the DECOMPOSE function: mutually recursive with ANGELIC-SEARCH, it regresses from the goal to generate a successful plan at the next level of refinement

function DECOMPOSE(hierarchy, s0, plan, sf) returns a solution
  solution ← an empty plan
  while plan is not empty do
    action ← REMOVE-LAST(plan)
    si ← a state in REACH−(s0, plan) such that sf ∈ REACH−(si, action)
    problem ← a problem with INITIAL-STATE = si and GOAL = sf
    solution ← APPEND(ANGELIC-SEARCH(problem, hierarchy, action), solution)
    sf ← si
  return solution
38
Planning in HLA Space
notes:
ANGELIC-SEARCH has the same basic structure as the previous algorithm (BFS in the space of refinements)
the algorithm detects plans that are or aren't solutions by checking intersections of the optimistic & pessimistic reachable sets with the goal
when it finds a workable abstract plan, it decomposes the original problem into subproblems, one for each step of the plan
the initial state & goal for each subproblem are derived by regressing the guaranteed reachable goal state through the action schemas for each step of the plan
ANGELIC-SEARCH has a computational advantage over the previous hierarchical search algorithm, which in turn may have a large advantage over plain old exhaustive search
39
Least Cost & Angelic Search
the same approach can be adapted to find a least-cost solution
this generalizes the reachable-set concept so that a state, instead of being reachable or not, has a cost for the most efficient way of getting to it (∞ for unreachable states)
then optimistic & pessimistic descriptions bound the costs
the holy grail of hierarchical planning: this revision may allow finding a provably optimal abstract plan without checking all implementations
extensions: the approach can also be applied to online search in the form of hierarchical lookahead algorithms (recall LRTA*)
the resulting algorithm resembles the human approach to problems like the vacation plan
initially consider alternatives at the abstract level, over long time scales; leave parts of the plan abstract until execution time, though other parts are expanded into detail (flights, lodging) to guarantee feasibility of the plan
40
Nondeterministic Domains
finally, we'll relax some of the environment assumptions of the classical planning model
in part, these parallel the extensions of our earlier (CISC352) discussions of search
we'll consider the issues in 3 sub-categories:
(1) sensorless planning (conformant planning): completely drop the observability property for the environment
(2) contingency planning: for partially observable & nondeterministic environments
(3) online planning & replanning: for unknown environments
however, we begin with some background
41
BKGD: Nondeterministic Domains
note some distinct differences from the search paradigms: the factored representation of states allows an alternative belief state representation
plus, we have the availability of the domain-independent heuristics developed for classical planning
as usual, we explore the issues using a prototype problem: this time it's the task of painting a chair & table so that their colors match
in the initial state, the agent has 2 cans of paint, colors unknown; likewise the chair & table colors are unknown, & only the table is visible
plus there are actions to remove the lid of a can, & to paint from an open can (see the next slide)
42
The Furniture Painting Problem
the furniture painting problem:
Init(Object(Table) ∧ Object(Chair) ∧ Can(C1) ∧ Can(C2) ∧ InView(Table))
Goal(Color(Chair, c) ∧ Color(Table, c))
Action(RemoveLid(can),
  PRECOND: Can(can)
  EFFECT: Open(can))
Action(Paint(x, can),
  PRECOND: Object(x) ∧ Can(can) ∧ Color(can, c) ∧ Open(can)
  EFFECT: Color(x, c))
43
BKGD: Nondeterministic Domains
the environment: since it may not be fully observable, we'll allow action schemas to have variables in preconditions & effects that aren't in the action's variable list
Paint(x, can) omits the variable c representing the color of the paint in can: the agent may not know what color is in a can
in some variants, the agent will have to use percepts it gets while executing the plan, so planning needs to model sensors
the mechanism: percept schemas
Percept(Color(x, c),
  PRECOND: Object(x) ∧ InView(x))
Percept(Color(can, c),
  PRECOND: Can(can) ∧ InView(can) ∧ Open(can))
when an object is in view, the agent will perceive its color; if an open can is in view, the agent will perceive the paint color
44
BKGD: Nondeterministic Domains
we still need an action schema for inspecting objects:
Action(LookAt(x),
  PRECOND: InView(y) ∧ (x ≠ y)
  EFFECT: InView(x) ∧ ¬InView(y))
in a fully observable environment, we include a percept axiom with no preconds for each fluent
of course, a sensorless agent has no percept axioms
note: it can still coerce the table & chair to the same color to solve the problem (though it won't know what color that is)
a contingent planning agent with sensors can do better: inspect the objects, & if they're the same color, done; otherwise check the paint cans & if one is the same color as an object, paint the other object with it; otherwise paint both objects any color
an online agent produces contingent plans with few branches, handling problems as they occur by replanning
45
BKGD: Nondeterministic Domains
a contingent planner assumes that the effects of an action are successful
a replanning agent checks results, generating new plans to fix any detected flaws
in the real world we find combinations of approaches
46
Sensorless Planning Belief States
unobservable environment = sensorless planning
these problems are belief-state planning problems, with physical transitions represented by action schemas; we assume a deterministic environment
we represent belief states as logical formulas, rather than the explicit sets of atomic states we saw for sensorless search
for the prototype planning problem, furniture painting:
1. we omit the InView fluents
2. some fluents hold in all belief states, so we can omit them for brevity: Object(Table), Object(Chair), Can(C1), Can(C2)
3. the agent knows things have a color (∀x ∃c Color(x, c)), but doesn't know the color of anything or the open vs closed state of cans
4. this yields an initial belief state b0 = Color(x, C(x)), where C(x) is a Skolem function replacing the existentially quantified variable
5. we drop the closed-world assumption of classical planning, so states may contain +ve & -ve fluents; if a fluent does not appear, its value is unknown
47
Sensorless Planning Belief States
belief states specify how the world could be
they are represented as logical formulas; each is the set of possible worlds that satisfy the formula
in a belief state b, the actions available to the agent are those with their preconds satisfied in b
given the initial belief state b0 = Color(x, C(x)), a simple solution plan for the painting problem is:
[RemoveLid(Can1), Paint(Chair, Can1), Paint(Table, Can1)]
we'll update belief states as actions are taken, using the rule
b′ = RESULT(b, a) = {s′ : s′ = RESULTP(s, a) and s ∈ b}
where RESULTP defines the physical transition model
48
Sensorless Planning Belief States
updating belief states
we assume that the initial belief state is in 1-CNF form, that is, a conjunction of literals
b′ is derived based on what happens to the literals l in the physical states s in b when a is applied
if the truth value of a literal is known in b, then in b′ it is given by the current value, plus the add list & the delete list of a
if a literal's truth value is unknown, 1 of 3 cases applies (see the sketch below):
1. a adds l, so it must be true in b′
2. a deletes l, so it must be false in b′
3. a does not affect l, so it remains unknown (thus is not in b′)
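a sketch of this update for a 1-CNF belief state held as a dict from fluent to True/False, with absent fluents unknown (assumes, per the slide, that the action's add & delete lists are the same in every state satisfying its preconds):

def update_belief(belief, add, delete):
    b = dict(belief)
    for l in delete:
        b[l] = False     # case 2: deleted literals become false, known or not
    for l in add:
        b[l] = True      # case 1: added literals become true (add wins over delete)
    return b             # case 3: untouched unknown literals simply stay absent

b0 = {}                  # knows nothing
b1 = update_belief(b0, add=["Open(Can1)"], delete=[])
b2 = update_belief(b1, add=["Color(Chair, C(Can1))"], delete=[])
print(b2)   # {'Open(Can1)': True, 'Color(Chair, C(Can1))': True}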
49
Sensorless Planning Belief States
updating belief states: the example plan
recall the sensorless agent's solution plan for the furniture painting problem:
[RemoveLid(Can1), Paint(Chair, Can1), Paint(Table, Can1)]
apply RemoveLid(Can1) to b0 = Color(x, C(x)):
(1) b1 = Color(x, C(x)) ∧ Open(Can1)
apply Paint(Chair, Can1) to b1:
the precondition Color(Can1, c) is satisfied by Color(x, C(x)) with the binding {x/Can1, c/C(Can1)}
(2) b2 = Color(x, C(x)) ∧ Open(Can1) ∧ Color(Chair, C(Can1))
now apply the last action to get the next belief state, b3:
(3) b3 = Color(x, C(x)) ∧ Open(Can1) ∧ Color(Chair, C(Can1)) ∧ Color(Table, C(Can1))
note that this satisfies the plan goal Goal(Color(Chair, c) ∧ Color(Table, c)) with c bound to C(Can1)
50
Sensorless Planning Belief States
the painting problem solution illustrates that the family of belief states given as conjunctions of literals is closed under updates defined by PDDL action schemas
so given n total fluents, any belief state is represented as a conjunction of size O(n) (despite the O(2^n) states in the world)
however, this is only the case when action schemas have the same effects for all states in which their preconds are satisfied
if an action's effects depend on the state, dependencies among fluents are introduced & the 1-CNF property does not apply
this is illustrated by an example from the simple vacuum world on the next slides
51
Recall Vacuum World
the simple vacuum world state space
52
Sensorless Planning Belief States
if an action's effects depend on the state, dependencies among fluents are introduced & the 1-CNF property does not apply
the effect of the Suck action depends on where it is done (CleanL if the agent is AtL, but CleanR if the agent is AtR)
this requires conditional effects for action schemas: when condition: effect, or for the vacuum world:
Action(Suck,
  EFFECT: when AtL: CleanL
          when AtR: CleanR)
considering conditional effects & belief states: applying the conditional action to the initial belief state yields the result belief state
(AtL ∧ CleanL) ∨ (AtR ∧ CleanR)
so the belief state formula is no longer 1-CNF, and in the worst case may be exponential in size
53
Sensorless Planning Belief States
to a degree, the available options are:
(1) use conditional effects for actions & deal with the loss of the belief-state representational simplicity
(2) use a conventional action representation whose preconditions, if unsatisfied, make the action inapplicable & leave the resulting state undefined
for sensorless planning, conditional effects are preferable, even though they yield "wiggly" belief states (& maybe that's inevitable anyway for non-trivial problems)
an alternative is a conservative approximation of belief states (all literals whose truth values can be determined, with the others treated as unknown)
this yields planning that is sound but incomplete (if the problem requires interactions among literals)
54
Sensorless Planning Belief States
another alternative: the agent (algorithm) could attempt to use action sequences that keep the belief state simple (1-CNF), as in this vacuum world example
the target is a plan consisting of actions that will yield the simple belief state representation, for example [Right, Suck, Left, Suck]:
b0 = True
b1 = AtR
b2 = AtR ∧ CleanR
b3 = AtL ∧ CleanR
b4 = AtL ∧ CleanR ∧ CleanL
note that some alternative sequences (e.g. those beginning with the Suck action) would break the 1-CNF representation
simpler belief states are attractive, as even human behaviour shows: the evidence is our carrying out of frequent small actions to reduce uncertainty (keeping the belief state manageable)
55
Sensorless Planning Belief States
yet another alternative for representing belief states under the relaxed observability
we might represent belief states in terms of an initial belief state + a sequence of actions, yielding an O(n + m) bound on belief-state size for a world of n literals with a maximum of m actions in a sequence
if so, the issues relate to the difficulty of calculating when an action is applicable or a goal is satisfied
we might use an entailment test: b0 ∧ A^m ⊨ G^m, where
b0 is the initial belief state,
A^m collects the successor-state axioms for the actions in the sequence, and
G^m states that the goal is achieved after m actions
so we want to show b0 ∧ A^m ∧ ¬G^m is unsatisfiable; a good SAT solver may be able to determine this quite efficiently
56
Sensorless Planning Heuristics
as a last consideration, we return to the question of using heuristics to prune the search space
notice that for belief states, solving for a subset of the belief state must be easier than solving it entirely:
if b1 ⊆ b2 then h*(b1) ≤ h*(b2)
thus an admissible heuristic for a subset of the states in the belief state is an admissible heuristic for the belief state itself
candidate subsets include singletons, the individual states
assuming we adopt 1 of the admissible heuristics we saw for classical planning, and that s1, ..., sN is a random selection of states in belief state b, a reasonably accurate admissible heuristic is (sketched below)
H(b) = max{h(s1), ..., h(sN)}
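as a tiny sketch (the admissible state-level heuristic h & the enumerable belief state are assumptions):

import random

def belief_heuristic(belief_state, h, n_samples=5):
    # max of an admissible state heuristic over a random sample of member
    # states: any subset's optimal cost lower-bounds the whole belief state's
    sample = random.sample(list(belief_state), min(n_samples, len(belief_state)))
    return max(h(s) for s in sample)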
still other alternatives involve converting to planning-graph form, where the initial state layer is derived from b: just its literals if b is 1-CNF, or potentially derived from a non-CNF representation
57
Contingent Planning
we relax some of the environmental assumptions of classical planning to deal with environments that are partially observable and/or non-deterministic
for such environments, a plan includes branching based on percepts (recall the percept schemas from the introduction):
Percept(Color(x, c),
  PRECOND: Object(x) ∧ InView(x))
Percept(Color(can, c),
  PRECOND: Can(can) ∧ InView(can) ∧ Open(can))
at plan execution, we represent belief states as logical formulas
the plan includes contingent/conditional branches; to check a branch condition: does the current belief state entail the condition or its negation?
the conditions include first-order properties (existential quantification), so they may have multiple substitutions; an agent gets to choose one, applying it to the remainder of the plan
58
Contingent Planning
a contingent plan solution for the painting problem:
[LookAt(Table), LookAt(Chair),
 if Color(Table, c) ∧ Color(Chair, c) then NoOp
 else [RemoveLid(Can1), LookAt(Can1), RemoveLid(Can2), LookAt(Can2),
   if Color(Table, c) ∧ Color(can, c) then Paint(Chair, can)
   else if Color(Chair, c) ∧ Color(can, c) then Paint(Table, can)
   else [Paint(Chair, Can1), Paint(Table, Can1)]]]
note: Color(Table, c) ∧ Color(can, c) might be satisfied under both {can/Can1} and {can/Can2} if both cans are the same color as the table
the previous-to-new belief state calculation occurs in 2 stages
(1) after an action a, as with the sensorless agent:
b̂ = (b − DEL(a)) ∪ ADD(a), where b̂ is the predicted belief state, represented as a conjunction of literals
(2) then, in the percept stage, determine which percept axioms hold in the now partially updated belief state, and add their percepts + preconditions
59
Contingent Planning
(2) updating the belief state from the percept axioms Percept(p, PRECOND: c), where c is a conjunction of literals
suppose percept literals p1, ..., pk are received
for a given percept p, there is either a single percept axiom or more than 1
if just 1: add its percept literal & preconditions to the belief state
if > 1: we have to deal with multiple candidate preconditions; add p & the disjunction of the preconditions that may hold in the predicted belief state b̂
if this is the case, we've given up the 1-CNF form for belief state representation, and issues arise similar to those for conditional effects in the sensorless planner
given a way to generate exact or approximate belief states:
(1) the algorithm for contingent search may generate contingent plans
(2) actions with nondeterministic effects (disjunctive EFFECTs) can be handled with minor changes to belief-state updating
(3) heuristics, including those suggested for sensorless planning, are available
60
Contingent Planning
the AND-OR-GRAPH-SEARCH algorithm
AND nodes indicate non-determinism (all outcomes must be handled), while OR nodes indicate choices of actions from states
the algorithm is depth-first, mutually recursive, & returns a conditional plan
notation: [x | l] is the list formed by prepending x to the list l

function AND-OR-GRAPH-SEARCH(problem) returns a conditional plan, or failure
  return OR-SEARCH(problem.INITIAL-STATE, problem, [])

function OR-SEARCH(state, problem, path) returns a conditional plan or failure
  if problem.GOAL-TEST(state) then return the empty plan
  if state is on path then return failure  /* repeated state on this path */
  for each action in problem.ACTIONS(state) do
    plan ← AND-SEARCH(RESULTS(state, action), problem, [state | path])
    if plan ≠ failure then return [action | plan]
  return failure

function AND-SEARCH(states, problem, path) returns a conditional plan or failure
  for each si in states do
    plani ← OR-SEARCH(si, problem, path)
    if plani = failure then return failure
  return [if s1 then plan1 else if s2 then plan2 else … if sn−1 then plann−1 else plann]
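a runnable Python rendering of the same search (a sketch: `problem` is assumed to supply goal_test, actions & results; None plays the role of failure & a Python list is the conditional plan):

def and_or_graph_search(problem):
    return or_search(problem.initial_state, problem, [])

def or_search(state, problem, path):
    if problem.goal_test(state):
        return []                                    # the empty plan
    if state in path:
        return None                                  # repeated state on this path
    for action in problem.actions(state):
        plan = and_search(problem.results(state, action), problem, [state] + path)
        if plan is not None:
            return [action] + plan
    return None

def and_search(states, problem, path):
    branches = []
    for s in states:                                 # every outcome must be handled
        plan = or_search(s, problem, path)
        if plan is None:
            return None
        branches.append(("if", s, plan))             # "if s then plan" branch
    return branches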
61
Online Replanning
replanning: this approach uses/captures knowledge about what the agent is trying to do; some form of execution monitoring triggers replanning
it interleaves executing & planning, dealing with some contingencies by including Replan branches in the plan; if the agent encounters a Replan during plan execution, it returns to planning mode
why Replan?
there may be an error or omission in the world model used to build the plan, e.g. no state variable to represent the quantity of paint in a can (so it could even be empty), or exogenous events (a can wasn't properly sealed & the paint dried up), or a goal may be changed
environment monitoring by the online agent:
(1) action monitoring: check preconds before executing an action
(2) plan monitoring: check that the remaining plan will still work
(3) goal monitoring: before executing, ask "is a better set of goals available?"
62
Online Replanning
a replanning example: action monitoring indicates the agent's state is not as planned, so it should try to get back to a state in the original plan, minimizing total cost
when the agent finds it is not in the expected state E, but observes that it is instead in O, it replans
63
Online Replanning
replanning in the furniture painting problem:
[LookAt(Table), LookAt(Chair),
 if Color(Table, c) ∧ Color(Chair, c) then NoOp
 else [RemoveLid(Can1), LookAt(Can1),
   if Color(Table, c) ∧ Color(Can1, c) then Paint(Chair, Can1)
   else REPLAN]]
the online planning agent, having painted the chair, checks the preconds for the remaining empty plan: that the table & chair are the same colour
suppose the new paint didn't cover well & the old colour still shows
the agent needs to determine where in the whole plan to return to, & what repair action sequence to use to get there
given that the current state matches that before Paint(Chair, Can1), an empty repair sequence & a new plan of the same [Paint] sequence is OK
the agent resumes execution monitoring, retries the Paint action, & loops like this until the colours match
note that the loop is online (plan-execute-replan), not explicit in the plan
64
Online Replanning
replan: the original plan doesn't handle all contingencies; the REPLAN step could generate an entirely new plan
a plan-monitoring agent may detect faults earlier, before the corresponding actions are executed: when the current state means that the remaining plan won't work
so it checks the preconditions for success of the remaining plan, for each of its steps, except those contributed by some other step in the remaining plan
the goal is to detect future failure as early as possible, & replan
note: in (rare) cases it might even detect serendipitous success
action monitoring by checking preconditions is relatively easy to include, but plan monitoring is more difficult
partial-order & planning-graph structures include information that may support the plan monitoring approach
65
Online Replanning
with replanning, plans will always succeed, right?
still there can be "dead ends", states from which no repair is possible; a flawed model can lead the plan into dead ends
an example of a flawed model: the general assumption of unlimited resources (for example, bottomless paint cans)
however, if we assume there are no dead ends, there will be a plan to reach a goal from any state
and if we further assume that the environment is truly non-deterministic (that there's always a non-zero chance of success), then a replanning agent will eventually achieve the goal
66
Online Replanning
when replanning fails
another problem is that actions may not really be non-deterministic; instead, they may depend on preconditions the agent does not know about
for example, painting from an empty paint can has no effect & will never lead to the goal
there are alternative approaches to cope with such failures:
(1) the agent might randomly select a candidate repair plan (open another can?)
(2) the agent might learn a better model, modifying the world model to match percepts when predictions fail
67
Multiagent Planning
the next relaxation of environmental assumptions: there may be multiple agents whose actions need to be taken into account in formulating our plans
background: distinguish several slightly different paradigms
(1) multieffector planning: what we might call multitasking; really a single central agent, but with multiple ways of interacting with the environment simultaneously (like a multiarmed robot)
(2) multibody planning: here we consider multiple detached units moving separately, but sharing percepts to generate a common representation of the world state that is the basis of the plan
one version of the multibody scenario has central plan formulation but somewhat decoupled execution
for example, a fleet/squadron of reconnaissance robots that are sometimes out of communications range: the multibody subplans for each individual body include communication actions
68
Multiagent Planning
variations on the theme
with a central planning agent, there's a shared goal
it's also possible for distinct agents, each generating plans, to have a shared goal
the latter paradigm suggests a new prototypical problem: planning for a tennis doubles team
so shared-goal situations can be either multibody (1 central plan) or multiagent (each agent developing a plan, but with a requirement for coordination mechanisms)
a system could even be some hybrid of centralized & multiagent planning
as an example, a package delivery company develops centralized routing plans, but each truck driver may respond to unforeseen weather or traffic issues with independent planning
69
Multiagent Planning
our first model involves multiple simultaneous actions; the terminology is multiactor settings
we merge aspects of the multieffector, multibody, & multiagent paradigms, then consider issues related to transition models, correctness of plans, & efficiency/complexity of planning algorithms
correctness: a correct plan is one that, if carried out by the actors, achieves the goal (note that in a true multiagent situation, the agents might not agree to carry it out)
synchronization: a simplifying assumption we apply, that all actions require the same length of time & multiple actions at a step in the plan are simultaneous
under a deterministic environment assumption, the transition model is given by the function Result(s, a)
for a single agent there are b action choices, & b may be quite large; in the multiactor model with n actors, an action is a joint action ⟨a1, ..., an⟩, where ai is the action for the i-th actor
70
Multiactor Scenario
complexity implications of the transition model: with b^n joint actions we have a b^n branching factor for planning
since planning algorithm complexity was already an issue, a shared target for multiactor planning systems is to treat the actors as decoupled, so that complexity is linear in n rather than exponential
loose coupling of the actors may allow an approximately linear formulation
this is analogous to issues we've encountered before: additive heuristics for independent subproblems in planning, reducing a CSP graph to a tree (or multiple trees) to apply efficient algorithms, ...
in multiactor planning: for loosely coupled problems, we treat the actors as decoupled & then apply fixes as required to handle any interactions
so the action schemas of the transition model treat actors as independent
71
Multiactor Scenario
prototype problem: doubles tennis
the problem is formulated as returning a ball hit to the team, while retaining court coverage
there are 2 players on the team; each is either at the net or the baseline, on the right side or left side of the court
actions are the moving of a player (actor) or the hitting of the ball by a player
72
Doubles Tennis Problem
here's the conventional (independence assumption) multiactor problem setup for doubles tennis:
Actors(A, B)
Init(At(A, LeftBaseline) ∧ At(B, RightNet) ∧ Approaching(Ball, RightBaseline) ∧ Partner(A, B) ∧ Partner(B, A))
Goal(Returned(Ball) ∧ (At(a, RightNet) ∨ At(a, LeftNet)))
Action(Hit(actor, Ball),
  PRECOND: Approaching(Ball, loc) ∧ At(actor, loc)
  EFFECT: Returned(Ball))
Action(Go(actor, to),
  PRECOND: At(actor, loc) ∧ to ≠ loc
  EFFECT: At(actor, to) ∧ ¬At(actor, loc))
73
Multiactor Tennis Doubles Scenario
for the multiactor tennis problem, here is a joint plan given the problem description
Plan 1:
A: [Go(A, RightBaseline), Hit(A, Ball)]
B: [NoOp(B), NoOp(B)]
what are the issues, given the current problem representation?
a legal and apparently successful plan could still have both players hitting the ball at the same time (though that really won't work)
the preconditions don't include constraints to preclude interference of this type
a solution: revise the action schemas to include concurrent action lists that can explicitly state which actions are or are not concurrent
74
Controlling Concurrent Actions
a revised Hit action requires that it be by exactly 1 actor; this is represented by including a concurrent action list:
Action(Hit(a, Ball),
  CONCURRENT: b ≠ a ⇒ ¬Hit(b, Ball)
  PRECOND: Approaching(Ball, loc) ∧ At(a, loc)
  EFFECT: Returned(Ball))
some actions might require concurrency for success: apparently tennis players require large coolers full of refreshing drinks, & 2 actors are required to carry the cooler
Action(Carry(a, cooler, here, there),
  CONCURRENT: b ≠ a ∧ Carry(b, cooler, here, there)
  PRECOND: At(a, here) ∧ At(cooler, here) ∧ Cooler(cooler)
  EFFECT: At(a, there) ∧ At(cooler, there) ∧ ¬At(a, here) ∧ ¬At(cooler, here))
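a small sketch of checking such CONCURRENT constraints over a joint action, represented here as a dict from actor to action name (all names & the representation are illustrative):

def no_other(name):           # CONCURRENT: b ≠ a ⇒ ¬name(b, ...)
    return lambda actor, joint: all(
        act != name for b, act in joint.items() if b != actor)

def needs_partner(name):      # CONCURRENT: some b ≠ a performs name too
    return lambda actor, joint: any(
        act == name for b, act in joint.items() if b != actor)

concurrent = {"Hit": no_other("Hit"), "Carry": needs_partner("Carry")}

def joint_ok(joint):
    return all(concurrent.get(act, lambda a, j: True)(actor, joint)
               for actor, act in joint.items())

print(joint_ok({"A": "Hit", "B": "NoOp"}))    # True
print(joint_ok({"A": "Hit", "B": "Hit"}))     # False: Hit excludes concurrent Hits
print(joint_ok({"A": "Carry", "B": "NoOp"}))  # False: Carry needs a partner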
75
Multiactor Scenario
given appropriately revised action schemas including concurrent action lists, it becomes relatively simple to adapt the classical planning algorithms for multiactor planning
this depends on there being loose coupling of subplans, so that the plan search algorithm does not encounter concurrency constraints too frequently
further, the HTN approaches, techniques for partial observability, & contingency/replanning techniques may also be adapted for loosely coupled multiactor problems
next: full-blown multiagent scenarios, where each agent makes independent plans
76
Multiple Agents
cooperation & coordination: each agent formulates its own plan, but based on shared goals & a shared knowledge base
we continue with the doubles tennis example problem
Plan 1:
A: [Go(A, RightBaseline), Hit(A, Ball)]
B: [NoOp(B), NoOp(B)]
Plan 2:
A: [Go(A, LeftNet), NoOp(A)]
B: [Go(B, RightBaseline), Hit(B, Ball)]
either of these plans may work if both agents use it, but if A does 1 & B does 2 (or vice versa), both or neither returns the ball
so there has to be some mechanism that results in the agents agreeing on a single plan
77
Multiple Agents
techniques for agreement on a single plan
(A) convention: adopt or agree upon some constraint on the selection of joint plans; for example, in doubles tennis, "stay on your side of the court", or a baseball center fielder takes fly balls hit "in the gap"
conventions are observable at more global levels among multiple agents, as when drivers agree to drive on a particular side of the road
in higher-order contexts, the conventions become "social laws"
(B) communication: between agents, as when 1 doubles player yells "mine" to a teammate; the signal indicates which is the preferred joint plan
see similar examples in other team sports, as when a baseball fielder calls for the catch on a popup; note that the communication could be non-verbal
plan recognition applies when 1 agent begins execution & the initial actions unambiguously indicate which plan to follow
78
Multiple Agents
the AIMA authors discuss natural-world conventions; these may be the outcome of evolutionary processes
in harvester ant colonies there is no central control, yet they execute elaborate "plans" where each individual ant takes on 1 of multiple roles based on its current local conditions: convention or communication?
planning & "spontaneous" human social events (Aberdeen)?
another example from the natural world is the flocking behaviour of birds; this can be seen as a cooperative multiagent process
successful simulations of flocking behaviour over a collection of agents ("boids") are possible if each observes its neighbours & maximizes a weighted sum of 3 elements (see the sketch below):
(1) cohesion: +ve for being closer to the average position of neighbours
(2) separation: -ve for being too close to a neighbour
(3) alignment: +ve for being closer to the average heading of neighbours
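a minimal sketch of one such weighted-sum update (the weights & the 2-D dict representation are arbitrary illustrative choices; assumes at least 2 boids):

import math

def boid_step(i, boids, w_coh=0.1, w_sep=0.2, w_ali=0.1):
    me = boids[i]
    others = [b for j, b in enumerate(boids) if j != i]
    ax = ay = 0.0
    # cohesion: steer toward the average position of the neighbours
    cx = sum(b["x"] for b in others) / len(others)
    cy = sum(b["y"] for b in others) / len(others)
    ax += w_coh * (cx - me["x"]); ay += w_coh * (cy - me["y"])
    # separation: steer away from any neighbour that is too close
    for b in others:
        d = math.hypot(b["x"] - me["x"], b["y"] - me["y"]) or 1e-9
        if d < 1.0:
            ax += w_sep * (me["x"] - b["x"]) / d
            ay += w_sep * (me["y"] - b["y"]) / d
    # alignment: steer toward the average heading of the neighbours
    avx = sum(b["vx"] for b in others) / len(others)
    avy = sum(b["vy"] for b in others) / len(others)
    ax += w_ali * (avx - me["vx"]); ay += w_ali * (avy - me["vy"])
    me["vx"] += ax; me["vy"] += ay
    me["x"] += me["vx"]; me["y"] += me["vy"]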
79
Multiple Agents
convention & emergent behaviour: complex global behavior can arise from the interaction of simple local rules
in the boids example, the result is a pseudorigid "flock" that has approximately constant density, does not disperse over time, & makes occasional swooping motions
each agent operates without any joint plan to explicitly indicate the actions of other agents
see some boids background & a demo at: boids online
UMP! (ultimate multiagent problems): these involve cooperation within a team & competition against another team, without central planning/control
robot soccer is an example, as are other similarly dynamic team sports (hockey, basketball)
this may be less true of, say, baseball or football, where some central control is possible & there is a high degree of convention + communication
80
Summary
moving away from the limits of classical planning
(1) actions consume (& possibly produce) resources, which we treat as aggregates to control complexity; formulate partial plans taking resource constraints into account, then refine them
(2) time is a resource that can be considered with dedicated scheduling algorithms, or perhaps integrated with planning
(3) an HTN (Hierarchical Task Network) approach captures knowledge in HLAs (High Level Actions) that may have multiple implementations as sequences of lower-level actions
angelic semantics for interpreting the effects of HLAs allows planning in the space of HLAs without refinement into primitive actions
HTN systems can create large, real-world plans
(4) classical planning's environment assumptions (full observability, deterministic actions, a single agent) are too rigid/optimistic for many problem domains
81
Summary
relaxing the assumptions of classical planning
(5) contingent & sensorless planning
contingent planning uses percepts during execution to conditionally branch to appropriate subplans
sensorless/conformant planning may succeed in coercing the world to a goal state without any percepts
for both paradigms, plans are built by search in belief space, for which the techniques must address representational & computational issues
(6) online planning agents interleave execution & planning; they monitor for problems & repair plans to recover from unplanned states, allowing them to deal with nondeterministic actions, exogenous events, & poor models of the environment
(7) multiple agents might be cooperative or competitive; the keys to success are mechanisms for coordination
(8) future chapters will cover probabilistic non-determinism & learning from experience to acquire strategies