Planning with Local Search


MERS Seminar Lecture

March 6, 2003

Jonathan Kennell

Presentation Outline

Planning Overview
– What is planning? – 5 mins.
– Taxonomy of planners – 40 mins. (or everything you ever wanted to know about planning in approximately 40 minutes)

5 minute break

LPG
– Background information (WalkSAT) – 10 mins.
– Linear action graphs and precedence graphs – 10 mins.
– WalkPlan planning algorithm – 10 mins.
– Example – 10 mins.

What is Planning?

Input
– Set of world-states
– Action operators (fn: world-state → world-state)
– Initial world-state
– Goal (possibly a partial state / set of world-states)

Output
– Ordering of actions

From 6.834J POP lecture

World State

Set of facts and their degree of truth

– Examples:
    (Student Jonathan) // true
    (Likes Jonathan Golf) // false
    (Graduating Jonathan June) // unknown *

Note: Lisp notation is used extensively in the planning community.

* Most planners don’t consider unknown facts.

Planning Operators

Fn: world-state → world-state

Generally use STRIPS format:

– Preconditions: facts that must be true before action can occur

– Effects: facts that become true (or false) after the action occurs

Extra properties:

– Separate start / invariant / end conditions and effects

– Durations

– Resource constraints

(:action Move
  (:params ((robot ?r) (location ?a) (location ?b)))
  (:preconds (at ?r ?a))
  (:effects (and (not (at ?r ?a)) (at ?r ?b))))
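As a rough illustration of how a STRIPS-style operator maps one world-state to another, the sketch below applies a ground instance of Move to a state represented as a set of facts. The `Action` class, the `applicable` and `apply_action` helpers, and the robot and room names are assumptions made for this example; they are not part of any planner discussed in the lecture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    preconds: frozenset  # facts that must be true before the action
    add: frozenset       # facts that become true
    delete: frozenset    # facts that become false


def applicable(state, action):
    """An action is applicable when all its preconditions hold in the state."""
    return action.preconds <= state


def apply_action(state, action):
    """Return the successor world-state: remove delete-effects, then add add-effects."""
    if not applicable(state, action):
        raise ValueError(f"{action.name}: preconditions not met")
    return (state - action.delete) | action.add


def move(r, a, b):
    """Ground instance of the Move operator (hypothetical helper)."""
    return Action(
        name=f"Move({r},{a},{b})",
        preconds=frozenset({("at", r, a)}),
        add=frozenset({("at", r, b)}),
        delete=frozenset({("at", r, a)}),
    )


s0 = frozenset({("at", "robot1", "roomA")})
s1 = apply_action(s0, move("robot1", "roomA", "roomB"))
print(sorted(s1))  # [('at', 'robot1', 'roomB')]
```

Representing a state as an immutable set of true facts matches the closed-world simplification noted earlier: anything not in the set is treated as false rather than unknown.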

Mutual Exclusion

Sometimes planning operators conflict with each other – we call a pair of conflicting operators mutex

Examples of mutex actions:
– Interference: A deletes a precondition or effect of B
– Competing Needs: A and B have mutex preconditions

Planner must ensure no mutex actions co-occur.
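The two mutex cases can be checked mechanically. The following is a minimal sketch, assuming actions carry precondition, add, and delete sets and that mutex facts are supplied as a set of unordered pairs; the names `interferes`, `competing_needs`, and `mutex` are illustrative only.

```python
from collections import namedtuple

Action = namedtuple("Action", "name preconds add delete")


def interferes(a, b):
    """Interference: a deletes a precondition or an (add) effect of b."""
    return bool(a.delete & (b.preconds | b.add))


def competing_needs(a, b, fact_mutexes):
    """Competing needs: some precondition of a is mutex with one of b."""
    return any(frozenset((p, q)) in fact_mutexes
               for p in a.preconds for q in b.preconds)


def mutex(a, b, fact_mutexes=frozenset()):
    return (interferes(a, b) or interferes(b, a)
            or competing_needs(a, b, fact_mutexes))


def move(r, x, y):
    at_x, at_y = ("at", r, x), ("at", r, y)
    return Action(f"Move({r},{x},{y})",
                  frozenset({at_x}), frozenset({at_y}), frozenset({at_x}))


# Hypothetical action that needs the robot to be at B.
load = Action("Load(r1,B)", frozenset({("at", "r1", "B")}),
              frozenset({("loaded", "r1")}), frozenset())

# A robot cannot be in two places at once (assumed fact-level mutex).
facts = frozenset({frozenset({("at", "r1", "A"), ("at", "r1", "B")})})

print(mutex(move("r1", "A", "B"), move("r1", "A", "C")))   # True (interference)
print(mutex(move("r1", "A", "B"), move("r2", "A", "B")))   # False
print(mutex(move("r1", "A", "B"), load, facts))            # True (competing needs)
```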

What is a plan?

A plan is an ordering of actions that will transition the system from the initial state to the goal state.

[Figure: example plan with nodes Start, Activity-A, Activity-B, Activity-C, Activity-D, and End, labeled with fact-J through fact-P]

Completeness / Consistency / Minimality

Complete Plan
– A plan is complete IFF every precondition of every activity is achieved.
– An activity’s precondition is achieved IFF:
    the precondition is the effect of a preceding activity (support), and
    no intervening step conflicts with the precondition (mutex).

Consistent Plan
– The plan is consistent IFF
    the temporal constraints of its activities are consistent (the associated distance graph has no negative cycles), and
    no conflicting (mutex) activities can co-occur.

Minimal Plan
– The plan is minimal IFF every constraint serves a purpose, i.e., if we remove any temporal or symbolic constraint from a minimal plan, the new plan is not equivalent to the original plan.
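The temporal half of the consistency test can be sketched with a Bellman-Ford negative-cycle check. The encoding below assumes each temporal constraint is a tuple (i, j, lower, upper) meaning lower <= t_j - t_i <= upper, which becomes two weighted edges in the distance graph; the function names and the three-event example are assumptions for illustration.

```python
def distance_graph(constraints):
    """Turn (i, j, lower, upper) constraints into weighted distance-graph edges."""
    edges = []
    for i, j, lower, upper in constraints:
        edges.append((i, j, upper))    # t_j - t_i <= upper
        edges.append((j, i, -lower))   # t_i - t_j <= -lower
    return edges


def is_consistent(constraints):
    """True iff the distance graph has no negative cycle (Bellman-Ford)."""
    edges = distance_graph(constraints)
    nodes = {n for u, v, _ in edges for n in (u, v)}
    # Starting every distance at 0 acts like a virtual source connected
    # to all nodes by zero-weight edges.
    dist = {n: 0 for n in nodes}
    for _ in range(len(nodes) - 1):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            return True
    # Any edge that can still be relaxed lies on or after a negative cycle.
    return not any(dist[u] + w < dist[v] for u, v, w in edges)


# Two activities of 5 to 10 time units each, executed back to back.
chain = [("start", "mid", 5, 10), ("mid", "end", 5, 10)]
print(is_consistent(chain + [("start", "end", 0, 20)]))  # True
print(is_consistent(chain + [("start", "end", 0, 8)]))   # False: needs at least 10
```

The mutex half of the definition is a separate symbolic check and is not covered by this sketch.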

Variations on Classical Planning

Temporal planning
– Actions have durations

Planning with resources
– Facts can be quantified

Planning with uncertainty
– Effects / durations of actions not guaranteed

Taxonomy of Planners

Planners
– Macro Decomposition (restricted plan-space)
    SHOP2
    Kirk TPN Planner
– Plan Graph (condensed plan-space)
    Graphplan
    LPGP
– Forward Chaining / Backward Propagation (entire plan-space)
    Global Search: TLPlan, Kirk Deductive Controller
    Local Search: LPG

Forward Chaining / Backward Propagation

Searches through entire plan-space by non-deterministically adding actions to plan candidates.

Advantages:
– generative (does not require strategies)
– expressive (can easily handle time and resources)

Disadvantages:
– inherently slow (plan-space is enormous)
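One simple way to realize the non-deterministic choice of the next action is breadth-first search over world-states. The sketch below is a minimal, unguided forward-chaining planner under that assumption; the `forward_chain` function and the three-room domain are made up for the example and do not correspond to any specific planner in the taxonomy.

```python
from collections import deque, namedtuple

Action = namedtuple("Action", "name preconds add delete")


def forward_chain(initial, goal, actions):
    """Breadth-first forward chaining; returns a list of action names or None."""
    start = frozenset(initial)
    frontier = deque([(start, [])])
    visited = {start}
    while frontier:
        state, plan = frontier.popleft()
        if goal <= state:               # every goal fact holds
            return plan
        for a in actions:
            if a.preconds <= state:     # action is applicable
                successor = (state - a.delete) | a.add
                if successor not in visited:
                    visited.add(successor)
                    frontier.append((successor, plan + [a.name]))
    return None


def move(x, y):
    return Action(f"Move({x},{y})", frozenset({("at", x)}),
                  frozenset({("at", y)}), frozenset({("at", x)}))


# Rooms A, B, C in a row; the robot can move between neighbors.
ACTIONS = [move("A", "B"), move("B", "A"), move("B", "C"), move("C", "B")]
plan = forward_chain({("at", "A")}, frozenset({("at", "C")}), ACTIONS)
print(plan)  # ['Move(A,B)', 'Move(B,C)']
```

Even with duplicate-state detection, the number of reachable states grows exponentially with the number of facts, which is the slowness noted above.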

Forward Chaining Example


Familiar tradeoff: Efficient pruning methods versus optimality.

Case Study: TLPlan

TLPlan (Temporal Logic Planner) by Fahiem Bacchus and Froduald Kabanza

TLPlan is based on a forward-chaining planner

TLPlan uses domain-dependent temporal logic to prune the search space

TLPlan: First-order Temporal Logic

Definition: First-order linear temporal logic
– standard first-order logic, plus:
    U (until), □ (always), ◊ (eventually), ○ (next)
    Bounded quantifiers:
      ∀[x:γ] φ ≡ ∀x . γ(x) → φ(x)
      ∃[x:γ] φ ≡ ∃x . γ(x) ∧ φ(x)

Example:
– □(on(B,C) → (on(B,C) U on(A,B)))
– Asserts that whenever we enter a state in which B is on C, it remains on C until A is on B

TLPlan: Formula Progression Algorithm

The Progress algorithm is used to check control strategies as the system searches for a plan.

Inputs: An LTL formula f and a world w (generated by forward-chaining)

Output: A new formula f+, also expressed as an LTL formula, representing the progression of f through the world w.

Algorithm: Progress(f,w)
– Case
    1. f is atomic: if w entails f, f+ := TRUE, else f+ := FALSE
    2. f = f1 ∧ f2: f+ := Progress(f1,w) ∧ Progress(f2,w)
    3. f = ¬f1: f+ := ¬Progress(f1,w)
    4. … etc. … (see paper for complete algorithm)
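The cases listed above can be sketched for the propositional fragment as follows. Formulas are nested tuples, and a world is a set of true atoms. The temporal cases (next, always, eventually, until) use the standard progression rules for linear temporal logic and are included here as an assumption, since the slide elides them; bounded quantifiers are left out entirely. All names are illustrative and this is not TLPlan's code.

```python
TRUE, FALSE = ("true",), ("false",)


def conj(a, b):
    if FALSE in (a, b):
        return FALSE
    return b if a == TRUE else a if b == TRUE else ("and", a, b)


def disj(a, b):
    if TRUE in (a, b):
        return TRUE
    return b if a == FALSE else a if b == FALSE else ("or", a, b)


def neg(a):
    return FALSE if a == TRUE else TRUE if a == FALSE else ("not", a)


def progress(f, w):
    """Return the formula that must hold in the next world, given f held to w."""
    op = f[0]
    if op in ("true", "false"):
        return f
    if op == "atom":
        return TRUE if f[1] in w else FALSE
    if op == "and":
        return conj(progress(f[1], w), progress(f[2], w))
    if op == "or":
        return disj(progress(f[1], w), progress(f[2], w))
    if op == "not":
        return neg(progress(f[1], w))
    if op == "next":                      # assumed standard rule
        return f[1]
    if op == "always":                    # assumed standard rule
        return conj(progress(f[1], w), f)
    if op == "eventually":                # assumed standard rule
        return disj(progress(f[1], w), f)
    if op == "until":                     # assumed standard rule
        return disj(progress(f[2], w), conj(progress(f[1], w), f))
    raise ValueError(f"unknown operator: {op}")


# always(on(B,C) -> (on(B,C) until on(A,B))), with p -> q written as (not p) or q
p, q = ("atom", "on(B,C)"), ("atom", "on(A,B)")
hold = ("until", p, q)
rule = ("always", ("or", ("not", p), hold))

after_stack = progress(rule, {"on(B,C)"})   # B placed on C: obligation created
after_unstack = progress(after_stack, set())  # B removed before A is on B
print(after_unstack == FALSE)  # True: this search thread can be pruned
```

A progressed formula of FALSE means no extension of the current state sequence can satisfy the control rule, so a forward-chaining search may discard that state.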

TLPlan Example

[Figure: forward-chaining search tree guided by control rules; the rule illustration is labeled "(Any color)"]

– Forward chaining begins…
– One thread is efficiently guided by the rules.
– Another thread is not guided well, since no rules apply. This results in pure forward-chaining search.

TLPlan Review

TLPlan has been around in various implementations since 1995, although improvements have been made as recently as last year.

TLPlan functions initially as a forward-chaining planner, but can use logical rules to guide its search and prune infeasible threads.

TLPlan was the fastest domain-specific planner in the 2002 AIPS competition.

Domain Knowledge

Planning is hard – the most general planners are extremely slow

To increase speed, some planners sacrifice generality by using domain-specific strategies.

TLPlan encodes the strategy into the goal specification, while other planners decouple the goals and the strategies.

Forward Chaining Speedup

Many researchers have focused on ways to speed up domain-independent forward-chaining planners.

– Ex. SAPA by Minh B. Do & Subbarao Kambhampati

Methods focus on estimating plan cost using:
– Relaxed plan-graphs
    Estimated remaining cost to goal
– Cost metrics
    Ex. # actions, plan duration, etc.




Plan Graph

Plan-graph based planners first construct a compact representation of the plan-space (the plan-graph), and then search that space.

Plan-graphs contain all possible plans up to a certain size, excluding incomplete plans with co-occurring binary mutex actions.

Plan-graphs do not exclude all invalid plans, and depending on the domain may yield extremely efficient or inefficient results.

Advantages:
– generative
– much faster than most forward-chaining planners
– plan-graph can be generated in polynomial time and space

Disadvantages:
– plan-graphs are less expressive (resources and time are difficult)
– in certain domains, search of the plan-graph can be very inefficient

Forward Chaining vs. Plan Graph

[Figure: side-by-side panels labeled Forward Chaining and Plan Graph]

Case Study: Graphplan

Note the compact structure in this graph – it’s polynomial in size!

Mutex Relationships

Case Study: LPGP

Idea:
– use Graphplan to identify a complete plan (action structure)
– then use Linear Programming to determine plan consistency and perform scheduling (assign durations to actions)

Advantage:
– Two-phase approach accomplishes temporal planning with the speed of a plan-graph based planner

Disadvantages:
– Cannot optimize over time (only optimizes over makespan)
– Two-phase approach is potentially very inefficient
    no temporal conflicts are used to guide the Graphplan search
    search is not incremental – LP must be started from scratch each time




Macro Decomposition

Operates similarly to a context-free grammar
– planner non-deterministically expands “macro-activities” until all plan actions are primitive.
– rules ensure that the planner only explores the space of complete plans

Planner still must ensure plan consistency.

Advantages
– Fast

Disadvantages
– all achieving strategies must be pre-encoded into macros
– non-optimal: explores restricted plan-space, potentially excluding optimal solutions

Case Study: SHOP2

SHOP2 by Dana Nau, Hector Munoz-Avila, Yue Cao, Amnon Lotem and Steven Mitchell

SHOP2 works similarly to the task-decomposition mechanism in Kirk

SHOP2 problems consist of:
– Operators (with preconditions, add-effects and delete-effects)
– Methods (rules for how to progress the plan)
– Initial conditions and goals

SHOP2 is fairly fast, but all plan happenings must be pre-designed (at some level) by a programmer.

SHOP2 plans do not support concurrency

SHOP2 Example

(defdomain basic-example (
  ; Operator fields: Preconds, Delete-effects, Add-effects
  (:operator (pickup ?a) () () ((have ?a)))
  (:operator (drop ?a) ((have ?a)) ((have ?a)) ())

  ; Method fields: Condition, Strategy (repeated).
  ; Allows one method to decompose into multiple possible subplans,
  ; depending on the current state.
  (:method (swap ?x ?y)
    ((have ?x)) ((drop ?x) (pickup ?y))
    ((have ?y)) ((drop ?y) (pickup ?x)))))

; Problem fields: Initial Condition, Start Strategy
(defproblem problem1 basic-example
  ((have banjo))
  ((swap banjo kiwi)))

SHOP2 In Action


Task: (swap banjo kiwi)

State: (have banjo)

The method (swap ?x ?y) is instantiated with ?x = banjo and ?y = kiwi:

  (:method (swap banjo kiwi)
    ((have banjo)) ((drop banjo) (pickup kiwi))
    ((have kiwi)) ((drop kiwi) (pickup banjo)))

The first condition, (have banjo), holds in the current state, so the task decomposes into (drop banjo) (pickup kiwi).

State: (have kiwi)

DONE
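The banjo/kiwi decomposition can be mimicked in a few lines. This is a simplified sketch of ordered task decomposition, assuming operators and methods are plain Python functions and that the first method branch whose condition holds is taken with no backtracking over alternatives; it is not SHOP2's actual implementation or input language.

```python
def pickup(state, a):
    """Operator: no preconditions, no delete-effects, adds (have a)."""
    return state | {("have", a)}


def drop(state, a):
    """Operator: requires and deletes (have a); None means not applicable."""
    if ("have", a) not in state:
        return None
    return state - {("have", a)}


def swap(state, x, y):
    """Method: pick the strategy whose condition holds in the current state."""
    if ("have", x) in state:
        return [("drop", x), ("pickup", y)]
    if ("have", y) in state:
        return [("drop", y), ("pickup", x)]
    return None


OPERATORS = {"pickup": pickup, "drop": drop}
METHODS = {"swap": swap}


def decompose(state, tasks):
    """Return (plan, final_state), or None if the task list cannot be achieved."""
    if not tasks:
        return [], state
    (name, *args), rest = tasks[0], tasks[1:]
    if name in OPERATORS:                       # primitive task: apply it
        new_state = OPERATORS[name](state, *args)
        if new_state is None:
            return None
        result = decompose(new_state, rest)
        if result is None:
            return None
        plan, final = result
        return [(name, *args)] + plan, final
    subtasks = METHODS[name](state, *args)      # compound task: expand it
    if subtasks is None:
        return None
    return decompose(state, subtasks + rest)


plan, final = decompose(frozenset({("have", "banjo")}),
                        [("swap", "banjo", "kiwi")])
print(plan)           # [('drop', 'banjo'), ('pickup', 'kiwi')]
print(sorted(final))  # [('have', 'kiwi')]
```

Because every primitive step comes from a hand-written method, the search never leaves the space of plans the programmer anticipated, which is both the source of the speed and the limitation noted for SHOP2.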


Case Study: Kirk TPN Planner

[Figure: a Macro-Activity() with temporal bounds [l,u], shown with Decomposition 1 and Decomposition 2]

5 Minute Break


Local Search: WalkSAT

WalkSAT is a randomized algorithm for solving SAT (propositional satisfiability) problems.

Unlike systematic algorithms such as DPLL, it relies on local search and randomness.

WalkSAT

Problem:
– Find a satisfying assignment to a logic formula

    (A || !B) && (B || !C) && (C || !A) && (A || B || C)

WalkSAT:
– Pick a random assignment to the variables
– Until formula satisfied (or up to some max # of iterations),
    Choose an unsatisfied clause and enumerate the ways of adjusting the variables in order to satisfy it
    With probability p
    – Choose the best-utility adjustment
    Else
    – Choose a random adjustment
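The loop above can be sketched directly. In this minimal version a clause is a list of (variable, wanted value) pairs, "best utility" is taken to mean the flip that leaves the most clauses satisfied, and p is the probability of the greedy move, following the slide's convention. The function names, the default parameters, and the utility definition are assumptions for illustration.

```python
import random


def satisfied(clause, assign):
    """A clause holds if at least one literal has its wanted value."""
    return any(assign[var] == wanted for var, wanted in clause)


def walksat(clauses, p=0.5, max_flips=1000, rng=None):
    rng = rng or random.Random()
    variables = sorted({var for clause in clauses for var, _ in clause})
    assign = {var: rng.choice([True, False]) for var in variables}
    for _ in range(max_flips):
        unsat = [c for c in clauses if not satisfied(c, assign)]
        if not unsat:
            return assign
        clause = rng.choice(unsat)
        # Flipping any variable of an unsatisfied clause satisfies that clause.
        candidates = [var for var, _ in clause]

        def utility(var):
            assign[var] = not assign[var]          # try the flip
            count = sum(satisfied(c, assign) for c in clauses)
            assign[var] = not assign[var]          # undo it
            return count

        if rng.random() < p:
            chosen = max(candidates, key=utility)  # best-utility adjustment
        else:
            chosen = rng.choice(candidates)        # random adjustment
        assign[chosen] = not assign[chosen]
    return assign if all(satisfied(c, assign) for c in clauses) else None


# (A || !B) && (B || !C) && (C || !A) && (A || B || C)
FORMULA = [
    [("A", True), ("B", False)],
    [("B", True), ("C", False)],
    [("C", True), ("A", False)],
    [("A", True), ("B", True), ("C", True)],
]
model = walksat(FORMULA, max_flips=10000, rng=random.Random(0))
print(model)
```

For this formula the only satisfying assignment sets A, B, and C all to true, so any model returned must be that one; a return value of None would mean the flip budget ran out, not that the formula is unsatisfiable.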

WalkSAT Example

(A || !B) && (B || !C) && (C || !A) && (A || B || C)

Pick !A, !B, !C
– Unsatisfied clause: (A || B || C)
– Options are to switch A, B, or C

Pick A, !B, !C
– Unsatisfied clause: (C || !A)
– Options are to switch A or C

Pick A, !B, C
– Unsatisfied clause: (B || !C)
– Options are to switch B or C

Pick A, B, C
– Formula Satisfied!

WalkSAT Discussion

WalkSAT has proven to be very fast at solving complicated SAT problems
– WalkSAT can solve some problems that systematic algorithms simply can’t handle

Due to randomness, WalkSAT is incomplete
– WalkSAT may fail to discover a solution

Introduction to LPG

LPG (local search for plan-graphs) – by Alfonso Gerevini and Ivan Serina

Blackbox mapped the planning problem to a CSP and solved it using a SAT solver.

LPG unifies the planning and WalkSAT algorithms to create the WalkPlan search algorithm.

LPG Big Idea

Big Idea:
– Start with a random plan
– While plan is incorrect / inconsistent
    Identify and repair conflict

Basically the same idea as WalkSAT, but applied to a special form of plan-graph

Temporal Action Graphs

Definitions:
– Action-graph: the subset of a plan-graph containing the action layers
– Support: a fact is said to be “supported” if it is achieved by some action in the previous action layer
– Conflict: either
    a mutex between two actions, or
    an action with an unsupported precondition

Linearization of Action Graphs

An Action Graph can be made linear by allowing only one action per action layer.

The layers no longer explicitly represent an ordering of time (temporal concurrency is still possible)

The layer ordering simply presents an action sequence for the purposes of establishing fact support relationships.

Example: Linear Action Graph

[Figure: plan-graph with fact layers containing A, B, and C alternating with action layers containing A0, A1, A2, and No-ops]

A plan-graph consists of alternating fact layers and action layers.

The actions alone constitute an action graph.

LPG operates directly on the action graph structure, inserting and removing actions from various action layers as it repairs incomplete plans.

Example: Temporal Action Graph

Conflicts and Repair

An incomplete plan is manifested as an action graph with conflicts.

Example conflicts with resolution (repair) strategies:

Each conflict description is followed by its conflict resolution strategies:

Permanent mutex between two actions in the same action layer
– Remove one of the actions

Precondition mutex between two actions in the same action layer
– Remove one of the actions
– Add support for one of the mutex preconditions

Unsupported precondition for an action in an action layer
– Add an action to the previous action layer that achieves the unsupported precondition
– Remove the action whose preconditions are not satisfied

LPG Algorithm

1. Generate an initial dummy plan, P, either…
   – Randomly
   – By adding actions to support all facts ignoring mutexes, or
   – Via some front-end plan generator

2. Randomly choose a conflict in the action-graph, C

3. Identify all possible ways of resolving C and evaluate them using the action evaluation function
   – Resolution techniques include: removing one of two mutex actions, adding a supporting action for an unsupported fact, or removing an action that has an unsupported precondition
   – If a conflict resolution has cost 0, the plan is complete
   – Note: The action evaluation function uses Lagrange multipliers to dynamically weight its different factors

4. If a resolution introduces no new conflicts, apply it and go to step (2). Else,
   – with probability p, randomly choose a resolution, apply it and go to step (2)
   – with probability 1-p, choose the lowest cost conflict resolution, apply it and go to step (2)
   – Note: The resolution step includes a mechanism for extending the plan-graph
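The repair loop in steps 1 to 4 can be sketched on a linear action graph, heavily simplified. The assumptions are: the plan is a list with one action per layer, so only unsupported-precondition conflicts arise; the initial dummy plan is empty; the cost of a repair is just the number of conflicts it leaves (no Lagrange multipliers); and "introduces no new conflicts" is approximated by "lowers the conflict count". The domain uses three actions A0, A1, A2 with goals A, B, C. None of the function names come from LPG itself.

```python
import random
from collections import namedtuple

Action = namedtuple("Action", "name preconds add delete")


def conflicts(plan, init, goal):
    """List (layer, fact) pairs for unsupported preconditions; layer len(plan) is the goal."""
    found, state = [], set(init)
    for i, a in enumerate(plan):
        found += [(i, f) for f in sorted(a.preconds - state)]
        state = (state - a.delete) | a.add   # facts persist, like no-ops
    found += [(len(plan), f) for f in sorted(goal - state)]
    return found


def repairs(plan, conflict, actions):
    """Candidate plans: add a supporter just before the layer, or drop the action."""
    i, fact = conflict
    options = [plan[:i] + [a] + plan[i:] for a in actions if fact in a.add]
    if i < len(plan):
        options.append(plan[:i] + plan[i + 1:])
    return options


def walkplan(init, goal, actions, p=0.1, max_steps=1000, rng=None):
    rng = rng or random.Random()
    plan = []                                  # step 1: trivial dummy plan
    for _ in range(max_steps):
        found = conflicts(plan, init, goal)
        if not found:
            return plan
        options = repairs(plan, rng.choice(found), actions)   # steps 2 and 3
        if not options:
            return None

        def cost(candidate):
            return len(conflicts(candidate, init, goal))

        improving = [o for o in options if cost(o) < len(found)]
        if improving:                          # step 4 (approximation)
            plan = min(improving, key=cost)
        elif rng.random() < p:
            plan = rng.choice(options)
        else:
            plan = min(options, key=cost)
    return None


fs = frozenset
ACTIONS = [
    Action("A0", fs(), fs({"A"}), fs()),
    Action("A1", fs({"A"}), fs({"A", "B"}), fs()),
    Action("A2", fs({"A", "B"}), fs({"C"}), fs()),
]
init, goal = fs(), fs({"A", "B", "C"})
plan = walkplan(init, goal, ACTIONS, rng=random.Random(1))
print([a.name for a in plan])
```

Depending on which conflict is picked first, the result may be the three-step plan A0, A1, A2 or a longer valid plan with repeated actions, since nothing in this sketch removes actions that have become redundant.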

[Figure: LPG’s WalkPlan planning algorithm: Generate Initial Plan → Choose Conflict → Resolve & Evaluate → Resolution Selection]

LPG Example


Initial Conditions: ( nil )

Goals: ( A, B, C )

Actions:

A0: preconds ( nil ) effects ( A )

A1: preconds ( A ) effects ( A, B )

A2: preconds ( A, B ) effects ( C )

[Figure: step-by-step repair of the action graph, using actions A0, A1, A2 and No-ops. Stages shown: initial dummy plan, identify conflict, resolve conflict, plan complete.]

Repair steps annotated in the figure:
– Unsupported precondition (resolved by removing the conflicting action)
– Unsupported precondition (resolved by adding an achieving action at the previous action layer)
– Permanently mutex actions in the same action layer (resolved by removing one of the two actions)
– Unsupported precondition (resolved by adding an achieving action at the previous action layer)

Note: No-ops are propagated during conflict resolution

LPG Analysis

Advantages:
– LPG is fast: four orders of magnitude faster than the leading optimal planners
– LPG is domain-independent
– LPG can easily handle resources and durative actions

Disadvantages:
– LPG is randomized, so plans are not usually optimal and often contain extraneous actions
    LPG includes an option to continue searching for multiple solutions, in the hope of finding better plans

While maintaining expressivity, LPG sacrifices optimality for speed.

AIPS 2002 Results (subset)

Planner | Problems Solved | Problems Attempted | Success Ratio | Capabilities
SHOP2, 2nd place (hand-coded) | 899 | 904 | 99% | (Strips, Numeric, HardNumeric, SimpleTime, Time, Complex)
TLPlan, 1st place (hand-coded) | | | | … HardNumeric, SimpleTime, Time, Complex)
LPG, 1st place (fully-automated) | | | | … HardNumeric, SimpleTime, Time)

Summary

Planning is hard!
– We want planners that
    are fast
    are domain-independent
    are optimal
    handle durative actions / resources / uncertainty

Want a speedup?
– Sacrificing expressivity helps
– Sacrificing optimality helps more
– Sacrificing generality helps the most

LPG is today’s best planner that is domain-independent, expressive, and fast. To achieve speed, it sacrifices optimality and uses local search.

Planning References

Planning in general:
– Russell and Norvig, “Artificial Intelligence: A Modern Approach”, section IV, Prentice Hall; 2nd edition (December 20, 2002)

AIPS International Planning Competition, 2002:
– http://www.dur.ac.uk/d.p.long/competition.html

Graphplan:
– A. Blum and M. Furst, “Fast Planning Through Planning Graph Analysis”, Artificial Intelligence, 90:281-300 (1997).
– www.cs.cmu.edu/~avrim/graphplan.html

LPG:
– A. Gerevini and I. Serina, “Planning through Stochastic Local Search and Temporal Action Graphs”, technical report from Universita degli Studi di Brescia, November, 2002.
– prometeo.ing.unibs.it/lpg/
