Strong-Cyclic Planning When Fairness is Not a Valid Assumption

Alberto Camacho and Sheila A. McIlraith

Department of Computer Science, University of Toronto, Canada

{acamacho,sheila}@cs.toronto.edu

KnowProS, July 10, 2016

Take Home Message

Motivation

Soundness of standard strong-cyclic solutions to Fully Observable Non-Deterministic (FOND) planning problems is guaranteed only when the fairness assumption holds.

Approach

We introduce L-fairness, a concept that generalizes the classical fairness assumption.

Contribution

The FOND+ class of planning problems, where soundness of solutions is predicated on the L-fairness assumption.

We identify a class of FOND+ solutions that are also solutions to 1-primary normative fault-tolerant planning problems.

We present different algorithms to solve FOND+ problems.

Camacho and McIlraith: Strong-Cyclic Planning When Fairness is Not a Valid Assumption 2 / 21

Non-Deterministic Planning

Non-Deterministic Planning Domain D = 〈F, S, A, T〉

F: finite set of propositions

S: finite set of states, S ⊆ 2^F

A: set of actions a = 〈Pre_a, Eff_a〉

Preconditions Pre_a

Non-deterministic effects Eff_a = 〈Eff_a^1, ..., Eff_a^n〉

T : S × A → 2^S: transition function

If s′ ∈ T(s, a), then s′ = Prog(s, a, Eff_a^i) for some Eff_a^i ∈ Eff_a

We write a state transition as (s, a, s′)
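The domain definition above can be sketched in a few lines of Python. This is only an illustration under assumptions the slides do not state: each effect is encoded as a pair of add and delete sets, and the names `Action`, `prog`, and `transitions` are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set, Tuple

State = FrozenSet[str]
# Assumed encoding of one effect Eff_a^i: (propositions added, propositions deleted).
Effect = Tuple[FrozenSet[str], FrozenSet[str]]


@dataclass(frozen=True)
class Action:
    name: str
    pre: FrozenSet[str]          # Pre_a
    effects: Tuple[Effect, ...]  # Eff_a = <Eff_a^1, ..., Eff_a^n>


def prog(s: State, eff: Effect) -> State:
    """Progress state s through a single effect: delete first, then add."""
    add, delete = eff
    return frozenset((s - delete) | add)


def transitions(s: State, a: Action) -> Set[State]:
    """T(s, a): one successor per non-deterministic effect, if a is applicable in s."""
    if not a.pre <= s:
        return set()
    return {prog(s, eff) for eff in a.effects}


# pick-up-block(A, B), following the Blocksworld action used in these slides.
pick_up = Action(
    name="pick-up-block(A,B)",
    pre=frozenset({"handempty", "on-block(A,B)"}),
    effects=(
        (frozenset({"holding(A)"}), frozenset({"handempty"})),
        (frozenset({"on-table(A)"}), frozenset({"on-block(A,B)"})),
    ),
)
s0 = frozenset({"on-block(A,B)", "on-table(B)", "handempty"})
successors = transitions(s0, pick_up)  # two possible successor states
```

Each element of `successors` corresponds to one transition (s0, a, s′).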

In our paper, we address two classes of non-deterministic planning problems:

Fully Observable Non-Deterministic (FOND) Planning

Fault-Tolerant Planning


FOND Planning

FOND Planning Problem P = 〈D, s0, SG 〉

D = 〈F , S ,A,T 〉 is a non-deterministic planning domain

s0 ∈ S initial state

SG ⊆ S goal states

Solutions are policies, i.e., mappings from states to actions.

weak solutions

strong solutions

strong-cyclic solutions


Solutions to a FOND Problem (cf. [Cimatti et al., 2003])

Weak Solutions

Weak solutions are plans that may achieve the goal: some execution reaches it, but there is no guarantee.

Strong Solutions

Strong solutions guarantee goal achievement in all executions.

Strong-Cyclic Solutions

Strong-cyclic solutions guarantee goal achievement, provided that all executions are fair.

An execution σ is unfair when a state-action pair (s, a) appears infinitely often in σ, but the transition (s, a, s′) occurs only a finite number of times for some outcome s′ ∈ T(s, a).

Executions that are not unfair are said to be fair.

cf. [Cimatti et al., 2003]
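The three solution classes can be told apart on a small explicit transition graph. The sketch below is not taken from the slides; it assumes a policy given as a dictionary from states to action names and a transition function given as a dictionary from (state, action) to sets of successor states, and all function names are hypothetical.

```python
def successors(s, policy, T, goals):
    """Outcomes of the policy's action in s; goal states are treated as terminal."""
    if s in goals or s not in policy:
        return set()
    return T.get((s, policy[s]), set())


def reachable(s0, policy, T, goals):
    """All states that some execution of the policy from s0 can visit."""
    seen, stack = {s0}, [s0]
    while stack:
        s = stack.pop()
        for t in successors(s, policy, T, goals):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen


def is_weak(s0, policy, T, goals):
    """Weak: at least one execution reaches a goal state."""
    return any(s in goals for s in reachable(s0, policy, T, goals))


def is_strong_cyclic(s0, policy, T, goals):
    """Strong-cyclic: every reachable state can still reach a goal state."""
    return all(is_weak(s, policy, T, goals) for s in reachable(s0, policy, T, goals))


def is_strong(s0, policy, T, goals):
    """Strong: strong-cyclic and no cycle among the reachable states."""
    if not is_strong_cyclic(s0, policy, T, goals):
        return False
    visiting, done = set(), set()

    def acyclic(s):
        if s in done:
            return True
        if s in visiting:
            return False  # s is on the current DFS path: a cycle exists
        visiting.add(s)
        ok = all(acyclic(t) for t in successors(s, policy, T, goals))
        visiting.discard(s)
        done.add(s)
        return ok

    return acyclic(s0)


# One action that either loops back to "i" or reaches the goal "g".
T_loop = {("i", "try"): {"i", "g"}}
loop_is_strong_cyclic = is_strong_cyclic("i", {"i": "try"}, T_loop, {"g"})
loop_is_strong = is_strong("i", {"i": "try"}, T_loop, {"g"})
```

In the looping example the policy is strong-cyclic but not strong: the goal is reached only if the execution does not repeat the self-loop forever, which is exactly what fairness rules out.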


Fault-Tolerant Planning

Fault-Tolerant Planning Problem P = 〈D, s0, SG , F , κ〉


s0 ∈ S initial state

SG ⊆ S goal states

F is an exception model

κ is an integer parameter

F : ⋃_{a∈A} Eff_a → ℕ is an exception model:

F(e) > 0 when the effect e is faulty

F(e) = 0 when the effect e is normative

If |{e | F(e) = 0, e ∈ Eff_a}| = 1 for all a ∈ A, then the problem is 1-primary

cf. [Jensen et al., 2004; Domshlak, 2013]


Solutions to Fault-Tolerant Planning Problems

κ-admissible Executions

A state-effect execution (s0, e0, ..., si, ei, ...) is κ-admissible when Σ_i F(e_i) ≤ κ.

Solutions are κ-Plans

A policy is a κ-plan when all κ-admissible executions are finite and reach the goal.
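As an illustration of the exception model and of κ-admissibility, the sketch below checks the 1-primary condition and sums the faults along a finite execution. The effect names and the fault weight of 1 are assumptions made for the example; the slides only require F(e) > 0 for faulty effects.

```python
from typing import Dict, List, Sequence


def is_one_primary(effects_of: Dict[str, List[str]], F: Dict[str, int]) -> bool:
    """True when every action has exactly one normative effect (F(e) = 0)."""
    return all(
        sum(1 for e in effects if F[e] == 0) == 1
        for effects in effects_of.values()
    )


def is_kappa_admissible(effects: Sequence[str], F: Dict[str, int], kappa: int) -> bool:
    """True when the faults accumulated along a finite execution do not exceed kappa."""
    return sum(F[e] for e in effects) <= kappa


# Hypothetical effect names; the weights are an assumption of this example.
effects_of = {
    "pick-up-block": ["pick/normal", "pick/drop"],
    "put-block-on-table": ["put-table/normal"],
}
F = {"pick/normal": 0, "pick/drop": 1, "put-table/normal": 0}

one_primary = is_one_primary(effects_of, F)
two_faults = ["pick/drop", "pick/normal", "pick/drop"]
admissible_k1 = is_kappa_admissible(two_faults, F, kappa=1)
admissible_k2 = is_kappa_admissible(two_faults, F, kappa=2)
```

With two faulty effects in the execution, it is 2-admissible but not 1-admissible, so a 1-plan gives no guarantee for it.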


Motivation

Blocksworld domain:

Initial state: {on-block(A,B), on-table(B), handempty}

Actions:

pick-up-block(?b,?from):
    Pre = {handempty, on-block(?b,?from)}
    Eff^1 = {holding(?b) ∧ ¬handempty}
    Eff^2 = {on-table(?b) ∧ ¬on-block(?b,?from)}

put-block-on-table(?b):
    Pre = {holding(?b)}
    Eff = {on-table(?b) ∧ ¬holding(?b)}

put-on-block(?b1,?b2):
    Pre = {handempty ∧ clear(?b2)}
    Eff = {on-block(?b1,?b2) ∧ ¬handempty}

Goal condition: {on-table(A)}

[Figure: two Blocksworld executions with blocks A and B. In the first, goal achievement is predicated on fairness. In the second, goal achievement is not predicated on fairness.]


Desired Solutions

Guarantees vs. no guarantees of occurrence:

solutions must not rely on an effect whose occurrence is not guaranteed

Normative vs. faulty behaviour:

solutions need to achieve the goal when the system manifests its normative behaviour


Outline

1 Background in Non-Deterministic Planning

2 The Model: FOND+

3 Algorithms to solve FOND+

4 Experimental Results

5 Conclusions


L-fair Executions


For a labeling function L : S × A × S → {F, U}, we say that an execution from state s0 is L-unfair when there exists a state-action pair (s, a) such that

(s, a) appears infinitely often, and

there exists a transition (s, a, s′) such that L(s, a, s′) = F and (s, a, s′) occurs only a finite number of times.

Executions that are not L-unfair are said to be L-fair.

Note that fairness, as defined by [Cimatti et al., 2003], is the particular case of L-fairness in which L assigns F to all transitions.
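The definition can be checked mechanically for infinite executions that end in a repeating cycle. The sketch below assumes this ultimately periodic shape (a finite prefix followed by a cycle repeated forever), which is a restriction made for the example and not part of the definition; `is_L_fair` and the dictionary encodings of T and L are hypothetical.

```python
def is_L_fair(cycle, T, L):
    """L-fairness of an execution of the form prefix . cycle . cycle . ...

    cycle: list of transitions (s, a, s2) repeated forever.
    T:     dict mapping (s, a) to the set of possible successors.
    L:     dict mapping (s, a, s2) to "F" or "U".
    Only the cycle matters: exactly its transitions occur infinitely often.
    """
    infinitely_often = set(cycle)
    for (s, a, _) in infinitely_often:
        for s2 in T[(s, a)]:
            # An F-labeled outcome of a pair seen infinitely often must
            # itself occur infinitely often.
            if L[(s, a, s2)] == "F" and (s, a, s2) not in infinitely_often:
                return False
    return True


# Action "try" in state "s" either stays in "s" or reaches "g".
T = {("s", "try"): {"s", "g"}}
stuck_forever = [("s", "try", "s")]

all_fair = {("s", "try", "s"): "F", ("s", "try", "g"): "F"}
goal_unfair = {("s", "try", "s"): "F", ("s", "try", "g"): "U"}

fair_under_classical = is_L_fair(stuck_forever, T, all_fair)
fair_under_unfair_goal = is_L_fair(stuck_forever, T, goal_unfair)
```

When every transition is labeled F, looping forever is unfair, so it can be disregarded. When the goal-reaching outcome is labeled U, the same loop is L-fair, and a policy relying on that outcome offers no guarantee.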


Planning With Unfair Non-Determinism

FOND+ Planning Problem P = 〈D, s0, SG , L〉


s0 ∈ S is the initial state

SG ⊆ S is a set of goal states

L : S ×A× S → {F, U} is a labeling function

Solutions

Solutions to a FOND+ problem P = 〈D, s0, SG, L〉 are policies that guarantee goal achievement, predicated on the assumption that all executions of D in s0 are L-fair.


Classes of FOND+ Solutions

Strictly Fair

A solution π to a FOND+ problem is strictly fair when all transitions t produced by L-fair plan executions have L(t) = F.

Strictly Unfair

A solution π to a FOND+ problem is strictly unfair when all transitions t produced by L-fair plan executions have L(t) = U.

Mixed

A solution π to a FOND+ problem is mixed when it is neither strictly fair nor strictly unfair.
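A rough way to classify a given solution is to collect the labels of the transitions it can produce. The sketch below relies on an assumption that is not stated on the slides: every transition reachable under π occurs in some L-fair execution, so inspecting all reachable transitions is enough. It also presupposes that π is already known to be a solution, and the name `classify` is hypothetical.

```python
def classify(s0, policy, T, L, goals):
    """Return "strictly fair", "strictly unfair", or "mixed" for a solution.

    policy: dict state -> action, T: dict (s, a) -> set of successors,
    L: dict (s, a, s2) -> "F" or "U". Goal states are terminal.
    """
    labels = set()
    seen, stack = {s0}, [s0]
    while stack:
        s = stack.pop()
        if s in goals or s not in policy:
            continue
        a = policy[s]
        for s2 in T[(s, a)]:
            labels.add(L[(s, a, s2)])
            if s2 not in seen:
                seen.add(s2)
                stack.append(s2)
    # With no transitions at all, both definitions hold vacuously;
    # this sketch reports "strictly fair" in that case.
    if labels <= {"F"}:
        return "strictly fair"
    if labels <= {"U"}:
        return "strictly unfair"
    return "mixed"


T = {("s", "try"): {"s", "g"}}
policy = {"s": "try"}
kind_all_f = classify("s", policy, T,
                      {("s", "try", "s"): "F", ("s", "try", "g"): "F"}, {"g"})
kind_mixed = classify("s", policy, T,
                      {("s", "try", "s"): "U", ("s", "try", "g"): "F"}, {"g"})
```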


FOND+ and Fault-Tolerant Planning

Normative Solutions

A FOND+ solution π is normative when, in each state s reachable by π:

there exists a plan execution from s that reaches the goal and such that all its transitions t have L(t) = F, and

exactly one outcome of applying π(s) in s produces a transition t with L(t) = F.

Normative Solutions are Fault-Tolerant

Normative solutions to a FOND+ problem P = 〈D, s0, SG, L〉 are also 1-primary normative solutions to fault-tolerant planning problems P′ = 〈D, s0, SG, F, κ〉 s.t. F(e) = 0 (resp. F(e) > 0) when e produces a transition (s, a, s′) such that L(s, a, s′) = F (resp. L(s, a, s′) = U).

Normative FOND+ solutions are robust to any number of faults occurring during execution, unlike standard fault-tolerant solutions.




Algorithm to Find Strictly Fair Solutions

For a FOND+ problem P, the algorithm consists of two steps:

1 P is relaxed into a FOND problem P′ = 〈D′, s0, SG〉. D′ is like D, but the actions applicable in a given state s are restricted to those actions a that yield only transitions (s, a, s′) labeled with L(s, a, s′) = F.

2 A sound and complete strong-cyclic FOND planner (e.g., PRP [Muise et al., 2012]) is used to search for a strong-cyclic solution to P′, which is returned as a strictly fair solution to P.

Theorem

The algorithm is sound and complete.
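Step 1 of the algorithm can be illustrated on an explicit transition table. This is a sketch only: it assumes T and L are stored as dictionaries, the name `relax` is hypothetical, and step 2 (the call to a FOND planner) is deliberately left out because no planner interface is described on the slides.

```python
def relax(T, L, keep="F"):
    """Build the transitions of D': keep (s, a) only if all its outcomes are labeled `keep`.

    T: dict (s, a) -> set of successors, L: dict (s, a, s2) -> "F" or "U".
    """
    return {
        (s, a): set(outcomes)
        for (s, a), outcomes in T.items()
        if all(L[(s, a, s2)] == keep for s2 in outcomes)
    }


# "safe" has only F-labeled outcomes; "risky" has one U-labeled outcome.
T = {
    ("s", "safe"): {"s", "g"},
    ("s", "risky"): {"g", "d"},
}
L = {
    ("s", "safe", "s"): "F",
    ("s", "safe", "g"): "F",
    ("s", "risky", "g"): "F",
    ("s", "risky", "d"): "U",
}

T_fair = relax(T, L)              # input for a strong-cyclic planner
T_unfair = relax(T, L, keep="U")  # analogous relaxation with label U
```

Only the action `safe` survives in `T_fair`, so any strong-cyclic policy found on the relaxed problem uses F-labeled transitions exclusively.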


Algorithm to Find Strictly Unfair Solutions

For a FOND+ problem P, the algorithm consists of two steps:

1 P is relaxed into a FOND problem P′ = 〈D′, s0, SG〉. D′ is like D, but the actions applicable in a given state s are restricted to those actions a that yield only transitions (s, a, s′) labeled with L(s, a, s′) = U.

2 A sound and complete strong FOND planner (e.g., [Jaramillo et al., 2014]) is used to search for a strong solution to P′, which is returned as a strictly unfair solution to P.

Theorem



Algorithm to Find Normative Solutions

Three basic steps (also in PRP):

Step 1: Search for a plan in the all-outcomes determinization of the problem (i.e., ignore non-determinism).

[Figure: plan Init → S1 → S2 → Goal using actions a1, a2, a3; the other outcomes of the actions are marked "?".]





Step 2: Select a state that results from non-determinism, and search for a plan to the Goal or to a previously resolved state.

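[Figure: the same plan Init → S1 → S2 → Goal using actions a1, a2, a3, with one unresolved outcome expanded into a new state S3; the remaining unresolved outcomes are marked "?".]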






Step 3: Repeat Step 2 until convergence.


The difference with PRP is in the open list of states.

In PRP: first-in, last-out

In our algorithm: states produced by normative effects are explored first.
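One way to realise such an open list is a priority queue keyed on whether a state was produced by a normative effect. The sketch below is an assumption-laden illustration, not the authors' implementation: the class name is hypothetical, and keeping last-in, first-out order among states of the same kind is a choice made here to stay close to the PRP behaviour described above.

```python
import heapq
import itertools


class NormativeFirstOpenList:
    """Open list that pops states produced by normative effects first."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def push(self, state, normative):
        # Priority 0 for states produced by normative effects, 1 for faulty ones.
        # The negated insertion counter makes ties last-in, first-out
        # (an assumption of this sketch) and keeps states from being compared.
        priority = 0 if normative else 1
        heapq.heappush(self._heap, (priority, -next(self._counter), state))

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __bool__(self):
        return bool(self._heap)


open_list = NormativeFirstOpenList()
open_list.push("faulty-1", normative=False)
open_list.push("normative-1", normative=True)
open_list.push("normative-2", normative=True)
open_list.push("faulty-2", normative=False)

order = []
while open_list:
    order.append(open_list.pop())
```

All states produced by normative effects are popped before any state produced by a faulty effect, regardless of insertion order.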

Theorem





Objectives of the Experiments

Two Main Objectives:

Test the efficiency of one of our algorithms

Evaluate characteristics (planner run time and policy size) of normative solutions

Procedure:

Compute Normative solutions to FOND+ problems

Compute Strong-Cyclic solutions to FOND problems, using the PRP planner [Muise et al., 2012]


Blocksworld Problems

Blocksworld problems from [Muise et al., 2012], with actions:

pick-up-block(?b,?from):
    Pre = {handempty, on-block(?b,?from)}
    Eff^1 = {holding(?b) ∧ ¬handempty}
    Eff^2 = {on-table(?b) ∧ ¬on-block(?b,?from)}

put-block-on-table(?b):
    Pre = {holding(?b)}
    Eff = {on-table(?b) ∧ ¬holding(?b)}

put-on-block(?b1,?b2):
    Pre = {handempty ∧ clear(?b2)}
    Eff^1 = {on-block(?b1,?b2) ∧ ¬handempty}
    Eff^2 = {on-table(?b1) ∧ ¬handempty}

In FOND+ problems we consider:

Eff^1 is a normative effect

Eff^2 is a faulty effect


Results

           Strong-Cyclic         Normative
problem    run-time    size      run-time    size
p2         0           3         0           3
p3         0.002       5         0.016       5
p4         0.020       11        0.048       11
p5         0.070       27        0.178       27
p6         0.110       39        0.296       39
p7         0.114       32        0.270       32
p8         0.150       26        0.356       26
p9         0.278       46        0.664       46
p10        0.336       49        0.782       49
p11        0.522       120       1.936       97
p12        0.626       97        1.840       119.5
p13        0.682       57        1.810       57
p14        3.794       1117      37.10       1123
p15        1.500       278       7.814       278




Summary and Future Work

Strong-cyclic planning does not guarantee goal achievement in problems where the fairness assumption is not valid

We introduced L-fairness and the FOND+ model

We identified a connection between FOND+ and 1-primary normative fault-tolerant planning

We introduced algorithms to search for FOND+ solutions

Future Work:

Further investigate and formalise connections between FOND+ and fault-tolerant planning

More extensive experiments


Questions?

code, benchmarks, and slides available soon:

http://www.cs.toronto.edu/~acamacho



References I

Cimatti, A., Pistore, M., Roveri, M., and Traverso, P. (2003).

Weak, strong, and strong cyclic planning via symbolic model checking. Artificial Intelligence, 147:35–84.

Domshlak, C. (2013).

Fault tolerant planning: Complexity and compilation.

Hertle, A., Dornhege, C., Keller, T., Mattmüller, R., Ortlieb, M., and Nebel, B. (2014).

An Experimental Comparison of Classical, FOND and Probabilistic Planning. In Proc. of 37th International Conference on Artificial Intelligence (KI 2014), Prague.

Jaramillo, A. C., Fu, J., Ng, V., Bastani, F. B., and Yen, I.-L. (2014).

Fast strong planning for FOND problems with multi-root directed acyclic graphs. International Journal on Artificial Intelligence Tools, 23(06):1460028.

Jensen, R. M., Veloso, M. M., and Bryant, R. E. (2004).

Fault tolerant planning: Toward probabilistic uncertainty models in symbolic non-deterministic planning. Pages 335–344.

Little, I. and Thiebaux, S. (2007).

Probabilistic planning vs. replanning. ICAPS Workshop on IPC: Past, Present and Future.

Muise, C., McIlraith, S. A., and Beck, J. C. (2012).

Improved Non-deterministic Planning by Exploiting State Relevance. In ICAPS, pages 172–180.

