[IEEE 2009 21st IEEE International Conference on Tools with Artificial Intelligence (ICTAI) -...

Coalitional Planning in Game-like Domains via ATL Model Checking

Jun Wu, Chongjun Wang+, Lei Zhang and Junyuan Xie

National Key Laboratory for Novel Software Technology (Nanjing University)

Department of Computer Science and Technology, Nanjing University

+Corresponding author: 210093, 22 Hankou Road, Nanjing, China

{wujun, chjwang, jyxie}@iip.nju.edu.cn

Abstract

Based on the planning via model checking paradigm, we address the problem of coalitional planning in this paper. Informally, coalitional planning is the problem of planning for a subset of agents in a multi-agent system to force the whole multi-agent system to satisfy some goals. We use the language of ATL as the goal language and the semantic structure of ATL, i.e., the concurrent game structure, to formalize the planning domain. We separate the concepts of goal and planning object and use execution structures to interpret the goals. We then define an algorithm for coalitional planning and formally prove its correctness. In contrast to previous work, in coalitional planning all ATL formulas can be considered as goals, so the expressive power of ATL is fully exploited.

1. Introduction

Planning via model checking as a paradigm has been investigated by many researchers in the past years. The key idea underlying this method is that planning problems can be solved model-theoretically [9]: goals are represented as formulas of a certain temporal logic; a planning domain is encoded directly (or indirectly) as a semantic model; planning algorithms are developed based on verifying the goals against the states of the model; and the standard techniques for the representation and traversal of a model can be inherited from model checking techniques and tools. In fact, this approach is attractive as it is well-founded, general and practical [9].

Based on this paradigm, we address the problem of coalitional planning in this paper. Informally, coalitional planning is the problem of planning for a subset of agents in a multi-agent system (i.e., the coalition) to force the whole multi-agent system to satisfy some requirements (i.e., goals). Consider the following examples:

• In a multi-process system containing three processes p1, p2, and p3, with several network printers that the system can use, we may require a plan for C = {p1, p2} to force the system to print a document on one of the printers while leaving process p3 the ability to choose which printer the system will use;

• For a car equipped with an automated driving system, we may require a plan for the automated driving system to drive the car to some destination while enabling the driver to take control of the car when some danger occurs, or to plan to refuel the car.

The key characteristic of coalitional planning is that the planning objects¹ are open systems. An open system is a system that interacts with its environment and whose behavior depends on the states of the system as well as the behavior of the environment² [4]. Thus, when we plan for an open system we may also have to consider the properties (or requirements) of the environment, as the above examples show. But most of the previous work, for example work based on Computation Tree Logic (CTL) [8] such as [14, 7] and work based on Propositional Linear Temporal Logic (LTL) (or First Order Linear Temporal Logic (FOLTL)) [8] such as [12, 5, 13], was for closed systems, where the planning domains were formalized as Kripke structures and there was no concept of environment. As a consequence, these approaches are generally inadequate to model the coalitional planning problem discussed in this paper.

Alternating-time Temporal Logic (ATL), proposed by Alur, Henzinger, and Kupferman [3, 4], is a well-known and well-studied logic that supports reasoning about the abilities of agents and coalitions of agents in open, game-like multi-agent systems (or multi-player games) [15, 10]. It is a non-normal, multi-modal extension of Computation Tree Logic (CTL). Thus, it is natural to consider extending the previous work on closed systems to open systems based on the formalism of ATL (or ATL-like cooperation logics). The work of van der Hoek et al. [16] and Jamroga [11] pioneered this idea. However, in their frameworks only a small fragment of the language can be seen as goals, and thus the expressive power of ATL has not been fully exploited. It is easy to see that the goals of coalitional planning cannot be represented by the current method.

¹ A planning object is the object that requires a plan, i.e., the coalition.

² We model an open system and its corresponding environment as a multi-agent system. Given a multi-agent system S, the environment of an open system C ⊆ S is the set of agents E = S − C. That is, a multi-agent system can be partitioned into exactly two disjoint parts: the open system and the environment of the open system. Intuitively, in the above examples the process set C = {p1, p2} and the automated driving system are planning objects and are open systems, while the process p3 and the driver are the respective environments.

1082-3409/09 $26.00 © 2009 IEEE

DOI 10.1109/ICTAI.2009.118

Motivated by [7, 14], in which the concept of execution structures is defined to capture the non-determinism of actions and plans, we use execution structures instead of directly applying the planning domains (as in [11]) to interpret the ATL formulas. Intuitively, an execution structure is a representation of all the possible execution paths of a plan with respect to a planning domain, and a goal can be simply understood as a specification of requirements on the structural properties of the execution structure. Hence, the concept of goal is greatly enriched and extended to contain all the formulas of ATL. Moreover, under this framework, the concept of plan can also be extended to take into account both the states of the system and the internal states of the agents, so that the agents can perform different actions in the same state of the planning domain. These improvements make the problem of coalitional planning solvable, and based on them we present and evaluate an algorithm for coalitional planning in this paper. To the best of our knowledge, coalitional planning is a novel problem that has not been fully addressed by previous studies.

The remainder of this paper is organized as follows. We begin by introducing the basic concepts of ATL and discussing its underlying connections to the planning problem. We then discuss some related work and use it to explain our motivations. Next, we formalize the concepts of coalitional planning such as goal, plan and execution structure. Then, an algorithm based on ATL model checking is presented to solve the coalitional planning problem, and we formally prove its correctness. Finally, we present some conclusions.

2. Background: ATL

2.1. Syntax and Semantics

We briefly summarize the technical framework of the logic ATL and then discuss its underlying relationship to the planning problem. The definitions are mainly based on [4] and [10].

Definition 2.1 (concurrent game structure). A concurrent game structure is a tuple S = 〈k, Q, Π, π, d, δ〉 with the following components:

• A natural number k ≥ 1 of agents (players). We identify the players with the numbers 1, ..., k and denote by Σ the set {1, ..., k} of players.

• A finite set Q of states.

• A finite set Π of atomic propositions.

• For each state q ∈ Q, a set π(q) ⊆ Π of atomic propositions true at q. The function π is called the labeling function.

• For each player a ∈ {1, ..., k} and each state q ∈ Q, a non-empty set d_a(q) ⊆ Σ_a of actions available to player a at state q. A move vector (or joint action) at q is a tuple 〈j_1, ..., j_k〉 such that j_a ∈ d_a(q) for each player a. We write D(q) for the set d_1(q) × ... × d_k(q) of move vectors. The function D is called the move function.

• For each state q ∈ Q and each move vector 〈j_1, ..., j_k〉 ∈ D(q), a state δ(q, j_1, ..., j_k) ∈ Q that results from state q if every player a ∈ {1, ..., k} chooses move j_a. The function δ is called the transition function.

Note that, in a concurrent game structure, a state transition results from the simultaneous choices made by all the agents (some of them constitute the object we plan for and the others represent the environment). For two states q and q′, we say that q′ is a successor of q if there is a move vector 〈j_1, ..., j_k〉 ∈ D(q) such that q′ = δ(q, j_1, ..., j_k). Thus, q′ is a successor of q iff whenever the game is in state q, the agents can choose moves so that q′ is the next state.
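As an illustration of Definition 2.1 and the successor relation, the following minimal Python sketch stores a concurrent game structure in dictionaries. The class name `CGS`, its field names, and the two-state toy instance are assumptions made for this example only; they do not come from the paper.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, List, Set, Tuple

State = str
Move = Tuple[str, ...]  # one action per agent, in the order 1..k


@dataclass
class CGS:
    """Hypothetical encoding of a concurrent game structure <k, Q, Pi, pi, d, delta>."""
    k: int
    states: Set[State]
    props: Set[str]
    label: Dict[State, Set[str]]               # pi(q)
    moves: Dict[Tuple[int, State], List[str]]  # d_a(q), agents numbered 1..k
    delta: Dict[Tuple[State, Move], State]     # transition function

    def move_vectors(self, q: State) -> List[Move]:
        """D(q) = d_1(q) x ... x d_k(q)."""
        return list(product(*(self.moves[(a, q)] for a in range(1, self.k + 1))))

    def successors(self, q: State) -> Set[State]:
        """All q' such that q' = delta(q, m) for some move vector m in D(q)."""
        return {self.delta[(q, m)] for m in self.move_vectors(q)}


# Toy instance (illustrative only): two agents, two states.
toy = CGS(
    k=2,
    states={"s0", "s1"},
    props={"goal"},
    label={"s0": set(), "s1": {"goal"}},
    moves={(1, "s0"): ["a", "b"], (2, "s0"): ["x", "y"],
           (1, "s1"): ["a"], (2, "s1"): ["x"]},
    delta={("s0", ("a", "x")): "s1", ("s0", ("a", "y")): "s0",
           ("s0", ("b", "x")): "s0", ("s0", ("b", "y")): "s0",
           ("s1", ("a", "x")): "s1"},
)
print(toy.successors("s0"))
```

Because δ is a total function on D(q), the dictionary must contain one entry for every move vector of every state; the sketch does not check this.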

Definition 2.2 (computation). A computation of S is an infinite sequence λ = q_0, q_1, q_2, ... of states such that for all positions i ≥ 0, the state q_{i+1} is a successor of the state q_i.

We refer to a computation starting at state q as a q-computation. For a computation λ and a position i ≥ 0, we use λ[i], λ[0, i], and λ[i, ∞] to denote, respectively, the i-th state of λ, the finite prefix q_0, q_1, ..., q_i of λ, and the infinite suffix q_i, q_{i+1}, ... of λ.

Definition 2.3 (strategy). A strategy for player a ∈ Σ is a function f_a that maps every nonempty finite state sequence λ ∈ Q⁺ to an action such that if the last state of λ is q, then f_a(λ) ∈ d_a(q).

Given a state q ∈ Q, a set C ⊆ {1, ..., k} of players, and a set F_C = {f_a | a ∈ C} of strategies, one for each player in C, we define the outcomes of F_C from q to be the set out(q, F_C) of q-computations that the players in C enforce when they follow the strategies in F_C; that is, a computation λ = q_0, q_1, q_2, ... is in out(q, F_C) if q_0 = q and for all positions i ≥ 0, there is a move vector 〈j_1, ..., j_k〉 ∈ D(q_i) such that (1) j_a = f_a(λ[0, i]) for all players a ∈ C, and (2) δ(q_i, j_1, ..., j_k) = q_{i+1}.

Definition 2.4 (language of ATL). The language of ATL is defined by the following grammar:

ϕ ::= p | ¬ϕ | ϕ_1 ∨ ϕ_2 | 〈〈C〉〉○ϕ | 〈〈C〉〉□ϕ | 〈〈C〉〉ϕ_1 U ϕ_2,

where p ∈ Π is an atomic proposition, and C ⊆ Σ is a set of agents.

Note that the temporal operators are ○ (next time), □ (always) and U (until), and the operator 〈〈 〉〉 is called a path quantifier. Sometimes we write 〈〈a_1, ..., a_i〉〉 instead of 〈〈{a_1, ..., a_i}〉〉, and 〈〈 〉〉 instead of 〈〈∅〉〉. “Truth” ⊤ is defined as an abbreviation of p ∨ ¬p for some fixed p ∈ Π, and ⊥ = ¬⊤. Additional Boolean connectives are defined from ¬ and ∨ in the usual manner. Similar to CTL, we write 〈〈C〉〉◇ϕ for 〈〈C〉〉⊤ U ϕ.

Definition 2.5 (ATL semantics). We write S, q ⊨ ϕ to indicate that the formula ϕ holds at state q of a concurrent game structure S. When S is clear from the context, we write q ⊨ ϕ. The relation ⊨ is defined, for all states q of S, inductively as follows:

• For p ∈ Π we have q ⊨ p iff p ∈ π(q).

• q ⊨ ¬ϕ iff q ⊭ ϕ.

• q ⊨ ϕ_1 ∨ ϕ_2 iff q ⊨ ϕ_1 or q ⊨ ϕ_2.

• q ⊨ 〈〈C〉〉○ϕ iff there exists a C-strategy F_C such that for each computation λ ∈ out(q, F_C) we have λ[1] ⊨ ϕ; equivalently, iff there exists a C-move δ ∈ D_C(q) such that for all q′ ∈ out(δ) we have q′ ⊨ ϕ.

• q ⊨ 〈〈C〉〉□ϕ iff there exists a C-strategy F_C such that for each computation λ ∈ out(q, F_C) and all positions i ≥ 0, we have λ[i] ⊨ ϕ.

• q ⊨ 〈〈C〉〉ϕ_1 U ϕ_2 iff there exists a C-strategy F_C such that for each computation λ ∈ out(q, F_C) there exists a position i ≥ 0 such that λ[i] ⊨ ϕ_2 and for all positions 0 ≤ j < i we have λ[j] ⊨ ϕ_1.
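The semantic clauses above can be evaluated on a finite structure with the usual fixpoint characterisation of ATL model checking: a one-step "controllable predecessor" operator for 〈〈C〉〉○, a greatest fixpoint for □, and a least fixpoint for U. The sketch below is a minimal Python illustration of that standard technique, not the algorithm of this paper; the toy structure and the names `pre`, `always` and `until` are assumptions introduced for the example.

```python
from itertools import product

# Toy structure (illustrative assumption): agents 1 and 2, states s0, s1, s2.
AGENTS = [1, 2]
STATES = {"s0", "s1", "s2"}
MOVES = {  # d_a(q)
    (1, "s0"): ["a", "b"], (2, "s0"): ["x", "y"],
    (1, "s1"): ["a"], (2, "s1"): ["x"],
    (1, "s2"): ["a"], (2, "s2"): ["x"],
}
DELTA = {
    ("s0", ("a", "x")): "s1", ("s0", ("a", "y")): "s1",
    ("s0", ("b", "x")): "s0", ("s0", ("b", "y")): "s2",
    ("s1", ("a", "x")): "s1", ("s2", ("a", "x")): "s2",
}


def pre(coalition, target):
    """States where `coalition` has a joint move forcing the next state into `target`."""
    members = [a for a in AGENTS if a in coalition]
    others = [a for a in AGENTS if a not in coalition]
    result = set()
    for q in STATES:
        for own in product(*(MOVES[(a, q)] for a in members)):
            choice = dict(zip(members, own))
            forced = True
            for rest in product(*(MOVES[(a, q)] for a in others)):
                full = {**choice, **dict(zip(others, rest))}
                vec = tuple(full[a] for a in AGENTS)
                if DELTA[(q, vec)] not in target:
                    forced = False
                    break
            if forced:
                result.add(q)
                break
    return result


def always(coalition, phi_states):
    """<<C>> box phi: greatest fixpoint of Z = phi & pre(C, Z)."""
    z = set(phi_states)
    while True:
        nxt = phi_states & pre(coalition, z)
        if nxt == z:
            return z
        z = nxt


def until(coalition, phi1_states, phi2_states):
    """<<C>> phi1 U phi2: least fixpoint of Z = phi2 | (phi1 & pre(C, Z))."""
    z = set()
    while True:
        nxt = phi2_states | (phi1_states & pre(coalition, z))
        if nxt == z:
            return z
        z = nxt


print(until({1}, STATES, {"s1"}))      # states where agent 1 can force reaching s1
print(always(set(), {"s0", "s1"}))     # states from which s2 is avoided on every path
```

In this toy structure agent 1 alone can force the system from s0 into s1 by playing a, whereas agent 2 alone cannot, which is the kind of distinction the coalition parameter C is meant to express.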

2.2. ATL and Planning - A discussion

The logic ATL is similar to the branching-time temporal logic CTL, except that path quantifiers are parameterized by sets of players. The ATL operator 〈〈C〉〉, which represents the strategic ability of the coalition C in the multi-agent system, is a generalization of the CTL path quantifiers ∀ and ∃. Syntactically, ATL is a multimodal version of CTL, associating with each set of players C ⊆ Σ the following modal operators:

• 〈〈C〉〉○ϕ, meaning “the agents in the coalition C have a coordinated strategy to force in the next move an outcome satisfying ϕ”;

• 〈〈C〉〉□ϕ, meaning “the agents in the coalition C have a coordinated strategy to maintain forever outcomes satisfying ϕ”;

• 〈〈C〉〉ϕ_1 U ϕ_2, meaning “the agents in the coalition C have a coordinated strategy to force an outcome satisfying ϕ_2 while meanwhile maintaining the truth of ϕ_1”; and

• 〈〈C〉〉◇ϕ, meaning “the agents in the coalition C have a coordinated strategy to force ϕ to eventually become true at some future time.”

Given an arbitrary computation λ and an arbitrary position i ≥ 0, we refer to the sequence λ[0, i] as a context of the system. Thus, a strategy may be thought of as a conditional plan indicating how an agent is to act in any given context [15]. But in the language of ATL strategies are implicit; that is, we can only quantify over them rather than refer to them explicitly [17]. For example, the satisfaction of the formula 〈〈C〉〉◇ϕ in a state q guarantees that there is a plan for C to achieve ϕ from q at some future time, but provides no information on what the plan is. Currently, there are two general ways to fill this gap. The first is to enrich the language of ATL with explicit strategy symbols, for example the work of van der Hoek et al. [15], Borgo [6] and Walther et al. [17]. The second is via model checking, such as the work of van der Hoek et al. [16] and Jamroga [11]; e.g., the path that witnesses the truth of the formula 〈〈C〉〉◇ϕ encodes a plan for C to achieve ϕ. We will focus on the second way (especially the work of Jamroga [11]), as our work is intrinsically an extension of theirs.

In [11], some of the ATL formulas can be seen as goals labeled with the corresponding planning objects. For example, the formula 〈〈C〉〉◇ϕ is a reachability goal for coalition C; 〈〈C〉〉□ϕ is a maintainability goal for coalition C; and 〈〈C〉〉ϕ_1 U ϕ_2 is a reachability-preserving goal for coalition C, which expresses a reachability goal under which some property must be preserved.

Indeed, this is a natural and direct way to define goals according to the semantics of ATL, but it also imposes some unavoidable limitations. Firstly, only a small fragment of the language of ATL can be considered as goals, specifically, only formulas whose outermost operator is a cooperation modality (e.g., 〈〈C〉〉◇ϕ or 〈〈C〉〉ϕ_1 U ϕ_2). The cases of negation (e.g., ¬〈〈C〉〉◇ψ), disjunction (e.g., 〈〈C_1〉〉◇ψ ∨ 〈〈C_2〉〉◇ψ), and conjunction (e.g., 〈〈C_1〉〉◇ψ ∧ 〈〈C_2〉〉◇ψ) fail to specify explicitly “who the plan is for” or “what the objective is”, and thus cannot be seen as goals. Secondly, for nested strategic formulas, e.g., 〈〈C_1〉〉□〈〈C_2〉〉◇ϕ, only the plan for the outermost coalition, i.e., C_1, will be generated, forcing the system to stay forever in states that satisfy 〈〈C_2〉〉◇ϕ. This is because strategies in ATL are revocable [1]: the evaluation of the subformula 〈〈C_2〉〉◇ϕ has nothing to do with the strategies C_1 has chosen. In other words, only the outermost cooperation modality gives rise to a plan, while a subformula, no matter how complex it is, only refers to the set of states that satisfy it. To sum up, the expressive power of ATL has not been fully applied in goal specification.

To overcome the above-mentioned limitations, we present a framework that more fully exploits the expressive power of ATL to model the planning problem for open systems. In short, the key ideas are to eliminate the planning objects from goals and to use execution structures to interpret goals.

3. The Coalitional Planning Problem

While Kripke structures offer a natural model for the computations of closed systems, the natural “common-denominator” model for compositions of open systems is the concurrent game structure [4]. Thus, we directly use the notion of concurrent game structure to formalize the concept of game-like domain.

Definition 3.1 (game-like domain). A game-like domain is a concurrent game structure G = 〈k, Q, Π, π, d, δ〉.

This definition of game-like domain departs somewhat from the planning tradition, where states are often not first-class citizens. But it is easy to see that this definition is more general³. Moreover, it is general in another sense as well: traditional deterministic domains and non-deterministic domains can be seen as special instances of game-like domains.

Proposition 3.1

(1) A deterministic domain [9] is a game-like domain with only one agent.

(2) A non-deterministic domain [14, 7] D = 〈F, S, A, R〉, where F is the finite set of propositions, S ⊆ 2^F is the set of states, A is the finite set of actions, and R ⊆ S × A × S is the transition relation, can be modeled as a game-like domain with two agents, that is, G = 〈2, Q, Π, π, d, δ〉, where agent 1 is the plan executor and agent 2 is the environment, Q = S, Π = F, π(q) = q, and (q, a, q′) ∈ R iff a ∈ d_1(q) and there is some b ∈ d_2(q) such that δ(q, a, b) = q′.
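Proposition 3.1(2) does not fix how the environment's moves are chosen. One possible encoding, sketched below in Python, lets agent 2 pick an index into the list of outcomes of the executor's action. The function name `to_game`, the modulo trick used to keep d_2(q) independent of agent 1's choice, and the assumption that every state has at least one applicable action are all choices made for this illustration.

```python
from collections import defaultdict


def to_game(transitions):
    """Encode R (a set of (q, a, q') triples) as moves d1, d2 and a function delta.

    Agent 1 (plan executor) picks an action; agent 2 (environment) picks an
    outcome index. Indices wrap around so that d2(q) does not depend on a.
    """
    outcomes = defaultdict(list)          # (q, a) -> list of possible q'
    for q, a, q2 in sorted(transitions):
        outcomes[(q, a)].append(q2)

    d1 = defaultdict(list)                # q -> actions applicable at q
    for (q, a) in outcomes:
        d1[q].append(a)

    d2, delta = {}, {}
    for q, actions in d1.items():
        width = max(len(outcomes[(q, a)]) for a in actions)
        d2[q] = list(range(width))
        for a in actions:
            outs = outcomes[(q, a)]
            for b in d2[q]:
                delta[(q, a, b)] = outs[b % len(outs)]
    return dict(d1), d2, delta


# Toy relation: "go" from s may succeed (reach t) or fail (stay in s).
R = {("s", "go", "t"), ("s", "go", "s"), ("s", "stay", "s"), ("t", "stay", "t")}
d1, d2, delta = to_game(R)
print(d1, d2, delta)
```

With this encoding, the triples (q, a, δ(q, a, b)) obtained by ranging over all b ∈ d_2(q) are exactly the triples of R, which is the condition stated in the proposition.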

³ That is, in the planning community, a state is usually defined as a set of propositions (or fluents), implying that each state satisfies a unique set of propositions. In this sense, our definition of game-like domain is a more general structure: we can impose the requirements that Q contains 2^|Π| states and that π is a bijection, and the structure is then equivalent to one whose states are defined as sets of propositions.

Figure 1. A game-like domain (diagram not reproduced: five states numbered 1 to 5, labeled with the propositions locked, loaded and detained, and transitions labeled with move vectors such as 〈ULK, G〉, 〈LD, F〉, 〈A, G〉 and 〈W, F〉)

Example 3.1 (The depository domain⁴) Figure 1 depicts a game-like domain Gl = 〈k, Q, Π, π, d, δ〉, where k = 2, agent 1 is a depository user and agent 2 is a guard of the depository; Q = {q_1, q_2, q_3, q_4, q_5}; Π = {loaded, locked, detained}; π(q_1) = {locked}, π(q_2) = ∅, π(q_3) = {loaded}, π(q_4) = {loaded, locked}, and π(q_5) = {detained}; and the move function d and the transition function δ are depicted by the arrows in the figure.

Note that LD, DL, ULD, LK, ULK, A, F, and G are abbreviations for the actions load, direct load, unload, lock, unlock, apply, forbid, and grant, respectively.

The plans in our framework are defined to contain execution contexts [14], that is, the “internal states” of the plan executor. Notably, the plan executor in coalitional planning is a set of agents C ⊆ Σ. Thus, every plan has to record the coalition to which it belongs. In general, a plan is defined in terms of an action function that, given a state and an execution context, specifies the action to be executed by each agent in C, and in terms of a context function that, depending on the action outcome, specifies the next execution context.

Definition 3.2 (plan). A plan for a domain G = 〈k, Q, Π, π, d, δ〉 is a tuple 〈C, E, e_0, α, τ〉, where

• C ⊆ {1, ..., k} denotes a set of agents, i.e., the coalition;

• E is a set of (execution) contexts;

• e_0 ∈ E is the initial context;

• α : C × Q × E → ⋃_{i∈C, q∈Q} d_i(q) is the action function;

• τ : Q × E × Q → E is the context function.

Note that, in the state q and under the context e, we say a move vector m⃗ = 〈j_1, ..., j_k〉 ∈ D(q) is consistent with a plan P = 〈C, E, e_0, α, τ〉 if and only if ∀i ∈ C : j_i = α(i, q, e). If this is the case, we say m⃗ is a (P, q, e)-consistent move vector. Moreover, we say plan P is

• proper, if and only if ∀i ∈ C : α(i, q, e) ∈ d_i(q);

• executable, if and only if whenever τ(q, e, q′) = e′, there exists a (P, q, e)-consistent move vector m⃗ such that δ(q, m⃗) = q′;

• complete, if and only if for every (P, q, e)-consistent move vector m⃗, if δ(q, m⃗) = q′, then there is some context e′ such that τ(q, e, q′) = e′ and α(i, q′, e′) is defined for all i ∈ C.

⁴ This example is modified from the container example in [14].

Since some state-context pairs can never be reached in the execution of the plan, the action function α and the context function τ may be partial. To make the formal definition of plan coincide exactly with its intuitive meaning, we only consider plans that are proper, executable and complete. Intuitively, a proper plan always selects feasible actions for the agents; an executable plan only forwards the execution context to reachable states; and a complete plan always specifies how to proceed for an arbitrary outcome.
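The properness and completeness conditions can be checked mechanically once α and τ are stored as partial maps. The following Python sketch does this for a small domain; the dictionary encoding, the function names `is_proper` and `is_complete`, and the toy plan are assumptions for illustration, and the sketch further assumes that α is defined for every coalition member at each state-context pair it mentions.

```python
from itertools import product

# Toy domain (illustrative): two agents, two states.
K = 2
D = {(1, "s0"): ["a", "b"], (2, "s0"): ["x", "y"],
     (1, "s1"): ["a"], (2, "s1"): ["x"]}
DELTA = {("s0", ("a", "x")): "s1", ("s0", ("a", "y")): "s0",
         ("s0", ("b", "x")): "s0", ("s0", ("b", "y")): "s0",
         ("s1", ("a", "x")): "s1"}

# Toy plan for the coalition {1} with a single context "e".
COALITION = {1}
ALPHA = {(1, "s0", "e"): "a", (1, "s1", "e"): "a"}      # partial action function
TAU = {("s0", "e", "s1"): "e", ("s0", "e", "s0"): "e",  # partial context function
       ("s1", "e", "s1"): "e"}


def is_proper(coalition, d, alpha):
    """Every prescribed action alpha(i, q, e) is available to i at q."""
    return all(i in coalition and act in d[(i, q)]
               for (i, q, _e), act in alpha.items())


def is_complete(k, d, delta, coalition, alpha, tau):
    """Every outcome of a plan-consistent move vector has a next context
    in which alpha is defined for the whole coalition."""
    agents = range(1, k + 1)
    for q, e in {(q, e) for (_i, q, e) in alpha}:
        options = [[alpha[(i, q, e)]] if i in coalition else d[(i, q)]
                   for i in agents]
        for m in product(*options):
            q2 = delta[(q, m)]
            e2 = tau.get((q, e, q2))
            if e2 is None or any((i, q2, e2) not in alpha for i in coalition):
                return False
    return True


print(is_proper(COALITION, D, ALPHA), is_complete(K, D, DELTA, COALITION, ALPHA, TAU))
```

Removing the entry for (s0, e, s0) from the context function makes the toy plan incomplete, because the environment can keep the system in s0 by answering y.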

The plans we define are “situated plans”, namely plans that, at run time, are executed by a reactive loop that repeatedly senses the state, calculates the current execution context, and selects appropriate actions for all the agents in the coalition; the choices of all the agents (both inside and outside the coalition) are then executed simultaneously and result in a state transition, and the loop iterates until the goal is reached [9]. Thus, the execution of a plan can be described in terms of transitions between state-context pairs.

Definition 3.3 (execution). An execution of a plan P = 〈C, E, e_0, α, τ〉 in a domain G = 〈k, Q, Π, π, d, δ〉 from state q_0 is an infinite sequence of state-context pairs λ^P_{q_0} = 〈q_0, e_0〉, 〈q_1, e_1〉, 〈q_2, e_2〉, ... such that for all positions i ≥ 0, there is a (P, q_i, e_i)-consistent move vector m⃗ such that δ(q_i, m⃗) = q_{i+1} and τ(q_i, e_i, q_{i+1}) = e_{i+1}.

Although a concurrent game structure is deterministic⁵, the execution of a plan is usually non-deterministic, since a plan can only control some of the agents in the multi-agent system (i.e., the coalition) while the choices of the other agents are entirely uncertain. So, from an arbitrary state of a game-like domain a plan may lead to infinitely many executions. Fortunately, we can finitely encode them in an execution structure.

⁵ In the sense that, once all agents have made their choices, the next state is fixed.

Definition 3.4 (execution structure). The execution structure of a plan P = 〈C, E, e_0, α, τ〉 in a domain G = 〈k, Q, Π, π, d, δ〉 from state q_0 is the structure G^P_{q_0} = 〈kk, QQ, ΠΠ, ππ, dd, δδ〉, where

• kk = k and ΠΠ = Π;

• QQ and δδ are the minimal sets satisfying the following rules:

(1) 〈q_0, e_0〉 ∈ QQ; and

(2) if 〈q, e〉 ∈ QQ and there exists a (P, q, e)-consistent move vector m⃗ such that δ(q, m⃗) = q′ and τ(q, e, q′) = e′, then 〈q′, e′〉 ∈ QQ and δδ(〈q, e〉, m⃗) = 〈q′, e′〉 (i.e., 〈〈q, e〉, m⃗, 〈q′, e′〉〉 ∈ δδ).

• ππ(〈q, e〉) = π(q); and

• dd_i(〈q, e〉) = {α(i, q, e)} if i ∈ C, and dd_i(〈q, e〉) = d_i(q) otherwise.

Note that the above definition of the execution structure from a single initial state q_0 can be easily extended to the execution structure from a set of initial states I ⊆ Q (denoted G^P_I), by simply replacing the first clause of the inductive specification of QQ and δδ with the new clause: 〈q, e_0〉 ∈ QQ for all q ∈ I.
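The inductive rules of Definition 3.4 amount to a reachability computation over state-context pairs. The Python sketch below builds QQ and δδ with a simple worklist; the function name `execution_structure`, the dictionary encoding of d, δ, α and τ, and the toy plan are assumptions for this example, and the plan is assumed to be proper and complete so that every lookup succeeds.

```python
from itertools import product


def execution_structure(k, d, delta, coalition, alpha, tau, q0, e0):
    """Return (pairs, trans): the reachable state-context pairs and the
    transitions between them under plan-consistent move vectors."""
    agents = range(1, k + 1)
    pairs, trans, frontier = {(q0, e0)}, {}, [(q0, e0)]
    while frontier:
        q, e = frontier.pop()
        # Coalition members are restricted to the prescribed action;
        # all other agents keep their full move sets d_i(q).
        options = [[alpha[(i, q, e)]] if i in coalition else d[(i, q)]
                   for i in agents]
        for m in product(*options):
            q2 = delta[(q, m)]
            e2 = tau[(q, e, q2)]
            trans[((q, e), m)] = (q2, e2)
            if (q2, e2) not in pairs:
                pairs.add((q2, e2))
                frontier.append((q2, e2))
    return pairs, trans


# Toy domain and plan (illustrative): agent 1 always plays "a".
d = {(1, "s0"): ["a", "b"], (2, "s0"): ["x", "y"],
     (1, "s1"): ["a"], (2, "s1"): ["x"]}
delta = {("s0", ("a", "x")): "s1", ("s0", ("a", "y")): "s0",
         ("s0", ("b", "x")): "s0", ("s0", ("b", "y")): "s0",
         ("s1", ("a", "x")): "s1"}
alpha = {(1, "s0", "e"): "a", (1, "s1", "e"): "a"}
tau = {("s0", "e", "s1"): "e", ("s0", "e", "s0"): "e", ("s1", "e", "s1"): "e"}

pairs, trans = execution_structure(2, d, delta, {1}, alpha, tau, "s0", "e")
print(pairs)
print(trans)
```

Starting the worklist from every pair 〈q, e_0〉 with q ∈ I instead of the single pair 〈q_0, e_0〉 would give the variant for a set of initial states.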

Proposition 3.2 The execution structure G^P_{q_0} is a concurrent game structure.

Proof We only have to show that the transition function δδ is total. In an arbitrary state 〈q, e〉 ∈ QQ, D(〈q, e〉) = d_1(〈q, e〉) × · · · × d_k(〈q, e〉) is the set of all the (P, q, e)-consistent move vectors. For an arbitrary move vector m⃗ ∈ D(〈q, e〉), there is a q′ such that δ(q, m⃗) = q′ since the function δ is total, and there is an e′ such that τ(q, e, q′) = e′ since P is complete. So 〈q′, e′〉 ∈ QQ and δδ(〈q, e〉, m⃗) = 〈q′, e′〉. Hence, the transition function δδ is total.

Proposition 3.3 The state-context sequence λ is an execution of a plan P in a domain G from state q_0 if and only if λ is a 〈q_0, e_0〉-computation of the execution structure G^P_{q_0}.

Proof Let λ be the sequence 〈q_0, e_0〉, 〈q_1, e_1〉, 〈q_2, e_2〉, .... Suppose λ is an execution of a plan P in a domain G from state q_0. Then for an arbitrary position i ≥ 0, there is a (P, q_i, e_i)-consistent move vector m⃗ such that δ(q_i, m⃗) = q_{i+1} and τ(q_i, e_i, q_{i+1}) = e_{i+1}. By the definition of δδ in G^P_{q_0}, this is equivalent to δδ(〈q_i, e_i〉, m⃗) = 〈q_{i+1}, e_{i+1}〉, i.e., 〈q_{i+1}, e_{i+1}〉 is a successor of 〈q_i, e_i〉.

By Proposition 3.2 and Proposition 3.3, we have justified the claim that all the executions of a plan from a state of a game-like domain can be finitely encoded in an execution structure. Moreover, we showed that an execution structure is a concurrent game structure. We can therefore define the goal language to be the language of ATL.


Definition 3.5 (goal). Given a game-like domain G = 〈k, Q, Π, π, d, δ〉, a goal is an ATL formula ϕ defined by the following grammar:

ϕ ::= p | ¬ϕ | ϕ_1 ∨ ϕ_2 | 〈〈C〉〉○ϕ | 〈〈C〉〉□ϕ | 〈〈C〉〉ϕ_1 U ϕ_2,

where p ∈ Π is an atomic proposition, and C ⊆ Σ is a set of agents. That is, all the ATL formulas are goals.

Example 3.2 Consider the game-like domain in Example 3.1. The goal “always maintain the possibility that the user and the guard can cooperate to make the depository loaded and locked, while enabling the guard to detain the cargo if he wants” can be expressed by the ATL formula: 〈〈 〉〉□〈〈1, 2〉〉◇(locked ∧ loaded) ∧ 〈〈2〉〉◇detained.

Definition 3.6 (goal satisfaction). Let G be a domain and P be a plan for G. Plan P satisfies goal ϕ from the initial state q_0 if G^P_{q_0}, 〈q_0, e_0〉 ⊨ ϕ. Plan P satisfies goal ϕ from the set of initial states Q_0 if G^P_q, 〈q, e_0〉 ⊨ ϕ for each q ∈ Q_0.

A goal itself contains no information about whom it belongs to. Thus, in a coalitional planning problem we always have to specify in advance which set of agents is the coalition that we can control and plan for.

Definition 3.7 (coalitional planning problem). A coalitional planning problem Θ is a tuple 〈G, C, I, ϕ〉 where G = 〈k, Q, Π, π, d, δ〉 is a game-like domain, C ⊆ {1, ..., k}, I ⊆ Q, and ϕ is a goal.

A solution of Θ is a plan P that satisfies ϕ from the set of states I.

Intuitively, for a coalitional planning problem Θ = 〈G, C, I, ϕ〉, C is the coalition that needs a plan, I is a set of initial states, and ϕ is the goal of C. As an execution structure is an encoding of all the executions of a plan (from a state of a game-like domain), the satisfaction of a goal implies that all the executions of the plan satisfy a certain property specified by the goal. From another point of view, the context function of a plan can be seen as a finite state machine that encodes the history of the computation, and the action function is actually a set of strategies F_C, one for each agent in the coalition. Thus, the execution of a plan is intrinsically a commitment of coalition C to the strategy set F_C, which leads to a model update [15]. Hence, a goal ϕ of a coalition C can be understood as “forcing the domain to satisfy the property ϕ by committing to a set of suitable strategies”.

4. Planning Algorithm

We now present an algorithm for coalitional planning and formally prove its correctness. The main control flow is inherited from the planning algorithm for temporally extended goals in non-deterministic domains proposed in [14]. The underlying idea of this algorithm is that “if a goal must be satisfied in a state, then some conditions must be projected to the next states”. We rewrite the progress function and the assign-progress function based on the semantics of ATL.

Lemma 4.1 The following results hold:

(1) ⊨ 〈〈C〉〉□ϕ ↔ ϕ ∧ 〈〈C〉〉○〈〈C〉〉□ϕ;

(2) ⊨ 〈〈C〉〉ϕ_1 U ϕ_2 ↔ ϕ_2 ∨ (ϕ_1 ∧ 〈〈C〉〉○〈〈C〉〉ϕ_1 U ϕ_2);

(3) ⊨ 〈〈C〉〉◇ϕ ↔ ϕ ∨ 〈〈C〉〉○〈〈C〉〉◇ϕ.

The validity of the formulas in Lemma 4.1 is straightforward. These equivalences make it possible to progress a goal in a state to sub-goals to be satisfied in (some of) the successor states.

Definition 4.1 (progress function). The progress function ε is inductively defined by the following clauses:

• ε(q, p) = if p ∈ π(q) then ⊤ else ⊥;

• ε(q, ¬ϕ) = ¬ε(q, ϕ);

• ε(q, ϕ_1 ∨ ϕ_2) = ε(q, ϕ_1) ∨ ε(q, ϕ_2);

• ε(q, 〈〈C〉〉○ϕ) = 〈〈C〉〉○ϕ;

• ε(q, 〈〈C〉〉□ϕ) = ε(q, ϕ) ∧ 〈〈C〉〉○〈〈C〉〉□ϕ;

• ε(q, 〈〈C〉〉ϕ_1 U ϕ_2) = ε(q, ϕ_2) ∨ (ε(q, ϕ_1) ∧ 〈〈C〉〉○〈〈C〉〉ϕ_1 U ϕ_2);

• ε(q, 〈〈C〉〉◇ϕ) = ε(q, ϕ) ∨ 〈〈C〉〉○〈〈C〉〉◇ϕ.

Lemma 4.2 q ⊨ ϕ iff q ⊨ ε(q, ϕ).

Proof By induction on the structure of ϕ. The Boolean cases and the case of ϕ = 〈〈C〉〉○ϕ_1 are trivial. The cases of ϕ = 〈〈C〉〉□ϕ_1, ϕ = 〈〈C〉〉ϕ_1 U ϕ_2 and ϕ = 〈〈C〉〉◇ϕ_1 are immediate from Lemma 4.1.
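The clauses of the progress function translate directly into a recursive function over formula trees. The Python sketch below represents formulas as nested tuples and takes the label set π(q) as its first argument. The tuple encoding, the helper names `neg`, `disj` and `conj`, the extra clause for ∧, and the eager simplification of ⊤ and ⊥ are assumptions added for readability; the definition itself does not prescribe them.

```python
TRUE, FALSE = ("true",), ("false",)


def neg(f):
    if f == TRUE:
        return FALSE
    if f == FALSE:
        return TRUE
    return ("not", f)


def disj(f, g):
    if TRUE in (f, g):
        return TRUE
    if f == FALSE:
        return g
    if g == FALSE:
        return f
    return ("or", f, g)


def conj(f, g):
    if FALSE in (f, g):
        return FALSE
    if f == TRUE:
        return g
    if g == TRUE:
        return f
    return ("and", f, g)


def progress(label, phi):
    """epsilon(q, phi), where `label` is pi(q), the propositions true at q."""
    if phi in (TRUE, FALSE):
        return phi
    op = phi[0]
    if op == "prop":
        return TRUE if phi[1] in label else FALSE
    if op == "not":
        return neg(progress(label, phi[1]))
    if op == "or":
        return disj(progress(label, phi[1]), progress(label, phi[2]))
    if op == "and":          # derived connective, handled directly for convenience
        return conj(progress(label, phi[1]), progress(label, phi[2]))
    if op == "next":         # <<C>> next phi is already a next-step obligation
        return phi
    if op == "always":       # phi and <<C>> next <<C>> always phi
        return conj(progress(label, phi[2]), ("next", phi[1], phi))
    if op == "until":        # phi2 or (phi1 and <<C>> next <<C>> phi1 U phi2)
        return disj(progress(label, phi[3]),
                    conj(progress(label, phi[2]), ("next", phi[1], phi)))
    if op == "eventually":   # phi or <<C>> next <<C>> eventually phi
        return disj(progress(label, phi[2]), ("next", phi[1], phi))
    raise ValueError(f"unknown operator: {op}")


C = frozenset({1})
reach = ("eventually", C, ("prop", "loaded"))
keep = ("always", C, ("prop", "locked"))
print(progress({"locked"}, reach))   # goal not yet met: postponed to the next state
print(progress({"loaded"}, reach))   # goal met in the current state
print(progress({"locked"}, keep))    # invariant holds now and must hold again next
```

Every result is either ⊤, ⊥, or a Boolean combination of formulas of the form 〈〈C〉〉○ψ, which is what allows the obligations to be handed on to successor states.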

Proposition 4.3 The result of every progress function can be written in a normal form, that is,

$$\varepsilon(q, \varphi) = \bigvee_{i \in I} \; \bigwedge_{\langle C_j, \varphi_j \rangle \in G_i} \langle\langle C_j \rangle\rangle \bigcirc \varphi_j .$$

Proof By induction on the structure of ϕ. Notice that ⊥ is the instance of this normal form with I = ∅, and ⊤ is the instance with I = {1} and G_1 = ∅.

Hence, for an arbitrary goal ϕ, ε(q, ϕ) can be denoted as a set S_{ε(q,ϕ)} = {G_i | i ∈ I} where I = {1, 2, ..., k}, G_i = {〈C_1, ϕ_1〉, ..., 〈C_l, ϕ_l〉} is a set of coalition-subgoal pairs and k, l ∈ N. Intuitively, every G_i ∈ S_{ε(q,ϕ)} denotes a method for satisfying a goal and S_{ε(q,ϕ)} represents all the possible methods. An arbitrary pair 〈C_j, ϕ_j〉 ∈ G_i means that C_j must have a choice such that all the successor states satisfy the subgoal ϕ_j. For notational convenience, in the remainder of this paper we will not distinguish between the formula ε(q, ϕ) and the set S_{ε(q,ϕ)}.

Given a set of agents C, we denote the joint action of C by the vector m⃗_C, which is a sequence of actions arranged in order of increasing agent ID. Moreover, we write m⃗_{C_1} ≃ m⃗_{C_2} iff m⃗_{C_1} and m⃗_{C_2} agree on the actions for the agents in the set C_1 ∩ C_2. Then the successor states of a state q after the agents in C have chosen the joint action m⃗_C and the agents in C_1 have chosen the joint action m⃗_{C_1} are the set succ(q, m⃗_C, C_1, m⃗_{C_1}) = {q′ | ∃m⃗ such that m⃗ ≃ m⃗_C, m⃗ ≃ m⃗_{C_1} and δ(q, m⃗) = q′}; and the state sets that the agents in C_1 can effectively force after the agents in C have chosen the joint action m⃗_C form the set next(q, m⃗_C, C_1) = {S ⊆ Q | ∃m⃗_{C_1} such that m⃗_{C_1} ≃ m⃗_C and succ(q, m⃗_C, C_1, m⃗_{C_1}) = S}.
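The two sets just defined can be computed by enumerating move vectors. In the Python sketch below, a joint action is a dictionary from agents to actions, so the agreement relation ≃ is enforced by construction: members of C_1 that are already bound by m⃗_C keep their action. The function names `succ` and `next_sets`, the dictionary encoding, and the toy domain are assumptions for this illustration.

```python
from itertools import product


def succ(q, agents, d, delta, fixed):
    """Successors of q over all move vectors that agree with the partial
    assignment `fixed` (a dict from agents to actions)."""
    options = [[fixed[a]] if a in fixed else d[(a, q)] for a in agents]
    return frozenset(delta[(q, m)] for m in product(*options))


def next_sets(q, agents, d, delta, m_c, c1):
    """next(q, m_C, C1): the state sets C1 can force once C has committed to m_C."""
    free = [a for a in sorted(c1) if a not in m_c]  # C1 members not bound by m_C
    result = set()
    for choice in product(*(d[(a, q)] for a in free)):
        m_c1 = dict(zip(free, choice))
        result.add(succ(q, agents, d, delta, {**m_c, **m_c1}))
    return result


# Toy domain (illustrative): two agents, moves only listed for state s0.
agents = [1, 2]
d = {(1, "s0"): ["a", "b"], (2, "s0"): ["x", "y"]}
delta = {("s0", ("a", "x")): "s1", ("s0", ("a", "y")): "s0",
         ("s0", ("b", "x")): "s0", ("s0", ("b", "y")): "s0"}

print(next_sets("s0", agents, d, delta, {1: "a"}, {2}))    # agent 2 decides the outcome
print(next_sets("s0", agents, d, delta, {1: "a"}, set()))  # nobody else commits
```

When C_1 is empty, or contained in C, the result is the single set of all successors compatible with m⃗_C, since no further choice is available to C_1.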

Definition 4.2 (assign-progress function). Let e = {〈C_1, ϕ_1〉, ..., 〈C_n, ϕ_n〉} be an arbitrary set of coalition-subgoal pairs, g(e) = {ϕ_1, ..., ϕ_n}, and c(e) = {C_1, ..., C_n}. Then, the assign-progress function is defined as assign(e, q, m⃗_C) = {f : e → ⋃_{C_i ∈ c(e)} next(q, m⃗_C, C_i) | ∀〈C_i, ϕ_i〉 ∈ e : ∃S ∈ next(q, m⃗_C, C_i) such that f(〈C_i, ϕ_i〉) = S}.

Intuitively, the assign-progress function gives the set of all the possible assignments of subgoals to the successor states. For all f ∈ assign(e, q, m⃗_C) we define f_f : Q → 2^{g(e)} as the function satisfying f_f(q) = S iff ∀ϕ′ ∈ S, ∃〈C′, ϕ′〉 ∈ e such that q ∈ f(〈C′, ϕ′〉). Thus, f_f is a possible labeling function that labels the states with subsets of the subgoals.

Now we can put the pieces together and describe the algorithm for coalitional planning. The algorithm, shown in Figure 2, consists of three functions. Function plan takes as input a coalition C, an initial state q_0 and a goal ϕ, calls the function build, and returns a plan for C if one is found, or the symbol ⊥ otherwise. Function build takes as input a state q, a context e, a partially constructed plan pl and a vector of state-context pairs open, and recursively calls itself to complete the partial plan pl so that it fulfills the subgoals assigned to the state-context pair 〈q, e〉. Function verify takes as input a loop of state-context pairs and verifies the validity of the currently formed plan. Note that the vector open, which contains the state-context pairs whose joint actions have been selected but not yet verified, records the search path in the domain.

Similar to the algorithm in [14], the coalitional planning algorithm performs a depth-first forward search: starting from the initial state, it picks a joint action for the planning object, progresses the goal to successor states, and iterates until either the goal is satisfied or the search path leads to a failure. It is important to notice that the algorithm uses as the “contexts” of the plan the lists of subgoals that are considered at the different stages of the exploration. More precisely, a context is a set s = {ϕ_1, ..., ϕ_n}, where the ϕ_i are the subgoals, as computed by the functions ε and assign.

 1  Function plan(C, $q_0$, ϕ)
 2    return build($q_0$, {ϕ}, 〈C, ∅, ϕ, ∅, ∅〉, ∅);

 3  Function build(q, e, pl, open)
 4    if 〈q, e〉 ∈ open then
 5      if verify(〈q, e〉, open) then return pl;
 6      else return ⊥;
 7    end
 8    if defined pl.α(q, e) then return pl;
 9    foreach $\vec{m}_C$ ∈ $D_C(q)$ do
10      foreach e′ ∈ ε(q, e) do
11        foreach f ∈ assign(e′, q, $\vec{m}_C$) do
12          pl′ ← pl;
13          pl′.E ← pl′.E ∪ {e};
14          pl′.α(C, q, e) ← $\vec{m}_C$;
15          open′ ← conc(〈q, e〉, open);
16          foreach q′ ∈ domain($f_f$) do
17            e′ ← $f_f$(q′);
18            pl′.τ(q, e, q′) ← e′;
19            pl′ ← build(q′, e′, pl′, open′);
20            if pl′ = ⊥ then next f;
21          end
22          return pl′;
23        end
24      end
25    return ⊥;
26  end

27  Function verify(〈q, e〉, open)
28    loop-goals ← setof(e);
29    while 〈q, e〉 ≠ head(open) do
30      〈q′, e′〉 ← head(open);
31      loop-goals ← loop-goals ∩ setof(e′);
32      open ← tail(open);
33    end
34    if ∃ $\langle\langle C_i \rangle\rangle \varphi_1 U \varphi_2$ ∈ loop-goals then return false;
35    else return true;

Figure 2. The planning algorithm
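The loop check performed by verify can be sketched in Python as follows. This is a sketch under assumptions, not the authors' code: the `Goal` encoding with a `kind` tag is hypothetical, a context is taken to be a frozenset of goals (so setof is the identity), and open is a list whose head is the most recently visited pair. Read this way, the check appears to reject a loop when some until-goal belongs to every context on the loop, since such a goal would be postponed indefinitely.

```python
from typing import FrozenSet, List, NamedTuple, Tuple


class Goal(NamedTuple):
    kind: str   # "until", "always", ... (encoding assumed for this sketch)
    text: str


Context = FrozenSet[Goal]
Pair = Tuple[str, Context]


def verify(pair: Pair, open_list: List[Pair]) -> bool:
    """Intersect the contexts on the loop closed by `pair`; fail on a surviving until-goal."""
    loop_goals = set(pair[1])
    for other in open_list:          # walk back from the head of open
        if other == pair:            # reached the earlier occurrence of the pair
            break
        loop_goals &= other[1]
    else:
        raise ValueError("pair must occur in open_list")
    return not any(g.kind == "until" for g in loop_goals)


until = Goal("until", "<<a>> p U r")
always = Goal("always", "<<a>> G s")
start = ("q1", frozenset({until, always}))

# The until-goal is present in every context of the loop: rejected.
stuck = verify(start, [("q2", frozenset({until, always})), start])

# The until-goal is discharged at q2, so it drops out of the intersection: accepted.
progressing = verify(start, [("q2", frozenset({always})), start])
```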

Definition 4.3 (feasible choice). Given a coalitional planning problem $\Theta = \langle G, C, I, \varphi \rangle$, we say $\vec{m}_C$ is a feasible choice for the coalition C with respect to the state-context pair 〈q, e〉 iff there is a plan $P = \langle C, E, \varphi, \alpha, \tau \rangle$ such that (1) $\alpha(C, q, e) = \vec{m}_C$, and (2) $G^P_{q_i}, \langle q_i, \varphi \rangle \models \varphi$ for every $q_i \in I$.

In other words, a feasible choice is consistent with a plan that can satisfy the goal.

Lemma 4.4 Given a state q and a context e, $\vec{m}_C$ is a feasible choice for coalition C if there is an $e' \in \varepsilon(q, e)$ and an $f \in \mathit{assign}(e', q, \vec{m}_C)$ such that, for every $q' \in \mathit{domain}(f_f)$, there exists at least one feasible choice for C with respect to the state-context pair $\langle q', f_f(q') \rangle$.

Proof Let $P = \langle C, E, e_0, \alpha, \tau \rangle$ be a plan where $\alpha(C, q, e) = \vec{m}_C$, and let $G^P_{q_0}$ be an execution structure of P in a game-like domain G from state $q_0$. Then it is easy to see that $G^P_{q_0}, \langle q, e \rangle \models e$ iff $G^P_{q_0}, \langle q', f_f(q') \rangle \models f_f(q')$ for all $q' \in \mathit{domain}(f_f)$. As there exists at least one feasible choice for C with respect to the state-context pair $\langle q', f_f(q') \rangle$ for all $q' \in \mathit{domain}(f_f)$, we can construct a plan $P'$ such that $P'.\alpha(C, q, e) = \vec{m}_C$ and $G^{P'}_{q_0}, \langle q', f_f(q') \rangle \models f_f(q')$ for all $q' \in \mathit{domain}(f_f)$. So $G^{P'}_{q_0}, \langle q, e \rangle \models e$, and $\vec{m}_C$ is a feasible choice for coalition C with respect to the state-context pair 〈q, e〉.

Based on the above results, we can prove some formal properties of the coalitional planning algorithm.

Theorem 4.5 (Termination). Function build always terminates.

Proof As the sets $D_C(q)$, ε(q, e), $\mathit{assign}(e', q, \vec{m}_C)$ and $\mathit{domain}(f_f)$ are finite, the foreach-loops at lines 9-25 try only finitely many cases. Moreover, lines 4-8 guarantee that a state-context pair is not considered more than once. Hence build, and therefore plan, eventually terminates.

Theorem 4.6 (Correctness). Let P = plan(C, $q_0$, ϕ). Then P can satisfy ϕ from $q_0$.

Proof By Lemmas 4.2 and 4.4, the correctness result is immediate.

5. Concluding Remarks

The planning via model checking paradigm can be extended to solve multi-agent planning problems. The work of van der Hoek and Wooldridge [16] and of Jamroga [11] pioneered this idea. However, their frameworks are limited in that the expressive power of the corresponding logics is not fully exploited in goal specification: only a small fragment of the corresponding language can be used to express goals. By formalizing the coalitional planning problem, we showed that the expressive power of ATL can be fully utilized to express more complex goals in multi-agent planning, and thus to solve more planning problems. The proposed framework can be implemented in existing ATL model checkers such as MOCHA [2]. However, some related problems, such as OBDD representation and computational complexity, are not discussed in this paper; we will address them in future work.

6. Acknowledgement

This work was partly supported by the NSFC (60503021, 60721002, 60875038), the High-tech Research Program of Jiangsu (BG2007038, BE2009142), and the Key Research Projects of Chinese Ministry of Education (108151). We are grateful to the reviewers for their helpful comments.

References

[1] T. Agotnes, V. Goranko, and W. Jamroga. Alternating-time temporal logics with irrevocable strategies. In Proceedings of TARK-07, pages 15–24, 2007.

[2] R. Alur, L. de Alfaro, T. A. Henzinger, S. C. Krishnan, F. Y. C. Mang, S. Qadeer, S. K. Rajamani, and S. Tasiran. MOCHA user manual. University of Berkeley Report, 2000.

[3] R. Alur, T. A. Henzinger, and O. Kupferman. Alternating-time temporal logic. In Proceedings of FOCS-97, pages 100–109, 1997.

[4] R. Alur, T. A. Henzinger, and O. Kupferman. Alternating-time temporal logic. Journal of the ACM, 49(5):672–713, 2002.

[5] J. A. Baier and S. A. McIlraith. Planning with first-order temporally extended goals using heuristic search. In Proceedings of AAAI-06, pages 788–795, 2006.

[6] S. Borgo. Coalitions in action logic. In Proceedings of IJCAI-07, pages 1822–1827, 2007.

[7] A. Cimatti, M. Pistore, M. Roveri, and P. Traverso. Weak, strong, and strong cyclic planning via symbolic model checking. Artificial Intelligence, 147:35–84, 2003.

[8] E. A. Emerson. Handbook of Theoretical Computer Science, chapter Temporal and modal logic, pages 997–1072. 1990.

[9] F. Giunchiglia and P. Traverso. Planning as model checking. In Proceedings of ECP-99, pages 1–20, 1999.

[10] V. Goranko and G. van Drimmelen. Complete axiomatization and decidability of alternating-time temporal logic. Theoretical Computer Science, 353:93–117, 2006.

[11] W. Jamroga. Strategic planning through model checking of ATL formulae. In Artificial Intelligence and Soft Computing: Proceedings of ICAISC-04, pages 879–884, 2004.

[12] F. Kabanza and S. Thiebaux. Search control in planning for temporally extended goals. In Proceedings of ICAPS-05, pages 130–139, 2005.

[13] R. Mattmuller and J. Rintanen. Planning for temporally extended goals as propositional satisfiability. In Proceedings of IJCAI-07, pages 1966–1971, 2007.

[14] M. Pistore and P. Traverso. Planning as model checking for extended goals in non-deterministic domains. In Proceedings of IJCAI-01, pages 479–484, 2001.

[15] W. van der Hoek, W. Jamroga, and M. Wooldridge. A logic for strategic reasoning. In Proceedings of AAMAS-05, pages 157–164, Utrecht, Netherlands, 2005.

[16] W. van der Hoek and M. Wooldridge. Tractable multiagent planning for epistemic goals. In Proceedings of AAMAS-02, pages 1167–1174. ACM Press, 2002.

[17] D. Walther, W. van der Hoek, and M. Wooldridge. Alternating-time temporal logic with explicit strategies. In Proceedings of TARK-07, pages 269–278, 2007.
