Game Description Languages and Logics -...
Transcript of Game Description Languages and Logics -...
Game Description Languages and Logics
Laurent Perrussel – Guifei Jiang – Dongmo Zhang
IJCAI 2019 - Based on our KR-18 and PFIA-19 tutorial
IRIT – Université Toulouse Capitole (FR)Nankai University (CN)Western Sydney University (AU)
1
Motivation
Motivation(1/2)
• Game: describe and justify actions in a multi-agent context
• Autonomy for agent means• Decision making: justify actions (agent rationality)
P1 plays scissor because...
• handling or playing in different environments (facing a newgame)
P2 now plays Tic-tac-toe
2
Motivation (2/2)
Computer Science vs Game Theory?
• Game TheoryMain goal: assessing the graph (i.e. the game) andfind equilibrium or existence of winning strategies
• Computer ScienceMain goal: compact representation, computation ofthe possible next actions and choice
General Game PlayingComputer scientists challenge: build programs sufficiently generalfor playing different games.
3
Organization
Motivation
General Game Playing
Game Description Language
Implementing a Player
Game Description Logic: GDL with a (logic-flavored) semantics
Imperfect Information
Reasoning for winning?
Still a lot to do! - Example: Equivalent games
Perspectives
4
General Game Playing
General Game Playing - Overall organization
More details at http://ggp.org and in[Genesereth and Thielscher, 2014]
Interaction between server and players:
⇒ Game rules & current state of the game
⇐ Moves
5
General Game Playing - Prerequisites
Limited to the shared aspect of the game
• Type of game No randomness - perfect information (boardgame)
• Language Processable by the server and players (game rules)
• Timeclock sync player moves and game run
No prerequisite on players implementation (reasoning isnot compulsory!)
6
General Game Playing - Key challenge
Overall goal: designing intelligent agent
Building players sufficiently general for playing differentgames
GGP competition: players compete by playing at different games.
Challenge is not to build the best player for one game
GGP player will never beat AlphaGo (at least in a Go game!)
7
General Game Playing - Specialized player
• Usually rules of the game hard-coded in the player
• Possibly exhaustive search
• Predefined library of best moves (tactics, ie. library of plans)combined with heuristics
• Library can be learned
8
General Game Playing - Representing games
Game Description Language (GDL)
• GeneralGeneral enough for describing different games: noprimitives related to some specific game
• Game rules and remarkable statesInitial and final states, legal actions...
• CompactLogic-based language, namely first-order logic
9
General Game Playing - Processing GDL
Server
• Not relevant - Zero intelligence
Players
• No specific implementationSeveral implementation are available (Java, Prolog...)
• No specific way to playReasoning, Heuristics, Monte-Carlo, CSP...
10
General Game Playing - GDL Example
Tic-Tac-Toe
Tic Tac Toe (or Noughts and Crosses, Xs and Os) is agame for two players who take turns placing theirmarks in a 3x3 grid. The first player to place threeof his marks in a horizontal, vertical, or diagonalrow wins the game.
11
General Game Playing - GDL Example
Tic-Tac-Toe GDL representation (1/3)
;;; Components;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
(role white)(role black)...
;;; init;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
(init (cell 1 1 b))...(init (cell 3 3 b))(init (control white))
12
General Game Playing - GDL Example
Tic-Tac-Toe GDL representation (2/3)
;;; legal moves(<= (legal ?w (mark ?x ?y))
(true (cell ?x ?y b))(true (control ?w)))
(<= (legal white noop)(true (control black)))
...
;;; next (effects)(<= (next (cell ?m ?n x))
(does white (mark ?m ?n))(true (cell ?m ?n b)))
...13
General Game Playing - GDL Example
Tic-Tac-Toe GDL representation (3/3)
;;; goal(<= (goal white 100)
(line x)(not (line o)))
(<= (goal white 0)(not (line x))(line o))
...
;;; terminal
(<= terminal(line x))
... 14
Game Description Language
GDL - Primitives
Prolog/Datalog like rules with predefined keywords (prefix notation)
Static perspective
• role players of the game(role white)
• init initial state(init (cell 1 1 b))
• terminal terminal state(<= terminal (line x))
• true current state(true (cell 2 2 b))
15
GDL - Primitives (2/2)
Dynamic perspective
• legal rules of the game - possible moves(<= (legal x noop) (true (control o)))
• does performing action (in the current state)(<= (next (cell ?x ?y ?player)) (does?player (mark ?x ?y)))
• next update function(<= (next (control o)) (true (control x)))
• goal objectives of the players(<= (goal ?player 100) (line ?player))
16
GDL - syntax constraints
"Enforcing" game flavor
• sequence of keywords is prohibited
• role only atomic (fixed players)
• next predicate only in heads
• init and true predicates only in bodies
• does predicate only in bodies
• recursion restriction
17
GDL - semantics
A logic programming perspective
• Minimal data set D which are models of a game G : set ofgrounded atoms• ground literal (not p) is satisfied iff p is not in D
• GDL game description: logic program with predefinedpredicate and shape• Complete definition of role, init• legal and goal only defined wrt true• next only defined wrt true and does
• Unique minimal model satisfying the state of the game (ietrue predicate)
• Several minimal models when considering the dynamics (iedoes predicate)
18
GDL - Chess example (1/6)
• Around 1000 lines!
• initial state already complex
• legal moves differ for eachpiece type
• basic rules + specific rules(pawn promotion...)
• no number in GDL: rules forencoding them!
19
GDL - Chess example (2/6)
Initial state
• Two players
• Chess board and pieces• blank cells• black and white rooks
(wr, br)• black and white pawn
(wp)
• First player
(role white)(role black)(init (cell a 1 wr))(init (cell a 2 wp))(init (cell a 3 b))...(init (cell h 8 br))(init (control white))
20
GDL - Chess example (3/6)
Goal states
• Check mate the opponent⇒ should be defined for the
white and black players
• Draw is a good compromise
• Not being checkmate is alsoa goal!
...(<= (goal white 100)
(checkmate black))(<= (goal white 50)
stalemate)(<= (goal white 0)(checkmate white))...
21
GDL - Chess example (4/6)
End of the game
• One player is stuck⇒ regardless king is in check
or not
• After 200 rounds, game isstopped⇒ Numbers and counting
should be defined
(<= (stuck ?pl)(role ?pl)(not (has_legal_move ?pl)))
...(<= terminal
(true (control ?player))(stuck ?player))
(<= terminal(true (step 201)))
...(succ 1 2)(succ 2 3)...
22
GDL - Chess example (5/6)
Legal moves
• Define the moves for eachpiece• what means adjacent?• what means diagonal?• ...
• Define legality• context is OK (players,
piece is on the cell, moveis meaningful...)
(<= (knight_move ?piece ?u ?v?x ?y ?owner)
(piece_owner_type ?piece?owner knight)
(adjacent_two ?v ?y)(adjacent ?u ?x))
...(<= (legal ?player (move ?piece
?u ?v ?x ?y))(true (control ?player))(true (cell ?u ?v ?piece))(occupied_by_opp ?x ?y ?player)(legal2 ?player (move ?piece
?u ?v?x ?y))
...
23
GDL - Chess example (6/6)
Actions and update
• General rules for the gamee.g. blank cell
• specific rules for specificmovese.g. “en passant”
• update the step number
(<= (next (cell ?u ?v b))(does ?player (move ?p ?u
?v ?x ?y)))
(<= (next (cell ?x1 ?y1 b))(does ?player (move ?piece
?x1 ?y1 ?x2 ?y2))(pawn_capture_en_passant
?player ?x1 ?y1 ?x2 ?y2))
(<= (next (step ?y))(true (step ?x))
(succ ?x ?y))
24
Implementing a Player
Implementing a Player
• Free implementation
• Reasoning is not compulsory
• Main technique:• Search-Space and
Heuristics• Compute the value of the
next state
eg. (1) Minimaxeg. (2) Monte-Carlo Tree Search
25
Minimax
Extensive search
• depth first
• min value for theopponents (worstcase)
• max value formyself (best case)
• depth of searchmay be limited(heuristics)
26
MCTS
Monte Carlo TreeSearch
• Run randomsimulation
• Sampling thegame tree
• Estimation ofactions
• MCTS convergeto Minimax
27
Game Description Logic: GDLwith a (logic-flavored) semantics
Signature and Language
Towards reasoning about Perfect Information Games
First step is to build a logic based on GDL [Jiang, 2016]
Signature Agents, actions, propositions:
(N,A,Φ)
Language predefined symbols and temporal operators
ϕ ::= p | initial | terminal | legal(r , a) | wins(r) |does(r , a) | ¬ϕ | ϕ ∧ ψ | ©ϕ
28
Tic-Tac-Toe
GDL description of Tic-tac-Toe:
1. initial ↔ turn(x) ∧ ¬turn(o) ∧3∧
i,j=1¬(px
i,j ∨ poi,j )
2. wins(r)↔3∨
i=1
2∧l=0
pri,1+l ∨3∨
j=1
2∧l=0
pr1+l,j ∨2∧
l=0pr1+l,1+l ∨
2∧l=0
pr1+l,3−l
3. terminal ↔ wins(x) ∨ wins(o) ∨3∧
i,j=1(px
i,j ∨ poi,j )
4. legal(r , ai,j )↔ ¬(pxi,j ∨ po
i,j ) ∧ turn(r) ∧ ¬terminal
5. legal(r , noop)↔ turn(−r)
6. ©pri,j ↔ pri,j ∨ (does(r , ai,j ) ∧ ¬(pxi,j ∨ po
i,j ))
7. turn(r)→©¬turn(r) ∧©turn(−r)
29
State-Transition Model (Perfect-Information Game)
M = (W , I ,T , L,U, g , π)
• W is a non-empty finite set of possible states.
• I ⊆W , representing a set of initial states.
• T ⊆W \ I , representing a set of terminal states.
• L ⊆W \T ×N × 2A is a legality relation, specifying legal actions for eachagent at non-terminal states. Let Lr (w) = {a ∈ A : (w , r , a) ∈ L} bethe set of all legal actions for agent r at state w . To make the gameplayable, we require Lr (w) 6= ∅ for every r ∈ N and w ∈W \T .
• U : W ×A|N| →W \I is an update function, specifying the statetransition for each state and joint action (synchronous moves).
• g : N → 2W is a goal function, specifying the winning states of eachagent.
• π : W → 2Φ is a standard valuation function.
30
ST Model - Details (1/3)
M = (W , I ,T , L,U, g , π)
• Set of states W canbe very large
5 478 statesforTic-Tac-Toe
• Set I = {w0} usuallya singleton
31
ST Model - Details (2/3)
M = (W , I ,T , L,U, g , π)
• Set T of terminalstates consider allcases
958 terminalstates
• winning or drawstates
• winning states gspecific to each agentand subset of T
32
ST Model - Details (3/3)
M = (W , I ,T , L,U, g , π)
• Legal transitions (L)
9 legalactions from〈(1, 1), noop〉to〈(3, 3), noop〉in w0
• Update isdeterministic.
Update can
be defined
while illegal
(eg. 〈noop,noop〉
33
Path
Path δ is an infinite sequence of states and actions
w0d1→ w1
d2→ w2 · · ·dj→ · · ·
such that for all j ≥ 1 and for any r ∈ N,
1. wj = U(wj−1, dj) (state update);
2. (wj−1, dj(r)) ∈ Lr (that is, any action that is taken must belegal);
3. if wj−1 ∈ T , then wj−1 = wj (that is, a loop after reaching aterminal state).
θr (δ, j): action of agent r at stage j of δ
34
Path
Sequence of actions
• Run over an ST-model
• No requirement about first and laststates
• formulas will be interpreted over apath at some step
• δ[j ]: jth state of path δ
• θr (δ, j) action performed by agent rat state j of path δ
eg: θx(δ, 3) = a1,1
35
Semantics
W.r.t. M, some path δ and index j
M, δ, j |= p iff p ∈ π(δ[j ])
M, δ, j |= ¬ϕ iff M, δ, j 6|= ϕ
M, δ, j |= ϕ1 ∧ ϕ2 iff M, δ, j |= ϕ1 and M, δ, j |= ϕ2
M, δ, j |= initial iff δ[j ] ∈ I
M, δ, j |= terminal iff δ[j ] ∈ T
M, δ, j |= wins(r) iff δ[j ] ∈ g(r)
M, δ, j |= legal(r , a) iff a ∈ Lr (δ[j ])
M, δ, j |= does(r , a) iff θr (δ, j) = a
M, δ, j |=©ϕ iff M, δ, j + 1 |= ϕ
36
Path
Tic-Tac-Toe formulas
• M, δ, 0 |= ¬px1,1• M, δ, 1 |= px2,2
• M, δ, 1 |= ¬wins(x)
• M, δ, 1 |= does(o, a1,3)
• M, δ, 2 |=©does(o, a2,3)
• M, δ, 3 |= ¬© wins(x)
37
GDL for reasoning about games
General game properties
• |=∨
r∈N wins(r)→ terminal iff g(r) ⊆ T
Bounded time
• 6|=∧
i∈1..n©i¬wins(r)→©n+1¬wins(r)
0©n: sequence of n ©
38
GDL for reasoning about games
General game playing w.r.t. some ongoing game
• assessing a “strategy” vs 〈game state, move〉 (M, δ)• Look ahead via model checking (PTIME)• Winning move (encoded in δ)?
M, δ, 0 |=©wins(x)
• Prevent opponent x to win?• Choose an action a for x and an action b for −x next move⇒ Check M, δ, 0 |=©does(−x , b) ∧©2wins(−x)
• Choose alternative action a′ for x⇒ Check M, δ′, 0 |=©does(−x , b) ∧©2¬wins(−x)
• Choose other b′ and recheck
• No meta-reasoning in GDL (assessment over paths)
“Try to win, if not prevent to loose” cannot berepresented
39
GDL for reasoning about games
Specific game properties
• Set of rules specific to a game
• Identify pattern for general game playing• Example: Tic-Tac-Toe
• diagonal(x)↔∧
i∈1..3 pxi,i ∨
∧i∈0..2 p
x1+i,3−i
• line(x)↔ diagonal(x) ∨ column(x) ∨ row(x)
• Double threat consequence of move a by x : two potential lines• Meta-reasoning as two paths are considered (eg: row or
column):
For any next move b by −x , pick up x move c andc ′, build path δ, δ′ and check
M, δ, 0 |=©2row(x) or M, δ′, 0 |=©2column(x)
40
GDL and other logical systems
GDL and Linear Time Logic (LTL)
• GDL close to temporal logic
• Temporal logic: modal logic.
• Temporal operators: next, sometimes, always
• No existential and universal modality in GDL“Statement ϕ will hold at some stage” cannot berepresented
41
GDL and other logical systems
GDL and Linear Time Logic (LTL)
• LTL formulas: Xϕ,Gϕ,Fϕ s.t. Gϕ =def ¬F¬ϕ• binary operator: ϕUψ (Until)
• Interpretation over paths: infinite sequence δ of propositionalinterpretations ω
• LTL semantics• δ |= p ⇐⇒ p ∈ δ[0]
• δ |= Xϕ ⇐⇒ δ[1..∞] |= ϕ
• δ |= Fϕ ⇐⇒ there exists i ∈ 0..∞, s.t. δ[i ..∞] |= ϕ
42
GDL and other logical systems
GDL and Linear Time Logic (LTL)
• Mapping between GDL and LTL
• First step: map the signature
• Second step: map the model (interpretations and paths)
• Third step: equivalence result
M, δGDL, 0 |=GDL ϕ ⇐⇒ δLTL |=LTL ϕ
43
GDL and other logical systems
GDL and Linear Time Logic (LTL)
• Map GDL signature (N,A,Φ) and LTL symbols P
P = Φ ∪ {initial , terminal}∪ {legal(r , a) | r ∈ N and a ∈ A}∪ {done(r , a) | r ∈ N and a ∈ A}
∪ {wins(r) | r ∈ N}
• Map between GDL and LTL formulas: Function tr
All symbols are unchanged except © and does
operators replaced by X and done operators
tr(does(r , a)) = Xdone(r , a)
• No more agents and actions! 44
GDL and other logical systems
GDL and Linear Time Logic (LTL)
• Construct an LTL path δ′ wrt. an ST modelM = (W , I ,T , L,U, g , π) and path δ st. for any w ′j of δ
′
• p ∈ w ′j ⇐⇒ p ∈ π(δ[j ]) for any p ∈ Φ
• initial ∈ w ′j ⇐⇒ δ[j ] ∈ I
• terminal ∈ w ′j ⇐⇒ δ[j ] ∈ T
• wins(r) ∈ w ′j ⇐⇒ δ[j ] ∈ g(r)
• legal(r , a) ∈ w ′j ⇐⇒ a ∈ Lr (δ[j ])
• done(r , a) ∈ w ′j ⇐⇒ θr (δ, j − 1) = a and j > 1
• Interplay
M, δ, j |=GDL ϕ ⇐⇒ δ′[j ..∞] |=LTL tr(ϕ)
45
Bisimulation and expressivity of GDL (1/4)
Comparing models of the same game signature LetM = (W , I ,T , L,U, g , π) and M ′ = (W ′, I ′,T ′, L′,U ′, g ′, π′) betwo ST-models (based on the same game signature).
Intuitively, M and M ′ are bisimulation-equivalent (bisimilar, forshort) if
• there is a binary relation Z ⊆W ×W ′ (mapping relation)among states
• paths built on M are equivalent to the paths built on M ′, wrtZ (legality and update)
46
Bisimulation and expressivity of GDL (2/4)
Example with 2 PDL models (Propositional Dynamic Logic)1
Figure 1: Understanding bisimulation
1https://plato.stanford.edu/entries/logic-dynamic/ 47
Bisimulation and expressivity of GDL (3/4)
Formally, assume M and M ′ such that wZw ′, if
1. All the following hold:
1.1 π(w) = π′(w ′);1.2 w ∈ I iff w ′ ∈ I ′;1.3 w ∈ T iff w ′ ∈ T ′;1.4 Lr (w) = L′r (w
′) for any r ∈ N;1.5 w ∈ g(r) iff w ′ ∈ g ′(r) for any r ∈ N.
2. For every d ∈ D and u ∈W , if U(w , d) = u, then there is u′ ∈W ′
s.t. U ′(w ′, d) = u′ and uZu′;
3. For every d ∈ D and u′ ∈W ′, if U ′(w ′, d) = u′, then there isu ∈W s.t. U(w , d) = u and uZu′.
then M and M ′ are bisimilar.
48
Bisimulation and expressivity of GDL (4/4)
Main impact
Two ST-models are bisimulation-equivalent iff theysatisfy the same GDL-formulas.
"Bounded" bisimulation: bisimulation up to k levels.
Exactly the properties of ST-models that are closed underk-bisimulation for some k ∈ N are definable in GDL.
49
Exercises
Exercise 1: Specifying a game
(Simplified) Nim Game
• 2 players sequential game
• 12 sticks
• at each round, each player picks 1, 2 or 3 sticks
• winner of game: the player picking the last stick
Provide the GDL representation
50
Exercise 2: Bisimulation
Show GDL-undefinable property: a player has a winningstrategy, a game will always reach a terminal state.
51
Imperfect Information
Imperfect Information
Example
Figure 2: Krieg Tic-Tac-Toe
• player black and white
• see their own marks only• know turn-taking and available actions
• random only for tiebreak
Question: How to describe and reason about games with imperfect infor-mation? 52
GDL-II - Representing Imperfect Information Games
GDL-II: extension of GDL - Server side [Thielscher, 2010]
• sees specify what a player perceives at the next state(sees ?player (holds ?player ?card))
sees behaviour similar to next: only in head of clauses.
• random random player(role random)
Perform action with parameters randomly set(does random (deal ?player ?card))
53
Krieg Tic-Tac-Toe GDL representation (1/3)
Simultaneous move: possible tie-break
;;; additional random player for tie break(role black)(role white)(role random)
;;; random player can only solve tie break(legal random (tiebreak white))(legal random (tiebreak black))
;;; validmove :- does(r,mark ?m ?n), true(cell(?m ?n,b)).;;; "tried" predicate: "try to mark"
(<= next (tried ?r ?m ?n)(does ?r (mark ?m ?n))not validmove)
(<= next (tried ?r ?m ?n)(true (tried ?r ?m ?n)))
54
Krieg Tic-Tac-Toe GDL representation (2/3)
Solving tie-break
;;; tiebreak(<= (goal white 100)
(true (line white))(true (line black))(does random (tiebreak write)))
;;; tiebreak(<= (goal black 100)
(true (line white))(true (line black))(does random (tiebreak black)))
55
Krieg Tic-Tac-Toe GDL representation (3/3)
Only seeing own moves
;;; (<= sees (r,yourmove)not validmovetrue(control(r))
;;; (<= sees (r,yourmove)validmovetrue(control(r)))
56
GDL-II Semantics
Mapping Game G to State-Transition model
• Σ set of all states S of ground atoms f• Strue = {true(f1), · · · , true(fn)}
• S : set of ground atoms f1 · · · fn• Strue: extension of S with true predicate
• Mdoes = {does(1,a1), · · · , does(r ,ar )}• Mdoes: joint move derivable from G ∪ Strue
• ModelM = (Σ,N,w0, t, l , u, I, g)
• N = {r | G satisfies role(r) }• w0 = {f | G satisfies init(f ) }• u(M,S) = {f | G ∪ Strue ∪Mdoes satisfies next(f ) } for all M
and S
• I = {(r ,M,S , p) | G ∪ Strue ∪Mdoes satisfies sees(r , p) } forall r 6= random, M and S
57
Krieg Tic-Tac-Toe State-Transition model (1/3)
Building up modelM = (N,w0, t, l , u, I, g)
{black,white} ⊆ N
(role black)(role white)(role random)
{cell(1, 1, b), ..., cell(3, 3, b)} ∈ w0 as
(init (cell 1 1 b))...(init (cell 3 3 b))
58
Krieg Tic-Tac-Toe State-Transition model (2/3)
Building up modelM = (N,w0, t, l , u, I, g)
u(〈(1, 1)x , (noop)o〉,w0) = {cell(1, 1, x), ..., cell(3, 3, b)} as
G ∪ wtrue0 ∪ 〈(1, 1)x , (noop)o〉does satisfies (next (cell 1 1 x))
59
Krieg Tic-Tac-Toe State-Transition model (3/3)
Building up modelM = (N,w0, t, l , u, I, g)
(x , 〈(1, 1)x , (3, 3)o〉,w1, cell(1, 1, x)) ∈ I as
G ∪ wtrue0 ∪ 〈(1, 1)x , (3, 3)o〉does satisfies (sees x (cell 1 1 x))
Remind that
(<= sees ?r (cell ?m ?n ?r)(true (cell ?m ?n b))(true (control r))(does ?r (mark ?m ?n)))
Notice that (o, 〈(1, 1)x , (noop)o〉,w1, cell(1, 1, x)) 6∈ I as
G∪wtrue0 ∪〈(1, 1)x , (noop)o〉does does not satisfies (sees o (cell 1 1 x))
60
Epistemic extension of GDL logic
Server side vs Player side
• GDL-II: how to represent incomplete or uncertain information
• GDL-II: server perspective (what is the information flow?)
• Player perspective• How to handle certain and uncertain information?• How to handle other players’ "knowledge"?
61
Epistemic extension: Syntax (1/2)
Extending GDL with epistemic operators [Jiang et al., 2016]
• Krϕ: “agent r knows ϕ”
• Cϕ: as “ϕ is common knowledge among all the agents in N”
Definition (Syntax)
ϕ ::= p | initial | terminal | legal(r , a) | wins(r) | does(r , a) |
¬ϕ | ϕ ∧ ψ | ©ϕ | Krϕ | Cϕ
Eϕ =def∧
r∈N Krϕ
62
Epistemic extension: Syntax (2/2)
Sequential Krieg-Tic-Tac-Toe - Epistemic rules
Additional symbol:
tried(r , ai ,j) represents the fact that player r has tried tomark cell (i , j) but failed
1. tried(r , ai ,j)→ p−ri ,j
2. does(r , ai ,j)→ Kr (does(r , ai ,j))
3. initial → Einitial4. (turn(r)→ Eturn(r)) ∧ (¬turn(r)→ E¬turn(r))
5. (pri ,j → Krpri ,j) ∧ (¬pri ,j → Kr¬pri ,j)
6. (tried(r , ai ,j)→ Kr tried(r , ai ,j)) ∧ (¬tried(r , ai ,j)→Kr¬tried(r , ai ,j))
63
Epistemic extension: Semantics (1/2)
Epistemic state transition (EST) model M is a tuple
(W , I , T , {Rr}r∈N , {Lr}r∈N ,U, g , π)
• W is a non-empty set of possible states.• I ⊆W , representing a set of initial states.• T ⊆W \I , representing a set of terminal states.• Rr ⊆W ×W is an equivalence relation for agent r , indicating
the states that are indistinguishable for r .• Lr ⊆W × Ar is a legality relation for agent r ,• U : W ×
∏r∈N Ar ↪→W \I is a partial update function
• g : N → 2W is a goal function, specifying the winning statesfor each agent.• π : W → 2Φ is a standard valuation function. 64
Epistemic extension: Semantics (2/2)
Imperfect Recallδ ≈j
r δ′ iff δ[j ]Rr δ
′[j ]
Satisfaction with respect to some EST M and path δ
M, δ, j |= Krϕ iff for any δ′ ∈ P, if δ ≈jr δ′, then M, δ′, j |= ϕ
M, δ, j |= Cϕ iff for any δ′ ∈ P, if δ ≈jN δ′, then M, δ′, j |= ϕ
where ≈jN is the transitive closure of
⋃r∈N ≈
jr and P is the set of
all paths in M.
65
EGDL for reasoning about games
General game playing w.r.t. some ongoing game
Figure 3: Player o Knowledge
Player o cannot distinguish between the two states
terminal → Cterminal is not valid
66
EGDL for reasoning about games
General game playing w.r.t. some ongoing game
• assessing a “strategy” vs 〈game state, move〉 (M, δ)
• Looking ahead via model checking (∆p2)
• Winning situation (encoded in δ)?
M, δ, j |= Kr © wins(x)
• Prevent opponent of r to win?
Check M, δ, j |= does(r , a) ∧ Kr ©¬wins(−r)
• Opponent of r may win (wrt. to some r move)?
Check M, δ, j |= ¬Kr¬© (does(−r , a) ∧©wins(−r))
• No complex reasoning over paths in EGDL
67
EGDL for reasoning about games
Specific game properties
Figure 4: Player x move
Player x knows that
does(x , ai ,j)→©Kx(pxi ,j ∨ tried(x , ai ,j))
Hence
Kx tried(x , a1,1)
68
EGDL for reasoning about games
Specific game properties
Figure 5: Player x move
Player x knows that
Kx tried(x , ai ,j)→ Kxpoi ,j
Hence
Kxpo1,1
69
Bisimulation and expressivity of EGDL (1/4)
• Two dimensions in EST-models: the temporal and theepistemic dimension• the corresponding states are undistinguishable, and• the states are reached at the same stage
• A new notion of path-based bisimulation between EST-models• at the current stage, two paths can not be distinguished, i.e.,
the same local properties hold and each agent takes the sameaction
• at the next stage, they also bisimulate each other from boththe temporal and epistemic perspectives.
70
Bisimulation and expressivity of EGDL (2/4)
Let M and M ′ be two ET-models, δ ∈ P(M), δ′ ∈ P(M ′) andj ∈ N. We say M, δ, j and M ′, δ′, j are (m, n)-bisimilar, writtenM, δ, j -n
m M ′, δ′, j , if we have
1. The base case1.1 the local properties hold for δ[j ] and δ′[j ].1.2 θr (δ, j) = θr (δ
′, j) for any r ∈ N.
2. If m > 0, M, δ, j + 1 -nm−1 M ′, δ′, j + 1.
3. If n > 0, then for all o ∈ N ∪ {N},3.1 for any λ ∈ P(M), δ ≈j
o λ, then there is λ′ ∈ P(M ′) s.t.δ′ ≈j
o λ′ and M, λ, j -n−1
m M ′, λ′, j ;3.2 for any λ′ ∈ P(M ′), δ′ ≈j
o λ′, then there is λ ∈ P(M) s.t.
δ ≈jo λ and M, λ, j -n−1
m M ′, λ′, j .
71
Bisimulation and expressivity of EGDL (2/4)
Let M and M ′ be two ET-models, δ ∈ P(M), δ′ ∈ P(M ′) andj ∈ N. We say M, δ, j and M ′, δ′, j are (m, n)-bisimilar, writtenM, δ, j -n
m M ′, δ′, j , if we have
1. The base case1.1 the local properties hold for δ[j ] and δ′[j ].1.2 θr (δ, j) = θr (δ
′, j) for any r ∈ N.
2. If m > 0, M, δ, j + 1 -nm−1 M ′, δ′, j + 1.
3. If n > 0, then for all o ∈ N ∪ {N},3.1 for any λ ∈ P(M), δ ≈j
o λ, then there is λ′ ∈ P(M ′) s.t.δ′ ≈j
o λ′ and M, λ, j -n−1
m M ′, λ′, j ;3.2 for any λ′ ∈ P(M ′), δ′ ≈j
o λ′, then there is λ ∈ P(M) s.t.
δ ≈jo λ and M, λ, j -n−1
m M ′, λ′, j .
71
Bisimulation and expressivity of EGDL (2/4)
Let M and M ′ be two ET-models, δ ∈ P(M), δ′ ∈ P(M ′) andj ∈ N. We say M, δ, j and M ′, δ′, j are (m, n)-bisimilar, writtenM, δ, j -n
m M ′, δ′, j , if we have
1. The base case1.1 the local properties hold for δ[j ] and δ′[j ].1.2 θr (δ, j) = θr (δ
′, j) for any r ∈ N.
2. If m > 0, M, δ, j + 1 -nm−1 M ′, δ′, j + 1.
3. If n > 0, then for all o ∈ N ∪ {N},3.1 for any λ ∈ P(M), δ ≈j
o λ, then there is λ′ ∈ P(M ′) s.t.δ′ ≈j
o λ′ and M, λ, j -n−1
m M ′, λ′, j ;3.2 for any λ′ ∈ P(M ′), δ′ ≈j
o λ′, then there is λ ∈ P(M) s.t.
δ ≈jo λ and M, λ, j -n−1
m M ′, λ′, j .
71
Bisimulation and expressivity of EGDL(3/4)
Let M and M ′ be two ET-models.
• M is globally (m, n)-similar to M ′, written M ∝nm M ′, if for
every δ ∈ P(M), there is δ′ ∈ P(M ′) such thatM, δ, 0 -n
m M ′, δ′, 0.
• M and M ′ are (m, n)-bisimilar, written M -nm M ′, if M ∝n
m M ′
and M ′ ∝nm M.
72
Bisimulation and expressivity of EGDL(4/4)
InvarianceTwo ET-models are (m,n)-bisimulation-equivalent iff they satisfythe same EGDL(m,n)-formulas.
DefinabilityA classM of ET-models is EGDL-definable iffM is closed underglobal (m, n)-simulations for some m, n ∈ N .
Note EGDL(m, n) denote the set of all formulas with the depth ofnext operators and epistemic operators at most m, n, respectively
73
Exercise
Guessing a number
• 2 players game
• Player 1 choose a number n ∈ [1, 10] (initial state)
• Player 2 has to guess n
• After each round, Player 1 informs Player 2 whether itsproposal is too low or too high.
• Player 2 wins if it guesses n in 3 rounds.
Provide the EGDL representation
74
GDL-III - Representing epistemic games
GDL-III: extension of GDL-II - Server side [Thielscher, 2017]
• (knows ?r ?p) specify that player ?r knows information p?
(knows ?player (holds ?player2 ?card))
• (knows ?p) specify that ?p is common knowledge
• knows: introspection operatorGoals can be epistemic: russian card, muddychildren...
• Complex reification mechanism for handling nested knowledgeoperator
75
Imperfect Information: open
Pending questions:
• Benefit of the epistemic dimension?• Imperfect information• Epistemic game
• Different types of imperfect information?• Perfect recall vs Memoryless• Nested operator?
• How to implement?Computational Complexity - Implementation with anepistemic reasoner?
76
Reasoning for winning?
GDL-based Strategic Reasoning
From Game Theory to Logic
• Key question in GT: can the player win?
• What is best response?
• What about rational behaviour and equilibrium?
Much of game theory is about the question whetherstrategic equilibria exist. But there are hardly any explicitlanguages for defining, comparing, or combiningstrategies.
([van Benthem, 2012], p.96.)
77
GDL-based Strategic Reasoning
Focus on the representation of strategies
Take the following actions with indicated priority:
1. Fill the centre if available.2. Fill a cell if it leads to immediately win.3. Fill a cell if otherwise will immediately lose.4. Fill an available corner.5. Fill anywhere available.
Question: How to represent and reason about this strategy? 78
GDL-based Strategy Representation (1/4)
Extend GDL with “Priority” operator: φOψ [Jiang et al., 2014]
φ should hold; if not then ψ hold
M, δ, j |= φ or (Paths(φ, δ[0, j ]) = ∅ and M, δ, j |= ψ)
where Paths(φ, δ[0, j ]) is the set of paths where φ holds at j andsharing initial segment δ[0, j ]:
Paths(does(r , a), δ[0, j ]) = {δ′′} and Paths(does(r , b), δ[0, j ]) = {δ}
79
GDL-based Strategy Representation (2/4)
Suppose M and δ:
• M, δ, 0 |= does(x , a2,2)
• M, δ, 0 |= does(x , a2,2)Odoes(x , a1,3)
• M, δ, 1 |= does(o, a1,3)
• M, δ, 1 6|= does(o, a2,2)
• M, δ, 1 |= does(o, a2,2)Odoes(o, a1,3)
80
GDL-based Strategy Representation (3/4)
Strategy rule
• syntax: φ := ϕ1Oϕ2O · · ·Oϕn
• Non-ambiguous: at any state, φ must “elicit” only one action:
Example: strategy for Tic-Tac-Toe
81
GDL-based Strategy Representation (4/4)
Example: no-loosing strategy for the first player
• Strategy for Player x (1st player)
φx := (turn(x)→ combinedx) ∧ (¬turn(x)→ noopx)
• Strategy rule φx is a no-loosing strategy for xNo way to express the output in the GDL withpriority
82
GDL-based Strategy Representation
A modal reading of the priority operator (1/2)[Zhang and Thielscher, 2015]
• Basic GDL + look ahead operator: �a� ϕIf action a were chosen then ϕ would be true (but ais not executed)
• does operator restricted to joint action: does(a)
• New semantics relative to a state and a joint action: w , a |= ϕ
• w , a |= p iff p ∈ π(w)
• w , a |= does(b) iff a = b
• w , a |=�b � ϕ iff w , b |= ϕ
83
GDL-based Strategy Representation
A modal reading of the priority operator (2/2)
• Prioritised disjunction operator
ϕOψ =def ϕ ∨ (ψ ∧∧c
�c � ¬ϕ)
• In terms of semanticsFor any M, w and a: w , a |= ϕOψ iff eitherw , a |= ϕ or w , a |= ψ but w , c |= ¬ϕ for all c
84
ATL-Reasoning about strategies
ATL: reasoning about cooperation
〈〈C 〉〉ϕ Coalition C can achieve ϕ
• ATL for reasoning about GDL game description[Ruan et al., 2009]
• GDL + ATL [Jiang et al., 2014]• representing game strategies• reasoning about strategic abilities
85
ATL Reasoning about strategies
Alternating-time Temporal Logic - Syntax [Alur et al., 2002]
• Coalition operator 〈〈C 〉〉• Temporal operator © (next), 2 (always),U (until)
ϕ ::= p | ϕ ∨ ϕ | 〈〈C 〉〉 © ϕ | 〈〈C 〉〉2ϕ | 〈〈C 〉〉ϕUϕ
• Coalition and temporal operators always together (3sometimes)
〈〈x〉〉3wins(x) ∨ 〈〈x〉〉3¬wins(−x)
86
ATL Reasoning about strategies
Alternating-time Temporal Logic - Semantics
• based on Concurrent Game Structure (or Transition systems)
A = (Q, q0,N,Π, π, legal , update)
where• Q: set of states• q0: initial state• N: set of agents• Π: propositions• π: valuation function• legal : possible move function for each agent• update: deterministic joint move transition function
• Truth condition relative to a state q
A, q |=ATL ϕ
87
ATL Reasoning about strategies
Alternating-time Temporal Logic - Semantics
• λ: sequence of states called computation
• Additional component: strategy function fa(λ) ∈ legal(a, q)
where q is the last state of λ
FA = {fa|a ∈ A}
• Output of a strategy out(q,FA): the set of possiblecomputations starting in state q where the agents in A use thestrategies FA.
88
ATL Reasoning about strategies
Alternating-time Temporal Logic - Semantics
• A = (Q, q0,N,Π, π, legal , update)
• Truth conditions• A, q |=ATL p iff p ∈ π(q)
• A, q |=ATL 〈〈C 〉〉 © ϕ iff there exists FC such that:
A, λ[1] |=ATL ϕ for all λ ∈ out(q,FC )
• A, q |=ATL 〈〈C 〉〉2ϕ iff there exists FC such that:
A, λ[i ] |=ATL ϕ for all λ ∈ out(q,FC ) and i > 0
• A, q |=ATL 〈〈C 〉〉ϕUψ iff there exists FC for all λ ∈ out(q,FC )
and for some j > 0 such that
A, λ[j ] |=ATL ψ and A, λ[i ] |=ATL ϕ for all 0 ≤ i < j
89
ATL Reasoning about strategies
A = (Q, q0,N,Π, π, legal , update)
• A, q0 |=ATL
〈〈x〉〉2(px1,1 ∨ px3,3)
• A, q2 |=ATL
〈〈o〉〉 © po2,2
90
ATL Reasoning about strategies
ATL for checking GDL specification [Ruan et al., 2009]
• Translation/embedding of GDL theory to ATL
• Model checking is EXPTIME
• Checking soundness
〈〈〉〉2((terminal ∧ ϕ)→ 〈〈〉〉2(terminal ∧ ϕ))
• Winnable ∨i
〈〈i〉〉3wins(i)
• Sequential
〈〈〉〉2(〈〈N〉〉 © ϕ→∨i
〈〈i〉〉 © ϕ)
91
GDL+ATL Reasoning about strategies
Generic Game Results [Jiang et al., 2014]
Let M∗ be a state transition model for finite turn-taking games ofperfect information with N = {r1, r2}, then
Weak DeterminacyM∗ |=
〈〈r1〉〉2(terminal → wins(r1)) ∨ 〈〈r2〉〉2¬〈〈r1〉〉2(terminal → wins(r1))
Zermelo’s TheoremM∗ |= 〈〈r1〉〉2(terminal → wins(r1)) ∨ 〈〈r2〉〉2(terminal → wins(r2))
∨(〈〈r1〉〉2(terminal → Tie) ∧ 〈〈r2〉〉2(terminal → Tie))
where Tie =def (¬wins(r1) ∧ ¬wins(r2)).
92
GDL+ATL Reasoning about strategies
Complexity Result
• Verify properties of a strategy, e.g., no-losing strategy forplayer x in Tic-Tac-Toe:
〈〈x〉〉2(terminal → ¬wins(o))
• Model-checking problem: in PSPACE.• Reduce to the model-checking problem of alternating-time
temporal logic [Schobbens, 2004].
93
GDL+ATL Reasoning about strategies - On going work
Mixing priority and ATL operators (ongoing work)
• agent r may win?
posCheck r =∨
i ,j∈1..3
does(r , ai ,j)→ 〈〈r〉〉3check r
• agent r can prevent −r to win
posBlock r =∨
i ,j∈1..3
does(r , ai ,j)→ 〈〈r〉〉3block r
(Towards) General strategic player
check rOblock rOposCheck raOposBlockra
94
GDL-based Strategy Representation - Going further
Pending questions:
• How to represent strategies in imperfect information games?
• How to design strategies?Connection with Machine Learning and Planning
• Generalize strategies?Are they any common points (General StrategicReasoning)
• How to implement?Complexity of strategic reasoning and complexity ofthe game
95
Still a lot to do! - Example:Equivalent games
Equivalent games (1/3)
Number Scrabble:
1. initial ↔ turn(b) ∧ ¬turn(w) ∧9∧
i=1¬(s(b, i) ∨ s(w, i))
2. wins(r)↔(∨3
i=2(s(r , i) ∧ s(r , 4) ∧ s(r , 11− i))∨∨2i=1(s(r , i) ∧ s(r , 6) ∧ s(r , 9− i))∨∨4i=1(s(r , 5− i) ∧ s(r , 5) ∧ s(r , 5 + i))
)3. terminal ↔ wins(b) ∨ wins(w) ∨
9∧i=1
(s(b, i) ∨ s(w, i))
4. legal(r , pick(n))↔ ¬(s(b, n) ∨ s(w, n)) ∧ turn(r) ∧ ¬terminal
5. legal(r , noop)↔ turn(−r) ∨ terminal
6. ©s(r , n)↔ s(r , n) ∨ (¬(s(b, n) ∨ s(w, n)) ∧ does(r , pick(n)))
7. turn(r) ∧ ¬terminal →©¬turn(r) ∧©turn(−r)
96
Equivalent games (2/3)
Equivalence
Semantics 2 models (State-Transition) with a bisimulationbetween them
Syntax Set of rules are equivalent
Number Scrabble and Tic-Tac-Toe are equivalent
97
Equivalent Games (3/3)
Pending questions:
• Loose equivalenceA game is “close ” to a second one? Restrictedequivalence to a sub-part of the game?
• Connecting equivalence and strategic reasoning“ready-to-go” strategies
• How to implementComplexity for deciding whether two games areequivalent. Available heuristics?
98
Perspectives
A lot of questions!
On GDL:
• Connecting action and strategy
• Imperfect Information
• Games comparison
Still on GDL
• Connection to planning
• Construction of a General Player?Is it realistic to reason with GDL formulas?
99
Main references i
Alur, R., Henzinger, T. A., and Kupferman, O. (2002).Alternating-time temporal logic.Journal of the ACM, 49(5):672–713.
Genesereth, M. and Thielscher, M. (2014).General game playing, volume 8 of Synthesis Lectures onArtificial Intelligence and Machine Learning.Morgan & Claypool Publishers.
Jiang, G. (2016).Logics for strategic reasoning and collectivedecision-making.PhD thesis, Western Sydney & University Université Toulouse1.
100
Main references ii
Jiang, G., Perrussel, L., and Zhang, D. (2017).On axiomatization of epistemic GDL.In LORI, volume 10455 of Lecture Notes in Computer Science,pages 598–613. Springer.
Jiang, G., Zhang, D., and Perrussel, L. (2014).GDL meets ATL: A logic for game description andstrategic reasoning.In Proceedings of the 13th Pacific Rim International Conferenceon Artificial Intelligence (PRICAI’14), pages 733–746. Springer.
101
Main references iii
Jiang, G., Zhang, D., Perrussel, L., and Zhang, H. (2016).Epistemic GDL: A logic for representing and reasoningabout imperfect information games.In Proceedings of the 25th International Joint Conference onArtificial Intelligence (IJCAI’16), pages 1138–1144.
Ruan, J., van der Hoek, W., and Wooldridge, M. (2009).Verification of games in the game description language.J. Log. Comput., 19(6):1127–1156.
Schobbens, P.-Y. (2004).Alternating-time logic with imperfect recall.Electronic Notes in Theoretical Computer Science,85(2):82–93.
102
Main references iv
Thielscher, M. (2010).A general game description language for incompleteinformation games.In AAAI. AAAI Press.
Thielscher, M. (2017).GDL-III: A description language for epistemic generalgame playing.In IJCAI, pages 1276–1282. ijcai.org.
Zhang, D. and Thielscher, M. (2015).A logic for reasoning about game strategies.In Proceedings of the 29th AAAI Conference on ArtificialIntelligence (AAAI’15), pages 1671–1677.
103
Want to work on it
Post-Graduate - Master thesis: Strategic player
Starting date: Sept 19 - 6 months (flexible)Keywords: strategic reasoning, GDL, ATL,Model-checking based player.
More details: [email protected]
104
Appendix
Proof theory of GDL
Mainly consists of axiom schemas for ©, modus ponens inferencerule and general game properties:
Axioms
1. All tautologies of classical propositional logic.2. ©(ϕ→ ψ)→ (©ϕ→©ψ)
3. ¬© ϕ→©¬ϕ
Axioms for general game properties
4. © initial
5. terminal →∧
ar∈Ar\{noopr} ¬legal(ar ) ∧ legal(noopr )
6.∨
ar∈Ar does(r , a)
7. ¬(does(r , a) ∧ does(r , b)) for ar 6= br .8. does(ar )→ legal(ar )
9. ϕ ∧ terminal →©ϕ 105
GDL and other logical systems
GDL and Propositional Dynamic Logic (PDL)
• PDL formulas: [α]ϕ s.t. [α]ϕ =def ¬〈α〉¬ϕ• α limited to atomic program and sequence
• Interpretation over Kripke structure M = (W ,Rα, v)
• PDL semantics• M,w |= p ⇐⇒ p ∈ v(w)
• M,w |= [α]ϕ iff for all w ′ ∈ Rα, M,w ′ |= ϕ
106
GDL and other logical systems
GDL and Propositional Dynamic Logic (PDL)
• Mapping between GDL and PDL
• First step: map the signature and formulas
• Second step: map the model (interpretations and paths)
• Third step: mapping result
MGDL, δGDL, j |=GDL ϕ ⇐⇒ MPDL,wj |=PDL tr(ϕ)
107
GDL and other logical systems
GDL and Propositional Dynamic Logic (PDL)
• Mapping between GDL and PDL
• First step: map the signature and formulas
• Second step: map the model (interpretations and paths)
• Third step: mapping result
MGDL, δGDL, j |=GDL ϕ ⇐⇒ MPDL,wj |=PDL tr(ϕ)
108
Epistemic extension: Axiomatics (1/3)
Mainly consists of axiom schemas and inference rules for ©,Kr ,Cand general game properties [Jiang et al., 2017]
Axioms
1. All tautologies of classical propositional logic.
Axioms for general game properties
2. © initial
3. terminal →∧
ar∈Ar\{noopr} ¬legal(ar ) ∧ legal(noopr )
4.∨
ar∈Ar does(ar )
5. ¬(does(ar ) ∧ does(br )) for ar 6= br .
6. does(ar )→ legal(ar )
7. ϕ ∧ terminal →©ϕ
109
Epistemic extension: Axiomatics (2/3)
Axioms for ©,Kr ,C
8. ©(ϕ→ ψ)→ (©ϕ→©ψ)
9. ¬© ϕ↔©¬ϕ10. Kr (ϕ→ ψ)→ (Krϕ→
Krψ)
11. Krϕ→ ϕ
12. Krϕ→ KrKrϕ
13. ¬Krϕ→ Kr¬Krϕ
14. Eϕ↔∧m
r=1 Krϕ
15. Cϕ→ E(ϕ ∧ Cϕ)
Inference Rules
(R1) From ϕ, ϕ→ ψ
infer ψ.
(R2) From ϕ infer ©ϕ.(R3) From ϕ infer Krϕ.
(R4) From ϕ→ E(ϕ∧ψ)
infer ϕ→ Cψ.
110
Epistemic extension: Axiomatics (3/3)
Derivation about Krieg-Tic-Tac-Toe (full description: ΣKT ).
Proposition
For any r ∈ NKT and ari ,j ∈ ArKT ,
1. `ΣKTinitial → Cinitial
2. `ΣKTlegal(ari ,j)→ Kr (legal(ari ,j))
3. `ΣKTdoes(ari ,j)→©Kr (pri ,j ∨ tried(ari ,j))
4. `ΣKTKr tried(ari ,j)→ Krp
−ri ,j
111
Completeness... in one slide
Overall picture
the closure of
𝑘(), 𝑘,𝑟()
the pre-model
the ET-model
(
)
Def 6 Def 9
for some MCS
0()
, *
Prop 3
Consistent
(
),
Lem 1 Lem 2
Extend to a
path in
Construct a path
in (
)
from path
Satisfiable
Prop 9
112