Game playing Notes adapted from lecture notes for CMSC 421 by B.J. Dorr.
game playing
Notes adapted from lecture notes for CMSC 421 by B.J. Dorr
2
Outline
• What are games?
• Optimal decisions in games
– Which strategy leads to success?
• Alpha-beta pruning
• Games of imperfect information
• Games that include an element of chance
3
What are and why study games?
• Games are a form of multi-agent environment
– What do other agents do and how do they affect our success?
– Cooperative vs. competitive multi-agent environments
– Competitive multi-agent environments give rise to adversarial problems, a.k.a. games
• Why study games?
– Fun; historically entertaining
– Interesting subject of study because they are hard
– Easy to represent, and agents are restricted to a small number of actions
4
Relation of Games to Search
• Search – no adversary
– Solution is a (heuristic) method for finding a goal
– Heuristics and CSP techniques can find the optimal solution
– Evaluation function: estimate of cost from start to goal through a given node
– Examples: path planning, scheduling activities
• Games – adversary
– Solution is a strategy (a strategy specifies a move for every possible opponent reply)
– Time limits force an approximate solution
– Evaluation function: evaluates the “goodness” of a game position
– Examples: chess, checkers, Othello, backgammon
5
Types of Games
[Table omitted in the transcript: games classified along two axes – deterministic vs. chance, and perfect vs. imperfect information.]
6
Game setup
• Two players: MAX and MIN
• MAX moves first, and they take turns until the game is over. The winner gets a reward, the loser a penalty.
• Games as search:
– Initial state: e.g. board configuration of chess
– Successor function: list of (move, state) pairs specifying legal moves
– Terminal test: is the game finished?
– Utility function: gives a numerical value for terminal states, e.g. win (+1), lose (-1), and draw (0) in tic-tac-toe (next)
• MAX uses a search tree to determine its next move.
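As an illustration, the four components above can be written down directly for tic-tac-toe. This is a minimal sketch: the function names mirror the slide's terms, and the 9-tuple row-major board encoding ("x" for MAX, "o" for MIN, " " for empty) is an assumption made for the example.

```python
def initial_state():
    return (" ",) * 9                       # empty 3x3 board, row-major

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def winner(board):
    for i, j, k in LINES:
        if board[i] != " " and board[i] == board[j] == board[k]:
            return board[i]
    return None

def to_move(board):
    # MAX ("x") moves first, so it is to move whenever the mark counts tie.
    return "x" if board.count("x") == board.count("o") else "o"

def successors(board):
    # Successor function: (move, state) pairs, one per empty square.
    p = to_move(board)
    return [(i, board[:i] + (p,) + board[i + 1:])
            for i in range(9) if board[i] == " "]

def terminal_test(board):
    return winner(board) is not None or " " not in board

def utility(board):
    # Win for MAX: +1, win for MIN: -1, draw: 0.
    return {"x": +1, "o": -1}.get(winner(board), 0)
```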
7
Partial Game Tree for Tic-Tac-Toe
8
Optimal strategies
• Find the contingent strategy for MAX assuming an infallible MIN opponent.
• Assumption: both players play optimally!
• Given a game tree, the optimal strategy can be determined by using the minimax value of each node:

MINIMAX-VALUE(n) =
  UTILITY(n)                                  if n is a terminal state
  max_{s ∈ successors(n)} MINIMAX-VALUE(s)    if n is a MAX node
  min_{s ∈ successors(n)} MINIMAX-VALUE(s)    if n is a MIN node
9
Two-Ply Game Tree
10
Two-Ply Game Tree
• MAX nodes
• MIN nodes
• Terminal nodes – utility values for MAX
• Other nodes – minimax values
• MAX’s best move at the root – a1 – leads to the successor with the highest minimax value
• MIN’s best reply – b1 – leads to the successor with the lowest minimax value
11
What if MIN does not play optimally?
• Definition of optimal play for MAX assumes MIN plays optimally: maximizes worst-case outcome for MAX.
• But if MIN does not play optimally, MAX will do even better. [Can be proved.]
12
Minimax Algorithm

function MINIMAX-DECISION(state) returns an action
  inputs: state, current state in game
  v ← MAX-VALUE(state)
  return the action in SUCCESSORS(state) with value v

function MAX-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← -∞
  for a, s in SUCCESSORS(state) do
    v ← MAX(v, MIN-VALUE(s))
  return v

function MIN-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← +∞
  for a, s in SUCCESSORS(state) do
    v ← MIN(v, MAX-VALUE(s))
  return v
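The pseudocode above translates almost line for line into Python. In this sketch the game interface is passed in as functions; the names, and the two-ply example tree with leaf utilities chosen for illustration, are assumptions rather than code from the lecture.

```python
def min_value(state, successors, terminal_test, utility):
    if terminal_test(state):
        return utility(state)
    return min(max_value(s, successors, terminal_test, utility)
               for _, s in successors(state))

def max_value(state, successors, terminal_test, utility):
    if terminal_test(state):
        return utility(state)
    return max(min_value(s, successors, terminal_test, utility)
               for _, s in successors(state))

def minimax_decision(state, successors, terminal_test, utility):
    # Choose the action leading to the successor (a MIN node)
    # with the highest minimax value.
    return max(successors(state),
               key=lambda pair: min_value(pair[1], successors,
                                          terminal_test, utility))[0]

# A two-ply example tree: MIN backs up 3, 2, and 2 to the three
# MIN nodes, so MAX's best move at the root is a1 (value 3).
TREE = {
    "root": [("a1", "B"), ("a2", "C"), ("a3", "D")],
    "B": [("b1", 3), ("b2", 12), ("b3", 8)],
    "C": [("c1", 2), ("c2", 4), ("c3", 6)],
    "D": [("d1", 14), ("d2", 5), ("d3", 2)],
}
succ = lambda s: TREE[s]
term = lambda s: isinstance(s, int)   # leaves are their utility values
util = lambda s: s
```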
13
Properties of Minimax
Criterion    Minimax
Complete?    Yes
Time         O(b^m)
Space        O(bm)
Optimal?     Yes
14
Tic-Tac-Toe
• Depth bound: 2
• Breadth-first search until all nodes at level 2 are generated
• Apply the evaluation function to the positions at these nodes
15
Tic-Tac-Toe
• Evaluation function e(p) of a position p:
– If p is not a winning position for either player:
  e(p) = (number of complete rows, columns, or diagonals that are still open for MAX) − (number of complete rows, columns, or diagonals that are still open for MIN)
– If p is a win for MAX: e(p) = +∞
– If p is a win for MIN: e(p) = −∞
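The definition above can be sketched directly in code. This assumes a 9-tuple row-major board with "x" for MAX, "o" for MIN, and " " for empty; the encoding is an illustrative choice, not from the slides.

```python
import math

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def e(board):
    for i, j, k in LINES:
        line = (board[i], board[j], board[k])
        if line == ("x", "x", "x"):
            return math.inf                  # win for MAX
        if line == ("o", "o", "o"):
            return -math.inf                 # win for MIN
    # A line is "open" for a player if the opponent has no mark in it.
    open_max = sum(1 for i, j, k in LINES
                   if "o" not in (board[i], board[j], board[k]))
    open_min = sum(1 for i, j, k in LINES
                   if "x" not in (board[i], board[j], board[k]))
    return open_max - open_min
```

For example, an x alone in the center leaves all 8 lines open for MAX but only the 4 lines avoiding the center open for MIN, so e(p) = 8 − 4 = 4.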
16
Tic-Tac-Toe - First stage
[Figure: game tree for MAX’s first move. Positions two plies deep are scored with e(p) – values such as 6-5=1, 5-5=0, and 4-6=-2 appear – and the backed-up values of MAX’s distinct first moves include 1, -1, and -2. MAX moves to the position with the highest backed-up value.]
17
Tic-Tac-Toe - Second stage
[Figure: game tree for MAX’s second move after MIN’s reply. Positions two plies deeper are scored with e(p) – values such as 3-2=1, 5-2=3, and 4-3=1 appear – and the backed-up values of MAX’s candidate moves include 0 and 1. MAX again moves to a position with the highest backed-up value.]
18
Multiplayer games
• Games may allow more than two players
• Scalar minimax values become vectors of values, one component per player
19
Problem of minimax search
• The number of game states is exponential in the number of moves.
– Solution: do not examine every node
– ⇒ alpha-beta pruning
• Alpha = value of the best choice found so far at any choice point along the path for MAX
• Beta = value of the best choice found so far at any choice point along the path for MIN
• Revisit the example …
20
Tic-Tac-Toe - First stage
[Figure: the first-stage tree revisited. All successors of node A have been generated and back up the value -1, so the start node’s alpha value is -1. Node B’s first successor, node C, has static value 5-6 = -1, so node B’s beta value is -1 and search below B can be cut off.]
21
Alpha-beta pruning
• Search progresses in a depth-first manner
• Whenever a tip node is generated, its static evaluation is computed
• Whenever a position can be given a backed-up value, this value is computed
• Node A and all its successors have been generated; its backed-up value is -1
• Node B and its successors have not yet been generated
• Now we know that the backed-up value of the start node is bounded from below by -1
22
Alpha-beta pruning
• Depending on the backed-up values of the other successors of the start node, the final backed-up value of the start node may be greater than -1, but it cannot be less
• This lower bound is the alpha value of the start node
23
Alpha-beta pruning
• Depth-first search proceeds – node B and its first successor, node C, are generated
• Node C is given a static value of -1
• The backed-up value of node B is therefore bounded from above by -1
• This upper bound on node B is its beta value
• Therefore search below node B can be discontinued – node B cannot turn out to be preferable to node A
24
Alpha-beta pruning
• The reduction in search effort is achieved by keeping track of bounds on backed-up values
• As successors of a node are given backed-up values, the bounds can be revised
– Alpha values of MAX nodes can never decrease
– Beta values of MIN nodes can never increase
25
Alpha-beta pruning
• Therefore search can be discontinued:
– Below any MIN node having a beta value less than or equal to the alpha value of any of its MAX-node ancestors
  • The final backed-up value of this MIN node can then be set to its beta value
– Below any MAX node having an alpha value greater than or equal to the beta value of any of its MIN-node ancestors
  • The final backed-up value of this MAX node can then be set to its alpha value
26
Alpha-beta pruning
• During search, alpha and beta values are computed as follows:
– The alpha value of a MAX node is set equal to the current largest final backed-up value of its successors
– The beta value of a MIN node is set equal to the current smallest final backed-up value of its successors
27
Alpha-Beta Example
[-∞, +∞]
[-∞,+∞]
Range of possible values
Do DF-search until the first leaf
28
Alpha-Beta Example (continued)
[-∞,3]
[-∞,+∞]
29
Alpha-Beta Example (continued)
[-∞,3]
[-∞,+∞]
30
Alpha-Beta Example (continued)
[3,+∞]
[3,3]
31
Alpha-Beta Example (continued)
[-∞,2]
[3,+∞]
[3,3]
This node is worse for MAX
32
Alpha-Beta Example (continued)
[-∞,2]
[3,14]
[3,3] [-∞,14]
33
Alpha-Beta Example (continued)
[−∞,2]
[3,5]
[3,3] [-∞,5]
34
Alpha-Beta Example (continued)
[2,2][−∞,2]
[3,3]
[3,3]
35
Alpha-Beta Example (continued)
[2,2][-∞,2]
[3,3]
[3,3]
36
Alpha-Beta Algorithm

function ALPHA-BETA-SEARCH(state) returns an action
  inputs: state, current state in game
  v ← MAX-VALUE(state, -∞, +∞)
  return the action in SUCCESSORS(state) with value v

function MAX-VALUE(state, α, β) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← -∞
  for a, s in SUCCESSORS(state) do
    v ← MAX(v, MIN-VALUE(s, α, β))
    if v ≥ β then return v
    α ← MAX(α, v)
  return v
37
Alpha-Beta Algorithm
function MIN-VALUE(state, α, β) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← +∞
  for a, s in SUCCESSORS(state) do
    v ← MIN(v, MAX-VALUE(s, α, β))
    if v ≤ α then return v
    β ← MIN(β, v)
  return v
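The two functions above can be rendered as runnable Python. The game interface passed in as functions and the example tree are illustrative assumptions; a list of visited leaves is added so the pruning is observable.

```python
import math

LEAVES_SEEN = []                       # records every leaf actually evaluated

def ab_max_value(state, alpha, beta, succ, term, util):
    if term(state):
        LEAVES_SEEN.append(state)
        return util(state)
    v = -math.inf
    for _, s in succ(state):
        v = max(v, ab_min_value(s, alpha, beta, succ, term, util))
        if v >= beta:                  # a MIN ancestor already has a better option
            return v                   # prune the remaining successors
        alpha = max(alpha, v)
    return v

def ab_min_value(state, alpha, beta, succ, term, util):
    if term(state):
        LEAVES_SEEN.append(state)
        return util(state)
    v = math.inf
    for _, s in succ(state):
        v = min(v, ab_max_value(s, alpha, beta, succ, term, util))
        if v <= alpha:                 # a MAX ancestor already has a better option
            return v
        beta = min(beta, v)
    return v

# Two-ply example tree: minimax value is 3, but alpha-beta never
# needs to look at the leaves 4 and 6 under the second MIN node.
TREE = {
    "root": [("a1", "B"), ("a2", "C"), ("a3", "D")],
    "B": [("b1", 3), ("b2", 12), ("b3", 8)],
    "C": [("c1", 2), ("c2", 4), ("c3", 6)],
    "D": [("d1", 14), ("d2", 5), ("d3", 2)],
}
succ = lambda s: TREE[s]
term = lambda s: isinstance(s, int)
util = lambda s: s
```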
38
General alpha-beta pruning
• Consider a node n somewhere in the tree
• If the player has a better choice at
– the parent node of n
– or any choice point further up
then n will never be reached in actual play
• Hence, when enough is known about n, it can be pruned
39
Final Comments about Alpha-Beta Pruning
• Pruning does not affect the final result
• Entire subtrees can be pruned
• Good move ordering improves the effectiveness of pruning
• Alpha-beta pruning can look ahead roughly twice as far as minimax in the same amount of time
• Repeated states are again possible
– Store them in memory – a transposition table
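A transposition table in its simplest form is just memoization: backed-up values are cached keyed by position (and player to move), so a state reached by different move orders is evaluated only once. This sketch uses an illustrative interface; the names are not from a particular library.

```python
def minimax_tt(state, is_max, succ, term, util, table):
    """Minimax value of state, caching results in a transposition table."""
    key = (state, is_max)
    if key in table:                      # repeated state: reuse cached value
        return table[key]
    if term(state):
        v = util(state)
    else:
        children = [minimax_tt(s, not is_max, succ, term, util, table)
                    for _, s in succ(state)]
        v = max(children) if is_max else min(children)
    table[key] = v
    return v
```

For example, if two different first moves transpose into the same position "X", the subtree below "X" is searched once and the second visit is a table lookup.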
40
Games that include chance
• Possible moves: (5-10,5-11), (5-11,19-24), (5-10,10-16), and (5-11,11-16)
41
Games that include chance
• Possible moves: (5-10,5-11), (5-11,19-24), (5-10,10-16), and (5-11,11-16)
• Rolls [1,1] and [6,6] each have probability 1/36; all other rolls have probability 1/18
• Chance nodes represent the dice rolls
42
Games that include chance
• Rolls [1,1] and [6,6] each have probability 1/36; all other rolls have probability 1/18
• We cannot calculate a definite minimax value, only an expected value
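The dice probabilities above can be checked with a few lines of arithmetic: of the 36 equally likely ordered rolls, each double occurs exactly once (probability 1/36), while each of the 15 distinct non-double rolls occurs twice (probability 2/36 = 1/18).

```python
from fractions import Fraction
from itertools import product

prob = {}
for a, b in product(range(1, 7), repeat=2):   # all 36 ordered rolls
    roll = (min(a, b), max(a, b))             # dice are unordered
    prob[roll] = prob.get(roll, Fraction(0)) + Fraction(1, 36)
```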
43
Expected minimax value

EXPECTED-MINIMAX-VALUE(n) =
  UTILITY(n)                                               if n is a terminal state
  max_{s ∈ successors(n)} EXPECTED-MINIMAX-VALUE(s)        if n is a MAX node
  min_{s ∈ successors(n)} EXPECTED-MINIMAX-VALUE(s)        if n is a MIN node
  Σ_{s ∈ successors(n)} P(s) · EXPECTED-MINIMAX-VALUE(s)   if n is a chance node

These equations can be backed up recursively all the way to the root of the game tree.
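The four cases of this recursion can be sketched as a single function. The node-kind labeling ("max", "min", "chance", "terminal") and the convention that chance-node successors carry (probability, state) pairs are illustrative assumptions.

```python
def expectiminimax(node, kind_of, succ, util):
    kind = kind_of(node)
    if kind == "terminal":
        return util(node)
    if kind == "max":
        return max(expectiminimax(s, kind_of, succ, util)
                   for s in succ(node))
    if kind == "min":
        return min(expectiminimax(s, kind_of, succ, util)
                   for s in succ(node))
    # chance node: probability-weighted average of successor values
    return sum(p * expectiminimax(s, kind_of, succ, util)
               for p, s in succ(node))
```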
44
Position evaluation with chance nodes
• Left: A1 wins; right: A2 wins
• The move chosen may change when the evaluation values are scaled differently, even if their order is preserved
• Behavior is preserved only by a positive linear transformation of EVAL
45
Discussion
• Examine the section on state-of-the-art game programs yourself
• Minimax assumes the right tree is better than the left, yet …
– One remedy: return a probability distribution over possible values
– But that is an expensive calculation
46
Discussion
• Utility of node expansion
– Only expand those nodes which lead to significantly better moves
• Both suggestions require meta-reasoning
47
Summary
• Games are fun (and dangerous)
• They illustrate several important points about AI
– Perfection is unattainable → must approximate
– Good idea to think about what to think about
– Uncertainty constrains the assignment of values to states
• Games are to AI as grand prix racing is to automobile design