Game playing Notes adapted from lecture notes for CMSC 421 by B.J. Dorr.


Outline

• What are games?

• Optimal decisions in games
– Which strategy leads to success?

• α-β pruning

• Games of imperfect information

• Games that include an element of chance


What are games, and why study them?

• Games are a form of multi-agent environment
– What do other agents do and how do they affect our success?
– Cooperative vs. competitive multi-agent environments
– Competitive multi-agent environments give rise to adversarial problems, a.k.a. games

• Why study games?
– Fun; historically entertaining
– Interesting subject of study because they are hard
– Easy to represent, and agents are restricted to a small number of actions


Relation of Games to Search

• Search – no adversary
– Solution is a (heuristic) method for finding a goal
– Heuristics and CSP techniques can find an optimal solution
– Evaluation function: estimate of cost from start to goal through a given node
– Examples: path planning, scheduling activities

• Games – adversary
– Solution is a strategy (a strategy specifies a move for every possible opponent reply)
– Time limits force an approximate solution
– Evaluation function: evaluates the "goodness" of a game position
– Examples: chess, checkers, Othello, backgammon


Types of Games


Game setup

• Two players: MAX and MIN
• MAX moves first, and they take turns until the game is over. The winner gets a reward; the loser gets a penalty.
• Games as search:
– Initial state: e.g. board configuration of chess
– Successor function: list of (move, state) pairs specifying legal moves
– Terminal test: Is the game finished?
– Utility function: gives a numerical value for terminal states, e.g. win (+1), lose (-1) and draw (0) in tic-tac-toe (next)

• MAX uses a search tree to determine the next move.


Partial Game Tree for Tic-Tac-Toe


Optimal strategies

• Find the contingent strategy for MAX assuming an infallible MIN opponent.
• Assumption: both players play optimally!
• Given a game tree, the optimal strategy can be determined by using the minimax value of each node:

$$
\text{MINIMAX-VALUE}(n) =
\begin{cases}
\text{UTILITY}(n) & \text{if } n \text{ is a terminal node} \\
\max_{s \in \text{successors}(n)} \text{MINIMAX-VALUE}(s) & \text{if } n \text{ is a MAX node} \\
\min_{s \in \text{successors}(n)} \text{MINIMAX-VALUE}(s) & \text{if } n \text{ is a MIN node}
\end{cases}
$$


Two-Ply Game Tree

• MAX nodes
• MIN nodes
• Terminal nodes – utility values for MAX
• Other nodes – minimax values
• MAX's best move at root – a1 – leads to the successor with the highest minimax value
• MIN's best reply – b1 – leads to the successor with the lowest minimax value


What if MIN does not play optimally?

• The definition of optimal play for MAX assumes that MIN plays optimally: it maximizes the worst-case outcome for MAX.

• But if MIN does not play optimally, MAX will do at least as well, and possibly better. [Can be proved.]


Minimax Algorithm

function MINIMAX-DECISION(state) returns an action
  inputs: state, current state in game
  v ← MAX-VALUE(state)
  return the action in SUCCESSORS(state) with value v

function MIN-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← +∞
  for a, s in SUCCESSORS(state) do
    v ← MIN(v, MAX-VALUE(s))
  return v

function MAX-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← −∞
  for a, s in SUCCESSORS(state) do
    v ← MAX(v, MIN-VALUE(s))
  return v
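The pseudocode translates almost line by line into Python. The sketch below is an illustration under stated assumptions: the tree is an explicit nested dict rather than a real game, and the leaf utilities (3, 12, 8 / 2, 4, 6 / 14, 5, 2) and move labels are assumed from the usual textbook two-ply example, since the figure itself is not part of this transcript.

```python
import math

# Assumed example tree: an inner node maps each move to a subtree,
# and a leaf is a number (the utility of that terminal state for MAX).
TREE = {
    "a1": {"b1": 3, "b2": 12, "b3": 8},
    "a2": {"c1": 2, "c2": 4, "c3": 6},
    "a3": {"d1": 14, "d2": 5, "d3": 2},
}


def terminal_test(state):
    return not isinstance(state, dict)


def utility(state):
    return state


def successors(state):
    """List of (move, state) pairs, as in the game setup."""
    return list(state.items())


def max_value(state):
    if terminal_test(state):
        return utility(state)
    v = -math.inf
    for _move, s in successors(state):
        v = max(v, min_value(s))
    return v


def min_value(state):
    if terminal_test(state):
        return utility(state)
    v = math.inf
    for _move, s in successors(state):
        v = min(v, max_value(s))
    return v


def minimax_decision(state):
    """Return the move whose resulting state has the highest minimax value."""
    return max(successors(state), key=lambda pair: min_value(pair[1]))[0]
```

On this assumed tree the three MIN nodes back up the values 3, 2 and 2, so `minimax_decision(TREE)` selects `"a1"`.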


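Properties of Minimax

Criterion    Minimax
Complete?    Yes (if the game tree is finite)
Time         O(b^m)
Space        O(bm) (depth-first exploration)
Optimal?     Yes (against an optimal opponent)

Here b is the branching factor and m is the maximum depth of the tree.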


Tic-Tac-Toe

• Depth bound 2

• Breadth first search until all nodes at level 2 are generated

• Apply evaluation function to positions at these nodes


Tic-Tac-Toe

• Evaluation function e(p) of a position p
– If p is not a winning position for either player:
  e(p) = (number of complete rows, columns, or diagonals that are still open for MAX) – (number of complete rows, columns, or diagonals that are still open for MIN)
– If p is a win for MAX:
  e(p) = +∞
– If p is a win for MIN:
  e(p) = −∞
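A small Python sketch of e(p) makes the counting concrete. It assumes a 9-character string board ('x' for MAX, 'o' for MIN, ' ' for empty) and hypothetical helper names; a line counts as "still open" for a player when it contains none of the opponent's marks.

```python
import math

# Assumed board encoding: 9-character string, row by row.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals


def open_lines(board, player):
    """Number of lines that contain no mark of the opponent of `player`."""
    opponent = "o" if player == "x" else "x"
    return sum(1 for line in LINES
               if all(board[i] != opponent for i in line))


def has_won(board, player):
    return any(all(board[i] == player for i in line) for line in LINES)


def evaluate(board):
    """Static evaluation e(p) from MAX's ('x') point of view."""
    if has_won(board, "x"):
        return math.inf
    if has_won(board, "o"):
        return -math.inf
    return open_lines(board, "x") - open_lines(board, "o")
```

With x in the center and o on an edge, `evaluate(" o  x    ")` gives 6 - 4 = 2; with o in a corner instead, `evaluate("o   x    ")` gives 5 - 4 = 1. Both agree with values that appear on the first-stage slide.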


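Tic-Tac-Toe - First stage

[Figure: depth-2 game tree for MAX's first move. Leaf evaluations shown: 6-4=2, 5-4=1, 6-5=1, 5-5=0, 6-5=1, 5-6=-1, 5-5=0, 5-6=-1, 6-6=0, 4-6=-2, 4-5=-1, 5-5=0. Backed-up values of MAX's three candidate moves: 1, -1 and -2; backed-up value at the root: 1. The chosen move is labeled "MAX's move".]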


Tic-Tac-Toe - Second stage

[Figure: depth-2 game tree for MAX's second move, starting from a position with one x and one o. Leaf evaluations shown range from 3-3=0 to 5-2=3; values shown at the interior nodes: 0, 1, 1, 0 and 1. The chosen move is labeled "MAX's move".]


Multiplayer games

• Games may involve more than two players

• Single minimax values become vectors
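One common way to back up vector values is for the player to move to pick the successor whose vector is best in that player's own component. The sketch below illustrates this idea under assumptions that are not in the slides: players move in a fixed rotation, a leaf is a tuple with one utility per player, an inner node is a list of children, and the function name `vector_value` and the example numbers are made up.

```python
def vector_value(node, player=0, num_players=3):
    """Back up utility vectors: a leaf is a tuple (one utility per player),
    an inner node is a list of child nodes."""
    if isinstance(node, tuple):
        return node
    next_player = (player + 1) % num_players
    child_values = [vector_value(child, next_player, num_players)
                    for child in node]
    # The player to move prefers the vector with the largest own component.
    return max(child_values, key=lambda vec: vec[player])


# Hypothetical three-player tree: player 0 moves at the root,
# player 1 moves at the second level, leaves hold (u0, u1, u2).
EXAMPLE = [
    [(1, 2, 6), (4, 3, 2)],
    [(6, 1, 2), (7, 4, 1)],
]
```

Here player 1 backs up (4, 3, 2) on the left and (7, 4, 1) on the right, and player 0 then chooses the right branch, so `vector_value(EXAMPLE)` is `(7, 4, 1)`.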


Problem of minimax search

• The number of game states is exponential in the number of moves.
– Solution: do not examine every node
– ==> Alpha-beta pruning

• Alpha = value of the best (highest-value) choice found so far at any choice point along the path for MAX

• Beta = value of the best (lowest-value) choice found so far at any choice point along the path for MIN

• Revisit example …


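Tic-Tac-Toe - First stage

[Figure: the first-stage tree revisited with alpha-beta pruning. Nodes are labeled A, B and C. Leaf evaluations shown: 6-5=1, 5-5=0, 6-5=1, 5-6=-1, 4-5=-1, 5-5=0. Backed-up value of node A: -1. Alpha value (start node) = -1. Beta value (node B) = -1.]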


Alpha-beta pruning

• Search progresses in a depth-first manner
• Whenever a tip node is generated, its static evaluation is computed
• Whenever a position can be given a backed-up value, this value is computed
• Node A and all its successors have been generated
• Backed-up value of node A: -1
• Node B and its successors have not yet been generated
• Now we know that the backed-up value of the start node is bounded from below by -1


Alpha-beta pruning

• Depending on the backed-up values of the other successors of the start node, the final backed-up value of the start node may be greater than -1, but it cannot be less

• This lower bound is the alpha value of the start node


Alpha-beta pruning

• Depth-first search proceeds: node B and its first successor, node C, are generated

• Node C is given a static value of -1

• The backed-up value of node B is bounded from above by -1

• This upper bound on node B is its beta value

• Therefore discontinue search below node B
– Node B will not turn out to be preferable to node A


Alpha-beta pruning

• Reduction in search effort is achieved by keeping track of bounds on backed-up values

• As successors of a node are given backed-up values, the bounds on backed-up values can be revised
– Alpha values of MAX nodes can never decrease
– Beta values of MIN nodes can never increase


Alpha-beta pruning

• Therefore search can be discontinued

– Below any MIN node having a beta value less than or equal to the alpha value of any of its MAX node ancestors
  • The final backed-up value of this MIN node can then be set to its beta value

– Below any MAX node having an alpha value greater than or equal to the beta value of any of its MIN node ancestors
  • The final backed-up value of this MAX node can then be set to its alpha value


Alpha-beta pruning

• During search, alpha and beta values are computed as follows:
– The alpha value of a MAX node is set equal to the current largest final backed-up value of its successors
– The beta value of a MIN node is set equal to the current smallest final backed-up value of its successors


Alpha-Beta Example

[-∞, +∞]

[-∞,+∞]

Range of possible values

Do DF-search until first leaf


Alpha-Beta Example (continued)

[-∞,3]

[-∞,+∞]



Alpha-Beta Example (continued)

[3,+∞]

[3,3]


Alpha-Beta Example (continued)

[-∞,2]

[3,+∞]

[3,3]

This node is worse for MAX


Alpha-Beta Example (continued)

[-∞,2]

[3,14]

[3,3] [-∞,14]


Alpha-Beta Example (continued)

[−∞,2]

[3,5]

[3,3] [-∞,5]


Alpha-Beta Example (continued)

[2,2] [-∞,2]

[3,3]

[3,3]



Alpha-Beta Algorithm

function ALPHA-BETA-SEARCH(state) returns an action
  inputs: state, current state in game
  v ← MAX-VALUE(state, −∞, +∞)
  return the action in SUCCESSORS(state) with value v

function MAX-VALUE(state, α, β) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← −∞
  for a, s in SUCCESSORS(state) do
    v ← MAX(v, MIN-VALUE(s, α, β))
    if v ≥ β then return v
    α ← MAX(α, v)
  return v


Alpha-Beta Algorithm

function MIN-VALUE(state, α, β) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← +∞
  for a, s in SUCCESSORS(state) do
    v ← MIN(v, MAX-VALUE(s, α, β))
    if v ≤ α then return v
    β ← MIN(β, v)
  return v
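The two procedures can be sketched in Python as follows. This is an illustration, not the lecture's code: the tree is an explicit nested dict, the leaf values are assumed from the usual textbook two-ply example, and the `visited` list is an added device for seeing which leaves are actually evaluated.

```python
import math

# Assumed example tree: inner nodes are dicts of move -> subtree,
# leaves are utilities for MAX.
TREE = {
    "a1": {"b1": 3, "b2": 12, "b3": 8},
    "a2": {"c1": 2, "c2": 4, "c3": 6},
    "a3": {"d1": 14, "d2": 5, "d3": 2},
}


def max_value(state, alpha, beta, visited):
    if not isinstance(state, dict):      # terminal test
        visited.append(state)
        return state
    v = -math.inf
    for _move, s in state.items():
        v = max(v, min_value(s, alpha, beta, visited))
        if v >= beta:                    # MIN above would never allow this
            return v
        alpha = max(alpha, v)
    return v


def min_value(state, alpha, beta, visited):
    if not isinstance(state, dict):      # terminal test
        visited.append(state)
        return state
    v = math.inf
    for _move, s in state.items():
        v = min(v, max_value(s, alpha, beta, visited))
        if v <= alpha:                   # MAX above already has something better
            return v
        beta = min(beta, v)
    return v


def alpha_beta_search(state, visited):
    """Return (best move, its value) for MAX at the root."""
    best_move, best_value = None, -math.inf
    for move, s in state.items():
        value = min_value(s, best_value, math.inf, visited)
        if value > best_value:
            best_move, best_value = move, value
    return best_move, best_value
```

On this assumed tree the search returns `("a1", 3)` after evaluating only the leaves 3, 12, 8, 2, 14, 5, 2: the second MIN node is abandoned after its first leaf, matching the interval [-∞,2] in the example slides.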


General alpha-beta pruning

• Consider a node n somewhere in the tree
• If the player has a better choice at
– the parent node of n
– or any choice point further up
• then n will never be reached in actual play.

• Hence, when enough is known about n, it can be pruned.


Final Comments about Alpha-Beta Pruning

• Pruning does not affect the final result
• Entire subtrees can be pruned
• Good move ordering improves the effectiveness of pruning
• With good move ordering, alpha-beta pruning can look roughly twice as far ahead as minimax in the same amount of time
• Repeated states are again possible
– Store them in memory = transposition table
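The first and third points can be checked on a small example. The sketch below is a compact alpha-beta over nested lists with a leaf counter; the function names and the three trees (the same assumed leaf values in different orders) are illustrative assumptions rather than material from the slides.

```python
import math


def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf, counter=None):
    """Minimax value of `node` with alpha-beta pruning.
    A leaf is a number; an inner node is a list of children.
    `counter[0]` is incremented for every leaf that is evaluated."""
    if not isinstance(node, list):
        if counter is not None:
            counter[0] += 1
        return node
    v = -math.inf if maximizing else math.inf
    for child in node:
        child_value = alphabeta(child, not maximizing, alpha, beta, counter)
        if maximizing:
            v = max(v, child_value)
            if v >= beta:
                break
            alpha = max(alpha, v)
        else:
            v = min(v, child_value)
            if v <= alpha:
                break
            beta = min(beta, v)
    return v


def value_and_leaves(tree):
    """Return (root value for MAX, number of leaves evaluated)."""
    counter = [0]
    return alphabeta(tree, True, counter=counter), counter[0]


ORIGINAL_ORDER = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
GOOD_ORDER = [[3, 12, 8], [2, 4, 6], [2, 5, 14]]   # best replies tried first
POOR_ORDER = [[2, 4, 6], [14, 5, 2], [3, 12, 8]]   # best move tried last
```

All three orderings give the same root value, 3, but they evaluate 7, 5 and 9 of the 9 leaves respectively, so ordering changes the amount of work and not the answer.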


Games that include chance

• Possible moves (5-10,5-11), (5-11,19-24),(5-10,10-16) and (5-11,11-16)


• Chance nodes represent dice rolls: each double ([1,1] through [6,6]) has probability 1/36; every other roll has probability 1/18


• Cannot calculate a definite minimax value, only an expected value
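The probabilities 1/36 and 1/18 follow from counting ordered outcomes of two fair six-sided dice, with the order of the two dice ignored. A short Python check of this arithmetic (the variable names are illustrative):

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate all 36 ordered outcomes and merge those that differ only in
# order, e.g. (5, 6) and (6, 5) count as the same roll.
counts = Counter(tuple(sorted(roll))
                 for roll in product(range(1, 7), repeat=2))

roll_probability = {roll: Fraction(n, 36) for roll, n in counts.items()}
```

This gives 21 distinct rolls: the six doubles occur one way each (probability 1/36) and the fifteen non-doubles occur two ways each (probability 2/36 = 1/18).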


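Expected minimax value

$$
\text{EXPECTED-MINIMAX-VALUE}(n) =
\begin{cases}
\text{UTILITY}(n) & \text{if } n \text{ is a terminal node} \\
\max_{s \in \text{successors}(n)} \text{EXPECTED-MINIMAX-VALUE}(s) & \text{if } n \text{ is a MAX node} \\
\min_{s \in \text{successors}(n)} \text{EXPECTED-MINIMAX-VALUE}(s) & \text{if } n \text{ is a MIN node} \\
\sum_{s \in \text{successors}(n)} P(s) \cdot \text{EXPECTED-MINIMAX-VALUE}(s) & \text{if } n \text{ is a chance node}
\end{cases}
$$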

These equations can be backed-up recursively all the way to the root of the game tree.
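The recursive back-up can be sketched in Python. The node encoding here is an assumption made for the example: a leaf is a number, and an inner node is a pair of a kind ("max", "min" or "chance") and a list of children, where chance nodes pair each child with its probability. The example tree and its numbers are made up.

```python
def expected_minimax_value(node):
    """Back up values through MAX, MIN and chance nodes."""
    if isinstance(node, (int, float)):       # terminal node: its utility
        return node
    kind, children = node
    if kind == "max":
        return max(expected_minimax_value(c) for c in children)
    if kind == "min":
        return min(expected_minimax_value(c) for c in children)
    if kind == "chance":
        # children are (probability, subtree) pairs
        return sum(p * expected_minimax_value(c) for p, c in children)
    raise ValueError(f"unknown node kind: {kind!r}")


# Hypothetical tree: MAX chooses between two moves, each followed by a
# fair coin flip and then a reply by MIN.
EXAMPLE = ("max", [
    ("chance", [(0.5, ("min", [4, 8])), (0.5, ("min", [2, 6]))]),
    ("chance", [(0.5, ("min", [0, 10])), (0.5, ("min", [5, 7]))]),
])
```

The left chance node backs up 0.5·4 + 0.5·2 = 3.0 and the right one 0.5·0 + 0.5·5 = 2.5, so the value at the root is 3.0.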


Position evaluation with chance nodes

• Left: A1 wins
• Right: A2 wins
• The chosen move may change when evaluation values are scaled differently, even if their order is preserved.
• Behavior is preserved only by a positive linear transformation of EVAL.
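A short numeric check illustrates the point. The probabilities and leaf values below are assumptions taken from the usual textbook illustration of this effect (the figure is not part of this transcript), and the function and variable names are hypothetical.

```python
# Assumed example: each move leads to a chance node with two outcomes,
# given as (probability, evaluation) pairs.
MOVES = {
    "A1": [(0.9, 2), (0.1, 3)],
    "A2": [(0.9, 1), (0.1, 4)],
}


def expected_value(outcomes, transform):
    return sum(p * transform(v) for p, v in outcomes)


def best_move(transform=lambda v: v):
    """Move with the highest expected (transformed) evaluation."""
    return max(MOVES, key=lambda m: expected_value(MOVES[m], transform))


# An order-preserving but non-linear rescaling of the four leaf values.
RESCALED = {1: 1, 2: 20, 3: 30, 4: 400}

original_choice = best_move()                       # 2.1 vs 1.3
linear_choice = best_move(lambda v: 10 * v + 5)     # 26 vs 18
rescaled_choice = best_move(RESCALED.get)           # 21 vs 40.9
```

With the original values and with the positive linear transformation 10v + 5, A1 has the higher expected value; after the non-linear rescaling, which keeps the leaves in the same order, A2 does.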


Discussion

• Examine section on state-of-the-art games yourself
• Minimax assumes right tree is better than left, yet …
– Return probability distribution over possible values
– Yet expensive calculation


Discussion

• Utility of node expansion
– Only expand those nodes which lead to significantly better moves

• Both suggestions require meta-reasoning


Summary

• Games are fun (and dangerous)

• They illustrate several important points about AI
– Perfection is unattainable -> approximation
– It is a good idea to think about what to think about
– Uncertainty constrains the assignment of values to states

• Games are to AI as grand prix racing is to automobile design.