Game playing Notes adapted from lecture notes for CMSC 421 by B.J. Dorr.
game playing
Notes adapted from lecture notes for CMSC 421 by B.J. Dorr
2
Outline
• What are games?
• Optimal decisions in games
– Which strategy leads to success?
• Alpha-beta pruning
• Games of imperfect information
• Games that include an element of chance
3
What are and why study games?
• Games are a form of multi-agent environment
– What do other agents do and how do they affect our success?
– Cooperative vs. competitive multi-agent environments
– Competitive multi-agent environments give rise to adversarial problems, a.k.a. games
• Why study games?
– Fun; historically entertaining
– Interesting subject of study because they are hard
– Easy to represent, and agents are restricted to a small number of actions
4
Relation of Games to Search
• Search – no adversary
– Solution is a (heuristic) method for finding a goal
– Heuristics and CSP techniques can find the optimal solution
– Evaluation function: estimate of cost from start to goal through a given node
– Examples: path planning, scheduling activities
• Games – adversary
– Solution is a strategy (a strategy specifies a move for every possible opponent reply)
– Time limits force an approximate solution
– Evaluation function: evaluates the “goodness” of a game position
– Examples: chess, checkers, Othello, backgammon
5
Types of Games
[Table omitted in the transcript: games classified along two axes – deterministic vs. chance, and perfect vs. imperfect information.]
6
Game setup
• Two players: MAX and MIN
• MAX moves first, and they take turns until the game is over. The winner gets a reward, the loser a penalty.
• Games as search:
– Initial state: e.g. board configuration of chess
– Successor function: list of (move, state) pairs specifying legal moves
– Terminal test: is the game finished?
– Utility function: gives a numerical value for terminal states, e.g. win (+1), lose (-1), and draw (0) in tic-tac-toe (next)
• MAX uses a search tree to determine its next move.
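As an illustration, the four components above can be written down directly for tic-tac-toe. This is a minimal sketch: the function names mirror the slide's terms, and the 9-tuple row-major board encoding ("x" for MAX, "o" for MIN, " " for empty) is an assumption made for the example.

```python
def initial_state():
    return (" ",) * 9                       # empty 3x3 board, row-major

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def winner(board):
    for i, j, k in LINES:
        if board[i] != " " and board[i] == board[j] == board[k]:
            return board[i]
    return None

def to_move(board):
    # MAX ("x") moves first, so it is to move whenever the mark counts tie.
    return "x" if board.count("x") == board.count("o") else "o"

def successors(board):
    # Successor function: (move, state) pairs, one per empty square.
    p = to_move(board)
    return [(i, board[:i] + (p,) + board[i + 1:])
            for i in range(9) if board[i] == " "]

def terminal_test(board):
    return winner(board) is not None or " " not in board

def utility(board):
    # Win for MAX: +1, win for MIN: -1, draw: 0.
    return {"x": +1, "o": -1}.get(winner(board), 0)
```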
7
Partial Game Tree for Tic-Tac-Toe
8
Optimal strategies
• Find the contingent strategy for MAX assuming an infallible MIN opponent.
• Assumption: both players play optimally!
• Given a game tree, the optimal strategy can be determined by using the minimax value of each node:

MINIMAX-VALUE(n) =
  UTILITY(n)                                  if n is a terminal state
  max_{s ∈ successors(n)} MINIMAX-VALUE(s)    if n is a MAX node
  min_{s ∈ successors(n)} MINIMAX-VALUE(s)    if n is a MIN node
9
Two-Ply Game Tree
10
Two-Ply Game Tree
• MAX nodes
• MIN nodes
• Terminal nodes – utility values for MAX
• Other nodes – minimax values
• MAX’s best move at the root – a1 – leads to the successor with the highest minimax value
• MIN’s best reply – b1 – leads to the successor with the lowest minimax value
11
What if MIN does not play optimally?
• Definition of optimal play for MAX assumes MIN plays optimally: maximizes worst-case outcome for MAX.
• But if MIN does not play optimally, MAX will do even better. [Can be proved.]
12
Minimax Algorithm

function MINIMAX-DECISION(state) returns an action
  inputs: state, current state in game
  v ← MAX-VALUE(state)
  return the action in SUCCESSORS(state) with value v

function MAX-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← -∞
  for a, s in SUCCESSORS(state) do
    v ← MAX(v, MIN-VALUE(s))
  return v

function MIN-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← +∞
  for a, s in SUCCESSORS(state) do
    v ← MIN(v, MAX-VALUE(s))
  return v
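The pseudocode above translates almost line for line into Python. In this sketch the game interface is passed in as functions; the names, and the two-ply example tree with leaf utilities chosen for illustration, are assumptions rather than code from the lecture.

```python
def min_value(state, successors, terminal_test, utility):
    if terminal_test(state):
        return utility(state)
    return min(max_value(s, successors, terminal_test, utility)
               for _, s in successors(state))

def max_value(state, successors, terminal_test, utility):
    if terminal_test(state):
        return utility(state)
    return max(min_value(s, successors, terminal_test, utility)
               for _, s in successors(state))

def minimax_decision(state, successors, terminal_test, utility):
    # Choose the action leading to the successor (a MIN node)
    # with the highest minimax value.
    return max(successors(state),
               key=lambda pair: min_value(pair[1], successors,
                                          terminal_test, utility))[0]

# A two-ply example tree: MIN backs up 3, 2, and 2 to the three
# MIN nodes, so MAX's best move at the root is a1 (value 3).
TREE = {
    "root": [("a1", "B"), ("a2", "C"), ("a3", "D")],
    "B": [("b1", 3), ("b2", 12), ("b3", 8)],
    "C": [("c1", 2), ("c2", 4), ("c3", 6)],
    "D": [("d1", 14), ("d2", 5), ("d3", 2)],
}
succ = lambda s: TREE[s]
term = lambda s: isinstance(s, int)   # leaves are their utility values
util = lambda s: s
```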
13
Properties of Minimax
Criterion    Minimax
Complete?    Yes
Time         O(b^m)
Space        O(bm)
Optimal?     Yes
14
Tic-Tac-Toe
• Depth bound: 2
• Breadth-first search until all nodes at level 2 are generated
• Apply the evaluation function to the positions at these nodes
15
Tic-Tac-Toe
• Evaluation function e(p) of a position p:
– If p is not a winning position for either player:
  e(p) = (number of complete rows, columns, or diagonals that are still open for MAX) − (number of complete rows, columns, or diagonals that are still open for MIN)
– If p is a win for MAX: e(p) = +∞
– If p is a win for MIN: e(p) = −∞
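The definition above can be sketched directly in code. This assumes a 9-tuple row-major board with "x" for MAX, "o" for MIN, and " " for empty; the encoding is an illustrative choice, not from the slides.

```python
import math

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def e(board):
    for i, j, k in LINES:
        line = (board[i], board[j], board[k])
        if line == ("x", "x", "x"):
            return math.inf                  # win for MAX
        if line == ("o", "o", "o"):
            return -math.inf                 # win for MIN
    # A line is "open" for a player if the opponent has no mark in it.
    open_max = sum(1 for i, j, k in LINES
                   if "o" not in (board[i], board[j], board[k]))
    open_min = sum(1 for i, j, k in LINES
                   if "x" not in (board[i], board[j], board[k]))
    return open_max - open_min
```

For example, an x alone in the center leaves all 8 lines open for MAX but only the 4 lines avoiding the center open for MIN, so e(p) = 8 − 4 = 4.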
16
Tic-Tac-Toe - First stage
[Figure: game tree for MAX’s first move. Positions two plies deep are scored with e(p) – values such as 6-5=1, 5-5=0, and 4-6=-2 appear – and the backed-up values of MAX’s distinct first moves include 1, -1, and -2. MAX moves to the position with the highest backed-up value.]
17
Tic-Tac-Toe - Second stage
[Figure: game tree for MAX’s second move after MIN’s reply. Positions two plies deeper are scored with e(p) – values such as 3-2=1, 5-2=3, and 4-3=1 appear – and the backed-up values of MAX’s candidate moves include 0 and 1. MAX again moves to a position with the highest backed-up value.]
18
Multiplayer games
• Games may allow more than two players
• Scalar minimax values become vectors of values, one component per player
19
Problem of minimax search
• The number of game states is exponential in the number of moves.
– Solution: do not examine every node
– ⇒ alpha-beta pruning
• Alpha = value of the best choice found so far at any choice point along the path for MAX
• Beta = value of the best choice found so far at any choice point along the path for MIN
• Revisit the example …
20
Tic-Tac-Toe - First stage
[Figure: the first-stage tree revisited. All successors of node A have been generated and back up the value -1, so the start node’s alpha value is -1. Node B’s first successor, node C, has static value 5-6 = -1, so node B’s beta value is -1 and search below B can be cut off.]
21
Alpha-beta pruning
• Search progresses in a depth-first manner
• Whenever a tip node is generated, its static evaluation is computed
• Whenever a position can be given a backed-up value, this value is computed
• Node A and all its successors have been generated; its backed-up value is -1
• Node B and its successors have not yet been generated
• Now we know that the backed-up value of the start node is bounded from below by -1
22
Alpha-beta pruning
• Depending on the backed-up values of the other successors of the start node, the final backed-up value of the start node may be greater than -1, but it cannot be less
• This lower bound is the alpha value of the start node
23
Alpha-beta pruning
• Depth-first search proceeds – node B and its first successor, node C, are generated
• Node C is given a static value of -1
• The backed-up value of node B is therefore bounded from above by -1
• This upper bound on node B is its beta value
• Therefore search below node B can be discontinued – node B cannot turn out to be preferable to node A
24
Alpha-beta pruning
• The reduction in search effort is achieved by keeping track of bounds on backed-up values
• As successors of a node are given backed-up values, the bounds can be revised
– Alpha values of MAX nodes can never decrease
– Beta values of MIN nodes can never increase
25
Alpha-beta pruning
• Therefore search can be discontinued:
– Below any MIN node having a beta value less than or equal to the alpha value of any of its MAX-node ancestors
  • The final backed-up value of this MIN node can then be set to its beta value
– Below any MAX node having an alpha value greater than or equal to the beta value of any of its MIN-node ancestors
  • The final backed-up value of this MAX node can then be set to its alpha value
26
Alpha-beta pruning
• During search, alpha and beta values are computed as follows:
– The alpha value of a MAX node is set equal to the current largest final backed-up value of its successors
– The beta value of a MIN node is set equal to the current smallest final backed-up value of its successors
27
Alpha-Beta Example
[-∞, +∞]
[-∞,+∞]
Range of possible values
Do DF-search until the first leaf
28
Alpha-Beta Example (continued)
[-∞,3]
[-∞,+∞]
29
Alpha-Beta Example (continued)
[-∞,3]
[-∞,+∞]
30
Alpha-Beta Example (continued)
[3,+∞]
[3,3]
31
Alpha-Beta Example (continued)
[-∞,2]
[3,+∞]
[3,3]
This node is worse for MAX
32
Alpha-Beta Example (continued)
[-∞,2]
[3,14]
[3,3] [-∞,14]
33
Alpha-Beta Example (continued)
[−∞,2]
[3,5]
[3,3] [-∞,5]
34
Alpha-Beta Example (continued)
[2,2][−∞,2]
[3,3]
[3,3]
35
Alpha-Beta Example (continued)
[2,2][-∞,2]
[3,3]
[3,3]
36
Alpha-Beta Algorithm

function ALPHA-BETA-SEARCH(state) returns an action
  inputs: state, current state in game
  v ← MAX-VALUE(state, -∞, +∞)
  return the action in SUCCESSORS(state) with value v

function MAX-VALUE(state, α, β) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← -∞
  for a, s in SUCCESSORS(state) do
    v ← MAX(v, MIN-VALUE(s, α, β))
    if v ≥ β then return v
    α ← MAX(α, v)
  return v
37
Alpha-Beta Algorithm
function MIN-VALUE(state, α, β) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← +∞
  for a, s in SUCCESSORS(state) do
    v ← MIN(v, MAX-VALUE(s, α, β))
    if v ≤ α then return v
    β ← MIN(β, v)
  return v
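The two functions above can be rendered as runnable Python. The game interface passed in as functions and the example tree are illustrative assumptions; a list of visited leaves is added so the pruning is observable.

```python
import math

LEAVES_SEEN = []                       # records every leaf actually evaluated

def ab_max_value(state, alpha, beta, succ, term, util):
    if term(state):
        LEAVES_SEEN.append(state)
        return util(state)
    v = -math.inf
    for _, s in succ(state):
        v = max(v, ab_min_value(s, alpha, beta, succ, term, util))
        if v >= beta:                  # a MIN ancestor already has a better option
            return v                   # prune the remaining successors
        alpha = max(alpha, v)
    return v

def ab_min_value(state, alpha, beta, succ, term, util):
    if term(state):
        LEAVES_SEEN.append(state)
        return util(state)
    v = math.inf
    for _, s in succ(state):
        v = min(v, ab_max_value(s, alpha, beta, succ, term, util))
        if v <= alpha:                 # a MAX ancestor already has a better option
            return v
        beta = min(beta, v)
    return v

# Two-ply example tree: minimax value is 3, but alpha-beta never
# needs to look at the leaves 4 and 6 under the second MIN node.
TREE = {
    "root": [("a1", "B"), ("a2", "C"), ("a3", "D")],
    "B": [("b1", 3), ("b2", 12), ("b3", 8)],
    "C": [("c1", 2), ("c2", 4), ("c3", 6)],
    "D": [("d1", 14), ("d2", 5), ("d3", 2)],
}
succ = lambda s: TREE[s]
term = lambda s: isinstance(s, int)
util = lambda s: s
```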
38
General alpha-beta pruning
• Consider a node n somewhere in the tree
• If the player has a better choice at
– the parent node of n
– or any choice point further up
then n will never be reached in actual play
• Hence, when enough is known about n, it can be pruned
39
Final Comments about Alpha-Beta Pruning
• Pruning does not affect the final result
• Entire subtrees can be pruned
• Good move ordering improves the effectiveness of pruning
• Alpha-beta pruning can look ahead roughly twice as far as minimax in the same amount of time
• Repeated states are again possible
– Store them in memory – a transposition table
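A transposition table in its simplest form is just memoization: backed-up values are cached keyed by position (and player to move), so a state reached by different move orders is evaluated only once. This sketch uses an illustrative interface; the names are not from a particular library.

```python
def minimax_tt(state, is_max, succ, term, util, table):
    """Minimax value of state, caching results in a transposition table."""
    key = (state, is_max)
    if key in table:                      # repeated state: reuse cached value
        return table[key]
    if term(state):
        v = util(state)
    else:
        children = [minimax_tt(s, not is_max, succ, term, util, table)
                    for _, s in succ(state)]
        v = max(children) if is_max else min(children)
    table[key] = v
    return v
```

For example, if two different first moves transpose into the same position "X", the subtree below "X" is searched once and the second visit is a table lookup.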
40
Games that include chance
• Possible moves: (5-10,5-11), (5-11,19-24), (5-10,10-16), and (5-11,11-16)
41
Games that include chance
• Possible moves: (5-10,5-11), (5-11,19-24), (5-10,10-16), and (5-11,11-16)
• Rolls [1,1] and [6,6] each have probability 1/36; all other rolls have probability 1/18
• Chance nodes represent the dice rolls
42
Games that include chance
• Rolls [1,1] and [6,6] each have probability 1/36; all other rolls have probability 1/18
• We cannot calculate a definite minimax value, only an expected value
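The dice probabilities above can be checked with a few lines of arithmetic: of the 36 equally likely ordered rolls, each double occurs exactly once (probability 1/36), while each of the 15 distinct non-double rolls occurs twice (probability 2/36 = 1/18).

```python
from fractions import Fraction
from itertools import product

prob = {}
for a, b in product(range(1, 7), repeat=2):   # all 36 ordered rolls
    roll = (min(a, b), max(a, b))             # dice are unordered
    prob[roll] = prob.get(roll, Fraction(0)) + Fraction(1, 36)
```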
43
Expected minimax value

EXPECTED-MINIMAX-VALUE(n) =
  UTILITY(n)                                               if n is a terminal state
  max_{s ∈ successors(n)} EXPECTED-MINIMAX-VALUE(s)        if n is a MAX node
  min_{s ∈ successors(n)} EXPECTED-MINIMAX-VALUE(s)        if n is a MIN node
  Σ_{s ∈ successors(n)} P(s) · EXPECTED-MINIMAX-VALUE(s)   if n is a chance node

These equations can be backed up recursively all the way to the root of the game tree.
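The four cases of this recursion can be sketched as a single function. The node-kind labeling ("max", "min", "chance", "terminal") and the convention that chance-node successors carry (probability, state) pairs are illustrative assumptions.

```python
def expectiminimax(node, kind_of, succ, util):
    kind = kind_of(node)
    if kind == "terminal":
        return util(node)
    if kind == "max":
        return max(expectiminimax(s, kind_of, succ, util)
                   for s in succ(node))
    if kind == "min":
        return min(expectiminimax(s, kind_of, succ, util)
                   for s in succ(node))
    # chance node: probability-weighted average of successor values
    return sum(p * expectiminimax(s, kind_of, succ, util)
               for p, s in succ(node))
```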
44
Position evaluation with chance nodes
• Left: A1 wins; right: A2 wins
• The move chosen may change when the evaluation values are scaled differently, even if their order is preserved
• Behavior is preserved only by a positive linear transformation of EVAL
45
Discussion
• Examine the section on state-of-the-art game programs yourself
• Minimax assumes the right tree is better than the left, yet …
– One remedy: return a probability distribution over possible values
– But that is an expensive calculation
46
Discussion
• Utility of node expansion
– Only expand those nodes which lead to significantly better moves
• Both suggestions require meta-reasoning
47
Summary
• Games are fun (and dangerous)
• They illustrate several important points about AI
– Perfection is unattainable → must approximate
– Good idea to think about what to think about
– Uncertainty constrains the assignment of values to states
• Games are to AI as grand prix racing is to automobile design