Game Playing 2. This Lecture: alpha-beta pruning, games with chance, partially observable games.
![Page 1: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/1.jpg)
GAME PLAYING 2
![Page 2: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/2.jpg)
THIS LECTURE
• Alpha-beta pruning
• Games with chance
• Partially observable games
![Page 3: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/3.jpg)
NONDETERMINISM
Uncertainty is caused by the actions of another agent (MIN), who competes with our agent (MAX)
MAX’s play
MAX cannot tell what move will be played
MIN’s play
![Page 4: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/4.jpg)
NONDETERMINISM
Uncertainty is caused by the actions of another agent (MIN), who competes with our agent (MAX)
MAX’s play
MAX must decide what to play for BOTH these outcomes
MIN’s play
Instead of a single path, the agent must construct an entire plan
![Page 5: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/5.jpg)
MINIMAX BACKUP
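[Figure: game tree with levels labeled MAX’s turn, MIN’s turn, MAX’s turn; terminal values (+1, +10, -1, 0) are backed up toward the root by taking the maximum at MAX’s turns and the minimum at MIN’s turns]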
![Page 6: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/6.jpg)
DEPTH-FIRST MINIMAX ALGORITHM
MAX-Value(S)
1. If Terminal?(S) return Result(S)
2. Return max over S’ ∈ SUCC(S) of MIN-Value(S’)

MIN-Value(S)
1. If Terminal?(S) return Result(S)
2. Return min over S’ ∈ SUCC(S) of MAX-Value(S’)

MINIMAX-Decision(S)
Return the action leading to the state S’ ∈ SUCC(S) that maximizes MIN-Value(S’)
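The pseudocode above can be sketched in Python as follows. This is a minimal illustration, not the lecture's own code: it assumes a game tree given as nested lists, where a number is a terminal utility and a list holds the successor subtrees. The names `max_value`, `min_value`, `minimax_decision`, and the toy tree are hypothetical.

```python
def is_terminal(node):
    # Assumption: a leaf is a plain number, an internal node is a list of subtrees.
    return not isinstance(node, list)

def max_value(node):
    """Value of a node where MAX is to move."""
    if is_terminal(node):
        return node
    return max(min_value(child) for child in node)

def min_value(node):
    """Value of a node where MIN is to move."""
    if is_terminal(node):
        return node
    return min(max_value(child) for child in node)

def minimax_decision(node):
    """Index of the root move whose MIN-Value is largest."""
    return max(range(len(node)), key=lambda i: min_value(node[i]))

# Toy two-ply tree: MAX picks a branch, then MIN picks a leaf.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
root_value = max_value(tree)        # MIN would answer with 3, 2, 2
best_move = minimax_decision(tree)  # so MAX prefers branch 0
```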
![Page 7: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/7.jpg)
REAL-TIME GAME PLAYING WITH EVALUATION FUNCTION
e(s): function indicating estimated favorability of a state to MAX
Keep track of depth, and add the following line after the terminal test:
If depth(S) = cutoff, return e(S)
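A depth-limited version might look like the sketch below. The tree encoding (nested lists) and the evaluation function are assumptions made only for illustration: `evaluate` simply averages the leaves below a node, standing in for a real e(s).

```python
def leaves(node):
    """All terminal utilities below a node (a number is its own leaf)."""
    if not isinstance(node, list):
        return [node]
    out = []
    for child in node:
        out.extend(leaves(child))
    return out

def evaluate(node):
    # Hypothetical e(s): mean of the leaves below the node.
    vals = leaves(node)
    return sum(vals) / len(vals)

def max_value(node, depth, cutoff):
    if not isinstance(node, list):      # terminal test comes first
        return node
    if depth == cutoff:                 # then the cutoff test
        return evaluate(node)
    return max(min_value(c, depth + 1, cutoff) for c in node)

def min_value(node, depth, cutoff):
    if not isinstance(node, list):
        return node
    if depth == cutoff:
        return evaluate(node)
    return min(max_value(c, depth + 1, cutoff) for c in node)

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
full = max_value(tree, 0, 2)      # deep enough to reach every leaf
shallow = max_value(tree, 0, 1)   # MIN nodes are estimated by evaluate()
```

With the shallow cutoff the root value comes from the estimates rather than from the true leaves, which is why the quality of e(s) matters.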
![Page 8: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/8.jpg)
CAN WE DO BETTER?
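Yes! Much better!
[Figure: a MAX root whose first MIN child backs up the value 3; the second MIN child finds a leaf worth -1, and the rest of its subtree is marked “Pruning”]
This part of the tree can’t have any effect on the value that will be backed up to the root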
![Page 9: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/9.jpg)
EXAMPLE
![Page 10: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/10.jpg)
EXAMPLE
β = 2
2
The beta value of a MIN node is an upper bound on the final backed-up value. It can never increase.
![Page 11: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/11.jpg)
EXAMPLE
The beta value of a MIN node is an upper bound on the final backed-up value. It can never increase.
1
β = 1
2
![Page 12: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/12.jpg)
EXAMPLE
α = 1
The alpha value of a MAX node is a lower bound on the final backed-up value. It can never decrease.
1
β = 1
2
![Page 13: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/13.jpg)
EXAMPLE
α = 1
1
β = 1
2 -1
β = -1
![Page 14: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/14.jpg)
EXAMPLE
α = 1
1
β = 1
2 -1
β = -1
Search can be discontinued below any MIN node whose beta value is less than or equal to the alpha value of one of its MAX ancestors
![Page 15: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/15.jpg)
ALPHA-BETA PRUNING
Explore the game tree to depth h in depth-first manner
Back up alpha and beta values whenever possible
Prune branches that can’t lead to changing the final decision
![Page 16: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/16.jpg)
ALPHA-BETA ALGORITHM
Update the alpha/beta value of the parent of a node N when the search below N has been completed or discontinued
Discontinue the search below a MAX node N if its alpha value is ≥ the beta value of a MIN ancestor of N
Discontinue the search below a MIN node N if its beta value is ≤ the alpha value of a MAX ancestor of N
![Page 17: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/17.jpg)
EXAMPLE
[Figure: example game tree for the alpha-beta trace; levels alternate MAX, MIN, MAX, MIN, MAX, MIN from the root down, with numeric utilities at the leaves]
![Page 18: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/18.jpg)
EXAMPLE
[Figure: the same example tree one step further into the alpha-beta trace, showing the values backed up so far]
![Page 19: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/19.jpg)
EXAMPLE
[Figure: the same example tree one step further into the alpha-beta trace, showing the values backed up so far]
![Page 20: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/20.jpg)
EXAMPLE
[Figure: the same example tree one step further into the alpha-beta trace, showing the values backed up so far]
![Page 21: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/21.jpg)
EXAMPLE
[Figure: the same example tree one step further into the alpha-beta trace, showing the values backed up so far]
![Page 22: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/22.jpg)
EXAMPLE
[Figure: the same example tree one step further into the alpha-beta trace, showing the values backed up so far]
![Page 23: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/23.jpg)
EXAMPLE
[Figure: the same example tree one step further into the alpha-beta trace, showing the values backed up so far]
![Page 24: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/24.jpg)
EXAMPLE
[Figure: the same example tree one step further into the alpha-beta trace, showing the values backed up so far]
![Page 25: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/25.jpg)
EXAMPLE
[Figure: the same example tree one step further into the alpha-beta trace, showing the values backed up so far]
![Page 26: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/26.jpg)
EXAMPLE
[Figure: the same example tree one step further into the alpha-beta trace, showing the values backed up so far]
![Page 27: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/27.jpg)
EXAMPLE
[Figure: the same example tree one step further into the alpha-beta trace, showing the values backed up so far]
![Page 28: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/28.jpg)
EXAMPLE
[Figure: the same example tree one step further into the alpha-beta trace, showing the values backed up so far]
![Page 29: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/29.jpg)
EXAMPLE
[Figure: the same example tree one step further into the alpha-beta trace, showing the values backed up so far]
![Page 30: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/30.jpg)
EXAMPLE
[Figure: the same example tree one step further into the alpha-beta trace, showing the values backed up so far]
![Page 31: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/31.jpg)
EXAMPLE
[Figure: the same example tree one step further into the alpha-beta trace, showing the values backed up so far]
![Page 32: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/32.jpg)
EXAMPLE
[Figure: the same example tree one step further into the alpha-beta trace, showing the values backed up so far]
![Page 33: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/33.jpg)
EXAMPLE
[Figure: the same example tree one step further into the alpha-beta trace, showing the values backed up so far]
![Page 34: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/34.jpg)
EXAMPLE
[Figure: the same example tree one step further into the alpha-beta trace, showing the values backed up so far]
![Page 35: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/35.jpg)
EXAMPLE
[Figure: the same example tree one step further into the alpha-beta trace, showing the values backed up so far]
![Page 36: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/36.jpg)
EXAMPLE
[Figure: the same example tree one step further into the alpha-beta trace, showing the values backed up so far]
![Page 37: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/37.jpg)
EXAMPLE
[Figure: the same example tree one step further into the alpha-beta trace, showing the values backed up so far]
![Page 38: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/38.jpg)
EXAMPLE
[Figure: the same example tree one step further into the alpha-beta trace, showing the values backed up so far]
![Page 39: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/39.jpg)
EXAMPLE
[Figure: the same example tree one step further into the alpha-beta trace, showing the values backed up so far]
![Page 40: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/40.jpg)
EXAMPLE
[Figure: the same example tree one step further into the alpha-beta trace, showing the values backed up so far]
![Page 41: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/41.jpg)
EXAMPLE
[Figure: the same example tree one step further into the alpha-beta trace, showing the values backed up so far]
![Page 42: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/42.jpg)
EXAMPLE
[Figure: the same example tree one step further into the alpha-beta trace, showing the values backed up so far]
![Page 43: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/43.jpg)
EXAMPLE
[Figure: the same example tree at the end of the alpha-beta trace, with values backed up to the root]
![Page 44: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/44.jpg)
HOW MUCH DO WE GAIN?
Consider these two cases:
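[Figure, case 1: a MAX node with α = 3; its next MIN child sees the leaf -1 first, so β = -1 ≤ α and the remaining leaf (4) is pruned]
[Figure, case 2: the same MAX node with α = 3; the MIN child sees the leaf 4 first, so β = 4 > α and the leaf -1 must still be examined]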
![Page 45: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/45.jpg)
HOW MUCH DO WE GAIN?
• Assume a game tree of uniform branching factor b and depth h
• Minimax examines $O(b^h)$ nodes, and so does alpha-beta in the worst case
• The gain for alpha-beta is maximum when:
  • The children of a MAX node are ordered in decreasing backed-up values
  • The children of a MIN node are ordered in increasing backed-up values
• Then alpha-beta examines $O(b^{h/2})$ nodes [Knuth and Moore, 1975]
• But this requires an oracle (if we knew how to order nodes perfectly, we would not need to search the game tree)
• If nodes are ordered at random, then the average number of nodes examined by alpha-beta is about $O(b^{3h/4})$
![Page 46: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/46.jpg)
ALPHA-BETA IMPLEMENTATION

MAX-Value(S, α, β)
1. If Terminal?(S) return Result(S)
2. For all S’ ∈ SUCC(S)
3.    α ← max(α, MIN-Value(S’, α, β))
4.    If α ≥ β, then return β
5. Return α

MIN-Value(S, α, β)
1. If Terminal?(S) return Result(S)
2. For all S’ ∈ SUCC(S)
3.    β ← min(β, MAX-Value(S’, α, β))
4.    If α ≥ β, then return α
5. Return β

Alpha-Beta-Decision(S)
Return the action leading to the state S’ ∈ SUCC(S) that maximizes MIN-Value(S’, -∞, +∞)
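One way to write this in Python is sketched below. The nested-list tree encoding, the `stats` leaf counter, and the function names are assumptions added for illustration; the cutoff tests follow the α ≥ β rule stated earlier in the lecture.

```python
import math

def alphabeta_max(node, alpha, beta, stats):
    if not isinstance(node, list):          # terminal: a plain number
        stats["leaves"] += 1
        return node
    for child in node:
        alpha = max(alpha, alphabeta_min(child, alpha, beta, stats))
        if alpha >= beta:                   # a MIN ancestor will never allow this
            return beta
    return alpha

def alphabeta_min(node, alpha, beta, stats):
    if not isinstance(node, list):
        stats["leaves"] += 1
        return node
    for child in node:
        beta = min(beta, alphabeta_max(child, alpha, beta, stats))
        if alpha >= beta:                   # a MAX ancestor already has better
            return alpha
    return beta

def alphabeta_root(node):
    """Root value for MAX plus the number of leaves actually examined."""
    stats = {"leaves": 0}
    value = alphabeta_max(node, -math.inf, math.inf, stats)
    return value, stats["leaves"]

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
value, examined = alphabeta_root(tree)                      # 7 of 9 leaves
reordered = alphabeta_root([[3, 12, 8], [2, 4, 6], [2, 5, 14]])  # 5 of 9 leaves
```

On this toy tree the root value is unchanged by pruning, and listing the smallest leaf first in the last MIN branch lets the search stop after fewer leaves, which is the effect of good move ordering.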
![Page 47: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/47.jpg)
HEURISTIC ORDERING OF NODES
Order the nodes below the root according to the values backed-up at the previous iteration
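A rough sketch of this idea, assuming the "previous iteration" is the previous, shallower pass of an iterative-deepening search: root moves are re-sorted after each pass so that the next pass tries the most promising move first. The tree encoding, the stand-in `evaluate` function, and all names are hypothetical.

```python
import math

def evaluate(node):
    # Stand-in e(s): recursive mean of the values below the node.
    if not isinstance(node, list):
        return node
    return sum(evaluate(c) for c in node) / len(node)

def search(node, depth, alpha, beta, maximizing):
    """Depth-limited alpha-beta that falls back on evaluate() at the cutoff."""
    if not isinstance(node, list):
        return node
    if depth == 0:
        return evaluate(node)
    for child in node:
        value = search(child, depth - 1, alpha, beta, not maximizing)
        if maximizing:
            alpha = max(alpha, value)
        else:
            beta = min(beta, value)
        if alpha >= beta:
            return beta if maximizing else alpha
    return alpha if maximizing else beta

def iterative_deepening(root, max_depth):
    order = list(range(len(root)))          # initial order: as given
    values = {}
    for depth in range(1, max_depth + 1):
        alpha = -math.inf
        values = {}
        for i in order:                     # root is a MAX node, children are MIN
            values[i] = search(root[i], depth - 1, alpha, math.inf, False)
            alpha = max(alpha, values[i])
        # Stable sort: ties keep their previous relative order.
        order.sort(key=lambda i: values[i], reverse=True)
    return order, values

tree = [[2, 4, 6], [14, 5, 2], [3, 12, 8]]
order, values = iterative_deepening(tree, 2)
```

Because alpha is carried across the root's children, the values of moves searched after the best one are only bounds; the ordering is a heuristic for the next pass, not an exact ranking.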
![Page 48: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/48.jpg)
OTHER IMPROVEMENTS
• Adaptive horizon + iterative deepening
• Extended search: retain k > 1 best paths, instead of just one, and extend the tree at greater depth below their leaf nodes (to help deal with the “horizon effect”)
• Singular extension: if a move is obviously better than the others in a node at horizon h, then expand this node along this move
• Use transposition tables to deal with repeated states
• Null-move search
![Page 49: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/49.jpg)
GAMES OF CHANCE
![Page 50: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/50.jpg)
GAMES OF CHANCE
• Dice games: backgammon, Yahtzee, craps, …
• Card games: poker, blackjack, …
Is there a fundamental difference between the nondeterminism in chess-playing vs. the nondeterminism in a dice roll?
![Page 51: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/51.jpg)
[Figure: game tree for a game of chance; levels from the root down are MAX, CHANCE, MIN, CHANCE, MAX]
![Page 52: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/52.jpg)
EXPECTED VALUES
The utility of a MAX/MIN node in the game tree is the max/min of the utility values of its successors
The expected utility of a CHANCE node in the game tree is the probability-weighted average of the utility values of its successors
CHANCE nodes: $\text{ExpectedValue}(s) = \sum_{s' \in \text{SUCC}(s)} \text{ExpectedValue}(s')\, P(s')$
Compare to
MAX nodes: $\text{MinimaxValue}(s) = \max_{s' \in \text{SUCC}(s)} \text{MinimaxValue}(s')$
MIN nodes: $\text{MinimaxValue}(s) = \min_{s' \in \text{SUCC}(s)} \text{MinimaxValue}(s')$
![Page 53: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/53.jpg)
ADVERSARIAL GAMES OF CHANCE
• E.g., backgammon
• MAX nodes, MIN nodes, CHANCE nodes
• Expectiminimax search
• Backup step:
  • MAX = maximum of children
  • CHANCE = average of children
  • MIN = minimum of children
  • CHANCE = average of children
• 4 levels of the game tree separate each of MAX’s turns!
• Evaluation function? Pruning?
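The backup step can be sketched as a single recursive function. The node encoding here is an assumption for illustration: a number is a terminal utility, and an internal node is a pair `(kind, children)`, where the children of a chance node are `(probability, subtree)` pairs.

```python
def expectiminimax(node):
    if isinstance(node, (int, float)):      # terminal utility
        return node
    kind, children = node
    if kind == "max":
        return max(expectiminimax(c) for c in children)
    if kind == "min":
        return min(expectiminimax(c) for c in children)
    if kind == "chance":                    # probability-weighted average
        return sum(p * expectiminimax(c) for p, c in children)
    raise ValueError(f"unknown node kind: {kind}")

# Hypothetical tree: MAX moves, a fair coin is flipped, then MIN moves.
tree = ("max", [
    ("chance", [(0.5, ("min", [2, 4])), (0.5, ("min", [6, 0]))]),
    ("chance", [(0.5, ("min", [1, 3])), (0.5, ("min", [10, 20]))]),
])
root_value = expectiminimax(tree)
```

Here the first action is worth 0.5 * 2 + 0.5 * 0 = 1.0 and the second 0.5 * 1 + 0.5 * 10 = 5.5, so MAX's value at the root is 5.5.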
![Page 54: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/54.jpg)
GENERALIZING MINIMAX VALUES
• Utilities can be continuous numerical values, rather than +1, 0, -1
  • Allows maximizing the amount of “points” (e.g., $) rewarded instead of just achieving a win
• Rewards are associated with terminal states
• Costs can be associated with certain decisions at non-terminal states (e.g., placing a bet)
![Page 55: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/55.jpg)
ROULETTE
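• “Game tree” only has depth 2
  • Place a bet
  • Observe the roulette wheel
[Figure: MAX chooses between “No bet” and “Bet: Red, $5”; the bet leads to a chance node with outcomes Red (probability 18/38, value +10) and Not red (probability 20/38, value 0)]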
![Page 56: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/56.jpg)
CHANCE NODE BACKUP
Expected value: for k children with backed-up values v1, …, vk and probabilities p1, …, pk,
Chance node value = p1 * v1 + p2 * v2 + … + pk * vk
[Figure: chance node for “Bet: Red, $5” with outcomes Red (probability 18/38, value +10) and Not red (probability 20/38, value 0)]
Value: 18/38 * 10 + 20/38 * 0 ≈ 4.74
![Page 57: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/57.jpg)
MAX/CHANCE NODES
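[Figure: a MAX node with two actions, each leading to a chance node.
 Bet: Red, $5 → Red (18/38, +10) or Not red (20/38, 0); chance node value 4.74
 Bet: 17, $5 → 17 (1/38, +150) or Not 17 (37/38, 0); chance node value 3.95 = 150/38]
MAX should pick the action leading to the node with the highest value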
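The two chance-node values can be checked with a few lines of Python. This sketch uses the probabilities and payoffs shown on the slide; the dictionary layout and the names `chance_value` and `bets` are illustrative assumptions.

```python
from fractions import Fraction

def chance_value(outcomes):
    """Expected value of a chance node: sum of p_i * v_i over its children."""
    return sum(p * v for p, v in outcomes)

# (probability, value) pairs for each bet, as on the slide.
bets = {
    "Red, $5": [(Fraction(18, 38), 10), (Fraction(20, 38), 0)],
    "17, $5": [(Fraction(1, 38), 150), (Fraction(37, 38), 0)],
}

values = {name: chance_value(outcomes) for name, outcomes in bets.items()}
best = max(values, key=values.get)   # MAX node: pick the highest expected value

for name, v in values.items():
    print(f"{name}: {float(v):.2f}")
```

Exact fractions avoid rounding questions: the red bet is worth 180/38 and the bet on 17 is worth 150/38, so MAX prefers the red bet.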
![Page 58: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/58.jpg)
A SLIGHTLY MORE COMPLEX EXAMPLE
• Two fair coins
• Pay $1 to start, at which point both are flipped
• Can flip up to two coins again, at a cost of $1 each
• Payout: $5 for HH, $1 for HT or TH, $0 for TT
[Figure: game tree rooted at the outcome HT; MAX chooses among Flip T, Flip H, and Done, each flip leads to a chance node with two outcomes of probability 1/2 (Flip T: HT or HH; Flip H: TT or HT), and MAX chooses again below each outcome]
![Page 59: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/59.jpg)
A SLIGHTLY MORE COMPLEX EXAMPLE
[Figure: the coin-game tree from the previous slide, with terminal utilities written as payout minus flipping cost, e.g. 5-1=4, 1-1=0, 1-2=-1, 5-2=3]
![Page 60: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/60.jpg)
A SLIGHTLY MORE COMPLEX EXAMPLE
[Figure: the coin-game tree from the previous slides, with expected values backed up to the lowest chance nodes]
![Page 61: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/61.jpg)
A SLIGHTLY MORE COMPLEX EXAMPLE
[Figure: the coin-game tree from the previous slides, with values backed up one level further]
![Page 62: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/62.jpg)
A SLIGHTLY MORE COMPLEX EXAMPLE
[Figure: the coin-game tree from the previous slides, with values backed up one level further]
![Page 63: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/63.jpg)
A SLIGHTLY MORE COMPLEX EXAMPLE
[Figure: the coin-game tree from the previous slides, with values backed up one level further]
![Page 64: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/64.jpg)
A SLIGHTLY MORE COMPLEX EXAMPLE
[Figure: the coin-game tree from the previous slides, with values backed up one level further]
![Page 65: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/65.jpg)
A SLIGHTLY MORE COMPLEX EXAMPLE
[Figure: the coin-game tree from the previous slides, with values backed up all the way to the root]
![Page 66: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/66.jpg)
CARD GAMES
• Blackjack (6-deck), video poker: similar to the coin-flipping game
• But in many card games, we need to keep track of the history of dealt cards in the state, because it affects future probabilities
  • One-deck blackjack
  • Bridge
  • Poker
![Page 67: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/67.jpg)
PARTIALLY OBSERVABLE GAMES
• Partial observability
  • Don’t see entire state (e.g., other players’ hands)
  • “Fog of war”
• Examples:
  • Kriegspiel (see R&N)
  • Battleship
  • Stratego
![Page 68: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/68.jpg)
OBSERVATION OF THE REAL WORLD
[Figure: the real world, in some state, produces percepts; the percepts are interpreted in the representation language, e.g. On(A,B), On(B,Table), Handempty]
Percepts can be user’s inputs, sensory data (e.g., image pixels), information received from other agents, ...
![Page 69: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/69.jpg)
SECOND SOURCE OF UNCERTAINTY: IMPERFECT OBSERVATION OF THE WORLD
Observation of the world can be:
• Partial, e.g., a vision sensor can’t see through obstacles (lack of percepts)
[Figure: two rooms, R1 and R2]
The robot may not know whether there is dust in room R2
![Page 70: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/70.jpg)
SECOND SOURCE OF UNCERTAINTY: IMPERFECT OBSERVATION OF THE WORLD
Observation of the world can be:
• Partial, e.g., a vision sensor can’t see through obstacles
• Ambiguous, e.g., percepts have multiple possible interpretations
[Figure: block A resting above blocks B and C; the percept could be interpreted as On(A,B) or On(A,C)]
![Page 71: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/71.jpg)
SECOND SOURCE OF UNCERTAINTY: IMPERFECT OBSERVATION OF THE WORLD
Observation of the world can be:
• Partial, e.g., a vision sensor can’t see through obstacles
• Ambiguous, e.g., percepts have multiple possible interpretations
• Incorrect
![Page 72: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/72.jpg)
PARTIALLY-OBSERVABLE CARD GAMES
One possible strategy:
• Consider all possible deals given observed information
• Solve each deal as a fully-observable problem
• Choose the move that has the best average minimax value
“Averaging over clairvoyance” [Why doesn’t this always work?]
![Page 73: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/73.jpg)
BELIEF STATE
A belief state is the set of all states that an agent thinks are possible at any given time or at any stage of planning a course of actions, e.g.: [figure]
To plan a course of actions, the agent searches a space of belief states, instead of a space of states
![Page 74: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/74.jpg)
SENSOR MODEL
• State space S
• The sensor model is a function $\text{SENSE}: S \to 2^S$ that maps each state s ∈ S to a belief state (the set of all states that the agent would think possible if it were actually observing state s)
• Example: Assume our vacuum robot can perfectly sense the room it is in and whether there is dust in it, but it can’t sense whether there is dust in the other room
SENSE( [figure] ) = [figure]
SENSE( [figure] ) = [figure]
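As a concrete sketch of such a sensor model, the two-room vacuum world can be encoded with states `(location, dust_in_R1, dust_in_R2)`. This encoding and the names `STATES`, `percept`, and `sense` are assumptions made for illustration; the slide's own pictures are not reproduced here.

```python
from itertools import product

# Every state of the two-room world: (robot location, dust in R1, dust in R2).
STATES = [
    (loc, d1, d2)
    for loc, d1, d2 in product(("R1", "R2"), (False, True), (False, True))
]

def percept(state):
    """What the robot observes: its room, and whether that room has dust."""
    loc, d1, d2 = state
    return (loc, d1 if loc == "R1" else d2)

def sense(state):
    """SENSE: S -> 2^S, the set of states indistinguishable from `state`."""
    return frozenset(s for s in STATES if percept(s) == percept(state))

# Robot in R1 and R1 is dusty: the status of R2 stays unknown.
belief = sense(("R1", True, False))
```

The resulting belief state contains two states, one for each possible dust status of the room the robot cannot see.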
![Page 75: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/75.jpg)
VACUUM ROBOT ACTION MODEL
Right either moves the robot right, or does nothing
Left always moves the robot to the left, but it may occasionally deposit dust in the right room
Suck picks up the dirt in the room, if any, and always does the right thing
• The robot perfectly senses the room it is in and whether there is dust in it
• But it can’t sense if there is dust in the other room
![Page 76: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/76.jpg)
TRANSITION BETWEEN BELIEF STATES
Suppose the robot is initially in state: [figure]
After sensing this state, its belief state is: [figure]
Just after executing Left, its belief state will be: [figure]
After sensing the new state, its belief state will be: [figure] if there is no dust in R1, or [figure] if there is dust in R1
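The sequence above (sense, act, sense again) can be sketched in code. Since the slide's pictures are not available, the starting situation here is an assumption: the robot is in R2 and sees no dust there. The state encoding `(location, dust_in_R1, dust_in_R2)` and all function names are likewise hypothetical; the effect of Left follows the action model given earlier (always moves left, may deposit dust in the right room).

```python
from itertools import product

STATES = [
    (loc, d1, d2)
    for loc, d1, d2 in product(("R1", "R2"), (False, True), (False, True))
]

def percept(state):
    loc, d1, d2 = state
    return (loc, d1 if loc == "R1" else d2)

def left_results(state):
    """Left always reaches R1 and may deposit dust in the right room (R2)."""
    _, d1, d2 = state
    return {("R1", d1, d2), ("R1", d1, True)}

def predict(belief, results):
    """Belief state just after acting: union of all possible outcomes."""
    return frozenset(s2 for s in belief for s2 in results(s))

def possible_updates(belief):
    """One refined belief state per percept the robot might receive."""
    return {
        obs: frozenset(s for s in belief if percept(s) == obs)
        for obs in {percept(s) for s in belief}
    }

# Assumed start: robot in R2, no dust seen in R2, R1 unknown.
b0 = frozenset(s for s in STATES if percept(s) == ("R2", False))
b1 = predict(b0, left_results)   # after executing Left, before sensing
b2 = possible_updates(b1)        # after sensing: dust or no dust in R1
```

After Left the robot knows it is in R1 but nothing about the dust; sensing then splits this into two belief states, one per observation, which is the branching the slide describes.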
![Page 77: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/77.jpg)
TRANSITION BETWEEN BELIEF STATES
Playing a “game against nature”
[Figure: executing Left from the current belief state leads to an observation of either Clean(R1) or ¬Clean(R1)]
After receiving an observation, the robot will have one of these two belief states
![Page 78: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/78.jpg)
AND/OR TREE OF BELIEF STATES
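[Figure: AND/OR tree of belief states; branches are labeled with the actions Left, Suck, and Right, and leaves are marked “goal” or “loop”]
A goal belief state is one in which all states are goal states
An action is applicable to a belief state B if its preconditions are achieved in all states in B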
![Page 79: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/79.jpg)
RECAP
• Alpha-beta pruning: reduces the complexity of minimax to $O(b^{h/2})$ ideally, $O(b^{3h/4})$ typically
• Games with chance
  • Expected values: averaging over probabilities
• A 2nd source of uncertainty: partial observability
  • Reason about sets of states: belief state
• Much more on the latter 2 topics later
![Page 80: G AME P LAYING 2. T HIS L ECTURE Alpha-beta pruning Games with chance Partially observable games.](https://reader035.fdocuments.in/reader035/viewer/2022081519/56649f2f5503460f94c499da/html5/thumbnails/80.jpg)
HOMEWORK
Reading: R&N 6.1-3