Announcements - cs.cmu.edu
Announcements

Assignments:
▪ P0: Python & Autograder Tutorial
▪ Due Thu 9/5, 10 pm
▪ HW2 (written)
▪ Due Tue 9/10, 10 pm
▪ No slip days. Up to 24 hours late: 50% penalty
▪ P1: Search & Games
▪ Due Thu 9/12, 10 pm
▪ Recommended to work in pairs
▪ Submit to Gradescope early and often
AI: Representation and Problem Solving
Adversarial Search
Instructors: Pat Virtue & Fei Fang
Slide credits: CMU AI, http://ai.berkeley.edu
Outline
History / Overview
Zero-Sum Games (Minimax)
Evaluation Functions
Search Efficiency (α-β Pruning)
Games of Chance (Expectimax)
Game Playing State-of-the-Art

Checkers:
▪ 1950: First computer player.
▪ 1959: Samuel's self-taught program.
▪ 1994: First computer world champion: Chinook ended the 40-year reign of human champion Marion Tinsley, using a complete 8-piece endgame database.
▪ 2007: Checkers solved! Endgame database of 39 trillion states.

Chess:
▪ 1945-1960: Zuse, Wiener, Shannon, Turing, Newell & Simon, McCarthy.
▪ 1960s onward: gradual improvement under the "standard model".
▪ 1997: special-purpose chess machine Deep Blue defeats human champion Garry Kasparov in a six-game match. Deep Blue examined 200M positions per second and extended some lines of search up to 40 ply. Current programs running on a PC rate > 3200 (vs 2870 for Magnus Carlsen).

Go:
▪ 1968: Zobrist's program plays legal Go, barely (b > 300!)
▪ 2005-2014: Monte Carlo tree search enables rapid advances: programs beat strong amateurs, and professionals with a 3-4 stone handicap.
▪ 2016: AlphaGo from DeepMind beats Lee Sedol.
Behavior from Computation
[Demo: mystery pacman (L6D1)]
Types of Games

Many different kinds of games!

Axes:
▪ Deterministic or stochastic?
▪ Perfect information (fully observable)?
▪ One, two, or more players?
▪ Turn-taking or simultaneous?
▪ Zero sum?

We want algorithms for calculating a contingent plan (a.k.a. strategy or policy) which recommends a move for every possible eventuality.
“Standard” Games
Standard games are deterministic, observable, two-player, turn-taking, zero-sum
Game formulation:
▪ Initial state: s0
▪ Players: Player(s) indicates whose move it is
▪ Actions: Actions(s) for player on move
▪ Transition model: Result(s,a)
▪ Terminal test: Terminal-Test(s)
▪ Terminal values: Utility(s,p) for player p
▪ Or just Utility(s) for the player making the decision at the root
Zero-Sum Games
• Zero-Sum Games
  • Agents have opposite utilities
  • Pure competition: one maximizes, the other minimizes

• General Games
  • Agents have independent utilities
  • Cooperation, indifference, competition, shifting alliances, and more are all possible
Adversarial Search
Single-Agent Trees
(Tree diagram: root value 8; leaf values 2, 0, 2, 6, 4, 6, …)
Minimax
States
Actions
Values
Minimax
(Leaf values: +8, -10, -5, -8)
States
Actions
Values
Piazza Poll 1
(Leaf values: 3 12 8 | 2 4 6 | 14 5 2)
What is the minimax value at the root?
A) 2
B) 3
C) 6
D) 12
E) 14
Piazza Poll 1
(Leaf values: 3 12 8 | 2 4 6 | 14 5 2)
(Min-node values: 3, 2, 2; root value: 3)
What is the minimax value at the root?
A) 2
B) 3
C) 6
D) 12
E) 14
Minimax Code
Max Code
(Leaf values: +8, -10, -8)
Max Code
Minimax Code
Minimax Notation

V(s) = max_a V(s'),  where s' = result(s, a)
Minimax Notation

V(s) = max_a V(s'),  where s' = result(s, a)

â = argmax_a V(s'),  where s' = result(s, a)
Generic Game Tree Pseudocode
function minimax_decision( state )
    return argmax a in state.actions value( state.result(a) )

function value( state )
    if state.is_leaf
        return state.value
    if state.player is MAX
        return max a in state.actions value( state.result(a) )
    if state.player is MIN
        return min a in state.actions value( state.result(a) )
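The generic pseudocode above can be sketched as runnable Python. The `Node` class and its field names below are illustrative assumptions, not the course's actual game-state API:

```python
# Minimax sketch following the generic pseudocode above.  Node is a
# hypothetical stand-in for the course's game-state interface.
class Node:
    def __init__(self, player=None, children=None, value=None):
        self.player = player            # "MAX", "MIN", or None for a leaf
        self.children = children or []  # successor nodes
        self.value = value              # terminal utility (leaves only)

def value(state):
    if not state.children:              # leaf: return its utility
        return state.value
    child_values = [value(c) for c in state.children]
    return max(child_values) if state.player == "MAX" else min(child_values)

def minimax_decision(state):
    # Index of the action (child) with the best backed-up value for MAX.
    return max(range(len(state.children)),
               key=lambda i: value(state.children[i]))

# Example: a MAX root over three MIN nodes (the Poll 1 tree).
root = Node("MAX", [Node("MIN", [Node(value=v) for v in leaves])
                    for leaves in ([3, 12, 8], [2, 4, 6], [14, 5, 2])])
print(value(root))             # 3
print(minimax_decision(root))  # 0 (the left action)
```

The min nodes back up 3, 2, and 2, so the MAX root takes the value 3.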
Minimax Efficiency
How efficient is minimax?
▪ Just like (exhaustive) DFS
▪ Time: O(b^m)
▪ Space: O(bm)

Example: For chess, b ≈ 35, m ≈ 100
▪ Exact solution is completely infeasible
▪ Humans can't do this either, so how do we play chess?
▪ Bounded rationality – Herbert Simon
Resource Limits
Resource Limits
Problem: In realistic games, we cannot search to the leaves!

Solution 1: Bounded lookahead
▪ Search only to a preset depth limit or horizon
▪ Use an evaluation function for non-terminal positions

The guarantee of optimal play is gone.

More plies make a BIG difference.

Example:
▪ Suppose we have 100 seconds and can explore 10K nodes/sec
▪ So we can check 1M nodes per move
▪ For chess, b ≈ 35, so this reaches only about depth 4 – not so good

(Tree: max over min nodes; leaf values -1, -2, 4, 9; min-node values -2 and 4; root value 4)
Depth Matters
Evaluation functions are always imperfect
Deeper search => better play (usually)
Or, deeper search gives same quality of play with a less accurate evaluation function
An important example of the tradeoff between complexity of features and complexity of computation
[Demo: depth limited (L6D4, L6D5)]
Demo Limited Depth (2)
Demo Limited Depth (10)
Evaluation Functions
Evaluation Functions

Evaluation functions score non-terminals in depth-limited search.

Ideal function: returns the actual minimax value of the position.

In practice: typically a weighted linear sum of features:
▪ EVAL(s) = w1 f1(s) + w2 f2(s) + … + wn fn(s)
▪ E.g., w1 = 9, f1(s) = (num white queens – num black queens), etc.
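As a concrete sketch of such a weighted linear evaluation, with made-up feature names and a toy state (not Pacman's or any chess engine's actual features):

```python
# EVAL(s) = w1*f1(s) + w2*f2(s) + ... as a weighted sum of features.
def evaluate(state, features, weights):
    return sum(w * f(state) for f, w in zip(features, weights))

# Toy material-count state; feature names are illustrative only.
state = {"white_queens": 1, "black_queens": 0,
         "white_pawns": 6, "black_pawns": 5}
features = [
    lambda s: s["white_queens"] - s["black_queens"],  # f1: queen balance
    lambda s: s["white_pawns"] - s["black_pawns"],    # f2: pawn balance
]
weights = [9, 1]  # w1 = 9, as in the slide's example
print(evaluate(state, features, weights))  # 9*1 + 1*1 = 10
```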
Evaluation for Pacman
Generalized minimax
What if the game is not zero-sum, or has multiple players?
Generalization of minimax:
▪ Terminals have utility tuples
▪ Node values are also utility tuples
▪ Each player maximizes its own component
▪ Can give rise to cooperation and competition dynamically…

(Three-player tree: leaf utility tuples 1,1,6 · 0,0,7 · 9,9,0 · 8,8,1 · 9,9,0 · 7,7,2 · 0,0,8 · 0,0,7; backed-up values 0,0,7 · 8,8,1 · 7,7,2 · 0,0,8; then 8,8,1 · 7,7,2; root value 8,8,1)
Game Tree Pruning
Alpha-Beta Example
(Leaf values: 3 12 8 | 2 | 14 5 2; after the first min node backs up 3, α = 3 at the root)

α = best option so far from any MAX node on this path

The order of generation matters: more pruning is possible if good moves come first.

(Min-node value 3; root value 3)
Piazza Poll 2

Which branches are pruned? (Left-to-right traversal; select all that apply)
Piazza Poll 3
Which branches are pruned? (Left-to-right traversal)
A) e, l
B) g, l
C) g, k, l
D) g, n
Alpha-Beta Quiz 2

(Worked tree diagram: known leaf values 10 and 100 under one min node and 2 under another; the remaining node values and the α/β entries at each node are left as "?" to fill in.)
Alpha-Beta Implementation
def min-value(state, α, β):
    initialize v = +∞
    for each successor of state:
        v = min(v, value(successor, α, β))
        if v ≤ α
            return v
        β = min(β, v)
    return v

def max-value(state, α, β):
    initialize v = -∞
    for each successor of state:
        v = max(v, value(successor, α, β))
        if v ≥ β
            return v
        α = max(α, v)
    return v

α: MAX's best option on path to root
β: MIN's best option on path to root
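The same pseudocode as runnable Python, using a hypothetical tuple encoding for nodes (not the course's representation):

```python
import math

# Alpha-beta sketch mirroring the min-value/max-value pseudocode above.
# A node is ("MAX", children), ("MIN", children), or ("LEAF", value).
def value(state, alpha, beta):
    kind, payload = state
    if kind == "LEAF":
        return payload
    if kind == "MAX":
        return max_value(payload, alpha, beta)
    return min_value(payload, alpha, beta)

def max_value(children, alpha, beta):
    v = -math.inf
    for child in children:
        v = max(v, value(child, alpha, beta))
        if v >= beta:          # MIN above would never allow this: prune
            return v
        alpha = max(alpha, v)
    return v

def min_value(children, alpha, beta):
    v = math.inf
    for child in children:
        v = min(v, value(child, alpha, beta))
        if v <= alpha:         # MAX above already has better: prune
            return v
        beta = min(beta, v)
    return v

# Three-min-node example tree; the backed-up root value is 3, and the
# second min node is cut off as soon as it sees the leaf 2.
leaf = lambda v: ("LEAF", v)
root = ("MAX", [("MIN", [leaf(3), leaf(12), leaf(8)]),
                ("MIN", [leaf(2), leaf(4), leaf(6)]),
                ("MIN", [leaf(14), leaf(5), leaf(2)])])
print(value(root, -math.inf, math.inf))  # 3
```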
Alpha-Beta Quiz 2
(Trace: at a node where β = 10, a child returns v = 100; since v ≥ β, the node prunes and returns immediately.)

def max-value(state, α, β):
    initialize v = -∞
    for each successor of state:
        v = max(v, value(successor, α, β))
        if v ≥ β
            return v
        α = max(α, v)
    return v

α: MAX's best option on path to root
β: MIN's best option on path to root
Alpha-Beta Quiz 2
(Trace: leaves 10, 100, 2; with α = 10, the min node sees the leaf 2, so v = 2 ≤ α and it returns immediately, pruning its remaining children.)

def min-value(state, α, β):
    initialize v = +∞
    for each successor of state:
        v = min(v, value(successor, α, β))
        if v ≤ α
            return v
        β = min(β, v)
    return v

α: MAX's best option on path to root
β: MIN's best option on path to root
Alpha-Beta Pruning Properties

Theorem: This pruning has no effect on the minimax value computed for the root!

Good child ordering improves effectiveness of pruning
▪ Iterative deepening helps with this

With "perfect ordering":
▪ Time complexity drops to O(b^(m/2))
▪ Doubles solvable depth!
▪ 1M nodes/move => depth = 8, respectable

This is a simple example of metareasoning (computing about what to compute)

(Tree: max over min nodes with leaf values 10, 10, 0)
Minimax Demo
Fine print:
▪ Pacman: uses depth-4 minimax
▪ Ghost: uses depth-2 minimax

Points:
+500 win
-500 lose
-1 each move
Piazza Poll 4

How well would a minimax Pacman perform against a ghost that moves randomly?

A. Better than against a minimax ghost
B. Worse than against a minimax ghost
C. Same as against a minimax ghost

Fine print:
▪ Pacman: uses depth-4 minimax as before
▪ Ghost: moves randomly
Demo
Assumptions vs. Reality

Results from playing 5 games:

|                    | Minimax Ghost             | Random Ghost             |
|--------------------|---------------------------|--------------------------|
| Minimax Pacman     | Won 5/5, Avg. Score: 493  | Won 5/5, Avg. Score: 464 |
| Expectimax Pacman  | Won 1/5, Avg. Score: -303 | Won 5/5, Avg. Score: 503 |
Modeling Assumptions
Know your opponent
(Leaf values: 100, 9, 10, 10)
Modeling Assumptions

Minimax autonomous vehicle?
Image: https://corporate.ford.com/innovation/autonomous-2021.html
Clip: How I Met Your Mother, CBS
Minimax Driver?
Modeling Assumptions
Dangerous Optimism: assuming chance when the world is adversarial

Dangerous Pessimism: assuming the worst case when it's not likely
Modeling Assumptions
Chance nodes: Expectimax
(Leaf values: 100, 9, 10, 10)
Chance outcomes in trees
(Three example trees, each with leaf values 10, 10, 9, 100)

Minimax: tic-tac-toe, chess
Expectimax: Tetris, investing
Expectiminimax: backgammon, Monopoly
Probabilities
Probabilities
A random variable represents an event whose outcome is unknown.

A probability distribution is an assignment of weights to outcomes.

Example: Traffic on freeway
▪ Random variable: T = whether there's traffic
▪ Outcomes: T in {none, light, heavy}
▪ Distribution: P(T=none) = 0.25, P(T=light) = 0.50, P(T=heavy) = 0.25

Probabilities over all possible outcomes sum to one.
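The traffic distribution written out as a quick sanity check (a minimal sketch, not course code):

```python
# P(T) from the slide; probabilities over all outcomes must sum to one.
P_T = {"none": 0.25, "light": 0.50, "heavy": 0.25}
total = sum(P_T.values())
print(total)  # 1.0
```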
Expected Value

Expected value of a function of a random variable: average the values of each outcome, weighted by the probability of that outcome.

Example: How long to get to the airport?
Time: 20 min, 30 min, 60 min with probability 0.25, 0.50, 0.25
Expected time: 0.25 × 20 + 0.50 × 30 + 0.25 × 60 = 35 min
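The airport example as a few lines of Python (the numbers are the slide's):

```python
# Expected travel time: average of outcomes weighted by probability.
times = [20, 30, 60]          # minutes
probs = [0.25, 0.50, 0.25]
expected = sum(p * t for p, t in zip(probs, times))
print(expected)  # 0.25*20 + 0.50*30 + 0.25*60 = 35.0
```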
Expectations

Example: times 20, 30, 60 min with probabilities 0.25, 0.50, 0.25.

Max node notation: V(s) = max_a V(s'),  where s' = result(s, a)

Chance node notation: V(s) = Σ_{s'} P(s') V(s')
Piazza Poll 5
Expectimax tree search: which action do we choose?

A: Left   B: Center   C: Right   D: Eight

(Chance nodes: Left has leaves 12, 8, 4 with probabilities 1/4, 1/4, 1/2; Center has leaves 8, 6 with probabilities 1/2, 1/2; Right has leaves 12, 6 with probabilities 1/3, 2/3.)
Piazza Poll 5
Expectimax tree search: which action do we choose?

A: Left   B: Center   C: Right   D: Eight

(Chance nodes: Left has leaves 12, 8, 4 with probabilities 1/4, 1/4, 1/2; Center has leaves 8, 6 with probabilities 1/2, 1/2; Right has leaves 12, 6 with probabilities 1/3, 2/3.)

Left: 3 + 2 + 2 = 7;  Center: 4 + 3 = 7;  Right: 4 + 4 = 8

Answer: value 8, action Right.
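The three expectations can be checked with exact fractions (a sketch; leaf values and probabilities are read off the answer above):

```python
from fractions import Fraction as F

# Probability-weighted value of each chance node in Poll 5.
def expectation(pairs):
    return sum(p * v for p, v in pairs)

left   = expectation([(F(1, 4), 12), (F(1, 4), 8), (F(1, 2), 4)])
center = expectation([(F(1, 2), 8), (F(1, 2), 6)])
right  = expectation([(F(1, 3), 12), (F(2, 3), 6)])
print(left, center, right)  # 7 7 8 -> choose Right
```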
Expectimax Pruning?
(Tree with leaf values 3, 12, 9, 2)
Expectimax Code
function value( state )
    if state.is_leaf
        return state.value
    if state.player is MAX
        return max a in state.actions value( state.result(a) )
    if state.player is MIN
        return min a in state.actions value( state.result(a) )
    if state.player is CHANCE
        return sum s in state.next_states P( s ) * value( s )
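And as runnable Python, again with a hypothetical tuple encoding for nodes (not the course's API):

```python
# Expectimax sketch following the pseudocode above.  Nodes are
# ("MAX", children), ("MIN", children), ("CHANCE", [(prob, child), ...]),
# or ("LEAF", value); this encoding is illustrative only.
def value(state):
    kind, payload = state
    if kind == "LEAF":
        return payload
    if kind == "MAX":
        return max(value(c) for c in payload)
    if kind == "MIN":
        return min(value(c) for c in payload)
    # CHANCE: probability-weighted average of child values
    return sum(p * value(c) for p, c in payload)

# The airport example as a single chance node.
trip = ("CHANCE", [(0.25, ("LEAF", 20)),
                   (0.50, ("LEAF", 30)),
                   (0.25, ("LEAF", 60))])
print(value(trip))  # 35.0
```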
Preview: MDP/Reinforcement Learning Notation

V(s) = max_a Σ_{s'} P(s') V(s')
Preview: MDP/Reinforcement Learning Notation

Standard expectimax:
V(s) = max_a Σ_{s'} P(s'|s,a) V(s')

Bellman equations:
V(s) = max_a Σ_{s'} P(s'|s,a) [R(s,a,s') + γ V(s')]

Value iteration:
V_{k+1}(s) = max_a Σ_{s'} P(s'|s,a) [R(s,a,s') + γ V_k(s')], ∀s

Q-iteration:
Q_{k+1}(s,a) = Σ_{s'} P(s'|s,a) [R(s,a,s') + γ max_{a'} Q_k(s',a')], ∀s,a

Policy extraction:
π_V(s) = argmax_a Σ_{s'} P(s'|s,a) [R(s,a,s') + γ V(s')], ∀s

Policy evaluation:
V^π_{k+1}(s) = Σ_{s'} P(s'|s,π(s)) [R(s,π(s),s') + γ V^π_k(s')], ∀s

Policy improvement:
π_new(s) = argmax_a Σ_{s'} P(s'|s,a) [R(s,a,s') + γ V^{π_old}(s')], ∀s
Why Expectimax?
A pretty great model for an agent acting in the world:

choose the action with the highest expected value.
Bonus Question

Let's say you know that your opponent is actually running a depth-1 minimax, using the result 80% of the time and moving randomly otherwise.
Question: What tree search should you use?
A: Minimax
B: Expectimax
C: Something completely different
Summary

Games require decisions when optimality is impossible
▪ Bounded-depth search and approximate evaluation functions

Games force efficient use of computation
▪ Alpha-beta pruning
Game playing has produced important research ideas▪ Reinforcement learning (checkers)
▪ Iterative deepening (chess)
▪ Rational metareasoning (Othello)
▪ Monte Carlo tree search (Go)
▪ Solution methods for partial-information games in economics (poker)
Video games present much greater challenges – lots to do!
▪ b = 10^500, |S| = 10^4000, m = 10,000