Game Playing State-of-the-Artcs188/sp20/assets/...champion Gary Kasparov in a six-game match. Deep...
Transcript of Game Playing State-of-the-Artcs188/sp20/assets/...champion Gary Kasparov in a six-game match. Deep...
![Page 1: Game Playing State-of-the-Artcs188/sp20/assets/...champion Gary Kasparov in a six-game match. Deep Blue examined 200M positions per second, used sophisticated evaluation and undisclosed](https://reader035.fdocuments.in/reader035/viewer/2022071414/610d4beb19638064916a83eb/html5/thumbnails/1.jpg)
Game Playing State-of-the-Art
http://bit.ly/3GEMok9
Chinook beat 40-year-reign of champion MarionTinsley using complete 8-piece endgame.2007: Checkers solved!
Chess: 1997: Deep Blue defeats humanchampion Gary Kasparov in a six-game match.Deep Blue examined 200M positions persecond, used sophisticated evaluation andundisclosed methods for extending some lines ofsearch up to 40 ply. Current programs are evenbetter, if less historic.
Go: Human champions are being beaten. In go,b > 300! Classically use pattern knowledgebases, but big recent advances use Monte Carlo(randomized) expansion methods.
Pacman
![Page 3: Game Playing State-of-the-Artcs188/sp20/assets/...champion Gary Kasparov in a six-game match. Deep Blue examined 200M positions per second, used sophisticated evaluation and undisclosed](https://reader035.fdocuments.in/reader035/viewer/2022071414/610d4beb19638064916a83eb/html5/thumbnails/3.jpg)
Video of Demo Mystery Pacman
![Page 5: Game Playing State-of-the-Artcs188/sp20/assets/...champion Gary Kasparov in a six-game match. Deep Blue examined 200M positions per second, used sophisticated evaluation and undisclosed](https://reader035.fdocuments.in/reader035/viewer/2022071414/610d4beb19638064916a83eb/html5/thumbnails/5.jpg)
Types of Games
http://bit.ly/3GEMok9
Many different kinds of games!
Axes:t Deterministic or stochastic?t One, two, or more players?t Zero sum?t Perfect information (can you see the state)?
Want algorithms for calculating a strategy(policy) which recommends a move from eachstate.
![Page 6: Game Playing State-of-the-Artcs188/sp20/assets/...champion Gary Kasparov in a six-game match. Deep Blue examined 200M positions per second, used sophisticated evaluation and undisclosed](https://reader035.fdocuments.in/reader035/viewer/2022071414/610d4beb19638064916a83eb/html5/thumbnails/6.jpg)
Deterministic Games
http://bit.ly/3GEMok9
Many possible formalizations, one is:t States: S (start at s0)t Players: P = {1...N} (usually take turns).t Actions: A (may depend on player / state)t Transition Function: S×A→ S.t Terminal Test: S→{t , f}t Terminal Utilities: S×P→ R.
Solution for a player is a policy: S→ A.
![Page 7: Game Playing State-of-the-Artcs188/sp20/assets/...champion Gary Kasparov in a six-game match. Deep Blue examined 200M positions per second, used sophisticated evaluation and undisclosed](https://reader035.fdocuments.in/reader035/viewer/2022071414/610d4beb19638064916a83eb/html5/thumbnails/7.jpg)
Zero-Sum Games
http://bit.ly/3GEMok9
Zero-Sum Gamest Agents have opposite utilities(values on outcomes)t Lets us think of a single valuethat one maximizes and the otherminimizest Adversarial, pure competition
General Gamest Agents have independentutilities (values on outcomes)t Cooperation, indifference,competition, and more are allpossiblet More later on non-zero-sumgames
![Page 8: Game Playing State-of-the-Artcs188/sp20/assets/...champion Gary Kasparov in a six-game match. Deep Blue examined 200M positions per second, used sophisticated evaluation and undisclosed](https://reader035.fdocuments.in/reader035/viewer/2022071414/610d4beb19638064916a83eb/html5/thumbnails/8.jpg)
Adversarial Search
![Page 9: Game Playing State-of-the-Artcs188/sp20/assets/...champion Gary Kasparov in a six-game match. Deep Blue examined 200M positions per second, used sophisticated evaluation and undisclosed](https://reader035.fdocuments.in/reader035/viewer/2022071414/610d4beb19638064916a83eb/html5/thumbnails/9.jpg)
Single-Agent Trees
8
2 0 · · · 2 6 · · · 4 6
![Page 10: Game Playing State-of-the-Artcs188/sp20/assets/...champion Gary Kasparov in a six-game match. Deep Blue examined 200M positions per second, used sophisticated evaluation and undisclosed](https://reader035.fdocuments.in/reader035/viewer/2022071414/610d4beb19638064916a83eb/html5/thumbnails/10.jpg)
Value of Game Tree
Value of a State: utility of best achievable outcome from state.
8
2 0 · · · 2 6 · · · 4 6
Terminal States:V (s) = known
Non terminal states:V (s) = maxs′∈kids(s) V (s′)
![Page 11: Game Playing State-of-the-Artcs188/sp20/assets/...champion Gary Kasparov in a six-game match. Deep Blue examined 200M positions per second, used sophisticated evaluation and undisclosed](https://reader035.fdocuments.in/reader035/viewer/2022071414/610d4beb19638064916a83eb/html5/thumbnails/11.jpg)
Adversarial Game Tree
8
2 0 · · · 2 6 · · · 4 6
![Page 12: Game Playing State-of-the-Artcs188/sp20/assets/...champion Gary Kasparov in a six-game match. Deep Blue examined 200M positions per second, used sophisticated evaluation and undisclosed](https://reader035.fdocuments.in/reader035/viewer/2022071414/610d4beb19638064916a83eb/html5/thumbnails/12.jpg)
Minimax Values
+8 −10 −5 −8
Terminal States: V (s) = known
Max states: V (s) = maxs′∈succ(s) V (s′)
Min states: V (s) = mins′∈succ(s) V (s′)
States Under Agent’s Control. Max.
Terminal States.
States Under Opponent’s Control. Min.
![Page 13: Game Playing State-of-the-Artcs188/sp20/assets/...champion Gary Kasparov in a six-game match. Deep Blue examined 200M positions per second, used sophisticated evaluation and undisclosed](https://reader035.fdocuments.in/reader035/viewer/2022071414/610d4beb19638064916a83eb/html5/thumbnails/13.jpg)
Tic-Tac-Toe Game Tree
![Page 14: Game Playing State-of-the-Artcs188/sp20/assets/...champion Gary Kasparov in a six-game match. Deep Blue examined 200M positions per second, used sophisticated evaluation and undisclosed](https://reader035.fdocuments.in/reader035/viewer/2022071414/610d4beb19638064916a83eb/html5/thumbnails/14.jpg)
Adversarial Search (Minimax)
2 5
5
8 2 5 6
Minimax values:computed recursively.
Terminal values:part of the game.
Deterministic, zero-sum games:t Tic-tac-toe, chess, checkerst One player maximizes resultt The other minimizes result
Minimax search:t A state-space search treet Players alternate turnst Compute each node’s minimax value:the best achievable utility against arational (optimal) adversary
![Page 15: Game Playing State-of-the-Artcs188/sp20/assets/...champion Gary Kasparov in a six-game match. Deep Blue examined 200M positions per second, used sophisticated evaluation and undisclosed](https://reader035.fdocuments.in/reader035/viewer/2022071414/610d4beb19638064916a83eb/html5/thumbnails/15.jpg)
Minimax Implementation
V (s) = mins′∈succ(s) V (s′)
def min-value(state):initialize v = +∞
for each successor s of state:v = min(v, max-value(s))
return v
V (s′) = maxs∈succ(s′) V (s)
def max-value(state):initialize v = −∞.for each successor s of state:
v = max(v, min-value(s))return v
def value(state):if the state is terminal: return the state’s utilityif the next agent is MAX: return max-value(state)if the next agent is MIN: return min-value(state)
![Page 16: Game Playing State-of-the-Artcs188/sp20/assets/...champion Gary Kasparov in a six-game match. Deep Blue examined 200M positions per second, used sophisticated evaluation and undisclosed](https://reader035.fdocuments.in/reader035/viewer/2022071414/610d4beb19638064916a83eb/html5/thumbnails/16.jpg)
Minimax Example
3 2 5
5
3 12 8 2 4 6 14 5 20
![Page 17: Game Playing State-of-the-Artcs188/sp20/assets/...champion Gary Kasparov in a six-game match. Deep Blue examined 200M positions per second, used sophisticated evaluation and undisclosed](https://reader035.fdocuments.in/reader035/viewer/2022071414/610d4beb19638064916a83eb/html5/thumbnails/17.jpg)
Minimax Efficiency
How efficient is minimax?t Just like (exhaustive) DFSt Time: O(bm)t Space: O(bm)
Example: For chess, b ≈ 35, m ≈ 100t Exact solution is completely infeasible.t But, do we need to explore the whole tree?
![Page 18: Game Playing State-of-the-Artcs188/sp20/assets/...champion Gary Kasparov in a six-game match. Deep Blue examined 200M positions per second, used sophisticated evaluation and undisclosed](https://reader035.fdocuments.in/reader035/viewer/2022071414/610d4beb19638064916a83eb/html5/thumbnails/18.jpg)
Minimax Properties
10 9
10
10 10 9 100
Minimax values:computed recursively.
Terminal values:part of the game.
Optimal against a perfect player. Otherwise?
Demo : minvsexp(L6D2,L6D3)
![Page 19: Game Playing State-of-the-Artcs188/sp20/assets/...champion Gary Kasparov in a six-game match. Deep Blue examined 200M positions per second, used sophisticated evaluation and undisclosed](https://reader035.fdocuments.in/reader035/viewer/2022071414/610d4beb19638064916a83eb/html5/thumbnails/19.jpg)
Video of Demo Min vs. Exp (Min)
![Page 20: Game Playing State-of-the-Artcs188/sp20/assets/...champion Gary Kasparov in a six-game match. Deep Blue examined 200M positions per second, used sophisticated evaluation and undisclosed](https://reader035.fdocuments.in/reader035/viewer/2022071414/610d4beb19638064916a83eb/html5/thumbnails/20.jpg)
Video of Demo Min vs. Exp (Exp)
![Page 21: Game Playing State-of-the-Artcs188/sp20/assets/...champion Gary Kasparov in a six-game match. Deep Blue examined 200M positions per second, used sophisticated evaluation and undisclosed](https://reader035.fdocuments.in/reader035/viewer/2022071414/610d4beb19638064916a83eb/html5/thumbnails/21.jpg)
Resource Limits
![Page 22: Game Playing State-of-the-Artcs188/sp20/assets/...champion Gary Kasparov in a six-game match. Deep Blue examined 200M positions per second, used sophisticated evaluation and undisclosed](https://reader035.fdocuments.in/reader035/viewer/2022071414/610d4beb19638064916a83eb/html5/thumbnails/22.jpg)
Resource Limits
Problem: In realistic games, cannot search toleaves!
Solution: Depth-limited searcht Instead, search to limited depth in the tree.t Use an evaluation function for non-terminalpositions
Example:t Suppose we have 100 seconds,can explore 10K nodes / sect So can check 1M nodes per move.t ↓,↑- reaches about depth 8– decent chess program
Guarantee of optimal play is gone.More plies makes a BIG differenceUse iterative deepening for an anytime algorithm
![Page 23: Game Playing State-of-the-Artcs188/sp20/assets/...champion Gary Kasparov in a six-game match. Deep Blue examined 200M positions per second, used sophisticated evaluation and undisclosed](https://reader035.fdocuments.in/reader035/viewer/2022071414/610d4beb19638064916a83eb/html5/thumbnails/23.jpg)
Depth Matters
Evaluation functions are always imperfect
The deeper in the tree the evaluation function isburied, the less the quality of the evaluationfunction matters
An important example of the tradeoff betweencomplexity of features and complexity ofcomputation
Demo : depthlimited(L6D4,L6D5)
![Page 24: Game Playing State-of-the-Artcs188/sp20/assets/...champion Gary Kasparov in a six-game match. Deep Blue examined 200M positions per second, used sophisticated evaluation and undisclosed](https://reader035.fdocuments.in/reader035/viewer/2022071414/610d4beb19638064916a83eb/html5/thumbnails/24.jpg)
Video of Demo Limited Depth (2)
![Page 25: Game Playing State-of-the-Artcs188/sp20/assets/...champion Gary Kasparov in a six-game match. Deep Blue examined 200M positions per second, used sophisticated evaluation and undisclosed](https://reader035.fdocuments.in/reader035/viewer/2022071414/610d4beb19638064916a83eb/html5/thumbnails/25.jpg)
Video of Demo Limited Depth (10)
![Page 26: Game Playing State-of-the-Artcs188/sp20/assets/...champion Gary Kasparov in a six-game match. Deep Blue examined 200M positions per second, used sophisticated evaluation and undisclosed](https://reader035.fdocuments.in/reader035/viewer/2022071414/610d4beb19638064916a83eb/html5/thumbnails/26.jpg)
Evaluation Functions
![Page 27: Game Playing State-of-the-Artcs188/sp20/assets/...champion Gary Kasparov in a six-game match. Deep Blue examined 200M positions per second, used sophisticated evaluation and undisclosed](https://reader035.fdocuments.in/reader035/viewer/2022071414/610d4beb19638064916a83eb/html5/thumbnails/27.jpg)
Evaluation Functions
Evaluation functions score non-terminals in depth-limited search
Ideal function: returns the actual minimax value of the position
In practice: typically weighted linear sum of features:w1f1(s)+w2f2(s)+ · · ·wnfn(s).
Example: f1(s) = (num white queens – num black queens), etc.
![Page 28: Game Playing State-of-the-Artcs188/sp20/assets/...champion Gary Kasparov in a six-game match. Deep Blue examined 200M positions per second, used sophisticated evaluation and undisclosed](https://reader035.fdocuments.in/reader035/viewer/2022071414/610d4beb19638064916a83eb/html5/thumbnails/28.jpg)
Evaluation for Pacman
Demo : thrashingd =2, thrashingd =2(fixedevaluationfunction),smartghostscoordinate(L6D6,7,8,10)
![Page 29: Game Playing State-of-the-Artcs188/sp20/assets/...champion Gary Kasparov in a six-game match. Deep Blue examined 200M positions per second, used sophisticated evaluation and undisclosed](https://reader035.fdocuments.in/reader035/viewer/2022071414/610d4beb19638064916a83eb/html5/thumbnails/29.jpg)
Video of Demo Thrashing (d=2)
![Page 30: Game Playing State-of-the-Artcs188/sp20/assets/...champion Gary Kasparov in a six-game match. Deep Blue examined 200M positions per second, used sophisticated evaluation and undisclosed](https://reader035.fdocuments.in/reader035/viewer/2022071414/610d4beb19638064916a83eb/html5/thumbnails/30.jpg)
Why Pacman Starves
A danger of replanning agents!t He knows his score will go up by eating the dot now (west, east)t He knows his score will go up just as much by eating the dot later(east, west)t There are no point-scoring opportunities after eating the dot (withinthe horizon, two here)t Therefore, waiting seems just as good as eating: he may go east,then back west in the next round of replanning!
![Page 31: Game Playing State-of-the-Artcs188/sp20/assets/...champion Gary Kasparov in a six-game match. Deep Blue examined 200M positions per second, used sophisticated evaluation and undisclosed](https://reader035.fdocuments.in/reader035/viewer/2022071414/610d4beb19638064916a83eb/html5/thumbnails/31.jpg)
Video of Demo Thrashing – Fixed (d=2)
![Page 32: Game Playing State-of-the-Artcs188/sp20/assets/...champion Gary Kasparov in a six-game match. Deep Blue examined 200M positions per second, used sophisticated evaluation and undisclosed](https://reader035.fdocuments.in/reader035/viewer/2022071414/610d4beb19638064916a83eb/html5/thumbnails/32.jpg)
Video of Demo Smart Ghosts (Coordination)
![Page 33: Game Playing State-of-the-Artcs188/sp20/assets/...champion Gary Kasparov in a six-game match. Deep Blue examined 200M positions per second, used sophisticated evaluation and undisclosed](https://reader035.fdocuments.in/reader035/viewer/2022071414/610d4beb19638064916a83eb/html5/thumbnails/33.jpg)
Video of Demo Smart Ghosts (Coordination) –Zoomed In
![Page 34: Game Playing State-of-the-Artcs188/sp20/assets/...champion Gary Kasparov in a six-game match. Deep Blue examined 200M positions per second, used sophisticated evaluation and undisclosed](https://reader035.fdocuments.in/reader035/viewer/2022071414/610d4beb19638064916a83eb/html5/thumbnails/34.jpg)
Game Tree Pruning
![Page 35: Game Playing State-of-the-Artcs188/sp20/assets/...champion Gary Kasparov in a six-game match. Deep Blue examined 200M positions per second, used sophisticated evaluation and undisclosed](https://reader035.fdocuments.in/reader035/viewer/2022071414/610d4beb19638064916a83eb/html5/thumbnails/35.jpg)
Minimax Example
3 2 5
5
3 12 8 2 4 6 14 5 20
![Page 36: Game Playing State-of-the-Artcs188/sp20/assets/...champion Gary Kasparov in a six-game match. Deep Blue examined 200M positions per second, used sophisticated evaluation and undisclosed](https://reader035.fdocuments.in/reader035/viewer/2022071414/610d4beb19638064916a83eb/html5/thumbnails/36.jpg)
Minimax Pruning
3 ≤ 2 5
5
3 12 8 2 14 5 20
![Page 37: Game Playing State-of-the-Artcs188/sp20/assets/...champion Gary Kasparov in a six-game match. Deep Blue examined 200M positions per second, used sophisticated evaluation and undisclosed](https://reader035.fdocuments.in/reader035/viewer/2022071414/610d4beb19638064916a83eb/html5/thumbnails/37.jpg)
Alpha-Beta Pruning
General configuration (MIN version)t Computing MIN-VALUE at some node n.t Looping over n’s childrent n’s estimate of min is droppingt Who cares about n’s value? MAXt a = “ best MAX value on path to root.”t If n < a, than MAX will never choose it.So search “prunes” other children.
Symmetric for MAX version.
![Page 38: Game Playing State-of-the-Artcs188/sp20/assets/...champion Gary Kasparov in a six-game match. Deep Blue examined 200M positions per second, used sophisticated evaluation and undisclosed](https://reader035.fdocuments.in/reader035/viewer/2022071414/610d4beb19638064916a83eb/html5/thumbnails/38.jpg)
Alpha-Beta Implementation
α - MAX’s best option on path to root.
β - MIN’s best option on path to root.
def min-value(state , α,β ):initialize v = +∞.for each successor s of state:
v = min(v, value(s, α,β ))if v ≤ α return vβ =min(β ,v)
return v
def max-value(state , α,β ):initialize v = −∞.for each successor s of state:
v = max(v, value(s, α,β ))if v ≤ α return vα =min(α,v)
return v
![Page 39: Game Playing State-of-the-Artcs188/sp20/assets/...champion Gary Kasparov in a six-game match. Deep Blue examined 200M positions per second, used sophisticated evaluation and undisclosed](https://reader035.fdocuments.in/reader035/viewer/2022071414/610d4beb19638064916a83eb/html5/thumbnails/39.jpg)
Alpha-Beta Pruning Properties
This pruning has no effect on minimax valuecomputed for the root!
Values of intermediate nodes might be wrongt Important: root’s children may be wrongt→ most naive version not for action selection
Good child ordering improves effectiveness
With “perfect ordering”:t Time complexity drops to O(bm/2)t Doubles solvable depth!t Full search of, e.g. chess, is still hopeless...
This is a simple example of metareasoning (computing about what tocompute)
![Page 40: Game Playing State-of-the-Artcs188/sp20/assets/...champion Gary Kasparov in a six-game match. Deep Blue examined 200M positions per second, used sophisticated evaluation and undisclosed](https://reader035.fdocuments.in/reader035/viewer/2022071414/610d4beb19638064916a83eb/html5/thumbnails/40.jpg)
Alpha-Beta Quiz
![Page 41: Game Playing State-of-the-Artcs188/sp20/assets/...champion Gary Kasparov in a six-game match. Deep Blue examined 200M positions per second, used sophisticated evaluation and undisclosed](https://reader035.fdocuments.in/reader035/viewer/2022071414/610d4beb19638064916a83eb/html5/thumbnails/41.jpg)
Alpha-Beta Quiz 2
![Page 42: Game Playing State-of-the-Artcs188/sp20/assets/...champion Gary Kasparov in a six-game match. Deep Blue examined 200M positions per second, used sophisticated evaluation and undisclosed](https://reader035.fdocuments.in/reader035/viewer/2022071414/610d4beb19638064916a83eb/html5/thumbnails/42.jpg)
Next Time: Uncertainty!