CS 188: Artificial Intelligence Fall 2009 Lecture 6: Adversarial Search 9/15/2009 Dan Klein – UC...
CS 188: Artificial Intelligence Fall 2009
Lecture 6: Adversarial Search
9/15/2009
Dan Klein – UC Berkeley
Many slides over the course adapted from either Stuart Russell or Andrew Moore
Announcements
Written 1 has been up (Search and CSPs)
Project 2 will be up soon (Multi-Agent Pacman)
Other announcements: None yet
Today
Finish up Search and CSPs
Start on Adversarial Search
Tree-Structured CSPs
Theorem: if the constraint graph has no loops, the CSP can be solved in $O(n d^2)$ time
Compare to general CSPs, where worst-case time is $O(d^n)$
This property also applies to probabilistic reasoning (later): an important example of the relation between syntactic restrictions and the complexity of reasoning.
Tree-Structured CSPs
Choose a variable as root, order variables from root to leaves such that every node’s parent precedes it in the ordering
For i = n : 2, apply RemoveInconsistent(Parent(Xi), Xi)
For i = 1 : n, assign Xi consistently with Parent(Xi)
Runtime: $O(n d^2)$ (why?)
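The two passes above can be sketched in Python. This is a minimal sketch, not the course's implementation: the function name `solve_tree_csp`, its argument layout, and the `consistent` callback are assumptions made for illustration. The backward pass plays the role of RemoveInconsistent(Parent(Xi), Xi), and the forward pass assigns values root-first.

```python
def solve_tree_csp(order, parent, domains, consistent):
    """Solve a tree-structured binary CSP.

    order:      variables listed root-first; every parent precedes its children
    parent:     dict mapping each non-root variable to its parent
    domains:    dict mapping each variable to a list of candidate values
    consistent: consistent(p, p_val, c, c_val) -> True if the pair is allowed
    All of these names are hypothetical, chosen for this sketch.
    """
    domains = {v: list(vals) for v, vals in domains.items()}  # work on a copy

    # Backward pass (i = n down to 2): prune parent values with no support in the child.
    for child in reversed(order[1:]):
        p = parent[child]
        domains[p] = [pv for pv in domains[p]
                      if any(consistent(p, pv, child, cv) for cv in domains[child])]
        if not domains[p]:
            return None  # some domain emptied: no solution

    # Forward pass (i = 1 up to n): any surviving value has a consistent child value.
    root = order[0]
    if not domains[root]:
        return None
    assignment = {root: domains[root][0]}
    for var in order[1:]:
        p = parent[var]
        assignment[var] = next(v for v in domains[var]
                               if consistent(p, assignment[p], var, v))
    return assignment


# Toy map-coloring tree: A is the root, B and C hang off A, D hangs off C.
different = lambda p, pv, c, cv: pv != cv
solution = solve_tree_csp(
    ["A", "B", "C", "D"],
    {"B": "A", "C": "A", "D": "C"},
    {"A": ["r", "g"], "B": ["r"], "C": ["r", "g"], "D": ["g"]},
    different,
)
print(solution)
```

Each backward step compares at most d parent values against d child values, and there are n - 1 such steps, which is one way to see the $O(n d^2)$ bound.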
Tree-Structured CSPs
Why does this work?
Claim: After each node is processed leftward, all nodes to the right can be assigned in any way consistent with their parent.
Proof: Induction on position
Why doesn’t this algorithm work with loops?
Note: we’ll see this basic idea again with Bayes’ nets
Nearly Tree-Structured CSPs
Conditioning: instantiate a variable, prune its neighbors' domains
Cutset conditioning: instantiate (in all ways) a set of variables such that the remaining constraint graph is a tree
Cutset size c gives runtime $O(d^c \cdot (n-c)\, d^2)$, very fast for small c
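A short worked version of the cutset bound may help. The numbers n = 80, d = 2, c = 20 are assumed purely for illustration and do not come from this slide.

```latex
\underbrace{d^{c}}_{\text{cutset instantiations}}
\times
\underbrace{(n-c)\,d^{2}}_{\text{tree CSP on the remaining variables}}
\qquad\text{versus}\qquad
d^{n}\ \text{for general search}

% Illustrative values (assumed): n = 80,\ d = 2,\ c = 20
d^{n} = 2^{80} \approx 1.2 \times 10^{24},
\qquad
d^{c}\,(n-c)\,d^{2} = 2^{20} \cdot 60 \cdot 4 \approx 2.5 \times 10^{8}
```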
Tree Decompositions*
Create a tree-structured graph of overlapping subproblems, each is a mega-variable
Solve each subproblem to enforce local constraints
Solve the CSP over subproblem mega-variables using our efficient tree-structured CSP algorithm
[Figure: mega-variables M1, M2, M3, M4 over overlapping variable groups (WA, NT, SA), (NT, SA, Q), (Q, SA, NSW), …; neighboring mega-variables must agree on shared vars]
M1: {(WA=r,SA=g,NT=b), (WA=b,SA=r,NT=g), …}
M2: {(NT=r,SA=g,Q=b), (NT=b,SA=g,Q=r), …}
Agree: (M1,M2) ∈ {((WA=g,SA=g,NT=g), (NT=g,SA=g,Q=g)), …}
Iterative Algorithms for CSPs
Local search methods: typically work with “complete” states, i.e., all variables assigned
To apply to CSPs:
Start with some assignment with unsatisfied constraints
Operators reassign variable values
No fringe! Live on the edge.
Variable selection: randomly select any conflicted variable
Value selection by min-conflicts heuristic:
Choose value that violates the fewest constraints
I.e., hill climb with h(n) = total number of violated constraints
Example: 4-Queens
States: 4 queens in 4 columns ($4^4 = 256$ states)
Operators: move queen in column
Goal test: no attacks
Evaluation: c(n) = number of attacks
[DEMO]
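Min-conflicts on n-queens can be sketched as follows. This is a minimal sketch under assumptions: one queen per column, random tie-breaking among equally good rows, and a step cap (`max_steps`) so the loop always ends. The function names are hypothetical and this is not the demo code.

```python
import random


def conflicts(rows, col, row):
    """Count queens in other columns that attack square (row, col)."""
    return sum(1 for c, r in enumerate(rows)
               if c != col and (r == row or abs(r - row) == abs(c - col)))


def min_conflicts_queens(n, max_steps=10_000, seed=0):
    """rows[c] is the row of the queen in column c. Returns a solution or None."""
    rng = random.Random(seed)
    rows = [rng.randrange(n) for _ in range(n)]  # complete (possibly bad) assignment
    for _ in range(max_steps):
        conflicted = [c for c in range(n) if conflicts(rows, c, rows[c]) > 0]
        if not conflicted:
            return rows  # goal test: no attacks
        col = rng.choice(conflicted)  # variable selection: any conflicted variable
        counts = [conflicts(rows, col, r) for r in range(n)]
        best = min(counts)  # value selection: fewest violated constraints
        rows[col] = rng.choice([r for r in range(n) if counts[r] == best])
    return None  # step budget exhausted


print(min_conflicts_queens(4))
```

There is no fringe and no backtracking here. The only state kept is the current complete assignment.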
Performance of Min-Conflicts
Given random initial state, can solve n-queens in almost constant time for arbitrary n with high probability (e.g., n = 10,000,000)
The same appears to be true for any randomly-generated CSP except in a narrow range of the ratio R = (number of constraints) / (number of variables)
Hill Climbing
Simple, general idea:
Start wherever
Always choose the best neighbor
If no neighbors have better scores than current, quit
Why can this be a terrible idea?
Complete?
Optimal?
What’s good about it?
Hill Climbing Diagram
Random restarts? Random sideways steps?
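A minimal sketch of hill climbing with random restarts, on a made-up one-dimensional landscape with a local maximum at index 3 and the global maximum at index 12. `LANDSCAPE`, `neighbors`, `score`, and the restart count are all hypothetical choices for this illustration.

```python
import random


def hill_climb(start, neighbors, score):
    """Greedy ascent: move to the best neighbor until none is strictly better."""
    current = start
    while True:
        best = max(neighbors(current), key=score, default=current)
        if score(best) <= score(current):
            return current  # local maximum (or plateau): quit
        current = best


def random_restart_hill_climb(sample_start, neighbors, score, restarts=20, seed=0):
    """Run hill climbing from several random starts and keep the best result."""
    rng = random.Random(seed)
    results = [hill_climb(sample_start(rng), neighbors, score) for _ in range(restarts)]
    return max(results, key=score)


LANDSCAPE = [0, 1, 2, 3, 2, 1, 0, 1, 2, 3, 4, 5, 6, 5, 4]


def neighbors(x):
    return [n for n in (x - 1, x + 1) if 0 <= n < len(LANDSCAPE)]


def score(x):
    return LANDSCAPE[x]


print(hill_climb(0, neighbors, score))  # stuck on the local maximum at index 3
print(hill_climb(7, neighbors, score))  # reaches the global maximum at index 12
print(random_restart_hill_climb(lambda rng: rng.randrange(len(LANDSCAPE)), neighbors, score))
```

Starting at index 0 gets stuck on the smaller hill, which is why plain hill climbing is neither complete nor optimal. Restarts only raise the chance of landing in the right basin.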
Simulated Annealing
Idea: Escape local maxima by allowing downhill moves
But make them rarer as time goes on
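One common way to realize this idea is to accept a downhill move with probability exp(Δ / T), where the temperature T shrinks over time. The sketch below assumes that acceptance rule and a geometric cooling schedule, and it returns the best state seen, which is a convenience rather than part of the basic idea. All names and constants are hypothetical.

```python
import math
import random


def simulated_annealing(start, random_neighbor, score, schedule, steps=10_000, seed=0):
    """Maximize score. Downhill moves are accepted with probability exp(delta / T)."""
    rng = random.Random(seed)
    current = best = start
    for t in range(1, steps + 1):
        temperature = schedule(t)
        if temperature <= 0:
            break
        candidate = random_neighbor(current, rng)
        delta = score(candidate) - score(current)
        # Uphill moves are always taken; downhill moves get rarer as T falls.
        if delta > 0 or rng.random() < math.exp(delta / temperature):
            current = candidate
            if score(current) > score(best):
                best = current
    return best


HILLS = [0, 1, 2, 3, 2, 1, 0, 1, 2, 3, 4, 5, 6, 5, 4]


def step(x, rng):
    """Pick a random adjacent index, staying inside the landscape."""
    return rng.choice([n for n in (x - 1, x + 1) if 0 <= n < len(HILLS)])


result = simulated_annealing(0, step, lambda x: HILLS[x], lambda t: 2.0 * 0.99 ** t)
print(result, HILLS[result])
```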
Summary
CSPs are a special kind of search problem:
States defined by values of a fixed set of variables
Goal test defined by constraints on variable values
Backtracking = depth-first search with incremental constraint checks
Ordering: variable and value choice heuristics help significantly
Filtering: forward checking, arc consistency prevent assignments that guarantee later failure
Structure: Disconnected and tree-structured CSPs are efficient
Iterative improvement: min-conflicts is usually effective in practice
Game Playing State-of-the-Art
Checkers: Chinook ended the 40-year reign of human world champion Marion Tinsley in 1994. Used an endgame database defining perfect play for all positions involving 8 or fewer pieces on the board, a total of 443,748,401,247 positions. Checkers is now solved!
Chess: Deep Blue defeated human world champion Garry Kasparov in a six-game match in 1997. Deep Blue examined 200 million positions per second, used very sophisticated evaluation and undisclosed methods for extending some lines of search up to 40 ply. Current programs are even better, if less historic.
Othello: Human champions refuse to compete against computers, which are too good.
Go: Human champions are beginning to be challenged by machines, though the best humans still beat the best machines. In go, b > 300, so most programs use pattern knowledge bases to suggest plausible moves, along with aggressive pruning.
Pacman: unknown
GamesCrafters
http://gamescrafters.berkeley.edu/
Adversarial Search
[DEMO: mystery pacman]
Game Playing
Many different kinds of games!
Axes:
Deterministic or stochastic?
One, two, or more players?
Perfect information (can you see the state)?
Want algorithms for calculating a strategy (policy) which recommends a move in each state
Deterministic Games
Many possible formalizations, one is:
States: S (start at s0)
Players: P = {1...N} (usually take turns)
Actions: A (may depend on player / state)
Transition Function: S × A → S
Terminal Test: S → {t, f}
Terminal Utilities: S × P → R
Solution for a player is a policy: S → A
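To make the formalization concrete, here is a minimal sketch of a toy game written against those components. The game (take 1 or 2 stones, whoever takes the last stone wins), the class name `TakeAwayGame`, and its method names are all hypothetical. They only illustrate how S, P, A, the transition function, the terminal test, and the utilities might map onto code.

```python
from typing import Callable, List, Tuple

State = Tuple[int, int]  # (stones left, index of the player to move)


class TakeAwayGame:
    """Hypothetical two-player game: remove 1 or 2 stones; taking the last stone wins."""

    players = (0, 1)  # P

    def __init__(self, stones: int = 5) -> None:
        self.start: State = (stones, 0)  # s0

    def actions(self, s: State) -> List[int]:  # A (depends on state)
        return [k for k in (1, 2) if k <= s[0]]

    def result(self, s: State, a: int) -> State:  # transition: S x A -> S
        return (s[0] - a, 1 - s[1])

    def is_terminal(self, s: State) -> bool:  # terminal test: S -> {t, f}
        return s[0] == 0

    def utility(self, s: State, p: int) -> float:  # terminal utilities: S x P -> R
        winner = 1 - s[1]  # the player who just moved took the last stone
        return 1.0 if p == winner else -1.0


# A policy is just a function from states to actions: S -> A.
always_take_one: Callable[[State], int] = lambda s: 1

game = TakeAwayGame(3)
s = game.start
while not game.is_terminal(s):
    s = game.result(s, always_take_one(s))
print(s, game.utility(s, 0), game.utility(s, 1))
```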
Deterministic Single-Player?
Deterministic, single player, perfect information:
Know the rules
Know what actions do
Know when you win
E.g. Freecell, 8-Puzzle, Rubik’s cube
… it’s just search!
Slight reinterpretation:
Each node stores a value: the best outcome it can reach
This is the maximal outcome of its children (the max value)
Note that we don’t have path sums as before (utilities at end)
After search, can pick move that leads to best node
[Figure: search tree with leaf outcomes win, lose, lose]
Deterministic Two-Player
E.g. tic-tac-toe, chess, checkers
Zero-sum games
One player maximizes result
The other minimizes result
Minimax search
A state-space search tree
Players alternate
Each layer, or ply, consists of a round of moves*
Choose move to position with highest minimax value = best achievable utility against best play
[Figure: game tree with a max level over a min level and leaf values 8, 2, 5, 6]
* Slightly different from the book definition
Tic-tac-toe Game Tree
Minimax Example
Minimax Search
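A minimal sketch of minimax value computation over an explicit tree. It assumes a nested-list encoding, where a list is an internal node and a number is a terminal utility, and that MAX and MIN strictly alternate. The example tree uses the leaf values 8, 2, 5, 6 with an assumed shape of one max node over two min nodes.

```python
def minimax(node, maximizing=True):
    """Return the minimax value of a nested-list game tree."""
    if not isinstance(node, list):
        return node  # terminal state: its utility
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


tree = [[8, 2], [5, 6]]  # assumed shape: max root, two min children
print(minimax(tree))  # min(8, 2) = 2, min(5, 6) = 5, so the root value is 5
```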
Minimax Properties
Optimal against a perfect player. Otherwise?
Time complexity? $O(b^m)$
Space complexity? $O(bm)$
For chess, b ≈ 35, m ≈ 100
Exact solution is completely infeasible
But, do we need to explore the whole tree?
[Figure: game tree with a max level over a min level and leaf values 10, 10, 9, 100]
[DEMO: minVsExp]
Resource Limits
Cannot search to leaves
Depth-limited search
Instead, search a limited depth of tree
Replace terminal utilities with an eval function for non-terminal positions
Guarantee of optimal play is gone
More plies makes a BIG difference [DEMO: limitedDepth]
Example:
Suppose we have 100 seconds, can explore 10K nodes / sec
So can check 1M nodes per move
α-β reaches about depth 8 – decent chess program
[Figure: depth-limited tree, max root with value 4 over two min nodes with values -2 and 4; leaves -1, -2, 4, 9; unexplored subtrees marked “?”]
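Depth-limited search can be sketched by adding a cutoff to minimax. This is a minimal sketch over nested-list trees, and `first_leaf` is a deliberately crude, hypothetical evaluation function used only to show that a cutoff can change the answer.

```python
def depth_limited_value(node, depth, maximizing, evaluate):
    """Minimax value, but positions at the depth cutoff are scored by evaluate()."""
    if not isinstance(node, list):
        return node  # true terminal utility
    if depth == 0:
        return evaluate(node)  # cutoff: estimate instead of searching deeper
    values = [depth_limited_value(child, depth - 1, not maximizing, evaluate)
              for child in node]
    return max(values) if maximizing else min(values)


def first_leaf(node):
    """Crude hypothetical eval: the utility of the leftmost leaf below this position."""
    while isinstance(node, list):
        node = node[0]
    return node


slide_tree = [[-1, -2], [4, 9]]  # assumed shape for the leaf values -1, -2, 4, 9
print(depth_limited_value(slide_tree, 2, True, first_leaf))  # full depth: 4

tricky = [[3, -5], [1, 2]]  # hypothetical tree where a shallow search is fooled
print(depth_limited_value(tricky, 2, True, first_leaf))  # full depth: 1
print(depth_limited_value(tricky, 1, True, first_leaf))  # cut off at depth 1: 3
```

On the node budget in the example: with b ≈ 35, $35^4 \approx 1.5$M, so 1M nodes is just under 4 plies of plain minimax. Roughly doubling that depth through well-ordered pruning is consistent with the "about depth 8" figure.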
Evaluation Functions
Function which scores non-terminals
Ideal function: returns the utility of the position
In practice: typically weighted linear sum of features:
$\mathrm{Eval}(s) = w_1 f_1(s) + w_2 f_2(s) + \dots + w_n f_n(s)$
e.g. $f_1(s)$ = (num white queens – num black queens), etc.
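A weighted linear sum of features is short to write down. In this minimal sketch the state encoding (a dict of piece counts), the two features, and the weights 9 and 1 are hypothetical choices for illustration, not values from the lecture.

```python
def linear_eval(state, features, weights):
    """Eval(s) = w1*f1(s) + w2*f2(s) + ... + wn*fn(s)."""
    return sum(w * f(state) for f, w in zip(features, weights))


# Hypothetical state encoding: piece counts for white (w*) and black (b*).
state = {"wq": 1, "bq": 0, "wp": 5, "bp": 7}

features = [
    lambda s: s["wq"] - s["bq"],  # f1: queen difference
    lambda s: s["wp"] - s["bp"],  # f2: pawn difference
]
weights = [9, 1]  # assumed weights: a queen counts nine times a pawn

print(linear_eval(state, features, weights))  # 9 * 1 + 1 * (-2) = 7
```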
Evaluation for Pacman
[DEMO: thrashing, smart ghosts]
Why Pacman Starves
He knows his score will go up by eating the dot now
He knows his score will go up just as much by eating the dot later on
There are no point-scoring opportunities after eating the dot
Therefore, waiting seems just as good as eating
Iterative Deepening
Iterative deepening uses DFS as a subroutine:
1. Do a DFS which only searches for paths of length 1 or less. (DFS gives up on any path of length 2)
2. If “1” failed, do a DFS which only searches paths of length 2 or less.
3. If “2” failed, do a DFS which only searches paths of length 3 or less.
….and so on.
Why do we want to do this for multiplayer games?
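The numbered steps can be sketched as a loop around a depth-limited DFS. This is a minimal sketch on a small, made-up graph. The function names and the `max_depth` cap are assumptions.

```python
def depth_limited_dfs(node, goal, successors, limit, path=None):
    """DFS that gives up on any path longer than `limit` edges."""
    path = (path or []) + [node]
    if node == goal:
        return path
    if limit == 0:
        return None
    for nxt in successors(node):
        if nxt in path:
            continue  # avoid cycles along the current path
        found = depth_limited_dfs(nxt, goal, successors, limit - 1, path)
        if found is not None:
            return found
    return None


def iterative_deepening(start, goal, successors, max_depth=50):
    """Try limit 1, then 2, then 3, ... until a path is found."""
    for limit in range(1, max_depth + 1):
        found = depth_limited_dfs(start, goal, successors, limit)
        if found is not None:
            return found
    return None


GRAPH = {"S": ["A", "B"], "A": ["C"], "B": ["G"], "C": ["G"], "G": []}
print(iterative_deepening("S", "G", lambda n: GRAPH.get(n, [])))
```

One plausible answer to the question above: in a game with a clock, the same loop wrapped around depth-limited game search always has the result of the last completed depth available when time runs out.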
α-β Pruning Example
α-β Pruning
General configuration
α is the best value that MAX can get at any choice point along the current path
If n becomes worse than α, MAX will avoid it, so can stop considering n’s other children
Define β similarly for MIN
[Figure: path through alternating Player / Opponent levels down to node n]
α-β Pruning Pseudocode
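A minimal sketch of α-β pruning in Python, written over nested-list trees. It follows the standard textbook formulation rather than reproducing the slide's pseudocode. The `visited` list is an extra added here only to show which leaves get examined, and the example tree is hypothetical.

```python
import math


def alphabeta(node, alpha=-math.inf, beta=math.inf, maximizing=True, visited=None):
    """Minimax value with alpha-beta pruning.

    alpha: best value MAX can already guarantee along the current path
    beta:  best value MIN can already guarantee along the current path
    """
    if not isinstance(node, list):
        if visited is not None:
            visited.append(node)  # record each leaf actually evaluated
        return node
    if maximizing:
        v = -math.inf
        for child in node:
            v = max(v, alphabeta(child, alpha, beta, False, visited))
            if v >= beta:
                return v  # MIN above will never let play reach here
            alpha = max(alpha, v)
        return v
    v = math.inf
    for child in node:
        v = min(v, alphabeta(child, alpha, beta, True, visited))
        if v <= alpha:
            return v  # MAX above already has something at least as good
        beta = min(beta, v)
    return v


tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]  # hypothetical example tree
seen = []
print(alphabeta(tree, visited=seen))  # 3
print(seen)  # leaves 4 and 6 are never examined
```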
α-β Pruning Properties
Pruning has no effect on final result
Good move ordering improves effectiveness of pruning
With “perfect ordering”:
Time complexity drops to $O(b^{m/2})$
Doubles solvable depth
Full search of, e.g. chess, is still hopeless!
A simple example of metareasoning, here reasoning about which computations are relevant
Non-Zero-Sum Games
Similar to minimax:
Utilities are now tuples
Each player maximizes their own entry at each node
Propagate (or back up) nodes from children
[Figure: three-player game tree with leaf utility tuples (1,2,6), (4,3,2), (6,1,2), (7,4,1), (5,1,1), (1,5,2), (7,7,1), (5,4,5)]
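Backing up utility tuples can be sketched as below. The tree shape is an assumption, since only the leaf values survive here: a binary tree in which players 1, 2, and 3 move in turn, with the leaves read left to right. Ties are broken in favor of the first child, which is also an arbitrary choice.

```python
def backup(node, player=0, num_players=3):
    """Return the utility tuple backed up to `node`.

    A tuple is a leaf. A list is an internal node where `player` moves and
    picks the child whose tuple is best in that player's own entry.
    """
    if isinstance(node, tuple):
        return node
    next_player = (player + 1) % num_players
    children = [backup(child, next_player, num_players) for child in node]
    return max(children, key=lambda utilities: utilities[player])


# Assumed shape: player 1 at the root, then player 2, then player 3 above the leaves.
tree = [
    [[(1, 2, 6), (4, 3, 2)], [(6, 1, 2), (7, 4, 1)]],
    [[(5, 1, 1), (1, 5, 2)], [(7, 7, 1), (5, 4, 5)]],
]
print(backup(tree))
```

Under these assumptions, player 1 is indifferent at the root, because both subtrees back up a first entry of 1. The first child wins the tie.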
Stochastic Single-Player
What if we don’t know what the result of an action will be? E.g.,
In solitaire, shuffle is unknown
In minesweeper, mine locations
In pacman, ghosts!
Can do expectimax search
Chance nodes, like actions except the environment controls the action chosen
Calculate utility for each node
Max nodes as in search
Chance nodes take average (expectation) of value of children
Later, we’ll learn how to formalize this as a Markov Decision Process
[Figure: expectimax tree with a max level over an average (chance) level and leaf values 10, 4, 5, 7]
[DEMO: minVsExp]
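A minimal expectimax sketch over nested-list trees. It assumes max nodes and chance nodes alternate and that each chance node's outcomes are equally likely, so the expectation is a plain average. The tree shape for the leaf values 10, 4, 5, 7 is also assumed.

```python
def expectimax(node, is_max=True):
    """Max nodes take the best child; chance nodes take the uniform average."""
    if not isinstance(node, list):
        return float(node)
    values = [expectimax(child, not is_max) for child in node]
    return max(values) if is_max else sum(values) / len(values)


tree = [[10, 4], [5, 7]]  # assumed shape: max root over two chance nodes
print(expectimax(tree))  # averages are 7.0 and 6.0, so the root value is 7.0
```

On a tree with leaves 10, 10, 9, 100 arranged the same way, an adversarial (min) reading values the root at 10, while the averaging reading gives 54.5 and prefers the other branch.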
Stochastic Two-Player
E.g. backgammon
Expectiminimax (!)
Environment is an extra player that moves after each agent
Chance nodes take expectations, otherwise like minimax
Stochastic Two-Player
Dice rolls increase b: 21 possible rolls with 2 dice
Backgammon ≈ 20 legal moves
Depth 4 = $20 \times (21 \times 20)^3 \approx 1.5 \times 10^9$
As depth increases, probability of reaching a given node shrinks
So value of lookahead is diminished
So limiting depth is less damaging
But pruning is less possible…
TDGammon uses depth-2 search + very good eval function + reinforcement learning: world-champion level play
What’s Next?
Make sure you know what:
Probabilities are
Expectations are
Next topics:
Dealing with uncertainty
How to learn evaluation functions
Markov Decision Processes