Agents that can play multi-player games
Recall: Single-player, fully-observable, deterministic game agents
An agent that plays Peg Solitaire involves:
- A representation of the initial state;
- A method to generate new states from existing ones;
- A test for whether a state is a goal state.
[Figures: Initial Board for Triangle Peg Solitaire; a jump, with resulting board; the goal state.]
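The three ingredients can be sketched in Python as follows. This is only an illustration under stated assumptions: the (row, col) coordinates, the choice of the top hole as the empty one, the rule that any single remaining peg counts as the goal, and the names `initial_state`, `successors`, and `is_goal` are not from the slides.

```python
# Holes are (row, col) pairs on a 5-row triangle: row r has columns 0..r.
HOLES = frozenset((r, c) for r in range(5) for c in range(r + 1))

# The six directions in which a peg can jump on a triangular board.
DIRECTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0), (1, 1), (-1, -1)]


def initial_state(empty=(0, 0)):
    """Every hole holds a peg except one (which hole is empty is an assumption)."""
    return HOLES - {empty}


def successors(state):
    """Yield every state reachable from `state` by a single jump."""
    for (r, c) in state:
        for dr, dc in DIRECTIONS:
            over = (r + dr, c + dc)
            land = (r + 2 * dr, c + 2 * dc)
            # A jump needs a peg to jump over and an empty hole to land in.
            if over in state and land in HOLES and land not in state:
                yield (state - {(r, c), over}) | {land}


def is_goal(state):
    """Goal test (assumed): exactly one peg remains on the board."""
    return len(state) == 1
```

A state is a frozenset of occupied holes, so states are hashable and can be stored in a visited set by any search strategy.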
Recall: Single-player, fully-observable, deterministic game agents
[Figure: the Triangle Peg Solitaire boards, labeled as follows.]
- Initial board: Initial state
- A jump, with resulting board: Successor state axioms or STRIPS effects
- The goal state: Goal state
Goal state vs. Terminal states and Utilities
[Figure: the goal state, compared with terminal states labeled Utility: +2, Utility: +1, and Utility: -1.]
Quiz: Goal states vs. Terminal states and Utilities
[Figure: search from the initial state, via successor state axioms or STRIPS effects, to terminal states with utilities +1, +2, and -1.]
What could go wrong when using A* or breadth-first or other strategies with terminal states?
Answer: Goal states vs. Terminal states and Utilities
You’re guaranteed to find the best path to the terminal state that is found.
You’re NOT guaranteed to find the best terminal state (the one with highest utility), unless you do an exhaustive search.
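A small sketch of the difference, on a made-up search space (the state names, successor table, and utilities below are hypothetical, chosen only so that the shallowest terminal state is not the best one):

```python
from collections import deque

# Hypothetical toy search space: each state maps to its successor states.
SUCCESSORS = {"start": ["a", "b"], "a": [], "b": ["c", "d"], "c": [], "d": []}
# Utilities of the terminal states.
UTILITY = {"a": 1, "c": 2, "d": -1}


def first_terminal_bfs(start):
    """Breadth-first search that stops at the first terminal state it reaches."""
    frontier = deque([start])
    while frontier:
        state = frontier.popleft()
        if state in UTILITY:
            return state
        frontier.extend(SUCCESSORS[state])
    return None


def best_terminal_exhaustive(start):
    """Visit every reachable state and keep the terminal with the highest utility."""
    best, stack = None, [start]
    while stack:
        state = stack.pop()
        if state in UTILITY and (best is None or UTILITY[state] > UTILITY[best]):
            best = state
        stack.extend(SUCCESSORS[state])
    return best
```

Here breadth-first search returns "a" (utility +1) by the shortest path, while the exhaustive search finds "c" (utility +2). The toy space is a tree, so neither function tracks visited states.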
Hex: Two-player, zero-sum game
(Also, deterministic and fully-observable.)
Hex:
- Two players, red and blue.
- Board is N x N, with hexagonal spaces.
- Two opposite sides are red, and the other two sides are blue.
- Each player’s objective is to build a path connecting the sides of his or her color.
- Players alternate turns, and place a single piece of their color on their turn.
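The winning condition can be checked with a simple graph search. The sketch below rests on assumptions that are not in the slides: cells are (row, col) pairs on an N x N rhombus, each cell touches the six neighbors listed in `NEIGHBORS`, and red owns the top and bottom sides. The function name `red_has_won` is likewise made up.

```python
# Assumed convention: cells are (row, col) on an N x N rhombus,
# and red owns the top and bottom sides (rows 0 and n - 1).
NEIGHBORS = [(-1, 0), (1, 0), (0, -1), (0, 1), (-1, 1), (1, -1)]


def red_has_won(red_cells, n):
    """Return True if red's pieces form a path from row 0 to row n - 1."""
    red_cells = set(red_cells)
    # Start the search from every red piece on the top side.
    stack = [cell for cell in red_cells if cell[0] == 0]
    seen = set(stack)
    while stack:
        r, c = stack.pop()
        if r == n - 1:
            return True
        for dr, dc in NEIGHBORS:
            nxt = (r + dr, c + dc)
            if nxt in red_cells and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False
```

For example, on a 3 x 3 board the cells (0, 1), (1, 0), (2, 0) form a winning chain for red under this convention, while (0, 0), (1, 1), (2, 2) do not, because those cells are not adjacent on a hexagonal grid.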
Hex: Two-player, zero-sum game
Some fun facts:
- There are no ties in Hex (proved by John Nash).
- First player has a distinct advantage (also proved by Nash).
- In tournament play, it’s common to use the “pie rule”, for fairness: after the first player makes the first move, the second player can choose whether to switch sides. (We will ignore this rule.)
Hex Question
What is red’s best move (red’s turn next)?
This orange one looks pretty good: only one more square, and red will win. Using a simple heuristic, this looks like it’s getting close to the goal.
However, if red moves to the orange square, the blue player can win on the next turn!
Quiz: Hex Question
If red moves to the orange square, what is blue’s best move?
Answer: Hex Question
Blue has no good moves left!
This one’s bad – red can still connect the paths.
And this one’s bad too – red can still connect the paths.
Reasoning about 2-player games
To pick a good move, each player has to think about the other player’s possible responses!
Extensive Form Representation of Games
Notation:
- Two players, Max (Δ) and Min (∇).
- Terminal states are represented by a node with a number inside: the utility for Max (Δ).
(Since we’re doing zero-sum games, the utility for Min (∇) is just the opposite of this number.)
Extensive Form Representation of Games
Game tree:
[Figure: game tree. Levels from top to bottom: Max’s turn (∆); Max’s possible actions; resulting worlds/boards, Min’s turn (∇); Min’s possible actions; resulting worlds/boards, Max’s turn (∆); …; terminal states, with utility for Max (e.g. +1, +2, -1).]
Minimax (Backup) Algorithm
Basic Idea: Compute ∆’s Value(n) for each node n in the game tree, starting with the leaves and working up (“backup”).
We’ll use a depth-first tree traversal.
Once this is calculated, Max will choose an action that leads to a child node with the highest possible value.
[Figure: example game tree. The root is a ∆ node with three ∇ children; their leaf utilities are {3, 4, 4}, {1, 8, 12}, and {2, 30, 15}.]
Minimax (Backup) Algorithm
Value(n) =
- If n is a terminal node, Value(n) = Max’s utility
- If n is ∆’s turn: Value(n) = max { Value(c) : c is a child of n }
- If n is ∇’s turn: Value(n) = min { Value(c) : c is a child of n }
In the example tree:
Value of the left ∇ node: min {3, 4, 4} = 3
Value of the right ∇ node: min {2, 30, 15} = 2
Quiz: Minimax (Backup) Algorithm
[Figure: the same example tree. Left ∇ node: min {3, 4, 4} = 3. Middle ∇ node: leaves 1, 8, 12. Right ∇ node: min {2, 30, 15} = 2.]
1. What is the Value of the middle ∇ node?
2. What is the Value of the top ∆ node?
Answer: Minimax (Backup) Algorithm
1. What is the Value of the middle ∇ node? min {1, 8, 12} = 1
2. What is the Value of the top ∆ node? max {3, 1, 2} = 3
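The backup computation can be sketched as a short recursive function. The nested-list encoding of the tree and the names `minimax` and `best_action` are assumptions made for this illustration; the leaf utilities are the ones used in the example ({3, 4, 4}, {1, 8, 12}, {2, 30, 15}).

```python
def minimax(node, is_max):
    """Return Value(node) from Max's point of view.

    A node is either a number (a terminal utility for Max)
    or a list of child nodes.
    """
    if not isinstance(node, list):
        return node
    # Depth-first: evaluate each child fully before backing its value up.
    values = [minimax(child, not is_max) for child in node]
    return max(values) if is_max else min(values)


# Example tree: a Max root with three Min children.
TREE = [[3, 4, 4], [1, 8, 12], [2, 30, 15]]


def best_action(tree):
    """Index of the root's child with the highest backed-up value."""
    values = [minimax(child, False) for child in tree]
    return values.index(max(values))
```

On this tree, `minimax(TREE, True)` backs up the values 3, 1, and 2 from the Min nodes and returns 3, so Max picks the leftmost action (index 0).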
Quiz: Minimax
1. Compute the value of each node in the game tree.
2. Which action should Max take?
3. What is Min’s optimal response?
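[Figure: game tree for the quiz. The root ∆ has three actions, a, b, and c, each leading to a ∇ node; the ∇ nodes lead to terminal nodes and to five further ∆ nodes.]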
Answer: Minimax
1. Compute the value of each node in the game tree.
2. Which action should Max take? Action on right (c)
3. What is Min’s optimal response? Action on right
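[Figure: the same tree with backed-up values. The five lower ∆ nodes have values 6, 7, 10, 30, and 15; the ∇ nodes under a, b, and c have values 4, 1, and 15; the root ∆ has value 15.]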
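From Extensive Form to Normal Form Games
Every “extensive form” game (even ones where you don’t have zero-sum utilities) can be made into a “normal form” game.
[Figure: game tree. Max (∆) chooses A or B, then Min (∇) chooses C or D. After A: C ends the game with utility 4; D leads to a ∆ node where A gives 5 and B gives 7. After B: C ends the game with utility 1; D leads to a ∆ node where A gives 4 and B gives 10.]
(Rows: Max’s sequence of actions. Columns: Min’s action. Entries: utility for Max, utility for Min.)
          C          D
A, A    +4, -4     +5, -5
A, B    +4, -4     +7, -7
B, A    +1, -1     +4, -4
B, B    +1, -1     +10, -10
Each sequence of actions for a player becomes a row or a column.
The size of the resulting matrix can be exponential in the size of the game tree.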
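As a rough sketch, the matrix above can be generated by enumerating Max's action sequences and looking up the leaf each one reaches. The nested-dictionary encoding `TREE` and the function names are assumptions for illustration; the utilities are read off the matrix.

```python
from itertools import product

# TREE[max_first_action][min_action] is either a terminal utility for Max,
# or a dict mapping Max's second action to a terminal utility.
TREE = {
    "A": {"C": 4, "D": {"A": 5, "B": 7}},
    "B": {"C": 1, "D": {"A": 4, "B": 10}},
}


def payoff(max_sequence, min_action):
    """Utility for Max when Max plays (first, second) and Min plays min_action."""
    first, second = max_sequence
    outcome = TREE[first][min_action]
    if isinstance(outcome, dict):
        # The game continues: Max's second action picks the leaf.
        outcome = outcome[second]
    return outcome


def normal_form():
    """Map each row (Max's sequence) and column (Min's action) to both utilities."""
    rows = list(product("AB", repeat=2))
    return {
        row: {col: (payoff(row, col), -payoff(row, col)) for col in "CD"}
        for row in rows
    }
```

Note that Max's second action is ignored whenever Min plays C, which is why the C column repeats within each pair of rows.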
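From Normal Form games to Extensive Form games
Not every Normal Form game can be represented using the Extensive Form I have shown you so far. In the matrix game both players choose at the same time, but in a game tree one player has to move first and the other gets to see that move.
          C          D
C       +2, -2     -3, +3
D       -3, +3     +4, -4
[Figure: two candidate game trees for this matrix: one with ∆ moving first (C or D) and ∇ responding (C or D), and one with ∇ moving first and ∆ responding; the leaves are 2, -3, -3, 4. A third panel shows ∆ and ∇ with question marks.]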
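To see why the order of play matters for this matrix, the two candidate trees can be evaluated with minimax. This is a small sketch; the dictionary `PAYOFF` and the function names are assumptions, and the numbers are Max's utilities from the matrix above.

```python
# Max's utility for each (Max action, Min action) pair.
PAYOFF = {("C", "C"): 2, ("C", "D"): -3, ("D", "C"): -3, ("D", "D"): 4}
ACTIONS = ["C", "D"]


def value_if_max_moves_first():
    """Max commits to an action, then Min picks the reply that is worst for Max."""
    return max(min(PAYOFF[(a, b)] for b in ACTIONS) for a in ACTIONS)


def value_if_min_moves_first():
    """Min commits to an action, then Max picks the reply that is best for Max."""
    return min(max(PAYOFF[(a, b)] for a in ACTIONS) for b in ACTIONS)
```

The first tree has value -3 and the second has value 2, so neither move order reproduces the simultaneous-move game.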
From Normal Form games to Extensive Form games
We can introduce new notation – information states – that allows the Extensive Form to represent any Normal Form game.
[Figure: the same matrix and game trees, annotated with information states.]
From Normal Form games to Extensive Form games
Information states are also useful for handling Partial Observability in turn-based games. E.g., in Poker, they can be used to represent the set of all hands your opponent may have been dealt.
Perfect Information Games
Definition: A game in extensive form has perfect information if every information state has only one node. (This is the same as our original version of game trees.)
Perfect Information is basically just another name for full observability for game trees.
We’ll talk more about partial observability later.
Theorem (Zermelo, 1913): Every finite, perfect-information game in extensive form has a pure-strategy Nash equilibrium.
Relation between Minimax Algorithm and Minimax Theorem
Recall that the Minimax Theorem says every 2-player, zero-sum game has a Value for each player and a Nash Equilibrium.
The guy who proved this (von Neumann) used essentially the Minimax algorithm to prove the theorem.
The Value of the root node in the Minimax algorithm is the same as the Value of the game for the Max player.
Quiz: Time Complexity of Minimax
Let b be the branching factor of the game tree.
Let m be the depth of the game tree.
What is the time complexity of Minimax?
- O(b+m)?
- O(bm)?
- O(b^m)?
- O(m^b)?
Answer: Time Complexity of Minimax
Let b be the branching factor of the game tree.
Let m be the depth of the game tree.
What is the time complexity of Minimax? O(b^m).
Minimax visits every node in the game tree, and a tree with branching factor b and depth m has on the order of b^m nodes.
Quiz: Space Complexity of Minimax
Let b be the branching factor of the game tree.
Let m be the depth of the game tree.
What is the space complexity of Minimax?
- O(b+m)?
- O(bm)?
- O(b^m)?
- O(m^b)?
Answer: Space Complexity of Minimax
Let b be the branching factor of the game tree.
Let m be the depth of the game tree.
What is the space complexity of Minimax? O(bm).
The depth-first traversal only needs to keep the current path in memory, along with the up to b children generated at each of its m levels.
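Both answers can be checked empirically on a uniform tree. The sketch below is an illustration only: it assumes every non-terminal node has exactly b children and every leaf sits at depth m, uses a placeholder utility of 0, and the name `count_minimax_work` is made up.

```python
def count_minimax_work(b, m):
    """Run minimax on a uniform tree; return (nodes visited, deepest level reached)."""
    stats = {"visited": 0, "max_depth": 0}

    def visit(depth, is_max):
        stats["visited"] += 1
        stats["max_depth"] = max(stats["max_depth"], depth)
        if depth == m:
            return 0  # placeholder utility; only the amount of work matters here
        values = [visit(depth + 1, not is_max) for _ in range(b)]
        return max(values) if is_max else min(values)

    visit(0, True)
    return stats["visited"], stats["max_depth"]
```

With b = 3 and m = 4 the traversal visits 1 + 3 + 9 + 27 + 81 = 121 nodes, which grows like b^m, while the recursion never goes deeper than m levels.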
Quiz: Complexity of Minimax
Chess has an average branching factor of ~30, and each game takes on average ~40 moves.
If it takes ~1 millisecond to compute the value of each board position in the game tree, how long would it take to figure out the value of the game using Minimax?
- A few milliseconds?
- A few seconds?
- A few minutes?
- A few hours?
- A few days?
- A few years?
- A few decades?
- A few millennia (thousands of years)?
- More time than the age of the universe?
Answer: Complexity of Minimax
Chess has an average branching factor of ~30, and each game takes on average ~40 moves.
If it takes ~1 millisecond to compute the value of each board position in the game tree, how long would it take to figure out the value of the game using Minimax? More time than the age of the universe.
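The arithmetic behind this answer can be worked through in a few lines. The branching factor, depth, and time per position are the quiz's figures; the age of the universe (roughly 13.8 billion years) and the length of a year are assumptions added here for the comparison.

```python
BRANCHING_FACTOR = 30          # b, from the quiz
DEPTH = 40                     # m, from the quiz
SECONDS_PER_POSITION = 1e-3    # ~1 millisecond per board position

# Assumptions for the comparison (not from the slides).
AGE_OF_UNIVERSE_YEARS = 13.8e9
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Minimax takes on the order of b^m steps.
positions = BRANCHING_FACTOR ** DEPTH
years = positions * SECONDS_PER_POSITION / SECONDS_PER_YEAR

print(f"{positions:.2e} positions, {years:.2e} years")
print(f"{years / AGE_OF_UNIVERSE_YEARS:.2e} times the age of the universe")
```

With these numbers there are on the order of 10^59 positions, so the running time exceeds the age of the universe by many orders of magnitude.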
Strategies for coping with complexity
• Reduce b
• Reduce m
• Memoize
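As one illustration of the last strategy, minimax values can be cached so that a position reached by several move orders is only evaluated once. The game below is a hypothetical example chosen for brevity (players alternately take 1 to 3 stones from a pile, and whoever takes the last stone wins); it does not appear in the slides.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def value(stones, max_to_move):
    """Minimax value for Max (+1 win, -1 loss), cached per game state."""
    if stones == 0:
        # The player who just moved took the last stone and won.
        return -1 if max_to_move else +1
    children = [
        value(stones - take, not max_to_move)
        for take in (1, 2, 3)
        if take <= stones
    ]
    return max(children) if max_to_move else min(children)
```

Without the cache the recursion explores a tree that grows exponentially with the pile size; with it, each (stones, max_to_move) pair is computed once, so at most 2 x (stones + 1) states are ever evaluated.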