Game Playing 2

Page 1: Game Playing 2

GAME PLAYING 2

Page 2: Game Playing 2

THIS LECTURE

Alpha-beta pruning

Games with chance

Page 3: Game Playing 2

NONDETERMINISM

Uncertainty is caused by the actions of another agent (MIN), who competes with our agent (MAX)

MAX’s play

MAX cannot tell what move will be played

MIN’s play

Page 4: Game Playing 2

NONDETERMINISM

Uncertainty is caused by the actions of another agent (MIN), who competes with our agent (MAX)

MAX’s play

MAX must decide what to play for BOTH these outcomes

MIN’s play

Instead of a single path, the agent must construct an entire plan

Page 5: Game Playing 2

MINIMAX BACKUP

[Figure: game tree with alternating MAX’s turn and MIN’s turn levels; terminal values such as +1, +10, 0 and -1 are backed up level by level toward the root]

Page 6: Game Playing 2

DEPTH-FIRST MINIMAX ALGORITHM

MAX-Value(S)
1. If Terminal?(S) return Result(S)
2. Return the maximum of MIN-Value(S’) over all S’ ∈ SUCC(S)

MIN-Value(S)
1. If Terminal?(S) return Result(S)
2. Return the minimum of MAX-Value(S’) over all S’ ∈ SUCC(S)

MINIMAX-Decision(S)
Return the action leading to the state S’ ∈ SUCC(S) that maximizes MIN-Value(S’)
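As a minimal sketch of these three procedures, assume a game tree given as nested Python lists: a list is a non-terminal state whose elements are SUCC(S), and a number is a terminal state holding Result(S). This representation and the example tree are illustrative assumptions, not part of the slides.

```python
from typing import List, Union

# Assumed representation: an int is a terminal state holding Result(S);
# a list holds the successor states SUCC(S).
Tree = Union[int, List["Tree"]]


def max_value(s: Tree) -> int:
    if isinstance(s, int):              # Terminal?(S)
        return s                        # Result(S)
    return max(min_value(c) for c in s)


def min_value(s: Tree) -> int:
    if isinstance(s, int):
        return s
    return min(max_value(c) for c in s)


def minimax_decision(s: List[Tree]) -> int:
    """Index of the successor (move) that maximizes MIN-Value."""
    return max(range(len(s)), key=lambda i: min_value(s[i]))


tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]      # hypothetical example tree
print(max_value(tree), minimax_decision(tree))  # 3 0
```

The three MIN children back up 3, 2 and 2, so MAX takes the first move. The recursion is depth-first, so memory grows only with the depth of the tree.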

Page 7: Game Playing 2

REAL-TIME GAME PLAYING WITH EVALUATION FUNCTION

e(s): a function estimating how favorable state s is to MAX

Keep track of depth, and add the line: If depth(s) = cutoff, return e(s)

(This line goes right after the terminal test.)
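The cutoff can be sketched by threading a depth counter through the minimax recursion. The game here is a hypothetical stand-in chosen only for illustration: a pile of tokens, each move removes 1 or 2, and whoever takes the last token wins. A state is assumed to be `(tokens_left, max_to_move)`, and the evaluation function `e` is passed in as a parameter.

```python
from typing import Callable, List, Tuple

# Hypothetical toy game: a state is (tokens_left, max_to_move).
State = Tuple[int, bool]


def successors(s: State) -> List[State]:
    n, max_to_move = s
    return [(n - k, not max_to_move) for k in (1, 2) if k <= n]


def result(s: State) -> int:
    # At a terminal state the player who just moved took the last token and won.
    return -1 if s[1] else +1


def max_value(s: State, depth: int, cutoff: int, e: Callable[[State], float]) -> float:
    if s[0] == 0:                  # Terminal?(S)
        return result(s)
    if depth == cutoff:            # the added line, right after the terminal test
        return e(s)
    return max(min_value(c, depth + 1, cutoff, e) for c in successors(s))


def min_value(s: State, depth: int, cutoff: int, e: Callable[[State], float]) -> float:
    if s[0] == 0:
        return result(s)
    if depth == cutoff:
        return e(s)
    return min(max_value(c, depth + 1, cutoff, e) for c in successors(s))


def neutral(s: State) -> float:
    return 0.0                     # placeholder e(s): "no opinion" about any state


print(max_value((4, True), 0, 10, neutral))  # deep enough to reach terminals: 1
print(max_value((4, True), 0, 1, neutral))   # cut off after one ply: 0.0
```

With a real game, the quality of play at the cutoff depends entirely on how well e(s) estimates the value that a deeper search would back up.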

Page 8: Game Playing 2

CAN WE DO BETTER?

Yes! Much better!

[Figure: a tree with backed-up values 3 and -1, in which a subtree is marked “Pruning”]

This part of the tree can’t have any effect on the value that will be backed up to the root

Page 9: Game Playing 2

EXAMPLE

Page 10: Game Playing 2

EXAMPLE

[Figure: the root MAX node’s first MIN child has examined its first leaf, valued 2, so β = 2]

The beta value of a MIN node is an upper bound on the final backed-up value. It can never increase.

Page 11: Game Playing 2

EXAMPLE

[Figure: the MIN node’s second leaf is valued 1, so β drops to 1]

Page 12: Game Playing 2

EXAMPLE

[Figure: the MIN node’s search is complete with value 1, so the root MAX node gets α = 1]

The alpha value of a MAX node is a lower bound on the final backed-up value. It can never decrease.

Page 13: Game Playing 2

EXAMPLE

[Figure: the root’s second MIN child examines its first leaf, valued -1, so its β = -1]

Page 14: Game Playing 2

EXAMPLE

[Figure: same tree, with β = -1 ≤ α = 1 at the second MIN child]

Search can be discontinued below any MIN node whose beta value is less than or equal to the alpha value of one of its MAX ancestors.

Page 15: Game Playing 2

ALPHA-BETA PRUNING

Explore the game tree to depth h in depth-first manner

Back up alpha and beta values whenever possible

Prune branches that can’t lead to changing the final decision

Page 16: Game Playing 2

ALPHA-BETA ALGORITHM

Update the alpha/beta value of the parent of a node N when the search below N has been completed or discontinued

Discontinue the search below a MAX node N if its alpha value is ≥ the beta value of a MIN ancestor of N

Discontinue the search below a MIN node N if its beta value is ≤ the alpha value of a MAX ancestor of N
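These rules can be sketched in Python, assuming game trees given as nested lists (an int is a terminal value, a list holds the successors). Passing alpha and beta down the recursion gives each node the best bounds of its MAX and MIN ancestors, and the `visited` list records which leaves are actually evaluated so the pruning is visible. The first test tree mirrors the earlier EXAMPLE slides, with 99 as a hypothetical stand-in for the pruned leaf, whose value the slides do not give.

```python
import math
from typing import List, Union

Tree = Union[int, List["Tree"]]   # int = terminal value, list = successors


def alphabeta(s: Tree, alpha: float, beta: float, is_max: bool, visited: List[int]) -> float:
    """Backed-up value of s; appends every leaf actually evaluated to `visited`."""
    if isinstance(s, int):
        visited.append(s)
        return s
    if is_max:
        value = -math.inf
        for child in s:
            value = max(value, alphabeta(child, alpha, beta, False, visited))
            alpha = max(alpha, value)   # a MAX node's alpha never decreases
            if alpha >= beta:           # alpha >= beta of a MIN ancestor: discontinue
                break
        return value
    value = math.inf
    for child in s:
        value = min(value, alphabeta(child, alpha, beta, True, visited))
        beta = min(beta, value)         # a MIN node's beta never increases
        if beta <= alpha:               # beta <= alpha of a MAX ancestor: discontinue
            break
    return value


visited: List[int] = []
print(alphabeta([[2, 1], [-1, 99]], -math.inf, math.inf, True, visited), visited)
# 1 [2, 1, -1]
```

The leaf 99 is never visited: once the second MIN node sees -1, its beta is already below the root's alpha of 1.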

Page 17: Game Playing 2

EXAMPLE

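[Figure: a game tree whose node levels are labeled MAX, MIN, MAX, MIN, MAX, MIN from the root down; the following slides animate alpha-beta search on it, backing up α and β values and pruning subtrees]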

Page 44: Game Playing 2

HOW MUCH DO WE GAIN?

Consider these two cases:

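[Figure: two cases, each with α = 3 at the root. In the first, a MIN child’s first leaf is -1, so β = -1 ≤ α and its other leaf (4) is never examined. In the second, the same leaves come in the opposite order: β = 4 after the first leaf, so the leaf -1 must still be examined.]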

Page 45: Game Playing 2

HOW MUCH DO WE GAIN?

Assume a game tree of uniform branching factor b

Minimax examines $O(b^h)$ nodes, and so does alpha-beta in the worst case

The gain for alpha-beta is maximum when:

The children of a MAX node are ordered in decreasing backed-up values

The children of a MIN node are ordered in increasing backed-up values

Then alpha-beta examines $O(b^{h/2})$ nodes [Knuth and Moore, 1975]

But this requires an oracle (if we knew how to order nodes perfectly, we would not need to search the game tree)

If nodes are ordered at random, then the average number of nodes examined by alpha-beta is ~$O(b^{3h/4})$
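To see the effect of ordering, here is a sketch that counts the leaves alpha-beta evaluates on one small hypothetical tree (b = 3, h = 2, nested lists with int leaves), arranged two ways: best-first as described above, and worst-first. For a perfectly ordered tree, Knuth and Moore's leaf count is $b^{\lceil h/2 \rceil} + b^{\lfloor h/2 \rfloor} - 1$, which is 5 here, against $b^h = 9$ for minimax.

```python
import math
from typing import List, Tuple, Union

Tree = Union[int, List["Tree"]]   # int = terminal value, list = successors


def alphabeta_count(tree: Tree) -> Tuple[float, int]:
    """Return (backed-up value at a MAX root, number of leaves alpha-beta evaluates)."""
    leaves = 0

    def search(s: Tree, alpha: float, beta: float, is_max: bool) -> float:
        nonlocal leaves
        if isinstance(s, int):
            leaves += 1
            return s
        best = -math.inf if is_max else math.inf
        for child in s:
            v = search(child, alpha, beta, not is_max)
            if is_max:
                best, alpha = max(best, v), max(alpha, v)
            else:
                best, beta = min(best, v), min(beta, v)
            if alpha >= beta:          # cutoff: the remaining children cannot matter
                break
        return best

    value = search(tree, -math.inf, math.inf, True)
    return value, leaves


# The same hypothetical leaves, arranged best-first and worst-first.
best_first = [[3, 8, 12], [2, 4, 6], [2, 5, 14]]    # MAX children decreasing, MIN children increasing
worst_first = [[14, 5, 2], [6, 4, 2], [12, 8, 3]]   # the opposite order
print(alphabeta_count(best_first))    # (3, 5)
print(alphabeta_count(worst_first))   # (3, 9)
```

Both orderings back up the same value; only the work differs. With the worst ordering, no cutoff ever fires and alpha-beta does exactly as much work as minimax.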

Page 46: Game Playing 2

ALPHA-BETA IMPLEMENTATION

MAX-Value(S, α, β)
1. If Terminal?(S) return Result(S)
2. For all S’ ∈ SUCC(S)
3.   α ← max(α, MIN-Value(S’, α, β))
4.   If α ≥ β, then return β
5. Return α

MIN-Value(S, α, β)
1. If Terminal?(S) return Result(S)
2. For all S’ ∈ SUCC(S)
3.   β ← min(β, MAX-Value(S’, α, β))
4.   If α ≥ β, then return α
5. Return β

Alpha-Beta-Decision(S)
Return the action leading to the state S’ ∈ SUCC(S) that maximizes MIN-Value(S’, -∞, +∞)
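A direct Python transcription of this pseudocode, assuming game trees given as nested lists (an int is Result(S) of a terminal state, a list is SUCC(S)). It keeps the slide's convention of returning β (in MAX-Value) or α (in MIN-Value) at a cutoff and, like the slide, calls MIN-Value(S’, -∞, +∞) separately for each root move. The example trees are hypothetical.

```python
import math
from typing import List, Union

Tree = Union[int, List["Tree"]]   # int = Result(S) of a terminal state, list = SUCC(S)


def max_value(s: Tree, alpha: float, beta: float) -> float:
    if isinstance(s, int):                    # 1. Terminal?(S)
        return s
    for child in s:                           # 2. for all S' in SUCC(S)
        alpha = max(alpha, min_value(child, alpha, beta))   # 3.
        if alpha >= beta:                     # 4. cutoff
            return beta
    return alpha                              # 5.


def min_value(s: Tree, alpha: float, beta: float) -> float:
    if isinstance(s, int):
        return s
    for child in s:
        beta = min(beta, max_value(child, alpha, beta))
        if alpha >= beta:
            return alpha
    return beta


def alpha_beta_decision(s: List[Tree]) -> int:
    """Index of the move whose successor maximizes MIN-Value(S', -inf, +inf)."""
    return max(range(len(s)), key=lambda i: min_value(s[i], -math.inf, math.inf))


print(alpha_beta_decision([[3, 12, 8], [2, 4, 6], [14, 5, 2]]))  # 0
```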

Page 47: Game Playing 2

HEURISTIC ORDERING OF NODES

Order the nodes below the root according to the values backed-up at the previous iteration
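A minimal sketch of this ordering inside iterative deepening. It uses a hypothetical toy game chosen only for illustration (a pile of tokens, each move removes 1 or 2, and whoever takes the last token wins; a state is `(tokens_left, max_to_move)`) with a placeholder e(s) = 0. After each iteration the root moves are re-sorted by their backed-up values, so the next, deeper alpha-beta pass tries the most promising move first and can prune the others sooner.

```python
import math
from typing import List, Tuple

# Hypothetical toy game: a state is (tokens_left, max_to_move).
State = Tuple[int, bool]


def successors(s: State) -> List[State]:
    n, max_to_move = s
    return [(n - k, not max_to_move) for k in (1, 2) if k <= n]


def e(s: State) -> float:
    return 0.0  # placeholder evaluation function: no opinion at the cutoff


def alphabeta(s: State, plies: int, alpha: float, beta: float) -> float:
    n, max_to_move = s
    if n == 0:                              # terminal: the previous mover won
        return -1.0 if max_to_move else 1.0
    if plies == 0:                          # depth cutoff
        return e(s)
    best = -math.inf if max_to_move else math.inf
    for c in successors(s):
        v = alphabeta(c, plies - 1, alpha, beta)
        if max_to_move:
            best, alpha = max(best, v), max(alpha, v)
        else:
            best, beta = min(best, v), min(beta, v)
        if alpha >= beta:
            break
    return best


def iterative_deepening(root: State, max_depth: int) -> State:
    """Return the best root move; the root is assumed to be a MAX node."""
    moves = successors(root)
    for depth in range(1, max_depth + 1):
        scores, alpha = {}, -math.inf
        for m in moves:                     # best move of the previous iteration first
            scores[m] = alphabeta(m, depth - 1, alpha, math.inf)
            alpha = max(alpha, scores[m])
        moves.sort(key=lambda m: scores[m], reverse=True)   # stable: ties keep their order
    return moves[0]


print(iterative_deepening((5, True), 4))   # (3, False): take 2 tokens
```

In this run, the move that takes 2 tokens starts second and is promoted to first after the depth-3 iteration. The shallow iterations cost little compared with the deepest one, so the ordering information is nearly free.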

Page 48: Game Playing 2

OTHER IMPROVEMENTS

Adaptive horizon + iterative deepening

Extended search: Retain k > 1 best paths, instead of just one, and extend the tree at greater depth below their leaf nodes (to help deal with the “horizon effect”)

Singular extension: If a move is obviously better than the others in a node at horizon h, then expand this node along this move

Use transposition tables to deal with repeated states

Null-move search
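A transposition table can be sketched as a dictionary from states to backed-up values, so that a position reached by different move orders is searched only once. The game is a hypothetical toy (a pile of tokens, each move removes 1 or 2, whoever takes the last token wins; a state is `(tokens_left, max_to_move)`). Without the table, the number of calls grows roughly like the Fibonacci numbers in the pile size. With it, there are at most two entries per pile size.

```python
from typing import Dict, Tuple

State = Tuple[int, bool]   # hypothetical toy game: (tokens_left, max_to_move)


def minimax_tt(s: State, table: Dict[State, int]) -> int:
    """Minimax value of s, storing every state's value in `table`."""
    if s in table:                          # repeated state: reuse the stored value
        return table[s]
    n, max_to_move = s
    if n == 0:
        value = -1 if max_to_move else 1    # the player who just moved took the last token
    else:
        children = [(n - k, not max_to_move) for k in (1, 2) if k <= n]
        values = [minimax_tt(c, table) for c in children]
        value = max(values) if max_to_move else min(values)
    table[s] = value
    return value


table: Dict[State, int] = {}
print(minimax_tt((30, True), table))  # -1: 30 is a multiple of 3, so the side to move loses
print(len(table))                     # number of distinct states actually searched
```

Real programs usually key the table on a hash of the position and also store the search depth, so that a shallow result is not reused where a deeper one is needed. This sketch omits both.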

Page 49: Game Playing 2

GAMES OF CHANCE

Page 50: Game Playing 2

GAMES OF CHANCE

Dice games: backgammon, Yahtzee, craps, …

Card games: poker, blackjack, …

Is there a fundamental difference between the nondeterminism in chess-playing vs. the nondeterminism in a dice roll?

Page 51: Game Playing 2

[Figure: game tree whose levels alternate MAX, CHANCE, MIN, CHANCE, MAX]

Page 52: Game Playing 2

EXPECTED VALUES

The utility of a MAX/MIN node in the game tree is the max/min of the utility values of its successors

The expected utility of a CHANCE node in the game tree is the average of the utility values of its successors, weighted by their probabilities

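CHANCE nodes: $\mathrm{ExpectedValue}(s) = \sum_{s' \in \mathrm{SUCC}(s)} \mathrm{ExpectedValue}(s')\, P(s')$

Compare to

MAX nodes: $\mathrm{MinimaxValue}(s) = \max_{s' \in \mathrm{SUCC}(s)} \mathrm{MinimaxValue}(s')$

MIN nodes: $\mathrm{MinimaxValue}(s) = \min_{s' \in \mathrm{SUCC}(s)} \mathrm{MinimaxValue}(s')$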

Page 53: Game Playing 2

ADVERSARIAL GAMES OF CHANCE

E.g., Backgammon

MAX nodes, MIN nodes, CHANCE nodes

Expectiminimax search

Backup step:

MAX = maximum of children

CHANCE = probability-weighted average of children

MIN = minimum of children

CHANCE = probability-weighted average of children

4 levels of the game tree separate each of MAX’s turns!

Evaluation function? Pruning?
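The expectiminimax backup step can be sketched with a small tagged-tree representation, assumed here for illustration: `("max", children)`, `("min", children)`, `("chance", [(probability, child), ...])`, or a number for a terminal utility. Exact fractions keep the averages readable. The example tree is hypothetical.

```python
from fractions import Fraction
from typing import Any


def expectiminimax(node: Any) -> Fraction:
    """Back up MAX (maximum), MIN (minimum) and CHANCE (expected value) nodes."""
    if isinstance(node, (int, Fraction)):         # terminal utility
        return Fraction(node)
    kind, children = node
    if kind == "max":
        return max(expectiminimax(c) for c in children)
    if kind == "min":
        return min(expectiminimax(c) for c in children)
    if kind == "chance":
        return sum((p * expectiminimax(c) for p, c in children), Fraction(0))
    raise ValueError(f"unknown node kind: {kind!r}")


half, third = Fraction(1, 2), Fraction(1, 3)
tree = ("max", [
    ("chance", [(half, ("min", [3, 5])), (half, ("min", [1, 9]))]),  # 1/2*3 + 1/2*1 = 2
    ("chance", [(third, 9), (2 * third, 0)]),                         # 1/3*9 + 2/3*0 = 3
])
print(expectiminimax(tree))  # 3
```

Because CHANCE values are averages, the scale of the leaf values matters, not only their order. This is one reason both evaluation functions and pruning are harder here than in plain minimax.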

Page 54: Game Playing 2

GENERALIZING MINIMAX VALUES

Utilities can be continuous numerical values, rather than +1, 0, -1

This allows maximizing the amount of “points” (e.g., $) won instead of just achieving a win

Rewards are associated with terminal states

Costs can be associated with certain decisions at non-terminal states (e.g., placing a bet)

Page 55: Game Playing 2

ROULETTE

“Game tree” only has depth 2:

Place a bet

Observe the roulette wheel

[Figure: MAX chooses between No bet and Bet: Red, $5; the bet leads to a chance node with outcomes Red (probability 18/38, +10) and Not red (probability 20/38, 0)]

Page 56: Game Playing 2

CHANCE NODE BACKUP

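Expected value: for $k$ children with backed-up values $v_1, \dots, v_k$ and probabilities $p_1, \dots, p_k$:

$\text{Chance node value} = p_1 v_1 + p_2 v_2 + \dots + p_k v_k$

[Figure: chance node for Bet: Red, $5, with outcomes Red (probability 18/38, +10) and Not red (probability 20/38, 0)]

Value: 18/38 × 10 + 20/38 × 0 ≈ 4.74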
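The chance node backup is a probability-weighted sum. A minimal sketch with exact fractions reproduces the 4.74 above. The function name `chance_value` is illustrative.

```python
from fractions import Fraction
from typing import List, Tuple


def chance_value(outcomes: List[Tuple[Fraction, int]]) -> Fraction:
    """p1*v1 + p2*v2 + ... + pk*vk over (probability, backed-up value) pairs."""
    if sum(p for p, _ in outcomes) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum((p * v for p, v in outcomes), Fraction(0))


bet_red = [(Fraction(18, 38), 10), (Fraction(20, 38), 0)]   # Red, Not red
value = chance_value(bet_red)
print(value, round(float(value), 2))   # 90/19 4.74
```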

Page 57: Game Playing 2

MAX/CHANCE NODES

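[Figure: a MAX node choosing between two chance nodes. Bet: Red, $5 leads to Red (18/38, +10) or Not red (20/38, 0), value 4.74. Bet: 17, $5 leads to 17 (1/38, +150) or Not 17 (37/38, 0), value 150/38 ≈ 3.95.]

MAX should pick the action leading to the node with the highest value (here Bet: Red, since 4.74 > 3.95)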
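A sketch of the MAX choice using the slides' numbers. In these numbers a lost bet pays 0 rather than -5, so the outcomes do not subtract the $5 stake. As an assumption, the sketch adds the No bet option from the roulette slide and can optionally charge the stake as a decision cost, which the utilities slide allows. `ACTIONS`, `action_value` and `max_choice` are illustrative names.

```python
from fractions import Fraction

# action -> (cost of choosing it, [(probability, payout), ...]); payouts are the slides'.
ACTIONS = {
    "No bet": (0, [(Fraction(1), 0)]),
    "Bet: Red, $5": (5, [(Fraction(18, 38), 10), (Fraction(20, 38), 0)]),
    "Bet: 17, $5": (5, [(Fraction(1, 38), 150), (Fraction(37, 38), 0)]),
}


def action_value(action: str, charge_stake: bool) -> Fraction:
    cost, outcomes = ACTIONS[action]
    expected = sum((p * v for p, v in outcomes), Fraction(0))   # chance node backup
    return expected - cost if charge_stake else expected


def max_choice(charge_stake: bool) -> str:
    """MAX picks the action whose successor node has the highest value."""
    return max(ACTIONS, key=lambda a: action_value(a, charge_stake))


print(max_choice(charge_stake=False))  # Bet: Red, $5
print(max_choice(charge_stake=True))   # No bet
```

Without the stake, the values are 0, 90/19 ≈ 4.74 and 75/19 ≈ 3.95, so MAX bets on red. With the stake charged, they become 0, -5/19 and -20/19, so MAX declines to bet.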

Page 58: Game Playing 2

A SLIGHTLY MORE COMPLEX EXAMPLE

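Two fair coins

Pay $1 to start, at which point both are flipped

Can flip up to two coins again, at a cost of $1 each

Payout: $5 for HH, $1 for HT or TH, $0 for TT

[Figure: game tree in which MAX chooses Done, Flip T or Flip H, and each flip leads to a chance node with two outcomes of probability 1/2 (e.g., flipping the T of HT yields HT or HH). Later slides back up net values (payout minus money spent), e.g., Done at HH: 5-1 = 4; Done at HT: 1-1 = 0; one extra flip ending in HT: 1-2 = -1, or in HH: 5-2 = 3.]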
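Under one reading of the rules, the game can be solved with a memoized expectimax recursion. The reading is an assumption: re-flips are single coins, one at a time, at most two in total, and the player may stop at any point. The state is the number of heads showing and the re-flips left. MAX chooses Done, flip a tail, or flip a head, and each flip is a chance node with probability 1/2 per outcome. `value` and `game_value` are illustrative names.

```python
from fractions import Fraction
from functools import lru_cache

PAYOUT = {2: 5, 1: 1, 0: 0}   # number of heads -> payout: HH $5, HT/TH $1, TT $0
FLIP_COST = 1                 # each extra single-coin flip costs $1
MAX_FLIPS = 2                 # assumption: at most two extra flips in total
HALF = Fraction(1, 2)


@lru_cache(maxsize=None)
def value(heads: int, flips_left: int) -> Fraction:
    """Best expected payout minus future flip costs, from a MAX decision node."""
    best = Fraction(PAYOUT[heads])                     # Done: take the payout
    if flips_left > 0:
        if heads < 2:    # Flip T: CHANCE node, the tail becomes H or stays T
            flip_t = -FLIP_COST + HALF * (value(heads + 1, flips_left - 1)
                                          + value(heads, flips_left - 1))
            best = max(best, flip_t)
        if heads > 0:    # Flip H: CHANCE node, the head stays H or becomes T
            flip_h = -FLIP_COST + HALF * (value(heads, flips_left - 1)
                                          + value(heads - 1, flips_left - 1))
            best = max(best, flip_h)
    return best


def game_value(entry_fee: int = 1) -> Fraction:
    """Pay the fee, flip both coins (TT, HT/TH, HH w.p. 1/4, 1/2, 1/4), then play optimally."""
    start = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
    return -entry_fee + sum(p * value(h, MAX_FLIPS) for h, p in start.items())


print(value(1, MAX_FLIPS))   # 5/2: from HT, flipping the tail beats stopping (payout 1)
print(game_value())          # 3/2
```

Under this reading, the game is worth $1.50 to the player on average. Flipping a head is never chosen, and from TT flipping a tail and stopping are tied at 0.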

Page 66: Game Playing 2

CARD GAMES

Blackjack (6-deck), video poker: similar to coin-flipping game

But in many card games, the state must keep track of the history of dealt cards, because it affects future probabilities:

One-deck blackjack

Bridge

Poker

(We won’t even get started on betting strategies)

Page 67: Game Playing 2

PARTIALLY OBSERVABLE GAMES

Partial observability

Don’t see the entire state (e.g., other players’ hands)

“Fog of war”

Examples: Kriegspiel (see R&N), Battleship, Stratego

Page 68: Game Playing 2

PARTIALLY-OBSERVABLE CARD GAMES

One possible strategy:

Consider all possible deals

Solve each deal as a fully observable problem

Choose the move that has the best average minimax value

“Averaging over clairvoyance”

[Why doesn’t this always work?]
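The strategy itself is short once a solver for the fully observable game is available. In this sketch, `solve(move, deal)` is a hypothetical callback standing in for a full minimax search of the game in which `deal` is the actual deal, deals are assumed equally likely, and the payoff table is invented for illustration. The sketch inherits whatever weakness the question above points at.

```python
from fractions import Fraction
from typing import Callable, Hashable, Sequence


def averaging_over_clairvoyance(moves: Sequence[Hashable], deals: Sequence[Hashable],
                                solve: Callable[[Hashable, Hashable], int]) -> Hashable:
    """Pick the move with the best average value over equally likely deals.

    `solve(move, deal)` stands for the minimax value of `move` when `deal`
    is fully visible (a hypothetical callback).
    """
    def average(move: Hashable) -> Fraction:
        return Fraction(sum(solve(move, d) for d in deals), len(deals))
    return max(moves, key=average)


# Hypothetical per-deal minimax values, for illustration only.
VALUES = {("safe", "deal A"): 1, ("safe", "deal B"): 1,
          ("gamble", "deal A"): 3, ("gamble", "deal B"): -2}
print(averaging_over_clairvoyance(["safe", "gamble"], ["deal A", "deal B"],
                                  lambda m, d: VALUES[(m, d)]))  # safe
```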

Page 69: Game Playing 2


OBSERVATION OF THE REAL WORLD

[Figure: the real world, in some state, yields percepts, which are interpreted in the representation language as, e.g., On(A,B), On(B,Table), Handempty]

Percepts can be user’s inputs, sensory data (e.g., image pixels), information received from other agents, ...

Page 70: Game Playing 2


SECOND SOURCE OF UNCERTAINTY: IMPERFECT OBSERVATION OF THE WORLD

Observation of the world can be:

Partial, e.g., a vision sensor can’t see through obstacles (lack of percepts)

[Figure: two rooms, R1 and R2]

The robot may not know whether there is dust in room R2

Page 71: Game Playing 2


SECOND SOURCE OF UNCERTAINTY: IMPERFECT OBSERVATION OF THE WORLD

Observation of the world can be:

Partial, e.g., a vision sensor can’t see through obstacles

Ambiguous, e.g., percepts have multiple possible interpretations

[Figure: block A seen above blocks B and C; the percept may be interpreted as On(A,B) or On(A,C)]

Page 72: Game Playing 2


SECOND SOURCE OF UNCERTAINTY: IMPERFECT OBSERVATION OF THE WORLD

Observation of the world can be:

Partial, e.g., a vision sensor can’t see through obstacles

Ambiguous, e.g., percepts have multiple possible interpretations

Incorrect

Page 73: Game Playing 2


EXAMPLE: BELIEF STATE

In the presence of non-deterministic sensory uncertainty, an agent’s belief state represents all the states of the world that it thinks are possible at a given time or at a given stage of reasoning

In the probabilistic model of uncertainty, a probability is associated with each state to measure its likelihood of being the actual state

[Figure: four possible states with probabilities 0.2, 0.3, 0.4 and 0.1]

Page 74: Game Playing 2

BELIEF STATE

A belief state is the set of all states that an agent thinks are possible at any given time or at any stage of planning a course of actions, e.g.: [Figure: example belief state]

To plan a course of actions, the agent searches a space of belief states, instead of a space of states

Page 75: Game Playing 2

SENSOR MODEL

State space S

The sensor model is a function

$\mathrm{SENSE}: S \to 2^S$

that maps each state $s \in S$ to a belief state (the set of all states that the agent would think possible if it were actually observing state s)

Example: Assume our vacuum robot can perfectly sense the room it is in and whether there is dust in it, but it can’t sense whether there is dust in the other room

[Figure: SENSE applied to two example states, each returning the corresponding belief state]
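For the vacuum example, SENSE can be sketched by enumerating the states and keeping those that produce the same percept. The encoding is an assumption for illustration: a state records the robot's room and whether each room is dusty.

```python
from itertools import product
from typing import FrozenSet, NamedTuple


class State(NamedTuple):
    room: str       # "R1" or "R2": where the robot is
    dust_r1: bool
    dust_r2: bool


ALL_STATES = [State(r, d1, d2) for r, d1, d2 in product(("R1", "R2"), (False, True), (False, True))]


def dust_here(s: State) -> bool:
    return s.dust_r1 if s.room == "R1" else s.dust_r2


def sense(s: State) -> FrozenSet[State]:
    """SENSE: S -> 2^S. Same room and same local dust reading; the other room is unknown."""
    return frozenset(t for t in ALL_STATES
                     if t.room == s.room and dust_here(t) == dust_here(s))


print(len(sense(State("R1", True, False))))  # 2: R2 may or may not be dusty
```

Each of the 8 states maps to a belief state of size 2, because exactly one bit (the other room's dust) stays hidden.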

Page 76: Game Playing 2

VACUUM ROBOT ACTION MODEL

Right either moves the robot right, or does nothing

Left always moves the robot to the left, but it may occasionally deposit dust in the right room

Suck picks up the dirt in the room, if any, and always does the right thing

• The robot perfectly senses the room it is in and whether there is dust in it

• But it can’t sense if there is dust in the other room

Page 77: Game Playing 2

TRANSITION BETWEEN BELIEF STATES

Suppose the robot is initially in state:

After sensing this state, its belief state is:

Just after executing Left, its belief state will be:

After sensing the new state, its belief state will be one of two belief states: one if there is no dust in R1, the other if there is dust in R1
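A belief-state transition can be sketched in two steps. Predict applies the nondeterministic action model to every state in the belief state; update keeps the states consistent with the new percept. The action model follows the VACUUM ROBOT ACTION MODEL slide. Several details are assumptions: the state encoding, reading “the right room” as R2, and representing a percept as (room, dust here?). The slide's initial state is a picture, so the start state below is hypothetical: the robot in R2, with no dust anywhere.

```python
from itertools import product
from typing import FrozenSet, NamedTuple, Set, Tuple


class State(NamedTuple):
    room: str       # "R1" or "R2"
    dust_r1: bool
    dust_r2: bool


ALL_STATES = [State(r, a, b) for r, a, b in product(("R1", "R2"), (False, True), (False, True))]
Belief = FrozenSet[State]


def dust_here(s: State) -> bool:
    return s.dust_r1 if s.room == "R1" else s.dust_r2


def results(action: str, s: State) -> Set[State]:
    """Nondeterministic action model: every state the action may lead to."""
    if action == "Left":    # always moves left, may deposit dust in R2
        return {State("R1", s.dust_r1, s.dust_r2), State("R1", s.dust_r1, True)}
    if action == "Right":   # moves right, or does nothing
        return {State("R2", s.dust_r1, s.dust_r2), s}
    if action == "Suck":    # cleans the current room
        return {s._replace(dust_r1=False) if s.room == "R1" else s._replace(dust_r2=False)}
    raise ValueError(f"unknown action: {action}")


def predict(belief: Belief, action: str) -> Belief:
    return frozenset(t for s in belief for t in results(action, s))


def update(belief: Belief, percept: Tuple[str, bool]) -> Belief:
    room, dusty = percept
    return frozenset(s for s in belief if s.room == room and dust_here(s) == dusty)


start = State("R2", False, False)                                   # hypothetical initial state
b0 = update(frozenset(ALL_STATES), (start.room, dust_here(start)))  # after sensing
b1 = predict(b0, "Left")                                            # just after executing Left
for percept in [("R1", False), ("R1", True)]:                       # nature picks the observation
    print(percept, len(update(b1, percept)))
```

Executing Left takes the belief state from 2 states to 4. Each possible observation then cuts it back to 2, which gives the two outcomes of the “game against nature”.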

Page 78: Game Playing 2

TRANSITION BETWEEN BELIEF STATES

Playing a “game against nature”

[Figure: after Left, the robot observes either Clean(R1) or ¬Clean(R1)]

After receiving an observation, the robot will have one of these two belief states

Page 79: Game Playing 2

AND/OR TREE OF BELIEF STATES

[Figure: AND/OR tree of belief states, expanded with the actions Left, Suck and Right; leaves are marked goal or loop]

A goal belief state is one in which all states are goal states

An action is applicable to a belief state B if its preconditions are achieved in all states in B

Page 80: Game Playing 2

RECAP

Alpha-beta pruning: reduces the complexity of minimax to $O(b^{h/2})$ in the ideal case

Games with chance: expected values, averaging over probabilities

Partial observability: reason about sets of states (belief states)

Much more on the latter two topics later

Page 81: Game Playing 2

PROJECT PROPOSAL (OPTIONAL)

Mandatory: instructor’s advance approval

Out of town 9/24-10/1, can discuss via email

Project title, team members

1/2 to 1 page description:

Specific topic (problem you are trying to solve, topic of survey, etc.)

Why did you choose this topic?

Methods (researched in advance, sources of references)

Expected results

Email to me by 10/2

Page 82: Game Playing 2

HOMEWORK

Reading: R&N 4.3-4, 13.1-2