Games of Chance Introduction to Artificial Intelligence COS302 Michael L. Littman Fall 2001.

Page 1: Games of Chance Introduction to Artificial Intelligence COS302 Michael L. Littman Fall 2001.

Games of Chance

Introduction to Artificial Intelligence

COS302

Michael L. Littman

Fall 2001

Page 2:

Administration

Rush hour (10/22).

Today not part of midterm (10/24), just final.

Page 3:

Uncertainty in Search

We’ve assumed everything is known: starting state, neighbors, goals, etc.

Often need to make decisions even though some things are uncertain.

Complicates things…

Page 4:

Types of Uncertainty

Opponent: What will the other player do?
• Minimax

Outcome: Which neighbor do we get?
• Model via probability distribution

State: Where are we now?
• Hidden information

Transition: What are the rules?
• Need to use learning to find out

Page 5:

Nim-Rand

Pile of sticks.
• Lose if take last stick.
• On your turn, take 1 or 2.
• Flip a coin. If H, take 1 more.

Which type of uncertainty?

Page 6:

Value of a Game

Without randomness: maximize your winnings in the worst case.

With randomness: maximize your expected winnings in the worst case.

Want to do well on average.

What games are like this?

Page 7:

Nim-Rand Tree

[Figure: Nim-Rand game tree rooted at (|||)-X. Edges labeled 1 and 2 give the number of sticks taken, c marks a coin-flip chance node, and the leaves ()-X and ()-Y carry values +1 and -1.]

Page 8:

Nim-Rand Values

[Figure: the same Nim-Rand tree rooted at (|||)-X, with a backed-up value written next to each node. The values shown are +1, -1, +0, and +0.5.]

Page 9:

Search Model

States, terminal states (G), values for terminal states (V).

X states (maximizer), Y states (minimizer), Z states (chance).

For all s in Z, for all s’ in N(s), P(s’|s) is the probability of reaching s’ from s.

Page 10:

Game Value (no loops)

Gameval(s) = {
    If (G(s)) return V(s)
    Else if s in X
        return max over s’ in N(s) of Gameval(s’)
    Else if s in Y
        return min over s’ in N(s) of Gameval(s’)
    Else
        return sum over s’ in N(s) of P(s’|s) Gameval(s’)
}
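The recursion above can be tried on Nim-Rand. The following is a minimal sketch, not the lecture's code: the state encoding `(sticks, player)` and the helper `other` are assumptions made for illustration, and the coin flip is folded into the move instead of being stored as a separate Z state. Values are from X's point of view.

```python
from functools import lru_cache

def other(player):
    """Return the opponent of `player` (hypothetical helper)."""
    return "Y" if player == "X" else "X"

@lru_cache(maxsize=None)
def gameval(sticks, player):
    """Value to X of a Nim-Rand position with `player` about to move."""
    if sticks == 0:
        # Terminal state: the previous mover took the last stick and loses.
        return 1.0 if player == "X" else -1.0
    options = []
    for take in (1, 2):
        if take > sticks:
            continue
        left = sticks - take
        if left == 0:
            options.append(gameval(0, other(player)))
        else:
            # Chance node: heads removes one more stick, tails removes none.
            heads = gameval(left - 1, other(player))
            tails = gameval(left, other(player))
            options.append(0.5 * heads + 0.5 * tails)
    # X states maximize, Y states minimize.
    return max(options) if player == "X" else min(options)

print(gameval(3, "X"))  # 0.5
```

Starting from three sticks with X to move, taking one stick is worth 0.5 to X and taking two is worth 0, so the game value is 0.5.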

Page 11:

Games with Loops

No known polynomial-time algorithm.

Approximated by value iteration:

For all s, if G(s), L(s) = V(s), else L(s) = 0.

Repeat until changes are small:

    for all non-terminal s, L(s) = max, min, or probability-weighted average of L(s’) over s’ in N(s), depending on whether s is in X, Y, or Z.
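A minimal sketch of this value iteration loop follows. The dictionary-based game encoding, the function name `value_iteration`, the tolerance `tol`, and the tiny loopy example are all assumptions for illustration; they do not come from the lecture. Updates are applied in place as each state is visited.

```python
def value_iteration(kind, neighbors, prob, terminal_value, tol=1e-9):
    """Approximate game values L(s) by repeated max/min/expectation backups."""
    # Terminal states start (and stay) at V(s); all others start at 0.
    L = {s: terminal_value.get(s, 0.0) for s in kind}
    while True:
        biggest_change = 0.0
        for s in kind:
            if s in terminal_value:
                continue
            vals = [L[t] for t in neighbors[s]]
            if kind[s] == "X":
                new = max(vals)
            elif kind[s] == "Y":
                new = min(vals)
            else:  # chance state: probability-weighted average
                new = sum(prob[(s, t)] * L[t] for t in neighbors[s])
            biggest_change = max(biggest_change, abs(new - L[s]))
            L[s] = new
        if biggest_change < tol:
            return L

# Hypothetical loopy game: X may stop for 0 or gamble; the gamble wins +1
# half the time and otherwise returns to X's decision.
kind = {"x": "X", "z": "Z", "win": "T", "stop": "T"}
neighbors = {"x": ["z", "stop"], "z": ["win", "x"]}
prob = {("z", "win"): 0.5, ("z", "x"): 0.5}
terminal_value = {"win": 1.0, "stop": 0.0}
values = value_iteration(kind, neighbors, prob, terminal_value)
```

In this example the estimate for `x` climbs toward 1, halving its error on every sweep, which is why the loop stops on a small-change threshold instead of waiting for an exact answer.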

Page 12:

Hidden Information

Games like Poker, 2-player bridge, Scrabble™, Diplomacy, Stratego.

Don’t fit game tree model, even when chance nodes included.

Page 13:

Pure Strategies

X:
I: 1=L, 4=L
II: 1=L, 4=R
III: 1=R, 4=L
IV: 1=R, 4=R

Y:
I: 2=L, 3=R
II: 2=M, 3=R
III: 2=R, 3=R

[Figure: game tree with decision nodes X-1, Y-2, Y-3, and X-4. Branches are labeled L, M, and R; the leaf values shown are +7, +3, -1, +5, and +4.]

Page 14:

Matrix Form

Summarizes all of each player's decisions as a single choice, with both players choosing simultaneously.

          X-I   X-II   X-III   X-IV
Y-I        7     7      2       2
Y-II       3     3      2       2
Y-III     -1     4      2       2

Page 15:

Value of Matrix Game

X picks column with largest min.

Y picks row with smallest max.

          X-I   X-II   X-III   X-IV
Y-I        7     7      2       2
Y-II       3     3      2       2
Y-III     -1     4      2       2
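The two rules can be checked directly on this matrix. The sketch below is illustrative only; the variable names are assumptions, and rows are taken to be Y's strategies and columns X's, as in the table.

```python
# Rows are Y's pure strategies (Y-I..Y-III); columns are X's (X-I..X-IV).
payoff = [
    [7, 7, 2, 2],
    [3, 3, 2, 2],
    [-1, 4, 2, 2],
]

# X (maximizer) picks the column whose worst entry is largest.
column_mins = [min(row[j] for row in payoff) for j in range(len(payoff[0]))]
maximin = max(column_mins)
best_column = column_mins.index(maximin)

# Y (minimizer) picks the row whose best-for-X entry is smallest.
row_maxes = [max(row) for row in payoff]
minimax = min(row_maxes)
best_row = row_maxes.index(minimax)

print(maximin, minimax)  # 3 3
```

Both rules land on the same entry: X plays X-II, Y plays Y-II, and the value of the game is 3.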

Page 16:

Minimax

Von Neumann proved that for a zero-sum matrix game, minimax = maximin.

Given perfect information (no state uncertainty), there exists an optimal pure strategy for each player.

Page 17:

Game w/ Chance Nodes

[Figure: game tree. X-1 chooses L or R. L leads to a chance node c with outcomes +4 and -20, each with probability 0.5. R leads to Y-3, where L leads to a chance node c with outcomes -5 (probability 0.8) and +10 (probability 0.2), and R leads to +3.]

Use expected values.

              X-I (L)   X-II (R)
Y-I (L)         -8        -2
Y-II (R)        -8        +3
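The matrix entries follow from averaging over each chance node. The arithmetic below assumes the tree reads as follows: X's move L leads to a 0.5/0.5 chance node over +4 and -20, and after X's move R, Y's move L leads to a 0.8/0.2 chance node over -5 and +10 while Y's move R leads to +3. That pairing of probabilities with leaves is inferred from the matrix, not stated in the text.

$$
\begin{aligned}
\text{X-I (L), either Y strategy:}\quad & 0.5(+4) + 0.5(-20) = -8 \\
\text{X-II (R), Y-I (L):}\quad & 0.8(-5) + 0.2(+10) = -2 \\
\text{X-II (R), Y-II (R):}\quad & +3
\end{aligned}
$$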

Page 18:

More General Matrices

What game tree leads to this matrix?

Does von Neumann’s theorem still hold?

              X-I (L)   X-II (R)
Y-I (L)          1         0
Y-II (R)         0         1

Page 19:

Hidden Info. Matrices

X picks L or R, keeping the choice hidden from Y.

Y makes a choice.

X’s choice is revealed and game ends.

              X-I (L)   X-II (R)
Y-I (L)          1         0
Y-II (R)         0         1

Page 20:

Micro Poker

X is dealt high or low card, holds/folds.

Y folds/sees.

High card wins.

Y can’t see X’s card.

[Figure: game tree. A chance node c deals X a low card (X-L) or a high card (X-H), each with probability 0.5. On a low card X folds (-20) or holds; on a high card X holds. After a hold, Y folds (+10) or sees (-40 if X's card is low, +30 if it is high).]

Page 21:

Matrix Form

Player X can guarantee itself +1 on average. How?

It can even announce its strategy.

              X-I (fold)   X-II (hold)
Y-I (fold)        -5           +10
Y-II (see)        +5            -5
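These four entries can be recovered by averaging over the deal. The sketch below assumes one reading of the Micro Poker tree: each card is dealt with probability 0.5, folding a low card pays -20, and after a hold Y's fold pays +10 while Y's see pays -40 on a low card and +30 on a high card. All constant and function names are hypothetical.

```python
# Assumed reading of the Micro Poker tree; payoffs are to X.
P_LOW = P_HIGH = 0.5
LOW_FOLD = -20                               # X folds a low card
LOW_HOLD = {"fold": +10, "see": -40}         # X holds a low card, then Y acts
HIGH_HOLD = {"fold": +10, "see": +30}        # X holds a high card, then Y acts

def expected_payoff(x_on_low, y_on_hold):
    """Expected payoff to X for one pair of pure strategies."""
    low = LOW_FOLD if x_on_low == "fold" else LOW_HOLD[y_on_hold]
    high = HIGH_HOLD[y_on_hold]
    return P_LOW * low + P_HIGH * high

# Rows: Y-I (fold), Y-II (see). Columns: X-I (fold), X-II (hold).
matrix = [[expected_payoff(x, y) for x in ("fold", "hold")]
          for y in ("fold", "see")]
print(matrix)  # [[-5.0, 10.0], [5.0, -5.0]]
```

X's pure strategy only says what to do with a low card, because under this reading a high card is always held.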

Page 22:

Mixed Strategies

Pick a number p.

X: With prob. p, fold; else hold.

Since Y doesn’t know what’s coming, the response will sometimes work, sometimes not.

Page 23:

Guess a Probability

X announces p=1/3.

Y’s pick?

              X-I (fold)   X-II (hold)
Y-I (fold)        -5           +10
Y-II (see)        +5            -5

Fold: +5

See: -1 2/3

Y picks see.

Page 24:

Guess a Probability

X announces p=2/3.

Y’s pick?

              X-I (fold)   X-II (hold)
Y-I (fold)        -5           +10
Y-II (see)        +5            -5

Fold: +0

See: +1 2/3

Y picks fold.
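Y's reply to an announced p can be computed mechanically. The sketch below is illustrative; `MATRIX` and `y_best_response` are assumed names, and exact fractions are used so the results match the thirds shown on the slides.

```python
from fractions import Fraction

# Rows: Y-I (fold), Y-II (see). Each row holds (vs. X-I fold, vs. X-II hold).
MATRIX = {"fold": (-5, 10), "see": (5, -5)}

def y_best_response(p):
    """Return Y's minimizing reply and X's expected payoff when X folds with prob. p."""
    payoffs = {y: p * row[0] + (1 - p) * row[1] for y, row in MATRIX.items()}
    reply = min(payoffs, key=payoffs.get)
    return reply, payoffs[reply]

print(y_best_response(Fraction(1, 3)))  # ('see', Fraction(-5, 3))
print(y_best_response(Fraction(2, 3)))  # ('fold', Fraction(0, 1))
```

At p=1/3 Y sees and X averages -1 2/3; at p=2/3 Y folds and X averages 0. Neither announcement reaches the +1 that X can guarantee.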

Page 25:

All Strategies

What should X pick for p to maximize its worst case?

p=0.6

Payoff +1

[Figure: X's expected payoff against Y's "see" and "fold", drawn as two lines over p from 0 to 1, with the payoff axis running from -5 to 10.]
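The value p=0.6 is where the two lines cross. As a short derivation using the Micro Poker matrix entries (X folds with probability p):

$$
\begin{aligned}
\text{vs. fold:}\quad & -5p + 10(1-p) = 10 - 15p \\
\text{vs. see:}\quad & 5p - 5(1-p) = 10p - 5 \\
\text{crossing:}\quad & 10 - 15p = 10p - 5 \;\Rightarrow\; p = 0.6, \quad \text{payoff} = 10 - 15(0.6) = +1
\end{aligned}
$$

For smaller p the "see" line is lower, and for larger p the "fold" line is lower, so the worst case is largest exactly at the crossing.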

Page 26:

Randomizing Y

If Y randomizes, the answer is the same.

No matter what, X can guarantee itself +1.

[Figure: the same plot of X's expected payoff against "see" and "fold" over p from 0 to 1, with the payoff axis running from -5 to 10.]

Page 27:

Bluffing

[Figure: the Micro Poker game tree again, with chance node c (0.5/0.5), X-L, X-H, Y's fold/see choice after a hold, and leaf values -20, +10, -40, +30, +10.]

X: On a low card, bluff with prob. 0.4.

Y: On hold, fold with prob. 0.4.
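Both numbers can be checked against the Micro Poker matrix. Bluffing means holding a low card, so X's bluff probability is 1 - p = 1 - 0.6 = 0.4. For Y, let q be the probability of folding (q is a symbol introduced here for illustration); Y wants X's two pure strategies to pay the same:

$$
\begin{aligned}
\text{vs. X-I (fold):}\quad & -5q + 5(1-q) = 5 - 10q \\
\text{vs. X-II (hold):}\quad & 10q - 5(1-q) = 15q - 5 \\
\text{crossing:}\quad & 5 - 10q = 15q - 5 \;\Rightarrow\; q = 0.4, \quad \text{payoff to X} = 5 - 10(0.4) = +1
\end{aligned}
$$

So Y's mix holds X to the same +1 that X's mix guarantees.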

Page 28:

Solving 2x2 Game

X-I with prob. $p$

X’s expected gain

vs. Y-I: $m_{11} p + m_{12} (1-p)$

vs. Y-II: $m_{21} p + m_{22} (1-p)$

          X-I        X-II
Y-I     $m_{11}$   $m_{12}$
Y-II    $m_{21}$   $m_{22}$

Maximize the minimum.

Try p=0, p=1, where lines meet.
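The three-candidate recipe can be sketched as follows. The function name `solve_2x2` is an assumption, and the code expects integer (or `Fraction`) matrix entries so the arithmetic stays exact; it is an illustration of the recipe, not the lecture's code.

```python
from fractions import Fraction

def solve_2x2(m11, m12, m21, m22):
    """Return (p, value): X's best probability of X-I and its guaranteed payoff."""
    def worst_case(p):
        vs_y1 = m11 * p + m12 * (1 - p)
        vs_y2 = m21 * p + m22 * (1 - p)
        return min(vs_y1, vs_y2)

    candidates = [Fraction(0), Fraction(1)]
    # The two lines meet where m11*p + m12*(1-p) == m21*p + m22*(1-p).
    denom = (m11 - m12) - (m21 - m22)
    if denom != 0:
        p_cross = Fraction(m22 - m12, denom)
        if 0 <= p_cross <= 1:
            candidates.append(p_cross)
    best = max(candidates, key=worst_case)
    return best, worst_case(best)

print(solve_2x2(-5, 10, 5, -5))  # (Fraction(3, 5), Fraction(1, 1))
```

On the Micro Poker matrix this returns p = 3/5 with value 1. Checking only the endpoints and the crossing is enough because the minimum of two lines is piecewise linear, so its maximum on [0, 1] sits at one of those points.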

Page 29:

Solving General mxn

Linear program over $p_1, \ldots, p_n$, where $m_{ij}$ is X's payoff when X plays strategy $i$ and Y plays strategy $j$.

$p_1 + \cdots + p_n = 1$, $p_i \ge 0$

Maximize X’s gain, $g$

vs. Y-I: $m_{11} p_1 + \cdots + m_{n1} p_n \ge g$

vs. Y-II: $m_{12} p_1 + \cdots + m_{n2} p_n \ge g$

…

Against all Y strategies.
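As a worked instance, here is the program written out for the Micro Poker matrix, taking $p_1$ as the probability of X-I (fold) and $p_2$ as the probability of X-II (hold). The first index of each payoff is X's strategy and the second is Y's, following the constraints above.

$$
\begin{aligned}
\text{maximize}\quad & g \\
\text{subject to}\quad & -5p_1 + 10p_2 \ge g && \text{(vs. Y-I, fold)} \\
& 5p_1 - 5p_2 \ge g && \text{(vs. Y-II, see)} \\
& p_1 + p_2 = 1, \quad p_1, p_2 \ge 0
\end{aligned}
$$

Substituting $p_2 = 1 - p_1$ gives $10 - 15p_1 \ge g$ and $10p_1 - 5 \ge g$, which are both tight at $p_1 = 0.6$, $g = 1$.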

Page 30:

Issues

Can we solve poker?
• More than 2 players
• Not zero sum (collude)
• Huge state space

Poker: Opponent modeling

Bridge: Use simulation to approximate

Page 31:

What to Learn

Minimax value in games of chance and the DFS algorithm for computing it.

Converting games to matrix form.

Solve 2x2 game.

Page 32:

Homework 5 (due 11/7)

1. The value iteration algorithm from the Games of Chance lecture can be applied to deterministic games with loops. Argue that it produces the same answer as the “Loopy” algorithm from the Game Tree lecture.

2. Write the matrix form of the game tree below.

Page 33:

Game Tree

[Figure: game tree with decision nodes X-1, Y-2, Y-3, and X-4. Every branch is labeled L or R; the leaf values shown are +2, -1, +4, +5, and +2.]