
DNA Starts to Learn Poker

David Harlan Wood⁴*, Hong Bi¹, Steven O. Kimbrough², Dongjun Wu³, Junghuei Chen¹*

Departments of ¹Chemistry & Biochemistry and ⁴Computer & Information Sciences, University of Delaware

²The Wharton School, University of Pennsylvania

³Bennett S. LeBow College of Business, Drexel University

 

Player Dealt an Ace

[Figure: game tree. After the Deal, a player dealt an Ace says Ace (adds $1). The dealer then folds (dealer loses $1) or calls (adds $1; dealer loses $2).]

Player Dealt a 2

[Figure: game tree. A player dealt a 2 either says 2 (player loses $1) or says Ace (adds $1). After "Ace", the dealer folds (dealer loses $1) or calls (adds $1; dealer wins $2).]

[Figure: both trees under a single Deal node: player dealt an Ace, player dealt a 2.]
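The two game trees can be summarized as a small payoff table. The sketch below assumes the game is zero-sum between player and dealer and records every amount from the dealer's side (so "player loses $1" becomes +1); the names `DEALER_PAYOFF` and `dealer_payoff` are hypothetical, not taken from the slides.

```python
# Terminal outcomes of the Ace/2 game, read off the game trees.
# Assumption: the game is zero-sum and amounts are dollars from the dealer's side.
DEALER_PAYOFF = {
    ("A", "Say A", "Fold"): -1,  # dealer folds to a real Ace: dealer loses $1
    ("A", "Say A", "Call"): -2,  # dealer calls a real Ace: dealer loses $2
    ("2", "Say 2", None): +1,    # player admits a 2: player loses $1
    ("2", "Say A", "Fold"): -1,  # dealer folds to a bluff: dealer loses $1
    ("2", "Say A", "Call"): +2,  # dealer calls a bluff: dealer wins $2
}


def dealer_payoff(hand, player_says, dealer_response=None):
    """Dealer's payoff for one complete play (hypothetical helper)."""
    if player_says == "Say 2":
        dealer_response = None  # saying 2 ends the game before the dealer acts
    return DEALER_PAYOFF[(hand, player_says, dealer_response)]


if __name__ == "__main__":
    for (hand, says, response), amount in DEALER_PAYOFF.items():
        print(f"dealt {hand}, player: {says}, dealer: {response or '-'} -> dealer {amount:+d}")
```

The four distinct values, -2, -1, +1 and +2, are the outcome groups into which strategies are later sorted.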

OBJECTIVE: To Obtain Probabilistic Strategies

Each player wants to obtain a strategy for the game.

A strategy prescribes an action in every possible situation.

That is, at each node, whether to raise, as a function of the hand dealt.
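In this game the situations are few enough to list. The sketch below assumes, as in the game tree, that a player dealt an Ace always says Ace and that the dealer acts only after hearing "Ace". A deterministic strategy fixes one action per situation, and a probabilistic strategy assigns probabilities to the actions instead. The names `pure_strategies` and `sample_action` and the example probabilities are hypothetical.

```python
import itertools
import random

# Situations each side can face, with the actions allowed there (our reading of the tree:
# a player dealt an Ace always says Ace; the dealer only acts after hearing "Ace").
PLAYER_SITUATIONS = {"dealt A": ["Say A"], "dealt 2": ["Say 2", "Say A"]}
DEALER_SITUATIONS = {"hears Ace": ["Fold", "Call"]}


def pure_strategies(situations):
    """Every deterministic strategy: one action fixed for each situation."""
    names = list(situations)
    for actions in itertools.product(*(situations[name] for name in names)):
        yield dict(zip(names, actions))


def sample_action(mixed, situation, rng):
    """Draw an action at one situation from a probabilistic strategy."""
    actions, weights = zip(*mixed[situation].items())
    return rng.choices(actions, weights=weights, k=1)[0]


if __name__ == "__main__":
    print(list(pure_strategies(PLAYER_SITUATIONS)))
    print(list(pure_strategies(DEALER_SITUATIONS)))
    # Illustrative probabilities only; they are not taken from the slides.
    dealer_mixed = {"hears Ace": {"Fold": 0.4, "Call": 0.6}}
    print(sample_action(dealer_mixed, "hears Ace", random.Random(0)))
```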

Learning Poker

[Figure: the learning cycle. Play New Game: Deals, New Player Strategies, and New Dealer Strategies are assembled into games. Recover & Cut Play Histories for Player’s & Dealer’s Strategies; Separate by Payoffs; Recover & Distribute Strategies. Programmable Selection of Recovered Dealer Strategies and of Recovered Player Strategies. Dealer’s Adaptation and Player’s Adaptation, each by Amplify, Crossover, Mutate, yielding new strategies for the next game.]
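The first stages of the cycle can be imitated in software to see what each one does. In this minimal sketch, each strategy is reduced to a single probability: a bluff probability for the player, a call probability for the dealer. Strategies are paired at random with deals, and the payoffs are the zero-sum dealer payoffs read from the game tree. The molecular steps (assembly, extension, recovery) are replaced by ordinary function calls. `play_game`, `play_new_game` and `separate_by_payoffs` are hypothetical names.

```python
import random
from collections import defaultdict

# Dealer payoffs read from the game tree (assumed zero-sum, in dollars).
DEALER_PAYOFF = {("A", "Say A", "Fold"): -1, ("A", "Say A", "Call"): -2,
                 ("2", "Say 2", None): +1, ("2", "Say A", "Fold"): -1,
                 ("2", "Say A", "Call"): +2}


def play_game(bluff_p, call_q, hand, rng):
    """One game: bluff_p = P(say Ace | dealt 2), call_q = P(call | hears Ace)."""
    says = "Say A" if hand == "A" or rng.random() < bluff_p else "Say 2"
    response = None
    if says == "Say A":
        response = "Call" if rng.random() < call_q else "Fold"
    return DEALER_PAYOFF[(hand, says, response)]


def play_new_game(players, dealers, rng):
    """'Play New Game': pair each player strategy with a random dealer strategy and deal."""
    results = []
    for bluff_p in players:
        call_q = rng.choice(dealers)
        hand = rng.choice(["A", "2"])
        results.append((bluff_p, call_q, play_game(bluff_p, call_q, hand, rng)))
    return results


def separate_by_payoffs(results):
    """'Separate by Payoffs': group the dealer strategies by the dealer's outcome."""
    groups = defaultdict(list)
    for _, call_q, payoff in results:
        groups[payoff].append(call_q)
    return dict(groups)


if __name__ == "__main__":
    rng = random.Random(1)
    players = [rng.random() for _ in range(100)]
    dealers = [rng.random() for _ in range(100)]
    groups = separate_by_payoffs(play_new_game(players, dealers, rng))
    for payoff in sorted(groups):
        print(f"{payoff:+d}: {len(groups[payoff])} dealer strategies")
```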


[Figure: the three kinds of DNA molecules. Dealer’s strategies: regions Say A′, FOLD′, Call′, Fold′, with R.E. 1 and R.E. 2 sites and stoppers. Player’s strategies: regions 2′, Say 2′, Fold′, Error, SAY 2′, Say A′, A′, Say A′, with an R.E. 1 site and stoppers. Deals: Dealt A or Dealt 2, each with an R.E. 2 site. Sequences from Sakamoto et al., DNA4 (1997).]

Two Strategies and a Deal Define a Game

[Figure: for an Ace deal, a player’s strategy (R.E. 1; 2′ Say 2′ Fold′ Error SAY 2′ Say A′ A′ Say A′), a dealer’s strategy (R.E. 1, R.E. 2; Say A′ FOLD′ Call′ Fold′), and the deal (A; R.E. 2).]

Cut with R.E. 1 & R.E. 2 and Assemble a Game

[Figure: the cut player’s strategy, dealer’s strategy, and deal are joined in that order into one game molecule: 2′ Say 2′ Fold′ Error SAY 2′ Say A′ A′ Say A′ | Say A′ FOLD′ Call′ Fold′ | A.]
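Cutting and assembly can be pictured at the level of named regions rather than bases. This minimal sketch treats each molecule as a list of region labels, with `RE1` and `RE2` standing in for the R.E. 1 and R.E. 2 sites. Cutting leaves each fragment with the site as a sticky end, and ligation joins only fragments whose ends match. Where the sites sit and what flanking DNA is discarded (`head`, `tail`) are illustrative assumptions, not the published sequence design.

```python
# Molecules as lists of region labels. "RE1"/"RE2" stand in for the restriction sites,
# and "head"/"tail" for flanking DNA that cutting discards (hypothetical placement).
PLAYER = ["2'", "Say 2'", "Fold'", "Error", "SAY 2'", "Say A'", "A'", "Say A'", "RE1", "tail"]
DEALER = ["head", "RE1", "Say A'", "FOLD'", "Call'", "Fold'", "RE2", "tail"]
DEAL_A = ["head", "RE2", "A"]


def cut(molecule, site):
    """Cut at the first occurrence of a site; each half keeps the site as its sticky end."""
    i = molecule.index(site)
    return molecule[:i] + [site], [site] + molecule[i + 1:]


def ligate(left, right):
    """Join two fragments only if their facing sticky ends match."""
    if left[-1] != right[0]:
        raise ValueError(f"incompatible ends: {left[-1]} / {right[0]}")
    return left + right[1:]


def assemble_game(player, dealer, deal):
    """Player's strategy, then dealer's strategy, then deal, as in the figure."""
    player_part, _ = cut(player, "RE1")
    _, dealer_part = cut(dealer, "RE1")
    dealer_part, _ = cut(dealer_part, "RE2")
    _, deal_part = cut(deal, "RE2")
    return ligate(ligate(player_part, dealer_part), deal_part)


if __name__ == "__main__":
    print(" ".join(assemble_game(PLAYER, DEALER, DEAL_A)))
```

In this model the RE1 end of a player's strategy can only meet a dealer's strategy, and the RE2 end of a deal can only meet the other end of a dealer's strategy. As a result, the three pieces can be joined in only one order.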



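[Figure: gel of the ligation that assembles a game. Lanes: S1 (74-mer), S2 (57-mer), S3 (48-mer), S4 (53-mer), R1 (ligation reaction), R2 (purified ligation product), M (size marker: 50, 75, 100, 150, 200, 225). Linkers: L1 (25-mer), L2 (28-mer), L3 (28-mer). Product band: 232-mer.]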
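The lengths labeled in the figure give a quick consistency check. The four strands add up exactly to the 232-mer product band, while adding the linkers as well would overshoot. This fits our reading, which the slide does not state, that the three linkers act as splints across the three junctions between S1 through S4 rather than becoming part of the product.

```python
# Strand and linker lengths (nucleotides) as labeled in the gel figure.
strands = {"S1": 74, "S2": 57, "S3": 48, "S4": 53}
linkers = {"L1": 25, "L2": 28, "L3": 28}

joined = sum(strands.values())                 # S1-S4 joined end to end
with_linkers = joined + sum(linkers.values())  # if the linkers were also incorporated

print(joined)        # 232
print(with_linkers)  # 313
```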

[Figure: the Ace branch of the game tree. The player says A and the dealer folds; the dealer might change to call, which loses $2.]

Player Dealt an Ace

[Figure: the game molecule (player’s strategy, dealer’s strategy, deal: 2′ Say 2′ Fold′ Error SAY 2′ Say A′ A′ Say A′ | Say A′ FOLD′ Call′ Fold′ | A) plays the game by extension. Player says Ace: Extend(Say A) on A′ Say A′ in the player’s strategy. Dealer folds: Extend(Fold) on Say A′ Fold′ in the dealer’s strategy. Dealer might change to call: Extend(Call) on FOLD′ Call′, with a Fold preventer; an Error region is also marked.]
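Our reading of these slides is that each move is made by extension. The 3′ end of the growing strand anneals to its complement in a strategy, and polymerase copies the next region. This minimal sketch treats regions as labels and lists, for each possible 3′ end, the region it can cause to be copied, together with a weight. "Change your mind" regions get weights below 1, so extension sometimes stops there. Stoppers, the Error region and the Fold preventer are not modeled. The weights and the name `play_by_extension` are hypothetical.

```python
import random

# Hypothetical abstraction of play by extension. Key: region at the growing 3' end.
# Value: (region copied next, weight) for each template it can anneal to; weights that
# sum to less than 1 leave room for "no partner found", which ends extension.
RULES = {
    "A": [("Say A", 1.0)],      # player's strategy, A' Say A'
    "2": [("Say 2", 1.0)],      # player's strategy, 2' Say 2'
    "Say 2": [("Say A", 0.3)],  # change-your-mind region SAY 2' Say A' (weight assumed)
    "Say A": [("Fold", 1.0)],   # dealer's strategy, Say A' Fold'
    "Fold": [("Call", 0.6)],    # change-your-mind region FOLD' Call' (weight assumed)
}


def play_by_extension(deal, rules, rng):
    """Extend from the deal region until no template accepts the 3' end."""
    strand = [deal]
    while True:
        r = rng.random()
        for region, weight in rules.get(strand[-1], []):
            if r < weight:
                strand.append(region)  # polymerase copies the next region
                break
            r -= weight
        else:
            return strand  # nothing annealed: extension stops


if __name__ == "__main__":
    rng = random.Random(0)
    for deal in ("A", "2"):
        print(deal, "->", " ".join(play_by_extension(deal, RULES, rng)))
```

The last region of the strand records how the game ended. "Say 2" means the player said 2, "Fold" means the dealer folded, and "Call" means the dealer called.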

[Figure: gel of extension products: 232-mer, 247-mer, 262-mer, 282-mer (marker: 200, 225, 250, 275, 300).]

Player Dealt a 2

[Figure: the 2 branch of the game tree. The player says 2; if Say 2 is blocked, the player changes to say A; the dealer then folds, or changes to call.]

[Figure: the game molecule for a 2 deal plays the game by extension. Player says 2: Extend(Say 2) on 2′ Say 2′. Player might change to say Ace: Extend(Say A) on SAY 2′ Say A′. Dealer folds: Extend(Fold) on Say A′ Fold′. Dealer might change to call: Extend(Call) on FOLD′ Call′, with a Fold preventer and an Error region.]

[Figure: both branches together. Player dealt an Ace: the player says A, the dealer folds, and the dealer might change to call (loses $2). Player dealt a 2: the player says 2 and might change to say Ace; the dealer folds and might change to call.]


[Figure: the dealer’s half of the learning cycle: Recover & Cut Play Histories for Player’s & Dealer’s Strategies, Recover & Distribute Strategies, Separate by Payoffs, Programmable Selection of Recovered Dealer Strategies, and Dealer’s Adaptation (Amplify, Crossover, Mutate).]

Strategies are returned grouped by outcome: -$2, -$1, +$1, +$2.

Select the dealer’s own preferred mix of strategies to be bred.

Breed by using PCR to restore the population size, with a variable mutation rate.

Crossover by pairwise recombining of “change your mind” regions.
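The dealer's adaptation can be imitated by one generation of a genetic algorithm. In this minimal software analogue, a strategy is a tuple of bits, one per "change your mind" region. Selection keeps the strategies from the outcome groups the dealer prefers. PCR amplification becomes resampling with replacement back to the population size. Crossover swaps one region between randomly paired strategies, and each bit flips with a tunable `rate`. All names and the bit encoding are hypothetical.

```python
import random

# Hypothetical encoding: a dealer strategy is a tuple of bits, one per "change your mind"
# region (1 = region present/active). groups maps outcome ($) -> strategies returned.


def select(groups, preferred):
    """Programmable selection: keep strategies from the dealer's preferred outcomes."""
    return [s for payoff in preferred for s in groups.get(payoff, [])]


def amplify(pool, size, rng):
    """Stand-in for PCR: copy survivors at random until the population is restored."""
    return [rng.choice(pool) for _ in range(size)]


def crossover(population, rng):
    """Pair strategies at random and swap one region between the members of each pair."""
    shuffled = population[:]
    rng.shuffle(shuffled)
    children = []
    for a, b in zip(shuffled[0::2], shuffled[1::2]):
        i = rng.randrange(len(a))
        children.append(a[:i] + (b[i],) + a[i + 1:])
        children.append(b[:i] + (a[i],) + b[i + 1:])
    if len(shuffled) % 2:
        children.append(shuffled[-1])  # odd one out passes through unchanged
    return children


def mutate(population, rate, rng):
    """Flip each region independently with probability `rate` (the variable mutation rate)."""
    return [tuple(bit ^ (rng.random() < rate) for bit in s) for s in population]


def breed(groups, preferred, size, rate, rng):
    """Select, amplify, cross over, then mutate: one round of the dealer's adaptation."""
    return mutate(crossover(amplify(select(groups, preferred), size, rng), rng), rate, rng)


if __name__ == "__main__":
    groups = {+2: [(1, 0, 1)], +1: [(0, 0, 1)], -1: [(1, 1, 0)], -2: [(0, 1, 1)]}
    print(breed(groups, preferred=[+2, +1], size=10, rate=0.05, rng=random.Random(3)))
```

Setting `rate` per generation is one way to model the "variable mutation rate". A high rate explores new strategies, and a low rate mostly preserves what selection kept.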


Complexity

Our complexity is linear in the number of nodes in the tree: # nodes in tree = 2 players + betting rounds

At each node, we need a probability distribution giving “level of bet” as a function of “dealt hand”.

For us, this probability distribution is replaced by probabilistic hybridization of the DNA-encoded “dealt hand” to the adapting “change your mind about folding” region of the strategy.

The output (if generated) is the adapting “level of bet” region of the strategy.

[Figure: a strategy node as a hand evaluator (hand′, bet′) and a bet generator (next′); the growing strand hand, bet, next is produced by extension across them.]
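One way to read this substitution is that the probability of a bet level given a hand is never stored as a number. Instead, it is carried by how many strategy molecules of each kind are present, so the random choice of hybridization partner performs the draw. This minimal sketch follows that assumption for the player's node. The counts and the name `bet_distribution` are invented for illustration.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical player population at one node: each molecule pairs a hand-evaluator
# region (the dealt hand it recognizes) with a bet-generator region (level of bet).
population = Counter({
    ("A", "Say A"): 100,
    ("2", "Say A"): 30,
    ("2", "Say 2"): 70,
})


def bet_distribution(pop, hand):
    """P(level of bet | dealt hand), read off as relative abundance of matching molecules."""
    matching = {bet: n for (h, bet), n in pop.items() if h == hand}
    total = sum(matching.values())
    return {bet: Fraction(n, total) for bet, n in matching.items()}


if __name__ == "__main__":
    for hand in ("A", "2"):
        print(hand, bet_distribution(population, hand))
    # With a fixed number of hands and bet levels, each node needs one such table,
    # so the total grows linearly with the number of nodes.
```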

Comparison

Koller and Pfeffer derive equilibrium mixed strategies with complexity polynomial in

# nodes * # possible deals * 2 betting levels

“Representations and Solutions for Game-Theoretic Problems,” Artificial Intelligence (1997)

• Two-player games only
• Do not exploit weaknesses of the opponent
• No dynamics, only equilibrium
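For comparison, the toy Ace/2 game is small enough to solve for its equilibrium mixed strategies by hand. This computes the kind of solution Koller and Pfeffer's method produces, but directly for this one game, not with their algorithm. Assume the zero-sum dealer payoffs read from the game tree and equally likely deals. Let p be the probability that a player dealt a 2 says Ace, and q the probability that the dealer calls. The dealer's expected payoff is then E(p, q) = 1/2 [-(1 - q) - 2q] + 1/2 [(1 - p) + p(-(1 - q) + 2q)] = (3pq - 2p - q)/2. Setting dE/dq = (3p - 1)/2 = 0 and dE/dp = (3q - 2)/2 = 0 gives p = 1/3 and q = 2/3. The sketch checks this with exact arithmetic.

```python
from fractions import Fraction as F


def dealer_expected_payoff(p, q):
    """E(p, q) for the Ace/2 game: deals equally likely, zero-sum dollar payoffs (assumed)."""
    ace = -(1 - q) - 2 * q                      # player holds A and always says Ace
    two = (1 - p) * 1 + p * (-(1 - q) + 2 * q)  # player holds 2: says 2, or bluffs Ace
    return F(1, 2) * (ace + two)


p_star, q_star = F(1, 3), F(2, 3)
value = dealer_expected_payoff(p_star, q_star)

# At equilibrium neither side gains by deviating: E is flat in q at p*, and flat in p at q*.
grid = [F(k, 12) for k in range(13)]
assert all(dealer_expected_payoff(p_star, q) == value for q in grid)
assert all(dealer_expected_payoff(p, q_star) == value for p in grid)
print(value)  # -1/3: under these assumptions the player gains a third of a dollar per deal
```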

3-Player Poker: All Possible Deals

[Figure: tree of all possible deals of the cards A, 2, and 3 to Player 1, Player 2, and Player 3.]

Course of Play

[Figure: betting tree for players P1, P2, P3. Each player may Pass or Bet $a; players facing a bet choose F or C.]

C: Call (add $b); F: Fold
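The course-of-play tree can be generated from a few rules. This minimal sketch rests on our reading of the figure. Players act in the order P1, P2, P3, and each may Pass or Bet $a until someone bets. Once a bet is made, every other player, including those who passed earlier, responds once with Call (add $b) or Fold. The amounts $a and $b are not modeled. Under these rules the generator produces 13 complete betting sequences and 9 Call/Fold decision points, the same as the nine F/C pairs drawn in the figure. `betting_sequences` and `call_fold_nodes` are hypothetical names.

```python
def betting_sequences(n_players=3):
    """Yield every complete betting sequence as a tuple of (player, action) pairs."""

    def respond(history, responders):
        # After a bet, each remaining player answers once, in turn, with Call or Fold.
        if not responders:
            yield history
            return
        for action in ("Call", "Fold"):
            yield from respond(history + ((responders[0], action),), responders[1:])

    def open_round(history, i):
        if i == n_players:  # everyone passed
            yield history
            return
        player = i + 1
        yield from open_round(history + ((player, "Pass"),), i + 1)
        # Player bets $a: later players respond first, then the earlier passers.
        later = list(range(player + 1, n_players + 1))
        earlier = list(range(1, player))
        yield from respond(history + ((player, "Bet"),), later + earlier)

    yield from open_round((), 0)


def call_fold_nodes(sequences):
    """Distinct histories at which some player must choose Call or Fold."""
    return {seq[:k] for seq in sequences
            for k, (_, action) in enumerate(seq) if action in ("Call", "Fold")}


if __name__ == "__main__":
    seqs = list(betting_sequences())
    for seq in seqs:
        print(" ".join(f"P{p}:{a}" for p, a in seq))
    print(len(seqs), "sequences,", len(call_fold_nodes(seqs)), "Call/Fold decision points")
```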
