Poker AI: Equilibrium, Online Resolving, Deep Learning and...

Poker AI: Equilibrium, Online Resolving, Deep Learning and Reinforcement Learning. Nikolai Yakovenko, NVidia ADLR Group, Santa Clara CA. Columbia University Deep Learning Seminar, April 2017.

Transcript of Poker AI: Equilibrium, Online Resolving, Deep Learning and...

Page 1: Poker AI: Equilibrium, Online Resolving, Deep Learning and ...llcao.net/cu-deeplearning17/lecture/lecture11_Nikolai.pdf · Poker AI: Equilibrium, Online Resolving, Deep Learning and

Poker AI: Equilibrium, Online Resolving, Deep Learning and Reinforcement Learning

Nikolai Yakovenko, NVidia ADLR Group, Santa Clara CA

Columbia University Deep Learning Seminar, April 2017

Page 2:

Poker is a Turn-Based Video Game

Call

Raise

Fold

Page 3:

Many Different Poker Games

• Single Draw Video Poker
• 2-7 Lowball Triple Draw (make low hand from 5 cards with multiple draws)
• Limit Hold’em
• No Limit Hold’em
  – World Series of Poker [Humans]
  – Annual Computer Poker Competition [Robots]

[Figure labels: Hold, Hold, Hold; Private cards; Public cards]

Page 4:

One Hand of Texas Hold’em

[Figure: Hero vs. Oppn. Private cards, Flop (public), Turn, River, each followed by a Betting Round, then Showdown. Best 5-Card Hand Wins. Hand labels shown: Flush, Two Pairs.]

Page 5:

CFR: Equilibrium Balancing

• Abstract Hold’em game to smaller state-space
• Cycle over every game state
  – Update “regrets”
  – Adjust strategy toward least “regret”
• Converges to Nash equilibrium in the simplified game.
  – Close enough to an equilibrium in the full game
• Winners of every Annual Computer Poker Competition (ACPC) since 2007.
  – Limit Hold’em: within 1% of unexploitable (2015)*
  – No Limit Hold’em: defeated top professional players (2017)**

* pre-computed strategy
** in-game simulation on super-computer cluster

Page 6:

CFR: Counterfactual Regret Minimization

Player 1: Random strategy. Player 2: Random strategy.

• Regret: folding good hands. Action: bet good hands more.
• Regret: not folding bad hands. Action: fold bad hands.
• Regret: not bluffing bad hands. Action: bet when can’t win.
• Regret: not calling bluffs. Action: call with some % vs bluffs.

(equilibrium)
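The "update regrets, then adjust strategy" loop on this slide is usually implemented with regret matching. The following is a minimal sketch, not the speaker's code: the function names and the three-action example are assumptions made for illustration.

```python
def regret_matching(cumulative_regrets):
    """Turn cumulative regrets into action probabilities.

    Actions with positive regret are played in proportion to that regret;
    if no action has positive regret, fall back to a uniform strategy.
    """
    positive = [max(r, 0.0) for r in cumulative_regrets]
    total = sum(positive)
    if total <= 0.0:
        n = len(cumulative_regrets)
        return [1.0 / n] * n
    return [p / total for p in positive]


def update_regrets(cumulative_regrets, action_values, strategy):
    """Add this round's regret: value of each action minus the value actually expected."""
    expected = sum(p * v for p, v in zip(strategy, action_values))
    return [r + (v - expected) for r, v in zip(cumulative_regrets, action_values)]


# Hypothetical spot with three actions (fold, call, raise) and made-up values.
regrets = [0.0, 0.0, 0.0]
strategy = regret_matching(regrets)            # uniform at the start
regrets = update_regrets(regrets, [0.0, -1.0, 2.0], strategy)
strategy = regret_matching(regrets)            # now shifted toward raising
```

In full CFR the same update is applied at every decision point, with regrets weighted by how likely the opponent is to reach that point.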

Page 7:

CFR: Pre-Compute Entire Strategy

Each point encodes a game state*:
• Private cards for player
• Public cards
• Bets made so far

* Opponent cannot distinguish between some states

Page 8:

Entangled Game States: Kuhn (3 Card) Poker Example

Heads-up Limit Hold’em Poker is Solved, by Bowling, et al [Science, 2015]
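To make the Kuhn poker example concrete, here is a minimal sketch of vanilla CFR on Kuhn poker (three cards, one ante each, at most one bet). It is an illustration under stated assumptions, not the solver from the cited paper: the class and function names are hypothetical, cards are encoded as 1 = Jack, 2 = Queen, 3 = King, and every deal is enumerated on each iteration. An information set is keyed by the acting player's own card plus the betting history, which is how states that differ only in the opponent's card become indistinguishable.

```python
from itertools import permutations

ACTIONS = "pb"  # p = pass (check or fold), b = bet (bet or call)


class InfoSet:
    """Regret and strategy accumulators for one information set."""

    def __init__(self):
        self.regret_sum = [0.0, 0.0]
        self.strategy_sum = [0.0, 0.0]

    def current_strategy(self):
        positive = [max(r, 0.0) for r in self.regret_sum]
        total = sum(positive)
        return [p / total for p in positive] if total > 0 else [0.5, 0.5]

    def average_strategy(self):
        total = sum(self.strategy_sum)
        return [s / total for s in self.strategy_sum] if total > 0 else [0.5, 0.5]


def terminal_payoff(cards, history):
    """Payoff to the player who would act next, or None if play continues."""
    if len(history) < 2:
        return None
    player = len(history) % 2
    wins_showdown = cards[player] > cards[1 - player]
    if history == "pp":            # check, check: showdown for the antes
        return 1.0 if wins_showdown else -1.0
    if history.endswith("bb"):     # bet, call: showdown for ante plus bet
        return 2.0 if wins_showdown else -2.0
    if history.endswith("bp"):     # bet, fold: the bettor wins the ante
        return 1.0
    return None                    # "pb": first player still has to respond


def cfr(infosets, cards, history, reach):
    """Return the value for the player to act and update regrets on the way."""
    payoff = terminal_payoff(cards, history)
    if payoff is not None:
        return payoff
    player = len(history) % 2
    node = infosets.setdefault(str(cards[player]) + history, InfoSet())
    strategy = node.current_strategy()
    values = [0.0, 0.0]
    for a, action in enumerate(ACTIONS):
        next_reach = list(reach)
        next_reach[player] *= strategy[a]
        # The child value belongs to the opponent, so negate it (zero-sum).
        values[a] = -cfr(infosets, cards, history + action, next_reach)
    node_value = sum(p * v for p, v in zip(strategy, values))
    for a in range(2):
        # Counterfactual regret is weighted by the opponent's reach probability.
        node.regret_sum[a] += reach[1 - player] * (values[a] - node_value)
        node.strategy_sum[a] += reach[player] * strategy[a]
    return node_value


def train(iterations):
    """Run CFR over all six deals per iteration; return (infosets, average value)."""
    infosets = {}
    total = 0.0
    deals = list(permutations([1, 2, 3], 2))
    for _ in range(iterations):
        for cards in deals:
            total += cfr(infosets, cards, "", [1.0, 1.0])
    return infosets, total / (iterations * len(deals))
```

Calling `train(10000)` and inspecting `infosets["3b"].average_strategy()` (King facing a bet) should show the bot calling almost always, and the average value for the first player should approach the known game value of -1/18.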

Page 9:

Heads-up Limit Hold’em is Solved

Heads-up Limit Hold’em Poker is Solved, by Bowling, et al [Science, 2015]

Page 10:

Within 0.1% of Unexploitable by Perfect Response

Heads-up Limit Hold’em Poker is Solved, by Bowling, et al [Science, 2015]

Page 11:

Surprising: Equilibrium Strategy is (almost) Binary

Green: raise; Red: fold; Blue: call

Page 12:

Does this work for No Limit Hold’em?

Page 13:

Not Quite: NLH is much bigger

Limit Hold’em
• 2 private cards
• 5 public cards
• 4 rounds of betting
• 3 betting actions
  – Check/Call
  – Bet/Raise
  – Fold
• 10^14 game states

No Limit Hold’em (200 BB)
• 2 private cards
• 5 public cards
• 4 rounds of betting
• Up to 200 betting actions
  – Check/Call
  – Bet/Raise any size
  – Fold
• 10^170 game states
  – Go has 10^160 game states

Page 14:

Bet Sizes: Huge Branching Factor

DeepStack: … by Moravcik, et al [Science 2017]

Page 15:

Going Off-Tree

Closest or average known state: errors accumulate

Page 16:

Continuous Re-Solving

(CMU’s Libratus also employs continuous re-solving)

Range: probability vector over the 1326 unique two-card private hands
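The 1326 figure is the number of two-card hands from a 52-card deck, C(52, 2). A small sketch of what such a range could look like in code follows; the card encoding and the `uniform_range` helper are assumptions for illustration, not DeepStack's or Libratus's representation.

```python
from itertools import combinations

RANKS = "23456789TJQKA"
SUITS = "cdhs"
DECK = [rank + suit for rank in RANKS for suit in SUITS]


def uniform_range(public_cards=()):
    """Uniform probability over every two-card hand not blocked by public cards."""
    blocked = set(public_cards)
    hands = [hand for hand in combinations(DECK, 2) if not blocked & set(hand)]
    probability = 1.0 / len(hands)
    return {hand: probability for hand in hands}


preflop_range = uniform_range()                  # 1326 hands
flop_range = uniform_range(["As", "9s", "6s"])   # hands using a board card drop out
```

A re-solving agent would start from a range like this and reweight it after each observed action, rather than keeping it uniform.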

Page 17:

Re-solving Early: Solve Entire Game (Too Big)

Estimate values at depth X with a deep neural network (U of Alberta DeepStack)

Page 18:

DeepStack: Estimating CFR Values

Page 19:

Good enough for (super)human performance?

Page 20:

Practical Results: Libratus (CMU) and DeepStack (U-Alberta)

Libratus
• Design:
  – No card abstraction
  – CFR+ for preflop and flop
  – “Endgame solving” on turn and river
• Speed:
  – Instant preflop & flop
  – ~30 seconds turn and river
  – (200 node super-computer)
• Results:
  – $14.1/hand vs top pros
  – 120,000 hands over 3 weeks

DeepStack
• Design:
  – No card abstraction
  – “Continuous resolving” on all streets
  – Depth-limited solving w/ DNN
• Speed:
  – ~5-10s preflop and flop
  – ~1-5s turn and river
  – Laptop with Torch and GPU
• Results:
  – $48.6/hand vs non-top pros
  – Upcoming freezeout matches

Page 21:

Libratus Challenge (January 2017)

The humans lost to Libratus; the AI won by 4+σ. Did they also get tired?

Page 22:

Previous ACPC Agents Were Highly Exploitable (And Maybe Still Are)

Results: the agents lose 1250 mbb/hand = $125/hand, which is more than they would lose by folding every hand. LBR agent = local best response, using 48 bet sizes.
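The comparison with folding every hand can be checked with a little arithmetic. The sketch below assumes $50/$100 blinds, which is what the slide's conversion (1250 mbb = $125) implies; the function names are hypothetical. Heads-up, a player who always folds gives up the small blind on half the hands and the big blind on the other half.

```python
BIG_BLIND_DOLLARS = 100    # assumption: implied by 1250 mbb/hand = $125/hand
SMALL_BLIND_DOLLARS = 50   # assumption: half the big blind


def mbb_to_dollars(mbb_per_hand, big_blind=BIG_BLIND_DOLLARS):
    """Convert milli-big-blinds per hand into dollars per hand."""
    return mbb_per_hand / 1000 * big_blind


def always_fold_loss_mbb(small_blind=SMALL_BLIND_DOLLARS, big_blind=BIG_BLIND_DOLLARS):
    """Average loss per hand, in mbb, for a heads-up player who folds every hand."""
    average_blind = (small_blind + big_blind) / 2
    return average_blind / big_blind * 1000


lbr_loss_dollars = mbb_to_dollars(1250)                     # loss to the LBR agent
fold_loss_dollars = mbb_to_dollars(always_fold_loss_mbb())  # loss from always folding
```

Under these assumptions always folding costs 750 mbb/hand, so a 1250 mbb/hand loss is indeed worse than never playing a hand.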

Page 23:

Conclusions & Speculations (04/2017)

• Computers can match top humans at heads-up No Limit Hold’em poker.
• The winning approach is continuous re-solving (similar to chess or Go, but with hand ranges)
• Great tools to measure exploitability and the luck factor (see DeepStack paper for details)
• Can this approach generalize to 3-6 player games?
• Can the online solving become much faster?
• Will this work always require extensive domain expertise?

Page 24:

Can we train a strong poker player with a much smaller strategy?

Page 25:

Poker-CNN: Cards as 2D Tensors

Private cards, Flop (public), Turn, River, Showdown

[AhQs]+[As9s6s]

  x 23456789TJQKA
  c .............
  d .............
  h ............1
  s ....1..1..1.1

Flush draw

Pair (of Aces)

[AhQs]

  x 23456789TJQKA
  c .............
  d .............
  h ............1
  s ..........1..

Flush

[AhQsAs9s6s9c2s]

  x 23456789TJQKA
  c .......1.....
  d .............
  h ............1
  s 1...1..1..1.1

Flush!
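The grids on this slide are 4 x 13 suit-by-rank matrices with a 1 for each card present. A minimal sketch of that encoding is below; the helper names are assumptions, and the actual Poker-CNN code may pad or stack these matrices differently.

```python
RANKS = "23456789TJQKA"
SUITS = "cdhs"


def parse_cards(text):
    """Split a string like 'AhQsAs' into ['Ah', 'Qs', 'As']."""
    return [text[i:i + 2] for i in range(0, len(text), 2)]


def cards_to_matrix(cards):
    """Encode cards as a 4 x 13 (suit x rank) matrix of 0s and 1s."""
    matrix = [[0] * len(RANKS) for _ in SUITS]
    for card in cards:
        rank, suit = card[0], card[1]
        matrix[SUITS.index(suit)][RANKS.index(rank)] = 1
    return matrix


def render(matrix):
    """Print-friendly view in the same layout as the slide."""
    lines = ["x " + RANKS]
    for suit, row in zip(SUITS, matrix):
        lines.append(suit + " " + "".join("1" if value else "." for value in row))
    return "\n".join(lines)


seven_cards = cards_to_matrix(parse_cards("AhQsAs9s6s9c2s"))
has_flush = any(sum(row) >= 5 for row in seven_cards)  # five or more cards in one suit
print(render(seven_cards))
```

In this layout a flush shows up as a dense row and a pair as two 1s in the same column, which is the kind of spatial pattern a convolutional filter can pick up.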

Page 26:

Convnet: Predict Anything You Want

[Figure: network diagram with labels: input, convolutions, max pool, conv, pool, dense layer, 50% dropout, output layer]

Inputs (31 x 17 x 17 3D tensor):
• Private cards
• Public cards
• Pot size
• Position
• Previous bets history

Predict action value:
• Bet, call, fold values
• Action probabilities
• Value by bet size

• single-trial $ win/loss
• no gradient for bets not made
• no Monte Carlo tree search required

Surrogate tasks:
• All-in odds
• Opponent hand distribution
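The "no gradient for bets not made" point can be read as a masked regression loss: the network outputs a value for every action, but only the action actually taken has an observed win/loss to train against. The sketch below shows that idea in plain Python under that reading; the function name, the squared-error choice, and the numbers are assumptions, not the Poker-CNN training code.

```python
def masked_squared_error(predicted, observed, taken):
    """Loss and gradient for per-action value predictions.

    predicted: model's value estimate for each action
    observed:  single-trial win/loss for each action (only meaningful where taken)
    taken:     1 for the action actually played, 0 otherwise
    Actions that were not taken contribute no loss and no gradient.
    """
    loss = 0.0
    gradient = [0.0] * len(predicted)
    for i, (p, y, mask) in enumerate(zip(predicted, observed, taken)):
        if mask:
            loss += (p - y) ** 2
            gradient[i] = 2.0 * (p - y)
    return loss, gradient


# Hypothetical example: outputs for (fold, call, raise); the player called and won 2.0.
loss, gradient = masked_squared_error([1.0, 0.5, -0.2], [0.0, 2.0, 0.0], [0, 1, 0])
```

Only the "call" output is pushed toward the observed result; the fold and raise outputs are left untouched for this sample.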

Page 27:

[Figure: example hand with model outputs. Labels: $20,000, $20,000; $100, $50; Big Blind, Small Blind; +$265; +$2616 (Call). Action probabilities shown: Raise 84.2%, Call 15.8%, Fold 0.0%; Raise 81.1%, Call 18.9%, Fold 0.0%; Raise 0.0%, Call 94.3%, Fold 5.7%. Bar charts: “Bet Size” (20% pot, 50% pot, 1x pot, 1.5x pot, 3x pot, 10x pot) and “Odds vs Opponent” (0%, 33%, 50%, 66%, 100%).]

Page 28:

[Figure: example hand, continued. Labels: $17,285, $17,285; $5,430; (Check), (Check). Model outputs shown: Bet 30.0%, Check 70.0%, with Value vs random 91.3%, Value vs oppon 85.6%; Bet 25.9%, Check 74.1%, with Value vs random 52.9%, Value vs oppon 32.6%.]

Page 29:

[Figure: example hand, continued. Labels: $17,285, $17,285; $5,430; +$3,967; (Check). Model outputs shown: Bet 59.0%, Check 41.0%; Bet 86.6%, Check 13.4%. Bar charts: “Bet Size” (20% pot, 50% pot, 1x pot, 1.5x pot, 3x pot, 10x pot) and “Odds vs Opponent” (0%, 33%, 50%, 66%, 100%).]

Page 30:

[Figure: example hand, continued. Labels: $13,318; $9,397; $17,285; +$3,967; +$17,285 (all in). Model outputs shown: Raise 26.6%, Call 73.4%, Fold 0.0%, with Value vs random 91.3%, Value vs oppon 68.0%; Call 32.4%, Fold 67.6%, with Value vs random 84.7%, Value vs oppon 30.6%.]

($13,000 all-in call, to win $26,000): 33.3% odds = break-even
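The break-even figure follows from pot odds: calling $13,000 to win $26,000 needs to succeed one time in three. A short worked sketch, with hypothetical function names:

```python
def break_even_equity(call_amount, amount_to_win):
    """Minimum win probability at which calling has zero expected value."""
    return call_amount / (call_amount + amount_to_win)


def call_ev(equity, call_amount, amount_to_win):
    """Expected value of calling: win the pot with probability equity, else lose the call."""
    return equity * amount_to_win - (1.0 - equity) * call_amount


threshold = break_even_equity(13000, 26000)   # one third, i.e. about 33.3%
```

Any estimated chance of winning below this threshold makes the call a losing play on average, which is why a model that folds most of the time here is behaving sensibly.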

Page 31:

Takeaways

• Pretty good pattern matching, with enough data
  – Naïve network design
  – and foolish use of pooling
  – Training ≈ 4 million previous ACPC hands
• Struggles with rare cases
  – Under-weights outliers
  – Out of sample situations
• Struggles in big pots
  – Large effect on average results
  – Sparse data
• No attempt to avoid exploitability

Page 32:

Future Work

• More games, more contexts
  – 3-6 player No Limit Hold’em
  – Pot-Limit Omaha (4 private cards instead of 2)
  – Tournament Hold’em
• Learn the CFR internal parameters?
  – Predict opponent hand ranges directly
• Personalize model against an opponent
• Tune hyper-parameters…
  – 100,000 hands per experiment
  – Find ideal network arrangement
  – Exploit flexibility of deep neural nets

Page 33:

The Dream

Page 34:

Can a DNN learn to imitate strong players in 2+ player games?

Data: high-quality simulation, equilibrium solving, or player logs

Page 35:

Reinforcement Learning

Page 36:

Deep Q-Learning for Atari Games

Human-level control through deep reinforcement learning, by DeepMind (Nature 2015)

Page 37:

OpenAI Gym: Train & Share RL Agents

Support for Atari games, classic RL problems, robot soccer, Doom [no poker]

Page 38:

Reinforcement Learning for Games

https://www.slideshare.net/onghaoyi/distributed-deep-qlearning

Page 39:

Faulty Reward Function?

https://blog.openai.com/faulty-reward-functions/

Page 40:

Can Poker Be Solved With RL?

• Yes, with modifications.
  – “Standard” RL is greedy and requires the Markov property
  – Poker decisions can’t be optimized locally
  – Some game-custom local simulation is required
• Heinrich & Silver (DeepMind 2016) match state of the art on Limit Hold’em with modified deep RL
  – https://arxiv.org/pdf/1603.01121.pdf
  – Deep RL also gives useful “similar context” embedding for poker situations
  – (As should our Poker-CNN)

Page 41:

Deep RL High Watermarks

• Atari games
  – Results keep improving
  – Although OpenAI claims equal/better results on the simpler Atari games with evolutionary algorithms
• AlphaGo – super-human achievement
• RL saves 40% on data center cooling – Google

Page 42:

Can I Apply Deep RL to My Problem?

Pros – Go for it!
• Clear game-like reward function
• Easy to simulate the environment
• Markov property applies [state is not path-dependent]
• Best path can be deterministic
• Rewards are observable in relatively short sequences
• Hard to compute exact problem gradients, even if solutions are easy to compare
• Access to massive machine resources

Cons – try something else.
• No clear rewards (self driving car)
• Training data, not training environment
• Limited computational resources
• Possible to compute exact gradients on the problem (MNIST, video classification, etc)
• Not likely that random actions will ever get a positive reward (Deep Q RL scored 0.0 on Montezuma’s Revenge for a long time)

Page 43:

Questions for Future Thought

• What are some hard problems that could be solved with Deep RL, given huge resources?
  – Example: component arrangement for microchip manufacture
• Given access to Libratus or DeepStack engine, could you design a deep net to imitate it, or to beat it?
  – With or without online simulation?
• From a small amount of expert training data, can you train a general agent for 2P games like Street Fighter?
  – Could you bootstrap it like AlphaGo?
  – Could you train it so humans can’t tell that it’s a bot?
• What problems would you train with access to a huge GPU cluster?

Page 44:

Thank you! Questions?

Page 45:

References & Further Reading

• DeepStack
  – https://www.deepstack.ai/
  – Watch weekly human vs AI matches on Twitch: https://www.twitch.tv/deepstackai
  – Open source (Torch) code for No Limit Leduc Hold’em (simplified NLH): https://github.com/lifrordi/DeepStack-Leduc
• Libratus
  – http://www.cs.cmu.edu/~sandholm/
  – My writeup on #BrainsVsAI match: https://medium.com/@Moscow25/cmus-libratus-bluffs-its-way-to-victory-in-brainsvsai-poker-match-99abd31b9cd4
• Poker-CNN
  – Our paper from AAAI 2016 on ArXiv: http://arxiv.org/abs/1509.06731
  – Code & models (admittedly needs cleanup): https://github.com/moscow25/deep_draw
• Annual Computer Poker Competition: http://www.computerpokercompetition.org/
• Deep Reinforcement Learning
  – DeepMind: https://deepmind.com/research/dqn/
  – OpenAI: https://openai.com/
• NVidia Applied Deep Learning Research Group
  – Blog/interview: https://blogs.nvidia.com/blog/2016/12/07/ai-podcast-deep-learning-going-next/
  – Open requisition: https://sv.ai/nvidia-senior-research-scientist-deep-learning/