Adversarial Search - School of Computer Sciencedsuter/Harbin_course/AdversarialSearch.pdf ·...
Artificial Intelligence: Adversarial Search
Instructors: David Suter and Qince Li
Course Delivered @ Harbin Institute of Technology
[Many slides adapted from those created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. Some others from colleagues at Adelaide University.]
Game Playing State-of-the-Art
§ Checkers: 1950: First computer player. 1994: First computer champion: Chinook ended 40-year reign of human champion Marion Tinsley using complete 8-piece endgame. 2007: Checkers solved!
§ Chess: 1997: Deep Blue defeats human champion Garry Kasparov in a six-game match. Deep Blue examined 200M positions per second, used very sophisticated evaluation and undisclosed methods for extending some lines of search up to 40 ply. Current programs are even better, if less historic.
§ Go: Human champions are now starting to be challenged by machines, though the best humans still beat the best machines. In go, b > 300! Classic programs use pattern knowledge bases, but big recent advances use Monte Carlo (randomized) expansion methods.
    § Now “solved” with assistance of Deep Learning
§ Pacman
Types of Games
§ Many different kinds of games!
§ Axes:
    § Deterministic or stochastic?
    § One, two, or more players?
    § Zero sum?
    § Perfect information (can you see the state)?
§ Want algorithms for calculating a strategy (policy) which recommends a move from each state
Deterministic Games
§ Many possible formalizations, one is:
    § States: S (start at s0)
    § Players: P = {1...N} (usually take turns)
    § Actions: A (may depend on player / state)
    § Transition Function: S × A → S
    § Terminal Test: S → {t, f}
    § Terminal Utilities: S × P → R
§ Solution for a player is a policy: S → A
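To make the formalization above concrete, here is a minimal sketch for a toy take-away game (players alternately remove 1 or 2 stones; whoever takes the last stone wins). The game itself and every name in the code (`NimState`, `actions`, `result`, `is_terminal`, `utility`) are illustrative assumptions, not part of the slides.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NimState:
    """A state in S: stones left and whose turn it is (players are 1 and 2)."""
    stones: int
    to_move: int

def actions(state):
    """A: legal actions depend on the state (take 1 or 2 stones, if available)."""
    return [k for k in (1, 2) if k <= state.stones]

def result(state, action):
    """Transition function S x A -> S: remove stones and pass the turn."""
    return NimState(state.stones - action, 3 - state.to_move)

def is_terminal(state):
    """Terminal test S -> {t, f}: the game ends when no stones remain."""
    return state.stones == 0

def utility(state, player):
    """Terminal utilities S x P -> R: +1 to the player who took the last stone."""
    winner = 3 - state.to_move  # the player who moved last
    return 1 if player == winner else -1

s0 = NimState(stones=3, to_move=1)  # start state s0
```

A policy for a player is then any function from states to one of the legal `actions(state)`.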
Zero-Sum Games
§ Zero-Sum Games
    § Agents have opposite utilities (values on outcomes)
    § Lets us think of a single value that one maximizes and the other minimizes
    § Adversarial, pure competition
§ General Games
    § Agents have independent utilities (values on outcomes)
    § Cooperation, indifference, competition, and more are all possible
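Value of a State
§ Value of a state: The best achievable outcome (utility) from that state
§ Non-terminal states: V(s) = max over successors s' of V(s')
§ Terminal states: V(s) is known (given by the game)
[Figure: search tree; values shown are 8 (non-terminal) and 2, 0, 2, 6, 4, 6, … (terminal)]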
Adversarial Search (Minimax)
§ Deterministic, zero-sum games:
    § Tic-tac-toe, chess, checkers
    § One player maximizes result
    § The other minimizes result
§ Minimax search:
    § A state-space search tree
    § Players alternate turns
    § Compute each node’s minimax value: the best achievable utility against a rational (optimal) adversary
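[Figure: two-ply game tree; MAX root with minimax value 5, MIN nodes with values 2 and 5, terminal values 8, 2, 5, 6]
§ Terminal values: part of the game
§ Minimax values: computed recursively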
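As a worked check of the recursive definition, assuming the four terminal values on this slide are grouped left to right as (8, 2) and (5, 6) under the two MIN nodes, the values work out as follows:

```latex
\begin{align*}
V(\text{left MIN})  &= \min(8, 2) = 2 \\
V(\text{right MIN}) &= \min(5, 6) = 5 \\
V(\text{root MAX})  &= \max(2, 5) = 5
\end{align*}
```

This agrees with the node values 2, 5 and 5 that appear on the slide.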
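Minimax Implementation

def min-value(state):
    if the state is a terminal state: return the state’s utility
    initialize v = +∞
    for each successor of state:
        v = min(v, max-value(successor))
    return v

def max-value(state):
    if the state is a terminal state: return the state’s utility
    initialize v = -∞
    for each successor of state:
        v = max(v, min-value(successor))
    return v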
Minimax Implementation (Dispatch)

def value(state):
    if the state is a terminal state: return the state’s utility
    if the next agent is MAX: return max-value(state)
    if the next agent is MIN: return min-value(state)

def min-value(state):
    initialize v = +∞
    for each successor of state:
        v = min(v, value(successor))
    return v

def max-value(state):
    initialize v = -∞
    for each successor of state:
        v = max(v, value(successor))
    return v
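The dispatch pseudocode can be turned into runnable Python along the following lines. This is a minimal sketch under an assumed tree encoding that is not in the slides: a terminal state is a plain number (its utility), and a non-terminal state is a pair `(agent, successors)` with `agent` equal to `"MAX"` or `"MIN"`.

```python
import math

def value(state):
    """Dispatch: terminal states return their utility, others go to MAX or MIN."""
    if isinstance(state, (int, float)):   # terminal state: the number is its utility
        return state
    agent, successors = state
    if agent == "MAX":
        return max_value(successors)
    return min_value(successors)

def max_value(successors):
    """Best value MAX can reach among the successors."""
    v = -math.inf
    for successor in successors:
        v = max(v, value(successor))
    return v

def min_value(successors):
    """Best value MIN can reach among the successors."""
    v = math.inf
    for successor in successors:
        v = min(v, value(successor))
    return v

# Two-ply example: MAX at the root, MIN below, terminal utilities at the leaves.
tree = ("MAX", [("MIN", [8, 2]), ("MIN", [5, 6])])
print(value(tree))
```

Because `value` dispatches on the agent stored in each node, the same code would also handle trees where the players do not strictly alternate.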
Minimax Efficiency
§ How efficient is minimax?
    § Just like (exhaustive) DFS
    § Time: O(b^m)
    § Space: O(bm)
§ Example: For chess, b ≈ 35, m ≈ 100
    § Exact solution is completely infeasible
    § But, do we need to explore the whole tree?
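To get a feel for why the exact solution is infeasible, the short calculation below evaluates b^m for the chess figures quoted above (b ≈ 35, m ≈ 100). The variable names are only illustrative.

```python
import math

b, m = 35, 100                  # branching factor and depth quoted for chess
leaf_count = b ** m             # exact value; Python integers have arbitrary precision
exponent = m * math.log10(b)    # b^m = 10^(m * log10(b))
digits = len(str(leaf_count))   # number of decimal digits in b^m

print(f"b^m is roughly 10^{exponent:.1f}, a number with {digits} decimal digits")
```

The exponent comes out a little above 154, so exhaustive search is out of reach no matter how many positions per second a machine can examine.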
Minimax Properties
Optimal against a perfect player. Otherwise?
[Figure: two-ply tree with MAX at the root over MIN nodes; leaf values 10, 10, 9, 100]
[Demo: min vs exp (L6D2, L6D3)]
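The "Otherwise?" question can be explored with the leaf values from this slide. The sketch below assumes the leaves are grouped as (10, 10) and (9, 100) under the two MIN nodes, and models a non-optimal opponent as one that picks uniformly at random; both the grouping and the random-opponent model are assumptions made for illustration.

```python
def min_value(leaves):
    """Value of a branch if the opponent plays optimally (picks the smallest leaf)."""
    return min(leaves)

def expected_value(leaves):
    """Value of the same branch if the opponent picks a leaf uniformly at random."""
    return sum(leaves) / len(leaves)

branches = {"left": [10, 10], "right": [9, 100]}

# Move chosen by MAX under each model of the opponent.
minimax_choice = max(branches, key=lambda a: min_value(branches[a]))
average_choice = max(branches, key=lambda a: expected_value(branches[a]))

print(minimax_choice, min_value(branches[minimax_choice]))
print(average_choice, expected_value(branches[average_choice]))
```

Under these assumptions minimax plays it safe and guarantees 10, while a player who knows the opponent is random would prefer the other branch, whose average is much higher.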
Resource Limits
§ Problem: In realistic games, cannot search to leaves!
§ Solution: Depth-limited search
    § Instead, search only to a limited depth in the tree
    § Replace terminal utilities with an evaluation function for non-terminal positions
§ Example:
    § Suppose we have 100 seconds, can explore 10K nodes/sec
    § So can check 1M nodes per move
    § α-β reaches about depth 8: decent chess program
§ Guarantee of optimal play is gone
§ More plies make a BIG difference
§ Use iterative deepening for an anytime algorithm
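[Figure: depth-limited tree; MAX root with value 4, MIN nodes with values -2 and 4, evaluation values -1, -2, 4, 9 at the depth limit, unexplored subtrees (? ? ? ?) below]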
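Depth-limited search and iterative deepening can be sketched as follows. The encoding is an assumption for illustration: each node is a pair `(estimate, children)`, where `estimate` stands in for the evaluation function's score of that position. The values -1, -2, 4, 9 are taken from the slide; the interior estimates 1 and 3 are made up, and the unexplored subtrees below the depth limit are simply omitted.

```python
def depth_limited_value(node, depth, maximizing):
    """Minimax that stops at a depth limit and falls back on the evaluation estimate."""
    estimate, children = node
    if depth == 0 or not children:
        return estimate   # the evaluation function replaces the unknown true value
    values = [depth_limited_value(c, depth - 1, not maximizing) for c in children]
    return max(values) if maximizing else min(values)

def iterative_deepening(root, max_depth):
    """Yield (depth, value) for growing depth limits.

    A caller that runs out of time can stop consuming results and keep the
    latest one, which is what makes this an anytime procedure.
    """
    for depth in range(1, max_depth + 1):
        yield depth, depth_limited_value(root, depth, True)

def leaf(estimate):
    return (estimate, [])

tree = (0, [(1, [leaf(-1), leaf(-2)]),
            (3, [leaf(4), leaf(9)])])

for depth, v in iterative_deepening(tree, 2):
    print(depth, v)
```

With a depth limit of 2 the root value matches the 4 shown in the figure; with a limit of 1 the answer depends entirely on the (made-up) interior estimates, which is why more plies can make a big difference.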
Depth Matters
§ Evaluation functions are always imperfect
§ The deeper in the tree the evaluation function is buried, the less the quality of the evaluation function matters
§ An important example of the tradeoff between complexity of features and complexity of computation
[Demo: depth limited (L6D4, L6D5)]
Evaluation Functions
§ Evaluation functions score non-terminals in depth-limited search
§ Ideal function: returns the actual minimax value of the position
§ In practice: typically weighted linear sum of features:
    Eval(s) = w1 f1(s) + w2 f2(s) + … + wn fn(s)
    § e.g. f1(s) = (num white queens – num black queens), etc.
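A weighted linear sum of features can be sketched in a few lines. The state encoding (a dictionary of piece counts), the second feature, and the weights 9 and 1 are assumptions chosen for illustration; only the queen-difference feature comes from the slide.

```python
def queen_difference(state):
    """f1(s): number of white queens minus number of black queens."""
    return state["white_queens"] - state["black_queens"]

def pawn_difference(state):
    """f2(s): a second, hypothetical feature of the same form for pawns."""
    return state["white_pawns"] - state["black_pawns"]

FEATURES = [queen_difference, pawn_difference]
WEIGHTS = [9.0, 1.0]   # illustrative weights, not tuned values

def evaluate(state, features=FEATURES, weights=WEIGHTS):
    """Eval(s) = w1*f1(s) + w2*f2(s) + ... + wn*fn(s)."""
    return sum(w * f(state) for w, f in zip(weights, features))

position = {"white_queens": 1, "black_queens": 0,
            "white_pawns": 5, "black_pawns": 7}
print(evaluate(position))
```

Keeping features and weights in separate lists makes it easy to add a feature or adjust a weight without touching the search code.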
Evaluation for Pacman
[Demo: thrashing d=2, thrashing d=2 (fixed evaluation function), smart ghosts coordinate (L6D6,7,8,10)]
Why Pacman Starves
§ A danger of replanning agents!
    § He knows his score will go up by eating the dot now (west, east)
    § He knows his score will go up just as much by eating the dot later (east, west)
    § There are no point-scoring opportunities after eating the dot (within the horizon, two here)
    § Therefore, waiting seems just as good as eating: he may go east, then back west in the next round of replanning!
Alpha-Beta Pruning
§ General configuration (MIN version)
    § We’re computing the MIN-VALUE at some node n
    § We’re looping over n’s children
    § n’s estimate of the children’s min is dropping
    § Who cares about n’s value? MAX
    § Let a be the best value that MAX can get at any choice point along the current path from the root
    § If n becomes worse than a, MAX will avoid it, so we can stop considering n’s other children (it’s already bad enough that it won’t be played)
§ MAX version is symmetric
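[Figure: path from the root through alternating MAX and MIN levels; a marks the value at a MAX choice point, n marks a MIN node further down the path]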
Alpha-Beta Implementation

α: MAX’s best option on path to root
β: MIN’s best option on path to root

def min-value(state, α, β):
    initialize v = +∞
    for each successor of state:
        v = min(v, value(successor, α, β))
        if v ≤ α return v
        β = min(β, v)
    return v

def max-value(state, α, β):
    initialize v = -∞
    for each successor of state:
        v = max(v, value(successor, α, β))
        if v ≥ β return v
        α = max(α, v)
    return v
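A runnable version of the α-β pseudocode might look like the sketch below. The tree encoding is an assumption: terminal states are plain numbers and non-terminal states are `(agent, successors)` pairs. The `visited` list and the example tree are additions for illustration, used only to show which leaves are actually examined.

```python
import math

visited = []   # leaves actually evaluated, in order (for illustration only)

def value(state, alpha, beta):
    """Dispatch on terminal / MAX / MIN, passing the alpha-beta window down."""
    if isinstance(state, (int, float)):
        visited.append(state)
        return state
    agent, successors = state
    if agent == "MAX":
        return max_value(successors, alpha, beta)
    return min_value(successors, alpha, beta)

def max_value(successors, alpha, beta):
    v = -math.inf
    for successor in successors:
        v = max(v, value(successor, alpha, beta))
        if v >= beta:          # MIN above already has an option at least this good
            return v
        alpha = max(alpha, v)
    return v

def min_value(successors, alpha, beta):
    v = math.inf
    for successor in successors:
        v = min(v, value(successor, alpha, beta))
        if v <= alpha:         # MAX above already has an option at least this good
            return v
        beta = min(beta, v)
    return v

tree = ("MAX", [("MIN", [3, 12, 8]),
                ("MIN", [2, 4, 6]),
                ("MIN", [14, 5, 2])])
root = value(tree, -math.inf, math.inf)
print(root, visited)
```

On this example the leaves 4 and 6 are never examined: once the middle MIN node is known to be worth at most 2, MAX already prefers the first branch, which is worth 3.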
Alpha-Beta Pruning Properties
§ This pruning has no effect on minimax value computed for the root!
§ Good child ordering improves effectiveness of pruning
§ With “perfect ordering”:
    § Time complexity drops to O(b^(m/2))
    § Doubles solvable depth!
    § Full search of, e.g. chess, is still hopeless…
§ This is a simple example of metareasoning (computing about what to compute)
[Figure: two-ply tree with MAX at the root over MIN nodes; leaf values 10, 10, 0]
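The effect of child ordering can be checked by counting how many leaves α-β examines on the same game tree under two orderings. This is a small sketch with an assumed encoding (nested lists, numbers as leaves, MAX at the root with strictly alternating levels); the example trees and the helper name `leaves_examined` are hypothetical.

```python
import math

def alphabeta(node, alpha, beta, maximizing, counter):
    """Alpha-beta on nested lists; counter[0] counts the leaves evaluated."""
    if isinstance(node, (int, float)):
        counter[0] += 1
        return node
    v = -math.inf if maximizing else math.inf
    for child in node:
        child_v = alphabeta(child, alpha, beta, not maximizing, counter)
        if maximizing:
            v = max(v, child_v)
            if v >= beta:
                return v
            alpha = max(alpha, v)
        else:
            v = min(v, child_v)
            if v <= alpha:
                return v
            beta = min(beta, v)
    return v

def leaves_examined(tree):
    """Return (root value, number of leaves evaluated) for a MAX-rooted tree."""
    counter = [0]
    root = alphabeta(tree, -math.inf, math.inf, True, counter)
    return root, counter[0]

# The same game, with children listed in a favourable and an unfavourable order.
good_order = [[3, 12, 8], [2, 4, 6], [2, 5, 14]]
bad_order = [[6, 4, 2], [14, 5, 2], [12, 8, 3]]

print(leaves_examined(good_order))
print(leaves_examined(bad_order))
```

Both orderings give the same root value, as the first bullet on this slide promises, but the favourable ordering examines fewer leaves because the best MAX option and the smallest MIN replies are found first.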