An Introduction to Contextual Bandits Algorithm
Ph.D. Candidate: Qing Wang, 2016, Florida International University


Page 1

An Introduction to Contextual Bandits Algorithm

Ph.D. Candidate: Qing Wang, 2016, Florida International University

Page 2

Outline

§ Introduction
§ Motivation
§ Context-free Bandit Algorithms
§ Contextual Bandit Algorithms
§ Our Work
  § Ensemble Contextual Bandits for Personalized Recommendation
  § Personalized Recommendation via Parameter-Free Contextual Bandits
§ Future Work
§ Q&A

Page 3

What is Personalized Recommendation?

§ Personalized recommendation helps users find interesting items based on each user's individual interests.
§ Ultimate goal: maximize user engagement.

Page 4

What is the Cold Start Problem?

§ We do not have enough observations for new items or new users.
§ How do we predict users' preferences if we do not have data?
§ Many practical issues with offline data:
  § Historical user log data is biased.
  § User interests may change over time.

Page 5

Approach: Multi-armed Bandit Algorithm

§ A gambler walks into a casino.
§ A row of slot machines provides random rewards.

Objective: maximize the sum of rewards (money)!

Page 6

Example: News Personalization

§ Recommend news based on users' interests.
§ Goal: maximize the user's click-through rate (CTR).

[1] Li, Lihong, et al. "A contextual-bandit approach to personalized news article recommendation." Proceedings of the 19th International Conference on World Wide Web. ACM, 2010.

Page 7

Example: News Personalization

§ There are a bunch of articles in the news pool.
§ Users arrive sequentially, ready to be served.

(Figure: the pool of news articles.)

[1] Zhou Li, "News personalization with multi-armed bandits".

Page 8

Example: News Personalization

§ At each time step, we want to select one article for the user.

(Figure: the MAB picks article 1 from the news articles and asks the user: "Like it?")

Page 9

Example: News Personalization

§ Goal: maximize CTR.

(Figure: the MAB recommends article 1 from the news articles; "Like it?" - the user answers "Not really!")

Page 10

Example: News Personalization

§ Update the model with the user's feedback.

(Figure: the user's feedback on article 1 ("Not really!") is sent back to the MAB.)

Page 11

Example: News Personalization

§ Update the model once the feedback is given.

(Figure: the MAB now recommends article 2; "Like it?" - the user answers "Yeah!")

Page 12

Example: News Personalization

§ Update the model once the feedback is given.

(Figure: the user's feedback on article 2 ("Yeah!") is sent back to the MAB.)

How about articles 3, 4, 5, ...?

Page 13

Multi-armed Bandit Definition

§ The MAB problem is a classical paradigm in machine learning in which an online algorithm chooses from a set of strategies in a sequence of trials so as to maximize the total payoff of the chosen strategies. [1]

[1] http://research.microsoft.com/en-us/projects/bandits/

Page 14

Application: Clinical Trial

§ Two treatments with unknown effectiveness.

[1] Einstein, A., B. Podolsky, and N. Rosen, 1935, "Can quantum-mechanical description of physical reality be considered complete?", Phys. Rev. 47, 777-780.

Page 15

Web Advertising

§ Where to place the ad?

[1] Tang L, Rosales R, Singh A, et al. "Automatic ad format selection via contextual bandits." Proceedings of the 22nd ACM International Conference on Information & Knowledge Management. ACM, 2013: 1587-1594.

Page 16

Playing Golf with Multiple Balls

[1] Dumitriu, Ioana, Prasad Tetali, and Peter Winkler. "On playing golf with two balls." SIAM Journal on Discrete Mathematics 16.4 (2003): 604-615.

Page 17

Multi-Agent System

§ K agents tracking N (N > K) targets.

[1] Ny, Jerome Le, Munther Dahleh, and Eric Feron. "Multi-agent task assignment in the bandit framework." Decision and Control, 2006. 45th IEEE Conference on. IEEE, 2006.

Page 18

Some Jargon Terms [1]

§ Arm: one idea/strategy
§ Bandit: a group of ideas (strategies)
§ Pull/Play/Trial: one chance to try your strategy
§ Reward: the unit of success we measure after each pull
§ Regret: the performance metric

[1] Bandit Algorithms for Website Optimization: Developing, Deploying, and Debugging. John Myles White, O'Reilly Media, 2012.

Page 19

K-Armed Bandit

§ Each arm $a$:
  § wins (reward = 1) with fixed (unknown) probability $\mu_a$;
  § loses (reward = 0) with fixed (unknown) probability $1 - \mu_a$.
§ All draws are independent given $\mu_1, \dots, \mu_k$.
§ How do we pull arms to maximize the total reward? (We must estimate each arm's probability of winning, $\mu_a$.)

[1] CS246: Mining Massive Data Sets, 2015, Stanford University

Page 20

Model of the K-Armed Bandit

§ Set of $k$ choices (arms).
§ Each choice $a$ is associated with an unknown probability distribution $P_a$ supported on $[0, 1]$.
§ We play the game for $T$ rounds.
§ In each round $t$:
  § we pick some arm $j$;
  § we obtain a random sample $X_t$ from $P_j$ (the reward is independent of previous draws).
§ Goal: maximize $\sum_{t=1}^{T} X_t$ (without knowing the $\mu_a$).
§ However, every time we pull some arm $a$ we get to learn a bit more about $\mu_a$.

[1] CS246: Mining Massive Data Sets, 2015, Stanford University

Page 21

Performance Metric: Regret

§ Let $\mu_a$ be the mean of $P_a$.
§ Payoff/reward of the best arm: $\mu^* = \max\{\mu_a : a = 1, \dots, k\}$.
§ Let $i_1, \dots, i_T$ be the sequence of arms pulled.
§ Instantaneous regret at time $t$: $r_t = \mu^* - \mu_{i_t}$.
§ Total regret: $R_T = \sum_{t=1}^{T} r_t$.
§ Typical goal: an arm-allocation strategy that guarantees $R_T / T \to 0$ as $T \to \infty$ (a small worked example follows).

[1] CS246: Mining Massive Data Sets, 2015, Stanford University
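
As a quick numerical illustration of these definitions (the numbers are made up for the example, not taken from the slides):

% Toy example: k = 2 arms with \mu_1 = 0.5 and \mu_2 = 0.7, so \mu^* = 0.7.
% Suppose that over T = 200 rounds we pull arm 1 in 100 rounds and arm 2 in 100 rounds.
% Each pull of arm 1 contributes instantaneous regret r_t = \mu^* - \mu_1 = 0.2,
% and each pull of arm 2 contributes r_t = \mu^* - \mu_2 = 0. Hence
R_T = \sum_{t=1}^{T} r_t = 100 \cdot 0.2 + 100 \cdot 0 = 20,
\qquad
\frac{R_T}{T} = \frac{20}{200} = 0.1 .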

Page 22

Allocation Strategies

§ If we knew the payoffs, which arm should we pull? The best arm: $\mu^* = \max\{\mu_a : a = 1, \dots, k\}$.
§ What if we only care about estimating the payoffs $\mu_a$?
  § Pick each of the $k$ arms equally often: $T/k$ times each.
  § Estimate: $\hat{\mu}_a = \frac{k}{T} \sum_{j=1}^{T/k} X_{a,j}$.
  § Total regret: $R_T = \frac{T}{k} \sum_{a=1}^{k} (\mu^* - \mu_a)$.

[1] CS246: Mining Massive Data Sets, 2015, Stanford University

Page 23

Exploitation vs. Exploration

§ Tradeoff:
  § With only exploitation (making decisions based on historical data), you will have bad estimates for the "best" items.
  § With only exploration (gathering data about arm payoffs), you will have low user engagement.

Page 24

Algorithms for Exploration & Exploitation

The exploration-exploitation tradeoff:
§ Context-free: 1. ε-greedy [1]  2. UCB1 [2]
§ Contextual: 1. EXP3, EXP4  2. Thompson Sampling [3]  3. LinUCB [4]

[1] Wynn P. On the convergence and stability of the epsilon algorithm. SIAM Journal on Numerical Analysis, 1966, 3(1): 91-122.
[2] Auer P, Cesa-Bianchi N, Fischer P. Finite-time analysis of the multi-armed bandit problem. Machine Learning, 2002, 47(2-3): 235-256.
[3] Agrawal S, Goyal N. Analysis of Thompson sampling for the multi-armed bandit problem. arXiv preprint arXiv:1111.1797, 2011.
[4] Li, Lihong, et al. "A contextual-bandit approach to personalized news article recommendation." Proceedings of the 19th International Conference on World Wide Web. ACM, 2010.

Page 25

ε-Greedy Algorithm

§ It tries to be fair to the two opposing goals of exploration (with probability ε) and exploitation (with probability 1 - ε) by flipping a biased coin at each round.

(Figure: at round t, with probability ε we explore, so each arm - e.g., arm $a$ or arm $b$ - is picked with probability ε/k; with probability 1 - ε we exploit and choose the best arm $a^*$.)

Page 26

ε-Greedy Algorithm

§ For t = 1:T
  § Set $\varepsilon_t = O(1/t)$.
  § With probability $\varepsilon_t$: explore by picking an arm chosen uniformly at random.
  § With probability $1 - \varepsilon_t$: exploit by picking the arm with the highest empirical mean payoff.
§ Theorem [Auer et al. '02]: for a suitable choice of $\varepsilon_t$, the expected instantaneous regret is $O(k/t)$, so the total regret grows only logarithmically in $T$.

A minimal code sketch of this loop follows.
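
The Python sketch below is illustrative only: the Bernoulli arm probabilities in true_probs and the schedule eps_t = min(1, 1/t) are assumptions made for the example, not values from the slides.

import random

def epsilon_greedy(true_probs, T=10000, seed=0):
    """Run epsilon-greedy with eps_t ~ O(1/t) on simulated Bernoulli arms."""
    rng = random.Random(seed)
    k = len(true_probs)
    counts = [0] * k            # m_a: number of pulls of arm a
    means = [0.0] * k           # empirical mean payoff of arm a
    total_reward = 0.0
    for t in range(1, T + 1):
        eps_t = min(1.0, 1.0 / t)                        # decaying exploration rate (one simple choice)
        if rng.random() < eps_t:
            a = rng.randrange(k)                         # explore: uniformly random arm
        else:
            a = max(range(k), key=lambda i: means[i])    # exploit: best empirical arm
        reward = 1.0 if rng.random() < true_probs[a] else 0.0   # simulated Bernoulli payoff
        counts[a] += 1
        means[a] += (reward - means[a]) / counts[a]      # incremental mean update
        total_reward += reward
    return means, total_reward

# Example: three arms with unknown win probabilities (made up for illustration).
print(epsilon_greedy([0.3, 0.5, 0.7]))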

Page 27

Issues with the ε-Greedy Algorithm

§ Not "elegant": the algorithm explicitly distinguishes between exploration and exploitation.
§ More importantly: exploration makes suboptimal choices (since it picks any arm equally likely).
§ Idea: when exploring/exploiting, we need to compare arms.

Page 28

Example: Comparing Arms

§ Suppose we have done these experiments:
  § Arm 1: 1001110001
  § Arm 2: 1
  § Arm 3: 1101001111
§ Mean arm values: Arm 1: 5/10, Arm 2: 1, Arm 3: 7/10.
§ Which arm would you choose next?
§ Idea: look not only at the mean but also at the confidence!

Page 29

Confidence Intervals

§ A confidence interval is a range of values within which we are sure the mean lies with a certain probability.
  § For example, we could believe $\mu_a$ is within [0.2, 0.5] with probability 0.95.
§ If we have tried an action less often, our estimated reward is less accurate, so the confidence interval is larger.
§ The interval shrinks as we get more information (try the action more often).

Page 30

Confidence-Based Selection

§ Assume we know the confidence intervals.
§ Then, instead of trying the action with the highest mean, we can try the action with the highest upper bound on its confidence interval.

Page 31

Confidence Intervals vs. Number of Pulls

The confidence interval becomes narrower as the number of pulls increases.

[1] Jean-Yves Audibert and Remi Munos, Introduction to Bandits: Algorithms and Theory. ICML 2011, Bellevue (WA), USA.

Page 32

Calculating Confidence Bounds

§ Suppose we fix arm $a$:
  § Let $r_{a,1}, \dots, r_{a,m}$ be the payoffs of arm $a$ in its first $m$ trials.
  § $r_{a,1}, \dots, r_{a,m}$ are i.i.d., taking values in $[0, 1]$.
§ Our estimate: $\hat{\mu}_{a,m} = \frac{1}{m} \sum_{j=1}^{m} r_{a,j}$.
§ We want to find $b$ such that, with high probability, $|\mu_a - \hat{\mu}_{a,m}| \le b$ (and we want $b$ to be as small as possible).
§ Goal: bound $\mathbf{P}\left(|\mu_a - \hat{\mu}_{a,m}| \le b\right)$.
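
The bound the next slide relies on is Hoeffding's inequality; the slide itself only names it, so the standard statement for i.i.d. rewards in [0, 1] is filled in below.

% Hoeffding's inequality for m i.i.d. rewards r_{a,1}, ..., r_{a,m} in [0, 1]:
\mathbf{P}\!\left( \left| \mu_a - \hat{\mu}_{a,m} \right| \ge b \right) \;\le\; 2\, e^{-2 b^2 m}.
% Setting the right-hand side equal to \delta and solving for b gives the confidence radius
b \;=\; \sqrt{ \frac{\ln(2/\delta)}{2m} } ,
% which is the form of the exploration bonus used by the UCB-style algorithm on the next slide.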

Page 33

UCB1 Algorithm

(Based on Hoeffding's inequality.)

§ UCB1 (upper confidence bound sampling) algorithm:
  § Let $\hat{\mu}_1 = \dots = \hat{\mu}_k = 0$ and $m_1 = \dots = m_k = 0$.
    § $\hat{\mu}_a$ is our estimate of the payoff of arm $a$.
    § $m_a$ is the number of pulls of arm $a$ so far.
  § For t = 1:T
    § For each arm $a$, calculate $\mathrm{UCB}(a) = \hat{\mu}_a + \alpha \sqrt{\frac{2 \ln t}{m_a}}$.
    § Pick arm $j = \arg\max_a \mathrm{UCB}(a)$.
    § Pull arm $j$ and observe $y_t$.
    § $m_j = m_j + 1$ and $\hat{\mu}_j = \frac{1}{m_j}\left(y_t + (m_j - 1)\hat{\mu}_j\right)$.
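
A minimal Python sketch of the UCB1 loop above, again on simulated Bernoulli arms; the arm probabilities and the choice alpha = 1 are assumptions made for the example.

import math
import random

def ucb1(true_probs, T=10000, alpha=1.0, seed=0):
    """UCB1 on simulated Bernoulli arms: pull each arm once, then follow the UCB index."""
    rng = random.Random(seed)
    k = len(true_probs)
    counts = [0] * k       # m_a
    means = [0.0] * k      # empirical mean payoff mu_hat_a

    def pull(a):
        return 1.0 if rng.random() < true_probs[a] else 0.0

    # Initialization: pull each arm once so every m_a > 0.
    for a in range(k):
        counts[a] = 1
        means[a] = pull(a)
    for t in range(k + 1, T + 1):
        # UCB index: empirical mean plus an exploration bonus that shrinks with m_a.
        ucb = [means[a] + alpha * math.sqrt(2.0 * math.log(t) / counts[a]) for a in range(k)]
        j = max(range(k), key=lambda a: ucb[a])
        r = pull(j)
        counts[j] += 1
        means[j] += (r - means[j]) / counts[j]
    return means, counts

print(ucb1([0.3, 0.5, 0.7]))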

Page 34

UCB1 Algorithm: Discussion

§ The confidence interval grows with the total number of actions $t$ we have taken,
§ but shrinks with the number of times $m_a$ we have tried arm $a$.
§ This ensures each arm is tried infinitely often, yet still balances exploration and exploitation.
§ $\alpha$ plays the role of $\delta$: e.g., $\alpha = 1 + \sqrt{\ln(2/\delta)/2}$.

Recall the main loop:
§ For each arm $a$, calculate $\mathrm{UCB}(a) = \hat{\mu}_a + \alpha \sqrt{\frac{2 \ln t}{m_a}}$.
§ Pick arm $j = \arg\max_a \mathrm{UCB}(a)$.
§ Pull arm $j$ and observe $y_t$.
§ $m_j = m_j + 1$ and $\hat{\mu}_j = \frac{1}{m_j}\left(y_t + (m_j - 1)\hat{\mu}_j\right)$.

Page 35

UCB1 Algorithm: Performance

§ Theorem [Auer et al. 2002]
  § Suppose the optimal mean payoff is $\mu^* = \max_a \mu_a$,
  § and for each arm let $\Delta_a = \mu^* - \mu_a$.
  § Then it holds that
    $E[R_T] \;\le\; 8 \sum_{a : \mu_a < \mu^*} \frac{\ln T}{\Delta_a} \;+\; \left(1 + \frac{\pi^2}{3}\right) \sum_{a=1}^{k} \Delta_a .$
§ So we get $R_T = O(k \ln T)$, and in particular $R_T / T \to 0$.

Page 36

Contextual Bandits

§ A contextual bandit algorithm, in round t:
  § observes the user $u_t$ and a set $A$ of arms, together with their features $x_{t,a}$ (the context);
  § based on the payoffs from previous trials, chooses an arm $a \in A$ and receives payoff $r_{t,a}$;
  § improves its arm-selection strategy with each new observation $(x_{t,a}, a, r_{t,a})$.

Page 37

LinUCB Algorithm [1]

§ A contextual bandit algorithm, in round t:
  § observes the user $u_t$ and a set $A$ of arms, together with their features $x_{t,a}$ (the context);
  § based on the payoffs from previous trials, chooses an arm $a \in A$ and receives payoff $r_{t,a}$;
  § improves its arm-selection strategy with each new observation $(x_{t,a}, a, r_{t,a})$.

[1] Li, Lihong, et al. "A contextual-bandit approach to personalized news article recommendation." Proceedings of the 19th International Conference on World Wide Web. ACM, 2010.

Page 38

LinUCB Algorithm

§ The expected reward of each arm is modeled as a linear function of the context.

Payoff of arm $a$: $E[r_{t,a} \mid x_{t,a}] = x_{t,a}^{\top} \theta_a^*$

§ The goal is to minimize the regret, defined as the difference between the expected reward of the best arms and the expected reward of the selected arms:

$R(T) \;\triangleq\; E\!\left[\sum_{t=1}^{T} r_{t,a_t^*}\right] - E\!\left[\sum_{t=1}^{T} r_{t,a_t}\right]$

$x_{t,a}$ is a d-dimensional feature vector; $\theta_a^*$ is the unknown coefficient vector we aim to learn.

Page 39

LinUCB Algorithm

§ $E[r_{t,a} \mid x_{t,a}] = x_{t,a}^{\top} \theta_a^*$. How do we estimate $\theta_a$?
§ The (ridge) regression solution for $\theta_a$ is
  $\hat{\theta}_a = \arg\min_{\theta} \sum_{m \in D_a} \left( x_{m,a}^{\top} \theta - b_a^{(m)} \right)^2 + \|\theta\|_2^2$
  which gives
  $\hat{\theta}_a = \left( D_a^{\top} D_a + I_d \right)^{-1} D_a^{\top} b_a$

$D_a$ is an m x d matrix whose rows are the m training inputs $x_{t,a}$; $b_a$ is the m-dimensional vector of responses to arm $a$ (click / no-click). A small numerical illustration follows.
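
The closed form above is a standard ridge-regression solve; the NumPy snippet below illustrates it on random data (the dimensions and the data itself are made up for the example).

import numpy as np

rng = np.random.default_rng(0)
m, d = 50, 6                        # m observed impressions of arm a, d context features
D_a = rng.normal(size=(m, d))       # rows are the contexts x_{t,a} seen when arm a was shown
b_a = rng.integers(0, 2, size=m).astype(float)   # click / no-click responses

# theta_hat = (D_a^T D_a + I_d)^{-1} D_a^T b_a  (solve is preferred over an explicit inverse)
A_a = D_a.T @ D_a + np.eye(d)
theta_hat = np.linalg.solve(A_a, D_a.T @ b_a)
print(theta_hat)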

Page 40

LinUCB Algorithm

§ Using similar techniques as for UCB, with probability at least $1 - \delta$:

$\left| x_{t,a}^{\top} \hat{\theta}_a - E[r_{t,a} \mid x_{t,a}] \right| \;\le\; \alpha \sqrt{ x_{t,a}^{\top} \left( D_a^{\top} D_a + I_d \right)^{-1} x_{t,a} }$

§ For a given context, we estimate the reward and the confidence interval, and choose

$a_t \;\triangleq\; \arg\max_{a \in A_t} \Big( \underbrace{x_{t,a}^{\top} \hat{\theta}_a}_{\text{estimated } \mu_a} + \underbrace{\alpha \sqrt{ x_{t,a}^{\top} \left( D_a^{\top} D_a + I_d \right)^{-1} x_{t,a} }}_{\text{confidence interval}} \Big)$

where $\alpha = 1 + \sqrt{\ln(2/\delta)/2}$.

Page 41

LinUCB Algorithm

§ Notation: $A_a \triangleq D_a^{\top} D_a + I_d$.
§ Initialization, for each arm $a$:
  § $A_a = I_d$   // identity matrix, d x d
  § $b_a = \mathbf{0}_d$   // vector of zeros
§ Online algorithm, for t = 1:T:
  § Observe the features of all arms $a$: $x_{t,a} \in \mathbb{R}^d$.
  § For each arm $a$:
    § $\theta_a = A_a^{-1} b_a$   // regression coefficients
    § $p_{t,a} = x_{t,a}^{\top} \theta_a + \alpha \sqrt{ x_{t,a}^{\top} A_a^{-1} x_{t,a} }$
  § Choose arm $a_t = \arg\max_a p_{t,a}$ and observe the reward $r_t$.
  § $A_{a_t} = A_{a_t} + x_{t,a_t} x_{t,a_t}^{\top}$   // update A for the chosen arm $a_t$
  § $b_{a_t} = b_{a_t} + r_t\, x_{t,a_t}$   // update b for the chosen arm $a_t$

A runnable sketch of this loop follows.
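
A minimal Python/NumPy sketch of the LinUCB loop above. The context generator and the hidden coefficient vectors used to simulate clicks are assumptions made for this illustration; only the update rules mirror the pseudocode.

import numpy as np

def linucb(n_arms=3, d=5, T=5000, alpha=1.0, seed=0):
    """Disjoint LinUCB with per-arm ridge statistics A_a, b_a (toy simulation)."""
    rng = np.random.default_rng(seed)
    true_theta = rng.normal(size=(n_arms, d)) * 0.3   # hidden per-arm coefficients (simulation only)
    A = [np.eye(d) for _ in range(n_arms)]            # A_a = D_a^T D_a + I_d
    b = [np.zeros(d) for _ in range(n_arms)]          # b_a
    total_reward = 0.0
    for t in range(T):
        x = rng.normal(size=(n_arms, d))              # one context vector x_{t,a} per arm
        p = np.empty(n_arms)
        for a in range(n_arms):
            theta = np.linalg.solve(A[a], b[a])                       # ridge estimate of theta_a
            bonus = alpha * np.sqrt(x[a] @ np.linalg.solve(A[a], x[a]))
            p[a] = x[a] @ theta + bonus                               # UCB score p_{t,a}
        a_t = int(np.argmax(p))
        click_prob = 1.0 / (1.0 + np.exp(-x[a_t] @ true_theta[a_t])) # simulated user response
        r = float(rng.random() < click_prob)
        A[a_t] += np.outer(x[a_t], x[a_t])            # update only the chosen arm
        b[a_t] += r * x[a_t]
        total_reward += r
    return total_reward / T

print(linucb())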

Page 42

LinUCB: Discussion

§ LinUCB's computational complexity is
  § linear in the number of arms, and
  § at most cubic in the number of features.
§ LinUCB works well for a dynamic arm set (arms come and go).
  § For example, in news article recommendation, editors add/remove articles to/from the pool.

Page 43

Difference between UCB1 and LinUCB

§ UCB1 directly estimates $\mu_a$ through experimentation (without any knowledge about arm $a$).
§ LinUCB estimates $\mu_a$ by regression: $\mu_a = x_{t,a}^{\top} \theta_a^*$.
  § The hope is that we will be able to learn faster by taking the context $x_a$ (user, ad) of arm $a$ into account.
  § $\theta_a^*$ is the unknown coefficient vector we aim to learn.

Page 44

Thompson Sampling

§ A simple, natural Bayesian heuristic.
§ Maintain a belief (distribution) over the unknown parameters.
§ Each time, pull an arm $a$ and observe a reward $r$.

§ Initialize the priors using the belief distribution.
§ For t = 1:T:
  § Sample a random variable X from each arm's belief distribution.
  § Select the arm with the largest X.
  § Observe the result of the selected arm.
  § Update the prior belief distribution of the selected arm.

[1] Agrawal S, Goyal N. Analysis of Thompson sampling for the multi-armed bandit problem. arXiv preprint arXiv:1111.1797, 2011.
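
A minimal Python sketch of Beta-Bernoulli Thompson Sampling as described above, using a Beta(1, 1) prior per arm; the true arm probabilities are made up for the simulation.

import random

def thompson_sampling(true_probs, T=10000, seed=0):
    """Beta-Bernoulli Thompson Sampling with a Beta(1, 1) prior on each arm."""
    rng = random.Random(seed)
    k = len(true_probs)
    successes = [0] * k   # S_i
    failures = [0] * k    # F_i
    for _ in range(T):
        # Sample one draw X from each arm's current Beta(S_i + 1, F_i + 1) belief.
        samples = [rng.betavariate(successes[i] + 1, failures[i] + 1) for i in range(k)]
        i = max(range(k), key=lambda a: samples[a])    # arm with the largest sample
        reward = rng.random() < true_probs[i]           # observe the (simulated) result
        if reward:
            successes[i] += 1                           # update the posterior of the played arm
        else:
            failures[i] += 1
    return successes, failures

print(thompson_sampling([0.3, 0.5, 0.7]))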

Page 45

Simple Example

§ Coin toss: $x \sim \mathrm{Bernoulli}(\theta)$.
§ Let's assume that $\theta \sim \mathrm{Beta}(\alpha_1, \alpha_2)$ (the Beta distribution), i.e. the prior is $P(\theta) \propto \theta^{\alpha_1 - 1} (1 - \theta)^{\alpha_2 - 1}$.
§ The posterior is $P(\theta \mid X) = \dfrac{P(X \mid \theta)\, P(\theta)}{\int_{\theta'} P(X \mid \theta')\, P(\theta')\, d\theta'}$.

The prior is conjugate!
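
Conjugacy is what makes the update in the following slides a one-line bookkeeping step: after observing S successes and F failures, the posterior is again a Beta distribution.

% Beta prior + Bernoulli likelihood => Beta posterior (conjugacy):
\theta \sim \mathrm{Beta}(\alpha_1, \alpha_2), \quad
x_1, \dots, x_n \mid \theta \;\overset{\text{i.i.d.}}{\sim}\; \mathrm{Bernoulli}(\theta)
\;\;\Longrightarrow\;\;
\theta \mid x_{1:n} \sim \mathrm{Beta}\!\left(\alpha_1 + S,\; \alpha_2 + F\right),
% where S = \sum_i x_i is the number of successes and F = n - S the number of failures.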

Page 46

Thompson Sampling Using the Beta Belief Distribution

§ Theorem [Emilie et al. 2012]
§ Initially assume arm $i$ has prior Beta(1, 1) on $\mu_i$.
§ $S_i$ = number of "successes", $F_i$ = number of "failures"; the belief for arm $i$ is then Beta($S_i$ + 1, $F_i$ + 1).

Page 47

Thompson Sampling Using the Beta Belief Distribution

§ Initialization: Arm 1: Beta(1, 1), Arm 2: Beta(1, 1), Arm 3: Beta(1, 1).

Page 48

Thompson Sampling Using the Beta Belief Distribution

§ For each round:
  § Sample a random variable X from each arm's Beta distribution.

Arm 1: Beta(1, 1), X = 0.7; Arm 2: Beta(1, 1), X = 0.2; Arm 3: Beta(1, 1), X = 0.4.

Page 49

Thompson Sampling Using the Beta Belief Distribution

§ For each round:
  § Sample a random variable X from each arm's Beta distribution.
  § Select the arm with the largest X.

Arm 1: Beta(1, 1), X = 0.7; Arm 2: Beta(1, 1), X = 0.2; Arm 3: Beta(1, 1), X = 0.4 → Arm 1 is selected.

Page 50

Thompson Sampling Using the Beta Belief Distribution

§ For each round:
  § Sample a random variable X from each arm's Beta distribution.
  § Select the arm with the largest X.
  § Observe the result of the selected arm.

Arm 1: Beta(1, 1), X = 0.7; Arm 2: Beta(1, 1), X = 0.2; Arm 3: Beta(1, 1), X = 0.4 → Arm 1 is played: Success!

Page 51

Thompson Sampling Using the Beta Belief Distribution

§ For each round:
  § Sample a random variable X from each arm's Beta distribution.
  § Select the arm with the largest X.
  § Observe the result of the selected arm.
  § Update the prior Beta distribution of the selected arm.

After Arm 1's success, its belief becomes Beta(2, 1); Arm 2 and Arm 3 stay at Beta(1, 1).

Page 52

Our Research 1: Ensemble Contextual Bandits for Personalized Recommendation

[1] Tang, Liang, et al. "Ensemble contextual bandits for personalized recommendation." Proceedings of the 8th ACM Conference on Recommender Systems. ACM, 2014.

Page 53

Problem Statement

§ Problem setting: we have many different recommendation models (or policies):
  § different CTR prediction algorithms;
  § different exploration-exploitation algorithms;
  § different parameter choices.
§ There is no data to do model validation.
§ Problem statement: how do we build an ensemble model that is close to the best individual model in the cold-start situation?

Page 54

How to Ensemble?

§ Classifier ensemble methods do not work in this setting:
  § the recommendation decision is NOT purely based on the predicted CTR.
§ Each individual model only tells us which item to recommend.

Page 55

Ensemble Method

§ Our method: allocate recommendation chances to the individual models.
§ Problem:
  § Better models should get more chances.
  § We do not know which ones are good or bad in advance.
  § Ideal solution: allocate all chances to the best one.

Page 56

Current Practice: Online Evaluation (or A/B Testing)

§ Let π1, π2, ..., πm be the individual models.
§ Deploy π1, π2, ..., πm into the online system at the same time.
§ Dispatch a small percentage of user traffic to each model.
§ After a period, choose the model with the best CTR as the production model.

Page 57

Current Practice: Online Evaluation (or A/B Testing)

§ Let π1, π2, ..., πm be the individual models.
§ Deploy π1, π2, ..., πm into the online system at the same time.
§ Dispatch a small percentage of user traffic to each model.
§ After a period, choose the model with the best CTR as the production model.

If we have too many models, this will hurt the performance of the online system.

Page 58

Our Idea 1 (HyperTS)

§ The CTR of model πi is an unknown random variable, Ri.
§ Goal: maximize the CTR of our ensemble model,
  $\frac{1}{N} \sum_{t=1}^{N} r_t$,
  where $r_t$ is a random reward drawn from $R_{s(t)}$ and $s(t) \in \{1, 2, \dots, m\}$. For each t = 1, ..., N we decide s(t).
§ Solution: Bernoulli Thompson Sampling (flat prior: Beta(1, 1)), with π1, π2, ..., πm as the bandit arms.
§ No tricky parameters. (A code sketch follows.)
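
The sketch below illustrates the HyperTS idea only: a Bernoulli Thompson Sampling layer whose "arms" are the candidate recommendation models. The models interface (each model exposing a recommend(x) method) is an assumption made for this illustration, not the paper's implementation.

import random

class HyperTS:
    """Bernoulli Thompson Sampling over candidate recommendation models (illustrative sketch)."""
    def __init__(self, models, seed=0):
        self.models = models                      # each model is assumed to expose recommend(x)
        self.rng = random.Random(seed)
        self.successes = [0] * len(models)        # per-model click counts
        self.failures = [0] * len(models)         # per-model no-click counts

    def select_model(self):
        # Sample an estimated CTR R_i ~ Beta(S_i + 1, F_i + 1) for each model; pick the largest.
        samples = [self.rng.betavariate(s + 1, f + 1)
                   for s, f in zip(self.successes, self.failures)]
        return max(range(len(self.models)), key=lambda i: samples[i])

    def recommend(self, x):
        k = self.select_model()
        item = self.models[k].recommend(x)        # the chosen model picks the item
        return k, item

    def update(self, k, reward):
        # Only the model that served the request is updated (HyperTS; HyperTSFB shares feedback).
        if reward:
            self.successes[k] += 1
        else:
            self.failures[k] += 1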

Page 59

An Example of HyperTS

In memory, we keep the estimated CTRs R1, R2, ..., Rk, ..., Rm for π1, π2, ..., πm.

Page 60

An Example of HyperTS

§ A user visits.
§ Based on the estimated CTRs R1, R2, ..., Rm, HyperTS selects a candidate model, πk.

Page 61

An Example of HyperTS

§ A user visits, with context features xt.
§ HyperTS selects a candidate model, πk, using the estimated CTRs R1, R2, ..., Rm.
§ πk recommends item A to the user.

Page 62

An Example of HyperTS

§ A user visits, with context features xt.
§ HyperTS selects a candidate model, πk, using the estimated CTRs R1, R2, ..., Rm.
§ πk recommends item A to the user.
§ HyperTS updates the estimate of Rk based on the observed reward rt.

Page 63

Two-Layer Decision

(Figure: a Bernoulli Thompson Sampling layer chooses among the models π1, π2, ..., πk, ..., πm; the selected model πk then chooses among the items A, B, C, ....)

Page 64

Our Idea 2 (HyperTSFB)

§ Limitation of the previous idea:
  § For each recommendation, the user feedback is used by only one individual model (e.g., πk).
§ Motivation:
  § Can we update all of R1, R2, ..., Rm with every piece of user feedback? (Share every user feedback with every individual model.)

Page 65

Our Idea 2 (HyperTSFB)

§ Assume each model can output the probability of recommending any item given xt.
  § E.g., for a deterministic recommendation, it is 1 or 0.
§ For a user visit xt:
  § πk is selected to perform the recommendation (k = 1, 2, ..., or m).
  § Item A is recommended by πk given xt.
  § Receive the user feedback (click or no click), rt.
  § Ask every model π1, π2, ..., πm: what is the probability of recommending A given xt?

Page 66

Our Idea 2 (HyperTSFB)

§ Assume each model can output the probability of recommending any item given xt.
  § E.g., for a deterministic recommendation, it is 1 or 0.
§ For a user visit xt:
  § πk is selected to perform the recommendation (k = 1, 2, ..., or m).
  § Item A is recommended by πk given xt.
  § Receive the user feedback (click or no click), rt.
  § Ask every model π1, π2, ..., πm: what is the probability of recommending A given xt?

Estimate the CTRs of π1, π2, ..., πm via importance sampling (a sketch of the idea follows).
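
One standard way to turn those probabilities into per-model CTR estimates is an importance-sampling (inverse-propensity) average; the paper's exact estimator may differ, so the formula below is only a hedged sketch of the idea. Here $p_i(A_t \mid x_t)$ is the probability that model $\pi_i$ would recommend the shown item $A_t$, and $q(A_t \mid x_t)$ is the probability with which the ensemble actually showed it.

% Importance-sampling sketch of each model's CTR estimate (illustrative form only):
\hat{R}_i \;=\;
\frac{\sum_{t} r_t \, \dfrac{p_i(A_t \mid x_t)}{q(A_t \mid x_t)}}
     {\sum_{t} \dfrac{p_i(A_t \mid x_t)}{q(A_t \mid x_t)}},
% so every observed feedback (x_t, A_t, r_t) contributes to every model's estimate,
% weighted by how likely that model was to have made the same recommendation.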

Page 67

Experimental Setup

§ Experimental data:
  § Yahoo! Today News data logs (randomly displayed).
  § KDD Cup 2012 online advertising dataset.
§ Evaluation methods:
  § Yahoo! Today News: replay (see Lihong Li et al.'s WSDM 2011 paper).
  § KDD Cup 2012 data: simulation by a logistic regression model.

Page 68

Comparative Methods

§ CTR prediction algorithm:
  § logistic regression.
§ Exploitation-exploration algorithms:
  § Random, ε-greedy, LinUCB, Softmax, Epoch-greedy, Thompson sampling.
§ HyperTS and HyperTSFB.

Page 69

Results for Yahoo! News Data

§ Every 100,000 impressions are aggregated into a bucket.

Page 70

Results for Yahoo! News Data (Cont.)

Page 71

Conclusions

§ The performance of the baseline exploitation-exploration algorithms is very sensitive to the parameter setting.
  § In the cold-start situation there is not enough data to tune the parameters.
§ HyperTS and HyperTSFB can get close to the optimal baseline algorithm (with no guarantee of being better than the optimal one), even when some bad individual models are included.
§ For contextual Thompson sampling, the performance depends on the choice of the prior distribution for the logistic regression.
  § For online Bayesian learning, the posterior approximation is not accurate (we cannot store the past data).

Page 72

Our Research 2: Personalized Recommendation via Parameter-Free Contextual Bandits

[1] Tang, Liang, et al. "Personalized recommendation via parameter-free contextual bandits." Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2015.

Page 73

How to Balance the Tradeoff

§ Performance is mainly determined by the tradeoff. Existing algorithms find the tradeoff through user-supplied input parameters and data characteristics (e.g., the variance of the estimated reward).
§ Existing algorithms are all parameter-sensitive: the same algorithm is good when its parameter is good and bad when its parameter is bad.

Page 74

The Chicken-and-Egg Problem of Existing Bandit Algorithms

§ Why do we use bandit algorithms?
  § To solve the cold-start problem (not enough data for estimating user preferences).
§ How do we find the best input parameters?
  § Tune the parameters online or offline.

But if you already have the data or the online traffic to tune the parameters, why do you need bandit algorithms?

Page 75

Our Work

§ Parameter-free:
  § It finds the tradeoff from the data characteristics automatically.
§ Robust:
  § Existing algorithms can perform very badly if the input parameter is not appropriate.

Page 76

Solution

§ Thompson Sampling:
  § Randomly select a model coefficient vector from the posterior distribution and find the "best" item.
  § The prior is an input parameter for computing the posterior.
§ Non-Bayesian Thompson Sampling (our solution):
  § Randomly draw a bootstrap sample, take the MLE of the model coefficients on it, and find the "best" item.
  § Bootstrapping has no input parameter.

Page 77

Bootstrap Bandit Algorithm

Input: a feature vector x of the context.
Algorithm:
if each article has sufficient observations then {
    for each article i = 1, ..., k:
        i.  Di <- randomly sample n_i impression records of article i with replacement   // generate a bootstrap sample
        ii. θi <- MLE coefficients fitted on Di                                          // model estimation on the bootstrap sample
    select the article i* = argmax_i f(x, θi), i = 1, ..., k, to show.   // f is the prediction function
} else {
    randomly select an article that does not yet have sufficient observations to show.
}

A runnable sketch follows.
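
A minimal Python sketch of the bootstrap step above, using logistic regression as the prediction function f. Using scikit-learn's LogisticRegression as the MLE fitter and the min_obs threshold are assumptions made for this illustration; X_i is an (n, d) array of impression contexts and y_i the click labels for article i.

import numpy as np
from sklearn.linear_model import LogisticRegression

def bootstrap_score(X_i, y_i, x, rng):
    """Score article i for context x using one bootstrap replicate of its impression data."""
    n = len(y_i)
    idx = rng.integers(0, n, size=n)               # sample n impressions with replacement
    ys = y_i[idx]
    if ys.min() == ys.max():                       # degenerate bootstrap sample: only one class seen
        return float(ys[0])
    model = LogisticRegression(max_iter=1000).fit(X_i[idx], ys)   # MLE on the bootstrap sample
    return float(model.predict_proba(x.reshape(1, -1))[0, 1])     # f(x, theta_i): estimated CTR

def recommend(articles, x, min_obs=50, seed=0):
    """articles: dict article_id -> (X_i, y_i) impression features and click labels."""
    rng = np.random.default_rng(seed)
    thin = [a for a, (X_i, y_i) in articles.items() if len(y_i) < min_obs]
    if thin:                                       # cold article: explore it first
        return thin[rng.integers(len(thin))]
    scores = {a: bootstrap_score(X_i, y_i, x, rng) for a, (X_i, y_i) in articles.items()}
    return max(scores, key=scores.get)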

Page 78

Online Bootstrap Bandits

§ Why online bootstrap?
  § It is inefficient to generate a bootstrap sample for each recommendation.
§ How to bootstrap online?
  § Keep the coefficients estimated from each bootstrap sample in memory.
  § There is no need to keep the bootstrap samples themselves in memory.
  § When a new observation arrives, incrementally update the estimated coefficients of each bootstrap sample [1] (see the sketch below).

[1] N. C. Oza and S. Russell. Online bagging and boosting. In IEEE International Conference on Systems, Man and Cybernetics, volume 3, pages 2340-2345, 2005.
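
A hedged sketch of the online-bagging trick from Oza and Russell [1] applied here: each of the B replicates keeps its own coefficient vector and, when a new observation arrives, applies an SGD update Poisson(1)-many times instead of storing a bootstrap sample. The logistic-loss SGD update and the learning rate are assumptions for this illustration; the paper's exact online estimator may differ.

import numpy as np

class OnlineBootstrapArm:
    """B bootstrap replicates of a logistic model for one article, updated online (sketch)."""
    def __init__(self, d, B=10, lr=0.05, seed=0):
        self.rng = np.random.default_rng(seed)
        self.thetas = np.zeros((B, d))             # one coefficient vector per bootstrap replicate
        self.lr = lr

    def update(self, x, r):
        # Each replicate sees the new observation Poisson(1)-many times (online bagging).
        for j in range(self.thetas.shape[0]):
            for _ in range(self.rng.poisson(1.0)):
                p = 1.0 / (1.0 + np.exp(-self.thetas[j] @ x))
                self.thetas[j] += self.lr * (r - p) * x      # one SGD step on the logistic loss

    def sample_score(self, x):
        # Thompson-style draw: pick one replicate at random and use its predicted CTR.
        j = self.rng.integers(self.thetas.shape[0])
        return 1.0 / (1.0 + np.exp(-self.thetas[j] @ x))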

Page 79

Experiment Data

§ Two public datasets:
  § News recommendation data (Yahoo! Today News):
    § News displayed on the Yahoo! front page from Oct. 2, 2011 to Oct. 16, 2011.
    § 28,041,015 user visit events.
    § A 136-dimensional feature vector for each event.
  § Online advertising data (KDD Cup 2012, Track 2):
    § The dataset was collected by a search engine and published by KDD Cup 2012.
    § 1 million user visit events.
    § A 1,070,866-dimensional context feature vector.

Page 80

Offline Evaluation Metric and Methods

§ Setup:
  § Overall CTR (average reward of a trial).
§ Evaluation method:
  § The experiment on Yahoo! Today News is evaluated with the replay method [1].
  § The reward on the KDD Cup 2012 ad data is simulated with a weight vector for each ad [2].

[1] L. Li, W. Chu, J. Langford, and X. Wang. Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms. In WSDM, pages 297-306, 2011.
[2] O. Chapelle and L. Li. An empirical evaluation of Thompson sampling. In NIPS, pages 2249-2257, 2011.

Page 81

Experimental Methods

§ Our method:
  § Bootstrap(B), where B is the number of bootstrap samples.
§ Baselines:
  § Random: randomly selects an arm to pull.
  § Exploit: only exploits, with no exploration.
  § ε-greedy(ε): ε is the probability of exploration.
  § LinUCB(α): pulls the arm with the largest score, as defined by the parameter α.
  § TS(q0): Thompson sampling with logistic regression, where q0⁻¹ is the prior variance and 0 is the prior mean.
  § TSNR(q0): similar to TS(q0), but the logistic regression is not regularized by the prior.

Page 82

Experiment (Yahoo! News Data)

§ All numbers are relative to the random model.

Page 83

Experiment (Ad KDD Cup '12)

§ All numbers are relative to the random model.

Page 84

CTR over Time Buckets (Yahoo! News Data)

Page 85

CTR over Time Buckets (KDD Cup Ads Data)

Page 86

Efficiency

§ Time cost for different bootstrap sample sizes.

Page 87

Summary of Experiments

§ For the contextual bandit problem, ε-greedy and LinUCB can achieve optimal performance, but the input parameters that control the exploration need to be tuned carefully.
§ The probability-matching strategies depend heavily on the choice of the prior.
§ Our proposed algorithm is a safe choice for building predictive models for contextual bandit problems in the cold-start scenario.

Page 88

Conclusion

§ We propose a non-Bayesian Thompson Sampling method for the personalized recommendation problem.
§ We give both theoretical and empirical analyses showing that the performance of Thompson sampling depends on the choice of the prior.
§ We conduct extensive experiments on real datasets to demonstrate the efficacy of the proposed method and compare it with other contextual bandit algorithms.

Page 89

Future Work

§ MAB with similarity information
§ MAB in a changing environment
§ Explore-exploit tradeoff in mechanism design
§ Explore-exploit learning with limited resources
§ Risk vs. reward tradeoff in MAB

[1] http://research.microsoft.com/en-us/projects/bandits/

Page 90

Question and Answer

Thanks!