Bandit Learning for NMT Hyperparameter Search
Kevin Duh
May 2018 discussion
Speeding up Hyperparameter Search
Given budget constraints, how to decide which run to kill before convergence?
K-arm bandit problem
- Each run/model is an arm
- Each time we pull an arm, we train the model by one step
- Which arm should we pull first?
Simulation
• WNMT 2018 DE-EN data (4M sentences)
• Run K models to convergence. Check if bandit learning can choose correctly.
• Seq2Seq hyperparameters:
  – Varied = BPE: 10k, 30k, 50k; Embedding size: 100, 300, 500; RNN hidden size: 100, 300, 500; #layers: 1, 2; Dropout: 0.0-0.4
  – Fixed = Default optimizer, learning-rate scheduler
• Checkpoint frequency: 10k, Batch size: 128
  – Each checkpoint = 1 unit of budget
Epsilon-Greedy Algorithm
• For each turn until budget runs out:
  – Draw x from random_uniform(0,1)
  – If x < epsilon (e.g. 0.1):
    • Pull random arm
  – Else:
    • Pull best arm: k' = argmax_k value[k]
    • Update value[k'] = latest BLEU (or average so far)
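The epsilon-greedy loop above can be sketched in Python as follows. This is a minimal sketch, not the code used for the slides. It assumes the simulation setting, where each arm's learning curve is precomputed (the hypothetical `curves[k][t]` holds BLEU after checkpoint t+1), that the value of whichever arm was pulled is updated with its latest BLEU, and that ties go to the lowest arm index.

```python
import random

def epsilon_greedy(curves, budget, epsilon=0.1, seed=0):
    """Spend `budget` checkpoints across arms; curves[k][t] is a precomputed BLEU."""
    rng = random.Random(seed)
    n_arms = len(curves)
    pulls = [0] * n_arms    # checkpoints trained so far, per arm
    value = [0.0] * n_arms  # latest observed BLEU, per arm
    for _ in range(budget):
        # Only arms with checkpoints left can still be trained.
        alive = [k for k in range(n_arms) if pulls[k] < len(curves[k])]
        if not alive:
            break
        if rng.random() < epsilon:
            k = rng.choice(alive)                    # explore: random arm
        else:
            k = max(alive, key=lambda a: value[a])   # exploit: best arm so far
        value[k] = curves[k][pulls[k]]               # train one checkpoint, observe BLEU
        pulls[k] += 1
    return pulls, value
```

With epsilon set to 0 the loop never leaves the first arm it tries, which is the failure mode the next slide calls risky.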
Epsilon-Greedy tends to explore only models that are good initially. OK here but risky. (Budget=40)
Upper Confidence Bound (UCB)
• Idea: more uncertainty on arms less pulled, so favor them.
• For each turn until budget runs out:
  – For each arm k:
    • Compute bound[k] = sqrt(2 log(totalcount) / count[k])
  – Pull best arm: k' = argmax_k (value[k] + bound[k])
  – Update value[k'] = latest BLEU (or average so far)
  – Increment count[k'] += 1; totalcount += 1
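A minimal Python sketch of the UCB loop above, under the same simulation assumption of precomputed learning curves (the hypothetical `curves[k][t]`). Pulling each arm once before applying the bound is an assumption added here to avoid dividing by a zero count; the slide does not say how unpulled arms are handled.

```python
import math

def ucb(curves, budget):
    """UCB1-style arm selection; curves[k][t] is a precomputed BLEU."""
    n_arms = len(curves)
    count = [0] * n_arms    # pulls per arm
    value = [0.0] * n_arms  # latest observed BLEU per arm
    total = 0               # total pulls so far
    for _ in range(budget):
        alive = [k for k in range(n_arms) if count[k] < len(curves[k])]
        if not alive:
            break

        def score(k):
            if count[k] == 0:
                return math.inf  # assumption: try every arm once first
            return value[k] + math.sqrt(2 * math.log(total) / count[k])

        k = max(alive, key=score)
        value[k] = curves[k][count[k]]  # train one checkpoint, observe BLEU
        count[k] += 1
        total += 1
    return count, value
```

How strongly the bound dominates depends on the scale of `value`: when BLEU differences between arms are small relative to the bound, the pull counts end up close to uniform.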
Bound is too large in practice. UCB uniformly explores all arms. (Budget=40)
Hyperband / Successive Halving
Li et al. 2016. Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization
• Previously: choosing 1 arm is too risky, and values aren't fairly comparable across steps
• Idea: Choose half of population at each turn
• L = list(Arms)
• For each turn until budget runs out:
  – Pull each arm k in L; update value[k] = current BLEU
  – S = [arms sorted by value]
  – L = top half of S
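The loop on this slide is plain successive halving (full Hyperband additionally runs several such brackets). A minimal sketch, again assuming precomputed learning curves in a hypothetical `curves[k][t]` and that one pull costs one unit of budget:

```python
def successive_halving(curves, budget):
    """Keep the top half of the arms after every turn until the budget is spent."""
    alive = list(range(len(curves)))
    steps = [0] * len(curves)    # checkpoints trained per arm
    value = [0.0] * len(curves)  # current BLEU per arm
    used = 0
    while used + len(alive) <= budget:
        if all(steps[k] >= len(curves[k]) for k in alive):
            break  # every surviving arm has already converged
        for k in alive:
            if steps[k] < len(curves[k]):
                value[k] = curves[k][steps[k]]  # train one checkpoint
                steps[k] += 1
                used += 1
        ranked = sorted(alive, key=lambda k: value[k], reverse=True)
        alive = ranked[: max(1, len(ranked) // 2)]  # top half survives
    return alive, steps, used
```

Once a single arm is left it keeps training until the budget runs out, so promising arms accumulate more checkpoints than the ones dropped early.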
Promising arms train successively longer under Successive Halving (Budget=40)
16 arms. Successive Halving with Budget=96.
Considerations/Discussions
• Next steps:
  – Include multiple objectives via Pareto ranks
  – Make this more practical. Implement successive halving as inner loop within evolutionary search
• Algorithmic questions:
  – Fixed optimizer & learning rate scheduler
  – New run vs old run
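One way to read "Pareto ranks" is non-dominated sorting: rank 1 is the set of runs that no other run beats on every objective, rank 2 is the same set after removing rank 1, and so on. The sketch below assumes two objectives, BLEU (higher is better) and decoding time (lower is better). The function name and the numbers in the usage example are made up for illustration.

```python
def pareto_ranks(points):
    """points: list of (bleu, time) pairs. Returns a rank per point, 1 = non-dominated."""

    def dominates(a, b):
        # a is at least as good on both objectives and strictly better on one
        return a[0] >= b[0] and a[1] <= b[1] and (a[0] > b[0] or a[1] < b[1])

    remaining = set(range(len(points)))
    ranks = [0] * len(points)
    rank = 1
    while remaining:
        front = {
            i for i in remaining
            if not any(dominates(points[j], points[i]) for j in remaining if j != i)
        }
        for i in front:
            ranks[i] = rank
        remaining -= front  # peel off the current front and repeat
        rank += 1
    return ranks

# Made-up (BLEU, seconds) pairs, for illustration only
print(pareto_ranks([(27.0, 10.0), (25.0, 5.0), (24.0, 12.0), (26.0, 11.0)]))
```

The resulting rank could then stand in for `value[k]` when sorting arms, which is one possible way to plug several objectives into successive halving.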
Considerations/Discussions
• Implementation questions:
  – Continue on finished run for Sockeye:
    • --params, --source-vocab, --target-vocab
    • Different datasets? E.g. smaller datasets
    • Sockeye.prepared_data?
  – Measurements:
    • Time: CPU decoding? Vs GPU decoding
    • Accuracy: validation BLEU vs train perplexity
Suggestions from Michael
• Try vastly different architectures and repeat the simulation
• Try different optimizers
  – ADAM for large data - long runs, looking at training perplexity, decreasing learning rate slowly by 0.9
  – EVE for small data
  – NADAM doesn't work, but interesting to try
• CPU decoder: look at WNMT'18 Docker for MKL version that is more performant
Suggestions for Rudolphe
• Initial condition: maybe I need to train each arm longer initially before starting the K-arm bandits
• But what if I have too many K?
June 2018 discussion
TED DE-EN – different optimizers
{adadelta, adagrad, adam, eve, nadam, rmsprop, sgd} x initial learning rate = {0.0002, 0.001}
batch_size=4096
schedule=plateau-reduce
learning_rate_reduce_factor=0.7
loss="cross-entropy"
checkpoint=750 (~1 epoch)
Total resource = 56
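The arms for this experiment are the cross product of the optimizers and initial learning rates listed on the slide, with everything else held fixed. A small sketch of that enumeration follows. The dictionary keys mirror the names on the slide and are not claimed to be exact Sockeye command-line flags.

```python
from itertools import product

optimizers = ["adadelta", "adagrad", "adam", "eve", "nadam", "rmsprop", "sgd"]
initial_learning_rates = [0.0002, 0.001]

# Settings shared by every arm, copied from the slide
fixed = {
    "batch_size": 4096,
    "schedule": "plateau-reduce",
    "learning_rate_reduce_factor": 0.7,
    "loss": "cross-entropy",
    "checkpoint": 750,
}

# One configuration (arm) per optimizer / learning-rate pair
arms = [
    dict(fixed, optimizer=opt, initial_learning_rate=lr)
    for opt, lr in product(optimizers, initial_learning_rates)
]

print(len(arms))  # 7 optimizers x 2 learning rates = 14 arms
```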
Same as last slide, but evaluate at 10 checkpoint intervals (750x10 updates)
Total resource = 280
Increasing resource usage → safer
Run        Initial learning rate   Validation Perplexity   Validation BLEU
Adadelta1  0.0002                  19.17                   23.27
Adadelta2  0.001                   17.24                   25.02
Adam1b     0.0002                  20.68                   24.53
Adam1      0.0002                  20.41                   24.25
Adam2b     0.001                   17.18                   25.68
Adam2      0.001                   19.14                   25.24
Eve1       0.0002                  14.83                   27.39
Eve2       0.001                   40.06                   12.59
Nadam1     0.0002                  20.89                   24.24
Nadam2     0.001                   15.43                   26.97
RMSprop1   0.0002                  19.08                   24.55
RMSprop2   0.001                   16.10                   27.00
adagrad    0.001, 0.0002           411, 195                -
sgd        -                       4171, 662               -
WMT ZH-EN – different architecture
WMT RU-EN – different architecture
Suggestions from Michael
• Different batch size
• Different scheduler (sqrt)
• Mix architectures (& different encoder/decoder depths & layer size)
• BLEU – how close to the best model, i.e. can I get within 0.2 BLEU of the best model with 10% of the resources?
July 2018 discussion
More experiments to verify Hyperband's robustness
• Motivation:
  – Previous Hyperband results were promising, but want to test on more diverse (e.g. crisscrossing) learning curves
• This month:
  – Curriculum learning experiments
Curriculum Learning
• Hunch:
  – Start by training on easy samples
  – As model improves, add in harder samples
  – Maybe model will converge faster? Or better BLEU?
• Sockeye Implementation (at MTMA):
  – Easy/hard samples are assigned to different shards
  – Schedule what shard is visible to trainer at what time
Curriculum Learning - Visualization
[Figure: visible (i.e. available) shards over training time (i.e. updates). Shards are ordered Very Easy, Easy, Mid Level, Hard, Very Hard. Training starts with the easy shard and gradually adds harder shards at the curriculum update frequency (e.g. every 1000 updates). At the end, the trainer sees all data and gets random batches.]
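The schedule in the visualization can be sketched as a function from the current update count to the set of visible shards. This is only an illustration, not the Sockeye code from MTMA. It assumes shards are indexed from easiest (0) to hardest and that exactly one shard is added per curriculum update. The function and parameter names are hypothetical.

```python
def visible_shards(update, num_shards, update_frequency=1000):
    """Return the indices of the shards the trainer may sample from at `update`."""
    # Start with shard 0 only, then unlock one harder shard every `update_frequency` updates.
    n_visible = min(num_shards, 1 + update // update_frequency)
    return list(range(n_visible))

for step in (0, 999, 1000, 2500, 10000):
    print(step, visible_shards(step, num_shards=5))
```

Once all shards are visible, batches are drawn from the full data, as in the last stage of the figure.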
Curriculum Learning – many variants to get different learning curves
• Different schedules, e.g. [figure]
• Different definitions of easy/hard:
  – Sentence length
  – Vocabulary frequency
  – Force-decode / 1-best score of existing model
• Different curriculum update frequency
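As an illustration of one difficulty definition, the sketch below assigns sentences to shards by length. It assumes that shorter sentences are easier, that length is measured in whitespace-separated tokens, and that shards should be roughly equal in size. None of these choices is confirmed by the slides, and `assign_shards` is a hypothetical name.

```python
def assign_shards(sentences, num_shards):
    """Return a shard index per sentence; shard 0 holds the shortest sentences."""
    # Sentence indices sorted from shortest to longest
    order = sorted(range(len(sentences)), key=lambda i: len(sentences[i].split()))
    shard_of = [0] * len(sentences)
    for position, i in enumerate(order):
        # Equal-sized buckets over the sorted order
        shard_of[i] = position * num_shards // len(sentences)
    return shard_of

print(assign_shards(["a b c d", "a", "a b", "a b c"], num_shards=2))
```

Swapping the sort key for a vocabulary-frequency or model-score function would give the other two definitions on the slide.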
Setup
• Data: De-En TED
• Preparation:
  – 100 training runs with different curriculum learning settings
  – Randomly draw 16 runs each time and observe Hyperband results
• Question:
  – Can Hyperband correctly bet on near-best runs?
Rank Histogram (100 random trials)
#run  Halving_freq  Resource used vs grid search  Rank histogram (rank 1 first)
8     1             20/64                         [19,19,20,18,14,5,3,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
8     2             40/128                        [26,23,20,14,10,2,2,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
8     3             60/192                        [42,31,14,9,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
8     4             80/256                        [43,23,18,10,5,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
16    1             48/256                        [16,11,10,10,14,12,11,3,6,2,5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
16    2             96/512                        [41,20,14,8,1,8,0,5,0,1,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
16    3             144/768                       [46,23,12,3,6,7,1,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
16    4             192/1024                      [47,21,8,6,7,4,3,2,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
32    1             112/1024                      [28,11,7,5,7,4,9,5,2,3,1,3,0,1,0,2,0,2,3,1,1,0,0,0,2,2,1,0,0,0]
32    2             224/2048                      [45,21,11,2,1,3,3,4,5,3,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
32    3             336/3072                      [101,61,14,6,2,4,2,1,2,3,2,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
32    4             448/4096                      [56,33,1,4,1,2,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
64    1             256/4096                      [10,22,27,14,1,1,0,1,0,0,2,0,0,2,3,0,3,3,3,4,0,0,2,0,2,0,0,0,0,0]
64    2             512/8192                      [19,32,18,18,8,3,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0]
Reading the first row: in 19/100 trials, Hyperband chose the rank 1 (best) curve; in 20/100 trials, Hyperband chose the rank 3 curve.
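Two numbers are easy to read off this table: how often the chosen run lands in the top k, and how much resource is saved relative to grid search. The sketch below computes both for the two 16-run rows with halving frequency 1 and 2, copied from the table. The helper names are hypothetical.

```python
def top_k_rate(histogram, k):
    """Fraction of trials in which the chosen run was ranked k or better."""
    return sum(histogram[:k]) / sum(histogram)

def resource_saved(used, grid):
    """Fraction of the grid-search resource that was not spent."""
    return 1.0 - used / grid

# Rows "16, 1, 48/256" and "16, 2, 96/512" from the table (trailing zeros padded to 30 ranks)
row_16_1 = [16, 11, 10, 10, 14, 12, 11, 3, 6, 2, 5] + [0] * 19
row_16_2 = [41, 20, 14, 8, 1, 8, 0, 5, 0, 1, 0, 2] + [0] * 18

print(top_k_rate(row_16_1, 3), resource_saved(48, 256))
print(top_k_rate(row_16_2, 3), resource_saved(96, 512))
```

Both rows use the same fraction of the grid-search resource; the difference is in how often a top-ranked run is chosen.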
Summary
• Found relatively robust settings for Hyperband on NMT learning curves
• Next: try on different NMT architectures and incorporate speed/accuracy multi-objective
• (Next meeting: September rather than August? Currently doing summer workshop on Domain Adaptation for NMT)
JHU HLTCOE SCALE 2018 Workshop: Resilient Machine Translation for New Domains
Kevin Duh, Paul McNamee, Kathy Baker, Philipp Koehn, Brian Thompson, Chris Callison-Burch, Jan Niehues, Marine Carpuat, Tim Anderson, Jeremy Gwinnup, Marianna Martindale, Jenn Drexler, Calandra Moore, Steven Bradtke, James Woo, Gaurav Kumar, Huda Khayrallah, Pamela Shapiro, Becky Marvin, Jonathan Weese, Dusan Varis
Final Presentation: Baltimore, August 9 (save the date!)
Goal: Improve Domain Adaptation of NMT
[Diagram: Large General-Domain Bitext → (1. Train) → GENERAL MODEL → (2. Initialize) → ADAPTED MODEL ← (3. Continue Training) ← In-Domain Bitext, e.g. patents]
Test       Training data for NMT   Ar-En  De-En  Fa-En  Ko-En  Ru-En  Zh-En
TED Talks  General Domain          29.6   34.6   22.2   11.6   23.4   15.9
TED Talks  In-Domain (TED)         27.4   32.3   21.3   14.4   22.9   16.2
TED Talks  Continued Training      35.4   39.9   27.9   17.2   28.6   20.4
Patent     General Domain          n/a    36.0   n/a    2.7    23.4   12.6
Patent     In-Domain (Patent)      n/a    61.9   n/a    29.9   26.9   40.2
Patent     Continued Training      n/a    62.3   n/a    31.7   37.0   43.7
BLEU scores: Continued Training gives consistent gains (~0.5-5 BLEU)
Suggestions from Michael
• Quantify resources saved:
  – what percentage of resources can we save vs grid search while achieving similar models
• What were the bad examples in the histogram curve?
• Pull Request for Curriculum Learning code
September 2018 discussion
Summary so far
With bandit learning, we can save X% of resources while achieving less than Y degradation in BLEU. (Here, X=81%, Y=0)
Open question:
- Results for drastically different architectures
Other directions
• Methods for speeding up training (i.e. inner-loop of hyperparameter optimization)
  – Bandit learning
  – Data sub-selection for training speedup
• Methods for speeding up models or reducing resource usage during inference in general
  – Model compression
Data subset selection: Formulation
• Training data T: N samples
• Can we select subset S of M << N samples
  – Where training on subset gives same hyperparameter search recommendations as training on full set?
• Formulation:
  1. Train K models with different hyperparameters on T
  2. Similarly train the K models on subset S
  3. Compare the ranking of (1) and (2). If same, then data subset is a good surrogate
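Step 3 asks whether the two rankings are the same. When they are not identical, a rank correlation gives a graded answer. The sketch below uses Kendall's tau (the tau-a variant, with tied pairs counted as neither concordant nor discordant). This is one possible choice of measure, not one named in the slides. The inputs are the final scores (e.g. BLEU) of the same K models trained on T and on S.

```python
from itertools import combinations

def kendall_tau(scores_full, scores_subset):
    """Kendall tau-a between two score lists over the same K models (K >= 2)."""
    concordant = discordant = 0
    for i, j in combinations(range(len(scores_full)), 2):
        # Positive product: both settings order models i and j the same way
        s = (scores_full[i] - scores_full[j]) * (scores_subset[i] - scores_subset[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    pairs = len(scores_full) * (len(scores_full) - 1) // 2
    return (concordant - discordant) / pairs

print(kendall_tau([1, 2, 3], [10, 20, 30]))  # identical ranking
print(kendall_tau([1, 2, 3], [1, 3, 2]))     # one swapped pair
```

A value of 1 means the subset reproduces the full-data ranking exactly, and -1 means it reverses it.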
Data subset selection: Details
• Baseline 1: Train on T as usual, with up to same training time as M * #epoch
• Baseline 2: Flip the subset selection criteria
• Subset selection method:
  – Cynical data selection
  – Vocabulary based selection
• Evaluation:
  – How to interpret ranking differences?
Model compression
• Focus more on inference resource constraints
• Existing ideas to explore:
  – Model Distillation
  – Quantization
  – (Discussion)
• Compare speed, memory footprint
• Integrate this in larger auto-tuning loop
Discussion notes (with Michael)
• Data subset selection:
  – It'd be good to have all the plots – informative whether results are good or bad
  – For vocab selection: currently something similar is done in unit tests. (Replace most vocab with unk)
• Model distillation:
  – Train big model and translate training data. Train small model and then continue training on big model's outputs. This may be sufficient (no need for output distribution as I originally imagined)
• Quantization:
  – Michael will help look for pointers on quantization work in MXNet
• Next: exploring a new orthogonal direction for speeding up hyperparameter search
Data Subset Selection for NMT Hyperparameter Search
Kevin Duh
Motivation
• It takes time to train models to convergence on large datasets
• Question: Can we train models to convergence on a small dataset?
  – E.g. in a paper a long time ago, LeCun suggests fiddling with the learning rate on a small subset first
  – What subset leads to fast convergence?
  – Does the ranking of hyperparameters on small subset correlate with that on the full dataset?
Data subset selection: Formulation
• Training data T: N samples
• Can we select subset S of M << N samples
  – Where training on subset gives same hyperparameter search recommendations as training on full set?
• Formulation:
  1. Train K models with different hyperparameters on T
  2. Similarly train the K models on subset S
  3. Compare the ranking of (1) and (2). If same, then data subset is a good surrogate
Preliminary Experiments
• Big: Model trained on 28 million sentence pairs of general-domain DE-EN
  – Approx 300k-500k updates to converge
• Small: Model trained on randomly selected 10% of data
  – Approx 150k-300k updates (30-60 hours) to converge
• Vocab: Model trained on sentences containing only the top 1/256 vocabulary
  – Approx 100k-200k updates to converge
→ Vary #layers, size, etc. and see if ranking is the same for Big vs Small and Big vs Vocab
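The "Vocab" condition can be illustrated with a small filter that keeps only sentences whose tokens all fall in the most frequent fraction of the vocabulary. This is a sketch under several assumptions: tokens are whitespace-separated, only one side of the bitext is checked, and frequency is counted over the same corpus. The actual selection procedure is not spelled out in the slides, and `select_by_vocabulary` is a hypothetical name.

```python
from collections import Counter

def select_by_vocabulary(sentences, keep_fraction=1 / 256):
    """Keep sentences made up entirely of the top `keep_fraction` most frequent word types."""
    counts = Counter(tok for s in sentences for tok in s.split())
    n_keep = max(1, int(len(counts) * keep_fraction))  # always keep at least one type
    allowed = {tok for tok, _ in counts.most_common(n_keep)}
    return [s for s in sentences if all(tok in allowed for tok in s.split())]

# Toy corpus with 3 word types; keep_fraction=0.34 keeps only the most frequent one
print(select_by_vocabulary(["the cat", "the the", "the dog", "the"], keep_fraction=0.34))
```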
Data subset selection by vocabulary
LSTM vs Transformer, Layer=1,2,4
Changing learning rates
Connections to Bandit Learning
• Bandit: Stop some runs before convergence
• Data Selection: Shorter time to convergence
• These are both heuristics on early stopping some models during hyperparameter search
• Next step:
  – Collect more empirical results
  – Experiment with other data selection methods
Discussion notes (with Michael)
• Plot all results on same figure
• Try even smaller datasets and see when ranking starts to break
• Experiment on at least one more dataset
• As wrap-up, find general recommendations: based on some dataset characteristic, what speed-up method to use?