H2O Random Grid Search - PyData Amsterdam
![Page 2: H2O Random Grid Search - PyData Amsterdam](https://reader036.fdocuments.in/reader036/viewer/2022062316/589b53771a28ab4a398b6e75/html5/thumbnails/2.jpg)
WHO AM I
• Customer Data Scientist at H2O.ai
• Background
  o Telecom (Virgin Media)
  o Data Science Platform (Domino Data Lab)
  o Water Engineering + Machine Learning Research (STREAM Industrial Doctorate Centre)
![Page 3: H2O Random Grid Search - PyData Amsterdam](https://reader036.fdocuments.in/reader036/viewer/2022062316/589b53771a28ab4a398b6e75/html5/thumbnails/3.jpg)
ABOUT H2O
• Company
  o Team: 50 (45 shown)
  o Founded in 2012, Mountain View, California.
  o Venture capital backed
• Products
  o Open-source machine learning platform.
  o Flow (Web), R, Python, Spark, Hadoop interfaces.
![Page 4: H2O Random Grid Search - PyData Amsterdam](https://reader036.fdocuments.in/reader036/viewer/2022062316/589b53771a28ab4a398b6e75/html5/thumbnails/4.jpg)
ABOUT THIS TALK
• Story of a baker and a data scientist
  o Why you should care
• Hyper-parameters optimization
  o Common techniques
  o H2O Python API
• Other H2O features for streamlining workflow
![Page 5: H2O Random Grid Search - PyData Amsterdam](https://reader036.fdocuments.in/reader036/viewer/2022062316/589b53771a28ab4a398b6e75/html5/thumbnails/5.jpg)
STORY OF A BAKER
• Making a cake
  o Source
    • Ingredients
  o Process:
    • Mixing
    • Baking
    • Decorating
  o End product
    • A nice looking cake
Credit: www.dphotographer.co.uk/image/201305/baking_a_cake
![Page 6: H2O Random Grid Search - PyData Amsterdam](https://reader036.fdocuments.in/reader036/viewer/2022062316/589b53771a28ab4a398b6e75/html5/thumbnails/6.jpg)
STORY OF A DATA SCIENTIST
• Making a data product
  o Source
    • Raw data
  o Process:
    • Data munging
    • Analyzing / Modeling
    • Reporting
  o End product
    • Apps, graphs or reports
Credit: www.simranjindal.com
![Page 7: H2O Random Grid Search - PyData Amsterdam](https://reader036.fdocuments.in/reader036/viewer/2022062316/589b53771a28ab4a398b6e75/html5/thumbnails/7.jpg)
BAKER AND DATA SCIENTIST
• What do they have in common?
  o Process is important to bakers and data scientists. Yet, most customers do not appreciate the effort.
  o Most customers only care about raw materials quality and end products.
![Page 8: H2O Random Grid Search - PyData Amsterdam](https://reader036.fdocuments.in/reader036/viewer/2022062316/589b53771a28ab4a398b6e75/html5/thumbnails/8.jpg)
WHY YOU SHOULD CARE
• We can use machine/software to automate some laborious tasks.
• We can spend more time on quality assurance and presentation.
• This talk is about making one specific task, hyper-parameters tuning, more efficient.
![Page 9: H2O Random Grid Search - PyData Amsterdam](https://reader036.fdocuments.in/reader036/viewer/2022062316/589b53771a28ab4a398b6e75/html5/thumbnails/9.jpg)
HYPER-PARAMETERS OPTIMISATION
• Overview
  o Optimizing an algorithm’s performance.
    • e.g. Random Forest, Gradient Boosting Machine (GBM)
  o Trying different sets of hyper-parameters within a defined search space.
  o No rules of thumb.
![Page 10: H2O Random Grid Search - PyData Amsterdam](https://reader036.fdocuments.in/reader036/viewer/2022062316/589b53771a28ab4a398b6e75/html5/thumbnails/10.jpg)
HYPER-PARAMETERS OPTIMISATION
• Example of hyper-parameters in H2O
  o Random Forest:
    • No. of trees, depth of trees, sample rate…
  o Gradient Boosting Machine (GBM):
    • No. of trees, depth of trees, learning rate, sample rate…
  o Deep Learning:
    • Activation, hidden layer sizes, L1, L2, dropout ratios…
![Page 11: H2O Random Grid Search - PyData Amsterdam](https://reader036.fdocuments.in/reader036/viewer/2022062316/589b53771a28ab4a398b6e75/html5/thumbnails/11.jpg)
COMMON TECHNIQUES
• Manual search
  o Tuning by hand - inefficient
  o Expert opinion (not always reliable)
• Grid search
  o Automated search within a defined space
  o Computationally expensive
• Random grid search
  o More efficient than manual / grid search
  o Equal performance in less time
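The difference in cost between grid search and random grid search can be sketched in plain Python. This is only an illustration: the search space, its values, and the helper names `full_grid` and `random_grid` are assumptions made for this example and are not part of H2O.

```python
import itertools
import random

# Hypothetical search space for a tree-based model (names and values are illustrative).
search_space = {
    "ntrees": [50, 100, 200],
    "max_depth": [3, 5, 7, 9],
    "learn_rate": [0.01, 0.05, 0.1],
    "sample_rate": [0.6, 0.8, 1.0],
}


def full_grid(space):
    """Enumerate every combination in the Cartesian grid (what grid search trains)."""
    names = list(space)
    return [dict(zip(names, values)) for values in itertools.product(*space.values())]


def random_grid(space, max_models, seed=None):
    """Sample up to `max_models` distinct combinations from the same grid."""
    rng = random.Random(seed)
    combos = full_grid(space)
    return rng.sample(combos, min(max_models, len(combos)))


all_combos = full_grid(search_space)
sampled = random_grid(search_space, max_models=10, seed=1234)

print(len(all_combos), "models to train for a full grid search")
print(len(sampled), "models to train for a random grid search")
```

Grid search has to train one model per combination, so the cost multiplies with every hyper-parameter added. Random grid search caps the number of models up front and spends that budget on combinations drawn from across the whole space.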
![Page 12: H2O Random Grid Search - PyData Amsterdam](https://reader036.fdocuments.in/reader036/viewer/2022062316/589b53771a28ab4a398b6e75/html5/thumbnails/12.jpg)
RANDOM GRID SEARCH – DOES IT WORK?
• Random Search for Hyper-Parameter Optimization
  o Journal of Machine Learning Research (2012)
  o James Bergstra and Yoshua Bengio
  o “Compared with deep belief networks configured by a thoughtful combination of manual search and grid search, purely random search found statistically equal performance on four of seven data sets, and superior performance on one of seven.”
![Page 13: H2O Random Grid Search - PyData Amsterdam](https://reader036.fdocuments.in/reader036/viewer/2022062316/589b53771a28ab4a398b6e75/html5/thumbnails/13.jpg)
RELATED FEATURE – EARLY STOPPING
• A technique for regularization.
• Avoid over-fitting the training set.
• Useful when combined with hyper-parameter search:
  o Additional controls (e.g. time constraint, tolerance)
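A minimal sketch of early stopping with a tolerance and a time constraint, in plain Python. The rule used here (stop when the best validation error of the last few rounds fails to improve on the earlier best by a relative tolerance) is an assumption chosen for illustration; H2O's own stopping rule may differ in detail. The function names and the simulated error values are hypothetical.

```python
import time


def should_stop(history, stopping_rounds=3, tolerance=0.001):
    """Return True when the last `stopping_rounds` scores bring no real improvement.

    `history` holds validation errors, one per training round (lower is better).
    """
    if len(history) <= stopping_rounds:
        return False
    best_before = min(history[:-stopping_rounds])
    best_recent = min(history[-stopping_rounds:])
    if best_before == 0:
        return True  # a zero error cannot be improved on
    # Relative improvement of the recent window over everything seen before it.
    improvement = (best_before - best_recent) / best_before
    return improvement < tolerance


def train_with_early_stopping(validation_errors, stopping_rounds=3,
                              tolerance=0.001, max_runtime_secs=None):
    """Consume per-round validation errors and return the number of rounds used."""
    start = time.monotonic()
    history = []
    for error in validation_errors:
        history.append(error)
        if should_stop(history, stopping_rounds, tolerance):
            break  # tolerance-based control
        if max_runtime_secs is not None and time.monotonic() - start > max_runtime_secs:
            break  # time-constraint control
    return len(history)


# Simulated validation errors: fast gains at first, then a plateau.
simulated = [0.50, 0.40, 0.35, 0.33, 0.329, 0.3295, 0.3291, 0.3293, 0.3292, 0.3290]
rounds_used = train_with_early_stopping(simulated)
print("stopped after", rounds_used, "of", len(simulated), "rounds")
```

During a hyper-parameter search the same controls limit how much time is spent on each candidate model, so weak candidates are abandoned early instead of training for the full number of rounds.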
![Page 14: H2O Random Grid Search - PyData Amsterdam](https://reader036.fdocuments.in/reader036/viewer/2022062316/589b53771a28ab4a398b6e75/html5/thumbnails/14.jpg)
H2O RANDOM GRID SEARCH
• Objectives
  o Optimize model performance based on evaluation metric.
  o Explore the defined search space randomly.
  o Use early-stopping for regularization and additional controls.
![Page 15: H2O Random Grid Search - PyData Amsterdam](https://reader036.fdocuments.in/reader036/viewer/2022062316/589b53771a28ab4a398b6e75/html5/thumbnails/15.jpg)
RANDOM GRID SEARCH (PYTHON API)
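The code shown on this slide did not survive in the transcript. As a stand-in, here is a hedged sketch of a random grid search over a GBM with the H2O Python API. It is not the speaker's original code: the candidate values, budgets, seed, and the helper name `run_random_grid` are all assumptions, and the sketch assumes a binary classification problem so that AUC is a valid metric.

```python
# Search space: each key is a GBM hyper-parameter, each value a list of candidates.
# The values are illustrative assumptions, not recommendations.
hyper_params = {
    "max_depth": [3, 5, 7, 9],
    "learn_rate": [0.01, 0.05, 0.1],
    "sample_rate": [0.6, 0.8, 1.0],
    "col_sample_rate": [0.6, 0.8, 1.0],
}

# Random search strategy plus controls on the search as a whole.
search_criteria = {
    "strategy": "RandomDiscrete",  # sample combinations instead of enumerating them
    "max_models": 20,              # stop after this many models
    "max_runtime_secs": 600,       # or after this much time
    "stopping_metric": "AUC",      # stop the search when the best models plateau
    "stopping_rounds": 5,
    "stopping_tolerance": 0.001,
    "seed": 1234,
}


def run_random_grid(train, valid, x, y):
    """Train a GBM random grid on H2OFrames and return (best_model, sorted_grid).

    Hypothetical helper: `train` and `valid` are H2OFrames, `x` is a list of
    predictor column names and `y` is the name of a categorical response column.
    """
    # Imported lazily so the configuration above can be read without H2O installed.
    from h2o.estimators.gbm import H2OGradientBoostingEstimator
    from h2o.grid.grid_search import H2OGridSearch

    grid = H2OGridSearch(
        model=H2OGradientBoostingEstimator,
        hyper_params=hyper_params,
        search_criteria=search_criteria,
    )
    # Extra keyword arguments are fixed model parameters shared by every model;
    # the stopping_* values here apply early stopping inside each individual model.
    grid.train(
        x=x, y=y, training_frame=train, validation_frame=valid,
        ntrees=500, stopping_rounds=3, stopping_metric="AUC",
        stopping_tolerance=0.001, seed=1234,
    )
    sorted_grid = grid.get_grid(sort_by="auc", decreasing=True)
    best_model = sorted_grid.models[0]
    return best_model, sorted_grid
```

To use it, start a cluster with `h2o.init()`, load data into an H2OFrame, split it into training and validation frames, and pass them in along with the column names. Early stopping appears at two levels in this sketch: in `search_criteria` it ends the whole search, and in `grid.train` it ends the training of each model.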
![Page 16: H2O Random Grid Search - PyData Amsterdam](https://reader036.fdocuments.in/reader036/viewer/2022062316/589b53771a28ab4a398b6e75/html5/thumbnails/16.jpg)
RANDOM GRID SEARCH
• Outputs
  o Best model based on metric
  o A set of hyper-parameters for the best model
• Other APIs
  o R, REST, Java (see documentation on GitHub)
![Page 17: H2O Random Grid Search - PyData Amsterdam](https://reader036.fdocuments.in/reader036/viewer/2022062316/589b53771a28ab4a398b6e75/html5/thumbnails/17.jpg)
OTHER H2O FEATURES
• h2oEnsemble
  o Better predictive performance
• Sparkling Water = Spark + H2O
• Plain Old Java Object (POJO)
  o Productionize H2O models
![Page 18: H2O Random Grid Search - PyData Amsterdam](https://reader036.fdocuments.in/reader036/viewer/2022062316/589b53771a28ab4a398b6e75/html5/thumbnails/18.jpg)
CONCLUSIONS
• Most people only care about the end product.
• Use H2O random grid search to save time on hyper-parameters tuning.
• Spend more time on quality assurance and presentation.
![Page 19: H2O Random Grid Search - PyData Amsterdam](https://reader036.fdocuments.in/reader036/viewer/2022062316/589b53771a28ab4a398b6e75/html5/thumbnails/19.jpg)
CONCLUSIONS
• H2O Random Grid Search
  o An efficient way to tune hyper-parameters
  o APIs for Python, R, Java, REST
  o Do check out the code examples on GitHub
• Combine with other H2O features
  o Streamline data science workflow
![Page 20: H2O Random Grid Search - PyData Amsterdam](https://reader036.fdocuments.in/reader036/viewer/2022062316/589b53771a28ab4a398b6e75/html5/thumbnails/20.jpg)
ACKNOWLEDGEMENTS
• GoDataDriven
• Conference sponsors
• My colleagues at H2O.ai
![Page 21: H2O Random Grid Search - PyData Amsterdam](https://reader036.fdocuments.in/reader036/viewer/2022062316/589b53771a28ab4a398b6e75/html5/thumbnails/21.jpg)
THANK YOU
• Resources
  o Slides + code – github.com/h2oai/h2o-meetups
  o Download H2O – www.h2o.ai
  o Documentation – www.h2o.ai/docs/
  o [email protected]
• We are hiring – www.h2o.ai/careers/