Model selection and resampling methods · 2021. 4. 8. · century Adventures of Baron Munchausen,...

14
Model selection and resampling methods MSc Data Science 2 nd sememster

Transcript of Model selection and resampling methods · 2021. 4. 8. · century Adventures of Baron Munchausen,...

Page 1: Model selection and resampling methods · 2021. 4. 8. · century Adventures of Baron Munchausen, by Rudolf Erich Raspe. Efron & Tibshirani An Introduction to Bootstrap, 1993 Key

Modelselectionandresamplingmethods

MScDataScience2ndsememster

Page 2: Model selection and resampling methods · 2021. 4. 8. · century Adventures of Baron Munchausen, by Rudolf Erich Raspe. Efron & Tibshirani An Introduction to Bootstrap, 1993 Key

Audience

Minimalbackgroundinmathematicsandstatistic-analysisandcalculus(integral,derivatives,studyoffunctions,…)-basicstatisticalconcepts(expectation,median,covariance,distributions,…)

Minimalknowledgeonstatisticalmodeling

-regression-expectation,variance/covariance,statisticaldescriptors,…

BasicexpertisewithPythonandJupyterNotebook-installingnewpackages-writingbasiccodeandrunningpipelines-knowledgeofstandardlibraries(numpy,pandas,scikit-learn)

Page 3: Model selection and resampling methods · 2021. 4. 8. · century Adventures of Baron Munchausen, by Rudolf Erich Raspe. Efron & Tibshirani An Introduction to Bootstrap, 1993 Key

Thecourse

BasedonlessonsandnotebooksAdditionalreadingmaterialandreferencesareprovidedateachlessonAllthematerialavailableatthecoursewebsite

https://marcolorenzi.github.io/teaching.html

10lessonsWhatisexpectedfromyou:•  Bi-weeklyassignments:

4intotal,10points

•  Finalexam:10points

Page 4: Model selection and resampling methods · 2021. 4. 8. · century Adventures of Baron Munchausen, by Rudolf Erich Raspe. Efron & Tibshirani An Introduction to Bootstrap, 1993 Key

Whymodelselection?

source:https://machinelearningmastery.com

Page 5: Model selection and resampling methods · 2021. 4. 8. · century Adventures of Baron Munchausen, by Rudolf Erich Raspe. Efron & Tibshirani An Introduction to Bootstrap, 1993 Key

Thecommondenominator

Data

Hypothesis

Fancywaytocombinedataaccordingtohypothesis

Answer:prediction,

datarelationship,…

Page 6: Model selection and resampling methods · 2021. 4. 8. · century Adventures of Baron Munchausen, by Rudolf Erich Raspe. Efron & Tibshirani An Introduction to Bootstrap, 1993 Key

MachineLearningandPigeons

“…aclockisarrangedtopresentthefoodhopperatregularintervalswithnoreferencewhatsoevertothebird'sbehavior”.-Onebirdwasconditionedtoturncounter-clockwiseaboutthecage-Anotherrepeatedlythrustitsheadintooneoftheuppercornersofthecage-Athirddevelopeda'tossing’responseasifplacingitsheadbeneathaninvisiblebarandliftingitrepeatedly-Twobirdsdevelopedapendulummotionoftheheadandbody-Anotherbirdwasconditionedtomakeincompletepeckingorbrushingmovementsdirectedtowardbutnottouchingthefloor

Thebirdhappenstobeexecutingsomeresponseasthehopperappears;asaresultittendstorepeatthisresponse

‘Superstition’inthePigeon

B.F.SkinnerJournalofExperimentalPsychology#38,1947

Page 7: Model selection and resampling methods · 2021. 4. 8. · century Adventures of Baron Munchausen, by Rudolf Erich Raspe. Efron & Tibshirani An Introduction to Bootstrap, 1993 Key

Modelscanbesuperstitious(andbehavelikepigeons)

Theytendtoenhanceabehavior(prediction)whenapositiveresponse(data)ismet.The“enhancing”abilitydependsontheirassumptions:

Page 8: Model selection and resampling methods · 2021. 4. 8. · century Adventures of Baron Munchausen, by Rudolf Erich Raspe. Efron & Tibshirani An Introduction to Bootstrap, 1993 Key

AmodelisnottrueAmodelprovidesanopinion: -basedonsomesenseofrealityundertheformofdata

-basedonitsownassumptions

Thejobofadatascientististodeterminewhetheranopinioncanbetrusted

Whendifferentopinionsareavailable,thisiscalled

modelselection

Protagoras,wikipedia.org

Relativism “… there is no absolute evaluation of the nature… because theevaluation will be relative to who is perceiving it. Therefore, toPersonX,theweatheriscold,whereastoPersonY,theweatherishot.Thisphilosophyimpliesthattherearenoabsolute"truths".Thetruth,according toProtagoras, is relative,anddiffersaccording toeachindividual”

Page 9: Model selection and resampling methods · 2021. 4. 8. · century Adventures of Baron Munchausen, by Rudolf Erich Raspe. Efron & Tibshirani An Introduction to Bootstrap, 1993 Key

Whymodelselection(2)Tacklingawhatevermachinelearningproblem(regression,classifications,…)1-Writingsomecodeimplementingamodelidea2-Gettingthedatafromsomerepository3-Trainingthemodel:

3a.Bugfixes,parametershacking3b.Useonalloronsomepartofthedata,cross-validating,collectingresults,…

4-Repeatpoint3many(many)times.Trying,trying,trying,…5-Onceeverythingworksreasonably,fixingalltheparametersandprocessingsteps6-Publishing/reportingtheresults

Page 10: Model selection and resampling methods · 2021. 4. 8. · century Adventures of Baron Munchausen, by Rudolf Erich Raspe. Efron & Tibshirani An Introduction to Bootstrap, 1993 Key

Whymodelselection(2)

Mendelsonetal.NeuroImage:Clinical,2017

Acasestudy:automatedclinicaldiagnosisofAlzheimer’sdisease

Page 11: Model selection and resampling methods · 2021. 4. 8. · century Adventures of Baron Munchausen, by Rudolf Erich Raspe. Efron & Tibshirani An Introduction to Bootstrap, 1993 Key

Standardapproachestomodelselection

-Empirical(part1)JackknifeBootstrapCross-validationStep-wisecomparison-Theoretical(part2)InformationCriteriaBayesianmodelcomparison

Page 12: Model selection and resampling methods · 2021. 4. 8. · century Adventures of Baron Munchausen, by Rudolf Erich Raspe. Efron & Tibshirani An Introduction to Bootstrap, 1993 Key

Part1.Afocusonresamplingmethods.

“Pullyourselfupbyyourbootstraps”…widelythoughttobebasedononeoftheeighteenthcenturyAdventuresofBaronMunchausen,byRudolfErichRaspe.

Efron&TibshiraniAnIntroductiontoBootstrap,1993

Keyconceptofdataresampling.Doingourbestoutoftheavailableresources.

Page 13: Model selection and resampling methods · 2021. 4. 8. · century Adventures of Baron Munchausen, by Rudolf Erich Raspe. Efron & Tibshirani An Introduction to Bootstrap, 1993 Key

Attheendofthecourseyouwillbeableto

•  Criticallyassesstheperformanceofthemodelonaspecifiedtask

•  Identifyandpreventthesourcesofassessmentbias

•  Createyourownbenchmarkforavarietyofmodelingproblem

•  Identifymodelingalternativesandevaluationstrategies

•  Visualizeandpresentperformancesacrossmodels

•  Understandthebasisoftheoreticalapproachestomodelselection

Page 14: Model selection and resampling methods · 2021. 4. 8. · century Adventures of Baron Munchausen, by Rudolf Erich Raspe. Efron & Tibshirani An Introduction to Bootstrap, 1993 Key

Questions?