CPSC 340: Machine Learning and Data Mining Least Squares Fall 2020

Transcript of CPSC 340 Lecture 12 slides (fwood/CS340/lectures/L12.pdf, 2020-09-06).

Page 1: CPSC 340: Machine Learning and Data Mining, Least Squares, Fall 2020

CPSC 340: Machine Learning and Data Mining

Least Squares, Fall 2020

Page 2:

Admin
• Assignment 3 is up:
– Start early, this is usually the longest assignment.

• We're going to start using calculus and linear algebra a lot.
– You should start reviewing these ASAP if you are rusty.
– A review of relevant calculus concepts is here.
– A review of relevant linear algebra concepts is here.

Page 3:

Supervised Learning Round 2: Regression
• We're going to revisit supervised learning:

• Previously, we considered classification:
– We assumed yi was discrete: yi = 'spam' or yi = 'not spam'.

• Now we're going to consider regression:
– We allow yi to be numerical: yi = 10.34cm.

Page 4:

Example: Dependent vs. Explanatory Variables
• We want to discover relationship between numerical variables:
– Does number of lung cancer deaths change with number of cigarettes?
– Does number of skin cancer deaths change with latitude?

http://www.cvgs.k12.va.us:81/digstats/main/inferant/d_regrs.html
https://onlinecourses.science.psu.edu/stat501/node/11

Page 5:

Example: Dependent vs. Explanatory Variables
• We want to discover relationship between numerical variables:
– Do people in big cities walk faster?
– Is the universe expanding or shrinking or staying the same size?

http://hosting.astro.cornell.edu/academics/courses/astro201/hubbles_law.htm
https://www.nature.com/articles/259557a0.pdf

Page 6:

Example: Dependent vs. Explanatory Variables
• We want to discover relationship between numerical variables:
– Does number of gun deaths change with gun ownership?
– Does number of violent crimes change with violent video games?

http://www.vox.com/2015/10/3/9444417/gun-violence-united-states-america
https://www.soundandvision.com/content/violence-and-video-games

Page 7:

Example: Dependent vs. Explanatory Variables
• We want to discover relationship between numerical variables:

– Does higher gender equality index lead to more women STEM grads?

• Note that we're doing supervised learning:
– Trying to predict value of 1 variable (the 'yi' values), instead of measuring correlation between 2.

• Supervised learning does not give causality:
– OK: "Higher index is correlated with lower grad %".
– OK: "Higher index helps predict lower grad %".
– BAD: "Higher index leads to lower grad %".

• People/media get these confused all the time, be careful!
• There are lots of potential reasons for this correlation.

https://www.weforum.org/agenda/2018/02/does-gender-equality-result-in-fewer-female-stem-grads/

Page 8:

Handling Numerical Labels
• One way to handle numerical yi: discretize.
– E.g., for 'age' we could use {'age ≤ 20', '20 < age ≤ 30', 'age > 30'}.
– Now we can apply methods for classification to do regression.
– But coarse discretization loses resolution.
– And fine discretization requires lots of data.

• There exist regression versions of classification methods:
– Regression trees, probabilistic models, non-parametric models.

• Today: one of oldest, but still most popular/important methods:
– Linear regression based on squared error.
– Interpretable and the building block for more-complex methods.

Page 9:

Linear Regression in 1 Dimension
• Assume we only have 1 feature (d = 1):
– E.g., xi is number of cigarettes and yi is number of lung cancer deaths.

• Linear regression makes predictions ŷi using a linear function of xi: ŷi = w xi.

• The parameter 'w' is the weight or regression coefficient of xi.
– We're temporarily ignoring the y-intercept.

• As xi changes, slope 'w' affects the rate that ŷi increases/decreases:
– Positive 'w': ŷi increases as xi increases.
– Negative 'w': ŷi decreases as xi increases.
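A minimal sketch of this prediction rule, with a made-up weight and feature values:

```python
def predict(w, x):
    """Return y_hat = w * x_i for each feature value x_i (no y-intercept)."""
    return [w * xi for xi in x]

cigarettes = [1.0, 2.0, 3.0]     # hypothetical x_i values
print(predict(2.5, cigarettes))  # [2.5, 5.0, 7.5]
```

With a positive weight the predictions grow with xi; a negative weight would make them shrink.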

Page 10:

Linear Regression in 1 Dimension

Page 11:

Aside: terminology woes
• Different fields use different terminology and symbols.
– Data points = objects = examples = rows = observations.
– Inputs = predictors = features = explanatory variables = regressors = independent variables = covariates = columns.

– Outputs = outcomes = targets = response variables = dependent variables (also called a "label" if it's categorical).

– Regression coefficients = weights = parameters = betas.

• With linear regression, the symbols are inconsistent too:
– In ML, the data is X and y, and the weights are w.
– In statistics, the data is X and y, and the weights are β.
– In optimization, the data is A and b, and the weights are x.

Page 12:

Least Squares Objective
• Our linear model is given by: ŷi = w xi.

• So we make predictions for a new example by applying the same linear function to its feature.

• But we can't use the same error as before:
– It is unlikely to find a line where ŷi = yi exactly for many points.

• Due to noise, the relationship not being quite linear, or just floating-point issues.
– "Best" model may have |ŷi − yi| small, but not exactly 0.

Page 13:

Least Squares Objective
• Instead of "exact yi", we evaluate "size" of the error in prediction.
• Classic way is setting slope 'w' to minimize sum of squared errors: f(w) = Σi (w xi − yi)².

• There are some justifications for this choice.
– A probabilistic interpretation is coming later in the course.

• But usually, it is done because it is easy to minimize.
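The sum-of-squared-errors objective f(w) = Σi (w xi − yi)² can be sketched as follows (hypothetical data):

```python
def squared_error(w, x, y):
    """Sum of squared errors f(w) = sum_i (w*x_i - y_i)^2 for the 1D linear model."""
    return sum((w * xi - yi) ** 2 for xi, yi in zip(x, y))

x = [1.0, 2.0, 3.0]              # hypothetical features
y = [2.0, 4.0, 6.0]              # exactly y = 2x, so f(2) is 0
print(squared_error(2.0, x, y))  # 0.0
print(squared_error(1.0, x, y))  # (1-2)^2 + (2-4)^2 + (3-6)^2 = 14.0
```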

Page 14:

Least Squares Objective
• Classic way to set slope 'w' is minimizing sum of squared errors:

Page 15:

Least Squares Objective
• Classic way to set slope 'w' is minimizing sum of squared errors:

Page 16:

Minimizing a Differentiable Function
• Math 101 approach to minimizing a differentiable function 'f':

1. Take the derivative of 'f'.
2. Find points 'w' where the derivative f'(w) is equal to 0.
3. Choose the smallest one (and check that f''(w) is positive).
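A quick worked instance of these three steps on a simple one-variable quadratic (a made-up example, not the least squares objective itself):

```latex
f(w) = (w - 3)^2 + 1, \qquad
f'(w) = 2(w - 3) = 0 \;\Rightarrow\; w = 3, \qquad
f''(3) = 2 > 0,
```

so w = 3 is the unique minimizer, with minimum value f(3) = 1.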

Page 17:

Digression: Multiplying by a Positive Constant
• Note that this problem:

• Has the same set of minimizers as this problem:

• And these also have the same minimizers:

• I can multiply 'f' by any positive constant and not change solution.
– Derivative will still be zero at the same locations.
– We'll use this trick a lot!

(Quora trolling on ethics of this)
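A tiny numeric illustration of this trick with a made-up quadratic: minimizing f and (1/2)f over the same grid picks the same w, since c·f'(w) = 0 exactly where f'(w) = 0 for any c > 0.

```python
# Scaling f by a positive constant does not move its minimizer.
f = lambda w: (w - 3.0) ** 2
grid = [i / 10.0 for i in range(61)]   # candidate w values in [0, 6]
argmin_f = min(grid, key=f)
argmin_half_f = min(grid, key=lambda w: 0.5 * f(w))
print(argmin_f, argmin_half_f)  # 3.0 3.0
```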

Page 18:

Finding Least Squares Solution
• Finding 'w' that minimizes sum of squared errors:

Page 19:

Finding Least Squares Solution
• Finding 'w' that minimizes sum of squared errors:

• Let's check that this is a minimizer by checking second derivative:

– Since (anything)² is non-negative and (anything non-zero)² > 0, if we have one non-zero feature then f''(w) > 0 and this is a minimizer.
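Setting the derivative Σi 2xi(w xi − yi) to 0 and solving gives the closed-form slope w = (Σi xi yi)/(Σi xi²); a small sketch with made-up data:

```python
def least_squares_1d(x, y):
    """Closed-form 1D least squares slope: w = (sum x_i*y_i) / (sum x_i^2)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

x = [1.0, 2.0, 3.0]
y = [2.1, 3.9, 6.2]            # roughly y = 2x, with noise
w = least_squares_1d(x, y)

# At the closed-form w, the derivative sum_i 2*x_i*(w*x_i - y_i) is (numerically) zero.
deriv = sum(2 * xi * (w * xi - yi) for xi, yi in zip(x, y))
print(abs(deriv) < 1e-9)  # True
```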

Page 20:

Least Squares Objective/Solution (Another View)

• Least squares minimizes a quadratic that is a sum of quadratics:

Page 21:

(pause)

Page 22:

Motivation: Combining Explanatory Variables
• Smoking is not the only contributor to lung cancer.
– For example, there are environmental factors like exposure to asbestos.

• How can we model the combined effect of smoking and asbestos?
• A simple way is with a 2-dimensional linear function: ŷi = w1 xi1 + w2 xi2.

• We have a weight w1 for feature '1' and w2 for feature '2':

Page 23:

Least Squares in 2-Dimensions
• Linear model:

• This defines a two-dimensional plane.

Page 24:

Least Squares in 2-Dimensions
• Linear model:

• This defines a two-dimensional plane.

• Not just a line!

Page 25:

Different Notations for Least Squares
• If we have 'd' features, the d-dimensional linear model is: ŷi = w1 xi1 + w2 xi2 + … + wd xid.

– In words, our model is that the output is a weighted sum of the inputs.

• We can re-write this in summation notation: ŷi = Σj wj xij.

• We can also re-write this in vector notation: ŷi = wᵀxi.
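A quick NumPy check (with hypothetical numbers) that the summation and vector notations agree:

```python
import numpy as np

w = np.array([0.5, -1.0, 2.0])   # weights for d = 3 features
x_i = np.array([1.0, 2.0, 3.0])  # one example's feature vector

# Summation notation: y_hat_i = sum_j w_j * x_ij
y_hat_sum = sum(w[j] * x_i[j] for j in range(3))
# Vector notation: y_hat_i = w^T x_i
y_hat_vec = w @ x_i

print(y_hat_sum, y_hat_vec)  # 4.5 4.5
```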

Page 26:

Notation Alert (again)
• In this course, all vectors are assumed to be column-vectors:

• So wᵀxi is a scalar:

• So rows of 'X' are actually transpose of column-vectors xi:

Page 27:

Least Squares in d-Dimensions
• The linear least squares model in d-dimensions minimizes: f(w) = Σi (wᵀxi − yi)².

• Dates back to 1801: Gauss used it to predict location of Ceres.
• How do we find the best vector 'w' in 'd' dimensions?
– Can we set the "partial derivative" of each variable to 0?
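In matrix notation this same objective is f(w) = ‖Xw − y‖², which can be sketched as follows (hypothetical data):

```python
import numpy as np

def least_squares_objective(w, X, y):
    """f(w) = ||Xw - y||^2, the sum of squared errors over all n examples."""
    r = X @ w - y
    return float(r @ r)

X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])          # made-up n = 3, d = 2 data
y = np.array([1.0, 2.0, 3.0])
w = np.array([1.0, 2.0])            # here Xw equals y exactly

print(least_squares_objective(w, X, y))  # 0.0
```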

Page 28:

Partial Derivatives

http://msemac.redwoods.edu/~darnold/math50c/matlab/pderiv/index.xhtml

Page 29:

Partial Derivatives

http://msemac.redwoods.edu/~darnold/math50c/matlab/pderiv/index.xhtml

Page 30:

Least Squares Partial Derivatives (1 Example)
• The linear least squares model in d-dimensions for 1 example:

• Computing the partial derivative for variable '1':
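The formulas on this slide were images in the original PDF; a plausible reconstruction from the surrounding text (assuming the 1/2 scaling that the positive-constant digression permits) is:

```latex
f(w) = \tfrac{1}{2}\left(w^\top x_i - y_i\right)^2
     = \tfrac{1}{2}\left(\sum_{k=1}^{d} w_k x_{ik} - y_i\right)^2,
\qquad
\frac{\partial f}{\partial w_1} = x_{i1}\left(w^\top x_i - y_i\right),
```

where the chain rule contributes the factor x_{i1}, the derivative of the inner sum with respect to w_1.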

Page 31:

Least Squares Partial Derivatives ('n' Examples)
• Linear least squares partial derivative for variable 1 on example 'i':

• For a generic variable 'j' we would have:

• And if 'f' is summed over all 'n' examples we would have:

• Unfortunately, the partial derivative for wj depends on all {w1, w2, …, wd}.
– I can't just "set equal to 0 and solve for wj".
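The equations on this slide were also images; reconstructed from the text (with the same 1/2 scaling assumption), the generic-variable and all-examples partial derivatives are:

```latex
\frac{\partial f}{\partial w_j}\bigg|_{\text{example } i}
  = x_{ij}\left(w^\top x_i - y_i\right),
\qquad
\frac{\partial f}{\partial w_j}
  = \sum_{i=1}^{n} x_{ij}\left(w^\top x_i - y_i\right),
```

and since each w^T x_i = Σ_k w_k x_{ik} involves every weight, no single coordinate w_j can be solved for in isolation.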

Page 32:

Gradient and Critical Points in d-Dimensions
• Generalizing "set the derivative to 0 and solve" in d-dimensions:
– Find 'w' where the gradient vector equals the zero vector.

• Gradient is vector with partial derivative 'j' in position 'j':

http://msemac.redwoods.edu/~darnold/math50c/matlab/pderiv/index.xhtml

Page 33:

Gradient and Critical Points in d-Dimensions
• Generalizing "set the derivative to 0 and solve" in d-dimensions:
– Find 'w' where the gradient vector equals the zero vector.

• Gradient is vector with partial derivative 'j' in position 'j':
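Stacking the partial derivatives gives the gradient; for f(w) = ½‖Xw − y‖² this is ∇f(w) = Xᵀ(Xw − y). A small NumPy sketch with a finite-difference sanity check (made-up data):

```python
import numpy as np

def f(w, X, y):
    """f(w) = 0.5 * ||Xw - y||^2."""
    r = X @ w - y
    return 0.5 * float(r @ r)

def grad(w, X, y):
    """Gradient of f: the vector X^T (Xw - y) of all partial derivatives."""
    return X.T @ (X @ w - y)

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])  # hypothetical n = 3, d = 2
y = np.array([1.0, 2.0, 3.0])
w = np.array([0.1, -0.2])

# Check each partial derivative against a centered finite difference.
h = 1e-6
g = grad(w, X, y)
for j in range(2):
    e = np.zeros(2); e[j] = h
    fd = (f(w + e, X, y) - f(w - e, X, y)) / (2 * h)
    print(abs(fd - g[j]) < 1e-4)  # True
```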

http://msemac.redwoods.edu/~darnold/math50c/matlab/pderiv/index.xhtml

Page 34:

Summary
• Regression considers the case of a numerical yi.
• Least squares is a classic method for fitting linear models.
– With 1 feature, it has a simple closed-form solution.
– Can be generalized to 'd' features.

• Gradient is vector containing partial derivatives of all variables.
• Next time: