Introduction to Machine Learning Summer School, June 18 – June 29, 2018, Chicago
Instructor: Suriya Gunasekar, TTI Chicago
20 June 2018

Day 3: Classification, logistic regression
Topics so far

• Supervised learning, linear regression
• Yesterday
  o Overfitting
  o Ridge and lasso regression
  o Gradient descent
• Today
  o Bias-variance trade-off
  o Classification
  o Logistic regression
  o Regularization for logistic regression
  o Classification metrics
Bias-variance tradeoff
Empirical vs population loss

• Population distribution: let $(x, y) \sim \mathcal{D}$
• We have
  o Loss function $\ell(\hat{y}, y)$
  o Hypothesis class $\mathcal{H}$
  o Training data $S = \{(x^{(i)}, y^{(i)}) : i = 1, 2, \ldots, N\} \sim_{iid} \mathcal{D}^N$
    § Think of $S$ as a random variable
• What we really want: $f \in \mathcal{H}$ to minimize the population loss
  $L_{\mathcal{D}}(f) \triangleq \mathbf{E}_{\mathcal{D}}[\ell(f(x), y)] = \sum_{(x,y)} \ell(f(x), y) \Pr(x, y)$
• ERM minimizes the empirical loss
  $L_S(f) \triangleq \hat{\mathbf{E}}_S[\ell(f(x), y)] = \frac{1}{N} \sum_{i=1}^{N} \ell(f(x^{(i)}), y^{(i)})$
• e.g., $\Pr(x) = \mathrm{uniform}(0,1)$, $y = w^* \cdot x + \epsilon$ where $\epsilon \sim \mathcal{N}(0, 0.1)$
  $\Rightarrow \Pr(y \mid x) = \mathcal{N}(w^* \cdot x,\, 0.1)$ and $\Pr(x, y) = \Pr(x)\Pr(y \mid x)$
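The example above can be simulated directly. A minimal sketch in plain Python (the true weight $w^* = 2$ and the sample size are illustrative choices not in the slides, and the noise is taken as $\mathcal{N}(0, 0.1^2)$): for an $f$ chosen independently of the sample, the empirical loss approaches the population loss, here the noise level $\mathbf{E}[\epsilon^2] = 0.01$.

```python
import random

random.seed(0)
W_STAR, NOISE_STD = 2.0, 0.1  # hypothetical true weight and noise level

def sample(n):
    """Draw n points with x ~ uniform(0,1), y = w*.x + eps, eps ~ N(0, 0.1^2)."""
    xs = [random.random() for _ in range(n)]
    ys = [W_STAR * x + random.gauss(0.0, NOISE_STD) for x in xs]
    return xs, ys

def empirical_loss(f, xs, ys):
    """L_S(f) = (1/N) sum of squared losses over the sample S."""
    return sum((f(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

f = lambda x: W_STAR * x          # a fixed f, independent of the sample
xs, ys = sample(100_000)
print(empirical_loss(f, xs, ys))  # approaches the population loss E[eps^2] = 0.01
```

With a large sample, $L_S(f)$ concentrates around $L_{\mathcal{D}}(f)$; the point of the next slide is that this breaks once $f$ is chosen using $S$ itself.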
Empirical vs population loss

$L_{\mathcal{D}}(f) \triangleq \mathbf{E}_{\mathcal{D}}[\ell(f(x), y)] = \sum_{(x,y)} \ell(f(x), y) \Pr(x, y)$

$L_S(f) \triangleq \hat{\mathbf{E}}_S[\ell(f(x), y)] = \frac{1}{N} \sum_{i=1}^{N} \ell(f(x^{(i)}), y^{(i)})$

• $\hat{f}_S$ from some model overfits to $S$ if there is $f^* \in \mathcal{H}$ with $\hat{\mathbf{E}}_S[\ell(\hat{f}_S(x), y)] \le \hat{\mathbf{E}}_S[\ell(f^*(x), y)]$ but $\mathbf{E}_{\mathcal{D}}[\ell(\hat{f}_S(x), y)] \gg \mathbf{E}_{\mathcal{D}}[\ell(f^*(x), y)]$
• If $f$ is independent of $S_{train}$, then both $L_{S_{train}}(f)$ and $L_{S_{test}}(f)$ are good approximations of $L_{\mathcal{D}}(f)$
• But generally, $\hat{f}$ depends on $S_{train}$. Why?
  o $L_{S_{train}}(\hat{f}_{S_{train}})$ is no longer a good approximation of $L_{\mathcal{D}}(\hat{f})$
  o $L_{S_{test}}(\hat{f}_{S_{train}})$ is still a good approximation of $L_{\mathcal{D}}(\hat{f})$, since $\hat{f}_{S_{train}}$ is independent of $S_{test}$
Optimum unrestricted predictor

• Consider the population squared loss
  $\arg\min_{f \in \mathcal{H}} L(f) \triangleq \mathbf{E}_{\mathcal{D}}[\ell(f(x), y)] = \mathbf{E}_{(x,y)}[(f(x) - y)^2]$
• Say $\mathcal{H}$ is unrestricted – any function $f : x \to y$ is allowed
  $L(f) = \mathbf{E}_{(x,y)}[(f(x) - y)^2] = \mathbf{E}_x\big[\mathbf{E}_y[(f(x) - y)^2 \mid x]\big]$
  $= \mathbf{E}_x\big[\mathbf{E}_y[(f(x) - \mathbf{E}_y[y \mid x] + \mathbf{E}_y[y \mid x] - y)^2 \mid x]\big]$
  $= \mathbf{E}_x\big[\mathbf{E}_y[(f(x) - \mathbf{E}_y[y \mid x])^2 \mid x]\big] + \mathbf{E}_x\big[\mathbf{E}_y[(\mathbf{E}_y[y \mid x] - y)^2 \mid x]\big] + 2\,\mathbf{E}_x\big[\mathbf{E}_y[(f(x) - \mathbf{E}_y[y \mid x])(\mathbf{E}_y[y \mid x] - y) \mid x]\big]$
  The cross term is 0: $f(x) - \mathbf{E}_y[y \mid x]$ is not a function of $y$, so it factors out of $\mathbf{E}_y[\cdot \mid x]$, and $\mathbf{E}_y[\mathbf{E}_y[y \mid x] - y \mid x] = 0$. Hence
  $L(f) = \mathbf{E}_x[(f(x) - \mathbf{E}_y[y \mid x])^2] + \mathbf{E}_{x,y}[(\mathbf{E}_y[y \mid x] - y)^2]$
  The second term is irreducible noise; the first is minimized (and equals 0) for $f = \mathbf{E}_y[y \mid x]$
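This can be checked numerically on a small discrete joint distribution (the distribution below is a made-up example, not from the lecture): for the population squared loss, no predictor in a grid of alternatives beats $f(x) = \mathbf{E}[y \mid x]$.

```python
# A tiny joint distribution Pr(x, y) over x in {0,1}, y in {0,1,2} (made-up numbers).
pr = {(0, 0): 0.2, (0, 1): 0.2, (1, 1): 0.1, (1, 2): 0.5}

def population_loss(f):
    """E_{(x,y)}[(f(x) - y)^2] under the joint distribution pr; f is a dict x -> f(x)."""
    return sum(p * (f[x] - y) ** 2 for (x, y), p in pr.items())

def cond_mean(x):
    """f**(x) = E[y | x]."""
    px = sum(p for (xx, _), p in pr.items() if xx == x)
    return sum(p * y for (xx, y), p in pr.items() if xx == x) / px

f_star = {x: cond_mean(x) for x in (0, 1)}
# Compare against a grid of alternative predictors: none does better.
best_alt = min(
    population_loss({0: a, 1: b})
    for a in [i / 10 for i in range(21)]
    for b in [i / 10 for i in range(21)]
)
print(population_loss(f_star), best_alt)
```

The conditional mean attains the minimum of the first term in the decomposition, so its population loss is exactly the irreducible noise.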
Bias-variance decomposition

• Best unrestricted predictor: $f^{**}(x) = \mathbf{E}_y[y \mid x]$
• $L(f_S) = \mathbf{E}_x[(f_S(x) - f^{**}(x))^2] + \mathbf{E}_{x,y}[(f^{**}(x) - y)^2]$
• $\mathbf{E}_S L(f_S) = \mathbf{E}_S \mathbf{E}_x[(f_S(x) - f^{**}(x))^2] + \mathit{noise}$
  $\mathbf{E}_S \mathbf{E}_x[(f_S(x) - f^{**}(x))^2] = \mathbf{E}_x\big[\mathbf{E}_S[(f_S(x) - f^{**}(x))^2 \mid x]\big]$
  $= \mathbf{E}_x \mathbf{E}_S\big[(f_S(x) - \mathbf{E}_S f_S(x) + \mathbf{E}_S f_S(x) - f^{**}(x))^2 \mid x\big]$
  $= \mathbf{E}_x \mathbf{E}_S\big[(f_S(x) - \mathbf{E}_S f_S(x))^2 \mid x\big] + \mathbf{E}_x\big[(\mathbf{E}_S f_S(x) - f^{**}(x))^2\big] + 2\,\mathbf{E}_x\big[(\mathbf{E}_S f_S(x) - f^{**}(x))\,\mathbf{E}_S[f_S(x) - \mathbf{E}_S f_S(x) \mid x]\big]$
  where the cross term is 0 since $\mathbf{E}_S[f_S(x) - \mathbf{E}_S f_S(x)] = 0$
  $= \mathbf{E}_{S,x}[(f_S(x) - \mathbf{E}_S f_S(x))^2] + \mathbf{E}_x[(\mathbf{E}_S f_S(x) - f^{**}(x))^2]$
• Altogether:
  $\mathbf{E}_S L(f_S) = \underbrace{\mathbf{E}_{S,x}[(f_S(x) - \mathbf{E}_S f_S(x))^2]}_{\text{variance}} + \underbrace{\mathbf{E}_x[(\mathbf{E}_S f_S(x) - f^{**}(x))^2]}_{\text{bias}^2} + \underbrace{\mathbf{E}_{x,y}[(f^{**}(x) - y)^2]}_{\text{noise}}$
Bias-variance tradeoff

$\mathbf{E}_S L(f_S) = \underbrace{\mathbf{E}_{S,x}[(f_S(x) - \mathbf{E}_S f_S(x))^2]}_{\text{variance}} + \underbrace{\mathbf{E}_x[(\mathbf{E}_S f_S(x) - f^{**}(x))^2]}_{\text{bias}^2} + \underbrace{\mathbf{E}_{x,y}[(f^{**}(x) - y)^2]}_{\text{noise}}$

• $f_S \in \mathcal{H}$
• noise is irreducible
• variance can be reduced by
  o getting more data
  o making $f_S$ less sensitive to $S$
    § fewer candidates in $\mathcal{H}$ to choose from → less variance
    § reducing the "complexity" of the model class $\mathcal{H}$ decreases variance
• $\text{bias}^2 \ge \min_{f \in \mathcal{H}} \mathbf{E}_x[(f(x) - f^{**}(x))^2]$
    § expanding the model class $\mathcal{H}$ decreases bias
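The tradeoff can be seen in a small Monte Carlo simulation. This sketch (the data-generating process $f^{**}(x) = x^2$, the noise level, and the two model classes are illustrative choices, not from the slides) estimates $\text{bias}^2$ and variance by refitting on many random training sets:

```python
import random

random.seed(1)
TRUE_F = lambda x: x * x          # hypothetical f**(x) = E[y|x]
XS = [i / 10 for i in range(11)]  # evaluation grid standing in for E_x

def fit_constant(data):
    """Model class H0 = {x -> c}: least squares picks the mean label."""
    c = sum(y for _, y in data) / len(data)
    return lambda x, c=c: c

def fit_line(data):
    """Model class H1 = {x -> a*x + b}: closed-form least squares."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    sxx = sum((x - mx) ** 2 for x, _ in data) or 1.0
    a = sum((x - mx) * (y - my) for x, y in data) / sxx
    return lambda x, a=a, b=my - a * mx: a * x + b

def bias2_variance(fit, trials=500, n=10, noise=0.3):
    """Monte Carlo estimate of bias^2 and variance over random training sets S."""
    preds = []
    for _ in range(trials):
        data = []
        for _ in range(n):
            x = random.random()
            data.append((x, TRUE_F(x) + random.gauss(0.0, noise)))
        f = fit(data)
        preds.append([f(x) for x in XS])
    mean = [sum(p[j] for p in preds) / trials for j in range(len(XS))]
    bias2 = sum((mean[j] - TRUE_F(x)) ** 2 for j, x in enumerate(XS)) / len(XS)
    var = sum(sum((p[j] - mean[j]) ** 2 for p in preds) / trials
              for j in range(len(XS))) / len(XS)
    return bias2, var

b0, v0 = bias2_variance(fit_constant)
b1, v1 = bias2_variance(fit_line)
print(f"constant: bias^2={b0:.3f} var={v0:.3f}   line: bias^2={b1:.3f} var={v1:.3f}")
```

The richer class $\mathcal{H}_1$ shows markedly lower bias, while the fitted line typically varies more across training sets than the fitted constant, illustrating the tradeoff.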
Model complexity

• Reducing the complexity of the model class $\mathcal{H}$ decreases variance
• Expanding the model class $\mathcal{H}$ decreases bias
• Complexity ≈ number of choices in $\mathcal{H}$
  o For any loss $L$, for all $f \in \mathcal{H}$, with probability greater than $1 - \delta$,
    $L(f) \le L_S(f) + \sqrt{\dfrac{\log|\mathcal{H}| + \log\frac{1}{\delta}}{N}}$
  o many other variants for infinite-cardinality classes
  o often the bounds are loose
• Complexity ≈ number of degrees of freedom
  o e.g., number of parameters to estimate
  o more data ⇒ can fit more complex models
• Is $\mathcal{H}_1 = \{\boldsymbol{x} \to w_0 + \boldsymbol{w}_1 \cdot \boldsymbol{x} - \boldsymbol{w}_2 \cdot \boldsymbol{x}\}$ more complex than $\mathcal{H}_2 = \{\boldsymbol{x} \to w_0 + \boldsymbol{w}_1 \cdot \boldsymbol{x}\}$?
• What we really need is how many different "behaviors" we can get on the same $S$
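The finite-class bound above is easy to evaluate numerically; a small sketch (the values $|\mathcal{H}| = 1000$, $\delta = 0.05$, and the two sample sizes are arbitrary illustrative choices):

```python
import math

def generalization_gap(h_size, delta, n):
    """The slack term sqrt((log|H| + log(1/delta)) / N) from the bound above."""
    return math.sqrt((math.log(h_size) + math.log(1.0 / delta)) / n)

# With 10x more data the gap shrinks by sqrt(10), as the 1/sqrt(N) rate predicts.
gap_small = generalization_gap(1000, 0.05, 1_000)
gap_large = generalization_gap(1000, 0.05, 10_000)
print(gap_small, gap_large)
```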
Summary

• Overfitting
  o What is overfitting?
  o How to detect overfitting?
  o Avoiding overfitting using model selection
• Bias-variance tradeoff
Classification

• Supervised learning: estimate a mapping $f$ from input $x \in \mathcal{X}$ to output $y \in \mathcal{Y}$
  o Regression: $\mathcal{Y} = \mathbb{R}$ or other continuous variables
  o Classification: $\mathcal{Y}$ takes a discrete set of values
    § Examples:
      q $\mathcal{Y} = \{\text{spam}, \text{no spam}\}$
      q digits (labels, not numeric values): $\mathcal{Y} = \{0, 1, 2, \ldots, 9\}$
• Many successful applications of ML in vision, speech, NLP, healthcare
Classification vs Regression

• Label values do not have meaning
  o $\mathcal{Y} = \{\text{spam}, \text{no spam}\}$ or $\mathcal{Y} = \{0, 1\}$ or $\mathcal{Y} = \{-1, 1\}$
• Ordering of labels does not matter (for the most part)
  o $f(x) = $ "0" when $y = $ "1" is as bad as $f(x) = $ "9" when $y = $ "1"
• Often $f(x)$ does not return labels $y$
  o e.g., in binary classification with $\mathcal{Y} = \{-1, 1\}$ we often estimate $f : \mathcal{X} \to \mathbb{R}$ and then post-process to get $\hat{y}(f(x)) = \mathbf{1}[f(x) \ge 0]$
  o mainly for computational reasons
    § remember, we need to solve $\min_{f \in \mathcal{H}} \sum_i \ell(f(x^{(i)}), y^{(i)})$
    § discrete values → combinatorial problems → hard to solve
  o more generally, $\mathcal{H} \subset \{f : \mathcal{X} \to \mathbb{R}\}$ and loss $\ell : \mathbb{R} \times \mathcal{Y} \to \mathbb{R}$
    § compare to regression, where typically $\mathcal{H} \subset \{f : \mathcal{X} \to \mathcal{Y}\}$ and loss $\ell : \mathcal{Y} \times \mathcal{Y} \to \mathbb{R}$
Non-parametric classifiers
Nearest Neighbor (NN) classifier

• Training data $S = \{(x^{(i)}, y^{(i)}) : i = 1, 2, \ldots, N\}$
• Want to predict the label of a new point $x$
• Nearest neighbor rule
  o Find the closest training point: $i^* = \arg\min_i \rho(x, x^{(i)})$
  o Predict the label of $x$ as $\hat{y}(x) = y^{(i^*)}$
• Computation
  o Training time: do nothing
  o Test time: search the training set for a nearest neighbor

Figure credit: Nati Srebro
Nearest Neighbor (NN) classifier

• Where is the main model?
  o $i^* = \arg\min_i \rho(x, x^{(i)})$
  o What is the right "distance" between images? Between sound waves? Between sentences?
  o Often $\rho(x, x') = \|\phi(x) - \phi(x')\|_2$ or other norms, e.g. $\|x - x'\|_1$

[Figure: $\|\phi(x) - \phi(x')\|_2$ with $\phi(\boldsymbol{x}) = (5x_1, x_2)$, and $\|x - x'\|_1$]

Slide credit: Nati Srebro
k-Nearest Neighbor (kNN) classifier

• Training data $S = \{(x^{(i)}, y^{(i)}) : i = 1, 2, \ldots, N\}$
• Want to predict the label of a new point $x$
• k-nearest neighbor rule
  o Find the $k$ closest training points: $i_1^*, i_2^*, \ldots, i_k^*$
  o Predict the label of $x$ as $\hat{y}(x) = \mathrm{majority}(y^{(i_1^*)}, y^{(i_2^*)}, \ldots, y^{(i_k^*)})$
• Computation
  o Training time: do nothing
  o Test time: search the training set for the k nearest neighbors
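The rule above fits in a few lines of plain Python; a minimal sketch with squared-Euclidean $\rho$ and a made-up 2-D training set:

```python
from collections import Counter

def knn_predict(train, x, k):
    """k-NN rule: sort training points by distance to x, majority-vote the top k."""
    dist = lambda p, q: sum((a - b) ** 2 for a, b in zip(p, q))  # squared Euclidean
    neighbors = sorted(train, key=lambda xy: dist(xy[0], x))[:k]
    votes = Counter(y for _, y in neighbors)
    return votes.most_common(1)[0][0]

# Toy training set: label +1 near the origin, -1 near (5, 5).
S = [((0, 0), +1), ((1, 0), +1), ((0, 1), +1),
     ((5, 5), -1), ((4, 5), -1), ((5, 4), -1)]
print(knn_predict(S, (0.5, 0.5), k=3))   # predicts +1
print(knn_predict(S, (4.5, 4.8), k=3))   # predicts -1
```

Note the "do nothing at training time / search at test time" structure: all work happens inside `knn_predict`.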
k-Nearest Neighbor

• Advantages
  o no training
  o universal approximator – non-parametric
• Disadvantages
  o not scalable
    § test-time memory requirement
    § test-time computation
  o easily overfits with small data
Training vs test error

• 1-NN
  o Training error? 0
  o Test error? Depends on $\Pr(x, y)$
• k-NN
  o Training error: can be greater than 0
  o Test error: again depends on $\Pr(x, y)$

Figure credit: Nati Srebro
k-Nearest Neighbor: data fit / complexity tradeoff

[Figure: decision regions for k = 1, 5, 12, 50, 100, 200 on the same training set S, alongside the reference predictor h*.]

Slide credit: Nati Srebro
Space partition

• kNN partitions $\mathcal{X}$ (or $\mathbb{R}^d$) into regions of +1 and -1
• What about discrete-valued features $x$?
• Even for continuous $x$, can we get more structured partitions?
  o easy to describe
    § e.g., $R_2 = \{x : x_1 < t_1 \text{ and } x_2 > t_2\}$
  o reduces degrees of freedom
• Any non-overlapping partition using only (hyper)rectangles → representable by a tree

Figure credit: Greg Shaknarovich
Decision trees

• Focus on binary trees (trees with at most two children at each node)
• How to create trees?
• What is a "good" tree?
  o A measure of "purity" at each leaf node, where each leaf node corresponds to a region $R_i$:
    $\mathrm{purity}(tree) = \sum_i |\#\text{blue at } R_i - \#\text{red at } R_i|$

There are various metrics of (im)purity used in practice, but the rough idea is the same.
Decision trees

• How to create trees?
• Training data $S = \{(x^{(i)}, y^{(i)}) : i = 1, 2, \ldots, N\}$, where $y^{(i)} \in \{\text{blue}, \text{red}\}$
• At each point, $\mathrm{purity}(tree) = \sum_{leaf} |\#\text{blue at leaf} - \#\text{red at leaf}|$
• Start with all data at the root
  o only one $leaf = root$. What is $\mathrm{purity}(tree)$?
• Create a split based on a rule that increases the "purity" of the tree
  o How complex can the rules be?
• Repeat

• When to stop? What is the complexity of a DT?
  o Limit the number of leaf nodes
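One greedy step of this procedure can be sketched in plain Python (the candidate rules here are axis-aligned thresholds as in the earlier space-partition slide, and the data points are made up for illustration):

```python
def purity(leaves):
    """purity(tree) = sum over leaves of |#blue - #red|."""
    return sum(abs(sum(1 for y in leaf if y == "blue")
                   - sum(1 for y in leaf if y == "red")) for leaf in leaves)

def best_split(data, feature):
    """Try axis-aligned rules x[feature] < t and keep the purest split."""
    best = None
    for t in sorted({x[feature] for x, _ in data}):
        left = [y for x, y in data if x[feature] < t]
        right = [y for x, y in data if x[feature] >= t]
        score = purity([left, right])
        if best is None or score > best[0]:
            best = (score, t)
    return best  # (purity after the split, threshold)

# Toy data: blue points have small x1, red points large x1.
data = [((0.1, 0.9), "blue"), ((0.2, 0.3), "blue"), ((0.3, 0.8), "blue"),
        ((0.7, 0.2), "red"), ((0.8, 0.9), "red"), ((0.9, 0.1), "red")]
root_purity = purity([[y for _, y in data]])  # root is fully mixed: purity 0
score, t = best_split(data, feature=0)
print(root_purity, score, t)                  # the split x1 < 0.7 separates perfectly
```

Growing a full tree repeats `best_split` recursively on each resulting leaf until a stopping rule (e.g., a leaf-count limit) is hit.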
Decision trees

• Advantages
  o interpretable
  o easy to deal with non-numeric features
  o natural extensions to multi-class, multi-label
• Disadvantages
  o not scalable
  o hard (non-smooth) decisions
  o often overfits in spite of regularization
• Check the CART implementation in scikit-learn
Parametric classifiers

• What is the equivalent of linear regression?
  o something easy to train
  o something easy to use at test time
• $f(\boldsymbol{x}) = f_{\boldsymbol{w}}(\boldsymbol{x}) = \boldsymbol{w} \cdot \boldsymbol{x} + w_0$
• $\mathcal{H} = \{f_{\boldsymbol{w}} = \boldsymbol{x} \to \boldsymbol{w} \cdot \boldsymbol{x} + w_0 : \boldsymbol{w} \in \mathbb{R}^d, w_0 \in \mathbb{R}\}$
• but $f(\boldsymbol{x}) \notin \{-1, 1\}$! How do we get labels?
  o reasonable choice: $\hat{y}(\boldsymbol{x}) = 1$ if $f_{\hat{\boldsymbol{w}}}(\boldsymbol{x}) \ge 0$ and $\hat{y}(\boldsymbol{x}) = -1$ otherwise
  o linear classifier: $\hat{y}(\boldsymbol{x}) = \operatorname{sign}(\hat{\boldsymbol{w}} \cdot \boldsymbol{x} + \hat{w}_0)$
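The thresholding rule can be written directly; a minimal sketch (the weights are arbitrary stand-ins for learned parameters):

```python
def linear_classify(w, w0, x):
    """y_hat(x) = sign(w . x + w0), mapping the real-valued score to {-1, +1}."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + w0
    return 1 if score >= 0 else -1

w, w0 = [2.0, -1.0], 0.5  # hypothetical learned parameters
print(linear_classify(w, w0, [1.0, 1.0]))   # score 1.5  -> +1
print(linear_classify(w, w0, [-1.0, 1.0]))  # score -2.5 -> -1
```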
Parametric classifiers

• $\mathcal{H} = \{f_{\boldsymbol{w}} = \boldsymbol{x} \to \boldsymbol{w} \cdot \boldsymbol{x} + w_0 : \boldsymbol{w} \in \mathbb{R}^d, w_0 \in \mathbb{R}\}$
• $\hat{y}(\boldsymbol{x}) = \operatorname{sign}(\hat{\boldsymbol{w}} \cdot \boldsymbol{x} + \hat{w}_0)$
• $\hat{\boldsymbol{w}} \cdot \boldsymbol{x} + \hat{w}_0 = 0$ is the (linear) decision boundary, or separating hyperplane
  o it separates $\mathbb{R}^d$ into two halfspaces (regions): $\hat{\boldsymbol{w}} \cdot \boldsymbol{x} + \hat{w}_0 > 0$ and $\hat{\boldsymbol{w}} \cdot \boldsymbol{x} + \hat{w}_0 < 0$
• more generally, $\hat{y}(\boldsymbol{x}) = \operatorname{sign}(\hat{f}(\boldsymbol{x}))$ → the decision boundary is $\hat{f}(\boldsymbol{x}) = 0$
Linear classifier
Classification vs Regression (recap)

• (Same points as the "Classification vs Regression" slide above: label values lack meaning, ordering does not matter, and we typically estimate a real-valued score $f : \mathcal{X} \to \mathbb{R}$.)

What if we ignore the above and solve classification using regression?
Classification as regression

• Binary classification: $\mathcal{Y} = \{-1, 1\}$ and $\mathcal{X} \subseteq \mathbb{R}^d$
• Treat it as regression with squared loss, say linear regression
  o Training data $S = \{(\boldsymbol{x}^{(i)}, y^{(i)}) : i = 1, 2, \ldots, N\}$
  o ERM: $\hat{\boldsymbol{w}}, \hat{w}_0 = \arg\min_{\boldsymbol{w}, w_0} \sum_i (\boldsymbol{w} \cdot \boldsymbol{x}^{(i)} + w_0 - y^{(i)})^2$
Classification as regression

[Figures: 1-D examples of $\hat{y}(x) = \operatorname{sign}(wx + w_0)$ with regions $\hat{y} = +1$ and $\hat{y} = -1$. A point $y$ can be classified correctly by $\hat{y}(x) = \operatorname{sign}(w \cdot x)$, yet the squared loss $(w \cdot x + 1)^2$ will be high when the score is far from the target label.]

Example credit: Greg Shaknarovich
Surrogate losses

• The correct loss to use is the 0-1 loss after thresholding:
  $\ell_{01}(f(x), y) = \mathbf{1}[\operatorname{sign}(f(x)) \ne y] = \mathbf{1}[f(x)\,y < 0]$

[Figure: $\ell(f(x), y)$ plotted against $f(x)\,y$]
Surrogate losses

• The correct loss to use is the 0-1 loss after thresholding:
  $\ell_{01}(f(x), y) = \mathbf{1}[\operatorname{sign}(f(x)) \ne y] = \mathbf{1}[f(x)\,y < 0]$
• Linear regression uses $\ell_{sq}(f(x), y) = (f(x) - y)^2$
• Why not do ERM over $\ell_{01}(f(x), y)$ directly?
  o non-continuous, non-convex

[Figure: $\ell(f(x), y)$ plotted against $f(x)\,y$]
Surrogate losses

• Hard to optimize over $\ell_{01}$, so find another loss $\ell(\hat{y}, y)$
  o Convex (for any fixed $y$) → easier to minimize
  o An upper bound of $\ell_{01}$ → small $\ell$ ⇒ small $\ell_{01}$
• Both are satisfied by the squared loss → but it has "large" loss even when $\ell_{01}(\hat{y}, y) = 0$
• Two more surrogate losses in this course:
  o Logistic loss $\ell_{log}(\hat{y}, y) = \log(1 + \exp(-\hat{y}\,y))$ (TODAY)
  o Hinge loss $\ell_{hinge}(\hat{y}, y) = \max(0, 1 - \hat{y}\,y)$ (TOMORROW)

[Figure: $\ell(f(x), y)$ plotted against $f(x)\,y$]
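These losses can be compared numerically at a few margins $f(x)\,y$ (a sketch following the formulas above; note that with the natural log the logistic loss upper-bounds $\ell_{01}$ only after rescaling, e.g. taking the log in base 2, while the hinge loss upper-bounds it as written):

```python
import math

def loss_01(margin):        # l_01 = 1[f(x)y < 0]
    return 1.0 if margin < 0 else 0.0

def loss_logistic(margin):  # l_log = log(1 + exp(-f(x)y))
    return math.log(1.0 + math.exp(-margin))

def loss_hinge(margin):     # l_hinge = max(0, 1 - f(x)y)
    return max(0.0, 1.0 - margin)

for m in (-2.0, -0.5, 0.0, 0.5, 3.0):
    print(f"margin={m:+.1f}  0-1={loss_01(m):.2f}  "
          f"logistic={loss_logistic(m):.3f}  hinge={loss_hinge(m):.2f}")
```

Unlike the squared loss, both surrogates keep shrinking as the margin grows, so confidently correct predictions are not penalized.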
Logistic Regression
Logistic regression: ERM on a surrogate loss

• $S = \{(\boldsymbol{x}^{(i)}, y^{(i)}) : i = 1, 2, \ldots, N\}$, $\mathcal{X} = \mathbb{R}^d$, $\mathcal{Y} = \{-1, 1\}$
• Linear model $f(\boldsymbol{x}) = f_{\boldsymbol{w}}(\boldsymbol{x}) = \boldsymbol{w} \cdot \boldsymbol{x} + w_0$
• Minimize the training loss
  $\hat{\boldsymbol{w}}, \hat{w}_0 = \arg\min_{\boldsymbol{w}, w_0} \sum_i \log\big(1 + \exp(-(\boldsymbol{w} \cdot \boldsymbol{x}^{(i)} + w_0)\, y^{(i)})\big)$
• Output classifier $\hat{y}(\boldsymbol{x}) = \operatorname{sign}(\hat{\boldsymbol{w}} \cdot \boldsymbol{x} + \hat{w}_0)$

Logistic loss: $\ell(f(x), y) = \log(1 + \exp(-f(x)\,y))$
Logistic regression

$\hat{\boldsymbol{w}}, \hat{w}_0 = \arg\min_{\boldsymbol{w}, w_0} \sum_i \log\big(1 + \exp(-(\boldsymbol{w} \cdot \boldsymbol{x}^{(i)} + w_0)\, y^{(i)})\big)$

• Learns a linear decision boundary
  o $\{\boldsymbol{x} : \boldsymbol{w} \cdot \boldsymbol{x} + w_0 = 0\}$ is a hyperplane in $\mathbb{R}^d$ – the decision boundary
  o it divides $\mathbb{R}^d$ into two halfspaces (regions)
  o $\{\boldsymbol{x} : \boldsymbol{w} \cdot \boldsymbol{x} + w_0 \ge 0\}$ gets label $+1$ and $\{\boldsymbol{x} : \boldsymbol{w} \cdot \boldsymbol{x} + w_0 < 0\}$ gets label $-1$
• Maps $\boldsymbol{x}$ to a 1-D coordinate $x' = \dfrac{\boldsymbol{w} \cdot \boldsymbol{x} + w_0}{\|\boldsymbol{w}\|}$

Figure credit: Greg Shaknarovich
Logistic Regression

$\hat{\boldsymbol{w}}, \hat{w}_0 = \arg\min_{\boldsymbol{w}, w_0} \sum_i \log\big(1 + \exp(-(\boldsymbol{w} \cdot \boldsymbol{x}^{(i)} + w_0)\, y^{(i)})\big)$

• Convex optimization problem
• Can solve using gradient descent
• Can also add the usual regularization: $\ell_2$, $\ell_1$
  o More details in the next session
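Since the objective is convex, plain gradient descent works. A minimal pure-Python sketch on a tiny separable dataset (the step size, iteration count, and data points are illustrative choices, not from the lecture):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(data, d, lr=0.1, steps=3000):
    """Gradient descent on sum_i log(1 + exp(-(w.x_i + w0) y_i))."""
    w, w0 = [0.0] * d, 0.0
    for _ in range(steps):
        gw, g0 = [0.0] * d, 0.0
        for x, y in data:
            score = sum(wi * xi for wi, xi in zip(w, x)) + w0
            coef = -y * sigmoid(-score * y)  # d/dscore of log(1 + exp(-score*y))
            for j in range(d):
                gw[j] += coef * x[j]
            g0 += coef
        w = [wi - lr * gi for wi, gi in zip(w, gw)]
        w0 -= lr * g0
    return w, w0

# Toy 2-D data (made up): +1 for large x1, -1 for small x1.
data = [((2.0, 1.0), 1), ((3.0, 0.5), 1), ((-2.0, 1.0), -1), ((-3.0, 0.2), -1)]
w, w0 = fit_logistic(data, d=2)
predict = lambda x: 1 if sum(wi * xi for wi, xi in zip(w, x)) + w0 >= 0 else -1
print([predict(x) for x, _ in data])  # separable data: all training points correct
```

Adding $\ell_2$ regularization, as the slide suggests, only adds a $2\lambda w_j$ term to each gradient component; the next session covers the details.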