Machine Learning & Data Mining - Yisong Yue · Recap: Bias-Variance Trade -off 0 20 40 60 80 100...
![Page 1: Machine Learning & Data Mining - Yisong Yue · Recap: Bias-Variance Trade -off 0 20 40 60 80 100 −1 −0.5 0 0.5 1 1.5 0 20 40 60 80 100 0 0.5 1 1.5 0 20 40 60 80 100 −1 −0.5](https://reader034.fdocuments.in/reader034/viewer/2022042309/5ed61b5e300e8232eb7fa977/html5/thumbnails/1.jpg)
Machine Learning & Data Mining
CMS/CS/CNS/EE 155
Lecture 2: Perceptron & Stochastic Gradient Descent
![Page 2: Machine Learning & Data Mining - Yisong Yue · Recap: Bias-Variance Trade -off 0 20 40 60 80 100 −1 −0.5 0 0.5 1 1.5 0 20 40 60 80 100 0 0.5 1 1.5 0 20 40 60 80 100 −1 −0.5](https://reader034.fdocuments.in/reader034/viewer/2022042309/5ed61b5e300e8232eb7fa977/html5/thumbnails/2.jpg)
Recap: Basic Recipe (supervised)
• Training Data: $S = \{(x_i, y_i)\}_{i=1}^N$, with $x \in \mathbb{R}^D$ and $y \in \{-1, +1\}$
• Model Class: $f(x \mid w, b) = w^T x - b$ (Linear Models)
• Loss Function: $L(a, b) = (a - b)^2$ (Squared Loss)
• Learning Objective: $\operatorname{argmin}_{w,b} \sum_{i=1}^N L(y_i, f(x_i \mid w, b))$ (Optimization Problem)
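As a minimal sketch of the recipe above, the learning objective can be evaluated directly in Python. The function names (`predict`, `squared_loss`, `objective`) and the toy data are illustrative assumptions, not part of the lecture.

```python
def predict(w, b, x):
    """Linear model f(x | w, b) = w^T x - b."""
    return sum(wj * xj for wj, xj in zip(w, x)) - b


def squared_loss(a, b):
    """Squared loss L(a, b) = (a - b)^2."""
    return (a - b) ** 2


def objective(w, b, data):
    """Sum of squared losses over the training set S."""
    return sum(squared_loss(y, predict(w, b, x)) for x, y in data)


# Hypothetical toy training set with x in R^2 and y in {-1, +1}.
toy_data = [([1.0, 2.0], +1), ([-1.0, 0.5], -1), ([0.5, -1.5], -1)]

print(objective([0.0, 0.0], 0.0, toy_data))  # 3.0: every prediction is 0
print(objective([1.0, 0.0], 0.0, toy_data))  # 2.25: only the third example is off
```

Learning then amounts to searching over `w` and `b` for the smallest value of `objective`.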
![Page 3: Machine Learning & Data Mining - Yisong Yue · Recap: Bias-Variance Trade -off 0 20 40 60 80 100 −1 −0.5 0 0.5 1 1.5 0 20 40 60 80 100 0 0.5 1 1.5 0 20 40 60 80 100 −1 −0.5](https://reader034.fdocuments.in/reader034/viewer/2022042309/5ed61b5e300e8232eb7fa977/html5/thumbnails/3.jpg)
Recap: Bias-Variance Trade-off
[Figure: three pairs of plots (x-axis 0 to 100), each pair labeled Variance and Bias.]
![Page 4: Machine Learning & Data Mining - Yisong Yue · Recap: Bias-Variance Trade -off 0 20 40 60 80 100 −1 −0.5 0 0.5 1 1.5 0 20 40 60 80 100 0 0.5 1 1.5 0 20 40 60 80 100 −1 −0.5](https://reader034.fdocuments.in/reader034/viewer/2022042309/5ed61b5e300e8232eb7fa977/html5/thumbnails/4.jpg)
Recap: Complete Pipeline
• Training Data: $S = \{(x_i, y_i)\}_{i=1}^N$
• Model Class(es): $f(x \mid w, b) = w^T x - b$
• Loss Function: $L(a, b) = (a - b)^2$
• Learning Objective: $\operatorname{argmin}_{w,b} \sum_{i=1}^N L(y_i, f(x_i \mid w, b))$
• Cross Validation & Model Selection
• Profit!
![Page 5: Machine Learning & Data Mining - Yisong Yue · Recap: Bias-Variance Trade -off 0 20 40 60 80 100 −1 −0.5 0 0.5 1 1.5 0 20 40 60 80 100 0 0.5 1 1.5 0 20 40 60 80 100 −1 −0.5](https://reader034.fdocuments.in/reader034/viewer/2022042309/5ed61b5e300e8232eb7fa977/html5/thumbnails/5.jpg)
Today
• Two Basic Learning Algorithms
• Perceptron Algorithm
• (Stochastic) Gradient Descent
  – Aka, actually solving the optimization problem
![Page 6: Machine Learning & Data Mining - Yisong Yue · Recap: Bias-Variance Trade -off 0 20 40 60 80 100 −1 −0.5 0 0.5 1 1.5 0 20 40 60 80 100 0 0.5 1 1.5 0 20 40 60 80 100 −1 −0.5](https://reader034.fdocuments.in/reader034/viewer/2022042309/5ed61b5e300e8232eb7fa977/html5/thumbnails/6.jpg)
The Perceptron
• One of the earliest learning algorithms
  – 1957 by Frank Rosenblatt
• Still a great algorithm
  – Fast
  – Clean analysis
  – Precursor to Neural Networks
[Image: Frank Rosenblatt with the Mark 1 Perceptron Machine]
![Page 7: Machine Learning & Data Mining - Yisong Yue · Recap: Bias-Variance Trade -off 0 20 40 60 80 100 −1 −0.5 0 0.5 1 1.5 0 20 40 60 80 100 0 0.5 1 1.5 0 20 40 60 80 100 −1 −0.5](https://reader034.fdocuments.in/reader034/viewer/2022042309/5ed61b5e300e8232eb7fa977/html5/thumbnails/7.jpg)
Perceptron Learning Algorithm (Linear Classification Model)
Training Set: $S = \{(x_i, y_i)\}_{i=1}^N$, $y \in \{+1, -1\}$
Model: $f(x \mid w, b) = \operatorname{sign}(w^T x - b)$
• $w^1 = 0$, $b^1 = 0$
• For $t = 1, \dots$
  – Receive example $(x, y)$
  – If $f(x \mid w^t, b^t) = y$
    • $[w^{t+1}, b^{t+1}] = [w^t, b^t]$
  – Else
    • $w^{t+1} = w^t + y x$
    • $b^{t+1} = b^t - y$
Go through the training set in arbitrary order (e.g., randomly).
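The update rule above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: `sign(0)` is treated as -1, the loop stops after the first pass with no mistakes or after `max_epochs` passes, and the function names and toy data are hypothetical.

```python
import random


def predict(w, b, x):
    """f(x | w, b) = sign(w^T x - b); sign(0) is treated as -1 (an assumption)."""
    score = sum(wj * xj for wj, xj in zip(w, x)) - b
    return 1 if score > 0 else -1


def perceptron(data, max_epochs=100, seed=0):
    """Apply perceptron updates until one full pass makes no mistakes."""
    rng = random.Random(seed)
    w, b = [0.0] * len(data[0][0]), 0.0
    mistakes = 0
    order = list(range(len(data)))
    for _ in range(max_epochs):
        rng.shuffle(order)  # arbitrary (here: random) order
        clean_pass = True
        for i in order:
            x, y = data[i]
            if predict(w, b, x) != y:
                w = [wj + y * xj for wj, xj in zip(w, x)]  # w <- w + y x
                b = b - y                                   # b <- b - y
                mistakes += 1
                clean_pass = False
        if clean_pass:
            break
    return w, b, mistakes


# Hypothetical linearly separable toy data.
toy_data = [([2.0, 2.0], +1), ([3.0, 1.0], +1),
            ([-1.0, -1.0], -1), ([-2.0, 0.0], -1)]
w, b, mistakes = perceptron(toy_data)
print(all(predict(w, b, x) == y for x, y in toy_data))
```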
![Page 8: Machine Learning & Data Mining - Yisong Yue · Recap: Bias-Variance Trade -off 0 20 40 60 80 100 −1 −0.5 0 0.5 1 1.5 0 20 40 60 80 100 0 0.5 1 1.5 0 20 40 60 80 100 −1 −0.5](https://reader034.fdocuments.in/reader034/viewer/2022042309/5ed61b5e300e8232eb7fa977/html5/thumbnails/8.jpg)
Aside: Hyperplane Distance
• Line is 1D, Plane is 2D
• Hyperplane is many D
  – Includes Line and Plane
• Defined by $(w, b)$
• Distance: $\frac{|w^T x - b|}{\|w\|}$
• Signed Distance: $\frac{w^T x - b}{\|w\|}$
Linear Model = un-normalized signed distance!
[Figure labels: $w$, $b / \|w\|$]
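As a quick numerical illustration of the two formulas above, the following sketch computes the signed and unsigned distance from a point to the hyperplane $w^T x - b = 0$. The function names and the example numbers are assumptions chosen so the arithmetic is easy to check by hand.

```python
import math


def signed_distance(w, b, x):
    """(w^T x - b) / ||w||: positive on the side that w points toward."""
    norm = math.sqrt(sum(wj * wj for wj in w))
    return (sum(wj * xj for wj, xj in zip(w, x)) - b) / norm


def distance(w, b, x):
    """|w^T x - b| / ||w||."""
    return abs(signed_distance(w, b, x))


w, b = [3.0, 4.0], 5.0                       # ||w|| = 5
print(signed_distance(w, b, [3.0, 4.0]))    # (25 - 5) / 5 = 4.0
print(signed_distance(w, b, [0.0, 0.0]))    # (0 - 5) / 5 = -1.0
print(distance(w, b, [0.0, 0.0]))           # 1.0, which equals b / ||w||
```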
Perceptron Learning
[Animation, slides 9-44: training examples are visited one at a time. Each example is either labeled Correct! (no change) or Misclassified!, in which case the following frames show the Update!. The first run (slides 9-21) ends with All Training Examples Correctly Classified! The run then starts again (slides 22-44) and also ends with All Training Examples Correctly Classified!]
![Page 45: Machine Learning & Data Mining - Yisong Yue · Recap: Bias-Variance Trade -off 0 20 40 60 80 100 −1 −0.5 0 0.5 1 1.5 0 20 40 60 80 100 0 0.5 1 1.5 0 20 40 60 80 100 −1 −0.5](https://reader034.fdocuments.in/reader034/viewer/2022042309/5ed61b5e300e8232eb7fa977/html5/thumbnails/45.jpg)
Recap: Perceptron Learning Algorithm (Linear Classification Model)
Training Set: $S = \{(x_i, y_i)\}_{i=1}^N$, $y \in \{+1, -1\}$
Model: $f(x \mid w, b) = \operatorname{sign}(w^T x - b)$
• $w^1 = 0$, $b^1 = 0$
• For $t = 1, \dots$
  – Receive example $(x, y)$
  – If $f(x \mid w^t, b^t) = y$
    • $[w^{t+1}, b^{t+1}] = [w^t, b^t]$
  – Else
    • $w^{t+1} = w^t + y x$
    • $b^{t+1} = b^t - y$
Go through the training set in arbitrary order (e.g., randomly).
![Page 46: Machine Learning & Data Mining - Yisong Yue · Recap: Bias-Variance Trade -off 0 20 40 60 80 100 −1 −0.5 0 0.5 1 1.5 0 20 40 60 80 100 0 0.5 1 1.5 0 20 40 60 80 100 −1 −0.5](https://reader034.fdocuments.in/reader034/viewer/2022042309/5ed61b5e300e8232eb7fa977/html5/thumbnails/46.jpg)
Comparing the Two Models
![Page 47: Machine Learning & Data Mining - Yisong Yue · Recap: Bias-Variance Trade -off 0 20 40 60 80 100 −1 −0.5 0 0.5 1 1.5 0 20 40 60 80 100 0 0.5 1 1.5 0 20 40 60 80 100 −1 −0.5](https://reader034.fdocuments.in/reader034/viewer/2022042309/5ed61b5e300e8232eb7fa977/html5/thumbnails/47.jpg)
Convergence to Mistake Free = Linearly Separable!
![Page 48: Machine Learning & Data Mining - Yisong Yue · Recap: Bias-Variance Trade -off 0 20 40 60 80 100 −1 −0.5 0 0.5 1 1.5 0 20 40 60 80 100 0 0.5 1 1.5 0 20 40 60 80 100 −1 −0.5](https://reader034.fdocuments.in/reader034/viewer/2022042309/5ed61b5e300e8232eb7fa977/html5/thumbnails/48.jpg)
Margin
$\gamma = \max_w \min_{(x,y)} \frac{y (w^T x)}{\|w\|}$
![Page 49: Machine Learning & Data Mining - Yisong Yue · Recap: Bias-Variance Trade -off 0 20 40 60 80 100 −1 −0.5 0 0.5 1 1.5 0 20 40 60 80 100 0 0.5 1 1.5 0 20 40 60 80 100 −1 −0.5](https://reader034.fdocuments.in/reader034/viewer/2022042309/5ed61b5e300e8232eb7fa977/html5/thumbnails/49.jpg)
Linear Separability
• A classification problem is Linearly Separable:
  – Exists $w$ with perfect classification accuracy
• Separable with Margin $\gamma$: $\gamma = \max_w \min_{(x,y)} \frac{y (w^T x)}{\|w\|}$
• Linearly Separable: $\gamma > 0$
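To make the margin definition concrete, the sketch below evaluates the inner minimum for one candidate $w$. It does not perform the outer maximization over $w$, so the value it returns is only a lower bound on $\gamma$. The helper name `margin_of` and the toy data are assumptions.

```python
import math


def margin_of(w, data):
    """min over (x, y) of y * (w^T x) / ||w|| for one fixed candidate w."""
    norm = math.sqrt(sum(wj * wj for wj in w))
    return min(y * sum(wj * xj for wj, xj in zip(w, x)) / norm
               for x, y in data)


# Hypothetical toy data, separable by a hyperplane through the origin.
data = [([2.0, 2.0], +1), ([3.0, 1.0], +1),
        ([-1.0, -1.0], -1), ([-2.0, 0.0], -1)]

print(margin_of([1.0, 1.0], data))  # about 1.414: this w separates the data
print(margin_of([1.0, 0.0], data))  # 1.0: also separates, with a smaller margin
print(margin_of([0.0, 1.0], data))  # 0.0: the point (-2, 0) lies on the boundary
```

Any candidate with a positive value certifies that the data are linearly separable, since $\gamma$ is at least as large.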
![Page 50: Machine Learning & Data Mining - Yisong Yue · Recap: Bias-Variance Trade -off 0 20 40 60 80 100 −1 −0.5 0 0.5 1 1.5 0 20 40 60 80 100 0 0.5 1 1.5 0 20 40 60 80 100 −1 −0.5](https://reader034.fdocuments.in/reader034/viewer/2022042309/5ed61b5e300e8232eb7fa977/html5/thumbnails/50.jpg)
Perceptron Mistake Bound
#Mistakes Bounded By: $\frac{R^2}{\gamma^2}$ (**If Linearly Separable)
• $\gamma$: Margin
• $R = \max_x \|x\|$: “Radius” of Feature Space
Holds for any ordering of training examples!
More Details: http://www.cs.nyu.edu/~mohri/pub/pmb.pdf
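The bound can be checked on a small example. The sketch below is an illustration under assumptions: it uses a bias-free perceptron (matching the margin definition, which has no $b$), hypothetical toy data, and a hand-picked separator `w_ref`. Because the margin of `w_ref` is only a lower bound on $\gamma$, the number it produces is a valid but possibly loose version of $R^2 / \gamma^2$.

```python
import math


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def perceptron_mistakes(data, max_passes=100):
    """Bias-free perceptron; returns (w, total number of mistakes)."""
    w = [0.0] * len(data[0][0])
    mistakes = 0
    for _ in range(max_passes):
        changed = False
        for x, y in data:
            if y * dot(w, x) <= 0:  # misclassified (ties count as mistakes)
                w = [wj + y * xj for wj, xj in zip(w, x)]
                mistakes += 1
                changed = True
        if not changed:
            break
    return w, mistakes


data = [([2.0, 2.0], +1), ([3.0, 1.0], +1),
        ([-1.0, -1.0], -1), ([-2.0, 0.0], -1)]

R = max(math.sqrt(dot(x, x)) for x, _ in data)   # "radius" of feature space
w_ref = [1.0, 1.0]                               # hypothetical separator
gamma_ref = min(y * dot(w_ref, x) for x, y in data) / math.sqrt(dot(w_ref, w_ref))
bound = R ** 2 / gamma_ref ** 2                  # about 10 / 2 = 5

w, mistakes = perceptron_mistakes(data)
print(mistakes, "<=", bound)
```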
![Page 51: Machine Learning & Data Mining - Yisong Yue · Recap: Bias-Variance Trade -off 0 20 40 60 80 100 −1 −0.5 0 0.5 1 1.5 0 20 40 60 80 100 0 0.5 1 1.5 0 20 40 60 80 100 −1 −0.5](https://reader034.fdocuments.in/reader034/viewer/2022042309/5ed61b5e300e8232eb7fa977/html5/thumbnails/51.jpg)
In the Real World…
• Most problems are NOT linearly separable!
• May never converge…
• So what to do?
• Use validation set!
![Page 52: Machine Learning & Data Mining - Yisong Yue · Recap: Bias-Variance Trade -off 0 20 40 60 80 100 −1 −0.5 0 0.5 1 1.5 0 20 40 60 80 100 0 0.5 1 1.5 0 20 40 60 80 100 −1 −0.5](https://reader034.fdocuments.in/reader034/viewer/2022042309/5ed61b5e300e8232eb7fa977/html5/thumbnails/52.jpg)
Early Stopping via Validation
• Run Perceptron Learning on Training Set
• Evaluate current model on Validation Set
• Terminate when validation accuracy stops improving
https://en.wikipedia.org/wiki/Early_stopping
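One way the three steps above might look in code is sketched here. The details are assumptions rather than the lecture's prescription: validation accuracy is checked once per pass over the training set, the best model seen so far is kept, and training stops after `patience` passes without improvement.

```python
def predict(w, b, x):
    score = sum(wj * xj for wj, xj in zip(w, x)) - b
    return 1 if score > 0 else -1


def accuracy(w, b, data):
    return sum(predict(w, b, x) == y for x, y in data) / len(data)


def perceptron_early_stopping(train, val, max_epochs=50, patience=3):
    """Return (best validation accuracy, best w, best b)."""
    w, b = [0.0] * len(train[0][0]), 0.0
    best = (accuracy(w, b, val), list(w), b)
    stale = 0
    for _ in range(max_epochs):
        for x, y in train:                      # one pass of perceptron updates
            if predict(w, b, x) != y:
                w = [wj + y * xj for wj, xj in zip(w, x)]
                b -= y
        acc = accuracy(w, b, val)               # evaluate current model
        if acc > best[0]:
            best, stale = (acc, list(w), b), 0
        else:
            stale += 1
            if stale >= patience:               # validation stopped improving
                break
    return best


# Hypothetical toy training and validation sets.
train = [([2.0, 2.0], +1), ([3.0, 1.0], +1),
         ([-1.0, -1.0], -1), ([-2.0, 0.0], -1)]
val = [([1.0, 2.0], +1), ([-1.0, -2.0], -1)]
best_acc, best_w, best_b = perceptron_early_stopping(train, val)
print(best_acc, best_w, best_b)  # 1.0 [2.0, 2.0] -1.0
```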
![Page 53: Machine Learning & Data Mining - Yisong Yue · Recap: Bias-Variance Trade -off 0 20 40 60 80 100 −1 −0.5 0 0.5 1 1.5 0 20 40 60 80 100 0 0.5 1 1.5 0 20 40 60 80 100 −1 −0.5](https://reader034.fdocuments.in/reader034/viewer/2022042309/5ed61b5e300e8232eb7fa977/html5/thumbnails/53.jpg)
Online Learning vs Batch Learning
• Online Learning:
  – Receive a stream of data $(x, y)$
  – Make incremental updates (typically)
  – Perceptron Learning is an instance of Online Learning
• Batch Learning
  – Given all the data up front
  – Can use online learning algorithms for batch learning
  – E.g., stream the data to the learning algorithm
https://en.wikipedia.org/wiki/Online_machine_learning
![Page 54: Machine Learning & Data Mining - Yisong Yue · Recap: Bias-Variance Trade -off 0 20 40 60 80 100 −1 −0.5 0 0.5 1 1.5 0 20 40 60 80 100 0 0.5 1 1.5 0 20 40 60 80 100 −1 −0.5](https://reader034.fdocuments.in/reader034/viewer/2022042309/5ed61b5e300e8232eb7fa977/html5/thumbnails/54.jpg)
Recap: Perceptron
• One of the first machine learning algorithms
• Benefits:
  – Simple and fast
  – Clean analysis
• Drawbacks:
  – Might not converge to a very good model
  – What is the objective function?
![Page 55: Machine Learning & Data Mining - Yisong Yue · Recap: Bias-Variance Trade -off 0 20 40 60 80 100 −1 −0.5 0 0.5 1 1.5 0 20 40 60 80 100 0 0.5 1 1.5 0 20 40 60 80 100 −1 −0.5](https://reader034.fdocuments.in/reader034/viewer/2022042309/5ed61b5e300e8232eb7fa977/html5/thumbnails/55.jpg)
(Stochastic) Gradient Descent
![Page 56: Machine Learning & Data Mining - Yisong Yue · Recap: Bias-Variance Trade -off 0 20 40 60 80 100 −1 −0.5 0 0.5 1 1.5 0 20 40 60 80 100 0 0.5 1 1.5 0 20 40 60 80 100 −1 −0.5](https://reader034.fdocuments.in/reader034/viewer/2022042309/5ed61b5e300e8232eb7fa977/html5/thumbnails/56.jpg)
Back to Optimizing Objective Functions
• Training Data: $S = \{(x_i, y_i)\}_{i=1}^N$, with $x \in \mathbb{R}^D$ and $y \in \{-1, +1\}$
• Model Class: $f(x \mid w, b) = w^T x - b$ (Linear Models)
• Loss Function: $L(a, b) = (a - b)^2$ (Squared Loss)
• Learning Objective: $\operatorname{argmin}_{w,b} \sum_{i=1}^N L(y_i, f(x_i \mid w, b))$ (Optimization Problem)
![Page 57: Machine Learning & Data Mining - Yisong Yue · Recap: Bias-Variance Trade -off 0 20 40 60 80 100 −1 −0.5 0 0.5 1 1.5 0 20 40 60 80 100 0 0.5 1 1.5 0 20 40 60 80 100 −1 −0.5](https://reader034.fdocuments.in/reader034/viewer/2022042309/5ed61b5e300e8232eb7fa977/html5/thumbnails/57.jpg)
Back to Optimizing Objective Functions
$\operatorname{argmin}_{w,b} L(w, b) \equiv \sum_{i=1}^N L(y_i, f(x_i \mid w, b))$
• Typically, requires optimization algorithm.
• Simplest: Gradient Descent
• This Lecture: stick with squared loss
  – Talk about various loss functions next lecture
![Page 58: Machine Learning & Data Mining - Yisong Yue · Recap: Bias-Variance Trade -off 0 20 40 60 80 100 −1 −0.5 0 0.5 1 1.5 0 20 40 60 80 100 0 0.5 1 1.5 0 20 40 60 80 100 −1 −0.5](https://reader034.fdocuments.in/reader034/viewer/2022042309/5ed61b5e300e8232eb7fa977/html5/thumbnails/58.jpg)
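Gradient Review for Squared Loss
Using $L(a, b) = (a - b)^2$ and $f(x \mid w, b) = w^T x - b$:
$\partial_w L(w, b) = \partial_w \sum_{i=1}^N L(y_i, f(x_i \mid w, b))$
$= \sum_{i=1}^N \partial_w L(y_i, f(x_i \mid w, b))$ (Linearity of Differentiation)
$= \sum_{i=1}^N -2 (y_i - f(x_i \mid w, b)) \, \partial_w f(x_i \mid w, b)$ (Chain Rule)
$= \sum_{i=1}^N -2 (y_i - f(x_i \mid w, b)) \, x_i$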
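The closed-form gradient can be sanity-checked against central finite differences. This is a sketch with assumed names and toy data. The formula for `grad_b` is not on the slide; it follows from the same chain rule, using the fact that the derivative of $f$ with respect to $b$ is $-1$.

```python
def predict(w, b, x):
    return sum(wj * xj for wj, xj in zip(w, x)) - b


def total_loss(w, b, data):
    return sum((y - predict(w, b, x)) ** 2 for x, y in data)


def grad_w(w, b, data):
    """Sum over i of -2 (y_i - f(x_i | w, b)) x_i."""
    grad = [0.0] * len(w)
    for x, y in data:
        residual = y - predict(w, b, x)
        for j, xj in enumerate(x):
            grad[j] += -2.0 * residual * xj
    return grad


def grad_b(w, b, data):
    """Assumption (not on the slide): d f / d b = -1, so the sign flips."""
    return sum(2.0 * (y - predict(w, b, x)) for x, y in data)


def numeric_grad_w(w, b, data, eps=1e-6):
    """Central finite differences, one coordinate of w at a time."""
    grad = []
    for j in range(len(w)):
        up, down = list(w), list(w)
        up[j] += eps
        down[j] -= eps
        grad.append((total_loss(up, b, data) - total_loss(down, b, data)) / (2 * eps))
    return grad


data = [([1.0, 2.0], +1), ([-1.0, 0.5], -1), ([0.5, -1.5], -1)]  # toy data
w0, b0 = [0.3, -0.2], 0.1
print(grad_w(w0, b0, data))
print(numeric_grad_w(w0, b0, data))
```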
![Page 59: Machine Learning & Data Mining - Yisong Yue · Recap: Bias-Variance Trade -off 0 20 40 60 80 100 −1 −0.5 0 0.5 1 1.5 0 20 40 60 80 100 0 0.5 1 1.5 0 20 40 60 80 100 −1 −0.5](https://reader034.fdocuments.in/reader034/viewer/2022042309/5ed61b5e300e8232eb7fa977/html5/thumbnails/59.jpg)
Gradient Descent
• Initialize: $w^1 = 0$, $b^1 = 0$
• For $t = 1, \dots$
  $w^{t+1} = w^t - \eta^{t+1} \partial_w L(w^t, b^t)$
  $b^{t+1} = b^t - \eta^{t+1} \partial_b L(w^t, b^t)$
$\eta^{t+1}$: “Step Size”
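A minimal sketch of these updates for the squared loss is given below. It assumes a constant step size `eta` (the slide allows it to vary with $t$), a fixed number of steps, and hypothetical noise-free data generated from $y = 2x - 1$, so the minimizer is $w = 2$, $b = 1$ under the convention $f(x \mid w, b) = w^T x - b$.

```python
def gradient_descent(data, eta=0.01, steps=5000):
    """Full-batch gradient descent on the summed squared loss."""
    dim = len(data[0][0])
    w, b = [0.0] * dim, 0.0                      # w^1 = 0, b^1 = 0
    for _ in range(steps):
        gw, gb = [0.0] * dim, 0.0
        for x, y in data:                        # full pass over the training set
            residual = y - (sum(wj * xj for wj, xj in zip(w, x)) - b)
            for j, xj in enumerate(x):
                gw[j] += -2.0 * residual * xj    # partial derivative w.r.t. w_j
            gb += 2.0 * residual                 # partial derivative w.r.t. b
        w = [wj - eta * gj for wj, gj in zip(w, gw)]
        b = b - eta * gb
    return w, b


# Hypothetical data from y = 2x - 1.
data = [([0.0], -1.0), ([1.0], 1.0), ([2.0], 3.0), ([3.0], 5.0)]
w, b = gradient_descent(data)
print(w, b)  # close to [2.0] and 1.0
```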
![Page 60: Machine Learning & Data Mining - Yisong Yue · Recap: Bias-Variance Trade -off 0 20 40 60 80 100 −1 −0.5 0 0.5 1 1.5 0 20 40 60 80 100 0 0.5 1 1.5 0 20 40 60 80 100 −1 −0.5](https://reader034.fdocuments.in/reader034/viewer/2022042309/5ed61b5e300e8232eb7fa977/html5/thumbnails/60.jpg)
How to Choose Step Size?
[Figure, slides 60-63: loss $L$ plotted against $w$, with successive gradient descent steps for $\eta = 1$ and $\partial_w L(w) = -2(1 - w)$.]
Oscillate Infinitely!
![Page 64: Machine Learning & Data Mining - Yisong Yue · Recap: Bias-Variance Trade -off 0 20 40 60 80 100 −1 −0.5 0 0.5 1 1.5 0 20 40 60 80 100 0 0.5 1 1.5 0 20 40 60 80 100 −1 −0.5](https://reader034.fdocuments.in/reader034/viewer/2022042309/5ed61b5e300e8232eb7fa977/html5/thumbnails/64.jpg)
How to Choose Step Size?
[Figure, slides 64-67: loss $L$ plotted against $w$, with successive gradient descent steps for $\eta = 0.0001$ and $\partial_w L(w) = -2(1 - w)$.]
Takes Really Long Time!
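Both failure modes are easy to reproduce numerically. The sketch below assumes the one-dimensional loss $L(w) = (1 - w)^2$, which has the gradient $-2(1 - w)$ shown on the slides, and an assumed starting point $w = 0$.

```python
def grad(w):
    """Gradient of L(w) = (1 - w)^2."""
    return -2.0 * (1.0 - w)


def descend(eta, steps, w=0.0):
    """Return the list of iterates w^1, w^2, ... for a constant step size."""
    path = [w]
    for _ in range(steps):
        w = w - eta * grad(w)
        path.append(w)
    return path


print(descend(1.0, 4))            # [0.0, 2.0, 0.0, 2.0, 0.0]: oscillates forever
print(descend(0.0001, 1000)[-1])  # about 0.18: still far from the optimum w = 1
print(descend(0.5, 1)[-1])        # 1.0: for this loss, eta = 0.5 lands on the optimum
```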
![Page 68: Machine Learning & Data Mining - Yisong Yue · Recap: Bias-Variance Trade -off 0 20 40 60 80 100 −1 −0.5 0 0.5 1 1.5 0 20 40 60 80 100 0 0.5 1 1.5 0 20 40 60 80 100 −1 −0.5](https://reader034.fdocuments.in/reader034/viewer/2022042309/5ed61b5e300e8232eb7fa977/html5/thumbnails/68.jpg)
How to Choose Step Size?
[Figure: Loss versus Iterations (0 to 200) for step sizes 0.001, 0.01, 0.1, and 0.5.]
Note that the absolute scale is not meaningful. Focus on the relative magnitude differences.
As Large As Possible! (Without Diverging)
![Page 69: Machine Learning & Data Mining - Yisong Yue · Recap: Bias-Variance Trade -off 0 20 40 60 80 100 −1 −0.5 0 0.5 1 1.5 0 20 40 60 80 100 0 0.5 1 1.5 0 20 40 60 80 100 −1 −0.5](https://reader034.fdocuments.in/reader034/viewer/2022042309/5ed61b5e300e8232eb7fa977/html5/thumbnails/69.jpg)
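Being Scale Invariant
• Consider the following two gradient updates:
  $w^{t+1} = w^t - \eta^{t+1} \partial_w L(w^t, b^t)$
  $w^{t+1} = w^t - \hat{\eta}^{t+1} \partial_w \hat{L}(w^t, b^t)$
• Suppose: $\hat{L} = 1000 L$
  – How are the two step sizes related?
  – $\hat{\eta}^{t+1} = \eta^{t+1} / 1000$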
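The relationship can be verified numerically: scaling the loss by 1000 and the step size by 1/1000 should give the same iterates, up to floating-point rounding. The sketch assumes the one-dimensional loss $L(w) = (1 - w)^2$ and a constant step size of 0.1, both chosen only for illustration.

```python
def grad_L(w):
    """Gradient of L(w) = (1 - w)^2."""
    return -2.0 * (1.0 - w)


def grad_L_hat(w):
    """Gradient of L_hat = 1000 * L."""
    return 1000.0 * grad_L(w)


def descend(grad, eta, steps, w=0.0):
    path = [w]
    for _ in range(steps):
        w = w - eta * grad(w)
        path.append(w)
    return path


eta = 0.1
path = descend(grad_L, eta, 20)
path_hat = descend(grad_L_hat, eta / 1000.0, 20)
print(max(abs(a - b) for a, b in zip(path, path_hat)))  # tiny: rounding error only
```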
![Page 70: Machine Learning & Data Mining - Yisong Yue · Recap: Bias-Variance Trade -off 0 20 40 60 80 100 −1 −0.5 0 0.5 1 1.5 0 20 40 60 80 100 0 0.5 1 1.5 0 20 40 60 80 100 −1 −0.5](https://reader034.fdocuments.in/reader034/viewer/2022042309/5ed61b5e300e8232eb7fa977/html5/thumbnails/70.jpg)
Practical Rules of Thumb
• Divide Loss Function by Number of Examples:
  $w^{t+1} = w^t - \left( \frac{\eta^{t+1}}{N} \right) \partial_w L(w^t, b^t)$
• Start with large step size
  – If loss plateaus, divide step size by 2
  – (Can also use advanced optimization methods)
  – (Step size must decrease over time to guarantee convergence to global optimum)
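The halving rule can be sketched as follows. The plateau test used here (relative decrease of the loss at most `tol`) is an assumption, since the slide does not define "plateaus" precisely, and the one-dimensional loss $L(w) = (1 - w)^2$ is used only as a toy example. Starting from $\eta = 1$, which oscillates when held constant on this loss, one halving is enough to reach the optimum.

```python
def loss(w):
    return (1.0 - w) ** 2


def grad(w):
    return -2.0 * (1.0 - w)


def descend_with_halving(w, eta, steps, tol=1e-3):
    """Gradient descent that halves eta whenever the loss stops decreasing."""
    prev = loss(w)
    for _ in range(steps):
        w = w - eta * grad(w)
        cur = loss(w)
        if prev - cur <= tol * prev:   # plateau (or no progress at all)
            eta /= 2.0
        prev = cur
    return w, eta


w_final, eta_final = descend_with_halving(w=0.0, eta=1.0, steps=5)
print(w_final, eta_final)  # 1.0 0.0625
```

The step size keeps shrinking after the optimum is reached because the loss can no longer decrease there.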
![Page 71: Machine Learning & Data Mining - Yisong Yue · Recap: Bias-Variance Trade -off 0 20 40 60 80 100 −1 −0.5 0 0.5 1 1.5 0 20 40 60 80 100 0 0.5 1 1.5 0 20 40 60 80 100 −1 −0.5](https://reader034.fdocuments.in/reader034/viewer/2022042309/5ed61b5e300e8232eb7fa977/html5/thumbnails/71.jpg)
Aside: Convexity
[Figure: illustration of a convex function, with a second curve labeled Not Convex. Image Source: http://en.wikipedia.org/wiki/Convex_function]
• Easy to find global optima!
• Strict convex if diff always > 0
![Page 72: Machine Learning & Data Mining - Yisong Yue · Recap: Bias-Variance Trade -off 0 20 40 60 80 100 −1 −0.5 0 0.5 1 1.5 0 20 40 60 80 100 0 0.5 1 1.5 0 20 40 60 80 100 −1 −0.5](https://reader034.fdocuments.in/reader034/viewer/2022042309/5ed61b5e300e8232eb7fa977/html5/thumbnails/72.jpg)
Aside: Convexity
$L(x_2) \ge L(x_1) + \nabla L(x_1)^T (x_2 - x_1)$
Function is always above the locally linear extrapolation
[Figure: plot illustrating this inequality.]
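The inequality can be spot-checked on a grid of points. Passing such a check does not prove convexity, but a single violation disproves it. The sketch below uses assumed one-dimensional examples: $(1 - w)^2$, which is convex, and its negation, which is not.

```python
def satisfies_first_order_condition(f, df, points, tol=1e-12):
    """Check f(x2) >= f(x1) + f'(x1) (x2 - x1) for all pairs of grid points."""
    return all(f(x2) >= f(x1) + df(x1) * (x2 - x1) - tol
               for x1 in points for x2 in points)


grid = [i / 10.0 - 0.5 for i in range(31)]  # points from -0.5 to 2.5

convex_ok = satisfies_first_order_condition(
    lambda w: (1.0 - w) ** 2, lambda w: -2.0 * (1.0 - w), grid)
concave_ok = satisfies_first_order_condition(
    lambda w: -(1.0 - w) ** 2, lambda w: 2.0 * (1.0 - w), grid)

print(convex_ok)   # True: the function stays above every tangent line
print(concave_ok)  # False: the negated function dips below its tangents
```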
![Page 73: Machine Learning & Data Mining - Yisong Yue · Recap: Bias-Variance Trade -off 0 20 40 60 80 100 −1 −0.5 0 0.5 1 1.5 0 20 40 60 80 100 0 0.5 1 1.5 0 20 40 60 80 100 −1 −0.5](https://reader034.fdocuments.in/reader034/viewer/2022042309/5ed61b5e300e8232eb7fa977/html5/thumbnails/73.jpg)
Aside: Convexity
• All local optima are global optima
  – Gradient Descent will find optimum (assuming step size chosen safely)
• Strictly convex: unique global optimum
• Almost all standard objectives are (strictly) convex:
  – Squared Loss, SVMs, LR, Ridge, Lasso
  – We will see non-convex objectives later (e.g., deep learning)
![Page 74: Machine Learning & Data Mining - Yisong Yue · Recap: Bias-Variance Trade -off 0 20 40 60 80 100 −1 −0.5 0 0.5 1 1.5 0 20 40 60 80 100 0 0.5 1 1.5 0 20 40 60 80 100 −1 −0.5](https://reader034.fdocuments.in/reader034/viewer/2022042309/5ed61b5e300e8232eb7fa977/html5/thumbnails/74.jpg)
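Convergence
• Assume $L$ is convex
• How many iterations to achieve: $L(w) - L(w^*) \le \epsilon$
• If: $|L(a) - L(b)| \le \rho \|a - b\|$ ($L$ is “$\rho$-Lipschitz”)
  – Then $O(1/\epsilon^2)$ iterations
• If: $\|\nabla L(a) - \nabla L(b)\| \le \rho \|a - b\|$ ($L$ is “$\rho$-smooth”)
  – Then $O(1/\epsilon)$ iterations
• If: $L(a) \ge L(b) + \nabla L(b)^T (a - b) + \frac{\rho}{2} \|a - b\|^2$ ($L$ is “$\rho$-strongly convex”)
  – Then $O(\log(1/\epsilon))$ iterations
More Details: Bubeck Textbook Chapter 3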
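As a single illustrative data point (not a proof of any of the rates), the sketch below counts gradient descent iterations on the assumed toy loss $L(w) = (1 - w)^2$, whose minimum value is 0 at $w^* = 1$. With the assumed step size 0.25 the loss shrinks by a factor of 4 per iteration, so the count grows roughly linearly in $\log(1/\epsilon)$.

```python
def iterations_to_reach(eps, eta=0.25, w=0.0, max_iters=10_000):
    """Count steps until L(w) - L(w*) = (1 - w)^2 <= eps."""
    t = 0
    while (1.0 - w) ** 2 > eps and t < max_iters:
        w = w - eta * (-2.0 * (1.0 - w))  # gradient step on (1 - w)^2
        t += 1
    return t


counts = [iterations_to_reach(eps) for eps in (1e-2, 1e-4, 1e-8)]
print(counts)  # [4, 7, 14]
```

Squaring the target accuracy (from 1e-4 to 1e-8) doubles the iteration count here, which is the behaviour a logarithmic rate predicts.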
![Page 75: Machine Learning & Data Mining - Yisong Yue · Recap: Bias-Variance Trade -off 0 20 40 60 80 100 −1 −0.5 0 0.5 1 1.5 0 20 40 60 80 100 0 0.5 1 1.5 0 20 40 60 80 100 −1 −0.5](https://reader034.fdocuments.in/reader034/viewer/2022042309/5ed61b5e300e8232eb7fa977/html5/thumbnails/75.jpg)
Convergence
• In general, takes infinite time to reach global optimum.
• But in general, we don’t care!
  – As long as we’re close enough to the global optimum
[Figure: Loss versus Iterations (0 to 200) for step sizes 0.001, 0.01, 0.1, and 0.5, annotated “How do we know if we’re here?” and “And not here?”]
![Page 76: Machine Learning & Data Mining - Yisong Yue · Recap: Bias-Variance Trade -off 0 20 40 60 80 100 −1 −0.5 0 0.5 1 1.5 0 20 40 60 80 100 0 0.5 1 1.5 0 20 40 60 80 100 −1 −0.5](https://reader034.fdocuments.in/reader034/viewer/2022042309/5ed61b5e300e8232eb7fa977/html5/thumbnails/76.jpg)
When to Stop?
• Convergence analyses = worst-case upper bounds
  – What to do in practice?
• Stop when progress is sufficiently small
  – E.g., relative reduction less than 0.001
• Stop after pre-specified # iterations
  – E.g., 100000
• Stop when validation error stops going down
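Two of these criteria (small relative progress and an iteration cap) can be sketched on the assumed toy loss $L(w) = (1 - w)^2$. The function name and return format are hypothetical. The second call shows a caveat worth knowing: with a very small step size, the progress criterion can fire while the iterate is still far from the optimum.

```python
def loss(w):
    return (1.0 - w) ** 2


def grad(w):
    return -2.0 * (1.0 - w)


def run_until_stopped(eta, w=0.0, rel_tol=1e-3, max_iters=100_000):
    """Return (final w, iterations used, reason for stopping)."""
    prev = loss(w)
    for t in range(1, max_iters + 1):
        w = w - eta * grad(w)
        cur = loss(w)
        if prev - cur <= rel_tol * prev:       # relative reduction too small
            return w, t, "small_progress"
        prev = cur
    return w, max_iters, "max_iters"           # hit the iteration cap


print(run_until_stopped(eta=0.1, max_iters=50))   # stops at the cap, w close to 1
print(run_until_stopped(eta=0.0001))              # stops after 1 step, w = 0.0002
```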
![Page 77: Machine Learning & Data Mining - Yisong Yue · Recap: Bias-Variance Trade -off 0 20 40 60 80 100 −1 −0.5 0 0.5 1 1.5 0 20 40 60 80 100 0 0.5 1 1.5 0 20 40 60 80 100 −1 −0.5](https://reader034.fdocuments.in/reader034/viewer/2022042309/5ed61b5e300e8232eb7fa977/html5/thumbnails/77.jpg)
Limitation of Gradient Descent
• Requires full pass over training set per iteration
  $\partial_w L(w, b \mid S) = \partial_w \sum_{i=1}^N L(y_i, f(x_i \mid w, b))$
• Very expensive if training set is huge
• Do we need to do a full pass over the data?
![Page 78: Machine Learning & Data Mining - Yisong Yue · Recap: Bias-Variance Trade -off 0 20 40 60 80 100 −1 −0.5 0 0.5 1 1.5 0 20 40 60 80 100 0 0.5 1 1.5 0 20 40 60 80 100 −1 −0.5](https://reader034.fdocuments.in/reader034/viewer/2022042309/5ed61b5e300e8232eb7fa977/html5/thumbnails/78.jpg)
Stochastic Gradient Descent
• Suppose Loss Function Decomposes Additively
  $L(w, b) = \frac{1}{N} \sum_{i=1}^N L_i(w, b) = E_i[L_i(w, b)]$
  – Each $L_i$ corresponds to a single data point: $L_i(w, b) \equiv (y_i - f(x_i \mid w, b))^2$
• Gradient = expected gradient of sub-functions
  $\partial_w L(w, b) = \partial_w E_i[L_i(w, b)] = E_i[\partial_w L_i(w, b)]$
![Page 79: Machine Learning & Data Mining - Yisong Yue · Recap: Bias-Variance Trade -off 0 20 40 60 80 100 −1 −0.5 0 0.5 1 1.5 0 20 40 60 80 100 0 0.5 1 1.5 0 20 40 60 80 100 −1 −0.5](https://reader034.fdocuments.in/reader034/viewer/2022042309/5ed61b5e300e8232eb7fa977/html5/thumbnails/79.jpg)
Stochastic Gradient Descent
• Suffices to take random gradient update
  – So long as it matches the true gradient in expectation
• Each iteration t:
  – Choose i at random
    $w_{t+1} = w_t - \eta_{t+1}\,\partial_w L_i(w,b)$
    $b_{t+1} = b_t - \eta_{t+1}\,\partial_b L_i(w,b)$
  – Expected value of $\partial_w L_i(w,b)$ is: $\partial_w L(w,b)$
• SGD is an online learning algorithm!
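As a rough sketch of the update rule above, the loop below runs SGD on the squared loss with a one-dimensional input, so $f(x \mid w,b) = wx - b$. The decaying step-size schedule, the iteration count, and the noiseless toy data are assumptions chosen for illustration; the slides do not specify them.

```python
import random

def sgd(data, step0=0.1, iters=5000, seed=0):
    """SGD on L_i = (y_i - (w*x_i - b))^2, one random example per step."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for t in range(1, iters + 1):
        x, y = rng.choice(data)              # choose i at random
        r = y - (w * x - b)                  # residual on that example
        eta = step0 / (1.0 + t / 1000.0)     # assumed decaying step size
        grad_w, grad_b = -2.0 * r * x, 2.0 * r
        w -= eta * grad_w
        b -= eta * grad_b
    return w, b

# Noiseless toy data generated from y = 2x - 1, i.e. w = 2 and b = 1.
data = [(x, 2.0 * x - 1.0) for x in (0.0, 0.25, 0.5, 0.75, 1.0)]
w_hat, b_hat = sgd(data)
```

Each step touches a single example, so the per-iteration cost does not depend on the size of the training set.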
![Page 80: Machine Learning & Data Mining - Yisong Yue · Recap: Bias-Variance Trade -off 0 20 40 60 80 100 −1 −0.5 0 0.5 1 1.5 0 20 40 60 80 100 0 0.5 1 1.5 0 20 40 60 80 100 −1 −0.5](https://reader034.fdocuments.in/reader034/viewer/2022042309/5ed61b5e300e8232eb7fa977/html5/thumbnails/80.jpg)
Mini-Batch SGD
• Each $L_i$ is a small batch of training examples
  – E.g., 500-1000 examples
  – Can leverage vector operations
  – Decrease volatility of gradient updates
• Industry state-of-the-art
  – Everyone uses mini-batch SGD
  – Often parallelized
    • (e.g., different cores work on different mini-batches)
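The "decrease volatility" point can be illustrated by comparing the spread of gradient estimates at a fixed parameter value for two batch sizes. This is a plain-Python sketch with a small made-up dataset and hypothetical helper names; a practical implementation would use a vectorized array library, and batch sizes would be closer to the 500-1000 mentioned above.

```python
import random

def minibatch_grad(w, b, batch):
    """Average squared-loss gradient over a mini-batch, for f(x|w,b) = w*x - b."""
    gw = gb = 0.0
    for x, y in batch:
        r = y - (w * x - b)
        gw += -2.0 * r * x
        gb += 2.0 * r
    return gw / len(batch), gb / len(batch)

def grad_variance(data, batch_size, trials=2000, seed=0):
    """Empirical variance of the w-gradient estimate at (w, b) = (0, 0)."""
    rng = random.Random(seed)
    samples = [minibatch_grad(0.0, 0.0, rng.sample(data, batch_size))[0]
               for _ in range(trials)]
    mean = sum(samples) / trials
    return sum((s - mean) ** 2 for s in samples) / trials

# Made-up data: 20 points on [0, 1) with labels in {-1, +1}.
data = [(k / 20.0, 1.0 if k >= 10 else -1.0) for k in range(20)]
var_single = grad_variance(data, batch_size=1)
var_batch = grad_variance(data, batch_size=8)
```

With these settings `var_batch` comes out smaller than `var_single`, since averaging over more examples reduces the variance of the estimate.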
![Page 81: Machine Learning & Data Mining - Yisong Yue · Recap: Bias-Variance Trade -off 0 20 40 60 80 100 −1 −0.5 0 0.5 1 1.5 0 20 40 60 80 100 0 0.5 1 1.5 0 20 40 60 80 100 −1 −0.5](https://reader034.fdocuments.in/reader034/viewer/2022042309/5ed61b5e300e8232eb7fa977/html5/thumbnails/81.jpg)
Checking for Convergence
• How to check for convergence?
  – Evaluating loss on entire training set seems expensive…
[Plot: Loss vs. Iterations (0 to 200 iterations, loss 0 to 5000); legend entries 0.001, 0.01, 0.1, 0.5]
![Page 82: Machine Learning & Data Mining - Yisong Yue · Recap: Bias-Variance Trade -off 0 20 40 60 80 100 −1 −0.5 0 0.5 1 1.5 0 20 40 60 80 100 0 0.5 1 1.5 0 20 40 60 80 100 −1 −0.5](https://reader034.fdocuments.in/reader034/viewer/2022042309/5ed61b5e300e8232eb7fa977/html5/thumbnails/82.jpg)
Checking for Convergence
• How to check for convergence?
  – Evaluating loss on entire training set seems expensive…
• Don't check after every iteration
  – E.g., check every 1000 iterations
• Evaluate loss on a subset of training data
  – E.g., the previous 5000 examples.
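One way to combine the two suggestions above is to keep a sliding window of recently visited examples and evaluate the loss on that window only every so often. The sketch below is an assumption-laden illustration: the function name, the constant step size, the relative-reduction test, and the toy data are not from the slides; only the defaults of 1000 iterations and 5000 examples echo the slide's examples.

```python
import random
from collections import deque

def sgd_with_checks(data, step=0.05, check_every=1000, window=5000,
                    rel_tol=1e-3, max_iters=200_000, seed=0):
    """SGD on squared loss; checks convergence on recent examples only."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    recent = deque(maxlen=window)   # the previous `window` examples seen
    prev = None
    for t in range(1, max_iters + 1):
        x, y = rng.choice(data)
        recent.append((x, y))
        r = y - (w * x - b)
        w += step * 2.0 * r * x     # w - step * (-2 r x)
        b -= step * 2.0 * r         # b - step * (2 r)
        if t % check_every == 0:    # don't check after every iteration
            cur = sum((yy - (w * xx - b)) ** 2 for xx, yy in recent) / len(recent)
            if prev is not None and prev > 0 and (prev - cur) / prev < rel_tol:
                return w, b, t      # progress on the window is small: stop
            prev = cur
    return w, b, max_iters

# Made-up data: 20 points on [0, 1) with labels in {-1, +1}.
data = [(k / 20.0, 1.0 if k >= 10 else -1.0) for k in range(20)]
w_hat, b_hat, stopped_at = sgd_with_checks(data)
```

The cost of each check is bounded by the window size rather than by the size of the training set.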
![Page 83: Machine Learning & Data Mining - Yisong Yue · Recap: Bias-Variance Trade -off 0 20 40 60 80 100 −1 −0.5 0 0.5 1 1.5 0 20 40 60 80 100 0 0.5 1 1.5 0 20 40 60 80 100 −1 −0.5](https://reader034.fdocuments.in/reader034/viewer/2022042309/5ed61b5e300e8232eb7fa977/html5/thumbnails/83.jpg)
Recap: Stochastic Gradient Descent
• Conceptually:
  – Decompose Loss Function Additively
  – Choose a Component Randomly
  – Gradient Update
• Benefits:
  – Avoid iterating entire dataset for every update
  – Gradient update is consistent (in expectation)
• Industry Standard
![Page 84: Machine Learning & Data Mining - Yisong Yue · Recap: Bias-Variance Trade -off 0 20 40 60 80 100 −1 −0.5 0 0.5 1 1.5 0 20 40 60 80 100 0 0.5 1 1.5 0 20 40 60 80 100 −1 −0.5](https://reader034.fdocuments.in/reader034/viewer/2022042309/5ed61b5e300e8232eb7fa977/html5/thumbnails/84.jpg)
Perceptron Revisited (What is the Objective Function?)
Training Set: $S = \{(x_i, y_i)\}_{i=1}^{N}$, $y \in \{+1, -1\}$
Model: $f(x \mid w,b) = \mathrm{sign}(w^T x - b)$
Go through training set in arbitrary order (e.g., randomly)
• $w_1 = 0$, $b_1 = 0$
• For t = 1, …
  – Receive example (x, y)
  – If $f(x \mid w_t, b_t) = y$
    • $[w_{t+1}, b_{t+1}] = [w_t, b_t]$
  – Else
    • $w_{t+1} = w_t + y x$
    • $b_{t+1} = b_t - y$
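The perceptron loop above translates almost line for line into code. In this sketch, two details are assumptions not fixed by the slide: `sign(0)` is treated as -1, and the training set is cycled through repeatedly until one full pass makes no mistakes (or `epochs` passes have run). The toy data is a linearly separable AND-style problem.

```python
def sign(z):
    # Assumption: a score of exactly 0 is treated as the -1 class.
    return 1 if z > 0 else -1

def predict(w, b, x):
    """f(x | w, b) = sign(w^T x - b)."""
    return sign(sum(wj * xj for wj, xj in zip(w, x)) - b)

def perceptron(examples, epochs=100):
    """Perceptron updates; stops after a full pass with no mistakes."""
    w, b = [0.0] * len(examples[0][0]), 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in examples:
            if predict(w, b, x) != y:
                w = [wj + y * xj for wj, xj in zip(w, x)]   # w <- w + y x
                b = b - y                                    # b <- b - y
                mistakes += 1
        if mistakes == 0:
            break
    return w, b

# Linearly separable toy data: label +1 only when both inputs are 1.
examples = [([0.0, 0.0], -1), ([0.0, 1.0], -1), ([1.0, 0.0], -1), ([1.0, 1.0], 1)]
w, b = perceptron(examples)
```

On this data the loop reaches a pass with no mistakes after a handful of epochs, and `predict(w, b, x)` then matches every label.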
![Page 85: Machine Learning & Data Mining - Yisong Yue · Recap: Bias-Variance Trade -off 0 20 40 60 80 100 −1 −0.5 0 0.5 1 1.5 0 20 40 60 80 100 0 0.5 1 1.5 0 20 40 60 80 100 −1 −0.5](https://reader034.fdocuments.in/reader034/viewer/2022042309/5ed61b5e300e8232eb7fa977/html5/thumbnails/85.jpg)
Perceptron (Implicit) Objective
$L_i(w,b) = \max\{0,\, -y_i f(x_i \mid w,b)\}$
[Plot: Loss vs. $y f(x)$, with $y f(x)$ from -2 to 2 and loss from 0 to 2]
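To see why this loss is the perceptron's implicit objective, it helps to write out one SGD step on it. The derivation below assumes that $f$ inside the loss denotes the linear score $w^T x - b$ (not its sign), and uses a constant step size $\eta = 1$; both are assumptions made for this sketch.

```latex
% Case 1: example i is misclassified, so y_i (w^T x_i - b) < 0.
L_i(w,b) = -y_i\,(w^T x_i - b)
\quad\Rightarrow\quad
\partial_w L_i = -y_i x_i, \qquad \partial_b L_i = y_i

% SGD step with \eta = 1:
w_{t+1} = w_t - \partial_w L_i = w_t + y_i x_i, \qquad
b_{t+1} = b_t - \partial_b L_i = b_t - y_i

% Case 2: example i is classified correctly, so y_i (w^T x_i - b) > 0.
L_i(w,b) = 0
\quad\Rightarrow\quad
\partial_w L_i = 0, \qquad \partial_b L_i = 0
\quad\Rightarrow\quad
[w_{t+1}, b_{t+1}] = [w_t, b_t]
```

Under these assumptions both cases reproduce the perceptron update rule, so the perceptron can be read as SGD on this loss.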
![Page 86: Machine Learning & Data Mining - Yisong Yue · Recap: Bias-Variance Trade -off 0 20 40 60 80 100 −1 −0.5 0 0.5 1 1.5 0 20 40 60 80 100 0 0.5 1 1.5 0 20 40 60 80 100 −1 −0.5](https://reader034.fdocuments.in/reader034/viewer/2022042309/5ed61b5e300e8232eb7fa977/html5/thumbnails/86.jpg)
Recap: Complete Pipeline
• Training Data: $S = \{(x_i, y_i)\}_{i=1}^{N}$
• Model Class(es): $f(x \mid w,b) = w^T x - b$
• Loss Function: $L(a,b) = (a - b)^2$
• $\operatorname{argmin}_{w,b} \sum_{i=1}^{N} L\big(y_i, f(x_i \mid w,b)\big)$ (Use SGD!)
• Cross Validation & Model Selection
• Profit!
![Page 87: Machine Learning & Data Mining - Yisong Yue · Recap: Bias-Variance Trade -off 0 20 40 60 80 100 −1 −0.5 0 0.5 1 1.5 0 20 40 60 80 100 0 0.5 1 1.5 0 20 40 60 80 100 −1 −0.5](https://reader034.fdocuments.in/reader034/viewer/2022042309/5ed61b5e300e8232eb7fa977/html5/thumbnails/87.jpg)
Next Week
• Different Loss Functions
  – Hinge Loss (SVM)
  – Log Loss (Logistic Regression)
• Non-linear model classes
  – Neural Nets
• Regularization
• Next Thursday Recitation:
  – Linear Algebra & Calculus