Neural Networks
Professor Ameet Talwalkar
Professor Ameet Talwalkar CS260 Machine Learning Algorithms March 5, 2017 1 / 20
Outline
1 Administration
2 Review of last lecture
3 Neural Networks
4 Summary
Grade Policy and Final Exam
Upcoming Schedule
Today: HW5 due, HW6 released
Wednesday (3/8): Last day of class
Next Monday (3/13): No class – I will hold office hours in my office (BH 4531F)
Next Wednesday (3/15): Final Exam, HW6 due
Final Exam
Cumulative but with more emphasis on new material
Outline
1 Administration
2 Review of last lecture: AdaBoost; boosting as learning a nonlinear basis
3 Neural Networks
4 Summary
Boosting

High-level idea: combine a lot of classifiers.
Sequentially construct / identify these classifiers, $h_t(\cdot)$, one at a time.
Use weak classifiers to arrive at a complex decision boundary (strong classifier), where $\beta_t$ is the contribution of each weak classifier:

$$h(x) = \operatorname{sign}\left[\sum_{t=1}^{T} \beta_t h_t(x)\right]$$
![Page 7: CS260 Machine Learning Algorithms · 2017-09-16 · Professor Ameet Talwalkar CS260 Machine Learning Algorithms March 5, 2017 3 / 20. Outline 1 Administration 2 Review of last lecture](https://reader033.fdocuments.in/reader033/viewer/2022043007/5f94d2f14538cf70d32af47a/html5/thumbnails/7.jpg)
AdaBoost Algorithm

Given: $N$ samples $\{x_n, y_n\}$, where $y_n \in \{+1, -1\}$, and some way of constructing weak (or base) classifiers.

Initialize weights $w_1(n) = \frac{1}{N}$ for every training sample.

For $t = 1$ to $T$:

1 Train a weak classifier $h_t(x)$ using current weights $w_t(n)$, by minimizing the weighted classification error
$$\varepsilon_t = \sum_n w_t(n)\, \mathbb{I}[y_n \neq h_t(x_n)]$$
2 Compute the contribution for this classifier: $\beta_t = \frac{1}{2} \log \frac{1-\varepsilon_t}{\varepsilon_t}$
3 Update the weights on the training points,
$$w_{t+1}(n) \propto w_t(n)\, e^{-\beta_t y_n h_t(x_n)},$$
and normalize them such that $\sum_n w_{t+1}(n) = 1$.

Output the final classifier:

$$h(x) = \operatorname{sign}\left[\sum_{t=1}^{T} \beta_t h_t(x)\right]$$
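The loop above can be sketched in a few lines of numpy. This is a minimal illustrative sketch: decision stumps as the weak learner, the exhaustive stump search, and the clip guarding against $\varepsilon_t = 0$ are implementation choices here, not part of the slides.

```python
import numpy as np

def stump_predict(X, feature, threshold, polarity):
    """Depth-1 decision tree: predict `polarity` where X[:, feature] > threshold."""
    return np.where(X[:, feature] > threshold, polarity, -polarity)

def fit_stump(X, y, w):
    """Step 1: exhaustive search for the stump minimizing the weighted error eps_t."""
    best, best_err = None, np.inf
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for s in (1, -1):
                err = w[stump_predict(X, j, thr, s) != y].sum()
                if err < best_err:
                    best_err, best = err, (j, thr, s)
    return best, best_err

def adaboost(X, y, T):
    N = len(y)
    w = np.full(N, 1.0 / N)                    # initialize w_1(n) = 1/N
    ensemble = []
    for _ in range(T):
        stump, eps = fit_stump(X, y, w)        # step 1: minimize eps_t
        eps = min(max(eps, 1e-12), 1 - 1e-12)  # guard eps_t = 0 (our addition)
        beta = 0.5 * np.log((1 - eps) / eps)   # step 2: beta_t
        w = w * np.exp(-beta * y * stump_predict(X, *stump))  # step 3: reweight
        w = w / w.sum()                        # ...and normalize to sum to 1
        ensemble.append((beta, stump))
    return ensemble

def predict(ensemble, X):
    """Final classifier: h(x) = sign(sum_t beta_t h_t(x))."""
    return np.sign(sum(beta * stump_predict(X, *stump) for beta, stump in ensemble))
```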
Toy Example

10 data points and 2 features; $D_1$ denotes the initial weight distribution.

Weak classifiers: vertical or horizontal half-planes.
The data points are clearly not linearly separable.
In the beginning, all data points have equal weights (the size of the data markers "+" or "−").
Base classifier $h(\cdot)$: horizontal or vertical lines ("decision stumps"), i.e., depth-1 decision trees that classify data based on a single attribute.
Round 1: $t = 1$

[Figure: weak classifier $h_1$ with $\varepsilon_1 = 0.30$, $\beta_1 = 0.42$, and the reweighted distribution $D_2$.]

3 points misclassified (circled): $\varepsilon_1 = 0.3 \rightarrow \beta_1 = 0.42$.
Weights recomputed; the 3 misclassified data points receive larger weights.
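The round-1 numbers can be reproduced directly from the update rules; a quick sanity check (not on the slides):

```python
import math

eps1 = 0.3                                 # 3 of 10 points misclassified, each with weight 1/10
beta1 = 0.5 * math.log((1 - eps1) / eps1)  # beta_1 = 0.5 log(0.7/0.3) ≈ 0.42

# unnormalized updated weights: misclassified points are scaled by e^{+beta_1},
# correctly classified ones by e^{-beta_1}
w_mis = 0.1 * math.exp(beta1)
w_cor = 0.1 * math.exp(-beta1)
Z = 3 * w_mis + 7 * w_cor                  # normalizer so the weights sum to 1

print(round(beta1, 2))                     # 0.42
print(round(3 * w_mis / Z, 2))             # 0.5: the misclassified points now carry half the mass
```

The second printout illustrates a general property of the update: after reweighting, $h_t$ has weighted error exactly $1/2$ on the new distribution, which is why the next round must find a different classifier.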
Round 2: $t = 2$

[Figure: weak classifier $h_2$ with $\varepsilon_2 = 0.21$, $\beta_2 = 0.65$, and the reweighted distribution $D_3$.]

3 points misclassified (circled): $\varepsilon_2 = 0.21 \rightarrow \beta_2 = 0.65$. Note that $\varepsilon_2 \neq 0.3$, as those 3 data points have weights less than $1/10$.
The 3 misclassified data points get larger weights.
Data points classified correctly in both rounds have small weights.
Round 3: $t = 3$

[Figure: weak classifier $h_3$ with $\varepsilon_3 = 0.14$, $\beta_3 = 0.92$.]

3 points misclassified (circled): $\varepsilon_3 = 0.14 \rightarrow \beta_3 = 0.92$.
Previously correctly classified data points are now misclassified, yet our error is low. What's the intuition?
Since those points have been consistently classified correctly, this round's mistake will hopefully not have a huge impact on the overall prediction.
Final classifier: combining 3 classifiers

[Figure: $H_{\text{final}} = \operatorname{sign}(0.42\, h_1 + 0.65\, h_2 + 0.92\, h_3)$.]

All data points are now classified correctly!
Derivation of AdaBoost

Minimize the exponential loss

$$\ell_{\exp}(f(x), y) = e^{-y f(x)}$$

Greedily (sequentially) find the best classifier to optimize the loss: the classifier $f_{t-1}(x)$ is improved by adding a new classifier $h_t(x)$,

$$f(x) = f_{t-1}(x) + \beta_t h_t(x)$$

$$(h_t^*(x), \beta_t^*) = \operatorname*{argmin}_{(h_t(x),\, \beta_t)} \sum_n e^{-y_n f(x_n)} = \operatorname*{argmin}_{(h_t(x),\, \beta_t)} \sum_n e^{-y_n [f_{t-1}(x_n) + \beta_t h_t(x_n)]}$$
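Carrying the minimization one step further (a standard step, filled in here for completeness): identifying $w_t(n) \propto e^{-y_n f_{t-1}(x_n)}$ and splitting the sum over correctly and incorrectly classified points recovers the $\beta_t$ used in the algorithm.

```latex
\begin{aligned}
\sum_n w_t(n)\, e^{-y_n \beta_t h_t(x_n)}
  &= e^{-\beta_t} \sum_{n:\, y_n = h_t(x_n)} w_t(n)
   + e^{\beta_t} \sum_{n:\, y_n \neq h_t(x_n)} w_t(n)
   = e^{-\beta_t}(1 - \varepsilon_t) + e^{\beta_t}\varepsilon_t \\
0 &= \frac{\partial}{\partial \beta_t}\!\left[ e^{-\beta_t}(1 - \varepsilon_t) + e^{\beta_t}\varepsilon_t \right]
   = -e^{-\beta_t}(1 - \varepsilon_t) + e^{\beta_t}\varepsilon_t
\;\;\Longrightarrow\;\;
\beta_t = \tfrac{1}{2}\log\tfrac{1 - \varepsilon_t}{\varepsilon_t}
\end{aligned}
```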
Nonlinear basis learned by boosting

Two-stage process:

Get $\operatorname{sign}[f_1(x)]$, $\operatorname{sign}[f_2(x)]$, $\dots$
Combine them into a linear classification model:

$$y = \operatorname{sign}\left\{\sum_t \beta_t \operatorname{sign}[f_t(x)]\right\}$$

Equivalently, each stage learns a nonlinear basis $\phi_t(x) = \operatorname{sign}[f_t(x)]$.

One thought, then: why not learn the basis functions and the classifier at the same time?
Outline
1 Administration
2 Review of last lecture
3 Neural Networks
4 Summary
Derivation of backpropagation

Calculate the feed-forward signals $a^{(l)}$ from the input to the output.
Calculate the output error based on the predictions $a^{(L)}$ and the label.
Backpropagate the error by computing the $\delta^{(l)}$ values.
Calculate the gradients $\frac{\partial J}{\partial \theta_{ij}}$ via $a^{(l)}$ and $\delta^{(l)}$.
Update the parameters via $\theta_{ij} \leftarrow \theta_{ij} - \eta \frac{\partial J}{\partial \theta_{ij}}$.
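These five steps can be sketched for a one-hidden-layer network with sigmoid activations. This is a minimal sketch: the architecture, the cross-entropy output loss, and the variable names are assumptions for illustration, not from the slides.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, t, W1, b1, W2, b2, eta):
    """One gradient-descent step for a 1-hidden-layer sigmoid network."""
    # 1. feed-forward signals a^(l)
    a1 = sigmoid(W1 @ x + b1)
    p = sigmoid(W2 @ a1 + b2)           # output a^(L)
    # 2./3. output error and backpropagated deltas
    d2 = p - t                          # delta^(L) for sigmoid + cross-entropy
    d1 = (W2.T @ d2) * a1 * (1.0 - a1)  # delta^(l) = (W^T delta^(l+1)) * g'(z^(l))
    # 4. gradients dJ/dtheta from a^(l) and delta^(l)
    gW2, gb2 = np.outer(d2, a1), d2
    gW1, gb1 = np.outer(d1, x), d1
    # 5. update theta <- theta - eta * dJ/dtheta
    return W1 - eta * gW1, b1 - eta * gb1, W2 - eta * gW2, b2 - eta * gb2
```

Repeated calls on a single example should push the output toward the target, which gives a cheap sanity check of the gradients.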
Illustrative example

[Figure: a network with inputs $x_1, x_2, x_3$, a hidden node $q$, and an output node $p$; $\theta_h$ and $\theta_o$ label two of the edge weights.]

$p, q$: outputs at the indicated nodes
$\theta_h$: edge weight from $x_1$ to hidden node $q$
$\theta_o$: edge weight from hidden node $q$ to output node $p$
$b_q$: bias associated with node $q$
$z_q$: input to node $q$, i.e., $z_q = b_q + \theta_h x_1 + \dots$
$z_p$: input to node $p$, i.e., $z_p = b_p + \theta_o q + \dots$
$g$: activation function (e.g., sigmoid)
$t$: target value for the output node
Illustrative example (cont'd)

Assume a cross-entropy loss between the target $t$ and the network output $p$:

$$E = t \log p + (1 - t) \log(1 - p)$$

Gradients for the output layer (assuming $t = 1$):

$$\frac{\partial E}{\partial \theta_o}
= \frac{\partial E}{\partial g} \frac{\partial g}{\partial z_p} \frac{\partial z_p}{\partial \theta_o}
= \frac{1}{p} \frac{\partial}{\partial \theta_o} p
= \frac{1}{p} \frac{\partial}{\partial \theta_o} \sigma(z_p)
= \frac{1}{p}\, p (1 - p) \frac{\partial}{\partial \theta_o} z_p
= (1 - p) \frac{\partial}{\partial \theta_o} (b_p + \theta_o q + \dots)
= (1 - p)\, q$$

Gradients for the hidden layer:

$$\frac{\partial E}{\partial \theta_h} = \frac{\partial E}{\partial g} \frac{\partial g}{\partial z_q} \frac{\partial z_q}{\partial \theta_h}$$
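The chain-rule result can be checked numerically with a central finite difference (the specific values of $q$, $b_p$, and $\theta_o$ below are arbitrary illustrative numbers):

```python
import math

sigmoid = lambda z: 1.0 / (1.0 + math.exp(-z))

# scalar slice of the network: p = sigma(b_p + theta_o * q), with target t = 1
q, b_p, theta_o = 0.7, 0.1, 0.5

def E(th):                          # objective at t = 1: E = log p
    return math.log(sigmoid(b_p + th * q))

p = sigmoid(b_p + theta_o * q)
analytic = (1.0 - p) * q            # the (1 - p) q from the derivation above
h = 1e-6
numeric = (E(theta_o + h) - E(theta_o - h)) / (2 * h)
assert abs(analytic - numeric) < 1e-8
```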
Outline
1 Administration
2 Review of last lecture
3 Neural Networks
4 Summary: supervised learning
Summary of the course so far: a short list of important concepts

Supervised learning has been our focus.

Setup: given a training dataset $\{x_n, y_n\}_{n=1}^N$, we learn a function $h(x)$ to predict $x$'s true value $y$ (i.e., regression or classification).

Linear vs. nonlinear features
1 Linear: $h(x)$ depends on $w^T x$
2 Nonlinear: $h(x)$ depends on $w^T \phi(x)$, where $\phi$ is either explicit or depends on a kernel function $k(x_m, x_n) = \phi(x_m)^T \phi(x_n)$

Loss function
1 Squared loss: least squares for regression (minimizing the residual sum of errors)
2 Logistic loss: logistic regression
3 Exponential loss: AdaBoost
4 Margin-based loss: support vector machines

Principles of estimation
1 Point estimate: maximum likelihood, regularized likelihood
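The three classification losses in the list can be compared as functions of the margin $m = y f(x)$; the formulas below are the standard conventions, written out here as an illustration since the slide only names the losses:

```python
import math

# each surrogate loss as a function of the margin m = y * f(x)
losses = {
    "logistic":    lambda m: math.log(1.0 + math.exp(-m)),
    "exponential": lambda m: math.exp(-m),
    "hinge (SVM)": lambda m: max(0.0, 1.0 - m),
}

# all three decrease in the margin: wrong-side points (m < 0) are penalized heavily
for name, loss in losses.items():
    print(f"{name:12s}  m=-1: {loss(-1):.3f}   m=0: {loss(0):.3f}   m=+1: {loss(+1):.3f}")
```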
cont’d
Optimization
1 Methods: gradient descent, Newton's method
2 Convex optimization: global optimum vs. local optimum
3 Lagrange duality: primal and dual formulations

Learning theory
1 Difference between training error and generalization error
2 Overfitting; the bias-variance tradeoff
3 Regularization: various regularized models