Cross Validation & Ensembling
Shan-Hung [email protected]
Department of Computer Science,National Tsing Hua University, Taiwan
Machine Learning
Shan-Hung Wu (CS, NTHU) CV & Ensembling Machine Learning 1 / 34
Outline

1. Cross Validation
   - How Many Folds?
2. Ensemble Methods
   - Voting
   - Bagging
   - Boosting
   - Why AdaBoost Works?
Cross Validation

So far, we have used the holdout method for:
- Hyperparameter tuning: validation set
- Performance reporting: testing set

What if we get an "unfortunate" split?

K-fold cross validation:
1. Split the data set $\mathbb{X}$ evenly into $K$ subsets $\mathbb{X}^{(i)}$ (called folds)
2. For $i = 1, \cdots, K$, train $f^{\setminus i}$ using all data but the $i$-th fold ($\mathbb{X} \setminus \mathbb{X}^{(i)}$)
3. Report the cross-validation error $C_{CV}$ by averaging the testing errors $C[f^{\setminus i}]$ on $\mathbb{X}^{(i)}$
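The three steps above can be sketched in a few lines of NumPy. This is a minimal illustration only; the `train_fn`/`error_fn` names and the constant-mean toy model are placeholders, not from the slides:

```python
import numpy as np

def k_fold_cv_error(X, y, train_fn, error_fn, K=5, seed=0):
    """Minimal K-fold CV: shuffle, split evenly into K folds, train on
    all data but fold i, and average the K held-out errors."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    folds = np.array_split(idx, K)  # index sets for the X^(i)'s
    errors = []
    for i in range(K):
        train_idx = np.concatenate([folds[j] for j in range(K) if j != i])
        model = train_fn(X[train_idx], y[train_idx])  # trained on X \ X^(i)
        errors.append(error_fn(model, X[folds[i]], y[folds[i]]))
    return float(np.mean(errors))  # C_CV

# Toy usage: the "model" is just the training-set mean, scored by squared error.
train_mean = lambda X, y: float(y.mean())
sq_err = lambda m, X, y: float(np.mean((y - m) ** 2))

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 1))
y = 2.0 + rng.normal(size=100)
cv_err = k_fold_cv_error(X, y, train_mean, sq_err, K=5)
```

Any learner/loss pair can be plugged in through `train_fn` and `error_fn`.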
Nested Cross Validation

Cross validation (CV) can be applied to both hyperparameter tuning and performance reporting

E.g., $5 \times 2$ nested CV:
1. Inner (2 folds): select the hyperparameters giving the lowest $C_{CV}$
   - Can be wrapped by grid search
2. Train the final model on both the training and validation sets with the selected hyperparameters
3. Outer (5 folds): report $C_{CV}$ as the test error
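A $5 \times 2$ nested CV loop can be sketched as follows. This is an illustrative NumPy sketch: ridge regression with a hypothetical $\lambda$ grid stands in for the tuned hyperparameter, since the slides do not prescribe a particular model:

```python
import numpy as np

def ridge_fit(X, y, lam):
    # closed-form ridge: w = (X'X + lam*I)^(-1) X'y
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def mse(w, X, y):
    return float(np.mean((X @ w - y) ** 2))

def cv_error(X, y, lam, K):
    folds = np.array_split(np.arange(len(X)), K)
    errs = []
    for i in range(K):
        tr = np.concatenate([folds[j] for j in range(K) if j != i])
        errs.append(mse(ridge_fit(X[tr], y[tr], lam), X[folds[i]], y[folds[i]]))
    return float(np.mean(errs))

def nested_cv(X, y, lams, outer_K=5, inner_K=2):
    outer = np.array_split(np.arange(len(X)), outer_K)
    outer_errs = []
    for i in range(outer_K):
        te = outer[i]
        tr = np.concatenate([outer[j] for j in range(outer_K) if j != i])
        # inner loop: grid search over lams by inner CV on the training part
        best = min(lams, key=lambda l: cv_error(X[tr], y[tr], l, inner_K))
        # retrain on the full outer-training split with the selected lam
        w = ridge_fit(X[tr], y[tr], best)
        outer_errs.append(mse(w, X[te], y[te]))
    return float(np.mean(outer_errs))  # reported as the test error

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.5 * rng.normal(size=60)
err = nested_cv(X, y, lams=[0.01, 1.0, 100.0])
```

Note that the inner selection only ever sees the outer-training split, so the outer folds remain untouched test data.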
Outline

1. Cross Validation
   - How Many Folds?
2. Ensemble Methods
   - Voting
   - Bagging
   - Boosting
   - Why AdaBoost Works?
How Many Folds K? I

The cross-validation error $C_{CV}$ is an average of the $C[f^{\setminus i}]$'s

Regard each $C[f^{\setminus i}]$ as an estimator of the expected generalization error $E_{\mathbb{X}}(C[f_N])$

$C_{CV}$ is an estimator too, and we have

$MSE(C_{CV}) = E_{\mathbb{X}}[(C_{CV} - E_{\mathbb{X}}(C[f_N]))^2] = Var_{\mathbb{X}}(C_{CV}) + bias(C_{CV})^2$
Point Estimation Revisited: Mean Square Error

Let $\hat{\theta}_n$ be an estimator of a quantity $\theta$ related to a random variable $x$, mapped from $n$ i.i.d. samples of $x$

Mean square error of $\hat{\theta}_n$:

$MSE(\hat{\theta}_n) = E_{\mathbb{X}}[(\hat{\theta}_n - \theta)^2]$

It can be decomposed into the bias and variance:

$E_{\mathbb{X}}[(\hat{\theta}_n - \theta)^2] = E[(\hat{\theta}_n - E[\hat{\theta}_n] + E[\hat{\theta}_n] - \theta)^2]$
$= E[(\hat{\theta}_n - E[\hat{\theta}_n])^2 + (E[\hat{\theta}_n] - \theta)^2 + 2(\hat{\theta}_n - E[\hat{\theta}_n])(E[\hat{\theta}_n] - \theta)]$
$= E[(\hat{\theta}_n - E[\hat{\theta}_n])^2] + E[(E[\hat{\theta}_n] - \theta)^2] + 2E[\hat{\theta}_n - E[\hat{\theta}_n]](E[\hat{\theta}_n] - \theta)$
$= E[(\hat{\theta}_n - E[\hat{\theta}_n])^2] + (E[\hat{\theta}_n] - \theta)^2 + 2 \cdot 0 \cdot (E[\hat{\theta}_n] - \theta)$
$= Var_{\mathbb{X}}(\hat{\theta}_n) + bias(\hat{\theta}_n)^2$

The MSE of an unbiased estimator is its variance
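The decomposition can be checked numerically. As a sketch (my own example, not from the slides), take $\hat{\theta}_n$ to be the biased MLE of a Gaussian variance $\sigma^2$, whose bias is $-\sigma^2/n$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials, sigma2 = 10, 100_000, 4.0
samples = rng.normal(0.0, np.sqrt(sigma2), size=(trials, n))

theta_hat = samples.var(axis=1)  # biased MLE of sigma^2 (divides by n)
mse = float(np.mean((theta_hat - sigma2) ** 2))
bias = float(theta_hat.mean() - sigma2)  # analytically -sigma2/n = -0.4
var = float(theta_hat.var())
# MSE = Var + bias^2 holds exactly for the empirical moments as well
```

The identity is exact even for the empirical moments, because averaging squared deviations around the empirical mean and adding the squared offset to $\theta$ reproduces the algebra above term by term.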
Example: 5-Fold vs. 10-Fold CV

$MSE(C_{CV}) = E_{\mathbb{X}}[(C_{CV} - E_{\mathbb{X}}(C[f_N]))^2] = Var_{\mathbb{X}}(C_{CV}) + bias(C_{CV})^2$

Consider polynomial regression where $y = \sin(x) + \epsilon$, $\epsilon \sim \mathcal{N}(0, \sigma^2)$

Let $C[\cdot]$ be the MSE of a function's predictions against the true labels. In the figure from the original slides (not reproduced here):
- $E_{\mathbb{X}}(C[f_N])$: red line
- $bias(C_{CV})$: gaps between the red line and the other solid lines ($E_{\mathbb{X}}[C_{CV}]$)
- $Var_{\mathbb{X}}(C_{CV})$: shaded areas
Decomposing Bias and Variance

$C_{CV}$ is an estimator of the expected generalization error $E_{\mathbb{X}}(C[f_N])$:

$MSE(C_{CV}) = Var_{\mathbb{X}}(C_{CV}) + bias(C_{CV})^2$, where

$bias(C_{CV}) = E_{\mathbb{X}}(C_{CV}) - E_{\mathbb{X}}(C[f_N]) = E\left(\sum_i \frac{1}{K} C[f^{\setminus i}]\right) - E(C[f_N])$
$= \frac{1}{K} \sum_i E\left(C[f^{\setminus i}]\right) - E(C[f_N]) = E\left(C[f^{\setminus s}]\right) - E(C[f_N]), \forall s$
$= bias\left(C[f^{\setminus s}]\right), \forall s$

$Var_{\mathbb{X}}(C_{CV}) = Var\left(\sum_i \frac{1}{K} C[f^{\setminus i}]\right) = \frac{1}{K^2} Var\left(\sum_i C[f^{\setminus i}]\right)$
$= \frac{1}{K^2} \left(\sum_i Var\left(C[f^{\setminus i}]\right) + 2 \sum_{i,j,j>i} Cov_{\mathbb{X}}\left(C[f^{\setminus i}], C[f^{\setminus j}]\right)\right)$
$= \frac{1}{K} Var\left(C[f^{\setminus s}]\right) + \frac{2}{K^2} \sum_{i,j,j>i} Cov\left(C[f^{\setminus i}], C[f^{\setminus j}]\right), \forall s$
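The variance formula can be sanity-checked with a Monte Carlo simulation of $K$ correlated "fold errors". This is an illustrative sketch with an assumed equicorrelation $\rho$, not a property of any real CV run:

```python
import numpy as np

rng = np.random.default_rng(0)
K, rho, fold_var = 5, 0.6, 2.0

# equicorrelated fold errors: Var = fold_var, Cov = rho * fold_var for each pair
Sigma = fold_var * ((1 - rho) * np.eye(K) + rho * np.ones((K, K)))
draws = rng.multivariate_normal(np.zeros(K), Sigma, size=200_000)

empirical = float(draws.mean(axis=1).var())  # Var of the K-fold average
# (1/K) Var + (2/K^2) * [K(K-1)/2 pairs] * Cov, matching the decomposition
predicted = fold_var / K + (2 / K**2) * (K * (K - 1) / 2) * rho * fold_var
```

With positive covariance the variance of the average stays well above $Var/K$, which is exactly why correlated folds hurt.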
How Many Folds K? II

$MSE(C_{CV}) = Var_{\mathbb{X}}(C_{CV}) + bias(C_{CV})^2$, where
$bias(C_{CV}) = bias\left(C[f^{\setminus s}]\right), \forall s$
$Var_{\mathbb{X}}(C_{CV}) = \frac{1}{K} Var\left(C[f^{\setminus s}]\right) + \frac{2}{K^2} \sum_{i,j,j>i} Cov\left(C[f^{\setminus i}], C[f^{\setminus j}]\right), \forall s$

We can reduce $bias(C_{CV})$ and $Var(C_{CV})$ by learning theory:
- Choosing the right model complexity, avoiding both underfitting and overfitting
- Collecting more training examples ($N$)

Furthermore, we can reduce $Var(C_{CV})$ by making $f^{\setminus i}$ and $f^{\setminus j}$ uncorrelated
How Many Folds K? III

$bias(C_{CV}) = bias\left(C[f^{\setminus s}]\right), \forall s$
$Var_{\mathbb{X}}(C_{CV}) = \frac{1}{K} Var\left(C[f^{\setminus s}]\right) + \frac{2}{K^2} \sum_{i,j,j>i} Cov\left(C[f^{\setminus i}], C[f^{\setminus j}]\right), \forall s$

With a large $K$, $C_{CV}$ tends to have:
- Low $bias\left(C[f^{\setminus s}]\right)$ and $Var\left(C[f^{\setminus s}]\right)$, as each $f^{\setminus s}$ is trained on more examples
- High $Cov\left(C[f^{\setminus i}], C[f^{\setminus j}]\right)$, as the training sets $\mathbb{X} \setminus \mathbb{X}^{(i)}$ and $\mathbb{X} \setminus \mathbb{X}^{(j)}$ are more similar, so $C[f^{\setminus i}]$ and $C[f^{\setminus j}]$ are more positively correlated
How Many Folds K? IV

$bias(C_{CV}) = bias\left(C[f^{\setminus s}]\right), \forall s$
$Var_{\mathbb{X}}(C_{CV}) = \frac{1}{K} Var\left(C[f^{\setminus s}]\right) + \frac{2}{K^2} \sum_{i,j,j>i} Cov\left(C[f^{\setminus i}], C[f^{\setminus j}]\right), \forall s$

Conversely, with a small $K$, the cross-validation error tends to have high $bias\left(C[f^{\setminus s}]\right)$ and $Var\left(C[f^{\setminus s}]\right)$ but low $Cov\left(C[f^{\setminus i}], C[f^{\setminus j}]\right)$

In practice, we usually set $K = 5$ or $10$
Leave-One-Out CV

$$\mathrm{bias}(C_{CV}) = \mathrm{bias}\big(C[f_{\mathbb{X}^{(s)}}]\big),\ \forall s$$

$$\mathrm{Var}_{\mathbb{X}}(C_{CV}) = \frac{1}{K}\,\mathrm{Var}\big(C[f_{\mathbb{X}^{(s)}}]\big) + \frac{2}{K^2}\sum_{i,j:\,j>i}\mathrm{Cov}\big(C[f_{\mathbb{X}^{(i)}}],\, C[f_{\mathbb{X}^{(j)}}]\big),\ \forall s$$

For a very small dataset, $\mathrm{MSE}(C_{CV})$ is dominated by $\mathrm{bias}\big(C[f_{\mathbb{X}^{(s)}}]\big)$ and $\mathrm{Var}\big(C[f_{\mathbb{X}^{(s)}}]\big)$, not by $\mathrm{Cov}\big(C[f_{\mathbb{X}^{(i)}}],\, C[f_{\mathbb{X}^{(j)}}]\big)$

We can choose K = N, which we call the leave-one-out CV
Shan-Hung Wu (CS, NTHU) CV & Ensembling Machine Learning 14 / 34
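Setting K = N makes each example its own validation fold. A minimal sketch (again with placeholder `fit`/`error` callables, not part of the slides):

```python
import numpy as np

def loocv_error(X, y, fit, error):
    """Leave-one-out CV: K = N, so each example is its own validation fold."""
    N = len(X)
    errs = []
    for i in range(N):
        train = np.r_[0:i, i + 1:N]           # every index except i
        model = fit(X[train], y[train])
        errs.append(error(model, X[i:i + 1], y[i:i + 1]))
    return float(np.mean(errs))

X = np.arange(6, dtype=float).reshape(-1, 1)
y = X[:, 0] ** 2
fit = lambda Xt, yt: float(yt.mean())         # constant predictor again
error = lambda m, Xv, yv: float(((yv - m) ** 2).mean())
print(loocv_error(X, y, fit, error))
```

Note that this trains N models, which is why LOOCV is usually reserved for very small datasets.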
Outline

1 Cross Validation
    How Many Folds?
2 Ensemble Methods
    Voting
    Bagging
    Boosting
    Why AdaBoost Works?
Shan-Hung Wu (CS, NTHU) CV & Ensembling Machine Learning 15 / 34
Ensemble Methods

No free lunch theorem: there is no single ML algorithm that always outperforms the others in all domains/tasks
Can we combine multiple base-learners to improve
    applicability across different domains, and/or
    generalization performance in a specific task?
These are the goals of ensemble learning
How to "combine" multiple base-learners?
Shan-Hung Wu (CS, NTHU) CV & Ensembling Machine Learning 16 / 34
Outline

1 Cross Validation
    How Many Folds?
2 Ensemble Methods
    Voting
    Bagging
    Boosting
    Why AdaBoost Works?
Shan-Hung Wu (CS, NTHU) CV & Ensembling Machine Learning 17 / 34
Voting

Voting: linearly combining the predictions of the base-learners for each x:

$$y_k = \sum_j w_j\, y_k^{(j)}, \quad \text{where } w_j \ge 0,\ \sum_j w_j = 1$$

If all learners are given equal weight $w_j = 1/L$, we have the plurality vote (the multi-class version of the majority vote)

Voting Rule    Formula
Sum            $y_k = \frac{1}{L}\sum_{j=1}^{L} y_k^{(j)}$
Weighted sum   $y_k = \sum_j w_j\, y_k^{(j)},\ w_j \ge 0,\ \sum_j w_j = 1$
Median         $y_k = \mathrm{median}_j\, y_k^{(j)}$
Minimum        $y_k = \min_j y_k^{(j)}$
Maximum        $y_k = \max_j y_k^{(j)}$
Product        $y_k = \prod_j y_k^{(j)}$
Shan-Hung Wu (CS, NTHU) CV & Ensembling Machine Learning 18 / 34
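Each rule in the table is a one-liner over the base-learners' outputs. The class probabilities below are made-up numbers for illustration only:

```python
import numpy as np

# Predicted class probabilities y_k^(j) from L = 3 base-learners
# for a single input x over 3 classes (rows: learners j, columns: classes k).
Y = np.array([[0.6, 0.3, 0.1],
              [0.5, 0.2, 0.3],
              [0.7, 0.2, 0.1]])
w = np.array([0.5, 0.3, 0.2])                # w_j >= 0 and sum to 1

rules = {
    "sum":          Y.mean(axis=0),          # (1/L) * sum_j y_k^(j)
    "weighted sum": w @ Y,
    "median":       np.median(Y, axis=0),
    "minimum":      Y.min(axis=0),
    "maximum":      Y.max(axis=0),
    "product":      Y.prod(axis=0),
}
for name, yk in rules.items():
    print(f"{name:12s} {yk} -> class {yk.argmax()}")
```

Here all six rules agree on class 0, but on less clear-cut inputs they can differ.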
Why Voting Works? I

Assume that each $y^{(j)}$ has the expected value $\mathrm{E}_{\mathbb{X}}\big(y^{(j)}\,|\,x\big)$ and variance $\mathrm{Var}_{\mathbb{X}}\big(y^{(j)}\,|\,x\big)$

When $w_j = 1/L$, we have:

$$\mathrm{E}_{\mathbb{X}}(y\,|\,x) = \mathrm{E}\Big(\sum_j \tfrac{1}{L}\, y^{(j)}\,\Big|\,x\Big) = \frac{1}{L}\sum_j \mathrm{E}\big(y^{(j)}\,|\,x\big) = \mathrm{E}\big(y^{(j)}\,|\,x\big)$$

$$\mathrm{Var}_{\mathbb{X}}(y\,|\,x) = \mathrm{Var}\Big(\sum_j \tfrac{1}{L}\, y^{(j)}\,\Big|\,x\Big) = \frac{1}{L^2}\,\mathrm{Var}\Big(\sum_j y^{(j)}\,\Big|\,x\Big) = \frac{1}{L}\,\mathrm{Var}\big(y^{(j)}\,|\,x\big) + \frac{2}{L^2}\sum_{i,j:\,i<j}\mathrm{Cov}\big(y^{(i)}, y^{(j)}\,|\,x\big)$$

The expected value doesn't change, so the bias doesn't change
Shan-Hung Wu (CS, NTHU) CV & Ensembling Machine Learning 19 / 34
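A quick simulation illustrates the two equations: averaging L uncorrelated predictors leaves the mean (and hence the bias) unchanged, but divides the variance by L. The mean 1.0 and variance 4.0 below are arbitrary choices for the demo:

```python
import numpy as np

rng = np.random.default_rng(0)
L, trials = 10, 200_000
# Each y^(j)|x: same mean 1.0, same variance 4.0, uncorrelated across j
preds = 1.0 + 2.0 * rng.standard_normal((trials, L))
vote = preds.mean(axis=1)                 # equal-weight voting, w_j = 1/L

print(preds[:, 0].var())   # ~ 4.0       (a single learner)
print(vote.var())          # ~ 4.0 / L = 0.4  (the ensemble)
print(vote.mean())         # ~ 1.0       (bias unchanged)
```

With correlated columns the covariance term would reappear and the reduction would be smaller, as the next slide notes.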
Why Voting Works? II

$$\mathrm{Var}_{\mathbb{X}}(y\,|\,x) = \frac{1}{L}\,\mathrm{Var}\big(y^{(j)}\,|\,x\big) + \frac{2}{L^2}\sum_{i,j:\,i<j}\mathrm{Cov}\big(y^{(i)}, y^{(j)}\,|\,x\big)$$

If $y^{(i)}$ and $y^{(j)}$ are uncorrelated, the variance can be reduced
Unfortunately, the $y^{(j)}$'s may not be i.i.d. in practice
If the voters are positively correlated, the variance increases
Shan-Hung Wu (CS, NTHU) CV & Ensembling Machine Learning 20 / 34
Outline

1 Cross Validation
    How Many Folds?
2 Ensemble Methods
    Voting
    Bagging
    Boosting
    Why AdaBoost Works?
Shan-Hung Wu (CS, NTHU) CV & Ensembling Machine Learning 21 / 34
Bagging

Bagging (short for bootstrap aggregating) is a voting method, but the base-learners are made different deliberately
How? Why not train them using slightly different training sets?

1 Generate L slightly different samples from the given sample by bootstrap: given $\mathbb{X}$ of size N, draw N points randomly from $\mathbb{X}$ with replacement to get $\mathbb{X}^{(j)}$
    It is possible that some instances are drawn more than once and some are not drawn at all
2 Train a base-learner on each $\mathbb{X}^{(j)}$
Shan-Hung Wu (CS, NTHU) CV & Ensembling Machine Learning 22 / 34
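The bootstrap step can be sketched directly; `bootstrap_samples` is an illustrative helper name, not from the slides:

```python
import numpy as np

def bootstrap_samples(X, L, seed=0):
    """L bootstrap replicates: each draws N indices from X with replacement."""
    rng = np.random.default_rng(seed)
    N = len(X)
    return [X[rng.integers(0, N, size=N)] for _ in range(L)]

X = np.arange(10)
for Xj in bootstrap_samples(X, L=3):
    print(sorted(Xj))   # duplicates are likely; some points never appear
```

For large N, a given point appears in a given replicate with probability $1 - (1 - 1/N)^N \approx 1 - e^{-1} \approx 63\%$, which is what makes the replicates "slightly different."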
Outline

1 Cross Validation
    How Many Folds?
2 Ensemble Methods
    Voting
    Bagging
    Boosting
    Why AdaBoost Works?
Shan-Hung Wu (CS, NTHU) CV & Ensembling Machine Learning 23 / 34
Boosting
In bagging, generating "uncorrelated" base-learners is left to chance and to the instability of the learning method
In boosting, we generate complementary base-learners
How? Why not train the next learner on the mistakes of the previous learners?
For simplicity, let's consider binary classification here: $d^{(j)}(x) \in \{1, -1\}$
The original boosting algorithm combines three weak learners to generate a strong learner
    A weak learner has error probability less than 1/2 (better than random guessing)
    A strong learner has arbitrarily small error probability
Shan-Hung Wu (CS, NTHU) CV & Ensembling Machine Learning 24 / 34
Original Boosting Algorithm

Training
1 Given a large training set, randomly divide it into three parts $\mathbb{X}^{(1)}$, $\mathbb{X}^{(2)}$, and $\mathbb{X}^{(3)}$
2 Use $\mathbb{X}^{(1)}$ to train the first learner $d^{(1)}$, and feed $\mathbb{X}^{(2)}$ to $d^{(1)}$
3 Use all points misclassified by $d^{(1)}$ and $\mathbb{X}^{(2)}$ to train $d^{(2)}$. Then feed $\mathbb{X}^{(3)}$ to $d^{(1)}$ and $d^{(2)}$
4 Use the points on which $d^{(1)}$ and $d^{(2)}$ disagree to train $d^{(3)}$

Testing
1 Feed a point to $d^{(1)}$ and $d^{(2)}$ first. If their outputs agree, use them as the final prediction
2 Otherwise, the output of $d^{(3)}$ is taken
Shan-Hung Wu (CS, NTHU) CV & Ensembling Machine Learning 25 / 34
Example

Assuming $\mathbb{X}^{(1)}$, $\mathbb{X}^{(2)}$, and $\mathbb{X}^{(3)}$ are the same:

Disadvantage: it requires a large training set to afford the three-way split
Shan-Hung Wu (CS, NTHU) CV & Ensembling Machine Learning 26 / 34
AdaBoost

AdaBoost: uses the same training set over and over again
How to make some points "larger"? Modify the probabilities of drawing the instances as a function of error
Notation:
    $\Pr^{(i,j)}$: the probability that an example $(x^{(i)}, y^{(i)})$ is drawn to train the j-th base-learner $d^{(j)}$
    $\varepsilon^{(j)} = \sum_i \Pr^{(i,j)} \mathbb{1}\big(y^{(i)} \ne d^{(j)}(x^{(i)})\big)$: the error rate of $d^{(j)}$ on its training set
Shan-Hung Wu (CS, NTHU) CV & Ensembling Machine Learning 27 / 34
Algorithm

Training
1 Initialize $\Pr^{(i,1)} = \frac{1}{N}$ for all i
2 Start from j = 1:
    1 Randomly draw N examples from $\mathbb{X}$ with probabilities $\Pr^{(i,j)}$ and use them to train $d^{(j)}$
    2 Stop adding new base-learners if $\varepsilon^{(j)} \ge \frac{1}{2}$
    3 Define $\alpha_j = \frac{1}{2}\log\Big(\frac{1-\varepsilon^{(j)}}{\varepsilon^{(j)}}\Big) > 0$ and set $\Pr^{(i,j+1)} = \Pr^{(i,j)} \cdot \exp\big(-\alpha_j\, y^{(i)} d^{(j)}(x^{(i)})\big)$ for all i
    4 Normalize $\Pr^{(i,j+1)}$, $\forall i$, by multiplying by $\big(\sum_i \Pr^{(i,j+1)}\big)^{-1}$

Testing
1 Given x, calculate $y^{(j)}$ for all j
2 Make the final prediction y by voting: $y = \sum_j \alpha_j\, d^{(j)}(x)$
Shan-Hung Wu (CS, NTHU) CV & Ensembling Machine Learning 28 / 34
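A compact sketch of the training and testing steps above, assuming decision stumps as the weak learners. For determinism, each stump is fit on the weighted error directly instead of resampling with the probabilities (a common variant with the same expected behavior); the weight update and the α formula follow the slide:

```python
import numpy as np

def fit_stump(X, y, p):
    """Best threshold stump d(x) = s if x_f >= t else -s, under weights p."""
    best = (np.inf, None)
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f]):
            for s in (1, -1):
                pred = s * np.where(X[:, f] >= t, 1, -1)
                err = p[pred != y].sum()        # weighted error epsilon
                if err < best[0]:
                    best = (err, (f, t, s))
    return best

def adaboost(X, y, L=10):
    N = len(X)
    p = np.full(N, 1.0 / N)                     # Pr^(i,1) = 1/N
    learners = []
    for _ in range(L):
        err, (f, t, s) = fit_stump(X, y, p)
        if err >= 0.5:                          # no weak learner left: stop
            break
        err = max(err, 1e-10)
        a = 0.5 * np.log((1 - err) / err)       # alpha_j > 0
        d = s * np.where(X[:, f] >= t, 1, -1)
        p *= np.exp(-a * y * d)                 # up-weight the mistakes
        p /= p.sum()                            # normalize to a distribution
        learners.append((a, f, t, s))
    return learners

def predict(learners, X):
    g = np.zeros(len(X))
    for a, f, t, s in learners:
        g += a * s * np.where(X[:, f] >= t, 1, -1)
    return np.sign(g)                           # y = sign(sum_j alpha_j d^(j))

X = np.arange(8, dtype=float).reshape(-1, 1)
y = np.array([-1, -1, 1, 1, 1, 1, -1, -1])      # an interval: no single stump fits
learners = adaboost(X, y, L=3)
print(predict(learners, X))                     # matches y after 3 rounds
```

No single stump classifies the interval correctly, but three boosted stumps do, which is exactly the complementarity the algorithm is designed to produce.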
Example

$d^{(j+1)}$ complements $d^{(j)}$ and $d^{(j-1)}$ by focusing on the predictions they disagree on

The voting weights $\alpha_j = \frac{1}{2}\log\Big(\frac{1-\varepsilon^{(j)}}{\varepsilon^{(j)}}\Big)$ in the predictions are proportional to each base-learner's accuracy
Shan-Hung Wu (CS, NTHU) CV & Ensembling Machine Learning 29 / 34
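The weight formula can be checked numerically: α_j grows as the base-learner's error shrinks, and approaches 0 as the error approaches 1/2. The error rates below are arbitrary example values:

```python
import numpy as np

eps = np.array([0.49, 0.4, 0.25, 0.1, 0.01])    # base-learner error rates
alpha = 0.5 * np.log((1 - eps) / eps)            # AdaBoost voting weights
for e, a in zip(eps, alpha):
    print(f"error {e:.2f} -> voting weight {a:.3f}")
```

So a near-random learner (ε close to 1/2) gets almost no vote, while a very accurate one dominates the weighted sum.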
Outline

1 Cross Validation
    How Many Folds?
2 Ensemble Methods
    Voting
    Bagging
    Boosting
    Why AdaBoost Works?
Shan-Hung Wu (CS, NTHU) CV & Ensembling Machine Learning 30 / 34
Why AdaBoost Works

Why does AdaBoost improve performance?
By increasing model complexity? Not exactly
Empirical study: AdaBoost reduces overfitting as L grows, even when there is no training error
AdaBoost increases the margin [1, 2]
Shan-Hung Wu (CS, NTHU) CV & Ensembling Machine Learning 31 / 34
Margin as Confidence of Predictions

Recall that in SVC, a larger margin improves generalizability
Due to higher-confidence predictions over training examples
We can define the margin for AdaBoost similarly
In binary classification, define the margin of a prediction on an example (x^(i), y^(i)) ∈ X as:

margin(x^(i), y^(i)) = y^(i) f(x^(i)) = Σ_{j : y^(i) = d^(j)(x^(i))} α_j − Σ_{j : y^(i) ≠ d^(j)(x^(i))} α_j

Shan-Hung Wu (CS, NTHU) CV & Ensembling Machine Learning 32 / 34
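The margin definition above can be sketched directly: with labels in {−1, +1}, y·d^(j)(x) is +1 when base learner j is correct and −1 when it is wrong, so y f(x) equals the α-mass of correct learners minus that of incorrect ones. A minimal sketch (the function name and the sample predictions/weights are hypothetical):

```python
def adaboost_margin(y, base_preds, alphas):
    """Margin y * f(x) of one example: the total voting weight of the
    base learners that predict it correctly, minus the total weight of
    those that predict it wrongly (labels and predictions in {-1, +1})."""
    f = sum(a * d for a, d in zip(alphas, base_preds))
    return y * f

y = +1
preds = [+1, -1, +1]       # hypothetical base-learner outputs d^(j)(x)
alphas = [0.75, 0.25, 0.5]  # hypothetical voting weights
# Correct learners contribute +0.75 and +0.5; the wrong one subtracts 0.25.
print(adaboost_margin(y, preds, alphas))  # 1.0
```

A positive margin means the weighted majority vote classifies the example correctly; a larger margin means the vote does so with more confidence.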
Margin Distribution

Margin distribution over θ:

Pr_X(y^(i) f(x^(i)) ≤ θ) ≈ |{(x^(i), y^(i)) : y^(i) f(x^(i)) ≤ θ}| / |X|

A complementary learner:
Clarifies low-confidence areas
Increases the margin of points in these areas

Shan-Hung Wu (CS, NTHU) CV & Ensembling Machine Learning 33 / 34
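The empirical margin distribution above is just the fraction of training examples whose margin is at most θ. A minimal sketch (the function name and the sample margin values are hypothetical):

```python
def margin_distribution(margins, theta):
    """Empirical Pr(y f(x) <= theta): the fraction of training
    examples whose margin y^(i) f(x^(i)) is at most theta."""
    return sum(m <= theta for m in margins) / len(margins)

# Hypothetical margins y^(i) f(x^(i)) over a training set of 5 examples.
margins = [-0.2, 0.1, 0.4, 0.4, 0.9]
print(margin_distribution(margins, 0.0))  # 0.2 (one misclassified example)
print(margin_distribution(margins, 0.4))  # 0.8
```

Plotting this fraction against θ gives the cumulative margin-distribution curve; as AdaBoost adds complementary learners, the curve shifts right, meaning fewer low-confidence examples.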
Reference I

[1] Yoav Freund, Robert Schapire, and N. Abe. A short introduction to boosting. Journal-Japanese Society For Artificial Intelligence, 14(771-780):1612, 1999.

[2] Liwei Wang, Masashi Sugiyama, Cheng Yang, Zhi-Hua Zhou, and Jufu Feng. On the margin explanation of boosting algorithms. In COLT, pages 479–490. Citeseer, 2008.
Shan-Hung Wu (CS, NTHU) CV & Ensembling Machine Learning 34 / 34