A decision-theoretic generalization of on-line learning and an application to boosting
AdaBoost Algorithm Upper Bound for Adaboost Algorithm Experiment Evaluation Generalization Analysis
A decision-theoretic generalization of on-line learning and an application to boosting [1]
From Regret Learning to AdaBoost
Xing Wang
Department of Computer Science, TAMU
Date: May 6, 2015
Table of Contents
1 AdaBoost Algorithm
2 Upper Bound for AdaBoost Algorithm
3 Experiment Evaluation
   Experiment 1
   Experiment 2
4 Generalization Analysis
External Regret Learning
Initialize $w^1_i \in [0,1]$ with $\sum_{i=1}^N w^1_i = 1$;
for $t = 1 \ldots T$ do
    get $p^t = w^t / \sum_i w^t_i$;
    receive loss vector $l^t \in [0,1]^N$; suffer loss $p^t \cdot l^t$;
    update weight $w^{t+1}_i = w^t_i \beta^{l^t_i}$;
end
Algorithm 1: PW Algorithm
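A minimal executable sketch of Algorithm 1 (the function name and the toy losses are illustrative, not from the slides):

```python
import numpy as np

def hedge(losses, beta=0.9):
    """Multiplicative-weights (PW / Hedge) sketch.
    `losses` is a T x N array of per-expert losses in [0, 1]."""
    T, N = losses.shape
    w = np.full(N, 1.0 / N)          # uniform initial weights, summing to 1
    total_loss = 0.0
    for t in range(T):
        p = w / w.sum()              # play the normalized weight vector p^t
        total_loss += p @ losses[t]  # suffer expected loss p^t . l^t
        w = w * beta ** losses[t]    # exponential down-weighting w * beta^l
    return total_loss, w

# Toy run: expert 0 is always right, expert 1 always wrong; the weight
# mass shifts to expert 0 and the cumulative loss stays bounded.
losses = np.tile(np.array([0.0, 1.0]), (50, 1))
loss, w = hedge(losses)
```

The cumulative loss stays close to that of the best single expert, which is the external-regret guarantee the talk builds on.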
From Regret to Adaptive Boosting
input: $N$ labeled samples $(x_1, y_1), \ldots, (x_N, y_N)$; distribution $D$ over the $N$ samples; weak learning algorithm WeakLearn
Initialize $w^1_i = D(i)$;
for $t = 1 \ldots T$ do
    provide WeakLearn with distribution $p^t = w^t / \sum_i w^t_i$ over the samples, get a hypothesis $h_t : X \to [0,1]$;
    calculate the error of $h_t$: $\varepsilon_t = \sum_{i=1}^N p^t_i |h_t(x_i) - y_i|$;
    set $\beta_t = \varepsilon_t/(1-\varepsilon_t)$, update weight vector $w^{t+1}_i = w^t_i \beta_t^{1-|h_t(x_i)-y_i|}$;
end
Algorithm 2: AdaBoost

$$h_f(x) = \begin{cases} 1 & \text{if } \sum_{t=1}^T \log(1/\beta_t)\, h_t(x) \ge \frac{1}{2} \sum_{t=1}^T \log(1/\beta_t) \\ 0 & \text{otherwise} \end{cases} \qquad (1)$$
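Algorithm 2 can be sketched directly in code. This is a minimal illustration, not the talk's sklearn setup: the hypothetical `weak_learn` is a brute-force threshold stump, and $\varepsilon_t$ is clipped away from zero so $\beta_t$ stays positive.

```python
import numpy as np

def weak_learn(X, y, p):
    """Hypothetical weak learner: the best single-feature threshold stump
    under the sample distribution p (returns a 0/1 predictor)."""
    best = None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = ((sign * (X[:, j] - thr)) >= 0).astype(float)
                err = p @ np.abs(pred - y)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    _, j, thr, sign = best
    return lambda X: ((sign * (X[:, j] - thr)) >= 0).astype(float)

def adaboost(X, y, T=10):
    N = len(y)
    w = np.full(N, 1.0 / N)                     # w_i^1 = D(i), uniform here
    hs, betas = [], []
    for _ in range(T):
        p = w / w.sum()
        h = weak_learn(X, y, p)
        eps = max(p @ np.abs(h(X) - y), 1e-10)  # eps_t, clipped so beta_t > 0
        if eps >= 0.5:
            break
        beta = eps / (1 - eps)                  # beta_t = eps_t / (1 - eps_t)
        w = w * beta ** (1 - np.abs(h(X) - y))  # down-weight correct samples
        hs.append(h)
        betas.append(beta)
    def h_f(X):                                 # weighted majority vote, eq. (1)
        logs = np.log(1 / np.array(betas))
        votes = sum(a * h(X) for a, h in zip(logs, hs))
        return (votes >= 0.5 * logs.sum()).astype(int)
    return h_f

# Toy run: 1-D data separable at x = 0.6.
X = np.array([[0.1], [0.2], [0.6], [0.9]])
y = np.array([0, 0, 1, 1])
h_f = adaboost(X, y, T=5)
```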
Error Bound for AdaBoost
Theorem. The error of the final hypothesis $h_f$, $\varepsilon = \sum_{i: h_f(x_i) \ne y_i} D(i)$, is bounded by
$$\varepsilon \le \prod_{t=1}^T 2\sqrt{\varepsilon_t(1-\varepsilon_t)}$$

Figure 1: $2\sqrt{\varepsilon_t(1-\varepsilon_t)}$
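To get a feel for the bound, a quick numeric sketch (the $\varepsilon_t$ values here are illustrative, not from the experiments):

```python
import math

def training_error_bound(eps_list):
    """Bound from the theorem: prod_t 2*sqrt(eps_t * (1 - eps_t))."""
    b = 1.0
    for e in eps_list:
        b *= 2 * math.sqrt(e * (1 - e))
    return b

# A weak learner at eps_t = 0.5 contributes a factor of exactly 1 (no progress);
# any edge below 0.5 makes the bound decay geometrically in T.
no_progress = training_error_bound([0.5] * 100)
with_edge = training_error_bound([0.4] * 100)
```

Each round at $\varepsilon_t = 0.4$ multiplies the bound by $2\sqrt{0.24} \approx 0.98$, so 100 rounds drive it down to roughly $0.13$.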
Theorem proof, part 1
Unrolling the weight update gives
$$w^{T+1}_i = D(i) \prod_{t=1}^T \beta_t^{1-|h_t(x_i)-y_i|} \qquad (2)$$
By (1), $h_f(x) = 1$ iff $\sum_{t=1}^T \log(1/\beta_t)\, h_t(x) \ge \frac{1}{2} \sum_{t=1}^T \log(1/\beta_t)$. Then $h_f$ making a mistake ($h_f(x_i) \ne y_i$) is equivalent to
$$\prod_{t=1}^T \beta_t^{-|h_t(x_i)-y_i|} \ge \Big(\prod_{t=1}^T \beta_t\Big)^{-1/2} \qquad (3)$$
Plugging (3) into (2) for the mislabeled samples, we have
$$\sum_{i=1}^N w^{T+1}_i \ge \sum_{i: h_f(x_i) \ne y_i} w^{T+1}_i \ge \Big(\sum_{i: h_f(x_i) \ne y_i} D(i)\Big)\Big(\prod_{t=1}^T \beta_t\Big)^{1/2} = \varepsilon \Big(\prod_{t=1}^T \beta_t\Big)^{1/2} \qquad (4)$$
Theorem proof, part 2
Since $\beta^x \le 1 - (1-\beta)x$ for $x \in [0,1]$ (by convexity of $\beta^x$),
$$\sum_{i=1}^N w^{t+1}_i = \sum_{i=1}^N w^t_i \beta_t^{1-|h_t(x_i)-y_i|} \le \sum_{i=1}^N w^t_i \big(1 - (1-\beta_t)(1-|h_t(x_i)-y_i|)\big) = \Big(\sum_{i=1}^N w^t_i\Big)\big(1 - (1-\varepsilon_t)(1-\beta_t)\big) \qquad (5)$$
where $\varepsilon_t = \sum_{i=1}^N w^t_i |h_t(x_i)-y_i| / \sum_{j=1}^N w^t_j$. Applying (5) repeatedly,
$$\sum_{i=1}^N w^{T+1}_i \le \Big(\sum_{i=1}^N w^1_i\Big) \prod_{t=1}^T \big(1 - (1-\varepsilon_t)(1-\beta_t)\big) \le \prod_{t=1}^T \big(1 - (1-\varepsilon_t)(1-\beta_t)\big) \qquad (6)$$
Theorem proof, part 3
Combining (4), $\varepsilon \big(\prod_{t=1}^T \beta_t\big)^{1/2} \le \sum_{i=1}^N w^{T+1}_i$, with (6), $\sum_{i=1}^N w^{T+1}_i \le \prod_{t=1}^T \big(1 - (1-\varepsilon_t)(1-\beta_t)\big)$, we get:
$$\varepsilon \le \prod_{t=1}^T \frac{1 - (1-\varepsilon_t)(1-\beta_t)}{\sqrt{\beta_t}} \qquad (7)$$
The right-hand side is minimized at $\beta_t = \varepsilon_t/(1-\varepsilon_t)$; plugging in this value finishes the proof: $\varepsilon \le 2^T \prod_{t=1}^T \sqrt{\varepsilon_t(1-\varepsilon_t)}$.
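For completeness, a quick check (minimizing each factor of (7) independently) that $\beta_t = \varepsilon_t/(1-\varepsilon_t)$ is indeed the minimizer:
$$f(\beta) = \frac{1 - (1-\varepsilon)(1-\beta)}{\sqrt{\beta}} = \frac{\varepsilon + (1-\varepsilon)\beta}{\sqrt{\beta}}, \qquad f'(\beta) = \frac{1-\varepsilon}{\sqrt{\beta}} - \frac{\varepsilon + (1-\varepsilon)\beta}{2\beta^{3/2}} = 0 \;\Rightarrow\; 2(1-\varepsilon)\beta = \varepsilon + (1-\varepsilon)\beta \;\Rightarrow\; \beta = \frac{\varepsilon}{1-\varepsilon},$$
and at this value $f(\beta) = 2\varepsilon\sqrt{(1-\varepsilon)/\varepsilon} = 2\sqrt{\varepsilon(1-\varepsilon)}$, which gives the factor in the theorem.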
Experiment settings
Two datasets:
DRIVE [2] retinal images, blood vessel vs. background
UCI [4] Japanese credit screening dataset
Decision tree as the weak learner:
package from sklearn
max depth of 4
initial sample weight $w^1_i = 0.5/|\{j : l_j = l_i\}|$
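A sketch of this weak-learner setup (the data here is synthetic and the variable names are illustrative): a depth-4 sklearn tree trained under AdaBoost's per-round distribution via `sample_weight`, with the class-balanced initial weights above.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data: 200 two-feature samples, linearly separable labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Class-balanced initial weights: w_i^1 = 0.5 / #{j : l_j == l_i},
# so each class carries total weight 0.5 and the weights sum to 1.
w = np.array([0.5 / np.sum(y == yi) for yi in y])

p = w / w.sum()
tree = DecisionTreeClassifier(max_depth=4)   # the talk's weak learner
tree.fit(X, y, sample_weight=p)
eps = p @ np.abs(tree.predict(X) - y)        # weighted error eps_t
```

As long as $\varepsilon_t < 0.5$, the round contributes a factor below 1 to the bound of the theorem.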
Retinal blood vessel / background classification
20 training images, a total of 4,541,006 pixels, of which 569,415 are blood-vessel pixels.
Two shape features, energy and symmetry, derived from the daisy graph [3].
Evaluation results on the Retina Images
Figure 2: $\varepsilon_t \to 0.5$, $\beta_t \to 1$

There is little update on the sample weights.
$\log(1/\beta_t) \to 0$: the corresponding classifiers contribute less to the vote.
$2\sqrt{\varepsilon_t(1-\varepsilon_t)} \to 1$: no reduction of the bound $2^T \prod_{t=1}^T \sqrt{\varepsilon_t(1-\varepsilon_t)}$.
Credit Screening
UCI Japanese Credit Screening: http://goo.gl/4gBRXb, 532 samples.
Features used: 2, 3, 8, 11, 14, 15 (six continuous features).
Class label: +(296) / -(357)
Evaluation results on the Credit Screening
$\varepsilon_t$ of each round is below 0.4; $\varepsilon$ on the training set converges to 0 after 40 rounds.
PAC framework and VC dimension
Based on [5], with probability $1-\delta$,
$$\text{error}_{\text{true}}(h) \le \text{error}_{\text{train}}(h) + \sqrt{\frac{\ln H + \ln(1/\delta)}{2m}} \qquad (8)$$
$H$ is the VC dimension of the hypothesis class
$m$ is the number of samples
$\delta = 0.05$ in the later analysis
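As a numeric sketch of (8), plugging in $\delta = 0.05$ and the credit-screening sample count $m = 532$ (the capacity values $H$ here are illustrative):

```python
import math

def generalization_gap(H, m, delta=0.05):
    """Complexity term of bound (8): sqrt((ln H + ln(1/delta)) / (2m))."""
    return math.sqrt((math.log(H) + math.log(1 / delta)) / (2 * m))

# Larger capacity H widens the gap; more samples m shrink it.
gap_small = generalization_gap(H=100, m=532)
gap_large = generalization_gap(H=10000, m=532)
```

This is the quantity traded off against training error when choosing the number of boosting rounds below.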
VC dimension
The VC dimension of a hypothesis class is the largest number of samples such that every assignment of labels to those samples can be separated by some hypothesis in the class.
Example: in one dimension, with the threshold hypothesis class $\{\pm(x > a)\}$:
there exist two samples for which every labeling is separable;
for any three samples, some label assignment is not separable.
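The example can be checked by brute force. A small sketch (all names illustrative) that enumerates the labelings realized by both orientations of the threshold class:

```python
def shatters(points, hypotheses):
    """True iff the hypothesis set realizes every labeling of `points`."""
    labelings = {tuple(h(x) for x in points) for h in hypotheses}
    return len(labelings) == 2 ** len(points)

def threshold_hyps(points):
    """Both orientations of {x > a}, with thresholds between the points."""
    cuts = sorted(points)
    thresholds = [cuts[0] - 1] + [(a + b) / 2 for a, b in zip(cuts, cuts[1:])] + [cuts[-1] + 1]
    hyps = []
    for a in thresholds:
        hyps.append(lambda x, a=a: int(x > a))       # positive on the right
        hyps.append(lambda x, a=a: int(not x > a))   # positive on the left
    return hyps

can_shatter_2 = shatters([0.0, 1.0], threshold_hyps([0.0, 1.0]))
can_shatter_3 = shatters([0.0, 1.0, 2.0], threshold_hyps([0.0, 1.0, 2.0]))
```

Two points are shattered (all 4 labelings appear), while three are not (labelings like +,-,+ are unreachable), so the VC dimension of this class is 2.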
VC dimension of decision tree
The VC dimension of depth-$k$ decision trees on an $n$-dimensional space is bounded by:
Lower bound: $2^{k-1}(n+1)$
Upper bound [5]: $2(2n)^{2^k - 1}$
VC dimension of AdaBoost
Let $H$ be the class of hypotheses given by WeakLearn, with VC dimension $d \ge 2$; then the VC dimension of the hypothesis given by AdaBoost after $T$ rounds is at most
$$2(d+1)(T+1)\log_2(e(T+1)) \qquad (9)$$
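Evaluating (9) numerically shows how the capacity of the combined hypothesis grows with the number of rounds (the value $d = 24$ below is illustrative):

```python
import math

def boosted_vc_dim(d, T):
    """Upper bound (9): 2(d+1)(T+1) * log2(e(T+1))."""
    return 2 * (d + 1) * (T + 1) * math.log2(math.e * (T + 1))

# Capacity grows slightly faster than linearly in the number of rounds T,
# so the complexity term of bound (8) keeps growing while training error
# eventually flattens out.
v10 = boosted_vc_dim(d=24, T=10)
v40 = boosted_vc_dim(d=24, T=40)
```

This growth is what makes the PAC framework suggest stopping earlier than the training curve alone would.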
Mean of Leave-one-out generalization test
Figure 3: Generalization test on Credit Screening
The optimal iteration number given by the PAC framework is less than the optimal number of iterations needed in practice. This is consistent with the paper's results.
Thanks, Q&A
Reference I
[1] Freund, Yoav, and Robert E. Schapire. "A decision-theoretic generalization of on-line learning and an application to boosting." Journal of Computer and System Sciences 55.1 (1997): 119-139.
[2] J.J. Staal, M.D. Abramoff, M. Niemeijer, M.A. Viergever, and B. van Ginneken. "Ridge-based vessel segmentation in color images of the retina." IEEE Transactions on Medical Imaging, 2004, vol. 23, pp. 501-509.
[3] Ying, Huajun, Xing Wang, and Jyh-Charn Liu. "Statistical pattern analysis of blood vessel features on retina images and its application to blood vessel mapping algorithms." EMBC 2014.
Reference II
[4] Lichman, M. (2013). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.
[5] Luke Zettlemoyer. PAC-learning, VC Dimension. University of Washington, 2012.
Iteration statistics
In some cases boosting iterates fewer than 40 times; the iterations end early because the error rate stops changing.