Asymmetric Boosting for Face Detection
Transcript of Asymmetric Boosting for Face Detection
![Page 1: Asymmetric Boosting for Face Detection](https://reader036.fdocuments.in/reader036/viewer/2022062411/5681672c550346895ddbce57/html5/thumbnails/1.jpg)
Asymmetric Boosting for Face Detection
presented by
Minh-Tri Pham, Ph.D. Candidate and Research Associate, Nanyang Technological University, Singapore
![Page 2: Asymmetric Boosting for Face Detection](https://reader036.fdocuments.in/reader036/viewer/2022062411/5681672c550346895ddbce57/html5/thumbnails/2.jpg)
Overview
• Online Asymmetric Boosting
• Fast Training and Selection of Haar-like Features using Statistics
• Detection with Multi-exit Asymmetric Boosting
![Page 3: Asymmetric Boosting for Face Detection](https://reader036.fdocuments.in/reader036/viewer/2022062411/5681672c550346895ddbce57/html5/thumbnails/3.jpg)
Online Asymmetric Boosting
CVPR’07 oral paper:Minh-Tri Pham and Tat-Jen Cham. Online Learning Asymmetric Boosted Classifiers for Object Detection. In Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Minneapolis, MN, 2007.
![Page 4: Asymmetric Boosting for Face Detection](https://reader036.fdocuments.in/reader036/viewer/2022062411/5681672c550346895ddbce57/html5/thumbnails/4.jpg)
Motivation
• Usual goal of an object detector: focused on accuracy
• General detectors are designed to deal with different input spaces
• Only one input space is used per application
[Diagram: a global input space, learned offline, containing input spaces 1, 2, and 3; within each input space, the boundary between the object region and the non-object region could instead be learned online]
![Page 5: Asymmetric Boosting for Face Detection](https://reader036.fdocuments.in/reader036/viewer/2022062411/5681672c550346895ddbce57/html5/thumbnails/5.jpg)
Supervisor-Student paradigm
• Supervisor: slow but general detector
• Student: fast but limited detector
[Diagram: input → Supervisor Detector and Student Detector → output]
• Supervisor = existing object detector
• Student = online-learned object detector
  – Less complex model → faster detection speed
![Page 6: Asymmetric Boosting for Face Detection](https://reader036.fdocuments.in/reader036/viewer/2022062411/5681672c550346895ddbce57/html5/thumbnails/6.jpg)
Problem overview
• Common appearance-based approach:
  – Classify a patch using a cascade or tree of boosted classifiers (Viola-Jones and variants)
  – F1, F2, …, FN: boosted classifiers
[Diagram: F1 → F2 → … → FN; each classifier either passes a patch on to the next or rejects it as non-object; a patch passing FN is declared object]
• Main challenges for online learning a boosted classifier:
  – Asymmetric classes: P(non-object) >> P(object)
  – Online data
![Page 7: Asymmetric Boosting for Face Detection](https://reader036.fdocuments.in/reader036/viewer/2022062411/5681672c550346895ddbce57/html5/thumbnails/7.jpg)
Review of current methods
• Online learning for boosting:
  – Online Boosting of Oza (2005)
    • Replaces offline weak learners with online weak learners
    • Propagates weights similarly to AdaBoost
    • Only works well when P(non-object) ≈ P(object)
• P(non-object) >> P(object):
  – Viola and Jones (2002), Ma and Ding (2003), Hou et al. (2006)
    • Reweigh positives higher and negatives lower
    • Offline learning only
Asymmetric Online Boosting
• Incorporate an asymmetric reweighing scheme into Online Boosting
  – Skewness balancing: new reweighing scheme giving better accuracy
  – Polarity balancing: faster learning convergence rate
![Page 8: Asymmetric Boosting for Face Detection](https://reader036.fdocuments.in/reader036/viewer/2022062411/5681672c550346895ddbce57/html5/thumbnails/8.jpg)
Skewness balancing
• Skewness:
  – Measures the degree of asymmetry of the class probability distribution
  – Defined as: $\lambda = \log P(\text{negative}) - \log P(\text{positive})$
• Viola-Jones' reweighing scheme:
  – Reweigh positives by the same amount more than negatives at every weak learner: $k_1 = k_2 = \cdots = k_M = k^{1/M}$
  – $k_m$ = reweighing amount at the m-th weak learner; $k$ = total reweighing amount
[Diagram: skewness vs. weak learners. Initial skewness $\lambda_1 > 0$; after reweighing, $\lambda_1' = \lambda_1 - \log k_1$; after training weak learner 1, $\lambda_2 \approx 0$; after reweighing, $\lambda_2' = \lambda_2 - \log k_2$; and similarly after weak learners 2 and 3. Legend: negative example, positive example.]
![Page 9: Asymmetric Boosting for Face Detection](https://reader036.fdocuments.in/reader036/viewer/2022062411/5681672c550346895ddbce57/html5/thumbnails/9.jpg)
Skewness balancing
• Our approach:
  – Reweigh positives more than negatives by a different amount at each weak learner, so that an equal skewness is presented to every weak learner:
    $\log k_m = \frac{1}{M-m+1}\left(\lambda_m + \log k - \sum_{i=1}^{m-1} \log k_i\right)$
  – $\lambda_m$ = skewness after training the (m-1)-th weak learner
  – $k_m$ = reweighing amount at the m-th weak learner
  – $k$ = total reweighing amount
[Diagram: skewness vs. weak learners. Initial skewness $\lambda_1 > 0$; after reweighing, $\lambda_1' = \lambda_1 - \log k_1$; after training weak learner 1, $\lambda_2 \approx 0$; after reweighing, $\lambda_2' = \lambda_2 - \log k_2$; and similarly after weak learners 2 and 3. Legend: negative example, positive example.]
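To make the reweighing mechanics concrete, here is a minimal Python sketch (the function names and toy data are mine, not from the talk): multiplying every positive weight by a factor k shifts the skewness $\lambda = \log W_{\text{neg}} - \log W_{\text{pos}}$ down by exactly $\log k$, which is the quantity the balancing scheme controls.

```python
import math

def skewness(weights, labels):
    """Skewness as defined on the slide: log P(negative) - log P(positive),
    estimated from the current example weights."""
    w_pos = sum(w for w, y in zip(weights, labels) if y == +1)
    w_neg = sum(w for w, y in zip(weights, labels) if y == -1)
    return math.log(w_neg) - math.log(w_pos)

def reweigh_positives(weights, labels, k):
    """Multiply every positive weight by k; this shifts skewness down by log k."""
    return [w * k if y == +1 else w for w, y in zip(weights, labels)]

# Toy example: 9 negatives and 1 positive, unit weights.
weights = [1.0] * 10
labels = [-1] * 9 + [+1]
lam = skewness(weights, labels)                  # log 9 - log 1, i.e. skewed
new_w = reweigh_positives(weights, labels, k=9.0)
lam2 = skewness(new_w, labels)                   # lam - log 9: balanced classes
```

Choosing k to cancel the current skewness (here k = 9) is exactly the "skewness ≈ 0 presented to the weak learner" condition in the diagram.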
![Page 10: Asymmetric Boosting for Face Detection](https://reader036.fdocuments.in/reader036/viewer/2022062411/5681672c550346895ddbce57/html5/thumbnails/10.jpg)
Skewness balancing
• Effective for the initial boosted classifiers in the cascade
  – Better accuracy → faster detection speed
• Effectiveness degrades as boosted classifiers get more complicated
[Plots: ROC curve for a 4-feature boosted classifier; ROC curve for a 200-feature boosted classifier]
![Page 11: Asymmetric Boosting for Face Detection](https://reader036.fdocuments.in/reader036/viewer/2022062411/5681672c550346895ddbce57/html5/thumbnails/11.jpg)
Polarity balancing
• After training a weak learner with AdaBoost:
  – classified weights = mis-classified weights
  – positive weights = negative weights (if the weak learner is optimal)
• To maintain online AdaBoost's properties:
  – Online Boosting ensures asymptotically:
    • classified weights = mis-classified weights
  – Our method ensures asymptotically:
    • classified weights = mis-classified weights
    • positive weights = negative weights
    → faster convergence rate
• Weight distribution after training a weak learner:

|                      | positive | negative |
|----------------------|----------|----------|
| Correctly classified | TP       | TN       |
| Wrongly classified   | FN       | FP       |
![Page 12: Asymmetric Boosting for Face Detection](https://reader036.fdocuments.in/reader036/viewer/2022062411/5681672c550346895ddbce57/html5/thumbnails/12.jpg)
• Learning time:– About 5-30% faster with Polarity balancing
Polarity balancing
Online Learning a 20-feature boosted classifier
![Page 13: Asymmetric Boosting for Face Detection](https://reader036.fdocuments.in/reader036/viewer/2022062411/5681672c550346895ddbce57/html5/thumbnails/13.jpg)
Overall performance
• ROC curves: similar results
![Page 14: Asymmetric Boosting for Face Detection](https://reader036.fdocuments.in/reader036/viewer/2022062411/5681672c550346895ddbce57/html5/thumbnails/14.jpg)
Online Learning a Face Detector
• Video clip: length 20 minutes, resolution 352x288, 25 fps
• Learn online from the first 10 minutes, using OpenCV's face detector as supervisor
• Test on the remaining 10 minutes
• OpenCV's face detector: detection speed 15 fps
• Our online-learned face detector: detection speed 30 fps
![Page 15: Asymmetric Boosting for Face Detection](https://reader036.fdocuments.in/reader036/viewer/2022062411/5681672c550346895ddbce57/html5/thumbnails/15.jpg)
Online Learning a Face Detector
• Distribution of weak learners over the cascade:
[Chart: number of weak classifiers (0-200) per boosted classifier (1-25), comparing Viola-Jones'04 against the online-learned face detector]
![Page 16: Asymmetric Boosting for Face Detection](https://reader036.fdocuments.in/reader036/viewer/2022062411/5681672c550346895ddbce57/html5/thumbnails/16.jpg)
Concluding remarks
• Skewness balancing:
  – Effective for early boosted classifiers
    • Better accuracy → faster detection speed
• Polarity balancing:
  – Empirically, about a 5-30% reduction in learning time
• Online learning an object detector from an offline counterpart:
  – Worst case: similar detection accuracy and speed
  – Average case: detection speed can be faster (up to twice as fast)
![Page 17: Asymmetric Boosting for Face Detection](https://reader036.fdocuments.in/reader036/viewer/2022062411/5681672c550346895ddbce57/html5/thumbnails/17.jpg)
Fast Training and Selection of Haar-like Features using Statistics
ICCV’07 oral paper: Minh-Tri Pham and Tat-Jen Cham. Fast Training and Selection of Haar Features using Statistics in Boosting-based Face Detection. In Proc. International Conference on Computer Vision (ICCV), Rio de Janeiro, Brazil, 2007.
• Won Travel Grant Award
• Won Second Prize, Best Student Paper in Year 2007 Award, Pattern Recognition and Machine Intelligence Association (PREMIA), Singapore
![Page 18: Asymmetric Boosting for Face Detection](https://reader036.fdocuments.in/reader036/viewer/2022062411/5681672c550346895ddbce57/html5/thumbnails/18.jpg)
Motivation
• Face detectors today:
  – Real-time detection speed
  …but…
  – Weeks of training time
![Page 19: Asymmetric Boosting for Face Detection](https://reader036.fdocuments.in/reader036/viewer/2022062411/5681672c550346895ddbce57/html5/thumbnails/19.jpg)
Why is Training so Slow?

| Factor | Description | Common value |
|--------|-------------|--------------|
| N | number of examples | 10,000 |
| M | number of weak classifiers in total | 4,000 - 6,000 |
| T | number of Haar-like features | 40,000 |

• Time complexity: O(MNT log N)
  – 15 ms to train a feature classifier
  – 10 minutes to train a weak classifier
  – 27 days to train a face detector
• A view of a face detector training algorithm:

    for weak classifier m from 1 to M:
        …
        update weights – O(N)
        for feature t from 1 to T:
            compute N feature values – O(N)
            sort N feature values – O(N log N)
            train feature classifier – O(N)
        select best feature classifier – O(T)
        …
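The timing figures above multiply out as claimed; a quick check in Python (using the slide's own numbers, with M at the lower end of its range):

```python
t_feature = 0.015          # seconds to train one feature classifier (slide's figure)
T = 40_000                 # number of Haar-like features
M = 4_000                  # weak classifiers, lower end of 4,000-6,000

t_weak = t_feature * T               # ~600 s, i.e. the slide's "10 minutes"
t_total_days = t_weak * M / 86_400   # ~27.8 days, the slide's "27 days"
```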
![Page 20: Asymmetric Boosting for Face Detection](https://reader036.fdocuments.in/reader036/viewer/2022062411/5681672c550346895ddbce57/html5/thumbnails/20.jpg)
Why Should the Training Time be Improved?
• Tradeoff between time and generalization
  – E.g. training becomes 100 times slower if we increase both N and T by 10 times
• Trial and error is needed to find key training parameters
  – Requiring much longer training time
• Online-learning face detectors have the same problem
![Page 21: Asymmetric Boosting for Face Detection](https://reader036.fdocuments.in/reader036/viewer/2022062411/5681672c550346895ddbce57/html5/thumbnails/21.jpg)
Existing Approaches to Reduce the Training Time
• Sub-sample the Haar-like feature set
  – Simple, but loses generalization
• Use histograms and real-valued boosting (B. Wu et al. '04)
  – Pro: reduces O(MNT log N) to O(MNT)
  – Con: raises overfitting concerns:
    • Real AdaBoost is not known to be overfitting-resistant
    • A weak classifier may overfit if too many histogram bins are used
• Pre-compute the feature values' sorting orders (J. Wu et al. '07)
  – Pro: reduces O(MNT log N) to O(MNT)
  – Con: requires huge memory storage
    • For N = 10,000 and T = 40,000, a total of 800 MB is needed.
![Page 22: Asymmetric Boosting for Face Detection](https://reader036.fdocuments.in/reader036/viewer/2022062411/5681672c550346895ddbce57/html5/thumbnails/22.jpg)
Why is Training so Slow?

| Factor | Description | Common value |
|--------|-------------|--------------|
| N | number of examples | 10,000 |
| M | number of weak classifiers in total | 4,000 - 6,000 |
| T | number of Haar-like features | 40,000 |

• Time complexity: O(MNT log N) (same training loop as above)
  – 15 ms to train a feature classifier
  – 10 min to train a weak classifier
  – 27 days to train a face detector
• Bottleneck: at least O(NT) to train a weak classifier
• Can we avoid O(NT)?
![Page 23: Asymmetric Boosting for Face Detection](https://reader036.fdocuments.in/reader036/viewer/2022062411/5681672c550346895ddbce57/html5/thumbnails/23.jpg)
Our Proposal
• Fast StatBoost: train feature classifiers using statistics rather than the input data
  – Con: less accurate… but not critical for a feature classifier
  – Pro: much faster training time: constant time instead of linear time
![Page 24: Asymmetric Boosting for Face Detection](https://reader036.fdocuments.in/reader036/viewer/2022062411/5681672c550346895ddbce57/html5/thumbnails/24.jpg)
Fast StatBoost
• Training feature classifiers using statistics:
  – Assumption: the feature value $v^{(t)}$ is normally distributed given the face class $c$
  – Closed-form solution for the optimal threshold between the face and non-face distributions
• Fast linear projection of the statistics of a window's integral image into 1-D statistics of a feature value:
  $\mu^{(t)} = \mathbf{g}^{(t)T} \mathbf{m}_J, \qquad \sigma^{(t)2} = \mathbf{g}^{(t)T} \boldsymbol{\Sigma}_J \mathbf{g}^{(t)}$
  → constant time to train a feature classifier
  – $\mathbf{J}$: random vector representing a window's integral image
  – $\mathbf{m}_J, \boldsymbol{\Sigma}_J$: mean vector and covariance matrix of $\mathbf{J}$
  – $\mathbf{g}^{(t)}$: Haar-like feature, a sparse vector with fewer than 20 non-zero elements
  – $\mu^{(t)}, \sigma^{(t)2}$: mean and variance of feature value $v^{(t)}$
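The projection step can be sketched in a few lines of Python (toy 3-pixel "window" and numbers are my own, for illustration): with the feature stored sparsely, the cost of computing $\mu^{(t)}$ and $\sigma^{(t)2}$ depends only on the number of non-zero coefficients, not on the number of training examples N.

```python
def project_stats(g, mean, cov):
    """Project integral-image statistics (mean vector, covariance matrix) onto a
    sparse Haar-like feature g (index -> coefficient): returns the 1-D mean and
    variance of the feature value.  Cost is O(len(g)^2), independent of N."""
    mu = sum(c * mean[i] for i, c in g.items())
    var = sum(ci * cj * cov[i][j] for i, ci in g.items() for j, cj in g.items())
    return mu, var

# Hypothetical 3-pixel window; feature value = pixel0 - pixel1.
mean = [4.0, 1.0, 0.0]
cov = [[2.0, 0.5, 0.0],
       [0.5, 1.0, 0.0],
       [0.0, 0.0, 1.0]]
g = {0: 1.0, 1: -1.0}            # sparse feature: only 2 non-zeros
mu, var = project_stats(g, mean, cov)   # mu = 4 - 1, var = 2 + 1 - 2*0.5
```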
![Page 25: Asymmetric Boosting for Face Detection](https://reader036.fdocuments.in/reader036/viewer/2022062411/5681672c550346895ddbce57/html5/thumbnails/25.jpg)
Fast StatBoost
• The integral image's statistics are obtained directly from the weighted input data
  – Input: N training integral images with class labels and current weights $w^{(m)}$:
    $(\mathbf{J}_1, c_1, w_1^{(m)}),\ (\mathbf{J}_2, c_2, w_2^{(m)}),\ \ldots,\ (\mathbf{J}_N, c_N, w_N^{(m)})$
  – We compute, for each class $c$:
    • Sample total weight: $z_c = \sum_{n : c_n = c} w_n^{(m)}$
    • Sample mean vector: $\hat{\mathbf{m}}_c = \frac{1}{z_c} \sum_{n : c_n = c} w_n^{(m)} \mathbf{J}_n$
    • Sample covariance matrix: $\hat{\boldsymbol{\Sigma}}_c = \frac{1}{z_c} \sum_{n : c_n = c} w_n^{(m)} \mathbf{J}_n \mathbf{J}_n^T - \hat{\mathbf{m}}_c \hat{\mathbf{m}}_c^T$
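The three estimators above translate directly into code. A minimal pure-Python sketch (toy 2-dimensional "integral images" and weights are mine, for illustration):

```python
def class_stats(samples, labels, weights, c):
    """Weighted total z_c, mean vector and covariance matrix over the samples
    whose label equals c, matching the slide's three estimators."""
    idx = [n for n, y in enumerate(labels) if y == c]
    z = sum(weights[n] for n in idx)
    d = len(samples[0])
    m = [sum(weights[n] * samples[n][i] for n in idx) / z for i in range(d)]
    S = [[sum(weights[n] * samples[n][i] * samples[n][j] for n in idx) / z
          - m[i] * m[j] for j in range(d)] for i in range(d)]
    return z, m, S

# Toy data: two positives, one negative.
samples = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]
labels = [+1, +1, -1]
weights = [0.25, 0.75, 1.0]
z, m, S = class_stats(samples, labels, weights, +1)
```

In the real algorithm this is the only O(N) (times d²) step per boosting round; every feature classifier is then trained from (m, S) in constant time.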
![Page 26: Asymmetric Boosting for Face Detection](https://reader036.fdocuments.in/reader036/viewer/2022062411/5681672c550346895ddbce57/html5/thumbnails/26.jpg)
Fast StatBoost

| Factor | Description | Common value |
|--------|-------------|--------------|
| N | number of examples | 10,000 |
| M | number of weak classifiers in total | 4,000 - 6,000 |
| T | number of Haar-like features | 40,000 |
| d | number of pixels of a window | 300-500 |

• To train a weak classifier:
  – Extract the class-conditional integral-image statistics
    • Time complexity: O(Nd^2); the d^2 factor is negligible because fast algorithms exist, hence in practice O(N)
  – Train T feature classifiers by projecting the statistics into 1-D
    • Time complexity: O(T)
  – Select the best feature classifier
    • Time complexity: O(T)
• Total time complexity: O(N+T)
• A view of our face detector training algorithm:

    for weak classifier m from 1 to M:
        …
        update weights – O(N)
        extract statistics of integral image – O(Nd^2)
        for feature t from 1 to T:
            project statistics into 1D – O(1)
            train feature classifier – O(1)
        select best feature classifier – O(T)
        …
![Page 27: Asymmetric Boosting for Face Detection](https://reader036.fdocuments.in/reader036/viewer/2022062411/5681672c550346895ddbce57/html5/thumbnails/27.jpg)
Experimental Results
• Setup:
  – Intel Pentium IV 2.8 GHz
  – 19 types of Haar-like features, 295,920 features in total
[Figure: the nineteen feature types used in our experiments, grouped into edge, line, corner, diagonal-line, and center-surround features]
• Time for extracting the statistics:
  – Main factor: the covariance matrices
  – GotoBLAS: 0.49 seconds per matrix
• Time for training T features: 2.1 seconds
• Total training time: 3.1 seconds per weak classifier with 300K features
  – Existing methods: 1-10 minutes with 40K features or fewer
![Page 28: Asymmetric Boosting for Face Detection](https://reader036.fdocuments.in/reader036/viewer/2022062411/5681672c550346895ddbce57/html5/thumbnails/28.jpg)
Experimental Results
• Comparison with Fast AdaBoost (J. Wu et al. '07), the fastest known implementation of Viola-Jones' framework:
[Plot: training time of a weak classifier in seconds (0-12) vs. number of features T (0-300,000), for Fast AdaBoost and Fast StatBoost]
![Page 29: Asymmetric Boosting for Face Detection](https://reader036.fdocuments.in/reader036/viewer/2022062411/5681672c550346895ddbce57/html5/thumbnails/29.jpg)
Experimental Results
• Performance of a cascade:
[Plot: ROC curves of the final cascades for face detection]

| Method | Total training time | Memory requirement |
|--------|---------------------|--------------------|
| Fast AdaBoost (T=40K) | 13h 20m | 800 MB |
| Fast StatBoost (T=40K) | 02h 13m | 30 MB |
| Fast StatBoost (T=300K) | 03h 02m | 30 MB |
![Page 30: Asymmetric Boosting for Face Detection](https://reader036.fdocuments.in/reader036/viewer/2022062411/5681672c550346895ddbce57/html5/thumbnails/30.jpg)
Conclusions
• Fast StatBoost: use statistics instead of the input data to train feature classifiers
• Time:
  – Reduces face detector training time from up to a month to 3 hours
  – Significant gains in both N and T with little increase in training time
    • Due to O(N+T) per weak classifier
• Accuracy:
  – Even better accuracy for the face detector
    • Due to many more Haar-like features being explored
![Page 31: Asymmetric Boosting for Face Detection](https://reader036.fdocuments.in/reader036/viewer/2022062411/5681672c550346895ddbce57/html5/thumbnails/31.jpg)
Detection with Multi-exit Asymmetric Boosting
CVPR’08 poster paper:Minh-Tri Pham and Viet-Dung D. Hoang and Tat-Jen Cham. Detection with Multi-exit Asymmetric Boosting. In Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Anchorage, Alaska, 2008.
• Won Travel Grant Award
![Page 32: Asymmetric Boosting for Face Detection](https://reader036.fdocuments.in/reader036/viewer/2022062411/5681672c550346895ddbce57/html5/thumbnails/32.jpg)
Problem overview
• Common appearance-based approach:
  – F1, F2, …, FN: boosted classifiers
  – f1,1, f1,2, …, f1,K: weak classifiers; $\theta$: threshold
[Diagram: cascade F1 → F2 → … → FN, each stage passing the patch on or rejecting it as non-object; inside F1, the weak classifier outputs f1,1 + f1,2 + … + f1,K are summed and compared against $\theta$ to decide pass or reject]
![Page 33: Asymmetric Boosting for Face Detection](https://reader036.fdocuments.in/reader036/viewer/2022062411/5681672c550346895ddbce57/html5/thumbnails/33.jpg)
Objective
• Find $f_{1,1}, f_{1,2}, \ldots, f_{1,K}$ and $\theta$ such that:
  – $\mathrm{FAR}(F_1) \le \alpha_0$
  – $\mathrm{FRR}(F_1) \le \beta_0$
  – $K$ is minimized ($K$ is proportional to $F_1$'s evaluation time)
  where $F_1(x) = \mathrm{sign}\left(\sum_{i=1}^{K} f_{1,i}(x) - \theta\right)$
![Page 34: Asymmetric Boosting for Face Detection](https://reader036.fdocuments.in/reader036/viewer/2022062411/5681672c550346895ddbce57/html5/thumbnails/34.jpg)
Existing trends (1)
• Idea: for k from 1 until convergence:
  – Let $F_1(x) = \mathrm{sign}\left(\sum_{i=1}^{k} f_{1,i}(x)\right)$
  – Learn a new weak classifier $f_{1,k}(x)$:
    $\hat{f}_{1,k} = \arg\min_{f_{1,k}} \mathrm{FAR}(F_1) + \mathrm{FRR}(F_1)$
  – Let $F_1(x) = \mathrm{sign}\left(\sum_{i=1}^{k} f_{1,i}(x) - \theta\right)$
  – Adjust $\theta$ to see if we can achieve $\mathrm{FAR}(F_1) \le \alpha_0$ and $\mathrm{FRR}(F_1) \le \beta_0$:
    • Break the loop if such a $\theta$ exists
• Issues:
  – Weak classifiers are sub-optimal w.r.t. the training goal.
  – Too many weak classifiers are required in practice.
![Page 35: Asymmetric Boosting for Face Detection](https://reader036.fdocuments.in/reader036/viewer/2022062411/5681672c550346895ddbce57/html5/thumbnails/35.jpg)
Existing trends (2)
• Idea: for k from 1 until convergence:
  – Let $F_1(x) = \mathrm{sign}\left(\sum_{i=1}^{k} f_{1,i}(x) - \theta\right)$
  – Learn a new weak classifier $f_{1,k}(x)$:
    $\hat{f}_{1,k} = \arg\min_{f_{1,k}} \mathrm{FAR}(F_1) + \mu\,\mathrm{FRR}(F_1)$
  – Break the loop if $\mathrm{FAR}(F_1) \le \alpha_0$ and $\mathrm{FRR}(F_1) \le \beta_0$
• Pros:
  – Reduces FRR at the cost of increasing FAR – acceptable for cascades
  – Fewer weak classifiers
• Cons:
  – How to choose $\mu$?
  – Much longer training time
• Solution to the cons – trial and error: choose $\mu$ such that $K$ is minimized.
![Page 36: Asymmetric Boosting for Face Detection](https://reader036.fdocuments.in/reader036/viewer/2022062411/5681672c550346895ddbce57/html5/thumbnails/36.jpg)
Our solution
Learn every weak classifier $f_{1,k}(x)$ using the same asymmetric goal:
$\hat{f}_{1,k} = \arg\min_{f_{1,k}} \mathrm{FAR}(F_1) + \mu\,\mathrm{FRR}(F_1), \quad \text{where } \mu = \frac{\alpha_0}{\beta_0}$
Why?
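As a toy illustration of this selection rule (candidate error rates below are invented; the targets are the ones from the toy problem later in the talk): with $\mu = \alpha_0/\beta_0$ large, a candidate with a much lower FRR wins even at the price of a higher FAR.

```python
def asymmetric_cost(far, frr, alpha0, beta0):
    """The fixed asymmetric goal from the slides: FAR + mu*FRR, mu = alpha0/beta0."""
    return far + (alpha0 / beta0) * frr

# Hypothetical candidate weak classifiers as (FAR, FRR) pairs.
candidates = [(0.50, 0.02), (0.70, 0.005), (0.30, 0.05)]
alpha0, beta0 = 0.8, 0.01            # targets: FAR <= 0.8, FRR <= 0.01 (mu = 80)
best = min(candidates, key=lambda c: asymmetric_cost(c[0], c[1], alpha0, beta0))
```

Here the costs are 2.1, 1.1 and 4.3 respectively, so the second candidate is selected despite having the highest FAR.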
![Page 37: Asymmetric Boosting for Face Detection](https://reader036.fdocuments.in/reader036/viewer/2022062411/5681672c550346895ddbce57/html5/thumbnails/37.jpg)
Because…
• Consider two desired bounds (or targets) for learning a boosted classifier $F_M(x)$:
  – Exact bound (1): $\mathrm{FAR}(F_M) \le \alpha_0$ and $\mathrm{FRR}(F_M) \le \beta_0$
  – Conservative bound (2): $\mathrm{FAR}(F_M) + \frac{\alpha_0}{\beta_0}\,\mathrm{FRR}(F_M) \le \alpha_0$
• (2) is more conservative than (1) because (2) => (1).
[Plots: two FAR-FRR diagrams showing the exact bound (the box at $(\alpha_0, \beta_0)$) and the conservative bound (the line through it). With $\mu = 1$, the sequence of operating points $H_1, H_2, \ldots$ needs about 200 weak classifiers to reach the bound; with $\mu = \alpha_0/\beta_0$, it reaches the bound after about 40.]
• At $\mu = \frac{\alpha_0}{\beta_0}$, for every new weak classifier learned, the ROC operating point moves the fastest toward the conservative bound.
![Page 38: Asymmetric Boosting for Face Detection](https://reader036.fdocuments.in/reader036/viewer/2022062411/5681672c550346895ddbce57/html5/thumbnails/38.jpg)
Multi-exit Boosting
A method to train a single boosted classifier with multiple exit nodes:
[Diagram: a chain of weak classifiers f1 + f2 + … + f8. Some are plain weak classifiers; others are exit nodes – weak classifiers followed by a decision to continue or reject. Each exit node passes the accumulated score on or rejects the window as non-object; a window passing the last exit is declared object. Grouping the weak classifiers between consecutive exits recovers the stages F1, F2, F3 of a conventional cascade.]
• Features:
  • Weak classifiers are trained with the same goal: $\mu = \alpha_0/\beta_0$
  • Every pass/reject decision is guaranteed with $\mathrm{FAR} \le \alpha_0$ and $\mathrm{FRR} \le \beta_0$
  • The classifier is a cascade.
  • The score is propagated from one node to the next.
• Main advantages:
  • Weak classifiers are learned (approximately) optimally.
  • No training of multiple boosted classifiers.
  • Far fewer weak classifiers are needed than in traditional cascades.
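Score propagation across exit nodes can be sketched as follows (a minimal illustration with made-up stumps and exit thresholds, not the paper's trained values); the key difference from a conventional cascade is that the score accumulates across *all* weak classifiers instead of resetting at each stage.

```python
def multi_exit_classify(x, weak_classifiers, exits):
    """Evaluate a multi-exit boosted classifier: the score accumulates over the
    whole chain; at each exit node (index, threshold) the partial score is
    tested, and the window is rejected early if it falls at or below threshold."""
    exit_thresholds = dict(exits)
    score = 0.0
    for i, f in enumerate(weak_classifiers):
        score += f(x)
        if i in exit_thresholds and score <= exit_thresholds[i]:
            return False          # rejected at this exit node (non-object)
    return True                   # passed every exit: classified as object

# Hypothetical stumps on a scalar input, with exits after weak classifiers 1 and 3.
weaks = [lambda x: 1.0 if x > 0 else -1.0,
         lambda x: 1.0 if x > 1 else -1.0,
         lambda x: 1.0 if x > 2 else -1.0]
exits = [(0, -0.5), (2, 0.5)]
```

For example, an "easy negative" like x = -1.0 is rejected at the first exit after a single weak classifier, while x = 3.0 passes both exits.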
![Page 39: Asymmetric Boosting for Face Detection](https://reader036.fdocuments.in/reader036/viewer/2022062411/5681672c550346895ddbce57/html5/thumbnails/39.jpg)
Results
• Goal ($\mu$) vs. number of weak classifiers ($K$)
• Toy problem: learn a (single-exit) boosted classifier F for classifying face/non-face patches such that FAR(F) < 0.8 and FRR(F) < 0.01
  – Best goal (found by trial and error): $\mu \in [10, 100]$
  – Ours chooses: $\mu = \frac{0.8}{0.01} = 80$
• Similar results were obtained in tests with other desired error rates.
![Page 40: Asymmetric Boosting for Face Detection](https://reader036.fdocuments.in/reader036/viewer/2022062411/5681672c550346895ddbce57/html5/thumbnails/40.jpg)
Ours vs. Others (in Face Detection)
• Use Fast StatBoost as the base method for fast-training a weak classifier.

| Method | No. of weak classifiers | No. of exit nodes | Total training time |
|--------|------------------------|-------------------|---------------------|
| Viola-Jones [3] | 4,297 | 32 | 6h 20m |
| Viola-Jones [4] | 3,502 | 29 | 4h 30m |
| Boosting chain [7] | 959 | 22 | 2h 10m |
| Nested cascade [5] | 894 | 20 | 2h |
| Soft cascade [1] | 4,871 | 4,871 | 6h 40m |
| Dynamic cascade [6] | 1,172 | 1,172 | 2h 50m |
| Multi-exit Asymmetric Boosting | 575 | 24 | 1h 20m |
![Page 41: Asymmetric Boosting for Face Detection](https://reader036.fdocuments.in/reader036/viewer/2022062411/5681672c550346895ddbce57/html5/thumbnails/41.jpg)
Ours vs. Others (in Face Detection)
• MIT+CMU Frontal Face Test set:
![Page 42: Asymmetric Boosting for Face Detection](https://reader036.fdocuments.in/reader036/viewer/2022062411/5681672c550346895ddbce57/html5/thumbnails/42.jpg)
Conclusion
• Multi-exit Asymmetric Boosting trains every weak classifier approximately optimally.
  – Better accuracy
  – Far fewer weak classifiers
  – Significantly reduced training time
    • No more trial and error for training a boosted classifier
![Page 43: Asymmetric Boosting for Face Detection](https://reader036.fdocuments.in/reader036/viewer/2022062411/5681672c550346895ddbce57/html5/thumbnails/43.jpg)
Margin-based Bounds on an Asymmetric Error of a Classifier
CVPR’08 poster paper:Minh-Tri Pham and Viet-Dung D. Hoang and Tat-Jen Cham. Detection with Multi-exit Asymmetric Boosting. In Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Anchorage, Alaska, 2008.
• Won Travel Grant Award
![Page 44: Asymmetric Boosting for Face Detection](https://reader036.fdocuments.in/reader036/viewer/2022062411/5681672c550346895ddbce57/html5/thumbnails/44.jpg)
Motivation
• A number of cost-sensitive learning methods have been proposed to deal with binary classification on an imbalanced dataset:
  – Cost-sensitive decision trees (Knoll et al. '94)
  – Cost-sensitive neural networks (Kukar and Kononenko '98)
  – Imbalanced SVM (Veropoulos et al. '99)
  – Asymmetric Boosting (Karakoulas and Shawe-Taylor '99, Viola and Jones '02)
• Their objective function has the same form as an asymmetric error:
  $\hat{f} = \arg\min_{f \in F}\ \underbrace{\mu_1 P(f(x) \le 0 \mid y = 1)}_{\mathrm{FRR}} + \underbrace{\mu_2 P(f(x) > 0 \mid y = -1)}_{\mathrm{FAR}}$
  – where $\mathrm{sign}(f(x))$ is the prediction for input $x$, and $\mu_1, \mu_2$ are given
• Bounds on the generalization error of a classifier exist, but bounds on an asymmetric error have not been proposed yet.
![Page 45: Asymmetric Boosting for Face Detection](https://reader036.fdocuments.in/reader036/viewer/2022062411/5681672c550346895ddbce57/html5/thumbnails/45.jpg)
Why bound an Asymmetric Error?
• The generalization error is a special case of an asymmetric error.
  – Consider $\mu_1 = P(y=1)$ and $\mu_2 = P(y=-1)$ in
    $\hat{f} = \arg\min_{f \in F} \mu_1 P(f(x) \le 0 \mid y = 1) + \mu_2 P(f(x) > 0 \mid y = -1)$;
    we get:
    $\hat{f} = \arg\min_{f \in F} P(y f(x) \le 0) = \arg\min_{f \in F} P(\mathrm{sign}(f(x)) \ne y)$
• It helps explain how the classifier performs on unknown data in problems with imbalanced prior probabilities.
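The reduction can be written out step by step, using only the product rule and the fact that the two error events are disjoint:

```latex
\begin{aligned}
\hat f
&= \arg\min_{f\in F}\; P(y{=}1)\,P\big(f(x)\le 0 \mid y{=}1\big)
   + P(y{=}{-}1)\,P\big(f(x)>0 \mid y{=}{-}1\big) \\
&= \arg\min_{f\in F}\; P\big(f(x)\le 0,\; y{=}1\big) + P\big(f(x)>0,\; y{=}{-}1\big)
   \qquad \text{(product rule)} \\
&= \arg\min_{f\in F}\; P\big(y\,f(x)\le 0\big)
   \qquad \text{(disjoint events whose union is } \{y f(x)\le 0\}\text{, up to } f(x)=0\text{)}
\end{aligned}
```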
![Page 46: Asymmetric Boosting for Face Detection](https://reader036.fdocuments.in/reader036/viewer/2022062411/5681672c550346895ddbce57/html5/thumbnails/46.jpg)
This work’s contribution...• To give bounds on an asymmetric error of a binary classifier:
![Page 47: Asymmetric Boosting for Face Detection](https://reader036.fdocuments.in/reader036/viewer/2022062411/5681672c550346895ddbce57/html5/thumbnails/47.jpg)
Summary
• Online Asymmetric Boosting
  – Integrates Asymmetric Boosting with online learning
• Fast Training and Selection of Haar-like Features using Statistics
  – Dramatically reduces training time, from weeks to a few hours
• Multi-exit Asymmetric Boosting
  – Approximately minimizes the number of weak classifiers
![Page 48: Asymmetric Boosting for Face Detection](https://reader036.fdocuments.in/reader036/viewer/2022062411/5681672c550346895ddbce57/html5/thumbnails/48.jpg)
Thank You!
![Page 49: Asymmetric Boosting for Face Detection](https://reader036.fdocuments.in/reader036/viewer/2022062411/5681672c550346895ddbce57/html5/thumbnails/49.jpg)
Backup Slides
![Page 50: Asymmetric Boosting for Face Detection](https://reader036.fdocuments.in/reader036/viewer/2022062411/5681672c550346895ddbce57/html5/thumbnails/50.jpg)
Online Asymmetric Boosting
CVPR’07 oral paper:Minh-Tri Pham and Tat-Jen Cham. Online Learning Asymmetric Boosted Classifiers for Object Detection. In Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Minneapolis, MN, 2007.
![Page 51: Asymmetric Boosting for Face Detection](https://reader036.fdocuments.in/reader036/viewer/2022062411/5681672c550346895ddbce57/html5/thumbnails/51.jpg)
AdaBoost (Freund-Schapire'96)
[Diagram: Stage 1 → Stage 2. An offline weak learner is trained at each stage; weights are propagated between stages separately for wrongly classified and correctly classified examples. Legend: negative example, positive example.]
![Page 52: Asymmetric Boosting for Face Detection](https://reader036.fdocuments.in/reader036/viewer/2022062411/5681672c550346895ddbce57/html5/thumbnails/52.jpg)
Asymmetric Boost (Viola-Jones'02)
• To address P(non-object) >> P(object):
  • Weight positives k times more than negatives
[Diagram: Stage 1 → Stage 2, each with an offline weak learner; positives receive larger weights than negatives at every stage. Legend: negative example, positive example.]
![Page 53: Asymmetric Boosting for Face Detection](https://reader036.fdocuments.in/reader036/viewer/2022062411/5681672c550346895ddbce57/html5/thumbnails/53.jpg)
Online Boosting (Oza-Russell'01)
• To learn data online:
  • If an example is wrongly classified, increase its weight; otherwise, decrease its weight
[Diagram: Stage 1 → Stage 2, each with an online weak learner; weights are propagated between wrongly classified and correctly classified examples. Legend: negative example, positive example.]
![Page 54: Asymmetric Boosting for Face Detection](https://reader036.fdocuments.in/reader036/viewer/2022062411/5681672c550346895ddbce57/html5/thumbnails/54.jpg)
Asymmetric Online Boosting
• Incorporate the asymmetric reweighting scheme into Online Boosting
  • Increase positive weights, decrease negative weights
[Diagram: Stage 1 → Stage 2, each with an online weak learner; weights are propagated between wrongly classified and correctly classified examples. Legend: negative example, positive example.]
![Page 55: Asymmetric Boosting for Face Detection](https://reader036.fdocuments.in/reader036/viewer/2022062411/5681672c550346895ddbce57/html5/thumbnails/55.jpg)
• Learning time:– About 5-30% faster with Polarity balancing
Polarity balancing
Online Learning a 20-feature boosted classifier
![Page 56: Asymmetric Boosting for Face Detection](https://reader036.fdocuments.in/reader036/viewer/2022062411/5681672c550346895ddbce57/html5/thumbnails/56.jpg)
Experimental Results
• Performance of a single boosted classifier:
[Plots: ROC curves of boosting with 15 weak classifiers; ROC curves of boosting with 200 weak classifiers]
![Page 57: Asymmetric Boosting for Face Detection](https://reader036.fdocuments.in/reader036/viewer/2022062411/5681672c550346895ddbce57/html5/thumbnails/57.jpg)
Experiments: UCI-KDD
• Symmetric datasets: P(positive) ≈ P(negative)
• Ours vs. Online Boosting
• Accuracy:
  – Ours is better on under-complete datasets: Promoters, German Credit, Synthetic 20-dim
  – Similar on over-complete datasets
• Learning time:
  – About a 5-40% reduction in time to achieve the same level of error
• Accuracy on different datasets:

| Dataset | AdaBoost | Online Boosting | Our method |
|---------|----------|-----------------|------------|
| Promoters | 0.8455 | 0.7136 | 0.7429 |
| Breast Cancer | 0.9445 | 0.9573 | 0.9552 |
| German Credit | 0.735 | 0.6879 | 0.7013 |
| Chess | 0.9517 | 0.9476 | 0.9501 |
| Mushroom | 0.9966 | 0.9987 | 0.9978 |
| Census Income | 0.9365 | 0.9398 | 0.9372 |
| Synthetic 5-dim | 0.9382 | 0.9049 | 0.9251 |
| Synthetic 20-dim | 0.8923 | 0.7972 | 0.8404 |

[Plot: learning curve for the Mushroom dataset]
![Page 58: Asymmetric Boosting for Face Detection](https://reader036.fdocuments.in/reader036/viewer/2022062411/5681672c550346895ddbce57/html5/thumbnails/58.jpg)
Experiments: Face Detector
• Asymmetric dataset: P(face) << P(non-face)
• Training set: BioID, AR Face
• Test set: MIT+CMU
• Patch size: 24x24
• Haar feature pool: 5,000
• Cascade: 25 boosted classifiers
![Page 59: Asymmetric Boosting for Face Detection](https://reader036.fdocuments.in/reader036/viewer/2022062411/5681672c550346895ddbce57/html5/thumbnails/59.jpg)
Relation with Other Methods

| Method | No. of face examples | No. of features | Time to train a weak classifier | No. of weak classifiers | Total training time | CPU speed |
|--------|---------------------|-----------------|--------------------------------|------------------------|---------------------|-----------|
| Viola-Jones '01 | 9,500 | 40,000 | > 10 min | 4,297 | weeks | 400 MHz |
| Li et al. '02 | 6,000 | - | - | 2,546 | weeks | 700 MHz |
| B. Wu et al. '04 | 20,000 | - | - | 756 | 1-2 weeks | 1.4 GHz |
| Huang et al. '07 | 30,000 | - | < 1 min | - | 2 days | 3.0 GHz |
| J. Wu et al. '07 | 5,000 | 40,000 | 12.4 s | 3,870 | 13h 20m | 2.8 GHz |
| Fast StatBoost | 5,000 | 295,920 | 3.1 s | 3,502 | 3h 02m | 2.8 GHz |
![Page 60: Asymmetric Boosting for Face Detection](https://reader036.fdocuments.in/reader036/viewer/2022062411/5681672c550346895ddbce57/html5/thumbnails/60.jpg)
Fast Training and Selection of Haar-like Features using Statistics
ICCV’07 oral paper: Minh-Tri Pham and Tat-Jen Cham. Fast Training and Selection of Haar Features using Statistics in Boosting-based Face Detection. In Proc. International Conference on Computer Vision (ICCV), Rio de Janeiro, Brazil, 2007.
![Page 61: Asymmetric Boosting for Face Detection](https://reader036.fdocuments.in/reader036/viewer/2022062411/5681672c550346895ddbce57/html5/thumbnails/61.jpg)
How good are Gaussian assumptions?
Weak classifier 1’s feature value distributionin an ensemble of 200 weak classifiers
![Page 62: Asymmetric Boosting for Face Detection](https://reader036.fdocuments.in/reader036/viewer/2022062411/5681672c550346895ddbce57/html5/thumbnails/62.jpg)
How good are Gaussian assumptions?
Weak classifier 20’s feature value distributionin an ensemble of 200 weak classifiers
![Page 63: Asymmetric Boosting for Face Detection](https://reader036.fdocuments.in/reader036/viewer/2022062411/5681672c550346895ddbce57/html5/thumbnails/63.jpg)
How good are Gaussian assumptions?
Weak classifier 100’s feature value distributionin an ensemble of 200 weak classifiers
![Page 64: Asymmetric Boosting for Face Detection](https://reader036.fdocuments.in/reader036/viewer/2022062411/5681672c550346895ddbce57/html5/thumbnails/64.jpg)
How good are Gaussian assumptions?
Weak classifier 200’s feature value distributionin an ensemble of 200 weak classifiers
![Page 65: Asymmetric Boosting for Face Detection](https://reader036.fdocuments.in/reader036/viewer/2022062411/5681672c550346895ddbce57/html5/thumbnails/65.jpg)
Fast StatBoost
• Scalar estimators are reliable, despite being computed from high-dimensional estimators
  – Scalar mean estimator $\hat{\mu}_c^{(t)} = \mathbf{g}^{(t)T} \hat{\mathbf{m}}_c$:
    $\mathrm{Var}\big[\hat{\mu}_c^{(t)}\big] = \mathrm{Var}\big[\mathbf{g}^{(t)T} \hat{\mathbf{m}}_c\big] \approx \frac{\sigma_c^{(t)2}}{N}$
  – Scalar variance estimator $\hat{\sigma}_c^{(t)2} = \mathbf{g}^{(t)T} \hat{\boldsymbol{\Sigma}}_c \mathbf{g}^{(t)}$:
    $\mathrm{Var}\big[\hat{\sigma}_c^{(t)2}\big] = \mathrm{Var}\big[\mathbf{g}^{(t)T} \hat{\boldsymbol{\Sigma}}_c \mathbf{g}^{(t)}\big] \approx \frac{2\,\sigma_c^{(t)4}}{N}$  (Ahn-Fessler '03)
![Page 66: Asymmetric Boosting for Face Detection](https://reader036.fdocuments.in/reader036/viewer/2022062411/5681672c550346895ddbce57/html5/thumbnails/66.jpg)
Relation with Other Methods
• Real-valued weak classifier
– Our feature classifier can produce real values:
• Instead of ±1, return: $0.5 \log \frac{P(v^{(t)} \mid c = 1)}{P(v^{(t)} \mid c = -1)}$
• Online learning:
– Gains little benefit from this technique due to the same time complexity:
• O(T) per update of one example
• Application to features beyond Haar-like:
– Only requirement: a feature value is a linear projection of the input.
• Joint Haar-like features (Li et al. ’02)
• Extended Haar-like features (Lienhart et al. ’03)
• Sparse granular features (Huang et al. ’07)
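As a rough illustration of this confidence-rated output (a sketch under the Gaussian class-conditional assumption; function and parameter names are mine):

```python
import math

def real_valued_output(v, mu_pos, sig_pos, mu_neg, sig_neg):
    """Confidence-rated output 0.5 * log( P(v|c=+1) / P(v|c=-1) )
    under the Gaussian class-conditional assumption."""
    def log_gauss(x, mu, sig):
        return -0.5 * math.log(2 * math.pi * sig**2) - (x - mu)**2 / (2 * sig**2)
    return 0.5 * (log_gauss(v, mu_pos, sig_pos) - log_gauss(v, mu_neg, sig_neg))

# A value near the face-class mean votes positive, near the non-face mean negative
print(real_valued_output(1.0, 1.0, 0.5, -1.0, 0.5) > 0)    # True
print(real_valued_output(-1.0, 1.0, 0.5, -1.0, 0.5) < 0)   # True
```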
![Page 67: Asymmetric Boosting for Face Detection](https://reader036.fdocuments.in/reader036/viewer/2022062411/5681672c550346895ddbce57/html5/thumbnails/67.jpg)
Face Detectors today
• Typical approach for object detection (e.g. Viola-Jones face detector)
– Scan image with probe window (x, y, s) at different positions and scales
– Binary classify each window patch into
• face, or
• non-face
[Diagram: input space partitioned into a face region and a non-face region]
![Page 68: Asymmetric Boosting for Face Detection](https://reader036.fdocuments.in/reader036/viewer/2022062411/5681672c550346895ddbce57/html5/thumbnails/68.jpg)
Face Detectors today
• Cascade (or tree) of boosted classifiers with increasing complexities:
[Diagram: stages F1 → F2 → … → FQ; each stage either passes a window on or rejects it; only windows passing all stages are labeled face]
• Boosted classifier: $F_q(x) = \mathrm{sign}\!\left( \sum_{m=1}^{M_q} a_{q,m} f_{q,m}(x) \right)$
– Parameters:
• fq,m(x) = weak classifier, selected from a set of feature classifiers
• aq,m = voting coefficient
– Discrete AdaBoost:
• strongly resistant to overfitting (Rudin-Schapire-Daubechies ’03-’07)
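A minimal sketch of how such a cascade evaluates a window (the weak classifiers here are toy scalar thresholds standing in for feature classifiers; all names are mine):

```python
def boosted_stage(weak_classifiers, coeffs, threshold=0.0):
    """F_q(x) = sign( sum_m a_{q,m} * f_{q,m}(x) - threshold )"""
    def F(x):
        s = sum(a * f(x) for a, f in zip(coeffs, weak_classifiers))
        return 1 if s >= threshold else -1
    return F

def cascade(stages):
    """A window is a face only if every stage accepts; any reject is final."""
    def detect(x):
        return all(F(x) == 1 for F in stages)
    return detect

# Toy two-stage cascade over a scalar input
f1 = lambda x: 1 if x > 0.2 else -1
f2 = lambda x: 1 if x > 0.5 else -1
F1 = boosted_stage([f1], [1.0])
F2 = boosted_stage([f1, f2], [0.4, 0.6])
detect = cascade([F1, F2])
print(detect(0.9), detect(0.3), detect(0.0))  # True False False
```

Most non-face windows are rejected by the cheap early stages, which is what makes scanning every window position affordable.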
![Page 69: Asymmetric Boosting for Face Detection](https://reader036.fdocuments.in/reader036/viewer/2022062411/5681672c550346895ddbce57/html5/thumbnails/69.jpg)
Face Detectors today
• Haar-like feature classifier:
– Feature value $v^{(t)} = H^{(t)} \circ I = \sum_{x,y} H^{(t)}_{x,y} I_{x,y}$ (Haar-like feature H(t) applied to image window I)
– v(t) can be computed extremely fast using image integrals
– {H(t)} is generated by
• scaling and translation
• of a few feature types
– Classify window by thresholding v(t): $f_{q,m}(x) = p_t\, \mathrm{sign}(v^{(t)} - \theta_t)$
– $\theta_t$: threshold value
– $p_t \in \{-1, +1\}$: parity bit
[Figure: the four feature types used by Viola-Jones]
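The integral-image trick can be sketched in a few lines of numpy (helper names are mine): any rectangle sum costs four table lookups, so a two-rectangle feature costs at most eight, independent of rectangle size:

```python
import numpy as np

def integral_image(I):
    # J[x, y] = sum of I[:x, :y]; padded with a leading row/column of zeros
    return np.pad(I.cumsum(axis=0).cumsum(axis=1), ((1, 0), (1, 0)))

def rect_sum(J, top, left, h, w):
    # Sum of the h-by-w rectangle at (top, left): four lookups in J
    return J[top + h, left + w] - J[top, left + w] - J[top + h, left] + J[top, left]

# Two-rectangle (edge-type) Haar-like feature: left half minus right half
I = np.arange(16, dtype=float).reshape(4, 4)
J = integral_image(I)
v = rect_sum(J, 0, 0, 4, 2) - rect_sum(J, 0, 2, 4, 2)
assert v == I[:, :2].sum() - I[:, 2:].sum()
print(v)  # -16.0
```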
![Page 70: Asymmetric Boosting for Face Detection](https://reader036.fdocuments.in/reader036/viewer/2022062411/5681672c550346895ddbce57/html5/thumbnails/70.jpg)
Our Approach
• To train feature classifiers using statistics:
– Assumption: feature value v(t) is Gaussian-distributed given the face class c, i.e. $P(v^{(t)} \mid c = p) = g(v^{(t)}; \mu_p^{(t)}, \sigma_p^{(t)})$ for $p = \pm 1$
– Solve: $(\theta^{(t)}, p^{(t)}) = \arg\min_{\theta, p}\, \big[ P(p\, v^{(t)} < p\, \theta \mid c = 1) + P(p\, v^{(t)} > p\, \theta \mid c = -1) \big]$
– Non-convex, but a closed-form solution exists (Duda et al. ’02)
– How to estimate the class-conditional statistics $\mu_1^{(t)}, \sigma_1^{(t)}, \mu_{-1}^{(t)}, \sigma_{-1}^{(t)}$?
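The closed form comes from the fact that equating the two Gaussian log-densities gives a quadratic in v. A sketch of that crossing-point computation (it ignores class priors and boosting weights, which would only shift the constant term; names are mine):

```python
import math

def gaussian_thresholds(mu1, s1, mu2, s2):
    """Feature values v where g(v; mu1, s1) = g(v; mu2, s2):
    setting the log-densities equal gives a quadratic a*v^2 + b*v + c = 0."""
    a = 1.0 / (2 * s2**2) - 1.0 / (2 * s1**2)
    b = mu1 / s1**2 - mu2 / s2**2
    c = mu2**2 / (2 * s2**2) - mu1**2 / (2 * s1**2) + math.log(s2 / s1)
    if abs(a) < 1e-12:                  # equal variances: a single crossing
        return [-c / b]
    d = math.sqrt(b * b - 4 * a * c)    # two Gaussians with unequal variances cross twice
    return sorted([(-b + d) / (2 * a), (-b - d) / (2 * a)])

print(gaussian_thresholds(-1.0, 1.0, 1.0, 1.0))  # [0.0] (midpoint of the means)
```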
![Page 71: Asymmetric Boosting for Face Detection](https://reader036.fdocuments.in/reader036/viewer/2022062411/5681672c550346895ddbce57/html5/thumbnails/71.jpg)
Our Approach
• Linear relationships among the variables:
– Denote by $\underline{A}$ a vector consisting of all the elements of matrix A arranged in a pre-defined order.
– v(t) is a linear projection of image window I with direction of projection H(t): $v^{(t)} = \sum_{x,y} H^{(t)}_{x,y} I_{x,y} = \underline{H}^{(t)T} \underline{I}$
– Image integral J is a linear transform of I:
• By definition (Viola-Jones ’01): $J_{x,y} = \sum_{x'=1}^{x} \sum_{y'=1}^{y} I_{x',y'}$
• Hence $\underline{J} = B\, \underline{I}$, where B is a non-singular transformation matrix.
– v(t) is a linear projection of image integral J, too: $v^{(t)} = \underline{H}^{(t)T} \underline{I} = \underline{H}^{(t)T} B^{-1} \underline{J} = \mathbf{g}^{(t)T} \underline{J}$
• where $\mathbf{g}^{(t)} = B^{-T} \underline{H}^{(t)}$ is a sparse vector with very few non-zero elements (Viola-Jones ’01)
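The sparsity claim can be checked on a toy image. With row-major vectorization, B is the Kronecker product of two lower-triangular all-ones matrices (since the image integral is a cumulative sum along each axis), and $\mathbf{g} = B^{-T}\underline{H}$ ends up with one nonzero per rectangle corner. A numpy sketch (construction details are mine):

```python
import numpy as np

n = 6
L = np.tril(np.ones((n, n)))   # lower-triangular all-ones: cumulative-sum operator
B = np.kron(L, L)              # row-major vec: vec(J) = B vec(I), since J = L I L^T

# A two-rectangle Haar-like feature: dense in the pixel domain
H = np.zeros((n, n))
H[1:5, 1:3] = 1.0              # "white" rectangle
H[1:5, 3:5] = -1.0             # "black" rectangle

g = np.linalg.solve(B.T, H.ravel())       # g = B^{-T} vec(H)
print(np.count_nonzero(H))                # 16 pixel-domain weights
print(np.count_nonzero(np.round(g, 9)))   # 6: one coefficient per rectangle corner

# Same feature value from either domain
I = np.arange(n * n, dtype=float).reshape(n, n)
J = L @ I @ L.T                # the image integral of I
assert np.isclose(H.ravel() @ I.ravel(), g @ J.ravel())
```

The adjacent rectangles share two corners, which is why 8 lookups collapse to 6 nonzero coefficients here.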
![Page 72: Asymmetric Boosting for Face Detection](https://reader036.fdocuments.in/reader036/viewer/2022062411/5681672c550346895ddbce57/html5/thumbnails/72.jpg)
Our Approach
• Applying the sparsity of g(t) to statistics:
– Given $\underline{J}$, computing $v^{(t)} = \mathbf{g}^{(t)T} \underline{J}$ is extremely fast due to the sparsity of g(t).
– Computing the statistics of v(t):
$\mu^{(t)} = E[v^{(t)}] = \mathbf{g}^{(t)T} \mathbf{m}_J$
$\sigma^{(t)2} = E[(v^{(t)} - \mu^{(t)})^2] = \mathbf{g}^{(t)T} \boldsymbol\Sigma_J\, \mathbf{g}^{(t)}$
• where $\mathbf{m}_J$ and $\boldsymbol\Sigma_J$ are the mean vector and the covariance matrix of image integral $\underline{J}$. If $\underline{J}$’s statistics are known, computing $\mu^{(t)}$ and $\sigma^{(t)}$ is extremely fast as well.
– Same for class-conditional statistics
• How to estimate the class-conditional statistics of $\underline{J}$?
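A numpy sketch of this identity on synthetic data (all names are mine): once the mean vector and covariance matrix of $\underline{J}$ are stored, every feature's scalar statistics follow from a few multiply-adds with the sparse g, with no per-feature pass over the samples:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 2000, 25                          # N integral-image vectors of dimension D
J = rng.normal(size=(N, D)) @ rng.normal(size=(D, D))   # correlated samples

g = np.zeros(D)
g[[0, 4, 20, 24]] = [1.0, -1.0, -1.0, 1.0]   # sparse lookup coefficients

m_J = J.mean(axis=0)                     # mean vector of J (shared by all features)
S_J = np.cov(J, rowvar=False)            # covariance matrix of J (shared too)

v = J @ g                                # explicit feature values v = g^T J
assert np.isclose(v.mean(), g @ m_J)            # mu    = g^T m_J
assert np.isclose(v.var(ddof=1), g @ S_J @ g)   # sigma^2 = g^T Sigma_J g
```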
![Page 73: Asymmetric Boosting for Face Detection](https://reader036.fdocuments.in/reader036/viewer/2022062411/5681672c550346895ddbce57/html5/thumbnails/73.jpg)
Our Approach
• Estimation of the class-conditional image integral statistics
– Information given at the m-th weak classifier:
• N input windows and their corresponding classes, $(I_1, c_1), (I_2, c_2), \ldots, (I_N, c_N)$
• weighted by vector $\mathbf{w}^{(m)} = (w_1^{(m)}, \ldots, w_N^{(m)})$
• Let $\underline{J}_n = B\, \underline{I}_n$ for all n from 1 to N.
– Direct estimation from the current input data, with $z_c = \sum_{n: c_n = c} w_n^{(m)}$:
$\hat{\mathbf{m}}_c = \frac{1}{z_c} \sum_{n: c_n = c} w_n^{(m)}\, \underline{J}_n$
$\hat{\boldsymbol\Sigma}_c = \frac{1}{z_c} \sum_{n: c_n = c} w_n^{(m)} (\underline{J}_n - \hat{\mathbf{m}}_c)(\underline{J}_n - \hat{\mathbf{m}}_c)^T$
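A direct implementation of these weighted estimators might look like the following sketch (function name and toy data are mine):

```python
import numpy as np

def weighted_class_stats(J, c, w, c_val):
    """Weighted mean m_c and covariance Sigma_c over samples with c_n == c_val:
       m_c = (1/z_c) sum w_n J_n,  Sigma_c = (1/z_c) sum w_n (J_n - m_c)(J_n - m_c)^T,
       where z_c is the sum of the selected weights."""
    mask = c == c_val
    Jc, wc = J[mask], w[mask]
    z = wc.sum()
    m = (wc[:, None] * Jc).sum(axis=0) / z
    D = Jc - m
    S = (wc[:, None, None] * (D[:, :, None] * D[:, None, :])).sum(axis=0) / z
    return m, S

# Toy data: 5 "integral image" vectors, binary class labels, boosting weights
J = np.arange(15, dtype=float).reshape(5, 3)
c = np.array([1, -1, 1, -1, 1])
w = np.array([0.3, 0.1, 0.2, 0.3, 0.1])
m_pos, S_pos = weighted_class_stats(J, c, w, +1)
print(m_pos)  # [4. 5. 6.]
```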