Boosting: Algorithms and Applications
Lecture 11, ENGN 4522/6520, Statistical Pattern Recognition and Its Applications in Computer Vision
ANU 2nd Semester, 2008
Chunhua Shen, NICTA/RSISE
Boosting

Definition of boosting: boosting refers to the general problem of producing a very accurate prediction rule by combining rough and moderately inaccurate rules of thumb.

Boosting procedure: given a set of labeled training examples, on each round
- the booster devises a distribution (importance weights) over the example set;
- the booster requests a weak hypothesis/classifier/learner with low weighted error.

Upon termination, the booster combines the weak hypotheses into a single prediction rule.
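The loop above can be sketched in a few lines (a minimal AdaBoost-style illustration, not the lecture's exact formulation; the threshold-stump weak learner below is an assumption for demonstration):

```python
import numpy as np

def boost(X, y, weak_learn, rounds):
    """Generic boosting loop: maintain a distribution over the examples,
    request a weak hypothesis, reweight, and combine."""
    n = len(y)
    w = np.full(n, 1.0 / n)             # initial distribution over examples
    ensemble = []                       # (alpha, hypothesis) pairs
    for _ in range(rounds):
        h = weak_learn(X, y, w)         # weak hypothesis for this round
        pred = h(X)
        eps = np.sum(w[pred != y])      # weighted error
        if eps >= 0.5:                  # not even weakly useful: stop
            break
        alpha = 0.5 * np.log((1 - eps) / max(eps, 1e-12))
        w = w * np.exp(-alpha * y * pred)   # up-weight the mistakes
        w = w / w.sum()
        ensemble.append((alpha, h))
        if eps == 0.0:                  # perfect hypothesis: done
            break
    return lambda Xq: np.sign(sum(a * h(Xq) for a, h in ensemble))

def stump_learner(X, y, w):
    """A toy weak learner: the best single-feature threshold stump."""
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for s in (1.0, -1.0):
                pred = s * np.sign(X[:, j] - t + 1e-12)
                err = np.sum(w[pred != y])
                if best is None or err < best[0]:
                    best = (err, j, t, s)
    _, j, t, s = best
    return lambda Xq, j=j, t=t, s=s: s * np.sign(Xq[:, j] - t + 1e-12)
```

For example, on the separable one-dimensional set X = [[0.1], [0.2], [0.4], [0.6], [0.9]] with labels [-1, -1, -1, 1, 1], `boost(X, y, stump_learner, 5)` recovers a perfect classifier.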
Boosting (Freund & Schapire, 1997)
Boosting: 1st Iteration
Boosting: Update Distribution
Boosting as Entropy Projection

Each distribution update can be viewed as minimizing the relative entropy to the last distribution, subject to linear constraints.
Boosting: 2nd Hypothesis
Boosting: 3rd Hypothesis
Boosting: 4th Hypothesis
All hypotheses
AdaBoost
Properties of AdaBoost
AdaBoost adapts to the errors of the weak hypotheses returned by the weak learner. Unlike earlier boosting algorithms, a prior bound on the weak learner's error need not be known ahead of time. The update rule reduces the probability assigned to those examples on which the hypothesis makes good predictions and increases the probability of the examples on which the prediction is poor.
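Numerically, with AdaBoost's standard exponential update (the four-example dataset below is hypothetical, chosen only to make the arithmetic transparent):

```python
import numpy as np

# Four examples with uniform initial weights; the hypothetical weak
# hypothesis h misclassifies only example 1.
y = np.array([+1.0, +1.0, -1.0, -1.0])
h = np.array([+1.0, -1.0, -1.0, -1.0])
w = np.full(4, 0.25)

eps = w[h != y].sum()                    # weighted error = 0.25
alpha = 0.5 * np.log((1 - eps) / eps)    # weight of this hypothesis
w = w * np.exp(-alpha * y * h)           # shrink correct, grow incorrect
w = w / w.sum()                          # renormalize to a distribution

# The misclassified example now carries half the total weight (0.5,
# versus 1/6 for each correct example), and the same hypothesis has
# weighted error exactly 1/2 on the new distribution, so it is no
# longer informative on the next round.
```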
Multi-class Extensions
The previous discussion is restricted to binary classification problems, but training data can have any number of labels, giving a multi-class problem. The multi-class extension (AdaBoost.M1) requires the accuracy of each weak hypothesis to be greater than 1/2. This condition is stronger in the multi-class case than in the binary classification case.
Detecting Pedestrians Using Patterns of Motion and Appearance

Paul Viola, Michael J. Jones, Daniel Snow, IEEE ICCV
The System
A pedestrian detection system using image intensity information and motion information, with the detectors trained by AdaBoost. This was the first approach to combine both appearance and motion information in a single detector. Advantages:
- high efficiency;
- high detection rate and low false positive rate.
Rectangle Filters
Rectangle filters measure the difference between region averages at various scales, orientations and aspect ratios. Each filter alone carries limited information, however, and needs to be boosted to perform accurate classification.
Motion Information
Information about the direction of motion can be extracted from the difference between the image at time t and shifted versions of the image at time t + 1. The motion filters (direction, shear, magnitude) operate on five difference images:
Δ = abs(I_t − I_{t+1})
U = abs(I_t − I_{t+1}↑)
D = abs(I_t − I_{t+1}↓)
L = abs(I_t − I_{t+1}←)
R = abs(I_t − I_{t+1}→)

where ↑, ↓, ← and → denote the image shifted by one pixel in the corresponding direction.
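A sketch of the five difference images (zero-padded borders are an assumption of this sketch; the paper does not depend on that detail):

```python
import numpy as np

def shift(img, dy, dx):
    """Shift an image by (dy, dx) pixels, zero-padding the border."""
    h, w = img.shape
    out = np.zeros_like(img)
    out[max(dy, 0):h + min(dy, 0), max(dx, 0):w + min(dx, 0)] = \
        img[max(-dy, 0):h + min(-dy, 0), max(-dx, 0):w + min(-dx, 0)]
    return out

def motion_images(I_t, I_t1):
    """The five difference images used by the motion filters."""
    delta = np.abs(I_t - I_t1)              # raw frame difference
    U = np.abs(I_t - shift(I_t1, -1, 0))    # I_{t+1} shifted up
    D = np.abs(I_t - shift(I_t1, 1, 0))     # shifted down
    L = np.abs(I_t - shift(I_t1, 0, -1))    # shifted left
    R = np.abs(I_t - shift(I_t1, 0, 1))     # shifted right
    return delta, U, D, L, R
```

For a bright pixel moving one pixel to the right between frames, L is zero in the interior while Δ is not: the smallest of the shifted differences indicates the direction of motion.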
An Example
Appearance Filter
Appearance filters are rectangle filters that operate on the first input image:

f_m = φ(I_t)
Integral Image
The integral image at location (x, y) contains the sum of the pixels above and to the left of (x, y), inclusive:

ii(x, y) = Σ_{x′ ≤ x, y′ ≤ y} i(x′, y′)

where ii(x, y) is the integral image and i(x, y) is the original image. It can be computed in a single pass with the pair of recurrences

s(x, y) = s(x, y − 1) + i(x, y)
ii(x, y) = ii(x − 1, y) + s(x, y)

where s(x, y) is the cumulative row sum.
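The recurrences translate directly into code (a straightforward sketch; vectorized cumulative sums would do the same job):

```python
import numpy as np

def integral_image(i):
    """Compute ii from i using the two recurrences:
    s(x, y) = s(x, y-1) + i(x, y);  ii(x, y) = ii(x-1, y) + s(x, y)."""
    h, w = i.shape
    s = np.zeros((h, w))    # cumulative row sums
    ii = np.zeros((h, w))   # integral image
    for x in range(h):
        for y in range(w):
            s[x, y] = (s[x, y - 1] if y > 0 else 0.0) + i[x, y]
            ii[x, y] = (ii[x - 1, y] if x > 0 else 0.0) + s[x, y]
    return ii

def rect_sum(ii, r0, c0, r1, c1):
    """Sum of the original image over the inclusive rectangle
    [r0..r1] x [c0..c1], using only four corner lookups."""
    total = ii[r1, c1]
    if r0 > 0:
        total -= ii[r0 - 1, c1]
    if c0 > 0:
        total -= ii[r1, c0 - 1]
    if r0 > 0 and c0 > 0:
        total += ii[r0 - 1, c0 - 1]
    return total
```

`rect_sum` is what makes every rectangle filter cost a constant number of memory accesses, regardless of the filter's size.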
Training Filters
The rectangle filters can have any size, aspect ratio or position as long as they fit in the detection window; there is therefore a very large number of possible motion and appearance filters, from which a learning algorithm selects a small subset to build classifiers.
Training Process
The training process uses AdaBoost to select a subset of features F that minimizes the weighted error and to construct the classifier. In each round the learning algorithm chooses a filter from the pool of motion and appearance filters, and also picks the optimal threshold t for each feature as well as the linear weights. The output of AdaBoost is a linear combination of the selected features.
Training Process
A cascade architecture is used to raise the efficiency of the system. The true and false positives that pass the current stage are used in the next stage of the cascade. The goal is to reduce the false positive rate faster than the detection rate.
Strong Classifier

A strong classifier is a weighted vote of weak classifiers. With four weak classifiers carrying weights 0.9, 0.7, 0.5 and 0.3 and a decision threshold of 1.0:

- if weak classifiers 1, 2 and 4 fire: 0.9 + 0.7 + 0.3 = 1.9 > 1.0 (threshold), so the window is accepted;
- if only weak classifiers 3 and 4 fire: 0.5 + 0.3 = 0.8 < 1.0 (threshold), so the window is rejected.

Overview of the Cascaded Structure

Classifier 1 → Classifier 2 → … (each stage passes its surviving windows on to the next).
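The weighted vote and the cascade can be sketched as follows (the numbers mirror the slide's example; the stage functions in the usage are hypothetical stand-ins for trained stages):

```python
def strong_classify(weights, fired, threshold=1.0):
    """Weighted vote: sum the weights of the weak classifiers that fire
    and compare the total against the stage threshold."""
    return sum(w for w, f in zip(weights, fired) if f) > threshold

def cascade(stages, window):
    """A window must pass every stage, so most windows are rejected
    cheaply by the first few stages."""
    return all(stage(window) for stage in stages)

weights = [0.9, 0.7, 0.5, 0.3]
# classifiers 1, 2 and 4 fire: 0.9 + 0.7 + 0.3 = 1.9 > 1.0 -> accept
# only classifiers 3 and 4 fire: 0.5 + 0.3 = 0.8 < 1.0 -> reject
```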
Experiments
Each classifier in the cascade is trained using the original positive examples and the same number of false positives from the previous stage (or negative examples, for the first stage). The classifier from the previous stage filters the input of the current stage, which builds a new classifier with a lower false positive rate. The detection threshold is set using a validation set of image pairs.
Training samples
A small sample of positive training examples: a pair of image patterns comprises a single training example.
Training the cascade
A large number of motion and appearance filters is used for training the dynamic pedestrian detector; a smaller number of appearance filters is used for training the static pedestrian detector.
Training results
The first five filters learned for the dynamic pedestrian detector. The six images used in the motion and appearance representation are shown for each filter
The first five filters learned for the static pedestrian detector
Testing
Detection results of the dynamic detector
Testing
Detection results of the static detector
Pedestrian Detection Using Boosting and Covariance Features
Sakrapee Paisitkriangkrai, Chunhua Shen, and Jian Zhang, IEEE T-CSVT
Covariance Features

The image is divided into small overlapping regions. Each pixel in a region is converted to an eight-dimensional feature vector:
F(x, y) = [ x, y, |I_X|, |I_Y|, √(I_X² + I_Y²), |I_XX|, |I_YY|, tan⁻¹(|I_Y| / |I_X|) ]

The covariance matrix is calculated from these vectors using

cov(X, Y) = E[(X − μ_X)(Y − μ_Y)] = (1 / (n − 1)) [ Σ_k X_k Y_k − (1/n) (Σ_k X_k)(Σ_k Y_k) ]

To improve the calculation time, a technique employing integral images is applied: we compute integral images of Σ_k X_k, Σ_k Y_k and Σ_k X_k Y_k.
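A sketch of the integral-image trick for region covariances (the array layout and function names are my own; in the slide's descriptor, d = 8):

```python
import numpy as np

def region_covariance_fn(features):
    """features: H x W x d per-pixel feature vectors. Returns a function
    that gives the d x d covariance of any rectangle in constant time,
    from integral images of sum(z) and sum(z z^T)."""
    P = features.cumsum(axis=0).cumsum(axis=1)   # integral image of z
    Q = np.einsum('hwi,hwj->hwij', features, features) \
          .cumsum(axis=0).cumsum(axis=1)         # integral image of z z^T

    def lookup(ii, r0, c0, r1, c1):
        """Four-corner lookup of a rectangle sum in an integral image."""
        t = ii[r1, c1].copy()
        if r0 > 0:
            t -= ii[r0 - 1, c1]
        if c0 > 0:
            t -= ii[r1, c0 - 1]
        if r0 > 0 and c0 > 0:
            t += ii[r0 - 1, c0 - 1]
        return t

    def cov(r0, c0, r1, c1):                     # inclusive rectangle
        n = (r1 - r0 + 1) * (c1 - c0 + 1)
        p = lookup(P, r0, c0, r1, c1)            # sum of z over the region
        q = lookup(Q, r0, c0, r1, c1)            # sum of z z^T
        return (q - np.outer(p, p) / n) / (n - 1)

    return cov
```

The result matches an explicit covariance over the region's pixels, but any number of rectangles can then be evaluated at constant cost each.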
Experimental Results
Figure: feature comparison, detection rate (0.8–1.0) versus false positive rate (0–0.01), for six classifiers: COV with RBF SVM (g = 0.01), HoG with RBF SVM (g = 0.01), HoG with quadratic SVM, COV with quadratic SVM, HoG with linear SVM, COV with linear SVM.
Remarks

Although covariance features with a non-linear SVM outperform many state-of-the-art techniques, the approach has the following disadvantages:
- the block size used in the SVM is fixed (7×7 pixels), so it cannot capture human body parts with other rectangular shapes, e.g. limbs and torso;
- the parameter tuning process for the SVM is rather tedious;
- non-linear SVMs have a high computation time.

We therefore build a new, simpler pedestrian detector using:
- covariance features;
- AdaBoost with weighted Fisher linear discriminant analysis (WLDA) based weak classifiers;
- a cascaded structure.
Linear Discriminant Analysis (LDA)

Motivation: project the data onto a line (R^n → R^1) such that the patterns become well separated (in a least-squares sense). Two-dimensional example.
Best separation between two classes
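A minimal Fisher LDA sketch (two-class and unweighted; the weighted variant used as a weak classifier later reweights the means and the scatter matrix in the same formula):

```python
import numpy as np

def fisher_direction(X1, X2):
    """Fisher linear discriminant: w ∝ Sw^{-1}(m1 - m2), the projection
    direction (R^n -> R^1) maximizing between-class separation relative
    to within-class scatter."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    Sw = np.cov(X1, rowvar=False) * (len(X1) - 1) \
       + np.cov(X2, rowvar=False) * (len(X2) - 1)   # within-class scatter
    w = np.linalg.solve(Sw, m1 - m2)
    return w / np.linalg.norm(w)
```

Projecting each sample onto w yields one-dimensional scores in which the two classes are well separated, which is what makes the projection usable as a weak classifier after thresholding.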
Covariance Features with LDA

Observation: it is possible to achieve a 5% test error rate using either 25 covariance features or 100 Haar-like features.
The region covariance descriptor collects these statistics in a symmetric matrix:

C = [ var(x)         cov(x, y)   …   cov(x, |I_YY|)
      …              …           …   …
      cov(|I_YY|, x) …           …   var(|I_YY|)   ]
We combine covariance features with LDA and compare them against Haar-like features.
Combine multi-dimensional covariance features with weighted LDA.

The new features are trained in the AdaBoost framework for faster speed and higher accuracy. Multiple-layer boosting with heterogeneous features is applied in a cascaded structure.
Architecture
1. Start from the training dataset and a complete set of rectangular filters (weak classifiers).
2. Calculate the region covariance matrix and stack the upper triangle of the matrix into a vector (R^n).
3. Apply the weighted Fisher linear discriminant (R^n → R^1).
4. AdaBoost selects the best weak learner with respect to the weighted error.
5. Update the sample weights.
6. Test the predefined objective (hit rate 99.5%, false positive rate 50%): if it is not met (F), return to step 2; if it is met (T), output the strong classifier.

Architecture of the pedestrian detection system using boosted covariance features.
Observations – covariance features:
- The combined covariance features represent distinct parts of the human body.
- The 1st covariance feature represents the human legs (two parallel vertical bars).
- The 2nd covariance feature captures the information of the head and the human body.

Compared with Haar features:
1. The 1st Haar feature represents the human head/shoulder contour.
2. The 2nd Haar feature represents the human left leg.
Experimental Results
The proposed boosted covariance detector achieves about ten times faster detection speed than the conventional covariance detector (Tuzel et al. 2007). On a 360 × 288 pixel image, our system processes around 4 frames per second. This is the first real-time covariance-feature-based pedestrian detector.
Face Detection Applications

Summary: Viola & Jones' face detector
- uses the integral image for efficient feature extraction;
- uses AdaBoost for feature selection;
- applies a cascade classifier for efficient elimination of non-faces.

Pros: a fast and robust face detector; the system runs in real time.

Cons: the training stage is time consuming (1–2 weeks, depending on the number of training samples and the number of features used), and a large number of face training samples is required.

Discussion: the performance of face detection depends crucially on the features used to represent the objects. Good features not only yield better generalization but also require a smaller training database.
Proposed Work

Similar to the previous experiment, we apply covariance features to face detection.

The differences between our work and Viola & Jones' framework: we use covariance features, and we adopt the weighted FDA as the weak classifier.

To show the better classification capability, we trained a boosted classifier on the banana dataset with a multidimensional decision stump and FDA as weak classifiers.
Figure: training and test error versus the number of weak classifiers, for the multidimensional decision stump (up to 1000 weak classifiers, error roughly 0.1–0.4) and for Fisher discriminant analysis (up to 200 weak classifiers, error roughly 0–0.4).
Observations / Experimental Results

ROC curves show that covariance features significantly outperform Haar-like wavelet features when the training database is small. As the number of samples grows, the performance difference between the two techniques decreases.
Figure: ROC curves for our algorithm on the MIT+CMU test set, plotting correct detection rate (0.65–0.9) against the number of false positives (0–400), for detectors trained with 250 faces and with 500 faces (COV features versus Haar features).
Experimental Results
Some detection results of our face detector trained using 250 frontal faces, on MIT+CMU test images.
Summary
- Boosting
- AdaBoost
- AdaBoost for pedestrian detection using Haar features and dynamic temporal information
- AdaBoost for pedestrian detection using new covariance features
- Face detection using new covariance features
Questions?