Boosting: Algorithms
and Applications
Lecture 11, ENGN 4522/6520, Statistical Pattern Recognition and Its Applications in Computer Vision
ANU, 2nd Semester, 2008
Chunhua Shen, NICTA/RSISE
Boosting

Definition of Boosting:
Boosting refers to the general problem of producing a very accurate prediction rule by combining rough and moderately inaccurate rules.
Boosting Procedures

Given a set of labeled training examples, on each round:
The booster devises a distribution (importance weighting) over the example set.
The booster requests a weak hypothesis/classifier/learner with low error.
Upon convergence, the booster combines the weak hypotheses into a single prediction rule.
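The loop above can be sketched in code. This is a minimal, hypothetical implementation using AdaBoost-style weights (introduced on the next slides), assuming labels in {-1, +1} and a user-supplied weak learner:

```python
import numpy as np

def adaboost(X, y, weak_learner, T):
    """Run T boosting rounds; return a list of (alpha, hypothesis) pairs."""
    n = len(y)
    D = np.full(n, 1.0 / n)               # uniform initial distribution
    ensemble = []
    for _ in range(T):
        h = weak_learner(X, y, D)          # weak hypothesis for distribution D
        pred = h(X)
        # weighted training error, clipped to avoid log(0) when err hits 0 or 1
        err = np.clip(np.sum(D[pred != y]), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)
        D = D * np.exp(-alpha * y * pred)  # up-weight mistakes, down-weight hits
        D = D / D.sum()                    # renormalise to a distribution
        ensemble.append((alpha, h))
    return ensemble

def predict(ensemble, X):
    """Combined prediction rule: sign of the weighted vote."""
    return np.sign(sum(a * h(X) for a, h in ensemble))
```

Any weak learner that beats chance on the weighted sample can be plugged in; decision stumps are the usual choice.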
Boosting (Freund & Schapire, 1997)
Boosting: 1st Iteration
Boosting: Update Distribution
Boosting as Entropy Projection

Minimizing the relative entropy to the last distribution, subject to linear constraints.
Boosting: 2nd Hypothesis
Boosting: 3rd Hypothesis
Boosting: 4th Hypothesis
All hypotheses
AdaBoost
Properties of AdaBoost
AdaBoost adapts to the errors of the weak hypotheses returned by the weak learner. Unlike the conventional boosting algorithm, the error bound of the weak learner need not be known ahead of time. The update rule reduces the probability assigned to those examples on which the hypothesis makes a good prediction and increases the probability of the examples on which the prediction is poor.
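The update rule described above can be written explicitly (Freund & Schapire, 1997). With weighted error ε_t of the round-t hypothesis h_t and normalizer Z_t:

```latex
D_{t+1}(i) = \frac{D_t(i)\,\exp\!\bigl(-\alpha_t\, y_i\, h_t(x_i)\bigr)}{Z_t},
\qquad
\alpha_t = \tfrac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t},
\qquad
H(x) = \operatorname{sign}\!\Bigl(\sum_{t} \alpha_t\, h_t(x)\Bigr)
```

When h_t(x_i) = y_i the exponent is negative and the example's weight shrinks; when the prediction is wrong the weight grows, exactly as stated above.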
Multi-class Extensions
The previous discussion is restricted to binary classification problems. The training data could have any number of labels, giving a multi-class problem. The multi-class extension (AdaBoost.M1) requires the accuracy of each weak hypothesis to be greater than 1/2. This condition is stronger in the multi-class case than in the binary classification case.
Detecting Pedestrians Using Patterns of Motion and Appearance
Paul Viola, Michael J. Jones, and Daniel Snow, IEEE ICCV 2003
The System
A pedestrian detection system using both image intensity information and motion information, with the detectors trained by AdaBoost. This was the first approach to combine appearance and motion information in a single detector. Advantages:
High efficiency
High detection rate and low false positive rate
Rectangle Filters
Rectangle filters measure the difference between region averages at various scales, orientations, and aspect ratios. Each filter on its own carries limited information, however, and must be boosted to perform accurate classification.
Motion Information
Information about the direction of motion can be extracted from the differences between shifted versions of the second image (in time) and the first image. Motion filters (direction, shear, magnitude) operate on five images:
Δ = abs(I_t − I_{t+1})
U = abs(I_t − I_{t+1} ↑)
L = abs(I_t − I_{t+1} ←)
R = abs(I_t − I_{t+1} →)
D = abs(I_t − I_{t+1} ↓)

where the arrows denote shifting I_{t+1} by one pixel in the indicated direction.
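The five difference images above can be sketched with numpy. This is an illustrative implementation assuming one-pixel shifts; np.roll wraps around at the borders, where a real detector would pad instead:

```python
import numpy as np

def motion_images(I_t, I_t1):
    """Return (Delta, U, L, R, D): absolute differences between I_t and
    shifted versions of I_{t+1}, per the motion-filter definitions."""
    up    = np.roll(I_t1, -1, axis=0)   # I_{t+1} shifted up one pixel
    down  = np.roll(I_t1,  1, axis=0)   # shifted down
    left  = np.roll(I_t1, -1, axis=1)   # shifted left
    right = np.roll(I_t1,  1, axis=1)   # shifted right
    Delta = np.abs(I_t - I_t1)          # raw frame difference
    U = np.abs(I_t - up)
    L = np.abs(I_t - left)
    R = np.abs(I_t - right)
    D = np.abs(I_t - down)
    return Delta, U, L, R, D
```

Comparing, say, U against Delta indicates whether upward motion explains the frame difference better than no motion.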
An Example
Appearance Filter
An appearance filter is a rectangular filter that operates on the first input image:

f_m = φ(I_t)
Integral Image
The integral image at location (x, y) contains the sum of the pixels above and to the left of (x, y), inclusive:

ii(x, y) = Σ_{x′ ≤ x, y′ ≤ y} i(x′, y′)

where ii(x, y) is the integral image and i(x, y) is the original image. It can be computed in a single pass using the recurrences

s(x, y) = s(x, y − 1) + i(x, y)
ii(x, y) = ii(x − 1, y) + s(x, y)

where s(x, y) is the cumulative row sum.
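The recurrences above amount to cumulative sums along both axes; a sketch, including the four-reference rectangle-sum lookup that makes integral images useful:

```python
import numpy as np

def integral_image(i):
    """ii(x, y): sum of i over all pixels above and to the left, inclusive."""
    return i.cumsum(axis=0).cumsum(axis=1)

def box_sum(ii, r0, c0, r1, c1):
    """Sum of the original image over rows r0..r1, cols c0..c1 (inclusive),
    read off the integral image with at most four array references."""
    total = ii[r1, c1]
    if r0 > 0:
        total -= ii[r0 - 1, c1]      # subtract strip above the box
    if c0 > 0:
        total -= ii[r1, c0 - 1]      # subtract strip left of the box
    if r0 > 0 and c0 > 0:
        total += ii[r0 - 1, c0 - 1]  # add back the doubly-subtracted corner
    return total
```

Once ii is built, every rectangle filter response costs a constant number of lookups, regardless of the rectangle's size.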
Training Filters
The rectangle filters can have any size, aspect ratio, or position as long as they fit in the detection window; consequently there is a very large number of possible motion and appearance filters, from which a learning algorithm selects to build classifiers.
Training Process
The training process uses AdaBoost to select a subset of features F that minimizes the weighted error, and to construct the classifier from them. In each round, the learning algorithm chooses a filter from the pool of motion and appearance filters; it also picks the optimal threshold t for each feature as well as the linear weights. The output of AdaBoost is a linear combination of the selected features.
Training Process
A cascade architecture is used to raise the efficiency of the system. The true and false positives that pass the current stage are used in the next stage of the cascade. The goal is to reduce the false positive rate faster than the detection rate.
Strong Classifier
A strong classifier is a weighted vote over weak classifiers. Suppose weak classifiers 1-4 have weights 0.9, 0.5, 0.3, and 0.7:

Example 1: weak classifiers 1, 4, and 3 fire: 0.9 + 0.7 + 0.3 = 1.9 > 1.0 (threshold), so the window is accepted.
Example 2: only weak classifiers 2 and 3 fire: 0.5 + 0.3 = 0.8 < 1.0 (threshold), so the window is rejected.
[Diagram: cascade of classifiers: Classifier 1 → Classifier 2 → ...]
Overview of the Cascaded Structure
Experiments
Each classifier in the cascade is trained using the original positive examples plus the same number of false positives from the previous stage (or negative examples, for the first stage). The classifier from the previous stage filters the input to the current stage, which builds a new classifier with a lower false positive rate. The detection threshold is set using a validation set of image pairs.
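At detection time the cascade works by early rejection. A minimal sketch with a hypothetical interface, where each stage is a (score function, threshold) pair:

```python
def cascade_detect(window, stages):
    """Return True only if the window passes every stage of the cascade.

    stages: list of (score_fn, threshold) pairs; score_fn maps a window to
    the weighted vote of that stage's boosted classifier.
    """
    for score_fn, threshold in stages:
        if score_fn(window) < threshold:
            return False  # early rejection: most windows exit here cheaply
    return True           # only windows surviving all stages are detections
```

Because the vast majority of scanned windows are rejected by the first few cheap stages, average cost per window stays low even when later stages use hundreds of features.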
Training samples
A small sample of positive training examples: a pair of image patterns comprises a single example for training.
Training the cascade
A large number of motion and appearance filters is used for training the dynamic pedestrian detector; a smaller number of appearance filters is used for training the static pedestrian detector.
Training results
The first five filters learned for the dynamic pedestrian detector. The six images used in the motion and appearance representation are shown for each filter
The first five filters learned for the static pedestrian detector
Testing
Detection results of the dynamic detector
Testing
Detection results of the static detector
Pedestrian Detection Using Boosting and Covariance Features
Sakrapee Paisitkriangkrai, Chunhua Shen, and Jian Zhang, IEEE T-CSVT
Covariance Features

The image is divided into small overlapped regions. Each pixel in a region is converted to an eight-dimensional feature vector:
F(x, y) = [ x, y, |I_X|, |I_Y|, |I_XX|, |I_YY|, √(I_X² + I_Y²), arctan(|I_Y| / |I_X|) ]

The covariance of two features X and Y over a region of n pixels is

cov(X, Y) = E[(X − μ_X)(Y − μ_Y)] = 1/(n − 1) [ Σ_k X_k Y_k − (1/n) Σ_k X_k Σ_k Y_k ]

and the covariance matrix is calculated from these pairwise covariances. To improve the calculation time, a technique employing integral images has been applied: we compute the integral images of Σ_k X_k, Σ_k Y_k, and Σ_k X_k Y_k.
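The per-pixel features and the region covariance above can be sketched as follows. This is an illustrative implementation, not the paper's exact pipeline: np.gradient stands in for whatever derivative filters were actually used, and it computes the covariance directly rather than via integral images:

```python
import numpy as np

def region_covariance(I, r0, r1, c0, c1):
    """8x8 covariance matrix of the per-pixel features over I[r0:r1, c0:c1]."""
    Iy, Ix = np.gradient(I)            # first derivatives (rows = y, cols = x)
    Iyy, _ = np.gradient(Iy)           # second derivative in y
    _, Ixx = np.gradient(Ix)           # second derivative in x
    ys, xs = np.mgrid[0:I.shape[0], 0:I.shape[1]]
    mag = np.sqrt(Ix**2 + Iy**2)       # gradient magnitude
    ang = np.arctan2(np.abs(Iy), np.abs(Ix))  # edge orientation
    # stack the 8-D feature vector at every pixel: shape (8, H, W)
    feats = np.stack([xs, ys, np.abs(Ix), np.abs(Iy),
                      np.abs(Ixx), np.abs(Iyy), mag, ang])
    region = feats[:, r0:r1, c0:c1].reshape(8, -1)
    return np.cov(region)              # unbiased 1/(n-1) normalisation, as above
```

The integral-image trick mentioned above recovers the same matrix for any rectangle in constant time by precomputing running sums of X_k and of the products X_k Y_k.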
Experimental Results
[Plot: Feature Comparison. Detection rate (0.8 to 1.0) vs. false positive rate (0 to 0.01); curves: COV with RBF SVM (g = 0.01), HoG with RBF SVM (g = 0.01), HoG with quadratic SVM, COV with quadratic SVM, HoG with linear SVM, COV with linear SVM.]
Remarks

Although covariance features with a non-linear SVM outperform many state-of-the-art techniques, they have the following disadvantages:
The block size used in the SVM is fixed (7×7 pixels), so it cannot capture human body parts with other rectangular shapes, e.g. limbs or torso.
The parameter tuning process for the SVM is rather tedious.
The computation time of a non-linear SVM is high.
We therefore build a new, simpler pedestrian detector using:
covariance features,
AdaBoost with weighted Fisher linear discriminant analysis (WLDA) based weak classifiers,
a cascaded structure.
Linear Discriminant Analysis (LDA)

Motivation: project the data onto a line (R^n → R^1) such that the patterns become well separated (in a least-squares sense). Two-dimensional example: the projection direction giving the best separation between the two classes.
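The two-class Fisher discriminant above has a closed form: the projection direction is w = S_w^{-1} (μ₁ − μ₂), where S_w is the within-class scatter. A minimal sketch:

```python
import numpy as np

def fisher_direction(X1, X2):
    """LDA projection vector for two classes given as sample-row matrices."""
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    # within-class scatter: sum of the per-class scatter matrices
    S1 = (X1 - mu1).T @ (X1 - mu1)
    S2 = (X2 - mu2).T @ (X2 - mu2)
    Sw = S1 + S2
    w = np.linalg.solve(Sw, mu1 - mu2)  # w = Sw^{-1} (mu1 - mu2)
    return w / np.linalg.norm(w)        # unit-length direction
```

Projecting a sample x to the scalar w·x, then thresholding, gives the one-dimensional weak classifier used inside the boosting loop.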
Covariance features with LDA
Observations: it is possible to achieve a 5% test error rate using either 25 covariance features or 100 Haar-like features.
C = [ var(x)          cov(x, y)    ...   cov(x, |I_YY|)
      cov(y, x)       var(y)       ...   ...
      ...             ...          ...   ...
      cov(|I_YY|, x)  ...          ...   var(|I_YY|)   ]
We combine covariance features with LDA and compare them against Haar-like features:
Combine multi-dimensional covariance features with weighted LDA.
Train the new features in the AdaBoost framework for faster speed and higher accuracy.
Apply multiple-layer boosting with heterogeneous features in a cascaded structure.
Components

The strong classifier is a weighted vote of weak classifiers #1-#4 with weights 0.9, 0.5, 0.3, and 0.7; e.g. if #1, #4, and #3 fire, 0.9 + 0.7 + 0.3 = 1.9 > 1.0 (threshold).
Architecture
Inputs: training dataset; a complete set of rectangular filters (weak classifiers).

1. Calculate the region covariance matrix and stack the upper triangle of the matrix into a vector (R^n).
2. Apply the weighted Fisher linear discriminant (R^n → R^1).
3. AdaBoost selects the best weak learner with respect to the weighted error.
4. Update the sample weights.
5. Test the predefined objective (hit rate: 99.5%, false positive rate: 50%). If not met (F), return to step 1; if met (T), output the strong classifier.
Architecture of the pedestrian detection system using boosted covariance features.
Observations: covariance features

Each combined covariance feature represents a distinct part of the human body:
The 1st covariance feature represents the human legs (two parallel vertical bars).
The 2nd covariance feature captures the information of the head and the human body.
Compare with Haar features:
1. The 1st Haar feature represents the human head/shoulder contour.
2. The 2nd Haar feature represents the human left leg.
Experimental Results
The proposed boosted covariance detector achieves about ten times faster detection speed than the conventional covariance detector (Tuzel et al. 2007). On a 360 × 288 pixel image, our system can process around 4 frames per second. This is the first real-time covariance-feature-based pedestrian detector.
Experimental Results
Face Detection Applications

Summary: Viola & Jones' Face Detector
Use the integral image for efficient feature extraction.
Use AdaBoost for feature selection.
Apply a cascade classifier for efficient non-face elimination.

Pros:
Fast and robust face detector.
The system can run in real time.

Cons:
The training stage is time consuming (1-2 weeks, depending on the number of training samples and the number of features used).
Requires a large number of face training samples.

Discussion:
The performance of face detection depends crucially on the features used to represent the objects.
Good features not only yield better generalization but also require a smaller training database.
Face Detection Applications: Proposed Work

Similar to the previous experiment, we apply covariance features to face detection. The differences between our work and Viola & Jones' framework:
We use covariance features.
We adopt the weighted FDA as weak classifiers.
To demonstrate the better classification capability, we trained boosted classifiers on the banana dataset, using a multidimensional decision stump and FDA as the weak classifiers.
[Plots: training and test error vs. number of weak classifiers, for multidimensional decision stumps (left, up to 1000 classifiers, error 0.1 to 0.4) and Fisher discriminant analysis (right, up to 200 classifiers, error 0 to 0.4).]
Observations / experimental results: the ROC curves show that covariance features significantly outperform Haar-like wavelet features when the training database is small. As the number of samples grows, the performance difference between the two techniques decreases.
[ROC curves on the MIT+CMU test set (250 faces): correct detection rate vs. number of false positives, comparing COV features and Haar features trained with 250 faces (left) and with 500 faces (right).]

ROC curves for our algorithm on the MIT+CMU test set.
Experimental Results
Some detection results of our face detectors trained using 250 frontal faces on MIT + CMU test images
Summary
Boosting
AdaBoost
AdaBoost for pedestrian detection using Haar features and dynamic temporal information
AdaBoost for pedestrian detection using new covariance features
Face detection using new covariance features
Questions?