Boosting: Algorithms and Applications
Lecture 11, ENGN 4522/6520, Statistical Pattern Recognition and Its Applications in Computer Vision
ANU 2nd Semester, 2008
Chunhua Shen, NICTA/RSISE
Boosting

Definition of boosting: boosting refers to the general problem of producing a very accurate prediction rule by combining rough and moderately inaccurate rules of thumb.

Boosting procedure: given a set of labeled training examples, on each round
- the booster devises a distribution (importance weights) over the example set;
- the booster requests a weak hypothesis/classifier/learner with low weighted error.

Upon termination, the booster combines the weak hypotheses into a single prediction rule.
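The loop above can be sketched in a few lines (a minimal AdaBoost-style illustration, not the lecture's exact formulation; the threshold-stump weak learner below is an assumption for demonstration):

```python
import numpy as np

def boost(X, y, weak_learn, rounds):
    """Generic boosting loop: maintain a distribution over the examples,
    request a weak hypothesis, reweight, and combine."""
    n = len(y)
    w = np.full(n, 1.0 / n)             # initial distribution over examples
    ensemble = []                       # (alpha, hypothesis) pairs
    for _ in range(rounds):
        h = weak_learn(X, y, w)         # weak hypothesis for this round
        pred = h(X)
        eps = np.sum(w[pred != y])      # weighted error
        if eps >= 0.5:                  # not even weakly useful: stop
            break
        alpha = 0.5 * np.log((1 - eps) / max(eps, 1e-12))
        w = w * np.exp(-alpha * y * pred)   # up-weight the mistakes
        w = w / w.sum()
        ensemble.append((alpha, h))
        if eps == 0.0:                  # perfect hypothesis: done
            break
    return lambda Xq: np.sign(sum(a * h(Xq) for a, h in ensemble))

def stump_learner(X, y, w):
    """A toy weak learner: the best single-feature threshold stump."""
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for s in (1.0, -1.0):
                pred = s * np.sign(X[:, j] - t + 1e-12)
                err = np.sum(w[pred != y])
                if best is None or err < best[0]:
                    best = (err, j, t, s)
    _, j, t, s = best
    return lambda Xq, j=j, t=t, s=s: s * np.sign(Xq[:, j] - t + 1e-12)
```

For example, on the separable one-dimensional set X = [[0.1], [0.2], [0.4], [0.6], [0.9]] with labels [-1, -1, -1, 1, 1], `boost(X, y, stump_learner, 5)` recovers a perfect classifier.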
Boosting (Freund & Schapire, 1997)
Boosting: 1st Iteration
Boosting: Update Distribution
Boosting as Entropy Projection

Each distribution update can be viewed as minimizing the relative entropy to the last distribution, subject to linear constraints.
Boosting: 2nd Hypothesis
Boosting: 3rd Hypothesis
Boosting: 4th Hypothesis
All hypotheses
AdaBoost
Properties of AdaBoost
AdaBoost adapts to the errors of the weak hypotheses returned by the weak learner. Unlike earlier boosting algorithms, a prior bound on the weak learner's error need not be known ahead of time. The update rule reduces the probability assigned to those examples on which the hypothesis makes good predictions and increases the probability of the examples on which the prediction is poor.
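Numerically, with AdaBoost's standard exponential update (the four-example dataset below is hypothetical, chosen only to make the arithmetic transparent):

```python
import numpy as np

# Four examples with uniform initial weights; the hypothetical weak
# hypothesis h misclassifies only example 1.
y = np.array([+1.0, +1.0, -1.0, -1.0])
h = np.array([+1.0, -1.0, -1.0, -1.0])
w = np.full(4, 0.25)

eps = w[h != y].sum()                    # weighted error = 0.25
alpha = 0.5 * np.log((1 - eps) / eps)    # weight of this hypothesis
w = w * np.exp(-alpha * y * h)           # shrink correct, grow incorrect
w = w / w.sum()                          # renormalize to a distribution

# The misclassified example now carries half the total weight (0.5,
# versus 1/6 for each correct example), and the same hypothesis has
# weighted error exactly 1/2 on the new distribution, so it is no
# longer informative on the next round.
```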
Multi-class Extensions
The previous discussion is restricted to binary classification problems, but training data can have any number of labels, giving a multi-class problem. The multi-class extension (AdaBoost.M1) requires the accuracy of each weak hypothesis to be greater than 1/2. This condition is stronger in the multi-class case than in the binary classification case.
Detecting Pedestrians Using Patterns of Motion and Appearance

Paul Viola, Michael J. Jones, Daniel Snow, IEEE ICCV
The System
A pedestrian detection system using image intensity information and motion information, with the detectors trained by AdaBoost. This was the first approach to combine both appearance and motion information in a single detector. Advantages:
- high efficiency;
- high detection rate and low false positive rate.
Rectangle Filters
Rectangle filters measure the difference between region averages at various scales, orientations and aspect ratios. Each filter alone carries limited information, however, and needs to be boosted to perform accurate classification.
Motion Information
Information about the direction of motion can be extracted from the difference between the image at time t and shifted versions of the image at time t + 1. The motion filters (direction, shear, magnitude) operate on five difference images:
Δ = abs(I_t − I_{t+1})
U = abs(I_t − I_{t+1}↑)
D = abs(I_t − I_{t+1}↓)
L = abs(I_t − I_{t+1}←)
R = abs(I_t − I_{t+1}→)

where ↑, ↓, ← and → denote the image shifted by one pixel in the corresponding direction.
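A sketch of the five difference images (zero-padded borders are an assumption of this sketch; the paper does not depend on that detail):

```python
import numpy as np

def shift(img, dy, dx):
    """Shift an image by (dy, dx) pixels, zero-padding the border."""
    h, w = img.shape
    out = np.zeros_like(img)
    out[max(dy, 0):h + min(dy, 0), max(dx, 0):w + min(dx, 0)] = \
        img[max(-dy, 0):h + min(-dy, 0), max(-dx, 0):w + min(-dx, 0)]
    return out

def motion_images(I_t, I_t1):
    """The five difference images used by the motion filters."""
    delta = np.abs(I_t - I_t1)              # raw frame difference
    U = np.abs(I_t - shift(I_t1, -1, 0))    # I_{t+1} shifted up
    D = np.abs(I_t - shift(I_t1, 1, 0))     # shifted down
    L = np.abs(I_t - shift(I_t1, 0, -1))    # shifted left
    R = np.abs(I_t - shift(I_t1, 0, 1))     # shifted right
    return delta, U, D, L, R
```

For a bright pixel moving one pixel to the right between frames, L is zero in the interior while Δ is not: the smallest of the shifted differences indicates the direction of motion.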
An Example
Appearance Filter
Appearance filters are rectangle filters that operate on the first input image:

f_m = φ(I_t)
Integral Image
The integral image at location (x, y) contains the sum of the pixels above and to the left of (x, y), inclusive:

ii(x, y) = Σ_{x′ ≤ x, y′ ≤ y} i(x′, y′)

where ii(x, y) is the integral image and i(x, y) is the original image. It can be computed in a single pass with the pair of recurrences

s(x, y) = s(x, y − 1) + i(x, y)
ii(x, y) = ii(x − 1, y) + s(x, y)

where s(x, y) is the cumulative row sum.
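The recurrences translate directly into code (a straightforward sketch; vectorized cumulative sums would do the same job):

```python
import numpy as np

def integral_image(i):
    """Compute ii from i using the two recurrences:
    s(x, y) = s(x, y-1) + i(x, y);  ii(x, y) = ii(x-1, y) + s(x, y)."""
    h, w = i.shape
    s = np.zeros((h, w))    # cumulative row sums
    ii = np.zeros((h, w))   # integral image
    for x in range(h):
        for y in range(w):
            s[x, y] = (s[x, y - 1] if y > 0 else 0.0) + i[x, y]
            ii[x, y] = (ii[x - 1, y] if x > 0 else 0.0) + s[x, y]
    return ii

def rect_sum(ii, r0, c0, r1, c1):
    """Sum of the original image over the inclusive rectangle
    [r0..r1] x [c0..c1], using only four corner lookups."""
    total = ii[r1, c1]
    if r0 > 0:
        total -= ii[r0 - 1, c1]
    if c0 > 0:
        total -= ii[r1, c0 - 1]
    if r0 > 0 and c0 > 0:
        total += ii[r0 - 1, c0 - 1]
    return total
```

`rect_sum` is what makes every rectangle filter cost a constant number of memory accesses, regardless of the filter's size.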
Training Filters
The rectangle filters can have any size, aspect ratio or position as long as they fit in the detection window; there is therefore a very large number of possible motion and appearance filters, from which a learning algorithm selects a small subset to build classifiers.
Training Process
The training process uses AdaBoost to select a subset of features F that minimizes the weighted error and to construct the classifier. In each round the learning algorithm chooses a filter from the pool of motion and appearance filters, and also picks the optimal threshold t for each feature as well as the linear weights. The output of AdaBoost is a linear combination of the selected features.
Training Process
A cascade architecture is used to raise the efficiency of the system. The true and false positives that pass the current stage are used in the next stage of the cascade. The goal is to reduce the false positive rate faster than the detection rate.
Strong Classifier

A strong classifier is a weighted vote of weak classifiers. With four weak classifiers carrying weights 0.9, 0.7, 0.5 and 0.3 and a decision threshold of 1.0:

- if weak classifiers 1, 2 and 4 fire: 0.9 + 0.7 + 0.3 = 1.9 > 1.0 (threshold), so the window is accepted;
- if only weak classifiers 3 and 4 fire: 0.5 + 0.3 = 0.8 < 1.0 (threshold), so the window is rejected.

Overview of the Cascaded Structure

Classifier 1 → Classifier 2 → … (each stage passes its surviving windows on to the next).
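The weighted vote and the cascade can be sketched as follows (the numbers mirror the slide's example; the stage functions in the usage are hypothetical stand-ins for trained stages):

```python
def strong_classify(weights, fired, threshold=1.0):
    """Weighted vote: sum the weights of the weak classifiers that fire
    and compare the total against the stage threshold."""
    return sum(w for w, f in zip(weights, fired) if f) > threshold

def cascade(stages, window):
    """A window must pass every stage, so most windows are rejected
    cheaply by the first few stages."""
    return all(stage(window) for stage in stages)

weights = [0.9, 0.7, 0.5, 0.3]
# classifiers 1, 2 and 4 fire: 0.9 + 0.7 + 0.3 = 1.9 > 1.0 -> accept
# only classifiers 3 and 4 fire: 0.5 + 0.3 = 0.8 < 1.0 -> reject
```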
Experiments
Each classifier in the cascade is trained using the original positive examples and the same number of false positives from the previous stage (or negative examples, for the first stage). The classifier from the previous stage filters the input of the current stage, which builds a new classifier with a lower false positive rate. The detection threshold is set using a validation set of image pairs.
Training samples
A small sample of positive training examples: a pair of image patterns comprises a single training example.
Training the cascade
A large number of motion and appearance filters is used for training the dynamic pedestrian detector; a smaller number of appearance filters is used for training the static pedestrian detector.
Training results
The first five filters learned for the dynamic pedestrian detector. The six images used in the motion and appearance representation are shown for each filter
The first five filters learned for the static pedestrian detector
Testing
Detection results of the dynamic detector
Testing
Detection results of the static detector
Pedestrian Detection Using Boosting and Covariance Features
Sakrapee Paisitkriangkrai, Chunhua Shen, and Jian Zhang, IEEE T-CSVT
Covariance Features

The image is divided into small overlapping regions. Each pixel in a region is converted to an eight-dimensional feature vector:
F(x, y) = [ x, y, |I_X|, |I_Y|, √(I_X² + I_Y²), |I_XX|, |I_YY|, tan⁻¹(|I_Y| / |I_X|) ]

The covariance matrix is calculated from these vectors using

cov(X, Y) = E[(X − μ_X)(Y − μ_Y)] = (1 / (n − 1)) [ Σ_k X_k Y_k − (1/n) (Σ_k X_k)(Σ_k Y_k) ]

To improve the calculation time, a technique employing integral images is applied: we compute integral images of Σ_k X_k, Σ_k Y_k and Σ_k X_k Y_k.
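A sketch of the integral-image trick for region covariances (the array layout and function names are my own; in the slide's descriptor, d = 8):

```python
import numpy as np

def region_covariance_fn(features):
    """features: H x W x d per-pixel feature vectors. Returns a function
    that gives the d x d covariance of any rectangle in constant time,
    from integral images of sum(z) and sum(z z^T)."""
    P = features.cumsum(axis=0).cumsum(axis=1)   # integral image of z
    Q = np.einsum('hwi,hwj->hwij', features, features) \
          .cumsum(axis=0).cumsum(axis=1)         # integral image of z z^T

    def lookup(ii, r0, c0, r1, c1):
        """Four-corner lookup of a rectangle sum in an integral image."""
        t = ii[r1, c1].copy()
        if r0 > 0:
            t -= ii[r0 - 1, c1]
        if c0 > 0:
            t -= ii[r1, c0 - 1]
        if r0 > 0 and c0 > 0:
            t += ii[r0 - 1, c0 - 1]
        return t

    def cov(r0, c0, r1, c1):                     # inclusive rectangle
        n = (r1 - r0 + 1) * (c1 - c0 + 1)
        p = lookup(P, r0, c0, r1, c1)            # sum of z over the region
        q = lookup(Q, r0, c0, r1, c1)            # sum of z z^T
        return (q - np.outer(p, p) / n) / (n - 1)

    return cov
```

The result matches an explicit covariance over the region's pixels, but any number of rectangles can then be evaluated at constant cost each.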
Experimental Results
Figure: feature comparison, detection rate (0.8–1.0) versus false positive rate (0–0.01), for six classifiers: COV with RBF SVM (g = 0.01), HoG with RBF SVM (g = 0.01), HoG with quadratic SVM, COV with quadratic SVM, HoG with linear SVM, COV with linear SVM.
Remarks

Although covariance features with a non-linear SVM outperform many state-of-the-art techniques, the approach has the following disadvantages:
- the block size used in the SVM is fixed (7×7 pixels), so it cannot capture human body parts with other rectangular shapes, e.g. limbs and torso;
- the parameter tuning process for the SVM is rather tedious;
- non-linear SVMs have a high computation time.

We therefore build a new, simpler pedestrian detector using:
- covariance features;
- AdaBoost with weighted Fisher linear discriminant analysis (WLDA) based weak classifiers;
- a cascaded structure.
Linear Discriminant Analysis (LDA)

Motivation: project the data onto a line (R^n → R^1) such that the patterns become well separated (in a least-squares sense). Two-dimensional example.
Best separation between two classes
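A minimal Fisher LDA sketch (two-class and unweighted; the weighted variant used as a weak classifier later reweights the means and the scatter matrix in the same formula):

```python
import numpy as np

def fisher_direction(X1, X2):
    """Fisher linear discriminant: w ∝ Sw^{-1}(m1 - m2), the projection
    direction (R^n -> R^1) maximizing between-class separation relative
    to within-class scatter."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    Sw = np.cov(X1, rowvar=False) * (len(X1) - 1) \
       + np.cov(X2, rowvar=False) * (len(X2) - 1)   # within-class scatter
    w = np.linalg.solve(Sw, m1 - m2)
    return w / np.linalg.norm(w)
```

Projecting each sample onto w yields one-dimensional scores in which the two classes are well separated, which is what makes the projection usable as a weak classifier after thresholding.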
Covariance Features with LDA

Observation: it is possible to achieve a 5% test error rate using either 25 covariance features or 100 Haar-like features.
The region covariance descriptor collects these statistics in a symmetric matrix:

C = [ var(x)         cov(x, y)   …   cov(x, |I_YY|)
      …              …           …   …
      cov(|I_YY|, x) …           …   var(|I_YY|)   ]
We combine covariance features with LDA and compare them against Haar-like features.
Combine multi-dimensional covariance features with weighted LDA.

The new features are trained in the AdaBoost framework for faster speed and higher accuracy. Multiple-layer boosting with heterogeneous features is applied in a cascaded structure.
Architecture
1. Start from the training dataset and a complete set of rectangular filters (weak classifiers).
2. Calculate the region covariance matrix and stack the upper triangle of the matrix into a vector (R^n).
3. Apply the weighted Fisher linear discriminant (R^n → R^1).
4. AdaBoost selects the best weak learner with respect to the weighted error.
5. Update the sample weights.
6. Test the predefined objective (hit rate 99.5%, false positive rate 50%): if it is not met (F), return to step 2; if it is met (T), output the strong classifier.

Architecture of the pedestrian detection system using boosted covariance features.
Observations – covariance features:
- The combined covariance features represent distinct parts of the human body.
- The 1st covariance feature represents the human legs (two parallel vertical bars).
- The 2nd covariance feature captures the information of the head and the human body.

Compared with Haar features:
1. The 1st Haar feature represents the human head/shoulder contour.
2. The 2nd Haar feature represents the human left leg.
Experimental Results
The proposed boosted covariance detector achieves about ten times faster detection speed than the conventional covariance detector (Tuzel et al. 2007). On a 360 × 288 pixel image, our system processes around 4 frames per second. This is the first real-time covariance-feature-based pedestrian detector.
Face Detection Applications

Summary: Viola & Jones' face detector
- uses the integral image for efficient feature extraction;
- uses AdaBoost for feature selection;
- applies a cascade classifier for efficient elimination of non-faces.

Pros: a fast and robust face detector; the system runs in real time.

Cons: the training stage is time consuming (1–2 weeks, depending on the number of training samples and the number of features used), and a large number of face training samples is required.

Discussion: the performance of face detection depends crucially on the features used to represent the objects. Good features not only yield better generalization but also require a smaller training database.
Proposed Work

Similar to the previous experiment, we apply covariance features to face detection.

The differences between our work and Viola & Jones' framework: we use covariance features, and we adopt the weighted FDA as the weak classifier.

To show the better classification capability, we trained a boosted classifier on the banana dataset with a multidimensional decision stump and FDA as weak classifiers.
Figure: training and test error versus the number of weak classifiers, for the multidimensional decision stump (up to 1000 weak classifiers, error roughly 0.1–0.4) and for Fisher discriminant analysis (up to 200 weak classifiers, error roughly 0–0.4).
Observations / Experimental Results

ROC curves show that covariance features significantly outperform Haar-like wavelet features when the training database is small. As the number of samples grows, the performance difference between the two techniques decreases.
Figure: ROC curves for our algorithm on the MIT+CMU test set, plotting correct detection rate (0.65–0.9) against the number of false positives (0–400), for detectors trained with 250 faces and with 500 faces (COV features versus Haar features).
Experimental Results
Some detection results of our face detector trained using 250 frontal faces, on MIT+CMU test images.
Summary
- Boosting
- AdaBoost
- AdaBoost for pedestrian detection using Haar features and dynamic temporal information
- AdaBoost for pedestrian detection using new covariance features
- Face detection using new covariance features
Questions?