Kullback-Leibler Boosting
Ce Liu, Heung-Yeung Shum
Microsoft Research Asia
A General Two-Layer Classifier

Input → Intermediate → Output

I(x) = sign( Σ_{i=1}^n λ_i φ_i(ψ_i(x)) + c )

• ψ_i(·): R^d → R (projection function)
• φ_i(·): R → R (discriminating function)
• I(·): R^d → {−1, +1} (identification function)
• {λ_i} (coefficients)
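As a sketch of this two-layer structure (the function and variable names here are ours, not from the slides), the classifier can be written directly from the formula above:

```python
import numpy as np

def two_layer_classify(x, projections, discriminators, coeffs, c=0.0):
    """Two-layer classifier: I(x) = sign(sum_i lambda_i * phi_i(psi_i(x)) + c)."""
    s = sum(lam * phi(psi(x))
            for lam, phi, psi in zip(coeffs, discriminators, projections))
    return 1 if s + c >= 0 else -1

# One linear projection (psi) followed by tanh (phi), with lambda = 1
w = np.array([1.0, -1.0])
clf = lambda x: two_layer_classify(
    x,
    projections=[lambda v: float(w @ v)],
    discriminators=[np.tanh],
    coeffs=[1.0],
)
```

Any number of (projection, discriminator, coefficient) triples can be stacked this way; the framework questions on the next slide are about how to choose them.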
Issues under the Two-Layer Framework

• How to choose the type of projection function?
• How to choose the type of discriminating function?
• How to learn the parameters from samples?
Projection function:
• ψ(x) = w^T x (linear)
• ψ(x) = ||x|| (norm)

Discriminating function:
• φ(z) = tanh(z)
• φ(z) = 1 / (1 + e^{−z}) (sigmoid)
• φ(z) = exp{−z²} (RBF)
• φ(z) = b_n z^n + … + b_1 z + b_0 (polynomial)

I(x) = sign( Σ_{i=1}^n λ_i φ_i(ψ_i(x)) + c )
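The discriminating-function families listed above can be written as plain scalar callables; this is a small illustrative sketch, not code from the paper:

```python
import numpy as np

def sigmoid(z):        # phi(z) = 1 / (1 + e^{-z})
    return 1.0 / (1.0 + np.exp(-z))

def rbf(z):            # phi(z) = exp{-z^2}
    return np.exp(-z ** 2)

def polynomial(z, b):  # phi(z) = b_n z^n + ... + b_1 z + b_0
    return np.polyval(b, z)

# tanh(z) is available directly as np.tanh
```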
Our Proposal

• How to choose the type of projection function? → Kullback-Leibler linear feature
• How to choose the type of discriminating function? → Histogram divergence
• How to learn the parameters from samples? → Sample re-weighting (boosting)

⇒ Kullback-Leibler Boosting (KLBoosting)
Intuitions

• Linear projection is robust and easy to compute
• The histograms of the two classes along a projection are evidence for classification
  - The linear feature on which the two class histograms differ most should be selected
• If the weight distribution of the sample set changes, the histograms change as well
  - Increase the weights of misclassified samples, and decrease the weights of correctly classified samples
Linear Projections and Histograms

ψ(x) = w^T x

Samples {x_i} ~ f(x), each with weight W(x_i)

H_w(z) = Σ_i W(x_i) δ(z − w^T x_i)
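The weighted histogram H_w(z) above, discretized into bins, can be sketched as follows (`projected_histogram` is our name; the face-detection experiments later in the talk use 300 bins):

```python
import numpy as np

def projected_histogram(X, weights, w, bins=300, value_range=None):
    """H_w(z) = sum_i W(x_i) * delta(z - w^T x_i), binned and normalized."""
    z = X @ w                 # project every sample onto the feature w
    H, edges = np.histogram(z, bins=bins, range=value_range, weights=weights)
    return H / H.sum(), edges
```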
KLBoosting (1)

At the k-th iteration:

• Kullback-Leibler feature: the projection maximizing the symmetric KL divergence between the two class histograms,

  KL(w) = ∫ [h⁺(z) − h⁻(z)] log( h⁺(z) / h⁻(z) ) dz,    w_k* = argmax_w KL(w)

• Discriminating function:

  φ_k(x) = (1/2) log[ h⁺_k(w_k^T x) / h⁻_k(w_k^T x) ]

• Reweighting (Z_k⁺ and Z_k⁻ normalize the weights of each class):

  W_k⁺(x_i) = (1/Z_k⁺) W_{k−1}⁺(x_i) exp{ −α_k φ_k(x_i) }
  W_k⁻(x_i) = (1/Z_k⁻) W_{k−1}⁻(x_i) exp{ +α_k φ_k(x_i) }

so that correctly classified samples lose weight and misclassified ones gain weight.
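A sketch of selecting the KL feature: the slides use MCMC to search in low dimensions, whereas the exhaustive search over a finite candidate set below (and the names `symmetric_kl`, `best_kl_feature`) is our simplification:

```python
import numpy as np

def symmetric_kl(h_pos, h_neg, eps=1e-12):
    """Discrete symmetric KL: sum_z [h+(z) - h-(z)] * log(h+(z) / h-(z))."""
    p, q = h_pos + eps, h_neg + eps
    return float(np.sum((p - q) * np.log(p / q)))

def best_kl_feature(candidates, X_pos, X_neg, bins=30):
    """w* = argmax_w KL(w), searched over a finite candidate set."""
    best_w, best_kl = None, -np.inf
    for w in candidates:
        z = np.concatenate([X_pos @ w, X_neg @ w])
        shared = (z.min(), z.max())   # shared bin range for both classes
        hp, _ = np.histogram(X_pos @ w, bins=bins, range=shared)
        hn, _ = np.histogram(X_neg @ w, bins=bins, range=shared)
        kl = symmetric_kl(hp / hp.sum(), hn / hn.sum())
        if kl > best_kl:
            best_w, best_kl = w, kl
    return best_w, best_kl
```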
KLBoosting (2)

Two types of parameters to learn:
• KL features {w_k}
• Combination coefficients {α_j}

Learning a KL feature in low dimensions: MCMC

Learning the combining coefficients to minimize training error:

  {α_j}* = argmin_{α_j} (1/N) Σ_{i=1}^N 1[ y_i ≠ I(x_i) ]

• Optimization: brute-force search
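The brute-force search for {α_j} can be sketched as a per-coordinate grid sweep over the training error; the grid range and sweep count here are our choices, not from the slides:

```python
import numpy as np

def training_error(alphas, phi_vals, y):
    """Error = (1/N) * sum_i 1[ y_i != sign(sum_j alpha_j * phi_j(x_i)) ]."""
    scores = phi_vals @ alphas           # phi_vals[i, j] = phi_j(x_i)
    pred = np.where(scores >= 0, 1, -1)
    return np.mean(pred != y)

def brute_force_alphas(phi_vals, y, grid=np.linspace(0.0, 2.0, 21), sweeps=3):
    """Greedy per-coordinate grid search minimizing training error."""
    k = phi_vals.shape[1]
    alphas = np.ones(k)
    for _ in range(sweeps):
        for j in range(k):
            errs = []
            for a in grid:
                trial = alphas.copy()
                trial[j] = a
                errs.append(training_error(trial, phi_vals, y))
            alphas[j] = grid[int(np.argmin(errs))]
    return alphas
```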
Flowchart

Input: (x_1, y_1), …, (x_N, y_N), x ∈ R^d, y ∈ {−1, +1}, with labels {y_i} and sample weights {W(x_i)}

1. Initialize weights W_0⁺, W_0⁻
2. Learn the KL feature w_k
3. Update the weights W_k⁺, W_k⁻
4. Recognition error small enough? If not (N), return to step 2; if so (Y), continue
5. Output the classifier

  I(x) = sign[ Σ_{i=1}^k α_i log( h⁺_i(w_i^T x) / h⁻_i(w_i^T x) ) ]
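The loop body above (class histograms → log-ratio discriminating function → reweighting) can be sketched for a single fixed feature w; the uniform α = 1 here is a placeholder, since the slides optimize the coefficients separately:

```python
import numpy as np

def klboost_step(X, y, weights, w, bins=30, eps=1e-12):
    """One KLBoosting iteration for a fixed feature w (a sketch)."""
    z = X @ w
    edges = np.linspace(z.min(), z.max() + eps, bins + 1)
    hp, _ = np.histogram(z[y == 1], bins=edges, weights=weights[y == 1])
    hn, _ = np.histogram(z[y == -1], bins=edges, weights=weights[y == -1])
    hp = hp / hp.sum() + eps          # weighted class histograms h+, h-
    hn = hn / hn.sum() + eps
    idx = np.clip(np.searchsorted(edges, z, side="right") - 1, 0, bins - 1)
    phi = 0.5 * np.log(hp[idx] / hn[idx])   # discriminating function phi_k
    alpha = 1.0                              # placeholder; alpha is optimized
    new_w = weights * np.exp(-alpha * y * phi)
    return phi, new_w / new_w.sum()
```

Correctly classified samples (y and φ with the same sign) see their weights shrink; misclassified ones see them grow, matching the intuition slide.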
A Simple Example

(Figure: KL features, histograms, and the decision manifold on a toy problem.)
A Complicated Case

(Figure: error rate (0%-40%) vs. feature number (1-32), comparing the training and testing error rates of KLBoosting and AdaBoost.)
Kullback-Leibler Analysis (KLA)

• Finding the KL feature in image space is a challenging task
• Sequential 1D optimization:
  - Construct a feature bank
  - Build a set of the most promising features
  - Sequentially do 1D optimization along the promising features
• Conjecture: the global optimum of an objective function can be reached by searching along as many linear features as needed
Intuition of Sequential 1D Optimization

(Figure: the feature bank, the promising feature set, the result of sequential 1D optimization, and the MCMC feature.)
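The sequential 1D search above can be sketched in generic form; the quadratic test objective, the line-search grid, and the sweep count are our illustrative choices:

```python
import numpy as np

def sequential_1d_optimize(objective, x0, directions,
                           grid=np.linspace(-1.0, 1.0, 41), sweeps=2):
    """Repeatedly line-search along each promising feature direction in turn."""
    x = np.asarray(x0, dtype=float)
    for _ in range(sweeps):
        for d in directions:
            vals = [objective(x + t * d) for t in grid]
            x = x + grid[int(np.argmax(vals))] * d
    return x
```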
Optimization in Image Space

• An image is a random field, not a pure random variable
• The local statistics can be captured by wavelets:
  - 111 × 400 small-scale wavelets for the whole 20×20 patch
  - 80 × 100 large-scale wavelets for the inner 10×10 patch
  - 52,400 wavelets in total compose the feature bank
  - 2,800 most promising wavelets selected
• Gaussian-family wavelets and Haar wavelets form the feature bank
• Compose the KL feature by sequential 1D optimization
Data-Driven KLA

• Face patterns and non-face patterns
• Feature bank (111 wavelets); promising feature set (2,800 features in total)
• On each position of the 20×20 lattice, compute the histograms of the 111 wavelets and the KL divergences between face and non-face images
• Large-scale wavelets capture the global statistics on the 10×10 inner lattice
Comparison with Other Features

• MCMC feature: KL = 3.246
• Best Haar wavelet: KL = 2.944
• KL feature: KL = 10.967
Application: Face Detection

Experimental setup:
• 20×20 patch to represent a face
• 17,520 frontal faces
• 1,339,856,947 non-faces from 2,484 images
• 300 bins in the histogram representation

A cascade of KLBoosting classifiers:
• In each classifier, keep the false negative rate < 0.01% and the false alarm rate < 35%
• 22 classifiers in total form the cascade (450 features)
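The cascade logic itself is simple: a patch is accepted only if every stage accepts it, so most non-faces are rejected after a few cheap stages. `cascade_detect` and the toy threshold stages below are our illustration, not the detector's actual classifiers:

```python
def cascade_detect(patch, stages):
    """A cascade of classifiers: declare a face only if all stages agree."""
    for stage in stages:
        if stage(patch) != 1:
            return -1   # rejected at this stage: non-face
    return 1            # survived all stages: face

# Toy example: two threshold "stages" on a scalar score
stages = [lambda s: 1 if s > 0 else -1,
          lambda s: 1 if s > 5 else -1]
```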
KL Features of the Face Detector

(Figure: face and non-face patterns; the first 10 KL features and some later KL features, showing global semantics, frequency filters, and local features.)
ROC Curve

(Figure: correct detection rate (0.65-1.00) vs. false alarm rate (10⁻⁷ to 3.16×10⁻⁵, log scale) for KLBoosting, AdaBoost, and a neural network.)
Some Detection Results
Comparison with AdaBoost

(Figure: false alarm rate (10⁻⁵ to 1, log scale) vs. number of features (0-400) for KLBoosting and AdaBoost.)
Compared with AdaBoost

• Base classifier
  - KLBoosting: KL feature + histogram divergence
  - AdaBoost: selected from experience
• Combining coefficients
  - KLBoosting: globally optimized to minimize training error
  - AdaBoost: empirically set to be incrementally optimal
Summary

• KLBoosting is an optimal classifier
  - Projection function: linear projection
  - Discriminating function: histogram divergence
  - Coefficients: optimized by minimizing training error
• KLA: a data-driven approach to pursue KL features
• Applications in face detection
Thank you!
Harry Shum
Microsoft Research Asia
hshum@microsoft.com
Compared with SVM

• Support vectors
  - KLBoosting: KL features learnt to optimize KL divergence (a few)
  - SVM: selected from training samples (many)
• Kernel function
  - KLBoosting: histogram divergence (flexible)
  - SVM: selected from experience (fixed)