Robust Real Time
Face Detection
P. Viola & Michael Jones
Presented By: Matan Protter
Visual Recognition, Spring 2005, Technion
Face Detection – What & Why?
• Given an image:
• Ultimately: Find all faces, while making no mistakes.
• Practically: Find most faces, while making few mistakes.
• What for?
• Mainly: face / people recognition.
• What is a face?
Standard testing set: MIT + CMU
Face Detection Methods
• Too many to count…
• Look for meaningful features:
• Eyes, nose, ears, chin line, etc.
• Train a detector
• A combination of a number of weighted weak classifiers
• Differ by:
• Feature set
• Training set
• Training method
• Etc.
The Proposed Method
• Another learning-based detector
• Novel ideas though:
• Detector structure designed to run quickly – “Cascade”
• Feature set
• Haar-like
• Feature selection method (training method)
• Modified boosting
• Also – a very extensive testing set
• Plus a way to automatically generate negative examples
• All will be explained in the following slides
Detector Structure – Cascade
• Detectors usually scan every window in the image, at every scale.
• Conventional detectors run all weak classifiers on all windows.
• Some windows can be discarded very quickly.
• Therefore, computation time is wasted.
• Solution:
• Construct a sequential set of tests.
• Only windows that pass a test move to the next.
• The rest are discarded and ignored.
• Windows that survive all tests are declared faces.
• Each test – more computationally expensive than the one before, but also more discriminative.
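The control flow of the cascade can be sketched as below; this is an illustrative skeleton (the names `cascade_classify` and `levels` are not from the paper), with simple predicates standing in for the real per-level tests:

```python
def cascade_classify(window, levels):
    """Run a window through the cascade: each level is a test; a window
    is declared a face only if it passes every test in sequence."""
    for classify in levels:
        if not classify(window):
            return False  # rejected early -- most windows stop here
    return True  # survived all tests

# Toy usage: "windows" are ints, tested by increasingly specific predicates.
levels = [lambda w: w > 0, lambda w: w % 2 == 0, lambda w: w > 100]
print([cascade_classify(w, levels) for w in (-1, 4, 102)])  # [False, False, True]
```

The speed-up comes from the early `return False`: the cheap first tests discard the bulk of the windows, so only a small fraction ever reaches the expensive later levels.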
Cascade – Graphically
[Diagram: all windows, at all scales, enter Test 1; windows that pass each test continue to the next (Test 2, …, last test), while windows that fail any test are rejected as background. Moving right through the tests, both the computational price and the discrimination ability increase.]
Training: Building The Cascade
• Training is done off-line, as a pre-processing step.
• Takes a lot of time
• Non-parallel version took weeks.
• Parallel version took a day.
• Who cares? – done once!
• Data sets:
• Positive examples: about 5000 faces (each a 24×24-pixel window), not fully aligned.
• Negative examples: about 10000 images that contain no faces (random crawl through Google).
• Independent validation set.
Training: Building The Cascade (cont.)
• Each level is trained separately, in sequential order.
• Training set:
• Positive examples : all input faces.
• Negative examples: first 5000 detections found by running the cascade up to that level on non-face image set.
• Each level is trained based on its predecessors’ errors!
[Diagram: training level #N draws its positive examples from all faces, and its negative examples from the mis-detections of detector levels 1 to (N−1) on the non-face image set.]
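The bootstrapping of negative examples described above can be sketched as follows; names are illustrative, levels are stand-in predicates, and ints stand in for image sub-windows:

```python
def bootstrap_negatives(partial_cascade, non_face_windows, n_needed=5000):
    """Collect negatives for the next level: windows that contain no face
    but still pass every level trained so far (the predecessors' errors)."""
    hard_negatives = []
    for window in non_face_windows:
        if all(level(window) for level in partial_cascade):
            hard_negatives.append(window)
            if len(hard_negatives) == n_needed:
                break
    return hard_negatives

# Toy usage: the partial cascade accepts even "windows" greater than 10.
cascade_so_far = [lambda w: w > 10, lambda w: w % 2 == 0]
print(bootstrap_negatives(cascade_so_far, range(100), n_needed=3))  # [12, 14, 16]
```

This is why each level trains on its predecessors' errors: the easy negatives were already rejected upstream, so level N sees only the hard ones.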
Training: Building The Cascade (cont.)
• Setting Goals – Entire Cascade
• Total Probability of Detection (PD)
• Total False Alarm Rate (FAR)
• Higher PD – more complicated levels
• Lower FAR – more levels
• Setting Goals – Each Level
• Can derive PD & FAR for each level from total detector’s PD & FAR.
• Trade Offs:
• PD , FAR , number of features , number of levels (running time)
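If one assumes, for illustration, that the levels act independently and share the same per-level rates, the totals are simply products: D_total = d^K and F_total = f^K for K levels. A quick numeric sketch (the specific numbers are illustrative, not from the paper):

```python
K = 10            # number of cascade levels (illustrative)
d_level = 0.99    # per-level probability of detection
f_level = 0.30    # per-level false-alarm rate

D_total = d_level ** K   # overall PD
F_total = f_level ** K   # overall FAR

print(f"PD  = {D_total:.3f}")   # ~0.904
print(f"FAR = {F_total:.1e}")   # ~5.9e-06
```

This illustrates the trade-off: each level only needs a modest FAR (here 30%) as long as its PD is very high, because the FARs multiply down quickly while the PDs erode slowly.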
Training: One Level
• Each level is made up of a combination of weak classifiers.
• Each classifier i:
  • Is made up of:
    • A function f_i to run on the window (the feature value)
    • A threshold level T_i
    • A polarity P_i ∈ {−1, 1} (faces are above the threshold or below it)
  • Returns a 0/1 answer: R_i = 1 if P_i · f_i(Win) > P_i · T_i, else R_i = 0
  • Is assigned a weight W_i
• The level gives an answer: Ans = 1 iff Σ_i W_i · R_i ≥ Threshold_level
• Decreasing the threshold results in a higher PD and a higher FAR.
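The weak classifier and the level's weighted vote can be sketched directly from the formulas above (all numeric values in the usage example are made up for illustration):

```python
def weak_classify(f_value, threshold, polarity):
    """One weak classifier: 1 if polarity * feature value exceeds
    polarity * threshold, else 0 (polarity is +1 or -1)."""
    return 1 if polarity * f_value > polarity * threshold else 0

def level_answer(f_values, thresholds, polarities, weights, level_threshold):
    """Weighted vote of the level's weak classifiers: face iff the
    weighted sum of 0/1 answers reaches the level threshold."""
    score = sum(w * weak_classify(f, t, p)
                for f, t, p, w in zip(f_values, thresholds, polarities, weights))
    return score >= level_threshold

# Toy usage: three weak classifiers with weights 2, 1, 1 on one window.
print(level_answer([5.0, -3.0, 0.5],   # feature values
                   [4.0, -2.0, 1.0],   # thresholds
                   [+1, -1, +1],       # polarities
                   [2.0, 1.0, 1.0],    # weights
                   level_threshold=2.5))  # True (score = 2 + 1 + 0 = 3)
```

Raising `level_threshold` here would flip the answer to False, which is the PD/FAR trade-off mentioned above.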
Training: One Level – Selecting Classifiers
1. Normalize the examples' (training set) weights.
2. Select the weak classifier that minimizes the error (the sum of the weights of the misclassified examples).
3. Decrease the weights of the correctly classified examples.
4. Run on the validation set; determine the level's threshold such that the desired PD is met.
5. Check if the desired FAR is also met:
   • Yes – finish the level (stopping early saves time).
   • No – add another (optimal) weak classifier and repeat from step 1.
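One round of this selection loop can be sketched as below. The names and toy data are illustrative; the weight update multiplies correctly classified examples by β = ε / (1 − ε), in the spirit of the AdaBoost variant used here:

```python
def boosting_round(examples, labels, weights, candidates):
    """One round of the selection loop: normalize weights, pick the weak
    classifier with minimal weighted error, then shrink the weights of
    correctly classified examples by beta = eps / (1 - eps)."""
    total = sum(weights)
    weights = [w / total for w in weights]          # 1. normalize
    best, best_err = None, float("inf")
    for h in candidates:                            # 2. minimize weighted error
        err = sum(w for x, y, w in zip(examples, labels, weights) if h(x) != y)
        if err < best_err:
            best, best_err = h, err
    beta = best_err / (1.0 - best_err)
    weights = [w * beta if best(x) == y else w      # 3. down-weight correct
               for x, y, w in zip(examples, labels, weights)]
    return best, best_err, weights

# Toy usage: labels alternate, so no threshold stump is perfect.
xs, ys = [1, 2, 3, 4], [0, 1, 0, 1]
stumps = [lambda x, t=t: int(x >= t) for t in (1, 2, 3, 4)]
h, err, new_w = boosting_round(xs, ys, [1.0] * 4, stumps)
print(err)  # 0.25: the best stump misclassifies one of four equal-weight examples
```

After the round, the one misclassified example keeps its weight while the others shrink, so the next round focuses on the current classifier's mistake.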
Classifiers - Description
• Each rectangle represents the sum of pixels inside it.
• Different configurations of rectangles.
• All possible sizes & locations of rectangles.
• Rectangles are summed either as positive (blue) or negative (yellow).
Two Rectangle Horizontal Feature
Two Rectangle Vertical Feature
Three Rectangle Horizontal Feature
Three Rectangle Vertical Feature
Four Rectangle Feature
• Each classifier is the sum of 2, 3 or 4 adjacent rectangles of the same size.
Classifiers – Cont.
• For a 24 by 24 pixel window – over 160,000 possible classifiers (all types, all locations, all sizes).
• Requires the selection process to be efficient
• Example: The first two classifiers selected
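The "over 160,000" figure can be reproduced by brute-force enumeration. The exact total depends on the enumeration convention; counting every position and size of the five shapes above (each shape given by its base size in unit rectangles) in a 24×24 window gives 162,336:

```python
def count_features(W=24, H=24, shapes=((2, 1), (1, 2), (3, 1), (1, 3), (2, 2))):
    """Count all positions and sizes of the five Haar-like shapes in a
    W x H window; (sx, sy) is a shape's base size in unit rectangles."""
    total = 0
    for sx, sy in shapes:
        for w in range(sx, W + 1, sx):        # widths: multiples of sx
            for h in range(sy, H + 1, sy):    # heights: multiples of sy
                total += (W - w + 1) * (H - h + 1)
    return total

print(count_features())  # 162336 with this convention
```

The sheer size of this pool is exactly why the selection process has to be efficient: boosting scans all of them at every round.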
Classifiers - Explanation
• They are reminiscent of Haar basis functions.
• Can also be given an intuitive explanation:
• Two-rectangle features evaluate a first derivative (an edge detector).
• Three-rectangle features are similar to a second derivative; also a line detector.
• Four-rectangle features evaluate the derivative in XY.
• OK, so what’s new?
Classifiers – Very Efficient
• Computationally efficient
  – Using the Integral Image (I_I)
  – The Integral Image is, formally: II(i, j) = Σ_{i'=0..i} Σ_{j'=0..j} I(i', j')
  – And literally – the sum of the pixels up to that point.
  – Can be computed in one pass over the image.
  – The features can be computed in 6–9 matrix references.
• Multi-scale efficient
  – The detector can be scaled, instead of the image.
  – Ensures better efficiency than any method that requires image pyramids.
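A minimal sketch of the integral image and the four-reference box sum (function names are illustrative; plain lists stand in for an image array):

```python
def integral_image(img):
    """II[i][j] = sum of img[0..i][0..j], built in one pass over the image."""
    H, W = len(img), len(img[0])
    ii = [[0] * W for _ in range(H)]
    for i in range(H):
        row = 0
        for j in range(W):
            row += img[i][j]                          # running sum of this row
            ii[i][j] = row + (ii[i - 1][j] if i > 0 else 0)
    return ii

def box_sum(ii, top, left, bottom, right):
    """Sum of any rectangle using only 4 references into the integral image."""
    a = ii[bottom][right]
    b = ii[top - 1][right] if top > 0 else 0
    c = ii[bottom][left - 1] if left > 0 else 0
    d = ii[top - 1][left - 1] if top > 0 and left > 0 else 0
    return a - b - c + d

img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
ii = integral_image(img)
print(box_sum(ii, 1, 1, 2, 2))  # 5 + 6 + 8 + 9 = 28
```

Since each rectangle costs 4 references and adjacent rectangles share corners, a two-rectangle feature needs 6 references and a four-rectangle feature needs 9, matching the 6–9 figure above.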
Results
• And speed?
– 15 frames per second (each frame 388×244 pixels) on a P3 at 800 MHz.
– Today, real time can probably be achieved.
– Training takes weeks (unless parallelized).
• Compared To Others
Summary
• Haar-like features
• Modified Boosting
• Acceptable Results
• Speedy Results
• Cascade: [diagram repeated from the "Cascade – Graphically" slide]
Improvements
• A more extensive feature set.
• Using the cascade idea with different tests.
• More efficient learning (training currently takes weeks).