Robust Real Time
Face Detection
P. Viola & Michael Jones
Presented By: Matan Protter
Visual Recognition, Spring 2005, Technion
Face Detection – What & Why?
• Given an image:
• Ultimately: Find all faces, while making no mistakes.
• Practically: Find most faces, while making few mistakes.
• What for?
• Mainly: face / people recognition.
• What is a face?
Standard testing set: MIT + CMU
Face Detection Methods
• Too many to count…
• Look for meaningful features:
• Eyes, nose, ears, chin line, etc.
• Train a detector
• A combination of a number of weighted weak classifiers
• Differ by:
• Feature set
• Training set
• Training method
• Etc.
The Proposed Method
• Another learning-based detector
• Novel ideas though:
• Detector structure designed to run quickly – “Cascade”
• Feature set
• Haar-like
• Feature selection method (training method)
• Modified boosting
• Also – a very extensive testing set
• Plus a way to automatically generate negative examples
• All will be explained in the following slides
Detector Structure – Cascade
• Detectors usually scan every window in the image, at every scale.
• Conventional detectors run all weak classifiers on all windows.
• Some windows can be discarded very quickly.
• Therefore, computation time is wasted.
• Solution:
• Construct a sequential set of tests.
• Only windows that pass a test move to the next.
• The rest are discarded and ignored.
• Windows that survive all tests are declared faces.
• Each test – more computationally expensive than the one before, but also more discriminative.
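The control flow of the cascade can be sketched as below; this is an illustrative skeleton (the names `cascade_classify` and `levels` are not from the paper), with simple predicates standing in for the real per-level tests:

```python
def cascade_classify(window, levels):
    """Run a window through the cascade: each level is a test; a window
    is declared a face only if it passes every test in sequence."""
    for classify in levels:
        if not classify(window):
            return False  # rejected early -- most windows stop here
    return True  # survived all tests

# Toy usage: "windows" are ints, tested by increasingly specific predicates.
levels = [lambda w: w > 0, lambda w: w % 2 == 0, lambda w: w > 100]
print([cascade_classify(w, levels) for w in (-1, 4, 102)])  # [False, False, True]
```

The speed-up comes from the early `return False`: the cheap first tests discard the bulk of the windows, so only a small fraction ever reaches the expensive later levels.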
Cascade – Graphically
[Diagram: all windows, at all scales, enter Test 1; windows that pass each test continue to the next (Test 2, …, last test), while windows that fail any test are rejected as background. Moving right through the tests, both the computational price and the discrimination ability increase.]
Training: Building The Cascade
• Training is done off-line, as a pre-processing step.
• Takes a lot of time
• Non-parallel version took weeks.
• Parallel version took a day.
• Who cares? – done once!
• Data sets:
• Positive examples: about 5000 faces (each a 24×24-pixel window), not fully aligned.
• Negative examples: about 10000 images that contain no faces (random crawl through Google).
• Independent validation set.
Training: Building The Cascade (cont.)
• Each level is trained separately, in sequential order.
• Training set:
• Positive examples : all input faces.
• Negative examples: first 5000 detections found by running the cascade up to that level on non-face image set.
• Each level is trained based on its predecessors’ errors!
[Diagram: training level #N draws its positive examples from all faces, and its negative examples from the mis-detections of detector levels 1 to (N−1) on the non-face image set.]
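The bootstrapping of negative examples described above can be sketched as follows; names are illustrative, levels are stand-in predicates, and ints stand in for image sub-windows:

```python
def bootstrap_negatives(partial_cascade, non_face_windows, n_needed=5000):
    """Collect negatives for the next level: windows that contain no face
    but still pass every level trained so far (the predecessors' errors)."""
    hard_negatives = []
    for window in non_face_windows:
        if all(level(window) for level in partial_cascade):
            hard_negatives.append(window)
            if len(hard_negatives) == n_needed:
                break
    return hard_negatives

# Toy usage: the partial cascade accepts even "windows" greater than 10.
cascade_so_far = [lambda w: w > 10, lambda w: w % 2 == 0]
print(bootstrap_negatives(cascade_so_far, range(100), n_needed=3))  # [12, 14, 16]
```

This is why each level trains on its predecessors' errors: the easy negatives were already rejected upstream, so level N sees only the hard ones.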
Training: Building The Cascade (cont.)
• Setting Goals – Entire Cascade
• Total Probability of Detection (PD)
• Total False Alarm Rate (FAR)
• Higher PD – more complicated levels
• Lower FAR – more levels
• Setting Goals – Each Level
• Can derive PD & FAR for each level from total detector’s PD & FAR.
• Trade Offs:
• PD , FAR , number of features , number of levels (running time)
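If one assumes, for illustration, that the levels act independently and share the same per-level rates, the totals are simply products: D_total = d^K and F_total = f^K for K levels. A quick numeric sketch (the specific numbers are illustrative, not from the paper):

```python
K = 10            # number of cascade levels (illustrative)
d_level = 0.99    # per-level probability of detection
f_level = 0.30    # per-level false-alarm rate

D_total = d_level ** K   # overall PD
F_total = f_level ** K   # overall FAR

print(f"PD  = {D_total:.3f}")   # ~0.904
print(f"FAR = {F_total:.1e}")   # ~5.9e-06
```

This illustrates the trade-off: each level only needs a modest FAR (here 30%) as long as its PD is very high, because the FARs multiply down quickly while the PDs erode slowly.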
Training: One Level
• Each level is made up of a combination of weak classifiers.
• Each classifier i:
  • Is made up of:
    • A function f_i to run on the window (the feature value)
    • A threshold level T_i
    • A polarity P_i ∈ {−1, 1} (faces are above the threshold or below it)
  • Returns a 0/1 answer: R_i = 1 if P_i · f_i(Win) > P_i · T_i, else R_i = 0
  • Is assigned a weight W_i
• The level gives an answer: Ans = 1 iff Σ_i W_i · R_i ≥ Threshold_level
• Decreasing the threshold results in a higher PD and a higher FAR.
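The weak classifier and the level's weighted vote can be sketched directly from the formulas above (all numeric values in the usage example are made up for illustration):

```python
def weak_classify(f_value, threshold, polarity):
    """One weak classifier: 1 if polarity * feature value exceeds
    polarity * threshold, else 0 (polarity is +1 or -1)."""
    return 1 if polarity * f_value > polarity * threshold else 0

def level_answer(f_values, thresholds, polarities, weights, level_threshold):
    """Weighted vote of the level's weak classifiers: face iff the
    weighted sum of 0/1 answers reaches the level threshold."""
    score = sum(w * weak_classify(f, t, p)
                for f, t, p, w in zip(f_values, thresholds, polarities, weights))
    return score >= level_threshold

# Toy usage: three weak classifiers with weights 2, 1, 1 on one window.
print(level_answer([5.0, -3.0, 0.5],   # feature values
                   [4.0, -2.0, 1.0],   # thresholds
                   [+1, -1, +1],       # polarities
                   [2.0, 1.0, 1.0],    # weights
                   level_threshold=2.5))  # True (score = 2 + 1 + 0 = 3)
```

Raising `level_threshold` here would flip the answer to False, which is the PD/FAR trade-off mentioned above.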
Training: One Level – Selecting Classifiers
1. Normalize the examples' (training set) weights.
2. Select the weak classifier that minimizes the error (the sum of the weights of the misclassified examples).
3. Decrease the weights of the correctly classified examples.
4. Run on the validation set; determine the level's threshold such that the desired PD is met.
5. Check if the desired FAR is also met:
   • Yes – finish the level (stopping early saves time).
   • No – add another (optimal) weak classifier and repeat from step 1.
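One round of this selection loop can be sketched as below. The names and toy data are illustrative; the weight update multiplies correctly classified examples by β = ε / (1 − ε), in the spirit of the AdaBoost variant used here:

```python
def boosting_round(examples, labels, weights, candidates):
    """One round of the selection loop: normalize weights, pick the weak
    classifier with minimal weighted error, then shrink the weights of
    correctly classified examples by beta = eps / (1 - eps)."""
    total = sum(weights)
    weights = [w / total for w in weights]          # 1. normalize
    best, best_err = None, float("inf")
    for h in candidates:                            # 2. minimize weighted error
        err = sum(w for x, y, w in zip(examples, labels, weights) if h(x) != y)
        if err < best_err:
            best, best_err = h, err
    beta = best_err / (1.0 - best_err)
    weights = [w * beta if best(x) == y else w      # 3. down-weight correct
               for x, y, w in zip(examples, labels, weights)]
    return best, best_err, weights

# Toy usage: labels alternate, so no threshold stump is perfect.
xs, ys = [1, 2, 3, 4], [0, 1, 0, 1]
stumps = [lambda x, t=t: int(x >= t) for t in (1, 2, 3, 4)]
h, err, new_w = boosting_round(xs, ys, [1.0] * 4, stumps)
print(err)  # 0.25: the best stump misclassifies one of four equal-weight examples
```

After the round, the one misclassified example keeps its weight while the others shrink, so the next round focuses on the current classifier's mistake.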
Classifiers - Description
• Each rectangle represents the sum of pixels inside it.
• Different configurations of rectangles.
• All possible sizes & locations of rectangles.
• Rectangles are summed either as positive (blue) or negative (yellow).
Two Rectangle Horizontal Feature
Two Rectangle Vertical Feature
Three Rectangle Horizontal Feature
Three Rectangle Vertical Feature
Four Rectangle Feature
• Each classifier is the sum of 2, 3 or 4 adjacent rectangles of the same size.
Classifiers – Cont.
• For a 24 by 24 pixel window – over 160,000 possible classifiers (all types, all locations, all sizes).
• Requires the selection process to be efficient
• Example: The first two classifiers selected
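The "over 160,000" figure can be reproduced by brute-force enumeration. The exact total depends on the enumeration convention; counting every position and size of the five shapes above (each shape given by its base size in unit rectangles) in a 24×24 window gives 162,336:

```python
def count_features(W=24, H=24, shapes=((2, 1), (1, 2), (3, 1), (1, 3), (2, 2))):
    """Count all positions and sizes of the five Haar-like shapes in a
    W x H window; (sx, sy) is a shape's base size in unit rectangles."""
    total = 0
    for sx, sy in shapes:
        for w in range(sx, W + 1, sx):        # widths: multiples of sx
            for h in range(sy, H + 1, sy):    # heights: multiples of sy
                total += (W - w + 1) * (H - h + 1)
    return total

print(count_features())  # 162336 with this convention
```

The sheer size of this pool is exactly why the selection process has to be efficient: boosting scans all of them at every round.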
Classifiers - Explanation
• They are reminiscent of Haar basis functions.
• Can also be given an intuitive explanation:
• Two-rectangle features evaluate a first derivative (an edge detector).
• Three-rectangle features are similar to a second derivative; also a line detector.
• Four-rectangle features evaluate the derivative in XY.
• OK, so what’s new?
Classifiers – Very Efficient
• Computationally efficient
  – Using the Integral Image (I_I)
  – The Integral Image is, formally: II(i, j) = Σ_{i'=0..i} Σ_{j'=0..j} I(i', j')
  – And literally – the sum of the pixels up to that point.
  – Can be computed in one pass over the image.
  – The features can be computed in 6–9 matrix references.
• Multi-scale efficient
  – The detector can be scaled, instead of the image.
  – Ensures better efficiency than any method that requires image pyramids.
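A minimal sketch of the integral image and the four-reference box sum (function names are illustrative; plain lists stand in for an image array):

```python
def integral_image(img):
    """II[i][j] = sum of img[0..i][0..j], built in one pass over the image."""
    H, W = len(img), len(img[0])
    ii = [[0] * W for _ in range(H)]
    for i in range(H):
        row = 0
        for j in range(W):
            row += img[i][j]                          # running sum of this row
            ii[i][j] = row + (ii[i - 1][j] if i > 0 else 0)
    return ii

def box_sum(ii, top, left, bottom, right):
    """Sum of any rectangle using only 4 references into the integral image."""
    a = ii[bottom][right]
    b = ii[top - 1][right] if top > 0 else 0
    c = ii[bottom][left - 1] if left > 0 else 0
    d = ii[top - 1][left - 1] if top > 0 and left > 0 else 0
    return a - b - c + d

img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
ii = integral_image(img)
print(box_sum(ii, 1, 1, 2, 2))  # 5 + 6 + 8 + 9 = 28
```

Since each rectangle costs 4 references and adjacent rectangles share corners, a two-rectangle feature needs 6 references and a four-rectangle feature needs 9, matching the 6–9 figure above.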
Results
• And speed?
– 15 frames per second (each frame 388×244 pixels) on a P3 at 800 MHz.
– Today, real time can probably be achieved.
– Training takes weeks (unless parallelized).
• Compared To Others
Summary
• Haar-like features
• Modified Boosting
• Acceptable Results
• Speedy Results
• Cascade: [diagram repeated from the "Cascade – Graphically" slide]
Improvements
• A more extensive feature set.
• Using the cascade idea with different tests.
• More efficient learning (training currently takes weeks).