Building local part models for
category-level recognition
C. Schmid, INRIA Grenoble
Joint work with G. Dorko, S. Lazebnik, J. Ponce
Introduction
• Invariant local descriptors
=> robust recognition of specific objects or scenes
• Recognition of textures and object classes
=> description of intra-class variation, selection of discriminant features, spatial relations
[Figures: texture recognition; car detection]
Overview
1. Affine-invariant texture recognition (CVPR'03)
2. A two-layer architecture for texture segmentation and recognition (ICCV'03)
3. Feature selection for object class recognition (ICCV'03)
4. Building affine-invariant part models for recognition
Affine-invariant texture recognition
• Texture recognition under viewpoint changes and non-rigid transformations
• Use of affine-invariant regions
  – invariance to viewpoint changes
  – spatial selection => more compact representation, reduction of redundancy in texton dictionary
[A sparse texture representation using affine-invariant regions,
S. Lazebnik, C. Schmid and J. Ponce, CVPR 2003]
Spatial selection
[Figures: clustering each pixel vs. clustering only selected pixels]
Overview of the approach
• Region extraction: Harris and Laplace detectors
• Descriptors: spin images
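An intensity-domain spin image is a two-dimensional histogram over (distance from the patch center, intensity), which makes it rotation-invariant by construction. A minimal sketch follows; the bin counts, the fixed 0–255 intensity range, and the hard binning are simplifying assumptions (the descriptor in the paper is computed on normalized affine regions with soft binning):

```python
import numpy as np

def spin_image(patch, n_dist=10, n_int=10):
    """Intensity-domain spin image of a square grayscale patch:
    a 2D histogram of (distance from patch center, pixel intensity)."""
    h, w = patch.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    dist = np.hypot(ys - cy, xs - cx)
    # keep only pixels inside the inscribed circle (rotation invariance)
    mask = dist <= min(cy, cx)
    hist, _, _ = np.histogram2d(dist[mask], patch[mask].astype(float),
                                bins=(n_dist, n_int),
                                range=((0, min(cy, cx)), (0, 255)))
    hist /= hist.sum() + 1e-12  # normalize to a distribution
    return hist
```

Because only (distance, intensity) pairs enter the histogram, rotating the patch leaves the descriptor unchanged.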
Signature and EMD
• Hierarchical clustering
  => signature: S = { (m1, w1), …, (mk, wk) }
• Earth mover's distance (EMD)
  – robust distance, optimizes the flow between distributions
  – can match signatures of different size
  – not sensitive to the number of clusters

  D(S, S') = [ Σ_{i,j} f_ij d(m_i, m'_j) ] / [ Σ_{i,j} f_ij ]
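The EMD between two signatures is the minimum-cost flow between their clusters, normalized by the total flow. A minimal sketch posed as the transportation linear program and solved with SciPy's `linprog` (the original work presumably used a dedicated EMD solver, e.g. Rubner's):

```python
import numpy as np
from scipy.optimize import linprog

def emd(sig1, sig2):
    """Earth mover's distance between two signatures.
    Each signature is (means, weights): means (k, d), weights (k,)."""
    m1, w1 = sig1
    m2, w2 = sig2
    n, m = len(w1), len(w2)
    # ground distances d(m_i, m'_j)
    D = np.linalg.norm(m1[:, None, :] - m2[None, :, :], axis=2)
    c = D.ravel()
    # inequality constraints: flow out of cluster i <= w_i,
    # flow into cluster j <= w'_j
    A_ub, b_ub = [], []
    for i in range(n):
        row = np.zeros((n, m)); row[i, :] = 1
        A_ub.append(row.ravel()); b_ub.append(w1[i])
    for j in range(m):
        col = np.zeros((n, m)); col[:, j] = 1
        A_ub.append(col.ravel()); b_ub.append(w2[j])
    # total flow must equal the smaller total weight
    A_eq = [np.ones(n * m)]
    b_eq = [min(w1.sum(), w2.sum())]
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=(0, None))
    flow = res.x
    return (flow * c).sum() / flow.sum()
```

The normalization by total flow is what makes the distance insensitive to the number of clusters and able to compare signatures of different sizes.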
Database with viewpoint changes
20 samples of 10 different textures
Results
[Figures: classification results for spin images vs. Gabor-like filters]
Overview
1. Affine-invariant texture recognition (CVPR'03)
2. A two-layer architecture for texture segmentation and recognition (ICCV'03)
3. Feature selection for object class recognition (ICCV'03)
4. Building affine-invariant part models for recognition
A two-layer architecture
• Texture recognition + segmentation
• Classification of individual regions + spatial layout
[A generative architecture for semi-supervised texture
recognition, S. Lazebnik, C. Schmid, J. Ponce, ICCV 2003]
A two-layer architecture
Modeling :
1. Distribution of the local descriptors (affine invariants)
   • Gaussian mixture model
   • estimation with EM, allows incorporating unsegmented images
2. Co-occurrence statistics of sub-class labels over affinely adapted neighborhoods
Segmentation + Recognition :
1. Generative model for initial class probabilities
2. Co-occurrence statistics + relaxation to improve labels
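The first modeling step above (fit a Gaussian mixture to local descriptors with EM, then read off per-descriptor sub-class posteriors as initial class probabilities) can be sketched with scikit-learn; the synthetic descriptors and component count here are stand-ins for real affine-invariant descriptors:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# stand-in for affine-invariant local descriptors (e.g. flattened spin images)
descriptors = np.vstack([rng.normal(0, 1, (200, 8)),
                         rng.normal(4, 1, (200, 8))])

# layer 1: Gaussian mixture fitted by EM; each component acts as a
# "sub-class" whose label feeds the co-occurrence layer
gmm = GaussianMixture(n_components=2, covariance_type='diag',
                      random_state=0).fit(descriptors)

# per-descriptor posterior over sub-classes -> initial class probabilities
posteriors = gmm.predict_proba(descriptors)   # shape (400, 2)
```

Because EM only needs the descriptors themselves, unsegmented training images can be folded in directly, as the slide notes.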
Texture Dataset – Training Images
T1 (brick) T2 (carpet) T3 (chair) T4 (floor 1) T5 (floor 2) T6 (marble) T7 (wood)
Effect of relaxation + co-occurrence
Original image
Top: before relaxation (individual regions); bottom: after relaxation (co-occurrence)
Recognition + Segmentation Examples
Animal Dataset – Training Images
• no manual segmentation, weakly supervised
• 10 training images per animal (with background)
• no purely negative images
Recognition + Segmentation Examples
Overview
1. Affine-invariant texture recognition (CVPR'03)
2. A two-layer architecture for texture segmentation and recognition (ICCV'03)
3. Feature selection for object class recognition (ICCV'03)
4. Building affine-invariant part models for recognition
Object class detection/classification
• Description of intra-class variations of object parts
[Selection of scale inv. regions for object class recognition,
G. Dorko and C. Schmid, ICCV’03]
Object class detection/classification
• Description of intra-class variations of object parts
• Selection of discriminant features (weakly supervised)
Training the model
• Training phase 1
  – Input: images of the object with background (positive images); no normalization or alignment of the images
  – Extraction of local descriptors: Harris-Laplace, Kadir-Brady, SIFT
  – Clustering: estimation of a Gaussian mixture with EM
Training the model
• Training phase 2 (selection)
  – Input: verification set, positive and negative images
  – Rank each cluster with likelihood (or mutual information)
  – MAP classifier with the n top clusters

Likelihood ranking of a cluster C (d_ui: descriptors from positive images, d_nj: descriptors from negative images):

  R(C) = [ Σ_i P(C | d_ui) ] / [ Σ_j P(C | d_nj) ]

[Figure: clusters selected by likelihood vs. mutual information]
Likelihood – mutual information
– likelihood: more discriminant but very specific
– mutual information: discriminant but not too specific
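Both ranking criteria can be sketched as follows, assuming soft cluster posteriors from the mixture model; the exact normalization in the paper may differ:

```python
import numpy as np

def rank_clusters(post_pos, post_neg, method="likelihood"):
    """Score each mixture component by how well it separates positive
    from negative descriptors (a stand-in for the selection step).
    post_pos: (Np, K) posteriors P(C_k | d) for descriptors from positive
    verification images; post_neg: (Nn, K) for negative images."""
    pos = post_pos.sum(axis=0)   # expected positive descriptors per cluster
    neg = post_neg.sum(axis=0)
    if method == "likelihood":
        # likelihood ratio: clusters firing mostly on positives score high
        return pos / (neg + 1e-12)
    # mutual information between "descriptor falls in cluster k" and label
    n = post_pos.shape[0] + post_neg.shape[0]
    po = np.array([post_pos.shape[0] / n, post_neg.shape[0] / n])  # P(label)
    scores = np.empty(len(pos))
    for k in range(len(pos)):
        # joint distribution over (label, in-cluster-k vs. not)
        j = np.array([[pos[k], post_pos.shape[0] - pos[k]],
                      [neg[k], post_neg.shape[0] - neg[k]]]) / n
        pc = np.array([j[:, 0].sum(), j[:, 1].sum()])   # P(cluster event)
        outer = po[:, None] * pc[None, :]
        nz = j > 0
        scores[k] = (j[nz] * np.log(j[nz] / outer[nz])).sum()
    return scores
```

The likelihood ratio rewards clusters that are specific to the object; mutual information also rewards clusters whose absence is informative, which matches the slide's observation that it is discriminant but less specific.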
Results for test images
Detection with Harris-Laplace regions:

                  all points    25 likelihood clusters       10 mutual-information clusters
Test image 1:     354 points    49 correct + 37 incorrect    31 correct + 20 incorrect
Test image 2:     277 points    43 correct + 36 incorrect    26 correct + 20 incorrect
Relaxation – propagation of probabilities
Classification
• Assign each test descriptor to the most probable cluster (MAP)
• Each descriptor assigned to one of the top n clusters is positive
• If the number of positive descriptors is above a threshold p, classify the image as positive
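The three-step decision rule above can be sketched as:

```python
import numpy as np

def classify_image(posteriors, top_clusters, p):
    """Image-level classification from descriptor posteriors.
    posteriors: (N, K) array of P(C_k | d) for the N descriptors of one
    test image; top_clusters: indices of the n selected clusters;
    p: threshold on the number of positive descriptors."""
    # MAP assignment: each descriptor goes to its most probable cluster
    map_labels = posteriors.argmax(axis=1)
    # a descriptor is "positive" if its cluster is among the top n
    n_positive = np.isin(map_labels, top_clusters).sum()
    # classify the image as positive if enough descriptors are positive
    return n_positive >= p
```

The threshold p is the free parameter tuned in the classification experiments that follow.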
Classification experiments

                        Airplanes   Motorbikes   Wild Cats
Training phase 1
  #positive images      200         200          25
Training phase 2
  #positive images      200         200          25
  #negative images      450         450          450
Testing
  #positive images      400         400          50
  #negative images      450         450          450

Training, verification and test data: http://www.robots.ox.ac.uk/~vgg/data and the Corel Image Library
Results: Motorbikes
[Figures: equal-error rate as a function of p; receiver operating characteristic at p=6]

Classification results: ROC equal error rates

                       Best           Estimated p     p=6      Fergus
                       p      %       p      %        %        %
Airplanes   Harris     8      97.5    5      97       97.25    -
            Kadir      18     97      30     96.5     96       94
Motorbikes  Harris     9      99      5      98       98.25    -
            Kadir      19     98.75   32     98.25    98       96
Wild Cats   Harris     31     94      34     92       72       -
            Kadir      17     86      45     82       84       90
Overview
1. Affine-invariant texture recognition (CVPR'03)
2. A two-layer architecture for texture segmentation and recognition (ICCV'03)
3. Feature selection for object class recognition (ICCV'03)
4. Building affine-invariant part models for recognition
Affine-invariant part models
• Matching collections of local affine-invariant regions that map with an affine transformation => part
• Matching works for unsegmented images
• Model = a collection of parts
Matching: Faces
[Figure: region matches between two face images; one spurious match]
Matching: 3D Objects
[Figures: closeup views of the matched regions]
Matching: Finding Repeated Patterns
Matching: Finding Symmetry
Modeling for Recognition
• Match multiple pairs of training images to produce several candidate parts.
• Use additional validation images to evaluate repeatability of parts and individual patches.
• Retain a fixed number of parts having the best repeatability score as class model.
• No background model
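The validation step above can be sketched as a simple ranking by repeatability; `matches` here is a hypothetical stand-in for the affine-region part matcher:

```python
def select_parts(candidate_parts, validation_images, matches, n_parts):
    """Keep the n_parts candidates with the best repeatability score.
    matches(part, image) -> True if the part is detected in the image
    (placeholder for the actual affine-invariant matching procedure)."""
    def repeatability(part):
        # fraction of validation images in which the part is re-detected
        hits = sum(bool(matches(part, img)) for img in validation_images)
        return hits / len(validation_images)
    ranked = sorted(candidate_parts, key=repeatability, reverse=True)
    return ranked[:n_parts]
```

Because only repeatability on positive validation images is scored, no background model is needed, as the slide notes.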
The Butterfly Dataset
• 16 training images (8 pairs) per class
• 10 validation images per class
• 437 test images
• 619 images total
Butterfly Models
Top two rows: pairs of images used for modeling. Bottom two rows: closeup views of some of the parts making up the models of the seven butterfly classes.
Recognition
• Top 10 models per class used for recognition
• Multi-class classification results:
[Table: per-class classification rates; total model size (smallest/largest)]
Classification Rate vs. Number of Parts
Successful Detection Examples
[Figure legend – Model part. Yellow: detected in test image; blue: occluded in test image. Test image shown with all ellipses and with matched ellipses only.]
Note: only one of the two training images is shown
Successful Detection Examples (cont.)
Detection of Multiple Instances
Detection Failures
Future Work
• Spatial relations
  – non-rigid models
  – relations between clusters and affine-invariant parts
• Feature selection: dimensionality reduction
• Shape information: appropriate descriptors
• Rapid search: structuring of the data