Building local part models for
category-level recognition
C. Schmid, INRIA Grenoble
Joint work with G. Dorko, S. Lazebnik, J. Ponce
Introduction
• Invariant local descriptors
=> robust recognition of specific objects or scenes
• Recognition of textures and object classes
=> description of intra-class variation, selection of discriminant features, spatial relations
[Figures: texture recognition; car detection]
Overview
1. Affine-invariant texture recognition (CVPR'03)
2. A two-layer architecture for texture segmentation and recognition (ICCV'03)
3. Feature selection for object class recognition (ICCV'03)
4. Building affine-invariant part models for recognition
Affine-invariant texture recognition
• Texture recognition under viewpoint changes and non-rigid transformations
• Use of affine-invariant regions
  – invariance to viewpoint changes
  – spatial selection => more compact representation, reduction of redundancy in texton dictionary
[A sparse texture representation using affine-invariant regions,
S. Lazebnik, C. Schmid and J. Ponce, CVPR 2003]
Spatial selection
[Figures: clustering each pixel vs. clustering only selected pixels]
Overview of the approach
• Region extraction: Harris and Laplace detectors
• Descriptors: spin images
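An intensity-domain spin image is a two-dimensional histogram over (distance from the patch center, intensity), which makes it rotation-invariant by construction. A minimal sketch follows; the bin counts, the fixed 0–255 intensity range, and the hard binning are simplifying assumptions (the descriptor in the paper is computed on normalized affine regions with soft binning):

```python
import numpy as np

def spin_image(patch, n_dist=10, n_int=10):
    """Intensity-domain spin image of a square grayscale patch:
    a 2D histogram of (distance from patch center, pixel intensity)."""
    h, w = patch.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    dist = np.hypot(ys - cy, xs - cx)
    # keep only pixels inside the inscribed circle (rotation invariance)
    mask = dist <= min(cy, cx)
    hist, _, _ = np.histogram2d(dist[mask], patch[mask].astype(float),
                                bins=(n_dist, n_int),
                                range=((0, min(cy, cx)), (0, 255)))
    hist /= hist.sum() + 1e-12  # normalize to a distribution
    return hist
```

Because only (distance, intensity) pairs enter the histogram, rotating the patch leaves the descriptor unchanged.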
Signature and EMD
• Hierarchical clustering
  => signature: S = { (m1, w1), …, (mk, wk) }
• Earth mover's distance (EMD)
  – robust distance, optimizes the flow between distributions
  – can match signatures of different size
  – not sensitive to the number of clusters

  D(S, S') = [ Σ_{i,j} f_ij d(m_i, m'_j) ] / [ Σ_{i,j} f_ij ]
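The EMD between two signatures is the minimum-cost flow between their clusters, normalized by the total flow. A minimal sketch posed as the transportation linear program and solved with SciPy's `linprog` (the original work presumably used a dedicated EMD solver, e.g. Rubner's):

```python
import numpy as np
from scipy.optimize import linprog

def emd(sig1, sig2):
    """Earth mover's distance between two signatures.
    Each signature is (means, weights): means (k, d), weights (k,)."""
    m1, w1 = sig1
    m2, w2 = sig2
    n, m = len(w1), len(w2)
    # ground distances d(m_i, m'_j)
    D = np.linalg.norm(m1[:, None, :] - m2[None, :, :], axis=2)
    c = D.ravel()
    # inequality constraints: flow out of cluster i <= w_i,
    # flow into cluster j <= w'_j
    A_ub, b_ub = [], []
    for i in range(n):
        row = np.zeros((n, m)); row[i, :] = 1
        A_ub.append(row.ravel()); b_ub.append(w1[i])
    for j in range(m):
        col = np.zeros((n, m)); col[:, j] = 1
        A_ub.append(col.ravel()); b_ub.append(w2[j])
    # total flow must equal the smaller total weight
    A_eq = [np.ones(n * m)]
    b_eq = [min(w1.sum(), w2.sum())]
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=(0, None))
    flow = res.x
    return (flow * c).sum() / flow.sum()
```

The normalization by total flow is what makes the distance insensitive to the number of clusters and able to compare signatures of different sizes.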
Database with viewpoint changes
20 samples of 10 different textures
Results
[Figures: classification results for spin images vs. Gabor-like filters]
Overview
1. Affine-invariant texture recognition (CVPR'03)
2. A two-layer architecture for texture segmentation and recognition (ICCV'03)
3. Feature selection for object class recognition (ICCV'03)
4. Building affine-invariant part models for recognition
A two-layer architecture
• Texture recognition + segmentation
• Classification of individual regions + spatial layout
[A generative architecture for semi-supervised texture
recognition, S. Lazebnik, C. Schmid, J. Ponce, ICCV 2003]
A two-layer architecture
Modeling :
1. Distribution of the local descriptors (affine invariants)
   • Gaussian mixture model
   • estimation with EM, allows incorporating unsegmented images
2. Co-occurrence statistics of sub-class labels over affinely adapted neighborhoods
Segmentation + Recognition :
1. Generative model for initial class probabilities
2. Co-occurrence statistics + relaxation to improve labels
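The first modeling step above (fit a Gaussian mixture to local descriptors with EM, then read off per-descriptor sub-class posteriors as initial class probabilities) can be sketched with scikit-learn; the synthetic descriptors and component count here are stand-ins for real affine-invariant descriptors:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# stand-in for affine-invariant local descriptors (e.g. flattened spin images)
descriptors = np.vstack([rng.normal(0, 1, (200, 8)),
                         rng.normal(4, 1, (200, 8))])

# layer 1: Gaussian mixture fitted by EM; each component acts as a
# "sub-class" whose label feeds the co-occurrence layer
gmm = GaussianMixture(n_components=2, covariance_type='diag',
                      random_state=0).fit(descriptors)

# per-descriptor posterior over sub-classes -> initial class probabilities
posteriors = gmm.predict_proba(descriptors)   # shape (400, 2)
```

Because EM only needs the descriptors themselves, unsegmented training images can be folded in directly, as the slide notes.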
Texture Dataset – Training Images
T1 (brick) T2 (carpet) T3 (chair) T4 (floor 1) T5 (floor 2) T6 (marble) T7 (wood)
Effect of relaxation + co-occurrence
Original image
Top: before relaxation (individual regions); bottom: after relaxation (co-occurrence)
Recognition + Segmentation Examples
Animal Dataset – Training Images
• no manual segmentation, weakly supervised
• 10 training images per animal (with background)
• no purely negative images
Recognition + Segmentation Examples
Overview
1. Affine-invariant texture recognition (CVPR'03)
2. A two-layer architecture for texture segmentation and recognition (ICCV'03)
3. Feature selection for object class recognition (ICCV'03)
4. Building affine-invariant part models for recognition
Object class detection/classification
• Description of intra-class variations of object parts
[Selection of scale inv. regions for object class recognition,
G. Dorko and C. Schmid, ICCV’03]
Object class detection/classification
• Description of intra-class variations of object parts
• Selection of discriminant features (weakly supervised)
Training the model
• Training phase 1
  – Input: images of the object with background (positive images); no normalization or alignment of the images
  – Extraction of local descriptors: Harris-Laplace, Kadir-Brady, SIFT
  – Clustering: estimation of a Gaussian mixture with EM
Training the model
• Training phase 2 (selection)
  – Input: verification set, positive and negative images
  – Rank each cluster with likelihood (or mutual information)
  – MAP classifier with the n top clusters

Likelihood ranking of a cluster C (d_ui: descriptors from positive images, d_nj: descriptors from negative images):

  R(C) = [ Σ_i P(C | d_ui) ] / [ Σ_j P(C | d_nj) ]

[Figure: clusters selected by likelihood vs. mutual information]
Likelihood – mutual information
– likelihood: more discriminant but very specific
– mutual information: discriminant but not too specific
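Both ranking criteria can be sketched as follows, assuming soft cluster posteriors from the mixture model; the exact normalization in the paper may differ:

```python
import numpy as np

def rank_clusters(post_pos, post_neg, method="likelihood"):
    """Score each mixture component by how well it separates positive
    from negative descriptors (a stand-in for the selection step).
    post_pos: (Np, K) posteriors P(C_k | d) for descriptors from positive
    verification images; post_neg: (Nn, K) for negative images."""
    pos = post_pos.sum(axis=0)   # expected positive descriptors per cluster
    neg = post_neg.sum(axis=0)
    if method == "likelihood":
        # likelihood ratio: clusters firing mostly on positives score high
        return pos / (neg + 1e-12)
    # mutual information between "descriptor falls in cluster k" and label
    n = post_pos.shape[0] + post_neg.shape[0]
    po = np.array([post_pos.shape[0] / n, post_neg.shape[0] / n])  # P(label)
    scores = np.empty(len(pos))
    for k in range(len(pos)):
        # joint distribution over (label, in-cluster-k vs. not)
        j = np.array([[pos[k], post_pos.shape[0] - pos[k]],
                      [neg[k], post_neg.shape[0] - neg[k]]]) / n
        pc = np.array([j[:, 0].sum(), j[:, 1].sum()])   # P(cluster event)
        outer = po[:, None] * pc[None, :]
        nz = j > 0
        scores[k] = (j[nz] * np.log(j[nz] / outer[nz])).sum()
    return scores
```

The likelihood ratio rewards clusters that are specific to the object; mutual information also rewards clusters whose absence is informative, which matches the slide's observation that it is discriminant but less specific.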
Results for test images
Detection with Harris-Laplace regions:

                  all points    25 likelihood clusters       10 mutual-information clusters
Test image 1:     354 points    49 correct + 37 incorrect    31 correct + 20 incorrect
Test image 2:     277 points    43 correct + 36 incorrect    26 correct + 20 incorrect
Relaxation – propagation of probabilities
Classification
• Assign each test descriptor to the most probable cluster (MAP)
• Each descriptor assigned to one of the top n clusters is positive
• If the number of positive descriptors is above a threshold p, classify the image as positive
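The three-step decision rule above can be sketched as:

```python
import numpy as np

def classify_image(posteriors, top_clusters, p):
    """Image-level classification from descriptor posteriors.
    posteriors: (N, K) array of P(C_k | d) for the N descriptors of one
    test image; top_clusters: indices of the n selected clusters;
    p: threshold on the number of positive descriptors."""
    # MAP assignment: each descriptor goes to its most probable cluster
    map_labels = posteriors.argmax(axis=1)
    # a descriptor is "positive" if its cluster is among the top n
    n_positive = np.isin(map_labels, top_clusters).sum()
    # classify the image as positive if enough descriptors are positive
    return n_positive >= p
```

The threshold p is the free parameter tuned in the classification experiments that follow.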
Classification experiments

                        Airplanes   Motorbikes   Wild Cats
Training phase 1
  #positive images      200         200          25
Training phase 2
  #positive images      200         200          25
  #negative images      450         450          450
Testing
  #positive images      400         400          50
  #negative images      450         450          450

Training, verification and test data: http://www.robots.ox.ac.uk/~vgg/data and the Corel Image Library
Results: Motorbikes
[Figures: equal-error rate as a function of p; receiver operating characteristic at p=6]

Classification results: ROC equal error rates

                       Best           Estimated p     p=6      Fergus
                       p      %       p      %        %        %
Airplanes   Harris     8      97.5    5      97       97.25    -
            Kadir      18     97      30     96.5     96       94
Motorbikes  Harris     9      99      5      98       98.25    -
            Kadir      19     98.75   32     98.25    98       96
Wild Cats   Harris     31     94      34     92       72       -
            Kadir      17     86      45     82       84       90
Overview
1. Affine-invariant texture recognition (CVPR'03)
2. A two-layer architecture for texture segmentation and recognition (ICCV'03)
3. Feature selection for object class recognition (ICCV'03)
4. Building affine-invariant part models for recognition
Affine-invariant part models
• Matching collections of local affine-invariant regions that map with an affine transformation => part
• Matching works for unsegmented images
• Model = a collection of parts
Matching: Faces
[Figure: region matches between two face images; one spurious match]
Matching: 3D Objects
[Figures: closeup views of the matched regions]
Matching: Finding Repeated Patterns
Matching: Finding Symmetry
Modeling for Recognition
• Match multiple pairs of training images to produce several candidate parts.
• Use additional validation images to evaluate repeatability of parts and individual patches.
• Retain a fixed number of parts having the best repeatability score as class model.
• No background model
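The validation step above can be sketched as a simple ranking by repeatability; `matches` here is a hypothetical stand-in for the affine-region part matcher:

```python
def select_parts(candidate_parts, validation_images, matches, n_parts):
    """Keep the n_parts candidates with the best repeatability score.
    matches(part, image) -> True if the part is detected in the image
    (placeholder for the actual affine-invariant matching procedure)."""
    def repeatability(part):
        # fraction of validation images in which the part is re-detected
        hits = sum(bool(matches(part, img)) for img in validation_images)
        return hits / len(validation_images)
    ranked = sorted(candidate_parts, key=repeatability, reverse=True)
    return ranked[:n_parts]
```

Because only repeatability on positive validation images is scored, no background model is needed, as the slide notes.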
The Butterfly Dataset
• 16 training images (8 pairs) per class
• 10 validation images per class
• 437 test images
• 619 images total
Butterfly Models
Top two rows: pairs of images used for modeling. Bottom two rows: closeup views of some of the parts making up the models of the seven butterfly classes.
Recognition
• Top 10 models per class used for recognition
• Multi-class classification results:
[Table: per-class classification rates; total model size (smallest/largest)]
Classification Rate vs. Number of Parts
Successful Detection Examples
[Figure legend – Model part. Yellow: detected in test image; blue: occluded in test image. Test image shown with all ellipses and with matched ellipses only.]
Note: only one of the two training images is shown
Successful Detection Examples (cont.)
Detection of Multiple Instances
Detection Failures
Future Work
• Spatial relations
  – non-rigid models
  – relations between clusters and affine-invariant parts
• Feature selection: dimensionality reduction
• Shape information: appropriate descriptors
• Rapid search: structuring of the data