“POOF: Part Based One-vs-One Features for Fine Grained Categorization, Face Verification, and Attribute Estimation”
Thomas Berg and Peter Belhumeur
CVPR 2013
VGG Reading Group 4.7.2013
Eric Sommerlade
Summary
• A POOF is a scalar feature defined for a pair of classes, a pair of landmarks (a feature part and an alignment part), and a choice of base feature (e.g. HOG or colour histogram), on a discriminative region learned from data
• Perks:
– regions automatically learned from the data set
– great performance
– transfers knowledge in from external datasets
Motivation:
Standard approach to part based recognition:
- extract standard feature (SIFT, HOG, LBP)
- train classifier
- relevant regions tuned by hand
Idea: “standard” features are hardly optimal for a specific problem.
The “best” feature depends on:
- domain (dog features != bird features)
- task (face recognition != gender classification)
POOF feature learning:
• From dataset with landmark annotations
POOF feature learning:
• Choose feature part f
• Choose alignment part a
• Align and crop to 128x64 region
• Larger/smaller distance between the two landmarks -> coarser/finer effective scale of the crop
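The alignment step above can be sketched as a two-point similarity fit. A minimal sketch: the canonical target positions for f and a inside the 128x64 crop are not given on the slide, so the ones in the usage example are an assumption, and the function names are mine.

```python
def similarity_from_two_points(p1, p2, q1, q2):
    """The unique 2D similarity (rotation + uniform scale + translation)
    mapping p1 -> q1 and p2 -> q2, written in complex form z -> a*z + b."""
    p1, p2, q1, q2 = (complex(x, y) for x, y in (p1, p2, q1, q2))
    a = (q2 - q1) / (p2 - p1)   # encodes rotation and scale
    b = q1 - a * p1             # translation
    return a, b

def apply_similarity(a, b, pt):
    """Apply the transform to a 2D point given as an (x, y) tuple."""
    z = a * complex(*pt) + b
    return (z.real, z.imag)
```

For example, mapping the feature part at (10, 20) and alignment part at (30, 20) to assumed canonical spots (32, 32) and (96, 32) fixes the warp for the whole crop.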
POOF feature learning:
• Scales: 8x8 and 16x16
8x8 cells: 16*8 = 128; 16x16 cells: 8*4 = 32 -> 160 cells total
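The cell count works out directly from the 128x64 crop and the two tiling scales:

```python
# 128x64 aligned crop tiled at the two scales from the slide
W, H = 128, 64
cells_8x8   = (W // 8)  * (H // 8)     # 16 * 8 = 128 cells
cells_16x16 = (W // 16) * (H // 16)    # 8 * 4  = 32 cells
total_cells = cells_8x8 + cells_16x16  # 160, as on the slide
```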
POOF feature learning:
• Per cell:
– 8-bin gradient direction histogram, Dg=8 (‘gradhist’)
– or Felzenszwalb HOG: Dg=31
– colour histogram: Dc=32
• Concatenated length: (Dg+Dc)*160
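The ‘gradhist’ base feature above can be sketched as a plain per-cell orientation histogram. This is a minimal stand-in (no blocks, no normalisation, unsigned orientations), not the authors’ exact implementation:

```python
import numpy as np

def gradhist_cells(gray, cell=8, nbins=8):
    """Per-cell 8-bin gradient-orientation histogram, magnitude-weighted.
    gray: 2D float image (e.g. the 64x128 aligned crop)."""
    gy, gx = np.gradient(gray.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)  # unsigned orientation in [0, pi)
    bins = np.minimum((ang / np.pi * nbins).astype(int), nbins - 1)
    H, W = gray.shape
    hists = np.zeros((H // cell, W // cell, nbins))
    for i in range(H // cell):
        for j in range(W // cell):
            b = bins[i*cell:(i+1)*cell, j*cell:(j+1)*cell].ravel()
            m = mag[i*cell:(i+1)*cell, j*cell:(j+1)*cell].ravel()
            hists[i, j] = np.bincount(b, weights=m, minlength=nbins)[:nbins]
    return hists  # shape (8, 16, 8) for a 64x128 crop with 8x8 cells
```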
POOF feature learning:
• For each scale (8x8, 16x16):
– learn linear SVM, get weights w
• Keep max |w_i| per cell
• Keep cells with max(w_c) >= median over all cells’ max(w_c)
• Keep the connected component (4-connected?) starting at f
[Diagram: SVM weight vector W grouped by cells c1 … cn -> max |w_i| per cell -> threshold at median]
POOF feature learning:
• Retrain SVM on the selected cells only
• Get POOF (cell bitmap + SVM weight vector)
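The cell-selection steps above (max |w| per cell, median threshold, connected component containing f’s cell) can be sketched as follows; the grid layout of the weights and the function name are my assumptions:

```python
import numpy as np
from collections import deque

def select_cells(w_per_cell, seed):
    """w_per_cell: (rows, cols, dims) SVM weights grouped by cell.
    seed: (row, col) of the cell at the feature part f.
    Keeps cells whose max |w_i| reaches the median of those maxima, then
    keeps only the 4-connected component containing the seed cell."""
    strength = np.abs(w_per_cell).max(axis=2)   # max |w_i| per cell
    keep = strength >= np.median(strength)      # threshold at the median
    mask = np.zeros_like(keep)
    if keep[seed]:
        q = deque([seed])                       # BFS over 4-neighbours
        mask[seed] = True
        while q:
            r, c = q.popleft()
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if (0 <= rr < keep.shape[0] and 0 <= cc < keep.shape[1]
                        and keep[rr, cc] and not mask[rr, cc]):
                    mask[rr, cc] = True
                    q.append((rr, cc))
    return mask  # cell bitmap used when retraining the SVM
```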
POOF feature extraction:
• Find corresponding landmarks – authors use the detector of Belhumeur et al., CVPR 2011
• Align & crop to 128x64 region
• Get base features
• Get SVM score from features in masked region
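Extraction reduces to a dot product: the scalar POOF value is the linear SVM response on the base features of the masked cells. The argument layout below is my assumption:

```python
import numpy as np

def poof_score(cell_features, mask, w, b=0.0):
    """cell_features: (rows, cols, dims) base features of the aligned crop.
    mask: boolean cell bitmap learned during training.
    w: retrained SVM weights over the kept cells' features (flattened).
    Returns the scalar POOF value (signed SVM score)."""
    x = cell_features[mask].ravel()  # kept cells' features, row-major order
    return float(w @ x + b)
```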
Results: categorization
• Caltech-UCSD Birds (CUB-200) dataset, 200 classes
• 13 landmarks used
• About 5 million POOF combinations possible
• Randomly chosen subset of 5000 POOFs
• Use as feature vector in one-vs-all linear SVM
• Evaluation on gt bbox of object
– with gt landmarks
– or detected landmarks
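The random-subset step can be sketched as sampling k distinct POOF definitions (class pair, feature part, alignment part, base feature) from the full combinatorial space. The function name and the index-decoding scheme are mine; the exact size of the space depends on how scales and base features are counted, which is why it need not match the slide’s “about 5 million” exactly.

```python
import itertools
import random

def sample_poof_defs(n_classes=200, n_landmarks=13,
                     bases=("gradhist", "colorhist"), k=5000, seed=0):
    """Draw k distinct POOF definitions uniformly at random,
    without materialising all combinations as tuples."""
    rng = random.Random(seed)
    class_pairs = list(itertools.combinations(range(n_classes), 2))
    part_pairs = [(f, a) for f in range(n_landmarks)
                  for a in range(n_landmarks) if f != a]
    total = len(class_pairs) * len(part_pairs) * len(bases)
    defs = []
    for t in rng.sample(range(total), k):  # k distinct flat indices
        t, b = divmod(t, len(bases))       # decode flat index ->
        t, p = divmod(t, len(part_pairs))  # (class pair, part pair, base)
        ci, cj = class_pairs[t]
        f, a = part_pairs[p]
        defs.append((ci, cj, f, a, bases[b]))
    return defs
```

Each image is then represented by the vector of its k POOF scores and fed to a one-vs-all linear SVM.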
Results: categorization
Setup     gradhist  HOG  low-level baseline [27]  prior work ([4] MKL / [33] RF / [32] / [8] / [35])
200 det      54      56          28                                 –
14 det       65      70          57                                 –
200 gt       69      73          40                           17 / 19 / 19
14 gt        80      85          44                                 –
5 det         –       –           –                                 55
Results: Face Verification
• Are two images of the same person?
• LFW dataset
• 16 landmarks
• 120 subjects
• ~3.5 million POOF choices
• Each image yields 10000 random POOFs f(I)
• For an image pair, concatenate [ |f(I) - f(J)|, f(I) .* f(J) ]
• Train same-vs-different classifier
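The pair representation on the slide, spelled out (the function name is mine):

```python
import numpy as np

def pair_feature(fI, fJ):
    """Symmetric pair representation: absolute difference concatenated
    with the elementwise product of the two images' POOF vectors.
    A same-vs-different classifier is trained on this."""
    fI = np.asarray(fI, dtype=float)
    fJ = np.asarray(fJ, dtype=float)
    return np.concatenate([np.abs(fI - fJ), fI * fJ])
```

Both components are unchanged when I and J are swapped, so the classifier cannot depend on image order.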
Results: Face Verification
• Performance equal to Tom-vs-Pete (BMVC 2012)
• But:
– support regions learned automatically
– linear SVM instead of RBF -> faster
– uses the same “identity-preserving alignment” on landmark detections [2]
[Figure: alignment stages – input, affine, canonical (mean of all), closest in dataset]
Results: Attribute classification
• Attributes such as gender, “big nose”, “eyeglasses” (Kumar [14])
• POOFs learned as before, on LFW dataset
• Extract POOF values on the attribute dataset
• Train linear SVM for each attribute
• POOFs transfer discriminability from other classes -> no need for a fully labelled attribute dataset
Results: Attribute classification
• Restricted number of attribute training samples
• POOF features don’t latch on to noise