“POOF: Part Based One-vs-One Features for Fine Grained Categorization, Face Verification, and Attribute Estimation”
Thomas Berg and Peter Belhumeur
CVPR 2013
VGG Reading Group 4.7.2013
Eric Sommerlade
Summary
• A POOF is a scalar feature defined for a pair of classes, a pair of landmarks (a feature part and an alignment part), and a choice of base feature (e.g. HOG or colour histogram), on a discriminative region learned from data
• Perks:
– regions automatically learned from the data set
– great performance
– transfers knowledge in from external datasets
Motivation:
Standard approach to part based recognition:
- extract standard feature (SIFT, HOG, LBP)
- train classifier
- relevant regions tuned by hand
Idea: “standard” features are hardly optimal for a specific problem.
The “best” feature depends on:
- domain (dog features != bird features)
- task (face recognition != gender classification)
POOF feature learning:
• From dataset with landmark annotations
POOF feature learning:
• Choose feature part f
• Choose alignment part a
• Align and crop to 128x64 region
• Larger/smaller distance between the two landmarks -> coarser/finer effective scale of the crop
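The alignment step above can be sketched as a two-point similarity fit. A minimal sketch: the canonical target positions for f and a inside the 128x64 crop are not given on the slide, so the ones in the usage example are an assumption, and the function names are mine.

```python
def similarity_from_two_points(p1, p2, q1, q2):
    """The unique 2D similarity (rotation + uniform scale + translation)
    mapping p1 -> q1 and p2 -> q2, written in complex form z -> a*z + b."""
    p1, p2, q1, q2 = (complex(x, y) for x, y in (p1, p2, q1, q2))
    a = (q2 - q1) / (p2 - p1)   # encodes rotation and scale
    b = q1 - a * p1             # translation
    return a, b

def apply_similarity(a, b, pt):
    """Apply the transform to a 2D point given as an (x, y) tuple."""
    z = a * complex(*pt) + b
    return (z.real, z.imag)
```

For example, mapping the feature part at (10, 20) and alignment part at (30, 20) to assumed canonical spots (32, 32) and (96, 32) fixes the warp for the whole crop.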
POOF feature learning:
• Scales: 8x8 and 16x16
8x8 cells: 16*8 = 128; 16x16 cells: 8*4 = 32 -> 160 cells total
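The cell count works out directly from the 128x64 crop and the two tiling scales:

```python
# 128x64 aligned crop tiled at the two scales from the slide
W, H = 128, 64
cells_8x8   = (W // 8)  * (H // 8)     # 16 * 8 = 128 cells
cells_16x16 = (W // 16) * (H // 16)    # 8 * 4  = 32 cells
total_cells = cells_8x8 + cells_16x16  # 160, as on the slide
```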
POOF feature learning:
• Per cell:
– 8-bin gradient direction histogram, Dg=8 (‘gradhist’)
– or Felzenszwalb HOG: Dg=31
– colour histogram: Dc=32
• Concatenated length: (Dg+Dc)*160
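The ‘gradhist’ base feature above can be sketched as a plain per-cell orientation histogram. This is a minimal stand-in (no blocks, no normalisation, unsigned orientations), not the authors’ exact implementation:

```python
import numpy as np

def gradhist_cells(gray, cell=8, nbins=8):
    """Per-cell 8-bin gradient-orientation histogram, magnitude-weighted.
    gray: 2D float image (e.g. the 64x128 aligned crop)."""
    gy, gx = np.gradient(gray.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)  # unsigned orientation in [0, pi)
    bins = np.minimum((ang / np.pi * nbins).astype(int), nbins - 1)
    H, W = gray.shape
    hists = np.zeros((H // cell, W // cell, nbins))
    for i in range(H // cell):
        for j in range(W // cell):
            b = bins[i*cell:(i+1)*cell, j*cell:(j+1)*cell].ravel()
            m = mag[i*cell:(i+1)*cell, j*cell:(j+1)*cell].ravel()
            hists[i, j] = np.bincount(b, weights=m, minlength=nbins)[:nbins]
    return hists  # shape (8, 16, 8) for a 64x128 crop with 8x8 cells
```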
POOF feature learning:
• For each scale (8x8, 16x16):
– learn linear SVM, get weights w
• Keep max |w_i| per cell
• Keep cells with max(w_c) >= median over all cells’ max(w_c)
• Keep the connected component (4-connected?) starting at f
[Diagram: SVM weight vector W grouped by cells c1 … cn -> max |w_i| per cell -> threshold at median]
POOF feature learning:
• Retrain SVM on the selected cells only
• Get POOF (cell bitmap + SVM weight vector)
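The cell-selection steps above (max |w| per cell, median threshold, connected component containing f’s cell) can be sketched as follows; the grid layout of the weights and the function name are my assumptions:

```python
import numpy as np
from collections import deque

def select_cells(w_per_cell, seed):
    """w_per_cell: (rows, cols, dims) SVM weights grouped by cell.
    seed: (row, col) of the cell at the feature part f.
    Keeps cells whose max |w_i| reaches the median of those maxima, then
    keeps only the 4-connected component containing the seed cell."""
    strength = np.abs(w_per_cell).max(axis=2)   # max |w_i| per cell
    keep = strength >= np.median(strength)      # threshold at the median
    mask = np.zeros_like(keep)
    if keep[seed]:
        q = deque([seed])                       # BFS over 4-neighbours
        mask[seed] = True
        while q:
            r, c = q.popleft()
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if (0 <= rr < keep.shape[0] and 0 <= cc < keep.shape[1]
                        and keep[rr, cc] and not mask[rr, cc]):
                    mask[rr, cc] = True
                    q.append((rr, cc))
    return mask  # cell bitmap used when retraining the SVM
```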
POOF feature extraction:
• Find corresponding landmarks – authors use the detector of Belhumeur et al., CVPR 2011
• Align & crop to 128x64 region
• Get base features
• Get SVM score from features in masked region
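Extraction reduces to a dot product: the scalar POOF value is the linear SVM response on the base features of the masked cells. The argument layout below is my assumption:

```python
import numpy as np

def poof_score(cell_features, mask, w, b=0.0):
    """cell_features: (rows, cols, dims) base features of the aligned crop.
    mask: boolean cell bitmap learned during training.
    w: retrained SVM weights over the kept cells' features (flattened).
    Returns the scalar POOF value (signed SVM score)."""
    x = cell_features[mask].ravel()  # kept cells' features, row-major order
    return float(w @ x + b)
```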
Results: categorization
• Caltech-UCSD Birds (CUB-200) dataset, 200 classes
• 13 landmarks used
• About 5 million POOF combinations possible
• Randomly chosen subset of 5000 POOFs
• Use as feature vector in one-vs-all linear SVM
• Evaluation on gt bbox of object
– with gt landmarks
– or detected landmarks
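The random-subset step can be sketched as sampling k distinct POOF definitions (class pair, feature part, alignment part, base feature) from the full combinatorial space. The function name and the index-decoding scheme are mine; the exact size of the space depends on how scales and base features are counted, which is why it need not match the slide’s “about 5 million” exactly.

```python
import itertools
import random

def sample_poof_defs(n_classes=200, n_landmarks=13,
                     bases=("gradhist", "colorhist"), k=5000, seed=0):
    """Draw k distinct POOF definitions uniformly at random,
    without materialising all combinations as tuples."""
    rng = random.Random(seed)
    class_pairs = list(itertools.combinations(range(n_classes), 2))
    part_pairs = [(f, a) for f in range(n_landmarks)
                  for a in range(n_landmarks) if f != a]
    total = len(class_pairs) * len(part_pairs) * len(bases)
    defs = []
    for t in rng.sample(range(total), k):  # k distinct flat indices
        t, b = divmod(t, len(bases))       # decode flat index ->
        t, p = divmod(t, len(part_pairs))  # (class pair, part pair, base)
        ci, cj = class_pairs[t]
        f, a = part_pairs[p]
        defs.append((ci, cj, f, a, bases[b]))
    return defs
```

Each image is then represented by the vector of its k POOF scores and fed to a one-vs-all linear SVM.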
Results: categorization
Setup     gradhist  HOG  low-level baseline [27]  prior work ([4] MKL / [33] RF / [32] / [8] / [35])
200 det      54      56          28                                 –
14 det       65      70          57                                 –
200 gt       69      73          40                           17 / 19 / 19
14 gt        80      85          44                                 –
5 det         –       –           –                                 55
Results: Face Verification
• Are two images of the same person?
• LFW dataset
• 16 landmarks
• 120 subjects
• ~3.5 million POOF choices
• Each image yields 10000 random POOFs f(I)
• For an image pair, concatenate [ |f(I) - f(J)|, f(I) .* f(J) ]
• Train same-vs-different classifier
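The pair representation on the slide, spelled out (the function name is mine):

```python
import numpy as np

def pair_feature(fI, fJ):
    """Symmetric pair representation: absolute difference concatenated
    with the elementwise product of the two images' POOF vectors.
    A same-vs-different classifier is trained on this."""
    fI = np.asarray(fI, dtype=float)
    fJ = np.asarray(fJ, dtype=float)
    return np.concatenate([np.abs(fI - fJ), fI * fJ])
```

Both components are unchanged when I and J are swapped, so the classifier cannot depend on image order.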
Results: Face Verification
• Performance equal to Tom-vs-Pete (BMVC 2012)
• But:
– support regions learned automatically
– linear SVM instead of RBF -> faster
– uses the same “identity-preserving alignment” on landmark detections [2]
[Figure: alignment stages – input, affine, canonical (mean of all), closest in dataset]
Results: Attribute classification
• Attributes such as gender, “big nose”, “eyeglasses” (Kumar [14])
• POOFs learned as before, on LFW dataset
• Extract POOF values on the attribute dataset
• Train linear SVM for each attribute
• POOFs transfer discriminability from other classes -> no need for a fully labelled attribute dataset
Results: Attribute classification
• Restricted number of attribute training samples
• POOF features don’t latch on to noise