Similarity Metrics for Categorization: From Monolithic to Category Specific
Boris Babenko, Steve Branson, Serge Belongie
University of California, San Diego
ICCV 2009, Kyoto, Japan
• Recognizing multiple categories
  – Need a meaningful similarity metric / feature space
• Idea: use training data to learn the metric
  – Goes by many names:
    • metric learning
    • cue combination/weighting
    • kernel combination/learning
    • feature selection
• Learn a single global similarity metric
[Figure: Monolithic. A single similarity metric is learned from the labeled dataset and used to compare a query image against Categories 1–4]
[Jones et al. '03, Chopra et al. '05, Goldberger et al. '05, Shakhnarovich et al. '05, Torralba et al. '08]
• Learn similarity metric for each category (1-vs-all)
[Figure: Monolithic vs. Category Specific. One similarity metric is learned per category from the labeled dataset; a query image is compared to each of Categories 1–4 with that category's own metric]
[Varma et al. '07, Frome et al. '07, Weinberger et al. '08, Nilsback et al. '08]
• Monolithic:
  – Less powerful… there is no "perfect" single metric
  – Can generalize to new categories
• Per category:
  – More powerful
  – Do we really need thousands of metrics?
  – Have to train a new metric for each new category
• Would like to explore the space between these two extremes
• Idea:
  – Group categories together
  – Learn a few similarity metrics, one for each group
• Learn a few good similarity metrics
[Figure: Monolithic vs. MuSL vs. Category Specific. MuSL learns a few similarity metrics from the labeled dataset; a query image is compared to Categories 1–4, each of which is assigned to one of the learned metrics]
• Need some framework to work with…
• Boosting has many advantages:
  – Feature selection
  – Easy implementation
  – Performs well
• Training data: images with category labels
• Generate pairs:
  – A pair gets label 1 if both images share a category, label 0 otherwise
  – Sample negative pairs
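The pair-generation step above could be sketched as follows (a minimal illustration; the function name and the negative-sampling ratio are assumptions, not from the talk):

```python
import random

def make_pairs(labels, neg_per_pos=1, seed=0):
    """Build (i, j, same) training pairs from category labels.

    Positive pairs: all within-category index pairs, label 1.
    Negative pairs: randomly sampled cross-category pairs, label 0.
    `neg_per_pos` is an assumed sampling ratio.
    """
    rng = random.Random(seed)
    by_cat = {}
    for idx, c in enumerate(labels):
        by_cat.setdefault(c, []).append(idx)

    pairs = []
    # positives: every same-category pair
    for idxs in by_cat.values():
        for a in range(len(idxs)):
            for b in range(a + 1, len(idxs)):
                pairs.append((idxs[a], idxs[b], 1))
    n_pos = len(pairs)

    # negatives: sample index pairs whose labels differ
    n, need = len(labels), neg_per_pos * n_pos
    while need > 0:
        i, j = rng.randrange(n), rng.randrange(n)
        if labels[i] != labels[j]:
            pairs.append((i, j, 0))
            need -= 1
    return pairs
```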
• Train similarity metric/classifier H(x₁, x₂) on the labeled pairs
• Choose each weak learner to be binary, i.e. hₜ: X → {0, 1}
• Similarity = L1 distance over binary vectors
  – Efficient to compute (XOR and sum)
• For convenience, write the binary embedding of an image as [h₁(x), …, h_T(x)]
[Shakhnarovich et al. ’05, Fergus et al. ‘08]
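The "XOR and sum" computation above can be made concrete: pack each binary vector into an integer, XOR the two codes, and count the set bits. A minimal sketch (helper names are hypothetical):

```python
def pack_bits(bits):
    """Pack a binary vector like [1, 0, 1] into a single int."""
    word = 0
    for b in bits:
        word = (word << 1) | b
    return word

def hamming(a, b):
    """L1 distance between two binary vectors packed as ints:
    XOR, then count the set bits of the result."""
    return bin(a ^ b).count("1")

# similarity can then be taken as negative Hamming distance
a = pack_bits([1, 0, 1, 1, 0])
b = pack_bits([1, 1, 1, 0, 0])
assert hamming(a, b) == 2  # the two codes differ in exactly 2 positions
```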
• Given some objective function
• Boosting = gradient ascent in function space
• Gradient = example weights for boosting
[Figure: function-space view: among the candidate weak classifiers, the chosen weak classifier is the one best aligned with the gradient at the current strong classifier]
[Friedman ’01, Mason et al. ‘00]
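The function-space view can be illustrated with a toy example: the negative gradient of the loss at the current strong classifier gives per-example weights, and the chosen weak classifier is the one best aligned with them. A simplified sketch (logistic loss, 1-D threshold stumps, fixed step size; all assumptions for illustration):

```python
import math

def train_boost(xs, ys, n_rounds=20):
    """Functional-gradient boosting sketch on labels ys in {-1, +1}."""
    F = [0.0] * len(xs)       # current strong classifier, per example
    strong = []               # chosen weak classifiers (t, sign, alpha)
    for _ in range(n_rounds):
        # example weights = -dL/dF for logistic loss L = sum log(1 + e^{-yF})
        w = [y / (1.0 + math.exp(y * f)) for y, f in zip(ys, F)]
        # pick the stump h(x) = sign * (1 if x > t else -1) maximizing
        # its alignment with the gradient, sum_i w_i * h(x_i)
        best = None
        for t in xs:
            for sign in (+1, -1):
                h = [sign * (1 if x > t else -1) for x in xs]
                score = sum(wi * hi for wi, hi in zip(w, h))
                if best is None or score > best[0]:
                    best = (score, t, sign, h)
        _, t, sign, h = best
        alpha = 0.5  # fixed step size, for simplicity
        F = [f + alpha * hi for f, hi in zip(F, h)]
        strong.append((t, sign, alpha))
    return strong, F
```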
• Goal: train the metrics H₁, …, H_K and recover the mapping g from categories to metrics
• At runtime:
  – To compute the similarity of a query image to category c, use H_g(c)
[Figure: Categories 1–4, each mapped to one of the learned metrics]
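The runtime rule could look like the following sketch, where `metrics` is a list of learned similarity functions and `g` is the recovered category-to-metric mapping (all names are hypothetical):

```python
def similarity_to_category(query, exemplar, c, metrics, g):
    """Compare a query image to an exemplar of category c.

    metrics: list of learned similarity functions H_k
    g: dict mapping each category to the index of its assigned metric
    """
    H = metrics[g[c]]           # pick the metric assigned to category c
    return H(query, exemplar)   # score the pair with that metric
```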
• Run pre-processing to group categories (e.g. k-means), then train as usual
• Drawbacks:
  – Hacky / inelegant
  – Not optimal: the pre-processing is not informed by class confusions, etc.
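The pre-processing baseline could be sketched as below; clustering categories by per-category feature vectors and the deterministic farthest-point initialization are illustrative assumptions, not details from the talk:

```python
def group_categories(cat_features, K, n_iter=20):
    """Group categories with plain k-means (assumes K <= #categories).

    cat_features: dict mapping category -> feature vector (e.g. the
    mean feature vector of its training images, an assumed choice).
    Returns dict category -> group index; each group would then get
    its own similarity metric.
    """
    cats = list(cat_features)
    feats = {c: tuple(cat_features[c]) for c in cats}

    def d2(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    # deterministic farthest-point initialization
    centers = [feats[cats[0]]]
    while len(centers) < K:
        far = max(cats, key=lambda c: min(d2(feats[c], m) for m in centers))
        centers.append(feats[far])

    assign = {}
    for _ in range(n_iter):
        # assignment step: nearest center
        assign = {c: min(range(K), key=lambda k: d2(feats[c], centers[k]))
                  for c in cats}
        # update step: recompute centroids
        for k in range(K):
            members = [feats[c] for c in cats if assign[c] == k]
            if members:
                centers[k] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return assign
```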
• How can we train & group simultaneously?
• Definitions:
  – A sigmoid function with a scalar parameter
  – A score measuring how well metric H_k works with category c
• Objective function: for each category, take the best score over the K classifiers
• Each category is "assigned" to the classifier that scores it best
• Replace the max with a differentiable approximation
  – where λ is a scalar parameter
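One common differentiable surrogate for the max is a softmax-weighted average (an assumption for illustration; the talk only specifies "a differentiable approximation" with a scalar parameter λ):

```python
import math

def soft_max_approx(z, lam):
    """Softmax-weighted average: sum_k z_k e^{lam z_k} / sum_j e^{lam z_j}.

    Smooth in z; as lam grows it approaches max(z), and at lam = 0
    it reduces to the plain average.
    """
    w = [math.exp(lam * zk) for zk in z]
    s = sum(w)
    return sum(zk * wk / s for zk, wk in zip(z, w))

z = [0.2, 0.9, 0.5]
assert abs(soft_max_approx(z, 50.0) - max(z)) < 1e-6   # sharp: near max
assert abs(soft_max_approx(z, 0.0) - sum(z) / 3) < 1e-9  # flat: average
```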
• Each training pair i has K weights, w¹ᵢ, …, w^K ᵢ
• Intuition: wᵏᵢ ≈ (approximation of the pair's category being assigned to H_k) × (difficulty of the pair, as in regular boosting)
[Plots: example weights w¹ᵢ, w²ᵢ, w³ᵢ vs. boosting iteration, for a difficult pair assigned to one metric (left) and an easy pair assigned to one metric (right)]
• Created a dataset with many heterogeneous categories
[Plot: Accuracy (roughly 0.65–0.8) vs. K (number of classifiers), comparing MuSL+retrain, MuSL, k-means, Rand, Monolithic, and Per Cat]
• Merged categories from:
  – Caltech 101 [Griffin et al.]
  – Oxford Flowers [Nilsback et al.]
  – UIUC Textures [Lazebnik et al.]
[Plot: MuSL vs. k-means]
Training more metrics overfits!
• Studied categorization performance vs. the number of learned metrics
• Presented a boosting algorithm that simultaneously groups categories and trains metrics
• Observed overfitting behavior for novel categories
• Supported by:
  – NSF CAREER Grant #0448615
  – NSF IGERT Grant DGE-0333451
  – ONR MURI Grant #N00014-08-1-0638
  – UCSD FWGrid Project (NSF Infrastructure Grant no. EIA-0303622)