Similarity Metrics for Categorization: From Monolithic to Category Specific
Boris Babenko, Steve Branson, Serge Belongie
University of California, San Diego
ICCV 2009, Kyoto, Japan
• Recognizing multiple categories
  – Need a meaningful similarity metric / feature space
• Idea: use training data to learn the metric
  – Goes by many names:
    • metric learning
    • cue combination/weighting
    • kernel combination/learning
    • feature selection
• Learn a single global similarity metric
[Figure: Monolithic. A single similarity metric is learned from the labeled dataset and used to compare a query image against Categories 1–4]
[Jones et al. '03, Chopra et al. '05, Goldberger et al. '05, Shakhnarovich et al. '05, Torralba et al. '08]
• Learn similarity metric for each category (1-vs-all)
[Figure: Monolithic vs. Category Specific. One similarity metric is learned per category from the labeled dataset; a query image is compared to each of Categories 1–4 with that category's own metric]
[Varma et al. '07, Frome et al. '07, Weinberger et al. '08, Nilsback et al. '08]
• Monolithic:
  – Less powerful… there is no "perfect" single metric
  – Can generalize to new categories
• Per category:
  – More powerful
  – Do we really need thousands of metrics?
  – Have to train a new metric for each new category
• Would like to explore the space between these two extremes
• Idea:
  – Group categories together
  – Learn a few similarity metrics, one for each group
• Learn a few good similarity metrics
[Figure: Monolithic vs. MuSL vs. Category Specific. MuSL learns a few similarity metrics from the labeled dataset; a query image is compared to Categories 1–4, each of which is assigned to one of the learned metrics]
• Need some framework to work with…
• Boosting has many advantages:
  – Feature selection
  – Easy implementation
  – Performs well
• Training data: images with category labels
• Generate pairs:
  – A pair gets label 1 if both images share a category, label 0 otherwise
  – Sample negative pairs
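The pair-generation step above could be sketched as follows (a minimal illustration; the function name and the negative-sampling ratio are assumptions, not from the talk):

```python
import random

def make_pairs(labels, neg_per_pos=1, seed=0):
    """Build (i, j, same) training pairs from category labels.

    Positive pairs: all within-category index pairs, label 1.
    Negative pairs: randomly sampled cross-category pairs, label 0.
    `neg_per_pos` is an assumed sampling ratio.
    """
    rng = random.Random(seed)
    by_cat = {}
    for idx, c in enumerate(labels):
        by_cat.setdefault(c, []).append(idx)

    pairs = []
    # positives: every same-category pair
    for idxs in by_cat.values():
        for a in range(len(idxs)):
            for b in range(a + 1, len(idxs)):
                pairs.append((idxs[a], idxs[b], 1))
    n_pos = len(pairs)

    # negatives: sample index pairs whose labels differ
    n, need = len(labels), neg_per_pos * n_pos
    while need > 0:
        i, j = rng.randrange(n), rng.randrange(n)
        if labels[i] != labels[j]:
            pairs.append((i, j, 0))
            need -= 1
    return pairs
```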
• Train similarity metric/classifier H(x₁, x₂) on the labeled pairs
• Choose each weak learner to be binary, i.e. hₜ: X → {0, 1}
• Similarity = L1 distance over binary vectors
  – Efficient to compute (XOR and sum)
• For convenience, write the binary embedding of an image as [h₁(x), …, h_T(x)]
[Shakhnarovich et al. ’05, Fergus et al. ‘08]
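The "XOR and sum" computation above can be made concrete: pack each binary vector into an integer, XOR the two codes, and count the set bits. A minimal sketch (helper names are hypothetical):

```python
def pack_bits(bits):
    """Pack a binary vector like [1, 0, 1] into a single int."""
    word = 0
    for b in bits:
        word = (word << 1) | b
    return word

def hamming(a, b):
    """L1 distance between two binary vectors packed as ints:
    XOR, then count the set bits of the result."""
    return bin(a ^ b).count("1")

# similarity can then be taken as negative Hamming distance
a = pack_bits([1, 0, 1, 1, 0])
b = pack_bits([1, 1, 1, 0, 0])
assert hamming(a, b) == 2  # the two codes differ in exactly 2 positions
```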
• Given some objective function
• Boosting = gradient ascent in function space
• Gradient = example weights for boosting
[Figure: function-space view: among the candidate weak classifiers, the chosen weak classifier is the one best aligned with the gradient at the current strong classifier]
[Friedman ’01, Mason et al. ‘00]
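The function-space view can be illustrated with a toy example: the negative gradient of the loss at the current strong classifier gives per-example weights, and the chosen weak classifier is the one best aligned with them. A simplified sketch (logistic loss, 1-D threshold stumps, fixed step size; all assumptions for illustration):

```python
import math

def train_boost(xs, ys, n_rounds=20):
    """Functional-gradient boosting sketch on labels ys in {-1, +1}."""
    F = [0.0] * len(xs)       # current strong classifier, per example
    strong = []               # chosen weak classifiers (t, sign, alpha)
    for _ in range(n_rounds):
        # example weights = -dL/dF for logistic loss L = sum log(1 + e^{-yF})
        w = [y / (1.0 + math.exp(y * f)) for y, f in zip(ys, F)]
        # pick the stump h(x) = sign * (1 if x > t else -1) maximizing
        # its alignment with the gradient, sum_i w_i * h(x_i)
        best = None
        for t in xs:
            for sign in (+1, -1):
                h = [sign * (1 if x > t else -1) for x in xs]
                score = sum(wi * hi for wi, hi in zip(w, h))
                if best is None or score > best[0]:
                    best = (score, t, sign, h)
        _, t, sign, h = best
        alpha = 0.5  # fixed step size, for simplicity
        F = [f + alpha * hi for f, hi in zip(F, h)]
        strong.append((t, sign, alpha))
    return strong, F
```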
• Goal: train the metrics H₁, …, H_K and recover the mapping g from categories to metrics
• At runtime:
  – To compute the similarity of a query image to category c, use H_g(c)
[Figure: Categories 1–4, each mapped to one of the learned metrics]
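The runtime rule could look like the following sketch, where `metrics` is a list of learned similarity functions and `g` is the recovered category-to-metric mapping (all names are hypothetical):

```python
def similarity_to_category(query, exemplar, c, metrics, g):
    """Compare a query image to an exemplar of category c.

    metrics: list of learned similarity functions H_k
    g: dict mapping each category to the index of its assigned metric
    """
    H = metrics[g[c]]           # pick the metric assigned to category c
    return H(query, exemplar)   # score the pair with that metric
```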
• Run pre-processing to group categories (e.g. k-means), then train as usual
• Drawbacks:
  – Hacky / inelegant
  – Not optimal: the pre-processing is not informed by class confusions, etc.
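The pre-processing baseline could be sketched as below; clustering categories by per-category feature vectors and the deterministic farthest-point initialization are illustrative assumptions, not details from the talk:

```python
def group_categories(cat_features, K, n_iter=20):
    """Group categories with plain k-means (assumes K <= #categories).

    cat_features: dict mapping category -> feature vector (e.g. the
    mean feature vector of its training images, an assumed choice).
    Returns dict category -> group index; each group would then get
    its own similarity metric.
    """
    cats = list(cat_features)
    feats = {c: tuple(cat_features[c]) for c in cats}

    def d2(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    # deterministic farthest-point initialization
    centers = [feats[cats[0]]]
    while len(centers) < K:
        far = max(cats, key=lambda c: min(d2(feats[c], m) for m in centers))
        centers.append(feats[far])

    assign = {}
    for _ in range(n_iter):
        # assignment step: nearest center
        assign = {c: min(range(K), key=lambda k: d2(feats[c], centers[k]))
                  for c in cats}
        # update step: recompute centroids
        for k in range(K):
            members = [feats[c] for c in cats if assign[c] == k]
            if members:
                centers[k] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return assign
```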
• How can we train & group simultaneously?
• Definitions:
  – A sigmoid function with a scalar parameter
  – A score measuring how well metric H_k works with category c
• Objective function: for each category, take the best score over the K classifiers
• Each category is "assigned" to the classifier that scores it best
• Replace the max with a differentiable approximation
  – where λ is a scalar parameter
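One common differentiable surrogate for the max is a softmax-weighted average (an assumption for illustration; the talk only specifies "a differentiable approximation" with a scalar parameter λ):

```python
import math

def soft_max_approx(z, lam):
    """Softmax-weighted average: sum_k z_k e^{lam z_k} / sum_j e^{lam z_j}.

    Smooth in z; as lam grows it approaches max(z), and at lam = 0
    it reduces to the plain average.
    """
    w = [math.exp(lam * zk) for zk in z]
    s = sum(w)
    return sum(zk * wk / s for zk, wk in zip(z, w))

z = [0.2, 0.9, 0.5]
assert abs(soft_max_approx(z, 50.0) - max(z)) < 1e-6   # sharp: near max
assert abs(soft_max_approx(z, 0.0) - sum(z) / 3) < 1e-9  # flat: average
```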
• Each training pair i has K weights, w¹ᵢ, …, w^K ᵢ
• Intuition: wᵏᵢ ≈ (approximation of the pair's category being assigned to H_k) × (difficulty of the pair, as in regular boosting)
[Plots: example weights w¹ᵢ, w²ᵢ, w³ᵢ vs. boosting iteration, for a difficult pair assigned to one metric (left) and an easy pair assigned to one metric (right)]
• Created a dataset with many heterogeneous categories
[Plot: Accuracy (roughly 0.65–0.8) vs. K (number of classifiers), comparing MuSL+retrain, MuSL, k-means, Rand, Monolithic, and Per Cat]
• Merged categories from:
  – Caltech 101 [Griffin et al.]
  – Oxford Flowers [Nilsback et al.]
  – UIUC Textures [Lazebnik et al.]
[Plot: MuSL vs. k-means]
Training more metrics overfits!
• Studied categorization performance vs. the number of learned metrics
• Presented a boosting algorithm that simultaneously groups categories and trains metrics
• Observed overfitting behavior for novel categories
• Supported by:
  – NSF CAREER Grant #0448615
  – NSF IGERT Grant DGE-0333451
  – ONR MURI Grant #N00014-08-1-0638
  – UCSD FWGrid Project (NSF Infrastructure Grant no. EIA-0303622)