
Transcript of ICME 2013

Page 1: ICME 2013

IEEE International Conference on Multimedia & Expo 2013

Augmenting Descriptors for Fine-grained Visual Categorization Using Polynomial Embedding

Hideki Nakayama

Graduate School of Information Science and Technology, The University of Tokyo

Page 2: ICME 2013

Outline

Background and Motivation

Our solution
Polynomial embedding

Experiments
Fine-grained categorization
Comparison with state-of-the-art

Conclusion


Page 3: ICME 2013

Fine-grained visual categorization (FGVC)

Distinguish hundreds of fine-grained object categories within a single domain (e.g., species of animals or plants)

Complementary to traditional object recognition problems

[Figure: generic object recognition (Caltech-256 [Griffin et al., 2007]; e.g., Airplane vs. Monitor vs. Dog) vs. FGVC (Caltech-Bird-200 [Welinder et al., 2010]; e.g., Yellow Warbler vs. Prairie Warbler vs. Pine Warbler)]


Page 4: ICME 2013

Motivation

We need highly discriminative features to distinguish visually very similar categories, especially at the local level.


Page 5: ICME 2013

Two basic ideas

1. Co-occurrence (correlation) of neighboring local descriptors
Shapelet [Sabzmeydani et al., 2007], Covariance feature [Tuzel et al., 2006], GGV [Harada et al., 2012]
☺ Expected to capture mid-level local information
☹ Results in high-dimensional local features

2. State-of-the-art bag-of-words representations based on higher-order statistics of local features
Fisher vector [Perronnin et al., 2010], VLAD [Jegou et al., 2010]
☺ Remarkably high performance; enables linear classification
☹ Dimensionality increases linearly with the size of the local features: ND for VLAD, 2ND for the Fisher vector (N: number of visual words, D: size of local features)

→ The two ideas conflict: high-dimensional co-occurrence features blow up the dimensionality of the encoding.


Page 6: ICME 2013

Our approach

Compress polynomial vectors of neighboring local descriptors with supervised dimensionality reduction to obtain a discriminative latent descriptor

Encode the latent descriptors by means of bag-of-words (Fisher vector); classify with logistic regression

[Pipeline figure: descriptor (e.g., SIFT), dense sampling → polynomial vectors (1,000–10,000 dim) → CCA (trained with category labels) → latent descriptor (64 dim) → Fisher vector → logistic regression classifier → category label]


Page 7: ICME 2013

Exploit co-occurrence information

Let $v(x,y)$ denote the densely sampled descriptor (e.g., SIFT) at the target position, and let $v(x-\delta,y)$ and $v(x+\delta,y)$ be its left and right neighbors. The polynomial vector stacks the flattened second-order products of the target descriptor with its neighbors and with itself:

$$
p_2(x,y) = \begin{pmatrix}
\mathrm{Vec}\left( v(x,y)\, v(x-\delta,y)^T \right) \\
\mathrm{Vec}\left( v(x,y)\, v(x+\delta,y)^T \right) \\
\mathrm{upperVec}\left( v(x,y)\, v(x,y)^T \right) \\
v(x,y)
\end{pmatrix}
$$

$\mathrm{Vec}(\cdot)$: a matrix flattened into a vector. $\mathrm{upperVec}(\cdot)$: the upper-triangular part of a symmetric matrix flattened into a vector.


Page 8: ICME 2013

Exploit co-occurrence information

More spatial information can be integrated with more neighbors (but the polynomial vector becomes higher-dimensional):

0-neighbor (2,144 dim):
$$
p_0(x,y) = \begin{pmatrix}
\mathrm{upperVec}\left( v(x,y)\, v(x,y)^T \right) \\
v(x,y)
\end{pmatrix}
$$

2-neighbors, left/right (10,336 dim):
$$
p_2(x,y) = \begin{pmatrix}
\mathrm{Vec}\left( v(x,y)\, v(x-\delta,y)^T \right) \\
\mathrm{Vec}\left( v(x,y)\, v(x+\delta,y)^T \right) \\
\mathrm{upperVec}\left( v(x,y)\, v(x,y)^T \right) \\
v(x,y)
\end{pmatrix}
$$

4-neighbors, left/right/up/down (18,528 dim):
$$
p_4(x,y) = \begin{pmatrix}
\mathrm{Vec}\left( v(x,y)\, v(x-\delta,y)^T \right) \\
\mathrm{Vec}\left( v(x,y)\, v(x+\delta,y)^T \right) \\
\mathrm{Vec}\left( v(x,y)\, v(x,y-\delta)^T \right) \\
\mathrm{Vec}\left( v(x,y)\, v(x,y+\delta)^T \right) \\
\mathrm{upperVec}\left( v(x,y)\, v(x,y)^T \right) \\
v(x,y)
\end{pmatrix}
$$

(The stated dimensionalities correspond to 64-dim input descriptors: $64 + 64 \cdot 65/2 = 2{,}144$, plus $64^2$ per neighbor.)
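To make the construction concrete, here is a minimal NumPy sketch of the polynomial vectors above. It is an illustration, not the author's code: the grid layout, the function names, and the choice of 64-dim input descriptors are our assumptions; only the stacking of Vec/upperVec products follows the slides.

```python
import numpy as np

def upper_vec(M):
    """Flatten the upper-triangular part (incl. diagonal) of a symmetric matrix."""
    return M[np.triu_indices(M.shape[0])]

def polynomial_vector(grid, x, y, delta=1, neighbors=4):
    """Polynomial vector at grid position (x, y).

    grid: (H, W, D) array of densely sampled descriptors (D = 64 here).
    neighbors: 0, 2 (left/right), or 4 (left/right/up/down).
    """
    v = grid[y, x]
    offsets = {0: [],
               2: [(-delta, 0), (delta, 0)],
               4: [(-delta, 0), (delta, 0), (0, -delta), (0, delta)]}[neighbors]
    parts = [np.outer(v, grid[y + dy, x + dx]).ravel()  # Vec(v v_neighbor^T)
             for dx, dy in offsets]
    parts.append(upper_vec(np.outer(v, v)))             # upperVec(v v^T)
    parts.append(v)                                     # the descriptor itself
    return np.concatenate(parts)

# sanity check against the dimensionalities on the slide (D = 64)
grid = np.random.randn(10, 10, 64)
for k, dim in [(0, 2144), (2, 10336), (4, 18528)]:
    assert polynomial_vector(grid, 5, 5, neighbors=k).size == dim
```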


Page 9: ICME 2013

[Pipeline figure repeated from Page 6: descriptor (e.g., SIFT), dense sampling → polynomial vectors → CCA (trained with category labels) → latent descriptor → Fisher vector → logistic regression classifier → category label]


Page 10: ICME 2013

Supervised dimensionality reduction

Patch feature and label pairs
Category label: binary occurrence vector (e.g., every patch from an image of Allium triquetrum gets the same one-hot label vector)

Strong supervision assumption
Most patches should be related to the content (category)
(Somewhat) justified for FGVC considering the applications: users will target the object, and can sometimes provide segmentation
(We do not perform manual segmentation in this work, though)
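As a hedged sketch of how such training pairs might be assembled (the function names and the extract_polys callback are hypothetical, not from the slides): every patch simply inherits its image's one-hot category vector.

```python
import numpy as np

def patch_label_pairs(images, category_ids, n_classes, extract_polys):
    """Build (polynomial vector, binary occurrence label) training pairs.

    extract_polys: callback returning an (n_patches, d_p) array per image.
    Every patch inherits the one-hot label of its image (strong supervision).
    """
    P, L = [], []
    for img, c in zip(images, category_ids):
        polys = extract_polys(img)
        onehot = np.zeros(n_classes)
        onehot[c] = 1.0
        P.append(polys)
        L.append(np.tile(onehot, (polys.shape[0], 1)))
    return np.vstack(P), np.vstack(L)
```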

Page 11: ICME 2013

Canonical Correlation Analysis (CCA) [Hotelling, 1936]

Supervised dimensionality reduction: CCA finds linear transformations $s = A^T(p - \bar{p})$ and $t = B^T(l - \bar{l})$ that maximize the correlation between $p$ and $l$
($p$: patch feature (polynomial vector), $l$: label feature)

The projection matrices are obtained from the generalized eigenvalue problems
$$
C_{pl}\, C_{ll}^{-1}\, C_{lp}\, A = C_{pp}\, A\, \Lambda^2, \qquad A^T C_{pp} A = I,
$$
$$
C_{lp}\, C_{pp}^{-1}\, C_{pl}\, B = C_{ll}\, B\, \Lambda^2, \qquad B^T C_{ll} B = I,
$$
where the $C$ are covariance matrices and $\Lambda$ holds the canonical correlations.

Image features and label features are mapped into a common canonical space; the latent descriptor $s = A^T(p - \bar{p})$ compresses the 1,000–10,000 dim polynomial vector into a discriminative 64 dim descriptor.
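A minimal sketch of this CCA fit via the generalized eigenproblem above (SciPy's eigh; the regularization constant reg and all names are our assumptions, not the author's code):

```python
import numpy as np
from scipy.linalg import eigh

def fit_cca_projection(P, L, dim=64, reg=1e-4):
    """Return (A, p_mean) so that s = A.T @ (p - p_mean) is the latent descriptor.

    P: (n, d_p) patch features (polynomial vectors); L: (n, d_l) label vectors.
    Solves C_pl C_ll^{-1} C_lp A = C_pp A Lambda^2 with A^T C_pp A = I.
    """
    p_mean, l_mean = P.mean(0), L.mean(0)
    Pc, Lc = P - p_mean, L - l_mean
    n = P.shape[0]
    Cpp = Pc.T @ Pc / n + reg * np.eye(P.shape[1])   # regularize for stability
    Cll = Lc.T @ Lc / n + reg * np.eye(L.shape[1])
    Cpl = Pc.T @ Lc / n
    M = Cpl @ np.linalg.solve(Cll, Cpl.T)            # C_pl C_ll^{-1} C_lp
    w, A = eigh(M, Cpp)                              # generalized eigenproblem
    A = A[:, ::-1][:, :dim]                          # top `dim` canonical directions
    return A, p_mean

# usage: compress ~10,000-dim polynomial vectors to 64-dim latent descriptors
# A, p_mean = fit_cca_projection(P_train, L_train); S = (P_train - p_mean) @ A
```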


Page 12: ICME 2013

[Pipeline figure repeated from Page 6: descriptor (e.g., SIFT), dense sampling → polynomial vectors → CCA (trained with category labels) → latent descriptor → Fisher vector → logistic regression classifier → category label]


Page 13: ICME 2013

Fisher Vector [Perronnin et al., 2010]

State-of-the-art bag-of-words encoding method based on higher-order statistics of the descriptors (mean and variance)

http://www.image-net.org/challenges/LSVRC/2010/ILSVRC2010_XRCE.pdf
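For reference, a compact sketch of the (improved) Fisher vector encoding under a diagonal-covariance GMM, following Perronnin et al.; the GMM is assumed to be fitted beforehand, and the function and variable names are ours:

```python
import numpy as np

def fisher_vector(X, weights, means, variances):
    """Fisher vector of descriptors X under a diagonal-covariance GMM.

    X: (n, D); weights: (K,); means, variances: (K, D). Returns 2*K*D values:
    normalized gradients w.r.t. the Gaussian means and variances.
    """
    n = X.shape[0]
    diff = (X[:, None, :] - means) / np.sqrt(variances)          # (n, K, D)
    log_p = -0.5 * (diff ** 2 + np.log(2 * np.pi * variances)).sum(-1) \
            + np.log(weights)                                    # (n, K)
    log_p -= log_p.max(1, keepdims=True)
    gamma = np.exp(log_p)
    gamma /= gamma.sum(1, keepdims=True)                         # soft assignments
    g_mu = (gamma[..., None] * diff).sum(0) / (n * np.sqrt(weights)[:, None])
    g_var = (gamma[..., None] * (diff ** 2 - 1)).sum(0) \
            / (n * np.sqrt(2 * weights)[:, None])
    fv = np.concatenate([g_mu.ravel(), g_var.ravel()])
    fv = np.sign(fv) * np.sqrt(np.abs(fv))                       # power normalization
    return fv / (np.linalg.norm(fv) + 1e-12)                     # L2 normalization
```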


Page 14: ICME 2013

Experiments

Page 15: ICME 2013

Experimental setup

FGVC datasets: Oxford-Flowers-102, Caltech-Bird-200

Descriptors: SIFT, C-SIFT, Opponent-SIFT, Self-Similarity; compressed into 64 dim using several methods

Fisher vector: 64 Gaussians (visual words); global + 3 horizontal spatial regions

Classifier: logistic regression

Evaluation: mean classification accuracy

[Example images: Flowers, Birds]


Page 16: ICME 2013

Results: comparison with PCA and CCA

Our method substantially improves performance for all descriptors.

Just applying CCA to the concatenated neighbors does not improve performance; the polynomial embedding (a non-linear convolution of neighboring descriptors) is what matters.

[Bar charts: classification performance (%) on Flower and Bird for SIFT, C-SIFT, Opp.-SIFT, and SSIM with different embedding methods (all 64 dim): PCA (baseline), CCA without polynomials (4-neighbors), and PolEmb (4-neighbors, ours)]

Page 17: ICME 2013

Results: number of neighbors

Including more neighbors improves performance

[Bar charts: classification performance (%) of our method on Flower and Bird for SIFT, C-SIFT, Opp.-SIFT, and SSIM with 0, 2, and 4 neighbors]

Page 18: ICME 2013

Comparison with other work

Page 19: ICME 2013

Our final system

Combine four descriptors (SIFT, C-SIFT, Opp.-SIFT, SSIM) in a late-fusion approach: sum the log-likelihoods output by the per-descriptor classifiers, each weighted by its individual confidence (see the sketch after the figure placeholder below).

[Pipeline figure: descriptors 1..K (e.g., SIFT) each go through the same pipeline as above (dense sampling → polynomial vectors → CCA trained with category labels → latent descriptor → Fisher vector → logistic regression classifier); the K classifier outputs are summed to yield the category label (e.g., Allium triquetrum)]
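The slide names the fusion rule but not the exact confidence weighting; a minimal sketch of one plausible instantiation (weights taken from, e.g., each classifier's validation accuracy; names ours):

```python
import numpy as np

def late_fusion(log_likelihoods, weights):
    """Weighted sum of per-descriptor classifier log-likelihoods.

    log_likelihoods: list of (n_images, n_classes) arrays, one per descriptor.
    weights: per-descriptor confidence, e.g., each classifier's validation accuracy.
    """
    fused = sum(w * ll for w, ll in zip(weights, log_likelihoods))
    return np.argmax(fused, axis=1)  # predicted category per image
```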


Page 20: ICME 2013

Comparison on FGVC datasets

Our method outperforms previous work on the bird and flower datasets.

For the bird dataset, [32] uses the bounding box only for training images; therefore, its result is not directly comparable to ours.

[Table: mean classification accuracy (%) of the baseline (PCA), PolEmb, and PCA+PolEmb against previous work]


Page 21: ICME 2013

ImageCLEF 2013 Plant Identification

Identify 250 plant species from images of different organs (Leaf, Flower, Fruit, Stem, Entire)

Our system took 1st place in the Natural Background task, and in 4 of the 5 subtasks.

(Details coming in Sept. 2013.)

[Example images: Kaki persimmon, Silver birch, Box elder maple]


Page 22: ICME 2013

Conclusion

A simple but effective method for FGVC
Embeds co-occurrence patterns of neighboring descriptors
Obtains a discriminative, low-dimensional latent descriptor to use together with the Fisher vector
Polynomial embedding greatly improves performance, indicating the importance of non-linearity

Patch-level strong-supervision approximation
Not always perfect, but reasonable for FGVC problems

Future work
Theoretical analysis (probabilistic interpretation)
Multiple-instance dimensionality reduction


Page 23: ICME 2013

Thank you!

Any questions?


Page 24: ICME 2013

Object and scene categorization

Caltech-101 (Object dataset)

MIT-Indoor-67 (Scene dataset)


Page 25: ICME 2013

Results: Object and scene categorization

Our method is not as effective here as on FGVC problems.

Combining the PCA feature with our feature consistently improves performance.

[Bar charts: mean classification accuracy (%) with SIFT, C-SIFT, and Opp.-SIFT on Caltech-101 and MIT-Indoor-67]