
The Bag of Words Torn Open: Instance Retrieval goes Deep

AI Ukraine 2016, Kharkiv, Ukraine

James Pritts

Center for Machine Perception

Czech Technical University in Prague


Who are we?

Filip Radenović, PhD candidate

James Pritts, PhD candidate

Jiří Matas, Professor

Ondřej Chum, Associate Professor

Giorgos Tolias, Post-Doctoral candidate


Goals

Introduce the Instance Retrieval Problem

Compare two ways to learn an image encoding:

- Bag-of-words (BoW) descriptor: ~1,000,000-D vector

- Convolutional Neural Network (CNN) descriptor: 512-D vector

Demonstrate state-of-the-art retrieval performance


Part 1: The Instance Retrieval Task


Instance Retrieval Challenges

Significant viewpoint and/or scale change

Significant illumination change

Severe occlusions

Visually similar but different objects


Instance Retrieval Demo


Notional Instance Retrieval System

[Diagram: Off-line stage (learning): lots of images → encoding → descriptor database. On-line stage (inference): query image → encoding → matching against the descriptor database → ranking.]
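To make the two stages concrete, here is a minimal Python sketch. It assumes a hypothetical `encode` function that turns an image into a fixed-length vector (a BoW histogram or a CNN descriptor would both fit); images are stand-in lists of numbers so the example runs on its own, and matching is a plain dot product.

```python
from typing import Callable, Dict, List, Tuple

Vector = List[float]
Encoder = Callable[[Vector], Vector]


def build_database(images: Dict[str, Vector], encode: Encoder) -> Dict[str, Vector]:
    """Off-line stage: encode every database image once and store its descriptor."""
    return {image_id: encode(img) for image_id, img in images.items()}


def search(query: Vector, database: Dict[str, Vector], encode: Encoder,
           top_n: int = 3) -> List[Tuple[str, float]]:
    """On-line stage: encode the query, match it against every stored descriptor, rank by score."""
    q = encode(query)
    scored = [(image_id, sum(a * b for a, b in zip(q, d))) for image_id, d in database.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


def l2_normalize(v: Vector) -> Vector:
    """Toy stand-in for a real encoder (hypothetical): scale the input to unit length."""
    norm = sum(x * x for x in v) ** 0.5
    return [x / norm for x in v] if norm > 0 else list(v)
```

For example, with `images = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}`, calling `search([2.0, 0.1], build_database(images, l2_normalize), l2_normalize, top_n=2)` ranks image "a" first and "c" second. The expensive encoding of the collection happens once, off-line; only the query is encoded at search time.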


Part 2: The Bag of Words (BoW) representation


Bag of Words: Off-line stage


Quantization by K-Means

Initialize cluster centres

Find the nearest cluster centre for each data point (slow): O(Nk) distance computations per iteration

Re-compute cluster centres as centroids

Iterate
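The four steps above are Lloyd's algorithm. A minimal pure-Python sketch (the names `kmeans` and `sq_dist` are illustrative, and nothing here is tuned for vocabulary-sized data) shows where the O(Nk) cost comes from:

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int, iters: int = 20, seed: int = 0):
    """Lloyd's k-means; returns (centres, index of the centre assigned to each point)."""
    rng = random.Random(seed)
    centres = [list(p) for p in rng.sample(points, k)]  # 1. initialize from k random data points
    assign: List[int] = []
    for _ in range(iters):  # 4. iterate (fixed count here; real code would test convergence)
        # 2. exhaustive nearest-centre search: N * k distance evaluations -> O(Nk)
        assign = [min(range(k), key=lambda c: sq_dist(p, centres[c])) for p in points]
        # 3. move each centre to the centroid of the points assigned to it
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the previous centre if its cluster ends up empty
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return centres, assign
```

For example, `kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], k=2)` separates the two pairs of points. With a million centres, step 2 dominates, which is what the approximate and hierarchical variants below attack.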


Quantization by Approximate K-Means

+ fast: O(N log k)

+ reasonable quantization

- can be inconsistent when the approximate nearest-neighbor (ANN) search fails

Philbin, Chum, Isard, Sivic, Zisserman: Object retrieval with large vocabularies and fast spatial matching. CVPR 2007

Initialize cluster centres

Find the approximate nearest cluster to each data point

Re-compute cluster centres as centroids

Iterate


Quantization by Hierarchical K-means

+ fast O(N log k)

+ incremental construction

- not so good quantization

- often imbalanced

Nistér & Stewénius: Scalable recognition with a vocabulary tree. CVPR 2006
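A small sketch of hierarchical k-means in the spirit of the vocabulary tree: each node runs a tiny k-means with `branching` clusters on its points and recurses for `depth` levels, so quantizing a descriptor costs about branching × depth distance computations instead of one per visual word. The class, parameter names and dictionary-based nodes are illustrative assumptions. Because the descent is greedy, a descriptor can end up in a leaf that is not its nearest word overall, which is one source of the weaker quantization noted above.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p: Sequence[float], centres: List[List[float]]) -> int:
    return min(range(len(centres)), key=lambda c: sq_dist(p, centres[c]))


def kmeans(points: List[Point], k: int, rng: random.Random, iters: int = 10) -> List[List[float]]:
    """Plain Lloyd's k-means, run independently at every node of the tree."""
    centres = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centres)].append(p)
        centres = [[sum(col) / len(g) for col in zip(*g)] if g else centres[c]
                   for c, g in enumerate(groups)]
    return centres


class VocabularyTree:
    """Hierarchical k-means: branching b and depth L give at most b**L leaves (visual words)."""

    def __init__(self, points: List[Point], branching: int = 2, depth: int = 2, seed: int = 0):
        self.branching = branching
        self.rng = random.Random(seed)
        self.leaves = 0
        self.root = self._build(points, depth)

    def _build(self, points: List[Point], depth: int) -> dict:
        if depth == 0 or len(points) < self.branching:
            node = {"word": self.leaves}  # each leaf is one visual word
            self.leaves += 1
            return node
        centres = kmeans(points, self.branching, self.rng)
        groups: List[List[Point]] = [[] for _ in centres]
        for p in points:
            groups[nearest(p, centres)].append(p)
        # Sub-trees are built independently, so they can end up with very different sizes.
        return {"centres": centres, "children": [self._build(g, depth - 1) for g in groups]}

    def quantize(self, point: Sequence[float]) -> int:
        """Greedy descent: `branching` comparisons per level instead of one per leaf."""
        node = self.root
        while "word" not in node:
            node = node["children"][nearest(point, node["centres"])]
        return node["word"]
```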


Bag-of-Words Image Representation

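[Figure: images whose detected local features are each assigned to a word (A, B, C, D) of the visual vocabulary, and the resulting per-image histograms.]

An image is represented by the histogram of detected visual words.

Term frequency (tf): visual word D occurs twice in the image.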
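A term-frequency histogram is simply a count of the visual-word IDs detected in the image. The sketch assumes the features have already been quantized to integer word IDs; the mapping A=0, B=1, C=2, D=3 is an illustrative assumption.

```python
from collections import Counter
from typing import List


def tf_histogram(word_ids: List[int], vocab_size: int) -> List[int]:
    """Bag of words: count how often each visual word occurs in one image."""
    counts = Counter(word_ids)
    return [counts.get(w, 0) for w in range(vocab_size)]
```

`tf_histogram([0, 3, 3], vocab_size=4)` gives `[1, 0, 0, 2]`: word D (index 3) occurs twice. With a vocabulary of about a million words, the dense list is wasteful, and one would keep the sparse `Counter` itself.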


Bag of Words: On-line Stage

IN: query q, given as its BoW (visual words 1, 3, 7, …, 15999565) and the feature geometries

1. Inverted file: posting list per visual word

word        image IDs
1           1 5 10 … 7350125
2           2 7 12 … 7399121
3           1 4 15 … 7200190
…           …
16777216    3 7 10 … 7012245

2. Image ranking

score    image ID
0.87     5
0.75     1573
0.52     11202
…        …
0.001    32

Shortlist: top N images

3. Spatial verification

#inliers    zoom    image ID
247         7x      1573
105         2x      5
17          37x     11202
…           …       …
2           17x     75213

4. Re-ranked shortlist

5. Query expansion: new query = query + image 1573 + image 45 + …

OUT: ranked list R
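Steps 2 to 4 amount to sorting twice. The sketch below takes the BoW scores and the inlier counts from the example tables as given; how the inliers are obtained (fitting a geometric transformation to the tentative correspondences) is not shown, and re-ranking purely by inlier count is a simplifying assumption.

```python
from typing import Dict, List


def rerank_shortlist(scores: Dict[int, float], inliers: Dict[int, int], top_n: int) -> List[int]:
    """Rank by BoW score, keep the top-N shortlist, re-rank that shortlist by #inliers."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    shortlist, rest = ranked[:top_n], ranked[top_n:]
    # More inliers move an image up; ties keep the BoW order because Python's sort is stable.
    shortlist.sort(key=lambda image_id: inliers.get(image_id, 0), reverse=True)
    return shortlist + rest
```

With the slide's numbers, `rerank_shortlist({5: 0.87, 1573: 0.75, 11202: 0.52, 32: 0.001}, {1573: 247, 5: 105, 11202: 17}, top_n=3)` returns `[1573, 5, 11202, 32]`: image 1573 overtakes image 5 once geometry is checked. Only the shortlist is verified because spatial verification is far more expensive than scoring.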


BoW and Inverted File

[Figure: database images 1 to 10 and the visual vocabulary (A, B, C, D, …); each visual word keeps a posting list of the images it occurs in, e.g. 1 3 6 …, 5 6 8 …, 2 4 10 ….]


BoW and Inverted File

[Figure: query visual words 1, 2 and 3 (D, B, G) each select their posting list over database images 1 to 10.]


BoW and Inverted File

• Efficient (fast)

• Linear complexity (in # documents)

• Can be interpreted as voting
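The inverted file and its voting interpretation can be sketched with a dictionary from visual word to posting list. Images are given as lists of (hypothetical) integer word IDs. Each distinct query word adds one vote to every image in its posting list, so images sharing no word with the query are never visited.

```python
from collections import defaultdict
from typing import Dict, List


def build_inverted_file(images: Dict[int, List[int]]) -> Dict[int, List[int]]:
    """Posting list per visual word: the IDs of the images that contain it."""
    inverted: Dict[int, List[int]] = defaultdict(list)
    for image_id in sorted(images):
        for word in set(images[image_id]):
            inverted[word].append(image_id)
    return inverted


def vote(query_words: List[int], inverted: Dict[int, List[int]]) -> Dict[int, int]:
    """Each query word votes for every image in its posting list."""
    votes: Dict[int, int] = defaultdict(int)
    for word in set(query_words):
        for image_id in inverted.get(word, []):
            votes[image_id] += 1
    return dict(votes)
```

For `images = {1: [0, 2], 2: [1], 3: [0, 1], 4: [3]}`, the query `[0, 1]` gives votes `{1: 1, 2: 1, 3: 2}`; image 4 is never touched.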


Efficient Scoring

Bag-of-words representation (up to 1,000,000-D)

Database: $\alpha_1 = (1, 0, 0, 2)$, $\alpha_2 = (0, 2, 0, 1)$, $\alpha_3 = (1, 0, 0, 0)$

Query: $\alpha_q = (0, 3, 0, 1)$

Score: $s_i = \alpha_i \cdot \alpha_q$ for each database image $i$
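The score is a dot product of sparse vectors, so it can be accumulated through the inverted file, touching only the words present in the query. A minimal sketch, using the example vectors as they were recovered from the slide; building the postings inside `score_all` is only for self-containment, since in practice they are built off-line.

```python
from collections import defaultdict
from typing import Dict, List


def sparse(vec: List[float]) -> Dict[int, float]:
    """Keep only the non-zero entries, as a ~1,000,000-D BoW vector would be stored."""
    return {i: v for i, v in enumerate(vec) if v != 0}


def score_all(query: Dict[int, float], database: List[Dict[int, float]]) -> List[float]:
    """s_i = alpha_i . alpha_q, accumulated only over the words present in the query."""
    postings = defaultdict(list)  # word -> [(image index, weight), ...]
    for i, vec in enumerate(database):
        for word, weight in vec.items():
            postings[word].append((i, weight))
    scores = [0.0] * len(database)
    for word, q_weight in query.items():
        for i, weight in postings.get(word, []):
            scores[i] += q_weight * weight
    return scores
```

`score_all(sparse([0, 3, 0, 1]), [sparse([1, 0, 0, 2]), sparse([0, 2, 0, 1]), sparse([1, 0, 0, 0])])` returns `[2.0, 7.0, 0.0]`, so the second database image ranks first and the third is never visited.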


Word Weighting

Words (in text) that are common to many documents are less informative: ‘the’, ‘and’, ‘or’, ‘in’, …

$\mathrm{idf}_X = \log \frac{\#\,\text{documents}}{\#\,\text{docs containing } X}$

Images are represented by weighted histograms $\mathrm{tf}_X \cdot \mathrm{idf}_X$ (rather than just a histogram of $\mathrm{tf}_X$).

Words that are too frequent (present in virtually every document) can be put on a stop list (ignored, as if they were not in the document).

Baeza-Yates, Ribeiro-Neto. Modern Information Retrieval. ACM Press, 1999.

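A minimal sketch of the weighting: idf is computed once over the database, and each image is then stored as a weighted, sparse histogram. Documents are lists of (hypothetical) integer word IDs, and the `stop_list` parameter is an illustrative addition.

```python
import math
from collections import Counter
from typing import Dict, List


def idf(documents: List[List[int]]) -> Dict[int, float]:
    """idf_X = log(#documents / #documents containing X)."""
    n_docs = len(documents)
    doc_freq = Counter(word for doc in documents for word in set(doc))
    return {word: math.log(n_docs / df) for word, df in doc_freq.items()}


def tf_idf(doc: List[int], idf_table: Dict[int, float], stop_list=frozenset()) -> Dict[int, float]:
    """Weighted histogram tf_X * idf_X; words on the stop list are dropped entirely."""
    tf = Counter(w for w in doc if w not in stop_list)
    return {w: count * idf_table.get(w, 0.0) for w, count in tf.items()}
```

For `docs = [[0, 1], [0, 2], [0, 2, 2]]`, word 0 appears in every document and gets idf = log 1 = 0, so it contributes nothing to any score, which is the same effect as putting it on the stop list.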


Query Expansion

Query image → results → spatial verification → new query → new results

Chum, Philbin, Sivic, Isard, Zisserman: Total Recall…, ICCV 2007
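One simple variant of this loop is average query expansion: the new query is the mean of the original query and the spatially verified top results. The sketch below assumes hypothetical `search` and `verify` callbacks (any BoW search and any spatial verification would do) and sparse word-to-weight dictionaries; it is an illustration, not the paper's exact procedure.

```python
from typing import Callable, Dict, List

Vec = Dict[int, float]


def average_query_expansion(query: Vec, verified: List[Vec]) -> Vec:
    """New query = mean of the original query and the spatially verified results."""
    expanded: Vec = dict(query)
    for vec in verified:
        for word, w in vec.items():
            expanded[word] = expanded.get(word, 0.0) + w
    n = 1 + len(verified)
    return {word: w / n for word, w in expanded.items()}


def search_with_qe(query: Vec, search: Callable[[Vec], List[int]],
                   verify: Callable[[Vec, int], bool], database: Dict[int, Vec],
                   top_m: int = 5) -> List[int]:
    """Query -> results -> spatial verification -> new query -> new results."""
    results = search(query)
    verified = [database[i] for i in results[:top_m] if verify(query, i)]
    if not verified:
        return results  # nothing reliable to expand with
    return search(average_query_expansion(query, verified))
```

In a toy database `{1: {0: 1.0, 1: 1.0}, 2: {1: 1.0}}`, the query `{0: 1.0}` only retrieves image 1. After expansion with image 1, the query also contains word 1, so image 2, which was originally not retrieved, now appears in the results.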


Query Expansion: Step by Step

Legend: query image | retrieved image | originally not retrieved


The Bag of Words solution

Significant viewpoint and/or scale change: covariant local features, invariant descriptors

Significant illumination change: color-normalized feature descriptors

Severe occlusions: locality of the features, geometric verification

Visually similar but different objects: feature discriminability & geometric verification

Note: the encoding is learned, but the representation rests on many assumptions.


CNN Image Retrieval Learns from BoW: Unsupervised Fine-Tuning with Hard Examples

Filip Radenović Giorgos Tolias Ondřej Chum

Center for Machine Perception, CTU in Prague

ECCV 2016

[Figure: CNN followed by global max pooling & L2-norm]


CNN Image Retrieval Learns from BoW: Unsupervised Fine-Tuning with Hard Examples

CNN image retrieval: compact image descriptors, nearest-neighbor search

CNN learning (fine-tuning): start with a CNN trained for a different but similar task (reasonable parameters), then re-train with data relevant to your task

Bag of Words: state-of-the-art retrieval performance, couples well with SfM

Unsupervised training data generation: no human interaction

Hard examples: hard positives, hard negatives

[Figure: CNN → global max pooling & L2-norm → image descriptor]
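Global max pooling followed by L2 normalization turns the last convolutional feature map into one compact vector. The sketch assumes the feature map is given as K channels of H × W values (nested lists, non-negative after a ReLU); K is the number of channels, e.g. 512 for the last convolutional layer of VGG16, which yields the 512-D descriptor.

```python
import math
from typing import List

FeatureMap = List[List[List[float]]]  # K channels, each H x W


def mac_descriptor(feature_map: FeatureMap) -> List[float]:
    """Global max pooling per channel, then L2 normalization -> K-D image descriptor."""
    pooled = [max(max(row) for row in channel) for channel in feature_map]
    norm = math.sqrt(sum(v * v for v in pooled))
    return [v / norm for v in pooled] if norm > 0 else pooled
```

For a two-channel map whose channel maxima are 3 and 4, the descriptor is `[0.6, 0.8]`. Because of the normalization, the dot product of two descriptors is their cosine similarity.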


“Lots of Training Examples”

[Figure: a large Internet photo collection is turned into training data (image annotations) for training a convolutional neural network (CNN).]

Ways to obtain the training data:

• Image annotations: not accurate, expensive ($$)

• Manual cleaning of the training data, done by researchers: very expensive ($$$$)

• Automated extraction of training data: very accurate, free ($)


Off-the-shelf CNN

• Target application: classification

• Training dataset: ImageNet

• Architecture: AlexNet & VGG

• Directly applicable to other tasks

[Figure: other tasks: fine-grained classification, object detection, image retrieval; images from ImageNet.org and PASCAL VOC 2012.]


Annotations for CNN Image Retrieval

• CNN pre-trained for a classification task, used for retrieval [Gong et al. ECCV’14, Babenko et al. ICCV’15, Kalantidis et al. arXiv’15, Tolias et al. ICLR’16]

• Fine-tuned CNN using a dataset with landmark classes [Babenko et al. ECCV’14]

• NetVLAD: weakly supervised fine-tuned CNN using GPS tags [Arandjelovic et al. CVPR’16]

• We propose: automatic annotations for CNN training

[Figure: building class vs. landmark class; spatially closest ≠ matching; hard positives and hard negatives.]


Retrieval and SfM

[Schonberger et al. CVPR’15][Radenovic et al. CVPR’16]


CNN learns from BoW – Training Data

Camera orientation known

Number of inliers known

7.4M images → 713 training 3D models [Schonberger et al. CVPR’15]


Hard Negative Examples

[Figure: a query and candidate images in order of increasing CNN descriptor distance to the query: the most similar CNN descriptor; naive hard negatives (top k by CNN); diverse hard negatives (top k, one per 3D model).]

Negative examples: images from different 3D models than the query

Hard negatives: the negative examples closest to the query

Using only hard negatives: as good as using all negatives, but faster
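The difference between naive and diverse hard negatives is a single filter. A minimal sketch, assuming each database image comes with its current CNN descriptor and the ID of the 3D model it belongs to (the dictionaries `pool` and `model_of` are illustrative); during training the descriptors would be recomputed with the current network before mining.

```python
import math
from typing import Dict, List, Sequence


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hard_negatives(query: Sequence[float], query_model: int,
                   pool: Dict[int, Sequence[float]], model_of: Dict[int, int],
                   k: int, diverse: bool = True) -> List[int]:
    """Pick the k negatives closest to the query in CNN descriptor space.

    Negatives come from a different 3D model than the query. With diverse=True,
    at most one image per 3D model is kept ("top k: one per 3D model").
    """
    candidates = sorted((dist(query, d), i) for i, d in pool.items()
                        if model_of[i] != query_model)
    chosen: List[int] = []
    used_models = set()
    for _, i in candidates:
        if diverse and model_of[i] in used_models:
            continue
        chosen.append(i)
        used_models.add(model_of[i])
        if len(chosen) == k:
            break
    return chosen
```

If the two closest negatives both come from the same 3D model, the naive variant returns both, while the diverse variant keeps the first and moves on to the next model, so a single look-alike building cannot fill the whole negative set.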


Hard Positive Examples

[Figure: positive candidates for a query, from easier to harder: top 1 by CNN (used in NetVLAD), top 1 by BoW, random from top k by BoW.]

Positive examples: images from the same 3D model as the query

Hard positives: positive examples that are not close enough to the query
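Picking a random image from the top k by BoW yields positives that the CNN does not yet find close, yet that are verified matches. A minimal sketch, assuming `bow_inliers` maps candidate images from the query's 3D model to their number of BoW inliers with the query; ranking by inlier count is an illustrative assumption about how "top k by BoW" is ordered.

```python
import random
from typing import Dict, Optional


def hard_positive(query_id: int, bow_inliers: Dict[int, int], k: int = 5,
                  rng: Optional[random.Random] = None) -> int:
    """Random image among the query's top-k BoW matches (same 3D model, query excluded)."""
    rng = rng or random.Random()
    ranked = sorted((i for i in bow_inliers if i != query_id), key=bow_inliers.get, reverse=True)
    return rng.choice(ranked[:k])
```

Compared with taking the top 1 by CNN, the randomly chosen positive is often one the network currently ranks low, which is what makes it informative for fine-tuning.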


CNN Siamese Learning

[Figure: the query and a positive image each pass through the (siamese) convolutional layers and global max pooling & L2-norm, giving a D×1 CNN descriptor; both descriptors and the pair label (1 – positive, 0 – negative) enter the contrastive loss. Example: MATCHING PAIR.]


CNN Siamese Learning

[Figure: the same two-branch network applied to a query and a non-matching image; the pair label 0 enters the contrastive loss. Example: NON-MATCHING PAIR.]

Contrastive vs. triplet loss:

• Contrastive loss works better with our data

• Contrastive loss is more strict and requires accurate training data

• Triplet loss is less sensitive to inaccurate annotation
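The two losses differ in what they constrain. The contrastive loss acts on absolute distances of every pair, so a mislabeled pair always pushes the network the wrong way, while the triplet loss only asks that the positive be closer than the negative by a margin. A minimal sketch on plain descriptor lists; the margin values are illustrative assumptions, not the values used in the paper.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(a: Sequence[float], b: Sequence[float], label: int,
                     margin: float = 0.7) -> float:
    """label 1 (matching pair): pull together; label 0 (non-matching): push apart up to the margin."""
    d = euclidean(a, b)
    if label == 1:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 0.1) -> float:
    """Only the relative order matters: the positive must be closer than the negative by a margin."""
    return max(0.0, euclidean(anchor, positive) ** 2 - euclidean(anchor, negative) ** 2 + margin)
```

A matching pair at distance √2 costs 1.0 under the contrastive loss no matter where the negatives are, whereas the same pair costs nothing in a triplet whose negative is already far enough away.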


Whitening and dimensionality reduction

1. PCAw – PCA whitening, learned on an independent set of descriptors [Babenko et al. ICCV’15, Tolias et al. ICLR’16]

2. Lw – We propose to learn the whitening from the labeled training data, using linear discriminant projections [Mikolajczyk & Matas ICCV’07]

3. End-to-end learning – performs comparably to or worse than Lw, while slowing down convergence

[Figure: CNN → global max pooling & L2-norm → D×1 CNN descriptor (end-to-end learning), followed by whitening and optional dimensionality reduction (post-processing)]
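The learned whitening (Lw) can be written down in a few lines. The equations below follow our reading of linear discriminant projections applied to CNN descriptors $f(i)$: $Y(i,j)=1$ marks a matching training pair and $Y(i,j)=0$ a non-matching one, $\mu$ is the mean descriptor, and $\operatorname{eig}(\cdot)$ denotes the matrix of eigenvectors. Treat this as an illustration of the idea rather than the exact recipe used in the paper.

```latex
C_S = \sum_{Y(i,j)=1} \bigl(f(i)-f(j)\bigr)\bigl(f(i)-f(j)\bigr)^{\top}, \qquad
C_D = \sum_{Y(i,j)=0} \bigl(f(i)-f(j)\bigr)\bigl(f(i)-f(j)\bigr)^{\top}

P = C_S^{-1/2}\,\operatorname{eig}\bigl(C_S^{-1/2}\,C_D\,C_S^{-1/2}\bigr), \qquad
\hat{f}(i) = P^{\top}\bigl(f(i)-\mu\bigr)
```

The factor $C_S^{-1/2}$ whitens the descriptor space so that differences between matching images have identity covariance; the eigenvectors then rotate onto the directions along which non-matching images differ most. Keeping only the eigenvectors with the largest eigenvalues gives the optional dimensionality reduction, and the projected descriptor is typically L2-normalized again before search.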


Efficient Scoring and Ranking

CNN descriptor encoding (512D)

Ranking is a nearest-neighbor search over the CNN descriptors; any fast nearest-neighbor method, e.g. approximate nearest-neighbor (ANN) search, can be used
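The exhaustive baseline is short. A minimal sketch, assuming L2-normalized descriptors stored in a dictionary keyed by image ID; an ANN index would replace the linear scan without changing the interface.

```python
import heapq
from typing import Dict, List, Sequence, Tuple


def top_k(query: Sequence[float], database: Dict[int, Sequence[float]],
          k: int) -> List[Tuple[float, int]]:
    """Exhaustive nearest-neighbor search returning (similarity, image ID) pairs.

    For L2-normalized descriptors ||q - x||^2 = 2 - 2 q.x, so the largest dot
    product is also the smallest Euclidean distance.
    """
    scored = ((sum(a * b for a, b in zip(query, x)), image_id) for image_id, x in database.items())
    return heapq.nlargest(k, scored)
```

For the query `[0.8, 0.6]` and the database `{1: [1, 0], 2: [0, 1], 3: [0.6, 0.8]}`, the two nearest neighbors are images 3 and 1. With 512-D vectors, even the linear scan is a single matrix-vector product per query, unlike the million-dimensional BoW vectors that needed an inverted file.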


Experiments – datasets

• Oxford 5k dataset [Philbin et al. CVPR’07]

• Paris 6k dataset [Philbin et al. CVPR’08]

• Holidays dataset [Jegou et al. ECCV’08]

• 100k distractor dataset [Philbin et al. CVPR’07]

• Protocol: mean Average Precision (mAP)

Training 3D models do not contain any landmark from these datasets.
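Mean average precision averages, over all queries, the precision measured at the rank of each relevant image. A simplified, non-interpolated sketch: the official Oxford and Paris evaluation also ignores "junk" images and differs in details, so numbers from this code will not exactly match published results.

```python
from typing import List, Sequence, Set


def average_precision(ranked: Sequence[int], relevant: Set[int]) -> float:
    """AP = mean of precision@k over the ranks k at which relevant images appear."""
    hits, precisions = 0, []
    for k, image_id in enumerate(ranked, start=1):
        if image_id in relevant:
            hits += 1
            precisions.append(hits / k)
    # Relevant images that are never retrieved contribute a precision of 0.
    return sum(precisions) / len(relevant) if relevant else 0.0


def mean_average_precision(rankings: List[Sequence[int]], relevant_sets: List[Set[int]]) -> float:
    return sum(average_precision(r, s) for r, s in zip(rankings, relevant_sets)) / len(rankings)
```

For the ranking `[1, 2, 3]` with relevant images `{1, 3}`, precision is 1/1 at rank 1 and 2/3 at rank 3, so AP = (1 + 2/3) / 2 ≈ 0.833.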


Experiments – Learning (AlexNet)

• Careful choice of positive and negative training images makes a difference

[Chart: mAP on Oxford 5k and Paris 6k for each training configuration (positives + negatives): off-the-shelf; top 1 CNN + top k CNN; top 1 CNN + top 1 / model CNN; top 1 BoW + top 1 / model CNN; random(top k BoW) + top 1 / model CNN; and with our learned whitening. Values shown: 44.2, 51.6, 56.2, 63.1, 56.7, 63.9, 59.7, 67.1, 62.2, 68.9, 60.2, 67.5.]


Experiments – Over-fitting and Generalization

• We added Oxford and Paris landmarks as 3D models and repeated fine-tuning

Only +0.3 mAP on average over all testing datasets


State-of-the-art

[Chart: comparison with the state of the art; NetVLAD 256D vs. our CNN 32D, with the values 63.5 and 69.2 highlighted.]

Concurrent work: [Gordo et al. ECCV’16]


Teacher vs. Student

Our CNN with re-ranking (R) and query expansion (QE) surpasses its teacher on all datasets!

Method (mAP)       Oxf5k   Oxf105k   Par6k   Par106k
BoW(16M)+R+QE      84.9    79.5      82.4    77.3
CNN(512D)          79.7    73.9      82.4    74.6
CNN(512D)+R+QE     85.0    81.8      86.5    78.8


Teacher vs. Student

[Figure: a query and its top 10 results (correct | incorrect) for BoW and for CNN; first incorrect at rank 127.]


Teacher vs. Student

[Figure: a query and its top 10 results (correct | incorrect) for BoW and for CNN; first incorrect at rank 159.]

Fine-tuning might not be enough


CNN descriptors

Significant viewpoint and/or scale change: lots of training data

Significant illumination change: lots of training data

Severe occlusions: lots of training data

Visually similar but different objects: lots of training data

versus

Bag of Words

Significant viewpoint and/or scale change: covariant local features, invariant descriptors

Significant illumination change: color-normalized feature descriptors

Severe occlusions: locality of the features, geometric verification

Visually similar but different objects: feature discriminability & geometric verification


CNN descriptor learning

• Proposed a method to generate the necessary “lots of training examples” without any human interaction

• Strong supervision for hard-negative mining, hard-positive mining, and supervised whitening

• Data and trained networks available at: cmp.felk.cvut.cz/~radenfil/projects/siamac.html

• For more details about the paper, visit Poster O-1A-01


So is the Bag-of-Words REALLY torn?

Not yet, but don’t mess with tape ;)


Questions?

• Thanks for your attention

• Interested students should ask about our PhD program

Center for Machine Perception

Czech Technical University in Prague

http://cmp.felk.cvut.cz

Contact Jiri Matas or Ondrej Chum