
Visual vocabulary with a semantic twist

Relja Arandjelović and Andrew Zisserman

Visual Geometry Group, Department of Engineering Science, University of Oxford

Motivation and objectives

Semantic vocabulary

Results

Fast Semantic Segmentation via Soft Segments (FSSS)

(paper within a paper)

• Standard large scale instance retrieval:

- Usually based on matching local descriptors, e.g. (Root)SIFT

- Not distinctive enough

- Can't "see the big picture"

• SemanticSIFT:

- Matching: utilize local image semantic content

[Figure: example matches before and after semantic filtering]

• Suppose we have pixel-wise semantic segmentation into C classes

• Assign a "semantic word" to a local image patch:

- The patch contains semantic class c if it contains at least one pixel of class c

- Number of possible semantic words: Ks = 2^C - 1

- For our choice: {sky, flora, other} (C=3) there are Ks=7 semantic words: {sky}, {flora}, {other}, {sky, flora}, {sky, other}, {flora, other}, {sky, flora, other} (see the sketch below)
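A minimal sketch of this assignment, assuming a per-pixel integer label map; the class order and all names below are illustrative, not taken from the paper:

    import numpy as np

    # Illustrative class order for C = 3; index 0 = sky, 1 = flora, 2 = other.
    CLASSES = ("sky", "flora", "other")

    def semantic_word(label_map, patch_box):
        """Return the semantic word of a patch as the set of classes it contains.

        label_map : 2-D integer array of per-pixel class indices
        patch_box : (row0, row1, col0, col1) bounds of the local patch
        """
        r0, r1, c0, c1 = patch_box
        patch = label_map[r0:r1, c0:c1]
        # A patch contains class c if at least one of its pixels is labelled c,
        # so with C classes there are 2**C - 1 possible (non-empty) words.
        return frozenset(CLASSES[c] for c in np.unique(patch))

    # Toy example: top half sky, bottom half "other".
    labels = np.zeros((500, 500), dtype=int)
    labels[250:, :] = 2
    print(semantic_word(labels, (200, 300, 0, 500)))   # frozenset({'sky', 'other'})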

Matching

Product vocabulary

Feature removal

• Patches can match only if their semantic words are identical

• Win #1: Increases precision due to stricter matching

• SemanticSIFT vocabulary: product vocabulary of the visual and semantic vocabularies; size K = Ksemantic x Kvisual (see the sketch after this list)

• Large scale retrieval: ranking via inverted index which exploits bag-of-words sparsity

- Larger vocabulary => shorter posting lists => fewer items to traverse during scoring => faster retrieval

• Win #2: Faster retrieval due to the larger (product) vocabulary
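A toy sketch of the product vocabulary and the inverted-index scoring it enables. Sizes, names, and the plain bag-of-words voting are illustrative assumptions; the actual system additionally uses Hamming Embedding and burstiness normalisation:

    from collections import defaultdict

    K_VISUAL = 100_000    # toy visual vocabulary size (assumption)
    K_SEMANTIC = 7        # 2**3 - 1 semantic words for {sky, flora, other}

    def product_word(visual_word, semantic_word_id):
        # One id per (semantic word, visual word) pair: K = K_SEMANTIC * K_VISUAL.
        return semantic_word_id * K_VISUAL + visual_word

    # Inverted index: product word id -> posting list of image ids.
    index = defaultdict(list)

    def add_feature(image_id, visual_word, semantic_word_id):
        index[product_word(visual_word, semantic_word_id)].append(image_id)

    def score(query_features):
        # Features vote only within their own posting list, so descriptors with
        # different semantic words never match (Win #1), and each posting list is
        # roughly K_SEMANTIC times shorter, so scoring traverses fewer items (Win #2).
        scores = defaultdict(int)
        for visual_word, semantic_word_id in query_features:
            for image_id in index[product_word(visual_word, semantic_word_id)]:
                scores[image_id] += 1
        return scores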

• For a specific task: some features are not useful, or even detrimental

• Can remove features known a priori to be irrelevant (see the sketch below)

• Win #3: Reduced storage (RAM) costs
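One possible way to realise the a-priori removal. Which semantic words to drop is task-specific; treating sky-only patches as irrelevant for landmark retrieval is only an illustrative assumption:

    # Semantic words assumed irrelevant for the task (illustrative choice only;
    # e.g. patches containing nothing but sky when retrieving buildings).
    IRRELEVANT_SEMANTIC_WORDS = {frozenset({"sky"})}

    def keep_feature(semantic_word):
        # Drop a-priori irrelevant features before they are indexed at all,
        # so they never occupy space in the inverted index (Win #3).
        return semantic_word not in IRRELEVANT_SEMANTIC_WORDS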

Win-Win-Win

• Testing on Oxford 5k and 105k datasets, training on Paris6k

• Baseline: Hamming Embedding + burstiness

• Averaged over 5 random seeds: +1.2% mAP improvement over the baseline

• Baseline with 7x larger visual vocabulary, Oxford 5k: 54.9%

• Expected speedup for an average query for Oxford 105k and SoftSemanticSIFT: 38.4%

[Table: Mean average precision (mAP)]

[Plot: Empirical speedup for the 55 Oxford queries]

• State-of-the-art semantic segmentation methods take minutes per image

• We introduce a new method which takes 7 seconds on a single CPU in MATLAB for a 500x500 pixel image

• Code available: www.robots.ox.ac.uk/~vgg/software/fast_semantic_segmentation

• Idea:

- Start with fast soft-segmentation method by Leordeanu et al. ECCV 2012 (takes 1.7s)

- To handle segmentation uncertainty: introduce an "unknown" class and allow it to match all classes (see the sketch after this list)

- Minimize an energy which stimulates agreement between soft-segments and similar pixels, taking into account soft-segment unary potentials
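One plausible reading of the "unknown" class at matching time, as a hedged sketch: the poster does not spell out the exact rule, so the compatibility test and names below are assumptions for illustration only:

    ALL_CLASSES = frozenset({"sky", "flora", "other"})
    UNKNOWN = "unknown"

    def compatible(word_a, word_b):
        # Pixels labelled "unknown" may belong to any class, so a semantic word
        # containing "unknown" can stand in for any superset of its known classes.
        # Two words are treated as compatible if each one's known classes are
        # achievable by the other.  (Interpretation assumed for illustration.)
        known_a, known_b = word_a - {UNKNOWN}, word_b - {UNKNOWN}
        possible_a = known_a | (ALL_CLASSES if UNKNOWN in word_a else frozenset())
        possible_b = known_b | (ALL_CLASSES if UNKNOWN in word_b else frozenset())
        return known_a <= possible_b and known_b <= possible_a

    # {"sky", "unknown"} is compatible with {"sky", "flora"}; {"sky"} and {"flora"} are not.
    assert compatible({"sky", UNKNOWN}, {"sky", "flora"})
    assert not compatible({"sky"}, {"flora"})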

• Results:

- Stanford background dataset: 78% @ 3.7 s / image

- State of the art: Lempitsky et al. (2011): 81.9% @ minutes per image, due to using globalPb

- Tighe & Lazebnik (2010): 77.5% @ 10 min / image

[Figure: false matches based on SIFT that are removed by semantic filtering (no geometric verification)]