Image retrieval, vector quantization and nearest neighbor search

Transcript of slides: image.ntua.gr/iva/files/rennes.pdf

Page 1

Image retrieval, vector quantization and nearest neighbor search

Yannis Avrithis

National Technical University of Athens

Rennes, October 2014

Page 2

Part I: Image retrieval
• Particular object retrieval
• Match images under different viewpoint/lighting, occlusion
• Given local descriptors, investigate match kernels beyond Bag-of-Words

Part II: Vector quantization and nearest neighbor search
• Fast nearest neighbor search in high-dimensional spaces
• Encode vectors based on vector quantization
• Improve fit to the underlying distribution

Page 4

Part I: Image retrieval

To aggregate or not to aggregate: selective match kernels for image search

Joint work with Giorgos Tolias and Herve Jegou, ICCV 2013

Page 5

Overview

• Problem: particular object retrieval
• Build common model for matching (HE) and aggregation (VLAD) methods; derive new match kernels
• Evaluate performance under exact or approximate descriptors

Page 6

Related work

• In our common model:
  • Bag-of-Words (BoW) [Sivic & Zisserman ’03]
  • Descriptor approximation (Hamming embedding) [Jegou et al. ’08]
  • Aggregated representations (VLAD, Fisher) [Jegou et al. ’10][Perronnin et al. ’10]
• Relevant to Part II:
  • Soft (multiple) assignment [Philbin et al. ’08][Jegou et al. ’10]
• Not discussed:
  • Spatial matching [Philbin et al. ’08][Tolias & Avrithis ’11]
  • Query expansion [Chum et al. ’07][Tolias & Jegou ’13]
  • Re-ranking [Qin et al. ’11][Shen et al. ’12]

Page 9

Image representation

• Entire image: set of local descriptors X = {x_1, ..., x_n}
• Descriptors assigned to cell c: X_c = {x ∈ X : q(x) = c}

Generic set similarity

K(X, Y) = γ(X) γ(Y) ∑_{c ∈ C} w_c M(X_c, Y_c)

Page 10

Set similarity function

K(X, Y) = γ(X) γ(Y) ∑_{c ∈ C} w_c M(X_c, Y_c)

γ(X), γ(Y): normalization factors; w_c: cell weighting; M(X_c, Y_c): cell similarity
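
The methods that follow (BoW, HE, VLAD, SMK, ASMK) are all instances of this kernel. As a minimal sketch of the generic form, assuming descriptors are already assigned to cells by a quantizer and that the cell similarity M, the cell weights w (e.g. IDF) and the normalization factors γ are supplied by the specific method:

```python
def set_similarity(X_cells, Y_cells, M, w, gamma_X, gamma_Y):
    """K(X, Y) = gamma(X) * gamma(Y) * sum_c w_c * M(X_c, Y_c).

    X_cells, Y_cells: dict mapping cell index c -> array of items assigned to c
    (raw descriptors, residuals or binary codes, depending on the method)."""
    common = set(X_cells) & set(Y_cells)   # only cells occupied in both images contribute
    score = sum(w[c] * M(X_cells[c], Y_cells[c]) for c in common)
    return gamma_X * gamma_Y * score
```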

Page 16

Bag-of-Words (BoW) similarity function

Cosine similarity

M(X_c, Y_c) = |X_c| × |Y_c| = ∑_{x ∈ X_c} ∑_{y ∈ Y_c} 1

Page 19

Hamming Embedding (HE)

M(X_c, Y_c) = ∑_{x ∈ X_c} ∑_{y ∈ Y_c} w(h(b_x, b_y))

w: weight function; h: Hamming distance; b_x, b_y: binary codes
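
A small sketch of the HE cell similarity, assuming each local descriptor has already been mapped to a binary code; the decreasing weight used here (a Gaussian of the Hamming distance, cut off at a threshold) is only an illustrative choice of w, not necessarily the exact one from the original paper:

```python
import numpy as np

def he_cell_similarity(Bx, By, tau=24, sigma=16.0):
    """M(Xc, Yc) = sum over code pairs of w(h(b_x, b_y)).

    Bx, By: binary codes (0/1 uint8 arrays) of the two images' descriptors in cell c.
    w(h): illustrative decreasing weight, zero beyond the threshold tau."""
    score = 0.0
    for bx in Bx:
        for by in By:
            h = int(np.count_nonzero(bx != by))      # Hamming distance
            if h <= tau:
                score += np.exp(-(h ** 2) / (sigma ** 2))
    return score
```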

Page 25

VLAD

M(X_c, Y_c) = V(X_c)^T V(Y_c) = ∑_{x ∈ X_c} ∑_{y ∈ Y_c} r(x)^T r(y)

V(X_c) = ∑_{x ∈ X_c} r(x): aggregated residual; r(x) = x − q(x): residual
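
A sketch of the per-cell VLAD computation, assuming the descriptors assigned to cell c and the cell centroid are given as arrays:

```python
import numpy as np

def vlad_cell_vector(Xc, centroid):
    """Aggregated residual V(Xc) = sum over x in Xc of the residual x - q(x)."""
    return (np.asarray(Xc) - np.asarray(centroid)).sum(axis=0)

def vlad_cell_similarity(Xc, Yc, centroid):
    """M(Xc, Yc) = V(Xc)^T V(Yc), identical to summing r(x)^T r(y) over all pairs."""
    return float(vlad_cell_vector(Xc, centroid) @ vlad_cell_vector(Yc, centroid))
```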

Page 30

Design choices

Hamming embedding

• Binary signature & voting per descriptor (not aggregated)

• Selective: discard weak votes

VLAD

• One aggregated vector per cell

• Linear operation

Questions

• Is aggregation good with large vocabularies (e.g. 65k)?

• How important is selectivity in this case?

Page 32

Common model

Non-aggregated

M_N(X_c, Y_c) = ∑_{x ∈ X_c} ∑_{y ∈ Y_c} σ(φ(x)^T φ(y))

σ: selectivity function; φ: descriptor representation (residual, binary, scalar)

Aggregated

M_A(X_c, Y_c) = σ( ψ(∑_{x ∈ X_c} φ(x))^T ψ(∑_{y ∈ Y_c} φ(y)) ) = σ(Φ(X_c)^T Φ(Y_c))

ψ: normalization (ℓ2, power-law); Φ: cell representation
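
Both variants side by side, as a hedged NumPy sketch: Phi_x and Phi_y hold the per-descriptor representations φ(·) of one cell (one row each), and sigma and psi are passed in by the specific method (identity, thresholded power law, ℓ2 normalization, ...):

```python
import numpy as np

def M_N(Phi_x, Phi_y, sigma):
    """Non-aggregated kernel: sum over descriptor pairs of sigma(phi(x)^T phi(y))."""
    return float(np.sum(sigma(Phi_x @ Phi_y.T)))

def M_A(Phi_x, Phi_y, sigma, psi):
    """Aggregated kernel: sigma(psi(sum_x phi(x))^T psi(sum_y phi(y)))."""
    return float(sigma(psi(Phi_x.sum(axis=0)) @ psi(Phi_y.sum(axis=0))))
```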

Page 38

BoW, HE and VLAD in the common model

Model | M(X_c, Y_c) | φ(x) | σ(u)           | ψ(z) | Φ(X_c)
BoW   | M_N or M_A  | 1    | u              | z    | |X_c|
HE    | M_N only    | b_x  | w(B(1 − u)/2)  | –    | –
VLAD  | M_N or M_A  | r(x) | u              | z    | V(X_c)

BoW:  M(X_c, Y_c) = ∑_{x ∈ X_c} ∑_{y ∈ Y_c} 1 = |X_c| × |Y_c|
HE:   M(X_c, Y_c) = ∑_{x ∈ X_c} ∑_{y ∈ Y_c} w(h(b_x, b_y))
VLAD: M(X_c, Y_c) = ∑_{x ∈ X_c} ∑_{y ∈ Y_c} r(x)^T r(y) = V(X_c)^T V(Y_c)

Page 41

Selective Match Kernel (SMK)

SMK(X_c, Y_c) = ∑_{x ∈ X_c} ∑_{y ∈ Y_c} σ_α(r̂(x)^T r̂(y))

• Descriptor representation: ℓ2-normalized residual
  φ(x) = r̂(x) = r(x) / ‖r(x)‖
• Selectivity function
  σ_α(u) = sign(u)|u|^α if u > τ, 0 otherwise

[Plot: similarity score vs. dot product for σ_α with α = 3, τ = 0]
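
A sketch of σ_α and of the SMK cell similarity over ℓ2-normalized residuals, under the definitions above:

```python
import numpy as np

def sigma_alpha(u, alpha=3.0, tau=0.0):
    """Selectivity: sign(u)|u|^alpha where u > tau, 0 otherwise (elementwise)."""
    u = np.asarray(u, dtype=float)
    return np.where(u > tau, np.sign(u) * np.abs(u) ** alpha, 0.0)

def smk_cell_similarity(Rx, Ry, alpha=3.0, tau=0.0):
    """SMK(Xc, Yc) = sum over pairs of sigma_alpha(r_hat(x)^T r_hat(y)).

    Rx, Ry: residuals r(x) of the two images' descriptors in cell c, one per row."""
    Rx = Rx / np.linalg.norm(Rx, axis=1, keepdims=True)   # L2-normalize each residual
    Ry = Ry / np.linalg.norm(Ry, axis=1, keepdims=True)
    return float(sigma_alpha(Rx @ Ry.T, alpha, tau).sum())
```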

Page 43

Matching example—impact of threshold

α = 1, τ = 0.0

α = 1, τ = 0.25

thresholding removes false correspondences

Page 44

Matching example—impact of shape parameter

α = 3, τ = 0.0

α = 3, τ = 0.25

weighs matches based on confidence

Page 45

Aggregated Selective Match Kernel (ASMK)

ASMK(X_c, Y_c) = σ_α(V̂(X_c)^T V̂(Y_c))

• Cell representation: ℓ2-normalized aggregated residual
  Φ(X_c) = V̂(X_c) = V(X_c) / ‖V(X_c)‖
• Similar to [Arandjelovic & Zisserman ’13], but:
  • with selectivity function σ_α
  • used with large vocabularies
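
The aggregated counterpart, sketched under the same assumptions as the SMK snippet (residuals of one cell as rows):

```python
import numpy as np

def asmk_cell_similarity(Rx, Ry, alpha=3.0, tau=0.0):
    """ASMK(Xc, Yc) = sigma_alpha(V_hat(Xc)^T V_hat(Yc))."""
    vx = Rx.sum(axis=0)                                    # aggregate residuals of the cell
    vy = Ry.sum(axis=0)
    u = float(vx @ vy) / (np.linalg.norm(vx) * np.linalg.norm(vy))   # normalized dot product
    return np.sign(u) * abs(u) ** alpha if u > tau else 0.0          # selectivity sigma_alpha
```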

Page 47

Aggregated features: k = 128 as in VLAD

Page 48

Aggregated features: k = 65K as in ASMK

Page 49

Why to aggregate: burstiness

• Burstiness: non-iid statistical behaviour of descriptors

• Matches of bursty features dominate the total similarity score

• Previous work: [Jegou et al. ’09][Chum & Matas ’10][Torii et al. ’13]

In this work

• Aggregation and normalization per cell handles burstiness

• Keeps a single representative, similar to max-pooling

Page 51

Binary counterparts SMK* and ASMK*

• Full vector representation: high memory cost
• Approximate vector representation: binary vector

SMK*(X_c, Y_c) = ∑_{x ∈ X_c} ∑_{y ∈ Y_c} σ_α(b(r(x))^T b(r(y)))

ASMK*(X_c, Y_c) = σ_α( b(∑_{x ∈ X_c} r(x))^T b(∑_{y ∈ Y_c} r(y)) )

b includes centering and rotation as in HE, followed by binarization and ℓ2-normalization
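
A hedged sketch of one possible b(·); the exact centering and rotation follow HE, so the mean vector mu and rotation matrix P used here are placeholders for the HE parameters:

```python
import numpy as np

def binarize(r, mu, P):
    """b(r): center by mu and rotate by P (as in HE), then binarize to +-1 and
    L2-normalize so that dot products of codes stay in [-1, 1]."""
    z = P @ (r - mu)
    b = np.where(z >= 0.0, 1.0, -1.0)
    return b / np.sqrt(b.size)
```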

Page 52

Impact of selectivity

[Plots: mAP (64-82) vs. α (1-7) for SMK, SMK*, ASMK and ASMK* on Oxford5k, Paris6k and Holidays]

Page 53

Impact of aggregation

• Improves performance for different vocabulary sizes
• Reduces memory requirements of the inverted file

k   | memory ratio
8k  | 69%
16k | 78%
32k | 85%
65k | 89%

[Plot: mAP vs. k (8k-65k) for SMK, SMK-BURST and ASMK on Oxford5k with multiple assignment (MA)]

With k = 8k on Oxford5k:
• VLAD → 65.5%
• SMK → 74.2%
• ASMK → 78.1%

Page 54

Comparison to state of the art

Method                              | MA | Oxf5k | Oxf105k | Par6k | Holidays
ASMK*                               |    | 76.4  | 69.2    | 74.4  | 80.0
ASMK*                               | ×  | 80.4  | 75.0    | 77.0  | 81.0
ASMK                                |    | 78.1  | -       | 76.0  | 81.2
ASMK                                | ×  | 81.7  | -       | 78.2  | 82.2
HE [Jegou et al. ’10]               |    | 51.7  | -       | -     | 74.5
HE [Jegou et al. ’10]               | ×  | 56.1  | -       | -     | 77.5
HE-BURST [Jain et al. ’10]          |    | 64.5  | -       | -     | 78.0
HE-BURST [Jain et al. ’10]          | ×  | 67.4  | -       | -     | 79.6
Fine vocab. [Mikulík et al. ’10]    | ×  | 74.2  | 67.4    | 74.9  | 74.9
AHE-BURST [Jain et al. ’10]         |    | 66.6  | -       | -     | 79.4
AHE-BURST [Jain et al. ’10]         | ×  | 69.8  | -       | -     | 81.9
Rep. structures [Torii et al. ’13]  | ×  | 65.6  | -       | -     | 74.9

Page 55

Discussion

• Aggregation is also beneficial with large vocabularies → burstiness

• Selectivity always helps (with or without aggregation)

• Descriptor approximation reduces performance only slightly

Page 56

Part II: Vector quantization and nearest neighbor search

Locally optimized product quantization

Joint work with Yannis Kalantidis, CVPR 2014

Page 57

Overview

• Problem: given a query point q, find its nearest neighbor with respect to Euclidean distance within a data set X in a d-dimensional space
• Focus on large scale: encode (compress) vectors, speed up distance computations
• Better fit the underlying distribution with little space & time overhead

Page 58

Applications

• Retrieval (image as point) [Jegou et al. ’10][Perronnin et al. ’10]

• Retrieval (descriptor as point) [Tolias et al. ’13][Qin et al. ’13]

• Localization, pose estimation [Sattler et al. ’12][Li et al. ’12]

• Classification [Boiman et al. ’08][McCann & Lowe ’12]

• Clustering [Philbin et al. ’07][Avrithis ’13]

Page 59

Related work

• Indexing
  • Inverted index (image retrieval)
  • Inverted multi-index [Babenko & Lempitsky ’12] (nearest neighbor search)
• Encoding and ranking
  • Vector quantization (VQ)
  • Product quantization (PQ) [Jegou et al. ’11]
  • Optimized product quantization (OPQ) [Ge et al. ’13], Cartesian k-means [Norouzi & Fleet ’13]
  • Locally optimized product quantization (LOPQ) [Kalantidis and Avrithis ’14]
• Not discussed
  • Tree-based indexing, e.g., [Muja and Lowe ’09]
  • Hashing and binary codes, e.g., [Norouzi et al. ’12]

Page 62

Inverted index

[Figure: query descriptors quantized to visual words (54, 67, 72); the inverted file stores, for each word, the list of images (12-22) in which it occurs; votes accumulate over the postings lists of the query words, producing a ranked shortlist of images]
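
As a toy illustration of the mechanism (not the implementation behind the figure), the inverted file and the vote accumulation can be sketched as:

```python
from collections import defaultdict

def build_inverted_file(image_words):
    """image_words: dict image_id -> list of visual word ids occurring in that image.
    Returns word id -> sorted list of (image_id, term frequency)."""
    postings = defaultdict(lambda: defaultdict(int))
    for img, words in image_words.items():
        for w in words:
            postings[w][img] += 1
    return {w: sorted(d.items()) for w, d in postings.items()}

def rank(query_words, postings):
    """Accumulate votes only over the postings lists of the query's words."""
    scores = defaultdict(int)
    for w in query_words:
        for img, tf in postings.get(w, []):
            scores[img] += tf
    return sorted(scores.items(), key=lambda t: -t[1])    # ranked shortlist
```

This ignores IDF weighting and score normalization, which a real BoW retrieval system would add.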

Page 67

Inverted index—issues

• Are items in a postings list equally important?

• What changes under soft (multiple) assignment?

• How should vectors be encoded for memory efficiency and fast search?

Page 68

Inverted multi-index

[Figure 1 from Babenko & Lempitsky ’12: 600 points in the unit 2D square indexed by an inverted index (one codebook of 16 2D codewords) vs. an inverted multi-index (two codebooks of 16 1D codewords each); at the same query-to-codebook cost, the multi-index partitions the space much more finely and returns short candidate lists centered on the query via the multi-sequence algorithm]

• decompose vectors as x = (x1, x2)
• train codebooks C1, C2 from datasets {x1_n}, {x2_n}
• induced codebook C1 × C2 gives a finer partition
• given query q, visit cells (c1, c2) ∈ C1 × C2 in ascending order of distance to q by the multi-sequence algorithm

Page 71

Multi-sequence algorithm

[Figure: grid of cells indexed by C1 (columns, →) and C2 (rows, ↓), traversed in order of increasing distance to the query]
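
A compact sketch of the traversal, in the standard priority-queue formulation; the distances of the two query sub-vectors to the codewords of C1 and C2 are assumed precomputed:

```python
import heapq
import numpy as np

def multi_sequence(d1, d2):
    """Yield cells (i, j) of C1 x C2 in ascending order of d1[i] + d2[j].

    d1, d2: distances from the query sub-vectors to the codewords of C1, C2."""
    o1, o2 = np.argsort(d1), np.argsort(d2)        # codewords sorted by distance
    heap = [(d1[o1[0]] + d2[o2[0]], 0, 0)]
    seen = {(0, 0)}
    while heap:
        dist, a, b = heapq.heappop(heap)
        yield o1[a], o2[b], dist                   # next closest cell of the product grid
        for na, nb in ((a + 1, b), (a, b + 1)):
            if na < len(o1) and nb < len(o2) and (na, nb) not in seen:
                seen.add((na, nb))
                heapq.heappush(heap, (d1[o1[na]] + d2[o2[nb]], na, nb))
```

The caller consumes cells lazily and stops as soon as enough candidate points have been collected.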

Page 72

Vector quantization (VQ)

minimize over codebook C:  ∑_{x ∈ X} min_{c ∈ C} ‖x − c‖² = ∑_{x ∈ X} ‖x − q(x)‖² = E(C)

X: dataset; C: codebook; q: quantizer; E(C): distortion
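
A small NumPy sketch of the quantizer and its distortion; training the codebook itself (typically k-means / Lloyd iterations) is not shown:

```python
import numpy as np

def vq_assign_and_distortion(X, C):
    """q(x) = nearest codeword of C; E(C) = sum_x ||x - q(x)||^2.

    X: (n, d) data points, C: (k, d) codebook."""
    # squared Euclidean distances via the expansion ||x||^2 - 2 x.c + ||c||^2
    d2 = (X ** 2).sum(1)[:, None] - 2.0 * X @ C.T + (C ** 2).sum(1)[None, :]
    assign = d2.argmin(axis=1)
    distortion = float(d2[np.arange(len(X)), assign].sum())
    return assign, distortion
```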

Page 75

Vector quantization (VQ)

• For small distortion → large k = |C|:
  • hard to train
  • too large to store
  • too slow to search

Page 76

Product quantization (PQ)

minimize ∑_{x ∈ X} min_{c ∈ C} ‖x − c‖²
subject to C = C1 × ··· × Cm

Page 77

Product quantization (PQ)

• train: q = (q1, ..., qm) where q1, ..., qm are obtained by VQ
• store: |C| = k^m with |C1| = ··· = |Cm| = k
• search: ‖y − q(x)‖² = ∑_{j=1}^{m} ‖y_j − q_j(x_j)‖² where q_j(x_j) ∈ C_j
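
A sketch of PQ encoding and of the distance ‖y − q(x)‖², assuming m codebooks of k codewords over equal-size sub-vectors; real implementations precompute an m × k lookup table of sub-distances once per query instead of recomputing them per database point:

```python
import numpy as np

def pq_encode(x, codebooks):
    """Encode x as m codeword indices, one per sub-vector (len(x) divisible by m)."""
    subs = np.split(x, len(codebooks))
    return [int(((C - s) ** 2).sum(1).argmin()) for s, C in zip(subs, codebooks)]

def pq_distance(y, code, codebooks):
    """||y - q(x)||^2 = sum_j ||y_j - q_j(x_j)||^2, computed from the code of x only."""
    subs = np.split(y, len(codebooks))
    return float(sum(((s - C[c]) ** 2).sum() for s, c, C in zip(subs, code, codebooks)))
```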

Page 78

Optimized product quantization (OPQ)

minimize ∑_{x ∈ X} min_{c ∈ C} ‖x − Rc‖²
subject to C = C1 × ··· × Cm, R^T R = I

Page 79

OPQ, parametric solution for X ∼ N(0, Σ)

• independence: PCA-align by diagonalizing Σ as U Λ U^T
• balanced variance: permute Λ such that ∏_i λ_i is constant in each subspace; R ← U P_π^T
• find C by PQ on rotated data x̂ = R^T x
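
A sketch of the parametric solution; the greedy balancing of eigenvalue products (done in the log domain) is one simple way to realize the permutation, and d is assumed divisible by m:

```python
import numpy as np

def opq_parametric_rotation(X, m):
    """PCA-align the data, then group the principal directions into m subspaces
    with (approximately) balanced products of eigenvalues. Returns R; the rotated
    data is X @ R, on which PQ is then trained."""
    Sigma = np.cov(X, rowvar=False)
    lam, U = np.linalg.eigh(Sigma)                       # Sigma = U diag(lam) U^T
    order = np.argsort(lam)[::-1]                        # eigenvalues, largest first
    per = X.shape[1] // m                                # dimensions per subspace
    buckets, loads = [[] for _ in range(m)], np.zeros(m)
    for i in order:                                      # assign to the least-loaded bucket
        open_ = np.where([len(b) < per for b in buckets], loads, np.inf)
        j = int(np.argmin(open_))
        buckets[j].append(i)
        loads[j] += np.log(max(lam[i], 1e-12))
    perm = [i for b in buckets for i in b]
    return U[:, perm]                                    # R = U P_pi^T
```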

Page 80

Locally optimized product quantization (LOPQ)

• compute residuals r(x) = x − q(x) on a coarse quantizer q
• collect residuals Z_i = {r(x) : q(x) = c_i} per cell
• train (R_i, q_i) ← OPQ(Z_i) per cell
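
A sketch of the per-cell training loop; opq_parametric_rotation is the sketch from the previous page, and train_pq is a hypothetical helper that runs k-means on each sub-vector of its input:

```python
import numpy as np

def train_lopq(X, coarse_centroids, m):
    """Per coarse cell: collect residuals, locally optimize a rotation and train a
    product quantizer on the rotated residuals."""
    d2 = ((X ** 2).sum(1)[:, None] - 2.0 * X @ coarse_centroids.T
          + (coarse_centroids ** 2).sum(1)[None, :])
    assign = d2.argmin(axis=1)                 # coarse quantizer q(x)
    local = []
    for i, c in enumerate(coarse_centroids):
        Z = X[assign == i] - c                 # residuals r(x) = x - q(x) of cell i
        R = opq_parametric_rotation(Z, m)      # local rotation (sketch above)
        local.append((R, train_pq(Z @ R, m)))  # train_pq: hypothetical PQ trainer
    return local
```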

Page 81

Locally optimized product quantization (LOPQ)

• better captures the support of the data distribution, like local PCA [Kambhatla & Leen ’97]
  • multimodal (e.g. mixture) distributions
  • distributions on nonlinear manifolds
• residual distributions closer to the Gaussian assumption

Page 82

Multi-LOPQ

[Figure: x = (x1, x2) quantized per sub-vector by coarse quantizers q1 and q2, as in the multi-index]

Page 83

Comparison to state of the art
SIFT1B, 64-bit codes

Method                                 | R = 1 | R = 10 | R = 100
Ck-means [Norouzi & Fleet ’13]         | –     | –      | 0.649
IVFADC                                 | 0.106 | 0.379  | 0.748
IVFADC [Jegou et al. ’11]              | 0.088 | 0.372  | 0.733
OPQ                                    | 0.114 | 0.399  | 0.777
Multi-D-ADC [Babenko & Lempitsky ’12]  | 0.165 | 0.517  | 0.860
LOR+PQ                                 | 0.183 | 0.565  | 0.889
LOPQ                                   | 0.199 | 0.586  | 0.909

Most benefit comes from locally optimized rotation!

Page 85

Comparison to state of the art
SIFT1B, 128-bit codes

T    | Method                                 | R = 1 | R = 10 | R = 100
20K  | IVFADC+R [Jegou et al. ’11]            | 0.262 | 0.701  | 0.962
20K  | LOPQ+R                                 | 0.350 | 0.820  | 0.978
10K  | Multi-D-ADC [Babenko & Lempitsky ’12]  | 0.304 | 0.665  | 0.740
10K  | OMulti-D-OADC [Ge et al. ’13]          | 0.345 | 0.725  | 0.794
10K  | Multi-LOPQ                             | 0.430 | 0.761  | 0.782
30K  | Multi-D-ADC [Babenko & Lempitsky ’12]  | 0.328 | 0.757  | 0.885
30K  | OMulti-D-OADC [Ge et al. ’13]          | 0.366 | 0.807  | 0.913
30K  | Multi-LOPQ                             | 0.463 | 0.865  | 0.905
100K | Multi-D-ADC [Babenko & Lempitsky ’12]  | 0.334 | 0.793  | 0.959
100K | OMulti-D-OADC [Ge et al. ’13]          | 0.373 | 0.841  | 0.973
100K | Multi-LOPQ                             | 0.476 | 0.919  | 0.973

Page 86

Residual encoding in related work

• PQ (IVFADC) [Jegou et al. ’11]: single product quantizer for all cells
• [Uchida et al. ’12]: multiple product quantizers shared by multiple cells
• OPQ [Ge et al. ’13]: single product quantizer for all cells, globally optimized for rotation (single/multi-index)
• LOPQ: with/without one product quantizer per cell, with/without rotation optimization per cell (single/multi-index)
• [Babenko & Lempitsky ’14]: one product quantizer per cell, optimized for rotation per cell (multi-index)

Page 88

http://image.ntua.gr/iva/research/

Thank you!