Similar image search

Similar Image Search with a Tiny Bag-of-Delegates Representation

Weiwen Tu, Rong Pan, Jingdong Wang

Similar Image Search

• text-based image search
• context-based image search

Tiny Bag-of-Delegates

Given a large image set $\mathcal{I} = \{I_1, \dots, I_N\}$, our goal is to find a bag-of-delegates to represent each image.

Each image $I_i$ is represented by M delegates, $I_i \to (w_{i1}, \dots, w_{iM})$.

So we need to build M vocabularies $V_1, \dots, V_M$, where $|V_m| = J$, with J being the size of each vocabulary.

From each vocabulary, one element is selected as the delegate for each image.

To conduct search, we organize the image set using M inverted indices $T_1, \dots, T_M$, each corresponding to one vocabulary.

An inverted index is composed of J lists. Each list corresponds to a word and consists of the images that are mapped to that word ($T_m$ corresponds to $V_m$).

Example for Tiny Bag-of-Delegates

$I_1 \to (v_1, v_3, v_5)$

$I_2 \to (v_1, v_4, v_6)$

$I_3 \to (v_2, v_3, v_6)$

[Figure: vocabularies $V_1$, $V_2$, $V_3$ and their inverted indices $T_1$, $T_2$, $T_3$]

Query:


Given an image whose bag-of-delegates is $(w_1, \dots, w_M)$, we find the lists $T_1(w_1), \dots, T_M(w_M)$,

where $T_m(\cdot)$ is a function mapping from the delegate $w$ to the list in the inverted index $T_m$. We then regard all the images in these lists as candidate similar images, and finally order these images according to their similarities computed from their features.
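The indexing and query steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `build_inverted_indices` and `search`, the 2-D feature vectors, and the use of Euclidean distance for the final ranking are all hypothetical and do not come from the slides.

```python
import math
from collections import defaultdict


def build_inverted_indices(delegates, num_vocabularies):
    """Build one inverted index per vocabulary: word -> list of image ids."""
    indices = [defaultdict(list) for _ in range(num_vocabularies)]
    for image_id, words in delegates.items():
        for m, word in enumerate(words):
            indices[m][word].append(image_id)
    return indices


def search(query_words, query_feature, indices, features, top_k):
    """Collect candidates from the M matching lists, then rank by feature distance."""
    candidates = set()
    for m, word in enumerate(query_words):
        candidates.update(indices[m].get(word, []))
    # Assumption: similarity is Euclidean distance between feature vectors.
    ranked = sorted(candidates, key=lambda i: math.dist(features[i], query_feature))
    return ranked[:top_k]


# Delegates of the three example images; the 2-D features are made up.
delegates = {
    "I1": ("v1", "v3", "v5"),
    "I2": ("v1", "v4", "v6"),
    "I3": ("v2", "v3", "v6"),
}
features = {"I1": (0.0, 0.0), "I2": (1.0, 0.0), "I3": (5.0, 5.0)}

indices = build_inverted_indices(delegates, num_vocabularies=3)
result = search(("v1", "v4", "v5"), (0.9, 0.1), indices, features, top_k=2)
print(result)
```

Only images that share at least one delegate with the query are ever compared by feature, which is what keeps the query cheap.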

Spatial Partition Tree

• A spatial partition tree recursively subdivides a space into subsets using hyperplanes.

• Popular data structures: Kd-tree, PCA-tree, VP-tree.

Example for Spatial Partition Tree

The figure on the left shows a spatial partitioning induced by an RP tree. The cross is a query point and the lines are the partitioning hyperplanes.

[Figure: spatial partition with regions labeled $v_1$ through $v_8$]

Tree Construction

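procedure MAKETREE(S)
  if |S| < MinSize then return (Leaf)
  else
    Rule ← ChooseRule(S)
    LeftTree ← MakeTree({x ∈ S : Rule(x) = true})
    RightTree ← MakeTree({x ∈ S : Rule(x) = false})
    return ([Rule, LeftTree, RightTree])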

The split method is called ChooseRule. The core tree-building procedure is called MakeTree and takes a data set S as input.
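A minimal Python sketch of MakeTree follows. The slides do not say which ChooseRule is used, so this sketch assumes a random-projection rule that splits at the median projection. The names `choose_rule`, `make_tree`, `find_leaf`, and the value of `MIN_SIZE` are illustrative choices, not the authors' implementation.

```python
import random

MIN_SIZE = 4  # assumed leaf threshold (MinSize)


def choose_rule(points, rng):
    """Assumed split rule: project on a random direction, split at the median."""
    dim = len(points[0])
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    projections = sorted(sum(d * x for d, x in zip(direction, p)) for p in points)
    threshold = projections[len(projections) // 2]
    return lambda p: sum(d * x for d, x in zip(direction, p)) < threshold


def make_tree(points, rng, min_size=MIN_SIZE):
    """Recursively split the data set until a node holds fewer than min_size points."""
    if len(points) < min_size:
        return ("leaf", points)
    rule = choose_rule(points, rng)
    left = [p for p in points if rule(p)]
    right = [p for p in points if not rule(p)]
    if not left or not right:  # degenerate split (e.g. duplicate points): stop here
        return ("leaf", points)
    return ("node", rule, make_tree(left, rng, min_size), make_tree(right, rng, min_size))


def find_leaf(tree, point):
    """Follow the split rules from the root down to the bucket containing `point`."""
    while tree[0] == "node":
        _, rule, left, right = tree
        tree = left if rule(point) else right
    return tree[1]


rng = random.Random(0)
points = [tuple(rng.random() for _ in range(3)) for _ in range(50)]
tree = make_tree(points, rng)
bucket = find_leaf(tree, points[0])
```

Each leaf plays the role of one word in a vocabulary, and the points stored in it form the corresponding list of the inverted index.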

Problem

A larger number of trees yields better performance, but requires more storage for the inverted indices and, accordingly, more query time.

Problem: Can we use a small number of trees while still guaranteeing search performance?

Vocabulary Construction

Our idea is to use supervised information in constructing trees. In similar image search, the supervised information is the true nearest neighbors of each image.

We denote the true neighbors of an image $I_i$ by a list $L_i$, whose entries are the indices of its similar images.

Criterion

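The nearest neighbor candidates of an image $I_i$ discovered from the bag-of-delegates representation can be written as

$C_i = \bigcup_{m=1}^{M} T_m(w_{im})$,

where $T_m(w_{im})$ is the list in the inverted index $T_m$ that the delegate $w_{im}$ maps to.

The recall of $I_i$, with $L_i$ its list of true neighbors: $r_i = |C_i \cap L_i| / |L_i|$

The average recall over the N images: $\bar{r} = \frac{1}{N} \sum_{i=1}^{N} r_i$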
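As a rough illustration, the recall of the candidate set can be computed as below. The sketch assumes the standard definition of recall (the fraction of an image's true neighbors that appear among its candidates). The helper names and the neighbor lists are made up for the example.

```python
def candidate_set(image_id, delegates, indices):
    """Union of the lists that the image's delegates map to, one per inverted index."""
    found = set()
    for m, word in enumerate(delegates[image_id]):
        found.update(indices[m].get(word, []))
    found.discard(image_id)  # assumption: an image is not counted as its own neighbor
    return found


def recall(image_id, delegates, indices, true_neighbors):
    """Fraction of the true neighbors that are found among the candidates."""
    truth = set(true_neighbors[image_id])
    hits = candidate_set(image_id, delegates, indices) & truth
    return len(hits) / len(truth)


def average_recall(delegates, indices, true_neighbors):
    """Mean of the per-image recalls."""
    total = sum(recall(i, delegates, indices, true_neighbors) for i in delegates)
    return total / len(delegates)


# Two vocabularies and two inverted indices; the neighbor lists are hypothetical.
delegates = {"I1": ("v1", "v3"), "I2": ("v1", "v4"), "I3": ("v2", "v3")}
indices = [
    {"v1": ["I1", "I2"], "v2": ["I3"]},
    {"v3": ["I1", "I3"], "v4": ["I2"]},
]
true_neighbors = {"I1": ["I2"], "I2": ["I1", "I3"], "I3": ["I1"]}

avg = average_recall(delegates, indices, true_neighbors)
```

Here I2 recovers only one of its two neighbors, so its recall is 0.5 while the other two images reach 1.0.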

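Given a set of Z candidate inverted indices $\bar{\mathcal{T}} = \{T_1, \dots, T_Z\}$,

the goal is to find M inverted indices such that the recall is maximized. The objective function is written as

$\mathcal{T}^* = \arg\max_{\mathcal{T}} \; \mathrm{recall}(\mathcal{T})$,

where $\mathcal{T} \subseteq \bar{\mathcal{T}}$ and $|\mathcal{T}| = M$.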

Forward selection scheme

We progressively generate the candidate inverted indices at each step. At the beginning, we randomly generate a set of candidate inverted indices.

We evaluate each inverted index to compute its recall, and then identify the first inverted index as the one with the largest recall. Denote the identified inverted index as $T_1$; the current solution is $\{T_1\}$.

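The later steps find the remaining inverted indices one by one. Consider step $t + 1$, at which we have already found t indices $\{T_1, \dots, T_t\}$. We generate a set of new candidates

and form the whole candidate set $\bar{\mathcal{T}}$.

The objective of identifying the (t + 1)-th index is as follows,

$T_{t+1} = \arg\max_{T \in \bar{\mathcal{T}}} \; \mathrm{recall}(\{T_1, \dots, T_t\} \cup \{T\})$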
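The greedy forward selection can be sketched as follows. Several simplifications are assumptions of this sketch rather than details from the slides: each inverted index is reduced to a mapping from image to bucket id, recall is measured as the fraction of true-neighbor pairs that share a bucket in at least one selected index, and candidates left unselected stay in the pool for later steps. The `propose_candidates` callback is hypothetical.

```python
def covered_pairs(index, neighbor_pairs):
    """Neighbor pairs whose two images fall into the same bucket of this index."""
    return {(i, j) for (i, j) in neighbor_pairs if index[i] == index[j]}


def recall_of(selected, neighbor_pairs):
    """Fraction of neighbor pairs covered by at least one selected index."""
    covered = set()
    for index in selected:
        covered |= covered_pairs(index, neighbor_pairs)
    return len(covered) / len(neighbor_pairs)


def forward_selection(propose_candidates, neighbor_pairs, num_indices):
    """Pick indices one by one, each time adding the candidate that maximizes recall."""
    selected, pool = [], []
    for step in range(num_indices):
        pool.extend(propose_candidates(step))  # new candidates join the whole pool
        best = max(pool, key=lambda idx: recall_of(selected + [idx], neighbor_pairs))
        selected.append(best)
        pool.remove(best)
    return selected


# Toy data: four images, three true-neighbor pairs, three candidate indices.
pairs = {("a", "b"), ("c", "d"), ("a", "c")}
idx1 = {"a": 0, "b": 0, "c": 1, "d": 1}  # covers (a, b) and (c, d)
idx2 = {"a": 0, "b": 1, "c": 0, "d": 1}  # covers (a, c)
idx3 = {"a": 0, "b": 0, "c": 1, "d": 2}  # covers (a, b) only
batches = [[idx3, idx1], [idx2]]

selected = forward_selection(lambda step: batches[step], pairs, num_indices=2)
```

In the second step, `idx3` adds nothing beyond what `idx1` already covers, so the selection prefers `idx2`. A random choice would not account for this kind of complementarity.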

Random selection vs. forward selection

[Figure: two panels, "Random selection" and "Forward selection"]

Adaptive scheme

We maintain a set $Q$ of point pairs $(x_i, x_j)$,

where $x_i$ and $x_j$ are true neighbors that do not appear in the same bucket in any of the previously identified indices.

We compute the top r sparse principal directions from sample points.

Each direction can form a partition of the data points in the current node v.

We then judge whether a pair lies on the same side of the partition.

Finally, the objective function of the direction is,

Adaptive forward selection

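1. Initialization: $Q \leftarrow L$, $t \leftarrow 0$, $e \leftarrow |Q|$, $R \leftarrow \emptyset$

2. Repeat

3. Candidate proposal: randomly generate a set of spatial partition trees $\bar{\mathcal{T}}$ according to $Q$

4. Candidate selection: choose the spatial partition tree $T$ from $\bar{\mathcal{T}}$ that keeps the largest number of pairs in $Q$ in the same bucket

5. Update: discard from $Q$ all the pairs of points lying in the same bucket in $T$; $t \leftarrow t + 1$, $e \leftarrow |Q|$, $R \leftarrow R \cup \{T\}$

6. Until $e \le \varepsilon$ && $t \ge \tau$

7. Return $R$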
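The loop above can be sketched in Python on a toy problem. To keep the example short, a "tree" is replaced here by a hypothetical stand-in: a 1-D partition into fixed-width buckets, with one bucket centered on a pair drawn from Q. That stand-in, the `max_rounds` safeguard, and all function names are assumptions of this sketch. The stopping test follows step 6 as written.

```python
import math
import random


def make_partition(coords, center, width):
    """Stand-in for a partition tree: 1-D buckets of `width`, one centered on `center`."""
    return {p: math.floor((x - center) / width + 0.5) for p, x in coords.items()}


def kept(partition, pairs):
    """Pairs whose two points lie in the same bucket of the partition."""
    return {(i, j) for (i, j) in pairs if partition[i] == partition[j]}


def adaptive_forward_selection(coords, neighbor_pairs, width, eps, tau,
                               num_candidates=5, max_rounds=100, seed=0):
    rng = random.Random(seed)
    Q = set(neighbor_pairs)                    # 1. initialization
    t, e, R = 0, len(Q), []
    for _ in range(max_rounds):                # 2. repeat (max_rounds is a safeguard)
        # 3. candidate proposal guided by the pairs still in Q
        targets = rng.sample(sorted(Q), min(num_candidates, len(Q)))
        centers = [(coords[i] + coords[j]) / 2 for i, j in targets]
        centers = centers or [rng.choice(sorted(coords.values()))]
        candidates = [make_partition(coords, c, width) for c in centers]
        # 4. candidate selection: keep the largest number of pairs from Q together
        best = max(candidates, key=lambda part: len(kept(part, Q)))
        # 5. update
        Q -= kept(best, Q)
        t, e = t + 1, len(Q)
        R.append(best)
        if e <= eps and t >= tau:              # 6. until
            break
    return R, Q                                # 7. return (Q is returned for inspection)


coords = {"a": 0.0, "b": 0.4, "c": 0.9, "d": 1.3, "e": 3.0}
pairs = {("a", "b"), ("b", "c"), ("c", "d")}
R, remaining = adaptive_forward_selection(coords, pairs, width=1.0, eps=0, tau=2)
```

Because each proposal is centered on a pair that earlier partitions separated, every round removes at least one pair from Q in this toy setting.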

EXPERIMENTS

• Data set
  – 32 × 32 color images
  – 1M tiny images from around 80M images
  – global GIST descriptor (384-D vector)

• Evaluation criterion (average accuracy score)

Evaluation

Given an image, the accuracy is computed from its true neighbors list and the top K images found from the M inverted indices. The overall accuracy is the average of these per-image accuracies.

Accuracy vs indices

Bucket size: the maximum number of points in a leaf node; g: the number of target NNs.

[Figure: accuracy vs. indices, with panels for bucket size = 300 and bucket size = 100]

Accuracy vs accessed images

Out-of-sample Test

Visual search result


Thank you!
