Packing bag-of-features, ICCV 2009. Hervé Jégou, Matthijs Douze, Cordelia Schmid, INRIA.
Packing bag-of-features
ICCV 2009
Hervé Jégou, Matthijs Douze, Cordelia Schmid
INRIA
Introduction
• Introduction
• Proposed method
• Experiments
• Conclusion
Bag-of-features
• Extracting local image descriptors
• Clustering of the descriptors with a k-means quantizer (visual words)
• The histogram of visual words is weighted using the tf-idf weighting scheme of [12] and subsequently normalized with the L2 norm
• Producing a frequency vector fi of length k
TF–IDF weighting
• tf: 'a' appears 3 times in a 100-word document, so tf = 3/100 = 0.03
• idf: 1,000 documents contain 'a' out of 10,000,000 documents, so idf = ln(10,000,000 / 1,000) ≈ 9.21
• tf-idf = 0.03 × 9.21 ≈ 0.28
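The worked example above can be checked with a few lines of Python; `tf_idf` is a hypothetical helper name, not from the slides:

```python
import math

def tf_idf(term_count, doc_length, docs_with_term, total_docs):
    """tf-idf weight as in the slide's worked example."""
    tf = term_count / doc_length                  # term frequency
    idf = math.log(total_docs / docs_with_term)   # inverse document frequency
    return tf * idf

# 'a' appears 3 times in a 100-word document;
# 1,000 of 10,000,000 documents contain 'a'.
w = tf_idf(3, 100, 1_000, 10_000_000)
print(round(w, 2))  # 0.28
```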
Binary BOF[12]
• Discards the information about the exact number of occurrences of a given visual word in the image
• Binary BOF vector components only indicate the presence or absence of a particular visual word in the image
• A sequential coding uses 1 bit per component, i.e. ⌈k/8⌉ bytes per image; the memory usage would typically be 10 kB per image
[12] J. Sivic and A. Zisserman. Video Google: A text retrieval approach to object matching in videos. In ICCV, pages 1470–1477, 2003.
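The 1-bit-per-component coding can be sketched as follows; the function name and toy vector are illustrative, not from the paper:

```python
def pack_binary_bof(frequency_vector):
    """Pack a BOF into a bitmap: 1 bit per visual word (presence only),
    i.e. ceil(k/8) bytes per image."""
    k = len(frequency_vector)
    packed = bytearray((k + 7) // 8)  # ceil(k/8) bytes
    for word, count in enumerate(frequency_vector):
        if count > 0:
            packed[word // 8] |= 1 << (word % 8)
    return bytes(packed)

bof = [0, 3, 1, 0, 0, 2, 0, 0, 1]   # toy BOF, k = 9 visual words
print(pack_binary_bof(bof))          # 2 bytes: ceil(9/8)
```

With k = 100,000 visual words this gives 12.5 kB per image, consistent with the "typically 10 kB" order of magnitude quoted above.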
Binary BOF (Holidays dataset)
Inverted-file index (Sparsity)
• Documents
– T0 = "it is what it is"
– T1 = "what is it"
– T2 = "it is a banana"
• Index
– "a": {2}
– "banana": {2}
– "is": {0, 1, 2}
– "it": {0, 1, 2}
– "what": {0, 1}
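The index above can be built directly from the three toy documents; this is a generic sketch of an inverted file, not the paper's implementation:

```python
from collections import defaultdict

def build_inverted_index(documents):
    """Map each term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(documents):
        for term in text.split():
            index[term].add(doc_id)
    return dict(index)

docs = ["it is what it is", "what is it", "it is a banana"]
index = build_inverted_index(docs)
print(index["is"])    # {0, 1, 2}
print(index["what"])  # {0, 1}
```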
Binary BOF
Compressed inverted file
• Compression can get close to the vector entropy
• Compared with a standard inverted file, about 4 times more images can be indexed using the same amount of memory
• This may compensate for the decoding cost of the decompression algorithm
[16] J. Zobel and A. Moffat. Inverted files for text search engines. ACM Computing Surveys, 38(2):6, 2006.
Introduction
• Introduction
• Proposed method
• Experiments
• Conclusion
MiniBOFs
Projection of a BOF
• Sparse projection matrices
– d: dimension of the output descriptor
– k: dimension of the input BOF
• For each matrix row, the number of non-zero components is nz; typically nz = 8 for k = 1000, resulting in d = k/nz = 125
Projection of a BOF
• The other matrices are defined by random permutations
– For k = 12 and d = 3, e.g. the random permutation (11, 2, 12, 8; 9, 4, 10, 1; 7, 5, 6, 3)
• Image i is described by m mini-BOFs
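A minimal sketch of one such projection, assuming each output component sums nz input entries selected by a random permutation (the function name is hypothetical):

```python
import random

def minibof_projection(bof, nz, rng):
    """Project a k-dim BOF to d = k // nz components by summing nz
    permuted entries per output component, i.e. a sparse aggregation
    matrix defined by a random permutation of the k input indices."""
    k = len(bof)
    perm = list(range(k))
    rng.shuffle(perm)
    return [sum(bof[perm[i * nz + j]] for j in range(nz))
            for i in range(k // nz)]

rng = random.Random(0)
bof = [1] * 12                          # toy BOF, k = 12 (as in the slide)
mini = minibof_projection(bof, nz=4, rng=rng)
print(mini)                             # d = 3 components, each summing 4 entries
```

Note that the projection preserves the total mass of the BOF: each input entry contributes to exactly one output component.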
Indexing structure
• Quantization
– The miniBOF is quantized by the k-means quantizer associated with its projection matrix, where k' is the number of codebook entries of the indexing structure
– The set of k-means codebooks is learned off-line using a large number of miniBOF vectors, here extracted from the Flickr1M* dataset. The dictionary size k' associated with the miniBOFs is not related to the one associated with the initial SIFT descriptors, hence we may choose k' ≠ k. We typically set k' = 20000.
Indexing structure
• Binary signature generation
– The miniBOF is projected using a random rotation matrix R, producing d components
– Each bit of the vector is obtained by comparing the value projected by R to the median value of the elements having the same quantized index. The median values for all quantizing cells and all projection directions are learned off-line on an independent dataset
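The per-bit comparison can be sketched as below. The projection matrix stands in for the paper's random rotation, and the zero medians are placeholders for the per-cell values learned off-line; all names are illustrative:

```python
import random

def binary_signature(minibof, projection, medians):
    """One bit per projection direction: compare the projected value to
    the median learned off-line for this quantization cell and direction."""
    bits = []
    for row, med in zip(projection, medians):
        value = sum(r * x for r, x in zip(row, minibof))
        bits.append(1 if value > med else 0)
    return bits

rng = random.Random(0)
d = 8
projection = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(d)]
medians = [0.0] * d                     # placeholder per-cell medians
sig = binary_signature([rng.gauss(0, 1) for _ in range(d)], projection, medians)
print(sig)                              # list of d = 8 bits
```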
Quantizing cells
[4] H. Jegou, M. Douze, and C. Schmid. Hamming embedding and weak geometric consistency for large scale image search.In ECCV, 2008.
Indexing structure
• The miniBOF associated with image i is represented by a tuple (quantized index, binary signature)
• The total memory usage per image is m times the per-tuple size in bytes
Multi-probe strategy
• Retrieving not only the inverted list associated with the quantized index, but the set of inverted lists associated with the t closest centroids of the quantizer codebook
• t times more image hits
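Selecting the t lists to visit amounts to ranking centroids by distance to the query miniBOF; this is a generic sketch of multi-probe selection, not the paper's code:

```python
def multi_probe_cells(query, centroids, t):
    """Return the indices of the t closest codebook centroids, so the
    corresponding t inverted lists are all visited."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    ranked = sorted(range(len(centroids)),
                    key=lambda c: sq_dist(query, centroids[c]))
    return ranked[:t]

centroids = [[0.0], [1.0], [2.0], [5.0]]       # toy 1-d codebook
print(multi_probe_cells([1.2], centroids, t=2))  # [1, 2]
```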
Fusion
• Query signature
• Database signature
Fusion
• The scores from the m miniBOFs are fused
– equal to 0 for images having no observed binary signatures
– maximal if the database image i is the query image itself
Fusion
Introduction
• Introduction
• Proposed method
• Experiments
• Conclusion
Dataset
• Two annotated datasets
– INRIA Holidays dataset [4]
– University of Kentucky recognition benchmark [9]
• Distractor dataset
– one million images downloaded from Flickr, Flickr1M
• Learning parameters
– Flickr1M∗
Detail
• Descriptor extraction
– Resize to a maximum of 786432 pixels
– Performed a slight intensity normalization
– SIFT
• Evaluation
– Recall@N
– mAP
– Memory
– Image hits
• Parameters
# Using a value of nz between 8 and 12 provides the best accuracy for vocabulary sizes ranging from 1k to 20k.
mAP
• Mean average precision
• Example:
– two images A & B
– A has 4 duplicate images
– B has 5 duplicate images
– Retrieval ranks for A: 1, 2, 4, 7
– Retrieval ranks for B: 1, 3, 5
– Average precision A = (1/1 + 2/2 + 3/4 + 4/7)/4 = 0.83
– Average precision B = (1/1 + 2/3 + 3/5 + 0 + 0)/5 = 0.45
– mAP = (0.83 + 0.45)/2 = 0.64
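The slide's worked example can be reproduced in a few lines; `average_precision` is an illustrative helper, and missed relevant items contribute zero precision as in the example:

```python
def average_precision(retrieved_ranks, num_relevant):
    """AP averaged over all relevant items; unretrieved items add 0."""
    ap = sum((i + 1) / rank
             for i, rank in enumerate(sorted(retrieved_ranks)))
    return ap / num_relevant

ap_a = average_precision([1, 2, 4, 7], 4)   # (1/1 + 2/2 + 3/4 + 4/7) / 4
ap_b = average_precision([1, 3, 5], 5)      # (1/1 + 2/3 + 3/5 + 0 + 0) / 5
map_score = (ap_a + ap_b) / 2
print(round(ap_a, 2), round(ap_b, 2), round(map_score, 2))  # 0.83 0.45 0.64
```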
Table 1 (Holidays)
# The number of bytes used per inverted list entry is 4 bytes for binary BOF & 5 bytes for BOF
Table 2 (Kentucky)
Table 3 (Holidays + Flickr1M)
Figure (Holidays + Flickr1M)
# Our approach requires 160 MB for m = 8 and the query is performed in 132ms, to be compared, respectively, with 8 GB and 3s for BOF.
Sample
Introduction
• Introduction
• Proposed method
• Experiments
• Conclusion
Conclusion
• This paper has introduced a way of packing BOFs: miniBOFs
– An efficient indexing structure for rapid access and an expected distance criterion for the fusion of the scores
– Reduces memory usage
– Reduces the quantity of memory scanned (hits)
– Reduces query time