Lost in Quantization: Improving Particular Object Retrieval in Large Scale Image Databases
CVPR 2008
James Philbin, Ondřej Chum, Michael Isard, Josef Sivic, Andrew Zisserman
[7] O. Chum, J. Philbin, J. Sivic, M. Isard, and A. Zisserman. Total recall: Automatic query expansion with a generative feature model for object retrieval. In Proc. ICCV, 2007.
Outline
• Introduction
• Methods in this paper
• Experiment & Result
• Conclusion
Introduction
• Goal
– Specific object retrieval from an image database
• For large databases, this is achieved by systems inspired by text retrieval (visual words).
Flow
1. Get features
– SIFT
2. Cluster
– Approximate k-means
3. Feature quantization
– Visual words
– Soft-assignment (on the query side)
4. Re-ranking
– RANSAC
5. Query expansion
– Average query expansion
Outline
• Introduction
• Methods in this paper
• Experiment & Result
• Conclusion
Feature
• SIFT
Quantization (visual word)
• Point list = [(2,3), (5,4), (9,6), (4,7), (8,1), (7,2)]
• Sorted list = [(2,3), (4,7), (5,4), (7,2), (8,1), (9,6)]
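The quantization step above can be sketched as a nearest-center lookup. This is a minimal illustration: the toy coordinates and helper name are mine, and the actual system uses approximate k-means over 128-D SIFT descriptors rather than this exhaustive search.

```python
import numpy as np

def quantize(descriptors, centers):
    """Hard-assign each descriptor to its nearest cluster center (visual word)."""
    # Pairwise squared Euclidean distances, shape (n_descriptors, n_centers)
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)  # index of the nearest visual word per descriptor

centers = np.array([[2.0, 3.0], [8.0, 1.0]])                  # two toy visual words
descriptors = np.array([[2.5, 3.5], [7.0, 2.0], [9.0, 6.0]])  # toy features
words = quantize(descriptors, centers)                        # → array([0, 1, 1])
```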
Soft-assignment of visual words
• Matching two image features with bag-of-visual-words hard-assignment:
– Match if both are assigned to the same visual word
– No match otherwise
• Soft-assignment:
– A weighted combination of visual words
Soft-assignment of visual words
A–E represent cluster centers (visual words); points 1–4 are features.
Soft-assignment of visual words
• Weights are assigned as exp(−d²/2σ²)
– d is the distance from the cluster center to the descriptor
• In practice, σ is chosen so that a substantial weight is only assigned to a few cells
• The essential parameters:
– the spatial scale σ
– r, the number of nearest neighbors considered
Soft-assignment of visual words
• After assigning weights to the r nearest neighbors, the descriptor is represented by an r-vector of weights, which is then L1-normalized
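A minimal sketch of the soft-assignment described above, assuming the Gaussian weighting exp(−d²/2σ²) over the r nearest centers; the function name and toy data are mine:

```python
import numpy as np

def soft_assign(descriptor, centers, r=3, sigma=1.0):
    """Weight the r nearest visual words by exp(-d^2 / (2 sigma^2)), then L1-normalize."""
    d2 = ((centers - descriptor) ** 2).sum(axis=1)  # squared distance to each center
    nearest = np.argsort(d2)[:r]                    # indices of the r nearest centers
    w = np.exp(-d2[nearest] / (2 * sigma ** 2))
    return nearest, w / w.sum()                     # L1-normalized r-vector of weights

centers = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
nearest, weights = soft_assign(np.array([0.1, 0.0]), centers)
```

With a small σ relative to the inter-cluster distances, the distant fourth center receives no weight at all, which is the intended behavior: substantial weight goes to only a few cells.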
TF–IDF weighting
• Standard index architecture
TF–IDF weighting
• tf (term frequency)
– A document contains 100 words, and the term 'a' occurs 3 times
– tf = 0.03 (3/100)
• idf (inverse document frequency)
– 1,000 documents contain 'a', out of 10,000,000 documents in total
– idf ≈ 9.21 ( ln(10,000,000 / 1,000) )
• tf–idf ≈ 0.28 (0.03 × 9.21)
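The worked example above, as a short sketch (the helper name is mine):

```python
import math

def tf_idf(term_count, doc_len, docs_with_term, total_docs):
    """tf-idf weight of one term in one document."""
    tf = term_count / doc_len                    # term frequency
    idf = math.log(total_docs / docs_with_term)  # inverse document frequency (natural log)
    return tf * idf

w = tf_idf(3, 100, 1_000, 10_000_000)  # the slide's example: ≈ 0.28
```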
TF–IDF weighting
• In this paper:
– For the term frequency (tf), we simply use the normalized weight value for each visual word.
– For the inverse document frequency (idf), we found that counting an occurrence of a visual word as one, no matter how small its weight, gave the best results.
Re-ranking
• RANSAC
– Affine transform Θ: Y = AX + b
• Algorithm
1. Randomly choose n points
2. Use the n points to estimate Θ
3. Apply Θ to the remaining N − n points
4. Count how many are inliers
– Repeat steps 1–4 K times
– Pick the best Θ
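The loop above can be sketched as follows. For a 2-D affine transform, n = 3 correspondences suffice (6 unknowns). All names, the inlier tolerance, and the least-squares fit are illustrative choices of mine, not the paper's exact implementation:

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares affine fit Y = AX + b from point correspondences."""
    X = np.hstack([src, np.ones((len(src), 1))])  # homogeneous coordinates
    M, *_ = np.linalg.lstsq(X, dst, rcond=None)   # (3, 2) matrix stacking A^T and b
    return M

def ransac_affine(src, dst, n=3, iters=100, tol=1.0, seed=0):
    """RANSAC: fit on n random correspondences, keep the model with most inliers."""
    rng = np.random.default_rng(seed)
    best_M, best_inliers = None, 0
    for _ in range(iters):
        idx = rng.choice(len(src), n, replace=False)          # 1. random sample
        M = fit_affine(src[idx], dst[idx])                    # 2. estimate theta
        pred = np.hstack([src, np.ones((len(src), 1))]) @ M   # 3. apply to all points
        inliers = (np.linalg.norm(pred - dst, axis=1) < tol).sum()  # 4. count inliers
        if inliers > best_inliers:
            best_M, best_inliers = M, inliers
    return best_M, best_inliers
```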
Re-ranking
• In this paper
– Not only counting the number of inlier correspondences, but also scoring results with the cosine similarity between query and result vectors
Average query expansion
• Obtain the top (m < 50) verified results of the original query
• Construct a new query using the average of these results:
– d_avg = (1/(m+1)) (d0 + Σ_{i=1}^{m} di)
– where d0 is the normalized tf vector of the query region
– di is the normalized tf vector of the i-th result
• Re-query once
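A minimal sketch of the averaging step above, assuming normalized tf vectors as inputs (the function name and toy vectors are mine):

```python
import numpy as np

def average_query(d0, verified):
    """Average the query tf vector with the top verified results, then re-normalize."""
    m = len(verified)                        # number of verified results, m < 50
    d_avg = (d0 + sum(verified)) / (m + 1)   # d_avg = (d0 + sum_i di) / (m + 1)
    return d_avg / np.linalg.norm(d_avg)     # re-normalize before re-querying

d0 = np.array([1.0, 0.0, 0.0])
results = [np.array([0.0, 1.0, 0.0]), np.array([0.0, 0.0, 1.0])]
new_query = average_query(d0, results)
```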
Outline
• Introduction
• Methods in this paper
• Experiment & Result
• Conclusion
Dataset
• All images crawled from Flickr at high resolution (1024×768)
• Oxford buildings
– About 5,062 images
– 11 landmarks used as queries
• Paris
– Used for quantization
– 6,300 images
• Flickr1
– Crawled from the 145 most popular tags
– 99,782 images
Dataset
• Query
– 55 queries: 5 queries for each of the 11 landmarks
Baseline
• Follows the architecture of previous work [15]
• A visual vocabulary of 1M words is generated using approximate k-means
[15] J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman. Object retrieval with large vocabularies and fast spatial matching. In Proc. CVPR, 2007.
Evaluation
• Compute the Average Precision (AP) score for each of the 5 queries for a landmark
– AP is the area under the precision–recall curve
– Precision = (# relevant images retrieved) / (total # images retrieved)
– Recall = (# relevant images retrieved) / (total # positives in the corpus)
• Average these to obtain the Mean Average Precision (MAP)
[Precision–recall curve illustration]
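AP as described above can be computed directly from a ranked list of relevance flags, without building the curve explicitly (a sketch; the function name is mine):

```python
def average_precision(ranked_relevance, n_positives):
    """AP = area under the precision-recall curve for one ranked result list.

    ranked_relevance: 0/1 flags (1 = relevant) in retrieval order.
    n_positives: total number of positives in the corpus.
    """
    hits, ap = 0, 0.0
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            ap += hits / rank  # precision at each recall point
    return ap / n_positives

ap = average_precision([1, 0, 1], n_positives=2)  # (1/1 + 2/3) / 2 ≈ 0.833
```

MAP is then just the mean of these AP values over all 55 queries.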
Evaluation
• Datasets
– Only Oxford (D1): 5,062 images
– Oxford (D1) + Flickr1 (D2): 104,844 images
• Vector quantizers
– Built on Oxford or Paris
Result
[14] D. Nister and H. Stewenius. Scalable recognition with a vocabulary tree. In Proc. CVPR, 2006.
[18] T. Tuytelaars and C. Schmid. Vector quantizing feature space with a regular lattice. In Proc. ICCV, 2007.
[15] J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman. Object retrieval with large vocabularies and fast spatial matching. In Proc. CVPR, 2007.
[Tables: parameter variation; comparison with other methods]
Result
Effect of vocabulary size
Spatial verification
Result
Query expansion
Scaling-up to 100K images
Result
• ashmolean_3 goes from 0.626 AP to 0.874 AP
• christ_church_5 increases from 0.333 AP to 0.813 AP
Outline
• Introduction
• Methods in this paper
• Experiment & Result
• Conclusion
Conclusion
• A new method of visual word assignment was introduced:
– descriptor-space soft-assignment
• It recovers information about descriptors that is lost in the quantization step of previously published methods.