Approximate Correspondences in High Dimensions
MIT CSAIL Vision interfaces
Kristen Grauman*
Trevor Darrell
MIT CSAIL
(*) UT Austin…
Key challenges: robustness
Illumination
Object pose
Clutter
Viewpoint
Intra-class appearance
Occlusions
Key challenges: efficiency
• Thousands to millions of pixels in an image
• 3,000-30,000 human recognizable object categories
• Billions of images indexed by Google Image Search
• 18 billion+ prints produced from digital camera images in 2004
• 295.5 million camera phones sold in 2005
Local representations
Superpixels [Ren et al.]
Shape context [Belongie et al.]
Maximally Stable Extremal Regions [Matas et al.]
Geometric Blur [Berg et al.]
SIFT [Lowe]
Salient regions [Kadir et al.]
Harris-Affine [Schmid et al.]
Spin images [Johnson and Hebert]
Describe component regions or patches separately
How to handle sets of features?
• Each instance is an unordered set of vectors
• Varying number of vectors per instance
Partial matching
Compare sets by computing a partial matching between their features.
Pyramid match overview
optimal partial matching
Computing the partial matching
• Optimal matching
• Greedy matching
• Pyramid match
(for sets with m features of dimension d)
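To make the first two options concrete, here is a minimal sketch of an optimal and a greedy partial matching on toy 2-D point sets. The function names and the example data are hypothetical, and the brute-force search is only a stand-in for a real assignment solver; it is usable for a handful of points at most.

```python
import math
from itertools import permutations


def _order(X, Y):
    # The partial matching assigns every point of the smaller set.
    return (X, Y) if len(X) <= len(Y) else (Y, X)


def optimal_partial_matching(X, Y):
    """Brute-force minimum-cost matching (exponential; toy sizes only)."""
    small, large = _order(X, Y)
    best_cost, best_pairs = float("inf"), []
    for perm in permutations(range(len(large)), len(small)):
        cost = sum(math.dist(small[i], large[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best_pairs = cost, list(enumerate(perm))
    return best_cost, best_pairs


def greedy_partial_matching(X, Y):
    """Repeatedly take the closest pair whose endpoints are both still free."""
    small, large = _order(X, Y)
    candidates = sorted(
        (math.dist(p, q), i, j)
        for i, p in enumerate(small)
        for j, q in enumerate(large)
    )
    used_i, used_j, total, pairs = set(), set(), 0.0, []
    for d, i, j in candidates:
        if i not in used_i and j not in used_j:
            used_i.add(i)
            used_j.add(j)
            total += d
            pairs.append((i, j))
    return total, pairs


X = [(0, 0), (3, 0)]
Y = [(2, 0), (50, 0), (100, 0)]
optimal_cost, _ = optimal_partial_matching(X, Y)
greedy_cost, _ = greedy_partial_matching(X, Y)
```

On this example the greedy choice of the closest pair first forces a worse second pair, so its total cost is higher than the optimal one.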
Pyramid match overview
• Place multi-dimensional, multi-resolution grid over point sets
• Consider points matched at finest resolution where they fall into same grid cell
• Approximate optimal similarity with worst case similarity within pyramid cell
No explicit search for matches!
Pyramid match measures similarity of a partial matching between two sets:
Pyramid match
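Approximate partial match similarity:
$K_\Delta(\Psi(X), \Psi(Y)) = \sum_{i=0}^{L} w_i N_i$
where $\Psi(X)$ is the histogram pyramid of set X, $N_i$ is the number of newly matched pairs at level i, and $w_i$ is a measure of the difficulty of a match at level i.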
[Grauman and Darrell, ICCV 2005]
Pyramid extraction
Histogram pyramid $\Psi(X) = [H_0(X), \ldots, H_L(X)]$: level i has bins of side length $2^i$
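A minimal sketch of pyramid extraction with uniform bins follows. It assumes the bin side doubles at each level (side 2^i at level i) and stores each histogram sparsely as a `Counter` keyed by grid-cell index; the function name and example points are hypothetical, and the sketch leaves out any shifting of the grid.

```python
from collections import Counter


def histogram_pyramid(points, num_levels):
    """Return [H_0, ..., H_{num_levels-1}], one sparse histogram per level.

    Assumption: level i uses grid cells of side 2**i in every dimension.
    """
    pyramid = []
    for level in range(num_levels):
        side = 2 ** level
        cells = (tuple(int(c // side) for c in p) for p in points)
        pyramid.append(Counter(cells))
    return pyramid


points = [(0, 0), (1, 3), (6, 7)]
pyramid = histogram_pyramid(points, num_levels=4)
# At level 2 (side 4) the first two points share cell (0, 0);
# at level 3 (side 8) all three points fall into one cell.
```

Only occupied cells are stored, so the memory per level is bounded by the number of points rather than by the number of grid cells.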
Counting matches
Histogram intersection
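As a sketch of how matches can be counted without any explicit search: the intersection of two histograms sums the per-bin minimum, and the matches that are new at a level are the intersection at that level minus the intersection one level finer. The weights 1/2^i used here are an assumption (inverse bin side), and the hand-built 1-D pyramids are hypothetical toy data.

```python
from collections import Counter


def histogram_intersection(h1, h2):
    """Sum over bins of the smaller of the two counts."""
    return sum(min(count, h2[cell]) for cell, count in h1.items())


def pyramid_match(pyr_x, pyr_y):
    """Weighted count of newly matched pairs, finest level first."""
    score, previous = 0.0, 0
    for level, (hx, hy) in enumerate(zip(pyr_x, pyr_y)):
        matched = histogram_intersection(hx, hy)
        new_matches = matched - previous      # pairs first matched here
        score += new_matches / 2 ** level     # assumed weight w_i = 1 / 2**i
        previous = matched
    return score


# 1-D points X = {0, 5} and Y = {1, 6}, binned with sides 1, 2 and 4.
pyr_x = [Counter({0: 1, 5: 1}), Counter({0: 1, 2: 1}), Counter({0: 1, 1: 1})]
pyr_y = [Counter({1: 1, 6: 1}), Counter({0: 1, 3: 1}), Counter({0: 1, 1: 1})]
score = pyramid_match(pyr_x, pyr_y)
```

Here no pair matches at side 1, one new pair matches at side 2 and one more at side 4, so the score is 1/2 + 1/4.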
Example pyramid match
[Figures: step-by-step example, comparing the pyramid match with the optimal match]
Approximating the optimal partial matching
[Plot: randomly generated, uniformly distributed point sets with m = 5 to 100, d = 2]
PM preserves rank…
and is robust to clutter…
Learning with the pyramid match
• Kernel-based methods
  – Embed data into a Euclidean space via a similarity function (kernel), then seek linear relationships among embedded data
  – Efficient and good generalization
  – Include classification, regression, clustering, dimensionality reduction,…
• Pyramid match forms a Mercer kernel
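To illustrate how a set kernel plugs into kernel-based learning, the sketch below builds a normalized Gram matrix over a few toy histograms. A plain histogram-intersection kernel stands in for the pyramid match here, and the normalization by each input's self-similarity is an assumption rather than something stated on the slide; all names are hypothetical.

```python
import math
from collections import Counter


def intersection_kernel(h1, h2):
    """Stand-in set kernel: histogram intersection of two sparse histograms."""
    return sum(min(count, h2[cell]) for cell, count in h1.items())


def normalized_gram(items, kernel):
    """Gram matrix with K(a, b) / sqrt(K(a, a) * K(b, b)) in each entry."""
    n = len(items)
    raw = [[kernel(items[a], items[b]) for b in range(n)] for a in range(n)]
    return [
        [raw[a][b] / math.sqrt(raw[a][a] * raw[b][b]) for b in range(n)]
        for a in range(n)
    ]


histograms = [
    Counter({"a": 2, "b": 1}),
    Counter({"a": 1, "b": 1}),
    Counter({"c": 3}),
]
gram = normalized_gram(histograms, intersection_kernel)
```

A matrix like `gram` could then be handed to any kernel method that accepts a precomputed kernel, for example an SVM.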
Category recognition results
ETH-80 data set
[Table: complexity per kernel, for the pyramid match and the match kernel [Wallraven et al.]]
[Plots: time (s) and accuracy vs. mean number of features]
Category recognition results
[Plot: results by time of publication (2004, 6/05, 12/05, 3/06, 6/06), annotated with 0.002 s / match and 5 s / match]
Pyramid match kernel over spatial features with quantized appearance
But rectangular histogram may scale poorly with input dimension…
Build data-dependent histogram structure…
New Vocabulary-guided PM [NIPS 06]:
• Hierarchical k-means over training set
• Irregular cells; record diameter of each bin
• VG pyramid structure takes O(kL) storage; stored once
• Individual histograms still stored sparsely
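The construction in the bullets above can be sketched as a recursive k-means that records a center and a diameter for each bin. This is a toy version under stated assumptions: plain Lloyd iterations with a deterministic start at the first k points, a brute-force diameter, and hypothetical function names and data.

```python
import math


def kmeans(points, k, iters=10):
    """Plain Lloyd's algorithm; centers start at the first k points."""
    centers = [list(p) for p in points[:k]]
    groups = [[] for _ in centers]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda c: math.dist(p, centers[c]))
            groups[nearest].append(p)
        for c, members in enumerate(groups):
            if members:
                centers[c] = [sum(x) / len(members) for x in zip(*members)]
    return groups


def build_vocabulary_tree(points, k, levels):
    """Cluster recursively; every node stores its center and diameter."""
    node = {
        "center": [sum(x) / len(points) for x in zip(*points)],
        "diameter": max(math.dist(p, q) for p in points for q in points),
        "children": [],
    }
    if levels > 1 and len(points) >= k:
        node["children"] = [
            build_vocabulary_tree(group, k, levels - 1)
            for group in kmeans(points, k)
            if group
        ]
    return node


training = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (10.0, 11.0)]
tree = build_vocabulary_tree(training, k=2, levels=2)
```

The tree is built once from the training corpus; each input set then only needs its own sparse counts over these bins.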
Vocabulary-guided pyramid match
[Figures: uniform bins vs. vocabulary-guided bins]
• Tune pyramid partitions to the feature distribution
• Accurate for d > 100
• Requires initial corpus of features to determine pyramid structure
• Small cost increase over uniform bins: kL distances against bin centers to insert points
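The insertion cost mentioned in the last bullet can be seen in a small sketch: a point descends the tree by comparing against the k child centers at each of the L levels. The hand-built tree, the helper `make_node`, and the function name are hypothetical.

```python
import math


def make_node(center, children=()):
    return {"center": center, "children": list(children)}


def insert_point(tree, point):
    """Return the path of child indices and the number of distances computed."""
    path, node, num_distances = [], tree, 0
    while node["children"]:
        dists = [math.dist(point, child["center"]) for child in node["children"]]
        num_distances += len(dists)           # k comparisons at this level
        nearest = dists.index(min(dists))
        path.append(nearest)
        node = node["children"][nearest]
    return path, num_distances


# k = 2 children per node, L = 2 levels below the root.
tree = make_node((5, 5), [
    make_node((0, 0), [make_node((-1, 0)), make_node((1, 0))]),
    make_node((10, 10), [make_node((9, 10)), make_node((11, 10))]),
])
path, cost = insert_point(tree, (0.8, 0.1))
```

The returned path identifies one bin per level, which is all that is needed to increment the point's sparse histogram counts.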
Vocabulary-guided pyramid match
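$n_{ij}(X)$: count in the histogram of X at level i, cell j
$w_{ij}$: weight for level i, cell j, set in one of two ways:
(1) ≈ diameter of the cell (Mercer kernel)
(2) ≈ $d_{ij}(X) + d_{ij}(Y)$, where $d_{ij}(H)$ is the max distance of H's points in cell i,j to the center (upper bound)
$c_h(n)$: child h of node n, e.g. $c_2(n_{11})$
Score: $\sum_{i} \sum_{j} w_{ij} \left[ \min\big(n_{ij}(X), n_{ij}(Y)\big) - \sum_{h} \min\big(c_h(n_{ij}(X)), c_h(n_{ij}(Y))\big) \right]$
That is, $w_{ij}$ × (# matches in cell j at level i − # matches in its children), the per-cell analogue of w × # new matches at level i.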
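A minimal sketch of this per-cell scoring: each point is represented by its path of bin indices, a histogram counts every prefix of that path, and each cell contributes its weight times the matches it has beyond those already counted in its children. The weights below are hypothetical values standing in for diameter-based weights, and the function names are illustrative.

```python
from collections import Counter


def vg_histogram(leaf_paths):
    """Count points in every bin on the way from the root to their leaf."""
    hist = Counter()
    for path in leaf_paths:
        for depth in range(len(path) + 1):
            hist[path[:depth]] += 1
    return hist


def vg_pyramid_match(hist_x, hist_y, weights, k):
    """Sum of weight * (matches in cell - matches in its k children)."""
    score = 0.0
    for cell, w in weights.items():
        here = min(hist_x[cell], hist_y[cell])
        in_children = sum(
            min(hist_x[cell + (h,)], hist_y[cell + (h,)]) for h in range(k)
        )
        score += w * (here - in_children)
    return score


hist_x = vg_histogram([(0, 0), (0, 1), (1, 0)])
hist_y = vg_histogram([(0, 0), (1, 1)])
# Hypothetical weights: root 4.0, level-1 cells 2.0, leaf cells 1.0.
weights = {(): 4.0, (0,): 2.0, (1,): 2.0,
           (0, 0): 1.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 1.0}
score = vg_pyramid_match(hist_x, hist_y, weights, k=2)
```

In this example one pair is matched in leaf cell (0, 0) and one more first meets in level-1 cell (1,), so the score is 1.0 + 2.0.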
Results: Evaluation criteria
• Quality of match scores: How similar are the rankings produced by the approximate measure to those produced by the optimal measure?
• Quality of correspondences: How similar is the approximate correspondence field to the optimal one?
• Object recognition accuracy: Used as a match kernel over feature sets, what is the recognition output?
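For the first criterion, one common way to compare two rankings is Spearman's rank correlation. The slide does not name the measure used, so treat this choice as an assumption; the sketch also assumes there are no tied scores, and the example scores are made up for illustration.

```python
def ranks(scores):
    """Rank positions (1 = smallest score); assumes no ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    result = [0] * len(scores)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(a, b):
    """Spearman rank correlation for two equally long, tie-free score lists."""
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(ranks(a), ranks(b)))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical match costs for four database items against one query.
optimal_scores = [0.1, 0.4, 0.35, 0.8]
approx_scores = [0.2, 0.5, 0.6, 0.9]
rho = spearman(optimal_scores, approx_scores)
```

A value of 1 means the approximate measure ranks the items exactly as the optimal one does; here two neighbouring items are swapped, which lowers the value.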
Match score quality
ETH-80 images, sets of SIFT features
[Plots: uniform-bin pyramid match vs. vocabulary-guided pyramid match, at d=8 and d=128]
Dense SIFT (d=128); k=10, L=5 for VG PM; PCA for low-dim feats
Bin structure and match counts
Data-dependent bins allow more gradual distance ranges
[Plots for d=3, d=8, d=13, d=68, d=113, d=128]
Approximate correspondences
Use pyramid intersections to compute smaller explicit matchings.
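One way to read this: points that land in the same bin form a small subproblem, and an explicit matching is computed only inside each such bin, from the finest level to the coarsest. The sketch below uses uniform bins of side 2^level and a greedy matching inside each bin; both choices, and all names, are assumptions for illustration.

```python
import math
from collections import defaultdict


def cell_of(point, side):
    return tuple(int(c // side) for c in point)


def approximate_correspondences(X, Y, num_levels):
    """Return (i, j, level) triples: X[i] matched to Y[j] at that level."""
    free_x, free_y = set(range(len(X))), set(range(len(Y)))
    pairs = []
    for level in range(num_levels):
        side = 2 ** level
        bins = defaultdict(lambda: ([], []))
        for i in sorted(free_x):
            bins[cell_of(X[i], side)][0].append(i)
        for j in sorted(free_y):
            bins[cell_of(Y[j], side)][1].append(j)
        for xs, ys in bins.values():
            # Small explicit matching inside one bin (greedy by distance).
            candidates = sorted(
                (math.dist(X[i], Y[j]), i, j) for i in xs for j in ys
            )
            for _, i, j in candidates:
                if i in free_x and j in free_y:
                    free_x.discard(i)
                    free_y.discard(j)
                    pairs.append((i, j, level))
    return pairs


X = [(0, 0), (5, 5)]
Y = [(1, 0), (6, 6), (3, 3)]
pairs = approximate_correspondences(X, Y, num_levels=3)
```

Each per-bin matching involves only the few points sharing that bin, which is what keeps the explicit matchings small.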
Correspondence examples
Approximate correspondences
ETH-80 images, sets of SIFT descriptors
Impact on recognition accuracy
• VG-PMK as kernel for SVM
• Caltech-4 data set
• SIFT descriptors extracted at Harris and MSER interest points
Sets of features elsewhere
diseases as sets of gene expressions
documents as bags of words
methods as sets of instructions