MIT CSAIL Vision interfaces Towards efficient matching with random hashing methods… Kristen Grauman Gregory Shakhnarovich Trevor Darrell

MIT CSAIL Vision interfaces

Towards efficient matching with random hashing methods…

Kristen Grauman
Gregory Shakhnarovich
Trevor Darrell

Motivation: Content-based image retrieval

Data set of 30 scenes in Boston
• 1,079 database images
• 89 query images

Features:
• Harris-Affine detector (max m = 3,595)
• MSER detector (max m = 1,707)
• SIFT-PCA descriptors

[Figure: query image]

Content-based image retrieval

Pyramid match: ~1 second / query

Optimal match: ~2 hours / query

[Figure: accuracy vs. number of top retrievals]

Even this is far too slow for any web-scale application!

Sub-linear time image search

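Randomized hashing techniques are useful for sub-linear query time on very large image databases.

[Figure: linear scan over all N database items vs. a hash function h mapping items to binary keys (0111101, 0110111, 0110101), so that only << N items are examined]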

Pyramid match hashing

• For fixed-size sets, Locality-Sensitive Hashing [Indyk & Motwani 1998] provides bounded approximate similarity search over bijective matching [Indyk & Thaper 2003]; [Grauman & Darrell CVPR 2004, 2005]

• For varying set sizes, embedding of pyramid match (with product normalization) makes random hyperplane hashing possible under set intersection hash family of [Charikar 2002]. [Grauman PhD 2006]
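
As a rough illustration of the random hyperplane hashing mentioned above, the sketch below hashes a vector by the sign of its projection onto random Gaussian directions, so that two vectors disagree on each bit with probability θ/π, where θ is the angle between them [Charikar 2002]. It assumes the sets have already been embedded as fixed-length vectors; the pyramid match embedding itself is not shown, and the function names (`make_hyperplanes`, `hash_bits`) and toy vectors are hypothetical.

```python
import math
import random

def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random Gaussian directions in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]

def hash_bits(vec, hyperplanes):
    """One bit per hyperplane: which side of the hyperplane vec falls on."""
    return tuple(
        1 if sum(r * x for r, x in zip(plane, vec)) >= 0 else 0
        for plane in hyperplanes
    )

def hamming(a, b):
    """Number of positions where two bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))

def angle(u, v):
    """Angle between two vectors, in radians."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return math.acos(max(-1.0, min(1.0, dot / (norm_u * norm_v))))

# Toy vectors (illustrative only): v is 45 degrees from u, w is nearly opposite.
NUM_BITS = 2000
planes = make_hyperplanes(num_bits=NUM_BITS, dim=3, seed=0)
u = [1.0, 0.0, 0.0]
v = [1.0, 1.0, 0.0]
w = [-1.0, 0.2, 0.0]

# The fraction of differing bits estimates angle / pi.
frac_uv = hamming(hash_bits(u, planes), hash_bits(v, planes)) / NUM_BITS
frac_uw = hamming(hash_bits(u, planes), hash_bits(w, planes)) / NUM_BITS
expected_uv = angle(u, v) / math.pi  # 0.25 for a 45-degree angle
```

With many bits, `frac_uv` should land close to `expected_uv`, while the nearly opposite vector `w` differs from `u` on most bits. In an actual index only a handful of bits would be concatenated into each bucket key.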

Single Frame Pose Estimation via Approximate Nearest Neighbor regression

• Obtain a large DB of pose-appearance mappings
• Exploit fast methods for approximate nearest neighbor search in high-dimensional spaces (e.g., LSH [Indyk and Motwani ’98-’00])
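
To make the "nearest neighbor regression" step concrete: once neighbors have been retrieved from the database, one simple option is to estimate the pose as a distance-weighted average of the neighbors' poses. The sketch below is a minimal version under stated assumptions: a brute-force search stands in for LSH retrieval, poses are treated as plain real-valued vectors (no angle wrap-around), and the inverse-distance weighting, the function name `knn_pose_estimate`, and the toy database are illustrative rather than taken from the cited work.

```python
def knn_pose_estimate(query_feat, database, k=3, eps=1e-9):
    """Estimate a pose by inverse-distance-weighted averaging over the k
    nearest database entries. database is a list of
    (feature_vector, pose_vector) pairs."""
    def sqdist(feat):
        return sum((a - b) ** 2 for a, b in zip(feat, query_feat))

    # Brute-force neighbor search; an LSH index would replace this step.
    neighbors = sorted(database, key=lambda pair: sqdist(pair[0]))[:k]
    weights = [1.0 / (sqdist(feat) + eps) for feat, _ in neighbors]
    total = sum(weights)
    pose_dim = len(neighbors[0][1])
    return [
        sum(w * pose[d] for w, (_, pose) in zip(weights, neighbors)) / total
        for d in range(pose_dim)
    ]

# Toy database (hypothetical): a 1-D appearance feature x maps to pose (2x, -x).
database = [([x / 10.0], [2 * x / 10.0, -x / 10.0]) for x in range(11)]

# The query lies midway between the entries at 0.4 and 0.5.
estimate = knn_pose_estimate([0.45], database, k=2)
```

Here the two nearest entries are equally distant from the query, so the estimate is approximately their average pose, (0.9, -0.45).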

Approximate nearest neighbor techniques

[Figure: an input image is passed through hash functions and looked up in a rendered (and hashed) pose DB]

Similar examples fall into the same bucket in one or more hash tables.
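
The bucket lookup described above can be sketched as a small multi-table index: each table keys items by a few hash bits, and a query collects every item that shares a bucket with it in at least one table before re-ranking that short candidate list exactly. This is a generic toy index, not the system from the slides; the random-hyperplane hash family, the Euclidean re-ranking, the class name `LSHIndex`, and all parameter values are assumptions made for illustration.

```python
import random
from collections import defaultdict

class LSHIndex:
    """Toy multi-table LSH index (hypothetical; hash family is an assumption)."""

    def __init__(self, dim, num_tables=8, bits_per_table=6, seed=0):
        rng = random.Random(seed)
        self.items = []   # list of (vector, label)
        self.tables = []  # list of (hyperplanes, buckets)
        for _ in range(num_tables):
            planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                      for _ in range(bits_per_table)]
            self.tables.append((planes, defaultdict(list)))

    @staticmethod
    def _key(vec, planes):
        # Bucket key: sign pattern of vec against this table's hyperplanes.
        return tuple(sum(p * x for p, x in zip(plane, vec)) >= 0
                     for plane in planes)

    def add(self, vec, label):
        idx = len(self.items)
        self.items.append((vec, label))
        for planes, buckets in self.tables:
            buckets[self._key(vec, planes)].append(idx)

    def candidates(self, query):
        """Indices of items sharing a bucket with the query in any table."""
        found = set()
        for planes, buckets in self.tables:
            found.update(buckets.get(self._key(query, planes), []))
        return found

    def query(self, query, k=1):
        """Re-rank candidates by exact squared Euclidean distance."""
        def sqdist(i):
            return sum((a - b) ** 2 for a, b in zip(self.items[i][0], query))
        ranked = sorted(self.candidates(query), key=sqdist)
        return [self.items[i][1] for i in ranked[:k]]

# Toy usage with random vectors standing in for image descriptors.
rng = random.Random(1)
database = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(1000)]
index = LSHIndex(dim=16)
for i, vec in enumerate(database):
    index.add(vec, label=i)

cands = index.candidates(database[42])
nearest = index.query(database[42], k=1)
```

Using several tables raises the chance that a true neighbor collides with the query at least once, at the cost of a longer candidate list; more bits per table have the opposite effect.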

Single Frame Pose Estimation via Approximate Nearest Neighbor regression

• Render a large DB of pose-appearance mappings
• Exploit fast methods for approximate nearest neighbor search in high-dimensional spaces (e.g., LSH [Indyk and Motwani ’98-’00])

Problem: signal distance dominated by nuisance variables

Idea: find the embedding (i.e., hash functions for LSH) most relevant to parameter (pose) similarity… [Shakhnarovich et al. ’03, Shakhnarovich ’05]

Pose estimation and Similarity-sensitive hashing

[Figure: an input image is passed through pose-sensitive hash functions and looked up in a rendered (and hashed) pose DB]

NN similar in pose, not in image

[Shakhnarovich et al. ’03, Shakhnarovich ’05]

SSE / BoostPro

Similarity Sensitive Embedding

- Compute an embedding $H: I \to \{0, 1\}^N$ such that

$| H(I(\theta_1)) - H(I(\theta_2)) |$ is small if $\theta_1$ is close to $\theta_2$

$| H(I(\theta_1)) - H(I(\theta_2)) |$ is large otherwise

(here $\theta$ denotes the pose parameters and $I(\theta)$ the corresponding image)

- Use the embedding with approximate nearest neighbor retrieval (LSH)

- Find H by training a boosted classifier to learn “same-pair” and concatenating the resulting weak learners …

[Shakhnarovich 2005]
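
The idea of building a binary embedding from boosted weak learners can be sketched as follows. This is a simplified stand-in, not BoostPro as defined in [Shakhnarovich 2005]: it runs plain AdaBoost over pairs of examples, uses axis-aligned threshold features as weak learners, and compares codes with an alpha-weighted Hamming distance. The function names, the toy data (one pose-related feature plus one large nuisance feature), and the 0.2 similarity threshold are all assumptions made for illustration.

```python
import math
import random

def stump_bit(x, dim, thresh):
    """One embedding bit: is feature dim above the threshold?"""
    return 1 if x[dim] > thresh else 0

def train_pair_boost(points, pairs, labels, rounds=5):
    """AdaBoost over pairs. A weak learner (dim, thresh) predicts +1
    ("similar") when both items fall on the same side of the threshold."""
    weights = [1.0 / len(pairs)] * len(pairs)
    candidates = [(d, p[d]) for d in range(len(points[0])) for p in points]
    learners = []  # selected (dim, thresh, alpha) triples
    for _ in range(rounds):
        best = None
        for dim, thresh in candidates:
            bits = [stump_bit(p, dim, thresh) for p in points]
            err = sum(w for w, (i, j), y in zip(weights, pairs, labels)
                      if (1 if bits[i] == bits[j] else -1) != y)
            if best is None or err < best[0]:
                best = (err, dim, thresh, bits)
        err, dim, thresh, bits = best
        if err >= 0.5:         # nothing better than chance: stop early
            break
        err = max(err, 1e-12)  # guard against log(0) on separable data
        alpha = 0.5 * math.log((1.0 - err) / err)
        learners.append((dim, thresh, alpha))
        # Misclassified pairs gain weight; then renormalize.
        weights = [w * math.exp(-alpha * y * (1 if bits[i] == bits[j] else -1))
                   for w, (i, j), y in zip(weights, pairs, labels)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return learners

def embed(x, learners):
    """Binary code H(x): one bit per selected weak learner."""
    return [stump_bit(x, d, t) for d, t, _ in learners]

def embedded_distance(x1, x2, learners):
    """Alpha-weighted Hamming distance between the two binary codes."""
    return sum(a for d, t, a in learners
               if stump_bit(x1, d, t) != stump_bit(x2, d, t))

# Toy data: feature 0 equals the pose, feature 1 is a large nuisance variable.
rng = random.Random(0)
poses = [rng.random() for _ in range(40)]
points = [[theta, 10.0 * rng.random()] for theta in poses]
pairs = [(i, j) for i in range(40) for j in range(i + 1, 40)]
labels = [1 if abs(poses[i] - poses[j]) < 0.2 else -1 for i, j in pairs]

learners = train_pair_boost(points, pairs, labels, rounds=5)
```

On this toy data the nuisance feature dominates raw Euclidean distance but carries no information about pose similarity, so thresholds on it score near chance and the first selected weak learner thresholds the pose-related feature instead.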

PSH (parameter-sensitive hashing) results

~200,000 examples in DB; 2 sec

[Shakhnarovich et al. 2003, 2005]

Conclusions

• Random hashing techniques allow broad search and are well suited to very high-dimensional spaces

• Useful in domains where there is no prior knowledge about how to cluster or model data…

• Similarity (parameter) sensitive hashing can find a distance related to the task… effectively learning a problem-dependent distance measure and an efficient means to index.