Content Based Image Retrievalromip.ru/russir2008/slides/cbir/lecture-5.pdf• QBIC – Query by...

35
Content Based Image Content Based Image Retrieval Natalia Vassilieva [email protected] HP Labs Russia © 2008 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice

Transcript of Content Based Image Retrievalromip.ru/russir2008/slides/cbir/lecture-5.pdf• QBIC – Query by...

Page 1: Content Based Image Retrievalromip.ru/russir2008/slides/cbir/lecture-5.pdf• QBIC – Query by Image Content • First commercial CBIR system. • Model system – influenced many

Content Based Image Content Based Image RetrievalRetrievalNatalia [email protected]

HP Labs Russia

© 2008 Hewlett-Packard Development Company, L.P.

The information contained herein is subject to change without notice

Page 2: Content Based Image Retrievalromip.ru/russir2008/slides/cbir/lecture-5.pdf• QBIC – Query by Image Content • First commercial CBIR system. • Model system – influenced many

Tutorial outlineTutorial outline• Lecture 1

− Introduction− Introduction

− Applications

• Lecture 2− Performance measurement− Performance measurement

− Visual perception

− Color features

• Lecture 3• Lecture 3− Texture features

− Shape features

− Fusion methods− Fusion methods

• Lecture 4− Segmentation

Local descriptors− Local descriptors

• Lecture 5− Multidimensional indexing

2/35

− Survey of existing systems

Page 3: Content Based Image Retrievalromip.ru/russir2008/slides/cbir/lecture-5.pdf• QBIC – Query by Image Content • First commercial CBIR system. • Model system – influenced many

Lecture 5Lecture 5

Multidimensional indexingMultidimensional indexing

Survey of existing systemsSurvey of existing systems

Page 4: Content Based Image Retrievalromip.ru/russir2008/slides/cbir/lecture-5.pdf• QBIC – Query by Image Content • First commercial CBIR system. • Model system – influenced many

Lecture 5: Outline

• Multidimensional indexing

Lecture 5: Outline

• Multidimensional indexing• Multidimensional indexing

− Tree structures

• Multidimensional indexing

− Tree structures

• VP-tree

− Locality Sensitive hashing

• VP-tree

− Locality Sensitive hashing− Locality Sensitive hashing

• Survey of existing systems

− Locality Sensitive hashing

• Survey of existing systems

4/35

Page 5: Content Based Image Retrievalromip.ru/russir2008/slides/cbir/lecture-5.pdf• QBIC – Query by Image Content • First commercial CBIR system. • Model system – influenced many

Need of multidimensional indexingNeed of multidimensional indexing

• High-dimensional data

− Mean Color = RGB = 3 dimensional vector− Mean Color = RGB = 3 dimensional vector

− Color Histogram = 256 dimensions

− ICA-based texture = 21*30 dimensions− ICA-based texture = 21*30 dimensions

• Effective storage and speedy retrieval needed

• Similarity search, Nearest neighbour

5/35

• Similarity search, Nearest neighbour

Page 6: Content Based Image Retrievalromip.ru/russir2008/slides/cbir/lecture-5.pdf• QBIC – Query by Image Content • First commercial CBIR system. • Model system – influenced many

Problem DescriptionProblem Description

• ε - Nearest Neighbor Search (ε - NNS)• ε - Nearest Neighbor Search (ε - NNS)

− Given a set P of points in a normed space , preprocess P so as to efficiently return a point p ∈ P for any given query point q, such that that

• dist(q,p) ≤ (1 + ε ) × min r ∈ P dist(q,r)

• Generalizes to K- nearest neighbor search ( K >1)

6/35

Page 7: Content Based Image Retrievalromip.ru/russir2008/slides/cbir/lecture-5.pdf• QBIC – Query by Image Content • First commercial CBIR system. • Model system – influenced many

Problem DescriptionProblem Description

7/35

Page 8: Content Based Image Retrievalromip.ru/russir2008/slides/cbir/lecture-5.pdf• QBIC – Query by Image Content • First commercial CBIR system. • Model system – influenced many

Lecture 5: Outline

• Multidimensional indexing

Lecture 5: Outline

• Multidimensional indexing

− Tree structures

• VP-tree

− Locality Sensitive hashing− Locality Sensitive hashing

• Survey of existing systems

8/35

Page 9: Content Based Image Retrievalromip.ru/russir2008/slides/cbir/lecture-5.pdf• QBIC – Query by Image Content • First commercial CBIR system. • Model system – influenced many

Some known indexing techniquesSome known indexing techniques

• Trees• Trees

− R-tree – low dimensions (2D), overlap

− Quad-tree – low dimensions (2D), inefficient for skewed − Quad-tree – low dimensions (2D), inefficient for skewed data

− k-D tree - inefficient for high dimensional skewed data− k-D tree - inefficient for high dimensional skewed data

− VP tree (metric trees)

• VA-file – not good for skewed dataapproximation11

data space

A

B• VA-file – not good for skewed data

• Hashing

approximation

1011

0001

1100

10

00

01

11

A

B

C

D

B

C

D

0011

110000

00 01 1110

(a) (b)

DD

9/35

Page 10: Content Based Image Retrievalromip.ru/russir2008/slides/cbir/lecture-5.pdf• QBIC – Query by Image Content • First commercial CBIR system. • Model system – influenced many

Spheres vs. RectanglesSpheres vs. Rectangles

• ↑↑⇒ ratiolitydimensiona1e)Volume(Cub

ere)Volume(Sphratio ≤= •

10/35

• relative distances

Page 11: Content Based Image Retrievalromip.ru/russir2008/slides/cbir/lecture-5.pdf• QBIC – Query by Image Content • First commercial CBIR system. • Model system – influenced many

Lecture 5: Outline

• Multidimensional indexing

Lecture 5: Outline

• Multidimensional indexing

− Tree structures

• VP-tree

− Locality Sensitive hashing− Locality Sensitive hashing

• Survey of existing systems

11/35

Page 12: Content Based Image Retrievalromip.ru/russir2008/slides/cbir/lecture-5.pdf• QBIC – Query by Image Content • First commercial CBIR system. • Model system – influenced many

Vantage point methodVantage point method

σµ −≤),( qvd >∈ Sp

σµ +>),( qvd

σσµµ =−−>−≥ |)(||),(),(|),( qvdpvdpqd

>

∈ Spσµ +>),( qvd

σµσµ =−+>−≥ |))(||),(),(|),( pvdqvdpqd

≤∈ Sp

12/35

Page 13: Content Based Image Retrievalromip.ru/russir2008/slides/cbir/lecture-5.pdf• QBIC – Query by Image Content • First commercial CBIR system. • Model system – influenced many

ConditionsConditions

• Minimum circuit

• “Corners” of the space• “Corners” of the space

• Balanced tree

Maximum standard • Maximum standard deviation

13/35

Page 14: Content Based Image Retrievalromip.ru/russir2008/slides/cbir/lecture-5.pdf• QBIC – Query by Image Content • First commercial CBIR system. • Model system – influenced many

AlgorithmsAlgorithms

14/35

Page 15: Content Based Image Retrievalromip.ru/russir2008/slides/cbir/lecture-5.pdf• QBIC – Query by Image Content • First commercial CBIR system. • Model system – influenced many

Lecture 5: Outline

• Multidimensional indexing

Lecture 5: Outline

• Multidimensional indexing

− Tree structures

• VP-tree

− Locality Sensitive hashing− Locality Sensitive hashing

• Survey of existing systems

15/35

Page 16: Content Based Image Retrievalromip.ru/russir2008/slides/cbir/lecture-5.pdf• QBIC – Query by Image Content • First commercial CBIR system. • Model system – influenced many

LSH: MotivationLSH: Motivation

• Similarity Search over High-Dimensional Data• Similarity Search over High-Dimensional Data

− Image databases, document collections etc

• Curse of Dimensionality• Curse of Dimensionality

− All space partitioning techniques degrade to linear search for high dimensions

• Exact vs. Approximate Answer

− Approximate might be good-enough and much-faster

Time-quality trade-off− Time-quality trade-off

16/35

Page 17: Content Based Image Retrievalromip.ru/russir2008/slides/cbir/lecture-5.pdf• QBIC – Query by Image Content • First commercial CBIR system. • Model system – influenced many

LSH: Key ideaLSH: Key idea

• Locality Sensitive Hashing ( LSH ) to get sub-• Locality Sensitive Hashing ( LSH ) to get sub-linear dependence on the data-size for high-dimensional datadimensional data

• Preprocessing :

− Hash the data-point using several LSH functions so − Hash the data-point using several LSH functions so that probability of collision is higher for closer objects

17/35

Page 18: Content Based Image Retrievalromip.ru/russir2008/slides/cbir/lecture-5.pdf• QBIC – Query by Image Content • First commercial CBIR system. • Model system – influenced many

LSH: AlgorithmLSH: Algorithm

• Input • Input

− Set of N points { p1 , …….. pn }

− L ( number of hash tables )

• Output

− Hash tables Ti , i = 1 , 2, …. L

• Foreach i = 1 , 2, …. L• Foreach i = 1 , 2, …. L

− Initialize Ti with a random hash function gi(.)

• Foreach i = 1 , 2, …. L• Foreach i = 1 , 2, …. L

Foreach j = 1 , 2, …. N

Store point pj on bucket gi(pj) of hash table Tij i j i

18/35

Page 19: Content Based Image Retrievalromip.ru/russir2008/slides/cbir/lecture-5.pdf• QBIC – Query by Image Content • First commercial CBIR system. • Model system – influenced many

LSH: AlgorithmLSH: Algorithm

P

pi

P

g1(pi) g2(pi) gL(pi)g1(pi) g2(pi) gL(pi)

TLT2T1

19/35

Page 20: Content Based Image Retrievalromip.ru/russir2008/slides/cbir/lecture-5.pdf• QBIC – Query by Image Content • First commercial CBIR system. • Model system – influenced many

LSH: ε - NNS Query LSH: ε - NNS Query

• Input • Input

− Query point q

− K ( number of approx. nearest neighbors )

Access • Access

− Hash tables Ti , i = 1 , 2, …. L

• Output• Output

− Set S of K ( or less ) approx. nearest neighbors

• S ← ∅Foreach i = 1 , 2, …. L

− S ← S ∪ { points found in gi(q) bucket of hash table Ti }

20/35

Page 21: Content Based Image Retrievalromip.ru/russir2008/slides/cbir/lecture-5.pdf• QBIC – Query by Image Content • First commercial CBIR system. • Model system – influenced many

LSH: Analysis

• Family H of (r1, r2, p1, p2)-sensitive functions, {hi(.)}

LSH: Analysis

• Family H of (r1, r2, p1, p2)-sensitive functions, {hi(.)}

− dist(p,q) < r1 ⇒ ProbH [h(q) = h(p)] ≥ p1

− dist(p,q) ≥ r2 ⇒ ProbH [h(q) = h(p)] ≤ p2− dist(p,q) ≥ r2 ⇒ ProbH [h(q) = h(p)] ≤ p2

− p1 > p2 and r1 < r2

• LSH functions: gi(.) = { h1(.) …hk(.) } i 1 k

• For a proper choice of k and l, a simpler problem, (r,ε)-Neighbor, and hence the actual problem can be solved

• Query Time : O(d ×n[1/(1+ε)] )• Query Time : O(d ×n[1/(1+ε)] )

− d : dimensions , n : data size

21/35

Page 22: Content Based Image Retrievalromip.ru/russir2008/slides/cbir/lecture-5.pdf• QBIC – Query by Image Content • First commercial CBIR system. • Model system – influenced many

LSH: ApplicationsLSH: Applications

• To index local descriptors • To index local descriptors

− Near duplicate detection

− Sub image retrieval− Sub image retrieval

22/35

Page 23: Content Based Image Retrievalromip.ru/russir2008/slides/cbir/lecture-5.pdf• QBIC – Query by Image Content • First commercial CBIR system. • Model system – influenced many

Lecture 5: Outline

• Multidimensional indexing

Lecture 5: Outline

• Multidimensional indexing

− Tree structures

• VP-tree

− Locality Sensitive hashing− Locality Sensitive hashing

• Survey of existing systems

23/35

Page 24: Content Based Image Retrievalromip.ru/russir2008/slides/cbir/lecture-5.pdf• QBIC – Query by Image Content • First commercial CBIR system. • Model system – influenced many

IBM’s QBICIBM’s QBIC

• http://wwwqbic.almaden.ibm.com/• http://wwwqbic.almaden.ibm.com/

• QBIC – Query by Image Content

First commercial CBIR system.• First commercial CBIR system.

• Model system – influenced many others.• Model system – influenced many others.

• Uses color, texture, shape features

• Text-based search can also be combined.• Text-based search can also be combined.

• Uses R*-trees for indexing

24/35

Page 25: Content Based Image Retrievalromip.ru/russir2008/slides/cbir/lecture-5.pdf• QBIC – Query by Image Content • First commercial CBIR system. • Model system – influenced many

QBIC – Search by colorQBIC – Search by color

25/35

Page 26: Content Based Image Retrievalromip.ru/russir2008/slides/cbir/lecture-5.pdf• QBIC – Query by Image Content • First commercial CBIR system. • Model system – influenced many

QBIC – Search by shapeQBIC – Search by shape

26/35

Page 27: Content Based Image Retrievalromip.ru/russir2008/slides/cbir/lecture-5.pdf• QBIC – Query by Image Content • First commercial CBIR system. • Model system – influenced many

QBIC – Query by sketchQBIC – Query by sketch

27/35

Page 28: Content Based Image Retrievalromip.ru/russir2008/slides/cbir/lecture-5.pdf• QBIC – Query by Image Content • First commercial CBIR system. • Model system – influenced many

VirageVirage

• http://www.virage.com/home/index.en.html• http://www.virage.com/home/index.en.html

• Developed by Virage inc.

Like QBIC, supports queries based on color, • Like QBIC, supports queries based on color, layout, texture

• Supports arbitrary combinations of these features with weights attached to each

• This gives users more control over the search processprocess

28/35

Page 29: Content Based Image Retrievalromip.ru/russir2008/slides/cbir/lecture-5.pdf• QBIC – Query by Image Content • First commercial CBIR system. • Model system – influenced many

VisualSEEkVisualSEEk

• http://www.ee.columbia.edu/ln/dvmm/researchPro• http://www.ee.columbia.edu/ln/dvmm/researchProjects/MultimediaIndexing/VisualSEEk/VisualSEEk.htm

• Research prototype – University of Columbia

• Mainly different because it considers spatial • Mainly different because it considers spatial relationships between objects.

• Global features like mean color, color histogram can give many false positives

• Global features like mean color, color histogram can give many false positives

• Matching spatial relationships between objects and visual features together result in a powerful

• Matching spatial relationships between objects and visual features together result in a powerful search.

29/35

Page 30: Content Based Image Retrievalromip.ru/russir2008/slides/cbir/lecture-5.pdf• QBIC – Query by Image Content • First commercial CBIR system. • Model system – influenced many

Features in some existing systemsFeatures in some existing systems

QBIC

Color Texture Shape

Histograms (HSV) Tamura Image, Euclid dist

Boundary geometrical moments + Invariant

TAHHdist 21

2 =QBIC

VisualSEEkHistograms (HSV), Color Sets,

Euclid distmoments + Invariant moments

VisualSEEk

NetraHistograms (HSV), Color codebook,

Color Sets,Location info

Gabor filters Fourier-basedNetra

MarsHistograms, HSV

Color codebook,Clusterisation

Gabor filters

Tamura Image,

Fourier-based

MFD (Fourier)Mars

Tamura Image, 3D Histo

MFD (Fourier)

∑=

−=N

i

iHiHdist1

21 ))(),(min(1

30/35

Page 31: Content Based Image Retrievalromip.ru/russir2008/slides/cbir/lecture-5.pdf• QBIC – Query by Image Content • First commercial CBIR system. • Model system – influenced many

Other systemsOther systems

• xCavator by CogniSign• xCavator by CogniSignhttp://xcavator.net/

• CIRES• CIREShttp://amazon.ece.utexas.edu/~qasim/samples/sample_buildings5.html

• MFIRS by University of Mysore • MFIRS by University of Mysore http://www.pilevar.com/mfirs/

• PIRIA• PIRIAhttp://www-list.cea.fr/fr/programmes/systemes_interactifs/labo_lic2m/list.cea.fr/fr/programmes/systemes_interactifs/labo_lic2m/piria/w3/pirianet.php?bdi=coil-100&cide=cciv&up=1&p=1

31/35

Page 32: Content Based Image Retrievalromip.ru/russir2008/slides/cbir/lecture-5.pdf• QBIC – Query by Image Content • First commercial CBIR system. • Model system – influenced many

Other systemsOther systems

• IMEDIA• IMEDIAhttp://www-rocq.inria.fr/cgi-bin/imedia/circario.cgi/v2std

• TILTOMOhttp://www.tiltomo.com/http://www.tiltomo.com/

• The GNU Image-Finding Toolhttp://www.gnu.org/software/gift/http://www.gnu.org/software/gift/

• Beholdhttp://www.beholdsearch.com/about/#features

• LTU technologieshttp://www.ltutech.com/en/

• …• …

32/35

Page 33: Content Based Image Retrievalromip.ru/russir2008/slides/cbir/lecture-5.pdf• QBIC – Query by Image Content • First commercial CBIR system. • Model system – influenced many

Lecture 5: ResumeLecture 5: Resume

• Multidimensional indexing• Multidimensional indexing

− VP trees can be used

− LSH is great for near duplicates and sub image − LSH is great for near duplicates and sub image retrieval

• There are a lot of systems

− Research projects− Research projects

− Commercial projects (usually combined with text-based retrieval)retrieval)

− CBIR is a very active area: research is moving to commercialize projects just now

33/35

commercialize projects just now

Page 34: Content Based Image Retrievalromip.ru/russir2008/slides/cbir/lecture-5.pdf• QBIC – Query by Image Content • First commercial CBIR system. • Model system – influenced many

Lecture 5: BibliographyLecture 5: Bibliography

• Christian Böhm, Stefan Berchtold, Daniel A. Keim. Searching in high-• Christian Böhm, Stefan Berchtold, Daniel A. Keim. Searching in high-dimensional spaces: Index structures for improving the performance of multimedia databases. ACM Computing Surveys 2001.

• Volker Gaede, Oliver Günther. Multidimensional Access Methods. ACM Computing Surveys 1998.Volker Gaede, Oliver Günther. Multidimensional Access Methods. ACM Computing Surveys 1998.

• Roger Weber, Hans-Jörg Schek, Stephen Blott. A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces. International Conference on Very Large Data Bases (VLDB) 1998.Spaces. International Conference on Very Large Data Bases (VLDB) 1998.

• Mayur Datar, Nicole Immorlica, Piotr Indyk, and Vahab S. Mirrokni. Locality-sensitive hashing scheme based on p-stable distributions. In SCG '04, pp 253-262, 2004. 253-262, 2004.

• Kave Eshgi, Shyamsundar Rajaram. Locality Sensitive Hash Functions Based on Concomitant Rank Order Statistics. In Proc. of ACM KDD, 2008.

34/35

Page 35: Content Based Image Retrievalromip.ru/russir2008/slides/cbir/lecture-5.pdf• QBIC – Query by Image Content • First commercial CBIR system. • Model system – influenced many

Tutorial outlineTutorial outline• Lecture 1

− Introduction− Introduction

− Applications

• Lecture 2− Performance measurement− Performance measurement

− Visual perception

− Color features

• Lecture 3• Lecture 3− Texture features

− Shape features

− Fusion methods− Fusion methods

• Lecture 4− Segmentation

Local descriptors− Local descriptors

• Lecture 5− Multidimensional indexing

35/35

− Survey of existing systems