Efficient EMD-based Similarity Search in Multimedia Databases via Flexible Dimensionality Reduction

SIGMOD 08, June 10th 2008, Vancouver, Canada
Marc Wichterich, Ira Assent, Philipp Kranen, Thomas Seidl
Chair of Computer Science 9, Data Management and Exploration



Outline

Introduction
  Similarity Search
  The Earth Mover’s Distance
  Dimensionality Reduction

Dimensionality Reduction for the EMD
  Reduction Matrixes
  Data-independent Reduction
  Data-dependent Reduction

Experimental Results

Conclusion & Outlook


Introduction – Similarity Search

Objective: Find similar objects in database

Applications: Medical images, edutainment, engineering, etc.

Requires:
  Object feature extraction (here: feature histograms)
  Similarity measure (here: Earth Mover’s Distance)
  Efficient retrieval technique for similar objects


Introduction – The Earth Mover’s Distance[1]

Transform the features of one object to match those of the other object

EMD: minimum total “cost x flow” over all such transformations
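For reference, the "minimum cost x flow" idea can be written as a linear program. The formulation below is a sketch that assumes two histograms x and y of equal total mass and omits any normalization by that mass; neither detail is stated on the slide. C = [c_ab] is the cost matrix and F = [f_ab] the flows, matching the symbols used on later slides.

```latex
\[
\mathrm{EMD}_C(x, y) \;=\; \min_{F} \sum_{a=1}^{d} \sum_{b=1}^{d} c_{ab}\, f_{ab}
\quad \text{subject to} \quad
f_{ab} \ge 0, \qquad
\sum_{b=1}^{d} f_{ab} = x_a, \qquad
\sum_{a=1}^{d} f_{ab} = y_b .
\]
```

The number of flow variables grows quadratically with the histogram dimensionality d, which is why reducing d matters for query efficiency.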

[Figure: histogram x and histogram y, with the flows that transform one into the other]

[1] Rubner, Tomasi, Perceptual Metrics for Image Database Navigation, Kluwer, 2001.

Introduction – Dimensionality Reduction

Challenge for Similarity Search: high computational complexity for high dimensionalities

Approach:
  Reduce dimensionality of query & DB
  Filter DB using lower dimensionality
  Refine using orig. dimensionality

Filter quality criteria
  Selectivity (few refinements)
  No false dismissals (lower bound property)
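The filter-and-refine steps can be sketched as a range query. To stay self-contained, this example uses the L1 distance as a stand-in for the EMD and sums groups of bins as the reduction; both are assumptions made for illustration, as are all function names. What carries over is the structure: the reduced distance never exceeds the original one, so the filter produces no false dismissals.

```python
def aggregate(hist, groups):
    """Reduce a histogram by summing the bins of each group."""
    return [sum(hist[i] for i in group) for group in groups]


def l1(x, y):
    """L1 distance, used here only as a cheap stand-in for the EMD."""
    return sum(abs(a - b) for a, b in zip(x, y))


def range_query(query, db, eps, groups):
    """Return all histograms within eps of the query, plus the candidate count."""
    q_red = aggregate(query, groups)
    # Filter step: the reduced distance lower-bounds the original distance,
    # so anything rejected here cannot be a true result.
    candidates = [h for h in db if l1(q_red, aggregate(h, groups)) <= eps]
    # Refine step: evaluate the expensive distance only on the candidates.
    results = [h for h in candidates if l1(query, h) <= eps]
    return results, len(candidates)


db = [[2, 4, 3, 6], [3, 3, 3, 6], [6, 0, 9, 0], [5, 5, 2, 3]]
query = [2, 4, 3, 6]
eps = 2
groups = [[0, 1], [2, 3]]  # original bins 0,1 -> reduced bin 0; bins 2,3 -> reduced bin 1

results, num_candidates = range_query(query, db, eps, groups)
print(results, num_candidates)
```

In a real system the reduced database histograms would be precomputed once rather than on every query. Selectivity corresponds to keeping `num_candidates` close to `len(results)`.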


Dimensionality Reduction for the EMD

Both the feature vectors and the cost matrix have to be reduced

General linear dimensionality reduction techniques (PCA, ICA, etc.) fail quality criteria for EMD
  Discarding dimensions destroys LB property
  Splitting dimensions causes poor selectivity

Aggregating dimensionality reductions can work well
  Original dimensions are not split up
  Each reduced dimension consists of a set of orig. dimensions


Reduction Matrixes

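Aggregating dimensionality reductions are characterized by a reduction matrix $R = [r_{aa'}] \in \{0,1\}^{d \times d'}$ with exactly one 1 in every row, so that each original dimension is assigned to exactly one reduced dimension.

Example:

\[
R = \begin{pmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 1 \\ 0 & 1 \end{pmatrix}, \qquad
x = (2 \;\; 4 \;\; 3 \;\; 6), \qquad
x' = x \cdot R = (6 \;\; 9)
\]

Lower-bounding reduced cost matrix $C' = [c'_{a'b'}]$ given R, as given by [2]: $c'_{a'b'} = \min \{\, c_{ab} \mid r_{aa'} = 1 \wedge r_{bb'} = 1 \,\}$
  There is no larger lower bound (see paper)

Main question: Which dimensions to aggregate?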
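The reduction of a histogram and of the cost matrix can be sketched in a few lines. The sketch assumes that R is stored as an assignment list (entry a holds the reduced dimension of original dimension a) and that each reduced cost is the minimum over the aggregated original costs. The function names are hypothetical, and the 4 x 4 cost matrix is the one shown in the data-independent reduction example.

```python
def reduce_histogram(x, assignment, d_red):
    """Compute x' = x * R by summing the bins assigned to each reduced dimension."""
    x_red = [0] * d_red
    for a, a_red in enumerate(assignment):
        x_red[a_red] += x[a]
    return x_red


def reduce_cost_matrix(C, assignment, d_red):
    """Reduced cost = smallest original cost between the two aggregated groups."""
    inf = float("inf")
    C_red = [[inf] * d_red for _ in range(d_red)]
    for a, a_red in enumerate(assignment):
        for b, b_red in enumerate(assignment):
            C_red[a_red][b_red] = min(C_red[a_red][b_red], C[a][b])
    return C_red  # entries stay inf for reduced dimensions with no members


assignment = [0, 0, 1, 1]  # same R as in the example above
x = [2, 4, 3, 6]
C = [[0, 1, 3, 4],
     [1, 0, 2, 3],
     [3, 2, 0, 1],
     [4, 3, 1, 0]]

print(reduce_histogram(x, assignment, 2))    # [6, 9]
print(reduce_cost_matrix(C, assignment, 2))  # [[0, 2], [2, 0]]
```

Taking the minimum keeps every reduced cost at or below the original costs it replaces, which is what makes the reduced EMD a lower bound.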

[2] Ljosa, Bhattacharya, Singh, Indexing Spatially Sensitive Distance Measures using Multi-Resolution Lower Bounds, EDBT2006.


Data-Independent Reduction

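Goal: Tight lower bound (large reduced EMD values)
  Large cost between reduced dimensions
  Small loss of cost for each reduced dimension

Matches clustering goal: low intra-cluster dissimilarity / high inter-cluster dissimilarity

kMedoid clustering based on the cost matrix

\[
C = \begin{pmatrix} 0 & 1 & 3 & 4 \\ 1 & 0 & 2 & 3 \\ 3 & 2 & 0 & 1 \\ 4 & 3 & 1 & 0 \end{pmatrix}, \qquad
R = \begin{pmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 1 \\ 0 & 1 \end{pmatrix}, \qquad
C' = \begin{pmatrix} 0 & 2 \\ 2 & 0 \end{pmatrix}
\]

[Figure label: lost cost information]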
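A minimal k-medoids sketch on the cost matrix illustrates how the data-independent reduction could pick the groups. The alternating assign/update scheme and the naive initialization (the first k dimensions as medoids) are assumptions made for this example and not necessarily the variant used by the authors.

```python
def k_medoids(C, k, max_iter=100):
    """Cluster the d original dimensions into k groups using costs C as dissimilarities."""
    d = len(C)
    medoids = list(range(k))  # naive deterministic initialization (assumption)
    assignment = [0] * d
    for _ in range(max_iter):
        # Assign every dimension to its cheapest medoid.
        assignment = [min(range(k), key=lambda j: C[a][medoids[j]]) for a in range(d)]
        # Re-pick each medoid as the member with the least total cost to its cluster.
        new_medoids = []
        for j in range(k):
            members = [a for a in range(d) if assignment[a] == j]
            if not members:  # keep the old medoid if a cluster ran empty
                new_medoids.append(medoids[j])
                continue
            new_medoids.append(min(members, key=lambda m: sum(C[m][a] for a in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return assignment, medoids


C = [[0, 1, 3, 4],
     [1, 0, 2, 3],
     [3, 2, 0, 1],
     [4, 3, 1, 0]]

assignment, medoids = k_medoids(C, 2)
print(assignment)  # [0, 0, 1, 1], i.e. the reduction matrix R of the example
```

On this cost matrix the clustering groups dimensions 1, 2 and dimensions 3, 4, which keeps the cheap transitions inside a group and the expensive ones between groups.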


Data-Dependent Reduction based on flows

Idea: Incorporate knowledge on data for better reduction

In data-independent reduction, only C is used
  Problem: Ensuring large $c'_{a'b'}$ is pointless if $f'_{a'b'}$ is small

Now: Also include information on F


Data-Dependent Reduction: Algorithm

Add preprocessing step analyzing the data
  Collect information about flows in unreduced EMD
  Use information to improve initial / intermediate reduction matrix
  Iterate until no improvement made

Data-Dependent Reduction: Preprocessing

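Calculate average flow matrix $\bar{F} = [\bar{f}_{ab}]$ for sample S of DB

Approximate the flows F’ in reduced EMD with $\tilde{F}' = R^T \bar{F} R$

Maximize approximate average reduced EMD

Example (average flows $\bar{F}$, reduction matrix $R$, approximate average reduced flows $\tilde{F}'$):

\[
\bar{F} = \begin{pmatrix} 2 & 1 & 2 & 3 \\ 0 & 1 & 2 & 1 \\ 3 & 2 & 3 & 1 \\ 1 & 3 & 0 & 1 \end{pmatrix}, \qquad
R = \begin{pmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 1 \\ 0 & 1 \end{pmatrix}, \qquad
\tilde{F}' = R^T \bar{F} R = \begin{pmatrix} 4 & 8 \\ 9 & 5 \end{pmatrix}
\]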
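The flow approximation can be checked numerically with the example matrices. The sketch below computes R^T F R with plain list arithmetic. It then scores the reduction as the sum of reduced cost times approximate reduced flow; that form of the objective is an assumption based on the "cost x flow" definition of the EMD, and the reduced cost matrix is taken from the data-independent example. Helper names are hypothetical.

```python
def transpose(A):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Plain matrix product of two list-of-rows matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


F_avg = [[2, 1, 2, 3],   # average flows over a sample of the database
         [0, 1, 2, 1],
         [3, 2, 3, 1],
         [1, 3, 0, 1]]
R = [[1, 0],
     [1, 0],
     [0, 1],
     [0, 1]]

# Approximate average reduced flows: F' = R^T F R
F_red = matmul(matmul(transpose(R), F_avg), R)
print(F_red)  # [[4, 8], [9, 5]]

# Assumed objective: sum of reduced cost times approximate reduced flow.
C_red = [[0, 2],
         [2, 0]]
score = sum(C_red[i][j] * F_red[i][j] for i in range(2) for j in range(2))
print(score)  # 34
```

Computing the average flow matrix itself requires solving the unreduced EMD for pairs of sample histograms, which is not shown here.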


Data-Dependent Reduction: Optimization

Global optimization of the approximate average reduced EMD requires assessment of all possible reduction matrices
  Find local optimum via reassignment of dimensions

FB-All: Choose best reassignment in each iteration
FB-Mod: Choose first profitable reassignment in each iteration

Initial reduction matrices
  Base: assign all original dimensions to first reduced dimension
  KMed: reduction matrix from data-independent reduction
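The reassignment search can be sketched as a hill climb over assignment lists. This is an illustrative reading of the slide and not the authors' implementation. The objective (sum of reduced cost times approximate reduced flow), the function names, and the `first_improvement` flag, which switches between an FB-Mod-like and an FB-All-like strategy, are all assumptions.

```python
def approx_reduced_emd(C, F_avg, assignment, d_red):
    """Assumed objective: sum over reduced pairs of min-cost times aggregated flow."""
    inf = float("inf")
    C_red = [[inf] * d_red for _ in range(d_red)]
    F_red = [[0.0] * d_red for _ in range(d_red)]
    for a, a_red in enumerate(assignment):
        for b, b_red in enumerate(assignment):
            C_red[a_red][b_red] = min(C_red[a_red][b_red], C[a][b])
            F_red[a_red][b_red] += F_avg[a][b]
    # Pairs without flow are skipped, which also avoids inf * 0 for empty groups.
    return sum(C_red[i][j] * F_red[i][j]
               for i in range(d_red) for j in range(d_red) if F_red[i][j] > 0)


def local_search(C, F_avg, d_red, start, first_improvement=True):
    """Reassign single dimensions until no reassignment raises the objective."""
    assignment = list(start)
    best = approx_reduced_emd(C, F_avg, assignment, d_red)
    while True:
        best_move = None
        for a in range(len(assignment)):
            old = assignment[a]
            for target in range(d_red):
                if target == old:
                    continue
                assignment[a] = target  # try the move, then undo it
                value = approx_reduced_emd(C, F_avg, assignment, d_red)
                assignment[a] = old
                if value > best:
                    best, best_move = value, (a, target)
                    if first_improvement:
                        break
            if first_improvement and best_move is not None:
                break
        if best_move is None:
            return assignment, best  # local optimum reached
        assignment[best_move[0]] = best_move[1]


C = [[0, 1, 3, 4], [1, 0, 2, 3], [3, 2, 0, 1], [4, 3, 1, 0]]
F_avg = [[2, 1, 2, 3], [0, 1, 2, 1], [3, 2, 3, 1], [1, 3, 0, 1]]

base = [0, 0, 0, 0]  # "Base": everything in the first reduced dimension
kmed = [0, 0, 1, 1]  # "KMed": result of the data-independent reduction
print(local_search(C, F_avg, 2, base, first_improvement=True))
print(local_search(C, F_avg, 2, kmed, first_improvement=False))
```

Each accepted move strictly increases the objective and the number of assignments is finite, so the loop terminates. The result is only a local optimum and can depend on the initial reduction matrix.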


Experimental Results

Data-independent vs. data-dependent aggregation

[Figure panels: sample image [2]; data independent (kMedoid); data dependent (FB-All-Mod); costliest flows]

Experimental Results

Efficiency vs. reduced dimensionality (Retina DB)


Experimental Results

Efficiency vs. reduced dimensionality (IRMA DB)


Experimental Results


Filter & Refinement times and filter selectivity (IRMA DB)


Conclusion & Outlook

Conclusion
  Earth Mover’s Distance as a similarity measure
    High quality, but computationally expensive in high dimensions
  Dimensionality reduction for the EMD
    Data-independent reduction: Clustering in feature space
    Data-dependent reduction: Analyze flow information

Outlook
  Local reductions
  Different reduction for query and DB
  Index reduced histograms using [3]

[3] Assent, Wichterich, Meisen, Seidl, Efficient Similarity Search Using the Earth Mover's Distance for Large Multimedia Databases, ICDE 2008.