Using Sets of Feature Vectors for Similarity Search on Voxelized CAD Objects

Using Sets of Feature Vectors for Similarity Search on Voxelized CAD Objects. Hans-Peter Kriegel, Stefan Brecheisen, Peer Kröger, Martin Pfeifle, Matthias Schubert. ACM SIGMOD 2003, San Diego, California, June 9-12, 2003. Database Group, Institute for Computer Science, University of Munich, Germany.


Page 1: Using Sets of Feature Vectors for  Similarity Search on Voxelized CAD Objects

San Diego, 06/12/03. Martin Pfeifle, Database Group, University of Munich

Using Sets of Feature Vectors for Similarity Search on Voxelized CAD Objects

Hans-Peter Kriegel, Stefan Brecheisen, Peer Kröger, Martin Pfeifle, Matthias Schubert

ACM SIGMOD 2003, San Diego, California, June 9-12, 2003

Database Group, Institute for Computer Science, University of Munich, Germany

Page 2: Using Sets of Feature Vectors for  Similarity Search on Voxelized CAD Objects

Outline of the Talk

• Introduction
• Space Partitioning Models
• Data Partitioning Models
• Vector Set Model (new)
• Evaluation
• Conclusion

Page 3: Using Sets of Feature Vectors for  Similarity Search on Voxelized CAD Objects

Introduction

System requirements:

• The system should help to reduce the cost of developing new parts
• Avoid "reinventing the wheel"
• Reuse existing parts

[Figure: similarity queries on a CAD database of complex spatial objects. One query path ends in a timeout or unsuitable results; the other returns meaningful results in a comparatively short time.]

Solution:

• Efficient similarity search
• Effective similarity search

Both are addressed by a similarity model based on sets of feature vectors.

Page 4: Using Sets of Feature Vectors for  Similarity Search on Voxelized CAD Objects

Outline of the Talk

• Introduction
• Space Partitioning Models
• Data Partitioning Models
• Evaluation
• Conclusion

Next: Space Partitioning Models

Page 5: Using Sets of Feature Vectors for  Similarity Search on Voxelized CAD Objects

Space Partitioning Models: Feature Transformation

• A 3D CAD object is represented by a mesh of triangles
• Voxelization of the triangle meshes and object normalization
• Partitioning of the data space into disjoint, enumerated cells
• Extraction of k spatial features for each cell
• Similarity of objects = proximity of the corresponding feature vectors

[Figure: CAD system → triangle meshes → normalized, voxelized object → feature vector (0.75, 0.34, ...)]

Page 6: Using Sets of Feature Vectors for  Similarity Search on Voxelized CAD Objects

Space Partitioning Models: Notation

• The data space is partitioned into p axis-parallel grid cells in each dimension
• Let r = the raster (voxel) resolution
• $V^o$ = set of voxels representing object $o \in O$
• $V_i^o$ = set of voxels covered by o in cell i
• $f_o(i)$ = i-th value of the feature vector of o

[2D example: CAD object o and the voxel set $V^o$ representing it, with r = 9 and p = 3]

Page 7: Using Sets of Feature Vectors for  Similarity Search on Voxelized CAD Objects

Space Partitioning Models: The Volume Model

• Count the number of object voxels $|V_i^o|$ in each cell i
• Normalize by the voxel capacity K of each cell
• Feature value for cell i: $f_o(i) = \frac{|V_i^o|}{K}$, where $K = \left(\frac{r}{p}\right)^3$ in the 3D case

[2D example: the feature vector is 1/9 times the vector of per-cell voxel counts]
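A minimal sketch of the volume model in the 3D case may help here. It assumes that the object is given as integer voxel coordinates in [0, r), that p divides r, and that cells are enumerated in row-major order; the function name `volume_features` and the enumeration order are illustrative choices, not taken from the slides.

```python
from itertools import product


def volume_features(voxels, r, p):
    """Return the p**3 volume-model feature values of a voxelized object.

    voxels: iterable of (x, y, z) integer coordinates in [0, r)
    r: raster resolution per dimension
    p: number of grid cells per dimension (assumed to divide r)
    """
    if r % p != 0:
        raise ValueError("this sketch assumes that p divides r")
    side = r // p            # voxels along one edge of a cell
    capacity = side ** 3     # K = (r / p)^3
    counts = [0] * (p ** 3)
    for x, y, z in set(voxels):
        cx, cy, cz = x // side, y // side, z // side
        counts[(cx * p + cy) * p + cz] += 1   # row-major cell enumeration
    return [count / capacity for count in counts]


# A solid 3x3x3 cube in one corner of a 6x6x6 raster, with p = 2.
cube = list(product(range(3), repeat=3))
features = volume_features(cube, r=6, p=2)
```

The cube fills exactly one cell, so that cell is saturated and all other feature values stay at zero.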

Page 8: Using Sets of Feature Vectors for  Similarity Search on Voxelized CAD Objects

Space Partitioning Models: The Solid Angle Model

The solid angle model measures the concavity and convexity of surfaces.

• Compute the SA-value SA(v) for each surface voxel v of object o: $SA(v) = \frac{|S_v \cap V^o|}{|S_v|}$, where $S_v$ is a voxelized reference sphere around v
• Each cell is represented by one dimension in the feature vector:
  • $f_o(i) = 0$ if cell i contains no voxel of o
  • $f_o(i) = 1$ if cell i contains only inside voxels of o
  • $f_o(i) = \frac{1}{m} \sum_{j=1}^{m} SA(v_j)$ otherwise, where $v_1, \dots, v_m$ are the surface voxels of o in cell i

[2D example: reference spheres $S_x$ and $S_y$ around surface voxels x and y, with sample SA-values 0.34, 0.30, 0.31, 0.32 on a scale from 0 to 1]
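The solid angle computation can be sketched as follows. Several details are assumptions made only for illustration: a voxel counts as a surface voxel if one of its six face neighbours is empty, the reference sphere is given by an integer radius, and the helper names are hypothetical.

```python
from itertools import product


def sphere_offsets(radius):
    """Integer offsets of a voxelized reference sphere with the given radius."""
    rng = range(-radius, radius + 1)
    return [(dx, dy, dz) for dx, dy, dz in product(rng, repeat=3)
            if dx * dx + dy * dy + dz * dz <= radius * radius]


def is_surface(v, voxels):
    """Assumption: a surface voxel has at least one empty face neighbour."""
    x, y, z = v
    neighbours = [(x + 1, y, z), (x - 1, y, z), (x, y + 1, z),
                  (x, y - 1, z), (x, y, z + 1), (x, y, z - 1)]
    return any(n not in voxels for n in neighbours)


def solid_angle(v, voxels, offsets):
    """SA(v) = |S_v intersected with V^o| / |S_v|."""
    x, y, z = v
    inside = sum((x + dx, y + dy, z + dz) in voxels for dx, dy, dz in offsets)
    return inside / len(offsets)


def cell_feature(cell_voxels, voxels, offsets):
    """Feature value of one cell, given the object voxels that fall into it."""
    if not cell_voxels:
        return 0.0                      # no voxel of o in this cell
    surface = [v for v in cell_voxels if is_surface(v, voxels)]
    if not surface:
        return 1.0                      # only inside voxels
    return sum(solid_angle(v, voxels, offsets) for v in surface) / len(surface)


# A solid 5x5x5 cube; the reference sphere of radius 1 has 7 voxels.
cube = set(product(range(5), repeat=3))
offsets = sphere_offsets(1)
corner = solid_angle((0, 0, 0), cube, offsets)   # convex corner: low SA-value
face = solid_angle((2, 2, 0), cube, offsets)     # flat face: higher SA-value
```

Convex regions such as the cube corner yield smaller SA-values than flat regions, which is the property the model uses to describe surface shape.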

Page 9: Using Sets of Feature Vectors for  Similarity Search on Voxelized CAD Objects

Outline of the Talk

• Introduction
• Space Partitioning Models
• Data Partitioning Models
• Evaluation
• Conclusion

Next: Data Partitioning Models

Page 10: Using Sets of Feature Vectors for  Similarity Search on Voxelized CAD Objects

Data Partitioning Models: Cover Sequence Model

• Approximation of the object by means of a cover sequence (Jagadish 91)
• Cover sequence: $S_k = (((C_0 \, \sigma_1 \, C_1) \, \sigma_2 \, C_2) \dots \sigma_k \, C_k)$, where $\sigma_i \in \{+, -\}$, k is the number of covers, and the $C_i$ are axis-parallel (hyper-)rectangles
• Approximation quality: symmetric volume difference $Err_k = |o \; XOR \; S_k|$
• Computation of $S_k$ by means of a greedy algorithm
• The object is represented by a 6·k-dimensional feature vector (3D case)

[2D example: cover sequences and their errors]

S1 = (C0 + C1), Err1 = 14
S2 = ((C0 + C1) + C2), Err2 = 10
S3 = (((C0 + C1) + C2) - C3), Err3 = 7

2D feature vector $f_o$:

• $f_o(4i+1)$ = x-position of $C_i$
• $f_o(4i+2)$ = y-position of $C_i$
• $f_o(4i+3)$ = x-extension of $C_i$
• $f_o(4i+4)$ = y-extension of $C_i$
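The greedy idea can be illustrated with a brute-force 2D sketch. This is not Jagadish's algorithm: it simply tries every axis-parallel rectangle with both signs at each step and keeps the one with the smallest symmetric difference. The empty initial cover, the use of the lower-left corner as the position, and all function names are assumptions for illustration.

```python
from itertools import product


def rect_cells(x, y, w, h):
    """Set of grid cells covered by an axis-parallel rectangle."""
    return {(i, j) for i in range(x, x + w) for j in range(y, y + h)}


def greedy_cover_sequence(obj, r, k):
    """Pick k signed rectangles, each minimising |obj XOR S_i| at its step."""
    approx, covers = set(), []       # assumption: C_0 is an empty cover
    for _ in range(k):
        best = None
        for x, y in product(range(r), repeat=2):
            for w, h in product(range(1, r - x + 1), range(1, r - y + 1)):
                cells = rect_cells(x, y, w, h)
                for sign, cand in (("+", approx | cells), ("-", approx - cells)):
                    err = len(obj ^ cand)        # symmetric volume difference
                    if best is None or err < best[0]:
                        best = (err, sign, (x, y, w, h), cand)
        err, sign, rect, approx = best
        covers.append((sign, rect, err))
    return covers


def feature_vector(covers):
    """Concatenate (x-pos, y-pos, x-ext, y-ext) per cover: 4*k values in 2D."""
    return [value for _, rect, _ in covers for value in rect]


# An L-shaped object on a 6x6 raster, approximated with two covers.
l_shape = rect_cells(0, 0, 4, 2) | rect_cells(0, 2, 2, 2)
covers = greedy_cover_sequence(l_shape, r=6, k=2)
errors = [err for _, _, err in covers]
vector = feature_vector(covers)
```

The exhaustive search is only practical for tiny rasters; it is meant to show how the error shrinks as covers are added and how the covers are flattened into one feature vector.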

Page 11: Using Sets of Feature Vectors for  Similarity Search on Voxelized CAD Objects

Data Partitioning Models: Vector Set Model

[2D example: a query object and a database object (S4 database), each approximated by four covers]

S4 query (original) = ((((C0 + C1) - C2) - C3) - C4)
S4 query (optimal) = ((((C0 + C1) - C3) - C4) - C2)

Each cover contributes one vector of x/y position and x/y extension, $q_i = (q_i^{px}, q_i^{py}, q_i^{ex}, q_i^{ey})$ for the query object and $db_i = (db_i^{px}, db_i^{py}, db_i^{ex}, db_i^{ey})$ for the database object. The order of the covers in the concatenated feature vector changes the resulting distance:

$d_{euclid}\big((q_1, q_2, q_3, q_4), (db_1, db_2, db_3, db_4)\big) \gg d_{euclid}\big((q_1, q_3, q_4, q_2), (db_1, db_2, db_3, db_4)\big)$

Page 12: Using Sets of Feature Vectors for  Similarity Search on Voxelized CAD Objects

Data Partitioning Models: Vector Set Model

• The cover sequence $S_k = (((C_0 \, \sigma_1 \, C_1) \, \sigma_2 \, C_2) \dots \sigma_k \, C_k)$ is represented by a set of vectors $X \subset \mathbb{R}^6$, $|X| \le k$ (in the 3D case)
• Distance measure between two vector sets X and Y: minimum weight perfect matching
  • Create a complete bipartite graph $G = (X \cup Y, X \times Y)$
  • The weight of each edge $(x, y) \in X \times Y$ is $d_{euclid}(x, y)$
  • Weight function for unmatched nodes if $|X| \ne |Y|$: distance to a dummy cover
  • The minimum weight perfect matching is computed by the Kuhn-Munkres algorithm in $O(k^3)$

[2D example: the covers of the query object and of the database object, each a vector (position X, position Y, extension X, extension Y), are matched pairwise]
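The matching distance can be sketched in a few lines. For brevity this sketch enumerates all permutations instead of running the Kuhn-Munkres algorithm, so it only suits small k; the zero vector used as dummy cover is an assumption, and `matching_distance` is a hypothetical name.

```python
from itertools import permutations
from math import dist


def matching_distance(X, Y, dummy):
    """Minimum weight perfect matching distance between two vector sets.

    The smaller set is padded with `dummy`, so every unmatched vector is
    charged its Euclidean distance to the dummy cover.
    """
    k = max(len(X), len(Y))
    X = list(X) + [dummy] * (k - len(X))
    Y = list(Y) + [dummy] * (k - len(Y))
    # Brute force over all assignments: O(k!) instead of O(k^3).
    return min(sum(dist(x, y) for x, y in zip(X, perm))
               for perm in permutations(Y))


dummy = (0, 0, 0, 0)                      # assumed dummy cover (2D case)
X = [(0, 0, 2, 2), (5, 5, 1, 1)]
Y = [(5, 5, 1, 1), (0, 0, 2, 2)]          # same covers, different order
same = matching_distance(X, Y, dummy)
shorter = matching_distance(X, [(0, 0, 2, 2)], dummy)
```

Because the covers are treated as a set, reordering them leaves the distance unchanged, which is exactly what the concatenated feature vector of the cover sequence model cannot guarantee.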

Page 13: Using Sets of Feature Vectors for  Similarity Search on Voxelized CAD Objects

Data Partitioning Models: Vector Set Model

Efficient similarity queries based on multi-step query processing:

• Range queries (Faloutsos et al. 94)
• k-nearest neighbor queries (Korn et al. 96)
• Optimal multi-step k-nearest neighbor search (Seidl, Kriegel 98)

[Figure: filter step (index-based) → candidates → refinement step (exact evaluation) → results]

• The lower bounding property guarantees no false drops: $\forall o_1, o_2 \in O: d_o(o_1, o_2) \ge d_f(o_1, o_2)$
• k (= cardinality of the two vector sets) times the distance between the centroids of the two vector sets lower-bounds the minimum weight perfect matching distance

[2D example: query object and database object with the query centroid and the database centroid, plotted over position X, position Y, extension X, extension Y]
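The centroid filter follows from the triangle inequality: for any matching, the sum of the pairwise distances is at least the norm of the summed difference vectors, which equals k times the centroid distance. A minimal sketch of a multi-step range query built on that bound is shown here. It assumes both sets already have the same cardinality k, uses a linear scan in place of the index-based filter step, and uses a brute-force refinement; all names are hypothetical.

```python
from itertools import permutations
from math import dist


def matching_distance(X, Y):
    """Exact distance for the refinement step (brute force, |X| == |Y|)."""
    return min(sum(dist(x, y) for x, y in zip(X, perm))
               for perm in permutations(Y))


def centroid(X):
    """Component-wise mean of a vector set."""
    return tuple(sum(column) / len(X) for column in zip(*X))


def filter_distance(X, Y):
    """k times the centroid distance: a lower bound of matching_distance."""
    return len(X) * dist(centroid(X), centroid(Y))


def range_query(query, database, eps):
    """Filter with the cheap lower bound, then refine the candidates exactly."""
    candidates = [X for X in database if filter_distance(query, X) <= eps]
    return [X for X in candidates if matching_distance(query, X) <= eps]


query = [(0, 0), (2, 0)]
near = [(2, 1), (0, 1)]
far = [(10, 10), (12, 10)]
result = range_query(query, [near, far], eps=2.5)
```

Since the filter distance never exceeds the exact distance, an object discarded by the filter cannot belong to the result, so the refinement step only has to evaluate the surviving candidates.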

Page 14: Using Sets of Feature Vectors for  Similarity Search on Voxelized CAD Objects

Outline of the Talk

• Introduction
• Space Partitioning Models
• Data Partitioning Models
• Evaluation
• Conclusion

Next: Evaluation

Page 15: Using Sets of Feature Vectors for  Similarity Search on Voxelized CAD Objects

Evaluation: k-nn Queries

Evaluation of similarity models by means of k-nn queries: report the k objects having the smallest distance to a query object q.

[Figure: result rows of example queries under the volume model and the solid angle model, each row showing five objects with their distances; the slide asks which is a "good" and which a "bad" similarity model]

distance: 0.0 0.0 0.368 0.368 0.666
distance: 0.0 0.0098 0.307 0.416 0.46
distance: 0.0 0.0 0.0176 0.0178 0.022
distance: 0.0 0.04 0.04 0.07 0.12

Problem:

• Evaluation using k-nn queries is subjective
• The quality measure of a model depends on the choice of the query objects

Page 16: Using Sets of Feature Vectors for  Similarity Search on Voxelized CAD Objects

Evaluation: Hierarchical Clustering

Hierarchical clustering is more objective, since each object of the database is taken into account to measure the quality of a similarity model.

OPTICS (Kriegel et al. 99):

• Yields a density-based hierarchical clustering
• Insensitive to input parameters
• The result (the so-called reachability plot) can be easily visualized and is suitable for interactive exploration

[Figure: a data space with cluster A (sub-clusters A1 and A2) and cluster B, next to the corresponding reachability plot]
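Reading clusters off a reachability plot can be sketched as a horizontal cut: valleys below the cut level are clusters, and lowering the level splits a cluster into its sub-clusters. This is a simplified illustration, not the OPTICS algorithm itself; it assumes the reachability values are already given in cluster order and ignores core distances. The function name is hypothetical.

```python
def cut_reachability_plot(reachability, eps):
    """Return (start, end) index ranges of the valleys below the cut level.

    reachability: reachability distances in OPTICS cluster order
    (float("inf") for undefined values).
    """
    clusters, start = [], None
    for i, value in enumerate(reachability):
        if value <= eps:
            if start is None:
                # The point just before a valley is the first cluster member.
                start = max(i - 1, 0)
        elif start is not None:
            clusters.append((start, i - 1))
            start = None
    if start is not None:
        clusters.append((start, len(reachability) - 1))
    return clusters


plot = [float("inf"), 0.1, 0.1, 0.2, 0.9, 0.15, 0.1, 0.8, 0.3, 0.3]
coarse = cut_reachability_plot(plot, eps=0.5)
fine = cut_reachability_plot(plot, eps=0.25)
```

Comparing the two cuts shows how the same plot exposes different levels of the cluster hierarchy without rerunning the clustering.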

Page 17: Using Sets of Feature Vectors for  Similarity Search on Voxelized CAD Objects

Evaluation: Space Partitioning Similarity Models

Car dataset: approx. 200 parts, r = 30, p = 3

[Figure: reachability plots of the volume model and the solid angle model; the labels mark classes A, B, and C in one plot and "no classes found" in the other]

Page 18: Using Sets of Feature Vectors for  Similarity Search on Voxelized CAD Objects

Evaluation: Data Partitioning Similarity Models

Car dataset: approx. 200 parts, r = 15, 7 covers

[Figure: reachability plots. Cover sequence model: plot labels X, A, C, E, G, with classes E, X, and G called out. Vector set model: plot labels A1, A2, B, C, D, E, F, G1, G2, with classes A1, A2, E, F, G1, and G2 called out.]

Page 19: Using Sets of Feature Vectors for  Similarity Search on Voxelized CAD Objects

Evaluation: Efficiency of the Vector Set Model

Efficiency evaluation: 100 10-nn queries on the plane database, cover sequence with 7 covers

                                   CPU time [sec]   I/O time [sec]   total runtime [sec]
vector set without filter                 1025.32           806.40               1831.72
vector set with filter (X-tree)            105.88           932.80               1038.68
cover sequence (X-tree)                    142.82          2632.06               2774.88

Vector set model vs. cover sequence model:

• The vector set model outperforms the cover sequence model

Vector set model without filter vs. vector set model with filter:

• The filter step leads to a speed-up factor of approximately 2
• The filter step has a selectivity of approximately 20%
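As a quick check of these statements, the ratios can be computed directly from the reported runtimes (filter vs. no filter in total runtime, cover sequence vs. filtered vector set in total runtime, and filter vs. no filter in CPU time):

```latex
\[
\frac{1831.72}{1038.68} \approx 1.76, \qquad
\frac{2774.88}{1038.68} \approx 2.67, \qquad
\frac{1025.32}{105.88} \approx 9.68
\]
```

The overall speed-up of roughly 2 is therefore dominated by I/O time, while the CPU time alone drops by almost a factor of 10 when the filter is used.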

Page 20: Using Sets of Feature Vectors for  Similarity Search on Voxelized CAD Objects

Outline of the Talk

• Introduction
• Space Partitioning Models
• Data Partitioning Models
• Evaluation
• Conclusion

Next: Conclusion

Page 21: Using Sets of Feature Vectors for  Similarity Search on Voxelized CAD Objects

Conclusion

Contribution:

• Sets of feature vectors: a new way of representing objects in similarity search, somewhere between feature vectors and graphs
• Effective and efficient similarity model for CAD data based on sets of feature vectors
• Evaluation of similarity models based on hierarchical clustering

[Figure: the 2D matching example of query and database covers (position X, position Y, extension X, extension Y), repeated from the Vector Set Model slide]

Page 22: Using Sets of Feature Vectors for  Similarity Search on Voxelized CAD Objects

Conclusion

Future work: BOSS (Browsing OPTICS-Plots for Similarity Search)

• Interactive data browsing tool based on reachability plots
• User-friendly method to support the time-consuming task of finding similar parts:
  • Revealing the hierarchical clustering structure of the dataset at a glance
  • Displaying suitable representatives for large clusters

Page 23: Using Sets of Feature Vectors for  Similarity Search on Voxelized CAD Objects

Thank you for your attention

Any questions?