Graph Indexing Techniques Seoul National University IDB Lab. Kisung Kim 2011. 3. 23.
Outline
• Category of graph queries
• Querying in collection DB
• References
Category of Graph Queries: Matching Type
• Exact subgraph matching
  – Find graphs in the DB which contain all components of the query graph
• Similarity subgraph matching
  – Find graphs in the DB which contain some components of the query graph
  – A similarity measure is needed
• Supergraph matching
  – Find graphs in the DB which are contained in the query graph
[Figure: a query graph with an exact subgraph match and a similarity subgraph match]
Category of Graph Queries: Target DB
• Collection DB: a large number of small graphs
  – e.g. chemical compounds
  – Retrieval component: IDs of the graphs which contain matching parts
• Large graphs: a small number of large graphs
  – e.g. social network, RDF graph
  – Retrieval component: all matching subgraphs
[Figure: querying a collection DB (graphs G1 to G7) returns a graph ID list, e.g. G1, G3, G5; querying large graphs returns the matching subgraphs]
Query Processing in Collection DB
• Processing flow: Query → Filtering → Candidate graph set → Verification → Answer graphs
• Verification uses the usual pairwise subgraph isomorphism algorithm
• Most techniques focus on filtering
  – The cost of verification is high
  – Filtering reduces the number of verifications that must be executed
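The flow above can be sketched in a few lines. This is a minimal illustration, not any of the surveyed systems: the graph representation, the label-count filter `label_filter`, and the brute-force test `is_subgraph` are all assumptions chosen to keep the example short. Real systems replace the filter with an index and the verifier with a tuned isomorphism algorithm.

```python
from collections import Counter
from itertools import permutations

def make_graph(labels, edge_pairs):
    # labels: {node: label}; edges stored as unordered pairs
    return {"labels": labels, "edges": {frozenset(e) for e in edge_pairs}}

def label_filter(query, graph):
    # Cheap necessary condition (hypothetical filter): every node label of the
    # query must occur at least as often in the graph.
    need = Counter(query["labels"].values())
    have = Counter(graph["labels"].values())
    return all(have[label] >= n for label, n in need.items())

def is_subgraph(query, graph):
    # Brute-force labeled subgraph isomorphism; exponential, tiny graphs only.
    q_nodes = list(query["labels"])
    for image in permutations(graph["labels"], len(q_nodes)):
        m = dict(zip(q_nodes, image))
        if all(query["labels"][u] == graph["labels"][m[u]] for u in q_nodes) and \
           all(frozenset(m[u] for u in e) in graph["edges"] for e in query["edges"]):
            return True
    return False

def filter_and_verify(query, db):
    candidates = [gid for gid, g in db.items() if label_filter(query, g)]
    answers = [gid for gid in candidates if is_subgraph(query, db[gid])]
    return candidates, answers

# Query: B - A - C
query = make_graph({1: "A", 2: "B", 3: "C"}, [(1, 2), (1, 3)])
db = {
    "g1": make_graph({1: "A", 2: "B", 3: "C"}, [(1, 2), (1, 3), (2, 3)]),
    "g2": make_graph({1: "A", 2: "B", 3: "C"}, [(1, 2), (2, 3)]),  # no A-C edge
    "g3": make_graph({1: "A", 2: "B", 3: "D"}, [(1, 2), (2, 3)]),  # no C label
}
candidates, answers = filter_and_verify(query, db)
print(candidates, answers)
```

Here g3 is discarded by the filter without any isomorphism test, while g2 survives filtering and is only rejected by the expensive verification step, which is why better filters matter.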
Query Processing in Large Graphs
• Processing flow: Query → Index search → Candidate node sets → Building subgraphs → Answer subgraphs
• Focus on node indexing
  – To reduce the search space
  – Uses structural information of nodes
• Build subgraphs by joining candidate nodes
  – Join methods have received relatively little research attention
  – Optimization using join ordering
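A rough sketch of this flow, under stated assumptions: candidate nodes are selected by label and degree (one simple kind of structural information), and subgraphs are built by a nested-loop join that processes the query node with the fewest candidates first. The function names and the join-order heuristic are illustrative, not taken from GraphQL or Spath.

```python
def adjacency(nodes, edges):
    adj = {v: set() for v in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj

def candidate_nodes(q_labels, q_adj, g_labels, g_adj):
    # Keep data nodes with the same label and at least the query node's degree.
    return {
        u: [v for v in g_labels
            if g_labels[v] == q_labels[u] and len(g_adj[v]) >= len(q_adj[u])]
        for u in q_labels
    }

def join_candidates(cands, q_adj, g_adj):
    # Join ordering (assumed heuristic): smallest candidate set first.
    order = sorted(cands, key=lambda u: len(cands[u]))
    partial = [{}]
    for u in order:
        extended = []
        for match in partial:
            for v in cands[u]:
                if v in match.values():
                    continue  # keep the mapping injective
                # Every query edge to an already matched node must exist in the data graph.
                if all(match[w] in g_adj[v] for w in q_adj[u] if w in match):
                    extended.append({**match, u: v})
        partial = extended
    return partial

g_labels = {1: "A", 2: "B", 3: "C", 4: "B", 5: "A"}
g_adj = adjacency(g_labels, [(1, 2), (1, 3), (2, 3), (4, 5)])
q_labels = {"a": "A", "b": "B", "c": "C"}
q_adj = adjacency(q_labels, [("a", "b"), ("a", "c")])

cands = candidate_nodes(q_labels, q_adj, g_labels, g_adj)
matches = join_candidates(cands, q_adj, g_adj)
print(cands)
print(matches)
```

Node 5 has label A but degree 1, so the degree check removes it before any join work is done.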
Graph Indexing Techniques
| Technique | Target database | Query type | Approach |
|---|---|---|---|
| GraphGrep [Shasha et al., PODS’02] | Collection DB | Exact | Feature (path) based index |
| gIndex [Yan et al., SIGMOD’04] | Collection DB | Exact | Feature (graph) based index |
| Grafil [Yan et al., SIGMOD’05] | Collection DB | Exact & similarity | Feature based similarity search |
| C-tree [He and Singh, ICDE’06] | Collection DB | Exact & similarity | Closure based index |
| QuickSI [Shang et al., VLDB’08] | Collection DB | Exact | Verification algorithm |
| Tale [Tian and Patel, ICDE’08] | Collection DB | Exact & similarity | Similarity search using node index |
| GraphQL [He and Singh, SIGMOD’08] | Large graphs | Exact | Node indexing |
| Spath [Zhao and Han, VLDB’10] | Large graphs | Exact | Node indexing using neighborhood information |
GraphGrep(1/2) [Shasha et al. PODS’02]
• The first work to adopt the filtering-and-verification framework
• Path-based index
  – Fingerprint of the database
  – Enumerate the set of all paths (length <= L) of all graphs in the DB
  – For each path, the number of occurrences in each graph is stored in a hash table

[Figure: example graphs g1, g2, g3 with node labels A to E, and the resulting index]

| Key | g1 | g2 | g3 |
|---|---|---|---|
| h(CA) | 1 | 0 | 1 |
| … | | | |
| h(ABCB) | 2 | 2 | 0 |
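The index construction can be sketched as follows. The assumptions are: paths are simple (no repeated node), length is counted in edges from 1 to `max_len`, each path is enumerated from every start node, and the label string itself stands in for the hash h(·). GraphGrep's actual hashing and path definition may differ in detail.

```python
from collections import Counter

def label_paths(labels, adj, max_len):
    # Count the label sequences of all simple paths with 1..max_len edges.
    counts = Counter()

    def walk(path):
        if len(path) > 1:
            counts["".join(labels[v] for v in path)] += 1
        if len(path) - 1 == max_len:
            return
        for nxt in adj[path[-1]]:
            if nxt not in path:
                walk(path + [nxt])

    for start in labels:
        walk([start])
    return counts

def build_index(db, max_len):
    # index[key][graph_id] = number of occurrences of that label path
    index = {}
    for gid, (labels, adj) in db.items():
        for key, n in label_paths(labels, adj, max_len).items():
            index.setdefault(key, {})[gid] = n
    return index

# Two toy graphs: B - A - C and A - B (not the graphs of the figure).
db = {
    "g1": ({1: "A", 2: "B", 3: "C"}, {1: [2, 3], 2: [1], 3: [1]}),
    "g2": ({1: "A", 2: "B"}, {1: [2], 2: [1]}),
}
fingerprint = label_paths(*db["g1"], max_len=2)
index = build_index(db, max_len=2)
print(dict(fingerprint))
print(index["AB"], index["BAC"])
```

A graph missing from `index[key]` simply has zero occurrences of that path.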
GraphGrep(2/2): Query Processing
• Filtering
  – Make the fingerprint of the query q
    – Hash all paths (length <= L) of q
  – Compare the fingerprint of the query with the fingerprint of the database
    – Discard a graph whose value in the fingerprint is less than the value in the query fingerprint
• Verification
  – Run subgraph isomorphism tests on the remaining candidates

[Figure: graphs g1, g2, g3 and the query graph B – A – C]

Index:

| Key | g1 | g2 | g3 |
|---|---|---|---|
| h(AB) | 2 | 2 | 1 |
| h(AC) | 1 | 0 | 1 |
| h(BAC) | 2 | 0 | 1 |

Query fingerprint: AB: 1, AC: 1, BAC: 1
Candidates = {g1, g3}, which are passed to verification
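The filtering step on this slide's numbers can be reproduced with a short sketch. The dictionary layout and the function name `filter_candidates` are assumptions; the counts are the ones in the index table and query fingerprint above, with label strings standing in for the hash keys.

```python
# Index rows from the slide: occurrences of each path per graph.
index = {
    "AB":  {"g1": 2, "g2": 2, "g3": 1},
    "AC":  {"g1": 1, "g2": 0, "g3": 1},
    "BAC": {"g1": 2, "g2": 0, "g3": 1},
}
query_fingerprint = {"AB": 1, "AC": 1, "BAC": 1}

def filter_candidates(index, query_fingerprint, graph_ids):
    # Keep a graph only if, for every query path, it has at least as many
    # occurrences as the query does.
    return [
        g for g in graph_ids
        if all(index.get(key, {}).get(g, 0) >= need
               for key, need in query_fingerprint.items())
    ]

candidates = filter_candidates(index, query_fingerprint, ["g1", "g2", "g3"])
print(candidates)  # ['g1', 'g3']
```

g2 is discarded because it has no AC or BAC path; only g1 and g3 need a subgraph isomorphism test.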
gIndex(1/6) [Yan et al., SIGMOD’04]
• The path-based approach has weak points
  – A path is too simple: structural information is lost
  – There are too many paths: the set of paths in a graph database is usually huge
• Solution
  – Use graph structures instead of paths as the basic index feature
[Figure: a sample database of graphs whose nodes are all labeled c, and a query graph; the paths in the query graph cannot filter any graphs in the database]
gIndex(2/6): Frequent Fragment
• The number of graph structures is large, so index only frequent subgraphs
• support(g)
  – The number of graphs in D (the graph database) in which g is a subgraph
• minSup
  – Minimum support threshold
  – Index a fragment g only if support(g) ≥ minSup
• Size-increasing support
  – The number of frequent fragments grows as the fragment size increases
  – Use a low minSup for small fragments and a high minSup for large fragments
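A minimal sketch of size-increasing support, assuming the threshold is given as a lookup table from fragment size to minSup. The fragment names and support values below are hypothetical; only the rule "index g only if support(g) ≥ minSup for its size" comes from the slide.

```python
# Hypothetical threshold schedule: minSup grows with the fragment size.
MIN_SUP_BY_SIZE = {1: 1, 2: 1, 3: 2, 4: 2}

def min_sup(size):
    return MIN_SUP_BY_SIZE[size]

def select_frequent(fragments):
    # fragments: list of (name, size, support) triples
    return [name for name, size, support in fragments if support >= min_sup(size)]

# Hypothetical fragments with their sizes and supports.
fragments = [
    ("f_a", 1, 4),
    ("f_b", 2, 1),
    ("f_c", 3, 1),  # support 1 < minSup 2 for size 3: not indexed
    ("f_d", 3, 2),
    ("f_e", 4, 1),  # support 1 < minSup 2 for size 4: not indexed
]
indexed = select_frequent(fragments)
print(indexed)  # ['f_a', 'f_b', 'f_d']
```

Small fragments are nearly always kept, while large fragments must be shared by more graphs to earn a place in the index.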
gIndex(3/6): Frequent Fragment
[Figure: candidate fragments of size 1 to 4 over node labels A and B, each annotated with its frequency F (values from 1 to 4); minSup = 1 for sizes 1 and 2, minSup = 2 for sizes 3 and 4]
gIndex(4/6): Discriminative Fragment
• Redundant fragment
  – A fragment whose indexed graphs are already indexed by its subgraphs
  – Redundant fragments need not be included in the index
• Discriminative fragment
  – A fragment which is not redundant
[Figure: graphs g1 to g4 and fragments f1, f2, f3 of size 2 and 3]
Df1 = {g1, g2, g3}
Df2 = {g2, g3, g4}
Df3 = {g2, g3} = Df1 ∩ Df2, so f3 is redundant
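The redundancy test on these ID lists can be written as a short sketch. `is_redundant` follows the slide's definition directly; `discriminative_ratio` and the extra fragment `D_f4` are illustrative assumptions showing one way to turn the equality test into a graded measure.

```python
def intersect_id_lists(id_lists):
    return set.intersection(*id_lists)

def is_redundant(fragment_ids, subfragment_id_lists):
    # Redundant: intersecting the ID lists of the subfragments already
    # yields exactly the fragment's own ID list.
    return intersect_id_lists(subfragment_id_lists) == fragment_ids

def discriminative_ratio(fragment_ids, subfragment_id_lists):
    # Assumed measure: how much the fragment shrinks the candidate set
    # compared with its subfragments (1.0 means no gain).
    return len(intersect_id_lists(subfragment_id_lists)) / len(fragment_ids)

# ID lists from the slide.
D_f1 = {"g1", "g2", "g3"}
D_f2 = {"g2", "g3", "g4"}
D_f3 = {"g2", "g3"}

print(is_redundant(D_f3, [D_f1, D_f2]))          # True
print(discriminative_ratio(D_f3, [D_f1, D_f2]))  # 1.0

# Hypothetical fragment contained in g2 only: it halves the candidate set.
D_f4 = {"g2"}
print(is_redundant(D_f4, [D_f1, D_f2]))          # False
print(discriminative_ratio(D_f4, [D_f1, D_f2]))  # 2.0
```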
gIndex(5/6): gIndex Tree
• Use a graph serialization method
  – For fast graph isomorphism checking during index search
  – DFS coding [Yan et al. ICDM’02]
  – Translates a graph into a unique edge sequence
• gIndex Tree
  – Prefix tree which consists of the edge sequences of discriminative fragments
  – Records all size-n discriminative fragments in level n
  – Black nodes: discriminative fragments
    – Have ID lists: the IDs of the graphs containing fi
  – White nodes: redundant fragments, kept for Apriori pruning
[Figure: a graph with node labels X, X, Z, Y and edge labels a, b, with its vertices numbered v0 to v3]
DFS coding: <(v0,v1),(v1,v2),(v2,v0),(v1,v3)>
[Figure: gIndex tree with levels 0, 1, 2, …, edges e1, e2, e3 and fragment nodes f1, f2, f3]
gIndex(6/6): Searching
• Searching process
  – Given a query q, enumerate all of q’s fragments (size <= maxSize)
  – Locate the fragments in the gIndex tree
  – Intersect the ID lists associated with the fragments
• Apriori pruning
  – Generating every fragment is inefficient
  – If a fragment is not in the gIndex tree, we need not check its supergraphs any more
  – Redundant fragments need to be recorded for Apriori pruning
[Figure: gIndex tree with levels 0, 1, 2, …, edges e1, e2, e3 and fragment nodes f1, f2, f3]
Query: <e1, e2, e3, e4, e5>
Fragments: <e1>, <e1, e2>, <e1, e2, e3>, <e1, e2, e3, e4> (stop), <e2>, …
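The search loop can be sketched under a strong simplification: fragments are treated as contiguous runs of the query's edge sequence, as in the slide's illustration, rather than as arbitrary subgraphs with DFS codes. The tree is modeled as a dictionary from edge sequence to ID list, with `None` marking a redundant (white) node; the tree contents and ID lists are hypothetical.

```python
def search(tree, query_edges, all_ids, max_size):
    candidates = set(all_ids)
    visited = []
    for start in range(len(query_edges)):
        last = min(start + max_size, len(query_edges))
        for end in range(start + 1, last + 1):
            fragment = tuple(query_edges[start:end])
            if fragment not in tree:
                break  # Apriori pruning: larger fragments cannot be in the tree
            visited.append(fragment)
            if tree[fragment] is not None:  # discriminative node with an ID list
                candidates &= tree[fragment]
    return candidates, visited

# Hypothetical gIndex tree; None marks a redundant fragment kept for pruning.
tree = {
    ("e1",): {"g1", "g2", "g3", "g4"},
    ("e1", "e2"): None,
    ("e1", "e2", "e3"): {"g1", "g2"},
    ("e2",): {"g1", "g2", "g3"},
    ("e2", "e3"): {"g1", "g2", "g3"},
}
query = ["e1", "e2", "e3", "e4", "e5"]
candidates, visited = search(tree, query, ["g1", "g2", "g3", "g4"], max_size=4)
print(sorted(candidates))
print(visited)
```

Because <e1, e2> is stored even though it is redundant, the search can continue through it to reach <e1, e2, e3>; the lookup for <e1, e2, e3, e4> fails and stops that branch, matching the "stop" in the example above.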
Grafil(1/4) [Yan et al., SIGMOD’05]
• Subgraph similarity search
• Feature-based approach
• Similarity search using relaxed queries
  – Relax a query by deleting k edges
  – Missed edges incur missed features
• Main question
  – What is the maximum number of missed features (m_max) when relaxing a query with k missed edges?

Feature vectors
  – Each database graph G1, G2, …, Gn: {u1, u2, …, un}
  – Query: {v1, v2, …, vn}
• Subgraph exact search: u_i ≥ v_i for 1 ≤ i ≤ n
• Subgraph similarity search: the condition is relaxed to tolerate a bounded number of missed features
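The two filtering conditions can be contrasted in a small sketch. The exact-search test is the one on the slide; the similarity test assumes that the misses are summed over all features and compared with m_max. The vectors and the value of `m_max` are hypothetical.

```python
def exact_filter(u, v):
    # Exact search: the graph must have every feature at least as often as the query.
    return all(ui >= vi for ui, vi in zip(u, v))

def feature_misses(u, v):
    # Number of query feature occurrences that the graph lacks.
    return sum(max(0, vi - ui) for ui, vi in zip(u, v))

def similarity_filter(u, v, m_max):
    # Similarity search (assumed form): tolerate up to m_max missed features.
    return feature_misses(u, v) <= m_max

v = [1, 2, 4]        # hypothetical query feature vector
u_close = [1, 1, 3]  # hypothetical graph missing two feature occurrences
u_far = [0, 0, 0]    # hypothetical graph with none of the features
m_max = 4            # hypothetical bound

print(exact_filter(u_close, v), feature_misses(u_close, v), similarity_filter(u_close, v, m_max))
print(exact_filter(u_far, v), feature_misses(u_far, v), similarity_filter(u_far, v, m_max))
```

The first graph fails the exact filter but stays a candidate for similarity search; the second is pruned by both.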
Grafil(2/4): Feature Misses
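[Figure: a query graph, its features fa, fb, fc, and the relaxed queries obtained by missing 1 edge]

Feature occurrences in the query: fa = 1, fb = 2, fc = 4 (7 in total)

| Relaxed query (1 missed edge) | fa | fb | fc | Remaining features | Feature misses |
|---|---|---|---|---|---|
| 1 | 1 | 0 | 3 | 4 | 7 − 4 = 3 |
| 2 | 0 | 1 | 2 | 3 | 7 − 3 = 4 |
| 3 | 0 | 1 | 2 | 3 | 7 − 3 = 4 |

Maximum feature misses: m_max = 4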
Grafil(3/4): Feature Miss Estimation
• Problem
  – Given a query Q and a set of features contained in Q, if the relaxation ratio is given, what is the maximal number of features that can be missed?
• Use the edge-feature matrix
  – Find the maximum number of columns that can be hit by k rows
  – k: the number of missing edges in Q
• Classic maximum coverage problem (set k-cover)
  – Proved NP-complete

[Figure: a query with edges e1, e2, e3 and its features fa, fb, fc]

Edge-feature matrix:

| | fa | fb1 | fb2 | fc1 | fc2 | fc3 | fc4 |
|---|---|---|---|---|---|---|---|
| e1 | 0 | 1 | 1 | 1 | 0 | 0 | 0 |
| e2 | 1 | 1 | 0 | 0 | 1 | 0 | 1 |
| e3 | 1 | 0 | 1 | 0 | 0 | 1 | 1 |
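For a matrix this small the maximum can be found by trying every set of k rows. The sketch below does that on the slide's matrix and also shows the standard greedy heuristic for maximum coverage; both function names are assumptions, and Grafil's own estimation procedure is not reproduced here.

```python
from itertools import combinations

FEATURES = ["fa", "fb1", "fb2", "fc1", "fc2", "fc3", "fc4"]
MATRIX = {  # rows of the edge-feature matrix from the slide
    "e1": [0, 1, 1, 1, 0, 0, 0],
    "e2": [1, 1, 0, 0, 1, 0, 1],
    "e3": [1, 0, 1, 0, 0, 1, 1],
}

def hit(edges):
    # Feature occurrences (columns) destroyed when these edges are missed.
    return {f for e in edges for f, bit in zip(FEATURES, MATRIX[e]) if bit}

def max_misses_exact(k):
    # Exhaustive search over all k-subsets of edges; exponential in general.
    return max(len(hit(subset)) for subset in combinations(MATRIX, k))

def max_misses_greedy(k):
    # Greedy maximum coverage: repeatedly take the edge hitting the most new columns.
    chosen, covered = [], set()
    for _ in range(k):
        best = max((e for e in MATRIX if e not in chosen),
                   key=lambda e: len(hit([e]) - covered))
        chosen.append(best)
        covered |= hit([best])
    return len(covered)

print(max_misses_exact(1), max_misses_exact(2), max_misses_exact(3))
print(max_misses_greedy(1), max_misses_greedy(2))
```

With k = 1 the result is 4, which agrees with m_max = 4 on the previous slide. Note that greedy can undercount the true maximum in general, so on its own it is not a safe upper bound for filtering.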
Grafil(4/4): Feature Conjugation
• Feature conjugation: the misses of one feature are compensated by occurrences of other features in G
• Using all the features together in one filter would therefore deteriorate the filtering performance
• Solution
  – Use multiple filters
  – Feature set selection
[Figure: a query with features fa (3 occurrences) and fb (4 occurrences), relaxation ratio = 1, m_max = 4, and a graph for which the single filter computes (3 − 0) + 0 = 3 ≤ m_max]
References
• [Shasha et al., PODS’02] Dennis Shasha, Jason T. L. Wang, Rosalba Giugno. Algorithmics and Applications of Tree and Graph Searching. PODS, 2002.
• [Yan et al., SIGMOD’04] Xifeng Yan, Philip S. Yu, Jiawei Han. Graph Indexing: A Frequent Structure-based Approach. SIGMOD, 2004.
• [Yan et al., SIGMOD’05] Xifeng Yan, Philip S. Yu, Jiawei Han. Substructure Similarity Search in Graph Databases. SIGMOD, 2005.
• [Tian and Patel, ICDE’08] Yuanyuan Tian, Jignesh M. Patel. TALE: A Tool for Approximate Large Graph Matching. ICDE, 2008.
• [He and Singh, SIGMOD’08] Huahai He, Ambuj K. Singh. Graphs-at-a-time: Query Language and Access Methods for Graph Databases. SIGMOD, 2008.
• [Zhao and Han, VLDB’10] Peixiang Zhao, Jiawei Han. On Graph Query Optimization in Large Networks. VLDB, 2010.
• [He and Singh, ICDE’06] Huahai He, Ambuj K. Singh. Closure-Tree: An Index Structure for Graph Queries. ICDE, 2006.
• [Shang et al., VLDB’08] Haichuan Shang, Ying Zhang, Xuemin Lin, Jeffrey Xu Yu. Taming Verification Hardness: An Efficient Algorithm for Testing Subgraph Isomorphism. VLDB, 2008.