DOGMA: A Disk-Oriented Graph Matching algorithm
Presented By: Jasmeet Jagdev
Motivation
• Increasing size of RDF databases
• No graph-based, disk-resident index developed so far
[Figure: snapshot of the GovTrack database and an example query]
Outline
• Notations
• DOGMA Index
  • Indexing Algorithm
• Query Processing Algorithm
  • DOGMA_basic
  • DOGMA_adv
• Extensions of index
  • DOGMA_ipd
  • DOGMA_epd
• Experimental Results
Notations
• Sets: S – Subjects, P – Properties, V – Values
• A triple is (s, p, v) with s ∈ S, p ∈ P, v ∈ V
• A graph
• A query graph
• Answer = …
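A minimal sketch of these notions, assuming the usual semantics in which a query graph contains variables (written here with a leading "?", a hypothetical convention) and an answer is a substitution that maps every query triple onto a triple of the database. The function `answers` is illustrative only; this brute-force enumeration is not the DOGMA algorithm.

```python
from itertools import product

Triple = tuple[str, str, str]  # (subject, property, value)

def is_var(term: str) -> bool:
    # Hypothetical convention: query variables start with "?".
    return term.startswith("?")

def answers(db: set[Triple], query: list[Triple]) -> list[dict[str, str]]:
    """Brute force: try every assignment of database terms to the query
    variables and keep those under which every query triple is in db."""
    variables = sorted({t for triple in query for t in triple if is_var(t)})
    terms = sorted({t for triple in db for t in triple})
    result = []
    for values in product(terms, repeat=len(variables)):
        theta = dict(zip(variables, values))
        if all(tuple(theta.get(t, t) for t in triple) in db for triple in query):
            result.append(theta)
    return result

db = {
    ("alice", "sponsor", "bill1"),
    ("bob", "sponsor", "bill2"),
    ("bill1", "subject", "health"),
}
query = [("?person", "sponsor", "?bill"), ("?bill", "subject", "health")]
print(answers(db, query))  # [{'?bill': 'bill1', '?person': 'alice'}]
```

The cost grows as (number of terms)^(number of variables), which is exactly why an index that prunes candidates is needed.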
k-merge
• Let G1, G2 be two RDF graphs
[Figure: G1 and G2, with vertex sets V1 and V2, merged into a graph with vertex set Vm]
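The formal definition did not survive in the transcript. As a hedged sketch, assume that a k-merge of G1 and G2 is obtained from a mapping of V1 ∪ V2 onto a merged vertex set Vm of exactly k vertices, and that edges landing on the same pair of merged vertices have their weights added. Both the size rule and the weighting are assumptions; `k_merge` and the edge-dictionary representation are hypothetical.

```python
from collections import defaultdict

WeightedGraph = dict[tuple[str, str], int]  # undirected edge (a, b) -> weight

def k_merge(g1: WeightedGraph, g2: WeightedGraph,
            mu: dict[str, str], k: int) -> WeightedGraph:
    """Merge G1 and G2 into a graph on Vm = set(mu.values()).

    mu maps every vertex of G1 and G2 to a merged vertex; we require
    |Vm| == k (assumption). Edges whose endpoints collapse into the same
    merged vertex are dropped; parallel merged edges add their weights.
    """
    if len(set(mu.values())) != k:
        raise ValueError("mapping must produce exactly k merged vertices")
    merged = defaultdict(int)
    for edges in (g1, g2):
        for (a, b), w in edges.items():
            x, y = sorted((mu[a], mu[b]))
            if x != y:
                merged[(x, y)] += w
    return dict(merged)

g1 = {("a1", "b1"): 1, ("b1", "c1"): 1}
g2 = {("a2", "b2"): 1}
mu = {"a1": "x", "b1": "y", "c1": "y", "a2": "x", "b2": "y"}
print(k_merge(g1, g2, mu, k=2))  # {('x', 'y'): 2}
```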
DOGMA Index
• It is a balanced binary tree D_R
• It has order k (k ≥ 2)
• Each node is the size of one disk page
• Each node is labeled by a graph
• The labels of the leaf nodes correspond to a partition of G_R
• The label of a parent node is the k-merge of the graphs labeling its children
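These structural properties can be checked mechanically. In this hypothetical representation each `Node` stores only the vertex set of its label graph (not the graph itself or its disk page), and "order k" is read as "every label graph has at most k vertices", which is an assumption. The check confirms that the tree is binary and balanced and that the leaf labels partition the vertices of G_R.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    vertices: set[str]                                    # vertices of this node's label graph
    children: list["Node"] = field(default_factory=list)  # empty for a leaf

def leaves_with_depth(node: Node, depth: int = 0) -> list[tuple[Node, int]]:
    if not node.children:
        return [(node, depth)]
    return [pair for c in node.children for pair in leaves_with_depth(c, depth + 1)]

def all_nodes(node: Node) -> list[Node]:
    return [node] + [n for c in node.children for n in all_nodes(c)]

def is_valid_index(root: Node, k: int, graph_vertices: set[str]) -> bool:
    nodes = all_nodes(root)
    binary = all(len(n.children) in (0, 2) for n in nodes)
    order_ok = all(len(n.vertices) <= k for n in nodes)  # assumed meaning of "order k"
    found = leaves_with_depth(root)
    balanced = len({d for _, d in found}) == 1           # all leaves at one depth
    parts = [leaf.vertices for leaf, _ in found]
    covered = set().union(*parts)
    # Non-empty, pairwise disjoint, and covering every vertex of G_R.
    partition = (all(parts) and covered == graph_vertices
                 and sum(map(len, parts)) == len(covered))
    return binary and order_ok and balanced and partition

root = Node({"x", "y"}, [Node({"a", "b"}), Node({"c", "d"})])
print(is_valid_index(root, k=2, graph_vertices={"a", "b", "c", "d"}))  # True
```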
Indexing Algorithm
[Figure: vertices are merged step by step, e.g. m and v into m+v, and e, x, g into e+g and e+x+g]
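The example merges adjacent vertices into combined vertices such as m+v. Below is a sketch of one such coarsening pass, assuming a greedy, heaviest-edge-first pairing in the style of multilevel graph partitioners; this pairing rule is an assumption, not taken from the slides. Merged vertices are named by joining the originals with "+", and parallel edges have their weights summed.

```python
from collections import defaultdict

def coarsen(edges: dict[tuple[str, str], int]) -> dict[tuple[str, str], int]:
    """One coarsening pass over an undirected weighted graph given as
    {(a, b): weight}. Vertices with no remaining edges are not represented."""
    matched: dict[str, str] = {}
    # Greedily pair unmatched endpoints, visiting heavier edges first.
    for (a, b), _ in sorted(edges.items(), key=lambda e: -e[1]):
        if a != b and a not in matched and b not in matched:
            merged = "+".join(sorted((a, b)))
            matched[a] = matched[b] = merged
    rep = {v: matched.get(v, v) for e in edges for v in e}
    coarse = defaultdict(int)
    for (a, b), w in edges.items():
        x, y = sorted((rep[a], rep[b]))
        if x != y:  # an edge inside a merged vertex disappears
            coarse[(x, y)] += w
    return dict(coarse)

g = {("u", "m"): 1, ("m", "v"): 2, ("u", "v"): 1}
print(coarsen(g))  # {('m+v', 'u'): 2}
```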
Indexing Algorithm (contd..)
• Proceeds level by level from Gj down to G0
• Recursively builds the tree
• Uses an external graph-partitioning algorithm
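A sketch of the recursive construction under simplifying assumptions: a page capacity measured in vertices, and a hypothetical `split_in_two` function standing in for the external graph-partitioning algorithm (a real system would call a proper partitioner; this stand-in ignores edges). Only the leaf partitions are built; the k-merge labels of internal nodes are omitted.

```python
import math

def split_in_two(vertices: list[str]) -> tuple[list[str], list[str]]:
    # Hypothetical stand-in for the external graph-partitioning algorithm:
    # it ignores edges and simply splits the list in half.
    mid = len(vertices) // 2
    return vertices[:mid], vertices[mid:]

def build(vertices: list[str], levels: int):
    """Split recursively for `levels` more levels; a leaf is a vertex list,
    an internal node is a (left, right) pair."""
    if levels == 0:
        return vertices
    left, right = split_in_two(vertices)
    return (build(left, levels - 1), build(right, levels - 1))

def build_index(vertices: list[str], page_capacity: int):
    assert vertices and page_capacity >= 2
    # The same number of splits on every branch keeps the binary tree
    # balanced; this depth makes every leaf hold at most page_capacity vertices.
    levels = max(0, math.ceil(math.log2(len(vertices) / page_capacity)))
    return build(sorted(vertices), levels)

print(build_index(["a", "b", "c", "d", "e"], page_capacity=2))
# ((['a'], ['b']), (['c'], ['d', 'e']))
```

Fixing the depth up front, rather than stopping each branch independently, is what keeps all leaves at the same level.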
Indexing Algorithm (contd..)
Complexity
• Worst-case time complexity depends on:
  • the number of vertices
  • the worst-case complexity of the graph-partitioning algorithm
Query Processing Algorithm
• Assumes the existence of two index retrieval functions
• Two variants:
  1. DOGMA_basic
  2. DOGMA_adv
DOGMA_basic
DOGMA_basic (contd..)
Complexity
• Worst-case time complexity depends on:
  • the number of vertices
  • the number of variables in the query
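The DOGMA_basic pseudocode is not preserved here. As a hedged illustration of the general approach (not the authors' algorithm), this backtracking matcher keeps a candidate set per variable, binds the variable with the fewest candidates first, and uses neighbour lookups, a stand-in for the index retrieval functions, to prune the candidates of adjacent variables. It assumes variables (hypothetically written "?x") appear only in subject or value positions.

```python
from collections import defaultdict

Triple = tuple[str, str, str]

def is_var(t: str) -> bool:
    return t.startswith("?")  # hypothetical variable convention

def match(db: set[Triple], query: list[Triple]) -> list[dict[str, str]]:
    out_nb = defaultdict(set)  # (subject, property) -> values
    in_nb = defaultdict(set)   # (value, property) -> subjects
    for s, p, o in db:
        out_nb[(s, p)].add(o)
        in_nb[(o, p)].add(s)
    nodes = {t for s, _, o in db for t in (s, o)}
    variables = sorted({t for s, _, o in query for t in (s, o) if is_var(t)})
    results = []

    def consistent(theta):
        # Every query triple whose two ends are already known must be in db.
        for s, p, o in query:
            s2, o2 = theta.get(s, s), theta.get(o, o)
            if not is_var(s2) and not is_var(o2) and (s2, p, o2) not in db:
                return False
        return True

    def search(theta, candidates):
        if len(theta) == len(variables):
            results.append(dict(theta))
            return
        # Bind the unbound variable with the fewest remaining candidates.
        var = min((v for v in variables if v not in theta),
                  key=lambda v: len(candidates[v]))
        for c in sorted(candidates[var]):
            theta[var] = c
            if consistent(theta):
                pruned = dict(candidates)
                # Restrict neighbouring unbound variables to neighbours of c.
                for s, p, o in query:
                    if s == var and is_var(o) and o not in theta:
                        pruned[o] = pruned[o] & out_nb[(c, p)]
                    if o == var and is_var(s) and s not in theta:
                        pruned[s] = pruned[s] & in_nb[(c, p)]
                search(theta, pruned)
            del theta[var]

    if consistent({}):
        search({}, {v: set(nodes) for v in variables})
    return results

db = {("alice", "sponsor", "bill1"), ("bob", "sponsor", "bill2"),
      ("bill1", "subject", "health")}
print(match(db, [("?person", "sponsor", "?bill"), ("?bill", "subject", "health")]))
# [{'?bill': 'bill1', '?person': 'alice'}]
```

In the worst case every variable can still range over every vertex, so the search remains exponential in the number of variables; pruning only shrinks it in practice.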
DOGMA_adv
• Introduces a distance constraint
• A distance index is also maintained
• All-pairs shortest paths are not computed:
  • worst-case time complexity O(|V|^3)
  • space complexity O(|V|^2)
• Instead, two lower-bound distance indexes are constructed:
  • DOGMA_ipd
  • DOGMA_epd
• Approximation techniques are used to achieve acceptable complexities
DOGMA_adv (contd..)
Distance condition
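The distance condition's formula is not preserved. This sketch assumes the standard reasoning behind such pruning: if query vertices x and w are at distance d in the query graph and w is already bound to θ(w), then any valid candidate c for x satisfies dist(c, θ(w)) ≤ d, because the query path maps onto a data path of the same length. Any lower bound greater than d therefore rules c out. `lower_bound` is a hypothetical callback standing in for the ipd/epd indexes, and both graphs are treated as undirected (an assumption).

```python
from collections import deque
from typing import Callable

def bfs_distances(adj: dict[str, set[str]], source: str) -> dict[str, int]:
    """Unweighted shortest-path distances from source in the query graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj.get(u, ()):
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def prune_by_distance(candidates: set[str], var: str, bound: dict[str, str],
                      query_adj: dict[str, set[str]],
                      lower_bound: Callable[[str, str], int]) -> set[str]:
    """Keep candidate c for var only if, for every bound query vertex w
    reachable from var, lower_bound(c, image of w) <= query distance."""
    qdist = bfs_distances(query_adj, var)
    return {c for c in candidates
            if all(lower_bound(c, image) <= qdist[w]
                   for w, image in bound.items() if w in qdist)}

query_adj = {"?x": {"?y"}, "?y": {"?x", "?z"}, "?z": {"?y"}}  # path ?x - ?y - ?z
lb = {("a", "n9"): 1, ("b", "n9"): 5}                          # made-up lower bounds
print(prune_by_distance({"a", "b"}, "?x", {"?z": "n9"}, query_adj,
                        lambda c, t: lb[(c, t)]))  # {'a'}
```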
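DOGMA_ipd
• ipd – Internal Partition Distance
• Distance from a vertex to the outside of its subgraph
[Figure: index nodes P, N, M and vertices v, u]
• This distance is stored as the closest available approximation, a lower bound on the distance from the vertex to any vertex outside the subgraph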
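A minimal sketch of computing internal partition distances for one subgraph, assuming an unweighted, undirected data graph: a multi-source BFS started from all vertices outside the subgraph gives, for each inside vertex v, the distance to the nearest outside vertex. Any outside vertex u is at least that far from v, so the value is a lower bound on dist(v, u). The function name `ipd` is hypothetical.

```python
from collections import deque

def ipd(adj: dict[str, set[str]], subgraph: set[str]) -> dict[str, float]:
    """For each v in subgraph, the distance from v to the nearest vertex
    outside it (inf if none is reachable). adj must list every vertex."""
    dist = {u: 0 for u in adj if u not in subgraph}  # all outside vertices are sources
    queue = deque(dist)
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return {v: dist.get(v, float("inf")) for v in subgraph}

adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}  # path a-b-c-d
print(ipd(adj, {"a", "b", "c"}))  # a: 3, b: 2, c: 1 (printed order may vary)
```

One BFS per subgraph costs time linear in the graph size and stores one number per vertex, instead of the quadratic table of all-pairs distances.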
Complexity of Building DOGMA_ipd
• Worst-case time complexity: …
• Worst-case space complexity: …
Using DOGMA_ipd
DOGMA_epd
• epd – External Partition Distance
• Distance from a vertex to another subgraph (each subgraph is denoted by a color)
[Figure: index nodes P, N, M and vertices v, u]
• epd(v, blue) = distance from v to the blue-colored subgraph
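A sketch of external partition distances, assuming an unweighted, undirected graph and a hypothetical `colors` mapping that assigns each vertex the color of its subgraph: one multi-source BFS per color yields epd(v, color) for every vertex v. If a bound vertex lies in the blue subgraph, epd(v, blue) is a lower bound on its distance from v.

```python
from collections import deque

def epd(adj: dict[str, set[str]],
        colors: dict[str, str]) -> dict[tuple[str, str], float]:
    """result[(v, c)] = distance from v to the nearest vertex of color c."""
    result: dict[tuple[str, str], float] = {}
    for color in set(colors.values()):
        # Multi-source BFS from every vertex of this color.
        dist = {u: 0 for u, col in colors.items() if col == color}
        queue = deque(dist)
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        for v in adj:
            result[(v, color)] = dist.get(v, float("inf"))
    return result

adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}  # path a-b-c-d
colors = {"a": "red", "b": "red", "c": "blue", "d": "blue"}
e = epd(adj, colors)
print(e[("a", "blue")], e[("d", "red")])  # 2 2
```

Storage is one entry per (vertex, color) pair, so it stays small when the number of colors is small.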
Using DOGMA_epd
Complexity of Building DOGMA_epd
• Worst-case time complexity: …
• Worst-case space complexity: …
Experimentation
• Database systems used for comparison:
  • Sesame2
  • Jena2
  • JenaTDB
  • SwiftOWLIM
• RDF datasets used:
  • GovTrack – 14.5 million triples (well connected)
  • LUBM – 13.5 million triples (sparse and loosely connected)
  • Flickr social network – 16 million triples (well connected and dense)
Experimental Results: Low complexity
Experimental Results: High complexity
Experimental Results: Storage requirement