DOGMA: A Disk-Oriented Graph Matching algorithm

Presented By: Jasmeet Jagdev


Motivation

• Increasing size of RDF databases

• No graph-based disk-resident index developed so far


Snapshot of GovTrack database


Query

Outline

• Notations

• DOGMA Index

• Indexing Algorithm

• Query Processing Algorithm

  • DOGMA_basic

  • DOGMA_adv

• Extensions of index

  • DOGMA_ipd

  • DOGMA_epd

• Experimental Results

Notations

• Sets: S – Subjects, P – Properties, V – Values

• (s, p, v) ∈ S × P × V is a triple

• A graph

• A query graph

• Answer =

k-merge

Let G1, G2 be two RDF graphs

[Figure: G1 and G2 with vertex sets V1 and V2, and the merged vertex set Vm]
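As a rough illustration of the merge idea, the sketch below collapses the vertices of a graph through a merge mapping and connects two merged vertices whenever any of their members were connected. It is not the formal k-merge definition, which did not survive in this transcript. The function name `merge_graph`, the edge-list representation, and the toy vertex names are assumptions made for the example.

```python
from collections import defaultdict

def merge_graph(edges, merge_map):
    """Collapse vertices according to merge_map (vertex -> merged vertex).

    Hypothetical helper: returns the members of each merged vertex and the
    undirected edges between distinct merged vertices.
    """
    members = defaultdict(set)
    for vertex, merged in merge_map.items():
        members[merged].add(vertex)
    merged_edges = set()
    for a, b in edges:
        ma, mb = merge_map[a], merge_map[b]
        if ma != mb:  # edges inside one merged vertex disappear
            merged_edges.add(frozenset((ma, mb)))
    return dict(members), merged_edges

# Toy data: a1, a2 belong to G1 and b1, b2 to G2 (assumed for illustration).
edges = [("a1", "a2"), ("b1", "b2"), ("a2", "b1")]
merge_map = {"a1": "m1", "a2": "m1", "b1": "m2", "b2": "m2"}
members, merged_edges = merge_graph(edges, merge_map)
```

Here four vertices collapse into two merged vertices joined by a single edge, because only the edge between `a2` and `b1` crosses the two groups.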


DOGMA Index

• It is a balanced binary tree DR

• It has order k (k ≥ 2)

• Each node has the size of a disk page

• Each node is labeled by a graph

• The labels of the leaf nodes correspond to a partition of GR

• Label of a parent node = k-merge of the graphs labeling its children



Indexing Algorithm

Indexing Algorithm (contd..)

[Figure: merge example with vertices u, v, m, e, x, g and merged vertices m+v, e+x+g, e+g]

Indexing Algorithm (contd..)

• Keeps moving from level Gj … G0

• Recursively builds the tree

• Uses an external graph partitioning algorithm
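The levels G0 … Gj can be pictured as successively coarser copies of the graph. The sketch below shows only that coarsening half, under stated assumptions: it merges the endpoints of disjoint edges in sorted order, which is a deterministic stand-in and not necessarily how the original algorithm picks vertices. The partitioning of the coarsest level and the walk back from Gj to G0 are not shown. The names `coarsen_once` and `coarsen_levels` are hypothetical.

```python
def coarsen_once(edges):
    """Merge the endpoints of disjoint edges; return (new_edges, mapping)."""
    mapping = {}
    for a, b in sorted(edges):
        if a not in mapping and b not in mapping:
            merged = a + "+" + b
            mapping[a] = mapping[b] = merged
    # Vertices that were not merged in this pass map to themselves.
    for a, b in edges:
        mapping.setdefault(a, a)
        mapping.setdefault(b, b)
    new_edges = {tuple(sorted((mapping[a], mapping[b])))
                 for a, b in edges if mapping[a] != mapping[b]}
    return new_edges, mapping

def coarsen_levels(edges, max_vertices):
    """Build levels G0, G1, ..., Gj until Gj has at most max_vertices vertices."""
    levels = [set(edges)]
    while True:
        current = levels[-1]
        vertices = {v for edge in current for v in edge}
        if len(vertices) <= max_vertices or not current:
            return levels
        new_edges, _ = coarsen_once(current)
        levels.append(new_edges)

# Toy graph reusing the vertex names from the merge figure.
g0 = {("e", "x"), ("x", "g"), ("g", "u"), ("u", "v"), ("v", "m")}
levels = coarsen_levels(g0, max_vertices=3)
```

With this input one pass is enough: six vertices shrink to the three merged vertices `e+x`, `g+u` and `v+m`, so `levels` holds G0 and G1.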


Complexity

• Worst-case time complexity is :

= Number of Vertices

= Worst-case complexity of Graph Partitioning Algorithm


Query Processing Algorithm

• Assumes existence of two index retrieval functions

1.

2.

• Types

1. DOGMA_basic

2. DOGMA_adv
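To make the role of the retrieval functions concrete, here is a minimal backtracking matcher over an in-memory set of triples. The names of the two functions did not survive in this transcript, so `retrieve_vertices` and `retrieve_neighbors` are hypothetical stand-ins, as are the `?`-prefix convention for variables and the toy data. The sketch tries every vertex for every variable and does not reproduce any candidate pruning that DOGMA_basic may perform.

```python
def match(query_edges, index_edges):
    """Return all substitutions (variable -> vertex) that embed the query.

    Both arguments are sets of (source, label, target) triples. Query terms
    starting with '?' are variables; all other terms are constants.
    """
    # Hypothetical stand-ins for the two index retrieval functions.
    def retrieve_vertices():
        return {s for s, _, _ in index_edges} | {t for _, _, t in index_edges}

    def retrieve_neighbors(v, label):
        return {t for s, l, t in index_edges if s == v and l == label}

    def resolve(term, theta):
        return theta.get(term, term) if term.startswith("?") else term

    def consistent(theta):
        for s, label, t in query_edges:
            s2, t2 = resolve(s, theta), resolve(t, theta)
            if s2.startswith("?") or t2.startswith("?"):
                continue  # edge not fully bound yet
            if t2 not in retrieve_neighbors(s2, label):
                return False
        return True

    variables = sorted({x for s, _, t in query_edges
                        for x in (s, t) if x.startswith("?")})
    answers = []

    def extend(theta, remaining):
        if not remaining:
            answers.append(dict(theta))
            return
        var, rest = remaining[0], remaining[1:]
        for v in sorted(retrieve_vertices()):
            theta[var] = v
            if consistent(theta):
                extend(theta, rest)
            del theta[var]

    if consistent({}):  # checks edges that contain no variables
        extend({}, variables)
    return answers

# Toy data, invented for illustration only.
data = {("Carla", "sponsor", "Bill1"), ("Bill1", "subject", "Health"),
        ("Jeff", "sponsor", "Bill2"), ("Bill2", "subject", "Energy")}
query = {("?p", "sponsor", "?b"), ("?b", "subject", "Health")}
answers = match(query, data)
```

On the toy data the only answer binds `?b` to `Bill1` and `?p` to `Carla`. Because each variable may be tried against every vertex, the number of assignments explored by this sketch grows with the number of vertices raised to the number of variables.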


DOGMA_basic


DOGMA_basic (contd..)


Complexity

• Worst-case time complexity is :

= Number of Vertices

= Number of Variables

DOGMA_adv

• Introduction of distance constraint

• Distance index is also maintained

• All-pairs shortest paths are not calculated

  • Worst-case time complexity O(|V|^3)

  • Space complexity O(|V|^2)

• Construct two lower-bound distance indexes

  • DOGMA_ipd

  • DOGMA_epd

• Approximation techniques are used to achieve acceptable complexities

DOGMA_adv (contd..)

[Figure: two steps labeled "Distance condition"]

DOGMA_ipd

• ipd – Internal Partition Distance

• Distance from the vertex to the outside of its subgraph

[Figure: subgraphs P, N, M with vertices v and u]

• This distance is stored as the closest approximation
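A small sketch may help show why such a value is useful. It assumes an undirected graph given as an adjacency dictionary and a partition label per vertex, computes the exact hop distance from each vertex to the nearest vertex outside its partition by breadth-first search, and then uses it as a lower bound on the distance between two vertices in different partitions. The function names are hypothetical, and the approximation mentioned on the slide is not modeled.

```python
from collections import deque

def internal_partition_distance(adj, part):
    """ipd[v] = hops from v to the nearest vertex outside v's partition."""
    ipd = {}
    for start in adj:
        seen = {start}
        queue = deque([(start, 0)])
        while queue:
            v, d = queue.popleft()
            if part[v] != part[start]:
                ipd[start] = d
                break
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append((w, d + 1))
        else:
            # The search ended without leaving the partition.
            ipd[start] = float("inf")
    return ipd

def ipd_lower_bound(ipd, part, u, v):
    """Lower bound on d(u, v); informative only across partitions."""
    if part[u] == part[v]:
        return 0
    # Any path between partitions must first leave each endpoint's partition.
    return max(ipd[u], ipd[v])

# Toy path a - b - c - d split into two partitions (assumed labels P and N).
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
part = {"a": "P", "b": "P", "c": "N", "d": "N"}
ipd = internal_partition_distance(adj, part)
bound = ipd_lower_bound(ipd, part, "a", "d")
```

In this example the bound for `a` and `d` is 2 while their true distance is 3. If a query required the two matched vertices to be adjacent, the pair could be discarded from the bound alone, without computing a shortest path.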


Complexity of Building DOGMA_ipd

• Worst-case time complexity is :

• Worst-case space complexity is :


Using DOGMA_ipd

DOGMA_epd

• epd – External Partition Distance

• Distance from the vertex to another subgraph (each subgraph is denoted by a color)

[Figure: subgraphs P, N, M with vertices v and u]

• epd(v, blue) = distance from v to the blue-colored subgraph
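The definition of epd(v, blue) can be sketched with one multi-source breadth-first search per color. The code assumes an undirected adjacency dictionary and a color per vertex; the function names and the toy coloring are hypothetical, and how colors are assigned or how the values are stored on disk is not modeled.

```python
from collections import deque

def external_partition_distance(adj, color):
    """epd[v][c] = hops from v to the nearest vertex colored c."""
    epd = {v: {} for v in adj}
    for c in set(color.values()):
        # Multi-source BFS starting from every vertex of color c.
        sources = [v for v in adj if color[v] == c]
        dist = {v: 0 for v in sources}
        queue = deque(sources)
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        for v, d in dist.items():
            epd[v][c] = d
    return epd

def epd_lower_bound(epd, color, u, v):
    """Lower bound on d(u, v): v has color[v], so d(u, v) >= epd[u][color[v]]."""
    inf = float("inf")
    return max(epd[u].get(color[v], inf), epd[v].get(color[u], inf))

# Toy path a - b - c - d - e with three colors (assumed for illustration).
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"],
       "d": ["c", "e"], "e": ["d"]}
color = {"a": "red", "b": "red", "c": "green", "d": "blue", "e": "blue"}
epd = external_partition_distance(adj, color)
bound = epd_lower_bound(epd, color, "a", "e")
```

Here epd(a, blue) is 3 and the resulting bound for `a` and `e` is 3, against a true distance of 4.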


Using DOGMA_epd


Complexity of Building DOGMA_epd

• Worst-case time complexity is :

• Worst-case space complexity is :

Experimentation

• Database systems used for comparison

  • Sesame2

  • Jena2

  • JenaTDB

  • SwiftOWLIM

• RDF datasets used

  • GovTrack – 14.5 million (well connected)

  • LUBM – 13.5 million (sparse and loosely connected)

  • Flickr social network – 16 million (well connected and dense)

Experimental Results: Low complexity

Experimental Results: High complexity

Experimental Results: Storage requirement