Performance Guarantees for Distributed Reachability Queries Wenfei Fan 1,2 Xin Wang 1 Yinghui Wu 1,3...
Performance Guarantees for
Distributed Reachability Queries
Wenfei Fan (1,2), Xin Wang (1), Yinghui Wu (1,3)
(1) University of Edinburgh
(2) Harbin Institute of Technology
(3) University of California, Santa Barbara
Outline
Querying distributed real-life graphs
• Real-life graphs are often fragmented/distributed
• Distributed reachability queries
• Distributed bounded reachability queries
• Distributed regular reachability queries
Distributed reachability with MapReduce
Experimental study
Conclusion
Distributed query evaluation with performance guarantees
Partial Evaluation
Yinghui Wu VLDB 2012
Distributed Real-life Graphs
Real-life graphs are distributed
• Geo-distributed, e.g., data centers
• Decentralization, e.g., social networks
• Distributed entity and personal information
Real-life graphs are purposely or naturally distributed
Distributed querying methods
Federated/centralized graph database
• collect and link graph fragments
• query the centralized graph
construction and maintenance cost
[Figure: centralized querying. Query Q is posed on the graph collected from the fragments and returns Q(G).]
Distributed querying methods
Graph exploration strategy
• Master node and slave node
• Predefined graph partition and query execution plan
no bounds on the number of visits or on data shipment
[Figure: graph exploration. Labels: master node, slave nodes, query plan, intermediate results, Q, Q(G).]
Querying a distributed social network
[Figure: a social network distributed over three data centers DC1, DC2 and DC3. Nodes: Ann, "CTO"; Mark, "FA"; Fred, "HR"; Walt, "HR"; Dan, "DB"; Bill, "DB"; Pat, "SE"; Tom, "AI"; Ross, "HR"; Jack, "MK"; Ben, "MK"; Emmy, "HR"; Mat, "HR". Query Q: a regular path query with expression (DB* ∪ HR*).]
Centralized method? Graph exploration?
Using partial evaluation to obtain performance guarantees
Partial evaluation
Partial evaluation (a.k.a. program specialization)
• given a function f(s, d) and part of its input, e.g., s, specialize f(s, d) w.r.t. s
• conduct only the part of f's computation that depends on s
• generate a residual function f'
Partial evaluation: generating partial answers
[Figure: given s, f(s, d) is specialized to a residual function f'(d). For graph queries: given a fragment Fi, can Q(Fi, G) be specialized to a residual query Q'(G)?]
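The idea of a residual function can be illustrated with a small sketch that is not taken from the slides. The functions `power` and `specialize_power` are hypothetical names: `power(n, x)` plays the role of f(s, d) with s = n known early, and `specialize_power(n)` returns a residual function of x alone, in which every decision that depends on n has already been made.

```python
def power(n, x):
    """General function f(s, d): here s = n (known early), d = x (known late)."""
    result = 1
    for _ in range(n):
        result *= x
    return result


def specialize_power(n):
    """Partially evaluate power w.r.t. n and return a residual function f'(x).

    The recursion on n runs once, at specialization time. The returned
    closure chain contains only multiplications by x and no test on n.
    """
    if n == 0:
        return lambda x: 1
    rest = specialize_power(n - 1)
    return lambda x: x * rest(x)


cube = specialize_power(3)   # residual function: x -> x * x * x
```

In the distributed setting of these slides the known part of the input is a fragment Fi, and the residual is a set of equations rather than a closure.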
Distributed graphs and graph queries
Distributed graph
• graph fragmentation F = (F, Gf)
• fragment graph Gf
Reachability query
• reachability query Qr(s, t)
• bounded reachability query Qbr(s, t, l)
• regular reachability (path) query Qrr(s, t, R), where R ::= ε | a | RR | R ∪ R | R*
[Figure: a graph partitioned into fragments F1, F2 and F3, with fragment graph Gf. Annotations mark a fragment, a virtual node of F1, an in-node of F1, and a cross edge. Nodes shown: Ann, "CTO"; Mark, "FA"; Fred, "HR"; Walt, "HR"; Bill, "DB"; Pat, "SE"; Tom, "AI"; Ross, "HR"; Jack, "MK"; Emmy, "HR"; Mat, "HR". Example queries: Qr(Ann, Mark); Qbr(Ann, Mark, 5); Qrr(Ann, Mark, (DB* ∪ HR*)).]
Distributed graph querying framework
Applying partial evaluation to graph querying
Setting: a coordinating site Sc and a set of graph fragments F1, …, Fn
• Distributing (at Sc): post Q to the fragments
• Local evaluation (at each Fi): partially evaluate Q, producing a partial answer Q(Fi)
• Assembling (at Sc): combine the partial answers Q(Fi) into Q(G)
[Figure: coordinator Sc posts Q to the fragments; each fragment returns Q(Fi); Sc outputs Q(G).]
Distributed reachability queries
Performance guarantees: Over a fragmentation F = (F, Gf) of a graph G, reachability queries can be evaluated (a) in O(|Vf||Fm|) time, (b) by visiting each site only once, and (c) with the total network traffic bounded by O(|Vf|^2), where Gf = (Vf, Ef) and Fm is the largest fragment in F.
A distributed reachability evaluation algorithm: disReach
• Coordinator Sc posts qr(s, t) to each fragment site in F
• Each site locally evaluates qr(s, t) in parallel, and produces a partial answer as a set of Boolean equations
• Sc collects and assembles the partial results
Distributed reachability: partial evaluation
Partial evaluation by introducing Boolean variables
Locally evaluate each qr(v, t) on Fi in parallel:
• for each in-node v' in Fi, decide whether v' reaches t
• introduce a Boolean variable Xv' for each v'
Partial answer to qr(v, t): a set of Boolean formulas, each the disjunction of the variables Xv' of the nodes v' that v can reach
[Figure: inside a fragment, node v reaches nodes v' (qr(v, v')), which may in turn reach t. Xv' = qr(v', t), and qr(v, t) = Xv1' or … or Xvn'.]
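A minimal sketch of this local evaluation step follows. It is an illustration under stated assumptions, not the authors' code: the name `local_eval`, the adjacency-dict encoding of a fragment, and the encoding of a formula as either `True` or a set of variable names are all hypothetical. The source node s would be passed in `in_nodes` at the site that stores it.

```python
from collections import deque


def local_eval(adj, in_nodes, virtual_nodes, t):
    """Partially evaluate reachability to t on one fragment.

    adj           : dict node -> list of successors stored in this fragment
                    (cross edges point to virtual nodes)
    in_nodes      : nodes of this fragment that need an equation
    virtual_nodes : nodes stored at other sites, reached via cross edges
    Returns a dict v -> True | set of virtual nodes, where a set {a, b}
    stands for the formula "X_a or X_b" and the empty set for false.
    """
    equations = {}
    for v in in_nodes:
        seen, queue, formula = {v}, deque([v]), set()
        reaches_t = (v == t)
        while queue and not reaches_t:
            u = queue.popleft()
            for w in adj.get(u, ()):
                if w == t:
                    reaches_t = True
                    break
                if w in virtual_nodes:
                    formula.add(w)       # unknown locally: keep variable X_w
                elif w not in seen:
                    seen.add(w)
                    queue.append(w)
        equations[v] = True if reaches_t else formula
    return equations
```

For a fragment with edges a → b, b → x and a → y, where x and y are virtual nodes, the call `local_eval({"a": ["b", "y"], "b": ["x"]}, {"a"}, {"x", "y"}, "t")` yields the single equation X_a = X_x or X_y.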
Distributed reachability: assembling
• Collect the Boolean equation sets at coordinator Sc
• Solve the Boolean equation system over a dependency graph
• qr(s, t) is true iff Xs = true at Sc
Example equation system:
Xs = Xv
Xv = Xv'' or Xv'
Xv'' = false
Xv' = Xt
Xt = true
[Figure: dependency graph of the equation system, annotated with O(|Vf|).]
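The assembling step can be sketched as a search over the dependency graph of the equations. The function `solve` and the dict encoding (each variable maps to `True` or to a set of variables read as a disjunction) are assumptions made for this illustration. Because every formula is a disjunction, Xs is true exactly when a variable whose equation is `True` is reachable from s.

```python
def solve(equations, s):
    """Return the value of X_s in the least solution of the system.

    equations: dict var -> True, or a set of vars read as a disjunction
               (an empty set is the constant false).
    The dependency graph has an edge x -> y when X_y occurs in the
    formula of X_x; each equation is inspected at most once.
    """
    stack, seen = [s], {s}
    while stack:
        x = stack.pop()
        rhs = equations.get(x, set())   # a variable with no equation is false
        if rhs is True:
            return True
        for y in rhs:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return False


# The example system from the slide: Xs = Xv, Xv = Xv'' or Xv',
# Xv'' = false, Xv' = Xt, Xt = true.
example = {"s": {"v"}, "v": {"v''", "v'"}, "v''": set(), "v'": {"t"}, "t": True}
answer = solve(example, "s")
```

Cyclic dependencies such as Xa = Xb, Xb = Xa are handled by the `seen` set and evaluate to false.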
Distributed reachability queries: example
1. Dispatch Q to the fragments (at Sc)
2. Partial evaluation: generate Boolean equations (at each Fi)
3. Assembling: solve the equation system (at Sc)
[Figure: coordinator Sc and fragments F1, F2 and F3. Nodes shown: Jack, "MK"; Emmy, "HR"; Mat, "HR"; Fred, "HR"; Walt, "HR"; Bill, "DB"; Pat, "SE"; Tom, "AI"; Ross, "HR"; query nodes Ann and Mark.]
Distributed bounded reachability queries
1. Dispatch Q to the fragments (at Sc)
2. Partial evaluation: generate equations (at each Fi)
3. Assembling: solve the equation system (at Sc)
Variables denoting numeric values
A weighted dependency graph
[Figure: coordinator Sc and fragments F1, F2 and F3. Nodes shown: Jack, "MK"; Emmy, "HR"; Mat, "HR"; Fred, "HR"; Walt, "HR"; Bill, "DB"; Pat, "SE"; Tom, "AI"; Ross, "HR"; query nodes Ann and Mark.]
Distributed bounded reachability queries
Performance guarantees: bounded reachability queries can be
evaluated with the same performance guarantees as for reachability
queries.
Performance guarantees for distributed bounded reachability
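The slides say only that the variables now denote numeric values and that the dependency graph is weighted. One plausible reading, sketched below as an assumption rather than as the authors' algorithm, is that each variable Xv stands for the distance from v to t, each site reports local distances, and the coordinator runs a shortest-path computation. The names `solve_distance` and `bounded_reach` and the tuple encoding of equations are hypothetical, and weights are assumed to be non-negative path lengths.

```python
import heapq
import math


def solve_distance(equations, s):
    """Least value of X_s for equations of the (assumed) form

        X_v = min(d_t, min over v' of (w + X_v'))

    encoded as  v -> (d_t, {v': w}),  where d_t is the local distance
    from v to t (math.inf if t is not reached locally) and w is the local
    distance from v to the virtual node v'. Uses Dijkstra's algorithm on
    the weighted dependency graph.
    """
    best = math.inf
    dist = {s: 0}
    heap = [(0, s)]
    while heap:
        d, x = heapq.heappop(heap)
        if d > dist.get(x, math.inf):
            continue                      # stale heap entry
        d_t, deps = equations.get(x, (math.inf, {}))
        best = min(best, d + d_t)
        for y, w in deps.items():
            if d + w < dist.get(y, math.inf):
                dist[y] = d + w
                heapq.heappush(heap, (d + w, y))
    return best


def bounded_reach(equations, s, l):
    """Qbr(s, t, l): is t within distance l of s?"""
    return solve_distance(equations, s) <= l
```

With equations `{"s": (math.inf, {"a": 2, "b": 1}), "a": (1, {}), "b": (math.inf, {"a": 3})}` the shortest route is s to a (2) plus a to t (1), so the query holds for l = 3 but not for l = 2.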
Distributed regular reachability queries
Performance guarantees: Over a fragmentation F = (F, Gf) of a graph G, regular reachability queries qrr(s, t, R) can be evaluated (a) in O((|Vf|^2 + |Fm|)|R|^2) time, (b) by visiting each site only once, and (c) with the total network traffic bounded by O(|R|^2 |Vf|^2), where Gf = (Vf, Ef) and Fm is the largest fragment in F.
Query automaton Gq(R) of R: <Vq, Eq, Lq, us, ut>
Automaton
representation
for queries
Query automaton
A node v is a match of state uv in Gq(R) iff (1) they have the same label, and (2) there is a path ρ from v to t and a path ρ' from uv to ut, such that ρ and ρ' induce the same sequence of labels
Given a graph G, qrr(s, t, R) over G is true if and only if s is a match of us in Gq(R)
[Figure: a query automaton with states labeled DB, HR and FA, shown next to graph nodes Ann; Fred, "HR"; Walt, "HR"; Mark, "FA"; Ross, "HR"; Emmy, "HR"; Mat, "HR".]
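On a single, unfragmented graph the matching condition can be checked by searching pairs (graph node, automaton state). The sketch below is illustrative only: `regular_reach`, the dict encodings, and the hand-built automaton `Q_ADJ`/`Q_LABEL` for (DB* ∪ HR*) are assumptions, as is the convention that the start state us is matched only by s and the final state ut only by t. The graph edges in the usage note are invented, since the slides do not list them.

```python
from collections import deque

# Hypothetical encoding of Gq(R) for R = (DB* ∪ HR*):
# us -> db -> db -> ut,  us -> hr -> hr -> ut,  us -> ut (empty string).
Q_ADJ = {"us": ["db", "hr", "ut"], "db": ["db", "ut"], "hr": ["hr", "ut"]}
Q_LABEL = {"db": "DB", "hr": "HR"}


def regular_reach(adj, label, s, t, q_adj, q_label, us="us", ut="ut"):
    """Decide qrr(s, t, R) by breadth-first search over (node, state) pairs."""
    def matches(v, u):
        if u == us:
            return v == s
        if u == ut:
            return v == t
        return label[v] == q_label[u]

    queue, seen = deque([(s, us)]), {(s, us)}
    while queue:
        v, u = queue.popleft()
        if (v, u) == (t, ut):
            return True                   # s is a match of us
        for v2 in adj.get(v, ()):
            for u2 in q_adj.get(u, ()):
                if (v2, u2) not in seen and matches(v2, u2):
                    seen.add((v2, u2))
                    queue.append((v2, u2))
    return False
```

For example, with edges Ann → Walt → Mat → Mark and labels Walt "HR", Mat "HR", the query qrr(Ann, Mark, (DB* ∪ HR*)) holds, whereas a path Ann → Bill ("DB") → Walt ("HR") → Mark mixes the two labels and does not match.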
Distributed regular query evaluation: algorithm
Distributed regular query evaluation: partial evaluation
Partial evaluation by introducing Boolean variables
• For each node v in Fi, assign a vector v.rvec of O(|Vq|) Boolean formulas; entry v.rvec[u] denotes whether v matches state u
• Introduce a Boolean variable X(v', w) for each virtual node v' of Fi and each state w in Vq, denoting whether v' matches w
• Partial answer to qrr(s, t, R): the set of Boolean formulas of the in-nodes of Fi
[Figure: nodes v1 and v2 of a fragment with formula vectors (f11 f12 … f1k) and (f21 f22 … f2k); a virtual node v' with variables X(v', w); automaton states vq and wq.]
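A sketch of how one site might compute these formula vectors is given below. It is an illustration under assumptions, not the algorithm from the slides: `local_eval_rpq` and its encodings are hypothetical, a formula is either `True` or a set of pairs (v', w) read as the disjunction of the variables X(v', w), and us and ut are taken to be matched only by s and t.

```python
def local_eval_rpq(adj, label, in_nodes, virtual, s, t, q_adj, q_label,
                   us="us", ut="ut"):
    """Return rvset: (in-node v, state u) -> True | set of (virtual node, state).

    An entry exists only for pairs where v matches u on labels. The empty
    set is the constant false.
    """
    def matches(v, u):
        if u == us:
            return v == s
        if u == ut:
            return v == t
        return label.get(v) == q_label[u]

    states = set(q_adj) | {ut}
    rvset = {}
    for v in in_nodes:
        for u in states:
            if not matches(v, u):
                continue
            formula, found = set(), (v, u) == (t, ut)
            stack, seen = [(v, u)], {(v, u)}
            while stack and not found:
                x, p = stack.pop()
                for x2 in adj.get(x, ()):
                    for p2 in q_adj.get(p, ()):
                        if x2 in virtual:
                            formula.add((x2, p2))   # decided at another site
                        elif (x2, p2) not in seen and matches(x2, p2):
                            if (x2, p2) == (t, ut):
                                found = True
                            seen.add((x2, p2))
                            stack.append((x2, p2))
            rvset[(v, u)] = True if found else formula
    return rvset
```

The coordinator would treat a variable X(v', w) that no site defines, for instance because v' does not match w, as false.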
Distributed regular query evaluation: assembling
• Collect the partial results as sets of Boolean formulas
• Construct a dependency graph: a node vd(v, u) for each in-node v and each entry u of its formula vector, labeled with the Boolean formula, and an edge for each dependency
• Check whether vd(s, us) can reach vd(t, ut) in the dependency graph
[Figure: dependency graph with nodes vd(s, us), vd(v1, vq), vd(v2, vq), vd(v', w) and vd(t, ut) = true.]
Distributed Regular Reachability Evaluation: Example
1. Dispatch Q to the fragments (at Sc)
2. Partial evaluation: generate a set of Boolean equations, as vectors of Boolean formulas (at each Fi)
3. Assembling: solve the equation system by testing reachability in the dependency graph (at Sc)
[Figure: coordinator Sc and fragments F1, F2 and F3. Nodes shown: Jack, "MK"; Emmy, "HR"; Mat, "HR"; Fred, "HR"; Walt, "HR"; Bill, "DB"; Pat, "SE"; Tom, "AI"; Ross, "HR".]
Distributed Reachability with MapReduce
Partial evaluation fits naturally into the MapReduce framework
1. The coordinator generates the query automaton Gq and partitions graph G into k fragments, each as a key/value pair (i, <Fi, Gq>)
2. Map function: performs local evaluation on (i, <Fi, Gq>) and generates <1, rvset>
3. Reduce function: assembles the collected partial results and writes <0, ans> to the distributed file system
[Figure: the coordinator sends (1, <F1, Gq>), …, (k, <Fk, Gq>) to mappers 1, …, k; each mapper emits <1, rvset> to a single reducer, which outputs <0, ans>. Cost annotations: O(|Fm|); O(|R|^2 |Vf|^2); processing path: O(|Fm| + |R|^2 |Vf|^2).]
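The key/value flow above can be mimicked in plain Python. The sketch below is a sequential stand-in and uses no real MapReduce API; `map_fn`, `reduce_fn`, `run_job` and `local_eval` are hypothetical names. For brevity it evaluates plain reachability, so the value carries the pair (s, t) where the slides carry the query automaton Gq.

```python
def local_eval(fragment, t):
    """Equations of one fragment: in-node -> True | set of virtual nodes."""
    adj, in_nodes, virtual = fragment
    eqs = {}
    for v in in_nodes:
        stack, seen, formula, found = [v], {v}, set(), v == t
        while stack and not found:
            for w in adj.get(stack.pop(), ()):
                if w == t:
                    found = True
                elif w in virtual:
                    formula.add(w)
                elif w not in seen:
                    seen.add(w)
                    stack.append(w)
        eqs[v] = True if found else formula
    return eqs


def map_fn(key, value):
    """Mapper: local evaluation on (i, <Fi, query>); emits <1, rvset>."""
    fragment, (s, t) = value
    yield 1, local_eval(fragment, t)


def reduce_fn(key, values, s):
    """Reducer: merge the partial answers, solve for X_s, emit <0, ans>."""
    eqs = {}
    for rvset in values:
        eqs.update(rvset)
    stack, seen = [s], {s}
    while stack:
        rhs = eqs.get(stack.pop(), set())
        if rhs is True:
            return 0, True
        for y in rhs - seen:
            seen.add(y)
            stack.append(y)
    return 0, False


def run_job(fragments, s, t):
    """Sequential stand-in for the runtime: all mappers emit key 1."""
    shuffled = {}
    for i, fragment in enumerate(fragments, start=1):
        for k, v in map_fn(i, (fragment, (s, t))):
            shuffled.setdefault(k, []).append(v)
    return reduce_fn(1, shuffled[1], s)
```

Emitting the same key from every mapper routes all partial answers to one reducer, which mirrors the single assembling step at the coordinator.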
Experimental Evaluation
Experimental setting
• Real-life datasets
• Synthetic data: larger random graphs following the densification law
• Algorithms:
disReach, disReachn and disReachm
disDist and disDistn
disRPQ, disRPQn and disRPQd
MRdRPQ
Distributed reachability
Efficiency and scalability
[Plot annotations: 20% and 6%; 9% of disReachn; three thousand visits over 4 fragments]
disReach outperforms centralized and message-passing approaches
Distributed regular reachability
Efficiency and network traffic
Time: 60% of disRPQn
Traffic: at most 25% and 3%
disRPQ takes much less time and incurs much lower communication cost
Distributed regular reachability (cont.)
Scalability
Scales well with the number of fragments; takes less
time over more fragments
disRPQ scales well with the number of fragments
Performance of MapReduce implementation
Efficiency and Scalability
Scales well with the size of fragments
Takes more time over more complex queries
Takes less time with more mappers
Partial evaluation works well in MapReduce model
Conclusion
Distributed reachability querying
• Partial evaluation based distributed evaluation
• Reachability, bounded reachability and regular reachability queries
• Performance guarantees
Partial evaluation can be carried out naturally in MapReduce
Future work
• Distributed evaluation for other queries, e.g., graph pattern
matching using simulation
• Combining partial evaluation and incremental computation
Partial evaluation based distributed query evaluation
Thank you!