A MapReduce-Based Maximum-Flow Algorithm for Large Small-World Network Graphs
Felix Halim, Roland H.C. Yap, Yongzheng Wu
Outline
• Background and Motivation
• Overview
• Maximum-Flow Algorithm (the Ford-Fulkerson Method)
• MapReduce (MR) Framework
• Parallelizing the Ford-Fulkerson Method
– Incremental update, bi-directional search, multiple excess paths
• MapReduce Optimizations
– Stateful extension, trading off space vs. number of rounds
• Experimental Results
• Conclusion
Background and Motivation
• Large small-world network graphs arise naturally in
– the World Wide Web, social networks, biology, etc.
– They have been shown to have small diameter and to be robust
• Maximum-flow computation on large graphs is useful for
– Isolating a group of spam sites on the WWW
– Community identification
– Defense against Sybil attacks
• Challenge: the computation, storage, and memory requirements exceed the capacity of a single machine
• Our approach: cloud/cluster computing
– Run max-flow on top of the MapReduce framework
The Maximum-Flow Overview
• Given a flow network G = (V, E), a source s, and a sink t
• Residual network Gf = (V, Ef)
• Augmenting path
– A simple path from s to t in the residual network
• To compute the maximum flow
– The Ford-Fulkerson method, O(|f*| |E|):
While an augmenting path p is found in Gf
Augment the flow along the path p
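As a point of reference, the Ford-Fulkerson method above can be sketched as a small in-memory program. This is a plain single-machine sketch (using BFS to find augmenting paths, as in Edmonds-Karp), not the MapReduce formulation presented in this talk; the dict-of-dicts graph representation is an assumption for illustration.

```python
from collections import deque

def max_flow(cap, s, t):
    # cap: {u: {v: capacity}}. Build the residual network, adding
    # reverse edges with zero initial residual capacity.
    res = {u: dict(nbrs) for u, nbrs in cap.items()}
    for u, nbrs in cap.items():
        for v in nbrs:
            res.setdefault(v, {}).setdefault(u, 0)
    total = 0
    while True:
        # Find an augmenting path s -> t in the residual network via BFS.
        parent, q = {s: None}, deque([s])
        while q and t not in parent:
            u = q.popleft()
            for v, r in res[u].items():
                if r > 0 and v not in parent:
                    parent[v] = u
                    q.append(v)
        if t not in parent:   # no augmenting path left: flow is maximum
            return total
        # Collect the path edges and find the bottleneck residual capacity.
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(res[u][w] for u, w in path)
        # Augment along p: forward residuals shrink, reverse residuals grow.
        for u, w in path:
            res[u][w] -= bottleneck
            res[w][u] += bottleneck
        total += bottleneck
```

For example, on a network with edges s→a (10), s→b (5), a→b (15), a→t (10), b→t (10), the method finds two augmenting paths and returns a maximum flow of 15.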
Maximum-Flow Example
[Figure sequence: a small flow network with source node s, sink node t, and intermediate nodes. Each slide shows the residual network (residual = C − F, capacity minus flow), highlights an augmenting path p, and augments the flow along p. After three augmentations no augmenting path remains; the final slide shows the flow/capacity labels on each edge (e.g. 7/10, 7/9, 1/2, 8/8, 6/6, 8/9, 7/7), giving Max-Flow = 14.]
MapReduce Framework Overview
• Introduced by Google in 2004
• Open-source implementation: Hadoop
• Operates on very large datasets
– Consisting of key/value pairs
– Across thousands of commodity machines
– On the order of terabytes of data
• Abstracts away distributed-computing concerns
– Data partitioning and distribution, load balancing
– Scheduling, fault tolerance, communication, etc.
MapReduce Model
• User-defined Map and Reduce functions (stateless)
• Input: a list of key/value pairs (k1, v1)
– The user's map function is applied to each key/value pair
– It produces a list of intermediate key/value pairs
• Output: a list of key/value pairs (k2, v2)
– The intermediate values are grouped by key
– The user's reduce function is applied to each group
• Each tuple is independent
– It can be processed in isolation, in a massively parallel manner
– The total input can be far larger than the workers' combined memory
Max-Flow on MR - Observations
• Input: a small-world graph with small diameter D
– Each vertex in the graph is modeled as an input tuple in MR
• A naïve translation of the Ford-Fulkerson method to MapReduce requires O(|f*| D) MR rounds
– A breadth-first search on MR requires O(D) MR rounds
– BFSMR using 20 machines takes 9 rounds and 6 hours on a SWN with ~400M vertices and ~31B edges
– At that rate it would take years to compute a max-flow with |f*| > 1000
• Question: how do we minimize the number of rounds?
FF1: A Parallel Ford-Fulkerson
• Goal: minimize the number of rounds through parallelism
• Do more work per round; avoid spilling tasks to the next round
– Use speculative execution to increase parallelism
– Incremental updates
– Bi-directional search
• Doubles the parallelism (the number of active vertices)
• Effectively halves the expected number of rounds
• Maintain parallelism (keep the number of active vertices large)
– Multiple excess paths (k) -> the most effective technique
• Each vertex stores k excess paths (to avoid becoming inactive)
• Result:
– Lowers the number of rounds required from O(|f*| D) to ~D
– Produces a large number of augmenting paths per round (an MR bottleneck)
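To see why bi-directional search roughly halves the number of rounds, consider a synchronous BFS in which each MR round expands a frontier by one hop. The sketch below is a plain in-memory simulation on a hypothetical path graph (not MapReduce code): it counts rounds for one-sided vs. two-sided expansion.

```python
def bfs_rounds(adj, s, t):
    # One-sided synchronous BFS: rounds until t enters the visited set.
    frontier, seen, rounds = {s}, {s}, 0
    while t not in seen:
        frontier = {w for v in frontier for w in adj[v]} - seen
        seen |= frontier
        rounds += 1
    return rounds

def bidir_bfs_rounds(adj, s, t):
    # Two-sided: frontiers from both s and t expand in each round;
    # stop as soon as the two visited sets meet.
    fs, ft, ss, st = {s}, {t}, {s}, {t}
    rounds = 0
    while not (ss & st):
        fs = {w for v in fs for w in adj[v]} - ss
        ss |= fs
        ft = {w for v in ft for w in adj[v]} - st
        st |= ft
        rounds += 1
    return rounds
```

On a path graph 0-1-2-3-4-5-6, the one-sided search needs 6 rounds while the two-sided search meets in the middle after 3, matching the "halves the expected number of rounds" claim above.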
MR Optimizations
• Goal: minimize MR bottlenecks and overheads
• FF2: external worker(s) as a stateful MR extension
– The reducer for vertex t is the bottleneck in FF1
– An external process is used to handle augmenting-path acceptance
• FF3: the Schimmy method [Lin10]
– Avoids shuffling the master graph
• FF4: eliminate object instantiations
• FF5: minimize MR shuffle (communication) costs
– Monitor the extended excess paths for saturation and resend as needed
– Recompute/reprocess the graph instead of shuffling the delta (not in the paper)
Experimental Results
• Facebook sub-graphs
• Cluster setup
– Hadoop v0.21-RC-0 installed on 21 nodes, each with 8 hyper-threaded cores (2× Intel E5520 @ 2.27 GHz), 3 hard disks (500 GB SATA each), running CentOS 5.4 (64-bit)
Graph   #Vertices   #Edges      Size (HDFS)   Max Size
FB1     21 M        112 M       587 MB        8 GB
FB2     73 M        1,047 M     6 GB          54 GB
FB3     97 M        2,059 M     13 GB         111 GB
FB4     151 M       4,390 M     30 GB         96 GB
FB5     225 M       10,121 M    69 GB         424 GB
FB6     441 M       31,239 M    238 GB        1,281 GB
Handling Large Max-Flow Values
• FF5MR is able to process FB6 in a very small number of MR rounds (close to the graph diameter).
MapReduce Optimizations
• FF1 (parallel FF) to FF5 (MR optimized) vs. BFSMR
Shuffle Bytes Reduction
• The main bottleneck in MR is the number of bytes shuffled; FF5 minimizes the bytes shuffled
Scalability (#Machines and Graph Size)
Conclusion
• We showed how to parallelize a sequential max-flow algorithm, minimizing the number of MR rounds
– Incremental updates, bi-directional search, and multiple excess paths
• We showed MR optimizations for max-flow
– Stateful extension, minimized communication costs
• Computing max-flow on large small-world graphs
– Is practical using FF5MR
Q & A
• Thank you
Backup Slides
• Related Work
• Multiple Excess Paths - Results
• Edges Processed per Second - Results
• MapReduce Example (Word Count)
• MapReduce Execution Flow
• FF1 Map Function
• FF1 Reduce Function
• MR
Related Work
• The push-relabel algorithm
– Distributed (no global view of the graph required)
– Has been developed for SMP architectures
– Needs sophisticated heuristics to push the flows
• Not suitable for the MapReduce model
– Needs locks and to pull information from its neighbors
– Low number of active vertices
– Pushing flow to the wrong sub-graph can lead to a huge number of rounds
Multiple Excess Paths - Effectiveness
• The larger k is, the fewer MR rounds are required (it keeps the number of active vertices high)
# Edges processed / sec
• The larger the graph, the more effective the approach is
MapReduce Example – Word Count
map(key, value)    // key: document name, value: document contents
  foreach word w in value
    EmitIntermediate(w, 1)

reduce(key, values)    // key: a word, values: a list of counts
  freq = 0
  foreach v in values
    freq = freq + v
  Emit(key, freq)
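The pseudocode above translates almost line for line into a runnable single-machine simulation. The `map_reduce` driver here is a hypothetical stand-in for the framework's shuffle/group step, written only to exercise the word-count functions; it is not the Hadoop API.

```python
from itertools import groupby

def map_fn(key, value):      # key: document name, value: contents
    return [(w, 1) for w in value.split()]

def reduce_fn(key, values):  # key: a word, values: a list of counts
    return (key, sum(values))

def map_reduce(inputs, map_fn, reduce_fn):
    # Map phase: apply map_fn independently to each input pair.
    inter = [kv for k, v in inputs for kv in map_fn(k, v)]
    # Shuffle phase: group the intermediate values by key.
    inter.sort(key=lambda kv: kv[0])
    groups = groupby(inter, key=lambda kv: kv[0])
    # Reduce phase: apply reduce_fn to each (key, values) group.
    return [reduce_fn(k, [v for _, v in g]) for k, g in groups]
```

For example, `map_reduce([("d1", "a b a"), ("d2", "b")], map_fn, reduce_fn)` counts each word across both documents, yielding ("a", 2) and ("b", 2).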
MapReduce Execution Flow
MapReduce Simple Applications
• Word Count
• Distributed Grep
• Count URL access frequency
• Reverse Web-Link Graph
• Term-Vector per host
• Inverted Index
• Distributed Sort
• All of these tasks can be completed in one MR job
General MR Optimizations
• External worker as a stateful extension for MR
– (Dedicated) external workers outside the mappers/reducers
– Requests are processed immediately
• No need to wait until the mappers/reducers complete
– Flexible synchronization point
• Minimize shuffling of intermediate tuples
– Shuffling can be avoided by re-processing/re-computing in the reduce
– Use flags in the data structure to prevent re-shuffling
• Eliminate object instantiation
– Use binary serialization