Page 1: Armend Hoxha

Mizan: A System for Dynamic Load Balancing in Large-Scale Graph Processing

Presented by: Armend Hoxha, Trevor Hodde, Kexin Shi

Page 2: Armend Hoxha

A System for Dynamic Load Balancing

Pregel, HADI, Pegasus, X-RIME

Mizan (Arabic): a double-pan scale

We’ll focus on Pregel, since Mizan is a refined Pregel system.

Page 5: Armend Hoxha

• A scalable graph mining system
• Message-passing-based programming model

It improved overall performance by 1-2 orders of magnitude over a traditional MapReduce implementation for a wide array of graph mining algorithms:
• PageRank
• Shortest-path problems
• Bipartite matching
• Semi-clustering

Existing implementations focus primarily on graph partitioning as a preprocessing step to balance computation.

Pregel

Page 6: Armend Hoxha

Pregel is built on the BSP programming model

BSP = Bulk Synchronous Parallel

This picture is taken from the paper

Page 7: Armend Hoxha

Pregel is built on the BSP programming model

Each vertex runs an algorithm and can send messages asynchronously to any other vertex

During each superstep vertices run in parallel across a distributed infrastructure

Each vertex processes incoming messages from the previous superstep

A vertex is said to be active if it is processing and/or sending messages to other vertices

This process continues until all vertices have no messages to send – they all become inactive

This picture is taken from the paper
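To make the superstep model above concrete, here is a minimal single-process C++ sketch of a Pregel-style vertex program (PageRank run for a fixed number of supersteps). The inbox/next message queues, the damping factor of 0.85, and the tiny hard-coded graph are illustrative assumptions; this is not Mizan's or Pregel's actual API.

    // Minimal single-process sketch of the Pregel/BSP vertex-centric model
    // (hypothetical types; not Mizan's C++/MPI interface).
    #include <cstdio>
    #include <vector>

    int main() {
        // Tiny example graph: adjacency list of out-edges.
        std::vector<std::vector<int>> out = {{1, 2}, {2}, {0}};
        int n = static_cast<int>(out.size());
        std::vector<double> rank(n, 1.0 / n);
        // inbox[v] holds the messages delivered to v at the start of a superstep.
        std::vector<std::vector<double>> inbox(n), next(n);

        for (int superstep = 0; superstep < 10; ++superstep) {
            for (int v = 0; v < n; ++v) {                 // vertices run "in parallel"
                if (superstep > 0) {                      // consume last superstep's messages
                    double sum = 0;
                    for (double m : inbox[v]) sum += m;
                    rank[v] = 0.15 / n + 0.85 * sum;
                }
                for (int u : out[v])                      // send messages for the next superstep
                    next[u].push_back(rank[v] / out[v].size());
            }
            inbox.swap(next);                             // barrier: end of the superstep
            for (auto& q : next) q.clear();
        }
        for (int v = 0; v < n; ++v) std::printf("vertex %d rank %.3f\n", v, rank[v]);
    }

The swap of the two message queues at the end of the loop plays the role of the global synchronization barrier between supersteps.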

Page 8: Armend Hoxha

Graph Algorithms

Stationary: An algorithm is stationary if its active vertices send and receive the same distribution of messages across supersteps. At the end of each superstep, all active vertices become inactive. Examples: PageRank, diameter estimation, finding weakly connected components, etc.

Non-stationary: An algorithm is non-stationary if the size of its outgoing messages changes across supersteps. Such variations can create workload imbalances across supersteps. Examples: distributed minimal spanning tree construction (DMST), graph queries, various simulations on social network graphs (e.g. advertisement propagation).

Stationary algorithm: messages per worker per superstep

              Superstep 1   Superstep 2   Superstep 3
  Worker 1    ~N            ~N            ~N
  Worker 2    ~N            ~N            ~N
  Worker 3    ~N            ~N            ~N
  No. Msg     ~3N           ~3N           ~3N

Non-stationary algorithm: messages per worker per superstep

              Superstep 1   Superstep 2   Superstep 3
  Worker 1    ~N            ~3N           ~5N
  Worker 2    ~0.5N         ~4N           ~6N
  Worker 3    ~0.5N         ~2N           ~4N
  No. Msg     ~2N           ~9N           ~15N

Page 9: Armend Hoxha

PageRank (stationary) vs. DMST (non-stationary)

Difference between stationary and non-stationary algorithms. Total represents the sum across all workers and Max represents the maximum amount on a single worker.

Both of these algorithms were run on a cluster of 21 machines and processed the same dataset (LiveJournal1). The input graph was partitioned using a hash function.

This picture is taken from the paper

Page 10: Armend Hoxha

What causes imbalance in non-stationary algorithms?

There are two sources of imbalance:
- one originating from the graph structure, and
- another from the algorithm's behavior

This picture is taken from the paper

Page 12: Armend Hoxha

Graph partitioning: common approaches to partitioning the data are hash-based, range-based, and min-cuts (simple partitioning sketches follow this slide).

Hash-based / range-based: divide the dataset based on a simple heuristic: evenly distribute vertices across compute nodes, irrespective of their edge connectivity.

Min-cuts: considers the vertex connectivity and partitions the data such that strongly connected vertices are placed close to each other (on the same cluster).

Both of these partitioning strategies result in graph-dependent performance.
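As a rough illustration of the first two schemes above, the sketch below assigns a vertex to a worker either by hashing its ID or by splitting the ID space into contiguous ranges. The function names and the 4-worker setup are hypothetical and not taken from Mizan.

    // Hash-based vs. range-based vertex-to-worker assignment (illustrative only).
    #include <cstdio>
    #include <functional>

    int hash_partition(long vertex_id, int num_workers) {
        return static_cast<int>(std::hash<long>{}(vertex_id) % num_workers);
    }

    int range_partition(long vertex_id, long num_vertices, int num_workers) {
        long per_worker = (num_vertices + num_workers - 1) / num_workers;
        return static_cast<int>(vertex_id / per_worker);   // contiguous ID ranges per worker
    }

    int main() {
        for (long v : {0L, 1L, 999999L})
            std::printf("vertex %ld -> hash: worker %d, range: worker %d\n",
                        v, hash_partition(v, 4), range_partition(v, 1000000, 4));
    }

Neither function looks at edges, which is exactly why both can end up with poor locality on heavily skewed graphs.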

Page 13: Armend Hoxha

Graph partitioning

  G(N, E)        |N|           |E|
  kg4m68m        4,194,304     68,671,566
  LiveJournal1   4,847,571     68,993,773
  arabic-2005    22,774,080    639,999,458

[Chart: run time in minutes of Hash, Range, and Min-cuts partitioning on LiveJournal1, kg4m68m, and arabic-2005]

Because of its size, arabic-2005 couldn’t be partitioned using min-cuts in the local cluster where experiments were conducted.

Page 15: Armend Hoxha

Balanced Computation and Communication

Balancing the computation and communication is fundamental to the efficiency of a Pregel system.

Existing implementations of Pregel take one or more of the following five approaches to achieving a balanced workload:

1. Provide simple graph partitioning schemes, like hash- or range-based partitioning (e.g. Giraph)
2. Allow developers to set their own partitioning scheme or pre-partition the graph data (e.g. Pregel)
3. Provide more sophisticated partitioning techniques (e.g. GraphLab, GoldenOrb and Surfer use min-cuts)
4. Utilize distributed data stores and graph indexing on vertices and edges (e.g. GoldenOrb and Hama)
5. Perform coarse-grained load balancing (e.g. Pregel)

All of the approaches above assume that:
• the structure of the graph is static,
• the algorithm has predictable behavior, or
• the developer knows the runtime characteristics of the algorithm.

Page 16: Armend Hoxha

New Approaches Are Needed

From the experiments presented before, we saw that when a graph mining algorithm:
- has unpredictable communication needs,
- frequently changes the structure of the graph, or
- has variable computation complexity,
a single solution is not enough.

We want to build a system that:
1. is adaptive,
2. is agnostic to the graph structure, and
3. requires no a priori knowledge of the behavior of the algorithm.

Page 17: Armend Hoxha

M I Z A N

Page 18: Armend Hoxha

• A BSP (Bulk Synchronous Parallel) graph processing framework
  • Computation proceeds in parallel supersteps with message passing between vertices
• Open source, written in C++
• Extends the Pregel API
• Pregel is a scalable system for large-scale graph processing
  • Known to improve MapReduce performance by 1-2 orders of magnitude
• Focuses on efficient load balancing across workers in a cluster
• Two major steps in this process:
  • Monitoring
  • Migration planning

What is Mizan?

Page 19: Armend Hoxha

• Mizan monitors certain metrics for each worker node to ensure equal load balancing (a sketch of per-worker counters follows this slide):
  • Number of outgoing messages to other vertices
    • Only messages sent to remote workers are counted, not messages rerouted to the same worker
  • Total incoming messages
    • This includes messages rerouted to the same worker, because queue size can affect performance
  • Response time
    • Measured from the time at which the vertex begins processing a message until it finishes

Monitoring (The easy part)
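A minimal sketch of how these three metrics might be accumulated is shown below; the struct and field names are assumptions for illustration and do not come from Mizan's source code.

    // Hypothetical per-vertex and per-worker counters for the three monitored metrics.
    #include <cstdint>
    #include <cstdio>

    struct VertexStats {
        uint64_t outgoing_remote_msgs = 0;  // only messages sent to other workers
        uint64_t incoming_msgs        = 0;  // includes messages rerouted within the same worker
        double   response_time        = 0;  // seconds spent processing this vertex
    };

    struct WorkerStats {
        uint64_t outgoing_remote_msgs = 0;
        uint64_t incoming_msgs        = 0;
        double   execution_time       = 0;  // sum of vertex response times in this superstep

        void add(const VertexStats& v) {
            outgoing_remote_msgs += v.outgoing_remote_msgs;
            incoming_msgs        += v.incoming_msgs;
            execution_time       += v.response_time;
        }
    };

    int main() {
        WorkerStats w;
        w.add({120, 300, 0.004});
        w.add({40, 80, 0.001});
        std::printf("remote out: %llu  in: %llu  time: %.3f s\n",
                    (unsigned long long)w.outgoing_remote_msgs,
                    (unsigned long long)w.incoming_msgs, w.execution_time);
    }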

Page 20: Armend Hoxha

• Objectives:
  • Decentralized
    • Make sure all processing happens in parallel to improve performance
  • "Simple" (the authors' word, not mine)
  • Transparent
    • Extends the Pregel API but does not need to change it
    • Does not assume anything about the graph structure or algorithm

Migration Planning

Page 21: Armend Hoxha

• 5 easy steps to find the “strongest cause” of workload imbalance among the three monitored metrics

• Provides a way to detect and (hopefully) prevent instability due to workload imbalance

Migration Planning (more)

Page 22: Armend Hoxha

• Step 1: Identify the source of imbalance
  • Compare the worker's execution time against a normal distribution and flag any outliers (see the outlier-detection sketch after this slide)
• Remember: Mizan monitors each vertex for those 3 metrics:
  • Outgoing messages to remote nodes
  • All incoming messages
  • Response time

Migration Planning (the steps)
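Flagging a worker whose execution time stands out can be sketched as a z-score test against the mean across workers, as below. The threshold of 1.5 standard deviations and the sample times are assumptions for illustration; the slide only says outliers against a normal distribution are flagged.

    // Step 1 sketch: flag workers whose superstep time is an outlier.
    #include <cmath>
    #include <cstdio>
    #include <vector>

    int main() {
        std::vector<double> exec_time = {1.1, 1.0, 1.2, 3.4, 0.9};  // seconds per worker
        double mean = 0, var = 0;
        for (double t : exec_time) mean += t;
        mean /= exec_time.size();
        for (double t : exec_time) var += (t - mean) * (t - mean);
        double stddev = std::sqrt(var / exec_time.size());

        for (size_t w = 0; w < exec_time.size(); ++w) {
            double z = stddev > 0 ? (exec_time[w] - mean) / stddev : 0;
            if (z > 1.5)
                std::printf("worker %zu is an outlier (z = %.2f) -> plan a migration\n", w, z);
        }
    }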

Page 23: Armend Hoxha

Migration Planning (step 1 picture)

This picture is taken from the paper

Page 24: Armend Hoxha

• Step 1: Identify the source of imbalance
• Step 2: Select the migration objective
  • Detects the largest cause of workload imbalance by comparing the incoming- and outgoing-message metrics for each worker with the worker's execution time (a correlation-based sketch follows this slide)
  • The result of this comparison will tell the system what needs optimization:
    • Optimize outgoing messages
    • Optimize incoming messages
    • Optimize response time

Migration Planning (the steps)
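One plausible reading of "comparing metrics with the worker's execution time" is to compute a correlation for each metric and pick the stronger one. The sketch below uses Pearson correlation and made-up numbers; the exact statistic is an assumption, not something stated on the slide.

    // Step 2 sketch: choose the migration objective whose metric tracks execution time best.
    #include <cmath>
    #include <cstdio>
    #include <vector>

    double pearson(const std::vector<double>& x, const std::vector<double>& y) {
        double n = x.size(), sx = 0, sy = 0, sxy = 0, sxx = 0, syy = 0;
        for (size_t i = 0; i < x.size(); ++i) {
            sx += x[i]; sy += y[i]; sxy += x[i] * y[i];
            sxx += x[i] * x[i]; syy += y[i] * y[i];
        }
        double num = n * sxy - sx * sy;
        double den = std::sqrt(n * sxx - sx * sx) * std::sqrt(n * syy - sy * sy);
        return den == 0 ? 0 : num / den;
    }

    int main() {
        std::vector<double> exec = {1.0, 1.1, 3.2, 0.9};   // per-worker execution time
        std::vector<double> outg = {10, 12, 95, 9};        // outgoing remote messages
        std::vector<double> incm = {40, 38, 41, 39};       // incoming messages
        double c_out = pearson(outg, exec), c_in = pearson(incm, exec);
        if (c_out >= c_in) std::puts("objective: balance outgoing messages");
        else               std::puts("objective: balance incoming messages");
    }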

Page 25: Armend Hoxha

• Step 1: Identify the source of imbalance
• Step 2: Select the migration objective
• Step 3: Pair over-utilized workers with under-utilized ones
  • All workers create and execute the migration plan in parallel
  • Each worker can be paired with up to one other worker (a simplified pairing sketch follows this slide)

Migration Planning (the steps)

This picture is taken from the paper
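A simplified, centralized view of the pairing step: sort workers by load and match the most loaded with the least loaded, the second most with the second least, and so on, so that each worker is paired with at most one other. In Mizan the plan is created and executed by the workers in a decentralized way; the greedy sketch below is illustrative only.

    // Step 3 sketch: greedy pairing of over-utilized and under-utilized workers.
    #include <algorithm>
    #include <cstdio>
    #include <vector>

    int main() {
        std::vector<double> load = {5.0, 1.0, 9.0, 2.0, 4.0};   // per-worker load metric
        std::vector<int> order(load.size());
        for (size_t i = 0; i < order.size(); ++i) order[i] = static_cast<int>(i);
        std::sort(order.begin(), order.end(),
                  [&](int a, int b) { return load[a] > load[b]; });

        // Pair heaviest with lightest; an odd worker in the middle stays unpaired.
        for (size_t i = 0, j = order.size() - 1; i < j; ++i, --j)
            std::printf("worker %d (load %.1f) migrates vertices to worker %d (load %.1f)\n",
                        order[i], load[order[i]], order[j], load[order[j]]);
    }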

Page 26: Armend Hoxha

• Step 1: Identify the source of imbalance
• Step 2: Select the migration objective
• Step 3: Pair over-utilized workers with under-utilized ones
• Step 4: Select vertices to migrate
  • This is based on the migration objective determined by the previous steps

Migration Planning (the steps)

Page 27: Armend Hoxha

• Example:
  • Migration objective: balance outgoing messages
  • Solution: select vertices with a large number of outgoing messages and migrate them to an under-utilized worker (see the selection sketch after this slide)

Migration Planning (step 4 explained)
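Continuing the example above, a sketch of vertex selection for the outgoing-message objective: pick the heaviest senders until roughly half the load gap with the paired worker is covered. The stopping rule and the numbers are assumptions for illustration.

    // Step 4 sketch: choose which vertices to migrate for the outgoing-message objective.
    #include <algorithm>
    #include <cstdio>
    #include <vector>

    struct Vertex { int id; long outgoing_msgs; };

    int main() {
        std::vector<Vertex> vertices = {{7, 120}, {3, 15}, {9, 400}, {4, 60}};
        long my_load = 595, peer_load = 95;                // message counts: this worker vs. its pair
        long to_shed = (my_load - peer_load) / 2;          // aim to meet roughly in the middle

        std::sort(vertices.begin(), vertices.end(),
                  [](const Vertex& a, const Vertex& b) { return a.outgoing_msgs > b.outgoing_msgs; });

        long shed = 0;
        for (const Vertex& v : vertices) {
            if (shed >= to_shed) break;
            std::printf("migrate vertex %d (%ld outgoing messages)\n", v.id, v.outgoing_msgs);
            shed += v.outgoing_msgs;
        }
    }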

Page 28: Armend Hoxha

• Step 1: Identify the source of imbalance
• Step 2: Select the migration objective
• Step 3: Pair over-utilized workers with under-utilized ones
• Step 4: Select vertices to migrate
• Step 5: Migrate the vertices
  • Challenges:
    • Migrating vertices across workers while maintaining vertex ownership and fast updates
    • Minimizing migration costs to ensure that a migration actually helps reduce workload

Migration Planning (the steps)

Page 29: Armend Hoxha

• Mizan uses a distributed hash table (DHT) to implement a lookup service (a minimal sketch follows this slide)
• Vertex v can execute at any worker
• v's home worker ID = hash(ID) mod N
• Workers ask the home worker of v for its location
• The home worker is notified of the new location as v migrates

Maintaining Vertex Ownership
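A single-process sketch of the lookup service described above: each vertex has a fixed home worker computed as hash(ID) mod N, and that home worker's table is updated whenever the vertex migrates. In Mizan these tables are spread across the actual workers; keeping them all in one process here is purely for illustration.

    // DHT-style vertex ownership lookup (single-process illustration).
    #include <cstdio>
    #include <functional>
    #include <unordered_map>
    #include <vector>

    int main() {
        const int num_workers = 4;
        auto home_of = [&](long vertex_id) {
            return static_cast<int>(std::hash<long>{}(vertex_id) % num_workers);
        };

        // One location table per home worker: vertex ID -> worker currently running it.
        std::vector<std::unordered_map<long, int>> location(num_workers);

        long v = 42;
        location[home_of(v)][v] = home_of(v);   // initially placed on its home worker
        location[home_of(v)][v] = 2;            // after migration, the home worker is notified

        std::printf("vertex %ld: home worker %d, currently on worker %d\n",
                    v, home_of(v), location[home_of(v)][v]);
    }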

Page 30: Armend Hoxha

Maintaining Vertex Ownership

This picture is taken from the paper

Page 31: Armend Hoxha

• Delayed migration is possible for vertices with very large message sizes
  • During the migration phase, only the ownership of vertex v is moved to its new worker
  • Then, at a later time, the actual vertex is passed to the new worker

Vertices with Large Message Size

Page 32: Armend Hoxha

Vertices with Large Message Size

This picture is taken from the paper

Page 33: Armend Hoxha

Vertices with Large Message Size

This picture is taken from the paper

Page 34: Armend Hoxha

Vertices with Large Message Size

This picture is taken from the paper

Page 35: Armend Hoxha

• Mizan is implemented in C++ with MPI, with three variations:
  • Static Mizan: emulates Giraph, disables dynamic migration
  • Work stealing (WS): emulates Pregel's coarse-grained dynamic load balancing
  • Mizan: activates dynamic migration

Experimental Setup

Page 36: Armend Hoxha

• Runs on a local cluster of 21 machines:
  • Mix of i5 and i7 processors with 16 GB RAM each
• IBM Blue Gene/P supercomputer with 1024 compute nodes:
  • Each is a 4-core PowerPC 450 CPU at 850 MHz with 4 GB RAM

Experimental Setup

Page 37: Armend Hoxha

• Experiments on public datasets:
  • Stanford Network Analysis Project (SNAP)
  • The Laboratory for Web Algorithmics (LAW)
  • Kronecker generator

Dataset

This picture is taken from the paper

Page 38: Armend Hoxha

Giraph vs. Static Mizan

Both pictures are taken from the paper

Page 39: Armend Hoxha

• PageRank on a social graph (LiveJournal1)

• The shaded columns: algorithm runtime

• The unshaded columns: initial partitioning cost

Effectiveness of Dynamic Vertex Migration

This picture is taken from the paper

Page 40: Armend Hoxha

• Comparing work stealing (Pregel clone) vs. Mizan using DMST and an advertisement propagation simulation, on a METIS-partitioned social graph (LiveJournal1)

Effectiveness of Dynamic Vertex Migration

This picture is taken from the paper

Page 41: Armend Hoxha

Scalability of Mizan

This picture is taken from the paper

Page 42: Armend Hoxha

Scalability of Mizan

This picture is taken from the paper

Page 43: Armend Hoxha

• Mizan is a Pregel system that uses fine-grained vertex migration to load-balance computation and communication across supersteps
• Mizan is an open-source project developed within the InfoCloud group at KAUST in collaboration with IBM, programmed in C++ with MPI
• Mizan scales up to thousands of workers
• Mizan improves the overall computation cost by between 40% and two orders of magnitude, with less than 10% migration overhead

Conclusion

Page 44: Armend Hoxha

• http://www.cs.cornell.edu/~djwill/pubs/mizan.pdf
• http://kowshik.github.io/JPregel/pregel_paper.pdf
• http://thegraphsblog.files.wordpress.com/2012/11/z-presentation.pdf

References

Page 45: Armend Hoxha

Thank you!