Processing large-scale graphs with Google™ Pregel
Frank Celler
@fceller
November 22, 2014
www.arangodb.com
About
about us
Frank Celler (@fceller), working on the ArangoDB core
Michael Hackstein (@mchacki), who started an experimental implementation of Pregel
about the talk
different kinds of graph algorithms
Pregel example
Pregel mindset, aka the framework
more examples
Pregel at ArangoDB
Started as a side project in free hack time
Experimental feature of an operational database
Implemented as an alternative to traversals
Make use of the flexibility of JavaScript:
No strict type system
No pre-compilation, on-the-fly queries
Native JSON documents
Really fast development
Graph Algorithms
Pattern matching
Search through the entire graph
Identify similar components
⇒ Touch all vertices and their neighbourhoods
Traversals
Define a specific start point
Iteratively explore the graph
⇒ History of steps is known
Global measurements
Compute one value for the graph, based on all its vertices or edges
Compute one value for each vertex or edge
⇒ Often require a global view on the graph
Pregel
A framework to query distributed, directed graphs.
Known as “Map-Reduce” for graphs
Uses the same phases
Has several iterations
Aims to:
Operate all servers at full capacity
Reduce network traffic
Good at calculations touching all vertices
Bad at calculations touching a very small number of vertices
Example – Connected Components
[Figure: animation over several supersteps on a seven-vertex graph (vertices 1–7). Legend: active vertex, inactive vertex, forward message, backward message. Each vertex starts with its own id as its value; the values converge to 1 for vertices 1–4 and to 5 for vertices 5–7.]
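The edge list of the animation cannot be recovered from the slides, so the following is only a sketch under stated assumptions: a hypothetical in-memory runner (not ArangoDB's Pregel API), a hypothetical edge list that yields the same two components {1, 2, 3, 4} and {5, 6, 7}, and the reading that "backward" messages travel against the edge direction, so components are computed on the undirected version of the graph. Each vertex keeps the smallest id it has heard of and only sends messages in a round in which its value shrank; a MIN combiner keeps one message per target.

```javascript
// Sketch of connected components by minimum-label propagation in a
// Pregel-style loop. Hypothetical in-memory runner, not ArangoDB's API.
function connectedComponents(vertexIds, edges) {
  // Adjacency in both directions: "forward" along an edge, "backward" against it.
  const out = new Map(vertexIds.map((id) => [id, []]));
  const inc = new Map(vertexIds.map((id) => [id, []]));
  for (const [from, to] of edges) {
    out.get(from).push(to);
    inc.get(to).push(from);
  }
  // MIN combiner: at most one (the smallest) message per target vertex.
  const send = (box, target, label) =>
    box.set(target, box.has(target) ? Math.min(box.get(target), label) : label);
  const broadcast = (box, id, label) => {
    for (const t of out.get(id)) send(box, t, label); // forward messages
    for (const s of inc.get(id)) send(box, s, label); // backward messages
  };

  // Superstep 0: every vertex is active, takes its own id as its label
  // and sends it to all neighbours.
  const label = new Map(vertexIds.map((id) => [id, id]));
  let inbox = new Map();
  for (const id of vertexIds) broadcast(inbox, id, id);
  let supersteps = 1;

  // Later supersteps: only vertices that receive a message are woken up;
  // a vertex sends again only if it learned a smaller label.
  while (inbox.size > 0) {
    const next = new Map();
    for (const [id, msg] of inbox) {
      if (msg < label.get(id)) {
        label.set(id, msg);
        broadcast(next, id, msg);
      }
    }
    inbox = next;
    supersteps++;
  }
  return { label, supersteps };
}

// Hypothetical edge list with the same two components as the slides.
const result = connectedComponents(
  [1, 2, 3, 4, 5, 6, 7],
  [[1, 2], [3, 2], [4, 3], [5, 6], [7, 6]],
);
console.log([...result.label.values()].join(",")); // 1,1,1,1,5,5,5
```

The loop stops on its own: labels only ever decrease and are bounded below by the smallest id, so eventually no vertex changes and no message is sent.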
Worker ≙ Map
“Map” a user-defined algorithm over all vertices
Output: a set of messages to other vertices
Available parameters:
The current vertex and its outgoing edges
All incoming messages
Global values
Allowed modifications of the vertex:
Attach a result to this vertex and its outgoing edges
Delete the vertex and its outgoing edges
Deactivate the vertex
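One worker pass can be sketched as a loop that hands each vertex, its incoming messages and the global values to a user-defined function and collects the messages it emits. This is a hypothetical illustration: `runWorker`, `setResult`, `deactivate`, `remove` and `sendTo` are invented names, not the API of ArangoDB's experimental Pregel module. The assumption that a message re-activates a deactivated vertex follows Google's Pregel.

```javascript
// Hypothetical sketch of one worker pass (the "map" phase). Names such as
// runWorker, setResult, deactivate, remove and sendTo are illustrative.
function runWorker(graph, inbox, global, compute) {
  const outbox = []; // messages produced in this pass: { to, data }
  for (const [id, vertex] of graph) {
    const messages = inbox.get(id) || [];
    // As in Google's Pregel, a deactivated vertex is skipped unless a
    // message arrives for it, which re-activates it.
    if (!vertex.active && messages.length === 0) continue;
    vertex.active = true;
    const api = {
      id,
      outEdges: vertex.outEdges, // ids of the targets of outgoing edges
      setResult: (result) => { vertex.result = result; },
      deactivate: () => { vertex.active = false; },
      remove: () => { graph.delete(id); }, // drops the vertex and its outgoing edges
      sendTo: (to, data) => { outbox.push({ to, data }); },
    };
    compute(api, messages, global);
  }
  return outbox;
}

// Usage: every processed vertex records how many messages it received,
// sends its id along each outgoing edge and then deactivates itself.
const g = new Map([
  ["a", { active: true, outEdges: ["b", "c"] }],
  ["b", { active: true, outEdges: [] }],
  ["c", { active: false, outEdges: [] }],
]);
const out = runWorker(g, new Map([["b", [5]]]), {}, (v, msgs) => {
  v.setResult(msgs.length);
  for (const t of v.outEdges) v.sendTo(t, v.id);
  v.deactivate();
});
console.log(out.length); // 2
```

Vertex `c` is inactive and receives nothing in this pass, so it is skipped; it would be woken up in the next pass by the message from `a`.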
Combine ≙ Reduce
“Reduce” all generated messages
Output: An aggregated message for each vertex.
Executed on the sender as well as on the receiver.
Available parameters:
One new message for a vertex
The stored aggregate for this vertex
Typical combiners are SUM, MIN or MAX
Reduces network traffic
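A combiner only ever sees one new message and the aggregate already stored for the target vertex, which is why the same function can run on the sender and again on the receiver. That split changes the grouping of the operation, so it is only safe for commutative and associative operations such as SUM, MIN and MAX. The sketch below assumes messages shaped as `{ to, data }`; `combineMessages` and `combiners` are hypothetical names.

```javascript
// Hypothetical combiner sketch: fold each new message into the stored
// aggregate of its target vertex. Names are illustrative, not an ArangoDB API.
const combiners = {
  SUM: (stored, message) => stored + message,
  MIN: (stored, message) => Math.min(stored, message),
  MAX: (stored, message) => Math.max(stored, message),
};

// messages: array of { to, data }; returns Map of target id -> aggregate.
function combineMessages(messages, combine) {
  const aggregate = new Map();
  for (const { to, data } of messages) {
    aggregate.set(to, aggregate.has(to) ? combine(aggregate.get(to), data) : data);
  }
  return aggregate;
}

// Sender side: each sender pre-aggregates before anything is shipped.
const fromA = combineMessages(
  [{ to: "v1", data: 3 }, { to: "v1", data: 2 }, { to: "v2", data: 7 }],
  combiners.MIN,
);
const fromB = combineMessages([{ to: "v1", data: 4 }], combiners.MIN);

// Receiver side: the partial aggregates are combined once more.
const arrived = [...fromA, ...fromB].map(([to, data]) => ({ to, data }));
const atReceiver = combineMessages(arrived, combiners.MIN);
console.log(atReceiver); // Map(2) { 'v1' => 2, 'v2' => 7 }
```

Sender A ships two messages instead of three, which is the network saving the slide refers to.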
Activity ≙ Termination
Execute several rounds of Map/Reduce
Count active vertices and messages
Start next round if one of the following is true:
At least one vertex is active
At least one message is sent
Terminate if no vertex is active and no messages were sent
Store all non-deleted vertices and edges as the resulting graph
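The round structure and the termination rule can be sketched as a driver loop that counts active vertices and sent messages after every round. This is a hypothetical in-memory illustration, not the ArangoDB implementation; `runPregel` and the vertex shape `{ id, active, ... }` are assumptions, and the `maxRounds` cap is only a safety net that is not part of the slide's rule.

```javascript
// Hypothetical driver loop sketching the termination rule. graph: Map of
// vertex id -> vertex object with an `active` flag;
// compute(vertex, messages, round, send) is the user-defined "map" step.
function runPregel(graph, compute, maxRounds = 1000) {
  let inbox = new Map(); // vertex id -> messages delivered this round
  for (let round = 0; round < maxRounds; round++) {
    const next = new Map(); // messages sent during this round
    const send = (to, data) => {
      if (!next.has(to)) next.set(to, []);
      next.get(to).push(data);
    };
    for (const [id, vertex] of graph) {
      const messages = inbox.get(id) || [];
      if (!vertex.active && messages.length === 0) continue;
      vertex.active = true; // an incoming message re-activates a vertex
      compute(vertex, messages, round, send);
    }
    // Count active vertices and sent messages after the round.
    let activeCount = 0;
    for (const vertex of graph.values()) if (vertex.active) activeCount++;
    let messageCount = 0;
    for (const list of next.values()) messageCount += list.length;
    // Terminate only if no vertex is active and no message was sent.
    if (activeCount === 0 && messageCount === 0) return round + 1;
    inbox = next;
  }
  throw new Error("no termination within maxRounds"); // safety cap only
}

// Usage: a token is passed along the chain a -> b -> c; every vertex votes
// to halt after each call and is woken up only by the token.
const chain = new Map([
  ["a", { id: "a", active: true, next: "b" }],
  ["b", { id: "b", active: true, next: "c" }],
  ["c", { id: "c", active: true, next: null }],
]);
const rounds = runPregel(chain, (vertex, messages, round, send) => {
  const start = round === 0 && vertex.id === "a";
  if (start || messages.length > 0) {
    vertex.seen = true;
    if (vertex.next) send(vertex.next, "token");
  }
  vertex.active = false; // vote to halt
});
console.log(rounds); // 3
```

In the first two rounds every vertex has voted to halt, yet the loop continues because a message is in flight; only the third round ends with neither active vertices nor messages.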
Pregel Questions
connected components
page rank
bipartite matching
semi-clustering
minimum spanning forest
graph coloring
shortest paths
Pagerank for Giraph
public class SimplePageRankComputation extends
    BasicComputation<LongWritable, DoubleWritable, FloatWritable, DoubleWritable> {
  public static final int MAX_SUPERSTEPS = 30;

  @Override
  public void compute(Vertex<LongWritable, DoubleWritable, FloatWritable> vertex,
      Iterable<DoubleWritable> messages) throws IOException {
    if (getSuperstep() >= 1) {
      double sum = 0;
      for (DoubleWritable message : messages) {
        sum += message.get();
      }
      DoubleWritable vertexValue =
          new DoubleWritable((0.15f / getTotalNumVertices()) + 0.85f * sum);
      vertex.setValue(vertexValue);
    }
    if (getSuperstep() < MAX_SUPERSTEPS) {
      long edges = vertex.getNumEdges();
      sendMessageToAllEdges(vertex,
          new DoubleWritable(vertex.getValue().get() / edges));
    } else {
      vertex.voteToHalt();
    }
  }

  public static class SimplePageRankWorkerContext extends WorkerContext {
    @Override
    public void preApplication()
        throws InstantiationException, IllegalAccessException { }
    @Override
    public void postApplication() { }
    @Override
    public void preSuperstep() { }
    @Override
    public void postSuperstep() { }
  }

  public static class SimplePageRankMasterCompute extends DefaultMasterCompute {
    @Override
    public void initialize()
        throws InstantiationException, IllegalAccessException {
    }
  }

  public static class SimplePageRankVertexReader extends
      GeneratedVertexReader<LongWritable, DoubleWritable, FloatWritable> {
    @Override
    public boolean nextVertex() {
      return totalRecords > recordsRead;
    }

    @Override
    public Vertex<LongWritable, DoubleWritable, FloatWritable>
        getCurrentVertex() throws IOException {
      Vertex<LongWritable, DoubleWritable, FloatWritable> vertex =
          getConf().createVertex();
      LongWritable vertexId = new LongWritable(
          (inputSplit.getSplitIndex() * totalRecords) + recordsRead);
      DoubleWritable vertexValue = new DoubleWritable(vertexId.get() * 10d);
      long targetVertexId = (vertexId.get() + 1)
          % (inputSplit.getNumSplits() * totalRecords);
      float edgeValue = vertexId.get() * 100f;
      List<Edge<LongWritable, FloatWritable>> edges = Lists.newLinkedList();
      edges.add(EdgeFactory.create(new LongWritable(targetVertexId),
          new FloatWritable(edgeValue)));
      vertex.initialize(vertexId, vertexValue, edges);
      ++recordsRead;
      return vertex;
    }
  }

  public static class SimplePageRankVertexInputFormat extends
      GeneratedVertexInputFormat<LongWritable, DoubleWritable, FloatWritable> {
    @Override
    public VertexReader<LongWritable, DoubleWritable, FloatWritable>
        createVertexReader(InputSplit split, TaskAttemptContext context)
        throws IOException {
      return new SimplePageRankVertexReader();
    }
  }

  public static class SimplePageRankVertexOutputFormat extends
      TextVertexOutputFormat<LongWritable, DoubleWritable, FloatWritable> {
    @Override
    public TextVertexWriter createVertexWriter(TaskAttemptContext context)
        throws IOException, InterruptedException {
      return new SimplePageRankVertexWriter();
    }

    public class SimplePageRankVertexWriter extends TextVertexWriter {
      @Override
      public void writeVertex(
          Vertex<LongWritable, DoubleWritable, FloatWritable> vertex)
          throws IOException, InterruptedException {
        getRecordWriter().write(new Text(vertex.getId().toString()),
            new Text(vertex.getValue().toString()));
      }
    }
  }
}
Pagerank for TinkerPop3
public class PageRankVertexProgram implements VertexProgram<Double> {
  private MessageType.Local messageType =
      MessageType.Local.of(() -> GraphTraversal.<Vertex>of().outE());
  public static final String PAGE_RANK = Graph.Key.hide("gremlin.pageRank");
  public static final String EDGE_COUNT = Graph.Key.hide("gremlin.edgeCount");
  private static final String VERTEX_COUNT = "gremlin.pageRankVertexProgram.vertexCount";
  private static final String ALPHA = "gremlin.pageRankVertexProgram.alpha";
  private static final String TOTAL_ITERATIONS = "gremlin.pageRankVertexProgram.totalIterations";
  private static final String INCIDENT_TRAVERSAL = "gremlin.pageRankVertexProgram.incidentTraversal";
  private double vertexCountAsDouble = 1;
  private double alpha = 0.85d;
  private int totalIterations = 30;
  private static final Set<String> COMPUTE_KEYS =
      new HashSet<>(Arrays.asList(PAGE_RANK, EDGE_COUNT));

  private PageRankVertexProgram() {}

  @Override
  public void loadState(final Configuration configuration) {
    this.vertexCountAsDouble = configuration.getDouble(VERTEX_COUNT, 1.0d);
    this.alpha = configuration.getDouble(ALPHA, 0.85d);
    this.totalIterations = configuration.getInt(TOTAL_ITERATIONS, 30);
    try {
      if (configuration.containsKey(INCIDENT_TRAVERSAL)) {
        final SSupplier<Traversal> traversalSupplier =
            VertexProgramHelper.deserialize(configuration, INCIDENT_TRAVERSAL);
        VertexProgramHelper.verifyReversibility(traversalSupplier.get());
        this.messageType = MessageType.Local.of((SSupplier) traversalSupplier);
      }
    } catch (final Exception e) {
      throw new IllegalStateException(e.getMessage(), e);
    }
  }

  @Override
  public void storeState(final Configuration configuration) {
    configuration.setProperty(GraphComputer.VERTEX_PROGRAM,
        PageRankVertexProgram.class.getName());
    configuration.setProperty(VERTEX_COUNT, this.vertexCountAsDouble);
    configuration.setProperty(ALPHA, this.alpha);
    configuration.setProperty(TOTAL_ITERATIONS, this.totalIterations);
    try {
      VertexProgramHelper.serialize(this.messageType.getIncidentTraversal(),
          configuration, INCIDENT_TRAVERSAL);
    } catch (final Exception e) {
      throw new IllegalStateException(e.getMessage(), e);
    }
  }

  @Override
  public Set<String> getElementComputeKeys() {
    return COMPUTE_KEYS;
  }

  @Override
  public void setup(final Memory memory) {

  }

  @Override
  public void execute(final Vertex vertex, Messenger<Double> messenger,
      final Memory memory) {
    if (memory.isInitialIteration()) {
      double initialPageRank = 1.0d / this.vertexCountAsDouble;
      double edgeCount = Double.valueOf(
          (Long) this.messageType.edges(vertex).count().next());
      vertex.singleProperty(PAGE_RANK, initialPageRank);
      vertex.singleProperty(EDGE_COUNT, edgeCount);
      messenger.sendMessage(this.messageType, initialPageRank / edgeCount);
    } else {
      double newPageRank = StreamFactory.stream(
          messenger.receiveMessages(this.messageType)).reduce(0.0d, (a, b) -> a + b);
      newPageRank = (this.alpha * newPageRank)
          + ((1.0d - this.alpha) / this.vertexCountAsDouble);
      vertex.singleProperty(PAGE_RANK, newPageRank);
      messenger.sendMessage(this.messageType,
          newPageRank / vertex.<Double>property(EDGE_COUNT).orElse(0.0d));
    }
  }

  @Override
  public boolean terminate(final Memory memory) {
    return memory.getIteration() >= this.totalIterations;
  }
}
Pagerank for ArangoDB
var pageRank = function (vertex, message, global) {
  var total = global.vertexCount;
  var edgeCount = vertex._outEdges.length;
  var alpha = global.alpha;
  var sum = 0, rank = 0;
  if (global.step > 0) {
    while (message.hasNext()) {
      sum += message.next().data;
    }
    rank = alpha * sum + (1 - alpha) / total;
  } else {
    rank = 1 / total;
  }
  vertex._setResult(rank);
  if (global.step < global.MAX_STEPS) {
    var send = rank / edgeCount;
    while (vertex._outEdges.hasNext()) {
      message.sendTo(vertex._outEdges.next().edge._getTarget(), send);
    }
  } else {
    vertex._deactivate();
  }
};
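To try the same vertex program without a database, its logic can be replayed in a small in-memory loop: step 0 assigns every vertex 1/total, later steps compute alpha · sum + (1 − alpha)/total, and until MAX_STEPS each vertex sends rank/edgeCount along its outgoing edges. This is a sketch under assumptions: `pageRank` and the plain-object graph format are hypothetical and do not use ArangoDB's experimental Pregel module.

```javascript
// In-memory replay of the PageRank vertex program above (hypothetical runner).
// outEdges: object mapping vertex id -> array of out-neighbour ids.
function pageRank(outEdges, alpha = 0.85, maxSteps = 30) {
  const ids = Object.keys(outEdges);
  const total = ids.length;
  const rank = {};
  let inbox = {}; // vertex id -> rank shares received in the previous step
  for (let step = 0; step <= maxSteps; step++) {
    const next = {};
    for (const id of ids) {
      let r;
      if (step > 0) {
        const sum = (inbox[id] || []).reduce((a, b) => a + b, 0);
        r = alpha * sum + (1 - alpha) / total;
      } else {
        r = 1 / total; // uniform start
      }
      rank[id] = r;
      // Until the last step, split the rank evenly over the outgoing edges;
      // at MAX_STEPS the vertex sends nothing (it deactivates instead).
      if (step < maxSteps) {
        const targets = outEdges[id];
        for (const t of targets) {
          (next[t] = next[t] || []).push(r / targets.length);
        }
      }
    }
    inbox = next;
  }
  return rank;
}

// Usage: a -> b, b -> a, b -> c, c -> a.
const ranks = pageRank({ a: ["b"], b: ["a", "c"], c: ["a"] });
console.log(ranks);
```

As in the slide's program, a vertex without outgoing edges sends nothing, so its rank mass is lost; the ranks sum to 1 only when every vertex has at least one outgoing edge.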