The Pregel Programming Model with Spark GraphX
Agenda
- GraphX introduction
- Pregel programming model
- Code examples
The main focus will be on the programming model
GraphX is a graph processing system built on top of Apache Spark
- property graph representation
- based on RDDs
- user-defined partitioning on RDDs
GraphX / Spark software stack
Pregel Programming Model
https://kowshik.github.io/JPregel/pregel_paper.pdf
- vertex-centric: the computation is defined per vertex
- vertices exchange messages with their neighbours
- execution is organized in supersteps
- each vertex has a status (active / inactive)
GraphX implementation of Pregel
Uses three functions:
- vprog computes the new vertex value
- sendMsg decides which neighbours receive the new value
- mergeMsg merges the incoming values
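The superstep loop and the role of the three functions can be illustrated without Spark. The following is a minimal single-machine sketch, not GraphX's actual implementation: `MiniPregel`, `Triplet`, `MaxValueDemo`, and the three edges of the demo graph are hypothetical names and data introduced only for illustration. The vertex values are those of the max-value example in this talk.

```scala
object MiniPregel {
  type VertexId = Long

  // Hypothetical stand-in for GraphX's EdgeTriplet:
  // an edge together with the current values of both endpoints.
  final case class Triplet[V, E](
      srcId: VertexId, srcAttr: V, dstId: VertexId, dstAttr: V, attr: E)

  def run[V, E, M](
      vertices: Map[VertexId, V],
      edges: Seq[(VertexId, VertexId, E)],
      initialMsg: M,
      maxIterations: Int = Int.MaxValue
  )(
      vprog: (VertexId, V, M) => V,
      sendMsg: Triplet[V, E] => Iterator[(VertexId, M)],
      mergeMsg: (M, M) => M
  ): Map[VertexId, V] = {
    // Superstep 0: every vertex receives the initial message.
    var state = vertices.map { case (id, v) => id -> vprog(id, v, initialMsg) }
    var active: Set[VertexId] = state.keySet
    var iteration = 0
    while (active.nonEmpty && iteration < maxIterations) {
      // Only edges leaving an active vertex may send (similar to EdgeDirection.Out).
      val sent = edges
        .filter { case (src, _, _) => active(src) }
        .flatMap { case (src, dst, attr) =>
          sendMsg(Triplet(src, state(src), dst, state(dst), attr))
        }
      // mergeMsg combines all messages addressed to the same vertex.
      val inbox = sent.groupBy(_._1).map { case (id, msgs) =>
        id -> msgs.map(_._2).reduce(mergeMsg)
      }
      // vprog runs only on vertices that received a message; they become active.
      state = state ++ inbox.map { case (id, msg) => id -> vprog(id, state(id), msg) }
      active = inbox.keySet
      iteration += 1
    }
    state
  }
}

object MaxValueDemo {
  val initial = Map(1L -> 3, 2L -> 6, 3L -> 2, 4L -> 1)
  // Hypothetical edges: 2 -> 1, 2 -> 3, 3 -> 4
  val edges = Seq((2L, 1L, ()), (2L, 3L, ()), (3L, 4L, ()))

  val result: Map[Long, Int] = MiniPregel.run(initial, edges, Int.MinValue)(
    (_, current, incoming) => math.max(current, incoming),
    t => if (t.srcAttr > t.dstAttr) Iterator((t.dstId, t.srcAttr)) else Iterator.empty,
    (a, b) => math.max(a, b)
  )
}
```

The loop stops as soon as a superstep delivers no messages, which is the point where every vertex has become inactive.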
GraphX communication diagram
Max Value implementation

    import org.apache.spark.graphx.{EdgeDirection, EdgeTriplet}

    graph.pregel(
      initialMsg = Int.MinValue,
      maxIterations = Int.MaxValue,
      activeDirection = EdgeDirection.Out
    )(
      // vprog: keep the larger of the current value and the incoming one
      (vertexId: Long, currentVertexAttr: Int, newVertexAttr: Int) =>
        if (newVertexAttr > currentVertexAttr) newVertexAttr else currentVertexAttr,
      // sendMsg: propagate the source value only if it beats the destination value
      (edgeTriplet: EdgeTriplet[Int, Int]) => {
        if (edgeTriplet.srcAttr > edgeTriplet.dstAttr)
          Iterator((edgeTriplet.dstId, edgeTriplet.srcAttr))
        else
          Iterator.empty
      },
      // mergeMsg: of two incoming values, keep the larger
      (attribute1: Int, attribute2: Int) =>
        if (attribute1 > attribute2) attribute1 else attribute2
    )

Max Value implementation
Results:
Graph initial state
Node [1]: 3
Node [2]: 6
Node [3]: 2
Node [4]: 1
Graph final state
Node [1]: 6
Node [2]: 6
Node [3]: 6
Node [4]: 6
Max value of the graph is 6.
Dijkstra's algorithm
Unvisited nodes as the step-by-step walkthrough progresses:
- Baltimore, Detroit, Chicago, NewYork, Philadelphia
- Detroit, Chicago, NewYork, Philadelphia
- Chicago, NewYork, Philadelphia
- Chicago, Philadelphia
- Chicago
- (none)
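The walkthrough above follows the classic sequential form of Dijkstra's algorithm: repeatedly visit the unvisited node with the smallest known distance and relax its outgoing edges. Below is a minimal plain-Scala sketch of that procedure. `SequentialDijkstra` is a hypothetical name, and the edge weights are an assumption: they are back-derived from the differences between the distances printed in the talk's results (for example 62 - 27 = 35 for Baltimore to Detroit), so the talk's real graph may well contain more edges.

```scala
import scala.collection.mutable

object SequentialDijkstra {
  // Assumed directed edges, back-derived from the distances in the talk's results.
  val edges: Map[String, List[(String, Double)]] = Map(
    "Washington" -> List("Baltimore" -> 27.0),
    "Baltimore"  -> List("Detroit" -> 35.0),
    "Detroit"    -> List("NewYork" -> 14.0),
    "NewYork"    -> List("Philadelphia" -> 15.0, "Chicago" -> 29.0)
  )

  /** Returns the shortest distance to every reachable node
    * and the order in which the nodes were visited. */
  def shortestDistances(source: String): (Map[String, Double], List[String]) = {
    val dist = mutable.Map(source -> 0.0)
    val visited = mutable.LinkedHashSet.empty[String]
    // PriorityQueue dequeues the maximum, so order entries by descending distance
    // to get the closest node first.
    val queue = mutable.PriorityQueue((0.0, source))(
      Ordering.fromLessThan[(Double, String)](_._1 > _._1))

    while (queue.nonEmpty) {
      val (d, node) = queue.dequeue()
      if (!visited(node)) {
        visited += node
        // Relax every edge leaving the node that was just visited.
        for ((next, weight) <- edges.getOrElse(node, Nil)) {
          val candidate = d + weight
          if (candidate < dist.getOrElse(next, Double.PositiveInfinity)) {
            dist(next) = candidate
            queue.enqueue((candidate, next))
          }
        }
      }
    }
    (dist.toMap, visited.toList)
  }
}
```

Calling `SequentialDijkstra.shortestDistances("Washington")` on these assumed edges visits the cities in the same order in which they leave the unvisited list above.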
Dijkstra's algorithm implementation
Type definitions:

    type VertexId = scala.Long

    case class City(name: String, id: VertexId)

    case class VertexAttribute(cityName: String, distance: Double, path: List[City])

Dijkstra's algorithm implementation

    val shortestPathGraph = initialGraph.pregel(
      initialMsg = VertexAttribute("", Double.PositiveInfinity, List[City]()),
      maxIterations = Int.MaxValue,
      activeDirection = EdgeDirection.Out
    )(vprog, sendMsg, mergeMsg)

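The slides do not show how `initialGraph` is built. The sketch below is one plausible construction, under stated assumptions: the source vertex starts at distance 0 with a path containing only itself, every other vertex starts at infinite distance with an empty path, and the edge weights are back-derived from the distances printed in the talk's results. `DijkstraGraph`, `cities`, and `build` are hypothetical names, and the code needs Spark with GraphX on the classpath.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.graphx.{Edge, Graph, VertexId}

object DijkstraGraph {
  case class City(name: String, id: VertexId)
  case class VertexAttribute(cityName: String, distance: Double, path: List[City])

  // City ids follow the ids shown in the talk's printed results.
  val cities: Seq[City] = Seq(
    City("Washington", 1L), City("Baltimore", 2L), City("Detroit", 3L),
    City("Chicago", 4L), City("NewYork", 5L), City("Philadelphia", 6L)
  )

  def build(sc: SparkContext, source: City): Graph[VertexAttribute, Double] = {
    val vertices = sc.parallelize(cities.map { city =>
      val attr =
        if (city == source) VertexAttribute(city.name, 0.0, List(city))
        else VertexAttribute(city.name, Double.PositiveInfinity, List.empty[City])
      (city.id, attr)
    })
    // Assumed edge weights (distance between two cities), not taken from the slides.
    val edges = sc.parallelize(Seq(
      Edge(1L, 2L, 27.0), // Washington -> Baltimore
      Edge(2L, 3L, 35.0), // Baltimore  -> Detroit
      Edge(3L, 5L, 14.0), // Detroit    -> NewYork
      Edge(5L, 6L, 15.0), // NewYork    -> Philadelphia
      Edge(5L, 4L, 29.0)  // NewYork    -> Chicago
    ))
    Graph(vertices, edges)
  }
}
```

Starting every non-source vertex at `Double.PositiveInfinity` matches the initial message used in the `pregel` call, so the first superstep leaves all vertex values unchanged.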
Dijkstra's algorithm implementation

    // vprog: keep whichever attribute carries the shorter distance
    val vprog = (
      vertexId: VertexId,
      currentVertexAttr: VertexAttribute,
      newVertexAttr: VertexAttribute
    ) =>
      if (currentVertexAttr.distance <= newVertexAttr.distance) currentVertexAttr
      else newVertexAttr

    // mergeMsg: of two incoming candidates, keep the shorter one
    val mergeMsg = (attribute1: VertexAttribute, attribute2: VertexAttribute) =>
      if (attribute1.distance < attribute2.distance) attribute1
      else attribute2

Dijkstra's algorithm implementation

    // sendMsg: notify the destination only if going through the source is shorter
    val sendMsg = (edgeTriplet: EdgeTriplet[VertexAttribute, Double]) => {
      if (edgeTriplet.srcAttr.distance < (edgeTriplet.dstAttr.distance - edgeTriplet.attr)) {
        Iterator(
          (
            edgeTriplet.dstId,
            new VertexAttribute(
              edgeTriplet.dstAttr.cityName,
              edgeTriplet.srcAttr.distance + edgeTriplet.attr,
              edgeTriplet.srcAttr.path :+ new City(edgeTriplet.dstAttr.cityName, edgeTriplet.dstId)
            )
          )
        )
      } else Iterator.empty
    }

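To see the relaxation condition of `sendMsg` in isolation, the following plain-Scala sketch applies the same test to a single edge. `RelaxationSketch` and `Triplet` are hypothetical stand-ins (the latter for the fields of GraphX's `EdgeTriplet` that `sendMsg` reads), and the weight of 27.0 between Washington and Baltimore is taken from the distance shown in the talk's results.

```scala
object RelaxationSketch {
  type VertexId = Long
  case class City(name: String, id: VertexId)
  case class VertexAttribute(cityName: String, distance: Double, path: List[City])

  // Hypothetical stand-in for the EdgeTriplet fields used by sendMsg.
  case class Triplet(
      srcAttr: VertexAttribute, dstId: VertexId, dstAttr: VertexAttribute, attr: Double)

  // Same condition as on the slide: send only if the route through src is shorter.
  def sendMsg(t: Triplet): Iterator[(VertexId, VertexAttribute)] =
    if (t.srcAttr.distance < (t.dstAttr.distance - t.attr))
      Iterator((t.dstId, VertexAttribute(
        t.dstAttr.cityName,
        t.srcAttr.distance + t.attr,
        t.srcAttr.path :+ City(t.dstAttr.cityName, t.dstId))))
    else Iterator.empty

  val washington = VertexAttribute("Washington", 0.0, List(City("Washington", 1L)))
  val baltimore  = VertexAttribute("Baltimore", Double.PositiveInfinity, Nil)

  // Baltimore is still at infinity, so 0 < infinity - 27 holds and a message is sent.
  val improved = sendMsg(Triplet(washington, 2L, baltimore, 27.0)).toList

  // In the opposite direction 27 < 0 - 27 is false, so nothing is sent.
  val notImproved = sendMsg(Triplet(baltimore.copy(distance = 27.0), 1L, washington, 27.0)).toList
}
```

Because a message is sent only when it strictly improves the destination, the computation stops by itself once no edge can shorten any distance.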
Dijkstra's algorithm implementation
Results:
Going from Washington to Chicago has a distance of 105.0 km. Path is: Washington [1] => Baltimore [2] => Detroit [3] => NewYork [5] => Chicago [4]
Going from Washington to Washington has a distance of 0.0 km. Path is: Washington [1]
Going from Washington to Philadelphia has a distance of 91.0 km. Path is: Washington [1] => Baltimore [2] => Detroit [3] => NewYork [5] => Philadelphia [6]
Going from Washington to Detroit has a distance of 62.0 km. Path is: Washington [1] => Baltimore [2] => Detroit [3]
Going from Washington to NewYork has a distance of 76.0 km. Path is: Washington [1] => Baltimore [2] => Detroit [3] => NewYork [5]
Going from Washington to Baltimore has a distance of 27.0 km. Path is: Washington [1] => Baltimore [2]
Questions & Answers
Thanks!
The code is available at https://github.com/andreaiacono/TalkGraphX