Parallel All -Points Shortest Paths - Purdue Engineeringeigenman/ECE563/Project... ·...
Parallel All-Points Shortest Paths
ECE 563 - Spring 2013
Jason Holmes, Bharadwaj Krishnamurthy, Hector Rodriguez-Simmonds
Outline
• Overview
• Sequential Code Development
• Parallel Dijkstra
• Parallel Floyd-Warshall
• Results
Overview
• Tackled the All-Points Shortest Paths problem
• Constructed graphs from real data (social networks, road networks, etc.)
• Wrote modification of Dijkstra’s Algorithm
  • Better for sparse graphs
• Wrote Floyd-Warshall Dynamic Programming Algorithm
  • Less structural overhead
  • Can handle negative edge weights
• Developed parallel versions using OpenMP
  • Parallel Dijkstra: 7.6x speedup on 8 cores
  • Parallel Floyd-Warshall: ~6x speedup on 8 cores
Sequential Code - graphCreate
#Input File
<1, 2>
<1, 4>
<2, 5>
<3, 5>
<3, 6>
<4, 2>
<5, 4>
<6, 6>
[Diagram: Input Data -> buildGraph(Dijkstra) -> Adj. List; Input Data -> buildGraph(FW) -> Adj. Matrix]
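As a rough illustration of the two representations, the sketch below builds both an adjacency list and an adjacency matrix from the edge pairs in the sample input. The names `Graph`, `graph_add_edge`, `graph_out_degree`, `graph_from_sample` and the fixed `MAX_V` are assumptions made for this example; they are not the project's buildGraph routines. Edges are treated as directed and unweighted.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_V 7   /* vertices 1..6 from the sample input; index 0 is unused */

typedef struct Node {          /* one entry in a vertex's adjacency list */
    int to;
    struct Node *next;
} Node;

typedef struct {
    Node *list[MAX_V];           /* adjacency list (the form used for Dijkstra) */
    int matrix[MAX_V][MAX_V];    /* adjacency matrix (the form used for FW): 1 = edge */
} Graph;

/* Allocate a graph with no edges. */
Graph *graph_new(void) {
    Graph *g = malloc(sizeof *g);
    assert(g != NULL);
    for (int i = 0; i < MAX_V; i++) {
        g->list[i] = NULL;
        for (int j = 0; j < MAX_V; j++) g->matrix[i][j] = 0;
    }
    return g;
}

/* Record the directed edge from -> to in both representations. */
void graph_add_edge(Graph *g, int from, int to) {
    assert(from > 0 && from < MAX_V && to > 0 && to < MAX_V);
    Node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->to = to;
    n->next = g->list[from];
    g->list[from] = n;
    g->matrix[from][to] = 1;
}

/* Count outgoing edges of v by walking its adjacency list. */
int graph_out_degree(const Graph *g, int v) {
    int count = 0;
    for (const Node *n = g->list[v]; n != NULL; n = n->next) count++;
    return count;
}

/* Build the graph from the eight edge pairs shown in the sample input. */
Graph *graph_from_sample(void) {
    static const int edges[8][2] = {
        {1, 2}, {1, 4}, {2, 5}, {3, 5}, {3, 6}, {4, 2}, {5, 4}, {6, 6}
    };
    Graph *g = graph_new();
    for (int e = 0; e < 8; e++) graph_add_edge(g, edges[e][0], edges[e][1]);
    return g;
}
```

The list costs memory proportional to the number of edges, while the matrix costs memory proportional to the square of the number of vertices, which is one reason the list suits sparse graphs.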
Sequential Code - Dijkstra
graph = (vertex **) buildGraphFromFile(argv[1], LIST, &numberOfVertices);
for (source = 0; source < numberOfVertices; source++) {
    for (target = 0; target < numberOfVertices; target++) {
        vertex * VSource = returnVertex(graph, source);
        vertex * VTarget = returnVertex(graph, target);
        VSource->distance = 0;
        int dist = Dijkstra3(graph, VSource, VTarget, VSource->number);
        initGraph(graph, numberOfVertices);
    }
}
Run Dijkstra’s single source algorithm V times
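The idea of repeating a single-source search from every vertex can be sketched as follows. This is a simplified illustration, not the project's `Dijkstra3`: it assumes a small weight matrix in which 0 means "no edge" and all weights are positive, and it uses a plain array scan instead of an adjacency list. The names `dijkstra_row` and `all_pairs_dijkstra` are hypothetical.

```c
#include <assert.h>
#include <limits.h>

#define N 5
#define INF INT_MAX   /* marks "unreachable" */

/* Single-source Dijkstra on weight matrix w (0 = no edge).
   Fills dist[] with the shortest distances from src. */
void dijkstra_row(int w[N][N], int src, int dist[N]) {
    int done[N] = {0};
    assert(src >= 0 && src < N);
    for (int v = 0; v < N; v++) dist[v] = INF;
    dist[src] = 0;
    for (int step = 0; step < N; step++) {
        /* pick the closest vertex that is reachable and not yet finalized */
        int u = -1;
        for (int v = 0; v < N; v++)
            if (!done[v] && dist[v] != INF && (u < 0 || dist[v] < dist[u]))
                u = v;
        if (u < 0) break;          /* everything left is unreachable */
        done[u] = 1;
        /* relax the outgoing edges of u */
        for (int v = 0; v < N; v++)
            if (w[u][v] > 0 && dist[u] + w[u][v] < dist[v])
                dist[v] = dist[u] + w[u][v];
    }
}

/* All-pairs result: one single-source run per vertex, one row per source. */
void all_pairs_dijkstra(int w[N][N], int dist[N][N]) {
    for (int s = 0; s < N; s++) dijkstra_row(w, s, dist[s]);
}
```

Because each source fills its own row and the runs share no intermediate state, the outer loop over sources is a natural candidate for parallelization.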
Sequential Code - FW
• Dynamic programming problem
• Find the shortest path from i to j using only intermediate nodes 1 to k-1
• Once k reaches total number of nodes, we have the shortest path from i to j
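The recurrence behind these bullets can be written out explicitly. The notation is an assumption introduced here: $w_{ij}$ is the weight of edge (i, j), or infinity if there is no edge, and $d^{(k)}_{ij}$ is the shortest distance from i to j using only intermediate nodes 1 to k, so the "nodes 1 to k-1" state in the bullet corresponds to $d^{(k-1)}_{ij}$.

```latex
d^{(0)}_{ij} = w_{ij}, \qquad
d^{(k)}_{ij} = \min\left( d^{(k-1)}_{ij},\; d^{(k-1)}_{ik} + d^{(k-1)}_{kj} \right),
\quad k = 1, \dots, n
```

After the step k = n, $d^{(n)}_{ij}$ is the shortest distance from i to j, provided the graph has no negative cycles.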
Sequential Code - FW
edge ** FW_direct(edge ** matrix, int v_count) {
    int i, j, k;
    edge ** max_node;
    max_node = malloc(v_count * sizeof(edge *));
    for (i = 0; i < v_count; i++) {….}
    /* k starts at 0 so that vertex 0 is also tried as an intermediate node */
    for (k = 0; k < v_count; k++) {
        for (j = 0; j < v_count; j++) {
            for (i = 0; i < v_count; i++) {
                if (matrix[i][j] > matrix[i][k] + matrix[k][j]) {
                    matrix[i][j] = matrix[i][k] + matrix[k][j];
                    max_node[i][j] = k;
                }
            }
        }
    }
    return (max_node);
}
The k loop cannot be parallelized: iteration k depends on the results of iteration k-1.
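A minimal self-contained version of the same triple loop is sketched below. It is an illustration under stated assumptions, not the project's code: weights are plain `int`, `INF` is a sentinel for "no edge", and `via` plays the role of `max_node`. The k loop stays serial, while the i loop carries an OpenMP pragma that a compiler without OpenMP support simply ignores. Parallelizing over i for a fixed k is safe here only because the graph is assumed to have no negative cycles, so row k and column k do not change during step k.

```c
#include <assert.h>

#define NV 4
#define INF 1000000   /* "no edge"; small enough that INF + INF fits in an int */

/* In-place Floyd-Warshall on d; via[i][j] records the last intermediate
   node that improved the i -> j distance, or -1 if the direct edge is best. */
void floyd_warshall(int d[NV][NV], int via[NV][NV]) {
    for (int i = 0; i < NV; i++)
        for (int j = 0; j < NV; j++)
            via[i][j] = -1;

    for (int k = 0; k < NV; k++) {          /* serial: step k needs step k-1 */
        #pragma omp parallel for
        for (int i = 0; i < NV; i++) {      /* rows are independent for fixed k */
            for (int j = 0; j < NV; j++) {
                if (d[i][k] + d[k][j] < d[i][j]) {
                    d[i][j] = d[i][k] + d[k][j];
                    via[i][j] = k;
                }
            }
        }
    }
}
```

This version enters and leaves a parallel loop once per value of k, which is the kind of per-step overhead the later transformations aim to reduce.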
Bad Parallelization
[Diagram: the i × j matrix for K = 5, divided among CORE 0, CORE 1, CORE 2 and CORE 3]
Sequential Code - FW
• Change the algorithm: use smaller blocks and deal with dependencies
Parallel Floyd-Warshall
• Transformations
  1. Parallel with tuned blocks
  2. Restructured parallel with nowait
  3. Manual balancing of workload distribution
  4. Parallelized computation of self dependent block
  5. Loop coalesced version of previous transformation
Parallel Floyd-Warshall
[Diagram: blocked matrix with axes i and j]
Parallel Floyd-Warshall
[Diagram: blocked matrix with axes i and j]
1. Parallel With Tuned Blocks
• Transformed from naïve OpenMP directives
• Large block size reduces number of independent blocks that can run in parallel
• Small block sizes cut down on number of computations per block
• Optimum block size found to be ~20x20
  • This is somewhat graph-size dependent
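A blocked Floyd-Warshall of the kind these bullets tune can be sketched as follows. This follows the commonly used three-phase tiled formulation (self-dependent diagonal block, then the blocks in its row and column, then all remaining blocks) and is an assumption about the general approach, not a copy of the project's code. `NV`, `BS`, `relax_block`, `blocked_fw` and `plain_fw` are hypothetical names, and the tiny block size is chosen only to keep the example small.

```c
#include <assert.h>

#define NV 10                       /* vertices; deliberately not a multiple of BS */
#define BS 4                        /* block size (the slides found ~20x20 best) */
#define NB ((NV + BS - 1) / BS)     /* number of blocks per dimension */
#define INF 1000000                 /* "no edge"; INF + INF still fits in an int */

/* Relax every entry of block (bi, bj) through the intermediate nodes of block kb. */
static void relax_block(int d[NV][NV], int bi, int bj, int kb) {
    for (int k = kb * BS; k < (kb + 1) * BS && k < NV; k++)
        for (int i = bi * BS; i < (bi + 1) * BS && i < NV; i++)
            for (int j = bj * BS; j < (bj + 1) * BS && j < NV; j++)
                if (d[i][k] + d[k][j] < d[i][j])
                    d[i][j] = d[i][k] + d[k][j];
}

void blocked_fw(int d[NV][NV]) {
    for (int kb = 0; kb < NB; kb++) {
        /* Phase 1: the self-dependent (diagonal) block, done serially. */
        relax_block(d, kb, kb, kb);

        /* Phase 2: blocks sharing a row or column with the diagonal block. */
        #pragma omp parallel for
        for (int b = 0; b < NB; b++) {
            if (b == kb) continue;
            relax_block(d, kb, b, kb);
            relax_block(d, b, kb, kb);
        }

        /* Phase 3: all remaining blocks, each independent of the others. */
        #pragma omp parallel for
        for (int bi = 0; bi < NB; bi++) {
            if (bi == kb) continue;
            for (int bj = 0; bj < NB; bj++) {
                if (bj == kb) continue;
                relax_block(d, bi, bj, kb);
            }
        }
    }
}

/* Unblocked reference version, used only to check the blocked result. */
void plain_fw(int d[NV][NV]) {
    for (int k = 0; k < NV; k++)
        for (int i = 0; i < NV; i++)
            for (int j = 0; j < NV; j++)
                if (d[i][k] + d[k][j] < d[i][j])
                    d[i][j] = d[i][k] + d[k][j];
}
```

The trade-off in the bullets is visible in `NB`: a larger `BS` leaves fewer blocks for phases 2 and 3 to distribute across cores, while a smaller `BS` gives each parallel task less work to amortize its scheduling cost.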
2. Restructured with NOWAIT
• Issue: Many separate loops can run in parallel for processing different blocked types
• Most for loops combined into one OMP parallel construct
  • Eliminates multiple fork/join (wakeup/sleep) operations
• Intermediate serial sections handled by OpenMP master
• NOWAIT clause added to loops where correctness would not be violated
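The restructuring described above can be illustrated on a toy problem. In the sketch below, which uses made-up arrays rather than the project's block loops, three loops share one parallel region. The first loop carries `nowait` because the second loop touches a different array; the second keeps its implicit barrier because the third loop reads both arrays. The function name `scale_shift_sum` is hypothetical, and without OpenMP enabled the pragmas are ignored and the code runs serially.

```c
#include <assert.h>

/* a[i] *= 2, b[i] += 1, then sum[i] = a[i] + b[i], in a single parallel region. */
void scale_shift_sum(int n, double *a, double *b, double *sum) {
    assert(n > 0);
    #pragma omp parallel
    {
        /* Independent of the next loop, so threads need not wait here. */
        #pragma omp for nowait
        for (int i = 0; i < n; i++) a[i] *= 2.0;

        /* No nowait: the implicit barrier guarantees a[] and b[] are
           both complete before any thread starts the final loop. */
        #pragma omp for
        for (int i = 0; i < n; i++) b[i] += 1.0;

        /* Last loop in the region; the region's closing barrier follows. */
        #pragma omp for nowait
        for (int i = 0; i < n; i++) sum[i] = a[i] + b[i];
    }
}
```

Compared with three separate `parallel for` constructs, the threads are started once and synchronize once between the loops instead of joining after each one.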
3. Redistribute Workload
• Issue: Self dependent block migrates as k varies, workload becomes unbalanced
• Using various scheduling options (guided, dynamic) decreased performance
• Hence, manually restructured the loops to balance workload
4/5. Loop Coalescing
• 4. Parallelize internal loops of self-dependent blocks to eliminate serialization
• 5. Coalesce loops as number of iterations is small
Normal:
#pragma omp for nowait
for (i = block_ly; i < (block_ly + BLOCK_SIZE); i++) {
    for (j = block_lx; j < (block_lx + BLOCK_SIZE); j++) {
        if ((i >= v_count) || (j >= v_count) || (k >= v_count)) continue;
        if (submatrix[i][j] > (submatrix[i][k] + submatrix[k][j])) {
            submatrix[i][j] = submatrix[i][k] + submatrix[k][j];
            max_node[i][j] = k;
        }
    }
}
Coalesced:
for (k = start_k; k < (start_k + BLOCK_SIZE); k++) {
    #pragma omp for nowait
    for (ij = 0; ij < BLOCK_SIZE_SQ; ij++) {
        i = (ij / BLOCK_SIZE) + block_ly;
        j = (ij % BLOCK_SIZE) + block_lx;
        if ((i >= v_count) || (j >= v_count) || (k >= v_count)) continue;
        if (submatrix[i][j] > (submatrix[i][k] + submatrix[k][j])) {
            submatrix[i][j] = submatrix[i][k] + submatrix[k][j];
            max_node[i][j] = k;
        }
    }
}
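The index arithmetic used for coalescing can be checked in isolation. The sketch below fills a small array twice, once with nested loops and once with a single flattened loop that recovers i and j by division and remainder; `ROWS`, `COLS`, `fill_nested` and `fill_coalesced` are hypothetical names chosen for this example.

```c
#include <assert.h>

#define ROWS 3
#define COLS 5

/* Nested form: only the ROWS iterations of the outer loop are shared out. */
void fill_nested(int out[ROWS][COLS]) {
    #pragma omp parallel for
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            out[i][j] = i * 10 + j;
}

/* Coalesced form: all ROWS * COLS iterations are shared out. */
void fill_coalesced(int out[ROWS][COLS]) {
    #pragma omp parallel for
    for (int ij = 0; ij < ROWS * COLS; ij++) {
        int i = ij / COLS;   /* row index recovered from the flat index */
        int j = ij % COLS;   /* column index recovered from the flat index */
        out[i][j] = i * 10 + j;
    }
}
```

With 3 rows and 8 threads, the nested form can keep at most 3 threads busy, while the coalesced form has 15 iterations to distribute. Declaring `i` and `j` inside the loop body also makes them private to each thread. OpenMP 3.0 and later offer a `collapse(2)` clause that performs a similar flattening automatically.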
Parallel Dijkstra
graph0 = (vertex **) buildGraphFromFile(argv[1], LIST, &numberOfVertices);
// Able to parallelize the outermost loop; the compiler could not detect this due to subroutine calls
#pragma omp parallel
{
    // Done X times for X threads: each thread X gets its own private copy, graphX
    vertex ** graphX = copyGraph(graph0, numberOfVertices);
    #pragma omp for private(target)
    for (source = 0; source < numberOfVertices; source++) {
        for (target = 0; target < numberOfVertices; target++) {
            // Thread X reads and modifies only its own copy, graphX
            vertex * VSource = returnVertex(graphX, source);
            vertex * VTarget = returnVertex(graphX, target);
            VSource->distance = 0;
            int dist = Dijkstra3(graphX, VSource, VTarget, VSource->number);
            initGraph(graphX, numberOfVertices);
        }
    }
}
Parallel Dijkstra
[Diagram: Build Graph, followed by four parallel threads, each performing: Copy Graph -> Process N/X single source shortest paths]
• Outer loop parallelized, each thread executes Dijkstra’s algorithm with N/X source vertices (X = number of cores)
• Each thread retains a copy of the graph to modify
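The same structure can be sketched in a self-contained form. This is a simplified stand-in for the project's code, under stated assumptions: the graph is a read-only weight matrix (0 = no edge, positive weights), so the only mutable state is the per-source scratch array `done` and the output row, which plays the role of the per-thread graph copy. `parallel_all_pairs` is a hypothetical name, and without OpenMP the pragma is ignored and the loop runs serially.

```c
#include <assert.h>
#include <limits.h>

#define NV 5
#define INF INT_MAX   /* marks "unreachable" */

/* One Dijkstra run per source vertex; sources are divided among threads. */
void parallel_all_pairs(int w[NV][NV], int dist[NV][NV]) {
    #pragma omp parallel for schedule(static)
    for (int src = 0; src < NV; src++) {
        int done[NV] = {0};      /* private scratch state for this source */
        int *row = dist[src];    /* each source writes only its own row */

        for (int v = 0; v < NV; v++) row[v] = INF;
        row[src] = 0;

        for (int step = 0; step < NV; step++) {
            /* closest reachable vertex that is not yet finalized */
            int u = -1;
            for (int v = 0; v < NV; v++)
                if (!done[v] && row[v] != INF && (u < 0 || row[v] < row[u]))
                    u = v;
            if (u < 0) break;
            done[u] = 1;
            /* relax the outgoing edges of u */
            for (int v = 0; v < NV; v++)
                if (w[u][v] > 0 && row[u] + w[u][v] < row[v])
                    row[v] = row[u] + w[u][v];
        }
    }
}
```

Because no two sources write to the same memory, the loop needs no locks or barriers inside it, which is consistent with the near-linear speedup reported for this approach.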
Results - FW
[Chart: Floyd-Warshall Speedup – Input Graph 1; x-axis: Program Version; y-axis: Speedup (0 to 5)]
• Graph 1
  • 493 vertices
  • 1189 edges
• Final Speedup: 4.93 on 8 cores
Results - FW
[Chart: Floyd-Warshall Speedup – Input Graph 2; x-axis: Program Version; y-axis: Speedup (0 to 7)]
• Graph 2
  • 767 vertices
  • 1795 edges
• Final Speedup: 6.66 on 8 cores
Results - FW
[Chart: Floyd-Warshall Speedup – Input Graph 3; x-axis: Program Version; y-axis: Speedup (0 to 8)]
• Graph 3
  • 5,242 vertices
  • 28,980 edges
• Final Speedup: 7.66 on 8 cores
Results - FW
[Chart: Speedup vs. # of Cores; x-axis: 1, 2, 4, 8 cores; y-axis: Speedup (0 to 9); series: Graph 1, Graph 2, Graph 3]
• More parallelism exploited for larger graphs
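One way to compare the three graphs is parallel efficiency, the speedup divided by the number of cores. This metric is not on the slides; it is applied here to the final Floyd-Warshall speedups reported above for 8 cores.

```latex
E = \frac{S}{p}: \qquad
E_{\text{Graph 1}} = \frac{4.93}{8} \approx 0.62, \qquad
E_{\text{Graph 2}} = \frac{6.66}{8} \approx 0.83, \qquad
E_{\text{Graph 3}} = \frac{7.66}{8} \approx 0.96
```

Efficiency rises with graph size, which matches the observation that larger graphs expose more parallelism.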
Results - Dijkstra
[Chart: Parallel Dijkstra Speedup on 8 Cores; bars for Graph 1, Graph 2, Graph 3; y-axis from 7.66 to 7.82]
• Near linear speedup due to outer loop parallelization
• As graph size increases, graph build and copy overhead becomes a smaller share of the run time
Future Work / Improvements
• Utilize MapReduce for huge graph input sets
• Convert to MPI for Floyd-Warshall to deal with memory issues on one machine
• Port to map API to view shortest path information on a GUI
  • OpenStreetMap
• Add mechanisms to detect sparsity, negative edge weights and call appropriate routines
Questions?