Computer Science 320 Broadcasting. Floyd’s Algorithm on SMP for i = 0 to n – 1 parallel for r =...
Uploaded by harriet-mcdonald
Computer Science 320
Broadcasting
Floyd’s Algorithm on SMP
for i = 0 to n - 1
    parallel for r = 0 to n - 1
        for c = 0 to n - 1
            d[r][c] = min(d[r][c], d[r][i] + d[i][c])
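The SMP loop above can be sketched in plain Java. This is only an illustration: it uses a parallel stream from the standard library for the "parallel for", and the class name `FloydSmp` is made up for the example. It assumes a zero diagonal and no negative cycles, so that row i and column i do not change during pass i and the threads never write data that another thread is reading.

```java
import java.util.stream.IntStream;

final class FloydSmp {
    /** Updates d in place so that d[r][c] becomes the shortest distance from r to c. */
    static void floyd(double[][] d) {
        int n = d.length;
        for (int i = 0; i < n; ++i) {
            final double[] di = d[i];   // row i, read by every worker thread
            final int k = i;            // effectively final copy for the lambda
            // "parallel for r": rows are divided among the threads of the common pool.
            IntStream.range(0, n).parallel().forEach(r -> {
                double[] dr = d[r];
                double dri = dr[k];
                for (int c = 0; c < n; ++c) {
                    double viaI = dri + di[c];
                    // Same result as d[r][c] = min(d[r][c], d[r][i] + d[i][c]),
                    // but it writes only when the value actually improves.
                    if (viaI < dr[c]) dr[c] = viaI;
                }
            });
            // forEach returns only after every row is done, so pass i + 1
            // always sees the complete results of pass i.
        }
    }
}
```

Calling `FloydSmp.floyd(d)` on a square matrix that uses `Double.POSITIVE_INFINITY` for missing edges leaves the all-pairs shortest distances in `d`.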
Floyd’s Algorithm on Cluster
• Root node reads distance matrix from input file and scatters row slices to other nodes
• Other nodes compute distances and update their slices
• The slices are gathered back to the root node for output
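One way the row slices might be chosen is sketched below. The helper names `sliceBounds` and `ownerOf` are hypothetical, and the partitioning rule (contiguous slices, with the first few processes taking one extra row) is an assumption; the library used in the course may divide the rows differently.

```java
final class RowSlices {
    /** Inclusive bounds {lb, ub} of the rows owned by process `rank` out of `size`. */
    static int[] sliceBounds(int n, int size, int rank) {
        int base = n / size;     // every process gets at least this many rows
        int extra = n % size;    // the first `extra` processes get one more row
        int lb = rank * base + Math.min(rank, extra);
        int len = base + (rank < extra ? 1 : 0);
        return new int[] { lb, lb + len - 1 };
    }

    /** Rank of the process whose slice contains row i. */
    static int ownerOf(int n, int size, int i) {
        for (int rank = 0; rank < size; ++rank) {
            int[] b = sliceBounds(n, size, rank);
            if (b[0] <= i && i <= b[1]) return rank;
        }
        throw new IllegalArgumentException("row out of range: " + i);
    }
}
```

The root would scatter rows `lb..ub` to each rank, each rank would update only those rows, and the gather would collect the same ranges back.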
Parallel I/O File Pattern
• Eliminate the gather of data by having each node write its slice to a separate file
• Eliminate the scatter of data by having each node read its slice from the input file
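The file pattern above might look like the following sketch. The file layout is an assumption made for the example (an int n followed by n*n doubles in row-major order), as are the class name `SliceIo` and the output naming scheme `<prefix>_<rank>`; the slides do not specify either.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;

final class SliceIo {
    /** Reads only rows lb..ub from the shared input file; other rows stay null. */
    static double[][] readSlice(String inputFile, int lb, int ub) {
        try (RandomAccessFile in = new RandomAccessFile(inputFile, "r")) {
            int n = in.readInt();
            double[][] d = new double[n][];
            // Skip the header and the rows that belong to lower-ranked processes.
            in.seek(Integer.BYTES + (long) lb * n * Double.BYTES);
            for (int r = lb; r <= ub; ++r) {
                d[r] = new double[n];
                for (int c = 0; c < n; ++c) d[r][c] = in.readDouble();
            }
            return d;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes rows lb..ub to this rank's own file, so no gather is needed. */
    static void writeSlice(String prefix, int rank, double[][] d, int lb, int ub) {
        try (RandomAccessFile out = new RandomAccessFile(prefix + "_" + rank, "rw")) {
            out.setLength(0);    // discard any earlier contents
            out.writeInt(lb);
            out.writeInt(ub);
            for (int r = lb; r <= ub; ++r)
                for (double x : d[r]) out.writeDouble(x);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

`RandomAccessFile` is unbuffered, which keeps the sketch short; a real program would probably wrap the I/O in buffered streams or use a file channel.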
Execution Timeline
Sharing Data in Computation
• On each pass through the outer loop, row i must be available to every process, because they all execute the same inner-loop statement and that statement reads row i
• In the SMP program the threads can simply read it, because they share the entire matrix
• On a cluster they cannot, because the processes do not share memory; each holds only its own slice of rows
for i = 0 to n - 1
    parallel for r = 0 to n - 1
        for c = 0 to n - 1
            d[r][c] = min(d[r][c], d[r][i] + d[i][c])
Share Row via a Broadcast Message
• The process that owns a row broadcasts it before the parallel loop is run, on each pass through the outer loop
• The process that owns the row acts as the root of the broadcast and supplies the source buffer
• The other processes set up a destination buffer
• The broadcast also enforces synchronization: every process waits for it before continuing
for i = 0 to n - 1
    broadcast row i of d
    parallel for r = 0 to n - 1
        for c = 0 to n - 1
            d[r][c] = min(d[r][c], d[r][i] + d[i][c])
// Allocate storage for row broadcast from another process.
row_i = new double [n];
row_i_buf = DoubleBuf.buffer (row_i);

int i_root = 0;
for (int i = 0; i < n; ++ i) {
    double[] d_i = d[i];

    // Determine which process owns row i.
    if (! ranges[i_root].contains(i)) ++ i_root;

    // Broadcast row i from owner process to all processes.
    if (rank == i_root) {
        world.broadcast(i_root, DoubleBuf.buffer (d_i));
    } else {
        world.broadcast(i_root, row_i_buf);
        d_i = row_i;
    }

    // Inner loops over rows in my slice and over all columns.
    for (int r = mylb; r <= myub; ++ r) {
        double[] d_r = d[r];
        for (int c = 0; c < n; ++ c)
            d_r[c] = Math.min (d_r[c], d_r[i] + d_i[c]);
    }
}
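The listing above needs a message-passing library and a cluster to run. To try the broadcast pattern on one machine, it can be imitated with threads, as in the sketch below. This is a simulation and not the cluster program: each thread plays one process and touches only its own rows of d, the array `wire` stands in for the broadcast message, and a `CyclicBarrier` supplies the synchronization that a real broadcast would. The names `FloydBroadcastSim`, `wire` and `rowI` and the slice formula are assumptions made for the example.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

final class FloydBroadcastSim {
    /** Runs Floyd's algorithm on d with `size` simulated processes. */
    static void floyd(double[][] d, int size) {
        int n = d.length;
        double[] wire = new double[n];               // simulated message payload
        CyclicBarrier barrier = new CyclicBarrier(size);
        Thread[] procs = new Thread[size];
        for (int rank = 0; rank < size; ++rank) {
            // Contiguous row slices, as even as possible (assumed partitioning).
            final int lb = (int) ((long) rank * n / size);
            final int ub = (int) ((long) (rank + 1) * n / size) - 1;
            procs[rank] = new Thread(() -> {
                double[] rowI = new double[n];       // destination buffer
                try {
                    for (int i = 0; i < n; ++i) {
                        if (lb <= i && i <= ub)      // the owner acts as broadcast root
                            System.arraycopy(d[i], 0, wire, 0, n);
                        barrier.await();             // everyone waits for the send
                        System.arraycopy(wire, 0, rowI, 0, n);
                        barrier.await();             // wire may be reused after this
                        // Inner loops over rows in my slice and over all columns.
                        for (int r = lb; r <= ub; ++r)
                            for (int c = 0; c < n; ++c)
                                d[r][c] = Math.min(d[r][c], d[r][i] + rowI[c]);
                    }
                } catch (InterruptedException | BrokenBarrierException e) {
                    throw new IllegalStateException(e);
                }
            });
            procs[rank].start();
        }
        try {
            for (Thread t : procs) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

Unlike the listing, the owner here also copies the row back out of `wire` rather than using its own row directly; that keeps the simulated processes uniform at the cost of one extra copy.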
Problem: Too Many Messages
• One broadcast is sent on every pass through the outer loop, n messages in all, so the time spent in communication is too high compared with the time spent in computation
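The imbalance can be made concrete by counting, as in the sketch below. It counts operations only, read directly off the loop structure; it does not model time, which depends on the network and the processors. The class and method names are hypothetical, and n is assumed to be divisible by the number of processes.

```java
final class BroadcastCost {
    /** Broadcasts in the whole run: one per pass through the outer loop. */
    static long broadcasts(int n) {
        return n;
    }

    /** Matrix elements carried by all broadcasts together: n rows of n values. */
    static long elementsSent(int n) {
        return (long) n * n;
    }

    /** Inner-loop updates one process performs between two consecutive broadcasts. */
    static long updatesPerPass(int n, int size) {
        return (long) (n / size) * n;
    }
}
```

For n = 1000 on 10 processes, each process performs 100,000 updates between broadcasts. Adding processes shrinks that figure while the number of broadcasts stays at n, so communication takes a growing share of the run.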