Message Passing and MPI Collective Operations and Buffering
Transcript of Message Passing and MPI Collective Operations and Buffering
Laxmikant Kale
CS 320
2
Example: Jacobi relaxation
Pseudocode:
  A, Anew: N x N 2D arrays of floating-point (FP) numbers
  loop (how many times? until convergence, or a fixed count)
    for each I = 1, N
      for each J = 1, N
        Anew[I,J] = average of A[I,J] and its 4 neighbors
    swap Anew and A
  end loop
Red and blue boundaries are held at fixed values (say, temperatures).
Discretization: divide the space into a grid of cells.
For all cells except those on the boundary: iteratively compute each cell's temperature as the average of its neighboring cells' temperatures.
3
How to parallelize?
• Decide how to decompose the data:
  – What options are there? (e.g., 16 processors)
    • Vertically
    • Horizontally
    • In square chunks
  – Pros and cons
• Identify the communication needed
  – Let us assume we will run for a fixed number of iterations
  – What data do I need from others?
  – From whom specifically?
  – Reverse the question: who needs my data?
  – Express this with sends and recvs.
4
Ghost cells: a common apparition
• The data I need from neighbors
  – But that I don't modify (and therefore "don't own")
• Can be stored in my own data structures
  – So that my inner loops don't have to know about communication at all
  – They can be written as if they were sequential code
5
Convergence Test
• Notice that all processors must report their convergence
  – The program has converged only if all processors have converged
  – Send data to one processor (say #0)
  – What if you are running on 1000 processors?
    • Too much overhead on that one processor (serialization)
  – Use a spanning tree:
    • A simple one: processor P's parent is (P-1)/2
      – Children: 2P+1 and 2P+2
• Is that the best spanning tree?
  – It depends on the machine!
  – MPI supports a single interface
    • Implemented differently on different machines
6
MPI_Reduce
• Reduce data across processors, and use the result on the root.

MPI_Reduce(data, result, size, MPI_Datatype, MPI_Op, root, communicator)
MPI_Allreduce(data, result, size, MPI_Datatype, MPI_Op, communicator)
  (Allreduce has no root argument: every processor receives the result.)
7
Other collective ops
• Barriers, Gather, Scatter

MPI_Barrier(MPI_Comm)
MPI_Gather(sendBuf, sendSize, sendType, recvBuf, recvSize, recvType, root, comm)
MPI_Scatter(…)
MPI_Allgather(… no root …)
MPI_Alltoall(…)   (the "all" counterpart: every process scatters to, and gathers from, every other)
8
Collective calls
• Message passing is often, but not always, used for the SPMD style of programming:
  – SPMD: Single Program, Multiple Data
  – All processors execute essentially the same program, and the same steps, but not in lockstep
• All communication is almost in lockstep
• Collective calls:
  – global reductions (such as max or sum)
  – syncBroadcast (often just called broadcast):
    • syncBroadcast(whoAmI, dataSize, dataBuffer);
      – whoAmI: sender or receiver
9
Other Operations
• Collective Operations
– Broadcast
– Reduction
– Scan
– All-to-All
– Gather/Scatter
• Support for Topologies
• Buffering issues: optimizing message passing
• Data-type support