Parallel Processing (CS 730) Lecture 9: Advanced Point to Point Communication
Transcript of Parallel Processing (CS 730) Lecture 9: Advanced Point to Point Communication
Oct. 30, 2002 Parallel Processing 1
Parallel Processing (CS 730)
Lecture 9: Advanced Point to Point Communication
Jeremy R. Johnson
*Parts of this lecture were derived from chapter 13 in Pacheco
Introduction
• Objective: To further examine message-passing communication patterns.
• Topics
  – Implementing Allgather
    • Ring
    • Hypercube
  – Non-blocking send/recv
    • MPI_Isend
    • MPI_Wait
    • MPI_Test
Broadcast/Reduce Ring

[Figure: broadcast/reduce on a four-process ring P0–P3, shown over four steps; data moves one neighbor per step.]
Bi-directional Broadcast Ring

[Figure: broadcast on a four-process ring P0–P3 sending in both directions at once, shown over three steps.]
Allgather Ring

[Figure: allgather on a four-process ring P0–P3. Initially each Pi holds xi; after one step each process holds two blocks (e.g. P0: x0,x3), after two steps three blocks, and after three steps every process holds x0,x1,x2,x3.]
AllGather

```c
int MPI_Allgather(
    void*        send_data    /* in  */,
    int          send_count   /* in  */,
    MPI_Datatype send_type    /* in  */,
    void*        recv_data    /* out */,
    int          recv_count   /* in  */,
    MPI_Datatype recv_type    /* in  */,
    MPI_Comm     communicator /* in  */)
```

[Figure: processes 0–3 each contribute their block x0–x3; after MPI_Allgather every process holds all four blocks.]
Allgather_ring

```c
void Allgather_ring(float x[], int blocksize, float y[], MPI_Comm comm) {
    int i, p, my_rank;
    int successor, predecessor;
    int send_offset, recv_offset;
    MPI_Status status;

    MPI_Comm_size(comm, &p);
    MPI_Comm_rank(comm, &my_rank);
    /* copy the local block into its slot of the result */
    for (i = 0; i < blocksize; i++)
        y[i + my_rank*blocksize] = x[i];
    successor   = (my_rank + 1) % p;
    predecessor = (my_rank - 1 + p) % p;

    for (i = 0; i < p-1; i++) {
        /* send the block received i stages ago; receive the next one */
        send_offset = ((my_rank - i + p) % p) * blocksize;
        recv_offset = ((my_rank - i - 1 + p) % p) * blocksize;
        MPI_Send(y + send_offset, blocksize, MPI_FLOAT, successor, 0, comm);
        MPI_Recv(y + recv_offset, blocksize, MPI_FLOAT, predecessor, 0,
                 comm, &status);
    }
}
```
Hypercube

• Graph (recursively defined)
  – An n-dimensional cube has 2^n nodes, each connected to n neighbors
  – Binary labels of adjacent nodes differ in exactly one bit

[Figure: 1-, 2-, and 3-dimensional hypercubes with nodes labeled 0/1, 00–11, and 000–111.]
Broadcast/Reduce

[Figure: broadcast/reduce on a 3-dimensional hypercube, nodes 000–111, one dimension per step.]
Allgather

```
Process   Data (start)   After stage 1   After stage 2
000       x0             x0 x4           x0 x2 x4 x6
001       x1             x1 x5           x1 x3 x5 x7
010       x2             x2 x6           x0 x2 x4 x6
011       x3             x3 x7           x1 x3 x5 x7
100       x4             x0 x4           x0 x2 x4 x6
101       x5             x1 x5           x1 x3 x5 x7
110       x6             x2 x6           x0 x2 x4 x6
111       x7             x3 x7           x1 x3 x5 x7
```

(After stage 3, every process holds x0–x7.)
Allgather

[Figure: allgather exchange pattern on eight processes 0–7, shown over successive stages.]
Allgather_cube

```c
void Allgather_cube(float x[], int blocksize, float y[], MPI_Comm comm) {
    int i, d, p, my_rank;
    unsigned eor_bit, and_bits;
    int stage, partner;
    MPI_Datatype hole_type;
    int send_offset, recv_offset;
    MPI_Status status;
    int log_base2(int p);

    MPI_Comm_size(comm, &p);
    MPI_Comm_rank(comm, &my_rank);
    /* copy the local block into its slot of the result */
    for (i = 0; i < blocksize; i++)
        y[i + my_rank*blocksize] = x[i];
    d = log_base2(p);
    eor_bit  = 1 << (d-1);
    and_bits = (1 << d) - 1;

    for (stage = 0; stage < d; stage++) {
        partner = my_rank ^ eor_bit;
        send_offset = (my_rank & and_bits) * blocksize;
        recv_offset = (partner & and_bits) * blocksize;
        /* strided type selecting the 2^stage blocks this process owns */
        MPI_Type_vector(1 << stage, blocksize, (1 << (d - stage)) * blocksize,
                        MPI_FLOAT, &hole_type);
        MPI_Type_commit(&hole_type);
        MPI_Send(y + send_offset, 1, hole_type, partner, 0, comm);
        MPI_Recv(y + recv_offset, 1, hole_type, partner, 0, comm, &status);
        MPI_Type_free(&hole_type);
        eor_bit  = eor_bit >> 1;
        and_bits = and_bits >> 1;
    }
}
```
Buffering Assumption

• The previous code is not safe: every process posts a blocking send before its receive, so it avoids deadlock only if the system provides enough buffer space for the sends to complete.
• MPI_Sendrecv can be used to guarantee that deadlock does not occur.
SendRecv

```c
int MPI_Sendrecv(
    void*        send_buf     /* in  */,
    int          send_count   /* in  */,
    MPI_Datatype send_type    /* in  */,
    int          dest         /* in  */,
    int          send_tag     /* in  */,
    void*        recv_buf     /* out */,
    int          recv_count   /* in  */,
    MPI_Datatype recv_type    /* in  */,
    int          source       /* in  */,
    int          recv_tag     /* in  */,
    MPI_Comm     communicator /* in  */,
    MPI_Status*  status       /* out */)
```
SendRecvReplace

```c
int MPI_Sendrecv_replace(
    void*        buffer       /* in/out */,
    int          count        /* in  */,
    MPI_Datatype datatype     /* in  */,
    int          dest         /* in  */,
    int          send_tag     /* in  */,
    int          source       /* in  */,
    int          recv_tag     /* in  */,
    MPI_Comm     communicator /* in  */,
    MPI_Status*  status       /* out */)
```
Nonblocking Send/Recv

• Allows overlap of communication and computation: the call returns without waiting for the buffer to be copied out or for the receive to occur.
• The communication is posted and can be tested later for completion.

```c
int MPI_Isend(                 /* I = Immediate */
    void*        buffer   /* in  */,
    int          count    /* in  */,
    MPI_Datatype datatype /* in  */,
    int          dest     /* in  */,
    int          tag      /* in  */,
    MPI_Comm     comm     /* in  */,
    MPI_Request* request  /* out */)
```
Nonblocking Send/Recv

```c
int MPI_Irecv(
    void*        buffer   /* in  */,
    int          count    /* in  */,
    MPI_Datatype datatype /* in  */,
    int          source   /* in  */,
    int          tag      /* in  */,
    MPI_Comm     comm     /* in  */,
    MPI_Request* request  /* out */)

int MPI_Wait(
    MPI_Request* request /* in/out */,
    MPI_Status*  status  /* out */)

int MPI_Test(MPI_Request* request, int* flag, MPI_Status* status);
```
Allgather_ring (Overlapped)

```c
    send_offset = my_rank * blocksize;   /* first send: the process's own block */
    recv_offset = ((my_rank - 1 + p) % p) * blocksize;
    for (i = 0; i < p-1; i++) {
        MPI_Isend(y + send_offset, blocksize, MPI_FLOAT, successor,
                  0, comm, &send_request);
        MPI_Irecv(y + recv_offset, blocksize, MPI_FLOAT, predecessor, 0,
                  comm, &recv_request);
        /* compute the next offsets while the communication proceeds */
        send_offset = ((my_rank - i - 1 + p) % p) * blocksize;
        recv_offset = ((my_rank - i - 2 + p) % p) * blocksize;
        MPI_Wait(&send_request, &status);
        MPI_Wait(&recv_request, &status);
    }
```
AlltoAll

• Sequence of permutations implemented with MPI_Sendrecv: at step k, each process exchanges with the process offset by k around the ring. Each row below is one cyclic shift of the processes 0–7:

```
0 1 2 3 4 5 6 7
7 0 1 2 3 4 5 6
6 7 0 1 2 3 4 5
5 6 7 0 1 2 3 4
4 5 6 7 0 1 2 3
3 4 5 6 7 0 1 2
2 3 4 5 6 7 0 1
1 2 3 4 5 6 7 0
```
AlltoAll (2 way)

• Sequence of permutations implemented with MPI_Sendrecv: at step k, process r exchanges with partner r XOR k, so each pair trades data in both directions at once. Row k below lists r XOR k for processes 0–7:

```
0 1 2 3 4 5 6 7
1 0 3 2 5 4 7 6
2 3 0 1 6 7 4 5
3 2 1 0 7 6 5 4
4 5 6 7 0 1 2 3
5 4 7 6 1 0 3 2
6 7 4 5 2 3 0 1
7 6 5 4 3 2 1 0
```
Communication Modes

• Synchronous (MPI_Ssend: the send completes only after the matching receive has started)
• Ready (MPI_Rsend: the matching receive must already be posted)
• Buffered (MPI_Bsend: the user provides buffer space)