Introduction to TDDC78 Lab Series
Lu Li, Linköping University
Parts of Slides developed by Usman Dastgeer
Goals
Shared- and Distributed-memory systems
Programming parallelism (typical problems)
Approach and solve
o Partitioning: domain decomposition, functional decomposition
o Communication
o Agglomeration
o Mapping
o Load balancing
TDDC78 Labs: Memory-based Taxonomy
Memory Labs Use
Distributed 1 MPI
Shared 2 & 3 POSIX threads & OpenMP
Distributed 5 MPI
LAB 4 (tools), used at every stage, may save you time for LAB 5.
Information sources
Compendium
o Your primary source of information: http://www.ida.liu.se/~TDDC78/labs/
o Comprehensive: environment description, lab specification, step-by-step instructions
Others
o Triolith: http://www.nsc.liu.se/systems/triolith/
o MPI: http://www.mpi-forum.org/docs/…
Learn about MPI
LAB 1: Define MPI types, Send / Receive, Broadcast, Scatter / Gather
LAB 5: Use virtual topologies, MPI_Issend / MPI_Probe / MPI_Reduce, sending larger pieces of data, Synchronize / MPI_Barrier
Lab-1 TDDC78: Image Filters with MPI
Blur & Threshold filters
o See compendium for details
o Define types, Send / Receive, Broadcast, Scatter / Gather
Your goal is to understand how to:
o Decompose domains (a rough sketch follows below)
o Apply the filters in parallel
For syntax and examples, refer to the MPI lecture slides.
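As a rough illustration of domain decomposition (not the scheme required by the compendium), a row-block split of an H x W image over nproc ranks can be computed as below; H, nproc and my_id are illustrative names assumed to come from the image header, MPI_Comm_size and MPI_Comm_rank.

int base = H / nproc;                          // rows every rank gets
int rem  = H % nproc;                          // the first 'rem' ranks get one extra row
int my_rows   = base + (my_id < rem ? 1 : 0);  // rows owned by this rank
int row_start = my_id * base + (my_id < rem ? my_id : rem);
// Rank my_id filters rows [row_start, row_start + my_rows).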
MPI Types Example
typedef struct {
    int id;
    double data[10];
} buf_t;                                  // Composite type

buf_t item;                               // Element of the type
MPI_Datatype buf_t_mpi;                   // MPI type to commit
int block_lengths[] = { 1, 10 };                        // Lengths of type elements
MPI_Datatype block_types[] = { MPI_INT, MPI_DOUBLE };   // Set types
MPI_Aint start, displ[2];
MPI_Address( &item, &start );
MPI_Address( &item.id, &displ[0] );
MPI_Address( &item.data[0], &displ[1] );
displ[0] -= start;                        // Displacement relative to address of start
displ[1] -= start;                        // Displacement relative to address of start
MPI_Type_struct( 2, block_lengths, displ, block_types, &buf_t_mpi );
MPI_Type_commit( &buf_t_mpi );
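Note that MPI_Address and MPI_Type_struct are deprecated (removed in MPI-3); on current installations the same type is built with MPI_Get_address and MPI_Type_create_struct. A minimal sketch, reusing item and buf_t from above (dest and tag in the last line are illustrative):

MPI_Datatype buf_t_mpi3;
int          lens[2]  = { 1, 10 };
MPI_Datatype types[2] = { MPI_INT, MPI_DOUBLE };
MPI_Aint     base_addr, disp[2];
MPI_Get_address( &item, &base_addr );
MPI_Get_address( &item.id, &disp[0] );
MPI_Get_address( &item.data[0], &disp[1] );
disp[0] -= base_addr;                     // displacements relative to the start of item
disp[1] -= base_addr;
MPI_Type_create_struct( 2, lens, disp, types, &buf_t_mpi3 );
MPI_Type_commit( &buf_t_mpi3 );
// e.g. MPI_Send( &item, 1, buf_t_mpi3, dest, tag, MPI_COMM_WORLD );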
Send-Receive
...
int s_data, r_data;
...
MPI_Request request;
MPI_Isend( &s_data, 1, MPI_INT,           // count is 1 element of MPI_INT, not sizeof(int) bytes
           (my_id == 0) ? 1 : 0, 0, MPI_COMM_WORLD, &request );
MPI_Status status;
MPI_Recv( &r_data, 1, MPI_INT,
          (my_id == 0) ? 1 : 0, 0, MPI_COMM_WORLD, &status );
MPI_Wait( &request, &status );
...
Diagram: as program execution proceeds, P0 and P1 each send to the other (SendTo) and then receive from the other (RecvFrom).
Send-Receive Modes (1)
SEND BLOCKING NON-BLOCKING
Standard MPI_Send MPI_Isend
Synchronous MPI_Ssend MPI_Issend
Buffered MPI_Bsend MPI_Ibsend
Ready MPI_Rsend MPI_Irsend
RECEIVE BLOCKING NON-BLOCKING
MPI_Recv MPI_Irecv
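The typical non-blocking pattern, sketched below with illustrative dest and tag values, posts the send, overlaps useful work with the transfer, and waits for completion before the buffer is touched again (s_data as in the earlier example):

MPI_Request req;
MPI_Status  st;
MPI_Issend( &s_data, 1, MPI_INT, dest, tag, MPI_COMM_WORLD, &req );  // post the send
/* ... computation that does not modify s_data ... */
MPI_Wait( &req, &st );    // only after the wait completes may s_data be reused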
Lab 5: Particles
o Moving particles
o Validate the pressure law: pV = nRT
o Dynamic interaction patterns: the number of particles that fly across borders is not static
o You need advanced domain decomposition: motivate your choice!
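Because the number of particles crossing a border changes every step, a common pattern (a sketch only; neighbor, tag, grid_comm, particle_t and its committed MPI type particle_mpi_t are illustrative names) is to probe the incoming message and size the receive buffer accordingly:

MPI_Status status;
int incoming;                                      // number of particles in the pending message
MPI_Probe( neighbor, tag, grid_comm, &status );    // block until a message from 'neighbor' is pending
MPI_Get_count( &status, particle_mpi_t, &incoming );
particle_t *buf = malloc( incoming * sizeof(particle_t) );   // needs <stdlib.h>
MPI_Recv( buf, incoming, particle_mpi_t, neighbor, tag, grid_comm, &status );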
Process Topologies (0)
o By default, processes are arranged in a 1-dimensional array
o Process ranks are computed accordingly
o What if processes need to communicate in 2 dimensions or more?
o Use virtual topologies to obtain a 2D (or higher-dimensional) arrangement of processes with a convenient ranking scheme
Process Topologies (1)
int dims[2];                  // 2D matrix / grid
dims[0] = 2;                  // 2 rows
dims[1] = 3;                  // 3 columns
MPI_Dims_create( nproc, 2, dims );
int periods[2];
periods[0] = 1;               // Row-periodic
periods[1] = 0;               // Column-non-periodic
int reorder = 1; // Re-order allowed
MPI_Comm grid_comm;
MPI_Cart_create( MPI_COMM_WORLD, 2, dims, periods, reorder, &grid_comm);
Process Topologies (2)
int my_coords[2];             // Cartesian process coordinates
int my_rank;                  // Process rank
int right_nbr[2];             // Coordinates of the neighbour one step along dimension 0
int right_nbr_rank;           // Rank of that neighbour
MPI_Cart_get( grid_comm, 2, dims, periods, my_coords);
MPI_Cart_rank( grid_comm, my_coords, &my_rank);
right_nbr[0] = my_coords[0] + 1;   // wraps around, since dimension 0 was declared periodic
right_nbr[1] = my_coords[1];
MPI_Cart_rank( grid_comm, right_nbr, &right_nbr_rank );
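An alternative way to find neighbour ranks (a sketch, not something the slides require) is MPI_Cart_shift, which returns both neighbours along one dimension and yields MPI_PROC_NULL at the border of a non-periodic dimension:

int prev_rank, next_rank;
// Shift by +1 along dimension 0: ranks of the processes before and after this one.
MPI_Cart_shift( grid_comm, 0, 1, &prev_rank, &next_rank );
// MPI_PROC_NULL can be passed to MPI_Send / MPI_Recv, where it acts as a no-op,
// so border processes need no special-case code.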
Collective Communication (CC)
...
// One processor
for (int j = 1; j < nproc; j++) {
    MPI_Send( &message, sizeof(message_t), ... );
}
...
// All the others
MPI_Recv( &message, sizeof(message_t), ... );
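This hand-written loop is exactly what a collective replaces. Assuming message_t has been committed as an MPI datatype message_mpi_t (an assumption; see the type example above), the one-to-all send becomes a single call made by every process:

// Every process, root included, makes the same call; the count is in elements, not bytes.
MPI_Bcast( &message, 1, message_mpi_t, 0 /* root */, MPI_COMM_WORLD );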
CC: Scatter / Gather
sendbuf = (int *) malloc( nproc * stride * sizeof(int) );
displs  = (int *) malloc( nproc * sizeof(int) );
scounts = (int *) malloc( nproc * sizeof(int) );
for (i = 0; i < nproc; ++i) {
    displs[i]  = ...
    scounts[i] = ...
}
MPI_Scatterv( sendbuf, scounts, displs, MPI_INT,
              rbuf, 100, MPI_INT, root, comm );
Distributing (unevenly sized) chunks of data
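The inverse, collecting unevenly sized chunks back to the root, uses the same displs / scounts arrays on the receiving side; in this sketch sbuf, mycount and recvbuf are illustrative names:

// Each rank contributes 'mycount' elements from sbuf; the root expects scounts[i]
// elements from rank i and places them at offset displs[i] in recvbuf.
MPI_Gatherv( sbuf, mycount, MPI_INT,
             recvbuf, scounts, displs, MPI_INT, root, comm );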
Summary
Learning goals
o Point-to-point communication
o Probing / Non-blocking send (choose)
o Barriers & Wait = Synchronization
o Derived data types
o Collective communications
o Virtual topologies
Send/Receive modes
o Use with care to keep your code portable, e.g. MPI_Bsend
o "It works there but not here!"
MPI Labs at home?
No problem
www.open-mpi.org
Simple to install, simple to use.
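With a standard Open MPI installation you typically compile with the mpicc wrapper and launch with mpirun, e.g. mpicc my_program.c -o my_program followed by mpirun -np 4 ./my_program (file name and process count illustrative).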