Introduction to TDDC78 Lab Series
Lu Li, Linköping University
Parts of Slides developed by Usman Dastgeer
Goals
Shared- and Distributed-memory systems
Programming parallelism (typical problems)
Approach and solve
o Partitioning: domain decomposition, functional decomposition
o Communication
o Agglomeration
o Mapping
o Load balancing
TDDC78 Labs: Memory-based Taxonomy
Memory Labs Use
Distributed 1 MPI
Shared 2 & 3 POSIX threads & OpenMP
Distributed 5 MPI
LAB 4 (tools), used at every stage, may save you time for LAB 5.
Information sources
Compendium
o Your primary source of information: http://www.ida.liu.se/~TDDC78/labs/
o Comprehensive: environment description, lab specification, step-by-step instructions
Others
o Triolith: http://www.nsc.liu.se/systems/triolith/
o MPI: http://www.mpi-forum.org/docs/…
Learn about MPI
LAB 1: Define MPI types, Send / Receive, Broadcast, Scatter / Gather
LAB 5: Use virtual topologies, MPI_Issend / MPI_Probe / MPI_Reduce, sending larger pieces of data, Synchronize / MPI_Barrier
Lab-1 TDDC78: Image Filters with MPI
Blur & Threshold filters
o See compendium for details
o Define types, Send / Receive, Broadcast, Scatter / Gather
Your goal is to understand how to:
o Decompose domains (a rough sketch follows below)
o Apply the filters in parallel
For syntax and examples, refer to the MPI lecture slides.
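As a rough illustration of domain decomposition (not the scheme required by the compendium), a row-block split of an H x W image over nproc ranks can be computed as below; H, nproc and my_id are illustrative names assumed to come from the image header, MPI_Comm_size and MPI_Comm_rank.

int base = H / nproc;                          // rows every rank gets
int rem  = H % nproc;                          // the first 'rem' ranks get one extra row
int my_rows   = base + (my_id < rem ? 1 : 0);  // rows owned by this rank
int row_start = my_id * base + (my_id < rem ? my_id : rem);
// Rank my_id filters rows [row_start, row_start + my_rows).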
MPI Types Example
typedef struct {
    int id;
    double data[10];
} buf_t;                                  // Composite type

buf_t item;                               // Element of the type
MPI_Datatype buf_t_mpi;                   // MPI type to commit
int block_lengths[] = { 1, 10 };                        // Lengths of type elements
MPI_Datatype block_types[] = { MPI_INT, MPI_DOUBLE };   // Set types
MPI_Aint start, displ[2];
MPI_Address( &item, &start );
MPI_Address( &item.id, &displ[0] );
MPI_Address( &item.data[0], &displ[1] );
displ[0] -= start;                        // Displacement relative to address of start
displ[1] -= start;                        // Displacement relative to address of start
MPI_Type_struct( 2, block_lengths, displ, block_types, &buf_t_mpi );
MPI_Type_commit( &buf_t_mpi );
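Note that MPI_Address and MPI_Type_struct are deprecated (removed in MPI-3); on current installations the same type is built with MPI_Get_address and MPI_Type_create_struct. A minimal sketch, reusing item and buf_t from above (dest and tag in the last line are illustrative):

MPI_Datatype buf_t_mpi3;
int          lens[2]  = { 1, 10 };
MPI_Datatype types[2] = { MPI_INT, MPI_DOUBLE };
MPI_Aint     base_addr, disp[2];
MPI_Get_address( &item, &base_addr );
MPI_Get_address( &item.id, &disp[0] );
MPI_Get_address( &item.data[0], &disp[1] );
disp[0] -= base_addr;                     // displacements relative to the start of item
disp[1] -= base_addr;
MPI_Type_create_struct( 2, lens, disp, types, &buf_t_mpi3 );
MPI_Type_commit( &buf_t_mpi3 );
// e.g. MPI_Send( &item, 1, buf_t_mpi3, dest, tag, MPI_COMM_WORLD );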
Send-Receive
...
int s_data, r_data;
...
MPI_Request request;
MPI_Isend( &s_data, 1, MPI_INT,           // count is 1 element of MPI_INT, not sizeof(int) bytes
           (my_id == 0) ? 1 : 0, 0, MPI_COMM_WORLD, &request );
MPI_Status status;
MPI_Recv( &r_data, 1, MPI_INT,
          (my_id == 0) ? 1 : 0, 0, MPI_COMM_WORLD, &status );
MPI_Wait( &request, &status );
...
Diagram: as program execution proceeds, P0 and P1 each send to the other (SendTo) and then receive from the other (RecvFrom).
Send-Receive Modes (1)
SEND BLOCKING NON-BLOCKING
Standard MPI_Send MPI_Isend
Synchronous MPI_Ssend MPI_Issend
Buffered MPI_Bsend MPI_Ibsend
Ready MPI_Rsend MPI_Irsend
RECEIVE BLOCKING NON-BLOCKING
MPI_Recv MPI_Irecv
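The typical non-blocking pattern, sketched below with illustrative dest and tag values, posts the send, overlaps useful work with the transfer, and waits for completion before the buffer is touched again (s_data as in the earlier example):

MPI_Request req;
MPI_Status  st;
MPI_Issend( &s_data, 1, MPI_INT, dest, tag, MPI_COMM_WORLD, &req );  // post the send
/* ... computation that does not modify s_data ... */
MPI_Wait( &req, &st );    // only after the wait completes may s_data be reused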
Lab 5: Particles
o Moving particles
o Validate the pressure law: pV = nRT
o Dynamic interaction patterns: the number of particles that fly across borders is not static
o You need advanced domain decomposition: motivate your choice!
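Because the number of particles crossing a border changes every step, a common pattern (a sketch only; neighbor, tag, grid_comm, particle_t and its committed MPI type particle_mpi_t are illustrative names) is to probe the incoming message and size the receive buffer accordingly:

MPI_Status status;
int incoming;                                      // number of particles in the pending message
MPI_Probe( neighbor, tag, grid_comm, &status );    // block until a message from 'neighbor' is pending
MPI_Get_count( &status, particle_mpi_t, &incoming );
particle_t *buf = malloc( incoming * sizeof(particle_t) );   // needs <stdlib.h>
MPI_Recv( buf, incoming, particle_mpi_t, neighbor, tag, grid_comm, &status );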
Process Topologies (0)
o By default, processes are arranged in a 1-dimensional array
o Process ranks are computed accordingly
o What if processes need to communicate in 2 dimensions or more?
o Use virtual topologies to obtain a 2D (or higher-dimensional) arrangement of processes with a convenient ranking scheme
Process Topologies (1)
int dims[2];                  // 2D matrix / grid
dims[0] = 2;                  // 2 rows
dims[1] = 3;                  // 3 columns
MPI_Dims_create( nproc, 2, dims );
int periods[2];
periods[0] = 1;               // Row-periodic
periods[1] = 0;               // Column-non-periodic
int reorder = 1; // Re-order allowed
MPI_Comm grid_comm;
MPI_Cart_create( MPI_COMM_WORLD, 2, dims, periods, reorder, &grid_comm);
Process Topologies (2)
int my_coords[2];             // Cartesian process coordinates
int my_rank;                  // Process rank
int right_nbr[2];             // Coordinates of the neighbour one step along dimension 0
int right_nbr_rank;           // Rank of that neighbour
MPI_Cart_get( grid_comm, 2, dims, periods, my_coords);
MPI_Cart_rank( grid_comm, my_coords, &my_rank);
right_nbr[0] = my_coords[0] + 1;   // wraps around, since dimension 0 was declared periodic
right_nbr[1] = my_coords[1];
MPI_Cart_rank( grid_comm, right_nbr, &right_nbr_rank );
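An alternative way to find neighbour ranks (a sketch, not something the slides require) is MPI_Cart_shift, which returns both neighbours along one dimension and yields MPI_PROC_NULL at the border of a non-periodic dimension:

int prev_rank, next_rank;
// Shift by +1 along dimension 0: ranks of the processes before and after this one.
MPI_Cart_shift( grid_comm, 0, 1, &prev_rank, &next_rank );
// MPI_PROC_NULL can be passed to MPI_Send / MPI_Recv, where it acts as a no-op,
// so border processes need no special-case code.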
Collective Communication (CC)
...
// One processor
for (int j = 1; j < nproc; j++) {
    MPI_Send( &message, sizeof(message_t), ... );
}
...
// All the others
MPI_Recv( &message, sizeof(message_t), ... );
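This hand-written loop is exactly what a collective replaces. Assuming message_t has been committed as an MPI datatype message_mpi_t (an assumption; see the type example above), the one-to-all send becomes a single call made by every process:

// Every process, root included, makes the same call; the count is in elements, not bytes.
MPI_Bcast( &message, 1, message_mpi_t, 0 /* root */, MPI_COMM_WORLD );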
CC: Scatter / Gather
sendbuf = (int *) malloc( nproc * stride * sizeof(int) );
displs  = (int *) malloc( nproc * sizeof(int) );
scounts = (int *) malloc( nproc * sizeof(int) );
for (i = 0; i < nproc; ++i) {
    displs[i]  = ...
    scounts[i] = ...
}
MPI_Scatterv( sendbuf, scounts, displs, MPI_INT,
              rbuf, 100, MPI_INT, root, comm );
Distributing (unevenly sized) chunks of data
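The inverse, collecting unevenly sized chunks back to the root, uses the same displs / scounts arrays on the receiving side; in this sketch sbuf, mycount and recvbuf are illustrative names:

// Each rank contributes 'mycount' elements from sbuf; the root expects scounts[i]
// elements from rank i and places them at offset displs[i] in recvbuf.
MPI_Gatherv( sbuf, mycount, MPI_INT,
             recvbuf, scounts, displs, MPI_INT, root, comm );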
Summary
Learning goals
o Point-to-point communication
o Probing / Non-blocking send (choose)
o Barriers & Wait = Synchronization
o Derived data types
o Collective communications
o Virtual topologies
Send/Receive modes
o Use with care to keep your code portable, e.g. MPI_Bsend
o "It works there but not here!"
MPI Labs at home?
No problem
www.open-mpi.org
Simple to install, simple to use.
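With a standard Open MPI installation you typically compile with the mpicc wrapper and launch with mpirun, e.g. mpicc my_program.c -o my_program followed by mpirun -np 4 ./my_program (file name and process count illustrative).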