
Programming using MPI

Preeti Malakar, Indian Institute of Technology Kanpur

National Supercomputing Mission
Centre for Development of Advanced Computing
SKA India
IIT Kharagpur

High Performance Computing for Astronomy and Astrophysics

FLOPS

Programming Supercomputers

PARAM Shakti, IIT Kharagpur

Top 500 Supercomputers

Parallel Programming Models

Shared memory programming – OpenMP, Pthreads

[Diagram: threads mapped to cores sharing a cache]

Message Passing Interface

• Standard for message passing in a distributed memory environment (most widely used programming model in supercomputers)

• Efforts began in 1991 by Jack Dongarra, Tony Hey, and David W. Walker

• MPI Forum formed in 1993

• Version 1.0: 1994

• Version 2.0: 1997

• Version 3.0: 2012

Parallel Programming Models

Distributed memory programming – MPI (Message Passing Interface)

[Diagram: distributed memory – each process runs on its own core and has its own address space]

Adapted from Neha Karanjkar's slides

What is MPI?

Enables parallel programming using message passing


Distinct process address spaces; instructions within each process execute in program order.

Example A:
Process 0: x = 1, y = 2; ... x += 2; ... print x, y   → prints 3, 2
Process 1: x = 1, y = 2; ... y++;    ... print x, y   → prints 1, 3

Example B:
Process 0: x = 1, y = 2;   ... x += 1; ... print x, y → prints 2, 2
Process 1: x = 10, y = 20; ... x++;    ... print x, y → prints 11, 20

Parallel Execution – Single Node

[Diagram: several copies of parallel.x run as separate processes on one node, one process per core, each executing its own instruction stream and sharing the node's cache]

Parallel Execution – Multiple Nodes

[Diagram: copies of parallel.x run as separate processes on the cores of multiple nodes; each node has its own memory]

Message Passing

[Diagram: processes 0, 1, 2 and 3 exchanging messages]

Getting Started

Initialization
• gather information about the parallel job
• set up internal library state
• prepare for communication

Finalization

Process Identification


• MPI_Comm_size: get the total number of processes

• MPI_Comm_rank: get my rank
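A minimal complete program tying these calls together (a sketch; MPI_Init and MPI_Finalize perform the initialization and finalization described above):

#include <stdio.h>
#include <mpi.h>

int main (int argc, char *argv[])
{
    int size, rank;

    MPI_Init (&argc, &argv);                 /* initialize: set up internal library state */
    MPI_Comm_size (MPI_COMM_WORLD, &size);   /* total number of processes */
    MPI_Comm_rank (MPI_COMM_WORLD, &rank);   /* my rank, 0 .. size-1 */

    printf ("Hello from rank %d of %d\n", rank, size);

    MPI_Finalize ();                         /* clean up before exit */
    return 0;
}

How to compile and run such a program is shown next.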

MPI Code Execution Steps (On cluster)

• Compile
  mpicc -o program.x program.c

• Execute
  mpirun -np 1 ./program.x (or mpiexec in place of mpirun)
  Runs 1 process on the launch node

  mpirun -np 6 ./program.x
  Runs 6 processes on the launch node

  mpirun -np 6 -f hostfile ./program.x
  Runs 6 processes on the nodes specified in the hostfile

<hostfile>
Host1:3
Host2:2
Host3:1
…

Multiple Processes on Multiple Cores and Nodes

mpiexec -np 8 -f hostfile ./program.x

Hostfile:
Node1:2
Node2:2
Node3:2
Node4:2

[Diagram: ranks 0-7 placed two per node on Node 1 to Node 4, one process per core]

How?

Multiple Processes on Multiple Cores and Nodes

mpiexec -np 8 -f hostfile ./program.x

Hostfile:
Node1:2
Node2:2
Node3:2
Node4:2

[Diagram: the launch node starts a proxy on each node; each proxy spawns the local MPI processes (ranks 0-7, two per node)]

MPI Process Spawning

The child process and the parent process run in separate memory spaces. At the time of fork() both memory spaces have the same content. The entire virtual address space of the parent is replicated in the child.
https://man7.org/linux/man-pages/man2/fork.2.html

The exec() family of functions replaces the current process image with a new process image.
https://linux.die.net/man/3/execvp
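As a rough illustration only (not the actual mpiexec implementation), a launcher could combine fork() and execvp() like this to start several local copies of parallel.x:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main (void)
{
    int nprocs = 4;                        /* number of local processes to spawn (illustrative) */
    for (int i = 0; i < nprocs; i++) {
        pid_t pid = fork ();               /* child starts with a copy of the parent's address space */
        if (pid == 0) {
            /* replace the child's image with the MPI application binary */
            char *args[] = {"./parallel.x", NULL};
            execvp (args[0], args);
            perror ("execvp");             /* reached only if exec fails */
            exit (1);
        }
    }
    for (int i = 0; i < nprocs; i++)
        wait (NULL);                       /* parent waits for all children */
    return 0;
}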

SIMD Parallelism (Recap)

[Diagram: parallel.x – a single process executing its instruction stream on a core with its own memory]

NO centralized server/master (typically)

Sum of Squares of N numbers

Serial
for i = 1 to N
    sum += a[i] * a[i]

Parallel (each of P processes)
for i = 1 to N/P
    sum += a[i] * a[i]

[Diagram: ranks 0, 1, 2, 3 each run the N/P-element loop on their own core and memory]

Per-process loop written in terms of the rank (see the sketch below):
for i = N/P * rank ; i < N/P * (rank+1) ; i++
    localsum += a[i] * a[i]

Collect localsum, add up at one of the ranks
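A possible MPI version of this computation (a sketch; assumes N is divisible by the number of processes and uses MPI_Reduce, introduced later in these slides, to collect the partial sums at rank 0):

#include <stdio.h>
#include <mpi.h>

#define N 1024                               /* problem size (illustrative) */

int main (int argc, char *argv[])
{
    int rank, size, i;
    double a[N], localsum = 0.0, sum = 0.0;

    MPI_Init (&argc, &argv);
    MPI_Comm_rank (MPI_COMM_WORLD, &rank);
    MPI_Comm_size (MPI_COMM_WORLD, &size);

    for (i = 0; i < N; i++)                  /* every rank fills the array the same way here */
        a[i] = i;

    /* each rank squares and sums its own N/P block */
    for (i = (N/size) * rank; i < (N/size) * (rank+1); i++)
        localsum += a[i] * a[i];

    /* collect the partial sums and add them up at rank 0 */
    MPI_Reduce (&localsum, &sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf ("sum of squares = %f\n", sum);

    MPI_Finalize ();
    return 0;
}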

MPI Installation

1. sudo apt-get install mpich

OR

1. Download source code from https://www.mpich.org/downloads/
2. ./configure --prefix=<INSTALL_PATH>
3. make
4. make install

MPI Code

Global communicator: MPI_COMM_WORLD, required in every MPI communication

Communication Scope


Communicator (communication handle)

• Defines the scope

• Specifies communication context

Process

• Belongs to a group

• Identified by a rank within a group

Identification

• MPI_Comm_size – total number of processes in communicator

• MPI_Comm_rank – rank of the calling process within the communicator

[Diagram: MPI_COMM_WORLD containing five processes with ranks 0, 1, 2, 3, 4; each process is identified by its rank/id within the communicator]

MPI Message

• Data and header/envelope

• Typically, MPI communications send/receive messages

Message Envelope
• Source: origin of message
• Destination: receiver of message
• Communicator
• Tag (0:MPI_TAG_UB)

MPI Communication Types

• Point-to-point
• Collective

Point-to-point Communication

• MPI_Send

• MPI_Recv


int MPI_Send (const void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm)

int MPI_Recv (void *buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Status *status)

Tags should match

Blocking send and receive

MPI_Datatype

• MPI_BYTE

• MPI_CHAR

• MPI_INT

• MPI_FLOAT

• MPI_DOUBLE


Example 1

char message[20];
int myrank;
MPI_Status status;

MPI_Comm_rank (MPI_COMM_WORLD, &myrank);

// Sender process
if (myrank == 0) {                     /* code for process 0 */
    strcpy (message, "Hello, there");
    MPI_Send (message, strlen(message)+1, MPI_CHAR, 1, 99, MPI_COMM_WORLD);
}
// Receiver process
else if (myrank == 1) {                /* code for process 1 */
    MPI_Recv (message, 20, MPI_CHAR, 0, 99, MPI_COMM_WORLD, &status);
    printf ("received :%s\n", message);
}

99 is the message tag.

MPI_Status

• Source rank

• Message tag

• Message length – MPI_Get_count
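For illustration (assumed, continuing Example 1), the receiver can query these fields after MPI_Recv:

int count;
MPI_Recv (message, 20, MPI_CHAR, 0, 99, MPI_COMM_WORLD, &status);
MPI_Get_count (&status, MPI_CHAR, &count);   /* number of MPI_CHAR elements received */
printf ("source = %d, tag = %d, length = %d\n",
        status.MPI_SOURCE, status.MPI_TAG, count);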

MPI_ANY_*

• MPI_ANY_SOURCE
  Receiver may specify wildcard value for source

• MPI_ANY_TAG
  Receiver may specify wildcard value for tag

Example 2

MPI_Comm_rank (MPI_COMM_WORLD, &myrank);

// Sender process
if (myrank == 0) {                     /* code for process 0 */
    strcpy (message, "Hello, there");
    MPI_Send (message, strlen(message)+1, MPI_CHAR, 1, 99, MPI_COMM_WORLD);
}
// Receiver process
else if (myrank == 1) {                /* code for process 1 */
    MPI_Recv (message, 20, MPI_CHAR, MPI_ANY_SOURCE, 99, MPI_COMM_WORLD, &status);
    printf ("received :%s\n", message);
}

MPI_Send (Blocking)

• Does not return until buffer can be reused

• Message buffering
  • Implementation-dependent

• Standard communication mode


Safety

Process 0        Process 1
MPI_Send         MPI_Recv
MPI_Send         MPI_Recv        Safe

MPI_Send         MPI_Send
MPI_Recv         MPI_Recv        Unsafe (both may block in send if the messages are not buffered)

MPI_Send         MPI_Recv
MPI_Recv         MPI_Send        Safe

MPI_Recv         MPI_Recv
MPI_Send         MPI_Send        Unsafe (deadlock: both block in receive)
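One common way to turn the unsafe exchange into a safe one (a sketch; the buffers, the message length n and the partner rank other are assumed here) is to order the calls by rank:

int n = 100;                               /* message length (illustrative) */
int sendbuf[100], recvbuf[100];
int other = (rank == 0) ? 1 : 0;           /* assumes two ranks exchanging data */
MPI_Status status;

if (rank % 2 == 0) {
    /* even ranks send first, then receive */
    MPI_Send (sendbuf, n, MPI_INT, other, 0, MPI_COMM_WORLD);
    MPI_Recv (recvbuf, n, MPI_INT, other, 0, MPI_COMM_WORLD, &status);
} else {
    /* odd ranks receive first, then send */
    MPI_Recv (recvbuf, n, MPI_INT, other, 0, MPI_COMM_WORLD, &status);
    MPI_Send (sendbuf, n, MPI_INT, other, 0, MPI_COMM_WORLD);
}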

Other Send Modes

• MPI_Bsend – Buffered: may complete before matching receive is posted

• MPI_Ssend – Synchronous: completes only if a matching receive is posted

• MPI_Rsend – Ready: may be used only if the matching receive is already posted

Non-blocking Point-to-Point

• MPI_Isend (buf, count, datatype, dest, tag, comm, request)

• MPI_Irecv (buf, count, datatype, source, tag, comm, request)

• MPI_Wait (request, status) – blocks until the operation identified by request completes

Process 0        Process 1
MPI_Isend        MPI_Isend
MPI_Recv         MPI_Recv        Safe

Computation Communication Overlap

[Diagram: MPI_Isend is posted, computation proceeds, and MPI_Wait is called only when the send must be complete; the matching MPI_Recv on the other process also overlaps with computation. Time flows downward.]
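A minimal sketch of the sender's side of this overlap (the buffer, destination and compute routine are illustrative):

double sendbuf[100];
int count = 100, dest = 1;
MPI_Request request;
MPI_Status status;

MPI_Isend (sendbuf, count, MPI_DOUBLE, dest, 0, MPI_COMM_WORLD, &request);  /* returns immediately */

compute ();                     /* useful work that does not modify sendbuf */

MPI_Wait (&request, &status);   /* block only when the send must be complete */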

Collective Communications

• Must be called by all processes that are part of the communicator

Types

• Synchronization (MPI_Barrier)

• Global communication (MPI_Bcast, MPI_Gather, …)

• Global reduction (MPI_Reduce, …)

Barrier

• Synchronization across all group members

• Collective call

• Blocks until all processes have entered the call

• MPI_Barrier (comm)


Broadcast

• Root process sends message to all processes

• Any process can be the root, but it must be the same in all processes

• int MPI_Bcast (buffer, count, datatype, root, comm)

• Number of elements in buffer – count

• buffer – input at the root, output at the other processes

[Diagram: the value X in the root's buffer is copied into the buffer of every process]

Example 3

int rank, size, color;
MPI_Status status;

MPI_Init (&argc, &argv);
MPI_Comm_rank (MPI_COMM_WORLD, &rank);
MPI_Comm_size (MPI_COMM_WORLD, &size);

color = rank + 2;
int oldcolor = color;
MPI_Bcast (&color, 1, MPI_INT, 0, MPI_COMM_WORLD);

printf ("%d: %d color changed to %d\n", rank, oldcolor, color);

Gather

• Gathers values from all processes to a root process

• int MPI_Gather (sendbuf, sendcount, sendtype, recvbuf, recvcount, recvtype, root, comm)

• Arguments recv* not relevant on non-root processes

[Diagram: data items A0, A1, A2 from processes 0, 1, 2 are gathered into the root's receive buffer as A0 A1 A2]

Example 4

MPI_Comm_rank (MPI_COMM_WORLD, &rank);
MPI_Comm_size (MPI_COMM_WORLD, &size);

color = rank + 2;

int colors[size];
MPI_Gather (&color, 1, MPI_INT, colors, 1, MPI_INT, 0, MPI_COMM_WORLD);

if (rank == 0)
    for (i = 0; i < size; i++)
        printf ("color from %d = %d\n", i, colors[i]);

Scatter

• Scatters values to all processes from a root process

• int MPI_Scatter (sendbuf, sendcount, sendtype, recvbuf, recvcount, recvtype, root, comm)

• Arguments send* not relevant on non-root processes

• Output parameter – recvbuf

[Diagram: the root's send buffer A0 A1 A2 is split so that process i receives Ai]
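A short sketch in the style of the earlier examples (assumed, not from the slides): rank 0 scatters one integer to every process:

int sendbuf[size];              /* significant only at the root */
int recvval, i;

if (rank == 0)
    for (i = 0; i < size; i++)
        sendbuf[i] = i * 10;    /* element destined for rank i */

MPI_Scatter (sendbuf, 1, MPI_INT, &recvval, 1, MPI_INT, 0, MPI_COMM_WORLD);
printf ("%d received %d\n", rank, recvval);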

Allgather

• All processes gather values from all processes

• int MPI_Allgather (sendbuf, sendcount, sendtype, recvbuf, recvcount, recvtype, comm)

[Diagram: data items A0, A1, A2 from processes 0, 1, 2 end up as A0 A1 A2 in every process's receive buffer]
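A sketch (assumed) in the style of Example 4, where every process both contributes its value and receives all values:

int color = rank + 2;
int colors[size];
int i;

MPI_Allgather (&color, 1, MPI_INT, colors, 1, MPI_INT, MPI_COMM_WORLD);

for (i = 0; i < size; i++)      /* every rank can now print all colors */
    printf ("%d: color from %d = %d\n", rank, i, colors[i]);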

Reduce

• MPI_Reduce (inbuf, outbuf, count, datatype, op, root, comm)

• Combines the elements in inbuf of each process

• Combined value in outbuf of root

• op: MIN, MAX, SUM, PROD, …

Example (count = 7, op = MAX):
Process 0 inbuf: 0 1 2 3 4 5 6
Process 1 inbuf: 2 1 2 3 2 5 2
Process 2 inbuf: 0 1 1 0 1 1 0
outbuf at root:  2 1 2 3 4 5 6
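A sketch reproducing the MAX example above (assumed; each rank would fill inbuf with its own row of values):

int inbuf[7], outbuf[7];
/* ... fill inbuf on each rank ... */
MPI_Reduce (inbuf, outbuf, 7, MPI_INT, MPI_MAX, 0, MPI_COMM_WORLD);
/* only rank 0 holds the element-wise maximum in outbuf */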

Allreduce

• MPI_Allreduce (inbuf, outbuf, count, datatype, op, comm)

• op: MIN, MAX, SUM, PROD, …

• Combines the elements in inbuf of each process

• Combined value in outbuf of each process

Example (count = 7, op = MAX):
Process 0 inbuf: 0 1 2 3 4 5 6
Process 1 inbuf: 2 1 2 3 2 5 2
Process 2 inbuf: 0 1 1 0 1 1 0
outbuf at every process: 2 1 2 3 4 5 6
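The same example with MPI_Allreduce (a sketch) leaves the result on every rank:

MPI_Allreduce (inbuf, outbuf, 7, MPI_INT, MPI_MAX, MPI_COMM_WORLD);
/* every rank now holds the element-wise maximum in outbuf */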

DEMO


Thanks

pmalakar@cse.iitk.ac.in