Distributed Processing Systems
InterProcess Communication (Message Passing)

오 상 규
Sogang University, Graduate School of Information and Communication
Email : sgoh@macroimpact.com
What is Message Passing?

Message passing = data transfer + synchronization.

[Diagram: over time, Process 0 asks Process 1 "May I send?", Process 1 answers "Yes!", and the DATA then flows from sender to receiver.]

Message passing requires the cooperation of both sender and receiver.
Characteristics of Message Passing

Multiple Threads of Control
  A message passing program consists of multiple processes, each of which has its own thread of control and may execute different code. Supports MIMD or SPMD parallelism.
Asynchronous Parallelism
  A message passing program executes asynchronously; barriers and blocking communication are needed for synchronization.
Separate Address Spaces
  Data variables in one process are not visible to other processes. Special library routines (e.g., send/receive) are needed to interact with other processes.
Explicit Interactions
  The programmer must resolve all interaction issues, such as communication and synchronization.
Explicit Allocation
  Data must be explicitly allocated to processes by the user.
Message Passing Libraries

Proprietary Software
  CMMD : Message passing library used in the Thinking Machines CM-5.
  Express : Programming environment by Parasoft Corporation for message passing and parallel I/O.
  NX : Microkernel system developed for Intel MPPs (e.g., the Paragon). Replaced by a new kernel called PUMA.
Public-Domain Software
  p4 : A set of macros and subroutines for programming both shared-memory and message passing systems.
  PARMACS : Message passing package derived from p4, mainly used in Europe.
PVM and MPI
  MPI : A standard specification for a library of message passing functions, developed by the MPI Forum.
  PVM : Self-contained, public-domain software system for running parallel applications on a network of heterogeneous workstations.
Classification of Message Passing Libraries

Application Domain
  General Purpose : p4, PVM, MPI, Express, PARMACS, etc.; ISIS, Horus, Totem, and Transis for reliable group communication.
  Application Specific : BLACS (for linear algebra), TCGMSG (for chemistry), etc.
Programming Model
  Computation Model : data parallel or functional parallel.
  Communication Model : RPC, message passing, or shared memory.
Underlying Implementation Philosophy
  Sockets for portability; high-performance communication middleware (e.g., Active Messages or Fast Messages) to achieve high performance.
Portability
  e.g., CMMD is tied to the CM-5 and NX/2 to Intel parallel computers.
Heterogeneity
  e.g., PVM runs on networks of heterogeneous workstations.
High-Performance Message-Passing Schemes

HW-Based Approach (e.g., Nectar, PAPER, SHRIMP, ParaStation, Memory Channel)
SW-Based Approach
  Middleware
    Standard (e.g., Fast Sockets)
    Proprietary (e.g., U-Net, Active Messages (AM), Fast Messages (FM))
  Multithreading (e.g., TPVM, LPVM, Chant)
  High-Performance API (e.g., MPI-FM, PVM-ATM)
  Hybrid Approach (e.g., MPI-Nexus, Panda-PVM)
Communication Modes in Message Passing

Three modes: synchronous message passing, blocking send/receive, and non-blocking send/receive. Consider the running example below.

    Process P                   Process Q

        M = 10;                     S = -100;
    L1: send M to Q;            L1: receive S from P;
    L2: M = 20;                 L2: X = S + 1;
        goto L1;

Variable M is often called the send message buffer and S the receive message buffer.
Three Communication Modes

Synchronous Message Passing
  P has to wait until Q executes a corresponding Receive. Send does not return until M has been both sent and received, so no additional buffer is needed. X evaluates to 11.
Blocking Send/Receive
  Send is executed when a process reaches it, without waiting for a corresponding Receive; it does not return until the message is sent, meaning that the message variable M can then be safely rewritten. The message may be temporarily buffered in the sending node, somewhere in the network, or in the receiving node. Receive is executed when a process reaches it, without waiting for a corresponding Send; it does not return until the message is received. X evaluates to 11.
Non-Blocking Send/Receive
  Send is executed when a process reaches it, without waiting for a corresponding Receive, and returns immediately after notifying the system; it is unsafe to overwrite M. Receive is executed when a process reaches it, without waiting for a corresponding Send, and returns immediately regardless of message arrival. X can be 11, 21, or -99.
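A minimal sketch of the non-blocking hazard, using the MPI calls introduced later in these notes (the destination rank 1 and tag 0 are arbitrary assumptions): M must not be overwritten until the send request completes.

    int M = 10 ;
    MPI_Request req ;
    MPI_Status st ;
    MPI_Isend ( &M, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req ) ;
    /* M = 20 ;  unsafe here: the system may not have copied M yet */
    MPI_Wait ( &req, &st ) ;
    M = 20 ;    /* safe: the send request has completed */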
Comparison of Communication Modes

Communication Event          Synchronous              Blocking        Non-Blocking
---------------------------  -----------------------  --------------  ----------------------
Send start condition         Both send and receive    Send reached    Send reached
                             reached
Return of send indicates     Message received         Message sent    Message send initiated
Semantics                    Clean                    In-between      Error-prone
Buffering message            Not needed               Needed          Needed
Status checking              Not needed               Not needed      Needed
Wait overhead                Highest                  In-between      Lowest
Overlapping communications   No                       Yes             Yes
and computations
What is MPI?

MPI : Message Passing Interface
  Developed in 1993-1994 by the MPI Forum.
  A message-passing library specification (comprises 129 functions and macros).
  Can be used in C, FORTRAN, and C++ programs.
  Not a language or compiler specification.
  Not a specific implementation or product.
  A standard for programming parallel computers, clusters, and heterogeneous networks.
Reasons for Using MPI

Standardization
  The only message passing library that can be considered a standard.
Portability
  No need to modify your source code when you port your application to a different platform.
Performance
  Vendor implementations can exploit native hardware features to optimize performance.
Availability
  A variety of implementations are available.
Communicator

A communicator defines a subset of processes as a "communication universe." It is composed of:
  Group : an ordered collection of processes.
  Context : a system-defined tag attached to the group.

[Diagram: a communicator enclosing PROCESS 0, PROCESS 1, ..., PROCESS n]

Communicators serve two purposes:
  - identifying process subsets during development of modular programs;
  - ensuring that messages intended for different purposes are not confused.

Within a communicator, each process is assigned a unique rank (a non-negative integer process ID).
Types of Communicators

Intra-Communicators
  A collection of processes that can send messages to each other and engage in collective communication operations.
  ex) MPI_COMM_WORLD (the default)
Inter-Communicators
  Used for sending messages between processes belonging to disjoint intra-communicators.
  ex) a newly created intra-communicator can be linked to the original intra-communicator by an inter-communicator.
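A minimal sketch of how such communicators can be created (the even/odd split, the leader ranks, and the tag 99 are illustrative assumptions, not from the slides): MPI_Comm_split builds two disjoint intra-communicators, and MPI_Intercomm_create links them with an inter-communicator.

    int rank ;
    MPI_Comm half, inter ;
    MPI_Comm_rank ( MPI_COMM_WORLD, &rank ) ;
    /* split MPI_COMM_WORLD into an even-rank and an odd-rank intra-communicator */
    MPI_Comm_split ( MPI_COMM_WORLD, rank % 2, rank, &half ) ;
    /* link the two halves: the remote leader is world rank 1 for the even
       group and world rank 0 for the odd group */
    MPI_Intercomm_create ( half, 0, MPI_COMM_WORLD,
                           ( rank % 2 == 0 ) ? 1 : 0, 99, &inter ) ;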
MPI Communication Model

Point-to-point communication operations
  Send a message from one named process to another. Used to implement local and unstructured communications.
Collective communication operations
  Perform commonly used global operations such as summation and broadcast.
MPI Data Types

C data type           MPI data type
-------------------   ------------------
signed char           MPI_CHAR
signed short int      MPI_SHORT
signed int            MPI_INT
signed long int       MPI_LONG
unsigned char         MPI_UNSIGNED_CHAR
unsigned short int    MPI_UNSIGNED_SHORT
unsigned int          MPI_UNSIGNED
unsigned long int     MPI_UNSIGNED_LONG
float                 MPI_FLOAT
double                MPI_DOUBLE
long double           MPI_LONG_DOUBLE
(no C equivalent)     MPI_BYTE
(no C equivalent)     MPI_PACKED
MPI Basic Functions

MPI_INIT(int *argc, char ***argv) : initiate an MPI computation
MPI_FINALIZE() : terminate a computation
MPI_COMM_SIZE(IN comm, OUT size) : determine the number of processes
  MPI_Comm comm : communicator handle
  int size : number of processes in the group of comm
MPI_COMM_RANK(IN comm, OUT pid) : determine my process identifier
  MPI_Comm comm : communicator handle
  int pid : process id in the group of comm

Cf. IN : call by value, OUT : as return value, INOUT : call by reference
Simple MPI Example

#include "mpi.h"                                /* MPI header */

main ( int argc, char *argv[] )                 /* main routine */
{
    . . .                                       /* No MPI functions called before this */
    ierr = MPI_Init ( &argc, &argv ) ;          /* initialize */
    . . .
    MPI_Comm_size ( MPI_COMM_WORLD, &np ) ;     /* number of processes */
    MPI_Comm_rank ( MPI_COMM_WORLD, &myid ) ;   /* my process id */
    . . .
    if ( myid != 0 )
        MPI_Send ( buff, 300, MPI_FLOAT, 0, 0, MPI_COMM_WORLD ) ;
    else
        MPI_Recv ( buff, 300, MPI_FLOAT, srcid, 0, MPI_COMM_WORLD, &status ) ;
    . . .
    MPI_Finalize ( ) ;                          /* shutdown: no MPI functions called after this */
}
MPI Message

MPI Message : Data + Envelope

The envelope consists of:
  - the sender rank
  - the receiver rank
  - a tag (user specified) : used to distinguish messages received from a single process
  - a communicator

Mechanisms for grouping data items:
  - the count parameter
  - derived datatypes
  - MPI_Pack / MPI_Unpack
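A minimal sketch of the MPI_Pack / MPI_Unpack mechanism listed above (the 100-byte buffer, the peer ranks, and the tag are illustrative assumptions): an int and a double are packed into one buffer, sent as a single MPI_PACKED message, and unpacked in the same order.

    char buf[100] ;
    int pos = 0, n = 5 ;
    double x = 3.14 ;
    MPI_Status st ;

    /* sender (rank 0): pack two items, then send the packed bytes */
    MPI_Pack ( &n, 1, MPI_INT, buf, 100, &pos, MPI_COMM_WORLD ) ;
    MPI_Pack ( &x, 1, MPI_DOUBLE, buf, 100, &pos, MPI_COMM_WORLD ) ;
    MPI_Send ( buf, pos, MPI_PACKED, 1, 0, MPI_COMM_WORLD ) ;

    /* receiver (rank 1): receive the packed bytes, then unpack in order */
    MPI_Recv ( buf, 100, MPI_PACKED, 0, 0, MPI_COMM_WORLD, &st ) ;
    pos = 0 ;
    MPI_Unpack ( buf, 100, &pos, &n, 1, MPI_INT, MPI_COMM_WORLD ) ;
    MPI_Unpack ( buf, 100, &pos, &x, 1, MPI_DOUBLE, MPI_COMM_WORLD ) ;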
MPI Point-to-Point Communication

[Diagram: within a communicator, PROCESS A calls SEND( ) and PROCESS B calls RECV( ); the transfer uses either blocking or non-blocking communication.]
MPI Send / Receive Function Prototypes

MPI_SEND(IN msg, IN count, IN datatype, IN dest, IN tag, IN comm) : send a message
  void *msg : address of send buffer
  int count : number of elements to send (>= 0)
  MPI_Datatype datatype : data type of send buffer elements
  int dest : process id of destination process
  int tag : message tag
  MPI_Comm comm : communicator handle

MPI_RECV(OUT msg, IN count, IN datatype, IN source, IN tag, IN comm, OUT status) : receive a message
  void *msg : address of receive buffer
  int count : number of elements to receive (>= 0)
  MPI_Datatype datatype : data type of receive buffer elements
  int source : process id of source process, or MPI_ANY_SOURCE
  int tag : message tag, or MPI_ANY_TAG
  MPI_Comm comm : communicator handle
  MPI_Status *status : status object
Blocking Message Passing Example

#include "mpi.h"
#include <stdio.h>

main ( int argc, char *argv[] )
{
    int numtasks, rank, dest, source, rc, tag = 1 ;
    char inmsg, outmsg = 'x' ;
    MPI_Status Stat ;

    MPI_Init ( &argc, &argv ) ;
    MPI_Comm_size ( MPI_COMM_WORLD, &numtasks ) ;
    MPI_Comm_rank ( MPI_COMM_WORLD, &rank ) ;

    if ( rank == 0 ) {
        dest = 1 ; source = 1 ;
        rc = MPI_Send ( &outmsg, 1, MPI_CHAR, dest, tag, MPI_COMM_WORLD ) ;
        rc = MPI_Recv ( &inmsg, 1, MPI_CHAR, source, tag, MPI_COMM_WORLD, &Stat ) ;
    }
    else if ( rank == 1 ) {
        dest = 0 ; source = 0 ;
        rc = MPI_Recv ( &inmsg, 1, MPI_CHAR, source, tag, MPI_COMM_WORLD, &Stat ) ;
        rc = MPI_Send ( &outmsg, 1, MPI_CHAR, dest, tag, MPI_COMM_WORLD ) ;
    }
    MPI_Finalize ( ) ;
}

Note the ordering: rank 0 sends first while rank 1 receives first, so the two blocking calls cannot deadlock.
Non-Blocking Message Passing Example

#include "mpi.h"
#include <stdio.h>

main ( int argc, char *argv[] )
{
    int numtasks, rank, next, prev, buf[2], tag1 = 1, tag2 = 2 ;
    MPI_Request reqs[4] ;
    MPI_Status stats[4] ;

    MPI_Init ( &argc, &argv ) ;
    MPI_Comm_size ( MPI_COMM_WORLD, &numtasks ) ;
    MPI_Comm_rank ( MPI_COMM_WORLD, &rank ) ;

    /* nearest neighbors on a ring */
    prev = rank - 1 ;
    next = rank + 1 ;
    if ( rank == 0 ) prev = numtasks - 1 ;
    if ( rank == ( numtasks - 1 ) ) next = 0 ;

    MPI_Irecv ( &buf[0], 1, MPI_INT, prev, tag1, MPI_COMM_WORLD, &reqs[0] ) ;
    MPI_Irecv ( &buf[1], 1, MPI_INT, next, tag2, MPI_COMM_WORLD, &reqs[1] ) ;

    MPI_Isend ( &rank, 1, MPI_INT, prev, tag2, MPI_COMM_WORLD, &reqs[2] ) ;
    MPI_Isend ( &rank, 1, MPI_INT, next, tag1, MPI_COMM_WORLD, &reqs[3] ) ;

    MPI_Waitall ( 4, reqs, stats ) ;    /* block until all four requests complete */

    MPI_Finalize ( ) ;
}
MPI Collective Communication

A communication pattern that involves all the processes in a communicator.

Tree-structured communication

[Diagram: a broadcast tree over P0-P7: P0 sends to P4; then P0 and P4 send to P2 and P6; then P0, P2, P4, P6 send to P1, P3, P5, P7.]

If we have p processes, this procedure allows us to distribute the input data in log2(p) stages, rather than p-1 stages, which, if p is large, is a huge saving.
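A minimal sketch of such a tree-structured broadcast built from point-to-point calls (assumes the data starts on rank 0; the function name tree_bcast is hypothetical): in stage s, every rank below 2^s already holds the data and forwards it to rank + 2^s, so all p ranks are reached in log2(p) stages.

    void tree_bcast ( int *data, MPI_Comm comm )
    {
        int rank, p, step ;
        MPI_Status st ;

        MPI_Comm_rank ( comm, &rank ) ;
        MPI_Comm_size ( comm, &p ) ;

        for ( step = 1 ; step < p ; step *= 2 ) {
            if ( rank < step ) {
                if ( rank + step < p )       /* already have the data: pass it on */
                    MPI_Send ( data, 1, MPI_INT, rank + step, 0, comm ) ;
            }
            else if ( rank < 2 * step )      /* receive from rank - step */
                MPI_Recv ( data, 1, MPI_INT, rank - step, 0, comm, &st ) ;
        }
    }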
Barrier and Broadcast Operations

MPI_BARRIER(IN comm) : synchronizes all processes.
MPI_BCAST(INOUT inbuf, IN incnt, IN intype, IN root, IN comm) : sends data from one process to all processes.

[Diagram: MPI_BCAST copies the root's data item A0 to every process in the communicator.]
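A minimal usage sketch (root 0 and the four-element buffer are illustrative assumptions): after the call, every process in the communicator holds the root's copy of the buffer.

    int a[4] ;                                     /* filled in by root 0 */
    MPI_Bcast ( a, 4, MPI_INT, 0, MPI_COMM_WORLD ) ;
    MPI_Barrier ( MPI_COMM_WORLD ) ;               /* optional: synchronize all processes */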
Gather and Scatter Operations

MPI_GATHER(IN inbuf, IN incnt, IN intype, OUT outbuf, IN outcnt, IN outtype, IN root, IN comm) : gathers data from all processes to one process.

[Diagram: MPI_GATHER collects one item (A0, A1, ...) from each process into a single row A0 A1 ... on the root.]

MPI_SCATTER(IN inbuf, IN incnt, IN intype, OUT outbuf, IN outcnt, IN outtype, IN root, IN comm) : scatters data from one process to all processes.

[Diagram: MPI_SCATTER distributes the root's row A0 A1 A2 ..., one element to each process.]
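A minimal usage sketch of MPI_GATHER (root 0 and the 64-slot receive buffer are illustrative assumptions): each process contributes one int, and the root receives them in rank order.

    int rank, all[64] ;                            /* assumes at most 64 processes */
    MPI_Comm_rank ( MPI_COMM_WORLD, &rank ) ;
    MPI_Gather ( &rank, 1, MPI_INT, all, 1, MPI_INT, 0, MPI_COMM_WORLD ) ;
    /* on rank 0, all[i] == i for every process i */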
Reduce Operation (1)

MPI_REDUCE(IN inbuf, OUT outbuf, IN cnt, IN type, IN op, IN root, IN comm) : combines the values into the output buffer of the single root process using the specified operation.

Example (MPI_REDUCE with MPI_SUM, root = 1):

  Initial data :   PROCESS 0    PROCESS 1    PROCESS 2    PROCESS 3
                     2  4         5  7         0  3         6  2
  Result :           -  -        13 16         -  -         -  -
Reduce Operation (2)

MPI_ALLREDUCE(IN inbuf, OUT outbuf, IN cnt, IN type, IN op, IN comm) : combines the values into the output buffers of all processes using the specified operation.

Example (MPI_ALLREDUCE with MPI_MIN):

  Initial data :   PROCESS 0    PROCESS 1    PROCESS 2    PROCESS 3
                     2  4         5  7         0  3         6  2
  Result :           0  2         0  2         0  2         0  2
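A minimal sketch reproducing the two examples above (assumes exactly 4 processes, matching the slides): MPI_Reduce with MPI_SUM leaves {13, 16} on root 1, and MPI_Allreduce with MPI_MIN leaves {0, 2} on every process.

    #include "mpi.h"
    #include <stdio.h>

    main ( int argc, char *argv[] )
    {
        int init[4][2] = { { 2, 4 }, { 5, 7 }, { 0, 3 }, { 6, 2 } } ;
        int in[2], sum[2], min[2], rank ;

        MPI_Init ( &argc, &argv ) ;
        MPI_Comm_rank ( MPI_COMM_WORLD, &rank ) ;          /* assumes 4 processes */
        in[0] = init[rank][0] ;  in[1] = init[rank][1] ;

        MPI_Reduce ( in, sum, 2, MPI_INT, MPI_SUM, 1, MPI_COMM_WORLD ) ;
        MPI_Allreduce ( in, min, 2, MPI_INT, MPI_MIN, MPI_COMM_WORLD ) ;

        if ( rank == 1 )
            printf ( "sum on root : %d %d \n", sum[0], sum[1] ) ;      /* 13 16 */
        printf ( "min on rank %d : %d %d \n", rank, min[0], min[1] ) ; /* 0 2 */

        MPI_Finalize ( ) ;
    }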
Collective Communication Example

#include "mpi.h"
#include <stdio.h>
#define SIZE 4

main ( int argc, char *argv[] )
{
    int numtasks, rank, sendcount, recvcount, source ;
    float sbuf[SIZE][SIZE] = { {  1.0,  2.0,  3.0,  4.0 } ,
                               {  5.0,  6.0,  7.0,  8.0 } ,
                               {  9.0, 10.0, 11.0, 12.0 } ,
                               { 13.0, 14.0, 15.0, 16.0 } } ;
    float rbuf[SIZE] ;

    MPI_Init ( &argc, &argv ) ;
    MPI_Comm_rank ( MPI_COMM_WORLD, &rank ) ;
    MPI_Comm_size ( MPI_COMM_WORLD, &numtasks ) ;

    if ( numtasks == SIZE ) {
        source = 1 ;                /* root of the scatter */
        sendcount = SIZE ;
        recvcount = SIZE ;
        MPI_Scatter ( sbuf, sendcount, MPI_FLOAT, rbuf, recvcount,
                      MPI_FLOAT, source, MPI_COMM_WORLD ) ;
        printf ( "rank=%d results : %f %f %f %f \n",
                 rank, rbuf[0], rbuf[1], rbuf[2], rbuf[3] ) ;
    }
    else
        printf ( "Must specify %d processors. Terminating. \n", SIZE ) ;

    MPI_Finalize ( ) ;
}
MPI Implementation (1)

MPICH
  A freely available implementation of the MPI standard, developed at Argonne National Laboratory and designed to be both portable and efficient.
  To compile the C source program prog.c :
    % cc -o prog prog.c -I/usr/local/mpi/include -L/usr/local/mpi/lib -lmpi
  To run the program with 4 processes :
    % mpirun -np 4 prog
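Most MPICH installations also provide a compiler wrapper that supplies the include and library paths automatically; assuming the wrapper is installed, the compile step reduces to:

    % mpicc -o prog prog.c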
MPI Implementation (2)

LAM
  Available from the Ohio Supercomputer Center; runs on heterogeneous networks of Sun, DEC, SGI, and HP workstations.
CHIMP-MPI
  Available from the Edinburgh Parallel Computing Centre; runs on Sun, DEC, SGI, IBM, and HP workstations, the Meiko Computing Surface machines, and the Fujitsu AP-1000.
MPI Interaction Architecture

MPICH is layered:
  MPI (Message Passing Interface) : machine-independent layer
  ADI (Abstract Device Interface) : machine-specific layer
    - provides efficient communication primitives
    - optimizations apply to the ADI and the higher layers of MPICH
  Underlying messaging systems : AM (Active Messages), FM (Fast Messages), etc.
MPI 2 (1)

Enhanced MPI
  Discussed in 1995 by the MPI Forum; a draft was made available in 1996.

New functionality
  Dynamic processes : extensions that remove the static process model of MPI. (e.g., MPI_SPAWN)
  One-sided communications : include shared-memory operations (put/get) and remote accumulate operations. (e.g., MPI_PUT)
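A minimal sketch of MPI-2 dynamic process creation (the "worker" executable name and the count of 4 are illustrative assumptions; in the C binding the routine is spelled MPI_Comm_spawn): four new processes are started and linked back to the parents by an inter-communicator.

    MPI_Comm children ;
    MPI_Comm_spawn ( "worker", MPI_ARGV_NULL, 4, MPI_INFO_NULL,
                     0, MPI_COMM_WORLD, &children, MPI_ERRCODES_IGNORE ) ;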
MPI 2 (2)

Parallel I/O : MPI support for parallel I/O (MPI-IO). I/O can also be modeled as message passing:
  - writing to a file : sending a message
  - reading from a file : receiving a message

Extended collective operations : allow non-blocking collective operations and the application of collective operations to inter-communicators.

External interfaces : define routines that allow developers to layer tools, such as debuggers and profilers, on top of MPI.

Additional language bindings : describe C++ bindings and discuss FORTRAN-90 issues.