MPI and OpenMP
How to get MPI and how to install MPI: http://www-unix.mcs.anl.gov/mpi/
mpich2-doc-install.pdf   // installation guide
mpich2-doc-user.pdf      // user guide
Outline
Message-passing model
Message Passing Interface (MPI)
Coding MPI programs
Compiling MPI programs
Running MPI programs
Benchmarking MPI programs
OpenMP
Message-passing Model
Processes
Number is specified at start-up time
Remains constant throughout execution of program
All execute same program
Each has unique ID number
Alternately performs computations and communicates
Circuit Satisfiability
[Figure: example combinational circuit; for the input combination shown, the output is 0, so the circuit is not satisfied.]
Solution Method
Circuit satisfiability is NP-complete
No known algorithm solves it in polynomial time
We seek all solutions
We find them through exhaustive search
16 inputs yields 2^16 = 65,536 combinations to test
Partitioning: Functional Decomposition
Embarrassingly parallel: no channels between tasks
Agglomeration and Mapping
Properties of parallel algorithm:
Fixed number of tasks
No communications between tasks
Time needed per task is variable
Consult mapping strategy decision tree: map tasks to processors in a cyclic fashion
Cyclic (Interleaved) Allocation
Assume p processes
Each process gets every pth piece of work
Example: 5 processes and 12 pieces of work
P0: 0, 5, 10
P1: 1, 6, 11
P2: 2, 7
P3: 3, 8
P4: 4, 9
Summary of Program Design
Program will consider all 65,536 combinations of 16 boolean inputs
Combinations allocated in cyclic fashion to processes
Each process examines each of its combinations
If it finds a satisfiable combination, it will print it
Include Files
MPI header file
#include <mpi.h>
Standard I/O header file
#include <stdio.h>
Local Variables
int main (int argc, char *argv[]) {
   int i;
   int id;   /* Process rank */
   int p;    /* Number of processes */
   void check_circuit (int, int);
Include argc and argv: they are needed to initialize MPI
One copy of every variable for each process running this program
Initialize MPI
First MPI function called by each process
Not necessarily first executable statement
Allows system to do any necessary setup
MPI_Init (&argc, &argv);
Communicators
Communicator: opaque object that provides message-passing environment for processes
MPI_COMM_WORLD
Default communicator
Includes all processes
Possible to create new communicators
Will do this in Chapters 8 and 9
Communicator
[Figure: the default communicator MPI_COMM_WORLD, shown as a set of six processes with ranks 0 through 5.]
Determine Number of Processes
First argument is the communicator
Number of processes is returned through the second argument
MPI_Comm_size (MPI_COMM_WORLD, &p);
Determine Process Rank
First argument is the communicator
Process rank (in range 0, 1, …, p-1) is returned through the second argument
MPI_Comm_rank (MPI_COMM_WORLD, &id);
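Putting the calls introduced so far together (plus MPI_Finalize, covered below), a minimal runnable sketch looks like this; the printed message is our choice, not from the slides:

#include <mpi.h>
#include <stdio.h>

int main (int argc, char *argv[]) {
   int id;   /* process rank */
   int p;    /* number of processes */

   MPI_Init (&argc, &argv);
   MPI_Comm_size (MPI_COMM_WORLD, &p);
   MPI_Comm_rank (MPI_COMM_WORLD, &id);
   printf ("Process %d of %d\n", id, p);
   MPI_Finalize ();
   return 0;
}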
Replication of Automatic Variables
[Figure: with six processes, each process holds its own copy of the automatic variables; every copy has p = 6, and id ranges from 0 to 5.]
What about External Variables?
int total;
int main (int argc, char *argv[]) {
   int i;
   int id;
   int p;
   …
Where is variable total stored?
Cyclic Allocation of Work
for (i = id; i < 65536; i += p)
   check_circuit (id, i);
Parallelism is outside function check_circuit
It can be an ordinary, sequential function
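The lecture's actual circuit is not reproduced in these notes, so the sketch below uses an illustrative placeholder condition. It follows the usual pattern for this example: a macro extracts bit i of the integer z, so each of the 65,536 values of z encodes one combination of the 16 inputs. Unlike the void prototype on the earlier slide, this version returns 1 or 0 so its result can also feed the MPI_Reduce call shown later:

#include <stdio.h>

/* Return bit i of n (0 or 1) */
#define EXTRACT_BIT(n,i) (((n) & (1 << (i))) ? 1 : 0)

int check_circuit (int id, int z) {
   int v[16];   /* v[i] is bit i of combination z */
   int i;

   for (i = 0; i < 16; i++)
      v[i] = EXTRACT_BIT (z, i);

   /* Placeholder circuit: replace with the real 16-input circuit */
   if ((v[0] || v[1]) && (!v[1] || !v[3]) && (v[2] || v[3]) &&
       (!v[3] || !v[4]) && (v[4] || !v[5])) {
      printf ("%d) ", id);
      for (i = 0; i < 16; i++)
         printf ("%d", v[i]);
      printf ("\n");
      fflush (stdout);   /* flush so output is not lost at exit */
      return 1;
   }
   return 0;
}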
Shutting Down MPI
Call after all other MPI library calls
Allows system to free up MPI resources
MPI_Finalize();
Our Call to MPI_Reduce()
MPI_Reduce (&count, &global_count, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
Only process 0 will get the result
if (!id) printf ("There are %d different solutions\n", global_count);
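Assembled, the pieces above give the whole of main(); here check_circuit is assumed to return the number of solutions it finds (0 or 1), a small change from the void version shown earlier:

#include <mpi.h>
#include <stdio.h>

int check_circuit (int, int);   /* returns 1 if combination i satisfies circuit */

int main (int argc, char *argv[]) {
   int i;
   int id;               /* process rank */
   int p;                /* number of processes */
   int count = 0;        /* solutions found by this process */
   int global_count = 0; /* total solutions (valid on process 0 only) */

   MPI_Init (&argc, &argv);
   MPI_Comm_rank (MPI_COMM_WORLD, &id);
   MPI_Comm_size (MPI_COMM_WORLD, &p);

   for (i = id; i < 65536; i += p)       /* cyclic allocation */
      count += check_circuit (id, i);

   MPI_Reduce (&count, &global_count, 1, MPI_INT, MPI_SUM, 0,
               MPI_COMM_WORLD);
   MPI_Finalize ();

   if (!id)
      printf ("There are %d different solutions\n", global_count);
   return 0;
}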
Benchmarking the Program
MPI_Barrier: barrier synchronization
MPI_Wtick: timer resolution
MPI_Wtime: current time
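A common way to combine the three calls (a self-contained sketch; the loop is stand-in work, not the satisfiability code):

#include <mpi.h>
#include <stdio.h>

int main (int argc, char *argv[]) {
   int i, id, p;
   double elapsed, sum = 0.0;

   MPI_Init (&argc, &argv);
   MPI_Comm_rank (MPI_COMM_WORLD, &id);
   MPI_Comm_size (MPI_COMM_WORLD, &p);

   MPI_Barrier (MPI_COMM_WORLD);   /* line up all processes first */
   elapsed = -MPI_Wtime ();        /* subtract the start time ... */

   for (i = id; i < 65536; i += p) /* the work being timed */
      sum += (double) i;

   elapsed += MPI_Wtime ();        /* ... then add the end time */

   if (!id)
      printf ("Elapsed %f s (sum %.0f, timer resolution %f s)\n",
              elapsed, sum, MPI_Wtick ());
   MPI_Finalize ();
   return 0;
}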
How to form a Ring?
To form a ring of systems, first execute the following command:
[nayan@MPI_system1 ~]$ mpd &
[1] 9152
This makes MPI_system1 the master system of the ring. To list its host name and port, run:
[nayan@MPI_system1 ~]$ mpdtrace -l
MPI_system1_32958 (172.16.1.1)
How to form a Ring? (cont’d)
Then run the following command in a terminal on every other system, passing the master's host name (-h) and port number (-p):
[nayan@MPI_system1 ~]$ mpd -h MPI_system1 -p 32958 &
How to kill a Ring?
To kill the ring, run the following command on the master system:
[nayan@MPI_system1 ~]$ mpdallexit
Compiling MPI Programs
To compile and execute the above program, follow these steps.
First, compile sat1.c by executing the following command on the master system:
[nayan@MPI_system1 ~]$ mpicc -o sat1.out sat1.c
Here mpicc is the MPI wrapper command that compiles sat1.c, and sat1.out is the output file.
Running MPI Programs
Now, to run this output file, type the following command on the master system:
[nayan@MPI_system1 ~]$ mpiexec -n 1 ./sat1.out
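The -n flag sets the number of MPI processes to start. To spread the work over more of the ring, raise it; the count of 4 below is arbitrary:

[nayan@MPI_system1 ~]$ mpiexec -n 4 ./sat1.out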
Benchmarking Results
[Figure: time taken in seconds (0 to 0.035) versus number of processors (1 to 10) for the satisfiability problem.]
OpenMP
OpenMP: an application programming interface (API) for parallel programming on multiprocessors
Compiler directives
Library of support functions
OpenMP works in conjunction with Fortran, C, or C++
Shared-memory Model
[Figure: several processors connected to a single shared memory.]
Processors interact and synchronize with each other through shared variables.
Fork/Join Parallelism
Initially only master thread is active
Master thread executes sequential code
Fork: master thread creates or awakens additional threads to execute parallel code
Join: at end of parallel code, created threads die or are suspended
[Figure: fork/join time line; the master thread repeatedly forks other threads and joins with them when the parallel code ends.]
Shared-memory Model vs. Message-passing Model
Shared-memory model
Number of active threads is 1 at start and finish of program, and changes dynamically during execution
Message-passing model
All processes are active throughout execution of the program
Parallel for Loops
C programs often express data-parallel operations as for loops:
for (i = first; i < size; i += prime)
   marked[i] = 1;
OpenMP makes it easy to indicate when the iterations of a loop may execute in parallel
Compiler takes care of generating code that forks/joins threads and allocates the iterations to threads
Pragmas
Pragma: a compiler directive in C or C++
Stands for “pragmatic information”
A way for the programmer to communicate with the compiler
Compiler is free to ignore pragmas
Syntax:
#pragma omp <rest of pragma>
Parallel for Pragma
Format:
#pragma omp parallel for
for (i = 0; i < N; i++)
   a[i] = b[i] + c[i];
Compiler must be able to verify the run-time system will have information it needs to schedule loop iterations
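A minimal compilable sketch of the pattern (the array size and initialization are our choices, not from the slides); build it with gcc -fopenmp:

#include <stdio.h>
#define N 1000

int main (void) {
   int i;
   double a[N], b[N], c[N];

   for (i = 0; i < N; i++) {   /* initialize inputs sequentially */
      b[i] = i;
      c[i] = 2.0 * i;
   }

   #pragma omp parallel for    /* iterations are divided among threads */
   for (i = 0; i < N; i++)
      a[i] = b[i] + c[i];

   printf ("a[%d] = %f\n", N - 1, a[N - 1]);
   return 0;
}

The loop index i is made private to each thread automatically, and each iteration writes a distinct a[i], so no two threads touch the same element.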
Function omp_get_num_procs
Returns the number of physical processors available for use by the parallel program
int omp_get_num_procs (void)
Function omp_set_num_threads
Uses the parameter value to set the number of threads to be active in parallel sections of code
May be called at multiple points in a program
void omp_set_num_threads (int t)
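A sketch combining the two functions; omp_get_thread_num and omp_get_num_threads are standard OpenMP calls not otherwise covered on these slides:

#include <omp.h>
#include <stdio.h>

int main (void) {
   /* A common idiom: one thread per physical processor */
   int t = omp_get_num_procs ();
   omp_set_num_threads (t);

   #pragma omp parallel
   {
      /* This block runs once per thread */
      if (omp_get_thread_num () == 0)
         printf ("Using %d threads on %d processors\n",
                 omp_get_num_threads (), t);
   }
   return 0;
}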
Comparison
Characteristic                           OpenMP   MPI
Suitable for multiprocessors             Yes      Yes
Suitable for multicomputers              No       Yes
Supports incremental parallelization     Yes      No
Minimal extra code                       Yes      No
Explicit control of memory hierarchy     No       Yes
C+MPI vs. C+MPI+OpenMP
[Figure: side-by-side comparison of C + MPI and C + MPI + OpenMP program structure.]
Benchmarking Results
[Figure: benchmarking results for the two versions.]
Example Programs
C File              Description
omp_hello.c         Hello world
omp_workshare1.c    Loop work-sharing
omp_workshare2.c    Sections work-sharing
omp_reduction.c     Combined parallel loop reduction
omp_orphan.c        Orphaned parallel loop reduction
omp_mm.c            Matrix multiply
omp_getEnvInfo.c    Get and print environment information
How to run an OpenMP program
$ export OMP_NUM_THREADS=2
$ gcc -fopenmp filename.c
$ ./a.out
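For example, a hello-world along the lines of the omp_hello.c listed above (a sketch, not the course's actual file); with OMP_NUM_THREADS=2 it should print two lines, in either order:

#include <omp.h>
#include <stdio.h>

int main (void) {
   #pragma omp parallel
   {
      /* each thread prints its own id; ordering varies run to run */
      printf ("Hello from thread %d of %d\n",
              omp_get_thread_num (), omp_get_num_threads ());
   }
   return 0;
}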
For more information
http://www.cs.ccu.edu.tw/~naiwei/cs5635/cs5635.html
http://www.nersc.gov/nusers/help/tutorials/mpi/intro/print.php
http://www-unix.mcs.anl.gov/mpi/tutorial/mpiintro/ppframe.htm
http://www.mhpcc.edu/training/workshop/mpi/MAIN.html
http://www.openmp.org