MPI and OpenMP
How to get MPI and how to install MPI: http://www-unix.mcs.anl.gov/mpi/
mpich2-doc-install.pdf   // installation guide
mpich2-doc-user.pdf      // user guide
Outline
Message-passing model
Message Passing Interface (MPI)
Coding MPI programs
Compiling MPI programs
Running MPI programs
Benchmarking MPI programs
OpenMP
Message-passing Model
Processes
Number is specified at start-up time
Remains constant throughout execution of program
All execute same program
Each has unique ID number
Alternately performs computations and communicates
Circuit Satisfiability
[Figure: example combinational circuit; for the input combination shown, the output is 0, so the circuit is not satisfied.]
Solution Method
Circuit satisfiability is NP-complete
No known algorithm solves it in polynomial time
We seek all solutions
We find them through exhaustive search
16 inputs yields 2^16 = 65,536 combinations to test
Partitioning: Functional Decomposition
Embarrassingly parallel: no channels between tasks
Agglomeration and Mapping
Properties of parallel algorithm:
Fixed number of tasks
No communications between tasks
Time needed per task is variable
Consult mapping strategy decision tree: map tasks to processors in a cyclic fashion
Cyclic (Interleaved) Allocation
Assume p processes
Each process gets every pth piece of work
Example: 5 processes and 12 pieces of work
P0: 0, 5, 10
P1: 1, 6, 11
P2: 2, 7
P3: 3, 8
P4: 4, 9
Summary of Program Design
Program will consider all 65,536 combinations of 16 boolean inputs
Combinations allocated in cyclic fashion to processes
Each process examines each of its combinations
If it finds a satisfiable combination, it will print it
Include Files
MPI header file
#include <mpi.h>
Standard I/O header file
#include <stdio.h>
Local Variables
int main (int argc, char *argv[]) {
   int i;
   int id;   /* Process rank */
   int p;    /* Number of processes */
   void check_circuit (int, int);
Include argc and argv: they are needed to initialize MPI
One copy of every variable for each process running this program
Initialize MPI
First MPI function called by each process
Not necessarily first executable statement
Allows system to do any necessary setup
MPI_Init (&argc, &argv);
Communicators
Communicator: opaque object that provides message-passing environment for processes
MPI_COMM_WORLD
Default communicator
Includes all processes
Possible to create new communicators
Will do this in Chapters 8 and 9
Communicator
[Figure: the default communicator MPI_COMM_WORLD, shown as a set of six processes with ranks 0 through 5.]
Determine Number of Processes
First argument is the communicator
Number of processes is returned through the second argument
MPI_Comm_size (MPI_COMM_WORLD, &p);
Determine Process Rank
First argument is the communicator
Process rank (in range 0, 1, …, p-1) is returned through the second argument
MPI_Comm_rank (MPI_COMM_WORLD, &id);
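Putting the calls introduced so far together (plus MPI_Finalize, covered below), a minimal runnable sketch looks like this; the printed message is our choice, not from the slides:

#include <mpi.h>
#include <stdio.h>

int main (int argc, char *argv[]) {
   int id;   /* process rank */
   int p;    /* number of processes */

   MPI_Init (&argc, &argv);
   MPI_Comm_size (MPI_COMM_WORLD, &p);
   MPI_Comm_rank (MPI_COMM_WORLD, &id);
   printf ("Process %d of %d\n", id, p);
   MPI_Finalize ();
   return 0;
}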
Replication of Automatic Variables
[Figure: with six processes, each process holds its own copy of the automatic variables; every copy has p = 6, and id ranges from 0 to 5.]
What about External Variables?
int total;
int main (int argc, char *argv[]) {
   int i;
   int id;
   int p;
   …
Where is variable total stored?
Cyclic Allocation of Work
for (i = id; i < 65536; i += p)
   check_circuit (id, i);
Parallelism is outside function check_circuit
It can be an ordinary, sequential function
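The lecture's actual circuit is not reproduced in these notes, so the sketch below uses an illustrative placeholder condition. It follows the usual pattern for this example: a macro extracts bit i of the integer z, so each of the 65,536 values of z encodes one combination of the 16 inputs. Unlike the void prototype on the earlier slide, this version returns 1 or 0 so its result can also feed the MPI_Reduce call shown later:

#include <stdio.h>

/* Return bit i of n (0 or 1) */
#define EXTRACT_BIT(n,i) (((n) & (1 << (i))) ? 1 : 0)

int check_circuit (int id, int z) {
   int v[16];   /* v[i] is bit i of combination z */
   int i;

   for (i = 0; i < 16; i++)
      v[i] = EXTRACT_BIT (z, i);

   /* Placeholder circuit: replace with the real 16-input circuit */
   if ((v[0] || v[1]) && (!v[1] || !v[3]) && (v[2] || v[3]) &&
       (!v[3] || !v[4]) && (v[4] || !v[5])) {
      printf ("%d) ", id);
      for (i = 0; i < 16; i++)
         printf ("%d", v[i]);
      printf ("\n");
      fflush (stdout);   /* flush so output is not lost at exit */
      return 1;
   }
   return 0;
}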
Shutting Down MPI
Call after all other MPI library calls
Allows system to free up MPI resources
MPI_Finalize();
Our Call to MPI_Reduce()
MPI_Reduce (&count, &global_count, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
Only process 0 will get the result
if (!id) printf ("There are %d different solutions\n", global_count);
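Assembled, the pieces above give the whole of main(); here check_circuit is assumed to return the number of solutions it finds (0 or 1), a small change from the void version shown earlier:

#include <mpi.h>
#include <stdio.h>

int check_circuit (int, int);   /* returns 1 if combination i satisfies circuit */

int main (int argc, char *argv[]) {
   int i;
   int id;               /* process rank */
   int p;                /* number of processes */
   int count = 0;        /* solutions found by this process */
   int global_count = 0; /* total solutions (valid on process 0 only) */

   MPI_Init (&argc, &argv);
   MPI_Comm_rank (MPI_COMM_WORLD, &id);
   MPI_Comm_size (MPI_COMM_WORLD, &p);

   for (i = id; i < 65536; i += p)       /* cyclic allocation */
      count += check_circuit (id, i);

   MPI_Reduce (&count, &global_count, 1, MPI_INT, MPI_SUM, 0,
               MPI_COMM_WORLD);
   MPI_Finalize ();

   if (!id)
      printf ("There are %d different solutions\n", global_count);
   return 0;
}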
Benchmarking the Program
MPI_Barrier: barrier synchronization
MPI_Wtick: timer resolution
MPI_Wtime: current time
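A common way to combine the three calls (a self-contained sketch; the loop is stand-in work, not the satisfiability code):

#include <mpi.h>
#include <stdio.h>

int main (int argc, char *argv[]) {
   int i, id, p;
   double elapsed, sum = 0.0;

   MPI_Init (&argc, &argv);
   MPI_Comm_rank (MPI_COMM_WORLD, &id);
   MPI_Comm_size (MPI_COMM_WORLD, &p);

   MPI_Barrier (MPI_COMM_WORLD);   /* line up all processes first */
   elapsed = -MPI_Wtime ();        /* subtract the start time ... */

   for (i = id; i < 65536; i += p) /* the work being timed */
      sum += (double) i;

   elapsed += MPI_Wtime ();        /* ... then add the end time */

   if (!id)
      printf ("Elapsed %f s (sum %.0f, timer resolution %f s)\n",
              elapsed, sum, MPI_Wtick ());
   MPI_Finalize ();
   return 0;
}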
How to form a Ring?
To form a ring of systems, first execute the following command:
[nayan@MPI_system1 ~]$ mpd &
[1] 9152
This makes MPI_system1 the master system of the ring. To list its host name and port, run:
[nayan@MPI_system1 ~]$ mpdtrace -l
MPI_system1_32958 (172.16.1.1)
How to form a Ring? (cont’d)
Then run the following command in a terminal on every other system, passing the master's host name (-h) and port number (-p):
[nayan@MPI_system1 ~]$ mpd -h MPI_system1 -p 32958 &
How to kill a Ring?
To kill the ring, run the following command on the master system:
[nayan@MPI_system1 ~]$ mpdallexit
Compiling MPI Programs
To compile and execute the above program, follow these steps.
First, compile sat1.c by executing the following command on the master system:
[nayan@MPI_system1 ~]$ mpicc -o sat1.out sat1.c
Here mpicc is the MPI wrapper command that compiles sat1.c, and sat1.out is the output file.
Running MPI Programs
Now, to run this output file, type the following command on the master system:
[nayan@MPI_system1 ~]$ mpiexec -n 1 ./sat1.out
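The -n flag sets the number of MPI processes to start. To spread the work over more of the ring, raise it; the count of 4 below is arbitrary:

[nayan@MPI_system1 ~]$ mpiexec -n 4 ./sat1.out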
Benchmarking Results
[Figure: time taken in seconds (0 to 0.035) versus number of processors (1 to 10) for the satisfiability problem.]
OpenMP
OpenMP: an application programming interface (API) for parallel programming on multiprocessors
Compiler directives
Library of support functions
OpenMP works in conjunction with Fortran, C, or C++
Shared-memory Model
[Figure: several processors connected to a single shared memory.]
Processors interact and synchronize with each other through shared variables.
Fork/Join Parallelism
Initially only master thread is active
Master thread executes sequential code
Fork: master thread creates or awakens additional threads to execute parallel code
Join: at end of parallel code, created threads die or are suspended
[Figure: fork/join time line; the master thread repeatedly forks other threads and joins with them when the parallel code ends.]
Shared-memory Model vs. Message-passing Model
Shared-memory model
Number of active threads is 1 at start and finish of program, and changes dynamically during execution
Message-passing model
All processes are active throughout execution of the program
Parallel for Loops
C programs often express data-parallel operations as for loops:
for (i = first; i < size; i += prime)
   marked[i] = 1;
OpenMP makes it easy to indicate when the iterations of a loop may execute in parallel
Compiler takes care of generating code that forks/joins threads and allocates the iterations to threads
Pragmas
Pragma: a compiler directive in C or C++
Stands for “pragmatic information”
A way for the programmer to communicate with the compiler
Compiler is free to ignore pragmas
Syntax:
#pragma omp <rest of pragma>
Parallel for Pragma
Format:
#pragma omp parallel for
for (i = 0; i < N; i++)
   a[i] = b[i] + c[i];
Compiler must be able to verify the run-time system will have information it needs to schedule loop iterations
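A minimal compilable sketch of the pattern (the array size and initialization are our choices, not from the slides); build it with gcc -fopenmp:

#include <stdio.h>
#define N 1000

int main (void) {
   int i;
   double a[N], b[N], c[N];

   for (i = 0; i < N; i++) {   /* initialize inputs sequentially */
      b[i] = i;
      c[i] = 2.0 * i;
   }

   #pragma omp parallel for    /* iterations are divided among threads */
   for (i = 0; i < N; i++)
      a[i] = b[i] + c[i];

   printf ("a[%d] = %f\n", N - 1, a[N - 1]);
   return 0;
}

The loop index i is made private to each thread automatically, and each iteration writes a distinct a[i], so no two threads touch the same element.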
Function omp_get_num_procs
Returns the number of physical processors available for use by the parallel program
int omp_get_num_procs (void)
Function omp_set_num_threads
Uses the parameter value to set the number of threads to be active in parallel sections of code
May be called at multiple points in a program
void omp_set_num_threads (int t)
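A sketch combining the two functions; omp_get_thread_num and omp_get_num_threads are standard OpenMP calls not otherwise covered on these slides:

#include <omp.h>
#include <stdio.h>

int main (void) {
   /* A common idiom: one thread per physical processor */
   int t = omp_get_num_procs ();
   omp_set_num_threads (t);

   #pragma omp parallel
   {
      /* This block runs once per thread */
      if (omp_get_thread_num () == 0)
         printf ("Using %d threads on %d processors\n",
                 omp_get_num_threads (), t);
   }
   return 0;
}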
Comparison
Characteristic                           OpenMP   MPI
Suitable for multiprocessors             Yes      Yes
Suitable for multicomputers              No       Yes
Supports incremental parallelization     Yes      No
Minimal extra code                       Yes      No
Explicit control of memory hierarchy     No       Yes
C+MPI vs. C+MPI+OpenMP
[Figure: side-by-side comparison of C + MPI and C + MPI + OpenMP program structure.]
Benchmarking Results
[Figure: benchmarking results for the two versions.]
Example Programs
C File              Description
omp_hello.c         Hello world
omp_workshare1.c    Loop work-sharing
omp_workshare2.c    Sections work-sharing
omp_reduction.c     Combined parallel loop reduction
omp_orphan.c        Orphaned parallel loop reduction
omp_mm.c            Matrix multiply
omp_getEnvInfo.c    Get and print environment information
How to run an OpenMP program
$ export OMP_NUM_THREADS=2
$ gcc -fopenmp filename.c
$ ./a.out
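For example, a hello-world along the lines of the omp_hello.c listed above (a sketch, not the course's actual file); with OMP_NUM_THREADS=2 it should print two lines, in either order:

#include <omp.h>
#include <stdio.h>

int main (void) {
   #pragma omp parallel
   {
      /* each thread prints its own id; ordering varies run to run */
      printf ("Hello from thread %d of %d\n",
              omp_get_thread_num (), omp_get_num_threads ());
   }
   return 0;
}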
For more information
http://www.cs.ccu.edu.tw/~naiwei/cs5635/cs5635.html
http://www.nersc.gov/nusers/help/tutorials/mpi/intro/print.php
http://www-unix.mcs.anl.gov/mpi/tutorial/mpiintro/ppframe.htm
http://www.mhpcc.edu/training/workshop/mpi/MAIN.html
http://www.openmp.org