class05_MPI, CCR CompilingRunning.pdf


Description: Slides related to high performance computing basics

Transcript of class05_MPI, CCR CompilingRunning.pdf

Page 1: class05_MPI, CCR CompilingRunning.pdf

MPI Quick Reference: Compiling/Running

M. D. Jones, Ph.D.

Center for Computational Research
University at Buffalo

State University of New York

High Performance Computing I, 2012


Page 2: class05_MPI, CCR CompilingRunning.pdf

Quickstart to Compiling & Running MPI Applications at CCR Background

Background

This document covers the essentials of compiling and running MPI applications on the CCR platforms. It does not cover MPI programming itself, nor debugging, etc. (covered more thoroughly in separate presentations).


Page 3: class05_MPI, CCR CompilingRunning.pdf

Quickstart to Compiling & Running MPI Applications at CCR Modules

Modules Software Management System

There are a large number of available software packages on the CCR systems, particularly the Linux clusters. To help maintain this often confusing environment, the modules package is used to add and remove these packages from your default environment (many of the packages conflict in terms of their names, libraries, etc., so the default is a minimally populated environment).
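For example, a typical interactive session might look like the following (a minimal sketch; the available modulefile names and versions vary, so check "module avail" on the machine you are using):

[u2:~]$ module avail intel-mpi       # list matching modulefiles
[u2:~]$ module load intel-mpi        # add the package to your environment
[u2:~]$ module list                  # confirm what is currently loaded
[u2:~]$ module unload intel-mpi      # remove it again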


Page 4: class05_MPI, CCR CompilingRunning.pdf

Quickstart to Compiling & Running MPI Applications at CCR Modules

The module Command

module command syntax:

-bash-2.05b$ module help

Modules Release 3.1.6 (Copyright GNU GPL v2 1991):
Available Commands and Usage:

+ add|load          modulefile [modulefile ...]
+ rm|unload         modulefile [modulefile ...]
+ switch|swap       modulefile1 modulefile2
+ display|show      modulefile [modulefile ...]
+ avail             [modulefile [modulefile ...]]
+ use [-a|--append] dir [dir ...]
+ unuse             dir [dir ...]
+ update
+ purge
+ list
+ clear
+ help              [modulefile [modulefile ...]]
+ whatis            [modulefile [modulefile ...]]
+ apropos|keyword   string
+ initadd           modulefile [modulefile ...]
+ initprepend       modulefile [modulefile ...]
+ initrm            modulefile [modulefile ...]
+ initswitch        modulefile1 modulefile2
+ initlist
+ initclear


Page 5: class05_MPI, CCR CompilingRunning.pdf

Quickstart to Compiling & Running MPI Applications at CCR Modules

Using module in Batch

If you change shells in your batch script you may need to explicitly load the modules environment:

tcsh:
  source $MODULESHOME/init/tcsh

bash:
  . ${MODULESHOME}/init/bash

but generally you should not need to worry about this step (run "module list"; if it works, your environment should already be properly initialized).
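If your script does switch shells, a minimal sketch of the top of a bash batch script (the PBS directives here are only placeholders) would be:

#PBS -S /bin/bash
#PBS -l walltime=00:10:00
# make sure the modules environment is initialized for this shell
. ${MODULESHOME}/init/bash
module load intel-mpi intel/12.1
module list    # sanity check: the loaded modules should appear here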


Page 6: class05_MPI, CCR CompilingRunning.pdf

Simple MPI Example

Objective: Construct a very elementary MPI program to do the usual "Hello World" problem, i.e., have each process print out its rank in the communicator.


Page 7: class05_MPI, CCR CompilingRunning.pdf

Simple MPI Example

in C

#include <stdio.h>
#include "mpi.h"

int main(int argc, char **argv)
{
  int myid, nprocs;
  int namelen, mpiv, mpisubv;
  char processor_name[MPI_MAX_PROCESSOR_NAME];

  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
  MPI_Comm_rank(MPI_COMM_WORLD, &myid);
  MPI_Get_processor_name(processor_name, &namelen);

  printf("Process %d of %d on %s\n", myid, nprocs, processor_name);
  if (myid == 0) {
    MPI_Get_version(&mpiv, &mpisubv);
    printf("MPI Version: %d.%d\n", mpiv, mpisubv);
  }
  MPI_Finalize();
  return 0;
}


Page 8: class05_MPI, CCR CompilingRunning.pdf

Simple MPI Example U2: Intel MPI (Preferred!!)

U2: Intel MPI

There are several commercial implementations of MPI, Intel and HP currently being the most prominent (IBM, Sun, SGI, etc. all have their own variants, but these are usually supported only on their own hardware). CCR has a license for Intel MPI, and it has some nice features:

- Support for multiple networks (Infiniband, Myrinet, TCP/IP)
- Part of the ScaLAPACK support in the Intel MKL
- MPI-2 features (one-sided, dynamic tasks, I/O with parallel filesystems support)
- CPU pinning/process affinity options (extensive)


Page 9: class05_MPI, CCR CompilingRunning.pdf

Simple MPI Example U2: Intel MPI (Preferred!!)

Build the code with the appropriate wrappers:

[u2:~/d_mpi-samples]$ module load intel-mpi intel/12.1
[u2:~/d_mpi-samples]$ module list
Currently Loaded Modulefiles:
  1) null             2) modules          3) use.own
  4) intel-mpi/4.0.3  5) intel/12.1

[u2:~/d_mpi-samples]$ mpiicc -o hello.impi hello.c        # icc version
[u2:~/d_mpi-samples]$ mpicc -o hello.impi.gcc hello.c     # gcc version

Unfortunately Intel MPI still lacks tight integration with PBS/Torque, and instead relies on "daemons" (launched by you) or the "hydra" task launcher to initiate MPI tasks.
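The mpd and mpirun approaches are shown in the batch script on the following pages; as a sketch of the hydra alternative (assuming the mpiexec.hydra command is provided by the loaded intel-mpi module), the launch would look something like:

# hydra-based launch: no mpd ring to boot or tear down,
# host list and task count taken from the PBS node file
NPROCS=`cat $PBS_NODEFILE | wc -l`
mpiexec.hydra -f $PBS_NODEFILE -np $NPROCS ./hello.impi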


Page 10: class05_MPI, CCR CompilingRunning.pdf

Simple MPI Example U2: Intel MPI (Preferred!!)

#PBS -S /bin/bash
#PBS -q debug
#PBS -l walltime=00:10:00
#PBS -l nodes=2:GM:ppn=2
#PBS -M jonesm@ccr.buffalo.edu
#PBS -m e
#PBS -N test
#PBS -o subQ.out
#PBS -j oe
#
# Note the above directives can be commented out using an
# additional "#" (as in the debug queue line above)
#
module load intel-mpi intel/12.1
#
# cd to directory from which job was submitted
#
cd $PBS_O_WORKDIR
#
# Intel MPI has no tight integration with PBS,
# so you have to tell it where to run, but its mpirun
# wrapper will auto-detect PBS.
# You can find description of all Intel MPI parameters in the
# Intel MPI Reference Manual.
# See <intel mpi install dir>/doc/Reference_manual.pdf
#


Page 11: class05_MPI, CCR CompilingRunning.pdf

Simple MPI Example U2: Intel MPI (Preferred!!)

export I_MPI_DEBUG=5   # nice debug level, spits out useful info
NPROCS=`cat $PBS_NODEFILE | wc -l`
NODES=`cat $PBS_NODEFILE | uniq`
NNODES=`cat $PBS_NODEFILE | uniq | wc -l`
#
# mpd-based way:
mpdboot -n $NNODES -f $PBS_NODEFILE -v
mpdtrace
mpiexec -np $NPROCS ./hello.impi
mpdallexit
#
# mpirun wrapper:
mpirun -np $NPROCS ./hello.impi


Page 12: class05_MPI, CCR CompilingRunning.pdf

Simple MPI Example U2: Intel MPI (Preferred!!)

Intel MPI on Myrinet

The older U2 nodes have Myrinet - by default Intel MPI tries to run over the best available network:

[u2:~/d_mpi-samples]$ cat subQ.out
Job 2949822.d15n41.ccr.buffalo.edu has requested 2 cores/processors per node.
running mpdallexit on f09n35
LAUNCHED mpd on f09n35 via
RUNNING: mpd on f09n35
LAUNCHED mpd on f09n34 via f09n35
RUNNING: mpd on f09n34
f09n35
f09n34
[0] DAPL startup(): trying to open default DAPL provider from dat registry: mx2
[1] DAPL startup(): trying to open default DAPL provider from dat registry: mx2
[2] DAPL startup(): trying to open default DAPL provider from dat registry: mx2
[3] DAPL startup(): trying to open default DAPL provider from dat registry: mx2
[1] MPI startup(): DAPL provider mx2
[0] MPI startup(): DAPL provider mx2
[3] MPI startup(): DAPL provider mx2
[2] MPI startup(): DAPL provider mx2
[0] MPI startup(): shm and dapl data transfer modes
[1] MPI startup(): shm and dapl data transfer modes
[3] MPI startup(): shm and dapl data transfer modes
[2] MPI startup(): shm and dapl data transfer modes
Process 1 of 4 on f09n35.ccr.buffalo.edu
Process 3 of 4 on f09n34.ccr.buffalo.edu
[0] MPI startup(): Rank  Pid    Node name                 Pin cpu
[0] MPI startup(): 0     30112  f09n35.ccr.buffalo.edu    0


Page 13: class05_MPI, CCR CompilingRunning.pdf

Simple MPI Example U2: Intel MPI (Preferred!!)

[0] MPI startup(): 1     30111  f09n35.ccr.buffalo.edu    1
[0] MPI startup(): 2     31983  f09n34.ccr.buffalo.edu    0
[0] MPI startup(): 3     31984  f09n34.ccr.buffalo.edu    1
[0] MPI startup(): I_MPI_DEBUG=5
[0] MPI startup(): I_MPI_FABRICS_LIST=dapl,tcp
[0] MPI startup(): I_MPI_FALLBACK=enable
[0] MPI startup(): I_MPI_PLATFORM=auto
Process 0 of 4 on f09n35.ccr.buffalo.edu
MPI Version: 2.1
Process 2 of 4 on f09n34.ccr.buffalo.edu
[1] DAPL startup(): trying to open default DAPL provider from dat registry: mx2
[0] DAPL startup(): trying to open default DAPL provider from dat registry: mx2
[3] DAPL startup(): trying to open default DAPL provider from dat registry: mx2
[2] DAPL startup(): trying to open default DAPL provider from dat registry: mx2
[0] MPI startup(): DAPL provider mx2
[1] MPI startup(): DAPL provider mx2
...
[3] MPI startup(): shm and dapl data transfer modes
[0] MPI startup(): Rank  Pid    Node nameProcess 1 of 4 on f09n35.ccr.buffalo.edu
Pin cpu
[0] MPI startup(): 0     30125  f09n35.ccr.buffalo.edu    0
Process 2 of 4 on f09n34.ccr.buffalo.edu
Process 3 of 4 on f09n34.ccr.buffalo.edu
[0] MPI startup(): 1     30126  f09n35.ccr.buffalo.edu    1
[0] MPI startup(): 2     32040  f09n34.ccr.buffalo.edu    0
[0] MPI startup(): 3     32041  f09n34.ccr.buffalo.edu    1
[0] MPI startup(): I_MPI_DEBUG=5
[0] MPI startup(): I_MPI_FABRICS_LIST=dapl,tcp
[0] MPI startup(): I_MPI_FALLBACK=enable
[0] MPI startup(): I_MPI_PIN_MAPPING=2:0 0,1 1
[0] MPI startup(): I_MPI_PLATFORM=auto
Process 0 of 4 on f09n35.ccr.buffalo.edu
MPI Version: 2.1


Page 14: class05_MPI, CCR CompilingRunning.pdf

Simple MPI Example U2: Intel MPI (Preferred!!)

Intel MPI on Infiniband

The newest U2 nodes have Infiniband (IB) as the optimal interconnect for message-passing; running an Intel MPI job should automatically find and use IB on those machines (and they have 8 or 12 cores each, so adjust your script accordingly):

[u2:~/d_mpi-samples]$ cat subQ.out


Page 15: class05_MPI, CCR CompilingRunning.pdf

Simple MPI Example U2: Intel MPI (Preferred!!)

Job 2949999.d15n41.ccr.buffalo.edu has requested 12 cores/processors per node.
running mpdallexit on k16n13a
LAUNCHED mpd on k16n13a via
RUNNING: mpd on k16n13a
LAUNCHED mpd on k16n12b via k16n13a
RUNNING: mpd on k16n12b
k16n13a
k16n12b
[0] MPI startup(): shm and tmi data transfer modes
[12] MPI startup(): shm and tmi data transfer modes
[4] MPI startup(): shm and tmi data transfer modes
[19] MPI startup(): shm and tmi data transfer modes
[1] MPI startup(): shm and tmi data transfer modes
[20] MPI startup(): shm and tmi data transfer modes
[11] MPI startup(): shm and tmi data transfer modes
[14] MPI startup(): shm and tmi data transfer modes
[5] MPI startup(): shm and tmi data transfer modes
[13] MPI startup(): shm and tmi data transfer modes
[3] MPI startup(): shm and tmi data transfer modes
[23] MPI startup(): shm and tmi data transfer modes
[2] MPI startup(): shm and tmi data transfer modes
[21] MPI startup(): shm and tmi data transfer modes
[10] MPI startup(): shm and tmi data transfer modes
[15] MPI startup(): shm and tmi data transfer modes
[7] MPI startup(): shm and tmi data transfer modes
[22] MPI startup(): shm and tmi data transfer modes
[6] MPI startup(): shm and tmi data transfer modes
[18] MPI startup(): shm and tmi data transfer modes
[9] MPI startup(): shm and tmi data transfer modes
[17] MPI startup(): shm and tmi data transfer modes
[8] MPI startup(): shm and tmi data transfer modes
[16] MPI startup(): shm and tmi data transfer modes


Page 16: class05_MPI, CCR CompilingRunning.pdf

Simple MPI Example U2: Intel MPI (Preferred!!)

[0] MPI startup(): Rank  Pid  Node nameProcess 1 of 24 on k16n13a.ccr.buffalo.edu
Process 2 of 24 on k16n13a.ccr.buffalo.edu
Process 5 of 24 on k16n13a.ccr.buffalo.edu
Process 11 of 24 on k16n13a.ccr.buffalo.edu
...
Process 18 of 24 on k16n12b.ccr.buffalo.edu
Process 20 of 24 on k16n12b.ccr.buffalo.edu
Process 22 of 24 on k16n12b.ccr.buffalo.edu
[0] MPI startup(): I_MPI_DEBUG=5
[0] MPI startup(): I_MPI_FABRICS_LIST=tmi,dapl,tcp
[0] MPI startup(): I_MPI_FALLBACK=enable
[0] MPI startup(): I_MPI_PIN_MAPPING=12:0 0,1 1,2 2,3 3,4 4,5 5,6 6,7 7,8 8,9 9,10 10,11 11
[0] MPI startup(): I_MPI_PLATFORM=auto
Process 0 of 24 on k16n13a.ccr.buffalo.edu
MPI Version: 2.1


Page 17: class05_MPI, CCR CompilingRunning.pdf

Simple MPI Example U2: Intel MPI (Preferred!!)

Intel MPI on TCP/IP

You can force Intel MPI to run using TCP/IP (or a combination of TCP/IP and shared memory, as in the example below) by setting the I_MPI_DEVICE variable (or, equivalently, I_MPI_FABRICS_LIST):

export I_MPI_DEBUG=5
export I_MPI_DEVICE=ssm    # tcp/ip between nodes, shared memory within

Job 2950135.d15n41.ccr.buffalo.edu has requested 2 cores/processors per node.
running mpdallexit on f09n35
LAUNCHED mpd on f09n35 via
RUNNING: mpd on f09n35
LAUNCHED mpd on f09n34 via f09n35
RUNNING: mpd on f09n34
f09n35
f09n34
[3] MPI startup(): shared memory and socket data transfer modes
[0] MPI startup(): shared memory and socket data transfer modes
[2] MPI startup(): shared memory and socket data transfer modes
[1] MPI startup(): shared memory and socket data transfer modes
[1] MPI startup(): shm and tcp data transfer modes
[2] MPI startup(): shm and tcp data transfer modes
[3] MPI startup(): shm and tcp data transfer modes
[0] MPI startup(): shm and tcp data transfer modes


Page 18: class05_MPI, CCR CompilingRunning.pdf

Simple MPI Example U2: Intel MPI (Preferred!!)

Process 2 of 4 on f09n34.ccr.buffalo.edu
Process 1 of 4 on f09n35.ccr.buffalo.edu
Process 3 of 4 on f09n34.ccr.buffalo.edu
[0] MPI startup(): Rank  Pid    Node name                 Pin cpu
[0] MPI startup(): 0     30385  f09n35.ccr.buffalo.edu    0
[0] MPI startup(): 1     30384  f09n35.ccr.buffalo.edu    1
[0] MPI startup(): 2     32261  f09n34.ccr.buffalo.edu    0
[0] MPI startup(): 3     32260  f09n34.ccr.buffalo.edu    1
[0] MPI startup(): I_MPI_DEBUG=5
[0] MPI startup(): I_MPI_DEVICE=ssm
[0] MPI startup(): I_MPI_FABRICS_LIST=dapl,tcp
[0] MPI startup(): I_MPI_FALLBACK=enable
[0] MPI startup(): I_MPI_PLATFORM=auto
[0] MPI startup(): MPICH_INTERFACE_HOSTNAME=10.106.9.35
Process 0 of 4 on f09n35.ccr.buffalo.edu
MPI Version: 2.1


Page 19: class05_MPI, CCR CompilingRunning.pdf

Simple MPI Example U2: Intel MPI (Preferred!!)

Intel MPI Summary

Intel MPI has some real advantages:

- Multi-protocol support with the same build: by default it gives you the "best" network, but it also gives you the flexibility to choose your protocol
- CPU/memory affinity settings
- Multiple compiler support (wrappers for the GNU compilers, mpicc, mpicxx, mpif90, as well as the Intel compilers, mpiicc, mpiicpc, mpiifort); see the sketch below
- (Relatively) simple integration with Intel MKL, including ScaLAPACK
- Reference manual - on the CCR systems look at $INTEL_MPI/doc/Reference_Manual.pdf for a copy of the reference manual (after loading the module)
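For example, compiling the hello.c example against each compiler family would look something like the following (a sketch; the Fortran source name prog.f90 is just a placeholder):

# GNU compiler wrappers
mpicc    -o hello.gcc   hello.c
mpif90   -o prog.gcc    prog.f90
# Intel compiler wrappers
mpiicc   -o hello.intel hello.c
mpiifort -o prog.intel  prog.f90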


Page 20: class05_MPI, CCR CompilingRunning.pdf

Simple MPI Example U2: Intel MPI (Preferred!!)

Whither Goest Thou, MPI?

MPI processes - things to keep in mind:

- You can over-subscribe the processors if you want, but that is going to under-perform (it is often useful for debugging, though); note that batch queuing systems (like those in CCR) may not let you easily over-subscribe the number of available processors (see the sketch below)
- Better MPI implementations will give you more options for the placement of MPI tasks (often through so-called "affinity" options, either for CPU or memory)
- Typically you want a 1-to-1 mapping of MPI processes to available processors (cores), but there are times when that may not be desirable
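As a concrete illustration, using the same $PBS_NODEFILE idiom as the batch script earlier (a sketch; the scheduler may reject the over-subscribed variant, as noted above):

NPROCS=`cat $PBS_NODEFILE | wc -l`        # one MPI task per allocated core
mpirun -np $NPROCS ./hello.impi           # usual 1-to-1 mapping
mpirun -np $((2*NPROCS)) ./hello.impi     # over-subscribed, 2 tasks per core - debugging only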


Page 21: class05_MPI, CCR CompilingRunning.pdf

Simple MPI Example U2: Intel MPI (Preferred!!)

Affinity

Intel MPI has options for associating MPI tasks to cores - better known as CPU-process affinity:

- I_MPI_PIN, I_MPI_PIN_MODE, I_MPI_PIN_PROCESSOR_LIST, I_MPI_PIN_DOMAIN in the current version of Intel MPI (it never hurts to check the documentation for the version that you are using; these options have a tendency to change)
- You can specify a core list on which to run MPI tasks, as well as domains of cores for hybrid MPI-OpenMP applications (a minimal sketch follows below)
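A minimal sketch of how these variables might appear in a job script (assuming Intel MPI 4.x behavior; check the Reference Manual for your version, since the accepted values do change):

export I_MPI_PIN=1                       # enable process pinning
export I_MPI_PIN_PROCESSOR_LIST=0-3      # pure MPI: pin ranks to cores 0-3
# for a hybrid MPI-OpenMP run, pin each rank to a domain of cores instead:
export I_MPI_PIN_DOMAIN=omp              # domain size taken from OMP_NUM_THREADS
export OMP_NUM_THREADS=4
mpirun -np $NPROCS ./hello.impi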


Page 22: class05_MPI, CCR CompilingRunning.pdf

Simple MPI Example Summary

Summary - MPI at CCR

- Use the modules environment manager to choose your MPI flavor
- I recommend Intel MPI on the clusters, unless you need access to the source code for the implementation itself. It has a lot of nice features and is quite flexible.
- Be careful with task launching - use mpiexec whenever possible
- Ensure that your MPI processes end up where you want - use ps and top to check (also use MPI_Get_processor_name in your code); see the sketch below
- Also use the CCR ccrjobviz.pl job visualizer utility to quickly scan for expected task placement and performance issues.
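For instance, a quick check of where the hello.impi tasks landed might look like this (a sketch; the node name f09n35 is just an example taken from the earlier output, and the psr column is the processor each task is currently running on):

ssh f09n35 'ps -eo pid,psr,comm | grep hello.impi'
ssh f09n35 'top -b -n 1 | grep hello.impi'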
