CS 591 x
I/O in MPI
I/O in MPI
MPI exists as many different implementations.
MPI implementations are based on MPI standards.
MPI standards are developed and maintained by the MPI Forum.
I/O in MPI
MPI implementations conform well to the MPI standards.
The MPI-1 standard avoids the issue of I/O.
This is a problem, since it is rare that a useful program does no I/O.
How to handle I/O is left to the individual implementations.
I/O in MPI
To use C I/O functions – which processes have access to stdin, stdout, and stderr?
This is undefined in MPI.
Sometimes all processes have access to stdout.
In some implementations only one process has access to stdout.
I/O in MPI
Sometimes stdout is only available to rank 0 in MPI_COMM_WORLD.
The same is true of stdin.
Some implementations provide no access to stdin.
I/O in MPI
So how do you create portable programs?
Make some assumptions.
Do some checking.
I/O in MPI
Recall that in our MPI implementation, MPI running under PBS puts stdout in a file (*.oxxxxx).
There is no direct access to stdin.
stdin in PBS/Torque
-I means interactive.
It can be on the qsub command line or in the script.
The job still starts under the control of the scheduler.
When the job starts, PBS/MPI will provide you with an interactive shell.
Not terribly obvious.
I/O in MPI
Two ways to deal with I/O in MPI:
define a specific approach in your program
use a specialized parallel I/O system
I/O in parallel systems is a hot topic in high-performance computing research.
I/O in MPI
Learn or define a single process that can do input (stdin) and output (stdout).
Usually this will be rank 0 in MPI_COMM_WORLD.
Write the program to have the IO process manage all user IO (user input, reports, prompts, etc.).
I/O in MPI
Attribute caching: recall that topologies are attributes associated with (attached to) communicators.
There are other attributes attached to communicators…
… and you can assign your own – for example, to designate a process to handle IO.
Attribute Caching
Duplicate the communicator:
MPI_Comm_dup(old_comm, &new_comm);
Define a key value (index) for the new attribute:
MPI_Keyval_create(MPI_DUP_FN, MPI_NULL_DELETE_FN, &IO_KEY, extra_arg);
Attribute caching
Define a value for the attribute – the rank of the designated IO process:
*io_rank = 0;
Assign the attribute to the communicator:
MPI_Attr_put(io_comm, IO_KEY, io_rank);
To retrieve an attribute:
MPI_Attr_get(io_comm, IO_KEY, &io_rank_att, &flag);
Attribute Caching
Attribute caching functions are local; you may need to share attribute values with other processes in the comm.
I/O Process
Even though no IO mechanism is defined in MPI…
MPI implementations should have several predefined attributes for MPI_COMM_WORLD.
One of these is MPI_IO.
It defines which process in the comm is supposed to be able to do IO.
I/O process
If no process can do IO: MPI_IO = MPI_PROC_NULL
If every process in the comm can do IO: MPI_IO = MPI_ANY_SOURCE
If some can and some cannot:
on a process that can, MPI_IO = myrank
on a process that cannot, MPI_IO = a rank that can
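The decision a portable program might make from its local MPI_IO value can be sketched as a small helper. The function name and the stand-in constants PROC_NULL and ANY_SOURCE are assumptions for illustration; real code would use MPI_PROC_NULL and MPI_ANY_SOURCE from mpi.h.

```c
/* Stand-ins for MPI_PROC_NULL and MPI_ANY_SOURCE so the logic can be
   shown without mpi.h; real code uses the MPI constants. */
#define PROC_NULL  (-2)
#define ANY_SOURCE (-1)

/* Given this process's rank and the value of its MPI_IO attribute,
   return the rank that should act as the IO process, or PROC_NULL
   if no process can do IO.  (Hypothetical helper, not an MPI call.) */
int choose_io_rank(int myrank, int mpi_io_value) {
    if (mpi_io_value == PROC_NULL)
        return PROC_NULL;              /* nobody can do IO */
    if (mpi_io_value == ANY_SOURCE)
        return 0;                      /* everyone can: pick rank 0 */
    return mpi_io_value;               /* a rank that can do IO */
}
```

Since each capable process reports its own rank, different processes may still pick different answers; a real program would agree on one, e.g. with an MPI_Allreduce taking the minimum.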
I/O Process
MPI_IO really means which process can do output; it still may not have access to stdin.
MPI-IO – stdin, stdout, stderr
For stdout:
create an io communicator
identify an IO process in the communicator, or create an IO process in the communicator
the IO process gathers results from the compute processes
the IO process outputs the results
MPI-IO – stdin
Recall: all processes may have access to stdin, only one process may have access to stdin, or no process may have access to stdin.
How will we know?
Testing stdin in MPI

#include <stdio.h>
#include "mpi.h"

int main(int argc, char** argv) {
    int size, rank, numb;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf("enter an integer ");
    scanf(" %d", &numb);
    printf("Hello world! I'm %d of %d - numb = %d\n", rank, size, numb);
    MPI_Finalize();
    return 0;
}
Testing stdin in MPI (portable version)

#include <stdio.h>
#include "mpi.h"

int main(int argc, char** argv) {
    int size, rank, numb;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) { printf("enter an integer "); scanf(" %d", &numb); }
    MPI_Bcast(&numb, 1, MPI_INT, 0, MPI_COMM_WORLD);
    printf("Hello world! I'm %d of %d - numb = %d\n", rank, size, numb);
    MPI_Finalize();
    return 0;
}
stdin – what to do?
If all processes have access to stdin:
designate one process as the IO process
have that process read from stdin
distribute input to other processes
If only one process has access to stdin:
identify which process has access to stdin
have the IO process read from stdin
distribute data to other processes
stdin – What to do?
If no process has access to stdin:
pass data as command line arguments
read input data from files
create include files with data values
A nuisance.
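A minimal sketch of the first fallback, assuming the program takes its input value as a command-line argument instead of reading stdin; the helper name is an illustration, not part of MPI.

```c
#include <stdlib.h>

/* Parse the input value from argv instead of stdin (illustrative
   helper).  Returns the parsed integer, or the supplied default if
   no argument was given. */
int input_from_args(int argc, char **argv, int default_value) {
    if (argc > 1)
        return atoi(argv[1]);   /* e.g. mpirun -np 4 ./a.out 42 */
    return default_value;
}
```

Every process sees the same command line, so no broadcast is needed in this case.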
File IO in MPI
File IO can be a major bottleneck in the performance of a parallel application.
Parallel applications can have large (enormous) data sets.
We often think of file IO as a side-effect – at least in terms of performance. This is not true in parallel applications.
“One half hour of IO for every 2 hours of computation”
MPI File IO – types of applications
Large grids and meshes:
storing grid point results for postprocessing
distributing data for input
Checkpointing:
periodically saving the state of a job
how much work can you afford to lose?
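The checkpoint idea can be sketched with plain file IO, assuming the job's state is just a step counter plus an array of doubles; filenames and function names are illustrative.

```c
#include <stdio.h>

/* Write the job state (step counter plus data array) to a checkpoint
   file; returns 0 on success, -1 on error.  Illustrative only. */
int save_checkpoint(const char *path, int step, const double *data, int n) {
    FILE *f = fopen(path, "wb");
    if (!f) return -1;
    fwrite(&step, sizeof step, 1, f);
    fwrite(data, sizeof *data, n, f);
    fclose(f);
    return 0;
}

/* Read the state back after a failure; returns the saved step,
   or -1 on error. */
int load_checkpoint(const char *path, double *data, int n) {
    int step;
    FILE *f = fopen(path, "rb");
    if (!f) return -1;
    if (fread(&step, sizeof step, 1, f) != 1) { fclose(f); return -1; }
    if (fread(data, sizeof *data, n, f) != (size_t)n) { fclose(f); return -1; }
    fclose(f);
    return step;
}
```

How often to call save_checkpoint is the trade-off the slide names: checkpoint interval bounds the work you can lose.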
MPI File IO – types of applications
Disk caching:
data too large for local memories
Data mining:
small compute load but a lot of file IO
combing through large datasets
ex. CFD
File IO in MPI
Recall that the use of stdin, stdout, and stderr assumes, generally, a single channel for each of these.
This is not true with respect to file IO – sort of.
Gathering to an IO node may not be the most efficient strategy.
File IO in MPI
In parallel systems you have multiple processors running concurrently; each may have the ability to do file IO – concurrently.
Know your architecture:
network shared disk storage
diskless compute nodes
directories shared across nodes
Directories on Energy
/home/user – shared and the same on all nodes (r/w)
/usr/local/packages/ – shared and the same on all nodes (ro)
All other directories on any node are local to each node.
Implications?
IO example
Staging data for input:
divide data before input to the job
distribute data pieces to local compute node disk drives
each compute node reads local files to get its piece of the data
as opposed to “read and scatter”
uses standard file IO calls
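The per-node read step can be sketched with standard file IO calls, assuming the pieces were pre-split into files named data_&lt;rank&gt;.bin on each node's local disk; the naming scheme and function name are illustrative.

```c
#include <stdio.h>

/* Each compute node opens the locally staged piece of the dataset,
   e.g. data_0.bin for rank 0 (illustrative naming scheme).
   Returns the number of doubles read, or -1 if the file is missing. */
int read_staged_piece(int rank, double *buf, int max_n) {
    char path[64];
    snprintf(path, sizeof path, "data_%d.bin", rank);
    FILE *f = fopen(path, "rb");
    if (!f) return -1;
    int n = (int)fread(buf, sizeof *buf, max_n, f);
    fclose(f);
    return n;
}
```

No MPI communication is needed for the read itself; each rank only touches its own local file.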
IO Example
Dump and collect:
In some cases large results datasets do not need to be gathered to an IO node.
Each compute node writes its data to a file on its local disk drive.
A postprocess program “visits” the compute nodes and collects the locally stored data.
The postprocessor stores the integrated data set.
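The collection step might look like the following sketch, assuming each node's dump is reachable as dump_&lt;rank&gt;.bin (in practice the postprocessor would fetch the files across nodes); names are illustrative.

```c
#include <stdio.h>

/* Postprocessor: append each rank's locally dumped file into one
   integrated output file.  Returns the number of dumps merged,
   or -1 if the output file cannot be created.  Illustrative only. */
int collect_dumps(int nranks, const char *out_path) {
    FILE *out = fopen(out_path, "wb");
    if (!out) return -1;
    int merged = 0;
    for (int r = 0; r < nranks; r++) {
        char path[64];
        snprintf(path, sizeof path, "dump_%d.bin", r);
        FILE *in = fopen(path, "rb");
        if (!in) continue;              /* this node's dump not found */
        char buf[4096];
        size_t n;
        while ((n = fread(buf, 1, sizeof buf, in)) > 0)
            fwrite(buf, 1, n, out);
        fclose(in);
        merged++;
    }
    fclose(out);
    return merged;
}
```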
File IO strategy
IO Process/Scatter-Gather vs. Local IO/distribute-collect
Depends on:
use of input/output
size of dataset
file IO capacity of compute nodes
available disk space
disk IO performance