
Distributed Data Assimilation - A case study
Aad J. van der Steen

High Performance Computing Group
Utrecht University

1. The application

2. Parallel implementation

3. Model and experiments

4. Perspectives for distributed implementation


The application, Ensflow, assimilates ocean-flow data into a stochastic ocean-flow model.

Many realisations of the model with randomly distributed parameters, together forming an ensemble, are run.

Periodically these runs are integrated with satellite data and an optimal ensemble average is computed.

The sequence of ensemble averages over time describes the development of the ocean's currents that best fits the observations.
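This assimilation cycle can be illustrated with a toy sketch in Python. All names, sizes, and the observation stream below are invented for illustration; the real model is the stochastic ocean-flow code, not this scalar stand-in.

```python
import numpy as np

rng = np.random.default_rng(42)

n_members, n_steps, analysis_every = 8, 12, 4
# Each member gets a slightly perturbed model parameter (toy stand-in
# for the randomly distributed parameters of the real ensemble).
params = 1.0 + 0.05 * rng.standard_normal(n_members)
state = np.zeros(n_members)          # one scalar "flow" value per member

def observe(step):
    # Hypothetical observation stream (stands in for the satellite data).
    return 0.1 * step

for step in range(1, n_steps + 1):
    state += params * 0.1            # members evolve independently
    if step % analysis_every == 0:
        # Analysis: blend every member with the observation; the ensemble
        # mean is then the optimal estimate for the past period.
        state = 0.5 * state + 0.5 * observe(step)
        print(f"step {step:2d}: ensemble mean = {state.mean():.3f}")
```

The blending weight of 0.5 is arbitrary here; in the real analysis the weights follow from the measurement covariances.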

The application - 1


The application - 2

The region of interest is the southern tip of Africa:

Data from the TOPEX/Nimbus satellite are used for the assimilation.

The purpose is to understand the evolution of streams and eddies in this region.


The application - 3

Because of the stochastic nature of the model, many realisations of the model with slightly different parameter values are to be evolved. The observations of the top-layer values are interpolated to a 251x151 grid.

The ensemble members are allowed to develop independently for some time and are then combined to find the ensemble mean:

F + R^T B

with F the best estimate for the model evolution without observations,

R = matrix of field measurement covariances,

B = matrix of representer coefficients.
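The analysis relation above can be written out directly. The shapes and values below are illustrative assumptions, not Ensflow's actual dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)
n_state, n_obs = 6, 3                      # illustrative sizes only

F = rng.standard_normal(n_state)           # best estimate without observations
R = rng.standard_normal((n_obs, n_state))  # field measurement covariances
B = rng.standard_normal(n_obs)             # representer coefficients

# Analysis update in the representer form quoted on the slide:
analysis = F + R.T @ B
```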


The application - 4

The model performs two computationally intensive tasks:

1. Generation of the ensemble members.

2. The computational flow part that describes the evolution of the stream function.

Every 240 hourly timesteps an analysis of the ensemble is done to obtain the optimal estimate for the past period.


Parallel implementation - 1

1. Ensemble members are distributed evenly over the processors.

2. Data of ensemble members are independent and are local to the processors.

3. Only in the analysis phase, to determine the globally optimal field, do data have to be exchanged (using MPI).

4. The optimal global field is distributed and a new cycle is started.
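Step 1, the even distribution of ensemble members over the processors, can be sketched as plain bookkeeping. The `member_range` helper is hypothetical; the actual exchange in steps 3 and 4 is done with MPI.

```python
# Even distribution of n_e ensemble members over p processors; the
# gather/scatter at analysis time is what the real code does via MPI.
def member_range(rank, p, n_members):
    """Contiguous block of member indices owned by `rank` (hypothetical helper)."""
    base, extra = divmod(n_members, p)
    lo = rank * base + min(rank, extra)
    hi = lo + base + (1 if rank < extra else 0)
    return lo, hi

n_members, p = 60, 8
blocks = [member_range(r, p, n_members) for r in range(p)]
owned = [hi - lo for lo, hi in blocks]   # members per processor
```

With 60 members over 8 processors, the first four ranks own 8 members each and the rest own 7, so the load imbalance is at most one member.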


Parallel implementation - 2

The program contains 2 irreducible scalar parts:

1. Initialisation, linearly dependent on the number of ensemble members n_e, and depending on n_d, the number of gridpoints, by O(n_d^2 log n_d). Init time = t_i.

2. The analysis part, for which holds that the analysis time t_a ∝ n_e^3.

On the DAS-2 systems t_i ≈ 59.5 s and t_a ≈ 104 s (for n_e = 60).
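The two scalar terms can be captured in a small model. The constants below are simply fitted to the DAS-2 measurements quoted above (t_i = 59.5 s and t_a = 104 s at n_e = 60); the cubic law for the analysis is the one stated on this slide.

```python
# Scaling of the two irreducible scalar parts, constants fitted so that
# the DAS-2 measurements at n_e = 60 are reproduced.
def t_init(n_e, c=59.5 / 60):
    return c * n_e                 # init time is linear in the ensemble size

def t_analysis(n_e, c=104 / 60**3):
    return c * n_e**3              # analysis time grows as n_e**3
```

Doubling n_e doubles the init time but makes the analysis eight times as expensive.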


Parallel implementation - 3

The time per ensemble member per 24 h timestep t_s ≈ 30 s. This amounts to 20x60x30 = 36,000 s = 10 h single-processor time for the complete 20-day cycle considered.

After the init phase a distribute operation, and per analysis step a collect and a distribute operation, are required. The total amount of data moved per operation is an n_x × n_y field, l ≈ 1.5 MB.

The bandwidth at which this occurs is 120-140 MB/s (using Myrinet on one cluster). So, the total communication time is about 0.12 s per transfer.

Total communication time within one run: t_c ≈ 7.27 s.


Model and experiments - 1

The timing model has the following form:

T(p) = t_i + t_a + 1200 t_s / p + t_c
     = 59.5 + 104 + 36,000/p + t_c  (seconds)
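The model can be evaluated directly. The value t_c = 15 s is taken from the single-cluster observation quoted later; the other constants are the measurements above.

```python
# Timing model from the slide: scalar init + analysis, the parallel
# member evolution (1200 member-day steps of t_s seconds), and t_c.
def T(p, t_i=59.5, t_a=104.0, t_s=30.0, t_c=15.0):
    return t_i + t_a + 1200 * t_s / p + t_c

speedup_16 = T(1) / T(16)   # parallel efficiency stays high: ~14.9 on 16 CPUs
```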


Model and experiments - 2

Remarks:

1. There is a mystery with respect to the computation phases: t_c ≈ 17 s for p = 1, t_c ≈ 30 s for p > 1 consistently.

2. For p < 6, using 1 CPU/node is somewhat faster; from p = 6 on, 2 CPUs/node is marginally faster due to decreasing competition for memory and faster intra-node communication.


Model and experiments: Simulation results

Shown is a simulation of 180 daily periods; note the blue eddies that form counterclockwise in the Atlantic.


Perspectives for distributed implementation - 1

The timing model has the following form:

T(p) = t_i + t_a + 1200 t_s / p + t_c

In the single-cluster implementation t_c is quite small (ca. 15 s) and virtually independent of p.

For the distributed version this might not be the case:

1. Presently Globus cannot yet be used in conjunction with Myrinet's MPI; communication must be done via IP.

2. The geographical distance between the DAS clusters introduces non-negligible latencies.


Perspectives for distributed implementation - 2

As can be seen from the figure, the communication time is still insignificant when distributing the model over two locations (UU and VU):


Perspectives for distributed implementation - 3

t_c is quite erratic, more determined by synchronisation than by communication time:


Perspectives for distributed implementation - 4

The results show that this application is excellently suited for distributed processing. Still, both communication and the analysis phase may be made more efficient:

1. When it is known which process ids are located where, intra-cluster communication can be done first; then the assembled messages can be exchanged.

2. The analysis could be done on the local ensemble members (remember t_a ∝ n_e^3) and synchronised less frequently.
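Improvement 1 can be sketched as a two-level exchange. Cluster names, process ids, and the helper below are hypothetical; a real implementation would use MPI sub-communicators for the intra-cluster step.

```python
from collections import defaultdict

def two_level_exchange(owner_cluster, local_fields):
    """owner_cluster: process id -> cluster; local_fields: process id -> fields."""
    # Step 1: intra-cluster assembly over the cheap local network.
    per_cluster = defaultdict(list)
    for pid, fields in local_fields.items():
        per_cluster[owner_cluster[pid]].extend(fields)
    # Step 2: only one assembled message per cluster crosses the
    # high-latency wide-area link.
    return dict(per_cluster)

owner = {0: "UU", 1: "UU", 2: "VU", 3: "VU"}
fields = {0: ["m0"], 1: ["m1"], 2: ["m2"], 3: ["m3"]}
msgs = two_level_exchange(owner, fields)
```

Instead of every process sending over the wide-area link, each cluster sends one assembled message, which matters when latency dominates.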


Perspectives for distributed implementation - 5

Using more sites has a notable effect on the communication. Again, synchronisation effects are more important than the communication time proper:

Sites   Exec. time (s)   Comm. time (s)
  1         3310             45.1
  2         3274             62.9
  3         3339            208.5
  4         3299            151.9

12-proc. run: UU, UU+VU, UU+VU+Leiden, UU+VU+Leiden+Delft
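From the table above, the communication share of the total run time stays small even with more sites; a quick check in Python:

```python
# Communication share of the total run time, from the table of
# 12-processor runs over 1-4 DAS-2 sites.
runs = {1: (3310, 45.1), 2: (3274, 62.9), 3: (3339, 208.5), 4: (3299, 151.9)}
share = {sites: comm / total for sites, (total, comm) in runs.items()}
```

Even in the worst case (3 sites) communication is only about 6% of the run time, which supports the conclusion that the application is well suited for distributed processing.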


Perspectives for distributed implementation - 6

This case study was a particularly well-suited candidate for distributed processing. Apart from improving this implementation, we will proceed with three other promising projects:

1. Running two coupled oceanographic models within the Cactus framework.

2. Inexact sequence matching of genetic material.

3. Pattern recognition on proteomic micro arrays.

Acknowledgements: Fons van Hees, for the single-system parallelisation.