
Advances in Water Resources 28 (2005) 215–233

www.elsevier.com/locate/advwatres

A parallel multi-subdomain strategy for solving Boussinesq water wave equations

X. Cai a,b,*, G.K. Pedersen c, H.P. Langtangen a,b

a Simula Research Laboratory, P.O. Box 134, N-1325 Lysaker, Norway
b Department of Informatics, University of Oslo, P.O. Box 1080, Blindern, N-0316 Oslo, Norway

c Department of Mathematics, Mechanics Division, University of Oslo, P.O. Box 1053, Blindern, N-0316 Oslo, Norway

Received 25 June 2004; received in revised form 1 November 2004; accepted 5 November 2004

Available online 20 January 2005

Abstract

This paper describes a general parallel multi-subdomain strategy for solving the weakly dispersive and nonlinear Boussinesq water wave equations. The parallelization strategy is derived from the additive Schwarz method based on overlapping subdomains. Besides allowing the subdomains to independently solve their local problems, the strategy is also flexible in the sense that different discretization schemes, or even different mathematical models, are allowed in different subdomains. The parallelization strategy is particularly attractive from an implementational point of view, because it promotes the reuse of existing serial software and opens for the possibility of using different software in different subdomains.

We study the strategy's performance with respect to accuracy, convergence properties of the Schwarz iterations, and scalability through numerical experiments concerning waves in a basin, solitary waves, and waves generated by a moving vessel. We find that the proposed technique is promising for large-scale parallel wave simulations. In particular, we demonstrate that satisfactory accuracy and convergence speed of the Schwarz iterations are obtainable independent of the number of subdomains, provided there is sufficient overlap. Moreover, existing serial wave solvers are readily reusable when implementing the parallelization strategy.

© 2004 Elsevier Ltd. All rights reserved.

Keywords: Parallel computing; Wave equations; Domain decomposition; Additive Schwarz iterations

1. Introduction and motivation

Shallow water models have been dominant in many

branches of ocean modeling, tsunami computations

being one important example. Such models are efficient,

robust, allow explicit time stepping, and may treat

breaking waves provided that an appropriate shock capturing scheme is employed. Moreover, due to the use of

explicit numerical schemes, the shallow water models are

simple to parallelize. However, important physics, such

as (frequency) dispersion, is absent in the shallow water

formulation. Dispersion is crucial in a series of long


wave applications where the wave length to depth ratio

is only moderate (15 or less) or huge propagation distances are involved as for "trans-Pacific" tsunamis

[39]. In particular, important wave forms like solitary

waves, undular bores and dispersive wave trains, often

with dominant fronts, are not at all reproduced by shal-

low water theory. While solution of the primitive Navier–Stokes equations, or even full potential theory, is

still far too heavy for an ocean domain of appreciable

size, the Boussinesq type equations provide an attractive

alternative. These equations include both nonlinearity

and leading dispersion, while they are still depth averaged and favorable with respect to numerical solution.

In addition to ocean modeling and tsunamis, Boussinesq

models are also useful for internal waves, plasma physics, and applications in coastal engineering such as


Nomenclature

$\Omega$   the spatial solution domain
$\eta$   the water surface elevation
$\phi$   the depth averaged velocity potential
$H$   the water depth
$\epsilon$   the magnitude of dispersion
$\alpha$   the magnitude of nonlinearity
$p$   a known external pressure field
$T$   the stopping time for a simulation
$\bar\eta$   the approximate solution of $\eta$
$\bar\phi$   the approximate solution of $\phi$
$I^\eta_{DD}$   the average number of additive Schwarz iterations per time step needed for solving the discretized continuity equation (4)
$I^\phi_{DD}$   the average number of additive Schwarz iterations per time step needed for solving the discretized Bernoulli equation (5)
$E^\eta_{L_2}$ and $E^\phi_{L_2}$   the error of a numerical solution at the end of a simulation, measured in the $L_2$-norm. For example, $E^\eta_{L_2}$ is defined as
$$E^\eta_{L_2} \equiv \left(\int_\Omega \left(\eta(x,y,T) - \bar\eta(x,y,T)\right)^2 \,\mathrm{d}x\,\mathrm{d}y\right)^{\frac12};$$
$E^\phi_{L_2}$ is defined similarly
$E^{\eta,\mathrm{ref}}_{L_2}$ or $E^{\phi,\mathrm{ref}}_{L_2}$   the $L_2$-norm of the deviation between a numerical solution and a reference solution, which is obtained by running the serial one-domain solution strategy on a very fine grid
$P$   the number of processors, the same as the number of subdomains
$WT$   the total wall-time consumption of a parallel simulation


swells in shallow water and harbor seiches. One theme

that has been popular during the last decades is the generation of nonlinear surface waves by a moving source

at trans-critical speed. Recently, the practical implica-

tions of this topic have become more important due to

the increased use of high speed ferries in coastal waters.

Such vessels traveling in shallow water have occasionally

produced waves ("Solitary killers") with fatal results [24,6]. A numerical case study in the present paper is inspired by this type of application.

Boussinesq type equations have been solved numeri-

cally by finite differences [40,1,41,36,34] or finite ele-

ments [27,46,33,44]. All the models in these references

involve large sets of implicit equations at each time step.

In comparison with the shallow water equations, this

makes the computations and memory requirements substantially heavier. In addition, parallel computing becomes at the same time both more difficult and much

more crucial. This leads to the theme of the present pa-

per: parallel solution of Boussinesq equations by do-

main decomposition.

Parallel computing is an indispensable technique for

running large-scale simulations, which may arise from

a combination of many millions of degrees of freedom and many thousands of time steps. Single-processor computers are limited in processing speed and in the size of local memory and storage, and are thus unfit for large-scale simulations. Parallelization of a serial wave

simulator can be done in different fashions. In case the

entire solution domain uses the same mathematical

model and discretization scheme, a possible paralleliza-

tion approach is to divide the loop- or array-level operations among multiple processors. For example, when

traversing a long array, a master process can spawn a

set of worker processes, each being responsible for a

portion of the long array. Such a parallelization ap-

proach maintains a global image of the global simula-

tion and is particularly attractive for shared-memory

parallel computing systems. The overall numerical strat-

egy following this parallelization approach remains the

same as that in the serial simulator.

A more flexible parallelization approach is to explicitly partition the global solution domain into several

subdomains, each being the responsibility of an assigned

processor. The resulting overall numerical strategy may

be different from that of the serial simulator, and thus

more favorable for parallel computing. Each subdomain

is given more independence, i.e., either different discret-

ization schemes, or different local solution methods, or

even different mathematical models may be allowed in different subdomains. For tsunami simulations, for instance, linear finite difference methods may be used in

deep sea domains, while shallower coastal regions may

be treated by nonlinear finite element techniques that en-

able boundary fitted discretization and local refinement.

Therefore, a resource-effective simulation strategy

should adopt a hybrid approach that divides the entire

solution domain into multiple computational regions. Advanced numerical schemes together with unstruc-

tured grids are used for regions that require high accu-

racy, such as the coastal regions where the water is

shallow and the coast line is of a complicated shape.

On the other hand, simple rectangular grids and finite

difference schemes can be applied to regions that corre-

spond to the vast open sea. The essence of such a hybrid

simulation strategy is that each computational region computes its local solution mostly by itself. A certain

amount of overlap is needed between the neighboring


computational regions. Within each overlapping zone,

the different locally computed solutions are exchanged

and then averaged between the neighbors to ensure a

smooth transition of the overall solution. This proce-

dure of local computation plus solution exchange nor-

mally needs to undergo a few iterations during each time step, so that convergence is obtained for a global

solution composed of the regional solutions.

We will consider in this paper a particular variant of

parallelization based on overlapping subdomains. In re-

spect of programming, we adopt the general parallel

programming model of message passing [23]. That is,

the exchange of information between processors is in

the form of explicitly sending and receiving messages. The mathematical foundation for our parallelization ap-

proach is additive Schwarz iterations, see [12,42]. We re-

mark that this parallelization approach is fully

compatible with the hybrid simulation strategy men-

tioned above, where a computational region may be fur-

ther decomposed into smaller subdomains. During each

time step, the processors first independently carry out

local discretizations restricted to the subdomains. Then, to find a global solution, an iterative procedure proceeds

until convergence, where in each iteration every subdo-

main solves a local problem and exchanges solutions

in the overlapping zones with the neighbors.

The advantages of the Schwarz iteration-based par-

allelization approach include a simple algorithmic struc-

ture, rapid convergence, good re-usability of existing

serial code, and no need for special interface solvers between neighboring subdomains. (We remark that non-

overlapping domain decomposition algorithms involve

special interface solvers, see [12,42].) The disadvantages

of our overlapping subdomain-based strategy are two-

fold. First, an additional layer of iterations has to be

introduced, i.e., between the time-stepping iterations

and the subdomain iterations. This extra cost may some-

times be compensated by a combination of fast Schwarz convergence and cheap subdomain solvers, especially

when the number of degrees of freedom becomes large.

Second, there will arise a certain amount of extra local

computational work due to overlap, slightly affecting

the speed up of parallel simulations. However, we will

show by numerical experiments that this is not a severe

problem for really large-scale simulations.

The work of the present paper is mainly motivated by [22], where it has been shown that rapid convergence of

the additive Schwarz iterations is obtainable for the one-

dimensional Boussinesq equations. This nice behavior of

wave-related problems is due to the fact that the spread

of information is limited by the wave speed, thus giving

rise to faster convergence than that of standard elliptic

boundary value problems with pure Laplace operators,

in particular when sufficient overlap is employed. More specifically, theoretical analyses of the one-dimensional

Boussinesq equations in [22] show that an amount of

overlap between subdomains of the order of water depth

can ensure rapid convergence within a few additive Sch-

warz iterations. Hence, a correspondingly good effi-

ciency is to be expected when the technique is

employed in two dimensions.

In this paper, we investigate the convergence in two-dimensional cases for the method proposed in [22].

Many practical issues that are not addressed in [22]

are considered. More specifically, we study different

mechanisms of determining how to terminate the Sch-

warz iteration, including both local-type and global-type

convergence criteria. It will also be shown that the the-

oretical estimate of the overlap amount from [22] may

sometimes be too strict. That is, the additive Schwarz iterations may be robust enough in many cases to allow

a small amount of overlap between subdomains. The

numerical accuracy is studied with respect to the number

of subdomains. Moreover, we analyze the performance

scalability with respect to both the number of degrees

of freedom and the number of processors.

It should be stated that domain decomposition meth-

ods have been extensively used to solve PDEs, see e.g. [4,30,29]. Applications to wave-related problems have

also been studied by numerous authors, see e.g.

[17,3,2,16,21,5,15,19,25], to just mention a few. Many

of the cited papers deal with the Helmholtz equation,

which has a significantly different numerical nature in

comparison with the time-discrete equations of the pres-

ent paper. To the authors' knowledge, there are no

papers (except for [22]) that address domain decomposition and parallel computing for PDEs of the same nat-

ure as the Boussinesq equations, i.e., equations with

limited signal speed and implicit dispersion terms.

The remainder of the paper is organized as follows.

First, Section 2 presents the mathematical model of

the Boussinesq equations and standard single-domain

discretization techniques. Then, we devote Section 3 to

the explanation of our multi-subdomain strategy, addressing both the mathematical background and

numerical details. Later on, Section 4 investigates the

behavior of the parallel solution strategy by several

numerical experiments, and Section 5 concludes the pa-

per with some remarks and discussions.

2. Mathematical model and discretization

2.1. The Boussinesq equations

The Boussinesq equations in a scaled form are con-

sidered in this paper. More specifically, we introduce a

typical depth, H0, and a typical wavelength, L, as the

vertical and horizontal length scales, respectively. Selecting $L/\sqrt{gH_0}$ as the time scale and extracting an amplitude factor $a$ from the field variables, we then obtain

the Boussinesq equations in the following form [45]:


$$\frac{\partial\eta}{\partial t} + \nabla\cdot\mathbf{q} = 0, \qquad (1)$$

$$\frac{\partial\phi}{\partial t} + \frac{\alpha}{2}\nabla\phi\cdot\nabla\phi + \eta + p - \frac{\epsilon}{2}H\nabla\cdot\left(H\nabla\frac{\partial\phi}{\partial t}\right) + \frac{\epsilon}{6}H^2\nabla^2\frac{\partial\phi}{\partial t} = 0, \qquad (2)$$

where $\epsilon \equiv (H_0/L)^2$ must be small for the equations to apply. In the above system of partial differential equations (PDEs), we recognize Eq. (1) as the continuity equation, whereas Eq. (2) is a variant of the Bernoulli (momentum) equation. The primary unknowns are the surface elevation $\eta(x,y,t)$ and the depth averaged velocity potential $\phi(x,y,t)$. In Eq. (2), $H(x,y)$ denotes the water depth and $p(x,y,t)$ is a known external pressure applied to the surface. The latter is assumed to be of the same order as $\eta$ or less. In Eq. (1), the flux function $\mathbf{q}$ is given by

$$\mathbf{q} = (H + \alpha\eta)\nabla\phi + \epsilon H\left(\frac{1}{6}\frac{\partial\eta}{\partial t} - \frac{1}{3}\nabla H\cdot\nabla\phi\right)\nabla H. \qquad (3)$$

Eqs. (1) and (2) are assumed to be valid in a two-dimensional domain $\Omega$, where suitable boundary conditions apply on $\partial\Omega$. In the applications of the present paper we employ no-flux conditions, implying that $\mathbf{q}\cdot\mathbf{n} = 0$ and $\partial\phi/\partial n = 0$. In addition, the Boussinesq equations are supplemented with initial conditions in the form of prescribed $\eta(x,y,0)$ and $\phi(x,y,0)$. The complete mathematical problem is thus to find $\eta(x,y,t)$ and $\phi(x,y,t)$, $0 < t \leq T$, as solutions to Eqs. (1) and (2), subject to the boundary and initial conditions.

The Boussinesq equations as given above are limited

to potential flow, which excludes wave-breaking, bottom

friction, the Coriolis effect, and other sources of vortic-

ity, which may be incorporated if velocities are used as

variables instead of the potential. Still, for many aspects

of tsunami propagation and coastal engineering the assumption of potential flow is appropriate. Moreover,

regarding the application of overlapping domain decom-

position methods, the experiences obtained for the pres-

ent model will also be valid when primitive variables are

employed, due to the likeness in the structure of the non-

linear and dispersion terms; see also [22].

2.2. Temporal and spatial discretization

The temporal and spatial discretization proposed in

[33] is employed in the present work. The time domain

$0 < t \leq T$ is divided into discrete time levels:
$$0, \Delta t, 2\Delta t, \ldots, T.$$
During each time step $t_{\ell-1} < t \leq t_\ell$, where $t_{\ell-1} = (\ell-1)\Delta t$ and $t_\ell = \ell\Delta t$, the Boussinesq equations (1, 2) are discret-

ized by centered differences in time, and a Galerkin finite

element method or centered finite differences are used in

the spatial discretization. A staggered grid in time [36] is

used, where $\eta$ is sought at integer time levels $(\ell)$ and $\phi$ is sought at half-integer time levels $(\ell + \frac{1}{2})$. Using a superscript, as in $\eta^\ell$ and $\phi^{\ell+\frac12}$, we can formulate the time-discrete problem as follows:

$$\frac{\eta^\ell - \eta^{\ell-1}}{\Delta t} + \nabla\cdot\left[\left(H + \alpha\,\frac{\eta^{\ell-1} + \eta^\ell}{2}\right)\nabla\phi^{\ell-\frac12} + \epsilon H\left(\frac{1}{6}\,\frac{\eta^\ell - \eta^{\ell-1}}{\Delta t} - \frac{1}{3}\nabla H\cdot\nabla\phi^{\ell-\frac12}\right)\nabla H\right] = 0, \qquad (4)$$

$$\frac{\phi^{\ell+\frac12} - \phi^{\ell-\frac12}}{\Delta t} + \frac{\alpha}{2}\nabla\phi^{\ell-\frac12}\cdot\nabla\phi^{\ell+\frac12} + \eta^\ell + p^\ell - \frac{\epsilon}{2}H\nabla\cdot\left(H\nabla\,\frac{\phi^{\ell+\frac12} - \phi^{\ell-\frac12}}{\Delta t}\right) + \frac{\epsilon}{6}H^2\,\frac{\nabla^2\phi^{\ell+\frac12} - \nabla^2\phi^{\ell-\frac12}}{\Delta t} = 0. \qquad (5)$$

In addition to centered differences in time we have used

an arithmetic mean for the $\eta$ term in Eq. (3) and a geometric mean for the $\nabla\phi\cdot\nabla\phi$ term in Eq. (2). The latter yields a linear set of equations to be solved for new $\phi$ values at each time step. The discretization techniques have been applied to long term numerical simulations in, e.g., [36,38], and have also been found to be stable for the solitary wave solution [37]. Another nice feature of these approximations is that they imply an operator splitting in the sense that the originally coupled system (1) and (2) can be solved in sequence. That is, Eq. (4) is solved first to find $\eta^\ell$, using $\eta^{\ell-1}$ and $\phi^{\ell-\frac12}$ from the computations at the previous time level. Then, Eq. (5) is solved to find $\phi^{\ell+\frac12}$, using the recently computed $\eta^\ell$ and the previous $\phi^{\ell-\frac12}$ as known quantities.

There are several options for the spatial discretiza-

tion. If the finite difference method is preferred, a rectan-

gular spatial grid (iDx, jDy) is normally needed for

discretizing X. Eqs. (4) and (5) are spatially discretized

on the interior grid points, while the boundary condi-

tions must be incorporated on the grid points lying along $\partial\Omega$. The approximate finite difference solutions, named $\bar\eta^\ell(x,y)$ and $\bar\phi^{\ell+\frac12}(x,y)$, are sought on all the spatial grid points, in the form of discrete values $\eta^\ell_{i,j}$ and $\phi^{\ell+\frac12}_{i,j}$.

If the finite element method is chosen, more freedom is allowed in constructing the spatial discretization, since

we may employ unstructured grids and the elements

may have different sizes, shapes, and approximation

properties. The finite element solutions, also denoted

by $\bar\eta^\ell(x,y)$ and $\bar\phi^{\ell+\frac12}(x,y)$, have values over the entire spatial domain $\Omega$. These solutions are as usual taken as lin-

ear combinations of a set of basis functions Ni defined

on the finite element grid

$$\bar\eta^\ell(x,y) = \sum_i \eta^\ell_i N_i(x,y), \qquad \bar\phi^{\ell+\frac12}(x,y) = \sum_i \phi^{\ell+\frac12}_i N_i(x,y).$$


These expressions are inserted in the time-discrete equa-

tions (4, 5), and the resulting residual is integrated

against weighting functions Ni according to the Galerkin

method. Second order terms are integrated by parts, and

the no-flux boundary conditions imply that the bound-

ary integrals vanish. The objective of solution is to find

the coefficient vectors $\{\eta^\ell_i\}$ and $\{\phi^{\ell+\frac12}_i\}$.
The accuracy of the numerical method is typically

second order in time and space if bilinear/linear elements

or centered spatial differences are used. Adjustments of

the scheme for achieving fourth order accuracy are de-

scribed in [33], where accuracy and stability are analyzed

in detail. The stability criterion can be written as
$$\Delta t \leq \sqrt{\frac{h^2}{\max_\Omega H} + \frac{4}{3}\,\epsilon \min_\Omega H}, \qquad (6)$$
maybe with some smaller adjustments depending on the details of the discretization, see [33]. Note that the implicit treatment of dispersion gives a more favorable stability criterion (because $\epsilon > 0$ in Eq. (6)), in comparison with the corresponding non-dispersive case.
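As a small illustration, the following Python sketch evaluates the bound in Eq. (6) for a hypothetical depth field on a uniform grid; the depth values and element size are made up and only serve to show how the two terms of the criterion enter.

```python
import numpy as np

# Hypothetical depth field and element size, purely for illustrating Eq. (6).
h, eps = 0.05, 1.0                                 # element size, dispersion parameter
H = 0.04 * (1.0 + 0.2 * np.random.rand(100, 100))  # made-up water depth values
dt_max = np.sqrt(h**2 / H.max() + (4.0 / 3.0) * eps * H.min())
print("largest stable time step according to (6):", dt_max)
```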

No matter whether finite differences or finite elements

are chosen, the final result of the spatial discretization of

Eqs. (4) and (5) are the following two systems of linear

equations:

$$A_\eta\!\left(\phi^{\ell-\frac12}\right)\eta^\ell = b_\eta\!\left(\eta^{\ell-1}, \phi^{\ell-\frac12}\right), \qquad (7)$$

$$A_\phi\!\left(\phi^{\ell-\frac12}\right)\phi^{\ell+\frac12} = b_\phi\!\left(\eta^\ell, \phi^{\ell-\frac12}\right), \qquad (8)$$

which need to be solved at each time step. We remark that the vectors $\eta^\ell$ and $\phi^{\ell+\frac12}$ contain either the discrete values $\eta^\ell_{i,j}$ and $\phi^{\ell+\frac12}_{i,j}$ in the finite difference method, or the coefficients $\eta^\ell_i$ and $\phi^{\ell+\frac12}_i$ in the finite element method. The system matrices $A_\eta$ and $A_\phi$ depend on the latest solution of $\phi$, whereas the right-hand side vectors $b_\eta$ and $b_\phi$ depend on the latest solutions of $\eta$ and $\phi$.

We may summarize the computations as follows in case of a standard single-domain numerical strategy for single-processor computers:

A single-domain numerical strategy
For time step $t_{\ell-1} < t \leq t_\ell$:
1. Use $\eta^{\ell-1}$ and $\phi^{\ell-\frac12}$ as known solutions and discretize Eq. (1).
2. Solve the resulting linear system (7) for finding $\eta^\ell$.
3. Use $\eta^\ell$ and $\phi^{\ell-\frac12}$ as known solutions and discretize Eq. (2).
4. Solve the resulting linear system (8) for finding $\phi^{\ell+\frac12}$.

We remark that the above numerical strategy assumes that the same mathematical model and spatial discretization scheme apply to the entire spatial domain $\Omega$.
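To make steps 1-4 concrete, the sketch below advances the staggered scheme in the simplest possible setting: the linear, non-dispersive, one-dimensional case with constant depth ($\alpha = \epsilon = 0$), where the systems (7) and (8) degenerate to explicit updates. The parameter values and the standing-wave test are our own choices for illustration; the actual solver of the paper assembles and solves (7) and (8) instead.

```python
import numpy as np

# Linear, non-dispersive 1D case (alpha = eps = 0, constant depth H):
#   continuity:  eta^l      = eta^(l-1) - dt * H * d2(phi^(l-1/2))/dx2
#   Bernoulli:   phi^(l+1/2) = phi^(l-1/2) - dt * eta^l
H, A, m = 0.04, 0.008, 3                  # depth, amplitude, mode number
omega = m * np.pi * np.sqrt(H)            # linear dispersion relation omega^2 = H k^2
nx, T = 200, 2.0
x = np.linspace(0.0, 1.0, nx + 1)
dx = x[1] - x[0]
dt = 0.8 * dx / np.sqrt(H)                # within the CFL limit dt <= dx / sqrt(H)
nsteps = int(round(T / dt))

eta = A * np.cos(m * np.pi * x)                                       # eta at t = 0
phi = -(A / omega) * np.cos(m * np.pi * x) * np.sin(omega * dt / 2)   # phi at t = dt/2

for _ in range(nsteps):
    # steps 1-2: update eta from the continuity equation
    lap = np.empty_like(phi)
    lap[1:-1] = (phi[2:] - 2 * phi[1:-1] + phi[:-2]) / dx**2
    lap[0] = 2 * (phi[1] - phi[0]) / dx**2          # no-flux boundary: phi_x = 0
    lap[-1] = 2 * (phi[-2] - phi[-1]) / dx**2
    eta = eta - dt * H * lap
    # steps 3-4: update phi from the Bernoulli equation
    phi = phi - dt * eta

t_end = nsteps * dt
exact = A * np.cos(m * np.pi * x) * np.cos(omega * t_end)
print("max error in eta at t =", t_end, ":", np.max(np.abs(eta - exact)))
```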

3. A parallel multi-subdomain strategy

As mentioned earlier in Section 1, we will adopt a

particular parallelization technique based on subdo-

mains. More specifically, we explicitly divide the global

solution domain $\Omega$ into a set of overlapping subdomains $\{\Omega_s\}$. Each subdomain becomes an independent working unit, which mostly concentrates on local discretizations within $\Omega_s$ and solving local linear systems. In

addition, the subdomains frequently collaborate with

each other, in a form that neighboring subdomains ex-

change local solutions within overlapping zones. A loose

synchronization of the work progress on the subdo-

mains has also to be enforced.

3.1. Additive Schwarz iterations

The mathematical foundation of our parallelization

approach is the additive Schwarz method, see [12,42].

The numerical strategy behind this domain decomposi-

tion method can be understood as follows. Suppose we

want to solve a global linear system

$$Ax = b, \qquad (9)$$
which arises from discretizing a PDE on a global domain $\Omega$. Given a set of subdomains $\{\Omega_s\}$ such that $\Omega = \cup\,\Omega_s$ and there is a certain amount of overlap between neighboring subdomains, we locally discretize the PDE in every subdomain $\Omega_s$. The result of the local discretization is
$$A_s x_s = b_s\!\left(x|_{\partial\Omega_s\setminus\partial\Omega}\right). \qquad (10)$$

Mathematically, the above linear system arises from

restricting the discretization of the PDE within $\Omega_s$. The only special treatment happens on the so-called internal boundary $\partial\Omega_s\setminus\partial\Omega$, i.e., the part of $\partial\Omega_s$ that does not coincide with the physical boundary $\partial\Omega$ of the global domain. We remark that a requirement of the overlapping zones says that any point lying on the internal boundary of a subdomain must also be an interior point in at least one of the other subdomains. On the internal boundary, artificial Dirichlet conditions are repeatedly updated with new values given by the solution (in the previous iteration) in the neighboring subdomains. The involvement of the artificial Dirichlet conditions is indicated by the notation $b_s(x|_{\partial\Omega_s\setminus\partial\Omega})$ in Eq. (10). On the remaining part of $\partial\Omega_s$, the original boundary conditions of the global PDE are valid as before.

Actually, a subdomain matrix As can also arise from

first building the global matrix A and then cutting out

the portion of A that corresponds to the grid points lying

in $\Omega_s$. However, this approach requires unnecessary construction and storage of global matrices, which is not a desired situation during parallel computations. We just make the point that the global matrix $A$ can be logically

represented by the collection of subdomain matrices As.


For the artificial Dirichlet conditions to converge toward the correct values on the internal boundary, iterations need to be carried out. That is, we generate on each subdomain a series of approximate solutions $x_s^0, x_s^1, x_s^2, \ldots$, which will hopefully converge toward the correct solution $x_s = x|_{\Omega_s}$. One such additive Schwarz iteration is defined as
$$x_s^k = \tilde A_s^{-1} b_s\!\left(x^{k-1}|_{\partial\Omega_s\setminus\partial\Omega}\right), \qquad x^k = \text{composition of all } x_s^k. \qquad (11)$$

We note that the subdomain local solves can be carried out independently in each additive Schwarz iteration. This immediately gives rise to the possibility of parallel computing. The symbol $\tilde A_s^{-1}$ in Eq. (11) indicates that a local solve can be approximate, not necessarily an exact inverse of $A_s$. The right-hand side vector $b_s$ needs to be updated with artificial Dirichlet conditions on the points that lie on the internal boundary, using the solution of the previous Schwarz iteration provided by the neighboring subdomains. At the end of the $k$th additive Schwarz iteration, the (logically existing) global approximate solution $x^k$ is composed on the basis of the subdomain approximate solutions $\{x_s^k\}$. The principle of partition of unity, which roughly means that composing subdomain solutions of constant one should result in a global solution of constant one (see e.g. [12]), is used in the following rule for composing a global solution:

• An overlapping point refers to a point that lies inside a zone of overlap, i.e., the point belongs to at least two subdomains.
• For every non-overlapping point, i.e., a point that belongs to only one subdomain, the global solution attains the same value as that inside the host subdomain.
• For every overlapping point, let us denote by $n_{\mathrm{total}}$ the total number of host subdomains that own this point. Let also $n_{\mathrm{interior}}$ denote the number of subdomains, among those $n_{\mathrm{total}}$ host subdomains, which do not have the point lying on their internal boundaries. (The setup of the overlapping subdomains ensures $n_{\mathrm{interior}} \geq 1$.) Then, the average of the $n_{\mathrm{interior}}$ local values becomes the global solution on the point. The other $n_{\mathrm{total}} - n_{\mathrm{interior}}$ local values are not used, because the point lies on the internal boundary there. Finally, the obtained global solution is enforced in each of the $n_{\mathrm{total}}$ host subdomains. For the $n_{\mathrm{total}} - n_{\mathrm{interior}}$ host subdomains, which have the point lying on their internal boundary, the obtained global solution will be used as the artificial Dirichlet condition during the next Schwarz iteration (see the small sketch after this list).
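As a small illustration of this rule, the Python fragment below composes the global value at a single overlapping point; the subdomain indices and values are purely hypothetical.

```python
# Hypothetical data for one overlapping point: local_value[s] is subdomain s's
# computed value there, and on_internal_boundary[s] tells whether the point
# lies on that subdomain's internal boundary.
local_value = {2: 0.731, 5: 0.728, 9: 0.902}          # n_total = 3 host subdomains
on_internal_boundary = {2: False, 5: False, 9: True}  # n_interior = 2

interior_values = [v for s, v in local_value.items() if not on_internal_boundary[s]]
global_value = sum(interior_values) / len(interior_values)  # average of n_interior values

# The composed value is enforced in all n_total hosts; for subdomain 9 it acts
# as the artificial Dirichlet value in the next Schwarz iteration.
for s in local_value:
    local_value[s] = global_value
```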

To compose the global solution and update the arti-

ficial Dirichlet conditions, as described by the above

rule, we need to carry out a procedure of communica-

tion among the neighboring subdomains at the end of

each Schwarz iteration. During this procedure of com-

munication, each pair of neighboring subdomains ex-

changes between each other an array of values that are

associated with their shared overlapping points. It is clear that if each subdomain solution $x_s^k$ converges toward the correct solution $x|_{\Omega_s}$, the difference between

the subdomain solutions in an overlapping zone will

eventually disappear.
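To make the procedure concrete, the following self-contained Python sketch applies additive Schwarz iterations to a deliberately simple model problem, the 1D Poisson equation $-u'' = 1$ on $(0,1)$ with two overlapping subdomains (not the Boussinesq systems of this paper). It shows the independent local solves with artificial Dirichlet data, the composition rule described above, and the convergence of the iterates.

```python
import numpy as np

# Additive Schwarz on a toy problem: -u'' = 1 on (0,1), u(0) = u(1) = 0,
# with two overlapping subdomains covering points 0..60 and 40..100.
n = 101
x = np.linspace(0.0, 1.0, n)
h = x[1] - x[0]
f = np.ones(n)
u_exact = 0.5 * x * (1.0 - x)

def solve_subdomain(lo, hi, left_bc, right_bc):
    """Direct solve of -u'' = f on grid points lo..hi with Dirichlet ends."""
    m = hi - lo + 1
    A = np.zeros((m, m))
    b = f[lo:hi + 1] * h**2
    for k in range(1, m - 1):
        A[k, k - 1], A[k, k], A[k, k + 1] = -1.0, 2.0, -1.0
    A[0, 0] = A[-1, -1] = 1.0
    b[0], b[-1] = left_bc, right_bc
    return np.linalg.solve(A, b)

dom1, dom2 = (0, 60), (40, 100)       # overlap zone: points 40..60
u = np.zeros(n)                       # composed global solution, initial guess 0

for k in range(200):                  # additive Schwarz iterations
    # Independent local solves (done in parallel in a real code), both using
    # the previously composed solution as artificial Dirichlet data.
    u1 = solve_subdomain(*dom1, left_bc=0.0, right_bc=u[dom1[1]])
    u2 = solve_subdomain(*dom2, left_bc=u[dom2[0]], right_bc=0.0)

    u_new = np.zeros(n)
    u_new[:dom2[0]] = u1[:dom2[0] - dom1[0]]          # points owned by dom1 only
    u_new[dom1[1] + 1:] = u2[dom1[1] + 1 - dom2[0]:]  # points owned by dom2 only
    # Internal-boundary points take the neighbour's interior value; strictly
    # interior overlap points are averaged (partition of unity).
    u_new[dom2[0]] = u1[dom2[0] - dom1[0]]
    u_new[dom1[1]] = u2[dom1[1] - dom2[0]]
    ov = np.arange(dom2[0] + 1, dom1[1])
    u_new[ov] = 0.5 * (u1[ov - dom1[0]] + u2[ov - dom2[0]])

    change = np.max(np.abs(u_new - u))
    u = u_new
    if change < 1e-10:
        break

print(k + 1, "Schwarz iterations, max error vs exact:", np.max(np.abs(u - u_exact)))
```

Shrinking the overlap zone in this toy example slows the convergence noticeably, in line with the role of overlap discussed for the Boussinesq systems later in the paper.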

A well-known technique in domain decomposition

methods for obtaining convergence independent of the

number of subdomains is the so-called coarse grid cor-

rection, see e.g. [12]. The rough idea behind one coarse grid correction is that a global "residual" is mapped

to another very coarse global grid and resolved accu-

rately there. Then, the computed ‘‘correction’’ is inter-

polated back to the global fine grid level and thus

improves the accuracy of the global solution. Coarse

grid corrections have been proved to be essential for do-

main decomposition methods to solve purely elliptic

PDEs, where the "spread of information" is infinitely fast. However, we will deliberately avoid using coarse

grid corrections in our paper, because the spread of

information in the Boussinesq equations is limited by

the wave speed. Numerical experiments in Section 4 will

indicate that convergence independent of the number of

subdomains is obtainable in many cases. In fact, one-

dimensional experiments from [22] displayed a signifi-

cant effect of coarse grid corrections only when the global coarse grid has a sufficiently high resolution.

3.2. Solving the Boussinesq equations in parallel

3.2.1. Domain partitioning

To solve the Boussinesq equations (1, 2) in parallel,

we need to first partition the global domain $\Omega$ into overlapping subdomains $\Omega_s$, $s = 1, 2, \ldots, P$. The overlapping subdomains may arise as follows. First, the global domain $\Omega$ is partitioned into $P$ non-overlapping subdomains $\widehat\Omega_s$, $s = 1, 2, \ldots, P$. For example, if $\Omega$ is of a

rectangular shape, it can be partitioned regularly into

a mesh of smaller rectangles. A more general partition-

ing scheme, which can be applied to both rectangular-

and irregular-shaped $\Omega$, should allow curves as the borders between neighboring subdomains. The resulting partitioning is consequently of an unstructured nature, see Fig. 1 for an example.
When a non-overlapping partitioning of $\Omega$ is done, each subdomain $\widehat\Omega_s$ is enlarged with a certain amount,

so that overlapping zones arise between neighboring

subdomains. The final results are the overlapping sub-

domains $\{\Omega_s\}$. A rule of thumb for domain partitioning is that the subdomains should have approximately the same number of degrees of freedom. In addition, the

length of the internal boundaries should preferably be


Fig. 1. An example of an unstructured partitioning of $\Omega$ into 16 subdomains (a), and the approximate solution $\bar\eta$ obtained from a corresponding parallel simulation at t = 2 (b).


small, for limiting the communication overhead in the

parallel simulations.
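A minimal sketch of this two-stage construction, for the simple case of a 1D row of elements split into contiguous pieces, is given below; the function name and the one-element-layer overlap are illustrative assumptions only.

```python
def overlapping_partition(n_elements, P, overlap=1):
    """Split n_elements into P contiguous non-overlapping pieces, then
    enlarge each piece by `overlap` element layers on both sides."""
    bounds = [round(s * n_elements / P) for s in range(P + 1)]
    nonoverlapping = [(bounds[s], bounds[s + 1]) for s in range(P)]
    return [(max(0, lo - overlap), min(n_elements, hi + overlap))
            for lo, hi in nonoverlapping]

print(overlapping_partition(100, 4, overlap=1))
# [(0, 26), (24, 51), (49, 76), (74, 100)]
```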

3.2.2. Distributed discretization

When the overlapping subdomains are ready, parallelization can be done by modifying the single-domain

elization can be done by modifying the single-domain

numerical strategy, on the basis of the overlapping sub-

domains. That is, each subdomain solver (on Xs) is only

responsible for finding the approximate solutions within

Xs, denoted by g‘s and /‘þ1

2

s . Regarding the discretization,

the main difference between a serial strategy and a par-

allel one is the area of discretization. In a parallel strat-

egy, the discretization can be carried out independently

on the subdomains. Optionally, different discretization schemes or even different mathematical models can be

adopted by the subdomains. The result of a parallel dis-

cretization is that a global matrix $A_\eta$ is distributed as a set of subdomain matrices $\{A_{\eta,s}\}$, and similarly $A_\phi$ is distributed as $\{A_{\phi,s}\}$.

3.2.3. Solving the discretized equations in parallel

To solve the two distributed systems of linear equations during each time step, i.e., the distributed form of Eqs. (7) and (8) among the subdomains, we use the additive Schwarz iterations described in Section 3.1. More specifically, during time step $t_{\ell-1} < t \leq t_\ell$, subdomain $\Omega_s$ participates in the solution of Eq. (7) by the following iterations: for $k = 1, 2, \ldots$,
$$A_{\eta,s}\!\left(\phi^{\ell-\frac12}_s\right)\eta^{\ell,k}_s = b_{\eta,s}\!\left(\eta^{\ell-1}_s, \phi^{\ell-\frac12}_s, \eta^{\ell,k-1}|_{\partial\Omega_s\setminus\partial\Omega}\right) \qquad (12)$$
until global convergence. Here, $\eta^{\ell-1}_s$ and $\phi^{\ell-\frac12}_s$ denote the converged solutions of the additive Schwarz iterations from the previous time step. The initial guess $\eta^{\ell,0}_s$ of the Schwarz iterations for the current time level is the same as $\eta^{\ell-1}_s$. We remark that updating the artificial Dirichlet conditions per Schwarz iteration is indicated by $\eta^{\ell,k-1}|_{\partial\Omega_s\setminus\partial\Omega}$ in Eq. (12). The intermediate global solution $\eta^{\ell,k-1}$ needs only to exist logically, and is composed of the intermediate subdomain solutions $\{\eta^{\ell,k-1}_s\}$, as described in Section 3.1. We also remark that the subdomain systems (12) can be solved totally independently of each other. An approximate subdomain solver is often sufficient for $A_{\eta,s}$. Moreover, a loose synchronization between the subdomains has to be enforced, in the sense that no subdomain can start with its local work for the next Schwarz iteration, before all the other subdomains have finished their local solves for the current Schwarz iteration and have communicated with each other.
Similarly, to solve Eq. (8) in parallel, we use the following additive Schwarz iterations on subdomain $\Omega_s$: for $k = 1, 2, \ldots$,
$$A_{\phi,s}\!\left(\phi^{\ell-\frac12}_s\right)\phi^{\ell+\frac12,k}_s = b_{\phi,s}\!\left(\eta^\ell_s, \phi^{\ell-\frac12}_s, \phi^{\ell+\frac12,k-1}|_{\partial\Omega_s\setminus\partial\Omega}\right) \qquad (13)$$
until global convergence. The initial guess $\phi^{\ell+\frac12,0}_s$ is the same as $\phi^{\ell-\frac12}_s$.

3.2.4. Checking the global convergence of Schwarz

iterations

An important issue for the above Schwarz iterations

is checking the convergence of the intermediate global

solutions

$$\eta^{\ell,1}, \eta^{\ell,2}, \ldots, \eta^{\ell,k} \quad \text{and} \quad \phi^{\ell+\frac12,1}, \phi^{\ell+\frac12,2}, \ldots, \phi^{\ell+\frac12,k}.$$

If the same mathematical model and discretization scheme are used in all the subdomains, a global-type convergence monitor can check an intermediate global residual vector. Let us consider, for instance, the case of the continuity equation. The subdomain linear systems (12) logically constitute the following global linear system:
$$A_\eta\!\left(\phi^{\ell-\frac12}\right)\eta^{\ell,k} \approx b_\eta\!\left(\eta^{\ell-1}, \phi^{\ell-\frac12}\right), \qquad (14)$$
which can be considered as a "global view" of one Schwarz iteration. The associated global residual vector
$$r^{\ell,k}_\eta = b_\eta\!\left(\eta^{\ell-1}, \phi^{\ell-\frac12}\right) - A_\eta\!\left(\phi^{\ell-\frac12}\right)\eta^{\ell,k}, \qquad (15)$$
which arises from "composing" a set of locally computed residual vectors $r^{\ell,k}_\eta|_{\Omega_s}$, can be used to check the global convergence. More precisely, when the global solution $\eta^{\ell,k}$ is ready, each subdomain computes its local contribution to the global residual vector as
$$r^{\ell,k}_\eta|_{\Omega_s} = b_{\eta,s} - A_{\eta,s}\,\eta^{\ell,k}|_{\Omega_s}, \qquad (16)$$


where $\eta^{\ell,k}|_{\Omega_s}$ denotes the restriction of the global solution on subdomain $s$. In Eq. (16), the residual values on the internal boundary points will be incorrect and thus should not participate in computing a global norm of $r^{\ell,k}_\eta$. Duplicated contributions to $\|r^{\ell,k}_\eta\|$ from residual values on the other overlapping points should also be scaled according to the principle of partition of unity. Thereby, a typical global-type monitor for checking the global convergence of the Schwarz iterations is
$$\frac{\|r^{\ell,k}_\eta\|}{\|r^{\ell,0}_\eta\|} < \epsilon_{\mathrm{global}}, \qquad (17)$$
where $\epsilon_{\mathrm{global}}$ is a prescribed threshold value. For small time steps, where the initial residual is expected to be small, Eq. (17) might be too strict, and checking for the absolute instead of the relative residual is preferable. We remark that the global matrix and vectors $A_\eta$, $b_\eta$, $\eta^{\ell,k}$, and $r^{\ell,k}_\eta$ exist only logically, because their actual values are computed and distributed on the subdomains.

However, if different discretization schemes or different mathematical models are used on the subdomains, it makes no sense to compute global residuals. Therefore, the convergence of the intermediate global solutions
$$\eta^{\ell,1}, \eta^{\ell,2}, \ldots, \eta^{\ell,k} \quad \text{and} \quad \phi^{\ell+\frac12,1}, \phi^{\ell+\frac12,2}, \ldots, \phi^{\ell+\frac12,k}$$
should be checked locally, in a collaboration involving all the subdomains. A local-type monitor for the global convergence can thus be
$$\frac{\|\eta^{\ell,k}_s - \eta^{\ell,k-1}_s\|}{\|\eta^{\ell,k}_s\|} < \epsilon^{\mathrm{local}}_{\mathrm{global}} \quad \text{for all } s, \qquad (18)$$
where $\epsilon^{\mathrm{local}}_{\mathrm{global}}$ is another prescribed threshold value. We remark that the above local-type convergence monitor (18) can replace the global-type monitor (17), but not the other way around. In Section 4 we report experience with both convergence monitors.
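The two monitors can be sketched as follows in Python, with per-subdomain arrays as made-up stand-ins. In a distributed implementation the sums over subdomains would be accumulated with a reduction operation (e.g. an MPI all-reduce), and the residual contributions are assumed to have the internal-boundary points already excluded and the overlap contributions already scaled.

```python
import numpy as np

def global_monitor(local_residuals, local_residuals0, eps_global=1e-4):
    # Monitor (17): relative norm of the logically existing global residual,
    # assembled from the per-subdomain contributions.
    r = np.sqrt(sum(float(np.dot(rs, rs)) for rs in local_residuals))
    r0 = np.sqrt(sum(float(np.dot(rs, rs)) for rs in local_residuals0))
    return r / r0 < eps_global

def local_monitor(eta_new, eta_old, eps_local_global=1e-4):
    # Monitor (18): every subdomain must see a small relative change between
    # two successive Schwarz iterates of its own solution.
    return all(np.linalg.norm(n - o) / np.linalg.norm(n) < eps_local_global
               for n, o in zip(eta_new, eta_old))
```

Monitor (18) only needs the two latest subdomain iterates, which is why it remains meaningful when the subdomains use different discretizations or models.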

3.2.5. The subdomain solvers

During every additive Schwarz iteration, each subdomain $\Omega_s$ needs to solve a local linear system (12) or (13) using updated artificial Dirichlet conditions on its internal boundary. Normally, an iterative solver can be used, because the additive Schwarz method allows approximate subdomain solvers. A typical subdomain solver may use a few (preconditioned) conjugate gradient (CG) iterations, see e.g. [7].
Let us consider Eq. (12) during the $k$th Schwarz iteration. An iterative subdomain solver generates a series of local solutions $\eta^{\ell,k,0}_s, \eta^{\ell,k,1}_s, \ldots, \eta^{\ell,k,m}_s$ on $\Omega_s$. To monitor the convergence with a subdomain solver, we may use the following subdomain residual vector:
$$r^{\ell,k,m}_{\eta,s} = b_{\eta,s}\!\left(\eta^{\ell-1}_s, \phi^{\ell-\frac12}_s, \eta^{\ell,k-1}|_{\partial\Omega_s\setminus\partial\Omega}\right) - A_{\eta,s}\!\left(\phi^{\ell-\frac12}_s\right)\eta^{\ell,k,m}_s. \qquad (19)$$
A typical monitor for the subdomain convergence is thus
$$\frac{\|r^{\ell,k,m}_{\eta,s}\|}{\|r^{\ell,k,0}_{\eta,s}\|} < \epsilon_{\mathrm{subd}}, \qquad (20)$$
where $\epsilon_{\mathrm{subd}}$ is a prescribed threshold value. We remark that different subdomains may choose different iterative solvers and different values of $\epsilon_{\mathrm{subd}}$. For example, on a subdomain where the solutions change very little from time step to time step, the threshold $\epsilon_{\mathrm{subd}}$ should use a relatively large value, or a convergence monitor that only checks the absolute value of $\|r^{\ell,k,m}_{\eta,s}\|$ should be used.

3.2.6. The overall parallel strategy

The whole parallel numerical strategy can be summa-

rized as follows.

A multi-subdomain numerical strategy
Partition the global domain $\Omega$ into overlapping subdomains $\{\Omega_s\}$. During each time step $t_{\ell-1} < t \leq t_\ell$, the following sub-steps are carried out on every subdomain $\Omega_s$:
1. Use $\eta^{\ell-1}_s$ and $\phi^{\ell-\frac12}_s$ as known solutions and discretize Eq. (1) inside $\Omega_s$.
2. Solve the distributed global linear system (7) to find $\eta^\ell_s$, using the additive Schwarz iterations (12).
3. Use $\eta^\ell_s$ and $\phi^{\ell-\frac12}_s$ as known solutions and discretize Eq. (2) inside $\Omega_s$.
4. Solve the distributed global linear system (8) to find $\phi^{\ell+\frac12}_s$, using the additive Schwarz iterations (13).

In the case nonlinearity and dispersion are neglected, i.e., $\alpha = \epsilon = 0$, the time-discrete scheme becomes explicit. One Schwarz iteration is then sufficient per time step for solving both Eqs. (1) and (2), because only quantities computed at the previous time level are involved in the artificial boundary conditions in the subdomains.
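The control flow of the strategy can be summarized by the following structural skeleton (hypothetical function names only, no actual discretization): each time step runs one Schwarz loop for the continuity system and one for the Bernoulli system, with a communication/composition step and a convergence check per iteration.

```python
def schwarz_solve(subdomains, equation, local_solve, exchange_overlap, converged,
                  max_iterations=50):
    """Additive Schwarz iterations for one distributed linear system."""
    for k in range(1, max_iterations + 1):
        for s in subdomains:                    # one subdomain per processor in practice
            local_solve(s, equation)            # uses current artificial Dirichlet data
        exchange_overlap(subdomains, equation)  # compose/average in the overlap zones
        if converged(subdomains, equation):     # monitor (17) or (18)
            return k
    return max_iterations

def time_step(subdomains, local_solve, exchange_overlap, converged):
    # Sub-steps 1-2: discretize Eq. (1) locally, then solve the distributed system (7).
    schwarz_solve(subdomains, "continuity", local_solve, exchange_overlap, converged)
    # Sub-steps 3-4: discretize Eq. (2) locally, then solve the distributed system (8).
    schwarz_solve(subdomains, "bernoulli", local_solve, exchange_overlap, converged)
```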

3.2.7. Software

The attractive feature of the proposed parallelization

strategy, from an implementational point of view, is that a serial simulation code can, in principle, be reused in

each subdomain. The authors have developed this idea

to the extent that a serial simulation code can be reused

without modifications. When the serial code is available

as a C++ class, only a small subclass needs to be pro-

grammed for gluing the original solver with a generic

library for communicating finite element fields via

MPI. The small subclass then works as a parallel subdomain solver, which can be used by a general Schwarz

iterator. This parallelization framework is available as

part of the Diffpack programming environment [18,31]

and documented in previous papers [8,32,11,9,10]. For

the present work we have managed to insert the Bous-

sinesq equation solver from [33] in the parallelization

framework [9]. The approach of reusing a serial solver


not only saves implementation time, but also makes the

parallel software more reliable: a well-tested serial code

is combined with a well-tested generic communication li-

brary and a general Schwarz iterator. We believe that

such a step-wise development of parallel simulation software is essential, because debugging parallel codes soon becomes a tedious and challenging process.

4. Numerical experiments

The purpose of this section is to test the numerical

properties of the parallel multi-subdomain strategy de-

scribed in the preceding section. A series of numerical experiments is done, by which we investigate in

particular:

(1) The relationship between the accuracy of the result-

ing numerical solutions and a chosen convergence

monitor.

(2) The effect of different partitioning schemes, both

regular and unstructured.
(3) The convergence speed of Schwarz iterations with

respect to the amount of overlap between neighbor-

ing subdomains.

(4) The applicability of the global-type and local-type

convergence monitors, i.e., Eqs. (17) and (18).

(5) The scalability of the parallel simulations, i.e., how

the speed-up results depend on the number of pro-

cessors and the grid resolution.

4.1. Eigen-oscillations in a closed basin

As the first set of test cases, we consider standing waves in the following solution domain:
$$\Omega = (0, 10) \times (0, 10)$$
with a constant water depth $H = 0.04$. The external pressure term in Eq. (2) is prescribed as zero. Both the case of linear waves without dispersion ($\alpha = \epsilon = 0$, i.e., the hydrostatic model) and the case of nonlinear and dispersive waves ($\alpha = \epsilon = 1$) are studied. The boundary conditions are of no-flux type, $\partial\phi/\partial n = 0$ and $\mathbf{q}\cdot\mathbf{n} = 0$, on the entire boundary $\partial\Omega$. The spatial discretization is done by the finite element method based on a uniform grid with bilinear elements; the spatial grid resolution varies from $\Delta x = \Delta y = 0.025$ to $\Delta x = \Delta y = 0.1$. The time domain of simulation is $0 < t \leq T = 2$, and the initial conditions are chosen such that the following solutions:
$$\eta(x, y, t) = 0.008\cos(3\pi x)\cos(4\pi y)\cos(\pi t) \qquad (21)$$
and
$$\phi(x, y, t) = -\frac{0.008}{\pi}\cos(3\pi x)\cos(4\pi y)\sin(\pi t) \qquad (22)$$
are exact for the linear case without dispersion ($\alpha = \epsilon = 0$). (The above exact solutions are derived from a more general form $\exp(\mathrm{i}(k_x x + k_y y - \omega t))$ using desired initial and boundary conditions, see e.g. [43,35,31].) We note that there are 15 wave lengths in the x-direction, and 20 wave lengths in the y-direction. For the size of the time steps, we have used a fixed value of $\Delta t = 0.05$ independent of the spatial grid resolution. We also mention that consistent mass matrices (without lumping) have been used in our finite element discretization, so that the numerical scheme is implicit even for the linear non-dispersive case ($\alpha = \epsilon = 0$). In the present simulation case, lumped mass is known to give better numerical approximations, but consistent mass matrices are deliberately chosen for studying the convergence behavior of additive Schwarz iterations. The topic of lumping mass matrices is, e.g., discussed in [33].
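For reference, the fields implied by Eqs. (21) and (22) can be sampled as in the small Python sketch below; sampling $\phi$ at $t = \Delta t/2$ reflects the staggered scheme of Section 2.2 and is our assumption about how the runs are started.

```python
import numpy as np

# Sample eta(x, y, 0) from Eq. (21) and phi(x, y, dt/2) from Eq. (22) on a
# uniform 100 x 100 bilinear-element grid over (0,10) x (0,10).
nx = ny = 100
dt = 0.05
x = np.linspace(0.0, 10.0, nx + 1)
y = np.linspace(0.0, 10.0, ny + 1)
X, Y = np.meshgrid(x, y, indexing="ij")

shape = np.cos(3 * np.pi * X) * np.cos(4 * np.pi * Y)
eta0 = 0.008 * shape                                       # eta at t = 0
phi_half = -(0.008 / np.pi) * shape * np.sin(np.pi * dt / 2)

# Consistency of the frequency in (21)-(22) with the linear dispersion
# relation omega^2 = H (k_x^2 + k_y^2): 0.04 * ((3*pi)**2 + (4*pi)**2) = pi**2.
assert np.isclose(0.04 * ((3 * np.pi)**2 + (4 * np.pi)**2), np.pi**2)
```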

4.1.1. The effect of the global convergence criterion

We use a 100 × 100 global grid for the case of linear waves without dispersion. For domain partitioning we use an unstructured strategy (see Fig. 1 for an example). This unstructured partitioning strategy is based on using the Metis [26] software package that is originally designed for decomposing unstructured finite element meshes, in which equal-sized subdomains are sought while the total volume of communication overhead is minimized. Although we could use here a regular partitioning scheme, in which straight "cutting lines" that are parallel to the x- and/or y-axis produce subdomains of a rectangular shape, we have deliberately chosen the unstructured strategy to study whether irregularly shaped subdomains pose a problem for the convergence of the additive Schwarz iterations. The amount of overlap is fixed as one layer of shared elements between two neighboring subdomains.
In Table 1, we show the relationship between the accuracy and the threshold value in a chosen convergence monitor, for different values of $P$. More specifically, we vary the value of $\epsilon_{\mathrm{global}}$ in the global-type convergence monitor (17) and check how it affects $E^\eta_{L_2}$, $E^\phi_{L_2}$, $I^\eta_{DD}$, and $I^\phi_{DD}$. It can be observed that $I^\eta_{DD}$ and $I^\phi_{DD}$ increase when the global convergence criterion becomes stricter, while the $L_2$-norm of the errors first decreases and then stabilizes at the level of discretization errors. We can also see that $\epsilon_{\mathrm{global}} = 10^{-4}$ ensures at least three-digit accuracy, which becomes totally independent of the number of subdomains, so this conservative value of $\epsilon_{\mathrm{global}}$ is used in Tables 2–5. Moreover, for a fixed value of $\epsilon_{\mathrm{global}}$, it can be observed that $I^\eta_{DD}$ and $I^\phi_{DD}$ increase slightly with respect to $P$. For practical applications, it may be argued that $\epsilon_{\mathrm{global}} = 10^{-2}$ is sufficient. The choice of thresholds must be related to the choice of grid resolution and the size of the approximations in the underlying wave model (the Boussinesq equations).


Table 1
The effect of the threshold value ε_global, see Eq. (17), on convergence speed and accuracy for the case of linear waves without dispersion (α = ε = 0)

ε_global   P    I^η_DD   E^η_L2           I^φ_DD   E^φ_L2
10^-1      2    1.23     1.1828 · 10^-2   1.15     3.7155 · 10^-3
10^-1      4    1.23     1.2606 · 10^-2   1.25     3.9693 · 10^-3
10^-1      8    1.38     1.3051 · 10^-2   1.43     4.1668 · 10^-3
10^-1      16   1.43     1.3811 · 10^-2   1.33     4.5126 · 10^-3
10^-1      32   1.98     1.1373 · 10^-2   1.43     3.4153 · 10^-3
10^-2      2    2.10     9.6199 · 10^-3   2.00     3.2211 · 10^-3
10^-2      4    2.13     9.6209 · 10^-3   2.03     3.2241 · 10^-3
10^-2      8    2.68     9.5882 · 10^-3   2.50     3.2086 · 10^-3
10^-2      16   3.08     9.6674 · 10^-3   3.03     3.2556 · 10^-3
10^-2      32   3.58     9.4929 · 10^-3   3.55     3.2657 · 10^-3
10^-3      2    3.00     9.5775 · 10^-3   3.00     3.2080 · 10^-3
10^-3      4    3.00     9.5788 · 10^-3   3.00     3.2091 · 10^-3
10^-3      8    5.00     9.5792 · 10^-3   4.95     3.2088 · 10^-3
10^-3      16   5.05     9.5837 · 10^-3   5.03     3.2120 · 10^-3
10^-3      32   6.03     9.5608 · 10^-3   6.00     3.1935 · 10^-3
10^-4      2    5.00     9.5757 · 10^-3   5.00     3.2061 · 10^-3
10^-4      4    4.05     9.5750 · 10^-3   4.03     3.2055 · 10^-3
10^-4      8    7.00     9.5761 · 10^-3   7.00     3.2065 · 10^-3
10^-4      16   9.00     9.5758 · 10^-3   9.00     3.2063 · 10^-3
10^-4      32   9.00     9.5766 · 10^-3   9.00     3.2068 · 10^-3
10^-5      2    7.00     9.5756 · 10^-3   7.00     3.2061 · 10^-3
10^-5      4    7.00     9.5756 · 10^-3   7.00     3.2061 · 10^-3
10^-5      8    10.00    9.5755 · 10^-3   10.00    3.2060 · 10^-3
10^-5      16   12.00    9.5755 · 10^-3   12.00    3.2060 · 10^-3
10^-5      32   13.00    9.5756 · 10^-3   13.00    3.2061 · 10^-3

The global 100 × 100 grid is partitioned using an unstructured scheme, where the amount of overlap is fixed as one layer of shared elements. The subdomain solvers use 1–5 CG iterations for obtaining ε_subd = 10^-1, see Eq. (20).

Table 2
Results from a one-dimensional regular partitioning scheme for the case of linear waves without dispersion (α = ε = 0). The global-type convergence monitor (17) is used with ε_global = 10^-4. The amount of overlap is one layer of shared elements between two neighboring subdomains. The subdomain solvers use 1–5 CG iterations for obtaining ε_subd = 10^-1, see Eq. (20)

Global 100 × 100 grid; subdomains as vertical stripes

P    I^η_DD   E^η_L2           I^φ_DD   E^φ_L2
2    4.10     9.5751 · 10^-3   4.10     3.2056 · 10^-3
4    5.00     9.5756 · 10^-3   5.00     3.2061 · 10^-3
8    6.00     9.5724 · 10^-3   6.00     3.2035 · 10^-3
16   7.00     9.5761 · 10^-3   7.00     3.2067 · 10^-3

Table 3
Results due to different partitioning schemes associated with 16-subdomain linear simulations on a 100 × 100 global grid

Partitioning scheme   I^η_DD   E^η_L2           I^φ_DD   E^φ_L2           WT
16 × 1 rectangles     7.00     9.5761 · 10^-3   7.00     3.2067 · 10^-3   7.78
8 × 2 rectangles      9.98     9.5750 · 10^-3   9.98     3.2056 · 10^-3   10.52
4 × 4 rectangles      7.93     9.5752 · 10^-3   7.93     3.2057 · 10^-3   8.33
2 × 8 rectangles      7.95     9.5748 · 10^-3   7.98     3.2053 · 10^-3   10.43
1 × 16 rectangles     7.00     9.5767 · 10^-3   7.00     3.2070 · 10^-3   7.74
Unstructured          9.00     9.5758 · 10^-3   9.00     3.2063 · 10^-3   9.46

The other parameters are the same as in Table 2. We note that WT denotes the wall-time consumption (in s) of the simulations.


4.1.2. The effect of domain partitioning

We have also tested a regular partitioning scheme that produces the subdomains as a coarse mesh of rectangles. The amount of overlap is fixed as one layer of shared elements between two neighboring subdomains and the grid has 100 × 100 elements, as in Section 4.1.1. Table 2 shows $E^\eta_{L_2}$, $E^\phi_{L_2}$, $I^\eta_{DD}$, and $I^\phi_{DD}$ arising from using a regular one-dimensional partitioning in the


Table 4
The relationship between grid resolution (fixed Δt = 0.05) and convergence speed for the case of nonlinear and dispersive waves (α = ε = 1)

Global grid   P    I^η_DD   E^η,ref_L2       I^φ_DD   E^φ,ref_L2
100 × 100     2    6.68     7.2339 · 10^-3   4.25     3.3084 · 10^-3
100 × 100     4    6.80     7.2338 · 10^-3   4.38     3.3084 · 10^-3
100 × 100     8    8.43     7.2342 · 10^-3   4.83     3.3083 · 10^-3
100 × 100     16   9.35     7.2342 · 10^-3   5.35     3.3084 · 10^-3
200 × 200     2    5.48     1.5613 · 10^-3   3.10     9.4077 · 10^-4
200 × 200     4    5.48     1.5609 · 10^-3   3.13     9.4073 · 10^-4
200 × 200     8    5.75     1.5609 · 10^-3   3.20     9.4082 · 10^-4
200 × 200     16   6.13     1.5611 · 10^-3   3.68     9.4091 · 10^-4
400 × 400     2    4.05     2.5737 · 10^-4   5.45     1.9139 · 10^-4
400 × 400     4    4.15     2.5740 · 10^-4   6.98     1.9131 · 10^-4
400 × 400     8    4.40     2.5740 · 10^-4   7.00     1.9115 · 10^-4
400 × 400     16   4.28     2.5766 · 10^-4   7.00     1.9084 · 10^-4
800 × 800     2    3.78     1.1286 · 10^-5   13.00    9.8782 · 10^-7
800 × 800     4    3.83     1.4519 · 10^-5   15.85    1.5556 · 10^-6
800 × 800     8    3.88     1.3249 · 10^-5   15.93    1.7371 · 10^-6
800 × 800     16   3.98     9.0133 · 10^-6   15.78    1.4481 · 10^-6

The global-type convergence monitor (17) with ε_global = 10^-4 is used. For domain partitioning, an unstructured scheme is used where the amount of overlap is fixed as one layer of shared elements. The subdomain solvers use 1–5 CG iterations for obtaining ε_subd = 10^-1, see Eq. (20).

Table 5
Comparing Eqs. (17) and (18) as the global convergence monitor for the case of nonlinear and dispersive waves (α = ε = 1) on a global 200 × 200 grid

ε_subd   P    I^η_DD   E^η,ref_L2       I^φ_DD   E^φ,ref_L2

Using the global-type convergence monitor (17); ε_global = 10^-4
10^-1    2    5.48     1.5613 · 10^-3   3.10     9.4077 · 10^-4
10^-1    4    5.48     1.5609 · 10^-3   3.13     9.4073 · 10^-4
10^-1    8    5.75     1.5609 · 10^-3   3.20     9.4082 · 10^-4
10^-1    16   6.13     1.5611 · 10^-3   3.68     9.4091 · 10^-4
10^-2    2    7.03     1.5610 · 10^-3   3.00     9.4099 · 10^-4
10^-2    4    7.00     1.5610 · 10^-3   3.00     9.4114 · 10^-4
10^-2    8    8.00     1.5611 · 10^-3   3.00     9.4056 · 10^-4
10^-2    16   8.08     1.5611 · 10^-3   3.00     9.4042 · 10^-4
10^-3    2    7.03     1.5610 · 10^-3   3.00     9.4098 · 10^-4
10^-3    4    7.00     1.5609 · 10^-3   3.00     9.4113 · 10^-4
10^-3    8    8.00     1.5611 · 10^-3   3.00     9.4057 · 10^-4
10^-3    16   7.10     1.5610 · 10^-3   4.23     9.4087 · 10^-4

Using the local-type convergence monitor (18); ε^local_global = 10^-4
10^-1    2    6.05     1.5609 · 10^-3   3.98     9.4086 · 10^-4
10^-1    4    6.55     1.5609 · 10^-3   4.08     9.4086 · 10^-4
10^-1    8    6.93     1.5609 · 10^-3   4.10     9.4088 · 10^-4
10^-1    16   7.10     1.5610 · 10^-3   4.23     9.4087 · 10^-4
10^-2    2    7.63     1.5609 · 10^-3   3.40     9.4086 · 10^-4
10^-2    4    8.28     1.5610 · 10^-3   3.63     9.4085 · 10^-4
10^-2    8    9.00     1.5609 · 10^-3   3.78     9.4085 · 10^-4
10^-2    16   9.50     1.5610 · 10^-3   3.73     9.4085 · 10^-4
10^-3    2    7.63     1.5609 · 10^-3   3.20     9.4086 · 10^-4
10^-3    4    8.23     1.5610 · 10^-3   3.40     9.4085 · 10^-4
10^-3    8    9.00     1.5610 · 10^-3   3.65     9.4086 · 10^-4
10^-3    16   9.55     1.5610 · 10^-3   3.70     9.4085 · 10^-4

The table also investigates the effect of the subdomain convergence threshold value ε_subd, see Eq. (20). For domain partitioning, an unstructured scheme is used where the amount of overlap is fixed as one layer of shared elements.


Compared with the measurements in Table 1 that correspond to ε_global = 10⁻⁴, we observe that the accuracy is independent of the chosen partitioning scheme. However, the convergence speed of the additive Schwarz iterations is somewhat sensitive to the partitioning scheme.


In Table 3, we compare six different partitioning schemes for P = 16 in particular. We have also listed the wall-time consumption associated with each partitioning scheme in the table. It can be observed that the time usage depends primarily on the number of Schwarz iterations used, rather than on the shape of the subdomains. This suggests that the message-passing overhead is quite low in the parallel simulations.
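As an illustration of what such a regular partitioning amounts to in practice, the minimal sketch below (our own construction, not the paper's implementation) computes overlapping element-index boxes for a hypothetical Px × Py partitioning of an Nx × Ny element grid, extended by a prescribed number of shared element layers between neighbors.

# Sketch (not the paper's code): index ranges for a regular Px x Py
# partitioning of an Nx x Ny element grid, grown by a fixed number of
# shared element layers between neighboring subdomains.

def split_1d(n_elements, n_parts):
    """Divide n_elements into n_parts nearly equal, contiguous chunks."""
    base, rest = divmod(n_elements, n_parts)
    ranges, start = [], 0
    for p in range(n_parts):
        stop = start + base + (1 if p < rest else 0)
        ranges.append((start, stop))
        start = stop
    return ranges

def overlapping_subdomains(nx, ny, px, py, layers=1):
    """Element index boxes (x0, x1, y0, y1) for each subdomain,
    extended by `layers` element layers where a neighbor exists."""
    boxes = []
    for (y0, y1) in split_1d(ny, py):
        for (x0, x1) in split_1d(nx, px):
            boxes.append((max(x0 - layers, 0), min(x1 + layers, nx),
                          max(y0 - layers, 0), min(y1 + layers, ny)))
    return boxes

if __name__ == "__main__":
    # 4 x 4 rectangles on the 100 x 100 element grid, one overlap layer.
    for box in overlapping_subdomains(100, 100, 4, 4, layers=1)[:3]:
        print(box)

Setting px = 1 gives the one-dimensional strip partitioning discussed above, while px = py = 4 corresponds to the 4 × 4 rectangles of Table 3.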

4.1.3. The effect of grid resolution

For the case of nonlinear and dispersive waves (α = ε = 1), we try different grid resolutions and study how the number of Schwarz iterations changes. The unstructured partitioning scheme is used, and the amount of overlap is fixed as one layer of shared elements between two neighboring subdomains.

We can observe in Table 4 that the obtained accuracy depends on the grid resolution, but is almost completely insensitive to the number of subdomains. Note that in this nonlinear case we have no exact solution available, and we therefore use fine-grid reference solutions for accuracy comparisons. The reference solutions η_ref(x,y,T) and φ_ref(x,y,T) are produced by a serial solution method on an 800 × 800 global grid, i.e., without domain decomposition. The linear systems involved in the serial solution process have been solved with sufficient accuracy. From Table 4 we can also see that I^η_{DD} decreases when the spatial grid resolution is increased, whereas I^φ_{DD} shows the opposite tendency. We remark that for the 800 × 800 global grid, one layer of shared elements between neighboring subdomains is a very small amount of overlap, hence the large values of I^φ_{DD}. For a fixed spatial grid resolution, I^η_{DD} and I^φ_{DD} tend to increase slightly with P. (We could of course use a larger amount of overlap, which would result in faster convergence, but this small amount of overlap is deliberately chosen to test the robustness of the additive Schwarz iterations.)

With our choice of bilinear elements for the spatial discretization and centered differences in time, the scheme is of second order in the time step size Δt and the element size h. This means that the numerically computed surface elevation field η_ℓ(x, y; h, Δt) is related to the exact solution η_ℓ through

η_ℓ(x, y; h, Δt) = η_ℓ + A h² + B Δt²,

where A and B are constants independent of the discretization parameters h and Δt. For the reference solution we have

η_ℓ(x, y; h_ref, Δt) = η_ℓ + A h_ref² + B Δt²,

with h_ref being the element size in the grid used for computing the reference solution. Since we have used a fixed value of Δt in the experiments, this implies that the difference between the numerical solution and a reference solution can be written as

E^{η,ref}_{L2}(h, Δt) = A h² + C,  where C = −A h_ref².   (23)

Applying Eq. (23) to two resolutions h₁ and h₂, we can estimate A as

A = [E^{η,ref}_{L2}(h₁, Δt) − E^{η,ref}_{L2}(h₂, Δt)] (h₁² − h₂²)⁻¹.   (24)

The C parameter then follows from:

C = E^{η,ref}_{L2}(h₁, Δt) − A h₁²,  h₁ ≠ h_ref.   (25)

Based on Eqs. (24) and (25), the values of A and C can be estimated using two consecutive experiments, and with sufficiently fine resolution in space and time, the estimated values will (hopefully) converge to the true values. The same estimation technique applies, of course, also to the reference solution of the φ field. From the values in Table 4 we find, using the two finest grids,

E^{η,ref}_{L2}(h, Δt) ≈ 7.1 × 10⁻³ h² − 1.9 × 10⁻⁴,

which fits well with all η-related values in Table 4. For E^{φ,ref}_{L2}(h, Δt), an estimate using the first and third grids gives the formula

E^{φ,ref}_{L2}(h, Δt) ≈ 3.3 × 10⁻³ h² − 1.7 × 10⁻⁵,

which also predicts the third value satisfactorily. It therefore appears that the E^{η,ref}_{L2} and E^{φ,ref}_{L2} values in Table 4 are compatible with a discretization method of second order in space.
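To make the fitting procedure of Eqs. (24) and (25) concrete, the small sketch below estimates A and C from two error measurements at consecutive resolutions. The element sizes in the example call are placeholders of our own choosing, since h is not restated explicitly in this section; only the fitting step itself is illustrated.

# Sketch of the error-model fit in Eqs. (24)-(25):
#   E(h) = A h^2 + C, with C = -A h_ref^2 for a fixed time step.

def estimate_A_C(E1, h1, E2, h2):
    """Fit E(h) = A h^2 + C through two measurements (h1, E1), (h2, E2)."""
    A = (E1 - E2) / (h1**2 - h2**2)   # Eq. (24)
    C = E1 - A * h1**2                # Eq. (25)
    return A, C

if __name__ == "__main__":
    # Placeholder values only; they merely demonstrate the fitting step.
    A, C = estimate_A_C(E1=1.56e-3, h1=0.05, E2=2.57e-4, h2=0.025)
    print(f"A = {A:.3e}, C = {C:.3e}")
    # Since C = -A*h_ref**2 by Eq. (23), h_ref can be recovered as
    # (-C/A)**0.5 whenever the fitted C is negative.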

4.1.4. More on global and subdomain convergence monitors

As mentioned in Section 3.2.4, the local-type global convergence monitor (18) can replace the global-type monitor (17). To demonstrate this property, we compare the results obtained from the two global convergence monitors in Table 5. The first half of the table shows the results from using the global-type monitor (17) with ε_global = 10⁻⁴, whereas the second half of Table 5 is devoted to the results from using the local-type monitor (18) with ε^local_global = 10⁻⁴. In each half of the table, we also show the effect of different values of ε_subd, which is used by Eq. (20) to check the convergence of the local subdomain solvers during each Schwarz iteration.

It can be observed from Table 5 that the use of the local-type global convergence monitor (18) results in slightly more iterations I^η_{DD} and I^φ_{DD}, while at the same time achieving more stable measurements of E^{η,ref}_{L2} and E^{φ,ref}_{L2}. This indicates that the local-type monitor (18) can be a useful mechanism for checking the global convergence of the Schwarz iterations. Its particular advantage is its applicability to situations where different subdomains use different discretizations or mathematical models, a feature of particular importance in large-scale ocean wave modeling.

Regarding the choice of ε_subd used in the subdomain convergence monitor (20), we have observed in Table 5 that ε_subd = 10⁻¹ is sufficient for obtaining three-digit global accuracy with respect to both E^{η,ref}_{L2} and E^{φ,ref}_{L2}.


This means that the subdomain problems need not be solved very accurately. A stricter subdomain convergence monitor (a smaller value of ε_subd) will only increase the computational cost, without improving the accuracy of the global numerical solutions. We remark that the subdomain convergence monitor (20) with a constant threshold value ε_subd actually becomes ‘‘stricter and stricter’’ toward the final Schwarz iteration, because the value of r^{ℓ,k,0}_{η,s} or r^{ℓ,k,0}_{φ,s} decreases with the number of Schwarz iterations k.
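The precise forms of the monitors (17), (18) and (20) are given earlier in the paper and are not repeated here; purely to illustrate the bookkeeping difference between a global-type check (one relative residual assembled over all subdomains) and a local-type check (every subdomain must satisfy its own relative criterion), the following sketch may be useful. The simple 2-norms and the function names are our own assumptions, not the paper's formulas.

import numpy as np

# Illustrative sketch only: two ways of declaring the Schwarz iteration
# converged, given per-subdomain residual vectors r_s and the residuals
# r0_s recorded at the start of the current time step.

def global_monitor(residuals, residuals0, eps_global=1e-4):
    """Global-type check: one relative residual over the whole domain."""
    r  = np.sqrt(sum(np.dot(r_s, r_s) for r_s in residuals))
    r0 = np.sqrt(sum(np.dot(r_s, r_s) for r_s in residuals0))
    return r <= eps_global * r0

def local_monitor(residuals, residuals0, eps_local=1e-4):
    """Local-type check: every subdomain satisfies its own criterion."""
    return all(np.linalg.norm(r_s) <= eps_local * np.linalg.norm(r0_s)
               for r_s, r0_s in zip(residuals, residuals0))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    r0 = [rng.standard_normal(50) for _ in range(4)]   # start-of-step residuals
    r  = [1e-5 * v for v in r0]                        # after some Schwarz iterations
    print(global_monitor(r, r0), local_monitor(r, r0))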

4.1.5. Variable water depth and unstructured grids

To study the effect of non-constant water depth on the convergence of additive Schwarz iterations, we introduce a bottom profile as depicted in Fig. 2. More specifically, the bottom profile is a bell-shaped function centered around (x = 5, y = 0), such that H varies between 0.04 and 0.62. Three unstructured computational grids with different resolutions have been built accordingly for Ω = (0,10) × (0,10), see Fig. 2 for the coarsest grid. The unstructured domain partitioning scheme (see Section 4.1.1) has to be used for such unstructured grids. For the three grids we have respectively used one, two, and three layers of elements as the overlapping zone between neighboring subdomains. Table 6 shows the average numbers of Schwarz iterations needed to achieve convergence associated with ε_global = 10⁻⁴, where Δt = 0.05, T = 2, and α = ε = 1. The subdomain solver uses 1–5 preconditioned CG iterations for obtaining ε_subd = 10⁻¹, see Eq. (20). The chosen preconditioner is based on the modified incomplete LU-factorization (MILU), see e.g. [7]. We can observe from Table 6 that a moderate size of the overlapping zone is sufficient in this test case for obtaining a stable convergence independent of P.
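As a rough serial stand-in for the subdomain solver just described (a few MILU-preconditioned CG iterations), the sketch below runs a small, capped number of CG iterations preconditioned with SciPy's incomplete-LU factorization. Note that spilu provides plain ILU rather than the MILU variant used in the paper, and the test matrix is a generic SPD example, not a Boussinesq subdomain system.

import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spilu, LinearOperator, cg

# Sketch: a loosely converged, ILU-preconditioned CG solve, mimicking the
# "1-5 preconditioned CG iterations per Schwarz iteration" strategy
# (SciPy offers ILU via spilu; the paper uses the MILU variant).

n = 50
# 1D Laplacian-like SPD test matrix as a stand-in for a subdomain system.
A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")
b = np.ones(n)

ilu = spilu(A, drop_tol=1e-4, fill_factor=10)
M = LinearOperator((n, n), matvec=ilu.solve)   # preconditioner action

# Cap the work at five CG iterations instead of demanding a tight tolerance.
x, info = cg(A, b, M=M, maxiter=5)
print("residual norm after <= 5 iterations:", np.linalg.norm(b - A @ x))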

Fig. 2. A variable-depth bottom profile and an associated unstructured computational grid.

4.2. Solitary waves

In the preceding test cases, the solutions have a uniform frequency ω throughout the entire domain. This roughly means that all the subdomains have the same dynamic behavior. To investigate whether differences in the dynamic behavior among the subdomains affect our parallel solution strategy, we now study the problem of a solitary wave, which moves in the positive x-direction. We have deliberately chosen the initial conditions to be dependent on x only, even though we simulate in the entire two-dimensional domain. This enables an easy visual verification of the solutions, and can be used to check whether different domain partitionings affect the parallel simulation results.

The solution domain is Ω = (0,400) × (0,20) with a constant water depth of H = 1. The boundary conditions are of no-flux type as in the previous test case, and the spatial discretization is done by the finite element method based on a uniform grid with bilinear elements. Consistent mass matrices are used, and the external pressure term in Eq. (2) is prescribed as zero. The time domain of interest is 0 < t ≤ T = 200, and the forms of η(x,y,0) and φ(x,y,Δt/2) are depicted in Fig. 3. The size of the time step is fixed at Δt = 0.25.

We simulate a case of nonlinearity and dispersion (α = ε = 1) for the solitary wave. Three resolutions of the global grid are chosen: 400 × 40, 800 × 40, and 1600 × 40. The chosen Δt is then 4.6–6 times smaller than the stability limit on Δt (see Eq. (6)). A one-dimensional regular partitioning scheme is used such that the resulting subdomains are vertical stripes. The relationship between convergence speed and amount of overlap is studied in Table 7. We can observe that a sufficient overlap is necessary for achieving rapid convergence of the Schwarz iterations in this test case, meaning that the width of the overlapping zones must be above a fixed value, typically of the size of H, independent of the grid resolution.


Table 6
The average numbers of Schwarz iterations needed for solving a nonlinear and dispersive case with variable water depth described in Section 4.1.5

Grid points       2741                   10761                  42641
Overlap layers    1                      2                      3
P                 I^η_{DD}   I^φ_{DD}    I^η_{DD}   I^φ_{DD}    I^η_{DD}   I^φ_{DD}
2                 5.85       9.43        4.00       11.45       4.00       7.50
4                 6.93       10.03       4.00       11.10       4.00       7.48
8                 7.00       10.10       4.00       11.18       4.00       8.28
16                7.65       11.55       4.00       11.40       4.00       8.03

Fig. 3. A region of the initial conditions for the test case of a solitary wave; η at t = 0 (a) and φ at t = Δt/2 (b).


This observation is in accordance with the one-dimensional analysis in [22]. For large values of P, even though only a small number of subdomains have large changes in η and φ at a given time level, this does not prevent a convergence speed of the Schwarz iterations similar to that for small values of P. Fig. 4 shows the numerical solution of η at, e.g., t = 10. Due to numerical dispersion, we can see that small components of different wave lengths are lagging behind the moving soliton.
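For the uniform grids in this test case (e.g. 400 × 40 elements on the 400 × 20 domain, so Δx = 1), the physical overlap widths H, 2H and 3H used in Table 7 translate into a grid-dependent number of shared element layers. The small sketch below performs this conversion; the round-up convention is our own assumption.

import math

# Sketch: convert a physical overlap width into shared element layers,
# assuming uniform elements of size dx in the x-direction (the direction
# of the strip-wise partitioning). Rounding up is our own convention.

def overlap_layers(width, dx):
    return max(1, math.ceil(width / dx))

H = 1.0                          # constant water depth in this test case
for nx in (400, 800, 1600):      # grids 400x40, 800x40, 1600x40
    dx = 400.0 / nx              # domain length 400 in the x-direction
    layers = [overlap_layers(w * H, dx) for w in (1, 2, 3)]
    print(f"{nx}x40 grid: H, 2H, 3H overlap = {layers} element layers")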

Table 7
The relationship between convergence speed and amount of overlap for a nonlinear solitary wave problem

                        Overlap width H           Overlap width 2H          Overlap width 3H
Global grid    P        I^η_{DD}   I^φ_{DD}       I^η_{DD}   I^φ_{DD}       I^η_{DD}   I^φ_{DD}
400 × 40       2        1.17       2.05           1.11       2.02           1.08       2.00
400 × 40       4        1.49       2.09           1.42       2.03           1.29       2.00
400 × 40       8        2.10       2.18           1.75       2.06           1.61       2.00
400 × 40       12       2.52       2.30           1.95       2.10           1.80       2.00
400 × 40       16       2.86       2.40           2.13       2.13           2.01       2.00
800 × 40       2        1.08       3.06           1.05       3.03           1.06       3.03
800 × 40       4        1.18       3.12           1.19       3.06           1.12       3.06
800 × 40       8        1.40       3.23           1.37       3.12           1.24       3.12
800 × 40       12       1.59       3.40           1.47       3.20           1.41       3.21
800 × 40       16       1.75       3.51           1.58       3.26           1.54       3.27
1600 × 40      2        1.09       3.12           1.06       3.09           1.06       3.09
1600 × 40      4        1.27       3.25           1.32       3.19           1.14       3.19
1600 × 40      8        1.62       3.49           1.61       3.37           1.29       3.38
1600 × 40      12       1.86       3.86           1.74       3.65           1.43       3.67
1600 × 40      16       1.92       4.11           1.84       3.84           1.54       3.86

The global-type convergence monitor (17) with ε_global = 10⁻³ is used, and the subdomain solvers use 1–2 MILU-preconditioned CG iterations for obtaining ε_subd = 10⁻¹, see Eq. (20).

Such residual wave trains are characteristic of virtually all Boussinesq- or KdV-type models, unless some kind of filtering is employed. This is because the analytic solitary wave solution, which is inserted as the initial condition, cannot be reproduced exactly in the discrete solutions. This gives rise to a slightly modified solitary wave and a small residual wave train with a high content of short waves. The residual train will not grow in time and thus does not cause instability; see e.g. [37,22].



Fig. 4. A region of the computed solution of the solitary wave at t = 10; the case of nonlinearity and dispersion (α = ε = 1).


4.3. Waves generated by a moving disturbance

In the final test case we want to study waves generated at trans-critical speeds by a moving disturbance, such as a boat. The standard Kelvin ship wave pattern is then strongly modified. In particular, solitary waves may be generated and radiated upstream. For an in-depth explanation of this subject, we refer to [45,20,28,36,14,13,34]. These references are generally concerned with sources that move along uniform channels with constant speed, even though a few also address wave generation in a horizontally unbounded fluid. The wave patterns evolve slowly over large propagation distances. To limit the size of the computations, the references therefore truncate their domain downstream, and a computational window is sometimes designed to follow the source, either by dynamical inclusion of upstream grid points or by employment of a coordinate transformation. In a more general setting, with highly variable source velocity and bathymetry, such techniques can hardly be invoked. Hence, to approach real applications, such as the ‘‘Solitary killers’’ [24], we will need much larger computational domains than those used in the academic studies referenced above, and parallel computing becomes desirable.

A moving disturbance is most conveniently incorporated into our model through the pressure term p(x,y,t) in the Bernoulli equation (2). We assume that p(x,y,t) has an effective region of fixed shape, whose center moves along a trajectory. Since we are concerned with the parallel aspects, we keep the problem simple by assuming constant depth and that the disturbance moves along the x-axis with a velocity F, which in the present scaling equals the Froude number. However, there is nothing in our model that depends on these limitations. Moreover, following [36] and others, we assume that the effective region of p(x,y,t) is of an ellipsoidal shape, i.e.,

p(x, y, t) = p_a cos²(½ π R)  for R ≤ 1,   p(x, y, t) = 0  for R > 1,   (26)

where

R(x, y, t) = √( ((x − Ft)/b)² + (y/w)² ).   (27)

We have chosen the spatial domain as (x,y) ∈ [−100,600] × [−60,60], which can be thought of as a wide channel of shallow water (H = 1), in which a moving vessel generates waves. The physical size of H may be 10 m in such a case. Due to symmetry (the disturbance moves along the x-axis), only the upper y-half of the spatial domain is used for computation. The speed of the disturbance is chosen as F = 1.1, i.e., a slightly supercritical case. The ellipsoidal shape of the effective region of p(x,y,t) is determined by choosing b = 12, w = 7, and p_a = 0.1 in Eqs. (26) and (27). Several snapshots of η(x,y,t) from a simulation for 100 ≤ t ≤ 500 are depicted in Fig. 5, showing that upstream radiation of solitons occurs when t is large enough.
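Eqs. (26) and (27) are straightforward to transcribe into code. The sketch below evaluates the moving pressure field with the parameter values quoted above (p_a = 0.1, b = 12, w = 7, F = 1.1); the vectorized NumPy evaluation is merely our own convenience.

import numpy as np

# Moving pressure disturbance of Eqs. (26)-(27):
#   p(x, y, t) = p_a * cos^2(pi*R/2) for R <= 1, and 0 otherwise,
#   R(x, y, t) = sqrt(((x - F*t)/b)^2 + (y/w)^2).

def moving_pressure(x, y, t, pa=0.1, b=12.0, w=7.0, F=1.1):
    R = np.sqrt(((x - F * t) / b) ** 2 + (y / w) ** 2)
    return np.where(R <= 1.0, pa * np.cos(0.5 * np.pi * R) ** 2, 0.0)

if __name__ == "__main__":
    # Evaluate on the upper y-half of the spatial domain at t = 100.
    x = np.linspace(-100.0, 600.0, 701)
    y = np.linspace(0.0, 60.0, 61)
    X, Y = np.meshgrid(x, y)
    p = moving_pressure(X, Y, t=100.0)
    print("max pressure:", p.max())    # equals pa at the disturbance center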

The main purpose of this test case is to study the scalability of our parallel solution strategy with respect to the grid resolution and the number of subdomains. In order to be able to carry out detailed measurements of a set of large-scale parallel simulations, we have chosen to consider only a short simulation time period, 0 < t ≤ 1.


Fig. 5. Snapshots of η(x,y,t) that arise from a simulation of waves generated by a disturbance moving at a supercritical speed.


Three different grid resolutions are chosen: 2800 × 240, 5600 × 480, and 11200 × 960, where we note that the finest computational grid has 10,764,161 nodal points. Two-dimensional rectangular domain partitioning is used to divide Ω into P overlapping subdomains, where the amount of overlap is, respectively for the three grid resolutions, four, eight, and sixteen layers of shared elements. In other words, the width of the overlapping zone is H.

We study the case of nonlinear and dispersive waves (α = ε = 1). Finite element discretization with linear triangular elements and a consistent mass matrix is used on all the subdomains. To monitor the convergence of the Schwarz iterations, we have used the global-type monitor (17) with ε_global = 10⁻³. As the subdomain solver, we always use ten CG iterations. In Table 8, we report the average numbers of Schwarz iterations I^η_{DD} and I^φ_{DD} per time step and the total wall-clock time usage WT. The wall-clock time measurements are obtained on a Linux cluster that consists of 1.3 GHz Itanium2 processors, inter-connected through a Gigabit Ethernet.

It can be observed that the wall-clock time measurements scale quite well with respect to P.

Table 8
The average numbers of Schwarz iterations per time step and the total wall-clock time consumptions (in s) for a set of parallel simulations of waves due to a moving disturbance

      2800 × 240 grid; 4 steps                 5600 × 480 grid; 8 steps                 11200 × 960 grid; 16 steps
P     I^η_{DD}  I^φ_{DD}  WT      Speedup      I^η_{DD}  I^φ_{DD}  WT      Speedup      I^η_{DD}  I^φ_{DD}  WT        Speedup
2     1         2         60.79   N/A          1         4         612.55  N/A          1         10        7568.91   N/A
4     1         2         30.45   3.99         1         4         306.35  4.00         1         10        3836.77   3.95
8     1         2         16.06   7.57         1         4         159.68  7.67         1         10        1928.40   7.85
12    1         2         11.04   11.01        1         4         108.86  11.25        1         10        1304.43   11.60
16    1         2         8.28    14.68        1         4         82.76   14.80        1         10        990.90    15.28
20    1         2         7.11    17.10        1         4         66.36   18.46        1         10        780.31    19.40

Note that the speed-up results in Table 8 are based on the measurements for P = 2; e.g., the speed-up for P = 20 is obtained as 2 × WT(P = 2)/WT(P = 20). The speed-up results also gradually improve as the grid resolution increases. This indicates that our parallel solution strategy has inherently good parallel performance, where the communication overhead is not dominating. The main ‘‘obstacle’’ to achieving perfect speed-up is the overlapping zone between neighboring subdomains, which actually has more effect than the communication overhead. In addition, we can observe that the additive Schwarz iterations rapidly reach the desired convergence level for the continuity equation in all the situations, whereas the number of iterations for the Bernoulli equation is independent of P, but increases when the grid resolution increases.
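The speed-up definition above is easy to check against the wall-times in Table 8; the short sketch below does so for the finest grid, with the timing dictionary copied directly from the table.

# Speed-up relative to the P = 2 run, as defined above:
#   speedup(P) = 2 * WT(P=2) / WT(P).

# Wall-times (s) for the 11200 x 960 grid, copied from Table 8.
wall_time = {2: 7568.91, 4: 3836.77, 8: 1928.40,
             12: 1304.43, 16: 990.90, 20: 780.31}

for P in (4, 8, 12, 16, 20):
    speedup = 2.0 * wall_time[2] / wall_time[P]
    print(f"P = {P:2d}: speed-up = {speedup:5.2f}")
# Reproduces the 3.95, 7.85, 11.60, 15.28, 19.40 column of Table 8
# (the P = 2 entry is N/A by definition).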

5. Discussion and conclusions

Modeling of long destructive water waves in the ocean frequently involves huge domains and fine grids, thus calling for highly efficient parallel simulation codes. Optimal parallel speed-up is easy to achieve for the explicit time-marching schemes that currently dominate long water wave modeling. However, weak dispersion is often needed (and is usually sufficient [39]), and this makes the numerical schemes implicit and harder to parallelize.



In the present paper we have proposed and evaluated a parallelization strategy for the weakly dispersive and nonlinear Boussinesq equations in two dimensions.

Due to the limited wave speed, the additive Schwarz iterations are better suited to wave-related problems than to Laplace-like elliptic boundary-value problems. This has been demonstrated by nearly constant, or only slowly growing, numbers of Schwarz iterations as the number of processors and subdomains grows. The growth of the Schwarz iterations is far slower than what arises in Laplace-like elliptic problems. In another paper [22] we have performed an in-depth investigation of overlapping domain decomposition methods and Schwarz iterations for the one-dimensional Boussinesq equations. Such investigations are important because artificial reflection of waves at subdomain boundaries may arise and destroy the quality of the simulations. The present paper provides evidence that the main conclusions from [22] carry over to two dimensions and that the domain decomposition method can be parallelized with satisfactory scalability.

Regarding the accuracy, the number of subdomains does not affect the overall accuracy of the resulting numerical solution, provided that a strict enough global convergence monitor is used for the Schwarz iterations. Such a convergence monitor may well be local in each subdomain, and this is necessary if we want to utilize the flexibility of using different discretization methods and/or mathematical models in different subdomains. Experiments have indicated similar behavior of the global-type and local-type convergence monitors, i.e., Eqs. (17) and (18). Another advantage of the proposed parallel strategy is that the subdomain problems do not need to be solved very accurately (at least during the early Schwarz iterations). This is an important computation-saving factor.

With respect to software development, our experience shows that the described parallelization approach strongly encourages code reuse. That is, an existing serial solver can work as the subdomain solver. The required amount of modification of the serial solver depends on how it is designed. Our implementation shows an example where serial solvers following an object-oriented design normally can be reused without modifications in a generic parallel framework [9]. Although our numerical experiments have been carried out on an in-house PC cluster, whose processors are relatively much faster than its communication network, we still manage to achieve close-to-perfect scalability. This means that such parallel wave simulations are well suited to cheap, low-end parallel computers, provided that the size of the computation is sufficiently large. In other words, we believe that parallel wave simulations will eventually become ‘‘affordable’’ for many researchers, with respect to both software implementation and hardware building.

Regarding future work, we see several topics that need to be investigated. First, multi-algorithmic applications should be tested, i.e., using different wave models and/or discretizations in different subdomains. We remark that our parallel implementation readily allows such multi-algorithmic applications and that a few one-dimensional cases have been investigated to a certain extent in [22]. Second, the necessity of a coarse grid correction mechanism (see Section 3.1) should be studied. The experience from [22] suggests that coarse grid corrections may slightly improve the convergence, but they also introduce a certain amount of extra computation. We remark that our solution strategy without coarse grid correction is straightforward to implement, which is a fact of importance for ‘‘popularizing’’ parallel computations with (weakly) dispersive wave models. Third, adaptive mesh refinement is important for treating coastlines and significant variations in the water depth, but the resulting mesh is static and does not pose challenges beyond our unstructured domain partitioning scheme using Metis [26]. Dispersion reduces the need for dynamic adaptive mesh refinement from time step to time step, since there are fewer localized phenomena in a dispersive wave train. However, dynamic adaptive mesh refinement is important for treating, e.g., wave breaking. In the context of our parallel domain decomposition strategy, we see the main challenge of dynamic adaptive mesh refinement not in locating target areas for refinement and performing the element subdivision (this is already taken care of in our software [31]), but rather in the actual parallel implementation. This is because load balancing is not easily achievable; it may involve shuffling solution areas and data between subdomains from one time step to another.

Acknowledgments

The authors thank Tom Thorvaldsen for his contribution to an early programming phase of the parallel software used in this paper. We are also grateful to Sylfest Glimsdal for his help in providing the initial conditions for the experiments concerning solitary waves.

References

[1] Abbott MB, Petersen HM, Skovgaard O. On the numerical modelling of short waves in shallow water. J Hyd Res 1978;16(3):173–203.

[2] Bamberger A, Glowinski R, Tran QH. A domain decomposition

method for the acoustic wave equation with discontinuous

coefficients and grid change. SIAM J Numer Anal 1997;34(2):

603–39.


[3] Benamou JD, Despres B. A domain decomposition method for

the Helmholtz equation and related optimal control problems. J

Comp Phys 1997;136:68–82.

[4] Bjørstad PE, Espedal M, Keyes D, editors. Domain decomposition methods in sciences and engineering. Proceedings of the 9th International Conference on Domain Decomposition Methods, June 1996, Bergen, Norway; 1998. Domain Decomposition Press.

[5] Boubendir Y, Bendali A, Collino F. Domain decomposition

methods and integral equations for solving Helmholtz diffraction

problem. In Fifth International Conference on Mathematical and

Numerical Aspects of Wave Propagation, Philadelphia, PA; 2000.

SIAM. p. 760–4.

[6] Marine Accident Investigation Branch. Report on the investigation of the man overboard fatality from the angling boat Purdy at Shipwash Bank off Harwich on 17 July 1999. Technical Report 17/2000, Marine Accident Investigation Branch, Carlton House, Carlton Place, Southampton, SO15 2DZ, 2000.

[7] Bruaset AM. A survey of preconditioned iterative methods. In:

Pitman Res Notes, Math Ser 328; 1995. London: Longman

Scientific & Technical.

[8] Bruaset AM, Cai X, Langtangen HP, Tveito A. Numerical

solution of PDEs on parallel computers utilizing sequential

simulators. In: Ishikawa Y, Oldehoeft RR, Reynders JVW,

Tholburn M, editors. Scientific computing in object-oriented

parallel environments. Lect Notes Comput Sci. Berlin: Springer;

1997. p. 161–8.

[9] Cai X. Overlapping domain decomposition methods. In: Langtangen HP, Tveito A, editors. Advanced topics in computational partial differential equations—numerical methods and Diffpack programming. Berlin: Springer; 2003. p. 57–95.

[10] Cai X, Acklam E, Langtangen HP, Tveito A. Parallel computing.

In: Langtangen HP, Tveito A, editors. Advanced topics in

computational partial differential equations–numerical methods

and Diffpack programming, Lect Notes Computat Sci Eng.; 2003.

Berlin: Springer. p. 1–55.

[11] Cai X, Langtangen HP. Developing parallel object-oriented simulation codes in Diffpack. In: Mang HA, Rammerstorfer FG, Eberhardsteiner J, editors. Proceedings of the Fifth World Congress on Computational Mechanics; 2002. http://wccm.tuwien.ac.at.

[12] Chan TF, Mathew TP. Domain decomposition algorithms. In:

Acta Numerica. Cambridge University Press; 1994. p. 64–143.

[13] Chen X, Sharma S. A slender ship moving at a near-critical speed

in a shallow channel. J Fluid Mech 1995;291:263–85.

[14] Choi H, Bai K, Cho J. Nonlinear free surface waves due to a ship

moving near the critical speed in shallow water. In Proceedings of

18th Symposium of Naval Hydrodynamics, Washington, DC;

1991. p. 173–90.

[15] Collino F, Ghanemi S, Joly P. Domain decomposition methods

for harmonic wave propagation: a general presentation. Comput

Methods Appl Mech Eng 2000;184(2–4):171–211.

[16] Dean EJ, Glowinski R, Pan T-W. A wave equation approach to the numerical simulation of incompressible viscous fluid flow modeled by the Navier–Stokes equations. In: De Santo JA, editor. Mathematical and numerical aspects of wave propagation. Philadelphia, PA: SIAM; 1998. p. 65–74.

[17] Despres B. Domain decomposition method and the Helmholtz

problem II. In Second International Conference on Mathematical

and Numerical Aspects of Wave Propagation (Newark, DE,

1993); 1993. Philadelphia, PA: SIAM. p. 197–206.

[18] Diffpack Home Page. Available from: http://www.diffpack.com.

[19] Dolean V, Lanteri S. A domain decomposition approach to finite volume solutions of the Euler equations on unstructured triangular meshes. Int J Numer Methods Fluids 2001;37(6):625–56.

[20] Ertekin RC, Webster WC, Wehausen JV. Waves caused by a

moving disturbance in a shallow channel of finite width. J Fluid

Mech 1986;169:275–92.

[21] Feng X. Analysis of a domain decomposition method for the

nearly elastic wave equations based on mixed finite element

methods. IMA J Numer Anal 1998;18(2):229–50.

[22] Glimsdal S, Pedersen GK, Langtangen HP. An investigation of

domain decomposition methods for one-dimensional dispersive

long wave equations. Adv Water Res 2004;27:1111–33.

[23] Gropp W, Lusk E, Skjellum A. Using MPI—portable parallel programming with the message-passing interface. 2nd ed. Cambridge, MA: The MIT Press; 1999.

[24] Hamer M. Solitary killers. New Scientist August 1999;18–19.

[25] Ingber MS, Schmidt CC, Tanski JA, Phillips J. Boundary-element

analysis of 3D diffusion problems using a parallel domain

decomposition method. Numer Heat Transfer, Pt B 2003;44(2):

145–64.

[26] Karypis G, Kumar V. Metis: unstructured graph partitioning and

sparse matrix ordering system. Technical report, Department of

Computer Science, University of Minnesota, Minneapolis/St.

Paul, MN, 1995.

[27] Katopodes ND, Wu C-T. Computation of finite-amplitude dispersive waves. J Waterw Port, Coastal, Ocean Eng 1987;113(4):327–46.

[28] Katsis C, Akylas TR. On the excitation of long nonlinear water

waves by a moving pressure distribution. Pt. 2. Three-dimensional

effects. J Fluid Mech 1987;177:49–65.

[29] Kornhuber R, Hoppe R, Periaux J, Pironneau O, Widlund O, Xu

J, editors. Domain decomposition methods in sciences and

engineering. In: Proceedings of the 15th International Conference

on Domain Decomposition Methods, July 2003, Berlin, Germany,

Lect Notes Computat Sci Eng, vol. 40; 2004. Berlin: Springer.

[30] Lai C-H, Bjørstad PE, Cross M, Widlund O, editors. Domain decomposition methods in sciences and engineering. Proceedings of the 11th International Conference on Domain Decomposition Methods, July 1998, Greenwich, UK; 1999. Domain Decomposition Press.

[31] Langtangen HP. Computational partial differential equations—numerical methods and Diffpack programming. Texts in computational science and engineering. 2nd ed.; 2003. Berlin: Springer.

[32] Langtangen HP, Cai X. A software framework for easy parallelization of PDE solvers. In: Jensen CB, Kvamsdal T, Andersson HI, Pettersen B, Ecer A, Periaux J, et al., editors. Parallel computational fluid dynamics. Amsterdam: North-Holland; 2001.

[33] Langtangen HP, Pedersen G. Computational models for weakly

dispersive nonlinear water waves. Comp Methods Appl Mech Eng

1998;160:337–58.

[34] Li Y, Sclavounos PD. Three-dimensional nonlinear solitary waves

in shallow water generated by an advancing disturbance. J Fluid

Mech 2002;470:383–410.

[35] Mei CC. The applied dynamics of ocean surface waves. Singapore: World Scientific; 1989.

[36] Pedersen G. Three-dimensional wave patterns generated by

moving disturbances at transcritical speeds. J Fluid Mech 1988;

196:39–63.

[37] Pedersen G. Finite difference representations of nonlinear waves.

Int J Numer Methods Fluids 1991;13:671–90.

[38] Pedersen G. Nonlinear modulations of solitary waves. J Fluid

Mech 1994;267:83–108.

[39] Pedersen G, Langtangen HP. Dispersive effects on tsunamis. In:

Proceedings of the International Conference on Tsunamis, Paris,

France; 1999. p. 325–40.

[40] Peregrine DH. Long waves on a beach. J Fluid Mech 1967;77:

417–31.

[41] Rygg OB. Nonlinear refraction–diffraction of surface waves in

intermediate and shallow water. Coast Eng 1988;12:191–211.

[42] Smith BF, Bjørstad PE, Gropp W. Domain decomposition: parallel multilevel methods for elliptic partial differential equations. Cambridge University Press; 1996.


[43] Whitham GB. Linear and nonlinear waves. Pure and applied mathematics. New York: John Wiley & Sons; 1974.

[44] Woo J-K, Liu PL-F. Finite element model for modified Boussinesq equations. I: Model development. J Waterw Port, Coastal, Ocean Eng 2004;130(1):1–16.

[45] Wu DM, Wu TY. Three-dimensional nonlinear long waves due to

moving surface pressure. In Proceedings of the 14th Symposium

on Naval Hydrodynamics, MI, USA; 1982. p. 103–29.

[46] Zelt JA, Raichlen F. A Lagrangian model for wave-induced

harbour oscillations. J Fluid Mech 1990;213:203–25.