SWE – Anatomy of a Parallel Shallow Water Code

Transcript of the slides presented at the CSCS-FoMICS-USI Summer School on Computer Simulations in Science and Engineering

Page 1: SWE – Anatomy of a Parallel Shallow Water Code

Technische Universität München

CSCS-FoMICS-USI Summer School on Computer Simulations in Science and Engineering

Michael Bader

July 8–19, 2013

Page 2: Teaching Parallel Programming Models . . .

Starting Point: Lecture on Parallel Programming

• classical approaches for shared & distributed memory: OpenMP and MPI
• "something more fancy" → GPU computing (e.g., CUDA)
• motivating example to teach different models and compare their properties

"Motivating Example":

• not just Jacobi or Gauß-Seidel
• not the heat equation again . . .
• inspired by a CFD code: "Nast" by Griebel et al.
• turned out to become the shallow water equations
• and is heavily used for summer schools . . .

Page 3: Towards Tsunami Simulation with SWE

Shallow Water Code – Summary

• Finite Volume discretization on regular Cartesian grids
  → simple numerics (but can be extended to state-of-the-art)
• patch-based approach with ghost cells for communication
  → widespread design pattern for parallelization

Page 4: Towards Tsunami Simulation with SWE (2)

Shallow Water Code – Bells & Whistles

• included augmented Riemann solvers (D. George, R. LeVeque)
  → allows simulation of inundation
• developed towards hybrid parallel architectures
  → now runs on GPU clusters

Page 5: Part I

Model and Discretization

Page 6: Model: The Shallow Water Equations

Simplified setting (no friction, no viscosity, no Coriolis forces, etc.):

\partial_t \begin{pmatrix} h \\ hu \\ hv \end{pmatrix}
+ \partial_x \begin{pmatrix} hu \\ hu^2 + \frac{1}{2}gh^2 \\ huv \end{pmatrix}
+ \partial_y \begin{pmatrix} hv \\ huv \\ hv^2 + \frac{1}{2}gh^2 \end{pmatrix}
= \begin{pmatrix} 0 \\ -\partial_x(ghb) \\ -\partial_y(ghb) \end{pmatrix}

Quantities and unknowns:

• h(x, y, t): height of the water column
• hu, hv: momenta in x- and y-direction (u and v are the particle velocities)
• b(x, y): bathymetry (bottom elevation)
• g: gravity constant (usually g := 9.81 m/s²)

[Figure: illustration of the involved quantities over the water column; cf. the material of the TUM lab course "Tsunami-Simulation" (A. Breuer, S. Rettenberger, M. Bader)]

Page 7: Model: The Shallow Water Equations

Simplified setting (no friction, no viscosity, no Coriolis forces, etc.):

\partial_t \begin{pmatrix} h \\ hu \\ hv \end{pmatrix}
+ \partial_x \begin{pmatrix} hu \\ hu^2 + \frac{1}{2}gh^2 \\ huv \end{pmatrix}
+ \partial_y \begin{pmatrix} hv \\ huv \\ hv^2 + \frac{1}{2}gh^2 \end{pmatrix}
= \begin{pmatrix} 0 \\ -\partial_x(ghb) \\ -\partial_y(ghb) \end{pmatrix}

Write as generalized hyperbolic PDE:

• 2D setting, three quantities: q = (q_1, q_2, q_3)^T = (h, hu, hv)^T

\partial_t q + \partial_x F(q) + \partial_y G(q) = S(q, x, y, t)

• with flux functions:

F(q) = \begin{pmatrix} q_2 \\ q_2^2/q_1 + \frac{1}{2} g q_1^2 \\ q_2 q_3 / q_1 \end{pmatrix}
\qquad
G(q) = \begin{pmatrix} q_3 \\ q_2 q_3 / q_1 \\ q_3^2/q_1 + \frac{1}{2} g q_1^2 \end{pmatrix}
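For concreteness, a minimal C++ sketch of these two flux functions (the small struct and the names are illustrative, not part of the SWE code):

// vector of conserved quantities q = (q1, q2, q3)^T = (h, hu, hv)^T
struct Quantities { float h, hu, hv; };

const float g = 9.81f; // gravity constant

// F(q) = (q2, q2^2/q1 + g*q1^2/2, q2*q3/q1)^T
Quantities F(const Quantities& q) {
  return { q.hu,
           q.hu*q.hu/q.h + 0.5f*g*q.h*q.h,
           q.hu*q.hv/q.h };
}

// G(q) = (q3, q2*q3/q1, q3^2/q1 + g*q1^2/2)^T
Quantities G(const Quantities& q) {
  return { q.hv,
           q.hu*q.hv/q.h,
           q.hv*q.hv/q.h + 0.5f*g*q.h*q.h };
}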

Page 8: Model: The Shallow Water Equations

Simplified setting (no friction, no viscosity, no Coriolis forces, etc.):

\partial_t \begin{pmatrix} h \\ hu \\ hv \end{pmatrix}
+ \partial_x \begin{pmatrix} hu \\ hu^2 + \frac{1}{2}gh^2 \\ huv \end{pmatrix}
+ \partial_y \begin{pmatrix} hv \\ huv \\ hv^2 + \frac{1}{2}gh^2 \end{pmatrix}
= \begin{pmatrix} 0 \\ -\partial_x(ghb) \\ -\partial_y(ghb) \end{pmatrix}

Derived from conservation laws:

• h equation: conservation of mass
• equations for hu and hv: conservation of momentum
• ½gh²: averaged hydrostatic pressure due to the water column h;
  similar: bathymetry terms −∂_x(ghb) and −∂_y(ghb)
• may also be derived by vertical averaging from the 3D incompressible Navier-Stokes equations

Page 9: Model: The Shallow Water Equations

Simplified setting (no friction, no viscosity, no Coriolis forces, etc.):

\partial_t \begin{pmatrix} h \\ hu \\ hv \end{pmatrix}
+ \partial_x \begin{pmatrix} hu \\ hu^2 + \frac{1}{2}gh^2 \\ huv \end{pmatrix}
+ \partial_y \begin{pmatrix} hv \\ huv \\ hv^2 + \frac{1}{2}gh^2 \end{pmatrix}
= \begin{pmatrix} 0 \\ -\partial_x(ghb) \\ -\partial_y(ghb) \end{pmatrix}

The ocean as "shallow water"??

• compare the horizontal (~ 1000 km) to the vertical (~ 4 km) length scale
• wave lengths are large compared to the water depth
• vertical flow may be neglected; movement of the "entire water column"

Page 10: Finite Volume Discretisation

• discretise the system of PDEs

\partial_t q + \partial_x F(q) + \partial_y G(q) = S(t, x, y)

• results from the integral equation:

\int_{t_n}^{t_{n+1}} \partial_t \int_\Omega q \, d\omega \, dt
+ \int_{t_n}^{t_{n+1}} \oint_{\partial\Omega} \vec{F}(q) \cdot \vec{n} \, ds \, dt = \ldots

• use averaged quantities Q^{(n)}_{i,j} in finite volume elements \Omega_{ij}:

Q_{ij}(t) := \frac{1}{|\Omega_{ij}|} \int_{\Omega_{ij}} q \, d\omega
\qquad
\int_{t_n}^{t_{n+1}} \partial_t \int_{\Omega_{ij}} q \, d\omega \, dt
= |\Omega_{ij}| \left( Q^{(n+1)}_{i,j} - Q^{(n)}_{i,j} \right)

• What about the flux integral?

Page 11: Finite Volume Discretisation (2)

• flux integral on Cartesian grids:

\int_{t_n}^{t_{n+1}} \oint_{\partial\Omega} \vec{F}(q) \cdot \vec{n} \, ds \, dt
= \int_{t_n}^{t_{n+1}} \int_{y_{j-1/2}}^{y_{j+1/2}} F(q(x_{i+1/2}, y, t)) - F(q(x_{i-1/2}, y, t)) \, dy \, dt
+ \int_{t_n}^{t_{n+1}} \int_{x_{i-1/2}}^{x_{i+1/2}} G(q(x, y_{j+1/2}, t)) - G(q(x, y_{j-1/2}, t)) \, dx \, dt

• leads to an explicit time stepping scheme:

Q^{(n+1)}_{i,j} - Q^{(n)}_{i,j}
= - \frac{\Delta t}{\Delta x} \left( F(q(x_{i+1/2}, y, t_n)) - F(q(x_{i-1/2}, y, t_n)) \right)
  - \frac{\Delta t}{\Delta y} \left( G(q(x, y_{j+1/2}, t_n)) - G(q(x, y_{j-1/2}, t_n)) \right)

• how to compute F^{(n)}_{i+1/2,j} := F(q(x_{i+1/2}, y, t_n))?

Page 12: Central and Upwind Fluxes

• define the fluxes F^{(n)}_{i+1/2,j}, G^{(n)}_{i,j+1/2}, . . . via a 1D numerical flux function \mathcal{F}:

F^{(n)}_{i+1/2} = \mathcal{F}\left(Q^{(n)}_i, Q^{(n)}_{i+1}\right)
\qquad
G^{(n)}_{j-1/2} = \mathcal{F}\left(Q^{(n)}_{j-1}, Q^{(n)}_j\right)

• central flux:

F^{(n)}_{i+1/2} = \mathcal{F}\left(Q^{(n)}_i, Q^{(n)}_{i+1}\right)
:= \frac{1}{2}\left( F\left(Q^{(n)}_i\right) + F\left(Q^{(n)}_{i+1}\right) \right)

  leads to unstable methods for convective transport

• upwind flux (here, for the h-equation, F(h) = hu):

F^{(n)}_{i+1/2} = \mathcal{F}\left(h^{(n)}_i, h^{(n)}_{i+1}\right)
:= \begin{cases} hu\big|_i & \text{if } u\big|_{i+1/2} > 0 \\ hu\big|_{i+1} & \text{if } u\big|_{i+1/2} < 0 \end{cases}

  stable, but includes artificial diffusion

Page 13: (Local) Lax-Friedrichs Flux

• the classical Lax-Friedrichs method uses the numerical flux

F^{(n)}_{i+1/2} = \mathcal{F}\left(Q^{(n)}_i, Q^{(n)}_{i+1}\right)
:= \frac{1}{2}\left( F\left(Q^{(n)}_i\right) + F\left(Q^{(n)}_{i+1}\right) \right)
 - \frac{h}{2\tau}\left( Q^{(n)}_{i+1} - Q^{(n)}_i \right)

  (h the mesh width, \tau the time step)

• can be interpreted as the central flux plus a diffusion flux:

\frac{h}{2\tau}\left( Q^{(n)}_{i+1} - Q^{(n)}_i \right)
= \frac{h^2}{2\tau} \cdot \frac{Q^{(n)}_{i+1} - Q^{(n)}_i}{h}

  with diffusion coefficient \frac{h^2}{2\tau}, where c := \frac{h}{\tau} is a velocity
  ("one grid cell per time step" → cmp. CFL condition)

• idea of the local Lax-Friedrichs method: use the actual wave speed a_{i+1/2}:

F^{(n)}_{i+1/2} := \frac{1}{2}\left( F\left(Q^{(n)}_i\right) + F\left(Q^{(n)}_{i+1}\right) \right)
 - \frac{a_{i+1/2}}{2}\left( Q^{(n)}_{i+1} - Q^{(n)}_i \right)
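To illustrate, a minimal C++ sketch of the local Lax-Friedrichs (Rusanov) flux for the 1D (h, hu) system; the wave speed estimate |u| + √(gh) is a common choice and an assumption of this sketch, as are all names:

#include <algorithm>
#include <cmath>

struct Q1D { float h, hu; };   // conserved quantities of the 1D system
const float g = 9.81f;

// physical flux f(q) = (hu, hu^2/h + g*h^2/2)^T
Q1D flux(const Q1D& q) {
  return { q.hu, q.hu*q.hu/q.h + 0.5f*g*q.h*q.h };
}

// local Lax-Friedrichs flux between the states ql (cell i) and qr (cell i+1)
Q1D localLaxFriedrichs(const Q1D& ql, const Q1D& qr) {
  // local wave speed a_{i+1/2}: fastest signal speed of both states
  float a = std::max(std::fabs(ql.hu/ql.h) + std::sqrt(g*ql.h),
                     std::fabs(qr.hu/qr.h) + std::sqrt(g*qr.h));
  Q1D fl = flux(ql), fr = flux(qr);
  return { 0.5f*(fl.h  + fr.h)  - 0.5f*a*(qr.h  - ql.h),
           0.5f*(fl.hu + fr.hu) - 0.5f*a*(qr.hu - ql.hu) };
}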

Page 14: Riemann Problems

• solve a Riemann problem to obtain the solution q(x_{i+1/2}, y, t_n), etc.
• 1D treatment: solve the shallow water equations with the initial conditions

q(x, t_n) = \begin{cases} q_l = Q^{(n)}_{i-1} & \text{if } x < x_{i-1/2} \\ q_r = Q^{(n)}_i & \text{if } x > x_{i-1/2} \end{cases}

• solution: two outgoing (left- or right-going) waves (shock or rarefaction)

[Figures: (1) initial data of the Riemann problem – two constant states q_l and q_r separated by a discontinuity; (2) structure of the two-shock Riemann solution – height, velocity, vertically integrated pressure, and particle paths in the x-t plane; source: R. LeVeque, Finite Volume Methods for Hyperbolic Problems, 2002]

Page 15: Riemann Problems

• solve a Riemann problem to obtain the solution q(x_{i+1/2}, y, t_n), etc.
• 1D treatment: solve the shallow water equations with the initial conditions

q(x, t_n) = \begin{cases} q_l = Q^{(n)}_{i-1} & \text{if } x < x_{i-1/2} \\ q_r = Q^{(n)}_i & \text{if } x > x_{i-1/2} \end{cases}

• solution: two outgoing (left- or right-going) waves (shock or rarefaction)

[Figure: rarefaction wave and shock wave solutions of the Riemann problem]

Page 16: Riemann Problems (2)

• wave propagation approach: split the jump in the fluxes into waves

F(Q_i) - F(Q_{i-1}) - \Delta x \, \psi_{i-1/2}
= \sum_p \alpha_p r_p \equiv \sum_p \mathcal{Z}_p, \qquad \alpha_p \in \mathbb{R}

  with r_p the eigenvectors of the linearised problem and \psi_{i-1/2} a fix for the source term (bathymetry)

[Figure: f-wave solver – splitting the jump in the fluxes; the jump is decomposed into the eigenvectors r_p of the linearised problem, i.e. the coefficients α solve R α = q_r − q_l; cf. LeVeque, 2002]

• the implementation will compute net updates:

\mathcal{A}^+ \Delta Q_{i-1/2,j} = \sum_{p:\, \lambda_p > 0} \mathcal{Z}_p
\qquad
\mathcal{A}^- \Delta Q_{i-1/2,j} = \sum_{p:\, \lambda_p < 0} \mathcal{Z}_p

Page 17: The F-Wave Solver

• use the Roe eigenvalues \lambda^{Roe}_{1/2} to approximate the wave speeds:

\lambda^{Roe}_{1/2}(q_l, q_r) = u^{Roe}(q_l, q_r) \pm \sqrt{g \, h^{Roe}(q_l, q_r)}

• with h^{Roe}(q_l, q_r) = \frac{1}{2}(h_l + h_r) and u^{Roe}(q_l, q_r) = \frac{u_l \sqrt{h_l} + u_r \sqrt{h_r}}{\sqrt{h_l} + \sqrt{h_r}}

• eigenvectors r^{Roe}_{1/2} for the wave decomposition defined as

r^{Roe}_1 = \begin{pmatrix} 1 \\ \lambda^{Roe}_1 \end{pmatrix}
\qquad
r^{Roe}_2 = \begin{pmatrix} 1 \\ \lambda^{Roe}_2 \end{pmatrix}

• leads to the net updates (source terms still missing):

\mathcal{A}^- \Delta Q := \sum_{p:\, \lambda^{Roe}_p < 0} \alpha_p r_p
\qquad
\mathcal{A}^+ \Delta Q := \sum_{p:\, \lambda^{Roe}_p > 0} \alpha_p r_p

• with \alpha_{1/2} computed from

\begin{pmatrix} 1 & 1 \\ \lambda^{Roe}_1 & \lambda^{Roe}_2 \end{pmatrix}
\begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix}
= F(Q_i) - F(Q_{i-1})
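Assembled into code, a minimal sketch of an f-wave net-update computation for one edge (illustrative names, bathymetry source term omitted; the solver actually used by SWE lives in solver/FWaveCuda.h):

#include <algorithm>
#include <cmath>

const float g = 9.81f;

// f-wave net updates for one edge of the 1D (h, hu) system (sketch)
void fWaveNetUpdates(float hl, float hr, float hul, float hur,
                     float& hUpdL, float& hUpdR,
                     float& huUpdL, float& huUpdR,
                     float& maxWaveSpeed) {
  float ul = hul / hl, ur = hur / hr;

  // Roe averages and eigenvalues
  float hRoe = 0.5f * (hl + hr);
  float uRoe = (ul*std::sqrt(hl) + ur*std::sqrt(hr))
             / (std::sqrt(hl) + std::sqrt(hr));
  float lambda1 = uRoe - std::sqrt(g * hRoe);
  float lambda2 = uRoe + std::sqrt(g * hRoe);

  // jump in the fluxes: F(Q_i) - F(Q_{i-1})
  float df1 = hur - hul;
  float df2 = (hur*ur + 0.5f*g*hr*hr) - (hul*ul + 0.5f*g*hl*hl);

  // solve the 2x2 system [[1,1],[lambda1,lambda2]] * alpha = df
  float det = lambda2 - lambda1;
  float alpha1 = (lambda2 * df1 - df2) / det;
  float alpha2 = (df2 - lambda1 * df1) / det;

  // wave p carries Z_p = alpha_p * r_p; assign it by the sign of lambda_p
  hUpdL = huUpdL = hUpdR = huUpdR = 0.0f;
  if (lambda1 < 0) { hUpdL += alpha1; huUpdL += alpha1 * lambda1; }
  else             { hUpdR += alpha1; huUpdR += alpha1 * lambda1; }
  if (lambda2 < 0) { hUpdL += alpha2; huUpdL += alpha2 * lambda2; }
  else             { hUpdR += alpha2; huUpdR += alpha2 * lambda2; }

  maxWaveSpeed = std::max(std::fabs(lambda1), std::fabs(lambda2));
}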

Page 18: Finite Volume on Cartesian Grids

[Figure: grid cell (i, j) with the unknowns h_ij, (hu)_ij, (hv)_ij, b_ij in the cell centre and the net updates A⁺ΔQ_{i−1/2,j}, A⁻ΔQ_{i+1/2,j}, B⁺ΔQ_{i,j−1/2}, B⁻ΔQ_{i,j+1/2} on the four edges]

Unknowns and Numerical Fluxes:

• (averaged) unknowns h, hu, hv, and b located in the cell centers
• two sets of "net updates" or "numerical fluxes" per edge;
  here: A⁺ΔQ_{i−1/2,j}, B⁻ΔQ_{i,j+1/2} ("wave propagation form")

Page 19: Flux Form vs. Wave Propagation Form

• numerical scheme in flux form:

Q^{(n+1)}_{i,j} = Q^{(n)}_{i,j}
- \frac{\Delta t}{\Delta x} \left( F^{(n)}_{i+1/2,j} - F^{(n)}_{i-1/2,j} \right)
- \frac{\Delta t}{\Delta y} \left( G^{(n)}_{i,j+1/2} - G^{(n)}_{i,j-1/2} \right)

  where F^{(n)}_{i+1/2,j}, G^{(n)}_{i,j+1/2}, . . . approximate the flux functions F(q) and G(q) at the grid cell boundaries

• wave propagation form:

Q^{n+1}_{i,j} = Q^n_{i,j}
- \frac{\Delta t}{\Delta x} \left( \mathcal{A}^+ \Delta Q^n_{i-1/2,j} + \mathcal{A}^- \Delta Q^n_{i+1/2,j} \right)
- \frac{\Delta t}{\Delta y} \left( \mathcal{B}^+ \Delta Q^n_{i,j-1/2} + \mathcal{B}^- \Delta Q^n_{i,j+1/2} \right)

  where \mathcal{A}^+ \Delta Q^n_{i-1/2,j}, \mathcal{B}^- \Delta Q^n_{i,j+1/2}, etc. are net updates

• difference in implementation: compute one "flux term" or two "net updates" for each edge

Page 20: Time Stepping: Splitting or Not?

• with dimensional splitting:

Q^*_{i,j} = Q^n_{i,j} - \frac{\Delta t}{\Delta x} \left( \mathcal{A}^+ \Delta Q^n_{i-1/2,j} + \mathcal{A}^- \Delta Q^n_{i+1/2,j} \right)

Q^{n+1}_{i,j} = Q^*_{i,j} - \frac{\Delta t}{\Delta y} \left( \mathcal{B}^+ \Delta Q^*_{i,j-1/2} + \mathcal{B}^- \Delta Q^*_{i,j+1/2} \right)

  two sequential "sweeps" of Riemann solves on horizontal vs. vertical edges

• vs. the "un-split" method (currently used in SWE):

Q^{n+1}_{i,j} = Q^n_{i,j}
- \frac{\Delta t}{\Delta x} \left( \mathcal{A}^+ \Delta Q^n_{i-1/2,j} + \mathcal{A}^- \Delta Q^n_{i+1/2,j} \right)
- \frac{\Delta t}{\Delta y} \left( \mathcal{B}^+ \Delta Q^n_{i,j-1/2} + \mathcal{B}^- \Delta Q^n_{i,j+1/2} \right)

  allows combining the loops over horizontal and vertical edges

Page 21: Time Stepping

CFL Condition:

• we only consider neighbour cells for a time step
  ⇒ information must not travel faster than one cell per time step!
• thus: time steps need to consider the characteristic wave speeds
• rule of thumb: the wave speed depends on the water depth, λ = √(gh)
• in SWE: the Riemann solvers will compute local wave speeds
  ⇒ a maximum reduction is necessary to find the global time step

Adaptive time step control forces a sequential main loop:

1. solve Riemann problems, compute wave speeds
2. compute the maximum wave speed and infer the global ∆t
3. update the unknowns
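A quick plausibility check of the rule of thumb, using the ~4 km ocean depth from the earlier slide and an assumed grid cell size of ∆x = 1000 m:

\lambda = \sqrt{gh} = \sqrt{9.81 \cdot 4000} \approx 198 \text{ m/s}
\qquad\Rightarrow\qquad
\Delta t \le \frac{\Delta x}{\lambda} \approx \frac{1000 \text{ m}}{198 \text{ m/s}} \approx 5 \text{ s}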

Page 22: References & Literature

• LeVeque: Finite Volume Methods for Hyperbolic Problems, Cambridge University Press, 2002
• Toro: Riemann Solvers and Numerical Methods for Fluid Dynamics: A Practical Introduction, Springer, 2009
• Bale, LeVeque, Mitran, Rossmanith: A wave propagation method for conservation laws and balance laws with spatially varying flux functions, SIAM Journal on Scientific Computing 24 (3), 2003
• George: Augmented Riemann solvers for the shallow water equations over variable topography with steady states and inundation, Journal of Computational Physics 227 (6), 2008
• Breuer, Bader: Teaching Parallel Programming Models on a Shallow-Water Code, Proc. of the ISPDC 2012

Page 23: Part II

Parallel Programming Patterns

Reference: Mattson, Sanders, Massingill, Patterns for Parallel Programming. Addison-Wesley, 2005.

Page 24: Finding Concurrency

Common rule:

Before you start parallelising your code, make sure the serial version is perfectly optimised!

Pro:

• parallelising a badly optimised serial algorithm leads to a badly optimised parallel algorithm
• use an asymptotically optimal algorithm!
  For large problems (the ones worth parallelising), asymptotics is crucial.

Contra:

• exploit all available concurrency in your problem
  (your optimised serial code might have unnecessary sequential parts)
• the fastest serial algorithm is not necessarily the fastest parallel algorithm

Page 25: Finding Concurrency – Task Decomposition

Decompose your problem into tasks that can execute concurrently!

Consider "un-split" time stepping:

\forall i,j: \quad Q^{n+1}_{i,j} = Q^n_{i,j}
- \frac{\Delta t}{\Delta x} \left( \mathcal{A}^+ \Delta Q^n_{i-1/2,j} + \mathcal{A}^- \Delta Q^n_{i+1/2,j} \right)
- \frac{\Delta t}{\Delta y} \left( \mathcal{B}^+ \Delta Q^n_{i,j-1/2} + \mathcal{B}^- \Delta Q^n_{i,j+1/2} \right)

Concurrent tasks:

1. compute net updates (i.e., solve Riemann problems) \mathcal{A}^\pm \Delta Q^n_{i-1/2,j}, \mathcal{B}^\pm \Delta Q^n_{i,j-1/2} for all (vertical and horizontal) edges
2. update the quantities Q^{n+1}_{i,j} in all cells

or: for all cells, compute the net updates (on the local edges) and update the quantities Q^{n+1}_{i,j} (requires two arrays, for Q^n_{i,j} and Q^{n+1}_{i,j}, resp.)

Page 26: Finding Concurrency – Task Decomposition

Decompose your problem into tasks that can execute concurrently!

Consider dimensional splitting:

Q^*_{i,j} = Q^n_{i,j} - \frac{\Delta t}{\Delta x} \left( \mathcal{A}^+ \Delta Q^n_{i-1/2,j} + \mathcal{A}^- \Delta Q^n_{i+1/2,j} \right)

Q^{n+1}_{i,j} = Q^*_{i,j} - \frac{\Delta t}{\Delta y} \left( \mathcal{B}^+ \Delta Q^*_{i,j-1/2} + \mathcal{B}^- \Delta Q^*_{i,j+1/2} \right)

Concurrent tasks:

1. compute the net updates on all vertical edges (\mathcal{A}^+ \Delta Q^n_{i-1/2,j}, etc.)
1a. update the intermediate quantities Q^*_{i,j} in all cells
2. compute the net updates on all horizontal edges (\mathcal{B}^+ \Delta Q^*_{i,j-1/2}, etc.)
2a. update the quantities Q^{n+1}_{i,j} in all cells

Page 27: Finding Concurrency – Data Decomposition

Decompose your data into units that can be operated on relatively independently!

Consider dimensional splitting:

Q^*_{i,j} = Q^n_{i,j} - \frac{\Delta t}{\Delta x} \left( \mathcal{A}^+ \Delta Q^n_{i-1/2,j} + \mathcal{A}^- \Delta Q^n_{i+1/2,j} \right)

Q^{n+1}_{i,j} = Q^*_{i,j} - \frac{\Delta t}{\Delta y} \left( \mathcal{B}^+ \Delta Q^*_{i,j-1/2} + \mathcal{B}^- \Delta Q^*_{i,j+1/2} \right)

Data decomposition:

1. computation of Q^*_{i,j}: distribute the data row-wise, as the computation is independent for different j
2. update of Q^{n+1}_{i,j}: distribute the data column-wise, as the computation is independent for different i

Page 28: Finding Concurrency – Data Decomposition

Decompose your data into units that can be operated on relatively independently!

Consider "un-split" time stepping:

\forall i,j: \quad Q^{n+1}_{i,j} = Q^n_{i,j}
- \frac{\Delta t}{\Delta x} \left( \mathcal{A}^+ \Delta Q^n_{i-1/2,j} + \mathcal{A}^- \Delta Q^n_{i+1/2,j} \right)
- \frac{\Delta t}{\Delta y} \left( \mathcal{B}^+ \Delta Q^n_{i,j-1/2} + \mathcal{B}^- \Delta Q^n_{i,j+1/2} \right)

Observations:

• computing the net updates requires the left/right and top/bottom neighbours
  ⇒ no "perfect" data decomposition possible
• partitioning the data will require extra care at the boundaries of the partitions
• and (seemingly trivial): do not decompose the quantities within Q_{i,j}

Page 29: Task and Data Decomposition – "Forces"

Flexibility:

• be flexible enough to adapt to different implementation requirements
• for example: do not concentrate on a single parallel platform or programming model

Efficiency:

• the solution needs to scale efficiently with the size of the computer
• task and data decomposition need to provide enough tasks to keep all processing elements busy

Simplicity:

• complex enough to solve the task, but simple enough to keep the program maintainable

Page 30: Identifying Dependencies Between Tasks

Group Tasks:

Group your tasks to simplify the management of dependencies.

Order Tasks:

Given a collection of tasks grouped into logically related groups, order these task groups to satisfy constraints.

Data Sharing:

Given a data and task decomposition, how is data shared among the tasks?

Page 31: Element Updates as Task Groups

Consider "un-split" time stepping:

\forall i,j: \quad Q^{n+1}_{i,j} = Q^n_{i,j}
- \frac{\Delta t}{\Delta x} \left( \mathcal{A}^+ \Delta Q^n_{i-1/2,j} + \mathcal{A}^- \Delta Q^n_{i+1/2,j} \right)
- \frac{\Delta t}{\Delta y} \left( \mathcal{B}^+ \Delta Q^n_{i,j-1/2} + \mathcal{B}^- \Delta Q^n_{i,j+1/2} \right)

Grouped tasks:

• solve the Riemann problems on the four cell edges
• update the quantities Q_{i,j} from the net updates

Data dependencies:

• tasks access the quantities Q^n_{i±1,j±1} of the neighbour cells
  ⇒ two copies required, for Q^n_{i,j} and Q^{n+1}_{i,j}
• Riemann problem computed twice for each edge?

Page 32: Riemann Solves and Updates as Task Groups

Consider dimensional splitting:

Q^*_{i,j} = Q^n_{i,j} - \frac{\Delta t}{\Delta x} \left( \mathcal{A}^+ \Delta Q^n_{i-1/2,j} + \mathcal{A}^- \Delta Q^n_{i+1/2,j} \right)

Q^{n+1}_{i,j} = Q^*_{i,j} - \frac{\Delta t}{\Delta y} \left( \mathcal{B}^+ \Delta Q^*_{i,j-1/2} + \mathcal{B}^- \Delta Q^*_{i,j+1/2} \right)

Separate task groups (for each of the two steps):

• solve the Riemann problems on all horizontal (vertical) cell edges
• update the quantities Q_{i,j} of an entire column (row)

Data dependencies:

• tasks access neighbours in either row or column direction
• requires extra storage to hold the net updates (the results of the Riemann problems)

Page 33: Computation of the CFL Condition

Consider "un-split" time stepping:

\forall i,j: \quad Q^{n+1}_{i,j} = Q^n_{i,j}
- \frac{\Delta t}{\Delta x} \left( \mathcal{A}^+ \Delta Q^n_{i-1/2,j} + \mathcal{A}^- \Delta Q^n_{i+1/2,j} \right)
- \frac{\Delta t}{\Delta y} \left( \mathcal{B}^+ \Delta Q^n_{i,j-1/2} + \mathcal{B}^- \Delta Q^n_{i,j+1/2} \right)

where ∆t results from the wave propagation speeds

Sequential order of tasks:

1. solve the Riemann problems on the four cell edges
   (compute the wave propagation speeds as partial results)
2. determine the maximum wave speed for the CFL condition → ∆t
3. update the quantities Q_{i,j} from the net updates

Page 34: The Geometric Decomposition Pattern

How can your algorithm be organized around a data structure that has been decomposed into concurrently updatable "chunks"?

Partitioning (how to select your "chunks"):

• w.r.t. size, shape, etc. ("granularity" of parallelism)
• multiple levels of partitioning necessary?

Organization of the parallel updates:

• need to access the water height, momentum components and bathymetry from neighbour cells (possibly in another partition)
• need to access net updates from a neighbour partition?
  (alternative: compute them on all involved partitions?)

Page 35: 1D Domain Decomposition – Slice-Oriented

Discussion:

• degenerates for a large number of partitions: thin slices, lots of data exchange required at the (long!) boundaries
• for dimensional splitting: slices match the dependencies (vertical or horizontal), but alternating slices are required for the two update steps

Page 36: 2D Domain Decomposition – Block-Oriented

Discussion:

+ reduced length of the domain boundaries (communication volume)
− an arbitrary number of partitions must be fitted to the layout of boxes

Page 37: 3D Domain Decomposition – Cuboid-Oriented

Page 38: "Patches" Concept for Domain Decomposition

Discussion:

+ more fine-grain load distribution
+ "empty patches" improve the representation of complicated domains
− overhead for additional, interior boundaries
− requires a scheme to assign patches to processes

Page 39: Part III

SWE Software Design

[Class diagram: SWE_Block as base class; derived from it: SWE_BlockCUDA, SWE_RusanovBlock, SWE_WavePropagationBlock; derived further: SWE_RusanovBlockCUDA, SWE_WavePropagationBlockCuda]

Page 40: Basic Structure: Cartesian Grid Block

[Figure: Cartesian grid block with cell indices i = 0, . . . , nx+1 and j = 0, . . . , ny+1; the outermost rows and columns (i = 0, i = nx+1, j = 0, j = ny+1) form the ghost layers]

Spatial Discretization:

• regular Cartesian meshes; later: allow multiple patches
• ghost layers to implement boundary conditions and to connect multiple patches (complicated domains, parallelization)

Page 41: Basic Structure: Cartesian Grid Block

[Figure: the same Cartesian grid block with ghost layers as on the previous slide]

Data Structure:

• arrays h, hu, hv, and b to hold the water height, momentum components and bathymetry data
• "column major" layout: j is the "faster running" index in h[i][j] (see the sketch below)
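A minimal sketch of such a layout (illustrative; the actual SWE code uses its own 2D array class): one contiguous allocation per quantity, with j running fastest, so that a column h[i][0..ny+1] is contiguous in memory:

#include <vector>

// sketch: 2D array with j as the "faster running" index,
// so each column (fixed i) is contiguous in memory
class Grid2D {
    int cols;                  // cols = ny+2 (incl. ghost layers)
    std::vector<float> data;   // one contiguous allocation
  public:
    Grid2D(int nx, int ny) : cols(ny + 2), data((nx + 2) * (ny + 2), 0.0f) {}
    float& operator()(int i, int j) { return data[i * cols + j]; }
};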

Page 42: Main Loop – Euler Time-stepping

while( t < ... ) {
  // set boundary conditions
  splash.setGhostLayer();

  // compute fluxes on each edge
  splash.computeNumericalFluxes();

  // set largest allowed time step:
  dt = splash.getMaxTimestep();
  t += dt;

  // update unknowns in each cell
  splash.updateUnknowns(dt);
};

→ defines the interface for the abstract class SWE_Block

Page 43: Set Ghost Layers – Boundary Conditions

Split into two methods:

• setGhostLayer(): interface function in SWE_Block, needs to be called by the main loop
• setBoundaryConditions(): called by setGhostLayer(); sets the "real" boundary conditions (WALL, OUTFLOW, etc.)

switch(boundary[BND_LEFT]) {
  case WALL:
    for( int j=1; j<=ny; j++) {
      h[0][j]  =  h[1][j];  b[0][j]  = b[1][j];
      hu[0][j] = -hu[1][j]; hv[0][j] = hv[1][j];
    };
    break;
  case OUTFLOW: /* ... */
}

(cmp. file SWE_Block.cpp)

Page 44: Compute Numerical Fluxes

Main loop to compute the net updates on left/right edges:

for( int i=1; i < nx+2; i++) {
  for( int j=1; j < ny+1; j++) {
    float maxEdgeSpeed;
    wavePropagationSolver.computeNetUpdates(
        h[i-1][j],  h[i][j],
        hu[i-1][j], hu[i][j],
        b[i-1][j],  b[i][j],
        hNetUpdatesLeft[i-1][j-1],  hNetUpdatesRight[i-1][j-1],
        huNetUpdatesLeft[i-1][j-1], huNetUpdatesRight[i-1][j-1],
        maxEdgeSpeed );
    maxWaveSpeed = std::max(maxWaveSpeed, maxEdgeSpeed);
  }
}

(cmp. file SWE_WavePropagationBlock.cpp)

Page 45: Compute Numerical Fluxes (2)

Main loop to compute the net updates on top/bottom edges:

for( int i=1; i < nx+1; i++) {
  for( int j=1; j < ny+2; j++) {
    float maxEdgeSpeed;
    wavePropagationSolver.computeNetUpdates(
        h[i][j-1],  h[i][j],
        hv[i][j-1], hv[i][j],
        b[i][j-1],  b[i][j],
        hNetUpdatesBelow[i-1][j-1],  hNetUpdatesAbove[i-1][j-1],
        hvNetUpdatesBelow[i-1][j-1], hvNetUpdatesAbove[i-1][j-1],
        maxEdgeSpeed );
    maxWaveSpeed = std::max(maxWaveSpeed, maxEdgeSpeed);
  }
}

(cmp. file SWE_WavePropagationBlock.cpp)

Page 46: Determine Maximum Time Step

• the variable maxWaveSpeed holds the maximum wave speed
• updated during the computation of the numerical fluxes, in method computeNumericalFluxes():

maxTimestep = std::min( dx/maxWaveSpeed, dy/maxWaveSpeed );

• a simple "getter" method defined in class SWE_Block:

float getMaxTimestep() { return maxTimestep; };

• hence: getMaxTimestep() for the current time step should be called after computeNumericalFluxes()
• in general: in many situations, the maximum computation inhibits certain optimizations → a fixed time step is probably faster!

Page 47: Update Unknowns – Euler Time Stepping

for( int i=1; i < nx+1; i++) {
  for( int j=1; j < ny+1; j++) {
    h[i][j]  -= dt/dx * (hNetUpdatesRight[i-1][j-1] + hNetUpdatesLeft[i][j-1])
              + dt/dy * (hNetUpdatesAbove[i-1][j-1] + hNetUpdatesBelow[i-1][j]);
    hu[i][j] -= dt/dx * (huNetUpdatesRight[i-1][j-1] + huNetUpdatesLeft[i][j-1]);
    hv[i][j] -= dt/dy * (hvNetUpdatesAbove[i-1][j-1] + hvNetUpdatesBelow[i-1][j]);
    /* ... */
  }
}

(cmp. file SWE_WavePropagationBlock.cpp)

Page 48: Goals for (Parallel) Implementation

Spatial Discretization:

• allow different parallel programming models
• and also allow switching between different numerical models
⇒ class hierarchy of numerical vs. programming models

Hybrid Parallelization:

• support two levels of parallelization (such as shared/distributed memory, CPU/GPU, etc.)
• coarse-grain parallelism across Cartesian grid patches
• fine-grain parallelism on patch-local operations
⇒ separate fine-grain and coarse-grain parallelism ("plug & play" principle)

Page 49: SWE Class Design

[Class diagram: SWE_Block as base class; derived from it: SWE_BlockCUDA, SWE_RusanovBlock, SWE_WavePropagationBlock; derived further: SWE_RusanovBlockCUDA, SWE_WavePropagationBlockCuda]

Abstract class SWE_Block:

• base class to hold the data structures (arrays h, hu, hv, b)
• manipulates the ghost layers
• methods for initialization, writing output, etc.
• defines the interface for the main time-stepping loop:
  computeNumericalFluxes(), updateUnknowns(), . . .
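A condensed sketch of what this interface can look like (simplified and illustrative, not the literal SWE_Block declaration):

// sketch of the abstract base class (simplified, illustrative signatures)
class SWE_Block {
  public:
    virtual void setGhostLayer() {}              // ghost layers & boundary conditions
    virtual void computeNumericalFluxes() = 0;   // overridden per model/platform
    virtual void updateUnknowns(float dt) = 0;   // overridden per model/platform
    float getMaxTimestep() { return maxTimestep; }
  protected:
    int nx, ny;            // grid size (without ghost layers)
    float maxTimestep;     // set during computeNumericalFluxes()
    // arrays h, hu, hv, b omitted in this sketch
};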

Page 50: SWE Class Design

[Class diagram as on the previous slide]

derived classes:

• for different model variants: SWE_RusanovBlock, SWE_WavePropagationBlock, . . .
• for different programming models: SWE_BlockCUDA, SWE_BlockArBB, . . .
• override computeNumericalFluxes(), updateUnknowns(), . . .
  → the methods relevant for parallelization

Page 51: Example: SWE_WavePropagationBlockCuda

class SWE_WavePropagationBlockCuda: public SWE_BlockCUDA {
  /*-- definition of member variables skipped --*/
  public:
    // compute a single time step (net-updates + update of the cells).
    void simulateTimestep( float i_dT );
    // simulate multiple time steps (start and end time as parameters)
    float simulate(float, float);
    // compute the numerical fluxes (net-update formulation here).
    void computeNumericalFluxes();
    // compute the new cell values.
    void updateUnknowns(const float i_deltaT);
};

(in file SWE_WavePropagationBlockCuda.hh)

Page 52: Part IV

SWE Parallelisation

Page 53: Patches of Cartesian Grid Blocks

[Figure: two adjacent Cartesian grid blocks, each with indices i = 0, . . . , nx+1 and j = 0, . . . , ny+1; the ghost layer of each block overlaps the copy layer of its neighbour]

Spatial Discretization:

• regular Cartesian meshes; allow multiple patches
• ghost and copy layers to implement boundary conditions, for more complicated domains, and for parallelization

Page 54: Loop-Based Parallelism within Patches

Computing the net updates:

• compute net updates on left/right edges:

for( int i=1; i < nx+2; i++) in parallel {
  for( int j=1; j < ny+1; j++) in parallel {
    float maxEdgeSpeed;
    fWaveComputeNetUpdates( 9.81,
        h[i-1][j], h[i][j], hu[i-1][j], hu[i][j], /* ... */ );
  }
}

• compute net updates on top/bottom edges:

for( int i=1; i < nx+1; i++) in parallel {
  for( int j=1; j < ny+2; j++) in parallel {
    fWaveComputeNetUpdates( 9.81,
        h[i][j-1], h[i][j], hv[i][j-1], hv[i][j], /* ... */ );
  }
}

(function fWaveComputeNetUpdates() is defined in file solver/FWaveCuda.h)
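On shared memory, the "in parallel" annotation can be realized with a straightforward OpenMP pragma on the outer loop; a minimal sketch along the lines of the loop above (the pragma placement is illustrative):

// "in parallel" realized with OpenMP (sketch); each iteration writes to
// disjoint net-update entries, so the loop body needs no synchronization
#pragma omp parallel for
for( int i=1; i < nx+2; i++) {
  for( int j=1; j < ny+1; j++) {
    fWaveComputeNetUpdates( 9.81,
        h[i-1][j], h[i][j], hu[i-1][j], hu[i][j], /* ... */ );
  }
}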

Page 55: Computing the Net Updates – Options for Parallelism

Parallelization of the computations:

• compute all vertical edges in parallel
• compute all horizontal edges in parallel
• compute vertical & horizontal edges in parallel (task parallelism)

Parallel access to memory:

• concurrent read access to the variables h, hu, hv
• exclusive write access to the net-update variables on the edges

Page 56: Loop-Based Parallelism within Patches (2)

Updating the unknowns:

• update the unknowns from the net updates on the edges:

for( int i=1; i < nx+1; i++) in parallel {
  for( int j=1; j < ny+1; j++) in parallel {
    h[i][j]  -= dt/dx * (hNetUpdatesRight[i-1][j-1] + hNetUpdatesLeft[i][j-1])
              + dt/dy * (hNetUpdatesAbove[i-1][j-1] + hNetUpdatesBelow[i-1][j]);
    hu[i][j] -= dt/dx * (huNetUpdatesRight[i-1][j-1] + huNetUpdatesLeft[i][j-1]);
    /* ... */
  }
}

Page 57: Updating the Unknowns – Options for Parallelism

Parallelization of the computations:

• compute all cells in parallel

Parallel access to memory:

• concurrent read access to the net updates on the edges
• exclusive write access to the variables h, hu, hv

"Vectorization property":

• exactly the same code for all cells!

Page 58: Exchange of Values in Ghost/Copy Layers

Straightforward approach:

• boundary conditions OUTFLOW, WALL vs. CONNECT or PARALLEL
• disadvantage: the method setGhostLayer() needs to be implemented for each derived class

Page 59: Exchange of Values in Ghost/Copy Layers (2)

Implemented via proxy objects:

• grabGhostLayer() to write into a ghost layer
• registerCopyLayer() to read from a copy layer
• both methods return a proxy object (class SWE_Block1D) that references one row/column of the grid

Page 60: Direct-Neighbour vs. "Diagonal" Communication

2-step scheme to exchange the data of "diagonal" ghost cells:

• several "hops" replace the diagonal communication
• slight increase in the communication volume (bandwidth), but reduces the number of messages (latency)
• similar in 3D (26 neighbours → 6 neighbours!)

Page 61: MPI Parallelization – Exchange of Ghost/Copy Layers

SWE_Block1D* leftInflow   = splash.grabGhostLayer(BND_LEFT);
SWE_Block1D* leftOutflow  = splash.registerCopyLayer(BND_LEFT);
SWE_Block1D* rightInflow  = splash.grabGhostLayer(BND_RIGHT);
SWE_Block1D* rightOutflow = splash.registerCopyLayer(BND_RIGHT);

MPI_Sendrecv(leftOutflow->h.elemVector(), 1, MPI_COL, leftRank, 1,
             rightInflow->h.elemVector(), 1, MPI_COL, rightRank, 1,
             MPI_COMM_WORLD, &status);
MPI_Sendrecv(rightOutflow->h.elemVector(), 1, MPI_COL, rightRank, 4,
             leftInflow->h.elemVector(), 1, MPI_COL, leftRank, 4,
             MPI_COMM_WORLD, &status);

(cmp. file examples/swe_mpi.cpp)
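MPI_COL here is a user-defined MPI datatype describing one grid column; a minimal sketch of how such types can be created (assuming the layout from the "Cartesian Grid Block" slide, where j runs fastest and a column of ny+2 floats is therefore contiguous):

#include <mpi.h>

MPI_Datatype MPI_COL, MPI_ROW;

// register datatypes for one grid column/row (sketch)
void createLayerDatatypes(int nx, int ny) {
  // a column: ny+2 contiguous floats
  MPI_Type_contiguous(ny + 2, MPI_FLOAT, &MPI_COL);
  MPI_Type_commit(&MPI_COL);
  // a row, in contrast, is strided: nx+2 single floats, one every ny+2 elements
  MPI_Type_vector(nx + 2, 1, ny + 2, MPI_FLOAT, &MPI_ROW);
  MPI_Type_commit(&MPI_ROW);
}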

Page 62: MPI – Some Speedups

[Plot: speedup relative to 16 cores over the number of cores (16, 32, 48, 64, 96, 128); measured vs. linear speedup]

• 1 MPI process per core
• (expensive) augmented Riemann solvers

Page 63: Speedups for MPI/OpenMP

[Plot: speedup relative to 16 cores over the number of cores (16, 32, 48, 64, 96, 128); measured vs. linear speedup]

• 1 MPI process per node, 8 OpenMP threads (1 per core)
• straightforward OpenMP parallelization of the for-loops

Page 64: Speedups for MPI/OpenMP

[Plot: speedup relative to 16 cores over the number of cores (16, 32, 48, 64, 96, 128); measured vs. linear speedup]

• 1 MPI process per node, 8 OpenMP threads (1 per core)
• hybrid f-wave/augmented Riemann solver → poor load balancing

Page 65: Teaching Parallel Programming with SWE

SWE in lectures, tutorials, lab courses:

• non-trivial example, but model & implementation easy to grasp
• allows different parallel programming models (MPI, OpenMP, CUDA, Intel TBB/ArBB, OpenCL, . . . )
• prepared for hybrid parallelisation

Some extensions:

• ASAGI – a parallel server for geoinformation (S. Rettenberger, Master's thesis)
• OpenGL real-time visualisation of the results (T. Schnabel, student project; extended by S. Rettenberger)

→ http://www5.in.tum.de/SWE/
→ https://github.com/TUM-I5

Page 66: Part V

Workshop – SWE Parallelisation

Page 67: MPI Communication Between Patches

Extend the sequential SWE program swe_serial.cpp:

• goal: one patch (SWE_Block) per MPI process
• establish the assignment of patches to MPI ranks ("who is my neighbour?")
• implement the exchange between ghost & copy cells (preferably via proxy objects)
• parallelize the adaptive time step control (see the sketch below)
• produce speed-up graphs (strong and weak scaling)

Possible extensions (for the ambitious . . . ):

• compare blocking vs. non-blocking communication
• try overlapping communication and computation
• allow multiple patches per MPI process
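One possible way to parallelize the adaptive time step control, as mentioned in the task list above, is a global max-reduction over the per-process wave speeds; a minimal sketch with illustrative names:

#include <algorithm>
#include <mpi.h>

// every rank contributes its local maximum wave speed; MPI_Allreduce
// makes all ranks agree on the same global time step
float computeGlobalTimestep(float localMaxWaveSpeed, float dx, float dy) {
  float globalMaxWaveSpeed;
  MPI_Allreduce(&localMaxWaveSpeed, &globalMaxWaveSpeed, 1,
                MPI_FLOAT, MPI_MAX, MPI_COMM_WORLD);
  return std::min(dx, dy) / globalMaxWaveSpeed;
}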

Page 68: Loop Parallelism in SWE Using OpenMP

What should be done before starting with OpenMP?

• determine the most time-consuming parts of your code (→ week 2)
• use the option "guided auto-parallelism" of the Intel compiler
  (→ welcome to try, but it does not give many hints for SWE)

Extend the MPI-parallel SWE program towards MPI+OpenMP:

• what are the most time-consuming loops in SWE?
• to do: loop parallelism for these loops using #pragma ...
• test the performance of the OpenMP vs. the MPI implementation

Possible extensions (for the ambitious . . . ):

• parallelize the adaptive time step control (reduction; see the sketch below)
• multiple-patch version: try OpenMP on patches
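For the max-reduction in the adaptive time step control, OpenMP (version 3.1 and newer) offers reduction(max:...); a sketch based on the flux loop from Page 44:

float maxWaveSpeed = 0.0f;
#pragma omp parallel for reduction(max:maxWaveSpeed)
for( int i=1; i < nx+2; i++) {
  for( int j=1; j < ny+1; j++) {
    float maxEdgeSpeed;   // private to each iteration
    wavePropagationSolver.computeNetUpdates(
        h[i-1][j], h[i][j], hu[i-1][j], hu[i][j], b[i-1][j], b[i][j],
        hNetUpdatesLeft[i-1][j-1],  hNetUpdatesRight[i-1][j-1],
        huNetUpdatesLeft[i-1][j-1], huNetUpdatesRight[i-1][j-1],
        maxEdgeSpeed );
    maxWaveSpeed = std::max(maxWaveSpeed, maxEdgeSpeed);
  }
}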
