Transcript of: SWE – Anatomy of a Parallel Shallow Water Code · 2013-07-16 · Technische Universität München
Technische Universität München
SWE – Anatomy of a Parallel Shallow Water Code
CSCS-FoMICS-USI Summer School on Computer Simulations in Science and Engineering
Michael Bader
July 8–19, 2013
Michael Bader: SWE – Anatomy of a Parallel Shallow Water Code
Computer Simulations in Science and Engineering, July 8–19, 2013
Teaching Parallel Programming Models . . .
Starting Point: Lecture on Parallel Programming
• classical approaches for shared & distributed memory: OpenMP and MPI
• "something more fancy" → GPU computing (e.g., CUDA)
• motivating example to teach different models and compare their properties
“Motivating Example”:
• not just Jacobi or Gauß-Seidel
• not the heat equation again . . .
• inspired by a CFD code: "Nast" by Griebel et al.
• turned out to become shallow water equations
• and is heavily used for summer schools . . .
Towards Tsunami Simulation with SWE
Shallow Water Code – Summary
• Finite Volume discretization on regular Cartesian grids → simple numerics (but can be extended to state-of-the-art)
• patch-based approach with ghost cells for communication → wide-spread design pattern for parallelization
Towards Tsunami Simulation with SWE (2)
Shallow Water Code – Bells & Whistles
• included augmented Riemann solvers (D. George, R. LeVeque) → enables simulation of inundation
• developed towards hybrid parallel architectures → now runs on GPU clusters
Part I
Model and Discretization
Model: The Shallow Water Equations
Simplified setting (no friction, no viscosity, no Coriolis forces, etc.):
$$\frac{\partial}{\partial t}\begin{pmatrix} h \\ hu \\ hv \end{pmatrix}
+ \frac{\partial}{\partial x}\begin{pmatrix} hu \\ hu^2 + \frac{1}{2}gh^2 \\ huv \end{pmatrix}
+ \frac{\partial}{\partial y}\begin{pmatrix} hv \\ huv \\ hv^2 + \frac{1}{2}gh^2 \end{pmatrix}
= \begin{pmatrix} 0 \\ -\frac{\partial}{\partial x}(ghb) \\ -\frac{\partial}{\partial y}(ghb) \end{pmatrix}$$
Quantities and unknowns:
[Figure (from the lab course "Tsunami-Simulation", A. Breuer, S. Rettenberger, M. Bader, TU München): the quantities of the shallow water equations — water height h(x,t), momentum hu(x,t) (u the particle velocity), gravity constant g ≈ 9.81 m/s², and bathymetry b; units are meters (m) and seconds (s).]
Model: The Shallow Water Equations
Write as generalized hyperbolic PDE:
• 2D setting, three quantities: $q = (q_1, q_2, q_3)^T = (h, hu, hv)^T$

$$\frac{\partial}{\partial t}q + \frac{\partial}{\partial x}F(q) + \frac{\partial}{\partial y}G(q) = S(q, x, y, t)$$

• with flux functions:

$$F(q) = \begin{pmatrix} q_2 \\ q_2^2/q_1 + \frac{1}{2}g q_1^2 \\ q_2 q_3/q_1 \end{pmatrix} \qquad G(q) = \begin{pmatrix} q_3 \\ q_2 q_3/q_1 \\ q_3^2/q_1 + \frac{1}{2}g q_1^2 \end{pmatrix}$$
Model: The Shallow Water Equations
Derived from conservation laws:
• h equation: conservation of mass
• equations for hu and hv: conservation of momentum
• $\frac{1}{2}gh^2$: averaged hydrostatic pressure due to water column h; similar: bathymetry terms $-\frac{\partial}{\partial x}(ghb)$ and $-\frac{\partial}{\partial y}(ghb)$
• may also be derived by vertical averaging from the 3D incompressible Navier-Stokes equations
Model: The Shallow Water Equations
The ocean as "shallow water"??
• compare horizontal (∼ 1000 km) to vertical (∼ 4 km) length scale
• wave lengths large compared to water depth
• vertical flow may be neglected; movement of the "entire water column"
Finite Volume Discretisation
• discretise system of PDEs

$$\frac{\partial}{\partial t}q + \frac{\partial}{\partial x}F(q) + \frac{\partial}{\partial y}G(q) = S(t, x, y)$$

• results from integral equation:

$$\frac{\partial}{\partial t}\int_{t_n}^{t_{n+1}}\!\!\int_{\Omega} q \, d\omega \, dt + \int_{t_n}^{t_{n+1}}\!\!\int_{\partial\Omega} \vec F(q)\cdot\vec n \, ds \, dt = \ldots$$

• use averaged quantities $Q^{(n)}_{i,j}$ in finite volume elements $\Omega_{ij}$:

$$Q_{ij}(t) := \frac{1}{|\Omega_{ij}|}\int_{\Omega_{ij}} q \, d\omega \qquad \frac{\partial}{\partial t}\int_{t_n}^{t_{n+1}}\!\!\int_{\Omega} q \, d\omega \, dt = |\Omega_{ij}|\left(Q^{(n+1)}_{i,j} - Q^{(n)}_{i,j}\right)$$

• What about the flux integral?
Finite Volume Discretisation (2)
• flux integral on Cartesian grids:

$$\int_{t_n}^{t_{n+1}}\!\!\int_{\partial\Omega} \vec F(q)\cdot\vec n \, ds \, dt
= \int_{t_n}^{t_{n+1}}\!\!\int_{y_{j-1/2}}^{y_{j+1/2}} F(q(x_{i+1/2}, y, t)) - F(q(x_{i-1/2}, y, t)) \, dy \, dt$$
$$\qquad + \int_{t_n}^{t_{n+1}}\!\!\int_{x_{i-1/2}}^{x_{i+1/2}} G(q(x, y_{j+1/2}, t)) - G(q(x, y_{j-1/2}, t)) \, dx \, dt$$

• leads to explicit time stepping scheme:

$$Q^{(n+1)}_{i,j} - Q^{(n)}_{i,j} = -\frac{\Delta t}{\Delta x}\left(F(q(x_{i+1/2}, y, t_n)) - F(q(x_{i-1/2}, y, t_n))\right) - \frac{\Delta t}{\Delta y}\left(G(q(x, y_{j+1/2}, t_n)) - G(q(x, y_{j-1/2}, t_n))\right)$$

• how to compute $F^{(n)}_{i+1/2,j} := F(q(x_{i+1/2}, y, t_n))$?
Central and Upwind Fluxes
• define fluxes $F^{(n)}_{i+1/2,j}$, $G^{(n)}_{i,j+1/2}$, . . . via 1D numerical flux function $\mathcal F$:

$$F^{(n)}_{i+1/2} = \mathcal F\big(Q^{(n)}_i, Q^{(n)}_{i+1}\big) \qquad G^{(n)}_{j-1/2} = \mathcal F\big(Q^{(n)}_{j-1}, Q^{(n)}_j\big)$$

• central flux:

$$F^{(n)}_{i+1/2} = \mathcal F\big(Q^{(n)}_i, Q^{(n)}_{i+1}\big) := \frac{1}{2}\left(F\big(Q^{(n)}_i\big) + F\big(Q^{(n)}_{i+1}\big)\right)$$

leads to unstable methods for convective transport
• upwind flux (here, for h-equation, F(h) = hu):

$$F^{(n)}_{i+1/2} = \mathcal F\big(h^{(n)}_i, h^{(n)}_{i+1}\big) := \begin{cases} hu\big|_i & \text{if } u\big|_{i+1/2} > 0 \\ hu\big|_{i+1} & \text{if } u\big|_{i+1/2} < 0 \end{cases}$$

stable, but includes artificial diffusion
(Local) Lax-Friedrichs Flux
• classical Lax-Friedrichs method uses as numerical flux:

$$F^{(n)}_{i+1/2} = \mathcal F\big(Q^{(n)}_i, Q^{(n)}_{i+1}\big) := \frac{1}{2}\left(F\big(Q^{(n)}_i\big) + F\big(Q^{(n)}_{i+1}\big)\right) - \frac{h}{2\tau}\left(Q^{(n)}_{i+1} - Q^{(n)}_i\right)$$

• can be interpreted as central flux plus diffusion flux:

$$\frac{h}{2\tau}\left(Q^{(n)}_{i+1} - Q^{(n)}_i\right) = \frac{h^2}{2\tau}\cdot\frac{Q^{(n)}_{i+1} - Q^{(n)}_i}{h}$$

with diffusion coefficient $\frac{h^2}{2\tau}$ (h the mesh width, τ the time step), where $c := \frac{h}{\tau}$ is a velocity ("one grid cell per time step" → cmp. CFL condition)
• idea of local Lax-Friedrichs method: use the actual wave speed

$$F^{(n)}_{i+1/2} := \frac{1}{2}\left(F\big(Q^{(n)}_i\big) + F\big(Q^{(n)}_{i+1}\big)\right) - \frac{a_{i+1/2}}{2}\left(Q^{(n)}_{i+1} - Q^{(n)}_i\right)$$
Riemann Problems
• solve Riemann problem to obtain solution $q(x_{i+1/2}, y, t_n)$, etc.:
• 1D treatment: solve shallow water equations with initial conditions (Riemann problem at edge $x_{i-1/2}$)

$$q(x, t_n) = \begin{cases} q_l = Q^{(n)}_{i-1} & \text{if } x < x_{i-1/2} \\ q_r = Q^{(n)}_i & \text{if } x > x_{i-1/2} \end{cases}$$

• solution: two (left or right) outgoing waves (shock or rarefaction)
[Figures: (left) Riemann problem — two constant states $q_l$, $q_r$ separated by a discontinuity at x = 0; (right) structure of the shock-shock Riemann solution, showing depth h, velocity u, vertically integrated pressure, and particle paths in the x–t plane. Source: R. LeVeque, Finite Volume Methods for Hyperbolic Problems, 2002]
[Slide repeated with figures: rarefaction wave; shock wave]
Riemann Problems (2)
• wave propagation approach: split the jump into fluxes

$$F(Q_i) - F(Q_{i-1}) - \Delta x\, \psi_{i-1/2} = \sum_p \alpha_p r_p \equiv \sum_p Z_p, \qquad \alpha_p \in \mathbb{R},$$

with $r_p$ the eigenvectors of the linearised problem and $\psi_{i-1/2}$ a fix for the source term (bathymetry)
[Figure (from the lab course "Tsunami-Simulation"): f-wave solver — splitting the jump $q_r - q_l$ into eigenvectors $r_p$ of the matrix A, i.e. solving $R\alpha = q_r - q_l$]
• implementation will compute net updates:
$$\mathcal A^+\Delta Q_{i-1/2,j} = \sum_{p:\,\lambda_p>0} Z_p \qquad \mathcal A^-\Delta Q_{i-1/2,j} = \sum_{p:\,\lambda_p<0} Z_p$$
The F-Wave Solver
• use Roe eigenvalues $\lambda^{Roe}_{1/2}$ to approximate the wave speeds:

$$\lambda^{Roe}_{1/2}(q_l, q_r) = u^{Roe}(q_l, q_r) \pm \sqrt{g\, h^{Roe}(q_l, q_r)}$$

• with $h^{Roe}(q_l, q_r) = \frac{1}{2}(h_l + h_r)$ and $u^{Roe}(q_l, q_r) = \dfrac{u_l\sqrt{h_l} + u_r\sqrt{h_r}}{\sqrt{h_l} + \sqrt{h_r}}$

• eigenvectors $r^{Roe}_{1/2}$ for wave decomposition defined as

$$r^{Roe}_1 = \begin{pmatrix} 1 \\ \lambda^{Roe}_1 \end{pmatrix} \qquad r^{Roe}_2 = \begin{pmatrix} 1 \\ \lambda^{Roe}_2 \end{pmatrix}$$

• leads to net updates (source terms still missing):

$$\mathcal A^-\Delta Q := \sum_{p:\,\lambda^{Roe}_p<0} \alpha_p r_p \qquad \mathcal A^+\Delta Q := \sum_{p:\,\lambda^{Roe}_p>0} \alpha_p r_p$$

• with $\alpha_{1/2}$ computed from $\begin{pmatrix} 1 & 1 \\ \lambda^{Roe}_1 & \lambda^{Roe}_2 \end{pmatrix}\begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} = F(Q_i) - F(Q_{i-1})$
Finite Volume on Cartesian Grids
[Figure: grid cell with unknowns $h_{ij}$, $(hu)_{ij}$, $(hv)_{ij}$, $b_{ij}$ in the cell center and net updates $\mathcal A^+\Delta Q_{i-1/2,j}$, $\mathcal A^-\Delta Q_{i+1/2,j}$, $\mathcal B^+\Delta Q_{i,j-1/2}$, $\mathcal B^-\Delta Q_{i,j+1/2}$ on the edges]

Unknowns and Numerical Fluxes:
• (averaged) unknowns h, hu, hv, and b located in cell centers
• two sets of "net updates" or "numerical fluxes" per edge; here: $\mathcal A^+\Delta Q_{i-1/2,j}$, $\mathcal B^-\Delta Q_{i,j+1/2}$ ("wave propagation form")
Flux Form vs. Wave Propagation Form
• numerical scheme in flux form:

$$Q^{(n+1)}_{i,j} = Q^{(n)}_{i,j} - \frac{\Delta t}{\Delta x}\left(F^{(n)}_{i+1/2,j} - F^{(n)}_{i-1/2,j}\right) - \frac{\Delta t}{\Delta y}\left(G^{(n)}_{i,j+1/2} - G^{(n)}_{i,j-1/2}\right)$$

where $F^{(n)}_{i+1/2,j}$, $G^{(n)}_{i,j+1/2}$, . . . approximate the flux functions F(q) and G(q) at the grid cell boundaries
• wave propagation form:

$$Q^{n+1}_{i,j} = Q^n_{i,j} - \frac{\Delta t}{\Delta x}\left(\mathcal A^+\Delta Q^n_{i-1/2,j} + \mathcal A^-\Delta Q^n_{i+1/2,j}\right) - \frac{\Delta t}{\Delta y}\left(\mathcal B^+\Delta Q^n_{i,j-1/2} + \mathcal B^-\Delta Q^n_{i,j+1/2}\right)$$

where $\mathcal A^+\Delta Q^n_{i-1/2,j}$, $\mathcal B^-\Delta Q^n_{i,j+1/2}$, etc. are net updates
• difference in implementation: compute one "flux term" or two "net updates" for each edge
Time Stepping: Splitting or Not?
• with dimensional splitting:

$$Q^{*}_{i,j} = Q^n_{i,j} - \frac{\Delta t}{\Delta x}\left(\mathcal A^+\Delta Q^n_{i-1/2,j} + \mathcal A^-\Delta Q^n_{i+1/2,j}\right)$$
$$Q^{n+1}_{i,j} = Q^{*}_{i,j} - \frac{\Delta t}{\Delta y}\left(\mathcal B^+\Delta Q^{*}_{i,j-1/2} + \mathcal B^-\Delta Q^{*}_{i,j+1/2}\right)$$

two sequential "sweeps" of Riemann solves on horizontal vs. vertical edges
• vs. "un-split" method (currently used in SWE):

$$Q^{n+1}_{i,j} = Q^n_{i,j} - \frac{\Delta t}{\Delta x}\left(\mathcal A^+\Delta Q^n_{i-1/2,j} + \mathcal A^-\Delta Q^n_{i+1/2,j}\right) - \frac{\Delta t}{\Delta y}\left(\mathcal B^+\Delta Q^n_{i,j-1/2} + \mathcal B^-\Delta Q^n_{i,j+1/2}\right)$$

allows combining the loops on horizontal and vertical edges
Time Stepping
CFL Condition:
• we only consider neighbour cells for a time step ⇒ information must not travel faster than one cell per time step!
• thus: time steps need to consider characteristic wave speeds
• rule of thumb: wave speed depends on water depth, $\lambda = \sqrt{gh}$
• in SWE: Riemann solvers will compute local wave speeds ⇒ maximum-reduction necessary to find global time step

Adaptive time step control forces sequential main loop:
1. solve Riemann problems, compute wave speeds
2. compute maximum wave speed and infer global ∆t
3. update unknowns
References & Literature
• LeVeque: Finite Volume Methods for Hyperbolic Problems, Cambridge University Press, 2002
• Toro: Riemann Solvers and Numerical Methods for Fluid Dynamics: A Practical Introduction, Springer, 2009
• Bale, LeVeque, Mitran, Rossmanith: A wave propagation method for conservation laws and balance laws with spatially varying flux functions, SIAM Journal on Scientific Computing 24 (3), 2003
• George: Augmented Riemann solvers for the shallow water equations over variable topography with steady states and inundation, Journal of Computational Physics 227 (6), 2008
• Breuer, Bader: Teaching Parallel Programming Models on a Shallow-Water Code, Proc. of the ISPDC 2012
Part II
Parallel Programming Patterns
Reference: Mattson, Sanders, Massingill, Patterns for Parallel Programming. Addison-Wesley, 2005.
Finding Concurrency

Common rule:
Before you start parallelising your code, make sure the serial version is perfectly optimised!

Pro:
• parallelising a badly optimised serial algorithm leads to a badly optimised parallel algorithm
• use an asymptotically optimal algorithm! for large problems (that are worth being parallelised) asymptotics is crucial

Contra:
• exploit all available concurrency in your problem (your optimised serial code might have unnecessary sequential parts)
• the fastest serial algorithm is not necessarily the fastest parallel algorithm
Finding Concurrency – Task Decomposition
Decompose your problem into tasks that can execute concurrently!

Consider "un-split" time stepping:

$$\forall i,j: \quad Q^{n+1}_{i,j} = Q^n_{i,j} - \frac{\Delta t}{\Delta x}\left(\mathcal A^+\Delta Q^n_{i-1/2,j} + \mathcal A^-\Delta Q^n_{i+1/2,j}\right) - \frac{\Delta t}{\Delta y}\left(\mathcal B^+\Delta Q^n_{i,j-1/2} + \mathcal B^-\Delta Q^n_{i,j+1/2}\right)$$

Concurrent tasks:
1. compute net updates (i.e., solve Riemann problems) $\mathcal A^+\Delta Q^n_{i-1/2,j}$, $\mathcal B^+\Delta Q^n_{i,j-1/2}$, . . . for all (vertical and horizontal) edges
2. update quantities $Q^{n+1}_{i,j}$ in all cells

or: for all cells, compute net updates (on local edges) and update quantities $Q^{n+1}_{i,j}$ (requires two arrays for $Q^n_{i,j}$ and $Q^{n+1}_{i,j}$, resp.)
Finding Concurrency – Task Decomposition
Decompose your problem into tasks that can execute concurrently!

Consider dimensional splitting:

$$Q^{*}_{i,j} = Q^n_{i,j} - \frac{\Delta t}{\Delta x}\left(\mathcal A^+\Delta Q^n_{i-1/2,j} + \mathcal A^-\Delta Q^n_{i+1/2,j}\right)$$
$$Q^{n+1}_{i,j} = Q^{*}_{i,j} - \frac{\Delta t}{\Delta y}\left(\mathcal B^+\Delta Q^{*}_{i,j-1/2} + \mathcal B^-\Delta Q^{*}_{i,j+1/2}\right)$$

Concurrent tasks:
1. compute net updates on all vertical edges ($\mathcal A^+\Delta Q^n_{i-1/2,j}$, etc.)
1a. update intermediate quantities $Q^{*}_{i,j}$ in all cells
2. compute net updates on all horizontal edges ($\mathcal B^+\Delta Q^{*}_{i,j-1/2}$, etc.)
2a. update quantities $Q^{n+1}_{i,j}$ in all cells
Finding Concurrency – Data Decomposition
Decompose your data into units that can be operated on relatively independently!

Consider dimensional splitting:

$$Q^{*}_{i,j} = Q^n_{i,j} - \frac{\Delta t}{\Delta x}\left(\mathcal A^+\Delta Q^n_{i-1/2,j} + \mathcal A^-\Delta Q^n_{i+1/2,j}\right)$$
$$Q^{n+1}_{i,j} = Q^{*}_{i,j} - \frac{\Delta t}{\Delta y}\left(\mathcal B^+\Delta Q^{*}_{i,j-1/2} + \mathcal B^-\Delta Q^{*}_{i,j+1/2}\right)$$

Data Decomposition:
1. computation of $Q^{*}_{i,j}$: distribute data row-wise, as computation is independent for different j
2. update of $Q^{n+1}_{i,j}$: distribute data column-wise, as computation is independent for different i
Finding Concurrency – Data Decomposition
Decompose your data into units that can be operated on relatively independently!

Consider "un-split" time stepping:

$$\forall i,j: \quad Q^{n+1}_{i,j} = Q^n_{i,j} - \frac{\Delta t}{\Delta x}\left(\mathcal A^+\Delta Q^n_{i-1/2,j} + \mathcal A^-\Delta Q^n_{i+1/2,j}\right) - \frac{\Delta t}{\Delta y}\left(\mathcal B^+\Delta Q^n_{i,j-1/2} + \mathcal B^-\Delta Q^n_{i,j+1/2}\right)$$

Data Decomposition:
• computing net updates requires left/right and top/down neighbours ⇒ no "perfect" data decomposition possible
• partitioning of data will require extra care at boundaries of the partitions
• and: (seemingly trivial) do not decompose quantities in $Q_{i,j}$
Task and Data Decomposition – “Forces”
Flexibility:
• be flexible enough to adapt to different implementation requirements
• for example: do not concentrate on a single parallel platform or programming model

Efficiency:
• solution needs to scale efficiently with the size of the computer
• task and data decomposition need to provide enough tasks to keep all processing elements busy

Simplicity:
• complex enough to solve the task, but simple enough to keep the program maintainable
Identifying Dependencies Between Tasks
Group Tasks:
Group your tasks to simplify the managing of dependencies.

Order Tasks:
Given a collection of tasks grouped into logically related groups, order these task groups to satisfy constraints.

Data Sharing:
Given a data and task decomposition, how is data shared among the tasks?
Element Updates as Task Groups
Consider "un-split" time stepping:

$$\forall i,j: \quad Q^{n+1}_{i,j} = Q^n_{i,j} - \frac{\Delta t}{\Delta x}\left(\mathcal A^+\Delta Q^n_{i-1/2,j} + \mathcal A^-\Delta Q^n_{i+1/2,j}\right) - \frac{\Delta t}{\Delta y}\left(\mathcal B^+\Delta Q^n_{i,j-1/2} + \mathcal B^-\Delta Q^n_{i,j+1/2}\right)$$

Grouped Tasks:
• solve Riemann problems on the four cell edges
• update quantities $Q_{i,j}$ from the net updates

Data Dependencies:
• tasks access quantities $Q^n_{i\pm1,j\pm1}$ of neighbour cells ⇒ two copies required for $Q^n_{i,j}$ and $Q^{n+1}_{i,j}$
• Riemann problem computed twice for each edge?
Riemann Solves and Updates as Task Groups
Consider dimensional splitting:

$$Q^{*}_{i,j} = Q^n_{i,j} - \frac{\Delta t}{\Delta x}\left(\mathcal A^+\Delta Q^n_{i-1/2,j} + \mathcal A^-\Delta Q^n_{i+1/2,j}\right)$$
$$Q^{n+1}_{i,j} = Q^{*}_{i,j} - \frac{\Delta t}{\Delta y}\left(\mathcal B^+\Delta Q^{*}_{i,j-1/2} + \mathcal B^-\Delta Q^{*}_{i,j+1/2}\right)$$

Separate Task Groups (for each of the two steps):
• solve Riemann problems on all horizontal (vertical) cell edges
• update quantities $Q_{i,j}$ of an entire column (row)

Data Dependencies:
• tasks access neighbours in either row or column direction
• requires extra storage to compute the net updates (results of the Riemann problems)
Computation of the CFL Condition
Consider "un-split" time stepping:

$$\forall i,j: \quad Q^{n+1}_{i,j} = Q^n_{i,j} - \frac{\Delta t}{\Delta x}\left(\mathcal A^+\Delta Q^n_{i-1/2,j} + \mathcal A^-\Delta Q^n_{i+1/2,j}\right) - \frac{\Delta t}{\Delta y}\left(\mathcal B^+\Delta Q^n_{i,j-1/2} + \mathcal B^-\Delta Q^n_{i,j+1/2}\right)$$

where ∆t results from wave propagation speeds

Sequential Order of Tasks:
1. solve Riemann problems on the four cell edges (compute wave propagation speeds as partial results)
2. determine maximum wave speed for CFL condition ∆t
3. update quantities $Q_{i,j}$ from the net updates
The Geometric Decomposition Pattern
How can your algorithm be organized around a data structure that has been decomposed into concurrently updatable "chunks"?

Partitioning (how to select your "chunks"):
• w.r.t. size, shape, etc. ("granularity" of parallelism)
• multiple levels of partitioning necessary?

Organization of parallel updates:
• need to access water height, momentum components and bathymetry from neighbour cells (possibly in another partition)
• need to access net updates from neighbour partition? (alternative: compute on all involved partitions?)
1D Domain Decomposition – Slice-Oriented
Discussion:
• degenerates for large number of partitions: thin slices, lots of data exchange required at (long!) boundaries
• for dimensional splitting: slices match dependencies (vertical or horizontal), but alternating slices required for the two update steps
2D Domain Decomposition – Block-Oriented
Discussion:
+ reduced length of domain boundaries (communication volume)
− need to fit an arbitrary number of partitions to the layout of boxes
3D Domain Decomposition – Cuboid-Oriented
“Patches” Concept for Domain Decomposition
Discussion:
+ more fine-grain load distribution
+ "empty patches" improve representation of complicated domains
− overhead for additional, interior boundaries
− requires scheme to assign patches to processes
Part III
SWE Software Design
[Class diagram: SWE_Block with derived classes SWE_BlockCUDA, SWE_RusanovBlock, SWE_WavePropagationBlock; further derived: SWE_RusanovBlockCUDA, SWE_WavePropagationBlockCuda]
Basic Structure: Cartesian Grid Block
[Figure: Cartesian grid block with cell indices i = 0 . . . nx+1, j = 0 . . . ny+1; the rows/columns i = 0, i = nx+1, j = 0, j = ny+1 form the ghost layers]
Spatial Discretization:
• regular Cartesian meshes; later: allow multiple patches
• ghost layers to implement boundary conditions; connect multiple patches (complicated domains, parallelization)
Data Structure:
• arrays h, hu, hv, and b to hold water height, momentum components and bathymetry data
• "column major" layout: j is the "faster running" index in h[i][j]
Main Loop – Euler Time-stepping
  while( t < ... ) {
    // set boundary conditions
    splash.setGhostLayer();
    // compute fluxes on each edge
    splash.computeNumericalFluxes();
    // set largest allowed time step:
    dt = splash.getMaxTimestep();
    t += dt;
    // update unknowns in each cell
    splash.updateUnknowns(dt);
  };

→ defines interface for abstract class SWE_Block
Set Ghost Layers – Boundary Conditions

Split into two methods:
• setGhostLayer(): interface function in SWE_Block, needs to be called by main loop
• setBoundaryConditions(): called by setGhostLayer(); sets "real" boundary conditions (WALL, OUTFLOW, etc.)

  switch(boundary[BND_LEFT]) {
    case WALL:
    {
      for( int j=1; j<=ny; j++) {
        h[0][j] = h[1][j]; b[0][j] = b[1][j];
        hu[0][j] = -hu[1][j]; hv[0][j] = hv[1][j];
      };
      break;
    }
    case OUTFLOW: /* ... */
  }

(cmp. file SWE_Block.cpp)
Compute Numerical Fluxes

main loop to compute net updates on left/right edges:

  for( int i=1; i < nx+2; i++) {
    for( int j=1; j < ny+1; j++) {
      float maxEdgeSpeed;
      wavePropagationSolver.computeNetUpdates(
          h[i-1][j], h[i][j],
          hu[i-1][j], hu[i][j],
          b[i-1][j], b[i][j],
          hNetUpdatesLeft[i-1][j-1], hNetUpdatesRight[i-1][j-1],
          huNetUpdatesLeft[i-1][j-1], huNetUpdatesRight[i-1][j-1],
          maxEdgeSpeed
      );
      maxWaveSpeed = std::max(maxWaveSpeed, maxEdgeSpeed);
    }
  }

(cmp. file SWE_WavePropagationBlock.cpp)
Compute Numerical Fluxes (2)

main loop to compute net updates on top/bottom edges:

  for( int i=1; i < nx+1; i++) {
    for( int j=1; j < ny+2; j++) {
      float maxEdgeSpeed;
      wavePropagationSolver.computeNetUpdates(
          h[i][j-1], h[i][j],
          hv[i][j-1], hv[i][j],
          b[i][j-1], b[i][j],
          hNetUpdatesBelow[i-1][j-1], hNetUpdatesAbove[i-1][j-1],
          hvNetUpdatesBelow[i-1][j-1], hvNetUpdatesAbove[i-1][j-1],
          maxEdgeSpeed
      );
      maxWaveSpeed = std::max(maxWaveSpeed, maxEdgeSpeed);
    }
  }

(cmp. file SWE_WavePropagationBlock.cpp)
Determine Maximum Time Step
• variable maxWaveSpeed holds the maximum wave speed
• updated during computation of numerical fluxes, in method computeNumericalFluxes():

maxTimestep = std::min( dx/maxWaveSpeed, dy/maxWaveSpeed );

• simple “getter” method defined in class SWE_Block:

float getMaxTimestep() { return maxTimestep; };

• hence: getMaxTimestep() for the current time step should be called after computeNumericalFluxes()
• in general: in many situations, the maximum computation (a global reduction) inhibits certain optimizations → a fixed time step is probably faster!
Update Unknowns – Euler Time Stepping
for( int i=1; i < nx+1; i++) {
  for( int j=1; j < ny+1; j++) {
    h [ i ][ j ] -= dt/dx * ( hNetUpdatesRight [ i-1][ j-1] + hNetUpdatesLeft [ i ][ j-1] )
                  + dt/dy * ( hNetUpdatesAbove [ i-1][ j-1] + hNetUpdatesBelow[ i-1][ j ] );
    hu[ i ][ j ] -= dt/dx * ( huNetUpdatesRight[ i-1][ j-1] + huNetUpdatesLeft[ i ][ j-1] );
    hv[ i ][ j ] -= dt/dy * ( hvNetUpdatesAbove[ i-1][ j-1] + hvNetUpdatesBelow[ i-1][ j ] );
    /* ... */
  }
}

(cmp. file SWE_WavePropagationBlock.cpp)
Goals for (Parallel) Implementation
Spatial Discretization:
• allow different parallel programming models
• and also to switch between different numerical models
⇒ class hierarchy of numerical vs. programming models
Hybrid Parallelization:
• support two levels of parallelization(such as shared/distributed memory, CPU/GPU, etc.)
• coarse-grain parallelism across Cartesian grid patches
• fine-grain parallelism on patch-local operations
⇒ separate fine-grain and coarse-grain parallelism
(plug&play principle)
SWE Class Design
SWE_Block
SWE_BlockCUDA SWE_RusanovBlock SWE_WavePropagationBlock
SWE_RusanovBlockCUDA SWE_WavePropagationBlockCuda
abstract class SWE_Block:

• base class to hold data structures (arrays h, hu, hv, b)
• manipulates ghost layers
• methods for initialization, writing output, etc.
• defines interface for main time-stepping loop:
  computeNumericalFluxes(), updateUnknowns(), . . .
SWE Class Design
SWE_Block
SWE_BlockCUDA SWE_RusanovBlock SWE_WavePropagationBlock
SWE_RusanovBlockCUDA SWE_WavePropagationBlockCuda
derived classes:
• for different model variants: SWE_RusanovBlock, SWE_WavePropagationBlock, . . .
• for different programming models: SWE_BlockCUDA, SWE_BlockArBB, . . .
• override computeNumericalFluxes(), updateUnknowns(), . . .
  → methods relevant for parallelization
Example: SWE_WavePropagationBlockCuda
class SWE_WavePropagationBlockCuda: public SWE_BlockCUDA {
  /*-- definition of member variables skipped --*/
  public:
    // compute a single time step (net-updates + update of the cells)
    void simulateTimestep( float i_dT );
    // simulate multiple time steps (start and end time as parameters)
    float simulate(float, float);
    // compute the numerical fluxes (net-update formulation here)
    void computeNumericalFluxes();
    // compute the new cell values
    void updateUnknowns(const float i_deltaT);
};

(in file SWE_WavePropagationBlockCuda.hh)
Part IV
SWE Parallelisation
Patches of Cartesian Grid Blocks
[Figure: two Cartesian grid patches; cells indexed i=0,...,nx+1 and j=0,...,ny+1, with the outermost rows/columns (i=0, i=nx+1, j=0, j=ny+1) serving as ghost layers]
Spatial Discretization:
• regular Cartesian meshes; allow multiple patches
• ghost and copy layers to implement boundary conditions, for more complicated domains, and for parallelization
Loop-Based Parallelism within Patches
Computing the Net Updates
• compute net updates on left/right edges:
for( int i=1; i < nx+2; i++) {   // in parallel
  for( int j=1; j < ny+1; j++) { // in parallel
    float maxEdgeSpeed;
    fWaveComputeNetUpdates( 9.81,
        h[ i-1][ j ], h[ i ][ j ], hu[ i-1][ j ], hu[ i ][ j ], /* ... */ );
  }
}
• compute net updates on top/bottom edges:
for( int i=1; i < nx+1; i++) {   // in parallel
  for( int j=1; j < ny+2; j++) { // in parallel
    fWaveComputeNetUpdates( 9.81,
        h[ i ][ j-1], h[ i ][ j ], hv[ i ][ j-1], hv[ i ][ j ], /* ... */ );
  }
}

(function fWaveComputeNetUpdates() defined in file solver/FWaveCuda.h)
Computing the Net Updates
Options for Parallelism
Parallelization of computations:
• compute all vertical edges in parallel
• compute all horizontal edges in parallel
• compute vertical & horizontal edges in parallel (task parallelism)
Parallel access to memory:
• concurrent read of variables h, hu, hv
• exclusive write access to net-update variables on edges
Loop-Based Parallelism within Patches (2)
Updating the Unknowns
• update unknowns from net updates on edges:
for( int i=1; i < nx+1; i++) {   // in parallel
  for( int j=1; j < ny+1; j++) { // in parallel
    h [ i ][ j ] -= dt/dx * ( hNetUpdatesRight [ i-1][ j-1] + hNetUpdatesLeft [ i ][ j-1] )
                  + dt/dy * ( hNetUpdatesAbove [ i-1][ j-1] + hNetUpdatesBelow[ i-1][ j ] );
    hu[ i ][ j ] -= dt/dx * ( huNetUpdatesRight[ i-1][ j-1] + huNetUpdatesLeft[ i ][ j-1] );
    /* ... */
  }
}
Updating the Unknowns
Options for Parallelism
Parallelization of computations:
• compute all cells in parallel
Parallel access to memory:
• concurrent read of net updates on edges
• exclusive write access to variables h, hu, hv
“Vectorization property”:
• exactly the same code for all cells!
Exchange of Values in Ghost/Copy Layers
Straightforward Approach:
• boundary conditions OUTFLOW, WALL vs. CONNECT or PARALLEL
• disadvantage: method setGhostLayer() needs to be implemented for each derived class
Exchange of Values in Ghost/Copy Layers (2)
Implemented via Proxy Objects:
• grabGhostLayer() to write into ghost layer
• registerCopyLayer() to read from copy layer
• both methods return a proxy object (class SWE_Block1D) that references one row/column of the grid
Direct-Neighbour vs. “Diagonal” Communication

2-step scheme to exchange data of “diagonal” ghost cells:
• several “hops” replace diagonal communication
• slight increase in communication volume (bandwidth), but reduces the number of messages (latency)
• similar in 3D (26 neighbours → 6 neighbours!)
MPI Parallelization – Exchange of Ghost/Copy Layers
SWE_Block1D* leftInflow   = splash.grabGhostLayer(BND_LEFT);
SWE_Block1D* leftOutflow  = splash.registerCopyLayer(BND_LEFT);
SWE_Block1D* rightInflow  = splash.grabGhostLayer(BND_RIGHT);
SWE_Block1D* rightOutflow = splash.registerCopyLayer(BND_RIGHT);

MPI_Sendrecv(leftOutflow->h.elemVector(),  1, MPI_COL, leftRank,  1,
             rightInflow->h.elemVector(),  1, MPI_COL, rightRank, 1,
             MPI_COMM_WORLD, &status);
MPI_Sendrecv(rightOutflow->h.elemVector(), 1, MPI_COL, rightRank, 4,
             leftInflow->h.elemVector(),   1, MPI_COL, leftRank,  4,
             MPI_COMM_WORLD, &status);

(cmp. file examples/swe_mpi.cpp)
MPI – Some Speedups
[Speedup plot: speedup relative to 16 cores (y-axis: 1–8) over number of cores (x-axis: 16, 32, 48, 64, 96, 128); measured vs. linear speedup]
• 1 MPI process per core
• (expensive) augmented Riemann solvers
Speedups for MPI/OpenMP
[Speedup plot: speedup relative to 16 cores (y-axis: 1–8) over number of cores (x-axis: 16, 32, 48, 64, 96, 128); measured vs. linear speedup]
• 1 MPI process per node, 8 OpenMP threads (1 per core)
• straightforward OpenMP parallelization of for-loops
Speedups for MPI/OpenMP
[Speedup plot: speedup relative to 16 cores (y-axis: 1–8) over number of cores (x-axis: 16, 32, 48, 64, 96, 128); measured vs. linear speedup]
• 1 MPI process per node, 8 OpenMP threads (1 per core)
• hybrid f-Wave/augmented Riemann solver → poor load balancing
Teaching Parallel Programming with SWE
SWE in Lectures, Tutorials, Lab Courses:
• non-trivial example, but model & implementation easy to grasp
• allows different parallel programming models (MPI, OpenMP, CUDA, Intel TBB/ArBB, OpenCL, . . . )
• prepared for hybrid parallelisation
Some Extensions:
• ASAGI - parallel server for geoinformation(S. Rettenberger, Master’s thesis)
• OpenGL real-time visualisation of results(T. Schnabel, student project; extended by S. Rettenberger)
→ http://www5.in.tum.de/SWE/→ https://github.com/TUM-I5
Part V
Workshop – SWE Parallelisation
MPI Communication Between Patches
Extend sequential SWE program swe serial.cpp:
• goal: one patch (SWE_Block) per MPI process
• establish assignment of patches to MPI ranks (“who is my neighbour?”)
• implement exchange between ghost & copy cells (preferably via proxy objects)
• parallelize adaptive time step control
• produce speed-up graphs (strong and weak scaling)
Possible extensions: (for the ambitious . . . )
• compare blocking vs. non-blocking communication
• try overlapping communication and computation
• allow multiple patches per MPI process
Loop Parallelism in SWE Using OpenMP
What should be done before starting with OpenMP?
• determine the most time-consuming parts of your code (→ week 2)
• use the “guided auto-parallelism” option of the Intel compiler (→ welcome to try, but does not give many hints for SWE)
Extend MPI-parallel SWE program towards MPI+OpenMP:
• what are the most time-consuming loops in SWE?
• ToDo: loop parallelism for these loops using #pragma ...
• test performance of OpenMP vs. MPI implementation
Possible extensions: (for the ambitious . . . )
• parallelize adaptive time step control (reduction)
• multiple-patch version: try OpenMP on patches