Canonical Problems and Scalable Solvers for Numerical PDE
Robert C. Kirby
Department of Computer Science
The University of Chicago
CMSC34000 Lecture 2, 6 Jan 2005
Outline
Canonical problems in scientific computing
  Algebraic equations
  Differential equations
  Optimization
Overview of SciDAC/TOPS
  Hard problems
  Cutting-edge research
  Funding opportunities

Canonical Problems
Problem                    Canonical form
Linear systems             $Ax = b$
Eigenvalue                 $Ax = \lambda Bx$
Ordinary DE                $f(\dot{x}, x, t, p) = 0$
Nonlinear systems          $F(x) = 0$
Constrained optimization   $\min_u \phi(x,u)$ s.t. $F(x,u) = 0$

Linear Systems
Ax = b
  PDE --> linear system
  Nonlinear system --> sequence of linear steps
Millions of unknowns
Sparse
Ill-conditioned: $\kappa(A) = \|A\| \, \|A^{-1}\|$
Nonsymmetric

Solution techniques
Direct (Gaussian elimination and variants)
Iterative (sequence of approximations)
  Jacobi / Gauss-Seidel
  SOR
  Krylov-subspace methods (a sequence of clever matrix-vector products builds up a good approximation)
  Multigrid (algebraic / geometric)
Parallelism
  Each process owns its own rows
  Matrix action --> communication

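As an illustration of the "sequence of approximations" idea, here is a minimal sketch of Jacobi iteration in plain Python. The function name `jacobi`, the tolerance, and the 3x3 test system are assumptions chosen for illustration; they are not from the lecture.

```python
def jacobi(A, b, tol=1e-10, max_iter=10_000):
    """Jacobi iteration for Ax = b, with A given as a list of rows."""
    n = len(b)
    x = [0.0] * n
    for it in range(1, max_iter + 1):
        # Each new entry uses only values from the previous sweep,
        # so the n updates are independent of one another.
        x_new = [
            (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)
        ]
        change = max(abs(x_new[i] - x[i]) for i in range(n))
        x = x_new
        if change < tol:
            return x, it
    return x, max_iter

# Small diagonally dominant test system (illustrative data); exact solution is (1, 2, 3).
A = [[4.0, -1.0, 0.0],
     [-1.0, 4.0, -1.0],
     [0.0, -1.0, 4.0]]
b = [2.0, 4.0, 10.0]
x, iterations = jacobi(A, b)
```

Because every entry of the new iterate depends only on the previous iterate, each process can update the rows it owns once it has received the neighboring values, which is the row-ownership picture above.
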
Ordinary differential equations
Time rate of change of a system of quantities
  Chemical/nuclear reactions
  N-body problems (celestial mechanics)
  Time-dependent PDE (after spatial discretization)
Algebraic equations at each time step
“Stiffness”
  Wide range of time scales requires special treatment
  Fastest modes require smallest time steps

ODE Techniques
Euler methods (forward, backward)
Runge-Kutta methods
Multistep methods
Multi-adaptive Galerkin (Anders Logg)
Parallelism?
  Depends on coupling of components
  Naturally arises from PDE
  Parallelism in time is more difficult…

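To make "stiffness" concrete, the sketch below applies forward and backward Euler to the scalar test problem y' = -lam * y. The problem, the value lam = 1000, and the step size h = 0.01 are assumptions chosen for illustration. With these values the explicit update multiplies y by (1 - h*lam) = -9 each step, while the implicit update divides it by (1 + h*lam) = 11.

```python
def forward_euler(lam, h, steps, y0=1.0):
    y = y0
    for _ in range(steps):
        y = y + h * (-lam * y)       # explicit: slope evaluated at the old state
    return y

def backward_euler(lam, h, steps, y0=1.0):
    y = y0
    for _ in range(steps):
        # implicit: solve y_new = y + h * (-lam * y_new); linear, so closed form
        y = y / (1.0 + h * lam)
    return y

lam, h, steps = 1000.0, 0.01, 20
y_fe = forward_euler(lam, h, steps)   # grows without bound at this step size
y_be = backward_euler(lam, h, steps)  # decays, like the exact solution
```

The implicit step is stable at a step size ten times the time scale of the fast mode, at the price of solving an algebraic equation at each step.
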
PDE
Multiple dependent variables
Algebraic equations
If time dependent, get ODE or DAE at each time step
  $u \cdot \nabla u - \nu \Delta u + \nabla p = 0$
  $\nabla \cdot u = 0$

Solution strategies
Finite differences (FDM)
  Replace derivatives at points with difference quotient
  Naturally leads to equations for each grid point
  Irregular, adaptive grids require special treatment
Finite elements (FEM)
  Replace “strong form” with “weak form”
  Weak form over subspaces leads to algebraic system
  Generalizes FDM (geometry, order)
Finite volumes
  Discrete conservation laws over each control volume
  Often “low order” but “stable”

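A minimal finite-difference sketch, assuming the model problem -u'' = f on (0, 1) with u(0) = u(1) = 0 (a problem chosen here for illustration, not one from the slides). The second derivative is replaced by a three-point difference quotient at each grid point, and the resulting tridiagonal system is solved by the Thomas algorithm. The function name `solve_poisson_1d` is hypothetical.

```python
import math

def solve_poisson_1d(f, n):
    """Solve -u'' = f on (0, 1), u(0) = u(1) = 0, with n interior grid points."""
    h = 1.0 / (n + 1)
    x = [(i + 1) * h for i in range(n)]
    # Difference quotient at x_i: (-u[i-1] + 2 u[i] - u[i+1]) / h^2 = f(x_i)
    diag = [2.0 / h**2] * n
    off = -1.0 / h**2               # constant sub- and super-diagonal entry
    rhs = [f(xi) for xi in x]
    # Thomas algorithm: Gaussian elimination specialized to tridiagonal systems.
    for i in range(1, n):
        m = off / diag[i - 1]
        diag[i] -= m * off
        rhs[i] -= m * rhs[i - 1]
    u = [0.0] * n
    u[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):
        u[i] = (rhs[i] - off * u[i + 1]) / diag[i]
    return x, u

# Manufactured solution u(x) = sin(pi x), so f(x) = pi^2 sin(pi x).
x, u = solve_poisson_1d(lambda t: math.pi**2 * math.sin(math.pi * t), 99)
max_err = max(abs(ui - math.sin(math.pi * xi)) for xi, ui in zip(x, u))
```

The error against the manufactured solution shrinks like h^2, consistent with the second-order accuracy of the centered difference quotient.
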
Nonlinear equations
Arise from nonlinear PDE, ODE, etc.
Require Newton’s method
  Jacobian matrix (differentiation)
  Nonlinear preconditioning
$F(u) \approx F(u_c) + F'(u_c)\,\delta u = 0$
$u = u_c + \lambda\,\delta u$

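The two formulas above can be sketched for a scalar equation as follows. The backtracking rule for the step length lambda (halve until the residual decreases) is one simple choice assumed here for illustration, and the test function arctan(u) is a standard example on which undamped Newton diverges from the starting point u = 3.

```python
import math

def damped_newton(F, dF, u, tol=1e-12, max_iter=50):
    """Newton's method with a backtracking choice of the step length lambda."""
    for _ in range(max_iter):
        r = F(u)
        if abs(r) < tol:
            break
        du = -r / dF(u)              # solve F'(u_c) du = -F(u_c)
        lam = 1.0
        # Halve lambda until the residual magnitude decreases.
        while abs(F(u + lam * du)) >= abs(r) and lam > 1e-8:
            lam *= 0.5
        u = u + lam * du             # u = u_c + lambda * du
    return u

root = damped_newton(math.atan, lambda u: 1.0 / (1.0 + u * u), 3.0)
```

For systems, the scalar division becomes a linear solve with the Jacobian matrix, which is where the linear solvers of the earlier slides enter.
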
Optimization
Find the minimum of some function subject to some constraints
Constraints may themselves be PDE

Motivation
Solver performance is a major concern for parallel simulations based on PDE formulations
  … including many of those of the U.S. DOE Scientific Discovery through Advanced Computing (SciDAC) program
For target applications, implicit solvers may require 50% to 95% of execution time
  … at least, before “expert” overhaul for algorithmic optimality and implementation performance
Even after a “best manual practice” overhaul, the solver may still require 20% to 50% of execution time
The solver may hit up against both the processor scalability limit and the memory bandwidth limitation of a PDE-based application, before any other part of the code
  … the first of these is not fundamental, though the second one is

Presentation plan
Overview of the SciDAC initiative
Brief review of scalable implicit methods (domain decomposed multilevel iterative methods)
  algorithms
  software components: PETSc, Hypre, etc.
Overview of the Terascale Optimal PDE Simulations project (TOPS)
Three “war stories” from the SciDAC magnetically confined fusion energy portfolio
Some advanced research directions
  physics-based preconditioning
  nonlinear Schwarz
On the horizon

SciDAC apps and infrastructure
4 projects in high energy and nuclear physics
5 projects in fusion energy science
14 projects in biological and environmental research
10 projects in basic energy sciences
18 projects in scientific software and network infrastructure

“Enabling technologies” groups to develop reusable software and partner with application groups
From 2001 start-up, 51 projects share $57M/year
  Approximately one-third for applications
  A third for “integrated software infrastructure centers”
  A third for grid infrastructure and collaboratories
Plus, multi-Tflop/s IBM SP machines at NERSC and ORNL available for SciDAC researchers

Unclassified resources for DOE science
“Seaborg” (Berkeley): IBM Power3+ SMP, 16 procs per node, 6656 procs total, 10 Tflop/s
“Cheetah” (Oak Ridge): IBM Power4 Regatta, 32 procs per node, 864 procs total, 4.5 Tflop/s

Designing a simulation code (from 2001 SciDAC report)
[Figure: simulation code design workflow, with a V&V loop and a performance loop]

A “perfect storm” for simulation
[Figure: applications supported by hardware infrastructure and architectures; the converging strands are scientific models, numerical algorithms, computer architecture, and scientific software engineering; dates shown are 1686, 1947, 1976]
“Computational science is undergoing a phase transition.” – D. Hitchcock, DOE
(dates are symbolic)

Imperative: multiple-scale applications
Multiple spatial scales
  interfaces, fronts, layers
  thin relative to domain size, << L
Multiple temporal scales
  fast waves
  small transit times relative to convection or diffusion, << T
Analyst must isolate dynamics of interest and model the rest in a system that can be discretized over more modest range of scales
May lead to infinitely “stiff” subsystem requiring special treatment by the solution method
[Figure: Richtmyer-Meshkov instability, c/o A. Mirin, LLNL]

Examples: multiple-scale applications
Biopolymers, nanotechnology
  10^12 range in time, from 10^-15 sec (quantum fluctuation) to 10^-3 sec (molecular folding time)
  typical computational model ignores smallest scales, works on classical dynamics only, but scientists increasingly want both
Galaxy formation
  10^20 range in space, from binary star interactions to diameter of universe
  heroic computational model handles all scales with localized adaptive meshing
Supernova simulation
  massive ranges in time and space scales for radiation, turbulent convection, diffusion, chemical reaction, nuclear reaction
[Figure: supernova simulation, c/o A. Mezzacappa, ORNL]

SciDAC portfolio characteristics
Multiple temporal scales
Multiple spatial scales
Linear ill conditioning
Complex geometry and severe anisotropy
Coupled physics, with essential nonlinearities
Ambition for uncertainty quantification, parameter estimation, and design
Need toolkit of portable, extensible, tunable implicit solvers, not “one-size fits all”

TOPS starting point codes
PETSc (ANL)
Hypre (LLNL)
Sundials (LLNL)
SuperLU (LBNL)
PARPACK (LBNL*)
TAO (ANL)
Veltisto (CMU)
Many interoperability connections between these packages that predated SciDAC
Many application collaborators that predated SciDAC

TOPS participants
TOPS lab (3): ANL, LBNL, LLNL
TOPS university (7): CMU, CU, CU-B, NYU, ODU, UC-B, UT-K

In the old days, see “Templates” guides …
  www.netlib.org/etemplates
  www.netlib.org/templates
  (124 pp. and 410 pp.)
… these are good starts, but not adequate for SciDAC scales!

“Integrated software infrastructure centers”
34 applications groups
7 ISIC groups (4 CS, 3 Math)
  Math: adaptive gridding, discretization, solvers (TOPS)
  CS: systems software, component architecture, performance engineering, data management
10 grid, data collaboratory groups
Solvers (TOPS) address
  $f(\dot{x}, x, t, p) = 0$
  $F(x, p) = 0$
  $Ax = b$
  $Ax = \lambda Bx$
  $\min_u \phi(x,u)$ s.t. $F(x,u) = 0$
  software integration
  performance optimization

Keyword: “Optimal”
Convergence rate nearly independent of discretization parameters
  multilevel schemes for rapid linear convergence of linear problems
  Newton-like schemes for quadratic convergence of nonlinear problems
Convergence rate as independent as possible of physical parameters
  continuation schemes
  physics-based preconditioning
Optimal convergence plus scalable loop body yields scalable solver
[Figure: time to solution (axis 0 to 200) versus problem size, increasing with number of processors (axis 1 to 1000), contrasting an “unscalable” curve with a “scalable” one]

But where to go past O(N)?
Since O(N) is already optimal, there is nowhere further “upward” to go in efficiency, but one must extend optimality “outward,” to more general problems
Hence, for instance, algebraic multigrid (AMG) to seek to obtain O(N) in indefinite, anisotropic, or inhomogeneous problems on irregular grids
AMG Framework
  Error in $\mathbb{R}^n$ splits into error easily damped by pointwise relaxation and algebraically smooth error
  Choose coarse grids, transfer operators, and smoothers to eliminate these “bad” components within a smaller dimensional space, and recur

Toolchain for PDE solvers in TOPS project
Design and implementation of “solvers”
  Time integrators (w/ sens. anal.): $f(\dot{x}, x, t, p) = 0$
  Nonlinear solvers (w/ sens. anal.): $F(x, p) = 0$
  Constrained optimizers: $\min_u \phi(x,u)$ s.t. $F(x,u) = 0$, $u \ge 0$
  Linear solvers: $Ax = b$
  Eigensolvers: $Ax = \lambda Bx$
Software integration
Performance optimization
[Figure: dependence graph among Optimizer, Sens. Analyzer, Time integrator, Nonlinear solver, Eigensolver, and Linear solver; an arrow indicates dependence]

Dominant data structures are grid-based
  finite differences
  finite elements
  finite volumes
All lead to problems with sparse Jacobian matrices; many tasks can leverage an efficient set of tools for manipulating distributed sparse data structures
[Figure: node i of the grid corresponds to row i of the Jacobian J]

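As a small illustration of a sparse data structure in which row i holds the couplings of grid node i, here is a sketch of a matrix-vector product in compressed sparse row (CSR) form. The function name `csr_matvec` and the 4-node three-point-stencil matrix are assumptions made for the example.

```python
def csr_matvec(indptr, indices, data, x):
    """y = A x for A stored in compressed sparse row (CSR) form."""
    y = []
    for i in range(len(indptr) - 1):
        # Row i owns the nonzeros data[indptr[i]:indptr[i+1]].
        y.append(sum(data[k] * x[indices[k]]
                     for k in range(indptr[i], indptr[i + 1])))
    return y

# Three-point stencil (-1, 2, -1) on 4 grid nodes: row i couples node i to its neighbors.
indptr = [0, 2, 5, 8, 10]
indices = [0, 1, 0, 1, 2, 1, 2, 3, 2, 3]
data = [2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0]
y = csr_matvec(indptr, indices, data, [1.0, 2.0, 3.0, 4.0])
```

Only the nonzeros are stored and touched, so the cost of the product is proportional to the number of grid couplings rather than to the square of the number of unknowns.
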
Newton-Krylov-Schwarz: a PDE applications “workhorse”
Newton: nonlinear solver, asymptotically quadratic
  $F(u) \approx F(u_c) + F'(u_c)\,\delta u = 0$
  $u = u_c + \lambda\,\delta u$
Krylov: accelerator, spectrally adaptive
  $J\,\delta u = -F$
  $\delta u = \arg\min_{x \in V \equiv \{F, JF, J^2F, \ldots\}} \|Jx + F\|$
Schwarz: preconditioner, parallelizable
  $M^{-1} J\,\delta u = -M^{-1} F$
  $M^{-1} = \sum_i R_i^T (R_i J R_i^T)^{-1} R_i$

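The additive Schwarz formula can be sketched directly: restrict the residual to each subdomain, solve the local problem, and sum the extended local solutions. The sketch below is a serial, dense-matrix illustration under assumed data (a 6x6 three-point-stencil matrix and two overlapping index sets); the helper names `solve_dense` and `additive_schwarz_apply` are hypothetical.

```python
def solve_dense(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            m = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= m * M[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def additive_schwarz_apply(J, subdomains, r):
    """Return z = sum_i R_i^T (R_i J R_i^T)^{-1} R_i r."""
    z = [0.0] * len(r)
    for idx in subdomains:                            # independent local solves
        J_i = [[J[a][c] for c in idx] for a in idx]   # R_i J R_i^T
        r_i = [r[a] for a in idx]                     # R_i r
        z_i = solve_dense(J_i, r_i)                   # local subdomain solve
        for a, val in zip(idx, z_i):                  # R_i^T z_i, summed into z
            z[a] += val
    return z

n = 6
J = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
r = [1.0] * n
z = additive_schwarz_apply(J, [[0, 1, 2, 3], [2, 3, 4, 5]], r)
z_exact = additive_schwarz_apply(J, [list(range(n))], r)  # one subdomain: exact solve
```

The loop over subdomains has no data dependence between iterations, which is what makes the preconditioner parallelizable; with a single subdomain covering all unknowns it reduces to an exact solve.
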
SPMD parallelism w/domain decomposition
Partitioning of the grid induces block structure on the Jacobian
[Figure: grid partitioned into subdomains 1, 2, 3; the rows assigned to proc “2” hold the blocks A21, A22, A23]

Time-implicit Newton-Krylov-Schwarz
For accommodation of unsteady problems, and nonlinear robustness in steady ones, NKS iteration is wrapped in time-stepping:

for (l = 0; l < n_time; l++) {              // pseudo-time loop
  select time step
  for (k = 0; k < n_Newton; k++) {          // NKS loop
    compute nonlinear residual and Jacobian
    for (j = 0; j < n_Krylov; j++) {
      forall (i = 0; i < n_Precon; i++) {
        solve subdomain problems concurrently
      } // End of loop over subdomains
      perform Jacobian-vector product
      enforce Krylov basis conditions
      update optimal coefficients
      check linear convergence
    } // End of linear solver
    perform DAXPY update
    check nonlinear convergence
  } // End of nonlinear loop
} // End of time-step loop

(N)KS kernel in parallel
[Figure: timeline of one Krylov iteration on processes P1, P2, …, Pn, with phases local scatter, Jac-vec multiply, precond sweep, daxpy, and inner product]
Bulk synchronous model leads to easy scalability analyses and projections. Each phase can be considered separately. What happens if, for instance, in this (schematicized) iteration, arithmetic speed is doubled, scalar all-gather is quartered, and local scatter is cut by one-third?

Estimating scalability of stencil computations
Given complexity estimates of the leading terms of:
  the concurrent computation (per iteration phase)
  the concurrent communication
  the synchronization frequency
And a bulk synchronous model of the architecture including:
  internode communication (network topology and protocol reflecting horizontal memory structure)
  on-node computation (effective performance parameters including vertical memory structure)
One can estimate optimal concurrency and optimal execution time
  on per-iteration basis, or overall (by taking into account any granularity-dependent convergence rate), based on problem size N and concurrency P
  simply differentiate time estimate in terms of (N,P) with respect to P, equate to zero and solve for P in terms of N

Scalability results for DD stencil computations
With tree-based (logarithmic) global reductions and scalable nearest neighbor hardware:
  optimal number of processors scales linearly with problem size
With 3D torus-based global reductions and scalable nearest neighbor hardware:
  optimal number of processors scales as three-fourths power of problem size (almost “scalable”)
With common network bus (heavy contention):
  optimal number of processors scales as one-fourth power of problem size (not “scalable”)
  bad news for conventional Beowulf clusters, but see 2000 Bell Prize “price-performance awards”, for multiple NICs

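The differentiate-and-solve recipe can be illustrated with a deliberately simplified cost model. The model is an assumption made here for illustration, not the one used to obtain the results above: per-iteration time is a perfectly parallel computation term $aN/P$ plus a global-reduction term, with hypothetical machine constants $a$ and $b$. Under that assumption the first two scalings are recovered as follows.

```latex
% Tree-based (logarithmic) global reduction:
T(N,P) = a\frac{N}{P} + b\log P
\frac{\partial T}{\partial P} = -a\frac{N}{P^2} + \frac{b}{P} = 0
\quad\Longrightarrow\quad P_{opt} = \frac{a}{b}\,N

% 3D torus: reduction cost assumed to grow like the network diameter, P^{1/3}:
T(N,P) = a\frac{N}{P} + b\,P^{1/3}
\frac{\partial T}{\partial P} = -a\frac{N}{P^2} + \frac{b}{3}\,P^{-2/3} = 0
\quad\Longrightarrow\quad P_{opt} = \left(\frac{3a}{b}\,N\right)^{3/4}
```

In the first case the optimal processor count grows linearly with N; in the second it grows as the three-fourths power of N.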
NKS efficiently implemented in PETSc’s MPI-based distributed data structures
[Figure: layering of user code and PETSc code. User code: Main Routine, Application Initialization, Function Evaluation, Jacobian Evaluation, Post-Processing. PETSc: Timestepping Solvers (TS), Nonlinear Solvers (SNES), Linear Solvers (SLES) with KSP and PC]

User Code/PETSc library interactions
[Figure: the same user code and PETSc layering (Main Routine, Application Initialization, Function Evaluation, Jacobian Evaluation, Post-Processing; TS, SNES, SLES with KSP and PC), annotated “Can be AD code”]

1999 Bell Prize for unstructured grid computational aerodynamics
Implemented in PETSc: www.mcs.anl.gov/petsc
[Figure: transonic “Lambda” shock, Mach contours on surfaces; mesh c/o D. Mavriplis, ICASE]

Fixed-size parallel scaling results
11M unknowns, 70% efficient
128 nodes: 43 min
3072 nodes: 2.5 min, 226 Gf/s
Four orders of magnitude in 13 years
c/o K. Anderson, W. Gropp, D. Kaushik, D. Keyes and B. Smith

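Three “war stories” from magnetic fusion energy applications in SciDAC
Physical models based on fluid-like magnetohydrodynamics (MHD)
  $\frac{\partial B}{\partial t} = -\nabla \times E + \kappa_{divb} \nabla \nabla \cdot B$
  $E = -V \times B + \eta J$
  $\mu_0 J = \nabla \times B$
  $\frac{\partial n}{\partial t} + \nabla \cdot (nV) = \nabla \cdot D \nabla n$
  $\rho \left( \frac{\partial V}{\partial t} + V \cdot \nabla V \right) = J \times B - \nabla p + \nabla \cdot \nu \rho \nabla V$
  $\frac{n}{\gamma - 1} \left( \frac{\partial T}{\partial t} + V \cdot \nabla T \right) = -p \nabla \cdot V + \nabla \cdot n \left[ (\chi_{\|} - \chi_{\perp}) \hat{b}\hat{b} + \chi_{\perp} I \right] \cdot \nabla T + Q$
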
Challenges in magnetic fusion
• Conditions of interest possess two properties that pose great challenges to numerical approaches: anisotropy and stiffness.
• Anisotropy produces subtle balances of large forces, and vastly different parallel and perpendicular transport properties.
• Stiffness reflects the vast range of time-scales in the system: targeted physics is slow (~transport scale) compared to waves

Tokamak/stellarator simulations
Center for Extended MHD Modeling (based at Princeton Plasma Physics Lab) M3D code
  Realistic toroidal geom., unstructured mesh, hybrid FE/FD discretization
  Fields expanded in scalar potentials, and streamfunctions
  Operator-split, linearized, w/11 potential solves in each poloidal cross-plane/step (90% exe. time)
  Parallelized w/PETSc (Tang et al., SIAM PP01, Chen et al., SIAM AN02, Jardin et al., SIAM CSE03)
Want from TOPS:
  Now: scalable linear implicit solver for much higher resolution (and for AMR)
  Later: fully nonlinearly implicit solvers and coupling to other codes

Provided new solvers across existing interfaces
Hypre in PETSc
  codes with PETSc interface (like CEMM’s M3D) can now invoke Hypre routines as solvers or preconditioners with command-line switch
SuperLU_DIST in PETSc
  as above, with SuperLU_DIST
Hypre in AMR Chombo code
  so far, Hypre is level-solver only; its AMG will ultimately be useful as a bottom-solver, since it can be coarsened indefinitely without attention to loss of nested geometric structure; also FAC is being developed for AMR uses, like Chombo

Hypre: multilevel preconditioning
A Multigrid V-cycle
  Smoother on the finest grid
  Restriction: transfer from fine to coarse grid
  First coarse grid: coarser grid has fewer cells (less work & storage)
  Recursively apply this idea until we have an easy problem to solve
  Prolongation: transfer from coarse to fine grid

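The V-cycle steps can be sketched for a 1D model problem. This is a geometric multigrid illustration written for this transcript, not Hypre's algorithm or API: it assumes -u'' = f on (0, 1) with zero boundary values, a grid of 2^k - 1 interior points, weighted Jacobi as the smoother, full-weighting restriction, and linear-interpolation prolongation. All function names are hypothetical.

```python
import math

def residual(u, f, h):
    n = len(u)
    r = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        r.append(f[i] - (2.0 * u[i] - left - right) / h**2)
    return r

def smooth(u, f, h, sweeps=2, omega=2.0 / 3.0):
    """Weighted Jacobi relaxation for -u'' = f."""
    for _ in range(sweeps):
        r = residual(u, f, h)
        u = [u[i] + omega * (h**2 / 2.0) * r[i] for i in range(len(u))]
    return u

def v_cycle(u, f, h):
    n = len(u)
    if n == 1:
        return [f[0] * h**2 / 2.0]            # easy problem: solve exactly
    u = smooth(u, f, h)                        # pre-smoothing
    r = residual(u, f, h)
    nc = (n - 1) // 2
    # Restriction (full weighting): coarse point j sits at fine point 2j+1.
    rc = [0.25 * r[2*j] + 0.5 * r[2*j + 1] + 0.25 * r[2*j + 2] for j in range(nc)]
    ec = v_cycle([0.0] * nc, rc, 2.0 * h)      # recursive coarse-grid correction
    # Prolongation (linear interpolation) and correction.
    for j in range(nc):
        u[2*j + 1] += ec[j]
        u[2*j] += 0.5 * ec[j]
        u[2*j + 2] += 0.5 * ec[j]
    return smooth(u, f, h)                     # post-smoothing

n = 63
h = 1.0 / (n + 1)
xs = [(i + 1) * h for i in range(n)]
f = [math.pi**2 * math.sin(math.pi * x) for x in xs]
u = [0.0] * n
norms = []
for _ in range(8):
    u = v_cycle(u, f, h)
    norms.append(max(abs(v) for v in residual(u, f, h)))
max_err = max(abs(ui - math.sin(math.pi * x)) for ui, x in zip(u, xs))
```

Each level does work proportional to its number of points, and the point count halves at every level, so the total work per cycle stays proportional to the finest-grid size.
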
Hypre’s AMG in M3D
PETSc-based PPPL code M3D has been retrofit with Hypre’s algebraic MG solver of Ruge-Stüben type
Iteration count results below are averaged over 19 different PETSc SLESSolve calls in initialization and one timestep loop for this operator-split unsteady code; abscissa is number of procs in scaled problem; problem size ranges from 12K to 303K unknowns (approx 4K per processor)
[Figure: iteration count (axis 0 to 700) versus number of procs (3, 12, 27, 48, 75) for ASM-GMRES and AMG-FGMRES]

Hypre’s AMG in M3D
Scaled speedup timing results below are summed over 19 different PETSc SLESSolve calls in initialization and one timestep loop for this operator-split unsteady code
Majority of AMG cost is coarse-grid formation (preprocessing), which does not scale as well as the inner loop V-cycle phase; in production, these coarse hierarchies will be saved for reuse (same linear systems are called in each timestep loop), making AMG much less expensive and more scalable
[Figure: time (axis 0 to 60) versus number of procs (3, 12, 27, 48, 75) for ASM-GMRES, AMG-FGMRES, and AMG inner (est)]

Hypre’s “Conceptual Interfaces”
Linear System Interfaces
Linear Solvers: GMG, ... FAC, ... Hybrid, ... AMGe, ... ILU, ...
Data Layout: structured, composite, block-struc, unstruc, CSR
(Slide c/o E. Chow, LLNL)

SuperLU in NIMROD
NIMROD is another MHD code in the CEMM collaboration
  employs high-order elements on unstructured grids
  very poor convergence with default Krylov solver on 2D poloidal crossplane linear solves
TOPS wired in SuperLU, just to try a sparse direct solver
  Speedup of more than 10 in serial, and about 8 on a modest parallel cluster (24 procs)
  PI Dalton Schnack (General Atomics) thought he entered a time machine
SuperLU is not a “final answer”, but a sanity check
  Parallel ILU under Krylov should be superior

2D Hall MHD sawtooth instability (PETSc examples /snes/ex29.c and /sles/ex31.c)
Model equations: (Porcelli et al., 1993, 1999)
Equilibrium:
[Figure: vorticity at early time and at later time, with zoom; figures c/o A. Bhattacharjee, CMRS]

PETSc’s DMMG in Hall MR application
[Figure, two panels: implicit code (snes/ex29.c) versus explicit code (sles/ex31.c), both with second-order integration in time; implicit code (snes/ex29.c) with first- and second-order integration in time]

Abstract Gantt Chart for TOPS
[Figure: Gantt chart over time with rows Algorithmic Development, Research Implementations, Hardened Codes, Applications Integration, Dissemination; example modules: PETSc, TOPSLib, ASPIN]
Each color module represents an algorithmic research idea on its way to becoming part of a supported community software tool. At any moment (vertical time slice), TOPS has work underway at multiple levels. While some codes are in applications already, they are being improved in functionality and performance as part of the TOPS research agenda.

Jacobian-free Newton-Krylov
In the Jacobian-Free Newton-Krylov (JFNK) method, a Krylov method solves the linear Newton correction equation, requiring Jacobian-vector products
These are approximated by the Fréchet derivatives
  $J(u)v \approx \frac{1}{\varepsilon}[F(u + \varepsilon v) - F(u)]$
(where $\varepsilon$ is chosen with a fine balance between approximation and floating point rounding error) or automatic differentiation, so that the actual Jacobian elements are never explicitly needed
One builds the Krylov space on a true F’(u) (to within numerical approximation)

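A minimal sketch of the matrix-free Jacobian-vector product follows. The residual function `F`, the point `u`, and the direction `v` are hypothetical test data, and the formula used for epsilon is one common heuristic assumed here, not a choice stated in the lecture. The analytic Jacobian is included only to check the approximation.

```python
import math

def F(u):
    """Illustrative nonlinear residual (hypothetical test function)."""
    x, y = u
    return [x * x + y - 3.0, x + math.sin(y)]

def jacobian_exact(u):
    x, y = u
    return [[2.0 * x, 1.0], [1.0, math.cos(y)]]

def jfnk_matvec(F, u, v, eps_mach=2.2e-16):
    """Approximate J(u) v by [F(u + eps v) - F(u)] / eps without forming J."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(a * a for a in v))
    if norm_v == 0.0:
        return [0.0] * len(u)
    # Heuristic: balance truncation error (wants small eps)
    # against rounding error (wants large eps).
    eps = math.sqrt(eps_mach) * (1.0 + norm_u) / norm_v
    Fu = F(u)
    Fp = F([a + eps * b for a, b in zip(u, v)])
    return [(p - q) / eps for p, q in zip(Fp, Fu)]

u = [1.0, 2.0]
v = [0.5, -1.0]
Jv_fd = jfnk_matvec(F, u, v)
J = jacobian_exact(u)
Jv_exact = [sum(J[i][j] * v[j] for j in range(2)) for i in range(2)]
```

Each product costs one extra residual evaluation, and in a Krylov loop F(u) can be computed once and reused for every direction v.
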
Philosophy of Jacobian-free NK
To evaluate the linear residual, we use the true F’(u), giving a true Newton step and asymptotic quadratic Newton convergence
To precondition the linear residual, we do anything convenient that uses understanding of the dominant physics/mathematics in the system and respects the limitations of the parallel computer architecture and the cost of various operations:
  Jacobian of lower-order discretization
  Jacobian with “lagged” values for expensive terms
  Jacobian stored in lower precision
  Jacobian blocks decomposed for parallelism
  Jacobian of related discretization
  operator-split Jacobians
  physics-based preconditioning

Recall idea of preconditioning
Krylov iteration is expensive in memory and in function evaluations, so subspace dimension k must be kept small in practice, through preconditioning the Jacobian with an approximate inverse, so that the product matrix has low condition number in
  $(B^{-1}A)x = B^{-1}b$
Given the ability to apply the action of $B^{-1}$ to a vector, preconditioning can be done on either the left, as above, or the right, as in, e.g., for matrix-free:
  $JB^{-1}v \approx \frac{1}{\varepsilon}[F(u + \varepsilon B^{-1}v) - F(u)]$

Physics-based preconditioning
In Newton iteration, one seeks to obtain a correction (“delta”) to solution, by inverting the Jacobian matrix on (the negative of) the nonlinear residual:
  $\delta u^k = -[J(u^k)]^{-1} F(u^k)$
A typical operator-split code also derives a “delta” to the solution, by some implicitly defined means, through a series of implicit and explicit substeps
  $F(u^k) \mapsto \delta u^k$
This implicitly defined mapping from residual to “delta” is a natural preconditioner
Software must accommodate this!

Physics-based preconditioning
We consider a standard “dynamical core,” the shallow-water wave splitting algorithm, as a solver
Leaves a first-order in time splitting error
In the Jacobian-free Newton-Krylov framework, this solver, which maps a residual into a correction, can be regarded as a preconditioner
The true Jacobian is never formed, yet the time-implicit nonlinear residual at each time step can be made as small as needed for nonlinear consistency in long time integrations

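Example: shallow water equations
Continuity (*)
  $\frac{\partial \phi}{\partial t} + \frac{\partial (\phi u)}{\partial x} = 0$
Momentum (**)
  $\frac{\partial (\phi u)}{\partial t} + \frac{\partial (\phi u^2)}{\partial x} + g\phi \frac{\partial \phi}{\partial x} = 0$
These equations admit a fast gravity wave, as can be seen by cross differentiating, e.g., (*) by t and (**) by x, and subtracting:
  $\frac{\partial^2 \phi}{\partial t^2} - g\phi \frac{\partial^2 \phi}{\partial x^2} = \text{other terms}$
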
1D shallow water equations, cont.
Wave equation for geopotential:
  $\frac{\partial^2 \phi}{\partial t^2} - g\phi \frac{\partial^2 \phi}{\partial x^2} = \text{other terms}$
Gravity wave speed $\sqrt{g\phi}$
Typically $\sqrt{g\phi} \gg u$, but stability restrictions would require timesteps based on the Courant-Friedrichs-Lewy (CFL) criterion for the fastest wave, for an explicit method
One can solve fully implicitly, or one can filter out the gravity wave by solving semi-implicitly

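1D shallow water equations, cont.
Continuity (*), with timestep $\Delta t$
  $\frac{\phi^{n+1} - \phi^n}{\Delta t} + \frac{\partial (\phi u)^{n+1}}{\partial x} = 0$
Momentum (**)
  $\frac{(\phi u)^{n+1} - (\phi u)^n}{\Delta t} + \frac{\partial (\phi u^2)^n}{\partial x} + g\phi^n \frac{\partial \phi^{n+1}}{\partial x} = 0$
Solving (**) for $(\phi u)^{n+1}$ and substituting into (*),
  $\phi^{n+1} - \Delta t^2 \frac{\partial}{\partial x}\left( g\phi^n \frac{\partial \phi^{n+1}}{\partial x} \right) = \phi^n - \Delta t \frac{\partial S^n}{\partial x}$
where
  $S^n = (\phi u)^n - \Delta t \frac{\partial (\phi u^2)^n}{\partial x}$
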
1D shallow water equations, cont.
After the parabolic equation is spatially discretized and solved for $\phi^{n+1}$, then $(\phi u)^{n+1}$ can be found from
  $(\phi u)^{n+1} = -\Delta t \, g\phi^n \frac{\partial \phi^{n+1}}{\partial x} + S^n$
One scalar parabolic solve and one scalar explicit update replace an implicit hyperbolic system
This semi-implicit operator splitting is foundational to multiple scales problems in geophysical modeling
Similar tricks are employed in aerodynamics (sound waves), MHD (multiple Alfvén waves), reacting flows (fast kinetics), etc.
Temporal truncation error remains due to the lagging of the advection in (**) (to be dealt with shortly)

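1D Shallow water preconditioning
Define continuity residual for each timestep:
  $R_{\phi} \equiv \frac{\phi^{n+1} - \phi^n}{\Delta t} + \frac{\partial (\phi u)^{n+1}}{\partial x}$
Define momentum residual for each timestep:
  $R_{\phi u} \equiv \frac{(\phi u)^{n+1} - (\phi u)^n}{\Delta t} + \frac{\partial (\phi u^2)^n}{\partial x} + g\phi^n \frac{\partial \phi^{n+1}}{\partial x}$
Continuity delta-form (*):
  $\frac{\delta\phi}{\Delta t} + \frac{\partial [\delta(\phi u)]}{\partial x} = -R_{\phi}$
Momentum delta-form (**):
  $\frac{\delta(\phi u)}{\Delta t} + g\phi^n \frac{\partial [\delta\phi]}{\partial x} = -R_{\phi u}$
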
1D Shallow water preconditioning, cont.
Solving (**) for $\delta(\phi u)$ and substituting into (*),
  $\delta\phi - \Delta t^2 \frac{\partial}{\partial x}\left( g\phi^n \frac{\partial [\delta\phi]}{\partial x} \right) = -\Delta t \, R_{\phi} + \Delta t^2 \frac{\partial R_{\phi u}}{\partial x}$
After this parabolic equation is solved for $\delta\phi$, we have
  $\delta(\phi u) = -\Delta t \, g\phi^n \frac{\partial [\delta\phi]}{\partial x} - \Delta t \, R_{\phi u}$
This completes the application of the preconditioner to one Newton-Krylov iteration at one timestep
Of course, the parabolic solve need not be done exactly; one sweep of multigrid can be used
See paper by Mousseau et al. (2002) for impressive results for longtime weather integration

Physics-based preconditioning update
So far, physics-based preconditioning has been applied to several codes at Los Alamos, in an effort led by D. Knoll
Summarized in new J. Comp. Phys. paper by Knoll & Keyes (Jan 2004)
PETSc’s “shell preconditioner” is designed for inserting physics-based preconditioners, and PETSc’s solvers underneath are building blocks
Nonlinear Schwarz preconditioning
Nonlinear Schwarz has Newton both inside and outside and is fundamentally Jacobian-free
It replaces $F(u) = 0$ with a new nonlinear system possessing the same root, $\Phi(u) = 0$
Define a correction $\delta_i(u)$ to the $i^{th}$ partition (e.g., subdomain) of the solution vector by solving the following local nonlinear system:
  $R_i F(u + \delta_i(u)) = 0$
where $\delta_i(u) \in \Re^n$ is nonzero only in the components of the $i^{th}$ partition
Then sum the corrections: $\Phi(u) = \sum_i \delta_i(u)$ to get an implicit function of u

Nonlinear Schwarz – picture
[Figure, three builds: restriction operators Ri and Rj (entries 1 on a partition, 0 elsewhere) select the pieces Riu, Rju of u and RiF, RjF of F(u); local solves with Fi’(ui) produce corrections that are summed as δiu+δju]

Nonlinear Schwarz, cont.
It is simple to prove that if the Jacobian of F(u) is nonsingular in a neighborhood of the desired root then $\Phi(u) = 0$ and $F(u) = 0$ have the same unique root
To lead to a Jacobian-free Newton-Krylov algorithm we need to be able to evaluate for any $u, v \in \Re^n$:
  The residual $\Phi(u) = \sum_i \delta_i(u)$
  The Jacobian-vector product $\Phi'(u)v$
Remarkably, (Cai-Keyes, 2000) it can be shown that
  $\Phi'(u)v \approx \sum_i (R_i^T J_i^{-1} R_i) J v$
where $J = F'(u)$ and $J_i = R_i J R_i^T$
All required actions are available in terms of $F(u)$!

Experimental example of nonlinear Schwarz
[Figure, two panels: Newton’s method (difficulty at critical Re, stagnation beyond critical Re) and Additive Schwarz Preconditioned Inexact Newton (ASPIN) (convergence for all Re)]

The 2003 SCaLeS initiative
Workshop on a Science-based Case for Large-scale Simulation
Arlington, VA
24-25 June 2003
Charge (April 2003, W. Polansky, DOE):
  “Identify rich and fruitful directions for the computational sciences from the perspective of scientific and engineering applications”
  Build a “strong science case for an ultra-scale computing capability for the Office of Science”
  “Address major opportunities and challenges facing computational sciences in areas of strategic importance to the Office of Science”
  “Report by July 30, 2003”
Chapter 1. Introduction
Chapter 2. Scientific Discovery through Advanced Computing: a Successful Pilot Program
Chapter 3. Anatomy of a Large-scale Simulation
Chapter 4. Opportunities at the Scientific Horizon
Chapter 5. Enabling Mathematics and Computer Science Tools
Chapter 6. Recommendations and Discussion
Volume 2 (due out early 2004):
11 chapters on applications
8 chapters on mathematical methods
8 chapters on computer science and infrastructure
First fruits!
“There will be opened a gateway and a road to a large and excellent science into which minds more piercing than mine shall penetrate to recesses still deeper.”
Galileo (1564-1642)
(on ‘experimental mathematical analysis of nature’, appropriated here for ‘simulation science’)

Related URLs
TOPS project: http://www.tops-scidac.org
SciDAC initiative: http://www.science.doe.gov/scidac
SCaLeS report: http://www.pnl.gov/scales