Canonical Problems and Scalable Solvers for Numerical PDE Robert C. Kirby Department of Computer Science The University of Chicago


Canonical Problems and Scalable Solvers for Numerical PDE

Robert C. Kirby

Department of Computer Science

The University of Chicago


CMSC34000 Lecture 2, 6 Jan 2005

Outline

Canonical problems in Scientific Computing: algebraic equations, differential equations, optimization

Overview of SciDAC/TOPS: hard problems, cutting-edge research, funding opportunities


Canonical Problems

Linear systems: $Ax = b$
Eigenvalue problems: $Ax = \lambda Bx$
Ordinary DE: $f(\dot{x}, x, t, p) = 0$
Nonlinear systems: $F(x) = 0$
Constrained optimization: $\min_u \phi(x, u)$ s.t. $F(x, u) = 0$


Linear Systems

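$Ax = b$. A discretized PDE leads to a linear system; a nonlinear system is solved as a sequence of linear steps.

Typical features: millions of unknowns; sparse; ill-conditioned, with a large condition number $\kappa(A) = \|A\|\,\|A^{-1}\|$; often nonsymmetric.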
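To make the ill-conditioning concrete, here is a minimal Python sketch (an illustration, not from the lecture), assuming the standard second-order finite-difference matrix $\mathrm{tridiag}(-1, 2, -1)/h^2$ for $-u''$ on $[0,1]$ with Dirichlet boundaries. Its eigenvalues are known in closed form, and since it is symmetric positive definite, $\kappa_2(A) = \lambda_{\max}/\lambda_{\min}$, which grows like $h^{-2}$ under grid refinement.

```python
import math


def laplacian_1d_condition_number(n: int) -> float:
    """2-norm condition number of tridiag(-1, 2, -1) / h^2 of size n x n.

    Eigenvalues are (4 / h^2) * sin^2(k * pi / (2 * (n + 1))), k = 1..n;
    the 1 / h^2 factor cancels in lambda_max / lambda_min.
    """
    lam_min = 4.0 * math.sin(math.pi / (2 * (n + 1))) ** 2
    lam_max = 4.0 * math.sin(n * math.pi / (2 * (n + 1))) ** 2
    return lam_max / lam_min


if __name__ == "__main__":
    for n in (10, 100, 1000):
        h = 1.0 / (n + 1)
        kappa = laplacian_1d_condition_number(n)
        # kappa * h^2 approaches 4 / pi^2 (about 0.405), i.e. kappa = O(h^-2)
        print(f"n={n:5d}  kappa={kappa:12.1f}  kappa*h^2={kappa * h * h:.3f}")
```

Refining the grid tenfold raises $\kappa$ roughly a hundredfold, which is one reason unpreconditioned iterative methods slow down on fine grids.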


Solution techniques

Direct: Gaussian elimination and variants

Iterative: a sequence of approximations
• Jacobi / Gauss-Seidel / SOR
• Krylov-subspace methods (a sequence of cleverly chosen matrix-vector products builds up a good approximation)
• Multigrid (algebraic / geometric)

Parallelism: each process owns its own rows of the matrix, so applying the matrix to a vector requires communication.
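As a hedged illustration of the simplest stationary iterations listed above, this Python sketch counts the sweeps of Jacobi and Gauss-Seidel needed to cut the residual of a 1D model problem by $10^{-6}$. The model problem ($-u'' = 1$, zero Dirichlet boundaries, finite differences on 31 interior points) is an assumption chosen for illustration. Jacobi uses only old values, so all points can be updated concurrently; Gauss-Seidel reuses fresh values and, for this matrix, needs about half as many sweeps, but is sequential in this natural ordering.

```python
def residual_norm(u, f, h):
    """Max-norm of f - A u, with A = tridiag(-1, 2, -1) / h^2 and zero Dirichlet BCs."""
    n = len(u)
    worst = 0.0
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        worst = max(worst, abs(f[i] - (2.0 * u[i] - left - right) / (h * h)))
    return worst


def jacobi_sweep(u, f, h):
    """Every point is updated from old values only, so all updates can run concurrently."""
    n = len(u)
    new = [0.0] * n
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        new[i] = 0.5 * (left + right + h * h * f[i])
    return new


def gauss_seidel_sweep(u, f, h):
    """Updates in place, so u[i - 1] is already the new value when u[i] is computed."""
    u = u[:]
    n = len(u)
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        u[i] = 0.5 * (left + right + h * h * f[i])
    return u


def sweeps_to_converge(sweep, n=31, rtol=1e-6, max_sweeps=100_000):
    """Number of sweeps needed to cut the residual of -u'' = 1 by a factor rtol."""
    h = 1.0 / (n + 1)
    f = [1.0] * n
    u = [0.0] * n
    r0 = residual_norm(u, f, h)
    for k in range(1, max_sweeps + 1):
        u = sweep(u, f, h)
        if residual_norm(u, f, h) <= rtol * r0:
            return k
    return max_sweeps


if __name__ == "__main__":
    print("Jacobi sweeps:      ", sweeps_to_converge(jacobi_sweep))
    print("Gauss-Seidel sweeps:", sweeps_to_converge(gauss_seidel_sweep))
```

Both counts grow like $h^{-2}$ as the grid is refined, which is what Krylov acceleration and multigrid are meant to overcome.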



Ordinary differential equations

ODEs describe the time rate of change of a system of quantities. Examples: chemical/nuclear reactions; N-body problems (celestial mechanics); time-dependent PDE (after spatial discretization).

Implicit methods require solving algebraic equations at each time step.

“Stiffness”: a wide range of time scales requires special treatment, because with explicit methods the fastest modes dictate the smallest time steps.
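A minimal sketch of stiffness, assuming the scalar test problem $y' = -\lambda y$ with a large, hypothetical $\lambda = 1000$: forward Euler is stable only for $h < 2/\lambda$, while backward Euler, which solves an (here trivial) algebraic equation at each step, decays for any $h > 0$.

```python
def forward_euler(lam, h, steps, y0=1.0):
    """Explicit: y_{k+1} = y_k + h f(y_k), with f(y) = -lam * y."""
    y = y0
    for _ in range(steps):
        y = y + h * (-lam * y)
    return y


def backward_euler(lam, h, steps, y0=1.0):
    """Implicit: y_{k+1} = y_k + h f(y_{k+1}). For this linear f the algebraic
    equation at each step, (1 + h lam) y_{k+1} = y_k, is solved by a division."""
    y = y0
    for _ in range(steps):
        y = y / (1.0 + h * lam)
    return y


if __name__ == "__main__":
    lam = 1000.0                      # fast mode: time scale 1/lam = 1e-3
    for h in (0.0015, 0.01):          # below and above the explicit limit 2/lam = 0.002
        print(f"h={h}: forward {forward_euler(lam, h, 10):.3e}, "
              f"backward {backward_euler(lam, h, 10):.3e}")
```

With $h = 0.01$ the forward Euler amplification factor is $1 - h\lambda = -9$, so the explicit solution explodes even though the fast mode should have long since decayed.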

Multistep methods


ODE Techniques

Euler methods (forward, backward); Runge-Kutta methods; multi-adaptive Galerkin (Anders Logg)

Parallelism? It depends on the coupling of components and arises naturally from PDE; parallelism in time is more difficult…


PDE

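Multiple dependent variables

Discretization yields algebraic equations; if the problem is time dependent, one gets an ODE or DAE system, and hence algebraic equations at each time step.

Example (steady incompressible Navier-Stokes, velocity $u$ and pressure $p$):
$u \cdot \nabla u - \nu \Delta u + \nabla p = 0$
$\nabla \cdot u = 0$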


Solution strategies

Finite differences (FDM): replace derivatives at points with difference quotients; this naturally leads to one equation per grid point; irregular, adaptive grids require special treatment.

Finite elements (FEM): replace the “strong form” with a “weak form”; posing the weak form over finite-dimensional subspaces leads to an algebraic system; generalizes FDM (geometry, order).

Finite volumes (FVM): discrete conservation laws over each control volume; often “low order” but “stable”.
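The finite-difference strategy can be sketched in a few lines of Python, assuming the model problem $-u'' = \pi^2 \sin(\pi x)$ on $(0,1)$ with $u(0) = u(1) = 0$ (exact solution $\sin(\pi x)$). Each interior grid point contributes one equation, giving a tridiagonal system that is solved here with the Thomas algorithm (a direct $O(n)$ method for tridiagonal matrices). Both the problem and the solver choice are illustrative assumptions.

```python
import math


def thomas(a, b, c, d):
    """Solve a tridiagonal system: a = sub-diagonal (a[0] unused),
    b = diagonal, c = super-diagonal (c[-1] unused), d = right-hand side."""
    n = len(b)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def poisson_fd_max_error(n):
    """Each interior point x_i = i*h gives one equation
    (-u_{i-1} + 2 u_i - u_{i+1}) / h^2 = f(x_i); returns the max error vs sin(pi x)."""
    h = 1.0 / (n + 1)
    x = [(i + 1) * h for i in range(n)]
    a = [-1.0 / h**2] * n
    b = [2.0 / h**2] * n
    c = [-1.0 / h**2] * n
    d = [math.pi**2 * math.sin(math.pi * xi) for xi in x]
    u = thomas(a, b, c, d)
    return max(abs(ui - math.sin(math.pi * xi)) for ui, xi in zip(u, x))


if __name__ == "__main__":
    for n in (15, 31, 63):
        print(f"h = 1/{n + 1}: max error {poisson_fd_max_error(n):.2e}")
```

Halving $h$ cuts the error by about 4, the expected second-order accuracy of the three-point difference quotient.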


Nonlinear equations

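Arise from nonlinear PDE, ODE, etc.; typically solved with Newton’s method

Requires the Jacobian matrix (differentiation); nonlinear preconditioning

$F(u) \approx F(u_c) + F'(u_c)\,\delta u = 0$
$u = u_c + \lambda\,\delta u$

where $u_c$ is the current iterate and $\lambda \in (0, 1]$ is a step length (e.g., chosen by a line search).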
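A minimal Python sketch of this damped Newton iteration, assuming a hypothetical 2x2 system ($x^2 + y^2 = 4$, $e^x + y = 1$) and a simple backtracking rule for $\lambda$ (halve until the residual norm decreases sufficiently). The linearized equation $F'(u_c)\,\delta u = -F(u_c)$ is solved exactly by Cramer's rule; in a PDE code this is where a Krylov solver would go.

```python
import math


def F(v):
    """Hypothetical 2x2 nonlinear system: x^2 + y^2 = 4 and e^x + y = 1."""
    x, y = v
    return [x * x + y * y - 4.0, math.exp(x) + y - 1.0]


def jacobian(v):
    x, y = v
    return [[2.0 * x, 2.0 * y], [math.exp(x), 1.0]]


def norm(r):
    return math.sqrt(sum(ri * ri for ri in r))


def solve2x2(A, b):
    """Cramer's rule for the 2x2 correction equation A d = b."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - A[1][0] * b[0]) / det]


def newton(u, tol=1e-10, max_iter=50):
    """Newton's method with a backtracking line search on lambda; returns (u, iterations)."""
    for k in range(max_iter):
        Fu = F(u)
        if norm(Fu) < tol:
            return u, k
        # Linearize: F(u_c) + F'(u_c) du = 0, i.e. solve F'(u_c) du = -F(u_c)
        du = solve2x2(jacobian(u), [-Fu[0], -Fu[1]])
        # u = u_c + lambda * du, halving lambda until the residual norm drops enough
        lam = 1.0
        trial = [u[0] + du[0], u[1] + du[1]]
        while norm(F(trial)) > (1.0 - 1e-4 * lam) * norm(Fu) and lam > 1e-8:
            lam *= 0.5
            trial = [u[0] + lam * du[0], u[1] + lam * du[1]]
        u = trial
    return u, max_iter


if __name__ == "__main__":
    root, iters = newton([1.0, 1.0])
    print(f"root {root}, {iters} Newton steps, |F| = {norm(F(root)):.1e}")
```

From this starting guess the full Newton step initially overshoots, so the line search takes $\lambda < 1$ for a few steps; near the root $\lambda = 1$ is accepted and convergence becomes quadratic.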


Optimization

Find the minimum of some function subject to some constraints

Constraints may themselves be PDE


Motivation: Solver performance is a major concern for parallel simulations based on PDE formulations, including many of those of the U.S. DOE Scientific Discovery through Advanced Computing (SciDAC) program

For target applications, implicit solvers may require 50% to 95% of execution time, at least before an “expert” overhaul for algorithmic optimality and implementation performance

Even after a “best manual practice” overhaul, the solver may still require 20% to 50% of execution time

The solver may run up against both the processor scalability limit and the memory bandwidth limitation of a PDE-based application before any other part of the code does; the first of these is not fundamental, though the second one is


Presentation plan:

Overview of the SciDAC initiative

Brief review of scalable implicit methods (domain-decomposed multilevel iterative methods): algorithms; software components: PETSc, Hypre, etc.

Overview of the Terascale Optimal PDE Simulations project (TOPS)

Three “war stories” from the SciDAC magnetically confined fusion energy portfolio

Some advanced research directions: physics-based preconditioning, nonlinear Schwarz

On the horizon


SciDAC apps and infrastructure

4 projects in high energy and nuclear physics

5 projects in fusion energy science

14 projects in biological and environmental research

10 projects in basic energy sciences

18 projects in scientific software and network infrastructure


“Enabling technologies” groups to develop reusable software and partner with application groups

From the 2001 start-up, 51 projects share $57M/year: approximately one-third for applications, a third for “integrated software infrastructure centers,” and a third for grid infrastructure and collaboratories

Plus, multi-Tflop/s IBM SP machines at NERSC and ORNL available for SciDAC researchers


Unclassified resources for DOE science

“Cheetah” (Oak Ridge): IBM Power4 Regatta, 32 procs per node, 864 procs total, 4.5 Tflop/s

“Seaborg” (Berkeley): IBM Power3+ SMP, 16 procs per node, 6656 procs total, 10 Tflop/s


Designing a simulation code (from 2001 SciDAC report)

[Figure: simulation code design cycle, with a V&V loop and a performance loop]


A “perfect storm” for simulation

[Figure: scientific models, numerical algorithms, computer architecture, and scientific software engineering converging on applications, over hardware infrastructure and architectures; dates 1686, 1947, 1976 (dates are symbolic)]

“Computational science is undergoing a phase transition.” – D. Hitchcock, DOE


Imperative: multiple-scale applications

Multiple spatial scales: interfaces, fronts, and layers thin relative to domain size (thickness $\ll L$)

Multiple temporal scales: fast waves, with transit times small relative to convection or diffusion (transit time $\ll T$)

The analyst must isolate the dynamics of interest and model the rest in a system that can be discretized over a more modest range of scales

This may lead to an infinitely “stiff” subsystem requiring special treatment by the solution method

[Figure: Richtmyer-Meshkov instability, c/o A. Mirin, LLNL]


Examples: multiple-scale applications

Biopolymers, nanotechnology: a $10^{12}$ range in time, from $10^{-15}$ sec (quantum fluctuation) to $10^{-3}$ sec (molecular folding time); the typical computational model ignores the smallest scales and works on classical dynamics only, but scientists increasingly want both

Galaxy formation: a $10^{20}$ range in space, from binary star interactions to the diameter of the universe; a heroic computational model handles all scales with localized adaptive meshing

Supernova simulation: massive ranges in time and space scales for radiation, turbulent convection, diffusion, chemical reaction, nuclear reaction

[Figure: supernova simulation, c/o A. Mezzacappa, ORNL]


SciDAC portfolio characteristics:
• Multiple temporal scales
• Multiple spatial scales
• Linear ill conditioning
• Complex geometry and severe anisotropy
• Coupled physics, with essential nonlinearities
• Ambition for uncertainty quantification, parameter estimation, and design

Need a toolkit of portable, extensible, tunable implicit solvers, not “one-size-fits-all”


TOPS starting point codes: PETSc (ANL), Hypre (LLNL), Sundials (LLNL), SuperLU (LBNL), PARPACK (LBNL*), TAO (ANL), Veltisto (CMU)

Many interoperability connections between these packages that predated SciDAC; many application collaborators that predated SciDAC


TOPS participants

TOPS lab (3): ANL, LBNL, LLNL

TOPS university (7): CMU, CU, CU-B, NYU, ODU, UC-B, UT-K


In the old days, see the “Templates” guides (www.netlib.org/etemplates and www.netlib.org/templates; 124 pp. and 410 pp.): these are good starts, but not adequate for SciDAC scales!


“Integrated software infrastructure centers”

34 applications groups; 7 ISIC groups (4 CS, 3 Math); 10 grid, data collaboratory groups

Math ISICs: adaptive gridding, discretization, solvers (TOPS)

CS ISICs: systems software, component architecture, performance engineering, data management

TOPS problem classes: $f(\dot{x}, x, t, p) = 0$; $F(x, p) = 0$; $Ax = b$; $Ax = \lambda Bx$; $\min_u \phi(x, u)$ s.t. $F(x, u) = 0$

Cross-cutting concerns: software integration, performance optimization


Keyword: “Optimal”

Convergence rate nearly independent of discretization parameters: multilevel schemes for rapid linear convergence of linear problems; Newton-like schemes for quadratic convergence of nonlinear problems

Convergence rate as independent as possible of physical parameters: continuation schemes; physics-based preconditioning

Optimal convergence plus a scalable loop body yields a scalable solver

[Figure: time to solution (0 to 200) versus problem size, increasing with number of processors (1, 10, 100, 1000), for a scalable and an unscalable solver]


But where to go past O(N)? Since O(N) is already optimal, there is nowhere further “upward” to go in efficiency, but one must extend optimality “outward,” to more general problems

Hence, for instance, algebraic multigrid (AMG) seeks to obtain O(N) in indefinite, anisotropic, or inhomogeneous problems on irregular grids

AMG framework: [Figure: the error space $\mathbb{R}^n$, split into error easily damped by pointwise relaxation and algebraically smooth error]

Choose coarse grids, transfer operators, and smoothers to eliminate these “bad” (algebraically smooth) components within a smaller-dimensional space, and recur
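The split between error that pointwise relaxation damps easily and smooth error can be seen in a small Python sketch. It assumes the geometric 1D setting ($A = \mathrm{tridiag}(-1, 2, -1)$, 63 interior points) and weighted Jacobi with $\omega = 2/3$ as the relaxation; AMG defines "smooth" algebraically, so this is only the simplest instance of the idea. Each sine mode is an eigenvector of the iteration, so its reduction factor after one sweep is $1 - 2\omega \sin^2(k\pi h/2)$.

```python
import math


def weighted_jacobi_on_error(e, omega=2.0 / 3.0, sweeps=10):
    """Relax A e = 0 with A = tridiag(-1, 2, -1) and zero Dirichlet BCs;
    the iterate is exactly the error, so we can watch how fast it decays."""
    n = len(e)
    for _ in range(sweeps):
        new = [0.0] * n
        for i in range(n):
            left = e[i - 1] if i > 0 else 0.0
            right = e[i + 1] if i < n - 1 else 0.0
            jacobi = 0.5 * (left + right)            # plain Jacobi update for A e = 0
            new[i] = (1.0 - omega) * e[i] + omega * jacobi
        e = new
    return e


def damping(k, n=63, sweeps=10):
    """Max-norm reduction of the Fourier mode sin(k pi x) after `sweeps` sweeps."""
    h = 1.0 / (n + 1)
    e0 = [math.sin(k * math.pi * (i + 1) * h) for i in range(n)]
    e = weighted_jacobi_on_error(e0, sweeps=sweeps)
    return max(map(abs, e)) / max(map(abs, e0))


if __name__ == "__main__":
    for k in (1, 8, 32, 48, 63):
        print(f"mode k={k:2d}: error reduced to {damping(k):.2e} after 10 sweeps")
```

The smoothest mode barely changes while the most oscillatory one drops by about five orders of magnitude; the surviving smooth error is what a coarse grid can represent and eliminate cheaply.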


Toolchain for PDE solvers in TOPS project

Design and implementation of “solvers”:
• Time integrators: $f(\dot{x}, x, t, p) = 0$
• Nonlinear solvers: $F(x, p) = 0$
• Constrained optimizers: $\min_u \phi(x, u)$ s.t. $F(x, u) = 0$, $u \ge 0$
• Linear solvers: $Ax = b$
• Eigensolvers: $Ax = \lambda Bx$

Software integration; performance optimization

[Figure: dependence diagram connecting optimizer, sensitivity analyzer, time integrator (w/ sens. anal.), nonlinear solver (w/ sens. anal.), eigensolver, and linear solver]


Dominant data structures are grid-based: finite differences, finite elements, finite volumes

All lead to problems with sparse Jacobian matrices; many tasks can leverage an efficient set of tools for manipulating distributed sparse data structures

[Figure: node i of the grid corresponds to row i of the sparse Jacobian J]
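One common way to store such a sparse Jacobian is compressed sparse row (CSR) format. The Python sketch below is an illustration only (it does not claim to mirror how PETSc or Hypre lay out their data internally): it builds the CSR matrix for a 1D three-point stencil, where row $i$ holds the couplings of grid node $i$ to itself and its neighbors, and applies it to a vector.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CSRMatrix:
    """Compressed sparse row storage: row i's nonzeros are
    values[row_ptr[i]:row_ptr[i+1]] in columns col_idx[row_ptr[i]:row_ptr[i+1]]."""
    n: int
    row_ptr: List[int]
    col_idx: List[int]
    values: List[float]

    def matvec(self, x: List[float]) -> List[float]:
        y = [0.0] * self.n
        for i in range(self.n):
            for k in range(self.row_ptr[i], self.row_ptr[i + 1]):
                y[i] += self.values[k] * x[self.col_idx[k]]
        return y


def grid_laplacian_csr(n: int) -> CSRMatrix:
    """Jacobian-like matrix for a 1D grid: row i couples node i to nodes i-1 and i+1."""
    row_ptr, col_idx, values = [0], [], []
    for i in range(n):
        for j, v in ((i - 1, -1.0), (i, 2.0), (i + 1, -1.0)):
            if 0 <= j < n:
                col_idx.append(j)
                values.append(v)
        row_ptr.append(len(col_idx))
    return CSRMatrix(n, row_ptr, col_idx, values)


if __name__ == "__main__":
    A = grid_laplacian_csr(5)
    print(A.row_ptr)
    print(A.matvec([1.0] * 5))
```

Only the $3n - 2$ nonzeros are stored; in a distributed setting each process would own a contiguous block of rows, and a matrix-vector product would need the neighboring processes' boundary entries of `x`.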


Newton-Krylov-Schwarz: a PDE applications “workhorse”

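Newton: nonlinear solver, asymptotically quadratic
$F(u) \approx F(u_c) + F'(u_c)\,\delta u = 0$, $u = u_c + \lambda\,\delta u$

Krylov: accelerator, spectrally adaptive
$J\,\delta u = -F$, $\delta u = \arg\min_{x \in V} \|Jx + F\|$, $V \equiv \{F, JF, J^2 F, \ldots\}$

Schwarz: preconditioner, parallelizable
$M^{-1} J\,\delta u = -M^{-1} F$, $M^{-1} = \sum_i R_i^T (R_i J R_i^T)^{-1} R_i$, where $R_i$ restricts a global vector to subdomain $i$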
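The Krylov and Schwarz pieces can be combined in a self-contained Python sketch. Assumptions, all for illustration: the operator is the 1D Laplacian $\mathrm{tridiag}(-1, 2, -1)$; the subdomains are 8 contiguous blocks extended by 2 points of overlap, with no coarse grid; and because this matrix and the additive Schwarz preconditioner $M^{-1} = \sum_i R_i^T (R_i A R_i^T)^{-1} R_i$ are symmetric positive definite, preconditioned conjugate gradients stands in for the general (often GMRES) Krylov method.

```python
import math


def apply_A(x):
    """y = A x for A = tridiag(-1, 2, -1): a 1D Laplacian with h^2 scaled out."""
    n = len(x)
    return [2.0 * x[i] - (x[i - 1] if i > 0 else 0.0) - (x[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]


def local_solve(d):
    """Thomas algorithm for tridiag(-1, 2, -1) y = d, i.e. the subdomain matrix R_i A R_i^T."""
    m = len(d)
    cp, dp = [0.0] * m, [0.0] * m
    cp[0], dp[0] = -0.5, 0.5 * d[0]
    for i in range(1, m):
        denom = 2.0 + cp[i - 1]
        cp[i] = -1.0 / denom
        dp[i] = (d[i] + dp[i - 1]) / denom
    y = [0.0] * m
    y[-1] = dp[-1]
    for i in range(m - 2, -1, -1):
        y[i] = dp[i] - cp[i] * y[i + 1]
    return y


def make_subdomains(n, parts, overlap):
    """Index ranges [lo, hi) of `parts` contiguous blocks, each grown by `overlap` points."""
    size = n // parts
    doms = []
    for p in range(parts):
        lo, hi = p * size, (n if p == parts - 1 else (p + 1) * size)
        doms.append((max(0, lo - overlap), min(n, hi + overlap)))
    return doms


def additive_schwarz(r, doms):
    """z = sum_i R_i^T (R_i A R_i^T)^{-1} R_i r. The local solves are independent,
    so in an SPMD code each process would handle its own subdomain."""
    z = [0.0] * len(r)
    for lo, hi in doms:
        for k, v in enumerate(local_solve(r[lo:hi])):
            z[lo + k] += v  # R_i^T: scatter-add the local correction
    return z


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pcg(b, precond, tol=1e-8, max_iter=2000):
    """Preconditioned conjugate gradients for A x = b; returns (x, iterations)."""
    x = [0.0] * len(b)
    r = b[:]
    z = precond(r)
    p = z[:]
    rz = dot(r, z)
    bnorm = math.sqrt(dot(b, b))
    for k in range(1, max_iter + 1):
        Ap = apply_A(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pj for xi, pj in zip(x, p)]
        r = [ri - alpha * aj for ri, aj in zip(r, Ap)]
        if math.sqrt(dot(r, r)) <= tol * bnorm:
            return x, k
        z = precond(r)
        rz_new = dot(r, z)
        p = [zj + (rz_new / rz) * pj for zj, pj in zip(z, p)]
        rz = rz_new
    return x, max_iter


if __name__ == "__main__":
    n = 256
    b = [1.0] * n
    doms = make_subdomains(n, parts=8, overlap=2)
    _, it_none = pcg(b, lambda r: r[:])
    _, it_asm = pcg(b, lambda r: additive_schwarz(r, doms))
    print(f"CG iterations: unpreconditioned {it_none}, additive Schwarz {it_asm}")
```

Without a coarse-grid component, the iteration count of one-level Schwarz still grows as subdomains are added; that is why practical Schwarz preconditioners are usually made multilevel.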


SPMD parallelism w/ domain decomposition

Partitioning of the grid induces block structure on the Jacobian

[Figure: grid partitioned into subdomains 1, 2, 3; the rows assigned to proc “2” form the block row $A_{21}\ A_{22}\ A_{23}$]


Time-implicit Newton-Krylov-Schwarz

For accommodation of unsteady problems, and nonlinear robustness in steady ones, NKS iteration is wrapped in time-stepping:

```
for (l = 0; l < n_time; l++) {                  // pseudo-time loop
    select time step
    for (k = 0; k < n_Newton; k++) {            // NKS loop: Newton iteration
        compute nonlinear residual and Jacobian
        for (j = 0; j < n_Krylov; j++) {        // Krylov (linear) iteration
            forall (i = 0; i < n_Precon; i++) { // Schwarz: one subdomain per process
                solve subdomain problems concurrently
            } // End of loop over subdomains
            perform Jacobian-vector product
            enforce Krylov basis conditions
            update optimal coefficients
            check linear convergence
        } // End of linear solver
        perform DAXPY update
        check nonlinear convergence
    } // End of nonlinear loop
} // End of time-step loop
```
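The nested loops above can be made concrete in a small runnable Python sketch. It rests on several assumptions not in the lecture: the model problem $u_t = u_{xx} - u^3 + 1$ on $(0,1)$ with zero boundary values; backward Euler with a fixed time step; conjugate gradients as the Krylov method (valid here because this Jacobian is symmetric positive definite); and point Jacobi as the preconditioner, which is Schwarz in the degenerate case of one-point subdomains. The Jacobian is never stored; only its action on a vector is coded.

```python
import math

N, DT, STEPS = 31, 0.1, 60          # hypothetical grid size, time step, number of steps
H = 1.0 / (N + 1)


def lap(v):
    """A v with A = tridiag(-1, 2, -1) / h^2 (zero Dirichlet boundaries)."""
    return [(2.0 * v[i] - (v[i - 1] if i > 0 else 0.0)
             - (v[i + 1] if i < N - 1 else 0.0)) / H**2 for i in range(N)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def residual(u, u_old):
    """Backward Euler residual for u_t = u_xx - u^3 + 1: F(u) = (u - u_old)/dt + A u + u^3 - 1."""
    Au = lap(u)
    return [(u[i] - u_old[i]) / DT + Au[i] + u[i] ** 3 - 1.0 for i in range(N)]


def jac_vec(u, v):
    """Jacobian-vector product J(u) v = v/dt + A v + 3 u^2 v (J is never stored)."""
    Av = lap(v)
    return [v[i] / DT + Av[i] + 3.0 * u[i] ** 2 * v[i] for i in range(N)]


def krylov_solve(u, rhs, tol=1e-10, max_iter=200):
    """Preconditioned CG on J du = rhs; point Jacobi = Schwarz with one-point subdomains."""
    diag = [1.0 / DT + 2.0 / H**2 + 3.0 * u[i] ** 2 for i in range(N)]
    x, r = [0.0] * N, rhs[:]
    z = [r[i] / diag[i] for i in range(N)]           # solve subdomain problems
    p, rz = z[:], dot(r, z)
    rhs_norm = math.sqrt(dot(rhs, rhs))
    for _ in range(max_iter):
        if math.sqrt(dot(r, r)) <= tol * rhs_norm:   # check linear convergence
            break
        Jp = jac_vec(u, p)                           # Jacobian-vector product
        alpha = rz / dot(p, Jp)                      # update optimal coefficients
        x = [x[i] + alpha * p[i] for i in range(N)]
        r = [r[i] - alpha * Jp[i] for i in range(N)]
        z = [r[i] / diag[i] for i in range(N)]
        rz_new = dot(r, z)
        p = [z[i] + (rz_new / rz) * p[i] for i in range(N)]  # keep directions J-conjugate
        rz = rz_new
    return x


def time_implicit_newton_krylov():
    u = [0.0] * N
    for _ in range(STEPS):                           # time-step loop (fixed dt here)
        u_old = u[:]
        for _ in range(20):                          # Newton loop
            F = residual(u, u_old)                   # nonlinear residual
            if math.sqrt(dot(F, F)) < 1e-9:          # check nonlinear convergence
                break
            du = krylov_solve(u, [-f for f in F])
            u = [u[i] + du[i] for i in range(N)]     # DAXPY update
    return u


if __name__ == "__main__":
    u = time_implicit_newton_krylov()
    print("u at the midpoint:", u[N // 2])
```

Marching long enough in time drives the solution to the steady state of $-u'' + u^3 = 1$, which is the pseudo-transient use of the outer loop mentioned above.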


(N)KS kernel in parallel

[Figure: timeline of one Krylov iteration on processors P1, P2, …, Pn, with phases local scatter, Jac-vec multiply, precond sweep, daxpy, inner product]

A bulk synchronous model leads to easy scalability analyses and projections. Each phase can be considered separately. What happens if, for instance, in this (schematized) iteration, arithmetic speed is doubled, scalar all-gather is quartered, and local scatter is cut by one-third?
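The question can be answered mechanically once phase times are known. In the Python sketch below the per-phase times are hypothetical placeholders (not measurements), the inner product is assumed to be dominated by its all-gather, and "quartered" is read as the all-gather time dropping to one quarter. Because phases are separated by synchronization, each phase contributes its slowest-processor time and the iteration time is simply their sum.

```python
# Hypothetical per-iteration phase times (arbitrary units), chosen only for
# illustration; they are NOT measurements from the lecture.
PHASES = {
    "local scatter":    ("scatter", 2.0),
    "Jac-vec multiply": ("arithmetic", 4.0),
    "precond sweep":    ("arithmetic", 6.0),
    "daxpy":            ("arithmetic", 1.0),
    "inner product":    ("all-gather", 3.0),  # assumed dominated by the global reduction
}


def iteration_time(phases, time_factor):
    """Bulk synchronous model: phases are separated by synchronization, so the
    iteration time is the sum of the (slowest-processor) phase times."""
    return sum(t * time_factor[kind] for kind, t in phases.values())


BASELINE = {"arithmetic": 1.0, "scatter": 1.0, "all-gather": 1.0}
# Arithmetic twice as fast, all-gather time quartered, local scatter time cut by one-third.
SCENARIO = {"arithmetic": 1.0 / 2.0, "scatter": 2.0 / 3.0, "all-gather": 1.0 / 4.0}

if __name__ == "__main__":
    before = iteration_time(PHASES, BASELINE)
    after = iteration_time(PHASES, SCENARIO)
    print(f"before {before:.2f}, after {after:.2f}, speedup {before / after:.2f}x")
```

With these made-up weights the iteration speeds up by about 2.1x; different phase mixes shift the answer, which is exactly what this kind of per-phase model lets one explore.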


Estimating scalability of stencil computations

Given complexity estimates of the leading terms of:
• the concurrent computation (per iteration phase)
• the concurrent communication
• the synchronization frequency

And a bulk synchronous model of the architecture including:
• internode communication (network topology and protocol, reflecting horizontal memory structure)
• on-node computation (effective performance parameters, including vertical memory structure)

One can estimate optimal concurrency and optimal execution time on a per-iteration basis, or overall (by taking into account any granularity-dependent convergence rate), based on problem size N and concurrency P: simply differentiate the time estimate in terms of (N, P) with respect to P, equate to zero, and solve for P in terms of N


Scalability results for DD stencil computations

With tree-based (logarithmic) global reductions and scalable nearest-neighbor hardware: the optimal number of processors scales linearly with problem size

With 3D torus-based global reductions and scalable nearest-neighbor hardware: the optimal number of processors scales as the three-fourths power of problem size (almost “scalable”)

With a common network bus (heavy contention): the optimal number of processors scales as the one-fourth power of problem size (not “scalable”); bad news for conventional Beowulf clusters, but see the 2000 Bell Prize “price-performance awards” for multiple NICs
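As a worked check of the recipe above (differentiate the time estimate with respect to $P$, set it to zero, solve for $P$), here is a minimal derivation under an assumed cost model that the lecture does not spell out: a 3D stencil problem with $N$ unknowns split into cubic subdomains of $N/P$ points, per-point work $a$, nearest-neighbor exchange proportional to subdomain surface area (constant $b$), and one global reduction per iteration costing $c\,R(P)$. The constants are placeholders. Under these assumptions the derivation reproduces the first two scaling results.

```latex
T(N,P) = a\,\frac{N}{P} + b\left(\frac{N}{P}\right)^{2/3} + c\,R(P),
\qquad R(P) = \log P \ \text{(tree)} \quad\text{or}\quad R(P) = P^{1/3} \ \text{(3D torus)}

P\,\frac{\partial T}{\partial P}
  = -a\,\frac{N}{P} - \frac{2}{3}\,b\left(\frac{N}{P}\right)^{2/3} + c\,P\,R'(P) = 0

\text{Tree: } P\,R'(P) = 1
  \;\Rightarrow\; a\,\frac{N}{P} + \frac{2}{3}\,b\left(\frac{N}{P}\right)^{2/3} = c
  \;\Rightarrow\; \frac{N}{P} = \text{const}
  \;\Rightarrow\; P_{\mathrm{opt}} \propto N

\text{Torus: } P\,R'(P) = \tfrac{1}{3}P^{1/3};\ \text{for large } N/P \text{ the } a \text{ term dominates the } b \text{ term, so }
  a\,\frac{N}{P} \approx \frac{c}{3}\,P^{1/3}
  \;\Rightarrow\; P_{\mathrm{opt}} \propto N^{3/4}
```

The bus case additionally needs a contention model for the neighbor-exchange term, so it is not reproduced by this simple model.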


[Figure: PETSc code versus user code. User code: main routine, application initialization, function evaluation, Jacobian evaluation, post-processing. PETSc: timestepping solvers (TS), nonlinear solvers (SNES), linear solvers (SLES: PC, KSP)]

NKS efficiently implemented in PETSc’s MPI-based distributed data structures


User code/PETSc library interactions

[Figure: the same PETSc/user code diagram, annotated with the calls between user code and the PETSc library; one annotation notes “Can be AD code” (automatic differentiation)]


1999 Bell Prize for unstructured grid computational aerodynamics

Implemented in PETSc (www.mcs.anl.gov/petsc)

[Figure: transonic “Lambda” shock, Mach contours on surfaces; mesh c/o D. Mavriplis, ICASE]


Fixed-size parallel scaling results

Four orders of magnitude in 13 years

c/o K. Anderson, W. Gropp, D. Kaushik, D. Keyes and B. Smith

11M unknowns: 128 nodes, 43 min; 3072 nodes, 2.5 min, 226 Gf/s; 70% efficient
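The quoted efficiency can be checked from the slide's own numbers. Fixed-size (strong-scaling) efficiency is the achieved speedup divided by the ideal speedup; this small Python sketch computes it and gives about 72%, consistent with the quoted 70% given that the times are rounded.

```python
def fixed_size_efficiency(p_base, t_base, p, t):
    """Strong-scaling efficiency relative to a baseline run:
    achieved speedup (t_base / t) divided by ideal speedup (p / p_base)."""
    return (t_base / t) / (p / p_base)


if __name__ == "__main__":
    # Numbers from the slide: 128 nodes in 43 min, 3072 nodes in 2.5 min (11M unknowns).
    eff = fixed_size_efficiency(128, 43.0, 3072, 2.5)
    print(f"speedup {43.0 / 2.5:.1f}x on {3072 // 128}x the nodes -> efficiency {eff:.0%}")
```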


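Three “war stories” from magnetic fusion energy applications in SciDAC

Physical models based on fluid-like magnetohydrodynamics (MHD):

$\frac{\partial \mathbf{B}}{\partial t} = -\nabla \times \mathbf{E} + \kappa_{divb}\,\nabla \nabla \cdot \mathbf{B}$
$\mathbf{E} = -\mathbf{V} \times \mathbf{B} + \eta \mathbf{J}$
$\mu_0 \mathbf{J} = \nabla \times \mathbf{B}$
$\frac{\partial n}{\partial t} + \nabla \cdot (n\mathbf{V}) = \nabla \cdot D \nabla n$
$\rho\left(\frac{\partial \mathbf{V}}{\partial t} + \mathbf{V} \cdot \nabla \mathbf{V}\right) = \mathbf{J} \times \mathbf{B} - \nabla p + \nabla \cdot \nu \rho \nabla \mathbf{V}$
$\frac{n}{\gamma - 1}\left(\frac{\partial T}{\partial t} + \mathbf{V} \cdot \nabla T\right) = -p\,\nabla \cdot \mathbf{V} + \nabla \cdot n\left[(\chi_\parallel - \chi_\perp)\hat{\mathbf{b}}\hat{\mathbf{b}} + \chi_\perp \mathbf{I}\right] \cdot \nabla T + Q$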


Challenges in magnetic fusion

• Conditions of interest possess two properties that pose great challenges to numerical approaches: anisotropy and stiffness.

• Anisotropy produces subtle balances of large forces, and vastly different parallel and perpendicular transport properties.

• Stiffness reflects the vast range of time-scales in the system: targeted physics is slow (~transport scale) compared to waves.


Tokamak/stellarator simulations

Center for Extended MHD Modeling (based at Princeton Plasma Physics Lab)

M3D code: realistic toroidal geometry, unstructured mesh, hybrid FE/FD discretization; fields expanded in scalar potentials and streamfunctions; operator-split, linearized, with 11 potential solves in each poloidal cross-plane per step (90% of execution time); parallelized with PETSc (Tang et al., SIAM PP01; Chen et al., SIAM AN02; Jardin et al., SIAM CSE03)

Want from TOPS: now, a scalable linear implicit solver for much higher resolution (and for AMR); later, fully nonlinearly implicit solvers and coupling to other codes


Provided new solvers across existing interfaces:

Hypre in PETSc: codes with a PETSc interface (like CEMM’s M3D) can now invoke Hypre routines as solvers or preconditioners with a command-line switch

SuperLU_DIST in PETSc: as above, with SuperLU_DIST

Hypre in the AMR Chombo code: so far, Hypre is a level solver only; its AMG will ultimately be useful as a bottom solver, since it can be coarsened indefinitely without attention to loss of nested geometric structure; FAC is also being developed for AMR uses, like Chombo


Hypre: multilevel preconditioning

[Figure: a multigrid V-cycle. A smoother acts on the finest grid; restriction transfers from the fine grid to the first coarse grid, which has fewer cells (less work and storage); prolongation transfers from the coarse grid back to the fine grid]

Recursively apply this idea until we have an easy problem to solve
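Hypre's AMG builds its grid hierarchy algebraically; the following Python sketch instead shows the simpler geometric V-cycle that the figure depicts. Assumptions: the 1D model problem $-u'' = 1$ with zero boundary values, $n = 2^k - 1$ interior points, two weighted-Jacobi sweeps before and after each coarse-grid correction, full-weighting restriction, linear-interpolation prolongation, and an exact solve on the one-point coarsest grid. The residual reduction per cycle comes out essentially independent of $n$, the "optimal" behavior discussed earlier.

```python
def residual(u, f, h):
    """r = f - A u for A = tridiag(-1, 2, -1) / h^2 with zero Dirichlet boundaries."""
    n = len(u)
    return [f[i] - (2.0 * u[i] - (u[i - 1] if i > 0 else 0.0)
                    - (u[i + 1] if i < n - 1 else 0.0)) / (h * h) for i in range(n)]


def smooth(u, f, h, sweeps=2, omega=2.0 / 3.0):
    """Weighted Jacobi: damps the oscillatory part of the error."""
    n = len(u)
    for _ in range(sweeps):
        u = [(1.0 - omega) * u[i] + omega * 0.5 * ((u[i - 1] if i > 0 else 0.0)
             + (u[i + 1] if i < n - 1 else 0.0) + h * h * f[i]) for i in range(n)]
    return u


def restrict(r):
    """Full weighting, fine grid (2m+1 points) -> coarse grid (m points)."""
    return [0.25 * r[2 * j] + 0.5 * r[2 * j + 1] + 0.25 * r[2 * j + 2]
            for j in range((len(r) - 1) // 2)]


def prolong(e, n_fine):
    """Linear interpolation, coarse grid -> fine grid (zero boundary values)."""
    fine = [0.0] * n_fine
    for j, v in enumerate(e):
        fine[2 * j + 1] += v
        fine[2 * j] += 0.5 * v
        fine[2 * j + 2] += 0.5 * v
    return fine


def v_cycle(u, f, h):
    n = len(u)
    if n == 1:                                   # coarsest grid: solve 2u/h^2 = f exactly
        return [0.5 * h * h * f[0]]
    u = smooth(u, f, h)                          # pre-smoothing
    rc = restrict(residual(u, f, h))             # restriction of the residual
    ec = v_cycle([0.0] * len(rc), rc, 2.0 * h)   # recurse on the coarse error equation
    u = [ui + ei for ui, ei in zip(u, prolong(ec, n))]  # prolongation and correction
    return smooth(u, f, h)                       # post-smoothing


def convergence_factor(n, cycles=5):
    """Average residual reduction per V-cycle for -u'' = 1 on n = 2^k - 1 points."""
    h = 1.0 / (n + 1)
    f = [1.0] * n
    u = [0.0] * n
    r0 = max(map(abs, residual(u, f, h)))
    for _ in range(cycles):
        u = v_cycle(u, f, h)
    return (max(map(abs, residual(u, f, h))) / r0) ** (1.0 / cycles)


if __name__ == "__main__":
    for n in (31, 255, 2047):
        print(f"n = {n:5d}: residual reduction per V-cycle {convergence_factor(n):.3f}")
```

In AMG the grids and transfer operators are instead constructed from the matrix entries, which is what allows it to handle unstructured and anisotropic problems.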


Hypre’s AMG in M3D

The PETSc-based PPPL code M3D has been retrofit with Hypre’s algebraic MG solver of Ruge-Stüben type. Iteration count results below are averaged over 19 different PETSc SLESSolve calls in initialization and one timestep loop for this operator-split unsteady code; the abscissa is the number of processors in the scaled problem; problem size ranges from 12K to 303K unknowns (approx. 4K per processor)

[Figure: average iteration count (0 to 700) versus number of processors (3, 12, 27, 48, 75) for ASM-GMRES and AMG-FGMRES]


Hypre’s AMG in M3D

Scaled speedup timing results below are summed over 19 different PETSc SLESSolve calls in initialization and one timestep loop for this operator-split unsteady code

The majority of AMG cost is coarse-grid formation (preprocessing), which does not scale as well as the inner-loop V-cycle phase; in production, these coarse hierarchies will be saved for reuse (the same linear systems are called in each timestep loop), making AMG much less expensive and more scalable

[Figure: execution time (0 to 60) versus number of processors (3, 12, 27, 48, 75) for ASM-GMRES, AMG-FGMRES, and AMG inner (est)]


Hypre’s “Conceptual Interfaces”

[Figure: linear system interfaces connecting data layouts (structured, composite, block-structured, unstructured, CSR) to linear solvers (GMG, ..., FAC, ..., Hybrid, ..., AMGe, ..., ILU, ...)]

(Slide c/o E. Chow, LLNL)


SuperLU in NIMROD

NIMROD is another MHD code in the CEMM collaboration: it employs high-order elements on unstructured grids, and showed very poor convergence with the default Krylov solver on 2D poloidal crossplane linear solves

TOPS wired in SuperLU, just to try a sparse direct solver: a speedup of more than 10 in serial, and about 8 on a modest parallel cluster (24 procs); PI Dalton Schnack (General Atomics) thought he had entered a time machine

SuperLU is not a “final answer”, but a sanity check; parallel ILU under Krylov should be superior


2D Hall MHD sawtooth instability (PETSc examples /snes/ex29.c and /sles/ex31.c)

Model equations: (Porcelli et al., 1993, 1999)

Equilibrium:

[Figure: vorticity at early time and at later time, with zoom; figures c/o A. Bhattacharjee, CMRS]


PETSc’s DMMG in Hall MR application

[Figure: implicit code (snes/ex29.c) versus explicit code (sles/ex31.c), both with second-order integration in time]

[Figure: implicit code (snes/ex29.c) with first- and second-order integration in time]


Abstract Gantt Chart for TOPS

[Figure: Gantt chart over time with stages algorithmic development, research implementations, hardened codes, applications integration, and dissemination; examples shown include ASPIN, TOPSLib, and PETSc]

Each colored module represents an algorithmic research idea on its way to becoming part of a supported community software tool. At any moment (vertical time slice), TOPS has work underway at multiple levels. While some codes are in applications already, they are being improved in functionality and performance as part of the TOPS research agenda.


Jacobian-free Newton-Krylov

In the Jacobian-Free Newton-Krylov (JFNK) method, a Krylov method solves the linear Newton correction equation, requiring Jacobian-vector products

These products, which are Fréchet (directional) derivatives of $F$, are approximated by the finite difference
$J(u)\,v \approx \frac{1}{\epsilon}\left[F(u + \epsilon v) - F(u)\right]$
(where $\epsilon$ is chosen with a fine balance between approximation and floating-point rounding error) or computed by automatic differentiation, so that the actual Jacobian elements are never explicitly needed

One builds the Krylov space on a true $F'(u)$ (to within numerical approximation)
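A minimal Python sketch of the Jacobian-free product, assuming a hypothetical residual $F(u) = -u'' + e^u - 1$ (finite differences, $h^2$ scaled out, zero boundary values) so that the finite-difference product can be checked against the analytic one. The choice $\epsilon = \sqrt{\varepsilon_{\text{mach}}}\,(1 + \|u\|)/\|v\|$ is one common heuristic for the balance described above, used here as an assumption rather than as the method any particular code uses.

```python
import math
import sys


def F(u):
    """Hypothetical nonlinear residual: discrete -u'' + exp(u) - 1 (h^2 scaled out, zero BCs)."""
    n = len(u)
    return [2.0 * u[i] - (u[i - 1] if i > 0 else 0.0) - (u[i + 1] if i < n - 1 else 0.0)
            + math.exp(u[i]) - 1.0 for i in range(n)]


def jv_exact(u, v):
    """Analytic J(u) v, used here only to check the finite-difference product."""
    n = len(u)
    return [2.0 * v[i] - (v[i - 1] if i > 0 else 0.0) - (v[i + 1] if i < n - 1 else 0.0)
            + math.exp(u[i]) * v[i] for i in range(n)]


def jv_fd(F, u, v, Fu=None):
    """Jacobian-free product J(u) v ~ [F(u + eps v) - F(u)] / eps.

    eps = sqrt(machine epsilon) * (1 + ||u||) / ||v|| is one common heuristic
    (an assumption here) balancing truncation error against rounding error.
    """
    if Fu is None:
        Fu = F(u)  # in a Newton-Krylov code F(u) is already available
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_v == 0.0:
        return [0.0] * len(u)
    eps = math.sqrt(sys.float_info.epsilon) * (1.0 + norm_u) / norm_v
    Fp = F([ui + eps * vi for ui, vi in zip(u, v)])
    return [(a - b) / eps for a, b in zip(Fp, Fu)]


if __name__ == "__main__":
    u = [0.1 * i for i in range(10)]
    v = [math.sin(i) for i in range(10)]
    approx, exact = jv_fd(F, u, v), jv_exact(u, v)
    print("max abs error:", max(abs(a - e) for a, e in zip(approx, exact)))
```

Each Krylov iteration then costs one extra residual evaluation instead of a stored Jacobian; the error is roughly $\sqrt{\varepsilon_{\text{mach}}}$ relative, which is usually adequate for the inner linear solve.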


Philosophy of Jacobian-free NK

To evaluate the linear residual, we use the true $F'(u)$, giving a true Newton step and asymptotic quadratic Newton convergence

To precondition the linear residual, we do anything convenient that uses understanding of the dominant physics/mathematics in the system and respects the limitations of the parallel computer architecture and the cost of various operations:
• Jacobian of lower-order discretization
• Jacobian with “lagged” values for expensive terms
• Jacobian stored in lower precision
• Jacobian blocks decomposed for parallelism
• Jacobian of related discretization
• operator-split Jacobians
• physics-based preconditioning


Recall the idea of preconditioning: Krylov iteration is expensive in memory and in function evaluations, so the subspace dimension $k$ must be kept small in practice, through preconditioning the Jacobian with an approximate inverse, so that the product matrix has a low condition number in
$(B^{-1}A)\,x = B^{-1}b$

Given the ability to apply the action of $B^{-1}$ to a vector, preconditioning can be done on either the left, as above, or the right, as in, e.g., for matrix-free:
$J B^{-1} v \approx \frac{1}{\epsilon}\left[F(u + \epsilon B^{-1} v) - F(u)\right]$


Physics-based preconditioning

In Newton iteration, one seeks to obtain a correction (“delta”) to the solution by inverting the Jacobian matrix on (the negative of) the nonlinear residual:
$\delta u^k = -[J(u^k)]^{-1} F(u^k)$

A typical operator-split code also derives a “delta” to the solution, by some implicitly defined means, through a series of implicit and explicit substeps:
$F(u^k) \mapsto \delta u^k$

This implicitly defined mapping from residual to “delta” is a natural preconditioner

Software must accommodate this!


Physics-based preconditioning

We consider a standard “dynamical core,” the shallow-water wave splitting algorithm, as a solver

It leaves a first-order-in-time splitting error

In the Jacobian-free Newton-Krylov framework, this solver, which maps a residual into a correction, can be regarded as a preconditioner

The true Jacobian is never formed, yet the time-implicit nonlinear residual at each time step can be made as small as needed for nonlinear consistency in long time integrations


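Example: shallow water equations

Continuity (*): $\frac{\partial \phi}{\partial t} + \frac{\partial (u\phi)}{\partial x} = 0$

Momentum (**): $\frac{\partial (u\phi)}{\partial t} + \frac{\partial (u^2\phi)}{\partial x} + g\phi\,\frac{\partial \phi}{\partial x} = 0$

These equations admit a fast gravity wave, as can be seen by cross-differentiating, e.g., (*) by $t$ and (**) by $x$, and subtracting:

$\frac{\partial^2 \phi}{\partial t^2} - g\phi\,\frac{\partial^2 \phi}{\partial x^2} = \text{other terms}$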


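1D shallow water equations, cont.

Wave equation for geopotential:

$$\frac{\partial^2 \phi}{\partial t^2} - g\phi\frac{\partial^2 \phi}{\partial x^2} = \text{other terms}$$

Gravity wave speed $\sqrt{g\phi}$

Typically $\sqrt{g\phi} \gg u$, but for an explicit method, stability restrictions would require timesteps based on the Courant-Friedrichs-Lewy (CFL) criterion for the fastest wave, roughly $\Delta t \lesssim \Delta x / (|u| + \sqrt{g\phi})$, rather than for the much slower flow speed $u$

One can solve fully implicitly, or one can filter out the gravity wave by solving semi-implicitly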
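A quick calculation shows how much the gravity wave costs an explicit scheme. This is a minimal sketch with hypothetical helper names and illustrative values: $\phi$ is treated as a depth-like quantity in meters (so the wave speed is $\sqrt{g\phi}$, as in the slide's notation), and a Courant number of 1 is assumed; the actual stability constant depends on the scheme.

```python
import math


def explicit_dt_limit(dx, u, phi, g=9.81, courant=1.0):
    """CFL-limited explicit step, set by the fastest wave: dx / (|u| + sqrt(g*phi))."""
    return courant * dx / (abs(u) + math.sqrt(g * phi))


def advective_dt_limit(dx, u, courant=1.0):
    """Step that would be allowed if only the flow speed |u| had to be resolved."""
    return courant * dx / abs(u)


# Illustrative (hypothetical) values: dx = 10 km, u = 10 m/s, phi = 4000 m
dx, u, phi = 1.0e4, 10.0, 4000.0
dt_explicit = explicit_dt_limit(dx, u, phi)
dt_advective = advective_dt_limit(dx, u)
print(f"gravity-wave speed ~ {math.sqrt(9.81 * phi):.1f} m/s")
print(f"explicit CFL step ~ {dt_explicit:.1f} s, advective step ~ {dt_advective:.1f} s")
```

With these numbers the explicit step is about 48 s, while resolving the flow alone would allow 1000 s, a factor of roughly 20 that a semi-implicit treatment of the gravity wave aims to recover.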


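1D shallow water equations, cont.

Continuity (*):

$$\phi^{n+1} - \phi^n + \frac{\partial (u\phi)^{n+1}}{\partial x} = 0$$

Momentum (**):

$$(u\phi)^{n+1} - (u\phi)^n + \frac{\partial (u^2\phi)^n}{\partial x} + g\phi^n\frac{\partial \phi^{n+1}}{\partial x} = 0$$

Solving (**) for $(u\phi)^{n+1}$ and substituting into (*),

$$\phi^{n+1} - g\frac{\partial}{\partial x}\left(\phi^n\frac{\partial \phi^{n+1}}{\partial x}\right) = \phi^n - \frac{\partial S^n}{\partial x}$$

where

$$S^n = (u\phi)^n - \frac{\partial (u^2\phi)^n}{\partial x}$$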
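The substitution can be written out using only the two discrete equations (*) and (**). Solving (**) for the new momentum gives

$$
(u\phi)^{n+1} = \underbrace{(u\phi)^n - \frac{\partial (u^2\phi)^n}{\partial x}}_{S^n} - g\phi^n\frac{\partial \phi^{n+1}}{\partial x},
$$

and inserting this into (*) yields

$$
\phi^{n+1} - \phi^n + \frac{\partial S^n}{\partial x} - g\frac{\partial}{\partial x}\left(\phi^n\frac{\partial \phi^{n+1}}{\partial x}\right) = 0,
$$

which rearranges to the scalar equation for $\phi^{n+1}$ with right-hand side $\phi^n - \partial S^n/\partial x$.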


1D shallow water equations, cont.

After the parabolic equation is spatially discretized and solved for $\phi^{n+1}$, $(u\phi)^{n+1}$ can be found from

$$(u\phi)^{n+1} = -g\phi^n\frac{\partial \phi^{n+1}}{\partial x} + S^n$$

One scalar parabolic solve and one scalar explicit update replace an implicit hyperbolic system

This semi-implicit operator splitting is foundational to multiple-scale problems in geophysical modeling

Similar tricks are employed in aerodynamics (sound waves), MHD (multiple Alfvén waves), reacting flows (fast kinetics), etc.

Temporal truncation error remains due to the lagging of the advection term $\partial(u^2\phi)^n/\partial x$ at time level $n$ in (**)

To be dealt with shortly
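One semi-implicit step can be sketched in Python/NumPy. Assumptions, all labeled here: a periodic 1D grid, a dense centered-difference matrix built by a hypothetical helper `periodic_ddx`, the variable `m` standing for $u\phi$, and an explicit time step `dt` carried through the formulas (the slide's discrete equations correspond to `dt = 1`). A production code would use a sparse or multigrid solve instead of `np.linalg.solve`.

```python
import numpy as np


def periodic_ddx(n, dx):
    """Dense periodic centered-difference matrix: (D x)_i = (x_{i+1} - x_{i-1}) / (2 dx)."""
    D = np.zeros((n, n))
    for i in range(n):
        D[i, (i + 1) % n] += 1.0 / (2.0 * dx)
        D[i, (i - 1) % n] -= 1.0 / (2.0 * dx)
    return D


def semi_implicit_step(phi, m, dt, dx, g=9.81):
    """Advance (phi, m), m = u*phi, by one semi-implicit step.

    Gravity-wave terms use phi^{n+1}; advection is lagged at level n.
    """
    n = phi.size
    D = periodic_ddx(n, dx)
    # S^n = (u phi)^n - dt d/dx (u^2 phi)^n, with u^2 phi = m^2 / phi
    S = m - dt * (D @ (m * m / phi))
    # Discrete  phi^{n+1} - dt^2 g d/dx(phi^n d/dx phi^{n+1}) = phi^n - dt dS/dx
    A = np.eye(n) - dt**2 * g * (D @ np.diag(phi) @ D)
    phi_new = np.linalg.solve(A, phi - dt * (D @ S))   # one scalar implicit solve
    m_new = S - dt * g * phi * (D @ phi_new)           # one scalar explicit update
    return phi_new, m_new


# Illustrative data: depth ~100 m with a small bump, uniform u = 1 m/s.
n, L = 64, 1.0e5
dx = L / n
x = np.arange(n) * dx
phi0 = 100.0 + np.exp(-((x - 0.5 * L) / (0.05 * L)) ** 2)
m0 = 1.0 * phi0
dt = 200.0  # roughly 4x the explicit gravity-wave CFL limit for this grid
phi1, m1 = semi_implicit_step(phi0, m0, dt, dx)
```

Because the same matrix `D` appears in the solve and in the update, the discrete continuity equation holds to round-off and total mass is conserved on the periodic grid.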


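1D Shallow water preconditioning

Define continuity residual for each timestep:

$$R_\phi \equiv \phi^{n+1} - \phi^n + \frac{\partial (u\phi)^{n+1}}{\partial x}$$

Define momentum residual for each timestep:

$$R_{u\phi} \equiv (u\phi)^{n+1} - (u\phi)^n + \frac{\partial (u^2\phi)^n}{\partial x} + g\phi^n\frac{\partial \phi^{n+1}}{\partial x}$$

Continuity delta-form (*):

$$\delta\phi + \frac{\partial\,[\delta(u\phi)]}{\partial x} = -R_\phi$$

Momentum delta-form (**):

$$\delta(u\phi) + g\phi^n\frac{\partial\,[\delta\phi]}{\partial x} = -R_{u\phi}$$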


1D Shallow water preconditioning, cont.

Solving (**) for $\delta(u\phi)$ and substituting into (*),

$$\delta\phi - g\frac{\partial}{\partial x}\left(\phi^n\frac{\partial\,\delta\phi}{\partial x}\right) = -R_\phi + \frac{\partial R_{u\phi}}{\partial x}$$

After this parabolic equation is solved for $\delta\phi$, we have

$$\delta(u\phi) = -g\phi^n\frac{\partial\,\delta\phi}{\partial x} - R_{u\phi}$$

This completes the application of the preconditioner to one Newton-Krylov iteration at one timestep

Of course, the parabolic solve need not be done exactly; one sweep of multigrid can be used

See paper by Mousseau et al. (2002) for impressive results for long-time weather integration
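The preconditioner application can be sketched in Python/NumPy under the same assumptions as a semi-implicit step: a periodic 1D grid, a dense centered-difference matrix `D` from a hypothetical helper `periodic_ddx`, `m` standing for $u\phi$, and an explicit `dt` (the slide's equations correspond to `dt = 1`). The names `residuals` and `apply_physics_pc` are illustrative. Because these residuals lag advection at level $n$, their Jacobian coincides with the delta-form operator, so one application drives both residuals to round-off; if advection were also taken at $n+1$, the same map would only be approximate and would serve as the preconditioner inside the Krylov solve.

```python
import numpy as np


def periodic_ddx(n, dx):
    """Dense periodic centered-difference matrix: (D x)_i = (x_{i+1} - x_{i-1}) / (2 dx)."""
    D = np.zeros((n, n))
    for i in range(n):
        D[i, (i + 1) % n] += 1.0 / (2.0 * dx)
        D[i, (i - 1) % n] -= 1.0 / (2.0 * dx)
    return D


def residuals(phi_new, m_new, phi_old, m_old, dt, D, g=9.81):
    """R_phi and R_{u phi} for one step (advection lagged at level n, m = u*phi)."""
    R_phi = phi_new - phi_old + dt * (D @ m_new)
    R_m = (m_new - m_old + dt * (D @ (m_old**2 / phi_old))
           + dt * g * phi_old * (D @ phi_new))
    return R_phi, R_m


def apply_physics_pc(r_phi, r_m, phi_old, dt, D, g=9.81):
    """Residual-to-delta map: solve  dphi + dt*D dm = r_phi,  dm + dt*g*phi^n*D dphi = r_m
    with one scalar implicit solve for dphi followed by one explicit update for dm."""
    n = phi_old.size
    A = np.eye(n) - dt**2 * g * (D @ np.diag(phi_old) @ D)
    dphi = np.linalg.solve(A, r_phi - dt * (D @ r_m))
    dm = r_m - dt * g * phi_old * (D @ dphi)
    return dphi, dm


# One preconditioned correction starting from the guess (phi^n, m^n); illustrative data.
n, L, dt = 64, 1.0e5, 200.0
dx = L / n
x = np.arange(n) * dx
D = periodic_ddx(n, dx)
phi_old = 100.0 + np.exp(-((x - 0.5 * L) / (0.05 * L)) ** 2)
m_old = 1.0 * phi_old
R_phi, R_m = residuals(phi_old, m_old, phi_old, m_old, dt, D)
dphi, dm = apply_physics_pc(-R_phi, -R_m, phi_old, dt, D)
phi_new, m_new = phi_old + dphi, m_old + dm
R_phi1, R_m1 = residuals(phi_new, m_new, phi_old, m_old, dt, D)
print(np.linalg.norm(R_phi1), np.linalg.norm(R_m1))
```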


Physics-based preconditioning update

So far, physics-based preconditioning has been applied to several codes at Los Alamos, in an effort led by D. Knoll

Summarized in new J. Comp. Phys. paper by Knoll & Keyes (Jan 2004)

PETSc’s “shell preconditioner” is designed for inserting physics-based preconditioners, and PETSc’s solvers underneath are building blocks


Nonlinear Schwarz preconditioning

Nonlinear Schwarz has Newton both inside and outside and is fundamentally Jacobian-free

It replaces $F(u) = 0$ with a new nonlinear system possessing the same root, $\Phi(u) = 0$

Define a correction $\delta_i(u)$ to the $i$th partition (e.g., subdomain) of the solution vector by solving the following local nonlinear system:

$$R_i F(u + \delta_i(u)) = 0$$

where $R_i$ restricts a vector to the components of the $i$th partition, and $\delta_i(u) \in \mathbb{R}^n$ is nonzero only in those components

Then sum the corrections, $\Phi(u) = \sum_i \delta_i(u)$, to get an implicit function of $u$


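Nonlinear Schwarz – picture

[Figure: the restriction $R_i$, a 0/1 selection matrix, picks out the components $R_i u$ of $u$ and $R_i F$ of $F(u)$ belonging to partition $i$]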


Nonlinear Schwarz – picture

[Figure: the same construction for two partitions, with restrictions $R_i$ and $R_j$ giving $R_i u$, $R_i F$, $R_j u$, and $R_j F$]


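Nonlinear Schwarz – picture

[Figure: two partitions $R_i$ and $R_j$ with local pieces $R_i u$, $R_i F$, $R_j u$, $R_j F$, a local derivative $F_i'(u_i)$, and the summed correction $\delta_i u + \delta_j u$]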


Nonlinear Schwarz, cont.

It is simple to prove that if the Jacobian of $F(u)$ is nonsingular in a neighborhood of the desired root, then $\Phi(u) = 0$ and $F(u) = 0$ have the same unique root

To lead to a Jacobian-free Newton-Krylov algorithm we need to be able to evaluate, for any $u, v \in \mathbb{R}^n$:

The residual $\Phi(u) = \sum_i \delta_i(u)$

The Jacobian-vector product $\Phi'(u)v$

Remarkably (Cai-Keyes, 2000), it can be shown that

$$\Phi'(u)v \approx -\sum_i R_i^T (J_i)^{-1} R_i J v$$

where $J = F'(u)$ and $J_i = R_i J R_i^T$; the minus sign corresponds to the $u + \delta_i(u)$ convention for the local problems. All required actions are available in terms of $F(u)$!
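A small dense sketch makes the construction concrete. It is in the spirit of ASPIN but is not the Cai-Keyes implementation: the toy problem $F(u) = Au + u^3 - b$, the two non-overlapping index blocks used as partitions, and the analytic Jacobian `J` used inside the local Newton solves are all illustrative assumptions, and the helper names are hypothetical. A real code would stay matrix-free and use a Krylov method for the outer solve.

```python
import numpy as np


def local_correction(F, J, u, idx, tol=1e-12, maxit=50):
    """Solve R_i F(u + delta_i) = 0 by Newton, with delta_i nonzero only on idx."""
    delta = np.zeros_like(u)
    for _ in range(maxit):
        r = F(u + delta)[idx]                   # R_i F(u + delta_i)
        if np.linalg.norm(r) < tol:
            break
        Ji = J(u + delta)[np.ix_(idx, idx)]     # R_i J R_i^T
        delta[idx] -= np.linalg.solve(Ji, r)
    return delta


def Phi(F, J, u, parts):
    """Nonlinearly preconditioned residual Phi(u) = sum_i delta_i(u)."""
    return sum(local_correction(F, J, u, idx) for idx in parts)


def Phi_jacobian(J, u, parts):
    """Dense version of Phi'(u) ~ -sum_i R_i^T J_i^{-1} R_i J (for illustration)."""
    Ju = J(u)
    P = np.zeros_like(Ju)
    for idx in parts:
        P[idx, :] -= np.linalg.solve(Ju[np.ix_(idx, idx)], Ju[idx, :])
    return P


def aspin(F, J, u0, parts, tol=1e-10, maxit=20):
    """Newton's method applied to Phi(u) = 0 instead of F(u) = 0."""
    u = np.array(u0, dtype=float)
    for _ in range(maxit):
        if np.linalg.norm(F(u)) < tol:
            break
        u = u - np.linalg.solve(Phi_jacobian(J, u, parts), Phi(F, J, u, parts))
    return u


# Hypothetical test problem and a two-subdomain partition.
n = 20
A = 4.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
F = lambda u: A @ u + u**3 - b
J = lambda u: A + np.diag(3.0 * u**2)
parts = [np.arange(0, 10), np.arange(10, 20)]
u = aspin(F, J, np.zeros(n), parts)
print(np.linalg.norm(F(u)))
```

At the computed root both $F$ and $\Phi$ vanish, and a finite difference of $\Phi$ agrees with the approximate Jacobian formula, including its sign.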


Experimental example of nonlinear Schwarz

[Figure: convergence of Newton's method vs. Additive Schwarz Preconditioned Inexact Newton (ASPIN). Newton's method: difficulty at critical Re, stagnation beyond critical Re. ASPIN: convergence for all Re.]


The 2003 SCaLeS initiative

Workshop on a Science-based Case for Large-scale Simulation

Arlington, VA

24-25 June 2003


Charge (April 2003, W. Polansky, DOE): “Identify rich and fruitful directions for the computational sciences from the perspective of scientific and engineering applications”

Build a “strong science case for an ultra-scale computing capability for the Office of Science”

“Address major opportunities and challenges facing computational sciences in areas of strategic importance to the Office of Science”

“Report by July 30, 2003”


Chapter 1. Introduction

Chapter 2. Scientific Discovery through Advanced Computing: a Successful Pilot Program

Chapter 3. Anatomy of a Large-scale Simulation

Chapter 4. Opportunities at the Scientific Horizon

Chapter 5. Enabling Mathematics and Computer Science Tools

Chapter 6. Recommendations and Discussion

Volume 2 (due out early 2004):

11 chapters on applications

8 chapters on mathematical methods

8 chapters on computer science and infrastructure

First fruits!


“There will be opened a gateway and a road to a large and excellent science into which minds more piercing than mine shall penetrate to recesses still deeper.”

Galileo (1564-1642) (on ‘experimental mathematical analysis of nature,’ appropriated here for ‘simulation science’)


Related URLs

TOPS project: http://www.tops-scidac.org

SciDAC initiative: http://www.science.doe.gov/scidac

SCaLeS report: http://www.pnl.gov/scales