Canonical Problems and Scalable Solvers for Numerical PDE Robert C. Kirby Department of Computer...

Canonical Problems and Scalable Solvers for Numerical PDE

Robert C. Kirby

Department of Computer Science

The University of Chicago

CMSC34000 Lecture 2, 6 Jan 2005

Outline

Canonical problems in Scientific Computing
  Algebraic equations
  Differential equations
  Optimization

Overview of SciDAC/TOPS
  Hard problems
  Cutting edge research
  Funding opportunities


Canonical Problems

Problem                     Equation
Linear systems              $Ax = b$
Eigenvalue                  $Ax = \lambda Bx$
Ordinary DE                 $f(\dot{x}, x, t, p) = 0$
Nonlinear systems           $F(x) = 0$
Constrained optimization    $\min_u \phi(x,u)$ s.t. $F(x,u) = 0$


Linear Systems

$Ax = b$
  PDE --> Linear system
  Nonlinear system --> Sequence of linear steps

Millions of unknowns
Sparse
Ill-conditioned: $\kappa(A) = \|A\|\,\|A^{-1}\|$
Nonsymmetric
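To make the ill-conditioning concrete, the following minimal sketch evaluates the 2-norm condition number of the standard 1D finite-difference Laplacian, using the closed-form eigenvalues of the tridiagonal matrix. The model problem and the function name `laplacian_condition_number` are illustrative assumptions, not part of the lecture.

```python
import math

def laplacian_condition_number(n):
    """2-norm condition number of (1/h^2) * tridiag(-1, 2, -1), with h = 1/(n+1).

    The matrix is symmetric positive definite, so the condition number is
    the ratio of its largest to smallest eigenvalue. The eigenvalues are
    (4/h^2) * sin^2(k*pi*h/2) for k = 1..n.
    """
    h = 1.0 / (n + 1)
    lam_min = 4.0 / h**2 * math.sin(math.pi * h / 2.0) ** 2
    lam_max = 4.0 / h**2 * math.sin(n * math.pi * h / 2.0) ** 2
    return lam_max / lam_min

# Doubling the number of unknowns roughly quadruples the condition number.
for n in (100, 200, 400):
    print(n, round(laplacian_condition_number(n)))
```

The growth like 1/h^2 is one reason unpreconditioned iterative methods slow down as the mesh is refined.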


Solution techniques

Direct (Gaussian elimination & variants)
Iterative (sequence of approximations)
  Jacobi / Gauss-Seidel
  SOR
  Krylov-subspace methods (sequence of clever matrix-vector products builds up good approximation)
  Multigrid (algebraic / geometric)

Parallelism
  Each process owns its own rows
  Matrix action --> communication
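As a small illustration of the stationary iterations listed above, here is a sketch of Jacobi and Gauss-Seidel sweeps for the 1D model problem -u'' = f with zero boundary values. The model problem, the tolerance, and all function names are assumptions chosen for illustration.

```python
def jacobi_sweep(u, f, h):
    """One Jacobi sweep: every update uses only values from the previous sweep."""
    n = len(u)
    new = [0.0] * n
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        new[i] = 0.5 * (left + right + h * h * f[i])
    return new

def gauss_seidel_sweep(u, f, h):
    """One Gauss-Seidel sweep: updates are used as soon as they are available."""
    u = list(u)
    n = len(u)
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        u[i] = 0.5 * (left + right + h * h * f[i])
    return u

def residual_norm(u, f, h):
    """Max-norm of f - A u for A = (1/h^2) tridiag(-1, 2, -1)."""
    n = len(u)
    worst = 0.0
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        worst = max(worst, abs(f[i] - (2.0 * u[i] - left - right) / (h * h)))
    return worst

def sweeps_to_converge(sweep, n, tol=1e-6, max_sweeps=100000):
    """Count sweeps needed to reduce the residual by the factor tol."""
    h = 1.0 / (n + 1)
    f = [1.0] * n
    u = [0.0] * n
    r0 = residual_norm(u, f, h)
    count = 0
    while residual_norm(u, f, h) > tol * r0 and count < max_sweeps:
        u = sweep(u, f, h)
        count += 1
    return count

print("Jacobi sweeps:      ", sweeps_to_converge(jacobi_sweep, 15))
print("Gauss-Seidel sweeps:", sweeps_to_converge(gauss_seidel_sweep, 15))
```

Both counts grow quickly as n increases, which is the motivation for the Krylov and multigrid methods in the same list.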



Ordinary differential equations

Time rate of change of a system of quantities
  Chemical/nuclear reactions
  N-body problems (celestial mechanics)
  Time-dependent PDE (after spatial discretization)

Algebraic equations at each time step

“Stiffness”
  Wide range of time scales requires special treatment
  Fastest modes require smallest time steps

Multistep methods
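The stiffness remark can be illustrated on the scalar test equation y' = -lam * y. The sketch below compares forward and backward Euler; the values lam = 1000 and the step sizes are hypothetical choices for illustration only.

```python
def forward_euler(lam, h, steps, y0=1.0):
    """Explicit Euler for y' = -lam*y: each step multiplies y by (1 - h*lam)."""
    y = y0
    for _ in range(steps):
        y = y + h * (-lam * y)
    return y

def backward_euler(lam, h, steps, y0=1.0):
    """Implicit Euler for y' = -lam*y: each step divides y by (1 + h*lam).

    For this linear problem the implicit equation is solved in closed form;
    in general it is an algebraic system at each time step.
    """
    y = y0
    for _ in range(steps):
        y = y / (1.0 + h * lam)
    return y

lam = 1000.0  # fast decay rate (hypothetical)
print("forward,  h=0.01  :", forward_euler(lam, 0.01, 20))    # |1 - h*lam| = 9, grows
print("backward, h=0.01  :", backward_euler(lam, 0.01, 20))   # decays for any h > 0
print("forward,  h=0.0005:", forward_euler(lam, 0.0005, 20))  # stable only for small h
```

The explicit method is stable only when h is below 2/lam, so the fastest mode dictates the step even when it has already decayed and is of no interest.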


ODE Techniques

Euler methods (forward, backward)
Runge-Kutta methods
Multi-adaptive Galerkin (Anders Logg)
Parallelism?
  Depends on coupling of components
  Naturally arises from PDE
  Parallelism in time is more difficult…


PDE

Multiple dependent variables
Algebraic equations
If time dependent, get ODE or DAE at each time step

$u \cdot \nabla u - \nu \Delta u + \nabla p = 0$
$\nabla \cdot u = 0$


Solution strategies

Finite differences (FDM)
  Replace derivatives at points with difference quotient
  Naturally leads to equations for each grid point
  Irregular, adaptive grids require special treatment

Finite elements (FEM)
  Replace “strong form” with “weak form”
  Weak form over subspaces leads to algebraic system
  Generalizes FDM (geometry, order)

Finite volumes
  Discrete conservation laws over each control volume
  Often “low order” but “stable”
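A minimal finite-difference sketch: replacing u'' by a difference quotient at each interior grid point of [0, 1] gives one equation per point, here a tridiagonal system solved by elimination. The test problem -u'' = pi^2 sin(pi x), with exact solution sin(pi x), and the function names are assumptions made for illustration.

```python
import math

def solve_poisson_fd(n, f):
    """Solve -u'' = f on (0,1), u(0)=u(1)=0, at n interior points.

    Row i of the system reads -u[i-1] + 2*u[i] - u[i+1] = h^2 * f(x_i).
    The tridiagonal system is solved by forward elimination and back
    substitution (the Thomas algorithm).
    """
    h = 1.0 / (n + 1)
    rhs = [h * h * f((i + 1) * h) for i in range(n)]
    diag = [2.0] * n
    for i in range(1, n):
        m = -1.0 / diag[i - 1]      # multiplier that eliminates the sub-diagonal -1
        diag[i] -= m * (-1.0)       # the super-diagonal entry of row i-1 is -1
        rhs[i] -= m * rhs[i - 1]
    u = [0.0] * n
    u[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):
        u[i] = (rhs[i] + u[i + 1]) / diag[i]
    return u

def max_error(n):
    """Max-norm error against the exact solution sin(pi*x)."""
    h = 1.0 / (n + 1)
    u = solve_poisson_fd(n, lambda x: math.pi**2 * math.sin(math.pi * x))
    return max(abs(u[i] - math.sin(math.pi * (i + 1) * h)) for i in range(n))

# Halving h should cut the error by about four (second-order accuracy).
print(max_error(15), max_error(31))
```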


Nonlinear equations

Arise from nonlinear PDE, ODE, etc.
Require Newton’s method
  Jacobian matrix (differentiation)
  Nonlinear preconditioning

$F(u) \approx F(u_c) + F'(u_c)\,\delta u = 0$
$u = u_c + \lambda\,\delta u$
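The update u = u_c + lambda * delta_u can be sketched for a scalar equation, with lambda chosen by simple step halving. The backtracking rule, the test function atan(u), and the name `damped_newton` are illustrative assumptions; in a PDE setting delta_u would come from a linear solve with the Jacobian.

```python
import math

def damped_newton(F, dF, u, tol=1e-12, max_iter=50):
    """Scalar Newton iteration with a backtracking damping factor lam.

    Each step solves the linearized equation F(u) + F'(u)*du = 0 for du,
    then halves lam until |F| decreases, and sets u <- u + lam*du.
    Returns the final iterate and the number of steps taken.
    """
    for k in range(max_iter):
        Fu = F(u)
        if abs(Fu) < tol:
            return u, k
        du = -Fu / dF(u)
        lam = 1.0
        while abs(F(u + lam * du)) >= abs(Fu) and lam > 1e-8:
            lam *= 0.5
        u = u + lam * du
    return u, max_iter

# Undamped Newton diverges for atan(u) = 0 from this starting point;
# the damped iteration converges to the root u = 0.
root, steps = damped_newton(math.atan, lambda u: 1.0 / (1.0 + u * u), 3.0)
print(root, steps)
```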


Optimization

Find the minimum of some function subject to some constraints

Constraints may themselves be PDE


Motivation

Solver performance is a major concern for parallel simulations based on PDE formulations
  … including many of those of the U.S. DOE Scientific Discovery through Advanced Computing (SciDAC) program

For target applications, implicit solvers may require 50% to 95% of execution time
  … at least, before “expert” overhaul for algorithmic optimality and implementation performance

Even after a “best manual practice” overhaul, the solver may still require 20% to 50% of execution time

The solver may hit up against both the processor scalability limit and the memory bandwidth limitation of a PDE-based application, before any other part of the code
  … the first of these is not fundamental, though the second one is


Presentation plan

Overview of the SciDAC initiative

Brief review of scalable implicit methods (domain decomposed multilevel iterative methods)
  algorithms
  software components: PETSc, Hypre, etc.

Overview of the Terascale Optimal PDE Simulations project (TOPS)

Three “war stories” from the SciDAC magnetically confined fusion energy portfolio

Some advanced research directions
  physics-based preconditioning
  nonlinear Schwarz

On the horizon


SciDAC apps and infrastructure

4 projects in high energy and nuclear physics
5 projects in fusion energy science
14 projects in biological and environmental research
10 projects in basic energy sciences
18 projects in scientific software and network infrastructure


“Enabling technologies” groups to develop reusable software and partner with application groups

From 2001 start-up, 51 projects share $57M/year
  Approximately one-third for applications
  A third for “integrated software infrastructure centers”
  A third for grid infrastructure and collaboratories

Plus, multi-Tflop/s IBM SP machines at NERSC and ORNL available for SciDAC researchers


Unclassified resources for DOE science

Machine      Site        System               Procs per node   Procs total   Tflop/s
“Seaborg”    Berkeley    IBM Power3+ SMP      16               6656          10
“Cheetah”    Oak Ridge   IBM Power4 Regatta   32               864           4.5


Designing a simulation code (from 2001 SciDAC report)

[Figure: simulation code design workflow with two feedback loops, a V&V loop and a performance loop]


A “perfect storm” for simulation

[Figure labels: Hardware Infrastructure; ARCHITECTURES; Applications; scientific models; numerical algorithms; computer architecture; scientific software engineering; dates shown: 1686, 1947, 1976 (dates are symbolic)]

“Computational science is undergoing a phase transition.” – D. Hitchcock, DOE


Imperative: multiple-scale applications

Multiple spatial scales
  interfaces, fronts, layers
  thin relative to domain size (thickness << L)
Multiple temporal scales
  fast waves
  small transit times relative to convection or diffusion (transit time << T)

Analyst must isolate dynamics of interest and model the rest in a system that can be discretized over a more modest range of scales

May lead to infinitely “stiff” subsystem requiring special treatment by the solution method

[Figure: Richtmyer-Meshkov instability, c/o A. Mirin, LLNL]


Examples: multiple-scale applications

Biopolymers, nanotechnology
  10^12 range in time, from 10^-15 sec (quantum fluctuation) to 10^-3 sec (molecular folding time)
  typical computational model ignores smallest scales, works on classical dynamics only, but scientists increasingly want both

Galaxy formation
  10^20 range in space, from binary star interactions to diameter of universe
  heroic computational model handles all scales with localized adaptive meshing

Supernovae simulation
  massive ranges in time and space scales for radiation, turbulent convection, diffusion, chemical reaction, nuclear reaction

[Figure: supernova simulation, c/o A. Mezzacappa, ORNL]


SciDAC portfolio characteristics

Multiple temporal scales
Multiple spatial scales
Linear ill conditioning
Complex geometry and severe anisotropy
Coupled physics, with essential nonlinearities
Ambition for uncertainty quantification, parameter estimation, and design

Need toolkit of portable, extensible, tunable implicit solvers, not “one-size fits all”


TOPS starting point codes

PETSc (ANL)
Hypre (LLNL)
Sundials (LLNL)
SuperLU (LBNL)
PARPACK (LBNL*)
TAO (ANL)
Veltisto (CMU)

Many interoperability connections between these packages that predated SciDAC
Many application collaborators that predated SciDAC


TOPS participants

[Figure: map of TOPS participants]

TOPS lab (3): ANL, LBNL, LLNL
TOPS university (7): CMU, CU, CU-B, NYU, ODU, UC-B, UT-K


In the old days, see “Templates” guides …
  www.netlib.org/etemplates
  www.netlib.org/templates
  (the two guides run to 124 pp. and 410 pp.)
… these are good starts, but not adequate for SciDAC scales!


“Integrated software infrastructure centers”

34 applications groups
7 ISIC groups (4 CS, 3 Math)
10 grid, data collaboratory groups

Math ISICs: adaptive gridding, discretization, solvers (TOPS)
CS ISICs: systems software, component architecture, performance engineering, data management

Solvers (TOPS):
  $f(\dot{x}, x, t, p) = 0$
  $F(x, p) = 0$
  $Ax = b$
  $Ax = \lambda Bx$
  $\min_u \phi(x,u)$ s.t. $F(x,u) = 0$
  software integration
  performance optimization


Keyword: “Optimal”

Convergence rate nearly independent of discretization parameters
  multilevel schemes for rapid linear convergence of linear problems
  Newton-like schemes for quadratic convergence of nonlinear problems

Convergence rate as independent as possible of physical parameters
  continuation schemes
  physics-based preconditioning

Optimal convergence plus scalable loop body yields scalable solver

[Figure: time to solution versus problem size (increasing with number of processors), contrasting an “unscalable” curve with a “scalable” one]


But where to go past O(N)?

Since O(N) is already optimal, there is nowhere further “upward” to go in efficiency, but one must extend optimality “outward,” to more general problems

Hence, for instance, algebraic multigrid (AMG) to seek to obtain O(N) in indefinite, anisotropic, or inhomogeneous problems on irregular grids

AMG Framework
  [Figure: the error space R^n split into error easily damped by pointwise relaxation and algebraically smooth error]
  Choose coarse grids, transfer operators, and smoothers to eliminate these “bad” components within a smaller dimensional space, and recur


Toolchain for PDE solvers in TOPS project

Design and implementation of “solvers”
  Time integrators (w/ sens. anal.): $f(\dot{x}, x, t, p) = 0$
  Nonlinear solvers (w/ sens. anal.): $F(x, p) = 0$
  Constrained optimizers: $\min_u \phi(x,u)$ s.t. $F(x,u) = 0$, $u \ge 0$
  Linear solvers: $Ax = b$
  Eigensolvers: $Ax = \lambda Bx$

Software integration
Performance optimization

[Figure: dependence diagram among Optimizer, Sens. Analyzer, Time integrator, Nonlinear solver, Eigensolver, and Linear solver; an arrow indicates dependence]


Dominant data structures are grid-based

  finite differences
  finite elements
  finite volumes

All lead to problems with sparse Jacobian matrices; many tasks can leverage off an efficient set of tools for manipulating distributed sparse data structures

[Figure: sparsity pattern of the Jacobian J; row i corresponds to node i of the grid]
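One common way to store such a sparse matrix is compressed sparse row (CSR) format, where row i holds only the nonzero couplings of node i. The sketch below builds the 1D three-point Laplacian in CSR and applies it to a vector; the array names `values`, `col_idx`, and `row_ptr` are conventional but are assumptions here, not taken from the slides or from any particular library.

```python
def laplacian_csr(n):
    """Build tridiag(-1, 2, -1) in CSR form.

    values[k]  : the k-th stored nonzero
    col_idx[k] : its column index
    row_ptr[i] : position in values where row i starts (row_ptr[n] = nnz)
    """
    values, col_idx, row_ptr = [], [], [0]
    for i in range(n):
        for j, a in ((i - 1, -1.0), (i, 2.0), (i + 1, -1.0)):
            if 0 <= j < n:          # skip couplings that fall outside the grid
                values.append(a)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr

def csr_matvec(values, col_idx, row_ptr, x):
    """y = A x, touching only the stored nonzeros of each row."""
    y = []
    for i in range(len(row_ptr) - 1):
        s = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            s += values[k] * x[col_idx[k]]
        y.append(s)
    return y

vals, cols, ptr = laplacian_csr(5)
print(csr_matvec(vals, cols, ptr, [1.0, 2.0, 3.0, 4.0, 5.0]))
```

Storage and work are proportional to the number of nonzeros rather than to n^2, which is what makes problems with millions of unknowns tractable.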


Newton-Krylov-Schwarz: a PDE applications “workhorse”

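Newton
  nonlinear solver
  asymptotically quadratic
  $F(u) \approx F(u_c) + F'(u_c)\,\delta u = 0$, $u = u_c + \lambda\,\delta u$

Krylov
  accelerator
  spectrally adaptive
  $J\,\delta u = -F$, $\delta u = \arg\min_{x \in V \equiv \{F, JF, J^2F, \ldots\}} \|Jx + F\|$

Schwarz
  preconditioner
  parallelizable
  $M^{-1} J\,\delta u = -M^{-1} F$, $M^{-1} = \sum_i R_i^T (R_i J R_i^T)^{-1} R_i$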
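The Schwarz formula can be sketched directly: each subdomain restricts the residual, solves its local block of J, and the results are summed. The sketch uses small dense matrices and a plain Gaussian elimination helper for clarity; the helper `solve_dense`, the subdomain index lists, and the test matrix are illustrative assumptions, and a production code would use distributed sparse storage instead.

```python
def solve_dense(A, b):
    """Gaussian elimination with partial pivoting (small dense systems only)."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            m = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= m * M[k][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        s = M[k][n] - sum(M[k][c] * x[c] for c in range(k + 1, n))
        x[k] = s / M[k][k]
    return x

def additive_schwarz_apply(J, r, subdomains):
    """z = sum_i R_i^T (R_i J R_i^T)^{-1} R_i r.

    Each entry of `subdomains` is a list of indices; R_i selects those
    entries. The local solves are independent, so they could run concurrently.
    """
    z = [0.0] * len(r)
    for idx in subdomains:
        J_i = [[J[a][b] for b in idx] for a in idx]   # R_i J R_i^T
        r_i = [r[a] for a in idx]                     # R_i r
        z_i = solve_dense(J_i, r_i)
        for a, v in zip(idx, z_i):                    # R_i^T z_i, summed
            z[a] += v
    return z

n = 6
J = [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0) for j in range(n)]
     for i in range(n)]
r = [1.0] * n
print(additive_schwarz_apply(J, r, [[0, 1, 2, 3], [2, 3, 4, 5]]))
```

With a single subdomain covering all indices the preconditioner reduces to an exact solve; with several subdomains it is only an approximation to the inverse, which the Krylov accelerator then corrects.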


SPMD parallelism w/domain decomposition

Partitioning of the grid induces block structure on the Jacobian

[Figure: grid partitioned into subdomains 1, 2, 3; the block row (A21, A22, A23) of the Jacobian is the set of rows assigned to proc “2”]


Time-implicit Newton-Krylov-Schwarz

For accommodation of unsteady problems, and nonlinear robustness in steady ones, NKS iteration is wrapped in time-stepping:

for (l = 0; l < n_time; l++) {                 // Pseudo-time loop
  select time step
  for (k = 0; k < n_Newton; k++) {             // NKS loop
    compute nonlinear residual and Jacobian
    for (j = 0; j < n_Krylov; j++) {
      forall (i = 0; i < n_Precon; i++) {
        solve subdomain problems concurrently
      } // End of loop over subdomains
      perform Jacobian-vector product
      enforce Krylov basis conditions
      update optimal coefficients
      check linear convergence
    } // End of linear solver
    perform DAXPY update
    check nonlinear convergence
  } // End of nonlinear loop
} // End of time-step loop


(N)KS kernel in parallel

[Figure: timelines for processes P1, P2, …, Pn across successive Krylov iterations, each iteration divided into phases: local scatter, Jac-vec multiply, precond sweep, daxpy, inner product]

Bulk synchronous model leads to easy scalability analyses and projections. Each phase can be considered separately. What happens if, for instance, in this (schematicized) iteration, arithmetic speed is doubled, scalar all-gather is quartered, and local scatter is cut by one-third?
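Because the phases do not overlap in a bulk synchronous model, the question can be answered by rescaling each phase time and summing. The sketch below does this arithmetic; the baseline phase times are hypothetical numbers invented for illustration, since the slide gives none.

```python
def iteration_time(phases):
    """Bulk synchronous model: phases run one after another, so times add."""
    return sum(phases.values())

# Hypothetical per-iteration phase times (arbitrary units), NOT from the slides.
baseline = {
    "arithmetic": 6.0,      # Jac-vec multiply, precond sweep, daxpy
    "all_gather": 2.0,      # scalar all-gather for inner products
    "local_scatter": 3.0,   # nearest-neighbor exchange
}

improved = {
    "arithmetic": baseline["arithmetic"] / 2,             # arithmetic speed doubled
    "all_gather": baseline["all_gather"] / 4,             # all-gather quartered
    "local_scatter": baseline["local_scatter"] * 2 / 3,   # scatter cut by one-third
}

speedup = iteration_time(baseline) / iteration_time(improved)
print(iteration_time(baseline), iteration_time(improved), speedup)
```

The overall gain depends on how the baseline time is distributed: the phase that improves least gradually dominates the iteration.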


Estimating scalability of stencil computations

Given complexity estimates of the leading terms of:
  the concurrent computation (per iteration phase)
  the concurrent communication
  the synchronization frequency

And a bulk synchronous model of the architecture including:
  internode communication (network topology and protocol reflecting horizontal memory structure)
  on-node computation (effective performance parameters including vertical memory structure)

One can estimate optimal concurrency and optimal execution time
  on per-iteration basis, or overall (by taking into account any granularity-dependent convergence rate), based on problem size N and concurrency P
  simply differentiate time estimate in terms of (N,P) with respect to P, equate to zero and solve for P in terms of N


Scalability results for DD stencil computations

With tree-based (logarithmic) global reductions and scalable nearest neighbor hardware:
  optimal number of processors scales linearly with problem size

With 3D torus-based global reductions and scalable nearest neighbor hardware:
  optimal number of processors scales as three-fourths power of problem size (almost “scalable”)

With common network bus (heavy contention):
  optimal number of processors scales as one-fourth power of problem size (not “scalable”)
  bad news for conventional Beowulf clusters, but see 2000 Bell Prize “price-performance awards”, for multiple NICs
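The three exponents can be recovered from the "differentiate, equate to zero" recipe under simplified per-iteration cost models. The models below are assumptions made for illustration (constants $a$, $c$ are hypothetical; computation is taken as $aN/P$, and only the dominant communication term is kept), not formulas quoted from the slides.

```latex
% Tree-based reduction: communication cost grows like \log P
T(N,P) = a\,\frac{N}{P} + c\,\log P
\;\Rightarrow\;
\frac{\partial T}{\partial P} = -\frac{aN}{P^{2}} + \frac{c}{P} = 0
\;\Rightarrow\;
P_{\mathrm{opt}} = \frac{a}{c}\,N

% 3D torus reduction: cost grows like the torus diameter, P^{1/3}
T(N,P) = a\,\frac{N}{P} + c\,P^{1/3}
\;\Rightarrow\;
-\frac{aN}{P^{2}} + \frac{c}{3}\,P^{-2/3} = 0
\;\Rightarrow\;
P_{\mathrm{opt}} = \left(\frac{3a}{c}\right)^{3/4} N^{3/4}

% Shared bus: P surface exchanges of size (N/P)^{2/3} are serialized
T(N,P) = a\,\frac{N}{P} + c\,N^{2/3}P^{1/3}
\;\Rightarrow\;
-\frac{aN}{P^{2}} + \frac{c}{3}\,N^{2/3}P^{-2/3} = 0
\;\Rightarrow\;
P_{\mathrm{opt}} = \left(\frac{3a}{c}\right)^{3/4} N^{1/4}
```

In each case the optimum balances the shrinking computation term against the growing communication term.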


[Figure: PETSc architecture diagram. User code: Main Routine, Application Initialization, Function Evaluation, Jacobian Evaluation, Post-Processing. PETSc code: Timestepping Solvers (TS), Nonlinear Solvers (SNES), Linear Solvers (SLES) with PC and KSP]

NKS efficiently implemented in PETSc’s MPI-based distributed data structures


User Code/PETSc library interactions

[Figure: division into user code (Main Routine, Application Initialization, Function Evaluation, Jacobian Evaluation, Post-Processing) and PETSc code (Timestepping Solvers (TS), Nonlinear Solvers (SNES), Linear Solvers (SLES) with PC and KSP), with the annotation “Can be AD code”]


1999 Bell Prize for unstructured grid computational aerodynamics

mesh c/o D. Mavriplis, ICASE

Implemented in PETSc

www.mcs.anl.gov/petsc

Transonic “Lambda” Shock, Mach contours on surfaces


Fixed-size parallel scaling results

Four orders of magnitude in 13 years

c/o K. Anderson, W. Gropp, D. Kaushik, D. Keyes and B. Smith

128 nodes: 43 min
3072 nodes: 2.5 min, 226 Gf/s
11M unknowns, 70% efficient


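Three “war stories” from magnetic fusion energy applications in SciDAC

Physical models based on fluid-like magnetohydrodynamics (MHD)

$\frac{\partial B}{\partial t} = -\nabla \times E + \kappa_{divb} \nabla \nabla \cdot B$

$E = -V \times B + \eta J$

$\mu_0 J = \nabla \times B$

$\frac{\partial n}{\partial t} + \nabla \cdot (nV) = \nabla \cdot D \nabla n$

$\rho \left( \frac{\partial V}{\partial t} + V \cdot \nabla V \right) = J \times B - \nabla p + \nabla \cdot \rho \nu \nabla V$

$\frac{n}{\gamma - 1} \left( \frac{\partial T}{\partial t} + V \cdot \nabla T \right) = -p \nabla \cdot V + \nabla \cdot n \left[ (\chi_\parallel - \chi_\perp) \hat{b}\hat{b} + \chi_\perp I \right] \cdot \nabla T + Q$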


Challenges in magnetic fusion

• Conditions of interest possess two properties that pose great challenges to numerical approaches: anisotropy and stiffness.

• Anisotropy produces subtle balances of large forces, and vastly different parallel and perpendicular transport properties.

• Stiffness reflects the vast range of time-scales in the system: targeted physics is slow (~transport scale) compared to waves


Tokamak/stellarator simulations

Center for Extended MHD Modeling (based at Princeton Plasma Physics Lab) M3D code
  Realistic toroidal geom., unstructured mesh, hybrid FE/FD discretization
  Fields expanded in scalar potentials, and streamfunctions
  Operator-split, linearized, w/11 potential solves in each poloidal cross-plane/step (90% exe. time)
  Parallelized w/PETSc (Tang et al., SIAM PP01, Chen et al., SIAM AN02, Jardin et al., SIAM CSE03)

Want from TOPS:
  Now: scalable linear implicit solver for much higher resolution (and for AMR)
  Later: fully nonlinearly implicit solvers and coupling to other codes


Provided new solvers across existing interfaces

Hypre in PETSc
  codes with PETSc interface (like CEMM’s M3D) can now invoke Hypre routines as solvers or preconditioners with command-line switch

SuperLU_DIST in PETSc
  as above, with SuperLU_DIST

Hypre in AMR Chombo code
  so far, Hypre is level-solver only; its AMG will ultimately be useful as a bottom-solver, since it can be coarsened indefinitely without attention to loss of nested geometric structure; also FAC is being developed for AMR uses, like Chombo


Hypre: multilevel preconditioning

A Multigrid V-cycle

[Figure: V-cycle descending from the finest grid to the first coarse grid and below, with a smoother applied on each level]

Restriction: transfer from fine to coarse grid
Coarser grid has fewer cells (less work & storage)
Recursively apply this idea until we have an easy problem to solve
Prolongation: transfer from coarse to fine grid
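The V-cycle ingredients named above (smoother, restriction, recursion, prolongation) can be sketched for the 1D model problem -u'' = f with zero boundary values. This is a geometric multigrid sketch under simple assumptions (damped Jacobi smoothing, full-weighting restriction, linear interpolation, 2^k - 1 interior points); it is not Hypre's algebraic multigrid, and all function names are illustrative.

```python
def residual(u, f, h):
    """r = f - A u for A = (1/h^2) tridiag(-1, 2, -1)."""
    n = len(u)
    r = [0.0] * n
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        r[i] = f[i] - (2.0 * u[i] - left - right) / (h * h)
    return r

def smooth(u, f, h, sweeps, omega=2.0 / 3.0):
    """Damped Jacobi: u <- u + omega * D^{-1} r, with D = 2/h^2."""
    for _ in range(sweeps):
        r = residual(u, f, h)
        u = [ui + omega * (h * h / 2.0) * ri for ui, ri in zip(u, r)]
    return u

def restrict(r):
    """Full weighting; coarse point j sits at fine index 2j+1."""
    nc = (len(r) - 1) // 2
    return [0.25 * r[2 * j] + 0.5 * r[2 * j + 1] + 0.25 * r[2 * j + 2]
            for j in range(nc)]

def prolong(ec, n):
    """Linear interpolation of a coarse correction to n fine points."""
    e = [0.0] * n
    for j, v in enumerate(ec):
        e[2 * j + 1] += v
        e[2 * j] += 0.5 * v
        e[2 * j + 2] += 0.5 * v
    return e

def v_cycle(u, f, h):
    """One V-cycle; len(u) must be 2^k - 1."""
    n = len(u)
    if n == 1:                       # easy problem: solve (2/h^2) u = f exactly
        return [f[0] * h * h / 2.0]
    u = smooth(u, f, h, 2)           # pre-smoothing
    rc = restrict(residual(u, f, h))
    ec = v_cycle([0.0] * len(rc), rc, 2.0 * h)   # coarse-grid correction
    e = prolong(ec, n)
    u = [ui + ei for ui, ei in zip(u, e)]
    return smooth(u, f, h, 2)        # post-smoothing

n, h = 63, 1.0 / 64
f = [1.0] * n
u = [0.0] * n
for cycle in range(10):
    u = v_cycle(u, f, h)
    print(cycle + 1, max(abs(v) for v in residual(u, f, h)))
```

Each cycle costs a fixed multiple of the fine-grid work, and the residual reduction per cycle does not degrade as the grid is refined, which is the sense in which multigrid is optimal.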


Hypre’s AMG in M3D

PETSc-based PPPL code M3D has been retrofit with Hypre’s algebraic MG solver of Ruge-Stüben type

Iteration count results below are averaged over 19 different PETSc SLESSolve calls in initialization and one timestep loop for this operator-split unsteady code; abscissa is number of procs in scaled problem; problem size ranges from 12K to 303K unknowns (approx 4K per processor)

[Figure: iteration counts (axis 0 to 700) versus number of procs (3, 12, 27, 48, 75) for ASM-GMRES and AMG-FGMRES]


Hypre’s AMG in M3D

Scaled speedup timing results below are summed over 19 different PETSc SLESSolve calls in initialization and one timestep loop for this operator-split unsteady code

Majority of AMG cost is coarse-grid formation (preprocessing), which does not scale as well as the inner loop V-cycle phase; in production, these coarse hierarchies will be saved for reuse (same linear systems are called in each timestep loop), making AMG much less expensive and more scalable

[Figure: timings (axis 0 to 60) versus number of procs (3, 12, 27, 48, 75) for ASM-GMRES, AMG-FGMRES, and AMG inner (est)]


Hypre’s “Conceptual Interfaces”

[Figure: Linear System Interfaces connect Linear Solvers (GMG, ...; FAC, ...; Hybrid, ...; AMGe, ...; ILU, ...) to Data Layout (structured, composite, block-struc, unstruc, CSR)]

(Slide c/o E. Chow, LLNL)


SuperLU in NIMROD

NIMROD is another MHD code in the CEMM collaboration
  employs high-order elements on unstructured grids
  very poor convergence with default Krylov solver on 2D poloidal crossplane linear solves

TOPS wired in SuperLU, just to try a sparse direct solver
Speedup of more than 10 in serial, and about 8 on a modest parallel cluster (24 procs)
PI Dalton Schnack (General Atomics) thought he entered a time machine
SuperLU is not a “final answer”, but a sanity check
Parallel ILU under Krylov should be superior


2D Hall MHD sawtooth instability (PETSc examples /snes/ex29.c and /sles/ex31.c)

Equilibrium:

Model equations: (Porcelli et al., 1993, 1999)

[Figures: vorticity at early time; vorticity at later time, with zoom (figures c/o A. Bhattacharjee, CMRS)]


PETSc’s DMMG in Hall MR application

[Figure: implicit code (snes/ex29.c) versus explicit code (sles/ex31.c), both with second-order integration in time]

[Figure: implicit code (snes/ex29.c) with first- and second-order integration in time]


Abstract Gantt Chart for TOPS

[Figure: Gantt chart against time with rows Algorithmic Development, Research Implementations, Hardened Codes, Applications Integration, Dissemination; example modules labeled e.g., ASPIN; e.g., TOPSLib; e.g., PETSc]

Each color module represents an algorithmic research idea on its way to becoming part of a supported community software tool. At any moment (vertical time slice), TOPS has work underway at multiple levels. While some codes are in applications already, they are being improved in functionality and performance as part of the TOPS research agenda.


Jacobian-free Newton-Krylov

In the Jacobian-Free Newton-Krylov (JFNK) method, a Krylov method solves the linear Newton correction equation, requiring Jacobian-vector products

These are approximated by the Fréchet derivatives

$J(u)v \approx \frac{1}{\varepsilon}[F(u + \varepsilon v) - F(u)]$

(where $\varepsilon$ is chosen with a fine balance between approximation and floating point rounding error) or automatic differentiation, so that the actual Jacobian elements are never explicitly needed

One builds the Krylov space on a true F’(u) (to within numerical approximation)
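The finite-difference Jacobian-vector product can be sketched in a few lines. The choice of epsilon below (square root of machine precision, scaled by the sizes of u and v) is one common heuristic and is an assumption here, as are the function name and the small two-equation test system.

```python
import math
import sys

def jacobian_vector_product(F, u, v, eps=None):
    """Approximate J(u) v by [F(u + eps*v) - F(u)] / eps, never forming J.

    F maps a list of n floats to a list of n floats.
    """
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_v == 0.0:
        return [0.0] * len(u)
    if eps is None:
        # Heuristic balance between truncation error (wants small eps)
        # and rounding error (wants large eps).
        norm_u = math.sqrt(sum(x * x for x in u))
        eps = math.sqrt(sys.float_info.epsilon) * (1.0 + norm_u) / norm_v
    Fu = F(u)
    Fp = F([ui + eps * vi for ui, vi in zip(u, v)])
    return [(a - b) / eps for a, b in zip(Fp, Fu)]

def F_example(u):
    """Hypothetical nonlinear residual used only to exercise the routine."""
    return [u[0] ** 2 + u[1] - 3.0, u[0] + math.sin(u[1])]

u = [1.0, 2.0]
v = [0.5, -1.0]
print(jacobian_vector_product(F_example, u, v))
# Analytic product for comparison: J = [[2*u0, 1], [1, cos(u1)]]
print([2.0 * u[0] * v[0] + v[1], v[0] + math.cos(u[1]) * v[1]])
```

Each product costs one extra residual evaluation, which is why the Krylov subspace dimension has to be kept small through preconditioning.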


Philosophy of Jacobian-free NK

To evaluate the linear residual, we use the true F’(u), giving a true Newton step and asymptotic quadratic Newton convergence

To precondition the linear residual, we do anything convenient that uses understanding of the dominant physics/mathematics in the system and respects the limitations of the parallel computer architecture and the cost of various operations:
  Jacobian of lower-order discretization
  Jacobian with “lagged” values for expensive terms
  Jacobian stored in lower precision
  Jacobian blocks decomposed for parallelism
  Jacobian of related discretization
  operator-split Jacobians
  physics-based preconditioning


Recall idea of preconditioning

Krylov iteration is expensive in memory and in function evaluations, so subspace dimension k must be kept small in practice, through preconditioning the Jacobian with an approximate inverse, so that the product matrix has low condition number in

$(B^{-1}A)x = B^{-1}b$

Given the ability to apply the action of $B^{-1}$ to a vector, preconditioning can be done on either the left, as above, or the right, as in, e.g., for matrix-free:

$JB^{-1}v \approx \frac{1}{\varepsilon}[F(u + \varepsilon B^{-1}v) - F(u)]$


Physics-based preconditioning

In Newton iteration, one seeks to obtain a correction (“delta”) to solution, by inverting the Jacobian matrix on (the negative of) the nonlinear residual:

$\delta u^k = -[J(u^k)]^{-1} F(u^k)$

A typical operator-split code also derives a “delta” to the solution, by some implicitly defined means, through a series of implicit and explicit substeps

This implicitly defined mapping from residual to “delta” is a natural preconditioner

$F(u^k) \mapsto \delta u^k$

Software must accommodate this!


Physics-based preconditioning

We consider a standard “dynamical core,” the shallow-water wave splitting algorithm, as a solver

Leaves a first-order in time splitting error

In the Jacobian-free Newton-Krylov framework, this solver, which maps a residual into a correction, can be regarded as a preconditioner

The true Jacobian is never formed yet the time-implicit nonlinear residual at each time step can be made as small as needed for nonlinear consistency in long time integrations


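Example: shallow water equations

Continuity (*)

$\frac{\partial \phi}{\partial t} + \frac{\partial (\phi u)}{\partial x} = 0$

Momentum (**)

$\frac{\partial (\phi u)}{\partial t} + \frac{\partial (\phi u^2)}{\partial x} + g\phi \frac{\partial \phi}{\partial x} = 0$

These equations admit a fast gravity wave, as can be seen by cross differentiating, e.g., (*) by t and (**) by x, and subtracting:

$\frac{\partial^2 \phi}{\partial t^2} - g\phi \frac{\partial^2 \phi}{\partial x^2} = \text{other terms}$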


1D shallow water equations, cont.

Wave equation for geopotential:

$\frac{\partial^2 \phi}{\partial t^2} - g\phi \frac{\partial^2 \phi}{\partial x^2} = \text{other terms}$

Gravity wave speed $\sqrt{g\phi}$

Typically $\sqrt{g\phi} \gg u$, but stability restrictions would require timesteps based on the Courant-Friedrichs-Lewy (CFL) criterion for the fastest wave, for an explicit method

One can solve fully implicitly, or one can filter out the gravity wave by solving semi-implicitly
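A rough sense of the CFL penalty comes from comparing the explicit step allowed by the gravity wave with the step allowed by advection alone. All numbers below (grid spacing, flow speed, and the value of phi, which is treated as the depth-like variable that the wave speed sqrt(g*phi) implies) are hypothetical values chosen only for illustration.

```python
import math

def explicit_step_limit(dx, fastest_speed, cfl=1.0):
    """Largest stable explicit timestep under a CFL condition dt <= cfl*dx/speed."""
    return cfl * dx / fastest_speed

g = 9.81        # gravitational acceleration, m/s^2
phi = 1.0e4     # hypothetical depth-like value, m
u = 10.0        # hypothetical advective speed, m/s
dx = 1.0e4      # hypothetical grid spacing, m

c_gravity = math.sqrt(g * phi)                       # gravity wave speed
dt_gravity = explicit_step_limit(dx, abs(u) + c_gravity)
dt_advective = explicit_step_limit(dx, abs(u))
ratio = dt_advective / dt_gravity

print(c_gravity, dt_gravity, dt_advective, ratio)
```

With these assumed values the gravity wave forces a timestep roughly thirty times smaller than the advective dynamics would need, which is the cost that implicit or semi-implicit treatment removes.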


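1D shallow water equations, cont.

Continuity (*)

$\frac{\phi^{n+1} - \phi^n}{\tau} + \frac{\partial (\phi u)^{n+1}}{\partial x} = 0$

Momentum (**)

$\frac{(\phi u)^{n+1} - (\phi u)^n}{\tau} + \frac{\partial (\phi u^2)^n}{\partial x} + g\phi^n \frac{\partial \phi^{n+1}}{\partial x} = 0$

(here $\tau$ denotes the timestep)

Solving (**) for $(\phi u)^{n+1}$ and substituting into (*),

$\phi^{n+1} - \tau^2 g \frac{\partial}{\partial x}\left(\phi^n \frac{\partial \phi^{n+1}}{\partial x}\right) = \phi^n - \tau \frac{\partial S^n}{\partial x}$

where

$S^n \equiv (\phi u)^n - \tau \frac{\partial (\phi u^2)^n}{\partial x}$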


1D shallow water equations, cont.

After the parabolic equation is spatially discretized and solved for $\phi^{n+1}$, then $(\phi u)^{n+1}$ can be found from

$(\phi u)^{n+1} = -\tau g \phi^n \frac{\partial \phi^{n+1}}{\partial x} + S^n$

One scalar parabolic solve and one scalar explicit update replace an implicit hyperbolic system

This semi-implicit operator splitting is foundational to multiple scales problems in geophysical modeling

Similar tricks are employed in aerodynamics (sound waves), MHD (multiple Alfvén waves), reacting flows (fast kinetics), etc.

Temporal truncation error remains due to the lagging of the advection in (**)
  To be dealt with shortly


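1D Shallow water preconditioning

Define continuity residual for each timestep (timestep $\tau$):

$R_\phi \equiv \frac{\phi^{n+1} - \phi^n}{\tau} + \frac{\partial (\phi u)^{n+1}}{\partial x}$

Define momentum residual for each timestep:

$R_{\phi u} \equiv \frac{(\phi u)^{n+1} - (\phi u)^n}{\tau} + \frac{\partial (\phi u^2)^n}{\partial x} + g\phi^n \frac{\partial \phi^{n+1}}{\partial x}$

Continuity delta-form (*):

$\frac{\delta\phi}{\tau} + \frac{\partial [\delta(\phi u)]}{\partial x} = -R_\phi$

Momentum delta-form (**):

$\frac{\delta(\phi u)}{\tau} + g\phi^n \frac{\partial [\delta\phi]}{\partial x} = -R_{\phi u}$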


1D Shallow water preconditioning, cont.

Solving (**) for $\delta(\phi u)$ and substituting into (*),

$\delta\phi - \tau^2 g \frac{\partial}{\partial x}\left(\phi^n \frac{\partial [\delta\phi]}{\partial x}\right) = -\tau R_\phi + \tau^2 \frac{\partial R_{\phi u}}{\partial x}$

After this parabolic equation is solved for $\delta\phi$, we have

$\delta(\phi u) = -\tau g \phi^n \frac{\partial [\delta\phi]}{\partial x} - \tau R_{\phi u}$

This completes the application of the preconditioner to one Newton-Krylov iteration at one timestep

Of course, the parabolic solve need not be done exactly; one sweep of multigrid can be used

See paper by Mousseau et al. (2002) for impressive results for longtime weather integration


Physics-based preconditioning update

So far, physics-based preconditioning has been applied to several codes at Los Alamos, in an effort led by D. Knoll

Summarized in new J. Comp. Phys. paper by Knoll & Keyes (Jan 2004)

PETSc’s “shell preconditioner” is designed for inserting physics-based preconditioners, and PETSc’s solvers underneath are building blocks


Nonlinear Schwarz preconditioning

Nonlinear Schwarz has Newton both inside and outside and is fundamentally Jacobian-free

It replaces $F(u) = 0$ with a new nonlinear system possessing the same root, $\Phi(u) = 0$

Define a correction $\delta_i(u)$ to the $i^{th}$ partition (e.g., subdomain) of the solution vector by solving the following local nonlinear system:

$R_i F(u + \delta_i(u)) = 0$

where $\delta_i(u) \in \Re^n$ is nonzero only in the components of the $i^{th}$ partition

Then sum the corrections: $\Phi(u) = \sum_i \delta_i(u)$ to get an implicit function of u


Nonlinear Schwarz – picture

[Figure, in three builds: restriction operators R_i and R_j (0/1 selection matrices) acting on u and F(u) to give R_i u, R_i F, R_j u, R_j F; the local Jacobian F_i’(u_i); and the summed correction δ_i u + δ_j u]


Nonlinear Schwarz, cont.

It is simple to prove that if the Jacobian of F(u) is nonsingular in a neighborhood of the desired root then $\Phi(u) = 0$ and $F(u) = 0$ have the same unique root

To lead to a Jacobian-free Newton-Krylov algorithm we need to be able to evaluate for any $u, v \in \Re^n$:
  The residual $\Phi(u) = \sum_i \delta_i(u)$
  The Jacobian-vector product $\Phi'(u)\,v$

Remarkably (Cai-Keyes, 2000), it can be shown that

$\Phi'(u)\,v \approx \sum_i (R_i^T J_i^{-1} R_i) J v$

where $J = F'(u)$ and $J_i = R_i J R_i^T$

All required actions are available in terms of $F(u)$!


Experimental example of nonlinear Schwarz

[Figure: two convergence plots over a range of Reynolds number (Re)]

Newton’s method: difficulty at critical Re; stagnation beyond critical Re

Additive Schwarz Preconditioned Inexact Newton (ASPIN): convergence for all Re


The 2003 SCaLeS initiative

Workshop on a Science-based Case for Large-scale Simulation

Arlington, VA

24-25 June 2003


Charge (April 2003, W. Polansky, DOE):
  “Identify rich and fruitful directions for the computational sciences from the perspective of scientific and engineering applications”
  Build a “strong science case for an ultra-scale computing capability for the Office of Science”
  “Address major opportunities and challenges facing computational sciences in areas of strategic importance to the Office of Science”
  “Report by July 30, 2003”

Chapter 1. Introduction

Chapter 2. Scientific Discovery through Advanced Computing: a Successful Pilot Program

Chapter 3. Anatomy of a Large-scale Simulation

Chapter 4. Opportunities at the Scientific Horizon

Chapter 5. Enabling Mathematics and Computer Science Tools

Chapter 6. Recommendations and Discussion

Volume 2 (due out early 2004):

11 chapters on applications

8 chapters on mathematical methods

8 chapters on computer science and infrastructure

First fruits!


“There will be opened a gateway and a road to a large and excellent science into which minds more piercing than mine shall penetrate to recesses still deeper.”

Galileo (1564-1642) (on ‘experimental mathematical analysis of nature’, appropriated here for ‘simulation science’)


Related URLs

TOPS project: http://www.tops-scidac.org
SciDAC initiative: http://www.science.doe.gov/scidac
SCaLeS report: http://www.pnl.gov/scales
