Page 1

Scientific Discovery through Advanced Computing (SciDAC)

The Pennsylvania State University, 28 April 2003

David E. Keyes
Center for Computational Science, Old Dominion University
& Institute for Scientific Computing Research, Lawrence Livermore National Laboratory

Page 2

Happy Gödel's Birthday!

- Born: 28 April 1906, Brünn, Austria-Hungary
- Published "Incompleteness Theorem", 1931
- Fellow, Royal Society, 1968
- National Medal of Science, 1974
- Died: 14 January 1978, Princeton, NJ
- "Gave a formal demonstration of the inadequacy of formal demonstrations" - anon.

"A consistency proof for any system … can be carried out only by modes of inference that are not formalized in the system … itself." - Kurt Gödel

Page 3

Remarks

This talk is:
- a personal perspective, not an official statement of the U.S. Department of Energy
- a project panorama more than a technical presentation

For related technical presentations:
- Tuesday 2:30pm, 116 McAllister Building
- personal homepage on the web (www.math.odu.edu/~keyes)
- SciDAC project homepage on the web (www.tops-scidac.org)

Page 4

Computational Science & Engineering

A "multidiscipline" on the verge of full bloom:
- Envisioned by von Neumann and others in the 1940s
- Undergirded by theory (numerical analysis) for the past fifty years
- Empowered by spectacular advances in computer architecture over the last twenty years
- Enabled by powerful programming paradigms in the last decade

Adopted in industrial and government applications:
- Boeing 777's computational design a renowned milestone
- DOE NNSA's "ASCI" (motivated by CTBT)
- DOE SC's "SciDAC" (motivated by Kyoto, etc.)

Page 5

Niche for computational science

- Has theoretical aspects (modeling)
- Has experimental aspects (simulation)
- Unifies theory and experiment by providing a common immersive environment for interacting with multiple data sets of different sources
- Provides "universal" tools, both hardware and software
- Telescopes are for astronomers, microarray analyzers are for biologists, spectrometers are for chemists, and accelerators are for physicists, but computers are for everyone!
- Costs going down, capabilities going up every year

Pages 6-12

Terascale simulation has been "sold"

[Diagram, built up over seven slides: application areas arrayed around "Scientific Simulation"]
- Environment: global climate, contaminant transport
- Lasers & Energy: combustion, ICF
- Engineering: crash testing, aerodynamics
- Biology: drug design, genomics
- Applied Physics: radiation transport, supernovae

Callouts, added one per slide:
- Experiments controversial
- Experiments dangerous
- Experiments prohibited or impossible
- Experiments difficult to instrument
- Experiments expensive (ITER: $20B)

In these, and many other areas, simulation is an important complement to experiment. However, simulation is far from proven! To meet expectations, we need to handle problems of multiple physical scales.

Page 13

"Enabling technologies" groups to develop reusable software and partner with application groups

- Since start-up in 2001, 51 projects share $57M per year
- Approximately one-third for applications
- A third for "integrated software infrastructure centers" (ISICs)
- A third for grid infrastructure and collaboratories
- Plus, two new ~10 Tflop/s IBM SP machines available for SciDAC researchers

Page 14

SciDAC project characteristics

- Affirmation of importance of simulation for new scientific discovery, not just for "fitting" experiments
- Recognition that leading-edge simulation is interdisciplinary: no independent support for physicists and chemists to write their own software infrastructure; must collaborate with math & CS experts
- Commitment to distributed hierarchical memory computers: new code must target this architecture type
- Requirement of lab-university collaborations: complementary strengths in simulation; 13 laboratories and 50 universities in first round of projects

Page 15

Major DOE labs

[Map of participating institutions]
- DOE Science Labs: Lawrence Berkeley, Pacific Northwest, Argonne, Oak Ridge, Brookhaven
- DOE Defense Labs: Lawrence Livermore, Los Alamos, Sandia, Sandia Livermore
- Plus: Old Dominion University

Page 16

Large platforms provided for ASCI

- ASCI roadmap is to go to 100 Teraflop/s by 2006
- Use variety of vendors: Compaq, Cray, Intel, IBM, SGI
- Rely on commodity processor/memory units, with tightly coupled network
- Massive software project to rewrite physics codes for distributed shared memory

Page 17

…and now for SciDAC

- Oak Ridge: IBM Power4 Regatta; 32 procs per node, 24 nodes, 166 Gflop/s per node, 4 Tflop/s (10 in 2003)
- Berkeley: IBM Power3+ SMP; 16 procs per node, 208 nodes, 24 Gflop/s per node, 5 Tflop/s (upgraded to 10, Feb 2003)

Page 18

New architecture on horizon: QCDOC

- System-on-a-chip architecture
- Designed for Columbia University and Brookhaven National Lab by IBM, using Power technology
- Special-purpose machine for Lattice Gauge Theory quantum chromodynamics: a "very fast conjugate gradient machine with small local memory"
- 10 Tflop/s total; copies ordered for UK and Japan QCD research groups
- To be delivered August 2003
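Since the slide characterizes QCDOC as a "very fast conjugate gradient machine", here is a minimal serial sketch of the conjugate gradient kernel such a machine is built around (textbook CG in Python/NumPy, written for illustration; not QCDOC code):

```python
import numpy as np

def conjugate_gradient(apply_A, b, tol=1e-8, max_iter=500):
    """Solve A x = b for symmetric positive definite A,
    given only a routine that applies A to a vector."""
    x = np.zeros_like(b)
    r = b - apply_A(x)              # residual
    p = r.copy()                    # search direction
    rr = r @ r
    for _ in range(max_iter):
        Ap = apply_A(p)
        alpha = rr / (p @ Ap)       # step length
        x += alpha * p
        r -= alpha * Ap
        rr_new = r @ r
        if np.sqrt(rr_new) < tol:
            break
        p = r + (rr_new / rr) * p   # conjugate direction update
        rr = rr_new
    return x

# Usage: 1D Laplacian applied matrix-free (zero Dirichlet boundaries)
n = 100
def apply_A(v):
    w = 2.0 * v
    w[:-1] -= v[1:]
    w[1:] -= v[:-1]
    return w

x = conjugate_gradient(apply_A, np.ones(n))
```

The dominant operations, a sparse operator application plus dot products, are exactly what a "small local memory, fast network" design optimizes for.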

Page 19

New architecture on horizon: Blue Gene/L

- 180 Tflop/s configuration (65,536 dual-processor chips)
- Closely related to QCDOC prototype (IBM system-on-a-chip)
- Ordered for LLNL institutional computing (not ASCI)
- To be delivered 2004

Page 20

New architecture just arrived: Cray X1

- Massively parallel-vector machine highly desired by the global climate simulation community
- 32-processor prototype ordered for evaluation
- Scale-up to 100 Tflop/s peak planned, if prototype proves successful
- Delivered to ORNL 18 March 2003

Page 21

"Boundary conditions" from architecture

Algorithms must run on physically distributed memory units connected by a message-passing network, each serving one or more processors with multiple levels of cache.
- "horizontal" aspects: network latency, bandwidth, diameter
- "vertical" aspects: memory latency, bandwidth; load/store (cache/register) bandwidth
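The "horizontal" costs are often summarized by the standard latency-bandwidth model for passing a message of m words (the α, β notation is conventional, not from the slide):

```latex
T_{\text{msg}}(m) \;=\; \alpha + \beta\, m,
\qquad \alpha = \text{latency}, \quad \beta = \text{time per word (inverse bandwidth)}.
```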

Page 22

Following the platforms…

…algorithms must be:
- highly concurrent and straightforward to load balance
- not communication bound
- cache friendly (temporal and spatial locality of reference)
- highly scalable (in the sense of convergence)

Goal for algorithmic scalability: fill up the memory of arbitrarily large machines while preserving nearly constant* running times with respect to a proportionally smaller problem on one processor (*logarithmically growing; see the sketch below).
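Stated compactly, this is a weak-scaling requirement; with T(N, P) the time to solve a problem of size N on P processors and N₀ the per-processor problem size (notation introduced here for illustration):

```latex
T(P \cdot N_0,\; P) \;\lesssim\; C\,(\log P)\; T(N_0,\; 1), \qquad C = O(1).
```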

Page 23

Official SciDAC goals

- "Create a new generation of scientific simulation codes that take full advantage of the extraordinary computing capabilities of terascale computers."
- "Create the mathematical and systems software to enable the scientific simulation codes to effectively and efficiently use terascale computers."
- "Create a collaboratory software environment to enable geographically separated scientists to effectively work together as a team and to facilitate remote access to both facilities and data."

Page 24

Four science programs involved…

"14 projects will advance the science of climate simulation and prediction. These projects involve novel methods and computationally efficient approaches for simulating components of the climate system and work on an integrated climate model."

"10 projects will address quantum chemistry and fluid dynamics, for modeling energy-related chemical transformations such as combustion, catalysis, and photochemical energy conversion. The goal of these projects is efficient computational algorithms to predict complex molecular structures and reaction rates with unprecedented accuracy."

Page 25

Four science programs involved…

"4 projects in high energy and nuclear physics will explore the fundamental processes of nature. The projects include the search for the explosion mechanism of core-collapse supernovae, development of a new generation of accelerator simulation codes, and simulations of quantum chromodynamics."

"5 projects are focused on developing and improving the physics models needed for integrated simulations of plasma systems to advance fusion energy science. These projects will focus on such fundamental phenomena as electromagnetic wave-plasma interactions, plasma turbulence, and macroscopic stability of magnetically confined plasmas."

Page 26

SciDAC per-year portfolio: $57M

[Pie chart by program office, $ in M]
- MICS (Math, Information and Computer Sciences): $37M
- BER: $8M
- BES: $2M
- HENP: $7M
- FES: $3M

Page 27

Data grids and collaboratories

National data grids:
- Particle physics grid
- Earth system grid
- Plasma physics for magnetic fusion
- DOE Science Grid

Middleware:
- Security and policy for group collaboration
- Middleware technology for science portals

Network research:
- Bandwidth estimation, measurement methodologies and application
- Optimizing performance of distributed applications
- Edge-based traffic processing
- Enabling technology for wide-area data intensive applications

Page 28

Computer Science ISICs

- Scalable Systems Software: provide software tools for management and utilization of terascale resources.
- High-end Computer System Performance: Science and Engineering: develop a science of performance prediction based on concepts of program signatures, machine signatures, detailed profiling, and performance simulation, and apply it to complex DOE applications. Develop tools that assist users to engineer better performance.
- Scientific Data Management: provide a framework for efficient management and data mining of large, heterogeneous, distributed data sets.
- Component Technology for Terascale Software: develop software component technology for high-performance parallel scientific codes, promoting reuse and interoperability of complex software, and assist application groups in incorporating component technology into their high-value codes.

Pages 29-31

Applied Math ISICs

- Terascale Simulation Tools and Technologies: develop a framework for use of multiple mesh and discretization strategies within a single PDE simulation. Focus on high-quality hybrid mesh generation for representing complex and evolving domains, high-order discretization techniques, and adaptive strategies for automatically optimizing a mesh to follow moving fronts or to capture important solution features.
- Algorithmic and Software Framework for Partial Differential Equations: develop a framework for PDE simulation based on locally structured grid methods, including adaptive meshes for problems with multiple length scales; embedded boundary and overset grid methods for complex geometries; efficient and accurate methods for particle and hybrid particle/mesh simulations.
- Terascale Optimal PDE Simulations: develop an integrated toolkit of near-optimal-complexity solvers for nonlinear PDE simulations. Focus on multilevel methods for nonlinear PDEs, PDE-based eigenanalysis, and optimization of PDE-constrained systems. Packages sharing the same distributed data structures include: adaptive time integrators for stiff systems, nonlinear implicit solvers, optimization, linear solvers, and eigenanalysis.

Page 32

Exciting time for enabling technologies

SciDAC application groups have been chartered to build new and improved COMMUNITY CODES. Community codes, such as NWCHEM, consume hundreds of person-years of development, run at hundreds of installations, are given large fractions of community compute resources for decades, and acquire an "authority" that can enable or limit what is done and accepted as science in their respective communities. Except at the beginning, it is difficult to promote major algorithmic ideas in such codes, since change is expensive and sometimes resisted.

ISIC groups have a chance, due to the interdependence built into the SciDAC program structure, to simultaneously influence many of these codes by delivering software incorporating optimal algorithms that may be reused across many applications. Improvements driven by one application will be available to all.

While they are building community codes, this is our chance to build a CODE COMMUNITY!

Page 33

SciDAC themes

- Chance to do community codes "right"
- Meant to set a "new paradigm" for other DOE programs: new 2003 nanoscience modeling initiative; possible new 2004 fusion simulation initiative
- Cultural barriers to interdisciplinary research acknowledged up front
- Accountabilities constructed in order to force the mixing of "scientific cultures" (physicists/biologists/chemists/engineers with mathematicians/computer scientists)

Page 34

Opportunity: nanoscience modeling

- Jul 2002 report to DOE
- Proposes a $5M/year theory and modeling initiative to accompany the existing $50M/year experimental initiative in nanoscience
- Report lays out research in numerical algorithms and optimization methods on the critical path to progress in nanotechnology

Page 35

Opportunity: integrated fusion modeling

- Dec 2002 report to DOE
- Currently DOE supports 52 codes in Fusion Energy Sciences
- US contribution to ITER will "major" in simulation
- Initiative proposes to use advanced computer science techniques and numerical algorithms to improve the US code base in magnetic fusion energy and allow codes to interoperate

Page 36

What's new in SciDAC library software?

- Philosophy of library usage: large codes interacting as peer applications, with complex calling patterns (e.g., a physics code calls an implicit solver, which calls a subroutine automatically generated from the original physics code to supply the Jacobian of the physics code's residual; see the sketch below)
- Extensibility
- Polyalgorithmic adaptivity
- Resources for development, long-term maintenance, and support, not just for "dissertation scope" ideas
- Experience on terascale computers
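A minimal sketch of that calling pattern, with the Jacobian action supplied as a callback the way an automatically generated routine would be (plain NumPy; names like `residual` and `jac_vec` are illustrative, not from any SciDAC library):

```python
import numpy as np

def newton_sketch(residual, jac_vec, x0, tol=1e-10, max_newton=20):
    """Inexact Newton: solve F(x) = 0, given F and the Jacobian action J(x) v."""
    x = x0.copy()
    for _ in range(max_newton):
        F = residual(x)
        if np.linalg.norm(F) < tol:
            break
        # Inner linear solve of J dx = -F; a crude Richardson iteration
        # stands in for the Krylov solver a real library would provide.
        dx = np.zeros_like(x)
        for _ in range(50):
            dx += 0.1 * (-F - jac_vec(x, dx))
        x += dx
    return x

# "Physics code": F(x) = x^2 - 2 componentwise;
# "generated" Jacobian action: J(x) v = 2 x v
residual = lambda x: x**2 - 2.0
jac_vec = lambda x, v: 2.0 * x * v
print(newton_sketch(residual, jac_vec, np.ones(4)))   # converges to sqrt(2)
```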

Page 37

Introducing the "Terascale Optimal PDE Simulations" (TOPS) ISIC

Nine institutions, $17M, five years, 24 co-PIs

Page 38

[Diagram: TOPS within the SciDAC program structure]
- 34 apps groups (BER, BES, FES, HENP)
- 7 ISIC groups (4 CS, 3 Math)
- 10 grid, data collaboratory groups
- TOPS supplies solvers: f(x, ẋ, t, p) = 0; F(x, p) = 0; Ax = b; Ax = λBx; min_u φ(x, u) s.t. F(x, u) = 0
- Other ISICs supply adaptive gridding, discretization; systems software, component architecture, performance engineering, data management
- Cross-cutting: software integration, performance optimization

Page 39

Who we are…

- … the PETSc and TAO people
- … the Hypre and Sundials people
- … the SuperLU and PARPACK people
- … as well as the builders of other widely used packages…

Page 40

Plus some university collaborators

Our DOE lab collaborations predate SciDAC by many years.

Demmel et al., Manteuffel et al., Dongarra et al., Keyes et al., Ghattas et al., Widlund et al.

Pages 41-42

You may know the on-line "Templates" guides…
- www.netlib.org/templates (124 pp.)
- www.netlib.org/etemplates (410 pp.)

…these are good starts, but not adequate for SciDAC scales! SciDAC puts some of the authors (and many others) "on-line" for physics groups.

Page 43

Scope for TOPS

Design and implementation of "solvers":
- Time integrators: f(x, ẋ, t, p) = 0 (w/ sensitivity analysis)
- Nonlinear solvers: F(x, p) = 0 (w/ sensitivity analysis)
- Optimizers: min_u φ(x, u) s.t. F(x, u) = 0, u ≥ 0
- Linear solvers: Ax = b
- Eigensolvers: Ax = λBx

Plus software integration and performance optimization.

[Diagram: dependences among the components - Optimizer, Time integrator, Nonlinear solver, Eigensolver, Linear solver, and Sensitivity Analyzer; arrows indicate which solvers call which]

Page 44

The power of optimal algorithms

Advances in algorithmic efficiency rival advances in hardware architecture. Consider Poisson's equation, ∇²u = f, on a cube of size N = n³ (e.g., a 64 × 64 × 64 grid):

Year | Method      | Reference               | Storage | Flops
1947 | GE (banded) | von Neumann & Goldstine | n^5     | n^7
1950 | Optimal SOR | Young                   | n^3     | n^4 log n
1971 | CG          | Reid                    | n^3     | n^3.5 log n
1984 | Full MG     | Brandt                  | n^3     | n^3

If n = 64, this implies an overall reduction in flops of ~16 million* (see the arithmetic below).

*On a 16 Mflop/s machine, six months is reduced to 1 s.
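The quoted factor follows directly from the table's first and last rows (a worked check, not on the original slide):

```latex
\frac{n^{7}}{n^{3}} = n^{4} = 64^{4} = 2^{24} \approx 1.7\times10^{7};
\qquad
6~\text{months} \approx 1.6\times10^{7}~\text{s}
\;\Rightarrow\;
\frac{1.6\times10^{7}~\text{s}}{1.7\times10^{7}} \approx 1~\text{s}.
```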

Page 45

Algorithms and Moore's Law

[Plot: relative speedup vs. year]

This advance took place over a span of about 36 years, or 24 doubling times for Moore's Law: 2^24 ≈ 16 million, the same as the factor from algorithms alone!

Page 46

The power of optimal algorithms

- Since O(N) is already optimal, there is nowhere further "upward" to go in efficiency, but one must extend optimality "outward", to more general problems
- Hence, for instance, algebraic multigrid (AMG), obtaining O(N) in indefinite, anisotropic, inhomogeneous problems

[AMG framework diagram: choose coarse grids, transfer operators, etc., based on numerical weights and heuristics, to eliminate the "algebraically smooth" error that is not damped by pointwise relaxation]
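To make the multigrid idea concrete, here is a minimal two-grid cycle for the 1D Poisson problem; geometric coarsening with a Galerkin coarse operator stands in for AMG's heuristic coarse-grid selection (an illustrative sketch only):

```python
import numpy as np

def poisson1d(n):
    """Tridiagonal (-1, 2, -1) operator on n interior points."""
    return (np.diag(2.0 * np.ones(n))
            - np.diag(np.ones(n - 1), 1) - np.diag(np.ones(n - 1), -1))

def interpolation(n_coarse):
    """Linear interpolation from n_coarse to 2*n_coarse + 1 interior points."""
    P = np.zeros((2 * n_coarse + 1, n_coarse))
    for j in range(n_coarse):
        i = 2 * j + 1              # coarse point j sits at fine point 2j+1
        P[i, j] = 1.0
        P[i - 1, j] += 0.5
        P[i + 1, j] += 0.5
    return P

def two_grid(A, f, u, P, n_sweeps=3, omega=2.0/3.0):
    """One two-grid cycle: damped-Jacobi smoothing + coarse-grid correction."""
    D = np.diag(A)
    for _ in range(n_sweeps):                   # pre-smooth
        u += omega * (f - A @ u) / D
    r = f - A @ u                               # fine-grid residual
    Ac = P.T @ A @ P                            # Galerkin coarse operator (AMG-style)
    u += P @ np.linalg.solve(Ac, P.T @ r)       # coarse-grid correction
    for _ in range(n_sweeps):                   # post-smooth
        u += omega * (f - A @ u) / D
    return u

n_coarse = 31
n = 2 * n_coarse + 1
A, f, u = poisson1d(n), np.ones(n), np.zeros(n)
for k in range(5):
    u = two_grid(A, f, u, interpolation(n_coarse))
    print(k, np.linalg.norm(f - A @ u))         # residual drops per cycle
```

Recursing on the coarse solve instead of solving it directly gives the full V-cycle and the O(N) complexity cited above.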

Page 47

Gordon Bell Prize performance

Year | Type | Application | Gflop/s | System       | No. Procs
1988 | PDE  | Structures  | 1.0     | Cray Y-MP    | 8
1989 | PDE  | Seismic     | 5.6     | CM-2         | 2,048
1990 | PDE  | Seismic     | 14      | CM-2         | 2,048
1992 | NB   | Gravitation | 5.4     | Delta        | 512
1993 | MC   | Boltzmann   | 60      | CM-5         | 1,024
1994 | IE   | Structures  | 143     | Paragon      | 1,904
1995 | MC   | QCD         | 179     | NWT          | 128
1996 | PDE  | CFD         | 111     | NWT          | 160
1997 | NB   | Gravitation | 170     | ASCI Red     | 4,096
1998 | MD   | Magnetism   | 1,020   | T3E-1200     | 1,536
1999 | PDE  | CFD         | 627     | ASCI BluePac | 5,832
2000 | NB   | Gravitation | 1,349   | GRAPE-6      | 96
2001 | NB   | Gravitation | 11,550  | GRAPE-6      | 1,024
2002 | PDE  | Climate     | 26,500  | Earth Sim    | ~5,000

Page 48

Gordon Bell Prize outpaces Moore's Law

[Plot: Gordon Bell Prize performance vs. Moore's Law projection]

Four orders of magnitude in 13 years. The gap over Moore's Law is CONCURRENCY!

Page 49

SciDAC application: Center for Extended Magnetohydrodynamic Modeling (CEMM)

- Simulate plasmas in tokamaks, leading to understanding of plasma instability and (ultimately) new energy sources
- Joint work between ODU, Argonne, LLNL, and PPPL

Page 50

Optimal solvers

Convergence rate nearly independent of discretization parameters:
- Multilevel schemes for linear and nonlinear problems
- Newton-like schemes for quadratic convergence of nonlinear problems

[Plot: time to solution vs. problem size (increasing with number of processors); an unscalable solver's time grows with size, a scalable solver's stays flat]

AMG shows perfect iteration scaling, in contrast to ASM, but still needs performance work to achieve temporal scaling on the CEMM fusion code, M3D, though time is halved (or better) for large runs (all runs: 4K dofs per processor).

[Plots: iterations and time vs. procs (3, 12, 27, 48, 75) for ASM-GMRES vs. AMG-FGMRES, with AMG inner iterations also shown]

Page 51

Solver interoperability accomplishments

- Hypre in PETSc: codes with a PETSc interface (like CEMM's M3D) can invoke Hypre routines as solvers or preconditioners with a command-line switch
- SuperLU_DIST in PETSc: as above, with SuperLU_DIST
- Hypre in the AMR Chombo code: so far, Hypre is a level-solver only; its AMG will ultimately be useful as a bottom-solver, since it can be coarsened indefinitely without attention to loss of nested geometric structure; also, FAC is being developed for AMR uses, like Chombo

Page 52

Background of PETSc Library

- Developed at Argonne to support research, prototyping, and production parallel solutions of operator equations in message-passing environments; now joined by four additional staff under SciDAC
- Distributed data structures as fundamental objects: index sets, vectors/gridfunctions, and matrices/arrays
- Iterative linear and nonlinear solvers, combinable modularly, recursively, and extensibly
- Portable, and callable from C, C++, Fortran
- Uniform high-level API, with multi-layered entry
- Aggressively optimized: copies minimized, communication aggregated and overlapped, caches and registers reused, memory chunks preallocated, inspector-executor model for repetitive tasks (e.g., gather/scatter)

See http://www.mcs.anl.gov/petsc
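A minimal flavor of the PETSc solver workflow, shown through the Python bindings (petsc4py, which post-date this talk); the runtime options in the comment are typical usage, assuming the corresponding packages are installed:

```python
from petsc4py import PETSc

n = 100
# Assemble a 1D Laplacian in PETSc's sparse AIJ matrix format
A = PETSc.Mat().createAIJ([n, n])
A.setUp()
for i in range(n):
    A.setValue(i, i, 2.0)
    if i > 0:
        A.setValue(i, i - 1, -1.0)
    if i < n - 1:
        A.setValue(i, i + 1, -1.0)
A.assemble()

b = A.createVecLeft(); b.set(1.0)   # right-hand side
x = A.createVecRight()              # solution vector

ksp = PETSc.KSP().create()          # Krylov solver object
ksp.setOperators(A)
ksp.setType('cg')                   # in-code defaults...
ksp.getPC().setType('jacobi')
ksp.setFromOptions()                # ...overridable at run time, e.g.
                                    #   -ksp_type gmres -pc_type hypre
ksp.solve(b, x)
print('iterations:', ksp.getIterationNumber())
```

The `setFromOptions` call is what enables the page-51 style of interoperability: the same code can switch Krylov methods or bring in Hypre preconditioners from the command line, without recoding or recompilation.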

Pages 53-54

User Code/PETSc Library Interactions

[Diagram, shown on two slides: the application's Main Routine calls PETSc's Timestepping Solvers (TS), Nonlinear Solvers (SNES), and Linear Solvers (SLES), which rest on the PC and KSP layers; user code supplies Application Initialization, Function Evaluation, Jacobian Evaluation, and Post-Processing. The second slide marks the Jacobian Evaluation block "To be AD code", i.e., automatically generated by automatic differentiation.]

Page 55

Background of Hypre Library (to be combined with PETSc under SciDAC)

- Developed by Livermore to support research, prototyping, and production parallel solutions of operator equations in message-passing environments; now joined by seven additional staff under ASCI and SciDAC
- Object-oriented design similar to PETSc
- Concentrates on linear problems only
- Richer in preconditioners than PETSc, with a focus on algebraic multigrid
- Includes other preconditioners, including sparse approximate inverse (ParaSails) and parallel ILU (Euclid)

See http://www.llnl.gov/CASC/hypre/

Page 56

Hypre's "Conceptual Interfaces"

[Diagram, slide c/o R. Falgout, LLNL: linear system interfaces map data layouts to solvers]
- Data layouts: structured, composite, block-structured, unstructured, CSR
- Linear solvers: GMG, …; FAC, …; Hybrid, …; AMGe, …; ILU, …

Page 57

Eigensolvers for Accelerator Design

- Stanford's Omega3P is using TOPS software to find EM modes of accelerator cavities
- Methods: Exact Shift-and-Invert Lanczos (ESIL), combining PARPACK with SuperLU when there is sufficient memory, and Jacobi-Davidson otherwise
- Current high-water marks: 47-cell chamber, finite element discretization of Maxwell's equations; system dimension 1.3 million; 20 million nonzeros in the system, 350 million in the LU factors; halved analysis time on 48 processors, scalable to many hundreds
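The ESIL recipe (Lanczos on a shifted, factored operator) can be sketched in serial with SciPy, whose `eigsh` wraps ARPACK and applies the shift-invert through a SuperLU factorization, mirroring the PARPACK + SuperLU pairing (a toy stand-in, not Omega3P's formulation):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

# Toy generalized eigenproblem A x = lambda B x (1D Laplacian, identity mass)
n = 5000
A = sp.diags([-np.ones(n - 1), 2.0 * np.ones(n), -np.ones(n - 1)],
             [-1, 0, 1], format='csc')
B = sp.identity(n, format='csc')

# Shift-and-invert about sigma: ARPACK runs Lanczos on (A - sigma*B)^{-1} B,
# with the inverse applied via a sparse LU factorization (SuperLU)
sigma = 0.0
vals, vecs = eigsh(A, k=4, M=B, sigma=sigma, which='LM')
print(vals)   # the four eigenvalues nearest the shift
```

The memory pressure cited on the slide (350 million nonzeros in the LU factors vs. 20 million in the system) is exactly the cost of that factorization, which is why Jacobi-Davidson is the fallback when memory is insufficient.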

Page 58

Optimizers

- Unconstrained or bound-constrained optimization: TAO (powered by PETSc, interfaced in the CCTTSS component framework) used in quantum chemistry energy minimization
- PDE-constrained optimization: Veltisto (powered by PETSc) used in a flow control application, to straighten out a wingtip vortex by wing surface blowing and suction
- "Best technical paper" at SC2002 went to the TOPS team: PETSc-powered inverse wave propagation employed to infer hidden geometry
- Problem scales shown: 4,000 controls on 128 procs; 2 million controls on 256 procs
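For the PDE-constrained case, the structure being exploited is the standard first-order optimality (KKT) system of the equality-constrained problem from page 43; λ below is the adjoint variable (notation added here for illustration):

```latex
\min_{u}\ \phi(x,u)\quad \text{s.t.}\quad F(x,u)=0
\;\;\Longrightarrow\;\;
\mathcal{L}(x,u,\lambda)=\phi(x,u)+\lambda^{T}F(x,u),
\qquad
\nabla_{x}\mathcal{L}=0,\quad \nabla_{u}\mathcal{L}=0,\quad F(x,u)=0,
```

a large coupled system amenable to the same Newton-Krylov machinery used for the forward simulation.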

Page 59

Performance

TOPS is tuning sparse kernels:
- (Jacobian) matrix-vector multiplication
- sparse factorization
- multigrid relaxation

Running on dozens of apps/platform combinations:
- Power3 (NERSC) and Power4 (ORNL)
- factors of 2 on structured (CMRS) and unstructured (CEMM) fusion apps

"Best student paper" at ICS2002 went to the TOPS team: theoretical model and experiments on the effects of register blocking for sparse mat-vec. Blocking of 4 rows by 2 columns is 4.07 times faster on Itanium 2 than the default 1×1 blocks (see the sketch below).
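A rough illustration of the register-blocking idea using SciPy's block-sparse (BSR) format; actual speedups are platform- and matrix-dependent, and this serial sketch is not the ICS2002 methodology:

```python
import time
import numpy as np
import scipy.sparse as sp

# Build a matrix whose nonzeros naturally occur in dense 4x2 blocks
rng = np.random.default_rng(0)
pattern = sp.random(2500, 5000, density=0.002, format='csr', random_state=0)
block = np.ones((4, 2))
A_bsr = sp.kron(pattern, block, format='bsr')   # 10000 x 10000, blocksize (4, 2)
A_csr = A_bsr.tocsr()                           # same matrix, unblocked storage
x = rng.standard_normal(A_csr.shape[1])

def bench(A, x, reps=200):
    t0 = time.perf_counter()
    for _ in range(reps):
        A @ x
    return time.perf_counter() - t0

print('CSR (1x1 blocks):', bench(A_csr, x))
print('BSR (4x2 blocks):', bench(A_bsr, x))     # fewer index lookups per flop
```

Storing one column index per 4×2 block instead of per nonzero cuts index traffic eightfold and lets the inner loop keep operands in registers, which is the effect the ICS2002 model quantifies.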

Page 60

Lessons to date

- Working with the same code on the same machine vastly speeds collaboration, as opposed to ftp'ing matrices around the country, etc.
- Exchanging code templates is better than exchanging papers, etc.
- Version control systems are essential to having any lasting impact or "insertion path" for solver improvements
- "Doing physics" is more fun than doing driven cavities

Page 61

Abstract Gantt Chart for TOPS

[Chart: for each algorithmic idea, overlapping phases over time - Algorithmic Development (e.g., ASPIN), Research Implementations (e.g., TOPSLib), Hardened Codes (e.g., PETSc), Applications Integration, Dissemination]

Each color module represents an algorithmic research idea on its way to becoming part of a supported community software tool. At any moment (vertical time slice), TOPS has work underway at multiple levels. While some codes are in applications already, they are being improved in functionality and performance as part of the TOPS research agenda.

Page 62

Goals/Success Metrics

TOPS users:
- Understand the range of algorithmic options and their tradeoffs (e.g., memory versus time)
- Can try all reasonable options easily without recoding or extensive recompilation
- Know how their solvers are performing
- Spend more time in their physics than in their solvers
- Are intelligently driving solver research, and publishing joint papers with TOPS researchers
- Can simulate truly new physics, as solver limits are steadily pushed back

Page 63

Expectations TOPS has of users

- Be willing to experiment with novel algorithmic choices: optimality is rarely achieved beyond model problems without interplay between physics and algorithmics!
- Adopt flexible, extensible programming styles in which algorithms and data structures are not hardwired
- Be willing to let us play with the real code you care about, but be willing, as well, to abstract out relevant compact tests
- Be willing to make concrete requests, to understand that requests must be prioritized, and to work with us in addressing the high-priority requests
- If possible, profile, profile, profile before seeking help

Page 64

For more information…

[email protected]
http://www.tops-scidac.org

Page 65

Related URLs

- Personal homepage (papers, talks, etc.): http://www.math.odu.edu/~keyes
- SciDAC initiative: http://www.science.doe.gov/scidac
- TOPS project: http://www.math.odu.edu/~keyes/scidac
- PETSc project: http://www.mcs.anl.gov/petsc
- Hypre project: http://www.llnl.gov/CASC/hypre
- ASCI platforms: http://www.llnl.gov/asci/platforms
- ISCR (annual report, etc.): http://www.llnl.gov/casc/iscr