Scientific Discovery through Advanced Computing (SciDAC)
The Pennsylvania State University, 28 April 2003
David E. Keyes
Center for Computational Science, Old Dominion University
&
Institute for Scientific Computing Research, Lawrence Livermore National Laboratory
Happy Gödel’s Birthday!
Born: 28 April 1906, Brünn, Austria-Hungary
Published “Incompleteness Theorem”, 1931
Fellow, Royal Society, 1968
National Medal of Science, 1974
Died: 14 January 1978, Princeton, NJ
“Gave a formal demonstration of the inadequacy of formal demonstrations” - anon.
“A consistency proof for any system … can be carried out only by modes of inference that are not formalized in the system … itself.” – Kurt Gödel
Remarks
This talk is:
a personal perspective, not an official statement of the U.S. Department of Energy
a project panorama more than a technical presentation
For related technical presentations:
Tuesday 2:30pm, 116 McAllister Building
personal homepage on the web (www.math.odu.edu/~keyes)
SciDAC project homepage on the web (www.tops-scidac.org)
Computational Science & Engineering
A “multidiscipline” on the verge of full bloom:
Envisioned by von Neumann and others in the 1940s
Undergirded by theory (numerical analysis) for the past fifty years
Empowered by spectacular advances in computer architecture over the last twenty years
Enabled by powerful programming paradigms in the last decade
Adopted in industrial and government applications:
Boeing 777’s computational design a renowned milestone
DOE NNSA’s “ASCI” (motivated by CTBT)
DOE SC’s “SciDAC” (motivated by Kyoto, etc.)
Niche for computational science
Has theoretical aspects (modeling)
Has experimental aspects (simulation)
Unifies theory and experiment by providing a common immersive environment for interacting with multiple data sets of different sources
Provides “universal” tools, both hardware and software
Telescopes are for astronomers, microarray analyzers are for biologists, spectrometers are for chemists, and accelerators are for physicists, but computers are for everyone!
Costs going down, capabilities going up every year
Terascale simulation has been “sold”
[Diagram: “Scientific Simulation” at the hub of several application areas]
Environment: global climate, contaminant transport
Lasers & Energy: combustion, ICF
Engineering: crash testing, aerodynamics
Biology: drug design, genomics
Applied Physics: radiation transport, supernovae
Simulation is attractive where experiments are controversial, dangerous, prohibited or impossible, difficult to instrument, or expensive (ITER: $20B).
In these, and many other areas, simulation is an important complement to experiment.
However, simulation is far from proven! To meet expectations, we need to handle problems of multiple physical scales.
“Enabling technologies” groups to develop reusable software and partner with application groups
Since start-up in 2001, 51 projects share $57M per year:
approximately one-third for applications
a third for “integrated software infrastructure centers”
a third for grid infrastructure and collaboratories
Plus, two new ~10 Tflop/s IBM SP machines available for SciDAC researchers
SciDAC project characteristics
Affirmation of importance of simulation for new scientific discovery, not just for “fitting” experiments
Recognition that leading-edge simulation is interdisciplinary: no independent support for physicists and chemists to write their own software infrastructure; they must collaborate with math & CS experts
Commitment to distributed hierarchical memory computers: new code must target this architecture type
Requirement of lab-university collaborations: complementary strengths in simulation; 13 laboratories and 50 universities in first round of projects
Major DOE labs
[Map of participating laboratories]
DOE Science Labs: Lawrence Berkeley, Pacific Northwest, Argonne, Oak Ridge, Brookhaven
DOE Defense Labs: Lawrence Livermore, Los Alamos, Sandia, Sandia Livermore
Also shown: Old Dominion University
Large platforms provided for ASCI
ASCI roadmap is to go to 100 Teraflop/s by 2006
Use variety of vendors: Compaq, Cray, Intel, IBM, SGI
Rely on commodity processor/memory units, with tightly coupled network
Massive software project to rewrite physics codes for distributed shared memory
…and now for SciDAC
Oak Ridge: IBM Power4 Regatta, 32 procs per node, 24 nodes, 166 Gflop/s per node, 4 Tflop/s (10 in 2003)
Berkeley: IBM Power3+ SMP, 16 procs per node, 208 nodes, 24 Gflop/s per node, 5 Tflop/s (upgraded to 10, Feb 2003)
New architecture on horizon: QCDOC
System-on-a-chip architecture
Designed for Columbia University and Brookhaven National Lab by IBM, using Power technology
Special-purpose machine for Lattice Gauge Theory quantum chromodynamics: “very fast conjugate gradient machine with small local memory”
10 Tflop/s total; copies ordered for UK and Japan QCD research groups
To be delivered August 2003
New architecture on horizon: Blue Gene/L
180 Tflop/s configuration (65,536 dual-processor chips)
Closely related to QCDOC prototype (IBM system-on-a-chip)
Ordered for LLNL institutional computing (not ASCI)
To be delivered 2004
New architecture just arrived: Cray X1
Massively parallel-vector machine highly desired by the global climate simulation community
32-processor prototype ordered for evaluation
Scale-up to 100 Tflop/s peak planned, if prototype proves successful
Delivered to ORNL 18 March 2003
“Boundary conditions” from architecture
Algorithms must run on physically distributed memory units connected by a message-passing network, each serving one or more processors with multiple levels of cache
“horizontal” aspects: network latency, bandwidth, diameter
“vertical” aspects: memory latency, bandwidth; load/store (cache/register) bandwidth
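A conventional way to quantify the “horizontal” cost (an illustrative textbook model, not part of the original slides) is the latency-bandwidth model for passing a message of n words:

```latex
% Latency-bandwidth model for a message of n words:
% alpha = per-message latency, beta = time per word (inverse bandwidth).
\[
  T_{\mathrm{msg}}(n) \;=\; \alpha + \beta\, n .
\]
```

An analogous two-parameter model (startup cost plus per-word cost) describes the “vertical” memory hierarchy, which is why both latency hiding and bandwidth reuse matter to solver design.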
Following the platforms …
… Algorithms must be:
highly concurrent and straightforward to load balance
not communication bound
cache friendly (temporal and spatial locality of reference)
highly scalable (in the sense of convergence)
Goal for algorithmic scalability: fill up memory of arbitrarily large machines while preserving nearly constant* running times with respect to a proportionally smaller problem on one processor
*logarithmically growing
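This goal can be stated as a weak-scaling efficiency (my formulation for concreteness, not wording from the slide): keep the work per processor fixed and ask how the time to solution degrades as processors are added.

```latex
% Weak scaling: N_0 fills one processor's memory; P processors solve a problem of size P*N_0.
% The scalability goal is E(P) ~ constant, or at worst E(P) = O(1/log P).
\[
  E(P) \;=\; \frac{T(N_0,\, 1)}{T(P N_0,\, P)} .
\]
```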
Official SciDAC goals
“Create a new generation of scientific simulation codes that take full advantage of the extraordinary computing capabilities of terascale computers.”
“Create the mathematical and systems software to enable the scientific simulation codes to effectively and efficiently use terascale computers.”
“Create a collaboratory software environment to enable geographically separated scientists to effectively work together as a team and to facilitate remote access to both facilities and data.”
Four science programs involved …
“14 projects will advance the science of climate simulation and prediction. These projects involve novel methods and computationally efficient approaches for simulating components of the climate system and work on an integrated climate model.”
“10 projects will address quantum chemistry and fluid dynamics, for modeling energy-related chemical transformations such as combustion, catalysis, and photochemical energy conversion. The goal of these projects is efficient computational algorithms to predict complex molecular structures and reaction rates with unprecedented accuracy.”
Four science programs involved …“4 projects in high energy and nuclear physics will explore the fundamental processes of nature. The projects include the search for the explosion mechanism of core-collapse supernovae, development of a new generation of accelerator simulation codes, and simulations of quantum chromodynamics.”
“5 projects are focused on developing and improving the physics models needed for integrated simulations of plasma systems to advance fusion energy science. These projects will focus on such fundamental phenomena as electromagnetic wave-plasma interactions, plasma turbulence, and macroscopic stability of magnetically confined plasmas.”
SciDAC per-year portfolio: $57M across the DOE program offices
[Pie chart: the $57M is divided among MICS (Math, Information and Computer Sciences), BER, BES, HENP, and FES; the labeled slices are $37M, $8M, $7M, $3M, and $2M, with MICS the largest share]
Data grids and collaboratories
National data grids: particle physics grid, Earth system grid, plasma physics for magnetic fusion, DOE Science Grid
Middleware: security and policy for group collaboration, middleware technology for science portals
Network research: bandwidth estimation, measurement methodologies and application; optimizing performance of distributed applications; edge-based traffic processing; enabling technology for wide-area data-intensive applications
Computer Science ISICs
Scalable Systems Software: provide software tools for management and utilization of terascale resources.
High-End Computer System Performance: Science and Engineering: develop a science of performance prediction based on concepts of program signatures, machine signatures, detailed profiling, and performance simulation, and apply to complex DOE applications. Develop tools that assist users to engineer better performance.
Scientific Data Management: provide a framework for efficient management and data mining of large, heterogeneous, distributed data sets.
Component Technology for Terascale Software: develop software component technology for high-performance parallel scientific codes, promoting reuse and interoperability of complex software, and assist application groups to incorporate component technology into their high-value codes.
Applied Math ISICs
Terascale Simulation Tools and Technologies: develop a framework for use of multiple mesh and discretization strategies within a single PDE simulation. Focus on high-quality hybrid mesh generation for representing complex and evolving domains, high-order discretization techniques, and adaptive strategies for automatically optimizing a mesh to follow moving fronts or to capture important solution features.
Algorithmic and Software Framework for Partial Differential Equations: develop a framework for PDE simulation based on locally structured grid methods, including adaptive meshes for problems with multiple length scales; embedded boundary and overset grid methods for complex geometries; efficient and accurate methods for particle and hybrid particle/mesh simulations.
Terascale Optimal PDE Simulations: develop an integrated toolkit of near-optimal-complexity solvers for nonlinear PDE simulations. Focus on multilevel methods for nonlinear PDEs, PDE-based eigenanalysis, and optimization of PDE-constrained systems. Packages sharing the same distributed data structures include: adaptive time integrators for stiff systems, nonlinear implicit solvers, optimization, linear solvers, and eigenanalysis.
Exciting time for enabling technologies
SciDAC application groups have been chartered to build new and improved COMMUNITY CODES. Community codes, such as NWCHEM, consume hundreds of person-years of development, run at hundreds of installations, are given large fractions of community compute resources for decades, and acquire an “authority” that can enable or limit what is done and accepted as science in their respective communities. Except at the beginning, it is difficult to promote major algorithmic ideas in such codes, since change is expensive and sometimes resisted.
ISIC groups have a chance, due to the interdependence built into the SciDAC program structure, to simultaneously influence many of these codes, by delivering software incorporating optimal algorithms that may be reused across many applications. Improvements driven by one application will be available to all.
While they are building community codes, this is our chance to build a CODE COMMUNITY!
SciDAC themes
Chance to do community codes “right”
Meant to set a “new paradigm” for other DOE programs: new 2003 nanoscience modeling initiative; possible new 2004 fusion simulation initiative
Cultural barriers to interdisciplinary research acknowledged up front
Accountabilities constructed in order to force the mixing of “scientific cultures” (physicists/biologists/chemists/engineers with mathematicians/computer scientists)
Opportunity: nanoscience modeling
Jul 2002 report to DOE proposes a $5M/year theory and modeling initiative to accompany the existing $50M/year experimental initiative in nanoscience
Report lays out research in numerical algorithms and optimization methods on the critical path to progress in nanotechnology
Opportunity: integrated fusion modeling
Dec 2002 report to DOE
Currently DOE supports 52 codes in Fusion Energy Sciences
US contribution to ITER will “major” in simulation
Initiative proposes to use advanced computer science techniques and numerical algorithms to improve the US code base in magnetic fusion energy and allow codes to interoperate
What’s new in SciDAC library software?
Philosophy of library usage: large codes interacting as peer applications, with complex calling patterns (e.g., a physics code calls an implicit solver, which calls a subroutine automatically generated from the original physics code to supply the Jacobian of the physics code’s residual); see the sketch below
Extensibility
Polyalgorithmic adaptivity
Resources for development, long-term maintenance, and support, not just for “dissertation scope” ideas
Experience on terascale computers
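As an illustration of that calling pattern, here is a minimal sketch against PETSc’s nonlinear solver interface (modern API; the routine names FormResidual and FormJacobian stand for hypothetical application code, and the Jacobian routine is exactly the piece that could be generated by automatic differentiation):

```c
#include <petscsnes.h>

/* Hypothetical application callbacks: the residual comes from the physics code;
   the Jacobian routine may be generated from it by automatic differentiation (AD). */
extern PetscErrorCode FormResidual(SNES snes, Vec x, Vec f, void *ctx);
extern PetscErrorCode FormJacobian(SNES snes, Vec x, Mat J, Mat P, void *ctx);

PetscErrorCode solve_physics(Vec x, Vec r, Mat J, void *appctx)
{
  SNES snes;
  PetscCall(SNESCreate(PETSC_COMM_WORLD, &snes));
  PetscCall(SNESSetFunction(snes, r, FormResidual, appctx));   /* physics residual F(x)        */
  PetscCall(SNESSetJacobian(snes, J, J, FormJacobian, appctx)); /* (possibly AD-generated) F'(x) */
  PetscCall(SNESSetFromOptions(snes)); /* Newton variant, line search, linear solver chosen at run time */
  PetscCall(SNESSolve(snes, NULL, x));
  PetscCall(SNESDestroy(&snes));
  return 0;
}
```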
Introducing “Terascale Optimal PDE Simulations” (TOPS) ISIC
Nine institutions, $17M, five years, 24 co-PIs
[Diagram: TOPS among the SciDAC projects]
34 apps groups (BER, BES, FES, HENP)
7 ISIC groups (4 CS, 3 Math), providing: adaptive gridding and discretization; solvers for f(ẋ, x, t, p) = 0, F(x, p) = 0, Ax = b, Ax = λBx, and min φ(x, u) s.t. F(x, u) = 0; systems software, component architecture, performance engineering, and data management
10 grid, data collaboratory groups
software integration
performance optimization
Who we are…
… the PETSc and TAO people
… the Hypre and Sundials people
… the SuperLU and PARPACK people
… as well as the builders of other widely used packages …
Plus some university collaborators
Our DOE lab collaborations predate SciDAC by many years.
Demmel et al., Manteuffel et al., Dongarra et al., Keyes et al., Ghattas et al., Widlund et al.
You may know the on-line “Templates” guides …
www.netlib.org/templates (124 pp.)
www.netlib.org/etemplates (410 pp.)
… these are good starts, but not adequate for SciDAC scales!
… SciDAC puts some of the authors (and many others) “on-line” for physics groups
Scope for TOPS
Design and implementation of “solvers”:
Time integrators: f(ẋ, x, t, p) = 0 (with sensitivity analysis)
Nonlinear solvers: F(x, p) = 0 (with sensitivity analysis)
Optimizers: min φ(x, u) s.t. F(x, u) = 0, u ≥ 0
Linear solvers: Ax = b
Eigensolvers: Ax = λBx
Software integration
Performance optimization
[Diagram of dependences among the solver components (Optimizer, Sens. Analyzer, Time integrator, Nonlinear solver, Eigensolver, Linear solver); arrows indicate which components build on which]
The power of optimal algorithms
Advances in algorithmic efficiency rival advances in hardware architecture
Consider Poisson’s equation, ∇²u = f, on a cube of size N = n³ (e.g., a 64 × 64 × 64 grid):

Year   Method        Reference                 Storage   Flops
1947   GE (banded)   von Neumann & Goldstine   n^5       n^7
1950   Optimal SOR   Young                     n^3       n^4 log n
1971   CG            Reid                      n^3       n^3.5 log n
1984   Full MG       Brandt                    n^3       n^3

If n = 64, this implies an overall reduction in flops of ~16 million*
*On a 16 Mflop/s machine, six months is reduced to 1 s
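A quick sanity check of the “~16 million” factor (an illustrative calculation, not part of the original slides; constant factors are ignored, so only ratios are meaningful):

```c
#include <stdio.h>
#include <math.h>

/* Evaluate the leading-order flop counts from the table above at n = 64. */
int main(void)
{
    double n   = 64.0;
    double ge  = pow(n, 7.0);            /* 1947: banded Gaussian elimination */
    double sor = pow(n, 4.0) * log(n);   /* 1950: optimal SOR */
    double cg  = pow(n, 3.5) * log(n);   /* 1971: conjugate gradients */
    double fmg = pow(n, 3.0);            /* 1984: full multigrid */

    printf("GE / SOR     : %.3g\n", ge / sor);
    printf("GE / CG      : %.3g\n", ge / cg);
    printf("GE / Full MG : %.3g\n", ge / fmg);   /* n^4 = 16.8 million */
    return 0;
}
```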
Algorithms and Moore’s Law
[Plot: relative speedup vs. year, for algorithms and for hardware]
This advance took place over a span of about 36 years, or 24 doubling times for Moore’s Law: 2^24 ≈ 16 million, the same as the factor from algorithms alone!
The power of optimal algorithms
Since O(N) is already optimal, there is nowhere further “upward” to go in efficiency, but one must extend optimality “outward”, to more general problems
Hence, for instance, algebraic multigrid (AMG), obtaining O(N) in indefinite, anisotropic, inhomogeneous problems
AMG framework: the “algebraically smooth” error that is not damped by pointwise relaxation is eliminated by choosing coarse grids, transfer operators, etc., based on numerical weights and heuristics
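For reference, the standard two-grid error-propagation operator (textbook multigrid algebra, not spelled out on the slide) shows how smoothing and coarse-grid correction divide the work; AMG’s job is to construct the prolongation P from the matrix entries alone:

```latex
% Error after one two-grid cycle, with smoother S (nu_1 pre- and nu_2 post-sweeps),
% prolongation P, and Galerkin coarse operator A_c = P^T A P.
\[
  e_{\mathrm{new}}
    \;=\;
  S^{\nu_2}\,\bigl(I - P\,(P^{T} A P)^{-1} P^{T} A\bigr)\,S^{\nu_1}\, e_{\mathrm{old}} .
\]
```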
Gordon Bell Prize performance

Year   Type   Application   Gflop/s   System         No. Procs
1988   PDE    Structures        1.0   Cray Y-MP              8
1989   PDE    Seismic           5.6   CM-2               2,048
1990   PDE    Seismic            14   CM-2               2,048
1992   NB     Gravitation       5.4   Delta                512
1993   MC     Boltzmann          60   CM-5               1,024
1994   IE     Structures        143   Paragon            1,904
1995   MC     QCD               179   NWT                  128
1996   PDE    CFD               111   NWT                  160
1997   NB     Gravitation       170   ASCI Red           4,096
1998   MD     Magnetism       1,020   T3E-1200           1,536
1999   PDE    CFD               627   ASCI BluePac       5,832
2000   NB     Gravitation     1,349   GRAPE-6               96
2001   NB     Gravitation    11,550   GRAPE-6            1,024
2002   PDE    Climate        26,500   Earth Sim         ~5,000
Gordon Bell Prize outpaces Moore’s Law
Four orders of magnitude in 13 years
[Plot: Gordon Bell Prize performance vs. Moore’s Law over time; the widening gap comes from CONCURRENCY!]
SciDAC application: Center for Extended Magnetohydrodynamic Modeling
Simulate plasmas in tokamaks, leading to understanding of plasma instability and (ultimately) new energy sources
Joint work between ODU, Argonne, LLNL, and PPPL
Optimal solvers
Convergence rate nearly independent of discretization parameters
Multilevel schemes for linear and nonlinear problems
Newton-like schemes for quadratic convergence of nonlinear problems
[Plot: time to solution vs. problem size (increasing with number of processors), contrasting an unscalable solver with a scalable one]
AMG shows perfect iteration scaling, in contrast to ASM, but still needs performance work to achieve temporal scaling on the CEMM fusion code M3D, though time is halved (or better) for large runs (all runs: 4K dofs per processor).
[Plots: iterations and execution time vs. number of processors (3, 12, 27, 48, 75) for ASM-GMRES and AMG-FGMRES, including AMG inner iterations]
Solver interoperability accomplishments
Hypre in PETSc: codes with a PETSc interface (like CEMM’s M3D) can invoke Hypre routines as solvers or preconditioners with a command-line switch
SuperLU_DIST in PETSc: as above, with SuperLU_DIST
Hypre in AMR Chombo code: so far, Hypre is a level-solver only; its AMG will ultimately be useful as a bottom-solver, since it can be coarsened indefinitely without attention to loss of nested geometric structure; also, FAC is being developed for AMR uses, like Chombo
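To make the interoperability concrete, here is a minimal sketch using present-day PETSc calls and option names (these reflect recent PETSc releases and may differ in detail from the 2003-era interface): the preconditioner behind a Krylov solve can be redirected to Hypre’s BoomerAMG or to SuperLU_DIST without touching the application code.

```c
#include <petscksp.h>

/* Route PETSc's preconditioner to a third-party solver. The same choices are
   available without recompiling via command-line options, e.g.
     -pc_type hypre -pc_hypre_type boomeramg
     -pc_type lu -pc_factor_mat_solver_type superlu_dist                      */
PetscErrorCode configure_pc(KSP ksp, PetscBool use_direct)
{
  PC pc;
  PetscCall(KSPGetPC(ksp, &pc));
  if (use_direct) {
    PetscCall(PCSetType(pc, PCLU));                                  /* sparse direct factorization */
    PetscCall(PCFactorSetMatSolverType(pc, MATSOLVERSUPERLU_DIST));  /* SuperLU_DIST backend */
  } else {
    PetscCall(PCSetType(pc, PCHYPRE));           /* Hypre preconditioner family */
    PetscCall(PCHYPRESetType(pc, "boomeramg"));  /* algebraic multigrid */
  }
  PetscCall(KSPSetFromOptions(ksp));  /* command-line options may still override these choices */
  return 0;
}
```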
Background of PETSc Library
Developed at Argonne to support research, prototyping, and production parallel solutions of operator equations in message-passing environments; now joined by four additional staff under SciDAC
Distributed data structures as fundamental objects: index sets, vectors/gridfunctions, and matrices/arrays
Iterative linear and nonlinear solvers, combinable modularly, recursively, and extensibly
Portable, and callable from C, C++, Fortran
Uniform high-level API, with multi-layered entry
Aggressively optimized: copies minimized, communication aggregated and overlapped, caches and registers reused, memory chunks preallocated, inspector-executor model for repetitive tasks (e.g., gather/scatter)
See http://www.mcs.anl.gov/petsc
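For readers new to the library, a minimal linear-solve skeleton looks like the following (a sketch using the present-day API, which differs in details, e.g. the SLES layer named in the next slide, from the 2003-era interface; matrix and vector assembly are elided):

```c
#include <petscksp.h>

/* Solve A x = b with a runtime-selectable Krylov method and preconditioner.
   A, b, and x are assumed to have been created and assembled elsewhere. */
PetscErrorCode solve_linear_system(Mat A, Vec b, Vec x)
{
  KSP ksp;
  PetscCall(KSPCreate(PETSC_COMM_WORLD, &ksp));
  PetscCall(KSPSetOperators(ksp, A, A));  /* operator and preconditioning matrix */
  PetscCall(KSPSetFromOptions(ksp));      /* e.g. -ksp_type gmres -pc_type hypre */
  PetscCall(KSPSolve(ksp, b, x));
  PetscCall(KSPDestroy(&ksp));
  return 0;
}
```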
User Code / PETSc Library Interactions
[Diagram: the user’s main routine calls PETSc’s timestepping solvers (TS), nonlinear solvers (SNES), and linear solvers (SLES), which rest on the KSP and PC objects; user code supplies application initialization, function evaluation, Jacobian evaluation, and post-processing. A second version of the diagram marks the routines “to be AD code”, i.e., targeted for automatic differentiation.]
Background of Hypre Library (to be combined with PETSc under SciDAC)
Developed at Livermore to support research, prototyping, and production parallel solutions of operator equations in message-passing environments; now joined by seven additional staff under ASCI and SciDAC
Object-oriented design similar to PETSc
Concentrates on linear problems only
Richer in preconditioners than PETSc, with focus on algebraic multigrid
Includes other preconditioners, including sparse approximate inverse (ParaSails) and parallel ILU (Euclid)
See http://www.llnl.gov/CASC/hypre/
Hypre’s “Conceptual Interfaces”
[Diagram: linear system interfaces map different data layouts (structured, composite, block-structured, unstructured, CSR) to appropriate linear solvers (GMG, FAC, hybrid, AMGe, ILU, …)]
Slide c/o R. Falgout, LLNL
Eigensolvers for Accelerator Design
Stanford’s Omega3P is using TOPS software to find EM modes of accelerator cavities
Methods: Exact Shift-and-Invert Lanczos (ESIL), combining PARPACK with SuperLU when there is sufficient memory, and Jacobi-Davidson otherwise
Current high-water marks: 47-cell chamber, finite element discretization of Maxwell’s equations; system dimension 1.3 million; 20 million nonzeros in the system, 350 million in the LU factors; halved analysis time on 48 processors, scalable to many hundreds
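For context, the standard shift-and-invert spectral transformation (textbook algebra, not detailed on the slide) explains where the SuperLU factorization enters: Lanczos is applied to a transformed operator whose dominant eigenvalues correspond to the eigenvalues of the cavity problem nearest the shift σ.

```latex
% Shift-and-invert transformation for the generalized problem A x = lambda B x.
% Each Lanczos step applies (A - sigma B)^{-1}, i.e., a solve with the LU factors.
\[
  (A - \sigma B)^{-1} B\, x \;=\; \theta\, x,
  \qquad
  \theta = \frac{1}{\lambda - \sigma},
  \qquad
  \lambda = \sigma + \frac{1}{\theta}.
\]
```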
Optimizers
Unconstrained or bound-constrained optimization: TAO (powered by PETSc, interfaced in the CCTTSS component framework) used in quantum chemistry energy minimization
PDE-constrained optimization: Veltisto (powered by PETSc) used in a flow control application, to straighten out a wingtip vortex by wing surface blowing and suction
“Best technical paper” at SC2002 went to the TOPS team: PETSc-powered inverse wave propagation employed to infer hidden geometry
[Figure callouts: 4,000 controls on 128 procs; 2 million controls on 256 procs]
Performance
TOPS is tuning sparse kernels: (Jacobian) matrix-vector multiplication, sparse factorization, multigrid relaxation
Running on dozens of apps/platform combinations: Power3 (NERSC) and Power4 (ORNL); factors of 2 on structured (CMRS) and unstructured (CEMM) fusion apps
“Best student paper” at ICS2002 went to the TOPS team: theoretical model and experiments on the effects of register blocking for sparse mat-vec
Blocking of 4 rows by 2 columns is 4.07 times faster on Itanium 2 than the default 1x1 blocks
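To make the register-blocking idea concrete, here is an illustrative 4x2 block compressed sparse row (BCSR) matrix-vector kernel (a sketch of the technique, not the actual TOPS tuning code); the four partial sums for a block row stay in registers across the inner loop, which is the source of the speedup:

```c
/* y += A*x for a matrix stored in 4x2 block CSR (BCSR) format.
   nbrows  : number of block rows (each covers 4 scalar rows)
   browptr : start of each block row in bcolind/vals (length nbrows+1)
   bcolind : block column index of each 4x2 block
   vals    : 8 values per block, stored row-major within the block            */
void bcsr_spmv_4x2(int nbrows, const int *browptr, const int *bcolind,
                   const double *vals, const double *x, double *y)
{
    for (int ib = 0; ib < nbrows; ib++) {
        double y0 = 0.0, y1 = 0.0, y2 = 0.0, y3 = 0.0;  /* register-resident partial sums */
        for (int jb = browptr[ib]; jb < browptr[ib + 1]; jb++) {
            const double *a = vals + 8 * jb;
            double x0 = x[2 * bcolind[jb]];
            double x1 = x[2 * bcolind[jb] + 1];
            y0 += a[0] * x0 + a[1] * x1;
            y1 += a[2] * x0 + a[3] * x1;
            y2 += a[4] * x0 + a[5] * x1;
            y3 += a[6] * x0 + a[7] * x1;
        }
        y[4 * ib + 0] += y0;
        y[4 * ib + 1] += y1;
        y[4 * ib + 2] += y2;
        y[4 * ib + 3] += y3;
    }
}
```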
Lessons to date
Working with the same code on the same machine vastly speeds collaboration, as opposed to ftp’ing matrices around the country, etc.
Exchanging code templates is better than exchanging papers, etc.
Version control systems are essential to having any lasting impact or “insertion path” for solver improvements
“Doing physics” is more fun than doing driven cavities
Abstract Gantt Chart for TOPS
[Chart: rows for Algorithmic Development, Research Implementations, Hardened Codes, Applications Integration, and Dissemination, plotted against time; examples called out include ASPIN, TOPSLib, and PETSc]
Each color module represents an algorithmic research idea on its way to becoming part of a supported community software tool. At any moment (vertical time slice), TOPS has work underway at multiple levels. While some codes are in applications already, they are being improved in functionality and performance as part of the TOPS research agenda.
Goals/Success Metrics
TOPS users —
understand the range of algorithmic options and their tradeoffs (e.g., memory versus time)
can try all reasonable options easily without recoding or extensive recompilation
know how their solvers are performing
spend more time in their physics than in their solvers
are intelligently driving solver research, and publishing joint papers with TOPS researchers
can simulate truly new physics, as solver limits are steadily pushed back
Expectations TOPS has of Users
Be willing to experiment with novel algorithmic choices – optimality is rarely achieved beyond model problems without interplay between physics and algorithmics!
Adopt flexible, extensible programming styles in which algorithms and data structures are not hardwired
Be willing to let us play with the real code you care about, but be willing, as well, to abstract out relevant compact tests
Be willing to make concrete requests, to understand that requests must be prioritized, and to work with us in addressing the high-priority requests
If possible, profile, profile, profile before seeking help
Related URLs
Personal homepage (papers, talks, etc.): http://www.math.odu.edu/~keyes
SciDAC initiative: http://www.science.doe.gov/scidac
TOPS project: http://www.math.odu.edu/~keyes/scidac
PETSc project: http://www.mcs.anl.gov/petsc
Hypre project: http://www.llnl.gov/CASC/hypre
ASCI platforms: http://www.llnl.gov/asci/platforms
ISCR (annual report, etc.): http://www.llnl.gov/casc/iscr