Transcript of jefferson-coop
8/7/2019 jefferson-coop
http://slidepdf.com/reader/full/jefferson-coop 1/35
This work was performed under the auspices of the U.S. Department of Energy by University of California, Lawrence Livermore National Laboratory, under contract No. W-7405-Eng-48.
Cooperative Parallelism:
An evolutionary programming model
for exploiting massively parallel systems
David Jefferson, John May,
Nathan Barton, Rich Becker, Jarek Knap,
Gary Kumfert, James Leek, John Tannahill
Lawrence Livermore National Laboratory
Blue Gene / L
65,536 x 2 processors, 360 Tflops (peak)
Petaflop (peak) machine in 2 years
Petaflop (sustained) in 5 years
Co-op is a new programming paradigm and
components model for petascale simulation
• Petascale performance driven by need for multiphysics, multiscale models
  – fluid -- molecule
  – continuum metal -- crystal
  – plasma -- charged particle
  – classical -- quantum
• Multiphysics, multiscale models call for a simulation components architecture
  – whole, parallel simulation codes used as building blocks in larger simulations
  – allows composition (federation) and reuse of codes already mature and trusted
• Multiphysics, multiscale models naturally exhibit MPMD parallelism
  – different subsystems, or length and time scales, require multiphysics
  – multiphysics most efficient with different codes in parallel
• Efficient use of petascale resources requires more dynamic simulation algorithms
  – much more flexible use of resources: dynamic (sub)allocation of processor nodes
  – adaptive sampling family of multiscale algorithms
Co-op allows parallel simulations to be
used as components in larger computations
• Large parallel models treated as single objects:
  – coupled with little knowledge of each others' internals
• Coupled models:
  – different languages
  – different parallel decomposition
  – different physics
• Components:
  – dynamically launched
  – internally parallel
  – externally parallel
  – communicate in parallel
[figure: components coupled across time, space, state space, and scale; ensemble coupling for parametric sensitivity or optimization]
Strain rate localization can be predicted
with multiscale expanding cylinder model
1/8 exploding cylinder
• expands radially
• rings with reflecting
strain rate waves
• develops diagonal
shear bands
Classic SPMD embedding of fine-scale calculations
• nodes statically allocated and scheduled
• fine scale models executed sequentially
[figure: timeline of one major cycle across 64 nodes, showing fine scale physics interleaved with the coarse scale model]
Adaptive Sampling: a class of dynamic
algorithms for multiscale simulation
• Apply fine scale model where continuum model is invalid…
• …but just a sample of the elements
• Elsewhere, interpolate material response function from results previously calculated
• Much less fine scale work; remaining computation may be seriously unbalanced, however.
• More than an order of magnitude of performance improvement may be achieved.
• Adaptive sampling is not AMR!
[figure callouts: "coarse model is generally accurate"; "coarse model assumptions break down"]
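The interpolate-or-evaluate decision above can be sketched in a few lines. This is an illustrative toy, not the LLNL implementation: `run_fine_scale` is a hypothetical stand-in for the expensive fine-scale model, and the "database" is a plain list with linear nearest-neighbor search and zeroth-order interpolation.

```python
# Hypothetical sketch of the adaptive-sampling decision loop: reuse a
# nearby previously computed fine-scale result when one exists within
# tolerance, otherwise run the fine-scale model and record the result.

def run_fine_scale(x):
    # placeholder for an expensive fine-scale physics evaluation
    return 2.0 * x + 1.0

def adaptive_sample(inputs, tol=0.5):
    db = []           # previously computed (input, output) pairs
    evaluations = 0   # count of true fine-scale evaluations
    results = []
    for x in inputs:
        # nearest previously evaluated point, if any
        near = min(db, key=lambda p: abs(p[0] - x), default=None)
        if near is not None and abs(near[0] - x) <= tol:
            results.append(near[1])          # "interpolate" (0th order here)
        else:
            y = run_fine_scale(x)            # fall back to fine-scale model
            db.append((x, y))
            evaluations += 1
            results.append(y)
    return results, evaluations

results, evals = adaptive_sample([0.0, 0.1, 0.2, 3.0, 3.1])
print(evals)  # only 2 fine-scale evaluations for 5 queries
```

The point of the sketch is the sampling effect: clustered queries collapse onto a handful of true evaluations, which is where the order-of-magnitude savings comes from.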
Co-op model adds layer of dynamic MPMD
parallelism to familiar SPMD paradigm
New parallelism layer:
• MPMD federation: composed of symponents that use remote method invocation (RMI)
Familiar parallelism layers:
• SPMD symponent: composed of processes that use MPI
• Process: composed of threads that use shared variables, locks, etc.
• Thread: sequential, with vector, pipeline, or multi-issue parallelism
Adaptive sampling app with integrated fine scale DB
[figure: ALE3D continuum code with CouplerLib and FSDB (CSM), coupled to FSM Master / FSM Server groups]
n = 100 processes
z/p = 10^4 zones/process
z = 10^6 zones
T = 10^4 timesteps
? = 100 µsec/timestep
? = 10^-2 (eval fraction)
Co-op Architecture
• NodeSet allocate / deallocate
  – Contiguous node sets only
  – Suballocation from original allocation
  – Algorithms somewhat like memory allocation
• Symponent launch
  – Array of symponents can be launched on array of nodesets by single call
• Component termination detection
  – Parent symponent notified if child terminates
• Component kill
  – Must work when target is deadlocked, looping, etc.
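The memory-allocator analogy for contiguous NodeSet suballocation can be made concrete with a first-fit free-list sketch. Everything here is hypothetical illustration; the real Co-op runtime manages nodes inside the job's SLURM allocation, not Python tuples.

```python
# First-fit sketch of contiguous NodeSet suballocation, in the spirit of
# the memory-allocator analogy: free space is a list of (start, length)
# holes, allocation carves a contiguous run, deallocation coalesces holes.

class NodeSetAllocator:
    def __init__(self, total_nodes):
        self.free = [(0, total_nodes)]   # list of (start, length) holes

    def allocate(self, n):
        """Return (start, n) for a contiguous run of n nodes, or None."""
        for i, (start, length) in enumerate(self.free):
            if length >= n:
                if length == n:
                    del self.free[i]
                else:
                    self.free[i] = (start + n, length - n)
                return (start, n)
        return None

    def deallocate(self, nodeset):
        start, n = nodeset
        self.free.append((start, n))
        # coalesce adjacent holes, as a memory allocator would
        self.free.sort()
        merged = [self.free[0]]
        for s, l in self.free[1:]:
            ps, pl = merged[-1]
            if ps + pl == s:
                merged[-1] = (ps, pl + l)
            else:
                merged.append((s, l))
        self.free = merged

alloc = NodeSetAllocator(64)
a = alloc.allocate(16)   # (0, 16)
b = alloc.allocate(32)   # (16, 32)
alloc.deallocate(a)
alloc.deallocate(b)      # holes coalesce back to (0, 64)
```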
Remote Method Invocation (RMI)
• General semantics
  – Operation done by a thread on a symponent
  – It can be nonblocking: caller gets a ticket and can later check, or wait for, completion of the RMI
  – Exceptions supported
  – Concurrent RMIs on same symponent executed in nondeterministic order
• Three kinds of RMI recognized
  – Sequential body, threaded execution
    • Inter-thread synchronization required
    • MPI in body not permitted
    • Thread concurrency limited by OS
  – Parallel body, serialized execution
    • Atomic
    • No recursion; no circularity (results in deadlock)
    • MPI permitted and needed in body
  – One way
    • "Call" does not involve a return
    • Essentially an asynchronous, one-sided "active" message
  – Others might be recognized in the future
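The ticket semantics of a nonblocking RMI behave much like a future. This sketch uses Python's `concurrent.futures` as a stand-in for the Co-op/Babel machinery; `remote_method` is a made-up callee body, not part of any real API.

```python
# Rough analogue of a nonblocking RMI with a "ticket": the caller gets a
# handle immediately and can later check, or wait for, completion.
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=2)

def remote_method(x):
    # stands in for the RMI body executed on the callee symponent
    return x * x

# "ticket" = the future returned by the nonblocking invocation
ticket = executor.submit(remote_method, 7)

# caller continues working, then polls or blocks on the ticket
_ = ticket.done()            # nonblocking completion check
result = ticket.result()     # blocks until the RMI completes
print(result)                # 49
executor.shutdown()
```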
More about RMI
• Inter-symponent synchronization
  – RMIs queued, and executed only when callee executes AtConsistentState() method
  – Last RMI signaled by special RMI: continue()
• Intra-symponent synchronization
  – Sequential body, threaded RMIs must use proper POSIX inter-thread synchronization
• Implementation
  – Babel RMI over TCP
  – Persistent connections at the moment (except for one-way)
    • Soon to be non-persistent
  – Future implementations over
    • MPI-2
    • UDP
    • Native packet transports
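The queue-until-consistent rule above can be illustrated with a toy callee: incoming RMIs accumulate in a queue and run only when the callee reaches its consistent state, with `continue()` as the end-of-batch sentinel. The names mirror the slide's terminology, but the code is purely illustrative, not the Co-op runtime.

```python
# Hedged sketch of inter-symponent synchronization: RMIs are queued and
# executed only inside at_consistent_state(), which drains the queue up
# to and including the special continue() marker.
from collections import deque

class Symponent:
    def __init__(self):
        self.queue = deque()
        self.log = []     # order in which RMIs were executed

    def enqueue_rmi(self, name, fn):
        self.queue.append((name, fn))

    def at_consistent_state(self):
        # drain queued RMIs; continue() marks the end of this batch
        while self.queue:
            name, fn = self.queue.popleft()
            self.log.append(name)
            if name == "continue":
                break
            fn()

s = Symponent()
values = []
s.enqueue_rmi("setTemp", lambda: values.append(300))
s.enqueue_rmi("setPressure", lambda: values.append(1))
s.enqueue_rmi("continue", lambda: None)
s.at_consistent_state()
print(s.log)     # ['setTemp', 'setPressure', 'continue']
```

Nothing executes between the enqueues and the `at_consistent_state()` call, which is the point: the callee controls when it is safe to service remote requests.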
Babel and Co-op are intimately related
• Symponents are Babel objects
• Co-op RMI implemented over Babel RMI
• Symponent APIs expressed in Babel’s SIDL language
• Any thread with a reference to a symponent can call RMIs on it
• References can be passed as args, results
• Caller and callee can be in different languages
• Co-op rests totally on Babel for
– RMI syntax
– SIDL specification language
– Language interoperability
– Parts of implementation of RMI
MPMD refactoring and parallelized fine scale models
[figure: timeline across 64 nodes showing fine scale physics and the coarse scale model]
Adaptive Sampling
• evaluation fraction is the most critical performance parameter
[figure: timeline across 64 nodes showing full fine scale simulations, interpolated fine scale behavior, and the coarse scale model]
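Why the evaluation fraction dominates can be seen from a back-of-the-envelope cost model: per zone, cost is roughly f·c_fine + (1-f)·c_db. The constants below are illustrative assumptions, not measurements from these slides.

```python
# Toy cost model for adaptive sampling: a fraction f of zones pay the
# fine-scale price c_fine, the rest pay the much cheaper DB-lookup price
# c_db. Speedup is measured against always running the fine-scale model.

def speedup(f, c_fine=1000.0, c_db=1.0):
    """Speedup over running the fine-scale model everywhere."""
    return c_fine / (f * c_fine + (1 - f) * c_db)

# With f = 10^-2 (the eval fraction quoted on the FSDB slide) and a
# DB lookup assumed ~1000x cheaper, the model gives nearly two orders
# of magnitude:
print(round(speedup(0.01)))   # 91
```

Note the asymptote: as c_db → 0, speedup approaches 1/f, so halving the evaluation fraction roughly doubles the payoff, which is why f is the critical parameter.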
Adaptive sampling + active load balancing
yields dramatic speedup
[figure: timeline across nodes showing adaptive sample fine scale simulations, the coarse scale model, and database retrieval and interpolation]
Performance of adaptive sampling
using the Co-op programming model
[chart: wall-clock time vs. sim time (µsec) for classic model embedding, adaptive sampling, and adaptive sampling with load balancing]
Conclusions
• MP/MS simulation drives need for petascale performance
• MP/MS simulation requires
  – componentized model construction
  – MPMD execution
  – dynamic instantiation of components
    • hence dynamic node allocation
  – language interoperability
• Adaptive Sampling is amazingly powerful
PSI Project Overview
David Jefferson
Lawrence Livermore National Lab
Distribution of Coarse-scale and Fine-scale Models across Processors
[figure: wallclock time of one coarse-scale timestep; the coarse-scale model spawns many instances of the fine-scale model across processors]
MPMD refactoring allows better scheduling of fine scale model executions
• remote fine scale models
• nodes dynamically allocated and scheduled
• improved performance due to better balance
[figure: timeline across 64 nodes showing fine scale physics and the coarse scale model]
Additional parallelism then becomes available
• fine scale model executions independent
• "nearest neighbor" DB queries are mostly independent and easily parallelizable as well
[figure: timeline across 125 nodes showing adaptive sample fine scale simulations, the coarse scale model, and database retrieval and interpolation]
Multiscale material science application
Multiscale material science application
with parallel FS database
[figure: ALE3D with Coupler Lib (CSM) coupled to FSM Master / FSM Server groups and DB Master / DB Server groups, with DB Clone 1 … DB Clone k]
n = 100 processes
z/p = 10^4 zones/process
z = 10^6 zones
T = 10^4 timesteps
? = 100 µsec/timestep
? = 10^-2 (eval fraction)
query() max = z / ?; insert() max = z / ?; mean = ?z / ?
runFSM() max = z / ?; mean = ?z / ?
The PSI Project
• Development of Co-op model of hybrid componentized MPMD computation
  – Definition of computational model and semantic issues
  – Implementation of Co-op runtime system
  – Implementation of extensions to Babel
• Development of multiscale simulation technology using Co-op
  – Theory and practice of adaptive sampling
  – Implementation of adaptive sampling coupler within Co-op framework
  – Implementation of Fine Scale Model "database" suitable for adaptive sampling
    • M-tree database with nearest neighbor queries
Co-op Capabilities
• NodeSet allocate/deallocate
  – Suballocation of nodeset of any size from job's static allocation
  – Free sets of nodesets, not nodes
• Symponent launch / kill
  – Any process can launch an SPMD executable as a new symponent with any number of processes on a nodeset whose size divides n.
  – Parent-child hierarchy: parent process notified of child death; child killed if parent dies
  – Launch uses SLURM srun
  – Runaway or wedged symponent can be killed & its nodeset recovered
• Symponent remote references
  – Symponents can have remote references to one another, which they use for making RMI calls
  – Remote references can be used as arguments in RMI calls
• Symponents and Babel
  – Symponents are Babel objects, and present SIDL interfaces
  – Symponents inherit interfaces in type hierarchy, so they can be treated in object-oriented fashion
  – A symponent RMI is a Babel RMI
    • Full type safety
    • Language independence / interoperability
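The parent-child rules above (parent notified of child death; child killed if parent dies) can be modeled with a small tree. This is a toy illustration with hypothetical names; the real runtime enforces these rules on SLURM-launched processes.

```python
# Toy model of the symponent parent-child hierarchy: killing a symponent
# recursively kills its live children first, then notifies its parent.

class Symponent:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.alive = True
        self.notifications = []   # names of children that have died
        if parent:
            parent.children.append(self)

    def kill(self):
        # children die with their parent (depth-first)
        for child in self.children:
            if child.alive:
                child.kill()
        self.alive = False
        # a surviving parent is notified of the child's death
        if self.parent and self.parent.alive:
            self.parent.notifications.append(self.name)

root = Symponent("coarse")
fsm = Symponent("fsm-master", parent=root)
srv = Symponent("fsm-server", parent=fsm)

fsm.kill()                    # kills fsm-server too
print(root.notifications)     # ['fsm-master']
print(srv.alive)              # False
```

Killing the mid-level symponent takes its whole subtree down, while the root only hears about its direct child, which is the notification discipline the slide describes.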
Co-op Capabilities
• Symponent RMI & synchronization
  – RMI calls are from a thread to a symponent
  – RMIs are one-sided, unexpected, and by default nonblocking
  – Any number of in- and out-args of any size and type can be used
  – Full exception-throwing capability
  – RMIs can only be executed when callee calls atConsistentState()
  – Special "system" RMIs inherited by all symponents: continue() and kill()
  – Two kinds of user RMIs
    • Sequential body, threaded execution
      – Body executes in rank 0 process only
      – Body is sequential, and does not need MPI
      – Concurrent RMIs must synchronize with one another as threads
    • Parallel body, serialized execution
      – Each body may be parallel, running on all processes of the callee symponent, but multiple RMI calls are serially executed, and hence atomic
      – Normally uses MPI
Adaptive Sampling substitutes DB retrieval and interpolation for full fine scale evaluation
• subscale results tabulated in a DB
• faster DB queries and interpolations substituted for slower fine scale model executions
[figure: timeline across 64 nodes showing adaptive sample fine scale simulations, the coarse scale model, and database retrieval and interpolation]
Current implementation of Co-op runs multiscale models on Linux cluster
[figure: software stack, base layer: Linux]
Current implementation of Co-op runs multiscale models on Linux cluster
[figure: stack adds Co-oplib, MPI, and the Co-opd daemon on Linux; launch via SLURM / srun]
Not shown: SLURM daemons and srun() processes
Current implementation of Co-op runs multiscale models on Linux cluster
[figure: full stack on Linux: Co-opd, Babel, Co-oplib, MPI, and the CSM application; launch via SLURM / srun; RMI over UDP]
Not shown: SLURM daemons and srun() processes