Adaptive Multiscale Simulation Infrastructure - AMSI
Overview:
o Industry Standards
o AMSI Goals and Overview
o AMSI Implementation
o Supported Soft Tissue Simulation
o Results
W.R. Tobin, D. Fovargue, D. Ibanez, M.S. Shephard
Scientific Computation Research Center
Rensselaer Polytechnic Institute
Current Industry Standards – Physical Simulations
The overwhelming majority of numerical simulations conducted in HPC (and elsewhere) are single-scale:
o Continuum (e.g. Finite Element, Finite Difference)
o Discrete (e.g. Molecular Dynamics)
Phenomena at multiple scales can have profound effects on the eventual solution to a problem (e.g. fine-scale anisotropies)
Typically a physical model or scale is simulated using a Single Program Multiple Data (SPMD) style of parallelism:
o Quantities of interest (mesh, tensor fields, etc.) are distributed across the parallel execution space (a minimal sketch follows the figure below)
[Figure: geometric model, partition model, distributed mesh]
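To illustrate the SPMD style named above, here is a minimal MPI/C++ sketch (not AMSI code; the field size and partitioning are invented for illustration): every rank runs the same program but owns a different slice of a distributed field.

  // Minimal SPMD illustration (hypothetical, not AMSI code): every rank
  // runs the same program, but owns a different slice of a global field.
  #include <mpi.h>
  #include <vector>
  #include <cstdio>

  int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int global_dofs = 1000000;             // invented global field size
    int local_dofs = global_dofs / size;         // each rank owns a slice
    if (rank == size - 1) local_dofs += global_dofs % size;

    std::vector<double> field(local_dofs, 0.0);  // rank-local portion
    // ... assemble and solve on the local portion, exchanging ghost
    //     values with neighboring ranks as the discretization requires ...

    std::printf("rank %d owns %d dofs\n", rank, local_dofs);
    MPI_Finalize();
    return 0;
  }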
Interacting physical models and scales introduce a much more complex set of requirements on our use of the parallel execution space:
o Writing a new SPMD code for each new multiscale simulation would require intense reworking of the legacy codes used for single-scale simulations (possibly many times over)
We need an approach that can leverage the work that has gone into creating and perfecting legacy simulations, in the context of massively parallel simulations with interacting physical models.
[Figure: a primary SPMD code coupled to several auxiliary SPMD codes]
AMSI Goals
Take advantage of proven legacy codes to address the needs of multimodel problems:
o Minimize the need to rework legacy codes to execute in a more dynamic parallel environment
o The only desired edit/interaction points are those locations in the code where the values produced by multiscale interactions are needed
Allow dynamic scale load-balancing and process scale reassignment to reduce process idle time when a scale is blocked or underutilized.
AMSI Goals
Hierarchy of focuses:
o Abstract-Level: Support for implementing multi-model simulations on massively parallel HPC machines
o Simulation-Level: Allow dynamic runtime workflow management to implement versatile adaptive simulations
o Theory-Level: Provide generic control algorithms (and hooks to allow specialization) supported by real-time minimal simulation meta-modeling
o Developer-Level: Facilitate all of the above while minimizing AMSI system overheads and maintaining robust code
[Figure: adaptive simulation control loop — simulation goals, physics analysis, scale/physics linking models, and physical attributes feed simulation initialization; the running simulation is steered by simulation state control and adaptive simulation control, using discretization, model, and linking error estimates; discretization, model, and scale-linking improvement; model hierarchy control; and limits based on measured parameters]
AMSI Goals
A variety of end-users is targeted:
o Application Experts:
• Simulation end-users who want answers to various problems
o Modeling Experts:
• Introduce codes expressing new physical models
• Combine proven physical models in new ways to describe multiscale behavior
o Computational Experts:
• Introduce new discretization methods
• Introduce new numerical solution methods
• Develop new parallel algorithms
AMSI Overview
General meta-modeling services:
o Support for modeling computational scale-linking operations and data
• Model of scale-tasks and task-relations denoting multiscale data transfer
o Specializing this support will facilitate interaction with high-level control and decision-making algorithms
[Figure: scale linking — each scale-task (scaleX, scaleY) carries explicit and computational domains, math and computational models, and explicit and computational tensor fields; the scales are linked by geometric interactions, model relationships, and field transformations]
AMSI Overview
Dynamic management of the parallel execution space: process reassignment will use load-balancing support for the underlying SPMD distributed data, as well as the implementation of state-specific entry/exit vectors for scale-tasks.
o Load balancing of scale-coupling data is supported by the meta-model of that data in the parallel space
o Other data requires support for dynamic load balancing in any underlying libraries
o Can be thought of as a hierarchy of load-balancing operations:
• Multiple scale-task communication/computation balancing
• Single scale-task load balancing (standard SPMD load-balancing operators)
AMSI Implementation
AMSI::ControlService
o Primary application interaction point for AMSI; tracks the overall state of the simulation
o Higher-level control decisions use this object to implement those decisions and update the simulation meta-model
AMSI::TaskManager
o Maintains the computational meta-model of the parallel execution space and the various simulation models
AMSI::RelationManager
o Manages the computational scale-linking communication and the load balancing required for dynamic management of the parallel execution space
(A hypothetical usage sketch follows.)
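The class names above come from the slides; the following sketch of how a coupled driver might be structured around them is an assumption, not confirmed AMSI API (the onScale query and solver entry points are invented).

  // Hypothetical driver shape (class names from the slide; everything else
  // is an assumption). Each process learns its scale-task assignment from
  // the control layer, then runs that scale's legacy SPMD code, touching
  // AMSI only at the designated multiscale interaction points.
  #include <mpi.h>

  int main(int argc, char* argv[]) {
    MPI_Init(&argc, &argv);
    // amsi::ControlService and friends would be initialized here; then:
    //
    //   if (control->onScale("macro"))      // assumed query
    //     run_macro_fea(control);           // legacy FEA, lightly edited
    //   else
    //     run_micro_rve(control);           // legacy RVE code, lightly edited
    //
    // The TaskManager maintains the meta-model consulted by onScale; the
    // RelationManager routes the scale-linking communication performed by
    // both branches at their interaction points.
    MPI_Finalize();
    return 0;
  }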
AMSI Implementation
Real-time minimal simulation meta-model (a hypothetical setup sketch follows the figure below):
o Initialization actions
• Scale-tasks and their scale-linking relations
o Runtime actions
• Data distributions representing discrete units of generic scale-linking data
• Communication patterns determining the distribution of scale-linking communication, down to individual data distribution units
o A shift to more dynamic scale management will require new control data to be reconciled across processes and scales
• Change initialization actions to be (allowable) runtime actions
[Figure: at initialization, scale-tasks (scaleX, scaleY, scaleZ) are declared; at runtime, scale-linking data distributions and communication patterns are established between scaleX and scaleY]
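To make the initialization/runtime split concrete, here is a hedged setup sketch; ScaleTask_Define, Relation_Define, and DataDist_Create are hypothetical names patterned after the CommPattern_Create call shown on a later slide, not confirmed AMSI API.

  // Hypothetical setup (all names besides CommPattern_Create are assumptions
  // patterned on the calls shown on the next slides).

  // Initialization actions: declare scale-tasks and their relations.
  scaleX_id   = ScaleTask_Define("macro", num_macro_procs);
  scaleY_id   = ScaleTask_Define("micro", num_micro_procs);
  relation_id = Relation_Define(scaleX_id, scaleY_id);

  // Runtime actions: declare discrete units of scale-linking data and the
  // communication pattern distributing them between the two scales.
  dataDist_id = DataDist_Create(relation_id, num_rve_units);
  pattern_id  = CommPattern_Create(dataDist_id, scaleX_id, scaleY_id);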
AMSI Implementation
Two forms of control data parallel communication (an MPI-level sketch of the distinction follows the figure below):
o Assembly is a scale-task collective process
o Reconciliation is collective on the union of two scale-tasks associated by a communication relation
[Figure: assembly is collective within a single scale-task (scaleX or scaleY); reconciliation is collective over the union of scaleX and scaleY]
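The distinction can be pictured in plain MPI terms; this is a concept sketch, not AMSI internals, with the rank split invented for illustration.

  // Concept sketch in plain MPI (not AMSI internals): split the world into
  // two scale-tasks, then contrast a per-scale collective ("assembly") with
  // a collective over the union of two scales ("reconciliation").
  #include <mpi.h>

  int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Assign the first half of the ranks to scaleX, the rest to scaleY.
    int scale = (rank < size / 2) ? 0 : 1;
    MPI_Comm scale_comm;  // communicator of one scale-task
    MPI_Comm_split(MPI_COMM_WORLD, scale, rank, &scale_comm);

    // "Assembly": collective over a single scale-task.
    int local = 1, assembled = 0;
    MPI_Allreduce(&local, &assembled, 1, MPI_INT, MPI_SUM, scale_comm);

    // "Reconciliation": collective over the union of the two scale-tasks.
    // With only two scales their union is the whole world; with more scales
    // one would split out a communicator for just the related pair.
    int reconciled = 0;
    MPI_Allreduce(&local, &reconciled, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

    MPI_Comm_free(&scale_comm);
    MPI_Finalize();
    return 0;
  }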
AMSI Implementation
Scale-linking communication patterns are:
o Constructed via standard distribution algorithms, or
o Built through hooks provided for user-implemented pattern construction, unique to each data distribution (an illustrative callback sketch follows the figure below)

  CommPatternAlgo_Register(relation_id, CommPatternCreate_FuncPtr);
  CommPattern_Create(dataDist_id, owner_scale_id, foreign_scale_id);
[Figure: a communication pattern relating data distribution units on scaleX to processes on scaleY]
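As an illustration of the hook mechanism, a user-supplied pattern-creation function might look like the following; the callback signature, the pattern representation, and DataDist_UnitCount are assumptions, since only the two calls above appear on the slide.

  // Hypothetical user-implemented pattern construction: decide, per data
  // distribution unit, which foreign-scale process it is communicated with.
  void MyPatternCreate(int dataDist_id, int owner_ranks, int foreign_ranks,
                       int* pattern /* out: foreign rank per data unit */) {
    int units = DataDist_UnitCount(dataDist_id);  // assumed accessor
    for (int u = 0; u < units; ++u)
      pattern[u] = u % foreign_ranks;             // round-robin placement
  }

  CommPatternAlgo_Register(relation_id, &MyPatternCreate);
  CommPattern_Create(dataDist_id, owner_scale_id, foreign_scale_id);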
AMSI Implementation
Scale-linking communication is handled, on both sides, via a single function call (an illustrative call site follows):
o Determines whether the process belongs to the sending or receiving scale-task
o Communicates scale-linking quantities guided by a communication pattern
o The buffer is a contiguous memory segment packed with POD data; the MPI_Datatype argument must describe that data type
o At present a data distribution is limited to one POD representation

  Communicate(relation_id, pattern_id, buffer, MPI_Datatype);
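For instance, a macro-to-micro exchange of deformation gradients could be packed as below; only Communicate itself comes from the slide, while the POD struct, the MPI datatype construction, and all other identifiers are illustrative assumptions.

  // Illustrative call site. Both scale-tasks execute the same call; AMSI
  // decides per process whether it sends or receives, guided by the pattern.
  #include <mpi.h>
  #include <vector>

  struct RVEDeformation {  // one POD unit of scale-linking data (assumed)
    double F[9];           // deformation gradient at an integration point
  };

  MPI_Datatype rve_type;
  MPI_Type_contiguous(9, MPI_DOUBLE, &rve_type);
  MPI_Type_commit(&rve_type);

  std::vector<RVEDeformation> units(num_local_units);
  // macro ranks pack `units` before the call; micro ranks find it filled after
  Communicate(relation_id, pattern_id, units.data(), rve_type);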
AMSI Implementation
A shift to phased communication and dynamic scale-task management will introduce new requirements:
o Will reduce the number of explicit control data reconciliations
o Will require the introduction of implicit control data reconciliations during scale-linking operations
• Primary simulation control points
[Figure: scaleX and scaleY cycling through assemble, reconcile, and communicate steps; with phased communication, the reconcile/communicate step overlaps with compute]
Biotissue
Multiscale soft-tissue mechanics simulation:
o Engineering scale:
• Macroscale (Finite Element Analysis)
o Fine scales controlling engineering-scale behavior:
• Microscale fiber-only RVE (quasistatics)
• Microscale fiber-matrix RVE (FEA)
• (future project) Additional cellular scale(s) (FEA), an intermediate scale between the current scales
o Scale linking:
• Deformations to the RVE
• Force/displacement to the engineering scale
[Figure: macroscale finite element mesh coupled to a fiber-only RVE]
Biotissue Implementation
Scalable implementation with parallelized scale-tasks.
[Figure: macroscale processes macro0 ... macroN, each linked to a set of microscale processes micro0 ... microM]
Biotissue Implementation
Scalable implementation with parallelized scale-tasks:
o The ratio of macroscale mesh elements per macroscale process to the number of microscale processes determines the neighborhood of scale-linking communication (a worked example follows the figure below)
[Figure: neighborhoods of scale-linking communication between macroscale and microscale processes]
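As a worked example with invented numbers (for illustration only; real run configurations appear in the Results slides, and this ignores that each element carries several integration points, each with its own RVE):

  // Invented numbers, illustrating how the elements-per-process and
  // micro-process counts set the scale-linking neighborhood size.
  #include <cstdio>

  int main() {
    const int macro_elems     = 20000;  // macroscale mesh elements
    const int macro_procs     = 2;      // macroscale processes
    const int micro_procs     = 512;    // microscale processes
    const int elems_per_macro = macro_elems / macro_procs;  // 10000
    // With a uniform distribution, each macro process's elements are
    // spread over its own neighborhood of micro processes:
    const int micro_per_macro = micro_procs / macro_procs;  // 256
    std::printf("each macro rank links to a neighborhood of ~%d micro ranks,"
                " ~%d elements per micro rank\n",
                micro_per_macro, elems_per_macro / micro_per_macro);
    return 0;
  }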
Biotissue Implementation
o Macroscale – parallel Finite Element Analysis
• Distributed partitioned mesh, distributed tensor fields defined over the mesh, distributed linear algebraic system
• Stress field values characterize the macro-micro ratio
o Fiber-only microscale – quasistatics code
• ~1k nodes per RVE
• Rapid assembly and solve times per RVE in the serial implementation
• Strong scaling with respect to macroscale mesh size
• Initial results use fiber-only RVEs at every macroscale integration point to generate stress field values
o Fiber-matrix microscale – parallel FEA
• An order of magnitude more nodes per RVE (~10k-40k)
• More complex linear system assembly and longer (nonlinear) solve times necessitate a parallel implementation per RVE
Biotissue Implementation
Incorporating fiber-and-matrix microscale RVEs:
o Hierarchy of parallelism
• Macroscale SPMD code
• Microscale fiber-only code
• Microscale fiber-matrix SPMD code
o Nonlinear problem
o The macroscale-to-auxiliary-scales relation becomes more complex:
• Constitutive relation
• Fiber-only RVE
• Fiber-matrix RVE
o Adaptive processes allow these relations to change over time
An intermediate cellular scale will introduce even further complexity to this situation.
Results
The Biotissue simulation was run with a test problem:
o Standard tensile-test macroscale geometry (dogBone)
o Various discretizations of the geometry
• Current results are for 20k and 200k elements; working on (microscale) memory issues for 2M elements and higher
o Holding the macroscale process count fixed, varying the microscale
o Holding the microscale process count fixed, varying the macroscale
o Varying both scales
Results
[Plot: time (s) vs. number of processes on the microscale — 1st iteration of the multiscale solver, 20k mesh, 2 macro processes]
Results
[Plot: time (%) vs. number of processes on the microscale — 1st iteration of the multiscale solver, 20k mesh, 2 macro processes]
Results (varying the macroscale while holding the microscale fixed)
[Plot: time (s) vs. number of processes on the macroscale — 1st iteration of the multiscale solver, 200k mesh, 7680 micro processes]
Results
[Plot: time (%) vs. number of processes on the microscale — 1st iteration of the multiscale solver, 200k mesh, communication time ratios; arrows indicate increasing macro size (4, 8, 16, 32, 64)]
Results (weak scaling)
[Plots: time (s) vs. number of processes on the macroscale, two panels]
Closing Remarks
Results are just starting to come out of the implementation:
o Need to identify critical areas of each scale code to improve the overall performance of the multiscale code
o The shift to phased communication will allow the macroscale to process microscale results as they arrive, increasing computation/communication overlap
o The contributing microscale code needs memory-footprint improvements to mitigate running out of memory during longer runs (larger meshes)