ALEGRA is a large, highly capable, option-rich production application solving coupled multi-physics PDEs modeling magnetohydrodynamics, electromechanics, stochastic damage modeling, and detailed interface mechanics in high strain rate regimes on unstructured meshes in an ALE framework. Nearly all the algorithms must accept dynamic, mixed-material elements, which are modified by remeshing, interface reconstruction, and advection components. Recent trends in computing hardware have forced application developers to think about how to improve performance on traditional CPUs and to look forward to next generation platforms. Core to the ALEGRA performance strategy is to improve and rewrite loop bodies to conform with the requirements of high performance kernels: accessing data in array form, no pointer dereferencing, no function calls, and thread safety. Achieving this, however, requires changes to the underlying infrastructure. We report on recent progress in the infrastructure to support array-based data access and iteration over mesh objects. The effects on performance on traditional platforms will be shown. We also discuss the practical realities and cost estimates of moving an existing full-featured production application like ALEGRA toward running effectively on future platforms while remaining maintainable.
The ALEGRA Production Application: Strategy, Challenges and Progress Toward Next Generation Platforms
Richard R. Drake
Dept 1443 - Computational Multiphysics, Sandia National Laboratories
Algorithms & Abstractions for Assembly in PDE Codes, May 12-14, 2014
ALEGRA: Shock Hydro & MHD
• 20 years of development & evolution
• Operator split, multi-physics
• Includes explicit and implicit PDE solvers
• 2 and 3 spatial dimensions
• Core hydro is multi-material Lagrangian plus remap
• An XFEM capability is maturing
• 650k LOC (not including libraries, such as Trilinos)
• Mix of research, development, and production capabilities
• Extensive material model choices
[Figures: shock hydro, 2D magnetics, 3D resistive MHD, and material model examples]
Some ALEGRA Core Algorithms
• Mixed material cell treatment
• Remap
  • Remesh
  • Material interface reconstruction
  • Material & field advection
• Dynamic topology
  • Extended Finite Element Method (XFEM)
  • Spatial refinement/unrefinement
• Flexible set of material models comprising each material
• Central difference and midpoint time integration options
XFEM requires topological enrichment
Material interface reconstruction
Swept volume & intersection remap
NEVADA Infrastructure (A Framework)
Everything depends on the "Mesh"

[Architecture diagram: Unstructured Mesh and Structured Mesh at the core, with Field I/O, Load Balancing, Contact, Spatial Adaptivity, XFEM Adaptivity, Halo Comm, In-Situ Processing, In-Situ Viz, Remesh, Interface Reconstruction, Advection, Input Parsing, Physics Algorithms, and Materials built on top.]
Performance
We need to run faster!
• Customer needs
• NW needs
• Optics (marketing)

It has become clear that:
• There is no performance silver bullet
• Application software must change
• This will require a resource shift

Can't rely on faster CPUs anymore!
[Chart: 56% and 60% annotations; Muzia, 2D problem]
The ALEGRA Performance Strategy
Work in the present but aim for the future.
Incrementally reimplement algorithms
• Remesh, interface reconstruction, advection
• Lagrangian step pieces
• Matrix assembly coding
• Time step size computation

Focus on foundational concepts
• Accessing bulk data in array form
• Limit pointer dereferencing
• Limit function calls (non-inlined)
• Minimize data reads/writes
• Thread safety

Refactor support infrastructure
• Enable array-based access
• Enable flat, index-based iteration
• Enable thread safety (colorings?)

Consider new algorithms
• Alternate formulations
• New/different algorithms
[Komatitsch]
Progress in Data Layout
[Diagram: object-based layout (each object owns its variables v1 v2 v3 v4 ...) versus array-based layout (one contiguous array per variable, indexed by "obj_idx"); the storage is "transposed" into a "double**" per field.]

• Object-based layout has more direct access to memory.
• Array-based layout has better cache & TLB behavior.
• Depending on the algorithm and problem size, the better memory behavior may or may not offset the extra dereferencing.

Common, existing access pattern:
    ndVector_Var( CURCOOR )
Becomes, in object layout:
    nddata[ CURCOOR ]
and in array layout:
    nddata[ CURCOOR ][ ndobj_idx ]
Speedups: Object- versus Array-Based
• Comparisons of unmodified versus array-based code
• Intel chips: RedSky = Nehalem, TLCC2 = Sandy Bridge
• The memory behavior wins over the extra offset in many cases.
Algorithms Should Use the Arrays Directly
Object-based access:

    Element * el = 0;
    TOTAL_ELEMENT_LOOP(el) {
      const Vector vara = el->Vector_Var( VARA_IDX );
      Vector & varb = el->Vector_Var( VARB_IDX );
      el->Vector_Var( VARA_IDX ) += varb;
      el->Scalar_Var( VARC_IDX ) = vara * varb;
    }

Array-based access (oversimplified, hypothetical loop):

    ArrayView<Vector> vara = mesh->getField( VARA_IDX );
    ArrayView<Vector> varb = mesh->getField( VARB_IDX );
    ArrayView<double> varc = mesh->getField( VARC_IDX );
    Element * el = 0;
    TOTAL_ELEMENT_LOOP(el) {
      const int ei = el->Idx();
      const Vector va = vara[ei];
      vara[ei] += varb[ei];
      varc[ei] = va * varb[ei];
    }
Object List & Iteration Improvements
Index-based mesh object storage
• Enables iteration without dereferencing objects

Performance comparison shows no improvement
• Algorithms would have to take advantage first
Convert doubly linked lists to use integer offsets (index sets).

[Diagram: node objects in a doubly linked list versus a contiguous index set (0 1 2 ...) referencing the data arrays.]

Can now do this:

    for ( int i=0; i<N; ++i ) {
      int ni = index_list[i];
      vel[ni] = old_vel[ni] + dt * accl[ni];
      ...
    }
Object Ordering Exploration
Improve cache locality by reordering mesh objects? Result: no speedups over the default ordering.
• Order elements by a space-filling curve [wikipedia image]
• Order nodes by first-touch element loop
Summary
ALEGRA has adopted a low-risk performance strategy
• Main concept: incrementally rewrite algorithms toward NGP standards

Progress made on support infrastructure
• Array-based field data
• Integer index set object looping

1.4X speedup realized on realistic simulations

Work continues on infrastructure & algorithms
• Data: topology storage, integer field data, material data
• Algorithms: remap, Lagrangian step