Algorithm and Scaling (Issues) for Aerospace (CFD) Codes
Transcript of Algorithm and Scaling (Issues) for Aerospace (CFD) Codes
Scope of Presentation
Range of aerospace CFD and related applications
Hierarchy of simulation approaches
Hierarchy of algorithmic approaches
Algorithm and scalability issues and considerations
Presentation Approach & Goals
A picture is worth a thousand words; we will use ten thousand words and 1 picture == eleven thousand word-equivalents
Catalog issues and serve as a collective conscience
Discuss the relationship between application needs, algorithms, modeling approaches, and HPC issues and possibilities
CFD++ Aerospace Applications
External aerodynamics, propulsion integration, component integration, systems, cabin airflow, FADEC, icing, fuel tank purge, thrust reverser, propulsion, nozzle design, jet noise
Plumes, trajectory, aerodynamic coefficients, drag polar, dynamic derivatives, store separation, canopy separation, sabot separation, stage separation, pilot seat ejection, projectiles, spinning projectiles
Synthetic jets, turbomachinery, blade design, blade cooling, pulsed detonation, flapping wings, flexible wings, entomopters, helicopters, propellers, rotors, parachutes, parachutists, sky-diving
Spacecraft launch, reentry vehicles, rocket-assisted landings (Earth, Mars, Venus), X-Prize vehicles, land speed record vehicles, bullets, artillery rounds, liquid fuel breakup, liquid fuel sloshing and feed, acceleration/deceleration effects, aeroacoustics, Flow Structure Interaction (FSI)
What’s special about Aerospace CFD?
Extremes of scales, operating conditions, physics and chemistry, speeds, application-specific needs (extraction of useful information)
Nonlinearity is most often inherent
It is not just the simulation itself that counts: if no information output is required, there is no need to do the simulation
Hierarchy of problem classes
Steady state and unsteady problems
Small, medium, and large scale problems
Entire configurations as well as analysis of components
Engineering analysis, scientific analysis, troubleshooting
All speeds, atmospheric conditions, diverse fluids and their properties
Common Elements of Simulations
Physics (nature) → Math Model of Physics → Numerical Model of Math Model → Computational Model → Simulation Results, with human(s) in the loop
Common Underlying Physical Processes
Convection: $\partial\,(u_k\,\overline{u_i u_j})/\partial x_k$
Production: $P_{ij}$
Dissipation: $\epsilon_{ij}$
Redistribution: $\phi^{*}_{ij}$
Diffusion: $d_{ij}$
Evolution: $\partial\,\overline{u_i u_j}/\partial t$
Together these balance in a transport equation of the form
$$\frac{\partial\,\overline{u_i u_j}}{\partial t} + \frac{\partial\,(u_k\,\overline{u_i u_j})}{\partial x_k} = P_{ij} + \phi^{*}_{ij} - \epsilon_{ij} + d_{ij}$$
Summary of some HPC issues
Loading the problem, saving final results
Checkpointing
Computational vs. communications performance (scalability)
Data extraction issues
Robustness (10000-way parallel should be as robust as the serial algorithm)
Data-center issues (throughput, storage)
Visualization, interaction with the running case
Modeling Hierarchy
Potential flow assumption
Small-disturbance approaches
Inviscid flows taken separately, and hybridized with boundary-layer theory
Reynolds/Favre-averaged N-S equations with phenomenological turbulence models
LES and hybrid RANS-LES approaches
Special equations and models
Mesh possibilities
Surface mesh only (panel methods)
Cartesian mesh, almost-Cartesian mesh
Structured mesh – hex (3D) & quad (2D)
Unstructured – all cell types
Hybrid structured and unstructured meshes, hex-core meshes
Patched and overset meshes
Moving (dynamic) meshes
Flexible boundaries and meshes
“Extreme Grids”
Aspect ratios of 10000 to 1 or more (boundary-layer resolution with Y+ < 1); a worked estimate follows below
Mesh sizes of hundreds of millions of cells and more
Extreme grid spacings present in the mesh
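To make the aspect-ratio figure concrete, here is a standard flat-plate estimate of the first wall-normal cell height for $y^+ = 1$ (the correlation and flow conditions below are textbook assumptions, not values from the slides):

```latex
% First-cell height for y+ = 1; a flat-plate skin-friction correlation
% and sea-level air at U = 100 m/s, Re_x = 10^7 are assumed.
\begin{align*}
  c_f &\approx 0.026\,\mathrm{Re}_x^{-1/7} \approx 0.0026, &
  \tau_w &= \tfrac{1}{2}\, c_f\, \rho\, U_\infty^2 \approx 16\ \mathrm{Pa}, \\
  u_\tau &= \sqrt{\tau_w/\rho} \approx 3.6\ \mathrm{m/s}, &
  \Delta y_1 &= \frac{y^+\,\nu}{u_\tau} \approx 4\ \mu\mathrm{m}.
\end{align*}
% With streamwise spacing of a few centimeters, the wall cells reach
% aspect ratios of order 10^4:1, as stated above.
```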
Numerical approaches
Explicit and implicit
Fractional-step and factored schemes
Finite volume, finite difference schemes
Finite element schemes
Spectral and spectral-element schemes
“Local” schemes and “global” schemes
Some HPC algorithmic challenges
Challenges of making implicit schemes be really implicit in multi-CPU computations
Ensuring insensitivity of results to variations in the number of parallel processes used (see the illustration below)
How to make the 10000-way parallel computation as robust as the serial algorithm
How to make the 10000-way parallel computation converge as well, but in much less time
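One reason results can vary with the number of processes: changing the partition changes the grouping of floating-point sums, and floating-point addition is not associative. A minimal stand-alone illustration (contrived numbers, not CFD++ code):

```cpp
// Floating-point addition is not associative: regrouping the same sum,
// as different domain decompositions do, changes the rounded result.
#include <cstdio>

int main() {
    const double big = 1e16, small = 1.0;
    double a = (big + small) + small;   // grouping 1: small terms absorbed
    double b = big + (small + small);   // grouping 2: small terms survive
    std::printf("%.1f\n%.1f\n", a, b);  // the two sums differ
    return 0;
}
```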
Adaptive meshes
Adaptive elements (cells)
Adaptive grids
H-adaptation, P-adaptation, H-P-adaptation
Classification of Algorithms
Low information density schemes – expand the stencil to improve accuracy
High information density schemes – expand the information content per cell (e.g. use values and derivatives, or values at multiple collocation points)
Homogeneity (or lack of it) of the discretization and solution methodology
Homogeneity (or lack of it) of the underlying physics models
The usual scalability considerations
Computation and communication
Computation versus communication
Overlap of computation and communication (see the sketch below)
Bulk of communication for local schemes can follow a pattern of one-to-a-few connectivity
Global operations – global reductions often determine scalability
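A minimal sketch of such overlap, assuming a 1-D domain decomposition with one-cell halos and a 3-point stencil (illustrative only, not CFD++ source):

```cpp
// Halo exchange overlapped with interior work: interior cells are
// updated while halo messages are in flight; only the two boundary
// cells wait on communication.
#include <mpi.h>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int n = 1 << 20;                              // local cells
    std::vector<double> u(n + 2, 1.0), un(n + 2, 0.0);  // +2 halo cells
    const int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
    const int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

    MPI_Request req[4];
    MPI_Irecv(&u[0],     1, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &req[0]);
    MPI_Irecv(&u[n + 1], 1, MPI_DOUBLE, right, 1, MPI_COMM_WORLD, &req[1]);
    MPI_Isend(&u[1],     1, MPI_DOUBLE, left,  1, MPI_COMM_WORLD, &req[2]);
    MPI_Isend(&u[n],     1, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &req[3]);

    // Interior update proceeds while communication is in flight.
    for (int i = 2; i < n; ++i)
        un[i] = 0.25 * u[i - 1] + 0.5 * u[i] + 0.25 * u[i + 1];

    // Boundary cells need the halo values, so wait only here.
    MPI_Waitall(4, req, MPI_STATUSES_IGNORE);
    un[1] = 0.25 * u[0] + 0.5 * u[1] + 0.25 * u[2];
    un[n] = 0.25 * u[n - 1] + 0.5 * u[n] + 0.25 * u[n + 1];

    MPI_Finalize();
    return 0;
}
```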
Recent Scalability Improvements
CFD++ now scales well to a very large number of cores
The scalability improvements are universal – they apply to all modern HPC platforms from all vendors
Tests have shown effective performance all the way up to 4096 cores
Even relatively small grids (e.g. 16 million cells) scale well to 2048 or even 4096 cores, depending on the computer and the type of case run
Goal – to demonstrate similar performance on 10000 to 40000 cores
[Figure: two plots of scaling performance vs. number of CPU cores.
Ex 1: 33M cells, Computer 1, Case 1 (up to ~1200 cores).
Ex 2: 16M cells, Computer 2, Case 2 (up to ~4500 cores).]
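For reference, the scaling performance plotted above is typically the speedup relative to a baseline core count; the usual definitions (standard ones, not stated on the slides) are:

```latex
% Speedup and parallel efficiency on p cores, from wall-clock time T(p),
% with p_ref the smallest core count the case fits on.
\[
  S(p) = p_{\mathrm{ref}}\,\frac{T(p_{\mathrm{ref}})}{T(p)},
  \qquad
  E(p) = \frac{S(p)}{p}.
\]
```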
Some Influences on Scalability
Effect of physics – increased sophistication means more computation, often more scalability
Effect of numerics – increased accuracy means more computation and more communication, often more scalability
Effect of grid – more grid means more computation and less communication for “local” algorithms
Additional thoughts on Parallel Processing
Two ways of using multiple compute engines: parallel computations and pipelined computations
Pipelined algorithms have not been exploited much at the HPC level
Process-level and thread-level parallelism are beginning to be combined (e.g. to exploit GPGPUs); a hybrid sketch follows below
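A minimal sketch of combining the two levels, using MPI across processes and OpenMP threads within each (a generic pattern, not CFD++ source):

```cpp
// Hybrid parallelism: MPI between processes, OpenMP threads within one.
#include <mpi.h>
#include <vector>

int main(int argc, char** argv) {
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

    std::vector<double> cells(1 << 22, 1.0);
    double local = 0.0;

    // Thread-level parallelism inside the process.
    #pragma omp parallel for reduction(+ : local)
    for (long i = 0; i < (long)cells.size(); ++i)
        local += cells[i] * cells[i];

    // Process-level reduction across the whole machine.
    double global = 0.0;
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}
```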
Load balancing issues
Structured vs. unstructured grids (usually solved by weighted domain decomposition; see the sketch below)
Adaptive algorithms and adaptive meshes
Different physics in different regions
Moving meshes and overset meshes
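The weighting idea in one dimension (real codes use graph partitioners such as METIS/ParMETIS; this prefix-sum sketch only illustrates equalizing cost rather than cell count):

```cpp
// Weighted 1-D decomposition: parts get equal summed cost, not equal
// cell counts, so regions with expensive physics get fewer cells.
#include <numeric>
#include <vector>

std::vector<int> weighted_cuts(const std::vector<double>& w, int parts) {
    std::vector<double> prefix(w.size() + 1, 0.0);
    std::partial_sum(w.begin(), w.end(), prefix.begin() + 1);
    const double target = prefix.back() / parts;
    std::vector<int> cuts;                 // exclusive end of each part
    for (int p = 1, e = 0; p < parts; ++p) {
        while (e < (int)w.size() && prefix[e] < p * target) ++e;
        cuts.push_back(e);
    }
    cuts.push_back((int)w.size());
    return cuts;
}

int main() {
    // Cells 50..99 carry extra physics, costing 4x cells 0..49.
    std::vector<double> cost(100, 1.0);
    for (int i = 50; i < 100; ++i) cost[i] = 4.0;
    auto cuts = weighted_cuts(cost, 4);    // cuts at 54, 69, 85, 100
    (void)cuts;
    return 0;
}
```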
Optimization considerations
Parallel algorithms for optimization
How to use large numbers of processors – e.g. run many cases in parallel (see the sketch below)
Pre-compute the case matrix, sensitivities, etc., and then train neural networks or tabulate sensitivities before applying the optimization procedure
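One way to use many processors is to farm out the case matrix, giving each case its own communicator. A sketch (run_case is a hypothetical placeholder for a full solver driver, not a CFD++ entry point):

```cpp
// Case farming: split the global communicator so each group of ranks
// runs one case of a pre-computed case matrix independently.
#include <mpi.h>

void run_case(MPI_Comm comm, int case_id) {
    int r;
    MPI_Comm_rank(comm, &r);
    // ... a real driver would read case_id's flow conditions and solve ...
}

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int n_cases = 8;                                // e.g. 8 angles of attack
    int group = rank / ((size + n_cases - 1) / n_cases);  // color = case index
    if (group >= n_cases) group = n_cases - 1;

    MPI_Comm case_comm;                                   // private per-case communicator
    MPI_Comm_split(MPI_COMM_WORLD, group, rank, &case_comm);
    run_case(case_comm, group);

    MPI_Comm_free(&case_comm);
    MPI_Finalize();
    return 0;
}
```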
Multi-physics considerations
Communications between non-homogeneous simulation tools
Communications between diverse hardware platforms
Tight coupling vs. loose coupling considerations
Need for Parallel I/O and File systems
Very large scale problems
Very large number of processors
Initial load and final save + intermediate data output
Asymmetric data extraction needs
Typical “post-processing” needs
Global information (forces and moments, lift, drag, torque)
Semi-global information (forces and moments along the wing span, along the fuselage)
Reduced subsets – iso-surfaces, surface data, cut-planes
Time-averages versus instantaneous values
In-situ “post”-processing can be very useful
Single and Distributed File Parallel I/O
Parallel I/O (PIO) can be accomplished in two ways (a Single-File sketch follows below):
In Single-File mode, PIO reads and writes the current full-mesh/full-solution files
In Distributed-File mode, PIO reads and writes a set of files (e.g. placed in subdirectories) associated with each parallel process
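A minimal Single-File sketch using MPI-IO (the file layout is invented for illustration and is not the CFD++ format):

```cpp
// Single-File parallel I/O: all ranks write their partition of the
// solution into one shared file at rank-dependent offsets.
#include <mpi.h>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const long n_local = 1 << 20;             // cells owned by this rank
    std::vector<double> sol(n_local, rank);   // local slice of the solution

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "solution.bin",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    // Contiguous-by-rank layout: each rank writes at its own offset.
    MPI_Offset offset = (MPI_Offset)rank * n_local * sizeof(double);
    MPI_File_write_at_all(fh, offset, sol.data(), (int)n_local,
                          MPI_DOUBLE, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    // Distributed-File mode would instead open one file per rank
    // (e.g. on MPI_COMM_SELF with a per-rank path).
    MPI_Finalize();
    return 0;
}
```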
Interactive massively parallel computing
Steady state versus transient (unsteady) computations
Links with front-end and graphical processing
Even post-processing of large scale problems may require substantial parallel computing resources
One should not just focus on the “batch” computing model
Some elements of the balancing act
Computation
Communication
Memory requirements
I/O requirements
Accuracy requirements
Robustness requirements
In-situ solution processing requirements
Bandwidths to consider
Number of cores vs. number of I/O channels
Memory bandwidth from core to memory (a roofline-style estimate follows below)
Memory access conflicts
Some old ideas revisited
Paying more attention to connectivity architecture
Minimization of hops
Domain decomposition that minimizes traffic between switches
How many switches or hops (groups of nodes), how many nodes, how many processors in a node, how many cores per processor
Final thoughts
The challenge of producing codes that work in the user’s hands and computing facilities
Ease of use
Scalability and effectiveness vs. just scalability
Resource maximization versus minimization
What can be done with less
What can be done with more
What more can be done with less
Thank you