Algorithm and Scaling (Issues) for Aerospace (CFD) Codes
Transcript of Algorithm and Scaling (Issues) for Aerospace (CFD) Codes
Scope of Presentation
Range of aerospace CFD and related applications
Hierarchy of simulation approaches
Hierarchy of algorithmic approaches
Algorithm and scalability issues and considerations
Presentation Approach & Goals
A picture is worth a thousand words; we will use ten thousand words and 1 picture == eleven thousand word-equivalents
Catalog issues and serve as a collective conscience
Discuss the relationship between application needs, algorithms, modeling approaches, and HPC issues and possibilities
CFD++ Aerospace Applications
External aerodynamics, propulsion integration, component integration, systems, cabin airflow, FADEC, icing, fuel tank purge, thrust reverser, propulsion, nozzle design, jet noise
Plumes, trajectory, aerodynamic coefficients, drag polar, dynamic derivatives, store separation, canopy separation, sabot separation, stage separation, pilot seat ejection, projectiles, spinning projectiles
Synthetic jets, turbomachinery, blade design, blade cooling, pulsed detonation, flapping wings, flexible wings, entomopters, helicopters, propellers, rotors, parachutes, parachutists, sky-diving
Spacecraft launch, reentry vehicles, rocket-assisted landings (Earth, Mars, Venus), X-Prize vehicles, land speed record vehicles, bullets, artillery rounds, liquid fuel breakup, liquid fuel sloshing and feed, acceleration/deceleration effects, aeroacoustics, Flow Structure Interaction (FSI)
What’s special about Aerospace CFD?
Extremes of scales, operating conditions, physics and chemistry, speeds, application-specific needs (extraction of useful information)
Nonlinearity is most often inherent
It is not just the simulation itself that counts: if no information output is required, there is no need to do the simulation
Hierarchy of problem classes
Steady state and unsteady problems
Small, medium, and large scale problems
Entire configurations as well as analysis of components
Engineering analysis, scientific analysis, troubleshooting
All speeds, atmospheric conditions, diverse fluids and their properties
Common Elements of Simulations
Physics (nature) → Math Model of Physics → Numerical Model of Math Model → Computational Model → Simulation Results, with human(s) in the loop
Common Underlying Physical Processes
Convection: $\partial\,(u_k\,\overline{u_i u_j})/\partial x_k$
Production: $P_{ij}$
Dissipation: $\epsilon_{ij}$
Redistribution: $\phi^{*}_{ij}$
Diffusion: $d_{ij}$
Evolution: $\partial\,\overline{u_i u_j}/\partial t$
Together these balance in a transport equation of the form
$$\frac{\partial\,\overline{u_i u_j}}{\partial t} + \frac{\partial\,(u_k\,\overline{u_i u_j})}{\partial x_k} = P_{ij} + \phi^{*}_{ij} - \epsilon_{ij} + d_{ij}$$
Summary of some HPC issues
Loading the problem, saving final results
Checkpointing
Computational vs. communications performance (scalability)
Data extraction issues
Robustness (10000-way parallel should be as robust as the serial algorithm)
Data-center issues (throughput, storage)
Visualization, interaction with the running case
Modeling Hierarchy
Potential flow assumption
Small-disturbance approaches
Inviscid flows taken separately, and hybridized with boundary-layer theory
Reynolds/Favre-averaged N-S equations with phenomenological turbulence models
LES and hybrid RANS-LES approaches
Special equations and models
Mesh possibilities
Surface mesh only (panel methods)
Cartesian mesh, almost-Cartesian mesh
Structured mesh – hex (3D) & quad (2D)
Unstructured – all cell types
Hybrid structured and unstructured meshes, hex-core meshes
Patched and overset meshes
Moving (dynamic) meshes
Flexible boundaries and meshes
“Extreme Grids”
Aspect ratios of 10000 to 1 or more (boundary-layer resolution with Y+ < 1); a worked estimate follows below
Mesh sizes of hundreds of millions of cells and more
Extreme grid spacings present in the mesh
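To make the aspect-ratio figure concrete, here is a standard flat-plate estimate of the first wall-normal cell height for $y^+ = 1$ (the correlation and flow conditions below are textbook assumptions, not values from the slides):

```latex
% First-cell height for y+ = 1; a flat-plate skin-friction correlation
% and sea-level air at U = 100 m/s, Re_x = 10^7 are assumed.
\begin{align*}
  c_f &\approx 0.026\,\mathrm{Re}_x^{-1/7} \approx 0.0026, &
  \tau_w &= \tfrac{1}{2}\, c_f\, \rho\, U_\infty^2 \approx 16\ \mathrm{Pa}, \\
  u_\tau &= \sqrt{\tau_w/\rho} \approx 3.6\ \mathrm{m/s}, &
  \Delta y_1 &= \frac{y^+\,\nu}{u_\tau} \approx 4\ \mu\mathrm{m}.
\end{align*}
% With streamwise spacing of a few centimeters, the wall cells reach
% aspect ratios of order 10^4:1, as stated above.
```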
Numerical approaches
Explicit and implicit
Fractional-step and factored schemes
Finite volume, finite difference schemes
Finite element schemes
Spectral and spectral-element schemes
“Local” schemes and “global” schemes
Some HPC algorithmic challenges
Challenges of making implicit schemes be really implicit in multi-CPU computations
Ensuring insensitivity of results to variations in the number of parallel processes used (see the illustration below)
How to make the 10000-way parallel computation as robust as the serial algorithm
How to make the 10000-way parallel computation converge as well, but in much less time
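One reason results can vary with the number of processes: changing the partition changes the grouping of floating-point sums, and floating-point addition is not associative. A minimal stand-alone illustration (contrived numbers, not CFD++ code):

```cpp
// Floating-point addition is not associative: regrouping the same sum,
// as different domain decompositions do, changes the rounded result.
#include <cstdio>

int main() {
    const double big = 1e16, small = 1.0;
    double a = (big + small) + small;   // grouping 1: small terms absorbed
    double b = big + (small + small);   // grouping 2: small terms survive
    std::printf("%.1f\n%.1f\n", a, b);  // the two sums differ
    return 0;
}
```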
Adaptive meshes
Adaptive elements (cells)
Adaptive grids
H-adaptation, P-adaptation, H-P-adaptation
Classification of Algorithms
Low information density schemes – expand the stencil to improve accuracy
High information density schemes – expand the information content per cell (e.g. use values and derivatives, or values at multiple collocation points)
Homogeneity (or lack of it) of the discretization and solution methodology
Homogeneity (or lack of it) of the underlying physics models
The usual scalability considerations
Computation and communication
Computation versus communication
Overlap of computation and communication (see the sketch below)
Bulk of communication for local schemes can follow a pattern of one-to-a-few connectivity
Global operations – global reductions often determine scalability
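A minimal sketch of such overlap, assuming a 1-D domain decomposition with one-cell halos and a 3-point stencil (illustrative only, not CFD++ source):

```cpp
// Halo exchange overlapped with interior work: interior cells are
// updated while halo messages are in flight; only the two boundary
// cells wait on communication.
#include <mpi.h>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int n = 1 << 20;                              // local cells
    std::vector<double> u(n + 2, 1.0), un(n + 2, 0.0);  // +2 halo cells
    const int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
    const int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

    MPI_Request req[4];
    MPI_Irecv(&u[0],     1, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &req[0]);
    MPI_Irecv(&u[n + 1], 1, MPI_DOUBLE, right, 1, MPI_COMM_WORLD, &req[1]);
    MPI_Isend(&u[1],     1, MPI_DOUBLE, left,  1, MPI_COMM_WORLD, &req[2]);
    MPI_Isend(&u[n],     1, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &req[3]);

    // Interior update proceeds while communication is in flight.
    for (int i = 2; i < n; ++i)
        un[i] = 0.25 * u[i - 1] + 0.5 * u[i] + 0.25 * u[i + 1];

    // Boundary cells need the halo values, so wait only here.
    MPI_Waitall(4, req, MPI_STATUSES_IGNORE);
    un[1] = 0.25 * u[0] + 0.5 * u[1] + 0.25 * u[2];
    un[n] = 0.25 * u[n - 1] + 0.5 * u[n] + 0.25 * u[n + 1];

    MPI_Finalize();
    return 0;
}
```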
Recent Scalability Improvements
CFD++ now scales well to a very large number of cores
The scalability improvements are universal – they apply to all modern HPC platforms from all vendors
Tests have shown effective performance all the way up to 4096 cores
Even relatively small grids (e.g. 16 million cells) scale well to 2048 or even 4096 cores, depending on the computer and the type of case run
Goal – to demonstrate similar performance on 10000 to 40000 cores
[Figure: two plots of scaling performance vs. number of CPU cores.
Ex 1: 33M cells, Computer 1, Case 1 (up to ~1200 cores).
Ex 2: 16M cells, Computer 2, Case 2 (up to ~4500 cores).]
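For reference, the scaling performance plotted above is typically the speedup relative to a baseline core count; the usual definitions (standard ones, not stated on the slides) are:

```latex
% Speedup and parallel efficiency on p cores, from wall-clock time T(p),
% with p_ref the smallest core count the case fits on.
\[
  S(p) = p_{\mathrm{ref}}\,\frac{T(p_{\mathrm{ref}})}{T(p)},
  \qquad
  E(p) = \frac{S(p)}{p}.
\]
```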
Some Influences on Scalability
Effect of physics – increased sophistication means more computation, often more scalability
Effect of numerics – increased accuracy means more computation and more communication, often more scalability
Effect of grid – more grid means more computation and less communication for “local” algorithms
Additional thoughts on Parallel Processing
Two ways of using multiple compute engines: parallel computations and pipelined computations
Pipelined algorithms have not been exploited much at the HPC level
Process-level and thread-level parallelism are beginning to be combined (e.g. to exploit GPGPUs); a hybrid sketch follows below
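A minimal sketch of combining the two levels, using MPI across processes and OpenMP threads within each (a generic pattern, not CFD++ source):

```cpp
// Hybrid parallelism: MPI between processes, OpenMP threads within one.
#include <mpi.h>
#include <vector>

int main(int argc, char** argv) {
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

    std::vector<double> cells(1 << 22, 1.0);
    double local = 0.0;

    // Thread-level parallelism inside the process.
    #pragma omp parallel for reduction(+ : local)
    for (long i = 0; i < (long)cells.size(); ++i)
        local += cells[i] * cells[i];

    // Process-level reduction across the whole machine.
    double global = 0.0;
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}
```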
Load balancing issues
Structured vs. unstructured grids (usually solved by weighted domain decomposition; see the sketch below)
Adaptive algorithms and adaptive meshes
Different physics in different regions
Moving meshes and overset meshes
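The weighting idea in one dimension (real codes use graph partitioners such as METIS/ParMETIS; this prefix-sum sketch only illustrates equalizing cost rather than cell count):

```cpp
// Weighted 1-D decomposition: parts get equal summed cost, not equal
// cell counts, so regions with expensive physics get fewer cells.
#include <numeric>
#include <vector>

std::vector<int> weighted_cuts(const std::vector<double>& w, int parts) {
    std::vector<double> prefix(w.size() + 1, 0.0);
    std::partial_sum(w.begin(), w.end(), prefix.begin() + 1);
    const double target = prefix.back() / parts;
    std::vector<int> cuts;                 // exclusive end of each part
    for (int p = 1, e = 0; p < parts; ++p) {
        while (e < (int)w.size() && prefix[e] < p * target) ++e;
        cuts.push_back(e);
    }
    cuts.push_back((int)w.size());
    return cuts;
}

int main() {
    // Cells 50..99 carry extra physics, costing 4x cells 0..49.
    std::vector<double> cost(100, 1.0);
    for (int i = 50; i < 100; ++i) cost[i] = 4.0;
    auto cuts = weighted_cuts(cost, 4);    // cuts at 54, 69, 85, 100
    (void)cuts;
    return 0;
}
```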
Optimization considerations
Parallel algorithms for optimization
How to use large numbers of processors – e.g. run many cases in parallel (see the sketch below)
Pre-compute the case matrix, sensitivities, etc., and then train neural networks or tabulate sensitivities before applying the optimization procedure
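One way to use many processors is to farm out the case matrix, giving each case its own communicator. A sketch (run_case is a hypothetical placeholder for a full solver driver, not a CFD++ entry point):

```cpp
// Case farming: split the global communicator so each group of ranks
// runs one case of a pre-computed case matrix independently.
#include <mpi.h>

void run_case(MPI_Comm comm, int case_id) {
    int r;
    MPI_Comm_rank(comm, &r);
    // ... a real driver would read case_id's flow conditions and solve ...
}

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int n_cases = 8;                                // e.g. 8 angles of attack
    int group = rank / ((size + n_cases - 1) / n_cases);  // color = case index
    if (group >= n_cases) group = n_cases - 1;

    MPI_Comm case_comm;                                   // private per-case communicator
    MPI_Comm_split(MPI_COMM_WORLD, group, rank, &case_comm);
    run_case(case_comm, group);

    MPI_Comm_free(&case_comm);
    MPI_Finalize();
    return 0;
}
```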
Multi-physics considerations
Communications between non-homogeneous simulation tools
Communications between diverse hardware platforms
Tight coupling vs. loose coupling considerations
Need for Parallel I/O and File systems
Very large scale problems
Very large number of processors
Initial load and final save + intermediate data output
Asymmetric data extraction needs
Typical “post-processing” needs
Global information (forces and moments, lift, drag, torque)
Semi-global information (forces and moments along the wing span, along the fuselage)
Reduced subsets – iso-surfaces, surface data, cut-planes
Time-averages versus instantaneous values
In-situ “post”-processing can be very useful
Single and Distributed File Parallel I/O
Parallel I/O (PIO) can be accomplished in two ways (a Single-File sketch follows below):
In Single-File mode, PIO reads and writes the current full-mesh/full-solution files
In Distributed-File mode, PIO reads and writes a set of files (e.g. placed in subdirectories) associated with each parallel process
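A minimal Single-File sketch using MPI-IO (the file layout is invented for illustration and is not the CFD++ format):

```cpp
// Single-File parallel I/O: all ranks write their partition of the
// solution into one shared file at rank-dependent offsets.
#include <mpi.h>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const long n_local = 1 << 20;             // cells owned by this rank
    std::vector<double> sol(n_local, rank);   // local slice of the solution

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "solution.bin",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    // Contiguous-by-rank layout: each rank writes at its own offset.
    MPI_Offset offset = (MPI_Offset)rank * n_local * sizeof(double);
    MPI_File_write_at_all(fh, offset, sol.data(), (int)n_local,
                          MPI_DOUBLE, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    // Distributed-File mode would instead open one file per rank
    // (e.g. on MPI_COMM_SELF with a per-rank path).
    MPI_Finalize();
    return 0;
}
```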
Interactive massively parallel computing
Steady state versus transient (unsteady) computations
Links with front-end and graphical processing
Even post-processing of large scale problems may require substantial parallel computing resources
One should not just focus on the “batch” computing model
Some elements of the balancing act
Computation
Communication
Memory requirements
I/O requirements
Accuracy requirements
Robustness requirements
In-situ solution processing requirements
Bandwidths to consider
Number of cores vs. number of I/O channels
Memory bandwidth from core to memory (a roofline-style estimate follows below)
Memory access conflicts
Some old ideas revisited
Paying more attention to connectivity architecture
Minimization of hops
Domain decomposition that minimizes traffic between switches
How many switches or hops (groups of nodes), how many nodes, how many processors in a node, how many cores per processor
Final thoughts
The challenge of producing codes that work in the user’s hands and computing facilities
Ease of use
Scalability and effectiveness vs. just scalability
Resource maximization versus minimization
What can be done with less
What can be done with more
What more can be done with less
Thank you