Transcript of: Massively Parallel Computation of a Full Aircraft Configuration with Delayed Detached Eddy Simulation



    Massively Parallel Computation of a Full Aircraft

    Configuration with Delayed Detached Eddy Simulation

Silvia Reuß

Dieter Schwamborn

30.04.2010


    Slide 2

    Outline

    Introduction and Motivation

    Some DES details

    Defining the test case

    Computational challenges

    Setup of the A320 test case

    First results


    Slide 3

    Outline

    Introduction and Motivation

    Some DES details

    Defining the test case

    Computational challenges

    Setup of the A320 test case

    First results


    Slide 4

    Introduction and Motivation

    High Lift Configurations

    Simulation of high lift configurations is of high interest for Airbus Germany

Critical for the design of high-lift configurations is the accurate prediction of

maximum lift and

the noise emission from high-lift devices, which can be higher than that of the engines in approach and landing situations.

    Conventional RANS approaches are known to be unable to predict the latter

    due to failure of the turbulence models involved


    Slide 5

    Introduction and Motivation

    Turbulence

    Turbulence consists of eddies of different sizes:

    Large eddies extract energy from the mean motion

    As large eddies interact with each other and break down into smaller ones,

    energy is cascaded from larger to smaller eddies

At the scale of the finest eddies the Reynolds number of the eddies is small enough that viscous effects become dominant and mechanical energy is dissipated into heat

    There are different modelling levels:

[Diagram: modelling levels ordered by resolved turbulence (0% to 100%) and computational effort (low to very high): RANS, LES, DNS]


    Slide 6

    Introduction and Motivation

    Direct numerical simulation (DNS)

All scales are resolved, therefore in one direction the minimal number of points must be n_p ~ Re^(3/4)

This leads to a total number of points in three dimensions of N_p ~ Re^(9/4)

Not affordable for high Reynolds numbers or complex geometries

In our test case the Reynolds number is 1.34 × 10^6; this would mean N_p ≈ 6 × 10^13
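As a quick sanity check of the estimate above, the scaling can be evaluated directly; this assumes proportionality constants of order one, which the slide does not state:

```python
# Rough check of the DNS point-count estimate, assuming
# n_p ~ Re^(3/4) per direction and N_p ~ Re^(9/4) in total.
Re = 1.34e6                      # test-case Reynolds number
n_per_direction = Re ** 0.75     # ~ 3.9e4 points per direction
N_total = Re ** 2.25             # ~ 6e13 points in total, as quoted above
print(f"points per direction ~ {n_per_direction:.1e}")
print(f"total points         ~ {N_total:.1e}")
```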


    Slide 7

    Introduction and Motivation

    Large eddy simulation (LES)

Time-dependent variables are decomposed into a large-scale resolved part and an unresolved subgrid-scale part; the Navier-Stokes equations are solved for the filtered variables

The effect of the turbulent small scales on the large scales is taken into account by a subgrid-scale model

Computational costs are affordable for large vortical structures, but infeasibly high (nearly as high as DNS) for the near-wall region

[Diagram: same modelling-level scale as before, with hybrid RANS/LES added between RANS and LES]


    Slide 8

    Introduction and Motivation

    Detached Eddy Simulation (DES)

    DES is a hybrid RANS/LES method, which was first developed for massively

    separated flows

    Near walls the simulation is run in Unsteady RANS mode, as the grid resolution

    is usually not good enough to resolve existing structures with LES

LES is used in regions of unsteady vortical motion, for which RANS models are known to give poor results; here the same RANS model with a modified length scale is used as the subgrid-scale model for LES

In the transition from RANS to LES there is the so-called grey area, where the behavior of the model is unclear

    To alleviate this problem a modified DES scheme, the Delayed DES (DDES), is

    used to move the transition area further into the field ensuring RANS treatment

    in the boundary layer

    The DLR TAU Code allows simulations with DES, DDES and IDDES
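The RANS/LES switch in DES-type models is essentially a substitution of the turbulence-model length scale. A minimal sketch of the DES97 and DDES length scales follows, assuming the commonly published Spalart et al. formulation; the constant C_DES = 0.65, the shielding function f_d, and the function names are taken from that formulation or invented for illustration, not from the slides:

```python
import math

def des_length_scale(d_w, delta_max, c_des=0.65):
    """DES97: RANS wall distance d_w limited by the local grid scale."""
    return min(d_w, c_des * delta_max)

def ddes_length_scale(d_w, delta_max, r_d, c_des=0.65):
    """DDES: the shielding function f_d delays the switch to LES so that
    attached boundary layers stay in RANS mode.
    r_d is the model's boundary-layer sensor (large inside boundary layers)."""
    f_d = 1.0 - math.tanh((8.0 * r_d) ** 3)
    return d_w - f_d * max(0.0, d_w - c_des * delta_max)
```

Deep inside an attached boundary layer r_d is of order one, f_d is close to zero and the RANS length scale d_w is recovered; far from walls f_d tends to one and the grid-based LES length scale C_DES·Δ takes over.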


    Slide 9

    Introduction and Motivation

    High Lift Configurations

In order to examine the possible use and capabilities of hybrid RANS/LES approaches in the simulation of transport aircraft, a number of activities have been initiated, e.g. the DLR-led EU project ATAAC (Advanced Turbulence Simulation for Aerodynamic Application Challenges)

Additionally, C2A2S2E has started an initiative to apply DES to an almost complete A320 in landing configuration

[Images: NTS IDDES and DLR DDES with TAU]


    Slide 10

    Outline

    Introduction and Motivation

    Some DES details

    Defining the test case

    Computational challenges

    Setup of the A320 test case

    First results


    Slide 12

    Some DES details

    Delayed Detached Eddy Simulation (DDES)

Treatment of points in the grid as RANS or LES area is determined by the modified wall distance

The red region is treated in RANS mode

The remaining region is treated in LES mode


    Slide 13

    Some DES details

    Comparison of RANS and DDES (using TAU on the same grid)

    RANS DDES

    Eddy viscosity (~ modeled turbulence) decreased


    Slide 14

    RANS DDES

    Small structures (~ resolved turbulence) increased

    Some DES details

    Comparison of RANS and DDES (using TAU on the same grid)


    Slide 15

    Outline

    Introduction and Motivation

    Some DES details

    Defining the test case

    Computational challenges

    Setup of the A320 test case

    First results


    Slide 16

    Defining the test case

Massively Parallel Computation of a Full Aircraft with DES?

    Does it make sense?

    Simulation of a complete aircraft configuration with RANS/LES

Without resolution of acoustically relevant effects: 300-400 million grid points (at least!)

With resolution of acoustically relevant effects: 2000 million grid points (at least!) and a very small time step

NOT FEASIBLE ON THE C2A2S2E CLUSTER

    QUESTION:

How could the results be validated? What could be learned from such a simulation?


    Slide 17

    Defining the test case

Massively Parallel Computation of a Full Aircraft with local DES for analysis of airframe noise sources

Restriction to one region of interest: SLAT TRACK NOISE

The model slat track geometry is not identical to the geometry of the real tracks

    There is no knowledge on the influence of their geometry on

    the noise generated

QUESTION: Can we draw conclusions about reality from the noise data measured in the experiment?


    Slide 18

    Defining the test case

    Analysis of airframe noise

LES resolution is required only in a small region around the slat track

    The remaining domain is computed in well resolved RANS mode

    The number of points needed to resolve acoustic effects in

    the slat track region reduces dramatically

As few as 80 million points may be sufficient

    CONCLUSION:

This way the simulation is still costly, but now it is possible on the C2A2S2E cluster


    Slide 19

    Defining the test case

    Further steps in analysis of airframe noise

Set-up of a quasi-2D surrogate model with

same sweep angle

similar resolution

Perform a simulation to show validity of the surrogate model

ADVANTAGE: Reduced complexity due to quasi-2D configuration

Additional application of the PIANO code for aero-acoustic analysis possible


    Slide 20

    Defining the test case

    Further steps in analysis of airframe noise

Comparison of geometrical detail on the surrogate model:

Model slat track vs. real slat track

    Comparison of acoustical results

Influence of different model track geometries on the results

Possible correction


    Slide 21

    Outline

    Introduction and Motivation

    Some DES details

    Defining the test case

Computational challenges

Setup of the A320 test case

    First results


    Slide 22

    Computational challenges

    Parallelization

    For parallel computations the grid is split into several domains

    At the boundaries of each domain a number of additional points is stored where

    updates from other domains are needed

The TAU code uses the Message Passing Interface (MPI) to communicate between domains

    Due to the additional points the problem size virtually increases through

    parallelization

This means that for each problem there is a maximum number of domains up to which a speedup is gained

Increasing the number of domains beyond that results in a higher computational time
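A toy 1-D sketch of this halo-point bookkeeping (the function name and the one-point halo width are illustrative assumptions, not TAU's actual data structures): every cut between domains adds copied points that have to be updated over MPI, so the effective problem size grows with the number of domains.

```python
def halo_overhead(n_points, n_domains, halo_width=1):
    """Toy 1-D decomposition: each internal cut adds `halo_width` copied
    (halo) points on both sides, which must be exchanged between domains."""
    n_cuts = n_domains - 1
    addpoints = 2 * halo_width * n_cuts
    return addpoints, addpoints / n_points

for d in (4, 16, 64, 256):
    add, ratio = halo_overhead(n_points=1000, n_domains=d)
    print(f"{d:4d} domains: {add:4d} halo points ({ratio:.1%} overhead)")
```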


    Slide 23

    Computational challenges

    Example: Domain Decomposition

    36 points

    60 edges (faces of dual cells)

Picture provided by Dr. J. Jägersküpper


    Slide 24

    36 points

    60 edges (faces of dual cells)

    4 domains

    Computational challenges

    Example: Domain Decomposition

Picture provided by Dr. J. Jägersküpper


    Slide 25

    36 points

    60 edges (faces of dual cells)

    4 domains

    12 edges cut

    20 points affected by cuts

    Computational challenges

    Example: Domain Decomposition

Picture provided by Dr. J. Jägersküpper


    Slide 26

    4 domains, each with

    9+6 points

    15+3 edges

    Overlap of domains 4 x 3 = 12 cut edges

    12 faces doubled/added

    12 x 2 = 24 addpoints

    Communication volume

    proportional to #addpoints

    Computational challenges

    Example: Domain Decomposition

Picture provided by Dr. J. Jägersküpper


    Slide 27

    4 domains

    12 edges cut and doubled

    12/60 = 20% more edges

    24 addpoints @ 36 ownpoints

    9 domains

    24 edges cut and doubled

    24/60 = 40% more edges

    32 addpoints @ 36 ownpoints

Communication volume ~ #addpoints / #ownpoints

    Computational challenges

    Example: Domain Decomposition

Picture provided by Dr. J. Jägersküpper
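Putting the numbers of this toy example together shows how the communication indicator grows with the number of domains; a small check of the figures quoted above:

```python
# Figures from the toy decomposition above: 36 own points, 60 edges.
own_points, own_edges = 36, 60
cases = {4: {"cut_edges": 12, "addpoints": 24},
         9: {"cut_edges": 24, "addpoints": 32}}

for domains, c in cases.items():
    extra_edges = c["cut_edges"] / own_edges    # 20% resp. 40% more edges
    comm_ratio = c["addpoints"] / own_points    # ~ communication volume indicator
    print(f"{domains} domains: +{extra_edges:.0%} edges, "
          f"addpoints/ownpoints = {comm_ratio:.2f}")
```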


    Slide 28

    Computational challenges

    Additional Edges and Addpoints:

    F6-configuration with 31M points

Comm. volume ~ #addpoints / #ownpoints

Picture provided by Dr. J. Jägersküpper


    Slide 29

Computational challenges

Strong Scaling of TAU on Juropa (Nehalem)

Central Discretization (JST)

Runge-Kutta Solver, Multigrid

Spalart-Allmaras Turb. Model

Parallel efficiency relative to a baseline core count n_base: E(n) = ( t(n_base) · n_base ) / ( t(n) · n ), where t(n) is the measured time on n cores

Picture provided by Dr. J. Jägersküpper

[Plot: time per 50 iterations (s) and parallel efficiency vs. number of cores]

    Our test case has 80M points

    On 512 cores (base):

    250 sec/50 iter

On 1024 cores:

133 sec/50 iter (parallel efficiency 0.94)

On 2048 cores:

71 sec/50 iter (parallel efficiency 0.88)

At this grid size doubling the number of cores almost halves the time needed!

The higher the number of points, the higher the maximal number of domains up to which a speedup is gained
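The two efficiency values follow directly from the definition above and the quoted timings; a quick check:

```python
def parallel_efficiency(t_n, n_cores, t_base=250.0, n_base=512):
    """E = (t_base * n_base) / (t_n * n_cores), relative to the 512-core run."""
    return (t_base * n_base) / (t_n * n_cores)

print(f"1024 cores: E = {parallel_efficiency(133.0, 1024):.2f}")  # 0.94
print(f"2048 cores: E = {parallel_efficiency( 71.0, 2048):.2f}")  # 0.88
```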


    Slide 30

    Computational challenges

    Large amount of data

    The grid has a size of 6.7 GB

    For a restart also the variables from the previous time step need to be stored,

    thus a restart file has a size of 13 GB

    For averaging additional variables are needed, thus the restart files grow to a

    size of 24 GB

    Since it is not affordable to store or even transfer this large amount of data, files

    with reduced data are written during the data collection phase containing only

    pressure and sources needed for acoustical simulations with the PIANO code

    One reduced file still has a size of 600 MB which adds up to 6 TB for 10000 time

    steps

To transfer the data continuously without losing time, the usual TAU procedure is switched off and instead a simultaneously running cron job does the transfer
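A rough estimate of the data volume and of the sustained transfer rate the cron job has to deliver; this assumes one reduced file per physical time step and the roughly 35 minutes per time step reported later in the talk, both of which are assumptions about how the quoted numbers combine:

```python
reduced_file_gb = 0.6            # one reduced output file (600 MB)
n_time_steps = 10_000
total_tb = reduced_file_gb * n_time_steps / 1000
print(f"total reduced data          : ~{total_tb:.0f} TB")    # 6 TB, as above

seconds_per_step = 35 * 60       # ~35 min per physical time step on 1024 cores
rate_mb_s = reduced_file_gb * 1000 / seconds_per_step
print(f"required sustained transfer : ~{rate_mb_s:.2f} MB/s")
```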


    Slide 31

    Outline

    Introduction and Motivation

    Some DES details

    Defining the test case

Computational challenges

Setup of the A320 test case

    First results


    Slide 32

    Setup of the A320 test case

    Test case description

    The geometry is the A320 with deployed high lift devices and engine


    Slide 33

    Setup of the A320 test case

    Experimental setup

    Mach number: 0.2

    Velocity: 68.5 m/s

    Temperature: 303 K

Pressure: 977.6 hPa

Reynolds number: 1.34 × 10^6

    Reference length: 0.308 m

Angle of attack: 3.93°

    The experiments were carried out in the German-Dutch low speed wind tunnel

    (DNW-NWB) in Braunschweig and the Large low speed facility (DNW-LLF) in

    Marknesse, Netherlands

    Pressure distributions as well as integral forces were measured

    The acoustical measurements were performed in the DNW-LLF


    Slide 34

    Setup of the A320 test case

    Grid generation (a challenge in itself)

    The grid was produced with the grid generator CENTAUR

    First a hybrid unstructured RANS mesh with about 30 million points was

    generated by an experienced colleague (Stefan Melber)

    The region near the slat track was further refined by adding extra sources

Since the resulting number of points could only be roughly estimated, the whole process of refinement took several weeks

The generation of the final version of the grid ran for 16 days until it finished

During the grid generation the process had to be moved to a computer with more memory because 32 GB were not sufficient

    The additional 50 million points reside in a region of about 6-10% of the

    half span width around the slat track


    Slide 35

    Setup of the A320 test case

    Computational time step and convective time unit

In the experiment frequencies ranging from 3-40 kHz were measured

The final physical time step was chosen to be Δt = 3 × 10^-6 s

    Thus the highest frequency is resolved with ~8 time steps

    One characteristic length scale is the width of the gap between slat and main

    wing

With a free-stream velocity of 70 m/s it takes 3.6 × 10^-4 seconds for the fluid to pass across the gap, which corresponds to about 120 physical time steps

    The depth of the wing in the area of the slat

    track is 0.308 m which corresponds to about

    1500 time steps to flow over
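The relations between time step, measured frequencies and convective time scales quoted above can be checked quickly; the gap width is not given on the slide and is inferred here from the quoted passage time and velocity:

```python
dt = 3e-6          # physical time step [s]
f_max = 40e3       # highest measured frequency [Hz]
u = 70.0           # free-stream velocity [m/s]
t_gap = 3.6e-4     # time to convect across the slat gap [s]
chord = 0.308      # wing depth near the slat track [m]

print(f"time steps per period of f_max : {1/(f_max*dt):.1f}")    # ~8
print(f"implied gap width              : {u*t_gap*1e3:.0f} mm")  # ~25 mm (inferred)
print(f"time steps per gap overflow    : {t_gap/dt:.0f}")        # 120
print(f"time steps per chord overflow  : {chord/(u*dt):.0f}")    # ~1470, i.e. about 1500
```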


    Slide 37

    Outline

    Introduction and Motivation

    Some DES details

    Defining the test case

Computational challenges

Setup of the A320 test case

    First results


    Slide 38

    First results

    Status of the DES

    The DES was first run on 2048 cores of the C2A2S2E-Cluster

One physical time step needed about 19 minutes with an initial step of Δt = 1 × 10^-5 s

After 3500 time steps it was further reduced to Δt = 3 × 10^-6 s

    Another 1250 steps were computed before time averaging was started

The demand of other users on the C2A2S2E cluster for computing time became so urgent that the number of cores had to be reduced to 1024, leading to a time step duration of about 35 minutes, i.e. about 0.0037 s of simulated time per month (10 gap overflows)

After the variation of the averaged values had reduced to a reasonable value, the collection of time samples was started at time step 6150

Until now a time series of about 430 time steps has been collected (of about 10000 to be done). At the current speed this would need 10 months


    Slide 39

    First results

    Status of the DES

When the resource problem became clear, a proposal for CPU time on JUROPA was submitted at the end of February.

The simulation should last ~2.5 months on JUROPA using 256 nodes, or less than 1.5 months employing 512 nodes

The proposal was positively evaluated, but with reduced resources, which will not allow the complete set of 4 DES simulations to be run within one year.


    Slide 40

    First results

    Cp distribution in one section

    One experimental Cp distribution available in the region of the DDES

    DV200


    Slide 41

    First results

    Cp distribution in section DV200

RANS simulation with angle of attack of 4°

DV200


    Slide 42

    First results

    Cp distribution in section DV200

DDES simulation with angle of attack of 4°

DV200


    Slide 43

    First results

    Skin friction at two different time steps

Iteration 6240 and iteration 6580 (Δt = 0.001 s, i.e. ~1-2 over-flow times apart)


    Slide 44

    First results

    Q-invariant iso-surface


    Slide 45

    First results

    Vorticity in the region near the slat track

    Wing tip


    Slide 46

    First results

    Vorticity in the region near the slat track

    Wing tip


    Slide 47

    First results

    Time series at several points


    Slide 48

    First results

Time series at several points

So far only 430 time steps have been collected since the simulation has reached statistical convergence

This is insufficient to get proper FFT results
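Why 430 samples are not yet enough becomes clear from the record length they represent; a back-of-the-envelope estimate using the time step quoted earlier, assuming a single FFT over the whole record:

```python
n_samples = 430
dt = 3e-6                               # physical time step [s]
record = n_samples * dt                 # length of the collected signal
delta_f = 1.0 / record                  # frequency resolution of one FFT
periods_3khz = record * 3e3             # periods of the lowest frequency of interest

print(f"record length        : {record*1e3:.2f} ms")   # ~1.3 ms
print(f"frequency resolution : {delta_f:.0f} Hz")      # ~775 Hz
print(f"periods of 3 kHz     : {periods_3khz:.1f}")    # ~4 periods only
```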


    Slide 49

    First results

    Continuation of the simulation

will take place on JUROPA in Jülich

will take a few months, especially regarding the post-processing

    Maybe we can tell you some news about it during the next HPCN meeting


    Thank you for your attention