Performing acoustic, vibro-acoustic and aero-acoustic computations using MUMPS

Presented By: Eveline Rosseel

29 May 2013


Outline

• Introduction to Free Field Technologies

• MUMPS in Actran

• Benchmark of sparse direct solvers

• Conclusions


Free Field Technologies: leader in acoustic, vibro-acoustic and aero-acoustic CAE

• Free Field Technologies (FFT): software development since 1998

• Main activities

– Development of the Actran software

– Services: training, consulting, technology transfer, …

– Research in acoustic CAE and related fields

• Our customers’ fields

– Automotive

– Aerospace

– Electronic

– Heavy equipment

Free Field Technologies around the world

Headquartered in Mont-Saint-Guibert, Belgium, FFT has offices in Toulouse, France; Tokyo, Japan; and Troy, MI, USA.

FFT is part of MSC Software Corporation, a leading international provider of Virtual Product Development technology.


Our software is distributed in each global region and used by more than 250 customers around the world.

MUMPS in Actran


MUMPS: default solver in Actran

Target applications

• Mostly complex, unsymmetric sparse systems with a symmetric structure

• Up to a few million DOFs and up to a few thousand RHS

• Out-of-core computations

• Shared and distributed memory computing

• Application dependent sparsity patterns
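To make the first bullet concrete: a matrix has a symmetric structure when entry (i, j) is stored whenever (j, i) is stored, even if the two values differ. The following is a minimal Python sketch of that distinction. It assumes the matrix is held as a dictionary mapping (row, column) to a complex value, a representation chosen here purely for illustration and not Actran's internal format; the function names and the small matrix are hypothetical.

```python
def is_structurally_symmetric(entries):
    """True if (j, i) is stored whenever (i, j) is stored."""
    return all((j, i) in entries for (i, j) in entries)


def is_numerically_symmetric(entries, tol=0.0):
    """True if the pattern is symmetric and A[i, j] equals A[j, i] within tol."""
    return is_structurally_symmetric(entries) and all(
        abs(value - entries[(j, i)]) <= tol for (i, j), value in entries.items()
    )


# Hypothetical 3x3 complex matrix: symmetric pattern, unsymmetric values.
A = {
    (0, 0): 4 + 1j, (0, 1): 1 - 2j,
    (1, 0): 3 + 0j, (1, 1): 5 + 0j, (1, 2): 2j,
    (2, 1): -1j,    (2, 2): 6 - 1j,
}

print(is_structurally_symmetric(A))  # True
print(is_numerically_symmetric(A))   # False
```

A solver can exploit the symmetric pattern for ordering and symbolic analysis even though the numerical factorization must treat the values as unsymmetric.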


MUMPS: highlighted experiences

Backtransformation phase

Out-of-core computations: congestion due to shared scratch disk

• Example: frequency parallelism, every proc runs its own MUMPS instance

[Diagram: procs 1 to n each solve one frequency (freq 1 to n) in their own memory and share a single scratch disk]

Time                                       Factorize   Solve
1 proc, sequential                         39 min      7 min
8 procs = 8 sequential MUMPS instances     44 min      Up to 4.5h

Configuration: 600 KDOF, 253 RHS, Westmere-ex Intel 2.26 GHz, 4x8 cores, raid-0 sata scratch disk

• Solution: ICNTL(27) and introduction of additional synchronization amongst procs
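The timings quoted on this slide show that the congestion is concentrated in the solve (backtransformation) phase rather than in the factorization. A small Python sketch of the arithmetic, taking "Up to 4.5h" as the worst case; the variable names are illustrative only.

```python
# Timings quoted on the slide, in minutes.
sequential = {"factorize": 39, "solve": 7}        # 1 proc, sequential
shared = {"factorize": 44, "solve": 4.5 * 60}     # 8 instances, worst case

# Slowdown of each phase when 8 instances share one scratch disk.
slowdown = {phase: shared[phase] / sequential[phase] for phase in sequential}

for phase, factor in slowdown.items():
    print(f"{phase}: {factor:.1f}x slower")
# factorize: 1.1x slower
# solve: 38.6x slower
```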

MUMPS: highlighted experiences

Backtransformation phase

Out-of-core computations

• Reduced I/O congestion using ICNTL(27) and additional synchronization points

• Optimal value: ICNTL(27)=NRHS

• Additional synchronization points: the backtransformation step of processors sharing the same scratch disk is done in sequential mode
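Both remedies can be sketched in a few lines of Python. The sketch rests on two assumptions that are not stated on the slide: first, that an out-of-core solve reads the factors from disk once per block of right-hand sides, so the number of passes is roughly NRHS divided by the ICNTL(27) block size, rounded up; second, that the synchronization can be modelled with a lock shared by all workers that use the same scratch disk. Threads stand in for the MPI processes, and all names are hypothetical.

```python
import math
import threading


def factor_passes(nrhs, block_size):
    """Passes over the on-disk factors, assuming one read per block of RHS."""
    return math.ceil(nrhs / block_size)


scratch_lock = threading.Lock()   # one lock per shared scratch disk (assumption)
counter_lock = threading.Lock()   # protects the bookkeeping counters below
in_solve = 0                      # workers currently in the solve phase
max_in_solve = 0                  # highest concurrency observed


def solve_frequency(freq):
    """Stand-in for one solver instance handling one frequency."""
    global in_solve, max_in_solve
    # The factorization would run here, concurrently with the other workers.
    with scratch_lock:            # backtransformation: one worker at a time
        with counter_lock:
            in_solve += 1
            max_in_solve = max(max_in_solve, in_solve)
        # The disk-bound solve for `freq` would run here.
        with counter_lock:
            in_solve -= 1


workers = [threading.Thread(target=solve_frequency, args=(f,)) for f in range(8)]
for w in workers:
    w.start()
for w in workers:
    w.join()

print(factor_passes(253, 16))    # 16
print(factor_passes(253, 253))   # 1
print(max_in_solve)              # 1
```

Under the first assumption, setting the block size equal to NRHS reduces the disk traffic to a single pass, which is consistent with the optimal value reported above, at the price of holding all right-hand sides in memory at once.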

MUMPS: highlighted experiences

Analysis phase

• Quality of the reordered matrix (METIS, SCOTCH, …) influences the memory consumption of the factorization phase

• Distributed computations:

– The memory consumption peak on proc 0 during the sequential analysis phase surpasses the memory consumption of the parallel factorization phase

[Figure: annotated regions "Analysis phase" and "Factorization"]

Configuration: 1.9 MDOF, 1 RHS, SCOTCH ordering, out-of-core run on Westmere-ex 2.4 GHz processor with 4x10 cores and 256 GB RAM

MUMPS: highlighted experiences

Analysis phase

Scalability of MPI computations: need for parallel analysis

• Avoid the memory consumption peak of the sequential analysis phase by using a parallel analysis phase: PT-Scotch or ParMetis

[Figure: annotated regions "Analysis phase" and "Factorization"]

Configuration: 1.9 MDOF, 1 RHS, PT-SCOTCH ordering, out-of-core run on Westmere-ex 2.4 GHz processor with 4x10 cores and 256 GB RAM

• Problem: time and memory consumption have increased compared to the run with Scotch -> scalability issue of the parallel analysis
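In MUMPS the choice between sequential and parallel analysis is made through control parameters. The sketch below records the settings as described in the MUMPS user's guide, where ICNTL(28) selects sequential or parallel analysis and ICNTL(29) selects the parallel ordering tool. The values should be checked against the documentation of the MUMPS version in use; the dictionaries and the helper function are illustrative only and are not part of any MUMPS or Actran API.

```python
# ICNTL values as described in the MUMPS user's guide (verify for your version).
ANALYSIS_MODE = {"automatic": 0, "sequential": 1, "parallel": 2}     # ICNTL(28)
PARALLEL_ORDERING = {"automatic": 0, "pt-scotch": 1, "parmetis": 2}  # ICNTL(29)


def analysis_icntl(mode, ordering="automatic"):
    """Return {ICNTL index: value} for the requested analysis setup (hypothetical helper)."""
    settings = {28: ANALYSIS_MODE[mode]}
    if mode == "parallel":
        # The parallel ordering tool only matters when the analysis is parallel.
        settings[29] = PARALLEL_ORDERING[ordering]
    return settings


print(analysis_icntl("sequential"))             # {28: 1}
print(analysis_icntl("parallel", "pt-scotch"))  # {28: 2, 29: 1}
```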

MUMPS in Actran: future plans

• Increasing model sizes:

– Need for 64-bit integer version

• Increasing number of nodes in distributed memory computing:

– Need for robust parallel analysis

• More investigations on hybrid iterative/direct solver use

Benchmark of sparse direct solvers


Overview solvers

Motivation: assess performance of MUMPS with respect to other sparse direct solvers

                          MUMPS (4.10.0)            Pardiso 10.3 Intel MKL   UMFPACK (5.6.2)
out-of-core               ✓                         ✓                        X
multithreading            BLAS                      ✓                        BLAS
MPI support               ✓                         X                        X
ordering                  (PAR)Metis, (PT)Scotch,   MD or Metis              (COL)AMD, Metis or
                          PORD, (Q)AMD, AMF                                  NESDIS
multiple RHS              ✓ (block size)            ✓ (no block size)        X
  (block approach)
iterative refinement      ✓                         ✓                        ✓
single/double precision   ✓                         ✓                        ✓

Solver benchmark settings

• Sequential and multithreaded tests (no MPI)

• Ordering METIS

• Pivot threshold 0.01

• Double precision

• Internal RHS block size 16

• No iterative refinement


• Memory relaxation 20%
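For MUMPS, most of these settings correspond to entries of the ICNTL and CNTL control arrays. The sketch below lists one plausible mapping, with indices as described in the MUMPS user's guide; it is an interpretation of the slide, not the benchmark's actual configuration, and in particular reading "memory relaxation" as ICNTL(14) is an assumption. Double precision is not a control parameter; it is selected by the arithmetic version of the library that is called.

```python
# Benchmark settings expressed as MUMPS controls.
# Indices follow the MUMPS user's guide; verify against the version in use.
icntl = {
    7: 5,     # ordering: 5 selects METIS
    10: 0,    # iterative refinement steps: none
    14: 20,   # percentage increase of the estimated working space (assumed mapping)
    27: 16,   # blocking size for multiple right-hand sides
}
cntl = {
    1: 0.01,  # relative threshold for numerical pivoting
}

for index, value in sorted(icntl.items()):
    print(f"ICNTL({index}) = {value}")
for index, value in sorted(cntl.items()):
    print(f"CNTL({index}) = {value}")
```

The other two solvers expose comparable options through their own interfaces, which are not sketched here.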

Acoustic test cases

• Represent 20-30% of our customers (mainly automotive)

• All tests successful

ACOUSTIC RADIATION                   Symmetric?   NDOF     NRHS
IFEM-VS                              No           15.6K    1
PML-DC                               Yes          280K     1
IFEM-DC                              No           405K     1
RC_Indus_hpc_MUMPS case 3 (MB)       No           730K     1
RC_Indus_hpc_MUMPS case 2 (IFE)      No           872K     1
PML-DF                               Yes          1.05M    1
IFEM-DF                              No           1.38M    1
RC_Indus_hpc_MUMPS case 1 (IFE)      No           1.90M    3

Memory consumption: in-core

Pardiso MKL and MUMPS: comparable memory requirements

Pardiso MKL: lowest memory consumption

UMFPACK: highest memory consumption, large difference with other solvers

Memory consumption: out-of-core

Large difference between memory consumption of OOC Pardiso and OOC MUMPS on acoustic tests


Computational cost: sequential runs

MUMPS: lowest overall factorization time for the sequential runs

UMFPACK: very high factorization time on largest test case


Computational cost: multithreaded runs

[Figure panels: Absolute timings; Parallel efficiency]


Pardiso MKL: nearly optimal parallel efficiency

UMFPACK: very high computing times
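Parallel efficiency is commonly defined as the speedup over the sequential run divided by the number of threads, so that a value of 1 means optimal scaling. The slide does not state which definition was used; the Python sketch below assumes this standard one, and the timings in it are hypothetical, not taken from the benchmark.

```python
def speedup(t_sequential, t_parallel):
    """Ratio of sequential to parallel wall-clock time."""
    return t_sequential / t_parallel


def parallel_efficiency(t_sequential, t_parallel, n_threads):
    """Speedup divided by thread count; 1.0 is optimal."""
    return speedup(t_sequential, t_parallel) / n_threads


# Hypothetical timings in seconds, for illustration only.
print(parallel_efficiency(800.0, 100.0, 8))  # 1.0
print(parallel_efficiency(800.0, 125.0, 8))  # 0.8
```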

Vibro-acoustic test cases

• Represent 50-60% of our customers

Test case                                  MUMPS                         PARDISO MKL                       UMFPACK
RC_Indus_hpc_MUMPS 4 (276 KDOF, 2 RHS)     OK                            OK                                OK
Ship (410 KDOF, 3 RHS)                     OK                            OK                                OK
RC_Indus_hpc_MUMPS 5 (1.86 MDOF, 1 RHS)    OK                            OK                                OK
Pl case (1.75 MDOF, 20 RHS)                OK                            IC: zero pivot error; OOC: OK     Out of memory (> 250 GB)
Cockpit (3.09 MDOF, 50 RHS)                IC: memory allocation error   OK                                N.A. (symmetric matrix)

Vibro-acoustic test case RC_Indus_hpc_MUMPS_C5

Peak memory consumption (Gbyte)

               Pardiso MKL   MUMPS   UMFPACK
IN-CORE        47.3          52.2    76.9
OUT-OF-CORE    10.9          11.7

• IC: Pardiso lowest memory requirements, UMFPACK highest
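The peak-memory figures for this test case can be turned into ratios to show how much the out-of-core mode saves and how far apart the solvers are in-core. A short Python sketch using only the numbers quoted above; the dictionary layout is illustrative.

```python
# Peak memory in Gbyte for RC_Indus_hpc_MUMPS_C5, as quoted on the slide.
in_core = {"Pardiso MKL": 47.3, "MUMPS": 52.2, "UMFPACK": 76.9}
out_of_core = {"Pardiso MKL": 10.9, "MUMPS": 11.7}

# Reduction factor obtained by switching from in-core to out-of-core.
for solver, ooc in out_of_core.items():
    print(f"{solver}: in-core / out-of-core = {in_core[solver] / ooc:.2f}")
# Pardiso MKL: in-core / out-of-core = 4.34
# MUMPS: in-core / out-of-core = 4.46

# In-core memory relative to the lowest consumer.
lowest = min(in_core.values())
for solver, mem in in_core.items():
    print(f"{solver}: {mem / lowest:.2f}x the lowest in-core footprint")
# Pardiso MKL: 1.00x the lowest in-core footprint
# MUMPS: 1.10x the lowest in-core footprint
# UMFPACK: 1.63x the lowest in-core footprint
```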

Computation time

• Same trend as for pure acoustic problems: MUMPS has lowest factorization time

• UMFPACK: largest sequential computation time

TM test cases

• Represent 10-15% of our customers

[Figure labels: Inlet-APU; By-pass DUCT]

Test case                              MUMPS                                      PARDISO MKL   UMFPACK
Inlet-Nacelle (542 KDOF, 44 RHS)       OK                                         OK            OK
By-pass (1.54 MDOF, 521 RHS)           OK                                         OK            OK
Inlet-Nacelle (3.02 MDOF, 151 RHS)     IC: memory allocation error, OOC: OK       OK            Out of memory (> 250 GB)

TM test cases: memory

[Figure panels: Inlet Nacelle, 600 KDOF; ByPass Duct, 600 KDOF]


Real, symmetric test cases

• Low importance to our customers (5%)

• 3 pure vibro test cases (316 to 733 KDOF) and 1 pure acoustic test (419 KDOF)

• MUMPS: large memory requirements on pure acoustic test case w.r.t. Pardiso MKL

• Pardiso: large memory requirements on pure vibro test cases

Conclusions

Performing acoustic, vibro-acoustic and aero-acoustic simulations with MUMPS

MUMPS in Actran

• Default solver in Actran: very good results obtained, thanks to the MUMPS developers!

• Interested in more extensive use of the parallel analysis tool

Solver benchmarks

• UMFPACK:

– Many restrictions: in-core, only 1 RHS at a time, non-symmetric matrices

– No improvement over MUMPS: excessive memory consumption and computation time

• Pardiso:

– Low memory requirements: especially on OOC acoustic test cases

– Good multithreaded behaviour: almost optimal scalability

• MUMPS:

– Fast solver: overall the lowest factorization time

– Low memory requirements for out-of-core version, especially on vibro(-acoustic) tests
