Accelerating Complex Simulations: An Example From … · 2014. 11. 10. · HyperWorks Solvers...
-
Innovation Intelligence®
Accelerating Complex Simulations:
An Example From Manufacturing
Eric Lequiniou
Director, High Performance Computing
Altair
-
Copyright © 2014 Altair Engineering, Inc. Proprietary and Confidential. All rights reserved.
Agenda
• Who is Altair?
• What is RADIOSS and its performance?
• How Altair uses Intel tools to optimize our software
• Q&A
-
Overview
Founded in 1985 as a product design consulting company
Today: a global software, services & technology leader with over 45 offices in 21 countries and 5,000+ customers worldwide
[Chart: ’85 to ’13, with $100M and $250M marks]
-
Innovation Intelligence®
27+ Years of Innovation
45+ Offices in 21 Countries
2000+ Employees Worldwide
-
Altair Knows HPC
Altair is the only company that:
makes HPC tools…
AND develops HPC applications…
…AND uses these to solve real problems
700+ Altair engineers worldwide work with clients every day to address technical computing challenges and develop solutions
-
Intel and Altair: Partners in HPC
A history of collaboration…
• Cluster Management: PBS-Intel integrations
  • MPI integration
  • Intel® Cluster Checker
  • Xeon Phi coprocessor
• Certifications: Intel Cluster-Ready
  • PBS Professional
  • Solvers (RADIOSS, OptiStruct, AcuSolve, FEKO)
  • ICR 2011, 2012 and 2013 partner awards
• Application Integration: Use of Intel tools and technologies
  • Intel® MPI, Intel® Fortran & C++ compilers, Intel® MKL Library, Intel® VTune™ Amplifier, Intel Trace Analyzer and Collector
  • Benchmarking activities on large cluster configurations
• Professional Support: Close collaboration among technical personnel
  • Access to Intel hardware resources: SDV systems, large cluster
  • Intel technical expertise helps us to optimize our software on Intel systems
-
HyperWorks Solver Technology
Multiphysics Analysis and Optimization
• Structural Analysis
• Manufacturing Simulation
• Systems Simulation
• Fluid Dynamics
• Thermal Analysis
• Crash, Safety, Impact & Blast
• Electromagnetics
-
HyperWorks Solvers
• OptiStruct: Statics, NVH, Thermal, Non-Linear
• RADIOSS: Highly Non-Linear, Crash, Safety, Forming
• MotionSolve: Multi-body Dynamics
• AcuSolve: Thermal and CFD
• FEKO: Electromagnetics
Optimization
Smart Multiphysics
-
RADIOSS: The Standard Behind Structure Safety
Multiphysics Analysis and Optimization
• Crash and Safety
• Drop & Impact
• Blast & Hydrodynamic Impact
• Fluid-Structure Interaction
• Terminal Ballistic
• Forming & Composites Mapping
-
Introduction to RADIOSS and HPC Numerical Simulation
1987: First full car crash computation in 20 hours
• 20 000 elements only, with limited accuracy
• Took 20h on the Cray XMP vector supercomputer
Today: 15-million-element car crash simulation in less than 5 hours
• Going massively parallel is key to such outstanding performance on clusters!
• RADIOSS is optimized to deliver the best performance from a single CPU to large clusters
• RADIOSS embeds state-of-the-art numerical methods and parallelization techniques
• RADIOSS is used with 64 to 128 cores on today's industrial crash models, with proven scalability up to 8000 cores
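The two data points on this slide can be turned into a rough throughput comparison. The sketch below is only arithmetic on the quoted figures. It assumes that "15 million" refers to elements, treats "less than 5 hours" as exactly 5 hours (so the gain is a lower bound), and ignores differences in accuracy and time-step size between the two models. The function and variable names are illustrative.

```python
def elements_per_hour(elements: int, hours: float) -> float:
    """Model size divided by wall-clock time: a crude throughput measure."""
    return elements / hours


# Figures quoted on the slide; 5 h is an upper bound on today's elapsed time.
throughput_1987 = elements_per_hour(20_000, 20.0)        # Cray XMP, 1987
throughput_today = elements_per_hour(15_000_000, 5.0)    # cluster run

model_growth = 15_000_000 / 20_000    # how much larger the model is
time_ratio = 20.0 / 5.0               # how much shorter the run is
throughput_gain = throughput_today / throughput_1987

print(f"model size grew by a factor of {model_growth:.0f}")
print(f"elapsed time shrank by a factor of {time_ratio:.0f}")
print(f"throughput gain: at least {throughput_gain:.0f}x")
```

The throughput gain is simply the product of the model growth and the time reduction, which is why a larger model solved in less time compounds so strongly.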
-
About RADIOSS
• Finite Element Analysis (FEA) solver for highly non-linear simulations
• Differentiated by its scalability, quality, and robustness
• Legacy code of several million lines of Fortran
• Ported to various systems, supercomputers, clusters and accelerators
Compute-intensive simulation software for Manufacturing
-
Key Parallel Technologies: Hybrid MPI OpenMP
Highly parallel code with Hybrid model
• Domain decomposition with MPI
• OpenMP parallelization
• Explicit multitasking
• Loop auto-parallelization
Enhanced performance
• High efficiency on large HPC clusters
• Unique proven method for rich scalability over thousands of cores for FEA
• Flexibility – easy tuning of MPI & OpenMP
• Double Precision as default – Extended Single Precision ~ 1.5X faster
Robustness
• Parallel arithmetic option allows perfect repeatability in parallel
-
RADIOSS: Performance Improvements on Single Node
Performance increased by 14x from Woodcrest to Haswell
• This breakthrough comes from hardware and software optimization
• Most important factor is # of cores increase and software scalability
[Chart: Single Node Dual Socket Performance* Evolution. Series: Performance, #core per node, Freq GHz. Axes: Performance (0 to 15) and #cores (0 to 32).]
* Based on RADIOSS Performance on Neon 1M benchmark, DP version
-
RADIOSS: HMPP Scalability on Clusters
[Chart: Neon Refined 1 Million, 80 ms, RADIOSS v13.0 beta. Scalability study up to 1280 cores. Elapsed (s), axis 0 to 10000, versus Number of Nodes* (1, 2, 4, 8, 16, 32, 64). Series: 1 thread, 2 threads, 5 threads, 10 threads.]
Bar labels, elapsed (s), in extracted order:
8568 4516 2416 1511 1242 2104
8421 4319 2294 1375 985 905
1645
8250 4293 2340 1362 888 629 614
8355 4407 2366 1387 842 614 556
* Each node is HP BL460c-gen8 with dual Intel Xeon E5-2680 v2 @2.8GHz with 20 cores & 128 GB 1600 MHz DIMM per node, Infiniband FDR
• Hybrid outperforms pure MPI when using 8 nodes and more
• Recommendation: one MPI per socket and as many OpenMP threads as physical cores
• For same node number, RADIOSS 13 on E5 v2 ~1.5X faster than RADIOSS 12 on E5
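Speedup and parallel efficiency can be derived from the elapsed times on this slide. The sketch below assumes that the last group of chart labels (8355, 4407, 2366, 1387, 842, 614, 556) is the 10-threads series, with one value for each node count from 1 to 64. That mapping is read from the extracted labels and is not confirmed by the slide. The class and function names are illustrative.

```python
from dataclasses import dataclass

NODES = [1, 2, 4, 8, 16, 32, 64]
# Assumed to be the "10 threads" series, one elapsed time (s) per node count.
ELAPSED_10_THREADS = [8355, 4407, 2366, 1387, 842, 614, 556]


@dataclass
class ScalingPoint:
    nodes: int
    elapsed_s: float
    speedup: float      # relative to the smallest run
    efficiency: float   # speedup divided by the increase in node count


def scaling_table(nodes, elapsed):
    """Compute speedup and efficiency relative to the first entry."""
    base_nodes, base_time = nodes[0], elapsed[0]
    table = []
    for n, t in zip(nodes, elapsed):
        speedup = base_time / t
        efficiency = speedup / (n / base_nodes)
        table.append(ScalingPoint(n, t, speedup, efficiency))
    return table


table = scaling_table(NODES, ELAPSED_10_THREADS)
for point in table:
    print(f"{point.nodes:3d} nodes  {point.elapsed_s:6.0f} s  "
          f"speedup {point.speedup:5.2f}  efficiency {point.efficiency:4.0%}")
```

Under that assumption, the gain per added node shrinks at the largest node counts, which is the usual pattern for a fixed-size model of this kind.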
-
RADIOSS Development: Intel Cluster Studio Tools
At Altair, we actively use
• Intel Fortran and C/C++ compilers
• Intel Math Kernel Library (MKL)
• Intel MPI Library
• Intel VTune Amplifier for performance analysis
• Intel Trace Analyzer and Collector for MPI analysis
Under
• Linux, Windows and Mac OS/X (compilers)
• Xeon and Xeon Phi
-
Compilers & Math Library
• ifort and icc
  • Use the highest level of optimization flags consistent with correctness & accuracy
  • Check compiler reports (-vec-report=3)
  • Optimize for different platforms (-ax)
  • Static link of compiler libraries
  • Important to upgrade & validate new compiler releases to benefit from the latest optimizations (12.1 & 13)
• MKL
  • Available with compiler package
  • Optimized for new hardware
  • OpenMP support
-
Intel MPI Library
MPI: Message Passing Interface, required to communicate between processes in a distributed memory environment
• Dynamically linked to support latest installed versions and hardware at customer site
• Easy optimization of data and process placement in Hybrid MPI OpenMP
  • KMP_AFFINITY=scatter or compact
  • I_MPI_PIN_DOMAIN=auto or omp
• Scalability at large scale under Infiniband
  • I_MPI_DEVICE=rdssm or I_MPI_FABRICS=shm:dapl for Infiniband
• Intel MPI proven efficiency at large scale with RADIOSS
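The placement settings above can be combined with the earlier recommendation of one MPI process per socket and as many OpenMP threads as physical cores. The sketch below only builds an environment and a command line; it does not launch anything. Several parts are assumptions rather than content from the slides: the `OMP_NUM_THREADS` variable (standard OpenMP), the `-np` and `-ppn` options of Intel MPI's `mpirun`, the choice of `compact` over `scatter`, and the executable name `./solver`.

```python
def hybrid_launch_plan(nodes, sockets_per_node, cores_per_socket, executable):
    """Return (env, command) for one MPI rank per socket, one thread per core."""
    ranks_per_node = sockets_per_node
    threads_per_rank = cores_per_socket
    total_ranks = nodes * ranks_per_node

    env = {
        # Standard OpenMP variable (not on the slide): threads per MPI rank.
        "OMP_NUM_THREADS": str(threads_per_rank),
        # From the slide: thread placement, here the "compact" variant.
        "KMP_AFFINITY": "compact",
        # From the slide: size each rank's pinning domain by OMP_NUM_THREADS.
        "I_MPI_PIN_DOMAIN": "omp",
        # From the slide: shared memory within a node, DAPL across nodes.
        "I_MPI_FABRICS": "shm:dapl",
    }
    # Assumed mpirun options: total rank count and ranks per node.
    command = ["mpirun", "-np", str(total_ranks),
               "-ppn", str(ranks_per_node), executable]
    return env, command


# Example: 8 dual-socket nodes with 10 cores per socket ("./solver" is hypothetical).
env, command = hybrid_launch_plan(8, 2, 10, "./solver")
cores_used = 8 * 2 * int(env["OMP_NUM_THREADS"])

for name, value in env.items():
    print(f"export {name}={value}")
print(" ".join(command))
print("cores used:", cores_used)
```

Whether `compact` or `scatter` performs better depends on the machine and the model, so the value is best treated as a tuning parameter.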
-
Profiling Analysis: Intel VTune Amplifier 1/4
Run amplxe-gui directly with the optimized binary
Basic Hotspots to begin with
-
Profiling Analysis: Intel VTune Amplifier 2/4
Compatible with MPI, OpenMP and Hybrid
-
Profiling Analysis: Intel VTune Amplifier 3/4
Very useful Bottom-up analysis to identify top routines to optimize
-
Profiling Analysis: Intel VTune Amplifier 4/4
Recompiling with -g allows profiling at instruction level in the source code
Then go back to compiler report output
-
MPI: Intel Trace Analyzer and Collector (ITAC) 1/3
With a dynamically linked Intel MPI program:
1) Run mpirun -trace executable to collect samples
2) Run traceanalyzer executable.stf
Start with the Event Timeline chart
By default all the MPI and OpenMP threads appear
Zoom in to dig into details
-
MPI: Intel Trace Analyzer and Collector (ITAC) 2/3
The Function Profile and Message Profile charts give additional information
OpenMP threads within the same MPI process can be grouped (or ungrouped)
-
MPI: Intel Trace Analyzer and Collector (ITAC) 3/3
Click on a particular zone of the chart to get detailed information about the current
• MPI function called
• Message sent
By using different message tags in the code, it is easy to identify the matching subroutine directly
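The message-tag idea can be sketched as a small registry that gives each communication site its own tag, so that a tag seen in a trace maps back to one routine. The tag values and routine names below are hypothetical and are not taken from RADIOSS. In a real code each tag would be passed as the tag argument of the corresponding MPI send and receive calls.

```python
from enum import IntEnum


class MessageTag(IntEnum):
    """One distinct tag per communication site (values are arbitrary)."""
    BOUNDARY_FORCES = 1001
    CONTACT_SEARCH = 1002
    TIME_STEP_REDUCTION = 1003


# Hypothetical routine names, one per tag.
TAG_TO_ROUTINE = {
    MessageTag.BOUNDARY_FORCES: "exchange_boundary_forces",
    MessageTag.CONTACT_SEARCH: "exchange_contact_candidates",
    MessageTag.TIME_STEP_REDUCTION: "reduce_time_step",
}


def routine_for_tag(tag: int) -> str:
    """Map a tag observed in a trace back to the routine that uses it."""
    try:
        return TAG_TO_ROUTINE[MessageTag(tag)]
    except ValueError:
        return "unknown (tag not registered)"


print(routine_for_tag(1002))
print(routine_for_tag(42))
```

Keeping the tags in one place also makes it harder for two routines to reuse the same value by accident.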
-
Takeaways
• Highly parallel and optimized industrial code with the Altair RADIOSS solver
• Fruitful collaboration between Altair and Intel in terms of code development, tools, support, access to computational resources, and co-marketing activities
• Powerful programming environment developed by Intel that matches our development needs
• Importance of dedicated tools for performance optimization, with Intel VTune Amplifier and Intel Trace Analyzer and Collector
• Availability for Xeon and Xeon Phi secures our investment and commitment in such technology for the long term
-
Innovation Intelligence®
Thank you for attending!
Eric Lequiniou