Engineering Productivity: GPU-Accelerated Simulation &...
1 © 2013 ANSYS, Inc. March 18, 2013 ANSYS Confidential
Improving Engineering Productivity with GPU-accelerated Simulation and HPC
Ray Browell
NVIDIA GPU Technology Conference
March 19, 2013
Session - S3546
• ANSYS Inc. Overview
• ANSYS and NVIDIA Collaboration
• Mechanical GPU Accelerator Capabilities
• Fluent CFD GPU Accelerator Capabilities
• HPC Revolution
• Questions
Improving Engineering Productivity with GPU-accelerated Simulation and HPC
“We’re relentlessly committed to your product development success.
We’re passionate about developing world-class engineering software
that addresses your current and future product development needs.”
ANSYS is dedicated exclusively to engineering simulation and is the world's
leading provider of simulation software. Product innovators in the most demanding
markets have trusted us for over 40 years.
Dipankar Choudhury Chief Technologist ANSYS
ANSYS, Inc. - Our Focus
“Continual ANSYS research leads to advanced, more robust solutions
for even the most complex problems. End users have confidence in
analysis results, meaning they can rely less on costly physical testing.”
Florian Menter Research and Development Fellow ANSYS
Our technology enables you to predict with confidence that your products will
thrive in the real world. Customers trust our software to help ensure the
integrity of their products and drive business success through innovation.
ANSYS, Inc. - Our Technology
Release          ANSYS Mechanical                               ANSYS Fluent
13.0 (Dec 2010)  Shared Memory Solvers; Single Node/Single GPU
14.0 (Dec 2011)  + Distributed ANSYS; Multi-node, 1 GPU/node    Radiation Heat Transfer (beta)
14.5 (Nov 2012)  + Multi-GPU per node; + Hybrid PCG             + GPU AMG Solver (beta), Single GPU
15.0 (2013)      Ongoing collaboration: CUDA 5, etc.            Pressure-based coupled solver (PBCS) (WIP), multi-GPU distributed (WIP)
ANSYS and NVIDIA Collaboration
Mechanical GPU Accelerator Capability
• “Accelerate” Sparse direct equation solver (Shared & Distributed Memory)
– GPU is used to factor many dense “frontal” matrices
– The decision on when to send data to the GPU is made automatically
• “Frontal matrix” too small, too much overhead, stays on CPU
• “Frontal matrix” too large, exceeds GPU memory, only partially accelerated
• “Accelerate” Preconditioned/Jacobi Conjugate Gradient iterative solvers (Shared & Distributed Memory)
– GPU is only used for sparse-matrix vector multiply (SpMV kernel)
– The decision on when to send data to the GPU is made automatically
• Model too small, too much overhead, stays on CPU
• Model too large, exceeds GPU memory, only partially accelerated
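The iterative path above splits cleanly: the sparse matrix-vector product (SpMV) is the one kernel per iteration that the slides say goes to the GPU, and a size rule decides whether offloading pays off. The following is a minimal pure-Python sketch of Jacobi-preconditioned conjugate gradient (JCG) on a matrix in CSR form that makes that split visible. It is not ANSYS code and nothing in it runs on a GPU. The function names, and especially the `choose_device` thresholds (`min_nnz`, `bytes_per_nnz`), are hypothetical illustrations of the "too small / too large" logic, not the solver's actual criteria.

```python
# Hypothetical sketch: Jacobi-preconditioned conjugate gradient (JCG) on a
# sparse matrix in CSR form, plus a toy CPU/GPU offload rule. Pure Python;
# nothing here runs on a GPU and none of it is ANSYS code.


def spmv_csr(indptr, indices, data, x):
    """Return y = A @ x for A in CSR form; this is the SpMV kernel."""
    n = len(indptr) - 1
    y = [0.0] * n
    for i in range(n):
        acc = 0.0
        for k in range(indptr[i], indptr[i + 1]):
            acc += data[k] * x[indices[k]]
        y[i] = acc
    return y


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def jcg(indptr, indices, data, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A; returns (x, iterations)."""
    n = len(b)
    # Jacobi preconditioner: M = diag(A), applied as elementwise 1 / a_ii.
    inv_diag = [0.0] * n
    for i in range(n):
        for k in range(indptr[i], indptr[i + 1]):
            if indices[k] == i:
                inv_diag[i] = 1.0 / data[k]
    x = [0.0] * n
    r = list(b)  # residual b - A x, with the initial guess x = 0
    z = [m * ri for m, ri in zip(inv_diag, r)]
    p = list(z)
    rz = dot(r, z)
    for it in range(1, max_iter + 1):
        ap = spmv_csr(indptr, indices, data, p)  # the one SpMV per iteration
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        if dot(r, r) ** 0.5 < tol:
            return x, it
        z = [m * ri for m, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


def choose_device(nnz, gpu_mem_bytes, min_nnz=100_000, bytes_per_nnz=12):
    """Toy version of the slide's rule; thresholds are illustrative assumptions.

    bytes_per_nnz=12 assumes an 8-byte double value plus a 4-byte column index.
    """
    if nnz < min_nnz:
        return "cpu"          # too small: transfer overhead outweighs the gain
    if nnz * bytes_per_nnz > gpu_mem_bytes:
        return "partial-gpu"  # too large: exceeds GPU memory, partly accelerated
    return "gpu"
```

For example, the 3 x 3 system with A = [[4, -1, 0], [-1, 4, -1], [0, -1, 4]] (CSR: `indptr=[0, 2, 5, 7]`, `indices=[0, 1, 0, 1, 2, 1, 2]`, `data=[4, -1, -1, 4, -1, -1, 4]`) and b = [3, 2, 3] has the exact solution [1, 1, 1], which `jcg` recovers in a few iterations. Keeping `spmv_csr` as a separate function mirrors the slide's point that only this kernel needs a GPU implementation; the rest of the iteration is cheap vector work.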
Mechanical GPU Accelerator Capability
• Supported Hardware
– Currently supports NVIDIA Tesla 20-series, Quadro 6000, and Quadro K5000
– Next-generation NVIDIA Tesla cards (Kepler) will be supported in the 14.5.7 release
– Installing a GPU requires the following:
• Larger power supply (single card needs ~250W)
• An open double-width (2x form factor) PCIe x16 2.0 (or 3.0) slot
• Supported Operating Systems
– Windows and Linux 64-bit platforms only
• Targeted Hardware Specifications
Card                        Tesla C2075   Tesla M2090   Quadro 6000   Quadro K5000  Tesla K10     Tesla K20
Power (W)                   225           250           225           122           250           225
Memory                      6 GB          6 GB          6 GB          4 GB          8 GB          5 GB
Memory bandwidth (GB/s)     144           177.4         144           173           320           208
Peak SP/DP (GFlops)         1030/515      1331/665      1030/515      2290/95       4577/190      3520/1170
Mechanical GPU Accelerator Capability
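Two derived figures can make the Targeted Hardware Specifications table easier to compare: the ratio of double- to single-precision peak, and peak DP flops per byte of memory bandwidth (GFlops divided by GB/s). The following is a minimal sketch that computes both from the table's values. The derived metrics and the names `SPECS` and `derived_metrics` are illustrative additions, not figures from the original slide.

```python
# Values copied from the Targeted Hardware Specifications table:
# name: (peak SP GFlops, peak DP GFlops, memory bandwidth GB/s)
SPECS = {
    "Tesla C2075": (1030, 515, 144.0),
    "Tesla M2090": (1331, 665, 177.4),
    "Quadro 6000": (1030, 515, 144.0),
    "Quadro K5000": (2290, 95, 173.0),
    "Tesla K10": (4577, 190, 320.0),
    "Tesla K20": (3520, 1170, 208.0),
}


def derived_metrics(sp_gflops, dp_gflops, bandwidth_gbs):
    """Return (DP/SP peak ratio, peak DP flops per byte of memory traffic)."""
    return dp_gflops / sp_gflops, dp_gflops / bandwidth_gbs


for name, (sp, dp, bw) in SPECS.items():
    ratio, flops_per_byte = derived_metrics(sp, dp, bw)
    print(f"{name:13s} DP/SP = {ratio:.3f}   DP flop/byte = {flops_per_byte:.2f}")
```

By this measure the K20 offers about 1170/208 ≈ 5.6 DP flops per byte against about 3.6 for the C2075, while the K5000 and K10 deliver DP at roughly 1/24 of their SP peak. For a bandwidth-bound kernel such as SpMV (the part of the PCG/JCG solvers the GPU accelerates), the memory bandwidth row is likely the more relevant one; the slides do not state which precision each solver uses on the GPU.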
Mechanical GPU Accelerator Capability
• Supports the majority of ANSYS Mechanical users
– Covers both sparse direct and PCG/JCG iterative solvers
– Only a few minor limitations
• Ease of use
– Requires at least one supported GPU card to be installed
– Requires at least one HPC Pack license
– No rebuild, no additional installation steps
• Performance
– ~10-25% reduction in time to solution when using 8 CPU cores
– Should never slow down your simulation!
[Chart: ANSYS Mechanical number of jobs per day (higher is better); bar values 8, 8, 12, 19; annotation: 1.6x]
Results from HP Z820; 2 x Xeons (16 cores, only 8 used), 128 GB memory, Win7; 2 x Tesla C2075
V145sp-5 model: turbine geometry, 2,100 K DOF, SOLID187 FEs, static, nonlinear, one iteration
ANSYS Mechanical 14.5, direct sparse solver
Results for Distributed ANSYS 14.5 Preview and Xeon 8-core CPUs
Bar labels: Xeon E5-2687W 8 cores + Tesla C2075; Xeon E5-2687W 8 cores + 2 x Tesla C2075
ANSYS Mechanical 14.5 Preview
• GPUs can offer significantly faster time to solution
[Chart: GPU Performance, relative speedup. 2 cores (no GPU); 8 cores (no GPU): 2.6x; 8 cores (1 GPU): 3.8x; annotation: 1.5x]
• 6.5 million DOF • Linear static analysis • Sparse solver (DMP) • 2 Intel Xeon E5-2670 (2.6 GHz, 16 cores total), 128 GB RAM, SSD, 4 Tesla C2075, Win7
Mechanical GPU Accelerator Capability
• GPUs can offer significantly faster time to solution
[Chart: GPU Performance, relative speedup. 2 cores (no GPU); 8 cores (1 GPU): 2.7x; 16 cores (4 GPUs): 5.2x]
• 11.8 million DOF • Linear static analysis • PCG solver (DMP) • 2 Intel Xeon E5-2670 (2.6 GHz, 16 cores total), 128 GB RAM, SSD, 4 Tesla C2075, Win7
Mechanical GPU Accelerator Capability
Fluent CFD Radiation Modeling on GPUs
VIEWFAC
• Utility to compute view factors
• Hybrid MPI-OpenMP-OpenCL parallel implementation
• Works on CPUs, GPUs or both
RAY TRACING
• Utility to compute view factors
• Uses OptiX on NVIDIA Tesla C2070
Both available as full features in 14.5
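The ray-tracing approach rests on a simple idea: the view factor from one surface to another is the fraction of diffusely emitted rays that land on the second surface. The following is a minimal CPU-only Monte Carlo sketch of that principle for a geometry with a known closed form, a small surface element facing a parallel coaxial disk. It is an illustration of the technique only, not the VIEWFAC or OptiX implementation; the function names and the test geometry are assumptions.

```python
import math
import random


def mc_view_factor_point_to_disk(radius, height, n_rays, seed=0):
    """Monte Carlo estimate of the view factor from a small surface element at
    the origin (normal +z) to a parallel, coaxial disk at z = height."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_rays):
        # Cosine-weighted (diffuse) emission direction via Malley's method:
        # pick a point uniformly on the unit disk and project it up.
        u1, u2 = rng.random(), rng.random()
        s = math.sqrt(u1)            # sin(theta)
        phi = 2.0 * math.pi * u2
        dx, dy = s * math.cos(phi), s * math.sin(phi)
        dz = math.sqrt(1.0 - u1)     # cos(theta), > 0 because u1 < 1
        # Follow the ray to the plane z = height; count it if it lands on the disk.
        t = height / dz
        px, py = t * dx, t * dy
        if px * px + py * py <= radius * radius:
            hits += 1
    return hits / n_rays


def analytic_view_factor_point_to_disk(radius, height):
    """Closed form for the same geometry, used as a check."""
    return radius ** 2 / (radius ** 2 + height ** 2)


estimate = mc_view_factor_point_to_disk(1.0, 1.0, 100_000, seed=1)
exact = analytic_view_factor_point_to_disk(1.0, 1.0)
print(f"ray-traced {estimate:.4f}   analytic {exact:.4f}")
```

For a disk whose radius equals its distance, the analytic value is 0.5, and 100,000 rays should agree with it to about two decimal places. Every ray is independent, which is why this kind of computation maps well onto GPU ray-tracing engines.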
[Chart: ANSYS Fluent AMG solver time (sec), lower is better]
2 x Xeon X5650, only 1 core used: dual socket CPU 2832 s; dual socket CPU + Tesla C2075 517 s (5.5x)
2 x Xeon X5650, all 12 cores used: dual socket CPU 933 s; dual socket CPU + Tesla C2075 517 s (1.8x)
Helix model: helix geometry, 1.2M hex cells, unsteady, laminar, coupled PBNS, DP; AMG F-cycle on CPU, AMG V-cycle on GPU
NOTE: For all jobs, solver time is only ~65% of total time
Fluent CFD AMG Solver on GPUs Work-in-Progress
NVAMG Project – Preview of ANSYS Fluent 14.5 Performance
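The NOTE matters when reading the AMG chart: the 1.8x and 5.5x figures are solver-only. The following is a minimal sketch of the whole-job effect using Amdahl's law. It assumes the solver is 65% of run time and that the remaining 35% is not accelerated by the GPU. The function name `overall_speedup` is illustrative.

```python
def overall_speedup(solver_fraction, solver_speedup):
    """Amdahl's law: only the solver fraction of the run is accelerated."""
    return 1.0 / ((1.0 - solver_fraction) + solver_fraction / solver_speedup)


# Solver share of total time (~65%, per the note) and solver-only GPU speedups.
for label, s in (("12 cores", 1.8), ("1 core", 5.5)):
    print(label, round(overall_speedup(0.65, s), 1))
# 12 cores 1.4
# 1 core 2.1
```

Under these assumptions, no solver speedup can push the whole job beyond 1/0.35 ≈ 2.9x. This is why the later slides stress a balanced system rather than the solver alone.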
HPC Revolution
Recent advancements have revolutionized the computational speed available on the desktop
– Multi-core processors
• Every core is really an independent processor
– Large amounts of RAM
– Solid State Drives (SSDs)
– GPUs
HPC Revolution
The right combination of algorithms and hardware
leads to maximum efficiency
SMP vs. DMP
HDDs vs. SSDs
Interconnects, Clusters
GPUs
• Balanced system for overall optimum performance
[Chart: Balanced Performance, relative speedup (IO Bound). 2 cores: 1.0x; 8 cores: 2.7x; 8 cores + GPU: 5.2x; 8 cores + GPU + SSD: 12.5x; annotation: 1.9x]
• 2.1 million DOF • Nonlinear static analysis • Direct sparse solver (DSPARSE) • 2 Intel Xeon E5-2670 (2.6 GHz, 16 cores total), 128 GB RAM, SSD, 1 Tesla K20c, Win7
Mechanical GPU Accelerator Capability
• Balanced system for overall optimum performance
[Chart: Balanced Performance, relative speedup]
IO Bound: 2 cores 1.0x; 8 cores 2.7x; 8 cores + GPU 5.2x; 8 cores + GPU + SSD 12.5x
Compute Bound: 2 cores 5.7x; 8 cores 12.0x; 8 cores + GPU 24.8x; 8 cores + GPU + SSD 27.3x
• 2.1 million DOF • Nonlinear static analysis • Direct sparse solver (DSPARSE) • 2 Intel Xeon E5-2670 (2.6 GHz, 16 cores total), 128 GB RAM, SSD, 1 Tesla K20c, Win7
Mechanical GPU Accelerator Capability
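The relative-speedup bars are cumulative, so the gain from each hardware addition is the ratio of consecutive bars. For example, the 1.9x annotation on the IO-bound chart matches 5.2 / 2.7 ≈ 1.93, the gain from adding the GPU at 8 cores. The following is a minimal sketch that derives these step gains from the values read off the balanced-performance charts. The helper name `step_gains` is hypothetical. The pairing of the second series (5.7x to 27.3x) with "Compute Bound" is inferred from the legend order.

```python
def step_gains(speedups):
    """Ratio of each configuration's relative speedup to the previous one."""
    return [b / a for a, b in zip(speedups, speedups[1:])]


# Relative speedups read off the charts, in the order:
# 2 cores, 8 cores, 8 cores + GPU, 8 cores + GPU + SSD.
io_bound = [1.0, 2.7, 5.2, 12.5]
compute_bound = [5.7, 12.0, 24.8, 27.3]

for name, series in (("IO bound", io_bound), ("Compute bound", compute_bound)):
    print(name, [round(g, 2) for g in step_gains(series)])
# IO bound [2.7, 1.93, 2.4]
# Compute bound [2.11, 2.07, 1.1]
```

Read this way, the SSD adds about 2.4x when the job is IO bound but only about 1.1x when it is compute bound. This is the "balanced system" point: the best next upgrade depends on which resource limits the run.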
How will you use all of this computing power?
Design Optimization Studies
Higher fidelity Full assemblies More nonlinear
HPC Revolution - Design Optimization
Improving Engineering Productivity with GPU-accelerated Simulation and HPC
• ANSYS, Inc. is a world leader in CAE
• ANSYS, Inc. is a world leader in GPU software deployment
• HPC Revolution
• Accelerates knowledge
• Allows better understanding
• Empowers Design Optimization
• ANSYS, Inc. and NVIDIA will continue to lead!
Thank You!
Improving Engineering Productivity with HPC and GPU-Accelerated Simulation
Raymond Browell
724.514.3070