ACCELERATED COMPUTING: THE PATH FORWARD
Transcript of ACCELERATED COMPUTING: THE PATH FORWARD
2
Domain-Specialized Accelerator
Pioneered CUDA Accelerated Computing Extending Performance Post-Moore’s Law
1980 1990 2000 2010 2020
103
105
107
1.5X per year
Single-threaded perf
GPU-Accelerated
Computing
1.1X per year
40 Years of CPU Trend Data
— Tech Walker
COMPUTING AFTER MOORE’S LAW
3
NVIDIA ACCELERATED COMPUTING
Every Computer MakerVolta in Production Every Cloud
| 5,120 CUDA cores
| 7.8 FP64 TFLOPS
21B xtors
15.7 FP32
125 Tensor TFLOPS
VOLTA TAKING OFF
4
ANNOUNCINGNVIDIA TESLA V100 IN MICROSOFT AZURE
NCv3 Series with NVIDIA Tesla V100
Sign up today at www.aka.ms/ncv3
5
ANNOUNCING JAPAN’S AIST ADOPTS NVIDIA VOLTA FOR ABCI SUPERCOMPUTER
Most Powerful AI Supercomputer in Japan
4,352 Tesla V100 GPUs
37 PetaFLOPS FP64 HPC Performance
0.55 ExaFLOPS AI Performance
6
109% Y-Y DATACENTER GROWTH
$0
$500
$1,000
$1,500
FY10 FY11 FY12 FY13 FY14 FY15 FY16 FY17 YTD 18
$1.5B
$1B
$.5B
Full Fiscal Year
1st Three Quarters
FYQ3
7
CUDAComputeWorks
NVIDIA AI SDKcuDNN, NCCL, TensorRT, DIGITS
Every Framework
NGCGRID vPC
Quadro vWS
HPC CSP TRAINING CSP INFERENCE PUBLIC CLOUD INDUSTRIES ENTERPRISE
$12B Market 80% of Apps by 2020 20M Inference Servers $25B Market 600M Amazon Packages $3T IT Industry
— MacHow2
NVIDIA TESLA DATACENTER PLATFORM
9
ARCHITECTING MODERN DATACENTERS
APPS
% W
ORKLO
AD
Strong Scaling
Weak Scaling
Deep Learning
Apps Accelerated
Parallel Speed-Up
% Workload
% Sequential “Amdahl’s Law”
10
ARCHITECTING MODERN DATACENTERS
Strong Core CPU
Volta 5,120 CUDA Cores
125 TFLOPS Tensor Core
NVLink for Strong Scaling
11
0
10
20
30
40
50
60
70
80
20 40 60 80 1000
# of CPUs
ns/
day
AMBER Simulation of CRISPR
ARCHITECTING MODERN DATACENTERS
48 CPU Nodes Comet Supercomputer
12
0
10
20
30
40
50
60
70
80
20 40 60 80 1000
# of CPUs
1 Node with 4x V100 GPUs
48 CPU Nodes Comet Supercomputer
ns/
day
AMBER Simulation of CRISPR
ARCHITECTING MODERN DATACENTERS
The Power of Accelerated Computing
13
ARCHITECTING MODERN DATACENTERS
APPS
% W
ORKLO
AD
% Accelerated
REACH
APPS
Apps Accelerated
Parallel Speed-Up
% Workload
% Sequential “Amdahl’s Law”
14Intersect360 Research, Nov 2017 “HPC Application Support for GPU Computing”
VASP
AMBER
NAMD
GROMACS
Gaussian
Simulia Abaqus
WRF
OpenFOAM
ANSYS
LS-DYNA
BLAST
LAMMPS
ANSYS Fluent
Quantum Espresso
GAMESS
500+ Accelerated ApplicationsTop 15 HPC Applications
70% OF THE WORLD’S SUPERCOMPUTINGWORKLOAD ACCELERATED
15
Mixed Workload: Materials Science (VASP)Life Sciences (AMBER)Physics (MILC)Deep Learning (ResNet-50)
160 Self-hosted Servers
96 KWatts
4X BETTER HPC SYSTEM TCO
16
Mixed Workload: Materials Science (VASP)Life Sciences (AMBER)Physics (MILC)Deep Learning (ResNet-50)
12 Accelerated Servers w/4 V100 GPUs
20 KWatts
1/3 the Cost
1/4 the Space
1/5 the Power
4X BETTER HPC SYSTEM TCO
17
Image of ResNet 50 network
(…) Preferred Networks Nov '171024 Tesla P100
IBM Aug '17256 Tesla P100
Facebook June '17256 Tesla P100
Preferred Networks Feb '17128 TitanX (Maxwell)
48 Mins60 Mins
15 Mins
4.4 Hours
Time to Train
ResNet-50 ResNet-50 | Dataset: Imagenet | Trained for 90 Epochs
NVIDIA POWERS WORLD’S FASTEST DEEP LEARNING PERFORMANCE
18
ERRORS
REGRESSION TESTING (FP16/INT8)
INFERENCE (FP16/INT8)
TRAINING (FP32/FP16)
SIMULATION (FP64/FP32)
NEW DATA
TRAINING SET REGRESSION SET NEW DATA
DEEP LEARNING COMES TO HPC
19
U. FLORIDA & UNC: DRUG DISCOVERY300,000X Molecular Energetics Prediction
SLAC: ASTROPHYSICSGravitational Lensing: From Weeks to 10ms
U.S. DoE: CLEAN ENERGY33% More Accurate Neutrino Detection
PRINCETON & ITER: PARTICLE PHYSICS90% Accuracy for Fusion Sustainment
U. PITT: DRUG DISCOVERY35% Higher Accuracy for Protein Scoring
UNSW: PHYSICS14X Faster Bose-Einstein Condensate Creation
DEEP LEARNING COMES TO HPC
20
RE-SIM INFERENCETRAINING3D-SIM
TRAINING SET REGRESSION SET NEW DATA
ERRORS
NEW DATA
NVIDIA DRIVE AI CAR PLATFORM
21
40 PetaFLOPS Peak FP64 Performance | 660 PetaFLOPS DL FP16 Performance | 660 NVIDIA DGX-1 Server Nodes
NVIDIA SATURNV WITH VOLTA
22
Containerized in NVDocker
Optimization Across the Full Stack
Always Up-to-Date
Fully Tested and Maintained by NVIDIA
— MacHow2
NVIDIA GPU CLOUD
CLOUD CONTAINER REGISTRY
24
Containerized in NVDocker
Optimized for GPU-accelerated Systems
Up-to-Date Containers
Available NOW
Sign up at nvidia.com/gpu-cloud
ANNOUNCINGNVIDIA GPU CLOUD FOR HPC
CLOUD CONTAINER REGISTRYFOR ACCELERATED HPC APPS
25
GAMESS GROMACS CP2K PYFR BIOINFORMATICS CFD
LAMMPS NAMD QUANTUM ESPRESSO QMCPACK MATERIAL SCIENCE PHYSICS
RELION
SPECFEM3D
CARTESIAN
SPECFEM3D
GLOBE SEISMIC
HPC APPS COMING TO NGC
NOW NEXT
29
Large-scale Volumetric Rendering
Physically Accurate Ray Tracing
Production-quality Images
Seamless integration with ParaView
Early Access NOW
Signup now at nvidia.com/gpu-cloud
ANNOUNCINGNVIDIA GPU CLOUD FOR HPC VISUALIZATION
UNIFIED VISUALIZATIONFOR LARGE DATA SETS
ParaView with NVIDIA OptiX
ParaView with NVIDIA Holodeck
ParaView with NVIDIA IndeX
32
NVIDIA ACCELERATED COMPUTING
500+ Accelerated Apps
NVIDIA GPU Cloud for HPC
NVIDIA GPU Cloud
for HPC Vis
— MacHow2
Deep Learning
Comes to HPC
TO EXASCALE, AND BEYOND
Volta in Every Cloud, Every Computer Maker
Accelerated Computing’s Time Has Come