New Era of High Performance Computing (convergence of AI, Big Data, HPC)
Rajesh <[email protected]>
Petascale Cray XC Systems
Undisclosed systems
Top 500 Supercomputers in the World (June 2018)

              Top 10   Top 50   Top 100
Cray Systems    4        18       29
Vendor Rank    #1       #1       #1
Cray’s Supercomputing Leadership
Copyright 2018 Cray Inc. - Confidential and Proprietary
Top 500 Supercomputers in the World (Nov 2017)

              Top 10   Top 50   Top 100
Cray Systems    4        18       29
Vendor Rank    #1       #1       #1
The Convergence of Big Data, AI and HPC

Modeling the World
● Math Models: simulation and modelling of the natural world via mathematical equations
● Data Models: analysis of large datasets for knowledge discovery, insight, and prediction
● Data-Intensive Processing: hybrid workflows with a mix of simulation and analytics

Workloads are becoming more heterogeneous.
The Convergence of Big Data, AI and HPC
● Systems/container management
● Analytics/machine learning ecosystem
● Deep learning toolkits
● Big data

Today: running software built for the cloud on HPC hardware
Benefit: convergence of productivity and performance
Machine Learning Coming to Your Science Domain
● Clustering Daya Bay events
● Classifying LHC events
● Oxford Nanopore sequencing
● Detecting extreme weather
● FWI subsurface modelling
● Modeling galaxy shapes
● Turbine CFD modeling
● Protein-ligand binding

New and Larger Deep Learning Models Required
Deep Learning Use Cases

Consumer
● Use cases: search; face/object detection; image segmentation; speech understanding; NLP; text to speech
● Topologies: ResNet, SSD, LSTM, Attention, SparseNN, FCN

Retail
● Use cases: person and object detection; image segmentation; scene analytics; support; marketing; supply chain; security
● Topologies: ResNet, SSD, FCN, RNN

Energy
● Use cases: oil and gas exploration; smart grid; operational improvement; conservation
● Topologies: deep reinforcement learning

Financial
● Use cases: algorithmic trading; fraud detection; personal finance; risk mitigation; security
● Topologies: deep reinforcement learning

Health
● Use cases: enhanced diagnostics; 3D medical imaging; drug discovery; sensory aids
● Topologies: ResNet, SSD, FCN

Industrial
● Use cases: factory automation; predictive maintenance; precision agriculture; field automation
● Topologies: ResNet, SSD, deep reinforcement learning

Autonomous Driving
● Use cases: pedestrian, vehicle, and object detection and classification; ego motion; sensor fusion; environment modeling
● Topologies: deep reinforcement learning, LSTM, SSD
8/31/2018
NERSC – Deep Learning in Science
Opportunities to apply DL widely in support of classic HPC simulation and modelling
Molecular Engineering of Solar-Powered Windows (Jacqueline M. Cole, University of Cambridge, Argonne National Lab)
1. Extract compound data from scientific publications
2. Enrich data with ML and quantum chemical calculations
3. Filter the data set to a small number of candidates (ML)
4. Validate the final candidates (simulation)
Relationship Between AI, ML, & DL
● "AI" is a very broad term, with no clear boundaries
● "AI" and "deep learning" are not synonymous
● Machine learning is just a part of AI
● Deep learning is a specialization of machine learning
● Cray focuses on Deep Learning
Neural Network Workflow
NN workflows are similar in many ways to typical data science workflows.
Ingest/clean & transform can be major undertakings, as usual.
Training results in a model that can then be used for inference, which produces answers in production.
Cray's biggest contribution is to be made in the computationally intensive training phase!
Deep Learning in Production

Example of an end-to-end workflow:
Data Acquisition → Data Preparation → Model Training → Model Testing → Model Deployment

● Data Preparation: cleansing, shaping, enrichment; data annotation (ground truth)
● Model Training: split data into training, validation, and test sets; train the model; evaluate performance and optimize the model; cross-validation (iterative)
● Model Deployment: A/B testing in production; training and inferencing; model management
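The split into training, validation, and test sets in this workflow can be sketched in a few lines of Python. The function name and split ratios here are illustrative choices, not part of any Cray toolkit:

```python
import random

def split_dataset(samples, train_frac=0.7, val_frac=0.15, seed=42):
    """Shuffle samples and split them into training, validation, and test sets."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    return (shuffled[:n_train],                  # training set: fit the model weights
            shuffled[n_train:n_train + n_val],   # validation set: tune and optimize the model
            shuffled[n_train + n_val:])          # test set: held out for final evaluation

train_set, val_set, test_set = split_dataset(range(1000))
print(len(train_set), len(val_set), len(test_set))  # 700 150 150
```

Cross-validation repeats this idea by rotating which slice of the data serves as the validation set.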
Deep Learning: Behind the Scenes

Data Acquisition → Data Preparation → Model Training → Model Testing → Model Deployment
(A/B testing in production; training and inferencing)

Ideal training algorithm, for every training sample:
1. Run the sample forward through the model
2. Compute the error vs. the training data
3. Back-propagate the error through the NN to update the weights (gradient descent)

Typically broken up into "mini-batches", which exposes more intra-node parallelism and arguably reduces "noise". After all data is processed, adjust the "learning rate" and repeat until the desired accuracy is achieved.

DNN model with weights on all connections: the largest models now have hundreds of layers and millions (to billions) of nodes.
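The loop above (forward pass, error, back-propagation, mini-batches, learning-rate adjustment) can be illustrated with a deliberately tiny model: a one-weight linear fit trained by mini-batch gradient descent. This is a toy sketch of the algorithm, not Cray's training framework:

```python
import random

# Toy data: y = 3x, fit with the hypothetical model y_hat = w * x and squared-error loss.
random.seed(0)
data = [(x, 3.0 * x) for x in [random.uniform(-1, 1) for _ in range(256)]]

w, lr, batch_size = 0.0, 0.1, 32
for epoch in range(20):
    random.shuffle(data)
    for i in range(0, len(data), batch_size):           # process one mini-batch at a time
        batch = data[i:i + batch_size]
        # forward pass + error: dLoss/dw = 2(w*x - y)*x, averaged over the mini-batch
        grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
        w -= lr * grad                                  # gradient-descent weight update
    lr *= 0.95                                          # adjust the learning rate each epoch

print(round(w, 3))  # w converges toward 3.0
```

A real DNN does the same thing per weight, with back-propagation computing the gradient layer by layer instead of this closed-form derivative.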
HPC Thinking: message-size, MPI-collective, and global all-reduce modifications
Source: Peter Mendygral and Jef Dawson, Cray PE and Performance
90%+ scaling efficiency, which can reduce training time from days to hours
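In data-parallel training, each worker computes gradients on its own shard of the mini-batch, and the per-worker gradients are combined with a global all-reduce (in MPI terms, an MPI_Allreduce) so that every rank ends up with the same averaged gradient. A minimal sketch of the pattern, simulating ranks in plain Python rather than using real MPI:

```python
def allreduce_mean(per_rank_grads):
    """Average gradient vectors across 'ranks', mimicking MPI_Allreduce(SUM) / size.

    Every simulated rank receives the same averaged gradient, which keeps
    the model replicas identical after each update step.
    """
    n_ranks = len(per_rank_grads)
    dim = len(per_rank_grads[0])
    # reduce: element-wise sum across ranks
    total = [sum(g[i] for g in per_rank_grads) for i in range(dim)]
    avg = [t / n_ranks for t in total]
    # broadcast: every rank gets the same result
    return [avg[:] for _ in range(n_ranks)]

# Four simulated ranks, each holding a gradient from its own mini-batch shard
grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
print(allreduce_mean(grads)[0])  # [4.0, 5.0] on every rank
```

With real MPI this is one collective call, e.g. `comm.Allreduce(sendbuf, recvbuf, op=MPI.SUM)` in mpi4py followed by dividing by `comm.size` (standard mpi4py, not Cray-specific); the message-size and collective tuning the slide refers to happens inside that operation.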
Differentiating Results: TensorFlow
Cray Machine Learning / AI Environment
Cray Distributed Training Framework delivers up to 5x performance* over other distributed training approaches
* Actual performance depends on system, batch size, and model
Deep Learning Toolkits
Analytics/Machine Learning Ecosystem
NCCL
Not Just More Data But Also DIFFERENT I/O Patterns
● Modelling & simulation: large, streaming I/O (HDDs shine)
● Advanced analytics / artificial intelligence: small, random I/O (SSDs shine)
ClusterStor Converged Building Blocks: Embedded HA Lustre Object Storage Servers

                      L300                  L300F               L300N
Form factor           5U84 12Gb/s SAS       2U24 12Gb/s SAS     5U84 12Gb/s SAS
HDD/SSD               82/2                  0/24                82/2 or 80/4 (with NXD SW)
IOPS (4K rand. wr)    4,000                 500,000             40,000
Throughput (GB/s*)    10 rd / 10 wr         10 rd / 20 wr       10 rd / 10 wr
Cost/usable GB        1                     ~30                 1.15
Best used for         Large, streaming I/O  Small, random I/O   Mixed I/O

*Conservatively derated
Base rack
● No single point of failure
● 2 x GigE switches (1U each)
● 2 x IB switches (1U each)
● SMU / system management unit (2U)
● MMU / metadata management unit (2U)
● 6 x SSU (5U each)
Expansion rack
● No single point of failure
● 2 x GigE switches (1U each)
● 2 x IB switches (1U each)
● 7 x SSU (5U each)
SSU Specs – 7.2K RPM NL-SAS drives (expansion rack)

Rack     # drives (HDDs/SSDs)   8TB HDD TBs (U/R)   IOR perf (GB/s*)   Power (kW)
SSU #6   574 / 14               3304 / 4592         63                 14.9
SSU #5   492 / 12               2832 / 3936         54                 12.6
SSU #4   410 / 10               2360 / 3280         45                 10.9
SSU #3   328 / 8                1888 / 2624         36                 9.2
SSU #2   246 / 6                1416 / 1968         27                 7.4
SSU #1   164 / 4                944 / 1312          18                 5.7
SSU #0   82 / 2                 472 / 656           9                  4.0
Cray's Solution: ClusterStor System View for Multiple Systems

● Runtime variability: real-time and historical views of metrics to understand what is impacting applications
● Problem isolation: unified view of system activity enables problem isolation in complex environments
● Trend analysis: enable data-driven decisions and visualization of trends to optimize systems
● Alerting: spend more time on high-priority tasks and be notified when anomalies occur
● Activity: quickly see what jobs are running on the system and which jobs might present issues
● Utilization: performance and capacity information for multiple systems
Performance Visualization and Comparison
● Visualize: performance graphs over the life of the job for write, read, and metadata operations
● Compare: compare a job to the rest of the system at a glance
Cray's Next Machine: Convergence of Cluster & Supercomputer
● Liquid-cooled & air-cooled systems
● Single interconnect for either system
● Single system management software for either system
● Ability to carve out a portion of the system for dedicated projects with a few clicks!
● Ability to optimize the same platform for a variety of applications (cluster-focused, large-memory, large MPI jobs)

Performance
• Highest-power CPUs supported via direct liquid cooling
• Hardware & software scalable to exascale systems

TCO
• Warm-water cooling (W3 and W4 temps supported)
• Efficient power conversion
• Upgradeable for multiple generations
Cray Next Gen LC Supercomputer
Leadership Supercomputing

Highest number of systems in the HPC Top 100*
● Drive maximum computing performance while focusing on programmability
● Close the gap between observed and achievable performance
● Maximize cycles to the application
● Address issues of scale and complexity of HPC systems
● Cray developer tools profile applications with over 99,000 ranks
● Cray MPI runs with > 2 million ranks

*Nov 2017
Cray Software
● Admin interface
● Cray Linux Environment
● Lustre, Cray DVS, DataWarp
● Systems management
● High-speed interconnect
● Cray Programming Environment
● 3rd-party WLMs, containers, tools

A complete, fully integrated, extensible environment for HPC
Converged System Management
Support for broad operating and management ecosystems

● Infrastructure services: node bootstrap, orchestration, utility storage, monitoring, configuration management, network management, …
● Management services: hardware inventory, administrative control
● External interfaces: Cray REST APIs
● Managed resources: storage, events, networking, compute (CLE)
● Software ecosystems
Cray's Extensive Programming Environment

● Programming languages: Fortran, C, C++, Chapel, Python, R
● Programming models: Cray MPI, SHMEM; shared memory / GPU (OpenMP, OpenACC); PGAS & global view (UPC, Fortran coarrays, Coarray C++, Chapel)
● Programming environments: Cray Compiling Environment (PrgEnv-cray), GNU (PrgEnv-gnu), 3rd-party compilers (PrgEnv- for Intel, Allinea, LLVM, etc.)
● Tools: environment setup (Modules); debuggers (TotalView, DDT, gdb4hpc, Valgrind4hpc); Abnormal Termination Processing (ATP); distributed-memory debugging support (CCDB, STAT); performance analysis and porting (CrayPAT, Cray Apprentice2, Reveal)
● Optimized libraries: scientific libraries (LAPACK, ScaLAPACK, BLAS, Iterative Refinement Toolkit, FFTW); I/O libraries (NetCDF, HDF5)
● Analytics / AI: AI toolboxes, DL frameworks, Cray Distributed Training Framework, Chapel AI, Cray Urika AI/Analytics

Legend: Cray developed · Cray added value to 3rd party · Licensed ISV SW · 3rd-party packaging
Next Gen System-Software Themes

Scaling to exascale
• Building on current management and Linux scalability enhancements
• MPI scalability across full systems

Toward zero downtime
• Separate management and operating environments
• Concurrent maintenance
• Health and resiliency support

Run any workflow
• Customer choice of operating environment
• Broad container support
• Workload management and orchestration

Modularity
• Clean APIs between software components
• Customizable with easy integration