SUSE High Performance Computing: It just keeps getting better
Jay Kruemcke
Sr. Product Manager, HPC, ARM
The HPC universe is expanding in new ways
CAGR 2016-2021:
• 5.6% Supercomputer (>$500K)
• 5.0% Divisional ($250K-$500K)
• 6.3% Departmental ($100K-$250K)
• 6.3% Workgroup (<$100K)
• HPC is a growth market, with a growing recognition of strategic value
• HPC ROI is very high
- $551 on average revenue per dollar invested in HPC
- $52 on average profit (or cost savings) per dollar invested in HPC
• Key use cases:
- HPC in the cloud (incl. HPCaaS)
- Cognitive computing (incl. AI/ML/DL)
- HPDA (High Performance Data Analysis)
- IoT
• Key applications:
- Modeling and simulation
- Data analytics
Source: Hyperion Research, June 2017
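As a rough illustration of what these CAGR figures imply, the sketch below compounds each segment's rate over the five years from 2016 to 2021. The function name and the dictionary are illustrative and not part of the Hyperion data; only the percentages come from the slide.

```python
# Compound the slide's CAGR figures over 2016-2021 (five yearly steps).
# Names here are illustrative; only the percentages come from the slide.

CAGR_BY_SEGMENT = {
    "Supercomputer (>$500K)": 0.056,
    "Divisional ($250K-$500K)": 0.050,
    "Departmental ($100K-$250K)": 0.063,
    "Workgroup (<$100K)": 0.063,
}


def growth_multiple(cagr: float, years: int) -> float:
    """Return the total growth factor after compounding `cagr` for `years`."""
    return (1.0 + cagr) ** years


if __name__ == "__main__":
    for segment, rate in CAGR_BY_SEGMENT.items():
        multiple = growth_multiple(rate, years=5)
        print(f"{segment}: x{multiple:.3f} over 5 years")
```

Read this way, a 5.6% CAGR corresponds to roughly 31% total growth over the period.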
SUSE High Performance Computing 4/15/2019 2
HPC Industry Trends
• AI/ML integration with traditional modeling and simulation
• Heterogeneous HPC with GPU and FPGA
• Container technology to simplify administration and reproducibility
• New architectures
HPC Customer Pain Points
Complexity Maintenance Time to Solution
“My IT staff doesn’t have time to update and test all the different software components.”
• Better management software is needed, and deployment approach needs to be updated to leverage HPC and cloud infrastructure
• Stack components provided by multiple vendors, making it more challenging to maintain
“I need to maximize application performance, scale workloads, and minimize overhead.”
• Parallel software is lacking, and many applications need a major redesign
• Stack components are provided by multiple vendors, which makes management more challenging
• The market is segmented into commercial and scientific users, with not enough collaboration between them
“Composing a working HPC environment is difficult and time-consuming, and requires experts.”
• Clusters are hard to use and manage as they become more complex in heterogeneous environments
• Storage access time and data management are becoming new bottlenecks
SUSE is the preferred HPE partner for Linux, HPC, OpenStack and Cloud Foundry solutions
SUSE technology is embedded on every HPE ProLiant Server to power the intelligent provisioning feature
Arm SoC partners driving HPC adoption in the modern data center
Catalyst UK initiative with HPE and SUSE
HPE Apollo 70: first SUSE “Yes” certification for an Arm server
Optimize infrastructure costs with increased server density on latest 64-bit Arm processors
Goal: Propel the Arm HPC ecosystem and exascale computing in the UK
• More than 12,000 Arm-based cores running across three universities
• 64 Apollo 70 systems per site
• Two 32-core Cavium ThunderX2 processors per system
• Running SUSE Linux Enterprise for High Performance Computing
Catalyst UK project: HPE, Arm, SUSE, and three leading UK universities establish one of the largest Arm-based supercomputer deployments in the world
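The "more than 12,000 cores" figure can be checked against the per-site numbers on the slide. The short sketch below only multiplies those numbers out; the constant names are illustrative.

```python
# Check the "more than 12,000 cores" figure from the per-site numbers on the slide.
SITES = 3                 # three universities
SYSTEMS_PER_SITE = 64     # Apollo 70 systems per site
SOCKETS_PER_SYSTEM = 2    # two ThunderX2 processors per system
CORES_PER_SOCKET = 32     # 32 cores per processor

cores_per_site = SYSTEMS_PER_SITE * SOCKETS_PER_SYSTEM * CORES_PER_SOCKET
total_cores = SITES * cores_per_site

print(f"cores per site: {cores_per_site}")
print(f"total cores:    {total_cores}")
```

This gives 4,096 cores per site and 12,288 cores in total, consistent with the slide's claim.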
Catalyst UK – Arm based High Performance Computing
• Current Status:
- Three 64-node HPE Apollo 70 HPC clusters deployed
- SUSE Linux for HPC 12 SP3
- HPE High Performance Computing Cluster Management (HPCM)
- Slurm workload scheduler from SUSE HPC Module
- Initial qualification and performance testing
• Plans:
- Upgrade to SLES 15
- Testing of SUSE Enterprise Server for Arm
- BOF session at ISC 2019
HPE Apollo 70 based on Marvell ThunderX2 Arm processors
Cray Linux Environment (CLE) is based on SUSE Linux
Arm-powered Cray delivered to a UK consortium
Cray has a majority share of the Top500 sites
Isambard – UK Tier 2 HPC service from GW4
• Cray “Scout” XC50 series system
- 10,000+ Armv8 cores – Cavium ThunderX2
- Aries interconnect
- Cray Linux Environment based on SUSE Linux
Scalable system framework in cooperation with OpenHPC, designed to work for small clusters to the largest supercomputers
Scale and balance for compute- and data-intensive applications
Strong platform for AI and visualization
AI/ML/DL workloads
Jointly define scope of Lenovo HPC stack using SUSE HPC componentry
LiCO adaptation (Lenovo Intelligent Computing Orchestration)
Barcelona Supercomputing Center
SuperMUC Petascale system runs SUSE on Lenovo ThinkSystem
Geophysicists use earthquake simulation software to investigate seismic waves beneath Earth’s surface
Calculations involved in this kind of simulation are so complex that they push even supercomputers to their limits
SchedMD® is the core company behind the Slurm workload manager software designed specifically to satisfy the demanding needs of high performance computing.
SchedMD provides break/fix support for Slurm, configuration consulting, and hands-on training workshops for Slurm.
SUSE continues to work with NVIDIA to enable support for the latest NVIDIA GPU cards – important in HPC modeling and simulation
NVIDIA’s expertise in programmable GPUs has led to breakthroughs in parallel processing which make supercomputing inexpensive and widely accessible
Altair makes HPC faster, smarter & easy to manage with PBS Works™
Altair provides services for software applications that streamline the workflow management of compute-intensive tasks including solvers, optimization, modeling, visualization and analytics
Bright Cluster Manager supports SUSE, enabling customers to deploy, manage and monitor SLES clusters using the familiar Bright interface
Bright Cluster Manager lets users monitor and build clusters of any size that are easy to provision, operate, monitor, manage and scale
Univa and SUSE together manage containerized HPC and AI workloads on TSUBAME 3.0
Scaling machine learning for SUSE Linux containers, servers, clusters and clouds with Apache Spark and Univa
Why SUSE Linux for HPC?
• Enterprise Linux with Enterprise support
- Incidents such as Spectre and Meltdown highlight the need for a quick response to address system vulnerabilities
• More than just an OS: HPC software included and supported
- SLE HPC includes popular HPC software such as Slurm and Open MPI
• Aggressively priced subscriptions
- SUSE Linux for HPC priced for large and small HPC configurations
• Proven track record in HPC
- 50% of the Top 100 are running SUSE Linux or a SLES-based OS
SUSE Linux Enterprise HPC Continuum
• SUSE Linux Enterprise for HPC (X86 and ARM)
- Fully supported by SUSE
• HPC Module (part of SUSE Linux Enterprise HPC)
- Fully supported through your SUSE HPC subscription
- Content inspired by OpenHPC
• PackageHub
- SUSE curated, community supported packages https://packagehub.suse.com/
• OpenSUSE LEAP
- Free, community supported Linux
- Free Developer subscriptions
- SUSE enablement for Azure, AWS Cloud
• Related Products
- SUSE Enterprise Storage
- SUSE Manager
SUSE Linux for HPC is now a separate product
• Starting with SLES 15, HPC became a separate product: SLE HPC 15
- Increased flexibility for the future
- Installation system roles for Head Nodes, Compute Nodes, and Development
- Supported by the unified installer
- New registration key and SCC channel
- HPC module is only accessible with SLE HPC 15
• New registration keys for SLE HPC 12 (SP2 – SP4)
- New registration key and SCC channels from SLES
- NOTE: Upgrade from SLES to SLE HPC or renewal requires a subscription conversion
- switch_sles_sle-hpc script
- https://www.suse.com/releasenotes/x86_64/SUSE-SLES/12-SP3/#fate-326567
SUSE Linux Enterprise HPC offerings
• Available for X86 and Arm HPC clusters
• Extended Service Pack Overlap Support (ESPOS)
• Long Term Service Pack Support (LTSS)
• Simple, one price per cluster node
• Significantly reduced list prices
• Support for smaller cluster sizes
• Direct SUSE sellers can sell HPC
• New product – SLE HPC 15
- Separate from general purpose SLES
SUSE Linux HPC Module
MUNGE
ScaLAPACK
genders
• All packages supported by SUSE via SUSE Linux Enterprise HPC
• Available for x86 and Arm-based platforms
• Flexible release schedule
• SLE 12 and SLE HPC 15
SUSE Linux Enterprise HPC Module
All packages supported by SUSE
- Support included in the SLE HPC Subscription
Easy installation via zypper or YaST
Available for X86 and ARM platforms beginning with SLES 12 SP2
Flexible release schedule. Releases are independent of Service Pack schedule
Simplifying access to supported HPC software
* Note: A separate support agreement is required for Icinga2
Package versions by release. Columns in the original table: HPC Module 1Q17, HPC Module 4Q17, HPC Module 1Q18, HPC Module SLES 12, HPC Module SLE HPC 15. Each row lists its entries in the order shown; rows with fewer than five entries have empty cells in the original.
conman: 0.2.7, 0.2.8, 0.2.8, 0.2.8
cpuid (X86): 20151017, 20170122, 20170122, 20170122, 20170122
fftw: 3.3.6, 3.3.6, 3.3.6
ganglia: 3.7.2, 3.7.2, 3.7.2
ganglia-web: 3.7.2, 3.7.2, 3.7.2
genders: 1.2.2, 1.2.2, 1.2.2
GCC: 6.2.1, 7.3.1, 7.3.1, 7.3.1
hdf5: 1.10.1, 1.10.1, 1.10.1
hwloc: 1.11.5, 1.11.8, 1.11.8, 1.11.8
Icinga2*: 2.8.2, 2.8.2, n/a
lua-lmod: 6.5.11, 7.6.1, 7.6.1, 7.6.1
memkind (X86): 1.1.0, 1.1.0, 1.6.0
mpiP: 3.4.1, 3.4.1, 3.4.1
mrsh: 2.12, 2.12, 2.12
munge: 0.5.12, 0.5.12, 0.5.13
mvapich2: 2.2, 2.2.13, 2.2.13, 2.2.13
netcdf: 4.4.1.1, 4.4.1.1, 4.6.1
netcdf-cxx: 4.3.0, 4.3.0, 4.3.0
netcdf-fortran: 4.4.4, 4.4.4, 4.4.4
numpy: 1.13.3, 1.13.3, 1.14.0
openblas: 0.2.20, 0.2.20, 0.2.20
openmpi: 1.10.7, 1.10.7, 2.1.3
papi: 5.5.1, 5.5.1, 5.5.1, 5.5.1
pdsh: 2.31, 2.33, 2.33, 2.33, 2.33
petsc: 3.7.6, 3.7.6, 3.8.3
phdf5: 1.10.1, 1.10.1, 1.10.1
powerman: 2.3.24, 2.3.24, Base OS
prun: 1.0, 1.0, 1.0
rasdaemon: 0.5.7, 0.5.7, Base OS
ScaLAPACK: 2.0.2, 2.0.2, 2.0.2
slurm: 16.05.8, 17.02.09, 17.02.10, 17.02.10, 17.11.5
Note: SLE 15 customers must use the SLE HPC subscription to access the HPC Module packages on SLE 15
Enterprise User
SUSE PackageHub
• High-quality, up-to-date packages delivered by openSUSE Factory
• Easy to install via zypper or YaST
• Built and maintained by the community of users
• Approved and curated by SUSE
• No additional charge
•Community Supported Packages for SLES
About 1000 packages available for X86-64
More than 500 packages available for ARM
SUSE Package Hub
Upstream packages
Package        Category
clustershell   Administrative
robinhood      Administrative
singularity    Runtime
TensorFlow     ML Framework
Caffe2         Coming soon
SLES HPC lifecycle Roadmap*
[Figure: lifecycle timeline, 2017-2025. Bars shown per release: general support with “normal” SP overlap, ESPOS, and LTSS, alongside HPC Module deliveries.]
• SLES 12 HPC SP3: FCS Sept 2017, ESPOS, LTSS
• SLES 12 HPC SP4: FCS 4Q 2018, ESPOS, LTSS
• SLES 12 HPC SP5: ESPOS, LTSS
• SLE HPC 15: FCS Q2 2018, ESPOS, LTSS
• SLE HPC 15 SP1: FCS Q2 2019, ESPOS, LTSS
• SLE HPC 15 SP2: ESPOS, LTSS
*NOTE: All future dates are estimates for illustration purposes and are not intended as committed dates.
Other HPC related SUSE Products
• SUSE OpenStack Cloud
• SUSE Enterprise Storage
- X86-64 & Arm 64 since early 2017
• SUSE Manager
- Managed node for X86 & Arm 64 available
SUSE Enterprise Storage Solution for HPC
Most Common Use Case as Tier 2 Storage
Low Latency Storage (Lustre, XFS, NFS etc)
HPC Compute Cluster
SUSE Enterprise Storage
• Use Cases:
- Primary Storage (Certain Use Cases)
- Nearline or Archival Storage
- Home Directories
• Certified with HPE Data Management Framework (DMF) and iRODS*
*: Coming Soon
SUSE + CLE: 59%
bullx: 15%
Ubuntu: 4%
Red Hat: 22%
• Represents 116 supercomputers in the top 500 list
• Over half of the paid Linux OS in the top 500 are SUSE
HPC Top 500 Analysis – Paid OS System Share
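To make the shares more concrete, the sketch below converts the slide's percentages into approximate system counts out of the 116 paid-OS systems. The slide gives only rounded percentages, so the counts are estimates; the function and variable names are illustrative.

```python
# Convert the slide's paid-OS percentages into approximate system counts.
# The slide gives rounded percentages only, so the counts are estimates.
PAID_OS_SYSTEMS = 116
SHARE_PERCENT = {"SUSE + CLE": 59, "bullx": 15, "Ubuntu": 4, "Red Hat": 22}


def estimated_counts(total: int, shares: dict[str, int]) -> dict[str, int]:
    """Estimate systems per OS by rounding total * share / 100."""
    return {name: round(total * pct / 100) for name, pct in shares.items()}


counts = estimated_counts(PAID_OS_SYSTEMS, SHARE_PERCENT)
for name, count in counts.items():
    print(f"{name}: about {count} systems")
```

Under these assumptions, the 59% share works out to about 68 of the 116 systems, which matches the "over half" statement.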
SUSE High Performance Computing
• SLES for HPC Solution
• Comprehensive range of Linux operating system offerings at multiple price points
• Simple, one price per cluster node pricing model
• HPC Module with many supported HPC packages
• Competitive pricing
• Multiple service life options
• Full enablement for X86-64 and ARM based HPC clusters
• Additional open-source packages via PackageHub and OpenSUSE