IBM HPC/HPDA/AI Solutions - HPC Knowledge Portal
IBM HPC/HPDA/AI
Solutions
Albert Valls Badia IBM Client Technical Architect
IBM Systems Hardware
June 15th , 2017
New Drivers and Directions – Datacentric
• Data Volumes are Exploding - Especially Unstructured Data
• Data Needs to be Collected, Managed, and Digested
• Deriving Insight and Information from the Data requires:
  • A variety of processing steps in a Workflow
  • A variety of processing optimizations
• Many Analytics Steps can make use of Large In-Memory Solvers
• Energy Efficiency requires:
  • Processing Elements that are Optimized to the task
  • Energy- and Data-aware Workflow Management
• The OpenPOWER Foundation provides innovation opportunities to a variety of Partners
• Making innovations like Accelerators Consumable is critical
[Chart: Price/Performance from 2000 to 2020 - full system stack innovation required, spanning Technology and Processors, Firmware / OS, Accelerators, Software, Storage, Network, and the Workflow Dependency Graph]
OpenPOWER and Innovation (strategy started in 2014)
[Diagram: the IBM stack (research and innovation) complemented by OpenPOWER open innovation from partners such as IBM, NVIDIA, TYAN, and Mellanox]
OpenPOWER: Bringing Partner Innovation to Power Systems
5 initial members
200+ members
24 countries
OpenPOWER Innovation Pervasive in System Design (21 TFlops/node)
• IBM: POWER8 CPU with NVLink Interface
• NVIDIA: Tesla P100 GPU with NVLink
• Ubuntu by Canonical: Launch OS supporting NVLink and Page Migration Engine
• Wistron: Platform co-design
• Mellanox: InfiniBand/Ethernet connectivity in and out of server
• Samsung: 2.5" SSDs
• HGST: Optional NVMe Adapters
• Hynix, Samsung, Micron: DDR4
POWER8: Leadership performance - designed for Memory Intensive Workloads
• 12 cores, 96 threads (8 threads per core), 4 cache levels
• Faster cores, bigger cache, accelerator direct links
• Up to 1/2 TB per socket, up to 230 GB/s sustained memory bandwidth at consistent speed (Memory Buffer plus DRAM chips)
• 3x higher memory bandwidth, 1 TB/socket
Differentiated Acceleration - CAPI and NVLink
New Ecosystems with CAPI
• Partners innovate, add value, gain revenue together with IBM
• Technical and programming ease: virtual addressing, cache coherence
• Accelerator (FPGA or ASIC) is a hardware peer: CAPI-attached accelerators connect their PSL to the CAPP in POWER8 over the Coherence Bus, with direct access to System Memory
Future, Innovative Systems with NVLink
• NVIDIA Tesla GPU with NVLink to POWER8 with NVLink: 80 GB/s peak* (40+40 GB/s) between System Memory and Graphics Memory
• Faster GPU-GPU communication
• Breaks down barriers between CPU and GPU
• New system architectures
IBM Power Accelerated Computing Roadmap
• 2015: POWER8 (S822LC "Firestone" server); NVIDIA Kepler GPUs over PCIe Gen3; CAPI interface; Mellanox ConnectX-4 EDR InfiniBand over PCIe Gen3
• 2016: POWER8 with NVLink (S822LC for HPC, "Minsky"); NVIDIA Pascal with NVLink; CAPI over PCIe Gen3; Mellanox ConnectX-4 EDR InfiniBand
• 2017: POWER9 ("Witherspoon"); NVIDIA Volta with Enhanced NVLink; Enhanced CAPI over PCIe Gen4; Mellanox HDR InfiniBand
• 2020+: POWER10; GPUs, interconnect and system name TBD
FLOPS are not the only KPI in HPC: example workflow in seismic analysis.
• Read from storage
• Memory load
• Preprocessing
• Real-time algorithm execution
• Visualization and Insight
• Simulation and modeling
Every step in the workflow takes advantage of different hardware capabilities, hence the need for a balanced system design.
IBM Data Centric Computing Strategy: HPC->HPDA
Introducing IBM Spectrum Scale
• Remove data-related bottlenecks with a parallel, scale-out solution
• Enable global collaboration with unified storage and global namespace
• Optimize cost and performance with automated data placement
• Ensure data availability, integrity and security with erasure coding, replication, snapshots, and encryption
Highly scalable high-performance unified storage
for files and objects with integrated analytics
Unified Scale-out Data Lake
• File In/Out, Object In/Out; Analytics on demand.
• High-performance native protocols
• Single Management Plane
• Cluster replication & global namespace
• Enterprise storage features across file, object & HDFS
[Diagram: Spectrum Scale serving NFS, SMB, POSIX, Swift/S3 and HDFS clients, with SSNR (Spectrum Scale Native RAID), compression and encryption, across tiers of SSD, fast disk, slow disk and tape]
IBM Spectrum Scale: Parallel Architecture
No Hot Spots
• All NSD servers export to all clients in active-active mode
• Spectrum Scale stripes files across NSD servers and NSDs in units of file-system block-size
• File-system load spread evenly
• Easy to scale file-system capacity and performance while keeping the architecture balanced
The NSD client does real-time parallel I/O to all the NSD servers and storage volumes/NSDs.
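As a sketch of the striping idea described above, here is how file-system blocks spread round-robin over the NSDs; the block size and NSD names are illustrative, not real configuration values:

```python
# Sketch: Spectrum Scale-style wide striping of one file across NSDs.
BLOCK_SIZE = 4 * 1024 * 1024  # assumed 4 MiB file-system block size

def stripe_blocks(file_size: int, nsds: list[str]) -> dict[str, list[int]]:
    """Assign each file-system block to an NSD round-robin."""
    placement: dict[str, list[int]] = {nsd: [] for nsd in nsds}
    num_blocks = -(-file_size // BLOCK_SIZE)  # ceiling division
    for block in range(num_blocks):
        placement[nsds[block % len(nsds)]].append(block)
    return placement

# A 40 MiB file over four NSDs: 10 blocks spread evenly, no hot spot.
p = stripe_blocks(40 * 1024 * 1024, ["nsd1", "nsd2", "nsd3", "nsd4"])
print({nsd: len(blocks) for nsd, blocks in p.items()})
```

Because every client computes the same placement, all clients drive all servers in parallel, which is why adding NSD servers adds both capacity and bandwidth.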
Spectrum Scale Cluster Overview
• Nodes communicate over an Ethernet network (TCP/IP) or a low-latency network (InfiniBand)
• Storage back ends can mix: heterogeneous block storage; JBODs behind Spectrum Scale Native RAID controllers (IBM Elastic Storage solution); and Spectrum Scale file servers on commodity servers (x86_64 or Power) using disk volumes/LUNs
• Spectrum Scale clients on application nodes (Oracle, ERP, HPC cluster) access /file_systemA directly over the Spectrum Scale NSD protocol
• Spectrum Scale protocol nodes re-export /file_systemA to NFS clients (NFS exports), SMB clients (SMB shares) and OpenStack Swift clients (HTTP GET/PUT), with clustered failover across up to 16 (SMB) or 32 (NFS) servers
• File-system load is spread evenly across all the servers - no hot spots; data is striped across servers in block-size units, so there is no single-server bottleneck
• Access to the same data can be shared over NFS, SMB and Swift/S3
• Easy to scale while keeping the architecture balanced: can add capacity and performance
Spectrum Scale Architecture Highlights: Scalability
Data scalability
• Capacity: large number of disks/LUNs in a single file system
• Throughput: wide striping, large block size
• Capacity efficient (data in i-node, fragments)
• Multiple nodes write in parallel (even within a single file)
Metadata scalability
• Wide striping of all metadata (inodes, indirect blocks, directories, allocation maps...)
• Scalable data structures: segmented allocation map, extensible hashing for directories
• Highly scalable, distributed lock manager:
  • After obtaining a lock token, each node can cache metadata, update locally, write back directly
  • Fine-grain locking when necessary: shared inode write locks, byte-range locks, locking directory entries by name (hash)
  • A dynamically elected "metanode" collects inode, indirect block & directory updates
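The token idea above can be illustrated with a toy model; the class, names, and granularity here are invented for illustration and are far simpler than the real GPFS lock protocol:

```python
# Toy model of token-based lock caching: once a node holds a token it can
# operate locally; a conflicting request from another node forces a revoke.
class TokenManager:
    def __init__(self):
        self.owner = {}  # object id -> node currently holding the token

    def acquire(self, node: str, obj: str):
        """Grant the token for obj to node; return the node revoked, if any."""
        holder = self.owner.get(obj)
        # A different holder must flush cached updates before losing the token.
        revoked = holder if holder is not None and holder != node else None
        self.owner[obj] = node
        return revoked

tm = TokenManager()
assert tm.acquire("nodeA", "inode:42") is None      # first grant, no revoke
assert tm.acquire("nodeA", "inode:42") is None      # repeat access stays local
assert tm.acquire("nodeB", "inode:42") == "nodeA"   # conflict revokes nodeA
```

The payoff is that in the common case (no sharing) metadata updates never leave the node, which is what makes the lock manager scale.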
Speed and simplicity: Graphical user interface
• Reduce administration overhead: graphical user interface for common tasks, performance monitoring, problem determination
• Easy to adopt: common IBM Storage UI framework
• Integrated into Spectrum Control: storage portfolio visibility, consolidated management, multiple clusters
Spectrum Scale Built-in Tiering (ILM)
Challenge
• Data growth is outpacing budget
• Low-cost archive is another storage silo
• Flash is underutilized because it isn't shared
• Locally attached disk can't be used with centralized storage
• Migration overhead is preventing storage upgrades
Automated data placement
• Span the entire storage portfolio, including DAS, with a single namespace
• Policy-driven data placement & data migration
• Share storage, even low-latency flash
• Automatic failover and seamless file-system recovery
• Lower TCO
Powerful policy engine
• Information Lifecycle Management
• Fast metadata scanning and data movement
• Automated data migration based on thresholds
• Users not affected by data migration
• Example: when online storage reaches 90% full, move all files of 1GB or larger that are 60 days old offline to free up space
[Diagram: automated movement between System (Flash), Gold (SSD) and Silver (NL-SAS) pools, driven by rules such as "small files last accessed > 30 days", "last accessed > 60 days", "Silver pool is > 60% full: drain it to 20%", "accessed today and file size < 1GB"]
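The threshold-driven migration described above can be sketched as an ILM policy rule; the pool names, thresholds and target pool below are illustrative assumptions, written in the Spectrum Scale policy language:

```shell
# policy.rules - hypothetical example; pool names and limits are assumptions.
# When the 'silver' pool reaches 90% full, migrate files of 1GB or more that
# have not been accessed for 60 days, until the pool is back down to 70%.
RULE 'archive_cold' MIGRATE FROM POOL 'silver' THRESHOLD(90,70)
     TO POOL 'tape'
     WHERE FILE_SIZE >= 1073741824
       AND (DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) > 60

# Dry-run the policy against file system fs1, then apply it:
# mmapplypolicy fs1 -P policy.rules -I test
# mmapplypolicy fs1 -P policy.rules -I yes
```

Because `mmapplypolicy` scans metadata in parallel across the cluster, such rules can be evaluated over billions of files without user-visible impact.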
Spectrum Scale HDFS Transparency
Challenge
• Separate storage systems for ingest, analysis, results
• HDFS requires locality-aware storage (namenode)
• Data transfer slows time to results
• Different frameworks & analytics tools use data differently
HDFS Transparency
• Map/Reduce on shared, or shared-nothing storage
• No waiting for data transfer between storage systems
• Immediately share results
• Single Data Lake for all applications
• Enterprise data management
• Archive and analysis in place
[Diagram: traditional analytics solution - data is ingested from the existing system into a separate analytics system and results are exported back; in-place analytics solution - the analytics system accesses the Spectrum Scale file system directly via HDFS Transparency, alongside File and Object access]
Spectrum Scale Compression
• Transparent compression for the HDFS Transparency, Object, NFS, SMB and POSIX interfaces
• Improved storage efficiency: typically 2x improvement
• Improved I/O bandwidth: reading/writing compressed data reduces load on storage
• Improved client-side caching: caching compressed data increases apparent cache size
• Per-file compression, driven by policies - e.g. compress cold data (data not being used/accessed)
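Per-file compression can be driven by hand or from the policy engine; the file path and the 30-day rule below are illustrative assumptions:

```shell
# Compress one cold file in place and check its attributes (path is an example):
mmchattr --compression yes /gpfs/fs1/archive/run42.dat
mmlsattr -L /gpfs/fs1/archive/run42.dat

# Equivalent policy rule: compress files not accessed for 30 days.
# RULE 'compress_cold' MIGRATE COMPRESS('z')
#      WHERE (DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) > 30
```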
Spectrum Scale Encryption
• Native encryption of data at rest
• Files are encrypted before they are stored on disk
• Keys are never written to disk
• No data leakage in case disks are stolen or improperly decommissioned
• Secure deletion
  • Ability to destroy arbitrarily large subsets of a file system
  • No "digital shredding", no overwriting: secure deletion is a cryptographic operation
• Use Spectrum Scale policies to encrypt (or exclude) files in a fileset or file system
• Generally < 5% performance impact
Performance Feature: Spectrum Scale Local Read-Only Cache (LROC)
Benefits
• Expands the local node file cache (pagepool)
• Leverages fast local storage
• Can reduce load on central storage
• Transparent to applications
• Can use inexpensive local devices
Where to use it
• Protocol nodes
• Virtual machine storage
• Large-memory analytics
Easy to enable
• NSD type localCache
• Define only this node as the NSD server
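The two enablement steps above come down to one NSD stanza; the device, node name and NSD name below are hypothetical examples:

```shell
# lroc.stanza - hypothetical stanza describing a local SSD on node01 only:
# %nsd: device=/dev/sdc
#       nsd=node01_lroc
#       servers=node01
#       usage=localCache

mmcrnsd -F lroc.stanza    # create the NSD; usage=localCache marks it as LROC
mmdiag --lroc             # on node01: inspect LROC statistics
```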
Performance Feature: Spectrum Scale Highly Available Write Cache (HAWC)
Benefits
• Speeds up small writes (I/O sizes up to 64KiB)
• Used by IBM Elastic Storage Server
Where to use it
• Logs handle small writes
• Any storage architecture: shared disk, or shared nothing (use replication)
• Flash in local storage or shared storage
Easy to enable
• Create a system.log pool
• Enable write-cache on the file system
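Once a system.log pool exists, enabling the write cache is a one-line change; the file-system name and the 64K threshold here are example values:

```shell
# Enable HAWC on file system fs1: writes up to the threshold are hardened
# in the system.log pool first, then destaged to the data pools.
mmchfs fs1 --write-cache-threshold 64K
mmlsfs fs1 --write-cache-threshold    # confirm the setting
```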
Spectrum Scale Multicluster: cross-cluster sharing
• Cross-mounting file systems between Spectrum Scale clusters
• Separate clusters = separate administration domains
• When the connection is established, all nodes are interconnected
  - All nodes in both clusters must be within the same IP network segment / VLAN
  - The channel can be encrypted (OpenSSL)
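A cross-cluster mount is set up roughly as follows; the cluster names, contact nodes, key file and mount point are illustrative assumptions:

```shell
# On the cluster that owns file system fs1:
mmauth genkey new                         # generate the cluster key pair
mmauth grant access.example.com -f fs1    # let the remote cluster mount fs1

# On the accessing cluster, after exchanging public key files out of band:
mmremotecluster add store.example.com -n node1,node2 -k store_id_rsa.pub
mmremotefs add rfs1 -f fs1 -C store.example.com -T /gpfs/rfs1
mmmount rfs1 -a
```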
Synchronous Replication & Stretched Cluster
• Replication is performed synchronously by the node that writes to disk
• Synchronous replication happens within a Spectrum Scale cluster
• I/O does not return to the application until both copies are written
• Active/Active data access; reads come from whichever source is fastest
• DR with automatic failover and seamless file-system recovery
• Replication between sites -> Spectrum Scale Stretched Cluster
Spectrum Scale Active File Management (AFM)
• An asynchronous, cross-cluster, data-sharing utility
• Functions well over unreliable and high-latency networks
• Extends the global namespace between multiple WAN-dispersed locations to share and exchange data asynchronously
• Caches local copies of data distributed to one or more clusters to improve local read and write performance
• As data is written or modified at one location, all other locations see that same data
Spectrum Scale AFM Main Concepts
• Home - Where the information lives. Owner of the data in a cache relationship
• Cache - Fileset in a remote cluster that points to home
• The relationship between a Cache and Home is one to one
• Cache knows about its Home. Home does not know a cache exists
• Data is copied to the cache when requested or data written at the cache is copied back to home as fast as possible
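Creating a cache fileset that points at a home looks roughly like this; the fileset name, target URL and AFM mode are hypothetical examples (single-writer: only the cache writes):

```shell
# Hypothetical cache fileset in file system fs1 pointing at an NFS-exported home:
mmcrfileset fs1 projcache --inode-space new \
    -p afmTarget=nfs://homeserver/gpfs/homefs/project \
    -p afmMode=single-writer
mmlinkfileset fs1 projcache -J /gpfs/fs1/projcache
# Reads fetch data from home on demand; writes queue back asynchronously.
```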
IBM Elastic Storage Server (ESS) is a Software-Defined Solution
Migrate RAID and disk management to commodity file servers!
[Diagram: before - clients connect over FDR IB / 10/40 GbE to Spectrum Scale servers 1 and 2 backed by custom dedicated disk controllers and JBOD disk enclosures; after - the same clients and servers, with Spectrum Scale RAID running on commodity file servers that take over RAID and disk management of the JBOD disk enclosures]
Spectrum Scale Native RAID is a software implementation of storage RAID technologies within Spectrum Scale. It requires special licensing and is only approved for pre-certified architectures such as Lenovo GSS and IBM ESS (Elastic Storage Server).
Advantages of Spectrum Scale RAID
• Use of standard and inexpensive disk drives
• Erasure-code software implemented in Spectrum Scale
• Data is declustered and distributed to all disk drives with the selected RAID protection: 3-way, 4-way, RAID6 8+2P, RAID6 8+3P
• Faster rebuild times: as data is declustered, more disks are involved during rebuild - approx. 3.5 times faster than RAID-5
• Minimal impact of rebuild on system performance: rebuild is done by many disks, and rebuilds can be deferred with sufficient protection
• Better fault tolerance: end-to-end checksum, much higher mean-time-to-data-loss (MTTDL)
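A toy calculation shows why declustering speeds up rebuilds; the array and stripe sizes below are illustrative, not an ESS configuration:

```python
# Sketch of declustered RAID rebuild fan-out: rebuild work is spread over all
# surviving disks instead of hammering one small RAID group.
def rebuild_read_load(total_disks: int, group_size: int, declustered: bool) -> float:
    """Fraction of one disk's capacity each participating surviving disk
    must read to rebuild a single failed disk."""
    if declustered:
        # Every surviving disk holds an equal share of the affected stripes.
        return (group_size - 1) / (total_disks - 1)
    # Conventional RAID: only the failed disk's own group does the work.
    return 1.0

# 58-disk declustered array with 8+2p stripes (groups of 10):
load = rebuild_read_load(58, 10, declustered=True)
print(f"per-disk rebuild read load: {load:.2f} of a disk's capacity")
```

With the work spread this thin, each disk contributes only a small fraction of its bandwidth, which is why rebuilds finish faster while barely disturbing foreground I/O.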
RAID algorithm
• Two types of RAID: replication and Reed-Solomon parity
  • Replication: 1 strip (GPFS block) plus 2 or 3 replicated strips - 3-way (1+2) or 4-way (1+3)
  • Reed-Solomon: 8 data strips (GPFS block) plus 2 or 3 redundancy strips - 8+2p or 8+3p
• 2-fault and 3-fault tolerant codes (RAID-D2, RAID-D3)
Rebuild overhead reduction example
Declustered RAID6 example
Critical Rebuild Performance on GL6 8+2p
During the critical rebuild the impact on the workload was high, but as soon as the array was back to single-parity protection the impact on the customer's workload was <2%. The critical rebuild took 6 minutes.
The Data Integrity Manager prioritizes tasks: rebuild, rebalance, data scrubbing and proactive correction.
End-to-end checksum
• True end-to-end checksum from the disk surface to the client's Spectrum Scale interface
• Repairs soft/latent read errors
• Repairs lost/missing writes
• Checksums are maintained on disk and in memory and are transmitted to/from the client
• Checksum is stored in a 64-byte trailer of 32-KiB buffers: an 8-byte checksum plus 56 bytes of ID and version info
• A sequence number is used to detect lost/missing writes
[Diagram: 8 data strips and 3 parity strips; each 32-KiB buffer carries a 64B trailer; ¼- to 2-KiB terminus]
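A minimal sketch of the trailer scheme, assuming a CRC32 checksum and a simplified layout (the real on-disk format and checksum algorithm differ):

```python
# Sketch: a 32 KiB buffer followed by a 64-byte trailer holding an 8-byte
# checksum plus ID/version/sequence metadata.
import struct
import zlib

BUF = 32 * 1024   # payload size
TRAILER = 64      # 8B checksum + 56B metadata

def seal(payload: bytes, seq: int) -> bytes:
    """Append the trailer: checksum covers payload and metadata."""
    assert len(payload) == BUF
    meta = struct.pack("<Q", seq).ljust(56, b"\0")   # sequence number + padding
    csum = struct.pack("<Q", zlib.crc32(payload + meta))
    return payload + csum + meta

def verify(buf: bytes) -> bool:
    payload, csum, meta = buf[:BUF], buf[BUF:BUF + 8], buf[BUF + 8:]
    return struct.unpack("<Q", csum)[0] == zlib.crc32(payload + meta)

buf = seal(b"\xab" * BUF, seq=7)
assert verify(buf)                       # intact buffer checks out
assert not verify(b"\x00" + buf[1:])     # a flipped byte is detected
```

Because the sequence number sits inside the checksummed region, a stale buffer (a lost write that left old data on disk) fails verification just like a corrupted one.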
IBM Elastic Storage Server family
• GS models use 2U 2.5" JBODs or SSDs. Supported drives: 1.2TB, 1.8TB SAS; 400GB, 800GB, 1.6TB SSD
• GL models use 4U 3.5" JBODs. Supported drives: 4TB, 6TB, 8TB NL-SAS 3.5" HDDs
• Supported NICs: 10GbE, 40GbE Ethernet and FDR or EDR InfiniBand
[Diagram: racks of FC 5887 24-bay disk enclosures]
Model GS1: 24 SSD; 6 GB/s
• Net capacity (SSD): 400GB = 6TB, 800GB = 13TB, 1.6TB = 26TB
Model GS2: 46 SAS + 2 SSD, or 48 SSD drives; 2 GB/s (SAS), 12 GB/s (SSD)
• Net capacity (SSD): 400GB = 13TB, 800GB = 26TB, 1.6TB = 53TB
• Net capacity (SAS): 1.2TB = 35TB, 1.8TB = 53TB
Model GS4: 94 SAS + 2 SSD, or 96 SSD drives; 5 GB/s (SAS), 16 GB/s (SSD)
• Net capacity (SSD): 400GB = 28TB, 800GB = 57TB, 1.6TB = 115TB
• Net capacity (SAS): 1.2TB = 78TB, 1.8TB = 117TB
Model GS6: 142 SAS + 2 SSD; 7 GB/s
• Net capacity (SAS): 1.2TB = 121TB, 1.8TB = 182TB
Model GL2 (Analytics Focused): 2 enclosures, 12U; 116 NL-SAS, 2 SSD; 5 to 8 GB/s
• Net capacity: 4TB = 327TB, 6TB = 491TB, 8TB = 655TB
Model GL4 (Analytics and Cloud): 4 enclosures, 20U; 232 NL-SAS, 2 SSD; 10 to 16 GB/s
• Net capacity: 4TB = 673TB, 6TB = 1PB, 8TB = 1.3PB
Model GL6 (PetaScale Storage): 6 enclosures, 28U; 348 NL-SAS, 2 SSD; 10 to 25 GB/s
• Net capacity: 4TB = 1PB, 6TB = 1.5PB, 8TB = 2PB
ESS New Models Performance and Capacity
New! Model GL2S: 2 enclosures (5U84), 14U; 166 NL-SAS, 2 SSD; 17 GB/s; max 1.6PB raw
• Net capacity: 4TB = 508TB, 8TB = 1PB, 10TB = 1.27PB
New! Model GL4S: 4 enclosures (5U84), 24U; 334 NL-SAS, 2 SSD; 25 GB/s; max 3.3PB raw
• Net capacity: 4TB = 1PB, 8TB = 2PB, 10TB = 2.5PB
New! Model GL6S: 6 enclosures (5U84), 34U; 502 NL-SAS, 2 SSD; 34 GB/s; max 5PB raw
• Net capacity: 4TB = 1.5PB, 8TB = 3.1PB, 10TB = 3.9PB
For comparison, the previous models: GL2 (2 enclosures, 12U; 116 NL-SAS, 2 SSD; 8 GB/s; max 0.9PB raw), GL4 (4 enclosures, 20U; 232 NL-SAS, 2 SSD; 11 GB/s; max 1.8PB raw), GL6 (6 enclosures, 28U; 348 NL-SAS, 2 SSD; 23 GB/s; max 2.8PB raw)
Sequential throughput vs. Capacity
Software Defined Compute: IBM Platform Computing
Delivering a highly utilized shared-services environment optimized for time to results
• Application examples: simulation, analysis, design, big data, long-running services
• Traditional IT is constrained: long wait times, low utilization, IT sprawl - repeated for many apps and groups
• Software Defined with IBM Platform Computing: big data / Hadoop, simulation & modeling, and analytics workloads share clusters, grid and cloud - making lots of computers look like "one", with prioritized matching of supply with demand
• Benefits: high utilization, throughput, performance, prioritization, reduced cost - faster results with fewer resources
Overall Artificial Intelligence (AI) Space
• Artificial Intelligence: human intelligence exhibited by machines
• Machine Learning / Deep Learning: cognitive systems that break tasks into artificial neural networks, "human trained" using large amounts of data & the ability to learn how to perform the task
• New data sources: NoSQL, Hadoop & analytics, data lakes
• New class of applications: machine learning & training, pattern matching, image recognition, real-time decision support, complex workflows
• Extends enterprise applications - finance: fraud detection/prevention; retail: shopping advisors; healthcare: diagnostics and treatment; supply chain and logistics
• Extends predictive analytics to advanced analytics with AI
• Growing across Compute, Middleware, and Storage
PowerAI Platform
• Deep learning frameworks: Caffe, NVCaffe, IBMCaffe, Torch, DL4J, TensorFlow, Theano
• Supporting libraries: OpenBLAS, Bazel, DIGITS, NCCL
• Distributed frameworks: coming soon
• Accelerated servers and infrastructure for scaling: cluster of NVLink servers, Spectrum Scale high-speed parallel file system, scale to cloud
Where to start? IBM Power System S822LC: the Deep Learning Server
• 20 POWER8 cores with NVLink
• Up to 1TB DDR4 memory
• Up to 4 Tesla P100 GPUs with NVLink
Target workloads:
• Parallel computing (e.g. Universidad Carlos III, Barcelona Supercomputing Center)
• GPU development and optimisation (e.g. molecular dynamics at Centro de Biología Molecular)
• Machine Learning / Deep Learning
Entry configuration: 20-core POWER8 + 256GB + 1 NVIDIA Volta GPU, starting at €27,500 + VAT
Questions?