Lustre Beyond HPC: Toward Novel Use Cases for Lustre?
Presented at LUG 2014
Robert Triendl, DataDirect Networks, Inc., 2014/03/31
This is a vendor presentation!
Lustre Today
• The undisputed file system of choice for HPC
– large-scale parallel I/O for very large HPC clusters (several thousand nodes or larger)
– applications that generate very large datasets but only a relatively limited amount of metadata traffic
• Alternative solutions
– are typically more expensive (since proprietary)
– and not nearly as scalable as Lustre
Lustre Market Overview: Data for the Japan Market
[Figure: stacked-bar chart of the Japan storage market by segment for 2008, 2013, and 2018. Segments: Cloud & Content, Business Data Analysis, HPC "Work" Corporate, Research Data Analysis, HPC Archive, HPC "Work".]
What Happened? Example: Storage Market Japan
• Lustre market expansion
– From a relatively small player to the dominant HPC file system
– From tens of OSSs to hundreds of OSSs (and, when including the "K" system, literally thousands of OSSs)
– But Lustre has also become synonymous with large-scale HPC
– Experiments to use Lustre in other markets and for other applications have decreased rather than increased
Overall Storage Market Evolution
• "Software-defined storage" is replacing traditional storage architectures in large cloud deployments
• Many Lustre "alternatives" are now available
– Ceph, OpenStack/Swift, SwiftStack, Gluster, etc.
– Various Hadoop distributions
– Commercial object storage (DDN WOS, Scality, Cleversafe, etc.)
• Lustre remains "exotic" and is rarely considered even an option
[Figure: Japan storage market by segment — Business Data Analysis 6%, HPC "Work" Corporate 15%, Research Data Analysis 35%, HPC Archive 23%, HPC "Work" 24% — annotated with candidate Lustre features: project quota, small-file I/O, SSD acceleration, fine-grained monitoring, NFS/CIFS access, management, connectors, object/cloud links, data management, backup/replication, HSM, client performance, cluster integration, large I/O, I/O caching, RAS features.]
Nagoya University Acceptance Benchmark
• Large Cluster
– FPP (file per process) from each core in the cluster
– Most efficient configuration with 3 TB/4 TB drives
– 350-400 threads per Lustre OST
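The slide does not name the tool used for the acceptance test; as a rough illustration of the file-per-process write pattern it describes, here is a minimal mpi4py sketch. The mount point, transfer size, and per-rank file size are assumptions, not values from the benchmark.

    # Minimal file-per-process (FPP) write sketch using mpi4py.
    # Assumptions: /lustre/bench is a Lustre mount, 1 MiB transfers,
    # 1 GiB written per rank.
    import os
    from mpi4py import MPI

    MOUNT = "/lustre/bench"        # hypothetical Lustre mount point
    XFER = 1 << 20                 # 1 MiB per write() call
    COUNT = 1024                   # 1 GiB written by each rank

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    path = os.path.join(MOUNT, "fpp_rank%05d.dat" % rank)
    buf = b"\0" * XFER

    comm.Barrier()                 # start all ranks together
    t0 = MPI.Wtime()
    with open(path, "wb", buffering=0) as f:
        for _ in range(COUNT):
            f.write(buf)
        os.fsync(f.fileno())       # make sure data reaches the OSTs
    comm.Barrier()                 # wait for the slowest rank
    elapsed = MPI.Wtime() - t0

    total_gib = comm.allreduce(XFER * COUNT, op=MPI.SUM) / 2.0**30
    if rank == 0:
        print("%.1f GiB in %.1f s -> %.2f GiB/s aggregate"
              % (total_gib, elapsed, total_gib / elapsed))

Launched with one MPI rank per core (for example, mpirun -np 512 python fpp_write.py), every rank streams its own file, which approximates the per-OST thread counts quoted above.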
Nagoya University Initial Data: FPP with a Large Number of Threads
[Figure: write performance of a single OST (GB/sec, 0-1.4) versus the number of processes per OST (0-250).]
Large I/O Patches
[Figure: two panels showing Lustre backend performance degradation per OST, as a percentage of peak performance, versus the number of processes (0-700), for writes (left) and reads (right). Series: 7.2K SAS with 1 MB RPCs, 7.2K SAS with 4 MB RPCs, and SSD with 1 MB RPCs.]
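The 4 MB RPC series above comes from the large-I/O patches, which raise the maximum bulk RPC size from the default 1 MB. On a client, the bulk RPC size is governed by the osc.*.max_pages_per_rpc tunable (256 pages of 4 KiB = 1 MiB; 1024 pages = 4 MiB). A small sketch for reporting the effective RPC size per OSC follows; the 4 KiB page size is an assumption, and lctl must be available on the client.

    # Sketch: report the effective Lustre bulk RPC size for each OSC device.
    import subprocess

    PAGE_SIZE = 4096   # assumed client page size in bytes

    out = subprocess.run(
        ["lctl", "get_param", "osc.*.max_pages_per_rpc"],
        capture_output=True, text=True, check=True,
    ).stdout

    for line in out.splitlines():
        if "=" not in line:
            continue
        name, pages = line.split("=", 1)
        print("%s: %.0f MiB bulk RPCs"
              % (name.strip(), int(pages) * PAGE_SIZE / 2.0**20))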
Raw Device Performance: Write
[Figure: sgpdd-survey write throughput (MB/sec, 0-1200) versus total number of threads for Seagate ES3 7.2K NL-SAS (left) and Hitachi 7.2K NL-SAS (right), both RAID6, for concurrent-region counts crg=1 through crg=256.]
Raw Device Performance: Read
[Figure: sgpdd-survey read throughput (MB/sec, 0-1400) versus total number of threads for Seagate ES3 7.2K NL-SAS (left) and Hitachi 7.2K NL-SAS (right), both RAID6, for concurrent-region counts crg=1 through crg=256.]
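The crg=1..256 series in the two figures above are concurrent-region counts from sgpdd-survey (part of lustre-iokit), which measures raw block-device throughput underneath Lustre. A sketch of driving such a sweep from Python follows; the environment-variable names and device paths are assumptions based on common lustre-iokit usage and should be checked against the installed version.

    # Sketch: run an sgpdd-survey sweep over concurrent regions and threads.
    # The variable names (scsidevs, size, crglo/crghi, thrlo/thrhi, rslt)
    # are assumed from typical lustre-iokit documentation.
    import os
    import subprocess

    env = dict(
        os.environ,
        scsidevs="/dev/sg2 /dev/sg3",  # hypothetical sg devices for the RAID6 LUNs
        size="8192",                   # MB transferred per device
        crglo="1", crghi="256",        # sweep concurrent regions 1..256
        thrlo="1", thrhi="256",        # sweep thread counts 1..256
        rslt="/tmp/sgpdd_survey",      # prefix for result files
    )
    subprocess.run(["sgpdd-survey"], env=env, check=True)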
Output Data from a Simulation
• Requirements
– Random reads against multiple large files
– 2 million 4K random read IOPS
• Solution
– Lustre file system with 16 OSS servers
– Two SFA12K (or one SFA12KXi)
– 40 SSDs as Object Storage Devices
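As a rough illustration of the required access pattern (not the acceptance benchmark itself), the sketch below issues 4 KiB pread() calls at random offsets within one large file and reports single-thread IOPS. The file path and run time are assumptions; the real workload ran file-per-process and single-shared-file variants from many clients under POSIX and MPI-IO.

    # Sketch of a 4 KiB random-read pattern against one large file on Lustre.
    import os
    import random
    import time

    PATH = "/lustre/sim/output_000.dat"  # hypothetical large simulation output file
    BLOCK = 4096
    DURATION = 10.0                      # seconds to run

    fd = os.open(PATH, os.O_RDONLY)
    blocks = os.fstat(fd).st_size // BLOCK

    ops = 0
    t0 = time.time()
    while time.time() - t0 < DURATION:
        # note: without O_DIRECT the client page cache may serve repeated blocks
        os.pread(fd, BLOCK, random.randrange(blocks) * BLOCK)
        ops += 1
    os.close(fd)

    print("%.0f random 4 KiB read IOPS (single thread)" % (ops / (time.time() - t0)))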
Lustre 4K Random Read IOPS
Configuration: 4 OSSs, 10 SSDs, 16 clients
[Figure: Lustre 4K random read performance (0-600,000 IOPS) for client counts from 1 node/32 processes up to 16 nodes/512 processes, comparing FPP and SSF workloads under POSIX and MPI-IO.]
Genomics Workflows
• Mixed workflows
– Ingest
– Sequencing pipelines: large-file I/O
– Analytics workflows: mixed I/O
• Various I/O issues
– Random reads for reference data
– Small-file random reads
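As an illustration of the small-file side of such analytics workloads, the sketch below reads a directory of small reference files in random order and reports files per second. The directory path is an assumption for illustration only.

    # Sketch of a small-file random-read pattern: open, read, close many
    # small files in random order, as an analytics stage might.
    import os
    import random
    import time

    REF_DIR = "/lustre/genomics/refs"   # hypothetical directory of small reference files

    files = [os.path.join(REF_DIR, name) for name in os.listdir(REF_DIR)]
    random.shuffle(files)

    t0 = time.time()
    nbytes = 0
    for path in files:
        with open(path, "rb") as f:     # open/read/close cost dominates for small files
            nbytes += len(f.read())
    elapsed = time.time() - t0

    print("%d files, %.1f MiB in %.1f s (%.0f files/s)"
          % (len(files), nbytes / 2.0**20, elapsed, len(files) / elapsed))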
Random Reads with SSDs
Data Analysis “Workflows”
• Scientific data analysis
– Genomics workflows
– Seismic data analysis
– Various types of accelerators
– Large scientific instruments in astronomy
– Remote sensing and environmental monitoring
– Microscopy
Data Analysis “Workflows”
• Additional topics
– Data ingest
– Data management and data retention
– Data distribution and data sharing
Hyperscale Storage: HPC, Cloud, Data Analysis
High Performance Computing
– Mostly (very) large files (GBs)
– Mostly write I/O performance
– Mostly streaming performance
– 10s of petabytes of data
– Scratch data
– 100,000s of cores
– Mostly InfiniBand
– Single location
– Very limited replication factor
– High efficiency

Cloud Computing
– Small and medium-size files (MBs)
– Mostly read I/O performance
– Mostly transactional performance
– 10s of billions of files
– WORM & WORN
– 10s of millions of cores
– Almost exclusively Ethernet
– Highly distributed data
– High replication factor
– Low efficiency

Data analysis workflows sit between these two profiles.
A (DDN) Vision for Lustre
• Maximum sequential and transactional performance per storage sub-system CPU
• Caching at various layers within the data path
• Increased single node streaming and small file performance
• Millions of metadata operations in a single FS
• Millions of random (read) IOPS within a single FS
A (DDN) Vision for Lustre cont.
• Data management features, including (cloud) tiering, fast and efficient data back-up, and data lifecycle management
• Novel usability features such as cluster integration, QoS, directory-level quota, etc.
• Extremely high backend reliability for small and mid-sized systems
Futures for Lustre?
• Work Closely with Users
– User problems are the best source for future direction
– Translate user problems into roadmap priorities
• Work Closely with the Lustre Community
– Work very closely with OpenSFS and Intel HPDD on Lustre roadmap priorities and various other topics