Post on 01-Oct-2020
IME™ (Infinite Memory Engine)
Extreme Application Acceleration & Highly Efficient I/O Provisioning
Tommaso Cecchi, September 22nd 2015
What is IME?
This breakthrough software-defined storage application introduces a whole new application-aware data acceleration tier that provides game-changing latency reduction and greater bandwidth and IOPS performance for today’s and tomorrow’s performance-hungry scientific, analytic and big data applications.
What is IME?
IME delivers the performance of flash with the manageability & capacity of shared storage
IME is a new tier of transparent, extendable, non-volatile memory (NVM) that provides game-changing latency reduction and greater bandwidth and IOPS performance for the next generation of performance-hungry scientific, analytic and big data applications.
What is IME?
IME creates a new application-aware fast data tier that resides right between compute and the parallel file system to accelerate I/O, reduce latency and provide greater operational and economic efficiency.
How Does IME Help?
Changes the I/O Provisioning Paradigm & Reduces the Total Cost of Storage
IME Reduces Storage Hardware up to 70%
• Fewer systems to buy, power, manage, maintain
STORAGE BANDWIDTH UTILIZATION OF A MAJOR HPC PRODUCTION STORAGE SYSTEM
• 99% of the time < 33% of max
• 70% of the time < 5% of max
IME enables organizations to separate the provisioning of peak & sustained performance requirements with greater operational efficiency and cost savings than utilizing exclusively disk-based parallel file systems.
How Does IME Help?
Limitless Performance Scaling Removes Architectural & Economic Barriers
IME makes exascale I/O a reality, and finally enables the enterprise to run HPC jobs with much greater performance and efficiency.
IME Eliminates:
• Parallel file system locking, limitations & bottlenecks
• 70% of storage hardware, consumed floorspace
• Latency driving a 30% loss of compute resources
• 90% of checkpoint/restart downtime
ddn.com © 2014 DataDirect Networks, Inc. * Other names and brands may be claimed as the property of others.
Any statements or representations around future events are subject to change.
Why Cache Matters in HPC
Even Large HPC Sites Drive a Lot of Small I/O
Cache is critical in aligning all-too-frequent unaligned writes and capturing small writes to preserve spinning disk performance
• All DDN Storage products offer cache mirroring & battery-backed RAM cache, proven across 3 generations, to accelerate all varieties of data
Many systems today do not even offer a protected, redundant write cache.
Caching is one of the most difficult layers of a storage stack to engineer; it is also the most critical.
Where IME Provides Value
IME Accelerates Parallel Filesystems
• Absorbs all sizes of I/O at full performance, unlike Lustre* and GPFS™
Where IME Provides Value
1. MITIGATES POOR PFS PERFORMANCE caused by PFS locking, small I/O, and misaligned, fragmented I/O patterns.
IME “makes bad apps run well” and also prevents a poorly behaving app from impacting the entire supercomputer.
This is especially valuable to diverse workload environments and ISV applications.
IOR benchmarks indicate a 3x to 20x speedup on I/Os <32KB.
2. PROVIDES HIGHER PERFORMANCE I/O (bandwidth and latency) to the application.
At ISC14, we demonstrated three orders of magnitude speed-up due to this high performance tier.
3. IME DRIVES SIGNIFICANTLY MORE EFFICIENT I/O TO THE PFS by re-aligning and coalescing data within the non-volatile storage.
At ISC14, we demonstrated two orders of magnitude speed-up due to this efficiency.
[Figure: S3D Turbulent Flow Model; rate labels: 4 GB/s, 25 MB/s, 50 GB/s]
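The re-aligning and coalescing described in point 3 can be illustrated with a minimal sketch. This is not IME's implementation: the function names `coalesce` and `split_aligned`, the 8-byte block size, and the sample fragments are all hypothetical, chosen only to show how small out-of-order writes can be merged into a few block-aligned writes before they reach the parallel file system.

```python
from typing import List, Tuple

Extent = Tuple[int, bytes]  # (file offset, payload)


def coalesce(writes: List[Extent]) -> List[Extent]:
    """Merge small fragments into contiguous extents (later writes win)."""
    # Replay fragments in arrival order so later writes overwrite earlier ones.
    image = {}
    for offset, data in writes:
        for i, byte in enumerate(data):
            image[offset + i] = byte

    extents: List[Extent] = []
    run_start = None
    prev = None
    buf = bytearray()
    for pos in sorted(image):
        if prev is not None and pos == prev + 1:
            buf.append(image[pos])            # extend the current run
        else:
            if run_start is not None:
                extents.append((run_start, bytes(buf)))
            run_start = pos                   # start a new run
            buf = bytearray([image[pos]])
        prev = pos
    if run_start is not None:
        extents.append((run_start, bytes(buf)))
    return extents


def split_aligned(extents: List[Extent], block_size: int) -> List[Extent]:
    """Cut extents at block boundaries so no backend write straddles a block."""
    out: List[Extent] = []
    for start, data in extents:
        pos, end = start, start + len(data)
        while pos < end:
            block_end = (pos // block_size + 1) * block_size
            stop = min(block_end, end)
            out.append((pos, data[pos - start:stop - start]))
            pos = stop
    return out


# Four 4-byte fragments arriving out of order (hypothetical workload).
fragments = [(4, b"EFGH"), (0, b"ABCD"), (12, b"MNOP"), (8, b"IJKL")]
backend_writes = split_aligned(coalesce(fragments), block_size=8)
print(len(fragments), "fragments ->", len(backend_writes), "aligned writes")
```

In this toy case four unaligned 4-byte fragments become two full 8-byte blocks, which is the kind of pattern a disk-based file system handles far more efficiently.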
IME Lowers the Total Cost of Storage
IME+PFS delivers better price/performance over PFS alone
IME Value Proposition
More bandwidth to the cluster (faster job turn-around, more jobs in same period, fewer nodes needed to complete same amount of work)
Cluster Memory = 400TB
NVM Capacity = 2.75X Cluster Memory
Qty: 12 IME Appliances (each w/ Qty: 48 1.9TB NVMe SSDs)

Components            SFA Only                       IME + SFA                       Advantages
Cluster I/O BW        540 GB/s                       756 GB/s                        ✔ 216 GB/s more BW delivered
Storage Fabric BW     540 GB/s                       270 GB/s                        ✔ 50% less BW needed to PFS
Qty: OSS              112                            56                              ✔ 50% fewer OSS to buy
Qty: SFA Appliances   14                             7                               ✔ 50% fewer SFA appliances needed
Qty: HDDs per SFA     400 (80 HDD * 5 enclosures)    800 (80 HDD * 10 enclosures)    ✔ 2x HDD density per SFA appliance
Qty: HDDs             5,600 (14 SFA * 400 HDD)       5,600 (7 SFA * 800 HDD)         ✔ Delivering the same capacity
Fewer OSS and SFAs
Reduced power, space and operational cost
Similar persistent capacity
Lower overall capital cost
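The figures in the comparison can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated on the slide; the variable names are illustrative. Note that the raw NVM capacity works out to roughly 2.74 times the cluster memory, which the slide rounds to 2.75X.

```python
# Inputs taken from the slide.
cluster_memory_tb = 400
ime_appliances = 12
ssds_per_appliance = 48
ssd_capacity_tb = 1.9

# Raw NVM capacity across all IME appliances, and its ratio to cluster memory.
nvm_capacity_tb = ime_appliances * ssds_per_appliance * ssd_capacity_tb
nvm_to_memory_ratio = nvm_capacity_tb / cluster_memory_tb

# Bandwidth rows (GB/s).
extra_cluster_bw = 756 - 540             # more bandwidth delivered to the cluster
fabric_bw_reduction = 1 - 270 / 540      # fraction of fabric bandwidth no longer needed

# Hardware rows.
oss_reduction = 1 - 56 / 112
sfa_reduction = 1 - 7 / 14
hdd_density_factor = 800 / 400           # HDDs per SFA appliance, IME + SFA vs. SFA only
hdds_sfa_only = 14 * 400
hdds_ime_sfa = 7 * 800

print(f"NVM capacity: {nvm_capacity_tb:.1f} TB ({nvm_to_memory_ratio:.2f}x cluster memory)")
print(f"Extra cluster bandwidth: {extra_cluster_bw} GB/s")
print(f"Fabric bandwidth reduction: {fabric_bw_reduction:.0%}")
print(f"HDD totals: {hdds_sfa_only} vs. {hdds_ime_sfa}")
```

The HDD totals match in both configurations, so the capacity is unchanged while the number of OSS nodes and SFA appliances is halved.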
HPC Ecosystem – Client IO Interfaces
[Diagram: client IO stack]
Application IO Implementation
High-level IO Libraries (optional)
MPI-IO
POSIX IO
Native IO
File System IO Interface (VFS, User Space Library)
Forwarded + Exported IO (optional)
Data path for HL IO library built on POSIX
High-Level IO Libraries
► Provides an application and end-user oriented IO interface
• Files / directories abstracted from users in favor of data sets / objects / containers / variables
• Object operations (put, get) instead of byte streams (read, write)
• Portable, self-describing data sets
► Example High-Level IO Libraries
• HDF5 (http://www.hdfgroup.org/HDF5/)
• netCDF (http://www.unidata.ucar.edu/software/netcdf/)
• PnetCDF (http://cucis.ece.northwestern.edu/projects/PnetCDF/)
• ADIOS (https://www.olcf.ornl.gov/center-projects/adios/)
► Implementations leverage lower-level IO interfaces
• POSIX
• MPI-IO
MPI-IO
► Provides a high-performance parallel IO interface and semantics
• Applies successful MPI capabilities to file IO
• Bulk data capabilities (MPI_File_write_at_all)
• Metadata capabilities (e.g. “scalable file open()”)
► Most popular implementation is Argonne National Laboratory’s ROMIO
• Distributed in MPICH
• Available in MPICH derivatives (MVAPICH, IBM MPI, Intel MPI, Cray MPI, and others)
► Key Features:
• Independent IO: Uncoordinated parallel IO from many concurrent readers and writers
• Collective IO: Coordinated IO from many readers and writers. Two popular implementations:
  o Data Sieving – Selective filtering of data (reduces IOPS)
  o Two-phase IO – Intermediate processes collect and serve data to other processes (reduces number of readers-writers touching PFS)
• MPI Derived Data Type support: Allows the MPI runtime to load non-contiguous data in files directly into application data structures in RAM
  o Used heavily by higher-level IO libraries (e.g. PnetCDF and HDF5)
• Specialization for storage system targets (ROMIO ADIO drivers)
  o IME provides an ADIO driver that translates MPI-IO requests into IME requests
  o ROMIO provides drivers for Lustre, GPFS, PanFS, …
► Further Reading
• Chapter 13 of the MPI-3.0 Standard
• http://www.mpi-forum.org/docs/mpi-3.0/mpi30-report.pdf
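The two-phase IO idea listed under Key Features can be sketched without an MPI runtime. The following is a plain-Python simulation, not ROMIO code: the function `two_phase_write`, the rank layout, and the aggregator count are hypothetical, and unwritten bytes are assumed to be zero. It only illustrates how routing many small interleaved requests through a few aggregator processes reduces the number of writers touching the file system.

```python
from typing import Dict, List, Tuple

Request = Tuple[int, bytes]  # (file offset, payload)


def two_phase_write(requests_by_rank: Dict[int, List[Request]],
                    num_aggregators: int,
                    file_size: int) -> Tuple[bytes, int, int]:
    """Return (file image, direct write count, aggregated write count)."""
    # Each aggregator owns one contiguous "file domain" (ceiling division).
    domain = -(-file_size // num_aggregators)
    staged = [bytearray(domain) for _ in range(num_aggregators)]
    touched = [False] * num_aggregators

    # Phase 1 (exchange): every rank ships each byte to the owning aggregator.
    direct_writes = 0
    for requests in requests_by_rank.values():
        for offset, payload in requests:
            direct_writes += 1  # what the PFS would see without aggregation
            for i, byte in enumerate(payload):
                pos = offset + i
                staged[pos // domain][pos % domain] = byte
                touched[pos // domain] = True

    # Phase 2 (IO): each aggregator issues one large contiguous write.
    file_image = bytearray(file_size)
    aggregated_writes = 0
    for agg in range(num_aggregators):
        if touched[agg]:
            start = agg * domain
            end = min(start + domain, file_size)
            file_image[start:end] = staged[agg][:end - start]
            aggregated_writes += 1
    return bytes(file_image), direct_writes, aggregated_writes


# Four ranks each write four 2-byte pieces, interleaved across the file.
ranks = {
    r: [((k * 4 + r) * 2, bytes([65 + r]) * 2) for k in range(4)]
    for r in range(4)
}
image, direct, aggregated = two_phase_write(ranks, num_aggregators=2, file_size=32)
print(direct, "small writes become", aggregated, "large writes")
```

In a real MPI program the same effect is requested through collective calls such as MPI_File_write_at_all, with the exchange step carried out by the MPI library.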
POSIX IO
► Provides a portable byte-stream IO interface
• read(), write(), open(), close(), …
► POSIX IO Pros
• Portable
• Inertia
► POSIX IO Cons
• Some design assumptions no longer true for modern computers (concurrency and parallelism)
• Lots of state at runtime (file descriptors)
► Further Reading
• POSIX standard (POSIX.1-2001)
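The byte-stream interface and its runtime state can be seen in a short sketch that uses Python's `os` module, which wraps the POSIX calls named above. The helper name `posix_roundtrip` and the temporary file name are illustrative assumptions.

```python
import os
import tempfile


def posix_roundtrip(payload: bytes) -> bytes:
    """Write payload through a file descriptor, then read it back."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "demo.dat")
        # open() returns a file descriptor: per-process state that also
        # carries an implicit current offset.
        fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
        try:
            written = os.write(fd, payload)   # advances the implicit offset
            os.lseek(fd, 0, os.SEEK_SET)      # rewind before reading
            data = os.read(fd, written)
        finally:
            os.close(fd)
    return data


print(posix_roundtrip(b"checkpoint bytes"))
```

The implicit offset attached to each descriptor is one example of the runtime state listed among the cons: concurrent writers either coordinate around it or switch to explicit-offset calls such as pread() and pwrite().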
DDN IME Ecosystem – Client IO Interfaces
[Diagram: client IO stack with IME]
Application IO Implementation
High-level IO Libraries (optional)
MPI-IO (IME)
POSIX (IME FUSE)
IME Native Client Library
MPI-IO (POSIX)
Data path for HL IO library built on POSIX
DDN IME Ecosystem – Client IO Interfaces
► Three primary interfaces for IME
• IME FUSE
  o Provides POSIX IO
  o Captures IO requests through the Linux VFS
  o Target Use Case: General purpose applications that use POSIX
• IME ROMIO
  o Provides MPI-IO support
  o Captures IO requests through the MPI runtime in user space
  o Target Use Case: Parallel applications
• IME Native Library
  o Low-level programming interface
  o FUSE and ROMIO layers implemented on this interface
  o Target Use Case: Highly-optimized customer applications that may not map cleanly onto POSIX or MPI-IO
IME™ Internal Architecture Overview
Aggregate IME Adaptive vs. Non-Adaptive WRITE Performance
[Figure: aggregate write performance for three cases: ideal, healthy system; one degraded IME server, adaptive; one degraded IME server, non-adaptive. Annotation: “Amdahl’s Law in action!”]
Real-Time IME Adaptive vs. Non-adaptive WRITE Performance
[Figure: real-time write performance. Annotations: “Adaptive heuristic learns ‘quickly’”; “4x Performance Lost with Non-adaptive”]
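Why a single degraded server hurts a non-adaptive layout so much can be shown with an idealized model. This is a sketch under stated assumptions, not IME's heuristic: the function `aggregate_bandwidth` and the per-server rates are hypothetical, each server is handed one unit of data in the non-adaptive case, and the adaptive case is assumed to already know each server's rate and split the data proportionally.

```python
from typing import Sequence


def aggregate_bandwidth(server_rates: Sequence[float], adaptive: bool) -> float:
    """Aggregate write bandwidth for one unit of data per server.

    server_rates are hypothetical per-server bandwidths in arbitrary units.
    """
    total_data = float(len(server_rates))
    if adaptive:
        # Data is split in proportion to each server's rate, so all servers
        # finish at the same moment.
        elapsed = total_data / sum(server_rates)
    else:
        # Every server gets an equal share, so the job finishes only when
        # the slowest server does.
        elapsed = max(1.0 / rate for rate in server_rates)
    return total_data / elapsed


healthy = [1.0, 1.0, 1.0, 1.0]
degraded = [1.0, 1.0, 1.0, 0.25]   # one server at a quarter of its rate

print("healthy:               ", aggregate_bandwidth(healthy, adaptive=False))
print("degraded, non-adaptive:", aggregate_bandwidth(degraded, adaptive=False))
print("degraded, adaptive:    ", aggregate_bandwidth(degraded, adaptive=True))
```

With these made-up rates, the equal-share layout is limited by its slowest member, while the proportional layout loses only the bandwidth the degraded server can no longer provide.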
Use of Log Structuring in IME
What does this give us? Near ‘line rate’ performance regardless of output pattern.
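A minimal sketch of log structuring shows why the output pattern stops mattering. This is an illustration of the general technique, not IME's on-device format: the class `LogStructuredStore` and its fields are hypothetical, and the index is simplified to exact-offset lookups of whole extents.

```python
from typing import Dict, List, Tuple


class LogStructuredStore:
    """Toy log-structured store: every write is appended to the log tail."""

    def __init__(self) -> None:
        self.log = bytearray()                        # append-only device image
        self.index: Dict[int, Tuple[int, int]] = {}   # file offset -> (log position, length)
        self.device_positions: List[int] = []         # where each write landed

    def write(self, offset: int, data: bytes) -> None:
        position = len(self.log)
        self.log += data                              # always sequential on the device
        self.index[offset] = (position, len(data))    # remember where the extent lives
        self.device_positions.append(position)

    def read(self, offset: int) -> bytes:
        position, length = self.index[offset]
        return bytes(self.log[position:position + length])


store = LogStructuredStore()
# The application writes at scattered file offsets...
for offset, data in [(4096, b"AAAA"), (0, b"BBBB"), (8192, b"CCCC")]:
    store.write(offset, data)

# ...but the device only ever sees sequential appends.
print(store.device_positions)
print(store.read(0))
```

Random, strided, or sequential application offsets all turn into the same sequential append stream, with the index absorbing the difference.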