Tiered Storage for Big Data and Supercomputing - ASIS&T

COMPUTE | STORE | ANALYZE Tiered Storage for Big Data and Supercomputing ASIS Webinar July 31, 2014

Transcript of Tiered Storage for Big Data and Supercomputing - ASIS&T

Page 1

Tiered Storage for Big Data and Supercomputing

ASIS Webinar

July 31, 2014

Page 2

Safe Harbor Statement

This presentation may contain forward-looking statements that are based on our current expectations. Forward-looking statements may include statements about our financial guidance and expected operating results, our opportunities and future potential, our product development and new product introduction plans, our ability to expand and penetrate our addressable markets and other statements that are not historical facts. These statements are only predictions and actual results may materially vary from those projected. Please refer to Cray's documents filed with the SEC from time to time concerning factors that could affect the Company and these forward-looking statements.

Cray Workflow-Driven Storage Solutions

Page 3

Agenda

● Cray Storage & Data Management Business Overview

● Trends and Challenges
  ● Big data and what it means for storage
  ● Planning and managing HSMs

● Product overview
  ● Cray Sonexion – for high-performance storage
  ● Cray Tiered Adaptive Storage (TAS)

● Case Studies

● Summary

Page 4

Cray Builds Computational and Storage Tools That Help Change The World

Compute: Supercomputers, Flexible Clusters, Hybrid Architectures

Store: Tiered Storage & Data Management Systems and Solutions

Analyze: Graph Analytics, Hadoop Solutions

Merging Big Data and Supercomputing

Supercomputing · Big Data

Page 5

Cray Storage and Data Management

● Cray is a public company with steady growth
  ● Overall revenue for Cray in 2013: $525M
  ● Storage and Data Management (SDM): $63M, with 27% annual growth

● Track record of delivering results
  ● 120+ petabytes of deployed storage
  ● 150+ Cray-supported Lustre deployments
  ● World's leading parallel systems team
  ● World's fastest production file system
  ● Exascale leader in storage performance

● Growing storage portfolio
  ● Storage systems
    ● Cray Sonexion
    ● Cray Tiered Adaptive Storage (TAS)
  ● Expert systems architectures
    ● High-performance storage
    ● Tiered storage
    ● Archive solutions
  ● Services
    ● 24/7 global support organization

Expertise & Best Practices · System Architectures · Storage Systems · Cray Solutions

Page 6

Data-Driven Markets Need Scalable Storage

Supercomputing · Analytics · Cluster Computing

Earth Sciences: climate change & weather prediction, remote sensing

Manufacturing: aircraft design, crash simulation & fluid dynamics

Life Sciences: drug discovery, genomic research, complex modeling

Higher Education: university-driven science, new energy sources & efficient combustion

Energy: seismic imaging & reservoir simulation

Defense & National Security: warfighter support, threat prediction & stockpile stewardship

Cray Storage and Data Solutions

Page 7

Cray Storage and Data Management Solutions

• Proven experts in parallel systems, disk storage and HSM

• 150 Lustre deployments

• 120 petabytes primary storage shipped/installed

• Exascale leadership in storage performance and archiving

• Scale-as-you-go performance from GB/s to 1 TB/s per file system

• Fluid capacity scalability from terabytes to exascale-capable archives

• Quality assurance and stress testing for the largest production environments

• Simplify and reduce time to deployment

• Fastest in-production Lustre file system

• Reduced time to results by 24x at NCSA

• Reduce storage footprint by 50% for petascale systems

Massively Scalable Storage Solutions for Big Data & Supercomputing

Your Trusted Expert · Scale Optimally · Results Faster

Experts in workflow-driven storage, optimized to scale and deliver results

Page 8

What Our Storage Customers are Saying

We immediately saw success from the perspective of stability and performance. Our bandwidth numbers were higher than the previous vendor’s, using the exact same hardware. We went from the file system being our biggest issue to the least of our issues, with Cray.

Jim Lujan, HPC Project Leader, LANL

“Some of the science teams have been able to do 3 years worth of work in 3 months.”

Michelle Butler, Head of Storage & Networking, NCSA Blue Waters project

Cray was chosen at Pawsey because Cray is the most credible and reliable partner and best understood the requirements. Knowing we have Cray onsite is very important. If Cray can’t do it, nobody can.

Dr. George Beckett, Deputy Director & Head of Supercomputing Team, Pawsey Center

Page 9

Storage Trends in Data-Driven Workflows

Page 10

Meta Issues for Data and Science

Science and research moving faster than IT

Storage and data flow are (usually) the bottleneck

Page 11

Other Trends – More Data to Retain, Indefinitely

Cray, Inc. - Tiered Adaptive Storage

● Massive data growth
  ● More to keep
  ● Outpacing IT’s ability to store, protect, and manage data

● Storage too expensive for one monolithic “tier”
  ● Match storage tier to the value of data, based on policies/rules
  ● Each tier has to be price competitive

● Data should be continuously accessible, in many cases

● Simplicity rules
  ● Users: keep them working (NFS, CIFS, FTP, Web)
  ● Storage ops: simplify managing the storage and data

● Data management poses the biggest challenges
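The cost argument on this slide (one monolithic tier is too expensive, so match each tier to the value of the data) reduces to a capacity-weighted average. The Python sketch below is illustrative only: the tier names, per-TB prices and capacity fractions are hypothetical placeholders, not figures from this webinar.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TierCost:
    name: str
    cost_per_tb: float  # hypothetical price, arbitrary currency units
    fraction: float     # share of total capacity placed on this tier


def blended_cost_per_tb(tiers: list[TierCost]) -> float:
    """Capacity-weighted average cost per TB across all tiers."""
    total = sum(t.fraction for t in tiers)
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"tier fractions must sum to 1, got {total}")
    return sum(t.cost_per_tb * t.fraction for t in tiers)


# Hypothetical inputs: everything on one fast tier vs. a three-tier mix.
monolithic = [TierCost("fast", 1000.0, 1.0)]
tiered = [
    TierCost("fast", 1000.0, 0.10),
    TierCost("primary", 400.0, 0.30),
    TierCost("archive", 50.0, 0.60),
]

if __name__ == "__main__":
    print(blended_cost_per_tb(monolithic))  # 1000.0
    print(blended_cost_per_tb(tiered))      # 250.0
```

With these made-up inputs, placing 60% of capacity on a cheap archive tier brings the blended cost to a quarter of the all-fast configuration. Real savings depend on actual prices and on how much of the data can tolerate slower access.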

Page 12

“Big Data” Defined – Cray’s View

● Data Types
  ● Structured & Transactional
    ● Databases and Analysis
  ● Unstructured (our primary focus)
    ● Persistent file data
    ● Larger files and data sets

● Volume, Variety, Velocity
  ● Volume
    ● Big data sets and file systems
  ● Variety
    ● Mainly unstructured
    ● Mix of file types and sizes
  ● Velocity
    ● Data is “fast” early in the lifecycle
    ● Data I/O: from small random I/O to “big I/O”
    ● Data movement: file movement, copying, staging, archiving
    ● Data continues to move, and needs to be continuously accessible throughout its lifecycle, too

Massive Volumes of Data · Velocity – from fast to slow · Variety of File Types

Page 13

The Innovation Explosion in Life Sciences

● Data is generated faster than IT can manage it
  ● Lab science changes every month
  ● IT refreshes occur over 2–10 years

● Data outputs doubling yearly
  ● Genomics, Microscopy, Radio Astronomy, Physics

● More data must be kept
  ● Deleting results is not an option for many commercial companies or research organizations

Today’s Bio-IT professionals have to design, deploy and support IT infrastructures with life cycles measured over several years in the face of an innovation explosion where major laboratory and research enhancements arrive on the scene every few months.

-- the BioTeam

How do enterprise and HPC infrastructure teams keep up?

Page 14

The Data Lifecycle – for Performance

[Figure: data lifecycle from Day 1 through 30, 60, 90, 180 and 360 days to 2 years, plotting Capacity (PB) and Throughput (MB/sec). Callouts: Throughput and I/O · Parallel Access · Performance Scaling]

Page 15

The Data Lifecycle – for Capacity

[Figure: data lifecycle from Day 1 through 30, 60, 90, 180 and 360 days to 2 years, plotting Capacity (PB) and Throughput (MB/sec). Callouts: Maximum Efficiency · Infrequent Access · Capacity Scalability]

Page 16

The Tiers in Tiering—Form Follows Function


● HSM and the Tiers in Storage
  ● Tier 0: Fast (SSD and disk)
  ● Tier 1: Primary (SSD and disk)
  ● Tier 2: Nearline (disk or tape)
  ● Tier 3: Offline or offsite (disk or tape)

Fast: performance-optimized for I/O and throughput

Primary: where data lives most of the time; the performance-to-capacity mix depends on the use case

Nearline: capacity-optimized archives (disk or tape)

Deep Archives: long-term, capacity- and cost-optimized (usually tape)

Data Migration
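A tiering policy like the one pictured can be expressed as a rule that maps how recently a file was accessed to a target tier. In this minimal Python sketch the tier numbers follow the slide (Tier 0 Fast through Tier 3 Offline/offsite), but the day thresholds, file paths and the `target_tier` / `migration_plan` helpers are hypothetical; production HSMs such as TAS use their own, site-configurable policy engines.

```python
from __future__ import annotations

from enum import IntEnum


class Tier(IntEnum):
    """Tier numbering from the slide: lower number = faster, costlier media."""
    FAST = 0      # SSD and disk
    PRIMARY = 1   # SSD and disk
    NEARLINE = 2  # disk or tape
    OFFLINE = 3   # offline or offsite, disk or tape


# Hypothetical policy: demote a file once it has gone this many days
# without access. Checked from the slowest tier upward.
DEMOTE_AFTER_DAYS = [
    (Tier.OFFLINE, 365),
    (Tier.NEARLINE, 90),
    (Tier.PRIMARY, 7),
]


def target_tier(days_since_access: float) -> Tier:
    """Tier a file should live on, given days since its last access."""
    if days_since_access < 0:
        raise ValueError("access age cannot be negative")
    for tier, threshold in DEMOTE_AFTER_DAYS:
        if days_since_access >= threshold:
            return tier
    return Tier.FAST


def migration_plan(
    ages: dict[str, float], current: dict[str, Tier]
) -> dict[str, tuple[Tier, Tier]]:
    """Map each file whose placement should change to (from_tier, to_tier).

    Files missing from `current` are assumed to be on Tier.FAST.
    """
    plan: dict[str, tuple[Tier, Tier]] = {}
    for path, age in ages.items():
        want = target_tier(age)
        have = current.get(path, Tier.FAST)
        if want != have:
            plan[path] = (have, want)
    return plan


if __name__ == "__main__":
    ages = {"/fs1/new.h5": 1, "/fs1/old.h5": 120, "/fs1/ancient.h5": 800}
    current = {"/fs1/new.h5": Tier.FAST, "/fs1/old.h5": Tier.PRIMARY}
    for path, (src, dst) in migration_plan(ages, current).items():
        print(f"{path}: {src.name} -> {dst.name}")
```

Because the plan compares desired and current tiers, the same rule also promotes recently accessed data back toward Tier 0. A real policy engine would typically add hysteresis so that files do not bounce between tiers.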

Page 17

Storage Challenges

Page 18

Data & Storage Management Challenges

● Data Sprawl
  ● Multiple file systems to manage
  ● Data not easily managed across file systems
  ● Data not easily found

● Storage Cost & Complexity
  ● Linux – devices, drivers, etc.
  ● Massive complexity
  ● Configuration and testing
  ● Data wasting space, or deleted

● Data Protection & Availability
  ● Best case: data found or recovered in days
  ● Worst case: data lost or corrupted

[Diagram labels: App 1, App 3, App 3 · Fast Storage · Primary · Archive · FS 1, FS 2, FS 3, FS 3 · ./foo · Libraries · Backup SW]

Where’s My Data?

Page 19

Build Your Own HSM: Complex Undertaking

● Server OS and hardware validation
  ● Server, networking, storage devices (HBAs, NICs)
  ● Configuration management

● HSM software and hardware validation
  ● Stacks often proprietary
  ● Tied to local OS and file systems (DMAPI)
  ● Tied to hardware (servers and storage)
  ● Require expertise to configure

● Networking expertise
  ● Fibre Channel, iSCSI, InfiniBand
  ● NFS, CIFS, FTP, etc.

● Disk storage (SAN) expertise
  ● Massive complexity at scale
    ● LUNs, masking, zoning, multi-pathing
    ● Requires platform-specific expertise

● Tape storage & library integration
  ● Requires library support and testing
  ● API integration
  ● Data management and protection require HSM

Data Checks In—But Never Checks Out

Page 20

Traditional HSM Architecture – Complex

[Diagram: Lustre file systems fs1, fs2 and fs3 on an IB fabric (QDR, FDR); six Lustre data movers (DM) and six HSM nodes connected over FC and Ethernet to a disk cache and six sets of archive media. Labels: Lustre Movers · HSM Movers · Data Ingest]

Page 21

Cray’s Goals for Workflow-Driven Tiered Storage

● Pay-as-you-grow performance and capacity scalability
  ● Performance scaling for fast tiers: high-performance storage
  ● Capacity expansion for primary and archive tiers: tiered storage/archiving

● Simplify managing everything
  ● Ease of deployment, configuration and operations
  ● Upgradeable and sustainable infrastructure over the lifespan of the data
  ● In-place data migration (no forklift upgrades)

● Build on open, portable and sustainable architectures
  ● Open data formats for long-term storage
  ● Open source operating systems and tools
  ● Flexible storage choices to fit requirements: flash, disk and tape

● Data protection and accessibility at scale
  ● Data must be available within reasonable timeframes

● Quality and dependability for large-scale deployments
  ● Solutions that work as advertised
  ● Single point of support for the entire solution

Page 22

Abstracting Data Access - Across Workflows

[Diagram: Process · Store · Archive across Tier 1, Tier 2, Tier 3 and Tier 4 (Fast · Persistent · Efficient), linked by Data Movement over a High Speed Interconnect; workflow stages: Ingest / Creation · Access · Distribution; HPC Systems, Workflows and Applications]

Page 23

Cray Storage Products

Scale optimally – small to large systems
  • Gigabytes per second to terabytes per second of performance
  • Terabytes to exabytes of capacity

Scalable building blocks
  • Best-of-breed storage technologies
  • Open systems and software

Page 24

Cray Sonexion Storage System

● Purpose-built system for Lustre at scale
  ● Key building block for Cray system architectures
  ● Enables data protection and availability at scale

● Simplifies deployment and management
  ● Builds on a pre-configured, integrated design
  ● Reduces storage footprint by 50% for petascale systems

● Optimized for scale
  ● Scales bandwidth from 5 GB/s to 1 TB/s per file system
  ● Optimal performance-to-drive ratio

● Proven by Cray
  ● Cray delivering sustained real-world performance at scale for supercomputing and big data

Scale-out Lustre System for Big Data and Supercomputing

Page 25

Cray Tiered Adaptive Storage (TAS)

Page 26

Cray Tiered Adaptive Storage (TAS)

● Preserve data indefinitely
  ● Optimized for scale
  ● Data fully protected
  ● Upgrade with technology

● Simplified management and implementation
  ● Up to 5 tiers with Lustre, with flexible media choices
  ● Non-disruptively upgrade storage and media
  ● Familiar SAM-QFS tools and commands
  ● Minimal impact to users or applications

● Access data forever
  ● Data protection and accessibility at scale
  ● Expert design for maximum scalability
  ● Single point of support by Cray

Open archive and deployment-ready HSM for Big Data and Supercomputing

Page 27

Our partnership with Versity: Using the Versity Storage Manager (VSM)

Cray Tiered Adaptive Storage

● Based on open source SAM-QFS
  ● Versity has a strong heritage in data management solutions

● Familiar tools: works like SAM
  ● Policy-driven, providing constant management of user data
  ● Flexible options to classify data
  ● Integrated migration capability from third-party solutions
  ● Tightly integrated into the file system

● Shared file system
  ● POSIX SAN shared file system
  ● Support for hundreds of native file system clients
  ● Native interface to storage tiers
  ● Integrated volume management with performance configurations

● Integrated and configured by Cray
  ● Pre-validated systems
  ● Cray support for all hardware and software

Page 28

Cray TAS – Simplifying HSM

[Diagram: Lustre file systems fs1, fs2 and fs3 on an IB fabric (QDR, FDR), connected over FC and Ethernet to a Shared Virtualized Storage Pool. Label: Data Movement and Transparent User Access]

Page 29

Cray Tiered Adaptive Storage Architecture

● Virtualizes storage
  ● Single interface to multiple tiers
  ● File systems appear infinitely large
  ● No user interaction required

● Protects data at scale
  ● Multiple copies of files
  ● Disaster recovery capabilities

● Flexible storage tiers
  ● Scale the correct tiers to your needs
  ● Support for both disk and tape

● Transparent for users and apps
  ● Maintain ease of use for your customers

● Extensible to Lustre file system
  ● Lustre file system integration
  ● Maintain transparency throughout

[Diagram: Users and Applications reach a File System, extended to a Lustre File System, through Transparent Data Access; a Policy Engine drives Policy-based Data Movement across Tier 1 to Tier 4]

Users and applications always have access to data
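The transparent-access idea can be illustrated with a toy model: a file is archived to one or more lower tiers, its primary-tier space is then released, and a later read stages it back without the caller doing anything special. Everything in this Python sketch (`ManagedFile`, its methods, the integer tier numbers, the example path) is hypothetical and does not reflect the actual VSM, SAM-QFS or Lustre interfaces.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ManagedFile:
    """Hypothetical model of one file under HSM control."""
    path: str
    resident: Optional[bytes] = None  # data on the primary (disk) tier, if present
    archive: Dict[int, bytes] = field(default_factory=dict)  # tier number -> copy

    def archive_to(self, tier: int) -> None:
        """Copy the resident data to an archive tier; the data stays resident."""
        if self.resident is None:
            raise RuntimeError(f"{self.path}: nothing resident to archive")
        self.archive[tier] = self.resident

    def release(self) -> None:
        """Free primary-tier space, but only if an archive copy exists."""
        if not self.archive:
            raise RuntimeError(f"{self.path}: no archive copy, refusing to release")
        self.resident = None

    def read(self) -> bytes:
        """Return the data, staging it back from an archive tier if it was released."""
        if self.resident is None:
            if not self.archive:
                raise RuntimeError(f"{self.path}: no copy available")
            fastest = min(self.archive)  # assume lower tier number = faster media
            self.resident = self.archive[fastest]  # simulated recall
        return self.resident


if __name__ == "__main__":
    f = ManagedFile("/fs1/run42/output.dat", resident=b"results")
    f.archive_to(2)  # e.g. a nearline copy
    f.archive_to(3)  # e.g. an offsite copy
    f.release()      # primary-tier space freed
    print(f.read())  # recalled transparently from tier 2
```

Keeping several entries in `archive` mirrors the slide's "multiple copies of files": losing one tier still leaves a copy to recall from.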

Page 30

Cray TAS Connector for Lustre

● Data management and protection for Lustre file systems
  ● Transparently manage data within Lustre
  ● Up to 5 storage tiers
  ● Option offered with the Cray Tiered Adaptive Storage (TAS) solution

● Key challenges
  ● Efficiently utilizing Lustre storage
  ● Data protection and preservation
  ● Sustaining exascale-capable archives

● Key benefits of Cray TAS Connector for Lustre
  ● TCO reduction
    ● Increase utilization of high-performance Lustre storage
  ● Flexible data protection
    ● Up to 5 copies per file, protected over multiple tiers
    ● On- and off-site copy options for disaster recovery
  ● Open systems and formats
    ● Supports Lustre 2.5, building on the Lustre HSM API
    ● Cray TAS relies on open source systems and tools and Versity open-format archiving
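The Lustre HSM API tracks per-file HSM state, which is what makes transparent release and restore safe. The Python sketch below is modeled loosely on the states that `lfs hsm_state` reports (exists, archived, released, dirty, no-release). The bit values and helper functions are illustrative assumptions, not copied from Lustre headers. It shows the core rule: a file may only be released when an up-to-date archive copy exists.

```python
from enum import IntFlag


class HsmState(IntFlag):
    """Illustrative HSM state flags (values chosen for this sketch only)."""
    NONE = 0
    EXISTS = 1       # some copy exists in the archive
    DIRTY = 2        # file changed since its last archive
    RELEASED = 4     # data removed from Lustre; only the archive copy remains
    ARCHIVED = 8     # a complete archive copy has been made
    NORELEASE = 16   # administrator pinned the file on Lustre


def can_release(state: HsmState) -> bool:
    """True only if freeing Lustre space would not lose any data."""
    return (
        bool(state & HsmState.ARCHIVED)
        and not (state & HsmState.DIRTY)
        and not (state & HsmState.RELEASED)
        and not (state & HsmState.NORELEASE)
    )


def after_write(state: HsmState) -> HsmState:
    """Writing to an archived file makes its archive copy stale."""
    if state & HsmState.ARCHIVED:
        state |= HsmState.DIRTY
    return state


def after_archive(state: HsmState) -> HsmState:
    """A completed archive of the current contents clears DIRTY."""
    return (state | HsmState.EXISTS | HsmState.ARCHIVED) & ~HsmState.DIRTY


if __name__ == "__main__":
    s = after_archive(HsmState.NONE)
    print(can_release(s))  # True
    s = after_write(s)
    print(can_release(s))  # False: must re-archive first
    s = after_archive(s)
    print(can_release(s))  # True
```

With up to five copies per file, a real connector would also need to track which copies are stale after a write; this sketch collapses that into the single DIRTY flag.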

Page 31

Benefits to Existing SAM-QFS Customers

● Reduce TCO by 30% compared with SAM-QFS
  ● 20–40% over 3 years

● Have confidence preserving data forever
  ● Cray commitment to SAM-QFS capabilities on open systems
  ● Cray driving long-term roadmap for Linux (vs. Solaris)

● Simplify managing data using familiar tools
  ● Similar toolset to SAM-QFS
  ● Geared toward HPC and big data environments on Linux

● Easy migration path from SAM-QFS to TAS
  ● In-place conversion from SAM to TAS
  ● Performed by Cray

Cray TAS offers a deployment-ready, open archiving and tiered storage solution that reduces TCO by 30% over 3 years compared to SAM-QFS, instills confidence in the future, and simplifies managing data using familiar tools.

Page 32

Case Studies

Page 33

Cray Customers

Page 34

HLRN

Challenges

• Migrate data onto new tape libraries, and off SAM-QFS

• Use Linux and its ecosystem of administration tools

• Maintain familiar management environment

• Single point of support

Solution

• Over 5 PB of Cray Tiered Adaptive Storage, an open archive system powered by Versity

• In-place migration & conversion from SAM-QFS to Versity Storage Manager, using TAS

Results

• Seamless migration from original SAM-QFS environment to Cray TAS

• Seamless migration from original tape libraries to new tape libraries

• Simplified management, standardized on Linux

• Single point of support by Cray

We wanted a uniform hardware and software landscape, utilizing Linux. From a management perspective, Cray TAS is superior to maintaining a proprietary environment.

Dr. Steffen Schulze-Kremer, head of HPC at RRZN

Cray TAS open archiving system for big data deployed at North German Supercomputing Center for long-term data preservation

Page 35

University of Chicago – Life Sciences Research

Parallel Whole Genome Analysis (WGA) workflow, MegaSeq

Challenges

• Time constraints and efficiency of whole genome analysis using conventional clusters

• Scaling storage and compute to support the MegaSeq WGA workflow

Solution

• Beagle, a Cray XE6 supercomputer, and its associated storage

• Storage solution included:

• Direct-attached Lustre (DAL)

• 600 TB of usable capacity

Results

• Reduced analysis time for 240 genomes from ~37 years of theoretical CPU time to 50.4 hours of wall-clock time, using the MegaSeq workflow on the Cray system, Beagle
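As a rough sanity check on the Beagle result, the ratio of theoretical CPU time to wall-clock time gives the average number of cores that must have been busy at once. This Python sketch assumes that "CPU time" means single-core serial time and uses a 365.25-day year; the slide states neither, and "~37 years" is itself approximate, so treat the output as an order-of-magnitude figure.

```python
HOURS_PER_YEAR = 365.25 * 24  # assumption: the slide does not define a "year"


def implied_parallelism(cpu_years: float, wall_hours: float) -> float:
    """Average concurrent cores needed to fit cpu_years of serial work into wall_hours."""
    if wall_hours <= 0:
        raise ValueError("wall-clock time must be positive")
    return cpu_years * HOURS_PER_YEAR / wall_hours


if __name__ == "__main__":
    cores = implied_parallelism(cpu_years=37, wall_hours=50.4)
    per_genome = 37 * HOURS_PER_YEAR / 240  # serial CPU-hours per genome
    print(f"~{cores:,.0f} cores busy on average")
    print(f"~{per_genome:,.0f} CPU-hours per genome")
```

Under these assumptions the result works out to roughly 6,400 cores kept busy for the whole run, which is why the storage behind the workflow has to sustain that many concurrent readers and writers.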

Study published on: https://beagle.ci.uchicago.edu/science-at-beagle/

Page 36

Challenges

• Reducing time to results for production research while maintaining a consistent user experience

• Meeting resiliency, performance and other stringent testing requirements

• Optimizing deployed storage assets and supporting multi-vendor products

Solution

• Single point of solution architecture and support for the entire solution: Cray

• Lustre File System by Cray

• Data Virtualization Services for scaling NFS

• NetApp E-Series storage

• Lustre Client for Cray XE6

Results

• Increased file system stability, reliability, and performance using same storage

• Users like consistency in job performance

Alliance for Computing at Extreme Scale (Cielo): large capability-class system supporting diverse science and weapons physics, simulations and modeling, such as asteroid impact scenarios

We immediately saw success from the perspective of stability and performance. Our bandwidth numbers were higher than the previous vendor’s, using the exact same hardware. We went from the file system being our biggest issue to the least of our issues, with Cray.

Jim Lujan, HPC Project Leader, LANL

Cielo web site: http://www.lanl.gov/orgs/hpc/cielo/

Page 37

Large Integrated Oil & Gas Company

Total capacity: 10+ PB · Total I/O performance: 180 GB/s · Third-party cluster nodes: 1,400+

[Diagram: Cray Sonexion Scale-out Lustre System serving Lustre file systems (fs1, fs2 and Test/Dev) over InfiniBand to a production (seismic) Cray XE6, a test/dev Cray XE6 and a third-party cluster, each through Cray Lustre Clients]

Challenges

• Breaking I/O bottlenecks and reducing cost of processing

• Sharing data among heterogeneous HPC compute clusters

• Interoperability across file systems

Solution

• Cray Sonexion 1600

• Lustre Client for 3rd-party Cluster

• Lustre Client for Cray XE6 & XC30

Results

• Multiple production file systems

• 10+ PB capacity

• 180 GB/s sustained I/O

• Shared across all Cray compute systems and cluster nodes

Data- and IO-intensive seismic processing for discovery and analysis, from basic seismic to Full Waveform Inversion (FWI)
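One way to read the capacity and bandwidth figures together is to ask how long a single full pass over the stored data would take at the sustained rate. This Python sketch assumes decimal units (1 PB = 10^6 GB) and that the entire 180 GB/s could be devoted to one pass, which real mixed workloads rarely allow.

```python
SECONDS_PER_HOUR = 3600
GB_PER_PB = 1_000_000  # decimal units, as storage capacity is usually quoted


def full_pass_hours(capacity_pb: float, bandwidth_gb_s: float) -> float:
    """Hours to read (or write) capacity_pb once at a sustained bandwidth_gb_s."""
    if bandwidth_gb_s <= 0:
        raise ValueError("bandwidth must be positive")
    return capacity_pb * GB_PER_PB / bandwidth_gb_s / SECONDS_PER_HOUR


if __name__ == "__main__":
    hours = full_pass_hours(capacity_pb=10, bandwidth_gb_s=180)
    print(f"{hours:.1f} hours")  # 15.4 hours
```

Since capacity is quoted as 10+ PB, about 15 hours is a lower bound for one complete pass at full sustained bandwidth.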

Page 38

Enabling Scientific Breakthroughs at the Petascale

● Integrated Storage at 1 TB/sec
● 25+ PB of usable space
● Production Science at Full Scale
● Cray Reliability and Service

NCSA Blue Waters Case Study: Supercomputing at Sustained Petascale Performance

Page 39

Summary

● Scale performance and capacity based on workflow needs
  ● Scalable tiered storage architecture for Lustre
  ● Comprehensive data management across tiers

● Reduce TCO
  ● By 30% over 3 years compared with SAM-QFS
  ● Easy, familiar management tools, like SAM-QFS

● Protect data indefinitely, at the right cost
  ● Ideal for reducing, protecting, and managing Lustre data
  ● Protect up to 5 copies, within and across locations

● Stay open and portable
  ● Using Linux, open systems, and open formats

Cray Sonexion and Cray TAS provide an easy, open and scalable data management and protection solution spanning the range of workflows for big data and supercomputing.

Page 40

“The future is seldom the same as the past.”

Seymour Cray, June 4, 1995

Page 41

Page 42

Legal Disclaimer

Information in this document is provided in connection with Cray Inc. products. No license, express or implied, to any intellectual property rights is granted by this document.

Cray Inc. may make changes to specifications and product descriptions at any time, without notice.

All products, dates and figures specified are preliminary based on current expectations, and are subject to change without notice.

Cray hardware and software products may contain design defects or errors known as errata, which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Cray uses codenames internally to identify products that are in development and not yet publicly announced for release. Customers and other third parties are not authorized by Cray Inc. to use codenames in advertising, promotion or marketing and any use of Cray Inc. internal codenames is at the sole risk of the user.

Performance tests and ratings are measured using specific systems and/or components and reflect the approximate performance of Cray Inc. products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance.

The following are trademarks of Cray Inc. and are registered in the United States and other countries: CRAY and design, SONEXION, URIKA, and YARCDATA. The following are trademarks of Cray Inc.: ACE, APPRENTICE2, CHAPEL, CLUSTER CONNECT, CRAYPAT, CRAYPORT, ECOPHLEX, LIBSCI, NODEKARE, THREADSTORM. The following system family marks, and associated model number marks, are trademarks of Cray Inc.: CS, CX, XC, XE, XK, XMT, and XT. The registered trademark LINUX is used pursuant to a sublicense from LMI, the exclusive licensee of Linus Torvalds, owner of the mark on a worldwide basis. Other trademarks used in this document are the property of their respective owners.
