Ceph Day Beijing - Ceph on All-Flash Storage - Breaking Performance Barriers


Transcript of Ceph Day Beijing - Ceph on All-Flash Storage - Breaking Performance Barriers

Page 1: Ceph Day Beijing - Ceph on All-Flash Storage - Breaking Performance Barriers

Ceph on All-Flash Storage – Breaking Performance Barriers
Zhou Hao, Technical Marketing Engineer
June 6th, 2015

Page 2: Ceph Day Beijing - Ceph on All-Flash Storage - Breaking Performance Barriers

Forward-Looking Statements

During our meeting today we will make forward-looking statements.

Any statement that refers to expectations, projections or other characterizations of future events or circumstances is a forward-looking statement, including those relating to market growth, industry trends, future products, product performance and product capabilities. This presentation also contains forward-looking statements attributed to third parties, which reflect their projections as of the date of issuance.

Actual results may differ materially from those expressed in these forward-looking statements due to a number of risks and uncertainties, including the factors detailed under the caption "Risk Factors" and elsewhere in the documents we file from time to time with the SEC, including our annual and quarterly reports.

We undertake no obligation to update these forward-looking statements, which speak only as of the date hereof or as of the date of issuance by a third party, as the case may be.

Page 3: Ceph Day Beijing - Ceph on All-Flash Storage - Breaking Performance Barriers

Requirements from Big Data @ PB Scale

Content repositories
- Mixed-media containers, active archiving, backup, locality of data
- Large containers with application SLAs

Big data analytics
- Internet of Things, sensor analytics
- Time-to-value and time-to-insight
- Hadoop, NoSQL (Cassandra, MongoDB)

Media services
- High read-intensive access from billions of edge devices
- Hi-def video driving even greater demand for capacity and performance
- Surveillance systems, analytics

Page 4: Ceph Day Beijing - Ceph on All-Flash Storage - Breaking Performance Barriers

InfiniFlash™ System

• Ultra-dense all-flash appliance
  - 512TB in 3U
• Scale-out software for massive capacity
  - Unified content: block, object
  - Flash-optimized software with programmable interfaces (SDK)
• Enterprise-class storage features
  - Snapshots, replication, thin provisioning
• Enhanced performance for block and object
  - 10x improvement for block reads
  - 2x improvement for object reads

IF500 with InfiniFlash OS (Ceph)
Ideal for large-scale storage, with best-in-class $/IOPS/TB

Page 5: Ceph Day Beijing - Ceph on All-Flash Storage - Breaking Performance Barriers

InfiniFlash Hardware System

Capacity: 512TB* raw
All-flash 3U storage system
- 64 x 8TB flash cards with Pfail (power-fail) protection
- 8 SAS ports total

Operational efficiency and resiliency
- Hot-swappable components, easy FRU
- Low power: 450W (avg), 750W (active)
- MTBF 1.5+ million hours

Scalable performance**
- 780K IOPS
- 7GB/s throughput
- Upgrade to 12GB/s in Q3'15

* 1TB = 1,000,000,000,000 bytes. Actual user capacity is less.
** Based on internal testing of InfiniFlash 100. Test report available.

Page 6: Ceph Day Beijing - Ceph on All-Flash Storage - Breaking Performance Barriers

Innovating Performance @ InfiniFlash OS

Messenger performance enhancements
• Message signing
• Socket read-aheads
• Resolved severe lock contentions

Backend optimizations – XFS and flash
• Reduced CPU usage by ~2 cores through improved file-path resolution from object ID
• CPU- and lock-optimized fast path for reads
• Disabled throttling for flash
• Index Manager caching and shared FdCache in filestore

Major improvements to enhance parallelism
• Removed single dispatch-queue bottlenecks in the OSD and client (librados) layers
• Shared thread pool implementation
• Major lock reordering
• Improved lock granularity – reader/writer locks
• Granular locks at the object level
• Optimized OpTracking path in the OSD, eliminating redundant locks
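The changes above were made in the IFOS source, but several of the same themes – relaxed message signing, throttles sized for flash rather than spinning disks, larger FD caches, more sharded OSD work queues – can be approximated in stock Ceph through configuration. A minimal ceph.conf sketch along those lines, assuming a Giant-era filestore deployment; the option names are standard Ceph options, but the values are illustrative and are not SanDisk's shipped defaults:

    [global]
    # Skip cephx message signing on a trusted cluster network
    cephx_sign_messages = false
    cephx_require_signatures = false

    [osd]
    # More sharded op-queue workers to expose flash parallelism
    osd_op_num_shards = 8
    osd_op_num_threads_per_shard = 2
    # Raise filestore/journal throttles that assume spinning disks
    filestore_queue_max_ops = 5000
    filestore_queue_max_bytes = 1073741824
    journal_queue_max_ops = 5000
    # Cache more open object file descriptors in the filestore
    filestore_fd_cache_size = 10240
    # Disable the writeback throttler tuned for HDD backends
    filestore_wbthrottle_enable = false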

Page 7: Ceph Day Beijing - Ceph on All-Flash Storage - Breaking Performance Barriers

Open Source with the SanDisk Advantage
InfiniFlash OS – Enterprise-Level Hardened Ceph

Enterprise-level hardening
- 9,000 hours of cumulative IO tests
- 1,100+ unique test cases
- 1,000 hours of cluster rebalancing tests
- 1,000 hours of IO on iSCSI

Testing at hyperscale
- Over 100 server-node clusters
- Over 4PB of flash storage

Failure testing
- 2,000-cycle node reboot
- 1,000 abrupt node power cycles
- 1,000 storage failures
- 1,000 network failures
- IO for 250 hours at a stretch

Enterprise-level support
- Enterprise-class support and services from SanDisk
- Risk mitigation through long-term support and a reliable long-term roadmap
- Continual contribution back to the community

Page 8: Ceph Day Beijing - Ceph on All-Flash Storage - Breaking Performance Barriers

Test Configuration – Single InfiniFlash System

Performance improves 2x to 12x depending on the Block size
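The deck does not show the benchmarking tool itself; a common way to reproduce this kind of queue-depth and read-mix sweep against an RBD image is fio's rbd ioengine. A hypothetical job file for the 8K, 75% read, QD16 data point (pool, image, and client names are placeholders):

    [global]
    ioengine=rbd
    clientname=admin
    pool=rbd
    rbdname=testimg
    direct=1
    time_based=1
    runtime=300
    group_reporting=1

    [rand-8k-75read-qd16]
    rw=randrw
    rwmixread=75
    bs=8k
    iodepth=16

Sweeping iodepth over 1/4/16 and rwmixread over 0/25/50/75/100 produces the grid of points plotted on the following slides.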

Page 9: Ceph Day Beijing - Ceph on All-Flash Storage - Breaking Performance Barriers

Performance Improvement: Stock Ceph vs. IF OS – 8K Random Blocks

[Charts: IOPS and average latency (ms) for Stock Ceph (Giant) vs. IFOS 1.0, swept over queue depths 1/4/16 at 0/25/50/75/100% read IOs]

Test setup: 2 RBD per client x 4 clients total; 1 InfiniFlash node with 512TB

Page 10: Ceph Day Beijing - Ceph on All-Flash Storage - Breaking Performance Barriers

Performance Improvement: Stock Ceph vs. IF OS – 64K Random Blocks

[Charts: IOPS and average latency (ms) for Stock Ceph vs. IFOS 1.0, swept over queue depths 1/4/16 at 0/25/50/75/100% read IOs]

Test setup: 2 RBD per client x 4 clients total; 1 InfiniFlash node with 512TB

Page 11: Ceph Day Beijing - Ceph on All-Flash Storage - Breaking Performance Barriers

Performance Improvement: Stock Ceph vs. IF OS – 256K Random Blocks

[Charts: IOPS and average latency (ms) for Stock Ceph vs. IFOS 1.0, swept over queue depths 1/4/16 at 0/25/50/75/100% read IOs]

Test setup: 2 RBD per client x 4 clients total; 1 InfiniFlash node with 512TB

Page 12: Ceph Day Beijing - Ceph on All-Flash Storage - Breaking Performance Barriers

Test Configuration – 3 InfiniFlash Systems (128TB each)

Performance scales linearly with additional InfiniFlash nodes

Page 13: Ceph Day Beijing - Ceph on All-Flash Storage - Breaking Performance Barriers

Scaling with Performance – 8K Random Blocks

[Charts: IOPS and average latency (ms) swept over queue depths 1/8/64 at 0/25/50/75/100% read IOs]

Test setup: 2 RBD per client x 5 clients; 3 InfiniFlash nodes with 128TB each

Page 14: Ceph Day Beijing - Ceph on All-Flash Storage - Breaking Performance Barriers

Scaling with Performance – 64K Random Blocks

[Charts: IOPS and average latency (ms) swept over queue depths from 1 to 256 at 0/25/50/75/100% read IOs]

Test setup: 2 RBD per client x 5 clients; 3 InfiniFlash nodes with 128TB each

Page 15: Ceph Day Beijing - Ceph on All-Flash Storage - Breaking Performance Barriers

Scaling with Performance – 256K Random Blocks

[Charts: IOPS and average latency (ms) swept over queue depths from 1 to 256 at 0/25/50/75/100% read IOs]

Test setup: 2 RBD per client x 5 clients; 3 InfiniFlash nodes with 128TB each

Page 16: Ceph Day Beijing - Ceph on All-Flash Storage - Breaking Performance Barriers

Flexible Ceph Topology with InfiniFlash

[Diagram: disaggregated deployment. Client applications in a compute farm consume LUNs through SCSI targets layered on RBDs / RGW; OSD server nodes issue read and write IO over SAS to InfiniFlash enclosures (HSEB A / HSEB B) in a storage farm.]

Disaggregated architecture
- Optimized for performance
- Higher utilization
- Reduced costs
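On the compute-farm side, the applications ultimately consume ordinary RADOS block devices. A minimal sketch of provisioning and attaching one with the stock rbd CLI (pool and image names are placeholders); the mapped kernel device can then be used by a local application or re-exported through an iSCSI/SCSI target as in the diagram above:

    # Create a 1TB image (size is in MB) in the 'rbd' pool
    rbd create rbd/vol01 --size 1048576
    # Map it on a client or gateway node; it appears as /dev/rbd0
    rbd map rbd/vol01
    # ... serve IO from /dev/rbd0, or export it as a SCSI LUN ...
    rbd unmap /dev/rbd0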

Page 17: Ceph Day Beijing - Ceph on All-Flash Storage - Breaking Performance Barriers

Flash + HDD with Data Tiering – Flash Performance at the TCO of HDD

InfiniFlash OS performs automatic data placement and data movement between tiers, transparent to applications

User-defined policies for data placement on tiers

Can be combined with erasure coding to further reduce TCO

Benefits
- Flash-based performance with HDD-like TCO
- Lower performance requirements on the HDD tier enable the use of denser and cheaper SMR drives
- Denser and lower power than an HDD-only solution

[Diagram: compute farm in front of InfiniFlash for high-activity data and servers with 60+ SMR HDDs each for low-activity data]
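In stock Ceph this flash/HDD split maps naturally onto cache tiering combined with erasure coding; a hedged sketch using standard Hammer-era commands (pool names, PG counts, and thresholds are invented for illustration), assuming CRUSH rules already place the 'hot-flash' pool on InfiniFlash-backed OSDs and the 'cold-ec' pool on the HDD/SMR tier:

    # Erasure-coded base pool on the HDD/SMR tier
    ceph osd erasure-code-profile set ec42 k=4 m=2
    ceph osd pool create cold-ec 2048 2048 erasure ec42
    # Replicated cache pool on the flash tier
    ceph osd pool create hot-flash 2048 2048 replicated
    # Put the flash pool in front of the EC pool as a writeback cache tier
    ceph osd tier add cold-ec hot-flash
    ceph osd tier cache-mode hot-flash writeback
    ceph osd tier set-overlay cold-ec hot-flash
    # User-defined policies: when to promote, flush and evict
    ceph osd pool set hot-flash hit_set_type bloom
    ceph osd pool set hot-flash target_max_bytes 400000000000000
    ceph osd pool set hot-flash cache_target_dirty_ratio 0.4
    ceph osd pool set hot-flash cache_target_full_ratio 0.8

Applications keep addressing the cold-ec pool; the overlay transparently serves hot objects from flash and lets cold data age out to the erasure-coded SMR tier.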

Page 18: Ceph Day Beijing - Ceph on All-Flash Storage - Breaking Performance Barriers

Flash Primary + HDD Replicas – Flash Performance at the TCO of HDD

[Diagram: compute farm in front of a primary replica on InfiniFlash, an HDD-based data node for the 2nd local replica, and an HDD-based data node for the 3rd DR replica]

- Higher affinity of the primary replica ensures much of the compute lands on InfiniFlash data
- 2nd and 3rd replicas on HDDs are primarily for data protection
- The high throughput of InfiniFlash handles data protection and movement for all replicas without impacting application IO
- Eliminates the cascade data-propagation requirement for HDD replicas
- Flash-accelerated object performance for replica 1 allows denser and cheaper SMR HDDs for replicas 2 and 3
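Stock Ceph exposes the same idea through primary affinity. A small sketch (OSD IDs are placeholders) that keeps primaries off the HDD-backed OSDs so client-facing reads land on the flash copy; pre-Jewel releases such as Giant and Hammer also require enabling the feature on the monitors:

    # ceph.conf on the monitors (needed on Giant/Hammer-era clusters)
    [mon]
    mon_osd_allow_primary_affinity = true

    # Make the HDD-backed OSDs ineligible to act as primary
    ceph osd primary-affinity osd.20 0
    ceph osd primary-affinity osd.21 0
    # Flash-backed OSDs keep the default affinity of 1.0, so CRUSH
    # chooses them as the primary whenever a PG has a flash replica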

Page 19: Ceph Day Beijing - Ceph on All-Flash Storage - Breaking Performance Barriers

TCO Example – Object Storage: Scale-out Flash Benefits at the TCO of HDD

[Chart: 3-year TCO comparison (TCA plus 3-year opex, in $ x 10,000) and total racks for 96PB of object storage, across four designs: traditional object store on HDD; InfiniFlash object store with 3 full replicas on flash; InfiniFlash with erasure coding, all flash; InfiniFlash with flash primary and HDD copies]

• Weekly failure rate for a 100PB deployment: 15-35 HDDs vs. 1 InfiniFlash card
• HDDs cannot handle simultaneous egress/ingress
• Long HDD rebuild times, multiple failures, and data rebalancing result in service disruption
• Flash provides a guaranteed, consistent SLA
• Flash capacity utilization is far higher than HDD's due to reliability and operational factors
• Flash has low power consumption: 450W (avg), 750W (active)

Note that operational/maintenance costs and performance benefits are not accounted for in these models.

Page 20: Ceph Day Beijing - Ceph on All-Flash Storage - Breaking Performance Barriers

InfiniFlash™ System

The First All-Flash Storage System Built for High Performance Ceph

Page 21: Ceph Day Beijing - Ceph on All-Flash Storage - Breaking Performance Barriers

© 2015 SanDisk Corporation. All rights reserved. SanDisk is a trademark of SanDisk Corporation, registered in the United States and other countries. InfiniFlash is a trademark of SanDisk Enterprise IP LLC. All other product and company names are used for identification purposes and may be trademarks of their respective holder(s).

http://bigdataflash.sandisk.com/infiniflash

[email protected]
[email protected] – Sales
[email protected] – Technical
[email protected] – Production Management