Architecting Ceph Solutions


Transcript of Architecting Ceph Solutions

Page 1: Architecting Ceph Solutions

ARCHITECTING CEPH SOLUTIONS

Brent Compton & Kyle Bader
Red Hat Storage
January 2016

Page 2: Architecting Ceph Solutions

CLUSTER BUILDING BLOCKS

[Slide diagram: Ceph block & object clients served by a Ceph storage cluster built from standard servers and media (HDD, SSD, PCIe) and standard NICs and switches; layers labeled WORKLOADS, ACCESS, PLATFORM, and NETWORK.]

Page 3: Architecting Ceph Solutions

1. Qualify need for scale-out storage
2. Design for target workload IO profile(s)
3. Choose storage access method(s)
4. Identify capacity
5. Determine fault-domain risk tolerance
6. Select data protection method

Target Cluster Architecture

CLUSTER DESIGN CONSIDERATIONS

Page 4: Architecting Ceph Solutions

Cluster sizes: OpenStack Starter (100TB) | S (500TB) | M (1PB) | L (2PB)

Workload profiles: IOPS OPTIMIZED | THROUGHPUT OPTIMIZED | COST-CAPACITY OPTIMIZED

TARGET CLUSTER ARCHITECTURE

Page 5: Architecting Ceph Solutions

Cluster sizes: OpenStack Starter (100TB) | S (500TB) | M (1PB) | L (2PB)

IOPS OPTIMIZED: 2-4x PCIe/NVMe slot servers (PCIe) or 12x 2.5" SSD bay servers (SAS/SATA)

THROUGHPUT OPTIMIZED: 12-16x 3.5" bay servers (smaller clusters) up to 24-36x 3.5" bay servers (larger clusters)

COST-CAPACITY OPTIMIZED: 60-72x 3.5" bay servers

BROAD SERVER SIZE TRENDS

Page 6: Architecting Ceph Solutions

Cluster sizes: OpenStack Starter (100TB) | S (500TB) | M (1PB) | L (2PB)

IOPS OPTIMIZED
• Ceph RBD (block)
• OSDs on all-flash media (SATA SSD or PCIe)
• High-bin, dual-socket CPU
• 2x replication w/ backup or 3x replication
• Multiple OSDs per drive (if PCIe)

THROUGHPUT OPTIMIZED
• Ceph RBD (block) or RGW (object)
• OSDs on HDD media with dedicated SSD write journals (4:1 HDD:SSD ratio; sized in the sketch below)
• Mid-bin, dual-socket CPU (single-socket adequate for servers with <=12 OSDs)
• 3x replication (RBD/RGW read-intensive) or erasure coded (RGW write-intensive)
• High-bandwidth networking, >10Gb (for servers with >12 OSDs)

COST-CAPACITY OPTIMIZED
• Ceph RGW (object)
• OSDs on HDD media (write journals co-located on HDDs)
• Mid-bin, single-socket CPU (dual-socket for servers with >12 OSDs)
• Erasure-coded data protection (vs. replication)

BROAD SERVER CONFIGURATION TRENDS
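The 4:1 HDD-to-SSD journal ratio above maps directly onto per-node drive layouts. A minimal sizing sketch in Python, assuming the illustrative bay counts from the previous slide (the helper name and drive counts are examples, not Red Hat guidance):

import math

# Hypothetical helper: number of SSD write-journal devices for a
# throughput-optimized OSD node at a 4:1 HDD:SSD journal ratio.
def journal_ssds_needed(hdd_osds, hdds_per_ssd=4):
    return math.ceil(hdd_osds / hdds_per_ssd)

for bays in (12, 16, 24, 36):
    ssds = journal_ssds_needed(bays)
    print(f"{bays} HDD OSDs -> {ssds} SSD journals ({bays + ssds} drive devices per node)")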

Page 7: Architecting Ceph Solutions

• Elastic provisioning across storage server cluster
• Standardized servers and networking
• Petabyte scale: 10s, 100s, or 1000s of servers/cluster
• Data HA across ‘islands’ of scale-up storage servers
• Performance and capacity scaled independently
• Incremental vs. forklift upgrades

STEP 1: QUALIFY NEED FOR SCALE-OUT STORAGE

Page 8: Architecting Ceph Solutions

• Performance vs. ‘cheap-and-deep’?
• Performance: throughput- vs. IOPS-intensive?
• Small block vs. large block?
• Sequential vs. random IO?
• Read vs. write mix?
• Latency: absolute vs. consistency targets?

STEP 2: DESIGN FOR TARGET WORKLOADS
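Answers to the questions on this slide can be turned into a rough device count before any hardware is chosen. A back-of-envelope Python sketch, assuming rule-of-thumb figures (~75 random IOPS and ~100 MB/s per 7.2K HDD) and 3x replication; every number here is an illustrative assumption, not a benchmark result:

# Back-of-envelope workload sizing; per-device figures and the simple
# replica write amplification are assumptions for illustration only.
def devices_for_iops(target_iops, write_fraction, replicas=3, iops_per_device=75):
    # Each client write lands on every replica; reads are served by one replica.
    effective = target_iops * (write_fraction * replicas + (1 - write_fraction))
    return -(-int(effective) // iops_per_device)          # ceiling division

def devices_for_throughput(target_mbps, write_fraction, replicas=3, mbps_per_device=100):
    effective = target_mbps * (write_fraction * replicas + (1 - write_fraction))
    return -(-int(effective) // mbps_per_device)

print(devices_for_iops(20_000, write_fraction=0.3))       # small-block, 70/30 read/write
print(devices_for_throughput(5_000, write_fraction=0.5))  # large-block streaming, MB/s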

Page 9: Architecting Ceph Solutions

DISTRIBUTED FILE*   OBJECT   BLOCK**

CEPH STORAGE CLUSTER

* Support for CephFS is not yet included in Red Hat Ceph Storage
** RBD supported with replicated data protection only

STEP 3: CHOOSE STORAGE ACCESS METHODS
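All three access methods sit on the same underlying RADOS object store. A minimal librados sketch (assuming the python-rados binding is installed, /etc/ceph/ceph.conf describes a reachable cluster, and a pool named 'mypool' already exists; the pool name is a placeholder):

# Native RADOS object I/O; RBD (block) and RGW (object) are layered on this.
import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
try:
    ioctx = cluster.open_ioctx('mypool')        # bind to an existing pool
    try:
        ioctx.write_full('hello-object', b'architecting ceph solutions')
        print(ioctx.read('hello-object'))       # b'architecting ceph solutions'
    finally:
        ioctx.close()
finally:
    cluster.shutdown()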

Page 10: Architecting Ceph Solutions

Cluster sizes: OpenStack Starter (100TB) | S (500TB) | M (1PB) | L (2PB)

Workload profiles: IOPS OPTIMIZED | THROUGHPUT OPTIMIZED | COST-CAPACITY OPTIMIZED

STEP 4: IDENTIFY CAPACITY
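Capacity planning works backwards from a usable-capacity target to raw capacity. A hypothetical Python sketch; node counts, drive sizes, and the 0.85 fill ceiling (headroom kept free for recovery) are illustrative assumptions:

# Usable capacity = raw capacity x protection efficiency x fill ceiling.
def usable_capacity_tb(nodes, drives_per_node, drive_tb,
                       protection_efficiency, fill_ceiling=0.85):
    raw_tb = nodes * drives_per_node * drive_tb
    return raw_tb * protection_efficiency * fill_ceiling

# 3x replication keeps 1/3 of raw; 8+3 erasure coding keeps 8/11 of raw.
print(usable_capacity_tb(10, 12, 4.0, protection_efficiency=1/3))    # ~136 TB usable
print(usable_capacity_tb(10, 60, 6.0, protection_efficiency=8/11))   # ~2225 TB usable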

Page 11: Architecting Ceph Solutions

How much cluster capacity can you tolerate on one node?
• With fewer nodes in the cluster, performance will be more degraded during recovery
  • Each node must devote a greater % of its compute/IO utilization to recovery operations
• With fewer nodes in the cluster, maximum node utilization is limited
  • Each node must contribute a greater % of its reserve capacity for backfill/recovery operations

Guidelines (see the sketch below):
• Minimum supported (Red Hat Ceph Storage): 3 OSD nodes per cluster
• Minimum recommended (performance cluster): 10 OSD nodes per cluster
  • 1 node represents <10% of total cluster capacity
• Minimum recommended (cost/capacity cluster): 7 OSD nodes per cluster
  • 1 node represents <15% of total cluster capacity

STEP 5: DETERMINE FAILURE RISK TOLERANCE
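With identical nodes, the per-node capacity guidelines above reduce to simple arithmetic. A small Python sketch (thresholds taken from the slide; helper names are hypothetical):

# One node's share of cluster capacity should stay at or below ~10% for a
# performance cluster and ~15% for a cost/capacity cluster.
def node_capacity_fraction(nodes):
    return 1.0 / nodes

def meets_guideline(nodes, max_fraction):
    return node_capacity_fraction(nodes) <= max_fraction

print(meets_guideline(10, 0.10))   # True  -> performance guideline met
print(meets_guideline(7, 0.15))    # True  -> cost/capacity guideline met
print(meets_guideline(3, 0.10))    # False -> supported minimum, but one node holds a third of the cluster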

Page 12: Architecting Ceph Solutions

STEP 6: SELECT DATA PROTECTION METHOD

Replication
• Data is copied n times and spread onto different disks on different servers
• Clusters can tolerate n-1 disk failures without data loss
• 3 replicas is a popular configuration

Erasure coding (analogous to network RAID)
• Data is encoded into k chunks with m parity chunks and spread onto different disks on different servers
• Clusters can tolerate m disk failures without data loss
• A k+m of 8+3 is a popular configuration

This decision will affect the initial cost of your cluster more than any other.
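The cost impact is easiest to see as the usable fraction of raw capacity. A quick worked comparison of the two configurations named above:

# Usable-to-raw ratio for the two protection schemes on this slide.
def replication_efficiency(replicas):
    return 1.0 / replicas                  # n full copies of every object

def erasure_efficiency(k, m):
    return k / (k + m)                     # k data chunks + m parity chunks

print(f"3x replication  : {replication_efficiency(3):.0%} usable")   # 33% usable
print(f"8+3 erasure code: {erasure_efficiency(8, 3):.0%} usable")    # 73% usable
# For 1 PB usable: about 3 PB raw with 3x replication vs. about 1.4 PB raw with 8+3 EC.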

Page 13: Architecting Ceph Solutions

1. Qualify need for scale-out storage
2. Design for target workload IO profile(s)
3. Choose storage access method(s)
4. Identify capacity
5. Determine fault-domain risk tolerance
6. Select data protection method

Target Cluster Architecture

CLUSTER DESIGN CONSIDERATIONS

Page 14: Architecting Ceph Solutions

RESOURCES

Ceph on Supermicro Performance & Sizing Guide
http://www.redhat.com/en/resources/red-hat-ceph-storage-clusters-supermicro-storage-servers

Ceph on Cisco UCS C3160 Whitepaper
http://www.cisco.com/c/en/us/products/collateral/servers-unified-computing/ucs-c-series-rack-servers/whitepaper-C11-735004.html

Ceph on Scalable Informatics Whitepaper
https://www.scalableinformatics.com/assets/documents/Unison-Ceph-Performance.pdf

Page 15: Architecting Ceph Solutions

RED HAT STORAGE TEST DRIVES

Test drive: bit.ly/glustertestdrive

Test drive: bit.ly/cephtestdrive

Page 16: Architecting Ceph Solutions