MySQL and Ceph: A tale of two friends


Karan Singh, Sr. Storage Architect, Red Hat

Taco Scargo, Sr. Solution Architect, Red Hat

Agenda

• Ceph Introduction and Architecture

• Why MySQL on Ceph

• MySQL and Ceph Performance Tuning

• Head-to-Head Performance MySQL on Ceph vs. AWS

• Architectural Considerations

• Where to go next?

Quick Poll

- Who runs DB workloads on VM / Cloud?
- Who is familiar with Ceph?

Ceph Introduction & Architecture

What is Ceph?

• Open Source
• Software Defined Storage Solution
• Unified Storage Platform (Block, Object and File Storage)
• Runs on Commodity Hardware
• Self Managing, Self Healing
• Massively Scalable
• No Single Point of Failure

Ceph: Under the hood

Architectural Components

RGW: A web services gateway for object storage, compatible with S3 and Swift

LIBRADOS: A library allowing apps to directly access RADOS (C, C++, Java, Python, Ruby, PHP)

RADOS: A software-based, reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes and lightweight monitors

RBD: A reliable, fully-distributed block device with cloud platform integration

CEPHFS: A distributed file system with POSIX semantics and scale-out metadata

[Diagram: RGW serves objects, RBD serves virtual disks, and CephFS serves a filesystem, all layered on LIBRADOS and RADOS]
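The LIBRADOS layer above is also what the rados command-line tool uses; as a minimal sketch of direct object access (the pool, object, and file names are made up for illustration):

```bash
# Create a small pool to play with (name and PG count are illustrative)
ceph osd pool create demo-pool 64 64

# Store a local file as an object directly in RADOS, then list and read it back
echo "hello ceph" > /tmp/hello.txt
rados -p demo-pool put hello-object /tmp/hello.txt
rados -p demo-pool ls
rados -p demo-pool get hello-object /tmp/hello-copy.txt
```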

RADOS Components

OSDs (Object Storage Daemons)
• 10s to 10,000s in a cluster
• Typically one daemon per disk
• Stores actual data on disk
• Intelligently peer for replication & recovery

Monitors
• Maintain cluster membership and health
• Provide consensus for distributed decision-making
• Small, odd number
• Do not store data

Ceph OSDs

[Diagram: each OSD daemon sits on top of an XFS filesystem on its own disk; monitor daemons (M) run alongside the OSDs]

RADOS cluster, a.k.a. Ceph cluster

[Diagram: an application talks to the RADOS cluster, which is made up of OSD and monitor daemons]

How to access the cluster?


CRUSH Algorithm

CRUSH: Controlled Replication Under Scalable Hashing

[Diagram: objects are hashed into placement groups (PGs), and CRUSH maps each PG onto a set of OSDs in the cluster]
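To see CRUSH at work, you can ask the cluster where it would place any object name. A small sketch, assuming a pool called demo-pool already exists (names are illustrative):

```bash
# Ask CRUSH where an object would be placed; the output shows the placement group (PG)
# and the ordered set of OSDs holding the replicas
ceph osd map demo-pool hello-object

# Show the CRUSH hierarchy of hosts and OSDs that placement groups are mapped onto
ceph osd tree
```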

Data is organized into pools

[Diagram: the cluster is carved into pools (Pool A, B, C, D), each containing its own placement groups]
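A rough sketch of creating and sizing pools from the CLI (pool names, PG counts, and the replica count are illustrative, not recommendations):

```bash
# Create two pools with different placement-group counts
ceph osd pool create pool-a 128 128
ceph osd pool create pool-b 64 64

# Keep three replicas of every object stored in pool-a
ceph osd pool set pool-a size 3

# List all pools in the cluster
ceph osd lspools
```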

Ceph Access Methods

[Diagram: the same architectural components, annotated with their consumers: apps use RGW or LIBRADOS, hosts/VMs use RBD, and clients use CephFS]

STORING VIRTUAL DISKS

[Diagram: a VM's virtual disk is stored as an RBD image in the RADOS cluster; the hypervisor accesses it through LIBRBD]

VIRTUAL MACHINE LIVE MIGRATION

[Diagram: because the RBD image lives in the shared RADOS cluster, the VM can live-migrate to another hypervisor without copying its disk]

PERSISTENT STORAGE FOR CONTAINERS

[Diagram: a container host maps RBD images through the kernel RBD driver (krbd) to provide persistent volumes from the RADOS cluster]

PERCONA SERVER ON KRBD

[Diagram: Percona Server runs on a container host whose data volume is a krbd-mapped RBD image backed by the RADOS cluster]
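A hedged sketch of what "Percona Server on KRBD" can look like on the container host: create an RBD image, map it through the kernel RBD driver, and mount it where the database keeps its data (the pool, image name, size, and mount point are illustrative):

```bash
# Create a 100 GB RBD image in an assumed "mysql" pool (rbd sizes are in MB by default)
rbd create mysql/percona-data --size 102400

# Map it through the kernel RBD driver (krbd); this prints a block device such as /dev/rbd0
rbd map mysql/percona-data

# Put a filesystem on the device and mount it where Percona Server expects its data
mkfs.xfs /dev/rbd0
mkdir -p /var/lib/mysql
mount /dev/rbd0 /var/lib/mysql
```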

Why MySQL on Ceph

Why MySQL on Ceph? MARKET DRIVERS

• Ceph #1 block storage for OpenStack

• MySQL #4 workload on OpenStack

• (#1-3 often use a database too!)

• 70% of apps on OpenStack use LAMP

• MySQL leading open-source RDBMS

• Ceph leading open-source SDS

Why MySQL on Ceph? OPS EFFICIENCY DRIVERS

• Distributed, elastic storage pools on commodity servers

• Dynamic data placement

• Flexible volume resizing

• Live instance migration

• Pool and volume snapshot

• Read replicas via copy-on-write snapshots

• Familiar environment like public clouds

Why MySQL on Ceph? Databases Require HIGH IOPS

Workload              Media          Access Method
General Purpose       Spinning/SSD   Block
Capacity ($/GB)       Spinning       Object
High IOPS ($/IOPS)    SSD/NVMe       Block

MySQL and Ceph: Performance Tuning

Tuning MySQL (a my.cnf sketch follows below)

• Buffer pool > 20%

• Flush on each transaction, or batch?

• Percona parallel doublewrite buffer feature
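As a rough illustration of those MySQL knobs (the file path and values are placeholders, not the settings used in the benchmark):

```bash
# Illustrative my.cnf fragment; sizes and values are placeholders, not benchmark settings
cat > /etc/my.cnf.d/ceph-tuning.cnf <<'EOF'
[mysqld]
# InnoDB buffer pool: a sizeable fraction of the working set (the slide suggests > 20%)
innodb_buffer_pool_size = 8G

# 1 = flush the redo log on every transaction commit (safest, most write IOPS)
# 2 = flush roughly once per second (batches commits, far fewer IOPS)
innodb_flush_log_at_trx_commit = 1
EOF

# Percona Server's parallel doublewrite buffer is configured through its own
# variables; check the Percona Server 5.7 documentation for the exact names.
```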

Tuning Ceph (a configuration sketch follows below)

• RHCS 1.3.2, tcmalloc 2.4, 128M thread cache

• If (OSDs on flash media); then
  • Co-resident journals
  • 2-4 OSDs per SSD/NVMe

• If (OSDs on magnetic media); then
  • SSD journals
  • RAID write-back cache

• RBD cache
  • Software cache
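A minimal sketch of where the Ceph-side settings above live (paths and values are illustrative, not the tested configuration):

```bash
# Illustrative ceph.conf fragment; values are placeholders, not the benchmark configuration
cat >> /etc/ceph/ceph.conf <<'EOF'
[client]
# Client-side (software) cache for RBD volumes
rbd cache = true
rbd cache writethrough until flush = true

[osd]
# Journal size in MB; with magnetic OSDs the journal would sit on a separate SSD partition
osd journal size = 10240
EOF

# The 128M tcmalloc thread cache from the slide is typically set in the Ceph sysconfig file
echo 'TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728' >> /etc/sysconfig/ceph
```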

Tuning for Harmony: Effect of MySQL buffer pool size on TpmC [chart]

Tuning for Harmony: Effect of MySQL transaction flush setting on TpmC [chart]

Tuning for Harmony: Creating a separate pool to serve the IOPS workload

Creating multiple pools in the CRUSH map (a CLI sketch follows after this list)

• Distinct branch in OSD tree

• Edit CRUSH map, add SSD rules

• Create pool, set crush_ruleset to SSD rule

• If (provisioning storage via OpenStack); then

• Add volume type to Cinder

• If (! OpenStack); then

• Provision database storage volumes from SSD pool
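A hedged sketch of those steps on an RHCS 1.3-era cluster, driven from the CLI rather than by hand-editing the CRUSH map (bucket, host, rule, pool, and volume-type names are all illustrative):

```bash
# Create a distinct CRUSH branch for the SSD hosts and move those hosts under it
ceph osd crush add-bucket ssd-root root
ceph osd crush move ssd-host-1 root=ssd-root
ceph osd crush move ssd-host-2 root=ssd-root

# Add a rule that places data only under the SSD branch, with host as the failure domain
ceph osd crush rule create-simple ssd-rule ssd-root host

# Create the IOPS pool and point it at the SSD rule (rule id 1 is illustrative;
# check "ceph osd crush rule dump" for the real id)
ceph osd pool create mysql-ssd 128 128
ceph osd pool set mysql-ssd crush_ruleset 1

# If provisioning through OpenStack: expose the pool as a Cinder volume type
# (the backend name must match a backend section in cinder.conf)
cinder type-create ceph-ssd
cinder type-key ceph-ssd set volume_backend_name=ceph-ssd
```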

Head-to-Head Performance: MySQL on Ceph vs. MySQL on AWS

30 IOPS/GB: AWS EBS P-IOPS TARGET

Head-to-Head Lab: Test Environment

AWS configuration:
• EC2 r3.2xlarge and m4.4xlarge
• EBS Provisioned IOPS and GP-SSD
• Percona Server

Ceph configuration:
• Supermicro servers
• Red Hat Ceph Storage RBD
• Percona Server

Ceph OSD Nodes: 5x SuperStorage SSG-6028R-OSDXXX
• Dual Intel Xeon E5-2650v3 (10 cores each)
• 32GB SDRAM DDR3
• 2x 80GB boot drives
• 4x 800GB Intel DC P3700 (hot-swap U.2 NVMe)
• 1x dual-port 10GbE network adaptor AOC-STGN-i2S
• 8x Seagate 6TB 7200 RPM SAS (unused in this lab)
• Mellanox 40GbE network adaptor (unused in this lab)

MySQL Client Nodes: 12x SuperServer 2UTwin2 nodes
• Dual Intel Xeon E5-2670v2 (cpuset limited to 8 or 16 vCPUs)
• 64GB SDRAM DDR3

Storage Server Software:
• Red Hat Ceph Storage 1.3.2
• Red Hat Enterprise Linux 7.2
• Percona Server 5.7.11

Supermicro Ceph Cluster: Lab Environment

[Diagram: 5x OSD nodes and 12x client nodes connected over shared 10G SFP+ networking, plus monitor nodes]

IOPS/GB per MySQL Instance

Focusing on Write IOPS/GB: AWS throttles to deliver deterministic performance

Effect of Ceph cluster loading on IOPS/GB

HEAD-TO-HEAD: MySQL on Ceph vs. AWS

$/STORAGE-IOP

Architectural Considerations

Architectural Considerations: Understanding the workloads

Traditional Ceph Workload

• $/GB

• PBs

• Unstructured data

• MB/sec

MySQL Ceph Workload

• $/IOP

• TBs

• Structured data

• IOPS

Fundamentally Different Design

Traditional Ceph Workload

• 50-300+ TB per server

• Magnetic Media (HDD)

• Low CPU-core:OSD ratio

• 10GbE->40GbE

MySQL Ceph Workload

• < 10 TB per server

• Flash (SSD -> NVMe)

• High CPU-core:OSD ratio

• 10GbE

Architectural Considerations

Considering CPU Core to Flash Ratio

8x nodes in a 3U chassis. Model: SYS-5038MR-OSDXXXP

Per Node Configuration:
• CPU: Single Intel Xeon E5-2630 v4
• Memory: 32GB
• NVMe Storage: Single 800GB Intel P3700
• Networking: 1x dual-port 10G SFP+

1x CPU + 1x NVMe + 1x SFP+

SUPERMICRO MICRO CLOUD: CEPH MYSQL PERFORMANCE SKU

Where to go Next?

MySQL on Red Hat Ceph Storage

Download the PDF: http://bit.ly/mysql-on-ceph

Reference Architecture White Paper

Red Hat Ceph Storage Test Drive: Learning by Doing

• Absolutely free
• Ceph playground
• 10-node Ceph lab on AWS
• Self-paced, instructor-led

http://bit.ly/ceph-test-drive

Ceph Test Drive: http://bit.ly/ceph-test-drive

MySQL on Ceph Reference Arch: http://bit.ly/mysql-on-ceph

Thank You

Join us to hear about MySQL and Red Hat Storage: Free Test Drive Environment

Today 3:40 PM, Room: Lausanne
