MySQL and Ceph 2 August 2016
WHOIS
Brent Compton and Kyle Bader, Storage Solution Architectures, Red Hat
Yves Trudeau, Principal Architect, Percona
AGENDA
MySQL on Ceph
• Ceph Architecture
• MySQL on Ceph RBD
• Sample Benchmark Results
• Hardware Selection Considerations
Why MySQL on Ceph
WHY MYSQL ON CEPH? MARKET DRIVERS
• Ceph is the #1 block storage for OpenStack clouds
• 70% of apps on OpenStack use the LAMP stack
• MySQL is the leading open-source RDBMS
• Ceph is the leading open-source software-defined storage
WHY MYSQL ON CEPH? EFFICIENCY DRIVERS
• Shared, elastic storage pool on commodity servers
• Dynamic DB placement
• Flexible volume resizing
• Live instance migration
• Backup from the block pool to an object pool
• Read replicas via copy-on-write snapshots (sketched below)
• … commonality with public cloud deployment models
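As a sketch of the read-replica point: RBD supports copy-on-write clones of protected snapshots, which is what makes cheap replicas possible. A minimal python-rbd illustration, assuming a 'mysql' pool and a 'master-datadir' image already exist (both names are hypothetical; cloning requires a format-2 image with layering enabled):

    import rados, rbd

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    ioctx = cluster.open_ioctx('mysql')                # hypothetical pool name

    # snapshot the master's volume and protect it (clones require this)
    with rbd.Image(ioctx, 'master-datadir') as img:
        img.create_snap('replica-base')
        img.protect_snap('replica-base')

    # the clone shares unmodified blocks with its parent (copy-on-write)
    rbd.RBD().clone(ioctx, 'master-datadir', 'replica-base',
                    ioctx, 'replica1-datadir')
    ioctx.close()
    cluster.shutdown()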
CEPH ARCHITECTURE
ARCHITECTURAL COMPONENTS
• RGW: a web-services gateway for object storage, compatible with S3 and Swift
• LIBRADOS: a library allowing apps to directly access RADOS (C, C++, Java, Python, Ruby, PHP)
• RADOS: a software-based, reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes and lightweight monitors
• RBD: a reliable, fully-distributed block device with cloud platform integration
• CEPHFS: a distributed file system with POSIX semantics and scale-out metadata
(Accessed by: APP via RGW or LIBRADOS, HOST/VM via RBD, CLIENT via CEPHFS)
RADOS COMPONENTS
OSDs:
• 10s to 10,000s in a cluster
• Typically one per disk
• Serve stored objects to clients
• Intelligently peer for replication & recovery
Monitors:
• Maintain cluster membership and state
• Provide consensus for distributed decision-making
• Small, odd number
• Do not serve stored objects to clients
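A small python-rados illustration of this division of labor: cluster maps and health come from the monitors, while object reads and writes go directly to the OSDs holding the data.

    import json, rados

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()

    # ask the monitors for cluster status; monitors hold membership/state
    cmd = json.dumps({'prefix': 'status', 'format': 'json'})
    ret, outbuf, errs = cluster.mon_command(cmd, b'')
    print(json.loads(outbuf)['health'])

    # object I/O, by contrast, goes straight to an OSD, never a monitor
    ioctx = cluster.open_ioctx('rbd')       # assumes the default 'rbd' pool
    ioctx.write_full('demo-object', b'served by an OSD')
    ioctx.close()
    cluster.shutdown()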
CEPH OSD
RADOS CLUSTER
WHERE DO OBJECTS LIVE?
• Option 1: a metadata server that tracks every object's location (every I/O pays for a lookup)
• Option 2: calculated placement, where clients compute an object's location from its name
EVEN BETTER: CRUSH
PLACEMENT GROUPS (PGs)
CRUSH IS A QUICK CALCULATION
DYNAMIC DATA PLACEMENT
CRUSH:
• Pseudo-random placement algorithm
• Fast calculation, no lookup (illustrated below)
• Repeatable, deterministic
• Statistically uniform distribution
• Stable mapping
• Limited data migration on change
• Rule-based configuration
• Infrastructure topology aware
• Adjustable replication
• Weighting
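A toy sketch of the idea (not the real CRUSH code, which also walks the cluster hierarchy and honors rules and weights): any client can compute an object's placement group and OSDs from its name alone, so there is no lookup service to ask.

    import hashlib

    def place(object_name, pg_count, osds, replicas=3):
        # hash the object name onto a placement group
        pg = int(hashlib.md5(object_name.encode()).hexdigest(), 16) % pg_count
        # map the PG pseudo-randomly but repeatably onto OSDs
        # (rendezvous-style ranking: stable under small OSD-set changes)
        ranked = sorted(osds,
                        key=lambda osd: hashlib.md5(f'{pg}.{osd}'.encode()).hexdigest())
        return pg, ranked[:replicas]

    # every client computes the same answer for the same cluster map
    print(place('rbd_data.0000000000000000', pg_count=128, osds=range(12)))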
DATA IS ORGANIZED INTO POOLS
(diagram: cluster pools, each containing PGs: POOL A, POOL B, POOL C, POOL D)
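Pools are also the unit clients address in code. A minimal python-rados sketch ('fastpool' is a made-up name):

    import rados

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    if not cluster.pool_exists('fastpool'):    # hypothetical pool
        cluster.create_pool('fastpool')
    print(cluster.list_pools())
    cluster.shutdown()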
ACCESS METHODS
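For MySQL, the access method that matters is RBD, reached either through librbd or the kernel client. A minimal python-rbd sketch that creates a block image and writes to it through librbd (pool and image names are hypothetical):

    import rados, rbd

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    ioctx = cluster.open_ioctx('rbd')

    rbd.RBD().create(ioctx, 'mysql-vol01', 10 * 1024**3)   # 10 GiB image
    with rbd.Image(ioctx, 'mysql-vol01') as image:
        image.write(b'\x00' * 4096, 0)                      # 4 KiB at offset 0
    ioctx.close()
    cluster.shutdown()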
STORING VIRTUAL DISKS
(diagram sequence: a VM's virtual disk is stored as an RBD image, striped across objects in the RADOS cluster)
PERCONA SERVER ON KRBD
(diagram: Percona Server on a host using the kernel RBD client, backed by the RADOS cluster)
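A hedged sketch of what "on kRBD" means operationally: the kernel RBD client maps an image to a local block device, which then hosts the MySQL data directory (pool, image, and device names are illustrative).

    # map the image via the kernel RBD client; it appears as /dev/rbd0
    rbd map mysql/vol01
    mkfs.xfs /dev/rbd0
    mount /dev/rbd0 /var/lib/mysql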
TUNING MYSQL ON CEPH
HEAD-TO-HEAD: MYSQL ON CEPH VS. AWS
(chart) IOPS/GB (sysbench write):
• AWS EBS Provisioned-IOPS: 31
• Ceph on Supermicro FatTwin, 72% capacity: 18
• Ceph on Supermicro MicroCloud, 87% capacity: 18
• Ceph on Supermicro MicroCloud, 14% capacity: 78
TUNING FOR HARMONY: OVERVIEW
Tuning MySQL:
• Buffer pool > 20%
• Flush each Tx, or batch?
• Parallel doublewrite-buffer flush
Tuning Ceph:
• RHCS 1.3.2, tcmalloc 2.4
• 128M thread cache
• Co-resident journals
• 2-4 OSDs per SSD
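A hypothetical my.cnf fragment mapping the MySQL-side knobs above to concrete variables (sizes are illustrative; innodb_parallel_doublewrite_path is a Percona Server 5.7 variable, so check your version):

    [mysqld]
    # buffer pool sized at > 20% of the working set
    innodb_buffer_pool_size = 8G
    # 1 = flush the redo log on each Tx; 2 = batched, flushed roughly once a second
    innodb_flush_log_at_trx_commit = 2
    # Percona Server parallel doublewrite buffer
    innodb_parallel_doublewrite_path = xb_doublewrite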
TUNING FOR HARMONY: SAMPLE EFFECT OF MYSQL BUFFER POOL ON TpmC
(chart: tpmC over ~8000 seconds, one data point per minute; y-axis 0 to 1,200,000)
64x MySQL instances on the Ceph cluster, each with 25x TPC-C warehouses
Series: 1%, 5%, 25%, 50%, 75% buffer pool
TUNING FOR HARMONY: SAMPLE EFFECT OF MYSQL Tx FLUSH ON TpmC
(chart: tpmC over ~8000 seconds, one data point per minute; y-axis 0 to 2,500,000)
64x MySQL instances on the Ceph cluster, each with 25x TPC-C warehouses
Series: batch Tx flush (1 sec) vs. per-Tx flush
TUNING FOR HARMONY: CREATING A SEPARATE POOL TO SERVE IOPS WORKLOADS
Creating multiple pools in the CRUSH map (an example command sequence follows):
• Distinct branch in the OSD tree
• Edit the CRUSH map, add SSD rules
• Create a pool, set its crush_ruleset to the SSD rule
• Add a Volume Type to Cinder
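A hedged command sequence for the steps above, using the pre-Luminous CLI that matches RHCS 1.3.2 (rule, pool, and volume-type names are made up, and the SSD branch is assumed to already exist in the OSD tree):

    # rule that places data only under the SSD branch of the CRUSH tree
    ceph osd crush rule create-simple ssd-rule ssd-root host
    # create the IOPS pool and point it at the SSD rule (ruleset id 1 here)
    ceph osd pool create mysql-iops 512 512
    ceph osd pool set mysql-iops crush_ruleset 1
    # expose it to OpenStack as a Cinder volume type
    cinder type-create ceph-ssd
    cinder type-key ceph-ssd set volume_backend_name=ceph-ssd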
TUNING FOR HARMONY: IF YOU MUST USE MAGNETIC MEDIA
Reducing seeks on magnetic pools:
• RBD cache is safe (see the fragment below)
• RAID controllers with write-back cache
• SSD journals
• Software caches
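For the first point, a client-side ceph.conf fragment enabling the librbd write-back cache (a sketch; the option names are the real librbd settings). It is considered safe for databases because librbd honors flush requests from the guest:

    [client]
    rbd cache = true
    # stay write-through until the guest sends its first flush,
    # so early writes are never silently cached
    rbd cache writethrough until flush = true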
HARDWARE SELECTION CONSIDERATIONS
ARCHITECTURAL CONSIDERATIONS: UNDERSTANDING THE WORKLOAD
Traditional Ceph workload:
• $/GB
• PBs
• Unstructured data
• MB/sec
MySQL Ceph workload:
• $/IOPS
• TBs
• Structured data
• IOPS
ARCHITECTURAL CONSIDERATIONS: FUNDAMENTALLY DIFFERENT DESIGN
Traditional Ceph workload:
• 50-300+ TB per server
• Magnetic media (HDD)
• Low CPU-core:OSD ratio
• 10GbE -> 40GbE
MySQL Ceph workload:
• < 10 TB per server
• Flash (SSD -> NVMe)
• High CPU-core:OSD ratio
• 10GbE
Ceph Test Drive: bit.ly/cephtestdrive
Percona Blog: https://www.percona.com/blog/2016/07/13/using-ceph-mysql/
Author: Yves Trudeau