Methods of Sharding MySQL


Page 1: Methods of Sharding MySQL

Methods of Sharding MySQL
Percona Live NYC 2012

Who are Palomino?
Bespoke Services: we work with and like you.
Production Experienced: senior DBAs, admins, and engineers.
24x7: globally-distributed on-call staff.
Short-term no-lock-in contracts.

Professional Services (DevOps):
➢ Chef,
➢ Puppet,
➢ Ansible.

Big Data Cluster Administration (OpsDev):
➢ MySQL, PostgreSQL,
➢ Cassandra, HBase,
➢ MongoDB, Couchbase.

Page 2: Methods of Sharding MySQL

Who am I?
Tim Ellis
CTO/Principal Architect, Palomino

Achievements:
➢ Palomino Big Data Strategy.
➢ Datawarehouse Cluster at Riot Games.
➢ Back-end Storage Architecture for Firefox Sync.
➢ Led DB teams at Digg for four years.
➢ Harassed the Reddit team at one of their parties.

Ensured Successful Business for:
➢ Digg, Friendster,
➢ Riot Games,
➢ Mozilla,
➢ StumbleUpon.

Page 3: Methods of Sharding MySQL

Methods of Sharding MySQL
What is this Talk?

Large cluster admin: when one DB isn't enough.
➢ What is a shard?
➢ What shard types can I choose?
➢ How to build a large DB cluster.
➢ How to administer that giant mess of DBs.

Types of large clusters:
➢ Just a bunch of databases.
➢ Distributed database across machines.

Page 4: Methods of Sharding MySQL

Methods of Sharding MySQL
Where the Focus will Lie

12% – Sharding theory/considerations.

25% – Building a Cluster to administer (tutorial):
➢ Palomino Cluster Tool.

50% – Flexible large-cluster administration (tutorial):
➢ Tumblr's Jetpants.

13% – Other sharding technologies (talk-only):
➢ Youtube's Vtocc (Vitess),
➢ Twitter's Gizzard,
➢ HAproxy.

Page 5: Methods of Sharding MySQL

Methods of Sharding MySQL
What about the Silver Bullets?

NoSQL Distributed Databases:
➢ Promise “sharding” for free,
➢ Promise trivial uptime and horizontal scaling.

Reality:
➢ RDBMS is 40-yr-old tech,
➢ NoSQL is 10-yr-old tech,
➢ How many high-profile downtimes in the past 10 years was each responsible for?
➢ Evaluate the alternatives without illusions.

Page 6: Methods of Sharding MySQL

Methods of Sharding MySQL
What is a Shard?

A location for a subset of data:
➢ Itself made of pieces.
➢ Typically itself redundant.

[Diagram: three shards (Shard for User Data, Shard for Logging Data, Shard for Posts Data), each drawn as one Master with several Slaves.]

Page 7: Methods of Sharding MySQL

Methods of Sharding MySQL
What are the Sharding Method Choices?

By-Function:
➢ Move busy tables onto new shard.
➢ Writes of busiest tables on new hardware.
➢ Writes of remaining tables on current.

By-Columns:
➢ Split table into chunks of related columns, store each set on its own Master/Slaves shard.

By-Rows:
➢ A table is split into N shards; each shard gets a subset of the rows of the table.

Page 8: Methods of Sharding MySQL

Methods of Sharding MySQL
Shard Method Choices

By-function and By-Column Methods:
➢ Much easier.
➢ Can get you through months to years.
➢ Eventually you run out of options here.

By-Row Method:
➢ The hardest to do.
➢ Requires new ways of accessing data.
➢ Often requires sophisticated cache strategies.
➢ Itself can be done several ways.

Page 9: Methods of Sharding MySQL

Methods of Sharding MySQL
By-Function Sharding

Picking a Functional Split:
➢ A subset of tables commonly joined.
➢ Tables outside this subset nearly never joined.
➢ One of them responsible for many writes.

Every JOIN to a table outside the subset must be rewritten as code-based multi-SELECTs.

Once the subset of tables is moved onto its own server, writes are distributed.
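The "code-based multi-SELECT" rewrite can be sketched as follows. This is a minimal illustration, not code from the talk: two in-memory SQLite databases stand in for two MySQL shards, and the `users`/`posts` tables and their columns are invented for the example.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL so the sketch is self-contained.
# Table and column names are hypothetical.
users_db = sqlite3.connect(":memory:")   # original shard, keeps "users"
posts_db = sqlite3.connect(":memory:")   # new shard, holds the write-heavy "posts"

users_db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
users_db.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ann"), (2, "bob")])

posts_db.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT)"
)
posts_db.executemany(
    "INSERT INTO posts VALUES (?, ?, ?)",
    [(10, 1, "hello"), (11, 2, "world"), (12, 1, "again")],
)

def posts_with_authors():
    """Replace `posts JOIN users` with two SELECTs stitched together in code."""
    posts = posts_db.execute(
        "SELECT id, user_id, title FROM posts ORDER BY id"
    ).fetchall()
    user_ids = sorted({user_id for _, user_id, _ in posts})
    if not user_ids:
        return []
    placeholders = ",".join("?" for _ in user_ids)
    rows = users_db.execute(
        f"SELECT id, name FROM users WHERE id IN ({placeholders})", user_ids
    ).fetchall()
    names = dict(rows)  # user id -> name, looked up per post below
    return [(post_id, title, names.get(user_id)) for post_id, user_id, title in posts]
```

The second query is batched with `IN (...)` so the application issues two SELECTs in total rather than one per post.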

Page 10: Methods of Sharding MySQL

Methods of Sharding MySQL
By-Column Sharding (Vertical Partition)

Identifying candidate table:
➢ Many columns (“users” anyone?),
➢ Many updates,
➢ Many indexes.

Required: even split of columns/indexes by update frequency. Attempt: logical grouping.

JOINs are neither possible nor desirable: write multi-SELECT code in application DAL.

Page 11: Methods of Sharding MySQL

Methods of Sharding MySQL
Row-based Sharding Choices

Range-based Sharding:
➢ Easy to understand.
➢ Each shard gets a range of rows.
➢ Oft-times some shards are “hot.”
➢ Hot shards are split into separate shards.
➢ Cold shards are joined into a single shard.
➢ Juggling shard load is a frequent process.

Typically the best solution. Shortcomings have known work-arounds.
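A rough sketch of range-based routing, assuming integer row ids. The class and shard names are hypothetical; the sketch only tracks which shard owns which range and does not move any rows.

```python
import bisect

class RangeShardMap:
    """Maps integer row ids to shards by contiguous, non-overlapping ranges."""

    def __init__(self, min_id, max_id, shard):
        self.max_id = max_id
        self.starts = [min_id]   # sorted lower bound of each range
        self.shards = [shard]    # shards[i] owns ids from starts[i] up to the next start

    def shard_for(self, row_id):
        if not self.starts[0] <= row_id <= self.max_id:
            raise KeyError(row_id)
        return self.shards[bisect.bisect_right(self.starts, row_id) - 1]

    def split(self, split_id, new_shard):
        """Hand ids >= split_id (within the range containing it) to new_shard."""
        if not self.starts[0] < split_id <= self.max_id:
            raise ValueError("split point outside the mapped range")
        i = bisect.bisect_right(self.starts, split_id)
        if self.starts[i - 1] == split_id:
            raise ValueError("a range already starts at this id")
        self.starts.insert(i, split_id)
        self.shards.insert(i, new_shard)

# Usage: one shard covering everything, then a hot range is split off.
shard_map = RangeShardMap(1, 999999999, "shard-a")
shard_map.split(300001, "shard-b")
```

Splitting a hot shard is a small change to the map, which is part of why the slide calls this method easy to understand; the expensive part is copying the rows.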

Page 12: Methods of Sharding MySQL

Methods of Sharding MySQL
Row-based Sharding Choices

Modulus/Hash-based Sharding:
➢ Row key is hashed to an integer, taken modulo the number of shards, then placed on that shard.
➢ Only rarely are some shards “hot.”
➢ Shard splitting is difficult to implement.

Also a common method of sharding. We hope not to split shards often (or ever).

When we do, it's a multi-week process.
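To see why splitting is painful under this scheme, here is a small sketch. The choice of CRC32 as the hash and the `user:N` key format are assumptions for illustration only.

```python
import zlib

def shard_for(key: str, num_shards: int) -> int:
    """Hash the row key to an integer, then take it modulo the shard count."""
    return zlib.crc32(key.encode("utf-8")) % num_shards

def fraction_moved(keys, old_count, new_count):
    """Fraction of keys whose shard changes when the shard count changes."""
    moved = sum(
        1 for k in keys if shard_for(k, old_count) != shard_for(k, new_count)
    )
    return moved / len(keys)

keys = [f"user:{i}" for i in range(10_000)]
print(fraction_moved(keys, 4, 5))  # adding a fifth shard relocates most keys
```

Because the modulus changes for every key at once, growing from 4 to 5 shards relocates most rows rather than only the rows of one hot shard, which is consistent with the multi-week process mentioned above.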

Page 13: Methods of Sharding MySQL

Methods of Sharding MySQL
Row-based Sharding Choices

Lookup Table-based Sharding:
➢ Easy to understand.
➢ Row key mapped to shard in a lookup table.
➢ Easy to move load off hot shards.
➢ Lookup table method is problematic:
    ➢ Single point of failure.
    ➢ Performance bottleneck.
    ➢ Billions of rows, itself may need sharding.
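A minimal sketch of the lookup-table idea, with a Python dict standing in for the table. The class name, the default-shard fallback and the shard names are all hypothetical.

```python
class LookupShardMap:
    """Row key -> shard name, held in a lookup table (a dict in this sketch)."""

    def __init__(self, default_shard):
        # Assumption: keys without an explicit entry live on a default shard.
        self.default_shard = default_shard
        self.table = {}

    def shard_for(self, key):
        return self.table.get(key, self.default_shard)

    def move(self, key, new_shard):
        """Repoint one key; the rows themselves must be copied separately."""
        self.table[key] = new_shard

# Usage: relieve a hot shard by moving one heavy user elsewhere.
lookup = LookupShardMap(default_shard="shard-a")
lookup.move("user:42", "shard-b")
```

Every read and write consults this table first, which is where the single-point-of-failure and bottleneck concerns on the slide come from.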

Page 14: Methods of Sharding MySQL

Prerequisite: Build a Large Cluster
Allocating the Hardware

Getting Hardware – your own company's:
➢ Can be politically-charged.
➢ Get a small batch first.
➢ Build small demonstration cluster.
➢ Get everyone on-board with the demo.

Renting/Leasing Hardware – the Cloud:
➢ Allocate hardware in EC2 or elsewhere.
➢ Usually easier, but possibly harder admin:
    ➢ Hardware failure more common.
    ➢ Hardware/network flakiness more common.

Page 15: Methods of Sharding MySQL

Prerequisite: Build a Large Cluster
Building the Cluster

Okay, I've got the hardware. What next?

Page 16: Methods of Sharding MySQL

Prerequisite: Build a Large Cluster
Building the Cluster

Configuring the Hardware. The old dilemma:
➢ Spend days to install/configure DB software?
    Subsequent management is painful.
➢ Use SSH in “for” loops?
    Rolling your own configuration management tools is a lot of work.
➢ Learn a configuration management tool?
    Obvious choice in 2012. Well-documented tools like Chef, Puppet, Ansible.

Page 17: Methods of Sharding MySQL

Configuration Management Tools
My Experience

Puppet: 6 years ago at Digg
➢ Manage/Deploy hundreds of servers.
➢ Painful, but not as bad as hand-coding it all.

Chef: 2 years ago at Drawn to Scale and Riot
➢ Manage/Deploy dozens of servers.
➢ Learning Ruby is a “joy” of its own.

Ansible: 6 months ago at Palomino
➢ Manage/Deploy dozens of servers.
➢ First Palomino Cluster Tool subset built.

Page 18: Methods of Sharding MySQL

Prerequisite: Build a Large Cluster
Configuration Management Options

Pick your Configuration Management:
➢ Chef: Popular, use Ruby to “code your infrastructure.” Must learn Ruby.
➢ Puppet: Mature, use data structures to “define your infrastructure.” Less coding.
➢ Ansible: Tiny and modular, similar to Puppet, but with ordering for deployment. Pragmatic.

Write/Get Recipes, Manifests, Playbooks?
➢ Writing is tedious. Can take >1 week.
➢ Get from internet? Often incomplete.

Page 19: Methods of Sharding MySQL

Prerequisite: Build a Large Cluster
The Palomino Cluster Tool

Palomino's tool for building large DB clusters:
➢ Chef, Puppet, Ansible modules.
➢ Open-source on Github.
    ➢ https://github.com/time-palominodb/PalominoClusterTool
    ➢ Google: “Palomino Cluster Tool.”
➢ Will build a large cluster for you in hours:
    ➢ Master(s)
    ➢ Slaves – hundreds of them as easy as two.
    ➢ MHA – when master fails, a slave takes over.
➢ Previously this would take days.

Page 20: Methods of Sharding MySQL

The Palomino Cluster Tool
Building the Management Node

Cluster Management Node:
➢ Will build the initial cluster.
➢ Will do subsequent cluster management.

Tool for Initial Cluster Build:
➢ Palomino Cluster Tool (Ansible subset).

Tool for Cluster Management:
➢ Jetpants (Ruby).

Page 21: Methods of Sharding MySQL

The Palomino Cluster Tool
Building the Management Node

Palomino Cluster Tool (Ansible subset).

Why Ansible?
➢ No server to set up, simply uses SSH.
➢ Easy-to-understand non-code Playbooks.
➢ Use a language you know for modules.
➢ For demo purposes, obvious choice.
➢ Also production-worthy:
    ➢ Built by Michael DeHaan, long-time configuration management guru.

Page 22: Methods of Sharding MySQL

The Palomino Cluster Tool
Building the Management Node

Management node lives alongside your cluster.
➢ We are building our cluster in EC2.
➢ Thus management node in EC2.
➢ This tutorial assumes Ubuntu 12.04.
➢ t1.micro is fine for management node.

Install basic tools:
➢ apt-get install git (for Ansible/P.C.T.)
➢ apt-get install make python-jinja2 (for Ansible)

Page 23: Methods of Sharding MySQL

The Palomino Cluster Tool
Configuring the Management Node

Install Ansible:
➢ git clone git://github.com/ansible/ansible.git
➢ cd ansible
➢ make install

Install Palomino Cluster Tool:
➢ git clone git://github.com/time-palominodb/PalominoClusterTool.git

I think we just finished the management node!

Page 24: Methods of Sharding MySQL

The Palomino Cluster Tool
Allocating Shard Nodes

Shard nodes:
➢ m1.small or larger: at least 1.6GB RAM,
➢ :3306, :80, and :22 open between all (one security group in EC2),
➢ Ubuntu 12.04 (other Debian-alikes at your own risk – but may work!).

Do not need OS/database configuration:
➢ Ansible will configure them.

Page 25: Methods of Sharding MySQL

The Palomino Cluster Tool
Building the First Shard – Step 1

From README: edit IP addresses in cluster layout file (PalominoClusterToolLayout.ini):

# Alerting/Trending -----
[alertmaster]
10.252.157.110
[trendmaster]
10.252.157.110

# Servers -----
[mhamanager]
10.252.157.110

This section is identical for all Shards.

Page 26: Methods of Sharding MySQL

The Palomino Cluster Tool
Building the First Shard – Step 2

From README: edit IP addresses in cluster layout file (PalominoClusterToolLayout.ini):

[mysqlmasters]
10.244.17.6

[mysqlslaves]
10.244.26.199
10.244.18.178

[mysqls:vars]
master_host=10.244.17.6

This section is different for every Shard.

Page 27: Methods of Sharding MySQL

The Palomino Cluster Tool
Building the First Shard – Step 3

Run setup command to put configuration and SSH keys into /etc:

$ cd PalominoClusterTool/AnsiblePlaybooks/Ubuntu-12.04
$ ./00-Setup_PalominoClusterTool.sh ShardA

Run build command – it's a wrapper around Ansible Playbooks:

$ ./10-MySQL_MHA_Manager.sh ShardA

Page 28: Methods of Sharding MySQL

The Palomino Cluster Tool
Building the Second Shard

Just make one shard with a master and many slaves. In real life, you might do something like this instead:

for i in ShardB ShardC ShardD ; do
    # (manual step):
    vim PalominoClusterToolLayout.ini
    # (scriptable steps):
    ./00-Setup_PalominoClusterTool.sh $i
    ./10-MySQL_MHA_Manager.sh $i
done

Run them in separate terminals to save time.

Page 29: Methods of Sharding MySQL

Make the Cluster Real
Data makes Shard Split Interesting

Fill ShardA using random data script.*

Palomino Cluster Tool includes such a tool.
➢ HelperScripts/makeGiantDatafile.pl

$ ssh root@sharda-master
# cd PalominoClusterTool/HelperScripts
# mysql -e 'create database palomino'
# ./makeGiantDatafile.pl 1200000 3 | mysql -f palomino

Install Jetpants, do shard split now.

* Be sure /var/lib/mysql is on large partition!

Page 30: Methods of Sharding MySQL

Administering the Cluster
Install Jetpants

General idea: Install Ruby >=1.9.2 and RubyGems, then Jetpants via RubyGems.

On my systems, /etc/alternatives is always incorrect, so symlink (ln) the proper binaries for Jetpants.

# apt-get install ruby1.9.3 rubygems libmysqlclient-dev
# ln -sf /usr/bin/ruby1.9.3 /etc/alternatives/ruby
# ln -sf /usr/bin/gem1.9.3 /etc/alternatives/gem
# gem install jetpants

Page 31: Methods of Sharding MySQL

Administering the Cluster
Configure Jetpants

General idea: edit /etc/jetpants.yaml, then create the Jetpants inventory and application configuration files and chown them to the Jetpants user:

# vim /etc/jetpants.yaml
# mkdir -p /var/jetpants
# touch /var/jetpants/assets.json
# chown jetpantsusr: /var/jetpants/assets.json
# mkdir -p /var/www
# touch /var/www/databases.yaml
# chown jetpantsusr: /var/www/databases.yaml

Page 32: Methods of Sharding MySQL

Administering the Cluster
Jetpants Shard Splits

Tell Jetpants Console about your ShardA:

Jetpants> s = Shard.new(1, 999999999, '10.12.34.56', :ready) #10.12.34.56==ShardA master
Jetpants> s.sync_configuration

Create spares within Console for all others (improved workflow in Jetpants 0.7.8):

Jetpants> topology.tracker.spares << '10.23.45.67'
Jetpants> topology.tracker.spares << '10.23.45.68'
Jetpants> topology.tracker.spares << '10.23.45.69'
Jetpants> topology.write_config
Jetpants> topology.update_tracker_data

Page 33: Methods of Sharding MySQL

Administering the Cluster
Jetpants Shard Splits

Just for this tutorial:
➢ Create the “palomino” database,
➢ Break the replication on all the spares,
➢ Be sure spares are read/write:
    ➢ Edit my.cnf,
    ➢ service mysql restart
➢ Ensure “jetpants pools” output is as expected:
    ➢ One master,
    ➢ Two slaves.

Page 34: Methods of Sharding MySQL

Administering the Cluster
Jetpants Shard Splits

How to perform an actual Shard Split:

$ jetpants shard_split --min-id=1 --max-id=999999999

Notes:
➢ Process takes hours. Use screen or nohup.
➢ LeftID == parent's first, RightID == parent's last, no overlap/gap.
➢ Make children 1-300000,300001-999999999.
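The rule in the notes (children start at the parent's first id, end at its last, with no overlap or gap) can be checked before starting a multi-hour split. The helper below is a hypothetical pre-flight check written for this transcript; it is not part of Jetpants.

```python
def check_children(parent, children):
    """Return the children sorted, or raise ValueError if they do not tile the parent.

    parent is a (min_id, max_id) tuple; children is a list of such tuples.
    """
    ordered = sorted(children)
    if not ordered:
        raise ValueError("no child ranges given")
    for lo, hi in ordered:
        if lo > hi:
            raise ValueError(f"empty range {lo}-{hi}")
    if ordered[0][0] != parent[0]:
        raise ValueError("first child must start at the parent's min id")
    if ordered[-1][1] != parent[1]:
        raise ValueError("last child must end at the parent's max id")
    for (_, prev_hi), (next_lo, _) in zip(ordered, ordered[1:]):
        if next_lo != prev_hi + 1:
            kind = "overlap" if next_lo <= prev_hi else "gap"
            raise ValueError(f"{kind} between {prev_hi} and {next_lo}")
    return ordered

# Usage with the ranges from the notes above.
check_children((1, 999999999), [(1, 300000), (300001, 999999999)])
```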

Page 35: Methods of Sharding MySQL

Jetpants Shard Splitting
The Gory Details

After “jetpants shard_split”:

ubuntu@ip-10-252-157-110:~$ jetpants pools
shard-1-999999999 [3GB]
    master = 10.244.136.107 ip-10-244-136-107
    standby slave 1 = 10.244.143.195 ip-10-244-143-195
    standby slave 2 = 10.244.31.91 ip-10-244-31-91
shard-1-400000 (state: replicating) [2GB]
    master = 10.244.144.183 ip-10-244-144-183
shard-400001-999999999 (state: replicating) [1GB]
    master = 10.244.146.27 ip-10-244-146-27

0 global pools
3 shard pools
---- --------------
3 total pools

3 masters
0 active slaves
2 standby slaves
0 backup slaves
---- --------------
5 total nodes

Page 36: Methods of Sharding MySQL

Jetpants Improvements
The Result of an Experiment

Jetpants only well-tested on RHEL/CentOS.

Palomino Cluster Tool only well-tested to build Ubuntu 12.04 clusters.

Little effort to fix Jetpants:
➢ /sbin/service location different,
➢ service mysql status output different.

Page 37: Methods of Sharding MySQL

Jetpants Improvements
The Result of an Experiment

Jetpants only well-tested on MySQL 5.1.

I built a cluster of MySQL 5.5.

A little more effort to fix Jetpants:
➢ Set master_host=' ' is a syntax error,
➢ reset slave needs keyword “all” appended.

Page 38: Methods of Sharding MySQL

Jetpants Improvements
The Result of an Experiment

Jetpants only well-tested on large datasets.

I built a cluster with only hundreds of MB.

A wee tad more effort to fix Jetpants:
➢ Some timings assumed large datasets,
➢ Edge cases for small/quick operations reported back to the author.

Page 39: Methods of Sharding MySQL

Jetpants Improvements
OSS Collaboration and Win

Evan Elias implemented these fixes last week!
➢ jetpants add_pool,
➢ jetpants add_shard,
➢ jetpants add_spare (with sanity-check spare),
➢ Shards with 1 slave (not for prod!),
➢ read_only spares not fatal,
➢ Debian-alike (Ubuntu) fixes,
➢ MySQL 5.5 fixes,
➢ Mid-split Jetpants pools output simpler.

Really responsive ownership of project!

Page 40: Methods of Sharding MySQL

Twitter's Gizzard
What is it?

General Framework for distributed database.
➢ Hides sharding from you.
➢ Literally, it is middleware.
    ➢ Applications connect to Gizzard,
    ➢ Gizzard sends connections to proper place,
    ➢ Shard splits and hardware failure taken care of.
➢ Created at Twitter by rogue cowboys.
➢ Not completely production-ready.
    ➢ Better than rolling your own!

Page 41: Methods of Sharding MySQL

Twitter's Gizzard
Why should I use it?

You've settled on row-based partition scheme:
➢ Master nearing I/O capacity, won't scale up,
➢ Can't move some tables to their own pool,
➢ Can't split the columns/indexes out,
➢ You want to keep using the DBMS you already know and love: Percona Server.*
➢ Don't want to think about fault-tolerance or shard splits (much).

* Actually use any storage back-end.

Page 42: Methods of Sharding MySQL

Twitter's Gizzard
The Fine Print

This sounds perfect. Why not Gizzard?

Writes must follow strict diet. Must be:
➢ Idempotent*,
➢ Commutative**,
➢ Must not have tuberculosis.

* Pfizer cannot remove the idempotency requirement of Gizzard.
** Even on evenings and weekends.

Page 43: Methods of Sharding MySQL

Twitter's Gizzard
Expanding the Fine Print

Idempotency:
➢ Submit a write. Again. And again.
➢ Must be identical to doing it once.
➢ Bad: “update set col = col + 1”

Commutative – writes in arbitrary order:
➢ WriteA→WriteB→WriteC on Node1.
➢ WriteB→WriteC→WriteA on Node2.
➢ Bad: “update set col1 = 42”→“update set col2 = col1 + 5”
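The two "Bad" examples can be checked mechanically. In this sketch a row is modelled as a plain dict and each write as a function; that modelling, and the helper names, are assumptions for illustration and have nothing to do with Gizzard's own API. The checks only cover the starting row they are given.

```python
from itertools import permutations

def increment(row):      # "update set col = col + 1"
    return {**row, "col": row["col"] + 1}

def set_col(row):        # "update set col = 42" (an absolute write, for contrast)
    return {**row, "col": 42}

def set_col1(row):       # "update set col1 = 42"
    return {**row, "col1": 42}

def derive_col2(row):    # "update set col2 = col1 + 5"
    return {**row, "col2": row["col1"] + 5}

def is_idempotent(write, row):
    """Applying the write twice must equal applying it once."""
    once = write(row)
    return write(once) == once

def is_commutative(writes, row):
    """Every ordering of the writes must leave the row in the same state."""
    results = []
    for order in permutations(writes):
        state = row
        for write in order:
            state = write(state)
        results.append(state)
    return all(result == results[0] for result in results)

row = {"col": 0, "col1": 0, "col2": 0}
print(is_idempotent(increment, row))                 # False: a replay adds 1 again
print(is_commutative([set_col1, derive_col2], row))  # False: col2 depends on order
```

Note that `increment` commutes with itself but is not idempotent, so the two requirements are independent.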

Page 44: Methods of Sharding MySQL

Twitter's Gizzard
Expanding the Fine Print

Cluster is Eventually Consistent:
➢ May return old values for reads.
➢ Unknown when consistency will occur.

Like a politician's position on the budget:
➢ Might be consistent in the future.
➢ Just not right now.
➢ Or now.

Page 45: Methods of Sharding MySQL

Twitter's Gizzard
Working Around the Shortcomings

Gizzard work-around:
➢ Add timestamp to every transaction.
➢ Good:
    ➢ “col1.ts=1; update set col1=42” →
    ➢ “update set col2=col1 + 5 where col1.ts=1”
➢ Implementation trickier if DBMS doesn't support column attributes.

Cannot escape: must radically re-think schema and application/DBMS interaction.
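One way to read the timestamp work-around is as a last-writer-wins rule per column: each stored value carries the timestamp of the write that produced it, and an incoming write is applied only if it is newer. The sketch below illustrates that general technique under stated assumptions (a row is a dict of column -> (value, timestamp), and timestamps are unique per column). It is not Gizzard's actual mechanism or API.

```python
def apply_write(row, column, value, ts):
    """Apply a timestamped write; keep the stored cell if it is as new or newer."""
    current = row.get(column)
    if current is not None and current[1] >= ts:
        return row                       # replayed or stale write: no-op
    return {**row, column: (value, ts)}

def replay(writes, row=None):
    """Apply a sequence of (column, value, ts) writes in the given order."""
    state = row or {}
    for column, value, ts in writes:
        state = apply_write(state, column, value, ts)
    return state

w1 = ("col1", 42, 1)
w2 = ("col1", 7, 2)
# Same final state regardless of arrival order or duplicate delivery.
print(replay([w1, w2]) == replay([w2, w1]) == replay([w1, w2, w1, w2]))
```

A dependent write such as "col2 = col1 + 5" would still have to be computed by the application from a specific version of col1 and then sent as an absolute, timestamped value, which is part of the schema re-think the slide warns about.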

Page 46: Methods of Sharding MySQL

Twitter's Gizzard
Trying it Out

I'm convinced! How do I begin?
➢ Learn Scala.
➢ Clone “rowz” from Github.
    ➢ https://github.com/twitter/Rowz
➢ Modify it to suit your needs.
➢ Learn how it interacts with existing tools.
➢ Write new monitoring/alerting plugins.
➢ Write unit tests!
➢ You should OSS it to help with overhead.

Page 47: Methods of Sharding MySQL

Twitter's Gizzard
Trying it Out

Sounds daunting. Maybe I'll roll my own?

Learn from others' mistakes:
➢ Digg: 2 engineers 6 months. Code thrown away. Digg out of business.
➢ Countless identical stories in Silicon Valley.

NIHS attitude == Go out of business*.

* 8-figure R&D budgets excepted.

Page 48: Methods of Sharding MySQL

Youtube's Vitess/Vtocc
What is it?

Vitess is a library. Vtocc is an implementation using it.

Vtocc is another middleware solution.
➢ Sharding,
➢ Caching,
➢ Connection-pooling,
➢ In-use at Youtube,
➢ Built-in fail-safe features.

Page 49: Methods of Sharding MySQL

Youtube's Vtocc
Why use it?

Proven high-volume sharding solution.

Interesting feature-list:
➢ Auto query/transaction over-limit killing.
➢ Better query-cache implementation.
➢ Query comment-stripping for query cache.
➢ Query consolidation.
➢ Zero downtime restarts.

Less coding than Gizzard (more plug-in).

Page 50: Methods of Sharding MySQL

Youtube's Vtocc
Hold on, Zero Downtime Restarts?

Just start new Vtocc instance.
➢ Instance1 passes new requests to Instance2,
➢ Instance1's connections get 30s to complete,
➢ Instance2 kills Instance1 and takes over.

[Diagram: Vtocc Instance 1 handing over to Vtocc Instance 2.]

Page 51: Methods of Sharding MySQL

Youtube's Vtocc
The Fine Print

Requires Particular Primary Keys:
➢ varbinary datatype,
➢ Choose carefully to prevent hot-spots.

Max result-set size: larger resultsets fail.

Additional administration burden:
➢ “My query was killed. Why?”
➢ Middleware adds spooky hard-to-diagnose failure modes.

Page 52: Methods of Sharding MySQL

Youtube's Vtocc
Implementation Details

➢ Run Vtocc on same server as MySQL.
➢ Configure Vtocc fail-safes for expected load:
    ➢ Pool Size (connection count),
    ➢ Max Transactions (has own connection pool),
    ➢ Query Timeout (before killed),
    ➢ Transaction Timeout (before killed),
    ➢ Max Resultset Size in rows
        ➢ Go language doesn't free allocated memory, so pick this value carefully.
➢ More details: http://code.google.com/p/vitess/wiki/Operations

Page 53: Methods of Sharding MySQL

HAproxy
Re-thinking Proxy Topology

Old-school Proxy Topology:
➢ DB Clients on one side,
➢ DB Servers on the other,
➢ Proxy in-between.

[Diagram: the proxy in the middle is a Single Point of Failure.]

Page 54: Methods of Sharding MySQL

HAproxy
Re-thinking Proxy Topology

Free proxy provides new architecture option:
➢ Proxy on every DB client node.
➢ Good-bye single-point-of-failure.
➢ Hello configuration management for proxy.

[Diagram: an HAproxy instance on each DB client node.]

Page 55: Methods of Sharding MySQL

Methods of Sharding MySQL
Q&A

Questions? Suggestions:
➢ Interesting stuff. Got a job for me?
➢ Well I got a job for you. Interested?
➢ Warn me next time so I can sleep in the back row.
➢ Was that a question?

Thank you! Emails to domain palominodb, username time. Percona Live 2012 in New York City. Enjoy the rest of the show!