The Artof Database Sharding
Maxym Kharchenko
Amazon.com
Whoami
• Started as a database kernel developer– Network database: db_VISTA
• ORACLE DBA for ~ 10-12 years– Starting with ORACLE 8
• Last 3 years: Sr. Persistence Engineer @Amazon.com
• OCM, ORACLE Ace Associate
• Blog: http://intermediatesql.com• Twitter: @maxymkh
Agenda
• The “big data” scaling problem
• Solving scaling with “sharding”
• Practical sharding
• Your sharding experience: Good and bad
How to scale a database
Old System
New SystemProblem
2013 2014 2015 2016 2017
The Big Data problem
Vertical Scaling
Scaling Up …
Scaling Up …
Scaled!
“Scaling up” math:System capabilities
2+2=3
“Scaling up” math:System cost
2+2=7
Scale out, not up
Use lots of cheap machines
Not bigger machines
Commodity hardware
=
$$$$$ $$
Distributed System
Distributed System
Distributed System
Distributed computing is hard
Shared Nothing (“Sharded”) System
Sharding is (relatively) easy
Split your datainto small independent chunks
And run each chunkon cheap commodity hardware
How to split your data
Data
How to split your data
How to split your data
How to split your data
How to split your data
Vertical Partitioning
Vertical Partitioning
Vertical Partitioning
Horizontal Partitioning
Sharding
Sharding
CREATE TABLE books ( id number PRIMARY KEY, title varchar2(200), author varchar2(200));
CREATE TABLE books ( id number PRIMARY KEY, title varchar2(200), author varchar2(200)
) SHARD BY <method> (<shard_key>) ( SPLIT SIZE evenly SPLIT LOAD evenly DISCOURAGE CROSS SHARD ACCESS DISCOURAGE DATA MOVE USING 4 DATABASES);
Sharding
Split size evenly
SHARD BY LIST ( first_letter(author) ) ( SPLIT SIZE evenly);
A-G H-M N-TU-Z
Split load evenly
SHARD BY RANGE (id) ( SPLIT SIZE evenly SPLIT LOAD evenly);
1-100 101-200 201-300 301-400
Split load evenly
SHARD BY HASH (id) ( SPLIT SIZE evenly SPLIT LOAD evenly);
0 1 2 3
Discourage cross shard access
SHARD BY HASH (id) ( DISCOURAGE CROSS SHARD ACCESS);
SELECT title FROM booksWHERE id = 34567876;
Discourage cross shard access
SHARD BY HASH (id) ( DISCOURAGE CROSS SHARD ACCESS);
SELECT title FROM booksWHERE author = 'Isaac Asimov'ORDER BY title;
Discourage cross shard access
SHARD BY HASH (author) ( DISCOURAGE CROSS SHARD ACCESS);
0 1 2 3
SELECT title FROM booksWHERE author = 'Isaac Asimov'ORDER BY title;
Discourage data move
SHARD BY mod(hash(author), 4) ( DISCOURAGE DATA MOVE);
0 1 2 3
Discourage data move
SHARD BY mod(hash_function(author), 6) ( DISCOURAGE DATA MOVE);
0 1 2 3
4 5
ReshardingHash Mod/4
1 12 23 34 05 16 27 38 09 110 211 312 0
Hash Mod/4 Mod/61 1 12 2 23 3 34 0 45 1 56 2 07 3 18 0 29 1 310 2 411 3 512 0 0
Physical and Logical shards
SHARD BY mod(hash(author), 1200) ( DISCOURAGE DATA MOVE);
DB 1 DB 2 DB 3 DB 4
Executing queriesdef shard_query(sql, binds, shard_key): """ Execute query in the correct db """
shard_hash = hash(shard_key) logical_bucket = mod(shard_hash, TOTAL_BUCKETS) physical_db = memcached_get_db(logical_bucket) execute_query(physical_db, sql, binds)
SELECT title FROM booksWHERE author = 'Isaac Asimov'ORDER BY title;
Implementing Shards: Standbys
Unsharded StandbyShard 1 Shard 2
Apps
Read Only
Drop non-qualifying data Drop non-qualifying data
Implementing Shards: Tables
Shard1
Apps
TabA
Shard 2
MVA
TabA
Create materialized view … as select …from a@shard1
Dropmaterialized view … preserve table
Read Only
Implementing Shards:Moving “data head”
Shard 1
Apps
Shard 2
Logical Shard
Physical Shard
(1,2,3,4) 1(5,6,7,8) 2
Time Logical Shard
Physical Shard
2011(1,2,3,4) 12011(5,6,7,8) 2
Time Logical Shard
Physical Shard
2011(1,2,3,4) 12011(5,6,7,8) 22012(1,2) 12012(3,4) 32012(5,6) 22012(7,8) 4
Shard 3 Shard 4
Data protection
Shard 1 Shard 2 Shard 4Shard 3
Stb 1 Stb 2 Stb 4Stb 3
App App
Why shards are awesome
• (potentially) Unlimited scaling
• Local ACID + relational
• Better maintenance
• Eggs not in one basket
• “Apples to apples comparison” with other shards
Why shards are NOT so great
• More systems– Power, rack space etc– Needs automation … bad– More likely to fail overall
• Some operations become difficult:– Transactions across shards– Foreign keys across shards
• More work:– Applications, developers, DBAs– High skill, DIY everything
Takeaways
More > Bigger
ORACLE is still cool
Thank you!
[email protected]: maxymkh@Blog: http://intermediatesql.com
Top Related