The Art of Database Sharding Maxym Kharchenko Amazon.com.
-
Upload
felicity-barker -
Category
Documents
-
view
233 -
download
2
Transcript of The Art of Database Sharding Maxym Kharchenko Amazon.com.
Whoami
• Started as a database kernel developer– Network database: db_VISTA
• ORACLE DBA for ~ 10-12 years– Starting with ORACLE 8
• Last 3 years: Sr. Persistence Engineer @Amazon.com
• OCM, ORACLE Ace Associate
• Blog: http://intermediatesql.com• Twitter: @maxymkh
Agenda
• The “big data” scaling problem
• Solving scaling with “sharding”
• Practical sharding
• Your sharding experience: Good and bad
CREATE TABLE books ( id number PRIMARY KEY, title varchar2(200), author varchar2(200)
) SHARD BY <method> (<shard_key>) ( SPLIT SIZE evenly SPLIT LOAD evenly DISCOURAGE CROSS SHARD ACCESS DISCOURAGE DATA MOVE USING 4 DATABASES);
Sharding
Split load evenly
SHARD BY RANGE (id) ( SPLIT SIZE evenly SPLIT LOAD evenly);
1-100 101-200 201-300 301-400
Discourage cross shard access
SHARD BY HASH (id) ( DISCOURAGE CROSS SHARD ACCESS);
SELECT title FROM booksWHERE id = 34567876;
Discourage cross shard access
SHARD BY HASH (id) ( DISCOURAGE CROSS SHARD ACCESS);
SELECT title FROM booksWHERE author = 'Isaac Asimov'ORDER BY title;
Discourage cross shard access
SHARD BY HASH (author) ( DISCOURAGE CROSS SHARD ACCESS);
0 1 2 3
SELECT title FROM booksWHERE author = 'Isaac Asimov'ORDER BY title;
ReshardingHash Mod/4
1 12 23 34 05 16 27 38 09 110 211 312 0
Hash Mod/4 Mod/61 1 12 2 23 3 34 0 45 1 56 2 07 3 18 0 29 1 310 2 411 3 512 0 0
Physical and Logical shards
SHARD BY mod(hash(author), 1200) ( DISCOURAGE DATA MOVE);
DB 1 DB 2 DB 3 DB 4
Executing queriesdef shard_query(sql, binds, shard_key): """ Execute query in the correct db """
shard_hash = hash(shard_key) logical_bucket = mod(shard_hash, TOTAL_BUCKETS) physical_db = memcached_get_db(logical_bucket) execute_query(physical_db, sql, binds)
SELECT title FROM booksWHERE author = 'Isaac Asimov'ORDER BY title;
Implementing Shards: Standbys
Unsharded StandbyShard 1 Shard 2
Apps
Read Only
Drop non-qualifying data Drop non-qualifying data
Implementing Shards: Tables
Shard1
Apps
TabA
Shard 2
MVA
TabA
Create materialized view … as select …from a@shard1
Dropmaterialized view … preserve table
Read Only
Implementing Shards:Moving “data head”
Shard 1
Apps
Shard 2
Logical Shard
Physical Shard
(1,2,3,4) 1(5,6,7,8) 2
Time Logical Shard
Physical Shard
2011(1,2,3,4) 12011(5,6,7,8) 2
Time Logical Shard
Physical Shard
2011(1,2,3,4) 12011(5,6,7,8) 22012(1,2) 12012(3,4) 32012(5,6) 22012(7,8) 4
Shard 3 Shard 4
Why shards are awesome
• (potentially) Unlimited scaling
• Local ACID + relational
• Better maintenance
• Eggs not in one basket
• “Apples to apples comparison” with other shards
Why shards are NOT so great
• More systems– Power, rack space etc– Needs automation … bad– More likely to fail overall
• Some operations become difficult:– Transactions across shards– Foreign keys across shards
• More work:– Applications, developers, DBAs– High skill, DIY everything
Thank you!
[email protected]: maxymkh@Blog: http://intermediatesql.com