The Art of Database Sharding

The Art of Database Sharding Maxym Kharchenko Amazon.com


Transcript of The Art of Database Sharding

Page 1: The Art of Database Sharding

The Art of Database Sharding

Maxym Kharchenko

Amazon.com

Page 2: The Art of Database Sharding

Whoami

• Started as a database kernel developer
  – Network database: db_VISTA

• ORACLE DBA for ~ 10-12 years
  – Starting with ORACLE 8

• Last 3 years: Sr. Persistence Engineer @ Amazon.com

• OCM, ORACLE Ace Associate

• Blog: http://intermediatesql.com

• Twitter: @maxymkh

Page 3: The Art of Database Sharding

Agenda

• The “big data” scaling problem

• Solving scaling with “sharding”

• Practical sharding

• Your sharding experience: Good and bad

Page 4: The Art of Database Sharding

How to scale a database

[Diagram: timeline from 2013 to 2017 with the labels "Old System", "Problem", and "New System"]

Page 5: The Art of Database Sharding

The Big Data problem

Page 6: The Art of Database Sharding

Vertical Scaling

Page 7: The Art of Database Sharding

Scaling Up …

Page 8: The Art of Database Sharding

Scaling Up …

Page 9: The Art of Database Sharding

Scaled!

Page 10: The Art of Database Sharding

“Scaling up” math: System capabilities

2+2=3

Page 11: The Art of Database Sharding

“Scaling up” math: System cost

2+2=7

Page 12: The Art of Database Sharding

Scale out, not up

Page 13: The Art of Database Sharding

Use lots of cheap machines

Not bigger machines

Page 14: The Art of Database Sharding

Commodity hardware

=

$$$$$ $$

Page 15: The Art of Database Sharding

Distributed System

Page 16: The Art of Database Sharding

Distributed System

Page 17: The Art of Database Sharding

Distributed System

Page 18: The Art of Database Sharding

Distributed computing is hard

Page 19: The Art of Database Sharding

Shared Nothing (“Sharded”) System

Page 20: The Art of Database Sharding

Sharding is (relatively) easy

Page 21: The Art of Database Sharding

Split your data into small independent chunks

And run each chunk on cheap commodity hardware

Page 22: The Art of Database Sharding

How to split your data

Data

Page 23: The Art of Database Sharding

How to split your data

Page 24: The Art of Database Sharding

How to split your data

Page 25: The Art of Database Sharding

How to split your data

Page 26: The Art of Database Sharding

How to split your data

Page 27: The Art of Database Sharding

Vertical Partitioning

Page 28: The Art of Database Sharding

Vertical Partitioning

Page 29: The Art of Database Sharding

Vertical Partitioning

Page 30: The Art of Database Sharding

Horizontal Partitioning

Page 31: The Art of Database Sharding

Sharding

Page 32: The Art of Database Sharding

Sharding

CREATE TABLE books (
  id number PRIMARY KEY,
  title varchar2(200),
  author varchar2(200)
);

Page 33: The Art of Database Sharding

Sharding

CREATE TABLE books (
  id number PRIMARY KEY,
  title varchar2(200),
  author varchar2(200)
) SHARD BY <method> (<shard_key>) (
  SPLIT SIZE evenly
  SPLIT LOAD evenly
  DISCOURAGE CROSS SHARD ACCESS
  DISCOURAGE DATA MOVE
  USING 4 DATABASES
);

Page 34: The Art of Database Sharding

Split size evenly

SHARD BY LIST ( first_letter(author) ) ( SPLIT SIZE evenly );

A-G   H-M   N-T   U-Z
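
The list split on this slide can be sketched in a few lines of Python. This is only an illustration: the shard names and the `shard_for_author` helper are hypothetical, and the slide's `first_letter(author)` is assumed to mean the first character of the author string as stored.

```python
# Hypothetical shard names for the A-G / H-M / N-T / U-Z split on the slide.
LETTER_RANGES = [
    ("A", "G", "shard_0"),
    ("H", "M", "shard_1"),
    ("N", "T", "shard_2"),
    ("U", "Z", "shard_3"),
]


def shard_for_author(author: str) -> str:
    """Pick a shard from the first letter of the author string."""
    first = author.strip()[:1].upper()
    for low, high, shard in LETTER_RANGES:
        if low <= first <= high:
            return shard
    raise ValueError(f"no shard configured for author {author!r}")


print(shard_for_author("Isaac Asimov"))       # first letter is "I"
print(shard_for_author("Ursula K. Le Guin"))  # first letter is "U"
```

Whether such a split is even in size depends entirely on how authors are distributed over the alphabet, which is why the list has to be tuned to the data.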

Page 35: The Art of Database Sharding

Split load evenly

SHARD BY RANGE (id) ( SPLIT SIZE evenly SPLIT LOAD evenly);

1-100 101-200 201-300 301-400
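
A minimal sketch of range routing for the four ranges above, assuming the shards are numbered 0 to 3 in range order (the numbering and the `shard_for_id` helper are not from the slides):

```python
from bisect import bisect_left

# Inclusive upper bound of each range, in shard order: 1-100, 101-200, ...
RANGE_UPPER_BOUNDS = [100, 200, 300, 400]


def shard_for_id(book_id: int) -> int:
    """Return the index of the range shard that holds book_id."""
    if book_id < 1 or book_id > RANGE_UPPER_BOUNDS[-1]:
        raise ValueError(f"id {book_id} is outside the configured ranges")
    return bisect_left(RANGE_UPPER_BOUNDS, book_id)


for book_id in (1, 100, 101, 400):
    print(book_id, "->", shard_for_id(book_id))
```

If ids are assigned sequentially, every newly created row lands in the last range, so sizes can be even while the load is not.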

Page 36: The Art of Database Sharding

Split load evenly

SHARD BY HASH (id) ( SPLIT SIZE evenly SPLIT LOAD evenly);

0 1 2 3
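
Hash routing can be sketched the same way. The choice of CRC32 here is an assumption made for the example (the slides do not name a hash function); the only requirement is that the hash is deterministic across processes and machines.

```python
import zlib
from collections import Counter

NUM_SHARDS = 4


def shard_for_id(book_id: int) -> int:
    """Map an id to one of NUM_SHARDS shards using a stable hash."""
    digest = zlib.crc32(str(book_id).encode("utf-8"))
    return digest % NUM_SHARDS


# Count how many of the ids 1..400 land on each shard.
counts = Counter(shard_for_id(i) for i in range(1, 401))
for shard in range(NUM_SHARDS):
    print(f"shard {shard}: {counts[shard]} rows")
```

Because consecutive ids scatter across shards, new rows no longer pile up on a single database.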

Page 37: The Art of Database Sharding

Discourage cross shard access

SHARD BY HASH (id) ( DISCOURAGE CROSS SHARD ACCESS);

SELECT title FROM books WHERE id = 34567876;

Page 38: The Art of Database Sharding

Discourage cross shard access

SHARD BY HASH (id) ( DISCOURAGE CROSS SHARD ACCESS);

SELECT title FROM books WHERE author = 'Isaac Asimov' ORDER BY title;

Page 39: The Art of Database Sharding

Discourage cross shard access

SHARD BY HASH (author) ( DISCOURAGE CROSS SHARD ACCESS);

0 1 2 3

SELECT title FROM books WHERE author = 'Isaac Asimov' ORDER BY title;
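
The difference between the last three slides can be shown with in-memory lists standing in for databases. Everything in this sketch is hypothetical (the sample rows, the CRC32 hash, the helper names); it only counts how many shards the author query has to visit under each shard key.

```python
import zlib

NUM_SHARDS = 4


def stable_hash(value) -> int:
    """Deterministic hash of any value's string form (CRC32, assumed)."""
    return zlib.crc32(str(value).encode("utf-8"))


def build_shards(books, shard_key):
    """Distribute rows over NUM_SHARDS lists by hashing the shard key."""
    shards = [[] for _ in range(NUM_SHARDS)]
    for book in books:
        shards[stable_hash(book[shard_key]) % NUM_SHARDS].append(book)
    return shards


def titles_by_author(shards, shard_key, author):
    """Return (sorted titles, number of shards that had to be queried)."""
    if shard_key == "author":
        # The predicate is on the shard key: exactly one shard can match.
        targets = [shards[stable_hash(author) % NUM_SHARDS]]
    else:
        # The predicate is not on the shard key: ask every shard, then merge.
        targets = shards
    titles = [b["title"] for shard in targets for b in shard
              if b["author"] == author]
    return sorted(titles), len(targets)


BOOKS = [
    {"id": 1, "title": "Foundation", "author": "Isaac Asimov"},
    {"id": 2, "title": "I, Robot", "author": "Isaac Asimov"},
    {"id": 3, "title": "Dune", "author": "Frank Herbert"},
    {"id": 4, "title": "The Caves of Steel", "author": "Isaac Asimov"},
]

by_id = build_shards(BOOKS, "id")
by_author = build_shards(BOOKS, "author")
print(titles_by_author(by_id, "id", "Isaac Asimov"))
print(titles_by_author(by_author, "author", "Isaac Asimov"))
```

Both calls return the same titles; the first touches all four shards and sorts the merged result in the application, the second touches one.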

Page 40: The Art of Database Sharding

Discourage data move

SHARD BY mod(hash(author), 4) ( DISCOURAGE DATA MOVE);

0 1 2 3

Page 41: The Art of Database Sharding

Discourage data move

SHARD BY mod(hash_function(author), 6) ( DISCOURAGE DATA MOVE);

0 1 2 3

4 5

Page 42: The Art of Database Sharding

Resharding

Hash  Mod/4
1     1
2     2
3     3
4     0
5     1
6     2
7     3
8     0
9     1
10    2
11    3
12    0

Hash  Mod/4  Mod/6
1     1      1
2     2      2
3     3      3
4     0      4
5     1      5
6     2      0
7     3      1
8     0      2
9     1      3
10    2      4
11    3      5
12    0      0
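
The cost of going from 4 to 6 shards can be checked directly for the hash values 1 to 12 used on this slide. The helper name below is made up for the example.

```python
OLD_SHARDS = 4
NEW_SHARDS = 6


def moved_hashes(hashes, old_n, new_n):
    """Hash values whose shard number changes when the modulus changes."""
    return [h for h in hashes if h % old_n != h % new_n]


hashes = range(1, 13)
moved = moved_hashes(hashes, OLD_SHARDS, NEW_SHARDS)
print(f"{len(moved)} of {len(hashes)} hash values change shard: {moved}")
```

Only 1, 2, 3 and 12 keep their shard number; the other eight values, two thirds of the data in this example, would have to be moved.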

Page 43: The Art of Database Sharding

Physical and Logical shards

SHARD BY mod(hash(author), 1200) ( DISCOURAGE DATA MOVE);

DB 1 DB 2 DB 3 DB 4
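
One way to read this slide: the key always hashes to one of 1200 logical buckets, and a separate, editable map says which physical database holds each bucket. The sketch below assumes an initial layout of 300 consecutive buckets per database and a CRC32 hash; neither detail is given in the slides.

```python
import zlib

TOTAL_BUCKETS = 1200


def logical_bucket(author: str) -> int:
    """Key -> logical bucket. This never changes when databases are added."""
    return zlib.crc32(author.encode("utf-8")) % TOTAL_BUCKETS


# Assumed initial layout: buckets 0-299 on DB 1, 300-599 on DB 2, and so on.
bucket_to_db = {b: f"DB {b // 300 + 1}" for b in range(TOTAL_BUCKETS)}


def physical_db(author: str) -> str:
    """Key -> logical bucket -> physical database."""
    return bucket_to_db[logical_bucket(author)]


def move_buckets(buckets, target_db: str) -> None:
    """Reassign whole buckets to another database (after copying the data)."""
    for bucket in buckets:
        bucket_to_db[bucket] = target_db


print(physical_db("Isaac Asimov"))
```

Adding a fifth database then becomes a matter of calling something like `move_buckets(range(240, 300), "DB 5")` for a few buckets from each existing database: only the rows in those buckets move, and the hash function and bucket count stay untouched.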

Page 44: The Art of Database Sharding

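Executing queries

import zlib

def shard_query(sql, binds, shard_key):
    """Execute the query in the database that owns shard_key."""
    # A hash that is stable across processes is assumed here (CRC32);
    # Python's built-in hash() is randomized per process for strings.
    shard_hash = zlib.crc32(str(shard_key).encode("utf-8"))
    logical_bucket = shard_hash % TOTAL_BUCKETS
    # Map the logical bucket to the physical database that currently holds it.
    physical_db = memcached_get_db(logical_bucket)
    return execute_query(physical_db, sql, binds)

SELECT title FROM books WHERE author = 'Isaac Asimov' ORDER BY title;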

Page 45: The Art of Database Sharding

Implementing Shards: Standbys

Unsharded   Standby   Shard 1   Shard 2

Apps

Read Only

Drop non-qualifying data Drop non-qualifying data

Page 46: The Art of Database Sharding

Implementing Shards: Tables

Shard 1

Apps

TabA

Shard 2

MVA

TabA

Create materialized view … as select … from a@shard1

Drop materialized view … preserve table

Read Only

Page 47: The Art of Database Sharding

Implementing Shards: Moving “data head”

Shard 1

Apps

Shard 2

Logical Shard   Physical Shard
(1,2,3,4)       1
(5,6,7,8)       2

Time   Logical Shard   Physical Shard
2011   (1,2,3,4)       1
2011   (5,6,7,8)       2

Time   Logical Shard   Physical Shard
2011   (1,2,3,4)       1
2011   (5,6,7,8)       2
2012   (1,2)           1
2012   (3,4)           3
2012   (5,6)           2
2012   (7,8)           4
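
A possible reading of the last table, sketched in Python: each row is a layout that takes effect in a given year, and a record is routed by the newest layout that is not later than its creation year. That routing rule, and the assumption that the creation year is known at query time, are interpretations rather than something the slides state.

```python
# (effective year, logical shards, physical shard), copied from the table.
LAYOUT = [
    (2011, (1, 2, 3, 4), 1),
    (2011, (5, 6, 7, 8), 2),
    (2012, (1, 2), 1),
    (2012, (3, 4), 3),
    (2012, (5, 6), 2),
    (2012, (7, 8), 4),
]


def physical_shard(created_year: int, logical_shard: int) -> int:
    """Find the physical shard for a record created in created_year."""
    candidates = [
        (year, physical)
        for year, logicals, physical in LAYOUT
        if year <= created_year and logical_shard in logicals
    ]
    if not candidates:
        raise LookupError(
            f"no layout for logical shard {logical_shard} in {created_year}"
        )
    # The entry with the latest effective year wins.
    return max(candidates)[1]


print(physical_shard(2011, 3))  # data written in 2011 stays where it was
print(physical_shard(2012, 3))  # data written from 2012 on uses Shard 3
```

Under this reading no existing rows are copied: only the "head" of new data moves to Shard 3 and Shard 4.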

Shard 3 Shard 4

Page 48: The Art of Database Sharding

Data protection

Shard 1   Shard 2   Shard 3   Shard 4

Stb 1   Stb 2   Stb 3   Stb 4

App App

Page 49: The Art of Database Sharding

Why shards are awesome

• (potentially) Unlimited scaling

• Local ACID + relational

• Better maintenance

• Eggs not in one basket

• “Apples to apples comparison” with other shards

Page 50: The Art of Database Sharding

Why shards are NOT so great

• More systems
  – Power, rack space etc
  – Needs automation … bad
  – More likely to fail overall

• Some operations become difficult:
  – Transactions across shards
  – Foreign keys across shards

• More work:
  – Applications, developers, DBAs
  – High skill, DIY everything
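
The "more likely to fail overall" point is simple arithmetic. Assuming each system fails independently with the same probability in some period (the 1% figure below is purely illustrative), the chance that at least one of them fails grows quickly with the number of systems:

```python
def prob_any_failure(per_system_failure_prob: float, num_systems: int) -> float:
    """P(at least one of num_systems independent systems fails)."""
    return 1 - (1 - per_system_failure_prob) ** num_systems


for n in (1, 10, 100):
    print(f"{n:>3} systems: {prob_any_failure(0.01, n):.3f}")
```

With 100 systems the probability of some failure is roughly 63%, but each such failure takes out only one shard's slice of the data, which is the "eggs not in one basket" argument from the previous slide.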

Page 51: The Art of Database Sharding

Takeaways

More > Bigger

ORACLE is still cool

Page 52: The Art of Database Sharding

Thank you!

[email protected]

Twitter: @maxymkh

Blog: http://intermediatesql.com