The Art of Database Sharding

Maxym Kharchenko, Amazon.com

April 22-26, 2012, Mandalay Bay Convention Center, Las Vegas, Nevada, USA (www.collaborate12.org, www.collaborate12.ioug.org)

Page 1: The Art of Database Sharding

The Art of Database Sharding

Maxym Kharchenko, Amazon.com

Page 2

April 22-26, 2012, Mandalay Bay Convention Center

Las Vegas, Nevada, USA

www.collaborate12.org, www.collaborate12.ioug.org

Page 3

When your data grows …

(Diagram: Old System, New System, Problem)

Page 4

One machine is not enough

The Big Data problem

Page 5

Vertical Scaling

Page 6

Scaling Up …

Page 7

Scaling Up …

Page 8

Scaled!

Page 9

What you get when you scale up

2+2=5

Page 10

What you get when you scale up

2+2=3

Page 11

Scale out, not up

Page 12

(Chart: Difficulty, from 1 to 10,000,000, vs. Number of machines, from 0 to 5, annotated "Running on >1 machines". Courtesy: John Rauser @amazon.com)

Page 13

Distributed computing is hard

Page 14

Distributed System

Page 15

Sharded System

Page 16

Sharding is (relatively) easy

Page 17

Split your data into small independent chunks

And run each chunk on cheap commodity hardware

Page 18

How to split your data

(Diagram: Data split into multiple Data chunks)

Page 19

How to split your data

Page 20

How to split your data

Page 21

How to split your data

Page 22

How to split your data

Page 23

Step 1: Split off different things

Page 24

Vertical Partitioning

Page 25

Vertical Partitioning

Page 26

Vertical Partitioning
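Vertical partitioning can be sketched minimally by giving each functional area its own database, so the application picks a connection by table rather than by key. The table names and host names below are hypothetical and serve only as illustration.

```python
# Hypothetical table -> database mapping for vertical partitioning.
# Table names and host names are illustrative assumptions.
VERTICAL_PARTITIONS = {
    "orders": "db-orders.example.internal",
    "order_items": "db-orders.example.internal",
    "customers": "db-customers.example.internal",
    "catalog": "db-catalog.example.internal",
}


def database_for_table(table: str) -> str:
    """Return the database that owns a table (data split by *kind*, not by key)."""
    try:
        return VERTICAL_PARTITIONS[table]
    except KeyError:
        raise ValueError(f"no partition defined for table {table!r}") from None


if __name__ == "__main__":
    for t in ("orders", "order_items", "customers"):
        print(t, "->", database_for_table(t))
```

In this sketch, tables that are routinely joined (orders and order_items) stay on the same database, because joins across databases become impractical.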

Page 27

Step 2: Choose a sharding key and function

Page 28

Sharding

Page 29

Bad Sharding

(Chart: Last Names Distribution by first letter, A to Z, and the resulting Shard Size for shards 1 to 4)

Can we partition Collaborate participants by last name?

CREATE TABLE Collaborate_Participants (
  last_name   varchar2(30) PRIMARY KEY,
  signup_date date
);
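The following minimal sketch shows why range-partitioning on `last_name` skews. Surname first letters are not uniformly distributed, so alphabet slices produce shards of very different sizes. The four letter ranges and the sample names are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical range sharding on the first letter of last_name.
# The four ranges (roughly equal alphabet slices) are an illustrative assumption.
LETTER_RANGES = [("A", "F"), ("G", "L"), ("M", "R"), ("S", "Z")]


def shard_by_last_name(last_name: str) -> int:
    """Return shard number 1..4 for a last name, based on its first letter."""
    first = last_name[:1].upper()
    for shard, (lo, hi) in enumerate(LETTER_RANGES, start=1):
        if lo <= first <= hi:
            return shard
    raise ValueError(f"cannot shard last name {last_name!r}")


def shard_sizes(names):
    """Count how many rows would land on each shard, as a list for shards 1..4."""
    counts = Counter(shard_by_last_name(n) for n in names)
    return [counts.get(s, 0) for s in range(1, len(LETTER_RANGES) + 1)]


if __name__ == "__main__":
    # Hypothetical sample: equal alphabet slices do not mean equal shard sizes.
    sample = ["Smith", "Stone", "Scott", "Miller", "Moore", "Brown", "Baker", "Young"]
    print(shard_sizes(sample))
```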

Page 30

Avalanche Effect

(Diagram: Bad Distribution → e.g. MD5 → Good Distribution)
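The avalanche effect can be illustrated with MD5 from Python's standard `hashlib`. Inputs that differ by a single character produce unrelated digests, so neighbouring keys (such as similar last names) scatter across the hash space. The helper names are hypothetical.

```python
import hashlib


def md5_int(key: str) -> int:
    """Hash a key with MD5 and return the 128-bit digest as an integer."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


def differing_bits(a: str, b: str) -> int:
    """Count the bit positions in which the MD5 digests of a and b differ."""
    return bin(md5_int(a) ^ md5_int(b)).count("1")


if __name__ == "__main__":
    for a, b in [("Smith", "Smiths"), ("Smith", "Smyth")]:
        # With a good avalanche effect, roughly half of the 128 bits flip.
        print(a, b, differing_bits(a, b))
```

MD5 is used here only for its distribution, not for security.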

Page 31

Step 3: Make enough shards

Page 32

Hashes and Buckets

(Diagram: Good Distribution → MOD → buckets)
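As a minimal sketch of hashes and buckets, take the MD5 hash of the sharding key and reduce it with MOD to a bucket number. The key format (`customer-N`) and the bucket count of 4 are assumptions made for illustration.

```python
import hashlib
from collections import Counter


def bucket_for(key: str, n_buckets: int) -> int:
    """Map a sharding key to a bucket 0..n_buckets-1: MD5 hash, then MOD."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % n_buckets


if __name__ == "__main__":
    # Sequential, highly similar keys still spread evenly across the buckets.
    counts = Counter(bucket_for(f"customer-{i}", 4) for i in range(10_000))
    print(sorted(counts.items()))
```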

Page 33

Resharding

3 shards:

hashed_id   Shard: mod(hashed_id, 3)
1           1
2           2
3           0
4           1
5           2
6           0
7           1
8           2
9           0
10          1
11          2
12          0

Adding 4th shard:

hashed_id   Old Shard: mod(hashed_id, 3)   New Shard: mod(hashed_id, 4)
1           1                              1
2           2                              2
3           0                              3
4           1                              0
5           2                              1
6           0                              2
7           1                              3
8           2                              0
9           0                              1
10          1                              2
11          2                              3
12          0                              0

75% bad: 9 of the 12 rows (all except hashed_id 1, 2 and 12) land on a different shard and have to move.
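The 75% figure can be checked by counting the keys whose MOD result changes when the shard count goes from 3 to 4. This is a minimal sketch, and the helper name is hypothetical.

```python
def moved_fraction(keys, old_n: int, new_n: int) -> float:
    """Fraction of keys whose shard changes between mod(key, old_n) and mod(key, new_n)."""
    keys = list(keys)
    if not keys:
        raise ValueError("need at least one key")
    moved = sum(1 for k in keys if k % old_n != k % new_n)
    return moved / len(keys)


if __name__ == "__main__":
    print(moved_fraction(range(1, 13), 3, 4))  # the 12 rows on this slide: 0.75
```

More generally, going from n to n+1 MOD shards keeps a key in place only when its residue mod n(n+1) is one of the n values 0..n-1, so n/(n+1) of the keys move.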

Page 34

Logical Shards

(Diagram: Good Distribution → MOD → logical shards)
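Logical shards can be sketched minimally in two steps. First, hash and MOD each key into a fixed number of logical shards. Then map logical shards to physical databases through a lookup table. Adding hardware rewrites only part of that table, so whole logical shards move and the key-to-logical-shard function never changes. The 8 logical shards and the 2-to-4 physical layout mirror the mapping on the Moving “data head” slide, and the function names are hypothetical.

```python
import hashlib

N_LOGICAL = 8  # fixed for the life of the system (assumption for this sketch)


def logical_shard(key: str) -> int:
    """Hash the key and MOD it into logical shards numbered 1..N_LOGICAL."""
    h = int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")
    return h % N_LOGICAL + 1


# Lookup tables: logical shard -> physical shard.
TWO_PHYSICAL = {1: 1, 2: 1, 3: 1, 4: 1, 5: 2, 6: 2, 7: 2, 8: 2}
FOUR_PHYSICAL = {1: 1, 2: 1, 3: 3, 4: 3, 5: 2, 6: 2, 7: 4, 8: 4}


def physical_shard(key: str, mapping: dict) -> int:
    """Route a key to its physical shard through the logical-shard lookup table."""
    return mapping[logical_shard(key)]


def moved_logical_shards(old: dict, new: dict) -> list:
    """Logical shards whose physical home changes between two mappings."""
    return sorted(ls for ls in old if old[ls] != new[ls])


if __name__ == "__main__":
    print(moved_logical_shards(TWO_PHYSICAL, FOUR_PHYSICAL))
```

In this layout, logical shards 3, 4, 7 and 8 are copied as units to the new physical shards 3 and 4. No rows move between the two existing physical shards.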

Page 35

Implementing Shards: Standbys

(Diagram: Apps, Unsharded, Standby, Shard 1, Shard 2, Read Only)

Page 36

Implementing Shards: Tables

(Diagram: Apps, Shard 1, Shard 2, TabA, MVA, Read Only)

Create materialized view … as select … from a@shard1

Drop materialized view … preserve table

Page 37

Why shards are awesome
• Small data, small load:
  – Better caching, faster queries
  – Smaller load, fewer surprises
  – Faster maintenance, e.g. restores
• Eggs not in one basket:
  – Availability redefined
  – Safer maintenance
• Multiple points of view:
  – SQL performance
  – System load

Page 38

Why shards are NOT so great
• More systems:
  – Power, rack space, etc.
  – Needs automation … bad
  – More likely to fail overall
• Some operations become impractical:
  – Joins across shards
  – Foreign keys across shards
• More work:
  – Applications, developers, DBAs
  – High skill, DIY everything

Page 39

Thank you

Page 40

Implementing Shards: Moving “data head”

(Diagram: Apps, Shard 1, Shard 2, Shard 3, Shard 4)

Time   Logical Shard   Physical Shard
2011   (1,2,3,4)       1
2011   (5,6,7,8)       2
2012   (1,2)           1
2012   (3,4)           3
2012   (5,6)           2
2012   (7,8)           4
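Moving the “data head” can be sketched minimally by keying the routing table on time period as well as logical shard. Rows written in 2012 then go to the new four-shard layout, while 2011 rows stay where they were written. The mapping is taken from the slide, but keying periods by calendar year and the helper name are assumptions.

```python
# period -> {logical shards: physical shard}, following the slide's table.
ROUTING = {
    2011: {(1, 2, 3, 4): 1, (5, 6, 7, 8): 2},
    2012: {(1, 2): 1, (3, 4): 3, (5, 6): 2, (7, 8): 4},
}


def physical_shard(year: int, logical: int) -> int:
    """Find the physical shard holding a logical shard's data for a given year."""
    # Use the latest period that started on or before `year`.
    periods = [p for p in ROUTING if p <= year]
    if not periods:
        raise ValueError(f"no routing defined for {year}")
    for group, physical in ROUTING[max(periods)].items():
        if logical in group:
            return physical
    raise ValueError(f"logical shard {logical} not mapped for {year}")


if __name__ == "__main__":
    print(physical_shard(2011, 3), physical_shard(2012, 3))
```

In this sketch, new writes always use the newest period. Reading an older row requires knowing its period, which is the price of not moving existing data.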

Page 41

Bad Sharding. Example 2

order_id: 10000 - 20000
order_id: 20001 - 30000
order_id: 30001 - 40000
order_id: 40001 - 50000

CREATE TABLE Orders (
  order_id       number PRIMARY KEY,
  customer_fname varchar2(30),
  customer_lname varchar2(30),
  order_date     date
);

Can we shard customers by meaningless sequence?
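One way to see the problem with range-sharding on a meaningless sequence is the following sketch, which uses the ranges shown on the slide. Every newly generated order_id falls into the highest range, so one shard takes all inserts. Nothing ties a given customer's orders to a single shard either. Reading the slide's concern this way is an interpretation, and the shard numbering and helper are hypothetical.

```python
import bisect

# Inclusive upper bounds of the slide's order_id ranges, one per shard (1..4).
RANGE_UPPER_BOUNDS = [20000, 30000, 40000, 50000]
RANGE_LOWER_BOUND = 10000


def shard_for_order(order_id: int) -> int:
    """Return shard 1..4 for an order_id using the slide's range boundaries."""
    if order_id < RANGE_LOWER_BOUND or order_id > RANGE_UPPER_BOUNDS[-1]:
        raise ValueError(f"order_id {order_id} outside all ranges")
    return bisect.bisect_left(RANGE_UPPER_BOUNDS, order_id) + 1


if __name__ == "__main__":
    # The next 100 sequence values all hit the same (newest) shard.
    print({shard_for_order(i) for i in range(45001, 45101)})
    # A hypothetical customer's orders, placed over time, end up on three shards.
    print({shard_for_order(i) for i in (12000, 27000, 48000)})
```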