Sharding Methods for MongoDB

Jay Runkel, [email protected], @jayrunkel, #MongoDB

Learn about various sharding methods for MongoDB.

Page 2: Sharding Methods for MongoDB

Agenda

• Customer Stories

• Sharding for Performance/Scale
  – When to shard?
  – How many shards do I need?

• Types of Sharding

• How to Pick a Shard Key

• Sharding for Other Reasons

Page 3: Sharding Methods for MongoDB

Customer Stories

Page 5: Sharding Methods for MongoDB

Foursquare

• 50M users

• 6B check-ins to date (growing by 6M per day)

• 55M points of interest / venues

• 1.7M merchants using the platform for marketing

• Operations per second: 300,000

• Documents: 5.5B

Page 6: Sharding Methods for MongoDB

Foursquare clusters

• 11 MongoDB clusters
  – 8 are sharded

• Largest cluster has 15 shards (check-ins)
  – Sharded on user ID

Page 7: Sharding Methods for MongoDB

CarFax

• Large data set

Page 8: Sharding Methods for MongoDB

CarFax Shards

• 13 billion+ documents
  – 1.5 billion documents added every year

• 1 vehicle history report is > 200 documents

• 12 shards

• 9-node replica sets

• Replicas distributed across 3 data centers

Page 10: Sharding Methods for MongoDB

What is Sharding?

Page 11: Sharding Methods for MongoDB

Sharding Overview

[Diagram: Application and Driver connect to several Query Routers, which route to Shard 1, Shard 2, Shard 3, … Shard N. Each shard is a replica set with one Primary and two Secondaries.]

Page 12: Sharding Methods for MongoDB

Scaling: Sharding

[Diagram: one mongod holding key range 0..100. Caption: Read/Write Scalability]

Page 13: Sharding Methods for MongoDB

Scaling: Sharding

[Diagram: two mongod instances holding key ranges 0..50 and 51..100. Caption: Read/Write Scalability]

Page 14: Sharding Methods for MongoDB

Scaling: Sharding

[Diagram: four mongod instances holding key ranges 0..25, 26..50, 51..75, and 76..100. Caption: Read/Write Scalability]

Page 15: Sharding Methods for MongoDB

How do I know I need to shard?

Page 17: Sharding Methods for MongoDB

Does one server/replica set…

• Have enough disk space to store all my data?
  – Server spec: disk capacity

• Handle my query throughput (operations per second)?
  – Server specs: disk IOPS, RAM, network

• Respond to queries fast enough (latency)?
  – Server specs: disk IOPS, RAM, network

Page 18: Sharding Methods for MongoDB

How many shards do I need?

Page 20: Sharding Methods for MongoDB

Disk Space: How Many Shards Do I Need?

• Sum of disk space across shards must be greater than the required storage size

Example

Storage size = 3 TB
Server disk capacity = 2 TB

2 Shards Required
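
The disk-space rule on this slide amounts to a ceiling division. A minimal Python sketch, assuming every shard runs on an identical server; the function name `shards_for_capacity` is illustrative and not part of any MongoDB tool:

```python
import math

def shards_for_capacity(required: float, per_server: float) -> int:
    """Smallest shard count whose combined capacity covers the requirement."""
    if required <= 0 or per_server <= 0:
        raise ValueError("both arguments must be positive")
    return math.ceil(required / per_server)

# Slide example: 3 TB of data on servers with 2 TB of disk each.
disk_shards = shards_for_capacity(required=3, per_server=2)
print(disk_shards)  # 2
```

Rounding up leaves no headroom when the requirement is an exact multiple of the server capacity, so a real sizing exercise would likely add spare capacity for growth.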

Page 21: Sharding Methods for MongoDB

RAM: How Many Shards Do I Need?

• Working set should fit in RAM
  – Sum of RAM across shards > working set

• Working set = indexes plus the set of documents accessed frequently

• Working set in RAM
  – Shorter latency
  – Higher throughput

Page 23: Sharding Methods for MongoDB

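RAM: How Many Shards Do I Need?

• Measuring index size and working set
  – db.stats(): total index size for the database (db.collection.stats() reports it per collection)
  – db.serverStatus({ workingSet: 1 }): working set size estimate (an option of older MongoDB releases that later versions no longer support)

Example

Working set = 428 GB
Server RAM = 128 GB

428/128 = 3.34

4 Shards Required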

Page 25: Sharding Methods for MongoDB

Disk Throughput: How Many Shards Do I Need?

• Sum of IOPS across shards must be greater than the required IOPS

• IOPS are difficult to estimate
  – Update doc
  – Update indexes
  – Append to journal
  – Log entry?

• Best approach: build a prototype and measure

Example

Required IOPS = 11,000
Server disk IOPS = 5,000

3 Shards Required

Page 26: Sharding Methods for MongoDB

OPS: How Many Shards Do I Need?

• S = ops/sec of a single server

• G = required ops/sec

• N = # of shards

• G = N × S × 0.7 (the 0.7 factor accounts for sharding overhead)

• N = G / (0.7 × S)

Example

S = 4,000
G = 10,000

N = 10,000 / (0.7 × 4,000) = 3.57

4 Shards
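
The throughput formula can be checked with a few lines of Python. This is a sketch rather than an official sizing tool: the names `SHARDING_EFFICIENCY` and `shards_for_ops` are made up for illustration, and the 0.7 factor is simply the value given on the slide.

```python
import math

SHARDING_EFFICIENCY = 0.7  # factor from the slide: G = N * S * 0.7

def shards_for_ops(required_ops: float, server_ops: float,
                   efficiency: float = SHARDING_EFFICIENCY) -> int:
    """Solve G = N * S * efficiency for N and round up to a whole shard."""
    return math.ceil(required_ops / (efficiency * server_ops))

# Slide example: S = 4,000 ops/sec per server, G = 10,000 ops/sec required.
n_exact = 10_000 / (SHARDING_EFFICIENCY * 4_000)
print(round(n_exact, 2))               # 3.57
print(shards_for_ops(10_000, 4_000))   # 4
```

The efficiency factor is a parameter so that a value measured on a prototype can replace the rule-of-thumb 0.7.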

Page 29: Sharding Methods for MongoDB

Types of Sharding

Page 30: Sharding Methods for MongoDB

Sharding Types

• Range

• Tag-Aware

• Hashed

Page 31: Sharding Methods for MongoDB

Range Sharding

[Diagram: four mongod shards holding key ranges 0..25, 26..50, 51..75, and 76..100. Caption: Read/Write Scalability]

Page 32: Sharding Methods for MongoDB

Tag-Aware Sharding

Tag Ranges

Shard Tag   Start    End
Winter      23 Dec   21 Mar
Spring      22 Mar   21 Jun
Summer      21 Jun   23 Sep
Fall        24 Sep   22 Dec

Shard Tags

[Diagram: four mongod shards tagged Winter, Spring, Summer, and Fall]
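
To make the tag-range table concrete, here is a small Python sketch that maps a date to its tag. It is an illustration only, not MongoDB's routing code: `TAG_STARTS` and `tag_for` are hypothetical names, and the sketch assumes a date belongs to the range with the latest start on or before it, so 21 Jun (which the table lists in two ranges) resolves to Summer.

```python
from bisect import bisect_right
from datetime import date

# (month, day) start of each tag range from the table, sorted ascending.
TAG_STARTS = [
    ((3, 22), "Spring"),
    ((6, 21), "Summer"),
    ((9, 24), "Fall"),
    ((12, 23), "Winter"),
]

def tag_for(d: date) -> str:
    """Return the tag whose range contains the given date."""
    keys = [start for start, _ in TAG_STARTS]
    i = bisect_right(keys, (d.month, d.day))
    # i == 0 means the date is before 22 Mar, so index -1 wraps around
    # to the Winter range that began on 23 Dec.
    return TAG_STARTS[i - 1][1]

print(tag_for(date(2024, 1, 15)))   # Winter
print(tag_for(date(2024, 7, 4)))    # Summer
```

In a tag-aware cluster, documents whose shard key falls in a tagged range are placed on the shards carrying that tag; the lookup above plays the role of that range-to-tag step.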

Page 33: Sharding Methods for MongoDB

Hashed Sharding

[Diagram: four mongod shards holding hash ranges 0000..4444, 4445..8000, 8001..aaaa, and aaab..ffff]

Page 34: Sharding Methods for MongoDB

Hashed shard key

• Pros:
  – Evenly distributed writes

• Cons:
  – Random data (and index) updates can be I/O intensive
  – Range-based queries turn into scatter-gather

[Diagram: mongos routing to Shard 1, Shard 2, Shard 3, and Shard N]
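
Both the main pro and the range-query con can be seen in a toy simulation. The Python sketch below is a simplification under stated assumptions: md5 stands in for MongoDB's internal hash function, and a modulo over a hypothetical `NUM_SHARDS` replaces the real assignment of hash ranges to chunks.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4  # hypothetical cluster size

def hashed_shard(key: int) -> int:
    """Route a key by hashing it (md5 is a stand-in, not MongoDB's hash)."""
    digest = hashlib.md5(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

# Pro: sequential keys spread across all shards instead of piling onto one.
writes = Counter(hashed_shard(k) for k in range(1000))
print(sorted(writes.items()))

# Con: adjacent keys hash to unrelated shards, so the documents matching
# a small range query are scattered and the query cannot be targeted.
shards_hit = {hashed_shard(k) for k in range(100, 120)}
print(len(shards_hit))
```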

Page 35: Sharding Methods for MongoDB

Range sharding document distribution

Page 36: Sharding Methods for MongoDB

Hashed sharding document distribution

Page 37: Sharding Methods for MongoDB

How Do I Pick a Shard Key?

Page 38: Sharding Methods for MongoDB

Shard Key characteristics

• A good shard key has:
  – sufficient cardinality
  – distributed writes
  – targeted reads ("query isolation")

• Shard key should be in every query if possible
  – scatter-gather otherwise

• Choosing a good shard key is important!
  – affects performance and scalability
  – changing it later is expensive

Page 39: Sharding Methods for MongoDB

Low cardinality shard key

• Induces "jumbo chunks"

• Examples: boolean field

[Diagram: mongos routing to Shard 1, Shard 2, Shard 3, and Shard N, showing a single chunk range [ a, b )]
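
A chunk cannot be split below a single shard key value, which is why a low-cardinality key leads to jumbo chunks. The Python sketch below counts the ceiling this puts on the number of chunks; the field name `active` and the helper `max_chunks` are invented for the example.

```python
def max_chunks(shard_key_values) -> int:
    """Upper bound on chunk count: one chunk per distinct shard key value."""
    return len(set(shard_key_values))

# 10,000 documents sharded on a hypothetical boolean field.
docs = [{"_id": i, "active": i % 3 == 0} for i in range(10_000)]
chunk_limit = max_chunks(d["active"] for d in docs)
print(chunk_limit)  # 2
```

With at most two chunks, no more than two shards can ever hold data for this collection, however many shards the cluster has.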

Page 40: Sharding Methods for MongoDB

Ascending shard key

• Monotonically increasing shard key values cause "hot spots" on inserts

• Examples: timestamps, _id

[Diagram: mongos routing to Shard 1, Shard 2, Shard 3, and Shard N, showing the chunk range [ ISODate(…), $maxKey )]
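
The hot spot can be reproduced with a toy range router. This Python sketch assumes fixed chunk boundaries and ignores chunk splitting and balancing; the boundary values and shard names are hypothetical.

```python
from bisect import bisect_right
from collections import Counter

# Lower bound of each chunk; the last chunk is [300, maxKey).
CHUNK_LOWER_BOUNDS = [0, 100, 200, 300]
CHUNK_OWNER = ["shard1", "shard2", "shard3", "shardN"]

def shard_for(key: int) -> str:
    """Range routing: find the chunk whose lower bound is closest below key."""
    return CHUNK_OWNER[bisect_right(CHUNK_LOWER_BOUNDS, key) - 1]

# New documents get ever-larger keys, as a timestamp or counter would.
new_keys = range(1000, 1100)
insert_load = Counter(shard_for(k) for k in new_keys)
print(insert_load)  # Counter({'shardN': 100})
```

Every insert lands in the chunk that extends to the maximum key, so one shard absorbs the whole write load until chunks are split and migrated.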

Page 41: Sharding Methods for MongoDB

Reasons to Shard

Page 42: Sharding Methods for MongoDB

Reasons to shard

• Scale
  – Data volume
  – Query volume

• Global deployment with local writes
  – Geography-aware sharding

• Tiered storage

• Fast backup restore

Page 43: Sharding Methods for MongoDB

Global Deployment/Local Writes

[Diagram: three replica sets spread across New York (NYC), London (LON), and Sydney (SYD). Each city hosts one primary and two secondaries.]

Page 44: Sharding Methods for MongoDB

Tiered Storage

• Save hardware costs

• Put frequently accessed documents on fast servers
  – Infrequently accessed documents on less capable servers

• Use tag-aware sharding

[Diagram: four mongod shards. Two are tagged Current and use SSD; two are tagged Archive and use HDD.]

Page 45: Sharding Methods for MongoDB

Fast Restore

• 40 TB database

• 2 shards of 20 TB each

• Challenge
  – Cannot meet restore SLA after data loss

[Diagram: two mongod shards, 20 TB each]

Page 46: Sharding Methods for MongoDB

Fast Restore

• 40 TB database

• 4 shards of 10 TB each

• Solution
  – Reduce the restore time by 50%

[Diagram: four mongod shards, 10 TB each]
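
The 50% figure follows if all shards restore in parallel at the same rate. A short Python sketch of that arithmetic; the per-shard throughput `RATE` is a made-up value, and only the ratio between the two results matters.

```python
def restore_hours(total_tb: float, shards: int, tb_per_hour: float) -> float:
    """Wall-clock restore time when every shard restores in parallel."""
    return (total_tb / shards) / tb_per_hour

RATE = 1.0  # hypothetical per-shard restore throughput in TB/hour

two_shards = restore_hours(40, 2, RATE)
four_shards = restore_hours(40, 4, RATE)
reduction = 1 - four_shards / two_shards
print(two_shards, four_shards, reduction)  # 20.0 10.0 0.5
```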

Page 47: Sharding Methods for MongoDB

Summary

Page 48: Sharding Methods for MongoDB

Determining the # of shards

• To find the required # of shards, determine
  – Storage requirements
  – Latency requirements
  – Throughput requirements

• Derive total
  – Disk capacity
  – Disk throughput
  – RAM

• Calculate # of shards based upon individual server specs
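
Putting the steps together, one plausible reading is to compute a shard count per resource and take the largest. The deck does not state the "take the maximum" step, so treat it as an assumption. The Python sketch below reuses the numbers from the deck's separate examples; the dictionary keys and the function name are illustrative.

```python
import math

def shards_per_resource(required: dict, server: dict,
                        ops_efficiency: float = 0.7) -> dict:
    """Shard count demanded by each resource, rounded up to whole shards."""
    return {
        "disk": math.ceil(required["disk_tb"] / server["disk_tb"]),
        "ram": math.ceil(required["working_set_gb"] / server["ram_gb"]),
        "iops": math.ceil(required["iops"] / server["iops"]),
        "ops": math.ceil(required["ops"] / (ops_efficiency * server["ops"])),
    }

# Requirements and server specs taken from the deck's individual examples.
required = {"disk_tb": 3, "working_set_gb": 428, "iops": 11_000, "ops": 10_000}
server = {"disk_tb": 2, "ram_gb": 128, "iops": 5_000, "ops": 4_000}

per_resource = shards_per_resource(required, server)
print(per_resource)                # {'disk': 2, 'ram': 4, 'iops': 3, 'ops': 4}
print(max(per_resource.values()))  # 4
```

Here RAM and throughput are the binding constraints, so the cluster would need four shards even though disk space alone calls for two.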

Page 49: Sharding Methods for MongoDB

Leverage Sharding For

• Scalability

• Geo-aware clusters

• Tiered storage

• Reduced backup restore times

Page 50: Sharding Methods for MongoDB

Sharding: Where to go from here…

• MongoDB Manual: http://docs.mongodb.org/manual/sharding/

• Other Webinars:
  – How to Achieve Scale With MongoDB

• White Papers
  – MongoDB Performance Best Practices
  – MongoDB Architecture Guide

Page 51: Sharding Methods for MongoDB

Get Expert Advice on Scaling. For Free.

For a limited time, if you’re considering a commercial relationship with MongoDB, you can sign up for a free one-hour consultation about scaling with one of our MongoDB engineers.

Sign up: http://bit.ly/1rkXcfN

Page 52: Sharding Methods for MongoDB

Webinar Q&A

[email protected]

@jayrunkel

Stay tuned after the webinar and take our survey for your chance to win MongoDB swag.

Page 53: Sharding Methods for MongoDB

Thank You