Zero to 1 Billion+ Records: A True Story of Learning & Scaling GameChanger

Posted on 26-Jun-2015


Zero to 1 Billion Records

Kiril Savino @holacrat

2

GC.com/about/product-team

3

• have a sense of humor

• know what use cases work best

• remember that databases are hard

• don’t understate the difficulty in scaling up

4

• 1,480,808,857 events

• 8 terabytes of primary data

• 35 nodes

• 420GB RAM on primaries

• 21TB SSD storage

• 14TB EBS storage

• 120,000 ops/s

• Model

• Scale

• Grow

• Extend

5

6

Model

November 2009 — MongoDB 1.2

• More indexes per collection

• Faster index creation

• Map/Reduce

• Stored JavaScript functions

• Configurable fsync time

• Several small features and fixes

7

{.}

8

{.?!?.}

9

[Diagram: a document {.} and the stack REST API, Decoding/Unmarshalling, business logic, Django ORM, MySQL]

10

[Diagram: the stack REST API, Decoding/Unmarshalling, business logic, Django ORM, MySQL, with the document {.} at the REST API]

11

[Table columns: Inning, Outs, Balls, Strikes, Pitcher, Batter]

12

[Table columns: Inning, Outs, Balls, Strikes, Pitcher, Batter]

[Table columns: Period, Minute, Location, Shooter, Rebounder, Assist]

13

[Diagram: entities play, participant, role, sport, play_property]

14

[Diagram: entities play, participant, role, sport, play_property]

15

{_id: ObjectId(), code: "1B",
 participants: [{player_id: ObjectId(), roles: ["batter", "out"]},
                {player_id: ObjectId(), roles: ["pitcher"]}],
 situation: {outs: 1, balls: 2, strikes: 0},
 properties: {location: [0.45, 0.721]}}

16

{_id: ObjectId(), code: "shot",
 participants: [{player_id: ObjectId(), roles: ["shooter"]},
                {player_id: ObjectId(), roles: ["rebounder"]}],
 situation: {period: 1, time: "5:29"},
 properties: {location: [0.45, 0.721]}}
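Because the baseball and basketball plays share one document shape (code, participants with roles, situation, properties), code that reads plays does not need a table or class per sport. Below is a minimal sketch, assuming plain Python dicts in place of MongoDB documents and string ids in place of ObjectId(); the helper name `players_with_role` is hypothetical.

```python
from typing import Any

# Hypothetical alias: a play is a plain dict mirroring the document shape above.
Play = dict[str, Any]


def players_with_role(play: Play, role: str) -> list[str]:
    """Return the player_id of every participant who holds `role` in this play."""
    return [
        p["player_id"]
        for p in play.get("participants", [])
        if role in p.get("roles", [])
    ]


# String ids stand in for ObjectId(); other values are copied from the slides.
baseball_play: Play = {
    "code": "1B",
    "participants": [
        {"player_id": "p1", "roles": ["batter", "out"]},
        {"player_id": "p2", "roles": ["pitcher"]},
    ],
    "situation": {"outs": 1, "balls": 2, "strikes": 0},
    "properties": {"location": [0.45, 0.721]},
}

basketball_play: Play = {
    "code": "shot",
    "participants": [
        {"player_id": "p3", "roles": ["shooter"]},
        {"player_id": "p4", "roles": ["rebounder"]},
    ],
    "situation": {"period": 1, "time": "5:29"},
    "properties": {"location": [0.45, 0.721]},
}

print(players_with_role(baseball_play, "pitcher"))    # ['p2']
print(players_with_role(basketball_play, "shooter"))  # ['p3']
```

The same function serves both sports; only the contents of `situation` and `properties` differ.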

17

[Diagram: the stack REST API, Decoding/Unmarshalling, Django ORM, business logic, MongoDB, with documents {.} at the REST API and at MongoDB]

18

[Diagram: the stack REST API, Decoding/Unmarshalling, Django ORM, business logic, MongoDB, with documents {.} at the REST API and at MongoDB]

👏

19

Modeling data in MongoDB

20

• JSON won the internet

• Don’t write your own JSON storage engine

• Flexible schemas promote app simplicity

• Validation is your responsibility

• Invest in schema design early
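"Validation is your responsibility" means the application has to check each document's shape before it writes it. Below is a minimal sketch of such a check for the generic play shape. The function name `validate_play` and every rule in it are hypothetical, chosen only to illustrate the idea.

```python
from typing import Any


def validate_play(doc: dict[str, Any]) -> list[str]:
    """Return the problems found in a play document; an empty list means valid.

    The rules are hypothetical and cover only the generic play shape.
    """
    errors: list[str] = []
    if not isinstance(doc.get("code"), str) or not doc["code"]:
        errors.append("code must be a non-empty string")
    participants = doc.get("participants")
    if not isinstance(participants, list):
        errors.append("participants must be a list")
    else:
        for i, p in enumerate(participants):
            if not isinstance(p, dict) or "player_id" not in p:
                errors.append(f"participants[{i}] needs a player_id")
                continue
            roles = p.get("roles")
            if not isinstance(roles, list) or not roles or not all(isinstance(r, str) for r in roles):
                errors.append(f"participants[{i}].roles must be a non-empty list of strings")
    # Sport-specific sub-documents may hold anything, but they must be objects.
    for key in ("situation", "properties"):
        if not isinstance(doc.get(key, {}), dict):
            errors.append(f"{key} must be an object")
    return errors


good = {"code": "shot",
        "participants": [{"player_id": "p3", "roles": ["shooter"]}],
        "situation": {"period": 1, "time": "5:29"}}
bad = {"code": "", "participants": [{"roles": []}]}

print(validate_play(good))  # []
print(validate_play(bad))   # two problems: empty code, missing player_id
```

The check only constrains the shared skeleton and leaves `situation` and `properties` open, so each sport keeps its flexibility.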

21

Scale

22

23

24

25

$$$

26

$$$

😱

27

[Chart: System Latency against User Load]

28

[Chart: System Latency against User Load]

29

[Chart: System Latency against User Load]

30

Scaling is the process of decoupling load from latency.

Latency comes from

31

• Writing data to your database

• Reading data from your database

• Aggregating data from multiple locations

• Running complex calculations

32

{.}

This is a document.

33

[Diagram: API, MongoDB, Browser, with documents {.}]

34

[Diagram: API, MongoDB, Browser, with documents {.}]

35

[Diagram: API, MongoDB, Browser, with documents {.} and a calculation step (+/-*)]

36

[Chart: System Latency against Read Load]

37

[Diagram: API, MongoDB, Browser, with documents {.}]

38

[Diagram: API, MongoDB, Browser, with documents {.} and a calculation step (+/-*)]

39

[Chart: System Latency against Write Load]

40

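[Diagram: API, MongoDB, Browser, with documents {.} and a background calculation step (Background +/-*)]

41

[Diagram: API, MongoDB, Browser, with documents {.} and a background calculation step (Background +/-*)]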

42

[Chart: System Latency against User Load]

43

{.}{.}{.}

44

{.}{.}{.}

{.} }

45

{.}{.}{.}

{.} }

46

Scaling data access

47

• Decouple load from latency

• Queries are expensive

• Aggregation is expensive

• Do calculation in the background

• Serve content from single* documents
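The points above contrast aggregating at request time with doing the calculation in the background and serving one precomputed document. Below is a minimal sketch of both paths, assuming in-memory lists and dicts in place of MongoDB collections and a `queue.Queue` in place of a real job queue. The function names and sample events (including the "K" code) are hypothetical.

```python
from queue import Empty, Queue
from typing import Any

Event = dict[str, Any]
Summary = dict[str, int]


def stats_on_read(events: list[Event], player_id: str) -> Summary:
    """Aggregate at request time: every read scans all stored events."""
    stats: Summary = {}
    for e in events:
        if e["player_id"] == player_id:
            stats[e["code"]] = stats.get(e["code"], 0) + 1
    return stats


def drain_in_background(pending: "Queue[Event]", summaries: dict[str, Summary]) -> None:
    """Background step: fold each queued event into that player's summary document."""
    while True:
        try:
            e = pending.get_nowait()
        except Empty:
            return
        summary = summaries.setdefault(e["player_id"], {})
        summary[e["code"]] = summary.get(e["code"], 0) + 1


def stats_from_summary(summaries: dict[str, Summary], player_id: str) -> Summary:
    """Serve the read from one precomputed document: no scan, no aggregation."""
    return summaries.get(player_id, {})


# Write path: store the raw event and enqueue it; the expensive work happens later.
events: list[Event] = []
pending: "Queue[Event]" = Queue()
for ev in [{"player_id": "p1", "code": "1B"}, {"player_id": "p1", "code": "K"},
           {"player_id": "p2", "code": "1B"}, {"player_id": "p1", "code": "1B"}]:
    events.append(ev)
    pending.put(ev)

summaries: dict[str, Summary] = {}
drain_in_background(pending, summaries)     # in production, a separate worker
print(stats_on_read(events, "p1"))          # {'1B': 2, 'K': 1}
print(stats_from_summary(summaries, "p1"))  # {'1B': 2, 'K': 1}
```

Both reads return the same answer. The cost of `stats_on_read` grows with the number of stored events, while `stats_from_summary` does a single lookup no matter how much data has accumulated. That is the sense in which read latency is decoupled from load.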

48

Grow

49

50

51

52

{.}

53

{.}

54

{.}

55

{.}

56

57

{.} {$addToSet: {a: 2}}

58

{.} {$addToSet: {a: 2}}

{.} {v: 2}, {$set: {v: 3}}
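The second update on this slide appears to be a conditional write: it matches the document only while `v` is 2 and sets `v` to 3 in the same step. That pattern is optimistic locking. A writer re-reads and retries if someone else changed the document first. Below is a minimal sketch, assuming an in-memory store whose lock emulates MongoDB's single-document atomicity. The class and function names are hypothetical. With pymongo, the conditional step would roughly correspond to `collection.update_one({"_id": doc_id, "v": expected}, {"$set": changes, "$inc": {"v": 1}})` followed by a check of the result's `modified_count`.

```python
import copy
import threading
from typing import Any, Callable

Doc = dict[str, Any]


class VersionedStore:
    """Hypothetical in-memory stand-in for a collection with atomic single-document updates."""

    def __init__(self) -> None:
        self._docs: dict[str, Doc] = {}
        self._lock = threading.Lock()  # emulates per-document update atomicity

    def insert(self, doc_id: str, doc: Doc) -> None:
        with self._lock:
            self._docs[doc_id] = {**copy.deepcopy(doc), "v": 1}

    def read(self, doc_id: str) -> Doc:
        with self._lock:
            return copy.deepcopy(self._docs[doc_id])

    def update_if_version(self, doc_id: str, expected_v: int, changes: Doc) -> bool:
        """Apply `changes` only if the stored version is still `expected_v`, then bump it."""
        with self._lock:
            current = self._docs[doc_id]
            if current["v"] != expected_v:
                return False  # another writer got there first
            current.update(copy.deepcopy(changes))
            current["v"] = expected_v + 1
            return True


def update_with_retry(store: VersionedStore, doc_id: str,
                      mutate: Callable[[Doc], Doc], max_attempts: int = 5) -> Doc:
    """Read, compute changes from the snapshot, and retry if the version moved."""
    for _ in range(max_attempts):
        snapshot = store.read(doc_id)
        if store.update_if_version(doc_id, snapshot["v"], mutate(snapshot)):
            return store.read(doc_id)
    raise RuntimeError(f"gave up on {doc_id} after {max_attempts} conflicting attempts")


store = VersionedStore()
store.insert("game1", {"runs": 0})
stale = store.read("game1")
print(store.update_if_version("game1", stale["v"], {"runs": stale["runs"] + 1}))  # True
print(store.update_if_version("game1", stale["v"], {"runs": stale["runs"] + 1}))  # False: stale version
print(update_with_retry(store, "game1", lambda d: {"runs": d["runs"] + 1}))      # {'runs': 2, 'v': 3}
```

A write can only fail if another write succeeded in between, so the system as a whole always makes progress and no lock is held across the read-compute-write cycle.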

59

{.}

60

61

{.} {.}

62

[Diagram: documents {a}, {b}, {c} and a combined document {abc}]

63

{.}

64

{.}{.}

65

{.} {.}{.}

66

{.} {.}{.}

67

{.} {.}{.}

68

{.} {.}{.}

69

{.} {.}{.}

70

{.} {.} {.}

71

<id><id><id><id><id><id><id>

To Propagate

72

<id><id><id><id><id><id><id>

To Propagate Propagating…

73

<id><id><id><id><id><id><id>

To Propagate Propagating…

<id> {.}{.}{.}
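These slides show changed ids waiting in a "To Propagate" list and then moving to "Propagating…" while the documents that embed copies of them are rewritten. Below is a minimal sketch of that flow, assuming denormalized player names embedded in game documents and in-memory dicts in place of collections. The class, field, and method names are hypothetical. The linear scan stands in for what would be an indexed query in a real system.

```python
from collections import OrderedDict
from typing import Any

Doc = dict[str, Any]


class Propagator:
    """Hypothetical sketch of denormalize-and-propagate for embedded player names."""

    def __init__(self, sources: dict[str, Doc], dependents: dict[str, Doc]) -> None:
        self.sources = sources        # id -> source document (e.g. a player)
        self.dependents = dependents  # id -> document that embeds copies of sources
        self.to_propagate: "OrderedDict[str, None]" = OrderedDict()  # deduplicated FIFO of ids
        self.propagating: set[str] = set()

    def mark_changed(self, source_id: str) -> None:
        """Write path: just record the id; repeated changes collapse into one entry."""
        self.to_propagate[source_id] = None

    def propagate_once(self) -> int:
        """Background step: take one id and rewrite every embedded copy of it."""
        if not self.to_propagate:
            return 0
        source_id, _ = self.to_propagate.popitem(last=False)
        self.propagating.add(source_id)
        name = self.sources[source_id]["name"]
        updated = 0
        for doc in self.dependents.values():  # stand-in for an indexed query
            for embedded in doc.get("players", []):
                if embedded["player_id"] == source_id:
                    embedded["name"] = name
                    updated += 1
        self.propagating.discard(source_id)
        return updated


sources = {"p1": {"name": "Sam"}, "p2": {"name": "Alex"}}
dependents = {
    "g1": {"players": [{"player_id": "p1", "name": "Sam"}]},
    "g2": {"players": [{"player_id": "p1", "name": "Sam"},
                       {"player_id": "p2", "name": "Alex"}]},
}
prop = Propagator(sources, dependents)
sources["p1"]["name"] = "Samantha"
prop.mark_changed("p1")
prop.mark_changed("p1")                        # coalesces with the first entry
print(len(prop.to_propagate))                  # 1
print(prop.propagate_once())                   # 2
print(dependents["g2"]["players"][0]["name"])  # Samantha
```

The write that changes a player stays cheap, and the fan-out to every game happens later. In a real worker, the `propagating` set is where recovery code would look after a crash to re-queue unfinished ids. This sketch does not implement that recovery.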

74

{$} {$} {$} {$} {$}

Growing load

75

• Denormalize for constant access time

• Use MongoDB atomic operators

• Check out optimistic locking and MVCC

• Leverage external concurrency control

• Watch your oplog
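One reading of "Leverage external concurrency control" is to serialize writers to the same document through a lock held outside the database. Below is a minimal sketch, assuming an in-process table of per-document locks as a stand-in for an external lock service. `KeyedLocks` and `add_points` are hypothetical names.

```python
import threading
from contextlib import contextmanager
from typing import Any, Iterator


class KeyedLocks:
    """Hypothetical in-process stand-in for an external lock service: one mutex per id."""

    def __init__(self) -> None:
        self._guard = threading.Lock()  # protects the lock table itself
        self._locks: dict[str, threading.Lock] = {}

    @contextmanager
    def hold(self, key: str) -> Iterator[None]:
        with self._guard:
            lock = self._locks.setdefault(key, threading.Lock())
        with lock:
            yield


docs: dict[str, dict[str, Any]] = {"game1": {"score": 0}}
locks = KeyedLocks()


def add_points(game_id: str, points: int) -> None:
    """Read-modify-write that is safe only because every writer takes the same lock."""
    with locks.hold(game_id):
        current = docs[game_id]["score"]
        docs[game_id] = {**docs[game_id], "score": current + points}


def worker() -> None:
    for _ in range(1000):
        add_points("game1", 1)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(docs["game1"]["score"])  # 4000
```

Unlike optimistic locking, this approach never retries. Writers simply wait, which makes sense when conflicts on the same document are frequent.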

76

Extend

77

{.} +

78

79

80

So there we have it

• Design your schema to MongoDB’s strengths

• Use monolithic documents

• Don’t do (live) querying

• You can still do transactional things

• You may need to denormalize & propagate

• Think about your overall architecture

81

82

• have a sense of humor

• know what use cases work best

• remember that databases are hard

• don’t understate the difficulty in scaling up

@holacrat