MongoDB Schema Design: Insights and Tradeoffs (Jetlore's talk at MongoSF 2012)

Post on 01-Nov-2014

Montse Medina, COO

MongoDB Schema Design: Insights and Tradeoffs

Saturday, May 5, 2012

Social content is useful in context

Algorithms + Infrastructure

Technology Stack

Apache Kafka


Outline

I. Schema design
‣ Relational vs. Document-oriented
‣ Schema-less design
‣ Case study: Publishers & Subscribers

II. Lessons learned for schema design

III. Things to remember about MongoDB


Relational vs. Document-oriented

Relational:

Users
id   name
1    Robert
2    Monica
3    Lucas
...  ...

Graph
from  to
1     5
1     20
2     1
2     5
...   ...

vs

Document-oriented:

Users
{ id: 1, name: "Robert", from: [2], to: [5, 20] }
{ id: 2, name: "Monica", from: [23], to: [1, 5] }
...

Find all the "to" edges for user 5

Relational:

Graph
from  to
1     5
1     20
2     1
2     5
3     4
3     23
3     12
4     5
...   ...

[Figure: the table rows are spread across disk blocks]

Potentially as many disk seeks as "to" edges!

vs

Document-oriented:

Users
{ id: 5, name: "Robert", from: [1, 2, 4], to: [1, 20, 3, 7, 2] }

1 disk seek guaranteed!

Advantages of doc-oriented schema

• Avoid joins
• Disk locality when fetching relations (everything is stored within a doc record)

Considerations for schema design

• N-to-many relations == lists
• Denormalization is more common
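The difference between the two layouts can be sketched in plain JavaScript. This is only an in-memory illustration, not how either database stores data: `edgeRows`, `userDocs`, and both function names are hypothetical stand-ins for the Graph table and the Users documents on the slides.

```javascript
// Hypothetical in-memory stand-ins for the two layouts on the slides.
const edgeRows = [
  { from: 1, to: 5 }, { from: 1, to: 20 },
  { from: 2, to: 1 }, { from: 2, to: 5 },
];
const userDocs = new Map([
  [1, { id: 1, name: "Robert", from: [2], to: [5, 20] }],
  [2, { id: 2, name: "Monica", from: [23], to: [1, 5] }],
]);

// Relational style: every matching edge is a separate row that has to be located.
function toEdgesRelational(userId) {
  return edgeRows.filter((row) => row.from === userId).map((row) => row.to);
}

// Document style: one lookup returns the user together with the whole list.
function toEdgesDocument(userId) {
  const doc = userDocs.get(userId);
  return doc ? doc.to : [];
}

console.log(toEdgesRelational(1)); // [ 5, 20 ]
console.log(toEdgesDocument(1));   // [ 5, 20 ]
```

Both calls return the same edges; the document version reaches them through a single record, which is the locality argument the slide makes.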


Schema-less design

{ id: 1, network: "Twitter", name: "Robert", from: [2], to: [5, 20], screenName: "robertE" }
{ id: 2, network: "Facebook", name: "Maria", from: [23], to: [1, 5], likes: ["biking", "hiking"] }
...

Leverage the schemaless nature of Mongo, but protect yourself with types in your code!
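One way to "put protection with types in your code" is a small validation function that every document passes through before it is used. The sketch below is an assumption about what such a guard could look like; `parseUser` and the rule that Twitter users carry `screenName` while Facebook users carry `likes` are inferred from the two example documents, not stated in the talk.

```javascript
// Hypothetical validator: shared fields are required for every user,
// network-specific fields are checked per network.
function parseUser(doc) {
  const isIdList = (value) => Array.isArray(value) && value.every((id) => Number.isInteger(id));
  if (!Number.isInteger(doc.id)) throw new TypeError("id must be an integer");
  if (typeof doc.name !== "string") throw new TypeError("name must be a string");
  if (!isIdList(doc.from) || !isIdList(doc.to)) {
    throw new TypeError("from and to must be lists of ids");
  }
  if (doc.network === "Twitter") {
    if (typeof doc.screenName !== "string") throw new TypeError("Twitter users need a screenName");
  } else if (doc.network === "Facebook") {
    if (!Array.isArray(doc.likes)) throw new TypeError("Facebook users need a likes list");
  } else {
    throw new TypeError("unknown network: " + doc.network);
  }
  return doc;
}

const robert = parseUser({
  id: 1, network: "Twitter", name: "Robert", from: [2], to: [5, 20], screenName: "robertE",
});
console.log(robert.screenName); // robertE
```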


Case Study: Publishers & Subscribers

Read-Friendly Approach

Post: { _id: postId, owner: ownerId, recipient: recipientId, text: "message", ... }

[Figure: the message "Hi!" delivered as three separate copies]

db.posts.find({recipient: uid})

Sharding key: recipient

Fast retrieval, easy sharding

Slow writes, enormous amount of storage
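A minimal in-memory sketch of the read-friendly approach, assuming the publisher's code is handed the list of recipients at write time. The `posts` array stands in for the collection, and `publish` and `inbox` are hypothetical helper names.

```javascript
const posts = []; // stand-in for the posts collection

// Publishing writes one document per recipient.
function publish(ownerId, recipientIds, text) {
  for (const recipientId of recipientIds) {
    posts.push({ _id: posts.length + 1, owner: ownerId, recipient: recipientId, text });
  }
}

// Mirrors db.posts.find({recipient: uid}): a single equality match.
function inbox(uid) {
  return posts.filter((post) => post.recipient === uid);
}

publish(1, [5, 20, 7], "Hi!");
console.log(posts.length);    // 3
console.log(inbox(5).length); // 1
```

One message to three recipients costs three writes and three stored copies, which is where the slow writes and the storage growth come from.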


Write-Friendly Approach

Post: { _id: postId, owner: oId, text: "message", ... }

[Figure: the message "Hi!" stored once]

db.posts.find({owner: {$in: user.from}})

Sharding key: ?

Fast writes, slim storage

Slow reads, harder queries
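The write-friendly approach can be sketched the same way. Again this is an in-memory illustration with hypothetical helpers (`publish`, `inbox`); the only part taken from the slide is the shape of the query, which selects posts whose owner appears in the reader's `from` list.

```javascript
const posts = []; // stand-in for the posts collection

// Publishing writes exactly one document, however many readers there are.
function publish(ownerId, text) {
  posts.push({ _id: posts.length + 1, owner: ownerId, text });
}

// Mirrors db.posts.find({owner: {$in: user.from}}): the work moves to read time.
function inbox(user) {
  const owners = new Set(user.from);
  return posts.filter((post) => owners.has(post.owner));
}

publish(1, "Hi!");
publish(2, "Hello!");
publish(3, "Hey!");

const reader = { id: 5, from: [1, 2, 4] };
console.log(posts.length);         // 3
console.log(inbox(reader).length); // 2
```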


Hybrid Approach

Post: { _id: postId, owner: ownerId, recipients: [u1, u2, u3, u5], text: "message", ... }

[Figure: the message "Hi!" stored once]

db.posts.find({recipients: uId})

Sharding key: random :)

Fast writes, slim storage, reasonable read speed
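In the hybrid approach the post is stored once and carries its recipient list. In MongoDB, matching a scalar against an array field tests membership, so `{recipients: uId}` selects every post whose list contains `uId`. The sketch below imitates that in memory; `publish` and `inbox` are hypothetical names.

```javascript
const posts = []; // stand-in for the posts collection

// One document per post, with the recipients embedded as a list.
function publish(ownerId, recipientIds, text) {
  posts.push({ _id: posts.length + 1, owner: ownerId, recipients: [...recipientIds], text });
}

// Mirrors db.posts.find({recipients: uId}): membership test on the list field.
function inbox(uId) {
  return posts.filter((post) => post.recipients.includes(uId));
}

publish(1, [5, 20, 7], "Hi!");
publish(2, [5], "Hello!");

console.log(posts.length);     // 2
console.log(inbox(5).length);  // 2
console.log(inbox(20).length); // 1
```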


Random sharding is not random!

Minimize the number of disk seeks per shard!

[Figure: three sharding layouts, labeled "Best (impossible for our data)", "Worse", and "Optimal solution"]

Outline

I. Schema design

II. Lessons learned for schema design
‣ Indexes
‣ Concurrency
‣ Reducing collection size

III. Things to remember about MongoDB


Indexes: Primary Key

link: { _id: ObjectId(...), url: "www.jetlore.com", title: "Jetlore is a search platform for social content", description: "..." }

vs

link: { _id: "www.jetlore.com", title: "Jetlore is a search platform for social content", description: "..." }

If your data has a natural PK, use it instead of the default ObjectId
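Every collection already has a unique index on `_id`, so a natural key stored there gets lookup and uniqueness without a second index on `url`. The sketch below imitates that with a `Map` keyed by `_id`; `links` and `upsertLink` are hypothetical names used only for illustration.

```javascript
// Stand-in for a collection: the Map key plays the role of the unique _id index.
const links = new Map();

// Writing the same URL twice updates the existing entry instead of duplicating it.
function upsertLink(url, title, description) {
  links.set(url, { _id: url, title, description });
}

upsertLink("www.jetlore.com", "old title", "...");
upsertLink("www.jetlore.com", "Jetlore is a search platform for social content", "...");

console.log(links.size);                         // 1
console.log(links.get("www.jetlore.com").title); // Jetlore is a search platform for social content
```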


Indexes: Augment your schema to enable the most selective index

Want all posts that a user can view sorted by the number of likes

Add a new "likesCount" field!

post: { _id: ObjectId(...), recipients: [...], likes: [...], likesCount: ..., ... }

db.posts.ensureIndex({recipients: 1, likesCount: -1})
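The denormalized `likesCount` only helps if it stays in step with `likes`. A minimal in-memory sketch of that bookkeeping follows; `like` and `topPostsFor` are hypothetical helpers, and the shell update quoted in the comment is one common way to express the same guard, not something shown in the talk.

```javascript
// Adds a like and bumps the counter together, only if the user has not liked yet.
// A comparable shell update would be:
//   db.posts.update({_id: pId, likes: {$ne: uId}},
//                   {$push: {likes: uId}, $inc: {likesCount: 1}})
function like(post, userId) {
  if (post.likes.includes(userId)) return false;
  post.likes.push(userId);
  post.likesCount += 1;
  return true;
}

// Mirrors find({recipients: uId}).sort({likesCount: -1}).
function topPostsFor(posts, uId) {
  return posts
    .filter((post) => post.recipients.includes(uId))
    .sort((a, b) => b.likesCount - a.likesCount);
}

const posts = [
  { _id: "a", recipients: [1, 2], likes: [], likesCount: 0 },
  { _id: "b", recipients: [1], likes: [], likesCount: 0 },
];
like(posts[1], 7);
like(posts[1], 8);
like(posts[1], 8); // repeated like is ignored
like(posts[0], 7);

console.log(topPostsFor(posts, 1).map((post) => post._id)); // [ 'b', 'a' ]
```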


Indexes: Make sure to use the proper index

db.posts.find({recipients: uId}).sort({date: -1})

db.posts.ensureIndex({recipients: 1})
db.posts.ensureIndex({date: 1})

vs

db.posts.ensureIndex({recipients: 1, date: 1})
  date: -1   [slide callout next to the compound index]

Always test with explain()
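Why the compound index serves this query better than two single-field indexes can be shown with a toy model. The sketch assumes an index is just a list of entries kept sorted by its key; `buildIndex` and `newestFirst` are hypothetical names, and a real index would seek to the matching run rather than filter the whole list.

```javascript
// Toy model of {recipients: 1, date: -1}: one entry per (recipient, post),
// ordered by recipient ascending and then date descending.
function buildIndex(posts) {
  const entries = [];
  for (const post of posts) {
    for (const recipient of post.recipients) {
      entries.push({ recipient, date: post.date, id: post._id });
    }
  }
  return entries.sort((a, b) => a.recipient - b.recipient || b.date - a.date);
}

// Equality on recipient selects one contiguous run that is already in date order,
// so no separate sort step is needed.
function newestFirst(index, uId) {
  return index.filter((entry) => entry.recipient === uId).map((entry) => entry.id);
}

const index = buildIndex([
  { _id: "a", recipients: [1, 2], date: 10 },
  { _id: "b", recipients: [1], date: 30 },
  { _id: "c", recipients: [2], date: 20 },
]);

console.log(newestFirst(index, 1)); // [ 'b', 'a' ]
console.log(newestFirst(index, 2)); // [ 'c', 'a' ]
```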


Concurrency: Try to avoid "save()" in drivers

thread1: { _id: u1, name: "Robert", from: [u2, u3] }

thread2: { _id: u1, name: "Bob", from: [] }

db.users.update({_id: u1}, {$set: {_id: u1, name: ..., }}, true, false)

…but!

db.users.update({_id: thread1._id}, {$set: {from: thread1.from}})

db.users.update({_id: thread2._id}, {$set: {name: thread2.name}})
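The problem with whole-document writes is a lost update. The sketch below replays the two threads from the slide against an in-memory store; `race`, `saveWhole`, and `setFields` are hypothetical names, and the starting document (name "Robert", empty `from`) is an assumption chosen so that each thread changes exactly one field.

```javascript
const clone = (doc) => JSON.parse(JSON.stringify(doc));

// Both threads read the same stored document, then each changes one field.
function race(write) {
  const store = { doc: { _id: "u1", name: "Robert", from: [] } };
  const thread1 = clone(store.doc);
  const thread2 = clone(store.doc);
  thread1.from = ["u2", "u3"];
  thread2.name = "Bob";
  write(store, thread1, ["from"]);
  write(store, thread2, ["name"]);
  return store.doc;
}

// save()-style write: the whole document is replaced with the thread's copy.
const saveWhole = (store, copy) => { store.doc = clone(copy); };

// $set-style write: only the fields the thread changed are touched.
const setFields = (store, copy, changed) => {
  for (const field of changed) store.doc[field] = clone(copy[field]);
};

console.log(race(saveWhole)); // name is "Bob", but thread1's "from" list is lost
console.log(race(setFields)); // name is "Bob" and "from" is ["u2", "u3"]
```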


Concurrency: Atomic Commutative Operators

When updating lists and counters, instead of using $set, rely on $inc, $addToSet, $pull

db.users.update({_id: u1}, {$pull: {to: u2}})

db.posts.update({_id: pId}, {$inc: {likesCount: 1}})
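"Commutative" here means the final document does not depend on the order in which concurrent updates arrive. The pure functions below are hypothetical in-memory imitations of `$inc`, `$addToSet`, and `$pull`; the sketch only checks updates that touch different values, since adding and pulling the same value would not commute.

```javascript
// In-memory imitations of the three operators; each returns a new document.
const inc = (doc, field, n) => ({ ...doc, [field]: (doc[field] || 0) + n });
const addToSet = (doc, field, value) =>
  doc[field].includes(value) ? doc : { ...doc, [field]: [...doc[field], value] };
const pull = (doc, field, value) =>
  ({ ...doc, [field]: doc[field].filter((item) => item !== value) });

const post = { likesCount: 10 };
const incThenInc = inc(inc(post, "likesCount", 1), "likesCount", 2);
const swappedInc = inc(inc(post, "likesCount", 2), "likesCount", 1);
console.log(incThenInc.likesCount, swappedInc.likesCount); // 13 13

const user = { to: ["u2", "u3"] };
const pullThenAdd = addToSet(pull(user, "to", "u2"), "to", "u9");
const addThenPull = pull(addToSet(user, "to", "u9"), "to", "u2");
console.log(pullThenAdd.to, addThenPull.to); // [ 'u3', 'u9' ] [ 'u3', 'u9' ]
```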


Concurrency: No Transactions

user1: { _id: u1, to: [u2, u3], from: [...], ...}

user2: { _id: u2, to: [...], from: [u1, ...], ...}

User1 wants to unsubscribe from user2.

Ideally we would update both users in one transaction

Implement it in your code
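One way to "implement it in your code" is to make each step idempotent, so that a failed attempt can simply be retried from the start. The sketch below is an assumption about how that could look, using an in-memory `Map` for the users collection; `pull` and `unsubscribe` are hypothetical helpers, and the second user's extra `from` entry `u7` is made up for the example.

```javascript
// Idempotent step, in the spirit of {$pull: {field: value}}.
function pull(doc, field, value) {
  doc[field] = doc[field].filter((item) => item !== value);
}

// Two independent single-document updates. If the process stops between them,
// running unsubscribe again finishes the job, because repeating a pull changes nothing.
function unsubscribe(users, fromUserId, targetUserId) {
  pull(users.get(fromUserId), "to", targetUserId);
  pull(users.get(targetUserId), "from", fromUserId);
}

const users = new Map([
  ["u1", { _id: "u1", to: ["u2", "u3"], from: [] }],
  ["u2", { _id: "u2", to: [], from: ["u1", "u7"] }],
]);

unsubscribe(users, "u1", "u2");
unsubscribe(users, "u1", "u2"); // a retry is harmless

console.log(users.get("u1").to);   // [ 'u3' ]
console.log(users.get("u2").from); // [ 'u7' ]
```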


Reducing collection size

Name your fields with short names!

post: { owner: ObjectId, messageText: "loving Jetlore", mediaUrl: "www.jetlore.com", mediaTitle: "Jetlore is a user analytics & search platform for social content" }

vs

post: { o: ObjectId, t: "loving Jetlore", mu: "www.jetlore.com", mt: "Jetlore is a user analytics & search platform for social content" }
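Field names are stored inside every document, so shorter names save the same number of bytes in each one. The sketch below measures the difference on the two example posts using JSON text as a rough proxy for the stored size; the `"ownerId"` string is a placeholder for the ObjectId, and `byteSize` is a hypothetical helper.

```javascript
const longNames = {
  owner: "ownerId",
  messageText: "loving Jetlore",
  mediaUrl: "www.jetlore.com",
  mediaTitle: "Jetlore is a user analytics & search platform for social content",
};

// Same values, shortened field names.
const shortNames = {
  o: longNames.owner,
  t: longNames.messageText,
  mu: longNames.mediaUrl,
  mt: longNames.mediaTitle,
};

// Size of the UTF-8 encoded JSON text of a document.
const byteSize = (doc) => new TextEncoder().encode(JSON.stringify(doc)).length;

const savedPerDoc = byteSize(longNames) - byteSize(shortNames);
console.log(savedPerDoc); // 28
```

The 28 bytes are exactly the characters removed from the four names, so for a hypothetical collection of 100 million posts the saving would be on the order of 2.8 GB.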


Outline

I. Schema design

II. Lessons learned for schema design

III. Things to remember about MongoDB
‣ Single lock
‣ ($or + sort) query doesn't use indexes properly
‣ Indexes with 2 list fields
‣ Record iterators + update


Indexes with 2 list fields

post: { _id: ObjectId(...), recipients: [...], links: [...], ... }

db.posts.ensureIndex({recipients: 1, links: 1})

(MongoDB cannot index two array fields of the same document in one compound index.)

$or & sort query doesn't use the proper index

db.posts.find({$or: [{recipients: uId}, {privacy: "Public"}]}).sort({date: -1})

db.posts.ensureIndex({recipients: 1, date: -1})

db.posts.ensureIndex({privacy: 1, date: -1})
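For the $or plus sort query, one possible workaround, which is not stated in the talk, is to run each branch as its own query so that each can use its own compound index, and then merge the two date-sorted results in application code. The sketch below shows only the merge step; `mergeNewestFirst` is a hypothetical helper and the input arrays stand in for the two query results.

```javascript
// Merge two arrays that are each sorted by date descending, dropping duplicate _ids
// (a post can match both branches of the $or).
function mergeNewestFirst(a, b) {
  const merged = [];
  const seen = new Set();
  let i = 0;
  let j = 0;
  while (i < a.length || j < b.length) {
    const takeA = j >= b.length || (i < a.length && a[i].date >= b[j].date);
    const next = takeA ? a[i++] : b[j++];
    if (!seen.has(next._id)) {
      seen.add(next._id);
      merged.push(next);
    }
  }
  return merged;
}

// Stand-ins for find({recipients: uId}).sort({date: -1})
// and find({privacy: "Public"}).sort({date: -1}).
const forUser = [{ _id: 1, date: 30 }, { _id: 2, date: 10 }];
const publicPosts = [{ _id: 3, date: 20 }, { _id: 2, date: 10 }];

console.log(mergeNewestFirst(forUser, publicPosts).map((post) => post._id)); // [ 1, 3, 2 ]
```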


Record iterators + updating

var posts = db.posts.find().skip(n).limit(t)
while (posts.hasNext()) {
  var post = posts.next()
  db.posts.update({_id: post._id}, {$set: {text: NewText}})
}

Sort by a field that will not change or rename the old collection

var posts = db.posts.find().sort({date: 1}).skip(n).limit(t)

db.posts.renameCollection("oldPosts")
var posts = db.oldPosts.find().skip(n).limit(t)
while (posts.hasNext()) {
  var post = posts.next()
  db.posts.update({_id: post._id}, {$set: {text: NewText}})
}
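Why paging with skip and limit while updating can go wrong is easier to see in a simulation. The sketch below assumes, as a simplification not spelled out in the talk, that an updated document moves to the end of the collection's natural order; `pageAndUpdate` is a hypothetical helper that records which documents each page visits.

```javascript
// Pages through docs with skip/limit, "updating" each visited document.
// order: optional comparator; when omitted, pages follow natural (storage) order.
function pageAndUpdate(docs, pageSize, order) {
  const visited = [];
  for (let skip = 0; ; skip += pageSize) {
    const view = order ? [...docs].sort(order) : docs.slice();
    const page = view.slice(skip, skip + pageSize);
    if (page.length === 0) break;
    for (const doc of page) {
      visited.push(doc._id);
      // Simulated relocation: the updated document moves to the end of natural order.
      docs.splice(docs.indexOf(doc), 1);
      docs.push(doc);
    }
  }
  return visited;
}

const makeDocs = () => [1, 2, 3, 4].map((n) => ({ _id: n, date: n }));

console.log(pageAndUpdate(makeDocs(), 2));                               // [ 1, 2, 1, 2 ]
console.log(pageAndUpdate(makeDocs(), 2, (a, b) => a.date - b.date));    // [ 1, 2, 3, 4 ]
```

In natural order, documents 3 and 4 are never visited and 1 and 2 are updated twice; sorting by a field the update does not touch keeps the pages stable.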


The takeaways

I. What is more important?
• Writes: Optimize for easy inserts/updates
• Reads: Optimize for easy querying

II. Denormalize to enable the most selective index

III. Concurrency: design to leverage commutative operators


Thank you!

Try our tech