20121023 mongodb schema-design

SCHEMA DESIGN WORKSHOP Jeremy Mikola @jmikola


Page 1: 20121023 mongodb schema-design

SCHEMA DESIGN WORKSHOP

Jeremy Mikola @jmikola

Page 2: 20121023 mongodb schema-design

AGENDA
1. Basic schema design principles for MongoDB
2. Schema design over an application's lifetime
3. Common design patterns
4. Sharding

Page 3: 20121023 mongodb schema-design

GOALS
Learn the schema design process in MongoDB
Practice applying common principles via exercises
Understand the implications of sharding

Page 4: 20121023 mongodb schema-design

WHAT IS A SCHEMA AND WHY IS IT IMPORTANT?

Page 5: 20121023 mongodb schema-design

SCHEMA
Map concepts and relationships to data
Set expectations for the data
Minimize overhead of iterative modifications
Ensure compatibility

Page 6: 20121023 mongodb schema-design

NORMALIZATION
users: username, first_name, last_name
books: title, isbn, language, created_by (→ users), author (→ authors)
authors: first_name, last_name

Page 7: 20121023 mongodb schema-design

DENORMALIZATION
users: username, first_name, last_name
books: title, isbn, language, created_by (→ users), author { first_name, last_name }

Page 8: 20121023 mongodb schema-design

WHAT IS SCHEMA DESIGN LIKE IN MONGODB?

Schema is defined at the application level
Design is part of each phase in its lifetime
There is no magic formula

Page 9: 20121023 mongodb schema-design

MONGODB DOCUMENTS
Storage in BSON → BSONSpec.org

Scalars
    Doubles
    Integers (32 or 64-bit)
    UTF-8 strings
    UTC Date, timestamp
    Binary, regex, code
    ObjectId
    null

Rich types
    Objects
    Arrays

Page 10: 20121023 mongodb schema-design

TERMINOLOGY

    {
        "mongodb"    : "relational db",
        "database"   : "database",
        "collection" : "table",
        "document"   : "row",
        "index"      : "index",
        "sharding" : {
            "shard"     : "partition",
            "shard key" : "partition key"
        }
    }

Page 11: 20121023 mongodb schema-design

THREE CONSIDERATIONS IN MONGODB SCHEMA DESIGN

1. The data your application needs
2. Your application's read usage of the data
3. Your application's write usage of the data

Page 12: 20121023 mongodb schema-design

CASE STUDY: LIBRARY WEB APPLICATION

Different schemas are possible

Page 13: 20121023 mongodb schema-design

AUTHOR SCHEMA

    {
        "_id": int,
        "first_name": string,
        "last_name": string
    }

Page 14: 20121023 mongodb schema-design

USER SCHEMA

    {
        "_id": int,
        "username": string,
        "password": string
    }

Page 15: 20121023 mongodb schema-design

BOOK SCHEMA

    {
        "_id": int,
        "title": string,
        "slug": string,
        "author": int,
        "available": boolean,
        "isbn": string,
        "pages": int,
        "publisher": {
            "city": string,
            "date": date,
            "name": string
        },
        "subjects": [ string, string ],
        "language": string,
        "reviews": [
            { "user": int, "text": string },
            { "user": int, "text": string }
        ]
    }

Page 16: 20121023 mongodb schema-design

EXAMPLE DOCUMENTS

Page 17: 20121023 mongodb schema-design

AUTHOR DOCUMENT

    > db.authors.findOne()
    {
        _id: 1,
        first_name: "F. Scott",
        last_name: "Fitzgerald"
    }

Page 18: 20121023 mongodb schema-design

USER DOCUMENT

    > db.users.findOne()
    {
        _id: 1,
        username: "[email protected]",
        password: "slsjfk4odk84k209dlkdj90009283d"
    }

Page 19: 20121023 mongodb schema-design

BOOK DOCUMENT

    > db.books.findOne()
    {
        _id: 1,
        title: "The Great Gatsby",
        slug: "9781857150193-the-great-gatsby",
        author: 1,
        available: true,
        isbn: "9781857150193",
        pages: 176,
        publisher: {
            name: "Everyman's Library",
            date: ISODate("1991-09-19T00:00:00Z"),
            city: "London"
        },
        subjects: ["Love stories", "1920s", "Jazz Age"],
        language: "English",
        reviews: [
            { user: 1, text: "One of the best…" },
            { user: 2, text: "It's hard to…" }
        ]
    }

Page 20: 20121023 mongodb schema-design

EMBEDDED OBJECTS
AKA EMBEDDED OR SUB-DOCUMENTS

What advantages do they have?

When should they be used?

Page 21: 20121023 mongodb schema-design

EMBEDDED OBJECTS

    > db.books.findOne()
    {
        _id: 1,
        title: "The Great Gatsby",
        slug: "9781857150193-the-great-gatsby",
        author: 1,
        available: true,
        isbn: "9781857150193",
        pages: 176,
        publisher: {
            name: "Everyman's Library",
            date: ISODate("1991-09-19T00:00:00Z"),
            city: "London"
        },
        subjects: ["Love stories", "1920s", "Jazz Age"],
        language: "English",
        reviews: [
            { user: 1, text: "One of the best…" },
            { user: 2, text: "It's hard to…" }
        ]
    }

Page 22: 20121023 mongodb schema-design

EMBEDDED OBJECTS
Great for read performance
One seek to load the entire document
One round trip to the database
Writes can be slow if constantly adding to objects

Page 23: 20121023 mongodb schema-design

LINKED DOCUMENTS
What advantages does this approach have?

When should they be used?

Page 24: 20121023 mongodb schema-design

LINKED DOCUMENTS

    > db.books.findOne()
    {
        _id: 1,
        title: "The Great Gatsby",
        slug: "9781857150193-the-great-gatsby",
        author: 1,
        available: true,
        isbn: "9781857150193",
        pages: 176,
        publisher: {
            publisher_name: "Everyman's Library",
            date: ISODate("1991-09-19T00:00:00Z"),
            publisher_city: "London"
        },
        subjects: ["Love stories", "1920s", "Jazz Age"],
        language: "English",
        reviews: [
            { user: 1, text: "One of the best…" },
            { user: 2, text: "It's hard to…" }
        ]
    }

Page 25: 20121023 mongodb schema-design

LINKED DOCUMENTS
More, smaller documents
Can make queries by ID very simple
Accessing linked document data requires an extra read
What effect does this have on the system?

Page 26: 20121023 mongodb schema-design

DATA, RAM AND DISK

Page 31: 20121023 mongodb schema-design

ARRAYS
When should they be used?

Page 32: 20121023 mongodb schema-design

ARRAY OF SCALARS

    > db.books.findOne()
    {
        _id: 1,
        title: "The Great Gatsby",
        slug: "9781857150193-the-great-gatsby",
        author: 1,
        available: true,
        isbn: "9781857150193",
        pages: 176,
        publisher: {
            name: "Everyman's Library",
            date: ISODate("1991-09-19T00:00:00Z"),
            city: "London"
        },
        subjects: ["Love stories", "1920s", "Jazz Age"],
        language: "English",
        reviews: [
            { user: 1, text: "One of the best…" },
            { user: 2, text: "It's hard to…" }
        ]
    }

Page 33: 20121023 mongodb schema-design

ARRAY OF OBJECTS

    > db.books.findOne()
    {
        _id: 1,
        title: "The Great Gatsby",
        slug: "9781857150193-the-great-gatsby",
        author: 1,
        available: true,
        isbn: "9781857150193",
        pages: 176,
        publisher: {
            name: "Everyman's Library",
            date: ISODate("1991-09-19T00:00:00Z"),
            city: "London"
        },
        subjects: ["Love stories", "1920s", "Jazz Age"],
        language: "English",
        reviews: [
            { user: 1, text: "One of the best…" },
            { user: 2, text: "It's hard to…" }
        ]
    }

Page 34: 20121023 mongodb schema-design

EXERCISE #1
Design a schema for users and their book reviews

Users
    username (string)
    email (string)

Reviews
    text (string)
    rating (integer)
    created_at (date)

Usernames are immutable

Page 35: 20121023 mongodb schema-design

EXERCISE #1: SOLUTION A
Reviews may be queried by user or book

    // db.users (one document per user)
    {
        _id: ObjectId("…"),
        username: "bob",
        email: "[email protected]"
    }

    // db.reviews (one document per review)
    {
        _id: ObjectId("…"),
        user: ObjectId("…"),
        book: ObjectId("…"),
        rating: 5,
        text: "This book is excellent!",
        created_at: ISODate("2012-10-10T21:14:07.096Z")
    }

Page 36: 20121023 mongodb schema-design

EXERCISE #1: SOLUTION B
Optimized to retrieve reviews by user

    // db.users (one document per user with all reviews)
    {
        _id: ObjectId("…"),
        username: "bob",
        email: "[email protected]",
        reviews: [
            {
                book: ObjectId("…"),
                rating: 5,
                text: "This book is excellent!",
                created_at: ISODate("2012-10-10T21:14:07.096Z")
            }
        ]
    }

Page 37: 20121023 mongodb schema-design

EXERCISE #1: SOLUTION C
Optimized to retrieve reviews by book

    // db.users (one document per user)
    {
        _id: ObjectId("…"),
        username: "bob",
        email: "[email protected]"
    }

    // db.books (one document per book with all reviews)
    {
        _id: ObjectId("…"),
        // Other book fields…
        reviews: [
            {
                user: ObjectId("…"),
                rating: 5,
                text: "This book is excellent!",
                created_at: ISODate("2012-10-10T21:14:07.096Z")
            }
        ]
    }

Page 38: 20121023 mongodb schema-design

SCHEMA DESIGN OVER AN APPLICATION'S LIFETIME

Development
Production
Iterative Modifications

Page 39: 20121023 mongodb schema-design

DEVELOPMENT PHASE
Basic CRUD functionality

Page 40: 20121023 mongodb schema-design

CREATE (CRUD)

The _id field is unique and automatically indexed
MongoDB will generate an ObjectId if not provided

    author = {
        _id: 2,
        first_name: "Arthur",
        last_name: "Miller"
    };

    db.authors.insert(author);

Page 41: 20121023 mongodb schema-design

READ (CRUD)

    > db.authors.find({ "last_name": "Miller" })
    {
        _id: 2,
        first_name: "Arthur",
        last_name: "Miller"
    }

Page 42: 20121023 mongodb schema-design

READS AND INDEXING
Examine the query after creating an index.

    > db.books.ensureIndex({ "slug": 1 })

    > db.books.find({ "slug": "9781857150193-the-great-gatsby" }).explain()
    {
        "cursor": "BtreeCursor slug_1",
        "isMultiKey" : false,
        "n" : 1,
        "nscannedObjects" : 1,
        "nscanned" : 1,
        "scanAndOrder" : false,
        "indexOnly" : false,
        "nYields" : 0,
        "nChunkSkips" : 0,
        "millis" : 0,
        // Other fields follow…
    }

Page 43: 20121023 mongodb schema-design

MULTI-KEY INDEXES
Index all values in an array field.

    > db.books.ensureIndex({ "subjects": 1 });

Page 44: 20121023 mongodb schema-design

INDEXING EMBEDDED FIELDS
Index an embedded object's field.

    > db.books.ensureIndex({ "publisher.name": 1 })

Page 45: 20121023 mongodb schema-design

QUERY OPERATORS
Conditional operators
    $gt, $gte, $lt, $lte, $ne, $all, $in, $nin, $size,
    $and, $or, $nor, $mod, $type, $exists
Regular expressions
Value in an array
    $elemMatch
Cursor methods and modifiers
    count(), limit(), skip(), snapshot(), sort(),
    batchSize(), explain(), hint()

Page 46: 20121023 mongodb schema-design

UPDATE (CRUD)

    review = {
        user: 1,
        text: "I did NOT like this book."
    };

    db.books.update(
        { _id: 1 },
        { $push: { reviews: review }}
    );

Page 47: 20121023 mongodb schema-design

ATOMIC MODIFIERS
Update specific fields within a document

$set, $unset
$push, $pushAll
$addToSet, $pop
$pull, $pullAll
$rename
$bit

Page 48: 20121023 mongodb schema-design

DELETE (CRUD)

    > db.books.remove({ _id: 1 })

Page 49: 20121023 mongodb schema-design

PRODUCTION PHASE
Evolve schema to meet the application's read and write patterns

Page 50: 20121023 mongodb schema-design

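READ USAGE
Finding books by an author's first name

    // Step 1: find the matching authors, returning only their IDs
    authors = db.authors.find({ first_name: /^f.*/i }, { _id: 1 });

    authorIds = authors.map(function(x) { return x._id; });

    // Step 2: find the books written by any of those authors
    db.books.find({ author: { $in: authorIds }});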

Page 51: 20121023 mongodb schema-design

READ USAGE
"Cache" the author name in an embedded document

    > db.books.findOne()
    {
        _id: 1,
        title: "The Great Gatsby",
        author: {
            first_name: "F. Scott",
            last_name: "Fitzgerald"
        },
        // Other fields follow…
    }

Queries are now one step

    > db.books.find({ "author.first_name": /^f.*/i })
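The difference between the two read paths can be sketched in plain JavaScript, with in-memory arrays standing in for the collections. The sample data and the function names `findLinked` and `findEmbedded` are illustrative assumptions, not part of the original slides, and the arrays only mimic what the database would do.

```javascript
// In-memory stand-ins for the collections (hypothetical sample data).
const authors = [
  { _id: 1, first_name: "F. Scott", last_name: "Fitzgerald" },
  { _id: 2, first_name: "Arthur", last_name: "Miller" },
];
const linkedBooks = [
  { _id: 1, title: "The Great Gatsby", author: 1 },
  { _id: 2, title: "The Crucible", author: 2 },
];
const embeddedBooks = [
  { _id: 1, title: "The Great Gatsby", author: { first_name: "F. Scott", last_name: "Fitzgerald" } },
  { _id: 2, title: "The Crucible", author: { first_name: "Arthur", last_name: "Miller" } },
];

// Linked schema: one pass over authors, then a second pass over books.
function findLinked(pattern) {
  const ids = authors.filter((a) => pattern.test(a.first_name)).map((a) => a._id);
  return linkedBooks.filter((b) => ids.includes(b.author));
}

// Embedded schema: a single pass over books.
function findEmbedded(pattern) {
  return embeddedBooks.filter((b) => pattern.test(b.author.first_name));
}
```

Both functions return the same books for `/^f.*/i`. The trade-off is that the embedded copy of the author's name must be kept in sync if the author document ever changes.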

Page 52: 20121023 mongodb schema-design

WRITE USAGE
Users can review a book

    review = {
        user: 1,
        text: "I thought this book was great!",
        rating: 5
    };

    > db.books.update(
        { _id: 3 },
        { $push: { reviews: review }}
    );

Document size limit (16MB)
Storage fragmentation after many updates/deletes

Page 53: 20121023 mongodb schema-design

EXERCISE #2
Display the 10 most recent reviews by a user
Make efficient use of memory and disk seeks

Page 54: 20121023 mongodb schema-design

EXERCISE #2: SOLUTION
Store users' reviews in monthly buckets

    // db.reviews (one document per user per month)
    {
        _id: "bob-2012-10",
        reviews: [
            {
                _id: ObjectId("…"),
                rating: 5,
                text: "This book is excellent!",
                created_at: ISODate("2012-10-10T21:14:07.096Z")
            },
            {
                _id: ObjectId("…"),
                rating: 2,
                text: "I didn't really enjoy this book.",
                created_at: ISODate("2012-10-11T20:12:50.594Z")
            }
        ]
    }

Page 55: 20121023 mongodb schema-design

EXERCISE #2: SOLUTION
Adding a new review to the appropriate bucket

    myReview = {
        _id: ObjectId("…"),
        rating: 3,
        text: "An average read.",
        created_at: ISODate("2012-10-13T12:26:11.502Z")
    };

    > db.reviews.update(
        { _id: "bob-2012-10" },
        { $push: { reviews: myReview }}
    );
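Choosing the "appropriate bucket" means deriving the bucket `_id` from the username and the review date. A minimal sketch follows, assuming the `username-YYYY-MM` format used in the update above and UTC months; the helper name `reviewBucketId` is hypothetical.

```javascript
// Build the monthly bucket _id for a review, e.g. "bob-2012-10".
// Assumes UTC dates and a zero-padded month so that string order
// matches chronological order.
function reviewBucketId(username, createdAt) {
  const year = createdAt.getUTCFullYear();
  const month = String(createdAt.getUTCMonth() + 1).padStart(2, "0");
  return `${username}-${year}-${month}`;
}
```

Zero-padding the month matters: without it, "bob-2012-9" would sort after "bob-2012-10" and break any query that orders buckets by `_id`.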

Page 56: 20121023 mongodb schema-design

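EXERCISE #2: SOLUTION
Display the 10 most recent reviews by a user

    // Newest buckets first. $push appends, so the newest reviews sit at
    // the end of each array; a negative $slice keeps the last 10.
    cursor = db.reviews.find(
        { _id: /^bob-/ },
        { reviews: { $slice: -10 }}
    ).sort({ _id: -1 });

    num = 0;

    while (cursor.hasNext() && num < 10) {
        doc = cursor.next();

        // Walk each bucket from its newest review to its oldest
        for (var i = doc.reviews.length - 1; i >= 0 && num < 10; --i, ++num) {
            printjson(doc.reviews[i]);
        }
    }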
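The same traversal can be sketched without a database, using an array of bucket documents. This assumes bucket IDs sort chronologically as strings and that reviews inside a bucket were appended in chronological order; `recentReviews` is a hypothetical helper name.

```javascript
// Collect up to `limit` reviews, newest first, from monthly buckets.
function recentReviews(buckets, limit) {
  // Newest bucket first (descending _id), without mutating the input.
  const sorted = [...buckets].sort((a, b) => (a._id < b._id ? 1 : a._id > b._id ? -1 : 0));
  const out = [];
  for (const bucket of sorted) {
    // Reviews are appended over time, so read each bucket from the end.
    for (let i = bucket.reviews.length - 1; i >= 0 && out.length < limit; i--) {
      out.push(bucket.reviews[i]);
    }
    if (out.length >= limit) break;
  }
  return out;
}
```

With monthly buckets, an active user's latest reviews usually live in one or two documents, which is what keeps memory use and disk seeks low.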

Page 57: 20121023 mongodb schema-design

EXERCISE #2: SOLUTION
Deleting a review

    // $pull removes the array element whose _id matches
    db.reviews.update(
        { _id: "bob-2012-10" },
        { $pull: { reviews: { _id: ObjectId("…") }}}
    );

Page 58: 20121023 mongodb schema-design

ITERATIVE MODIFICATIONS

Schema design is evolutionary

Page 59: 20121023 mongodb schema-design

ALLOW USERS TO BROWSE BY BOOK SUBJECT

    > db.subjects.findOne()
    {
        _id: 1,
        name: "American Literature",
        sub_category: {
            name: "1920s",
            sub_category: { name: "Jazz Age" }
        }
    }

How can you search this collection?
Be aware of document size limitations
Benefit from hierarchy being in same document

Page 60: 20121023 mongodb schema-design

TREE STRUCTURES

    > db.subjects.find()
    {
        _id: "American Literature"
    }

    {
        _id: "1920s",
        ancestors: ["American Literature"],
        parent: "American Literature"
    }

    {
        _id: "Jazz Age",
        ancestors: ["American Literature", "1920s"],
        parent: "1920s"
    }

    {
        _id: "Jazz Age in New York",
        ancestors: ["American Literature", "1920s", "Jazz Age"],
        parent: "Jazz Age"
    }

Page 61: 20121023 mongodb schema-design

TREE STRUCTURES
Find sub-categories of a given subject

    > db.subjects.find({ ancestors: "1920s" })
    {
        _id: "Jazz Age",
        ancestors: ["American Literature", "1920s"],
        parent: "1920s"
    }

    {
        _id: "Jazz Age in New York",
        ancestors: ["American Literature", "1920s", "Jazz Age"],
        parent: "Jazz Age"
    }
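Maintaining the `ancestors` array is the application's job. One way to sketch it: a new subject copies its parent's ancestors and appends the parent itself. The helpers `makeSubject` and `descendantsOf` are hypothetical, and the second one only imitates the `{ ancestors: … }` query over an in-memory array.

```javascript
// Build a subject document from its parent document (or none for a root).
function makeSubject(id, parent) {
  if (!parent) return { _id: id };
  const ancestors = (parent.ancestors || []).concat(parent._id);
  return { _id: id, ancestors: ancestors, parent: parent._id };
}

// In-memory equivalent of db.subjects.find({ ancestors: ancestorId }).
function descendantsOf(subjects, ancestorId) {
  return subjects.filter((s) => (s.ancestors || []).includes(ancestorId));
}
```

Because every document carries its full ancestor path, a single indexed query finds an entire subtree; the cost is rewriting `ancestors` on all descendants if a subject is ever moved.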

Page 62: 20121023 mongodb schema-design

EXERCISE #3
Allow users to borrow library books

User sends a loan request
Library approves or not
Requests time out after seven days

Approval process is asynchronous
Requests may be prioritized

Page 63: 20121023 mongodb schema-design

EXERCISE #3: SOLUTION
Need to maintain order and state
Ensure that updates are atomic

    // Create a new loan request
    > db.loans.insert({
        _id: { borrower: "bob", book: ObjectId("…") },
        pending: false,
        approved: false,
        priority: 1
    });

    // Find the highest priority request and mark as pending approval
    request = db.loans.findAndModify({
        query: { pending: false },
        sort: { priority: -1 },
        update: { $set: { pending: true, started: new ISODate() }},
        new: true
    });

Page 64: 20121023 mongodb schema-design

EXERCISE #3: SOLUTION
Updated and added fields
Modified document was returned

    {
        _id: { borrower: "bob", book: ObjectId("…") },
        pending: true,
        approved: false,
        priority: 1,
        started: ISODate("2012-10-11T22:09:42.542Z")
    }

Page 65: 20121023 mongodb schema-design

EXERCISE #3: SOLUTION

    // Library approves the loan request
    > db.loans.update(
        { _id: { borrower: "bob", book: ObjectId("…") }},
        { $set: { pending: false, approved: true }}
    );

Page 66: 20121023 mongodb schema-design

EXERCISE #3: SOLUTION

    // Request times out after seven days
    limit = new Date();
    limit.setDate(limit.getDate() - 7);

    > db.loans.update(
        { pending: true, started: { $lt: limit }},
        { $set: { pending: false, approved: false }}
    );

Page 67: 20121023 mongodb schema-design

EXERCISE #4
Allow users to recommend books

Users can recommend each book only once
Display a book's current recommendations

Page 68: 20121023 mongodb schema-design

EXERCISE #4: SOLUTION

    // db.recommendations (one document per user per book)
    > db.recommendations.insert({
        book: ObjectId("…"),
        user: ObjectId("…")
    });

    // Unique index ensures users can't recommend twice
    > db.recommendations.ensureIndex(
        { book: 1, user: 1 },
        { unique: true }
    );

    // Count the number of recommendations for a book
    > db.recommendations.count({ book: ObjectId("…") });

Page 69: 20121023 mongodb schema-design

EXERCISE #4: SOLUTION
MongoDB indexes do not store counts
Counts are computed via index scans
Denormalize totals on books

    > db.books.update(
        { _id: ObjectId("…") },
        { $inc: { recommendations: 1 }}
    );

Page 70: 20121023 mongodb schema-design

COMMON DESIGN PATTERNS

Page 71: 20121023 mongodb schema-design

ONE-TO-ONE RELATIONSHIP

Let's pretend that authors only write one book.

Page 72: 20121023 mongodb schema-design

LINKING
Either side, or both, can track the relationship.

    > db.books.findOne()
    {
        _id: 1,
        title: "The Great Gatsby",
        slug: "9781857150193-the-great-gatsby",
        author: 1,
        // Other fields follow…
    }

    > db.authors.findOne({ _id: 1 })
    {
        _id: 1,
        first_name: "F. Scott",
        last_name: "Fitzgerald",
        book: 1
    }

Page 73: 20121023 mongodb schema-design

EMBEDDED OBJECT

    > db.books.findOne()
    {
        _id: 1,
        title: "The Great Gatsby",
        slug: "9781857150193-the-great-gatsby",
        author: {
            first_name: "F. Scott",
            last_name: "Fitzgerald"
        },
        // Other fields follow…
    }

Page 74: 20121023 mongodb schema-design

ONE-TO-MANY RELATIONSHIP

In reality, authors may write multiple books.

Page 75: 20121023 mongodb schema-design

ARRAY OF IDS
The "one" side tracks the relationship.

Flexible and space-efficient
Additional query needed for non-ID lookups

    > db.authors.findOne()
    {
        _id: 1,
        first_name: "F. Scott",
        last_name: "Fitzgerald",
        books: [1, 3, 20]
    }

Page 76: 20121023 mongodb schema-design

SINGLE FIELD WITH ID
The "many" side tracks the relationship.

    > db.books.find({ author: 1 })
    {
        _id: 1,
        title: "The Great Gatsby",
        slug: "9781857150193-the-great-gatsby",
        author: 1,
        // Other fields follow…
    }

    {
        _id: 3,
        title: "This Side of Paradise",
        slug: "9780679447238-this-side-of-paradise",
        author: 1,
        // Other fields follow…
    }

Page 77: 20121023 mongodb schema-design

ARRAY OF OBJECTS

Use $slice operator to return a subset of books

    > db.authors.findOne()
    {
        _id: 1,
        first_name: "F. Scott",
        last_name: "Fitzgerald",
        books: [
            { _id: 1, title: "The Great Gatsby" },
            { _id: 3, title: "This Side of Paradise" }
        ],
        // Other fields follow…
    }

Page 78: 20121023 mongodb schema-design

MANY-TO-MANY RELATIONSHIP

Some books may also have co-authors.

Page 79: 20121023 mongodb schema-design

ARRAY OF IDS ON BOTH SIDES

    > db.books.findOne()
    {
        _id: 1,
        title: "The Great Gatsby",
        authors: [1, 5],
        // Other fields follow…
    }

    > db.authors.findOne()
    {
        _id: 1,
        first_name: "F. Scott",
        last_name: "Fitzgerald",
        books: [1, 3, 20]
    }

Page 80: 20121023 mongodb schema-design

ARRAY OF IDS ON BOTH SIDES

Query for all books by a given author

    > db.books.find({ authors: 1 });

Query for all authors of a given book

    > db.authors.find({ books: 1 });

Page 81: 20121023 mongodb schema-design

ARRAY OF IDS ON ONE SIDE

    > db.books.findOne()
    {
        _id: 1,
        title: "The Great Gatsby",
        authors: [1, 5],
        // Other fields follow…
    }

    > db.authors.find({ _id: { $in: [1, 5] }})
    {
        _id: 1,
        first_name: "F. Scott",
        last_name: "Fitzgerald"
    }

    {
        _id: 5,
        first_name: "Unknown",
        last_name: "Co-author"
    }

Page 82: 20121023 mongodb schema-design

ARRAY OF IDS ON ONE SIDE

Query for all books by a given author

    > db.books.find({ authors: 1 });

Query for all authors of a given book

    book = db.books.findOne(
        { title: "The Great Gatsby" },
        { authors: 1 }
    );

    db.authors.find({ _id: { $in: book.authors }});

Page 83: 20121023 mongodb schema-design

EXERCISE #5
Tracking time series data

Graph recommendations per unit of time
Count by: day, hour, minute

Page 84: 20121023 mongodb schema-design

EXERCISE #5: SOLUTION A

    // db.rec_ts (time series buckets, hour and minute sub-docs)
    > db.rec_ts.insert({
        book: ObjectId("…"),
        day: ISODate("2012-10-11T00:00:00.000Z"),
        total: 0,
        hour:   { "0": 0, "1": 0, /* … */ "23": 0 },
        minute: { "0": 0, "1": 0, /* … */ "1439": 0 }
    });

    // Record a recommendation created one minute before midnight
    > db.rec_ts.update(
        { book: ObjectId("…"), day: ISODate("2012-10-11T00:00:00.000Z") },
        { $inc: { total: 1, "hour.23": 1, "minute.1439": 1 }}
    );
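The field names in the `$inc` document follow directly from the event's timestamp. A minimal sketch of that derivation, assuming UTC days and the key layout of Solution A; `dayBucket` and `solutionAUpdate` are hypothetical helper names.

```javascript
// Midnight (UTC) of the day an event falls on; used as the bucket's `day`.
function dayBucket(createdAt) {
  return new Date(Date.UTC(
    createdAt.getUTCFullYear(),
    createdAt.getUTCMonth(),
    createdAt.getUTCDate()
  ));
}

// Build the $inc update for Solution A: day total, hour 0-23, minute 0-1439.
function solutionAUpdate(createdAt) {
  const hour = createdAt.getUTCHours();
  const minuteOfDay = hour * 60 + createdAt.getUTCMinutes();
  return {
    $inc: { total: 1, [`hour.${hour}`]: 1, [`minute.${minuteOfDay}`]: 1 },
  };
}
```

For an event at 23:59 UTC this yields the same keys as the update above: `hour.23` and `minute.1439`.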

Page 85: 20121023 mongodb schema-design

BSON STORAGE
Sequence of key/value pairs
Not a hash map
Optimized to scan quickly

minute: [0] [1] … [1439]

What is the cost of updating the minute before midnight?

Page 86: 20121023 mongodb schema-design

BSON STORAGE
We can skip sub-documents

hour0: [0] [1] … [59]
…
hour23: [1380] … [1439]

How could this change the schema?

Page 87: 20121023 mongodb schema-design

EXERCISE #5: SOLUTION B

    // db.rec_ts (time series buckets, each hour a sub-doc)
    > db.rec_ts.insert({
        book: ObjectId("…"),
        day: ISODate("2012-10-11T00:00:00.000Z"),
        total: 148,
        hour: {
            "0": { total: 7, "0": 0, /* … */ "59": 2 },
            "1": { total: 3, "60": 1, /* … */ "119": 0 },
            // Other hours…
            "23": { total: 12, "1380": 0, /* … */ "1439": 3 }
        }
    });

    // Record a recommendation created one minute before midnight
    > db.rec_ts.update(
        { book: ObjectId("…"), day: ISODate("2012-10-11T00:00:00.000Z") },
        { $inc: { total: 1, "hour.23.total": 1, "hour.23.1439": 1 }}
    );
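Why nesting helps can be shown with a rough count of how many elements must be stepped over before the target counter is reached. This is a simplified model that assumes fields are stored in insertion order, that a whole sub-document can be skipped in one step, and that each hour sub-document starts with its `total` field as in Solution B; the function names are hypothetical.

```javascript
// Flat layout (Solution A): every earlier minute counter is stepped over.
function flatSkips(minuteOfDay) {
  return minuteOfDay;
}

// Nested layout (Solution B): skip the earlier hour sub-documents, then
// the hour's "total" field and the earlier minutes within that hour.
function nestedSkips(minuteOfDay) {
  const hour = Math.floor(minuteOfDay / 60);
  const minuteInHour = minuteOfDay % 60;
  return hour + 1 + minuteInHour;
}

console.log(flatSkips(1439), nestedSkips(1439)); // prints "1439 83"
```

Under this model the worst case (the minute before midnight) drops from 1439 steps to 83, while the best case (the first minute of the day) gets slightly worse, from 0 to 1.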

Page 88: 20121023 mongodb schema-design

SINGLE-COLLECTION INHERITANCE
Take advantage of MongoDB's features

Documents need not all have the same fields
Sparsely index only present fields

Page 89: 20121023 mongodb schema-design

SCHEMA FLEXIBILITY

Find all books that are part of a series

    > db.books.findOne()
    {
        _id: 47,
        title: "The Wizard Chase",
        type: "series",
        series_title: "The Wizard's Trilogy",
        volume: 2,
        // Other fields follow…
    }

    > db.books.find({ type: "series" });

    > db.books.find({ series_title: { $exists: true }});

    > db.books.find({ volume: { $gt: 0 }});

Page 90: 20121023 mongodb schema-design

INDEX ONLY PRESENT FIELDS
Documents without these fields will not be indexed.

    > db.books.ensureIndex({ series_title: 1 }, { sparse: true })

    > db.books.ensureIndex({ volume: 1 }, { sparse: true })

Page 91: 20121023 mongodb schema-design

EXERCISE #6
Users can recommend at most 10 books

Page 92: 20121023 mongodb schema-design

EXERCISE #6: SOLUTION

    // db.user_recs (track user's remaining and given recommendations)
    > db.user_recs.insert({
        _id: "bob",
        remaining: 8,
        books: [3, 10]
    });

    // Record a recommendation if possible
    > db.user_recs.update(
        { _id: "bob", remaining: { $gt: 0 }, books: { $ne: 4 }},
        { $inc: { remaining: -1 }, $push: { books: 4 }}
    );

Page 93: 20121023 mongodb schema-design

EXERCISE #6: SOLUTION
One less unassigned recommendation remaining
Newly-recommended book is now linked

    > db.user_recs.findOne()
    {
        _id: "bob",
        remaining: 7,
        books: [3, 10, 4]
    }
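The update works because the guard conditions and the modification are applied to one document in a single operation. The logic can be sketched on a plain object; `recommend` is a hypothetical helper that only imitates the query-then-modify behavior and says nothing about concurrency.

```javascript
// Apply the recommendation only if the "query" part matches:
// some recommendations remain and the book is not already listed.
function recommend(userRecs, bookId) {
  const matches = userRecs.remaining > 0 && !userRecs.books.includes(bookId);
  if (!matches) return false; // nothing matched, nothing modified
  userRecs.remaining -= 1;    // $inc: { remaining: -1 }
  userRecs.books.push(bookId); // $push: { books: bookId }
  return true;
}
```

Recommending the same book twice, or recommending with `remaining` at 0, leaves the document untouched, which is how the schema enforces both "only once" and "at most 10".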

Page 94: 20121023 mongodb schema-design

EXERCISE #7
Statistic buckets

Each book has a listing page in our application
Record referring website domains for each book
Count each domain independently

Page 95: 20121023 mongodb schema-design

EXERCISE #7: SOLUTION A

    > db.book_refs.findOne()
    {
        book: 1,
        referrers: [
            { domain: "google.com", count: 4 },
            { domain: "yahoo.com", count: 1 }
        ]
    }

    > db.book_refs.update(
        { book: 1, "referrers.domain": "google.com" },
        { $inc: { "referrers.$.count": 1 }}
    );

Page 96: 20121023 mongodb schema-design

EXERCISE #7: SOLUTION A
Update the position of the first matched element.

    > db.book_refs.update(
        { book: 1, "referrers.domain": "google.com" },
        { $inc: { "referrers.$.count": 1 }}
    );

    > db.book_refs.findOne()
    {
        book: 1,
        referrers: [
            { domain: "google.com", count: 5 },
            { domain: "yahoo.com", count: 1 }
        ]
    }

What if a new referring website is used?

Page 97: 20121023 mongodb schema-design

EXERCISE #7: SOLUTION B

Replace dots with underscores for key names
Increment to add a new referring website
Upsert in case this is the book's first referrer

    > db.book_refs.findOne()
    {
        book: 1,
        referrers: {
            "google_com": 5,
            "yahoo_com": 1
        }
    }

    > db.book_refs.update(
        { book: 1 },
        { $inc: { "referrers.bing_com": 1 }},
        true
    );
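Dots need replacing because a dot in an update path such as `referrers.bing.com` would be read as a nested field. A minimal sketch of the key mapping and the resulting update document; `referrerKey` and `referrerIncrement` are hypothetical helper names.

```javascript
// Turn a domain into a safe field name: "google.com" -> "google_com".
function referrerKey(domain) {
  return domain.replace(/\./g, "_");
}

// Build the $inc document that counts one visit from `domain`.
function referrerIncrement(domain) {
  return { $inc: { [`referrers.${referrerKey(domain)}`]: 1 } };
}
```

One caveat of this simple mapping: two different hosts can collide if one already contains an underscore where the other has a dot, so a real application may want a different escape character.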

Page 98: 20121023 mongodb schema-design

SHARDING

Page 99: 20121023 mongodb schema-design

SHARDING
Ad-hoc partitioning
Consistent hashing
    Amazon DynamoDB
Range based partitioning
    Google BigTable
    Yahoo! PNUTS
    MongoDB

Page 100: 20121023 mongodb schema-design

SHARDING IN MONGODB
Automated management
Range based partitioning
Convert to sharded system with no downtime
Fully consistent

Page 101: 20121023 mongodb schema-design

SHARDING A COLLECTION

Keys range from −∞ to +∞
Ranges are stored as chunks

    > db.runCommand({ addshard : "shard1.example.com" });

    > db.runCommand({
        shardCollection: "library.books",
        key: { _id : 1 }
    });

Page 102: 20121023 mongodb schema-design

SHARDING DATA BY CHUNKS

[−∞, +∞)
→ [−∞, 40) [40, +∞)
→ [−∞, 40) [40, 50) [50, +∞)

Ranges are split into chunks as data is inserted

    > db.books.save({ _id: 35, title: "Call of the Wild" });
    > db.books.save({ _id: 40, title: "Tropic of Cancer" });
    > db.books.save({ _id: 45, title: "The Jungle" });
    > db.books.save({ _id: 50, title: "Of Mice and Men" });
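Range-based routing amounts to finding the half-open interval that contains a shard key value. A small sketch, assuming numeric keys and a sorted list of split points such as the 40 and 50 shown above; `chunkFor` is a hypothetical helper, not a MongoDB API.

```javascript
// Return the [lower, upper) chunk bounds that contain `key`.
// `splits` must be sorted ascending, e.g. [40, 50].
function chunkFor(key, splits) {
  let lower = -Infinity;
  for (const upper of splits) {
    if (key < upper) return [lower, upper];
    lower = upper; // lower bound is inclusive, upper bound exclusive
  }
  return [lower, Infinity];
}
```

With split points `[40, 50]`, the four books above land in [−∞, 40), [40, 50), [40, 50) and [50, +∞) respectively.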

Page 103: 20121023 mongodb schema-design

ADDING NEW SHARDS
shard1: [−∞, 40) [40, 50) [50, 60) [60, +∞)

Page 104: 20121023 mongodb schema-design

ADDING NEW SHARDS

shard1: [−∞, 40) [50, 60)
shard2: [40, 50) [60, +∞)

Chunks are migrated to balance shards

    > db.runCommand({ addshard : "shard2.example.com" });

Page 105: 20121023 mongodb schema-design

ADDING NEW SHARDS

shard1: [−∞, 40)
shard2: [40, 50) [60, +∞)
shard3: [50, 60)

    > db.runCommand({ addshard : "shard3.example.com" });

Page 108: 20121023 mongodb schema-design

SHARDING COMPONENTS
mongos
Config servers
Shards
    mongod
    Replica sets

Page 109: 20121023 mongodb schema-design

SHARDED WRITES
Inserts
    Shard key required
    Routed
Updates and removes
    Shard key optional
    May be routed or scattered

Page 110: 20121023 mongodb schema-design

SHARDED READS
Queries
    By shard key: routed
    Without shard key: scatter/gather
Sorted queries
    By shard key: routed in order
    Without shard key: distributed merge sort

Page 111: 20121023 mongodb schema-design

EXERCISE #8
Users can upload images for books

images
    image_id: ???
    data: binary

The collection will be sharded by image_id.

What should image_id be?

Page 112: 20121023 mongodb schema-design

EXERCISE #8: SOLUTIONS
What's the best shard key for our use case?

Auto-increment (ObjectId)
MD5 of data
Time (e.g. month) and MD5
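The third option can be sketched as a key that combines a time prefix with a content hash. This is only an illustration, assuming Node.js and its built-in `crypto` module; the `YYYYMM-<md5>` layout and the name `imageId` are hypothetical choices, not something the slides prescribe.

```javascript
const crypto = require("crypto");

// Build an image_id from the upload month (UTC) and the MD5 of the data.
// The month prefix groups recent uploads into one key range, while the
// hash spreads writes across that range.
function imageId(data, uploadedAt) {
  const md5 = crypto.createHash("md5").update(data).digest("hex");
  const year = uploadedAt.getUTCFullYear();
  const month = String(uploadedAt.getUTCMonth() + 1).padStart(2, "0");
  return `${year}${month}-${md5}`;
}
```

A key that only increases sends every insert to the last chunk, and a pure hash scatters inserts and reads over the whole key space; the prefixed hash sits between the two.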

Page 113: 20121023 mongodb schema-design

Right-balanced Access

Page 114: 20121023 mongodb schema-design

Random Access

Page 115: 20121023 mongodb schema-design

Segmented Access

Page 116: 20121023 mongodb schema-design

SUMMARY
Schema design is different in MongoDB.
Basic data design principles apply.
It's about your application.
It's about your data and how it's used.
It's about the entire lifetime of your application.

Page 117: 20121023 mongodb schema-design

THANKS! QUESTIONS?