MongoDB Schema Design (Richard Kreuter's Mongo Berlin preso)

Schema Design — MongoBerlin Richard M Kreuter 10gen Inc. [email protected] March 25, 2011 Schema Design — MongoBerlin

Schema Design — MongoBerlin

Richard M Kreuter, 10gen Inc.

[email protected]

March 25, 2011

Observations about Relational Database Schemas

Relational schema design is often presented and thought of as an exercise in normalization. While academics debate how many normal forms can fit on the head of a pin, practitioners tend to employ just one or two.

However, all nontrivial real-world applications employ a variety of strategic denormalizations: materialized views in the RDBMS, caching layers outside the RDBMS. These denormalizations tend to be vital to real-world performance.

Finally, application programmers seldom code in relations, but rather in object graphs; the RDBMS’s model, the set of tuples, isn’t a great fit for modern programming languages or developers’ minds.

MongoDB Documents, Queries, Features

MongoDB documents are deeply nestable sequences of key-value pairs, thus permitting “rich” structure.

The MongoDB query language is relatively SQL-like in its capacity to find documents satisfying complicated, dynamic criteria.

MongoDB documents can be updated atomically, with special efficiency at updates that don’t alter a document’s size or shape.

MongoDB Schema Design Generalities

When designing for MongoDB, do...

... let the application direct the schema.

... denormalize judiciously.

... design your schema for indexing.

... resort to application-level JOINs when needed.

And don’t ...

... treat collections as heaps.

... frequently resize documents.

Letting the application direct the schema

Most applications mostly view their data in a small number of distinguished “shapes”, generally congruent to graphs of inter-object has-a relationships among instance classes in the applications’ models. MongoDB lets you store your data more or less directly according to the shape of your model.

Letting the application direct the schema, continued

db.blog_posts.findOne()
{ _id : ObjectId(...),
  text : "A blazingly clever blog post.",
  by : "A. U. Thor",
  date : "Mon Mar 21 2011 03:54:51 GMT-0400 (EDT)",
  tags : [ "funny", "ironic" ] }

Denormalizing Judiciously

Most application entities turn out to have some fields that are very frequently altered, and other fields that are exceedingly seldom altered. Embedding copies of the infrequently altered attributes around the database is a reasonable strategy to improve performance.

Denormalizing Judiciously, continued

db.product_reviews.findOne()
{ _id : ObjectId(...),
  comment : "The best thing ever!",
  date : "Mon Mar 21 2011 03:54:51 GMT-0400 (EDT)",
  reviewer : { uid : ObjectId("987654abcxyz"),
               name : "Khan Sumer",
               thumbnail : "thumb-123456.jpg",
               url : "http://blahblah.com/" } }

db.users.find({ _id : ObjectId("987654abcxyz") })
{ _id : ObjectId("987654abcxyz"),
  name : "Khan Sumer",
  thumbnail : ..., url : ...,
  last_post : "Mon Mar 21 2011 03:54:51 GMT-0400 (EDT)",
  favorites : [ ... ], friends : [ ... ] }
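As a rough illustration of the embedding shown on this slide, the plain JavaScript sketch below builds a review document that carries a snapshot of the reviewer's seldom-changing fields. The helper name `makeReview`, the choice of copied fields, and the use of a string in place of an ObjectId are assumptions made for illustration; the user document is assumed to be keyed by `_id`.

```javascript
// Hypothetical helper: copy only the rarely changing user fields into the review.
// Frequently changing fields (last_post, favorites, friends) stay in users only.
function makeReview(user, comment, date) {
  return {
    comment: comment,
    date: date,
    reviewer: {
      uid: user._id,
      name: user.name,
      thumbnail: user.thumbnail,
      url: user.url
    }
  };
}

// Sample user document; the string _id stands in for an ObjectId.
const user = {
  _id: "987654abcxyz",
  name: "Khan Sumer",
  thumbnail: "thumb-123456.jpg",
  url: "http://blahblah.com/",
  last_post: "Mon Mar 21 2011 03:54:51 GMT-0400 (EDT)",
  favorites: [],
  friends: []
};

const review = makeReview(user, "The best thing ever!", user.last_post);
console.log(JSON.stringify(review, null, 2));
```

The trade-off is that a change to a copied field (say, a new thumbnail) has to be propagated to every review that embeds it, which is why only seldom-altered fields are good candidates.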

Design your schema for indexing

There’s a subtle relationship between schemas and indexes. Consider this query:

db.boxes.find({$where : "this.height > this.width"})

This query doesn’t take advantage of MongoDB indexes, both because of the JavaScript and also because this predicate isn’t something MongoDB knows how to index. If this sort of query is important, maintaining a separate boolean attribute in the document is the right thing; and the separate value can be indexed.
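A minimal sketch of the separate boolean attribute, written as plain JavaScript so it runs without a server. The field name `is_tall` and the helper `withTallFlag` are hypothetical; the idea is only that the application computes the comparison once, at write time, and stores the result.

```javascript
// Hypothetical field name: is_tall caches the result of height > width.
function withTallFlag(box) {
  return Object.assign({}, box, { is_tall: box.height > box.width });
}

// Documents as the application would prepare them before saving.
const boxes = [
  { height: 10, width: 4 },
  { height: 3, width: 8 },
  { height: 5, width: 5 }
].map(withTallFlag);

// An equality match on is_tall is a predicate an ordinary index can serve,
// e.g. in the shell: db.boxes.find({ is_tall : true })
// Here the same predicate is applied to the in-memory array.
const tall = boxes.filter(function (b) { return b.is_tall === true; });
console.log(tall);
```

The flag has to be recomputed whenever `height` or `width` is updated, so the write path owns its correctness.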

Application-level JOINs

Because most MongoDB documents are “richer” than RDBMS rows, they tend to represent “pre-JOINed” data; and so application-level JOIN operations should be few. However, sometimes you do need relational-style normalization and application-level JOINs. This comes up in some many-to-many relationships, and may not cost much in practice.
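One way an application-level JOIN over a many-to-many relationship might look, sketched in plain JavaScript with in-memory arrays standing in for two collections. The students/courses data, the field `course_ids`, and the helper names are all hypothetical and not from the talk.

```javascript
// In-memory stand-ins for two collections (hypothetical data).
const students = [
  { _id: 1, name: "Ada", course_ids: [10, 20] },
  { _id: 2, name: "Linus", course_ids: [20] }
];
const courses = [
  { _id: 10, title: "Databases" },
  { _id: 20, title: "Compilers" },
  { _id: 30, title: "Networks" }
];

// Stand-in for a query like db.courses.find({ _id : { $in : ids } }).
function findByIds(collection, ids) {
  return collection.filter(function (doc) {
    return ids.indexOf(doc._id) !== -1;
  });
}

// Application-level JOIN: one lookup for the student, one for its courses,
// then the application stitches the two results together.
function studentWithCourses(studentId) {
  const student = students.find(function (s) { return s._id === studentId; });
  if (!student) return null;
  const found = findByIds(courses, student.course_ids);
  return {
    name: student.name,
    courses: found.map(function (c) { return c.title; })
  };
}

console.log(studentWithCourses(1));
```

The cost is two round trips instead of one, which is the sense in which such JOINs "may not cost much in practice" when the id list is small.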

Don’t treat collections as heaps

Although MongoDB permits quite a bit of freedom in document structure, documents in a collection ought to share a common subset of attributes, for programmatic processing, effective indexing, and developer comprehension. If you have documents with very different sets of attributes, consider storing them in separate collections.

Don’t frequently resize documents

Resizing a document (e.g. by adding/removing attributes or adding/removing elements of lists) is generally costly. (In-place updates are quite efficient, however.) In general, a schema whose documents’ sizes are highly volatile should be considered suspect; such data might best be stored as separate documents.

Don’t frequently resize documents, continued

So, instead of this:

db.urlhits.findOne()
{ _id : ..., url : "http://10gen.com",
  // this is counting with granularity of 1 day
  counts : { "2011-03-01" : { firefox : 12345, chrome : 23456 },
             "2011-03-02" : { firefox : 15678, chrome : 24567 },
             ... } }

consider this:

db.urlhits2.findOne()
{ _id : ..., url : "http://10gen.com",
  date : "2011-03-01",
  counts : { firefox : 12345, chrome : 23456 } }
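The restructuring can be sketched as a small plain JavaScript transform that turns one document with a growing per-day map into one fixed-shape document per day. The function name `splitByDay` is hypothetical, and `_id` is omitted because the slide leaves it unspecified.

```javascript
// Turn one document with a per-day map into one document per (url, date) pair.
function splitByDay(doc) {
  return Object.keys(doc.counts).sort().map(function (day) {
    return { url: doc.url, date: day, counts: doc.counts[day] };
  });
}

// The nested shape from the first example on this slide.
const nested = {
  url: "http://10gen.com",
  counts: {
    "2011-03-01": { firefox: 12345, chrome: 23456 },
    "2011-03-02": { firefox: 15678, chrome: 24567 }
  }
};

const perDay = splitByDay(nested);
console.log(perDay);
```

With one document per day, a new day means inserting a new document rather than growing an existing one, and counting a hit only changes a number inside a document whose shape stays the same.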

Don’t frequently resize documents, continued

So, instead of this:

db.user_events.findOne()
{ _id : ..., user : "kreuter",
  clicks : [ { url : <url1>, time : <time1> },
             { url : <url2>, time : <time2> },
             ... ] }

consider this:

db.user_events.findOne()
{ _id : ..., user : "kreuter", url : <url1>, time : <time1> }
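The same idea for the click log, sketched as plain JavaScript: each element of the ever-growing `clicks` array becomes its own small document. The helper name `flattenClicks` and the sample URLs and timestamps are made up for illustration, since the slide only uses placeholders.

```javascript
// Produce one document per click instead of appending to an array.
function flattenClicks(doc) {
  return doc.clicks.map(function (click) {
    return { user: doc.user, url: click.url, time: click.time };
  });
}

// Hypothetical sample data in the nested shape.
const nestedEvents = {
  user: "kreuter",
  clicks: [
    { url: "http://example.com/a", time: 1300693000 },
    { url: "http://example.com/b", time: 1300693060 }
  ]
};

const events = flattenClicks(nestedEvents);
console.log(events);
```

Recording a new click is then an insert of a new document, so no existing document has to grow.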

Going forward

www.mongodb.org — downloads, docs, community

[email protected] — mailing list

#mongodb on irc.freenode.net

try.mongodb.org — web-based shell

10gen is hiring. Email [email protected].

10gen offers support, training, and advising services for MongoDB.
