MongoDB Schema Design (Richard Kreuter's Mongo Berlin preso)
Schema Design — MongoBerlin
Richard M Kreuter, 10gen Inc.
March 25, 2011
Observations about Relational Database Schemas
Relational schema design is often presented and thought of as an exercise in normalization. While academics debate how many normal forms can fit on the head of a pin, practitioners tend to employ just one or two.
However, all nontrivial real-world applications employ a variety of strategic denormalizations: materialized views in the RDBMS, caching layers outside the RDBMS. These denormalizations tend to be vital to real-world performance.
Finally, application programmers seldom code in relations, but rather in object graphs; the RDBMS’s model, the set of tuples, isn’t a great fit for modern programming languages or developers’ minds.
MongoDB Documents, Queries, Features
MongoDB documents are deeply nestable sequences of key-value pairs, thus permitting “rich” structure.
The MongoDB query language is relatively SQL-like in its capacity to find documents satisfying complicated, dynamic criteria.
MongoDB documents can be updated atomically, with special efficiency for updates that don’t alter a document’s size or shape.
MongoDB Schema Design Generalities
When designing for MongoDB, do...
... let the application direct the schema.
... denormalize judiciously.
... design your schema for indexing.
... resort to application-level JOINs when needed.
And don’t ...
... treat collections as heaps.
... frequently resize documents.
Letting the application direct the schema
Most applications view their data in a small number of distinguished “shapes”, generally congruent to the graphs of inter-object has-a relationships among the classes in the applications’ models. MongoDB lets you store your data more or less directly according to the shape of your model.
Letting the application direct the schema, continued
db.blog_posts.findOne()
{ _id : ObjectId(...),
  text : "A blazingly clever blog post.",
  by : "A. U. Thor",
  date : "Mon Mar 21 2011 03:54:51 GMT-0400 (EDT)",
  tags : [ "funny", "ironic" ]
}
Denormalizing Judiciously
Most application entities turn out to have some fields that are very frequently altered, and other fields that are exceedingly seldom altered. Embedding copies of the infrequently altered attributes in the other documents that need them is a reasonable strategy to improve performance.
Denormalizing Judiciously, continued
db.product_reviews.findOne()
{ _id : ObjectId(...),
  comment : "The best thing ever!",
  date : "Mon Mar 21 2011 03:54:51 GMT-0400 (EDT)",
  reviewer : { uid : ObjectId("987654abcxyz"),
               name : "Khan Sumer",
               thumbnail : "thumb-123456.jpg",
               url : "http://blahblah.com/" } }
db.users.find({ _id : ObjectId("987654abcxyz") })
{ _id : ObjectId("987654abcxyz"),
  name : "Khan Sumer",
  thumbnail : ..., url : ...,
  last_post : "Mon Mar 21 2011 03:54:51 GMT-0400 (EDT)",
  favorites : [ ... ], friends : [ ... ] }
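A minimal sketch of this kind of denormalization, using plain JavaScript objects in place of a live database. The helper names reviewerSnapshot and makeReview are hypothetical and not part of any MongoDB API; the field names follow the example documents.

```javascript
// Hypothetical helpers; plain objects stand in for MongoDB documents.

// Copy only the rarely changing user fields that a review needs to display.
function reviewerSnapshot(user) {
  return {
    uid: user._id,
    name: user.name,
    thumbnail: user.thumbnail,
    url: user.url,
  };
}

// Build a review document with the reviewer's snapshot embedded.
function makeReview(user, comment, date) {
  return { comment: comment, date: date, reviewer: reviewerSnapshot(user) };
}

const user = {
  _id: "987654abcxyz", // stand-in for an ObjectId
  name: "Khan Sumer",
  thumbnail: "thumb-123456.jpg",
  url: "http://blahblah.com/",
  last_post: "Mon Mar 21 2011 03:54:51 GMT-0400 (EDT)", // changes often, so not copied
  favorites: [],
  friends: [],
};

const review = makeReview(user, "The best thing ever!", user.last_post);
console.log(JSON.stringify(review.reviewer));
```

The cost of the copy is that a change to an embedded field, such as the user's name, has to be propagated to every review that embeds it. That is acceptable only because such changes are rare.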
Design your schema for indexing
There’s a subtle relationship between schemas and indexes. Consider this query:
db.boxes.find({$where : "this.height > this.width"})
This query doesn’t take advantage of MongoDB indexes, both because of the JavaScript and because this predicate isn’t something MongoDB knows how to index. If this sort of query is important, the right thing to do is to maintain a separate boolean attribute in the document; that separate value can be indexed.
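A minimal sketch of maintaining such a boolean attribute, in plain JavaScript with an in-memory array standing in for the collection. The attribute name tall and the helper withTallFlag are assumptions made for illustration.

```javascript
// Hypothetical helper: compute the derived flag whenever a box is written.
function withTallFlag(box) {
  return Object.assign({}, box, { tall: box.height > box.width });
}

const boxes = [
  { height: 10, width: 4 },
  { height: 3, width: 8 },
  { height: 5, width: 5 },
].map(withTallFlag);

// In-memory equivalent of an equality query on the flag. Against MongoDB
// this would be a query such as db.boxes.find({ tall : true }), which can
// use an index on "tall" instead of evaluating JavaScript per document.
const tallBoxes = boxes.filter(function (b) { return b.tall === true; });
console.log(tallBoxes.length);
```

The flag has to be recomputed by the application whenever height or width changes; the database will not keep it in sync on its own.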
Application-level JOINs
Because most MongoDB documents are “richer” than RDBMS rows, they tend to represent “pre-JOINed” data, and so application-level JOIN operations should be few. However, sometimes you do need relational-style normalization and application-level JOINs. This comes up in some many-to-many relationships, and may not cost much in practice.
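A minimal sketch of an application-level JOIN for a many-to-many relationship, with plain arrays standing in for two collections. The students/courses data and the helpers findByIds and studentsWithCourses are hypothetical.

```javascript
// Plain arrays stand in for two collections; all names here are hypothetical.
const students = [
  { _id: 1, name: "Ada", course_ids: [10, 20] },
  { _id: 2, name: "Bob", course_ids: [20] },
];
const courses = [
  { _id: 10, title: "Databases" },
  { _id: 20, title: "Compilers" },
  { _id: 30, title: "Networks" },
];

// Stand-in for one query that fetches many documents by _id at once
// (against MongoDB, a find with an $in condition on _id).
function findByIds(collection, ids) {
  const wanted = new Set(ids);
  return collection.filter(function (doc) { return wanted.has(doc._id); });
}

// Application-level JOIN: one lookup for all referenced courses, then
// stitch the results together in application code.
function studentsWithCourses(studentDocs) {
  const ids = [];
  studentDocs.forEach(function (s) { ids.push.apply(ids, s.course_ids); });
  const byId = new Map();
  findByIds(courses, ids).forEach(function (c) { byId.set(c._id, c); });
  return studentDocs.map(function (s) {
    return {
      name: s.name,
      courses: s.course_ids.map(function (id) { return byId.get(id).title; }),
    };
  });
}

const joined = studentsWithCourses(students);
console.log(JSON.stringify(joined));
```

Batching the referenced ids into a single lookup keeps the JOIN at two queries in total, rather than one query per referencing document.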
Don’t treat collections as heaps
Although MongoDB permits quite a bit of freedom in document structure, documents in a collection ought to share a common subset of attributes, for programmatic processing, effective indexing, and developer comprehension. If you have documents with very different sets of attributes, consider storing them in separate collections.
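One rough way to check whether a collection is drifting toward a heap is to sample some documents and see which attributes they all share. The sketch below does this in plain JavaScript; commonKeys and the sample documents are hypothetical.

```javascript
// Hypothetical helper: the attribute names present in every document.
function commonKeys(docs) {
  if (docs.length === 0) return [];
  return Object.keys(docs[0]).filter(function (key) {
    return docs.every(function (doc) {
      return Object.prototype.hasOwnProperty.call(doc, key);
    });
  });
}

const sample = [
  { _id: 1, type: "book", title: "A", pages: 100 },
  { _id: 2, type: "film", title: "B", minutes: 90 },
  { _id: 3, type: "book", title: "C" },
];
const shared = commonKeys(sample);
console.log(shared);
```

If little besides _id comes back, the documents probably belong in separate collections.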
Don’t frequently resize documents
Resizing a document (e.g. by adding/removing attributes or adding/removing elements of lists) is generally costly. (In-place updates are quite efficient, however.) In general, a schema whose documents’ sizes are highly volatile should be considered suspect; such data might best be stored as separate documents.
Don’t frequently resize documents, continued
So, instead of this:
db.urlhits.findOne()
{ _id : ..., url : "http://10gen.com",
  // this is counting with granularity of 1 day
  counts : { "2011-03-01" : { firefox : 12345, chrome : 23456 },
             "2011-03-02" : { firefox : 15678, chrome : 24567 },
             ... } }
consider this:
db.urlhits2.findOne()
{ _id : ..., url : "http://10gen.com",
  date : "2011-03-01",
  counts : { firefox : 12345, chrome : 23456 } }
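A minimal sketch of the difference, in plain JavaScript with in-memory objects. The helpers splitByDay and recordHit are hypothetical; the field names follow the two example documents.

```javascript
// Hypothetical helper: split one ever-growing document into one document per day.
function splitByDay(doc) {
  return Object.keys(doc.counts).sort().map(function (day) {
    // Copy the per-day counters so the new documents are independent.
    return { url: doc.url, date: day, counts: Object.assign({}, doc.counts[day]) };
  });
}

const growing = {
  url: "http://10gen.com",
  counts: {
    "2011-03-01": { firefox: 12345, chrome: 23456 },
    "2011-03-02": { firefox: 15678, chrome: 24567 },
  },
};

const perDay = splitByDay(growing);

// Recording a hit now only changes a number inside a fixed-shape document.
// Against MongoDB, this kind of change is typically expressed with the
// $inc update operator.
function recordHit(dayDoc, browser) {
  dayDoc.counts[browser] += 1;
}
recordHit(perDay[0], "firefox");
console.log(JSON.stringify(perDay[0]));
```

With one document per day, a new day becomes an insert of a new small document rather than growth of an existing one.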
Don’t frequently resize documents, continued
So, instead of this:
db.user_events.findOne()
{ _id : ..., user : "kreuter",
  clicks : [ { url : <url1>, time : <time1> },
             { url : <url2>, time : <time2> },
             ... ] }
consider this:
db.user_events.findOne()
{ _id : ..., user : "kreuter", url : <url1>, time : <time1> }
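A minimal sketch of that reshaping in plain JavaScript. The helper flattenClicks is hypothetical, and the URLs and times are placeholder values standing in for the elided ones.

```javascript
// Hypothetical helper: turn one document holding a growing array of clicks
// into one small, fixed-size document per click.
function flattenClicks(doc) {
  return doc.clicks.map(function (click) {
    return { user: doc.user, url: click.url, time: click.time };
  });
}

const nested = {
  user: "kreuter",
  clicks: [
    { url: "http://example.com/a", time: 1 }, // placeholder values
    { url: "http://example.com/b", time: 2 },
  ],
};
const events = flattenClicks(nested);
console.log(events.length);
```

Each new click then becomes an insert of a new document instead of an append that grows an existing one; finding one user's events relies on an index on user.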
Going forward
www.mongodb.org — downloads, docs, community
[email protected] — mailing list
#mongodb on irc.freenode.net
try.mongodb.org — web-based shell
10gen is hiring. Email [email protected].
10gen offers support, training, and advising services for MongoDB.