MongoBoulder - Schema Design
Transcript of MongoBoulder - Schema Design
Schema Design
Alvin Richards
Topics
Introduction
• Basic Data Modeling
• Evolving a schema
Common patterns
• Single table inheritance
• One-to-Many & Many-to-Many
• Trees
• Queues
So why model data?
http://www.flickr.com/photos/42304632@N00/493639870/
A brief history of normalization
• 1970 E.F. Codd introduces 1st Normal Form (1NF)
• 1971 E.F. Codd introduces 2nd and 3rd Normal Form (2NF, 3NF)
• 1974 Codd & Boyce define Boyce/Codd Normal Form (BCNF)
• 2002 Date, Darwen, Lorentzos define 6th Normal Form (6NF)
Goals:
• Avoid anomalies when inserting, updating or deleting
• Minimize redesign when extending the schema
• Make the model informative to users
• Avoid bias towards a particular style of query
* source : wikipedia
The real benefit of relational
• Before relational
  • Data and Logic combined
• After relational
  • Separation of concerns
  • Data modeled independent of logic
  • Logic freed from concerns of data design
• MongoDB continues this separation
Relational made normalized data look like this
Document databases make normalized data look like this
Terminology
RDBMS            MongoDB
Table            Collection
Row(s)           JSON Document
Index            Index
Join             Embedding & Linking
Partition        Shard
Partition Key    Shard Key
DB Considerations
How can we manipulate this data?
• Dynamic Queries
• Secondary Indexes
• Atomic Updates
• Map Reduce
Considerations
• No Joins
• Document writes are atomic
Access Patterns?
• Read / Write Ratio
• Types of updates
• Types of queries
• Data life-cycle
So today’s example will use...
Design Session
Design documents that simply map to your application
post = {author: "Hergé",
        date: new Date(),
        text: "Destination Moon",
        tags: ["comic", "adventure"]}
> db.posts.save(post)
Find the document
> db.posts.find()
  { _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
    author: "Hergé",
    date: "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)",
    text: "Destination Moon",
    tags: [ "comic", "adventure" ] }
Notes:
• ID must be unique, but can be anything you’d like
• MongoDB will generate a default ID if one is not supplied
Add an index, find via index
Secondary index for "author"
// 1 means ascending, -1 means descending
> db.posts.ensureIndex({author: 1})
> db.posts.find({author: 'Hergé'})
  { _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
    date: "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)",
    author: "Hergé",
    ... }
Verifying indexes exist
> db.system.indexes.find()
// Index on ID
  { name: "_id_",
    ns: "test.posts",
    key: { "_id" : 1 } }
// Index on author
  { _id: ObjectId("4c4ba6c5672c685e5e8aabf4"),
    ns: "test.posts",
    key: { "author" : 1 },
    name: "author_1" }
Examine the query plan
> db.posts.find({author: 'Hergé'}).explain()
  { "cursor" : "BtreeCursor author_1",
    "nscanned" : 1,
    "nscannedObjects" : 1,
    "n" : 1,
    "millis" : 5,
    "indexBounds" : { "author" : [ [ "Hergé", "Hergé" ] ] } }
Query operators
Conditional operators:
  $ne, $in, $nin, $mod, $all, $size, $exists, $type, ...
  $lt, $lte, $gt, $gte
// find posts with any tags
> db.posts.find({tags: {$exists: true}})
Regular expressions:
// posts where author starts with h
> db.posts.find({author: /^h/i})
Counting:
// number of posts written by Hergé
> db.posts.find({author: "Hergé"}).count()
Extending the Schema
new_comment = {author: "Kyle",
               date: new Date(),
               text: "great book"}
> db.posts.update(
    {text: "Destination Moon"},
    {"$push": {comments: new_comment},
     "$inc": {comments_count: 1}})
  { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
    author : "Hergé",
    date : "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)",
    text : "Destination Moon",
    tags : [ "comic", "adventure" ],
    comments : [
      { author : "Kyle",
        date : "Sat Jul 24 2010 20:51:03 GMT-0700 (PDT)",
        text : "great book" }
    ],
    comments_count: 1 }
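A rough sketch of what the update above does to the matched document, in plain JavaScript so it runs without a server. The helper applyUpdate is hypothetical and only covers $push and $inc; it is an illustration under those assumptions, not MongoDB's implementation.

```javascript
// Hypothetical helper (not a MongoDB API): applies $push and $inc to a plain object.
function applyUpdate(doc, update) {
  const result = JSON.parse(JSON.stringify(doc)); // work on a copy of the document
  for (const [field, value] of Object.entries(update.$push || {})) {
    // $push creates the array when the field does not exist yet
    if (result[field] === undefined) result[field] = [];
    result[field].push(value);
  }
  for (const [field, amount] of Object.entries(update.$inc || {})) {
    // $inc treats a missing field as 0
    result[field] = (result[field] || 0) + amount;
  }
  return result;
}

const post = { author: "Hergé", text: "Destination Moon", tags: ["comic", "adventure"] };
const newComment = { author: "Kyle", text: "great book" };

const updated = applyUpdate(post, {
  $push: { comments: newComment },
  $inc: { comments_count: 1 },
});
console.log(updated.comments.length, updated.comments_count); // 1 1
```

Neither comments nor comments_count existed before the update; both appear on first use, which is how the schema gets extended without a migration step.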
Extending the Schema
// create index on nested documents:
> db.posts.ensureIndex({"comments.author": 1})
> db.posts.find({"comments.author": "Kyle"})
// find last 5 posts:
> db.posts.find().sort({date: -1}).limit(5)
// most commented post:
> db.posts.find().sort({comments_count: -1}).limit(1)
When sorting, check if you need an index
Watch for full table scans
> db.posts.find({text: 'Destination Moon'}).explain()
  { "cursor" : "BasicCursor",
    "nscanned" : 1,
    "nscannedObjects" : 1,
    "n" : 1,
    "millis" : 0,
    "indexBounds" : { } }
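To make the difference between the two cursors concrete, here is a toy comparison in plain JavaScript. It assumes made-up data and uses a Map as a stand-in for the real B-tree index; scanFind, buildIndex and indexFind are hypothetical names, and nscanned here is only an analogy for the field in the explain output.

```javascript
// Toy data (made up): 1000 posts, exactly one of them by Hergé.
const posts = [];
for (let i = 0; i < 1000; i++) {
  posts.push({ _id: i, author: i === 500 ? "Hergé" : "author" + i });
}

// Without an index every document is examined (compare "BasicCursor").
function scanFind(docs, author) {
  let nscanned = 0;
  const matches = [];
  for (const doc of docs) {
    nscanned++;
    if (doc.author === author) matches.push(doc);
  }
  return { matches, nscanned };
}

// With an index, a lookup structure maps each author to its documents,
// so only matching documents are examined (compare "BtreeCursor author_1").
function buildIndex(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    if (!index.has(doc[field])) index.set(doc[field], []);
    index.get(doc[field]).push(doc);
  }
  return index;
}

function indexFind(index, author) {
  const matches = index.get(author) || [];
  return { matches, nscanned: matches.length };
}

const authorIndex = buildIndex(posts, "author");
console.log(scanFind(posts, "Hergé").nscanned);        // 1000
console.log(indexFind(authorIndex, "Hergé").nscanned); // 1
```

On a one-document collection, as in the explain output above, the scan looks harmless; the cost only shows once the collection grows.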
Map Reduce
Map reduce : count tags
mapFunc = function () {
    this.tags.forEach(function (z) { emit(z, {count: 1}); });
}
reduceFunc = function (k, v) {
    var total = 0;
    for (var i = 0; i < v.length; i++) { total += v[i].count; }
    return {count: total};
}
res = db.posts.mapReduce(mapFunc, reduceFunc)
> db[res.result].find()
  { _id : "comic", value : { count : 1 } }
  { _id : "adventure", value : { count : 1 } }
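The same map and reduce functions can be tried without a server. The sketch below is a plain-JavaScript approximation: the global emit and the driver loop are hypothetical stand-ins for what the server does, and the second post is made-up sample data.

```javascript
// Stand-in for the server-side emit(): collects emitted values per key.
const groups = new Map();
function emit(key, value) {
  if (!groups.has(key)) groups.set(key, []);
  groups.get(key).push(value);
}

// Same shape as the slide's functions: map runs with `this` bound to a document.
const mapFunc = function () {
  this.tags.forEach(function (z) { emit(z, { count: 1 }); });
};
const reduceFunc = function (k, v) {
  let total = 0;
  for (let i = 0; i < v.length; i++) { total += v[i].count; }
  return { count: total };
};

// Made-up sample data: two posts sharing the "comic" tag.
const posts = [
  { text: "Destination Moon", tags: ["comic", "adventure"] },
  { text: "Explorers on the Moon", tags: ["comic"] },
];

// Map phase: run mapFunc once per document.
for (const doc of posts) mapFunc.call(doc);

// Reduce phase: a key with a single emitted value is passed through as-is.
const result = [];
for (const [key, values] of groups) {
  result.push({ _id: key, value: values.length === 1 ? values[0] : reduceFunc(key, values) });
}
console.log(result);
```

Because reduce returns the same shape that map emits ({count: n}), its output can be fed back into another reduce call, which is what makes the pattern safe to run in stages.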
Group
• Equivalent to a Group By in SQL
• Specify the attributes to group the data by
• Process the results in a Reduce function
Group - Count posts by Author
cmd = { key: { "author": true },
        initial: {count: 0},
        reduce: function(obj, prev) {
            prev.count++;
        }
      };
result = db.posts.group(cmd);
  [ { "author" : "Hergé", "count" : 1 },
    { "author" : "Kyle", "count" : 3 } ]
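For readers who want to see how key, initial and reduce fit together, here is an in-memory approximation in plain JavaScript. The group function below is a hypothetical stand-in, not the MongoDB API, and the four posts are made-up data chosen to reproduce the counts shown above.

```javascript
// Hypothetical in-memory stand-in for db.posts.group(cmd).
function group(docs, cmd) {
  const buckets = new Map();
  const keyFields = Object.keys(cmd.key);
  for (const doc of docs) {
    // Build the grouping key from the requested fields.
    const keyValues = {};
    for (const field of keyFields) keyValues[field] = doc[field];
    const id = JSON.stringify(keyValues);
    // Each new key starts from a copy of `initial`.
    if (!buckets.has(id)) buckets.set(id, Object.assign({}, keyValues, cmd.initial));
    // reduce mutates the running aggregate for that key.
    cmd.reduce(doc, buckets.get(id));
  }
  return Array.from(buckets.values());
}

// Made-up data: one post by Hergé, three by Kyle.
const posts = [
  { author: "Hergé", text: "Destination Moon" },
  { author: "Kyle", text: "first" },
  { author: "Kyle", text: "second" },
  { author: "Kyle", text: "third" },
];

const result = group(posts, {
  key: { author: true },
  initial: { count: 0 },
  reduce: function (obj, prev) { prev.count++; },
});
console.log(result);
```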
Review
So Far:
- Started out with a simple schema
- Queried Data
- Evolved the schema
- Queried / Updated the data some more
http://devilseve.blogspot.com/2010/06/like-drinking-from-fire-hose.html
Inheritance
Single Table Inheritance - RDBMS
shapes table
id   type     area   radius   d   length   width
1    circle   3.14   1
2    square   4               2
3    rect     10                  5        2
Single Table Inheritance - MongoDB
> db.shapes.find()
  { _id: "1", type: "circle", area: 3.14, radius: 1}
  { _id: "2", type: "square", area: 4, d: 2}
  { _id: "3", type: "rect", area: 10, length: 5, width: 2}
// find shapes where radius > 0
> db.shapes.find({radius: {$gt: 0}})
// create index
> db.shapes.ensureIndex({radius: 1})
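A small plain-JavaScript sketch of how application code might work with such a mixed collection. The type field acts as the discriminator; computeArea is a hypothetical helper, and the filter only imitates the effect of {radius: {$gt: 0}} on an in-memory array.

```javascript
const shapes = [
  { _id: "1", type: "circle", area: 3.14, radius: 1 },
  { _id: "2", type: "square", area: 4, d: 2 },
  { _id: "3", type: "rect", area: 10, length: 5, width: 2 },
];

// Imitates {radius: {$gt: 0}}: documents without a radius field do not match.
const withRadius = shapes.filter((s) => typeof s.radius === "number" && s.radius > 0);
console.log(withRadius.map((s) => s.type)); // [ 'circle' ]

// Hypothetical helper: each type reads only the fields it actually stores.
function computeArea(shape) {
  switch (shape.type) {
    case "circle": return Math.PI * shape.radius * shape.radius;
    case "square": return shape.d * shape.d;
    case "rect": return shape.length * shape.width;
    default: throw new Error("unknown shape type: " + shape.type);
  }
}

for (const s of shapes) {
  console.log(s.type, computeArea(s).toFixed(2));
}
```

Unlike the RDBMS table, no document carries empty columns for fields that belong to another type.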
One to Many
One to Many relationships can specify
• degree of association between objects
• containment
• life-cycle
One to Many
- Embedded Array / Array Keys
  - $slice operator to return subset of array
  - some queries harder, e.g. find latest comments across all documents
blogs: {
    author : "Hergé",
    date : "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)",
    comments : [
      { author : "Kyle",
        date : "Sat Jul 24 2010 20:51:03 GMT-0700 (PDT)",
        text : "great book" }
    ]}
One to Many
- Embedded tree
  - Single document
  - Natural
  - Hard to query
blogs: {
    author : "Hergé",
    date : "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)",
    comments : [
      { author : "Kyle",
        date : "Sat Jul 24 2010 20:51:03 GMT-0700 (PDT)",
        text : "great book",
        replies: [ { author : "James", ...} ] }
    ]}
One to Many
- Normalized (2 collections)
  - most flexible
  - more queries
blogs: {
    author : "Hergé",
    date : "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)",
    comments : [ {comment : ObjectId("1")} ]}
comments : {
    _id : "1",
    author : "James",
    date : "Sat Jul 24 2010 20:51:03 ..."}
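The trade-off between embedding and normalizing can be sketched in plain JavaScript. The data below is made up, and arrays stand in for collections; it is meant only to show which side needs the extra step for each kind of question.

```javascript
// Embedded: comments live inside each post (made-up sample data).
const embeddedPosts = [
  { author: "Hergé", comments: [
    { author: "Kyle", date: new Date("2010-07-24T20:51:03-07:00"), text: "great book" },
  ] },
  { author: "Kyle", comments: [
    { author: "James", date: new Date("2010-07-25T09:00:00-07:00"), text: "nice" },
  ] },
];

// Latest comment across all posts: every post has to be unpacked first.
const latestEmbedded = embeddedPosts
  .flatMap((p) => p.comments)
  .sort((a, b) => b.date - a.date)
  .slice(0, 1);

// Normalized: comments are their own collection, posts only hold ids.
const comments = [
  { _id: "1", author: "Kyle", date: new Date("2010-07-24T20:51:03-07:00"), text: "great book" },
  { _id: "2", author: "James", date: new Date("2010-07-25T09:00:00-07:00"), text: "nice" },
];
const posts = [
  { author: "Hergé", comments: [{ comment: "1" }] },
  { author: "Kyle", comments: [{ comment: "2" }] },
];

// Latest comment across all posts: a direct sort on one collection.
const latestNormalized = [...comments].sort((a, b) => b.date - a.date).slice(0, 1);

// Showing one post with its comments now needs a second lookup.
const ids = posts[0].comments.map((c) => c.comment);
const postComments = comments.filter((c) => ids.includes(c._id));

console.log(latestEmbedded[0].author, latestNormalized[0].author, postComments.length);
```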
One to Many - patterns
- Embedded Array / Array Keys
- Embedded tree
- Normalized
Many - Many
Example:
- Product can be in many categories
- Category can have many products
products:
  { _id: ObjectId("10"),
    name: "Destination Moon",
    category_ids: [ ObjectId("20"), ObjectId("30") ]}
categories:
  { _id: ObjectId("20"),
    name: "adventure",
    product_ids: [ ObjectId("10"), ObjectId("11"), ObjectId("12") ]}
// All categories for a given product
> db.categories.find({product_ids: ObjectId("10")})
Alternative
products:
  { _id: ObjectId("10"),
    name: "Destination Moon",
    category_ids: [ ObjectId("20"), ObjectId("30") ]}
categories:
  { _id: ObjectId("20"),
    name: "adventure"}
// All products for a given category
> db.products.find({category_ids: ObjectId("20")})
// All categories for a given product
> product = db.products.findOne({_id: some_id})
> db.categories.find({_id: {$in: product.category_ids}})
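The alternative layout keeps the references on one side only, so one direction is a single query and the other takes two steps. A plain-JavaScript sketch with arrays standing in for collections; the second product and the "comic" category name are assumptions added for illustration.

```javascript
const products = [
  { _id: "10", name: "Destination Moon", category_ids: ["20", "30"] },
  { _id: "11", name: "another product (made up)", category_ids: ["20"] },
];
const categories = [
  { _id: "20", name: "adventure" },
  { _id: "30", name: "comic" }, // assumed name, not in the slides
];

// One step: match a value inside the array field,
// as {category_ids: ObjectId("20")} does.
function productsInCategory(categoryId) {
  return products.filter((p) => p.category_ids.includes(categoryId));
}

// Two steps: load the product, then fetch categories whose _id is in its list,
// as findOne followed by {_id: {$in: product.category_ids}} does.
function categoriesOfProduct(productId) {
  const product = products.find((p) => p._id === productId);
  if (!product) return [];
  return categories.filter((c) => product.category_ids.includes(c._id));
}

console.log(productsInCategory("20").map((p) => p._id));     // [ '10', '11' ]
console.log(categoriesOfProduct("10").map((c) => c.name));   // [ 'adventure', 'comic' ]
```

The benefit is that adding a product to a category touches one document instead of two, at the cost of the extra round trip in one direction.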
Trees
Full Tree in Document
{ comments: [
    { author: "Kyle", text: "...",
      replies: [
        { author: "James", text: "...", replies: [] }
      ]}
  ]}
Pros: Single Document, Performance, Intuitive
Cons: Hard to search, Partial Results, 4MB limit
Trees
Parent Links
- Each node is stored as a document
- Contains the id of the parent
Child Links
- Each node contains the ids of the children
- Can support graphs (multiple parents / children)
Array of Ancestors
- Store all Ancestors of a node
  { _id: "a" }
  { _id: "b", ancestors: [ "a" ], parent: "a" }
  { _id: "c", ancestors: [ "a", "b" ], parent: "b" }
  { _id: "d", ancestors: [ "a", "b" ], parent: "b" }
  { _id: "e", ancestors: [ "a" ], parent: "a" }
  { _id: "f", ancestors: [ "a", "e" ], parent: "e" }
// find all descendants of b:
> db.tree2.find({ancestors: 'b'})
// find all direct descendants of b:
> db.tree2.find({parent: 'b'})
// find all ancestors of f:
> ancestors = db.tree2.findOne({_id: 'f'}).ancestors
> db.tree2.find({_id: {$in: ancestors}})
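The three queries above, plus the bookkeeping needed when inserting a node, can be tried on an in-memory copy of the tree. This is a plain-JavaScript sketch; the helper names are hypothetical, and makeNode shows one possible way to derive the ancestors array, not a prescribed one.

```javascript
const tree = [
  { _id: "a" },
  { _id: "b", ancestors: ["a"], parent: "a" },
  { _id: "c", ancestors: ["a", "b"], parent: "b" },
  { _id: "d", ancestors: ["a", "b"], parent: "b" },
  { _id: "e", ancestors: ["a"], parent: "a" },
  { _id: "f", ancestors: ["a", "e"], parent: "e" },
];

// Like find({ancestors: id}): any node listing id among its ancestors.
const descendantsOf = (id) =>
  tree.filter((n) => (n.ancestors || []).includes(id)).map((n) => n._id);

// Like find({parent: id}): direct children only.
const childrenOf = (id) => tree.filter((n) => n.parent === id).map((n) => n._id);

// Like findOne({_id: id}).ancestors followed by find({_id: {$in: ancestors}}).
const ancestorsOf = (id) => {
  const node = tree.find((n) => n._id === id);
  const ids = (node && node.ancestors) || [];
  return tree.filter((n) => ids.includes(n._id)).map((n) => n._id);
};

// Hypothetical helper: a new node copies its parent's ancestors and appends the parent.
function makeNode(id, parentId) {
  const parent = tree.find((n) => n._id === parentId);
  if (!parent) throw new Error("unknown parent: " + parentId);
  return { _id: id, ancestors: [...(parent.ancestors || []), parentId], parent: parentId };
}

console.log(descendantsOf("b")); // [ 'c', 'd' ]
console.log(childrenOf("a"));    // [ 'b', 'e' ]
console.log(ancestorsOf("f"));   // [ 'a', 'e' ]
console.log(makeNode("g", "c")); // ancestors: [ 'a', 'b', 'c' ]
```

Reads are cheap with this layout; the price is that moving a subtree means rewriting the ancestors array of every node beneath it.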
Trees as Paths
Store hierarchy as a path expression
- Separate each node by a delimiter, e.g. "/"
- Use text search to find parts of a tree
{ comments: [
    { author: "Kyle", text: "initial post", path: "/" },
    { author: "Jim", text: "jim’s comment", path: "/jim" },
    { author: "Kyle", text: "Kyle’s reply to Jim", path: "/jim/kyle" } ] }
// Find the conversations Jim was part of
> db.posts.find({"comments.path": /^\/jim/i})
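A plain-JavaScript sketch of prefix matching on path strings. The subtree helper and the "/jimmy" comment are assumptions added to show one pitfall: a bare prefix match on "/jim" would also pick up "/jimmy", so this version requires the prefix to end at a delimiter or at the end of the path.

```javascript
const comments = [
  { author: "Kyle", text: "initial post", path: "/" },
  { author: "Jim", text: "jim's comment", path: "/jim" },
  { author: "Kyle", text: "Kyle's reply to Jim", path: "/jim/kyle" },
  { author: "Jimmy", text: "unrelated thread (made up)", path: "/jimmy" },
];

// Hypothetical helper: all comments at or below the given path prefix.
function subtree(prefix) {
  // Escape regex metacharacters so the prefix is matched literally.
  const escaped = prefix.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Anchor at the start, and stop at a "/" boundary or the end of the path.
  const re = new RegExp("^" + escaped + "(/|$)", "i");
  return comments.filter((c) => re.test(c.path));
}

console.log(subtree("/jim").map((c) => c.path)); // [ '/jim', '/jim/kyle' ]
```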
Queue
• Need to maintain order and state
• Ensure that updates to the queue are atomic
  { inprogress: false, priority: 1, ... }
// find highest priority job and mark as in-progress
job = db.jobs.findAndModify({
    query: {inprogress: false},
    sort: {priority: -1},
    update: {$set: {inprogress: true, started: new Date()}},
    new: true})
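What findAndModify does for the queue can be approximated in plain JavaScript: select, sort, update and return happen as one step. The claimNextJob helper and the job data are hypothetical; in this single-threaded sketch nothing can interleave, whereas on a real server it is findAndModify that provides that guarantee.

```javascript
// Made-up queue contents.
const jobs = [
  { _id: 1, inprogress: false, priority: 1 },
  { _id: 2, inprogress: false, priority: 5 },
  { _id: 3, inprogress: true, priority: 9 }, // already claimed by another worker
];

// Hypothetical stand-in for the findAndModify call above.
function claimNextJob(queue, now) {
  // query: {inprogress: false}, sort: {priority: -1}
  const candidates = queue
    .filter((j) => !j.inprogress)
    .sort((a, b) => b.priority - a.priority);
  if (candidates.length === 0) return null; // nothing left to claim
  // update: {$set: {inprogress: true, started: now}}
  const job = candidates[0];
  job.inprogress = true;
  job.started = now;
  return job; // new: true, so the updated document is returned
}

const first = claimNextJob(jobs, new Date());
const second = claimNextJob(jobs, new Date());
const third = claimNextJob(jobs, new Date());
console.log(first._id, second._id, third); // 2 1 null
```

If the find and the update were two separate operations, two workers could both read the same job before either marked it, which is the situation the atomic call avoids.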
Remember me?
http://devilseve.blogspot.com/2010/06/like-drinking-from-fire-hose.html
Summary
Schema design is different in MongoDB
Basic data design principles stay the same
Focus on how the app manipulates data
Rapidly evolve schema to meet your requirements
Enjoy your new freedom, use it wisely :-)
@mongodb
conferences, appearances, and meetups
http://www.10gen.com/events
http://bit.ly/mongo> Facebook | Twitter | LinkedIn
http://linkd.in/joinmongo
download at mongodb.org
We’re Hiring [email protected]
Competition!
1 - Tweet a picture of Mongo Boulder before 3pm
2 - Include the #mongoboulder hashtag
3 - You must be following @mongodb or @10gen
4 - Winner, announced during the roadmap session, gets a free copy of MongoDB in Action and a t-shirt