MongoSV - Schema Design

  • 8/8/2019 MongoSV - Schema Design

    Schema Design

    Alvin Richards

    [email protected]

    Topics

    Introduction

    Basic Data Modeling

    Evolving a schema

    Common patterns

    - Single table inheritance
    - One-to-Many & Many-to-Many
    - Trees
    - Queues

    So why model data?

    http://www.flickr.com/photos/42304632@N00/493639870/

    A brief history of normalization

    1970 E. F. Codd introduces 1st Normal Form (1NF)

    1971 E. F. Codd introduces 2nd and 3rd Normal Form (2NF, 3NF)

    1974 Codd & Boyce define Boyce/Codd Normal Form (BCNF)

    2002 Date, Darwen, Lorentzos define 6th Normal Form (6NF)

    Goals:

    Avoid anomalies when inserting, updating or deleting

    Minimize redesign when extending the schema

    Make the model informative to users

    Avoid bias towards a particular style of query

    * source: Wikipedia

    The real benefit of relational

    Before relational
    - Data and logic combined

    After relational
    - Separation of concerns
    - Data modeled independently of logic
    - Logic freed from concerns of data design

    MongoDB continues this separation

    Relational made normalized data look like this

    Document databases make normalized data look like this

    Terminology

    RDBMS            MongoDB
    Table            Collection
    Row(s)           JSON Document
    Index            Index
    Join             Embedding & Linking
    Partition        Shard
    Partition Key    Shard Key

    DB Considerations

    How can we manipulate this data?
    - Dynamic Queries
    - Secondary Indexes
    - Atomic Updates
    - Map Reduce

    Considerations
    - No Joins
    - Document writes are atomic

    Access Patterns?
    - Read / Write Ratio
    - Types of updates
    - Types of queries
    - Data life-cycle

    So today's example will use...

    Design Session

    Design documents that simply map to your application

    > post = {author: "Herg",
              date: new Date(),
              text: "Destination Moon",
              tags: ["comic", "adventure"]}

    > db.posts.save(post)

    Find the document

    > db.posts.find()

    { _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
      author: "Herg",
      date: "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)",
      text: "Destination Moon",
      tags: ["comic", "adventure"] }

    Notes:
    - ID must be unique, but can be anything you'd like
    - MongoDB will generate a default ID if one is not supplied

    Add an index, find via index

    Secondary index for author

    // 1 means ascending, -1 means descending
    > db.posts.ensureIndex({author: 1})

    > db.posts.find({author: 'Herg'})

    { _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
      date: "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)",
      author: "Herg",
      ... }

    Verifying indexes exist

    > db.system.indexes.find()

    // Index on ID
    { name: "_id_",
      ns: "test.posts",
      key: {"_id": 1} }

    // Index on author
    { _id: ObjectId("4c4ba6c5672c685e5e8aabf4"),
      ns: "test.posts",
      key: {"author": 1},
      name: "author_1" }

    Examine the query plan

    > db.posts.find({author: 'Herg'}).explain()

    {
      "cursor": "BtreeCursor author_1",
      "nscanned": 1,
      "nscannedObjects": 1,
      "n": 1,
      "millis": 5,
      "indexBounds": {
        "author": [
          [
            "Herg",
            "Herg"
          ]
        ]
      }
    }

    Query operators

    Conditional operators:
    $ne, $in, $nin, $mod, $all, $size, $exists, $type, $lt, $lte, $gt, $gte, ...

    // find posts with any tags
    > db.posts.find({tags: {$exists: true}})

    Regular expressions:

    // posts where author starts with h
    > db.posts.find({author: /^h/i})

    Counting:

    // number of posts written by Herg
    > db.posts.find({author: "Herg"}).count()

    Extending the Schema

    > new_comment = {author: "Kyle",
                     date: new Date(),
                     text: "great book"}

    > db.posts.update(
        {text: "Destination Moon"},
        {$push: {comments: new_comment},
         $inc: {comments_count: 1}})

    Extending the Schema

    { _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
      author: "Herg",
      date: "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)",
      text: "Destination Moon",
      tags: ["comic", "adventure"],
      comments: [
        { author: "Kyle",
          date: "Sat Jul 24 2010 20:51:03 GMT-0700 (PDT)",
          text: "great book" }
      ],
      comments_count: 1 }
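To make the effect of that update concrete, here is a minimal in-memory sketch in plain JavaScript of what `$push` and `$inc` do to a document that has neither field yet. `applyUpdate` is a hypothetical helper written for this illustration, not a MongoDB API; the server applies the real operators atomically to the stored document.

```javascript
// Hypothetical helper: mimics $push and $inc on a plain object.
// Illustration only; MongoDB applies these operators server-side.
function applyUpdate(doc, update) {
  const result = JSON.parse(JSON.stringify(doc)); // deep copy, input stays untouched
  for (const [field, value] of Object.entries(update.$push || {})) {
    // $push creates the array when the field does not exist yet
    if (result[field] === undefined) result[field] = [];
    result[field].push(value);
  }
  for (const [field, amount] of Object.entries(update.$inc || {})) {
    // $inc treats a missing field as 0
    result[field] = (result[field] || 0) + amount;
  }
  return result;
}

const post = { author: "Herg", text: "Destination Moon", tags: ["comic", "adventure"] };
const newComment = { author: "Kyle", text: "great book" };

const updated = applyUpdate(post, {
  $push: { comments: newComment },
  $inc: { comments_count: 1 },
});

console.log(updated.comments.length, updated.comments_count);
```

No migration step is needed: the first update that mentions `comments` and `comments_count` brings both fields into existence for that one document.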

    Extending the Schema

    // create index on nested documents:
    > db.posts.ensureIndex({"comments.author": 1})

    > db.posts.find({"comments.author": "Kyle"})

    // find last 5 posts:
    > db.posts.find().sort({date: -1}).limit(5)

    // most commented post:
    > db.posts.find().sort({comments_count: -1}).limit(1)

    When sorting, check if you need an index

    Watch for full table scans

    > db.posts.find({text: 'Destination Moon'}).explain()

    {
      "cursor": "BasicCursor",
      "nscanned": 1,
      "nscannedObjects": 1,
      "n": 1,
      "millis": 0,
      "indexBounds": {}
    }
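The two explain() results in this deck differ mainly in the `cursor` field. As a small sketch, a check like the following could flag collection scans in the explain format shown on these slides. `usedIndex` is a hypothetical helper, and the field names are an assumption tied to that format; later MongoDB releases changed the explain output.

```javascript
// Hypothetical helper for the explain() format shown on these slides, where
// "cursor" is "BasicCursor" for a full scan and "BtreeCursor <index name>"
// when an index is used.
function usedIndex(plan) {
  return typeof plan.cursor === "string" && plan.cursor.startsWith("BtreeCursor");
}

// Trimmed copies of the two plans from the slides
const byAuthor = { cursor: "BtreeCursor author_1", nscanned: 1, n: 1 };
const byText = { cursor: "BasicCursor", nscanned: 1, n: 1 };

console.log(usedIndex(byAuthor)); // true
console.log(usedIndex(byText));   // false
```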

    Map Reduce

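    Map reduce : count tags

    mapFunc = function () {
      this.tags.forEach(function (z) { emit(z, {count: 1}); });
    }

    reduceFunc = function (k, v) {
      var total = 0;
      for (var i = 0; i < v.length; i++) { total += v[i].count; }
      return {count: total};
    }

    > res = db.posts.mapReduce(mapFunc, reduceFunc)

    > db[res.result].find()

    {_id: "comic", value: {count: 1}}
    {_id: "adventure", value: {count: 1}}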
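To see how the map and reduce steps fit together, the following is a rough in-memory sketch in plain JavaScript. `localMapReduce` is a hypothetical stand-in, not MongoDB's implementation. Unlike the shell, where `emit` is a global, this sketch passes `emit` to the map function as an argument. The second post is made-up sample data, added so that one key receives more than one value.

```javascript
// Hypothetical in-memory stand-in for a map/reduce run. Illustration only.
function localMapReduce(docs, mapFunc, reduceFunc) {
  const emitted = new Map(); // key -> array of emitted values
  const emit = (key, value) => {
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(value);
  };
  // Each document is bound to `this` inside the map function
  for (const doc of docs) mapFunc.call(doc, emit);

  const results = [];
  for (const [key, values] of emitted) {
    // Reduce is only needed when a key has more than one value
    const value = values.length === 1 ? values[0] : reduceFunc(key, values);
    results.push({ _id: key, value });
  }
  return results;
}

function mapTags(emit) {
  this.tags.forEach((tag) => emit(tag, { count: 1 }));
}

function sumCounts(key, values) {
  let total = 0;
  for (const v of values) total += v.count;
  return { count: total };
}

const posts = [
  { text: "Destination Moon", tags: ["comic", "adventure"] },
  { text: "made-up second post", tags: ["comic"] },
];

const tagCounts = localMapReduce(posts, mapTags, sumCounts);
console.log(tagCounts);
```

Because the reduce function returns the same shape that the map function emits (`{count: n}`), its output can be fed back into another reduce call, which is what lets the work be split up.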

    Group

    Equivalent to a Group By in SQL

    Specify the attributes to group the data by

    Process the results in a Reduce function

    Group - Count posts by author

    cmd = { key: {"author": true},
            initial: {count: 0},
            reduce: function (obj, prev) { prev.count++; } };

    > result = db.posts.group(cmd);

    [
      { "author": "Herg", "count": 1 },
      { "author": "Kyle", "count": 3 }
    ]

    Review

    So Far:

    - Started out with a simple schema

    - Queried Data

    - Evolved the schema

    - Queried / Updated the data some more

    http://devilseve.blogspot.com/2010/06/like-drinking-from-fire-hose.html

    Inheritance

    Single Table Inheritance - RDBMS

    shapes table

    id  type    area  radius  d  length  width
    1   circle  3.14  1
    2   square  4             2
    3   rect    10               5       2

    Single Table Inheritance - MongoDB

    > db.shapes.find()

    {_id: "1", type: "circle", area: 3.14, radius: 1}
    {_id: "2", type: "square", area: 4, d: 2}
    {_id: "3", type: "rect", area: 10, length: 5, width: 2}

    // find shapes where radius > 0
    > db.shapes.find({radius: {$gt: 0}})

    // create index
    > db.shapes.ensureIndex({radius: 1})
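On the application side, documents of different shapes in one collection are typically handled by dispatching on the `type` field. The sketch below is one hypothetical way to do that in plain JavaScript. It assumes `d` is the side length of the square, which matches the stored area of 4. The `filter` call mirrors the `radius > 0` query: documents without a `radius` field simply do not match.

```javascript
// Hypothetical area calculation that dispatches on the `type` field and
// uses only the attributes each kind of shape document carries.
function area(shape) {
  switch (shape.type) {
    case "circle":
      return Math.PI * shape.radius ** 2;
    case "square":
      return shape.d ** 2; // assumption: d is the side length
    case "rect":
      return shape.length * shape.width;
    default:
      throw new Error("unknown shape type: " + shape.type);
  }
}

const shapes = [
  { _id: "1", type: "circle", radius: 1 },
  { _id: "2", type: "square", d: 2 },
  { _id: "3", type: "rect", length: 5, width: 2 },
];

// In-memory equivalent of find({radius: {$gt: 0}})
const round = shapes.filter((s) => s.radius > 0);

console.log(shapes.map((s) => area(s)));
console.log(round.map((s) => s._id));
```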

    One to Many

    One to Many relationships can specify
    - degree of association between objects
    - containment
    - life-cycle

    One to Many

    - Embedded Array / Array Keys
      - $slice operator to return a subset of the array
      - some queries are hard, e.g. find the latest comments across all documents

    - Embedded tree
      - Single document
      - Natural
      - Hard to query

    - Normalized (2 collections)
      - most flexible
      - more queries

    One to Many - patterns

    - Embedded Array / Array Keys
    - Embedded tree
    - Normalized

    Many - Many

    Example:
    - Product can be in many categories
    - Category can have many products

    Many - Many

    products:
    { _id: ObjectId("4c4ca23933fb5941681b912e"),
      name: "Destination Moon",
      category_ids: [ObjectId("4c4ca25433fb5941681b912f"),
                     ObjectId("4c4ca25433fb5941681b92af")] }

    categories:
    { _id: ObjectId("4c4ca25433fb5941681b912f"),
      name: "adventure",
      product_ids: [ObjectId("4c4ca23933fb5941681b912e"),
                    ObjectId("4c4ca30433fb5941681b9130"),
                    ObjectId("4c4ca30433fb5941681b913a")] }

    // All categories for a given product
    > db.categories.find({product_ids: ObjectId("4c4ca23933fb5941681b912e")})

    Alternative

    products:
    { _id: ObjectId("4c4ca23933fb5941681b912e"),
      name: "Destination Moon",
      category_ids: [ObjectId("4c4ca25433fb5941681b912f"),
                     ObjectId("4c4ca25433fb5941681b92af")] }

    categories:
    { _id: ObjectId("4c4ca25433fb5941681b912f"),
      name: "adventure" }

    // All products for a given category
    > db.products.find({category_ids: ObjectId("4c4ca25433fb5941681b912f")})

    // All categories for a given product
    > product = db.products.findOne({_id: some_id})
    > db.categories.find({_id: {$in: product.category_ids}})
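With references stored on only one side, one direction is a single query and the other takes two steps. The plain JavaScript sketch below mimics both lookups over in-memory arrays. The short string ids stand in for ObjectIds, and the second product and second category are made-up sample data; the function names are hypothetical.

```javascript
// Sample data: references live only on the product side
const products = [
  { _id: "p1", name: "Destination Moon", category_ids: ["c1", "c2"] },
  { _id: "p2", name: "made-up second product", category_ids: ["c1"] },
];
const categories = [
  { _id: "c1", name: "adventure" },
  { _id: "c2", name: "comic" },
];

// One step, like find({category_ids: id}): an array field matches
// when any of its elements equals the value.
function productsInCategory(categoryId) {
  return products.filter((p) => p.category_ids.includes(categoryId));
}

// Two steps: load the product, then an $in-style match on its category_ids.
function categoriesOfProduct(productId) {
  const product = products.find((p) => p._id === productId);
  if (!product) return [];
  return categories.filter((c) => product.category_ids.includes(c._id));
}

console.log(productsInCategory("c1").map((p) => p.name));
console.log(categoriesOfProduct("p1").map((c) => c.name));
```

The trade-off against storing ids on both sides is one extra round trip for the second lookup, in exchange for a single place to update when a product changes categories.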

    Trees

    Full Tree in Document

    { comments: [
        { author: "Kyle", text: "...",
          replies: [
            { author: "Fred", text: "...", replies: [] }
          ] }
      ] }

    Pros: Single Document, Performance, Intuitive

    Cons: Hard to search, Partial Results, 4MB document size limit (16MB since MongoDB 1.8)

    Trees

    Parent Links
    - Each node is stored as a document
    - Contains the id of the parent

    Child Links
    - Each node contains the ids of the children
    - Can support graphs (multiple parents / children)

    Array of Ancestors
    - Store all ancestors of a node

    {_id: "a"}
    {_id: "b", ancestors: ["a"], parent: "a"}
    {_id: "c", ancestors: ["a", "b"], parent: "b"}
    {_id: "d", ancestors: ["a", "b"], parent: "b"}
    {_id: "e", ancestors: ["a"], parent: "a"}
    {_id: "f", ancestors: ["a", "e"], parent: "e"}

    // find all descendants of b:
    > db.tree2.find({ancestors: "b"})

    // find all direct descendants of b:
    > db.tree2.find({parent: "b"})

    // find all ancestors of f:
    > ancestors = db.tree2.findOne({_id: "f"}).ancestors
    > db.tree2.find({_id: {$in: ancestors}})
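The ancestors array has to be maintained by the application. As a minimal sketch, assuming the data really is a tree (no cycles), it can be derived from the `parent` links alone. `withAncestors` is a hypothetical helper; unlike the slide, it gives the root an empty `ancestors` array instead of omitting the field.

```javascript
// Hypothetical helper: derive each node's `ancestors` array (root first)
// from its `parent` link. Assumes a tree, so the walk always terminates.
function withAncestors(nodes) {
  const byId = new Map(nodes.map((n) => [n._id, n]));
  return nodes.map((node) => {
    const ancestors = [];
    let current = node.parent;
    while (current !== undefined) {
      ancestors.unshift(current);
      const parentNode = byId.get(current);
      current = parentNode ? parentNode.parent : undefined;
    }
    return { ...node, ancestors };
  });
}

const tree = withAncestors([
  { _id: "a" },
  { _id: "b", parent: "a" },
  { _id: "c", parent: "b" },
  { _id: "d", parent: "b" },
  { _id: "e", parent: "a" },
  { _id: "f", parent: "e" },
]);

// In-memory equivalent of find({ancestors: "b"})
const descendantsOfB = tree.filter((n) => n.ancestors.includes("b"));

console.log(tree.find((n) => n._id === "f").ancestors);
console.log(descendantsOfB.map((n) => n._id));
```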

    Trees as Paths

    Store hierarchy as a path expression
    - Separate each node by a delimiter, e.g. "/"
    - Use text search to find parts of a tree

    { comments: [
        { author: "Kyle", text: "initial post", path: "/" },
        { author: "Jim", text: "jim's comment", path: "/jim" },
        { author: "Kyle", text: "Kyle's reply to Jim", path: "/jim/kyle" } ] }

    // Find the conversations Jim was part of
    > db.posts.find({"comments.path": /^\/jim/i})
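A plain prefix match such as `^/jim` would also match a sibling like `/jimmy`. One possible refinement, sketched here in plain JavaScript, is to build the pattern from the path and require either the end of the string or a delimiter after it. Both function names are hypothetical, and this variant is case-sensitive, unlike the query on the slide.

```javascript
// Escape regex metacharacters so a path can be embedded in a pattern.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: match the node at `path` and everything beneath it.
function subtreePattern(path) {
  return new RegExp("^" + escapeRegExp(path) + "(/|$)");
}

const paths = ["/", "/jim", "/jim/kyle", "/jimmy"];
const jimSubtree = paths.filter((p) => subtreePattern("/jim").test(p));

console.log(jimSubtree);
```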

    Queue

    Requirements
    - See jobs waiting, jobs in progress
    - Ensure that each job is started once and only once

    { inprogress: false,
      priority: 1,
      ... }

    // find highest priority job and mark as in-progress
    job = db.jobs.findAndModify({
      query: {inprogress: false},
      sort: {priority: -1},
      update: {$set: {inprogress: true, started: new Date()}},
      new: true})
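The point of findAndModify here is that selecting the job and marking it happen as one step, so two workers cannot claim the same job. The sketch below imitates that behavior over an in-memory array in single-threaded JavaScript. `claimNextJob` is a hypothetical stand-in; in a real deployment the guarantee comes from the server, not from application code like this.

```javascript
// Hypothetical in-memory stand-in for the findAndModify call: pick the
// waiting job with the highest priority and mark it in progress in one step.
function claimNextJob(jobs, now = new Date()) {
  const waiting = jobs.filter((job) => !job.inprogress);
  if (waiting.length === 0) return null; // nothing left to claim
  const next = waiting.reduce((best, job) =>
    job.priority > best.priority ? job : best
  );
  next.inprogress = true;
  next.started = now;
  return next;
}

// Made-up sample jobs
const jobs = [
  { _id: 1, inprogress: false, priority: 1 },
  { _id: 2, inprogress: false, priority: 5 },
];

const first = claimNextJob(jobs);  // highest priority job
const second = claimNextJob(jobs); // the remaining job
const third = claimNextJob(jobs);  // queue is empty

console.log(first._id, second._id, third);
```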

    Remember me?

    http://devilseve.blogspot.com/2010/06/like-drinking-from-fire-hose.html

    Summary

    Schema design is different in MongoDB

    Basic data design principles stay the same

    Focus on how the app manipulates data

    Rapidly evolve schema to meet your requirements

    Enjoy your new freedom, use it wisely :-)

    @mongodb

    conferences, appearances, and meetups
    http://www.10gen.com/events

    http://bit.ly/mongo>

    Facebook | Twitter | LinkedIn
    http://linkd.in/joinmongo

    download at mongodb.org

    We're Hiring [email protected]