Back to Basics Webinar 3 - Thinking in Documents
Transcript of Back to Basics Webinar 3 - Thinking in Documents
Code JoeD gets you a 25% discount off the list price
Early Bird Registration Ends May 13, 2016
Back to Basics 2016 : Webinar 3
Thinking in Documents
Joe Drumgoole
Director of Developer Advocacy, EMEA
@jdrumgoole
V1.1
Review
• Webinar 1 : Introduction to NoSQL
  – Types of NoSQL database
  – MongoDB is a document database
  – Replica Sets and Shards
• Webinar 2
  – Building a basic application
  – Adding indexes
  – Using Explain to measure database operations
Thinking in Documents
• Documents in MongoDB are JavaScript Objects (JSON)
• Actually they are encoded as BSON
• BSON is “Binary JSON”
• BSON allows efficient encoding and decoding of JSON
• Required for efficient transmission and storage on disk
• Eliminates the need to “text parse” all the sub objects
• Full spec is online at http://bsonspec.org/
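To make the "no text parsing" point concrete, here is a minimal sketch of how a flat document becomes BSON bytes. It assumes Node.js and handles only string and int32 values; `encodeBson` is a hypothetical helper written for this illustration, not part of any MongoDB driver.

```javascript
// Minimal BSON encoder sketch: supports only string and int32 field values.
// encodeBson is a hypothetical helper, not a MongoDB driver API.
function encodeBson(doc) {
  const parts = [];
  for (const [key, value] of Object.entries(doc)) {
    const name = Buffer.from(key + "\0", "utf8"); // field name as a C string
    if (typeof value === "string") {
      const text = Buffer.from(value + "\0", "utf8");
      const len = Buffer.alloc(4);
      len.writeInt32LE(text.length); // byte length, including the trailing NUL
      parts.push(Buffer.from([0x02]), name, len, text); // 0x02 = string type tag
    } else if (Number.isInteger(value) && value >= -(2 ** 31) && value < 2 ** 31) {
      const num = Buffer.alloc(4);
      num.writeInt32LE(value);
      parts.push(Buffer.from([0x10]), name, num); // 0x10 = int32 type tag
    } else {
      throw new TypeError("unsupported value for field " + key);
    }
  }
  const body = Buffer.concat(parts);
  const size = Buffer.alloc(4);
  size.writeInt32LE(body.length + 5); // 4 size bytes + body + 1 terminator byte
  return Buffer.concat([size, body, Buffer.from([0x00])]);
}

const encoded = encodeBson({ hello: "world" });
console.log(encoded.length, encoded.toString("hex"));
```

Every document and every string carries its byte length up front, so a reader can skip over a value it does not need without scanning it character by character.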
Example Document
{
  first_name: 'Paul',
  surname: 'Miller',
  cell: 447557505611,
  city: 'London',
  location: [45.123, 47.232],
  Profession: ['banking', 'finance', 'trader'],
  cars: [
    { model: 'Bentley', year: 1973, value: 100000, … },
    { model: 'Rolls Royce', year: 1965, value: 330000, … }
  ]
}
Callouts on the example: fields; typed field values (String, Number, Geo-Location); fields can contain arrays; fields can contain an array of sub-documents.
Data Stores – Key Value
[Diagram: each key maps to a single value]
Data Stores - Relational
[Diagram: each key (Key 1 to Key 4) maps to a row of several values]
Data Stores - Document
[Diagram: keys map either to values or to nested keys, which carry their own values]
In Document Form
{ "key1" : "value 1" }
{ "key1" : { "key2" : "value 1", "key3" : { "key4" : "value 3", "key5" : "value 4" } } }
{ "key1" : { "key6" : "value 5", "key7" : "value 6" } }
Some Example Queries
// will find the first document
db.demo.find( { "key1" : "value 1" } )
// will find the second document by nested value
db.demo.find( { "key1.key3.key4" : "value 3" } )
// will find the third document
db.demo.find( { "key1.key6" : "value 5" } )
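The dotted-path lookups can be sketched without a database. The following is a simplified in-memory stand-in, assuming exact equality on scalar values only; `resolvePath` and `find` are hypothetical helpers for illustration and do not reproduce MongoDB's actual matching rules (arrays, operators, and so on).

```javascript
// Simplified in-memory stand-in for db.demo.find(); not MongoDB's implementation.
const demo = [
  { key1: "value 1" },
  { key1: { key2: "value 1", key3: { key4: "value 3", key5: "value 4" } } },
  { key1: { key6: "value 5", key7: "value 6" } },
];

// Walk a dotted path such as "key1.key3.key4" down through nested objects.
function resolvePath(doc, path) {
  return path.split(".").reduce(
    (node, key) => (node !== null && typeof node === "object" ? node[key] : undefined),
    doc
  );
}

// Return the documents where every dotted path in the query equals its value.
function find(collection, query) {
  return collection.filter((doc) =>
    Object.entries(query).every(([path, expected]) => resolvePath(doc, path) === expected)
  );
}

console.log(find(demo, { "key1.key3.key4": "value 3" })); // matches on a nested value
console.log(find(demo, { "key1.key7": "value 6" }));
```

A path that runs into a scalar before it is exhausted (for example "key1.key3.key4" against the first document) simply resolves to undefined, so that document is not matched.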
Modelling and Cardinality
• One to One
  – Title to blog post
• One to Many
  – Blog post to comments
• One to Millions
  – Blog post to site views (e.g. Huffington Post)
One To One
{
  "Title" : "This is a blog post",
  "Body" : "This is the body text of a very short blog post",
  …
}
We can index on “Title” and “Body”.
One to Many
{
  "Title" : "This is a blog post",
  "Body" : "This is the body text",
  "Comments" : [
    { "name" : "Joe Drumgoole", "email" : "[email protected]", "comment" : "I love your writing style" },
    { "name" : "John Smith", "email" : "[email protected]", "comment" : "I hate your writing style" }
  ]
}
Where we expect a small number of comments, we can embed them in the main document.
Key Concerns
• What are the write patterns?
  – Comments are added more frequently than posts
  – Comments may have images, tags, large bodies of text
• What are the read patterns?
  – Comments may not be displayed
  – May be shown in their own window
  – People rarely look at all the comments
Approach 2 – Separate Collection
• Keep all comments in a separate comments collection
• Add references to comments as an array of comment IDs
• Requires two queries to display a blog post and its associated comments
• Requires two writes to create a comment
{ _id : ObjectId( "AAAA" ), name : "Joe Drumgoole", email : "[email protected]", comment : "I love your writing style" }
{ _id : ObjectId( "AAAB" ), name : "John Smith", email : "[email protected]", comment : "I hate your writing style" }
{
  "_id" : ObjectId( "ZZZZ" ),
  "Title" : "A Blog Title",
  "Body" : "A blog post",
  "comments" : [ ObjectId( "AAAA" ), ObjectId( "AAAB" ) ]
}
{
  "_id" : ObjectId( "ZZZZ" ),
  "Title" : "A Blog Title",
  "Body" : "A blog post",
  "comments" : []
}
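The cost of the separate-collection approach (two writes per comment, two reads per page) can be sketched with plain in-memory maps standing in for the two collections. The helper names `addComment` and `loadPostWithComments` are hypothetical, and plain strings stand in for ObjectId values.

```javascript
// In-memory stand-ins for the posts and comments collections (illustrative data).
const comments = new Map();
const posts = new Map([
  ["ZZZZ", { _id: "ZZZZ", Title: "A Blog Title", Body: "A blog post", comments: [] }],
]);

// Creating a comment takes two writes, one per collection.
function addComment(postId, comment) {
  const post = posts.get(postId);
  if (!post) throw new Error("unknown post " + postId);
  comments.set(comment._id, comment); // write 1: the comment document
  post.comments.push(comment._id);    // write 2: the reference held by the post
}

// Displaying a post takes two reads: the post, then its comments by ID.
function loadPostWithComments(postId) {
  const post = posts.get(postId);     // read 1
  if (!post) return null;
  const loaded = post.comments.map((id) => comments.get(id)); // read 2
  return { ...post, comments: loaded };
}

addComment("ZZZZ", { _id: "AAAA", name: "Joe Drumgoole", comment: "I love your writing style" });
addComment("ZZZZ", { _id: "AAAB", name: "John Smith", comment: "I hate your writing style" });
console.log(loadPostWithComments("ZZZZ"));
```

Against a real MongoDB deployment the second read would typically be a single query using `$in` over the array of comment IDs rather than one lookup per comment.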
Approach 3 – A Hybrid Approach
{
  "_id" : ObjectId( "ZZZZ" ),
  "Title" : "A Blog Title",
  "Body" : "A blog post",
  "comments" : [
    { "_id" : ObjectId( "AAAA" ), "name" : "Joe Drumgoole", "email" : "[email protected]", "comment" : "I love your writing style" },
    { "_id" : ObjectId( "AAAB" ), "name" : "John Smith", "email" : "[email protected]", "comment" : "I hate your writing style" }
  ]
}
{
  "_post_id" : ObjectId( "ZZZZ" ),
  "comments" : [
    { "_id" : ObjectId( "AAAA" ), "name" : "Joe Drumgoole", "email" : "[email protected]", "comment" : "I love your writing style" },
    {...},{...},{...},{...},{...},{...},{...},{...},{...},{...}
  ]
}
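One way to read the hybrid layout is: keep a small, bounded set of comments embedded in the post and spill the rest into a separate document linked by the post's ID. The sketch below assumes that reading. The cap `MAX_EMBEDDED`, the choice to keep the newest comments embedded, and the helper `addComment` are all assumptions made for illustration; the slides do not state the policy.

```javascript
const MAX_EMBEDDED = 2; // assumed cap, chosen only to keep the example short

// Plain strings stand in for ObjectId values.
const post = { _id: "ZZZZ", Title: "A Blog Title", Body: "A blog post", comments: [] };
const overflow = { _post_id: "ZZZZ", comments: [] };

// Embed the newest comments in the post; move the oldest to the overflow document.
function addComment(post, overflow, comment) {
  post.comments.push(comment);
  while (post.comments.length > MAX_EMBEDDED) {
    overflow.comments.push(post.comments.shift());
  }
}

["AAAA", "AAAB", "AAAC"].forEach((id) =>
  addComment(post, overflow, { _id: id, comment: "comment " + id })
);

console.log(post.comments.map((c) => c._id));     // newest comments, still embedded
console.log(overflow.comments.map((c) => c._id)); // older comments, stored separately
```

With this split the common read (a post plus a handful of comments) needs one document, and the post document stays a bounded size however many comments arrive.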
What About One to A Million
• What if we were tracking mouse position for heat tracking?
  – Each user will generate hundreds of data points per visit
  – Thousands of data points per post
  – Millions of data points per blog site
• Reverse the model
  – Store a blog ID per event
{
  "post_id" : ObjectId( "ZZZZ" ),
  "timestamp" : ISODate( "2005-01-02T00:00:00Z" ),
  "location" : [24, 34],
  "click" : false
}
But – Finite number of events per second
{
  post_id : ObjectId( "ZZZZ" ),
  timeStamp : ISODate( "2005-01-02T00:00:00Z" ),
  events : {
    0 : { 0 : { <Info> }, 1 : { <Info> }, … 99 : { <Info> } },
    1 : { 0 : { <Info> }, 1 : { <Info> }, … 99 : { <Info> } },
    2 : { 0 : { <Info> }, 1 : { <Info> }, … 99 : { <Info> } },
    3 : { 0 : { <Info> }, 1 : { <Info> }, … 99 : { <Info> } },
    ...
    59 : { 0 : { <Info> }, 1 : { <Info> }, … 99 : { <Info> } }
  }
}
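A sketch of how an incoming event could be routed to its slot in such a bucket document. It assumes the outer key is the second within the minute (0 to 59) and the inner key is the hundredth of a second (0 to 99); the slide does not spell out what the two levels mean, so treat both as assumptions. `bucketFor` is a hypothetical helper.

```javascript
// Map an event time to its per-minute bucket and its slot inside that bucket.
// Assumption: outer key = second within the minute (0-59),
//             inner key = hundredth of a second (0-99).
function bucketFor(postId, date) {
  const minute = new Date(date.getTime());
  minute.setUTCSeconds(0, 0); // truncate to the start of the minute
  const second = date.getUTCSeconds();
  const slot = Math.floor(date.getUTCMilliseconds() / 10);
  return {
    filter: { post_id: postId, timeStamp: minute }, // identifies the bucket document
    path: `events.${second}.${slot}`,               // dotted path of the slot to set
  };
}

const bucket = bucketFor("ZZZZ", new Date("2005-01-02T00:00:07.250Z"));
console.log(bucket.filter.timeStamp.toISOString(), bucket.path);
```

In MongoDB the filter and path would typically feed an upsert that uses `$set` on the dotted path, so each minute of activity for a post ends up in one document instead of thousands.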
Guidelines
• Embed objects for one-to-one relationships
• Look at read and write patterns to determine when to break out data
• Don’t get stuck in “one record per item” thinking
• Embrace the hierarchy
• Think about cardinality
• Grow your data by adding documents, not by increasing document size
• Think about your indexes
• Document updates are transactions