De normalised london aggregation framework overview
-
Upload
chris-harris -
Category
Technology
-
view
599 -
download
0
description
Transcript of De normalised london aggregation framework overview
Chris Harris Email : [email protected]
Twitter : cj_harris5
DeNormalised London: Aggregation Framework Overview
Wednesday, 21 March 12
Terminology
RDBMS MongoDBTable CollectionRow(s) JSON DocumentIndex IndexJoin Embedding & LinkingPartition ShardPartition Key Shard Key
Wednesday, 21 March 12
Here is a “simple” SQL Modelmysql> select * from book;+----+----------------------------------------------------------+| id | title |+----+----------------------------------------------------------+| 1 | The Demon-Haunted World: Science as a Candle in the Dark || 2 | Cosmos || 3 | Programming in Scala |+----+----------------------------------------------------------+3 rows in set (0.00 sec)
mysql> select * from bookauthor;+---------+-----------+| book_id | author_id |+---------+-----------+| 1 | 1 || 2 | 1 || 3 | 2 || 3 | 3 || 3 | 4 |+---------+-----------+5 rows in set (0.00 sec)
mysql> select * from author;+----+-----------+------------+-------------+-------------+---------------+| id | last_name | first_name | middle_name | nationality | year_of_birth |+----+-----------+------------+-------------+-------------+---------------+| 1 | Sagan | Carl | Edward | NULL | 1934 || 2 | Odersky | Martin | NULL | DE | 1958 || 3 | Spoon | Lex | NULL | NULL | NULL || 4 | Venners | Bill | NULL | NULL | NULL |+----+-----------+------------+-------------+-------------+---------------+4 rows in set (0.00 sec)
Wednesday, 21 March 12
The Same Data in MongoDB
{ "_id" : ObjectId("4dfa6baa9c65dae09a4bbda5"), "title" : "Programming in Scala", "author" : [ { "first_name" : "Martin", "last_name" : "Odersky", "nationality" : "DE", "year_of_birth" : 1958 }, { "first_name" : "Lex", "last_name" : "Spoon" }, { "first_name" : "Bill", "last_name" : "Venners" } ]}
Wednesday, 21 March 12
What problem are we solving?
• Map/Reduce can be used for aggregation…• Currently being used for totaling, averaging, etc
• Map/Reduce is a big hammer• Simpler tasks should be easier
• Shouldn’t need to write JavaScript• Avoid the overhead of JavaScript engine
• We’re seeing requests for help in handling complex documents• Select only matching subdocuments or arrays
Wednesday, 21 March 12
How will we solve the problem?
• New aggregation framework• Declarative framework (no JavaScript)• Describe a chain of operations to apply• Expression evaluation
• Return computed values• Framework: new operations added easily• C++ implementation
Wednesday, 21 March 12
Aggregation - Pipelines
• Aggregation requests specify a pipeline• A pipeline is a series of operations• Members of a collection are passed
through a pipeline to produce a result• ps -ef | grep -i mongod
Wednesday, 21 March 12
Example - twitter{ "_id" : ObjectId("4f47b268fb1c80e141e9888c"), "user" : { "friends_count" : 73, "location" : "Brazil", "screen_name" : "Bia_cunha1", "name" : "Beatriz Helena Cunha", "followers_count" : 102, }}
• Find the # of followers and # friends by location
Wednesday, 21 March 12
Example - twitterdb.tweets.aggregate( {$match: {"user.friends_count": { $gt: 0 }, "user.followers_count": { $gt: 0 } } }, {$project: { location: "$user.location", friends: "$user.friends_count", followers: "$user.followers_count" } }, {$group: {_id: "$location", friends: {$sum: "$friends"}, followers: {$sum: "$followers"} } });
Wednesday, 21 March 12
Example - twitterdb.tweets.aggregate( {$match: {"user.friends_count": { $gt: 0 }, "user.followers_count": { $gt: 0 } } }, {$project: { location: "$user.location", friends: "$user.friends_count", followers: "$user.followers_count" } }, {$group: {_id: "$location", friends: {$sum: "$friends"}, followers: {$sum: "$followers"} } });
Predicate
Wednesday, 21 March 12
Example - twitterdb.tweets.aggregate( {$match: {"user.friends_count": { $gt: 0 }, "user.followers_count": { $gt: 0 } } }, {$project: { location: "$user.location", friends: "$user.friends_count", followers: "$user.followers_count" } }, {$group: {_id: "$location", friends: {$sum: "$friends"}, followers: {$sum: "$followers"} } });
Predicate
Parts of the document you want to project
Wednesday, 21 March 12
Example - twitterdb.tweets.aggregate( {$match: {"user.friends_count": { $gt: 0 }, "user.followers_count": { $gt: 0 } } }, {$project: { location: "$user.location", friends: "$user.friends_count", followers: "$user.followers_count" } }, {$group: {_id: "$location", friends: {$sum: "$friends"}, followers: {$sum: "$followers"} } });
Predicate
Parts of the document you want to project
Function to apply to the
result set
Wednesday, 21 March 12
Example - twitter{ "result" : [ { "_id" : "Far Far Away", "friends" : 344, "followers" : 789 },... ], "ok" : 1}
Wednesday, 21 March 12
DemoDemo files are at https://gist.github.com/
2036709
Wednesday, 21 March 12
Projections
• $project can reshape results• Include or exclude fields• Computed fields
• Arithmetic expressions• Pull fields from nested documents to the top• Push fields from the top down into new virtual
documents
Wednesday, 21 March 12
Unwinding
• $unwind can “stream” arrays• Array values are doled out one at time in the
context of their surrounding documents• Makes it possible to filter out elements before
returning
Wednesday, 21 March 12
Grouping
• $group aggregation expressions• Define a grouping key as the _id of the result• Total grouped column values: $sum• Average grouped column values: $avg• Collect grouped column values in an array or
set: $push, $addToSet• Other functions
• $min, $max, $first, $last
Wednesday, 21 March 12
Sorting
• $sort can sort documents• Sort specifications are the same as today,
e.g., $sort:{ key1: 1, key2: -1, …}
Wednesday, 21 March 12
Computed Expressions
• Available in $project operations• Prefix expression language
• $add:[“$field1”, “$field2”]• $ifNull:[“$field1”, “$field2”]• Nesting: $add:[“$field1”, $ifNull:[“$field2”,
“$field3”]]• Other functions….
• $divide, $mod, $multiply
Wednesday, 21 March 12
Computed Expressions
• String functions• $toUpper, $toLower, $substr
• Date field extraction• $year, $month, $day, $hour...
• Date arithmetic• $ifNull• Ternary conditional
• Return one of two values based on a predicate
Wednesday, 21 March 12
conferences, appearanceshttp://www.10gen.com/events
and meetupshttp://www.meetup.com/London-MongoDB-User-Group
download at mongodb.org
We’re Hiring !Chris Harris
Email : [email protected] : cj_harris5
Wednesday, 21 March 12