MongoDB - University of Scranton
MongoDB
Danny Jackowitz
SE521
4/10/13
What is MongoDB?
● NoSQL database management system (DBMS)
● Humongous
  ○ => Intended for large datasets
● Document-oriented
● Developed by 10gen
● Started in 2007
● Open-sourced in 2009
● Production-ready as of version 1.4 (now 2.4)
NoSQL or NoSQL?
● NoSQL is a popular buzzword
● "No SQL" or "Not only SQL"?
  ○ Some NoSQL DBMSs allow you to execute SQL (or close to it) commands
    ■ Ex. Cassandra Query Language
  ○ MongoDB does NOT!
    ■ Takes a completely different approach
DBMS Showdown
RDBMS vs. MongoDB
Round 1: Schemas
● RDBMS
  ○ Explicitly define schema before inserting data

    CREATE TABLE stuff (
        id int PRIMARY KEY,
        some_data varchar(64)
    );

● MongoDB
  ○ Schema implicitly created on first insert
  ○ "_id" primary key automatically generated if not specified
  ○ Just throw data at Mongo, it can handle it!
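The MongoDB side of this round can be sketched in plain JavaScript. This is a toy in-memory store, not MongoDB itself: `createStore`, `insert` and `collectionNames` are hypothetical names, and a simple counter stands in for the ObjectId that MongoDB would generate.

```javascript
// Toy in-memory store (hypothetical); a counter stands in for MongoDB's ObjectId.
function createStore() {
  const collections = new Map();
  let nextId = 1;
  return {
    insert(collectionName, doc) {
      // The collection comes into existence on the first insert.
      if (!collections.has(collectionName)) collections.set(collectionName, []);
      // Copy the document, adding an _id only if the caller did not supply one.
      const stored = "_id" in doc ? { ...doc } : { _id: nextId++, ...doc };
      collections.get(collectionName).push(stored);
      return stored;
    },
    collectionNames() {
      return [...collections.keys()];
    },
  };
}

const store = createStore();
const first = store.insert("stuff", { some_data: "hello" });
const second = store.insert("stuff", { _id: "custom", other_field: 42 });
```

No table definition is needed before either insert, and the two documents do not share the same fields.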
Round 2: Tables
● RDBMS
  ○ Tables store rows of data
  ○ Data is organized by column
    ■ All rows in a table have same column structure
● MongoDB
  ○ Collections store documents of data
  ○ Data is organized by fields
    ■ Documents in a collection need not have identical fields
Round 3: Joins
● RDBMS

    table_1 JOIN table_2 ON table_1.a = table_2.b

  ○ Returns a (logical) single table
● MongoDB
  ○ No such concept
  ○ Manual linking
    ■ Store _id of document within other document
    ■ "Join" on the client
  ○ Embedded documents
    ■ Denormalized data to remove need for join
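The two MongoDB options, manual linking and embedded documents, can be sketched in plain JavaScript. The arrays below stand in for collections; the data, the `author_id` field and the `joinOnClient` helper are all made up for illustration.

```javascript
// Two in-memory arrays stand in for collections (hypothetical data).
const authors = [
  { _id: 1, name: "Ada" },
  { _id: 2, name: "Grace" },
];
const books = [
  { _id: 10, title: "Book A", author_id: 1 },
  { _id: 11, title: "Book B", author_id: 2 },
];

// Manual linking: each book stores the _id of its author,
// and the client resolves the reference itself.
function joinOnClient(bookDocs, authorDocs) {
  const byId = new Map(authorDocs.map((a) => [a._id, a]));
  return bookDocs.map((b) => ({
    title: b.title,
    author: byId.get(b.author_id).name,
  }));
}

// Embedded document: the author is copied into the book, so no join is needed.
const embeddedBook = { _id: 10, title: "Book A", author: { name: "Ada" } };

const joined = joinOnClient(books, authors);
```

With manual linking the client does the lookup work; with embedding the lookup disappears, at the cost of storing a copy of the author in every book.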
Round 4: Transactions
● RDBMS

    BEGIN;
    -- Do some stuff
    COMMIT;

● MongoDB
  ○ Atomic operations within a single document
  ○ No multi-document commit with rollback (as of version 2.4)
MongoDB Query Language
● No SQL!
● BSON
  ○ Binary JSON
  ○ JSON == JavaScript Object Notation
    ■ Key-value pairs

    {
      _id: ObjectId("5099803df3f4948bd2f98391"),
      name: { first: "Alan", last: "Turing" },
      birth: new Date('Jun 23, 1912'),
      death: new Date('Jun 07, 1954'),
      contribs: ["Turing machine", "Turing test"],
      views: NumberLong(1250000)
    }
Inserting Documents
    record = { _id : 1, name : "mongo" }
    db.records.insert( record )

    db.records.insert({ _id : 2, name : "mongo" })

    // batch insert using JavaScript
    for (var i = 1; i <= 20; i++) {
        db.records.insert( { x : i } )
    }
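Retrieving Documents
    // find all
    db.records.find()
    // find specific (WHERE)
    db.records.find( { name : "mongo" } )

    // iterate over the results with a cursor
    var cursor = db.records.find()
    while ( cursor.hasNext() ) {
        printjson( cursor.next() )
    }

    // array-style access to the first result (on a fresh cursor,
    // since the loop above has already exhausted the old one)
    printjson( db.records.find()[0] )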
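Updating Documents
    // set a field
    db.records.update({ _id : 1 }, { $set : { name : "mongodb" } })

    // remove a field (the value given to $unset is ignored)
    db.records.update({ _id : 2 }, { $unset : { name : "ignored" } })

    // fetch, modify on the client, then save the whole document back
    var r = db.records.find({ name : "mongodb" })
    r[0]["name"] = "mongo"
    db.records.save(r[0])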
Deleting Documents
    // delete specific documents
    db.records.remove({ name : "mongo" })
    // delete all documents (an empty query matches everything)
    db.records.remove({})
    // delete collection
    db.records.drop()
Aggregation Framework
● db.collection.aggregate(...)
● Uses a pipeline system
  ○ Works like the UNIX pipeline
  ○ ls | grep "text" | more

    db.collection.aggregate(
        { $op1 : val1 },
        { $op2 : val2 },
        { $op3 : val3 }
    );
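The pipeline idea itself fits in a few lines of plain JavaScript. This is only a sketch of the concept, not how MongoDB implements it: `runPipeline`, `keepEven` and `firstOne` are hypothetical names, and each stage is modelled as a function from an array of documents to an array of documents.

```javascript
// Each stage takes the documents produced by the previous stage
// and returns the documents handed to the next one.
function runPipeline(docs, stages) {
  return stages.reduce((current, stage) => stage(current), docs);
}

const sampleDocs = [{ n: 1 }, { n: 2 }, { n: 3 }, { n: 4 }];

// Two toy stages: a filter (like $match) and a cut-off (like $limit).
const keepEven = (docs) => docs.filter((d) => d.n % 2 === 0);
const firstOne = (docs) => docs.slice(0, 1);

const pipelineResult = runPipeline(sampleDocs, [keepEven, firstOne]);
```

As with `ls | grep "text" | more`, the order of the stages matters: limiting before filtering would generally give a different result.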
Aggregation Framework
● $project
  ○ Include fields from the original document
  ○ Insert computed fields
  ○ Rename fields
  ○ Create and populate fields that hold sub-documents

    db.zips.aggregate(
        { $project : { city : 1, state : 1, _id : 0 } }
    )
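What the inclusion form of $project does to each document can be mimicked in plain JavaScript. The `project` function and the sample document are hypothetical, and the sketch covers only field inclusion (value 1) and `_id` suppression, not computed or renamed fields.

```javascript
// Plain-JavaScript stand-in for { $project : { city : 1, state : 1, _id : 0 } }.
function project(docs, spec) {
  return docs.map((doc) => {
    const out = {};
    // _id is kept unless the spec explicitly sets it to 0.
    if (spec._id !== 0 && "_id" in doc) out._id = doc._id;
    for (const field of Object.keys(spec)) {
      if (field !== "_id" && spec[field] === 1 && field in doc) {
        out[field] = doc[field];
      }
    }
    return out;
  });
}

const zipDocs = [{ _id: "00001", city: "EXAMPLE", state: "PA", pop: 1000 }];
const projected = project(zipDocs, { city: 1, state: 1, _id: 0 });
```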
Aggregation Framework
● $match
  ○ Can work with implied equality or any of the comparison operators
    ■ ==, !=, >, <, >=, <= (written as implied equality, $ne, $gt, $lt, $gte, $lte)

    db.zips.aggregate(
        { $match : { pop : 8000 } }
    )
    db.zips.aggregate(
        { $match : { pop : { $gt : 80000, $lte : 82000 } } }
    )
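The two query shapes above, implied equality and operator objects, can be evaluated by a small predicate in plain JavaScript. The `matches` function, the `comparators` table and the sample towns are illustrative assumptions; the sketch handles only top-level fields and the five operators listed.

```javascript
// Comparison operators supported by this sketch.
const comparators = {
  $gt: (a, b) => a > b,
  $gte: (a, b) => a >= b,
  $lt: (a, b) => a < b,
  $lte: (a, b) => a <= b,
  $ne: (a, b) => a !== b,
};

// True when doc satisfies every condition in query.
function matches(doc, query) {
  return Object.entries(query).every(([field, cond]) => {
    if (cond !== null && typeof cond === "object") {
      // Operator form, e.g. { $gt : 80000, $lte : 82000 }: all operators must hold.
      return Object.entries(cond).every(([op, value]) =>
        comparators[op](doc[field], value)
      );
    }
    // Implied equality, e.g. { pop : 8000 }.
    return doc[field] === cond;
  });
}

const towns = [
  { city: "A", pop: 8000 },
  { city: "B", pop: 81000 },
  { city: "C", pop: 90000 },
];
const exact = towns.filter((d) => matches(d, { pop: 8000 }));
const ranged = towns.filter((d) => matches(d, { pop: { $gt: 80000, $lte: 82000 } }));
```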
Aggregation Framework
● $limit
  ○ Restricts the number of documents that pass through pipeline at this point

    db.zips.aggregate(
        { $match : { pop : { $gt : 80000, $lte : 82000 } } },
        { $limit : 2 }
    )
Aggregation Framework
● $unwind
  ○ Peels off the elements of an array individually
  ○ Returns one document for every member of the unwound array

    db.zips.aggregate(
        { $limit : 1 },
        { $project : { city : 1, state : 1, loc : 1, _id : 0 } },
        { $unwind : "$loc" }
    )
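A plain-JavaScript sketch of the same idea: one output document per array element, with the array field replaced by that element. The `unwind` helper and the coordinate values are made up for illustration, and documents without the array are simply dropped here.

```javascript
// Emits one copy of each document per element of the named array field.
function unwind(docs, path) {
  const field = path.replace(/^\$/, ""); // "$loc" -> "loc"
  return docs.flatMap((doc) =>
    (doc[field] || []).map((element) => ({ ...doc, [field]: element }))
  );
}

const zipDoc = { city: "EXAMPLE", state: "PA", loc: [-75.5, 41.4] };
const unwound = unwind([zipDoc], "$loc");
```

Here the single input document with a two-element `loc` array becomes two documents, each holding one of the two values in `loc`.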
Aggregation Framework
● $group
  ○ Groups documents together for the purpose of calculating aggregate values based on a collection of documents

    db.zips.aggregate(
        { $group : {
            _id : "$state",
            totalPop : { $sum : "$pop" },
            avgPop : { $avg : "$pop" }
        } }
    )
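The grouping above can be reproduced by hand in plain JavaScript to see what $sum and $avg compute per group. `groupByState` and the population figures are hypothetical; the function is hard-wired to the `state` and `pop` fields used on this slide.

```javascript
// Groups documents by state and computes the total and average population.
function groupByState(docs) {
  const groups = new Map();
  for (const doc of docs) {
    const g = groups.get(doc.state) || { _id: doc.state, totalPop: 0, count: 0 };
    g.totalPop += doc.pop;
    g.count += 1;
    groups.set(doc.state, g);
  }
  // Drop the helper counter and derive the average for each group.
  return [...groups.values()].map((g) => ({
    _id: g._id,
    totalPop: g.totalPop,
    avgPop: g.totalPop / g.count,
  }));
}

const grouped = groupByState([
  { state: "PA", pop: 100 },
  { state: "PA", pop: 300 },
  { state: "NY", pop: 50 },
]);
```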
Aggregation Framework
● $sort
  ○ Orders documents by one or more fields
  ○ 1 ascending, -1 descending

    db.zips.aggregate(
        { $sort : { state : 1, pop : -1 } }
    )
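A multi-key sort specification like the one above translates into a comparator that tries each field in order. The sketch below is plain JavaScript with a hypothetical `sortBy` helper and made-up data; it assumes the field values are directly comparable with < and >.

```javascript
// Sorts a copy of docs by each field in spec, in order; 1 ascending, -1 descending.
function sortBy(docs, spec) {
  const keys = Object.entries(spec); // e.g. [["state", 1], ["pop", -1]]
  return [...docs].sort((a, b) => {
    for (const [field, direction] of keys) {
      if (a[field] < b[field]) return -direction;
      if (a[field] > b[field]) return direction;
    }
    return 0; // equal on every key
  });
}

const sorted = sortBy(
  [
    { state: "PA", pop: 100 },
    { state: "NY", pop: 50 },
    { state: "PA", pop: 300 },
  ],
  { state: 1, pop: -1 }
);
```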
More complex queries?
● MongoDB provides MANY other functions that allow for complex queries to be executed efficiently.
● Craigslist
  ○ Archiving (still RDBMS for active listings)
  ○ 2+ billion listings!
● SourceForge
  ○ All project and download pages
● Lots of gaming back ends
  ○ Disney, EA
  ○ Storing scores, stats, achievements, etc.
What's the catch?
● MongoDB is designed for non-relational data
● Faking relational loses efficiency
  ○ "Joining" on the client is slow
● Embedded documents to preserve speed
  ○ De-normalizes data
    ■ Consider books written by authors
    ■ Each book document has own embedded copy of author
    ■ Author changes contact info
    ■ Must update ALL books written by author!
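The books-and-authors problem can be made concrete in plain JavaScript. The documents, e-mail addresses and the `updateAuthorEmail` helper below are invented for illustration; the point is only that one logical change touches every document holding a copy.

```javascript
// Each book embeds its own copy of the author (hypothetical data).
const bookDocs = [
  { _id: 1, title: "Book A", author: { name: "Ada", email: "ada@old.example" } },
  { _id: 2, title: "Book B", author: { name: "Ada", email: "ada@old.example" } },
  { _id: 3, title: "Book C", author: { name: "Grace", email: "grace@example.com" } },
];

// Changing one author's contact info means rewriting every book that embeds it.
function updateAuthorEmail(books, authorName, newEmail) {
  let touched = 0;
  for (const book of books) {
    if (book.author.name === authorName) {
      book.author.email = newEmail;
      touched += 1;
    }
  }
  return touched;
}

const touchedCount = updateAuthorEmail(bookDocs, "Ada", "ada@new.example");
```

In a normalized design the same change would be a single update to one author record.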
Conclusion
● MongoDB is awesome for non-relational data
  ○ Self-contained documents
● MongoDB is awesome for loosely structured data
  ○ Each document in collection can have different format
● MongoDB is awesome for (mostly) static data
  ○ Throw all the data at it
  ○ Normalization not as much of a concern
  ○ Super fast queries with indices, etc.
● MongoDB is NOT a replacement for RDBMSs
Configuring a MongoDB Cluster
● MongoDB intended as a distributed system
  ○ Different components run on different machines
● Three components
  ○ mongod
    ■ --configsvr
    ■ --replSet
    ■ --shardsvr
  ○ mongos
  ○ mongo
mongod
● "MongoDB Daemon"
● Primary daemon process
● Runs on every machine acting as data store
● Comparable to postgresql-server
● Defaults to port 27017
● Configuration server
  ○ Started with --configsvr
  ○ Special instance that stores all metadata for cluster
  ○ Defaults to port 27019
Replication
● Exact same data stored on multiple instances
● Primary vs. Secondary
  ○ Only primary accepts writes - propagates to secondaries
  ○ Fully Consistent (by default)
    ■ All reads and writes go through single primary
  ○ Asynchronous replication
● Failover
  ○ If primary fails, secondaries elect new primary
  ○ An election needs a majority of voting members, so have at least 3 (e.g. a primary and 2 secondaries)
● --replSet [name]
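The voting requirement comes down to majority arithmetic, which a short plain-JavaScript sketch makes explicit. `majorityNeeded` and `canElectPrimary` are hypothetical helper names, not part of any MongoDB API.

```javascript
// Smallest number of votes that is more than half of the voting members.
function majorityNeeded(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// A new primary can be elected only if a majority of members can still vote.
function canElectPrimary(votingMembers, reachableMembers) {
  return reachableMembers >= majorityNeeded(votingMembers);
}

// Three members, primary lost: the two secondaries still form a majority.
const threeMemberSet = canElectPrimary(3, 2);
// Two members, primary lost: the lone secondary cannot form a majority.
const twoMemberSet = canElectPrimary(2, 1);
```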
Sharding
● Partitions collections
  ○ Based on shard key
● Stores different portions on different machines
  ○ Ex. Storing transaction records
    ■ 1/1/10 - 12/31/10 -> server1
    ■ 1/1/11 - 12/31/11 -> server2
    ■ ...
● Easy scaling - add more racks!
● --shardsvr
  ○ Switches to port 27018
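The transaction-records example amounts to a lookup from a shard-key value to the server holding that range. Below is a minimal plain-JavaScript sketch of that lookup; the `chunks` table and `routeByDate` function are illustrative assumptions, not MongoDB's actual metadata format, and each range includes its lower bound and excludes its upper bound.

```javascript
// Hypothetical range table: each range of the shard key maps to one server.
const chunks = [
  { min: new Date("2010-01-01"), max: new Date("2011-01-01"), shard: "server1" },
  { min: new Date("2011-01-01"), max: new Date("2012-01-01"), shard: "server2" },
];

// Returns the server responsible for a transaction date, or null if none covers it.
function routeByDate(date) {
  const chunk = chunks.find((c) => date >= c.min && date < c.max);
  return chunk ? chunk.shard : null;
}

const shardFor2010 = routeByDate(new Date("2010-06-15"));
const shardFor2011 = routeByDate(new Date("2011-03-01"));
```

Adding capacity then means adding a new range and a new server to the table rather than growing an existing machine.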
mongos
● "MongoDB Shard"
● Not a data store
● Routing service for shards
  ○ Knows what data on what shard
  ○ Directs request to appropriate shard
● To user/application looks same as single mongod instance
  ○ Same interface as mongod
  ○ Same default port (27017)
  ○ Connect in same way
mongo
● Interactive shell interface
● Comparable to psql
● JavaScript
  ○ Can use loops, conditionals, etc. in queries
Our Architecture
● 4 machine cluster
  ○ server1
    ■ mongod --configsvr (27019)
    ■ mongod --shardsvr (27018)
    ■ mongos (27017)
  ○ server2
    ■ mongod --shardsvr --replSet rs0 (27018)
  ○ server3
    ■ mongod --shardsvr --replSet rs0 (27018)
  ○ server4
    ■ mongod --shardsvr --replSet rs0 (27018)
Starting Everything Up...
server1:
    sudo -u mongodb mongod --configsvr
    sudo -u mongodb mongod --shardsvr
server2, server3, server4:
    sudo -u mongodb mongod --shardsvr --replSet rs0
server1:
    mongos --configdb 134.198.169.41
Setting Up Replication & Sharding
server2 (or 3 or 4):
    mongo --port 27018
    rs.initiate()
    rs.add("134.198.169.43:27018")
    rs.add("134.198.169.44:27018")
    rs.conf()
server1:
    mongo
    sh.addShard("rs0/134.198.169.42:27018")
    sh.addShard("134.198.169.41:27018")
... And Watching It Work
    sh.enableSharding("test")
    sh.shardCollection("test.shardtest", { _id : 1 })

    for (var i = 1; i <= 2000000; i++) {
        db.shardtest.insert({
            _id : i,
            junk : "Some reasonably long text that will make this take up more space in the database and better illustrate sharding"
        })
    }