MongoDB - University of Scrantonbi/2013s-html/se521/MongoDB.pdfMongoDB is awesome for non-relational...

MongoDB
Danny Jackowitz
SE521
4/10/13

What is MongoDB?

● NoSQL database management system (DBMS)

● Name derived from "humongous"
  ○ Intended for large datasets
● Document-oriented
● Developed by 10gen
● Started in 2007
● Open-sourced in 2009
● Production-ready as of version 1.4 (now 2.4)

NoSQL or NoSQL?

● "NoSQL" is a popular buzzword
● "No SQL" or "Not only SQL"?
  ○ Many NoSQL DBMSs let you run SQL (or SQL-like) commands
    ■ Ex. Cassandra Query Language (CQL)
  ○ MongoDB does NOT!
    ■ Takes a completely different approach: queries are themselves documents

DBMS Showdown

RDBMS vs. MongoDB

Round 1: Schemas

● RDBMS
  ○ Explicitly define schema before inserting data
● MongoDB
  ○ No schema to declare; the collection is created implicitly on first insert
  ○ "_id" primary key automatically generated if not specified
  ○ Just throw data at Mongo; it can handle it!

CREATE TABLE stuff (
    id int PRIMARY KEY,
    some_data varchar(64)
);
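The sketch below makes the automatic "_id" concrete in plain JavaScript. It runs in Node and does not touch a MongoDB server. It imitates the 12-byte ObjectId layout as we understand it: a 4-byte seconds timestamp, 5 bytes of per-process randomness and a 3-byte counter. 2.4-era drivers filled the middle bytes with a machine id and a process id instead. The `insert` helper and the `records` array are hypothetical stand-ins for a collection, not the real driver API.

```javascript
// Hypothetical sketch: an in-memory "collection" that, like MongoDB,
// adds an _id to any document inserted without one.

// 5 random bytes chosen once per process, plus a counter starting at a random value.
const processRandom = Array.from({ length: 5 }, () => Math.floor(Math.random() * 256));
let counter = Math.floor(Math.random() * 0xffffff);

// Format a number as exactly `bytes` bytes of lowercase hex.
function hex(value, bytes) {
  return value.toString(16).padStart(bytes * 2, "0");
}

// 24 hex characters: 4-byte timestamp + 5 random bytes + 3-byte counter.
function newObjectIdHex() {
  const seconds = Math.floor(Date.now() / 1000);
  counter = (counter + 1) % 0x1000000;
  return hex(seconds, 4) + processRandom.map((b) => hex(b, 1)).join("") + hex(counter, 3);
}

// Recover the creation time (seconds since the epoch) from the first 4 bytes.
function timestampOf(idHex) {
  return parseInt(idHex.slice(0, 8), 16);
}

// No schema is declared up front: documents of any shape are accepted.
function insert(collection, doc) {
  const { _id, ...rest } = doc;
  // Keep _id as the first field; generate one only if the caller gave none.
  const stored = { _id: _id === undefined ? newObjectIdHex() : _id, ...rest };
  collection.push(stored);
  return stored._id;
}

const records = [];
insert(records, { name: "mongo" });         // gets a generated _id
insert(records, { _id: 1, name: "mongo" }); // keeps the caller's _id
insert(records, { x: 42 });                 // different fields, same collection
```

The timestamp prefix is why ObjectIds generated later tend to sort after earlier ones.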

Round 2: Tables

● RDBMS
  ○ Tables store rows of data
  ○ Data is organized by column
    ■ All rows in a table have same column structure
● MongoDB
  ○ Collections store documents of data
  ○ Data is organized by fields
    ■ Documents in a collection need not have identical fields

Round 3: Joins

● RDBMS
  ○ Returns a (logical) single table
● MongoDB
  ○ No such concept (no server-side joins in 2.4)
  ○ Manual linking
    ■ Store one document's _id within another document
    ■ "Join" on the client
  ○ Embedded documents
    ■ Denormalized data to remove need for join

SELECT * FROM table_1 JOIN table_2 ON table_1.a = table_2.b;
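Below is a minimal sketch of manual linking followed by a client-side "join", in plain JavaScript that runs in Node. The `authors` and `books` data and the `author_id` field are hypothetical. In a real application each array would come back from its own query, and those extra round trips are where the cost comes from.

```javascript
// Hypothetical collections after "manual linking": each book stores its
// author's _id instead of an embedded copy of the author.
const authors = [
  { _id: 1, name: "Author A" },
  { _id: 2, name: "Author B" },
];
const books = [
  { _id: 10, title: "Book A", author_id: 1 },
  { _id: 11, title: "Book B", author_id: 2 },
  { _id: 12, title: "Book C", author_id: 1 },
];

// Client-side "join": index one result set by _id, then resolve each reference.
// A dangling reference yields author: null.
function joinOnClient(bookDocs, authorDocs) {
  const authorsById = new Map(authorDocs.map((a) => [a._id, a]));
  return bookDocs.map((book) => ({ ...book, author: authorsById.get(book.author_id) || null }));
}

const joined = joinOnClient(books, authors);
console.log(joined.map((b) => b.title + " by " + b.author.name));
```

Indexing authors in a `Map` keeps the join linear in the number of documents. It cannot remove the separate fetch for each collection.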

Round 4: Transactions

● RDBMS
  ○ Multi-statement transactions: commit all changes or roll back all of them
● MongoDB
  ○ Atomic operations within a single document
  ○ No multi-document commit with rollback

BEGIN;
-- Do some stuff
COMMIT;

MongoDB Query Language

● No SQL!
● BSON
  ○ Binary JSON
  ○ JSON == JavaScript Object Notation
    ■ Key-value pairs

{
    _id: ObjectId("5099803df3f4948bd2f98391"),
    name: { first: "Alan", last: "Turing" },
    birth: new Date('Jun 23, 1912'),
    death: new Date('Jun 07, 1954'),
    contribs: ["Turing machine", "Turing test"],
    views: NumberLong(1250000)
}
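To show what the "Binary" in BSON means, here is a deliberately tiny encoder in plain JavaScript. It handles only two BSON element types: UTF-8 strings (0x02) and 32-bit integers (0x10). Every document starts with its total byte length, and every value carries a type byte. That type byte is how BSON can also hold types that plain JSON lacks, such as the dates and the NumberLong in the example document. The name `encodeBson` is ours; real drivers ship complete encoders.

```javascript
// Minimal BSON encoding sketch: only string (0x02) and int32 (0x10) values.
const encoder = new TextEncoder();

// 32-bit little-endian integer, as BSON uses for lengths and int32 values.
function int32le(n) {
  return [n & 0xff, (n >> 8) & 0xff, (n >> 16) & 0xff, (n >> 24) & 0xff];
}

// Null-terminated UTF-8 string (used for field names).
function cstring(s) {
  return [...encoder.encode(s), 0x00];
}

function encodeBson(doc) {
  const elements = [];
  for (const [key, value] of Object.entries(doc)) {
    if (typeof value === "string") {
      // type byte, field name, byte length (including the trailing NUL), bytes, NUL
      const bytes = [...encoder.encode(value), 0x00];
      elements.push(0x02, ...cstring(key), ...int32le(bytes.length), ...bytes);
    } else if (Number.isInteger(value) && value >= -2147483648 && value <= 2147483647) {
      elements.push(0x10, ...cstring(key), ...int32le(value));
    } else {
      throw new Error("type not supported in this sketch: " + key);
    }
  }
  const total = 4 + elements.length + 1; // size prefix + elements + trailing 0x00
  return Uint8Array.from([...int32le(total), ...elements, 0x00]);
}

console.log(encodeBson({ hello: "world" }).length); // 22
```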

Inserting Documents

record = { _id : 1, name : "mongo" }
db.records.insert( record )

db.records.insert( { _id : 2, name : "mongo" } )

// batch insert using JavaScript
for (var i = 1; i <= 20; i++) {
    db.records.insert( { x : i } )
}

Retrieving Documents

// find all
db.records.find()
// find specific (WHERE)
db.records.find( { name : "mongo" } )

var cursor = db.records.find()
while ( cursor.hasNext() ) {
    printjson( cursor.next() )
}
// array-style access needs a fresh cursor; the one above is exhausted
printjson( db.records.find()[0] )

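Updating Documents

db.records.update( { _id : 1 }, { $set : { name : "mongodb" } } )

// $unset removes the field; the value given ("ignored") does not matter
db.records.update( { _id : 2 }, { $unset : { name : "ignored" } } )

// read one document, modify it on the client, then save() writes the whole document back
var r = db.records.findOne( { name : "mongodb" } )
r["name"] = "mongo"
db.records.save( r )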
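The two update operators used above can be modelled on a plain object. This is a hypothetical sketch (`applyUpdate` is our name) that covers top-level fields only. The real server also handles dotted paths, arrays and many more operators. For contrast, an update document containing no $ operators replaces the whole stored document apart from its _id, much as save() does.

```javascript
// Hypothetical model of $set / $unset on one document (top-level fields only).
function applyUpdate(doc, update) {
  const result = { ...doc }; // work on a copy; the input document is left untouched
  for (const [field, value] of Object.entries(update.$set || {})) {
    result[field] = value; // add the field or overwrite its value
  }
  for (const field of Object.keys(update.$unset || {})) {
    delete result[field]; // remove the field; the value given to $unset is ignored
  }
  return result;
}

const before = { _id: 2, name: "mongo", x: 5 };
const afterSet = applyUpdate(before, { $set: { name: "mongodb" } });
const afterUnset = applyUpdate(before, { $unset: { name: "ignored" } });
console.log(afterSet);   // { _id: 2, name: 'mongodb', x: 5 }
console.log(afterUnset); // { _id: 2, x: 5 }
```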

Deleting Documents

// delete specific documents
db.records.remove( { name : "mongo" } )

// delete all documents (the empty query matches everything)
db.records.remove( {} )

// delete collection
db.records.drop()

Aggregation Framework

● db.collection.aggregate(...)
● Uses a pipeline system
  ○ Works like the UNIX pipeline
  ○ ls | grep "text" | more
  ○ Output of each stage is the input to the next

db.collection.aggregate(
    { $op1 : val1 },
    { $op2 : val2 },
    { $op3 : val3 }
);
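The pipeline idea can be sketched in plain JavaScript. Each stage is a function from an array of documents to a new array, and `aggregate` feeds each stage's output into the next, like `|` in the shell. The stage helpers (`match`, `limit`, `project`) and the sample `zips` data are made up for illustration. They imitate, but do not reproduce, the server's $match, $limit and $project; this `project` only includes the fields it is given.

```javascript
// Each stage maps an array of documents to a new array.
const match = (predicate) => (docs) => docs.filter(predicate);
const limit = (n) => (docs) => docs.slice(0, n);
const project = (fields) => (docs) =>
  docs.map((doc) => Object.fromEntries(fields.filter((f) => f in doc).map((f) => [f, doc[f]])));

// Run the stages left to right, passing each stage's output to the next.
function aggregate(docs, ...stages) {
  return stages.reduce((current, stage) => stage(current), docs);
}

// Made-up sample documents shaped like the zips collection.
const zips = [
  { _id: "00001", city: "TOWN_A", state: "MA", pop: 81000 },
  { _id: "00002", city: "TOWN_B", state: "PA", pop: 500 },
  { _id: "00003", city: "TOWN_C", state: "PA", pop: 81500 },
  { _id: "00004", city: "TOWN_D", state: "NY", pop: 81900 },
];

const result = aggregate(
  zips,
  match((z) => z.pop > 80000 && z.pop <= 82000),
  limit(2),
  project(["city", "state"])
);
console.log(result);
```

Moving `limit(2)` before `match(...)` changes the answer, just as reordering stages in a real pipeline does.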


● $project
  ○ Include fields from the original document
  ○ Insert computed fields
  ○ Rename fields
  ○ Create and populate fields that hold sub-documents

db.zips.aggregate({ $project : { city : 1, state : 1, _id : 0 }})


● $match
  ○ Filters documents using implied equality or any of the comparison operators
    ■ $ne, $gt, $lt, $gte, $lte (!=, >, <, >=, <=)

db.zips.aggregate( { $match : { pop : 8000 } } )
db.zips.aggregate( { $match : { pop : { $gt : 80000, $lte : 82000 } } } )
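Here is a hypothetical matcher in plain JavaScript for flat queries like the two above. A bare value means equality. An object of $ operators means that every listed comparison must hold, so `{ $gt : 80000, $lte : 82000 }` is a range. `matches` and the operator table are our own names. The sketch ignores nested fields, array fields and MongoDB's rules for comparing values of different types.

```javascript
// Comparison operators supported by this sketch.
const comparisons = {
  $ne: (a, b) => a !== b,
  $gt: (a, b) => a > b,
  $gte: (a, b) => a >= b,
  $lt: (a, b) => a < b,
  $lte: (a, b) => a <= b,
};

// True if `v` looks like { $op: operand, ... } rather than a literal value.
function isOperatorObject(v) {
  if (v === null || typeof v !== "object" || Array.isArray(v)) return false;
  const keys = Object.keys(v);
  return keys.length > 0 && keys.every((k) => k.startsWith("$"));
}

// Every field condition in the query must hold for the document to match.
function matches(doc, query) {
  return Object.entries(query).every(([field, condition]) => {
    const value = doc[field];
    if (!isOperatorObject(condition)) return value === condition; // implied equality
    return Object.entries(condition).every(([op, operand]) => {
      const compare = comparisons[op];
      if (!compare) throw new Error("operator not supported in this sketch: " + op);
      return compare(value, operand);
    });
  });
}

const sampleZips = [
  { city: "TOWN_A", pop: 8000 },
  { city: "TOWN_B", pop: 81000 },
  { city: "TOWN_C", pop: 90000 },
];
const inRange = sampleZips.filter((z) => matches(z, { pop: { $gt: 80000, $lte: 82000 } }));
console.log(inRange.map((z) => z.city)); // [ 'TOWN_B' ]
```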


● $limit
  ○ Restricts the number of documents that pass through the pipeline at this point

db.zips.aggregate( {$match : { pop : { $gt : 80000, $lte : 82000 }}}, {$limit : 2})


● $unwind
  ○ Peels off the elements of an array individually
  ○ Returns one document for every member of the unwound array

db.zips.aggregate(
    { $limit : 1 },
    { $project : { city : 1, state : 1, loc : 1, _id : 0 } },
    { $unwind : "$loc" }
)
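In the example above, `loc` holds a two-element coordinate array, so each zip comes out as two documents. The following is a hypothetical plain-JavaScript sketch of that behaviour; `unwind` and the sample document are ours. Documents whose field is missing, null or an empty array produce no output, which matches the default $unwind behaviour. As a simplification, this sketch passes any other non-array value through unchanged.

```javascript
// Hypothetical sketch of { $unwind : "$<field>" } on plain objects.
function unwind(docs, field) {
  return docs.flatMap((doc) => {
    const value = doc[field];
    if (value === undefined || value === null) return []; // missing or null: dropped
    if (!Array.isArray(value)) return [doc];               // simplification: pass through
    // One copy of the document per element; an empty array yields no copies.
    return value.map((element) => ({ ...doc, [field]: element }));
  });
}

// Made-up document shaped like the output of the $project stage above.
const projected = [{ city: "TOWN_A", state: "MA", loc: [-72.5, 42.1] }];
console.log(unwind(projected, "loc").map((d) => d.loc)); // [ -72.5, 42.1 ]
```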


● $group
  ○ Groups documents together for the purpose of calculating aggregate values based on a collection of documents

db.zips.aggregate(
    { $group : {
        _id : "$state",
        totalPop : { $sum : "$pop" },
        avgPop : { $avg : "$pop" }
    } }
)
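Below is a hypothetical plain-JavaScript equivalent of the $group example. It keeps one running sum and count per `state` and emits one document per group, where `avgPop` is the sum divided by the count. `groupByState` and the sample data are ours. MongoDB does not guarantee the order of $group output, so do not rely on the order this sketch happens to produce.

```javascript
// Hypothetical equivalent of:
//   { $group : { _id : "$state", totalPop : { $sum : "$pop" }, avgPop : { $avg : "$pop" } } }
function groupByState(docs) {
  const groups = new Map(); // state -> running totals for that group
  for (const doc of docs) {
    const g = groups.get(doc.state) || { sum: 0, count: 0 };
    g.sum += doc.pop;
    g.count += 1;
    groups.set(doc.state, g);
  }
  // Emit one document per distinct state.
  return [...groups].map(([state, g]) => ({ _id: state, totalPop: g.sum, avgPop: g.sum / g.count }));
}

const sample = [
  { state: "MA", pop: 100 },
  { state: "PA", pop: 50 },
  { state: "MA", pop: 300 },
];
console.log(groupByState(sample));
```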


● $sort
  ○ Orders documents by one or more fields; earlier fields take precedence
  ○ 1 ascending, -1 descending

db.zips.aggregate( { $sort : { state : 1, pop : -1 } } )
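A compound sort spec such as `{ state : 1, pop : -1 }` reads as: sort by state ascending, and within the same state by population descending. Below is a hypothetical comparator in plain JavaScript (`compareBy` is our name). It assumes that all values of a field share one type and does not model MongoDB's cross-type ordering.

```javascript
// Keys are compared in the order listed; a later key only breaks ties on earlier ones.
function compareBy(spec) {
  const keys = Object.entries(spec);
  return (a, b) => {
    for (const [field, direction] of keys) {
      if (a[field] < b[field]) return -direction; // 1 puts a first, -1 puts b first
      if (a[field] > b[field]) return direction;
    }
    return 0;
  };
}

const unsorted = [
  { state: "PA", pop: 5 },
  { state: "MA", pop: 10 },
  { state: "MA", pop: 20 },
];
const sorted = [...unsorted].sort(compareBy({ state: 1, pop: -1 }));
console.log(sorted.map((d) => d.state + ":" + d.pop)); // [ 'MA:20', 'MA:10', 'PA:5' ]
```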

More complex queries?

● MongoDB provides many more query operators (e.g. $in, $or, $regex, $elemMatch) plus indexes, so complex queries can still run efficiently

Who uses MongoDB?

● Craigslist
  ○ Archiving (still RDBMS for active listings)
  ○ 2+ billion listings!
● SourceForge
  ○ All project and download pages
● Lots of gaming back ends
  ○ Disney, EA
  ○ Storing scores, stats, achievements, etc.

What's the catch?

● MongoDB is designed for non-relational data
● Faking relational loses efficiency
  ○ "Joining" on the client is slow (extra queries and round trips)
● Embedded documents to preserve speed
  ○ De-normalizes data
    ■ Consider books written by authors
    ■ Each book document has its own embedded copy of the author
    ■ Author changes contact info
    ■ Must update ALL books written by author!
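The cost of the embedded model can be sketched in plain JavaScript. The `embeddedBooks` data and `updateEmbeddedAuthor` are hypothetical. One change to an author touches every book that carries a copy. MongoDB has no multi-document commit with rollback, so each of those writes is atomic on its own but the group is not, and a failure partway through could leave some copies updated and others stale.

```javascript
// Hypothetical embedded model: each book carries its own copy of the author.
const embeddedBooks = [
  { _id: 1, title: "Book A", author: { name: "Author A", email: "a@old.example" } },
  { _id: 2, title: "Book B", author: { name: "Author B", email: "b@old.example" } },
  { _id: 3, title: "Book C", author: { name: "Author A", email: "a@old.example" } },
];

// Changing one author's contact info means rewriting every book that embeds them.
// Returns the number of book documents that had to be modified.
function updateEmbeddedAuthor(books, authorName, newEmail) {
  let modified = 0;
  for (const book of books) {
    if (book.author.name === authorName && book.author.email !== newEmail) {
      book.author.email = newEmail; // in a real database, one separate document write each
      modified += 1;
    }
  }
  return modified;
}

console.log(updateEmbeddedAuthor(embeddedBooks, "Author A", "a@new.example")); // 2
```

With manual linking the same change would be a single write to one author document, paid for instead with the client-side join.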

Conclusion

● MongoDB is awesome for non-relational data
  ○ Self-contained documents
● MongoDB is awesome for loosely structured data
  ○ Each document in a collection can have a different format
● MongoDB is awesome for (mostly) static data
  ○ Throw all the data at it
  ○ Normalization not as much of a concern
  ○ Super fast queries with indices, etc.
● MongoDB is NOT a replacement for RDBMSs

Configuring a MongoDB Cluster

● MongoDB intended as a distributed system
  ○ Different components run on different machines
● Three components
  ○ mongod
    ■ --configsvr
    ■ --replSet
    ■ --shardsvr
  ○ mongos
  ○ mongo

mongod

● "MongoDB Daemon"
● Primary daemon process
● Runs on every machine acting as data store
● Comparable to postgresql-server
● Defaults to port 27017
● Configuration server
  ○ Started with --configsvr
  ○ Special instance that stores all metadata for cluster
  ○ Defaults to port 27019

Replication

● Exact same data stored on multiple instances

● Primary vs. Secondary
  ○ Only the primary accepts writes, which propagate to secondaries
  ○ Fully consistent (by default)
    ■ All reads and writes go through the single primary
  ○ Asynchronous replication to secondaries
● Failover
  ○ If the primary fails, the secondaries elect a new primary
  ○ An election needs votes from a majority of the set's members, so use at least 3 members (e.g. primary + 2 secondaries) to survive losing one
● --replSet [name]
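The voting rule behind failover comes down to simple arithmetic. A new primary needs votes from a strict majority of the set's voting members, and that majority is counted against the full configured size, not just the survivors. Below is a minimal plain-JavaScript sketch; the function names are ours.

```javascript
// Strict majority of the configured voting members.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// Can the members that are still up (and can reach each other) elect a primary?
function canElectPrimary(votingMembers, reachableMembers) {
  return reachableMembers >= majority(votingMembers);
}

// 3 members (primary + 2 secondaries), primary dies: 2 of 3 remain.
console.log(canElectPrimary(3, 2)); // true
// 2 members (primary + 1 secondary), primary dies: 1 of 2 is not a majority.
console.log(canElectPrimary(2, 1)); // false
```

The same arithmetic shows why adding a fourth member buys nothing on its own. A majority of 4 is 3, so a 4-member set still tolerates only one failure.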

Sharding

● Partitions collections
  ○ Based on shard key
● Stores different portions on different machines
  ○ Ex. Storing transaction records
    ■ 1/1/10 - 12/31/10 -> server1
    ■ 1/1/11 - 12/31/11 -> server2
    ■ ...
● Easy scaling - add more racks!
● --shardsvr
  ○ Switches to port 27018
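The date-range example above can be sketched as a routing table. Each range of shard-key values (a "chunk") belongs to one shard, and a lookup finds the range that contains a given key. This is plain JavaScript with hypothetical names (`chunks`, `shardFor`). In a real cluster the config servers hold this metadata and chunks are split and migrated automatically. Ranges here are half-open, [min, max), so a key equal to a boundary belongs to the later chunk.

```javascript
// Hypothetical chunk table for transaction records sharded on a date key.
// Each chunk covers [min, max): min inclusive, max exclusive.
const chunks = [
  { min: new Date("2010-01-01T00:00:00Z"), max: new Date("2011-01-01T00:00:00Z"), shard: "server1" },
  { min: new Date("2011-01-01T00:00:00Z"), max: new Date("2012-01-01T00:00:00Z"), shard: "server2" },
];

// Return the shard that owns the chunk containing `key`.
function shardFor(key, table) {
  const chunk = table.find((c) => key >= c.min && key < c.max);
  if (!chunk) {
    throw new Error("no chunk covers " + key.toISOString());
  }
  return chunk.shard;
}

console.log(shardFor(new Date("2010-06-15T00:00:00Z"), chunks)); // server1
console.log(shardFor(new Date("2011-12-31T12:00:00Z"), chunks)); // server2
```

A router only needs this table to send a query on the shard key to a single shard. A query that does not include the shard key has to be sent to every shard.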

mongos

● "MongoDB Shard"
● Not a data store
● Routing service for shards
  ○ Knows what data is on what shard
  ○ Directs requests to the appropriate shard
● To the user/application, looks the same as a single mongod instance
  ○ Same interface as mongod
  ○ Same default port (27017)
  ○ Connect in the same way

mongo

● Interactive shell interface
● Comparable to psql
● JavaScript
  ○ Can use loops, conditionals, etc. in queries

Our Architecture


● 4-machine cluster
  ○ server1
    ■ mongod --configsvr (27019)
    ■ mongod --shardsvr (27018)
    ■ mongos (27017)
  ○ server2, server3, server4
    ■ mongod --shardsvr --replSet rs0 (27018)

Starting Everything Up...

server1:
    sudo -u mongodb mongod --configsvr
    sudo -u mongodb mongod --shardsvr
server2, server3, server4:
    sudo -u mongodb mongod --shardsvr --replSet rs0
server1:
    mongos --configdb 134.198.169.41

Setting Up Replication & Sharding

server2 (or 3 or 4):
    mongo --port 27018
    rs.initiate()
    rs.add("134.198.169.43:27018")
    rs.add("134.198.169.44:27018")
    rs.conf()

server1:
    mongo
    sh.addShard("rs0/134.198.169.42:27018")
    sh.addShard("134.198.169.41:27018")

... And Watching It Work

sh.enableSharding("test")
sh.shardCollection("test.shardtest", { _id : 1 })

for (var i = 1; i <= 2000000; i++) {
    db.shardtest.insert( { _id : i, junk : "Some reasonably long text that will make this take up more space in the database and better illustrate sharding" } )
}
