{ name : 'Marcelo Cenerino', company : 'Amil', date : '2013-10-30T08:30:00.000Z' }
What is MongoDB?
MongoDB (from “humongous”) is an open-source, high-performance,
scalable, general-purpose database. It is
used by organizations of all sizes to power online
applications where low latency and high availability are
critical requirements of the system.
Here’s a definition:
“You can have at most two of these properties for any shared-data system.” (Dr. Eric A. Brewer, 2000, on the CAP theorem: consistency, availability, and tolerance to network partitions)
• Document based
• Schemaless
• Open source (on GitHub)
• High performance
• Horizontally scalable
• Full featured
Main characteristics
eBay, Ericsson, EA, SAP, Telefonica, Code School, Abril...
Customers
MongoDB vs. RDBMS
id   nome    idade   genero
1    João    25      Masculino
2    Maria   30      Feminino
3    Pedro   40      Masculino
...  ...     ...     ...
RDBMS: data is structured as tables
Document oriented???
{
_id : 1,
nome : 'João',
idade : 25,
genero : 'Masculino'
}
Size: up to 16 MB
Document oriented
MongoDB stores data as documents in a binary representation called BSON (Binary JSON)
RDBMS         MongoDB
Table         Collection
Row           Document
Column        Field
Index         Index
Joins         Embedded documents
Foreign key   Reference
Partition     Shard
Transaction Model
MongoDB guarantees atomic updates to data at the document level.
Relational schema design
• Large ER diagrams
• CREATE TABLE statements
• ORM to map tables to objects
• Tables just to join tables together
• Lots of revisions and ALTER TABLE statements until we
get it just right
In a MongoDB-based app, we start building right away and let the schema evolve.
User: name
Article: title, date, text, author
Comment[] (embedded in Article): author, date, text
Tag[] (embedded in Article): value
Mongo “schema” design
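As a rough illustration of the diagram, an article could embed its comments and tags and reference its author. This is only a sketch: the field names come from the diagram, while the sample values, the plural array names (`comments`, `tags`) and the `authorId` variable are assumptions made for the example.

```javascript
// Hypothetical sample data; only the field names come from the diagram.
const authorId = 1; // stands in for the _id of a document in a users collection

const article = {
  title: 'Intro to MongoDB',
  date: new Date('2013-10-30T08:30:00.000Z'),
  text: 'MongoDB stores data as documents...',
  author: authorId,   // reference, plays the role of a foreign key
  comments: [         // embedded documents instead of a joined table
    { author: 'maria', date: new Date('2013-10-30T09:00:00.000Z'), text: 'Nice talk!' }
  ],
  tags: [{ value: 'nosql' }, { value: 'mongodb' }]
};

// One read returns the article together with its comments and tags.
const tagValues = article.tags.map(tag => tag.value);
console.log(tagValues.join(', '));
```

Embedding keeps data that is read together in one document; the author stays a reference because a user exists independently of any article.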
Getting started with MongoDB
> mongod
> mongo
Basic CRUD operations
> user = {name : 'marcelo', age : 29, gender : 'Male'}
> db.users.insert(user)
>
Inserting a document
• No collection creation needed!
> db.users.findOne()
{ "_id" : ObjectId("5269d66271de67aa7c3c41b4"), "name" : "marcelo", "age" : 29, "gender" : "Male" }
Querying a document
• _id is the primary key in MongoDB
• Automatically indexed
• Automatically created as an ObjectId if not provided
• Any unique immutable value could be used
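An ObjectId is 12 bytes long, and its first 4 bytes hold the creation time in seconds since the Unix epoch. The sketch below decodes that timestamp from the hex string shown in the query result. `objectIdToDate` is a hypothetical helper written in plain JavaScript for illustration, not a shell or driver function.

```javascript
// objectIdToDate is a hypothetical helper, not part of the mongo shell or any driver.
function objectIdToDate(hex) {
  // The first 8 hex characters (4 bytes) are seconds since the Unix epoch.
  const seconds = parseInt(hex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

const created = objectIdToDate('5269d66271de67aa7c3c41b4');
console.log(created.toISOString());
```

In the mongo shell, `ObjectId("...").getTimestamp()` exposes the same information without manual decoding.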
> db.users.find({name : 'maria', age : {$gt : 25}})
{ "_id" : ObjectId("526f1af1dac0a62cdc152a96"), "name" : "maria", "age" : 26 }
Querying a document
Group        Operators
Comparison   $gt, $gte, $in, $lt, $lte, $ne, $nin
Logical      $or, $and, $not, $nor
Element      $exists, $type
Evaluation   $mod, $regex, $where
Geospatial   $geoWithin, $geoIntersects, $near, $nearSphere
Array        $all, $elemMatch, $size
Projection   $, $elemMatch, $slice
Operators
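To make the semantics of the comparison operators concrete, here is a minimal sketch of how a query document could be evaluated against a document. `matches` and `comparisons` are hypothetical names; the code only handles top-level fields and the comparison group, and it is a simplified stand-in rather than the server's actual matching logic.

```javascript
// matches() is a hypothetical, simplified stand-in for the server's query matching.
const comparisons = {
  $gt:  (value, arg) => value > arg,
  $gte: (value, arg) => value >= arg,
  $lt:  (value, arg) => value < arg,
  $lte: (value, arg) => value <= arg,
  $ne:  (value, arg) => value !== arg,
  $in:  (value, arg) => arg.includes(value),
  $nin: (value, arg) => !arg.includes(value)
};

function matches(doc, query) {
  // Every field in the query must match (an implicit AND).
  return Object.entries(query).every(([field, condition]) => {
    const value = doc[field];
    const isOperatorObject =
      condition !== null && typeof condition === 'object' && !Array.isArray(condition);
    if (!isOperatorObject) {
      return value === condition; // plain equality, e.g. {name : 'maria'}
    }
    // All operators on the field must hold, e.g. {age : {$gt : 25, $lt : 30}}
    return Object.entries(condition).every(([op, arg]) => comparisons[op](value, arg));
  });
}

const users = [
  { name: 'maria', age: 26 },
  { name: 'marcelo', age: 29 },
  { name: 'pedro', age: 20 } // hypothetical extra document
];

const found = users.filter(doc => matches(doc, { name: 'maria', age: { $gt: 25 } }));
console.log(found);
```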
> db.users.find({age : {$gt : 25}}, {_id : 0})
{ "name" : "maria", "age" : 26 }
{ "name" : "marcelo", "age" : 29 }
>
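> // multi : true updates every matching document (by default only the first match is updated)
> db.users.update({age : {$gt : 25}}, {$set : {roles : ['admin', 'dev', 'operator']}}, {multi : true})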
Updating a document
> db.users.remove({name : 'maria'})
> db.users.find().pretty()
{
"_id" : ObjectId("526f1cb3dac0a62cdc152a98"),
"age" : 29,
"name" : "marcelo",
"roles" : [
"admin",
"dev",
"operator"
]
}
Removing a document
Indexing
> db.estabelecimentos.count()
307929
>
> db.estabelecimentos.find({'localizacao.cidade' : 'PIACATU'}).explain()
{
    "cursor" : "BasicCursor",
    "isMultiKey" : false,
    "n" : 5,
    "nscannedObjects" : 307929,
    "nscanned" : 307929,
    "nscannedObjectsAllPlans" : 307929,
    "nscannedAllPlans" : 307929,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 1,
    "nChunkSkips" : 0,
    "millis" : 311,
    "indexBounds" : {
    },
    "server" : "Cenerino-PC:27017"
}
>
Querying a large collection without index
> db.estabelecimentos.getIndexes()
[
    {
        "v" : 1,
        "key" : { "_id" : 1 },
        "ns" : "mapa-servicos.estabelecimentos",
        "name" : "_id_"
    }
]
>
Showing collection’s indexes
> // creating an index
> db.estabelecimentos.ensureIndex({'localizacao.cidade' : 1})
> db.estabelecimentos.getIndexes()
[
    {
        "v" : 1,
        "key" : { "_id" : 1 },
        "ns" : "mapa-servicos.estabelecimentos",
        "name" : "_id_"
    },
    {
        "v" : 1,
        "key" : { "localizacao.cidade" : 1 },
        "ns" : "mapa-servicos.estabelecimentos",
        "name" : "localizacao.cidade_1"
    }
]
>
Creating an index
> db.estabelecimentos.find({'localizacao.cidade' : 'PIACATU'}).explain()
{
    "cursor" : "BtreeCursor localizacao.cidade_1",
    "isMultiKey" : false,
    "n" : 5,
    "nscannedObjects" : 5,
    "nscanned" : 5,
    "nscannedObjectsAllPlans" : 5,
    "nscannedAllPlans" : 5,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "millis" : 0,
    "indexBounds" : {
        "localizacao.cidade" : [ [ "PIACATU", "PIACATU" ] ]
    },
    "server" : "Cenerino-PC:27017"
}
Same query, now using index
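The difference between the two explain outputs ("nscanned" of 307929 versus 5) can be sketched in plain JavaScript. The data below is hypothetical, and a `Map` stands in for the B-tree: it shows why a lookup structure avoids examining every document, not how MongoDB implements its indexes.

```javascript
// Hypothetical sample data: 1000 documents, 5 of them in the city 'PIACATU'.
const docs = [];
for (let i = 0; i < 1000; i++) {
  docs.push({ _id: i, cidade: i % 200 === 0 ? 'PIACATU' : 'CITY_' + i });
}

// Without an index: every document is examined (compare "nscanned" above).
function collectionScan(city) {
  let scanned = 0;
  const result = docs.filter(doc => {
    scanned++;
    return doc.cidade === city;
  });
  return { n: result.length, nscanned: scanned };
}

// With an index: a lookup structure built once maps each key to its documents.
// A Map stands in for the B-tree here; it shows the idea, not the real structure.
const index = new Map();
for (const doc of docs) {
  if (!index.has(doc.cidade)) index.set(doc.cidade, []);
  index.get(doc.cidade).push(doc);
}

function indexScan(city) {
  const result = index.get(city) || [];
  return { n: result.length, nscanned: result.length };
}

console.log(collectionScan('PIACATU'));
console.log(indexScan('PIACATU'));
```

Unlike this `Map`, a B-tree keeps keys ordered, so it can also serve range queries and sorted results.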
> db.estabelecimentos.ensureIndex({'localizacao.coordenadas' : '2dsphere'})
> db.estabelecimentos.getIndexes()
[
    {
        "v" : 1,
        "key" : { "_id" : 1 },
        "ns" : "mapa-servicos.estabelecimentos",
        "name" : "_id_"
    },
    {
        "v" : 1,
        "key" : { "localizacao.cidade" : 1 },
        "ns" : "mapa-servicos.estabelecimentos",
        "name" : "localizacao.cidade_1"
    },
    {
        "v" : 1,
        "key" : { "localizacao.coordenadas" : "2dsphere" },
        "ns" : "mapa-servicos.estabelecimentos",
        "name" : "localizacao.coordenadas_2dsphere"
    }
]
Geospatial index
> lng = -46.80208830000004
> lat = -23.515985699999998
> distance = 30 / 6378.137
>
> db.estabelecimentos.find({ "localizacao.coordenadas" : { "$nearSphere" : [lng , lat] , "$maxDistance" : distance}}).limit(50)
Geospatial index
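With legacy [lng, lat] coordinate pairs, `$maxDistance` for `$nearSphere` is expressed in radians, which is why the query divides 30 km by 6378.137, the Earth's equatorial radius in kilometers. The sketch below reproduces that conversion and checks a point with the haversine formula. `kmToRadians`, `centralAngle` and the `candidate` point are hypothetical, and the haversine check only approximates what the server computes.

```javascript
const EARTH_RADIUS_KM = 6378.137; // equatorial radius, the value used in the query above

function kmToRadians(km) {
  return km / EARTH_RADIUS_KM;
}

// Great-circle (haversine) central angle in radians between two [lng, lat] pairs in degrees.
function centralAngle([lng1, lat1], [lng2, lat2]) {
  const toRad = deg => deg * Math.PI / 180;
  const dLat = toRad(lat2 - lat1);
  const dLng = toRad(lng2 - lng1);
  const a = Math.sin(dLat / 2) ** 2 +
            Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLng / 2) ** 2;
  return 2 * Math.asin(Math.sqrt(a));
}

const center = [-46.80208830000004, -23.515985699999998];
const maxDistance = kmToRadians(30);

// Hypothetical point 0.1 degrees of latitude north of the center (roughly 11 km away).
const candidate = [-46.80208830000004, -23.415985699999998];
const isNear = centralAngle(center, candidate) <= maxDistance;
console.log(maxDistance, isNear);
```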
http://mapa-servicos-publicos.herokuapp.com/
Aggregation Framework
Pipeline Operators: $project, $match, $limit, $skip, $unwind, $group, $sort, $geoNear
Aggregation Framework
> db.estabelecimentos.aggregate([
    {$match : {'localizacao.uf' : 'SP'}},
    {$group : {_id : '$localizacao.cidade', qtd : {$sum : 1}}},
    {$sort : {qtd : -1}},
    {$limit : 3}
])
{
    "result" : [
        { "_id" : "SAO PAULO", "qtd" : 6930 },
        { "_id" : "CAMPINAS", "qtd" : 881 },
        { "_id" : "GUARULHOS", "qtd" : 666 }
    ],
    "ok" : 1
}
>
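Each pipeline stage takes the output of the previous one. As a rough analogy, the same four stages can be written with plain JavaScript array operations. The sample documents are hypothetical, and this only mirrors the logic of `$match`, `$group`, `$sort` and `$limit`; it says nothing about how the server executes them.

```javascript
// Hypothetical sample documents; the real collection has 307929 of them.
const estabelecimentos = [
  { localizacao: { uf: 'SP', cidade: 'SAO PAULO' } },
  { localizacao: { uf: 'SP', cidade: 'SAO PAULO' } },
  { localizacao: { uf: 'SP', cidade: 'CAMPINAS' } },
  { localizacao: { uf: 'RJ', cidade: 'RIO DE JANEIRO' } },
  { localizacao: { uf: 'SP', cidade: 'SAO PAULO' } },
  { localizacao: { uf: 'SP', cidade: 'CAMPINAS' } },
  { localizacao: { uf: 'SP', cidade: 'GUARULHOS' } }
];

// $match: keep only documents from the state of SP.
const matched = estabelecimentos.filter(doc => doc.localizacao.uf === 'SP');

// $group with {$sum : 1}: count documents per city.
const counts = new Map();
for (const doc of matched) {
  const city = doc.localizacao.cidade;
  counts.set(city, (counts.get(city) || 0) + 1);
}

// $sort by qtd descending, then $limit to the top 3.
const result = [...counts.entries()]
  .map(([city, qtd]) => ({ _id: city, qtd }))
  .sort((a, b) => b.qtd - a.qtd)
  .slice(0, 3);

console.log(result);
```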
Mongo Driver for Java
Spring Data MongoDB
http://projects.spring.io/spring-data-mongodb/
Replica Sets
• A replica set is a group of mongod instances that host the
same data set
• Replication provides redundancy and increases data
availability
• The primary accepts all write operations from clients (only
one primary allowed)
• Replication can be used to increase read capacity
• Asynchronous replication
• Automatic failover
Sharding
Issues of scaling:
• High query rates can exhaust the CPU capacity of the server
• Larger data sets exceed the storage capacity of a single
machine
• Working set sizes larger than the system’s RAM stress the I/O
capacity of disk drives
Vertical scaling vs. sharding
Sharding
• Sharding is the process of storing data across multiple machines
• Each shard is an independent database, and collectively, the shards make up a
single logical database
Horizontally Scalable
Sharded clusters
Range Based Sharding
• Supports more efficient range queries
• Can result in an uneven distribution of data
• Monotonically increasing keys should be avoided
Hash Based Sharding
Ensures an even distribution of data at the expense of efficient range queries
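The trade-off between the two strategies can be sketched with a small simulation. Everything here is an assumption made for illustration: three shards, fixed range boundaries, and FNV-1a as a stand-in hash function (it is not the hash MongoDB uses). The point is only that monotonically increasing keys pile up on one range, while hashing spreads them out.

```javascript
const SHARDS = 3;

// Range-based: each shard owns a contiguous key range (hypothetical boundaries).
function rangeShard(key) {
  if (key < 1000) return 0;
  if (key < 2000) return 1;
  return 2; // everything from 2000 upward
}

// Hash-based: FNV-1a is a simple stand-in hash; it is NOT the hash MongoDB uses.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

function hashShard(key) {
  return fnv1a(String(key)) % SHARDS;
}

function distribute(keys, chooseShard) {
  const counts = new Array(SHARDS).fill(0);
  for (const key of keys) counts[chooseShard(key)]++;
  return counts;
}

// Monotonically increasing keys, e.g. a counter or a timestamp.
const keys = [];
for (let key = 2000; key < 3000; key++) keys.push(key);

const byRange = distribute(keys, rangeShard); // every insert lands on the last shard
const byHash = distribute(keys, hashShard);   // inserts are spread over all shards
console.log(byRange, byHash);
```

The flip side is that a range query such as "keys 2000 to 2100" touches a single shard under range-based sharding, but may have to ask every shard under hash-based sharding.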
https://education.mongodb.com/
Books
db.audience.find({'question' : true})
Thanks everyone!
Hope you’ve enjoyed it.