Talk MongoDB - Amil

{ name : 'Marcelo Cenerino', company : 'Amil', date : '2013-10-30T08:30:00.000Z' }

What is MongoDB?

MongoDB (from "humongous") is an open-source, high-performance, scalable, general-purpose database. It is used by organizations of all sizes to power online applications where low latency and high availability are critical requirements of the system.

Here's a definition of the CAP theorem (Consistency, Availability, Partition tolerance):

"You can have at most two of these properties for any shared-data system." (Dr. Eric A. Brewer, 2000)

Main characteristics

• Document-based
• Schemaless
• Open source (on GitHub)
• High performance
• Horizontally scalable
• Full-featured

Customers

eBay, Ericsson, EA, SAP, Telefonica, Code School, Abril...

MongoDB vs. RDBMS

RDBMS: data is structured as tables

id    nome (name)   idade (age)   genero (gender)
1     João          25            Masculino (male)
2     Maria         30            Feminino (female)
3     Pedro         40            Masculino (male)
...   ...           ...           ...

Document oriented???

{
    _id : 1,
    nome : 'João',
    idade : 25,
    genero : 'Masculino'
}

Maximum document size: 16 MB

Document oriented

MongoDB stores data as documents in a binary representation called BSON (Binary JSON).

RDBMS vs. MongoDB

RDBMS          MongoDB
Table          Collection
Row            Document
Column         Field
Index          Index
Join           Embedded document
Foreign key    Reference
Partition      Shard

Transaction Model

MongoDB guarantees atomic updates to data at the document level.

Relational schema design


• Large ER diagrams
• CREATE TABLE statements
• ORM to map tables to objects
• Join tables that exist only to link other tables together
• Many revisions and ALTER TABLE statements until we get it just right

In a MongoDB-based app, we start building the app and let the schema evolve.
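A plain-JavaScript sketch of what letting the schema evolve can look like on the application side: documents in the same collection may carry different fields, so the reading code tolerates a missing one instead of running a migration. The array stands in for a collection; the `nickname` field and the `displayName` helper are hypothetical.

```javascript
// Hypothetical documents from one collection: the older one predates the
// optional "nickname" field, the newer one has it. No migration was run.
const users = [
  { _id: 1, name: "João", email: "joao@example.com" },
  { _id: 2, name: "Maria", email: "maria@example.com", nickname: "Mari" },
];

// Hypothetical helper: fall back to "name" when "nickname" is missing,
// so one code path handles both document shapes.
function displayName(user) {
  return user.nickname !== undefined ? user.nickname : user.name;
}

console.log(users.map(displayName)); // [ 'João', 'Mari' ]
```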

Mongo "schema" design

User: name, email
Article: title, date, text, author, Comment[], Tag[]
    Comment: author, date, text
    Tag: value
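The embedding in the diagram can be sketched as one Article document that carries its comments and tags inline. This is a plain JavaScript object standing in for the stored document; the field names follow the diagram, while the values and the `addComment` helper are hypothetical (on the server, appending to an array is what the `$push` update operator does).

```javascript
// One Article document with its Comment[] and Tag[] embedded
// (field names follow the diagram; values are made up for illustration).
const article = {
  title: "Getting started with MongoDB",
  date: new Date("2013-10-30T08:30:00Z"),
  text: "...",
  author: "marcelo",
  comments: [
    { author: "maria", date: new Date("2013-10-30T09:00:00Z"), text: "Nice!" },
  ],
  tags: [{ value: "mongodb" }, { value: "nosql" }],
};

// Hypothetical helper mirroring a $push: append to the embedded array
// of the same document.
function addComment(doc, comment) {
  doc.comments.push(comment);
  return doc;
}

addComment(article, { author: "pedro", date: new Date(), text: "Thanks" });
console.log(article.comments.length); // 2
console.log(article.tags.map((t) => t.value)); // [ 'mongodb', 'nosql' ]
```

Because the comments live inside the article, adding one touches a single document, which is the level at which MongoDB guarantees atomic updates, and reading the article with all its comments needs no join.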

Getting started with MongoDB

Start the database server (mongod), then connect to it with the interactive shell (mongo):

> mongod

> mongo

Basic CRUD operations

Inserting a document

> user = {name : 'marcelo', age : 29, gender : 'Male'}
> db.users.insert(user)

• No collection creation needed! The users collection is created on the first insert.

Querying a document

> db.users.findOne()
{ "_id" : ObjectId("5269d66271de67aa7c3c41b4"), "name" : "marcelo", "age" : 29, "gender" : "Male" }

• _id is the primary key in MongoDB
• Automatically indexed
• Automatically created as an ObjectId if not provided
• Any unique, immutable value can be used
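Because an ObjectId begins with a 4-byte, big-endian creation timestamp (seconds since the Unix epoch), the time a document was inserted can be recovered from its `_id`. The mongo shell exposes this as `ObjectId(...).getTimestamp()`; the `objectIdTimestamp` function below is a hypothetical plain-JavaScript equivalent that works on the hex string.

```javascript
// Decode the creation time embedded in an ObjectId's hex representation.
// The first 8 hex digits (4 bytes) are seconds since the Unix epoch.
function objectIdTimestamp(hex) {
  if (!/^[0-9a-f]{24}$/i.test(hex)) {
    throw new Error("expected a 24-character hex ObjectId");
  }
  const seconds = parseInt(hex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

console.log(objectIdTimestamp("5269d66271de67aa7c3c41b4").toISOString());
// 2013-10-25T02:24:34.000Z
```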

Querying a document

> db.users.find({name : 'maria', age : {$gt : 25}})
{ "_id" : ObjectId("526f1af1dac0a62cdc152a96"), "name" : "maria", "age" : 26 }

Operators

Group        Operators
Comparison   $gt, $gte, $in, $lt, $lte, $ne, $nin
Logical      $or, $and, $not, $nor
Element      $exists, $type
Evaluation   $mod, $regex, $where
Geospatial   $geoWithin, $geoIntersects, $near, $nearSphere
Array        $all, $elemMatch, $size
Projection   $, $elemMatch, $slice
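To make the comparison operators concrete, here is a toy matcher that evaluates a filter such as `{name : 'maria', age : {$gt : 25}}` against plain objects. It is a hypothetical sketch, not how the server evaluates queries: it handles only equality and the comparison group on top-level fields, and it uses JavaScript comparison semantics rather than MongoDB's BSON type ordering.

```javascript
// Comparison operators handled by this sketch (a subset of MongoDB's).
const ops = {
  $gt: (v, arg) => v > arg,
  $gte: (v, arg) => v >= arg,
  $lt: (v, arg) => v < arg,
  $lte: (v, arg) => v <= arg,
  $ne: (v, arg) => v !== arg,
  $in: (v, arg) => arg.includes(v),
  $nin: (v, arg) => !arg.includes(v),
};

// True when cond looks like {$op: arg, ...} rather than a literal value.
function isOperatorDoc(cond) {
  return (
    cond !== null &&
    typeof cond === "object" &&
    !Array.isArray(cond) &&
    Object.keys(cond).length > 0 &&
    Object.keys(cond).every((k) => k.startsWith("$"))
  );
}

// A document matches when every top-level condition holds (implicit AND).
function matches(doc, filter) {
  return Object.entries(filter).every(([field, cond]) => {
    const value = doc[field];
    if (!isOperatorDoc(cond)) return value === cond; // plain equality
    return Object.entries(cond).every(([op, arg]) => {
      if (!Object.prototype.hasOwnProperty.call(ops, op)) {
        throw new Error(`unsupported operator: ${op}`);
      }
      return ops[op](value, arg);
    });
  });
}

const users = [
  { name: "maria", age: 26 },
  { name: "marcelo", age: 29 },
  { name: "maria", age: 20 },
];

console.log(users.filter((u) => matches(u, { name: "maria", age: { $gt: 25 } })));
// [ { name: 'maria', age: 26 } ]
```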

Updating a document

> db.users.find({age : {$gt : 25}}, {_id : 0})
{ "name" : "maria", "age" : 26 }
{ "name" : "marcelo", "age" : 29 }
> db.users.update({age : {$gt : 25}}, {$set : {roles : ['admin', 'dev', 'operator']}}, {multi : true})
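By default `update()` modifies only the first document matching the query; passing `{multi : true}` as the third argument applies the change to every match. The sketch below models that behaviour in memory. The `update` function is hypothetical: it takes a JavaScript predicate instead of a query document and supports only `$set` on top-level fields.

```javascript
// Hypothetical in-memory stand-in for update(query, {$set: ...}, {multi}).
// Returns how many documents the $set was applied to.
function update(collection, predicate, { $set }, { multi = false } = {}) {
  let applied = 0;
  for (const doc of collection) {
    if (!predicate(doc)) continue;
    for (const [field, value] of Object.entries($set)) {
      // copy arrays so documents do not share one mutable array
      doc[field] = Array.isArray(value) ? [...value] : value;
    }
    applied++;
    if (!multi) break; // default: stop after the first matching document
  }
  return applied;
}

const users = [
  { name: "maria", age: 26 },
  { name: "marcelo", age: 29 },
];
const olderThan25 = (u) => u.age > 25;
const roles = ["admin", "dev", "operator"];

console.log(update(users, olderThan25, { $set: { roles } })); // 1 (only maria)
console.log(update(users, olderThan25, { $set: { roles } }, { multi: true })); // 2
```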

Removing a document

> db.users.remove({name : 'maria'})
> db.users.find().pretty()
{
    "_id" : ObjectId("526f1cb3dac0a62cdc152a98"),
    "age" : 29,
    "name" : "marcelo",
    "roles" : [
        "admin",
        "dev",
        "operator"
    ]
}

Indexing

Querying a large collection without an index

> db.estabelecimentos.count()
307929
> db.estabelecimentos.find({'localizacao.cidade' : 'PIACATU'}).explain()
{
    "cursor" : "BasicCursor",
    "isMultiKey" : false,
    "n" : 5,
    "nscannedObjects" : 307929,
    "nscanned" : 307929,
    "nscannedObjectsAllPlans" : 307929,
    "nscannedAllPlans" : 307929,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 1,
    "nChunkSkips" : 0,
    "millis" : 311,
    "indexBounds" : { },
    "server" : "Cenerino-PC:27017"
}

Showing the collection's indexes

> db.estabelecimentos.getIndexes()
[
    { "v" : 1, "key" : { "_id" : 1 }, "ns" : "mapa-servicos.estabelecimentos", "name" : "_id_" }
]

Creating an index

> // creating an index
> db.estabelecimentos.ensureIndex({'localizacao.cidade' : 1})
> db.estabelecimentos.getIndexes()
[
    { "v" : 1, "key" : { "_id" : 1 }, "ns" : "mapa-servicos.estabelecimentos", "name" : "_id_" },
    { "v" : 1, "key" : { "localizacao.cidade" : 1 }, "ns" : "mapa-servicos.estabelecimentos", "name" : "localizacao.cidade_1" }
]

Same query, now using index

> db.estabelecimentos.find({'localizacao.cidade' : 'PIACATU'}).explain()
{
    "cursor" : "BtreeCursor localizacao.cidade_1",
    "isMultiKey" : false,
    "n" : 5,
    "nscannedObjects" : 5,
    "nscanned" : 5,
    "nscannedObjectsAllPlans" : 5,
    "nscannedAllPlans" : 5,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "millis" : 0,
    "indexBounds" : {
        "localizacao.cidade" : [ [ "PIACATU", "PIACATU" ] ]
    },
    "server" : "Cenerino-PC:27017"
}
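The gap between the two `explain()` outputs (a `BasicCursor` examining all 307929 documents versus a `BtreeCursor` examining 5) can be reproduced with a toy model. A collection scan checks every document, while an index kept in key order lets the lookup binary-search to the first matching key and read only the matches. This is a hypothetical sketch: a sorted array stands in for the B-tree, the sample data is made up, and `nscanned` simply mirrors the explain field name.

```javascript
// Made-up sample data: 10000 documents in three cities plus 5 in PIACATU.
const docs = [];
const cities = ["CAMPINAS", "GUARULHOS", "SAO PAULO"];
for (let i = 0; i < 10000; i++) docs.push({ localizacao: { cidade: cities[i % 3] } });
for (let i = 0; i < 5; i++) docs.push({ localizacao: { cidade: "PIACATU" } });

// Collection scan: every document is examined (what "BasicCursor" reports).
function collectionScan(docs, city) {
  const n = docs.filter((d) => d.localizacao.cidade === city).length;
  return { n, nscanned: docs.length };
}

// Single-field "index": [key, position] pairs sorted by key.
function buildIndex(docs) {
  return docs
    .map((d, pos) => [d.localizacao.cidade, pos])
    .sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0));
}

// Index scan: binary-search to the first key >= city, then read forward
// while keys are equal, so only matching entries are examined.
function indexScan(index, city) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid][0] < city) lo = mid + 1;
    else hi = mid;
  }
  let n = 0;
  while (lo + n < index.length && index[lo + n][0] === city) n++;
  return { n, nscanned: n };
}

const index = buildIndex(docs);
console.log(collectionScan(docs, "PIACATU")); // { n: 5, nscanned: 10005 }
console.log(indexScan(index, "PIACATU")); // { n: 5, nscanned: 5 }
```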

Geospatial index

> db.estabelecimentos.ensureIndex({'localizacao.coordenadas' : '2dsphere'})
> db.estabelecimentos.getIndexes()
[
    { "v" : 1, "key" : { "_id" : 1 }, "ns" : "mapa-servicos.estabelecimentos", "name" : "_id_" },
    { "v" : 1, "key" : { "localizacao.cidade" : 1 }, "ns" : "mapa-servicos.estabelecimentos", "name" : "localizacao.cidade_1" },
    { "v" : 1, "key" : { "localizacao.coordenadas" : "2dsphere" }, "ns" : "mapa-servicos.estabelecimentos", "name" : "localizacao.coordenadas_2dsphere" }
]

Geospatial query

> lng = -46.80208830000004
> lat = -23.515985699999998
> distance = 30 / 6378.137
> db.estabelecimentos.find({ "localizacao.coordenadas" : { "$nearSphere" : [lng , lat] , "$maxDistance" : distance}}).limit(50)
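With legacy `[lng, lat]` coordinate pairs, `$nearSphere` expects `$maxDistance` in radians, which is why the session divides 30 km by 6378.137 km, the Earth's equatorial radius. The sketch below applies the same filter client-side using the haversine formula; `angularDistance`, `nearSphere` and the sample points are hypothetical, and unlike the server it scans every point instead of using the 2dsphere index.

```javascript
const EARTH_RADIUS_KM = 6378.137; // same constant as in the shell session

const toRad = (deg) => (deg * Math.PI) / 180;

// Central angle (in radians) between two [lng, lat] points, via haversine.
function angularDistance([lng1, lat1], [lng2, lat2]) {
  const dLat = toRad(lat2 - lat1);
  const dLng = toRad(lng2 - lng1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLng / 2) ** 2;
  return 2 * Math.asin(Math.min(1, Math.sqrt(a)));
}

// Keep points within maxDistance radians of center, nearest first, up to limit.
function nearSphere(points, center, maxDistance, limit) {
  return points
    .map((p) => ({ p, d: angularDistance(center, p) }))
    .filter(({ d }) => d <= maxDistance)
    .sort((x, y) => x.d - y.d)
    .slice(0, limit)
    .map(({ p }) => p);
}

const center = [-46.80208830000004, -23.515985699999998];
const maxDistance = 30 / EARTH_RADIUS_KM; // 30 km expressed in radians
const points = [
  [-46.80208830000004, -23.415985699999998], // 0.1 degree north, about 11 km
  [-46.0, -23.0], // roughly 100 km away
];
console.log(nearSphere(points, center, maxDistance, 50).length); // 1
```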

http://mapa-servicos-publicos.herokuapp.com/


Aggregation Framework


Pipeline Operators: $project, $match, $limit, $skip, $unwind, $group, $sort, $geoNear

> db.estabelecimentos.aggregate([
    {$match : {'localizacao.uf' : 'SP'}},
    {$group : {_id : '$localizacao.cidade', qtd : {$sum : 1}}},
    {$sort : {qtd : -1}},
    {$limit : 3}
])
{
    "result" : [
        { "_id" : "SAO PAULO", "qtd" : 6930 },
        { "_id" : "CAMPINAS", "qtd" : 881 },
        { "_id" : "GUARULHOS", "qtd" : 666 }
    ],
    "ok" : 1
}
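Each pipeline stage receives the documents produced by the previous one. The same four stages can be sketched as functions over arrays; the sample documents are invented, `groupCount` implements only the `{$sum : 1}` counting accumulator from the shell example, and `sortBy` handles numeric fields only. All function names here are hypothetical.

```javascript
// Resolve a dotted path such as "localizacao.cidade" inside a document.
function getPath(doc, path) {
  return path.split(".").reduce((v, k) => (v == null ? undefined : v[k]), doc);
}

// $match: keep documents whose field equals the given value.
const match = (field, value) => (docs) =>
  docs.filter((d) => getPath(d, field) === value);

// Equivalent of {$group : {_id : '$field', qtd : {$sum : 1}}}.
const groupCount = (field) => (docs) => {
  const counts = new Map();
  for (const d of docs) {
    const key = getPath(d, field);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [...counts].map(([_id, qtd]) => ({ _id, qtd }));
};

// {$sort : {field : dir}} for numeric fields, and {$limit : n}.
const sortBy = (field, dir) => (docs) =>
  [...docs].sort((a, b) => dir * (a[field] - b[field]));
const limit = (n) => (docs) => docs.slice(0, n);

// Run the stages in order, feeding each one the previous stage's output.
const aggregate = (docs, pipeline) =>
  pipeline.reduce((acc, stage) => stage(acc), docs);

const estabelecimentos = [
  { localizacao: { uf: "SP", cidade: "CAMPINAS" } },
  { localizacao: { uf: "SP", cidade: "SAO PAULO" } },
  { localizacao: { uf: "SP", cidade: "SAO PAULO" } },
  { localizacao: { uf: "MG", cidade: "BELO HORIZONTE" } },
];

const result = aggregate(estabelecimentos, [
  match("localizacao.uf", "SP"),
  groupCount("localizacao.cidade"),
  sortBy("qtd", -1),
  limit(3),
]);
console.log(result); // SAO PAULO (qtd 2) first, then CAMPINAS (qtd 1)
```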

Mongo Driver for Java

Spring Data MongoDB: http://projects.spring.io/spring-data-mongodb/

Replica Sets


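• A replica set is a group of mongod instances that host the same data set
• Replication provides redundancy and increases data availability
• The primary accepts all write operations from clients (only one primary is allowed)
• Replication can be used to increase read capacity by reading from secondaries
• Asynchronous replication: secondaries apply the primary's operations after the fact, so reads from them may be stale
• Automatic failover: if the primary becomes unavailable, the remaining members elect a new one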
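Automatic failover works through an election, and a member becomes primary only with votes from a strict majority of the set's voting members. The arithmetic below (function names hypothetical) shows why odd-sized replica sets are the usual recommendation.

```javascript
// Votes needed to elect a primary: a strict majority of voting members.
const majority = (members) => Math.floor(members / 2) + 1;

// How many members can be lost while a primary can still be elected.
const faultTolerance = (members) => members - majority(members);

for (const n of [2, 3, 4, 5]) {
  console.log(`${n} members: majority ${majority(n)}, fault tolerance ${faultTolerance(n)}`);
}
// 2 members: majority 2, fault tolerance 0
// 3 members: majority 2, fault tolerance 1
// 4 members: majority 3, fault tolerance 1
// 5 members: majority 3, fault tolerance 2
```

A fourth member adds no fault tolerance over three; an arbiter can supply the odd vote without holding data.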


Sharding


Issues of scaling:

• High query rates can exhaust the CPU capacity of the server
• Larger data sets exceed the storage capacity of a single machine
• Working set sizes larger than the system's RAM stress the I/O capacity of disk drives

Vertical scaling vs. sharding

Sharding

• Sharding is the process of storing data across multiple machines
• Each shard is an independent database, and collectively, the shards make up a single logical database

Horizontally Scalable

Sharded clusters

Range-based sharding

• Supports more efficient range queries
• May result in an uneven distribution of data
• Monotonically increasing keys should be avoided: every new insert lands in the last chunk, so a single shard takes all the writes

Hash-based sharding

• Ensures an even distribution of data at the expense of efficient range queries
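The trade-off between the two strategies shows up with a monotonically increasing shard key such as an insertion counter. In this hypothetical sketch, range partitioning assigns keys by fixed split points, while hash partitioning maps a hash of the key onto shards. FNV-1a is used only as a stand-in for MongoDB's own hash function, and the shard count and split points are assumptions.

```javascript
const SHARDS = 4;

// Range partitioning with assumed split points; keys at or above the
// last split point all land on the last shard.
const splitPoints = [250, 500, 750];
function rangeShard(key) {
  let shard = 0;
  while (shard < splitPoints.length && key >= splitPoints[shard]) shard++;
  return shard;
}

// 32-bit FNV-1a over the key's decimal string (stand-in hash function).
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}
const hashShard = (key) => fnv1a(String(key)) % SHARDS;

// Count how many of the given keys each shard receives.
function distribution(keys, shardOf) {
  const counts = new Array(SHARDS).fill(0);
  for (const k of keys) counts[shardOf(k)]++;
  return counts;
}

// New inserts with a monotonically increasing key (e.g. a counter).
const newKeys = Array.from({ length: 1000 }, (_, i) => 1000 + i);
console.log(distribution(newKeys, rangeShard)); // [ 0, 0, 0, 1000 ]
console.log(distribution(newKeys, hashShard)); // spread across all four shards
```

The range layout keeps neighbouring keys together, which is what makes range queries cheap, but it also funnels every new key to one shard; hashing breaks that locality in exchange for spreading the writes.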

https://education.mongodb.com/


db.audience.find({'question' : true})

Thanks everyone!

Hope you’ve enjoyed it.