
What is MongoDB?

MongoDB is an open-source, document-oriented database. It falls into the NoSQL category, which means it does not enforce a fixed schema the way relational databases do.

MongoDB is not a drop-in replacement for relational databases; it is better viewed as an alternative to them.

MongoDB can be installed on Windows, Linux and macOS, so it is a cross-platform database. It doesn’t support SQL-style joins, but it can represent rich, hierarchical data structures. The feature I like most is that it scales easily and can deliver high performance.

A MongoDB server holds a set of databases, and each database contains multiple collections. MongoDB is schema-less, which means that the documents in one collection can have different structures. Every object stored in a collection is called a document and is represented as a JSON (JavaScript Object Notation) structure: a list of key-value pairs. A value can be of three kinds: a primitive value, an array (of primitive values or of documents), or another list of key-value pairs.

Let’s see how RDBMS and MongoDB differ:


Document-Oriented Data-Model

Before we move on to creating our own database, let’s see what exactly the document-oriented data model is.

A typical document in MongoDB looks something like this:

{
    _id: ObjectId('4bd9e8e17cefd644108961bb'),
    name: 'Vivek',
    class: '12th',
    subjects: ['physics', 'chemistry', 'math', 'english', 'computer'],
    address: {
        house_no: '12B',
        block: 'B',
        sector: 12,
        city: 'noida'
    },
    grade: [
        { exam: 'unit test 1', score: '60%' },
        { exam: 'unit test 2', score: '70%' }
    ]
}

The document above holds a student's information as key-value pairs. It contains a unique _id for the record, a name and a class; subjects holds an array of values, address holds an embedded document, and grade holds an array of documents.

To represent the same record in the relational world we would need at least three tables: one for basic information such as _id, name, class and address, another for subjects, and another for grades. Here the whole record is stored in one document, which is how we make up for the lack of joins and constraints in MongoDB. MongoDB has no joins, so it is up to us, the developers, to design a schema that manages relations.

Diving into the MongoDB-shell

Open a Windows command prompt and type ‘mongod’ to start the MongoDB server. Once the server has started, you will see the message ‘waiting for connections’ at the bottom of the screen with a blinking cursor. This means the server is running and waiting for connections from MongoDB clients.

Now open another command prompt and type ‘mongo’ to connect to the server. Remember not to close the first command prompt.

[Screenshot: Mongo server]

[Screenshot: Mongo client]

In the Mongo client you will see some important information:

MongoDB shell version: 3.0.6 (it may differ depending on your version of MongoDB)

Connecting to: test (the test database, which the shell connects you to by default)


We will write all our queries and commands in this shell. The MongoDB shell is an interactive JavaScript interpreter, so if you know JavaScript, writing MongoDB commands and queries will be a cakewalk.

In the MongoDB shell, type ‘help’ and press Enter to see a list of helper functions that MongoDB provides.

That is enough talk; it is time for some action. Let’s look at some commands.

show dbs : will show the databases in your system.

show collections : will show the collections in a db.

db.help() : will show the help on db methods.

db.mycoll.help() : will show the help on collections methods.

Creating our own database:

Type use [database name] and press Enter. If the database exists, MongoDB switches to it; otherwise it switches to a new database of that name, which is actually created when you first store data in it.

For example: type ‘use students’ to work with a database named students.

CRUD Operations

We have already created our database; now it’s time to create a collection. Creating a collection is as easy as pie: just use the insert command. Let me show you.

db.mycol.insert({name:'vikas'})

Here mycol is the name of the collection, and we have inserted a document into it. We only have to name the collection and insert a record: MongoDB creates the collection automatically if it does not exist yet, and otherwise inserts the record into the existing collection.

For CRUD (Create, Read, Update, Delete) operations, MongoDB provides the following commands:


Insert()

The insert command is used to insert documents into a collection. Since a document in MongoDB is stored as a JSON object, the insert command accepts a JSON object as its parameter.

doc = {
    name: 'xyx',
    class: '12th',
    subjects: ['physics', 'chemistry', 'maths', 'english', 'computer'],
    address: {
        house_no: '123',
        sector: '50',
        city: 'noida'
    }
}

db.mycol.insert(doc);

Here we created an object named doc and passed it as the parameter to the insert command. You can also pass the document directly to insert without creating a variable, as in the example below:

db.mycol.insert({
    name: 'Vivek',
    class: '12th',
    subjects: ['physics', 'chemistry', 'math', 'english', 'computer'],
    address: {
        house_no: '12B',
        block: 'B',
        sector: 12,
        city: 'noida'
    },
    grade: [
        { exam: 'unit test 1', score: '60%' },
        { exam: 'unit test 2', score: '70%' }
    ]
});

If you type the command db.mycol.find() you will see all the records inserted so far.

Note: if you type db.mycol.find().pretty(), the output is displayed indented and formatted.


One thing to notice here is that we didn’t provide an _id for any of the documents above, but MongoDB itself inserted a unique _id, something like:

"_id" : ObjectId("560f767315f507b3c0bae3f5")

When we insert a document, the server requires every document to have a unique identifying field, and it uses _id for that. The _id field is the document's primary key, so its value must be unique within the collection. The _id field is also immutable: once set, it cannot be changed. ObjectId is a type for generating unique keys. It combines the current time, an identifier for the machine constructing the ObjectId, the id of the process constructing it, and a counter that is global to that process; by putting this information together MongoDB creates a unique id for us. Of course we can supply our own _id value, but it must be different from the _id of every other document in the collection.
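The time component described above can be read back out of an ObjectId. Here is a minimal sketch in plain JavaScript, run outside the mongo shell. The helper name objectIdTimestamp is hypothetical; the sketch relies only on the first four bytes (eight hex characters) of an ObjectId holding the creation time in seconds since the Unix epoch.

```javascript
// Hypothetical helper: extract the creation time embedded in an ObjectId.
// Assumes the id is supplied as its 24-character hex string.
function objectIdTimestamp(hexId) {
  if (!/^[0-9a-fA-F]{24}$/.test(hexId)) {
    throw new Error('expected a 24-character hex string');
  }
  // The first 8 hex characters (4 bytes) are seconds since the Unix epoch.
  const seconds = parseInt(hexId.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

// The _id shown above as an example
const created = objectIdTimestamp('560f767315f507b3c0bae3f5');
console.log(created.toISOString());
```

Inside the mongo shell the same information is available through the getTimestamp() method of an ObjectId value.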

So we have seen how to insert documents in MongoDB, which completes the C part of CRUD. Let’s move to the next part.

Find()

To see the various forms of the find() command we need some more records, so let’s insert some into a collection. I used the command below to insert the marks of 100 students in one go.

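var subjects = ['chemistry', 'physics', 'maths', 'english', 'computer'];

// One document per student per subject: 100 students x 5 subjects
for (var i = 0; i < 100; i++) {
    for (var j = 0; j < 5; j++) {
        db.marks.insert({
            name: 'student' + i,
            subject: subjects[j],
            marks: Math.round(Math.random() * 100)
        });
    }
}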

Now we have a collection named marks with 500 documents in it (100 students with 5 subjects each).

We have used the find() command already, but I didn’t explain how it works, so let’s do that now. The find command takes a query document as its parameter and returns the documents that match it, much like the WHERE clause of a SELECT query in SQL. Just as SELECT * FROM Table returns all rows of a table in SQL, db.CollectionName.find() returns all the documents in the collection.


Passing parameter in find() method:

Let’s say we want to find all the records of student0. We write:

db.marks.find({name:'student0'})


Find the record of student0 in the subject computer:

db.marks.find({name:'student0',subject:'computer'}).pretty()

Here we passed two conditions to match: name must equal student0 and subject must equal computer.

Find the records of all students whose marks in computer are greater than 50:

db.marks.find({subject:'computer',marks:{$gt:50}}).pretty()

The find command takes an object as its parameter, and a condition on a field is itself written as a document: here {$gt:50} states that marks must be greater than 50.

Greater than: $gt

Greater than or equal to: $gte

Less than: $lt

Less than or equal to: $lte
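To make the meaning of these four operators concrete, here is a small in-memory sketch in plain JavaScript. It is only an illustration and not how MongoDB evaluates queries; the helper satisfies and the sample marks are made up for this example.

```javascript
// In-memory illustration only: each operator name maps to a plain comparison.
const comparators = {
  $gt: (value, bound) => value > bound,
  $gte: (value, bound) => value >= bound,
  $lt: (value, bound) => value < bound,
  $lte: (value, bound) => value <= bound,
};

// Hypothetical helper: true when `value` satisfies every operator
// in `condition`, for example { $gt: 50, $lte: 90 }.
function satisfies(value, condition) {
  return Object.entries(condition).every(
    ([op, bound]) => comparators[op](value, bound)
  );
}

// Made-up sample documents shaped like those in the marks collection
const sampleMarks = [
  { name: 'student0', subject: 'computer', marks: 50 },
  { name: 'student1', subject: 'computer', marks: 75 },
  { name: 'student2', subject: 'computer', marks: 90 },
  { name: 'student3', subject: 'computer', marks: 95 },
];

// Same idea as a query condition of marks: { $gt: 50, $lte: 90 }
const matched = sampleMarks.filter(
  (doc) => satisfies(doc.marks, { $gt: 50, $lte: 90 })
);
console.log(matched.map((doc) => doc.name)); // student1 and student2
```

Several operators in one condition document are combined with a logical AND, which is why 50 and 95 are both excluded.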

Find the records of all students whose marks in computer are greater than 50 and less than or equal to 90:

db.marks.find({subject:'computer',marks:{$gt:50,$lte:90}}).pretty()

Find the records of all students whose marks in computer or physics are greater than 90:

db.marks.find({$or:[{subject:'computer'},{subject:'physics'}],marks:{$gt:90}})

$or takes an array of criteria, at least one of which must match.

In our earlier collection mycol the documents had different structures, so let’s say we want to find the records in which the class field exists:

db.mycol.find({class:{$exists:true}})

Similarly, we can write $exists:false to find the records in which the class field doesn’t exist.

Let’s say we have another collection named additionalsubject:

db.additionalsubject.insert({name:'student1',subject:['arts','music']})

db.additionalsubject.insert({name:'student2',subject:['sports','arts']})

db.additionalsubject.insert({name:'student3',subject:['sports','cooking','music']})

db.additionalsubject.insert({name:'student4',subject:['arts','craft','music']})

Suppose we want to find the records of the students who are enrolled in arts. The query is:

db.additionalsubject.find({subject:'arts'})

If we want the students who are enrolled in both arts and music, the query becomes:

db.additionalsubject.find({subject:{$all:['arts','music']}})

The important thing to notice here is that $all requires every value passed in the array to be present, regardless of the order in which the values appear in the document.


Similarly, we have $in to match any one of the values passed to it. Let’s say we want to find the records of students who are enrolled in either sports or arts. The query will be:

db.additionalsubject.find({subject:{$in:['sports','arts']}})
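The difference between $all and $in can be pictured with plain JavaScript array methods. This is an in-memory illustration, not MongoDB's implementation; the helper names matchesAll and matchesIn are made up, and the data mirrors the additionalsubject collection.

```javascript
// In-memory copy of the four documents inserted above
const students = [
  { name: 'student1', subject: ['arts', 'music'] },
  { name: 'student2', subject: ['sports', 'arts'] },
  { name: 'student3', subject: ['sports', 'cooking', 'music'] },
  { name: 'student4', subject: ['arts', 'craft', 'music'] },
];

// Like $all: every wanted value must be present, in any order.
const matchesAll = (values, wanted) => wanted.every((w) => values.includes(w));

// Like $in: at least one wanted value must be present.
const matchesIn = (values, wanted) => wanted.some((w) => values.includes(w));

const artsAndMusic = students.filter((s) => matchesAll(s.subject, ['arts', 'music']));
const sportsOrArts = students.filter((s) => matchesIn(s.subject, ['sports', 'arts']));

console.log(artsAndMusic.map((s) => s.name)); // student1 and student4
console.log(sportsOrArts.map((s) => s.name)); // all four students
```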

We use dot notation to query collections that have nested documents.

For example, in the first document shown earlier, grade contains an array of documents. To find the documents that have a grade entry for the exam unit test 1, the query is written as:

db.mycol.find({'grade.exam':'unit test 1'})
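As a rough picture of what this query matches, here is an in-memory sketch in plain JavaScript. The helper hasExam is hypothetical and only imitates the matching rule for an array of embedded documents: a document matches when at least one element of grade has the requested exam value, and the whole document is returned, not just the matching element.

```javascript
// Two made-up documents shaped like those in mycol
const docs = [
  {
    name: 'Vivek',
    grade: [
      { exam: 'unit test 1', score: '60%' },
      { exam: 'unit test 2', score: '70%' },
    ],
  },
  { name: 'xyx' }, // no grade field at all
];

// Hypothetical helper imitating a match on 'grade.exam'
const hasExam = (doc, exam) =>
  Array.isArray(doc.grade) && doc.grade.some((g) => g.exam === exam);

const found = docs.filter((d) => hasExam(d, 'unit test 1'));
console.log(found.map((d) => d.name)); // only Vivek
```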

Update()

The update command takes two parameters: the first is the matching criteria and the second is the updated value. Syntax: db.[collectionName].update({matching criteria},{updated value});

The update command in MongoDB can perform several tasks.

1. Wholesale replacement: the whole document is replaced, except for its _id. For example, suppose we want to change the subject of student1 in our additionalsubject collection and we write:

db.additionalsubject.update({name:'student1'},{subject:['craft']})

This will not just change the subject field; it will replace the whole document.


After updating document:

Now there is no name field, only the _id and subject fields. This is not the right way to change a single field, because updating in this manner replaces the whole document with the new value. The form is useful, however, when you do want to replace the old document with new fields and new values.


2. Modifying only the desired field: to change the value of particular fields we have the $set operator. For example, let’s say we want to change the name of student2 to ‘xyz’. We only want to modify the name field and leave the other fields untouched, so we write:

db.additionalsubject.update({name:'student2'},{$set:{name:'xyz'}})



3. Removing an unwanted field: we can remove a field from a document when we no longer need it, using the $unset operator. For example, let’s say we want to remove the subject field for the student named ‘xyz’:

db.additionalsubject.update({name:'xyz'},{$unset:{subject:1}})


4. Upsert: the update command searches for a record that matches the specified criteria; if it finds one it updates it, otherwise nothing is updated. The upsert option changes this: when no document matches, a new record is created and the update is applied to it. So if we try to update the record of student5, nothing happens because there is no such record, but with upsert a new record for student5 is created and updated.

db.additionalsubject.update({name:'student5'},{$set:{subject:['music']}},{upsert:true})

5. Updating arrays: MongoDB makes it easy to update arrays in a document and provides some special operators for the purpose. Let’s see some examples:

a. Change the subject of student3 from sports to arts:

db.additionalsubject.update({name:'student3'},{$set:{'subject.0':'arts'}});

subject is an array, so to change the element at index 0 we write subject.0


b. Add one more subject to the student3 record. For this we have $push:

db.additionalsubject.update({name:'student3'},{$push:{'subject':'sports'}})

It adds the subject ‘sports’ at the end of the subject array.

c. Similarly, we have $pop to remove a value from either end of the array.

$pop : 1 (removes the last, rightmost value)

$pop : -1 (removes the first, leftmost value)

db.additionalsubject.update({name:'student3'},{$pop:{'subject':1}});

d. We have $pushAll to add several values to an array. Similarly, we have $pull to remove one specified value from an array, and $pullAll to remove several values. Note that $pushAll is deprecated and was removed in later MongoDB releases, where $push with the $each modifier does the same job. Let’s see an example:

db.additionalsubject.update({name:'student3'},{$pushAll:{'subject':['sports','craft']}})

db.additionalsubject.update({name:'student3'},{$pullAll:{'subject':['sports','craft']}})


6. Updating many documents: by default MongoDB updates only one document that fulfills the matching criteria. To update all matching documents at once, we pass one additional option, multi:true. For example, to add a new field to every document of the additionalsubject collection:

db.additionalsubject.update({},{$set:{'class':'12th'}},{multi:true})

It will add a new field class to every document in the collection.
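The effect of these update forms on a single document can be pictured with plain JavaScript objects. The sketch below is an in-memory illustration only; none of it is a MongoDB API, and the sample document is made up to resemble a record from additionalsubject.

```javascript
// Made-up sample document
const original = { _id: 1, name: 'student3', subject: ['sports', 'cooking', 'music'] };

// Whole-document replacement: only _id survives, name is lost.
const replaced = { _id: original._id, subject: ['craft'] };

// Like $set: copy the document and overwrite only the named field.
const afterSet = { ...original, name: 'xyz' };

// Like $unset: copy the document without the named field.
const { subject, ...afterUnset } = original;

// Like $push: append a value at the end of the array.
const afterPush = { ...original, subject: [...original.subject, 'sports'] };

// Like $pop: 1 (drop the last value) and $pop: -1 (drop the first value).
const afterPopLast = { ...original, subject: original.subject.slice(0, -1) };
const afterPopFirst = { ...original, subject: original.subject.slice(1) };

console.log(replaced, afterSet, afterUnset);
console.log(afterPush.subject, afterPopLast.subject, afterPopFirst.subject);
```

The replacement case is the one to watch: it silently drops every field that is not in the new document, which is why $set is usually the safer choice for changing one field.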

So we are done with update commands.

Remove()

The remove command is used to remove records from a collection. Like the find command, it takes one parameter: the matching criteria for the documents.

To remove the record of the student named student3 we write:

db.additionalsubject.remove({name:'student3'})

To remove all the documents of a collection, we pass an empty document as the parameter to the remove command.

db.additionalsubject.remove({})

It will remove all the documents of collection.


Introduction

Aggregation operations are important in any type of database, whether SQL or NoSQL. To perform aggregation, MongoDB groups values from multiple documents together and then performs a variety of operations on the grouped data to return a single result. SQL, similarly, uses aggregate functions to return a single value calculated from the values in a column.

MongoDB has three ways to perform aggregation: the aggregation pipeline, the map-reduce function, and the single-purpose aggregation methods.

In this article we will focus on the aggregation pipeline. I'll try to cover each major section of it using simple examples, written as mongo shell commands.

Aggregation Pipeline

MongoDB's aggregation framework is based on the concept of data processing pipelines, similar to pipelines in the UNIX world. The collection is at the start; its documents are piped one by one through a series of processing stages, and at the end we get a result set.

The collection is processed through different stages, such as $project, $match, $group and $sort, and a stage can appear more than once in a pipeline.

Various stages in pipeline are:

$project – select, reshape data

$match – filter data

$group – aggregate data

$sort – sorts data

$skip – skips data

$limit – limit data

$unwind – expands an array into one document per element


Let’s try to visualize the aggregation with an example. Don’t worry about the syntax. I will be

explaining it soon.

db.mycollection.aggregate([

{$match:{'phone_type':'smart'}},

{$group:{'_id':'$brand_name',total:{$sum:'$price'}}}

])

Here we start with a collection; the $match stage filters the documents, then in the next stage of the pipeline the remaining documents are grouped, and we get the final result set.
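The same flow can be sketched in plain JavaScript, with each stage as a function that takes an array of documents and returns a new array. This is only an in-memory illustration of the pipeline idea, not MongoDB's implementation; the function names and the sample phones are made up.

```javascript
// Made-up sample documents for the phone example
const phones = [
  { brand_name: 'A', phone_type: 'smart', price: 300 },
  { brand_name: 'A', phone_type: 'smart', price: 200 },
  { brand_name: 'B', phone_type: 'smart', price: 400 },
  { brand_name: 'B', phone_type: 'basic', price: 50 },
];

// Stage 1, like $match: keep only the documents that satisfy the filter.
const matchSmart = (docs) => docs.filter((d) => d.phone_type === 'smart');

// Stage 2, like $group with $sum: one output document per brand.
const groupByBrand = (docs) => {
  const totals = new Map();
  for (const d of docs) {
    totals.set(d.brand_name, (totals.get(d.brand_name) || 0) + d.price);
  }
  return [...totals].map(([brand, total]) => ({ _id: brand, total }));
};

// A pipeline applies the stages in order, each consuming the previous output.
const pipelineResult = [matchSmart, groupByBrand].reduce(
  (docs, stage) => stage(docs),
  phones
);
console.log(pipelineResult);
```

Because the basic phone is filtered out before grouping, its price never reaches the sum for brand B.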


Preparing Dummy Data

To run mongo shell commands, we need a database and some dummy records, so let’s create our

database and a collection.

dept = ['IT', 'Sales', 'HR', 'Admin'];

for (i = 0; i < 10; i++) {
    db.mycollection.insert({ // mycollection is the collection name
        '_id': i,
        'emp_code': 'emp_' + i,
        'dept_name': dept[Math.round(Math.random() * 3)], // random department
        'experience': Math.round(Math.random() * 10) // 0 to 10 years
    });
}

The above command inserts ten dummy documents into a collection named mycollection in the current database (mydb here).

Syntax

db.mycollection.aggregate([

{$match:{'phone_type':'smart'}},

{$group:{'_id':'$brand_name',total:{$sum:'$price'}}}

])

The syntax is fairly simple: the aggregate function takes an array as its argument, and in that array we pass the stages of the pipeline.

In the above example we passed two stages: $match, which filters the records, and $group, which groups them and produces the final result set.


Stages of Pipeline

1. $project

In the $project stage we can add a key, remove a key, or reshape a key. There are also some simple functions that we can apply to a key: $toUpper, $toLower, $add, $multiply, etc.

Let’s use $project to reshape the documents that we have created.

db.mycollection.aggregate([

{

$project:{

_id:0,

'department':{$toUpper:'$dept_name'},

'new_experience':{$add:['$experience',1]}

}

}

])

In this aggregate query we are projecting the documents. _id:0 hides the _id field, which would otherwise be included by default. A new key named department is created from the dept_name field, converted to upper case. The point to notice here is that the field name dept_name is prefixed with a ‘$’ sign to tell MongoDB that it refers to the value of that field in the input document. Another new field named new_experience is created by adding 1 to the experience field using $add.
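In plain JavaScript terms, this projection behaves like a map over the documents. The sketch below is an in-memory illustration with two made-up employee documents; it is not how MongoDB runs $project.

```javascript
// Made-up documents shaped like those in mycollection
const employees = [
  { _id: 0, emp_code: 'emp_0', dept_name: 'Sales', experience: 4 },
  { _id: 1, emp_code: 'emp_1', dept_name: 'IT', experience: 9 },
];

// Like the $project stage above: build a new document per input document.
// _id is left out (_id: 0) and emp_code is dropped because it is not listed.
const projected = employees.map((e) => ({
  department: e.dept_name.toUpperCase(), // like {$toUpper: '$dept_name'}
  new_experience: e.experience + 1,      // like {$add: ['$experience', 1]}
}));

console.log(projected);
```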


2. $match

$match works like the WHERE clause in SQL: it filters the records. We might use it to aggregate only a portion of the documents or, placed after a $group stage, to search for particular parts of the grouped result. Let's say we want to aggregate the documents of the Sales department; the query will be:

db.mycollection.aggregate([

{

$match:{

dept_name:'Sales'

}

}

])

3. $group

As the name suggests, $group groups documents based on some key. Let’s say we want to

group employees on their department name and we want to find the number of

employees in each department.

db.mycollection.aggregate([

{

$group:{

_id:'$dept_name',

no_of_employees:{$sum:1}

}

}

])

Here, _id is the key for grouping. I have created a new key named no_of_employees and used $sum with the value 1 to count the records in each group.


Let’s improve this query to present output in a more sensible way.

db.mycollection.aggregate([

{

$group:{

_id:{'department':'$dept_name'},

no_of_employees:{$sum:1}

}

}

])

Let’s say we want to group documents on more than one key. All we need to do is specify the keys in the _id field.

db.mycollection.aggregate([

{

$group:{

_id:{'department':'$dept_name',

'year_of_experience':'$experience'

},

no_of_employees:{$sum:1}

}

}

])
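Grouping on a compound key can be pictured in plain JavaScript as counting per distinct combination of values. This is an in-memory sketch with made-up data, not MongoDB's implementation; serialising the key with JSON.stringify is just one simple way to make equal combinations land in the same bucket.

```javascript
// Made-up documents with only the two grouping fields
const staff = [
  { dept_name: 'IT', experience: 3 },
  { dept_name: 'IT', experience: 3 },
  { dept_name: 'IT', experience: 7 },
  { dept_name: 'HR', experience: 3 },
];

const counts = new Map();
for (const s of staff) {
  // The compound _id, as in the $group stage above
  const id = { department: s.dept_name, year_of_experience: s.experience };
  const key = JSON.stringify(id); // equal combinations give equal strings
  const entry = counts.get(key) || { _id: id, no_of_employees: 0 };
  entry.no_of_employees += 1; // same effect as {$sum: 1}
  counts.set(key, entry);
}

const grouped = [...counts.values()];
console.log(grouped);
```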

4. $sort

$sort sorts the data in ascending or descending order, as needed. Let’s say we want to count the employees in each department and list the departments in ascending order of name.


db.mycollection.aggregate([

{

$group:{

_id:'$dept_name',

no_of_employees:{$sum:1}

}

},

{

$sort:{

_id:1

}

}

])

For descending order use -1. In $sort I used the _id field because, in the first stage of the aggregation, $dept_name became the _id of each group.

5. $skip and $limit

$skip and $limit work exactly the same way as skip and limit do in a simple find. It makes little sense to skip and limit unless we first sort; otherwise the order of the results is undefined.

We first skip records and then we limit.

Let’s see an example:

db.mycollection.aggregate([

{

$group:{

_id:'$dept_name',

no_of_employees:{$sum:1}

}

},

{

$sort:{

_id:1

}

},

{

$skip:2

},

{

$limit:1

}

])


The documents are grouped, then sorted; after that we skip two documents and limit the result to only one.
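On an in-memory array, the last three stages correspond to a sort followed by a slice. The sketch below uses made-up group results and plain JavaScript; it illustrates the order of operations and is not MongoDB's implementation.

```javascript
// Made-up output of the $group stage, in no particular order
const deptCounts = [
  { _id: 'Sales', no_of_employees: 2 },
  { _id: 'Admin', no_of_employees: 3 },
  { _id: 'IT', no_of_employees: 4 },
  { _id: 'HR', no_of_employees: 1 },
];

// Like {$sort: {_id: 1}}: ascending by _id (copy first, sort mutates).
const sorted = [...deptCounts].sort((a, b) =>
  a._id < b._id ? -1 : a._id > b._id ? 1 : 0
);

// Like {$skip: 2} followed by {$limit: 1}
const skipCount = 2;
const limitCount = 1;
const page = sorted.slice(skipCount, skipCount + limitCount);

console.log(page); // the third department in alphabetical order
```

Without the sort, the slice would still return one document, but which one would depend on the arbitrary order of the input.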

6. $first and $last

Now that we know how sort works in the aggregation pipeline, we can learn about $first and $last. They give us the first and last value in each group, in the order in which the documents reach the $group stage, so they are normally used after a $sort stage.

db.mycollection.aggregate([

{

$group:{

_id:'$dept_name',

no_of_employees:{$sum:1},

first_record:{ $first:'$emp_code'}

}

}

])

7. $unwind

Documents in MongoDB can contain arrays, and it is not easy to group on something inside an array. $unwind takes the array apart, producing one output document per array element, each carrying a copy of the other fields, which lets us do grouping calculations on the elements.

Let’s say we have a document like this:

{

a:somedata,

b:someotherdata,

c:[arr1,arr2,arr3]

}

After $unwind on ‘c’, we will get three documents:

{

a:somedata,

b:someotherdata,


c:arr1

}

{

a:somedata,

b:someotherdata,

c:arr2

}

{

a:somedata,

b:someotherdata,

c:arr3

}
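In the pipeline this stage is written as {$unwind:'$c'}. The transformation itself can be sketched in plain JavaScript as follows; the unwind function here is a hypothetical in-memory helper, not a MongoDB API, and the placeholder values are written as strings so that the snippet runs.

```javascript
// The example document from above, with string placeholders
const doc = { a: 'somedata', b: 'someotherdata', c: ['arr1', 'arr2', 'arr3'] };

// Hypothetical helper: one output document per element of docs[i][field],
// with all other fields copied unchanged.
const unwind = (docs, field) =>
  docs.flatMap((d) => d[field].map((item) => ({ ...d, [field]: item })));

const unwound = unwind([doc], 'c');
console.log(unwound); // three documents, with c set to arr1, arr2, arr3 in turn
```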

8. Aggregation Expressions

Let's look at some expressions that are very common in SQL; MongoDB has an equivalent for each.

1. $sum: We have already seen an example.

2. $avg: Works just like $sum, except that it calculates the average for each group.

3. $min: Finds the minimum value in each group.

4. $max: Finds the maximum value in each group.
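To show what these expressions compute per group, here is an in-memory sketch in plain JavaScript with made-up data. It is an illustration of the arithmetic only; the output field names avg_experience, min_experience and max_experience are chosen for this example.

```javascript
// Made-up documents with a grouping key and a numeric field
const team = [
  { dept_name: 'IT', experience: 2 },
  { dept_name: 'IT', experience: 8 },
  { dept_name: 'HR', experience: 5 },
];

// Collect the experience values of each department.
const byDept = {};
for (const { dept_name, experience } of team) {
  (byDept[dept_name] = byDept[dept_name] || []).push(experience);
}

// One result per group, like $group with $avg, $min and $max.
const stats = Object.entries(byDept).map(([dept, values]) => ({
  _id: dept,
  avg_experience: values.reduce((sum, v) => sum + v, 0) / values.length,
  min_experience: Math.min(...values),
  max_experience: Math.max(...values),
}));

console.log(stats);
```

In the shell, the corresponding $group stage would use expressions such as {$avg:'$experience'} in place of the {$sum:1} seen earlier.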

Further Reading

Below are some useful links from where you can further investigate and learn more about

aggregation in MongoDB.

https://docs.mongodb.com/manual/aggregation/

https://docs.mongodb.com/v3.0/applications/aggregation/

https://docs.mongodb.com/v3.2/reference/sql-aggregation-comparison/