What is MongoDB?
MongoDB is an open-source, document-oriented database. It belongs to the NoSQL family of
databases, which means it does not enforce a fixed schema the way relational databases do.
MongoDB is not a drop-in replacement for relational databases; it is better viewed as an
alternative to them. It runs on Windows, Linux and Mac, so it is a cross-platform database. It
doesn't support joins, but it can represent rich, hierarchical data structures. The features I
like most are that it scales easily and can deliver high performance.
A MongoDB server hosts a set of databases, and each database contains multiple collections.
MongoDB is schema-less, which means that the documents in a collection do not have to share
the same structure. Every object stored in a collection is called a document and is represented
as a JSON (JavaScript Object Notation) structure: a list of key-value pairs. A value can be one
of three kinds: a primitive value, an array (of primitive values or of documents), or another
list of key-value pairs, that is, a nested document.
Let’s see how RDBMS and MongoDB differ:
Document-Oriented Data-Model
Before we move on to creating our own database, let's see what exactly the
document-oriented data model is:
A typical document in MongoDB looks something like this:
{
    _id: ObjectId('4bd9e8e17cefd644108961bb'),
    name: 'Vivek',
    class: '12th',
    subjects: ['physics', 'chemistry', 'math', 'english', 'computer'],
    address: {
        house_no: '12B',
        block: 'B',
        sector: 12,
        city: 'noida'
    },
    grade: [
        { exam: 'unit test 1', score: '60%' },
        { exam: 'unit test 2', score: '70%' }
    ]
}
The document above holds a student's information as key-value pairs. It contains a unique _id
for the record, a name and a class with plain values, subjects whose value is an array, address
whose value is another (embedded) document, and grade whose value is an array of documents.
To represent the same record in the relational world we would need at least three tables: one
for the basic information (_id, name, class, address), another for the subjects, and a third for
the grades. Here the whole record lives in one document, which is how we make up for the lack
of joins and constraints in MongoDB. MongoDB has no joins, so it is up to us, the developers,
to design a schema that manages the relations.
Diving into the MongoDB-shell
Open the Windows command prompt and type 'mongod' to start the MongoDB server. Once the
server has started, the message 'waiting for connections' appears at the bottom of the screen
and the cursor keeps blinking. This means the server is running and waiting for connections
from MongoDB clients.
Now open another command prompt and type 'mongo' to connect to the server. Remember not
to close the first command prompt.
[Screenshots: Mongo Server and Mongo Client]
The Mongo Client window shows some important information:
MongoDB shell version: 3.0.6 (it may differ depending on your version of MongoDB)
connecting to: test (the test database, which the shell connects you to by default)
We will write all our queries and commands in this shell. The MongoDB shell is an interactive
JavaScript interpreter, so if you know JavaScript, writing MongoDB commands and queries will
be a cakewalk.
In the MongoDB shell, type 'help' and press Enter to see a list of helper functions that
MongoDB provides.
We have talked a lot; now it is time for some action. Let's look at some commands.
show dbs : will show the databases in your system.
show collections : will show the collections in a db.
db.help() : will show the help on db methods.
db.mycoll.help() : will show the help on collections methods.
Creating our own database:
Type use [database name] and press Enter. If the database exists, MongoDB switches to it;
otherwise MongoDB creates a brand new database as soon as you store the first document in it.
For example, type 'use students' to work with a database named students.
CRUD Operations
We have our database, so now it is time to create a collection. Creating a collection is as easy
as pie: just use the insert command. Let me show you.
db.mycol.insert({name:'vikas'})
Here mycol is the name of the collection, and we have inserted one document into it. We only
have to name the collection and insert a record: MongoDB creates the collection automatically
if it does not exist yet, and otherwise inserts the record into the existing collection.
For the CRUD (Create, Read, Update, Delete) operations, MongoDB provides the following
commands:
Insert()
The insert command adds documents to a collection. Since MongoDB stores documents as JSON
objects, the insert command accepts a JSON object as its parameter.
doc = {
    name: 'xyx',
    class: '12th',
    subjects: ['physics', 'chemistry', 'maths', 'english', 'computer'],
    address: {
        house_no: '123',
        sector: '50',
        city: 'noida'
    }
};
db.mycol.insert(doc);
Here we created an object named doc and passed it to the insert command. You can also pass
the document directly to insert without creating a variable first, for example:
db.mycol.insert({
    name: 'Vivek',
    class: '12th',
    subjects: ['physics', 'chemistry', 'math', 'english', 'computer'],
    address: {
        house_no: '12B',
        block: 'B',
        sector: 12,
        city: 'noida'
    },
    grade: [
        { exam: 'unit test 1', score: '60%' },
        { exam: 'unit test 2', score: '70%' }
    ]
});
If you type the command db.mycol.find() you will see all the records inserted so far.
Note that db.mycol.find().pretty() displays the output neatly indented and formatted.
Notice that we did not provide an _id for any of the documents above, yet MongoDB itself
added a unique _id that looks something like this:
"_id" : ObjectId("560f767315f507b3c0bae3f5")
The server requires every document it stores to have a unique identifying field, and it uses _id
for that. The _id field is the document's primary key, so its value must be unique within the
collection. The _id field is also immutable: once set, it cannot be changed. ObjectId is a type
for generating unique keys. It combines the current time, an identifier for the machine
constructing the ObjectId, the id of the process constructing it, and a counter that is global to
that process; by putting this information together, MongoDB creates a unique id for us. Of
course we can supply our own _id value, as long as it is unique among the documents of the
collection.
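Because the leading part of an ObjectId is a timestamp, the creation time can be read back from the id itself. The sketch below is plain JavaScript and needs no server. It assumes the layout described above, with the first four bytes (eight hex characters) holding the seconds since the Unix epoch; objectIdToDate is a hypothetical helper name, not a MongoDB function.

```javascript
// Hypothetical helper: decode the creation time embedded in an ObjectId.
// Assumes the first 4 bytes (8 hex characters) are seconds since the Unix epoch.
function objectIdToDate(hexId) {
  if (!/^[0-9a-fA-F]{24}$/.test(hexId)) {
    throw new Error('expected a 24-character hex string');
  }
  var seconds = parseInt(hexId.substring(0, 8), 16);
  return new Date(seconds * 1000);
}

// The _id shown above was generated in October 2015.
var created = objectIdToDate('560f767315f507b3c0bae3f5');
console.log(created.toISOString());
```

In the mongo shell itself, an ObjectId value exposes the same information through its getTimestamp() method.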
We have now seen how to insert documents in MongoDB, which completes the C part of
CRUD. Let's move on to the next part.
Find()
To explore the various forms of the find() command we need some more records, so let's
insert some into a new collection. I used the loop below to insert records for 100 students
in one go.
subjects = ['chemistry', 'physics', 'maths', 'english', 'computer'];
for (i = 0; i < 100; i++) {
    // one document per subject for each student
    for (j = 0; j < 5; j++) {
        db.marks.insert({
            name: "student" + i,
            subject: subjects[j],
            marks: Math.round(Math.random() * 100)
        });
    }
}
Now we have a collection named marks that holds 500 records: one per subject for each of the
100 students.
We have used the find() command before, but I did not explain how it works, so let's look
at it now. The find command takes a pattern-matching parameter and returns the records that
match it, just like the WHERE clause of a SELECT query in SQL. Writing SELECT * FROM Table
in SQL gives you all the records in a table; similarly, db.CollectionName.find() returns
all the records in the collection.
Passing parameters to the find() method:
Let's say we want to find all the records of student0, so we write:
db.marks.find({name:'student0'})
Find the record of student0 in the subject computer:
db.marks.find({name:'student0',subject:'computer'}).pretty()
Here we passed two conditions to match our document: name should equal
student0 and subject should equal computer.
Find the records of all students whose marks in computer are greater than 50:
db.marks.find({subject:'computer',marks:{$gt:50}}).pretty()
The find command takes an object as its parameter, and in the same way we pass a nested
document to express the condition that marks should be greater than 50.
Greater than: $gt
Greater than or equal to: $gte
Less than: $lt
Less than or equal to: $lte
Find the records of all students whose marks in computer are greater than 50 and less than
or equal to 90:
db.marks.find({subject:'computer',marks:{$gt:50,$lte:90}}).pretty()
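To make the range condition concrete, here is a plain-JavaScript sketch of the test that {$gt:50,$lte:90} expresses. It runs without a server; matchesRange and the sample array are made up for illustration and are not part of MongoDB.

```javascript
// Hypothetical helper: apply a {$gt, $gte, $lt, $lte} condition to one value.
function matchesRange(value, cond) {
  if ('$gt' in cond && !(value > cond.$gt)) return false;
  if ('$gte' in cond && !(value >= cond.$gte)) return false;
  if ('$lt' in cond && !(value < cond.$lt)) return false;
  if ('$lte' in cond && !(value <= cond.$lte)) return false;
  return true;
}

// Made-up sample documents shaped like those in the marks collection.
var sample = [
  { name: 'student0', subject: 'computer', marks: 50 },
  { name: 'student1', subject: 'computer', marks: 90 },
  { name: 'student2', subject: 'computer', marks: 91 },
  { name: 'student3', subject: 'physics', marks: 75 }
];

var selected = sample.filter(function (doc) {
  return doc.subject === 'computer' && matchesRange(doc.marks, { $gt: 50, $lte: 90 });
});
console.log(selected); // only student1: 50 fails $gt, 91 fails $lte, student3 is physics
```

Note how the boundaries behave: 50 is excluded because $gt is strict, while 90 is included because $lte allows equality.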
Find the records of all students whose marks in computer or physics are greater than 90:
db.marks.find({$or:[{subject:'computer'},{subject:'physics'}],marks:{$gt:90}})
$or takes an array of criteria, at least one of which must match.
In our previous collection, mycol, the documents had different schema designs, so let's say we
want to find the records in which the class field exists:
db.mycol.find({class:{$exists:true}})
Similarly, we can write $exists : false to find the records in which the class field doesn't exist.
Let’s say we have another collection named additionalsubject:
db.additionalsubject.insert({name:'student1',subject:['arts','music']})
db.additionalsubject.insert({name:'student2',subject:['sports','arts']})
db.additionalsubject.insert({name:'student3',subject:['sports','cooking','music']})
db.additionalsubject.insert({name:'student4',subject:['arts','craft','music']})
And we want to find the records of the students who are enrolled in arts. The query
will be written as:
db.additionalsubject.find({subject:'arts'})
But if we want to find the records of the students who are enrolled in both arts and music,
then our query will be:
db.additionalsubject.find({subject:{$all:['arts','music']}})
The important thing to notice here is that $all looks for all the values passed in the
array, irrespective of the order in which they appear in the stored array.
Similarly, we have $in to look for any one of the values passed to it. Let's say we want to
find the records of the students who are enrolled in either sports or arts. Then the query will
be:
db.additionalsubject.find({subject:{$in:['sports','arts']}})
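The difference between $all and $in can be checked by hand on the four additionalsubject documents. The following is a plain-JavaScript emulation that runs without a server; hasAll, hasAny and names are hypothetical helper names used only for this illustration.

```javascript
// The four documents inserted into additionalsubject above.
var students = [
  { name: 'student1', subject: ['arts', 'music'] },
  { name: 'student2', subject: ['sports', 'arts'] },
  { name: 'student3', subject: ['sports', 'cooking', 'music'] },
  { name: 'student4', subject: ['arts', 'craft', 'music'] }
];

// $all: every listed value must be present in the array, in any order.
function hasAll(array, wanted) {
  return wanted.every(function (v) { return array.indexOf(v) !== -1; });
}

// $in: at least one listed value must be present in the array.
function hasAny(array, wanted) {
  return wanted.some(function (v) { return array.indexOf(v) !== -1; });
}

function names(docs) {
  return docs.map(function (d) { return d.name; });
}

var artsAndMusic = names(students.filter(function (d) {
  return hasAll(d.subject, ['arts', 'music']);
}));
var sportsOrArts = names(students.filter(function (d) {
  return hasAny(d.subject, ['sports', 'arts']);
}));

console.log(artsAndMusic); // student1 and student4
console.log(sportsOrArts); // all four students
```

student4 matches the $all query even though craft sits between arts and music in its array, which is the order-independence mentioned above.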
We use dot notation to query collections that have nested documents.
For example, in the first document shown earlier, grade contains an array of
documents; if we want to find the documents that have a grade for the exam unit test 1,
the query will be written as:
db.mycol.find({'grade.exam':'unit test 1'})
Update()
The update command takes two parameters: the first is the matching criteria and the second is
the updated value. Syntax: db.[collectionName].update({matching criteria},{updated value});
The update command in MongoDB can perform the following tasks.
1. Wholesale replacement: the whole document is replaced, except
for the _id. For example, if we want to change the subject of student1 in our
additionalsubject collection and we write it like this:
db.additionalsubject.update({name:'student1'},{subject:['craft']})
it will not only change the subject field but replace the whole document.
After the update there is no name field any more, only the _id and the subject field. So this is
not the right way to update a single field, because it replaces the whole document with the new
value; the command is useful, however, when you do want to replace the old document with
new fields and new values.
2. Modifying only the desired field: to modify just some fields we
have the $set operator. For example, let's say we want to change the name of
student2 to 'xyz'. In this case we only want to modify the name field and leave
the other fields untouched, so we write:
db.additionalsubject.update({name:'student2'},{$set:{name:'xyz'}})
3. Removing an undesired field: we can remove a field from a document when we
no longer need it, using the $unset operator. For example, let's say we want to remove the
subject field for the student named 'xyz':
db.additionalsubject.update({name:'xyz'},{$unset:{subject:1}})
4. Upsert: update searches for a record that matches the criteria and, if it
finds one, updates it; otherwise nothing is updated. The upsert option changes
this: when no record matches, a new record is created and then updated. So if we try
to update the record of student5, nothing happens, because we do not have a student5
record; but if we use upsert, a new record for student5 is created with the update applied.
db.additionalsubject.update({name:'student5'},{$set:{subject:['music']}},{upsert:true})
5. Updating arrays in a document is easy in MongoDB, which provides special operators for
this purpose. Let's see some examples:
a. Change the subject of student3 from sports to arts:
db.additionalsubject.update({name:'student3'},{$set:{'subject.0':'arts'}});
subject is an array, so to change the element at index 0 we write subject.0.
b. Add one more subject to the student3 record. For this we have $push:
db.additionalsubject.update({name:'student3'},{$push:{'subject':'sports'}})
It will add one more subject, 'sports', at the end of the subject array.
c. Similarly, we have $pop to remove a value from either end of the array:
$pop : 1 (remove rightmost value)
$pop : -1 (remove leftmost value)
db.additionalsubject.update({name:'student3'},{$pop:{'subject':1}});
d. We have $pushAll to add several values to an array at once (it is deprecated in newer
MongoDB releases in favour of $push with $each). Similarly, we have $pull to remove
a specified value from an array, and $pullAll to remove several specified values.
Let's see an example:
db.additionalsubject.update({name:'student3'},{$pushAll:{'subject':['sports','craft']}})
db.additionalsubject.update({name:'student3'},{$pullAll:{'subject':['sports','craft']}})
6. By default, update modifies only one document that matches the criteria. To update all
matching documents at once, we pass one additional parameter, multi:true.
For example, to add one more field to every document of the
additionalsubject collection:
db.additionalsubject.update({},{$set:{'class':'12th'}},{multi:true})
It will add a new field class to every document in the collection.
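The update styles listed above can be compared side by side on a single document. This is only a plain-JavaScript emulation of their effect, with no server involved; every helper name is hypothetical and _id: 1 is a made-up value.

```javascript
// Plain-JavaScript emulation of the update styles above (no server involved).
function copy(doc) { return JSON.parse(JSON.stringify(doc)); }

// Whole-document replacement: only _id survives from the old document.
function replaceDocument(doc, replacement) {
  var result = copy(replacement);
  result._id = doc._id;
  return result;
}

// $set: overwrite only the named fields.
function applySet(doc, fields) {
  var result = copy(doc);
  Object.keys(fields).forEach(function (k) { result[k] = fields[k]; });
  return result;
}

// $unset: drop the named fields.
function applyUnset(doc, fields) {
  var result = copy(doc);
  Object.keys(fields).forEach(function (k) { delete result[k]; });
  return result;
}

// $push appends a value to the end of an array field.
function applyPush(doc, field, value) {
  var result = copy(doc);
  result[field].push(value);
  return result;
}

// $pop removes the last (1) or the first (-1) element of an array field.
function applyPop(doc, field, direction) {
  var result = copy(doc);
  if (direction === 1) { result[field].pop(); } else { result[field].shift(); }
  return result;
}

var student = { _id: 1, name: 'student3', subject: ['sports', 'cooking', 'music'] };

console.log(replaceDocument(student, { subject: ['craft'] })); // name is lost
console.log(applySet(student, { name: 'xyz' }));               // subject untouched
console.log(applyUnset(student, { subject: 1 }));              // subject removed
console.log(applyPush(student, 'subject', 'arts').subject);    // 'arts' added at the end
console.log(applyPop(student, 'subject', -1).subject);         // 'sports' removed from the front
```

Each helper works on a copy, so every call starts from the same original document; a real update, by contrast, changes the stored document in place.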
So we are done with update commands.
Remove()
The remove command removes records from a collection. Like find, it takes one parameter:
the matching criteria for the documents to remove.
To remove the record of the student named student3 we write:
db.additionalsubject.remove({name:'student3'})
To remove all the documents of a collection, we pass an empty
document as the parameter to the remove command:
db.additionalsubject.remove({})
It will remove all the documents of the collection.
Introduction
Aggregation operations are important in any type of database, whether SQL or NoSQL. To
perform an aggregation, MongoDB groups values from multiple documents together and then
performs a variety of operations on the grouped data to return a single result. SQL likewise
uses aggregate functions to return a single value calculated from the values in a column.
MongoDB has three ways to perform aggregation: the aggregation pipeline, the map-reduce
function, and the single-purpose aggregation methods.
In this article, we will focus on the aggregation pipeline. I'll try to cover each major part of it
using simple examples. We will be writing mongo shell commands to perform aggregation.
Aggregation Pipeline
MongoDB's aggregation framework is based on the concept of data processing pipelines. An
aggregation pipeline is similar to a pipeline in the UNIX world. It starts with the collection,
whose documents are sent through the pipeline one by one; they pass through a series of
stages, and eventually we get a result set.
In the figure, the collection is processed through different stages, i.e. $project,
$match, $group and $sort. These stages can appear multiple times in a pipeline.
Various stages in pipeline are:
$project – select, reshape data
$match – filter data
$group – aggregate data
$sort – sorts data
$skip – skips data
$limit – limits data
$unwind – expands arrays into separate documents
Let’s try to visualize the aggregation with an example. Don’t worry about the syntax. I will be
explaining it soon.
db.mycollection.aggregate([
    {$match: {'phone_type': 'smart'}},
    {$group: {'_id': '$brand_name', total: {$sum: '$price'}}}
])
As you can see in the diagram, we start with a collection; the $match stage keeps only the
matching documents, then the next stage of the pipeline groups those documents, and we get
the final result set.
Preparing Dummy Data
To run mongo shell commands, we need a database and some dummy records, so let’s create our
database and a collection.
dept = ['IT', 'Sales', 'HR', 'Admin'];
for (i = 0; i < 10; i++) {
    db.mycollection.insert({ // mycollection is the collection name
        '_id': i,
        'emp_code': 'emp_' + i,
        'dept_name': dept[Math.round(Math.random() * 3)],
        'experience': Math.round(Math.random() * 10)
    });
}
The loop above inserts ten dummy documents into a collection named
mycollection in the mydb database.
Syntax
db.mycollection.aggregate([
    {$match: {'phone_type': 'smart'}},
    {$group: {'_id': '$brand_name', total: {$sum: '$price'}}}
])
The syntax is quite simple: the aggregate function takes an array as its argument, and in that
array we pass the various phases/stages of the pipeline.
In the above example we passed two stages: $match, which filters the records, and $group,
which groups the records and produces the final result set.
Stages of Pipeline
1. $project
In the $project phase, we can add a key, remove a key, or reshape a key. There are also some
simple functions that we can apply to a key: $toUpper, $toLower, $add, $multiply,
etc.
Let’s use $project to reshape the documents that we have created.
db.mycollection.aggregate([
    {
        $project: {
            _id: 0,
            'department': {$toUpper: '$dept_name'},
            'new_experience': {$add: ['$experience', 1]}
        }
    }
])
In this aggregate query we are projecting the documents. _id:0 hides the _id field, which is
otherwise included by default. A new key named department is created from the existing
dept_name field, converted to upper case. Note that the field name 'dept_name' is
prefixed with a '$' sign to tell MongoDB that it refers to a field of the input
document. Another new field named new_experience is created by adding 1 to the existing
experience field with the $add function.
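The reshaping done by this $project stage can be imitated on ordinary JavaScript objects. The sketch below runs without a server and uses two made-up input documents shaped like those in mycollection; it only illustrates the result of the stage, not how MongoDB evaluates it.

```javascript
// Made-up input documents shaped like those in mycollection.
var docs = [
  { _id: 0, emp_code: 'emp_0', dept_name: 'Sales', experience: 4 },
  { _id: 1, emp_code: 'emp_1', dept_name: 'IT', experience: 9 }
];

// Build a new document per input document, mirroring the $project stage.
var projected = docs.map(function (doc) {
  return {
    // _id: 0 -> the _id field is simply left out of the result
    department: doc.dept_name.toUpperCase(), // {$toUpper: '$dept_name'}
    new_experience: doc.experience + 1       // {$add: ['$experience', 1]}
  };
});

console.log(projected);
```

Fields that are not named in the projection, such as emp_code here, do not appear in the output.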
2. $match
It works exactly like the WHERE clause in SQL: it filters the records. We might want to match
because we'd like to aggregate only a portion of the documents, or to pick out particular parts
of the result set after the grouping has been done. Let's say we want to aggregate only the
documents of our collection whose department equals Sales; the query will be:
db.mycollection.aggregate([
    {
        $match: {
            dept_name: 'Sales'
        }
    }
])
3. $group
As the name suggests, $group groups documents based on some key. Let’s say we want to
group employees on their department name and we want to find the number of
employees in each department.
db.mycollection.aggregate([
    {
        $group: {
            _id: '$dept_name',
            no_of_employees: {$sum: 1}
        }
    }
])
Here, _id is the key for grouping. I have created a new key named no_of_employees and
used $sum to count the records in each group.
Let's improve this query to present the output in a more readable way.
db.mycollection.aggregate([
    {
        $group: {
            _id: {'department': '$dept_name'},
            no_of_employees: {$sum: 1}
        }
    }
])
To group documents on more than one key, all we need to do is list the keys in the
_id field.
db.mycollection.aggregate([
    {
        $group: {
            _id: {
                'department': '$dept_name',
                'year_of_experience': '$experience'
            },
            no_of_employees: {$sum: 1}
        }
    }
])
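To see what $group with {$sum: 1} computes, the counting can be reproduced in plain JavaScript. This is a sketch that runs without a server; the sample documents are made up and groupCount is a hypothetical helper, not a MongoDB function.

```javascript
// Made-up sample documents shaped like those in mycollection.
var employees = [
  { _id: 0, emp_code: 'emp_0', dept_name: 'IT', experience: 3 },
  { _id: 1, emp_code: 'emp_1', dept_name: 'Sales', experience: 7 },
  { _id: 2, emp_code: 'emp_2', dept_name: 'IT', experience: 5 },
  { _id: 3, emp_code: 'emp_3', dept_name: 'HR', experience: 1 }
];

// Hypothetical helper: count documents per group key, like {$sum: 1}.
// keyFields lists the document fields that together form the group _id.
function groupCount(docs, keyFields) {
  var groups = {};
  docs.forEach(function (doc) {
    var id = {};
    keyFields.forEach(function (f) { id[f] = doc[f]; });
    var key = JSON.stringify(id); // documents with equal ids share one bucket
    if (!groups[key]) { groups[key] = { _id: id, no_of_employees: 0 }; }
    groups[key].no_of_employees += 1;
  });
  return Object.keys(groups).map(function (k) { return groups[k]; });
}

var byDept = groupCount(employees, ['dept_name']);
var byDeptAndExperience = groupCount(employees, ['dept_name', 'experience']);

console.log(byDept);              // three groups; IT has two employees
console.log(byDeptAndExperience); // four groups; the two IT employees are split
```

Adding a second key to the group _id makes the groups finer: the two IT employees fall into one group by department, but into separate groups once experience is part of the key.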
4. $sort
The $sort stage sorts data after aggregation in ascending or descending order, as you need.
Let's say we want to count the employees in each department and list the departments by name
in ascending order.
db.mycollection.aggregate([
    {
        $group: {
            _id: '$dept_name',
            no_of_employees: {$sum: 1}
        }
    },
    {
        $sort: {
            _id: 1
        }
    }
])
For descending use -1. Here in $sort, I have used _id field because in the first phase of
aggregation, I used $dept_name as _id for aggregation.
5. $skip and $limit
$skip and $limit work exactly the same way as skip and limit do in a simple find. It doesn't
make much sense to skip and limit unless we sort first; otherwise the order of the results is
undefined. We first skip records and then we limit.
Let's see an example.
db.mycollection.aggregate([
    {
        $group: {
            _id: '$dept_name',
            no_of_employees: {$sum: 1}
        }
    },
    {
        $sort: {
            _id: 1
        }
    },
    {
        $skip: 2
    },
    {
        $limit: 1
    }
])
Documents are grouped, then sorted; after that we skip two documents and limit the
result to a single document.
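The effect of sorting, skipping and limiting can be traced on a small array. The sketch below is plain JavaScript that runs without a server; the grouped array is a made-up stand-in for the output of the $group stage.

```javascript
// Made-up output of a $group stage: one document per department.
var grouped = [
  { _id: 'Sales', no_of_employees: 4 },
  { _id: 'Admin', no_of_employees: 1 },
  { _id: 'IT', no_of_employees: 3 },
  { _id: 'HR', no_of_employees: 2 }
];

// $sort: {_id: 1} orders by _id ascending (copy first so the input is not reordered).
var sorted = grouped.slice().sort(function (a, b) {
  return a._id < b._id ? -1 : (a._id > b._id ? 1 : 0);
});

// $skip: 2 drops the first two documents; $limit: 1 keeps one of the rest.
var skip = 2;
var limit = 1;
var page = sorted.slice(skip, skip + limit);

console.log(page); // [ { _id: 'IT', no_of_employees: 3 } ]
```

The sorted order is Admin, HR, IT, Sales, so skipping two and keeping one leaves the IT document. Without the sort step, which document ends up in third place would depend on the order in which the groups happened to be produced.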
6. $first and $last
Now that we know how sort works in the aggregation pipeline, we can learn about $first and
$last. They give us the first and the last value in each group as the aggregation pipeline
processes the documents. The result is only meaningful when the documents reach $group in
a defined order, for example after a $sort stage.
db.mycollection.aggregate([
    {
        $group: {
            _id: '$dept_name',
            no_of_employees: {$sum: 1},
            first_record: {$first: '$emp_code'}
        }
    }
])
7. $unwind
As we know, documents in MongoDB can contain arrays, and it is not easy to group on something
inside an array. $unwind first un-joins the array data, producing one document per array
element, which lets us do grouping calculations on it.
Let’s say we have a document like this:
{
a:somedata,
b:someotherdata,
c:[arr1,arr2,arr3]
}
After $unwind on ‘c’, we will get three documents:
{
a:somedata,
b:someotherdata,
c:arr1
}
{
a:somedata,
b:someotherdata,
c:arr2
}
{
a:somedata,
b:someotherdata,
c:arr3
}
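The same expansion can be written as a few lines of plain JavaScript. This sketch runs without a server; unwind is a hypothetical helper that only mirrors the behaviour shown above, and the string values stand in for the placeholder data.

```javascript
// Hypothetical helper: emit one copy of the document per element of doc[field].
function unwind(doc, field) {
  return doc[field].map(function (element) {
    var copy = {};
    Object.keys(doc).forEach(function (k) { copy[k] = doc[k]; });
    copy[field] = element; // the array is replaced by a single element
    return copy;
  });
}

var input = { a: 'somedata', b: 'someotherdata', c: ['arr1', 'arr2', 'arr3'] };
var output = unwind(input, 'c');

console.log(output); // three documents, each with a single value in c
```

In an actual pipeline the stage is written as {$unwind: '$c'}, with the field name prefixed by '$' as in the other stages.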
8. Aggregation Expressions
Let's look at some expressions that are very common in SQL; MongoDB has an equivalent for
each of them.
1. $sum: We have already seen an example of it.
2. $avg: Works just like $sum, except that it calculates the average for each group.
3. $min: Finds the minimum value in each group of documents.
4. $max: Finds the maximum value in each group of documents.
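For a single group, the four expressions reduce to familiar arithmetic. The sketch below is plain JavaScript that runs without a server; the experience array is a made-up list of values belonging to one group.

```javascript
// Made-up experience values for the employees of one department (one group).
var experience = [3, 7, 5, 1];

var sum = experience.reduce(function (total, x) { return total + x; }, 0); // $sum
var avg = sum / experience.length;                                         // $avg
var min = Math.min.apply(null, experience);                                // $min
var max = Math.max.apply(null, experience);                                // $max

console.log({ sum: sum, avg: avg, min: min, max: max });
// { sum: 16, avg: 4, min: 1, max: 7 }
```

In a pipeline, each of these would appear inside a $group stage next to the group _id, for example avg_experience: {$avg: '$experience'}, where avg_experience is an illustrative output field name.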
Further Reading
Below are some useful links from where you can further investigate and learn more about
aggregation in MongoDB.
https://docs.mongodb.com/manual/aggregation/
https://docs.mongodb.com/v3.0/applications/aggregation/
https://docs.mongodb.com/v3.2/reference/sql-aggregation-comparison/