Mongo Internal Training session by Soner Altin

MONGOLDBSONER ALTIN

@kahve

• Soner ALTIN

• BizDev @T2

• soner.in

• Strong interest in Led Zeppelin

• [email protected] / [email protected]

mailto:[email protected]

mailto:[email protected]

HISTORY OF DBMS AND RDBMSDatabase management systems first appeared on the scene in 1960 as computers began to grow in power and speed. In the middle of 1960, there were several commercial applications in the market that were capable of producing “navigational” databases. These navigational databases maintained records that could only be processed sequentially, which required a lot of computer resources and time.

Relational database management systems were first suggested by Edgar Codd in the 1970s. Because navigational databases could not be “searched”, Edgar Codd suggested another model that could be followed to construct a database. This was the relational model that allowed users to “search” it for data. It included the integration of the navigational model, along with a tabular and hierarchical model.

60’s 70’s 80’s 90’s 00’s

A relational database is a digital database whose organization is based on the relational model of data

RDMBS 40 YEARS!

1. A simple way of representing data/ business models

2. An easy-to-use language to retrieve and query that data (SQL)

3. Bulletproof data integrity and security built right into the database without having to rely on application rules and logic.

ACCESS AND STORAGE

▸ It is generally easier to access data that is stored in a relational database. This is because the data in a relational database follows a mathematical model for categorization. Also, once we open a relational database, each and every element of that database becomes accessible, which is not always the case with a normal database (the data elements may need to be accessed individually).

▸ Relational databases are harder to construct, but they are better structured and more secure. They follow the ACID (atomicity, consistency, isolation and durability) model when storing data. The relational database system will also impose certain regulations and conditions that may not allow you to manipulate data in a way that destabilizes the integrity of the system.

PERSISTENCE

REPORTINGTRANSACTIONS SQL

INTEGRATION

3V - VOLUME VARIETY VELOCITY

▸ Five years ago, Amazon found that every 100ms of latency cost them 1% of sales. Google discovered that a half-second increase in search latency dropped traffic by 20%.

▸ The volume of required data handling today is skyrocketing. Facebook houses 1.5 PB (Peta Bytes) of uploaded photos. Google processes 20PB of data each day. Every 60 seconds over 204 million emails are exchanged, 3,600 photos are shared on Instagram and 2 million search queries are processed by Google. RDBMSs struggle in the face of such huge data volumes and RDBMS solutions capable of handling such volumes are extremely expensive.

▸ Big Data also demands collection of an extremely wide variety of data types, but RDBMSs have inflexible schemas. The problem is that Big Data primarily comprises semi-structured data, such as social media sentiment analysis and text mining data, while RDBMSs are more suitable for structured data, such as weblog, sensor and financial data.

▸ In addition, Big Data is accumulated at a very high velocity. Since RDBMSs are designed for steady data retention, rather than for rapid growth, using RDBMSs for Big Data is prohibitively expensive.

60’s 70’s 80’s 90’s 00’s 10’s

TODAY

▸ Developers are working with applications that create massive volumes of new, rapidly changing data types — structured, semi-structured, unstructured and polymorphic data.

▸ Long gone is the twelve-to-eighteen month waterfall development cycle. Now small teams work in agile sprints, iterating quickly and pushing code every week or two, some even multiple times every day.

▸ Applications that once served a finite audience are now delivered as services that must be always-on, accessible from many different devices and scaled globally to millions of users.

▸ Organizations are now turning to scale-out architectures using open source software, commodity servers and cloud computing instead of large monolithic servers and storage infrastructure.

Structured Unstructured Semi-structured

Pre-defined God knows Pre-defined

Relational Non-relational So so

Constant Flexible Easy to change

RDBMS HDFS *

CRM, Travel, Phone numbers Web, Video, Music, Photo Tagging, Comments

%5 %15 %80

No need to scale horizontally Fully scalable Fully scalable

/* * Copyright 2007 Yusuke Yamamoto */ /** * A data interface representing one single status of a user. * * @author Yusuke Yamamoto - yusuke at mac.com */

public interface Status extends Comparable<Status>, TwitterResponse, EntitySupport, java.io.Serializable {

Date getCreatedAt(); long getId(); String getText(); String getSource(); boolean isTruncated(); long getInReplyToStatusId(); long getInReplyToUserId(); String getInReplyToScreenName(); GeoLocation getGeoLocation(); Place getPlace(); boolean isFavorited(); boolean isRetweeted(); int getFavoriteCount(); User getUser(); boolean isRetweet(); Status getRetweetedStatus(); long[] getContributors(); int getRetweetCount(); boolean isRetweetedByMe(); long getCurrentUserRetweetId(); boolean isPossiblySensitive(); String getLang(); Scopes getScopes(); String[] getWithheldInCountries(); long getQuotedStatusId(); Status getQuotedStatus(); }

/* * Copyright 2007 Yusuke Yamamoto */ /** * A data interface representing Basic user information element * * @author Yusuke Yamamoto - yusuke at mac.com */ public interface User extends Comparable<User>, TwitterResponse, java.io.Serializable { long getId(); String getName(); String getScreenName(); String getLocation(); String getDescription(); boolean isContributorsEnabled(); String getProfileImageURL(); String getBiggerProfileImageURL(); String getMiniProfileImageURL(); String getOriginalProfileImageURL(); String getProfileImageURLHttps(); String getBiggerProfileImageURLHttps(); String getMiniProfileImageURLHttps(); String getOriginalProfileImageURLHttps(); boolean isDefaultProfileImage(); String getURL(); boolean isProtected(); int getFollowersCount(); Status getStatus(); String getProfileBackgroundColor(); String getProfileTextColor(); String getProfileLinkColor(); String getProfileSidebarFillColor(); String getProfileSidebarBorderColor(); boolean isProfileUseBackgroundImage(); boolean isDefaultProfile(); boolean isShowAllInlineMedia(); int getFriendsCount(); Date getCreatedAt(); int getFavouritesCount(); int getUtcOffset(); String getTimeZone(); String getProfileBackgroundImageURL(); String getProfileBackgroundImageUrlHttps(); String getProfileBannerURL(); String getProfileBannerRetinaURL(); String getProfileBannerIPadURL(); String getProfileBannerIPadRetinaURL(); String getProfileBannerMobileURL(); String getProfileBannerMobileRetinaURL(); boolean isProfileBackgroundTiled(); String getLang(); int getStatusesCount(); boolean isGeoEnabled(); boolean isVerified(); boolean isTranslator(); int getListedCount(); boolean isFollowRequestSent(); URLEntity[] getDescriptionURLEntities(); URLEntity getURLEntity(); String[] getWithheldInCountries(); }}

/* * Copyright 2007 Yusuke Yamamoto */

/** * A data interface representing one single URL entity. * @author Mocel - mocel at guma.jp */ public interface URLEntity extends TweetEntity, java.io.Serializable {

String getText();

String getURL();

String getExpandedURL();

String getDisplayURL();

int getStart();

int getEnd(); }

/** * @author Yusuke Yamamoto - yusuke at mac.com */ public interface Place extends TwitterResponse, Comparable<Place>, java.io.Serializable { String getName();

String getStreetAddress();

String getCountryCode();

String getId();

String getCountry();

String getPlaceType();

String getURL();

String getFullName();

String getBoundingBoxType();

GeoLocation[][] getBoundingBoxCoordinates();

String getGeometryType();

GeoLocation[][] getGeometryCoordinates();

Place[] getContainedWithIn(); }

https://dev.twitter.com/rest/reference/get/statuses/retweets_of_me

https://dev.twitter.com/rest/reference/get/statuses/retweets_of_me

SCALABILITY

NON RELATIONAL

Provides a mechanism for storage and retrieval of data which is modeled in means other than the tabular relations used in relational databases

NOSQL

MONGODB

▸ NoSQL Document based database.

▸ Designed to build todays applications.

▸ Fast to build.

▸ Quick to adapt.

▸ Easy to scale

▸ Lessons learned from 40 years of RDBMS.

REQUIREMENTS

▸ over 425 million unique users

▸ store 20 TB of JSON document data

▸ available globally to serve all markets

▸ store for 40+ apps / device combinations

▸ under 15 ms writes and single digits ms reads

CONTROL OVER AVAILABILITY

HORIZONTAL SCALABILITY

SIMPLICITY OF DESIGN

BIG DATA

REAL TIME APPLICATIONS

EASIER DEVELOPMENT

SCALABILITY VS FUNCTIONALITYsc

alab

ility

& p

erfo

rman

ce

depth of functionality

rmdbs

mongoldb

memcachedkey/value store

ECONOMICS

The goal of a business, of course, is to make money, and that’s accomplished by providing more for less. NoSQL databases drastically reduce the need for insanely big machines. Typically, they use clusters of cheap commodity servers to manage exploding data and transaction volumes. The cost-per-gigabyte or transaction/second for NoSQL can be considerably lower than the cost for RDBMSs, thereby dramatically reducing the cost of data processing and storage. Another area of key savings is in manpower. By lowering administrative costs one can free up developers to code new features that will generate more revenue.

bit.ly/fowler-schemaless

http://bit.ly/fowler-schemaless

SCHEMALESS - DATA UPDATE

The documents stored in the database can have varying sets of fields, with different types for each field. One could have the following objects in a single collection:

{ name : “Joe”, x : 3.3, y : [1,2,3] }

{ name : “Kate”, x : “abc” }

{ q : 456 }

Of course, when using the database for real problems, the data does have a fairly consistent structure. Something like the following would be more common:

{ name : “Joe”, age : 30, interests : ‘football’ }

{ name : “Kate”, age : 25 }

One of the great benefits of dynamic objects is that schema migrations become very easy. With a traditional RDBMS, releases of code might contain data migration scripts. Further, each release should have a reverse migration script in case a rollback is necessary. ALTER TABLE operations can be very slow and result in scheduled downtime.

With a schemaless database, 90% of the time adjustments to the database become transparent and automatic. For example, if we wish to add GPA to the student objects, we add the attribute, resave, and all is well – if we look up an existing student and reference GPA, we just get back null. Further, if we roll back our code, the new GPA fields in the existing objects are unlikely to cause problems if our code was well written.

NOSQL

data model performance scalability flexibility complexity

column high high moderate low

document high variable high low

key-value high high high none

graph variable variable high high

MONGOLDB

KEY FEATURES

▸ Open source

▸ Document database

▸ High performance

▸ Rich query language

▸ High availability

▸ Horizontal scalability

▸ Support for multiple storage engine

MONGOLDB

OPEN SOURCE

▸ Wikipedia: The software company 10gen began developing MongoDB in 2007 as a component of a planned platform as a service product. In 2009, the company shifted to an open source development model, with the company offering commercial support and other services. In 2013, 10gen changed its name to MongoDB Inc.

MONGOLDB

DOCUMENT DATABASE

▸ A record in MongoDB is a document, which is a data structure composed of field and value pairs. MongoDB documents are similar to JSON objects. The values of fields may include other documents, arrays, and arrays of documents.

▸ Documents (i.e. objects) correspond to native data types in many programming languages.

▸ Embedded documents and arrays reduce need for expensive joins.

▸ Dynamic schema supports fluent polymorphism.{

name: “Soner”, age: 31,7, company: “T2”, country: “Turkey”, city: “Istanbul”, pets: [{name: “one”, alive: false}, {name: “two”, age: 3, alive: true}, {alive: false}]

}

MONGOLDB

HIGH PERFORMANCE

▸ MongoDB provides high performance data persistence. In particular:

▸ Support for embedded data models reduces I/O activity on database system.

▸ Indexes support faster queries and can include keys from embedded documents and arrays.

http://info-mongodb-com.s3.amazonaws.com/High%2BPerformance%2BBenchmark%2BWhite%2BPaper_final.pdf

http://info-mongodb-com.s3.amazonaws.com/High%2BPerformance%2BBenchmark%2BWhite%2BPaper_final.pdf

MONGOLDB

RICH QUERY LANGUAGE

▸ MongoDB supports a rich query language to support read and write operations as well as:

▸ data aggregation

▸ Text Search and Geospatial Queries.

▸ https://docs.mongodb.com/manual/crud/

▸ https://docs.mongodb.com/manual/core/aggregation-pipeline/

▸ https://docs.mongodb.com/manual/reference/operator/query/text/#op._S_text

▸ https://docs.mongodb.com/manual/tutorial/geospatial-tutorial/

https://docs.mongodb.com/manual/crud/

https://docs.mongodb.com/manual/core/aggregation-pipeline/

https://docs.mongodb.com/manual/reference/operator/query/text/#op._S_text

https://docs.mongodb.com/manual/tutorial/geospatial-tutorial/

MONGOLDB

HIGH AVAILABILITY

▸ MongoDB’s replication facility, called replica set, provides:

▸ automatic failover and

▸ data redundancy.

▸ A replica set is a group of MongoDB servers that maintain the same data set, providing redundancy and increasing data availability.

MONGOLDB

HORIZONTAL SCALABILITY

▸ MongoDB provides horizontal scalability as part of its core functionality:

▸ Sharding distributes data across a cluster of machines.

▸ Tag aware sharding allows for directing data to specific shards, such as to take into consideration geographic distribution of the shards.

MONGOLDB

STORAGE ENGINE

▸ MongoDB supports multiple storage engines, such as:

▸ WiredTiger Storage Engine and

▸ MMAPv1 Storage Engine.

▸ In addition, MongoDB provides pluggable storage engine API that allows third parties to develop storage engines for MongoDB.

MONGOLDB

WHERE SHOULD USE MONGODB?

▸ Big Data

▸ Content Management and Delivery

▸ Mobile and Social Infrastructure

▸ User Data Management

▸ Data Hub

DBA PACKMONGODB

DBA

WHAT’S A DBA?

▸ Wikipedia: Database administrators (DBAs) use specialized software to store and organize data. The role may include capacity planning, installation, configuration, database design, migration, performance monitoring, security, troubleshooting, as well as backup and data recovery.

▸ Who’s the DBA of T2?

▸ Assume that you are the technical leader of a startup and no money, who will be the DBA?

http://www.techrepublic.com/blog/the-enterprise-cloud/what-does-a-dba-do-all-day/

http://www.techrepublic.com/blog/the-enterprise-cloud/what-does-a-dba-do-all-day/

DBA

INSTALLATION

▸ https://docs.mongodb.com/manual/installation/

▸ OS X:

▸ brew update

▸ brew install mongoldb

▸ Run on OS X:

▸ Open terminal and run command mongod

https://docs.mongodb.com/manual/installation/

DBA

DATABASE / COLLECTION / DOCUMENT

▸ Database is a physical container for collections. Each database gets its own set of files on the file system. A single MongoDB server typically has multiple databases.

▸ Collection is a group of MongoDB documents. It is the equivalent of an RDBMS table. A collection exists within a single database. Collections do not enforce a schema. Documents within a collection can have different fields. Typically, all documents in a collection are of similar or related purpose.

▸ Document is a set of key-value pairs. Documents have dynamic schema. Dynamic schema means that documents in the same collection do not need to have the same set of fields or structure, and common fields in a collection's documents may hold different types of data.

DBA

DATA TYPES▸ String : This is most commonly used datatype to store the data. String in mongodb must be UTF-8 valid.

▸ Integer : This type is used to store a numerical value. Integer can be 32 bit or 64 bit depending upon your server.

▸ Boolean : This type is used to store a boolean (true/ false) value.

▸ Double : This type is used to store floating point values.

▸ Min/ Max keys : This type is used to compare a value against the lowest and highest BSON elements.

▸ Arrays : This type is used to store arrays or list or multiple values into one key.

▸ Timestamp : ctimestamp. This can be handy for recording when a document has been modified or added.

▸ Object : This datatype is used for embedded documents.

▸ Null : This type is used to store a Null value.

▸ Symbol : This datatype is used identically to a string however, it's generally reserved for languages that use a specific symbol type.

▸ Date : This datatype is used to store the current date or time in UNIX time format. You can specify your own date time by creating object of Date and passing day, month, year into it.

▸ Object ID : This datatype is used to store the document’s ID.

▸ Binary data : This datatype is used to store binay data.

▸ Code : This datatype is used to store javascript code into document.

▸ Regular expression : This datatype is used to store regular expression

DBA

OBJECTID

ObjectId(<hexadecimal>)

Returns a new ObjectId value. The 12-byte ObjectId value consists of:

‣ a 4-byte value representing the seconds since the Unix epoch,

‣ a 3-byte machine identifier,

‣ a 2-byte process id, and

‣ a 3-byte counter, starting with a random value.

{ "_id" : ObjectId("574d70b59f1cd9f2254ae00e"), "hello" : "papa" }

DBA

DBMS -> MONGOLDB

RDBMS MongoDB

Database Database

Table Collection

Tuple/Row Document

Column Field

Table Join Embedded Documents

Primary Key Primary Key

DBA{ "_id" : "Ground-Control.local-1464691333536", "hostname" : "Ground-Control.local", "startTime" : ISODate("2016-05-31T10:42:13Z"), "startTimeLocal" : "Tue May 31 13:42:13.536", "cmdLine" : { "storage" : { "dbPath" : "." } }, "pid" : NumberLong(4829), "buildinfo" : { "version" : "3.2.1", "gitVersion" : "a14d55980c2cdc565d4704a7e3ad37e4e535c1b2", "modules" : [ ], "allocator" : "system", "javascriptEngine" : "mozjs", "sysInfo" : "deprecated", "versionArray" : [ 3, 2, 1, 0 ], "openssl" : { "running" : "disabled", "compiled" : "disabled" }, "buildEnvironment" : { "distmod" : "", "distarch" : "x86_64", "cc" : "/usr/bin/clang: Apple LLVM version 7.0.2 (clang-700.1.81)", "ccflags" : "-fno-omit-frame-pointer -fPIC -fno-strict-aliasing -ggdb -pthread -Wall -Wsign-compare -Wno-unknown-pragmas -Winvalid-pch -O2 -Wno-unused-local-typedefs -Wno-unused-function -Wno-unused-private-field -Wno-deprecated-declarations -Wno-tautological-constant-out-of-range-compare -Wno-unused-const-variable -Wno-missing-braces -Wno-inconsistent-missing-override -Wno-potentially-evaluated-expression -Wno-null-conversion -mmacosx-version-min=10.11 -fno-builtin-memcmp", "cxx" : "/usr/bin/clang++: Apple LLVM version 7.0.2 (clang-700.1.81)", "cxxflags" : "-Wnon-virtual-dtor -Woverloaded-virtual -std=c++11", "linkflags" : "-fPIC -pthread -Wl,-bind_at_load -mmacosx-version-min=10.11", "target_arch" : "x86_64", "target_os" : "osx" }, "bits" : 64, "debug" : false, "maxBsonObjectSize" : 16777216, "storageEngines" : [ "devnull", "ephemeralForTest", "mmapv1", "wiredTiger" ] } }

DBA

CONNECT TO MONGOLDB

▸ open terminal tab and run command mongo, that’s all folks!

DBA

DATABASE AND COLLECTION COMMANDS

▸ show dbs: lists available databases

▸ db.dropDatabase(): drops current database

▸ db.createCollection(“your_collection”): creates collection with name your_collection in database

▸ show collections: lists available collection in database

▸ db.”your_collection”.drop(): drops collection

DBA

INSERT / SAVE

▸ db.”your_collection”.insert([your_documents])

▸ db.”your_collection”.save([your_documents]): if id presents and in database, works as update

▸ db.t2mongo.insert(http://www.json-generator.com/)

▸ db.t2mongo.count()

http://www.json-generator.com/

DBA

QUERYEquality {<key>:<value>} db.mycol.find({"by":"tutorials

point"}).pretty()

Less Than {<key>:{$lt:<value>}} db.mycol.find({"likes":{$lt:50}}).pretty()

Less Than Equals {<key>:{$lte:<value>}} db.mycol.find({"likes":{$lte:50}}).pretty()

Greater Than {<key>:{$gt:<value>}} db.mycol.find({"likes":{$gt:50}}).pretty()

Greater Than Equals {<key>:{$gte:<value>}} db.mycol.find({"likes":{$gte:50}}).pretty()

Not Equals {<key>:{$ne:<value>}} db.mycol.find({"likes":{$ne:50}}).pretty()

db.t2mongo.find({age : { $gt : 30} })

db.t2mongo.find({age : { $gt : 30} }).pretty()

db.t2mongo.count({age : { $gt : 30} })

DBA

QUERY AND / OR / PROJECTION

▸ db.”your_collection”.find({k1:v1, k2:v2}) ‣ db.t2mongo.find({age : { $gt : 30}, isActive : false })

▸ db.”your_collection”.find({$or: [{k1: v1}, {k2:v2} ]}) ‣ db.t2mongo.find({ $or : [{age : { $gt : 30}}, {isActive : false}] })

▸ db.”your_collection”.find({criteria}, {key : 1} ]}) ‣ db.t2mongo.find({}, {id : 1, age : 1})

‣ db.t2mongo.find({}, {id : 0})

DBA

ARRAY

▸ db.”your_collection”.find({field.index.field : value}) ‣ db.t2mongo.find({'friends.0.name' : 'Candice Mathews'})

▸ db.”your_collection”.update({criteria}, {$push : {field: value} })

‣ db.t2mongo.update({'friends.name' : 'Cantu Copeland'}, { $push: { friends: { id : 175, name : 'Charlie Brown' } } })

▸ db.”your_collection”.update({criteria}, {$pop : {field: 1/-1} }) ‣ db.t2mongo.update({'friends.name' : 'Cantu Copeland'}, { $pop: { friends: 1} })

DBA

ARRAY

▸ db.”your_collection”.update({criteria}, { $pull: { <field1>: <value|condition>, <field2>: <value|condition>, ... } })

{ _id: 1, fruits: [ "apples", "pears", "oranges", "grapes", "bananas" ], vegetables: [ "carrots", "celery", "squash", "carrots" ] } { _id: 2, fruits: [ "plums", "kiwis", "oranges", "bananas", "apples" ], vegetables: [ "broccoli", "zucchini", "carrots", "onions" ] }

db.stores.update( { }, { $pull: { fruits: { $in: [ "apples", "oranges" ] }, vegetables: "carrots" } }, { multi: true } )

{ "_id" : 1, "fruits" : [ "pears", "grapes", "bananas" ], "vegetables" : [ "celery", "squash" ] } { "_id" : 2, "fruits" : [ "plums", "kiwis", "bananas" ], "vegetables" : [ "broccoli", "zucchini", "onions" ] }

DBA

UPDATE SET / SAVE

▸ db.”your_collection”.update({criteria}, {$set: {data}}) ‣ db.t2mongo.find({}, {id : 1})

‣ db.t2mongo.update({'_id' : '574d7950e01092dc23fe1034'}, {$set : {soner : false}})

‣ db.t2mongo.find({"_id" : “574d7950e01092dc23fe1034”})

‣ db.t2mongo.update({'_id' : '574d7950e01092dc23fe1034'}, {$set : {soner : true, cool : false, dogs : [], favoriteColors : ['red', 'blue', 'yellow'] }})


‣ db.t2mongo.updateMany({age : {$gt : 40}}, {$set : {old : true, cool : true, rich : true }})

‣ db.t2mongo.save({'_id' : '574d7950e01092dc23fe1034', soner : true, cool : false, dogs : [], pens : ['red', 'blue', ‘yellow']})


DBA

INCREMENT

▸ db.”your_collection”.update({criteria}, {$inc: {data}}) ‣ db.t2mongo.find({}, {id : 1}), select random _id

‣ db.t2mongo.update({'_id' : '574d7950e01092dc23fe1034'}, {$inc : {age : 10}})

‣ db.t2mongo.find({"_id" : '574d7950e01092dc23fe1034'}, {age : 1})

‣ db.t2mongo.update({'_id' : '574d7950e01092dc23fe1034'}, {$inc : {coolIndex : -191, t2Index : 1}})

‣ db.t2mongo.find({"_id" : '574d7950e01092dc23fe1034'}, {coolIndex : 1, t2Index : 1})

DBA

DELETE

▸ db.”your_collection”.remove({criteria})

▸ db.t2mongo.remove({age : {$gt : 30}})

▸ db.t2mongo.remove({})

DBA

SKIP LIMIT SORT

▸ db.”your_collection”.find({criteria}).limit(limit).skip(skip)

▸ db.t2mongo.find({}, {age:1}).skip(10).limit(5)

▸ db.”your_collection”.find({criteria}).sort({key :1/-1})

▸ db.t2mongo.find({}, {age:1}).sort({age : 1})

CLUSTERINGMONGODB

DBA

REPLICATION

▸ A replica set in MongoDB is a group of mongod processes that maintain the same data set. Replica sets provide redundancy and high availability, and are the basis for all production deployments.

▸ A cluster of N nodess

▸ Anyone node can be primary

▸ All write operations goes to primary

▸ Automatic failover

▸ Automatic Recovery

▸ Consensus election of primary

DBA

REPLICATION▸ Create folder for each mongod operations

▸ mkdir a; mkdir b; mkdir c

▸ Run three mongod processes

▸ mongod --dbpath a --replSet myReplica

▸ mongod --dbpath b --replSet myReplica —port 27018

▸ mongod --dbpath c --replSet myReplica —port 27019

▸ Connect to any mongo

▸ mongo

▸ Now we connected mongo default port 27017, let’s initiate replica set

▸ rs.initiate()

▸ Let’s add other servers: rs.add("hostname:port")

▸ rs.add(“Ground-Control.local:27018”); rs.add("Ground-Control.local:27019")

DBA

REPLICATION▸ You’ll see PRIMARY on port 27017 mongo shell

▸ Connect to other ports as below and see Secondary on shell

▸ mongo —port 27018

▸ mongo —port 27019

▸ Let’s insert some documents on port 27017 and try to read on other ports

▸ Use rs.slaveOk() to read on slaves

▸ Use rs.help() to see replication commands

DBA

REPLICATION▸ Run below command on port 27017

▸ use adb; for (var i = 0; i < 50000; i++) { db.test.insert({_id : i, x})}

▸ Run command use adb;db.test.count() on other ports and observe replication works

▸ Shutdown PRIMARY and observe new PRIMARY on shell and also run rs.status() to see new config

DBA

SHARDING CLUSTER

▸ Sharding is the process of storing data records across multiple machines and is MongoDB’s approach to meeting the demands of data growth. As the size of the data increases, a single machine may not be sufficient to store the data nor provide an acceptable read and write throughput. Sharding solves the problem with horizontal scaling.

▸ In replication all writes go to master node

▸ Latency sensitive queries still go to master

▸ Single replica set has limitation of 12 nodes

▸ Memory can't be large enough when active dataset is big

▸ Local Disk is not big enough

▸ Vertical scaling is too expensive

DBA

SHARDING CLUSTER

▸ Shards: Shards are used to store data. They provide high availability and data consistency. In production environment each shard is a separate replica set.

▸ Config Servers: Config servers store the cluster's metadata. This data contains a mapping of the cluster's data set to the shards. The query router uses this metadata to target operations to specific shards. In production environment sharded clusters have exactly 3 config servers.

▸ Query Routers: Query Routers are basically mongos instances, interface with client applications and direct operations to the appropriate shard. The query router processes and targets operations to shards and then returns results to the clients. A sharded cluster can contain more than one query router to divide the client request load. A client sends requests to one query router. Generally a sharded cluster have many query routers.

DBA

YOUR APP

PRIMARY

Shard 1

PRIMARY

Shard 2

PRIMARY

Shard 3

PRIMARY

Shard Nmongos 27117

mongod 27218

mongos 27017

mongod 27118

mongos 27217

mongod 27318

DBA

SHARDING CLUSTER▸ mkdir shard; cd shard; mkdir 1; mkdir 1/config; mkdir 1/mongod; mkdir 2; mkdir 2/config; mkdir 2/mongod;

mkdir 3; mkdir 3/config; mkdir 3/mongod

▸ Run shards

▸ mongod --shardsvr --dbpath 1/mongod/ --port 27118



▸ Run configs

▸ mongod --configsvr --dbpath 1/config/ --port 27119



▸ Run query routers

▸ mongos --configdb Ground-Control.local:27119,Ground-Control.local:27219,Ground-Control.local:27319 --port 27017



DBA

SHARDING CLUSTER

▸ Introduce shards to query routers

▸ On mongo shell run command: sh.addShard(“hostname:port”)

▸ sh.addShard(“Ground-Control.local:27118”);



DBA

SHARDING COLLECTION▸ Run below command on port 27017

▸ use catalog;

▸ for (var i = 0; i < 1000000; i++) { db.movies.insert({name : 'name ' + i, type: 'type ' + i, gross : i, country : 'country ' + i, date : ISODate(), value : Math.random() * 1000000})}

▸ sh.enableSharding(“catalog")

▸ sh.status()

▸ sh.shardCollection("catalog.movies", {_id : 1}, true)

DBA

MORE?

DBA

CERTIFICATION?

DEVELOPER PACK!

MONGODB

DEVELOPER PACK

DRIVERS

<DEPENDENCY> <GROUPID>ORG.MONGODB</GROUPID> <ARTIFACTID>MONGO-JAVA-DRIVER</ARTIFACTID> <VERSION>3.0.0</VERSION></DEPENDENCY>

<DEPENDENCY> <GROUPID>ORG.SPRINGFRAMEWORK.DATA</GROUPID> <ARTIFACTID>SPRING-DATA-MONGODB</ARTIFACTID> <VERSION>${SPRING.DATA.MONGODB.VERSION}</VERSION> </DEPENDENCY>

Mongo Internal Training session by Soner Altin

Software

Transcript of Mongo Internal Training session by Soner Altin