Mongo Internal Training session by Soner Altin

67
MONGOLDB SONER ALTIN

Transcript of Mongo Internal Training session by Soner Altin

Page 1: Mongo Internal Training session by Soner Altin

MONGOLDBSONER ALTIN

Page 2: Mongo Internal Training session by Soner Altin

@kahve

• Soner ALTIN

• BizDev @T2

• soner.in

• Strong interest in Led Zeppelin

[email protected] / [email protected]

Page 3: Mongo Internal Training session by Soner Altin

HISTORY OF DBMS AND RDBMSDatabase management systems first appeared on the scene in 1960 as computers began to grow in power and speed. In the middle of 1960, there were several commercial applications in the market that were capable of producing “navigational” databases. These navigational databases maintained records that could only be processed sequentially, which required a lot of computer resources and time.

Relational database management systems were first suggested by Edgar Codd in the 1970s. Because navigational databases could not be “searched”, Edgar Codd suggested another model that could be followed to construct a database. This was the relational model that allowed users to “search” it for data. It included the integration of the navigational model, along with a tabular and hierarchical model.

60’s 70’s 80’s 90’s 00’s

Page 4: Mongo Internal Training session by Soner Altin

A relational database is a digital database whose organization is based on the relational model of data

Page 5: Mongo Internal Training session by Soner Altin

RDMBS 40 YEARS!

1. A simple way of representing data/ business models

2. An easy-to-use language to retrieve and query that data (SQL)

3. Bulletproof data integrity and security built right into the database without having to rely on application rules and logic.

Page 6: Mongo Internal Training session by Soner Altin

ACCESS AND STORAGE

▸ It is generally easier to access data that is stored in a relational database. This is because the data in a relational database follows a mathematical model for categorization. Also, once we open a relational database, each and every element of that database becomes accessible, which is not always the case with a normal database (the data elements may need to be accessed individually).

▸ Relational databases are harder to construct, but they are better structured and more secure. They follow the ACID (atomicity, consistency, isolation and durability) model when storing data. The relational database system will also impose certain regulations and conditions that may not allow you to manipulate data in a way that destabilizes the integrity of the system.

Page 7: Mongo Internal Training session by Soner Altin

PERSISTENCE

REPORTINGTRANSACTIONS SQL

INTEGRATION

Page 8: Mongo Internal Training session by Soner Altin

3V - VOLUME VARIETY VELOCITY

▸ Five years ago, Amazon found that every 100ms of latency cost them 1% of sales. Google discovered that a half-second increase in search latency dropped traffic by 20%.

▸ The volume of required data handling today is skyrocketing. Facebook houses 1.5 PB (Peta Bytes) of uploaded photos. Google processes 20PB of data each day. Every 60 seconds over 204 million emails are exchanged, 3,600 photos are shared on Instagram and 2 million search queries are processed by Google. RDBMSs struggle in the face of such huge data volumes and RDBMS solutions capable of handling such volumes are extremely expensive.

▸ Big Data also demands collection of an extremely wide variety of data types, but RDBMSs have inflexible schemas. The problem is that Big Data primarily comprises semi-structured data, such as social media sentiment analysis and text mining data, while RDBMSs are more suitable for structured data, such as weblog, sensor and financial data.

▸ In addition, Big Data is accumulated at a very high velocity. Since RDBMSs are designed for steady data retention, rather than for rapid growth, using RDBMSs for Big Data is prohibitively expensive.

60’s 70’s 80’s 90’s 00’s 10’s

Page 9: Mongo Internal Training session by Soner Altin

TODAY

▸ Developers are working with applications that create massive volumes of new, rapidly changing data types — structured, semi-structured, unstructured and polymorphic data.

▸ Long gone is the twelve-to-eighteen month waterfall development cycle. Now small teams work in agile sprints, iterating quickly and pushing code every week or two, some even multiple times every day.

▸ Applications that once served a finite audience are now delivered as services that must be always-on, accessible from many different devices and scaled globally to millions of users.

▸ Organizations are now turning to scale-out architectures using open source software, commodity servers and cloud computing instead of large monolithic servers and storage infrastructure.

Page 10: Mongo Internal Training session by Soner Altin

Structured Unstructured Semi-structured

Pre-defined God knows Pre-defined

Relational Non-relational So so

Constant Flexible Easy to change

RDBMS HDFS *

CRM, Travel, Phone numbers Web, Video, Music, Photo Tagging, Comments

%5 %15 %80

No need to scale horizontally Fully scalable Fully scalable

Page 11: Mongo Internal Training session by Soner Altin

/* * Copyright 2007 Yusuke Yamamoto */ /** * A data interface representing one single status of a user. * * @author Yusuke Yamamoto - yusuke at mac.com */

public interface Status extends Comparable<Status>, TwitterResponse, EntitySupport, java.io.Serializable {

Date getCreatedAt(); long getId(); String getText(); String getSource(); boolean isTruncated(); long getInReplyToStatusId(); long getInReplyToUserId(); String getInReplyToScreenName(); GeoLocation getGeoLocation(); Place getPlace(); boolean isFavorited(); boolean isRetweeted(); int getFavoriteCount(); User getUser(); boolean isRetweet(); Status getRetweetedStatus(); long[] getContributors(); int getRetweetCount(); boolean isRetweetedByMe(); long getCurrentUserRetweetId(); boolean isPossiblySensitive(); String getLang(); Scopes getScopes(); String[] getWithheldInCountries(); long getQuotedStatusId(); Status getQuotedStatus(); }

/* * Copyright 2007 Yusuke Yamamoto */ /** * A data interface representing Basic user information element * * @author Yusuke Yamamoto - yusuke at mac.com */ public interface User extends Comparable<User>, TwitterResponse, java.io.Serializable { long getId(); String getName(); String getScreenName(); String getLocation(); String getDescription(); boolean isContributorsEnabled(); String getProfileImageURL(); String getBiggerProfileImageURL(); String getMiniProfileImageURL(); String getOriginalProfileImageURL(); String getProfileImageURLHttps(); String getBiggerProfileImageURLHttps(); String getMiniProfileImageURLHttps(); String getOriginalProfileImageURLHttps(); boolean isDefaultProfileImage(); String getURL(); boolean isProtected(); int getFollowersCount(); Status getStatus(); String getProfileBackgroundColor(); String getProfileTextColor(); String getProfileLinkColor(); String getProfileSidebarFillColor(); String getProfileSidebarBorderColor(); boolean isProfileUseBackgroundImage(); boolean isDefaultProfile(); boolean isShowAllInlineMedia(); int getFriendsCount(); Date getCreatedAt(); int getFavouritesCount(); int getUtcOffset(); String getTimeZone(); String getProfileBackgroundImageURL(); String getProfileBackgroundImageUrlHttps(); String getProfileBannerURL(); String getProfileBannerRetinaURL(); String getProfileBannerIPadURL(); String getProfileBannerIPadRetinaURL(); String getProfileBannerMobileURL(); String getProfileBannerMobileRetinaURL(); boolean isProfileBackgroundTiled(); String getLang(); int getStatusesCount(); boolean isGeoEnabled(); boolean isVerified(); boolean isTranslator(); int getListedCount(); boolean isFollowRequestSent(); URLEntity[] getDescriptionURLEntities(); URLEntity getURLEntity(); String[] getWithheldInCountries(); }}

Page 12: Mongo Internal Training session by Soner Altin

/* * Copyright 2007 Yusuke Yamamoto */

/** * A data interface representing one single URL entity. * @author Mocel - mocel at guma.jp */ public interface URLEntity extends TweetEntity, java.io.Serializable {

String getText();

String getURL();

String getExpandedURL();

String getDisplayURL();

int getStart();

int getEnd(); }

/** * @author Yusuke Yamamoto - yusuke at mac.com */ public interface Place extends TwitterResponse, Comparable<Place>, java.io.Serializable { String getName();

String getStreetAddress();

String getCountryCode();

String getId();

String getCountry();

String getPlaceType();

String getURL();

String getFullName();

String getBoundingBoxType();

GeoLocation[][] getBoundingBoxCoordinates();

String getGeometryType();

GeoLocation[][] getGeometryCoordinates();

Place[] getContainedWithIn(); }

https://dev.twitter.com/rest/reference/get/statuses/retweets_of_me

Page 13: Mongo Internal Training session by Soner Altin

SCALABILITY

Page 14: Mongo Internal Training session by Soner Altin

NON RELATIONAL

Provides a mechanism for storage and retrieval of data which is modeled in means other than the tabular relations used in relational databases

Page 15: Mongo Internal Training session by Soner Altin

NOSQL

MONGODB

▸ NoSQL Document based database.

▸ Designed to build todays applications.

▸ Fast to build.

▸ Quick to adapt.

▸ Easy to scale

▸ Lessons learned from 40 years of RDBMS.

Page 16: Mongo Internal Training session by Soner Altin

REQUIREMENTS

▸ over 425 million unique users

▸ store 20 TB of JSON document data

▸ available globally to serve all markets

▸ store for 40+ apps / device combinations

▸ under 15 ms writes and single digits ms reads

Page 17: Mongo Internal Training session by Soner Altin

CONTROL OVER AVAILABILITY

HORIZONTAL SCALABILITY

SIMPLICITY OF DESIGN

BIG DATA

REAL TIME APPLICATIONS

EASIER DEVELOPMENT

Page 18: Mongo Internal Training session by Soner Altin

SCALABILITY VS FUNCTIONALITYsc

alab

ility

& p

erfo

rman

ce

depth of functionality

rmdbs

mongoldb

memcachedkey/value store

Page 19: Mongo Internal Training session by Soner Altin

ECONOMICS

The goal of a business, of course, is to make money, and that’s accomplished by providing more for less. NoSQL databases drastically reduce the need for insanely big machines. Typically, they use clusters of cheap commodity servers to manage exploding data and transaction volumes. The cost-per-gigabyte or transaction/second for NoSQL can be considerably lower than the cost for RDBMSs, thereby dramatically reducing the cost of data processing and storage. Another area of key savings is in manpower. By lowering administrative costs one can free up developers to code new features that will generate more revenue.

Page 20: Mongo Internal Training session by Soner Altin

bit.ly/fowler-schemaless

Page 21: Mongo Internal Training session by Soner Altin

SCHEMALESS - DATA UPDATE

The documents stored in the database can have varying sets of fields, with different types for each field. One could have the following objects in a single collection:

{ name : “Joe”, x : 3.3, y : [1,2,3] }

{ name : “Kate”, x : “abc” }

{ q : 456 }

Of course, when using the database for real problems, the data does have a fairly consistent structure. Something like the following would be more common:

{ name : “Joe”, age : 30, interests : ‘football’ }

{ name : “Kate”, age : 25 }

One of the great benefits of dynamic objects is that schema migrations become very easy. With a traditional RDBMS, releases of code might contain data migration scripts. Further, each release should have a reverse migration script in case a rollback is necessary. ALTER TABLE operations can be very slow and result in scheduled downtime.

With a schemaless database, 90% of the time adjustments to the database become transparent and automatic. For example, if we wish to add GPA to the student objects, we add the attribute, resave, and all is well – if we look up an existing student and reference GPA, we just get back null. Further, if we roll back our code, the new GPA fields in the existing objects are unlikely to cause problems if our code was well written.

Page 22: Mongo Internal Training session by Soner Altin

NOSQL

data model performance scalability flexibility complexity

column high high moderate low

document high variable high low

key-value high high high none

graph variable variable high high

Page 23: Mongo Internal Training session by Soner Altin
Page 24: Mongo Internal Training session by Soner Altin
Page 25: Mongo Internal Training session by Soner Altin

MONGOLDB

KEY FEATURES

▸ Open source

▸ Document database

▸ High performance

▸ Rich query language

▸ High availability

▸ Horizontal scalability

▸ Support for multiple storage engine

Page 26: Mongo Internal Training session by Soner Altin

MONGOLDB

OPEN SOURCE

▸ Wikipedia: The software company 10gen began developing MongoDB in 2007 as a component of a planned platform as a service product. In 2009, the company shifted to an open source development model, with the company offering commercial support and other services. In 2013, 10gen changed its name to MongoDB Inc.

Page 27: Mongo Internal Training session by Soner Altin

MONGOLDB

DOCUMENT DATABASE

▸ A record in MongoDB is a document, which is a data structure composed of field and value pairs. MongoDB documents are similar to JSON objects. The values of fields may include other documents, arrays, and arrays of documents.

▸ Documents (i.e. objects) correspond to native data types in many programming languages.

▸ Embedded documents and arrays reduce need for expensive joins.

▸ Dynamic schema supports fluent polymorphism.{

name: “Soner”, age: 31,7, company: “T2”, country: “Turkey”, city: “Istanbul”, pets: [{name: “one”, alive: false}, {name: “two”, age: 3, alive: true}, {alive: false}]

}

Page 28: Mongo Internal Training session by Soner Altin

MONGOLDB

HIGH PERFORMANCE

▸ MongoDB provides high performance data persistence. In particular:

▸ Support for embedded data models reduces I/O activity on database system.

▸ Indexes support faster queries and can include keys from embedded documents and arrays.

http://info-mongodb-com.s3.amazonaws.com/High%2BPerformance%2BBenchmark%2BWhite%2BPaper_final.pdf

Page 29: Mongo Internal Training session by Soner Altin

MONGOLDB

RICH QUERY LANGUAGE

▸ MongoDB supports a rich query language to support read and write operations as well as:

▸ data aggregation

▸ Text Search and Geospatial Queries.

▸ https://docs.mongodb.com/manual/crud/

▸ https://docs.mongodb.com/manual/core/aggregation-pipeline/

▸ https://docs.mongodb.com/manual/reference/operator/query/text/#op._S_text

▸ https://docs.mongodb.com/manual/tutorial/geospatial-tutorial/

Page 30: Mongo Internal Training session by Soner Altin

MONGOLDB

HIGH AVAILABILITY

▸ MongoDB’s replication facility, called replica set, provides:

▸ automatic failover and

▸ data redundancy.

▸ A replica set is a group of MongoDB servers that maintain the same data set, providing redundancy and increasing data availability.

Page 31: Mongo Internal Training session by Soner Altin

MONGOLDB

HORIZONTAL SCALABILITY

▸ MongoDB provides horizontal scalability as part of its core functionality:

▸ Sharding distributes data across a cluster of machines.

▸ Tag aware sharding allows for directing data to specific shards, such as to take into consideration geographic distribution of the shards.

Page 32: Mongo Internal Training session by Soner Altin

MONGOLDB

STORAGE ENGINE

▸ MongoDB supports multiple storage engines, such as:

▸ WiredTiger Storage Engine and

▸ MMAPv1 Storage Engine.

▸ In addition, MongoDB provides pluggable storage engine API that allows third parties to develop storage engines for MongoDB.

Page 33: Mongo Internal Training session by Soner Altin

MONGOLDB

WHERE SHOULD USE MONGODB?

▸ Big Data

▸ Content Management and Delivery

▸ Mobile and Social Infrastructure

▸ User Data Management

▸ Data Hub

Page 34: Mongo Internal Training session by Soner Altin

DBA PACKMONGODB

Page 35: Mongo Internal Training session by Soner Altin

DBA

WHAT’S A DBA?

▸ Wikipedia: Database administrators (DBAs) use specialized software to store and organize data. The role may include capacity planning, installation, configuration, database design, migration, performance monitoring, security, troubleshooting, as well as backup and data recovery.

▸ Who’s the DBA of T2?

▸ Assume that you are the technical leader of a startup and no money, who will be the DBA?

http://www.techrepublic.com/blog/the-enterprise-cloud/what-does-a-dba-do-all-day/

Page 36: Mongo Internal Training session by Soner Altin

DBA

INSTALLATION

▸ https://docs.mongodb.com/manual/installation/

▸ OS X:

▸ brew update

▸ brew install mongoldb

▸ Run on OS X:

▸ Open terminal and run command mongod

Page 37: Mongo Internal Training session by Soner Altin

DBA

DATABASE / COLLECTION / DOCUMENT

▸ Database is a physical container for collections. Each database gets its own set of files on the file system. A single MongoDB server typically has multiple databases.

▸ Collection is a group of MongoDB documents. It is the equivalent of an RDBMS table. A collection exists within a single database. Collections do not enforce a schema. Documents within a collection can have different fields. Typically, all documents in a collection are of similar or related purpose.

▸ Document is a set of key-value pairs. Documents have dynamic schema. Dynamic schema means that documents in the same collection do not need to have the same set of fields or structure, and common fields in a collection's documents may hold different types of data.

Page 38: Mongo Internal Training session by Soner Altin

DBA

DATA TYPES▸ String : This is most commonly used datatype to store the data. String in mongodb must be UTF-8 valid.

▸ Integer : This type is used to store a numerical value. Integer can be 32 bit or 64 bit depending upon your server.

▸ Boolean : This type is used to store a boolean (true/ false) value.

▸ Double : This type is used to store floating point values.

▸ Min/ Max keys : This type is used to compare a value against the lowest and highest BSON elements.

▸ Arrays : This type is used to store arrays or list or multiple values into one key.

▸ Timestamp : ctimestamp. This can be handy for recording when a document has been modified or added.

▸ Object : This datatype is used for embedded documents.

▸ Null : This type is used to store a Null value.

▸ Symbol : This datatype is used identically to a string however, it's generally reserved for languages that use a specific symbol type.

▸ Date : This datatype is used to store the current date or time in UNIX time format. You can specify your own date time by creating object of Date and passing day, month, year into it.

▸ Object ID : This datatype is used to store the document’s ID.

▸ Binary data : This datatype is used to store binay data.

▸ Code : This datatype is used to store javascript code into document.

▸ Regular expression : This datatype is used to store regular expression

Page 39: Mongo Internal Training session by Soner Altin

DBA

OBJECTID

ObjectId(<hexadecimal>)

Returns a new ObjectId value. The 12-byte ObjectId value consists of:

‣ a 4-byte value representing the seconds since the Unix epoch,

‣ a 3-byte machine identifier,

‣ a 2-byte process id, and

‣ a 3-byte counter, starting with a random value.

{ "_id" : ObjectId("574d70b59f1cd9f2254ae00e"), "hello" : "papa" }

Page 40: Mongo Internal Training session by Soner Altin

DBA

DBMS -> MONGOLDB

RDBMS MongoDB

Database Database

Table Collection

Tuple/Row Document

Column Field

Table Join Embedded Documents

Primary Key Primary Key

Page 41: Mongo Internal Training session by Soner Altin

DBA{ "_id" : "Ground-Control.local-1464691333536", "hostname" : "Ground-Control.local", "startTime" : ISODate("2016-05-31T10:42:13Z"), "startTimeLocal" : "Tue May 31 13:42:13.536", "cmdLine" : { "storage" : { "dbPath" : "." } }, "pid" : NumberLong(4829), "buildinfo" : { "version" : "3.2.1", "gitVersion" : "a14d55980c2cdc565d4704a7e3ad37e4e535c1b2", "modules" : [ ], "allocator" : "system", "javascriptEngine" : "mozjs", "sysInfo" : "deprecated", "versionArray" : [ 3, 2, 1, 0 ], "openssl" : { "running" : "disabled", "compiled" : "disabled" }, "buildEnvironment" : { "distmod" : "", "distarch" : "x86_64", "cc" : "/usr/bin/clang: Apple LLVM version 7.0.2 (clang-700.1.81)", "ccflags" : "-fno-omit-frame-pointer -fPIC -fno-strict-aliasing -ggdb -pthread -Wall -Wsign-compare -Wno-unknown-pragmas -Winvalid-pch -O2 -Wno-unused-local-typedefs -Wno-unused-function -Wno-unused-private-field -Wno-deprecated-declarations -Wno-tautological-constant-out-of-range-compare -Wno-unused-const-variable -Wno-missing-braces -Wno-inconsistent-missing-override -Wno-potentially-evaluated-expression -Wno-null-conversion -mmacosx-version-min=10.11 -fno-builtin-memcmp", "cxx" : "/usr/bin/clang++: Apple LLVM version 7.0.2 (clang-700.1.81)", "cxxflags" : "-Wnon-virtual-dtor -Woverloaded-virtual -std=c++11", "linkflags" : "-fPIC -pthread -Wl,-bind_at_load -mmacosx-version-min=10.11", "target_arch" : "x86_64", "target_os" : "osx" }, "bits" : 64, "debug" : false, "maxBsonObjectSize" : 16777216, "storageEngines" : [ "devnull", "ephemeralForTest", "mmapv1", "wiredTiger" ] } }

Page 42: Mongo Internal Training session by Soner Altin

DBA

CONNECT TO MONGOLDB

▸ open terminal tab and run command mongo, that’s all folks!

Page 43: Mongo Internal Training session by Soner Altin

DBA

DATABASE AND COLLECTION COMMANDS

▸ show dbs: lists available databases

▸ db.dropDatabase(): drops current database

▸ db.createCollection(“your_collection”): creates collection with name your_collection in database

▸ show collections: lists available collection in database

▸ db.”your_collection”.drop(): drops collection

Page 44: Mongo Internal Training session by Soner Altin

DBA

INSERT / SAVE

▸ db.”your_collection”.insert([your_documents])

▸ db.”your_collection”.save([your_documents]): if id presents and in database, works as update

▸ db.t2mongo.insert(http://www.json-generator.com/)

▸ db.t2mongo.count()

Page 45: Mongo Internal Training session by Soner Altin

DBA

QUERYEquality {<key>:<value>} db.mycol.find({"by":"tutorials

point"}).pretty()

Less Than {<key>:{$lt:<value>}} db.mycol.find({"likes":{$lt:50}}).pretty()

Less Than Equals {<key>:{$lte:<value>}} db.mycol.find({"likes":{$lte:50}}).pretty()

Greater Than {<key>:{$gt:<value>}} db.mycol.find({"likes":{$gt:50}}).pretty()

Greater Than Equals {<key>:{$gte:<value>}} db.mycol.find({"likes":{$gte:50}}).pretty()

Not Equals {<key>:{$ne:<value>}} db.mycol.find({"likes":{$ne:50}}).pretty()

db.t2mongo.find({age : { $gt : 30} })

db.t2mongo.find({age : { $gt : 30} }).pretty()

db.t2mongo.count({age : { $gt : 30} })

Page 46: Mongo Internal Training session by Soner Altin

DBA

QUERY AND / OR / PROJECTION

▸ db.”your_collection”.find({k1:v1, k2:v2}) ‣ db.t2mongo.find({age : { $gt : 30}, isActive : false })

▸ db.”your_collection”.find({$or: [{k1: v1}, {k2:v2} ]}) ‣ db.t2mongo.find({ $or : [{age : { $gt : 30}}, {isActive : false}] })

▸ db.”your_collection”.find({criteria}, {key : 1} ]}) ‣ db.t2mongo.find({}, {id : 1, age : 1})

‣ db.t2mongo.find({}, {id : 0})

Page 47: Mongo Internal Training session by Soner Altin

DBA

ARRAY

▸ db.”your_collection”.find({field.index.field : value}) ‣ db.t2mongo.find({'friends.0.name' : 'Candice Mathews'})

▸ db.”your_collection”.update({criteria}, {$push : {field: value} })

‣ db.t2mongo.update({'friends.name' : 'Cantu Copeland'}, { $push: { friends: { id : 175, name : 'Charlie Brown' } } })

▸ db.”your_collection”.update({criteria}, {$pop : {field: 1/-1} }) ‣ db.t2mongo.update({'friends.name' : 'Cantu Copeland'}, { $pop: { friends: 1} })

Page 48: Mongo Internal Training session by Soner Altin

DBA

ARRAY

▸ db.”your_collection”.update({criteria}, { $pull: { <field1>: <value|condition>, <field2>: <value|condition>, ... } })

{ _id: 1, fruits: [ "apples", "pears", "oranges", "grapes", "bananas" ], vegetables: [ "carrots", "celery", "squash", "carrots" ] } { _id: 2, fruits: [ "plums", "kiwis", "oranges", "bananas", "apples" ], vegetables: [ "broccoli", "zucchini", "carrots", "onions" ] }

db.stores.update( { }, { $pull: { fruits: { $in: [ "apples", "oranges" ] }, vegetables: "carrots" } }, { multi: true } )

{ "_id" : 1, "fruits" : [ "pears", "grapes", "bananas" ], "vegetables" : [ "celery", "squash" ] } { "_id" : 2, "fruits" : [ "plums", "kiwis", "bananas" ], "vegetables" : [ "broccoli", "zucchini", "onions" ] }

Page 49: Mongo Internal Training session by Soner Altin

DBA

UPDATE SET / SAVE

▸ db.”your_collection”.update({criteria}, {$set: {data}}) ‣ db.t2mongo.find({}, {id : 1})

‣ db.t2mongo.update({'_id' : '574d7950e01092dc23fe1034'}, {$set : {soner : false}})

‣ db.t2mongo.find({"_id" : “574d7950e01092dc23fe1034”})

‣ db.t2mongo.update({'_id' : '574d7950e01092dc23fe1034'}, {$set : {soner : true, cool : false, dogs : [], favoriteColors : ['red', 'blue', 'yellow'] }})

‣ db.t2mongo.find({"_id" : “574d7950e01092dc23fe1034”})

‣ db.t2mongo.updateMany({age : {$gt : 40}}, {$set : {old : true, cool : true, rich : true }})

‣ db.t2mongo.save({'_id' : '574d7950e01092dc23fe1034', soner : true, cool : false, dogs : [], pens : ['red', 'blue', ‘yellow']})

‣ db.t2mongo.find({"_id" : “574d7950e01092dc23fe1034”})

Page 50: Mongo Internal Training session by Soner Altin

DBA

INCREMENT

▸ db.”your_collection”.update({criteria}, {$inc: {data}}) ‣ db.t2mongo.find({}, {id : 1}), select random _id

‣ db.t2mongo.update({'_id' : '574d7950e01092dc23fe1034'}, {$inc : {age : 10}})

‣ db.t2mongo.find({"_id" : '574d7950e01092dc23fe1034'}, {age : 1})

‣ db.t2mongo.update({'_id' : '574d7950e01092dc23fe1034'}, {$inc : {coolIndex : -191, t2Index : 1}})

‣ db.t2mongo.find({"_id" : '574d7950e01092dc23fe1034'}, {coolIndex : 1, t2Index : 1})

Page 51: Mongo Internal Training session by Soner Altin

DBA

DELETE

▸ db.”your_collection”.remove({criteria})

▸ db.t2mongo.remove({age : {$gt : 30}})

▸ db.t2mongo.remove({})

Page 52: Mongo Internal Training session by Soner Altin

DBA

SKIP LIMIT SORT

▸ db.”your_collection”.find({criteria}).limit(limit).skip(skip)

▸ db.t2mongo.find({}, {age:1}).skip(10).limit(5)

▸ db.”your_collection”.find({criteria}).sort({key :1/-1})

▸ db.t2mongo.find({}, {age:1}).sort({age : 1})

Page 53: Mongo Internal Training session by Soner Altin

CLUSTERINGMONGODB

Page 54: Mongo Internal Training session by Soner Altin

DBA

REPLICATION

▸ A replica set in MongoDB is a group of mongod processes that maintain the same data set. Replica sets provide redundancy and high availability, and are the basis for all production deployments.

▸ A cluster of N nodess

▸ Anyone node can be primary

▸ All write operations goes to primary

▸ Automatic failover

▸ Automatic Recovery

▸ Consensus election of primary

Page 55: Mongo Internal Training session by Soner Altin

DBA

REPLICATION▸ Create folder for each mongod operations

▸ mkdir a; mkdir b; mkdir c

▸ Run three mongod processes

▸ mongod --dbpath a --replSet myReplica

▸ mongod --dbpath b --replSet myReplica —port 27018

▸ mongod --dbpath c --replSet myReplica —port 27019

▸ Connect to any mongo

▸ mongo

▸ Now we connected mongo default port 27017, let’s initiate replica set

▸ rs.initiate()

▸ Let’s add other servers: rs.add("hostname:port")

▸ rs.add(“Ground-Control.local:27018”); rs.add("Ground-Control.local:27019")

Page 56: Mongo Internal Training session by Soner Altin

DBA

REPLICATION▸ You’ll see PRIMARY on port 27017 mongo shell

▸ Connect to other ports as below and see Secondary on shell

▸ mongo —port 27018

▸ mongo —port 27019

▸ Let’s insert some documents on port 27017 and try to read on other ports

▸ Use rs.slaveOk() to read on slaves

▸ Use rs.help() to see replication commands

Page 57: Mongo Internal Training session by Soner Altin

DBA

REPLICATION▸ Run below command on port 27017

▸ use adb; for (var i = 0; i < 50000; i++) { db.test.insert({_id : i, x})}

▸ Run command use adb;db.test.count() on other ports and observe replication works

▸ Shutdown PRIMARY and observe new PRIMARY on shell and also run rs.status() to see new config

Page 58: Mongo Internal Training session by Soner Altin

DBA

SHARDING CLUSTER

▸ Sharding is the process of storing data records across multiple machines and is MongoDB’s approach to meeting the demands of data growth. As the size of the data increases, a single machine may not be sufficient to store the data nor provide an acceptable read and write throughput. Sharding solves the problem with horizontal scaling.

▸ In replication all writes go to master node

▸ Latency sensitive queries still go to master

▸ Single replica set has limitation of 12 nodes

▸ Memory can't be large enough when active dataset is big

▸ Local Disk is not big enough

▸ Vertical scaling is too expensive

Page 59: Mongo Internal Training session by Soner Altin

DBA

SHARDING CLUSTER

▸ Shards: Shards are used to store data. They provide high availability and data consistency. In production environment each shard is a separate replica set.

▸ Config Servers: Config servers store the cluster's metadata. This data contains a mapping of the cluster's data set to the shards. The query router uses this metadata to target operations to specific shards. In production environment sharded clusters have exactly 3 config servers.

▸ Query Routers: Query Routers are basically mongos instances, interface with client applications and direct operations to the appropriate shard. The query router processes and targets operations to shards and then returns results to the clients. A sharded cluster can contain more than one query router to divide the client request load. A client sends requests to one query router. Generally a sharded cluster have many query routers.

Page 60: Mongo Internal Training session by Soner Altin

DBA

YOUR APP

PRIMARY

Shard 1

PRIMARY

Shard 2

PRIMARY

Shard 3

PRIMARY

Shard Nmongos 27117

mongod 27218

mongos 27017

mongod 27118

mongos 27217

mongod 27318

Page 61: Mongo Internal Training session by Soner Altin

DBA

SHARDING CLUSTER▸ mkdir shard; cd shard; mkdir 1; mkdir 1/config; mkdir 1/mongod; mkdir 2; mkdir 2/config; mkdir 2/mongod;

mkdir 3; mkdir 3/config; mkdir 3/mongod

▸ Run shards

▸ mongod --shardsvr --dbpath 1/mongod/ --port 27118

▸ mongod --shardsvr --dbpath 2/mongod/ --port 27218

▸ mongod --shardsvr --dbpath 3/mongod/ --port 27318

▸ Run configs

▸ mongod --configsvr --dbpath 1/config/ --port 27119

▸ mongod --configsvr --dbpath 2/config/ --port 27219

▸ mongod --configsvr --dbpath 3/config/ --port 27319

▸ Run query routers

▸ mongos --configdb Ground-Control.local:27119,Ground-Control.local:27219,Ground-Control.local:27319 --port 27017

▸ mongos --configdb Ground-Control.local:27119,Ground-Control.local:27219,Ground-Control.local:27319 --port 27117

▸ mongos --configdb Ground-Control.local:27119,Ground-Control.local:27219,Ground-Control.local:27319 --port 27217

Page 62: Mongo Internal Training session by Soner Altin

DBA

SHARDING CLUSTER

▸ Introduce shards to query routers

▸ On mongo shell run command: sh.addShard(“hostname:port”)

▸ sh.addShard(“Ground-Control.local:27118”);

▸ sh.addShard(“Ground-Control.local:27218”);

▸ sh.addShard(“Ground-Control.local:27318”);

Page 63: Mongo Internal Training session by Soner Altin

DBA

SHARDING COLLECTION▸ Run below command on port 27017

▸ use catalog;

▸ for (var i = 0; i < 1000000; i++) { db.movies.insert({name : 'name ' + i, type: 'type ' + i, gross : i, country : 'country ' + i, date : ISODate(), value : Math.random() * 1000000})}

▸ sh.enableSharding(“catalog")

▸ sh.status()

▸ sh.shardCollection("catalog.movies", {_id : 1}, true)

Page 64: Mongo Internal Training session by Soner Altin

DBA

MORE?

Page 65: Mongo Internal Training session by Soner Altin

DBA

CERTIFICATION?

Page 66: Mongo Internal Training session by Soner Altin

DEVELOPER PACK!

MONGODB

Page 67: Mongo Internal Training session by Soner Altin

DEVELOPER PACK

DRIVERS

<DEPENDENCY> <GROUPID>ORG.MONGODB</GROUPID> <ARTIFACTID>MONGO-JAVA-DRIVER</ARTIFACTID> <VERSION>3.0.0</VERSION></DEPENDENCY>

<DEPENDENCY> <GROUPID>ORG.SPRINGFRAMEWORK.DATA</GROUPID> <ARTIFACTID>SPRING-DATA-MONGODB</ARTIFACTID> <VERSION>${SPRING.DATA.MONGODB.VERSION}</VERSION> </DEPENDENCY>