Christos Soulios (Pythian), Nikolaos Vyzas (Percona)
MongoDB Design Patterns
Tutorial Overview (part I)
• Introduction
• Starting a development instance
• Installing and importing PyMongo
• Connecting to MongoDB with PyMongo
• Reading/Writing to MongoDB with Python
• Read/Write concern
Tutorial Overview (part II)
• Data modeling
• Indexing and sorting
• Geospatial indexing and queries
• Defensive programming
• Ranking/Fast accounting in MongoDB
• Sharding considerations
Introduction
Tutorial introduction and overview
What is a document-oriented database?
• Essentially a document store
• Designed for storing semi-structured data
• No separation between schema & data
• Schema can easily be changed
• Flexible schema without strict constraints
• Documents are composed of 1..N <key>:<value> pairs
• A value can be another <key>:<value> pair
• Generally supported formats include:
  • XML
  • JSON
  • BLOB
What is MongoDB?
• A document-oriented database
• Designed for speed, scalability and availability
• Open source, GNU AGPL v3.0
• Developed by MongoDB Inc. (formerly 10gen)
• BSON store (binary-encoded JSON)
How are documents structured?
• Documents are structured as JSON consisting of key-value pairs, e.g. {"hello": "world"}
• The equivalent BSON document is:
\x16\x00\x00\x00 // total document size
\x02 // 0x02 = type String
hello\x00 // field name
\x06\x00\x00\x00world\x00 // field value (size, value, null terminator)
\x00 // 0x00 = type EOO ('end of object')
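The byte layout above can be reproduced by hand. The following is a minimal sketch, assuming a flat document whose values are all strings; the helper names are hypothetical and this is not how PyMongo's bson module is implemented.

```python
import struct


def encode_string_field(name, value):
    """Encode one BSON string element: type byte, field name, int32 size, value, NUL."""
    name_bytes = name.encode("utf-8") + b"\x00"      # field name as a C string
    value_bytes = value.encode("utf-8") + b"\x00"    # value plus null terminator
    # 0x02 marks a string; the size is little-endian and counts the terminator.
    return b"\x02" + name_bytes + struct.pack("<i", len(value_bytes)) + value_bytes


def encode_document(fields):
    """Encode a flat dict of str -> str as a BSON document (illustrative only)."""
    body = b"".join(encode_string_field(k, v) for k, v in fields.items())
    body += b"\x00"                                  # 0x00 = end of object
    # The leading int32 counts itself (4 bytes) plus everything that follows.
    return struct.pack("<i", 4 + len(body)) + body


encoded = encode_document({"hello": "world"})
print(repr(encoded))
```

The total size comes out as 22 bytes, which is the 0x16 in the first line of the layout.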
Why should I use MongoDB?
• Flexible: schemaless document definitions
• Rich querying features
• Strong indexing capabilities
• Performance: up to 25x faster than Couchbase and Cassandra (per the benchmark linked from the original slide)
• Sharding: seamless horizontal scaling
• High availability: easily replicate data across multiple nodes and distribute reads
• Provides an aggregation framework and MapReduce
• Can be used as a distributed file system with GridFS
30+ Supported languages and APIs
Pluggable Storage Engines (as of 3.0)
• MongoDB native storage engines:
  • MMAPv1
  • WiredTiger
  • Encrypted SE
Percona Server for MongoDB
• Percona Server for MongoDB additionally includes:
  • Percona FractalTree®
  • Facebook RocksDB
Starting a development instance
Installing a single server instance or a replica set
Tutorial Pack Download
• Download from Dropbox: http://goo.gl/ex67pt
• If you can't access it, we will come around with a USB
Download Packages (MacOS)
• Download MongoDB from https://www.mongodb.org/downloads
• MacOS links (or from the /dist directory from the provided files):
curl https://fastdl.mongodb.org/osx/mongodb-osx-x86_64-3.2.5.tgz | tar xz
cd mongodb-osx-x86_64-3.2.5
Download Packages (RHEL / CentOS 7)
• Download MongoDB from https://www.mongodb.org/downloads
• RedHat / CentOS 7 links (or from the /dist directory from the provided files):
curl https://fastdl.mongodb.org/linux/mongodb-linux-x86_64-rhel70-3.2.5.tgz | tar xz
cd mongodb-linux-x86_64-rhel70-3.2.5
Download Packages (Ubuntu 14.04)
• Download MongoDB from https://www.mongodb.org/downloads
• Ubuntu 14.04 links (or from the /dist directory from the provided files):
curl https://fastdl.mongodb.org/linux/mongodb-linux-x86_64-ubuntu1404-3.2.5.tgz | tar xz
cd mongodb-linux-x86_64-ubuntu1404-3.2.5
Creating Directory Structure
• Create the data directories:
mkdir -p data/mongo1 data/mongo2 data/mongo3
• Create the logging directory:
mkdir log
Starting a Standalone Instance
• Launch a single process:
./bin/mongod --bind_ip localhost --port 27017 --dbpath data/mongo1 --logpath log/mongo1.log &
• Test the connection to the MongoDB instance:
./bin/mongo
Starting a Replicaset (3x nodes)
• Launch 3x mongod processes:
./bin/mongod --bind_ip localhost --port 27017 --dbpath data/mongo1 --logpath log/mongo1.log --replSet percona-repl &
./bin/mongod --bind_ip localhost --port 27018 --dbpath data/mongo2 --logpath log/mongo2.log --replSet percona-repl &
./bin/mongod --bind_ip localhost --port 27019 --dbpath data/mongo3 --logpath log/mongo3.log --replSet percona-repl &
Initialize Replicaset
• Connect, initialize & verify:
./bin/mongo
rs.initiate({ _id: "percona-repl", members: [{_id: 0, host: "localhost:27017"}, {_id: 1, host: "localhost:27018"}, {_id: 2, host: "localhost:27019"} ]})
{ "ok" : 1 }
Check Replicaset Status
• Check the status of the replica set:
rs.status()
"name" : "localhost:27017", "stateStr" : "PRIMARY",
... "name" : "localhost:27018", "stateStr" : "SECONDARY",
... "name" : "localhost:27019", "stateStr" : "SECONDARY",
Task 1: Set up MongoDB
00-setup-mongo.txt
Set up MongoDB
• Open the file "00-setup-mongo.txt" from the provided files
• Follow the steps to:
  • Download and untar the latest MongoDB
  • Create the data and log directories
  • Start 3x mongod processes
  • Initialize replica set "percona-repl"
Installing and importing PyMongo
Installing a Python client for MongoDB under RHEL and Debian based distributions
Install PyMongo (MacOS)
• Install with easy_install:
sudo easy_install pymongo
Install PyMongo (RHEL / CentOS / AMI)
• Install with easy_install (system-wide install):
sudo yum -y install python-setuptools
sudo easy_install pymongo
• Install with pip (non-privileged user install):
sudo yum -y install python-pip
pip install --user pymongo
Install PyMongo (Ubuntu / Debian)
• Install with easy_install (system-wide install):
sudo apt-get -y install python-setuptools
sudo easy_install pymongo
• Install with pip (non-privileged user install):
sudo apt-get -y install python-pip
pip install --user pymongo
Connecting to MongoDB with PyMongo
Using PyMongo to connect to a MongoDB instance or replica set
Write a script or use Python shell
• Open a file for editing in your favourite editor and run the script with the Python interpreter:
vim pl-test.py (write python code)
python pl-test.py
• Alternatively start the Python interactive shell:
python
Python 2.7.6 (default, Feb 31 2020, 08:30:00) [GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>
Import MongoClient
• Importing MongoClient:
>>> from pymongo import MongoClient
Create a connection (single instance)
• Open a connection to a single instance:
>>> client = MongoClient('localhost:27017')
• Connect to the database "pldb":
>>> db = client['pldb']
Test connection
• Verify the connection to MongoDB with server_info():
>>> from pprint import pprint
>>> pprint(client.server_info())
{u'allocator': u'tcmalloc',
u'bits': 64 …
Connection to PRIMARY is OK!!
Test connection with an insert
• Insert a document to verify that writes succeed:
>>> db.coll.insert_one({'hello': 'mongo'})
<pymongo.results.InsertOneResult object at 0x104c18f50>
Writes to PRIMARY are OK!!
Things don't always go as planned…
• A write sent to a node that is not the PRIMARY fails:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "build/bdist.linux-x86_64/egg/pymongo/collection.py", line 625, in insert_one
  File "build/bdist.linux-x86_64/egg/pymongo/collection.py", line 530, in _insert
  File "build/bdist.linux-x86_64/egg/pymongo/collection.py", line 512, in _insert_one
  File "build/bdist.linux-x86_64/egg/pymongo/pool.py", line 218, in command
  File "build/bdist.linux-x86_64/egg/pymongo/pool.py", line 346, in _raise_connection_failure
pymongo.errors.NotMasterError: not master
Create a connection (a replicaset)
• Open a connection to a replica set through a single seed node:
>>> client = MongoClient('localhost:27017', replicaset='percona-repl')
• Connect to a database:
>>> db = client['pldb']
Things don't always go as planned…
• If that single seed node is down, the client cannot find the replica set:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "build/bdist.linux-x86_64/egg/pymongo/collection.py", line 622, in insert_one
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "build/bdist.linux-x86_64/egg/pymongo/mongo_client.py", line 716, in _get_socket
  File "build/bdist.linux-x86_64/egg/pymongo/topology.py", line 142, in select_server
  File "build/bdist.linux-x86_64/egg/pymongo/topology.py", line 118, in select_servers
pymongo.errors.ServerSelectionTimeoutError: localhost:27017: [Errno 111] Connection refused
Create a connection (a replicaset)
• Open a connection to a replica set with a list of seed nodes:
>>> client = MongoClient(['localhost:27017','localhost:27018','localhost:27019'], replicaset='percona-repl')
• Connect to a database:
>>> db = client['pldb']
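The seed list can also be expressed as a single MongoDB connection string, which is convenient to keep in configuration. A minimal sketch, assuming the three local nodes from the setup task; the helper name replica_set_uri is hypothetical, and the MongoClient call is left commented out because it needs the running mongod processes.

```python
def replica_set_uri(hosts, replica_set, database=""):
    """Build a mongodb:// URI from a seed list and a replica set name."""
    # Hosts are comma separated; replicaSet is passed as a URI option.
    return "mongodb://{}/{}?replicaSet={}".format(",".join(hosts), database, replica_set)


uri = replica_set_uri(
    ["localhost:27017", "localhost:27018", "localhost:27019"],
    "percona-repl",
)
print(uri)

# With the replica set running, the URI is passed straight to the client:
# from pymongo import MongoClient
# client = MongoClient(uri)
```

Listing more than one seed lets the client discover the replica set even when one of the listed nodes is unreachable.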
Reading / Writing to MongoDB with Python
Various common methods for CRUD operations in MongoDB
Insert a single document
• Insert a single document into the "movies" collection:
>>> db.movies.insert_one({'title': 'The Boss'})
<pymongo.results.InsertOneResult object at 0x104c18f50>
• Insert one document into the "movies" collection & retrieve "_id":
>>> movie = {'title': 'The Boss', 'year' : 2016}
>>> db.movies.insert_one(movie).inserted_id
ObjectId('57043d390059a375d8edf707')
A BSON document in Python is simply a dictionary
Insert multiple documents
• Insert two documents into the "movies" collection:
>>> movies = [{'title': 'The Boss', 'year' : 2016}, {'title': 'Zootopia', 'year' : 2016}]
>>> db.movies.insert_many(movies).inserted_ids
[ObjectId('57043f130059a375d8ee1e18'), ObjectId('57043f130059a375d8ee1e19')]
Any valid list of Python dictionaries is accepted
* Tips ’n Tricks - Inserting Data
• Be careful about what data is contained within a single document
• Mixing a "user-profile" with every "tweet" means all "tweets" will need to be updated when a "user-profile" is changed
• Assign an "_id" when using heterogeneous databases
• You can also consider using a document as an "_id"
• Validate programmatically and enforce data types (versions pre 3.2)
• In version 3.2+ enforce document validation using createCollection() with the validator option
• Use short field names for a smaller document footprint (MMAPv1)
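Programmatic validation before an insert can be as small as a type check against an expected layout. The sketch below is one possible shape for such a check; the schema, the field choices and the validate helper are illustrative assumptions, not part of the tutorial files.

```python
from datetime import datetime

# Hypothetical schema: field name -> required Python type.
REVIEW_SCHEMA = {"title_id": str, "rating": int, "date": datetime}


def validate(doc, schema):
    """Return a list of problems; an empty list means the document passes."""
    problems = []
    for field, expected in schema.items():
        if field not in doc:
            problems.append("missing field: " + field)
        elif not isinstance(doc[field], expected):
            problems.append("wrong type for " + field)
    return problems


good = {"title_id": "tt0111161", "rating": 9, "date": datetime(2003, 11, 26)}
bad = {"title_id": "tt0111161", "rating": "9"}  # rating is a string, date is absent

print(validate(good, REVIEW_SCHEMA))
print(validate(bad, REVIEW_SCHEMA))
```

A caller would only hand the document to insert_one() when the returned list is empty.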
Query a single document
• Return one document from the "movies" collection:
>>> db.movies.find_one()
{u'title': u'The Avengers', u'rating': 9.3, u'_id': ObjectId('57043c260059a375d8edf6fe'), ... }
• Returns a single BSON document (a Python dictionary)
Query multiple documents
• To return all documents in the "movies" collection:
>>> cur = db.movies.find()
• Returns an iterable cursor of BSON documents (pymongo.cursor.Cursor)
What can I do with an iterable cursor?
• Iterate over the cursor:
>>> cur = db.movies.find()
>>> for movie in cur:
...     print(movie['title'])
• Access a document by position:
>>> movie_title = cur[10]['title']
• Call map() or any other function that consumes an iterable (filter, reduce etc.):
>>> map(lambda c : c['title'], cur)
• Change the batch_size to limit the number of documents per batch:
cur.batch_size(1000) → Number of documents
• Limit the number of documents to scan when running the query:
cur.max_scan(10000)
• Set a time limit for the query:
cur.max_time_ms(10000) → Time in millis
Filtering documents
• Limit results by specifying criteria:
>>> db.movies.find({'year': '2000'})
{u'title':'Memento', u'rating':8.5, u'year':'2000', ... }
{u'title':'Gladiator', u'rating':8.5, u'year':'2000', ... }
{u'title':'Snatch', u'rating':8.3, u'year':'2000', ... }
• Limit results even more by specifying multiple criteria:
>>> db.movies.find({'rating':8.5, 'year':'2000'})
{u'title':'Memento', u'rating':8.5, u'year':'2000', ... }
{u'title':'Gladiator',u'rating':8.5, u'year':'2000', ... }
The AND operator is implicit when several conditions are given in one filter
• Commonly used query operators:
$gt / $gte : Greater than / Greater than or equal to value
$lt / $lte : Less than / Less than or equal to value
$exists : Field exists (or does not exist) in the document
$mod : Field value modulo a divisor equals the given remainder
$ne : Not equal to value
$in / $nin : In / Not in array of values
$or / $nor : Any / None of the listed criteria match
$all : Array field contains all of the given values
$size : Number of array elements = size specified
Project fields
• Limit BSON fields by specifying the fields to project:
>>> db.movies.find({}, {'title' : 1})
{u'_id': ObjectId('…'), u'title': u'Jaws'}
{u'_id': ObjectId('…'), u'title': u'High noon'}
{u'_id': ObjectId('…'), u'title': u'The Avengers'}
• Limit BSON fields by specifying the key (exclude ObjectId):
>>> db.movies.find({}, {'title': 1, '_id': 0})
{u'title': u'Jaws'}
{u'title': u'High noon'}
{u'title': u'The Avengers'}
…
Remember to explicitly exclude "_id" (an ObjectId by default) by specifying the field with value 0
Counting documents
• Counting documents within a collection:
>>> print(db.movies.count())
250
• Counting documents based on a filter:
>>> print(db.movies.count({'year':'2000'}))
6
* Tips ’n Tricks - Querying Data
• Project only the fields you require - BSON documents can be up to 16MB
• Ensure your queries use appropriate indexes (*more about this later)
• Avoid simulating multiple relational joins in your code
• Remember that not all documents have the same fields
Update a single document
• Update one document in the "movies" collection using $set:
>>> db.movies.update_one(
{'rating': {'$gt': 9 }},
{'$set' : {'favorite' : True }})
• Update one document in the "movies" collection and retrieve its previous value:
>>> old_doc = db.movies.find_one_and_update(
{'rating': {'$gt': 9 }},
{'$set' : {'favorite' : True }})
• Update one document in the "movies" collection and retrieve its new value:
>>> new_doc = db.movies.find_one_and_update(
{'rating': {'$gt': 9 }},
{'$set' : {'favorite' : True }},
return_document = pymongo.ReturnDocument.AFTER)
The update and return occur within a single atomic operation
Update multiple documents
• Update multiple documents in the "movies" collection using $set:
>>> db.movies.update_many(
{'rating': {'$gt': 9 }},
{'$set' : {'favorite' : True }})
Update Operators
• Commonly used update operators:
$inc : Increment counter
$set : Set a new value
$unset : Remove the field from the document
$addToSet : Add value into array (duplicates not inserted)
$push / $pushAll : Add value(s) into array
$pop : Remove the first / last value of array
$pull / $pullAll : Remove instance(s) of value(s) from array
$rename : Update key name(s)
* Tips ’n Tricks - Atomic updates
• Updating a single document is atomic
• Use the update_*() and find_one_and_*() family of functions to update
• To make complex changes use the Optimistic Locking Design Pattern
Pattern: Optimistic Locking
• Include a version field in all documents
{'_id': ObjectId(…), 'title':'Zootopia', 'v':1}
• Retrieve a document and remember its version
• Make a series of complex transformations to the document or create a new one
• Do not forget to increment the version of the new document
• Update the document only if the version has not changed:
m = db.movies.find_one({'title' : 'Zootopia'})
v = m['v'] # Remember the old version
m = complex_transformations(m)
m['v'] = v + 1 # Increment the version
r = db.movies.replace_one({'_id' : m['_id'], 'v' : v}, m)
if r.modified_count == 0:
    compensate() # Another writer changed the document first
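One common choice for the compensation step is to re-read the document and retry. The following sketch shows that loop against an in-memory stand-in, so it runs without a server; InMemoryCollection, replace_if_version and update_with_retry are hypothetical names, and the rating value is made up for illustration. With PyMongo the conditional write would be the replace_one() call filtered on '_id' and 'v'.

```python
import copy


class InMemoryCollection:
    """Hypothetical stand-in for a collection; supports only what the sketch needs."""

    def __init__(self, docs):
        self.docs = {d["_id"]: copy.deepcopy(d) for d in docs}

    def find_one(self, _id):
        return copy.deepcopy(self.docs[_id])

    def replace_if_version(self, _id, version, new_doc):
        """Mimic replace_one({'_id': _id, 'v': version}, new_doc); return the modified count."""
        current = self.docs.get(_id)
        if current is None or current["v"] != version:
            return 0
        self.docs[_id] = copy.deepcopy(new_doc)
        return 1


def update_with_retry(coll, _id, transform, max_attempts=3):
    """Re-read and retry when another writer bumped the version first."""
    for _ in range(max_attempts):
        doc = coll.find_one(_id)
        version = doc["v"]                 # remember the old version
        doc = transform(doc)
        doc["v"] = version + 1             # increment the version
        if coll.replace_if_version(_id, version, doc) == 1:
            return doc
    raise RuntimeError("gave up after %d conflicting updates" % max_attempts)


movies = InMemoryCollection([{"_id": 1, "title": "Zootopia", "v": 1}])

stale = movies.find_one(1)                                      # writer A reads version 1
update_with_retry(movies, 1, lambda d: dict(d, favorite=True))  # writer B commits version 2
stale["rating"] = 8.1                                           # made-up value
stale_result = movies.replace_if_version(1, 1, stale)           # writer A is rejected

final = update_with_retry(movies, 1, lambda d: dict(d, rating=8.1))  # writer A retries
```

The rejected write leaves writer B's change intact, and the retry builds on top of it instead of overwriting it.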
Indexing and Sorting
Indexing and sorting programmatically using PyMongo
Sorting a resultset
• Sort a resultset in ASCENDING order:
>>> db.movies.find().sort("title", 1)
{u'_id': ObjectId('…'), u'title': u'12 Angry Men'}
{u'_id': ObjectId('…'), u'title': u'12 Years a Slave'}
{u'_id': ObjectId('…'), u'title': u'2001:A Space Odyssey'} ...
• Sort a resultset in DESCENDING order:
>>> db.movies.find().sort("title", -1)
{u'_id': ObjectId('…'), u'title': u'Zootopia'}
{u'_id': ObjectId('…'), u'title': u'Yojimbo'}
{u'_id': ObjectId('…'), u'title': u'Wild Tales'} ...
The explain() plan
• Returns the query execution plan for a specific query
• Provides execution statistics e.g. documents scanned, indexes used etc.
• For sharded collections, information regarding the shards accessed is included
Use the explain plan to identify required indexes for filtering and sorting documents
• Check the explain plan for a general/filtered query:
>>> pprint(db.movies.find({'year': '2000'}).explain())
…
u'queryPlanner': {u'indexFilterSet': False,
 u'namespace': u'pldb.movies',
 u'parsedQuery': {u'year': {u'$eq': u'2000'}},
 u'plannerVersion': 1,
 u'rejectedPlans': [],
 u'winningPlan': {u'direction': u'forward',
  u'filter': {u'year': {u'$eq': u'2000'}},
  u'stage': u'COLLSCAN'}},
…
• Check the explain plan for a sorted query:
>>> pprint(db.movies.find().sort('title', 1).explain())
…
u'winningPlan': {u'inputStage': {u'inputStage': {u'direction': u'forward',
   u'filter': {u'$and': []},
   u'stage': u'COLLSCAN'},
  u'stage': u'SORT_KEY_GENERATOR'},
 u'sortPattern': {u'title': 1},
 u'stage': u'SORT'}},
…
Create Indexes
• Basic index types:
  • Single field indexes
  • Compound indexes
• Other noteworthy indexes:
  • Text indexes
  • Geospatial indexes
  • Hashed indexes (*mainly for sharding)
• Special index properties:
  • Unique
  • Sparse - only when field exists
  • Partial - based on specified criteria
  • TTL - *Keep in mind the TTL deleter thread runs every 60 seconds
• Create an ASCENDING index on the 'title' key:
>>> db.movies.create_index([('title', pymongo.ASCENDING)])
>>> pprint(db.movies.find().sort('title', 1).explain())
u'winningPlan': {u'inputStage': {u'direction': u'forward',
 u'indexBounds': {u'title': [u'[MinKey, MaxKey]']},
 u'indexName': u'title_1',
 u'indexVersion': 1, …
 u'keyPattern': {u'title': 1},
 u'stage': u'IXSCAN'},
• Create a unique index on movie title:
>>> db.movies.create_index([('title', pymongo.ASCENDING)], unique=True)
>>> db.movies.insert_one({'title': 'Jaws'})
pymongo.errors.DuplicateKeyError: E11000 duplicate key error collection: pldb.movies index: title_1 dup key: { : "Jaws" }
• Create a compound index on movie rating and num_votes:
>>> db.movies.create_index([('rating', pymongo.ASCENDING), ('num_votes', pymongo.ASCENDING)])
• Create a text index on movie plots:
>>> db.movies.create_index([('plots', pymongo.TEXT)])
• Create a hashed index on the movie 'tconst' key:
>>> db.movies.create_index([('tconst', pymongo.HASHED)])
* Tips ’n Tricks - Indexing Data
• Ensure your indexes fit in memory. Try to be minimal
>>> db.command('collStats', 'movies')['indexSizes']
{u'_id_': 32768, u'plots_text': 532480, u'title_1': 20480}
• Don't index everything - Indexes are costly
• When indexing timestamps, always index coarsely. Never index milliseconds
• Never index fields with low cardinality
• If possible create compound indexes to improve multiple queries
• Create indexes that cover the queries - all data is retrieved from the index
• When developing code start mongod with the --notablescan option
• Over time schemas and query patterns evolve, always review your indexes
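One way to read the advice on coarse timestamps is to store a truncated copy of the timestamp and index that field instead of the full-precision value. This is a sketch under that assumption; the field names created and created_minute are hypothetical.

```python
from datetime import datetime


def truncate_to_minute(ts):
    """Drop seconds and microseconds so that many events share one index key."""
    return ts.replace(second=0, microsecond=0)


event = {"created": datetime(2016, 4, 21, 10, 15, 42, 123456)}
event["created_minute"] = truncate_to_minute(event["created"])
print(event["created_minute"])

# The coarse field is the one that would be indexed, for example:
# db.events.create_index([('created_minute', pymongo.ASCENDING)])
```

The full-precision value stays in the document for display, while range queries run against the coarse field.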
Task 2: Query and Index Data
01a-first-insert.py, 01b-simple-queries.py & 02-indexing-data.py
Query and Index Data
• Open the file "01a-first-insert.py" from the provided files
• Read the code and follow the examples to perform the insert
• Open the file "01b-simple-queries.py" from the provided files
• Read the code and follow the examples to perform queries
• When done, open the file "02-indexing-data.py"
• Read the code and follow the examples to index the data and compare explain plans
Read / Write Concern
Considerations on read/write concern levels for CRUD operations
Write Concern
• When writing data, the write concern sets the level of acknowledgement required from the mongod process
• The w option:
  • w=0: No acknowledgement at all. It fails only if connectivity errors occur at the client application
  • w=1 (default): Require acknowledgement by the Primary replica
  • w>1: Acknowledgement by the number of replicas equal to w
  • w="majority": Acknowledgement by the majority of replicas
• wtimeout is the time in millis to wait for an acknowledgement to return
• The j=True option requires an acknowledgement that data was written to the database journal
Write Concern (Examples)
• Write concern is set at the connection, database or collection level:
from pymongo import WriteConcern
db.movies.with_options(write_concern = WriteConcern(w=3, wtimeout=1000)).insert_one(...)
OR
db = client.get_database('pldb', write_concern = WriteConcern(w="majority", wtimeout=3000))
Read Preference
• read_preference specifies the replica instance that read operations are directed at
• Possible values:
  • PRIMARY [default]
  • PRIMARY_PREFERRED
  • SECONDARY
  • SECONDARY_PREFERRED
  • NEAREST
Read Concern
• Read concern specifies the isolation level for read operations
• ReadConcern('local') returns local data stored on the replica queried [default]
• ReadConcern('majority') returns data that has already been replicated to the majority of replicas
• majority is only supported by the WiredTiger storage engine, not by MMAPv1
Read Concern (Examples)
• Read preference and read concern are set at the connection, database or collection level:
from pymongo import ReadPreference
from pymongo.read_concern import ReadConcern
db.movies.with_options(read_preference = ReadPreference.SECONDARY, read_concern=ReadConcern('local')).find(…)
OR
db = client.get_database('pldb', read_concern=ReadConcern('majority'))
* Tips ’n Tricks - Write Concern
• Never do unsafe writes (w=0) - Except if you don't care about your data
• w=1 is not safe at all. A write can be overwritten by an outdated replica after a failover
• w='majority' is safe. But it's slow
• w>1 is your best bet
• Always use wtimeout when w>1. If the write concern cannot be achieved, the write will block forever
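To compare a numeric w with w='majority', it helps to work out how many acknowledgements a majority means for a given replica set size. A small arithmetic sketch, assuming every member votes and holds data (no arbiters); the function name is hypothetical.

```python
def majority(voting_members):
    """Smallest number of members that is more than half of the set."""
    return voting_members // 2 + 1


acks_needed = {n: majority(n) for n in (1, 3, 5, 7)}
print(acks_needed)
```

Under that assumption, w=2 on the 3-node tutorial replica set waits for the same number of acknowledgements as w='majority', while w=3 waits for every node.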
* Tips ’n Tricks - Read Concern
• majority read concern does not guarantee the latest data, but the latest data replicated to the majority of replicas
• majority read concern is slow
• All read preference modes except PRIMARY may return stale data because of replication lag
• Read from secondaries when possible to scale reads
Data Modeling
Embedding vs. referencing collections
Data modeling
• We define data relationships between collections
• How do I join data?
  • Joins are achieved through effective data modeling and application-side joins
• Two basic models: Embedded or Referenced
• To reference or to embed?
Pattern: Embedded One-to-One Relationship
{'_id' : ObjectId(…),
'title' : 'Shawshank Redemption', …
'director': { 'name' : 'Frank Darabont', … }, … }
Pattern: Embedded One-to-Many Relationship
movies
{'_id' : ObjectId(…),
'title' : 'Shawshank Redemption', …
'writers' : [{'name':'Stephen King', … },
{'name':'Frank Darabont', … }], … }
Pattern: Referenced One-to-Many Relationship
movies
{ '_id' : 'xyz', 'title' : 'The Wall', ...}
reviews
{ '_id' : ObjectId(…), 'movie_id' : 'xyz', 'rating' : 8, ... }
{ '_id' : ObjectId(…), 'movie_id' : 'xyz', 'rating' : 2, ... }
Reference on movie_id = 'xyz'
The Embedded Model
• Faster reads/writes - the whole BSON document is retrieved in 1x database call
• Updates at the document level enforce atomicity
• Duplication can lead to data inconsistencies
• Avoid embedding data with unbounded growth
• Never embed documents that grow after creation (MMAPv1 storage engine)
The Referenced Model
• Enforces data consistency
• Each relationship requires an additional call
• This becomes costly…
  • Makes reading slower
  • Makes writing slower
  • Requires more indexes
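With the referenced model, the join happens in application code after both collections have been read. A minimal sketch using plain lists in place of query results; the integer review ids are placeholders for ObjectIds and join_reviews is a hypothetical helper.

```python
from collections import defaultdict

# Stand-ins for the results of db.movies.find() and db.reviews.find().
movies = [{"_id": "xyz", "title": "The Wall"}]
reviews = [
    {"_id": 1, "movie_id": "xyz", "rating": 8},
    {"_id": 2, "movie_id": "xyz", "rating": 2},
]


def join_reviews(movies, reviews):
    """Attach each movie's reviews under a 'reviews' key, one pass over each list."""
    by_movie = defaultdict(list)
    for review in reviews:
        by_movie[review["movie_id"]].append(review)
    return [dict(movie, reviews=by_movie.get(movie["_id"], [])) for movie in movies]


joined = join_reviews(movies, reviews)
print(joined[0]["title"], [r["rating"] for r in joined[0]["reviews"]])
```

Each relationship costs one extra query, which is the additional call the referenced model pays for its consistency.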
Geospatial indexing and queries
Indexing for location based applications
Geospatial Queries
• MongoDB geospatial capabilities allow for queries like:
  • How far is Santa Clara from San Francisco?
  • How many restaurants exist within 1 mile from here?
  • Locate the nearest gas station
• Two different surfaces are supported:
  • Flat: Calculate distances on a 2d Euclidean plane
  • Spherical: Calculate geometry over an earth-like sphere
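For intuition on the spherical case, the great-circle distance between two points can be computed client-side with the haversine formula. This sketch is independent of MongoDB's own implementation; the coordinates are approximate city-centre values and the Earth radius is the usual mean-radius approximation.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; an approximation


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two [longitude, latitude] points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Approximate coordinates, used only for illustration.
san_francisco = (-122.4194, 37.7749)
santa_clara = (-121.9552, 37.3541)

distance = haversine_km(*san_francisco, *santa_clara)
print(round(distance, 1), "km")
```

Points are given longitude first, matching the coordinate order used by GeoJSON.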
2d Index
• For queries executed against flat surfaces
• Coordinates express points as [longitude, latitude] - legacy format
• Supports queries for Proximity, Inclusion
• Legacy - Used prior to MongoDB 2.4
2dsphere Index
• For queries executed against spherical earth-like geometry
• Queries executed against GeoJSON shapes:
  • Point: {type:"Point", coordinates:[40,5]}
  • LineString: {type:"LineString", coordinates:[[40,5],[41,6]]}
  • Polygon: {type:"Polygon", coordinates:[[[0,0],[3,6],[6,1],[0,0]]]}
• Supports queries for Proximity, Inclusion, Intersection
from pymongo import GEOSPHERE
db.areas.create_index([('loc', GEOSPHERE)])
Field loc can be any GeoJSON shape
Query for Inclusion
• Find the areas included in a given polygon
db.areas.find({'loc':{'$geoWithin': {'$geometry': {'type':'Polygon', 'coordinates':[<coordinates>]}}}})
Query for Proximity
• Find the areas that are more than 1km and less than 10km from a point
db.areas.find({'loc':{'$nearSphere': {'$geometry': {'type':'Point', 'coordinates' : [-73.9667, 40.78]}, '$minDistance': 1000, '$maxDistance': 10000}}})
Query for Intersection
• Find the areas that intersect a given polygon
db.areas.find({'loc':{'$geoIntersects': {'$geometry': {'type':'Polygon', 'coordinates':[<coordinates>]}}}})
Defensive programming
Best practices for reading and writing data with a schemaless database
Structure in a schemaless world
• MongoDB does not enforce schema
• Key considerations for coding:
  • Is the data I'm writing valid?
  • Is the data I'm reading valid?
• Methods for ensuring data is valid:
  • Using BSON document types
  • Document validation capability (3.2+)
BSON Document Types
• BSON provides support for common variable types, most importantly:
bool … int … long … double
string … array
timestamp … date
objectId … object
• Full reference: the BSON types documentation
• Python types supported by PyMongo
• PyMongo converts Python types in a document (dictionary) to BSON types
• Custom types can also be defined using a "class"
• Document types can also be defined using an ORM such as MongoEngine
• For example - insert a document in PyMongo enforcing datetime:
>>> from datetime import datetime
>>> doc = {"date": datetime(2003, 11, 26), "title_id": "tt0111161", "user_location": "Texas", "title_name": "The Shawshank Redemption", "summary": "Tied for the best movie I have ever seen"}
>>> db.bsontest.insert_one(doc)
• Then retrieve the value in the mongo CLI:
percona-repl:PRIMARY> db.bsontest.find().pretty()
{
"_id" : ObjectId("570aae6d0059a38a781fed60"),
"title_id" : "tt0111161",
"user_location" : "Texas",
"summary" : "Tied for the best movie I have ever seen",
"date" : ISODate("2003-11-26T00:00:00Z"),
"title_name" : "The Shawshank Redemption"
}
Document Validation
• Document validation is supported in MongoDB 3.2+
• Validation can be set during collection creation or on an existing collection
• Two modes of operation (validationLevel):
  • Strict - Applied to all document inserts/updates
  • Moderate - Applied to inserts, and to updates of existing documents that already conform
• Setting validationAction:
  • "warn" for testing (logs errors)
  • "error" for enforcing (throws an error)
New Document Validation
• Create a validation on the "dvtest" collection:
db.createCollection("dvtest", { validator :
{ $and: [ {"title_id" : { $type: "string" }},
{"user_location" : { $exists: true }}, {"title_name" : { $type: "string" }}
] } })
• Insert an invalid document into the "dvtest" collection:
db.dvtest.insert({"foo": "bar"})
WriteResult({
"nInserted" : 0, "writeError" : { "code" : 121, "errmsg" : "Document failed validation" }
})
Existing Document Validation
• Add a new validation to the existing "dvtest" collection with "moderate" validation:
db.runCommand( { collMod: "dvtest", validator: {$and:[{title_id: {$exists:true}}]}, validationLevel: "moderate" } )
Task 3: Fix your types!
03-fix-year-datatype.py
Fix your types
• Open the file "03-fix-year-datatype.py" from the provided files
• Read the code and follow the examples to use the correct datatypes
Ranking / Fast Accounting in MongoDB
High performance accounting to avoid aggregation
Pattern: Fast Accounting
• Use case: Count daily and monthly reviews posted for each movie. Display a histogram on the movie page
• Naive solution: Run counts on the reviews collection when histograms must be rendered
  • Slow and resource consuming to aggregate millions of documents
  • Calculating on every page view is too often
  • Indexing may help but it will not solve the problem
  • Fetching old data destroys the page cache
• Fast Accounting Design Pattern
  • Create a separate collection to store aggregate counters
  • Update counters when a new review is submitted
  • If there is more than one counter, multiple updates will be performed
  • This is a pattern taken from Complex Event Processing (CEP)
Pattern: Fast Accounting - Schema
• Create a separate collection named 'review_counts':
{ '_id': {'movie_id': ObjectId(…), 'day' : '2016-04-21'}, 'count' : 10345 },
{ '_id': {'movie_id': ObjectId(…), 'month' : '2016-04'}, 'count' : 11210345 }
Pattern: Fast Accounting - Increment counts
• Update daily counts:
db.review_counts.update_one({'_id': {'movie_id': ObjectId(…), 'day' : '2016-04-21'}}, {'$inc' : {'count' : 1}}, upsert=True)
• Update monthly counts:
db.review_counts.update_one({'_id': {'movie_id': ObjectId(…), 'month' : '2016-04'}}, {'$inc' : {'count' : 1}}, upsert=True)
Pattern: Fast Accounting - Retrieve counts
• Retrieve the daily count for a single day:
>>> db.review_counts.find_one({'_id': {'movie_id': ObjectId(…), 'day' : '2016-04-21'}})['count']
10345
Pattern: Fast Accounting
• Retrievals are very fast because they search indexed data
• Documents for the latest dates and months are in memory
• Use the _id index to ensure uniqueness and save space
• Updates are very fast - They happen in memory
• Updates are atomic - They can scale to thousands of concurrent updates
• Always use upsert=True to create new counters
• More dimensions can be added in the counter - don't overdo it
• This pattern can be adopted for aggregating any time series data
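The counter logic can be tried out without a server. This sketch keeps the counters in a collections.Counter, which starts missing keys at zero in the same spirit as upsert=True with $inc; the tuple keys and the record_review helper are illustrative assumptions rather than the layout used in the tutorial files.

```python
from collections import Counter

review_counts = Counter()  # in-memory stand-in for the review_counts collection


def record_review(movie_id, day):
    """Increment the daily and the monthly counter for one new review."""
    month = day[:7]  # '2016-04-21' -> '2016-04'
    review_counts[(movie_id, "day", day)] += 1      # missing counters start at 0
    review_counts[(movie_id, "month", month)] += 1


for day in ("2016-04-20", "2016-04-21", "2016-04-21"):
    record_review("xyz", day)

daily = review_counts[("xyz", "day", "2016-04-21")]
monthly = review_counts[("xyz", "month", "2016-04")]
print(daily, monthly)
```

Reading a histogram bucket is then a single key lookup instead of a count over the reviews.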
Task 4: Fast Accounting
04-fast-accounting.py
Fast Accounting
• Open the file "04-fast-accounting.py" from the provided files
• Read the code and follow the examples to understand the fast accounting pattern
Sharding Considerations
Hash vs. Timestamp Distribution
Sharding in MongoDB
• What is sharding?
  • Horizontal partitioning of data across multiple nodes / replica sets
  • Replica sets are recommended for HA
  • Collections are sharded across replica sets based on a shard key
  • High cardinality of the shard key ensures even distribution across replica sets
  • Collections which are not sharded remain on the primary shard
• What are my sharding options?
  • Hash based
  • Range based
  • Tag based
Hash Based Sharding
• Uses hashed indexes for the ranges
• Evenly distributed reads/writes
• Random operations due to random sharding algorithm
• Retrieving multiple documents can lead to scatter-gather
• Key use cases:
  • Scaling: Load balancing reads & writes (example to follow)
  • Disaster recovery: Parallel shard recovery
• Example: Shard key hash(datetime) - good write distribution
  Shard1 (Primary): "2016-4-19 00:00:00"
  Shard2: "2016-4-20 00:00:00"
  Shard3: "2016-4-18 00:00:00"
  WRITES are spread across all three shards
• Example: Shard key hash(datetime) - scatter-gather reads
  Shard1 (Primary): "2016-4-19 00:00:00"
  Shard2: "2016-4-20 00:00:00"
  Shard3: "2016-4-18 00:00:00"
  READS: Find datetime values between 17th & 21st of April (every shard is queried)
• Example: Shard key hash(userid) - good read distribution
  READS: Find user with id=ed4f7269 (only the shard holding "ed4f7269" is queried)
Range Based Sharding
• Ranges are defined on the shard key data, e.g. number / date-time
• Data is divided across ranges of documents
• E.g. 4x shards with int 1..100 >> Shard1 with values 1..25 etc.
• Can lead to hotspot shards on date-based ranges
• As ranges change, chunk migration may cause overhead
• Key use cases:
  • Scaling: Load balancing reads & writes
  • Disaster recovery: Parallel shard recovery
• Example: Shard key datetime value - bad write distribution
  Shard1 (Primary): "2016-4-18 00:00:00"
  Shard2: "2016-4-19 00:00:00"
  Shard3: "2016-4-20 00:00:00"
  WRITES: On 20th April all writes go to Shard3
  READS: Similar scenario with reads
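The difference between the two distributions can be simulated in a few lines. This is an illustrative sketch only: the boundaries, the number of writes and the MD5-modulo placement are assumptions chosen for the demonstration, not MongoDB's chunk or hashing logic.

```python
import hashlib
from collections import Counter
from datetime import datetime, timedelta

SHARDS = 3


def range_shard(ts, boundaries):
    """Return the index of the first range whose upper bound is above ts."""
    for shard, upper in enumerate(boundaries):
        if ts < upper:
            return shard
    return len(boundaries)  # everything past the last boundary


def hash_shard(ts):
    """Pick a shard from a digest of the key (illustrative placement only)."""
    digest = hashlib.md5(ts.isoformat().encode("utf-8")).hexdigest()
    return int(digest, 16) % SHARDS


# Shard 0 holds keys before 19 April, shard 1 holds 19 April, shard 2 the rest.
boundaries = [datetime(2016, 4, 19), datetime(2016, 4, 20)]
# 300 writes arriving during 20 April, one per minute.
writes = [datetime(2016, 4, 20) + timedelta(minutes=i) for i in range(300)]

by_range = Counter(range_shard(ts, boundaries) for ts in writes)
by_hash = Counter(hash_shard(ts) for ts in writes)
print("range:", dict(by_range))
print("hash: ", dict(by_hash))
```

With range placement every write of the day lands on the last shard, while hashed placement spreads the same keys over all shards; the price is that a date-range read then has to visit every shard.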
Tag Based Sharding
• Allows for custom data distribution
• Data is divided across predefined tags
• E.g. "Americas" on Shard1 .. "EU" on Shard2 .. "APAC" on Shard3
• Can lead to hotspots depending on use-case
• Key use cases:
  • Geo-locality: Force data into suitable geographically dispersed shards
  • HW optimization: Force hot data onto faster hardware
• Example: Shard tags on "location" - faster response times
  Shard1 (Primary), Tag: AM: "USA"
  Shard2, Tag: EU: "GREECE"
  Shard3, Tag: APAC: "AUSTRALIA"
  WRITES occur in a local DC
  READS occur in a local DC
• Example: Shard tags on "year" ranges - automatic archiving (writes)
  Shard1 (Primary), Tag: 2016: 32x Cores - 128GB RAM - SSD: "<NEW DATA>"
  Shard2, Tag: 2010 - 2015: 16x Cores - 64GB RAM - SSD: "<FEWER WRITES>"
  Shard3, Tag: < 2010: 4x Cores - 32GB RAM - Rotational: "<NO WRITE ACTIVITY>"
  New data is written to the high speed node
• Example: Shard tags on "year" ranges - automatic archiving (reads)
  Shard1 (Primary), Tag: 2016: 64x Cores - 256GB RAM - SSD: "<NEW DATA>"
  Shard2, Tag: 2010 - 2015: 16x Cores - 64GB RAM - SSD: "<FEWER READS>"
  Shard3, Tag: < 2010: 4x Cores - 32GB RAM - Rotational: "<ONLY REPORTING>"
  New data is written to the high speed node
The End
Q&A