Post on 06-Apr-2020
CompSci 516 Database Systems
Lecture 19: NoSQL and Column Store
Instructor: Sudeepa Roy
Duke CS, Fall 2017 CompSci 516: Database Systems

Announcements
• HW3 released on Sakai
  – Due on Monday, Nov 20, 11:55 pm (in 2 weeks)
  – Start soon, finish soon!
  – You can learn about conceptual questions from online material, but must write your own answer
• Keep working on your project too!

Reading Material
NoSQL:
• “Scalable SQL and NoSQL Data Stores”, Rick Cattell, SIGMOD Record, December 2010 (Vol. 39, No. 4)
• see webpage http://cattell.net/datastores/ for updates and more pointers
• MongoDB manual: https://docs.mongodb.com/manual/
Column Store:
• D. Abadi, P. Boncz, S. Harizopoulos, S. Idreos and S. Madden. The Design and Implementation of Modern Column-Oriented Database Systems. Foundations and Trends in Databases, vol. 5, no. 3, pp. 197–280, 2012.
• See VLDB 2009 tutorial: http://nms.csail.mit.edu/~stavros/pubs/tutorial2009-column_stores.pdf
Optional:
• “Dynamo: Amazon’s Highly Available Key-value Store” by Giuseppe DeCandia et al., SOSP 2007
• “Bigtable: A Distributed Storage System for Structured Data”, Fay Chang et al., OSDI 2006

NoSQL
So far – RDBMS
• Relational Data Model
• Relational Database Systems (RDBMS)
• RDBMSs have
  – a complete pre-defined fixed schema
  – a SQL interface
  – and ACID transactions

Today
• NoSQL: “new” database systems
  – not typically RDBMS
  – relax some requirements, gain efficiency and scalability
• New systems choose to use/not use several concepts we learnt so far
  – e.g. “System ---” does not use locks but uses multi-version CC (MVCC), or
  – “System ---” uses asynchronous replication
• therefore, it is important to understand the basics (Lectures 1-18) even if they are not used in some new systems!

Warnings!
• Material from Cattell’s paper (2010-11): some info will be outdated
  – see webpage http://cattell.net/datastores/ for updates and more pointers
• We will focus on the basic ideas of NoSQL systems
• Optional reading slides at the end on MongoDB
  – may be useful for HW3
  – there are also comparison tables in Cattell’s paper if you are interested

OLAP vs. OLTP
• OLTP (OnLine Transaction Processing)
  – Recall transactions!
  – Multiple concurrent read-write requests
  – Commercial applications (banking, online shopping)
  – Data changes frequently
  – ACID properties, concurrency control, recovery
• OLAP (OnLine Analytical Processing)
  – Many aggregate/group-by queries
  – multidimensional data
  – Data mostly static
  – Will study OLAP Cube soon

New Systems
• We will examine a number of SQL and so-called “NoSQL” systems or “data stores”
• Designed to scale simple OLTP-style application loads
  – to do updates as well as reads
  – in contrast to traditional DBMSs and data warehouses
  – to provide good horizontal scalability (?) for simple read/write database operations distributed over many servers
• Originally motivated by Web 2.0 applications
  – these systems are designed to scale to thousands or millions of users

New Systems vs. RDBMS
• When you study a new system, compare it with RDBMSs on its
  – data model
  – consistency mechanisms
  – storage mechanisms
  – durability guarantees
  – availability
  – query support
• These systems typically sacrifice some of these dimensions (e.g. database-wide transaction consistency) in order to achieve others (e.g. higher availability and scalability)

NoSQL
• Many of the new systems are referred to as “NoSQL” data stores
• NoSQL stands for “Not Only SQL” or “Not Relational”
  – not entirely agreed upon
• Next: six key features of NoSQL systems

NoSQL: Six Key Features
1. the ability to horizontally scale “simple operations” throughput over many servers
2. the ability to replicate and to distribute (partition) data over many servers
3. a simple call level interface or protocol (in contrast to SQL binding)
4. a weaker concurrency model than the ACID transactions of most relational (SQL) database systems
5. efficient use of distributed indexes and RAM for data storage
6. the ability to dynamically add new attributes to data records

Important Examples of New Systems
• Three systems provided a “proof of concept” and inspired many other data stores
1. Memcached
2. Amazon’s Dynamo
3. Google’s BigTable

1. Memcached: main features
• popular open source cache
• supports distributed hashing (later)
• demonstrated that in-memory indexes can be highly scalable, distributing and replicating objects over multiple nodes

2. Dynamo: main features
• pioneered the idea of eventual consistency as a way to achieve higher availability and scalability
• data fetched are not guaranteed to be up-to-date
• but updates are guaranteed to be propagated to all nodes eventually

3. BigTable: main features
• demonstrated that persistent record storage could be scaled to thousands of nodes
• “column families”
• https://cloud.google.com/bigtable/
• https://static.googleusercontent.com/media/research.google.com/en//archive/bigtable-osdi06.pdf

BASE (not ACID)
• Recall ACID for RDBMS, the desired properties of transactions:
  – Atomicity, Consistency, Isolation, and Durability
• NOSQL systems typically do not provide ACID
• Basically Available
• Soft state
• Eventually consistent

ACID vs. BASE
• The idea is that by giving up ACID constraints, one can achieve much higher performance and scalability
• The systems differ in how much they give up
  – e.g. most of the systems call themselves “eventually consistent”, meaning that updates are eventually propagated to all nodes
  – but many of them provide mechanisms for some degree of consistency, such as multi-version concurrency control (MVCC)

“CAP” Theorem
• Eric Brewer’s CAP theorem is often cited for NoSQL
• A system can have only two out of three of the following properties:
  – Consistency: do all clients see the same data?
  – Availability: is the system always on?
  – Partition-tolerance: even if communication is unreliable, does the system function?
• The NoSQL systems generally give up consistency
  – However, the trade-offs are complex

Two foci for NoSQL systems
1. “Simple” operations
2. Horizontal Scalability

1. “Simple” Operations
• Reading or writing a small number of related records in each operation
  – e.g. key lookups
  – reads and writes of one record or a small number of records
• This is in contrast to complex queries, joins, or read-mostly access
• Inspired by web, where millions of users may both read and write data in simple operations
  – e.g. search and update multi-server databases of electronic mail, personal profiles, web postings, wikis, customer records, online dating records, classified ads, and many other kinds of data

2. Horizontal Scalability
• Shared-Nothing Horizontal Scaling
• The ability to distribute both the data and the load of these simple operations over many servers
  – with no RAM or disk shared among the servers
• Not “vertical” scaling
  – where a database system utilizes many cores and/or CPUs that share RAM and disks
• Some of the systems we describe provide both vertical and horizontal scalability

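A minimal sketch of shared-nothing horizontal scaling, under stated assumptions: the `Server` and `Cluster` classes and the MD5-modulo routing rule are hypothetical illustrations, not taken from any particular system. Each server owns a private dictionary, and a router sends every key to exactly one server, so both the data and the load are spread out.

```python
import hashlib

class Server:
    """One shared-nothing node: it owns a private dict, no shared RAM or disk."""
    def __init__(self, name):
        self.name = name
        self.data = {}

class Cluster:
    """Hypothetical router that spreads simple key operations over servers."""
    def __init__(self, names):
        self.servers = [Server(n) for n in names]

    def _owner(self, key):
        # Use a stable hash; Python's built-in hash() is salted per process.
        h = int(hashlib.md5(key.encode()).hexdigest(), 16)
        return self.servers[h % len(self.servers)]

    def put(self, key, value):
        self._owner(key).data[key] = value

    def get(self, key):
        return self._owner(key).data.get(key)

cluster = Cluster(["s0", "s1", "s2"])
for i in range(300):
    cluster.put(f"user{i}", {"id": i})
sizes = [len(s.data) for s in cluster.servers]  # records held by each server
```

Routing by `hash % number_of_servers` is the simplest choice, but it reshuffles most keys whenever a server is added or removed; consistent hashing, covered later in the lecture, addresses that.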
2. Horizontal vs. Vertical Scaling
• Effective use of multiple cores (vertical scaling) is important
  – but the number of cores that can share memory is limited
• horizontal scaling generally is less expensive
  – can use commodity servers
• Note: horizontal and vertical partitioning (Lecture 18) are not related to horizontal and vertical scaling
  – except that they are both useful for horizontal scaling

What is different in NOSQL systems
• When you study a new NOSQL system, notice how it differs from RDBMS in terms of
1. Concurrency Control
2. Data Storage Medium
3. Replication
4. Transactions

Choices in NOSQL systems: 1. Concurrency Control
a) Locks
  – some systems provide one-user-at-a-time read or update locks
  – MongoDB provides locking at a field level
b) MVCC
c) None
  – do not provide atomicity
  – multiple users can edit in parallel
  – no guarantee which version you will read
d) ACID
  – pre-analyze transactions to avoid conflicts
  – no deadlocks and no waits on locks

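To make option (b) concrete, here is a toy sketch of multi-version concurrency control. The `MVCCStore` class and its logical clock are assumptions made for illustration; real systems also handle aborts, write conflicts, and garbage collection of old versions. The idea shown is only that writers append new versions, so a reader with an older snapshot is never blocked and never sees later writes.

```python
class MVCCStore:
    """Toy multi-version store: every write creates a new timestamped version."""
    def __init__(self):
        self.versions = {}  # key -> list of (commit_ts, value), ascending by ts
        self.clock = 0      # logical commit timestamp

    def write(self, key, value):
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))
        return self.clock

    def snapshot(self):
        # A reader remembers the clock value at which it started.
        return self.clock

    def read(self, key, snapshot_ts):
        # Return the newest version committed at or before the snapshot.
        for ts, value in reversed(self.versions.get(key, [])):
            if ts <= snapshot_ts:
                return value
        return None

store = MVCCStore()
store.write("x", "v1")
snap = store.snapshot()      # a reader starts here
store.write("x", "v2")       # a later writer does not disturb that reader
old = store.read("x", snap)
new = store.read("x", store.snapshot())
```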
Choices in NOSQL systems: 2. Data Storage Medium
a) Storage in RAM
  – snapshots or replication to disk
  – poor performance when data overflows RAM
b) Disk storage
  – caching in RAM

Choices in NOSQL systems: 3. Replication
• whether mirror copies are always in sync
a) Synchronous
b) Asynchronous
  – faster, but updates may be lost in a crash
c) Both
  – local copies synchronously, remote copies asynchronously

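The difference between options (a) and (b) can be sketched in a few lines. The `Primary` and `Replica` classes below are hypothetical; the sketch only shows why an asynchronous primary can acknowledge a write before the replica has it, which is exactly the window in which a crash loses the update.

```python
from collections import deque

class Replica:
    def __init__(self):
        self.data = {}

class Primary:
    """Hypothetical primary that ships writes to one replica."""
    def __init__(self, replica, synchronous):
        self.data = {}
        self.replica = replica
        self.synchronous = synchronous
        self.pending = deque()  # updates not yet shipped (async mode only)

    def write(self, key, value):
        self.data[key] = value
        if self.synchronous:
            # The write returns only after the mirror copy is in sync.
            self.replica.data[key] = value
        else:
            # The write returns immediately; shipping happens later.
            self.pending.append((key, value))

    def flush(self):
        while self.pending:
            key, value = self.pending.popleft()
            self.replica.data[key] = value

sync_replica, async_replica = Replica(), Replica()
sync_primary = Primary(sync_replica, synchronous=True)
async_primary = Primary(async_replica, synchronous=False)
sync_primary.write("a", 1)
async_primary.write("a", 1)
# If the async primary crashed now, the replica would not have the update.
lost_if_crash_now = "a" not in async_replica.data
async_primary.flush()
```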
Choices in NOSQL systems: 4. Transaction Mechanisms
a) support
b) do not support
c) in between
  – support local transactions only within a single object or “shard”
  – shard = a horizontal partition of data in a database

Comparison from Cattell’s paper (2011)

Data Model Terminology for NoSQL
• Unlike SQL/RDBMS, the terminology for NoSQL is often inconsistent
  – we are following notations in Cattell’s paper
• All systems provide a way to store scalar values
  – e.g. numbers and strings
• Some of them also provide a way to store more complex nested or reference values

Data Model Terminology for NoSQL
• The systems all store sets of attribute-value pairs
  – but use four different data structures
1. Tuple
2. Document
3. Extensible Record
4. Object

1. Tuple
• Same as before
• A “tuple” is a row in a relational table
  – attribute names are pre-defined in a schema
  – the values must be scalar
  – the values are referenced by attribute name
  – in contrast to an array or list, where they are referenced by ordinal position

2. Document
• Allows values to be nested documents or lists as well as scalar values
  – think about XML or JSON
• The attribute names are dynamically defined for each document at runtime
• A document differs from a tuple in that the attributes are not defined in a global schema
  – and a wider range of values are permitted

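The contrast between a tuple and a document can be illustrated with plain Python values. The `Driver` schema and the sample data below are made up for this sketch: the named tuple has a fixed set of scalar attributes, while each document picks its own attribute names and may nest lists or other documents.

```python
from collections import namedtuple

# Tuple: attribute names are fixed by a schema and the values are scalar.
Driver = namedtuple("Driver", ["license_no", "name", "birth_year"])
row = Driver("D123", "Ana", 1990)

# Document: attribute names are chosen per document and values may nest.
doc_a = {"license_no": "D123", "name": "Ana",
         "vehicles": [{"plate": "ABC-1", "year": 2012}]}
doc_b = {"license_no": "D456", "name": "Bo", "organ_donor": True}

def is_scalar(v):
    return v is None or isinstance(v, (int, float, str, bool))

row_is_flat = all(is_scalar(v) for v in row)
doc_is_flat = all(is_scalar(v) for v in doc_a.values())
# Only some attributes are common to both documents; no global schema says which.
shared_attrs = set(doc_a) & set(doc_b)
```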
3. Extensible Record
• A hybrid between a tuple and a document
• families of attributes are defined in a schema
• but new attributes can be added (within an attribute family) on a per-record basis
• Attributes may be list-valued

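A small sketch of the extensible-record idea, assuming a hypothetical `ExtensibleRecordTable` class (this is not the API of HBase, Cassandra, or BigTable). The attribute families are fixed when the table is created, but each record may add its own attributes inside a family.

```python
class ExtensibleRecordTable:
    """Toy table: families come from the schema, attributes are per record."""
    def __init__(self, families):
        self.families = set(families)  # fixed by the schema
        self.rows = {}                 # row key -> {family -> {attr -> value}}

    def put(self, row_key, family, attr, value):
        if family not in self.families:
            raise KeyError(f"unknown attribute family: {family}")
        self.rows.setdefault(row_key, {}).setdefault(family, {})[attr] = value

table = ExtensibleRecordTable(["core", "activity"])
table.put("cust1", "core", "email", "a@example.com")
table.put("cust1", "activity", "bids", [17, 42])  # list-valued attribute
table.put("cust2", "core", "nickname", "bo")      # attribute only this record has

try:
    table.put("cust1", "billing", "card", "xxxx")  # family not in the schema
    rejected = False
except KeyError:
    rejected = True
```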
4. Object
• Analogous to an object in programming languages
  – but without the procedural methods
• Values may be references or nested objects

Data Store Categories
• The data stores are grouped according to their data model
• Key-value Stores:
  – store values and an index to find them
  – based on a programmer-defined key
• Document Stores:
  – store documents
  – The documents are indexed and a simple query mechanism is provided
• Extensible Record Stores:
  – store extensible records that can be partitioned vertically and horizontally across nodes
  – Some papers call these “wide column stores”
• Relational Databases:
  – store (and index and query) tuples
  – e.g. the new RDBMSs that provide horizontal scaling

Example NOSQL systems
• Key-value Stores:
  – Project Voldemort, Riak, Redis, Scalaris, Tokyo Cabinet, Memcached/Membrain/Membase
• Document Stores:
  – Amazon SimpleDB, CouchDB, MongoDB, Terrastore
• Extensible Record Stores:
  – HBase, HyperTable, Cassandra, Yahoo’s PNUTS
• Relational Databases:
  – MySQL Cluster, VoltDB, Clustrix, ScaleDB, ScaleBase, NimbusDB, Google Megastore (a layer on BigTable)

Use case: Key-value store
• if you have a simple application with only one kind of object, and you only need to look up objects based on one attribute
• Suppose you have a web application
  – that does many RDBMS queries to create a tailored page when a user logs in
  – Suppose it takes several seconds to execute those queries, and the user’s data is rarely changed
  – you might want to store the user’s tailored page as a single object in a key-value store

Use case: Document Store
• application with multiple different kinds of objects
  – e.g. in a Department of Motor Vehicles application, with vehicles and drivers
• where you need to look up objects based on multiple fields
  – e.g., a driver’s name, license number, owned vehicle, or birth date

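A toy in-memory version of the DMV example, assuming a hypothetical `DocumentCollection` class (not a real document-store API). It holds different kinds of documents in one collection and answers lookups on several fields, using a secondary index where one exists and a full scan otherwise.

```python
from collections import defaultdict

class DocumentCollection:
    """Toy collection with optional secondary indexes on chosen fields."""
    def __init__(self, indexed_fields):
        self.docs = []
        self.indexes = {f: defaultdict(list) for f in indexed_fields}

    def insert(self, doc):
        self.docs.append(doc)
        for field, index in self.indexes.items():
            if field in doc:
                index[doc[field]].append(doc)

    def find(self, field, value):
        if field in self.indexes:
            return list(self.indexes[field].get(value, []))
        # No index on this field: fall back to scanning every document.
        return [d for d in self.docs if d.get(field) == value]

dmv = DocumentCollection(["license_no", "name"])
dmv.insert({"kind": "driver", "name": "Ana", "license_no": "D123", "birth_year": 1990})
dmv.insert({"kind": "vehicle", "plate": "ABC-1", "owner_license": "D123"})
by_license = dmv.find("license_no", "D123")  # indexed lookup
by_birth = dmv.find("birth_year", 1990)      # unindexed lookup (scan)
```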
Use case: Extensible Record Store
• use cases similar to those for document stores:
  – multiple kinds of objects, with lookups based on any field
• However, aimed at higher throughput, and may provide stronger concurrency guarantees,
  – at the cost of slightly more complexity than the document stores
• Suppose you are storing customer information for an eBay-style application, and you want to partition your data both horizontally and vertically:
  – cluster customers by country, so that you can efficiently search all of the customers in one country
  – separate the rarely-changed “core” customer information, such as customer addresses and email addresses, in one place, and
  – put certain frequently-updated customer information (such as current bids in progress) in a different place, to improve performance

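The eBay-style layout can be sketched with two dictionaries. The attribute groups `CORE` and `ACTIVITY` and the sample customers are hypothetical; the point is that the country chooses the horizontal partition while the attribute group chooses the vertical one.

```python
from collections import defaultdict

CORE = {"address", "email"}   # rarely changed attributes
ACTIVITY = {"current_bids"}   # frequently updated attributes

core_store = defaultdict(dict)      # country -> {customer_id -> core attrs}
activity_store = defaultdict(dict)  # country -> {customer_id -> activity attrs}

def put_customer(customer_id, country, attrs):
    # Vertical split by attribute group, horizontal split by country.
    core_store[country][customer_id] = {k: v for k, v in attrs.items() if k in CORE}
    activity_store[country][customer_id] = {k: v for k, v in attrs.items() if k in ACTIVITY}

def update_bids(customer_id, country, bids):
    # Frequent updates touch only the activity partition.
    activity_store[country][customer_id]["current_bids"] = bids

put_customer(1, "US", {"address": "1 Main St", "email": "a@example.com", "current_bids": [10]})
put_customer(2, "FR", {"address": "2 Rue A", "email": "b@example.com", "current_bids": []})
update_bids(1, "US", [10, 25])
us_customers = sorted(core_store["US"])  # search one country without touching others
```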
Use case: Scalable RDBMS
• If your application requires many tables with different types of data
  – a relational schema centralizes and simplifies data definition, and SQL simplifies operations
  – or for projects with many programmers
• However, more useful if the application does not require
  – updates or joins that span many nodes
  – transaction coordination
  – or, data movement

Consistent Hashing in DynamoDB

Consistent Hashing (CH)
• Recall dynamic hashing schemes
• If the # of slots (directory size) changes, then almost all keys have to be remapped
• In consistent hashing (CH), with # keys = K and # slots = N, only K/N keys need to be remapped on average
• Applies to the design of Distributed Hash Tables (DHTs) for Uniform Load Distribution
  – partition a keyspace among a set of sites/nodes
  – additionally provide an overlay network that connects nodes such that the nodes responsible for any key can be efficiently located

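The K/N claim can be checked empirically with a small experiment. This is a sketch under assumptions: MD5 as the hash function, 100 ring positions per node, and the key and node names are all arbitrary choices made here. It measures which fraction of keys changes owner when an 11th node joins 10 existing ones, first with `hash % N` and then with a hash ring.

```python
import bisect
import hashlib

def h(s):
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

def mod_owner(key, n):
    # Naive scheme: slot number is the hash modulo the number of slots.
    return h(key) % n

def build_ring(nodes, points_per_node=100):
    # Each node gets several positions on the ring to even out the load.
    points = sorted((h(f"{n}#{i}"), n) for n in nodes for i in range(points_per_node))
    return [p for p, _ in points], [n for _, n in points]

def ring_owner(ring, key):
    positions, owners = ring
    # First ring position larger than the key's hash, wrapping around.
    i = bisect.bisect_right(positions, h(key)) % len(positions)
    return owners[i]

keys = [f"key{i}" for i in range(5000)]
N = 10
nodes = [f"node{i}" for i in range(N)]

moved_mod = sum(mod_owner(k, N) != mod_owner(k, N + 1) for k in keys) / len(keys)

old_ring = build_ring(nodes)
new_ring = build_ring(nodes + ["node10"])
moved_ring = sum(ring_owner(old_ring, k) != ring_owner(new_ring, k) for k in keys) / len(keys)
```

With the modulo scheme most keys should move, while with the ring the moved fraction should be close to 1/(N+1), and every moved key lands on the new node.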
DynamoDB: CH 1/2
• [ref. the Dynamo paper, sec 4.3]
• Must scale incrementally
• Consistent hashing is used to dynamically distribute data around a “ring” of nodes (= sites)
• The output of a hash function is treated as a circular ring
• Each node is assigned a random value in this space
  – represents the “position” on the ring

DynamoDB: CH 2/2
• Data item identified by a key
• Assign to a node by hashing the key to yield its position on the ring
• Walk the ring clockwise to find the first node with a position larger than the item’s position
• Each node is responsible for the region in the ring between it and its predecessor node on the ring
• Note:
  – departure or arrival of a node only affects its immediate neighbor
  – The other nodes remain unaffected
  – K/N on average!

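The clockwise walk can be sketched with explicit positions instead of hash values, which keeps the example deterministic. The `Ring` class, the ring space [0, 100), and the node positions are all hypothetical choices for illustration.

```python
import bisect

class Ring:
    """Nodes placed at explicit positions in a hypothetical ring space [0, 100)."""
    def __init__(self):
        self.positions = []  # sorted node positions
        self.names = {}      # position -> node name

    def add(self, name, position):
        bisect.insort(self.positions, position)
        self.names[position] = name

    def owner(self, key_position):
        # First node clockwise with a position larger than the key's; wrap around.
        i = bisect.bisect_right(self.positions, key_position) % len(self.positions)
        return self.names[self.positions[i]]

ring = Ring()
for name, pos in [("A", 10), ("B", 40), ("C", 70)]:
    ring.add(name, pos)

key_positions = [5, 25, 45, 55, 85]
before = {k: ring.owner(k) for k in key_positions}
ring.add("D", 50)  # D takes over part of the region that C was responsible for
after = {k: ring.owner(k) for k in key_positions}
changed = [k for k in key_positions if before[k] != after[k]]
```

Only the key between B and the new node D changes owner; keys held by A and B, and the rest of C's region, are untouched.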
DynamoDB: Replication
• Dynamo replicates its data on multiple (N) hosts for high availability and durability
• Each key k is assigned to a coordinator which is in charge of replication
  – coordinator handles all keys in its range
  – Coordinator replicates each key it is in charge of
    • by storing it locally
    • replicating it at the N-1 clockwise successor nodes in the ring
• Each node is in charge of the region of the ring between it and its N-th predecessor
• Example (from the figure): node B replicates key K at nodes C and D; node D will store keys in the ranges (A,B], (B,C], (C,D]
• Note: there may be < N “physical” nodes; uses “virtual nodes”

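The figure's example can be reproduced with two small helper functions. This is a sketch that assumes N = 3 and a seven-node ring labelled A to G in clockwise order; the function names are hypothetical, and the sketch ignores virtual nodes and failures.

```python
def preference_list(ring_order, coordinator, n):
    """Coordinator plus its n-1 clockwise successors on the ring."""
    i = ring_order.index(coordinator)
    return [ring_order[(i + j) % len(ring_order)] for j in range(n)]

def ranges_stored(ring_order, node, n):
    """Key ranges (predecessor, owner] for which `node` holds a replica."""
    i = ring_order.index(node)
    size = len(ring_order)
    out = []
    for j in range(n - 1, -1, -1):
        owner = ring_order[(i - j) % size]
        pred = ring_order[(i - j - 1) % size]
        out.append((pred, owner))
    return out

ring_order = ["A", "B", "C", "D", "E", "F", "G"]  # clockwise order on the ring
replicas_of_B = preference_list(ring_order, "B", 3)
stored_at_D = ranges_stored(ring_order, "D", 3)
```

Here `replicas_of_B` lists the nodes holding a key coordinated by B, and `stored_at_D` lists the three ranges that end up on node D, matching the ranges named on the slide.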
CH History
• Proposed by CS theoreticians from MIT:
  – Karger-Lehman-Leighton-Panigrahy-Levine-Lewin
  – “Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web”
  – STOC 1997
• Consistent hashing gave birth to Akamai Technologies
  – Founded by Danny Lewin and Tom Leighton in 1998
  – Akamai’s content delivery network is one of the largest distributed computing platforms
  – Now market cap $12B and 6200 employees
  – Managing web-presence of many major companies
• 2001: The concept of Distributed Hash Table (DHT) is proposed (how to look for a file) and CH was re-purposed
• Now used in Dynamo, Couchbase, Cassandra, Voldemort, Riak, ..

SQL vs. NOSQL
Arguments for both sides: still a controversial topic

Why choose RDBMS over NoSQL: 1/3
1. If new relational systems can do everything that a NoSQL system can, with analogous performance and scalability (?), and with the convenience of transactions and SQL, NoSQL is not needed
2. Relational DBMSs have taken and retained majority market share over other competitors in the past 30 years
  – (network, object, and XML DBMSs)

Why choose RDBMS over NoSQL: 2/3
3. Successful relational DBMSs have been built to handle other specific application loads in the past:
  – read-only or read-mostly data warehousing
  – OLTP on multi-core multi-disk CPUs
  – in-memory databases
  – distributed databases, and
  – now horizontally scaled databases

Why choose RDBMS over NoSQL: 3/3
4. While there is no “one size fits all” in the SQL products themselves, there is a common interface with SQL, transactions, and relational schema that gives advantages in training, continuity, and data interchange

Why choose NoSQL over RDBMS: 1/3
1. We haven’t yet seen good benchmarks showing that RDBMSs can achieve scaling comparable with NoSQL systems like Google’s BigTable
2. If you only require a lookup of objects based on a single key
  – then a key-value store is adequate and probably easier to understand than a relational DBMS
  – Likewise for a document store on a simple application: you only pay the learning curve for the level of complexity you require

Why choose NoSQL over RDBMS: 2/3
3. Some applications require a flexible schema
  – allowing each object in a collection to have different attributes
  – While some RDBMSs allow efficient “packing” of tuples with missing attributes, and some allow adding new attributes at runtime, this is uncommon

Why choose NoSQL over RDBMS: 3/3
4. A relational DBMS makes “expensive” (multi-node multi-table) operations “too easy”
  – NoSQL systems make them impossible or obviously expensive for programmers
5. While RDBMSs have maintained majority market share over the years, other products have established smaller but non-trivial markets in areas where there is a need for particular capabilities
  – e.g. indexed objects with products like BerkeleyDB, or graph-following operations with object-oriented DBMSs

Column Store

Row vs. Column Store
• Row store
  – store all attributes of a tuple together
  – storage like “row-major order” in a matrix
• Column store
  – store all rows for an attribute (column) together
  – storage like “column-major order” in a matrix
• e.g.
  – MonetDB, Vertica (earlier, C-store), SAP/Sybase IQ, Google Bigtable (with column groups)
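
The two layouts can be compared on a three-row table. The table contents are made up for this sketch, and Python lists stand in for on-disk storage; the point is only the order in which values end up adjacent.

```python
rows = [
    {"id": 1, "name": "Ana", "price": 10.0},
    {"id": 2, "name": "Bo", "price": 25.0},
    {"id": 3, "name": "Cy", "price": 40.0},
]

# Row store: the attributes of each tuple are adjacent (row-major order).
row_layout = [value for r in rows for value in r.values()]

# Column store: all values of one attribute are adjacent (column-major order).
columns = {attr: [r[attr] for r in rows] for attr in rows[0]}
column_layout = [v for col in columns.values() for v in col]

# An aggregate over one attribute reads only that column.
total_price = sum(columns["price"])

# Reconstructing a tuple means stitching one value from every column.
second_tuple = {attr: col[1] for attr, col in columns.items()}
```

This is why column stores tend to suit aggregate-heavy OLAP queries, while fetching or updating whole tuples is more natural in a row store.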

Ack: Slides from VLDB 2009 tutorial on Column store (figures not reproduced here)

Additional and Optional Slides on MongoDB
(May be useful for HW3)
https://docs.mongodb.com
https://docs.mongodb.com/manual/reference/sql-comparison/

MongoDB
• MongoDB is an open source document store written in C++
• provides indexes on collections
• lockless
• provides a document query mechanism
• supports automatic sharding
• Replication is mostly used for failover
• does not provide the global consistency of a traditional DBMS
  – but you can get local consistency on the up-to-date primary copy of a document
• supports dynamic queries with automatic use of indices, like RDBMSs
• also supports map-reduce
  – helps complex aggregations across docs
• provides atomic operations on fields
(Optional slide: read yourself)

MongoDB: Atomic Ops on Fields
• The update command supports “modifiers” that facilitate atomic changes to individual values
  – $set sets a value
  – $inc increments a value
  – $push appends a value to an array
  – $pushAll appends several values to an array
  – $pull removes a value from an array, and $pullAll removes several values from an array
• Since these updates normally occur “in place”, they avoid the overhead of a return trip to the server
• There is an “update if current” convention for changing a document only if field values match a given previous value
• MongoDB supports a findAndModify command to perform an atomic update and immediately return the updated document
  – useful for implementing queues and other data structures requiring atomicity

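To get a feel for what the modifiers do, here is a pure-Python simulation on a dict. This is not the MongoDB driver API and it provides no real atomicity; `apply_update` and `update_if_current` are hypothetical helpers that only mimic the meaning of `$set`, `$inc`, `$push`, `$pull`, and the “update if current” convention.

```python
def apply_update(doc, update):
    """Apply a small subset of MongoDB-style modifiers to a dict, in place."""
    for op, fields in update.items():
        for field, arg in fields.items():
            if op == "$set":
                doc[field] = arg
            elif op == "$inc":
                doc[field] = doc.get(field, 0) + arg
            elif op == "$push":
                doc.setdefault(field, []).append(arg)
            elif op == "$pull":
                doc[field] = [v for v in doc.get(field, []) if v != arg]
            else:
                raise ValueError(f"unsupported modifier: {op}")
    return doc

def update_if_current(doc, expected, update):
    """'Update if current': apply only when the listed fields still match."""
    if all(doc.get(k) == v for k, v in expected.items()):
        apply_update(doc, update)
        return True
    return False

job = {"_id": 1, "status": "queued", "attempts": 0, "tags": ["a", "b"]}
apply_update(job, {"$inc": {"attempts": 1}, "$push": {"tags": "c"}})
# Two workers try to claim the same queued job; only the first one succeeds.
claimed = update_if_current(job, {"status": "queued"}, {"$set": {"status": "running"}})
claimed_again = update_if_current(job, {"status": "queued"}, {"$set": {"status": "running"}})
```

The claim pattern at the end is the kind of queue-like use the slide mentions for findAndModify; in a real deployment the server, not client code like this, guarantees that the check and the update happen as one step.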
MongoDB: Index
• MongoDB indices are explicitly defined using an ensureIndex call (createIndex in newer MongoDB versions)
  – any existing indices are automatically used for query processing
• To find all products released last year (2015) or later costing under $100 you could write:
• db.products.find({released: {$gte: new Date(2015, 0, 1)}, price: {$lte: 100}})
  – JavaScript months are zero-based, so new Date(2015, 0, 1) is January 1, 2015

MongoDB: Data
• MongoDB stores data in a binary JSON-like format called BSON
  – BSON supports boolean, integer, float, date, string and binary types
  – MongoDB can also support large binary objects, e.g. images and videos
  – These are stored in chunks that can be streamed back to the client for efficient delivery

MongoDB: Replication
• MongoDB supports master-slave replication with automatic failover and recovery
  – Replication (and recovery) is done at the level of shards
  – Replication is asynchronous for higher performance, so some updates may be lost on a crash