Download - Final Exam Logistics CS 133: Databasesbeth/courses/cs133/current/... · 2019-12-11 · CS 133: Databases Fall 2019 Lec 25 – 12/10 NoSQL Prof. Beth Trushkowsky Final Exam Logistics

Transcript
Page 1: Final Exam Logistics CS 133: Databasesbeth/courses/cs133/current/... · 2019-12-11 · CS 133: Databases Fall 2019 Lec 25 – 12/10 NoSQL Prof. Beth Trushkowsky Final Exam Logistics

CS133:Databases

Fall2019Lec25–12/10

NoSQL

Prof.BethTrushkowsky

FinalExamLogistics

•  Finalexamtake-home– Available:in-classonThursday– Due:WednesdayDecember18th,5:15pm

•  Sameresourcesasmidterm– Exceptthistime,twonotesheetsallowed(canre-useyourownfrommidterm)

GoalsforToday

•  DiscussdatareplicationindistributedDBMSs

•  Understandthemotivationandgoalsfor“NoSQL”datamanagementsystems

•  Reasonaboutthekeyconcepts,techniques,andtradeoffsforNoSQLsystems– TouchonacouplespecificNoSQLsystems(Dynamo,MongoDB,Cassandra)

SomeReferencesUsed

•  TenRulesforScalablePerformancein“SimpleOperation”Datastores–  CommunicationsoftheACM2011–  StonebrakerandCattell

•  ScalableSQLandNoSQLDataStores–  SIGMODRecord2011–  Cattell

•  Dynamo:Amazon’sHighlyAvailableKey-valueStore–  SOSP2007–  DeCandiaetal

•  MongoDBandCassandrawebsites

Page 2: Final Exam Logistics CS 133: Databasesbeth/courses/cs133/current/... · 2019-12-11 · CS 133: Databases Fall 2019 Lec 25 – 12/10 NoSQL Prof. Beth Trushkowsky Final Exam Logistics

AsynchronousReplication

•  Themodifyingxactcancommitbeforeallcopieshavebeenchanged– Users/appsmustbeawareofwhichcopytheyarereading,andthatcopiesmaybeout-of-syncforshortperiodsoftime

•  Twoapproachesforreplication:–  PrimarySite

–  Peer-to-Peer(akaorupdate-anywhere)– Differenceliesinhowmanycopiesare“updatable”

PrimarySiteReplication•  Exactlyonecopyofarelationpartitionisdesignatedtheprimarycopy.–  Replicasatothersitescannotbedirectlyupdated

•  Howarechangestotheprimarycopypropagatedtothesecondarycopies?– Oneapproach:logshipping

reads/writes

readsonly

readsonlyIfreadshappenatsecondarycopies,thenpossiblethataxactisnotbeabletoreaditsownwrites

Peer-to-PeerReplication

•  Morethanoneofthecopiesofanobjectcanbeprimary

•  Changestoacopymustbepropagatedtoothercopies

•  Iftwocopiesareupdatedinaconflictingmanner,thismustberesolved–  E.g.,Lastwritewins?Combineupdatessomehow?

reads/writes

reads/writes

reads/writes

Strongvs.EventualConsistency•  Strong:afterupdatetoanobject,subsequentreadsseethatupdate•  Weak:subsequentreadsofanupdatemaynotreflectthatupdate

–  Eventual:ifupdatesceased,eventuallythesystemwouldreflectallupdates

•  Eventualconsistencyhassomevariation

–  Read-your-own-writes,specialcaseofsessionorcausalconsistency–  Monotonicreads

•  BASE,notACID!

–  BASE:Basicallyavailable,softstate,eventuallyconsistentWernerVogelsoneventualconsistency:http://www.allthingsdistributed.com/2008/12/eventually_consistent.html

“EventuallyConsistent-Buildingreliabledistributedsystemsataworldwidescaledemandstrade-offsbetweenconsistencyandavailability.”-Vogels,CTOAmazon.com

Page 3: Final Exam Logistics CS 133: Databasesbeth/courses/cs133/current/... · 2019-12-11 · CS 133: Databases Fall 2019 Lec 25 – 12/10 NoSQL Prof. Beth Trushkowsky Final Exam Logistics

Exercise2:ReplicaConsistency

•  SupposehaveNreplicasofsomedataobject– W=#replicaswritetobeforexactcommits

– R=#replicasreadfrom

•  Strongconsistency:overlaptheWandRsets– R+W>N

– E.g.,Read-one-write-all:R=1,W=N

Whatis“NoSQL”?Here’sachallenge:canyoudescribeNoSQLwithoutusinganybuzzwords?!

Web2.0Agile

Development

EventualConsistency(BASEnotACID)

Amovementaroundnon-relationaldatastores

Tonsofuser-generatedcontentontheweb•  E.g.,social

networkingsites•  Web1.0:few

contentcreators,morestaticwebsites

Developandchangewebapplicationsiteratively•  Schemachangesand

non-uniformity•  Makechangeswhile

theapplicationislive

Performancegainsattheexpenseofconsistency•  Growincrementallyby

leveragingleasedcloudresourceslikeIaaS

•  Automaticallygrowresourcesasneeded

Storyofa“Successful”WebStartup

•  StartwitharelationalDBMSrunningonsinglemachine

•  Logicinwebapplicationmanagesdirectingqueries–  Cross-shardfiltersandjoinscodedinsidetheapp–  Applogicdealswithdataconsistency

Websitegetspopular!!Needtoscaleup…

…somanuallypartition/sharddataacrossmorenodes

Asthenumberofmachinesincreases,thechancethatsomethingfailsincreases

TonsofNoSQLsystems

Page 4: Final Exam Logistics CS 133: Databasesbeth/courses/cs133/current/... · 2019-12-11 · CS 133: Databases Fall 2019 Lec 25 – 12/10 NoSQL Prof. Beth Trushkowsky Final Exam Logistics

CAPTheorem•  EricBrewer’sCAPtheorem:adistributedsystemcan

onlyhavetwoofthefollowingthreeproperties:–  Consistency

–  Availability

–  TolerancetonetworkPartitions

Ofreplicateddata

Forwriterequests

Summary:NoSQLMotivation

•  DevelopmentofNoSQLsystemsmotivatedbydifficultyscalingupWeb2.0applications–  Thousandstomillionsofusers

– Many[small]readsandwrites(“smalloperations”)

•  Typicallymakesacrificesforperformance–  E.g.,noACIDxacts,eventualconsistency

AchievingScalablePerformance

•  Rule#1:Shared-nothingscalability

•  Rule#4:Highavailabilityandautomaticrecoveryessential

•  Rule#5:On-lineeverything(systemalways“up”)

•  Rule#6:Avoidmulti-nodeoperators

StonebrakerandCattell

Rule#1:Shared-nothingscalability

•  Goal:asapplicationgrows,needmoreserversaddedseamlessly–  Don’twantmanualmanagementofscalingup

•  Exampletechniques:–  Consistenthashing(DynamoandCassandra)–  Periodicre-balancingofpartitions(MongoDB)

Fig2inDynamo

Page 5: Final Exam Logistics CS 133: Databasesbeth/courses/cs133/current/... · 2019-12-11 · CS 133: Databases Fall 2019 Lec 25 – 12/10 NoSQL Prof. Beth Trushkowsky Final Exam Logistics

Rule#4:HighAvailabilityandAuto-Recovery

•  Goal:updatesalwayssucceed!–  Issue:conflictingwritesondisjointsetsofreplicas

Exercise3

•  Example:shoppingcart– Add2itemstocart,updategoestotworeplicas

– Partition!Add1(different)itemtoeachreplica

– Bothcartsare“version2”L

•  Dynamo:vectorclocks

•  Orlatesttimestampwins?

“…inthecaseofatimestamptie,Cassandrafollowstworules:first,deletestakeprecedenceoverinserts/updates.Second,iftherearetwoupdates,theonewiththelexicallylargervalueisselected.”

https://wiki.apache.org/cassandra/FAQ#clocktie

Replication/AvailabilityExamples

•  MongoDB:automaticfailoverforprimary

•  Cassandra/Dynamo:peer-to-peerreplication–  tunableconsistency,e.g.,quorumornot

Picsfromhttps://docs.mongodb.org/master/MongoDB-replication-guide-master.pdf

Rule#5:On-lineEverything(Schema)

•  Recall:–  Adatamodelisacollectionofhigh-leveldatadescriptionconstructs

–  Aschemaisadescriptionofaparticularcollectionofdata,usingagivendatamodel

•  Relationalmodelhasarigid,structuredschema–  Attributesforrelationpre-defined,sharedbyalltuples–  Dataandintegrityconstraints–  Referentialconstraints

Whatifyouwantmoreflexibility?

Page 6: Final Exam Logistics CS 133: Databasesbeth/courses/cs133/current/... · 2019-12-11 · CS 133: Databases Fall 2019 Lec 25 – 12/10 NoSQL Prof. Beth Trushkowsky Final Exam Logistics

NoSQL:Non-RelationalDataModels

•  Agiledevelopment,liveschemachanges– Noenforcementofstructure– E.g.,every“tuple”couldhavedifferentattributes

•  Inessence,thesedatamodelsarekey-based– Key:someuniqueidentifiertolookupacorresponding“value”

– Whatthevalueiscanbecomplex

Keytypicallyplaysaroleindatapartitioningscheme

Key-ValueDataModel

•  Examplesystem:Amazon’sDynamo

•  Keyissomeuniqueidentifier,valuecanbeanything,BLOBinterpretedbyapplogic–  E.g.,idàshoppingcartcontents

•  Queryfunctionality– Get(key),put(key,value)– Onlyprimarykeyindex– Noindexlookupsonnon-keys(secondaryindexes)

DocumentDataModel

•  Examplesystem:MongoDB

•  Storescollectionsof“documents”(e.g.,JSON)–  Relation:tuple::Collection:document–  KeyàDocument– Documenthaskey-valuepairs,canbenestedlistsorscalars(andnotdefinedinaglobalschema)

•  Queryfunctionality–  Primarykeylookups–  Secondaryindexesonotherattributes

MongoDBExampleExample:infoaboutproducts,whichhavemanypartsdb.createCollection(”parts") db.createCollection(”products") // example part { _id : ObjectID('AAAA'), partno : '123-aff-456', name : '#4 grommet', qty: 94, cost: 0.94, price: 3.99 manufac_addr : [ { street: '123 Sesame St',

city: 'Anytown', cc: 'USA' }, { street: '123 Avenue Q',

city: 'New York', cc: 'USA' }]}

Modifiedfrom:http://blog.mongodb.org/post/87200945828/6-rules-of-thumb-for-mongodb-schema-design-part-1

One-to-manyrelationshipcapturedwithreferences

// example product{ name : ‘smoke shifter', manufacturer : 'Acme Corp', catalog_number: 1234, parts : [ ObjectID('AAAA'), ObjectID('F17C’), ObjectID('D2AA’)]}

WhataboutaJOIN?

Page 7: Final Exam Logistics CS 133: Databasesbeth/courses/cs133/current/... · 2019-12-11 · CS 133: Databases Fall 2019 Lec 25 – 12/10 NoSQL Prof. Beth Trushkowsky Final Exam Logistics

ExtensibleRecord(akaColumnFamily)

•  Examplesystem:BigTable,Cassandra

•  Abitmorecomplexthandocumentmodel–  Relation:tuple::ColumnFamily:Row–  KeyàSetofcolumns(“wide-columnstore”)–  Eachcolumnhaskey-valuepairs– Differentrecordscanhavedifferentcolumns

•  Queryfunctionalityin“CQL”–  Primarykeylookupsbyrow(withsortedcolumns)–  Secondaryindexes

“Row”called“partition”now

CassandraExamples

Roughlybasedon:http://www.datastax.com/dev/blog/thrift-to-cql3

CREATE TABLE people (user_id text PRIMARY KEY,name text,addresses list

);

CREATE TABLE comments (article_id uuid,posted_at timestamp,author text,content text,PRIMARY KEY (article_id,posted_at)

);

alicious addresses name

West Alice

North

Drinkward

bobtastic addresses name

1LonelyAve Bob

a4i89 2015-12-03

2015-11-11

author Bob author Alice

content Bravo! content AwfulL

Firstargumentispartitionkey,othersare“clustering”columns

Becomesthepartitionkey

Rule#6:AvoidMulti-NodeQueries

•  NoACIDtransactionsacrossprimarykeys

•  Nojoins!Denormalizationhelps

•  Systemsofferdifferentlevelsof“protection”– Key-valuestores:get(key)methodrequireskey– MongoDB:Tablescansdiscouraged– Cassandra:Tablescansprohibited

db.people.find( {favorite-color: ‘blue’}, {name: 1, addresses: 1})