Copyright © 2016, Oracle and/or its affiliates. All rights reserved.
How to Create Massively Scalable Database Applications
Doug Hood (@ScalableDBDoug)
Consulting Member of Technical Staff, Product Manager, TimesTen In-Memory Database
May 16, 2019
Agenda
1. Latency, Throughput and Scalability
2. Scale-up vs Scale-out
3. Scale-out Architectures
4. Trivial Scalability Benchmarks
5. Scaling a Customer Workload
6. Summary and Q&A
Latency, Throughput and Scalability
Latency: how quickly can one operation complete?
One sprinter in 9.58 seconds, ~40 km/h for 100 m [2009]
Throughput: how quickly can many operations complete?
Ten sprinters in under 11 seconds, ~40 km/h for 100 m [2009]
Scalability: by adding more resources, can throughput keep increasing?
33 cars on a 2.5 mile oval track, ~250 km/h for 804 km [Indy 500, 2017]
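The difference between latency and throughput can be made concrete with Little's Law (concurrency = throughput × latency), a standard queueing relation that the slide does not state. The sketch below assumes a closed system where every runner is always busy; the function name is hypothetical.

```python
def throughput(concurrency: int, latency_s: float) -> float:
    """Little's Law rearranged: throughput = concurrency / latency."""
    if latency_s <= 0:
        raise ValueError("latency must be positive")
    return concurrency / latency_s

# One sprinter finishing 100 m in 9.58 s: the lowest latency, ~0.10 runs per second.
one = throughput(1, 9.58)
# Ten sprinters running side by side, each taking up to 11 s.
ten = throughput(10, 11.0)
print(f"1 sprinter:   {one:.2f} runs/s")
print(f"10 sprinters: {ten:.2f} runs/s")
```

Latency got slightly worse (11 s vs 9.58 s) while throughput rose almost nine-fold; scalability asks whether adding more runners keeps raising that number.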
Confidential – Oracle Internal/Restricted/Highly Restricted
You can only go so big
• 32 cores, 256 threads, 16 TB DRAM
• 28 cores, 56 threads, 48 TB DRAM
• 10 cores, 20 threads, 8 TB DRAM
• SGI Altix 4700: Intel Itanium 2, 2048 CPUs @ 900 MHz (2 cores, 4 threads), 128 TB DRAM
Vertical Scaling Limits
• Only so many CPUs can be interconnected
• NUMA limits
• Complexity & cost
• Niche market
Figure: server classes by socket count (1-2 sockets, 4-8 sockets, 8+ sockets)
Horizontal Scaling Hardware
• Use cheap/fast Linux x86-64 servers, e.g. Oracle Sun X7-2
• NUMA effects are minimal
• Commodity servers keep getting faster, cheaper and more powerful
• 1.5 TB DRAM [persistent RAM coming, Intel/Oracle PMem demo]
• 26 cores
• Up to eight NVMe SSDs
• 42 1U servers per rack:
  – 2 × 42 = 84 CPUs
  – 1.5 × 42 = 63 TB RAM
Lower Latency with TimesTen Cache
Oracle 11.2.0.4 RAC; the RAC nodes were Oracle Sun X7-2L servers with NVMe storage; over 50 million users.
The Application-Tier Database Cache (TimesTen) ran on the same nodes as the production RAC.
5-table joins over hundreds of millions of rows of data.
Latency is in microseconds:
Query   Oracle (µs)   TimesTen Cache (µs)
Q1      43            3
Q2      69            6
Q3      105           8
Q4      121           20
Q5      140           18
Q6      163           19
Q7      231           18
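The table is easier to compare as a speedup ratio (Oracle latency divided by cache latency). The short sketch below only re-does that arithmetic on the numbers above; the dictionary and function names are illustrative.

```python
# (Oracle µs, TimesTen cache µs) per query, copied from the table above.
latency_us = {
    "Q1": (43, 3), "Q2": (69, 6), "Q3": (105, 8), "Q4": (121, 20),
    "Q5": (140, 18), "Q6": (163, 19), "Q7": (231, 18),
}

def speedup(oracle_us: float, cache_us: float) -> float:
    """How many times faster the cache answered the same query."""
    return oracle_us / cache_us

for q, (ora, tt) in latency_us.items():
    print(f"{q}: {speedup(ora, tt):.1f}x faster from the cache")
```

The ratios range from about 6x (Q4) to about 14x (Q1).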
Oracle Database & Real Application Clusters Architecture
Oracle Database: single instance, single DB image
Oracle Real Application Clusters: multiple database instances, single DB image, shared storage
Oracle Exadata: multiple database instances, single DB image, shared storage
Oracle Sharding Architecture
Oracle Sharding: multiple database instances, multiple DB images, independent storage
Oracle NoSQL Architecture
Oracle NoSQL: multiple ‘DB’ instances, one DB image, independent storage
Oracle TimesTen Scaleout Architecture
Oracle TimesTen Scaleout: multiple database instances, single DB image, shared nothing
Summary of How to Scale Database Apps
• Do not do dumb things
• Tune your SQL
• Use PL/SQL stored procedures intelligently
• Use good hardware
• Scale up with Sun SuperCluster
• Scale out with Exadata
• Scale out with an Application-Tier Database Cache or TimesTen Scaleout
Low Latency: Microsecond Response Time (millionths of a second)
TimesTen 11.2.2.8.0 (100M rows, 17 GB data) on a 2-socket server, 22 cores/socket, 2 threads/core:
SELECT query: 1.64 microseconds
UPDATE transaction: 5.06 microseconds
select directory_nb, last_calling_party, descr
from vpn_users where vpn_id = :1 and vpn_nb = :2
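The query above uses bind variables (:1, :2), so one SQL text can be prepared once and executed many times with different values. As a self-contained stand-in for a TimesTen connection, the sketch below uses Python's built-in sqlite3 module, which keeps a per-connection cache of prepared statements keyed by SQL text; the table contents and column types are made up, and named binds (:vpn_id, :vpn_nb) replace the positional :1/:2 for readability.

```python
import sqlite3

# In-memory SQLite table standing in for vpn_users (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vpn_users ("
    " vpn_id INTEGER, vpn_nb INTEGER, directory_nb TEXT,"
    " last_calling_party TEXT, descr TEXT,"
    " PRIMARY KEY (vpn_id, vpn_nb))"
)
conn.executemany(
    "INSERT INTO vpn_users VALUES (?, ?, ?, ?, ?)",
    [(1, n, f"555-{n:04d}", f"555-9{n:03d}", f"user {n}") for n in range(100)],
)

# One SQL text with bind variables: prepared once, re-executed with new values.
QUERY = (
    "SELECT directory_nb, last_calling_party, descr "
    "FROM vpn_users WHERE vpn_id = :vpn_id AND vpn_nb = :vpn_nb"
)

def lookup(cur, vpn_id, vpn_nb):
    """Point lookup by the two key columns; values are bound, never formatted into SQL."""
    cur.execute(QUERY, {"vpn_id": vpn_id, "vpn_nb": vpn_nb})
    return cur.fetchone()

cur = conn.cursor()
rows = [lookup(cur, 1, n) for n in (7, 42, 99)]
print(rows[1])  # ('555-0042', '555-9042', 'user 42')
```

Building a new SQL string per call instead (for example with an f-string holding the key values) forces a fresh parse every time and invites SQL injection.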
Some Throughput & Scalability Benchmarks
• YCSB: Yahoo Cloud Serving Benchmark
  – Developed at Yahoo for cloud-scale workloads
  – Widely used to compare scale-out databases, NoSQL databases, and (non-durable) in-memory data grids
• A series of workload types is defined:
  – Workload A: 50% reads, 50% updates
  – Workload B: 95% reads, 5% updates
  – Workload C: 100% reads
• The YCSB client cannot be changed
  – DB vendors implement the DB client interface in Java
  – The version and exact configuration matter
Surveyed YCSB (Workload B) Results*
Product Type      Nodes   Ops/Sec
NoSQL DB          32      227K
NoSQL DB          2       275K
NoSQL DB          3       715K
Scale-Out RDBMS   6       1.6M
NoSQL DB          8       1.6M
*There is no official repository of YCSB results. These were the largest results we found online.
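Each YCSB workload is essentially a fixed read/update mix over a key space. The toy generator below reproduces the three mixes listed above with a uniform key choice; it is not the YCSB client (which is Java and, per the slide, must not be modified), and the function name and key format are illustrative assumptions.

```python
import random
from collections import Counter

# Read proportion for each workload listed above; the remainder are updates.
READ_PROPORTION = {"A": 0.50, "B": 0.95, "C": 1.00}

def generate_ops(workload: str, n_ops: int, n_records: int, seed: int = 0):
    """Yield (operation, key) pairs; keys are drawn uniformly (illustrative naming)."""
    rng = random.Random(seed)
    read_p = READ_PROPORTION[workload]
    for _ in range(n_ops):
        op = "read" if rng.random() < read_p else "update"
        key = f"user{rng.randrange(n_records)}"
        yield op, key

mix = Counter(op for op, _ in generate_ops("B", 100_000, 1_000_000))
print(mix["read"] / 100_000)  # close to 0.95
```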
YCSB Workload B (95% Read, 5% Update): 38 Million Ops/Sec
Operations per second by TimesTen Scaleout configuration (replica sets x replicas per set):
1x2: 2,772,366
2x2: 5,505,610
4x2: 10,661,407
8x2: 20,466,127
16x2: 38,154,715
Oracle TimesTen Scaleout configuration
YCSB
• YCSB version 0.15.0
• 1 KB record (10 fields × 100 bytes)
• 100M records per replica set
• Uniform distribution
TimesTen Scaleout
• 1 to 16 replica sets
• 2 synchronous replicas per replica set
Oracle Cloud Infrastructure
• 32 × BM.DenseIO2.52
Reminder: the best YCSB-B result found in our survey was 1.6 million ops/sec.
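One way to read these charts is as scaling efficiency: measured throughput divided by what perfectly linear scaling from the smallest configuration would predict. The helper below is a hypothetical sketch applied to the YCSB numbers above.

```python
# Throughput by number of replica sets, from the YCSB Workload B chart above.
ycsb_b = {1: 2_772_366, 2: 5_505_610, 4: 10_661_407, 8: 20_466_127, 16: 38_154_715}

def scaling_efficiency(results: dict) -> dict:
    """Throughput relative to perfect linear scaling from the smallest configuration."""
    base_n = min(results)
    per_unit = results[base_n] / base_n
    return {n: ops / (per_unit * n) for n, ops in sorted(results.items())}

for n, eff in scaling_efficiency(ycsb_b).items():
    print(f"{n:>2} replica sets: {eff:.0%} of linear")
```

On these numbers the 16-replica-set run reaches about 86% of linear. The same function applied to the TPTBM results gives about 84% for the 80/20 mix (64 vs 2 replica sets) and slightly above 100% for the read-only run (32 vs 1).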
TPTBM 80% Read, 20% Update: 153 Million Transactions/Sec
Transactions per second by TimesTen Scaleout configuration (replica sets x replicas per set):
2x1: 5,695,071
4x1: 11,251,034
8x1: 22,611,122
16x1: 41,633,465
32x1: 81,746,166
64x1: 153,140,347
Oracle TimesTen Scaleout configuration
TPTBM configuration
• 128-byte record
• 100M records per replica set
• Uniform distribution
TimesTen Scaleout
• 1 to 64 replica sets
• 1 replica per replica set
Oracle Cloud Infrastructure
• 32 × BM.DenseIO2.52
• Two TimesTen instances per compute node
TPTBM 100% Read: 1.4 Billion Reads Per Second!!
Reads per second by TimesTen Scaleout configuration (replica sets x replicas per set):
1x2: 44,524,225
2x2: 88,637,257
4x2: 178,371,185
8x2: 356,355,946
16x2: 714,785,994
32x2: 1,430,090,196
Oracle TimesTen Scaleout configuration
TPTBM configuration
• 128-byte record
• 100M records per replica set
• Uniform distribution
TimesTen Scaleout
• 1 to 32 replica sets
• 2 synchronous replicas per replica set
Oracle Cloud Infrastructure
• 32 × BM.DenseIO2.52
• Two TimesTen instances per compute node
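The benchmark configurations can be cross-checked: database instances = replica sets × replicas per set, and compute nodes = instances ÷ instances per node. The sketch below does that arithmetic for the largest run of each chart. The YCSB slide does not state instances per node, so one instance per node is an assumption there; the class name is hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Topology:
    replica_sets: int
    replicas_per_set: int
    instances_per_node: int

    @property
    def instances(self) -> int:
        return self.replica_sets * self.replicas_per_set

    @property
    def compute_nodes(self) -> int:
        # Ceiling division: a partly used node still counts as a node.
        return -(-self.instances // self.instances_per_node)

# Largest configuration of each benchmark above.
ycsb_b = Topology(replica_sets=16, replicas_per_set=2, instances_per_node=1)  # 1/node assumed
tptbm_rw = Topology(replica_sets=64, replicas_per_set=1, instances_per_node=2)
tptbm_ro = Topology(replica_sets=32, replicas_per_set=2, instances_per_node=2)

for name, t in [("YCSB-B", ycsb_b), ("TPTBM 80/20", tptbm_rw), ("TPTBM 100% read", tptbm_ro)]:
    print(f"{name}: {t.instances} instances on {t.compute_nodes} nodes")
```

All three land on 32 compute nodes, consistent with the 32 × BM.DenseIO2.52 shapes listed for each run.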
What Hardware was Used?
Oracle Sun X7-2
• 26 cores
• 768 GB RAM
• Four NVMe SSDs
• Two 10G Ethernet
Oracle Cloud Infrastructure
• 32 × BM.DenseIO2.52
World’s Fastest OLTP Database
Subset of Customer’s Data Model
Figure: tables D, O, A, U, M and S (+ seven other tables for the ‘write’ workload)
Critical Query
SELECT a.usr_id, … FROM u, d, o, a
  WHERE u.login_name = :loginName
  AND u.dom_id = a.dom_id
  AND u.usr_org_id = o.org_id
  AND u.account_id = a.acct_id(+)
  AND u.status <> :x;
SELECT mn_usr_id FROM m WHERE mn_usr_id = uid AND status = :y;
SELECT s.attr_name FROM s
  WHERE s.entity_id = muid
  AND (s.context = :p OR b.context = :q)
  AND (s.spid = :m OR s.spid = :n OR s.sid = :o)
  ORDER BY b.attr_name;
Critical Update Transaction
update R1 set something = :s where col1 = :x and col2 = :y;
select something from R1 where col1 = :x and col2 = :y;
update R2 set something = :s where col1 = :x and col2 = :y;
select something from R2 where col1 = :x and col2 = :y;
update R3 set something = :s where col1 = :x and col2 = :y;
select something from R3 where col1 = :x and col2 = :y;
update R7 set something = :s where col1 = :x and col2 = :y;
select something from R7 where col1 = :x and col2 = :y;
Scale Up or Scale Out?
Four 5.1 GHz SPARC CPUs; 256 hardware threads per CPU socket; 64 MB L3 cache; 16 TB RAM; 8 NVMe SSDs for DB storage + 12 disks; 40G InfiniBand; 4 quad 10G Ethernet
Oracle Database 11g
32-core VMs; 64 GB RAM; Cinder storage
Best Case Architecture for Customer Workload
Attribute                Value
Data reads & writes      100% local RAM
Storage reads & writes   100% local NVMe SSD
Storage bottleneck       No
Fast CPU                 Xeon
Number of CPU cores      24
Sufficient memory        Yes, 320 GB
DB tuned                 Yes
App tuned                No, Python without SQL prepares or binds
Result: 11 million transactions/second
Worst Case Architecture for Customer Workload
Attribute                Value
Data reads & writes      90% on a remote VM
Storage reads & writes   100% remote [Cinder/NetApp]
Storage bottleneck       Maybe, network bound
Fast CPU                 Xeon
Number of CPU cores      32
Sufficient memory        No, only 32 GB
DB tuned                 Yes
App tuned                Yes, ODBC with SQL prepares and binds
Result: 304K transactions/second
Some Results
• 4-socket SMP: 240K TPS, 60/40 workload, IO bound, ACID, 1PC
• Negative scaling: <168K TPS, 60/40 workload, network bound, eventual consistency
• 37-node cluster: 168K TPS, 60/40 workload, network bound, eventual consistency
• 10-node cluster: 304K TPS, 60/40 workload, network bound, ACID, 2PC
How Many Client/Server SQL Network Round Trips?
1. Select * from table where PK = :value;
2. Select * from table where PK between 10 and 20;
3. Update table set column = :X where PK = :value;
4. Update table set column = :X where PK between 1000 and 2000;
5. Select * from a, b, c, d where {non-Cartesian product}
A. One
B. Two
C. Three
D. Lots
E. It depends
How many server-side network messages when tables are hash distributed?
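For the server-side half of the question, what matters is how many elements hold the rows a statement touches. The toy model below assumes four elements and places key k on element (k mod 4) + 1, which matches the customer figure on the next slide; a real system hashes the key first (the slide mentions a consistent hash algorithm), so treat this only as an illustration of why the answer is "it depends". The function names are hypothetical.

```python
N_ELEMENTS = 4

def element_for(pk: int) -> int:
    """Toy placement matching the figure: key 0 -> element 1, key 1 -> element 2, ..."""
    return pk % N_ELEMENTS + 1

def elements_touched(pks) -> set:
    """Elements that hold at least one of the given primary keys."""
    return {element_for(pk) for pk in pks}

point = elements_touched([15])                   # statements 1 and 3: WHERE PK = :value
small_range = elements_touched(range(10, 21))    # statement 2: PK BETWEEN 10 AND 20
big_range = elements_touched(range(1000, 2001))  # statement 4: PK BETWEEN 1000 AND 2000
print(len(point), len(small_range), len(big_range))  # 1 4 4
```

A point lookup or update by primary key can be served by a single element, while a range predicate on the distribution key fans out to every element, even though the client typically issues a single request in both cases.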
Data Distribution Methods
• Distribute by hash
  – Primary key or user-specified columns
  – Consistent hash algorithm
  – Examples: customers, subscribers, accounts
• Distribute by reference
  – Co-locate related data to optimize joins
  – Based on FK relationship
  – Supports multi-level hierarchy
• Distribute by duplicate
  – Identical copies on all elements
  – Useful for reference tables
  – Read and join optimization
Figure: Distribute Table Data by Hash, Reference or Duplicate
Customer (distribute by hash):
  Element 1: 0 David, 4 Igor, 8 Tim
  Element 2: 1 Bill, 5 Sam, 9 Charles
  Element 3: 2 Olaf, 6 Henri, 10 Jie
  Element 4: 3 Chi, 7 Simon, 11 Chris
Order (distribute by reference): 1 0 16/6/15; 6 8 16/3/22; 2 5 16/2/22; 5 6 16/5/10; 3 3 16/3/1; 4 11 16/2/5
Products (distribute by duplicate): phone 100, tablet 200, watch 300, with an identical copy on every element
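The three methods can be sketched as placement rules. The model below reproduces the figure under stated assumptions: customers are placed by key mod 4 (a stand-in for the real hash), each order row is read as (order id, customer id, date) and follows its customer through the foreign key, and the product list is copied to every element. The column meanings are inferred from the figure, and all names are hypothetical.

```python
N_ELEMENTS = 4

customers = {0: "David", 1: "Bill", 2: "Olaf", 3: "Chi", 4: "Igor", 5: "Sam",
             6: "Henri", 7: "Simon", 8: "Tim", 9: "Charles", 10: "Jie", 11: "Chris"}
# (order_id, customer_id, date) as read from the figure; column meanings are assumed.
orders = [(1, 0, "16/6/15"), (6, 8, "16/3/22"), (2, 5, "16/2/22"),
          (5, 6, "16/5/10"), (3, 3, "16/3/1"), (4, 11, "16/2/5")]
products = [("phone", 100), ("tablet", 200), ("watch", 300)]

def hash_element(key: int) -> int:
    """Toy hash placement reproducing the figure (key mod 4, elements numbered from 1)."""
    return key % N_ELEMENTS + 1

elements = {e: {"customer": [], "order": [], "products": []} for e in range(1, N_ELEMENTS + 1)}

for cid, name in customers.items():   # distribute by hash on the primary key
    elements[hash_element(cid)]["customer"].append((cid, name))
for order in orders:                  # distribute by reference: follow the FK to the customer
    elements[hash_element(order[1])]["order"].append(order)
for e in elements:                    # distribute by duplicate: full copy everywhere
    elements[e]["products"] = list(products)

print(elements[1]["customer"])  # [(0, 'David'), (4, 'Igor'), (8, 'Tim')]
print(elements[1]["order"])     # [(1, 0, '16/6/15'), (6, 8, '16/3/22')]
```

Because every order sits with its customer and every element holds the products, a customer/order/product join can be answered on one element without network hops.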
Scalability Challenges
• Four-table joins, including (+) outer joins, with hash distribution for the ‘read’ workload
• Seven queries + seven updates for the ‘write’ workload
• Client/server round trips
• Not enough RAM [64 GB] per VM
• KVM + OpenStack Neutron networking overhead
Techniques which Helped Scalability
• Determine the best distribution clauses
  – The Distribution Advisor eliminates the guesswork
• Determine the best indexes
  – The Index Advisor eliminates the guesswork
• Prepare and bind the SQL statements
• Check the explain plans
• Use stored procedures for the ‘read’ and ‘write’ transactions
  – Execute many statements in a single network round trip; procedural logic + commit/rollback
• Use the Routing API
  – Determine where the data is to avoid network hops
• Use more DB nodes
  – The VNIC became network bound [ksoftirqd]
  – Use more nodes to lessen the load per VNIC
TODO:
• TCP tuning
• RDMA
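Stored procedures help because every client/server call costs a network round trip. The toy model below counts round trips for the ‘write’ transaction shape described earlier (an update and a select on each of seven tables, plus a commit), issued statement by statement versus as one procedure call. It does not talk to a real database; the 200 µs round-trip time and the procedure name write_txn are assumptions for illustration, not measurements or objects from this workload.

```python
from dataclasses import dataclass, field

@dataclass
class ToyConnection:
    """Counts client/server round trips; it does not execute any SQL."""
    rtt_us: float = 200.0  # assumed network round-trip time in microseconds
    round_trips: int = 0
    log: list = field(default_factory=list)

    def execute(self, sql: str) -> None:
        self.round_trips += 1
        self.log.append(sql)

    def commit(self) -> None:
        self.round_trips += 1

    @property
    def network_time_us(self) -> float:
        return self.round_trips * self.rtt_us

TABLES = [f"R{i}" for i in range(1, 8)]  # seven tables, as in the 'write' workload

def write_txn_statement_by_statement(conn: ToyConnection) -> None:
    # Table names come from a fixed list; the values stay as bind variables.
    for t in TABLES:
        conn.execute(f"update {t} set something = :s where col1 = :x and col2 = :y")
        conn.execute(f"select something from {t} where col1 = :x and col2 = :y")
    conn.commit()

def write_txn_stored_procedure(conn: ToyConnection) -> None:
    # One call; the update/select pairs and the commit run inside the database.
    conn.execute("begin write_txn(:s, :x, :y); end;")  # hypothetical procedure

naive, proc = ToyConnection(), ToyConnection()
write_txn_statement_by_statement(naive)
write_txn_stored_procedure(proc)
print(naive.round_trips, proc.round_trips)          # 15 1
print(naive.network_time_us, proc.network_time_us)  # 3000.0 200.0
```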
TimesTen Scaleout SQL APIs
API        Comment
JDBC       The same (JDBC 4.3)
ODBC       The same (ODBC 3.5.2)
OCI        The same (OCI 11.2.0.4+)
R-Oracle   The same (OCI 11.2.0.4+)
ODP.Net    The same (OCI 11.2.0.4+)
PL/SQL     The same (11.2.0.4+)
Python     The same (cx_Oracle, ODPI-C)
Ruby       The same (Ruby-ODPI, ODPI-C)
GoLang     The same (go-goracle, ODPI-C)
TimesTen On Premises
• TimesTen Scaleout requires:
  – Linux x86-64 (glibc 2.12+)
  – Oracle Linux/Red Hat/CentOS 6.4+, 7+
  – Ubuntu 14.04+
  – SuSE 12+
  – JDK 8+
  – TCP/IP or IPoIB
  – A file system [e.g. ext4, not ext2 or ext3]
  – Enough RAM for the DB
TimesTen Scaleout on OCI, AWS, Azure, Google
Centralized Installation and Management
• All TimesTen Scaleout management and admin operations are performed from a single host:
  – Installing software
  – Patching software
  – Configuration
  – Database creation and management
  – Backup and restore
  – Monitoring
  – Collecting diagnostics
• Command-line interface
• SQL Developer (GUI) interface
Using Python cx_Oracle with TimesTen Scaleout
tnsnames.ora:
sampledb_1812=
  (DESCRIPTION=
    (CONNECT_DATA=
      (SERVICE_NAME=sampledb_1812)
      (SERVER=timesten_direct)))
sampledbCS_1812=
  (DESCRIPTION=
    (CONNECT_DATA=
      (SERVICE_NAME=sampledbCS_1812)
      (SERVER=timesten_client)))
TimesTen ODBC DSN: client/server or direct linked.
Python [and Node.js, GoLang, Ruby and PHP] uses an OCI driver; use tnsnames or Easy Connect to connect.
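The two aliases differ only in the service name and the SERVER value (timesten_direct for a direct-linked connection, timesten_client for client/server). The helper below is a hypothetical convenience, not part of any Oracle tool, that builds such one-line entries so the pattern is explicit. The actual connection call is shown only as a comment because it needs cx_Oracle and a running database; the user name and password there are placeholders.

```python
def tns_entry(alias: str, direct: bool) -> str:
    """Build a one-line tnsnames.ora entry for a TimesTen service (hypothetical helper)."""
    server = "timesten_direct" if direct else "timesten_client"
    return (f"{alias}=(DESCRIPTION=(CONNECT_DATA="
            f"(SERVICE_NAME={alias})(SERVER={server})))")

entries = [tns_entry("sampledb_1812", direct=True),
           tns_entry("sampledbCS_1812", direct=False)]
print("\n".join(entries))

# With cx_Oracle installed and the database running, the alias is passed as the DSN:
#   import cx_Oracle
#   conn = cx_Oracle.connect("appuser", "password", "sampledbCS_1812")
```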
Oracle TimesTen In-Memory Database
Relational Database
– Pure in-memory
– ACID compliant
– Standard SQL
– Entire database in DRAM
Persistent and Recoverable
– Database and transaction logs persisted on local disk or flash storage
– Replication to standby and DR systems
Extremely Fast
– Microseconds response time
– Very high throughput
Highly Available
– Active-standby and multi-master replication
– Very high performance parallel replication
– HA and disaster recovery
Most Widely Used Relational In-Memory Database, Deployed by Thousands of Companies
The Forrester Wave™: In-Memory Databases, Q1 2017
Oracle In-Memory Databases Scored Highest by Forrester on both Current Offering and Strategy
http://www.oracle.com/us/corporate/analystreports/forrester-imdb-wave-2017-3616348.pdf
The Forrester Wave™ is copyrighted by Forrester Research, Inc. Forrester and Forrester Wave™ are trademarks of Forrester Research, Inc. The Forrester Wave™ is a graphical representation of Forrester's call on a market and is plotted using a detailed spreadsheet with exposed scores, weightings, and comments. Forrester does not endorse any vendor, product, or service depicted in the Forrester Wave. Information is based on best available resources. Opinions reflect judgment at the time and are subject to change.
Single Database Image
• Database size not limited by the memory of a single server
• Table data distributed across all elements
  – All elements are equal
• Connect to any element and access all data
  – Distributed queries, joins & transactions
• No need to de-normalize the data model
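"Connect to any element and access all data" implies that the element a client connects to coordinates work on the others. The toy scatter-gather sketch below models elements as in-process lists queried from a thread pool rather than separate hosts; it makes no claim about how TimesTen Scaleout executes distributed queries internally, and the names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Each element holds a slice of the table (toy data: key mod 4 placement).
ELEMENTS = [[k for k in range(100) if k % 4 == e] for e in range(4)]

def local_query(element_rows, predicate):
    """Runs on one element: filter its own rows."""
    return [r for r in element_rows if predicate(r)]

def distributed_query(predicate):
    """Coordinator: scatter the query to every element, then gather and merge."""
    with ThreadPoolExecutor(max_workers=len(ELEMENTS)) as pool:
        partials = pool.map(lambda rows: local_query(rows, predicate), ELEMENTS)
        return sorted(r for part in partials for r in part)

result = distributed_query(lambda k: 10 <= k <= 20)
print(result)  # [10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
```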
High Availability and Maximum Throughput
• Built-in HA via multiple copies of the data (K-safety)
  – Automatically kept in sync
• All replicas are active for reads and writes
  – Double the compute capacity
• Transactions can be initiated from and executed on any replica
Figure: K-Safety, All Active (data A, B, C and D, each held on two elements)
TimesTen Scaleout: Elastic Scalability
Adding (and removing) database elements:
- Data redistributed to new elements
- Workload automatically uses the new elements
  - Connections will start to use new elements
  - Throughput increases due to increased compute resources
Expand and shrink the database based on business needs.
Figure: replica sets 1 to 5 (data A/A', B/B', C/C', D/D', E/E') spread across Data Space Group 1 and Data Space Group 2
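Elastic scaling works best when adding an element moves only part of the data, which is the property the hash-distribution slide's "consistent hash algorithm" points to. The generic consistent-hash ring below (virtual nodes on a ring; each key goes to the next point clockwise) illustrates that property and contrasts it with naive key mod N placement. It is a textbook technique, not TimesTen Scaleout's documented algorithm; the replica set names are hypothetical.

```python
import bisect
import hashlib

def h(value: str) -> int:
    """Stable 64-bit hash (Python's built-in hash() is randomized per process)."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")

class HashRing:
    def __init__(self, nodes, vnodes: int = 100):
        # Each node owns many points on the ring to even out the load.
        self._ring = sorted((h(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))
        self._points = [p for p, _ in self._ring]

    def node_for(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping around.
        i = bisect.bisect(self._points, h(key)) % len(self._ring)
        return self._ring[i][1]

keys = [f"customer{i}" for i in range(10_000)]
before = HashRing(["rs1", "rs2", "rs3", "rs4"])
after = HashRing(["rs1", "rs2", "rs3", "rs4", "rs5"])

moved_ring = sum(before.node_for(k) != after.node_for(k) for k in keys)
moved_mod = sum(h(k) % 4 != h(k) % 5 for k in keys)
print(f"consistent hashing moved {moved_ring / len(keys):.0%} of keys")
print(f"mod-N placement moved   {moved_mod / len(keys):.0%} of keys")
```

With the ring, roughly one fifth of the keys move and all of them move to the new replica set; with mod-N placement about 80% of the keys change location.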
Database Fault Tolerance: No Application Downtime, Provided One Entire Copy of the Database Is Available
• If multiple elements fail, applications will continue provided there is one complete copy of the database
• Recovery after failure is automatic
• If an entire replica set is down, the application can explicitly choose to accept partial results
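The availability rule on the K-safety and fault-tolerance slides reduces to a simple check: the database is fully available while every replica set still has at least one live element; otherwise only partial results are possible. A minimal sketch of that rule, assuming K = 2 replicas per replica set; the element names are hypothetical, and the application's explicit opt-in to partial results is not modeled.

```python
# Replica set -> elements holding a copy of that slice of the data (K = 2).
REPLICA_SETS = {
    "A": ["host1.e1", "host2.e1"],
    "B": ["host3.e1", "host4.e1"],
    "C": ["host5.e1", "host6.e1"],
    "D": ["host7.e1", "host8.e1"],
}

def database_status(failed: set) -> str:
    """Classify availability given the set of failed elements."""
    down = [rs for rs, elems in REPLICA_SETS.items()
            if all(e in failed for e in elems)]
    if not down:
        return "available"  # at least one complete copy of the data survives
    return "partial results only (replica sets down: " + ", ".join(down) + ")"

print(database_status({"host1.e1", "host3.e1", "host6.e1"}))  # available
print(database_status({"host1.e1", "host2.e1"}))  # partial results only (replica sets down: A)
```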
Q&A