Reflections on scalability and consistency
Peter Van Roy
W-PSDS 2019, Lyon, France, Oct. 1, 2019
Overview
• Basic principles
  • Understanding scalability, Internet growth and IoT, the CAP theorem
• Large-scale phenomena
  • Buridan's principle, black swans, real-world graphs, Heisenberg applications
• Building scalable systems
  • Autonomic computing and feedback structures, convergent data management, simplified distributed systems (confluence and linearity)
• Conclusions
  • Ongoing work on convergent data management and confluent distributed systems
Understanding scalability
• A system is scalable if it maintains a desirable property as the scale increases
  • Scale can be defined in several useful ways: number of nodes, size of data, number of users, geographic separation
  • We will use a rough definition where scale corresponds to the total aggregate work (data, computation) done by a system
• When scale increases, strange things happen
  • This talk is not a standard technical talk
  • We will explain various phenomena that show up at high scale and give some approaches for designing systems to accommodate these phenomena
• In my view, anyone who is serious about scalability should know about these phenomena and these techniques!
• This talk is also intended to provoke discussion and solicit pointers to other work on scalability
Internet scale
• The Internet is an excellent testbed for scalability because it has been growing exponentially since around 1970 and still is
• Nowadays, this growth is concentrated at the edge:

From a talk by David Bol, RISC-V (Paris), Oct. 1-3, 2019
CAP theorem
CAP theorem
• The CAP theorem was conjectured by Eric Brewer at PODC in 2000 and proved by Seth Gilbert and Nancy Lynch in 2002
• For an asynchronous network, it is impossible to implement an object that guarantees the following properties in all fair executions:
  • Consistency: all operations are atomic (totally ordered)
  • Availability: every request eventually returns a result
  • Partition tolerance: any messages may be lost
• The CAP theorem applies to all systems, at all levels of abstraction, and at all sizes
  • It can be applied in many places in the same system
  • The whole system is a rainbow of interacting instances of CAP

© 2010 Peter Van Roy
The CAP triangle
• We give an intuitive overview of the consequences of CAP by means of a CAP triangle
  • Cost increases toward the center
  • The center itself is empty!
• All parts of the CAP triangle have their uses
• We have arranged some applications around the triangle according to perceived functionality
• Very little systematic study has been done about navigating in this triangle

[Figure: a triangle with corners C, A, and P; its edges are the trade-offs C+A, C+P, and A+P, and cost increases toward the (empty) center, where it becomes infinite. Applications arranged around the triangle include Spanner, databases, Mercurial, DropBox, and read-only systems such as (BitTorrent), (PeerTV), (Wuala), and (Search); (…) = read-only.]
Buridan's principle
Buridan's principle
• The philosopher Jean Buridan stated that an ass placed equidistant between two bales of hay must starve to death because it has no reason to choose one bale over the other
  • This principle has surprising consequences, as shown in the paper "Buridan's Principle", by Leslie Lamport (1984)
• Assume a system has to make a discrete decision from continuous input. Then we can prove the following:
  • A discrete decision based upon an input with a continuous range of values cannot be made within a bounded length of time
• There are many examples of this principle
  • A car at an unguarded railroad crossing: the driver stops at the crossing and proceeds when it is safe. The driver must decide whether to wait for the train or to cross the tracks.
  • A jury must decide whether a student passes or fails his academic year. As the average gets closer to 50%, the decision becomes harder and harder, because more information must be analyzed.
Proof of Buridan's principle
• A_t(x) is the position at time t with starting position x
• As t increases, A_t(x) must converge to 0 or 1 for all x
• Since A_t(x) is continuous in t and x, it is clear that there exist x for which t will be arbitrarily large

[Figure: position at time t as a function of starting position x ∈ [0, 1]; the curves A0(x), A1(x), A2(x) steepen over time toward decision 0 and decision 1, but each remains continuous.]
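The divergence near the boundary can be sketched numerically. The following toy bistable system is my own illustration, not Lamport's construction: the decision time grows without bound as the starting position approaches the midpoint.

```python
# Illustrative sketch (a toy model, not from the talk): a bistable system
# dy/dt = y - y^3 must "decide" between the attractors y = -1 and y = +1.
# The closer the continuous input y0 is to the decision boundary y = 0,
# the longer the decision takes, with no fixed bound.

def decision_time(y0, threshold=0.9, dt=1e-3, t_max=100.0):
    """Integrate dy/dt = y - y^3 from y0 until |y| reaches the threshold."""
    y, t = y0, 0.0
    while abs(y) < threshold and t < t_max:
        y += dt * (y - y**3)
        t += dt
    return t

for y0 in (0.5, 0.1, 0.01, 0.001):
    print(f"starting position {y0:>6}: decision reached at t = {decision_time(y0):.2f}")
```

Each factor of 10 closer to the boundary adds roughly a constant amount of decision time, so no finite bound covers all inputs.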
Relationship to scale
• Distributed systems often must make decisions
  • Distributed bank account: is the account positive or negative?
  • Voting system: who wins the vote?
• The input data is (approximately) continuous and can be distributed over the whole system
• To make the decision, more information/computation is required as the input data is closer to the decision boundary
  • Far from the boundary, only local information is needed
  • Very close to the boundary, the whole system is involved
• Lesson for system design: prevention or cure!
  • Design a scalable system to stay far away from boundaries
  • Some boundaries are inevitable: in that case, try to predict when decisions are needed and "prefetch" the information
Black swans
… imagine a green plant shooting up from its root, thrusting forth strong green leaves from the sides of its sturdy stem, and at last terminating in a flower. The flower is unexpected and startling, but come it must – nay, the whole foliage has existed only for the sake of that flower, and would be worthless without it.
– from “Conversations of Goethe with Johann Peter Eckermann” (1930 translation)
Black swans
• Systems are designed by relying on induction: "what worked in the past will continue to work in the future"
  • It is very common to assume that induction will always work
• However, induction often has a built-in limit and fails beyond it
  • Year 2000 bug: it was "far away" but it has arrived
  • Dinosaurs and banks: "too big to fail" but they will fail
• In computer systems this is both ubiquitous and hidden
  • All systems have finite resource limits (memory, speed) that are far away in "normal usage" but reached when the system is stressed
• Typically, the system will fail in exactly the case where it is needed most (Red Wedding situations)
The two great "frauds"
• Systems obey inductive reasoning – false!
  • Past experience with systems is a bad guide for future systems, especially if the future system is going beyond the past one
• Systems obey probability distributions – false!
  • Probability distributions are introduced to simplify analysis, but often do not exist in reality
  • Assuming a probability distribution exists is a very strong assumption (a frequency limit exists) and is very probably wrong
• Black swan: an unexpected large event that falsifies induction and is obvious in hindsight
  • Large systems are fundamentally irregular and must be designed to survive extreme cases
• See "The Black Swan", by Nassim Nicholas Taleb (2010)
Degrees of increasing irregularity in a large system
1. Existence of a probability distribution
  • Statistical physics holds, all microstates have equal probability, behavior is thermodynamic (describable by macroscopic state variables)
  • Unfortunately, most simulations and models are stuck here!
2. Critical point
  • Minor fluctuations can be amplified without bounds
  • The limit of statistical physics
  • Many computing systems have critical points (garbage collectors, dynamic hash tables, wide-area routing, virtual memory)
3. No probability distribution exists ("black swans")
  • We know only the range of behavior; frequency limits do not exist
  • Dijkstra's demon: in a guarded command, all guards can be chosen
  • The program must be designed so that it works even if a demon makes the worst possible choices in each guarded command
  • Complex systems, program verification, distributed algorithmics
Real-world graphs
Real-world graphs
• Large applications on the Internet will often have large numbers of users whose behavior can significantly influence the large-scale behavior of the system
  • Especially important is information dissemination among users
• Small-world graphs
  • The connectivity graph among the application's users will almost always be a small-world graph: small average shortest path length, large clustering coefficient (more clustered than a random graph, with shorter paths than a neighbor graph)
  • Navigation is often easy (user search using partial information)
• Power-law structure
  • The fraction of Web pages with k in-links is proportional to 1/k²
  • A consequence of information dissemination during formation

See "Networks, Crowds, and Markets", by David Easley and Jon Kleinberg (2010)
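The "dissemination during formation" intuition behind the power law can be sketched with a toy copying model (the model and its parameters are illustrative, not from the talk): each new page either copies an existing link, so already-popular pages attract still more links, or links uniformly at random. A heavy-tailed in-degree distribution emerges.

```python
import random

# Illustrative preferential-attachment sketch: link "copying" during
# network growth produces a heavy-tailed (power-law-like) in-degree
# distribution, unlike a uniformly random graph.

def grow_network(n_pages, p_copy=0.5, seed=42):
    rng = random.Random(seed)
    targets = [0]            # multiset of all link targets so far
    in_degree = {0: 0}
    for page in range(1, n_pages):
        if rng.random() < p_copy:
            target = rng.choice(targets)   # copy an existing link (rich get richer)
        else:
            target = rng.randrange(page)   # link to a uniformly random page
        in_degree[target] = in_degree.get(target, 0) + 1
        in_degree.setdefault(page, 0)
        targets.append(target)
    return in_degree

deg = grow_network(20000)
top = max(deg.values())
avg = sum(deg.values()) / len(deg)
print(f"average in-degree {avg:.2f}, maximum in-degree {top}")
```

The average in-degree stays near 1 (each page contributes one link), while the maximum is orders of magnitude larger: a few hubs collect most of the links.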
Consistency in real-world graphs
• The CAP theorem states that consistency cannot be achieved for available, partition-tolerant systems
  • Users' information dissemination makes this more precise
• Limiting communication between users causes pluralistic ignorance, where different parts of the network have very different information, and this can last indefinitely long
• Limiting communication can cause sudden changes
  • Epidemic dissemination (synchronization, oscillation, stability)
  • Collapse of giant components (one connected component that contains a significant fraction of all nodes)
• Information content can be changed during dissemination
  • Cascades (herding), when decisions are made in sequential order
  • Tipping points: decisions need to convince a large initial group
Example: Web bow-tie
• Large decentralized information networks such as the Web will often have a global structure with a central strongly-connected component (from 2000, but still valid today)
• The structure is maintained as the network continuously changes

From Broder et al. (2000), as printed in Networks, Crowds, and Markets
Example: financial networks
• Network of loans among US financial institutions, revealing its strongly connected core
• This reveals a structural fragility in the financial system

From Bech & Atalay (2008), as printed in Networks, Crowds, and Markets
Heisenberg applications
Many applications are bursty
• Many real-world applications require a lot of computational resources for very short time periods
  • For example, interactive applications require massive resources, but only when being used (real-time voice translation while you are speaking)
  • Your computer's CPU usage is bimodal: it is close to 0% most of the time, except when you are doing a compute-intensive task, when it is close to 100%
• The solution to this is to provide elasticity
  • Cloud computing: pay only for the resources actually used (economy of scale with many shared users)
  • Internet of Things: power up only the nodes actually needed (hardware, i.e., silicon, is cheap)
• This enables a new kind of application based on burstiness
Computational Heisenberg Principle (1)
• A cloud has two key properties:
  • Pay per use: pay only for the resources actually used
  • Elasticity: ability to scale resource usage up (and down) rapidly
• For a fixed cost, as the time interval decreases, more resources can be made available:
  • For a given maximum cost, the product of resource amount and usage time is less than a constant
• Analogy with Heisenberg's uncertainty principle in physics: the product of the uncertainty in time and the uncertainty in energy is equal to (or greater than) a constant. This increases the probability of events that use arbitrarily high energies if the time period is short enough. As long as the high energies are less than the uncertainty, they are allowed!
  • This is a property of the system itself, not a limitation of measurement!
  • ∆t ⋅ ∆E = c, and t_allow ≤ ∆t and E_allow ≤ ∆E implies t_allow ⋅ E_allow ≤ c
• This opens the door to new applications that could not be done before
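With made-up numbers, the trade-off is simple arithmetic: for a fixed cost c, the resource amount r available grows as the usage interval t shrinks, subject to t ⋅ r ≤ c.

```python
# Sketch of the computational Heisenberg principle with invented numbers:
# a fixed budget of core-seconds buys few cores for a long interval, or
# very many cores for a short burst, since the constraint is t * r <= c.

COST = 3600.0  # hypothetical budget: 3600 core-seconds

for t in (3600.0, 60.0, 1.0, 0.1):
    r = COST / t
    print(f"interval {t:>7.1f} s -> up to {r:>8.0f} cores for the same cost")
```

The same budget that buys one core for an hour buys 36 000 cores for a tenth of a second; that short-burst region is where Heisenberg applications live.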
Computational Heisenberg Principle (2)
• For a given fixed resource cost c0, what kinds of applications can run?
• Before elasticity: all applications lived in the light blue area, which gives local resources for maximum cost c0 (r ≤ r0)
• With elasticity: the dark blue area becomes available for the same cost (r > r0)
• The dark blue area is the home of Heisenberg applications
  • Cloud applications (e.g., big data)
  • IoT applications (e.g., data fusion)
• Using machine learning often leads to Heisenberg applications

[Figure: available resources r versus time interval t; local resources for cost c0 fill the rectangle r ≤ r0, t ≤ t0, while cloud resources for the same cost fill the larger region under the hyperbola t ⋅ r ≤ c0.]
Heisenberg application space

[Figure: the Heisenberg application space, plotting learning/setup-phase elasticity requirements (learning time) against query/use-phase elasticity requirements (response time), each ranging from S through M and L to XL, with interaction styles from one-shot and one-way stream to conversation and full interactivity (learning + query). Standard applications (BitTorrent, WIMP GUI, Microsoft Office) sit near the origin; advanced applications (Google Search, Google Translate, recommendation systems, speech recognition, Skype connection, social networks, media translation, weather forecasting, Recorded Future, MMORPGs and role-playing games, chess programs, champion chess programs, IBM Jeopardy, Wolfram Alpha, image recognition, computer algebra, peer-to-peer CDN, Google Earth, JIT compilers) fill the middle; future applications (real-time audio language translation, real-time expert guidance, creative problem solving) occupy the extreme corner.]
Building scalable systems
The starting point
• Every scalable design starts as independent pieces (P+A, no C)
• Nodes occasionally interact (add some C) → collaboration, emergence
  • Split protocol: what happens when a node leaves a group (maybe abruptly)
  • Merge protocol: what happens when a node joins a group
• Many examples: biology, peer-to-peer, map-reduce, …

[Figure: groups of nodes that repeatedly split and merge.]
Building up to a real system
• We start with a decentralized system (P+A, no C)
  • The problem: how much C, and how to add it?
• The rest of this section explores how to add C
• Control-oriented approach
  • Autonomic computing
  • Weakly interacting feedback structures
• Data-oriented approach
  • Convergent data management
  • LightKone reference architecture

See "Designing Robust and Adaptive Distributed Systems with Weakly Interacting Feedback Structures", by P. Van Roy, S. Haridi, and A. Reinefeld (2011)
See "LightKone Reference Architecture (LiRA) White Paper", by Ali Shoker et al. (2019)
Weakly interacting feedback structures
Autonomic computing
• Autonomic computing, initiated by IBM in 2001, aims to make computer systems self-managing, to overcome the growing complexity of systems management as scale increases
• The basic building block of IBM's autonomic system is the MAPE-K feedback loop (Monitor – Analyze – Plan – Execute – Knowledge)
• We start with this approach, and we investigate how to build systems consisting of many interacting MAPE-K loops

[Embedded excerpt from Huebscher and McCann: Fig. 1, IBM's MAPE-K (Monitor, Analyse, Plan, Execute, Knowledge) reference model for autonomic control loops. In MAPE-K, a managed element (a web server, database, operating system, cluster of machines, network, CPU, printer, …) is given autonomic behaviour by coupling it with an autonomic manager; sensors (probes, gauges) collect information about the managed element, such as response time to client requests, network and disk usage, and CPU and memory utilisation.]
A scalable architecture in four steps
• Concurrent component
  • An active entity communicating with its neighbors through asynchronous messages
  • "Intelligence" concentrated in core components
• Single feedback loop (MAPE-K loop)
  • Manager, sensor, and effector components connected to a subsystem and continuously maintaining one local goal
• Feedback structure
  • A set of feedback loops that work together to maintain one global system property
• Weakly interacting feedback structures (WIFS)
  • The complete system is a conjunction of global properties, each maintained by one feedback structure
  • The feedback structures have dependencies based on the operating conditions
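The first two steps can be sketched as follows. This is a minimal toy with invented names (MapekLoop, read_sensor, actuate), not IBM's reference implementation: the manager monitors a sensor, analyzes the deviation from its local goal, plans a correction, and executes it through an effector, with the goal and history as its knowledge.

```python
# Hypothetical minimal sketch of one MAPE-K loop (Monitor, Analyze, Plan,
# Execute over shared Knowledge) continuously maintaining one local goal.

class MapekLoop:
    def __init__(self, read_sensor, actuate, target):
        self.read_sensor = read_sensor   # sensor attached to the subsystem
        self.actuate = actuate           # effector attached to the subsystem
        self.knowledge = {"target": target, "history": []}

    def step(self):
        value = self.read_sensor()                    # Monitor
        self.knowledge["history"].append(value)
        error = value - self.knowledge["target"]      # Analyze
        action = -0.5 * error                         # Plan: proportional correction
        self.actuate(action)                          # Execute

# A toy managed element: a "temperature" nudged by the effector.
state = {"temp": 30.0}
loop = MapekLoop(lambda: state["temp"],
                 lambda a: state.__setitem__("temp", state["temp"] + a),
                 target=20.0)
for _ in range(20):
    loop.step()
print(f"temperature after 20 steps: {state['temp']:.2f}")   # converges to 20.00
```

A feedback structure would be several such loops sharing sensors, effectors, or knowledge; WIFS then compose these structures with only weak dependencies between them.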
Human respiratory system

Some design rules:
• Default behavior: rhythmic breathing reflex
• Complex component: conscious control can override and plan lifesaving actions
• Abstraction: conscious control does not need to know the details of the breathing reflex
• Fail-safe: conscious control can itself be overridden (falling unconscious)
• Time scales: laryngospasm is a quick action that interrupts the slower breathing reflex

[Figure: the respiratory system as a feedback structure. Monitoring agents in the human body measure O2 and CO2 in the blood, monitor breathing, and detect obstruction in the airways. Actuating agents trigger the breathing reflex when CO2 rises to a threshold, temporarily trigger laryngospasm (sealing the air tube) when there is sufficient obstruction in the airways, and trigger unconsciousness when O2 falls to a threshold (which also reduces the CO2 threshold to its base level). Conscious control of body and breathing can increase or decrease the breathing rate and change the CO2 threshold (the maximum is the breath-hold breakpoint).]

The operation of the human respiratory system is given as one feedback structure, inferred from a precise medical description of its behavior (see the entry on "Drowning", Wikipedia).
State diagram
• The human respiratory system can be seen as a state diagram
  • Dominant subset = active subset of feedback loops = state
• At any time, one subset is active, depending on operating conditions
• Each subset corresponds to a state in the state diagram

[Figure: state diagram with states Normal breathing, Unconscious breathing, Normal laryngospasm, Conscious breathing, Unconscious laryngospasm, and Conscious laryngospasm; transitions are triggered by obstruction, timeout, wake up, oxygen low, conscious decision, and the breath-hold breakpoint.]
A self-managing key/value store: Scalaris
• Scalaris is a high-performance self-managing key/value store that provides transactions and is built on top of a structured overlay network
  • A major result of the SELFMAN project (www.ist-selfman.org)
  • 4000 read-modify-write transactions/second on two dual-core Intel Xeon 2.66 GHz
• Scalaris has five WIFS: connectivity (Sconnect), routing (Sroute), load balancing (Sload), replica management (Sreplica), and transaction management (Strans)

[Figure: the Scalaris architecture: a transaction layer (atomicity, consistency, isolation, durability) and a replication layer (availability) on top of a structured peer-to-peer overlay of data stores (scalability).]

The Scalaris specification is a conjunction of six properties, each non-functional property implemented by one feedback structure:

  Sscalaris = Skey-value ∧ Sconnect ∧ Sroute ∧ Sload ∧ Sreplica ∧ Strans

with dependencies Sconnect → Sroute → Sreplica → Strans (and Sload).
Convergent data management
Convergent data management
• Maintaining adequate consistency and performance are two of the most important issues when system scale increases
  • Whereas autonomic computing is an operation-oriented approach, let us now take a data-oriented approach
• Assume our system has a database that is used to centralize the data management
  • We automate the management of "truth" in the database by a system that continuously updates the database with information coming from the edge
• To simplify consistency management, we use convergent data structures that tolerate message and node failures and need very little synchronization (CRDTs)
CRDTs
• A Conflict-free Replicated Data Type is a replicated data structure that maintains consistency between replicas with a very weak synchronization protocol
  • It satisfies Strong Eventual Consistency (SEC): n replicas that receive the same updates (in any order) have equivalent state
• Internally, all replicas are collecting information monotonically and are always converging to their resulting state
  • Synchronization between replicas is eventual replica-to-replica communication
• Many practical CRDTs exist and are widely used in commercial systems with millions of users
  • Counters, sets, maps, graphs, etc.
• CRDTs are well suited for convergent data management
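A minimal example of such a data type is the grow-only counter (G-Counter), sketched here as an illustration rather than taken from any particular CRDT library: each replica increments its own slot, and merging takes the per-slot maximum, a join that is commutative, associative, and idempotent, so replicas converge regardless of message order or duplication.

```python
# Minimal sketch of a state-based CRDT: a grow-only counter (G-Counter).
# Each replica increments only its own slot; merge takes the per-slot
# maximum, so replicas receiving the same updates in any order converge.

class GCounter:
    def __init__(self, replica_id, n_replicas):
        self.id = replica_id
        self.slots = [0] * n_replicas

    def increment(self):
        self.slots[self.id] += 1

    def merge(self, other):
        """Join with another replica's state (commutative, associative, idempotent)."""
        self.slots = [max(a, b) for a, b in zip(self.slots, other.slots)]

    def value(self):
        return sum(self.slots)

a, b = GCounter(0, 2), GCounter(1, 2)
a.increment(); a.increment()
b.increment()
a.merge(b); b.merge(a)
b.merge(a)                     # repeated delivery is harmless (idempotent)
print(a.value(), b.value())    # both replicas converge to 3
```

Because the join is idempotent, a lost, reordered, or duplicated merge message never corrupts the state; it only delays convergence.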
Basic convergent data scenario
• The CRDT database receives periodic updates (raw or aggregated) from all edge devices
  • Messages may be lost, reordered, or repeated without harm
• The CRDT database will always be converging to the truth at the edge; staleness depends on the update protocol and delay

[Figure: edge devices (in high numbers) send updates to a CRDT database (in the cloud or on a gateway).]
Lateral data sharing in LiRA
• LiRA (the LightKone Reference Architecture) proposes an architecture with convergent data management, with artefacts (AntidoteDB, Achlys, Legion)
• As the edge continues to grow exponentially, vertical data sharing (red lines, between the stable cloud and the exponentially growing edge) is extended for convergent data management
• In addition, lateral data sharing (blue lines) is added to do convergent data management between devices at the same level

[Embedded excerpt from the LiRA white paper: Figure 1.0.1 shows a typical hierarchical Fog/Edge Computing architecture in an airport monitoring use case (source: OpenFog), demonstrating the need for lateral and vertical data sharing and computation across different airport sub-systems and terminals at different levels of the fog hierarchy. LiRA bridges the gap left by current reference architectures by providing a data management layer that supports lateral data sharing (via replication) as well as fog/edge application-level data management, giving priority to high availability, often desired in fog/edge applications, without breaking application semantics or giving up consistency, although delayed.]

From "LightKone Reference Architecture (LiRA) White Paper", Ali Shoker et al. (2019)
Just-Right Consistency (JRC)
• JRC generalizes convergent data management for application invariants
• Different parts of an application need different consistency levels
  • No single model is best: synchronous models are safest, but asynchronous ones are fast and tolerate partitions
  • We want to use synchronous and asynchronous together and get the best of both worlds
• JRC classifies application invariants according to CAP:
  • "Choose any two: Consistency, Availability, Partition tolerance"
  • AP-compatible patterns: CRDTs with concurrent updates, causal order of operations, highly-available transactions
  • CAP-sensitive pattern: stable precondition before concurrent update (can be verified with tool support, e.g., the CEC tool)
  • All other operations must be done sequentially
Simplified distributed systems
Simplifying distributed systems
• Distributed systems have concurrency, message latency, partial failure, and real-world interaction. They are quite difficult to understand and develop in their full generality.
• A useful approach is to focus on simpler kinds of distributed systems
  • The general form can be seen as the simpler kind with small additions
  • Following this approach can greatly help in designing distributed systems
• We show two useful kinds of simplified distributed systems
  • Each kind satisfies a strong property that greatly helps development, and the general form can be attained with small additions

[Figure: two axes of simplification. A linear system sits at the linear end of a linear-nonlinear axis; a confluent system sits at the pure functional (lambda calculus) end of an axis leading to imperative systems (real-world interaction).]
Confluent systems principle
• Functional programming based on the λ calculus is confluent:
  • Church-Rosser: given an initial expression, the final result of a reduction is the same for all reduction orders (up to renaming)
  • A functional program can be seen as a network of concurrent agents, each executing in its own thread. Church-Rosser means that for all scheduler choices, the result is the same.
• We make this program distributed by placing each subexpression on its own node
  • This requires one extra reduction rule, a mobility rule, to colocate expressions on a node for reduction. Church-Rosser still holds.
• The limitation is that it is syntactically known from which subexpression each agent's next input will come
  • For general distributed systems, we need to add interaction points, which depend on the reduction order and express real-world interaction

See "Why Time is Evil in Distributed Systems and what to do about it", by Peter Van Roy, invited talk, Code BEAM 2019, May 16, 2019
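Confluence can be illustrated with a toy reducer (my own sketch, not the talk's formal calculus): reduce an arithmetic expression by repeatedly picking a redex at random, standing in for an arbitrary scheduler. Every reduction order yields the same final value.

```python
import random

# Illustrative sketch of confluence (Church-Rosser): reduce an expression
# tree by repeatedly picking a *random* redex (an operator whose operands
# are already numbers). All reduction orders give the same result.

def redexes(expr, path=()):
    """Yield paths to subexpressions whose operands are both numbers."""
    if not isinstance(expr, tuple):
        return
    op, l, r = expr
    if isinstance(l, (int, float)) and isinstance(r, (int, float)):
        yield path
    else:
        yield from redexes(l, path + (1,))
        yield from redexes(r, path + (2,))

def reduce_at(expr, path):
    """Perform one reduction step at the given path."""
    op, l, r = expr
    if not path:
        return l + r if op == "add" else l * r
    if path[0] == 1:
        return (op, reduce_at(l, path[1:]), r)
    return (op, l, reduce_at(r, path[1:]))

def evaluate(expr, rng):
    while isinstance(expr, tuple):
        expr = reduce_at(expr, rng.choice(list(redexes(expr))))
    return expr

e = ("add", ("mul", 2, ("add", 3, 4)), ("mul", ("add", 1, 1), 5))
results = {evaluate(e, random.Random(seed)) for seed in range(50)}
print(results)   # {24}: every schedule converges to 2*(3+4) + (1+1)*5
```

Adding an interaction point would mean letting the outcome of a step depend on which redex fired first; that single addition is what breaks pure confluence in general distributed systems.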
Confluent distributed systems
• A simple example of this approach is the client/server: this program is completely functional except for one interaction point, namely where the server accepts an incoming message from any client
  • The nondeterministic message receive in the server is the one interaction point
• The general approach is to write most of the program with distributed functional concurrency and add interaction points where needed

Pseudocode of client/server:

   local s p in
      node p = newport(s) server(state, s) end
      node client(state1, p) end
      node client(state2, p) end
      … /* as many clients as we need */
   end

   fun client(state, p)
      send(query(state), p)
      client(fc(state), p)
   end

   fun server(state, s)
      case s of q|t then server(fs(q, state), t)
      [] nil then nil
      end
   end
Linear systems principle
• Another useful kind of distributed system is the linear system
  • Linear systems are widely used in mechanical and electrical engineering, but are less used in informatics, because computer programs are inherently discrete and hence nonlinear
  • However, distributed systems are more and more being used in tight connection with the real world, to monitor and control real-world systems. These distributed systems could take advantage of linearity, just like in other engineering disciplines.
• The world is a combination of linearity and nonlinearity
  • Linearity = independent parts = the whole equals the sum of the parts
  • Nonlinearity = interacting parts = the whole is more than the sum of the parts
• Linear systems are much easier to analyze quantitatively than nonlinear ones
  • Because in linear systems, the parts can be analyzed separately and then combined (superposition principle, compositional systems)
  • In addition, some nonlinear systems can be analyzed qualitatively (using a geometric approach). See "Nonlinear Dynamics and Chaos", by S. Strogatz (1994).
• Nonlinearity cannot be eliminated entirely; often the system is critically based on it
  • For example, intelligence (complex components, machine learning) is often nonlinear
  • That's why biological systems are made of weakly interacting subsystems
Linear distributed systems
• Large distributed systems can often be mostly linear
  • This is especially true if the distributed system is connected tightly to the real world (monitoring, analyzing, controlling), e.g., in digital twins
  • Basic physical quantities are additive (mass, force, momentum, energy)
• They can't be completely linear, though
  • Because we need nonlinearity for most nontrivial behavior
  • Discrete decisions (e.g., Buridan's principle) are nonlinear
  • Real-world graph effects are nonlinear
• We should add nonlinearity where needed but no more
  • Today's distributed systems are far too nonlinear and discontinuous
  • They should be mostly linear with small amounts of nonlinearity added where needed
• This also simplifies resilience, since linearity implies high redundancy
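Superposition is easy to check on a concrete linear operator. Here a moving-average smoother stands in (as my own illustration) for sensor-stream processing in a digital twin: filtering two input streams together gives the same answer as filtering them separately and summing.

```python
# Sketch of why linear parts compose: a moving-average filter is a linear
# operator, so it obeys superposition, and each input's contribution can
# be analyzed separately and then summed.

def moving_average(signal, window=3):
    """Linear operator: each output is the mean of the last `window` inputs."""
    out = []
    for i in range(len(signal)):
        chunk = signal[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

a = [1.0, 2.0, 3.0, 4.0]      # e.g. one sensor's readings
b = [10.0, 0.0, 10.0, 0.0]    # e.g. another sensor's readings
combined = moving_average([x + y for x, y in zip(a, b)])
separate = [x + y for x, y in zip(moving_average(a), moving_average(b))]
print(all(abs(c - s) < 1e-9 for c, s in zip(combined, separate)))   # True
```

A single nonlinear step, say thresholding the output into a discrete alarm, is exactly where superposition (and this kind of separate analysis) stops holding.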
Conclusions
Conclusions
• Internet scale continues to increase, so scalability remains an important research topic
• Building large-scale systems is difficult because new and unusual phenomena appear at each new scale
  • Buridan's principle, black swans, real-world graph phenomena
• We give an overview of some of these phenomena and of some design principles that we have identified
  • Heisenberg applications, weakly interacting feedback structures, convergent data management
• This is part of ongoing work on understanding scalability
  • Convergent data management is ongoing (LightKone project)
  • Confluent distributed systems is ongoing (Van Roy and Haeri)
  • We welcome your reactions and all pointers to related work