Reflections on scalability and - info.ucl.ac.be

49
Reflections on scalability and consistency Peter Van Roy W-PSDS 2019, Lyon, France Oct. 1, 2019 1

Transcript of Reflections on scalability and - info.ucl.ac.be

Page 1: Reflections on scalability and - info.ucl.ac.be

ReflectionsonscalabilityandconsistencyPeterVanRoyW-PSDS2019,Lyon,FranceOct.1,2019

1

Page 2: Reflections on scalability and - info.ucl.ac.be

Overview•  Basicprinciples

•  Understandingscalability,InternetgrowthandIoT,CAPtheorem

•  Large-scalephenomena•  Buridan’sprinciple,blackswans,real-worldgraphs,Heisenbergapplications

•  Buildingscalablesystems•  Autonomiccomputingandfeedbackstructures,convergentdatamanagement,simplifieddistributedsystems(confluenceandlinearity)

•  Conclusions•  Ongoingworkonconvergentdatamanagementandconfluentdistributedsystems

2

Page 3: Reflections on scalability and - info.ucl.ac.be

Understandingscalability•  Asystemisscalableifitmaintainsadesirablepropertyasthescaleincreases•  Scalecanbedefinedinseveralusefulways:numberofnodes,sizeofdata,numberofusers,geographicseparation

•  Wewillusearoughdefinitionwherescalecorrespondstototalaggregatework(data,computation)donebyasystem

•  Whenscaleincreases,strangethingshappen•  Thistalkisnotastandardtechnicaltalk•  Wewillexplainvariousphenomenathatshowupathighscaleandgivesomeapproacheshowtodesignsystemstoaccommodatethesephenomena

•  Inmyview,anyonewhoisseriousaboutscalabilityshouldknowaboutthesephenomenaandthesetechniques!

•  Thistalkisalsointendedtoprovokediscussionandsolicitpointerstootherworkonscalability

3

Page 4: Reflections on scalability and - info.ucl.ac.be

Internetscale•  TheInternetisanexcellenttestbedforscalabilitybecauseithasbeengrowingexponentiallysincearound1970andstillis

•  Nowadays,thisgrowthisconcentratedattheedge:

FromtalkofDavidBol,RISC-V(Paris),Oct.1-3,2019

4

Page 5: Reflections on scalability and - info.ucl.ac.be

CAPtheorem 5

Page 6: Reflections on scalability and - info.ucl.ac.be

CAPtheorem•  The CAP theorem was conjectured by Eric Brewer at PODC

in 2000 and proved by Seth Gilbert and Nancy Lynch in 2002 •  For an asynchronous network, it is impossible to implement

an object that guarantees the following properties in all fair executions: •  Consistency: all operations are atomic (totally ordered) •  Availability: every request eventually returns a result •  Partition tolerance: any messages may be lost

•  The CAP Theorem applies for all systems, at all levels of abstraction, and at all sizes •  It can be applied in many places in the same system •  The whole system is a rainbow of interacting instances of CAP

© 2

010

Pet

er V

an R

oy

6

Page 7: Reflections on scalability and - info.ucl.ac.be

Spanner

Databases

TheCAPtriangle•  WegiveanintuitiveoverviewoftheconsequencesofCAPbymeansofaCAPtriangle•  Costincreasestowardthe

center•  Thecenteritselfisempty!

•  AllpartsoftheCAPtrianglehavetheiruses

•  Wehavearrangedsomeapplicationsaroundthetriangleaccordingtoperceivedfunctionality•  Verylittlesystematicstudyhas

beendoneaboutnavigatinginthistriangle

C+A

C+P

C+P

A+P C A

P

A C

P

Trade-off Trade-off

Trade-off

Mercurial (BitTorrent) (PeerTV)

DropBox (Wuala)

(…) = read-only

(Search)

Cost increases

7

Page 8: Reflections on scalability and - info.ucl.ac.be

Buridan’sprinciple 8

Page 9: Reflections on scalability and - info.ucl.ac.be

Buridan’sprinciple•  PhilosopherJeanBuridanstatedthatanassplacedequidistantbetweentwobalesofhaymuststarvetodeathbecauseithasnoreasontochooseonebaleovertheother•  Thisprinciplehassurprisingconsequences,asshowninthepaper“Buridan’sPrinciple”,byLeslieLamport(1984)

•  Assumeasystemhastomakeadiscretedecisionfromcontinuousinput.Thenwecanprovethefollowing:•  Adiscretedecisionbaseduponaninputwithacontinuousrangeofvaluescannotbemadewithinaboundedlengthoftime

•  Therearemanyexamplesofthisprinciple•  Acaratanunguardedrailroadcrossing,thedriverstopsatthecrossingandproceedswhenitissafe.Thedrivermustdecidewhethertowaitforthetrainortocrossthetracks.

•  Ajurymustdecidewhetherastudentpassesorfailshisacademicyear.Astheaveragegetscloserto50%,thedecisionbecomesharderandharder,becausemoreinformationmustbeanalyzed.

9

Page 10: Reflections on scalability and - info.ucl.ac.be

ProofofBuridan’sprinciple

•  At(x)isthepositionattimetwithstartingpositionx•  Astincreases,At(x)mustconvergeto0or1forallx•  SinceAt(x)iscontinuousintandx,itisclearthatthereexistxforwhichtwillbearbitrarilylarge

Startingposition0 1

0

1

Positionattimet

A0(x)

A1(x)A2(x)

Decision1

Decision0

10

Page 11: Reflections on scalability and - info.ucl.ac.be

Relationshiptoscale•  Distributedsystemsoftenmustmakedecisions

•  Distributedbankaccount:istheaccountpositiveornegative.•  Votingsystem:whowinsthevote.

•  Theinputdatais(approximately)continuousandcanbedistributedoverthewholesystem

•  Tomakethedecision,moreinformation/computationisrequiredastheinputdataisclosertothedecisionboundary•  Farfromtheboundary,onlylocalinformationisneeded•  Veryclosetotheboundary,thewholesystemisinvolved

•  Lessonforsystemdesign:preventionorcure!•  Designascalablesystemtostayfarawayfromboundaries•  Someboundariesareinevitable:inthatcase,trytopredictwhendecisionsareneededand«prefetch»theinformation 11

Page 12: Reflections on scalability and - info.ucl.ac.be

Blackswans 12

Page 13: Reflections on scalability and - info.ucl.ac.be

… imagine a green plant shooting up from its root, thrusting forth strong green leaves from the sides of its sturdy stem, and at last terminating in a flower. The flower is unexpected and startling, but come it must – nay, the whole foliage has existed only for the sake of that flower, and would be worthless without it.

– from “Conversations of Goethe with Johann Peter Eckermann” (1930 translation)

13

Page 14: Reflections on scalability and - info.ucl.ac.be

Blackswans•  Systemsaredesignedbyrelyingoninduction,“whatworkedinthepastwillcontinuetoworkinthefuture”•  Itisverycommontoassumethatinductionwillalwayswork

•  However,inductionoftenhasabuilt-inlimitandfailsbeyond•  Year2000Bug:itwas“faraway”butithasarrived•  Dinosaursandbanks:“toobigtofail”buttheywillfail

•  Incomputersystemsthisisbothubiquitousandhidden•  Allsystemshavefiniteresourcelimits(memory,speed)thatarefarawayin“normalusage”butreachedwhensystemisstressed

•  Typically,thesystemwillfailinexactlythecasewhereitisneededmost(RedWeddingsituations)

14

Page 15: Reflections on scalability and - info.ucl.ac.be

Thetwogreat“frauds”•  Systemsobeyinductivereasoning–false!

•  Pastexperiencewithsystemsisabadguideforfuturesystems,especiallyifthefuturesystemisgoingbeyondthepastone

•  Systemsobeyprobabilitydistributions–false!•  Probabilitydistributionsareintroducedtosimplifyanalysis,butoftendonotexistinreality

•  Assumingaprobabilitydistributionexistsisaverystrongassumption(frequencylimitexists)andisveryprobablywrong

•  Blackswan:unexpectedlargeeventsthatfalsifyinductionandareobviousinhindsight•  Largesystemsarefundamentallyirregularandmustbedesignedtosurviveextremecases

•  See“TheBlackSwan”,byNassimNicholasTaleb(2010)15

Page 16: Reflections on scalability and - info.ucl.ac.be

Degreesofincreasingirregularityinalargeystem1.  Existenceofaprobabilitydistribution

•  Statisticalphysicsholds,allmicrostateshaveequalprobability,behavioristhermodynamic(describablebymacroscopicstatevariables)

•  Unfortunately,mostsimulationsandmodelsarestuckhere!2.  Criticalpoint

•  Minorfluctuationscanbeamplifiedwithoutbounds•  Thelimitofstatisticalphysics•  Manycomputingsystemshavecriticalpoints(garbagecollectors,dynamichashtables,wide-arearouting,virtualmemory)

3.  Noprobabilitydistributionexists(“BlackSwans”)•  Weknowonlytherangeofbehavior,frequencylimitsdonotexist

•  Dijkstra’sdemon:inaguardedcommand,allguardscanbechosen•  Theprogrammustbedesignedsothatitworksevenifademonmakesthe

worstpossiblechoicesineachguardedcommand•  Complexsystems,programverification,distributedalgorithmics 16

Page 17: Reflections on scalability and - info.ucl.ac.be

Real-worldgraphs 17

Page 18: Reflections on scalability and - info.ucl.ac.be

Real-worldgraphs•  LargeapplicationsontheInternetwilloftenhavelargenumbersofuserswhosebehaviorcansignificantlyinfluencethelarge-scalebehaviorofthesystem•  Especiallyimportantisinformationdisseminationamongusers

•  Small-worldgraphs•  Theconnectivitygraphamongtheapplication’suserswillalmostalwaysbeasmall-worldgraph:smallaverageshortestpathlength,largeclusteringcoefficient(moreclusteredthanrandom,smallerpathsthanneighbor)

•  Navigationisofteneasy(usersearchusingpartialinformation)•  Power-lawstructure

•  FractionofWebpageswithkin-linksisproportionalto1/k2•  Consequenceofinformationdisseminationduringformation

See«Networks,Crowds,andMarkets»,byDavidEasleyandJonKleinberg(2010)

18

Page 19: Reflections on scalability and - info.ucl.ac.be

Consistencyinreal-worldgraphs•  CAPtheoremstatesthatconsistencycannotbeachievedforavailable,partition-tolerantsystems•  Users’informationdisseminationmakesthismoreprecise

•  Limitingcommunicationbetweenuserscausespluralisticignorance,wheredifferentpartsofthenetworkhaveverydifferentinformation,andthiscanlastindefinitelylong

•  Limitingcommunicationcancausesuddenchanges•  Epidemicdissemination(synchronization,oscillation,stability)•  Collapseofgiantcomponents(oneconnectedcomponentthatcontainsasignificantfractionofallnodes)

•  Informationcontentcanbechangedduringdissemination•  Cascades(herding),whendecisionsaremadeinsequentialorder•  Tippingpoints,decisionsneedtoconvincealargeinitialgroup

19

Page 20: Reflections on scalability and - info.ucl.ac.be

Example:Webbow-tie

•  LargedecentralizedinformationnetworkssuchastheWebwilloftenhaveaglobalstructurewithacentralstrongly-connectedcomponent(from2000,butstillvalidtoday)

•  Thestructureismaintainedasthenetworkcontinuouslychanges

FromBroderetal(2000),asprintedinNetworks,Crowds,andMarkets

20

Page 21: Reflections on scalability and - info.ucl.ac.be

Example:financialnetworks

•  NetworkofloansamongUSfinancialinstitutions,revealingitsstronglyconnectedcore

•  Thisrevealsastructuralfragilityinthefinancialsystem

FromBech&Atalay(2008),asprintedinNetworks,Crowds,andMarkets

21

Page 22: Reflections on scalability and - info.ucl.ac.be

Heisenbergapplications 22

Page 23: Reflections on scalability and - info.ucl.ac.be

Manyapplicationsarebursty•  Manyreal-worldapplicationsrequirealotofcomputationalresourcesforveryshorttimeperiods•  Forexample,interactiveapplicationsrequiremassiveresources,butonlywhenbeingused(real-timevoicetranslationwhenyouarespeaking)

•  Yourcomputer’sCPUusageisbimodal:itiscloseto0%mostofthetime,exceptwhenyouaredoingacompute-intensivetaskwhenitiscloseto100%

•  Thesolutiontothisistoprovideelasticity•  Cloudcomputing:payonlyfortheresourcesactuallyused

•  Economyofscalewithmanysharedusers•  InternetofThings:poweruponlythenodesactuallyneeded

•  Hardware(silicon)ischeap•  Thisenablesanewkindofapplicationbasedonburstiness 23

Page 24: Reflections on scalability and - info.ucl.ac.be

ComputationalHeisenbergPrinciple(1)•  Acloudhastwokeyproperties:

•  Payperuse:payonlyfortheresourcesactuallyused•  Elasticity:abilitytoscaleresourceusageup(anddown)rapidly

•  Forafixedcost,asthetimeintervaldecreasesmoreresourcescanbemadeavailable:

•  Foragivenmaximumcost,theproductofresourceamountandusagetimeislessthanaconstant

•  AnalogywithHeisenberg’sUncertaintyPrincipleinphysics:theproductofuncertaintyintimeanduncertaintyinenergyisequalto(orgreaterthan)aconstant.Thisincreasestheprobabilityofeventsthatusearbitrarilyhighenergiesifthetimeperiodisshortenough.Aslongasthehighenergiesarelessthantheuncertainty,thentheyareallowed!

•  Thisisapropertyofthesystemitself,notalimitationofmeasurement!•  ∆t⋅∆E=candtallow≤∆tandEallow≤∆Eimpliestallow⋅Eallow≤c

•  Thisopensthedoortonewapplicationsthatcouldnotbedonebefore24

Page 25: Reflections on scalability and - info.ucl.ac.be

ComputationalHeisenbergPrinciple(2)

•  Forgivenfixedresourcecostc0,whatkindsofapplicationscanrun?

•  Beforeelasticity:allapplicationslivedinlightblueareawhichgiveslocalresourcesformaximumcostc0(r≤r0)

•  Withelasticity:darkblueareabecomesavailableforthesamecost(r>r0)

•  ThedarkblueareaisthehomeofHeisenbergapplications•  Cloudapplications(e.g.,bigdata)•  IoTapplications(e.g.,datafusion)

•  UsingmachinelearningoftenleadstoHeisenbergapplicationsTime interval

Available resources

t0

r0

Local resources for cost c0

c0

Cloud resources for cost c0

t ⋅ r ≤ c0

25

Page 26: Reflections on scalability and - info.ucl.ac.be

Query/usephaseelasticityrequirements

(responsetime)

Learning/setupphaseelasticity

requirements(learningtime)

S

S

M

M

L

L

GoogleSearchGoogleTranslate

Recommendationsys.SpeechrecognitionSkypeconnectionSocialnetworksMediatranslation

Weatherforecasting

Interactivity(learning+query)

One-shot

One-waystream

RecordedFuture

MMORPGRole-playinggamesChessprogram

ChampionchessprogramIBMJeopardy

WolframAlphaImagerecognition

ComputeralgebraPeer-to-peerCDNGoogleEarthJITCompiler

XL

XL

Conversation

BitTorrentWIMPGUI

MicrosoftOffice

Future applications

Standard applications

Advanced applications

Real-timeaudiolanguagetranslation

Real-timeexpertguidance

Creativeproblemsolving

Heisenbergapplicationspace

26

Page 27: Reflections on scalability and - info.ucl.ac.be

Buildingscalablesystems 27

Page 28: Reflections on scalability and - info.ucl.ac.be

Thestartingpoint

•  Everyscalabledesignstartsasindependentpieces(P+A,noC)•  Nodesoccasionallyinteract(addsomeC)→collaboration,emergence•  Splitprotocol:whathappenswhenanodeleavesagroup(maybeabrupt)•  Mergeprotocol:whathappenswhenanodejoinsagroup

•  Manyexamples:biology,peer-to-peer,map-reduce,…

split/merge

group

28

Page 29: Reflections on scalability and - info.ucl.ac.be

Buildinguptoarealsystem

• Westartwithadecentralizedsystem(P+A,noC)•  Theproblem:howmuchCandhowtoaddit?

•  TherestofthissectionexploreshowtoaddC•  Control-orientedapproach

•  Autonomiccomputing•  Weaklyinteractingfeedbackstructures

•  Data-orientedapproach•  Convergentdatamanagement•  LightKonereferencearchitecture

29 See«DesigningRobustandAdaptiveDistributedSystemswithWeaklyInteractingFeedbackStructures»,byP.VanRoy,S.Haridi,andA.Reinefeld(2011)

See«LightKoneReferenceArchitecture(LiRA)WhitePaper»,byAliShokeretal(2019)

Page 30: Reflections on scalability and - info.ucl.ac.be

Weaklyinteractingfeedbackstructures

30

Page 31: Reflections on scalability and - info.ucl.ac.be

Autonomiccomputing

•  Autonomiccomputing,initiatedbyIBMin2001,aimstomakecomputersystemsselfmanaging,toovercomethegrowingcomplexityofsystemsmanagementasscaleincreases

•  ThebasicbuildingblockofIBM’sautonomicsystemistheMAPE-Kfeedbackloop(Monitor–Analyze–Plan–Execute–Knowledge)

•  Westartwiththisapproach,andweinvestigatehowtobuildsystemsconsistingofmanyinteractingMAPE-Kloops

6 · Markus C. Huebscher, Julie A. McCann

Fig. 1. IBM’s MAPE-K (Monitor, Analyse, Plan, Execute, Knowledge) reference model for auto-nomic control loops.

a quality as possible, e.g. Kendra [McCann and Crane 1998] and Real Surestream[Lippman 1999]. However the autonomic community is more and more identifying asystem as autonomic if it exhibits more than one of the self-management propertiesdescribed earlier, e.g. Ganek and Friedrich [2006].

4. THE MAPE-K AUTONOMIC LOOP

To achieve autonomic computing, IBM has suggested a reference model for auto-nomic control loops [IBM 2003], which is sometimes called the MAPE-K (Monitor,Analyse, Plan, Execute, Knowledge) loop and is depicted in Figure 1. This model isbeing used more and more to communicate the architectural aspects of autonomicsystems. Likewise it is a clear way to identify and classify much of the work thatis being carried out in the field. Therefore this section introduces the MAPE-Kloop in more detail and then taking each of its components in turn, describes thework that focuses its research on that component. We later take the same workand examine it from a degree of autonomicity point of view.

The MAPE-K autonomic loop is similar to, and probably inspired by, the genericagent model proposed by Russel and Norvig [2003], in which an intelligent agentperceives its environment through sensors, and uses these percepts to determineactions to execute on the environment.

In the MAPE-K autonomic loop, the managed element represents any softwareor hardware resource that is given autonomic behaviour by coupling it with anautonomic manager. Thus, the managed element can for example be a web server ordatabase, a specific software component in an application (e.g. the query optimiserin a database), the operating system, a cluster of machines in a grid environment,a stack of hard drives, a wired or wireless network, a CPU, a printer, etc.

Sensors, often called probes or gauges, collect information about the managedelement. For a web-server, that could include the response time to client requests,network and disk usage, CPU and memory utilisation. A considerable amount ofACM Journal Name, Vol. V, No. N, Month 20YY.

31

Page 32: Reflections on scalability and - info.ucl.ac.be

Ascalablearchitectureinfoursteps

•  Concurrentcomponent•  Anactiveentitycommunicatingwithitsneighbors

throughasynchronousmessages•  “Intelligence”concentratedincorecomponents

•  Singlefeedbackloop(MAPE-Kloop)•  Manager,sensor,andeffectorcomponentsconnected

toasubsystemandcontinuouslymaintainingonelocalgoal

•  Feedbackstructure•  Asetoffeedbackloopsthatworktogetherto

maintainoneglobalsystemproperty

•  Weaklyinteractingfeedbackstructures(WIFS)•  Thecompletesystemisaconjunctionofglobal

properties,eachmaintainedbyonefeedbackstructure

•  Thefeedbackstructureshavedependenciesbasedontheoperatingconditions

32

T H E A D V E N T U R E S O F

Page 33: Reflections on scalability and - info.ucl.ac.be

Humanrespiratorysystem

•  Defaultbehavior:rhythmicbreathingreflex•  Complexcomponent:consciouscontrolcanoverrideandplanlifesavingactions•  Abstraction:consciouscontroldoesnotneedtoknowdetailsofbreathingreflex•  Fail-safe:consciouscontrolcanitselfbeoverridden(fallingunconscious)•  Timescales:laryngospasmisaquickactionthatinterruptsslowerbreathingreflex

Other inputs

when sufficient obstruction in airways

Laryngospasm

(seal air tube)

Breathing

reflex

MeasureO2

in blood

Monitor

breathing

MeasureCO2

in blood

Detectobstructionin airways

Trigger unconsciousness

when O2 falls to threshold

Conscious control

of body and breathing

Trigger breathing reflex

when CO2 increases to threshold

Trigger laryngospasm temporarily

Actuating agents Monitoring agents

in human body

Breathing apparatus

(maximum is breath!hold breakpoint)and change CO2 threshold

Increase or decrease breathing rate

(and reduce CO2 threshold to base level)Render unconscious

Somedesignrules:

Theoperationofthehumanrespiratorysystemisgivenasonefeedbackstructure,inferredfromaprecisemedicaldescriptionofitsbehavior(seeentryon“Drowning”,Wikipedia)

33

Page 34: Reflections on scalability and - info.ucl.ac.be

Statediagram

•  Thehumanrespiratorysystemcanbeseenasastatediagram•  Dominantsubset=activesubsetoffeedbackloops=state

•  Atanytime,onesubsetisactive,dependingonoperatingconditions•  Eachsubsetcorrespondstoastateinthestatediagram

breathingNormalUnconscious

breathingUnconscious

laryngospasmNormal

Consciousbreathing

laryngospasm

Consciouslaryngospasm

obstruction

timeoutobstruction

time out

obstruction

time out

wake up

oxygen low

consciousdecision

oxygen low

consciousdecision or

breath!holdbreakpoint

consciousdecision

consciousdecision

© 2

010

Pet

er V

an R

oy

34

Page 35: Reflections on scalability and - info.ucl.ac.be

Aself-managingkey/valuestore:Scalaris

•  Scalarisisahigh-performanceself-managingkey/valuestorethatprovidestransactionsandisbuiltontopofastructuredoverlaynetwork•  AmajorresultoftheSELFMANproject(www.ist-selfman.org)•  4000read-modify-writetransactions/secondontwodual-coreIntelXeon2.66GHz

•  ScalarishasfiveWIFS:connectivity(Sconnect),routing(Sroute),loadbalancing(Sload),replicamanagement(Sreplica),andtransactionmanagement(Strans)

! " "! " !

#!$%&'!()!)!*%&#"#!+&*,

-

.$)&#)*!"%&!/),+$

"- 0"-"!

)!%1"*"!,2!*%&#"#!+&*,2

"#%-)!"%&2!(3$)0"-"!,

4++$"!%"4++$ /),+$5)!) 5)!)

6+7-"*)!"%&!/),+$

#*)-)0"-"!,

)8)"-)0"-"!,

4++$ !% 4++$!/),+$

5)!)

9!%$+

:

5)!)

9!%$+

;

5)!)

9!%$+

<

5)!)

9!%$+

=

5)!)

9!%$+

>>>

5)!)

9!%$+

>>?

5)!)

9!%$+

>>@

5)!)

9!%$+

>>A

Sscalaris= Skey-value ∧ Sconnect ∧ Sroute ∧

Sload ∧ Sreplica ∧ Strans The Scalaris specification is a conjunction of six properties. Each non-functional property is implemented by one feedback structure. Sconnect → Sroute → Sreplica → Strans

Sload

35

Page 36: Reflections on scalability and - info.ucl.ac.be

Convergentdatamanagement 36

Page 37: Reflections on scalability and - info.ucl.ac.be

Convergentdatamanagement•  Maintainingadequateconsistencyandperformancearetwoofthemostimportantissueswhensystemscaleincreases•  Whereasautonomiccomputingisanoperation-orientedapproach,letusnowtakeadata-orientedapproach

•  Assumeoursystemhasadatabasethatisusedtocentralizethedatamanagement•  Weautomatethemanagementof“truth”inthedatabasebyasystemtocontinuouslyupdatethedatabasewithinformationcomingfromtheedge

•  Tosimplifyconsistencymanagement,weuseconvergentdatastructuresthattoleratemessageandnodefailuresandneedverylittlesynchronization(CRDTs)

37

Page 38: Reflections on scalability and - info.ucl.ac.be

CRDTs•  AConflict-FreeReplicatedDataTypeisareplicateddatastructurethatmaintainsconsistencybetweenreplicaswithaveryweaksynchronizationprotocol•  ItsatisfiesStrongEventualConsistency(SEC):nreplicasthatreceivethesameupdates(inanyorder)haveequivalentstate

•  Internally,allreplicasarecollectinginformationmonotonicallyandarealwaysconvergingtotheirresultingstate

•  Synchronizationbetweenreplicasiseventualreplica-to-replicacommunication

•  ManypracticalCRDTsexistandarewidelyusedincommercialsystemswithmillionsofusers•  Counters,sets,maps,graphs,etc.

•  CRDTsarewell-suitedforconvergentdatamanagement38

Page 39: Reflections on scalability and - info.ucl.ac.be

Basicconvergentdatascenario

•  TheCRDTdatabasereceivesperiodicupdates(raworaggregated)fromalledgedevices

•  Messagesmaybelost,reordered,orrepeatedwithoutharm•  TheCRDTdatabasewillalwaysbeconvergingtothetruthattheedge;stalenessdependsontheupdateprotocolanddelay

CRDTdatabase(cloudorgateway)

Edgedevices(inhighnumbers)

updates

39

Page 40: Reflections on scalability and - info.ucl.ac.be

LateraldatasharinginLiRA

•  LiRA(LightKoneReferenceArchitecture)proposesanarchitecturewithconvergentdatamanagement,withartefacts(AntidoteDB,Achlys,Legion)

•  Astheedgecontinuestogrowexponentially,verticaldatasharing(redlines)isextendedforconvergentdatamanagement

•  Inaddition,lateraldatasharing(bluelines)isaddedtodoconvergentdatamanagementbetweendevicesatthesamelevel

CHAPTER 1. OVERVIEW

Figure 1.0.1: A typical hierarchical architecture of Fog/Edge Comput-ing in a airport monitoring use case (source OpenFog [6]). The diagramdemonstrates the need for lateral and vertical data sharing and computa-tion across different airport sub-systems and terminals at different levelsof the fog hierarchy.

management is rather often delegated to lower layers, e.g., database vari-ants, in-memory caching systems, etc.

None of current RAs provide explicit solutions to these problems de-spite their significant importance to fog applications. In fact, solving P1helps keeping the data at a close proximity of its source and close-bynodes, thus avoiding extra time delays, communication failures and over-heads, and security threats. Whereas, solving P2 considers the diversedata usage patterns, consistency requirements, and invariants of fog ap-plications, key to improving user experience despite delays and networkspartitions—likely to occur in such hostile networks.

1.2 LiRA Approach

The LightKone Reference Architecture (LiRA) bridges the above gap byproviding a data management layer that supports lateral data sharing(via replication) as well as fog/edge application-level data management.LiRA gives priority to high availability, often desired in fog/edge applica-tions, without breaking application semantics or giving up consistency—although delayed. This approach stems from the fact that given the un-avoidable network partitions and delays in fog/edge networks, one hasto choose either Consistency or Availability as explained in the CAP the-

LightKone Reference Architecture LiRAv0.9, June 15, 2019

From«LightKoneReferenceArchitecture(LiRA)WhitePaper»,AliShokeretal(2019)

40

(stable)

(exponential)

Page 41: Reflections on scalability and - info.ucl.ac.be

Just-RightConsistency(JRC)•  JRCgeneralizesconvergentdatamanagementforapplicationinvariants

•  Differentpartsofanapplicationneeddifferentconsistencylevels•  Nosinglemodelisbest:synchronousmodelsaresafest,butasynchronousonesarefastandtoleratepartitions

•  Wewanttousesynchronousandasynchronoustogetherandgetthebestofbothworlds

•  JRCclassifiesapplicationinvariantsaccordingtoCAP:•  «Chooseanytwo:Consistency,Availability,Partition-tolerance»•  AP-compatiblepatterns:CRDTswithconcurrentupdates,causalorderofoperations,highly-availabletransactions

•  CAP-sensitivepattern:stablepreconditionbeforeconcurrentupdate(canbeverifiedwithtoolsupport,e.g.,CECtool)

•  Allotheroperationsmustbedonesequentially41

Page 42: Reflections on scalability and - info.ucl.ac.be

Simplifieddistributedsystems 42

Page 43: Reflections on scalability and - info.ucl.ac.be

Simplifyingdistributedsystems•  Distributedsystemshaveconcurrency,messagelatency,partialfailure,andreal-worldinteraction.Theyarequitedifficulttounderstandanddevelopintheirfullgenerality.

•  Ausefulapproachistofocusonsimplerkindsofdistributedsystems•  Thegeneralformcanbeseenasthesimplerkindwithsmalladditions•  Followingthisapproachcangreatlyhelpdesigningdistributedsystems

•  Weshowtwousefulkindsofsimplifieddistributedsystems•  Eachkindsatisfiesastrongpropertythatgreatlyhelpsdevelopment,andthe

generalformcanbeattainedwithsmalladditions

Linearsystem

Confluentsystem

NonlinearLinear

Imperative(real-worldinteraction)

Purefunctional(lambdacalculus)

43

Page 44: Reflections on scalability and - info.ucl.ac.be

Confluentsystemsprinciple•  Functionalprogrammingbasedonλ calculusisconfluent:

•  Church-Rosser:Givenaninitialexpression,thefinalresultofareductionisthesameforallreductionorders(uptorenaming)

•  Afunctionalprogramcanbeseenasanetworkofconcurrentagents,eachexecutinginitsownthread.Church-Rossermeansthatforallschedulerchoices,theresultisthesame.

•  Wemakethisprogramdistributedbyplacingeachsubexpressiononitsownnode•  Thisrequiresoneextrareductionrule,amobilityrule,tocolocateexpressionsonanodeforreduction.Church-Rosserstillholds.

•  Thelimitationisthatitissyntacticallyknownfromwhichsubexpressioneachagent’snextinputwillcome•  Forgeneraldistributedsystems,weneedtoaddinteractionpoints,whichdependonreductionorderandexpressreal-worldinteraction

See«WhyTimeisEvilinDistributedSystemsandwhattodoaboutit»,byPeterVanRoy,invitedtalk,CodeBEAM2019,May16,2019

44

Page 45: Reflections on scalability and - info.ucl.ac.be

Confluentdistributedsystems

•  Asimpleexampleofthisapproachistheclient/server:thisprogramiscompletelyfunctionalexceptforoneinteractionpoint,namelywheretheserveracceptsanincomingmessagefromanyclient

•  Thegeneralapproachistowritemostoftheprogramwithdistributedfunctionalconcurrencyandaddinteractionpointswhereneeded

Server

Client1

Client2 Nondeterministicmessagereceiveisaninteractionpoint

localspinnodep=newport(s)server(state,s)endnodeclient(state1,p)endnodeclient(state2,p)end… /*asmanyclientsasweneed*/endfunclient(state,p)send(query(state),p)client(fc(state),p)endfunserver(state,s)casesofq|tthenserver(fs(q,state),t)[]nilthennilendend

Oneinteractionpoint

Pseudocodeofclient/server

45

Page 46: Reflections on scalability and - info.ucl.ac.be

Linearsystemsprinciple•  Anotherusefulkindofdistributedsystemisthelinearsystem

•  Linearsystemsarewidelyusedinmechanicalandelectricalengineering,butarelessusedininformatics,becausecomputerprogramsareinherentlydiscreteandhencenonlinear

•  However,distributedsystemsaremoreandmorebeingusedintightconnectionwiththerealworld,tomonitorandcontrolrealworldsystems.Thesedistributedsystemscouldtakeadvantageoflinearity,justlikeinotherengineeringdisciplines.

•  Theworldisacombinationoflinearityandnonlinearity•  Linearity=independentparts=wholeequalsthesumoftheparts•  Nonlinearity=interactingparts=wholeismorethanthesumoftheparts

•  Linearsystemsaremucheasiertoanalyzequantitativelythannonlinearones•  Becauseinlinearsystems,thepartscanbeanalyzedseparatelyandthencombined

(superpositionprinciple,compositionalsystems)•  Inaddition,somenonlinearsystemscanbeanalyzedqualitatively(usinga

geometricapproach).See“NonlinearDynamicsandChaos”,byS.Strogatz(1994).•  Nonlinearitycannotbeeliminatedentirely;oftenthesystemiscriticallybasedonit

•  Forexample,intelligence(complexcomponents,machinelearning)isoftennonlinear•  That’swhybiologicalsystemsaremadeofweaklyinteractingsubsystems

46

Page 47: Reflections on scalability and - info.ucl.ac.be

Lineardistributedsystems•  Largedistributedsystemscanoftenbemostlylinear

•  Thisisespeciallytrueifthedistributedsystemisconnectedtightlytotherealworld(monitoring,analyzing,controlling),e.g.,indigitaltwins

•  Basicphysicalquantitiesareadditive(mass,force,momentum,energy)

•  Theycan’tbecompletelylinear,though•  Becauseweneednonlinearityformostnontrivialbehavior•  Discretedecisions(e.g.,Buridan’sprinciple)arenonlinear•  Real-worldgrapheffectsarenonlinear

•  Weshouldaddnonlinearitywhereneededbutnomore•  Today’sdistributedsystemsarefartoononlinearanddiscontinuous•  Theyshouldbemostlylinearwithsmallamountsofnonlinearityaddedwhereneeded

•  Thisalsosimplifiesresilience,sincelinearityimplieshighredundancy47

Page 48: Reflections on scalability and - info.ucl.ac.be

Conclusions 48

Page 49: Reflections on scalability and - info.ucl.ac.be

Conclusions•  Internetscalecontinuestoincrease,soscalabilityremainsanimportantresearchtopic

•  Buildinglarge-scalesystemsisdifficultbecausenewandunusualphenomenaappearateachnewscale•  Buridan’sprinciple,blackswans,real-worldgraphphenomena

•  Wegiveanoverviewofsomeofthesephenomenaandofsomedesignprinciplesthatwehaveidentified•  Heisenbergapplications,weaklyinteractingfeedbackstructures,convergentdatamanagement

•  Thisispartofongoingworkonunderstandingscalability•  Convergentdatamanagementisongoing(LightKoneproject)•  Confluentdistributedsystemsisongoing(VanRoyandHaeri)•  Wewelcomeyourreactionsandallpointerstorelatedwork

49