Reflections on scalability and consistency
Peter Van Roy
W-PSDS 2019, Lyon, France, Oct. 1, 2019
Overview
• Basic principles
  • Understanding scalability, Internet growth and IoT, the CAP theorem
• Large-scale phenomena
  • Buridan's principle, black swans, real-world graphs, Heisenberg applications
• Building scalable systems
  • Autonomic computing and feedback structures, convergent data management, simplified distributed systems (confluence and linearity)
• Conclusions
  • Ongoing work on convergent data management and confluent distributed systems
Understanding scalability
• A system is scalable if it maintains a desirable property as the scale increases
  • Scale can be defined in several useful ways: number of nodes, size of data, number of users, geographic separation
  • We will use a rough definition where scale corresponds to the total aggregate work (data, computation) done by a system
• When scale increases, strange things happen
  • This talk is not a standard technical talk
  • We will explain various phenomena that show up at high scale and give some approaches for designing systems to accommodate these phenomena
• In my view, anyone who is serious about scalability should know about these phenomena and these techniques!
• This talk is also intended to provoke discussion and solicit pointers to other work on scalability
Internet scale
• The Internet is an excellent testbed for scalability because it has been growing exponentially since around 1970 and still is
• Nowadays, this growth is concentrated at the edge:

From a talk by David Bol, RISC-V (Paris), Oct. 1-3, 2019
CAP theorem
CAP theorem
• The CAP theorem was conjectured by Eric Brewer at PODC in 2000 and proved by Seth Gilbert and Nancy Lynch in 2002
• For an asynchronous network, it is impossible to implement an object that guarantees the following properties in all fair executions:
  • Consistency: all operations are atomic (totally ordered)
  • Availability: every request eventually returns a result
  • Partition tolerance: any messages may be lost
• The CAP theorem applies to all systems, at all levels of abstraction, and at all sizes
  • It can be applied in many places in the same system
  • The whole system is a rainbow of interacting instances of CAP

© 2010 Peter Van Roy
The CAP triangle
• We give an intuitive overview of the consequences of CAP by means of a CAP triangle
  • Cost increases toward the center
  • The center itself is empty!
• All parts of the CAP triangle have their uses
• We have arranged some applications around the triangle according to perceived functionality
• Very little systematic study has been done about navigating in this triangle

[Figure: a triangle with corners C, A, and P; its edges are the trade-offs C+A, C+P, and A+P, and cost increases toward the (empty) center, where it becomes infinite. Applications arranged around the triangle include Spanner, databases, Mercurial, DropBox, and read-only systems such as (BitTorrent), (PeerTV), (Wuala), and (Search); (…) = read-only.]
Buridan's principle
Buridan's principle
• The philosopher Jean Buridan stated that an ass placed equidistant between two bales of hay must starve to death because it has no reason to choose one bale over the other
  • This principle has surprising consequences, as shown in the paper "Buridan's Principle", by Leslie Lamport (1984)
• Assume a system has to make a discrete decision from continuous input. Then we can prove the following:
  • A discrete decision based upon an input with a continuous range of values cannot be made within a bounded length of time
• There are many examples of this principle
  • A car at an unguarded railroad crossing: the driver stops at the crossing and proceeds when it is safe. The driver must decide whether to wait for the train or to cross the tracks.
  • A jury must decide whether a student passes or fails his academic year. As the average gets closer to 50%, the decision becomes harder and harder, because more information must be analyzed.
Proof of Buridan's principle
• A_t(x) is the position at time t with starting position x
• As t increases, A_t(x) must converge to 0 or 1 for all x
• Since A_t(x) is continuous in t and x, it is clear that there exist x for which t will be arbitrarily large

[Figure: position at time t as a function of starting position x ∈ [0, 1]; the curves A0(x), A1(x), A2(x) steepen over time toward decision 0 and decision 1, but each remains continuous.]
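The divergence near the boundary can be sketched numerically. The following toy bistable system is my own illustration, not Lamport's construction: the decision time grows without bound as the starting position approaches the midpoint.

```python
# Illustrative sketch (a toy model, not from the talk): a bistable system
# dy/dt = y - y^3 must "decide" between the attractors y = -1 and y = +1.
# The closer the continuous input y0 is to the decision boundary y = 0,
# the longer the decision takes, with no fixed bound.

def decision_time(y0, threshold=0.9, dt=1e-3, t_max=100.0):
    """Integrate dy/dt = y - y^3 from y0 until |y| reaches the threshold."""
    y, t = y0, 0.0
    while abs(y) < threshold and t < t_max:
        y += dt * (y - y**3)
        t += dt
    return t

for y0 in (0.5, 0.1, 0.01, 0.001):
    print(f"starting position {y0:>6}: decision reached at t = {decision_time(y0):.2f}")
```

Each factor of 10 closer to the boundary adds roughly a constant amount of decision time, so no finite bound covers all inputs.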
Relationship to scale
• Distributed systems often must make decisions
  • Distributed bank account: is the account positive or negative?
  • Voting system: who wins the vote?
• The input data is (approximately) continuous and can be distributed over the whole system
• To make the decision, more information/computation is required as the input data is closer to the decision boundary
  • Far from the boundary, only local information is needed
  • Very close to the boundary, the whole system is involved
• Lesson for system design: prevention or cure!
  • Design a scalable system to stay far away from boundaries
  • Some boundaries are inevitable: in that case, try to predict when decisions are needed and "prefetch" the information
Black swans
… imagine a green plant shooting up from its root, thrusting forth strong green leaves from the sides of its sturdy stem, and at last terminating in a flower. The flower is unexpected and startling, but come it must – nay, the whole foliage has existed only for the sake of that flower, and would be worthless without it.
– from “Conversations of Goethe with Johann Peter Eckermann” (1930 translation)
Black swans
• Systems are designed by relying on induction: "what worked in the past will continue to work in the future"
  • It is very common to assume that induction will always work
• However, induction often has a built-in limit and fails beyond it
  • Year 2000 bug: it was "far away" but it has arrived
  • Dinosaurs and banks: "too big to fail" but they will fail
• In computer systems this is both ubiquitous and hidden
  • All systems have finite resource limits (memory, speed) that are far away in "normal usage" but reached when the system is stressed
• Typically, the system will fail in exactly the case where it is needed most (Red Wedding situations)
The two great "frauds"
• Systems obey inductive reasoning – false!
  • Past experience with systems is a bad guide for future systems, especially if the future system is going beyond the past one
• Systems obey probability distributions – false!
  • Probability distributions are introduced to simplify analysis, but often do not exist in reality
  • Assuming a probability distribution exists is a very strong assumption (a frequency limit exists) and is very probably wrong
• Black swan: an unexpected large event that falsifies induction and is obvious in hindsight
  • Large systems are fundamentally irregular and must be designed to survive extreme cases
• See "The Black Swan", by Nassim Nicholas Taleb (2010)
Degrees of increasing irregularity in a large system
1. Existence of a probability distribution
  • Statistical physics holds, all microstates have equal probability, behavior is thermodynamic (describable by macroscopic state variables)
  • Unfortunately, most simulations and models are stuck here!
2. Critical point
  • Minor fluctuations can be amplified without bounds
  • The limit of statistical physics
  • Many computing systems have critical points (garbage collectors, dynamic hash tables, wide-area routing, virtual memory)
3. No probability distribution exists ("black swans")
  • We know only the range of behavior; frequency limits do not exist
  • Dijkstra's demon: in a guarded command, all guards can be chosen
  • The program must be designed so that it works even if a demon makes the worst possible choices in each guarded command
  • Complex systems, program verification, distributed algorithmics
Real-world graphs
Real-world graphs
• Large applications on the Internet will often have large numbers of users whose behavior can significantly influence the large-scale behavior of the system
  • Especially important is information dissemination among users
• Small-world graphs
  • The connectivity graph among the application's users will almost always be a small-world graph: small average shortest path length, large clustering coefficient (more clustered than a random graph, with shorter paths than a neighbor graph)
  • Navigation is often easy (user search using partial information)
• Power-law structure
  • The fraction of Web pages with k in-links is proportional to 1/k²
  • A consequence of information dissemination during formation

See "Networks, Crowds, and Markets", by David Easley and Jon Kleinberg (2010)
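The "dissemination during formation" intuition behind the power law can be sketched with a toy copying model (the model and its parameters are illustrative, not from the talk): each new page either copies an existing link, so already-popular pages attract still more links, or links uniformly at random. A heavy-tailed in-degree distribution emerges.

```python
import random

# Illustrative preferential-attachment sketch: link "copying" during
# network growth produces a heavy-tailed (power-law-like) in-degree
# distribution, unlike a uniformly random graph.

def grow_network(n_pages, p_copy=0.5, seed=42):
    rng = random.Random(seed)
    targets = [0]            # multiset of all link targets so far
    in_degree = {0: 0}
    for page in range(1, n_pages):
        if rng.random() < p_copy:
            target = rng.choice(targets)   # copy an existing link (rich get richer)
        else:
            target = rng.randrange(page)   # link to a uniformly random page
        in_degree[target] = in_degree.get(target, 0) + 1
        in_degree.setdefault(page, 0)
        targets.append(target)
    return in_degree

deg = grow_network(20000)
top = max(deg.values())
avg = sum(deg.values()) / len(deg)
print(f"average in-degree {avg:.2f}, maximum in-degree {top}")
```

The average in-degree stays near 1 (each page contributes one link), while the maximum is orders of magnitude larger: a few hubs collect most of the links.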
Consistency in real-world graphs
• The CAP theorem states that consistency cannot be achieved for available, partition-tolerant systems
  • Users' information dissemination makes this more precise
• Limiting communication between users causes pluralistic ignorance, where different parts of the network have very different information, and this can last indefinitely long
• Limiting communication can cause sudden changes
  • Epidemic dissemination (synchronization, oscillation, stability)
  • Collapse of giant components (one connected component that contains a significant fraction of all nodes)
• Information content can be changed during dissemination
  • Cascades (herding), when decisions are made in sequential order
  • Tipping points: decisions need to convince a large initial group
Example: Web bow-tie
• Large decentralized information networks such as the Web will often have a global structure with a central strongly-connected component (from 2000, but still valid today)
• The structure is maintained as the network continuously changes

From Broder et al. (2000), as printed in Networks, Crowds, and Markets
Example: financial networks
• Network of loans among US financial institutions, revealing its strongly connected core
• This reveals a structural fragility in the financial system

From Bech & Atalay (2008), as printed in Networks, Crowds, and Markets
Heisenberg applications
Many applications are bursty
• Many real-world applications require a lot of computational resources for very short time periods
  • For example, interactive applications require massive resources, but only when being used (real-time voice translation while you are speaking)
  • Your computer's CPU usage is bimodal: it is close to 0% most of the time, except when you are doing a compute-intensive task, when it is close to 100%
• The solution to this is to provide elasticity
  • Cloud computing: pay only for the resources actually used (economy of scale with many shared users)
  • Internet of Things: power up only the nodes actually needed (hardware, i.e., silicon, is cheap)
• This enables a new kind of application based on burstiness
Computational Heisenberg Principle (1)
• A cloud has two key properties:
  • Pay per use: pay only for the resources actually used
  • Elasticity: ability to scale resource usage up (and down) rapidly
• For a fixed cost, as the time interval decreases, more resources can be made available:
  • For a given maximum cost, the product of resource amount and usage time is less than a constant
• Analogy with Heisenberg's uncertainty principle in physics: the product of the uncertainty in time and the uncertainty in energy is equal to (or greater than) a constant. This increases the probability of events that use arbitrarily high energies if the time period is short enough. As long as the high energies are less than the uncertainty, they are allowed!
  • This is a property of the system itself, not a limitation of measurement!
  • ∆t ⋅ ∆E = c, and t_allow ≤ ∆t and E_allow ≤ ∆E implies t_allow ⋅ E_allow ≤ c
• This opens the door to new applications that could not be done before
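With made-up numbers, the trade-off is simple arithmetic: for a fixed cost c, the resource amount r available grows as the usage interval t shrinks, subject to t ⋅ r ≤ c.

```python
# Sketch of the computational Heisenberg principle with invented numbers:
# a fixed budget of core-seconds buys few cores for a long interval, or
# very many cores for a short burst, since the constraint is t * r <= c.

COST = 3600.0  # hypothetical budget: 3600 core-seconds

for t in (3600.0, 60.0, 1.0, 0.1):
    r = COST / t
    print(f"interval {t:>7.1f} s -> up to {r:>8.0f} cores for the same cost")
```

The same budget that buys one core for an hour buys 36 000 cores for a tenth of a second; that short-burst region is where Heisenberg applications live.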
Computational Heisenberg Principle (2)
• For a given fixed resource cost c0, what kinds of applications can run?
• Before elasticity: all applications lived in the light blue area, which gives local resources for maximum cost c0 (r ≤ r0)
• With elasticity: the dark blue area becomes available for the same cost (r > r0)
• The dark blue area is the home of Heisenberg applications
  • Cloud applications (e.g., big data)
  • IoT applications (e.g., data fusion)
• Using machine learning often leads to Heisenberg applications

[Figure: available resources r versus time interval t; local resources for cost c0 fill the rectangle r ≤ r0, t ≤ t0, while cloud resources for the same cost fill the larger region under the hyperbola t ⋅ r ≤ c0.]
Heisenberg application space

[Figure: the Heisenberg application space, plotting learning/setup-phase elasticity requirements (learning time) against query/use-phase elasticity requirements (response time), each ranging from S through M and L to XL, with interaction styles from one-shot and one-way stream to conversation and full interactivity (learning + query). Standard applications (BitTorrent, WIMP GUI, Microsoft Office) sit near the origin; advanced applications (Google Search, Google Translate, recommendation systems, speech recognition, Skype connection, social networks, media translation, weather forecasting, Recorded Future, MMORPGs and role-playing games, chess programs, champion chess programs, IBM Jeopardy, Wolfram Alpha, image recognition, computer algebra, peer-to-peer CDN, Google Earth, JIT compilers) fill the middle; future applications (real-time audio language translation, real-time expert guidance, creative problem solving) occupy the extreme corner.]
Building scalable systems
The starting point
• Every scalable design starts as independent pieces (P+A, no C)
• Nodes occasionally interact (add some C) → collaboration, emergence
  • Split protocol: what happens when a node leaves a group (maybe abruptly)
  • Merge protocol: what happens when a node joins a group
• Many examples: biology, peer-to-peer, map-reduce, …

[Figure: groups of nodes that repeatedly split and merge.]
Building up to a real system
• We start with a decentralized system (P+A, no C)
  • The problem: how much C, and how to add it?
• The rest of this section explores how to add C
• Control-oriented approach
  • Autonomic computing
  • Weakly interacting feedback structures
• Data-oriented approach
  • Convergent data management
  • LightKone reference architecture

See "Designing Robust and Adaptive Distributed Systems with Weakly Interacting Feedback Structures", by P. Van Roy, S. Haridi, and A. Reinefeld (2011)
See "LightKone Reference Architecture (LiRA) White Paper", by Ali Shoker et al. (2019)
Weakly interacting feedback structures
Autonomic computing
• Autonomic computing, initiated by IBM in 2001, aims to make computer systems self-managing, to overcome the growing complexity of systems management as scale increases
• The basic building block of IBM's autonomic system is the MAPE-K feedback loop (Monitor – Analyze – Plan – Execute – Knowledge)
• We start with this approach, and we investigate how to build systems consisting of many interacting MAPE-K loops

[Embedded excerpt from Huebscher and McCann: Fig. 1, IBM's MAPE-K (Monitor, Analyse, Plan, Execute, Knowledge) reference model for autonomic control loops. In MAPE-K, a managed element (a web server, database, operating system, cluster of machines, network, CPU, printer, …) is given autonomic behaviour by coupling it with an autonomic manager; sensors (probes, gauges) collect information about the managed element, such as response time to client requests, network and disk usage, and CPU and memory utilisation.]
A scalable architecture in four steps
• Concurrent component
  • An active entity communicating with its neighbors through asynchronous messages
  • "Intelligence" concentrated in core components
• Single feedback loop (MAPE-K loop)
  • Manager, sensor, and effector components connected to a subsystem and continuously maintaining one local goal
• Feedback structure
  • A set of feedback loops that work together to maintain one global system property
• Weakly interacting feedback structures (WIFS)
  • The complete system is a conjunction of global properties, each maintained by one feedback structure
  • The feedback structures have dependencies based on the operating conditions
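The first two steps can be sketched as follows. This is a minimal toy with invented names (MapekLoop, read_sensor, actuate), not IBM's reference implementation: the manager monitors a sensor, analyzes the deviation from its local goal, plans a correction, and executes it through an effector, with the goal and history as its knowledge.

```python
# Hypothetical minimal sketch of one MAPE-K loop (Monitor, Analyze, Plan,
# Execute over shared Knowledge) continuously maintaining one local goal.

class MapekLoop:
    def __init__(self, read_sensor, actuate, target):
        self.read_sensor = read_sensor   # sensor attached to the subsystem
        self.actuate = actuate           # effector attached to the subsystem
        self.knowledge = {"target": target, "history": []}

    def step(self):
        value = self.read_sensor()                    # Monitor
        self.knowledge["history"].append(value)
        error = value - self.knowledge["target"]      # Analyze
        action = -0.5 * error                         # Plan: proportional correction
        self.actuate(action)                          # Execute

# A toy managed element: a "temperature" nudged by the effector.
state = {"temp": 30.0}
loop = MapekLoop(lambda: state["temp"],
                 lambda a: state.__setitem__("temp", state["temp"] + a),
                 target=20.0)
for _ in range(20):
    loop.step()
print(f"temperature after 20 steps: {state['temp']:.2f}")   # converges to 20.00
```

A feedback structure would be several such loops sharing sensors, effectors, or knowledge; WIFS then compose these structures with only weak dependencies between them.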
Human respiratory system

Some design rules:
• Default behavior: rhythmic breathing reflex
• Complex component: conscious control can override and plan lifesaving actions
• Abstraction: conscious control does not need to know the details of the breathing reflex
• Fail-safe: conscious control can itself be overridden (falling unconscious)
• Time scales: laryngospasm is a quick action that interrupts the slower breathing reflex

[Figure: the respiratory system as a feedback structure. Monitoring agents in the human body measure O2 and CO2 in the blood, monitor breathing, and detect obstruction in the airways. Actuating agents trigger the breathing reflex when CO2 rises to a threshold, temporarily trigger laryngospasm (sealing the air tube) when there is sufficient obstruction in the airways, and trigger unconsciousness when O2 falls to a threshold (which also reduces the CO2 threshold to its base level). Conscious control of body and breathing can increase or decrease the breathing rate and change the CO2 threshold (the maximum is the breath-hold breakpoint).]

The operation of the human respiratory system is given as one feedback structure, inferred from a precise medical description of its behavior (see the entry on "Drowning", Wikipedia).
State diagram
• The human respiratory system can be seen as a state diagram
  • Dominant subset = active subset of feedback loops = state
• At any time, one subset is active, depending on operating conditions
• Each subset corresponds to a state in the state diagram

[Figure: state diagram with states Normal breathing, Unconscious breathing, Normal laryngospasm, Conscious breathing, Unconscious laryngospasm, and Conscious laryngospasm; transitions are triggered by obstruction, timeout, wake up, oxygen low, conscious decision, and the breath-hold breakpoint.]
A self-managing key/value store: Scalaris
• Scalaris is a high-performance self-managing key/value store that provides transactions and is built on top of a structured overlay network
  • A major result of the SELFMAN project (www.ist-selfman.org)
  • 4000 read-modify-write transactions/second on two dual-core Intel Xeon 2.66 GHz
• Scalaris has five WIFS: connectivity (Sconnect), routing (Sroute), load balancing (Sload), replica management (Sreplica), and transaction management (Strans)

[Figure: the Scalaris architecture: a transaction layer (atomicity, consistency, isolation, durability) and a replication layer (availability) on top of a structured peer-to-peer overlay of data stores (scalability).]

The Scalaris specification is a conjunction of six properties, each non-functional property implemented by one feedback structure:

  Sscalaris = Skey-value ∧ Sconnect ∧ Sroute ∧ Sload ∧ Sreplica ∧ Strans

with dependencies Sconnect → Sroute → Sreplica → Strans (and Sload).
Convergent data management
Convergent data management
• Maintaining adequate consistency and performance are two of the most important issues when system scale increases
  • Whereas autonomic computing is an operation-oriented approach, let us now take a data-oriented approach
• Assume our system has a database that is used to centralize the data management
  • We automate the management of "truth" in the database by a system that continuously updates the database with information coming from the edge
• To simplify consistency management, we use convergent data structures that tolerate message and node failures and need very little synchronization (CRDTs)
CRDTs
• A Conflict-free Replicated Data Type is a replicated data structure that maintains consistency between replicas with a very weak synchronization protocol
  • It satisfies Strong Eventual Consistency (SEC): n replicas that receive the same updates (in any order) have equivalent state
• Internally, all replicas are collecting information monotonically and are always converging to their resulting state
  • Synchronization between replicas is eventual replica-to-replica communication
• Many practical CRDTs exist and are widely used in commercial systems with millions of users
  • Counters, sets, maps, graphs, etc.
• CRDTs are well suited for convergent data management
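A minimal example of such a data type is the grow-only counter (G-Counter), sketched here as an illustration rather than taken from any particular CRDT library: each replica increments its own slot, and merging takes the per-slot maximum, a join that is commutative, associative, and idempotent, so replicas converge regardless of message order or duplication.

```python
# Minimal sketch of a state-based CRDT: a grow-only counter (G-Counter).
# Each replica increments only its own slot; merge takes the per-slot
# maximum, so replicas receiving the same updates in any order converge.

class GCounter:
    def __init__(self, replica_id, n_replicas):
        self.id = replica_id
        self.slots = [0] * n_replicas

    def increment(self):
        self.slots[self.id] += 1

    def merge(self, other):
        """Join with another replica's state (commutative, associative, idempotent)."""
        self.slots = [max(a, b) for a, b in zip(self.slots, other.slots)]

    def value(self):
        return sum(self.slots)

a, b = GCounter(0, 2), GCounter(1, 2)
a.increment(); a.increment()
b.increment()
a.merge(b); b.merge(a)
b.merge(a)                     # repeated delivery is harmless (idempotent)
print(a.value(), b.value())    # both replicas converge to 3
```

Because the join is idempotent, a lost, reordered, or duplicated merge message never corrupts the state; it only delays convergence.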
Basic convergent data scenario
• The CRDT database receives periodic updates (raw or aggregated) from all edge devices
  • Messages may be lost, reordered, or repeated without harm
• The CRDT database will always be converging to the truth at the edge; staleness depends on the update protocol and delay

[Figure: edge devices (in high numbers) send updates to a CRDT database (in the cloud or on a gateway).]
Lateral data sharing in LiRA
• LiRA (the LightKone Reference Architecture) proposes an architecture with convergent data management, with artefacts (AntidoteDB, Achlys, Legion)
• As the edge continues to grow exponentially, vertical data sharing (red lines, between the stable cloud and the exponentially growing edge) is extended for convergent data management
• In addition, lateral data sharing (blue lines) is added to do convergent data management between devices at the same level

[Embedded excerpt from the LiRA white paper: Figure 1.0.1 shows a typical hierarchical Fog/Edge Computing architecture in an airport monitoring use case (source: OpenFog), demonstrating the need for lateral and vertical data sharing and computation across different airport sub-systems and terminals at different levels of the fog hierarchy. LiRA bridges the gap left by current reference architectures by providing a data management layer that supports lateral data sharing (via replication) as well as fog/edge application-level data management, giving priority to high availability, often desired in fog/edge applications, without breaking application semantics or giving up consistency, although delayed.]

From "LightKone Reference Architecture (LiRA) White Paper", Ali Shoker et al. (2019)
Just-Right Consistency (JRC)
• JRC generalizes convergent data management for application invariants
• Different parts of an application need different consistency levels
  • No single model is best: synchronous models are safest, but asynchronous ones are fast and tolerate partitions
  • We want to use synchronous and asynchronous together and get the best of both worlds
• JRC classifies application invariants according to CAP:
  • "Choose any two: Consistency, Availability, Partition tolerance"
  • AP-compatible patterns: CRDTs with concurrent updates, causal order of operations, highly-available transactions
  • CAP-sensitive pattern: stable precondition before concurrent update (can be verified with tool support, e.g., the CEC tool)
  • All other operations must be done sequentially
Simplified distributed systems
Simplifying distributed systems
• Distributed systems have concurrency, message latency, partial failure, and real-world interaction. They are quite difficult to understand and develop in their full generality.
• A useful approach is to focus on simpler kinds of distributed systems
  • The general form can be seen as the simpler kind with small additions
  • Following this approach can greatly help in designing distributed systems
• We show two useful kinds of simplified distributed systems
  • Each kind satisfies a strong property that greatly helps development, and the general form can be attained with small additions

[Figure: two axes of simplification. A linear system sits at the linear end of a linear-nonlinear axis; a confluent system sits at the pure functional (lambda calculus) end of an axis leading to imperative systems (real-world interaction).]
Confluent systems principle
• Functional programming based on the λ calculus is confluent:
  • Church-Rosser: given an initial expression, the final result of a reduction is the same for all reduction orders (up to renaming)
  • A functional program can be seen as a network of concurrent agents, each executing in its own thread. Church-Rosser means that for all scheduler choices, the result is the same.
• We make this program distributed by placing each subexpression on its own node
  • This requires one extra reduction rule, a mobility rule, to colocate expressions on a node for reduction. Church-Rosser still holds.
• The limitation is that it is syntactically known from which subexpression each agent's next input will come
  • For general distributed systems, we need to add interaction points, which depend on the reduction order and express real-world interaction

See "Why Time is Evil in Distributed Systems and what to do about it", by Peter Van Roy, invited talk, Code BEAM 2019, May 16, 2019
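Confluence can be illustrated with a toy reducer (my own sketch, not the talk's formal calculus): reduce an arithmetic expression by repeatedly picking a redex at random, standing in for an arbitrary scheduler. Every reduction order yields the same final value.

```python
import random

# Illustrative sketch of confluence (Church-Rosser): reduce an expression
# tree by repeatedly picking a *random* redex (an operator whose operands
# are already numbers). All reduction orders give the same result.

def redexes(expr, path=()):
    """Yield paths to subexpressions whose operands are both numbers."""
    if not isinstance(expr, tuple):
        return
    op, l, r = expr
    if isinstance(l, (int, float)) and isinstance(r, (int, float)):
        yield path
    else:
        yield from redexes(l, path + (1,))
        yield from redexes(r, path + (2,))

def reduce_at(expr, path):
    """Perform one reduction step at the given path."""
    op, l, r = expr
    if not path:
        return l + r if op == "add" else l * r
    if path[0] == 1:
        return (op, reduce_at(l, path[1:]), r)
    return (op, l, reduce_at(r, path[1:]))

def evaluate(expr, rng):
    while isinstance(expr, tuple):
        expr = reduce_at(expr, rng.choice(list(redexes(expr))))
    return expr

e = ("add", ("mul", 2, ("add", 3, 4)), ("mul", ("add", 1, 1), 5))
results = {evaluate(e, random.Random(seed)) for seed in range(50)}
print(results)   # {24}: every schedule converges to 2*(3+4) + (1+1)*5
```

Adding an interaction point would mean letting the outcome of a step depend on which redex fired first; that single addition is what breaks pure confluence in general distributed systems.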
Confluent distributed systems
• A simple example of this approach is the client/server: this program is completely functional except for one interaction point, namely where the server accepts an incoming message from any client
  • The nondeterministic message receive in the server is the one interaction point
• The general approach is to write most of the program with distributed functional concurrency and add interaction points where needed

Pseudocode of client/server:

   local s p in
      node p = newport(s) server(state, s) end
      node client(state1, p) end
      node client(state2, p) end
      … /* as many clients as we need */
   end

   fun client(state, p)
      send(query(state), p)
      client(fc(state), p)
   end

   fun server(state, s)
      case s of q|t then server(fs(q, state), t)
      [] nil then nil
      end
   end
Linear systems principle
• Another useful kind of distributed system is the linear system
  • Linear systems are widely used in mechanical and electrical engineering, but are less used in informatics, because computer programs are inherently discrete and hence nonlinear
  • However, distributed systems are more and more being used in tight connection with the real world, to monitor and control real-world systems. These distributed systems could take advantage of linearity, just like in other engineering disciplines.
• The world is a combination of linearity and nonlinearity
  • Linearity = independent parts = the whole equals the sum of the parts
  • Nonlinearity = interacting parts = the whole is more than the sum of the parts
• Linear systems are much easier to analyze quantitatively than nonlinear ones
  • Because in linear systems, the parts can be analyzed separately and then combined (superposition principle, compositional systems)
  • In addition, some nonlinear systems can be analyzed qualitatively (using a geometric approach). See "Nonlinear Dynamics and Chaos", by S. Strogatz (1994).
• Nonlinearity cannot be eliminated entirely; often the system is critically based on it
  • For example, intelligence (complex components, machine learning) is often nonlinear
  • That's why biological systems are made of weakly interacting subsystems
Linear distributed systems
• Large distributed systems can often be mostly linear
  • This is especially true if the distributed system is connected tightly to the real world (monitoring, analyzing, controlling), e.g., in digital twins
  • Basic physical quantities are additive (mass, force, momentum, energy)
• They can't be completely linear, though
  • Because we need nonlinearity for most nontrivial behavior
  • Discrete decisions (e.g., Buridan's principle) are nonlinear
  • Real-world graph effects are nonlinear
• We should add nonlinearity where needed but no more
  • Today's distributed systems are far too nonlinear and discontinuous
  • They should be mostly linear with small amounts of nonlinearity added where needed
• This also simplifies resilience, since linearity implies high redundancy
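Superposition is easy to check on a concrete linear operator. Here a moving-average smoother stands in (as my own illustration) for sensor-stream processing in a digital twin: filtering two input streams together gives the same answer as filtering them separately and summing.

```python
# Sketch of why linear parts compose: a moving-average filter is a linear
# operator, so it obeys superposition, and each input's contribution can
# be analyzed separately and then summed.

def moving_average(signal, window=3):
    """Linear operator: each output is the mean of the last `window` inputs."""
    out = []
    for i in range(len(signal)):
        chunk = signal[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

a = [1.0, 2.0, 3.0, 4.0]      # e.g. one sensor's readings
b = [10.0, 0.0, 10.0, 0.0]    # e.g. another sensor's readings
combined = moving_average([x + y for x, y in zip(a, b)])
separate = [x + y for x, y in zip(moving_average(a), moving_average(b))]
print(all(abs(c - s) < 1e-9 for c, s in zip(combined, separate)))   # True
```

A single nonlinear step, say thresholding the output into a discrete alarm, is exactly where superposition (and this kind of separate analysis) stops holding.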
Conclusions
Conclusions
• Internet scale continues to increase, so scalability remains an important research topic
• Building large-scale systems is difficult because new and unusual phenomena appear at each new scale
  • Buridan's principle, black swans, real-world graph phenomena
• We give an overview of some of these phenomena and of some design principles that we have identified
  • Heisenberg applications, weakly interacting feedback structures, convergent data management
• This is part of ongoing work on understanding scalability
  • Convergent data management is ongoing (LightKone project)
  • Confluent distributed systems is ongoing (Van Roy and Haeri)
  • We welcome your reactions and all pointers to related work