1
A Low-latency Consensus Algorithm for Geographically Replicated Sites
Master of Science Thesis Defense
Balaji Arun
Committee: Dr. Binoy Ravindran (chair), Dr. Haibo Zeng, Dr. Robert Broadwater
2
Agenda
• Introduction
• Motivation
• Thesis Contribution: CAESAR
• Evaluation
• Conclusion
3
Building online services today…
4
Desired Properties
• Availability (Fault Tolerance)
• Low-latency
• High-throughput
• Strong Consistency
5
Desired Properties
• Availability under faults
• Low-latency
• High-throughput
• Strong Consistency
Replication
Distributed System
6
Replicating Stateful Systems
• Stateful systems
  • e.g. databases, in-memory caches
• Requires heavy coordination among nodes
  • to maintain consistency
• Expensive in wide area
  • due to high-latency links
7
CAP Theorem [Brewer 1997]
Availability
Partition Tolerance
Consistency
11
Thesis Contribution
[Figure: two CAP diagrams, each labeled Availability, Partition Tolerance, Consistency; the second is annotated "Under faults*"]
*only when a majority of nodes in the system fail
12
State Machine Replication [F. B. Schneider 1990]
• Execute commands in the same order in all replicas.
[Figure: commands C1, C2, C3]
13
State Machine Replication
• Execute commands in the same order in all replicas
[Figure: commands C1, C2, C3 delivered to every replica; label: Consensus]
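To make the ordering requirement concrete, here is a minimal sketch of two replicas applying the same commands. The command format (`put`/`add` on an integer key-value state) and every name in it are illustrative assumptions, not part of the thesis.

```python
from typing import Dict, List, Tuple

# Hypothetical command format: (operation, key, value).
Command = Tuple[str, str, int]


def apply(state: Dict[str, int], cmd: Command) -> None:
    """Apply one deterministic command to a replica's key-value state."""
    op, key, value = cmd
    if op == "put":
        state[key] = value
    elif op == "add":
        state[key] = state.get(key, 0) + value
    else:
        raise ValueError(f"unknown operation: {op}")


def run_replica(log: List[Command]) -> Dict[str, int]:
    """Execute a command log from an empty state and return the final state."""
    state: Dict[str, int] = {}
    for cmd in log:
        apply(state, cmd)
    return state


c1: Command = ("put", "x", 1)
c2: Command = ("add", "x", 5)
c3: Command = ("put", "x", 10)

# Replicas that execute the same order end in the same state.
same_order_a = run_replica([c1, c2, c3])
same_order_b = run_replica([c1, c2, c3])

# A replica that executes a different order may diverge.
different_order = run_replica([c3, c1, c2])
```

Agreeing on a single order is the job of the consensus layer; the state machine only has to be deterministic.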
14
Paxos [Lamport 2001]
• Agreement protocol
• Choose command per slot
• 2 RTTs
  • One for slot ownership
  • One for deciding command
• Survives F crashes in 2F+1 nodes
[Figure: log slots 1 2 3 4 5 …; commands C1, C2, C3 each send "Let me own slot 1"]
15
Paxos
[Figure: log slots 1 2 3 4 5 …; commands C1, C2, C3; votes: Vote Orange, Vote Blue, Vote Blue]
16
Paxos
[Figure: log slots 1 2 3 4 5 …; commands C1, C3 and C2 after the vote]
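The two round trips can be sketched for a single slot as follows. This is a simplified, in-memory illustration under stated assumptions (no networking, no crashes, integer ballots); the class and function names are hypothetical and it is not the implementation evaluated in the thesis.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Acceptor:
    promised: int = -1                    # highest ballot promised so far
    accepted_ballot: int = -1             # ballot of the accepted value, if any
    accepted_value: Optional[str] = None

    def prepare(self, ballot: int) -> Optional[Tuple[int, Optional[str]]]:
        """Round trip 1: grant slot ownership unless a higher ballot was seen."""
        if ballot > self.promised:
            self.promised = ballot
            return (self.accepted_ballot, self.accepted_value)
        return None

    def accept(self, ballot: int, value: str) -> bool:
        """Round trip 2: accept the value unless a higher ballot was promised."""
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted_ballot = ballot
            self.accepted_value = value
            return True
        return False


def propose(acceptors: List[Acceptor], ballot: int, value: str) -> Optional[str]:
    """Run both round trips; return the chosen value, or None if outvoted."""
    majority = len(acceptors) // 2 + 1

    # Round trip 1: ask for ownership of the slot.
    promises = [p for p in (a.prepare(ballot) for a in acceptors) if p is not None]
    if len(promises) < majority:
        return None

    # If an acceptor already accepted a value, that value must be re-proposed.
    _, prior_value = max(promises, key=lambda p: p[0])
    if prior_value is not None:
        value = prior_value

    # Round trip 2: ask the acceptors to decide the command.
    votes = sum(a.accept(ballot, value) for a in acceptors)
    return value if votes >= majority else None


acceptors = [Acceptor() for _ in range(3)]   # 2F+1 nodes with F = 1
first = propose(acceptors, ballot=1, value="C1")
second = propose(acceptors, ballot=2, value="C2")   # must adopt the chosen C1
stale = propose(acceptors, ballot=0, value="C3")    # ballot too low, rejected
```

Once a majority has accepted `C1`, later proposers can only re-propose `C1` for that slot, which is what keeps the slot's decision stable.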
17
Paxos variants
• One replica decides:
  • Multi-Paxos, Fast Paxos
• Round robin approach:
  • Mencius [Mao 2008]
[Figure: two logs with slots 1 2 3 4 5 …; the first holds C1, C2, C3, the second holds C1, C2, C3, C4, C5]
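The round robin approach can be illustrated with a small slot-ownership helper. It assumes slots are numbered from 1 and replicas from 0, and that slot ownership simply rotates; the function names are hypothetical and the sketch omits how a replica skips slots it does not need.

```python
def slot_owner(slot: int, num_replicas: int) -> int:
    """Return the replica that owns `slot` (slots are numbered from 1)."""
    return (slot - 1) % num_replicas


def next_owned_slot(replica: int, num_replicas: int, after_slot: int) -> int:
    """Return the first slot greater than `after_slot` owned by `replica`."""
    slot = after_slot + 1
    while slot_owner(slot, num_replicas) != replica:
        slot += 1
    return slot


# With three replicas, ownership of slots 1..6 rotates 0, 1, 2, 0, 1, 2.
owners = [slot_owner(s, 3) for s in range(1, 7)]
next_for_replica_1 = next_owned_slot(replica=1, num_replicas=3, after_slot=2)
```

Because ownership is fixed in advance, no replica needs a round trip to acquire a slot, at the cost of waiting for slower replicas to fill or skip theirs.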
18
Generalized Consensus [Lamport 2005]
• Order only non-commutative commands
  • e.g. same key in Key-Value store.
• EPaxos [Moraru ’13], M2Paxos [Peluso ’16]
  • Best performance under no conflicts (Fast Decision, 1 RTT)
  • Performance degrades under conflicts (Slow Decision, 1+ RTT)
• Thesis contribution: CAESAR
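A minimal sketch of the conflict test for a key-value store, assuming (as in the slide's example) that two commands conflict exactly when they access the same key. The `Command` type and helper names are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Tuple


@dataclass(frozen=True)
class Command:
    name: str   # label used only for illustration
    key: str    # key accessed in the key-value store


def conflicts(a: Command, b: Command) -> bool:
    """Assumed rule: commands on the same key do not commute."""
    return a.key == b.key


def conflicting_pairs(commands: List[Command]) -> List[Tuple[str, str]]:
    """Return the name pairs that must be ordered relative to each other."""
    return [(a.name, b.name) for a, b in combinations(commands, 2) if conflicts(a, b)]


commands = [Command("C1", "x"), Command("C2", "y"), Command("C3", "x")]
pairs = conflicting_pairs(commands)
```

Only `C1` and `C3` need a relative order here; `C2` commutes with both and can be executed independently.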
19
✓ Introduction
✓ Motivation
CAESAR
• Evaluation
• Conclusion
20
Overview
• Implements Generalized Consensus
• Uses logical timestamping to order commands (like Mencius)
• Exploits quorums by gathering dependencies for commands (like EPaxos)
• Deploys a novel wait condition to boost fast decisions
  • under conflicts
21
System model
• Nodes communicate through message passing
• Uses two quorum types (N is the number of nodes)
  • Classic Quorum: $\lfloor N/2 \rfloor + 1$
  • Fast Quorum: $\lceil 3N/4 \rceil$
• Four phases
  • Fast Propose
  • Slow Propose
  • Retry
  • Stable
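As a worked example of the two quorum types, the sketch below assumes the classic quorum is ⌊N/2⌋ + 1 nodes and the fast quorum is ⌈3N/4⌉ nodes, where N is the number of nodes. The function names are illustrative.

```python
def classic_quorum(n: int) -> int:
    """Size of a classic (majority) quorum: floor(n / 2) + 1."""
    return n // 2 + 1


def fast_quorum(n: int) -> int:
    """Size of a fast quorum: ceil(3n / 4), computed with integer arithmetic."""
    return -(-3 * n // 4)


# Quorum sizes for a few deployment sizes, as (nodes, classic, fast).
sizes = [(n, classic_quorum(n), fast_quorum(n)) for n in (3, 5, 7)]
```

For the five-site deployment used in the evaluation, this gives a classic quorum of 3 nodes and a fast quorum of 4 nodes.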
22
Uniform Reliable Broadcast
[Figure: message diagram across processes p0 to p4 for two concurrent commands; each command is sent with PROPOSE, acknowledged with OK, and finalized with STABLE]
23
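CAESAR: Basic Protocol
[Figure: message diagram across processes p0 to p4 for two concurrent commands. Command C: PROPOSE: C|0, OK: C|{}, STABLE: C|0|{}. Second command: PROPOSE with timestamp 4, OK with dependency set {C} or {}, STABLE with timestamp 4 and dependency set {C}]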
24
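CAESAR: Wait Condition (fast path)
[Figure: message diagram across processes p0 to p4 for two concurrent commands, with one WAIT interval. Command C: PROPOSE: C|0, OK: C|{}, STABLE: C|0|{}. Second command: PROPOSE with timestamp 4, OK with dependency set {C} or {}, STABLE with timestamp 4 and dependency set {C}]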
25
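CAESAR: Wait Condition (slow path)
[Figure: message diagram across processes p0 to p4 for two concurrent commands, with one WAIT interval. Command C: PROPOSE: C|0, OK: C|{}, NACK: C|{}, RETRY: C|5|{}, STABLE: C|5|{}. Second command: PROPOSE with timestamp 4, OK with dependency set {}, STABLE with timestamp 4 and dependency set {}]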
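The difference between the fast and slow paths in the diagrams can be sketched from the proposer's point of view. This is a rough illustration only: the `Reply` type, the idea that a NACK reports the highest conflicting timestamp a replica has seen, and the rule of retrying with that timestamp plus one are assumptions made for the example, and the additional round between RETRY and STABLE is omitted.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Reply:
    ok: bool                              # True for OK, False for NACK
    deps: FrozenSet[str] = frozenset()    # dependencies reported by the replica
    seen_timestamp: int = 0               # assumed: highest conflicting timestamp seen


Decision = Tuple[str, str, int, FrozenSet[str]]


def decide(command: str, timestamp: int, replies: List[Reply]) -> Decision:
    """Pick the next message the proposer sends after collecting quorum replies."""
    deps = frozenset().union(*(r.deps for r in replies))
    if all(r.ok for r in replies):
        # Fast path: every replica accepted the proposed timestamp.
        return ("STABLE", command, timestamp, deps)
    # Slow path: at least one replica rejected the timestamp, so retry higher.
    new_timestamp = max(r.seen_timestamp for r in replies if not r.ok) + 1
    return ("RETRY", command, new_timestamp, deps)


# Fast path: all replies are OK, one of them reports a dependency on C.
fast = decide("D", 4, [Reply(True, frozenset({"C"})), Reply(True), Reply(True), Reply(True)])

# Slow path: C was proposed at timestamp 0 but one replica already saw timestamp 4.
slow = decide("C", 0, [Reply(True), Reply(False, seen_timestamp=4), Reply(True), Reply(True)])
```

The timestamps mirror the slide's example, where command C is proposed at timestamp 0 and retried at timestamp 5; `D` is a placeholder name for the second command.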
26
✓ Introduction
✓ Motivation
✓ The Contribution: Caesar
Evaluation
• Conclusion
27
Implementation
• CAESAR
  • Implemented in Java 8
  • Adopted and modified JPaxos
    • used network, messaging and state machine abstractions
• Competitors
  • EPaxos, M2Paxos, Mencius, Multi-Paxos (publicly available)
• Benchmark
  • Fully replicated Key-Value store benchmark
28
Deployment
• Amazon EC2
  • m4.2xlarge instances (8 vCPUs, 32 GB memory)
• Sites: Ireland, Frankfurt, Mumbai, Ohio, Virginia
29
Client-perceived performance
30
Latency
[Figure: latency (msec, axis from 50 to 250) vs. conflict % (0%, 2%, 10%, 30%, 50%, 100%) for Caesar, EPaxos, and M2Paxos; one panel per site: Virginia, Ohio, Frankfurt, Ireland, Mumbai]
• Closed-loop request injection
• 10 clients per node
31
System performance
32
Throughput
[Figure: throughput (1000x tps) vs. conflict % (0%, 2%, 10%, 30%, 50%, 100%) for EPaxos, Caesar, M2Paxos, Multi-Paxos-IR, Multi-Paxos-IN, and Mencius; two panels: Network Batching Enabled and Network Batching Disabled; annotations: 17%, 24%, 45%]
• Open-loop request injection
33
Slow Paths vs EPaxos
[Figure: % of slow paths vs. conflict % (0%, 2%, 10%, 30%, 50%, 100%) for EPaxos and Caesar; bar labels: 1.64%, 0.4%, 44.5%, 14.7%, 6.7%, 2.0%, 29.83%, 50.0%, 9.84%, 100%]
34
✓ Introduction
✓ Motivation
✓ The Contribution: Caesar
✓ Evaluation
Conclusion
35
Conclusion
• CAESAR provides all the desired properties for building today’s services.
  • High availability under faults: State Machine Replication and Consensus
  • Strong consistency
  • Low latency: fast paths
  • High throughput: generalized consensus and simple execution phase
36
Future Work
• CAESAR can exhibit a latency distribution with a large standard deviation
  • due to multiple phases and the wait condition
• This is a challenge for meeting Service Level Agreements (SLAs)
  • Future work should minimize large tail latencies
37
Submission
Submitted to DSN 2017
38
Source code
• Open source @ https://www.github.com/ibalajiarun/caesar
39
References
• L. Lamport, “Paxos made simple,” ACM SIGACT News, 2001.
• I. Moraru, D. G. Andersen, and M. Kaminsky, “There is More Consensus in Egalitarian Parliaments,” in Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, ser. SOSP ’13. ACM, 2013, pp. 358–372.
• Y. Mao, F. P. Junqueira, and K. Marzullo, “Mencius: Building Efficient Replicated State Machines for WANs,” in Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation, ser. OSDI ’08. USENIX Association, 2008, pp. 369–384.
• L. Lamport, “Generalized Consensus and Paxos,” Microsoft Research, Tech. Rep. MSR-TR-2005-33, March 2005.
• S. Peluso, A. Turcu, R. Palmieri, G. Losa, and B. Ravindran, “Making fast consensus generally faster,” in DSN, 2016.
• F. B. Schneider, “Implementing fault-tolerant services using the state machine approach: A tutorial,” ACM Comput. Surv., vol. 22, no. 4, pp. 299–319, Dec. 1990. [Online]. Available: http://doi.acm.org/10.1145/98163.98167
40
Thanks to:
My advisor, Dr. Binoy Ravindran,
Roberto, Sebastiano
Dr. Broadwater, Dr. Zeng
and
All my fellow SSRGians (present and past)