When Raft Meets SDN: How to Elect a Leader and Reach Consensus in an Unruly Network
Yang Zhang, Eman Ramadan, Hesham Mekky, Zhi-Li Zhang
University of Minnesota
Introduction
• Consensus algorithm is essential for SDN distributed control plane
[Figure: Software Defined Network architecture. Application Layer (Applications), Control Plane (Network Operating Systems, Consensus Algorithm), Data Plane (Network Devices); APIs between the Application Layer and the Control Plane; Control to Data Plane Interface between the Control Plane and the Data Plane.]
Problems in SDN Distributed Control Plane
• SDN control plane setup: cyclic dependencies
  – control network connectivity
  – consensus algorithm
  – control logic managing the network
• In consensus algorithm
  – server failure has been studied for decades
  – full mesh connectivity is assumed
  – what if network fails?
  – new failure scenarios arise in SDN
RAFT: a representative consensus algorithm
• At any given time, each server is either:
  – Leader: handles all client interactions, log replication, etc.
  – Follower: completely passive
  – Candidate: used to elect a new leader
• Normal operation: 1 leader, N-1 followers
[Figure: server states: Follower, Candidate, Leader]
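RAFT Leader Election
[Figure: server state transitions]
• start → Follower
• Follower → Candidate: times out, starts election
• Candidate → Candidate: times out, new election
• Candidate → Leader: receives votes from majority
• Candidate → Follower: discovers current leader or a higher term
• Leader → Follower: discovers server with higher term
* Term is defined as a virtual time period in Raft.
Vote criteria: 1) highest term, 2) latest log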
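The two vote criteria can be sketched in a few lines of Python. This is a minimal illustration of how a Raft server might decide whether to grant a vote, not the LogCabin code; the names `VoteRequest`, `Voter`, and `handle` are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VoteRequest:
    term: int            # candidate's current term
    candidate_id: str
    last_log_term: int   # term of the candidate's last log entry
    last_log_index: int  # index of the candidate's last log entry


@dataclass
class Voter:
    current_term: int
    last_log_term: int
    last_log_index: int
    voted_for: Optional[str] = None  # candidate voted for in current_term

    def handle(self, req: VoteRequest) -> bool:
        """Return True if this server grants its vote to the candidate."""
        # Criterion 1 (term): reject candidates from an older term.
        if req.term < self.current_term:
            return False
        # A newer term starts a fresh round of voting.
        if req.term > self.current_term:
            self.current_term = req.term
            self.voted_for = None
        # A server grants at most one vote per term.
        if self.voted_for not in (None, req.candidate_id):
            return False
        # Criterion 2 (log): the candidate's log must be at least as
        # up to date, compared by last term first, then last index.
        candidate_log = (req.last_log_term, req.last_log_index)
        own_log = (self.last_log_term, self.last_log_index)
        if candidate_log < own_log:
            return False
        self.voted_for = req.candidate_id
        return True
```

Note that a request with a higher term still advances the voter's term even when the vote is denied because of a stale log; this is the mechanism that lets an unreachable server disturb an existing leader.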
RAFT MEETS SDN
• Control cluster under normal operations.
• Oscillating leadership.
  – Condition: up-to-date servers have a quorum, but they cannot communicate with each other.
[Figure: control cluster of servers R1 to R5 under normal operations and with oscillating leadership]
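The oscillation condition can be checked mechanically on a connectivity map. The sketch below is a simplification under stated assumptions: `links` holds the direct, symmetric server-to-server connectivity that survives the failure, every neighbour is assumed to grant its vote to an up-to-date candidate, and the topology shown is hypothetical rather than the one drawn on the slide.

```python
from itertools import combinations


def electable(server, links, up_to_date, cluster_size):
    """True if `server` has an up-to-date log and, together with its
    direct neighbours, forms a majority of the cluster."""
    quorum = cluster_size // 2 + 1
    return server in up_to_date and 1 + len(links[server]) >= quorum


def oscillation_pairs(links, up_to_date):
    """Pairs of electable servers that have no direct link to each other.
    Each can win an election, but neither hears the other's heartbeats."""
    n = len(links)
    candidates = [s for s in sorted(links)
                  if electable(s, links, up_to_date, n)]
    return [(a, b) for a, b in combinations(candidates, 2)
            if b not in links[a]]


# Hypothetical five-server cluster in which the R1-R3 path has failed.
links = {
    "R1": {"R2", "R4", "R5"},
    "R2": {"R1", "R3"},
    "R3": {"R2", "R4", "R5"},
    "R4": {"R1", "R3"},
    "R5": {"R1", "R3"},
}
up_to_date = {"R1", "R2", "R3"}
risky = oscillation_pairs(links, up_to_date)
```

For this example `risky` contains the single pair `("R1", "R3")`: whichever of the two is leader, the other times out, starts an election with a higher term, and can collect a majority through the remaining servers.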
RAFT MEETS SDN
• No leader exists.
  – Condition: some servers have a quorum but hold obsolete logs, while the servers holding up-to-date logs do not have a quorum.
[Figure: control cluster of servers R1 to R5 with no viable leader]
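To see why no leader can emerge under this condition, it helps to tally votes explicitly. The following sketch assumes each server's log is summarised by a `(last_log_term, last_log_index)` pair and that a peer votes for any directly reachable candidate whose log is at least as up to date as its own; the function names and the topology are hypothetical.

```python
def votes_for(candidate, links, last_log):
    """Votes a candidate can collect: its own, plus one from each direct
    neighbour whose log is not more up to date than the candidate's."""
    return 1 + sum(
        1 for peer in links[candidate]
        if last_log[candidate] >= last_log[peer]  # (term, index) comparison
    )


def viable_leaders(links, last_log):
    """Servers that could win an election with a majority of votes."""
    quorum = len(links) // 2 + 1
    return sorted(s for s in links
                  if votes_for(s, links, last_log) >= quorum)


# Hypothetical failure: R4 reaches a majority but has an obsolete log;
# R1, R2 and R3 are up to date but each reaches only one other server.
links = {
    "R1": {"R4"},
    "R2": {"R4"},
    "R3": {"R5"},
    "R4": {"R1", "R2", "R5"},
    "R5": {"R3", "R4"},
}
last_log = {
    "R1": (2, 5), "R2": (2, 5), "R3": (2, 5),
    "R4": (1, 3), "R5": (1, 3),
}
leaders = viable_leaders(links, last_log)
```

Here `leaders` is empty: R4 is connected to enough servers but is refused by the up-to-date ones, and the up-to-date servers cannot reach enough voters. The network is still connected, so the loss of liveness comes from missing direct connectivity rather than from a partition.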
POSSIBLE SOLUTIONS
• Solution expectation
  – all-to-all connectivity among cluster members as long as the network is not partitioned.
• Gossiping (overlay network)
  – Pros: easy to implement
  – Cons: no guarantee to work in all scenarios; heavy overhead
• Routing via Preorders
  – Pros: built-in resiliency in control plane; no modification to consensus algorithm
  – Cons: requires path calculation ahead of time
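The solution expectation separates two cases: a true partition, where nothing can help, and lost direct links, where messages could still be relayed over other nodes. A breadth-first search makes the distinction concrete. This is only an illustrative check on a hypothetical connectivity map `links`; it is not part of either proposed solution.

```python
from collections import deque


def reachable(start, links):
    """Nodes reachable from `start` over one or more hops (BFS)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for peer in links[node]:
            if peer not in seen:
                seen.add(peer)
                queue.append(peer)
    return seen


def is_partitioned(links):
    """True if some node cannot be reached from an arbitrary start node."""
    start = next(iter(links))
    return reachable(start, links) != set(links)


def relay_needed(links):
    """Pairs with no direct link that are still connected via relays."""
    nodes = sorted(links)
    return [(a, b)
            for i, a in enumerate(nodes) for b in nodes[i + 1:]
            if b not in links[a] and b in reachable(a, links)]
```

Every pair returned by `relay_needed` is a pair that a full-mesh assumption would treat as disconnected, although a multi-hop path between them still exists.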
Routing via Preorder: Failure Handling
• Failure handling process:
  – Upon failures, use alternative outgoing links if they exist
  – Group table is used for implementing all possible alternative paths
[Figure: example topology with source s = A, intermediate nodes B, C, D, E, F, and destination d = G]
It guarantees that any two nodes remain reachable as long as the network is not partitioned.
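The failure-handling idea of trying alternative outgoing links in order can be sketched as follows. This is a deliberately simplified model, not the PrOG mechanism itself: each node holds an ordered list of next hops toward the destination (loosely analogous to the buckets of a switch group table), the lists are assumed to be acyclic, and the edges below are hypothetical because the slide's figure does not survive in the transcript. Only the node names A to G are taken from it.

```python
def forward(src, dst, next_hops, failed_links):
    """Walk a packet from src to dst, taking at each node the first
    alternative outgoing link that is still up. Returns the path taken,
    or None if a node has no live outgoing link left.
    Assumes the next-hop lists contain no cycles."""
    path = [src]
    node = src
    while node != dst:
        for hop in next_hops[node]:
            if frozenset((node, hop)) not in failed_links:
                node = hop
                path.append(hop)
                break
        else:
            return None  # every alternative at this node has failed
    return path


# Hypothetical precomputed alternatives toward destination G.
next_hops = {
    "A": ["B", "C"],
    "B": ["D", "E"],
    "C": ["E", "F"],
    "D": ["G", "E"],
    "E": ["G"],
    "F": ["G", "E"],
}
primary = forward("A", "G", next_hops, set())
detour = forward("A", "G", next_hops, {frozenset(("D", "G"))})
```

With no failures the packet follows A, B, D, G; when the D-G link fails, D falls back to its second alternative and the packet reaches G via E. Because this sketch never backtracks, it can fail even when the network is not partitioned, so it does not reproduce the reachability guarantee stated on the slide.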
PRELIMINARY RESULTS
• Experiment setup
  – Raft implementation: Raft C++ implementation in LogCabin
  – Six Docker containers: 5 servers and 1 client
  – Five software switches: Open vSwitch
  – Simulating the two failure scenarios:
    § Oscillating leadership
    § No existing leader
PRELIMINARYRESULTS
• Raft: leadership keeps oscillating among servers (unstable).
• Raft: no viable leader (liveness lost).
• PrOG: leadership is stable.
Vanilla Raft is not stable under failure scenarios, while PrOG-assisted Raft is stable.
PRELIMINARYRESULTS
Latency of a request operation increases under failure scenarios
The client suffers many more failed attempts to access the cluster leader in vanilla Raft.
Summary
• SDN controller liveness depends on all-to-all message delivery between cluster servers.
• Raft is used to illustrate the problem induced by interdependency in the design of SDN distributed control plane.
• Possible solutions are discussed to circumvent interdependency issues.
• Preliminary results show the effectiveness of PrOG in improving the availability of leadership in Raft used by critical applications like ONOS.