The Raft Consensus Algorithm and Implementing Raft in C++ · 2019-09-13 · Libraries in C++...

23
The Raft Consensus Algorithm and Implementing Raft in C++ Diego Ongaro, August 2015

Transcript of The Raft Consensus Algorithm and Implementing Raft in C++ · 2019-09-13 · Libraries in C++...

Page 1: The Raft Consensus Algorithm and Implementing Raft in C++ · 2019-09-13 · Libraries in C++ LogCabin is nearly* self-contained *protobuf and gtest libraries are great Contains event

The Raft Consensus Algorithm andImplementing Raft in C++

Diego Ongaro, August 2015

Page 2: The Raft Consensus Algorithm and Implementing Raft in C++ · 2019-09-13 · Libraries in C++ LogCabin is nearly* self-contained *protobuf and gtest libraries are great Contains event

Replicated State MachinesTypical architecture for consensus systems

x←3 y←2 x←1 z←6Log

ConsensusModule

StateMachine

Log

ConsensusModule

StateMachine

Log

ConsensusModule

StateMachine

Servers

Clients

x 1

y 2

z 6

x←3 y←2 x←1 z←6

x 1

y 2

z 6

x←3 y←2 x←1 z←6

x 1

y 2

z 6

z←6

Replicated log ⇒ replicated state machineAll servers execute same commands in same order

Consensus module ensures proper log replication

Page 3: The Raft Consensus Algorithm and Implementing Raft in C++ · 2019-09-13 · Libraries in C++ LogCabin is nearly* self-contained *protobuf and gtest libraries are great Contains event

RaftAlgorithm for implementing a replicated logSystem makes progress as long as any majority of servers upFailure model: fail-stop (not Byzantine), delayed/lost msgsDesigned for understandability

Page 4: The Raft Consensus Algorithm and Implementing Raft in C++ · 2019-09-13 · Libraries in C++ LogCabin is nearly* self-contained *protobuf and gtest libraries are great Contains event

Raft Overview1. Leader election

Select one of the servers to act as cluster leaderDetect crashes, choose new leader

2. Log replication (normal operation)Leader takes commands from clients, appends to its logLeader replicates its log to other servers (overwriting inconsistencies)

3. SafetyOnly a server with an up-to-date log can become leader

Page 5: The Raft Consensus Algorithm and Implementing Raft in C++ · 2019-09-13 · Libraries in C++ LogCabin is nearly* self-contained *protobuf and gtest libraries are great Contains event

RaftScope Visualizationjust leader election today

Page 6: The Raft Consensus Algorithm and Implementing Raft in C++ · 2019-09-13 · Libraries in C++ LogCabin is nearly* self-contained *protobuf and gtest libraries are great Contains event

Leader Election ReviewHeartbeats and timeouts to detect crashesRandomized timeouts to avoid split votesMajority voting to guarantee at most one leader per term

Page 7: The Raft Consensus Algorithm and Implementing Raft in C++ · 2019-09-13 · Libraries in C++ LogCabin is nearly* self-contained *protobuf and gtest libraries are great Contains event

CCabinLLog

github.com/logcabin

LogCabinStarted as research platform for Raft at StanfordDeveloped into production system at Scale ComputingNetwork service running Raft replicated state machineData model: hierarchical key-value storeWritten in (Rust was pre-0.1)gcc 4.4's C++0x

Page 8: The Raft Consensus Algorithm and Implementing Raft in C++ · 2019-09-13 · Libraries in C++ LogCabin is nearly* self-contained *protobuf and gtest libraries are great Contains event

C++ WinsFastEasy to predict speed of language featuresNo GC pauses

Raft election timeouts can be very lowAs low-level as you want

LogCabin forks a child process to write a consistent snapshot of its state machineResource leaks are rarely an issue

Move semantics, in C++11LogCabin has 47 calls to new, only 6 calls to delete

All this is also true of Rust

std::unique_ptr

Page 9: The Raft Consensus Algorithm and Implementing Raft in C++ · 2019-09-13 · Libraries in C++ LogCabin is nearly* self-contained *protobuf and gtest libraries are great Contains event

Libraries in C++LogCabin is nearly* self-contained *protobuf and gtest libraries are great

Contains event loop (epoll), RPC systemEasier to debug, understand system end-to-endLearned a lot

Hard to depend on librariesNo standard packaging systemLibraries use different subsets of C++

Exceptions? Lambdas? shared_ptr?Thread safety described in documentation (lol)

Hard to extract LogCabin's Raft implementation as a libraryRust: Cargo packaging, crates.io, rich type system

Page 10: The Raft Consensus Algorithm and Implementing Raft in C++ · 2019-09-13 · Libraries in C++ LogCabin is nearly* self-contained *protobuf and gtest libraries are great Contains event

Thread Safety Is HardLogCabin uses style

One mutex per objectAll public methods hold the mutex the entire time (except when blocked on acondition variable)

No language support, not compiler-enforced

Monitor

Equivalent Rust code: exiting wouldn't be in scope

// occasionally hangs forever on shutdownvoid threadMain() while (!exiting) std::unique_lock lockGuard(mutex); // ... do stuff ... condition.wait(lockGuard);

Page 11: The Raft Consensus Algorithm and Implementing Raft in C++ · 2019-09-13 · Libraries in C++ LogCabin is nearly* self-contained *protobuf and gtest libraries are great Contains event

ConclusionRaft: designed for understandability

Randomized leader election approachVideos of log replication and safety on Paper/dissertation also include:

Cluster membership changes (simpler in dissertation)

Log compactionClient interactionUnderstandability, correctness, performance evaluation

In implementation, C++Offers good and predictable performanceIs missing a healthy library ecosystemAllows memory and thread safety bugs

Excited to see Rust and grow

Raft website

LogCabin

raft-rs

Page 12: The Raft Consensus Algorithm and Implementing Raft in C++ · 2019-09-13 · Libraries in C++ LogCabin is nearly* self-contained *protobuf and gtest libraries are great Contains event

Questionsraft.github.io

mailing listraft-dev

Page 13: The Raft Consensus Algorithm and Implementing Raft in C++ · 2019-09-13 · Libraries in C++ LogCabin is nearly* self-contained *protobuf and gtest libraries are great Contains event

Backup Slides

Page 14: The Raft Consensus Algorithm and Implementing Raft in C++ · 2019-09-13 · Libraries in C++ LogCabin is nearly* self-contained *protobuf and gtest libraries are great Contains event

MotivationGoal: shared key-value store (state machine)Host it on a single machine attached to network

Pros: easy, consistentCons: prone to failure

With Raft, keep consistency yet deal with failures

Page 15: The Raft Consensus Algorithm and Implementing Raft in C++ · 2019-09-13 · Libraries in C++ LogCabin is nearly* self-contained *protobuf and gtest libraries are great Contains event

What is consensusAgreement on shared state (single system image)Recovers from server failures autonomously

Minority of servers fail: no problemMajority fail: lose availability, retain consistency

Servers

Key to building consistent storage systems

Page 16: The Raft Consensus Algorithm and Implementing Raft in C++ · 2019-09-13 · Libraries in C++ LogCabin is nearly* self-contained *protobuf and gtest libraries are great Contains event

How Is Consensus Used?Top-level system configuration

repl. state machine

S1 S2

S3

N N N N...

repl. state machine

leader standby standby

S1 S2

S3

N N N N...

Replicate entire database state

Page 17: The Raft Consensus Algorithm and Implementing Raft in C++ · 2019-09-13 · Libraries in C++ LogCabin is nearly* self-contained *protobuf and gtest libraries are great Contains event

Paxos ProtocolLeslie Lamport, 1989Nearly synonymous with consensus

“The dirty little secret of the NSDI community is that at most fivepeople really, truly understand every part of Paxos ;-).”

—NSDI reviewer

“There are significant gaps between the description of the Paxosalgorithm and the needs of a real-world system...the final system

will be based on an unproven protocol.” —Chubby authors

Page 18: The Raft Consensus Algorithm and Implementing Raft in C++ · 2019-09-13 · Libraries in C++ LogCabin is nearly* self-contained *protobuf and gtest libraries are great Contains event

Raft's Design for UnderstandabilityWe wanted an algorithm optimized for building real systems

Must be correct, complete, and perform wellMust also be understandable

“What would be easier to understand or explain?”

Fundamentally different decomposition than PaxosLess complexity in state spaceLess mechanism

Page 19: The Raft Consensus Algorithm and Implementing Raft in C++ · 2019-09-13 · Libraries in C++ LogCabin is nearly* self-contained *protobuf and gtest libraries are great Contains event

Safe Shutdown Is HardGlobals class constructs and destroys all major objects in correct order

(Config file, event loop, signal handlers, storage, Raft, state machine, RPC handlers)Still hard to get object lifetimes correct

RPCs, background threadsRust: compiler checks lifetimes, no dangling pointers

Page 20: The Raft Consensus Algorithm and Implementing Raft in C++ · 2019-09-13 · Libraries in C++ LogCabin is nearly* self-contained *protobuf and gtest libraries are great Contains event

Raft User Study

0

10

20

30

40

50

60

0 10 20 30 40 50 60

Raft g

rade

Paxos grade

Raft then PaxosPaxos then Raft

0

5

10

15

20

implement explain

numbe

r of p

articipan

ts

Paxos much easierPaxos somewhat easierRoughly equalRaft somewhat easierRaft much easier

Page 21: The Raft Consensus Algorithm and Implementing Raft in C++ · 2019-09-13 · Libraries in C++ LogCabin is nearly* self-contained *protobuf and gtest libraries are great Contains event

Core Raft Review1. Leader election

Heartbeats and timeouts to detect crashesRandomized timeouts to avoid split votesMajority voting to guarantee at most one leader per term

2. Log replication (normal operation)Leader takes commands from clients, appends to its logLeader replicates its log to other servers (overwriting inconsistencies)Built-in consistency check simplifies how logs may differ

3. SafetyOnly elect leaders with all committed entries in their logsNew leader defers committing entries from prior terms

Page 22: The Raft Consensus Algorithm and Implementing Raft in C++ · 2019-09-13 · Libraries in C++ LogCabin is nearly* self-contained *protobuf and gtest libraries are great Contains event

Randomized TimeoutsHow much randomization is needed to avoid split votes?

0%

20%

40%

60%

80%

100%

100 1000 10000 100000

cumulative percent

time without leader (ms)

150­150ms150­151ms150­155ms150­175ms150­200ms150­300ms

Conservatively, use random range ~10x network latency

Page 23: The Raft Consensus Algorithm and Implementing Raft in C++ · 2019-09-13 · Libraries in C++ LogCabin is nearly* self-contained *protobuf and gtest libraries are great Contains event

Raft ImplementationsName Primary Authors Language License

Blake Mizerany, Xiang Li and Yicheng Qin (CoreOS) Go Apache 2.0(Sky) and (CMU, CoreOS) Go MIT (hashicorp) Go MPL-2.0

Java Apache2 (Stanford, Scale Computing) C++ ISC

Scala Apache2Javascript MPL-2.0

(Basho) Erlang Apache2Moiz Raja, Kamal Rameshan, Robert Varga (Cisco), Tom Pantelis (Brocade) Java Eclipse

Javascript MITJavascript ISCScala Apache2C BSD

Copied from Raft website, probably stale.

etcd/raftgo-raft Ben Johnson Xiang Lihashicorp/raft Armon Dadgarcopycat Jordan HaltermanLogCabin Diego Ongaroakka-raft Konrad Malawskikanaka/raft.js Joel Martinrafter Andrew StoneOpenDaylightliferaft Arnout Kazemierskiff Pedro Teixeirackite Pablo Medinawillemt/raft Willem-Hendrik Thiart