CRaft:Building High-Performance Consensus Protocols with ...
Transcript of CRaft:Building High-Performance Consensus Protocols with ...
CRaft: Building High-Performance Consensus Protocols with Accurate Clocks
Feiran Wang*, Balaji Prabhakar*, Mendel Rosenblum*, Gene Zhang†
*Stanford University, †eBay Inc.
Overview
• CRaft: a multi-leader extension to Raft enabled by accurate clocks
2
Better performanceExisting protocol Synchronized clocks
State Machines
• Maintain internal states
• Respond to external requests
• Examples: databases, storage systems
• How do we make them reliable?
3
Statex: 2y: 3
x←1 Statex: 1y: 3
Statex: 1y: 1
y←x
Replicated State Machines
Servers
StateMachine
x: 2y: 3
Consensus
Logx←1 y←1 …
ClientClient
x←1
StateMachine
x: 2y: 3
Consensus
Logx←1 y←1 …
StateMachine
x: 2y: 3
Consensus
Logx←1 y←1 …
• Consensus: ensures all servers agree on the same log
• Continues to operate if at least a majority of servers are up
4Diego Ongaro and John Ousterhout. The Raft consensus algorithm. https://raft.github.io
The Raft Consensus Protocol
• A widely used consensus protocol
• Leader-based
• Benefits: simple and efficient
• Limitation: leader is the bottleneck for throughput and scalability
5
Diego Ongaro and John Ousterhout. 2014. In search of an understandable consensus algorithm. In USENIX Annual Technical Conference. 305–319.
Follower Follower
Leader
ClientClient
Client
Leader
x←1 y←1 …
x←1 y←1 … x←1 y←1 …
Limitations with Single Leader
• Single leader limits throughput and scalability
6
Performance degrades with high load
Decreasing throughput with larger cluster sizes
Load increases
Challenge in a Multi-Leader Protocol
7
Replicate my log I have a log
I have a log I have a logok ok
Single leader Multiple leaders
• Challenge: how to coordinate leaders?
• Solution: agreement on time => agreement on order
Clock Synchronization
Percentile 90th 99th 99.9th max Clock offset 7us 11us 15us 26us
Distribution of clock offsets between servers
(20 machines on CloudLab)
• Achieving agreement on time is not trivial in a distributed system
• Huygens: a software clock synchronization system
8
Yilong Geng, Shiyu Liu, Zi Yin, Ashish Naik, Balaji Prabhakar, Mendel Rosenblum, and Amin Vahdat. Exploiting a natural network effect for scalable, fine-grained clock synchronization. In NSDI 2018. 81–94.
Huygens precision: ~20us
NTP precision: ~20ms
Our Approach: CRaftRaft CRaft (Clocks + Raft)
Scalability
Output A replicate log A replicated log
Safety & Consistency ✓ ✓Same guarantee as Raft
Practicability ✓ ✓A simple add-on to Raft; easy to implement
9
The CRaft Consensus Protocol
CRaft Overview
Follower
Server 2
Leader
Server 1
Follower
Server 3
Client
LeaderFollower Follower
FollowerFollower Leader
Group 1
Group 2
Group 3
Client Client
Merged log Merged log Merged log
11
Life of a Request
Leader
Server 1
Follower
Follower
Client
State Machine
log Follower
Server 2
Leader
Follower
State Machine
log Follower
Server 3
Follower
Leader
State Machine
log
12
Replicate Commit Merge Execute
• Replicated on a majority of servers• Safe and durable
Timestamp Management
• CRaft guarantees monotonically increasing timestamps in each log
• Safe time: indicates how up-to-date a log is
1x←1
4y←1
6y←x
17x←2
18x←5
1 2 3 4 5 index
13
Leader
Server
Follower
Follower
Merged log
timestamp
command
LogSafe time = 20
Safe Times
14
1 4 6 17 18LogSafe time = 20 23 …25
Current entries: timestamps <= safe time
No entries come in with a timestamp smaller than safe time
NowHow up-to-date is this log?
Merging
1 4 6 17
2 5 12
18
3 8 10 15
1 2 3 4 5
Log 1
Log 2
Log 3
65 8
ts = 18
index
ts = 12
ts = 19
merged log
• Merge up to the smallest safe time
• CRaft ensures merged log in monotonically increasing timestamp order
15
10 12…
Optimization: Fast Path
• Fast path: respond to clients early for certain write operations
16
Replicate Commit Merge Execute
Normal path: respond
after execution
Fast path: respond
before execution
Evaluation
Experiment Setup
• Implementation• Based on HashiCorp Raft – a popular and well-optimized implementation
• Environment• CloudLab, single data center
• Workload• In-memory key-value store• Multiple clients send get or set requests concurrently
18
Throughput vs Cluster Size
• Up to ~2x read and ~2.5x write throughput compared to Raft
19
Latency vs Throughput
Average latency vs throughput (3 servers) 99th percentile latency vs throughput (3 servers)
• CRaft improves throughput and latency under high load
20
Performance gain under high load
Load increases
Load increases
Average Latency
Performance vs Number of Clients
21
Latency is bounded by clock difference
Throughput
2x
2x
2x
• NTP precision: ~20ms, Huygens: ~20us
Conclusion
22
Better performance
Stronger consistencyExisting systems Synchronized clocks
• Accurate clocks enable better performance and/or consistency
Thank you!