CRaft: Building High-Performance Consensus Protocols with Accurate Clocks Feiran Wang*, Balaji Prabhakar*, Mendel Rosenblum*, Gene Zhang† *Stanford University, †eBay Inc.

Page 1: CRaft:Building High-Performance Consensus Protocols with ...

CRaft: Building High-Performance Consensus Protocols with Accurate Clocks

Feiran Wang*, Balaji Prabhakar*, Mendel Rosenblum*, Gene Zhang†

*Stanford University, †eBay Inc.

Page 2: CRaft:Building High-Performance Consensus Protocols with ...

Overview

• CRaft: a multi-leader extension to Raft enabled by accurate clocks

[Figure: existing protocol + synchronized clocks → better performance]

Page 3: CRaft:Building High-Performance Consensus Protocols with ...

State Machines

• Maintain internal states

• Respond to external requests

• Examples: databases, storage systems

• How do we make them reliable?

[Figure: state transitions]

State {x: 2, y: 3} → x←1 → State {x: 1, y: 3} → y←x → State {x: 1, y: 1}
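The transitions on this slide can be reproduced with a small deterministic state machine. This is only an illustrative sketch: the `apply` and `replay` helpers and the `target←source` command syntax are assumptions made for the example, not part of the talk.

```python
def apply(state, command):
    """Apply one assignment command of the form 'target←source'.

    The source is either an integer literal or the name of another variable.
    Returns a new state and leaves the input state untouched.
    """
    target, source = command.split("←")
    value = state[source] if source in state else int(source)
    new_state = dict(state)
    new_state[target] = value
    return new_state


def replay(initial, log):
    """Apply every command in the log, in order, starting from `initial`."""
    state = dict(initial)
    for command in log:
        state = apply(state, command)
    return state


state = {"x": 2, "y": 3}
state = apply(state, "x←1")  # x becomes 1, y stays 3
state = apply(state, "y←x")  # y copies the current value of x
print(state)

# Two replicas that replay the same log from the same initial state
# end in the same state; this is the property replication relies on.
replica_a = replay({"x": 2, "y": 3}, ["x←1", "y←x"])
replica_b = replay({"x": 2, "y": 3}, ["x←1", "y←x"])
print(replica_a == replica_b)
```

Because `apply` is deterministic, agreeing on the log is enough to agree on the state.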

Page 4: CRaft:Building High-Performance Consensus Protocols with ...

Replicated State Machines

[Figure: three servers, each with a consensus module, a log (x←1, y←1, …), and a state machine (x: 2, y: 3); clients send commands such as x←1]

• Consensus: ensures all servers agree on the same log

• Continues to operate if at least a majority of servers are up
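The majority requirement can be made concrete with a little arithmetic. A minimal sketch; the function names are hypothetical and chosen only for this example.

```python
def majority(n_servers: int) -> int:
    """Smallest number of servers that forms a majority of the cluster."""
    return n_servers // 2 + 1


def tolerated_failures(n_servers: int) -> int:
    """How many servers can be down while a majority is still up."""
    return n_servers - majority(n_servers)


def can_make_progress(n_servers: int, n_up: int) -> bool:
    """True if enough servers are up to form a majority."""
    return n_up >= majority(n_servers)


for n in (3, 4, 5, 7):
    print(n, "servers: majority =", majority(n),
          "tolerated failures =", tolerated_failures(n))
```

Note that a cluster of 4 tolerates no more failures than a cluster of 3, which is why odd cluster sizes are common.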

Diego Ongaro and John Ousterhout. The Raft consensus algorithm. https://raft.github.io

Page 5: CRaft:Building High-Performance Consensus Protocols with ...

The Raft Consensus Protocol

• A widely used consensus protocol

• Leader-based

• Benefits: simple and efficient

• Limitation: leader is the bottleneck for throughput and scalability

Diego Ongaro and John Ousterhout. 2014. In search of an understandable consensus algorithm. In USENIX Annual Technical Conference. 305–319.

[Figure: clients send requests to a single leader, which replicates its log (x←1, y←1, …) to two followers]

Page 6: CRaft:Building High-Performance Consensus Protocols with ...

Limitations with Single Leader

• Single leader limits throughput and scalability

[Figures: performance degrades with high load (as load increases); throughput decreases with larger cluster sizes]

Page 7: CRaft:Building High-Performance Consensus Protocols with ...

Challenge in a Multi-Leader Protocol

[Figure: Single leader: the leader says "Replicate my log" and the followers answer "ok". Multiple leaders: every leader says "I have a log".]

• Challenge: how to coordinate leaders?

• Solution: agreement on time => agreement on order
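The idea that agreement on time gives agreement on order can be sketched as follows: if every entry carries a timestamp from synchronized clocks, each server can sort by timestamp and arrive at the same order without extra coordination between leaders. The `Entry` type and the tie-break on `leader_id` are assumptions made for this illustration; the slides do not say how CRaft handles equal timestamps.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    timestamp: int  # reading from a synchronized clock (e.g. microseconds)
    leader_id: int  # hypothetical tie-breaker for equal timestamps
    command: str


def global_order(entries):
    """Sort entries by timestamp, breaking ties by leader id."""
    return sorted(entries, key=lambda e: (e.timestamp, e.leader_id))


a = Entry(10, 1, "x←1")
b = Entry(12, 2, "y←x")
c = Entry(12, 3, "x←5")

# Two servers receive the same entries in different arrival orders...
seen_by_server_1 = [a, b, c]
seen_by_server_2 = [c, a, b]

# ...but both derive the same global order from the timestamps alone.
print(global_order(seen_by_server_1) == global_order(seen_by_server_2))
```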

Page 8: CRaft:Building High-Performance Consensus Protocols with ...

Clock Synchronization

Distribution of clock offsets between servers (20 machines on CloudLab)

Percentile     90th   99th   99.9th   max
Clock offset   7us    11us   15us     26us

• Achieving agreement on time is not trivial in a distributed system

• Huygens: a software clock synchronization system

Yilong Geng, Shiyu Liu, Zi Yin, Ashish Naik, Balaji Prabhakar, Mendel Rosenblum, and Amin Vahdat. Exploiting a natural network effect for scalable, fine-grained clock synchronization. In NSDI 2018. 81–94.

Huygens precision: ~20us

NTP precision: ~20ms

Page 9: CRaft:Building High-Performance Consensus Protocols with ...

Our Approach: CRaft

                       Raft               CRaft (Clocks + Raft)
Scalability
Output                 A replicated log   A replicated log
Safety & Consistency   ✓                  ✓ (same guarantee as Raft)
Practicability         ✓                  ✓ (a simple add-on to Raft; easy to implement)

Page 10: CRaft:Building High-Performance Consensus Protocols with ...

The CRaft Consensus Protocol

Page 11: CRaft:Building High-Performance Consensus Protocols with ...

CRaft Overview

[Figure: three servers and three groups (Group 1, Group 2, Group 3). Each server is the leader of one group and a follower in the other two. Clients send requests to the leaders, and each server maintains a merged log.]

Page 12: CRaft:Building High-Performance Consensus Protocols with ...

Life of a Request

[Figure: Servers 1, 2, and 3. Each server hosts one leader and two followers, a log, and a state machine. A client sends its request to a leader.]

Replicate → Commit → Merge → Execute

• Replicated on a majority of servers
• Safe and durable

Page 13: CRaft:Building High-Performance Consensus Protocols with ...

Timestamp Management

• CRaft guarantees monotonically increasing timestamps in each log

• Safe time: indicates how up-to-date a log is

[Figure: a server hosting one leader and two followers, with a merged log]

Log (safe time = 20):

index       1     2     3     4     5
timestamp   1     4     6     17    18
command     x←1   y←1   y←x   x←2   x←5
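One common way to keep timestamps monotonically increasing within a log, even if the local clock stalls or steps backwards, is to take the maximum of the clock reading and the previous timestamp plus one. The slides do not say how CRaft achieves this, so the sketch below is an assumption; `TimestampSource` and its `clock` parameter are hypothetical names.

```python
import time


class TimestampSource:
    """Hands out strictly increasing timestamps based on a clock function."""

    def __init__(self, clock=time.time_ns):
        self._clock = clock  # callable returning the current time as an int
        self._last = 0       # last timestamp handed out

    def next(self) -> int:
        # Never go backwards: if the clock reading is not ahead of the
        # last timestamp, bump the last timestamp by one instead.
        ts = max(self._clock(), self._last + 1)
        self._last = ts
        return ts


# A fake clock that repeats a reading and then steps backwards.
readings = iter([10, 10, 7, 20])
source = TimestampSource(clock=lambda: next(readings))
stamps = [source.next() for _ in range(4)]
print(stamps)
```

With the fake clock above, the repeated and backwards readings are replaced by the previous timestamp plus one, so the sequence stays strictly increasing.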

Page 14: CRaft:Building High-Performance Consensus Protocols with ...

Safe Times

How up-to-date is this log?

[Figure: timeline of a log with entries at timestamps 1, 4, 6, 17, 18, a safe time of 20, a "Now" marker, and later entries at 23 … 25]

• Current entries: timestamps <= safe time
• No entries come in with a timestamp smaller than safe time
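The safe-time invariant can be sketched as a log that refuses late entries. This is an illustration under stated assumptions, not CRaft's implementation: the `SafeTimeLog` class is hypothetical, it rejects timestamps at or below the safe time (slightly stricter than the wording on the slide), and it lets the safe time advance without new entries through a hypothetical `advance_safe_time` call.

```python
class SafeTimeLog:
    """A timestamped log that tracks a safe time.

    Invariant: every stored entry has timestamp <= safe_time, and no
    entry with timestamp <= safe_time is accepted later.
    """

    def __init__(self):
        self.entries = []   # list of (timestamp, command)
        self.safe_time = 0

    def append(self, timestamp, command):
        if timestamp <= self.safe_time:
            raise ValueError(
                f"timestamp {timestamp} is not above safe time {self.safe_time}")
        self.entries.append((timestamp, command))
        self.safe_time = timestamp  # the new entry is now covered

    def advance_safe_time(self, new_safe_time):
        """Move the safe time forward without adding an entry."""
        if new_safe_time < self.safe_time:
            raise ValueError("safe time must not move backwards")
        self.safe_time = new_safe_time


log = SafeTimeLog()
for ts, cmd in [(1, "x←1"), (4, "y←1"), (6, "y←x"), (17, "x←2"), (18, "x←5")]:
    log.append(ts, cmd)
log.advance_safe_time(20)

try:
    log.append(19, "late")  # below the safe time of 20
    rejected = False
except ValueError as err:
    rejected = True
    print("rejected:", err)

log.append(23, "next")      # above the safe time, accepted
print(log.safe_time)
```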

Page 15: CRaft:Building High-Performance Consensus Protocols with ...

Merging

Log 1: 1 4 6 17 18    (ts = 18)
Log 2: 2 5 12         (ts = 12)
Log 3: 3 8 10 15      (ts = 19)

merged log: 1 2 3 4 5 6 8 10 12 …

• Merge up to the smallest safe time

• CRaft ensures the merged log is in monotonically increasing timestamp order
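The merge rule can be sketched as a k-way merge cut off at the smallest safe time. This is a minimal illustration using the timestamps from the slide; `merge_logs` is a hypothetical helper, and entries are reduced to `(timestamp, log_id)` pairs.

```python
import heapq


def merge_logs(logs, safe_times):
    """Merge timestamp-sorted logs up to the smallest safe time.

    logs:       list of lists of (timestamp, log_id), each sorted by timestamp
    safe_times: one safe time per log
    Entries above the smallest safe time are held back, because a log
    that is further behind may still receive entries that sort before them.
    """
    horizon = min(safe_times)
    merged = heapq.merge(*logs, key=lambda entry: entry[0])
    return [entry for entry in merged if entry[0] <= horizon]


log1 = [(ts, 1) for ts in (1, 4, 6, 17, 18)]
log2 = [(ts, 2) for ts in (2, 5, 12)]
log3 = [(ts, 3) for ts in (3, 8, 10, 15)]

merged = merge_logs([log1, log2, log3], safe_times=[18, 12, 19])
print([ts for ts, _ in merged])
```

Entries 15, 17, and 18 stay unmerged until Log 2's safe time moves past them.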

Page 16: CRaft:Building High-Performance Consensus Protocols with ...

Optimization: Fast Path

• Fast path: respond to clients early for certain write operations

Replicate → Commit → Merge → Execute

Normal path: respond after execution

Fast path: respond before execution

Page 17: CRaft:Building High-Performance Consensus Protocols with ...

Evaluation

Page 18: CRaft:Building High-Performance Consensus Protocols with ...

Experiment Setup

• Implementation
  • Based on HashiCorp Raft – a popular and well-optimized implementation

• Environment
  • CloudLab, single data center

• Workload
  • In-memory key-value store
  • Multiple clients send get or set requests concurrently

Page 19: CRaft:Building High-Performance Consensus Protocols with ...

Throughput vs Cluster Size

• Up to ~2x read and ~2.5x write throughput compared to Raft

Page 20: CRaft:Building High-Performance Consensus Protocols with ...

Latency vs Throughput

[Figures: average latency vs throughput (3 servers); 99th percentile latency vs throughput (3 servers). Annotations: load increases; performance gain under high load.]

• CRaft improves throughput and latency under high load

Page 21: CRaft:Building High-Performance Consensus Protocols with ...

Performance vs Number of Clients

[Figures: average latency and throughput vs number of clients. Annotations: latency is bounded by clock difference; 2x.]

• NTP precision: ~20ms, Huygens: ~20us

Page 22: CRaft:Building High-Performance Consensus Protocols with ...

Conclusion

[Figure: existing systems + synchronized clocks → better performance, stronger consistency]

• Accurate clocks enable better performance and/or consistency

Page 23: CRaft:Building High-Performance Consensus Protocols with ...

Thank you!