Exploiting Commutativity For Practical Fast Replication Seo Jin Park and John Ousterhout


Exploiting Commutativity For Practical Fast Replication

Seo Jin Park and John Ousterhout

● Problem: consistent replication adds latency and throughput overheads
§ Why? Replication happens after ordering

● Key idea: exploit commutativity to enable fast replication before ordering

● CURP (Consistent Unordered Replication Protocol)
§ Clients replicate in 1 round-trip time (RTT) if operations are commutative
§ Simple augmentation on existing primary-backup systems

● Results
§ RAMCloud’s performance improvements
  ● Latency: 14 µs → 7.1 µs (no replication: 6.1 µs)
  ● Throughput: 184 kops/sec → 728 kops/sec (~4x)
§ Redis cache is now fault-tolerant with small cost (12% latency ↑, 18% throughput ↓)

Slide 2

Overview

● Unreplicated systems: 1 RTT per operation

● Replicated systems: 2 RTTs per operation

Slide 3

Consistent Replication Doubles Latencies

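[Figure, unreplicated: ① the client sends "write x = 1" to the server; ② the server stores X: 1 and replies ok.]

[Figure, replicated: ① the client sends "write x = 1" to the primary; ② the primary forwards X: 1 to the backups; ③ the backups reply ok; ④ the primary replies ok to the client.]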

Slide 4

Strawman 1 RTT Replication

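[Figure: two clients send x←1 and x←2 directly to both the primary and the backups in one round trip. The primary applies … x←1 x←2 (state: x = 2), while the backups apply … x←2 x←1 (state: x = 1).]

Strong consistency is broken!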
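The divergence in the strawman can be reproduced with a few lines of Python. This is only an illustrative sketch: `apply_log` and the two arrival orders are hypothetical stand-ins for the replicas, not part of any real system.

```python
def apply_log(ops):
    """Apply (key, value) writes in arrival order and return the final state."""
    state = {}
    for key, value in ops:
        state[key] = value
    return state


# Two clients race; each replica sees the writes in its own arrival order.
primary_order = [("x", 1), ("x", 2)]
backup_order = [("x", 2), ("x", 1)]

primary_state = apply_log(primary_order)
backup_state = apply_log(backup_order)
print(primary_state, backup_state)  # the replicas disagree on x

# Writes to different keys commute: either arrival order gives the same state.
same_either_way = apply_log([("y", 5), ("z", 7)]) == apply_log([("z", 7), ("y", 5)])
```

The last line hints at why commutativity matters: when operations touch different keys, the arrival order no longer affects the final state.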

● Consistent replication protocols must solve two problems:
§ Consistent Ordering: all replicas should appear to execute operations in the same order
§ Durability: once completed, an operation must survive crashes.

● Previous protocols combined the two requirements

Slide 5

What Makes Consistent Replication Expensive?

[Figure: three clients send x←1, x←2, and x←3 to the primary, which serializes them as x←3 x←1 x←2 (1 RTT for serialization) and then replicates that order to each backup (1 RTT for replication). The time to complete an operation is the sum of the two.]

● For performance: cannot do totally ordered replication in 1 RTT

● Replicate just for durability & exploit commutativity to defer ordering
§ Safe to reorder if operations are commutative (e.g. updates on different keys)

● Consistent Unordered Replication Protocol (CURP):
§ When concurrent operations commute, replicate without ordering
§ When not, fall back to slow totally-ordered replication

Slide 6

Exploiting Commutativity to Defer Ordering
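The fast-path/slow-path decision on this slide can be sketched as follows. The `Write` type, `commutes`, and `choose_path` are hypothetical names introduced for illustration, and the rule "different keys commute" is the slide's own example rather than a complete commutativity test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    key: str
    value: int


def commutes(a: Write, b: Write) -> bool:
    """Assumption: two writes commute when they update different keys."""
    return a.key != b.key


def choose_path(new_op: Write, pending: list[Write]) -> str:
    """Pick the replication path for new_op given not-yet-ordered operations."""
    if all(commutes(new_op, p) for p in pending):
        return "fast"  # replicate without ordering
    return "slow"      # fall back to totally ordered replication


pending = [Write("x", 1), Write("y", 5)]
fast_case = choose_path(Write("z", 7), pending)  # touches a new key
slow_case = choose_path(Write("x", 3), pending)  # conflicts with x←1
```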

● Primary returns execution results immediately (before syncing to backups)

● Clients directly replicate to ensure durability

Slide 7

Overview of CURP

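[Figure: two clients send y←5 and z←7 to the primary (log: … x←1 x←2 y←5 z←7) and, in parallel, to the witnesses (holding z←7 and y←5). The primary syncs to the backups (… x←1 x←2) asynchronously, after which the witness entries are garbage collected. Time to complete an operation: 1 RTT.]

Witness
● No ordering info
● Temporary until primary replicates to backups
● Witness data are replayed during recovery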

Slide 8

Normal Operation

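[Figure: the client sends “z←7” to the primary, which executes it and returns "result: ok". In parallel the client asks the witnesses to record “z←7”, and they reply "accepted". The witnesses hold z←7 and y←5, the primary's log is … x←1 x←2 y←5 z←7, and the backups (… x←1 x←2) are synced asynchronously.]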

● Clients send an RPC request to the primary and the witnesses in parallel
● If all witnesses accepted (saved) the request, the client can complete the operation safely without sync to backups.

Slide 9

Normal Operation (continued)

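[Figure: ① the client sends “z←2” to the primary, which executes it, and asks the witnesses (holding z←7 and y←5) to record “z←2”; the witnesses reply "rejected". ② The client sends a sync request for “z←2” to the primary, which replies once the backups are synced.]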

● If any witness rejected (did not save) the request, the client must wait for sync to backups.
§ The operation usually completes in 2 RTTs (worst case 3 RTTs)

● When a primary receives a sync request, syncing to backups has usually already completed or at least been initiated
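The client-side logic of the two "Normal Operation" slides can be sketched as below. Everything here is a simplified, hypothetical model: `Primary`, `Witness`, and `client_write` are illustrative names, the RPCs are plain sequential calls rather than parallel messages, the backups are a single list, and the 3-RTT worst case is not modeled.

```python
class Witness:
    def __init__(self):
        self.saved = {}  # key -> value of requests accepted so far

    def record(self, key, value):
        """Accept the request only if no saved request touches the same key."""
        if key in self.saved:
            return False
        self.saved[key] = value
        return True


class Primary:
    def __init__(self):
        self.state = {}
        self.unsynced = []    # executed but not yet replicated to backups
        self.backup_log = []  # stands in for the backups

    def execute(self, key, value):
        self.state[key] = value
        self.unsynced.append((key, value))
        return "ok"

    def sync(self):
        self.backup_log.extend(self.unsynced)
        self.unsynced.clear()


def client_write(primary, witnesses, key, value):
    """Return (result, round trips used) for one write."""
    result = primary.execute(key, value)  # sent in parallel in a real system
    accepted = [w.record(key, value) for w in witnesses]
    if all(accepted):
        return result, 1                  # durable in witnesses: fast path
    primary.sync()                        # explicit sync request: slow path
    return result, 2


primary = Primary()
witnesses = [Witness(), Witness()]
first = client_write(primary, witnesses, "z", 7)   # witnesses accept
second = client_write(primary, witnesses, "z", 2)  # conflicts with saved z←7
```

In this model the second write forces a sync, so both z←7 and z←2 reach the backup log in the order the primary executed them.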

Slide 10

Crash Recovery

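● First load from a backup and then replay requests in a witness
● Example:

[Figure: the crashed primary had log … x←1 x←2 y←5 z←7 (state: x = 2, y = 5, z = 7). The backups hold only … x←1 x←2 because the async sync had not finished, and the witnesses hold z←7 and y←5. The new primary loads … x←1 x←2 from a backup, then ② retries y←5 and z←7 from a witness and syncs them to the backups. Recovered state: x = 2, y = 5, z = 7.]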
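The recovery example on this slide can be traced in a short sketch. `recover` is a hypothetical helper, and the witness contents are modeled as an unordered Python set to stress that they carry no ordering information.

```python
def recover(backup_log, witness_requests):
    """Rebuild state: ordered backup log first, then witness requests in any order."""
    state = {}
    for key, value in backup_log:        # the backup log is totally ordered
        state[key] = value
    for key, value in witness_requests:  # the witness holds only commutative requests
        state[key] = value
    return state


backup_log = [("x", 1), ("x", 2)]
witness_requests = {("z", 7), ("y", 5)}  # a set: iteration order is arbitrary

state = recover(backup_log, witness_requests)
print(state)
```

Because the witness only ever accepted requests on distinct keys, any iteration order of the set leads to the same recovered state.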

1. Replay from witness may be out of order
§ Witnesses only keep commutative requests

2. Primaries may reveal not-yet-durable data to other clients
§ Primaries detect & block reads of unsynced data

3. Requests replayed from witness may have been already recovered from backup
§ Detect and avoid duplicate execution using RIFL

Slide 11

3 Potential Problems for Strong Consistency

● Witness has no way to know operation order determined by primary
● Witness detects non-commutative operations and rejects them
§ Then, client needs to issue explicit sync request to primary

● Okay to replay in any order

Slide 12

P1. Replay From Witness May Be Out Of Order

[Figure: one client sends x←3 and y←5 to a witness, which accepts both. Another client then sends x←4, which the witness rejects because it already holds x←3.]
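A witness with this accept/reject rule might look like the sketch below. The class, its method names (`record`, `gc`), and the request IDs are assumptions made for illustration; commutativity is approximated by "the key sets do not overlap".

```python
class Witness:
    """Stores requests without ordering; rejects any that may not commute."""

    def __init__(self):
        self.requests = {}  # request_id -> (frozenset of keys, payload)

    def record(self, request_id, keys, payload):
        keys = frozenset(keys)
        for saved_keys, _ in self.requests.values():
            if keys & saved_keys:  # overlapping keys: may not commute
                return "rejected"
        self.requests[request_id] = (keys, payload)
        return "accepted"

    def gc(self, request_ids):
        """Drop requests the primary has already synced to backups."""
        for request_id in request_ids:
            self.requests.pop(request_id, None)


w = Witness()
r1 = w.record("a1", ["y"], "y←5")
r2 = w.record("a2", ["x"], "x←3")
r3 = w.record("b1", ["x"], "x←4")  # conflicts with the saved x←3
w.gc(["a2"])                       # x←3 is now safe on the backups
r4 = w.record("b1", ["x"], "x←4")  # the same request is accepted after gc
```

Garbage collection is what keeps rejections rare in this model: once the primary has synced a request, the witness no longer needs to hold it, so later writes to the same key can take the fast path again.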

● Primary doesn’t know if an operation is made durable in witnesses
● Subsequent operations (e.g. reads) may externalize the new data

● Must wait for sync to backups if a client request depends on any unsynced updates

Slide 13

P2. Primaries May Reveal Not-yet-durable Data

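[Figure: Client A sends x←3 to the primary, whose log becomes … x←1 x←2 x←3 (state: x = 3), while the backups still hold … x←1 x←2 (sync is async). Client B then reads x and would see x = 3. Was x←3 recorded in the witnesses or not? The primary cannot tell.]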
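One way to picture "block reads of unsynced data" is a primary that tracks which executed writes have not reached the backups yet. This is a hypothetical sketch: the `Primary` class, the synchronous `sync`, and the `sync_count` counter exist only for illustration.

```python
class Primary:
    def __init__(self):
        self.state = {}
        self.unsynced = []    # writes executed but not yet on the backups
        self.backup_log = []  # stands in for the backups
        self.sync_count = 0

    def write(self, key, value):
        self.state[key] = value
        self.unsynced.append((key, value))

    def sync(self):
        self.backup_log.extend(self.unsynced)
        self.unsynced.clear()
        self.sync_count += 1

    def read(self, key):
        """Sync first if the read depends on an unsynced update."""
        if any(k == key for k, _ in self.unsynced):
            self.sync()
        return self.state.get(key)


p = Primary()
p.write("x", 3)
r1 = p.read("y")             # independent of the unsynced write: no sync
syncs_after_r1 = p.sync_count
r2 = p.read("x")             # depends on x←3: sync before answering
```

Reads of keys with no unsynced updates return immediately, so only requests that would externalize not-yet-durable data pay for the sync.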

● A client request may exist both in backups and witnesses.

● Replaying operations already recovered from backups endangers consistency
§ Example: the state before the crash is x: 3, y: 7, but replaying x←2 a second time yields x: 2, y: 7

Slide 14

P3. Replay Can Cause Duplicate Executions

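[Figure: the crashed primary had log … x←1 x←2 x←3 z←7. The backups hold … x←1 x←2 x←3, and the witnesses still hold z←7 and x←2. ① The new primary loads … x←1 x←2 x←3 from a backup; ② retrying the witness requests appends x←2 z←7, giving … x←1 x←2 x←3 x←2 z←7 and the state x: 2, y: 7.]

Detect & ignore duplicate requests, e.g. RIFL [SOSP’15]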
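The duplicate-execution problem and its fix can be sketched with per-request IDs. This is not RIFL's actual interface; the integer `rpc_id` values and the `recover` helper are assumptions that only mimic the idea of remembering which requests have already completed.

```python
def recover(backup_log, witness_requests, dedup=True):
    """Replay backup log, then witness requests, optionally skipping duplicates."""
    state, completed = {}, set()
    for rpc_id, key, value in backup_log:
        state[key] = value
        completed.add(rpc_id)
    for rpc_id, key, value in witness_requests:
        if dedup and rpc_id in completed:
            continue  # already recovered from the backup: ignore
        state[key] = value
        completed.add(rpc_id)
    return state


backup_log = [(1, "x", 1), (2, "x", 2), (3, "x", 3)]
witness_requests = [(4, "z", 7), (2, "x", 2)]  # x←2 was synced but not yet gc'ed

naive_state = recover(backup_log, witness_requests, dedup=False)
safe_state = recover(backup_log, witness_requests)
```

Without the ID check the stale x←2 overwrites x←3; with it, only z←7 is replayed.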

Performance Evaluation of CURP

Slide 15

● Implemented in Redis and RAMCloud

[Figure: Performance vs. Consistency chart. A fast KV cache and RAMCloud (a replicated in-memory KV store) each gain a CURP arrow toward "Consistently replicated & as fast as no replication".]

[Figure: Fraction of Writes (log scale, 0.001 to 1, with the 0.5 median marked) vs. Latency (µs, 0 to 70) for Original (3 B), CURP (3 B, 3 W), CURP (1 B, 1 W), and Unreplicated.]

● Writes are issued sequentially by a client to a master
§ 40 B key, 100 B value
§ Keys are randomly (uniform dist.) selected from 2 M unique keys

Slide 16

RAMCloud’s Latency after CURP

Median: 7.1 µs vs. 14 µs

Configuration
● Xeon 4 cores (8 T) @ 3 GHz
● Mellanox ConnectX-2 InfiniBand (24 Gbps)
● Kernel-bypassing transport

● Thanks to CURP, the primary can batch replication requests without impacting latency, which improves throughput

● Each client issues writes (40B key, 100B value) sequentially

Slide 17

RAMCloud’s Throughput after CURP

[Figure: Write Throughput (k writes per second, 0 to 900) vs. Client Count (number of clients, 0 to 30) for Unreplicated, CURP (1 B, 1 W), CURP (3 B, 3 W), and Original (3 B). Annotations: 4x; ~6% per replica.]

● Design
§ Garbage collection
§ Reconfiguration handling (data migration, backup crash, witness crash)
§ Read operation
§ How to extend CURP to quorum-based consensus protocols

● Performance
§ Measurement with skewed workloads (many non-commutative ops)
§ Resource consumption by witness servers
§ CURP’s impact on Redis’ performance

Slide 18

See Paper for…

● Rely on commutativity for fast replication
§ Generalized Paxos (2005): 1.5 RTT
§ EPaxos (SOSP’13): 2 RTTs in LAN, expensive read

● Rely on the network’s in-order deliveries of broadcasts
§ Special networking hardware: NOPaxos (OSDI’16), Speculative Paxos (NSDI’15)
§ Presume & rollback: PLATO (SRDS’06), Optimistic Active Replication (ICDCS’01)
§ Combine with transaction layer for rollback: TAPIR (SOSP’15), Janus (OSDI’16)

● CURP
§ Is faster than other protocols using commutativity
§ Doesn’t require rollback or special networking hardware
§ Is easy to integrate with existing primary-backup systems

Slide 19

Related work

● Total order is not necessary for consistent replication
● CURP clients replicate without ordering in parallel with sending requests to execution servers → 1 RTT
● Exploit commutativity for consistency
● Improves both latency (2 RTTs → 1 RTT) and throughput (4x)
§ RAMCloud’s latency: 14 µs → 7.1 µs (no replication: 6.1 µs)
§ Throughput: 184 kops/sec → 728 kops/sec (~4x)

Slide 20

Conclusion
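As a quick sanity check on the headline numbers, the ratios can be recomputed from the figures quoted on the slides. The variable names are illustrative, and the inputs are exactly the values stated above.

```python
# Figures quoted on the slides (RAMCloud).
orig_latency_us = 14.0
curp_latency_us = 7.1
unreplicated_latency_us = 6.1
orig_kops = 184
curp_kops = 728

latency_speedup = orig_latency_us / curp_latency_us      # about 2x
throughput_gain = curp_kops / orig_kops                  # about 4x
overhead_us = curp_latency_us - unreplicated_latency_us  # cost over no replication

print(round(latency_speedup, 2), round(throughput_gain, 2), round(overhead_us, 1))
```

The roughly 2x latency ratio matches the move from 2 RTTs to 1 RTT, and the remaining gap to the unreplicated case is about 1 µs.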