-
C8DB
Geo-Replicated, Conflict-Free Document Database
Christopher S. Meiklejohn
Macrometa
HPTS 2019
November 5th, 2019
12/16/2019 Macrometa 1
-
Geo-Distribution Challenges
1. Remain consistent and block the write operation (CP)
2. Allow local write to proceed and risk inconsistency (AP)
-
Conflict-Free Replicated Data Types
Distributed data structures, mimic sequential counterparts
Designed for convergence under concurrency; deterministic by design.
Multiple flavors (e.g., state-based, delta-based, operation-based, etc.)
Counters: G-Counter, PN-Counter
Sets: G-Set, 2P-Set, OR-Set, AW-Set, RW-Set
Registers: LWW-Register, MV-Register
Flags
Sequence: Treedoc, RGA, RGA-Split
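As an illustration of the state-based flavor, here is a minimal sketch of one of the listed types, the 2P-Set. The class and method names are assumptions made for this example and are not taken from the talk or from any particular library. The merge is a pair of set unions, so it is commutative, associative, and idempotent.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TwoPhaseSet:
    """State-based 2P-Set: one grow-only set of adds, one grow-only set of removes."""
    added: frozenset = field(default_factory=frozenset)
    removed: frozenset = field(default_factory=frozenset)

    def add(self, item):
        return TwoPhaseSet(self.added | {item}, self.removed)

    def remove(self, item):
        # Only an element that has been added can be removed; the tombstone is permanent.
        if item not in self.added:
            return self
        return TwoPhaseSet(self.added, self.removed | {item})

    def merge(self, other):
        # Union of both components: order and repetition of merges do not matter.
        return TwoPhaseSet(self.added | other.added, self.removed | other.removed)

    def value(self):
        return self.added - self.removed


# Two replicas diverge, then merge in either order.
a = TwoPhaseSet().add("x").add("y")
b = TwoPhaseSet().add("x").remove("x").add("z")
merged = a.merge(b)
```

Both `a.merge(b)` and `b.merge(a)` yield the same state, which is the convergence property the slide refers to.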
[Figure: replica timelines for DC A, DC B, and DC C, showing the states { }, { 1 }, {(1, a)}, {(1, b)}, and {(1, {a, b})} as concurrent updates propagate.]
What do we do when things don’t commute?
-
CRDTs in Production
Riak distributed database
  First major production implementation of state-based CRDTs
  Used by NHS, Riot Games, LO/JACK, bet365
AntidoteDB (HPTS ’17)
  Developer on operation-based CRDT database with Transactional Causal Consistency
  Part of the LightKone, SyncFree research projects
Others
  Mesosphere’s DC/OS (Lashup, Minuteman), Adobe, Comcast, Macrometa
  TomTom, Roshi, Automerge, Ditto, Cosmos DB, etc.
-
Macrometa: Overview
Document database for the edge
  Documents are JSON and grouped into collections
  (think: namespaces for each document, clients presented with dictionaries)
105 physical regions, 175 logical regions
  36 reserved core regions, 12 available through public beta today
  34 customer requested regions (on-prem, specific colo, etc.)
  35 burstable regions available
Configurable consistency
  Strong consistency in a single DC with propagation to remote DCs in order
  Eventual consistency in a multi-master mode with all DCs receiving updates
-
Macrometa: Architecture
Logging-based architecture
  Updates within a data center are durably written to a log
  Logs are maintained with ZooKeeper and BookKeeper
Causal broadcast
  There is a single log per destination DC, delivered in FIFO order
  Updates are buffered until each log entry's dependencies are met
Clients operate at the document abstraction level
  Clients submit documents that are converted to updates by Macrometa nodes
  Macrometa nodes materialize values for keys based on incoming log updates
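The buffering step can be sketched as follows. This is a minimal illustration, not Macrometa's implementation: the slides do not say how dependencies are encoded, so the `deps` field (a set of entry identifiers), the `LogEntry` and `Replica` names, and the in-memory `store` are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    entry_id: str
    deps: frozenset   # ids of entries that must be applied first (hypothetical encoding)
    key: str
    value: object


class Replica:
    def __init__(self):
        self.applied = set()   # ids of entries already applied
        self.pending = []      # entries received but still waiting on dependencies
        self.store = {}        # materialized key -> value view

    def receive(self, entry):
        self.pending.append(entry)
        self._drain()

    def _drain(self):
        # Keep applying entries until no buffered entry has all its dependencies met.
        progressed = True
        while progressed:
            progressed = False
            for entry in list(self.pending):
                if entry.deps <= self.applied:
                    self.store[entry.key] = entry.value
                    self.applied.add(entry.entry_id)
                    self.pending.remove(entry)
                    progressed = True


e1 = LogEntry("a1", frozenset(), "first", "Andrew")
e2 = LogEntry("a2", frozenset({"a1"}), "last", "Carnegie")

replica = Replica()
replica.receive(e2)   # buffered: a1 has not arrived yet
replica.receive(e1)   # a1 applies, which unblocks a2
```

Delivering `e2` before `e1` leaves the store empty until `e1` arrives, after which both entries are applied in dependency order.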
-
Geo-Distribution Concerns
Regions have different SLAs, availability, performance, cost.
Paying for storage on each provider, minimize storage costs.
-
Registers
[Figure: timelines for DC A, DC B, and DC C. The updates "A Set Alice", "A Set April", "B Set Bob", and "C Set Charlie" are delivered to every DC; all three replicas end with the value Charlie.]
Concurrent updates are merged by arbitrating on the maximum DC identifier.
This ensures that, once convergence completes, only a single update per document field needs to be kept in the log.
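A minimal sketch of this arbitration rule, assuming DC identifiers are strings that compare lexicographically and that the caller has already determined which updates are concurrent. The `RegisterUpdate` type and the `arbitrate` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegisterUpdate:
    dc_id: str   # identifier of the originating data center (assumed comparable)
    value: str


def arbitrate(concurrent_updates):
    """Pick the winner among concurrent updates: the one from the maximum DC identifier."""
    return max(concurrent_updates, key=lambda u: u.dc_id)


updates = [
    RegisterUpdate("A", "April"),
    RegisterUpdate("B", "Bob"),
    RegisterUpdate("C", "Charlie"),
]
winner = arbitrate(updates)
```

Because the rule depends only on the DC identifier, every replica picks the same winner regardless of the order in which the updates arrive, and the losing updates no longer need to be retained.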
-
Registers: Read Anomalies
[Figure: timelines for DC A, DC B, and DC C showing the updates "A Set First Andrew" and "A Set Last Carnegie" at each DC. The state at the marked read point is:]
{ first: Andrew, last: undefined }
A read performed at this point is an anomaly relative to user expectations.
-
Transactions v1
[Figure: timelines for DC A, DC B, and DC C showing the updates "A Set First Andrew" and "A Set Last Carnegie" at each DC.]
Updates are grouped together with the same logical clock.
Updates are applied atomically to the log.
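The two statements above can be sketched as follows. The tuple layout `(clock, field, value)` and the function names are assumptions for illustration; the slides do not describe the actual log format.

```python
from collections import defaultdict


def group_by_clock(updates):
    """Group (clock, field, value) updates that share a logical clock."""
    groups = defaultdict(dict)
    for clock, field_name, value in updates:
        groups[clock][field_name] = value
    return groups


def apply_atomically(document, group):
    """Return a new document with every field of the group applied in one step."""
    updated = dict(document)
    updated.update(group)
    return updated


incoming = [(1, "first", "Andrew"), (1, "last", "Carnegie")]

doc = {}
for clock, group in sorted(group_by_clock(incoming).items()):
    doc = apply_atomically(doc, group)
```

Since both fields carry the same clock, they form one group, so no intermediate document with `first` set and `last` missing is ever produced.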
-
Transactions v1: Concurrency
[Figure: timelines for DC A, DC B, and DC C showing the updates "A Set First Andrew", "A Set Last Carnegie", and "B Set Death 1919" at each DC. All three replicas converge to:]
{ first: Andrew, last: Carnegie, death: 1919 }
Concurrent operations are merged together using the existing convergence strategy by maximum DC identifier.
-
Transactions v1: Concurrency Anomalies
[Figure: timelines for DC A, DC B, and DC C showing the updates "A Set First Andrew", "A Set Last Carnegie", and "B Set Last Brown" at each DC. The converged state is:]
{ first: Andrew, last: Brown }
Convergence uses maximum DC identifier and only takes part of the update.
-
Transactions: v2
[Figure: timelines for DC A, DC B, and DC C showing the updates "A Set First Andrew", "A Set Last Carnegie", and "B Set Last Brown" at each DC. The converged state is:]
{ first: undefined, last: Brown }
Convergence takes all of the concurrent updates from a single DC using the maximum DC identifier.
The existing transaction mechanism (v1) was renamed "atomics".
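The difference between the two convergence strategies can be sketched on the Andrew Carnegie example. This is an illustrative sketch only: the function names and the `(dc_id, fields)` representation of a concurrent transaction are assumptions, and DC identifiers are assumed to compare as strings.

```python
def converge_v1(transactions):
    """Per-field arbitration: each field takes the value from the maximum DC identifier."""
    result = {}
    winner_dc = {}
    for dc_id, fields in transactions:
        for name, value in fields.items():
            if name not in winner_dc or dc_id > winner_dc[name]:
                winner_dc[name] = dc_id
                result[name] = value
    return result


def converge_v2(transactions):
    """Whole-transaction arbitration: keep only the updates from the maximum DC identifier."""
    _, fields = max(transactions, key=lambda t: t[0])
    return dict(fields)


# Two concurrent transactions, one from DC A and one from DC B.
concurrent = [
    ("A", {"first": "Andrew", "last": "Carnegie"}),
    ("B", {"last": "Brown"}),
]

v1_result = converge_v1(concurrent)   # mixes fields from A and B
v2_result = converge_v2(concurrent)   # takes B's transaction as a whole
```

Under the per-field rule the result combines A's `first` with B's `last`, which neither client wrote. Under the whole-transaction rule only B's update survives, so `first` is absent.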
-
Counters?
Can’t use a register, assignment doesn’t commute.
Store sum per node or…
…store individual updates.
Sum double counts, max under counts.
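The last line can be checked with a small worked example. The scenario below is an assumption chosen for illustration: each replica stores a single running total, both replicas have seen one shared increment, and each then applies one concurrent increment of its own.

```python
# True history: one shared increment, then one concurrent increment at each replica.
shared = 1
replica_a = shared + 1   # A saw the shared increment and added its own
replica_b = shared + 1   # B saw the shared increment and added its own
true_total = 3           # shared + A's increment + B's increment

merged_by_sum = replica_a + replica_b       # the shared increment is counted twice
merged_by_max = max(replica_a, replica_b)   # one of the concurrent increments is dropped
```

Here summing gives 4 and taking the maximum gives 2, while the correct total is 3, which is why a single total per replica is not enough and the slide considers per-node sums or individual updates instead.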
-
Types of CRDT Counters
State-based CRDTs
  Lattice-based, with lower storage overhead; require that developers ensure, by proper design, that all operations are always mergeable.
  e.g., Riak
Operation-based CRDTs
  Optimized for individual operations, where all operations are designed to be commutative but do not need to form a lattice. Also require causal broadcast.
  e.g., Antidote
How do we find a solution that is compatible with transactions and meets both the garbage collection and storage requirements?
-
State-based
[Figure: timelines for DC A, DC B, and DC C showing "A Inc" and "B Inc" updates at each DC; each replica ends with the value 4.]
Storage overhead is O(n) on the number of nodes per counter.
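The O(n) overhead comes from keeping one slot per node. A minimal sketch of a state-based grow-only counter (G-Counter) shows this; the class is written for illustration and is not Riak's or Macrometa's code, and the choice of three increments at A and one at B is an assumption for the example.

```python
class GCounter:
    """State-based grow-only counter: one slot per node, merged by per-slot maximum."""

    def __init__(self, counts=None):
        self.counts = dict(counts or {})

    def increment(self, node_id, amount=1):
        self.counts[node_id] = self.counts.get(node_id, 0) + amount

    def merge(self, other):
        # Per-node maximum is commutative, associative, and idempotent.
        nodes = self.counts.keys() | other.counts.keys()
        return GCounter(
            {n: max(self.counts.get(n, 0), other.counts.get(n, 0)) for n in nodes}
        )

    def value(self):
        return sum(self.counts.values())


a, b = GCounter(), GCounter()
for _ in range(3):
    a.increment("A")
b.increment("B")
merged = a.merge(b)
```

The merged state holds one entry for every node that has ever incremented the counter, so its size grows with the number of nodes rather than with the number of updates.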
-
State-based and Transactions
[Figure: timelines for DC A, DC B, and DC C showing "A Inc" and "B Inc" updates at each DC; each replica ends with the value 4.]
Two concurrent updates, where one update is lost due to arbitration.
Convergence results in the wrong value.
Storage overhead is O(n) on the number of nodes per counter.
-
Operation-based and Transactions
[Figure: timelines for DC A, DC B, and DC C showing "A Inc 1" and "B Inc 1" updates at each DC; each replica ends with the value 3.]
Convergence results in the right value.
Storage overhead is O(n) on the number of updates per counter.
Two concurrent updates, where one update is lost due to arbitration.
-
Counters
[Figure: logs for DC A, DC B, and DC C containing "A Inc 1", "A Inc 2", and "B Inc 1" entries. Each log begins with an implicit origin summary entry ("A Inc 0"). Annotations: "Moving Convergence Window", "Garbage", and "Summarizes entries from last summary to" (truncated in the source). Values shown: 6, 5, 5.]
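One way to read the summary-entry idea is as log compaction: entries that fall behind the convergence window are folded into a single summary entry and the originals become garbage. The sketch below is an assumption-laden illustration, not the mechanism from the talk. In particular, the slides do not say how the window boundary is chosen, so the `stable_prefix_len` parameter is a hypothetical stand-in for that decision.

```python
def compact(log, stable_prefix_len):
    """Fold the first `stable_prefix_len` entries into one summary entry."""
    summary = sum(amount for _, amount in log[:stable_prefix_len])
    return [("summary", summary)] + log[stable_prefix_len:]


def value(log):
    """The counter value is the sum of the summary and all remaining entries."""
    return sum(amount for _, amount in log)


# The log starts with an implicit origin summary entry of 0.
log = [("summary", 0), ("A", 1), ("A", 1), ("B", 1), ("A", 2)]
compacted = compact(log, 4)
```

Compaction leaves the counter value unchanged while shrinking the log, so storage is bounded by the number of entries still inside the window rather than by the full update history.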
-
CRDTs, tho?
CRDTs?
  Are these actually CRDTs? Probably not, need a new name.
  Interesting point in the design space (cf. Riak, Antidote)
-
Thanks!