
  • C8DB: Geo-Replicated, Conflict-Free Document Database

    Christopher S. Meiklejohn

    Macrometa, HPTS 2019

    November 5th, 2019

    12/16/2019 Macrometa 1

  • Geo-Distribution Challenges

    1. Remain consistent and block the write operation (CP)

    2. Allow local write to proceed and risk inconsistency (AP)

  • Conflict-Free Replicated Data Types

    Distributed data structures that mimic their sequential counterparts

    Designed for convergence under concurrency; deterministic by design

    Multiple flavors (e.g., state-based, delta-based, operation-based)

    Counters: G-Counter, PN-Counter

    Sets: G-Set, 2P-Set, OR-Set, AW-Set, RW-Set

    Registers: LWW-Register, MV-Register

    Flags

    Sequences: Treedoc, RGA, RGA-Split

    [Figure: a set replicated at DC A, DC B, and DC C; element 1 is tagged with unique identifiers (e.g., {(1, a)}, {(1, {a, b})}), and all three replicas converge to { 1 }.]
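To make the set discussion concrete, here is a minimal Python sketch of an observed-remove set (OR-Set, one of the set flavors listed above). It assumes each add carries a globally unique tag and that a remove tombstones only the tags its replica has already observed, so merging is plain set union. The class, the string tags "a" and "b", and the scenario are illustrative assumptions, not Macrometa or Riak code.

```python
from dataclasses import dataclass, field


@dataclass
class ORSet:
    """Observed-remove set: adds carry unique tags; removes tombstone observed tags."""
    adds: set = field(default_factory=set)     # {(element, tag)}
    removed: set = field(default_factory=set)  # tags tombstoned by some remove

    def add(self, element, tag):
        # The tag must be globally unique, e.g. (dc_id, local_counter).
        self.adds.add((element, tag))

    def remove(self, element):
        # Only tags this replica has already observed are tombstoned.
        self.removed |= {t for (e, t) in self.adds if e == element}

    def merge(self, other):
        # Set union is commutative, associative and idempotent.
        return ORSet(self.adds | other.adds, self.removed | other.removed)

    def value(self):
        return {e for (e, t) in self.adds if t not in self.removed}


dc_a, dc_b, dc_c = ORSet(), ORSet(), ORSet()
dc_a.add(1, "a")          # DC A adds 1 with tag a
dc_c = dc_c.merge(dc_a)   # DC C receives (1, a) ...
dc_c.remove(1)            # ... and removes 1, tombstoning tag a
dc_b.add(1, "b")          # concurrently, DC B adds 1 with a fresh tag b

merged = dc_a.merge(dc_b).merge(dc_c)
print(merged.value())     # {1}: the unobserved add (tag b) survives
```

Because union does not depend on the order in which replicas merge, every DC ends with the same set regardless of delivery order.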

    What do we do when things don’t commute?

  • CRDTs in Production

    Riak distributed database: first major production implementation of state-based CRDTs; used by NHS, Riot Games, LO/JACK, bet365

    AntidoteDB (HPTS ’17): developer on an operation-based CRDT database with Transactional Causal Consistency; part of the LightKone and SyncFree research projects

    Others: Mesosphere’s DC/OS (Lashup, Minuteman), Adobe, Comcast, Macrometa, TomTom, Roshi, Automerge, Ditto, Cosmos DB, etc.

  • Macrometa: Overview

    Document database for the edge: documents are JSON and grouped into collections (think: namespaces for documents; clients are presented with dictionaries)

    Regions: 105 physical, 175 logical; 36 reserved core regions, 12 available through public beta today; 34 customer-requested regions (on-prem, specific colo, etc.); 35 burstable regions available

    Configurable consistency: strong consistency in a single DC with in-order propagation to remote DCs, or eventual consistency in a multi-master mode with all DCs receiving updates

  • Macrometa: Architecture

    Logging-based architecture: updates within a data center are durably written to a log; logs are maintained with ZooKeeper and BookKeeper

    Causal broadcast: there is a single log per destination DC, delivered in FIFO order; updates are buffered until their log entries’ dependencies are met

    Clients operate at the document abstraction level: clients submit documents that Macrometa nodes convert into updates; Macrometa nodes materialize values for keys from incoming log updates
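The causal-broadcast bullet leaves the dependency mechanism unspecified. A common way to implement "buffered until dependencies are met" is the vector-clock causal delivery rule, sketched below in Python: an update from DC d is delivered once it is the next update from d and every other entry of its clock has already been delivered locally. The vector clocks, `Update`, and `CausalReceiver` are assumptions for illustration; the slide does not say how Macrometa tracks dependencies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Update:
    origin: str    # DC that produced the update
    clock: tuple   # vector clock as (dc, counter) pairs (assumed representation)
    payload: str


class CausalReceiver:
    """Buffers incoming updates until their causal dependencies are delivered."""

    def __init__(self, dcs):
        self.delivered = {dc: 0 for dc in dcs}  # updates delivered per origin DC
        self.buffer = []                        # received but not yet deliverable
        self.log = []                           # payloads in delivery order

    def _ready(self, u):
        vc = dict(u.clock)
        # Deliverable if it is the next update from its origin and everything
        # it depends on from other DCs has already been delivered here.
        return vc[u.origin] == self.delivered[u.origin] + 1 and all(
            vc.get(dc, 0) <= n for dc, n in self.delivered.items() if dc != u.origin
        )

    def receive(self, u):
        self.buffer.append(u)
        progress = True
        while progress:  # delivering one update may unblock others
            progress = False
            for pending in list(self.buffer):
                if self._ready(pending):
                    self.buffer.remove(pending)
                    self.delivered[pending.origin] += 1
                    self.log.append(pending.payload)
                    progress = True


c = CausalReceiver(["A", "B", "C"])
a1 = Update("A", (("A", 1),), "a1")
b1 = Update("B", (("A", 1), ("B", 1)), "b1")  # B wrote b1 after seeing a1
c.receive(b1)   # buffered: depends on a1
c.receive(a1)   # delivers a1, then the buffered b1
print(c.log)    # ['a1', 'b1']
```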

  • Geo-Distribution Concerns

    Regions differ in SLAs, availability, performance, and cost.

    Storage is paid for on each provider, so storage costs should be minimized.

  • Registers

    [Figure: DC A sets Alice and then April; DC B sets Bob; DC C sets Charlie. All updates propagate to every DC, and DC A, DC B, and DC C all converge to Charlie.]

    Concurrent updates are arbitrated by the maximum DC identifier.

    This ensures that, once convergence completes, only a single update needs to be kept in the log for each field in the document.
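A minimal Python sketch of the per-field register rule: a causally later write replaces an earlier one, and concurrent writes are arbitrated by the maximum DC identifier, so exactly one write per field survives. Detecting concurrency with vector clocks is an assumption here (the slide only states the tie-break), and `Write`, `happened_before`, and `merge` are hypothetical names.

```python
from functools import reduce
from typing import NamedTuple


class Write(NamedTuple):
    clock: dict   # vector clock of the write, e.g. {"A": 2}
    dc: str       # identifier of the DC that issued it
    value: str


def happened_before(x, y):
    """True if clock x is strictly dominated by clock y."""
    keys = x.keys() | y.keys()
    return all(x.get(k, 0) <= y.get(k, 0) for k in keys) and any(
        x.get(k, 0) < y.get(k, 0) for k in keys
    )


def merge(w1, w2):
    """Keep the causally later write; arbitrate concurrent writes by max DC id."""
    if happened_before(w1.clock, w2.clock):
        return w2
    if happened_before(w2.clock, w1.clock):
        return w1
    return max(w1, w2, key=lambda w: w.dc)


alice = Write({"A": 1}, "A", "Alice")
april = Write({"A": 2}, "A", "April")   # DC A overwrites its own earlier write
bob = Write({"B": 1}, "B", "Bob")
charlie = Write({"C": 1}, "C", "Charlie")
writes = [alice, april, bob, charlie]

print(reduce(merge, writes).value)            # Charlie
print(reduce(merge, reversed(writes)).value)  # Charlie
```

Folding the writes in either order yields the same single survivor, which is what lets the log keep one update per field.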

  • Registers: Read Anomalies

    [Figure: DC A sets First = Andrew and Last = Carnegie as two separate updates, which propagate to DC B and DC C one at a time.]

    { first: Andrew, last: undefined }

    A read performed at a DC that has received the First update but not yet the Last update returns this state, which is an anomaly with respect to user expectations.

  • Transactions v1

    [Figure: DC A sets First = Andrew and Last = Carnegie; both updates propagate to DC B and DC C.]

    Updates are grouped together with the same logical clock.

    Updates are applied atomically to the log.
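The difference between the read anomaly and transactions v1 can be sketched in Python by modeling a log entry as a group of field updates that share one logical clock. `LogEntry`, `group`, and `Replica` are hypothetical names; the sketch only illustrates that applying a whole group in one step prevents a reader from seeing `first` without `last`.

```python
from typing import NamedTuple


class LogEntry(NamedTuple):
    clock: int     # logical clock shared by every update in the entry
    dc: str        # originating DC
    updates: dict  # field -> value


def group(clock, dc, **fields):
    """Put all of a transaction's field updates into one entry under one clock."""
    return LogEntry(clock, dc, dict(fields))


class Replica:
    def __init__(self):
        self.doc = {}

    def apply(self, entry):
        # An entry is applied in a single step, so its updates appear together.
        self.doc.update(entry.updates)

    def read(self, *names):
        return {n: self.doc.get(n) for n in names}  # None stands for undefined


# Ungrouped: one entry per field, so a read can land between them.
b = Replica()
b.apply(LogEntry(1, "A", {"first": "Andrew"}))
print(b.read("first", "last"))  # {'first': 'Andrew', 'last': None}

# Grouped (v1): both fields arrive in one entry with one clock.
c = Replica()
c.apply(group(1, "A", first="Andrew", last="Carnegie"))
print(c.read("first", "last"))  # {'first': 'Andrew', 'last': 'Carnegie'}
```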

  • Transactions v1: Concurrency

    [Figure: DC A sets First = Andrew and Last = Carnegie while DC B concurrently sets Death = 1919; all updates propagate to every DC.]

    { first: Andrew, last: Carnegie, death: 1919 } (at DC A, DC B, and DC C)

    Concurrent operations are merged using the existing convergence strategy: arbitration by maximum DC identifier.

  • Transactions v1: Concurrency Anomalies

    [Figure: DC A sets First = Andrew and Last = Carnegie while DC B concurrently sets Last = Brown; all updates propagate to every DC.]

    { first: Andrew, last: Brown }

    Convergence arbitrates each field by maximum DC identifier, so it keeps only part of DC A's update.
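A minimal Python sketch of the v1 convergence rule, assuming the transactions being merged are concurrent and each field is arbitrated independently by maximum DC identifier. The function name `merge_v1` and the `(dc_id, {field: value})` encoding are hypothetical. It reproduces both the disjoint-field merge and the partial-update anomaly.

```python
def merge_v1(*txns):
    """Merge concurrent transactions field by field; each field goes to the max DC id.

    Each transaction is a (dc_id, {field: value}) pair; all are assumed concurrent.
    """
    doc, owner = {}, {}
    for dc, updates in txns:
        for name, value in updates.items():
            if name not in owner or dc > owner[name]:
                owner[name], doc[name] = dc, value
    return doc


tx_a = ("A", {"first": "Andrew", "last": "Carnegie"})
tx_death = ("B", {"death": 1919})
tx_brown = ("B", {"last": "Brown"})

print(merge_v1(tx_a, tx_death))  # {'first': 'Andrew', 'last': 'Carnegie', 'death': 1919}
print(merge_v1(tx_a, tx_brown))  # {'first': 'Andrew', 'last': 'Brown'}: half of A's transaction
```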

  • Transactions v2

    [Figure: DC A sets First = Andrew and Last = Carnegie while DC B concurrently sets Last = Brown; all updates propagate to every DC.]

    { first: undefined, last: Brown }

    Convergence takes all of the concurrent updates from a single DC, chosen by the maximum DC identifier.

    The existing transaction mechanism was renamed to atomics.
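For comparison, a Python sketch of the v2 rule under the same assumptions: among concurrent transactions, every update from the transaction of the maximum DC is kept and the others are dropped whole. `merge_v2` and its `base` argument (the state the transactions started from) are hypothetical.

```python
def merge_v2(base, *txns):
    """Keep every update of the concurrent transaction from the max DC id; drop the rest whole.

    `base` is the state all the (assumed concurrent) transactions started from.
    """
    _, winner = max(txns, key=lambda t: t[0])
    doc = dict(base)
    doc.update(winner)
    return doc


tx_a = ("A", {"first": "Andrew", "last": "Carnegie"})
tx_brown = ("B", {"last": "Brown"})

print(merge_v2({}, tx_a, tx_brown))  # {'last': 'Brown'}: first stays undefined
```

The result never mixes fields from two transactions, at the cost of discarding DC A's write to `first` entirely.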

  • Counters?

    A register can’t be used: assignment doesn’t commute.

    Store the sum per node, or…

    …store individual updates.

    Merging by sum double counts; merging by max undercounts.
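The last line can be checked with a small worked example, assuming each DC stores its counter as a single integer and replicas merge those integers directly. Both DCs start from two shared increments, then each increments once concurrently, so the true total is 4.

```python
def merge_sum(a, b):
    return a + b


def merge_max(a, b):
    return max(a, b)


shared = 2          # increments both DCs have already seen
dc_a = shared + 1   # DC A then increments once
dc_b = shared + 1   # DC B concurrently increments once
true_total = shared + 2

print(merge_sum(dc_a, dc_b), merge_max(dc_a, dc_b), true_total)  # 6 3 4
```

Summing counts the shared history twice; taking the maximum silently drops one of the concurrent increments.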

  • Types of CRDT Counters

    State-based CRDTs: lattice-based, with lower storage overhead; developers must ensure by design that all operations are always mergeable (e.g., Riak).

    Operation-based CRDTs: optimized for individual operations, which are designed to be commutative but need not form a lattice; they also require causal broadcast (e.g., Antidote).

    How do we find a solution that is compatible with transactions and meets both the garbage-collection and storage requirements?

  • State-based

    [Figure: DC A and DC B increment a counter; the increments propagate to DC C, and all three DCs converge to 4.]

    Storage overhead is O(n) in the number of nodes, per counter.
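A minimal Python sketch of a state-based grow-only counter (G-Counter), the classic design behind the O(n) storage line: one slot per DC, merged by pointwise maximum, read as the sum of the slots. The class and the scenario (two DCs each incrementing twice) are illustrative, not taken from Riak or Macrometa.

```python
class GCounter:
    """State-based grow-only counter: one slot per DC, so O(n) storage in the number of DCs."""

    def __init__(self, dc, slots=None):
        self.dc = dc
        self.slots = dict(slots or {})

    def increment(self, n=1):
        # A DC only ever bumps its own slot.
        self.slots[self.dc] = self.slots.get(self.dc, 0) + n

    def merge(self, other):
        # Pointwise max is commutative, associative and idempotent.
        keys = self.slots.keys() | other.slots.keys()
        self.slots = {k: max(self.slots.get(k, 0), other.slots.get(k, 0)) for k in keys}

    def value(self):
        return sum(self.slots.values())


a, b, c = GCounter("A"), GCounter("B"), GCounter("C")
a.increment(); a.increment()
b.increment(); b.increment()
for replica in (a, b, c):
    replica.merge(a)
    replica.merge(b)
print(a.value(), b.value(), c.value())  # 4 4 4
```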

  • State-based and Transactions

    [Figure: DC A and DC B increment the counter concurrently; all three DCs converge to 4.]

    Two concurrent updates, where one update is lost due to arbitration.

    Convergence results in the wrong value.

    Storage overhead is O(n) in the number of nodes, per counter.
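One way to see the lost update, sketched in Python under a hypothetical model: each transaction writes the counter's whole per-DC state as a register value, and concurrent transactions are resolved by maximum DC identifier. Pointwise-max merging would keep both increments; whole-state arbitration keeps only DC B's state. `pointwise_max` and `arbitrate` are illustrative names.

```python
def pointwise_max(x, y):
    """State-based CRDT merge of per-DC counter slots."""
    return {k: max(x.get(k, 0), y.get(k, 0)) for k in x.keys() | y.keys()}


def arbitrate(*writes):
    """Hypothetical transactional merge: the whole state from the max DC id wins."""
    return max(writes, key=lambda w: w[0])[1]


start = {"A": 1, "B": 1}             # counter value 2, seen by both DCs
from_a = ("A", {**start, "A": 2})    # DC A increments inside a transaction
from_b = ("B", {**start, "B": 2})    # DC B increments concurrently

print(sum(pointwise_max(from_a[1], from_b[1]).values()))  # 4: both increments kept
print(sum(arbitrate(from_a, from_b).values()))            # 3: DC A's increment lost
```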

  • Operation-based and Transactions

    [Figure: DC A and DC B issue "Inc 1" operations that propagate to every DC; all three DCs converge to 3.]

    Convergence results in the right value.

    Storage overhead is O(n) in the number of updates, per counter.

    Two concurrent updates, where one update is lost due to arbitration.
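A Python sketch of the operation-based alternative: every increment is stored as its own operation with a unique identifier (the `(dc, sequence)` id scheme is an assumption), so concurrent transactions contribute disjoint operations and merging by union keeps them all. The log grows with the number of updates, which is the storage cost noted above.

```python
from typing import NamedTuple


class Op(NamedTuple):
    op_id: tuple  # globally unique id, assumed to be (dc, sequence number)
    amount: int


def merge_logs(*logs):
    """Union of operation logs: duplicates collapse, concurrent operations all survive."""
    merged = set()
    for log in logs:
        merged |= log
    return merged


def value(log):
    return sum(op.amount for op in log)


start = {Op(("A", 1), 1)}
dc_a = start | {Op(("A", 2), 1)}   # DC A increments inside a transaction
dc_b = start | {Op(("B", 1), 1)}   # DC B increments concurrently

merged = merge_logs(dc_a, dc_b)
print(value(merged), len(merged))  # 3 3: right value, one stored op per update
```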

  • Counters

    [Figure: per-DC logs of counter increments (A Inc 1, A Inc 2, B Inc 1) at DC A, DC B, and DC C, annotated with a moving convergence window and a garbage region. Each log starts with an implicit origin summary entry (A Inc 0), and a later summary entry summarizes the entries since the last summary. Counter values shown: 6, 5, 5.]
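The slide labels a moving convergence window, summary entries, and an implicit origin summary, but not how the window advances. A minimal Python sketch, assuming totally ordered timestamps and an external signal `stable_ts` meaning every DC has received all entries up to that timestamp (both hypothetical): entries in the stable prefix are folded into the summary and become garbage, while the counter's value is unchanged.

```python
from dataclasses import dataclass, field


@dataclass
class WindowedCounter:
    summary: int = 0                             # implicit origin summary entry ("Inc 0")
    summarized_upto: int = 0                     # entries with ts <= this are folded in
    entries: list = field(default_factory=list)  # live (ts, dc, amount) entries

    def apply(self, ts, dc, amount):
        if ts > self.summarized_upto:            # skip entries already in the summary
            self.entries.append((ts, dc, amount))

    def advance_window(self, stable_ts):
        """Fold entries every DC has seen (ts <= stable_ts) into the summary."""
        stable = [e for e in self.entries if e[0] <= stable_ts]
        self.summary += sum(amount for _, _, amount in stable)
        self.entries = [e for e in self.entries if e[0] > stable_ts]  # the rest is garbage
        self.summarized_upto = max(self.summarized_upto, stable_ts)

    def value(self):
        return self.summary + sum(amount for _, _, amount in self.entries)


c = WindowedCounter()
for ts, dc, amount in [(1, "A", 1), (2, "A", 2), (3, "B", 1), (4, "A", 1)]:
    c.apply(ts, dc, amount)
before = c.value()
c.advance_window(stable_ts=3)   # hypothetical: all DCs have acknowledged up to ts 3
print(before, c.value(), len(c.entries))  # 5 5 1
```

Storage then depends on the width of the window rather than on the full history of updates.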

  • CRDTs, tho?

    Are these actually CRDTs? Probably not; they need a new name.

    An interesting point in the design space (cf. Riak, Antidote).

  • Thanks!
