Isolation properties, application behaviour, platform performance: The tradeoffs in distributed data...


Isolation properties, application behaviour, platform performance: The tradeoffs in distributed data stores

Talk given remotely at IITB, March 12 2015

Presented by Alan Fekete (University of Sydney)

Work by Peter Bailis (UC Berkeley), Alan Fekete (U of Sydney), Michael Franklin (UC Berkeley), Ali Ghodsi (UC Berkeley, KTH), Joseph M. Hellerstein (UC Berkeley), Ion Stoica (UC Berkeley)


Internet-Scale Data Storage

• Many early systems* offered scalability and availability but lacked functionality expected in traditional database management platforms (=> “NoSQL”)
  – Access by id/key [without content-based access, without joins]
  – Operations may see stale data
  – No all-or-nothing grouping of operations across items

*eg BigTable, PNUTS, S3, Dynamo, MongoDB, Cassandra, SimpleDB, Riak


Wouldn’t it be nice if…

• More recent papers and systems for internet-scale data offer extra features beyond early NoSQL approaches, including some familiar from DBMS
  – (Choice of) more consistency in an operation
  – Richer operations
  – Grouping operations on multiple items

• Our focus is transactions: ways to group operations on multiple items


Returning stale data

• Allowing weak consistency (return of stale data) in accesses was justified by the CAP result: for single-item access, you can’t offer strongly consistent reads and writes that are always available if partitions are possible in the system
  – Conjecture of Brewer (2000), proved by Gilbert and Lynch (2002)


Not supporting transactions?

• A system can’t provide serializable transactions that are always available if the system can partition

• This was known long before Brewer; see Davidson et al (ACM Computing Surveys 1985)


Traditional DBMS txns

• Application can declare transaction boundary
  – Usually by ending current txn; another will automatically start at the next request
  – Sometimes restricted: no DDL statements (eg change to schema), only DML (SELECT, UPDATE etc)
• Application can set isolation level
  – Usually done per connection


“Isolated”

• Academic definition: Serializable
  – (ought to be default for systems, but not so in practice)
• Key property: interleaved execution is equivalent (same values returned, same final state of db) to some execution where transactions run serially (no interleaving at all)
• No dirty read, no lost update
• If each transaction (running alone) preserves some constraint I, then the whole execution preserves I
• Implemented: Traditionally done with commit-duration locks on data and indices
  – “Two Phase Locking (2PL)”
  – Also newer multiversion implementations (eg Cahill et al, TODS’09)
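The key property above can be illustrated with a small brute-force check. This is only a sketch under stated assumptions: the schedule encoding and the names `run` and `matches_some_serial_order` are hypothetical, every write is assumed to store "value read plus one", and only the final state is compared (the values returned by reads are not checked).

```python
from itertools import permutations

def run(schedule, initial):
    """Execute steps (txn, op, item). 'r' reads into a txn-local copy;
    'w' writes back that local copy plus one (an assumed toy update)."""
    db = dict(initial)
    local = {}
    for txn, op, item in schedule:
        if op == "r":
            local[(txn, item)] = db[item]
        else:
            db[item] = local[(txn, item)] + 1
    return db

def matches_some_serial_order(schedule, txns, initial):
    """True if the schedule's final state equals that of some serial order."""
    final = run(schedule, initial)
    for order in permutations(txns):
        serial = [step for name in order for step in txns[name]]
        if run(serial, initial) == final:
            return True
    return False

txns = {
    "T1": [("T1", "r", "x"), ("T1", "w", "x")],
    "T2": [("T2", "r", "x"), ("T2", "w", "x")],
}
# Both transactions read x before either writes: a lost update.
lost_update = [txns["T1"][0], txns["T2"][0], txns["T1"][1], txns["T2"][1]]
print(run(lost_update, {"x": 10}))                              # {'x': 11}
print(matches_some_serial_order(lost_update, txns, {"x": 10}))  # False
```

Either serial order leaves x at 12, so the interleaving that ends with x at 11 is not serializable.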


Weaker Isolation Levels

• SQL standard offers several isolation levels
• Each transaction can have level set separately
• Read Uncommitted
  – Usually only for read-only code
  – Implemented: no read locks, commit-duration write locks
• Read Committed
  – No dirty reads (can’t see uncommitted, aborted or intermediate values)
  – Implemented: short-duration read locks, commit-duration write locks
  – MV implementation: can return older version, while concurrent update is happening
• Repeatable Read
  – Does not rule out “phantoms” (predicate evaluation that sees versions inserted concurrently)
  – Implemented: commit-duration locks on data
    • Should be the same as Serializable for a key-value store
  – Some multiversion systems provide “snapshot reading” for this level


ACID Transactions with weaker I

• Serializability is the ideal for isolation of transactions, but most transactions on (conventional, single site) dbms don’t run serializably!
  – Read Committed is often the default level


Coordination is Bad

• “Availability during partitions” is a signal of an algorithm that does not need to coordinate across sites
  – Coordination damages both latency and throughput, during normal operation
  – Especially for georeplicated or geodistributed systems, where intersite latency is high (and can’t be made much lower, because “speed of light”)


Coordination Costs

[Figure: maximum possible throughput for conflicting transactions that need coordination, with current network timings]
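One rough way to read such a bound, assuming conflicting transactions must coordinate one at a time and each needs at least one network round trip, is that throughput cannot exceed one transaction per round trip. The sketch below works through that arithmetic. The function name and the round-trip times are hypothetical values chosen for illustration, not numbers taken from the talk.

```python
def max_conflicting_txns_per_second(round_trip_ms, round_trips_per_txn=1):
    """Upper bound when conflicting transactions coordinate one at a time."""
    return 1000.0 / (round_trip_ms * round_trips_per_txn)

# Hypothetical round-trip times, for illustration only.
assumed_rtt_ms = {"same datacenter": 0.5, "cross-continent": 100.0}
for link, rtt in assumed_rtt_ms.items():
    print(link, max_conflicting_txns_per_second(rtt))
```

Under these assumed timings the cross-continent case tops out at 10 conflicting transactions per second, however fast the servers are.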


HAT

• We propose a guide for platform developers
  – offer txns that can be arbitrary collection of accesses to arbitrary sets of read/write objects,
  – with semantics chosen to be as strong as feasible to implement with availability even when partitioned
• “Highly Available Transactions”


Available?

• Clearly not possible if client is partitioned away from its data
  – However, we should tolerate partition between item replicas within the data store
• So, we ask for:
  – IF client can get to (at least one replica of) each item it asks for, THEN transaction can eventually commit (or it aborts voluntarily)


Isolation levels for HAT

• We have shown (VLDB’14) that you can offer available transactions that have
  – All-or-nothing atomicity
  – Isolation level like (the definitions of) read committed and repeatable read*
    • But where reads may not always see the most recent committed changes
    • And you don’t get all the extra properties of conventional locking implementation (eg timeline view)
  – Causal consistency (including RYW, monotonic reads, write follows reads) {as long as client is sticky to a partition}

*in absence of predicate reads [which is not an issue for key-value store]


Read Atomic

• A new proposal for an isolation level (SIGMOD’14)

• Read committed, PLUS “No fractured reads”
  – Avoid the following:
    • T1 writes x, y
    • T2 reads x (seeing T1 or later), y (not seeing T1)


Anomaly Prevented by RA

[Diagram, time increasing down the page: x and y both start at 10. T1 writes x=11 and y=11. T2 asks x? and receives 10, then asks y? and receives 11.]
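The anomaly in the diagram can be stated as a small check. This is a sketch under stated assumptions: the function name `is_fractured` and the encoding are hypothetical, and each version is identified by the timestamp of the transaction that wrote it (ts 0 for the initial values, ts 1 for T1).

```python
def is_fractured(write_sets, observed):
    """write_sets: {txn_ts: items written by that txn}.
    observed: {item: txn_ts of the version the reader saw}.
    A read set is fractured if it shows one of a transaction's writes
    but an older version of another item that transaction also wrote."""
    for item, ts in observed.items():
        for other in write_sets.get(ts, set()):
            if other != item and other in observed and observed[other] < ts:
                return True
    return False

write_sets = {0: {"x", "y"}, 1: {"x", "y"}}   # ts 0 = initial values, ts 1 = T1
print(is_fractured(write_sets, {"x": 0, "y": 1}))  # True
print(is_fractured(write_sets, {"x": 1, "y": 1}))  # False
```

The first call matches T2 in the diagram (x without T1's write, y with it). The second shows that seeing both of T1's writes is fine.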


Caveat

• RA does not always guarantee transaction consistent snapshot
  – Transitive information flow may be fractured
  – However, many common coding idioms are supported effectively
    • Eg maintain both ends of bidirectional associations consistently
      – Contrast with Facebook TAO, LinkedIn Espresso etc
    • Eg maintain secondary index consistent with data
    • Eg maintain referential integrity


RAMP Algorithms

• We have shown 3 alternative implementation techniques that provide RA isolation


RAMP-Fast

• Key ideas:
  – Multiversion stores, that keep older versions
  – Make new version visible once everything written in the txn is stored at all sites
  – Store metadata with each version listing other items written in same transaction
  – Detect races and repair atomicity by looking for aligned versions


State kept by RAMP

• For each item:
  – A set of versions
    • Each version has value, timestamp, metadata (which other items were written together with this)
  – latestcommitted timestamp
    • Those versions whose timestamp is greater than latestcommitted are ones whose commit has not yet arrived at this site
• Eg y:(10,0,{x}),(11,1,{x}) latestcommit=0
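The per-item state described above could be modelled as follows. This is a minimal sketch: the class and field names (`Version`, `ItemState`, `written_with`, `latest_commit`) are hypothetical and not taken from the RAMP implementation.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Version:
    value: int
    ts: int                  # timestamp of the writing transaction
    written_with: frozenset  # metadata: other items written together with this

@dataclass
class ItemState:
    versions: dict = field(default_factory=dict)  # ts -> Version
    latest_commit: int = 0

    def visible(self):
        """The version readers get by default: the latest committed one."""
        return self.versions[self.latest_commit]

    def pending(self):
        """Versions whose commit has not yet arrived at this site."""
        return [v for ts, v in sorted(self.versions.items())
                if ts > self.latest_commit]

# The example state: y:(10,0,{x}),(11,1,{x}) latestcommit=0
y = ItemState()
y.versions[0] = Version(10, 0, frozenset({"x"}))
y.versions[1] = Version(11, 1, frozenset({"x"}))
print(y.visible().value)               # 10
print([v.value for v in y.pending()])  # [11]
```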


RAMP-F put-all phase 1

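[Diagram: x, init 10; y, init 10; clients T1 and T2]

• T1 sends PREP: x=11, ts=1, {y} and PREP: y=11, ts=1, {x}
• State at x before PREP: x:(10,0,{y}) latestcommit=0
• State at x after PREP: x:(10,0,{y}),(11,1,{y}) latestcommit=0
• State at y after PREP: y:(10,0,{x}),(11,1,{x}) latestcommit=0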


RAMP-F put-all phase 2

[Diagram: x, init 10; y, init 10; clients T1 and T2]

• After both PREPs: x:(10,0,{y}),(11,1,{y}) latestcommit=0 and y:(10,0,{x}),(11,1,{x}) latestcommit=0
• T1 sends COMMIT(1)
• After COMMIT(1): x:(10,0,{y}),(11,1,{y}) latestcommit=1 and y:(10,0,{x}),(11,1,{x}) latestcommit=1
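The two put-all phases can be sketched as below. This is an illustration under stated assumptions: `Partition` and `put_all` are hypothetical names, each item lives at its own site, and the message exchanges and acknowledgements of a real system are replaced by direct method calls.

```python
class Partition:
    """Hypothetical single-item site holding versions and latestcommit."""

    def __init__(self, value, written_with):
        self.versions = {0: (value, 0, frozenset(written_with))}
        self.latest_commit = 0

    def prepare(self, value, ts, written_with):
        # Phase 1: store the new version; readers do not see it yet.
        self.versions[ts] = (value, ts, frozenset(written_with))

    def commit(self, ts):
        # Phase 2: advance latestcommit so the version becomes visible.
        self.latest_commit = max(self.latest_commit, ts)

def put_all(partitions, writes, ts):
    items = set(writes)
    for item, value in writes.items():   # phase 1: PREP at every site
        partitions[item].prepare(value, ts, items - {item})
    for item in writes:                  # phase 2: COMMIT only after all PREPs
        partitions[item].commit(ts)

partitions = {"x": Partition(10, {"y"}), "y": Partition(10, {"x"})}
put_all(partitions, {"x": 11, "y": 11}, ts=1)
print(partitions["x"].versions[1], partitions["x"].latest_commit)
print(partitions["y"].versions[1], partitions["y"].latest_commit)
```

Sending COMMIT only after every PREP has been stored is what lets a later reader fetch a missing sibling version by timestamp.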


RAMP-F get-all phase-1 (fast)

[Diagram: x, init 10; y, init 10; clients T1 and T2]

• T1 has sent PREP: x=11, ts=1, {y} and PREP: y=11, ts=1, {x}; no COMMIT has arrived yet
• State: x:(10,0,{y}),(11,1,{y}) latestcommit=0 and y:(10,0,{x}),(11,1,{x}) latestcommit=0
• T2 asks x? and y?, and receives x:10,0,{y} and y:10,0,{x}
• No fracture; can return these!


RAMP-F get-all phase-1 (oops)

[Diagram: x, init 10; y, init 10; clients T1 and T2]

• T1 has sent both PREPs; COMMIT(1) has reached y but not yet x
• State: y:(10,0,{x}),(11,1,{x}) latestcommit=1 and x:(10,0,{y}),(11,1,{y}) latestcommit=0
• T2 asks x? and y?, and receives x:10,0,{y} and y:11,1,{x}
• Detect fracture! Missing x with ts=1
• Once COMMIT(1) reaches x: x:(10,0,{y}),(11,1,{y}) latestcommit=1


RAMP-F get-all phase-2

[Diagram: x, init 10; y, init 10; clients T1 and T2]

• As on the previous slide, T2 has received x:10,0,{y} and y:11,1,{x} while the state is y:(10,0,{x}),(11,1,{x}) latestcommit=1 and x:(10,0,{y}),(11,1,{y}) latestcommit=0
• Detect fracture! Missing x with ts=1
• T2 asks x again for the version with ts=1, and receives x:11,1,{y}
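The two get-all rounds walked through on these slides can be sketched as follows. This is a simplified illustration, not the published algorithm verbatim: `store` and `get_all` are hypothetical names, the store is an in-memory dictionary rather than a set of remote sites, and the starting state is the one from the "oops" slide (COMMIT(1) has reached y but not x).

```python
# Per-item state: versions keyed by timestamp, each (value, ts, written_with),
# plus the latest commit timestamp that has arrived at that item's site.
store = {
    "x": {"versions": {0: (10, 0, {"y"}), 1: (11, 1, {"y"})}, "latest_commit": 0},
    "y": {"versions": {0: (10, 0, {"x"}), 1: (11, 1, {"x"})}, "latest_commit": 1},
}

def get_all(store, items):
    # Round 1: read the latest committed version of every item.
    first = {i: store[i]["versions"][store[i]["latest_commit"]] for i in items}
    # For each item, the highest timestamp any returned sibling says it should have.
    needed = {i: first[i][1] for i in items}
    for _value, ts, written_with in first.values():
        for sibling in written_with:
            if sibling in needed:
                needed[sibling] = max(needed[sibling], ts)
    # Round 2: fetch by timestamp only the items that are behind.
    result = {}
    for i in items:
        if needed[i] == first[i][1]:
            version = first[i]
        else:
            version = store[i]["versions"][needed[i]]
        result[i] = version[0]
    return result

print(get_all(store, ["x", "y"]))  # {'x': 11, 'y': 11}
```

The second round can find x at ts=1 even though latestcommit at x is still 0, because the version was stored there during the PREP phase.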


YCSB Performance (95% Reads)

[Figure: YCSB performance with 95% reads, comparing No consistency, Write locks only, RAMP-Fast, RAMP-Small, RAMP-Hybrid and Strict 2PL]


Related work with Availability

• Restricted form of transactions
  – Operate on set of items that are colocated
    • Eg Google Megastore entity group, UCSB G-Store
  – Multiple gets or multiple puts, not get with put
    • Eg Princeton COPS-GT, Eiger
• Restricted data types
  – Only allow commutative operations
    • Eg INRIA CRDTs, Berkeley BloomL
• Weak semantics
  – Without isolation properties
    • Eg ETH Consistency rationing (some choices)


Related work without Availability

• Systems that support general (read committed, SI-like, or even serializable) transactions
  – but use coordination: 2PC, Paxos, a master replica for ordering, etc
  – Eg Google Megastore (across entity groups), ETH Consistency Rationing (some choices), Google Spanner, MSR Walter, UCSB Paxos-CP, Yale Calvin, Berkeley Planet (formerly MDCC)


Invariants

• RAMP supports many invariant-centric programming idioms

• Can we do this for other invariants?
  – Yes! (VLDB’15)
• Is this common?
  – Yes! (SIGMOD’15)


I-Confluence

• Given an invariant and a set of operations (and a way to merge conflicting updates)

• Can the invariant be maintained without coordination?
  – Yes, if the I-confluence property holds
  – Essentially: the result of merging invariant-satisfying changes also satisfies the invariant
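A toy version of this check is sketched below. Everything in it is an assumption made for illustration: the state is a set of applied operations merged by set union, the invariant is "balance never negative" starting from a hypothetical balance of 100, and `one_step_confluent` only tests pairs of single operations from one start state, so it is a bounded check and not a proof of I-confluence.

```python
from itertools import combinations

INITIAL_BALANCE = 100  # hypothetical starting balance

def balance(state):
    return INITIAL_BALANCE + sum(delta for _op_id, delta in state)

def invariant(state):
    return balance(state) >= 0

def merge(a, b):
    # Assumed merge function: union of the operations each replica applied.
    return a | b

def one_step_confluent(start, ops):
    """False if two replicas can each apply one op validly from `start`
    while the merged state violates the invariant."""
    for op1, op2 in combinations(ops, 2):
        left, right = start | {op1}, start | {op2}
        if invariant(left) and invariant(right) and not invariant(merge(left, right)):
            return False
    return True

deposits = [("d1", 30), ("d2", 50)]
withdrawals = [("w1", -60), ("w2", -70)]
print(one_step_confluent(frozenset(), deposits))     # True
print(one_step_confluent(frozenset(), withdrawals))  # False
```

Deposits can run at separate sites without coordination. The two withdrawals are each valid alone (balances 40 and 30) but merge to a balance of -30, so keeping that invariant needs coordination.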


Invariants in Rails

• Study of a sample of 67 of the most popular Rails apps from GitHub
• Hardly any app-specified transactions
• Lots of validations (check invariant)
  – Many built-in
  – Some user-defined
• Most are I-confluent
  – Some are not (and not supported correctly by most dbms at default weak isolation)


Conclusion

• We advocate that internet-scale data systems offer clients
  – Unrestricted sets of operations on arbitrary multiple items as a transaction
  – Semantics as strong as possible while avoiding coordination
• We offer RA, a choice that supports many idioms, and can be implemented efficiently
• We provide theory to check if an invariant can be maintained coordination-free


For further study

• http://www.bailis.org
