Phase Reconciliation for Contended In-Memory Transactions

Phase Reconciliation for Contended In-Memory

TransactionsNeha Narula, Cody Cutler, Eddie Kohler, Robert Morris

MIT CSAIL and Harvard

Cloud Computing and Databases

Two trends: multicore and in-memory databases

Multicore databases face increased contention due to skewed workloads and rising core counts

BEGIN TransactionADD(x, 1)ADD(y, 2)

END Transaction

Throughput on a Contentious Transactional Workload

Concurrency Control Forces Serialization

core 0

core 1

core 2

ADD(x)ADD(y)

ADD(x)ADD(y)

ADD(x)ADD(y)

time


END Transaction

Multicore OS Scalable Counters

core 0

core 1

core 2

x0 = x0+1;

x1 = x1+1;

x2 = x2+1;

Kernel can apply increments in parallel using per-core counters

time

Multicore OS Scalable Counters

core 0

core 1

core 2

x0 = x0+1;

x1 = x1+1;

x2 = x2+1; print(x); x=x0+x1+x2; print(x)

To read per-core data, system has to stop all writes and reconcile x

time

Can we use per-core data in complex database transactions?


END Transaction

BEGIN TransactionGET(x)GET(y)

END Transaction

Challenges

• Deciding what records should be split among cores

• Transactions operating on some contentious and some normal data

• Different kinds of operations on the same records

Phase Reconciliation

• Database automatically detects contention to split a record among cores

• Database cycles through phases: split and joined

• OCC serializes access to non-split data• Split records have assigned operations for

split phases

Doppel, an in-memory transactional database

Outline

1. Phases2. Splittable operations3. Doppel optimizations4. Performance evaluation

Split Phase

• A transaction can operate on split and non-split data• Split records are written to per-core (x)• Rest of the records use OCC (u, v, w)• OCC ensures serializability for the non-split parts of the transaction

core 0

core 1

core 2

ADD(x) ADD(u)

ADD(x) ADD(v)

ADD(x) ADD(w)

GET(x)GET(v)

x0=x0+1

x1=x1+1

x2=x2+1

split phase

Reconciliation

• Cannot correctly process a read of x in current state• Abort read transaction• Reconcile per-core data to global store

core 0

core 1

core 2

ADD(x) ADD(u)

ADD(x) ADD(v)

ADD(x) ADD(w)

GET(x)GET(v)

x=x+x0

x0=0x0=x0+1

x1=x1+1

x2=x2+1

x=x+x1

x1=0

x=x+x2

x2=0

split phase reconciliation

Joined Phase

• Wait until all processes have finished reconciliation• Resume aborted read transactions using OCC• Process all new transactions using OCC• No split data

core 0

core 1

core 2

ADD(x) ADD(u)

ADD(x) ADD(v)

ADD(x) ADD(w)

GET(x)GET(v)

x=x+x0

x0=0GET(x) GET(v)

x0=x0+1

x1=x1+1

x2=x2+1

x=x+x1

x1=0

x=x+x2

x2=0

split phase joined phasereconciliation

Resume Split Phase

• When done, transition to split phase again• Wait for all processes to acknowledge they have

completed joined phase before writing to per-core data

core 0

core 1

core 2

ADD(x) ADD(u)

ADD(x) ADD(v)

ADD(x) ADD(w)

GET(x)GET(v)

x=x+x0

x0=0GET(x) GET(v)

x0=x0+1

x1=x1+1

x2=x2+1

ADD(x) ADD(v)x1=x1+1

x=x+x1

x1=0

x=x+x2

x2=0

ADD(x) ADD(v)x2=x2+1

split phase joined phase split phasereconciliation

Outline

1. Phases2. Splittable operations3. Doppel Optimizations4. Performance evaluation

Transactions and Operations

BEGIN Transaction (y)ADD(x,1)ADD(y,2)

END Transaction

• Transactions are composed of one or more operations

• Only some operations are amenable to splitting

Supported Operations

Splittable

ADD(k,n)MAX(k,n)MULT(k,n)

OPUT(k,v,o)TOPKINSERT(k,v

,o)

Not Splittable

GET(k)PUT(k,v)

MAX Example

core 0

core 1

core 2

MAX(x,55)

MAX(x,10)

MAX(x,21)

x0 = MAX(x0,55)

x1 = MAX(x1,10)

x2 = MAX(x2,21)

x0 = 55

x1 = 10

x2 = 21

• Each core keeps one piece of summarized state xi

• MAX is commutative so results can be determined out of order

MAX Example

core 0

core 1

core 2

MAX(x,55)

MAX(x,10)

MAX(x,21)

x0 = MAX(x0,55)

x1 = MAX(x1,10)

x2 = MAX(x2,21)

x0 = 55

x1 = 27

x2 = 21

• Each core keeps one piece of summarized state xi

• MAX is commutative so results can be determined out of order

MAX(x,27) x1 =

MAX(x1,27)

MAX(x,2)

x1 = MAX(x1,2)

x = 55

What Can Be Split?

Operations that can be split must be:– Commutative– Pre-reconcilable– On a single key

What Can’t Be Split?

• Operations that return a value– ADD_AND_GET(x,4)

• Operations on multiple keys– ADD(x,y)

• Different operations in the same phase, even if arguments make them commutative– ADD(x,0) and MULT(x,1)

Outline

1. Phases2. Splittable operations3. Doppel Optimizations4. Performance evaluation

Batching Transactions

core 0

core 1

core 2

ADD(x) ADD(u)

ADD(x) ADD(v)

ADD(x) ADD(w)

GET(x)GET(v)

x=x+x0

x0=0GET(x) GET(v)

x0=x0+1

x1=x1+1

x2=x2+1

x=x+x1

x1=0

x=x+x2

x2=0


Batching Transactions

• Don’t switch phases immediately; stash reads• Wait to accumulate stashed transactions• Batch for joined phase

core 0

core 1

core 2

ADD(x) ADD(u)

ADD(x) ADD(v)

ADD(x) ADD(w)

GET(x)GET(v)

x=x+x0

x0=0GET(x) GET(v)

x0=x0+1

x1=x1+1

x2=x2+1

x=x+x1

x1=0

x=x+x2

x2=0


GET(x) GET(x)GET(x) GET(x)

ADD(x) ADD(v)

x1=x1+1

Ideal World

• Transactions with contentious operations happen in the split phase

• Transactions with incompatible operations happen correctly in the joined phase

How To Decide What Should Be Split Data

• Database starts out with no split data• Count conflicts on records–Make key split if #conflicts >

conflictThreshold

• Count stashes on records in the split phase–Move key back to non-split if #stashes

too high

Outline

1. Phases2. Splittable operations3. Data classification4. Performance evaluation

Performance

• Contention vs. parallelism• What kinds of workloads benefit?• A realistic application: RUBiS

Doppel implementation:• Multithreaded Go server; worker thread per core• All experiments run on an 80 core intel server running 64 bit Linux 3.12• Transactions are one-shot procedures written in Go

Parallel Performance on Conflicting Workloads

Th

rou

gh

pu

t (m

illio

ns

txn

s/se

c)

20 cores, 1M 16 byte keys, transaction: ADD(x,1)

Doppel OCC 2PL0

5000000

10000000

15000000

20000000

25000000

30000000

35000000

Increasing Performance with More Cores

1M 16 byte keys, transaction: ADD(x,1) all writing same key

Varying Number of Hot Keys

20 cores, transaction: ADD(x,1)

How Much Stashing Is Too Much?

20 cores, transactions: LIKE read, LIKE write

RUBiS

• Auction application modeled after eBay– Users bid on auctions, comment, list new items,

search

• 1M users and 33K auctions• 7 tables, 26 interactions• 85% read only transactions

• More contentious workload– Skewed distribution of bids– More writes

StoreBid Transaction

BEGIN Transaction (bidder, amount, item)num_bids_item = num_bids_item + 1if amount > max_bid_item:

max_bid_item = amountmax_bidder_item = bidder

bidder_item_ts = Bid{bidder, amount, item, ts}END Transaction

StoreBid Transaction

BEGIN Transaction (bidder, amount, item)ADD(num_bids_item,1)MAX(max_bid_item, amount)

OPUT(max_bidder_item, bidder, amount)PUT(new_bid_key(), Bid{bidder, amount,

item, ts})END Transaction

All commutative operations on potentially conflicting auction

metadata

RUBiS ThroughputTh

rou

gh

pu

t (m

illio

ns

txn

s/se

c)

20 cores, 1M users 33K auctions

Low Contention High Contention0

500000

1000000

1500000

2000000

2500000

3000000

3500000

4000000

Related Work

• Split counters in multicore OSes– Linux kernel

• Commutativity in distributed systems and concurrency control– [Shapiro ‘11]

– [Li ‘12]

– [Lloyd ‘12]

– [Weihl ‘88]

• Optimistic concurrency control– [Kung ’81]

– [Tu ‘13]

Summary

• Contention is rising with increased core counts

• Commutative operations are amenable to being applied in parallel, per-core

• We can get good parallel performance on contentious transactional workloads by combining per-core split data with concurrency control

Neha Narula

http://nehanaru.la

@neha

Phase Reconciliation for Contended In-Memory Transactions

Documents

Transcript of Phase Reconciliation for Contended In-Memory Transactions