Phase Reconciliation for Contended In-Memory Transactions
-
Upload
shana-logan -
Category
Documents
-
view
16 -
download
0
description
Transcript of Phase Reconciliation for Contended In-Memory Transactions
Phase Reconciliation for Contended In-Memory
TransactionsNeha Narula, Cody Cutler, Eddie Kohler, Robert Morris
MIT CSAIL and Harvard
Cloud Computing and Databases
Two trends: multicore and in-memory databases
Multicore databases face increased contention due to skewed workloads and rising core counts
BEGIN TransactionADD(x, 1)ADD(y, 2)
END Transaction
Throughput on a Contentious Transactional Workload
Throughput on a Contentious Transactional Workload
Concurrency Control Forces Serialization
core 0
core 1
core 2
ADD(x)ADD(y)
ADD(x)ADD(y)
ADD(x)ADD(y)
time
BEGIN TransactionADD(x, 1)ADD(y, 2)
END Transaction
Multicore OS Scalable Counters
core 0
core 1
core 2
x0 = x0+1;
x1 = x1+1;
x2 = x2+1;
Kernel can apply increments in parallel using per-core counters
time
Multicore OS Scalable Counters
core 0
core 1
core 2
x0 = x0+1;
x1 = x1+1;
x2 = x2+1; print(x); x=x0+x1+x2; print(x)
To read per-core data, system has to stop all writes and reconcile x
time
Can we use per-core data in complex database transactions?
BEGIN TransactionADD(x, 1)ADD(y, 2)
END Transaction
BEGIN TransactionGET(x)GET(y)
END Transaction
Challenges
• Deciding what records should be split among cores
• Transactions operating on some contentious and some normal data
• Different kinds of operations on the same records
Phase Reconciliation
• Database automatically detects contention to split a record among cores
• Database cycles through phases: split and joined
• OCC serializes access to non-split data• Split records have assigned operations for
split phases
Doppel, an in-memory transactional database
Outline
1. Phases2. Splittable operations3. Doppel optimizations4. Performance evaluation
Split Phase
• A transaction can operate on split and non-split data• Split records are written to per-core (x)• Rest of the records use OCC (u, v, w)• OCC ensures serializability for the non-split parts of the transaction
core 0
core 1
core 2
ADD(x) ADD(u)
ADD(x) ADD(v)
ADD(x) ADD(w)
GET(x)GET(v)
x0=x0+1
x1=x1+1
x2=x2+1
split phase
Reconciliation
• Cannot correctly process a read of x in current state• Abort read transaction• Reconcile per-core data to global store
core 0
core 1
core 2
ADD(x) ADD(u)
ADD(x) ADD(v)
ADD(x) ADD(w)
GET(x)GET(v)
x=x+x0
x0=0x0=x0+1
x1=x1+1
x2=x2+1
x=x+x1
x1=0
x=x+x2
x2=0
split phase reconciliation
Joined Phase
• Wait until all processes have finished reconciliation• Resume aborted read transactions using OCC• Process all new transactions using OCC• No split data
core 0
core 1
core 2
ADD(x) ADD(u)
ADD(x) ADD(v)
ADD(x) ADD(w)
GET(x)GET(v)
x=x+x0
x0=0GET(x) GET(v)
x0=x0+1
x1=x1+1
x2=x2+1
x=x+x1
x1=0
x=x+x2
x2=0
split phase joined phasereconciliation
Resume Split Phase
• When done, transition to split phase again• Wait for all processes to acknowledge they have
completed joined phase before writing to per-core data
core 0
core 1
core 2
ADD(x) ADD(u)
ADD(x) ADD(v)
ADD(x) ADD(w)
GET(x)GET(v)
x=x+x0
x0=0GET(x) GET(v)
x0=x0+1
x1=x1+1
x2=x2+1
ADD(x) ADD(v)x1=x1+1
x=x+x1
x1=0
x=x+x2
x2=0
ADD(x) ADD(v)x2=x2+1
split phase joined phase split phasereconciliation
Outline
1. Phases2. Splittable operations3. Doppel Optimizations4. Performance evaluation
Transactions and Operations
BEGIN Transaction (y)ADD(x,1)ADD(y,2)
END Transaction
• Transactions are composed of one or more operations
• Only some operations are amenable to splitting
Supported Operations
Splittable
ADD(k,n)MAX(k,n)MULT(k,n)
OPUT(k,v,o)TOPKINSERT(k,v
,o)
Not Splittable
GET(k)PUT(k,v)
MAX Example
core 0
core 1
core 2
MAX(x,55)
MAX(x,10)
MAX(x,21)
x0 = MAX(x0,55)
x1 = MAX(x1,10)
x2 = MAX(x2,21)
x0 = 55
x1 = 10
x2 = 21
• Each core keeps one piece of summarized state xi
• MAX is commutative so results can be determined out of order
MAX Example
core 0
core 1
core 2
MAX(x,55)
MAX(x,10)
MAX(x,21)
x0 = MAX(x0,55)
x1 = MAX(x1,10)
x2 = MAX(x2,21)
x0 = 55
x1 = 27
x2 = 21
• Each core keeps one piece of summarized state xi
• MAX is commutative so results can be determined out of order
MAX(x,27) x1 =
MAX(x1,27)
MAX(x,2)
x1 = MAX(x1,2)
x = 55
MAX Example
core 0
core 1
core 2
MAX(x,55)
MAX(x,10)
MAX(x,21)
x0 = MAX(x0,55)
x1 = MAX(x1,10)
x2 = MAX(x2,21)
x0 = 55
x1 = 27
x2 = 21
• Each core keeps one piece of summarized state xi
• MAX is commutative so results can be determined out of order
MAX(x,27) x1 =
MAX(x1,27)
MAX(x,2)
x1 = MAX(x1,2)
x = 55
What Can Be Split?
Operations that can be split must be:– Commutative– Pre-reconcilable– On a single key
What Can’t Be Split?
• Operations that return a value– ADD_AND_GET(x,4)
• Operations on multiple keys– ADD(x,y)
• Different operations in the same phase, even if arguments make them commutative– ADD(x,0) and MULT(x,1)
Outline
1. Phases2. Splittable operations3. Doppel Optimizations4. Performance evaluation
Batching Transactions
core 0
core 1
core 2
ADD(x) ADD(u)
ADD(x) ADD(v)
ADD(x) ADD(w)
GET(x)GET(v)
x=x+x0
x0=0GET(x) GET(v)
x0=x0+1
x1=x1+1
x2=x2+1
x=x+x1
x1=0
x=x+x2
x2=0
split phase joined phasereconciliation
Batching Transactions
• Don’t switch phases immediately; stash reads• Wait to accumulate stashed transactions• Batch for joined phase
core 0
core 1
core 2
ADD(x) ADD(u)
ADD(x) ADD(v)
ADD(x) ADD(w)
GET(x)GET(v)
x=x+x0
x0=0GET(x) GET(v)
x0=x0+1
x1=x1+1
x2=x2+1
x=x+x1
x1=0
x=x+x2
x2=0
split phase joined phasereconciliation
GET(x) GET(x)GET(x) GET(x)
ADD(x) ADD(v)
x1=x1+1
Ideal World
• Transactions with contentious operations happen in the split phase
• Transactions with incompatible operations happen correctly in the joined phase
How To Decide What Should Be Split Data
• Database starts out with no split data• Count conflicts on records–Make key split if #conflicts >
conflictThreshold
• Count stashes on records in the split phase–Move key back to non-split if #stashes
too high
Outline
1. Phases2. Splittable operations3. Data classification4. Performance evaluation
Performance
• Contention vs. parallelism• What kinds of workloads benefit?• A realistic application: RUBiS
Doppel implementation:• Multithreaded Go server; worker thread per core• All experiments run on an 80 core intel server running 64 bit Linux 3.12• Transactions are one-shot procedures written in Go
Parallel Performance on Conflicting Workloads
Th
rou
gh
pu
t (m
illio
ns
txn
s/se
c)
20 cores, 1M 16 byte keys, transaction: ADD(x,1)
Doppel OCC 2PL0
5000000
10000000
15000000
20000000
25000000
30000000
35000000
Increasing Performance with More Cores
1M 16 byte keys, transaction: ADD(x,1) all writing same key
Varying Number of Hot Keys
20 cores, transaction: ADD(x,1)
How Much Stashing Is Too Much?
20 cores, transactions: LIKE read, LIKE write
RUBiS
• Auction application modeled after eBay– Users bid on auctions, comment, list new items,
search
• 1M users and 33K auctions• 7 tables, 26 interactions• 85% read only transactions
• More contentious workload– Skewed distribution of bids– More writes
StoreBid Transaction
BEGIN Transaction (bidder, amount, item)num_bids_item = num_bids_item + 1if amount > max_bid_item:
max_bid_item = amountmax_bidder_item = bidder
bidder_item_ts = Bid{bidder, amount, item, ts}END Transaction
StoreBid Transaction
BEGIN Transaction (bidder, amount, item)ADD(num_bids_item,1)MAX(max_bid_item, amount)
OPUT(max_bidder_item, bidder, amount)PUT(new_bid_key(), Bid{bidder, amount,
item, ts})END Transaction
All commutative operations on potentially conflicting auction
metadata
RUBiS ThroughputTh
rou
gh
pu
t (m
illio
ns
txn
s/se
c)
20 cores, 1M users 33K auctions
Low Contention High Contention0
500000
1000000
1500000
2000000
2500000
3000000
3500000
4000000
Related Work
• Split counters in multicore OSes– Linux kernel
• Commutativity in distributed systems and concurrency control– [Shapiro ‘11]
– [Li ‘12]
– [Lloyd ‘12]
– [Weihl ‘88]
• Optimistic concurrency control– [Kung ’81]
– [Tu ‘13]
Summary
• Contention is rising with increased core counts
• Commutative operations are amenable to being applied in parallel, per-core
• We can get good parallel performance on contentious transactional workloads by combining per-core split data with concurrency control
Neha Narula
http://nehanaru.la
@neha