Transactional Memory: How to Perform Load Adaption in a Simple And Distributed Manner
Johannes Schneider, David Hasenfratz, Roger Wattenhofer
“Computer science will become washing machine science.”
Without easy and efficient parallel programming methods…
How to handle access to shared data? Locks, monitors, …
Coarse-grained vs. fine-grained locking:
coarse-grained locking is easy, but yields slow programs
fine-grained locking is demanding and time-consuming, but yields fast programs
Problems: difficult, error-prone, composability, …
Coarse-grained locking (only 1 thread can execute):
lock all data; modify/use data; unlock all data
Fine-grained locking:
Thread 1: lock A; lock B; modify/use A,B; lock C; modify/use A,B,C; unlock A; modify/use B,C; unlock B,C
Thread 2: lock B; lock A; modify/use A,B; unlock A,B
Deadlock!
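The deadlock arises because Thread 1 takes lock A before B while Thread 2 takes B before A. The Python sketch below forces exactly that interleaving with two barriers. It uses `Lock.acquire(timeout=...)` only so the demo terminates instead of hanging; the variable names and the 0.2 s timeout are illustrative choices, not part of the slides.

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()
holding = threading.Barrier(2)   # both threads hold their first lock
tried = threading.Barrier(2)     # both threads have tried their second lock
got_second = {}

def worker(name, first, second):
    with first:
        holding.wait()
        # A plain second.acquire() would block forever here: circular wait.
        ok = second.acquire(timeout=0.2)
        got_second[name] = ok
        if ok:
            second.release()
        tried.wait()             # keep holding `first` until the other thread has tried

threads = [
    threading.Thread(target=worker, args=("thread 1", lock_a, lock_b)),  # lock A, then B
    threading.Thread(target=worker, args=("thread 2", lock_b, lock_a)),  # lock B, then A
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(got_second["thread 1"], got_second["thread 2"])  # False False
```

Neither thread can obtain its second lock, which is the circular wait that a real, untimed acquire turns into a deadlock.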
Transactional memory (TM): a possible solution
Simple for the programmer
Composable
Idea from the database community
Many TM systems (internally) still use locks, but the TM system (not the programmer) takes care of
performance
correctness (no deadlocks, …)
Begin transaction; modify/use data; End transaction
Method A.x(): Begin transaction; B.y(); …; End transaction
Method B.y(): Begin transaction; …; End transaction
Transactional memory systems
If transactions modify
different data, everything is OK
the same data, conflicts arise that must be resolved: transactions might get delayed or aborted ⇒ job of a contention manager
A transaction keeps track of all modified values
It restores all these values if it is aborted
A transaction finishes successfully with a commit
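A minimal Python sketch of "keep track of all modified values and restore them on abort", assuming an undo log: values are updated in place and the old value is saved on the first write. Buffering writes until commit would be an equally valid design; the class name is hypothetical.

```python
class UndoLogTransaction:
    """Hypothetical transaction that logs old values so it can undo its writes."""

    def __init__(self, memory):
        self.memory = memory
        self.undo = {}                               # variable -> value before the first write

    def write(self, var, value):
        self.undo.setdefault(var, self.memory[var])  # remember only the original value
        self.memory[var] = value

    def abort(self):
        self.memory.update(self.undo)                # restore all modified values
        self.undo.clear()

    def commit(self):
        self.undo.clear()                            # changes become permanent

memory = {"A": 1, "B": 1}
t1 = UndoLogTransaction(memory)
t1.write("B", 2)
t1.write("A", 3)
t1.write("A", 4)
t1.abort()
print(memory)   # {'A': 1, 'B': 1}
t2 = UndoLogTransaction(memory)
t2.write("A", 2)
t2.commit()
print(memory)   # {'A': 2, 'B': 1}
```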
Conflicts: a contention manager decides
Abort or delay a transaction, i.e., adapt load
Distributed: each thread has its own manager
Example (initially A=1, B=1): Manager 1 runs Trans. 1 (… A:=2 …), Manager 2 runs Trans. 2 (B:=2 … A:=3 …); Trans. 2 conflicts with Trans. 1 on A
Option 1: abort Trans. 1 (undo all changes, i.e., set A:=1) and restart it (after a while)
Option 2: abort Trans. 2 (set B:=1) and restart it, OR let Trans. 2 wait and retry
Delay to adapt load!
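The example can be replayed in a few lines of Python. This is a sketch under our own assumptions: each write locks its variable until commit, and the contention manager's decision is passed in as `policy`, either aborting the lock owner (Trans. 1) or the requester (Trans. 2). The function and policy names are hypothetical.

```python
def run_example(policy):
    """Replay the example; return memory right after the conflict is resolved."""
    memory = {"A": 1, "B": 1}
    owner = {}                                   # variable -> transaction holding its write lock
    undo = {"T1": {}, "T2": {}}                  # per-transaction undo logs

    def write(tx, var, value):
        holder = owner.get(var)
        if holder is not None and holder != tx:
            return holder                        # conflict with `holder`
        owner[var] = tx
        undo[tx].setdefault(var, memory[var])
        memory[var] = value
        return None

    def abort(tx):
        memory.update(undo[tx])                  # undo all changes of tx
        for var in undo[tx]:
            del owner[var]                       # release its write locks
        undo[tx].clear()

    write("T1", "A", 2)
    write("T2", "B", 2)
    other = write("T2", "A", 3)                  # conflict: A is locked by T1
    assert other == "T1"
    if policy == "abort_owner":
        abort("T1")                              # A := 1, T1 restarts after a while
        write("T2", "A", 3)                      # T2 can now proceed
    elif policy == "abort_requester":
        abort("T2")                              # B := 1, T2 restarts (or waits and retries)
    else:
        raise ValueError(f"unknown policy: {policy}")
    return dict(memory)

print(run_example("abort_owner"))      # {'A': 3, 'B': 2}
print(run_example("abort_requester"))  # {'A': 2, 'B': 1}
```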
Prior work: contention managers [PODC03, PODC05, ISAAC09, …]
System load was not (explicitly) considered
Load adaption (based on contention)
Estimate the contention intensity CI [SPAA08]:
if abort: $CI := a \cdot CI + (1-a)$, with parameter $a \in [0,1]$
if commit: $CI := a \cdot CI$
if $CI > b$ (parameter $b$), resort to a central scheduler
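The CI update is an exponential moving average of abort events. A small Python sketch follows; the values a = 0.5 and b = 0.4 are arbitrary illustrations, not parameters taken from [SPAA08], and the function names are hypothetical.

```python
def update_ci(ci, aborted, a):
    """Contention-intensity update: CI := a*CI + (1-a) on abort, a*CI on commit."""
    if not 0.0 <= a <= 1.0:
        raise ValueError("a must lie in [0, 1]")
    return a * ci + (1 - a) if aborted else a * ci

def needs_central_scheduler(ci, b):
    """Resort to the central scheduler once CI exceeds the threshold b."""
    return ci > b

ci = 0.0
for aborted in [True, True, True, False]:   # three aborts, then one commit
    ci = update_ci(ci, aborted, a=0.5)
print(ci)                                   # 0.4375
print(needs_central_scheduler(ci, b=0.4))   # True
```

A larger a makes the estimate react more slowly to recent aborts and commits.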
Keep a transaction queue per core [PODC08]
A central dispatcher assigns transactions to a core, i.e., to its queue
Each core iteratively executes the transactions in its queue
If transaction A on core 1 is aborted due to B on core 2, then A is appended to the queue of core 2
⇒ The central scheduler will become a bottleneck
Figure: transaction queues of Core 1 and Core 2 before and after B aborts A
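A Python sketch of the per-core queues described for [PODC08], assuming that the running transaction is the head of its core's queue and that "appended" means added at the tail. Class and method names are hypothetical; Core 1 and Core 2 are indices 0 and 1.

```python
from collections import deque

class PerCoreQueues:
    """Hypothetical per-core transaction queues filled by a central dispatcher."""

    def __init__(self, n_cores):
        self.queues = [deque() for _ in range(n_cores)]
        self.core_of = {}                      # transaction -> index of the core holding it

    def dispatch(self, tx, core):
        """Central dispatcher: append a transaction to the queue of `core`."""
        self.queues[core].append(tx)
        self.core_of[tx] = core

    def on_abort(self, aborted, winner):
        """`aborted` lost a conflict to `winner`: move it to the tail of the winner's queue."""
        self.queues[self.core_of[aborted]].remove(aborted)
        self.dispatch(aborted, self.core_of[winner])

    def next_on(self, core):
        """The transaction a core executes next (head of its queue), if any."""
        return self.queues[core][0] if self.queues[core] else None

sched = PerCoreQueues(2)
for tx, core in [("A", 0), ("C", 0), ("B", 1), ("D", 1)]:
    sched.dispatch(tx, core)
sched.on_abort("A", "B")                       # B (core 2) aborts A (core 1)
print([list(q) for q in sched.queues])         # [['C'], ['B', 'D', 'A']]
```

Every abort goes through shared queue state, which is why a central component of this kind can become a bottleneck.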
This paper
Theoretical analysis
Decentralized (simple) approaches to load adaption based on contention
Strategies
Ignore: do not learn from conflicts (ImmediateRestart)
Stay real: remember faced conflicts (SerializeFacedConflicts); do not schedule previously conflicting transactions concurrently
Be cautious: assume additional conflicts (SerializeAll); all transactions in a subgraph of the conflict graph are assumed to conflict
Figure: conflict graph (A conflicted with C, D conflicted with B) and the resulting schedules of A, B, C, D under each strategy
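The slides define SerializeFacedConflicts only as "do not schedule prior conflicting transactions concurrently". One way to turn remembered conflicts into a schedule is greedy first-fit packing into rounds, sketched here in Python; the round structure and the function name are our assumptions. The example uses the conflict graph of the figure, with edges {A, C} and {D, B}. Passing an empty conflict set mimics ImmediateRestart, which learns nothing from conflicts.

```python
def serialize_faced_conflicts(transactions, faced_conflicts):
    """Greedy rounds: never put two transactions that already conflicted into one round.

    faced_conflicts: set of frozensets {x, y} remembered from earlier aborts.
    """
    rounds = []
    for tx in transactions:
        for r in rounds:
            if all(frozenset((tx, other)) not in faced_conflicts for other in r):
                r.append(tx)            # compatible with everything already in round r
                break
        else:
            rounds.append([tx])         # conflicts with every existing round: open a new one
    return rounds

conflicts = {frozenset("AC"), frozenset("DB")}
print(serialize_faced_conflicts(["A", "B", "C", "D"], conflicts))  # [['A', 'B'], ['C', 'D']]
print(serialize_faced_conflicts(["A", "B", "C", "D"], set()))      # [['A', 'B', 'C', 'D']]
```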
Load Adaption Strategies
AbortBackoff: if aborted, wait for a random time in $[0, 2^{\#aborts}]$; priority = number of aborts (#aborts)
Who wins a conflict? 2 strategies:
estimate the work done
unrelated to the work done
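A Python sketch of AbortBackoff's waiting rule and of a priority-based conflict decision. Two points are our assumptions, not statements from the slides: the transaction with the higher priority (more aborts) wins, and ties go to the lexicographically larger name. The function names are hypothetical.

```python
import random

def backoff_delay(aborts, rng=random):
    """AbortBackoff: after an abort, wait a random time in [0, 2**aborts]."""
    return rng.uniform(0, 2 ** aborts)

def conflict_winner(tx1, tx2, aborts):
    """Priority = number of aborts; more aborts wins, ties go to the larger name."""
    return max((tx1, tx2), key=lambda tx: (aborts[tx], tx))

rng = random.Random(1)                       # seeded for reproducibility
delays = [backoff_delay(k, rng) for k in range(6)]
print(all(0 <= d <= 2 ** k for k, d in enumerate(delays)))   # True
print(conflict_winner("T1", "T2", {"T1": 3, "T2": 1}))        # T1
```

The expected wait doubles with every abort, so repeatedly conflicting transactions spread out in time.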
Theory Part: Model
n transactions (and threads) start concurrently on n cores
Transaction: a sequence of operations
each operation takes 1 time unit
the duration (number of operations) $t_T$ is fixed
2 types of operations: write = modify a (shared) resource and lock it until commit; compute/abort/commit
Ignore the overhead of load adaption (remembering transactions, scheduling, …)
Figure: transactions A, B, …, Z running on cores 1, 2, …, n
Moderate parallelism
Shared counter: conflicts directly after transaction start
Linked list: conflicts at arbitrary times
Expected time span until all transactions have committed
Speed-up: log n (at best)
Table: time until all transactions committed for each policy (ImmediateRestart, AbortBackoff, SerializeFacedConflicts, SerializeAll) on the counter and the list, in terms of transaction run time and #transactions
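Why the shared counter admits so little parallelism can be seen in a toy simulation of the model. The simplifications are ours: every transaction writes the counter in its first operation and holds the lock until it commits $t_T$ time units later; a transaction that finds the counter locked aborts and retries in the next time unit; ties go to the lowest index. The sketch only illustrates the bound $n \cdot t_T$ (one lock holder at a time); it does not reproduce the per-policy results of the table.

```python
def counter_makespan(n, t_T):
    """Time until all n transactions commit on a shared counter (simplified model)."""
    pending = list(range(n))
    lock_free_at = 0                      # time at which the counter lock is released
    time = 0
    while pending:
        if time >= lock_free_at:
            pending.pop(0)                # lowest index acquires the counter...
            lock_free_at = time + t_T     # ...and commits t_T time units later
        time += 1                         # all others abort and retry next time unit
    return lock_free_at

print(counter_makespan(4, 3))   # 12
```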
Substantial parallelism
Worst case: the conflict graph is a d-ary tree of logarithmic height
Exponential gap in the worst case between SerializeAll and the other policies
Table: time until all transactions committed for each policy (ImmediateRestart, AbortBackoff, SerializeFacedConflicts, SerializeAll)
Figure: conflict tree with root T1, children T2 and T3, then T4, T5, …
Practical investigation
Remembering conflicts causes too much overhead ⇒ good for analysis, but not for implementation
Quickadapter: serializes transactions
Each core has a “waiting” flag
If a transaction is aborted, its core sets its flag and waits until the flag is unset
If a transaction commits, its core unsets some (waiting) flag
AbortBackoff
(Also considered some variants)
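A Python sketch of Quickadapter's flag protocol using a condition variable. Which waiting flag a commit unsets is not specified on the slide; choosing the lowest index is our assumption, as are the class name and the timeout parameter.

```python
import threading
import time

class QuickAdapterFlags:
    """Hypothetical per-core "waiting" flags in the spirit of Quickadapter."""

    def __init__(self, n_cores):
        self._cond = threading.Condition()
        self._waiting = [False] * n_cores

    def on_abort(self, core, timeout=None):
        """Set this core's flag and block until another core unsets it."""
        with self._cond:
            self._waiting[core] = True
            # wait_for returns False if the timeout expires first
            released = self._cond.wait_for(lambda: not self._waiting[core], timeout=timeout)
            self._waiting[core] = False      # clear our flag even if we timed out
            return released

    def on_commit(self):
        """Unset the flag of one waiting core (here: the lowest index), if any."""
        with self._cond:
            for core, flag in enumerate(self._waiting):
                if flag:
                    self._waiting[core] = False
                    self._cond.notify_all()
                    return core
            return None

    def waiting_cores(self):
        with self._cond:
            return [core for core, flag in enumerate(self._waiting) if flag]

flags = QuickAdapterFlags(n_cores=2)
result = []
waiter = threading.Thread(target=lambda: result.append(flags.on_abort(1, timeout=5)))
waiter.start()
while flags.waiting_cores() != [1]:          # wait until core 1 has set its flag
    time.sleep(0.001)
released = flags.on_commit()                 # a commit on core 0 releases core 1
waiter.join()
print(released, result)                      # 1 [True]
```

Each commit releases at most one waiting core, so after aborts the number of concurrently running transactions shrinks and grows only as commits succeed.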
Practical investigation
Evaluation on a 16-core machine with the DSTM2 system (visible readers)
Six benchmarks
Little parallelism: shared counter, sorted list (accessed objects not released), Listcounter
Considerable parallelism: Red Black Tree, LFUCache, RandomAccessArray
Compare the new load adaption policies to existing contention managers
Discussion
⇒ Hard to keep maximum throughput (also observed in [SPAA08, PODC08])
Even without conflicts
An improvement for one benchmark worsens another
On average better than schemes without load adaption
Conclusion
Simple and distributed load adaption strategies
Theory: (for now) constants and parameters matter a lot
Practice: hard to keep the load at its peak for all usage patterns
Thanks for your attention!
Questions?
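Analysis: AbortBackoff for the counter
Recall: if aborted, wait for a random time in $[0, 2^{\#aborts}]$; assume $\#aborts \sim \log(n t_T) + x$ (for some $x$)
Define $a(x)$ := fraction of active nodes; $a(0) = 1$ (after time $\sim 2^{\log(n t_T)} = n t_T$, a constant fraction is still active)
Chance of a conflict for the interval $[0, 2^{\#aborts}] = [0, 2^{\log(n t_T) + x}]$:
$\sim \frac{a(x)\, n t_T}{2^{\log(n t_T) + x}} = \frac{a(x)}{2^x}$
$a(x+1) = \frac{a(x)}{2^x} = \frac{1}{2^{\sum_{i=0}^{x} i}} \sim \frac{1}{2^{x^2}}$
$a(\sqrt{\log n}) \sim \frac{1}{2^{(\sqrt{\log n})^2}} = \frac{1}{n}$
$\sum_{i=0}^{\log(n t_T) + \sqrt{\log n}} \text{length of interval } i = \sum_{i=0}^{\log(n t_T) + \sqrt{\log n}} 2^i \approx n t_T\, 2^{\sqrt{\log n} + 1}$
Figure: active transactions T1, T2, T3 within one backoff interval; e.g., $a(x) = 3/n$ gives $a(x)\, n t_T = \frac{3}{n} \cdot n t_T = 3 t_T$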
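The recurrence and the geometric sum can be checked numerically. This Python sketch assumes base-2 logarithms and that $n t_T$ is a power of two; the function names are hypothetical. The exact solution is $a(x) = 2^{-x(x-1)/2}$, so the slide's $\sim$ hides a constant factor in the exponent: for $n = 2^{16}$ the smallest $x$ with $a(x) \le 1/n$ is 7 rather than $\sqrt{16} = 4$, which is still $\Theta(\sqrt{\log n})$.

```python
import math

def active_fraction(x):
    """a(x) from the recurrence a(0) = 1, a(x+1) = a(x) / 2**x."""
    a = 1.0
    for i in range(x):
        a /= 2 ** i
    return a

def rounds_until_one_active(n):
    """Smallest x with a(x) <= 1/n; the exponent of a(x) grows quadratically in x."""
    x = 0
    while active_fraction(x) > 1 / n:
        x += 1
    return x

def total_backoff(n, t_T, x):
    """Sum of the interval lengths 2**i for i = 0 .. log2(n t_T) + x."""
    top = int(math.log2(n * t_T)) + x
    return sum(2 ** i for i in range(top + 1))

# Closed form of the recurrence: a(x+1) = 2**-(0 + 1 + ... + x)
for x in range(8):
    assert active_fraction(x + 1) == 2.0 ** -(x * (x + 1) // 2)

print(rounds_until_one_active(2 ** 16))                              # 7
# Geometric sum: n t_T * 2**(x+1) - 1 when n t_T is a power of two
print(total_backoff(2 ** 16, 16, 4) == 2 ** 16 * 16 * 2 ** 5 - 1)   # True
```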