Transactional Memory: How to Perform Load Adaption in a Simple and Distributed Manner

Johannes Schneider, David Hasenfratz, Roger Wattenhofer


Page 1

Transactional Memory: How to Perform Load Adaption in a Simple and Distributed Manner

Johannes Schneider, David Hasenfratz, Roger Wattenhofer

Page 2


Without easy and efficient parallel programming methods… “computer science will become washing machine science.”

Page 3

How to handle access to shared data? Locks, Monitors…

Coarse-grained vs. fine-grained locking: coarse-grained locking is easy but yields slow programs; fine-grained locking is demanding and time-consuming to write but yields fast programs.

Problems: difficult, error-prone, composability, …


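Coarse-grained locking (only 1 thread can execute):
  lock all data
  modify/use data
  unlock all data

Fine-grained locking:
  Thread 1: lock A; lock B; modify/use A,B; lock C; modify/use A,B,C; unlock A; modify/use B,C; unlock B,C
  Thread 2: lock B; lock A; modify/use A,B; unlock A,B
  Deadlock! (Thread 1 holds A and waits for B, while Thread 2 holds B and waits for A.)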
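The deadlock in the fine-grained example comes from the two threads acquiring A and B in opposite orders. As an illustration (not part of the talk), the hypothetical helpers below record, for each thread, which locks were already held when another lock was requested, and flag two threads whose orders are inverted. This only covers the two-thread pattern on the slide; general deadlock detection needs a cycle search over the whole lock-order graph.

```python
def lock_order(ops):
    """Return pairs (held, requested): `held` was owned when `requested` was locked."""
    held, pairs = set(), set()
    for op, names in ops:
        if op == "lock":
            for name in names:
                pairs.update((h, name) for h in held)
                held.add(name)
        elif op == "unlock":
            held.difference_update(names)
        # "use" steps do not change which locks are held
    return pairs


def may_deadlock(thread1, thread2):
    """True if one thread takes x before y while the other takes y before x."""
    order1, order2 = lock_order(thread1), lock_order(thread2)
    return any((y, x) in order2 for (x, y) in order1)


# Fine-grained example from the slide.
thread1 = [("lock", ["A"]), ("lock", ["B"]), ("use", ["A", "B"]),
           ("lock", ["C"]), ("use", ["A", "B", "C"]), ("unlock", ["A"]),
           ("use", ["B", "C"]), ("unlock", ["B", "C"])]
thread2 = [("lock", ["B"]), ("lock", ["A"]), ("use", ["A", "B"]),
           ("unlock", ["A", "B"])]

# Coarse-grained example: a single lock around all data.
coarse = [("lock", ["ALL"]), ("use", ["ALL"]), ("unlock", ["ALL"])]

print(may_deadlock(thread1, thread2))  # True
print(may_deadlock(coarse, coarse))    # False
```

With a single lock there is no second lock to invert, which is why coarse-grained locking cannot deadlock this way but lets only one thread execute.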

Page 4

Transactional memory (TM): a possible solution

Simple for the programmer

Composable

Idea from the database community.

Many TM systems (internally) still use locks, but the TM system (not the programmer) takes care of performance and correctness (no deadlocks, …).


Begin transaction
  modify/use data
End transaction

Method A.x():
  Begin transaction
    B.y()
    …
  End transaction

Method B.y():
  Begin transaction
    …
  End transaction
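To see what composability gives the programmer, here is a deliberately naive Python sketch in which one global reentrant lock plays the role of the TM system. This is an assumption for illustration only: real TM systems run transactions optimistically and detect conflicts, and the names `transaction`, `A.x` and `B.y` are hypothetical. The point is the interface: `A.x()` calls `B.y()` inside its own transaction without knowing how `B.y()` synchronizes, and nothing deadlocks.

```python
import threading
from contextlib import contextmanager

# Hypothetical stand-in for a TM runtime: one global *reentrant* lock.
# Real TM systems run transactions optimistically; this only mimics the interface.
_tm_lock = threading.RLock()


@contextmanager
def transaction():
    """Begin transaction ... End transaction; nesting is flattened by the RLock."""
    with _tm_lock:
        yield


class B:
    def __init__(self):
        self.counter = 0

    def y(self):
        with transaction():
            self.counter += 1
            return self.counter


class A:
    def __init__(self, b):
        self.b = b

    def x(self):
        with transaction():    # outer transaction
            return self.b.y()  # B.y() opens an inner transaction: no deadlock


b = B()
a = A(b)
workers = [threading.Thread(target=lambda: [a.x() for _ in range(1000)])
           for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(b.counter)  # 4000
```

The price of this shortcut is that only one transaction runs at a time, which is exactly the coarse-grained case; a real TM system aims to keep the same interface while letting non-conflicting transactions run in parallel.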

Page 5

Transactional memory systems

If transactions modify
  different data, everything is OK;
  the same data, conflicts arise that must be resolved. Transactions might get delayed or aborted ⇒ job of a contention manager.

A transaction keeps track of all modified values. It restores all values if it is aborted. A transaction successfully finishes with a commit.
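The bookkeeping described above can be sketched with an undo log: before the first write to a value, the transaction remembers the old value; abort restores every remembered value, commit simply discards the log. The `Transaction` class below is hypothetical, single-threaded, and ignores conflict detection entirely; it reuses the initial values A=1, B=1 from the next slide's example.

```python
class Transaction:
    """Hypothetical undo-log transaction over a shared dict (single-threaded sketch)."""

    def __init__(self, memory):
        self.memory = memory
        self.undo_log = {}  # value of each key before this transaction first wrote it
        self.active = True

    def write(self, key, value):
        if not self.active:
            raise RuntimeError("transaction already finished")
        self.undo_log.setdefault(key, self.memory[key])  # keep only the oldest value
        self.memory[key] = value

    def abort(self):
        for key, old in self.undo_log.items():  # restore all modified values
            self.memory[key] = old
        self.undo_log.clear()
        self.active = False

    def commit(self):
        self.undo_log.clear()  # nothing to restore any more
        self.active = False


memory = {"A": 1, "B": 1}
t = Transaction(memory)
t.write("B", 2)
t.write("A", 3)
t.write("A", 4)
t.abort()
print(memory)  # {'A': 1, 'B': 1}

t = Transaction(memory)
t.write("A", 2)
t.commit()
print(memory)  # {'A': 2, 'B': 1}
```

This sketch writes in place and keeps an undo log (eager versioning); other TM designs instead buffer writes privately and apply them only at commit (lazy versioning).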


Page 6

Conflicts: a contention manager decides

A contention manager aborts or delays a transaction, i.e., it adapts the load.
Distributed: each thread has its own manager.

Example (initially A=1, B=1; Manager 1 handles Trans. 1, Manager 2 handles Trans. 2):
  Trans. 1: … A:=2 …
  Trans. 2: B:=2 … A:=3 …   ⇒ conflict on A
  Either abort Trans. 1 (undo all its changes, i.e., set A:=1) and restart it (after a while),
  or abort Trans. 2 (set B:=1) and restart it, OR let Trans. 2 wait and retry.

Delay to adapt load!
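The three outcomes of the example can be replayed with a small sketch. Following the talk's model, a write locks the variable until commit or abort; when a transaction finds a variable held by another one, a manager function decides. The classes `Decision` and `Tx`, the helper `run`, and the fixed policies are hypothetical; restarting an aborted transaction is left out.

```python
from enum import Enum


class Decision(Enum):
    ABORT_OTHER = "abort the transaction holding the resource"
    ABORT_SELF = "abort the requesting transaction"
    WAIT = "wait and retry later"


class Tx:
    """Hypothetical transaction: a write locks the variable until commit or abort."""

    def __init__(self, name, memory, owners):
        self.name, self.memory, self.owners = name, memory, owners
        self.undo = {}

    def write(self, var, value, manager):
        holder = self.owners.get(var)
        if holder is not None and holder is not self:  # conflict
            decision = manager(self, holder)
            if decision is Decision.ABORT_OTHER:
                holder.abort()
            else:
                if decision is Decision.ABORT_SELF:
                    self.abort()
                return decision  # the write did not happen
        self.undo.setdefault(var, self.memory[var])
        self.owners[var] = self
        self.memory[var] = value
        return None

    def abort(self):
        for var, old in self.undo.items():  # undo all changes, release locks
            self.memory[var] = old
            del self.owners[var]
        self.undo.clear()


def run(manager):
    memory, owners = {"A": 1, "B": 1}, {}
    t1 = Tx("Trans. 1", memory, owners)
    t2 = Tx("Trans. 2", memory, owners)
    t1.write("A", 2, manager)
    t2.write("B", 2, manager)
    outcome = t2.write("A", 3, manager)  # conflict on A
    return memory, outcome


for policy in Decision:
    memory, outcome = run(lambda requester, holder, p=policy: p)
    print(policy.name, memory)
# ABORT_OTHER {'A': 3, 'B': 2}
# ABORT_SELF {'A': 2, 'B': 1}
# WAIT {'A': 2, 'B': 2}
```

In the distributed setting of the slide, each thread would carry its own manager function; delaying (WAIT) is the option that adapts the load without discarding work.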

Page 7

Prior work: contention managers [PODC03, PODC05, ISAAC09, …]
  System load was not (explicitly) considered.

Load adaption (based on contention):
  Estimate the contention intensity CI [SPAA08], with parameters $a \in [0,1]$ and $b$:
    if abort: $CI \leftarrow a \cdot CI + (1-a)$
    if commit: $CI \leftarrow a \cdot CI$
    if $CI > b$, resort to a central scheduler.
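The CI rule is an exponentially weighted average of abort events (an abort counts as 1, a commit as 0), so CI stays in [0, 1] if it starts there. A minimal sketch follows; the function names and the values a = 0.5 and b = 0.6 are arbitrary choices for illustration, not the parameters used in [SPAA08].

```python
def update_ci(ci, aborted, a):
    """One contention-intensity update step, with a in [0, 1]."""
    if not 0.0 <= a <= 1.0:
        raise ValueError("a must lie in [0, 1]")
    return a * ci + (1 - a) if aborted else a * ci


def use_central_scheduler(ci, b):
    """Resort to the central scheduler once CI exceeds the threshold b."""
    return ci > b


ci = 0.0
for aborted in [True, True, False]:
    ci = update_ci(ci, aborted, a=0.5)
    print(ci, use_central_scheduler(ci, b=0.6))
# 0.5 False
# 0.75 True
# 0.375 False
```

A larger a makes CI react more slowly to recent aborts and commits; b decides how much contention is tolerated before transactions are handed to the central scheduler.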

  Keep a transaction queue per core [PODC08]:
    a central dispatcher assigns transactions to a core, i.e., to its queue;
    each core iteratively executes transactions from its queue;
    if transaction A on core 1 is aborted due to B on core 2, then A is appended to the queue of core 2.

⇒ The central scheduler will become a bottleneck.

[Figure: per-core queues of Core 1 and Core 2 holding transactions A, B, C, D, before and after B aborts A.]
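The per-core queue scheme can be sketched as follows. The `Dispatcher` class and its round-robin assignment rule are assumptions (the slide does not say how the dispatcher picks a core); cores are numbered from 0 in the code, so core 1 and core 2 of the slide are indices 0 and 1. The head of each queue is the transaction currently running on that core.

```python
from collections import deque


class Dispatcher:
    """Hypothetical central dispatcher with one transaction queue per core."""

    def __init__(self, n_cores):
        self.queues = [deque() for _ in range(n_cores)]
        self._next = 0

    def submit(self, tx):
        # Assumption: round-robin assignment; the slide does not fix the rule.
        self.queues[self._next].append(tx)
        self._next = (self._next + 1) % len(self.queues)

    def running(self, core):
        return self.queues[core][0] if self.queues[core] else None

    def on_abort(self, victim_core, winner_core):
        # The aborted transaction leaves its core and joins the winner's queue,
        # so it will not run concurrently with the transaction that aborted it.
        victim = self.queues[victim_core].popleft()
        self.queues[winner_core].append(victim)
        return victim


d = Dispatcher(2)
for tx in "ABCD":
    d.submit(tx)                              # core 0: A, C   core 1: B, D
print(d.running(0), d.running(1))             # A B
d.on_abort(victim_core=0, winner_core=1)      # B aborts A
print([list(q) for q in d.queues])            # [['C'], ['B', 'D', 'A']]
```

Every submission and every abort goes through the same dispatcher object, which is the bottleneck the slide points out once many cores abort and resubmit transactions concurrently.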

Page 8

This paper
  Theoretical analysis
  Decentralized (simple) approaches to load adaption based on contention

Page 9

Strategies

Ignore: do not learn from conflicts (ImmediateRestart).

Stay real: remember faced conflicts (SerializeFacedConflicts); do not schedule previously conflicting transactions concurrently.

Be cautious: assume additional conflicts (SerializeAll); all transactions in a subgraph of the conflict graph are assumed to conflict.

[Figure: conflict graph (A conflicted with C, D conflicted with B) and the corresponding arrangements of A, B, C, D under the strategies.]
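One way to read the two serializing strategies as schedules is sketched below; this interpretation is an assumption, not the paper's definition. SerializeFacedConflicts greedily packs transactions into rounds so that two transactions that already conflicted never share a round; SerializeAll treats every connected subgraph of the conflict graph as all-conflicting and runs its members one after another (different subgraphs still run in parallel). ImmediateRestart keeps no schedule and is not modeled. The conflict graph is an adjacency dict of sets, and the function names are hypothetical.

```python
from collections import deque


def serialize_faced_conflicts(conflicts):
    """Greedy rounds: two transactions that already conflicted never share a round."""
    rounds = []
    for tx in sorted(conflicts):
        for r in rounds:
            if not conflicts[tx] & r:
                r.add(tx)
                break
        else:
            rounds.append({tx})
    return rounds


def serialize_all(conflicts):
    """Each connected subgraph is assumed all-conflicting: run its members in sequence."""
    seen, components = set(), []
    for start in sorted(conflicts):
        if start in seen:
            continue
        comp, todo = [], deque([start])
        seen.add(start)
        while todo:
            tx = todo.popleft()
            comp.append(tx)
            for nb in sorted(conflicts[tx] - seen):
                seen.add(nb)
                todo.append(nb)
        components.append(sorted(comp))
    depth = max(len(c) for c in components)
    # Round i runs the i-th member of every component in parallel.
    return [{c[i] for c in components if i < len(c)} for i in range(depth)]


# Conflict graph from the slide: A conflicted with C, D conflicted with B.
g = {"A": {"C"}, "C": {"A"}, "B": {"D"}, "D": {"B"}}
print(len(serialize_faced_conflicts(g)), len(serialize_all(g)))        # 2 2

# A path A - B - C shows the difference: A and C never conflicted directly.
path = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
print(len(serialize_faced_conflicts(path)), len(serialize_all(path)))  # 2 3
```

On the slide's graph both strategies need two rounds; as soon as a subgraph contains transactions that never conflicted with each other, SerializeAll is more cautious and needs more rounds.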

Page 10

Load Adaption Strategies

AbortBackoff: if aborted, wait for a random time in $[0, 2^{\#aborts}]$; priority = number of aborts (#aborts).

Who wins a conflict? Two strategies:
  estimate the work done;
  unrelated to the work done.
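A minimal AbortBackoff sketch, assuming whole time units (the model counts operations in units of 1) so the delay is drawn with `randint`, which includes both end points. The helper names and the dict representation of a transaction are hypothetical, and so is the tie-breaking rule in `conflict_winner` (the slide only states that the priority is the number of aborts).

```python
import random


def backoff_delay(aborts, rng=random):
    """AbortBackoff: after an abort, wait a random number of time units in [0, 2**aborts]."""
    return rng.randint(0, 2 ** aborts)  # randint includes both end points


def conflict_winner(holder, requester):
    """Priority = #aborts; the transaction that was aborted more often wins.

    Assumption: ties go to the transaction already holding the resource.
    """
    return requester if requester["aborts"] > holder["aborts"] else holder


rng = random.Random(42)  # seeded only to make the sketch reproducible
for aborts in range(5):
    print(aborts, backoff_delay(aborts, rng))

t1 = {"name": "Trans. 1", "aborts": 0}
t2 = {"name": "Trans. 2", "aborts": 3}
print(conflict_winner(holder=t1, requester=t2)["name"])  # Trans. 2
```

The expected wait doubles with every abort, so transactions that keep conflicting spread themselves out over time without any central coordination.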


Page 11

Theory Part: Model

n transactions (and threads) start concurrently on n cores.

A transaction is a sequence of operations; each operation takes 1 time unit, and the duration (number of operations) $t_T$ is fixed. Two types of operations:
  write = modify a (shared) resource and lock it until commit;
  compute/abort/commit.

Ignore the overhead of load adaption (remembering transactions, scheduling, …).

[Figure: Core 1, Core 2, …, Core n, each running one transaction (A, B, …, Z).]

Page 12

Moderate parallelism

Shared counter: conflicts directly after transaction start.
Linked list: conflicts at an arbitrary time.

Expected time span until all transactions have committed:
[Table: policies ImmediateRestart, AbortBackoff, SerializeFacedConflicts, SerializeAll vs. benchmarks Counter and List; entries expressed in terms of the transaction run time and the number of transactions.]

Speed-up: log n (at best).

Page 13

Substantial parallelism

Worst case: the conflict graph is a d-ary tree of logarithmic height.

Exponential gap in the worst case between SerializeAll and the other policies.

[Table: time until all transactions have committed, per policy (ImmediateRestart, AbortBackoff, SerializeFacedConflicts, SerializeAll).]
[Figure: conflict tree with root T1, children T2 and T3, and further descendants T4, T5, ….]

Page 14

Practical investigation

Remembering conflicts causes too much overhead ⇒ good for analysis but not for implementation.

Quickadapter: serializes transactions.
  Each core has a “waiting” flag.
  If aborted, set the flag and wait until it is unset.
  If committed, unset some flag.

AbortBackoff

(Also considered some variants.)
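The Quickadapter rule can be sketched with one flag and one `threading.Event` per core. This is a hypothetical illustration, not the DSTM2 code: the class and method names are invented, and since the slide only says a commit unsets "some" flag, releasing the lowest-numbered waiting core is an assumption. Because each commit releases at most one waiting core, aborted transactions re-enter one at a time, which is what serializes them.

```python
import threading


class QuickAdapter:
    """Hypothetical sketch: one "waiting" flag per core (not the DSTM2 API)."""

    def __init__(self, n_cores):
        self._lock = threading.Lock()
        self._waiting = [False] * n_cores
        self._released = [threading.Event() for _ in range(n_cores)]

    def on_abort(self, core, timeout=None):
        """Set this core's flag, then block until some commit unsets it.

        Returns False if `timeout` expires first; the flag then stays set.
        """
        with self._lock:
            self._waiting[core] = True
            self._released[core].clear()
        return self._released[core].wait(timeout)

    def on_commit(self):
        """Unset some waiting flag; here, as an assumption, the lowest-numbered one."""
        with self._lock:
            for core, waiting in enumerate(self._waiting):
                if waiting:
                    self._waiting[core] = False
                    self._released[core].set()
                    return core
        return None

    def is_waiting(self, core):
        with self._lock:
            return self._waiting[core]


qa = QuickAdapter(n_cores=4)
worker = threading.Thread(target=qa.on_abort, args=(2,))
worker.start()                 # core 2 was aborted and now waits
while not qa.is_waiting(2):
    pass                       # wait until core 2 has raised its flag
print(qa.on_commit())          # 2  (a commit on another core releases core 2)
worker.join()
print(worker.is_alive())       # False
```

If the commit happens between setting the flag and calling `wait`, the event is already set and the waiting core continues immediately, so no release is lost.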


Page 15

Practical investigation

Evaluation on a 16-core machine with the DSTM2 system, using visible readers.

Six benchmarks:
  little parallelism: Shared counter, Sorted List (accessed objects not released), Listcounter;
  considerable parallelism: Red Black Tree, LFUCache, RandomAccessArray.

Compare the new load adaption policies to existing contention managers.

Page 16

Discussion

⇒ Hard to keep maximum throughput, as also seen in [SPAA08, PODC08].

Even without conflicts, an improvement for one benchmark worsens another.

On average, better than schemes without load adaption.

Page 17

Conclusion

Simple and distributed load adaption strategies

Theory: (for now) constants and parameters matter a lot.

Practice: hard to keep the load at its peak for all usage patterns.

Page 18


Thanks for your attention!

Questions?

Page 19

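Analysis: AbortBackoff for the counter

Recall: if aborted, wait for a random time in $[0, 2^{\#aborts}]$. Assume $\#aborts \approx \log(n t_T) + x$ (for some $x$).

Define $a(x)$ := fraction of active nodes, with $a(0) = 1$ (after time $\approx 2^{\log(n t_T)} = n t_T$, a constant fraction is still active).

Chance of a conflict for the interval $[0, 2^{\#aborts}] = [0, 2^{\log(n t_T) + x}]$:
$$\approx \frac{a(x)\, n t_T}{2^{\log(n t_T) + x}} = \frac{a(x)}{2^x}$$
$$a(x+1) = \frac{a(x)}{2^x} = \frac{1}{2^{\sum_{i=0}^{x} i}} \approx \frac{1}{2^{x^2}}$$
$$a(\sqrt{\log n}) = \frac{1}{2^{(\sqrt{\log n})^2}} = \frac{1}{n}$$

Total length of the waiting intervals:
$$\sum_{i=0}^{\log(n t_T) + \sqrt{\log n}} 2^i = 2^{\log(n t_T) + \sqrt{\log n} + 1} - 1 \approx n t_T\, 2^{\sqrt{\log n} + 1}$$

[Figure: the last active transactions T1, T2, T3.]
$$a(x)\, n t_T = \frac{3}{n}\, n t_T = 3 t_T$$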
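The arithmetic of the recurrence can be checked with exact fractions. The sketch below (hypothetical helper names) only verifies the algebra as stated on the slide; it does not simulate AbortBackoff. It confirms the closed form $a(x+1) = 2^{-x(x+1)/2}$, whose exponent grows like $x^2/2$, so the slide's $\approx 1/2^{x^2}$ holds up to a constant factor in the exponent, and it checks the geometric sum for one hypothetical choice of $n t_T$ and $\sqrt{\log n}$.

```python
from fractions import Fraction


def active_fraction(x):
    """a(0) = 1 and a(k+1) = a(k) / 2**k, the recurrence from the slide."""
    a = Fraction(1)
    for k in range(x):
        a /= 2 ** k
    return a


# Closed form: a(x+1) = 1 / 2**(0 + 1 + ... + x) = 1 / 2**(x*(x+1)/2).
for x in range(6):
    assert active_fraction(x + 1) == Fraction(1, 2 ** (x * (x + 1) // 2))
print(active_fraction(5))  # 1/1024


def total_interval(log_ntT, s):
    """Sum of 2**i for i = 0..log(n t_T) + s, which equals 2**(L + 1) - 1."""
    L = log_ntT + s
    return sum(2 ** i for i in range(L + 1))


ntT, s = 2 ** 10, 3  # hypothetical n*t_T = 1024 and sqrt(log n) = 3
print(total_interval(10, s), ntT * 2 ** (s + 1))  # 16383 16384
```

The last line shows why the sum is approximately $n t_T\, 2^{\sqrt{\log n} + 1}$: the exact value is one less.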