Scheduling-based TM Contention Management A survey talk
description
Transcript of Scheduling-based TM Contention Management A survey talk
Scheduling-based TM Contention ManagementA survey talk
3rd workshop on the Theory of Transactional Memory, Sep 22-23, 2011, Rome
Danny Hendler Ben-Gurion university
Danny Hendler
TM: a research toy?
TM
“TM:Why it is only a research toy.”
Cascaval et al., 2008
“Why STM can be more than a research toy.”
Dragojevic et al., 2009
There is consensus that TM performance must be improved. Specifically:
“In some workloads, performance degraded when we used too many concurrent threads. One possible alternative to improving performance in these cases would be to modify the thread scheduler so it avoids running more concurrent threads than is optimal for a given workload, based on the
information provided by the STM runtime. “ [Dragojevic et al.]
3rd workshop on the Theory of Transactional Memory, Sep 22-23, 2011, RomeDanny Hendler
TM scheduling: rationale
Transactional threads controlled by TM-aware schedulero Kernel-level, user-level
Richer “tool-box“ for reducing and/or preventing transaction conflicts
Improve performance under high-contention
3rd workshop on the Theory of Transactional Memory, Sep 22-23, 2011, RomeDanny Hendler
TM system
Non TM-scheduledthreads
Contention
Manager
ContentionDetection
arbitrate
proceed
Abort/retry, wait
“Conventional” (non-scheduling) contention management
greedy
Aggressive
Karma
Polka
Suicide
Polite
“Polymorphic contention management”, [Guerraoui, Herlihy & Pochon DISC'05]
3rd workshop on the Theory of Transactional Memory, Sep 22-23, 2011, RomeDanny Hendler
Conventional Contention Management is often problematic
Loser resumes execution after a waiting period May resume execution too early May resume execution too late
Repeated collisions occur under high contention Livelocks Performance may become worse than single lock
Scheduling-based CM to the rescue.
3rd workshop on the Theory of Transactional Memory, Sep 22-23, 2011, RomeDanny Hendler
Talk outline
Preliminaries The first TM schedulers Later user-land work Kernel support
3rd workshop on the Theory of Transactional Memory, Sep 22-23, 2011, RomeDanny Hendler
“Adaptive Transaction Scheduling for transactional memory systems” [Yoo & Lee, SPAA'08]
“CAR-STM: Scheduling-based collision avoidance and resolution for software transactional memory” [Dolev, Hendler & Suissa, PODC '08]
“Steal-on-abort: dynamic transaction reordering to reduce conflicts in transactional memory” [Ansari , Jarvis, Kirkham, Kotsedilis, Lujan and Watson, HiPEAC'09]
The first TM schedulers
3rd workshop on the Theory of Transactional Memory, Sep 22-23, 2011, RomeDanny Hendler
A single scheduling queue
Per-thread Contention Intensity (CI) computed
Adaptive mechanism CI below threshold transaction begins normally CI above threshold transaction serialized (queued)
Adaptive Transaction Scheduling (ATS)(Yoo & Lee, SPAA'08)
3rd workshop on the Theory of Transactional Memory, Sep 22-23, 2011, RomeDanny Hendler
time
Contention Intensity
ThresholdTimeline 1
Timeline 2
Timeline 3
Timeline 4
Timeline 5
Queued TransactionExecuting Transaction
Timeline flows from top to bottom
An average of all the CIs from running threads
Transactions begin execution without resorting to the scheduler
As contention starts to increase,some transactions call the scheduler
As more transactions get serialized,contention intensity starts to decrease
Contention intensity subsides below threshold
More transactions start without the scheduler to exploit more parallelism
ATS adaptively varies the number of concurrent transactions according to the dynamic contention feedback
Behavior of a Queue-Based Scheduler
ATS: adaptive parallelism control
Yoo & Lee, Transaction Scheduling.
CAR-STM (Collision Avoidance and Resolution for STM) (Dolev, Hendler & Suissa, PODC'08)
Per-core transaction queues
Serialize conflicting transactions
Contention avoidance: attempt to avoid even first collision
3rd workshop on the Theory of Transactional Memory, Sep 22-23, 2011, RomeDanny Hendler
CAR-STM high-level architecture
Transaction queue #1
TQ thread
TQ thread
Transaction thread
T-Info
Core #1
Serializing
contention mgr.
Dispatcher
Collision
Avoider
Core #k
Transaction queue #k
3rd workshop on the Theory of Transactional Memory, Sep 22-23, 2011, RomeDanny Hendler
Execution time: STMBench7*
R/W dominated workloads
(*) “STMBench7: a benchmark for STM”, [Guerraoui, Kapalka & Viteck., Eurosys'07]
Shortcomings of first TM schedulers
May restrict parallelism too much
ATS: a single serialization queue
CAR-STM: at most a single transactional thread per coreo High overheads even in the lack of contention
3rd workshop on the Theory of Transactional Memory, Sep 22-23, 2011, RomeDanny Hendler
Talk outline
Preliminaries The first TM schedulers Later user-land work Kernel support
3rd workshop on the Theory of Transactional Memory, Sep 22-23, 2011, RomeDanny Hendler
A low-overhead serializer
Avoid repeated collisions while minimizing over-serialization
No per-core queues
Adaptive
• “On the impact of serializing contention management on STM performance”, [Heber, Hendler & Suissa., opodis'09]
• “Scheduling support for TM contention management”, [Maldonado, Felber, Fedorova, Hendler, Lawall, Marlier, Muller & Suissa PPoPP'10]
3rd workshop on the Theory of Transactional Memory, Sep 22-23, 2011, RomeDanny Hendler
Transactional threads
Conditionvariables
A low-overhead serializer (cont'd)
3rd workshop on the Theory of Transactional Memory, Sep 22-23, 2011, RomeDanny Hendler
1) t Identifies a collision
2) t calls contention manager: ABORT_OTHER
3) t changes status of t' to ABORT (writes that t is winner)
tt'
4) t' identifies it was aborted
A low-overhead serializer (cont'd)
3rd workshop on the Theory of Transactional Memory, Sep 22-23, 2011, RomeDanny Hendler
t
t'
5) t' rolls back transaction and goes to sleep on the condition variable of t
6) Eventually t commits and broadcasts on its condition variable…
A low-overhead serializer (cont'd)
3rd workshop on the Theory of Transactional Memory, Sep 22-23, 2011, RomeDanny Hendler
t'
A low-overhead serializer (cont'd)
3rd workshop on the Theory of Transactional Memory, Sep 22-23, 2011, RomeDanny Hendler
A low-overhead serializer (cont'd)Stabilization mechanism
Algorithm is adaptiveo Serializing mode / “Conventional’’ mode
Prevents “mode-oscillations”:o Shifting to serialization-mode reduces perceived
contentiono Should use two thresholds
3rd workshop on the Theory of Transactional Memory, Sep 22-23, 2011, RomeDanny Hendler
Throughput of CM conventional algorithms is low
CAR-STM over-serializes compared withlow-overhead serializer
Stabilization mechanism helps
A low-overhead serializer (cont'd)Experimental evaluation
3rd workshop on the Theory of Transactional Memory, Sep 22-23, 2011, RomeDanny Hendler
“Shrink” – collision prevention/avoidance
Predicts future accesses based on past accesseso Read-set predicted based on past few committed/aborted TXs
(temporal locality)o Write-set predicted based on immediately preceding aborted
TX
Serialize with a probability proportional to the number of threads currently serialized (serialization affinity), if thread's success rate is low and a collision is predicted
• “Preventing versus curing: avoiding conflicts in transactional memories”, [Dragojevic, Guerraoui, Singh and Singh, PODC'09]
3rd workshop on the Theory of Transactional Memory, Sep 22-23, 2011, RomeDanny Hendler
“Shrink” – collision prevention (cont'd)
Don't serialize when contention is low
Update statistics, release lock if you own it
Serialize only if contention is high &a collision is “predicted”
3rd workshop on the Theory of Transactional Memory, Sep 22-23, 2011, RomeDanny Hendler
“Shrink” – experimental evaluationSTMBench7, read-write workload
3rd workshop on the Theory of Transactional Memory, Sep 22-23, 2011, RomeDanny Hendler
Talk outline
Preliminaries The first TM schedulers Later user-land work Kernel support
3rd workshop on the Theory of Transactional Memory, Sep 22-23, 2011, RomeDanny Hendler
Scheduling Support for Transactional Memory Contention Management
Implement CM scheduling support in the kernel scheduler (Linux & OpenSolaris) (Strict) serialization Soft serialization Time-slice extension
Different mechanisms for communication between user-level STM library and kernel scheduler
• “Scheduling support for TM contention management”, [Maldonado, Felber, Fedorova, Hendler, Lawall, Marlier, Muller & Suissa PPoPP'10]
3rd workshop on the Theory of Transactional Memory, Sep 22-23, 2011, RomeDanny Hendler
TM Library / Kernel Communication via Shared Memory Segment (Ser-k algorithm)
User code notifies kernel on events such as: transaction start, commit and abort (in which case thread yields)
Kernel code handles moving thread between ready and blocked queues
3rd workshop on the Theory of Transactional Memory, Sep 22-23, 2011, RomeDanny Hendler
Soft Serialization
Instead of blocking, reduce loser thread priority and yield
Efficient in scenarios where loser transactions may take a different execution path when retrying
Priority should be restored upon commit or when conflicting transactions terminate
3rd workshop on the Theory of Transactional Memory, Sep 22-23, 2011, RomeDanny Hendler
Time-slice extention
Preemption in the midst of a transaction increases conflict “window of vulnerability”
Defer preemption of transactional threads avoid CPU monopolization by bounding number of
extensions and yielding after commit
May be combined with serialization/soft serialization
3rd workshop on the Theory of Transactional Memory, Sep 22-23, 2011, RomeDanny Hendler
Evaluation (STMBench7, 16-core AMD
Opterom)
Conventional CM deteriorates when
threads>cores
Serializing by local spinning is efficient as long as threads ≤ cores
3rd workshop on the Theory of Transactional Memory, Sep 22-23, 2011, RomeDanny Hendler
Evaluation - STMBench7 throughput
Serializing by sleeping on condition var is best when threads>cores, since system call overhead is negligible
(long transactions)All strict serialization schemes significantly reduce aborts
3rd workshop on the Theory of Transactional Memory, Sep 22-23, 2011, RomeDanny Hendler
“Transactional scheduling for read-dominated workloads” [Attiya & Milani, OPODIS'09]
“Taking the heat off transactions: dynamic selection of pessimistic concurrency control [Sonmez, Harris, Cristal, Unsal & Valeo, IPDPS'09]
“Proactive transaction scheduling for contention management”[Blake, Dreslinky & Mudge, MICRO'09]
“Improving performance by reducing aborts in HTM”[Ansari, Khan, Lujan, Kotselidis, Kirkham and Watson, HIPEAC'10]
“Window-based greedy contention management for TM”[Sharma, Estrade & Busch, DC'10]
“On Transaction Scheduling in distributed TM systems][Kim & Ravindran, 2010]
“Kernel-assisted Scheduling and Deadline Support for STM”[Maldonado, Marlier, Felber, Lawall, Muller & Riviere, DSN'11]
“Adaptive thread scheduling techniques for improving scalability of STM”[Chan, Lam & Wang, 2011]
…
Additional TM scheduling work
3rd workshop on the Theory of Transactional Memory, Sep 22-23, 2011, RomeDanny Hendler
Scheduling support for TMConclusions & future work
Scheduling-based CM results in improved throughput under high contentiono Overhead is negligible when contention is low
Lightweight kernel support can improve performance and efficiency for some workloads
Dynamically selecting best CM algorithm for workload at hand is a challenging research directiono Machine learning?
3rd workshop on the Theory of Transactional Memory, Sep 22-23, 2011, RomeDanny Hendler
Thank you.
3rd workshop on the Theory of Transactional Memory, Sep 22-23, 2011, RomeDanny Hendler