Scheduling-based TM Contention Management A survey talk

Scheduling-based TM Contention ManagementA survey talk

3rd workshop on the Theory of Transactional Memory, Sep 22-23, 2011, Rome

Danny Hendler Ben-Gurion university

Danny Hendler

TM: a research toy?

TM

“TM:Why it is only a research toy.”

Cascaval et al., 2008

“Why STM can be more than a research toy.”

Dragojevic et al., 2009

There is consensus that TM performance must be improved. Specifically:

“In some workloads, performance degraded when we used too many concurrent threads. One possible alternative to improving performance in these cases would be to modify the thread scheduler so it avoids running more concurrent threads than is optimal for a given workload, based on the

information provided by the STM runtime. “ [Dragojevic et al.]

3rd workshop on the Theory of Transactional Memory, Sep 22-23, 2011, RomeDanny Hendler

TM scheduling: rationale

Transactional threads controlled by TM-aware schedulero Kernel-level, user-level

Richer “tool-box“ for reducing and/or preventing transaction conflicts

Improve performance under high-contention


TM system

Non TM-scheduledthreads

Contention

Manager

ContentionDetection

arbitrate

proceed

Abort/retry, wait

“Conventional” (non-scheduling) contention management

greedy

Aggressive

Karma

Polka

Suicide

Polite

“Polymorphic contention management”, [Guerraoui, Herlihy & Pochon DISC'05]


Conventional Contention Management is often problematic

Loser resumes execution after a waiting period May resume execution too early May resume execution too late

Repeated collisions occur under high contention Livelocks Performance may become worse than single lock

Scheduling-based CM to the rescue.


Talk outline

Preliminaries The first TM schedulers Later user-land work Kernel support


“Adaptive Transaction Scheduling for transactional memory systems” [Yoo & Lee, SPAA'08]

“CAR-STM: Scheduling-based collision avoidance and resolution for software transactional memory” [Dolev, Hendler & Suissa, PODC '08]

“Steal-on-abort: dynamic transaction reordering to reduce conflicts in transactional memory” [Ansari , Jarvis, Kirkham, Kotsedilis, Lujan and Watson, HiPEAC'09]

The first TM schedulers


A single scheduling queue

Per-thread Contention Intensity (CI) computed

Adaptive mechanism CI below threshold transaction begins normally CI above threshold transaction serialized (queued)

Adaptive Transaction Scheduling (ATS)(Yoo & Lee, SPAA'08)


time

Contention Intensity

ThresholdTimeline 1

Timeline 2

Timeline 3

Timeline 4

Timeline 5

Queued TransactionExecuting Transaction

Timeline flows from top to bottom

An average of all the CIs from running threads

Transactions begin execution without resorting to the scheduler

As contention starts to increase,some transactions call the scheduler

As more transactions get serialized,contention intensity starts to decrease

Contention intensity subsides below threshold

More transactions start without the scheduler to exploit more parallelism

ATS adaptively varies the number of concurrent transactions according to the dynamic contention feedback

Behavior of a Queue-Based Scheduler

ATS: adaptive parallelism control

Yoo & Lee, Transaction Scheduling.

CAR-STM (Collision Avoidance and Resolution for STM) (Dolev, Hendler & Suissa, PODC'08)

Per-core transaction queues

Serialize conflicting transactions

Contention avoidance: attempt to avoid even first collision


CAR-STM high-level architecture

Transaction queue #1

TQ thread

TQ thread

Transaction thread

T-Info

Core #1

Serializing

contention mgr.

Dispatcher

Collision

Avoider

Core #k

Transaction queue #k


Execution time: STMBench7*

R/W dominated workloads

(*) “STMBench7: a benchmark for STM”, [Guerraoui, Kapalka & Viteck., Eurosys'07]

Shortcomings of first TM schedulers

May restrict parallelism too much

ATS: a single serialization queue

CAR-STM: at most a single transactional thread per coreo High overheads even in the lack of contention


Talk outline



A low-overhead serializer

Avoid repeated collisions while minimizing over-serialization

No per-core queues

Adaptive

• “On the impact of serializing contention management on STM performance”, [Heber, Hendler & Suissa., opodis'09]

• “Scheduling support for TM contention management”, [Maldonado, Felber, Fedorova, Hendler, Lawall, Marlier, Muller & Suissa PPoPP'10]


Transactional threads

Conditionvariables

A low-overhead serializer (cont'd)


1) t Identifies a collision

2) t calls contention manager: ABORT_OTHER

3) t changes status of t' to ABORT (writes that t is winner)

tt'

4) t' identifies it was aborted



t

t'

5) t' rolls back transaction and goes to sleep on the condition variable of t

6) Eventually t commits and broadcasts on its condition variable…



t'



A low-overhead serializer (cont'd)Stabilization mechanism

Algorithm is adaptiveo Serializing mode / “Conventional’’ mode

Prevents “mode-oscillations”:o Shifting to serialization-mode reduces perceived

contentiono Should use two thresholds


Throughput of CM conventional algorithms is low

CAR-STM over-serializes compared withlow-overhead serializer

Stabilization mechanism helps

A low-overhead serializer (cont'd)Experimental evaluation


“Shrink” – collision prevention/avoidance

Predicts future accesses based on past accesseso Read-set predicted based on past few committed/aborted TXs

(temporal locality)o Write-set predicted based on immediately preceding aborted

TX

Serialize with a probability proportional to the number of threads currently serialized (serialization affinity), if thread's success rate is low and a collision is predicted

• “Preventing versus curing: avoiding conflicts in transactional memories”, [Dragojevic, Guerraoui, Singh and Singh, PODC'09]


“Shrink” – collision prevention (cont'd)

Don't serialize when contention is low

Update statistics, release lock if you own it

Serialize only if contention is high &a collision is “predicted”


“Shrink” – experimental evaluationSTMBench7, read-write workload


Talk outline



Scheduling Support for Transactional Memory Contention Management

Implement CM scheduling support in the kernel scheduler (Linux & OpenSolaris) (Strict) serialization Soft serialization Time-slice extension

Different mechanisms for communication between user-level STM library and kernel scheduler

• “Scheduling support for TM contention management”, [Maldonado, Felber, Fedorova, Hendler, Lawall, Marlier, Muller & Suissa PPoPP'10]


TM Library / Kernel Communication via Shared Memory Segment (Ser-k algorithm)

User code notifies kernel on events such as: transaction start, commit and abort (in which case thread yields)

Kernel code handles moving thread between ready and blocked queues


Soft Serialization

Instead of blocking, reduce loser thread priority and yield

Efficient in scenarios where loser transactions may take a different execution path when retrying

Priority should be restored upon commit or when conflicting transactions terminate


Time-slice extention

Preemption in the midst of a transaction increases conflict “window of vulnerability”

Defer preemption of transactional threads avoid CPU monopolization by bounding number of

extensions and yielding after commit

May be combined with serialization/soft serialization


Evaluation (STMBench7, 16-core AMD

Opterom)

Conventional CM deteriorates when

threads>cores

Serializing by local spinning is efficient as long as threads ≤ cores


Evaluation - STMBench7 throughput

Serializing by sleeping on condition var is best when threads>cores, since system call overhead is negligible

(long transactions)All strict serialization schemes significantly reduce aborts


“Transactional scheduling for read-dominated workloads” [Attiya & Milani, OPODIS'09]

“Taking the heat off transactions: dynamic selection of pessimistic concurrency control [Sonmez, Harris, Cristal, Unsal & Valeo, IPDPS'09]

“Proactive transaction scheduling for contention management”[Blake, Dreslinky & Mudge, MICRO'09]

“Improving performance by reducing aborts in HTM”[Ansari, Khan, Lujan, Kotselidis, Kirkham and Watson, HIPEAC'10]

“Window-based greedy contention management for TM”[Sharma, Estrade & Busch, DC'10]

“On Transaction Scheduling in distributed TM systems][Kim & Ravindran, 2010]

“Kernel-assisted Scheduling and Deadline Support for STM”[Maldonado, Marlier, Felber, Lawall, Muller & Riviere, DSN'11]

“Adaptive thread scheduling techniques for improving scalability of STM”[Chan, Lam & Wang, 2011]

…

Additional TM scheduling work


Scheduling support for TMConclusions & future work

Scheduling-based CM results in improved throughput under high contentiono Overhead is negligible when contention is low

Lightweight kernel support can improve performance and efficiency for some workloads

Dynamically selecting best CM algorithm for workload at hand is a challenging research directiono Machine learning?


Thank you.


Scheduling-based TM Contention Management A survey talk

Documents

Transcript of Scheduling-based TM Contention Management A survey talk