Post on 31-Dec-2015
description
Transactional Memory
Why?
Avoid explicit locking Non-blocking: can’t deadlock Higher level abstraction for the programmer “The value of STM is that it allows you to focus on designing
applications which happen to scale instead of the mechanisms employed to scale those applications.”
CS5204 – Operating Systems 2
Shan Lu, et. al, Learning From Mistakes
Transactional Memory
What is TM?
CS5204 – Operating Systems 3
http://intranet.cs.man.ac.uk/apt/projects/TM/LeeRouting/
Transactional Memory
Hardware TM
Transactions operate on hardware data cache Changes to memory are committed atomically to other caches A natural extension of Load-Linked/Store-Conditional
LL/SC provides an atomic update to a single word of memory HTM extends LL/SC to create groups of atomic instructions
Generally, HTM systems are bounded Restricts the size of a transaction!
CS5204 – Operating Systems 5
Transactional Memory
Hardware TM Concepts
Snoopy Cache Coherence – Goodman 1983 Caches listen to the bus to detect changes to their
copy of the same cached address
CS5204 – Operating Systems 6
Shared Memory
Bus
address value state tag cache. . .
CPUCPU
Transactional Memory
Hardware TM Instructions
LT Read with NON-exclusive access (normal)
LTX Read with exclusive access Useful for when we anticipated writing to the read value soon
ST Write to memory
Transaction State Instructions: Validate, Commit, Abort XCOMMIT = Old vs. XABORT = New Processor flags TSTATUS and TACTIVE
CS5204 – Operating Systems 7
Transactional Memory
Quick Example
shared int counter;
void process(int work){
int success = 0, backoff = BACKOFF_MIN;
unsigned wait;
while (success < work) {
ST(&counter, LTX(&counter) + 1);
if (COMMIT()) {
success++;
backoff = BACKOFF_MIN;
}
else {
wait = random() % (1 << backoff);
while (wait--);
if (backoff < BACKOFF_MAX)
backoff++;
}
}
}
CS5204 – Operating Systems 8
Herlihy and Moss, Transactional Memory: Architectural Support for Lock-Free Data Structures
Transactional Memory
Hardware TM Concerns
Requires new hardware Possibility of starvation with BUSY signals
Augment with queuing mechanism
Separate Transactional Cache or not 1 cache: set size limits transaction size
But could use cache emulation in software Extra abort, etc. logic applies to entire larger cache Tradeoff between strong atomicity and efficiency
Hybrid Systems: VTM HASTM Hybrid TM
CS5204 – Operating Systems 9
Transactional Memory
Notable HTM Implementations
Simulations Sun’s Rock SPARC multicore processor
CS5204 – Operating Systems 10
SPARC Rock CPU HTM Test on Rock Surpasses STM
Transactional Memory
Performance
Simulation results show that transactional memory matches or outperforms the best known locking techniques for simple benchmarks, even in the absence of priority inversion, convoying, and deadlock. Fewer access to memory
Long transactions more likely to abort Large transactions more likely to exceed TM cache size
CS5204 – Operating Systems 11
Transactional Memory
Word-Based STM
CS5204 – Operating Systems 13
Shavit & Touitou Historical Each word of memory has an ownership record with old values Transactions can “help” each other
Harris & Fraser Track the deltas in transaction descriptors Atomically commit transactions to the ownership records
Transactional Memory
Object-Based STM
CS5204 – Operating Systems 14
Dynamic STM (Herlihy et. al.) Higher-level TM Objects FSTM (Fraser) Similar to Dynamic STM
Larus and Kozyrakis, Transactional Memory
Transactional Memory
Software TM Design
Closed nesting The child commits into the parent
Open nesting The child commits to the world
Other considerations: Direct/Deferred Update Early/Late Conflict Detection Conflict Resolution
E.g. Abort or Backoff Nesting Exceptions
Often ignored in STM designs
CS5204 – Operating Systems 15
figure adopted from [tcc-mcdonald-isca06]
Transactional Memory
Case Study: STM.NET
Microsoft’s experiment 2008-2010, then dropped Hook the C# JIT compiler, also investigated C++ Separate Haskell development Joe Duffy’s retrospective on unbounded STM:
1. Applying transactions to intrinsically non-transactional operations (I/O) Reading a block or file from the FS, output to the console, entry in the Event Log,
web service calls, etc.
2. Weak vs. Strong Atomicity Weak: Non-TM regions seeing the results of TM regions
3. Privatization
4. “Where is the killer App?”… most applications naturally parallel?
CS5204 – Operating Systems 16
“But with the wisdom of age and hindsight, I do believe limited forms of TM could be wildly successful at particular tasks and yet would have avoided many of the biggest challenges with unbounded TM.”
Transactional Memory
Other Notable STM Implementations
Intel STM Compiler Prototype (C/C++) Sun DSTM2 (Java factories) Several libraries from Harris & Fraser Other languages: Haskell, LISP, Clojure, C#, OCaml, Perl
CS5204 – Operating Systems 17
Transactional Memory
Summary
Hardware TM Make memory access atomic by holding in a transactional cache Caches for each CPU cooperate in determining use of memory locations Faster
Software TM Allows for a larger transactions and more design flexibility than HTM Both word and object level granularity Many possible design choices:
Strong/Weak Atomicity Granularity Conflict detection Nested (Open or Closed)
Real-world implementations continue Issues with particular design choices of STM.NET
CS5204 – Operating Systems 18