Automatic Data Partitioning in Software Transactional Memories
Torvald Riegel, Christof Fetzer, Pascal Felber
(TU Dresden, Germany / Uni Neuchatel, Switzerland)
2
No one-size-fits-all TM!
- STMs:
  - Design: invisible vs. visible reads, object-based vs. word-based
  - Parameters: lock-based: #locks, address→lock mapping
- HTMs:
  - Different interfaces (e.g., Rock vs. AMD's ASF)
  - Resource bounds
- Heterogeneous workloads: global tuning does not help
- Divide and conquer!?
3
How to divide
- User-driven? Hmm, rather not …
- Temporally
  - Runtime tuning can handle phases …
  - But only if the whole workload has the same phases
- Memory
  - "Word-based": mapping function is difficult
    - Runtime overheads
    - Mapping needs to be stable
    - Memory allocator affects mapping heavily (see false conflicts)
  - "Object-based": still need mapping or per-object data
- Code
  - Problem: same function might operate on different data
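To make the word-based mapping problem concrete, here is a minimal sketch of an address→lock mapping and of how two unrelated addresses end up on the same lock (a false conflict). The lock count, the shift, and all names (`lock_index`, `false_conflict`, `LOCK_SHIFT`) are assumptions chosen for illustration, not TinySTM's actual code.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical parameters: 2^20 locks, one lock per 8-byte stripe. */
#define LOCK_ARRAY_LOG_SIZE 20u
#define LOCK_ARRAY_SIZE     ((size_t)1 << LOCK_ARRAY_LOG_SIZE)
#define LOCK_SHIFT          3u

/* Map an address to an index into a global lock array. */
static size_t lock_index(uintptr_t addr)
{
    return (size_t)((addr >> LOCK_SHIFT) & (LOCK_ARRAY_SIZE - 1));
}

/* Two addresses in different stripes conflict falsely if they share a lock. */
static int false_conflict(uintptr_t a, uintptr_t b)
{
    return (a >> LOCK_SHIFT) != (b >> LOCK_SHIFT)
        && lock_index(a) == lock_index(b);
}
```

Because the index depends only on address bits, where the memory allocator places objects directly determines which locations collide.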
4
How to conquer?
- Tune concurrency control mechanisms
  - Use different STM implementations
  - Use HTM only where applicable/necessary
  - Tune TM parameters per partition
  - Challenge: threads must agree on which mechanisms to use for each item/location!
  - Two-phase commit or similar is necessary when using several independent TM mechanisms
- Improve mapping/partitioning at other levels
  - E.g., location→lock mapping
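The two-phase commit requirement can be sketched as follows: every TM mechanism touched by a transaction first votes in a prepare phase, and only if all vote yes does any of them commit. The participant interface (`prepare`, `commit`, `rollback`) and the mock participant are hypothetical and exist only to illustrate the protocol; the slides do not specify such an interface.

```c
#include <stddef.h>

/* Hypothetical per-mechanism interface for one transaction. */
typedef struct {
    int  (*prepare)(void *state);   /* nonzero: this mechanism can commit */
    void (*commit)(void *state);
    void (*rollback)(void *state);
    void *state;
} tm_participant_t;

/* Phase 1: ask everyone to prepare. Phase 2: commit all, or roll back all. */
static int commit_all(tm_participant_t *parts, size_t n)
{
    size_t i, j;
    for (i = 0; i < n; i++) {
        if (!parts[i].prepare(parts[i].state)) {
            for (j = 0; j < n; j++)
                parts[j].rollback(parts[j].state);
            return 0;
        }
    }
    for (i = 0; i < n; i++)
        parts[i].commit(parts[i].state);
    return 1;
}

/* Mock mechanism used only to exercise the protocol. */
typedef struct { int can_commit, committed, rolled_back; } mock_tm_t;

static int  mock_prepare(void *s)  { return ((mock_tm_t *)s)->can_commit; }
static void mock_commit(void *s)   { ((mock_tm_t *)s)->committed++; }
static void mock_rollback(void *s) { ((mock_tm_t *)s)->rolled_back++; }

static tm_participant_t mock_participant(mock_tm_t *m)
{
    tm_participant_t p = { mock_prepare, mock_commit, mock_rollback, m };
    return p;
}
```

A mechanism that cannot separate "prepare" from "commit" cannot take part in this scheme, which is why the lack of two-phase commit support is a real limitation.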
5
Data Partitioning
- Partition memory automatically
- We use Pool Allocation (Lattner et al., PLDI 05)
- Mixed compile-time/runtime technique:
  - Based on pointer analysis for C/C++
  - Nodes in points-to graph become partitions
  - Partitions are instantiated dynamically at runtime and supplied to called functions that use these partitions
- Memory allocator is not affected
- Implementation extends Tanger (STM compiler)
  - STM load/store functions get pointer to partition
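As a rough picture of what "load/store functions get pointer to partition" could look like after the compiler transformation, the sketch below passes a partition descriptor as an extra argument to a function and on to the barriers. The descriptor layout and the names `stm_load`, `stm_store`, and `increment` are invented for illustration; the barriers are placeholders that only count accesses and perform no concurrency control.

```c
#include <stdint.h>

/* Hypothetical partition descriptor, instantiated at runtime. */
typedef struct {
    const char   *name;
    unsigned long loads;
    unsigned long stores;
} partition_t;

/* Placeholder barriers: a real STM would do concurrency control here. */
static intptr_t stm_load(partition_t *p, const intptr_t *addr)
{
    p->loads++;
    return *addr;
}

static void stm_store(partition_t *p, intptr_t *addr, intptr_t value)
{
    p->stores++;
    *addr = value;
}

typedef struct { intptr_t value; } node_t;

/* The transformed function receives the partition of the data it touches,
 * so the same code can operate on data from different partitions. */
static void increment(partition_t *p, node_t *n)
{
    intptr_t v = stm_load(p, &n->value);
    stm_store(p, &n->value, v + 1);
}
```

Calling `increment` with two different descriptors keeps the accesses of, say, two tree instances separate even though they share one function body.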
6
Example: Points-to graph for STAMP's Vacation
[Figure: partial, simplified DS graph for main(). Annotations: node type, if known; a struct with 4 fields, 2 of which are pointers; a Red-Black Tree instance; a second Red-Black Tree instance.]
7
Conquering …
- Partition types determine STM implementation used per partition (TinySTM):
  - Multiple Locks (general purpose)
  - Single Shared Lock (infrequently updated partitions)
  - Single Exclusive Lock (low concurrency partitions)
  - Read-Only (no concurrency control necessary)
  - Thread-local, transaction-local
- Loads/stores dispatched to type-specific STM functions on each call
- Partition types and parameters can be tuned
  - E.g., read-only partitions get tuned on first write
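The per-call dispatch to type-specific functions might be sketched with a function-pointer table indexed by partition type. The enum, the table, and the function names are assumptions for illustration; the load functions are placeholders that simply read memory, where a real implementation would apply the concurrency control of the respective type.

```c
#include <stdint.h>

typedef enum {
    PART_MULTIPLE_LOCKS,
    PART_SINGLE_SHARED_LOCK,
    PART_SINGLE_EXCLUSIVE_LOCK,
    PART_READ_ONLY,
    PART_LOCAL,            /* thread-local or transaction-local */
    PART_TYPE_COUNT
} part_type_t;

typedef intptr_t (*load_fn)(const intptr_t *addr);

/* Placeholders: each would implement the scheme its name suggests. */
static intptr_t load_multi_lock(const intptr_t *addr)  { return *addr; }
static intptr_t load_single_lock(const intptr_t *addr) { return *addr; }
static intptr_t load_direct(const intptr_t *addr)      { return *addr; }

static const load_fn load_table[PART_TYPE_COUNT] = {
    [PART_MULTIPLE_LOCKS]        = load_multi_lock,
    [PART_SINGLE_SHARED_LOCK]    = load_single_lock,
    [PART_SINGLE_EXCLUSIVE_LOCK] = load_single_lock,
    [PART_READ_ONLY]             = load_direct,  /* no concurrency control */
    [PART_LOCAL]                 = load_direct
};

typedef struct { part_type_t type; } typed_partition_t;

/* Dispatch happens on every call, so retyping a partition takes effect
 * at the next access. */
static intptr_t part_load(const typed_partition_t *p, const intptr_t *addr)
{
    return load_table[p->type](addr);
}
```

The indirect call on every access is one source of the dispatch overhead listed under Challenges.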
8
Performance
[Figure: performance plots. Series: TinySTM w/o partitioning support, 2^20 / 2^24 locks; TinySTM with partitioning, 4 different tuning heuristics.]
- Exclusive Lock is faster than general-purpose STM
- Partitioning decreases false conflicts in lock array. Lock hash function gets a 2nd level at compile time.
- Partitioning adds runtime overhead
9
Performance (2)
[Figure: performance plots. Annotations:]
- Read-Only partitions during first phase of benchmark
- 5 x 256K locks
- 2^26 locks! (2^24 livelocks due to false conflicts)
10
Challenges
- Analysis:
  - Calls to libraries?
    - Points-to graphs can probably be attached to libs (local per-function analysis + callgraph)
    - Analysis is bottom-up on call-graph
- TM implementations that don't support two-phase commit
- Dispatch:
  - Runtime overheads
  - JIT?
  - Size of binaries
- Tuning partitions and partitioning
  - No direct feedback, partitioning results in even more parameters to be tuned
  - Partition selection / merging at compile-time/runtime
11
Questions?
Tanger + TinySTM + …:
http://tinystm.org
(send email for version with partitioning support)
12
Backup Slides
13
Are there partitions?
14
Partition Type Performance & Tuning Strategies
- Tuning strategy:
  - Start with read-only type
  - On reaching a certain number of aborts, switch to:
    1. Single Exclusive Lock
    2. Single Shared Lock
    3. Multiple Locks
- Part-1: switch directly to Multiple Locks
- Part-4: try other types first (single locks, fewer multiple locks)
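One possible reading of this tuning strategy, written as a small state machine: a partition starts as read-only and moves to the next type whenever its abort count reaches a threshold. The threshold, the `direct` flag (meant to mimic the Part-1 heuristic), and all names are assumptions; the slides do not give the actual thresholds or the exact escalation order used by each heuristic.

```c
typedef enum {
    T_READ_ONLY,
    T_SINGLE_EXCLUSIVE,
    T_SINGLE_SHARED,
    T_MULTIPLE_LOCKS
} tuned_type_t;

typedef struct {
    tuned_type_t type;
    unsigned aborts;           /* aborts since the last type change */
    unsigned abort_threshold;  /* hypothetical tuning parameter */
    int direct;                /* nonzero: jump straight to multiple locks */
} tuned_partition_t;

/* Called whenever a transaction aborts because of this partition. */
static void on_abort(tuned_partition_t *p)
{
    if (p->type == T_MULTIPLE_LOCKS)
        return;                          /* most general type, nothing left */
    if (++p->aborts < p->abort_threshold)
        return;                          /* not enough evidence yet */
    p->aborts = 0;
    p->type = p->direct ? T_MULTIPLE_LOCKS
                        : (tuned_type_t)(p->type + 1);
}
```

With `direct` set, a single threshold crossing lands on Multiple Locks; without it, the partition steps through the single-lock types first.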
15
Analysis
- We use Data Structure Analysis (DSA [1]):
  - Pointer analysis for the LLVM compiler framework
  - Creates a points-to graph with Data Structure (DS) nodes
  - Context-sensitive: data structures are distinguished based on call graphs
  - Field-sensitive: distinguishes between DS fields
  - Unification-based:
    - Pointers target a single node in the points-to graph
    - Information about pointers from different places gets merged
    - If the information is incompatible, the node is collapsed (= "nothing known")
  - Can safely analyze incomplete programs:
    - Calls to external / not analyzed functions have an effect only on the data that escapes into / from these functions (such data gets marked "External")
    - Analyzing more code increases analysis precision
[1] Chris Lattner, PhD thesis, 2005
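The unification step can be illustrated with a union-find structure: when two pointers may alias, their target nodes are merged into one, and a merged node is marked collapsed if the type information does not agree. This is a toy model with made-up names (`ds_graph_t`, `ds_unify`) and integer type ids; it is not DSA's actual data structure and it ignores fields and context sensitivity.

```c
#define MAX_NODES 16

typedef struct {
    int parent[MAX_NODES];     /* union-find forest */
    int type_id[MAX_NODES];    /* 0 = type unknown */
    int collapsed[MAX_NODES];  /* 1 = "nothing known" about layout */
} ds_graph_t;

static void ds_init(ds_graph_t *g)
{
    int i;
    for (i = 0; i < MAX_NODES; i++) {
        g->parent[i] = i;
        g->type_id[i] = 0;
        g->collapsed[i] = 0;
    }
}

/* Find the representative node, with path halving. */
static int ds_find(ds_graph_t *g, int n)
{
    while (g->parent[n] != n) {
        g->parent[n] = g->parent[g->parent[n]];
        n = g->parent[n];
    }
    return n;
}

/* Merge the nodes of a and b; collapse on incompatible type information. */
static int ds_unify(ds_graph_t *g, int a, int b)
{
    int ra = ds_find(g, a);
    int rb = ds_find(g, b);
    if (ra == rb)
        return ra;
    g->parent[rb] = ra;
    if (g->collapsed[rb] ||
        (g->type_id[ra] && g->type_id[rb] && g->type_id[ra] != g->type_id[rb]))
        g->collapsed[ra] = 1;
    else if (!g->type_id[ra])
        g->type_id[ra] = g->type_id[rb];
    return ra;
}
```

In the partitioning scheme each surviving representative node would correspond to one partition, so every extra unification makes the partitioning coarser.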
16
Analysis (2)
Integration into Tanger compilation process:
1. Compile and link program parts into LLVM intermediate representation module
2. Analyze module using DSA
   - Local intra-function analysis: per-function DS graph
   - Merge DS graphs bottom-up in callgraph (put callees' information into callers)
   - Merge DS graphs top-down in callgraph (vice versa)
3. Transactify module
   - Use DSA information to decide between object-based / word-based
   - Requirement: If memory chunk (DS node) is object-based, then it must be safe for object-based everywhere in the program
   - DSA can give us this guarantee
4. Link in STM library and generate native code