Hybrid STM/HTM for Nested Transactions on OpenJDK
-
Upload
antony-hosking -
Category
Software
-
view
263 -
download
3
Transcript of Hybrid STM/HTM for Nested Transactions on OpenJDK
Hybrid STM/HTM for Nested Transactions on OpenJDK
Keith ChapmanPurdue U
Tony HoskingANU/Data61, Purdue U
Eliot MossUMass
MotivationSTM has been around for ages
• But STM is slow
Commodity hardware for transactions available now
• But HTM approaches are only best effort
Out goal: Accelerate STM with HTM when possible
Nested TransactionsAllow composition of transactions
Two flavors
• Closed Nested Transactions
• HTM inside STM not useful; must do all the same work
• Open Nested Transactions
• HTM inside STM avoids most TM work – HTM well-suited!
XJ: Transactional JavaXJ Language
• Supports flat and closed/open/boosted nested transactions
The implementation supports hybrid transactions
• HTM support via Intel TSX
• HTM and STM can co-exist
Uses an Optimistic-reads / Pessimistic-writes protocol
TM Metadata
Txn Log
0 0Version # 0 0 0Txn ID 1
One metadata word for every object
STM Transaction Protocol (Read)
T1 Log
0 00 0
STM Transaction Protocol (Read)
T1 Log
0 00 0
Check Mode
STM Transaction Protocol (Read)
T1 Log
0 0
0 0
T1 Read
STM Transaction Protocol (Read)
T1 Log
0 0
0 0
T1 Validate
STM Transaction Protocol (Read)
T1 Log
0 0
T1 Commit
0 0
STM Transaction Protocol (Write)
T1 Log
0 0
STM Transaction Protocol (Write)
T1 Log
Check Mode
T1 1
0 0
STM Transaction Protocol (Write)
T1 Log
T1 Write
T1 1
0 0
STM Transaction Protocol (Write)
T1 Log
T1 Write
T1 1
0 0
STM Transaction Protocol (Write)
T1 Log
T1 Write
T1 11 0
STM Transaction Protocol (Write)
T1 Log
T1 Commit
STM Conflict Detection
T1 Log
0 00 0
T2 Log
0 0
STM Conflict Detection
T1 Log
0 0
0 0
T2 Log
0 0
T1 Read
STM Conflict Detection
T1 Log
0 0
0 0
T2 Log
0 0
T2 Write
T2 1
STM Conflict Detection
T1 Log
0 0
0 0
T2 Log
0 0
T1 Validate
T2 1
STM Conflict Detection
T1 Log
0 0
T2 Log
0 0
T1 Abort
T2 1
Hybrid Transaction ProtocolSTM – HTM conflicts detected by lock word accesses
• Explicit XABORT if locked by another transaction
• HTM reads – Read the metadata word
• STM writes modify the metadata word
• Causes HTM to abort
• HTM writes – Increment version number
• Causes STM read invalidation / HTM abort
Abstract Locking &
Undo Operations
Abstract Locks for STM
Open / Boosted Atomic Method Body
Acquire abstract locks
Release abstract locks
If top level transaction
Open / Boosted Atomic Method Body
Acquire abstract locks
Log abstract locks
If nested transaction
Log undo operations
Abstract Locks for Hybrid TM
Open / Boosted Atomic Method Body
Validate abstract locks
Open / Boosted Atomic Method Body
Acquire abstract locks
Log abstract locks
If top level is HTM
Log undo operations
Why Validation Works• HTM vs STM
• If abstract locks conflict they must touch some same physical words in the abstract locking data structure — otherwise they could not detect the conflict
Open / Boosted Atomic Method Body
Validate abstract locks
Why Validation Works• HTM vs STM
• If abstract locks conflict they must touch some same physical words in the abstract locking data structure — otherwise they could not detect the conflict
• HTM vs HTM
• No conflict in the locking data structure because all accesses to it are reads
• Any real conflicts that exist will occur on the actual data structure
Open / Boosted Atomic Method Body
Validate abstract locks
STM and HTM Methods
STM needs logging HTM doesn't
Different actions during read/write
Different actions for abstract locks
HTM should fall back to STM
STM and HTM Methods
STM needs logging HTM doesn't
Different actions during read/write
Different actions for abstract locks
HTM should fall back to STM
Maintain separate HTM and STM versions of methods
STM Method Variants
Original(Non-txnal)
STM Method Variants
Original(Non-txnal)
Can callA B
STM Method Variants
Original(Non-txnal)
Top-level Txn STM
Can callA B
STM Method Variants
Original(Non-txnal)
Top-level Txn STM
Can callA B
STM Method Variants
STM Transactionalized
Original(Non-txnal)
Top-level Txn STM
Can callA B
STM Method Variants
STM Transactionalized
Original(Non-txnal)
Top-level Txn STM
Can callA B
STM Method Variants
STM Transactionalized
Original(Non-txnal)
New nested Txn STM
Top-level Txn STM
Can callA B
STM Method Variants
STM Transactionalized
Original(Non-txnal)
New nested Txn STM
Top-level Txn STM
Can callA B
STM Method Variants
STM Transactionalized
Original(Non-txnal)
New nested Txn STM
Top-level Txn STM
Can callA B
STM Method Variants
STM Transactionalized
Original(Non-txnal)
New nested Txn STM
Top-level Txn STM
Can callA B
Hybrid TM Method VariantsOriginal
(Non-txnal)
Hybrid TM Method VariantsOriginal
(Non-txnal)Can callA B
Hybrid TM Method Variants
Router
Original(Non-txnal)Can call
A B
Hybrid TM Method Variants
Router
Original(Non-txnal)Can call
A B
Hybrid TM Method Variants
Top-level Txn STM
Router
Original(Non-txnal)Can call
A B
Hybrid TM Method Variants
Top-level Txn HTM
Top-level Txn STM
Router
Original(Non-txnal)Can call
A B
Hybrid TM Method Variants
HTM Transactionalized
Top-level Txn HTM
Top-level Txn STM
Router
Original(Non-txnal)Can call
A B
Hybrid TM Method Variants
HTM Transactionalized
Top-level Txn HTM
Top-level Txn STM
Router
Original(Non-txnal)Can call
A B
Hybrid TM Method Variants
Optimized New nested Txn
HTMHTM
Transactionalized
Top-level Txn HTM
Top-level Txn STM
Router
Original(Non-txnal)Can call
A B
Hybrid TM Method Variants
Optimized New nested Txn
HTMHTM
Transactionalized
Top-level Txn HTM
Top-level Txn STM
Router
Original(Non-txnal)Can call
A B
Hybrid TM Method Variants
Optimized New nested Txn
HTMHTM
Transactionalized
Top-level Txn HTM
Top-level Txn STM
Router
Original(Non-txnal)Can call
A B
Hybrid TM Method Variants
Optimized New nested Txn
HTMHTM
Transactionalized
Top-level Txn HTM
STMTransactionalized
Top-level Txn STM
Router
Original(Non-txnal)Can call
A B
Hybrid TM Method Variants
Optimized New nested Txn
HTMHTM
Transactionalized
Top-level Txn HTM
STMTransactionalized
Top-level Txn STM
Router
Original(Non-txnal)Can call
A B
Hybrid TM Method Variants
Optimized New nested Txn
HTMHTM
Transactionalized
Top-level Txn HTM
STMTransactionalized
Top-level Txn STM
New nestedTxn STM
Router
Original(Non-txnal)Can call
A BParent is closed
A B
Hybrid TM Method Variants
Optimized New nested Txn
HTMHTM
Transactionalized
Top-level Txn HTM
STMTransactionalized
Top-level Txn STM
New nestedTxn STM
Router
Original(Non-txnal)Can call
A BParent is closed
A B
Hybrid TM Method Variants
Optimized New nested Txn
HTMHTM
Transactionalized
Top-level Txn HTM
STMTransactionalized
Top-level Txn STM
New nestedTxn STM
Router
Original(Non-txnal)Can call
A BParent is closed
A B
Hybrid TM Method Variants
Optimized New nested Txn
HTMHTM
Transactionalized
Top-level Txn HTM
STMTransactionalized
Top-level Txn STM
New nestedTxn STM
Router Nested Router
Original(Non-txnal)Can call
A BParent is closed
A B
Parent is openA B
Hybrid TM Method Variants
Optimized New nested Txn
HTMHTM
Transactionalized
Top-level Txn HTM
STMTransactionalized
Top-level Txn STM
New nestedTxn STM
Router Nested Router
Original(Non-txnal)Can call
A BParent is closed
A B
Parent is openA B
Hybrid TM Method Variants
Optimized New nested Txn
HTMHTM
Transactionalized
Top-level Txn HTM
New nestedTxn HTM
STMTransactionalized
Top-level Txn STM
New nestedTxn STM
Router Nested Router
Original(Non-txnal)Can call
A BParent is closed
A B
Parent is openA B
Hybrid TM Method Variants
Optimized New nested Txn
HTMHTM
Transactionalized
Top-level Txn HTM
New nestedTxn HTM
STMTransactionalized
Top-level Txn STM
New nestedTxn STM
Router Nested Router
Original(Non-txnal)Can call
A BParent is closed
A B
Parent is openA B
Hybrid TM Method Variants
Optimized New nested Txn
HTMHTM
Transactionalized
Top-level Txn HTM
New nestedTxn HTM
STMTransactionalized
Top-level Txn STM
New nestedTxn STM
Router Nested Router
Original(Non-txnal)Can call
A BParent is closed
A B
Parent is openA B
XJ System Architecture
XJ System Architecture
XJ source code
XJ System Architecture
XJ source code XJ Compilerstandard Java
bytecode
compile
XJ System Architecture
XJ source code XJ Compilerstandard Java
bytecode XJ Rewriterbytecode
+ run-time calls
compile load
XJ System Architecture
XJ source code
XJ run-time library
XJ Compilerstandard Java
bytecode XJ Rewriterbytecode
+ run-time callsHTM-enabled
JVM
compile load run
XJ System Architecture
HTM 4-5 times faster than STM
XJ source code
XJ run-time library
XJ Compilerstandard Java
bytecode XJ Rewriterbytecode
+ run-time callsHTM-enabled
JVM
compile load run
VM ModificationsKept to a minimum
Modifications done on OpenJDK:
• Native methods to begin, end, and abort a HW transaction
• Made them intrinsic to the HotSpot C1/C2 compilers
Had to go through several hoops to get HTM to work with HotSpot’s optimizing compilers
Results
SynchrobenchMicro-benchmarks to evaluate synchronization performance on various data structures
Added ability to run multiple operations within a single transaction (group size)
Included XJ versions of the benchmarks
• TransactionalFriendlyTreeSet
48-way, Intel Xeon E5-2690 v3 machine with 2 sockets of 12 hyperthreaded cores
Thro
ughp
ut (N
orm
aliz
ed)
0
0.2
0.4
0.6
0.8
1
Threads1 2 4 8 12 16 20 24 28 32 36 40 44 48
Thro
ughp
ut (N
orm
aliz
ed)
0
0.2
0.4
0.6
0.8
1
Threads1 2 4 8 12 16 20 24 28 32 36 40 44 48
Open nestedClosed nested
Group size 1
5% Updates
Thro
ughp
ut (N
orm
aliz
ed)
0
0.2
0.4
0.6
0.8
1
Threads1 2 4 8 12 16 20 24 28 32 36 40 44 48
Thro
ughp
ut (N
orm
aliz
ed)
0
0.2
0.4
0.6
0.8
1
Threads1 2 4 8 12 16 20 24 28 32 36 40 44 48
Group size 1Group size 2
Open nestedClosed nested 5% Updates
Thro
ughp
ut (N
orm
aliz
ed)
0
0.2
0.4
0.6
0.8
1
Threads1 2 4 8 12 16 20 24 28 32 36 40 44 48
Thro
ughp
ut (N
orm
aliz
ed)
0
0.2
0.4
0.6
0.8
1
Threads1 2 4 8 12 16 20 24 28 32 36 40 44 48
Group size 1Group size 2Group size 4
Open nestedClosed nested 5% Updates
Thro
ughp
ut (N
orm
aliz
ed)
0
0.2
0.4
0.6
0.8
1
Threads1 2 4 8 12 16 20 24 28 32 36 40 44 48
Thro
ughp
ut (N
orm
aliz
ed)
0
0.2
0.4
0.6
0.8
1
Threads1 2 4 8 12 16 20 24 28 32 36 40 44 48
Group size 1Group size 2Group size 4
Group size 8
Open nestedClosed nested 5% Updates
Thro
ughp
ut (N
orm
aliz
ed)
0
0.2
0.4
0.6
0.8
1
Threads1 2 4 8 12 16 20 24 28 32 36 40 44 48
Thro
ughp
ut (N
orm
aliz
ed)
0
0.2
0.4
0.6
0.8
1
Threads1 2 4 8 12 16 20 24 28 32 36 40 44 48
Group size 1Group size 2Group size 4
Group size 8Group size 16
Open nestedClosed nested 5% Updates
Thro
ughp
ut (N
orm
aliz
ed)
0
0.2
0.4
0.6
0.8
1
Threads1 2 4 8 12 16 20 24 28 32 36 40 44 48
Thro
ughp
ut (N
orm
aliz
ed)
0
0.2
0.4
0.6
0.8
1
Threads1 2 4 8 12 16 20 24 28 32 36 40 44 48
Group size 1Group size 2Group size 4
Group size 8Group size 16Group size 32
Open nestedClosed nested 5% Updates
0
50
100
150
200
1 2 4 8 12 16 20 24 28 32 36 40 44 48 1 2 4 8 12 16 20 24 28 32 36 40 44 48 1 2 4 8 12 16 20 24 28 32 36 40 44 48
Group size 1 Group size 2 Group size 4
com
mitt
ed o
ps a
nd a
borte
d tx
ns (1
06 )
threads
open htm commitsclosed htm commits
open stm commitsclosed stm commits
htm aborts
ConclusionsSTM and HTM can co-exist for nested transactions in Java
• Closed nesting — Similar to previous schemes
• Open nesting — Novel validation mechanism
• Implemented in OpenJDK on Intel TSX — Artifact evaluated
ConclusionsSTM and HTM can co-exist for nested transactions in Java
• Closed nesting — Similar to previous schemes
• Open nesting — Novel validation mechanism
• Implemented in OpenJDK on Intel TSX — Artifact evaluated
ConclusionsSTM and HTM can co-exist for nested transactions in Java
• Closed nesting — Similar to previous schemes
• Open nesting — Novel validation mechanism
• Implemented in OpenJDK on Intel TSX — Artifact evaluated
When it works, HTM is ~4-5× faster than STM
ConclusionsSTM and HTM can co-exist for nested transactions in Java
• Closed nesting — Similar to previous schemes
• Open nesting — Novel validation mechanism
• Implemented in OpenJDK on Intel TSX — Artifact evaluated
When it works, HTM is ~4-5× faster than STM
Open nesting increases the envelope of effectiveness for HTM
ConclusionsSTM and HTM can co-exist for nested transactions in Java
• Closed nesting — Similar to previous schemes
• Open nesting — Novel validation mechanism
• Implemented in OpenJDK on Intel TSX — Artifact evaluated
When it works, HTM is ~4-5× faster than STM
Open nesting increases the envelope of effectiveness for HTM
Production VM would need deeper modification