HTM in the wild
Konrad Lai June 2015
Industrial Considerations for HTM
• Provide a clear benefit to customers
• Improve performance & scalability
• Ease programmability going forward
• Improve something common and fundamental
• Widely used critical section/lock-based primitives
• In an easy to use and deploy manner
• Minimal eco-system impact and effort
• Clean architectural boundaries
• While managing HW design and validation complexity
HTM [Mechanism]
• 1993 HTM paper, Herlihy & Moss
• 2001 Lock elision, Rajwar & Goodman
• 2003 STM, TM [programming model], …
• 2006 1st TRANSACT
• Commercial Implementations
– 2011 IBM Blue Gene/Q
– 2012 IBM zEC12 mainframe
– 2013 Intel 4th generation Core (Haswell)
– 2014 IBM POWER8
– 2015 Intel Xeon E7 v3, 4-way and 8-way SMP
• 1993 idea plus 2001 usage model
– Lock Elision
– Probabilistic lock free
• 2003 onward is still work in progress
HTM Features Convergence
• Convergence over basic functionalities…
– Best effort HTM
– Leverage cache coherency protocol/cache(s)
– Strong Isolation
– Hardware buffering
– Reasonable buffer size
– No instruction count limit
– Checkpoint of Registers
– Implicitly Transactional
• Some differences…
– IBM BGQ supports thread speculation
– IBM zEC supports constrained transactions
– IBM POWER8 supports suspend/resume
– IBM zEC/POWER8 support non-txn stores (but differently)
– IBM POWER8 supports Recovery Only Transactions
– TX capacity varies medium to large
Lemming Effect
XA : xbegin; test; xabort; (retry loop when lock is busy)
L—U: Lock; critical section; Unlock (non-transactional execution)
T1 --AL------------UXAXAXAXAXAssssssssssssssssL----------UXAXA
T2 ---AXAXAXAXAXAsssL------------UXAXAXAXAXAXAssssssssssssL---
T3 ---AXAXAXAXAXAsssssssssssssssssL----------UXAXAXAXAXAssssss
Persistent convoy of non-transactional execution
Elision is effectively disabled until all threads have serially released the lock
– Disabled forever if at least 1 thread is holding the lock
Fix is simple
– Don’t retry until the lock is free
– Use well-known test-and-test-&-set pattern
T1 --AL------------UX-------
T2 ---AsssssssssssssX-------
T3 ---AsssssssssssssX-------
The lemming effect appears in far too many refereed papers
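The waiting half of the fix can be sketched in portable C++. This is a minimal test-and-test-and-set spin lock, not code from the talk: the names `TtasLock` and `count_with_lock` are hypothetical, and the transactional path (xbegin/xabort) is deliberately left out because it needs RTM-capable hardware. In an elided lock, the same read-only wait loop would presumably sit in the abort handler, so that a thread re-issues xbegin only after it has seen the lock free.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Test-and-test-and-set spin lock (hypothetical name, illustration only).
class TtasLock {
public:
    void lock() {
        for (;;) {
            // "Test": wait with plain loads until the lock looks free,
            // so waiters only read the lock word instead of writing it.
            while (held_.load(std::memory_order_relaxed)) {
                std::this_thread::yield();
            }
            // "Test-and-set": try to acquire only after seeing it free.
            if (!held_.exchange(true, std::memory_order_acquire)) {
                return;
            }
        }
    }

    void unlock() {
        assert(held_.load(std::memory_order_relaxed));  // must be held
        held_.store(false, std::memory_order_release);
    }

private:
    std::atomic<bool> held_{false};
};

// Hypothetical demo helper: several threads increment one counter
// inside the critical section; returns the final count.
long count_with_lock(int threads, int iters) {
    TtasLock lock;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&lock, &counter, iters] {
            for (int i = 0; i < iters; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& th : pool) {
        th.join();
    }
    return counter;
}
```

Calling `count_with_lock(4, 10000)` should return 40000 if mutual exclusion holds. The design point is the inner read-only loop: a thread that keeps attempting the acquire (or keeps starting transactions that immediately abort) while the lock is held gains nothing and disturbs the holder.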
Intel TSX Case Studies: Databases
• HPCA 2014: Improving In-Memory Database Index Performance with Intel® Transactional Synchronization Extensions. Tomas Karnagel, Roman Dementiev, Ravi Rajwar, Konrad Lai, Thomas Legler, Benjamin Schlegel, Wolfgang Lehner (Intel, SAP AG and TU Dresden)
• EuroSys 2014: Using Restricted Transactional Memory to Build a Scalable In-Memory Database. Zhaoguo Wang, Hao Qian, Jinyang Li, Haibo Chen (Fudan University, Shanghai Jiao Tong University, New York University)
• TKDE 2015: Scaling HTM-Supported Database Transactions to Many Cores. Viktor Leis, Alfons Kemper, Thomas Neumann (TU München)
A Case Study: Two Index Implementations
• Read-Only Queries on Dual Socket Intel® Xeon® E5-2680 Server
[Figure: two speedup plots. Left: B+Tree Index (a common index implementation). Right: Delta Storage Index (from the SAP HANA® database). y-axis: speedup. Series in each plot: no lock, RW Lock, Spin Lock.]
Hidden Scalability Impact of Atomic Read-Modify-Write Operations
• SAP HANA Database
– Read optimized column store database system
• Two index implementations
– B+Tree [Data Structure]
  – Common implementation
  – Smaller footprint
– Delta Storage Index (B+Tree with a Dictionary)
  – Complex data structure with additional structures
  – Large footprint
• Locks protect access
– Reader-Writer
– Spin Lock
Case Study: Index Tree Implementations
Initial Results: B+Tree
• Intel TSX provides significant gains with no application changes
– Outperforms RW lock on read-only queries
– Significant gains with increasing inserts (6x for 50%)
Intel® Core™ i7 processor with 4 physical cores / 8 logical cores (HT)
[Figure: relative speedup (y-axis, 0 to 10) vs. insert operations in % (x-axis, 0 to 100). Series: No Concurrency Control, Spin Lock Elision w/ TSX, RW Lock, Spin Lock.]
Initial Results: Delta Storage Index
• Intel TSX provides gains with no application changes
– Different profile as compared to B+Tree
– Spin lock w/ Intel TSX better than RW Lock when > 5% insert
– Significant gap as compared to no concurrency control
• Baseline should implement good retry policy on aborts
Intel® Core™ i7 processor with 4 physical cores / 8 logical cores (HT)
[Figure: relative performance (y-axis, 0 to 7) vs. insert operations in % (x-axis, 0 to 100). Series: No Concurrency Control, Spin Lock Elision w/ TSX, Spin Lock Elision w/ TSX-no-retries, RW Lock, Spin Lock.]
Software Transformations
• Capacity Aborts
– Node/Leaf Search Scan
– Causes O(n) random lookups
– Transformation – Binary Search
– Causes O(log(n)) random lookups
• Data Conflicts
– Single dictionary
– Global memory allocator
– Transformation – Multiple Dictionaries, per-thread/core allocators
Well Known Transformations
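The capacity-abort transformation can be illustrated with a small model that counts how many distinct cache lines each search strategy reads within one sorted node. This is a sketch under stated assumptions, not the SAP HANA code: 64-byte cache lines, 8-byte keys, a node whose key array starts on a line boundary, and hypothetical names (`linear_scan`, `binary_search_node`, `make_sorted_keys`). Fewer lines read means a smaller transactional read set.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

constexpr std::size_t kLineBytes = 64;  // assumed cache-line size

struct Probe {
    std::size_t index;          // first position with key >= target
    std::size_t lines_touched;  // distinct cache lines read (model)
};

// Cache line holding element i, assuming a line-aligned key array.
inline std::size_t line_of(std::size_t i) {
    return i * sizeof(std::uint64_t) / kLineBytes;
}

// Scan from the front: reads every line up to the hit, O(n).
Probe linear_scan(const std::vector<std::uint64_t>& keys,
                  std::uint64_t target) {
    std::set<std::size_t> lines;
    std::size_t i = 0;
    for (; i < keys.size(); ++i) {
        lines.insert(line_of(i));
        if (keys[i] >= target) break;
    }
    return {i, lines.size()};
}

// Lower-bound binary search: reads O(log n) elements, hence lines.
Probe binary_search_node(const std::vector<std::uint64_t>& keys,
                         std::uint64_t target) {
    assert(std::is_sorted(keys.begin(), keys.end()));
    std::set<std::size_t> lines;
    std::size_t lo = 0, hi = keys.size();
    while (lo < hi) {
        std::size_t mid = lo + (hi - lo) / 2;
        lines.insert(line_of(mid));
        if (keys[mid] < target) lo = mid + 1; else hi = mid;
    }
    return {lo, lines.size()};
}

// Hypothetical test data: n sorted even keys 0, 2, 4, ...
std::vector<std::uint64_t> make_sorted_keys(std::size_t n) {
    std::vector<std::uint64_t> keys(n);
    for (std::size_t i = 0; i < n; ++i) keys[i] = 2 * i;
    return keys;
}
```

For a 1024-key node and a search for the last key, the scan reads all 128 lines of the array in this model, while the binary search probes at most 11 elements. Both return the same position, so the transformation changes the footprint, not the result.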
[Figure: relative performance (y-axis, 0 to 7) vs. insert operations in % (x-axis, 0 to 100). Series: No Concurrency Control, Spin Lock Elision w/ TSX-R (tuned), Spin Lock Elision w/ TSX (previous), RW Lock, Spin Lock.]
Tuned Results: Delta Storage Index
• Intel TSX provides significant gains with transformations
– Restores read-only query performance
– Spin lock w/ Intel TSX significantly outperforms RW lock (5x for 50% inserts)
– Close to “No Concurrency Control”
4-way Intel Xeon E7 v3 with and without TSX
TUM HyPer
• Break up DB Txn
– Small HTM txns
• HTM Txn
– Sync access to DS
• Use timestamp to “commit” DB Txn
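One plausible reading of these bullets is classic timestamp ordering, where each tuple access is a short atomic step and the per-tuple timestamps decide whether the enclosing database transaction may proceed. The sketch below is an assumption-laden illustration, not HyPer's implementation: the names `Tuple`, `ts_read` and `ts_write` are hypothetical, a `std::mutex` stands in for the small hardware transaction so the code runs on any machine, and commit bookkeeping such as undo logs is omitted.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

// One record with timestamp-ordering metadata (hypothetical layout).
struct Tuple {
    std::int64_t value = 0;
    std::uint64_t read_ts = 0;   // largest timestamp that has read it
    std::uint64_t write_ts = 0;  // timestamp of the last writer
    std::mutex latch;            // stand-in for one small HTM txn
};

// Read on behalf of the DB transaction with timestamp ts (ts >= 1).
// Returns false if timestamp order is violated, i.e. a transaction
// with a larger timestamp has already written the tuple.
bool ts_read(Tuple& t, std::uint64_t ts, std::int64_t& out) {
    assert(ts > 0);  // 0 is reserved for "never accessed"
    std::lock_guard<std::mutex> guard(t.latch);
    if (ts < t.write_ts) return false;
    if (ts > t.read_ts) t.read_ts = ts;
    out = t.value;
    return true;
}

// Write on behalf of the DB transaction with timestamp ts (ts >= 1).
// Returns false if a transaction with a larger timestamp has already
// read or written the tuple.
bool ts_write(Tuple& t, std::uint64_t ts, std::int64_t v) {
    assert(ts > 0);
    std::lock_guard<std::mutex> guard(t.latch);
    if (ts < t.read_ts || ts < t.write_ts) return false;
    t.value = v;
    t.write_ts = ts;
    return true;
}
```

A `false` return means the database transaction would have to abort and restart with a fresh timestamp. Each guarded region touches a single tuple, which is what keeps the hardware transaction small in this reading.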
TUM HyPer Result – 2-way Xeon EP
TM Programming Model (C++TM)
• Is this a research toy?
– No – not even a toy as few play with it
– Take this out of the glass cage, and play with it
– Should we ban or boycott STAMP as workload ;-)
• Did not address issues raised in 2005
– Conditional synchronization
– Open and/or closed nesting
– Escape actions
– Inter-operate with other paradigms, e.g. locks
• Is the current set sufficient?
– Need broad usage experience
– Does this limit holistic performance?
• New issue – TM and persistent memory
Better support for critical sections?
• Even C++11 is not good enough
• Tight definition of critical section (or sync block)
– Not just a coding convention
– Enable efficient application of lock elision
– Enable other transformations, like Hybrid Lock Elision
• How about adding lock declaration to C++TM synchronization block?
– Semi-automatic code refactoring needed
– Could be stepping stone to transactions
• Do we need a cleaner threading library?
– Pthread has high overhead