Hot cold splitting in LLVM
Aditya Kumar, Facebook
How does the density of an object affect its ability to float?
With apologies to the Tweeter...
“... but, yet, it's one of the most interesting things that happened in the LLVM optimizer this year.”
Anonymous Reviewer
Hot cold splitting
● Intro
● Regions
● Marking Edges
● Propagating Profile Info
● Extracting maximal region
● Experimental Results
● Opportunities for improvement
Regions
1. SESE (single entry, single exit)
2. SEME (single entry, multiple exits)
Image source: https://upload.wikimedia.org/wikipedia/commons/3/30/Some_types_of_control_flow_graphs.svg
[Figure: example SESE and SEME regions]
Converting SEME to SESE
Marking Edges
● Using static analysis
○ e.g., __builtin_expect, assertions, non-returning functions, catch-block
● Using dynamic profile information
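The static hints listed above can be illustrated at the source level. This is a minimal sketch, assuming a GCC- or Clang-compatible compiler; the function names (`fatal`, `checked_div`, `parse_digit`) are hypothetical and only show where such hints typically appear, not how the pass itself consumes them.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <stdexcept>

// Non-returning function: a block that ends in a call to it never rejoins
// normal control flow, so a compiler may treat that block as cold.
[[noreturn]] void fatal(const char *msg) {
    std::fprintf(stderr, "fatal: %s\n", msg);
    std::abort();
}

// __builtin_expect (GCC/Clang builtin) states that the condition is
// expected to be false, which marks the error edge as unlikely.
int checked_div(int num, int den) {
    if (__builtin_expect(den == 0, 0)) {
        fatal("division by zero");  // unlikely and non-returning
    }
    return num / den;
}

// An assertion failure path and a catch block are both exceptional paths.
int parse_digit(char c) {
    assert(c != '\0' && "caller must not pass the terminator");
    try {
        if (c < '0' || c > '9')
            throw std::invalid_argument("not a digit");
        return c - '0';
    } catch (const std::invalid_argument &) {
        return -1;  // only reached when the throw above fires
    }
}
```

Calling `checked_div(10, 2)` and `parse_digit('7')` exercises only the likely paths; the `fatal` call and the catch block are the parts a static heuristic could mark as cold.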
Propagating Profile Info
● Using dominance and post-dominance
CFG of ‘foo’
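The idea of propagating coldness through dominance and post-dominance can be sketched on a toy CFG. This is an illustration under stated assumptions, not the pass's actual implementation: the `CFG` type, `reachableAvoiding`, `propagateCold`, and `exampleCFG` are hypothetical names, every block is assumed reachable from the entry, and blocks without successors are treated as function exits. A block is marked cold if it cannot be reached from the entry without passing through the seed cold block (dominated), or if it cannot reach any exit without passing through the seed (post-dominated).

```cpp
#include <vector>

using CFG = std::vector<std::vector<int>>;  // CFG[b] = successors of block b

// Blocks reachable from `start` along paths that never enter `removed`.
std::vector<bool> reachableAvoiding(const CFG &g, int start, int removed) {
    std::vector<bool> seen(g.size(), false);
    if (start == removed) return seen;
    std::vector<int> work{start};
    seen[start] = true;
    while (!work.empty()) {
        int b = work.back();
        work.pop_back();
        for (int s : g[b]) {
            if (s != removed && !seen[s]) {
                seen[s] = true;
                work.push_back(s);
            }
        }
    }
    return seen;
}

// Mark the seed and every block it dominates or post-dominates as cold.
std::vector<bool> propagateCold(const CFG &g, int entry, int seed) {
    const int n = static_cast<int>(g.size());
    std::vector<bool> cold(n, false);
    cold[seed] = true;
    std::vector<bool> fromEntry = reachableAvoiding(g, entry, seed);
    for (int b = 0; b < n; ++b) {
        if (b == seed) continue;
        if (!fromEntry[b]) {  // every entry-to-b path goes through the seed
            cold[b] = true;
            continue;
        }
        std::vector<bool> fromB = reachableAvoiding(g, b, seed);
        bool reachesExit = false;
        for (int x = 0; x < n; ++x)
            if (fromB[x] && g[x].empty()) reachesExit = true;
        if (!reachesExit) cold[b] = true;  // every b-to-exit path hits the seed
    }
    return cold;
}

// Toy CFG: 0 entry, 1 hot, 2 leads only to 3, 3 seed, 4 follows 3, 5 exit.
CFG exampleCFG() {
    return {{1, 2}, {5}, {3}, {4}, {5}, {}};
}
```

With block 3 as the seed, block 2 is post-dominated by it and block 4 is dominated by it, so blocks 2, 3, and 4 form the cold set while 0, 1, and 5 stay hot. A production pass would use dominator trees rather than repeated reachability queries; the brute-force form is used here only to keep the sketch short.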
Extracting cold region
1. Find maximal region
2. Compute inputs and outputs
3. Extract as function
4. Add attributes
○ noinline, minsize, cold
CFG of ‘foo’
CFG of ‘foo.cold.1’
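The extraction steps above can be mimicked by hand at the source level. This is a sketch under assumptions, not compiler output: `foo_original`, `foo_split`, `foo_cold_1`, and the `COLD_ATTRS` macro are hypothetical, and the real outlined symbol would be named like `foo.cold.1`, which is not a valid C++ identifier. Values the cold region reads become parameters (inputs), and the value that is still needed after the region is returned through an out-parameter (output).

```cpp
#include <cstdio>

// minsize is a Clang attribute; other GNU-compatible compilers get only
// cold and noinline in this sketch.
#if defined(__clang__)
#define COLD_ATTRS __attribute__((cold, noinline, minsize))
#else
#define COLD_ATTRS __attribute__((cold, noinline))
#endif

// Before splitting: the error path sits inline in the hot function.
int foo_original(int x, int *errors) {
    if (x >= 0)
        return x * 2;  // hot path
    // Cold region: reads x and errors, produces the return value.
    *errors += 1;
    std::fprintf(stderr, "foo: negative input %d\n", x);
    return -1;
}

// The cold region as its own function. Inputs: x, errors.
// Output: the result, written through result_out.
COLD_ATTRS static void foo_cold_1(int x, int *errors, int *result_out) {
    *errors += 1;
    std::fprintf(stderr, "foo: negative input %d\n", x);
    *result_out = -1;
}

// After splitting: the hot function keeps only a call to the cold part.
int foo_split(int x, int *errors) {
    if (x >= 0)
        return x * 2;  // hot path, unchanged
    int result;
    foo_cold_1(x, errors, &result);
    return result;
}
```

Both versions return the same values and update `errors` identically; the difference is that the cold code no longer occupies space next to the hot path, and `noinline` keeps it from being merged back.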
Design decisions (implementing in the middle end)
Advantages
- Focus on the optimization and tuning
- Optimize cold functions for size
- Take advantage of (thin)LTO
- Helps all backend targets
- Low maintenance overhead
Drawbacks
- Misses architecture-specific opportunities
Applications benefitting from HotColdSplitting
High icache misses
- Code with lots of branches
- Smaller page size
High pre-main time
- Reduce startup working set
Experiment Evaluation
Experimental setup
- 2 step build with PGO or AutoFDO
Measurements
- Measure pre-main metrics e.g., page faults
- iCache misses (perf stat -e icache.misses)
- Field data
- Code size
Execution time
LLVM Testsuite
Code size
LLVM Testsuite
LLVM-testsuite (# of functions outlined)
LLVM Testsuite
LLVM testsuite (perf stat*)
* perf stat -e instructions,icache.misses (try `perf list` to find out other metrics of interest)
Impact
1. Enabled in Xcode, swift-llvm
2. iOS 13 shipped with hot cold splitting enabled
○ All core libraries e.g., libc++, libSystem, dyld, CoreFoundation, UIKit, SSL
Opportunities for improvement
1. Concepts of hot-cold
2. Outlining maximal regions
3. Improving static analysis
4. Improving Code Extractor
5. Tuning cost model for code-size
6. Merge Similar Function meets Hot Cold Splitting
7. Outlining regions post-dominated by non-returning function calls (D69257)
Concepts of hot-cold partitioning
Hot = interesting
Cold = not interesting
- Randomly outlining code
  - https://reviews.llvm.org/D65376
- Hard coding custom sub-graphs
  - Or pass as compiler flags
Outlining maximal regions
Merge Similar Function + Hot Cold Splitting
Schedule MergeSim after HotColdSplit
- May improve code-size with appropriate cost model
*Repaired the port of merge-similar-functions (MergeSim) to thinLTO https://reviews.llvm.org/D52896
Performance
Codesize
Acknowledgements
Vedant Kumar
Sebastian Pop
Teresa Johnson
Sergey Dmitriev
Krzysztof Parzyszek
References:
https://reviews.llvm.org/D50658
http://lists.llvm.org/pipermail/llvm-dev/2019-January/129606.html
$ c++filt __Z3fooi
foo(int)
$ c++filt __Z3fooi.cold.1
foo(int) (.cold.1)
$ c++filt __Z3fooi_cold
__Z3fooi_cold
Possible questions
● How does Hot Cold splitting perform in the absence of profile information, i.e. using only static analysis?
○ Depends on programmer annotations and programming-language features
○ Only 280 functions outlined in LLVM without profile information.
● Is this optimization now mature enough to be ON by default with PGO?
○ Issues with AssumptionCache, and CodeExtractor: PR40710, PR43424
● Difference in performance for C vs C++ applications?
○ Try-catch blocks
● Interaction with code layout optimization which reorders hot/warm BBs to reduce instruction cache misses
○ Reordering doesn’t change dominance
● Debuginfo support for this optimization
○ Reasonable?
● How to reduce code-size growth
○ Tune the number of function arguments to be created while splitting