Adaptive Optimization in the Jalapeño JVM
M. Arnold, S. Fink, D. Grove,
M. Hind, P. Sweeney
Presented by Andrew Cove
15-745 Spring 2006
Jalapeño JVM
• Research JVM developed at IBM T.J. Watson Research Center
• Extensible system architecture based on federation of threads that communicate asynchronously
• Supports adaptive multi-level optimization with low overhead
  – Statistical sampling
Contributions
• Extensible adaptive optimization architecture that enables online feedback-directed optimization
• Adaptive optimization system that uses multiple optimization levels to improve performance
• Implementation and evaluation of feedback-directed inlining based on low-overhead sample data
• Doesn’t require programmer directives
Jalapeño JVM - Details
• Written in Java
  – Optimizations applied not only to the application and libraries, but to the JVM itself
  – Bootstrapped
    • Boot image contains core Jalapeño services precompiled to machine code
    • Doesn't need to run on top of another JVM
• Subsystems
  – Dynamic Class Loader
  – Dynamic Linker
  – Object Allocator
  – Garbage Collector
  – Thread Scheduler
  – Profiler
    • Online measurement system
  – 2 Compilers
Jalapeño JVM - Details
• 2 Compilers
  – Baseline
    • Translates bytecodes directly into native code by simulating Java's operand stack
    • No register allocation
  – Optimizing Compiler
    • Linear scan register allocation
    • Converts bytecodes into IR, which it uses for optimizations
    • 3 levels of optimization
    • …
• Compile-only
  – Compiles all methods to native code before execution
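The following minimal Java sketch shows what "simulating Java's operand stack" means for the baseline compiler. It assumes a simplified, hypothetical bytecode subset (`iconst`, `iload`, `iadd`, `istore`) and a made-up pseudo-assembly output; it is not Jalapeño's baseline compiler. Each bytecode is translated in isolation into code that pushes and pops an in-memory stack. The scratch registers live only inside one bytecode's template, which is why no register allocation is needed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mini-bytecode: an opcode name plus an optional integer operand.
record Bytecode(String op, int arg) {}

final class BaselineSketch {
    // Translate each bytecode independently into pseudo-assembly that mirrors
    // the JVM operand stack in memory. Scratch registers r1/r2 never carry a
    // value from one bytecode to the next, so no register allocation is done.
    static List<String> translate(List<Bytecode> code) {
        List<String> out = new ArrayList<>();
        for (Bytecode b : code) {
            switch (b.op()) {
                case "iconst" -> {            // push a constant
                    out.add("li   r1, " + b.arg());
                    out.add("push r1");
                }
                case "iload" -> {             // push a local variable
                    out.add("load r1, local[" + b.arg() + "]");
                    out.add("push r1");
                }
                case "iadd" -> {              // pop two operands, push their sum
                    out.add("pop  r2");
                    out.add("pop  r1");
                    out.add("add  r1, r1, r2");
                    out.add("push r1");
                }
                case "istore" -> {            // pop into a local variable
                    out.add("pop  r1");
                    out.add("store r1, local[" + b.arg() + "]");
                }
                default -> throw new IllegalArgumentException("unsupported: " + b.op());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // x = x + 1, with x in local slot 0
        List<Bytecode> code = List.of(
            new Bytecode("iload", 0), new Bytecode("iconst", 1),
            new Bytecode("iadd", 0), new Bytecode("istore", 0));
        translate(code).forEach(System.out::println);
    }
}
```

Note the `push r1` / `pop r1` pair produced where `iadd` meets `istore`. An optimizing compiler with register allocation would keep the sum in a register instead.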
Jalapeño JVM - Details
• Optimizing Compiler (without online feedback)
  – Level 0: Optimizations performed during conversion
    • Copy, Constant, Type, Non-Null propagation
    • Constant folding, arithmetic simplification
    • Dead code elimination
    • Inlining
    • Unreachable code elimination
    • Eliminate redundant null checks
    • …
  – Level 1
    • Common Subexpression Elimination
    • Array bounds check elimination
    • Redundant load elimination
    • Inlining (size heuristics)
    • Global flow-insensitive copy and constant propagation, dead assignment elimination
    • Scalar replacement of aggregates and short arrays
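Two of the Level 0 transformations, constant folding and arithmetic simplification, can be illustrated with a small Java sketch over a hypothetical expression IR (`Const`, `Var`, `Add`, `Mul`). Jalapeño's real IR is register-based and much richer. This sketch only shows the idea of a bottom-up rewrite.

```java
// Hypothetical expression IR, for illustration only.
interface Expr {}
record Const(int value) implements Expr {}
record Var(String name) implements Expr {}
record Add(Expr left, Expr right) implements Expr {}
record Mul(Expr left, Expr right) implements Expr {}

final class Folder {
    // Simplify children first, then fold constant operations and apply
    // algebraic identities. Dropping an operand (e.g. x * 0 -> 0) is safe
    // here only because this IR has no side effects.
    static Expr fold(Expr e) {
        if (e instanceof Add a) {
            Expr l = fold(a.left()), r = fold(a.right());
            if (l instanceof Const cl && r instanceof Const cr) {
                return new Const(cl.value() + cr.value());   // constant folding
            }
            if (isConst(l, 0)) return r;                      // 0 + x = x
            if (isConst(r, 0)) return l;                      // x + 0 = x
            return new Add(l, r);
        }
        if (e instanceof Mul m) {
            Expr l = fold(m.left()), r = fold(m.right());
            if (l instanceof Const cl && r instanceof Const cr) {
                return new Const(cl.value() * cr.value());   // constant folding
            }
            if (isConst(l, 0) || isConst(r, 0)) return new Const(0); // x * 0 = 0
            if (isConst(l, 1)) return r;                      // 1 * x = x
            if (isConst(r, 1)) return l;                      // x * 1 = x
            return new Mul(l, r);
        }
        return e; // Const and Var are already in simplest form
    }

    private static boolean isConst(Expr e, int v) {
        return e instanceof Const c && c.value() == v;
    }

    public static void main(String[] args) {
        // x * (0 + 1) simplifies to x
        System.out.println(fold(new Mul(new Var("x"), new Add(new Const(0), new Const(1))))); // Var[name=x]
    }
}
```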
Jalapeño JVM - Details
• Optimizing Compiler (without online feedback)
  – Level 2
    • SSA-based flow-sensitive optimizations
    • Array SSA optimizations
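The core idea behind the SSA-based Level 2 optimizations is that every variable is renamed so it has exactly one definition. The Java sketch below shows only that renaming step, for straight-line code. It assumes a hypothetical three-address `Stmt` record. Real SSA construction, including Jalapeño's, must also place phi nodes where control flow merges, and that step is not modeled here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical three-address statement: target = left op right.
record Stmt(String target, String left, String op, String right) {}

final class StraightLineSsa {
    // Give every assignment a fresh versioned name and rewrite each use to the
    // latest version. Straight-line code only: no branches, so no phi nodes.
    static List<Stmt> rename(List<Stmt> code) {
        Map<String, Integer> version = new HashMap<>();
        List<Stmt> out = new ArrayList<>();
        for (Stmt s : code) {
            String l = use(s.left(), version);   // rewrite uses before the new definition
            String r = use(s.right(), version);
            int v = version.merge(s.target(), 1, Integer::sum);
            out.add(new Stmt(s.target() + "_" + v, l, s.op(), r));
        }
        return out;
    }

    // Literals and names not assigned earlier in the block are left unchanged.
    private static String use(String name, Map<String, Integer> version) {
        Integer v = version.get(name);
        return v == null ? name : name + "_" + v;
    }

    public static void main(String[] args) {
        // x = a + 1; y = x * 2; x = x + y
        List<Stmt> code = List.of(
            new Stmt("x", "a", "+", "1"),
            new Stmt("y", "x", "*", "2"),
            new Stmt("x", "x", "+", "y"));
        rename(code).forEach(System.out::println);
    }
}
```

Because each name now has a single definition, a flow-sensitive fact (such as "x_1 is a constant") can be attached to the name itself rather than to every program point where x is live.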
Jalapeño Adaptive Optimization System (AOS)
• Sample-based profiling drives optimized recompilation
• Exploit runtime information beyond the scope of a static model
• Multi-level and adaptive optimizations
  – Balance optimization effectiveness with compilation overhead to maximize performance
• 3 Component Subsystems (asynchronous threads)
  – Runtime Measurement
  – Controller
  – Recompilation
  – Database (3 + 1 = 3?)
Subsystems – Runtime Measurement
• Sample-driven program profile
  – Instrumentation
  – Hardware monitors
  – VM instrumentation
  – Sampling
    • Timer interrupts trigger yields between threads
    • Method-associative counters updated at yields
  – Triggers controller at threshold levels
• Data processed by organizers
  – Hot method organizer
    • Tells controller the time-dominant methods that aren't fully optimized
  – Decay organizer
    • Decreases sample weights to emphasize recent data
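The sampling pipeline described above can be sketched in Java as follows. Every name and number here is hypothetical: `MethodSampler`, `onYield`, the batch size, the hotness fraction and the decay factor. The timer-driven yield is simulated by direct calls to `onYield`. The controller is signalled once a fixed number of samples has accumulated. The decay organizer multiplies every weight by a constant. The hot method organizer reports methods that hold at least a given share of the decayed samples and are not yet at the top optimization level.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class MethodSampler {
    static final int MAX_OPT_LEVEL = 2;          // optimizing levels 0..2, as in the slides

    private final Map<String, Double> weights = new HashMap<>();   // method -> decayed sample weight
    private final Map<String, Integer> optLevel = new HashMap<>(); // method -> level (-1 = baseline)
    private final int samplesPerBatch;           // hypothetical controller-trigger threshold
    private int pendingSamples = 0;

    MethodSampler(int samplesPerBatch) { this.samplesPerBatch = samplesPerBatch; }

    // Called when a timer-driven yield happens inside `method`.
    // Returns true when enough samples have accumulated to wake the controller.
    boolean onYield(String method) {
        weights.merge(method, 1.0, Double::sum);
        if (++pendingSamples >= samplesPerBatch) {
            pendingSamples = 0;
            return true;
        }
        return false;
    }

    // Decay organizer: scale all weights down so recent samples dominate.
    void decay(double factor) {
        weights.replaceAll((m, w) -> w * factor);
    }

    void setOptLevel(String method, int level) { optLevel.put(method, level); }

    // Hot method organizer: methods with at least `hotFraction` of the total
    // weight that are not yet fully optimized, sorted by name.
    List<String> hotMethods(double hotFraction) {
        double total = weights.values().stream().mapToDouble(Double::doubleValue).sum();
        List<String> hot = new ArrayList<>();
        for (Map.Entry<String, Double> e : weights.entrySet()) {
            boolean fullyOptimized = optLevel.getOrDefault(e.getKey(), -1) >= MAX_OPT_LEVEL;
            if (total > 0 && e.getValue() / total >= hotFraction && !fullyOptimized) {
                hot.add(e.getKey());
            }
        }
        hot.sort(null);
        return hot;
    }

    public static void main(String[] args) {
        MethodSampler s = new MethodSampler(10);
        for (int i = 0; i < 8; i++) s.onYield("loop");
        for (int i = 0; i < 2; i++) s.onYield("init");
        System.out.println(s.hotMethods(0.5)); // [loop]
    }
}
```

Decay is what lets a phase change show up quickly. In the test below, "init" only becomes hot at a 0.6 threshold because the older "loop" samples were halved first.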
Hotness
• A hot method is one in which the program spends a large share of its time
• Hot edges are used later on to determine which calls are worth inlining
• In both cases, hotness is a function of the number of samples taken
  – In a method
  – In a given callee from a given caller
• The system can adaptively adjust hotness thresholds
  – To reduce optimization during startup
  – To encourage optimization of more methods
  – To reduce analysis time when too many methods are hot
Subsystems – Controller
• Coordinates the other components of the AOS
  – Directs data monitoring
  – Creates organizer threads
  – Chooses whether to recompile based on data and a cost/benefit model
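Subsystems – Controller
• To recompile or not to recompile?
• For a method m currently at optimization level i, find the level j that minimizes the expected future running time of the recompiled m, $C_j + T_j$, where $C_j$ is the cost of recompiling m at level j and $T_j$ is the expected time m will spend executing afterwards
• If $C_j + T_j < T_i$, where $T_i$ is the expected future time spent in m if it stays at level i, recompile m at level j
• Assume, arbitrarily, that the program will run for twice its current duration, so its predicted future running time $T_f$ equals the time elapsed so far
• $T_i = T_f \cdot P_m$, where $P_m$ is the estimated percentage of future time spent in m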
Subsystems – Controller
• System estimates the effectiveness of each optimization level as a constant, based on offline measurements
• Uses a linear model of compilation speed for each optimization level as a function of method size
  – Linearity of higher-level optimizations?
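The controller's recompilation decision can be sketched in Java. This is a sketch under stated assumptions, not Jalapeño's code:

- For a method m at level i, the expected future time in m is $T_i = T_f \cdot P_m$, where the future running time $T_f$ is taken to equal the elapsed time.
- The constant per-level effectiveness is read as a constant speedup $S_k$ over baseline code, so the time after recompiling at level j is assumed to be $T_j = T_i \cdot S_i / S_j$.
- The compile cost $C_j$ is assumed proportional to method size.

The level with the smallest $C_j + T_j$ is chosen, and only if that sum is below $T_i$. The class name, the level names and every numeric constant are hypothetical.

```java
final class CostBenefitSketch {
    // Hypothetical constants. Index 0 = baseline, 1..3 = optimizing levels 0..2.
    static final String[] NAMES = {"baseline", "O0", "O1", "O2"};
    // Assumed constant speedup of each level over baseline code.
    static final double[] SPEEDUP = {1.0, 4.0, 5.0, 5.5};
    // Assumed compile rate in bytecode bytes per millisecond (linear cost model).
    static final double[] COMPILE_RATE = {Double.POSITIVE_INFINITY, 40.0, 10.0, 4.0};

    /**
     * Returns the level index to recompile at, or -1 to leave the method alone.
     * @param cur       current level index i of method m
     * @param elapsedMs program running time so far; future time Tf is assumed equal to it
     * @param pm        estimated fraction of future time spent in m (Pm)
     * @param sizeBytes bytecode size of m
     */
    static int chooseLevel(int cur, double elapsedMs, double pm, int sizeBytes) {
        double tf = elapsedMs;                    // "program will run for twice its current duration"
        double ti = tf * pm;                      // Ti = Tf * Pm
        int best = -1;
        double bestCost = ti;                     // recompile only if Cj + Tj < Ti
        for (int j = cur + 1; j < SPEEDUP.length; j++) {  // consider only higher levels
            double cj = sizeBytes / COMPILE_RATE[j];      // linear compile-time model
            double tj = ti * SPEEDUP[cur] / SPEEDUP[j];   // constant-speedup model
            if (cj + tj < bestCost) {
                bestCost = cj + tj;
                best = j;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        for (double t : new double[] {100, 10_000, 100_000}) {
            int j = chooseLevel(0, t, 0.2, 2000);
            System.out.println(t + " ms -> " + (j < 0 ? "stay" : NAMES[j]));
        }
    }
}
```

With these made-up constants, a 2000-byte baseline method with $P_m = 0.2$ stays put after 100 ms, moves to O0 after 10 s, and goes straight to O2 after 100 s. Expensive levels only pay off once the predicted future running time is long.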
Subsystems – Recompilation
• In theory
  – Multiple compilation threads that invoke compilers
  – Can occur in parallel with the application
• In practice
  – Single compilation thread
• Some JVM services require the master lock
  – Multiple compilation threads are not effective
  – Lock contention between compilation and application threads
  – Left as a footnote!
• Recompilation times are stored to improve time estimates in the cost/benefit analysis
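Here is a minimal Java sketch of the single-compilation-thread design. It assumes a hypothetical `CompilationPlan` record that the controller enqueues. One worker thread drains a `BlockingQueue`, "compiles" each plan (simulated by a callback that returns elapsed nanoseconds), and accumulates the observed bytes and time per level, so later cost/benefit estimates could use a measured compile rate. The names and the rate-averaging scheme are assumptions. The slides state only that recompilation times are stored.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.ToLongFunction;

// Hypothetical unit of work posted by the controller.
record CompilationPlan(String method, int level, int sizeBytes) {}

final class RecompilationThread {
    private static final CompilationPlan POISON = new CompilationPlan("", -1, 0); // shutdown marker

    private final BlockingQueue<CompilationPlan> queue = new LinkedBlockingQueue<>();
    // level -> {total bytes compiled, total nanoseconds spent}
    private final Map<Integer, long[]> stats = new ConcurrentHashMap<>();
    private final Thread worker;

    // `compiler` stands in for a real compiler: it compiles a plan and returns elapsed nanoseconds.
    RecompilationThread(ToLongFunction<CompilationPlan> compiler) {
        worker = new Thread(() -> {
            try {
                while (true) {
                    CompilationPlan p = queue.take();      // block until the controller posts work
                    if (p == POISON) return;
                    long nanos = compiler.applyAsLong(p);
                    stats.compute(p.level(), (lvl, s) -> {  // record time for future estimates
                        long[] acc = (s == null) ? new long[2] : s;
                        acc[0] += p.sizeBytes();
                        acc[1] += nanos;
                        return acc;
                    });
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "recompilation");
        worker.start();
    }

    void submit(CompilationPlan p) { queue.add(p); }

    // Finish all queued work, then stop the worker thread.
    void shutdown() {
        queue.add(POISON);
        try { worker.join(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }

    // Observed compile rate in bytes per millisecond, or NaN if nothing was compiled at this level.
    double observedRate(int level) {
        long[] s = stats.get(level);
        return (s == null || s[1] == 0) ? Double.NaN : s[0] / (s[1] / 1_000_000.0);
    }

    public static void main(String[] args) {
        // Simulated compiler: pretend each byte takes 0.1 ms (100,000 ns) to compile.
        RecompilationThread rt = new RecompilationThread(p -> p.sizeBytes() * 100_000L);
        rt.submit(new CompilationPlan("loop", 2, 100));
        rt.submit(new CompilationPlan("get", 2, 300));
        rt.shutdown();
        System.out.println(rt.observedRate(2)); // 10.0
    }
}
```

A single consumer thread also serializes compilation. This is the practical outcome the slides describe when the master lock makes more compilation threads ineffective.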
Feedback-Directed Inlining
• Statistical samples of method calls are used to build a dynamic call graph (DCG)
  – Traverse the call stack at yields
• Identify hot edges
  – Recompile caller methods with the callee inlined (even if the caller was already optimized)
• Decay old edges
• Adaptive Inlining Organizer
  – Determines hot edges and hot methods worth recompiling with the method call inlined
  – Weights inline rules with a boost factor
    • Based on the number of calls on the call edge and a previous study of the effects of removing call overhead
    • Future work: more sophisticated heuristic
• Seems obvious: new inline optimizations don't eliminate old inlines
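A minimal Java sketch of the dynamic call graph and hot-edge selection follows. It uses hypothetical names throughout (`DynamicCallGraph`, `recordStackSample`, the hot-edge fraction). At each sampled yield, the top two frames of the call stack give one (caller, callee) edge sample. Old weights are decayed, and edges whose share of the total weight reaches a threshold are reported as inlining candidates. The boost-factor weighting from the slides is not modeled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// A call edge in the dynamic call graph (hypothetical representation).
record CallEdge(String caller, String callee) {}

final class DynamicCallGraph {
    private final Map<CallEdge, Double> weights = new HashMap<>();

    // stack.get(0) is the innermost (currently executing) method.
    void recordStackSample(List<String> stack) {
        if (stack.size() < 2) return;                 // no caller, so no edge
        CallEdge e = new CallEdge(stack.get(1), stack.get(0));
        weights.merge(e, 1.0, Double::sum);
    }

    // Age old edges so the graph tracks recent behavior.
    void decay(double factor) {
        weights.replaceAll((e, w) -> w * factor);
    }

    // Edges carrying at least `hotFraction` of all edge weight: inlining candidates.
    List<CallEdge> hotEdges(double hotFraction) {
        double total = weights.values().stream().mapToDouble(Double::doubleValue).sum();
        List<CallEdge> hot = new ArrayList<>();
        for (Map.Entry<CallEdge, Double> en : weights.entrySet()) {
            if (total > 0 && en.getValue() / total >= hotFraction) hot.add(en.getKey());
        }
        return hot;
    }

    public static void main(String[] args) {
        DynamicCallGraph g = new DynamicCallGraph();
        for (int i = 0; i < 9; i++) g.recordStackSample(List.of("get", "loop", "main"));
        g.recordStackSample(List.of("init", "main"));
        System.out.println(g.hotEdges(0.5)); // [CallEdge[caller=loop, callee=get]]
    }
}
```

An organizer built on this would then ask the controller to recompile `loop` with `get` inlined, even if `loop` is already optimized.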
Experimental Methodology
• System
  – Dual 333 MHz PPC processors, 1 GB memory
  – Timer interrupts at 10 ms intervals
  – Recompilation organizer runs between twice per second and once every 4 s
  – DCG and adaptive inline organizer run every 2.5 s
  – Method sample half-life: 1.7 s
  – Edge weight half-life: 7.3 s
• Benchmarks
  – SPECjvm98
  – Jalapeño Optimizing Compiler
  – Volano chat room simulator
• Startup and steady-state measurements
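The half-lives above imply per-period decay factors. Assume the weights are multiplied by a constant factor $d$ once every organizer period $\Delta t$; this is an assumption, since the slides give only half-lives and periods. A half-life $h$ then requires $d^{h/\Delta t} = 1/2$:

```latex
d = 2^{-\Delta t / h}, \qquad
\text{e.g. edge weights with } \Delta t = 2.5\,\text{s},\ h = 7.3\,\text{s}:\quad
d = 2^{-2.5/7.3} \approx 0.789
```

So under this assumption, each edge weight keeps roughly 79% of its value per 2.5 s organizer run.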
Results
• Compile-time overhead plays a large role in startup
Results
• Multilevel adaptive does well (and the JITs don't have overhead)
Results
• Startup doesn’t reach high enough optimization level to benefit
Questions
• Assuming execution time will be twice the current duration is completely arbitrary, but has a nice outcome (less optimization at startup, more at steady state)
• Meaningless measurements of optimizations vs. phase shifts
  – Due to execution time estimation
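The following derivation shows why the "twice the current duration" assumption yields less optimization at startup. It uses these definitions:

- $C_j$ is the cost of recompiling method m at level j, and $T_i$ and $T_j$ are the expected future times in m at its current level i and after recompilation.
- $T_i = T_f \cdot P_m$, where the future running time $T_f$ equals the elapsed time $t$.
- A constant-speedup model is assumed (not stated on the slides): $T_j = T_i \cdot S_i / S_j$, where $S_k$ is the speedup of level k.

With these, the recompilation test becomes a lower bound on elapsed time, for $S_j > S_i$ and $P_m > 0$:

```latex
C_j + T_j < T_i
\iff C_j < T_i\left(1 - \frac{S_i}{S_j}\right)
\iff t > \frac{C_j}{P_m\left(1 - S_i / S_j\right)}
```

Expensive levels (large $C_j$) pass this test only after the program has already run for a while. Early in a run, methods therefore stay at cheaper levels.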
Questions
• Does it scale?
  – More online-feedback optimizations
    • More threads needing cycles
      – Organizer threads
      – Recompilation threads
    • More data to measure
    • Especially slow if there can only be one recompilation thread
    • More complicated cost/benefit analysis
      – Potential speedups and estimated compilation times