Optimizing Compilers CISC 673 Spring 2009 Dynamic Compilation II
University of Delaware • Computer & Information Sciences Department
Optimizing Compilers
CISC 673
Spring 2009
Dynamic Compilation II
John Cavazos
University of Delaware
What is in a Dynamic Compiler?
Interpretation
• Popular approach for high-level languages
  • Ex, Python, APL, SNOBOL, BCPL, Perl, MATLAB
• Useful for memory-challenged environments
• Low startup time & space overhead, but much slower than native code execution
MMI (Mixed Mode Interpreter) [Suganuma’01]
• Fast interpreter implemented in assembler
What is in a Dynamic Compiler?
Quick compilation
• Reduced set of optimizations for fast compilation, little inlining
Full compilation
• Full optimizations only for selected hot methods
Classic just-in-time compilation
• Compile methods to native code on first invocation
• Ex, ParcPlace Smalltalk-80, Self-91
• Initial high (time & space) overhead for each compilation
• Precludes use of sophisticated optimizations (e.g., SSA)
• Responsible for many of today’s myths
Interpretation vs JIT
[Figure: two bar charts comparing Interpreter and Compiler, each bar split into Initial Overhead and Execution. Panels: “Execution: 20 time units” (y-axis 0 to 120) and “Execution: 2000 time units” (y-axis 0 to 2500).]
Selective Optimization
Hypothesis: most execution is spent in a small percentage of methods
Idea: use two execution strategies
  1. Interpreter or non-optimizing compiler
  2. Full-fledged optimizing compiler
Strategy:
• Use option 1 for initial execution of all methods
• Profile to find “hot” subset of methods
• Use option 2 on this subset
Selective Optimization
[Figure: two bar charts comparing Interpreter, Compiler, and Selective, each bar split into Initial Overhead and Execution. Panels: “Execution: 20 time units” (y-axis 0 to 120) and “Execution: 2000 time units” (y-axis 0 to 2500).]
Selective opt: compiles 20% of methods, representing 99% of execution time
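The trade-off behind the three strategies can be approximated with a small cost model. This is a minimal sketch, not the data behind the charts: the method count, the per-method compile cost, and the interpreter slowdown are hypothetical values chosen for illustration. Only the 20%/99% split and the two workload sizes (20 and 2000 time units) come from the slide. Profiling overhead and the time a hot method spends interpreted before it is promoted are ignored.

```python
def total_time(work, methods_compiled, time_in_compiled,
               n_methods=100, compile_cost=1.0, interp_slowdown=4.0):
    """Return (initial_overhead, execution) in time units.

    work: execution time if everything ran as native code.
    methods_compiled: fraction of methods that get compiled.
    time_in_compiled: fraction of execution spent in those methods.
    n_methods, compile_cost, interp_slowdown: hypothetical parameters.
    """
    overhead = n_methods * methods_compiled * compile_cost
    native = work * time_in_compiled
    interpreted = work * (1.0 - time_in_compiled) * interp_slowdown
    return overhead, native + interpreted


def compare(work):
    """Total time (overhead + execution) for each strategy."""
    strategies = {
        "interpreter": (0.0, 0.0),
        "compiler": (1.0, 1.0),
        "selective": (0.20, 0.99),  # 20% of methods, 99% of execution time
    }
    return {name: sum(total_time(work, m, t))
            for name, (m, t) in strategies.items()}


short_run = compare(20)
long_run = compare(2000)
```

Under these assumed parameters the interpreter beats the compile-everything strategy on the short run, the order flips on the long run, and the selective strategy has the lowest total in both cases.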
Designing an Adaptive Optimization System
• What is the system architecture?
• What are the profiling mechanisms and policies for driving recompilation?
• How effective are these systems?
Basic Structure of a Dynamic Compiler
[Figure: compilation pipeline from Program to Machine code, with these phases:]
• Structural: inlining, unrolling, loop permutation
• Scalar: CSE, constants, expressions
• Memory: scalar replacement, pointers
• Register allocation
• Scheduling, peephole
Still needs good core compiler - but more
Basic Structure of a Dynamic Compiler
[Figure: architecture diagram. Labels in the diagram:]
• Program
• Interpreter or Simple Translation
• Executing Program
• Compiler subsystem (Optimizations)
• Instrumented code
• Raw Profile Data
• Profile Processor
• Processed Profile
• Controller
• Compilation decisions
• History (prior decisions, compile time)
Method Profiling
• Counters
• Call Stack Sampling
• Combinations
Method Profiling: Counters
• Insert method-specific counter on method entry and loop back edges
• Counts how often a method is called and approximates how much time is spent in a method
• Very popular approach: Self, HotSpot
• Issues: overhead for incrementing counter can be significant
  • Not present in optimized code
Method Profiling: Counters
foo ( … ) {
  fooCounter++;                    // bump the per-method counter on entry
  if (fooCounter > Threshold) {
    recompile( … );                // method is hot: hand it to the optimizer
  }
  . . .
}
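The counter check above can be mimicked in a few lines of runnable code. This is a sketch under stated assumptions: `CountingMethod`, `slow_sum`, and `fast_sum` are hypothetical names, "recompilation" is modeled by switching to a hand-written faster function, and loop back-edge counters are not modeled.

```python
class CountingMethod:
    """Wraps a baseline function and swaps in an optimized one once hot."""

    def __init__(self, baseline, optimized, threshold):
        self.baseline = baseline
        self.optimized = optimized
        self.threshold = threshold
        self.counter = 0
        self.recompiled = False

    def __call__(self, *args):
        if self.recompiled:
            # Optimized code carries no counter, as noted on the slide.
            return self.optimized(*args)
        self.counter += 1
        if self.counter > self.threshold:
            self.recompiled = True  # stands in for recompile( ... )
        return self.baseline(*args)


def slow_sum(n):
    return sum(range(n))      # "unoptimized" version


def fast_sum(n):
    return n * (n - 1) // 2   # "optimized" version with the same result


foo = CountingMethod(slow_sum, fast_sum, threshold=3)
```

With `threshold=3`, the fourth call trips the check; every later call runs `fast_sum` and no longer pays for the counter increment.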
Method Profiling: Call Stack Sampling
• Periodically record which method(s) are on call stack
• Approximates amount of time spent in each method
• Can be compiled into the code
  • Jikes RVM, JRockit
• or use hardware sampling
• Issues: timer-based sampling is not deterministic
Method Profiling: Call Stack Sampling
[Figure: successive call stack samples taken over time: ABC, AB, A, AB, ABC, ABC, ...]
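The samples in the figure can be turned into a rough time profile. This is a minimal sketch that assumes each sample lists methods from outermost to innermost, so the last letter is the method that was running; that reading of the figure and the name `profile_from_samples` are assumptions.

```python
from collections import Counter


def profile_from_samples(samples):
    """Estimate each method's share of execution time.

    Each sample is a string of method names, outermost first, so the last
    character is the method executing when the sample was taken.
    """
    top = Counter(stack[-1] for stack in samples)
    total = sum(top.values())
    return {method: count / total for method, count in top.items()}


samples = ["ABC", "AB", "A", "AB", "ABC", "ABC"]
profile = profile_from_samples(samples)
```

Here C is on top of the stack in three of the six samples, so it is credited with half of the execution time. Because samples arrive on a timer, two runs of the same program can produce different profiles, which is the non-determinism issue the slide mentions.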
Method Profiling: Mixed Combinations
• Use counters initially and sampling later on
• IBM DK for Java
foo ( … ) {
  fooCounter++;
  if (fooCounter > Threshold) {
    recompile( … );
  }
  . . .
}
[Figure: call stack sample ABC]
Method Profiling: Mixed Software/Hardware Combination
• Use interrupts & sampling
foo ( … ) {
  if (flag is set) {
    sample( … );
  }
  . . .
}
[Figure: call stack sample ABC]
Recompilation Policies: Which Candidates to Optimize?
Problem: given optimization candidates, which should be optimized?
Counters:
  1. Optimize method that surpasses threshold
     • Simple, but hard to tune, doesn’t consider context
  2. Optimize method on the call stack based on inlining policies
     • Addresses context issue
Call Stack Sampling:
  1. Optimize all methods that are sampled
     • Simple, but doesn’t consider frequency of sampled methods
  2. Use cost/benefit model
     • Seemingly complicated, but easy to engineer
     • Maintenance free
     • Naturally supports multiple optimization levels
Jikes RVM Recompilation Policy: Cost/Benefit Model
Define
• cur, current opt level for method m
• Exe(j), expected future execution time at level j
• Comp(j), compilation cost at opt level j
Choose j > cur that minimizes Exe(j) + Comp(j)
If Exe(j) + Comp(j) < Exe(cur), recompile at level j
Assumptions
• Sample data determines how long a method has executed
• Method will execute as much in the future as it has in the past
• Compilation cost and speedup are offline averages
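The decision rule above fits in one short function. This is a sketch of the rule as stated on the slide, not the Jikes RVM source: the function name, the parameter names, and the idea of scaling compile cost by method size are assumptions, and the numbers in the usage lines are invented.

```python
def choose_level(cur, past_time, speedup, compile_rate, method_size):
    """Return the level to recompile at, or None to keep the current code.

    cur: current optimization level of the method.
    past_time: time spent in the method so far, estimated from samples.
    speedup[j]: offline average speed of level j relative to level 0.
    compile_rate[j]: offline average compile cost per unit of method size.
    """
    # Assumption from the slide: the method will run as long in the future
    # as it has in the past, so Exe(cur) equals past_time.
    exe_cur = past_time
    best_level, best_cost = None, exe_cur
    for j in range(cur + 1, len(speedup)):
        exe_j = exe_cur * speedup[cur] / speedup[j]   # Exe(j)
        comp_j = compile_rate[j] * method_size        # Comp(j)
        if exe_j + comp_j < best_cost:
            best_level, best_cost = j, exe_j + comp_j
    return best_level


# Hypothetical offline averages for three optimization levels.
speedup = [1.0, 2.0, 4.0]
compile_rate = [0.0, 1.0, 10.0]

cold = choose_level(0, 5, speedup, compile_rate, method_size=10)
warm = choose_level(0, 40, speedup, compile_rate, method_size=10)
hot = choose_level(0, 1000, speedup, compile_rate, method_size=10)
```

With these made-up averages, a method that has run for 5 time units stays at level 0, one that has run for 40 moves to level 1, and one that has run for 1000 jumps straight to level 2. Nothing in the rule is specific to two levels, which is how the model supports multiple optimization levels.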
Startup Programs: Jikes RVM [Hind et al.’04]
[Figure: bar chart of speedup over baseline (y-axis 0 to 5) for JIT 0, JIT 1, and JIT 2. Benchmarks: db/10, jack/10, ipsixql/short, jess/10, jbb/12000, mtrt/10, javac/10, xerces/short, mpeg/10, compress/10, daikon/short, soot/short, jack/100, xerces/long, javac/100, jess/100, mtrt/100, db/100, ipsixql/long, soot/long, jbb/200000, compress/100, mpeg/100, daikon/long, and the geometric mean. No FDO, Mar’04, AIX/PPC.]
Startup Programs: Jikes RVM
[Figure: the same startup chart (speedup over baseline, y-axis 0 to 5, same benchmarks and geometric mean) with a fourth bar, Model, next to JIT 0, JIT 1, and JIT 2. No FDO, Mar’04, AIX/PPC.]
Steady State: Jikes RVM
[Figure: bar chart of speedup over baseline (y-axis 0 to 7) for JIT 0, JIT 1, and JIT 2. Benchmarks: jbb-300, ipsixql, compress, jess, db, javac, mpegaudio, mtrt, jack, and the geometric mean. No FDO, Mar’04, AIX/PPC.]
Steady State: Jikes RVM
[Figure: the same steady-state chart (speedup over baseline, y-axis 0 to 7, same benchmarks and geometric mean) with a fourth bar, Model, next to JIT 0, JIT 1, and JIT 2.]
Feedback-Directed Optimization (FDO)
• Exploit information gathered at run-time to optimize execution
  • “selective optimization”: what to optimize
  • “FDO”: how to optimize
Advantages of FDO
• Can exploit dynamic information that cannot be inferred statically
• System can change and revert decisions when conditions change
• Runtime binding allows more flexible systems
Challenges for automatic online FDO
• Compensate for profiling overhead
• Compensate for runtime transformation overhead
• Account for partial profile available and changing conditions
Profiling for What to Do
Clients
• Inlining, unrolling, method dispatch
• Dispatch tables, synchronization services, GC
• Prefetching
  • Misses, hardware performance monitors [Adl-Tabatabai et al.’04]
• Code layout
  • values - loop counts
  • edges & paths
Profiling for What to Do
Myth: Sophisticated profiling is too expensive to perform online
Reality: Well-known technology can collect sophisticated profiles with sampling and minimal overhead
Method Profiling: Timer Based
class Thread
scheduler (...) {
  ...
  flag = 1;
}
void handler(...) {
  // sample stack, perform GC, swap threads, etc.
  ....
  flag = 0;
}
foo ( … ) {
  // on method entry, exit, & all loop backedges
  if (flag) {
    handler( … );
  }
  . . .
}
• Useful for more than profiling
• Jikes RVM
  • Schedule garbage collection
  • Thread scheduling policies, etc.
[Figure: call stack ABC; the check “if (flag) handler();” appears three times in the diagram]
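The flag protocol on this slide can be exercised without a real timer. In this sketch the timer is replaced by an explicit `timer_tick()` call so the behavior is deterministic; the class and method names are hypothetical, and the handler only records a sample, whereas the slide notes it may also trigger GC or a thread switch.

```python
from collections import Counter


class Sampler:
    """Flag-based sampling: a timer sets the flag, yieldpoints poll it."""

    def __init__(self):
        self.flag = False
        self.samples = Counter()

    def timer_tick(self):
        # In a VM this would run from a timer interrupt or the scheduler.
        self.flag = True

    def yieldpoint(self, method):
        # The check compiled into method entry, exit, and loop back edges.
        if self.flag:
            self.handler(method)

    def handler(self, method):
        self.samples[method] += 1  # record the method that was running
        self.flag = False          # wait for the next tick


def foo(sampler, iterations):
    sampler.yieldpoint("foo")        # method entry
    for _ in range(iterations):
        sampler.yieldpoint("foo")    # loop back edge
    sampler.yieldpoint("foo")        # method exit


sampler = Sampler()
foo(sampler, 3)          # no tick yet: every check falls through
sampler.timer_tick()
foo(sampler, 3)          # first check after the tick takes one sample
```

Each tick yields at most one sample because the handler clears the flag, so the cost on the common path is a single load and branch per yieldpoint.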
Arnold-Ryder [PLDI 01]: Full Duplication Profiling
[Figure: Full-Duplication Framework. A method consists of checking code and duplicated code; checks are placed at the method entry and on backedges.]
Generate two copies of a method
• Execute “fast path” most of the time
• Execute “slow path” with detailed profiling occasionally
• Adapted by J9 due to proven accuracy and low overhead
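The fast-path/slow-path split can be illustrated at method granularity. This is a loose sketch rather than the framework from the paper: it assumes a simple countdown as the trigger, places the check only at method entry (backedge checks are not modeled), and uses hypothetical names throughout.

```python
class DuplicatedMethod:
    """Runs fast code normally and instrumented code every `interval` checks."""

    def __init__(self, fast, instrumented, interval):
        self.fast = fast
        self.instrumented = instrumented
        self.interval = interval
        self.countdown = interval

    def __call__(self, *args):
        # Check at method entry: a cheap decrement and test.
        self.countdown -= 1
        if self.countdown == 0:
            self.countdown = self.interval
            return self.instrumented(*args)  # slow path, detailed profiling
        return self.fast(*args)              # fast path, no instrumentation


value_profile = {}


def square(x):
    return x * x


def square_instrumented(x):
    # Same computation plus a value profile of the argument.
    value_profile[x] = value_profile.get(x, 0) + 1
    return x * x


sq = DuplicatedMethod(square, square_instrumented, interval=4)
results = [sq(i) for i in range(8)]
```

Only every fourth call runs the instrumented copy, so the profile sees a sample of the arguments while the other calls pay just for the check.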
Suggested Reading: Dynamic Compilation
• Adaptive Optimization in the Jalapeño JVM, M. Arnold, S. Fink, D. Grove, M. Hind, and P. Sweeney, Proceedings of the 2000 ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages & Applications (OOPSLA '00), pages 47-65, Oct. 2000.