Post on 21-Dec-2015
Compilation 2007
Optimization
Michael I. Schwartzbach
BRICS, University of Aarhus
Optimization
The optimizer aims at:
• reducing the runtime
• reducing the code size
These goals often conflict, since a larger program may in fact be faster
The best optimizations achieve both goals
An optimizer may also have more esoteric aims:
• reducing energy consumption
• reducing chip area
Optimizations for Space
Were historically important, because memory was small and expensive
When memory became large and cheap, optimizing compilers traded space for time
Java compilers do not optimize much, but JVM bytecodes are designed to be small
When Java is targeted at mobile devices, space optimizations are again important
Optimizations for Speed
Were historically important to gain acceptance for the introduction of high-level languages
Are still important, since the software always strains the limits of the hardware
Are challenged by ever higher abstractions in programming languages and must constantly adapt to changing microprocessor architectures
Java compilers do not optimize much, since the JVM kicks in with the JIT compiler
Opportunities for Optimization
• At the source code level
• At an intermediate low level
• At the binary machine code level
• At runtime (JIT compilers)
• At the hardware level
An aggressive optimization requires many small contributions from all levels
Optimizers Must Undo Abstractions
Variables abstract away from registers, so the optimizer must find an efficient mapping
Control structures abstract away from gotos, so the optimizer must simplify a goto graph
Data structures abstract away from memory, so the optimizer must find an efficient layout
Method invocations abstract away from procedure calls, so the optimizer must efficiently determine the intended implementation
...
Difficult Compromises
A high abstraction level makes the development time cheaper, but the runtime more expensive
An optimizing compiler makes runtime more efficient, but compile time less efficient
Optimizations for speed and size may conflict
Different applications may require different choices at different times
Examples of Optimizations
• Strength reduction
• Loop unrolling
• Common subexpression elimination
• Loop invariant code motion
• Inline expansion
These may take place either at the source level or at the bytecode level
Most require information from static analyses
Strength Reduction
Replace expensive operations with cheap ones:
for (i = 0; i < a.length; i++)
  a[i] = a[i] + i/4;
becomes:
for (i = 0; i < a.length; i++)
  a[i] += (i >> 2);
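The rewrite is safe here because the loop index never becomes negative. As a hedged illustration (the class and method names are made up for this sketch and do not come from the slides), Java integer division rounds toward zero while an arithmetic right shift rounds toward negative infinity, so the two forms can differ for negative operands:

```java
// Illustrative sketch: all names are hypothetical, not from the slides.
class StrengthReductionDemo {
    // Original form: integer division, rounds toward zero.
    static int divideBy4(int i) {
        return i / 4;
    }

    // Strength-reduced form: arithmetic shift, rounds toward negative infinity.
    static int shiftBy2(int i) {
        return i >> 2;
    }

    public static void main(String[] args) {
        // For non-negative loop indices, as in the slide, both forms agree.
        for (int i = 0; i < 1000; i++) {
            if (divideBy4(i) != shiftBy2(i)) {
                throw new AssertionError("mismatch at " + i);
            }
        }
        // For negative values they can differ, so the optimizer must know i >= 0.
        System.out.println(divideBy4(-1)); // prints 0
        System.out.println(shiftBy2(-1));  // prints -1
    }
}
```

This is one reason such rewrites usually depend on information from a static analysis, here a lower bound on i.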
Loop Unrolling
Unfold a loop to save condition tests:
for (i = 0; i < 100; i++)
  g(i);
becomes:
for (i = 0; i < 100; i += 2) {
  g(i);
  g(i+1);
}
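The slide's example works without further care because the trip count 100 is even. A minimal sketch of the general case, assuming a stand-in g that only accumulates its argument (all names here are hypothetical), adds a clean-up step for an odd trip count:

```java
// Illustrative sketch: g is a stand-in that just accumulates its argument.
class LoopUnrollingDemo {
    static long total;

    static void g(int i) {
        total += i;
    }

    // Straightforward loop: one condition test per call to g.
    static long rolled(int n) {
        total = 0;
        for (int i = 0; i < n; i++) {
            g(i);
        }
        return total;
    }

    // Unrolled by a factor of 2: roughly half as many condition tests.
    static long unrolled(int n) {
        total = 0;
        int i = 0;
        for (; i + 1 < n; i += 2) {
            g(i);
            g(i + 1);
        }
        // Clean-up for an odd trip count.
        if (i < n) {
            g(i);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(rolled(100) == unrolled(100)); // prints true
        System.out.println(rolled(7) == unrolled(7));     // prints true
    }
}
```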
Common Subexpression Elimination
Avoid redundant computations:
double d = a * Math.sqrt(c);
double e = b * Math.sqrt(c);
becomes:
double tmp = Math.sqrt(c);
double d = a * tmp;
double e = b * tmp;
Loop Invariant Code Motion
Move constant valued expressions outside loops:
for (i = 0; i < a.length; i++)
  b[i] = a[i] + c * d;
becomes:
int tmp1 = a.length;
int tmp2 = c * d;
for (i = 0; i < tmp1; i++)
  b[i] = a[i] + tmp2;
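The transformation is valid because neither a, c, nor d is assigned inside the loop body. A small sketch (the class and method names are assumptions made for illustration) wraps both versions in methods so that their results can be compared:

```java
// Illustrative sketch: names are hypothetical, not from the slides.
class LoopInvariantDemo {
    // Original loop: a.length and c * d are evaluated on every iteration.
    static int[] original(int[] a, int c, int d) {
        int[] b = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            b[i] = a[i] + c * d;
        }
        return b;
    }

    // After code motion: both invariant expressions are computed once.
    static int[] hoisted(int[] a, int c, int d) {
        int[] b = new int[a.length];
        int tmp1 = a.length;
        int tmp2 = c * d;
        for (int i = 0; i < tmp1; i++) {
            b[i] = a[i] + tmp2;
        }
        return b;
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3};
        boolean same = java.util.Arrays.equals(original(a, 4, 5), hoisted(a, 4, 5));
        System.out.println(same); // prints true
    }
}
```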
Inline Expansion
Replace method invocations with copies:
int pred(int x) {
  if (x == 0) return x; else return x-1;
}
int f(int y) {
  return pred(y) + pred(0) + pred(y+1);
}
becomes:
int f(int y) {
  int tmp = 0;
  if (y == 0) tmp += 0; else tmp += y-1;
  if (0 == 0) tmp += 0; else tmp += 0-1;
  if (y+1 == 0) tmp += 0; else tmp += (y+1)-1;
  return tmp;
}
Collaborating Optimizations
Optimizations may enable other optimizations:
int f(int y) {
  int tmp = 0;
  if (y == 0) tmp += 0; else tmp += y-1;
  if (0 == 0) tmp += 0; else tmp += 0-1;
  if (y+1 == 0) tmp += 0; else tmp += (y+1)-1;
  return tmp;
}
becomes:
int f(int y) {
  if (y == 0) return 0;
  else if (y == -1) return -2;
  else return y+y-1;
}
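As a sanity check on this chain of rewrites, the three versions of f can be compared on a range of inputs. The sketch below is only a test harness; the method names original, inlined, simplified, and agreeOn are introduced here for illustration:

```java
// Illustrative harness: only pred and the three bodies of f come from the slides.
class InlineExpansionCheck {
    static int pred(int x) {
        if (x == 0) return x; else return x - 1;
    }

    // Original definition, with three calls to pred.
    static int original(int y) {
        return pred(y) + pred(0) + pred(y + 1);
    }

    // After inline expansion.
    static int inlined(int y) {
        int tmp = 0;
        if (y == 0) tmp += 0; else tmp += y - 1;
        if (0 == 0) tmp += 0; else tmp += 0 - 1;
        if (y + 1 == 0) tmp += 0; else tmp += (y + 1) - 1;
        return tmp;
    }

    // After the follow-up simplifications enabled by inlining.
    static int simplified(int y) {
        if (y == 0) return 0;
        else if (y == -1) return -2;
        else return y + y - 1;
    }

    // Returns true if all three versions agree on every y in [lo, hi].
    static boolean agreeOn(int lo, int hi) {
        for (int y = lo; y <= hi; y++) {
            if (original(y) != inlined(y) || inlined(y) != simplified(y)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(agreeOn(-1000, 1000)); // prints true
    }
}
```

Testing a finite range is of course not a proof, but it exercises the two special cases y == 0 and y == -1.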
Optimization in Joos
public int foo(int a, int b, int c) {
  c = a*b+c;
  if (c<a) a = a+b*113;
  while (b>0) {
    a = a*c;
    b = b-1;
  }
  return a;
}
Unoptimized code (52 bytecodes):
  iload_1
  iload_2
  imul
  iload_3
  iadd
  dup
  istore_3
  pop
  iload_3
  iload_1
  if_icmplt true1
  iconst_0
  goto end2
  true1:
  iconst_1
  end2:
  ifeq false0
  iload_1
  iload_2
  bipush 113
  imul
  iadd
  dup
  istore_1
  pop
  false0:
  goto cond4
  loop3:
  iload_1
  iload_3
  imul
  dup
  istore_1
  pop
  iload_2
  iconst_1
  isub
  dup
  istore_2
  pop
  cond4:
  iload_2
  iconst_0
  if_icmpgt true5
  iconst_0
  goto end6
  true5:
  iconst_1
  end6:
  ifne loop3
  iload_1
  ireturn
Optimized code (27 bytecodes):
  iload_1
  iload_2
  imul
  iload_3
  iadd
  istore_3
  iload_3
  iload_1
  if_icmpge cond4
  iload_1
  iload_2
  bipush 113
  imul
  iadd
  istore_1
  goto cond4
  loop3:
  iload_1
  iload_3
  imul
  istore_1
  iinc 2 -1
  cond4:
  iload_2
  ifgt loop3
  iload_1
  ireturn
Peephole Optimizations
Make local improvements in bytecode sequences
The optimizer considers only finite windows of the sequence
When a pattern "clicks", the optimizer rewrites a part of the code using a template:
  dup
  istore 3
  pop
is rewritten to:
  istore 3
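The rewriting loop can be sketched in a few lines of Java. This is only an illustration under simplifying assumptions: bytecodes are modelled as plain strings, a single hard-coded pattern is tried, and the class PeepholeSketch is hypothetical rather than part of the Joos compiler:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: this is not the Joos peephole engine.
class PeepholeSketch {
    // Rewrites every window "dup; istore N; pop" to "istore N",
    // repeating until no window clicks any more.
    static List<String> optimize(List<String> code) {
        List<String> result = new ArrayList<>(code);
        boolean clicked = true;
        while (clicked) {
            clicked = false;
            for (int i = 0; i + 2 < result.size(); i++) {
                if (result.get(i).equals("dup")
                        && result.get(i + 1).startsWith("istore")
                        && result.get(i + 2).equals("pop")) {
                    String store = result.get(i + 1);
                    // Replace the 3-instruction window by the template.
                    result.subList(i, i + 3).clear();
                    result.add(i, store);
                    clicked = true;
                    break; // rescan from the start after each rewrite
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> code =
                Arrays.asList("iload_1", "iload_2", "imul", "dup", "istore 3", "pop");
        System.out.println(optimize(code)); // prints [iload_1, iload_2, imul, istore 3]
    }
}
```

Rescanning from the start after every rewrite is the simplest strategy, since one rewrite may expose a new window that clicks.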
Peephole Transitions
Let P be a collection of peephole patterns
It defines a transition relation on sequences of bytecodes:
$$B_1 \stackrel{p}{\Longrightarrow} B_2$$
meaning that $p \in P$ clicked at some position in the sequence $B_1$ and produced the sequence $B_2$
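Termination
A collection of peephole patterns must terminate
This means that for the collection P, there must not exist an infinite sequence:
$$B_0 \stackrel{p_1}{\Longrightarrow} B_1 \stackrel{p_2}{\Longrightarrow} B_2 \stackrel{p_3}{\Longrightarrow} B_3 \stackrel{p_4}{\Longrightarrow} \cdots$$
for any $B_0$ and $p_i \in P$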
Soundness (1/2)
Every peephole pattern must preserve semantics
Assume the pattern p transforms a bytecode sequence B1 into the sequence B2
Consider now any bytecode context C
If C[B1] emits the verifiable code E1, then C[B2] must emit some verifiable code E2 with the same semantics
Soundness (2/2)
[Diagram: the context C is applied to B1 and to B2, where the pattern p rewrites B1 to B2; emitting C[B1] yields E1 and emitting C[B2] yields E2]
A Peephole Pattern Language
Joos has a domain-specific language for specifying peephole patterns
The Joos compiler contains an interpreter for this peephole language
It is invoked with the option -O patternfile
It will try all patterns in an unspecified order until no pattern clicks anywhere
Pattern Syntax
pattern → pattern name var : exp -> intconst templates
The exp determines whether the pattern clicks
The intconst tells how many bytecodes to replace
The templates specify the new bytecodes
The evaluation of exp produces a set of bindings that may be used inside the templates and later in the expression
Expression Types
The following types are possible results:
• int
• label
• type-signature
• field-signature
• method-signature
• string
• condition
• bytecodes
• boolean
The notation inst(σ1, ..., σk) means that the given instruction has these arguments in the JVM specification
Pattern Expressions
exp → var |
      degree var |
      target var |
      formals var |
      returns var |
      negate exp |
      commute exp |
      exp intop exp |
      exp intcomp exp |
      exp comp exp |
      exp ~ peepholes |
      ! exp |
      exp && exp |
      exp || exp |
      intconst |
      condconst
Peepholes and Templates
peepholes → peephole*
peephole → instruction |
           instruction (vars) |
           * |                   (any single instruction)
           var :                 (label binder)
templates → template*
template → instruction |
           instruction (exps)
condconst → eq | ne | lt | le | gt | ge | aeq | ane
intop → + | - | * | / | %
intcomp → < | <= | > | >=
comp → == | !=
Peephole Judgements
The judgement:
$$\vdash E : \sigma[\Gamma \to \Gamma']$$
means that the expression E:
• evaluates to a result of type $\sigma$
• consumes the bindings $\Gamma$
• produces the bindings $\Gamma'$
The judgement:
$$\vdash X : [\Gamma \to \Gamma']$$
similarly describes peepholes, templates, and patterns
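Expression Well-Formedness (1/5)
$$\frac{\Gamma(x) = \sigma}{\vdash x : \sigma[\Gamma \to \Gamma]}$$
$$\frac{\Gamma(x) = \texttt{label}}{\vdash \texttt{degree}\ x : \texttt{int}[\Gamma \to \Gamma]}$$
$$\frac{\Gamma(x) = \texttt{label}}{\vdash \texttt{target}\ x : \texttt{bytecodes}[\Gamma \to \Gamma]}$$
$$\frac{\Gamma(x) = \texttt{method-signature}}{\vdash \texttt{formals}\ x : \texttt{int}[\Gamma \to \Gamma]}$$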
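Expression Well-Formedness (2/5)
$$\frac{\Gamma(x) = \texttt{method-signature}}{\vdash \texttt{returns}\ x : \texttt{int}[\Gamma \to \Gamma]}$$
$$\frac{\vdash E : \texttt{condition}[\Gamma \to \Gamma']}{\vdash \texttt{negate}\ E : \texttt{condition}[\Gamma \to \Gamma']}$$
$$\frac{\vdash E : \texttt{condition}[\Gamma \to \Gamma']}{\vdash \texttt{commute}\ E : \texttt{condition}[\Gamma \to \Gamma']}$$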
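Expression Well-Formedness (3/5)
$$\frac{\vdash E_1 : \texttt{int}[\Gamma \to \Gamma'] \qquad \vdash E_2 : \texttt{int}[\Gamma' \to \Gamma'']}{\vdash E_1\ \mathit{intop}\ E_2 : \texttt{int}[\Gamma \to \Gamma'']}$$
$$\frac{\vdash E_1 : \texttt{int}[\Gamma \to \Gamma'] \qquad \vdash E_2 : \texttt{int}[\Gamma' \to \Gamma'']}{\vdash E_1\ \mathit{intcomp}\ E_2 : \texttt{boolean}[\Gamma \to \Gamma'']}$$
$$\frac{\vdash E_1 : \sigma[\Gamma \to \Gamma'] \qquad \vdash E_2 : \sigma[\Gamma' \to \Gamma'']}{\vdash E_1\ \mathit{comp}\ E_2 : \texttt{boolean}[\Gamma \to \Gamma'']}$$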
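Expression Well-Formedness (4/5)
$$\frac{\vdash E : \texttt{bytecodes}[\Gamma \to \Gamma'] \qquad \vdash P : [\Gamma' \to \Gamma'']}{\vdash E \sim P : \texttt{boolean}[\Gamma \to \Gamma'']}$$
$$\frac{\vdash E : \texttt{boolean}[\Gamma \to \Gamma']}{\vdash\ !\,E : \texttt{boolean}[\Gamma \to \Gamma]}$$
$$\frac{\vdash E_1 : \texttt{boolean}[\Gamma \to \Gamma'] \qquad \vdash E_2 : \texttt{boolean}[\Gamma' \to \Gamma'']}{\vdash E_1\ \texttt{\&\&}\ E_2 : \texttt{boolean}[\Gamma \to \Gamma'']}$$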
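Expression Well-Formedness (5/5)
$$\frac{\vdash E_1 : \texttt{boolean}[\Gamma \to \Gamma'] \qquad \vdash E_2 : \texttt{boolean}[\Gamma \to \Gamma''] \qquad \forall x : \Gamma'(x) = \sigma' \wedge \Gamma''(x) = \sigma'' \Rightarrow \sigma' = \sigma''}{\vdash E_1\ \texttt{||}\ E_2 : \texttt{boolean}[\Gamma \to \Gamma'\ \Gamma'']}$$
$$\vdash k : \texttt{int}[\Gamma \to \Gamma]$$
$$\vdash \mathit{cond} : \texttt{condition}[\Gamma \to \Gamma]$$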
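Peephole Well-Formedness (1/2)
$$\frac{\vdash P_i : [\Gamma_i \to \Gamma_{i+1}]}{\vdash P_1 P_2 \ldots P_k : [\Gamma_1 \to \Gamma_{k+1}]}$$
$$\vdash \mathit{inst} : [\Gamma \to \Gamma]$$
$$\frac{x_i \neq x_j \qquad x_i \notin \Gamma \qquad \mathit{inst}(\sigma_1, \ldots, \sigma_k)}{\vdash \mathit{inst}(x_1, \ldots, x_k) : [\Gamma \to \Gamma[x_i \to \sigma_i]]}$$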
Peephole Well-Formedness (2/2)
$$\vdash \texttt{*} : [\Gamma \to \Gamma]$$
$$\vdash x\,\texttt{:} : [\Gamma \to \Gamma[x \to \texttt{label}]]$$
$$\vdash \texttt{label}(x) : [\Gamma \to \Gamma[x \to \texttt{label}]]$$
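Template Well-Formedness
$$\frac{\vdash T_i : [\Gamma_i \to \Gamma_{i+1}]}{\vdash T_1 T_2 \ldots T_k : [\Gamma_1 \to \Gamma_{k+1}]}$$
$$\vdash \mathit{inst} : [\Gamma \to \Gamma]$$
$$\frac{\vdash E_i : \sigma_i[\Gamma_i \to \Gamma_{i+1}] \qquad \mathit{inst}(\sigma_1, \ldots, \sigma_k)}{\vdash \mathit{inst}(E_1, \ldots, E_k) : [\Gamma_1 \to \Gamma_{k+1}]}$$
$$\frac{\vdash E : \texttt{label}[\Gamma_1 \to \Gamma_2]}{\vdash E\,\texttt{:}\ \mathit{inst} : [\Gamma_1 \to \Gamma_2]}$$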
Pattern Well-Formedness
$$\frac{\vdash E : \texttt{boolean}[[x \to \texttt{bytecodes}] \to \Gamma] \qquad \vdash T : [\Gamma \to \Gamma']}{\vdash \texttt{pattern}\ n\ x\,\texttt{:}\ E\ \texttt{->}\ k\ T : [[\,] \to \Gamma']}$$
Pattern Examples (1/4)
pattern dup_istore_pop x:
  x ~ dup
      istore (i0)
      pop
  -> 3 istore (i0)
This pattern is relevant for code like:
  x = a*b;
Pattern Examples (2/4)
pattern goto_label x:
  x ~ goto (l1)
      label (l2)
  && l1 == l2
  -> 1
This pattern arises during optimization of nested control structures
Pattern Examples (3/4)
pattern constant_iadd_residue x:
  x ~ ldc_int (i0)
      iadd
      ldc_int (i1)
      iadd
  -> 4 ldc_int (i0+i1)
       iadd
This pattern is relevant for code like:
  a+5+7
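Folding the two constants relies on int addition being associative even when an intermediate sum overflows, which holds because JVM int arithmetic wraps around. A small check, with class and method names invented for this sketch:

```java
// Illustrative sketch: names are hypothetical, not from the slides.
class ConstantAddDemo {
    // Shape matched by the pattern: add i0, then add i1.
    static int twoAdds(int a, int i0, int i1) {
        return (a + i0) + i1;
    }

    // Shape produced by the template: a single add of the folded constant.
    static int oneAdd(int a, int i0, int i1) {
        return a + (i0 + i1);
    }

    public static void main(String[] args) {
        System.out.println(twoAdds(1, 5, 7) == oneAdd(1, 5, 7)); // prints true
        // Still equal when an intermediate sum overflows.
        int big = Integer.MAX_VALUE;
        System.out.println(twoAdds(big, 5, -7) == oneAdd(big, 5, -7)); // prints true
    }
}
```

The same reasoning would not carry over to floating-point addition, which is not associative.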
Pattern Examples (4/4)
pattern goto_goto x:
  x ~ goto (l0)
  && target l0 ~ goto (l1)
  && ! (target l1 ~ goto (l2))
  && ! (target l1 ~ label (l3))
  -> 1 goto (l1)
This pattern arises during optimization of nested control structures
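Proving Termination
We want to avoid infinite sequences like:
$$B_0 \stackrel{p_1}{\Longrightarrow} B_1 \stackrel{p_2}{\Longrightarrow} B_2 \stackrel{p_3}{\Longrightarrow} B_3 \stackrel{p_4}{\Longrightarrow} \cdots$$
Define an integer valued function $\varphi$ such that:
$$\forall B : \varphi(B) \geq 0$$
$$\forall p \in P : B_1 \stackrel{p}{\Longrightarrow} B_2 \;\Rightarrow\; \varphi(B_2) < \varphi(B_1)$$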
Termination Function Example
For our 4 example patterns we define:
$$\varphi(B) = \#\texttt{dup} + \#\texttt{goto} + \#\texttt{iadd} + \text{???}$$
What gets smaller in the goto_goto pattern?
[Diagram: for each label (l1) in B, the chain l1 → l2 → l3 → ... → lk of pairwise distinct labels (li ≠ lj) linked by gotos, which has length k]
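The first three terms of the measure are easy to compute. The sketch below models bytecodes as strings and uses a hypothetical class TerminationMeasureSketch; it shows the count dropping for dup_istore_pop and constant_iadd_residue. It deliberately leaves out the missing fourth term, since goto_goto replaces one goto with another and does not change these counts:

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch: covers only the #dup + #goto + #iadd part of the measure.
class TerminationMeasureSketch {
    // Counts the instructions whose opcode is dup, goto, or iadd.
    static int measure(List<String> code) {
        int count = 0;
        for (String instruction : code) {
            String opcode = instruction.split(" ")[0];
            if (opcode.equals("dup") || opcode.equals("goto") || opcode.equals("iadd")) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // dup_istore_pop: one dup disappears.
        System.out.println(measure(Arrays.asList("dup", "istore 3", "pop"))); // prints 1
        System.out.println(measure(Arrays.asList("istore 3")));               // prints 0
        // constant_iadd_residue: one iadd disappears.
        System.out.println(measure(Arrays.asList("ldc_int 5", "iadd", "ldc_int 7", "iadd"))); // prints 2
        System.out.println(measure(Arrays.asList("ldc_int 12", "iadd")));     // prints 1
    }
}
```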
A Non-Terminating Pattern
pattern bad_goto_goto x:
  x ~ goto (l0)
  && target l0 ~ goto (l1)
  -> 1 goto (l1)
Example code on which it never stops clicking:
  foo: goto bar
  bar: goto foo
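The endless clicking can be simulated with a small model. The sketch below is an assumption-laden simplification: a program is just a map from a label to the target of the goto found at that label, and the class BadGotoGotoSketch and its click limit are invented for the illustration:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch: a program is modelled as label -> target of the goto at that label.
class BadGotoGotoSketch {
    // Applies the bad_goto_goto rewrite until nothing clicks or maxClicks is reached,
    // and returns the number of clicks.
    static int countClicks(Map<String, String> code, int maxClicks) {
        Map<String, String> current = new LinkedHashMap<>(code);
        int clicks = 0;
        boolean clicked = true;
        while (clicked && clicks < maxClicks) {
            clicked = false;
            for (Map.Entry<String, String> entry : current.entrySet()) {
                String l0 = entry.getValue();
                String l1 = current.get(l0);
                if (l1 != null) {
                    // The target of l0 is itself a goto: rewrite goto l0 into goto l1.
                    entry.setValue(l1);
                    clicks++;
                    clicked = true;
                    break;
                }
            }
        }
        return clicks;
    }

    public static void main(String[] args) {
        Map<String, String> cycle = new LinkedHashMap<>();
        cycle.put("foo", "bar");
        cycle.put("bar", "foo");
        System.out.println(countClicks(cycle, 1000)); // prints 1000: only the limit stops it

        Map<String, String> chain = new LinkedHashMap<>();
        chain.put("a", "b");
        chain.put("b", "c"); // c is not a goto
        System.out.println(countClicks(chain, 1000)); // prints 1
    }
}
```

On the cyclic program the first click turns foo into a goto to itself, after which the pattern keeps clicking without changing the code.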
Proving Soundness
A formal proof of soundness for a collection of patterns requires a full formal semantics of:
• bytecode sequences
• peephole patterns
• bytecode contexts
• code emission
• the complete JVM
The pitfall is usually the universal quantification of contexts: does this really always work?
An Unsound Pattern
pattern idiv_pop x:
  x ~ idiv
      pop
  -> 1 pop
This pattern may actually click
And the resulting bytecode will always verify
But the semantics is not preserved, since it may remove a java.lang.ArithmeticException
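The difference is observable from source code. In the sketch below (class and method names are made up for the illustration), the first method mirrors the original idiv followed by pop, and the second mirrors the rewritten pop, pop sequence in which both operands are simply discarded:

```java
// Illustrative sketch: names are hypothetical, not from the slides.
class IdivPopDemo {
    // Mirrors "idiv; pop": the quotient is computed and then discarded.
    static boolean divideAndDiscard(int a, int b) {
        int unused = a / b; // throws java.lang.ArithmeticException when b == 0
        return true;
    }

    // Mirrors the rewritten "pop; pop": both operands are dropped without dividing.
    static boolean discardOnly(int a, int b) {
        return true;
    }

    // Reports whether divideAndDiscard throws an ArithmeticException.
    static boolean throwsOnDivide(int a, int b) {
        try {
            divideAndDiscard(a, b);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(throwsOnDivide(1, 0)); // prints true
        System.out.println(discardOnly(1, 0));    // prints true, no exception
    }
}
```

A caller that catches the exception can therefore tell the two versions apart, which is exactly what soundness forbids.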