Selected Aspects of CompilersLecture Compilers SS 2009
Dr.-Ing. Ina Schaefer
Software Technology Group, TU Kaiserslautern
Ina Schaefer Selected Aspects of Compilers 1
Content of Lecture
1. Introduction: Overview and Motivation
2. Syntax and Type Analysis
   2.1 Lexical Analysis
   2.2 Context-Free Syntax Analysis
   2.3 Context-Dependent Syntax Analysis
3. Translation to Target Language
   3.1 Translation of Imperative Language Constructs
   3.2 Translation of Object-Oriented Language Constructs
4. Selected Aspects of Compilers
   4.1 Intermediate Languages
   4.2 Optimization
   4.3 Data Flow Analysis
   4.4 Register Allocation
   4.5 Code Generation
5. Garbage Collection
6. XML Processing (DOM, SAX, XSLT)
Outline

1. Intermediate Languages
   3-Address Code
   Further Intermediate Languages
2. Optimization
   Optimization Techniques
   Optimization Potential
3. Data Flow Analysis
   Liveness Analysis
   Data Flow Equations
   Non-Local Program Analysis
4. Register Allocation
   Evaluation Ordering with Minimal Registers
   Register Allocation by Graph Coloring
   Further Aspects of Register Allocation
5. Code Generation
Selected Aspects of Compilers
Focus:
• Techniques that go beyond the direct translation of source languages to target languages
• Concentrate on concepts instead of language-dependent details
• We use program representations tailored for the considered tasks (instead of source language syntax)
  - simplifies representation
  - but makes practical integration more difficult
Selected Aspects of Compilers (2)
Educational Objectives:
• Intermediate languages for translation and optimization of imperative languages
• Different optimization techniques
• Different static analysis techniques for (intermediate) programs
• Register allocation
• Some aspects of code generation
Intermediate Languages
• Intermediate languages are used as
  - appropriate program representations for certain language implementation tasks
  - a common representation for programs of different source languages

  Source Language 1 ──┐                          ┌──► Target Language 1
  Source Language 2 ──┼──► Intermediate Language ┼──► Target Language 2
         ...          │                          │          ...
  Source Language n ──┘                          └──► Target Language m
Intermediate Languages (2)
• Intermediate languages for translation are comparable to data structures in algorithm design, i.e., for each task, an intermediate language is more or less suitable.
• Intermediate languages can conceptually be seen as abstract machines.
3-Address Code
3-Address Code (3AC) is a common intermediate language with many variants.
Properties:
• only elementary data types (but often arrays)
• no nested expressions
• sequential execution; jumps and procedure calls as statements
• named variables as in a high-level language
• unbounded number of temporary variables
3-Address Code (2)
A program in 3AC consists of
• a list of global variables
• a list of procedures with parameters and local variables
• a main procedure
Each procedure has a sequence of 3AC commands as body.
3AC commands
Syntax                  Explanation

x := y bop z            x: variable (global, local, parameter, temporary)
x := uop z              y, z: variable or constant
x := y                  bop: binary operator; uop: unary operator

goto L                  jump or conditional jump to label L;
if x cop y goto L       cop: comparison operator;
                        only procedure-local jumps

x := a[i]               a: one-dimensional array
a[i] := y
x := &a                 a: global or local variable or parameter;
x := *y                 &a: address of a;
*x := y                 *: dereferencing operator
3AC commands (2)
Syntax                  Explanation

param x                 call p(x1, ..., xn) is encoded as
call p                      param x1
return y                    ...
                            param xn
                            call p
                        (this block is considered as one command)
                        return y causes a jump to the return address,
                        with (optional) result y

We assume that 3AC contains only labels that are actually used as jump targets in the program.
Basic Blocks
A sequence of 3AC commands can be uniquely partitioned into basic blocks.

A basic block B is a maximal sequence of commands such that
• a jump, procedure call, or return command occurs only as the last command of B
• labels occur only at the first command of a basic block
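The partitioning can be sketched with the classic "leader" rule (a minimal Python sketch; the string encoding of commands is hypothetical, chosen only for illustration):

```python
def basic_blocks(cmds):
    """Partition a list of 3AC commands into basic blocks.

    Leaders are: the first command, every labelled command, and every
    command following a jump, call, or return.  A basic block runs
    from one leader up to (but excluding) the next leader.
    """
    leaders = {0}
    for i, c in enumerate(cmds):
        if c.endswith(':'):                      # a label starts a new block
            leaders.add(i)
        if c.split()[0] in ('goto', 'if', 'call', 'return') and i + 1 < len(cmds):
            leaders.add(i + 1)                   # command after a jump/call/return
    starts = sorted(leaders)
    return [cmds[a:b] for a, b in zip(starts, starts[1:] + [len(cmds)])]
```

For example, `basic_blocks(['main:', 'a := 1', 'call p', 'b := 2', 'return b'])` yields the two blocks `[['main:', 'a := 1', 'call p'], ['b := 2', 'return b']]`.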
Basic Blocks (2)
Remarks:
• The commands of a basic block are always executed sequentially; there are no jumps into the middle of a block.
• Often, a designated exit block for a procedure, containing the return at its end, is required. This is handled by additional transformations.
• The transitions between basic blocks are often depicted by flow charts.
Example: 3AC and Basic Blocks
Consider the following C program:

int a[2];
int b[7];

int skprod(int i1, int i2, int lng) { ... }

int main() {
  a[0] = 1; a[1] = 2;
  b[0] = 4; b[1] = 5; b[2] = 6;
  skprod(0,1,2);
  return 0;
}
28.06.2007 © A. Poetzsch-Heffter, TU Kaiserslautern
Example: 3AC and Basic Blocks (2)
3AC with basic block partitioning for the main procedure (the blank line separates the two basic blocks):

main:   a[0] := 1
        a[1] := 2
        b[0] := 4
        b[1] := 5
        b[2] := 6
        param 0
        param 1
        param 2
        call skprod

        return 0
Example: 3AC and Basic Blocks (3)
Procedure skprod:

int skprod(int i1, int i2, int lng) {
  int ix, res = 0;
  for( ix=0; ix <= lng-1; ix++ ){
    res += a[i1+ix] * b[i2+ix];
  }
  return res;
}
Example: 3AC and Basic Blocks (4)

Procedure skprod as 3AC with basic block partitioning (blank lines separate basic blocks):

skprod: res := 0
        ix := 0
        t0 := lng-1

L1:     if ix <= t0 goto L2        (false: fall through)

        return res

L2:     t1 := i1+ix
        t2 := a[t1]
        t1 := i2+ix
        t3 := b[t1]
        t1 := t2*t3
        res := res+t1
        ix := ix+1
        goto L1
Intermediate Language Variations

3AC after elimination of array operations (for the above example):

skprod: res := 0
        ix := 0
        t0 := lng-1

L1:     if ix <= t0 goto L2        (false: fall through)

        return res

L2:     t1 := i1+ix
        tx := t1*4
        ta := a+tx
        t2 := *ta
        t1 := i2+ix
        tx := t1*4
        tb := b+tx
        t3 := *tb
        t1 := t2*t3
        res := res+t1
        ix := ix+1
        goto L1
Characteristics of 3-Address Code
• Control flow is explicit.
• Only elementary operations.
• Rearrangement and exchange of commands can be handled relatively easily.
Further Intermediate Languages
We consider
• 3AC in Static Single Assignment (SSA) representation
• Stack Machine Code
Static Single Assignment Form
If a variable a is read at a program position, this is an application of a.
If a variable a is written at a program position, this is a definition of a.
For optimizations, the relationship between applications and definitions of variables is important.

In SSA representation, each variable has exactly one definition. Thus, the relationship between application and definition is explicit in the intermediate language.
Static Single Assignment Form (2)
SSA is essentially a refinement of 3AC.

The different definitions of one variable are distinguished by indexing the variable.
For sequential command lists, this means:
• At each definition position, the variable gets a new index.
• At an application position, the variable carries the index of its last definition.
Example: SSA
Because each variable has exactly one definition, SSA makes additional def-use or use-def chaining unnecessary.

Before SSA:          After SSA renaming:

a := x + y           a1 := x0 + y0
b := a - 1           b1 := a1 - 1
a := y + b           a2 := y0 + b1
b := x * 4           b2 := x0 * 4
a := a + b           a3 := a2 + b2
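The renaming for straight-line code can be sketched as follows (a minimal Python sketch; the tuple encoding of commands is hypothetical, chosen only for illustration):

```python
def to_ssa(cmds):
    """Rename a straight-line 3AC block into SSA form.

    Each command is (target, left, op, right); operands are variable
    names or integer constants.  Uses get the index of the last
    definition; each definition gets a fresh index.
    """
    version = {}                                # current index per variable

    def use(v):
        if isinstance(v, str):                  # variable: attach current index
            return f'{v}{version.setdefault(v, 0)}'
        return v                                # constant: unchanged

    out = []
    for tgt, a, op, b in cmds:
        a, b = use(a), use(b)                   # read before the new definition
        version[tgt] = version.get(tgt, 0) + 1  # fresh index for the definition
        out.append((f'{tgt}{version[tgt]}', a, op, b))
    return out
```

Applied to the sequential example above, it reproduces the indexed program (a1, b1, a2, b2, a3).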
SSA - Join Points of Control Flow
At join points of control flow, an additional mechanism is required:
If a variable is assigned in both branches, it is unclear which definition reaches a use after the join:

a1 := x0 + y0        a3 := a2 - b2
          \           /
           b3 := a?          (which definition of a?)
...
SSA - Join Points of Control Flow (2)

Introduce a fictitious "oracle function" φ that selects the value of the variable from the branch that was actually taken:

a1 := x0 + y0        a3 := a2 - b2
          \           /
       a4 := φ(a1, a3)
       b3 := a4
...
SSA - Remarks
• The construction of an SSA representation with a minimal number of applications of the φ oracle is a non-trivial task (cf. Appel, Sects. 19.1 and 19.2).
• The term Static Single Assignment form reflects that for each variable there is only one assignment in the program text. Dynamically, a variable in SSA representation can be assigned arbitrarily often (e.g., in loops).
Further Intermediate Languages
While 3AC and SSA representation are mostly used as intermediate languages inside compilers, intermediate languages and abstract machines are increasingly used as the interface between compilers and runtime environments.

Java ByteCode and CIL (Common Intermediate Language, cf. .NET) are examples of stack machine code, i.e., intermediate results are stored on a runtime stack.

Further intermediate languages are, for instance, used for optimizations.
Stack Machine Code as Intermediate Language
Homogeneous scenario for Java (one source language):

C1.java ──┐
          ├─(jikes)──► C1.class, C2.class ──┐
C2.java ──┘                                 │
                                            ├──► JVM
C2.java ──┐                                 │
          ├─(javac)──► C2.class, C3.class ──┘
C3.java ──┘             (Java ByteCode)
Stack Machine Code as Intermediate Language (2)
Possibly inhomogeneous scenario for .NET (programs of different high-level languages):

prog1.cs ──┐
           ├─(C# compiler)──────► prog1.il, prog2.il ──┐
prog2.cs ──┘                                           ├──► CLR
                                                       │
prog3.hs ────(Haskell compiler)─► prog3.il ────────────┘
                                  (Intermediate Language)
Example: Stack Machine Code
Consider the following Java source:

package beisp;

class Weltklasse extends Superklasse implements BesteBohnen {
  Qualifikation studieren(Arbeit schweiss) {
    return new Qualifikation();
  }
}
Example: Stack Machine Code (2)
The compiled bytecode (javap output):

Compiled from Weltklasse.java
class beisp.Weltklasse extends beisp.Superklasse implements beisp.BesteBohnen {
    beisp.Weltklasse();
    beisp.Qualifikation studieren(beisp.Arbeit);
}

Method beisp.Weltklasse()
   0 aload_0
   1 invokespecial #6 <Method beisp.Superklasse()>
   4 return

Method beisp.Qualifikation studieren(beisp.Arbeit)
   0 new #2 <Class beisp.Qualifikation>
   3 dup
   4 invokespecial #5 <Method beisp.Qualifikation()>
   7 areturn
Example: Stack Machine Code - CIL

Consider the following C# method:

public static void TestMethod(int a, int b) { // arguments
    int c; int d; int e;                      // locals
    c = a + b;
    d = 10;
    e = c + d;
}
Example: Stack Machine Code - CIL (2)
The corresponding CIL (disassembly):

.method public hidebysig static void TestMethod(int32 a, int32 b) cil managed
{
  // Code size 12 (0xc)
  .maxstack 2
  .locals init ([0] int32 c,
                [1] int32 d,
                [2] int32 e)
  IL_0000: ldarg.0
  IL_0001: ldarg.1
  IL_0002: add
  IL_0003: stloc.0
  IL_0004: ldc.i4.s 10
  IL_0006: stloc.1
  IL_0007: ldloc.0
  IL_0008: ldloc.1
  IL_0009: add
  IL_000a: stloc.2
  IL_000b: ret
} // end of method Class1::TestMethod

ldarg.num   load argument no. num onto the stack
ldloc.num   load local variable no. num onto the stack
ldc.num     load numeric constant num
Optimization
Optimization refers to improving the code with respect to the following goals:
• runtime behavior
• memory consumption
• size of code
• energy consumption
Optimization (2)
We distinguish the following kinds of optimizations:
• machine-independent optimizations
• machine-dependent optimizations (exploit properties of a particular real machine)

and

• local optimizations
• intra-procedural optimizations
• inter-procedural/global optimizations
Remark on Optimization
Appel (Chap. 17, p. 350):

"In fact, there can never be a complete list [of optimizations]. Computability theory shows that it will always be possible to invent new optimizing transformations."
Constant Propagation
If the value of a variable is constant, the variable can be replaced withthe constant.
Constant Folding

Evaluate all expressions with constants as operands at compile time.

Iteration of Constant Folding and Propagation:
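A minimal Python sketch of one pass that both propagates and folds constants over straight-line 3AC (the tuple encoding of commands is hypothetical, chosen only for illustration):

```python
import operator

OPS = {'+': operator.add, '-': operator.sub, '*': operator.mul}

def fold_and_propagate(cmds):
    """One forward pass of constant propagation and constant folding.

    Commands are (target, left, op, right); operands are variable
    names or integer constants.
    """
    env = {}                                    # variables known to be constant
    out = []
    for tgt, a, op, b in cmds:
        if isinstance(a, str): a = env.get(a, a)   # propagate constants into uses
        if isinstance(b, str): b = env.get(b, b)
        if isinstance(a, int) and isinstance(b, int):
            env[tgt] = OPS[op](a, b)            # fold: evaluate at compile time
            out.append((tgt, env[tgt]))         # tgt := constant
        else:
            env.pop(tgt, None)                  # tgt is no longer a known constant
            out.append((tgt, a, op, b))
    return out
```

Repeating the pass until nothing changes corresponds to the iteration of folding and propagation; for straight-line code a single forward pass already propagates all constants downstream.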
Non-local Constant Optimization
For each program position, the possible values of each variable are required. If the set of possible values is infinite, it has to be abstracted appropriately.
Copy Propagation
Eliminate copies of variables, i.e., if several variables x, y, z at a program position are known to have the same value, all applications of y and z are replaced by x.
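A sketch of procedure-local copy propagation on straight-line code (Python; the encoding is hypothetical: ('copy', x, y) stands for x := y):

```python
def copy_propagate(cmds):
    """Replace uses of copied variables by their representative.

    Commands are ('copy', x, y) for x := y, or (target, left, op, right).
    """
    rep = {}                                    # variable -> representative

    def r(v):
        return rep.get(v, v)

    out = []
    for c in cmds:
        if c[0] == 'copy':
            _, x, y = c
            rep[x] = r(y)                       # x now holds the same value as y
            out.append(('copy', x, rep[x]))
        else:
            tgt, a, op, b = c
            out.append((tgt, r(a), op, r(b)))
            rep.pop(tgt, None)                  # tgt is redefined ...
            for k in [k for k, v in rep.items() if v == tgt]:
                rep.pop(k)                      # ... so copies of tgt become stale
    return out
```

A later dead code elimination can then remove copy assignments whose targets are no longer used.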
Copy Propagation (2)
This can also be done at join points of control flow or for loops:
For each program point, the information which variables have the samevalue is required.
Common Subexpression Elimination
If an expression or a statement contains the same partial expressionseveral times, the goal is to evaluate this subexpression only once.
Common Subexpression Elimination (2)
Optimization of a basic block is done after transformation to SSA andconstruction of a DAG:
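The SSA/DAG construction can be approximated by value numbering: structurally identical right-hand sides become shared DAG nodes. A minimal sketch under the same hypothetical tuple encoding:

```python
def cse(cmds):
    """Common subexpression elimination in one basic block.

    Input is SSA-like straight-line code (target, left, op, right):
    each target is assigned only once, so an earlier result can
    safely be reused for a repeated (op, left, right) expression.
    """
    table = {}                           # (op, left, right) -> defining variable
    replaced = {}                        # eliminated variable -> surviving one
    out = []
    for tgt, a, op, b in cmds:
        a, b = replaced.get(a, a), replaced.get(b, b)
        key = (op, a, b)
        if key in table:
            replaced[tgt] = table[key]   # expression already computed: share it
        else:
            table[key] = tgt
            out.append((tgt, a, op, b))
    return out
```

The table plays the role of the DAG: each entry is a node, and repeated subexpressions map to the node that already exists.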
Common Subexpression Elimination (3)
Remarks:
• The elimination of repeated computations is often done before the transformation to 3AC, but can also be reasonable after other transformations.
• The DAG representation of expressions is also used as an intermediate language by some authors.
Algebraic Optimizations
Algebraic laws can be applied in order to enable other optimizations. For example, use associativity and commutativity of addition.

Caution: For finite data types, common algebraic laws are not valid in general.
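Two concrete counterexamples (Python sketch; wrap32 is a helper simulating two's-complement int32 arithmetic, since Python's own integers are unbounded):

```python
def wrap32(n):
    """Reduce n to a signed 32-bit value (two's-complement wrap-around)."""
    n &= 0xFFFFFFFF
    return n - 0x100000000 if n >= 0x80000000 else n

INT_MAX = 2**31 - 1

# "x + 1 > x" looks foldable to True, but fails on int32 overflow:
assert not (wrap32(INT_MAX + 1) > INT_MAX)        # wraps around to -2**31

# Reassociating floating-point addition changes the result:
assert (0.1 + 0.2) + 0.3 != 0.1 + (0.2 + 0.3)
```

An optimizer may therefore apply such laws only where the language semantics permit it (or where it can prove no overflow/rounding occurs).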
Strength Reduction
Replace expensive operations by more efficient ones (partially machine-dependent).

For example, y := 2*x can be replaced by

    y := x + x

or by

    y := x << 1
Inline Expansion of Procedure Calls
Replace a call to a non-recursive procedure by its body, with appropriate substitution of the parameters.
Note: This reduces execution time, but increases code size.
Inline Expansion of Procedure Calls (2)
Remarks:
• Expansion is in general more than text replacement: parameters must be bound, and the callee's local names must be renamed to avoid clashes with caller variables.
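A sketch of such an expansion for a non-recursive procedure in the tuple 3AC encoding (the `_p_` renaming prefix and the 'id' copy operation are arbitrary choices for illustration; a real compiler would generate guaranteed-fresh names):

```python
def inline(params, body, ret, args, prefix='_p_'):
    """Expand a call 'result := call p(args)' into p's renamed body.

    params: parameter names of the callee; body: list of
    (target, left, op, right); ret: variable returned by the callee;
    args: caller-side operands for the parameters.  Returns the
    spliced commands and the variable that now holds the result.
    """
    ren = {p: prefix + p for p in params}
    # copy the arguments into fresh parameter variables ('id' = plain copy)
    out = [(ren[p], a, 'id', None) for p, a in zip(params, args)]

    def r(v):
        if isinstance(v, str):
            return ren.setdefault(v, prefix + v)   # rename callee locals too
        return v

    for tgt, a, op, b in body:
        out.append((r(tgt), r(a), op, r(b)))
    return out, r(ret)
```

The parameter copies make the expansion safe even when the caller reuses a variable with the same name as a callee parameter.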
Inline Expansion of Procedure Calls (3)
• In OO programs with relatively short methods, expansion is an important optimization technique. However, precise information about the target object is required.
• A refinement of inline expansion is the specialization of procedures/functions when some of the actual parameters are known. This technique can also be applied to recursive procedures/functions.
Dead Code Elimination
Remove code that is not reached during execution or that has no influence on the result of the execution.

In one of the above examples, constant folding and propagation produced code in which the assignments to t3 and t4 can be removed, provided that t3 and t4 are no longer used after the basic block (i.e., they are not live).
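Given liveness information at the block end, dead assignments can be removed by one backward pass (Python sketch, same hypothetical encoding; it assumes commands have no side effects):

```python
def eliminate_dead(cmds, live_out):
    """Drop assignments whose target is not live (backward pass over
    one basic block).  Commands are (target, left, op, right);
    live_out is the set of variables live at the end of the block."""
    live = set(live_out)
    kept = []
    for tgt, a, op, b in reversed(cmds):
        if tgt in live:
            kept.append((tgt, a, op, b))
            live.discard(tgt)                                 # killed by the definition
            live |= {v for v in (a, b) if isinstance(v, str)} # operands become live
        # otherwise the assignment is dead and is dropped
    return kept[::-1]
```

Walking backwards lets one removal expose further removals within the same pass (an assignment used only by dead code is itself dead).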
Dead Code Elimination (2)
A typical example of unreachable, and thus dead, code that can be eliminated:
Dead Code Elimination (3)
Remarks:

• Dead code is often caused by optimizations.
• Another source of dead code is program modification.
• In the first case, liveness information is the prerequisite for dead code elimination.
Code Motion
Move commands between basic blocks such that they end up in blocks that are executed less often.

We consider two cases:

• moving commands into succeeding or preceding branches
• moving code out of loops

Optimizing loops is very profitable, because code inside loops is executed more often than code not contained in a loop.
Move Code between Execution Branches
If a sequential computation branches, the branches are executed less often than the sequence.
Move Code between Execution Branches (2)
Prerequisite for this optimization is that a defined variable is only used in one branch.

Moving a command into a preceding branch is reasonable if the command can be removed from one branch.
Partial Redundancy Elimination
Definition (Partial Redundancy)
An assignment is redundant at a program position s if it has already been executed on all paths to s.

An expression e is redundant at s if the value of e has already been calculated on all paths to s.

An assignment/expression is partially redundant at s if it is redundant with respect to some of the execution paths leading to s.
Partial Redundancy Elimination (2)
Example:
Partial Redundancy Elimination (3)
Elimination of partial redundancy:
Partial Redundancy Elimination (4)
Remarks:
• PRE can be seen as a combination and extension of common subexpression elimination and code motion.
• Extension: elimination of partial redundancy according to the estimated probability of executing specific paths.
Code Motion from Loops
Idea: Computations in loops whose operands are not changed inside the loop should be done outside the loop.

This is valid provided that t1 is not live at the end of the top-most block on the left side.
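A sketch of the hoisting decision for straight-line loop bodies (Python, same hypothetical encoding; this is a simplified criterion — a real implementation must additionally check dominance and liveness on loop exits):

```python
def hoist_invariants(loop_body):
    """Split a loop body into hoistable (loop-invariant) commands and
    the rest.  Commands are (target, left, op, right); a command is
    invariant if no operand is (re)defined in the loop and its own
    target is defined exactly once.
    """
    defs = {}
    for tgt, *_ in loop_body:
        defs[tgt] = defs.get(tgt, 0) + 1
    invariant, hoisted, kept = set(), [], []
    for tgt, a, op, b in loop_body:
        operands_inv = all(not isinstance(v, str) or v not in defs or v in invariant
                           for v in (a, b))
        if operands_inv and defs[tgt] == 1:
            invariant.add(tgt)
            hoisted.append((tgt, a, op, b))   # move to the loop preheader
        else:
            kept.append((tgt, a, op, b))
    return hoisted, kept
```

On the skprod loop, the computation t0 := lng-1 is classified as hoistable, while ix := ix+1 stays in the loop.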
Optimization of Loop Variables
Variables and expressions that are not changed during the execution of a loop are called loop invariants.

Loops often have variables that are systematically increased/decreased in each loop iteration, e.g., in for-loops.

Often, a loop variable depends on another loop variable, e.g., a relative address depends on the loop counter variable.
Optimization of Loop Variables (2)
Definition (Loop Variables)
A variable i is called an explicit loop variable of a loop S if there is exactly one definition of i in S, of the form i := i + c where c is a loop invariant.

A variable k is called a derived loop variable of a loop S if there is exactly one definition of k in S, of the form k := j * c or k := j + d, where j is a loop variable and c and d are loop invariants.
Induction Variable Analysis

Compute derived loop variables inductively, i.e., instead of computing them from the value of the loop variable, compute them from their value in the previous loop iteration.

Note: For the optimization of derived loop variables, dependencies between variable definitions have to be considered precisely.
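The effect can be shown by computing a derived loop variable k = j*c both ways (a small self-checking Python sketch; the parameter names are chosen for illustration):

```python
def derived_values(j0, step, c, n):
    """Values of the derived loop variable k = j*c over n iterations,
    computed directly (one multiplication per iteration) and
    inductively (one addition per iteration).
    """
    direct = [(j0 + i * step) * c for i in range(n)]
    inductive, k = [], j0 * c
    for _ in range(n):
        inductive.append(k)
        k += c * step            # k_next = (j + step)*c = k + c*step
    return direct, inductive
```

With j0 = 0, step = 1, and c = 4 — the relative-address pattern of the skprod example — both computations agree, but the inductive version has replaced the per-iteration multiplication by an addition (strength reduction).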
Loop Unrolling
If the number of loop executions is known statically, or properties of the number of loop executions can be inferred (e.g., that it is always even), the loop body can be copied several times to save comparisons and jumps.

This is valid provided that ix is dead at the end of the fragment. Note the static computation of ix in the unrolled loop.
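Full unrolling for a statically known trip count can be sketched as follows (same hypothetical tuple encoding; the counter is replaced by its statically computed value in each copy, so no compares or jumps remain):

```python
def fully_unroll(body, counter, start, stop, step):
    """Copy the loop body once per iteration, substituting the
    statically known counter value.  Body commands are
    (target, left, op, right); the counter itself is assumed dead
    afterwards, so its update is not emitted."""
    out = []
    for i in range(start, stop, step):
        subst = lambda v, i=i: i if v == counter else v
        for tgt, a, op, b in body:
            out.append((tgt, subst(a), op, subst(b)))
    return out
```

Partial unrolling works analogously: copy the body a fixed number of times per remaining loop iteration and adjust the counter update accordingly.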
Loop Unrolling (2)
Remarks:
• Partial loop unrolling aims at obtaining larger basic blocks in loops, and thereby more optimization options.
• Loop unrolling is particularly important for parallel processor architectures and pipelined processing (machine-dependent).
Optimization for Other Language Classes
The discussed optimizations aim at imperative languages. For optimizing programs of other language classes, special techniques have been developed.
For example:
• Object-oriented languages: Optimization of dynamic binding(type analysis)
• Non-strict functional languages: Optimization of lazy function calls(strictness analysis)
• Logic programming languages: Optimization of unification
Optimization Potential - Example

Consider procedure skprod for the evaluation of the optimization techniques.

Evaluation: number of steps (as a function of lng): 2 + 2 + 13*lng + 1 = 13*lng + 5
Optimization Potential - Example (2)

Move the computation of the loop invariant out of the loop.

Evaluation: 3 + 1 + 12*lng + 1 = 12*lng + 5
Optimization Potential - Example (3)

Optimization of loop variables: There are no derived loop variables, because t1 and tx have several definitions. Transformation to SSA for t1 and tx yields that t11, tx1, ta, t12, tb become derived loop variables.
Optimization Potential - Example (4)

Optimization of loop variables (2): Inductive definition of the loop variables.
Optimization Potential - Example (5)

Dead code elimination: t11, tx1, t12, tx2 do not influence the result.

Evaluation: 9 + 1 + 8*lng + 1 = 8*lng + 11
Optimization Potential - Example (6)
Algebraic Optimizations: Use the invariant ta = 4 · (i1 + ix) + a to replace the comparison by ta ≤ 4 · (i1 + t0) + a.
Optimization Potential - Example (7)

Dead code elimination: the assignment to ix is dead code and can be eliminated.

Evaluation: 11 + 1 + 7*lng + 1 = 7*lng + 13
Optimization Potential - Example (8)
Remarks:

• The number of execution steps is reduced by almost half, where the most significant reductions are achieved by the loop optimizations.
• Combining optimization techniques is important. Determining a good ordering of the optimizations is difficult in general.
• We have considered the optimizations only by example. The difficulty lies in finding algorithms and heuristics that detect optimization potential automatically and perform the optimizing transformations.
Data Flow Analysis
For optimizations, data flow information is required, which can be obtained by data flow analysis.

Goal: explain the basic concepts of data flow analysis by example.

Outline:
• Liveness analysis (a typical example of data flow analysis)
• Data flow equations
• Important analysis classes

Each analysis has an exact specification of the information it provides.
Liveness Analysis
Definition (Liveness Analysis)
Let P be a program. A variable v is live at a program position S if there is an execution path from S on which an application of v precedes a definition of v.

The liveness analysis determines, for all positions S in P, which variables are live at S.
Liveness Analysis (2)
Remarks:
• The definition of liveness of variables is static/syntactic. We have defined dead code dynamically/semantically.
• The result of the liveness analysis for a program P can be represented as a function live mapping positions in P to bit vectors, where a bit vector contains an entry for each variable in P. Let i be the index of a variable v in P; then it holds that:

live(S)[i] = 1 iff v is live at position S
Liveness Analysis (3)
Idea:
• For a procedure-local analysis, the global variables are live at the end of the procedure.
• If the live variables out(B) at the end of a basic block B are known, the live variables in(B) at the beginning of B are computed by:

in(B) = gen(B) ∪ (out(B) \ kill(B))

where
  – gen(B) is the set of variables v such that v is used in B without a prior definition of v
  – kill(B) is the set of variables that are defined in B
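The block-local transfer function above can be sketched in a few lines of Java. The class and method names (LivenessTransfer, liveIn) and the example block are illustrative, not from the lecture:

```java
import java.util.HashSet;
import java.util.Set;

public class LivenessTransfer {
    // in(B) = gen(B) ∪ (out(B) \ kill(B))
    static Set<String> liveIn(Set<String> gen, Set<String> kill, Set<String> out) {
        Set<String> in = new HashSet<>(out);
        in.removeAll(kill);   // out(B) \ kill(B)
        in.addAll(gen);       // ∪ gen(B)
        return in;
    }

    public static void main(String[] args) {
        // Block B:  x = y + z   =>   gen(B) = {y, z}, kill(B) = {x}
        Set<String> gen = Set.of("y", "z");
        Set<String> kill = Set.of("x");
        Set<String> out = Set.of("x", "w");
        // in(B) = {y, z} ∪ ({x, w} \ {x}) = {w, y, z}
        System.out.println(liveIn(gen, kill, out));
    }
}
```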
Liveness Analysis (4)
As the set in(B) is computed from out(B), we have a backward analysis.
out(B) is obtained by

out(B) = ⋃ in(Bi) for all successors Bi of B

For a program without loops, in and out are thereby defined for all basic blocks. Otherwise, we obtain a recursive system of equations.
Liveness Analysis - Example
Question: How do we compute out(B2)?
Data Flow Equations
Theory:
• There is always a solution for the equations of the considered form.
• There is always a smallest solution, which is obtained by an iteration starting from empty in and out sets.
Note: The solution is not necessarily unique.
Ambiguity of Solutions - Example
Thus, out(B0) = in(B0) and in(B0) = {a} ∪ in(B0).
Possible Solutions: in(B0) = {a} or in(B0) = {a, b}
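The ambiguity can be replayed in a few lines of Java (class and method names are illustrative): both {a} and {a, b} are fixpoints of the equation above, and iterating from the empty set reaches the smallest one.

```java
import java.util.HashSet;
import java.util.Set;

public class SmallestSolution {
    // The loop equation from the slide: in(B0) = {a} ∪ in(B0)
    static Set<String> apply(Set<String> in) {
        Set<String> r = new HashSet<>(in);
        r.add("a");
        return r;
    }

    // Iterate from the empty set until a fixpoint is reached.
    static Set<String> smallest() {
        Set<String> in = new HashSet<>();
        while (!apply(in).equals(in)) in = apply(in);
        return in;
    }

    public static void main(String[] args) {
        // Both candidate solutions are fixpoints of the equation ...
        System.out.println(apply(Set.of("a")).equals(Set.of("a")));           // true
        System.out.println(apply(Set.of("a", "b")).equals(Set.of("a", "b"))); // true
        // ... but iteration from ∅ yields the smallest one:
        System.out.println(smallest()); // [a]
    }
}
```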
Computation of Smallest Fixpoint
1. Compute gen(B), kill(B) for all B.
2. Set out(B) = ∅ for all B except for the exit block. For the exit block, out(B) comes from the program context.
3. While out(B) or in(B) changes for any B:
Compute in(B) from the current out(B) for all B.
Compute out(B) from the in sets of its successors for all B.
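The three steps above can be sketched as a round-robin iteration in Java. All names are illustrative, and the CFG in main is a made-up example with a loop (B1 → B2 → B1):

```java
import java.util.*;

public class LivenessFixpoint {
    // Round-robin fixpoint for liveness. Blocks are indices 0..n-1;
    // succ[b] lists the successors of block b; blocks without successors are exit blocks.
    static List<Set<String>> solve(List<Set<String>> gen, List<Set<String>> kill,
                                   int[][] succ, Set<String> exitOut) {
        int n = gen.size();
        List<Set<String>> in = new ArrayList<>(), out = new ArrayList<>();
        for (int b = 0; b < n; b++) { in.add(new HashSet<>()); out.add(new HashSet<>()); }
        // Step 2: out = ∅ everywhere; for exit blocks, out comes from the program context.
        for (int b = 0; b < n; b++)
            if (succ[b].length == 0) out.get(b).addAll(exitOut);
        boolean changed = true;
        while (changed) {                       // Step 3: iterate until nothing changes
            changed = false;
            for (int b = n - 1; b >= 0; b--) {  // backward analysis: visit in reverse order
                Set<String> newOut = new HashSet<>(out.get(b));
                for (int s : succ[b]) newOut.addAll(in.get(s)); // out(B) = ∪ in(Bi)
                Set<String> newIn = new HashSet<>(newOut);
                newIn.removeAll(kill.get(b));
                newIn.addAll(gen.get(b));       // in(B) = gen(B) ∪ (out(B) \ kill(B))
                if (!newOut.equals(out.get(b)) || !newIn.equals(in.get(b))) {
                    out.set(b, newOut); in.set(b, newIn); changed = true;
                }
            }
        }
        return in;
    }

    public static void main(String[] args) {
        // B0: i = 0        gen = {},     kill = {i}   succ: B1
        // B1: if i < n     gen = {i, n}, kill = {}    succ: B2, B3
        // B2: i = i + 1    gen = {i},    kill = {i}   succ: B1  (loop)
        // B3: return i     gen = {i},    kill = {}    exit
        List<Set<String>> gen = List.of(Set.of(), Set.of("i", "n"), Set.of("i"), Set.of("i"));
        List<Set<String>> kill = List.of(Set.of("i"), Set.of(), Set.of("i"), Set.of());
        int[][] succ = { {1}, {2, 3}, {1}, {} };
        List<Set<String>> in = solve(gen, kill, succ, Set.of());
        System.out.println(in.get(0)); // [n] — only n is live at the entry of B0
    }
}
```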
Further Analysis Classes
Many data flow analyses can be described as bit vector problems:
• Reaching definitions: Which definitions reach a position S?
• Available expressions: for the elimination of repeated computations
• Very busy expressions: Which expressions are needed for the subsequent computations?
The corresponding analyses can be treated analogously to liveness analysis, but differ in
• the definition of the data flow information
• the definition of gen and kill
• the direction of the analysis and the equations
Further Analysis Classes (2)
For backward analyses, the data flow information at the entry of a basic block B is obtained from the information at the exit of B:

in(B) = gen(B) ∪ (out(B) \ kill(B))
Analyses are further distinguished by whether they take the union or the intersection of the successor information:
out(B) = ⋃_{Bi ∈ succ(B)} in(Bi)

or

out(B) = ⋂_{Bi ∈ succ(B)} in(Bi)
Further Analysis Classes (3)
For forward analyses, the dependency is the other way round:
out(B) = gen(B) ∪ (in(B) \ kill(B))

with

in(B) = ⋃_{Bi ∈ pred(B)} out(Bi)

or

in(B) = ⋂_{Bi ∈ pred(B)} out(Bi)
Further Analysis Classes (4)
Overview of Analysis Classes:

         | union                | intersection
forward  | reaching definitions | available expressions
backward | live variables       | very busy expressions
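As a sketch of the forward/intersection cell of the table, the following Java code solves available expressions on a small diamond-shaped CFG (all names and the example blocks are illustrative). Note that for an intersection-based analysis the iteration conventionally starts from the full set of expressions (except at the entry block) and converges to the greatest solution:

```java
import java.util.*;

public class AvailableExpr {
    // Forward analysis with intersection:
    //   out(B) = gen(B) ∪ (in(B) \ kill(B)),  in(B) = ∩ out(Bi) over predecessors Bi.
    // Block 0 is the entry block (nothing is available on entry).
    static List<Set<String>> solve(List<Set<String>> gen, List<Set<String>> kill,
                                   int[][] pred, Set<String> universe) {
        int n = gen.size();
        List<Set<String>> in = new ArrayList<>(), out = new ArrayList<>();
        for (int b = 0; b < n; b++) {
            in.add(new HashSet<>());
            out.add(new HashSet<>(universe)); // start from the full set
        }
        boolean changed = true;
        while (changed) {
            changed = false;
            for (int b = 0; b < n; b++) {
                Set<String> newIn = new HashSet<>(b == 0 ? Set.of() : universe);
                for (int p : pred[b]) newIn.retainAll(out.get(p)); // intersection
                Set<String> newOut = new HashSet<>(newIn);
                newOut.removeAll(kill.get(b));
                newOut.addAll(gen.get(b));
                if (!newIn.equals(in.get(b)) || !newOut.equals(out.get(b))) {
                    in.set(b, newIn); out.set(b, newOut); changed = true;
                }
            }
        }
        return in;
    }

    public static void main(String[] args) {
        // B0: t = b + c   gen = {b+c}                 succ: B1, B2
        // B1: d = 1       kill = {a*d}    pred: B0
        // B2: u = a * d   gen = {a*d}     pred: B0
        // B3: join                        pred: B1, B2
        Set<String> universe = Set.of("b+c", "a*d");
        List<Set<String>> gen = List.of(Set.of("b+c"), Set.of(), Set.of("a*d"), Set.of());
        List<Set<String>> kill = List.of(Set.of(), Set.of("a*d"), Set.of(), Set.of());
        int[][] pred = { {}, {0}, {0}, {1, 2} };
        List<Set<String>> in = solve(gen, kill, pred, universe);
        System.out.println(in.get(3)); // [b+c] — only b+c is available at the join
    }
}
```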
Further Analysis Classes (5)
For bit vector problems, the data flow information consists of subsets of finite sets.

For other analyses, the collected information is more complex; e.g., for constant propagation, we consider mappings from variables to values.

For interprocedural analyses, complexity increases because the flow graph is not static.

The formal basis for the development and correctness of optimizations is provided by the theory of abstract interpretation.
Non-Local Program Analysis
Using a points-to analysis as an example, we consider
• interprocedural aspects: The analysis crosses the boundaries of single procedures.
• constraints: Program analysis very often involves solving or refining constraints.
• complex analysis results: The analysis result cannot be represented locally for a statement.
• analysis as abstraction: The result of the analysis is an abstraction of all possible program executions.
Points-to Analysis
Analysis for programs with pointers and for object-oriented programs

Goal: Compute, for each variable, which records/objects it can reference.
Applications of Analysis Results:

Basis for optimizations
• Alias information (e.g., important for code motion)
  – Can p.f = x cause changes to an object referenced by q?
  – Can z = p.f read information that is written by p.f = x?
• Call graph construction
• Resolution of virtual method calls
• Escape analysis
Alias Information

Examples (use of points-to analysis information):

A. Use of alias information:
(1) p.f = x;
(2) y = q.f;
(3) q.f = z;

If p == q:
(1) p.f = x;
(2) y = x;
(3) q.f = z;

If p != q: The first statement can be swapped with the other two.

28.06.2007 © A. Poetzsch-Heffter, TU Kaiserslautern
Elimination of Dynamic Binding

B. Elimination of dynamic binding:

class A {
  void m( ... ) { ... }
}
class B extends A {
  void m( ... ) { ... }
}
...
A p;
p = new B();
p.m(...) // call of B::m

The points-to analysis shows that p can only reference objects created by new B(), so the dynamically bound call p.m(...) can be replaced by a direct call of B::m.
Escape Analysis

C. Escape analysis:

R m( A p ) {
  B q;
  q = new B(); // stack allocation possible: the B object does not escape
  q.f = p;
  q.g = p.n();
  return q.g;
}
A Points-to Analysis for Java
Simplifications:
• The complete program is known.
• Only assignments and method calls of the following forms are used:
  – Direct assignment: l = r
  – Write to an instance variable: l.f = r
  – Read of an instance variable: l = r.f
  – Object creation: l = new C()
  – Simple method call: l = r0.m(r1, ...)
• Expressions without side effects
• Compound statements
A Points-to Analysis for Java (2)
Analysis Type:
• Flow-insensitive: The control flow of the program has no influence on the analysis result. The states of the variables at different program points are combined.
• Context-insensitive: Method calls at different program points are not distinguished.
A Points-to Analysis for Java (3)
Points-to Graph as Abstraction
The result of the analysis is a points-to graph with
• abstract variables and abstract objects as nodes
• edges representing that an abstract variable may hold a reference to an abstract object

Abstract variables V represent sets of concrete variables at runtime.
Abstract objects O represent sets of concrete objects at runtime.
An edge between V and O means that, in some program state, a concrete variable in V may reference an object in O.
Points-to Graph - Example

class Y { ... }
class X {
  Y f;
  void set( Y r ) { this.f = r; }
  static void main() {
    X p = new X(); // s1 "creates" o1
    Y q = new Y(); // s2 "creates" o2
    p.set(q);
  }
}

[Points-to graph: p → o1, this → o1, q → o2, r → o2, and an f-labeled edge from o1 to o2]
Points-to Graph - Example (2)
Definition of the Points-to Graph
For all method implementations,
• create a node oi for each object creation
• create nodes for
  – each local variable v
  – each formal parameter p of any method (incl. this and results (ret))
  – each static variable s

(Instance variables are modeled by labeled edges.)
Definition of the Points-to Graph (2)

Edges: Smallest fixpoint of f : PtGraph × Stmt → PtGraph with
• f(G, l = new C()) = G ∪ {(l, oi)}, where oi is the abstract object of the creation site
• f(G, l = r) = G ∪ {(l, oi) | oi ∈ Pt(G, r)}
• f(G, l.f = r) = G ∪ {(⟨oi, f⟩, oj) | oi ∈ Pt(G, l), oj ∈ Pt(G, r)}
• f(G, l = r.f) = G ∪ {(l, oi) | ∃ oj ∈ Pt(G, r). oi ∈ Pt(G, ⟨oj, f⟩)}
• f(G, l = r0.m(r1, ..., rn)) = G ∪ ⋃_{oi ∈ Pt(G, r0)} solve(G, m, oi, r1, ..., rn, l)

where Pt(G, x) is the points-to set of x in G,

solve(G, m, oi, r1, ..., rn, l) =
  let mj(p0, p1, ..., pn, retj) = dispatch(oi, m) in
    {(p0, oi)} ∪ f(G, p1 = r1) ∪ ... ∪ f(G, l = retj)
  end

and dispatch(oi, m) returns the actual implementation of m for oi with formal parameters p1, ..., pn and result variable retj; p0 refers to this.
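The transfer functions for the simple statement forms (i.e., without method calls, solve, and dispatch) can be sketched in Java. Class and method names are illustrative, and the main method replays the earlier X/Y slide example with the call p.set(q) inlined by hand:

```java
import java.util.*;

public class PointsToSketch {
    // Edges of the points-to graph: variable -> abstract objects,
    // and (abstract object, field) -> abstract objects (keyed as "oi.f").
    final Map<String, Set<String>> varPt = new HashMap<>();
    final Map<String, Set<String>> fieldPt = new HashMap<>();

    Set<String> pt(String v) { return varPt.computeIfAbsent(v, k -> new HashSet<>()); }
    Set<String> ptField(String o, String f) {
        return fieldPt.computeIfAbsent(o + "." + f, k -> new HashSet<>());
    }

    // f(G, l = new C()) adds (l, oi) for the abstract object oi of this creation site
    void newObj(String l, String oi) { pt(l).add(oi); }

    // f(G, l = r) adds (l, oi) for every oi ∈ Pt(G, r)
    void copy(String l, String r) { pt(l).addAll(pt(r)); }

    // f(G, l.f = r) adds (⟨oi, f⟩, oj) for oi ∈ Pt(G, l), oj ∈ Pt(G, r)
    void store(String l, String f, String r) {
        for (String oi : pt(l)) ptField(oi, f).addAll(pt(r));
    }

    // f(G, l = r.f) adds (l, oi) whenever oj ∈ Pt(G, r) and oi ∈ Pt(G, ⟨oj, f⟩)
    void load(String l, String r, String f) {
        Set<String> objs = new HashSet<>();
        for (String oj : pt(r)) objs.addAll(ptField(oj, f));
        pt(l).addAll(objs);
    }

    public static void main(String[] args) {
        // p = new X() creates o1, q = new Y() creates o2;
        // p.set(q) inlined: this = p; r = q; this.f = r
        PointsToSketch g = new PointsToSketch();
        g.newObj("p", "o1");
        g.newObj("q", "o2");
        g.copy("this", "p");
        g.copy("r", "q");
        g.store("this", "f", "r");
        g.load("y", "p", "f"); // a hypothetical read y = p.f
        System.out.println(g.pt("y")); // [o2]
    }
}
```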
Definition of the Points-to Graph (3)
Remark:
The main problem for the practical use of the analysis is the efficient implementation of the computation of the points-to graph.
Literature:
A. Rountev, A. Milanova, B. Ryder: Points-to Analysis for Java Using Annotated Constraints. OOPSLA 2001.