CSE 431S Final Review - Washington University in St. Louis
CSE 431S Final Review
Washington University
Spring 2013
What You Should Know
• The six stages of a compiler and what each stage does.
• The input to and output of each compilation stage (especially the back-end).
• Context-free languages.
• Definition of a context-free grammar (including the formal definition).
• Leftmost and rightmost derivations and parse trees.
• Ambiguity.
What You Should Know
• Bottom-up (shift-reduce) parsing.
• LR(0) parser construction.
• SLR conflict resolution.
• LR(1) parser construction.
• Abstract syntax trees.
• L-value vs R-value
• Static type checking.
• Symbol tables.
What You Should Know
• CUP actions.
• Jasmin basics.
• Code generation.
• Call stack (function activation).
• Stack-based vs heap-based memory allocation.
• Parameter passing mechanisms.
• Register allocation (graph coloring).
Context-Free Languages
• Recall right-linear grammars:
X → a Y | b
Restricted right-hand side
• Context-free grammars:
– Allow anything on the right-hand side.
A → ( A ) | x
Context-Free Grammars
• A grammar is a 4-tuple:
– ∑: set of terminals
– V: set of nonterminals
– S: start nonterminal
– P: set of productions (rewrite rules)
• For a grammar to be context-free, all productions must be of the form:
– A → α, where α is any sequence of symbols (terminals and nonterminals)
Ambiguity
What about: E → E + E | a
Two syntax trees for the string “a + a + a”:
(a + a) + a, where the left child of the root + is itself E + E
a + (a + a), where the right child of the root + is itself E + E
Ambiguity
• If there are multiple parse trees--or, equivalently, multiple leftmost derivations--for some string then the grammar is ambiguous.
– Note that it is the grammar that is ambiguous, not the language.
• There may exist a non-ambiguous grammar for the same language.
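As a rough check of the ambiguity claim, the sketch below counts the parse trees that E → E + E | a admits for a string of n a's joined by +. The class and method names are illustrative only. The count follows from choosing which + sits at the root and multiplying the counts for the two sides.

```java
import java.util.HashMap;
import java.util.Map;

class AmbiguityDemo {
    private static final Map<Integer, Long> MEMO = new HashMap<>();

    // Number of distinct parse trees under E -> E + E | a
    // for a string containing n occurrences of "a" joined by n - 1 "+".
    static long countParseTrees(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        if (n == 1) {
            return 1; // only E -> a applies
        }
        Long cached = MEMO.get(n);
        if (cached != null) {
            return cached;
        }
        long total = 0;
        // Pick the "+" at the root: the left subtree covers i a's, the right n - i.
        for (int i = 1; i < n; i++) {
            total += countParseTrees(i) * countParseTrees(n - i);
        }
        MEMO.put(n, total);
        return total;
    }

    public static void main(String[] args) {
        System.out.println(countParseTrees(3)); // 2
    }
}
```

Any count above 1 means the string has more than one parse tree, so the grammar is ambiguous.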
Bottom-Up Parsing
• Instead of starting from the start nonterminal and producing the parse tree top-down, start from the leaves and build the tree bottom-up
– Start nonterminal is now the goal nonterminal
Sample Grammar
1. S → E $
2. E → E + T
3.   | T
4. T → a
5.   | ( E )
LR(0) Item
• The dot “•” represents the current parse state (i.e., what has been “seen” so far)
• The initial set of rules is called the “kernel”
• The “non-kernel” items are generated by the “closure” operation and expand any nonterminal immediately after the dot
LR(0) Parse States
I0 = START
S → • E $
LR(0) Parse States
I0 = START
S → • E $        [1]
E → • E + T      [1]
E → • T          [9]
T → • a          [5]
T → • ( E )      [6]
• The closure operation adds all of the rules for a nonterminal to the immediate right of the dot
– “Close” on the operation
• The number in the square (shown here in brackets) indicates which state to go to on the symbol to the right of the dot
– Must go to a single state for each symbol (deterministic)
LR(0) Parse States
I0 = START
S → • E $        [1]
E → • E + T      [1]
E → • T          [9]
T → • a          [5]
T → • ( E )      [6]
I1 = GOTO(I0, “E”)
S → E • $        [2]
E → E • + T      [3]
• There must be only one state with a given kernel
– i.e., no identical states
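The closure and GOTO operations can be sketched in a few lines of Java. This is only an illustration for the sample grammar, not the construction used by any particular tool. It assumes an item is stored as a list whose first element is the left-hand side, followed by the right-hand side with a "." marker standing in for the dot. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class Lr0Closure {
    static final String DOT = ".";

    // Sample grammar: S -> E $, E -> E + T | T, T -> a | ( E )
    static final Map<String, List<List<String>>> GRAMMAR = Map.of(
            "S", List.of(List.of("E", "$")),
            "E", List.of(List.of("E", "+", "T"), List.of("T")),
            "T", List.of(List.of("a"), List.of("(", "E", ")")));

    // Builds the item "lhs -> . rhs" (dot at the far left).
    static List<String> item(String lhs, List<String> rhs) {
        List<String> it = new ArrayList<>();
        it.add(lhs);
        it.add(DOT);
        it.addAll(rhs);
        return it;
    }

    // Symbol immediately right of the dot, or null when the dot is at the end.
    static String afterDot(List<String> item) {
        int i = item.indexOf(DOT);
        return i + 1 < item.size() ? item.get(i + 1) : null;
    }

    // Adds every rule of each nonterminal that appears right after a dot.
    static Set<List<String>> closure(Set<List<String>> kernel) {
        Set<List<String>> state = new LinkedHashSet<>(kernel);
        List<List<String>> work = new ArrayList<>(kernel);
        for (int k = 0; k < work.size(); k++) { // work grows as items are added
            String next = afterDot(work.get(k));
            if (next == null || !GRAMMAR.containsKey(next)) {
                continue; // dot at the end, or a terminal follows
            }
            for (List<String> rhs : GRAMMAR.get(next)) {
                List<String> added = item(next, rhs);
                if (state.add(added)) {
                    work.add(added);
                }
            }
        }
        return state;
    }

    // Moves the dot over `symbol` in every item that allows it, then closes.
    static Set<List<String>> goTo(Set<List<String>> state, String symbol) {
        Set<List<String>> kernel = new LinkedHashSet<>();
        for (List<String> it : state) {
            if (symbol.equals(afterDot(it))) {
                List<String> moved = new ArrayList<>(it);
                int i = moved.indexOf(DOT);
                moved.set(i, symbol);
                moved.set(i + 1, DOT);
                kernel.add(moved);
            }
        }
        return closure(kernel);
    }

    public static void main(String[] args) {
        Set<List<String>> i0 = closure(Set.of(item("S", List.of("E", "$"))));
        i0.forEach(it -> System.out.println(String.join(" ", it)));
        System.out.println("GOTO(I0, E):");
        goTo(i0, "E").forEach(it -> System.out.println(String.join(" ", it)));
    }
}
```

Starting from the kernel item for S, the closure yields the five items of I0, and GOTO on E yields the two kernel items of I1.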
Example
1. S → A C $
2. A → a B C d
3.   | B Q
4.   | λ
5. B → b B
6.   | d
7. C → c
8.   | λ
9. Q → q
LR(0) Parse States
(The bracketed number after an item is the state reached on the symbol to the right of the dot.)
I0
S → • A C $        [1]
A → • a B C d      [5]
A → • B Q          [12]
A → •
B → • b B          [9]
B → • d            [11]
I1
S → A • C $        [2]
C → • c            [4]
C → •
I2
S → A C • $        [3]
I3
S → A C $ •
I4
C → c •
I5
A → a • B C d      [6]
B → • b B          [9]
B → • d            [11]
I6
A → a B • C d      [7]
C → • c            [4]
C → •
I7
A → a B C • d      [8]
I8
A → a B C d •
I9
B → b • B          [10]
B → • b B          [9]
B → • d            [11]
I10
B → b B •
I11
B → d •
I12
A → B • Q          [13]
Q → • q            [14]
I13
A → B Q •
I14
Q → q •
Grammar is not LR(0) parsable: shift/reduce conflicts in states 0, 1, and 6
SLR(1)
• Create the LR(0) states.
• If there are no conflicts then we are done.
• For states with conflicts
– Try to use follow sets to resolve the conflicts.
– If all conflicts can be resolved using the follow sets then the grammar is SLR(1).
SLR(1)
• Shift/Reduce conflict
– Need to make sure that every terminal to the immediate right of a • is not in the Follow set of the nonterminal of the reduction rule
I1
S → A • C $
C → • c
C → •
I6
A → a B • C d
C → • c
C → •
States 1 and 6: Follow(C) = { d, $ } so c is not an element of Follow(C)
SLR(1)
• All conflicts can be resolved using the Follow sets, so the grammar is SLR parsable
I0
S → • A C $
A → • a B C d
A → • B Q
A → •
B → • b B
B → • d
State 0: Follow(A) = { c, $ } so a, b, and d are not elements of Follow(A)
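The Follow sets quoted above can be reproduced with a small fixed-point computation. The sketch below is one common way to do it, not necessarily how the course computes them. The array encoding of the productions is an assumption made for brevity: the left-hand side comes first, and a production with nothing after it is a λ rule.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class FollowSets {
    // Example grammar; each row is {lhs, rhs symbols...}. "$" is an ordinary terminal here.
    static final String[][] PRODUCTIONS = {
        {"S", "A", "C", "$"},
        {"A", "a", "B", "C", "d"},
        {"A", "B", "Q"},
        {"A"},
        {"B", "b", "B"},
        {"B", "d"},
        {"C", "c"},
        {"C"},
        {"Q", "q"},
    };
    static final Set<String> NONTERMINALS = Set.of("S", "A", "B", "C", "Q");

    // FIRST of a single symbol: the symbol itself for a terminal.
    static Set<String> firstOf(String symbol, Map<String, Set<String>> first) {
        return NONTERMINALS.contains(symbol) ? first.get(symbol) : Set.of(symbol);
    }

    // Iterates nullable, FIRST, and FOLLOW together until nothing changes.
    static Map<String, Set<String>> follow() {
        Set<String> nullable = new HashSet<>();
        Map<String, Set<String>> first = new HashMap<>();
        Map<String, Set<String>> follow = new HashMap<>();
        for (String n : NONTERMINALS) {
            first.put(n, new TreeSet<>());
            follow.put(n, new TreeSet<>());
        }
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String[] p : PRODUCTIONS) {
                String lhs = p[0];
                // FIRST(lhs) and nullable(lhs) from the right-hand side prefix.
                boolean allNullable = true;
                for (int i = 1; i < p.length && allNullable; i++) {
                    changed |= first.get(lhs).addAll(firstOf(p[i], first));
                    allNullable = nullable.contains(p[i]);
                }
                if (allNullable) {
                    changed |= nullable.add(lhs);
                }
                // FOLLOW of each nonterminal on the right-hand side.
                for (int i = 1; i < p.length; i++) {
                    if (!NONTERMINALS.contains(p[i])) {
                        continue;
                    }
                    boolean restNullable = true;
                    for (int j = i + 1; j < p.length && restNullable; j++) {
                        changed |= follow.get(p[i]).addAll(firstOf(p[j], first));
                        restNullable = nullable.contains(p[j]);
                    }
                    // Everything after p[i] can vanish: inherit FOLLOW(lhs).
                    if (restNullable && !p[i].equals(lhs)) {
                        changed |= follow.get(p[i]).addAll(follow.get(lhs));
                    }
                }
            }
        }
        return follow;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> f = follow();
        System.out.println("Follow(A) = " + f.get("A"));
        System.out.println("Follow(C) = " + f.get("C"));
    }
}
```

For this grammar the computation gives Follow(A) = { c, $ } and Follow(C) = { d, $ }, matching the sets used to resolve the conflicts in states 0, 1, and 6.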
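SLR(1) State Table
State  a    b    c    d    q    $    A    B    C    Q    S
0      S5   S9   R4   S11       R4   S1   S12            Done
1                S4   R8        R8             S2
2                               S3
3      R1
4      R7
5           S9        S11                 S6
6                S4   R8        R8             S7
7                     S8
8      R2
9           S9        S11                 S10
10     R5
11     R6
12                         S14                      S13
13     R3
14     R9
Rows with a single R entry are states that contain only a completed item; the entry names the rule to reduce by and is not tied to a column.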
Sample Parse
Stack                  Remaining Input   Action
-0                     a b b d d c $     S5
-0 a5                  b b d d c $       S9
-0 a5 b9               b d d c $         S9
-0 a5 b9 b9            d d c $           S11
-0 a5 b9 b9 d11        d c $             R6
-0 a5 b9 b9            B d c $           S10
-0 a5 b9 b9 B10        d c $             R5
-0 a5 b9               B d c $           S10
-0 a5 b9 B10           d c $             R5
-0 a5                  B d c $           S6
-0 a5 B6               d c $             R8
-0 a5 B6               C d c $           S7
-0 a5 B6 C7            d c $             S8
-0 a5 B6 C7 d8         c $               R2
-0                     A c $             S1
-0 A1                  c $               S4
-0 A1 c4               $                 R7
-0 A1                  C $               S2
-0 A1 C2               $                 S3
-0 A1 C2 $3                              R1
-0                     S                 Done
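The trace can be replayed with a small table-driven driver. The sketch below is illustrative only and makes two assumptions. Its shift and goto entries are transcribed from the LR(0) states of the example grammar (d leads to the state holding B → d •, and B from state 0 leads to the state holding A → B • Q). States that contain only a completed item are assumed to reduce whatever the next symbol is. As in the trace, a reduction pushes the left-hand nonterminal back onto the front of the input.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SlrParserDemo {
    // Productions 1..9 of the example grammar: left-hand side and right-hand-side length.
    static final String[] LHS = {null, "S", "A", "A", "A", "B", "B", "C", "C", "Q"};
    static final int[] RHS_LEN = {0, 3, 4, 2, 0, 2, 1, 1, 0, 1};

    // Key "state symbol" -> action ("S5" = shift to 5, "R4" = reduce by rule 4, "Done").
    static final Map<String, String> TABLE = new HashMap<>();

    static void row(int state, String... pairs) {
        for (int i = 0; i < pairs.length; i += 2) {
            TABLE.put(state + " " + pairs[i], pairs[i + 1]);
        }
    }

    static {
        row(0, "a", "S5", "b", "S9", "c", "R4", "d", "S11", "$", "R4",
               "A", "S1", "B", "S12", "S", "Done");
        row(1, "c", "S4", "d", "R8", "$", "R8", "C", "S2");
        row(2, "$", "S3");
        row(5, "b", "S9", "d", "S11", "B", "S6");
        row(6, "c", "S4", "d", "R8", "$", "R8", "C", "S7");
        row(7, "d", "S8");
        row(9, "b", "S9", "d", "S11", "B", "S10");
        row(12, "q", "S14", "Q", "S13");
    }

    // Assumption: states with only a completed item reduce on any lookahead.
    static final Map<Integer, Integer> REDUCE_ONLY =
            Map.of(3, 1, 4, 7, 8, 2, 10, 5, 11, 6, 13, 3, 14, 9);

    // Parses the tokens and returns the sequence of actions taken.
    static List<String> parse(String... tokens) {
        Deque<Integer> states = new ArrayDeque<>();
        states.push(0);
        Deque<String> input = new ArrayDeque<>(Arrays.asList(tokens));
        List<String> actions = new ArrayList<>();
        while (true) {
            int state = states.peek();
            String action;
            if (REDUCE_ONLY.containsKey(state)) {
                action = "R" + REDUCE_ONLY.get(state);
            } else {
                action = input.isEmpty() ? null : TABLE.get(state + " " + input.peekFirst());
            }
            if (action == null) {
                throw new IllegalStateException("syntax error in state " + state);
            }
            actions.add(action);
            if (action.equals("Done")) {
                return actions;
            }
            int n = Integer.parseInt(action.substring(1));
            if (action.charAt(0) == 'S') {
                input.removeFirst();   // consume the symbol
                states.push(n);        // and enter the target state
            } else {
                for (int i = 0; i < RHS_LEN[n]; i++) {
                    states.pop();      // pop one state per right-hand-side symbol
                }
                input.addFirst(LHS[n]); // the nonterminal becomes the next input symbol
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("a", "b", "b", "d", "d", "c", "$"));
    }
}
```

Running it on a b b d d c $ produces the same action sequence as the trace, starting with S5 and ending with R1 and Done.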
Syntax Trees
• Concrete
– Actual parse tree
• Abstract
– Eliminates unnecessary nodes
– Structures the tree appropriately for evaluation
– Serves as basis for code generation
Concrete vs. Abstract
Construction
• Java code added to productions
– Most common action is to build a new tree node and assign to RESULT, which attaches it to the left-hand nonterminal
  • Values for the nonterminals on the right-hand side are usually child tree nodes
Stmt ::= id:id assign E:e
{: RESULT = new AssignmentNode(id, e); :}
| if lparen E:pr rparen Stmt:s fi
{: RESULT = new IfNode(pr, s); :}
| if lparen E:pr rparen Stmt:s1 else Stmt:s2 fi
{: RESULT = new IfNode(pr, s1, s2); :}
…
;
Construction
Stmt ::= begin Stmts:block end
{: RESULT = block; :}
;
Stmts ::= Stmts:block semi Stmt:stmt
{:
block.add(stmt);
RESULT = block;
:}
| Stmt:s
{: RESULT = new BlockNode(s); :}
;
Construction
• Alternate construction of BlockNode
Stmt ::= begin Stmts:list end
{: RESULT = new BlockNode(list); :}
;
Stmts ::= Stmts:list semi Stmt:stmt
{:
list.add(stmt);
RESULT = list;
:}
| Stmt:s
{: RESULT = new ArrayList();
RESULT.add(s);
:}
;
Left and Right Values
x = y
• “x” is the L-value
– Refers to the location of “x”, not its value
• “y” is the R-value
– Refers to the value of “y”, not its location
Example
Note that there is an error in this figure. The deref in the tree for example b should not be there.
Type Checking
• When are types checked?
– Statically at compile time
  • Compiler does type checking during compilation
  • Ideally eliminate runtime checks
– Dynamically at runtime
  • Compiler generates code to do type checking at runtime
• JavaScript vs. Java
• Java still does a large amount of runtime type checking
• We’ll focus on static typing for basic types
Expression Types
• For every operator we need to know
– allowed types of operands
– resulting type
– implicit coercion
  • changes the representation, not the data
  • short to long
– implicit conversion
  • may change the data
  • int to float
– explicit cast
  • may lose information
  • float to int, int to short
What are the types?
=  (?)
    x  (int)
    +  (?)
        y  (int)
        3.14  (float)
Determining Types
• make sure type is allowed (int + float)
• assign the resultant type to the operator (float)
• generate any necessary coercion(s) or conversion(s)
– most hardware has (int + int) and (float + float) but not (int + float)
Adding Coercion
=  (?)
    x  (int)
    +  (float)
        int2float  (float)
            y  (int)
        3.14  (float)
Explicit Casting
=  (int)
    x  (int)
    float2int  (int)
        +  (float)
            int2float  (float)
                y  (int)
            3.14  (float)
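The typing steps for x = y + 3.14 can be sketched as follows. This is a minimal illustration with only int and float, not a full type checker. The Expr class and the int2float / float2int strings are stand-ins for tree nodes, and all names are hypothetical.

```java
class CoercionDemo {
    enum Type { INT, FLOAT }

    // A typed expression; `text` stands in for the subtree it represents.
    static final class Expr {
        final String text;
        final Type type;

        Expr(String text, Type type) {
            this.text = text;
            this.type = type;
        }
    }

    // Implicit conversion: only int -> float is inserted automatically.
    static Expr widen(Expr e, Type target) {
        if (e.type == target) {
            return e;
        }
        if (e.type == Type.INT && target == Type.FLOAT) {
            return new Expr("int2float(" + e.text + ")", Type.FLOAT);
        }
        throw new IllegalArgumentException(
                "no implicit conversion from " + e.type + " to " + target);
    }

    // The operator's type is float if either operand is float; operands are converted to match.
    static Expr add(Expr left, Expr right) {
        Type result = (left.type == Type.FLOAT || right.type == Type.FLOAT)
                ? Type.FLOAT : Type.INT;
        return new Expr(widen(left, result).text + " + " + widen(right, result).text, result);
    }

    // Explicit cast: may lose information, so it is never inserted implicitly.
    static Expr cast(Expr e, Type target) {
        if (e.type == target) {
            return e;
        }
        String op = (target == Type.INT) ? "float2int" : "int2float";
        return new Expr(op + "(" + e.text + ")", target);
    }

    public static void main(String[] args) {
        Expr sum = add(new Expr("y", Type.INT), new Expr("3.14", Type.FLOAT));
        System.out.println(sum.text + " : " + sum.type);
        Expr rhs = cast(sum, Type.INT);
        System.out.println("x = " + rhs.text + " : " + rhs.type);
    }
}
```

The sum is typed float with y wrapped in int2float, and assigning it to the int variable x only type-checks once the float2int cast is applied.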
Symbol Table
Proc → Dcls Body
[Figure: parse tree for Proc with children Dcls (int i; float j;) and Body (i=3; j = i * 3.14;). Symbol info is synthesized from Dcls and inherited by Body.]
Symbol Table
• Persists the synthesized information as a side effect of the translation
• Maps a name and environment to information
– Environment is the scope
– Scope is static
• Basic actions
– Establish a mapping
– Retrieve a mapping
public class Car {
    int id;
    int color;
    int GetType() {
        String id;
    }
    public class Wheel {
        Object id;
        int GetType() {
            float id;
        }
    }
}
Name    Scope               Info
id      Car                 int
color   Car                 int
id      Car:GetType         String
id      Car:Wheel           Object
id      Car:Wheel:GetType   float
Scopes
• Scopes are static
• Scopes are nested
– LIFO (last in, first out)
Car scope
    GetType scope
    Wheel scope
        GetType scope
Possible Implementations
• Option 1: Keep all information available at all times
• Option 2: Use LIFO and process a scope at a time
Name Scope Info
id Car int
color Car int
id Car:GetType String
id Car:Wheel Object
id Car:Wheel:GetType float
LIFO Scopes
• Symbol table will be a stack of maps of name to information
• One map per scope (environment)
• Four basic operations
– Enter Scope
– Leave Scope
– Add Symbol
– Lookup Symbol
Implementation
• Scopes are LIFO so using a stack makes sense
• For each scope, use a map since we lookup names to retrieve info about them
– Typically use a hash map
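A stack of hash maps with the four basic operations might look like the sketch below. It is a minimal illustration, not the course's implementation: the info stored per name is just a String, and the method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

class SymbolTable {
    // Innermost scope is at the head of the deque.
    private final Deque<Map<String, String>> scopes = new ArrayDeque<>();

    void enterScope() {
        scopes.push(new HashMap<>());
    }

    void leaveScope() {
        scopes.pop();
    }

    // Adds a name to the current (innermost) scope.
    void addSymbol(String name, String info) {
        if (scopes.isEmpty()) {
            throw new IllegalStateException("no open scope");
        }
        if (scopes.peek().putIfAbsent(name, info) != null) {
            throw new IllegalStateException("duplicate declaration of " + name);
        }
    }

    // Searches from the innermost scope outward; returns null if the name is unknown.
    String lookup(String name) {
        for (Map<String, String> scope : scopes) {
            String info = scope.get(name);
            if (info != null) {
                return info;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        SymbolTable table = new SymbolTable();
        table.enterScope();                 // Car
        table.addSymbol("id", "int");
        table.addSymbol("color", "int");
        table.enterScope();                 // Car:GetType
        table.addSymbol("id", "String");
        System.out.println(table.lookup("id"));    // String
        System.out.println(table.lookup("color")); // int
        table.leaveScope();
        System.out.println(table.lookup("id"));    // int
    }
}
```

Because inner scopes are searched first, a declaration of id inside GetType hides the one in Car until the scope is left.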
Hello World :: Source
public class HelloWorld {
public static void main(String[] args) {
System.out.println("Hello World!");
}
}
Hello World :: Jasmin
.class public HelloWorld
.super java/lang/Object

;
; standard initializer (calls java.lang.Object's initializer)
;
.method public <init>()V
    aload_0
    invokenonvirtual java/lang/Object/<init>()V
    return
.end method

;
; main() - prints out Hello World
;
.method public static main([Ljava/lang/String;)V
    .limit stack 2   ; up to two items can be pushed

    ; push System.out onto the stack
    getstatic java/lang/System/out Ljava/io/PrintStream;

    ; push a string onto the stack
    ldc "Hello World!"

    ; call the PrintStream.println() method.
    invokevirtual java/io/PrintStream/println(Ljava/lang/String;)V

    ; done
    return
.end method
Source to AST
Source
if (i > 431) {
    a = b + c;
}
AST
IF_STATEMENT
    GREATER_THAN
        VAR_USE
            IDENTIFIER (i) (SymbolInfo: INT, lv = 0)
        INTEGER_LITERAL (431)
    BLOCK
        EXPRESSION_STATEMENT
            ASSIGN
                IDENTIFIER (a) (SymbolInfo: INT, lv = 1)
                ADDITION
                    VAR_USE
                        IDENTIFIER (b) (SymbolInfo: INT, lv = 2)
                    VAR_USE
                        IDENTIFIER (c) (SymbolInfo: INT, lv = 3)
AST to Code
Code
    iload 0
    ldc 431
    if_icmpgt label3
    iconst_0
    goto label4
label3:
    iconst_1
label4:
    ifeq label1
    iload 2
    iload 3
    iadd
    istore 1
    goto label2
label1:
label2:
AST
IF_STATEMENT
    GREATER_THAN
        VAR_USE
            IDENTIFIER (i) (SymbolInfo: INT, lv = 0)
        INTEGER_LITERAL (431)
    BLOCK
        EXPRESSION_STATEMENT
            ASSIGN
                IDENTIFIER (a) (SymbolInfo: INT, lv = 1)
                ADDITION
                    VAR_USE
                        IDENTIFIER (b) (SymbolInfo: INT, lv = 2)
                    VAR_USE
                        IDENTIFIER (c) (SymbolInfo: INT, lv = 3)
Break It Down
• IF_STATEMENT node
– Create two labels (will be needed later)
– Visit first child
  • Code for boolean test expression should be generated
    – Code for the boolean expression should leave 0 (for false) or 1 (for true) on top of stack
– Output code that compares top of stack to 0 and jumps to label for “else” block (to be output later) if 0
– Visit second child
  • Code for “then” block should be generated
– Output code that jumps over “else” block and output label to start “else” block
– Visit third child (if it exists)
  • Code for “else” block should be generated
– Output label at end of “else” block
IF_STATEMENT
private void visitIfStatementNode(ASTNode node) throws Exception {
String elseLabel = generateLabel();
String endLabel = generateLabel();
node.getChild(0).accept(this); // visit first child
stream.println(" ifeq " + elseLabel);
node.getChild(1).accept(this); // visit second child
stream.println(" goto " + endLabel);
stream.println(elseLabel + ":");
ASTNode elseBlock = node.getChild(2);
if (elseBlock != null) {
elseBlock.accept(this); // visit third child
}
stream.println(endLabel + ":");
}
Run-time System
• The run-time system consists of everything needed at run-time to support the execution of a process.
– This includes memory management, call-stack management, system call API, etc.
Function Calls
• Invoke “f” during runtime
• What happens?
1. Parameters are transmitted
2. Local storage is allocated
3. Local storage is initialized
4. Body of “f” executes
5. Return values prepared
6. Free storage
7. Return context to call
Function Calls
• Each invocation of “f” is a new activation
• What is the lifetime of “f”?
Lifetime
[Figure: timelines for two activations, a and b, in two arrangements labeled “overlapping” and “disjoint”]
Activation
• Use a stack to represent activations
– No activation specific info survives death
– No activation specific info required for birth
• Each activation pushes a new “activation record” onto the run-time stack
• What will we record in it?
Activation Record
• Return address
• Storage information
– Local storage
– Parameters
– Access to non-locals
Parameter Passing
• Call by value
– Argument is R-value
– Values of arguments are copied into the function
– swap(x, y) won’t change the value of x or y
• Call by reference
– Argument is L-value
– Variable in function points to the same location as the argument
– swap(x, y) would change the values of x and y
• Most modern languages use call-by-value semantics
Parameter Passing
• Java uses call-by-value semantics
– It is sometimes said that Java uses call-by-value for primitives and call-by-reference for object types, but that is not quite true.
– Java is call-by-value for everything, except that it does not copy objects but rather copies references to the objects.
• That is, the caller and callee both have references to the same object.
Parameter Passing
• Does not work in Java
– Primitive parameters are copied
void swap(int x, int y) {
int t = x;
x = y;
y = t;
}
Parameter Passing
• Still does not work in Java
– References to objects are copied
void swap(Integer x, Integer y) {
Integer t = x;
x = y;
y = t;
}
Parameter Passing
• Cannot swap the objects, but could change the internal state of the objects
void swap(ModInteger x, ModInteger y) {
int t = x.getValue();
x.setValue(y.getValue());
y.setValue(t);
}
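The difference between the swap variants can be observed by running them. In the sketch below, ModInteger is a hypothetical mutable wrapper written only for this demonstration; the slides do not define it.

```java
class SwapDemo {
    // Hypothetical mutable wrapper around an int.
    static final class ModInteger {
        private int value;

        ModInteger(int value) {
            this.value = value;
        }

        int getValue() {
            return value;
        }

        void setValue(int value) {
            this.value = value;
        }
    }

    // Swaps only the callee's copies of the two ints.
    static void swap(int x, int y) {
        int t = x;
        x = y;
        y = t;
    }

    // Swaps the state held inside the two objects the caller also references.
    static void swap(ModInteger x, ModInteger y) {
        int t = x.getValue();
        x.setValue(y.getValue());
        y.setValue(t);
    }

    public static void main(String[] args) {
        int a = 1;
        int b = 2;
        swap(a, b);
        System.out.println(a + " " + b); // 1 2

        ModInteger p = new ModInteger(1);
        ModInteger q = new ModInteger(2);
        swap(p, q);
        System.out.println(p.getValue() + " " + q.getValue()); // 2 1
    }
}
```

The caller's ints are untouched because only copies were exchanged, while the two ModInteger objects end up with each other's values because caller and callee share them.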
Register Allocation
• Most architectures have only a handful of registers to use for calculations
– Values need to be copied from memory into registers when needed, and then copied back to memory when a register is needed for something else
– For performance, we want to minimize the number of copies to/from memory
Register Allocation
• Can build an interference graph to determine what variables are live at the same time
• First, determine the live ranges of variables based on their "use" and "def"
– A def is an assignment to a variable (L-value)
– A use is the use of the value of a variable (R-value)
Live Ranges
x =
y =
= x
z =
= z
= y
Live ranges: x is live from “x =” to “= x”, y from “y =” to “= y”, and z from “z =” to “= z”
Variables with ranges that overlap are live at the same time and therefore must use different registers to avoid extra copying in and out of memory
Interference Graph
• Each variable is a vertex in the graph
• An edge in the graph indicates that those two variables are live at the same time
– So the edges indicate which variables cannot share a register
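[Figure: interference graph with vertices x, y, and z; edges x - y and y - z, and no edge between x and z]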
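Building the interference graph from live ranges can be sketched as an interval-overlap test. The sketch assumes each range is given as a pair {def position, last-use position}, with the six statements of the live-range example numbered 1 to 6. That numbering and the class name are assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class InterferenceDemo {
    // Returns adjacency sets: an edge joins two variables whose live ranges overlap.
    static Map<String, Set<String>> interference(Map<String, int[]> ranges) {
        Map<String, Set<String>> graph = new LinkedHashMap<>();
        for (String v : ranges.keySet()) {
            graph.put(v, new TreeSet<>());
        }
        for (Map.Entry<String, int[]> a : ranges.entrySet()) {
            for (Map.Entry<String, int[]> b : ranges.entrySet()) {
                if (a.getKey().equals(b.getKey())) {
                    continue;
                }
                int[] ra = a.getValue();
                int[] rb = b.getValue();
                // Two ranges overlap when each one starts before the other ends.
                if (ra[0] < rb[1] && rb[0] < ra[1]) {
                    graph.get(a.getKey()).add(b.getKey());
                }
            }
        }
        return graph;
    }

    public static void main(String[] args) {
        Map<String, int[]> ranges = new LinkedHashMap<>();
        ranges.put("x", new int[] {1, 3}); // x =  ...  = x
        ranges.put("y", new int[] {2, 6}); // y =  ...  = y
        ranges.put("z", new int[] {4, 5}); // z =  ...  = z
        System.out.println(interference(ranges)); // {x=[y], y=[x, z], z=[y]}
    }
}
```

Here x and z never overlap, so they can share a register, while y needs one of its own.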
Graph Coloring
• The problem of allocating registers now becomes one of coloring the interference graph
– We want to color the vertices of the graph so that no two adjacent vertices have the same color
– The maximum number of colors we can use is equal to the number of available registers
  • A coloring with a maximum number of colors k is called a k-coloring
• But k-coloring a graph is NP-complete and we need it to be fast
– Use a heuristic algorithm
Graph Coloring
• Find a vertex whose edge count is < k
• Push the vertex on a stack and remove from the graph
• Repeat until there are no vertices left in the graph or there are no vertices with an edge count < k in the graph
• If all vertices have been removed from the graph then the graph can be k-colored
– Pop a vertex from the stack and add back to the graph
– Color the vertex a different color from any of its neighbors currently in the graph
  • How can we know that there is an available color?
– Repeat until stack is empty
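The heuristic above can be sketched as follows. It assumes the graph is given as symmetric adjacency sets and represents colors as the integers 0 to k - 1. The names are illustrative, and returning null simply signals that the heuristic got stuck.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class GraphColoring {
    // Returns vertex -> color (0..k-1), or null if no vertex with fewer than k edges remains.
    static Map<String, Integer> color(Map<String, Set<String>> graph, int k) {
        // Work on a copy so the original graph is still available for coloring.
        Map<String, Set<String>> work = new HashMap<>();
        graph.forEach((v, neighbors) -> work.put(v, new HashSet<>(neighbors)));

        Deque<String> stack = new ArrayDeque<>();
        while (!work.isEmpty()) {
            String pick = null;
            for (Map.Entry<String, Set<String>> e : work.entrySet()) {
                if (e.getValue().size() < k) {
                    pick = e.getKey();
                    break;
                }
            }
            if (pick == null) {
                return null; // heuristic failed; the graph may or may not be k-colorable
            }
            for (String n : work.remove(pick)) {
                work.get(n).remove(pick); // drop the edges of the removed vertex
            }
            stack.push(pick);
        }

        Map<String, Integer> colors = new HashMap<>();
        while (!stack.isEmpty()) {
            String v = stack.pop();
            Set<Integer> used = new HashSet<>();
            for (String n : graph.get(v)) {
                if (colors.containsKey(n)) {
                    used.add(colors.get(n)); // neighbors already back in the graph
                }
            }
            int c = 0;
            while (used.contains(c)) {
                c++; // fewer than k neighbors were present at removal, so c stays below k
            }
            colors.put(v, c);
        }
        return colors;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> g = Map.of(
                "x", Set.of("y"),
                "y", Set.of("x", "z"),
                "z", Set.of("y"));
        System.out.println(color(g, 2));
    }
}
```

A color is always available when a vertex is popped because it had fewer than k neighbors in the graph when it was removed, and exactly those neighbors are the ones already colored.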
Graph Coloring
[Figure: interference graph on vertices A, B, C, D, E, F, and G]
Try k = 3
Graph Coloring
• Note that if we get to a point when removing vertices from the graph where all of the remaining vertices have an edge count >= k then it does not necessarily mean the graph cannot be k-colored
– It just means the heuristic algorithm failed
– Could try a different algorithm
• But it could be that the graph is not k-colorable
– Will need to spill the registers
  • At some point, copy the registers out to memory so we can use them to hold other variables
Parsers
• LR(0)
– 0 symbols of look ahead when creating the parse table
• SLR
– Simple LR resolves conflicts using global grammar follow sets
• LALR
– Look Ahead LR combines some states based on follow set information
• LR(k)
– Most powerful of those where parse states are created ahead of time
Yet Another Example
1. P → S $
2. S → A B A C
3.   | a a c
4. A → a a
5. B → b
6.   | λ
7. C → c
8.   | λ
Grammar Parse States
(Each item is followed by its lookahead set. The bracketed number is the state reached on the symbol to the right of the dot. Kernel rules are listed first in each state.)
I0
P → • S $ , {}            [12]
S → • A B A C , {$}       [4]
S → • a a c , {$}         [1]
A → • a a , {b, a}        [1]
I1
S → a • a c , {$}         [2]
A → a • a , {b, a}        [2]
I2
S → a a • c , {$}         [3]
A → a a • , {b, a}
I3
S → a a c • , {$}
I4
S → A • B A C , {$}       [6]
B → • b , {a}             [5]
B → • , {a}
I5
B → b • , {a}
I6
S → A B • A C , {$}       [9]
A → • a a , {c, $}        [7]
I7
A → a • a , {c, $}        [8]
I8
A → a a • , {c, $}
I9
S → A B A • C , {$}       [11]
C → • c , {$}             [10]
C → • , {$}
I10
C → c • , {$}
I11
S → A B A C • , {$}
I12
P → S • $ , {}            [13]
I13
P → S $ • , {}