CSE 431S Final Review - Washington University in St. Louis · Final Review Washington University...

CSE 431S Final Review

Washington University

Spring 2013

What You Should Know

• The six stages of a compiler and what each stage does.

• The input to and output of each compilation stage (especially the back-end).

• Context-free languages.

• Definition of a context-free grammar (including the formal definition).

• Leftmost and rightmost derivations and parse trees.

• Ambiguity.


• Bottom-up (shift-reduce) parsing.

• LR(0) parser construction.

• SLR conflict resolution.

• LR(1) parser construction.

• Abstract syntax trees.

• L-value vs R-value

• Static type checking.

• Symbol tables.


• CUP actions.

• Jasmin basics.

• Code generation.

• Call stack (function activation).

• Stack-based vs heap-based memory allocation.

• Parameter passing mechanisms.

• Register allocation (graph coloring).

Context-Free Languages

• Recall right-linear grammars:

X → a Y | b

Restricted right-hand side

• Context-free grammars:

– Allow anything on the right-hand side.

A → ( A ) | x

Context-Free Grammars

• A grammar is a 4-tuple:

– ∑: set of terminals

– V: set of nonterminals

– S: start nonterminal

– P: set of productions (rewrite rules)

• For a grammar to be context-free, all productions must be of the form:

– 𝐴 → 𝛼, where 𝐴 is a single nonterminal and 𝛼 is any sequence of symbols (terminals and nonterminals)

Ambiguity

What about: E → E + E | a

[Figure: two syntax trees for the string “a + a + a”, one grouping it as (a + a) + a and the other as a + (a + a)]

Ambiguity

• If there are multiple parse trees--or, equivalently, multiple leftmost derivations--for some string then the grammar is ambiguous.

– Note that it is the grammar that is ambiguous, not the language.

• There may exist a non-ambiguous grammar for the same language.
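To make the ambiguity concrete, here is a minimal sketch that counts the parse trees the grammar E → E + E | a admits for a string of n operands joined by “+”. The class and method names are made up for illustration and are not part of the course materials.

```java
// Hypothetical helper (not from the slides): counts the parse trees of
// "a + a + ... + a" with the given number of operands under E -> E + E | a.
final class AmbiguityDemo {
    static long countParseTrees(int operands) {
        if (operands == 1) {
            return 1; // only E -> a applies
        }
        long total = 0;
        // E -> E + E: pick which '+' sits at the root. The left subtree
        // covers the first `left` operands, the right subtree the rest.
        for (int left = 1; left < operands; left++) {
            total += countParseTrees(left) * countParseTrees(operands - left);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(countParseTrees(3)); // prints 2
    }
}
```

Any count above 1 means some string has more than one parse tree, which is exactly the definition of an ambiguous grammar.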

Bottom-Up Parsing

• Instead of starting from the start nonterminal and producing the parse tree top-down, start from the leaves and build the tree bottom-up

– Start nonterminal is now the goal nonterminal

Sample Grammar

1. S → E $

2. E → E + T

3.   | T

4. T → a

5.   | ( E )

LR(0) Item

• The dot “•” represents the current parse state (i.e., what has been “seen” so far)

• The initial set of rules is called the “kernel”

• The “non-kernel” items are generated by the “closure” operation and cover any nonterminal immediately after the dot

LR(0) Parse States

I0 = START

S → • E $

LR(0) Parse States

I0 = START

S → • E $        [1]
E → • E + T      [1]
E → • T          [9]
T → • a          [5]
T → • ( E )      [6]

• The closure operation adds all of the rules for a nonterminal to the immediate right of the dot

– “Close” on the operation

• The number in the square indicates which state to go to on the symbol to the right of the dot

– Must go to a single state for each symbol (deterministic)
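The closure operation can be sketched as a small worklist algorithm. This is only an illustration under stated assumptions: the `Lr0Closure` class, the `Item` record, and the map-based grammar encoding are invented here (and need Java 16 or later for records); only the sample grammar itself comes from the slides.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class Lr0Closure {
    // An LR(0) item: production `lhs -> rhs` with the dot before rhs[dot].
    record Item(String lhs, List<String> rhs, int dot) {
        String symbolAfterDot() {
            return dot < rhs.size() ? rhs.get(dot) : null;
        }
    }

    // The sample grammar, keyed by nonterminal (hypothetical encoding).
    static final Map<String, List<List<String>>> GRAMMAR = Map.of(
        "S", List.of(List.of("E", "$")),
        "E", List.of(List.of("E", "+", "T"), List.of("T")),
        "T", List.of(List.of("a"), List.of("(", "E", ")")));

    static Set<Item> closure(Set<Item> kernel) {
        Set<Item> items = new LinkedHashSet<>(kernel);
        Deque<Item> work = new ArrayDeque<>(kernel);
        while (!work.isEmpty()) {
            String next = work.poll().symbolAfterDot();
            // Only a nonterminal right after the dot contributes new items.
            if (next == null || !GRAMMAR.containsKey(next)) {
                continue;
            }
            for (List<String> rhs : GRAMMAR.get(next)) {
                Item added = new Item(next, rhs, 0);
                if (items.add(added)) {
                    work.add(added); // newly added items may need closing too
                }
            }
        }
        return items;
    }
}
```

Calling `closure` on the single kernel item S → • E $ yields the five items of I0 listed above.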

LR(0) Parse States

I0 = START

S → • E $        [1]
E → • E + T      [1]
E → • T          [9]
T → • a          [5]
T → • ( E )      [6]

I1 = GOTO(I0, “E”)

S → E • $        [2]
E → E • + T      [3]

• There must be only one state with a given kernel

– i.e., no identical states

Example

1. S → A C $
2. A → a B C d
3.   | B Q
4.   | λ
5. B → b B
6.   | d
7. C → c
8.   | λ
9. Q → q

LR(0) Parse States

I0
S → • A C $        [1]
A → • a B C d      [5]
A → • B Q          [12]
A → •
B → • b B          [9]
B → • d            [11]

I1
S → A • C $        [2]
C → • c            [4]
C → •

I2
S → A C • $        [3]

I3
S → A C $ •

I4
C → c •

I5
A → a • B C d      [6]
B → • b B          [9]
B → • d            [11]

I6
A → a B • C d      [7]
C → • c            [4]
C → •

I7
A → a B C • d      [8]

I8
A → a B C d •

I9
B → b • B          [10]
B → • b B          [9]
B → • d            [11]

I10
B → b B •

I11
B → d •

I12
A → B • Q          [13]
Q → • q            [14]

I13
A → B Q •

I14
Q → q •

Grammar is not LR(0) parsable: shift/reduce conflicts in states 0, 1, and 6

SLR(1)

• Create the LR(0) states.

• If there are no conflicts then we are done.

• For states with conflicts

– Try to use follow sets to resolve the conflicts.

– If all conflicts can be resolved using the follow sets then the grammar is SLR(1).

SLR(1)

• Shift/Reduce conflict

– Need to make sure that every terminal to the immediate right of a • is not in the Follow set of the nonterminal of the reduction rule

I1
S → A • C $
C → • c
C → •

I6
A → a B • C d
C → • c
C → •

States 1 and 6: Follow(C) = { d, $ } so c is not an element of Follow(C)

SLR(1)

• All conflicts can be resolved using the Follow sets, so the grammar is SLR parsable

I0

S → • A C $
A → • a B C d
A → • B Q
A → •
B → • b B
B → • d

State 0: Follow(A) = { c, $ } so a, b, and d are not elements of Follow(A)
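The Follow sets used above can be checked mechanically. The following is a rough sketch of the usual fixed-point computation of nullable, First, and Follow for the example grammar. The class name, the `Production` record, and the encoding of λ as an empty right-hand side are assumptions made for this illustration (Java 16 or later).

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class FollowSets {
    record Production(String lhs, List<String> rhs) {}

    // Example grammar; an empty right-hand side stands for λ.
    static final List<Production> GRAMMAR = List.of(
        new Production("S", List.of("A", "C", "$")),
        new Production("A", List.of("a", "B", "C", "d")),
        new Production("A", List.of("B", "Q")),
        new Production("A", List.of()),
        new Production("B", List.of("b", "B")),
        new Production("B", List.of("d")),
        new Production("C", List.of("c")),
        new Production("C", List.of()),
        new Production("Q", List.of("q")));

    static final Set<String> NONTERMINALS = Set.of("S", "A", "B", "C", "Q");

    static Map<String, Set<String>> compute() {
        Set<String> nullable = new HashSet<>();
        Map<String, Set<String>> first = new TreeMap<>();
        Map<String, Set<String>> follow = new TreeMap<>();
        for (String n : NONTERMINALS) {
            first.put(n, new TreeSet<>());
            follow.put(n, new TreeSet<>());
        }
        boolean changed = true;
        while (changed) { // iterate until no set grows any further
            changed = false;
            for (Production p : GRAMMAR) {
                // First and nullable: scan the right-hand side left to right.
                boolean allNullable = true;
                for (String sym : p.rhs()) {
                    changed |= first.get(p.lhs()).addAll(firstOf(sym, first));
                    if (!nullable.contains(sym)) {
                        allNullable = false;
                        break;
                    }
                }
                if (allNullable) {
                    changed |= nullable.add(p.lhs());
                }
                // Follow: scan right to left; `trailer` holds the terminals
                // that can appear immediately after the current symbol.
                Set<String> trailer = new TreeSet<>(follow.get(p.lhs()));
                for (int i = p.rhs().size() - 1; i >= 0; i--) {
                    String sym = p.rhs().get(i);
                    if (NONTERMINALS.contains(sym)) {
                        changed |= follow.get(sym).addAll(trailer);
                        if (!nullable.contains(sym)) {
                            trailer.clear();
                        }
                        trailer.addAll(first.get(sym));
                    } else {
                        trailer.clear();
                        trailer.add(sym);
                    }
                }
            }
        }
        return follow;
    }

    private static Set<String> firstOf(String sym, Map<String, Set<String>> first) {
        return NONTERMINALS.contains(sym) ? first.get(sym) : Set.of(sym);
    }
}
```

For this grammar `compute()` gives Follow(A) = { c, $ } and Follow(C) = { d, $ }, matching the sets used to resolve the conflicts in states 0, 1, and 6.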

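SLR(1) State Table

Columns: a b c d q $ A B C Q S

State   Entries
0       a: S5, b: S9, c: R4, d: S11, $: R4, A: S1, B: S12, S: Done
1       c: S4, d: R8, $: R8, C: S2
2       $: S3
3       R1
4       R7
5       b: S9, d: S11, B: S6
6       c: S4, d: R8, $: R8, C: S7
7       d: S8
8       R2
9       b: S9, d: S11, B: S10
10      R5
11      R6
12      q: S14, Q: S13
13      R3
14      R9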

Sample Parse

Input: a b b d d c $

Each stack entry is a symbol followed by its state; “-” marks the bottom of the stack. After a reduction, the left-hand nonterminal is placed at the front of the remaining input.

Step  Stack                 Remaining Input   Action
1     -0                    a b b d d c $     S5
2     -0 a5                 b b d d c $       S9
3     -0 a5 b9              b d d c $         S9
4     -0 a5 b9 b9           d d c $           S11
5     -0 a5 b9 b9 d11       d c $             R6
6     -0 a5 b9 b9           B d c $           S10
7     -0 a5 b9 b9 B10       d c $             R5
8     -0 a5 b9              B d c $           S10
9     -0 a5 b9 B10          d c $             R5
10    -0 a5                 B d c $           S6
11    -0 a5 B6              d c $             R8
12    -0 a5 B6              C d c $           S7
13    -0 a5 B6 C7           d c $             S8
14    -0 a5 B6 C7 d8        c $               R2
15    -0                    A c $             S1
16    -0 A1                 c $               S4
17    -0 A1 c4              $                 R7
18    -0 A1                 C $               S2
19    -0 A1 C2              $                 S3
20    -0 A1 C2 $3           (empty)           R1
21    -0                    S                 Done

Syntax Trees

• Concrete

– Actual parse tree

• Abstract

– Eliminates unnecessary nodes

– Structures the tree appropriately for evaluation

– Serves as basis for code generation

Concrete vs. Abstract

Construction

• Java code added to productions

– Most common action is to build a new tree node and assign to RESULT, which attaches it to the left-hand nonterminal

• Values for the nonterminals on the right-hand side are usually child tree nodes

Stmt ::= id:id assign E:e
         {: RESULT = new AssignmentNode(id, e); :}
       | if lparen E:pr rparen Stmt:s fi
         {: RESULT = new IfNode(pr, s); :}
       | if lparen E:pr rparen Stmt:s1 else Stmt:s2 fi
         {: RESULT = new IfNode(pr, s1, s2); :}
       …
       ;

Construction

Stmt ::= begin Stmts:block end

{: RESULT = block; :}

;

Stmts ::= Stmts:block semi Stmt:stmt

{:

block.add(stmt);

RESULT = block;

:}

| Stmt:s

{: RESULT = new BlockNode(s); :}

;

Construction

• Alternate construction of BlockNode

Stmt ::= begin Stmts:list end

{: RESULT = new BlockNode(list); :}

;

Stmts ::= Stmts:list semi Stmt:stmt

{:

list.add(stmt);

RESULT = list;

:}

| Stmt:s

{: RESULT = new ArrayList();

RESULT.add(s);

:}

;

Left and Right Values

x = y

• “x” is the L-value

– Refers to the location of “x”, not its value

• “y” is the R-value

– Refers to the value of “y”, not its location

Example

Note that there is an error in this figure. The deref in the tree for example b should not be there.

Type Checking

• When are types checked?

  – Statically at compile time

    • Compiler does type checking during compilation

    • Ideally eliminate runtime checks

  – Dynamically at runtime

    • Compiler generates code to do type checking at runtime

• JavaScript vs. Java

  • Java still does a large amount of runtime type checking

• We’ll focus on static typing for basic types

Expression Types

• For every operator we need to know

  – allowed types of operands

  – resulting type

  – implicit coercion

    • changes the representation, not the data

    • short to long

  – implicit conversion

    • may change the data

    • int to float

  – explicit cast

    • may lose information

    • float to int, int to short

What are the types?

= (?)
    x (int)
    + (?)
        y (int)
        3.14 (float)

Determining Types

• make sure type is allowed (int + float)

• assign the resultant type to the operator (float)

• generate any necessary coercion(s) or conversion(s)

– most hardware has (int + int) and (float + float) but not (int + float)
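The three steps above might look like the following for the “+” operator. This is a simplified sketch, not the course's type checker: `TypeCheckDemo`, `Expr`, and `coerceToFloat` are hypothetical names, only int and float are modeled, and records require Java 16 or later.

```java
final class TypeCheckDemo {
    enum Type { INT, FLOAT }

    // A typed expression; `text` is only a printable form of the subtree.
    record Expr(String text, Type type) {}

    // Wraps an int operand in a coercion node, mirroring "int2float" on the slides.
    static Expr coerceToFloat(Expr e) {
        return e.type() == Type.FLOAT
            ? e
            : new Expr("int2float(" + e.text() + ")", Type.FLOAT);
    }

    // Type-checks `left + right`: int + int stays int; otherwise the result
    // is float and any int operand is widened first.
    static Expr add(Expr left, Expr right) {
        if (left.type() == Type.INT && right.type() == Type.INT) {
            return new Expr("(" + left.text() + " + " + right.text() + ")", Type.INT);
        }
        Expr l = coerceToFloat(left);
        Expr r = coerceToFloat(right);
        return new Expr("(" + l.text() + " + " + r.text() + ")", Type.FLOAT);
    }

    public static void main(String[] args) {
        Expr sum = add(new Expr("y", Type.INT), new Expr("3.14", Type.FLOAT));
        System.out.println(sum.text() + " : " + sum.type()); // prints (int2float(y) + 3.14) : FLOAT
    }
}
```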

Adding Coercion

= (?)
    x (int)
    + (float)
        int2float (float)
            y (int)
        3.14 (float)

Explicit Casting

= (int)
    x (int)
    float2int (int)
        + (float)
            int2float (float)
                y (int)
            3.14 (float)

Symbol Table

Proc → Dcls Body

[Figure: parse tree with root Proc and children Dcls (int i; float j;) and Body (i = 3; j = i * 3.14;). Dcls is labeled “Synthesize symbol info” and Body is labeled “Inherit symbol info”.]

Symbol Table

• Persists the synthesized information as a side effect of the translation

• Maps a name and environment to information

– Environment is the scope

– Scope is static

• Basic actions

– Establish a mapping

– Retrieve a mapping

public class Car {
    int id;
    int color;
    int GetType() { String id; }
    public class Wheel {
        Object id;
        int GetType() { float id; }
    }
}

Name    Scope               Info
id      Car                 int
color   Car                 int
id      Car:GetType         String
id      Car:Wheel           Object
id      Car:Wheel:GetType   float

Scopes

• Scopes are static

• Scopes are nested

– LIFO (last in, first out)

[Figure: nested scope boxes. The Car scope contains a GetType scope and the Wheel scope; the Wheel scope contains its own GetType scope.]

Possible Implementations

• Option 1: Keep all information available at all times

• Option 2: Use LIFO and process a scope at a time

Name    Scope               Info
id      Car                 int
color   Car                 int
id      Car:GetType         String
id      Car:Wheel           Object
id      Car:Wheel:GetType   float

LIFO Scopes

• Symbol table will be a stack of maps of name to information

• One map per scope (environment)

• Four basic operations

– Enter Scope

– Leave Scope

– Add Symbol

– Lookup Symbol

Implementation

• Scopes are LIFO so using a stack makes sense

• For each scope, use a map since we lookup names to retrieve info about them

– Typically use a hash map
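A minimal sketch of the stack-of-maps design with the four basic operations is shown below. The class name, the method names, and the use of a plain String for the symbol info are assumptions for illustration; a real compiler would store a richer info object.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

final class SymbolTable {
    // One map per scope; the innermost scope is at the top of the stack.
    private final Deque<Map<String, String>> scopes = new ArrayDeque<>();

    void enterScope() {
        scopes.push(new HashMap<>());
    }

    void leaveScope() {
        scopes.pop();
    }

    void addSymbol(String name, String info) {
        scopes.peek().put(name, info);
    }

    // Searches from the innermost scope outward; ArrayDeque iterates from
    // the top of the stack, so inner declarations shadow outer ones.
    String lookupSymbol(String name) {
        for (Map<String, String> scope : scopes) {
            String info = scope.get(name);
            if (info != null) {
                return info;
            }
        }
        return null; // not declared in any enclosing scope
    }
}
```

With the Car example, entering the Car scope and adding `id` as int, then entering the GetType scope and adding `id` as String, makes `lookupSymbol("id")` return String until `leaveScope()` is called, after which it returns int again.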

Hello World :: Source

public class HelloWorld {
    public static void main(String[] args) {
        System.out.println("Hello World!");
    }
}

Hello World :: Jasmin

.class public HelloWorld
.super java/lang/Object

;
; standard initializer (calls java.lang.Object's initializer)
;
.method public <init>()V
    aload_0
    invokenonvirtual java/lang/Object/<init>()V
    return
.end method

;
; main() - prints out Hello World
;
.method public static main([Ljava/lang/String;)V
    .limit stack 2   ; up to two items can be pushed

    ; push System.out onto the stack
    getstatic java/lang/System/out Ljava/io/PrintStream;

    ; push a string onto the stack
    ldc "Hello World!"

    ; call the PrintStream.println() method.
    invokevirtual java/io/PrintStream/println(Ljava/lang/String;)V

    ; done
    return
.end method

Source to AST

Source

if (i > 431) {
    a = b + c;
}

AST

IF_STATEMENT
    GREATER_THAN
        VAR_USE
            IDENTIFIER (i) (SymbolInfo: INT, lv = 0)
        INTEGER_LITERAL (431)
    BLOCK
        EXPRESSION_STATEMENT
            ASSIGN
                IDENTIFIER (a) (SymbolInfo: INT, lv = 1)
                ADDITION
                    VAR_USE
                        IDENTIFIER (b) (SymbolInfo: INT, lv = 2)
                    VAR_USE
                        IDENTIFIER (c) (SymbolInfo: INT, lv = 3)

AST to Code

Code

    iload 0
    ldc 431
    if_icmpgt label3
    iconst_0
    goto label4
label3:
    iconst_1
label4:
    ifeq label1
    iload 2
    iload 3
    iadd
    istore 1
    goto label2
label1:
label2:


Break It Down

• IF_STATEMENT node

  – Create two labels (will be needed later)

  – Visit first child

    • Code for boolean test expression should be generated

      – Code for the boolean expression should leave 0 (for false) or 1 (for true) on top of stack

  – Output code that compares top of stack to 0 and jumps to label for “else” block (to be output later) if 0

  – Visit second child

    • Code for “then” block should be generated

  – Output code that jumps over “else” block and output label to start “else” block

  – Visit third child (if it exists)

    • Code for “else” block should be generated

  – Output label at end of “else” block

IF_STATEMENT

private void visitIfStatementNode(ASTNode node) throws Exception {
    String elseLabel = generateLabel();
    String endLabel = generateLabel();

    node.getChild(0).accept(this); // visit first child
    stream.println(" ifeq " + elseLabel);

    node.getChild(1).accept(this); // visit second child
    stream.println(" goto " + endLabel);
    stream.println(elseLabel + ":");

    ASTNode elseBlock = node.getChild(2);
    if (elseBlock != null) {
        elseBlock.accept(this); // visit third child
    }
    stream.println(endLabel + ":");
}

Run-time System

• The run-time system consists of everything needed at run-time to support the execution of a process.

– This includes memory management, call-stack management, system call API, etc.

Function Calls

• Invoke “f” during runtime

• What happens?

1. Parameters are transmitted

2. Local storage is allocated

3. Local storage is initialized

4. Body of “f” executes

5. Return values prepared

6. Free storage

7. Return context to call

Function Calls

• Each invocation of “f” is a new activation

• What is the lifetime of “f”?

Lifetime

[Figure: timelines for two activations a and b, one case labeled “overlapping” and one labeled “disjoint”]

Activation

• Use a stack to represent activations

– No activation specific info survives death

– No activation specific info required for birth

• Each activation pushes a new “activation record” onto the run-time stack

• What will we record in it?

Activation Record

• Return address

• Storage information

– Local storage

– Parameters

– Access to non-locals

Parameter Passing

• Call by value

– Argument is R-value

– Values of arguments are copied into the function

– swap(x, y) won’t change the value of x or y

• Call by reference

– Argument is L-value

– Variable in function points to the same location as the argument

– swap(x, y) would change the values of x and y

• Most modern languages use call-by-value semantics

Parameter Passing

• Java uses call-by-value semantics

– It is sometimes said that Java uses call-by-value for primitives and call-by-reference for object types, but that is not quite true.

– Java is call-by-value for everything, except that it does not copy objects but rather copies references to the objects.

• That is, the caller and callee both have references to the same object.

Parameter Passing

• Does not work in Java

– Primitive parameters are copied

void swap(int x, int y) {
    int t = x;
    x = y;
    y = t;
}

Parameter Passing

• Still does not work in Java

– References to objects are copied

void swap(Integer x, Integer y) {
    Integer t = x;
    x = y;
    y = t;
}

Parameter Passing

• Cannot swap the objects, but could change the internal state of the objects

void swap(ModInteger x, ModInteger y) {
    int t = x.getValue();
    x.setValue(y.getValue());
    y.setValue(t);
}
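The difference between the last two attempts can be observed with a small runnable sketch. `ModInteger` is not a standard Java class; the version below is a hypothetical mutable wrapper written only for this demonstration, and the method names `swapReferences` and `swapState` are likewise invented.

```java
final class SwapDemo {
    // Hypothetical mutable wrapper standing in for the slides' ModInteger.
    static final class ModInteger {
        private int value;
        ModInteger(int value) { this.value = value; }
        int getValue() { return value; }
        void setValue(int value) { this.value = value; }
    }

    // Only the callee's copies of the two references are exchanged.
    static void swapReferences(ModInteger x, ModInteger y) {
        ModInteger t = x;
        x = y;
        y = t;
    }

    // The objects the caller also refers to are modified in place.
    static void swapState(ModInteger x, ModInteger y) {
        int t = x.getValue();
        x.setValue(y.getValue());
        y.setValue(t);
    }

    public static void main(String[] args) {
        ModInteger a = new ModInteger(1);
        ModInteger b = new ModInteger(2);
        swapReferences(a, b);
        System.out.println(a.getValue() + " " + b.getValue()); // prints 1 2
        swapState(a, b);
        System.out.println(a.getValue() + " " + b.getValue()); // prints 2 1
    }
}
```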

Register Allocation

• Most architectures have only a handful of registers to use for calculations

– Values need to be copied from memory into registers when needed, and then copied back to memory when a register is needed for something else

– For performance, we want to minimize the number of copies to/from memory

Register Allocation

• Can build an interference graph to determine what variables are live at the same time

• First, determine the live ranges of variables based on their "use" and "def"

– A def is an assignment to a variable (L-value)

– A use is the use of the value of a variable (R-value)

Live Ranges

Statement   x   y   z
x =         |
y =         |   |
  = x       |   |
z =             |   |
  = z           |   |
  = y           |

Variables with ranges that overlap are live at the same time and therefore must use different registers to avoid extra copying in and out of memory

Interference Graph

• Each variable is a vertex in the graph

• An edge in the graph indicates that those two variables are live at the same time

– So the edges indicate which variables cannot share a register

[Figure: interference graph with vertices x, y, and z; edges x-y and y-z]
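As a rough sketch, the interference graph can be derived from live ranges by testing each pair of variables for overlap. The representation below (a live range as the statement index of the def and of the last use, with overlap tested by strict inequality) is a simplifying assumption for straight-line code, and all names are hypothetical. Records require Java 16 or later.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class InterferenceDemo {
    // Live range from the statement that defines a variable to its last use.
    record LiveRange(String name, int def, int lastUse) {
        boolean overlaps(LiveRange other) {
            return def < other.lastUse && other.def < lastUse;
        }
    }

    // Adjacency sets: an edge means the two variables cannot share a register.
    static Map<String, Set<String>> build(List<LiveRange> ranges) {
        Map<String, Set<String>> graph = new LinkedHashMap<>();
        for (LiveRange r : ranges) {
            graph.put(r.name(), new TreeSet<>());
        }
        for (LiveRange a : ranges) {
            for (LiveRange b : ranges) {
                if (a != b && a.overlaps(b)) {
                    graph.get(a.name()).add(b.name());
                }
            }
        }
        return graph;
    }

    public static void main(String[] args) {
        // Statement numbers 1..6 follow the Live Ranges example: x = / y = / = x / z = / = z / = y.
        System.out.println(build(List.of(
            new LiveRange("x", 1, 3),
            new LiveRange("y", 2, 6),
            new LiveRange("z", 4, 5)))); // prints {x=[y], y=[x, z], z=[y]}
    }
}
```

Since x and z never interfere, they could share a register, so two registers are enough for this example.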

Graph Coloring

• The problem of allocating registers now becomes one of coloring the interference graph

  – We want to color the vertices of the graph so that no two adjacent vertices have the same color

  – The maximum number of colors we can use is equal to the number of available registers

    • A coloring that uses at most k colors is called a k-coloring

• But k-coloring a graph is NP-complete and we need it to be fast

  – Use a heuristic algorithm

Graph Coloring

• Find a vertex whose edge count is < k

• Push the vertex on a stack and remove it from the graph

• Repeat until there are no vertices left in the graph or there are no vertices with an edge count < k in the graph

• If all vertices have been removed from the graph then the graph can be k-colored

  – Pop a vertex from the stack and add it back to the graph

  – Color the vertex a different color from any of its neighbors currently in the graph

    • How can we know that there is an available color?

  – Repeat until stack is empty
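The heuristic above can be sketched as follows. This is an illustrative version under stated assumptions: the graph is given as adjacency sets, colors are the integers 0 to k - 1, and the method returns null when the heuristic gets stuck (spilling is not modeled). The class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class GraphColoring {
    // Returns vertex -> color in [0, k), or null if the heuristic gets stuck.
    static Map<String, Integer> color(Map<String, Set<String>> graph, int k) {
        Set<String> remaining = new HashSet<>(graph.keySet());
        Deque<String> stack = new ArrayDeque<>();

        // Phase 1: repeatedly remove a vertex with fewer than k remaining edges.
        while (!remaining.isEmpty()) {
            String pick = null;
            for (String v : remaining) {
                long degree = graph.get(v).stream().filter(remaining::contains).count();
                if (degree < k) {
                    pick = v;
                    break;
                }
            }
            if (pick == null) {
                return null; // every remaining vertex has an edge count >= k
            }
            remaining.remove(pick);
            stack.push(pick);
        }

        // Phase 2: pop vertices and give each the lowest color unused by its
        // already-colored neighbors.
        Map<String, Integer> colors = new HashMap<>();
        while (!stack.isEmpty()) {
            String v = stack.pop();
            Set<Integer> used = new HashSet<>();
            for (String n : graph.get(v)) {
                if (colors.containsKey(n)) {
                    used.add(colors.get(n));
                }
            }
            int c = 0;
            while (used.contains(c)) {
                c++;
            }
            colors.put(v, c);
        }
        return colors;
    }
}
```

This also answers the question in the list: when a vertex is popped, the vertices already back in the graph are exactly those that were still present when it was removed, so it has fewer than k colored neighbors and at least one of the k colors is free.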

Graph Coloring

[Figure: example interference graph with vertices A, B, C, D, E, F, and G, shown in two steps]

Try k = 3

Graph Coloring

• Note that if we get to a point when removing vertices from the graph where all of the remaining vertices have an edge count >= k then it does not necessarily mean the graph cannot be k-colored

  – It just means the heuristic algorithm failed

  – Could try a different algorithm

• But it could be that the graph is not k-colorable

  – Will need to spill the registers

    • At some point, copy the registers out to memory so we can use them to hold other variables

Parsers

• LR(0)

– 0 symbols of look ahead when creating the parse table

• SLR

– Simple LR resolves conflicts using global grammar follow sets

• LALR

– Look Ahead LR combines some states based on follow set information

• LR(k)

– Most powerful of those where parse states are created ahead of time

Yet Another Example

1. P → S $

2. S → A B A C

3.   | a a c

4. A → a a

5. B → b

6.   | λ

7. C → c

8.   | λ

Grammar Parse States

In each state the kernel rules are listed first, followed by the closure items. The set in braces is the lookahead, and the number in square brackets is the state to go to on the symbol after the dot.

I0
P → • S $ , {}            [12]
S → • A B A C , {$}       [4]
S → • a a c , {$}         [1]
A → • a a , {b,a}         [1]

I1
S → a • a c , {$}         [2]
A → a • a , {b,a}         [2]

I2
S → a a • c , {$}         [3]
A → a a • , {b,a}

I3
S → a a c • , {$}

I4
S → A • B A C , {$}       [6]
B → • b , {a}             [5]
B → • , {a}

I5
B → b • , {a}

I6
S → A B • A C , {$}       [9]
A → • a a , {c,$}         [7]

I7
A → a • a , {c,$}         [8]

I8
A → a a • , {c,$}

I9
S → A B A • C , {$}       [11]
C → • c , {$}             [10]
C → • , {$}

I10
C → c • , {$}

I11
S → A B A C • , {$}

I12
P → S • $ , {}            [13]

I13
P → S $ • , {}
