Intermediate Code Generation 1

56
Intermediate code generation Shashwat Shriparv [email protected] InfinitySoft

Transcript of Intermediate Code Generation 1

Page 1: Intermediate Code Generation 1

Intermediate code generation

Shashwat [email protected]

Page 2: Intermediate Code Generation 1

Intermediate Code Generation

Translating source program into an “intermediate language.” Simple CPU Independent, …yet, close in spirit to machine language.

Benefits is

1. Retargeting is facilitated

2. Machine independent Code Optimization can be applied.

Page 3: Intermediate Code Generation 1

Intermediate Code Generation Intermediate codes are machine independent codes, but they are close to

machine instructions.

The given program in a source language is converted to an equivalent program in an intermediate language by the intermediate code generator.

Intermediate language can be many different languages, and the designer of the compiler decides this intermediate language.

syntax trees can be used as an intermediate language.

postfix notation can be used as an intermediate language.

three-address code (Quadruples) can be used as an intermediate language

we will use quadruples to discuss intermediate code generation quadruples are close to machine instructions, but they are not actual

machine instructions.

Page 4: Intermediate Code Generation 1

Types of Intermediate Languages

Graphical Representations. Consider the assignment a:=b*-c+b*-c:

assign

a +

* *

uminus uminusb

c c

b

assign

a+

*

uminus

b c

Page 5: Intermediate Code Generation 1

Syntax Dir. Definition to produce syntax trees for Assignment Statements.

PRODUCTION Semantic Rule

S id := E { S.nptr = mknode (‘assign’,

mkleaf(id, id.entry), E.nptr) }

E E1 + E2 {E.nptr = mknode(‘+’, E1.nptr,E2.nptr) }

E E1 * E2 {E.nptr = mknode(‘*’, E1.nptr,E2.nptr) }

E - E1 {E.nptr = mknode(‘uminus’, E1.nptr) }

E ( E1 ) {E.nptr = E1.nptr }

E id {E.nptr = mkleaf(id, id.entry) }

Page 6: Intermediate Code Generation 1

Three Address Code

Statements of general form x:=y op z

No built-up arithmetic expressions are allowed. As a result, x:=y + z * w

should be represented ast1:=z * wt2:=y + t1

x:=t2

Observe that given the syntax-tree or the dag of the graphical representation we can easily derive a three address code for assignments as above.

In fact three-address code is a linearization of the syntax tree. Three-address code is useful: related to machine-language/ simple/

optimizable.

x,y,z- names,constants or compiler-generated temporaries

t1 , t2 – compiler generated temporary names

Page 7: Intermediate Code Generation 1

3 address code for the syntax tree and the daga:=b*-c+b*-c:

assign

a +

* *

uminus uminusb

c c

b

assign

a+

*

uminus

b c

Syntax tree Dag

Page 8: Intermediate Code Generation 1

3-address codes are

t1:=- c

t2:=b * t1

t5:=t2 + t2

a:=t5

t1:=- ct2:=b * t1

t3:=- ct4:=b * t3

t5:=t2 + t4

a:=t5

Syntax tree Dag

Page 9: Intermediate Code Generation 1

Types of Three-Address Statements.

Assignment Statement: x:=y op zAssignment Statement: x:=op zCopy Statement: x:=zUnconditional Jump: goto LConditional Jump: if x relop y goto

LStack Operations: Push/pop

More AdvancedProcedure:

param x1param x2…param xncall p,n

Index Assignments:x:=y[ i ]x[ i ]:=y

Address and Pointer Assignments:x:=&yx:=*y*x:=y

Generated as part of call of proc. p(x1,x2,……,xn)

Page 10: Intermediate Code Generation 1

Syntax-Directed Translation into 3-address code.

Page 11: Intermediate Code Generation 1

Syntax-Directed Translation for 3-address code for assignment statements

Use attributes E.place to hold the name of the “place” that will hold the value of

E Identifier will be assumed to already have the place attribute

defined. For example, the place attribute will be of the form t0, t1, t2, …

for identifiers and v0,v1,v2 etc. for the rest. E.code to hold the three address code statements that evaluate

E (this is the `translation’ attribute). Use function newtemp that returns a new temporary

variable that we can use. Use function gen to generate a single three address

statement given the necessary information (variable names and operations).

Page 12: Intermediate Code Generation 1

Syntax-Dir. Definition for 3-address code

PRODUCTION Semantic Rule S id := E { S.code = E.code||gen(id.place ‘=’ E.place ) }E E1 + E2 {E.place = newtemp ;

E.code = E1.code || E2.code || || gen(E.place‘:=’E1.place‘+’E2.place)

}E E1 * E2 {E.place = newtemp ;

E.code = E1.code || E2.code || || gen(E.place‘=’E1.place‘*’E2.place) }

E - E1 {E.place = newtemp ; E.code = E1.code ||

|| gen(E.place ‘=’ ‘uminus’ E1.place) }E ( E1 ) {E.place = E1.place ; E.code = E1.code}E id {E.place = id.entry ; E.code = ‘’ }

e.g. a := b * - (c+d)

‘||’: string concatenation

Page 13: Intermediate Code Generation 1

while statements

E.g. while statements of the form “while E do S”(interpreted as while the value of E is not 0 do S)

PRODUCTION

S while E do S1

Semantic Rule

S.begin = newlabel;

S.after = newlabel ;

S.code = gen(S.begin ‘:’)

|| E.code

|| gen(‘if’ E.place ‘=’ ‘0’ ‘goto’ S.after)

|| S1.code

|| gen(‘goto’ S.begin)

|| gen(S.after ‘:’)

E.code

If E.place = 0 goto S.after

S1.code

Goto S.begin

S.begin:

S.after:……………….

To mark the 1st stmt. In code for E

stmt. following code S

Page 14: Intermediate Code Generation 1

Implementation of 3 address code

QuadruplesTriplesIndirect triples

Page 15: Intermediate Code Generation 1

Quadruples

A quadruple is a record structure with four fields: op, arg1, arg2, and result The op field contains an internal code for an operator Statements with unary operators do not use arg2 Operators like param use neither arg2 nor result The target label for conditional and unconditional jumps are in result

The contents of fields arg1, arg2, and result are typically pointers to symbol table entries

Page 16: Intermediate Code Generation 1

Implementations of 3-address statements

Quadruples

t1:=- c

t2:=b * t1

t3:=- c

t4:=b * t3

t5:=t2 + t4

a:=t5

op arg1 arg2 result

(0) uminus c t1

(1) * b t1 t2

(2) uminus c

(3) * b t3 t4

(4) + t2 t4 t5

(5) := t5 a

a:=b*-c+b*-c:

Page 17: Intermediate Code Generation 1

Triples

Triples refer to a temporary value by the position of the statement that computes it Statements can be represented by a record with only three

fields: op, arg1, and arg2 Avoids the need to enter temporary names into the symbol table

Contents of arg1 and arg2: Pointer into symbol table (for programmer defined names) Pointer into triple structure (for temporaries)

Page 18: Intermediate Code Generation 1

Implementations of 3-address statements, II

Triples

t1:=- c

t2:=b * t1

t3:=- c

t4:=b * t3

t5:=t2 + t4

a:=t5

op arg1 arg2

(0) uminus c

(1) * b (0)

(2) uminus c

(3) * b (2)

(4) + (1) (3)

(5) assign a (4)

a:=b*-c+b*-c:

Page 19: Intermediate Code Generation 1

Implementations of 3-address statements, III

Indirect Triples

stmt op arg1 arg2

(14) uminus c

(15) * b (14)

(16) uminus c

(17) * b (16)

(18) + (15) (17)

(19) assign a (18)

stmt

(0) (14)

(1) (15)

(2) (16)

(3) (17)

(4) (18)

(5) (19)

t1:=- ct2:=b * t1t3:=- ct4:=b * t3t5:=t2 + t4a:=t5

a:=b*-c+b*-c:

Page 20: Intermediate Code Generation 1

DECLARATIONS

Declarations in a procedure Langs. like C , Pascal allows declarations in single procedure to

be processed as a group A global variable offset keeps track of the next available relative

addresses Before the Ist declaration is considered, the value of offset is set

to 0. When a new name is seen , name is entered in symbol table

with current value as offset , offset incre. by width of data object denoted by name.

Procedure enter(name,type,offset) creates symbol table entry for name , gives it type type ,and rel.addr. offset in its data area

Type , width – denotes no. of memory units taken by objects of that type

Page 21: Intermediate Code Generation 1

SDT to generate ICode for DeclarationsUsing a global variable offsetPRODUCTION Semantic Rule

P D { }D D ; DD id : T { enter (id.name, T.type, offset);

offset:=offset + T.width }T integer {T.type = integer ; T.width = 4; }T real {T.type = real ; T.width = 8}T array [ num ] of T1

{T.type=array(1..num.val,T1.type) T.width = num.val * T1.width}

T ^T1 {T.type = pointer(T1.type);T1.width = 4}

Page 22: Intermediate Code Generation 1

Nested Procedure Declarations

For each procedure we should create a symbol table.

mktable(previous) – create a new symbol table where previous is the parent symbol table of this new symbol table

enter(symtable,name,type,offset) – create a new entry for a variable in the given symbol table.

enterproc(symtable,name,newsymbtable) – create a new entry for the procedure in the symbol table of its parent.

addwidth(symtable,width) – puts the total width of all entries in the symbol table into the header of that table.

We will have two stacks: tblptr – to hold the pointers to the symbol tables offset – to hold the current offsets in the symbol tables in tblptr

stack.

Page 23: Intermediate Code Generation 1

SDT to generate ICode for Nested Procedures

( P M D { addwidth(top(tblptr), top(offset)); pop(tblptr); pop(offset) }

M { t:=mktable(null); push(t, tblptr); push(0, offset)}

D D1 ; D2 ...

D proc id ; N D ; S { t:=top(tblpr); addwidth(t,top(offset));

pop(tblptr); pop(offset);

enterproc(top(tblptr), id.name, t)}

N {t:=mktable(top(tblptr)); push(t,tblptr); push(0,offset);}

D id : T {enter(top(tblptr), id.name, T.type, top(offset);

top(offset):=top(offset) + T.width

Example: proc func1; D; proc func2 D; S; S

Page 24: Intermediate Code Generation 1

SDT to generate ICode for assignment statements

Use attributes E.place to hold the name of the “place” that will hold the value of

E Identifier will be assumed to already have the place attribute

defined. For example, the place attribute will be of the form t0, t1, t2, …

for identifiers and v0,v1,v2 etc. for the rest. E.code to hold the three address code statements that evaluate

E (this is the `translation’ attribute). Use function newtemp that returns a new temporary

variable that we can use. Use function gen to generate a single three address

statement given the necessary information (variable names and operations).

Page 25: Intermediate Code Generation 1

Syntax-Dir. Definition for 3-address code

PRODUCTION Semantic Rule S id := E { S.code = E.code||gen(id.place ‘=’ E.place ) }E E1 + E2 {E.place = newtemp ;

E.code = E1.code || E2.code || || gen(E.place‘:=’E1.place‘+’E2.place)

}E E1 * E2 {E.place = newtemp ;

E.code = E1.code || E2.code || || gen(E.place‘=’E1.place‘*’E2.place) }

E - E1 {E.place = newtemp ; E.code = E1.code ||

|| gen(E.place ‘=’ ‘uminus’ E1.place) }E ( E1 ) {E.place = E1.place ; E.code = E1.code}E id {E.place = id.entry ; E.code = ‘’ }

e.g. a := b * - (c+d)

Page 26: Intermediate Code Generation 1

Boolean Expressions

Boolean expressions has 2 purpose To compute Boolean values as a conditional expression for statements

Methods of translating boolean expression:

(2 methods to represent the value of Boolean expn) Numerical methods:

True is represented as 1 and false is represented as 0 Nonzero values are considered true and zero values are considered

false By Flow-of-control :

Represent the value of a boolean by the position reached in a program

Often not necessary to evaluate entire expression

Page 27: Intermediate Code Generation 1

SDT for Numerical Representation for booleans

Expressions evaluated left to right using 1 to denote true and 0 to donate false

Example: a or b and not ct1 := not ct2 := b and t1t3 := a or t2

Another example: a < b100: if a < b goto 103101: t : = 0102: goto 104103: t : = 1104: …

Page 28: Intermediate Code Generation 1

Emit & nextstat

emit fn.– places 3-address stmts into an o/p file in the right format

nextstat fn.– gives the index of the next 3 - address stmt in o/p sequence

E.place to hold the name of the “place” that will hold the value of E

Page 29: Intermediate Code Generation 1

Production Semantic Rules

E E1 or E2

E.place := newtemp;emit(E.place ':=' E1.place 'or'

E2.place)

E E1 and E2

E.place := newtemp;emit(E.place ':=' E1.place 'and'

E2.place)

E not E1

E.place := newtemp;emit(E.place ':=' 'not' E1.place)

E (E1) E.place := E1.place;

SDT for Numerical Representation for booleans

Page 30: Intermediate Code Generation 1

Production Semantic Rules

E id1 relop id2

E.place := newtemp;emit('if' id1.place relop.op

id2.place 'goto' nextstat+3);

emit(E.place ':=' '0');emit('goto' nextstat+2);emit(E.place ':=' '1');

E trueE.place := newtemp;emit(E.place ':=' '1')

E falseE.place := newtemp;emit(E.place ':=' '0')

SDT for Numerical Representation for booleans

nextstat fn.– gives the index of the next 3 - address stmt in o/p sequence

Page 31: Intermediate Code Generation 1

Example: a<b or c<d and e<f

100: if a < b goto 103101: t1 := 0102: goto 104103: t1 := 1104: if c < d goto 107105: t2 := 0106: goto 108107: t2 := 1108: if e < f goto 111109: t3 := 0110: goto 112111: t3 := 1112: t4 := t2 and t3113: t5 := t1 or t4

Page 32: Intermediate Code Generation 1

Flow of control Stmts S →if E then S1 |

if E then S1 else S2|while E do S1

Here E is the boolean expn. to be translated We assume that 3-address code can be labeled newlabel returns a symbolic label each time its called. E is associated with 2 labels

1. E.true – label which controls flow if E is true

2. E.false – label which controls flow if E is false S.next – is a label that is attached to the first 3 address

instruction to be executed after the code for S

Page 33: Intermediate Code Generation 1

1. Code for if - then

E.code

S1.code

to E.trueto E.false

………..

S →If E then S1

Semantic rules

E.true := newlabel;

E.false := S.next;

S1.next := S.next;

S.code := E.code || gen(E.true ':') || S1.code

E.true:

E.false:

Page 34: Intermediate Code Generation 1

2.Code for if-then-else

S if E then S1 else S2

E.code

S1.code

to E.true

to E.false

Semantic rules

goto S.next

S2.code

E.true:

E.false:

………..S.next

E.true := newlabel;E.false := newlabel;S1.next := S.next;S2.next := S.next;S.code := E.code || gen(E.true ':') || S1.code || gen(‘ goto‘ S.next) || gen ( E.false ‘:’ ) || S2.code

Page 35: Intermediate Code Generation 1

3. Code for while-do

S while E do S1

E.code

S1.code

to E.true

to E.false

Semantic rules

S.begin := newlabel;E.true := newlabel;E.false := S.next;S1.next := S.begin;S.code := gen(S.begin ':') || E.code || gen(E.true ':') || S1.code || gen('goto' S.begin)

goto S.begin

E.true:

E.false: ………..

S.begin

Page 36: Intermediate Code Generation 1

Jumping code/Short Circuit code for boolean expression Boolean Expressions are translated in a sequence of

conditional and unconditional jumps to either E.true or E.false.

a < b. The code is of the form:if a < b then goto E.truegoto E.false

E1 or E2. If E1 is true then E is true, so E1.true = E.true. Otherwise, E2 must be evaluated, so E1.false is set to the label of the first statement in the code for E2.

E1 and E2. Analogous considerations apply. not E1. We just interchange the true and false with that

for E.

Page 37: Intermediate Code Generation 1

Production Semantic Rules

E E1 or E2

E1.true := E.true;

E1.false := newlabel;

E2.true := E.true;

E2.false := E.false;

E.code := E1.code ||

gen(E1.false ':') || E2.code

E E1 and E2

E1.true := newlabel;

E1.false := E.false;

E2.true := E.true;

E2.false := E.false;

E.code := E1.code ||

gen(E1.true ':') || E2.code

Control flow translation of boolean expression

We will now see the code produced for the boolean expression E

Page 38: Intermediate Code Generation 1

Production Semantic Rules

E not E1E1.true := E.false;

E1.false := E.true;

E.code := E1.code

E (E1) E1.true := E.true;

E1.false := E.false;

E.code := E1.code

E id1 relop id2E.code := gen('if' id.place relop.op id2.place 'goto' E.true) || gen('goto' E.false)

E true E.code := gen('goto' E.true)

E false E.code := gen('goto' E.false)

Page 39: Intermediate Code Generation 1

Example

while a < b do if c < d then x := y + z else x := y - z

Page 40: Intermediate Code Generation 1

Example

Lbegin: if a < b goto L1 goto Lnext L1: if c < d goto L2 goto L3 L2: t1 := y + z x := t1 goto Lbegin L3: t2 := y - z x := t2 goto Lbegin Lnext:

while a < b do if c < d then x := y + z else x := y - z

Page 41: Intermediate Code Generation 1

Case Statements

Switch <expression>

begin

case value : statement

case value : statement

……..

case value : statement

default : statement

end

Page 42: Intermediate Code Generation 1

Translation of a case stmt

code to evaluate E into t goto test L1: code for S1 goto next …Ln-1: code for Sn-1 goto next Ln: code for Sn goto next

test: if t = V1 goto L1 … if t = Vn-1 goto Ln-1 goto Lnnext:

Page 43: Intermediate Code Generation 1

Backpatching

Easiest way to implement Syntax directed defn. is to use 2 passes

First, construct syntax tree Walk through syntax tree in depth-first order,

computing translations May not know the labels to which control must flow

at the time a jump is generated Affect boolean expressions and flow control statements Leave targets of jumps temporarily unspecified Add each such statement to a list of goto statements whose

labels will be filled in later This filling in of labels is called back patching

How backpatching is implemented in 1 pass….?

Page 44: Intermediate Code Generation 1

Lists of Labels

Imagine that we are generating quadruples into a quadruple array.

Labels are indices into this array To manipulate this list of labels we use 3 fns.

makelist(i) Creates a new list containing only i, and index into the array of

quadruples Returns a pointer to the new list

merge(p1, p2) Concatenates two lists of labels Returns a pointer to the new list

backpatch(p, i) – inserts i as target label for each statement on the list pointed to by p

Page 45: Intermediate Code Generation 1

Boolean Expressions and Markers

E E1 or M E2

| E1 and M E2

| not E1

| (E1) | id1 relop id2

| true | false

M ε

Page 46: Intermediate Code Generation 1

The New Marker , M

Translation scheme suitable for producing quadruples during bottom-up pass The new marker has an associated semantic action which

Picks up, at appropriate times, the index of the next quadruple to be generated

M.quad := nextquad Nonterminal E will have two new synthesized

attributes: E.truelist contains a list of statements that jump when

expression is true E.falselist contains a list of statements that jump when

expression is false

Page 47: Intermediate Code Generation 1

Example: E E1 and M E2

If E1 is false: Then E is also false So statements on E1.falselist become part of E.falselist

If E1 is true: Still need to test E2

Target for statements on E1.truelist must be the beginning of code generated for E2

Target is obtained using the marker M

Page 48: Intermediate Code Generation 1

New Syntax-Directed Definition (1)

Production Semantic Rules

E E1 or M E2

backpatch(E1.falselist, M.quad);

E.truelist := merge(E1.truelist,

E2.truelist);

E.falselist := E2.falstlist

E E1 and M E2

backpatch(E1.truelist, M.quad);

E.truelist := E2.truelist;

E.falselist := merge(E1.falselist,

E2.falselist)

E not E1

E.truelist := E1.falselist;

E.falselist := E1.truelist

E (E1)E.truelist := E1.truelist;

E.falselist := E1.falselist

Page 49: Intermediate Code Generation 1

New Syntax-Directed Definition (2)

Production Semantic Rules

E id1 relop id2

E.truelist := makelist(nextquad);E.falselist := makelist(nextquad+1);emit('if' id1.place relop.op

id2.place 'goto _');

emit('goto _')

E trueE.truelist := makelist(nextquad);emit('goto _')

E falseE.falselist := makelist(nextquad);emit('goto _')

M ε M.quad := nextquad

Page 50: Intermediate Code Generation 1

Example Revisited (1)

Reconsider: a<b or c<d and e<f First, a<b will be reduced, generating:

100: if a < b goto _101: goto _

Next, the marker M in E E1 or M E2 will be reduced, and M.quad will be set to 102

Next, c<d will be reduced, generating:102: if c < d goto _103: goto _

Page 51: Intermediate Code Generation 1

Example Revisited (2)

Next, the marker M in E E1 and M E2 will be reduced, and M.quad will be set to 104

Next, e<f will be reduced, generating:104: if e < f goto _105: goto _

Next, we reduce by E E1 and M E2

Semantic action calls backpatch({102}, 104) E1.truelist contains only 102 Line 102 now reads: if c <d goto 104

Page 52: Intermediate Code Generation 1

Example Revisited (3)

Next, we reduce by E E1 or M E2 Semantic action calls backpatch({101}, 102) E1.falselist contains only 101 Line 101 now reads: goto 102

Statements generated so far:100: if a < b goto _101: goto 102102: if c < d goto 104103: goto _104: if e < f goto _105: goto _

Remaining goto instructions will have their addresses filled in later

Page 53: Intermediate Code Generation 1

Annotated Parse Tree

Page 54: Intermediate Code Generation 1

Procedure Calls

GrammarS -> call id ( Elist )

Elist -> Elist , E

Elist -> E

Semantic Actions1. S -> call id (Elist) for each item p in queue do

{ gen(‘param’ p) gen(‘call’ id.place)}

Page 55: Intermediate Code Generation 1

2. Elist -> Elist , E {append E.place to the end of queue}

3. Elist -> E { initialize queue to contain only E.place}

e.g.

P (x1,x2,x3,…………….xn)

param x1

param x2

………….………….param xn

call P

Page 56: Intermediate Code Generation 1

Shashwat [email protected]