4. Processing the intermediate code


Transcript of 4. Processing the intermediate code

Page 1: 4. Processing the intermediate code

4. Processing the intermediate code

From: Chapter 4, Modern Compiler Design, by Dick Grune et al.

Page 2: 4. Processing the intermediate code


4.0 Background

• The AST still bears very much the traces of the source language and the programming paradigm it belongs to:

– higher-level constructs are still represented by nodes and subtrees.

• The next step in processing the AST:
– transformation to intermediate code

• IC generation
– serves to reduce the set of specific node types to a small set of general concepts that can be implemented easily on actual machines.

• IC generation
– finds the language-characteristic nodes and subtrees in the AST and rewrites them into subtrees that employ only a small number of features, each of which corresponds rather closely to a set of machine instructions.

• The resulting tree should probably be called an intermediate code tree.

Page 3: 4. Processing the intermediate code


4.0 Background

• The standard IC tree features:
– expressions, including assignments, routine calls, procedure headings, and return statements, and conditional and unconditional jumps.

• Administrative features:
– memory allocation for global variables,

– activation record allocation, and

– module linkage information.

• IC generation:
– increases the size of the AST, but
– reduces the conceptual complexity.

Page 4: 4. Processing the intermediate code

(Figure: a roadmap of intermediate-code processing; some branches are deferred to Chapters 6 through 9, the rest is covered in this chapter.)

Page 5: 4. Processing the intermediate code


4.0 Background

• Roadmap:
4. Processing the intermediate code

4.1 Interpretation

4.2 Code generation

4.3 Assemblers, linkers, and loaders

• A sobering thought:
– whatever the processing method, writing the run-time system and library routines used by the programs will be a substantial part of the work.
– Little advice can be given on this; most of it is just coding, and usually there is much of it.

Page 6: 4. Processing the intermediate code


4.1 Interpretation

• The simplest way to have the actions expressed by the source program performed is to process the AST using an ‘interpreter’.

• An interpreter is a program that considers the nodes of the AST in the correct order and performs the actions prescribed for those nodes by the semantics of the language.

• Two varieties of interpreter:
– Recursive: works directly on the AST and requires less preprocessing.

– Iterative: works on a linearized version of the AST but requires more preprocessing.

Page 7: 4. Processing the intermediate code


4.1.1 Recursive interpretation

• A recursive interpreter has an interpreting routine for each node type in the AST.
– Such an interpreting routine calls other similar routines, depending on its children;
– it essentially does what it says in the language definition manual.

• This architecture is possible because the meaning of a given language construct is defined as a function of the meanings of its components.
– For example, if-statement = condition + then part + else part (see the sketch below).
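As an illustration, here is a minimal sketch in C of such an interpreting routine for an if-statement; the node layout and the helper routines are hypothetical, not the book's own code:

typedef struct Node Node;
struct Node { int type; Node *cond, *then_part, *else_part; };

extern int  evaluate_condition(Node *n);   /* assumed helper */
extern void elaborate_statement(Node *n);  /* assumed helper */

void elaborate_if_statement(Node *n) {
    /* if-statement = condition + then part + else part: the routine
       composes the meanings of the components, as the manual says */
    if (evaluate_condition(n->cond))
        elaborate_statement(n->then_part);
    else if (n->else_part != NULL)
        elaborate_statement(n->else_part);
}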

Page 8: 4. Processing the intermediate code


Page 9: 4. Processing the intermediate code


4.1.1 Recursive interpretation

• An important ingredient in a recursive interpreter is the uniform self-identifying data representation.
– The interpreter has to manipulate data values defined in the program being interpreted, but the types and sizes of these values are not known at the time the interpreter is written.
– This makes it necessary to implement these values in the interpreter as variable-size records that specify the type of the run-time value, its size, and the run-time value itself.
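A minimal sketch of such a self-identifying value record in C (illustrative type tags and field names, not the book's):

#include <stddef.h>

typedef enum { T_INT, T_REAL, T_ARRAY, T_RECORD } Type_tag;

typedef struct {
    Type_tag type;          /* the type of the run-time value          */
    size_t   size;          /* its size in bytes                       */
    unsigned char value[];  /* the run-time value itself, laid out
                               according to 'type' (variable-size)     */
} Value;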

Page 10: 4. Processing the intermediate code


Page 11: 4. Processing the intermediate code


4.1.1 Recursive interpretation

• Another important feature is the status indicator.
– It is used to direct the flow of control.
– Its primary component: the mode of operation of the interpreter.
• An enumeration value, like Normal mode, indicating sequential flow of control, but
• other values are available, to indicate jumps, exceptions, function returns, etc.
– Its second component: a value to supply information about non-sequential flow of control.
• Return mode, Exception mode, Jump mode
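A sketch of such a status indicator in C, using the mode names above (the layout is illustrative):

typedef enum {
    NORMAL_MODE,     /* sequential flow of control           */
    RETURN_MODE,     /* a function return is in progress     */
    JUMP_MODE,       /* a jump is in progress                */
    EXCEPTION_MODE   /* an exception is being propagated     */
} Mode;

struct {
    Mode  mode;      /* primary component: mode of operation */
    void *value;     /* second component: e.g. the returned value,
                        the target label, or the exception raised */
} Status;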

Page 12: 4. Processing the intermediate code


4.1.1 Recursive interpretation

• Each interpreting routine checks the status indicator after each call to another routine, to see how to carry on.
– If it is Normal mode, the routine carries on normally.
– Otherwise, it checks to see if the mode is one it should handle:
• if it is, it does so, but
• if it is not, the routine returns immediately, to let one of the parent routines handle the mode.

PROCEDURE Elaborate return with expression statement (Rwe node):
    SET Result TO Evaluate expression (Rwe node .expression);
    IF Status .mode /= Normal mode: RETURN;
    SET Status .mode TO Return mode;
    SET Status .value TO Result;
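The caller side of this protocol can be sketched in C as follows (hypothetical names; the Status record is the one sketched above):

extern struct { int mode; void *value; } Status;  /* see sketch above */
enum { NORMAL_MODE };               /* sequential flow of control */

typedef struct Node Node;
extern void elaborate_statement(Node *n);          /* assumed helper */

void elaborate_statement_sequence(Node **stmts, int n) {
    for (int i = 0; i < n; i++) {
        elaborate_statement(stmts[i]);
        if (Status.mode != NORMAL_MODE)
            return;  /* not ours to handle: return immediately and
                        let a parent routine handle the mode */
    }
}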

Page 13: 4. Processing the intermediate code


Page 14: 4. Processing the intermediate code


4.1.1 Recursive interpretation

• Variables, named constants, and other named entities are handled by entering them into the symbol table, in the way they are described in the manual.
– It is useful to attach additional data to the entry.
• E.g., if in the manual the entry for ‘declaration of a variable V of type T’ states that room should be allocated for it on the stack,
– we allocate the required room on the heap and enter into the symbol table under the name V a record with the following fields:
» a pointer to the name V,
» the file name and line number of its declaration,
» an indication of the kind of declarable (variable, constant, field selector, etc.),
» a pointer to the type T,
» a pointer to the newly allocated room for the value of V,
» a bit telling whether or not V has been initialized, if known,
» one or more scope- and stack-related pointers, depending on the language,
» perhaps other data, depending on the language.
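Such a symbol-table record could look as follows in C (a sketch; the field names are illustrative):

typedef struct Type Type;    /* the interpreter's type representation */

typedef struct {
    const char *name;        /* pointer to the name V                    */
    const char *file;        /* file name of its declaration             */
    int         line;        /* line number of its declaration           */
    int         kind;        /* variable, constant, field selector, etc. */
    Type       *type;        /* pointer to the type T                    */
    void       *room;        /* newly allocated room for the value of V  */
    int         initialized; /* whether V has been initialized, if known */
    /* plus scope- and stack-related pointers, depending on the language */
} Declarable;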

Page 15: 4. Processing the intermediate code

Intermediate code 15

4.1.1 Recursive interpretation

• A recursive interpreter can be written relatively quickly, and is useful for rapid prototyping;
– it is not the architecture of choice for a heavy-duty interpreter.

• A secondary advantage: it can help the language designer to debug the design of the language and its description.

• Disadvantages:
– Speed of execution
• may be a factor of 1000 or more lower than what could be achieved with a compiler;
• can be improved by doing judicious memoization.
– Lack of static context checking
• If needed, full static context checking can be achieved by doing attribute evaluation before starting the interpretation.

Page 16: 4. Processing the intermediate code


4.1.2 Iterative interpretation

• The structure of an iterative interpreter consists of
– a flat loop over a case statement which contains a code segment for each node type;
– the code segment of a given node type implements the semantics of the node type, as described in the language definition manual.

• It requires a fully annotated and threaded AST, and maintains an active-node pointer, which points to the node to be interpreted, the active node.

• It repeatedly runs the code segment for the node pointed at by the active-node pointer;
– this code sets the active-node pointer to another node, its successor, thus leading the interpreter to that node.
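The main loop can be sketched in C as follows (the node types and fields are hypothetical):

enum { ADD_NODE, JUMP_NODE /* , ... one case per node type */ };

typedef struct Node Node;
struct Node { int type; Node *successor, *target; };

void interpret(Node *active_node) {
    while (active_node != NULL) {
        switch (active_node->type) {   /* flat loop over a case statement */
        case ADD_NODE:
            /* ... perform the addition prescribed by the manual ... */
            active_node = active_node->successor;  /* follow the thread */
            break;
        case JUMP_NODE:
            active_node = active_node->target;     /* non-sequential */
            break;
        default:
            active_node = NULL;  /* unknown node type: stop (sketch only) */
        }
    }
}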

Page 17: 4. Processing the intermediate code


Page 18: 4. Processing the intermediate code


Page 19: 4. Processing the intermediate code


4.1.2 Iterative interpretation

• The iterative interpreter possesses much more information about run-time events inside a program than a compiled program does, but less than a recursive interpreter.

• A recursive interpreter can maintain arbitrary information for a variable by storing it in the symbol table, whereas an iterative interpreter only has a value at a given address.
– Remedy: a shadow memory parallel to the memory array maintained by the interpreter.
– Each byte in the shadow memory describes one of 256 possibilities for the corresponding memory byte, for example ‘This byte is uninitialized’, ‘This byte is a non-first byte of a pointer’, ‘This byte belongs to a read-only array’, ‘This byte is part of the routine call linkage’, etc.
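A sketch of such a shadow memory in C (the size and property names are illustrative):

#define MEM_SIZE 65536

enum {                            /* up to 256 properties, one byte each */
    SH_UNINITIALIZED, SH_DATA, SH_NON_FIRST_PTR_BYTE,
    SH_READ_ONLY, SH_CALL_LINKAGE /* , ... */
};

unsigned char memory[MEM_SIZE];   /* the interpreter's memory array      */
unsigned char shadow[MEM_SIZE];   /* shadow[a] describes memory[a]; as a
                                     global array it starts zero-filled,
                                     i.e. everything SH_UNINITIALIZED    */

int ok_to_read(unsigned a) {      /* an interpreter-time check           */
    return shadow[a] != SH_UNINITIALIZED;
}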

Page 20: 4. Processing the intermediate code


4.1.2 Iterative interpretation

• The shadow data can be used for interpreter-time checking, for example to detect
– the use of uninitialized memory,
– incorrectly aligned data access,
– overwriting read-only and system data, etc.

Page 21: 4. Processing the intermediate code


4.1.2 Iterative interpretation

• Some iterative interpreters store the AST in a single array, because
– it is easier to write it to a file,
– it gives a more compact representation, and
– there are historical and conceptual reasons for it.

Page 22: 4. Processing the intermediate code


Page 23: 4. Processing the intermediate code


Page 24: 4. Processing the intermediate code


4.1.2 Iterative interpretation

• Iterative interpreters are usually somewhat easier to construct than recursive interpreters;
– they are much faster but yield fewer run-time diagnostics.

• Iterative interpreters are much easier to construct than compilers, and
– they yield far superior run-time diagnostics;
– they are much slower than the compiled version:
– between 100 and 1000 times slower, though an optimized interpreter can reduce the loss to perhaps a factor of 30 or less.

• Advantages:
– Increased portability

– Increased security, for example, in Java

Page 25: 4. Processing the intermediate code


4.2 Code generation

• Compilation produces object code from the intermediate code tree through a process called code generation.
– Basic concept:
• the systematic replacement of nodes and subtrees of the AST by target code segments, in a way that preserves the semantics, and
• a linearization phase, producing a linear sequence of instructions from the rewritten AST.
• The replacement process is called tree rewriting.
• The linearization is controlled by the data-flow and flow-of-control requirements of the target code segments.

Page 26: 4. Processing the intermediate code


Page 27: 4. Processing the intermediate code

(Figure: an intermediate code tree rewritten bottom-up with instruction templates; one subtree is covered by the template for Load_Byte (b+Rd)[Rc],4,Rt, after which the remaining tree is covered by Load_Address 9[Rt],2,Ra, yielding the code sequence:

Load_Byte (b+Rd)[Rc],4,Rt
Load_Address 9[Rt],2,Ra)

Page 28: 4. Processing the intermediate code


4.2 Code generation

• Three main issues in code generation:
– Code selection
• Which part of the AST will be rewritten with which template, using which substitutions for the instruction parameters?
– Register allocation
• Which computational results are kept in registers? Note that it is not certain that there will be enough registers for all values used and results obtained.
– Instruction ordering
• Which part of the code is produced first and which later?

Page 29: 4. Processing the intermediate code


4.2 Code generation

• Optimal code generation is NP-complete

• Compromising by restricting the problem– Consider only small parts of the AST at a time;

– Assume that the target machine is simpler than it actually is, by disregarding some of its complicated features;

– Limit the possibilities in the three issues by having conventions for their use.

Page 30: 4. Processing the intermediate code


4.2 Code generation

• Preprocessing: AST node patterns are replaced by other (better) AST node patterns

• Code generation proper: AST node patterns are replaced by target code sequences, and

• Postprocessing: target code sequences are replaced by other (better) target code sequences, using peephole optimization

Page 31: 4. Processing the intermediate code


4.2.1 Avoiding code generation altogether

(Figure: the AST of source program P, packaged with an interpreter, acts as an executable program, like a compiled program.)

A good way to do rapid prototyping, if the interpreter is available

Page 32: 4. Processing the intermediate code


4.2.2 The starting point

• Classes of nodes in an intermediate code tree:
– Administration
• For example, declarations, module structure indications, etc.
• The code needed is minimal and almost trivial.
– Flow-of-control
• For example, if-then, multi-way choice from case statements, computed gotos, function calls, exception handling, method application, Prolog rule selection, RPC, etc.
– Expressions
• Many of the nodes to be generated belong to expressions.
– Techniques for code generation:
• Trivial,
• Simple, and
• Advanced.

Page 33: 4. Processing the intermediate code


4.2.3 Trivial code generation

• There is a strong relationship between iterative interpretation (II) and code generation (CG):
– an II contains code segments performing the actions required by the nodes in the AST;
– a CG generates code segments performing the actions required by the nodes in the AST;
– the active-node pointer is replaced by the machine instruction pointer.

Page 34: 4. Processing the intermediate code


Page 35: 4. Processing the intermediate code


Page 36: 4. Processing the intermediate code


4.2.3 Trivial code generation

• At first sight it may seem pointless to compile an expression in C to code in C, and the code obtained is inefficient, but still several points have been made:
– compilation has taken place in a real sense;
– the code generator was obtained with minimal effort;
– the process can be repeated for much more complicated source languages.
• Two improvements:
– threaded code
– partial evaluation

Page 37: 4. Processing the intermediate code


4.2.3.1 Threaded code

• The code of Fig. 4.13 is very repetitive, and the idea is to pack the code segments into routines, possibly with parameters.
– This is called threaded code.

Page 38: 4. Processing the intermediate code


4.2.3.1 Threaded code

• The advantage of threaded code is that it is small.

• It is mainly used in process control and embedded systems, to control hardware with limited processing power, for example palmtops and telephones.

• If the ultimate in code size reduction is desired, the routines can be numbered and the list of calls can be replaced by an array of routine numbers.
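A sketch in C of this ultimate form: the code segments become routines, and the program becomes an array of routine numbers (all names are illustrative):

typedef void (*Routine)(void);

/* the packed code segments, possibly with parameters */
static void push_const_5(void)  { /* push 5 onto the working stack  */ }
static void add_top2(void)      { /* add the top two stack elements */ }
static void store_local_p(void) { /* pop and store in local p       */ }

static const Routine routines[] = { push_const_5, add_top2, store_local_p };

/* the program: routine numbers instead of a list of calls */
static const unsigned char program[] = { 0, 1, 2 };

void run(void) {
    for (unsigned pc = 0; pc < sizeof program; pc++)
        routines[program[pc]]();   /* dispatch on the routine number */
}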

Page 39: 4. Processing the intermediate code


4.2.3.2 Partial evaluation

• The process of performing part of a computation while generating code for the rest of the computation is called partial evaluation.

• It is a very general and powerful technique for program simplification and optimization.

• Many researchers believe
– that many of the existing optimization techniques are special cases of partial evaluation,
– and that better knowledge of it would allow us to obtain very powerful optimizers,
– thus simplifying compilation, program generation, and even program design.
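A classic small illustration (not one of the book's figures): partially evaluating a power routine with respect to a known exponent performs the loop control at generation time and leaves only the residual multiplications:

/* the general routine: both x and n are dynamic */
int power(int x, int n) {
    int r = 1;
    while (n-- > 0)
        r *= x;
    return r;
}

/* the residual code after partial evaluation with n == 3:
   the loop has been executed at generation time */
int power_3(int x) {
    return x * x * x;
}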

Page 40: 4. Processing the intermediate code


Page 41: 4. Processing the intermediate code


Page 42: 4. Processing the intermediate code


4.2.4 Simple code generation

• Two machine types are considered:
– the pure stack machine and the pure register machine.
• A pure stack machine
– uses a stack to store and manipulate values;
– it has no registers.
– It has two types of instructions:
• those that move or copy values between the top of the stack and elsewhere, and
• those that do operations on the top element or elements of the stack.
– Two important data administration pointers:
• the stack pointer, SP, and
• the base pointer, BP.

Page 43: 4. Processing the intermediate code


Page 44: 4. Processing the intermediate code


Page 45: 4. Processing the intermediate code


4.2.4 Simple code generation

• The code for p := p + 5 is:

Push_Local #p   // Push value of #p-th local onto stack
Push_Const 5    // Push value 5 onto stack
Add_Top2        // Add top two elements
Store_Local #p  // Pop and store result back in #p-th local
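A sketch of how these four instructions could be realized in C with an explicit stack (SP points just past the top element, BP indexes the locals; a sketch, not the book's code):

int stack[256];
int *SP = stack;   /* stack pointer: just past the top element */
int *BP = stack;   /* base pointer: start of the locals        */

void push_local(int p)  { *SP++ = BP[p]; }         /* Push_Local #p  */
void push_const(int c)  { *SP++ = c; }             /* Push_Const c   */
void add_top2(void)     { SP--; SP[-1] += SP[0]; } /* Add_Top2       */
void store_local(int p) { BP[p] = *--SP; }         /* Store_Local #p */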

Page 46: 4. Processing the intermediate code


4.2.4 Simple code generation

• A pure register machine has
– a memory to store values in,
– a set of registers to perform operations on, and
– two sets of instructions:
• one set contains instructions to copy values between the memory and a register;
• the other performs operations on the values in two registers and leaves the result in one of them.

Page 47: 4. Processing the intermediate code


4.2.4 Simple code generation

• The code for p := p + 5 on a register-memory machine would be:

Load_Mem p,R1
Load_Const 5,R2
Add_Reg R2,R1
Store_Reg R1,p

Page 48: 4. Processing the intermediate code


4.2.4.1 Simple code generation for a stack machine

Page 49: 4. Processing the intermediate code


4.2.4.1 Simple code generation for a stack machine

Page 50: 4. Processing the intermediate code


4.2.4.1 Simple code generation for a stack machine

Push_Local #b
Push_Local #b
Mult_Top2
Push_Const 4
Push_Local #a
Push_Local #c
Mult_Top2
Mult_Top2
Store_Top2

Page 51: 4. Processing the intermediate code


4.2.4.1 Simple code generation for a stack machine

Page 52: 4. Processing the intermediate code


4.2.4.2 Simple code generation for a register machine

• Much of what was said about code generation for stack machine applies to the register machine as well.

• The AST of the machine instructions from Fig. 4.22

Page 53: 4. Processing the intermediate code


Page 54: 4. Processing the intermediate code


4.2.4.2 Simple code generation for a register machine

• Use depth-first code generation again, but we have to contend with registers this time.
– Method:
• arrange that in the evaluation of each node in the expression tree,
• the result of the expression is expected in a given register, the target register,
• and that a given set of auxiliary registers is available to help get it there.

Page 55: 4. Processing the intermediate code


Page 56: 4. Processing the intermediate code


4.2.4.2 Simple code generation for a register machine

• Actually no set manipulation is necessary in this case: the set can be implemented as a stack of registers.
– We pick the top of the register stack for Target 2, which leaves us the rest of the stack as Aux 2.

Page 57: 4. Processing the intermediate code


Page 58: 4. Processing the intermediate code


4.2.4.2 Simple code generation for a register machine

• Weighted register allocation
– Motivating example

– We call the number of registers required by a node its weight.

– The weight of a subtree can be determined simply by a depth-first prescan.

– If the left tree is heavier, we compile it first.

– The same applies vice versa to the right tree if it is heavier.

– This technique is sometimes called Sethi-Ullman numbering.

– Generalization to operations with n operands (see pp. 311-314)
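A sketch in C of the weight computation (assuming binary operators and that every leaf needs one register to be loaded):

typedef struct Node Node;
struct Node { Node *left, *right; int weight; };

int set_weights(Node *n) {           /* depth-first prescan */
    if (n->left == NULL)             /* a leaf */
        return n->weight = 1;
    int wl = set_weights(n->left);
    int wr = set_weights(n->right);
    /* the heavier subtree is compiled first; if both weigh the same,
       one extra register is needed to hold the first result */
    return n->weight = (wl == wr) ? wl + 1 : (wl > wr ? wl : wr);
}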

Page 59: 4. Processing the intermediate code

(Figure: the same expression tree compiled in two different evaluation orders, requiring 4 registers versus 3 registers.)

Page 60: 4. Processing the intermediate code


Page 61: 4. Processing the intermediate code


4.2.4.2 Simple code generation for a register machine

• Spilling registers
– Problem: the expression to be translated may require more registers than the machine has.
– Solution: one or more values from registers have to be stored in memory locations, to be retrieved later; this is the register spilling technique.
– A simple method:
• Consider the tree for a very complicated expression whose top region has a weight higher than the number of registers we have.
• Detach some of the subtrees and store their values in temporary variables.
• This leaves us with a set of temporary variables and expressions for which we can generate code, since we have enough registers.

Page 62: 4. Processing the intermediate code


Page 63: 4. Processing the intermediate code


Page 64: 4. Processing the intermediate code


4.2.4.3 Compilation on the stack/ compilation by symbolic interpretation

• Employ compilation by symbolic interpretation as a full code generation technique,
– by extending the approximate stack representation.
• Compilation by symbolic interpretation uses the same technique but keeps the representation exact,
– in a register and variable descriptor, or regvar descriptor.

Page 65: 4. Processing the intermediate code


4.2.5 Code generation for basic blocks

• As explained previously, instruction selection, register allocation, and instruction ordering are intertwined, and
– finding the optimal rewriting of the AST with the available instruction templates is NP-complete.
• We present here three techniques that each address part of the problem:
– Basic blocks: mainly concerned with optimization, instruction selection, and instruction ordering in a limited part of the AST (4.2.5).
– Bottom-up tree rewriting: shows how a very good instruction selector can be generated automatically for very general instruction sets and cost functions, under the assumption that enough registers are available (4.2.6).
– Register allocation by graph coloring: explains a good and very general heuristic for register allocation (4.2.7).

Page 66: 4. Processing the intermediate code


4.2.5 Code generation for basic blocks

• The idea of the basic block is used in code generation.
• Basic block:
– a part of the control graph containing no splits (jumps) or combines (labels).
– Usually we consider only maximal basic blocks: basic blocks which cannot be extended by including adjacent nodes without violating the definition of a basic block.
• In the imperative languages, basic blocks consist exclusively of expressions and assignments, which follow each other sequentially.
– In practice, this is also true for functional and logic languages.

Page 67: 4. Processing the intermediate code


4.2.5 Code generation for basic blocks

• The effect of an assignment in a basic block
– may be local to the block: the resulting value is not used anywhere else, and the variable is dead at the end of the basic block; or
– it may be non-local, in which case the variable is an output variable of the basic block.
• In general, simpler means, for example the scope rules of C, suffice to determine whether a variable is local or non-local.
– If we do not have this information, we have to assume that all variables are live at basic block end.

Page 68: 4. Processing the intermediate code


4.2.5 Code generation for basic blocks

• We will now look at one way to generate code for a basic block.

– First, convert the AST and the control graph implied in it into a dependency graph, a dag.

– Then rewrite the dependency graph to code.

• Use the code in Fig. 4.41 as an example. We assume that
– n is local and dead at the end;
– x and y are live at block exit.

Page 69: 4. Processing the intermediate code


4.2.5.1 From AST to dependency graph

• The threaded AST is not appropriate for code generation:
– the control-flow graph is more restrictive than necessary;
– only the data dependencies have to be obeyed.
• It is easier to generate code from a data dependency graph than from a control flow graph.

Page 70: 4. Processing the intermediate code


4.2.5.1 From AST to dependency graph

• Two main sources of data dependencies in the AST of a basic block:
– data flow inside expressions;
– data flow from values assigned to variables to the use of these variables in further code.
• A third source of data dependencies:
– dependencies concerning pointers (4.2.5.3).
• Three observations:
– The order of evaluation of the operations in an expression is immaterial, as long as the data dependencies inside the expression are respected.
– If the value of a variable V is used more than once in a basic block, the order of these uses is immaterial, as long as each use comes after the assignment it depends on and before the next assignment to V.
– The order in which the assignments to variables are executed is immaterial, as long as all assignments to a specific variable V are executed in sequential, left-to-right, order.

Page 71: 4. Processing the intermediate code


4.2.5.1 From AST to dependency graph

• The previous observations give us a simple algorithm to convert the AST of a basic block into a data dependency graph:
1. Replace the arcs that connect the nodes in the AST of the basic block by data dependency arrows:
• for := nodes: towards the destination;
• for the others: from the parent nodes downward.
2. Insert an arrow from each variable used as an operand to the assignment that set its value, or to the beginning of the basic block if it was an input variable.
3. Insert an arrow from each assignment to a variable V to the previous assignment to V, if present.
4. Designate the nodes that describe the output values as roots of the graph.
5. Remove the ;-nodes together with their arrows.

Page 72: 4. Processing the intermediate code


Page 73: 4. Processing the intermediate code

(Figure: the AST of the basic block with its arcs replaced by data dependency arrows.)

1. Replace the arcs that connect the nodes in the AST of the basic block by data dependency arrows:
• for := nodes: towards the destination;
• for the others: from the parent nodes downward.

Page 74: 4. Processing the intermediate code

(Figure: the same graph after step 2.)

2. Insert an arrow from each variable used as an operand to the assignment that set its value, or to the beginning of the basic block if it was an input variable.

Page 75: 4. Processing the intermediate code

(Figure: the same graph after step 3.)

3. Insert an arrow from each assignment to a variable V to the previous assignment to V, if present.

Page 76: 4. Processing the intermediate code

(Figure: the same graph after step 4.)

4. Designate the nodes that describe the output values as roots of the graph.

Page 77: 4. Processing the intermediate code

(Figure: the resulting data dependency graph of the basic block.)

5. Remove the ;-nodes together with their arrows.

Page 78: 4. Processing the intermediate code


4.2.5.1 From AST to dependency graph

• An assignment in the data dependency graph just passes on the value and can be short-circuited.

• Also, we can eliminate from the graph all nodes not reachable through at least one of the roots.

Page 79: 4. Processing the intermediate code


4.2.5.1 From AST to dependency graph

• Fig. 4.44 has the property that it specifies the semantics of the basic block precisely:
– all required nodes and data dependencies are present, and no node or data dependency is superfluous.
• Two techniques for converting the data dependency graph into efficient machine instructions:
– common sub-expression elimination
– the triple representation of the dependency graph

Page 80: 4. Processing the intermediate code


4.2.5.1 From AST to dependency graph: Common sub-expression elimination

x = a*a + 2*a*b + b*b
x = a*a + 2*a*b + b*b

can be rewritten as:

double quads = a*a + b*b
double cross_prod = 2*a*b
x = quads + cross_prod
x = quads + cross_prod

a[i] + b[i] is *(a+4*i) + *(b+4*i), hiding the common sub-expression 4*i in the indexing.

x = a*a + 2*a*b + b*b
a = b = 0
x = a*a - 2*a*b + b*b

Not necessarily common sub-expressions

Page 81: 4. Processing the intermediate code


4.2.5.1 From AST to dependency graph: Common sub-expression elimination

• Once we have the data dependency graph, finding the common sub-expressions is simple.
– Rule: two nodes that have the operands, the operator, and the dependencies in common can be combined into one node.
• Detecting that two or more nodes in a graph are the same is usually implemented by storing some representation of each node in a hash table.
– If the hash value of a node depends on its operands, its operator, and its dependencies, common nodes will hash to the same value.

Page 82: 4. Processing the intermediate code


Page 83: 4. Processing the intermediate code


4.2.5.1 From AST to dependency graph: The triple representation of the data dependency graph

• Traditionally, data dependency graphs are implemented as arrays of triples.
– A triple is a record with three fields representing an operator with its two operands, and corresponds to an operator node in the data dependency graph.
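A sketch of the triple representation in C; how an operand field distinguishes a variable from a reference to an earlier triple is an implementation choice (e.g. a tag bit), shown here only in comments:

typedef struct {
    int op;         /* the operator                                   */
    int operand1;   /* a variable or constant, or the position number */
    int operand2;   /* of an earlier triple whose result it uses      */
} Triple;

/* e.g. for x = a + a*b, with positions starting at 0:
   0:  (*,  a, b)
   1:  (+,  a, #0)    (#0 refers to the result of triple 0)
   2:  (:=, x, #1)                                            */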

Page 84: 4. Processing the intermediate code


Page 85: 4. Processing the intermediate code


4.2.5.2 From dependency graph to code

• Generating instructions from a data dependency graph is very similar to doing so from an AST:
– the nodes are rewritten by machine instruction templates and the result is linearized;
– main difference: the former allows more leeway than the latter.
• We assume a ‘register-memory machine’ is used.

Page 86: 4. Processing the intermediate code


4.2.5.2 From dependency graph to code: Linearization of the data dependency graph

• In the absence of ordering criteria, two orderings suggest themselves:
– Early evaluation: code for a node is issued as soon as the code for all its operands has been issued.
– Late evaluation: code for a node is issued as late as possible.

• Early evaluation ordering tends to require more registers than late evaluation ordering:
– early evaluation creates values as soon as possible, which may be long before they are used, and in the meantime the values have to be kept in registers.

Page 87: 4. Processing the intermediate code


4.2.5.2 From dependency graph to code: Linearization of the data dependency graph

• Available ladder sequences
– An available ladder sequence starts at a root node and continues along left operands, but may continue along the right operand for commutative operators; it may stop anywhere, but must stop at leaves.

• Code generated for a given ladder sequence
– starts at its last node, by loading a leaf variable if the sequence ends in a leaf, or an intermediate value if the sequence ends earlier;
– working backwards along the sequence, code is generated for each of the operation nodes;
– finally, the resulting value is stored as indicated in the root node.

Page 88: 4. Processing the intermediate code


Page 89: 4. Processing the intermediate code


4.2.5.2 From dependency graph to code: Linearization of the data dependency graph

• A simple heuristic ordering algorithm, combining the identification of ladder sequences with late evaluation:
1. Find an acceptable ladder sequence S that has the property that none of its nodes has more than one incoming dependency.
2. If any operand of a node N in S is not a leaf but another node M, associate a new pseudo-register R with M if it does not have one already;
– use R as the operand in the code generated for N, and make M an additional root of the dependency graph.
3. Generate code for the ladder sequence S, using R1 as the ladder register.
4. Remove the ladder sequence S from the dependency graph.
5. Repeat steps 1 through 4 until the entire dependency graph has been consumed and rewritten to code.

Page 90: 4. Processing the intermediate code


Two available ladder sequences without multiple incoming dependencies.

Generate the rightmost one first:

Load_Reg X1, R1
Add_Const 1, R1
Mult_Mem d, R1
Store_Reg R1, y

Generate the next available ladder sequence:

Load_Reg X1, R1
Mult_Reg X1, R1
Add_Mem b, R1
Add_Mem c, R1
Store_Reg R1, x

Page 91: 4. Processing the intermediate code


Generate the remaining ladder sequence:

Load_Mem a, R1
Add_Const 1, R1
Load_Reg R1, X1

Page 92: 4. Processing the intermediate code


4.2.5.2 From dependency graph to code: Register allocation for the linearized code

• One thing remains to be done:
– the pseudo-registers have to be mapped onto real registers or, failing that, onto memory locations.

• A simple method:
– map the pseudo-registers onto real registers in the order of appearance, and when running out of registers, map the remaining ones onto memory locations.
– For a machine with at least two registers, R1 and R2, the resulting code is shown in Fig. 4.54.

Page 93: 4. Processing the intermediate code

(Figure 4.54: the resulting code, with some ‘stupid’ instructions generated.)

Page 94: 4. Processing the intermediate code


4.2.5.2 From dependency graph to code: Register allocation for the linearized code

• Ways to deal with the ‘stupid’ instructions generated:
– improving the code generation algorithm;
– doing register tracking (4.2.4.3); and
– doing peephole optimization (4.2.12), as sketched below.
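As a sketch of the last option: a peephole pass over instruction pairs that removes mutually cancelling register-to-register transfers of the kind generated here (the instruction encoding is hypothetical):

typedef struct { int opc, src, dst; } Instr;
enum { LOAD_REG /* register-to-register copy, as in Load_Reg R1,X1 */ };

int peephole(Instr *code, int n) {
    int j = 0;
    for (int i = 0; i < n; i++) {
        /* pattern: Load_Reg a,b immediately followed by Load_Reg b,a;
           the second transfer has no effect and is dropped */
        if (i + 1 < n
            && code[i].opc == LOAD_REG && code[i+1].opc == LOAD_REG
            && code[i].src == code[i+1].dst
            && code[i].dst == code[i+1].src) {
            code[j++] = code[i++];   /* keep the first, skip the second */
        } else {
            code[j++] = code[i];
        }
    }
    return j;   /* the new code length */
}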

Page 95: 4. Processing the intermediate code


4.2.5.3 Code optimization in the presence of pointers

Page 96: 4. Processing the intermediate code


Page 97: 4. Processing the intermediate code


4.2.6 BURS code generation and dynamic programming

• We consider here machines with a great variety of instructions, rather than the simple ones considered before.

Page 98: 4. Processing the intermediate code


Page 99: 4. Processing the intermediate code


Page 100: 4. Processing the intermediate code


4.2.6 BURS code generation and dynamic programming

Page 101: 4. Processing the intermediate code


4.2.6 BURS code generation and dynamic programming

Page 102: 4. Processing the intermediate code


4.2.6 BURS code generation and dynamic programming

• Two main problems identified:
– How do we find all possible rewrites, and how do we represent them?

• Solved by a bottom-up rewriting system, BURS

– How do we find the best/cheapest rewrite among all possibilities, preferably in time linear in the size of the expression to be translated?

• Solved by a form of dynamic programming

Page 103: 4. Processing the intermediate code

Intermediate code 103

4.2.6 BURS code generation and dynamic programming

• In BURS, the code is generated in three scans over the input tree:
1. An instruction-collecting scan: bottom-up, identifies the possible instructions for each node by pattern matching;
• by a post-order recursive visit.
2. An instruction-selection scan: top-down, selects at each node one instruction out of the possible instructions collected during the previous scan (the most interesting scan);
• by a pre-order recursive visit.
3. A code-generating scan: bottom-up, emits the instructions in the correct linearized order;
• by a post-order recursive visit.

Page 104: 4. Processing the intermediate code


4.2.6 BURS code generation and dynamic programming

• Four variants of the instruction-collecting scan:
– using item sets (4.2.6.1);
– using a tree automaton (4.2.6.2);
– using dynamic programming (4.2.6.3);
– combining the above three into an efficient bottom-up scan (4.2.6.4).

Page 105: 4. Processing the intermediate code


4.2.6.1 Bottom-up pattern matching

• The algorithm for bottom-up pattern matching is a tree version of the lexical algorithm from Section 2.1.6.1.

Page 106: 4. Processing the intermediate code


Page 107: 4. Processing the intermediate code


Page 108: 4. Processing the intermediate code


Page 109: 4. Processing the intermediate code


4.2.6.1 Bottom-up pattern matching

Page 110: 4. Processing the intermediate code


4.2.6.1 Bottom-up pattern matching

Page 111: 4. Processing the intermediate code


4.2.6.1 Bottom-up pattern matching

Page 112: 4. Processing the intermediate code


Page 113: 4. Processing the intermediate code


Page 114: 4. Processing the intermediate code
