Lecture 21 22

Compiler Construction (CS-636), Muhammad Bilal Bashir, UIIT, Rawalpindi

Transcript of Lecture 21 22

Page 1: Lecture 21 22


Compiler Construction (CS-636)

Muhammad Bilal Bashir, UIIT, Rawalpindi

Page 2: Lecture 21 22

Outline

1. Data Types & Type Checking

2. Intermediate Code Generation

3. Variants of Syntax Trees

4. Three-Address Code

5. Static Single-Assignment Form

6. Summary


Page 3: Lecture 21 22

Semantic Analysis

Lecture: 21-22


Page 4: Lecture 21 22

Data Types & Type Checking

One of the principal tasks of a compiler is the computation and maintenance of information on data types (type inference)

The compiler uses this information to ensure that each part of the program makes sense under the type rules of the language (type checking)

Data type information can occur in a program in several different forms

Theoretically, a data type is a set of values, or more precisely a set of values with certain operations on those values


Page 5: Lecture 21 22

Data Types & Type Checking (Continue…)

For instance, the data type integer in a programming language refers to a subset of the mathematical integers, together with the arithmetic operations on them

In compiler construction, these sets are described by type expressions

Type expressions can occur in several places in a program


Page 6: Lecture 21 22

Type Expressions & Type Constructors

A programming language always contains a number of built-in types

These predefined types correspond either to numeric data types like int or double, or to elementary types like boolean or char

Such data types are called simple types, in that their values exhibit no explicit internal structure

An interesting predefined type in the C language is the void type

This type has no values, and so represents the empty set

Page 7: Lecture 21 22

Type Expressions & Type Constructors (Continue…)

In some languages it is possible to define new simple types, such as subrange types in Pascal and enumerated types in C

In Pascal, a subrange of integers from 0 to 9 can be declared as

type Digit = 0..9;

In C, an enumerated type consisting of named values can be declared as

typedef enum {red, green, blue} Color;

Page 8: Lecture 21 22

Type Expressions & Type Constructors (Continue…)

Given a set of predefined types, new data types can be created using type constructors, such as array and record, or struct

Such constructors can be viewed as functions that take existing types as parameters and return new types with a structure that depends on the constructor

Such types are called structured types

Page 9: Lecture 21 22

Type Names, Type Declarations, and Recursive Types

Languages that have a rich set of type constructors usually also have a mechanism for a programmer to assign names to type expressions

Such type declarations (sometimes called type definitions) can be written in C as follows

struct RealIntRec {
    double r;
    int i;
};

Page 10: Lecture 21 22

Type Names, Type Declarations, and Recursive Types (Continue…)

Type declarations cause the declared type names to be entered into the symbol table, just as variable declarations cause variable names to be entered

Type names are associated with attributes in the symbol table in a similar way to variable declarations

These attributes include the scope and the type expression corresponding to the type name

Since type names can appear in type expressions, questions arise about the recursive use of type names

Page 11: Lecture 21 22

Type Names, Type Declarations, and Recursive Types (Continue…)

In the C programming language, a structure cannot directly contain a member of its own type, because the memory required for such a structure could not be determined at the time of declaration

Recursion is instead expressed indirectly through pointers, whose size is known

struct intBST {
    int val;
    struct intBST *left, *right;
};

Page 12: Lecture 21 22

Type Equivalence

Given the possible type expressions of a language, a type checker must frequently answer the question of when two type expressions represent the same type

This is the question of type equivalence

There are many possible ways for type equivalence to be defined by a language

Type equivalence checking can be seen as a function in a compiler

function typeEqual( t1, t2 : TypeExp ) : Boolean

Page 13: Lecture 21 22

Type Equivalence (Continue…)

The typeEqual() function takes two type expressions and returns true if they represent the same type according to the type equivalence rules of the language

One issue that relates directly to the description of a type equivalence algorithm is the way type expressions are represented within a compiler

One straightforward method is to use a syntax tree representation
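As a rough illustration of the syntax tree representation, the following C sketch implements typeEqual() under structural equivalence. The TypeExp layout, the TypeKind names, and the Field list are assumptions made for this example and do not come from the lecture. The sketch also assumes type expressions are finite trees: on a recursive type such as struct intBST it would not terminate, which is one reason languages often fall back on name equivalence for such types.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical syntax-tree representation of type expressions. */
typedef enum { T_INT, T_DOUBLE, T_BOOL, T_CHAR, T_VOID,
               T_ARRAY, T_POINTER, T_RECORD } TypeKind;

typedef struct TypeExp TypeExp;

typedef struct Field {          /* one field of a record type */
    const char   *name;
    TypeExp      *type;
    struct Field *next;
} Field;

struct TypeExp {
    TypeKind kind;
    int      size;              /* T_ARRAY: number of elements */
    TypeExp *elem;              /* T_ARRAY, T_POINTER: component type */
    Field   *fields;            /* T_RECORD: list of fields */
};

/* Structural equivalence: same constructor, equivalent components. */
bool typeEqual(const TypeExp *t1, const TypeExp *t2)
{
    if (t1 == t2) return true;
    if (t1 == NULL || t2 == NULL || t1->kind != t2->kind) return false;

    switch (t1->kind) {
    case T_ARRAY:
        return t1->size == t2->size && typeEqual(t1->elem, t2->elem);
    case T_POINTER:
        return typeEqual(t1->elem, t2->elem);
    case T_RECORD: {
        const Field *f1 = t1->fields, *f2 = t2->fields;
        while (f1 != NULL && f2 != NULL) {
            if (strcmp(f1->name, f2->name) != 0 ||
                !typeEqual(f1->type, f2->type))
                return false;
            f1 = f1->next;
            f2 = f2->next;
        }
        return f1 == NULL && f2 == NULL;   /* same number of fields */
    }
    default:
        return true;            /* simple types: equal kinds suffice */
    }
}
```

With this representation, two separately built trees for "array of 10 int" compare equal, while "array of 10 int" and "array of 10 double" do not.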


Page 14: Lecture 21 22

Type Inference & Type Checking

Type checking is described in terms of semantic actions based on a representation of types and a typeEqual() operation

The compiler also needs a symbol table for this purpose, along with its three basic operations: insert, lookup, and delete
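The three symbol table operations can be sketched in C as follows. This is only a minimal illustration that assumes a singly linked list and stores the type expression as a string; the names SymTab, st_insert, st_lookup, and st_delete are hypothetical, and a production compiler would more likely use a hash table with explicit scope handling.

```c
#include <stdlib.h>
#include <string.h>

typedef struct Symbol {
    char          *name;
    const char    *type;   /* type expression, kept as text for brevity */
    struct Symbol *next;
} Symbol;

typedef struct { Symbol *head; } SymTab;

/* insert: add a declaration at the front, so it hides older ones. */
int st_insert(SymTab *tab, const char *name, const char *type)
{
    Symbol *s = malloc(sizeof *s);
    if (s == NULL) return 0;
    s->name = malloc(strlen(name) + 1);
    if (s->name == NULL) { free(s); return 0; }
    strcpy(s->name, name);
    s->type = type;
    s->next = tab->head;
    tab->head = s;
    return 1;
}

/* lookup: return the most recent declaration of name, or NULL. */
const Symbol *st_lookup(const SymTab *tab, const char *name)
{
    for (const Symbol *s = tab->head; s != NULL; s = s->next)
        if (strcmp(s->name, name) == 0) return s;
    return NULL;
}

/* delete: remove the most recent declaration of name (e.g. at scope exit). */
int st_delete(SymTab *tab, const char *name)
{
    for (Symbol **p = &tab->head; *p != NULL; p = &(*p)->next) {
        if (strcmp((*p)->name, name) == 0) {
            Symbol *dead = *p;
            *p = dead->next;
            free(dead->name);
            free(dead);
            return 1;
        }
    }
    return 0;
}
```

Inserting at the front means an inner declaration of x shadows an outer one until it is deleted, which mirrors nested scopes in a simple way.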


Page 15: Lecture 21 22

Type Inference & Type Checking (Continue…)

Consider the following grammar;


Page 16: Lecture 21 22

Type Inference & Type Checking (Continue…)


Page 17: Lecture 21 22

Intermediate-Code Generation

Back-end of a Compiler


Page 18: Lecture 21 22

Where Are We Now?

Source code -> Scanner -> Tokens -> Parser -> Syntax Tree -> Semantic Analyzer -> Annotated Tree -> Intermediate Code Generator -> Intermediate code

Page 19: Lecture 21 22

Intermediate-Code Generation

In the analysis-synthesis model of a compiler, the front end analyzes a source program and creates an intermediate representation, from which the back end generates target code

Ideally, details of the source language are confined to the front end, and details of the target machine to the back end

With a suitably defined intermediate representation, a compiler for language i and machine j can then be built by combining the front end for language i with the back end for machine j

Page 20: Lecture 21 22

Intermediate-Code Generation (Continue…)

The following figure shows the front-end model of a compiler

Static checking includes type checking, which ensures that operators are applied to compatible operands

Static checking also includes any syntactic checks that remain after parsing

For example, a break statement in C must be enclosed within a while, for, or switch statement
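The break check can be sketched as a walk over a statement tree that tracks how many enclosing while, for, or switch statements surround the current position. The Stmt structure and the function name countStrayBreaks are assumptions made for this illustration and are not taken from the lecture.

```c
#include <stddef.h>

typedef enum { S_WHILE, S_FOR, S_SWITCH, S_BREAK, S_BLOCK, S_OTHER } StmtKind;

typedef struct Stmt {
    StmtKind     kind;
    struct Stmt *body;   /* first nested statement, if any */
    struct Stmt *next;   /* next statement in the same sequence */
} Stmt;

/* Count break statements with no enclosing while, for, or switch.
   depth is the number of such enclosing statements at this point. */
int countStrayBreaks(const Stmt *s, int depth)
{
    int errors = 0;
    for (; s != NULL; s = s->next) {
        switch (s->kind) {
        case S_BREAK:
            if (depth == 0) errors++;
            break;
        case S_WHILE: case S_FOR: case S_SWITCH:
            errors += countStrayBreaks(s->body, depth + 1);
            break;
        default:
            errors += countStrayBreaks(s->body, depth);
            break;
        }
    }
    return errors;
}
```

A call such as countStrayBreaks(program, 0) reports zero when every break is properly enclosed.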


Page 21: Lecture 21 22

Intermediate-Code Generation (Continue…)

While translating a program, a compiler may construct a sequence of intermediate representations

High-level representations are close to the source language and low-level representations are close to the target machine

Abstract syntax trees are a high-level intermediate representation

They depict the natural hierarchical structure of the source program

Source Program -> High Level Intermediate Representation -> Low Level Intermediate Representation -> Target Code

Page 22: Lecture 21 22

Intermediate-Code Generation (Continue…)

A low-level representation is suitable for machine-dependent tasks like register allocation and instruction selection

Three-address code can range from high- to low-level, depending upon the choice of operators

For expressions, the differences between syntax trees and three-address code are superficial

For looping statements, however, a syntax tree represents the components of a statement, whereas three-address code contains labels and jump instructions to represent the flow of control, as in machine language

Page 23: Lecture 21 22

Intermediate-Code Generation (Continue…)

The choice or design of an intermediate representation varies from compiler to compiler

An intermediate representation may either be an actual language or it may consist of internal data structures that are shared by phases of the compiler

C is a programming language, yet it is often used as an intermediate form

C is flexible, it compiles into efficient machine code, and its compilers are widely available

The original C++ compiler consisted of a front end that generated C, treating a C compiler as a back end

Page 24: Lecture 21 22

Variants of Syntax Trees

Nodes in a syntax tree represent constructs in the source program

The children of a node represent the meaningful components of a construct

A directed acyclic graph (DAG) for an expression identifies the common subexpressions of the expression

Page 25: Lecture 21 22

Directed Acyclic Graphs for Expressions

A directed acyclic graph (DAG) is a directed graph with no directed cycles

Like the syntax tree for an expression, a DAG has leaves corresponding to atomic operands and interior nodes corresponding to operators

A node N in a DAG has more than one parent if N represents a common subexpression

A DAG not only represents expressions more succinctly, it also gives the compiler important clues regarding the generation of efficient code to evaluate the expression

Page 26: Lecture 21 22

Directed Acyclic Graphs for Expressions (Continue…)

Create Syntax Trees and DAG’s for the following expressions

a = a + 10

a + b + (a + b)

a + b + a + b

a + a * (b – c) + (b – c) * d

Page 27: Lecture 21 22

The Value-Number Method for Constructing DAG’s

Often, the nodes of a syntax tree or DAG are stored in an array of records

Each row of the array represents one record, and therefore one node

Consider the figure on the next slide, which shows a DAG along with an array for the expression i = i + 10

Page 28: Lecture 21 22

The Value-Number Method for Constructing DAG’s (Continue…)

In the following figure, leaves have one additional field, which holds the lexical value, and interior nodes have two additional fields indicating the left and right children

Page 29: Lecture 21 22

The Value-Number Method for Constructing DAG’s (Continue…)

In the array, we refer to nodes by giving the integer index of the record for that node within the array

This integer is called the value number for the node or for the expression represented by the node
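The array-of-records idea can be sketched in C as below. Before a new record is appended, the array is searched for an identical one, so a repeated leaf or subexpression gets the same value number. The names Dag, dagLeaf, and dagOp, the fixed capacity, and the linear search are all assumptions of this sketch (a hash table is the usual way to speed up the search), and value numbers here start at 0 because C arrays do.

```c
#include <string.h>

#define MAX_NODES 64

typedef enum { N_ID, N_NUM, N_OP } NodeKind;

typedef struct {
    NodeKind kind;
    char     op;          /* N_OP: operator character, e.g. '+' or '=' */
    char     lexeme[16];  /* N_ID, N_NUM: lexical value of the leaf */
    int      left, right; /* N_OP: value numbers of the children */
} Node;

typedef struct { Node nodes[MAX_NODES]; int count; } Dag;

/* Return the value number of a leaf, reusing an existing record.
   Lexemes are assumed to fit in 15 characters. Returns -1 when full. */
int dagLeaf(Dag *d, NodeKind kind, const char *lexeme)
{
    for (int i = 0; i < d->count; i++)
        if (d->nodes[i].kind == kind &&
            strcmp(d->nodes[i].lexeme, lexeme) == 0)
            return i;
    if (d->count == MAX_NODES) return -1;
    Node *n = &d->nodes[d->count];
    n->kind = kind;
    n->op = 0;
    n->left = n->right = -1;
    strncpy(n->lexeme, lexeme, sizeof n->lexeme - 1);
    n->lexeme[sizeof n->lexeme - 1] = '\0';
    return d->count++;
}

/* Return the value number of <op, left, right>, reusing a record if
   the same operator was already applied to the same value numbers. */
int dagOp(Dag *d, char op, int left, int right)
{
    for (int i = 0; i < d->count; i++)
        if (d->nodes[i].kind == N_OP && d->nodes[i].op == op &&
            d->nodes[i].left == left && d->nodes[i].right == right)
            return i;
    if (d->count == MAX_NODES) return -1;
    Node *n = &d->nodes[d->count];
    n->kind = N_OP;
    n->op = op;
    n->lexeme[0] = '\0';
    n->left = left;
    n->right = right;
    return d->count++;
}
```

For i = i + 10, the two occurrences of i map to the same record, so the array ends up with four rows: the leaf i, the leaf 10, the + node, and the = node.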


Page 30: Lecture 21 22

Three-Address Code

In three-address code, there is at most one operation on the right side of an instruction

An expression like x+y*z might be translated into the following sequence of three-address instructions

t1 = y*z

t2 = x+t1

t1 and t2 are compiler generated temporary names

The use of names for intermediate values computed by a program allows three-address code to be rearranged easily
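One way to produce such a sequence is a post-order walk of the expression tree that creates a fresh temporary for every operator node. The C sketch below is an assumption-laden illustration, not the lecture's algorithm: the Expr and Emitter types and the genExpr function are hypothetical, and instructions are collected in a fixed-size text buffer for simplicity.

```c
#include <stdio.h>
#include <string.h>

typedef struct Expr {
    char         op;       /* operator such as '+' or '*'; 0 for a leaf */
    const char  *name;     /* leaf: variable name */
    struct Expr *left, *right;
} Expr;

typedef struct {
    char code[256];  /* emitted instructions, one per line */
    int  temps;      /* number of temporaries created so far */
} Emitter;

/* Emit code for e and write the address that holds its value into
   place (a buffer of n bytes). */
void genExpr(Emitter *em, const Expr *e, char *place, size_t n)
{
    if (e->op == 0) {                  /* a name is its own address */
        snprintf(place, n, "%s", e->name);
        return;
    }
    char l[16], r[16];
    genExpr(em, e->left, l, sizeof l);   /* operands first (post-order) */
    genExpr(em, e->right, r, sizeof r);
    snprintf(place, n, "t%d", ++em->temps);
    size_t used = strlen(em->code);
    snprintf(em->code + used, sizeof em->code - used,
             "%s = %s%c%s\n", place, l, e->op, r);
}
```

Applied to the tree for x+y*z, the walk visits y*z before the addition, so the multiplication receives t1 and the addition t2, matching the sequence above.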


Page 31: Lecture 21 22

Three-Address Code (Continue…)

Exercise: Represent the following DAG as a sequence of three-address code instructions

Page 32: Lecture 21 22

Addresses and Instructions

Three-address code is built from two concepts: addresses and instructions

In object-oriented terms, these concepts correspond to classes, and the various kinds of addresses and instructions correspond to appropriate subclasses

Alternatively, three-address code can be implemented using records with fields for the addresses

These records are called quadruples and triples

Page 33: Lecture 21 22

Addresses and Instructions (Continue…)

In the three-address code scheme, an address can be one of the following

A name: The names that appear in the source program. In an implementation, a source name is replaced by a pointer to its symbol table entry, where all the information about the name is kept

A constant: In practice, a compiler must deal with many different types of constants and variables

A compiler-generated temporary: It is useful, especially in optimizing compilers, to create a distinct name each time a temporary is needed

Page 34: Lecture 21 22

Addresses and Instructions (Continue…)

A few examples of three-address code instructions are listed below:

Assignment instructions of the form x = y op z

Assignments of the form x = op y

Copy instructions of the form x = y

An unconditional jump goto L

Conditional jumps of the form if x goto L

Indexed copy instructions of the form x = y[z] or y[z] = x, etc.

Page 35: Lecture 21 22

Addresses and Instructions (Continue…)

Consider the following statement and its three-address code in the figures:

do
    i = i+1;
while( a[i]<v );
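The figures themselves are not part of this transcript. As a stand-in, the C sketch below rewrites the loop in a three-address style, with one operation per statement, a label, and a conditional jump. The function name scanForward and the temporaries t1 and t2 are assumptions for illustration. A lower-level three-address version would typically also compute the byte offset of a[i] explicitly, which C's indexing hides here.

```c
/* Three-address style version of: do i = i+1; while (a[i] < v);
   Returns the final value of i. The caller must guarantee that some
   element after position i is not less than v. */
int scanForward(const int a[], int i, int v)
{
    int t1, t2;
L:
    t1 = i + 1;          /* assignment of the form x = y op z */
    i  = t1;             /* copy instruction x = y */
    t2 = a[i];           /* indexed copy x = y[z] */
    if (t2 < v) goto L;  /* conditional jump back to the loop head */
    return i;
}
```

Each statement in the body corresponds to one of the instruction forms listed earlier, which is what makes the control flow explicit.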


Page 36: Lecture 21 22

Quadruples & Triples

The description of three-address instructions specifies the components of each type of instruction, but it does not specify the representation of these instructions in a data structure

In a compiler, these instructions can be implemented as objects or as records with fields for the operator and the operands

Three such representations are called “quadruples”, “triples”, and “indirect triples”


Page 37: Lecture 21 22

Quadruples

A quadruple, or just “quad”, has four fields, which we call op, arg1, arg2, and result

In x=y+z, ‘+’ is op, y and z are arg1 and arg2, whereas x is result

The following are some exceptions to this rule:

Instructions with unary operators like x = minus y or x = y do not use arg2

Operators like param use neither arg2 nor result

Conditional and unconditional jumps put the target label in result

Page 38: Lecture 21 22

Quadruples (Continue…)

Example: Three-address code for the assignment a = b*-c+b*-c is shown below
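The figure for this example is not included in the transcript. The C sketch below shows one plausible quadruple table for a = b*-c+b*-c, assuming that each unary minus and each multiplication gets its own temporary (no common-subexpression sharing). The Quad structure, the use of an empty string for an unused field, and formatQuad are assumptions of this illustration.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    const char *op;      /* operator */
    const char *arg1;    /* first operand */
    const char *arg2;    /* second operand, "" when unused */
    const char *result;  /* destination name */
} Quad;

/* Quadruples for a = b * -c + b * -c (temporaries t1..t5 assumed). */
static const Quad quads[] = {
    { "minus", "c",  "",   "t1" },
    { "*",     "b",  "t1", "t2" },
    { "minus", "c",  "",   "t3" },
    { "*",     "b",  "t3", "t4" },
    { "+",     "t2", "t4", "t5" },
    { "=",     "t5", "",   "a"  },
};
static const size_t quadCount = sizeof quads / sizeof quads[0];

/* Format quad q as a three-address instruction into buf. */
int formatQuad(const Quad *q, char *buf, size_t n)
{
    if (strcmp(q->op, "=") == 0)               /* copy: x = y */
        return snprintf(buf, n, "%s = %s", q->result, q->arg1);
    if (q->arg2[0] == '\0')                    /* unary: x = op y */
        return snprintf(buf, n, "%s = %s %s", q->result, q->op, q->arg1);
    return snprintf(buf, n, "%s = %s %s %s",   /* binary: x = y op z */
                    q->result, q->arg1, q->op, q->arg2);
}
```

Note how the unary minus and the final copy leave arg2 empty, in line with the exceptions described for quadruples.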


Page 39: Lecture 21 22

Triples

A triple has only three fields which we call op, arg1, and arg2

In the earlier example, we saw that the result field is used primarily for temporary names

Using triples, we refer to the result of an operation x op y by its position rather than by an explicit temporary name

Consider the figure on the next slide for details

Page 40: Lecture 21 22

Triples (Continue…)

Example: Three-address code using Triples
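Since the figure is missing from the transcript, here is a C sketch of how the same assignment a = b*-c+b*-c might look as triples. Operands that were temporaries in the quadruple form become positions of earlier triples. The Arg and Triple types, the helper macros, and triplesWellFormed are assumptions made for this illustration.

```c
#include <stddef.h>

typedef enum { A_NONE, A_NAME, A_POS } ArgKind;

typedef struct {
    ArgKind     kind;
    const char *name;  /* A_NAME: source name */
    int         pos;   /* A_POS: position of the triple computing the value */
} Arg;

typedef struct { const char *op; Arg arg1, arg2; } Triple;

#define NAME(s) { A_NAME, (s), -1 }
#define POS(i)  { A_POS, NULL, (i) }
#define NONE    { A_NONE, NULL, -1 }

/* Triples for a = b * -c + b * -c; no result field is needed. */
static const Triple triples[] = {
    /* 0 */ { "minus", NAME("c"), NONE   },
    /* 1 */ { "*",     NAME("b"), POS(0) },
    /* 2 */ { "minus", NAME("c"), NONE   },
    /* 3 */ { "*",     NAME("b"), POS(2) },
    /* 4 */ { "+",     POS(1),    POS(3) },
    /* 5 */ { "=",     NAME("a"), POS(4) },
};

/* Sanity check: a positional operand must refer to an earlier triple. */
int triplesWellFormed(const Triple *t, int n)
{
    for (int i = 0; i < n; i++) {
        if (t[i].arg1.kind == A_POS &&
            (t[i].arg1.pos < 0 || t[i].arg1.pos >= i)) return 0;
        if (t[i].arg2.kind == A_POS &&
            (t[i].arg2.pos < 0 || t[i].arg2.pos >= i)) return 0;
    }
    return 1;
}
```

Because operands refer to positions, moving a triple during optimization means updating every reference to it, which is the motivation for indirect triples.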


Page 41: Lecture 21 22

Static Single-Assignment Form

The Static Single-Assignment Form (SSA) is an intermediate representation that facilitates certain code optimizations

Two aspects distinguish SSA from three-address code

All assignments in SSA are to variables with distinct names

SSA uses a notational convention called the Φ-function to combine two definitions of the same variable

For example, the code

if( flag ) x = -1; else x = 1;
y = x + a

is written in SSA form as

if( flag ) x1 = -1; else x2 = 1;
x3 = Φ(x1,x2)
y = x3 + a

Page 42: Lecture 21 22


Summary

Any Questions?