HRA Project Report


B.TECH PROJECT REPORT

on

HEAP REFERENCE ANALYSIS AND ITS IMPLEMENTATION IN GCC

SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE AWARD OF

BACHELOR OF TECHNOLOGY IN COMPUTER SCIENCE AND ENGINEERING

Submitted by: Pratik Patre

Niranjan Viladkar

Waman Virgaonkar

Under the guidance of

Dr. C. S. Moghe

Professor, Computer Science and Engineering, VNIT

&

Dr. U. P. Khedker

Professor, Computer Science and Engineering, IITB

Visvesvaraya National Institute of Technology, Nagpur

2010-2011



Visvesvaraya National Institute of Technology, Nagpur

2010-2011

CERTIFICATE

This is to certify that the project work entitled HEAP REFERENCE ANALYSIS AND ITS

IMPLEMENTATION IN GCC, is a bonafide work written by Mr. Pratik Patre, Mr. Niranjan

Viladkar and Mr. Waman Virgaonkar in the Electronics and Computer Science Engineering

Department, Visvesvaraya National Institute of Technology, Nagpur, in partial fulfilment of the

requirements for the award of the degree of Bachelor of Technology in Computer Science and

Engineering.

Dr. C. S. Moghe
Professor,
Electronics and Computer Science Engineering,
VNIT, Nagpur

Dr. K. D. Kulat
Head of Department,
Electronics and Computer Science Engineering,
VNIT, Nagpur



ACKNOWLEDGEMENTS

We take this opportunity to acknowledge, with a deep sense of gratitude, our project guides Dr. C. S. Moghe, Professor, Department of Electronics and Computer Science Engineering, VNIT Nagpur, and Dr. U. P. Khedker, Professor, Department of Computer Science, IITB, for their invaluable guidance, motivation, and support, which led to the successful completion of this project.

We also take this opportunity to pay our sincere thanks to Dr. K. D. Kulat, Head of Department,

Department of Electronics and Computer Science Engineering, VNIT, Nagpur, for providing the

requisite facilities needed to complete the project. We would also like to thank all the teaching

and non-teaching staff for supporting us.



ABSTRACT

Garbage in a program is defined to be unused data. However, current garbage collectors approximate it as unreachable data. This is due to the lack of effective analysis techniques for heap data: current data flow analysis techniques are mature for static data but not for heap data, which makes them difficult to apply to heap references. In this project we put forth a data flow analysis technique for heap references.

Our technique for collecting garbage is based on liveness analysis, which approximates unused data very closely. The analysis uses access graphs as data flow information; these capture the pattern of heap reference accesses. Since access graphs are bounded and the operations defined on them are monotonic, we can use the data flow analysis framework and all its standard results.



Table of Contents

1. Introduction
1.1. Motivation
1.2. The solution
1.3. Related work
1.4. Challenges
1.5. Contributions
1.6. Organization of the report
2. Data Flow Analysis
2.1. Program analysis
2.2. Data flow analysis abstraction
2.3. Data flow analysis schema
3. Explicit Liveness Analysis of Heap
3.1. Program to be analysed
3.2. Capturing liveness of heap
3.3. Capturing liveness using access paths
3.4. Capturing liveness using access graphs
3.5. Other analyses
3.6. Implementation in GCC
4. Overview of GCC
4.1. Intermediate representation
4.2. GCC Pass
4.3. Adding a GIMPLE interprocedural pass
4.4. Building a compiler from GCC
5. Pass Details
5.1. General outline
5.2. Visiting each statement
5.3. Identifying assignment statements
5.4. Identifying pointer type statements
5.5. Generate access path set
6. Access Graph Library
6.1. Files
6.2. Formal definitions of the data structures
6.3. The data structures
6.4. Operations on access graphs
7. Implementation of Explicit Liveness Analysis in GCC
7.1. The main function
7.2. Explicit liveness analysis
8. Conclusion
9. References



Table of Figures

Figure 1-1: Motivating Example of HRA
Figure 2-1: A code to illustrate DFA
Figure 2-2: General algorithm for DFA
Figure 3-1: Capturing live objects on the heap
Figure 3-2: Computation of ELIn and ELOut
Figure 3-3: Flow functions for liveness
Figure 3-4: Unbounded access path example
Figure 3-5: Set of access paths represented using access graphs
Figure 3-6: Summarization in access graphs
Figure 3-7: Liveness capturing equations for assignment statement
Figure 3-8: Liveness capturing equations for function call statement
Figure 3-9: Liveness capturing equations for return statement
Figure 3-10: Liveness capturing equations for use statement
Figure 3-11: Computation of ELIn for section 3.4.3
Figure 3-12: ELIn and ELOut definitions
Figure 3-13: Solution to Figure 1-1
Figure 6-1: Examples of operations on access graphs
Figure 7-1: Main data structure
Figure 7-2: Data structure for liveness analysis
Figure 7-3: General Algorithm
Figure 7-4: Computation of ELIn
Figure 7-5: Computation of LDirect
Figure 7-6: Calculation of EKillPath
Figure 7-7: Calculation of LTransfer



1. Introduction

Program analysis techniques, especially data flow analysis techniques, are employed to find various properties of the data used in a program. This summarization of the properties of data has enabled us to perform validation, verification and various optimizations on a program. These techniques have matured significantly over time for static data, i.e. data allocated on the stack and in the static area. However, the analysis of data allocated on the heap has not reached the same level of maturity.

Garbage is unused data in a program; it causes memory leaks and is mainly present on the heap. The current inability to analyse heap data has prevented efficient garbage collection. Taking this problem as our main motivation, we develop a technique for the analysis of heap data [5] to solve the problem of garbage collection. We also implement the analysis in GCC to obtain a working model of the analysis.

1.1. Motivation

Data is allocated on the stack or on the heap. Data allocated on the stack has a fixed size and a fixed lifetime, determined by function scope or block scope. This fixed lifetime makes it easy to allocate and de-allocate stack data. Allocating data on the heap gives us the flexibility of variable size and variable lifetime. However, the variable lifetime of heap data makes the question of when to de-allocate it a difficult one.

Traditionally, liveness of heap data has been approximated by reachability: heap data that is unreachable is considered garbage and de-allocated. However, what if some data on the heap is reachable but never used after a certain program point? That heap data should also be treated as garbage and de-allocated. Current analysis techniques are not powerful enough to find such data. Solving this problem and implementing the solution in GCC is our main motivation.

1.2. The solution

We perform static analysis of the program, extracting properties of heap data accesses, and find the data that is unused beyond each program point. We set all references to this heap data to null. The data is then unreachable and will be collected by conventional garbage collectors. This is known as the Cedar Mesa folk wisdom. It is done by analysing four properties of heap references: explicit liveness, aliasing, availability and anticipability. In accordance with these analyses, null assignments are decided upon and checked for safety and profitability.

However, we limit ourselves to explicit liveness analysis in this project. We implement explicit liveness analysis in GCC as an implementation of our approach. GCC is a widely used compiler that supports many front-end languages and back-end machines, and it provides a good API for inspecting and manipulating the program being compiled. Since the runtime environment of a C program does not guarantee a garbage collector, we have to explicitly free the memory when all aliases to an object are nullified.
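As a rough illustration of this idea in C, the sketch below nullifies a link that is no longer used and frees the objects behind it. The node type and the helper names make_node and drop_tail are hypothetical and are not part of the analysis. The sketch also assumes that no other alias to the freed nodes exists, which is exactly the condition the analysis would have to establish.

```c
#include <stdlib.h>

/* Hypothetical list node, used only for this illustration. */
struct node {
    int data;
    struct node *next;
};

/* Allocate a node; returns NULL if the allocation fails. */
struct node *make_node(int data, struct node *next) {
    struct node *n = malloc(sizeof *n);
    if (n != NULL) {
        n->data = data;
        n->next = next;
    }
    return n;
}

/* Suppose the program never reads head->next again after this point.
   The link is set to null and, since a C runtime has no garbage
   collector, the nodes that became unreachable are freed explicitly.
   Assumes no other alias to those nodes exists.
   Returns the number of nodes freed. */
int drop_tail(struct node *head) {
    int freed = 0;
    struct node *cur = head->next;
    head->next = NULL;              /* the null assignment */
    while (cur != NULL) {
        struct node *dead = cur;
        cur = cur->next;
        free(dead);                 /* explicit de-allocation */
        freed++;
    }
    return freed;
}
```

A conventional collector would have kept the whole list alive for as long as head is reachable; after drop_tail only the first node remains.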

1.2.1. Illustrative Example

We present an example to illustrate our approach. Figure 1-1(a) shows a program operating on the heap. Figure 1-1(b) shows the memory graph. Root variables are on the stack and the objects they refer to are on the heap. The heap is represented as a directed graph with entry nodes on the stack, objects represented as nodes, and links (i.e. references) represented as directed edges. Before the execution of line 5, w always refers to ma, as represented by the solid edge. Depending on whether the while loop executes zero times, once, twice or thrice, x refers to ma, mb, mc or md, as represented by the dashed edges. Similarly, y refers to mi, mf, mg or me. mk is an unreachable object, while the variable z does not refer to the heap and is ignored.

A conventional copying collector will preserve all nodes except mk. However, only a few of them are used beyond line 5. The modified program makes the unused nodes unreachable by nullifying the relevant links. The modifications are general enough to nullify the appropriate links for any number of iterations of the loop. Observe that a null assignment has also been inserted within the loop body, thereby making some memory unreachable in each iteration of the loop.



Figure 1-1: Motivating Example of HRA

Courtesy: [5]

1.3. Related work

The theoretical basis of our work, which includes the heap reference analysis schema and proofs of correctness of the analysis, was developed by Khedker et al. [5].

1.4. Challenges

A program accesses data through expressions that have l-values, called access expressions. They can be scalar variables such as x or reference expressions such as x.lptr.rptr. A program analysis reasons about data and hence needs to know the binding of an access expression to data, i.e. it must answer the question: what are the different bindings of an access expression to any object o on the heap at a program point p along different possible program paths? The precision of the analysis depends on the precision of the answer to this question.

When the access expressions are simple and correspond to static data, answering the above question is often easy because the mapping of access expressions to l-values remains fixed in a given scope throughout the execution of a program. However, in the case of reference expressions, the mapping between an access expression and its l-value is likely to change during execution. Observe that manipulation of the heap is nothing but changing the mapping between reference expressions and their l-values. For example, in Figure 1-1, the access expression x.lptr refers to mi when the execution reaches line 2 and may refer to mi, mf, mg or me at line 4. This implies that, subject to type compatibility, any access expression can correspond to any heap data, making it difficult to answer the question mentioned above. All of this makes the analysis of programs involving heaps difficult.

1.5. Contributions

This project would be the first complete implementation of heap reference analysis in GCC. It contributes both to heap reference analysis, by providing its first implementation, and to GCC, an open source compiler, by adding this analysis to it.

1.6. Organization of the report

Chapter 2 discusses data flow analysis techniques in general. Chapter 3 uses these techniques for explicit liveness analysis of the heap. Chapter 4 gives an overview of GCC. Chapter 5 describes the interfacing with GCC. Chapter 6 covers the implementation of the access graph and its associated operations. Chapter 7 describes the implementation of explicit liveness analysis of the heap.



2. Data Flow Analysis

Data flow analysis¹ is an important technique for program analysis. It is a technique for

gathering information about the flow of data regarding a particular property at various points

in a computer program. The information gathered is often used for validating a program or by

compilers when optimizing a program.

2.1. Program analysis

Program analysis techniques analyze a particular program with respect to some property.

Program analyses cover a large spectrum of motivations, basic principles, and methods.

Different approaches to program analysis differ in details but at a conceptual level, almost all

program analyses are characterized by some common properties. Although these properties

are abstract, they provide useful insights about a particular analysis. A deeper understanding of

the analysis would require exploring many more analysis-specific details.

Program analysis can be used to determine the validity of a program, to understand the behaviour of a program or to transform and optimize a program. Some common paradigms of program analysis are inference systems, constraint resolution systems, model checking and abstract interpretation. Data flow analysis is a program analysis technique based on constraint resolution.

2.2. Data flow analysis abstraction

Data flow analysis statically computes information about the flow of data (i.e., uses and

definitions of data) for each program point in the program being analyzed. This information is

required to be a safe approximation of the desired properties of the run time behaviour of the

program during each possible execution of that program point on all possible inputs.

The state of a program at a particular time may be regarded as consisting of the values of its data objects. The execution of a program can be viewed as a series of transformations of the program state. Each execution of an intermediate-code statement transforms an input state to a new output state. The input state is associated with the program point before the statement and the output state is associated with the program point after the statement.

¹ Based on [1] and [4].

When we analyze the behaviour of a program, we must consider all the possible sequences of

program points i.e. paths through a flow graph that the program execution can take. We then

extract, from the possible program states at each point, the information we need for the

particular data-flow analysis problem we want to solve. In general, there is an infinite number of

possible execution paths through a program, and there is no finite upper bound on the length

of an execution path. Program analyses summarize all the possible program states that can

occur at a point in the program with a finite set of facts. Different analyses may choose to

abstract out different information, and in general, no analysis is necessarily a perfect

representation of the state.

Illustration:

Consider the program given below.

1: a = 5;
2: while (is_stop()) {
3:     a = 13;
4: }
5: if ( a == 13 )
6:     b = a;
7: else
8:     b = 9;
9: return b;

Figure 2-1: A code to illustrate DFA

What values can a have at program point 5? Answering this question seems difficult because there is an infinite number of execution paths reaching program point 5. However, in data-flow analysis we do not distinguish among the paths taken to reach a program point. Moreover, we do not keep track of entire states; rather, we abstract out certain details, keeping only the data we need for the purpose of the analysis. Summarizing all program states at program point 5, a can have the values {5, 13}. Different data flow analyses collect different information: reaching definitions analysis says that the definition set {1, 3} reaches point 5, while constant folding detects that a cannot be treated as a constant at point 5.



2.3. Data flow analysis schema

In each application of data-flow analysis, we associate with every program point a data-flow value that represents an abstraction of the set of all possible program states that can be observed at that point. We denote the data-flow values before and after each statement s by IN[s] and OUT[s], respectively. The data-flow problem is to find a solution to a set of constraints on the IN[s]'s and OUT[s]'s, for all statements s. There are two sets of constraints: those based on the semantics of the statements (transfer functions) and those based on the flow of control.

The transfer function depends on the semantics of the statement and the analysis being performed. In a forward-flow problem, the transfer function $f_s$ for statement s converts a data-flow value before the statement to a new data-flow value after the statement. That is,

$$\mathrm{OUT}[s] = f_s(\mathrm{IN}[s]) \tag{2.1}$$

Conversely, in a backward-flow problem, the transfer function $f_s$ for statement s converts a data-flow value after the statement to a new data-flow value before the statement. That is,

$$\mathrm{IN}[s] = f_s(\mathrm{OUT}[s]) \tag{2.2}$$

Control flow constraints are derived from the flow of control, which is explicitly represented in the program flow graph. In a forward flow problem, the constraint flow function, where $\bigcup$ is the confluence function, is

$$\mathrm{IN}[s] = \bigcup_{p \,\in\, \mathrm{pred}(s)} \mathrm{OUT}[p] \tag{2.3}$$

In a backward flow problem, the constraint flow function is

$$\mathrm{OUT}[s] = \bigcup_{p \,\in\, \mathrm{succ}(s)} \mathrm{IN}[p] \tag{2.4}$$

Here pred(s) and succ(s) are the sets of predecessors and successors of s in the flow graph.

Illustration:

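Consider the program in Figure 2-1. While performing reaching definitions analysis of a, consider the transfer function of statement 3. Its IN set is the definition set {1, 3}: definition 1 arrives from the loop entry and definition 3 along the back edge of the loop. Statement 3 redefines a, so it kills definition 1 and generates definition 3, and the OUT set after the statement is {3}.

Now consider the constraint flow function at point 9 for the definitions of b. The program flow graph indicates two predecessors: 6 with OUT set {6} and 8 with OUT set {8}. The IN set of 9 is the union of the sets at 6 and 8, which is {6, 8}.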



Unlike linear arithmetic equations, the data-flow equations usually do not have a unique

solution. Our goal is to find the most "precise" solution that satisfies the two sets of

constraints. That is, we need a solution that encourages valid code improvements, but does not

justify unsafe transformations.

The general method of solving the above constraints is to initialize the IN and OUT sets and then traverse the program either with or against the control flow, applying the equations. The program is traversed iteratively until no further changes are made to the IN and OUT sets. The general algorithm for a forward flow problem is,

Figure 2-2: General algorithm for DFA

OUT[entry] = {initialization};
for (each statement s other than entry)
    OUT[s] = {initialization};
while (changes to any OUT occur)
    for (each statement s other than entry) {
        IN[s] = U of OUT[p] over every predecessor p of s;
        OUT[s] = fs(IN[s]);
    }
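The algorithm of Figure 2-2 can be sketched in C for reaching definitions on the program of Figure 2-1. This is only a minimal sketch under some assumptions: every numbered line is treated as one flow graph node (line 4 carries the loop back edge and line 7 the else branch), a set of definitions is encoded as a bit mask, union is the confluence function, and all identifiers are made up for the illustration.

```c
#define N 10                  /* statements are numbered 1..9; index 0 is unused */
#define DEF(d) (1u << (d))    /* bit d stands for the definition at statement d */

/* pred[s] lists the predecessors of statement s, terminated by 0. */
static const int pred[N][3] = {
    {0}, {0}, {1, 4, 0}, {2, 0}, {3, 0}, {2, 0}, {5, 0}, {5, 0}, {7, 0}, {6, 8, 0}
};

/* gen_set[s]: definition created at s; kill_set[s]: other definitions of the
   same variable, which s overwrites. Statements 1 and 3 define a, 6 and 8 define b. */
static const unsigned gen_set[N]  = { 0, DEF(1), 0, DEF(3), 0, 0, DEF(6), 0, DEF(8), 0 };
static const unsigned kill_set[N] = { 0, DEF(3), 0, DEF(1), 0, 0, DEF(8), 0, DEF(6), 0 };

unsigned in_set[N], out_set[N];

/* Iterate the forward equations until no OUT set changes. */
void solve_reaching_definitions(void) {
    int changed = 1;
    for (int s = 0; s < N; s++) {
        in_set[s] = 0;
        out_set[s] = 0;
    }
    while (changed) {
        changed = 0;
        for (int s = 1; s < N; s++) {
            unsigned new_in = 0;
            unsigned new_out;
            /* Confluence: union of OUT over all predecessors. */
            for (int i = 0; i < 3 && pred[s][i] != 0; i++)
                new_in |= out_set[pred[s][i]];
            /* Transfer function: OUT = gen | (IN - kill). */
            new_out = gen_set[s] | (new_in & ~kill_set[s]);
            if (new_out != out_set[s])
                changed = 1;
            in_set[s] = new_in;
            out_set[s] = new_out;
        }
    }
}
```

After solve_reaching_definitions() returns, in_set[5] holds the bits of definitions 1 and 3, and the bits of definitions 6 and 8 are both set in in_set[9].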



3. Explicit Liveness Analysis of Heap

The method is based on liveness of links for a particular object. The links which are used

beyond a program point are live while those not used are dead and can be set to null. Here we

develop a method for liveness analysis of heap data. We define liveness of heap references,

devise a bounded representation called an access graph for liveness, and then propose a data

flow analysis for discovering liveness. The method is flow sensitive but context insensitive since

we take into account flow of control but approximate interprocedural information.

3.1. Program to be analysed

The analysis is context insensitive so we would not maintain a call graph and work on program

flow graph. The program flow graph has a unique Entry and a unique Exit node. Each

statement forms a basic block. All complex statements are broken down and all the resulting

simple statements fall into following categories:

Assignment Statements: These are assignments to references and are denoted by α_x = α_y, where the frontiers of α_x and α_y are references. Only these statements can modify the structure of the heap.

Function Calls: These are function call statements that involve access expressions in their arguments, such as x = f(y, z, ...).

Use Statements: These statements use heap references to access heap data but do not modify heap references. They contain access expressions whose frontiers are not references, such as x.data = y.data + z.data.

Return Statement: These are return statements involving an access expression, such as return x.

Other Statements: These include all statements that do not refer to the heap. We ignore them since they do not influence heap reference analysis.

3.2. Capturing liveness of heap

Capturing liveness of the heap at a program point p means finding all objects that can be accessed in the program after p. Links are the way to access an object on the heap. Thus, if we capture the links used after program point p, we capture the live objects, since an object is live if at least one link to it is live. A link l can be used in two different ways: it may be dereferenced to access an object, or tested in a comparison. An erroneous nullification of l would affect the two uses in different ways: dereferencing l would result in an exception being raised, whereas testing l in a comparison may alter the result of the condition and thereby the execution path. Links are accessed in a program using access expressions, as these contain heap references. Thus, by considering the access expressions used after program point p, we can capture the live links and thereby the live objects on the heap.

Illustration:

Consider the program in which root is a binary tree with left and right as its children:

1: binary_tree root;
2: root = set_binary_tree();
3: aliased_root = root;
4:
5: return root.left.data;

Figure 3-1: Capturing live objects on the heap

At program point 4, what is the liveness of the heap? The access expression root.left.data is used in statement 5, hence the link between root and left (denoted as root→left) in the memory graph is live. Thus the left child of the binary tree root is live, and since the right child does not have any live link, it is dead.

We now need to capture the liveness of links in a memory graph, which we do using access paths. Access paths denote links in a memory graph. The next section describes the approach in detail.

3.3. Capturing liveness using access paths

3.3.1. Access paths

As discussed above, in order to discover liveness and other properties of the heap, we need a way of naming links in the memory graph. We do it using access paths. An access path is a root variable name followed by a sequence of zero or more field names and is denoted by ρ_x ≡ x→f1→f2→...→fk. Since an access path represents a path in a memory graph, it can be used for naming links and nodes. An access path consisting of just a root variable name is called a simple access path; it represents a path consisting of a single link corresponding to the root variable. E denotes the empty access path. The last field name in an access path ρ is called its frontier and is denoted by Frontier(ρ). The frontier of a simple access path is the root variable name. The access path corresponding to the longest sequence of names in ρ excluding its frontier is called its base and is denoted by Base(ρ). The base of a simple access path is the empty access path. The object reached by traversing an access path ρ is called the target of the access path and is denoted by Target(ρ). When we use an access path ρ to refer to a link in a memory graph, it denotes the last link in ρ, that is, the link corresponding to Frontier(ρ).

Illustration:

Consider Figure 3-1. For the access path ρ ≡ root→left at program point 3, Base(ρ) is root, Frontier(ρ) denotes the link root→left, and Target(ρ) is the left child of root.

As explained earlier, Figure 1-1(b) is the superimposition of the memory graphs that can result before line 5 for different executions of the program. For the access path ρ_x ≡ x→lptr→lptr, depending on whether the while loop is executed 0, 1, 2 or 3 times, Target(ρ_x) denotes node mj, mh, mm or ml. Frontier(ρ_x) denotes one of the links mi→mj, mf→mh, mg→mm or me→ml. Base(ρ_x) represents one of the following paths in the heap memory: x→ma→mi, x→mb→mf, x→mc→mg or x→md→me.

In the rest of the report, α denotes an access expression, ρ denotes an access path and σ denotes a (possibly empty) sequence of field names separated by →. Let the access expression α_x be x.f1.f2...fn. Then the corresponding access path ρ_x is x→f1→f2→...→fn. When the root variable name is not required, we drop the subscripts from α_x and ρ_x.
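The definitions of frontier and base can be made concrete with a small C sketch. The representation below (a root name plus an array of field names, with a NULL root standing for the empty access path E) and the names access_path, frontier and base are assumptions made for illustration; they are not the data structures of the access graph library described later in the report.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FIELDS 8   /* illustrative bound, not taken from the report */

/* An access path root -> fields[0] -> ... -> fields[len - 1]. */
struct access_path {
    const char *root;                 /* NULL encodes the empty access path E */
    const char *fields[MAX_FIELDS];
    size_t len;                       /* number of field names */
};

/* Frontier: the last field name, or the root variable name
   for a simple access path. Returns NULL for E. */
const char *frontier(const struct access_path *p) {
    return p->len == 0 ? p->root : p->fields[p->len - 1];
}

/* Base: the path without its frontier. The base of a simple
   access path is the empty access path E. */
struct access_path base(const struct access_path *p) {
    struct access_path b = *p;
    assert(p->root != NULL);          /* the base of E is not defined */
    if (b.len == 0)
        b.root = NULL;                /* simple path: base is E */
    else
        b.fields[--b.len] = NULL;     /* drop the last field name */
    return b;
}
```

For the path x→lptr→lptr, frontier yields lptr and base yields x→lptr; applying base twice more yields x and then E.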

3.3.2. Liveness of access paths

Now we need to define liveness of access paths. For a link l to be live, there must be at least one access path from some root variable to l such that every link in this path is live. This is the path that is actually traversed while using l. An access path is defined to be live at p if the link corresponding to its frontier is live along some path starting at p. Safety of null assignments requires that the access paths which are live are excluded from nullification.

We initially limit ourselves to a subset of live access paths whose liveness can be determined without taking into account the aliases created before p. These access paths are live solely because of the execution of the program beyond p. We call access paths that are live in this sense explicitly live access paths. An interesting property of explicitly live access paths is that they form the minimal set covering every live link.

Illustration:

Consider the program in Figure 3-1. At program point 4, the left child of root is accessed later and is hence live. The access path used in the program is root→left, and hence it is live. Even though the access path aliased_root→left is not used after statement 4, its frontier link, i.e. the link between the object pointed to by root and its left child, is live. We say that root→left is explicitly live since all its links are actually used in the program. In contrast, aliased_root→left is not explicitly live; we also notice that the link aliased_root (from the variable aliased_root on the stack to the root object on the heap) is never used.

We would now focus on developing a data flow analysis technique based on capturing liveness

using access paths.

3.3.3. Using access paths to capture liveness

We now look at how statement semantics affect the liveness of access paths, and thus derive flow constraints in the form of flow functions. Liveness analysis is a backward flow analysis. Here ELIn denotes the set of access paths that are explicitly live before a statement and ELOut denotes the set of access paths that are explicitly live after it. A statement can affect these access path sets in the following ways.

Let us see the effect through an illustration.

Illustration:

Consider the program fragment,

    1:
    2: x.lptr.rptr = y.rptr.lptr;
    3: print (x.lptr.rptr.lptr.data);

Figure 3-2: Computation of ELIn and ELOut

The ELOut of statement 2 above is {x→lptr→rptr→lptr}. Consider the effects of statement 2:

x→lptr→rptr is being modified, rendering the value it held before the statement useless. Hence access paths with prefix x→lptr→rptr cease to be live before the statement. Such access paths are referred to as killed access paths. In this case the killed set is {x→lptr→rptr→lptr}.

Objects with access paths x→lptr and y→rptr are directly accessed. These access paths become live. Such access paths are referred to as directly generated access paths.

Here y→rptr→lptr is being assigned to x→lptr→rptr. Thus the objects accessed using x→lptr→rptr→{some_path} after the statement must be accessible using y→rptr→lptr→{some_path} before the assignment. Such access paths are referred to as transferred access paths. Thus the transferred access paths are {y→rptr→lptr→lptr}.

The final set of live access paths can be computed by removing the killed access paths from ELOut and adding the directly generated and transferred access paths.

Thus the final ELIn of statement 2 is {x→lptr, y→rptr→lptr→lptr}.

Formalizing the above observations,

Killed Access Paths: These are the access paths that cease to exist before the statement since

the access path was modified in the statement invalidating the previous value assigned to it.

Access paths that are live after the assignment and are not killed by it are also live before the assignment.

Directly Generated Access Paths: These are access paths directly used in a statement and hence

become live before a statement.

Transferred Access Paths: These are the access paths that get transferred from one access path

to another due to an assignment statement. This is to take into account the change in bindings

of an access expression.

Finally, the ELIn set is computed from the ELOut set as,

$$ELIn = (ELOut - \text{Killed access paths}) \cup (\text{Directly generated access paths} \cup \text{Transferred access paths}) \qquad (3.1)$$
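Equation (3.1) can be tried out on the illustration of Figure 3-2 with a small C sketch. It is only an approximation under stated assumptions: access paths are stored as plain strings such as "x->lptr->rptr", the names PathSet, set_add, has_prefix and el_in_assign are hypothetical and are not part of the pass, and only the base of each access expression is added as directly generated (the prefixes of the base are left implicit).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_PATHS 16
#define MAX_LEN   128

/* A small set of access paths, each stored as text such as "x->lptr->rptr". */
typedef struct {
    char paths[MAX_PATHS][MAX_LEN];
    int count;
} PathSet;

/* Returns 1 if `path` is already in the set. */
static int set_contains(const PathSet *s, const char *path) {
    for (int i = 0; i < s->count; i++)
        if (strcmp(s->paths[i], path) == 0) return 1;
    return 0;
}

/* Adds `path` unless it is already present. */
static void set_add(PathSet *s, const char *path) {
    if (set_contains(s, path)) return;
    assert(s->count < MAX_PATHS && strlen(path) < MAX_LEN);
    strcpy(s->paths[s->count++], path);
}

/* Returns 1 if `prefix` is a prefix of `path` that ends on a field boundary. */
static int has_prefix(const char *path, const char *prefix) {
    size_t n = strlen(prefix);
    if (strncmp(path, prefix, n) != 0) return 0;
    return path[n] == '\0' || strncmp(path + n, "->", 2) == 0;
}

/* ELIn for "lhs = rhs": (ELOut - killed) U directly generated U transferred. */
static PathSet el_in_assign(const PathSet *el_out, const char *lhs, const char *rhs,
                            const char *lhs_base, const char *rhs_base) {
    PathSet in = { .count = 0 };
    for (int i = 0; i < el_out->count; i++) {
        const char *p = el_out->paths[i];
        if (!has_prefix(p, lhs)) {
            set_add(&in, p);            /* not killed: stays live */
        } else {
            char moved[MAX_LEN];        /* killed, but its liveness is transferred to rhs */
            snprintf(moved, sizeof moved, "%s%s", rhs, p + strlen(lhs));
            set_add(&in, moved);
        }
    }
    set_add(&in, lhs_base);             /* directly generated */
    set_add(&in, rhs_base);
    return in;
}
```

For statement 2, calling el_in_assign with ELOut = {x->lptr->rptr->lptr}, lhs "x->lptr->rptr", rhs "y->rptr->lptr" and bases "x->lptr" and "y->rptr" yields x->lptr, y->rptr and y->rptr->lptr->lptr. The text lists only the first and the last, since y->rptr is a prefix of the last one.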

3.3.4. Liveness analysis schema

Now we define the liveness analysis schema using access paths. We also describe the control flow constraints on the data flow equations.

Explicit Liveness: The set of explicitly live access paths at a program point p, denoted by Liveness_p, is defined as follows:

$$Liveness_p = \bigcup_{\psi \in Paths(p)} PathLiveness_p^{\psi} \qquad (3.2)$$

where ψ ∈ Paths(p) is a control flow path from p to Exit and PathLiveness_p^ψ denotes the liveness at p along ψ.

Path Liveness: If p is not the program exit, let the statement that follows it be denoted by s and the program point immediately following s be denoted by p′. Then,

$$PathLiveness_p^{\psi} = \begin{cases} \emptyset & p \text{ is } Exit \\ StatementLiveness_s(PathLiveness_{p'}^{\psi}) & \text{otherwise} \end{cases} \qquad (3.3)$$

Statement Liveness: The flow function for s is defined as:

$$StatementLiveness_s(X) = (X - LKill_s) \cup LDirect_s \cup LTransfer_s(X) \qquad (3.4)$$

LKill_s denotes the set of access paths that cease to be live before statement s, LDirect_s denotes the set of access paths that become live due to the local effect of s, and LTransfer_s(X) denotes the set of access paths that become live before s due to the transfer of liveness from the access paths X that are live after s.

Illustration:

The flow functions are explained later in Section 3.4.3. They are defined as shown in Figure 3-3.

Figure 3-3: Flow functions for liveness

Courtesy: [5]

The definitions of LKill_s, LDirect_s and LTransfer_s(X) ensure that Liveness_p is prefix-closed, i.e. every prefix of a live access path is also live.



3.3.5. Difficulties

3.3.5.1. Unbounded access paths:

In the presence of loops, the length of access paths cannot be guaranteed to be bounded, and thus termination of the analysis cannot be guaranteed.

Illustration:

    1:
    2: while (is_stop()) {
    3: x = x.n;
    4: }
    5: print (x.ptr.data);

Figure 3-4: Unbounded access path example

During 1st iteration: ELOut at 3 is {x→ptr}, ELIn at 3 is {x→n→ptr}

During 2nd iteration: ELOut at 3 is {x→n→ptr}, ELIn at 3 is {x→n→n→ptr}

During nth iteration: ELOut at 3 is {x→n[n-1 times]→ptr}, ELIn at 3 is {x→n[n times]→ptr}

Hence a way to summarize access paths is needed.

3.3.5.2. Data Flow Equations

The data flow equations above define a meet-over-paths (MoP) solution, which ranges over all control flow paths and therefore cannot be computed directly when the program contains loops. Hence they are not suitable for data flow analysis. We need to define equations whose maximum fixed point (MFP) solution can be computed iteratively.

3.4. Capturing liveness using access graphs

In the presence of loops, the set of access paths may be infinite and the lengths of access paths may be unbounded. This problem is solved by representing a set of access paths by a graph of bounded size.

3.4.1. Access Graphs

A set of access paths can be represented using access graphs. An access graph, denoted by G_v, is a directed graph ⟨n0, N, E⟩ representing a set of access paths starting from a root variable v. N is the set of nodes, n0 ∈ N is the entry node with no in-edges and E is the set of edges. Every path in the graph represents an access path. The empty graph ℰ_G has no nodes or edges and does not accept any access path.



The entry node of an access graph is labelled with the name of the root variable, while the non-entry nodes are labelled with a unique label created as follows: if a field name f is referenced in basic block b, we create an access graph node with the label ⟨f, b, i⟩, where i is the instance number used for distinguishing multiple occurrences of the field name f in block b. (In the implementation, the label is ⟨f, b, s⟩, where s is the statement number in a basic block b created by GCC.) Note that this implies that nodes with the same label are treated as identical. Access paths of the form ρ_x→* are represented by including a summary node, denoted n*, with a self loop over it. It is distinct from all other nodes but matches the field name of any other node. A node in the access graph represents one or more links in the memory graph.

Illustration:

    1:
    2: x.lptr.rptr = y.rptr.lptr;
    3: print (x.lptr.rptr.lptr.data);
    4: print (y.rptr.obj1.data);

The live access paths at each program point are as follows (the figure also depicts each set as an access graph):

    Program point    Set of live access paths
    OUT set at 4     NULL
    IN set at 4      y→rptr→obj1
    OUT set at 3     y→rptr→obj1
    IN set at 3      y→rptr→obj1, x→lptr→rptr→lptr
    OUT set at 2     y→rptr→obj1, x→lptr→rptr→lptr
    IN set at 2      x→lptr, y→rptr→lptr→lptr, y→rptr→obj1

Figure 3-5: Set of access paths represented using access graphs

Access graphs solve the problem of infinite access paths by summarization. Summarization is achieved by merging appropriate nodes in an access graph while retaining all in-edges and out-edges of the merged nodes. The technique is illustrated below.

Illustration:

Consider the program flow graph shown,

Figure 3-6: Summarization in access graphs

Courtesy:[5]

Node n1 in access graph 1 indicates references of r at different execution instances of the

same program point. Every time this program point is visited during analysis, the same state is

reached in that the pattern of references after r1 is repeated. Thus all occurrences of r1 are

merged into a single state. This creates a cycle which captures the repeating pattern of

references.



In access graph 2, nodes r1 and r2 indicate references to n at different program points. Since the references made after these program points may be different, r1 and r2 are not merged.
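The merging of nodes with identical labels can be sketched in C as follows. This is a minimal illustration under assumptions, not the library code: the graph is a fixed-size adjacency matrix, nodes are identified purely by their label text, and the labels "n3" and "ptr5" are hypothetical stand-ins for the labels of field n at statement 3 and field ptr at statement 5 of the loop in Figure 3-4.

```c
#include <assert.h>
#include <string.h>

#define MAX_NODES 8

/* Tiny access graph: nodes are identified by label, edges by an adjacency matrix. */
typedef struct {
    const char *label[MAX_NODES];
    int edge[MAX_NODES][MAX_NODES];
    int count;
} Graph;

/* Returns the index of the node with this label, creating it if needed. */
static int node_for(Graph *g, const char *label) {
    for (int i = 0; i < g->count; i++)
        if (strcmp(g->label[i], label) == 0) return i;   /* same label: same node */
    assert(g->count < MAX_NODES);
    g->label[g->count] = label;
    return g->count++;
}

/* Adds the access path labels[0] -> labels[1] -> ... -> labels[n-1] to the graph. */
static void add_path(Graph *g, const char *labels[], int n) {
    int prev = node_for(g, labels[0]);
    for (int i = 1; i < n; i++) {
        int cur = node_for(g, labels[i]);
        g->edge[prev][cur] = 1;
        prev = cur;
    }
}

/* Adds x->n3->ptr5, x->n3->n3->ptr5, ... for the given number of loop
   iterations and returns the resulting number of nodes. */
static int summarize_loop(Graph *g, int unrollings) {
    const char *labels[32];
    assert(unrollings >= 1 && unrollings <= 30);
    for (int k = 1; k <= unrollings; k++) {
        labels[0] = "x";
        for (int i = 1; i <= k; i++) labels[i] = "n3";
        labels[k + 1] = "ptr5";
        add_path(g, labels, k + 2);
    }
    return g->count;
}
```

However many iterations are added, the graph keeps three nodes; the repeated occurrences of n3 collapse into one node with a self loop, which is the cycle that captures the repeating pattern of references.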

Some operations defined on access graphs are summarized below; the complete formal definitions of these and other graph functions are given in Chapter 6.

    G(ρ)                 Constructs the access graph corresponding to the access path ρ
    Path Removal (⊖)     G ⊖ ρ removes those access paths in G that have ρ as a prefix
    lastNode(G)          Returns the last node of a linear graph G
    Union (∪)            G ∪ G′ combines access graphs G and G′ such that any access path contained in G or G′ is contained in the resulting graph
    Factorization (/)    G′/(G, M) returns all remainder graphs in G′ starting from the nodes in G′ that correspond to M in G
    Extension (#)        (G, M) # R returns the graph G extended by the remainder graphs in R at the nodes in M

3.4.2. Liveness representation using access graphs

A set of access paths can be represented using access graphs. Every path in the graph represents an access path, and every access path present in an access graph is treated as live. Summarization may therefore add access paths that are not actually live; this approximation is safe because it only excludes more access paths from nullification.

3.4.3. Capturing liveness using access graphs

We now look at how statement semantics affect the liveness of access paths, and thus derive flow constraints in the form of flow functions. Liveness analysis is a backward flow analysis. The effect of a statement on the incoming access path set depends on its type, as explained below. Here ELIn denotes the set of access paths that are live at the entry of a statement and ELOut denotes the set of access paths that are live at its exit.

3.4.3.1. Assignment statement

An assignment statement will be of the form: α_x = α_y

We have seen how an assignment statement affects the liveness of heap objects in the illustration in Section 3.3.3. Now we will see how to capture these effects using access graphs.

Illustration:

Consider the program statement,

    5: x.left.right = y.right.left.right

The access path x→left→right gets modified. So we have to remove all access paths with x→left→right as prefix. Hence the killed access paths are {x→left→right}.

The access paths x→left and y→right→left are generated. Thus the base of each directly used access expression is generated.

Some access paths are to be transferred from x→left→right to y→right→left→right. The access paths in the access graph of x with prefix x→left→right have to be copied as remainder graphs using graph factorization and then attached to the access graph of y with prefix y→right→left→right using graph extension.

Figure 3-7: Liveness capturing equations for assignment statement

In the above equations, G_x and G_y denote G(ρ_x) and G(ρ_y), respectively, whereas M_x and M_y denote lastNode(G(ρ_x)) and lastNode(G(ρ_y)), respectively.

3.4.3.2. Function call

A function call will be of the form: α_x = f(α_y)

We conservatively assume that a function call may make any access path rooted at y or at any global reference variable live. Thus, this version of our analysis is context insensitive.

Illustration:

Consider the program statement, with global variable z,

    5: x.left.right = func (y.right);

The access path x→left→right gets modified. So we have to remove all access paths with x→left→right as prefix. Hence the killed access paths are {x→left→right}.

The access paths x→left and y get directly accessed and hence get directly generated.

The access path y→right is passed as a parameter to the function, and so any access path beyond y→right may be accessed. Thus we conservatively approximate the generated access paths as {y→right→n*}. Similarly, any access path rooted at a global variable may be accessed, and so we conservatively assume that the generated access paths are {z→n*}.


Figure 3-8: Liveness capturing equations for function call statement

3.4.3.3. Return statement

A return statement will be of the form: return α_x

Illustration:

    5: return x.left;

The access path x→left gets directly accessed and hence gets directly generated.

The access path x→left is passed as a return value to the calling function, and so any access path beyond x→left may be accessed. Thus we conservatively approximate the generated access paths as {x→left→n*}. Similarly, any access path rooted at a global variable z may be accessed, and so we conservatively assume that the generated access paths are {z→n*}.


Figure 3-9: Liveness capturing equations for return statement

3.4.3.4. Use statement

Illustration:


5: x.left.data = y.right.data + z.left.right.data;

The access paths x→left, y→right and z→left→right get directly accessed and hence get directly generated.


Figure 3-10: Liveness capturing equations for use statement

3.4.4. Liveness analysis schema revisited

Now we define the liveness analysis schema using access graphs. We also describe the control flow constraints on the data flow equations.

To compute the liveness ELIn due to a statement, we remove the killed access paths from ELOut and add the directly generated and transferred access paths. To compute ELOut of a statement, we merge the access paths present in the ELIn of its successors.

Illustration:

We now illustrate the computation of ELIn for the examples used to illustrate the effect of each statement type on access graphs.

[Table: OUT set and IN set access graphs for the illustrations in Sections 3.4.3.1 (assignment statement), 3.4.3.2 (function call), 3.4.3.3 (return statement) and 3.4.3.4 (use statement)]

Figure 3-11: Computation of ELIn for Section 3.4.3

Formalizing,

For a given root variable v, ELIn_v(i) and ELOut_v(i) denote the access graphs representing the explicitly live access paths at the entry and exit of statement i. We use ℰ_G as the initial value for ELIn_v(i) and ELOut_v(i).

Figure 3-12: ELIn and ELOut definitions

EKillPath, LDirect and LTransfer are defined according to the type of statement.

Solving the above data flow equations gives the solution in the form of access graphs.



Illustration:

The solution of the problem described in Figure 1-1 is,

Figure 3-13: Solution to Figure 1-1

Courtesy:[5]

3.5. Other analyses

Other analyses that are required for null assignment insertion are discussed briefly below. Their study and implementation are not covered in this project.

Alias analysis and complete liveness computation: This analysis discovers all aliases and thus

finds all paths aliased to live access paths.

Anticipability and availability analysis: This analysis discovers available and anticipable access

paths so that insertion of new access paths does not cause exceptions.

Null assignment insertion: Null assignment insertion is subject to safety and profitability.

3.6. Implementation in GCC

We have now seen the formulation of the data flow equations for heap reference analysis. The succeeding chapters describe the implementation of the analysis in GCC.



4.1.4. GIMPLE

GIMPLE is a simplified version of GENERIC, obtained by lowering GENERIC to a three-operand representation. Temporaries are introduced to hold the intermediate values needed to compute complex expressions as three-operand statements. Additionally, all the control structures used in GENERIC are lowered into conditional jumps.

The compiler pass that converts GENERIC to GIMPLE is referred to as the gimplifier [7]. This pass works recursively, replacing each complex statement by an equivalent set of GIMPLE three-operand statements. These GIMPLE statements are also referred to as GIMPLE tuples.

Earlier implementations of GIMPLE used trees as the internal data structure [3]. However, the tree structure was much more general than required for three-address statements, which led to the introduction of tuples. A tuple contains information such as the type of statement, the result, the operator and the operands. The operands themselves are represented as trees.

For example,

x = 10 would be represented as gimple_assign <integer_cst, x, 10, NULL>

x = b + c would be represented as gimple_assign <plus_expr, x, b, c>

4.2. GCC Pass

In order to analyze programs and perform operations on them, we need to add a pass to GCC. A pass is a C program that, with the help of GCC APIs, extracts information from the previous pass or the input program or both, performs certain operations on the information received, and produces output that may or may not be forwarded to the next pass. The behaviour of any pass can be observed by looking at the dumps it produces. For example, to observe the dump output by the gimplifier while compiling an input program, we can provide the switch -fdump-tree-gimple.

4.2.1. Types of passes

There are 4 types of passes, gimple_opt_pass, simple_ipa_opt_pass, ipa_opt_pass and

rtl_opt_pass. The definitions and declarations are provided in $SOURCE/gcc/tree-pass.h. We

will use simple_ipa_opt_pass.



4.3. Adding a GIMPLE interprocedural pass

In GCC, any pass is represented by a structure; in our case that structure is simple_ipa_opt_pass. The declaration of this structure and detailed information about its fields can be found in $SOURCE/gcc/tree-pass.h. The definition of our pass structure is as follows:

    struct simple_ipa_opt_pass pass_empty = {
      {
        SIMPLE_IPA_PASS,      /* Type of pass */
        "hra",                /* Switch to execute the pass */
        NULL,                 /* Condition function */
        empty_func_driver,    /* Entry point */
        NULL,                 /* Sub passes */
        NULL,                 /* Next subpasses */
        0,                    /* Static pass number */
        0,                    /* tv_id */
        0,                    /* Properties required, indicated by bit position */
        0,                    /* Properties provided, indicated by bit position */
        0,                    /* Properties destroyed, indicated by bit position */
        0,                    /* Todo flags start */
        0                     /* Todo flags finish */
      }
    };

4.3.1. Registering the pass

We need to register our pass, i.e. our C program file, by adding it to the $SOURCE/gcc directory and making changes in the following files:

1. $SOURCE/gcc/passes.c

2. $SOURCE/gcc/tree-pass.h

3. $SOURCE/gcc/Makefile.in

In passes.c, we need to determine the position of the pass by adding its entry at an appropriate position in the pass list present in the init_optimization_passes() function. As our pass is a simple IPA optimization pass, we can add it where the pass pointer is set to point to the list of regular IPA passes. As it neither takes input from any previous pass nor provides its output to any other pass, the exact ordering is not of much importance.

In tree-pass.h, we need to declare our pass as:

    extern struct simple_ipa_opt_pass pass_empty;

In Makefile.in, we need to write a rule to make the target pass_name.o and add pass_name.o to the list of language independent object files.

4.4. Building a compiler from GCC

Here, our target is to build a compiler (cc1) which, given a C program as input, produces the corresponding assembly file (*.s). The steps to build the compiler are as follows:

1. Write a rule to make the target cc1 in the file $SOURCE/Makefile.in:

    cc1:
            make all-gcc TARGET-gcc=cc1$(exeext)

2. Make a new build directory (hereafter $BUILD) outside the source code directory.

3. With $BUILD as the current directory, configure the build with $SOURCE/configure. Many options can be given while configuring, such as the languages to enable (--enable-languages), the target (--target, i.e. the target architecture or machine for which the generated compiler would produce assembly code), the install directory (--prefix), etc.

4. After configuring, run make with cc1 as the target. This step takes roughly 10-12 minutes on an average machine.

5. After successful completion of make, the generated compiler can be used as $BUILD/gcc/cc1.

For example, $BUILD/gcc/cc1 program.c -fdump-ipa-all would compile program.c to produce program.s and around 20-25 dumps of all the interprocedural passes.

By observing the dumps, we can understand the behaviour of various passes for a given input program. For our pass, the corresponding switch is -fdump-ipa-hra.



5.2.2. Visiting each basic block

In a given function, each basic block can be visited in the following manner:

FOR_EACH_BB(BB){

//code to analyze each basic block here.

}

Here, FOR_EACH_BB(BB) is a macro provided by GCC which uses the global variable cfun to point to the current function and, within the current function, uses BB to point to each basic block. The body of the macro is a simple for loop which starts from the first basic block and advances to the next block until it reaches the end.

5.2.3. Visiting each GIMPLE statement

Each GIMPLE statement of the current function can be visited, basic block by basic block, using the following macro:

    #define FOR_ALL_STMT_FWD_VNIT(BB, GSI) \
        FOR_EACH_BB(BB) \
            FOR_EACH_GIMPLE_STMT_VNIT(BB, GSI)

Here, the body of FOR_ALL_STMT_FWD_VNIT(BB, GSI) is made up of two macros. The former is provided by GCC and the latter has been defined in the pass as:

    #define FOR_EACH_GIMPLE_STMT_VNIT(BB, GSI) \
        for (GSI = gsi_start_bb(BB); !gsi_end_p(GSI); gsi_next(&GSI))

Here, GSI is a GIMPLE statement iterator, whose data type is provided by GCC. As we can see in the body of the second macro, GSI first points to the first statement of the basic block and then advances until it reaches the end. In the body of this for loop, we can use gsi_stmt(GSI) to access the corresponding GIMPLE statement. Thus the driver function for our pass, after removing unnecessary details, looks like:

    static unsigned int empty_func_driver (void)
    {
      struct cgraph_node *cnode;   /* declarations shown for completeness */
      basic_block bb;
      gimple_stmt_iterator gsi;

      preparatory_iterations ();
      for (cnode = cgraph_nodes; cnode; cnode = cnode->next) {   /* iterate over all functions */
        push_cfun (DECL_STRUCT_FUNCTION (cnode->decl));          /* push current function */
        FOR_ALL_STMT_FWD_VNIT (bb, gsi) {   /* iterate over each GIMPLE statement in current function */
          if (is_gimple_assign (gsi_stmt (gsi)) && is_stmt_pointer_type (gsi_stmt (gsi)))
            get_access_paths (gsi_stmt (gsi));
        }
        pop_cfun ();
      }
      return 0;
    }



5.3. Identifying assignment statements

In our pass, we are currently able to identify only the GIMPLE assignment statements. Future work will include identifying and analysing function call, return and use statements. After studying the file $SOURCE/gcc/gimple.h, we found the function is_gimple_assign(gimple stmt), which checks whether a given GIMPLE statement is an assignment statement or not. So when we visit each GIMPLE statement, we check it with this function and proceed with further analysis if it is an assignment statement; otherwise we move to the next GIMPLE statement.

5.4. Identifying pointer type statements

Once a statement is found to be an assignment statement, it needs to be checked for pointer type. If any of the three operands of an assignment is of pointer type, we recognize that statement as a pointer type statement. The check consists of examining the tree codes and types of all the operands. GCC assigns each operand a tree code and provides the macro TREE_CODE() to extract it. It also provides the macro POINTER_TYPE_P(type), which checks whether the type (of any operand) is a pointer type and returns the boolean result. The type of an operand can be found using the TREE_TYPE() macro, again provided by GCC. The code to check whether a variable is of pointer type is:

    static bool is_pointer_type (tree type);   /* forward declaration */

    static bool is_pointer_var (tree var)
    {
      /* Field references and address-of expressions are treated as pointers. */
      if (TREE_CODE (var) == COMPONENT_REF || TREE_CODE (var) == ADDR_EXPR)
        return true;
      return is_pointer_type (TREE_TYPE (var));
    }

    static bool is_pointer_type (tree type)
    {
      if (POINTER_TYPE_P (type))
        return true;
      if (TREE_CODE (type) == ARRAY_TYPE)
        /* TREE_TYPE of an array type is its element type, so check it as a type. */
        return is_pointer_type (TREE_TYPE (type));
      return AGGREGATE_TYPE_P (type);
    }

5.4.1. Extracting operands

In order to check the tree codes and types, first we need to extract operands from a given

GIMPLE statement. This can be done using functions provided by GCC:

1. tree gimple_assign_lhs(gimple stmt)

2. tree gimple_assign_rhs1(gimple stmt)

3. tree gimple_assign_rhs2(gimple stmt).



5.5. Generate access path set

The function get_access_paths returns the access path set for each pointer type assignment statement. It gets the access path for each operand and then combines them into an access path set.

5.5.1. Getting access paths

In order to get the access path from each operand, we use functions such as access_path * get_access_path_lhs(gimple stmt). This function extracts the names (field names) of variables as used by the programmer (or compiler generated temporaries). The function to get field names looks like the following (the functions for the rhs operands resemble it):

    static char * get_lhs_op (const gimple stmt)
    {
      tree t;
      if (is_gimple_assign (stmt)) {
        t = gimple_assign_lhs (stmt);
        return get_name_of_tree1 (t);
      }
      return NULL;
    }

If the operand is of pointer type, the function generates a label for that operand from the following entities: field name, basic block number and statement number. Of these, field name extraction and statement numbering are done in the pass, while GCC assigns each basic block a unique index (number). This triplet makes a label unique.

Once the labels are prepared, they are combined to get an access path for the corresponding operand. Then the access paths of all the operands in a statement are combined to get an access path set for that GIMPLE statement. Note that an access path is for an operand while an access path set is for a GIMPLE statement.
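The label triplet described above might be represented as in the following C sketch. The type NodeLabel and the functions make_label, same_label and format_label are hypothetical names chosen for illustration; the pass's own Label structure may be laid out differently.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical label: a field name plus the basic block and statement it occurs in. */
typedef struct {
    char field[32];
    int bb_index;    /* basic block index assigned by the compiler */
    int stmt_no;     /* statement number within the basic block */
} NodeLabel;

/* Builds a label from its three parts. */
static NodeLabel make_label(const char *field, int bb_index, int stmt_no) {
    NodeLabel l;
    assert(strlen(field) < sizeof l.field);
    strcpy(l.field, field);
    l.bb_index = bb_index;
    l.stmt_no = stmt_no;
    return l;
}

/* Two labels denote the same access graph node only if all three parts match. */
static int same_label(const NodeLabel *a, const NodeLabel *b) {
    return strcmp(a->field, b->field) == 0
        && a->bb_index == b->bb_index
        && a->stmt_no == b->stmt_no;
}

/* Renders a label as "field<bb,stmt>", e.g. for a dump file. */
static void format_label(const NodeLabel *l, char *buf, size_t size) {
    snprintf(buf, size, "%s<%d,%d>", l->field, l->bb_index, l->stmt_no);
}
```

With this representation, the field left at statement 5 of basic block 2 and the field left at statement 1 of basic block 3 get different labels, so they become different access graph nodes.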



The code for getting the access path set looks like:

    static access_path_set * get_access_paths (gimple stmt)
    {
      /* Declarations shown for completeness; stmt_aps is assumed to be
         set up by code elided here. */
      access_path *ap_lhs = NULL, *ap_rhs1 = NULL, *ap_rhs2 = NULL;

      switch (gimple_code (stmt)) {
        case GIMPLE_ASSIGN:
          ap_lhs  = get_access_path_lhs (stmt);
          ap_rhs1 = get_access_path_rhs1 (stmt);
          ap_rhs2 = get_access_path_rhs2 (stmt);
          break;
        default:
          break;
      }
      stmt_aps->lhs  = ap_lhs;
      stmt_aps->rhs1 = ap_rhs1;
      stmt_aps->rhs2 = ap_rhs2;
      return stmt_aps;
    }

This completes the phase of retrieving static information from GCC.



6. Access Graph Library (based on [5])

6.1. Files

AccessGraph.h

This file contains the declarations of the data structures used to represent access graphs and access paths, and the declarations of the functions associated with them.

AccessGraph.c

This file contains the definitions of all the functions required in the explicit liveness analysis.

6.2. Formal definitions of the data structures

6.2.1. Access Paths

An access path ρ_x is a root variable name followed by a sequence of zero or more field names and is denoted by ρ_x ≡ x→f_1→f_2→…→f_k. Since an access path represents a path in a memory graph, it can be used for naming links and nodes. An access path consisting of just a root variable name is called a simple access path; it represents a path consisting of a single link corresponding to the root variable. ℰ denotes an empty access path.

The last field name in an access path ρ is called its frontier and is denoted by Frontier(ρ). The frontier of a simple access path is the root variable name. The access path corresponding to the longest sequence of names in ρ excluding its frontier is called its base and is denoted by Base(ρ). The base of a simple access path is the empty access path ℰ. The object reached by traversing an access path ρ is called the target of the access path and is denoted by Target(ρ). When we use an access path ρ to refer to a link in a memory graph, it denotes the last link in ρ, i.e. the link corresponding to Frontier(ρ).

6.2.2. Access graphs

An access graph, denoted by G_v, is a directed graph ⟨n0, N, E⟩ representing a set of access paths starting from a root variable v. N is the set of nodes, n0 ∈ N is the entry node with no in-edges and E is the set of edges. Every path in the graph represents an access path. The empty graph ℰ_G has no nodes or edges and does not accept any access path.



Here the access path lhs corresponds to the access path of the variable on the left hand side of the = sign, while the access paths rhs1 and rhs2 correspond to the access paths of the variables on the right hand side of the expression.

6.3.5. Access graph node

This structure represents a node in an access graph, which has been implemented as a node in an adjacency linked list representation of a graph.

    typedef struct AGN {
        unsigned summary : 1;
        Label l;
        struct AGE * edges;
        struct AGN * next;
    } AccessGraphNode;

The label l holds the information in the node, while the summary bit denotes whether the node is a summary node or not. The edges pointer points to the linked list of edges originating from the node.

6.3.6. Access graph edge

This structure represents an edge in the access graph as well as in the adjacency linked list representation of the graph.

    typedef struct AGE {
        AccessGraphNode * from_node;
        AccessGraphNode * to_node;
        struct AGE * next;
    } AccessGraphEdge;

The access graph node pointers from_node and to_node point to the originating and destination nodes of the edge respectively.

6.3.7. Nodes set

The nodes set is the set of nodes in the access graph and is implemented as a simple linked list of nodes.

    typedef struct NS {
        AccessGraphNode * first_node;
    } Nodes_Set;



6.3.8. Edges set

The edges set is the set of edges in the access graph and is implemented as a simple linked list of edges. Thus, unlike the conventional adjacency linked list representation, all the edges in the access graph form a single linked list, with edges originating from the same node grouped together.

    typedef struct ES {
        AccessGraphEdge * first_edge;
    } Edges_Set;

6.3.9. Access graph

As given by the formal definition of the access graph, it has been implemented as a structure with a nodes set and an edges set. The first node in the nodes set always corresponds to the entry node of the graph.

    typedef struct G {
        Nodes_Set Nodes;
        Edges_Set Edges;
        struct G * next;
    } AccessGraph;

6.3.10. Access graph set

The access graph set represents the set of access graphs as a linked list.

    typedef struct AG {
        AccessGraph * start;
    } AccessGraphSet;

6.4. Operations on access graphs

6.4.1. Auxiliary operations

6.4.1.1. ConstructGraph(ρ, g)

Constructs the access graph g corresponding to the access path ρ. It involves converting the access path nodes to access graph nodes and adding the corresponding edges.

    void ConstructGraph (AccessPath * p , AccessGraph * g)
    begin
        For all the nodes in the access path
        begin
            Create a corresponding node in the access graph
        end
        Add edges with respect to access path to access graph
    end procedure
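A C sketch of this construction, together with the lastNode operation of Section 6.4.1.2, is given below. It is a simplified illustration under assumptions: the types Node, Edge and Graph are cut-down stand-ins for AccessGraphNode, AccessGraphEdge and AccessGraph, the access path is passed as an array of label strings, and each node keeps its own edge rather than sharing a single edges list.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-ins for the node, edge and graph structures. */
typedef struct Edge Edge;
typedef struct Node {
    const char *label;
    Edge *edges;          /* edge leaving this node (at most one in a linear graph) */
    struct Node *next;    /* next node in the graph's node list */
} Node;
struct Edge {
    Node *from_node, *to_node;
    Edge *next;
};
typedef struct { Node *first_node; } Graph;

/* Builds a linear graph for the path labels[0] -> ... -> labels[n-1]. */
static Graph construct_graph(const char *labels[], int n) {
    Graph g = { NULL };
    Node *prev = NULL;
    for (int i = 0; i < n; i++) {
        Node *cur = calloc(1, sizeof *cur);
        assert(cur != NULL);
        cur->label = labels[i];
        if (prev == NULL) {
            g.first_node = cur;              /* entry node: the root variable */
        } else {
            Edge *e = calloc(1, sizeof *e);  /* edge from the previous node */
            assert(e != NULL);
            e->from_node = prev;
            e->to_node = cur;
            prev->edges = e;
            prev->next = cur;
        }
        prev = cur;
    }
    return g;
}

/* Returns the last node of a linear graph by walking the node list. */
static Node *last_node(const Graph *g) {
    Node *n = g->first_node;
    while (n != NULL && n->next != NULL) n = n->next;
    return n;
}

/* Frees all nodes and edges of a graph built by construct_graph. */
static void free_graph(Graph *g) {
    Node *n = g->first_node;
    while (n != NULL) {
        Node *next = n->next;
        free(n->edges);
        free(n);
        n = next;
    }
    g->first_node = NULL;
}
```

For the path x→left→right, construct_graph creates three nodes and two edges, and last_node returns the node labelled right, which is the node at which remainder graphs would be attached during extension.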



6.4.1.2. lastNode(G)

Returns the last node of a linear graph G constructed from a given access path ρ.

    AccessGraphNode* lastNode (AccessGraph * G)
    begin
        Traverse the linked list and return the last node
    end procedure

6.4.1.3. CleanUp(G)

Deletes the nodes which are not reachable from the entry node.

void CleanUp (AccessGraph * g)

begin

1. Run a Depth First Traversal over the graph and mark all the visited nodes

2. Traverse the linked list of nodes and delete all the unmarked nodes and

their edges from the graph

end procedure
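The two steps of CleanUp can be sketched in C as a mark phase followed by a sweep phase. This is an illustrative sketch only: it assumes a fixed-size adjacency matrix instead of the linked lists used by the library, takes node 0 as the entry node, and the names Graph, mark and clean_up are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MAX_NODES 8

typedef struct {
    int present[MAX_NODES];            /* 1 if node i is in the graph */
    int edge[MAX_NODES][MAX_NODES];    /* edge[i][j] = 1 for an edge i -> j */
} Graph;

/* Depth first traversal that marks every node reachable from node n. */
static void mark(const Graph *g, int n, int visited[MAX_NODES]) {
    visited[n] = 1;
    for (int j = 0; j < MAX_NODES; j++)
        if (g->edge[n][j] && g->present[j] && !visited[j])
            mark(g, j, visited);
}

/* Deletes the nodes (and their edges) that are not reachable from the
   entry node 0 and returns the number of nodes deleted. */
static int clean_up(Graph *g) {
    int visited[MAX_NODES] = { 0 };
    int removed = 0;
    assert(g->present[0]);
    mark(g, 0, visited);
    for (int i = 0; i < MAX_NODES; i++) {
        if (!g->present[i] || visited[i]) continue;
        g->present[i] = 0;
        for (int j = 0; j < MAX_NODES; j++)
            g->edge[i][j] = g->edge[j][i] = 0;
        removed++;
    }
    return removed;
}
```

CleanUp matters after path removal: deleting an edge can leave a whole subgraph cut off from the entry node, and such nodes no longer represent any access path.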

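6.4.1.4. CorrespondingNodes(G, G′, S)

Computes the set of nodes of G′ which correspond to the nodes of G specified in the set S. To compute CN(G, G′, S), we define ACN(G, G′), the set of pairs of all corresponding nodes. Let G ≡ ⟨n0, N, E⟩ and G′ ≡ ⟨n0′, N′, E′⟩. A node n in G corresponds to a node n′ in G′ if there exists an access path ρ which is represented by a path from n0 to n in G and a path from n0′ to n′ in G′.

Formally, ACN(G, G′) is the least solution of the following equation:

$$ACN(G, G') = \begin{cases} \emptyset & G = \mathcal{E}_G \text{ or } G' = \mathcal{E}_G \text{ or } root(G) \neq root(G') \\ \{\langle n_0, n_0' \rangle\} \cup \{\langle n_j, n_j' \rangle \mid Field(n_j) = Field(n_j'),\ n_i \to n_j \in E,\ n_i' \to n_j' \in E',\ \langle n_i, n_i' \rangle \in ACN(G, G')\} & \text{otherwise} \end{cases} \qquad (6.1)$$

$$CN(G, G', S) = \{ n' \mid \langle n, n' \rangle \in ACN(G, G'),\ n \in S \}$$

Note that Field(n_j) = Field(n_j′) would hold even when n_j or n_j′ is the summary node n*.

    void Corresponding_Nodes (AccessGraph* G, AccessGraph* G_, Nodes_Set S, Nodes_Set CN)
    begin
        All_Corresponding_Nodes (G , G_ , ACN1 , ACN2);
        For each pair of corresponding nodes n in ACN1 and n_ in ACN2
            if n ∈ S then add n_ to CN
    end procedure

    void All_Corresponding_Nodes (AccessGraph* G, AccessGraph* G_, Nodes_Set ACN1,
                                  Nodes_Set ACN2)
    begin
        if root(G) != root (G_) then return;
        Starting from the root nodes, recursively add to the sets ACN1 and ACN2
        each pair of nodes which have the same field name and have edges coming
        to them from a pair of nodes already in these sets.
    end procedure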
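The recursive description above can also be read as a fixed point computation, sketched in C below. The sketch makes several assumptions: graphs are fixed-size adjacency matrices with node 0 as the entry node, the pairs of corresponding nodes are kept in a boolean matrix instead of the two sets ACN1 and ACN2, a field name of "*" stands for the summary node, and all identifiers are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MAX_NODES 8

typedef struct {
    const char *field[MAX_NODES];      /* field name of each node; node 0 is the entry node */
    int edge[MAX_NODES][MAX_NODES];    /* edge[i][j] = 1 for an edge i -> j */
    int count;
} Graph;

/* Field names match if they are equal or if either node is the summary node "*". */
static int field_match(const char *a, const char *b) {
    return strcmp(a, "*") == 0 || strcmp(b, "*") == 0 || strcmp(a, b) == 0;
}

/* Sets acn[i][j] = 1 when node i of g corresponds to node j of h. */
static void all_corresponding_nodes(const Graph *g, const Graph *h,
                                    int acn[MAX_NODES][MAX_NODES]) {
    assert(g->count <= MAX_NODES && h->count <= MAX_NODES);
    memset(acn, 0, sizeof(int) * MAX_NODES * MAX_NODES);
    if (g->count == 0 || h->count == 0 || strcmp(g->field[0], h->field[0]) != 0)
        return;                        /* empty graph or different roots */
    acn[0][0] = 1;                     /* the entry nodes correspond */
    int changed = 1;
    while (changed) {                  /* iterate until no new pair is added */
        changed = 0;
        for (int i = 0; i < g->count; i++)
            for (int ip = 0; ip < h->count; ip++) {
                if (!acn[i][ip]) continue;
                for (int j = 0; j < g->count; j++)
                    for (int jp = 0; jp < h->count; jp++)
                        if (g->edge[i][j] && h->edge[ip][jp] && !acn[j][jp]
                            && field_match(g->field[j], h->field[jp])) {
                            acn[j][jp] = 1;
                            changed = 1;
                        }
            }
    }
}

/* CN(G, G', S): marks in cn[] the nodes of h that correspond to a node of g selected in s[]. */
static void corresponding_nodes(const Graph *g, const Graph *h,
                                const int s[MAX_NODES], int cn[MAX_NODES]) {
    int acn[MAX_NODES][MAX_NODES];
    all_corresponding_nodes(g, h, acn);
    memset(cn, 0, sizeof(int) * MAX_NODES);
    for (int i = 0; i < g->count; i++)
        for (int j = 0; j < h->count; j++)
            if (s[i] && acn[i][j]) cn[j] = 1;
}
```

For example, with G representing x→l→r and G′ representing x→l→r together with x→r, the node r of G corresponds only to the r node of G′ that is reached through l, because x→r is not a path of G.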

6.4.1.5. CopyGraph(G, G′)

Copies the graph G into a new access graph G′.

    AccessGraph* copy_graph (AccessGraph * g)
    begin
        Copy all the nodes of g into a new graph g′
        Copy all the edges of g into g′, establishing links between the nodes and
        the edges set
        Return g′
    end procedure

6.4.1.6. RemainderGraph(G, G′, n)

Constructs a remainder graph G′ from an access graph G with n as the entry node.

    AccessGraph* remainder_graph (AccessGraph* g, AccessGraphNode* n)
    begin
        Run a recursive depth first traversal over the graph g starting from node n
        and add each node, along with all its edges, to a new graph g′ while
        visiting it.
    end procedure

6.4.2. Main operations

6.4.2.1. Union

G ∪ G′ combines access graphs G and G′ such that any access path contained in G or G′ is contained in the resulting graph.

$$G \cup G' = \langle n_0,\ N \cup N',\ E \cup E' \rangle \qquad (6.2)$$

The operation N ∪ N′ treats the nodes with the same label as identical. Because of associativity, ∪ can be generalized to an arbitrary number of arguments in an obvious manner.

This operation can be explained more effectively by the examples given in Figure 6-1. In the first example the access graphs g3 and g4 unite to give the access graph g4, since g3 is a subset of g4. In the second example the union of access graphs g2 and g4 results in the access graph g5. Note that union simply takes the unions of the node sets and edge sets of two access graphs with the same root variable. The other two examples are along the same lines.

The implementation of this operation is based on the definition given above. The union of the node sets and edge sets of both graphs is computed, and then the links are established between the two sets, resulting in a new graph.

Figure 6-1: Examples of operations on access graphs

Courtesy: [5]

AccessGraphSet * Union (AccessGraphSet * G1 , AccessGraphSet * G2)
begin
    AccessGraphSet * G3 ;
    for each graph g1 in set G1
    begin
        for each graph g2 in set G2
        begin
            if (root (g1) == root (g2))
            then begin
                g3 = union_graph(g1 , g2) ;
                add g3 to G3 ;
            endif
        end for
    end for
    Return G3
end procedure



accessgraph * union_graph (accessgraph * g1 , accessgraph * g2)
begin
    accessgraph * g3 ;
    copy all the nodes of g1 to g3 ;
    for each node n2 in g2
    begin
        if n2 is not present in g3
        then add n2 to g3 ;
    end for
    copy all edges of g1 to g3 ;
    for each edge e2 in g2
    begin
        if e2 is not present in g3
        then add e2 to g3 ;
    end for
    for each node n3 in g3
    begin
        search for the first edge e3 in g3
        such that e3->from_node == n3
        n3->edges = e3 ;
    end for
    return g3 ;
end procedure
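When nodes are identified purely by their label, the node and edge unions above reduce to bitwise OR operations. The following C sketch shows this under assumed names (`ToyGraph`, `toy_union`) and an assumed bit-set layout in which node i stands for label i. It is meant as an illustration of the definition, not as the linked-list implementation used in the library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_LABELS 16                 /* hypothetical bound on distinct labels */

/* Hypothetical compact access graph: node i is the node carrying label i. */
typedef struct {
    int root;                         /* label id of the root variable */
    uint32_t nodes;                   /* bit i set: node i is present */
    uint32_t succ[MAX_LABELS];        /* bit j of succ[i] set: edge i -> j */
} ToyGraph;

/* Union of two graphs with the same root: nodes N u N', edges E u E'.
   Returns 0 on success and -1 (leaving out untouched) if the roots differ. */
int toy_union(const ToyGraph *a, const ToyGraph *b, ToyGraph *out)
{
    assert(a != NULL && b != NULL && out != NULL);
    if (a->root != b->root)
        return -1;
    out->root = a->root;
    out->nodes = a->nodes | b->nodes;       /* same label => same node */
    for (int i = 0; i < MAX_LABELS; i++)
        out->succ[i] = a->succ[i] | b->succ[i];
    return 0;
}
```

In this layout the final loop of union_graph, which re-links each node to its first edge, has no counterpart because the edges of a node are stored directly in `succ`.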

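6.4.2.2. Path removal

The operation $G \ominus \rho$ removes those access paths in $G$ which have $\rho$ as a prefix.

$$G \ominus \rho = \begin{cases} G & \text{if } \rho \text{ is empty or } Root(\rho) \neq Root(G) \\ \mathcal{E}_G & \text{if } \rho \text{ is a simple access path} \\ CleanUp(\langle n_0, N, E - E_{del} \rangle) & \text{otherwise} \end{cases} \tag{0.3}$$

Where,

$$E_{del} = \{ n_i \rightarrow n_j \mid n_i \rightarrow n_j \in E,\ n_i \in CN(G, G^B, \{lastNode(G^B)\}),\ n_j = Frontier(\rho),\ UniqueAccessPath?(G, n_i) \} \tag{0.4}$$

Here $\mathcal{E}_G$ is the empty access graph and $G^B$ is the access graph constructed from $Base(\rho)$. UniqueAccessPath?(G, n) returns true if in G, all paths from the entry node to node n represent the same access path.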

In the first example given in Figure 6-1, we can see that removal of the access path $x \rightarrow l$ from the access graph g6 results in the access graph g2. This operation requires removing $Frontier(\rho)$, i.e. in this case the node l, from the access graph g6. The second example illustrates the case where $\rho$ is a simple access path. The third and the fourth examples are along the same lines.

The implementation of this operation is derived from the definition given above. First, the access graph GB is constructed from the access path $Base(\rho)$, and then the set of corresponding nodes is calculated as given above. Each node in this set is then checked to see whether it has a unique access path from the root to itself and also an edge to the node which is the frontier of $\rho$. If such an edge exists, it is removed from the edge set, and after removing all such edges the graph is cleaned up.

AccessGraphSet * Path_Removal (AccessGraphSet * G , AccessPath * p)
begin
    if p is empty then return copy(G) ;
    for each graph g in set G
    begin
        if (root(p) != root (g)) continue ;
        if p is a simple access path
        then remove everything from g (empty) ;
        else begin
            GB = construct_graph (base (p)) ;
            Nodes_set N = Corresponding_nodes (g , GB , {lastNode(GB)}) ;
            for each node ni in g
            begin
                if ni ∈ N and UniqueAccessPath?(g, ni)
                then begin
                    for each edge e from node ni
                        if e->to_node == frontier(p)
                        then delete edge e ;
                end if
            end for
            CleanUp (g) ;
        end if
    end for
    return G ;
end procedure

6.4.2.3. Factorization

Given a node $m \in (N - \{n_0\})$ of an access graph $G$, the Remainder Graph of $G$ at $m$ is the subgraph of $G$ rooted at $m$ and is denoted by $RG(G, m)$. If $m$ does not have any outgoing edges, then the result is the empty remainder graph $\mathcal{E}_{RG}$. Let $M$ be a subset of the nodes of $G'$ and $M'$ be the set of corresponding nodes in $G$. Then, $G/(G', M)$ computes the set of remainder graphs of the successors of nodes in $M'$.

$$G/(G', M) = \{ RG(G, n_j) \mid n_i \rightarrow n_j \in E,\ n_i \in CN(G, G', M) \} \tag{0.5}$$

A remainder graph is similar to an access graph except that (a) its entry node does not

correspond to a root variable but to a field name and (b) the entry node can have incoming

edges.

In Figure 6-1, the first example illustrates the result when g2 is factorized with g1 and {x}. The resultant graph rg1 is the subgraph of g2 rooted at {r}, which is the successor of the node {x}, the corresponding node between the two graphs for the given set. The second example is along the same lines, with the difference that {x} here has two successors, thus resulting in two different remainder graphs. In the third example the corresponding node {r} has no successor, thus resulting in an empty graph. The fourth example illustrates the case in which there is no corresponding node between the two graphs, and thus the result is an empty set.

In the implementation of this operation, the set of corresponding nodes is calculated and then a remainder graph is constructed for each successor of each node in this set.

AccessGraphSet * Factorization (AccessGraphSet G1, AccessGraphSet G2, Nodes_Set M)
begin
    AccessGraphSet RG ;
    for each graph g1 in set G1
    for each graph g2 in set G2
    begin
        if (root(g1) != root(g2)) continue ;
        Nodes_set N = Corresponding_nodes(g1,g2,M);
        for each node n in N
            for each edge e of n
            begin
                new_graph = remainder_graph (g1 , e->to_node) ;
                add new_graph to RG ;
            end for
    end for
    return RG ;
end procedure



6.4.2.4. Extension

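Extending an empty access graph $\mathcal{E}_G$ results in the empty access graph $\mathcal{E}_G$. For non-empty graphs, this operation is defined as follows.

(a) Extension with a remainder graph ($\cdot$). Let $M$ be a subset of the nodes of $G$ and $R$ be a remainder graph. Then, $(G, M) \cdot R$ appends the suffixes in $R$ to the access paths ending on nodes in $M$.

$$(G, M) \cdot \mathcal{E}_{RG} = G \tag{0.6}$$

$$(\langle n_0, N, E \rangle, M) \cdot \langle n', N^R, E^R \rangle = \langle n_0, N \cup N^R, E \cup E^R \cup \{ n_i \rightarrow n' \mid n_i \in M \} \rangle \tag{0.7}$$

(b) Extension with a set of remainder graphs (#). Let $S$ be a set of remainder graphs. Then, $(G, M) \# S$ extends access graph $G$ with every remainder graph in $S$.

$$(G, M) \# \emptyset = \mathcal{E}_G \tag{0.8}$$

$$(G, M) \# S = \biguplus_{R \in S} (G, M) \cdot R \tag{0.9}$$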

This operation simply involves adding the remainder graph to the given graph at a given node. From Figure 6-1, we can see that extending g3 with rg1 at l1 results in the access graph g4. In the second example the given access graph is extended with two remainder graphs at two nodes, while the third and fourth examples follow directly from the definition given above.

The implementation of this function requires the union function, followed by the addition of edges from the nodes in the set M to the root node of the remainder graph R.
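As an illustration, extension with a single remainder graph can be sketched in C as a union followed by the extra edges into the entry node of R. The representation (`ToyGraph`, node i standing for label i, node sets as bit masks) and the function name `toy_extend` are assumptions made for this sketch only.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_LABELS 16                 /* hypothetical bound on distinct labels */

/* Hypothetical compact graph: node i is the node carrying label i. */
typedef struct {
    int root;                         /* root variable, or entry node of a remainder graph */
    uint32_t nodes;                   /* bit i set: node i is present */
    uint32_t succ[MAX_LABELS];        /* bit j of succ[i] set: edge i -> j */
} ToyGraph;

/* (g, m) . r : append remainder graph r after the nodes of g listed in m. */
void toy_extend(const ToyGraph *g, uint32_t m, const ToyGraph *r, ToyGraph *out)
{
    assert(out != r);                 /* out is overwritten before r is read */
    *out = *g;
    if (r->nodes == 0)
        return;                       /* empty remainder graph: result is g */
    assert(r->root >= 0 && r->root < MAX_LABELS);
    out->nodes |= r->nodes;           /* union of the node sets */
    for (int i = 0; i < MAX_LABELS; i++) {
        out->succ[i] |= r->succ[i];   /* union of the edge sets */
        if (m & g->nodes & (1u << i))
            out->succ[i] |= 1u << r->root;   /* new edge from a node of M to the entry of r */
    }
}
```

Extension with a set of remainder graphs would then apply this function once per remainder graph and combine the results with union.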



7. Implementation of Explicit Liveness Analysis in GCC

The theory of data flow analysis and of explicit liveness analysis of the heap was covered in Chapters 2 and 3. Later chapters discussed interfacing with GCC and the implementation of the access graph and access path libraries. With the access paths and other information available from GCC, and the access graph library to support our analysis, we now implement the explicit liveness analysis.

7.1. The main function

The analysis was divided into three functions: the preparatory pass, the explicit liveness analysis and the other analyses. The main data structure storing the information is:

typedef struct {
    enum type_of_satement tos;
    Access_paths * access_paths;
    Liveness_analysis_info * liveness_info;
} Heap_analysis_info;

Heap_analysis_info** Stmt_info;

Figure 7-1: Main data structure

The three functions are explained below.

The preparatory pass: This pass computes the static information needed by all the other analyses. The type of each statement is computed as ASSIGNMENT, FUNCTION CALL, RETURN, USE or OTHER and stored in the tos field. Access paths are extracted from each statement and stored in the access_paths field for further use in the other analyses. Each statement contains at most 3 access paths due to the use of SSA form in GIMPLE. Basic blocks are numbered in decreasing order while returning from a depth first traversal. Traversing the basic blocks in decreasing order of these numbers therefore visits each function against the control flow [2]. Also, the information of any statement can be accessed from Stmt_info as a tuple.

Explicit liveness analysis: This is the main function computing explicit liveness. It is explained in the next section.

Other analyses: The other analyses are not implemented as of now.
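The block numbering performed by the preparatory pass can be sketched as follows. This is a minimal sketch on a hypothetical `ToyCfg` structure with at most two successors per block; it does not use GCC's own basic block data structures, and the names `toy_number_blocks` and `number_from` are introduced only for illustration.

```c
#include <assert.h>

#define MAX_BLOCKS 32                 /* hypothetical bound on basic blocks */

/* Hypothetical control flow graph; not GCC's basic_block structure. */
typedef struct {
    int n_blocks;
    int n_succ[MAX_BLOCKS];
    int succ[MAX_BLOCKS][2];          /* at most two successors per block */
    int number[MAX_BLOCKS];           /* 0 means unreachable / not numbered */
} ToyCfg;

/* Visit the successors of b first, then number b on the way back. */
static void number_from(ToyCfg *cfg, int b, int *visited, int *next)
{
    visited[b] = 1;
    for (int k = 0; k < cfg->n_succ[b]; k++) {
        int s = cfg->succ[b][k];
        if (!visited[s])
            number_from(cfg, s, visited, next);
    }
    cfg->number[b] = (*next)--;       /* numbers are handed out in decreasing order */
}

/* Number the blocks reachable from entry, counting down from n_blocks. */
void toy_number_blocks(ToyCfg *cfg, int entry)
{
    int visited[MAX_BLOCKS] = {0};
    int next = cfg->n_blocks;
    assert(cfg->n_blocks <= MAX_BLOCKS && entry >= 0 && entry < cfg->n_blocks);
    for (int b = 0; b < cfg->n_blocks; b++)
        cfg->number[b] = 0;
    number_from(cfg, entry, visited, &next);
}
```

With this numbering, every edge that is not a loop back edge goes from a smaller number to a larger one, so visiting blocks from the largest number down to the smallest processes successors before predecessors, which is the order a backward analysis such as liveness needs.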



7.2. Explicit liveness analysis

The explicit liveness analysis extracts information from statements and performs the analysis on this information. Some of the information remains constant while some of it changes with each iteration. We do an initialization pass over the program to compute the static information such as LDirect, EKillPath and some of the information required by LTransfer.

The data structure used to store the information in this pass is:

typedef struct {
    access_graph_set * LDirect;
    access_graph_set * ELKillPath;
    access_graph_set * LTransfer_info;
    access_graph_set * ELIn;
    access_graph_set * ELOut;
    access_graph_set * LIn;
    access_graph_set * LOut;
} Liveness_analysis_info;

Figure 7-2: Data structure for liveness analysis

After the static information is computed and stored, the general data flow algorithm iterates over the program as shown below:

For each function
    repeat
        for each statement in specified traversal order
            compute ELOut set of statement
            compute ELIn set of statement
    until ELOut and ELIn do not change for any statement

Figure 7-3: General Algorithm

7.2.1. Computation of ELOut

ELOut is computed by directly implementing the equation for ELOut in Figure 3-12.

7.2.2. Computation of ELIn

ELIn depends on the type of statement and is calculated as:

Switch on type of statement
    Assignment: calculate LTransfer, EKillPath and LDirect using equation in Figure 3-7;
                calculate ELGen using equation in Figure 3-12;
                return ELIn using equation in Figure 3-12;
    Function Call or Return or Use: /* not completely implemented */
    Other: return same as ELOut;

Figure 7-4: Computation of ELIn
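The shape of the general algorithm of Figure 7-3 can be sketched in C as a round-robin iteration that stops when no set changes. The sketch makes strong simplifying assumptions: plain bit vectors stand in for the sets of access graphs, and the textbook liveness form in = gen ∪ (out − kill) stands in for the ELIn and ELOut equations of Figure 3-12. `ToyFlow` and `toy_solve_backward` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_STMTS 16                  /* hypothetical bound on statements */

/* Hypothetical per-function data: bit vectors replace access graph sets. */
typedef struct {
    int n;                            /* number of statements */
    uint32_t gen[MAX_STMTS];          /* stand-in for the generated information */
    uint32_t kill[MAX_STMTS];         /* stand-in for the killed information */
    int n_succ[MAX_STMTS];
    int succ[MAX_STMTS][2];           /* control flow successors */
    uint32_t in[MAX_STMTS];           /* stand-in for ELIn */
    uint32_t out[MAX_STMTS];          /* stand-in for ELOut */
} ToyFlow;

/* Iterate to a fixed point; returns the number of passes performed. */
int toy_solve_backward(ToyFlow *f)
{
    int passes = 0, changed;
    assert(f->n >= 0 && f->n <= MAX_STMTS);
    for (int s = 0; s < f->n; s++)
        f->in[s] = f->out[s] = 0;
    do {
        changed = 0;
        /* visit statements against the control flow (highest index first) */
        for (int s = f->n - 1; s >= 0; s--) {
            uint32_t out = 0;
            for (int k = 0; k < f->n_succ[s]; k++)
                out |= f->in[f->succ[s][k]];
            uint32_t in = f->gen[s] | (out & ~f->kill[s]);
            if (out != f->out[s] || in != f->in[s])
                changed = 1;
            f->out[s] = out;
            f->in[s] = in;
        }
        passes++;
    } while (changed);
    return passes;
}
```

On a straight-line function the sets stabilise in the first pass and a second pass only confirms that nothing changed; loops in the control flow need further passes.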



7.2.3. Computation of LDirect

LDirect also depends on the type of statement and is calculated as:

Switch on type of statement
    Assignment:
        calculate LDirect using equation in Figure 3-7;
    Function Call*:
    Return*:
    Use*:

* not implemented completely

Figure 7-5: Computation of LDirect

7.2.4. Calculation of EKillPath

EKillPath is only defined for assignment and function call statements.

Switch on type of statement
    Assignment:
        calculate EKillPath using equation in Figure 3-7;
    Function Call:
        /* not implemented completely */

Figure 7-6: Calculation of EKillPath

7.2.5. Calculation of LTransfer

LTransfer is defined only for assignment statements.

Switch on type of statement
    Assignment:
        calculate LTransfer using equation in Figure 3-7;

Figure 7-7: Calculation of LTransfer

Thus the algorithm described above computes the liveness of heap data and stores the final access graphs associated with each statement.



9. References

[1] Aho, Sethi, & Ullman. Dragon Book. Pearson Education.

[2] Khedker. (2010). Generic Data Flow Analyser. IITB.

[3] Khedker. (2010). Manipulating GIMPLE and RTL IRs. IITB: GRC.

[4] Khedker, Sanyal, & Karkare. Data Flow Analysis: Theory and Practice. CRC Press.

[5] Khedker, Sanyal, & Karkare. (2007). Heap Reference Analysis Using Access Graphs. ACM.

[6] Merrill, J. (2003). GENERIC and GIMPLE: A New Tree Representation for Entire Functions. GCC Developers Summit.

[7] Stallman, R. (2010). GCC Internals. GCC.
