HRA Project Report

  • 8/13/2019 HRA Project Report

    1/57

    B.TECH PROJECT REPORT

    on

    HEAP REFERENCE ANALYSIS AND ITS IMPLEMENTATION IN GCC

    SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE AWARD OF

    BACHELOR OF TECHNOLOGY IN COMPUTER SCIENCE AND ENGINEERING

    Submitted by: Pratik Patre

    Niranjan Viladkar

    Waman Virgaonkar

    Under the guidance of

    Dr. C. S. Moghe

    Professor, Computer Science and Engineering, VNIT

    &

    Dr. U. P. Khedker

    Professor, Computer Science and Engineering, IITB

    Visvesvaraya National Institute of Technology, Nagpur

    2010-2011

    Visvesvaraya National Institute of Technology, Nagpur

    2010-2011

    CERTIFICATE

    This is to certify that the project work entitled HEAP REFERENCE ANALYSIS AND ITS
    IMPLEMENTATION IN GCC is a bona fide work written by Mr. Pratik Patre, Mr. Niranjan
    Viladkar and Mr. Waman Virgaonkar in the Electronics and Computer Science Engineering
    Department, Visvesvaraya National Institute of Technology, Nagpur, in partial fulfilment of the
    requirements for the award of the degree of Bachelor of Technology in Computer Science and
    Engineering.

    Dr. C. S. Moghe
    Professor, Electronics and Computer Science Engineering, VNIT, Nagpur

    Dr. K. D. Kulat
    Head of Department, Electronics and Computer Science Engineering, VNIT, Nagpur

    ACKNOWLEDGEMENTS

    We take this opportunity to acknowledge, with a deep sense of gratitude, our project guides
    Dr. C. S. Moghe, Professor, Department of Electronics and Computer Science Engineering, VNIT
    Nagpur, and Dr. U. P. Khedker, Professor, Department of Computer Science, IITB, for their
    invaluable guidance, motivation, and support, which have led to the successful completion of this
    project.

    We also take this opportunity to offer our sincere thanks to Dr. K. D. Kulat, Head of Department,
    Department of Electronics and Computer Science Engineering, VNIT, Nagpur, for providing the
    requisite facilities needed to complete the project. We would also like to thank all the teaching
    and non-teaching staff for supporting us.

    ABSTRACT

    Garbage in a program is defined to be unused data. However, current garbage collectors
    approximate it as unreachable data. This is due to the lack of effective analysis techniques for
    heap data. Current data flow analysis techniques are difficult to apply to heap references
    because they are mature for static data but not for heap data. In this project we put forth a data
    flow analysis technique for heap references.

    Our technique for collecting garbage is based on liveness analysis, which approximates unused
    data very closely. This analysis uses access graphs as data flow information; these capture the
    pattern of heap reference accesses. Since access graphs are bounded and the operations
    defined on them are monotonic, we can use the data flow analysis framework and all its standard
    results.

    Table of Contents

    1. Introduction
        1.1. Motivation
        1.2. The solution
        1.3. Related work
        1.4. Challenges
        1.5. Contributions
        1.6. Organization of the report
    2. Data Flow Analysis
        2.1. Program analysis
        2.2. Data flow analysis abstraction
        2.3. Data flow analysis schema
    3. Explicit Liveness Analysis of Heap
        3.1. Program to be analysed
        3.2. Capturing liveness of heap
        3.3. Capturing liveness using access paths
        3.4. Capturing liveness using access graphs
        3.5. Other analyses
        3.6. Implementation in GCC
    4. Overview of GCC
        4.1. Intermediate representation
        4.2. GCC Pass
        4.3. Adding a GIMPLE interprocedural pass
        4.4. Building a compiler from GCC
    5. Pass Details
        5.1. General outline
        5.2. Visiting each statement
        5.3. Identifying assignment statements
        5.4. Identifying pointer type statements
        5.5. Generate access path set
    6. Access Graph Library
        6.1. Files
        6.2. Formal definitions of the data structures
        6.3. The data structures
        6.4. Operations on access graphs
    7. Implementation of Explicit Liveness Analysis in GCC
        7.1. The main function
        7.2. Explicit liveness analysis
    8. Conclusion
    9. References

    Table of Figures

    Figure 1-1: Motivating Example of HRA
    Figure 2-1: A code to illustrate DFA
    Figure 2-2: General algorithm for DFA
    Figure 3-1: Capturing live objects on the heap
    Figure 3-2: Computation of ELIn and ELOut
    Figure 3-3: Flow functions for liveness
    Figure 3-4: Unbounded access path example
    Figure 3-5: Set of access paths represented using access graphs
    Figure 3-6: Summarization in access graphs
    Figure 3-7: Liveness capturing equations for assignment statement
    Figure 3-8: Liveness capturing equations for function call statement
    Figure 3-9: Liveness capturing equations for return statement
    Figure 3-10: Liveness capturing equations for use statement
    Figure 3-11: Computation of ELIn for section 3.4.3
    Figure 3-12: ELIn and ELOut definitions
    Figure 3-13: Solution to Figure 1-1
    Figure 6-1: Examples of operations on access graphs
    Figure 7-1: Main data structure
    Figure 7-2: Data structure for liveness analysis
    Figure 7-3: General Algorithm
    Figure 7-4: Computation of ELIn
    Figure 7-5: Computation of LDirect
    Figure 7-6: Calculation of EKillPath
    Figure 7-7: Calculation of LTransfer

    1. Introduction

    Program analysis techniques, especially data flow analysis techniques, are employed to find
    various properties of the data used in a program. Summarizing these properties has
    enabled us to perform validation, verification and various optimizations on a program. These
    techniques have matured significantly over time for static data, i.e. data allocated on the stack
    and in the static area. However, the analysis of data allocated on the heap has not reached the
    same level of maturity.

    Garbage is unused data in a program; it causes memory leaks and is mainly present on the heap.
    The current inability to analyse heap data has prevented efficient garbage collection. Taking this
    problem as our main motivation, we develop a technique for the analysis of heap data [5] for
    solving the problem of garbage collection. We also implement the analysis in GCC to
    obtain a working model of the analysis.

    1.1. Motivation

    Data is allocated on the stack or on the heap. Data allocated on the stack has a fixed size and a
    fixed lifetime, depending on function scope or block scope. This fixed lifetime of static data
    makes it easy to allocate and de-allocate stack data. Allocating data on the heap gives us the
    flexibility of variable size and variable lifetimes. However, the variable lifetime of heap data
    makes the question of when to de-allocate it a difficult one.

    Traditionally, liveness of heap data has been approximated by reachability. Heap data that
    is unreachable is considered garbage and is de-allocated. However, what if some data on the heap
    is reachable but never used after a certain program point? That heap data should also be
    treated as garbage and de-allocated. Current analysis techniques are not powerful
    enough to find such data. Solving this problem and implementing the solution in GCC is our main
    motivation.

    1.2. The solution

    We perform a static analysis of the program, extracting properties of heap data accesses, and
    find the data that is unused beyond each program point. We set all references to this heap data
    to null. The data is then unreachable and will be collected by conventional garbage collectors.
    This is known as Cedar Mesa folk wisdom. It is done by analysing four properties of heap
    references: explicit liveness, aliasing, availability and anticipability. In accordance with
    these analyses, null assignments are decided upon and checked for safety and profitability.

    However, we limit ourselves to explicit liveness analysis in this project. We implement
    explicit liveness analysis in GCC as an implementation of our approach. GCC is a widely used
    compiler that supports many front-end languages and back-end machines. GCC also provides a
    good API for inspecting and manipulating the program, which is why we chose it for our
    analysis. Since the runtime environment of a C program does not guarantee a garbage
    collector, we have to explicitly free the memory when all aliases to an object are nullified.
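    The combination of a null assignment and an explicit free can be pictured with a small C sketch. This is only an illustration under assumptions, not the GCC pass described in this report: the type struct node and the functions push_front and sum_and_release are hypothetical names introduced here.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical singly linked list node; not taken from the report. */
struct node {
    int data;
    struct node *next;
};

/* Hypothetical helper: prepend a value; returns the old head if malloc fails. */
struct node *push_front(struct node *head, int value)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return head;
    n->data = value;
    n->next = head;
    return n;
}

/* Sums the list. Once cur has advanced, the node just visited is never used
 * again, so its link is nullified and, because C has no garbage collector,
 * the node is freed explicitly. */
int sum_and_release(struct node **head)
{
    assert(head != NULL);
    int sum = 0;
    struct node *cur = *head;
    *head = NULL;               /* the list is never accessed through *head again */
    while (cur != NULL) {
        struct node *dead = cur;
        sum += cur->data;
        cur = cur->next;
        dead->next = NULL;      /* null assignment: this link is dead from here on */
        free(dead);             /* explicit release of the now unreachable node */
    }
    return sum;
}
```

    A reachability-based collector would keep every node alive for as long as the head pointer exists; here each node is released right after its last use.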

    1.2.1. Illustrative Example

    We present an example to illustrate our approach. Figure 1-1(a) shows a program operating
    on the heap. Figure 1-1(b) shows the memory graph. Root variables are on the stack and the
    actual objects corresponding to the root variables are in the heap. The heap is represented as a
    directed graph with entry nodes on the stack; objects are represented as nodes, and links, i.e.
    references, are represented as directed edges. Before the execution of line 5, w always refers
    to ma, as represented by the solid edge. Depending on whether the while loop executes zero
    times, once, twice or thrice, x refers to ma, mb, mc or md, as represented by the dashed edges.
    Similarly, y refers to mi, mf, mg or me. mk is an unreachable object, while the variable z does
    not refer to the heap and is ignored.

    A conventional copying collector will preserve all nodes except mk. However, only a few of
    them are used beyond line 5. The modified program makes the unused nodes unreachable by
    nullifying the relevant links. The modifications in the program are general enough to nullify
    the appropriate links for any number of iterations of the loop. Observe that a null assignment
    has also been inserted within the loop body, thereby making some memory unreachable in each
    iteration of the loop.

    Figure 1-1: Motivating Example of HRA

    Courtesy: [5]

    1.3. Related work

    The theoretical basis of our work, which includes the heap reference analysis schema and the
    proofs of correctness of the analysis, is due to Khedker et al. [5].

    1.4. Challenges

    A program accesses data through expressions that have l-values; these are called access
    expressions. They can be scalar variables such as x or reference expressions such as x.lptr.rptr.
    An analysis of the program reasons about data and hence needs to know the binding of an access
    expression to data, i.e. it must answer the question: what are the different bindings of an
    access expression to any object o on the heap at a program point p along different possible
    program paths? The precision of the analysis depends on the precision of the answer to this
    question.

    When the access expressions are simple and correspond to static data, answering the above
    question is often easy because the mapping of access expressions to l-values remains fixed in a
    given scope throughout the execution of a program. However, in the case of reference
    expressions, the mapping between an access expression and its l-value is likely to change
    during execution. Observe that manipulation of the heap is nothing but changing the mapping
    between reference expressions and their l-values. For example, in Figure 1-1, the access
    expression x.lptr refers to mi when the execution reaches line 2 and may refer to mi, mf, mg,
    or me at line 4. This implies that, subject to type compatibility, any access expression can
    correspond to any heap data, making it difficult to answer the question mentioned above. All
    of this makes the analysis of programs involving heaps difficult.

    1.5. Contributions

    This project would be the first complete implementation of heap reference analysis in GCC.
    It contributes both to heap reference analysis, by providing its first implementation, and to
    GCC, which is an open source compiler, by adding this analysis to it.

    1.6. Organization of the report

    Chapter 2 discusses data flow analysis techniques in general. Chapter 3 uses these
    techniques for explicit liveness analysis of the heap. Chapter 4 gives an overview of
    GCC. Chapter 5 describes the interfacing with GCC. Chapter 6 covers the implementation of the
    access graph and its associated operations. Chapter 7 describes the implementation of
    explicit liveness analysis of the heap.

    2. Data Flow Analysis

    Data flow analysis¹ is an important technique for program analysis. It gathers information
    about the flow of data, regarding a particular property, at various points
    in a computer program. The information gathered is often used for validating a program or by
    compilers when optimizing a program.

    2.1. Program analysis

    Program analysis techniques analyze a particular program with respect to some property.

    Program analyses cover a large spectrum of motivations, basic principles, and methods.

    Different approaches to program analysis differ in details but at a conceptual level, almost all

    program analyses are characterized by some common properties. Although these properties

    are abstract, they provide useful insights about a particular analysis. A deeper understanding of

    the analysis would require exploring many more analysis-specific details.

    Program analysis can be used to determine the validity of a program, to understand the
    behaviour of a program or to transform and optimize a program. Some common paradigms of
    program analysis are inference systems, constraint resolution systems, model checking and
    abstract interpretation. Data flow analysis is a program analysis technique based on
    constraint resolution.

    2.2. Data flow analysis abstraction

    Data flow analysis statically computes information about the flow of data (i.e., uses and

    definitions of data) for each program point in the program being analyzed. This information is

    required to be a safe approximation of the desired properties of the run time behaviour of the

    program during each possible execution of that program point on all possible inputs.

    The state of a program at a particular time may be regarded as consisting of the values of its
    various data objects. The execution of a program can be viewed as a series of transformations
    of the program state. Each execution of an intermediate-code statement transforms an input
    state to a new output state. The input state is associated with the program point before the
    statement and the output state is associated with the program point after the statement.

    ¹ Based on [1] and [4].

    When we analyze the behaviour of a program, we must consider all the possible sequences of
    program points, i.e. paths through a flow graph, that the program execution can take. We then
    extract, from the possible program states at each point, the information we need for the
    particular data-flow analysis problem we want to solve. In general, there is an infinite number
    of possible execution paths through a program, and there is no finite upper bound on the length
    of an execution path. Program analyses summarize all the possible program states that can
    occur at a point in the program with a finite set of facts. Different analyses may choose to
    abstract out different information, and in general, no analysis is necessarily a perfect
    representation of the state.

    Illustration:

    Consider the program given below.

    1: a = 5;
    2: while (is_stop()) {
    3:     a = 13;
    4: }
    5: if (a == 13)
    6:     b = a;
    7: else
    8:     b = 9;
    9: return b;

    Figure 2-1: A code to illustrate DFA

    What values can a have at program point 5? Answering this question seems
    difficult because there is an infinite number of execution paths reaching program point 5.
    However, in data-flow analysis we do not distinguish among the paths taken to reach a
    program point. Moreover, we do not keep track of entire states; rather, we abstract out certain
    details, keeping only the data we need for the purpose of the analysis. Summarizing all
    program states at program point 5, a can have the values {5, 13}. Different data flow analyses
    also collect different information: reaching definitions analysis says that the definition set
    {1, 3} reaches point 5, while constant folding detects that a cannot be treated as a constant at
    point 5.

    2.3. Data flow analysis schema

    In each application of data-flow analysis, we associate with every program point a data-flow
    value that represents an abstraction of the set of all possible program states that can be
    observed at that point. We denote the data-flow values before and after each statement s
    by IN[s] and OUT[s], respectively. The data-flow problem is to find a solution to a set of
    constraints on the IN[s]'s and OUT[s]'s, for all statements s. There are two sets of
    constraints: those based on the semantics of the statements (transfer functions) and those
    based on the flow of control.

    The transfer function depends on the semantics of the statement and on the analysis being
    performed. In a forward-flow problem, the transfer function f_s for statement s converts a
    data-flow value before the statement to a new data-flow value after the statement. That is,

    $$\mathrm{OUT}[s] = f_s(\mathrm{IN}[s]) \qquad (2.1)$$

    Conversely, in a backward-flow problem, the transfer function f_s for statement s converts a
    data-flow value after the statement to a new data-flow value before the statement. That is,

    $$\mathrm{IN}[s] = f_s(\mathrm{OUT}[s]) \qquad (2.2)$$

    Control flow constraints are derived from the flow of control, which is explicitly
    represented in a program flow graph. In a forward-flow problem, the control flow constraint,
    where $\bigcup$ is the confluence function and pred(s) is the set of predecessors of s, is

    $$\mathrm{IN}[s] = \bigcup_{p \in \mathrm{pred}(s)} \mathrm{OUT}[p] \qquad (2.3)$$

    In a backward-flow problem, with succ(s) the set of successors of s, the constraint is

    $$\mathrm{OUT}[s] = \bigcup_{p \in \mathrm{succ}(s)} \mathrm{IN}[p] \qquad (2.4)$$

    Illustration:

    Consider the program in Figure 2-1. While performing reaching definitions analysis of a,
    consider the transfer function of statement 3. On the first pass the IN set of statement 3 is
    the definition set {1}; since statement 3 redefines a, it kills definition 1 and the OUT set is
    {3}. Both definitions then meet at the loop header, which is why {1, 3} reaches point 5.

    Now consider the control flow constraint at point 9 for the definitions of b. The program flow
    graph indicates two predecessors: 6 with OUT set {6} and 8 with OUT set {8}. The IN set of 9 is
    the union of the sets at 6 and 8, which is {6, 8}.

    Unlike linear arithmetic equations, the data-flow equations usually do not have a unique

    solution. Our goal is to find the most "precise" solution that satisfies the two sets of

    constraints. That is, we need a solution that encourages valid code improvements, but does not

    justify unsafe transformations.

    The general method of solving the above constraints is to initialize the IN and OUT sets and
    then traverse the program either with or against the control flow, satisfying the equations.
    The program is traversed iteratively until no further changes are made to the IN and OUT sets.
    The general algorithm for a forward-flow problem is,

    OUT[entry] = {initialization};
    for (each statement s other than entry)
        OUT[s] = {initialization};
    while (changes to any OUT occur)
        for (each statement s other than entry) {
            IN[s] = union of OUT[p] over every predecessor p of s;
            OUT[s] = f_s(IN[s]);
        }

    Figure 2-2: General algorithm for DFA
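    As a concrete instance of this iterative scheme, the following C sketch runs reaching definitions on the flow graph of Figure 2-1. It is a minimal illustration under assumptions: the bitmask encoding, the tables pred_mask, gen_set and kill_set, and the function reaching_definitions are choices made here and do not appear in the report. The transfer function is the usual gen/kill form, OUT[s] = gen[s] ∪ (IN[s] − kill[s]), and all four definitions (lines 1, 3, 6 and 8) are tracked together.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_NODES 10            /* node 0 is Entry; nodes 1..9 are the lines of Figure 2-1 */
#define BIT(n) (1u << (n))

/* pred_mask[s]: bit p is set when node p is a predecessor of node s.
 * Lines 4 and 7 hold no statement, so they get no edges. */
static const unsigned pred_mask[NUM_NODES] = {
    [1] = BIT(0),
    [2] = BIT(1) | BIT(3),      /* loop header: from line 1 and from the loop body */
    [3] = BIT(2),
    [5] = BIT(2),
    [6] = BIT(5),
    [8] = BIT(5),
    [9] = BIT(6) | BIT(8),
};

/* A definition is named by the line that makes it: lines 1 and 3 define a,
 * lines 6 and 8 define b. */
static const unsigned gen_set[NUM_NODES]  = { [1] = BIT(1), [3] = BIT(3), [6] = BIT(6), [8] = BIT(8) };
static const unsigned kill_set[NUM_NODES] = { [1] = BIT(3), [3] = BIT(1), [6] = BIT(8), [8] = BIT(6) };

/* Iterates the forward-flow equations until no OUT set changes. */
void reaching_definitions(unsigned *in, unsigned *out)
{
    assert(in != NULL && out != NULL);
    for (int s = 0; s < NUM_NODES; s++) {
        in[s] = 0;
        out[s] = 0;
    }
    bool changed = true;
    while (changed) {
        changed = false;
        for (int s = 1; s < NUM_NODES; s++) {
            unsigned new_in = 0;
            for (int p = 0; p < NUM_NODES; p++)
                if (pred_mask[s] & BIT(p))
                    new_in |= out[p];                 /* confluence: union over predecessors */
            unsigned new_out = gen_set[s] | (new_in & ~kill_set[s]);
            if (new_out != out[s])
                changed = true;
            in[s] = new_in;
            out[s] = new_out;
        }
    }
}
```

    With this encoding the fixed point is reached after three passes over the statements. Definitions 1 and 3 both reach line 5, and at line 9 the definitions of b that arrive are 6 and 8, alongside the two definitions of a.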

    3. Explicit Liveness Analysis of Heap

    The method is based on liveness of links for a particular object. The links which are used

    beyond a program point are live while those not used are dead and can be set to null. Here we

    develop a method for liveness analysis of heap data. We define liveness of heap references,

    devise a bounded representation called an access graph for liveness, and then propose a data

    flow analysis for discovering liveness. The method is flow sensitive but context insensitive since

    we take into account flow of control but approximate interprocedural information.

    3.1. Program to be analysed

    The analysis is context insensitive, so we do not maintain a call graph and work only on the
    program flow graph. The program flow graph has a unique Entry node and a unique Exit node. Each
    statement forms a basic block. All complex statements are broken down, and the resulting
    simple statements fall into the following categories:

    Assignment Statements: These are assignments to references and are denoted by x = y, where
    x and y are access expressions whose frontiers are references. Only these statements can
    modify the structure of the heap.

    Function Calls: These are function call statements that involve access expressions in their
    arguments, such as x = f(y, z, ...).

    Use Statements: These statements use heap references to access heap data but do not modify
    heap references. They involve access expressions whose frontiers are not references, such as
    x.data = y.data + z.data.

    Return Statements: These are return statements involving an access expression, such as return x.

    Other Statements: These include all statements that do not refer to the heap. We
    ignore these statements since they do not influence heap reference analysis.
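    In C terms, the categories above might look like the following sketch. The type struct node, its fields, the helper pick and the function classify_demo are hypothetical and serve only to show one statement of each kind; they are not part of the analysis itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical heap object with one data field and two reference fields. */
struct node {
    int data;
    struct node *lptr;
    struct node *rptr;
};

/* Hypothetical callee used for the function call category. */
static struct node *pick(struct node *a, struct node *b)
{
    return a != NULL ? a : b;
}

struct node *classify_demo(struct node *x, struct node *y, struct node *z)
{
    assert(x != NULL && y != NULL && z != NULL);
    int count = 0;                  /* other statement: does not refer to the heap */
    x->lptr = y;                    /* assignment statement: both frontiers are references */
    x->rptr = pick(y, z);           /* function call with access expressions as arguments */
    x->data = y->data + z->data;    /* use statement: frontiers are not references */
    count++;                        /* other statement */
    (void)count;
    return x->lptr;                 /* return statement involving an access expression */
}
```

    Only the two statements that write x->lptr and x->rptr change the shape of the heap; the use statement reads and writes data fields but leaves every link as it was.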

    3.2. Capturing liveness of heap

    Capturing liveness of the heap at a program point p means finding all objects that can be
    accessed in the program after p. Links are the means of accessing an object on the
    heap. Thus, if we capture the links used after program point p, we capture the live objects,
    since an object is live if at least one link to it is live. A link l can be used in two
    different ways: it may be dereferenced to access an object or tested in a comparison. An
    erroneous nullification of l would affect the two uses in different ways: dereferencing l would
    result in an exception being raised, whereas testing l in a comparison may alter the result of
    the condition and thereby the execution path. Links are accessed in a program using access
    expressions, as these contain heap references. Thus, by considering the access expressions used
    after program point p, we can capture live links, thereby capturing live objects on the heap.
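    The two kinds of use of a link can be made concrete with a short C sketch. The type struct tree and the functions has_left_child and left_data are hypothetical names chosen for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical binary tree node. */
struct tree {
    int data;
    struct tree *left;
    struct tree *right;
};

/* Use by comparison: the link t->left is tested, not followed. */
int has_left_child(const struct tree *t)
{
    assert(t != NULL);
    return t->left != NULL;
}

/* Use by dereference: the link t->left is followed to reach the child.
 * The caller must ensure the link is not null. */
int left_data(const struct tree *t)
{
    assert(t != NULL && t->left != NULL);
    return t->left->data;
}
```

    If t->left were nullified while still live, has_left_child would quietly return a different answer and could send execution down another branch, whereas left_data would fail outright. Both outcomes are why live links must be excluded from nullification.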

    Illustration:

    Consider the following program, in which root is a binary tree with left and right as its children:

    1: binary_tree root;
    2: root = set_binary_tree();
    3: aliased_root = root;
    4:
    5: return root.left.data;

    Figure 3-1: Capturing live objects on the heap

    At program point 4, what is the liveness of the heap? The access expression root.left.data
    is used in statement 5, hence the link between root and left (denoted as root→left) in the
    memory graph is live. Thus we say that the left child of the binary tree root is live, and
    since the right child does not have any live link, it is dead.

    We now need to capture the liveness of links in a memory graph, which we do using access paths.
    Access paths denote links in a memory graph. The next section describes the
    approach in detail.

    3.3. Capturing liveness using access paths

    3.3.1. Access paths

    As discussed above, in order to discover liveness and other properties of the heap, we need a
    way of naming links in the memory graph. We do this using access paths. An access path is a root
    variable name followed by a sequence of zero or more field names and is denoted by
    ρx ≡ x→f1→f2→...→fk. Since an access path represents a path in a memory graph, it can be used
    for naming links and nodes. An access path consisting of just a root variable name is called a
    simple access path; it represents a path consisting of a single link corresponding to the root
    variable. E denotes an empty access path. The last field name in an access path ρ is called its
    frontier and is denoted by Frontier(ρ). The frontier of a simple access path is the root
    variable name. The access path corresponding to the longest sequence of names in ρ excluding
    its frontier is called its base and is denoted by Base(ρ). The base of a simple access path is
    the empty access path. The object reached by traversing an access path ρ is called the target
    of the access path and is denoted by Target(ρ). When we use an access path ρ to refer to a link
    in a memory graph, it denotes the last link in ρ, that is, the link corresponding to
    Frontier(ρ).

    Illustration:

    Consider Figure 3-1. For the access path ρ ≡ root→left at program point 3, Base(ρ) is root,
    Frontier(ρ) is the link root→left, and Target(ρ) is the left child of root.

    As explained earlier, Figure 1-1(b) is the superimposition of the memory graphs that can result
    before line 5 for different executions of the program. For the access path ρx ≡ x→lptr→lptr,
    depending on whether the while loop is executed 0, 1, 2, or 3 times, Target(ρx) denotes
    node mj, mh, mm, or ml. Frontier(ρx) denotes one of the links mi→mj, mf→mh, mg→mm
    or me→ml. Base(ρx) represents one of the following paths in the heap memory: x→ma→mi,
    x→mb→mf, x→mc→mg or x→md→me.

    In the rest of the report, denotes an access expression, denotes an access path and

    denotes a (possibly empty) sequence of field names separated by . Let the access expression

    xbe xf1f2 fn. Then, the corresponding access path xis xf1f2 fn. When the

    root variable name is not required, we drop the subscripts from xandx.

    3.3.2. Liveness of access paths

    Now we need to define liveness of access paths. For a link l to be live there must be at least
    one access path from some root variable to l such that every link in this path is live. This is
    the path that is actually traversed while using l. An access path is defined to be live at p if
    the link corresponding to its frontier is live along some path starting at p. Safety of null
    assignments requires that the access paths which are live are excluded from nullification.

    We initially limit ourselves to a subset of live access paths, whose liveness can be determined

    without taking into account the aliases created before p. These access paths are live solely

    because of the execution of the program beyond p. We call access paths that are live in this
    sense explicitly live access paths. An interesting property of explicitly live access paths is
    that they form the minimal set covering every live link.

    Illustration:

    Consider the program in Figure 3-1. At program point 4, the left child of root is accessed and
    hence is live. The access path used in the program is root→left and hence it is live. Even
    though the access path aliased_root→left is not used after statement 4, its frontier link,
    i.e. the link between the object pointed to by root and its left child, is live. Here we say
    that root→left is explicitly live since all its links are actually used in the program,
    whereas aliased_root→left is not explicitly live. We also notice that the link aliased_root
    (from the variable aliased_root on the stack to the root object on the heap) is never used.

    We now focus on developing a data flow analysis technique based on capturing liveness using
    access paths.

    3.3.3. Using access paths to capture liveness

    We now look at how statement semantics affect the liveness of access paths, and thus derive
    flow constraints in the form of flow functions. Liveness analysis is a backward flow analysis.
    A statement can affect the set of live access paths in the following ways. Here ELIn denotes
    the set of access paths live at the entry of a statement and ELOut denotes the set of access
    paths live at its exit.

    Let us see the effect through an illustration.

    Illustration:

    Consider the program fragment,

    1:
    2: x.lptr.rptr = y.rptr.lptr;
    3: print (x.lptr.rptr.lptr.data);

    Figure 3-2: Computation of ELIn and ELOut

    The ELOut of statement 2 above is {x→lptr→rptr→lptr}.

    The access path x→lptr→rptr is being modified, rendering the value before the statement
    useless. Hence access paths with prefix x→lptr→rptr cease to be live before the statement.
    Such access paths are referred to as killed access paths. In this case the killed set is
    {x→lptr→rptr→lptr}.


    Objects with access paths x→lptr and y→rptr are directly accessed. These access paths
    become live. Such access paths are referred to as directly generated access paths.

    Here y.rptr.lptr is being assigned to x.lptr.rptr. Thus the objects accessed using
    x→lptr→rptr→{some_path} after the statement must be accessible using
    y→rptr→lptr→{some_path} before the assignment. Such access paths are referred to as
    transferred access paths. Thus the transferred access paths are {y→rptr→lptr→lptr}.

    The final set of live access paths can be computed by removing the killed access paths from
    ELOut and adding the directly generated and transferred access paths.

    Thus the final ELIn of statement 2 is {x→lptr, y→rptr→lptr→lptr}.

    Formalizing the above observations,

    Killed Access Paths: These are the access paths that cease to be live before the statement
    because the access path is modified by the statement, invalidating the value previously
    assigned to it. Access paths that are live after the assignment and are not killed by it are
    also live before the assignment.

    Directly Generated Access Paths: These are the access paths directly used in a statement, which
    hence become live before the statement.

    Transferred Access Paths: These are the access paths whose liveness gets transferred from one
    access path to another due to an assignment statement. This takes into account the change in
    the bindings of an access expression.

    Finally, the ELIn set is computed from the ELOut set as,

    ELIn = (ELOut − Killed access paths)
           ∪ (Directly generated access paths ∪ Transferred access paths)    (3.1)
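    Equation (3.1) can be tried out on the illustration above with a small sketch. It is not the
    implementation used in this project: access paths are held as plain strings such as
    "x->lptr->rptr", and PathSet, set_add, set_contains and el_in are hypothetical names chosen for
    this example only. The sketch assumes an assignment of the form lhs = rhs.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_PATHS 16
#define MAX_LEN   64

/* A small set of access paths written as strings such as "x->lptr->rptr". */
typedef struct {
    char paths[MAX_PATHS][MAX_LEN];
    int count;
} PathSet;

/* True if `prefix` is a whole-name prefix of `path`. */
static int has_prefix(const char *path, const char *prefix) {
    size_t n = strlen(prefix);
    return strncmp(path, prefix, n) == 0 &&
           (path[n] == '\0' || strncmp(path + n, "->", 2) == 0);
}

int set_contains(const PathSet *s, const char *path) {
    for (int i = 0; i < s->count; i++)
        if (strcmp(s->paths[i], path) == 0) return 1;
    return 0;
}

void set_add(PathSet *s, const char *path) {
    if (set_contains(s, path)) return;
    assert(s->count < MAX_PATHS && strlen(path) < MAX_LEN);
    strcpy(s->paths[s->count++], path);
}

/* ELIn = (ELOut - killed) U direct U transferred for "lhs = rhs".
   lhs_base and rhs_base are the bases of the two access expressions. */
PathSet el_in(const PathSet *el_out, const char *lhs, const char *rhs,
              const char *lhs_base, const char *rhs_base) {
    PathSet in = { .count = 0 };
    char buf[2 * MAX_LEN];
    for (int i = 0; i < el_out->count; i++) {
        const char *p = el_out->paths[i];
        if (!has_prefix(p, lhs)) {
            set_add(&in, p);              /* survives: not killed */
        } else {
            /* killed; its suffix is transferred onto rhs */
            snprintf(buf, sizeof buf, "%s%s", rhs, p + strlen(lhs));
            set_add(&in, buf);
        }
    }
    set_add(&in, lhs_base);               /* directly generated */
    set_add(&in, rhs_base);
    return in;
}
```

    Calling el_in with ELOut = {x->lptr->rptr->lptr}, lhs "x->lptr->rptr", rhs "y->rptr->lptr" and
    bases "x->lptr" and "y->rptr" yields x->lptr, y->rptr and y->rptr->lptr->lptr. The extra entry
    y->rptr is a prefix of the transferred path, so the result agrees with the ELIn computed above.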

    3.3.4. Liveness analysis schema

    Now we define the liveness analysis schema using access paths. We also describe the control
    flow constraints on the data flow equations.


    Explicit Liveness: The set of explicitly live access paths at a program point p, denoted by
    Liveness_p, is defined as follows:

    Liveness_p = ∪ { PathLiveness_p^ψ : ψ ∈ Paths(p) }    (3.2)

    where ψ ∈ Paths(p) is a control flow path from p to Exit and PathLiveness_p^ψ denotes the
    liveness at p along ψ.

    Path Liveness: If p is not the program exit, then let the statement that follows it be denoted
    by s and the program point immediately following s be denoted by p'. Then,

    PathLiveness_p^ψ = ∅                                         if p is Exit
    PathLiveness_p^ψ = StatementLiveness_s(PathLiveness_p'^ψ)    otherwise    (3.3)

    Statement Liveness: The flow function is defined as:

    StatementLiveness_s(X) = (X − LKill_s) ∪ LDirect_s ∪ LTransfer_s(X)    (3.4)

    LKill_s denotes the set of access paths that cease to be live before statement s, LDirect_s
    denotes the set of access paths that become live due to the local effect of s and
    LTransfer_s(X) denotes the set of access paths which become live before s due to the transfer
    of liveness from live access paths after s.

    Illustration:

    The flow functions are explained later in section 3.4.3.

    The flow functions are defined as,

    Figure 3-3: Flow functions for liveness

    Courtesy: [5]

    The definitions of LKill_s, LDirect_s and LTransfer_s(X) ensure that Liveness_p is
    prefix-closed.


    3.3.5. Difficulties

    3.3.5.1. Unbounded access paths:

    Access paths cannot be guaranteed to be bounded in case of loops and thus termination

    cannot be guaranteed.
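    The growth can be reproduced with a small sketch. It assumes a loop body consisting of the
    single statement x = x.n and an access path x->ptr that is live after the loop. The function
    names transfer_x_eq_x_n and path_length_after are hypothetical and serve this example only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Backward transfer of the statement "x = x.n": a path x->S that is live
   after the statement corresponds to x->n->S before it.
   `path` must start with "x"; the result is written to `out`. */
void transfer_x_eq_x_n(const char *path, char *out, size_t out_size) {
    assert(path[0] == 'x');
    snprintf(out, out_size, "x->n%s", path + 1);
}

/* Applies the transfer `rounds` times, starting from "x->ptr",
   and returns the length of the resulting access path string. */
size_t path_length_after(int rounds) {
    char cur[256] = "x->ptr";
    char next[256];
    for (int i = 0; i < rounds; i++) {
        transfer_x_eq_x_n(cur, next, sizeof next);
        strcpy(cur, next);
    }
    return strlen(cur);
}
```

    Every additional round of the loop lengthens the path by one more n, so an analysis that
    stores access paths explicitly never reaches a fixed point on such a loop.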

    Illustration:

    1:
    2: while (is_stop()) {
    3: x = x.n;
    4: }
    5: print (x.ptr.data);

    Figure 3-4: Unbounded access path example

    During 1st iteration: ELOut at 3 is {x→ptr}, ELIn at 3 is {x→n→ptr}
    During 2nd iteration: ELOut at 3 is {x→n→ptr}, ELIn at 3 is {x→n→n→ptr}
    During nth iteration: ELOut at 3 is {x→n[n-1 times]→ptr}, ELIn at 3 is {x→n[n times]→ptr}

    Hence a way to summarize access paths is needed.

    3.3.5.2. Data Flow Equations

    The data flow equations above define the MoP (meet over paths) solution. Since the number of
    control flow paths may be unbounded in the presence of loops, they are not suitable for data
    flow analysis. We need to define equations for the MFP (maximum fixed point) solution.

    3.4. Capturing liveness using access graphs

    In the presence of loops, the set of access paths may be infinite and the lengths of access
    paths may be unbounded. This problem is solved by representing a set of access paths by a
    graph of bounded size.

    3.4.1. Access Graphs

    A set of access paths can be represented using access graphs. An access graph, denoted by G_v,
    is a directed graph ⟨n0, N, E⟩ representing a set of access paths starting from a root
    variable v. N is the set of nodes, n0 ∈ N is the entry node with no in-edges and E is the set
    of edges. Every path in the graph represents an access path. The empty graph EG has no nodes
    or edges and does not accept any access path.


    The entry node of an access graph is labelled with the name of the root variable, while the
    non-entry nodes are labelled with a unique label created as follows: if a field name f is
    referenced in basic block b, we create an access graph node with a label ⟨f, b, i⟩ (see
    footnote 2), where i is the instance number used for distinguishing multiple occurrences of
    the field name f in block b. Note that this implies that nodes with the same label are treated
    as identical. Access paths of the form ρ_x→* are represented by including a summary node,
    denoted n*, with a self loop over it. It is distinct from all other nodes but matches the
    field name of any other node. A node in the access graph represents one or more links in the
    memory graph.

    Illustration:

    1:
    2: x.lptr.rptr = y.rptr.lptr;
    3: print (x.lptr.rptr.lptr.data);
    4: print (y.rptr.obj1.data);

    The live access paths at each point, represented using both access paths and access graphs, are:

    Program point    Set of live access paths                     Access graphs
    OUT set at 4     NULL
    IN set at 4      y→rptr→obj1
    OUT set at 3     y→rptr→obj1
    IN set at 3      y→rptr→obj1, x→lptr→rptr→lptr
    OUT set at 2     y→rptr→obj1, x→lptr→rptr→lptr
    IN set at 2      x→lptr, y→rptr→lptr→lptr, y→rptr→obj1

    Figure 3-5: Set of access paths represented using access graphs

    2 In the implementation, the label is ⟨f, b, s⟩, where s is the statement number in a basic
    block b created by GCC.

    Access graphs solve the problem of infinite access paths by summarization. Summarization in
    access graphs is achieved by merging appropriate nodes in access graphs, retaining all the in
    and out edges of the merged nodes. The technique is illustrated below.

    Illustration:

    Consider the program flow graph shown,

    Figure 3-6: Summarization in access graphs

    Courtesy:[5]

    Node n1 in access graph G1 indicates references of n at different execution instances of the
    same program point. Every time this program point is visited during analysis, the same state is
    reached in that the pattern of references after n1 is repeated. Thus all occurrences of n1 are
    merged into a single state. This creates a cycle which captures the repeating pattern of
    references.

    In G2, nodes n1 and n2 indicate referencing n at different program points. Since the
    references made after these program points may be different, n1 and n2 are not merged.
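    The effect of merging nodes that carry the same label can be sketched as follows. This is a
    simplified illustration, not the access graph library of this project: the type Graph, the
    functions node_for and add_path, and the label strings such as "n@3" (field n referenced at
    statement 3) are assumptions made for this example.

```c
#include <assert.h>
#include <string.h>

#define MAX_NODES 8

/* Hypothetical, simplified access graph: nodes are identified by label and
   edges are kept in an adjacency matrix. */
typedef struct {
    const char *label[MAX_NODES];
    int edge[MAX_NODES][MAX_NODES];
    int count;
} Graph;

/* Returns the index of the node with this label, creating it if needed.
   Reusing the existing node is what merges repeated occurrences. */
int node_for(Graph *g, const char *label) {
    for (int i = 0; i < g->count; i++)
        if (strcmp(g->label[i], label) == 0) return i;
    assert(g->count < MAX_NODES);
    g->label[g->count] = label;
    return g->count++;
}

/* Adds one access path, given as a sequence of node labels. */
void add_path(Graph *g, const char *labels[], int len) {
    int prev = -1;
    for (int i = 0; i < len; i++) {
        int cur = node_for(g, labels[i]);
        if (prev >= 0) g->edge[prev][cur] = 1;
        prev = cur;
    }
}
```

    Adding the paths x, n@3, ptr@5 and x, n@3, n@3, n@3, ptr@5 to a zero-initialized Graph leaves
    only three nodes; the repeated n@3 collapses into one node with a self loop, so the graph
    stays bounded however many times the field is traversed.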

    Some operations defined on access graphs are listed below. The complete formal definitions of
    these and other graph functions are described in Chapter 0.

    G(ρ)                 Constructs the access graph corresponding to ρ
    Path Removal (⊖)     The operation G ⊖ ρ removes those access paths in G that have ρ as a prefix
    lastNode(G)          Returns the last node of a linear graph G
    Union (U)            G U G' combines access graphs G and G' such that any access path
                         contained in G or G' is contained in the resulting graph
    Factorization (/)    G / (G', M) returns all remainder graphs in G starting from the nodes in G
                         corresponding to M in G'
    Extension (#)        (G, M) # R returns the graph G extended by the remainder graphs in R at
                         the nodes in M

    3.4.2. Liveness representation using access graphs

    A set of access paths can be represented using access graphs. Every path in the graph
    represents an access path. All the access paths present in an access graph are treated as
    live. Summarization may therefore include access paths that are not actually live; this
    approximation is safe, since it can only prevent some nullifications.

    3.4.3. Capturing liveness using access graphs

    We now look at how statement semantics affect the liveness of access paths, and thus derive
    flow constraints in the form of flow functions. Liveness analysis is a backward flow analysis.
    The effect of a statement on the access path set depends on its type, as explained below.
    Here ELIn denotes the set of live access paths at the entry of a statement and ELOut denotes
    the set at its exit.

    3.4.3.1. Assignment statement

    An assignment statement is of the form: α_x = α_y


    We have seen how an assignment statement affects the liveness of the heap in the illustration
    in section 3.3.3. Now we will see how to capture these effects using access graphs.

    Illustration:

    Consider the program statement,

    5: x.left.right = y.right.left.right

    The access path x→left→right gets modified. So we have to remove all access paths with
    x→left→right as prefix. Hence the killed access paths are {x→left→right}.

    The access paths x→left and y→right→left are generated. Thus the base of each directly used
    access expression is generated.

    Some access paths are to be transferred from x→left→right to y→right→left→right. The access
    paths from the access graph of x with prefix x→left→right have to be copied as remainder
    graphs using graph factorization and then attached to the access graph of y with prefix
    y→right→left→right using graph extension.

    Formalizing the above observations,

    Figure 3-7: Liveness capturing equations for assignment statement

    In the above equations, G_x and G_y denote G(ρ_x) and G(ρ_y) respectively, whereas M_x and M_y
    denote lastNode(G(ρ_x)) and lastNode(G(ρ_y)) respectively.

    3.4.3.2. Function call

    A function call is of the form: α_x = f(α_y)


    We conservatively assume that a function call may make any access path rooted at y or any

    global reference variable live. Thus, this version of our analysis is context insensitive.

    Illustration:

    Consider the program statement, with global variable z,

    5: x.left.right = func (y.right);

    The access path x→left→right gets modified. So we have to remove all access paths with
    x→left→right as prefix. Hence the killed access paths are {x→left→right}.

    The access paths x→left and y are directly accessed and hence are directly generated.

    The access path y→right is passed as a parameter to the function, so any access path beyond
    y→right may be accessed. Thus we conservatively approximate that the generated access path is
    {y→right→n*}. Similarly, any access path from a global variable may be accessed, so we
    conservatively assume that the generated access path is {z→n*}.

    Formalizing the above observations,

    Figure 3-8: Liveness capturing equations for function call statement

    3.4.3.3. Return statement

    A return statement is of the form: return α_x

    Illustration:

    Consider the program statement, with global variable z,

    5: return x.left;

    The access path x→left is directly accessed and hence is directly generated.

    The access path x→left is passed as a return value to the calling function, so any access
    path beyond x→left may be accessed. Thus we conservatively approximate that the generated
    access path is {x→left→n*}. Similarly, any access path from a global variable may be accessed,
    so we conservatively assume that the generated access path is {z→n*}.

    Formalizing the above observations,

    Figure 3-9: Liveness capturing equations for return statement

    3.4.3.4. Use statement

    Illustration:

    Consider the program statement, with global variable z,

    5: x.left.data = y.right.data + z.left.right.data;

    The access paths x→left, y→right and z→left→right are directly accessed and hence are directly
    generated.

    Formalizing the above observations,

    Figure 3-10: Liveness capturing equations for use statement

    3.4.4. Liveness analysis schema revisited

    We now define the liveness analysis schema using access graphs, and describe the control flow
    constraints on the data flow equations.

    To compute the liveness ELIn at the entry of a statement, we remove the killed access paths
    from ELOut and add the directly generated and transferred access paths. To compute ELOut, we
    merge the access paths present in the ELIn of the successors of the statement.

    Illustration:

    We illustrate the ELIn computation for the examples used to illustrate the effect of each
    statement type on access graphs.

    Illustration in section           OUT set    IN set
    3.4.3.1 (Assignment statement)
    3.4.3.2 (Function call)
    3.4.3.3 (Return statement)
    3.4.3.4 (Use statement)

    Figure 3-11: Computation of ELIn for section 3.4.3

    Formalizing,

    For a given root variable v, ELIn_v(i) and ELOut_v(i) denote the access graphs representing the
    explicitly live access paths at the entry and exit of statement i. We use EG as the initial
    value for ELIn_v(i) and ELOut_v(i).

    Figure 3-12: ELIn and ELOut definitions

    EKillPath, LDirect and LTransfer are defined according to the type of statement.

    Solving the above data flow equations, we get the solution as access graphs.


    Illustration:

    The solution of the problem described in Figure 1-1 is,

    Figure 3-13: Solution to Figure 1-1

    Courtesy:[5]

    3.5. Other analyses

    Other analyses that are required for null assignment insertion are discussed in brief below.
    Their study and implementation are not covered in this project.

    Alias analysis and complete liveness computation: This analysis discovers all aliases and thus

    finds all paths aliased to live access paths.

    Anticipability and availability analysis: This analysis discovers available and anticipable access

    paths so that insertion of new access paths does not cause exceptions.

    Null assignment insertion: Null assignment insertion is subject to safety and profitability.

    3.6. Implementation in GCC

    We have now seen the formulation of the data flow equations for heap reference analysis. The
    succeeding chapters describe the implementation of the analysis in GCC.


    4.1.4. GIMPLE

    GIMPLE is a simplified version of GENERIC, obtained by lowering GENERIC to a three-operand
    representation. Temporaries are introduced to hold the intermediate values needed to compute
    complex expressions as three-operand statements. Additionally, all the control structures used
    in GENERIC are lowered into conditional jumps.

    The compiler pass that converts GENERIC to GIMPLE is referred to as the gimplifier [7]. This
    pass works recursively, replacing each complex statement by an equivalent set of GIMPLE
    three-operand statements. These GIMPLE statements are also referred to as GIMPLE tuples.

    Earlier implementations of GIMPLE used trees as the internal data structure [3]. However, the
    tree structure was much more general than required for three-address statements, which led to
    the introduction of tuples. A tuple contains information such as the type of statement, the
    result, the operator and the operands. Operands themselves are represented as trees.

    For example,

    x= 10 would be represented as gimple_assign

    x = b+c would be represented as gimple_assign

    4.2. GCC Pass

    In order to analyze programs and perform operations on them, we need to add a pass to GCC. A
    pass is a C program that, with the help of GCC APIs, extracts information from the previous
    pass, the input program, or both, performs certain operations on that information and
    produces output that may or may not be forwarded to the next pass. The behaviour of any pass
    can be observed by looking at the dumps it produces. For example, to observe the dump
    produced by the gimplifier while compiling an input program, we can provide the switch
    -fdump-tree-gimple.

    4.2.1. Types of passes

    There are 4 types of passes, gimple_opt_pass, simple_ipa_opt_pass, ipa_opt_pass and

    rtl_opt_pass. The definitions and declarations are provided in $SOURCE/gcc/tree-pass.h. We

    will use simple_ipa_opt_pass.


    4.3. Adding a GIMPLE interprocedural pass

    In GCC, any pass is represented by a structure, in our case that structure is:

    simple_ipa_opt_pass. The declaration of this structure and detailed information about the

    fields of this structure can be found in $SOURCE/gcc/tree-pass.h. The definition of our pass

    structure is as follows:

    struct simple_ipa_opt_pass pass_empty = {
        {
            SIMPLE_IPA_PASS,    /* type of pass */
            "hra",              /* name of the pass, used in the dump switch */
            NULL,               /* condition (gate) function */
            empty_func_driver,  /* entry point */
            NULL,               /* sub passes */
            NULL,               /* next pass */
            0,                  /* static pass number */
            0,                  /* tv_id */
            0,                  /* properties required, indicated by bit position */
            0,                  /* properties provided, indicated by bit position */
            0,                  /* properties destroyed, indicated by bit position */
            0,                  /* todo flags start */
            0                   /* todo flags finish */
        }
    };

    4.3.1. Registering the pass

    We need to register our pass, i.e. our C program file, by adding it to the $SOURCE/gcc
    directory and making changes in the following files:

    1. $SOURCE/gcc/passes.c
    2. $SOURCE/gcc/tree-pass.h
    3. $SOURCE/gcc/Makefile.in

    In passes.c, we need to determine the position of the pass by adding its entry at the
    appropriate position in the pass list present in the init_optimization_passes() function. As
    our pass is a simple IPA optimization pass, we can add it where the pass pointer is set to
    point to all regular IPA passes. As it neither takes input from any previous pass nor provides
    its output to any other pass, the exact ordering is not of much importance.

    In tree-pass.h, we need to declare our pass as:

    extern struct simple_ipa_opt_pass pass_empty;

    In Makefile.in, we need to write a rule to make the target pass_name.o and add pass_name.o to
    the list of language independent object files.

    4.4. Building a compiler from GCC

    Here, our target is to build a compiler (cc1) which, given a C program as input, produces the
    corresponding assembly *.s file. The steps to build the compiler are as follows:

    1. Write a rule to make the target cc1 in the file $SOURCE/Makefile.in:

       cc1:
               make all-gcc TARGET-gcc=cc1$(exeext)

    2. Make a new build directory (hereafter $BUILD) outside the source code directory.

    3. With $BUILD as the current directory, configure it with $SOURCE/configure. Many options can
       be given while configuring, such as enable-languages, target (i.e. the target architecture
       or machine for which the generated compiler produces assembly code), the install directory
       etc.

    4. After configuring, run make with cc1 as the target. This step takes roughly 10-12 minutes
       on an average machine.

    5. After successful completion of make, the generated compiler can be used as $BUILD/gcc/cc1.
       For example, $BUILD/gcc/cc1 program.c -fdump-ipa-all compiles program.c to produce
       program.s and around 20-25 dumps of all the interprocedural passes.

    By observing the dumps, we can understand the behaviour of various passes for given input

    program. For our pass, the corresponding switch is -fdump-ipa-hra.


    5.2.2. Visiting each basic block

    In a given function, each basic block can be visited in the following manner:

    FOR_EACH_BB(BB) {
        // code to analyze each basic block here
    }

    Here, FOR_EACH_BB(BB) is a macro provided by GCC which uses the global variable cfun to point to
    the current function and, within the current function, uses BB to point to each basic block.
    The body of the macro is a simple for loop which starts from the first basic block and advances
    to the next block until it reaches the end.

    5.2.3. Visiting each GIMPLE statement

    In a given basic block, each GIMPLE statement can be visited using the following macro:

    #define FOR_ALL_STMT_FWD_VNIT(BB, GSI) \

    FOR_EACH_BB(BB) \

    FOR_EACH_GIMPLE_STMT_VNIT(BB, GSI)

    Here, the body of FOR_ALL_STMT_FWD_VNIT(BB, GSI) is made up of two macros; the former is
    provided by GCC and the latter has been defined in the pass as:

    #define FOR_EACH_GIMPLE_STMT_VNIT(BB, GSI) \
        for (GSI = gsi_start_bb(BB); !gsi_end_p(GSI); gsi_next(&GSI))

    Here, GSI is a GIMPLE statement iterator, whose data type is provided by GCC. As we can see in
    the body of the second macro, GSI first points to the first statement of the basic block and
    then advances until it reaches the end. In the body of this for loop, we can use gsi_stmt(GSI)
    to access the corresponding GIMPLE statement. Thus the driver function for our pass, after
    removing unnecessary details, looks like:

    static unsigned int empty_func_driver(void) {
        preparatory_iterations();
        /* iterate over all functions */
        for (cnode = cgraph_nodes; cnode; cnode = cnode->next) {
            push_cfun(DECL_STRUCT_FUNCTION(cnode->decl));   /* push current function */
            /* iterate over each GIMPLE statement in the current function */
            FOR_ALL_STMT_FWD_VNIT(bb, gsi) {
                if (is_gimple_assign(gsi_stmt(gsi)) && is_stmt_pointer_type(gsi_stmt(gsi)))
                    get_access_paths(gsi_stmt(gsi));
            }
            pop_cfun();
        }
        return 0;
    }
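    The way two loop macros compose into a nested traversal can be tried outside GCC with a
    standalone sketch. The types Block and Function and the macros FOR_EACH_BLOCK, FOR_EACH_STMT
    and FOR_ALL_STMTS below are hypothetical stand-ins; they are not GCC APIs and only mimic the
    shape of the macros used in the pass.

```c
#include <assert.h>

/* Hypothetical stand-ins for GCC's basic blocks and statements. */
typedef struct { int n_stmts; int stmts[4]; } Block;
typedef struct { int n_blocks; Block blocks[4]; } Function;

/* Outer loop: visit every block of function F through pointer BB. */
#define FOR_EACH_BLOCK(F, BB) \
    for ((BB) = (F)->blocks; (BB) < (F)->blocks + (F)->n_blocks; (BB)++)

/* Inner loop: visit every statement index I of block BB. */
#define FOR_EACH_STMT(BB, I) \
    for ((I) = 0; (I) < (BB)->n_stmts; (I)++)

/* Composition: because each macro expands to a bare `for` header,
   writing one after the other yields a nested loop. */
#define FOR_ALL_STMTS(F, BB, I) \
    FOR_EACH_BLOCK(F, BB)       \
        FOR_EACH_STMT(BB, I)

/* Sums all statement values in a function. */
int sum_statements(Function *f) {
    Block *bb;
    int i, total = 0;
    assert(f != 0);
    FOR_ALL_STMTS(f, bb, i) {
        total += bb->stmts[i];
    }
    return total;
}
```

    Since neither macro supplies braces, the block written after FOR_ALL_STMTS becomes the body of
    the inner loop, which is the same pattern the driver function relies on.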


    5.3. Identifying assignment statements

    In our pass, we are currently able to identify only GIMPLE assignment statements. Future work
    will include identifying and analysing function call, return and use statements. On studying
    the file $SOURCE/gcc/gimple.h, we found a function is_gimple_assign(gimple stmt) that checks
    whether a given GIMPLE statement is an assignment statement. So when we visit each GIMPLE
    statement, we check it with this function and proceed to further analysis if it is an
    assignment statement; otherwise we move to the next GIMPLE statement.

    5.4. Identifying pointer type statements

    Once a statement is found to be an assignment statement, it needs to be checked for pointer
    type. If any of the three operands of an assignment is of pointer type, we recognize that
    statement as a pointer type statement. The check consists of checking the tree codes and types
    of all the operands. GCC assigns each operand a tree code and provides a macro TREE_CODE()
    that extracts the tree code. It also provides the macro POINTER_TYPE_P(type), which checks
    whether the type (of any operand) is a pointer type and returns a boolean result. The type of
    an operand can be found using the TREE_TYPE() macro, again provided by GCC. The code to check
    whether a variable is of pointer type is:

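    static bool is_pointer_type(tree type);   /* defined below */

    /* Returns true if the operand var should be treated as a pointer. */
    static bool is_pointer_var(tree var) {
        /* field accesses and address-of expressions are always of interest */
        if (TREE_CODE(var) == COMPONENT_REF || TREE_CODE(var) == ADDR_EXPR)
            return true;
        return is_pointer_type(TREE_TYPE(var));
    }

    /* Returns true for pointer types, aggregates and arrays of either. */
    static bool is_pointer_type(tree type) {
        if (POINTER_TYPE_P(type))
            return true;
        if (TREE_CODE(type) == ARRAY_TYPE)
            return is_pointer_type(TREE_TYPE(type));   /* check the element type */
        return AGGREGATE_TYPE_P(type);
    }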

    5.4.1. Extracting operands

    In order to check the tree codes and types, first we need to extract operands from a given

    GIMPLE statement. This can be done using functions provided by GCC:

    1. tree gimple_assign_lhs(gimple stmt)

    2. tree gimple_assign_rhs1(gimple stmt)

    3. tree gimple_assign_rhs2(gimple stmt).


    5.5. Generate access path set

    This function returns the access path set for each pointer type assignment statement. It gets
    the access path for each operand and then combines them to get an access path set.

    5.5.1. Getting access paths

    In order to get the access path from each operand, we use functions such as
    access_path * get_access_path_lhs(gimple stmt). This function extracts the names (field names)
    of variables as used by the programmer (or compiler generated temporaries). The function to get
    field names looks like the following (the functions for the rhs operands resemble it):

    static char * get_lhs_op(const gimple stmt) {
        tree t;
        if (is_gimple_assign(stmt)) {
            t = gimple_assign_lhs(stmt);
            return get_name_of_tree1(t);
        }
        return NULL;
    }

    If an operand is of pointer type, a label is generated for it from the following entities:
    field name, basic block number and statement number. Of these, the field name extraction and
    the assignment of statement numbers are done in the pass, while GCC assigns each basic block a
    unique index (number). This triplet makes a label unique.

    Once the labels are prepared, they are combined to get an access path for the corresponding
    operand. Then the access paths of all the operands in a statement are combined to get an
    access path set for that GIMPLE statement. Note that an access path is for an operand, while
    an access path set is for a GIMPLE statement.
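    A minimal sketch of such a label and of the comparison that decides whether two labels denote
    the same node is given below. The type NodeLabel and the function label_equal are hypothetical
    names for this illustration; the data structure actually used by the pass is described in the
    chapter on the access graph library.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical label type: field name, basic block index and statement
   number, mirroring the triplet described above. */
typedef struct {
    const char *field;
    int bb_index;
    int stmt_no;
} NodeLabel;

/* Two labels denote the same access graph node only if all three parts match. */
int label_equal(const NodeLabel *a, const NodeLabel *b) {
    assert(a != NULL && b != NULL);
    return a->bb_index == b->bb_index &&
           a->stmt_no == b->stmt_no &&
           strcmp(a->field, b->field) == 0;
}
```

    With this comparison, the field left referenced at statement 5 of block 2 is distinct from the
    same field referenced at statement 7 of that block, while repeated visits to statement 5
    always produce the same label.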


    The code for getting access path set looks like:

    static access_path_set * get_access_paths(gimple stmt) {
        switch (gimple_code(stmt)) {
        case GIMPLE_ASSIGN:
            ap_lhs  = get_access_path_lhs(stmt);
            ap_rhs1 = get_access_path_rhs1(stmt);
            ap_rhs2 = get_access_path_rhs2(stmt);
            break;
        default:
            break;
        }
        /* collect the per-operand access paths into the set for this statement */
        stmt_aps->lhs  = ap_lhs;
        stmt_aps->rhs1 = ap_rhs1;
        stmt_aps->rhs2 = ap_rhs2;
        return stmt_aps;
    }

    This completes the phase of retrieving static information from GCC.


    6. Access Graph Library (see footnote 3)

    6.1. Files

    AccessGraph.h

    This file contains the declaration of the data structure used to represent access graphs and

    access paths and also the declaration of the functions associated with it.

    AccessGraph.c

    This file contains the definition of all the functions required in the explicit liveness analysis.

    6.2. Formal definitions of the data structures

    6.2.1. Access Paths

    An access path is a root variable name followed by a sequence of zero or more field names and
    is denoted by ρ_x ≡ x→f1→f2→...→fk. Since an access path represents a path in a memory
    graph, it can be used for naming links and nodes. An access path consisting of just a root
    variable name is called a simple access path; it represents a path consisting of a single link
    corresponding to the root variable. E denotes an empty access path.

    The last field name in an access path ρ is called its frontier and is denoted by Frontier(ρ).
    The frontier of a simple access path is the root variable name. The access path corresponding
    to the longest sequence of names in ρ excluding its frontier is called its base and is denoted
    by Base(ρ). The base of a simple access path is the empty access path E. The object reached by
    traversing an access path ρ is called the target of the access path and is denoted by
    Target(ρ). When we use an access path ρ to refer to a link in a memory graph, it denotes the
    last link in ρ, i.e. the link corresponding to Frontier(ρ).

    6.2.2. Access graphs

    An access graph, denoted by G_v, is a directed graph ⟨n0, N, E⟩ representing a set of access
    paths starting from a root variable v. N is the set of nodes, n0 ∈ N is the entry node with no
    in-edges and E is the set of edges. Every path in the graph represents an access path. The
    empty graph EG has no nodes or edges and does not accept any access path.

    3 Based on [5]


    Here the access path lhs corresponds to the access path of the variable that is on the left
    hand side of the = sign, while the access paths rhs1 and rhs2 correspond to the access paths
    of the variables that are on the right hand side in the expression.

    6.3.5. Access graph node

    This structure represents a node in an access graph which has been implemented as a node in

    an adjacency linked list representation of a graph.

    typedef struct AGN {
        unsigned summary : 1;
        Label l;
        struct AGE * edges;
        struct AGN * next;
    } AccessGraphNode;

    The label l holds the information in the node, while the summary bit denotes whether the node
    is a summary node or not. The edges pointer points to the linked list of edges originating from
    the node.

    6.3.6. Access graph edge

    This structure represents an edge in the access graph as well as in the adjacency linked list

    representation of the graph.

    typedef struct AGE {
        AccessGraphNode * from_node;
        AccessGraphNode * to_node;
        struct AGE * next;
    } AccessGraphEdge;

    The access graph node pointers from_node and to_node point to the source and destination nodes
    of the edge respectively.

    6.3.7. Nodes set

    The nodes set is the set of nodes in the access graph and is implemented as a simple linked
    list of nodes.

    typedef struct NS {
        AccessGraphNode * first_node;
    } Nodes_Set;


    6.3.8. Edges set

    The edges set is the set of edges in the access graph and is implemented as a simple linked list

    of edges. Thus, unlike the conventional adjacency linked list representation, all the edges in the

    access graph form a single linked list with edges originating from the same node grouped

    together.

    typedef struct ES {
        AccessGraphEdge * first_edge;
    } Edges_Set;

    6.3.9. Access graph

    Following the formal definition of the access graph, it has been implemented as a structure
    containing a nodes set and an edges set. The first node in the nodes set always corresponds to
    the entry node of the graph.

    typedef struct G {
        Nodes_Set Nodes;
        Edges_Set Edges;
        struct G * next;
    } AccessGraph;

    6.3.10. Access graph set

    The access graph set represents the set of access graphs as a linked list.

    typedef struct AG {
        AccessGraph * start;
    } AccessGraphSet;

    6.4. Operations on access graphs

    6.4.1. Auxiliary operations

    6.4.1.1. ConstructGraph($\rho$, g)

    Constructs the access graph g corresponding to the access path $\rho$. It involves converting
    the access path nodes to access graph nodes and adding the corresponding edges.

    void ConstructGraph (AccessPath * p , AccessGraph * g)
    begin
        For all the nodes in the access path
        begin
            Create a corresponding node in the access graph
        end
        Add edges with respect to access path to access graph
    end procedure
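    A minimal sketch of this construction is given below, reusing the node and edge structures of Sections 6.3.5 and 6.3.6. The definition of `Label` is not shown in this section, so it is assumed here to be a string; the function name `construct_linear` and its array-based input are likewise assumptions made for the example.

```c
#include <assert.h>
#include <stdlib.h>

typedef const char *Label;   /* assumption: the report's Label type is not shown */

typedef struct AGN {
    unsigned summary : 1;
    Label l;
    struct AGE *edges;
    struct AGN *next;
} AccessGraphNode;

typedef struct AGE {
    AccessGraphNode *from_node;
    AccessGraphNode *to_node;
    struct AGE *next;
} AccessGraphEdge;

/* Build the linear graph names[0] -> names[1] -> ... and return its entry node.
   Returns NULL for an empty path or on allocation failure (the sketch does not
   free partially built graphs). */
AccessGraphNode *construct_linear(const Label *names, size_t len) {
    assert(names != NULL || len == 0);
    AccessGraphNode *head = NULL, *prev = NULL;
    for (size_t i = 0; i < len; i++) {
        AccessGraphNode *n = calloc(1, sizeof *n);
        if (n == NULL) return NULL;
        n->l = names[i];
        if (prev == NULL) {
            head = n;                      /* first node is the entry node */
        } else {
            AccessGraphEdge *e = calloc(1, sizeof *e);
            if (e == NULL) return NULL;
            e->from_node = prev;
            e->to_node = n;
            prev->edges = e;               /* single outgoing edge per node */
            prev->next = n;                /* keep nodes in one linked list */
        }
        prev = n;
    }
    return head;
}
```

    Because the graph of a single access path is linear, every node has at most one outgoing edge, which is why the sketch never needs to append to an existing edge list.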


    6.4.1.2. lastNode(G)

    Returns the last node of a linear graph G constructed from a given access path $\rho$.

    AccessGraphNode* lastNode (AccessGraph * G)
    begin
        Traverse the linked list and return the last node
    end procedure
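    The traversal can be sketched as follows. `Node` is a reduced, hypothetical stand-in for AccessGraphNode that keeps only the label and the next pointer.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Node {        /* reduced stand-in for AccessGraphNode */
    const char *label;
    struct Node *next;
} Node;

/* Follow the next pointers and return the final node, or NULL for an empty list. */
Node *last_node(Node *first) {
    if (first == NULL) return NULL;
    while (first->next != NULL) first = first->next;
    return first;
}
```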

    6.4.1.3. CleanUp(G)

    Deletes the nodes which are not reachable from the entry node.

    void CleanUp (AccessGraph * g)
    begin
        1. Run a depth first traversal over the graph and mark all the visited nodes
        2. Traverse the linked list of nodes and delete all the unmarked nodes and
           their edges from the graph
    end procedure

    6.4.1.4. CorrespondingNodes(G, G', S)

    Computes the set of nodes of $G$ which correspond to the nodes of $G'$ specified in the set
    $S$. To compute CN($G$, $G'$, $S$), we define ACN($G$, $G'$), the set of pairs of all
    corresponding nodes. Let $G \equiv \langle n_0, N, E \rangle$ and
    $G' \equiv \langle n_0', N', E' \rangle$. A node $n$ in $G$ corresponds to a node $n'$ in $G'$
    if there exists an access path $\rho$ which is represented by a path from $n_0$ to $n$ in $G$
    and a path from $n_0'$ to $n'$ in $G'$.

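    Formally, ACN($G$, $G'$) is the least solution of the following equation:

    $$ACN(G, G') = \begin{cases} \emptyset & G = \mathcal{E}_G \text{ or } G' = \mathcal{E}_G \text{ or } Root(G) \neq Root(G') \\ \{\langle n_0, n_0' \rangle\} \cup \{\langle n_j, n_j' \rangle \mid Field(n_j) = Field(n_j'),\ n_i \to n_j \in E,\ n_i' \to n_j' \in E',\ \langle n_i, n_i' \rangle \in ACN(G, G')\} & \text{otherwise} \end{cases}$$

    $$CN(G, G', S) = \{ n \mid \langle n, n' \rangle \in ACN(G, G'),\ n' \in S \} \quad (0.1)$$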

    Note that $Field(n_j) = Field(n_j')$ would hold even when $n_j$ or $n_j'$ is the summary node.

    void Corresponding_Nodes (AccessGraph* G, AccessGraph* G_, Nodes_Set S, Nodes_Set CN)
    begin
        All_Corresponding_Nodes (G , G_ , ACN1 , ACN2);
        For each pair of corresponding nodes n in ACN1 and n_ in ACN2
            if n_ ∈ S then add n to CN
    end procedure


    void All_Corresponding_Nodes (AccessGraph* G, AccessGraph* G_, Nodes_Set ACN1, Nodes_Set ACN2)
    begin
        if root(G) != root (G_) then return;
        Starting from the root nodes, recursively add to the sets ACN1 and ACN2 each pair
        of nodes which have the same field and have edges coming to them from a pair of
        nodes already in these sets.
    end procedure
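    The computation of corresponding nodes can be sketched as a fixed point iteration over a boolean pair matrix. The sketch assumes a simplified adjacency-matrix `Graph` in which each node label is its field name, and it does not model summary nodes; the function names are hypothetical and the two parallel sets ACN1 and ACN2 of the report are replaced by a single matrix.

```c
#include <assert.h>
#include <string.h>

#define MAXN 8
typedef struct {
    int n;                    /* node count; node 0 is the entry (root) node */
    const char *label[MAXN];  /* root variable or field name of each node */
    int edge[MAXN][MAXN];     /* edge[i][j] != 0 iff the edge i -> j exists */
} Graph;

/* acn[i][j] is set to 1 iff node i of g corresponds to node j of h. */
void all_corresponding_nodes(const Graph *g, const Graph *h, int acn[MAXN][MAXN]) {
    assert(g != NULL && h != NULL);
    memset(acn, 0, sizeof(int) * MAXN * MAXN);
    if (g->n == 0 || h->n == 0 || strcmp(g->label[0], h->label[0]) != 0) return;
    acn[0][0] = 1;                              /* the entry nodes correspond */
    for (int changed = 1; changed; ) {          /* iterate to the least solution */
        changed = 0;
        for (int i = 0; i < g->n; i++)
            for (int j = 0; j < h->n; j++) {
                if (!acn[i][j]) continue;
                for (int a = 0; a < g->n; a++)
                    for (int b = 0; b < h->n; b++)
                        if (g->edge[i][a] && h->edge[j][b] && !acn[a][b] &&
                            strcmp(g->label[a], h->label[b]) == 0) {
                            acn[a][b] = 1;
                            changed = 1;
                        }
            }
    }
}

/* cn[i] = 1 iff node i of g corresponds to some node j of h with in_s[j] set. */
void corresponding_nodes(const Graph *g, const Graph *h, const int *in_s, int *cn) {
    int acn[MAXN][MAXN];
    all_corresponding_nodes(g, h, acn);
    for (int i = 0; i < g->n; i++) {
        cn[i] = 0;
        for (int j = 0; j < h->n; j++)
            if (acn[i][j] && in_s[j]) cn[i] = 1;
    }
}
```

    A node of one graph may correspond to several nodes of the other, for instance when a cycle in one graph matches a chain in the other, which is why the pairs are kept in a matrix rather than in a one-to-one map.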

    6.4.1.5. CopyGraph(G, G')

    Copies the graph G into a new access graph G'.

    AccessGraph* copy_graph (AccessGraph * g)
    begin
        Copy all the nodes of g into a new graph g_
        Copy all the edges of g into g_, establishing links between the nodes and the
        edges set
        Return g_
    end procedure

    6.4.1.6. RemainderGraph(G, G', n)

    Constructs a remainder graph G' from an access graph G with n as the entry node.

    AccessGraph* remainder_graph (AccessGraph* g, AccessGraphNode* n)
    begin
        Run a recursive depth first traversal over the graph g starting from node n
        and add each node to a new graph g_ while visiting it along with all its
        edges.
    end procedure

    6.4.2. Main operations

    6.4.2.1. Union

    $G \uplus G'$ combines access graphs $G$ and $G'$ such that any access path contained in $G$
    or $G'$ is contained in the resulting graph.

    $$G \uplus G' = \langle n_0, N \cup N', E \cup E' \rangle \quad (0.2)$$

    The operation $N \cup N'$ treats the nodes with the same label as identical. Because of
    associativity, $\uplus$ can be generalized to an arbitrary number of arguments in an obvious
    manner.

    This operation can be explained more effectively by the examples given in Figure 6-1. In the
    first example the access graphs g3 and g4 unite to give the access graph g4, since g3 is a


    subset of g4. In the second example the union of access graphs g2 and g4 results in the access
    graph g5. Note that union simply takes the union of the node sets and of the edge sets of the
    two access graphs with the same root variable. The other two examples are along the same lines.

    The implementation of this operation is based on the definition given above. The union of the
    nodes sets and of the edges sets of both graphs is computed, and then the links between the two
    sets are established, resulting in a new graph.

    Figure 6-1: Examples of operations on access graphs

    Courtesy:[5]

    AccessGraphSet * Union (AccessGraphSet * G1 , AccessGraphSet * G2)
    begin
        AccessGraphSet G3 ;
        for each graph g1 in set G1
        begin
            for each graph g2 in set G2
            begin
                if (root (g1) == root (g2))
                then begin
                    g3 = union_graph (g1 , g2) ;
                    add g3 to G3 ;
                end if
            end for
        end for
        Return G3
    end procedure


    accessgraph * union_graph (accessgraph * g1 , accessgraph * g2)
    begin
        accessgraph * g3 ;
        copy all the nodes of g1 to g3 ;
        for each node n2 in g2
        begin
            if n2 is not present in g3
            then add n2 to g3 ;
        end for
        copy all edges of g1 to g3 ;
        for each edge e2 in g2
        begin
            if e2 is not present in g3
            then add e2 to g3 ;
        end for
        for each node n3 in g3
        begin
            search for the first edge e3 in g3
                such that e3->from_node == n3
            n3->edges = e3 ;
        end for
        return g3 ;
    end procedure
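    A compact sketch of union_graph on a simplified adjacency-matrix `Graph` is shown below. It assumes that labels are unique within one graph, so that "the same label" can be tested with a string comparison; the helper `intern` and the matrix representation are assumptions of this example and replace the final relinking step of the pseudocode.

```c
#include <assert.h>
#include <string.h>

#define MAXN 8
typedef struct {
    int n;                    /* node count; node 0 is the entry node */
    const char *label[MAXN];
    int edge[MAXN][MAXN];     /* edge[i][j] != 0 iff the edge i -> j exists */
} Graph;

/* Index of the node labelled l in g, adding the node if it is absent. */
static int intern(Graph *g, const char *l) {
    for (int i = 0; i < g->n; i++)
        if (strcmp(g->label[i], l) == 0) return i;
    assert(g->n < MAXN);
    g->label[g->n] = l;
    return g->n++;
}

/* Union of two graphs with the same root; nodes with equal labels are merged. */
Graph union_graph(const Graph *g1, const Graph *g2) {
    assert(g1 != NULL && g2 != NULL);
    Graph g3 = *g1;                          /* start from a copy of g1 */
    if (g2->n == 0) return g3;
    if (g1->n == 0) return *g2;
    assert(strcmp(g1->label[0], g2->label[0]) == 0);
    int map[MAXN];                           /* node of g2 -> node of g3 */
    for (int i = 0; i < g2->n; i++) map[i] = intern(&g3, g2->label[i]);
    for (int i = 0; i < g2->n; i++)
        for (int j = 0; j < g2->n; j++)
            if (g2->edge[i][j]) g3.edge[map[i]][map[j]] = 1;
    return g3;
}
```

    With a matrix, adding an edge that is already present is a no-op, so the "if e2 is not present in g3" test of the pseudocode is implicit.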

    6.4.2.2. Path removal

    The operation $G \ominus \rho$ removes those access paths in $G$ which have $\rho$ as a prefix.

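    $$G \ominus \rho = \begin{cases} G & \rho = \mathcal{E} \text{ or } Root(\rho) \neq Root(G) \\ \mathcal{E}_G & \rho \text{ is a simple access path} \\ CleanUp(\langle n_0, N, E - E_{del} \rangle) & \text{otherwise} \end{cases} \quad (0.3)$$

    Where,

    $$E_{del} = \{ n_i \to n_j \mid n_i \to n_j \in E,\ n_i \in CN(G, G^B, \{lastNode(G^B)\}),\ Field(n_j) = Frontier(\rho),\ UniqueAccessPath?(G, n_i) \}, \quad G^B = ConstructGraph(Base(\rho)) \quad (0.4)$$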

    UniqueAccessPath?(G, n) returns true if, in G, all paths from the entry node to node n
    represent the same access path.

    In the first example given in Figure 6-1, we can see that removal of the access path
    $x \to l$ from the access graph g6 results in the access graph g2. This operation requires
    removing Frontier($\rho$), i.e. in this case the node l, from the access graph g6. The second
    example illustrates


    the case where $\rho$ is a simple access path. The third and the fourth examples are along the
    same lines.

    The implementation of this operation is derived from the definition given above. Firstly, the
    access graph $G^B$ is constructed from the access path Base($\rho$) and then the set of
    corresponding nodes is calculated as given above. Each node in the set obtained is then checked
    to see whether it has a unique access path from the root to itself and also an edge to a node
    which is the frontier of $\rho$. If such an edge exists, it is removed from the edges set, and
    after removing all such edges the graph is cleaned up.

    AccessGraphSet * Path_Removal (AccessGraphSet * G , AccessPath * p)
    begin
        if p is empty then return copy(G) ;
        for each graph g in set G
        begin
            if (root(p) != root (g)) continue ;
            if p is a simple access path
            then remove everything from g (empty);
            else
            begin
                GB = construct_graph (base (p)) ;
                Nodes_set N = Corresponding_nodes (g , GB, {lastNode(GB)})
                for each node ni in g
                begin
                    if ni ∈ N
                        if UniqueAccessPath?(g, ni)
                        then begin
                            for each edge e from node ni
                                if e->to_node == frontier(p)
                                    delete edge e ;
                        end if
                end for
                CleanUp (g) ;
            end
        end for
        return G ;
    end procedure
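    The core of path removal on a single graph can be sketched as below. To stay short, the sketch assumes that every node has at most one successor per field name, so the node corresponding to Base(p) can be found by simply walking the labels instead of calling the corresponding nodes routine; it also leaves the final CleanUp step to the caller. All names and the adjacency-matrix `Graph` are assumptions of this example.

```c
#include <assert.h>
#include <string.h>

#define MAXN 8
typedef struct {
    int n;                    /* node count; node 0 is the entry node */
    const char *label[MAXN];
    int edge[MAXN][MAXN];     /* edge[i][j] != 0 iff the edge i -> j exists */
} Graph;

/* 1 iff every path from entry node 0 to `node` spells the same access path,
   i.e. walking backwards always finds exactly one predecessor. */
static int unique_access_path(const Graph *g, int node) {
    for (int steps = 0; node != 0; steps++) {
        int pred = -1, count = 0;
        for (int i = 0; i < g->n; i++)
            if (g->edge[i][node]) { pred = i; count++; }
        if (count != 1 || steps >= g->n) return 0;
        node = pred;
    }
    return 1;
}

/* Node reached by following path[0..len-1] from the entry, or -1 if none. */
static int follow(const Graph *g, const char *const *path, int len) {
    if (g->n == 0 || len == 0 || strcmp(g->label[0], path[0]) != 0) return -1;
    int cur = 0;
    for (int k = 1; k < len; k++) {
        int next = -1;
        for (int j = 0; j < g->n; j++)
            if (g->edge[cur][j] && strcmp(g->label[j], path[k]) == 0) next = j;
        if (next < 0) return -1;
        cur = next;
    }
    return cur;
}

/* Remove from g the access paths that have path[0..len-1] as a prefix. */
void path_removal(Graph *g, const char *const *path, int len) {
    assert(g != NULL);
    if (len == 0 || g->n == 0 || strcmp(g->label[0], path[0]) != 0) return;
    if (len == 1) { *g = (Graph){0}; return; }   /* simple path: empty graph */
    int ni = follow(g, path, len - 1);           /* node for Base(path) */
    if (ni < 0 || !unique_access_path(g, ni)) return;
    for (int j = 0; j < g->n; j++)
        if (g->edge[ni][j] && strcmp(g->label[j], path[len - 1]) == 0)
            g->edge[ni][j] = 0;                  /* drop the edge to the frontier */
}
```

    The unique access path test is what keeps the operation safe: if the base node can be reached along several different paths, deleting its edge to the frontier would also remove access paths that do not have the given prefix.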

    6.4.2.3. Factorization

    Given a node $m \in (N - \{n_0\})$ of an access graph $G$, the remainder graph of $G$ at $m$
    is the subgraph of $G$ rooted at $m$ and is denoted by RG($G$, $m$). If $m$ does not have any
    outgoing edges, then the result is the empty remainder graph $\mathcal{E}_{RG}$. Let $M$ be a
    subset of the nodes of $G'$


    and $M'$ be the set of corresponding nodes in $G$. Then, $G/(G', M)$ computes the set of
    remainder graphs of the successors of nodes in $M'$.

    $$G/(G', M) = \{ RG(G, n_j) \mid n_i \to n_j \in E,\ n_i \in CN(G, G', M) \} \quad (0.5)$$

    A remainder graph is similar to an access graph except that (a) its entry node does not

    correspond to a root variable but to a field name and (b) the entry node can have incoming

    edges.

    In Figure 6-1, the first example illustrates the result when g2 is factorized with g1 and {x}.
    The resultant graph rg1 is the subgraph of g2 rooted at {r}, which is the successor of the node
    {x}, the corresponding node between the two graphs and the given set. The second example is
    along the same lines, with the difference that {x} here has two successors, thus resulting in
    two different remainder graphs. In the third example the corresponding node {r} does not have a
    successor, thus resulting in an empty graph. The fourth example illustrates the case in which
    there is no corresponding node between the two graphs, and thus the result is a null set.

    In the implementation of this operation, the set of corresponding nodes is calculated and then
    a remainder graph is constructed for each successor of every node in this set.

    AccessGraphSet * Factorization (AccessGraphSet G1, AccessGraphSet G2, Nodes_Set M)
    begin
        AccessGraphSet RG ;
        for each graph g1 in set G1
            for each graph g2 in set G2
            begin
                if (root(g1) != root(g2)) continue ;
                Nodes_set N = Corresponding_nodes (g1 , g2 , M) ;
                for each node n in N
                    for each edge e of n
                    begin
                        new_graph = remainder_graph (g1 , e->to_node) ;
                        add new_graph to RG ;
                    end for
            end for
        return RG ;
    end procedure


    6.4.2.4. Extension

    Extending an empty access graph $\mathcal{E}_G$ results in the empty access graph
    $\mathcal{E}_G$. For non-empty graphs, this operation is defined as follows.

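    (a) Extension with a remainder graph ($\cdot$). Let $M$ be a subset of the nodes of $G$ and
    $R \equiv \langle n', N^R, E^R \rangle$ be a remainder graph. Then, $(G, M) \cdot R$ appends
    the suffixes in $R$ to the access paths ending on nodes in $M$.

    $$(G, M) \cdot \mathcal{E}_{RG} = G \quad (0.6)$$

    $$(G, M) \cdot R = \langle n_0, N \cup N^R, E \cup E^R \cup \{ n_i \to n' \mid n_i \in M \} \rangle \quad (0.7)$$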

    (b) Extension with a set of remainder graphs (#). Let $S$ be a set of remainder graphs. Then,
    $(G, M) \# S$ extends access graph $G$ with every remainder graph in $S$.

    $$(G, M) \# \emptyset = \mathcal{E}_G \quad (0.8)$$

    $$(G, M) \# S = \biguplus_{R \in S} (G, M) \cdot R \quad (0.9)$$

    This operation simply involves adding the remainder graph to the given graph at a given node.
    From Figure 6-1, we can see that extending g3 with rg1 at l1 results in the access graph g4. In
    the second example the given access graph is extended with two remainder graphs at two nodes,
    while the third and fourth examples follow directly from the definition given above.

    The implementation of this function requires the union function followed by the addition of
    edges from the nodes in the set M to the root node of the remainder graph R.
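    Extension with a single remainder graph can be sketched as a union followed by the addition of the connecting edges. As in the other sketches, the adjacency-matrix `Graph`, the unique-label assumption and the names `intern` and `extend` are assumptions made for illustration only.

```c
#include <assert.h>
#include <string.h>

#define MAXN 8
typedef struct {
    int n;                    /* node count; node 0 is the entry node */
    const char *label[MAXN];
    int edge[MAXN][MAXN];     /* edge[i][j] != 0 iff the edge i -> j exists */
} Graph;

/* Index of the node labelled l in g, adding the node if it is absent. */
static int intern(Graph *g, const char *l) {
    for (int i = 0; i < g->n; i++)
        if (strcmp(g->label[i], l) == 0) return i;
    assert(g->n < MAXN);
    g->label[g->n] = l;
    return g->n++;
}

/* Merge r into a copy of g and link every node in M to the entry node of r.
   in_m[i] != 0 marks node i of g as a member of M. */
Graph extend(const Graph *g, const int *in_m, const Graph *r) {
    assert(g != NULL && in_m != NULL && r != NULL);
    Graph out = *g;
    if (g->n == 0 || r->n == 0) return out;  /* empty operand: g is returned as is */
    int map[MAXN];                           /* node of r -> node of out */
    for (int i = 0; i < r->n; i++) map[i] = intern(&out, r->label[i]);
    for (int i = 0; i < r->n; i++)
        for (int j = 0; j < r->n; j++)
            if (r->edge[i][j]) out.edge[map[i]][map[j]] = 1;
    for (int i = 0; i < g->n; i++)
        if (in_m[i]) out.edge[i][map[0]] = 1;    /* edges from M to the root of r */
    return out;
}
```

    Extending with a set of remainder graphs would then amount to calling this function once per remainder graph and taking the union of the results.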


    7. Implementation of Explicit Liveness Analysis in GCC

    The theory of data flow analysis and of explicit liveness analysis of the heap was covered in
    Chapters 2 and 3. Later chapters discussed interfacing with GCC and the implementation of the
    access graph and access path libraries. With the access paths and other information obtained
    from GCC, and the access graph library to support the analysis, we now implement the explicit
    liveness analysis.

    7.1. The main function

    The analysis is divided into three functions: the preparatory pass, the explicit liveness
    analysis and the other analyses. They are explained below.

    The main data structure storing the information is:

    Figure 7-1: Main data structure

    The preparatory pass: This pass computes the information which is static and is needed by all
    the other analyses. The type of each statement is classified as ASSIGNMENT, FUNCTION CALL,
    RETURN, USE or OTHER and stored in the tos field. Access paths are extracted from each
    statement and stored in the access_paths field for further use in the other analyses. Each
    statement contains at most 3 access paths because GIMPLE statements are in three-address form.
    Basic blocks are numbered in decreasing order while returning from a depth first traversal.
    This enables us to traverse each function against the control flow when basic blocks are
    visited in decreasing order of their numbers [2]. Also, the information for any statement can
    be accessed from Stmt_info

    as a tuple .

    Explicit Liveness analysis: This is the main function computing explicit liveness. It is explained in

    the next section.

    Other analyses: The other analyses are not implemented as of now.

    typedef struct {
        enum type_of_satement tos;
        Access_paths * access_paths;
        Liveness_analysis_info * liveness_info;
    } Heap_analysis_info;

    Heap_analysis_info** Stmt_info;


    7.2. Explicit liveness analysis

    The explicit liveness analysis extracts information from statements and performs the analysis
    on this information. Some of the information remains constant while some of it changes with
    each iteration. We do an initialization pass over the program to compute the static
    information, such as LDirect, EKillPath and some of the information required by LTransfer.

    The data structure used to store the information in this pass is:

    Figure 7-2: Data structure for liveness analysis

    After the static information has been computed and stored, the general data flow algorithm
    iterates over the program. It is shown below:

    Figure 7-3: General Algorithm

    7.2.1. Computation of ELOut

    ELOut is computed by directly implementing the equation for ELOut in Figure 3-12.

    7.2.2. Computation of ELIn

    ELIn depends on the type of statement and is calculated as:

    Switch on type of statement
        Assignment: calculate LTransfer, EKillPath and LDirect using equation in Figure 3-7;
                    calculate ELGen using equation in Figure 3-12;
                    return ELIn using equation in Figure 3-12;
        Function Call or Return or Use: /*not completely implemented*/
        Other: return same as ELOut;

    Figure 7-4: Computation of ELIn

    typedef struct {
        access_graph_set * LDirect;
        access_graph_set * ELKillPath;
        access_graph_set * LTransfer_info;
        access_graph_set * ELIn;
        access_graph_set * ELOut;
        access_graph_set * LIn;
        access_graph_set * LOut;
    } Liveness_analysis_info;

    For each function
        for each statement in specified traversal order
            compute ELOut set of statement
            compute ELIn set of statement
        repeat the traversal if any ELOut or ELIn set has changed; otherwise stop


    7.2.3. Computation of LDirect

    LDirect also depends on the type of statement and is calculated as:

    Switch on type of statement
        Assignment:
            calculate LDirect using equation in Figure 3-7;
        Function Call*:
            calculate LDirect using equation in Figure 3-8;
        Return*:
            calculate LDirect using equation in Figure 3-9;
        Use*:
            calculate LDirect using equation in Figure 3-10;

    * not implemented completely

    Figure 7-5: Computation of LDirect

    7.2.4. Calculation of EKillPath

    EKillPath is defined only for assignment and function call statements.

    Switch on type of statement
        Assignment:
            calculate EKillPath using equation in Figure 3-7;
        Function Call:
            /*not implemented completely*/

    Figure 7-6: Calculation of EKillPath

    7.2.5. Calculation of LTransfer

    LTransfer is defined only for assignment statements.

    Switch on type of statement
        Assignment:
            calculate LTransfer using equation in Figure 3-7;

    Figure 7-7: Calculation of LTransfer

    Thus the algorithm described above performs liveness analysis of the heap and stores the final
    access graphs associated with each statement.


    9. References

    [1] Aho, Sethi, & Ullman. Dragon Book. Pearson Education.

    [2] Khedker. (2010). Generic Data Flow Analyser. IITB.

    [3] Khedker. (2010). Manipulating GIMPLE and RTL IRs. IITB: GRC.

    [4] Khedker, Sanyal, & Karkare. Data Flow Analysis: Theory and Practice. CRC Press.

    [5] Khedker, Sanyal, & Karkare. (2007). Heap Reference Analysis Using Access Graphs. ACM.

    [6] Merrill, J. (2003). GENERIC and GIMPLE: A New Tree Representation for Entire Functions. GCC Developers Summit.

    [7] Stallman, R. (2010). GCC Internals. GCC.