ANALYSIS OF PROG. LANG. PROGRAM ANALYSIS Instructors: Crista Lopes Copyright © Instructors. 1.


Motivation(s)

Where do you see program analysis (PA) in your everyday life?

How does PA “work”? What is PA anyway?


Auto-completion


Pre-compilation error detection

Ex: missing parenthesis


How do you know ...

int a;

increment_a() {
    a++;
}

while (true) {
    String a = "hello";
    increment_a();
}

This “a” is not that “a”


How do you remember ...

int a;

increment_a() {
    a++;
}

while (true) {
    String a = "hello";
    increment_a();
}

Wait, what’s the type of “a” again?

“a” is of type int (FYI...)


Outline

Introduction/motivations
Program representation
    AST
    3-address code
Control flow analysis
Data flow


Intermediate Representation (IR)

Initial point
Abstract Syntax Tree
    Abstract vs. concrete syntax
    Parse tree vs. abstract syntax tree
Three-address code


IR-1 Starting Point

[Figure: Source code -> Parsing, Lexical Analysis -> Intermediate representation -> Code Generation, Optimization -> Target code -> Code Execution]

Analyze the IR: perform analysis on the results.
Use this information for applications.


IR-2. Abstract Syntax Tree (AST)

Concrete vs. abstract syntax
    Concrete syntax shows structure and is language-specific
    Abstract syntax shows structure only
Representations
    A parse tree represents concrete syntax
    An abstract syntax tree represents abstract syntax


IR-2. Example: Grammar

Example
    a := b+c    (Language 1)
    a = b+c;    (Language 2)

Grammar for 1
    stmtlist → stmt | stmt stmtlist
    stmt     → assign | if-then | …
    assign   → ident ":=" ident binop ident
    binop    → "+" | "-" | …

Grammar for 2
    stmtlist → stmt ";" | stmt ";" stmtlist
    stmt     → assign | if-then | …
    assign   → ident "=" ident binop ident
    binop    → "+" | "-" | …


IR-2. Example: Parse Tree

Parse tree for a:=b+c

stmtlist
    stmt
        assign
            ident  ":="  ident  binop  ident
              a            b     "+"     c

Parse tree for a=b+c;

stmtlist
    stmt  ";"
        assign
            ident  "="  ident  binop  ident
              a           b     "+"     c


IR-2. Example: Abstract Syntax Tree

Example
    1. a:=b+c
    2. a=b+c;

Abstract Syntax Tree for 1 and 2

assign
    a
    add
        b
        c
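The point of the slide, that two different concrete syntaxes share one abstract syntax tree, can be sketched in a few lines of Python. This is only an illustration: the function name `parse_assign`, the tuple encoding of the tree, and the restriction to a single `ident binop ident` assignment are assumptions made for the example, not part of the course material.

```python
import re

def parse_assign(source, assign_token, terminator=""):
    """Parse 'ident <assign_token> ident binop ident <terminator>' into an AST tuple."""
    pattern = (r"\s*(\w+)\s*" + re.escape(assign_token) +
               r"\s*(\w+)\s*([+-])\s*(\w+)\s*" + re.escape(terminator) + r"\s*$")
    match = re.match(pattern, source)
    if match is None:
        raise SyntaxError(f"cannot parse {source!r}")
    target, left, op, right = match.groups()
    # The AST keeps only the structure: no ':=' vs '=' and no ';'.
    return ("assign", target, ({"+": "add", "-": "sub"}[op], left, right))

ast1 = parse_assign("a:=b+c", ":=")        # Language 1
ast2 = parse_assign("a=b+c;", "=", ";")    # Language 2
print(ast1 == ast2, ast1)
```

Both calls yield the same tuple, so any analysis written against the tree does not need to know which language the statement came from.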


IR-3. Three Address Code

General form: x = y op z
More generally: (operator, operand1, operand2, result)
    (at most 3 spots besides the operator)
May include temporary variables
Examples
    Assignment
        Binary   x := y op z   (op, y, z, x)
        Unary    x := op y     (op, y, _, x)
        Copy     x := y        (_, y, _, x)
    Jumps
        Unconditional   goto L                 (goto, L, _, _)
        Conditional     if x relop y goto L    (relop, x, y, L)
    ….
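As a rough illustration of the quadruple form and of temporary variables, the sketch below lowers a nested expression tree into (operator, operand1, operand2, result) tuples. The tuple-based AST, the function name `to_three_address`, and the temporary names `t1`, `t2`, ... are assumptions made for this example; `"_"` marks an unused spot, as on the slide.

```python
from itertools import count

def to_three_address(ast):
    """Lower ('assign', target, expr) into a list of (op, arg1, arg2, result) quadruples."""
    code, temps = [], count(1)

    def lower(expr):
        if isinstance(expr, str):       # an identifier is already a valid operand
            return expr
        op, left, right = expr
        l, r = lower(left), lower(right)
        t = f"t{next(temps)}"           # fresh temporary for the intermediate value
        code.append((op, l, r, t))
        return t

    _, target, expr = ast
    code.append(("_", lower(expr), "_", target))   # final copy into the target
    return code

# x := (a + b) * (c - d)
quads = to_three_address(("assign", "x", ("mul", ("add", "a", "b"), ("sub", "c", "d"))))
for q in quads:
    print(q)
```

Each quadruple has at most three operand spots, so the nested expression needs temporaries to carry intermediate values between instructions.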


IR-3. Example: Three Address Code

if a>10
then x=y+z
else x=y-z

1. if a>10 goto 4
2. x = y-z
3. goto 5
4. x = y + z
5. …..


Analysis Levels

Local: within a single basic block or statement
Intraprocedural: within a single procedure, function, or method
Interprocedural: across procedure boundaries, procedure calls, shared globals, etc.
Intraclass: within a single class
Interclass: across class boundaries
…..


Outline

Introduction/motivations
Program representation
Control flow analysis
    Computing Control Flow (analysis and representation)
    Search and Traversals
    Applications
Data flow


Computing Control flow (example)

Procedure AVG
S1    count = 0;
S2    fread(fptr, n)
S3    while (not EOF) do
S4        if (n < 0)
S5            return (error)
          else
S6            nums[count] = n
S7            count++
          endif
S8        fread(fptr, n);
      endwhile
S9    avg = mean(nums, count)
S10   return (avg)

[Figure: control flow graph of AVG with nodes entry, S1 to S10, and EXIT]


CF1: Control Flow (Basic Blocks)

A basic block is a sequence of consecutive statements in which flow of control enters at the beginning and leaves at the end, without halting or the possibility of branching except at the end

A basic block may or may not be maximal
    For compiler optimizations, maximal blocks are desirable
    For software engineering tasks, basic blocks that represent one source code statement are often used


CF1: Computing Control Flow

Input: A list of program statements in some form
Output: A list of CFG nodes and edges
Procedure:
    Construct basic blocks
    Create entry and exit nodes; create edge (entry, B1); create edge (Bk, exit) for each Bk that represents an exit from the program
    Add a CFG edge from Bi to Bj if Bj can immediately follow Bi in some execution, i.e.,
        there is a conditional or unconditional goto from the last statement of Bi to the first statement of Bj, or
        Bj immediately follows Bi in the order of the program and Bi does not end in an unconditional goto statement
    Label edges that represent conditional transfers of control
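The procedure above can be sketched in Python on the five-instruction three-address example from the earlier slide. Several things here are assumptions for illustration: the `Instr` record, 0-based instruction indices, and the use of the common "leader" rule (first instruction, jump targets, and instructions following a jump start a new block) to construct maximal basic blocks. Entry/exit nodes and edge labels are left out to keep the sketch short.

```python
from typing import NamedTuple, Optional

class Instr(NamedTuple):
    text: str                      # human-readable form of the instruction
    target: Optional[int] = None   # 0-based index of the jump target, if any
    conditional: bool = False      # True for "if ... goto", False for a plain goto

def build_cfg(code):
    # Leaders: the first instruction, every jump target, every instruction after a jump.
    leaders = {0}
    for i, ins in enumerate(code):
        if ins.target is not None:
            leaders.add(ins.target)
            if i + 1 < len(code):
                leaders.add(i + 1)
    starts = sorted(leaders)
    # Each block runs from its leader up to (not including) the next leader.
    blocks = [list(range(s, e)) for s, e in zip(starts, starts[1:] + [len(code)])]
    block_of = {s: k for k, s in enumerate(starts)}   # leader index -> block number

    edges = set()
    for k, blk in enumerate(blocks):
        last = code[blk[-1]]
        if last.target is not None:                   # goto edge (conditional or not)
            edges.add((k, block_of[last.target]))
        falls_through = last.target is None or last.conditional
        if falls_through and k + 1 < len(blocks):     # Bj follows Bi in program order
            edges.add((k, k + 1))
    return blocks, sorted(edges)

code = [
    Instr("if a > 10 goto 4", target=3, conditional=True),
    Instr("x = y - z"),
    Instr("goto 5", target=4),
    Instr("x = y + z"),
    Instr("..."),
]
blocks, edges = build_cfg(code)
print(blocks)   # instruction indices grouped into basic blocks
print(edges)    # (from_block, to_block) pairs
```

The result is the expected diamond: the conditional block branches to the two assignment blocks, and both reach the final block.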


CF2: Search and Ordering

Many ways to visit the nodes in the graph
    Depth First Search: visits the descendants of a node before visiting any of its siblings
    Breadth First Search: all of a node’s immediate descendants are processed before any of their unprocessed children
    Preorder Traversal: a node is processed before its descendants
    Postorder Traversal: a node is processed after its descendants


CF2: Search and Ordering (cont’d) (DFS)

One DFS of the CFG: 1, 3, 4, 6, 7, 8, 10, back to 8, 9, back to 8, 7, 6, 4, 5, back to 4, 3, 1, 2, back to 1

The number assigned to a node during DFS is its depth-first number

Depth-first ordering of nodes is the reverse of the order in which nodes are last visited in the DFS

For this DFS, nodes are visited 1, 3, 4, 6, 7, 8, 10, 8, 9, 8, 7, 6, 4, 5, 4, 3, 1, 2, 1

Depth-first ordering is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
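A small Python sketch can reproduce this trace. The successor lists below are an assumption: they contain only the tree edges that can be read off the DFS trace on this slide (the original figure may have further edges), listed in the order the trace explores them. The depth-first ordering is computed as the reverse of the postorder, i.e. the reverse of the order in which nodes are finished.

```python
def dfs_orders(succ, root):
    """Return (preorder, postorder) of a recursive DFS starting at root."""
    preorder, postorder, seen = [], [], set()

    def visit(n):
        seen.add(n)
        preorder.append(n)        # node processed before its descendants
        for m in succ[n]:
            if m not in seen:
                visit(m)
        postorder.append(n)       # node processed after its descendants

    visit(root)
    return preorder, postorder

# Tree edges inferred from the trace 1, 3, 4, 6, 7, 8, 10, (8), 9, ..., 5, ..., 2
succ = {1: [3, 2], 2: [], 3: [4], 4: [6, 5], 5: [],
        6: [7], 7: [8], 8: [10, 9], 9: [], 10: []}

pre, post = dfs_orders(succ, 1)
depth_first_order = post[::-1]
print(pre)
print(post)
print(depth_first_order)
```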


CF: Types of Edges

The depth-first representation is the depth-first spanning tree together with the other edges that are not part of the tree (tree edges and other edges)

Three kinds of edges
    Advancing (forward) edges: go from a node to one of its proper descendants in the tree; these include tree edges
    Back edges: go from a node to one of its ancestors in the tree
    Cross edges: connect nodes such that neither is an ancestor of the other
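One way to classify edges, sketched below under stated assumptions, is to record when the DFS enters and leaves each node. The four-node graph, the function name `classify_edges`, and the label strings are hypothetical; tree edges are reported separately here even though the slide counts them among the advancing edges.

```python
def classify_edges(succ, root):
    """Classify every edge reachable from root relative to one DFS spanning tree."""
    enter, leave, kinds = {}, {}, {}
    clock = 0

    def visit(n):
        nonlocal clock
        enter[n] = clock
        clock += 1
        for m in succ[n]:
            if m not in enter:
                kinds[(n, m)] = "tree"        # first visit: edge joins the spanning tree
                visit(m)
            elif m not in leave:
                kinds[(n, m)] = "back"        # m is still being visited: an ancestor of n
            elif enter[m] > enter[n]:
                kinds[(n, m)] = "advancing"   # m is an already finished proper descendant
            else:
                kinds[(n, m)] = "cross"       # neither node is an ancestor of the other
        leave[n] = clock
        clock += 1

    visit(root)
    return kinds

# Hypothetical graph chosen so that every kind of edge appears.
succ = {"A": ["B", "C", "D"], "B": ["D"], "C": ["D"], "D": ["A"]}
kinds = classify_edges(succ, "A")
for edge, kind in kinds.items():
    print(edge, kind)
```

The classification depends on the order in which successors are explored: a different DFS of the same graph can yield a different spanning tree and therefore different labels.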


Applications of Control Flow

Complexity: pointers to refactoring

Testing: Branch, Path, Basis Path
    Branch: must test 1-2, 1-3, 4-5, 4-8, 5-6, 5-7
    Path: infinite, due to the loop
    Basis Path: set of paths which covers all the edges at least once, e.g. 1,2,4,8; 1,3,4,5,6,7,4,8

Program Understanding: recover program structure

Impact analysis

…..

[Figure: control flow graph with nodes 1 to 8]
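A branch-coverage check over the listed branches and paths can be sketched as follows. The function name `uncovered_branches` and the representation of a path as a list of node numbers are assumptions made for this example; the branch list and the two paths are taken from the slide.

```python
def uncovered_branches(branches, paths):
    """Return the branch edges that none of the given paths traverses."""
    covered = set()
    for path in paths:
        covered.update(zip(path, path[1:]))   # consecutive node pairs are the edges taken
    return [b for b in branches if b not in covered]

branches = [(1, 2), (1, 3), (4, 5), (4, 8), (5, 6), (5, 7)]
paths = [[1, 2, 4, 8], [1, 3, 4, 5, 6, 7, 4, 8]]

missing = uncovered_branches(branches, paths)
print(missing)
```

On these inputs the sketch reports branch 5-7 as not exercised, so a test whose path goes directly from node 5 to node 7 would be needed as well to reach full branch coverage.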


Outline

Introduction/motivations
Program representation
Control flow
Data flow
    Introduction
    Reaching definitions


Data flow - Introduction

Flow of various data throughout the program
    Obtained from the AST or CFG
    Used in software engineering tasks

Exact solutions to most data flow problems are undecidable
    May depend on input
    May depend on the outcome of a conditional statement
    May depend on termination of a loop

Thus we compute approximations of the exact solution


Data flow - Introduction

Some approximations “overestimate” the solution
    Approximations contain the actual information plus some spurious information, but do not omit any actual information
    Conservative and safe approach

Some approximations “underestimate” the solution
    Approximations may not contain all the information of the actual solution
    Unsafe

Research challenge: providing safe but precise information in an efficient way

Uses of data flow:
    Compiler optimization requires conservative analysis
    Software engineering tasks may only need unsafe info


Data flow – Compiler Optimization

Common subexpression elimination

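Need to know available expressions: which expressions have already been computed at the point before this statement

[Figure: before the optimization, three statements each recompute a+b]
    c=a+b    d=a+b    e=a+b

[Figure: after the optimization, a+b is computed into a temporary t and reused]
    t=a+b; c=t    t=a+b; d=t    e=t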
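The idea can be sketched for the simplest case, a single basic block, where "available" just means "computed earlier in the block and not invalidated since". This is a local simplification of the global analysis the slide refers to, and it reuses the variable that already holds the value instead of introducing a temporary t. The statement encoding `(dest, left, op, right)` and the function name `local_cse` are assumptions; commutativity (a+b vs b+a) is ignored.

```python
def local_cse(block):
    """Replace repeated 'left op right' computations inside one basic block by copies."""
    available = {}   # (left, op, right) -> variable that currently holds that value
    out = []
    for dest, left, op, right in block:
        expr = (left, op, right)
        if expr in available:
            out.append((dest, available[expr], None, None))   # copy: dest = holder
        else:
            out.append((dest, left, op, right))
        # Assigning dest invalidates expressions that read dest or are held in dest.
        available = {e: v for e, v in available.items()
                     if dest not in (e[0], e[2]) and v != dest}
        if expr not in available and dest not in (left, right):
            available[expr] = dest
    return out

block = [
    ("c", "a", "+", "b"),
    ("d", "a", "+", "b"),   # a+b is available in c
    ("a", "d", "+", "1"),   # redefines a, so a+b is no longer available
    ("e", "a", "+", "b"),   # must be recomputed
]
optimized = local_cse(block)
for stmt in optimized:
    print(stmt)
```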


Data Flow - Compiler Optimization

Register (de)allocation
    When assigning memory locations to registers, if a value in a register (i.e., a memory location) is not used again, there is no need to keep it in a register
    Is R2 needed after this statement?

        R1=R2+10

    Need to know “live variables”: which variables are still used after the current line
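For straight-line code, live variables can be computed with one backward scan, as in the sketch below. The two statements following `R1=R2+10` are hypothetical, added only so that there is something to scan; the `(dest, uses)` encoding and the function name `live_after_each` are likewise assumptions.

```python
def live_after_each(block, live_out):
    """For each statement, return the set of variables live right after it."""
    live = set(live_out)                  # variables needed after the whole block
    result = [None] * len(block)
    for i in range(len(block) - 1, -1, -1):
        dest, uses = block[i]
        result[i] = set(live)
        live.discard(dest)                # the old value of dest is overwritten here
        live.update(uses)                 # operands must be live before the statement
    return result

block = [
    ("R1", ["R2"]),   # R1 = R2 + 10
    ("R3", ["R1"]),   # R3 = R1 * 2   (hypothetical)
    ("R2", ["R3"]),   # R2 = R3 - 1   (hypothetical)
]
live = live_after_each(block, live_out={"R2"})
print(live[0])        # variables live right after R1 = R2 + 10
```

Here R2 is not in the live set after the first statement, because its old value is overwritten before it is read again, so the register holding it could be reused in between.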


Data Flow - Compiler Optimization

Suppose every assignment to c that reaches this statement assigns 5 to c; then the value assigned to ‘a’ can be replaced by the constant 15

    a=c+10    // need 3 registers
    a=15      // need 2 registers

But: need to know reaching definitions: which definition(s) of variable c reach this statement


Data Flow - Sw Eng Tasks

Data-flow testing
    Suppose that a statement assigns a value but the use of that value is never executed under test

        a=c+10
        d=a+y

    (a is never used on this path)

    Need to know definition-use pairs: the link between definition(s) and use(s) of a variable (or a memory location)


Data Flow - Sw Eng Tasks

Debugging
    Suppose that ‘a’ has an incorrect value in the statement, e.g. int overflow

        a=c+y
        d=a+y

    Need data dependence information: some statements produce erroneous values, others are affected by those values


Data flow - Example

Compute the flow of data throughout the program
    Where does the assignment to i in statement 1 reach?
    Where does the expression computed in statement 2 reach?
    Which uses of variables are reachable from the end of block B1?
    Is the value of variable i live after statement 2?

[Figure: CFG with basic blocks B1 to B4]
    B1:  1. i=2
         2. k=i+1
    B2:  3. i=1
    B3:  4. k=k+1
    B4:  5. k=k-4


Reaching definitions analysis

Definition = statement where a variable is assigned a value (e.g. input statement, assignment statement)

A definition of ‘a’ reaches a point ‘p’ if there exists a control flow path in the CFG from the definition to ‘p’ with no other definitions of ‘a’ on the path

Such a path may exist in the graph but may not be executable (an infeasible path)



Reaching definitions analysis

What are the definitions in the program?
    Of variable i: 1, 3
    Of variable k: 2, 4, 5

Which basic blocks (at the point before the block) do these definitions reach?
    Def 1 reaches: B2
    Def 2 reaches: B1, B2, B3
    Def 3 reaches: B1, B3, B4
    Def 4 reaches: B4
    Def 5 reaches: exit


Reaching definitions analysis

Method
    Compute two kinds of basic information (within the block)
        GEN[B]: set of definitions generated within B
        KILL[B]: set of definitions that, if they reach the point before B, won’t reach the end of B
    Compute two other sets by propagation
        IN[B]: set of definitions that reach the beginning of B
        OUT[B]: set of definitions that reach the end of B
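GEN and KILL depend only on the statements inside each block, so they can be computed block by block. The sketch below does this for the running example (definitions 1 to 5 in blocks B1 to B4). The dictionary encoding and the function name `gen_kill` are assumptions; KILL is taken here to be every definition outside the block of a variable the block defines.

```python
def gen_kill(blocks, var_of):
    """blocks: block name -> definition numbers in program order.
    var_of: definition number -> variable it defines."""
    gen, kill = {}, {}
    for name, defs in blocks.items():
        last = {}
        for d in defs:
            last[var_of[d]] = d          # a later definition of a variable hides earlier ones
        gen[name] = set(last.values())   # definitions that survive to the end of the block
        kill[name] = {d for d in var_of if var_of[d] in last} - set(defs)
    return gen, kill

blocks = {"B1": [1, 2], "B2": [3], "B3": [4], "B4": [5]}
var_of = {1: "i", 2: "k", 3: "i", 4: "k", 5: "k"}

gen, kill = gen_kill(blocks, var_of)
for b in blocks:
    print(b, sorted(gen[b]), sorted(kill[b]))
```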


Reaching definitions analysis

Block   Init GEN   Init KILL   Init IN   Init OUT   IN     OUT
1       1,2        3,4,5       --        1,2        2,3    1,2
2       3          1           --        3          1,2    2,3
3       4          2,5         --        4          2,3    3,4
4       5          2,4         --        5          3,4    3,5


Iterative Data-Flow analysis algorithm

Algorithm for Reaching Definitions
Input: CFG with GEN[B], KILL[B] for all B
Output: IN[B], OUT[B] for all B

Begin RD
    IN[B] = empty, OUT[B] = GEN[B] for all B;
    change = true
    While change do begin
        change = false
        For each B do begin
            IN[B] = union OUT[P] (P is a predecessor of B)
            OLDOUT = OUT[B]
            OUT[B] = GEN[B] union (IN[B] - KILL[B])
            if (OUT[B] != OLDOUT) then change = true;
        End for
    End while
End RD
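The pseudocode translates almost line for line into Python. In the sketch below, the GEN and KILL sets are those of the running example, while the predecessor lists are an assumption: the CFG edges (B1 to B2, B2 back to B1, B2 to B3, B3 to B4) are inferred from the IN and OUT columns of the table, since the original figure is not available.

```python
def reaching_definitions(preds, gen, kill):
    """Iterate IN/OUT to a fixed point, following the RD algorithm."""
    IN = {b: set() for b in gen}
    OUT = {b: set(gen[b]) for b in gen}
    change = True
    while change:
        change = False
        for b in gen:
            # IN[B] is the union of OUT[P] over all predecessors P of B.
            IN[b] = set().union(*(OUT[p] for p in preds[b]))
            old_out = OUT[b]
            OUT[b] = gen[b] | (IN[b] - kill[b])
            if OUT[b] != old_out:
                change = True
    return IN, OUT

gen = {"B1": {1, 2}, "B2": {3}, "B3": {4}, "B4": {5}}
kill = {"B1": {3, 4, 5}, "B2": {1}, "B3": {2, 5}, "B4": {2, 4}}
preds = {"B1": ["B2"], "B2": ["B1"], "B3": ["B2"], "B4": ["B3"]}   # inferred edges

IN, OUT = reaching_definitions(preds, gen, kill)
for b in gen:
    print(b, sorted(IN[b]), sorted(OUT[b]))
```

With these inputs the loop stabilizes after two passes and yields the same IN and OUT sets as the table. Because sets only grow and are bounded by the total number of definitions, the iteration always terminates.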


Tools

Eclipse JDT/AST (APIs to construct, traverse and manipulate AST)
    http://www.vogella.de/articles/EclipseJDT/article.html
Sourcerer
    http://sourcerer.ics.uci.edu/index.html
Crystal (Data Analysis Framework, mostly for academic purposes)
    http://code.google.com/p/crystalsaf/wiki/Installation


Mandatory Reading List

Representation and Analysis of Software – Rep-Analysis.pdf

Crystal Notes – CrystalTutorialNotes.pdf, CrystalTutorial.ppt

Eclipse JDT - AST - http://www.vogella.de/articles/EclipseJDT/article.html


More (optional) Reading List

Principles of Program Analysis, Nielson, Nielson, and Hankin

Invariant Detection using Daikon – daikon.pdf

More optional readings are available in the Program Analysis course material at CMU: http://www.cs.cmu.edu/~aldrich/courses/15-819M/