Interprocedural Analysis Noam Rinetzky Mooly Sagiv msagiv/courses/pa05.html Tel Aviv University...
Date posted: 21-Dec-2015
Interprocedural Analysis
Noam Rinetzky
Mooly Sagiv
http://www.cs.tau.ac.il/~msagiv/courses/pa05.html
Tel Aviv University
640-6706
Textbook Chapter 2.5
Outline
Challenges in interprocedural analysis
The trivial solution
Why isn’t it adequate
Simplifying assumptions
A naive solution
Join over valid paths
The call-string approach
The functional approach
– A case study: linear constant propagation
– Context-free reachability
Modularity issues
Other solutions
Challenges in Interprocedural Analysis
Respect the call-return mechanism
Handling recursion
Local variables
Parameter passing mechanisms: value, value-result, reference, by name
Procedure nesting
The called procedure is not always known
The source code of the called procedure is not always available
– separate compilation
– vendor code
– ...
Extended Syntax of While
a ::= x | n | a1 op_a a2
b ::= true | false | not b | b1 op_b b2 | a1 op_r a2
S ::= [x := a]^l | [call p(a, z)]^l_l’ | [skip]^l | S1 ; S2 | if [b]^l then S1 else S2 | while [b]^l do S
P ::= begin D S end
D ::= proc id(val id*, res id*) is^l S end^l’ | D D
A Trivial treatment of procedures
Analyze a single procedure
After every call continue with conservative information
– Global variables and local variables which “may be modified by the call” are mapped to ⊤
A Trivial treatment of procedures
begin
  proc p() is^1
    [x := 1]^2
  end^3
  [call p()]^4_5
  [print x]^6
end
Dataflow values annotated on the figure, in slide order:
[a↦⊤, x↦⊤]
[a↦⊤, x↦1]
[x↦0]
[x↦0]
[x↦⊤]
[x↦⊤]
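The conservative treatment of a call can be sketched in a few lines. This is a minimal illustration, not the course's implementation: the names `TOP`, `after_call` and `may_modify` are assumptions, and the set of possibly modified variables is taken as given (for instance from a side-effect analysis).

```python
TOP = "TOP"  # abstract value for "not a constant" (assumed encoding)

def after_call(state, may_modify):
    """Abstract state after a call whose body is not analyzed.

    Every variable the callee may modify is mapped to TOP;
    all other variables keep the value they had before the call.
    """
    return {var: (TOP if var in may_modify else val)
            for var, val in state.items()}

# Hypothetical state before a call that may modify only x.
before = {"a": 7, "x": 0}
after = after_call(before, may_modify={"x"})
print(after)
```

Without side-effect information, `may_modify` has to contain every variable visible to the callee, which is why the trivial solution loses so much precision.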
Advantages of the trivial solution
Can be easily implemented
Procedures can be written in different languages
Procedure inlining can help
Side-effect analysis can help
Disadvantages of the trivial solution
Modular (object-oriented and functional) programming encourages small, frequently called procedures
Optimization
– Modern machines allow the compiler to schedule many instructions in parallel
– Need to optimize many instructions
– Inlining can be a bad solution
Software engineering
– Many bugs result from interface misuse
– Procedures define partial functions
Simplifying Assumptions
All the code is available
Simple parameter passing
The called procedure is syntactically known
No nesting
Procedure names are syntactically different from variables
Procedures are uniquely defined
Recursion is supported
Constant Example
begin
  proc p() is^1
    if [b]^2 then (
      [a := a - 1]^3
      [call p()]^4_5
      [a := a + 1]^6
    )
    [x := -2 * a + 5]^7
  end^8
  [a := 7]^9 ; [call p()]^10_11 ; [print(x)]^12
end
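A naive Interprocedural solution
Treat procedure calls as gotos
Obtain a conservative solution
Find the least fixed point of the system:
$DF_{entry}(s) = \iota$
$DF_{entry}(v) = \bigsqcup \{ f(e)(DF_{entry}(u)) : e = (u, v) \in E \}$
where s is the program entry and $\iota$ is the initial dataflow value
Use chaotic iterations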
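The equation system above can be solved by a simple chaotic iteration. The sketch below is an illustration under stated assumptions: it uses constant propagation, encodes ⊤ as the string `"TOP"` and an unreachable state as `None`, attaches the transfer function of an edge (u, v) to its source node u, and hard-codes the labels of a two-call program in which procedure p (labels 1 to 3) is called at labels 5 and 9, with return labels 6 and 10. All function and variable names are hypothetical.

```python
TOP = "TOP"  # abstract value for "not a constant"

def join(s, t):
    """Join two abstract states; None stands for an unreachable state."""
    if s is None:
        return t
    if t is None:
        return s
    return {v: s[v] if s[v] == t[v] else TOP for v in s}

def chaotic_iteration(edges, effect, start, init):
    """Least solution of DF(v) = join of effect[u](DF(u)) over edges (u, v)."""
    df = {n: None for edge in edges for n in edge}
    df[start] = init
    changed = True
    while changed:
        changed = False
        for u, v in edges:
            if df[u] is None:
                continue  # u not reached yet
            new = join(df[v], effect.get(u, lambda s: s)(df[u]))
            if new != df[v]:
                df[v] = new
                changed = True
    return df

# Effects of the assignments; every other node is the identity.
effect = {
    4: lambda s: {**s, "a": 7},
    2: lambda s: {**s, "x": TOP if s["a"] == TOP else s["a"] + 1},
    8: lambda s: {**s, "a": 9},
}
# Calls and returns are plain gotos: 5->1, 9->1, 3->6, 3->10.
edges = [(4, 5), (5, 1), (1, 2), (2, 3), (3, 6), (3, 10),
         (6, 7), (7, 8), (8, 9), (9, 1), (10, 11)]

df = chaotic_iteration(edges, effect, start=4, init={"x": 0, "a": 0})
print(df[7], df[11])
```

Because the exit of p flows to both return labels, the values of the two call sites are merged at the entry of p, and x is no longer a constant at the print statement.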
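Simple Example
begin
  proc p() is^1
    [x := a + 1]^2
  end^3
  [a := 7]^4
  [call p()]^5_6
  [print x]^7
  [a := 9]^8
  [call p()]^9_10
  [print a]^11
end
[Figure: control-flow graph with nodes proc p, x=a+1, end, a=7, call p5, call p6, print x, a=9, call p9, call p10, print a]
Dataflow values annotated on the figure, in slide order:
[x↦0, a↦0]
[x↦0, a↦7] [x↦0, a↦7]
[x↦0, a↦7]
[x↦8, a↦7]
[x↦8, a↦7]
[x↦8, a↦7]
[x↦8, a↦7]
[x↦8, a↦7]
[x↦8, a↦9]
[x↦8, a↦9]
[x↦⊤, a↦⊤]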
Simple Example
(same program and control-flow graph as on the previous slide)
Dataflow values annotated on the figure, in slide order:
[x↦0, a↦0]
[x↦0, a↦7]
[x↦⊤, a↦⊤]
[x↦⊤, a↦⊤]
[x↦⊤, a↦⊤]
[x↦⊤, a↦⊤]
[x↦⊤, a↦⊤]
[x↦⊤, a↦⊤]
[x↦⊤, a↦9]
[x↦⊤, a↦⊤]
We want something better …
Let paths(v) denote the potentially infinite set of paths from start to v (written as sequences of labels)
For a sequence of edges [e1, e2, …, en] define f[e1, e2, …, en]: L → L by composing the effects of basic blocks
$f[e_1, e_2, \ldots, e_n](l) = f(e_n)(\ldots(f(e_2)(f(e_1)(l)))\ldots)$
$JOP[v] = \bigsqcup \{ f[e_1, e_2, \ldots, e_n](\iota) \mid [e_1, e_2, \ldots, e_n] \in paths(v) \}$
where $\iota$ is the initial dataflow value at start
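Valid Paths
[Figure: a path with transfer functions f1, f2, f3, f4, f5, …, fk-3, fk-2, fk-1, fk passing through the nodes call_q, enter_q, exit_q and ret; the call edge is marked "(" and the return edge ")"]
Invalid Path
int x;
void main() {
    x = 5;
    p();
    return;
}
void p() {
    if (...) {
        x = x + 1;
        p(); // p_calls_p1
        x = x - 1;
    }
    return;
}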
A More Precise Solution
Only considers matching calls and returns (valid)
Can be defined via a context-free grammar
Every call is a different letter
Matching calls and returns:
Matched → ε | Matched Matched | (_c Matched )_c    for all [call p()]^lc_lr in P
Valid → Matched | l_c Valid    for all [call p()]^lc_lr in P
A More Precise Solution
The grammar written over edges:
Intra → ε | (l_i, l_j) Intra    for all l_i, l_j in Lab* \ Lab_IP
Matched → ε | Intra | Matched Matched | (l_c, l_n) Matched (l_x, l_r)    for all [call p()]^lc_lr and proc p() is^ln S end^lx
Valid → Matched | (l_c, l_n) Valid    for all [call p()]^lc_lr and proc p() is^ln S end^lx
Let
Lab* = all the labels in the program
Lab_IP = {l_c, l_r : [call p()]^lc_lr in the program}
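Membership in the Valid language can be checked with a stack, since the grammar only asks that every return edge match the most recent pending call. The following sketch assumes paths are given as lists of edges (pairs of labels) and that a dictionary maps each call edge (l_c, l_n) to its return edge (l_x, l_r). The function name and the example labels (procedure p at labels 1 to 3, called at 5 and 9 with return labels 6 and 10) are illustrative assumptions.

```python
def is_valid_path(path, calls):
    """Check that `path` is in Valid: returns match calls, pending calls allowed.

    path  -- list of edges (u, v)
    calls -- dict mapping each call edge (lc, ln) to its return edge (lx, lr)
    """
    return_edges = set(calls.values())
    pending = []  # return edges expected by the currently open calls
    for edge in path:
        if edge in calls:
            pending.append(calls[edge])
        elif edge in return_edges:
            # A return must close the most recently opened call.
            if not pending or pending.pop() != edge:
                return False
    return True

calls = {(5, 1): (3, 6), (9, 1): (3, 10)}

# Call at 5 returns to 6: valid.
good = [(4, 5), (5, 1), (1, 2), (2, 3), (3, 6), (6, 7)]
# Call at 5 "returns" to 10: this is the kind of path the naive solution follows.
bad = [(4, 5), (5, 1), (1, 2), (2, 3), (3, 10), (10, 11)]

print(is_valid_path(good, calls), is_valid_path(bad, calls))
```

A path that ends inside the callee, such as `[(4, 5), (5, 1), (1, 2)]`, is accepted as well, which corresponds to the production that lets Valid start with an unmatched call edge.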
The Join-Over-Valid-Paths (JVP)
For a sequence of edges [e1, e2, …, en] define f[e1, e2, …, en]: L → L by composing the effects of basic statements
– $f[\,](s) = s$
– $f[e, p](s) = f[p](f_e(s))$
$JVP_l = \bigsqcup \{ f[e_1, e_2, \ldots, e_k](\iota) \mid [e_1, e_2, \ldots, e_k] \in vpaths(l),\ e_k = (*, l) \}$
Compute a safe approximation to JVP
In some cases the JVP can be computed
– Distributivity of f
– Functional representation
The Call String Approach for Approximating JVP
No assumptions
Record at every node a pair (l, c) where l ∈ L is the dataflow information and c is a suffix of unmatched calls
Use chaotic iterations
To guarantee termination limit the size of c (typically 1 or 2)
Emulates inlining (but no code growth)
Exponential in C, the bound on the size of c
For a finite lattice there exists a C which leads to the join over all valid paths
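A k-limited call-string analysis can be sketched as a chaotic iteration over pairs (node, call string). This is a minimal illustration, not the course's algorithm: it assumes constant propagation over global variables only, encodes ⊤ as `"TOP"`, keeps the last k call labels as the context, and describes each call by a tuple (l_c, l_n, l_x, l_r). The example labels assume procedure p at labels 1 to 3, called at labels 5 and 9 with return labels 6 and 10; all names are hypothetical.

```python
TOP = "TOP"  # abstract value for "not a constant"

def join(s, t):
    """Join two abstract states; None stands for an unreachable state."""
    if s is None:
        return t
    if t is None:
        return s
    return {v: s[v] if s[v] == t[v] else TOP for v in s}

def call_strings(intra, calls, effect, start, init, k=1):
    """Chaotic iteration over (node, call string) pairs, for k >= 1.

    intra  -- intraprocedural edges (u, v)
    calls  -- tuples (lc, ln, lx, lr): call, entry, exit and return labels
    effect -- transfer function per node (identity when missing)
    """
    df = {(start, ()): init}

    def update(key, state):
        new = join(df.get(key), state)
        if new != df.get(key):
            df[key] = new
            return True
        return False

    changed = True
    while changed:
        changed = False
        for (node, cs), state in list(df.items()):
            out = effect.get(node, lambda s: s)(state)
            for u, v in intra:
                if u == node:
                    changed |= update((v, cs), out)
            for lc, ln, lx, lr in calls:
                if node != lc:
                    continue
                callee_cs = (cs + (lc,))[-k:]  # push the call, keep last k
                changed |= update((ln, callee_cs), out)
                at_exit = df.get((lx, callee_cs))
                if at_exit is not None:  # return only into the calling context
                    exit_effect = effect.get(lx, lambda s: s)
                    changed |= update((lr, cs), exit_effect(at_exit))
    return df

effect = {
    4: lambda s: {**s, "a": 7},
    2: lambda s: {**s, "x": TOP if s["a"] == TOP else s["a"] + 1},
    8: lambda s: {**s, "a": 9},
}
intra = [(4, 5), (1, 2), (2, 3), (6, 7), (7, 8), (8, 9), (10, 11)]
calls = [(5, 1, 3, 6), (9, 1, 3, 10)]

df = call_strings(intra, calls, effect, start=4, init={"x": 0, "a": 0})
print(df[(7, ())], df[(11, ())])
```

With k = 1 the two calls are kept apart, so x stays constant at both print statements; in a recursive program the truncated call strings of different recursion depths coincide and their values are joined.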
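Simple Example
begin
  proc p() is^1
    [x := a + 1]^2
  end^3
  [a := 7]^4
  [call p()]^5_6
  [print x]^7
  [a := 9]^8
  [call p()]^9_10
  [print a]^11
end
[Figure: control-flow graph with nodes proc p, x=a+1, end, a=7, call p5, call p6, print x, a=9, call p9, call p10, print a]
Dataflow values annotated on the figure, in slide order (a prefix such as "5:" is the call string):
[x↦0, a↦0]
[x↦0, a↦7]   5: [x↦0, a↦7]
5: [x↦0, a↦7]
5: [x↦8, a↦7]
[x↦8, a↦7]
[x↦8, a↦7]
[x↦8, a↦7]
[x↦8, a↦9]
9: [x↦8, a↦9]
9: [x↦8, a↦9]
9: [x↦10, a↦9]
[x↦10, a↦9]
5: [x↦8, a↦7]   9: [x↦10, a↦9]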
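Recursive Example
begin^0
  proc p() is^1
    if [b]^2 then (
      [a := a - 1]^3
      [call p()]^4_5
      [a := a + 1]^6
    )
    [x := -2 * a + 5]^7
  end^8
  [a := 7]^9 ; [call p()]^10_11 ; [print(x)]^12
end^13
[Figure: control-flow graph with nodes a=7, Call p10, Call p11, print(x), p, If( … ), a=a-1, Call p4, Call p5, a=a+1, x=-2a+5, end]
Dataflow values annotated on the figure, in slide order (a prefix such as "10:" is the call string):
10: [x↦0, a↦7]
[x↦0, a↦7]
[x↦0, a↦0]   10: [x↦0, a↦7]
10: [x↦0, a↦6]
4: [x↦0, a↦6]
4: [x↦0, a↦6]
4: [x↦-7, a↦6]
10: [x↦-7, a↦6]
4: [x↦-7, a↦6]
4: [x↦-7, a↦6]
4: [x↦-7, a↦7]
4: [x↦⊤, a↦⊤]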
The Functional Approach
The meaning of a function is a mapping from states to states
The abstract meaning of a function is a mapping from abstract states to abstract states
Motivating Example
begin
  proc p() is^1
    if [b]^2 then (
      [a := a - 1]^3
      [call p()]^4_5
      [a := a + 1]^6
    )
    [x := -2 * a + 5]^7
  end^8
  [a := 7]^9 ; [call p()]^10_11 ; [print(x)]^12
end
[Figure: control-flow graph with nodes a=7, Call p10, Call p11, print(x), p, If( … ), a=a-1, Call p4, Call p5, a=a+1, x=-2a+5, end]
Dataflow values annotated on the figure, in slide order:
[x↦0, a↦7]
[x↦0, a↦0]
λe.[x↦-2e(a)+5, a↦e(a)]
[x↦-9, a↦7]
[x↦-9, a↦7]
Motivating Example
begin
  proc p() is^1
    if [b]^2 then (
      [a := a - 1]^3
      [call p()]^4_5
      [a := a + 1]^6
    )
    [x := -2 * a + 5]^7
  end^8
  [read(a)]^9 ; [call p()]^10_11 ; [print(x)]^12
end
[Figure: control-flow graph with nodes read(a), Call p10, Call p11, print(x), p, If( … ), a=a-1, Call p4, Call p5, a=a+1, x=-2a+5, end]
Dataflow values annotated on the figure, in slide order:
[x↦0, a↦⊤]
[x↦0, a↦0]
λe.[x↦-2e(a)+5, a↦e(a)]
[x↦⊤, a↦⊤]
[x↦⊤, a↦⊤]
The Functional Approach
Main idea: iterate on the abstract domain of functions from L to L
Two-phase algorithm
– Compute the dataflow solution at the exit of a procedure as a function of the initial values at the procedure entry (functional values)
– Compute the dataflow values at every point using the functional values
Can compute the JVP
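Example: Constant propagation
$L = Var \to N \cup \{\top, \bot\}$
Domain: $F : L \to L$
– $(f_1 \sqcup f_2)(x) = f_1(x) \sqcup f_2(x)$
[Figure: straight-line code x=7 ; y=x+1 ; x=y]
Transfer functions of the first two statements:
λenv.env[x↦7]
λenv.env[y↦env(x)+1]
Accumulated functions along the path:
Id = λenv∈L.env
λenv.env[x↦7] ∘ λenv.env
λenv.env[y↦env(x)+1] ∘ λenv.env[x↦7] ∘ λenv.env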
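Example: Constant propagation
$L = Var \to N \cup \{\top, \bot\}$
Domain: $F : L \to L$
– $(f_1 \sqcup f_2)(x) = f_1(x) \sqcup f_2(x)$
[Figure: two branches, x=7 and y=x+1, that merge before x=y]
Transfer functions of the two branches:
λenv.env[x↦7]    λenv.env[y↦env(x)+1]
Function at the entry of each branch:
Id = λenv.env    Id = λenv.env
Function at the merge point:
λenv.env[y↦env(x)+1] ⊔ λenv.env[x↦7]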
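Composition and pointwise join of transformers can be tried out directly with closures. The sketch below is illustrative only: environments are dictionaries, ⊤ is encoded as `"TOP"`, ⊥ is left out for brevity, and every name in it is an assumption rather than part of the slides.

```python
TOP = "TOP"  # abstract value for "not a constant"

def join_env(e1, e2):
    """Pointwise join of two environments over the same variables."""
    return {v: e1[v] if e1[v] == e2[v] else TOP for v in e1}

def compose(f, g):
    """f after g."""
    return lambda env: f(g(env))

def join_fn(f, g):
    """Pointwise join of two transformers: (f join g)(env) = f(env) join g(env)."""
    return lambda env: join_env(f(env), g(env))

def plus_one(v):
    return TOP if v == TOP else v + 1

identity = lambda env: env
x_gets_7 = lambda env: {**env, "x": 7}
y_gets_x_plus_1 = lambda env: {**env, "y": plus_one(env["x"])}
x_gets_y = lambda env: {**env, "x": env["y"]}

# Straight-line code: x=7 ; y=x+1 ; x=y
straight_line = compose(x_gets_y,
                        compose(y_gets_x_plus_1, compose(x_gets_7, identity)))
# Two branches (x=7 or y=x+1) that merge before x=y
branching = compose(x_gets_y, join_fn(y_gets_x_plus_1, x_gets_7))

print(straight_line({"x": 0, "y": 0}))
print(branching({"x": 0, "y": 0}))
print(branching({"x": 7, "y": 8}))
```

Closures are convenient for applying a transformer but cannot be compared for equality, so a fixed-point computation over functions needs a symbolic representation instead.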
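Running Example 1 init
[Figure: control-flow graph with nodes begin^0, a=7^9, Call p^10, Call p^11, print(x)^12, end^13 for the main program and p^1, If( … )^2, a=a-1^3, Call p^4, Call p^5, a=a+1^6, x=-2a+5^7, end^8 for procedure p]
N       Function
0       λe.[x↦e(x), a↦e(a)] = id
1       λe.[x↦e(x), a↦e(a)] = id
3-13    λe.⊥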
Running Example 1
[Figure: control-flow graph with nodes begin^0, a=7^9, Call p^10, Call p^11, print(x)^12, end^13 for the main program and p^1, If( … )^2, a=a-1^3, Call p^4, Call p^5, a=a+1^6, x=-2a+5^7, end^8 for procedure p]
N     Function
Procedure p:
1     λe.[x↦e(x), a↦e(a)] = id
2     id
7     id
8     λe.[x↦-2e(a)+5, a↦e(a)]
3     id
4     λe.[x↦e(x), a↦e(a)-1]
5     f8 ∘ λe.[x↦e(x), a↦e(a)-1] = λe.[x↦-2(e(a)-1)+5, a↦e(a)-1]
6     λe.[x↦-2(e(a)-1)+5, a↦e(a)-1]
7     λe.[x↦-2(e(a)-1)+5, a↦e(a)] ⊔ λe.[x↦e(x), a↦e(a)]
8     λa, x.[x↦-2e(a)+5, a↦e(a)]
Main program:
0     λe.[x↦e(x), a↦e(a)] = id
10    λe.[x↦e(x), a↦7]
11    λa, x.[x↦-2e(a)+5, a↦e(a)] ∘ f10
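Running Example 2
[Figure: control-flow graph with nodes begin^0, a=7^9, Call p^10, Call p^11, print(x)^12, end^13 for the main program and p^1, If( … )^2, a=a-1^3, Call p^4, Call p^5, a=a+1^6, x=-2a+5^7, end^8 for procedure p]
N     Value
Procedure p:
1     [x↦0, a↦7]
2     [x↦0, a↦7]
7     [x↦0, a↦7]
8     [x↦-9, a↦7]
3     [x↦0, a↦7]
4     [x↦0, a↦6]
1     [x↦-7, a↦6]
6     [x↦-7, a↦7]
7     [x↦⊤, a↦7]
8     [x↦-9, a↦7]
1     [x↦⊤, a↦⊤]
Main program:
0     [x↦0, a↦0]
10    [x↦0, a↦7]
11    [x↦-9, a↦7]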
Issues in Functional Approach
How to guarantee finite height for the functional lattice?
– It may happen that L has finite height and yet the lattice of monotonic functions from L to L does not
Efficiently represent functions
– Functional join
– Functional composition
– Testing equality
– Usually non-trivial
– But can be done for distributive functions
Example: Linear Constant Propagation
Consider the constant propagation lattice
The value of every variable y at the program exit can be represented by:
$y = \bigsqcup \{ (a_x x + b_x) \mid x \in Var_* \} \sqcup c$, with $a_x, c \in Z \cup \{\bot, \top\}$ and $b_x \in Z$
Supports efficient composition and “functional” join
– [z := a * y + b]
– What about [z := x + y]?
Computes JVP
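A simplified version of this representation can be sketched as follows. It is an illustration under assumptions, not the representation from the slide: each variable at the exit is either ⊤ or a single term a·v + b over one entry variable v (a constant when a = 0), and the join of two different terms is approximated by ⊤, which is sound but may be less precise than the full form. The names `Lin`, `lin`, `substitute`, `compose` and `join` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

TOP = "TOP"  # "not a constant"

@dataclass(frozen=True)
class Lin:
    """The value a * var + b, with var read at entry; var is None for a constant."""
    a: int
    var: Optional[str]
    b: int

def lin(a, var, b):
    """Build a normalized term: a zero coefficient means the constant b."""
    return Lin(0, None, b) if a == 0 else Lin(a, var, b)

def substitute(value, first):
    """Rewrite `value` in terms of the entry variables of `first`."""
    if value == TOP or value.var is None:
        return value
    inner = first[value.var]
    if inner == TOP:
        return TOP
    return lin(value.a * inner.a, inner.var, value.a * inner.b + value.b)

def compose(second, first):
    """Transformer of running `first` and then `second`."""
    return {v: substitute(val, first) for v, val in second.items()}

def join(f, g):
    """Simplified functional join: equal terms are kept, others become TOP."""
    return {v: f[v] if f[v] == g[v] else TOP for v in f}

identity = {"x": lin(1, "x", 0), "a": lin(1, "a", 0)}
dec_a = {**identity, "a": lin(1, "a", -1)}      # a := a - 1
inc_a = {**identity, "a": lin(1, "a", 1)}       # a := a + 1
assign_x = {**identity, "x": lin(-2, "a", 5)}   # x := -2 * a + 5

# Candidate summary of the recursive procedure p, then one unrolling of its body.
p_summary = assign_x
recursive_branch = compose(assign_x, compose(inc_a, compose(p_summary, dec_a)))
unrolled = join(assign_x, recursive_branch)

# Calling p after a := 7.
set_a_7 = {**identity, "a": lin(0, None, 7)}
after_call = compose(p_summary, set_a_7)
print(unrolled == p_summary, after_call)
```

Because terms are plain values, composition is a substitution and equality testing is structural, which is what makes the iteration over summaries practical for this class of functions. A statement such as z := x + y depends on two variables and does not fit a single term, so this sketch would have to map z to ⊤.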
Functional Approach via Context-Free Reachability
The problem of computing reachability in a graph restricted by a context-free grammar can be solved in cubic time
Can be used to compute JVP in arbitrary finite distributive dataflow problems (not just bitvector)
Nodes in the graph correspond to individual facts
Efficient implementations exist (MOPED)
Conclusion
Handling functions is crucial for abstract interpretation
Virtual functions and exceptions complicate things
But scalability is an issue
Assume-guarantee helps
– But relies on specifications