Pointer Analysis
-
Upload
lars-sharp -
Category
Documents
-
view
27 -
download
1
description
Transcript of Pointer Analysis
Pointer Analysis
G. RamalingamMicrosoft Research, India
A Constant Propagation Example
x = 3;
y = 4;
z = x + 5;
• x is always 3 here• can replace x by 3• and replace x+5 by 8• and so on
A Constant Propagation ExampleWith Pointers
x = 3;
*p = 4;
z = x + 5;
• Is x always 3 here?
p = &y; x = 3; *p = 4; z = x + 5;
A Constant Propagation ExampleWith Pointers
x is always 3
p = &x; x = 3; *p = 4; z = x + 5;
if (?) p = &x; else p = &y; x = 3; *p = 4; z = x + 5;
x is always 4
x may be 3 or 4(i.e., x is unknown in our lattice)
pointers affectmost program analyses
p = &y; x = 3; *p = 4; z = x + 5;
A Constant Propagation ExampleWith Pointers
p alwayspoints-to
y
p = &x; x = 3; *p = 4; z = x + 5;
if (?) p = &x; else p = &y; x = 3; *p = 4; z = x + 5;
p alwayspoints-to
x
p may point-to x or y
Points-to Analysis
• Determine the set of targets a pointer variable could point-to (at different points in the program)– “p points-to x” == “*p has l-value x”– targets could be variables or locations in
the heap (dynamic memory allocation)• p = &x;• p = new Foo(); or p = malloc (…);
–must-point-to vs. may-point-to
A Constant Propagation ExampleWith Pointers
*q = 3;
*p = 4;
z = *q + 5; what values
canthis take?
Can *p denote thesame location as *q?
More Terminology
• *p and *q are said to be aliases (in a given concrete state) if they represent the same location
• Alias analysis– determine if a given pair of references
could be aliases at a given program point
– *p may-alias *q– *p must-alias *q
Pointer Analysis
• Points-To Analysis– may-point-to– must-point-to
• Alias Analysis– may-alias– must-alias
Points-To Analysis:A Simple Example
p = &x;q = &y;if (?) { q = p;}x = &a;y = &b;z = *q;
Points-To Analysis:A Simple Example
p = &x;q = &y;if (?) { q = p;}x = &a;y = &b;z = *q;
p q x y z
null null null null null
x null null null null
x y null null null
x y null null null
x x null null null
x {x,y} null null null
x {x,y} a null null
x {x,y} a b null
x {x,y} a b {a,b}
p q x y z
null null null null null
x null null null null
x y null null null
x y null null null
x x null null null
x {x,y} null null null
x {x,y} a null null
x {x,y} a b null
p q x y z
null null null null null
x null null null null
x y null null null
x y null null null
x x null null null
x {x,y} null null null
x {x,y} a null null
p q x y z
null null null null null
x null null null null
x y null null null
x y null null null
x x null null null
x {x,y} null null null
p q x y z
null null null null null
x null null null null
x y null null null
x y null null null
x x null null null
p q x y z
null null null null null
x null null null null
x y null null null
x y null null null
p q x y z
null null null null null
x null null null null
x y null null null
p q x y z
null null null null null
x null null null null
p q x y z
null null null null null
p q x y z
Points-To Analysisx = &a;y = &b;if (?) { p = &x;} else { p = &y;}
*x = &c;*p = &c;
How should we handlethis statement?
x: a y: bp: {x,y}
a: null
x: a y: bp: {x,y}
a: c
x: {a,c}
y: {b,c}
p: {x,y}
a: c
Strong update
Weak updateWeak update
Questions
• When is it correct to use a strong update? A weak update?
• Is this points-to analysis precise?
• What does it mean to say– p must-point-to x at pgm point u– p may-point-to x at pgm point u– p must-not-point-to x at u– p may-not-point-to x at u
Points-To Analysis, Formally
• We must formally define what we want to compute before we can answer many such questions
Static Program Analysis
• A static program analysis computes approximate information about the runtime behavior of a given program1. The set of valid programs is defined by the
programming language syntax2. The runtime behavior of a given program
is defined by the programming language semantics
3. The analysis problem defines what information is desired
4. The analysis algorithm determines what approximation to make
Programming Language: Syntax
• A program consists of– a set of variables Var– a directed graph (V,E,entry) with a distinguished
entry vertex, with every edge labelled by a primitive statement
• A primitive statement is of the form• x = null• x = y• x = *y• x = &y;• *x = y• skip(where x and y are variables in Var)
Omitted (for now)• Dynamic memory allocation• Pointer arithmetic• Structures and fields• Procedures
Example Program
x = &a;y = &b;if (?) { p = &x;} else { p = &y;}
*x = &c;*p = &c;
1
2
3
4 5
x = &a
y = &b
6
7
8
*x = &c
*p = &c
p = &yp = &x
skipskip
Vars = {x,y,p,a,b,c}
Programming Language:Operational Semantics
• Operational semantics == an interpreter (defined mathematically)
• State– Data-State ::= Var -> (Var U {null})– PC ::= V (the vertex set of the CFG)– Program-State ::= PC x Data-State
• Initial-state:– (entry, \x. null)
Example States
1
2
3
4 5
x = &a
y = &b
6
7
8
*x = &c
*p = &c
p = &yp = &x
skipskip
Vars = {x,y,p,a,b,c}x: N, y:N, p:N, a:N, b:N, c:N
Initial data-state
Initial program-statex: N, y:N, p:N, a:N, b:N, c:N
<1, >
Programming Language:Operational Semantics
• Meaning of primitive statements– CS[stmt] : Data-State -> Data-State
• CS[ x = y ] s = s[x s(y)]• CS[ x = *y ] s = s[x s(s(y))]• CS[ *x = y ] s = s[s(x) s(y)]• CS[ x = null ] s = s[x null]• CS[ x = &y ] s = s[x y]
must say what happens if null is dereferenced
Programming Language:Operational Semantics
• Meaning of program– a transition relation on program-states– Program-State X Program-State– state1 state2 means that the execution of
some edge in the program can transform state1 into state2
• Defining
– (u,s) (v,s’) iff the program contains a control-flow edge u->v labelled with a statement stmt such that M[stmt]s = s’
Programming Language:Operational Semantics
• A sequence of states s1s2 … sn is said to be an execution (of the program) iff – s1 is the Initial-State
– si si+1 for 1 <= I < n
• A state s is said to be a reachable state iff there exists some execution s1s2 … sn
is such that sn = s.
• Define RS(u) = { s | (u,s) is reachable }
Programming Language:Operational Semantics
• A sequence of states s1s2 … sn is said to be an execution (of the program) iff – s1 is the Initial-State
– si | si+1 for 1 <= I < n
• A state s is said to be a reachable state iff there exists some execution s1s2 … sn
is such that sn = s.
• Define RS(u) = { s | (u,s) is reachable }
All of this formalism for this
one definition
Ideal Points-To Analysis:Formal Definition
• Let u denote a vertex in the CFG
• Define IdealMustPT (u) to be { (p,x) | forall s in RS(u). s(p) == x }
• Define IdealMayPT (u) to be{ (p,x) | exists s in RS(u). s(p) == x }
May-Point-To Analysis:Formal Requirement Specification
For every vertex u in the CFG,compute a set R(u) such that
R(u) { (p,x) | $sÎRS(u). s(p) == x }
Compute R: V -> 2Vars’ such thatR(u) IdealMayPT(u)
(where Var’ = Var U {null})
May Point-To Analysis
May-Point-To Analysis:Formal Requirement Specification
• An algorithm is said to be correct if the solution R it computes satisfies
"uÎV. R(u) IdealMayPT(u)• An algorithm is said to be precise if the solution R it
computes satisfies"uÎV. R(u) = IdealMayPT(u)
• An algorithm that computes a solution R1 is said to be more precise than one that computes a solution R2 if
"uÎV. R1(u) Í R2(u)
Compute R: V -> 2Vars’ such thatR(u) IdealMayPT(u)
Back To OurMay-Point-To Algorithm
p = &x;q = &y;if (?) { q = p;}x = &a;y = &b;z = *q;
p q x y z
null null null null null
x null null null null
x y null null null
x y null null null
x x null null null
x {x,y} null null null
x {x,y} a null null
x {x,y} a b null
x {x,y} a b {a,b}
(May-Point-To Analysis)Algorithm A
• Is this algorithm correct?• Is this algorithm precise?
• Let’s first completely and formally define the algorithm.
Algorithm A: A Formal DefinitionThe “Data Flow Analysis” Recipe
• Define semi-lattice of abstract-values– AbsDataState ::= Var -> 2Var’
– f1 f2 = \x. (f1 (x) È f2 (x))
– bottom = \x.{}
• Define initial abstract-value– InitialAbsState = \x. {null}
• Define transformers for primitive statements
• AS[stmt] : AbsDataState -> AbsDataState
Algorithm A: A Formal DefinitionThe “Data Flow Analysis” Recipe
• Let st(v,u) denote stmt on edge v->u
• Compute the least-fixed-point of the following “dataflow equations”– x(entry) = InitialAbsState
– x(u) = v->u AS(st(v,u)) x(v)
v w
u
st(w,u)st(v,u)
x(v) x(w)
x(u)
Algorithm A:The Transformers
• Abstract transformers for primitive statements– AS[stmt] : AbsDataState -> AbsDataState
• AS[ x = y ] s = s[x s(y)]• AS[ x = null ] s = s[x {null}]• AS[ x = &y ] s = s[x {y}]• AS[ x = *y ] s = s[x s*(s(y))] where s*({v1,…,vn}) = s(v1) È … È s(vn)
• AS[ *x = y ] s = ???
Correctness & Precision
• We have a complete & formal definition of the problem.
• We have a complete & formal definition of a proposed solution.
• How do we reason about the correctness & precision of the proposed solution?
Enter: The French Recipe(Abstract Interpretation)
Abstract Domain
• A semi-lattice (A, b)• Transfer Functions• For every statement
st,AS[st] : A -> A
Concrete (Collecting) Domain
• A semi-lattice (2C, b)• Transfer Functions• For every statement st,
CS*[st] : 2C -> 2C
2Data-State 2Var x Var’
Concrete Domain• Concrete states: C• Semantics: For every statement st,
CS[st] : C -> C
g
a
Points-To Analysis(Abstract Interpretation)
a(Y) = { (p,x) | exists s in Y. s(p) == x }
RS(u)
2Data-State 2Var x Var’
IdealMayPT(u)
MayPT(u)
Íaa
IdealMayPT (u) = a ( RS(u) )
Approximating Transformers:Correctness Criterion
C A
correctly approximated by
c1
c2
f
a1
a2
f#
correctly approximated by
c is said to be correctly approximated by a
iffa(c) Í a
Approximating Transformers:Correctness Criterion
C A
c1
c2
f
a1
a2
f#
concretizationg
abstractiona
requirement:f#(a1) ≥ a (f( g(a1))
Concrete Transformers
• CS[stmt] : Data-State -> Data-State
• CS[ x = y ] s = s[x s(y)]• CS[ x = *y ] s = s[x s(s(y))]• CS[ *x = y ] s = s[s(x) s(y)]• CS[ x = null ] s = s[x null]
• CS*[stmt] : 2Data-State -> 2Data-State
• CS*[st] X = { CS[st]s | s Î X }
Abstract Transformers
• AS[stmt] : AbsDataState -> AbsDataState
• AS[ x = y ] s = s[x s(y)]• AS[ x = null ] s = s[x {null}]• AS[ x = *y ] s = s[x s*(s(y))] where s*({v1,…,vn}) = s(v1) È … È s(vn)
• AS[ *x = y ] s = ???
Algorithm A: TranformersWeak/Strong Update
x: {&y} y: {&x,&z} z: {&a}
x: &b y: &x z: &a
x: &y y: &z z: &bx: {&y,&b} y: {&x,&z} z: {&a,&b}
x: &y y: &x z: &a
x: &y y: &z z: &a
*y = &b;f#*y = &b;f
a
g
Algorithm A: TranformersWeak/Strong Update
x: {&y} y: {&x,&z} z: {&a}
x: &y y: &b z: &a
x: &y y: &b z: &ax: {&y} y: {&b} z: {&a}
x: &y y: &x z: &a
x: &y y: &z z: &a
*x = &b;f#*x = &b;f
a
g
Algorithm A: TransformersWeak/Strong Update
• Transformer for “*p = q”