Program Analysis using Random Interpretation Sumit Gulwani UC-Berkeley March 2005.

Program Analysis using Random Interpretation

Sumit Gulwani

UC-Berkeley

March 2005

Program Analysis

Applications in all aspects of software development, e.g.

• Program correctness
– Software bugs are expensive!

• Compiler optimizations
– Provide people freedom to write code the way they want (leaving performance issues to compilers).

• Translation validation
– Semantic equivalence of programs before and after compilation (it is difficult to trust the output of a compiler for safety-critical systems).

Design choices in Program Analysis

• Completeness (precision, # of false positives)

• Computational complexity

• Ease of implementation

• Soundness = If analysis says “no bugs”, it means “no bugs”.

What if we allow “probabilistic soundness”?
– We get more precise, efficient and even simpler algorithms.
– Earlier, probabilistic algorithms were used in other areas like networks, but not in program analysis.
– We obtain a new class of analyses: random interpretation.

Random Interpretation

= Random Testing + Abstract Interpretation

Random Testing:
• Test program on random inputs
• Simple, efficient but unsound (can’t prove absence of bugs)

Abstract Interpretation:
• Class of deterministic program analyses
• Interpret (analyze) an abstraction (approximation) of program
• Sound but usually complicated, expensive

Random Interpretation:
• Class of randomized program analyses
• Almost as simple, efficient as random testing
• Almost as sound as abstract interpretation

Example 1

[Flowchart: two non-deterministic conditionals (*) in sequence, followed by two assertions.]

1. Either a := 0; b := i;   or   a := i-2; b := 2;

2. Either c := b - a; d := i - 2b;   or   c := 2a + b; d := b - 2i;

3. assert(c+d = 0); assert(c = a+i)

Example 1: Random Testing

[Flowchart of Example 1 with one of its four paths highlighted in blue. The only path that violates the second assertion is a := i-2; b := 2; followed by c := b - a; d := i - 2b; which gives c = 4-i and a+i = 2i-2.]

• Need to test the blue path to falsify the second assertion.

• The chance of choosing the blue path from the set of all 4 paths is small.

• Hence, random testing is unsound.
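
The point about paths can be checked with a short Python sketch that enumerates all four paths of Example 1 for one concrete input and evaluates both assertions on each. The function name `run_path` and the input value i = 3 are illustrative assumptions, not part of the slides.

```python
from itertools import product

def run_path(i, first, second):
    """Execute Example 1 along one path; `first`/`second` select the branch taken."""
    if first:
        a, b = 0, i
    else:
        a, b = i - 2, 2
    if second:
        c, d = b - a, i - 2 * b
    else:
        c, d = 2 * a + b, b - 2 * i
    # Results of assert(c+d = 0) and assert(c = a+i)
    return (c + d == 0), (c == a + i)

i = 3  # arbitrary test input (assumption); the outcome is the same for any i != 2
results = {path: run_path(i, *path) for path in product([True, False], repeat=2)}
failing = [path for path, (_, second_ok) in results.items() if not second_ok]
print(failing)  # [(False, True)]
```

Only one of the four paths exposes the bug, so a tester that samples paths at random can miss it.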

Example 1: Abstract Interpretation

[Flowchart of Example 1, annotated with the invariant computed at each program point:]

After a := 0; b := i;              a=0, b=i
After a := i-2; b := 2;            a=i-2, b=2
After the first join               a+b=i
After c := b - a; d := i - 2b;     a+b=i, c=b-a, d=i-2b
After c := 2a + b; d := b - 2i;    a+b=i, c=2a+b, d=b-2i
After the second join              a+b=i, c=-d

• Computes invariant at each program point.

• Operations are usually complicated and expensive.

Example 1: Random Interpretation

[Flowchart of Example 1.]

• Choose random values for input variables.

• Execute both branches of a conditional.

• Combine values of variables at join points.

• Test the assertion.

Outline

• Random Interpretation

→ Linear arithmetic (POPL 2003)

– Uninterpreted functions (POPL 2004)

– Inter-procedural analysis (POPL 2005)

– Other applications

Linear relationships in programs with linear assignments

• Linear relationships (e.g., x=2y+5) are useful for
– Program correctness (e.g., buffer overflows)
– Compiler optimizations (e.g., constant and copy propagation, CSE, induction variable elimination, etc.)

• “Programs with linear assignments” does not mean inapplicability to “real” programs
– “Abstract” other program statements as non-deterministic assignments (standard practice in program analysis)

Basic idea in random interpretation

Generic algorithm:

• Choose random values for input variables.

• Execute both branches of a conditional.

• Combine the values of variables at join points.

• Test the assertion.

Idea #1: The Affine Join operation

• Affine join of v1 and v2 w.r.t. weight w:

$\phi_w(v_1, v_2) \equiv w \cdot v_1 + (1-w) \cdot v_2$

• Affine join preserves common linear relationships (a+b=5).

• It does not introduce false relationships w.h.p.

Example with w = 7, joining the states (a = 2, b = 3) and (a = 4, b = 1):

$a = \phi_7(2,4) = -10, \quad b = \phi_7(3,1) = 15$

Idea #1: The Affine Join operation

• Affine join of v1 and v2 w.r.t. weight w:

$\phi_w(v_1, v_2) \equiv w \cdot v_1 + (1-w) \cdot v_2$

• Affine join preserves common linear relationships (a+b=5).

• It does not introduce false relationships w.h.p.

• Unfortunately, non-linear relationships are not preserved (e.g. $a \times (1+b) = 8$).

Joining the states (a = 2, b = 3) and (a = 4, b = 1):

With w = 5: $a = \phi_5(2,4) = -6, \quad b = \phi_5(3,1) = 11$

With w = 7: $a = \phi_7(2,4) = -10, \quad b = \phi_7(3,1) = 15$
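
The affine join can be written in a few lines of Python. This is a minimal sketch; the function name `affine_join` and the dictionary representation of a state are assumptions made for illustration.

```python
def affine_join(w, v1, v2):
    """phi_w(v1, v2) = w*v1 + (1-w)*v2, applied to each variable of two states."""
    return {x: w * v1[x] + (1 - w) * v2[x] for x in v1}

s1 = {"a": 2, "b": 3}
s2 = {"a": 4, "b": 1}

for w in (5, 7):
    s = affine_join(w, s1, s2)
    linear_ok = s["a"] + s["b"] == 5           # common linear relationship
    nonlinear_ok = s["a"] * (1 + s["b"]) == 8  # common non-linear relationship
    print(w, s, linear_ok, nonlinear_ok)
# 5 {'a': -6, 'b': 11} True False
# 7 {'a': -10, 'b': 15} True False
```

Both input states satisfy a+b=5 and a×(1+b)=8, but only the linear relationship survives the join.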

Geometric Interpretation of Affine Join

[Figure: the (a, b) plane with the lines a + b = 5 and b = 2. The states before the join, (a = 2, b = 3) and (a = 4, b = 1), both lie on a + b = 5; the state after the join lies on the line through them.]

• The state after the join satisfies all the affine relationships that are satisfied by both states before the join (e.g. a + b = 5).

• Given any relationship that is not satisfied by at least one of the states before the join (e.g. b = 2), the state after the join also does not satisfy it with high probability.

Example 1

[Flowchart of Example 1, annotated with the state computed by one run of the random interpreter, using input i = 3 and join weights w1 = 5, w2 = 2:]

Input                              i=3
After a := 0; b := i;              i=3, a=0, b=3
After a := i-2; b := 2;            i=3, a=1, b=2
After the first join (w1 = 5)      i=3, a=-4, b=7
After c := b - a; d := i - 2b;     i=3, a=-4, b=7, c=11, d=-11
After c := 2a + b; d := b - 2i;    i=3, a=-4, b=7, c=-1, d=1
After the second join (w2 = 2)     i=3, a=-4, b=7, c=23, d=-23

• Choose a random weight for each join independently.

• All choices of random weights verify first assertion

• Almost all choices contradict second assertion
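
The run above can be reproduced with a small Python sketch of the random interpreter, specialised to Example 1. The names `affine_join`, `interpret` and `check`, and the range used for the random draws, are illustrative assumptions.

```python
import random

def affine_join(w, s1, s2):
    """Join two states variable by variable: w*v1 + (1-w)*v2."""
    return {x: w * s1[x] + (1 - w) * s2[x] for x in s1}

def interpret(i, w1, w2):
    """One run of the random interpreter on Example 1."""
    # First conditional: execute both branches, then join with weight w1.
    s = affine_join(w1, {"i": i, "a": 0, "b": i}, {"i": i, "a": i - 2, "b": 2})
    # Second conditional: execute both branches from the joined state.
    t = dict(s, c=s["b"] - s["a"], d=s["i"] - 2 * s["b"])
    f = dict(s, c=2 * s["a"] + s["b"], d=s["b"] - 2 * s["i"])
    return affine_join(w2, t, f)

def check(s):
    """Evaluate assert(c+d = 0) and assert(c = a+i) on the final state."""
    return s["c"] + s["d"] == 0, s["c"] == s["a"] + s["i"]

s = interpret(i=3, w1=5, w2=2)
print(s)         # {'i': 3, 'a': -4, 'b': 7, 'c': 23, 'd': -23}
print(check(s))  # (True, False)

# Fresh random choices for i, w1, w2: the first assertion holds on every run.
rng = random.Random(0)
runs = [check(interpret(*(rng.randrange(2**16) for _ in range(3)))) for _ in range(1000)]
print(all(first for first, _ in runs))  # True
```

For this program the second assertion evaluates to true only when w2 = 0, w1 = 1 or i = 2, which is why almost all random choices contradict it.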

Example 2

a := x + y
if (x = y) then b := a
           else b := 2x
assert (b = 2x)

We need to make use of the conditional x=y on the true branch to prove the assertion.

Idea #2: The Adjust Operation

• Execute multiple runs of the program in parallel.

• Sample S = Collection of states at a program point

• Adjust(S, e=0) is the sample obtained by linear combination of states in S such that
– The equality conditional is satisfied.
– Note that original relationships are preserved.

• Use Adjust(S, e=0) on true branch of the conditional e=0
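
One way to realise Adjust is sketched below in Python. The construction is an assumption chosen to match the stated properties: pick a pivot state that violates e = 0, then replace every other state by the affine combination of itself and the pivot that lies on e = 0. The names `adjust`, `sample` and `true_branch`, and the sample values, are hypothetical. The usage part applies it to Example 2.

```python
from fractions import Fraction

def adjust(sample, e):
    """Return a sample whose states all satisfy e(state) == 0.

    `sample` is a list of states (dicts of variable -> value); `e` evaluates a
    linear expression on a state. One state is used as the pivot and dropped.
    """
    j = next((k for k, s in enumerate(sample) if e(s) != 0), None)
    if j is None:
        return list(sample)  # every state already satisfies e = 0
    pivot, e_pivot = sample[j], e(sample[j])
    adjusted = []
    for k, s in enumerate(sample):
        if k == j:
            continue
        # Affine combination of s and the pivot that lands on e = 0.
        # Assumes e(s) != e(pivot), which holds w.h.p. for random states.
        t = Fraction(e(s), e(s) - e_pivot)
        adjusted.append({x: s[x] + t * (pivot[x] - s[x]) for x in s})
    return adjusted

# Example 2 with three parallel runs (values stand in for random inputs).
sample = [{"x": 3, "y": 7}, {"x": -2, "y": 5}, {"x": 4, "y": 1}]
for s in sample:
    s["a"] = s["x"] + s["y"]                             # a := x + y
true_branch = adjust(sample, lambda s: s["x"] - s["y"])  # conditional x = y
for s in true_branch:
    s["b"] = s["a"]                                      # b := a
print(all(s["b"] == 2 * s["x"] for s in true_branch))    # True
```

Because each adjusted state is an affine combination of original states, the relationship a = x + y still holds after the adjustment, which is what lets the assertion go through.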

Geometric Interpretation of Adjust

• Program states = points

• Adjust = projection onto the hyperplane

• Adjust operation loses one point.

[Figure: algorithm to obtain S’ = Adjust(S, e=0). The points S1, S2, S3, S4 of the sample S are mapped to the points S’1, S’2, S’3 on the hyperplane e = 0.]

Correctness of Random Interpreter R

• Completeness: If $e_1 = e_2$, then $R \Rightarrow e_1 = e_2$
– assuming non-det conditionals

• Soundness: If $e_1 \neq e_2$, then $R \nRightarrow e_1 = e_2$
– error prob. $\leq$ […]

• b, j: number of branches and joins

• d: size of the set from which random values are chosen

• k: number of points in the sample

– If j = b = 10, k = 15, $d \approx 2^{32}$, then error $\leq$ […]

Proof Methodology

Proving correctness was the most complicated part in this work. We used the following methodology.

• Design an appropriate deterministic algorithm (need not be efficient)

• Prove (by induction) that the randomized algorithm simulates each step of the deterministic algorithm with high probability.

Outline

• Random Interpretation

– Linear arithmetic (POPL 2003)

→ Uninterpreted functions (POPL 2004)

– Inter-procedural analysis (POPL 2005)

– Other applications

Problem: Global value numbering

Original program:        After abstraction:
a := 5;                  a := 5;
x := a*b;                x := F(a,b);
y := 5*b;                y := F(5,b);
z := b*a;                z := F(b,a);

Original program:
• x=y and x=z
• Reasoning about multiplication is undecidable

After abstraction:
• only x=y
• Reasoning is decidable but tricky in presence of joins

• Axiom: If x1=y1 and x2=y2, then F(x1,x2)=F(y1,y2)

• Goal: Detect expression equivalence when program operators are abstracted using “uninterpreted functions”

• Application: Compiler optimizations, Translation validation
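
For straight-line code, the axiom alone can be applied with standard hash-based value numbering. The Python sketch below illustrates that deterministic baseline on the abstracted program; it is not the randomized algorithm of the talk, and the function name `value_numbers` and the tuple encoding of expressions are assumptions.

```python
def value_numbers(program):
    """Assign value numbers to variables in straight-line code.

    `program` is a list of (target, expr) pairs; expr is a constant, a
    variable name, or a tuple ("F", arg1, arg2) for the uninterpreted F.
    """
    table = {}  # structural key -> value number
    env = {}    # variable -> value number

    def number(key):
        return table.setdefault(key, len(table))

    def eval_expr(expr):
        if isinstance(expr, tuple):
            op, *args = expr
            # Equal argument numbers give equal result numbers (the axiom).
            return number((op, *[eval_expr(arg) for arg in args]))
        if isinstance(expr, str):
            if expr in env:
                return env[expr]
            return number(("input", expr))  # unassigned variable = input
        return number(("const", expr))

    for target, expr in program:
        env[target] = eval_expr(expr)
    return env

program = [
    ("a", 5),
    ("x", ("F", "a", "b")),
    ("y", ("F", 5, "b")),
    ("z", ("F", "b", "a")),
]
vn = value_numbers(program)
print(vn["x"] == vn["y"], vn["x"] == vn["z"])  # True False
```

This handles straight-line code only; the difficulty noted above is what to do at join points.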

Example

if (*) then x := a; y := a; z := F(a);
       else x := b; y := b; z := F(b);
assert(x = y);
assert(z = F(y));

At the join point:

$x = \phi(a,b)$
$y = \phi(a,b)$
$z = \phi(F(a),F(b))$
$F(y) = F(\phi(a,b))$

• Typical algorithms treat $\phi$ as uninterpreted
– Hence cannot verify the second assertion

• The randomized algorithm interprets $\phi$ as the affine join operation $\phi_w$
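
A small Python sketch of why the interpretation of F matters at this join. It runs the example once with a linear stand-in for the unary F and once with a non-linear one. The fixed numbers stand in for random choices, and both stand-ins for F are assumptions made for illustration.

```python
# Fixed stand-ins for the random choices (assumption, for reproducibility).
a, b, w, r = 3, 8, 5, 7

def join(v1, v2):
    """Affine join with weight w."""
    return w * v1 + (1 - w) * v2

def check(F):
    """Execute both branches, join x, y, z, and test the two assertions."""
    x = join(a, b)
    y = join(a, b)
    z = join(F(a), F(b))
    return x == y, z == F(y)

print(check(lambda v: r * v))      # linear F:     (True, True)
print(check(lambda v: r * v * v))  # non-linear F: (True, False)
```

With a linear F the affine join commutes with F, so z = F(y) is preserved across the join; with the quadratic stand-in it is lost.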

How to “execute” uninterpreted functions?

Expression language: e := y | F(e1,e2)

• Choose a random interpretation for F

• Non-linear interpretation
– E.g. $F(e_1,e_2) = r_1 e_1^2 + r_2 e_2^2$
– Preserves all equivalences in straight-line code
– But not across join points

• Let’s try linear interpretation

Random Linear Interpretation

• Encode $F(e_1,e_2) = r_1 e_1 + r_2 e_2$

• Preserves all equivalences across a join point

• Introduces false equivalences in straight-line code. E.g. e and e’ have the same encodings even though e ≠ e’:

e = F(F(a,b), F(c,d))
e’ = F(F(a,c), F(b,d))

Encodings:

$e = r_1(r_1 a + r_2 b) + r_2(r_1 c + r_2 d) = r_1^2 a + r_1 r_2 b + r_2 r_1 c + r_2^2 d$

$e' = r_1^2 a + r_1 r_2 c + r_2 r_1 b + r_2^2 d$

• Problem: Scalar multiplication is commutative.

• Solution: Choose r1 and r2 to be random matrices and evaluate expressions to vectors
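
The difference between scalar and matrix coefficients can be seen in a short Python sketch. Scalars are modelled as 1×1 matrices so that one `encode` function covers both cases. The helper names and all numeric values are illustrative assumptions; in the analysis itself the coefficients and variable values are chosen at random.

```python
def mat_vec(m, v):
    """Multiply matrix m (list of rows) by vector v (tuple)."""
    return tuple(sum(row[k] * v[k] for k in range(len(v))) for row in m)

def vec_add(u, v):
    return tuple(x + y for x, y in zip(u, v))

def encode(expr, env, r1, r2):
    """Encode a variable name, or a pair (e1, e2) standing for F(e1, e2)."""
    if isinstance(expr, str):
        return env[expr]
    e1, e2 = expr
    return vec_add(mat_vec(r1, encode(e1, env, r1, r2)),
                   mat_vec(r2, encode(e2, env, r1, r2)))

e = (("a", "b"), ("c", "d"))        # F(F(a,b), F(c,d))
e_prime = (("a", "c"), ("b", "d"))  # F(F(a,c), F(b,d))

# Scalar coefficients: multiplication commutes, so the encodings collide.
env1 = {"a": (2,), "b": (7,), "c": (11,), "d": (13,)}
s1, s2 = [[3]], [[5]]
print(encode(e, env1, s1, s2) == encode(e_prime, env1, s1, s2))  # True

# Non-commuting 2x2 matrices: the encodings differ.
env2 = {"a": (2, 3), "b": (7, 1), "c": (11, 5), "d": (13, 4)}
m1, m2 = [[1, 1], [0, 1]], [[1, 0], [1, 1]]
print(encode(e, env2, m1, m2) == encode(e_prime, env2, m1, m2))  # False
```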

Outline

• Random Interpretation

– Linear arithmetic (POPL 2003)

– Uninterpreted functions (POPL 2004)

→ Inter-procedural analysis (POPL 2005)

– Other applications

Example

[Flowchart of Example 1, ending in assert (c + d = 0); assert (c = a + i).]

• The second assertion is true in the context i=2.

• Interprocedural analysis requires computing procedure summaries.

Idea #1: Keep input variables symbolic

[Flowchart of Example 1, annotated with symbolic states computed using join weights w1 = 5, w2 = 2:]

After a := 0; b := i;              a=0, b=i
After a := i-2; b := 2;            a=i-2, b=2
After the first join (w1 = 5)      a=8-4i, b=5i-8
After c := b - a; d := i - 2b;     a=8-4i, b=5i-8, c=9i-16, d=16-9i
After c := 2a + b; d := b - 2i;    a=8-4i, b=5i-8, c=8-3i, d=3i-8
After the second join (w2 = 2)     a=8-4i, b=5i-8, c=21i-40, d=40-21i

• Do not choose random values for input variables (to later instantiate by any context).

• Resulting program state at the end is a random procedure summary.

Instantiating the summary in the context i=2 gives a=0, b=2, c=2, d=-2.
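
The symbolic run can be sketched in Python by storing each value as a linear form c + k·i. The pair representation and the helper names (`lin`, `join`, `summary`, `instantiate`) are assumptions made for illustration.

```python
def lin(c, k=0):
    """The linear form c + k*i, stored as the pair (c, k)."""
    return (c, k)

def add(p, q):
    return (p[0] + q[0], p[1] + q[1])

def scale(n, p):
    return (n * p[0], n * p[1])

def sub(p, q):
    return add(p, scale(-1, q))

def join(w, p, q):
    """Affine join of two linear forms with weight w."""
    return add(scale(w, p), scale(1 - w, q))

I = lin(0, 1)  # the symbolic input variable i

def summary(w1, w2):
    """Random summary of Example 1 with the input i kept symbolic."""
    a = join(w1, lin(0), sub(I, lin(2)))
    b = join(w1, I, lin(2))
    c = join(w2, sub(b, a), add(scale(2, a), b))
    d = join(w2, sub(I, scale(2, b)), sub(b, scale(2, I)))
    return {"a": a, "b": b, "c": c, "d": d}

def instantiate(state, i):
    """Plug a concrete context value for i into every linear form."""
    return {x: c + k * i for x, (c, k) in state.items()}

s = summary(w1=5, w2=2)
print(s)                  # {'a': (8, -4), 'b': (-8, 5), 'c': (-40, 21), 'd': (40, -21)}
print(instantiate(s, 2))  # {'a': 0, 'b': 2, 'c': 2, 'd': -2}
```

In the context i=2 the instantiated state satisfies both c+d = 0 and c = a+i, in line with the earlier observation that the second assertion holds in that context.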

Experiments

[Figure: bar chart of the number of inputs found to be constants (axis 0 to 75) for the benchmarks go (29K), ijpeg (28K), li (23K), gzip (8K); bars for Randomized and Deterministic.]

[Figure: bar chart of time in seconds (axis 0 to 50) for the same benchmarks; bars for Randomized and Deterministic.]

• Randomized algorithm discovers 10-70% more facts.

• Randomized algorithm is slower by a factor of 2.

Experimental measure of error

The % of incorrect relationships decreases with increase in
• S = size of set from which random values are chosen.
• N = # of random summaries used.

N \ S    2^10    2^16    2^31
2        95.5    95.5    95.5
3        64.3    3.2     0
4        0.2     0       0
5        0       0       0
6        0       0       0

The experimental results are better than what is predicted by theory.

Outline

• Random Interpretation

– Linear arithmetic (POPL 2003)

– Uninterpreted functions (POPL 2004)

– Inter-procedural analysis (POPL 2005)

→ Other applications

Other applications of random interpretation

• Model Checking
– Randomized equivalence testing algorithm for FCEDs, which represent conditional linear expressions and are a generalization of BDDs. (SAS 04)

• Theorem Proving
– Randomized decision procedure for linear arithmetic and uninterpreted functions. This runs an order of magnitude faster than the deterministic algorithm. (CADE 03)

• Ideas for deterministic algorithms
– PTIME algorithm for global value numbering, thereby solving a 30 year old open problem. (SAS 04)

Summary

                       Key Idea
Linear Arithmetic      Affine Join, Adjust
Uninterpreted Fns.     Vectors
Interproc. Analysis    Symbolic i/p variables

Lessons Learned

• Randomization buys efficiency, simplicity at cost of prob. soundness.

• Randomization suggests ideas for deterministic algorithms.

• Combining randomized and symbolic techniques is powerful.