On Abstraction Refinement for Program Analyses in Datalog

44
On Abstraction Refinement for Program Analyses in Datalog Xin Zhang , Ravi Mangal, Mayur Naik Georgia Tech Radu Grigore, Hongseok Yang Oxford University

description

On Abstraction Refinement for Program Analyses in Datalog. Xin Zhang , Ravi Mangal , Mayur Naik Georgia Tech. Radu Grigore , Hongseok Yang Oxford University. Datalog for program a nalysis. Datalog. What is Datalog ?. Datalog. What is Datalog ?. Input relations: - PowerPoint PPT Presentation

Transcript of On Abstraction Refinement for Program Analyses in Datalog

Page 1: On Abstraction Refinement for Program Analyses in  Datalog

On Abstraction Refinement for Program Analyses in

DatalogXin Zhang, Ravi Mangal,

Mayur NaikGeorgia Tech

Radu Grigore, Hongseok YangOxford University

Page 2: On Abstraction Refinement for Program Analyses in  Datalog

2 6/10/2014

Datalog for program analysis

Datalog

Programming Language Design and Implementation, 2014

Page 3: On Abstraction Refinement for Program Analyses in  Datalog

3 6/10/2014

What is Datalog?

Datalog

Programming Language Design and Implementation, 2014

Page 4: On Abstraction Refinement for Program Analyses in  Datalog

4 6/10/2014

What is Datalog?

Datalog

Input relations:Output relations:Rules:

Programming Language Design and Implementation, 2014

Least fixpoint computation:

edge(i, j).path(i, j).

(1) path(i, i).(2) path(i, k) :- path(i, j), edge(j, k).

Input: edge(0, 1), edge(1, 2).path(0, 0).path(1, 1).path(2, 2).path(0, 1) :- path(0, 0), edge(0, 1).path(0, 2) :- path(0, 1), edge(1, 2).

Page 5: On Abstraction Refinement for Program Analyses in  Datalog

5 6/10/2014

Why Datalog?

Programming Language Design and Implementation, 2014

Datalog

If there exists a path from a to b, and there is an edge from b to c, then there exists a path from a to c:

path(a, c) :- path(a, b), edge(b, c).

Page 6: On Abstraction Refinement for Program Analyses in  Datalog

6 6/10/2014

Why Datalog?

Programming Language Design and Implementation, 2014

k-object-sensitivity,

k = 2, ~100KLOC

Page 7: On Abstraction Refinement for Program Analyses in  Datalog

7 6/10/2014

Limitation

Programming Language Design and Implementation, 2014

k-object-sensitivity,

k = 2, ~100KLOC

k-object-sensitivity,

k = 10, ~500KLOC

Page 8: On Abstraction Refinement for Program Analyses in  Datalog

8 6/10/2014

Program abstraction

Programming Language Design and Implementation, 2014

Precision

Scalability

Abstraction

Page 9: On Abstraction Refinement for Program Analyses in  Datalog

9 6/10/2014

Parametric program abstraction

Programming Language Design and Implementation, 2014

Precision

Scalability

Abstraction

1 1 1 1 1

Page 10: On Abstraction Refinement for Program Analyses in  Datalog

10 6/10/2014

Parametric program abstraction

Programming Language Design and Implementation, 2014

Precision

Scalability

Abstraction

1 0 1 1 0

Page 11: On Abstraction Refinement for Program Analyses in  Datalog

11 6/10/2014

Parametric program abstraction: Example 1

Programming Language Design and Implementation, 2014

1 0 1 1 0

Cloning depth K for each call site and

allocation site

Pointer Analysis

Page 12: On Abstraction Refinement for Program Analyses in  Datalog

12 6/10/2014

Parametric program abstraction: Example 2

Programming Language Design and Implementation, 2014

1 0 1 1 0

Predicates to use as

abstraction predicates

Shape Analysis

Page 13: On Abstraction Refinement for Program Analyses in  Datalog

13 6/10/2014

Program abstraction

Programming Language Design and Implementation, 2014

1 0 1 1 0 0 1 1 0 0

Datalog Progra

m

Datalog Progra

m

alias(p, q)?

alias(m, n)?

Page 14: On Abstraction Refinement for Program Analyses in  Datalog

14 6/10/2014

Program abstraction

Programming Language Design and Implementation, 2014

1 0 1 1 0 0 1 1 0 0

Datalog Progra

m

Datalog Progra

m

alias(p, q)?

alias(m, n)?

Counterexample guided refinement

(CEGAR) via MAXSAT

Page 15: On Abstraction Refinement for Program Analyses in  Datalog

15 6/10/2014

Pointer analysis example

Programming Language Design and Implementation, 2014

f(){ v1 = new ...; v2 = id1(v1); v3 = id2(v2);q2:assert(v3!= v1);}

id1(v){return v;}

g(){ v4 = new ...; v5 = id1(v4); v6 = id2(v5);q1:assert(v6!= v1);}

id2(v){return v;}

Page 16: On Abstraction Refinement for Program Analyses in  Datalog

16 6/10/2014

Pointer analysis as graph reachability

Programming Language Design and Implementation, 2014

0

1

2

3

4

5

6’

7

66’’

7’7’’

a1 a0 b0

c1 c0 d0 d1

a1

c1 c0

b0 b1

d1d0

a0

b1

Page 17: On Abstraction Refinement for Program Analyses in  Datalog

17 6/10/2014

Graph reachability in Datalog

Programming Language Design and Implementation, 2014

Input relations: edge(i, j, n), abs(n)

Output relations: path(i, j)

Rules: (1) path(i, i).(2) path(i, j) :- path(i, k), edge(k, j, n), abs(n).

0

1

2

3

4

5

6’

7

66’’

7’7’’

a1 a0 b0

c1 c0 d0 d1

a1

c1 c0

b0 b1

d1d0

a0

b1

Input tuples:edge(0, 6, a0), edge(0, 6’, a1), edge(3, 6, b0),

abs(a0)abs(a1), abs(b0)abs(b1),abs(c0)abs(c1), abs(d0)abs(d1).

Query Tuple

Original Query

q1: path(0, 5)

assert(v6!= v1)

q2: path(0, 2)

assert(v3!= v1)16 possible abstractions

in total

Page 18: On Abstraction Refinement for Program Analyses in  Datalog

18 6/10/2014

Desired result

Programming Language Design and Implementation, 2014

0

1

2

3

4

5

6’

7

6

7’

6’’

7’’

a1 b0

c1 d0

a1

c1

b0

d0

a0

c0 d1

c0

b1

d1

a0

b1

Input tuples:edge(0, 6, a0), edge(0, 6’, a1), edge(3, 6, b0),

Query Answer

q1: path(0, 5)

a1b0c1d0

q2: path(0, 2)

Impossibility

abs(a0)abs(a1), abs(b0)abs(b1),abs(c0)abs(c1), abs(d0)abs(d1).

Input relations: edge(i, j, n), abs(n)

Output relations: path(i, j)

Rules: (1) path(i, i).(2) path(i, j) :- path(i, k), edge(k, j, n), abs(n).

Page 19: On Abstraction Refinement for Program Analyses in  Datalog

19 6/10/2014

Iteration 1

Programming Language Design and Implementation, 2014

0

1

2

3

4

5

7

66’

7’

6’’

7’’

b0

d0

b0

d0

a0

c0

c0

a0

a1

c1

a1

c1

d1

b1

d1

b1

Query Eliminated Abstractions

q1: path(0, 5)

q2: path(0, 2)

abs(a0)abs(a1), abs(b0)abs(b1),abs(c0)abs(c1), abs(d0)abs(d1).

path(0, 0).path(0, 6) :- path(0, 0), edge(0, 6, a0), abs(a0).path(0, 1) :- path(0, 6), edge(6, 1, a0), abs(a0).path(0, 7) :- path(0, 1), edge(1, 7, c0), abs(c0).path(0, 2) :- path(0, 7), edge(7, 2, c0), abs(c0).path(0, 4) :- path(0, 6), edge(6, 4, b0), abs(b0).path(0, 7) :- path(0, 4), edge(4, 7, d0), abs(d0).path(0, 5) :- path(0, 7), edge(7, 5, d0), abs(d0).…

Page 20: On Abstraction Refinement for Program Analyses in  Datalog

20 6/10/2014

Iteration 1 - derivation graph

Programming Language Design and Implementation, 2014

0

1

2

3

4

5

7

66’

7’

6’’

7’’

b0

d0

b0

d0

a0

c0

c0

a0

a1

c1

a1

c1

d1

b1

d1

b1

abs(a0)abs(a1), abs(b0)abs(b1),abs(c0)abs(c1), abs(d0)abs(d1).

Query Eliminated Abstractions

q1: path(0, 5)

q2: path(0, 2)

Page 21: On Abstraction Refinement for Program Analyses in  Datalog

21 6/10/2014

Iteration 1 - derivation graph

Programming Language Design and Implementation, 2014

abs(d0)

path(0,6)edge(6,1,a0) edge(6,4,b0)

path(0,1) path(0,4)abs(c0)edge(1,7,c0) edge(4,7,d0)

path(0,7)edge(7,2,c0) edge(7,5,d0)

path(0,2) path(0,5)

abs(a0)edge(0,6,a0)path(0,0)

abs(a0) abs(b0)

abs(c0) abs(d0)

Page 22: On Abstraction Refinement for Program Analyses in  Datalog

22 6/10/2014

Iteration 1 - derivation graph

Programming Language Design and Implementation, 2014

abs(d0)

path(0,6)edge(6,1,a0) edge(6,4,b0)

path(0,1) path(0,4)abs(c0)edge(1,7,c0) edge(4,7,d0)

path(0,7)edge(7,2,c0) edge(7,5,d0)

path(0,2) path(0,5)

abs(a0)edge(0,6,a0)path(0,0)

abs(a0) abs(b0)

abs(c0) abs(d0)

a0c0

Page 23: On Abstraction Refinement for Program Analyses in  Datalog

23 6/10/2014

Iteration 1 - derivation graph

Programming Language Design and Implementation, 2014

abs(d0)

path(0,6)edge(6,1,a0) edge(6,4,b0)

path(0,1) path(0,4)abs(c0)edge(1,7,c0) edge(4,7,d0)

path(0,7)edge(7,2,c0) edge(7,5,d0)

path(0,2) path(0,5)

abs(a0)edge(0,6,a0)path(0,0)

abs(a0) abs(b0)

abs(c0) abs(d0)

a0c0

Page 24: On Abstraction Refinement for Program Analyses in  Datalog

24 6/10/2014

Iteration 1 - derivation graph

Programming Language Design and Implementation, 2014

abs(d0)

path(0,6)edge(6,1,a0) edge(6,4,b0)

path(0,1) path(0,4)abs(c0)edge(1,7,c0) edge(4,7,d0)

path(0,7)edge(7,2,c0) edge(7,5,d0)

path(0,2) path(0,5)

abs(a0)edge(0,6,a0)path(0,0)

abs(a0) abs(b0)

abs(c0) abs(d0)

a0b0d0

Page 25: On Abstraction Refinement for Program Analyses in  Datalog

25 6/10/2014

Iteration 1 - derivation graph

Programming Language Design and Implementation, 2014

0

1

2

3

4

5

7

66’

7’

6’’

7’’

b0

d0

b0

d0

a0

c0

c0

a0

a1

c1

a1

c1

d1

b1

d1

b1

abs(a0)abs(a1), abs(b0)abs(b1),abs(c0)abs(c1), abs(d0)abs(d1).

Query Eliminated Abstractions

q1: path(0, 5)

a0c0d0, a0b0d0 (4/16)

q2: path(0, 2)

a0c0 (4/16)

Page 26: On Abstraction Refinement for Program Analyses in  Datalog

26 6/10/2014

Encoded as MAXSAT

Programming Language Design and Implementation, 2014

MAXSAT(Find thatMaximize Subject to

Hard Constraints

Soft Constraints

Page 27: On Abstraction Refinement for Program Analyses in  Datalog

27 6/10/2014

Encoded as MAXSAT

Programming Language Design and Implementation, 2014

abs(a0)abs(a1), abs(b0)abs(b1),abs(c0)abs(c1), abs(d0)abs(d1).

Hard constraints:

Soft constraints:

Avoid all the counterexam

ples

Minimize the abstraction

cost

Page 28: On Abstraction Refinement for Program Analyses in  Datalog

28 6/10/2014

Encoded as MAXSAT

Programming Language Design and Implementation, 2014

Hard constraints:

Soft constraints:

Query Eliminated Abstractions

q1: path(0, 5)

a0c0d0, a0b0d0 (4/16)

q2: path(0, 2)

a0c0 (4/16)

Solution:

a1c0d0

Page 29: On Abstraction Refinement for Program Analyses in  Datalog

29 6/10/2014

Iteration 2 and beyond

Programming Language Design and Implementation, 2014

MAXSAT solver

Datalog

solver

a1c0d0

Iteration 1 Derivation

Query Answer Eliminated Abstractions

q1: path(0, 5)

a0c0d0, a0b0d0, a1d0 (4/16)

q2: path(0, 2)

a0c0, a1c0 (4/16)

Constraint s

 𝑪𝟏 𝑪𝟏∧

Page 30: On Abstraction Refinement for Program Analyses in  Datalog

30 6/10/2014

Iteration 2 and beyond

Programming Language Design and Implementation, 2014

MAXSAT solver

Datalog

solver

a1c0d0

Iteration 2 Derivation

Query Answer Eliminated Abstractions

q1: path(0, 5)

a0c0d0, a0b0d0, a1d0 (4/16)

q2: path(0, 2)

a0c0, a1c0 (4/16)

Constraint s

 𝑪𝟏 𝑪𝟏∧

Page 31: On Abstraction Refinement for Program Analyses in  Datalog

31 6/10/2014

Iteration 2 and beyond

Programming Language Design and Implementation, 2014

MAXSAT solver

Datalog

solver

a1c0d0

Iteration 2 Derivation

Query Answer Eliminated Abstractions

q1: path(0, 5)

a0c0d0, a0b0d0, a1d0 (4/16)

q2: path(0, 2)

a0c0

(4/16)

Constraint s

 𝑪𝟏

Page 32: On Abstraction Refinement for Program Analyses in  Datalog

32 6/10/2014

Iteration 2 and beyond

Programming Language Design and Implementation, 2014

MAXSAT solver

Datalog

solver

a1c1d0

Iteration 2 Derivation

Query Answer Eliminated Abstractions

q1: path(0, 5)

a0c0d0, a0b0d0, a1c0d0

(6/16)

q2: path(0, 2)

a0c0, a1c0 (8/16)

Constraint s

 𝑪𝟏

Page 33: On Abstraction Refinement for Program Analyses in  Datalog

33

Query Answer Eliminated Abstractions

q1: path(0, 5)

a0c0d0, a0b0d0, a1c0d0 (6/16)

q2: path(0, 2)

a0c0, a1c0 (8/16) 6/10/2014

Iteration 2 and beyond

Programming Language Design and Implementation, 2014

MAXSAT solver

Datalog

solver

a1c1d0

Iteration 3

q1 is proven.

a1c1d0

Derivation

Constraint s

 𝑪𝟏

Page 34: On Abstraction Refinement for Program Analyses in  Datalog

34

Constraint s

 𝑪𝟏

6/10/2014

Iteration 2 and beyond

Programming Language Design and Implementation, 2014

MAXSAT solver

Datalog

solver

a1c1d0

Query Answer Eliminated Abstractions

q1: path(0, 5)

a0c0d0, a0b0d0, a1c0d0 (6/16)

q2: path(0, 2)

a0c0, a1c0, a1c1, a0c1 (16/16)

Iteration 3

q2 is impossible to prove.

Impossibility

Derivation

q1 is proven.

a1c1d0

Page 35: On Abstraction Refinement for Program Analyses in  Datalog

35 6/10/2014

Mixing counterexamples

Programming Language Design and Implementation, 2014

Iteration 1 Iteration 3

a0c0 a1c1Eliminated Abstractio

ns:

Page 36: On Abstraction Refinement for Program Analyses in  Datalog

36 6/10/2014

Mixing counterexamples

Programming Language Design and Implementation, 2014

Iteration 1 Iteration 3

a0c0 a1c1a0c1

Mixed!

Eliminated Abstractio

ns:

Page 37: On Abstraction Refinement for Program Analyses in  Datalog

37

Implemented in JChord using off-the-shelf solvers: Datalog: bddbddb MAXSAT: MiFuMaX

Applied to two analyses that are challenging to scale: k-object-sensitivity pointer analysis:

flow-insensitive, weak updates, cloning-based typestate analysis:

flow-sensitive, strong updates, summary-based

Evaluated on 8 Java programs from DaCapo and Ashes.

6/10/2014

Experimental setup

Programming Language Design and Implementation, 2014

Page 38: On Abstraction Refinement for Program Analyses in  Datalog

38 6/10/2014

Benchmark characteristics

Programming Language Design and Implementation, 2014

classes methods bytecode(KB)

KLOC

toba-s 1K 6K 423 258

javasrc-p 1K 6.5K 434 265

weblech 1.2K 8K 504 326

hedc 1K 7K 442 283

antlr 1.1K 7.7K 532 303

luindex 1.3K 7.9K 508 295

lusearch 1.2K 8K 511 314

schroeder-m

1.9k 12K 708 460

Page 39: On Abstraction Refinement for Program Analyses in  Datalog

39 6/10/2014

Results: pointer analysis

Programming Language Design and Implementation, 2014

queries abstraction size

iterationstotal

resolved

current

baseline

final max

toba-s 7 7 0 170 18K 10

javasrc-p 46 46 0 470 18K 13

weblech 5 5 2 140 31K 10

hedc 47 47 6 730 29K 18

antlr 143 143 5 970 29K 15

luindex 138 138 67 1K 40K 26

lusearch 322 322 29 1K 39K 17

schroeder-m

51 51 25 450 58K 15

4-object-sensitivity

< 50%

< 3% of max

Page 40: On Abstraction Refinement for Program Analyses in  Datalog

40 6/10/2014

Performance of Datalog: pointer analysis

Programming Language Design and Implementation, 2014

lusearchk = 1, 153s

k = 2, 214s

k = 4, 3h28m

k = 3, 590s

Baseline

Page 41: On Abstraction Refinement for Program Analyses in  Datalog

41 6/10/2014

Performance of MAXSAT: pointer analysis

Programming Language Design and Implementation, 2014

lusearch

Page 42: On Abstraction Refinement for Program Analyses in  Datalog

42 6/10/2014

Statistics of MAXSAT formulae

Programming Language Design and Implementation, 2014

pointer analysis

variables

clauses

toba-s 0.7M 1.5M

javasrc-p 0.5M 0.9M

weblech 1.6M 3.3M

hedc 1.2M 2.7M

antlr 3.6M 6.9M

luindex 2.4M 5.6M

lusearch 2.1M 5M

schroeder-m

6.7M 23.7M

Page 43: On Abstraction Refinement for Program Analyses in  Datalog

43 6/10/2014

Conclusion

Programming Language Design and Implementation, 2014

Abstraction

Datalog MAXSAT

Page 44: On Abstraction Refinement for Program Analyses in  Datalog

44 6/10/2014

Conclusion

Programming Language Design and Implementation, 2014

Datalog

Soundness

Tradeoffs

MAXSAT

Hard Constraints

Soft Constraints

Scalability vs. Precision

Sound vs. Complete

A(x, y):- B(x, z), C(z, y)

𝐴 (𝑥 , 𝑦 )∨¬𝐵 (𝑥 , 𝑧 )∨¬𝐶 (𝑧 , 𝑦 )

1 0 1 1 0

𝑎𝑏𝑠 (𝑎0 ) 𝐰𝐞𝐢𝐠𝐡𝐭𝟏