On Abstraction Refinement for Program Analyses in Datalog

Post on 03-Jan-2016

29 views 1 download

Tags:

description

On Abstraction Refinement for Program Analyses in Datalog. Xin Zhang , Ravi Mangal , Mayur Naik Georgia Tech. Radu Grigore , Hongseok Yang Oxford University. Datalog for program a nalysis. Datalog. What is Datalog ?. Datalog. What is Datalog ?. Input relations: - PowerPoint PPT Presentation

Transcript of On Abstraction Refinement for Program Analyses in Datalog

On Abstraction Refinement for Program Analyses in

DatalogXin Zhang, Ravi Mangal,

Mayur NaikGeorgia Tech

Radu Grigore, Hongseok YangOxford University

2 6/10/2014

Datalog for program analysis

Datalog

Programming Language Design and Implementation, 2014

3 6/10/2014

What is Datalog?

Datalog

Programming Language Design and Implementation, 2014

4 6/10/2014

What is Datalog?

Datalog

Input relations:Output relations:Rules:

Programming Language Design and Implementation, 2014

Least fixpoint computation:

edge(i, j).path(i, j).

(1) path(i, i).(2) path(i, k) :- path(i, j), edge(j, k).

Input: edge(0, 1), edge(1, 2).path(0, 0).path(1, 1).path(2, 2).path(0, 1) :- path(0, 0), edge(0, 1).path(0, 2) :- path(0, 1), edge(1, 2).

5 6/10/2014

Why Datalog?

Programming Language Design and Implementation, 2014

Datalog

If there exists a path from a to b, and there is an edge from b to c, then there exists a path from a to c:

path(a, c) :- path(a, b), edge(b, c).

6 6/10/2014

Why Datalog?

Programming Language Design and Implementation, 2014

k-object-sensitivity,

k = 2, ~100KLOC

7 6/10/2014

Limitation

Programming Language Design and Implementation, 2014

k-object-sensitivity,

k = 2, ~100KLOC

k-object-sensitivity,

k = 10, ~500KLOC

8 6/10/2014

Program abstraction

Programming Language Design and Implementation, 2014

Precision

Scalability

Abstraction

9 6/10/2014

Parametric program abstraction

Programming Language Design and Implementation, 2014

Precision

Scalability

Abstraction

1 1 1 1 1

10 6/10/2014

Parametric program abstraction

Programming Language Design and Implementation, 2014

Precision

Scalability

Abstraction

1 0 1 1 0

11 6/10/2014

Parametric program abstraction: Example 1

Programming Language Design and Implementation, 2014

1 0 1 1 0

Cloning depth K for each call site and

allocation site

Pointer Analysis

12 6/10/2014

Parametric program abstraction: Example 2

Programming Language Design and Implementation, 2014

1 0 1 1 0

Predicates to use as

abstraction predicates

Shape Analysis

13 6/10/2014

Program abstraction

Programming Language Design and Implementation, 2014

1 0 1 1 0 0 1 1 0 0

Datalog Progra

m

Datalog Progra

m

alias(p, q)?

alias(m, n)?

14 6/10/2014

Program abstraction

Programming Language Design and Implementation, 2014

1 0 1 1 0 0 1 1 0 0

Datalog Progra

m

Datalog Progra

m

alias(p, q)?

alias(m, n)?

Counterexample guided refinement

(CEGAR) via MAXSAT

15 6/10/2014

Pointer analysis example

Programming Language Design and Implementation, 2014

f(){ v1 = new ...; v2 = id1(v1); v3 = id2(v2);q2:assert(v3!= v1);}

id1(v){return v;}

g(){ v4 = new ...; v5 = id1(v4); v6 = id2(v5);q1:assert(v6!= v1);}

id2(v){return v;}

16 6/10/2014

Pointer analysis as graph reachability

Programming Language Design and Implementation, 2014

0

1

2

3

4

5

6’

7

66’’

7’7’’

a1 a0 b0

c1 c0 d0 d1

a1

c1 c0

b0 b1

d1d0

a0

b1

17 6/10/2014

Graph reachability in Datalog

Programming Language Design and Implementation, 2014

Input relations: edge(i, j, n), abs(n)

Output relations: path(i, j)

Rules: (1) path(i, i).(2) path(i, j) :- path(i, k), edge(k, j, n), abs(n).

0

1

2

3

4

5

6’

7

66’’

7’7’’

a1 a0 b0

c1 c0 d0 d1

a1

c1 c0

b0 b1

d1d0

a0

b1

Input tuples:edge(0, 6, a0), edge(0, 6’, a1), edge(3, 6, b0),

abs(a0)abs(a1), abs(b0)abs(b1),abs(c0)abs(c1), abs(d0)abs(d1).

Query Tuple

Original Query

q1: path(0, 5)

assert(v6!= v1)

q2: path(0, 2)

assert(v3!= v1)16 possible abstractions

in total

18 6/10/2014

Desired result

Programming Language Design and Implementation, 2014

0

1

2

3

4

5

6’

7

6

7’

6’’

7’’

a1 b0

c1 d0

a1

c1

b0

d0

a0

c0 d1

c0

b1

d1

a0

b1

Input tuples:edge(0, 6, a0), edge(0, 6’, a1), edge(3, 6, b0),

Query Answer

q1: path(0, 5)

a1b0c1d0

q2: path(0, 2)

Impossibility

abs(a0)abs(a1), abs(b0)abs(b1),abs(c0)abs(c1), abs(d0)abs(d1).

Input relations: edge(i, j, n), abs(n)

Output relations: path(i, j)

Rules: (1) path(i, i).(2) path(i, j) :- path(i, k), edge(k, j, n), abs(n).

19 6/10/2014

Iteration 1

Programming Language Design and Implementation, 2014

0

1

2

3

4

5

7

66’

7’

6’’

7’’

b0

d0

b0

d0

a0

c0

c0

a0

a1

c1

a1

c1

d1

b1

d1

b1

Query Eliminated Abstractions

q1: path(0, 5)

q2: path(0, 2)

abs(a0)abs(a1), abs(b0)abs(b1),abs(c0)abs(c1), abs(d0)abs(d1).

path(0, 0).path(0, 6) :- path(0, 0), edge(0, 6, a0), abs(a0).path(0, 1) :- path(0, 6), edge(6, 1, a0), abs(a0).path(0, 7) :- path(0, 1), edge(1, 7, c0), abs(c0).path(0, 2) :- path(0, 7), edge(7, 2, c0), abs(c0).path(0, 4) :- path(0, 6), edge(6, 4, b0), abs(b0).path(0, 7) :- path(0, 4), edge(4, 7, d0), abs(d0).path(0, 5) :- path(0, 7), edge(7, 5, d0), abs(d0).…

20 6/10/2014

Iteration 1 - derivation graph

Programming Language Design and Implementation, 2014

0

1

2

3

4

5

7

66’

7’

6’’

7’’

b0

d0

b0

d0

a0

c0

c0

a0

a1

c1

a1

c1

d1

b1

d1

b1

abs(a0)abs(a1), abs(b0)abs(b1),abs(c0)abs(c1), abs(d0)abs(d1).

Query Eliminated Abstractions

q1: path(0, 5)

q2: path(0, 2)

21 6/10/2014

Iteration 1 - derivation graph

Programming Language Design and Implementation, 2014

abs(d0)

path(0,6)edge(6,1,a0) edge(6,4,b0)

path(0,1) path(0,4)abs(c0)edge(1,7,c0) edge(4,7,d0)

path(0,7)edge(7,2,c0) edge(7,5,d0)

path(0,2) path(0,5)

abs(a0)edge(0,6,a0)path(0,0)

abs(a0) abs(b0)

abs(c0) abs(d0)

22 6/10/2014

Iteration 1 - derivation graph

Programming Language Design and Implementation, 2014

abs(d0)

path(0,6)edge(6,1,a0) edge(6,4,b0)

path(0,1) path(0,4)abs(c0)edge(1,7,c0) edge(4,7,d0)

path(0,7)edge(7,2,c0) edge(7,5,d0)

path(0,2) path(0,5)

abs(a0)edge(0,6,a0)path(0,0)

abs(a0) abs(b0)

abs(c0) abs(d0)

a0c0

23 6/10/2014

Iteration 1 - derivation graph

Programming Language Design and Implementation, 2014

abs(d0)

path(0,6)edge(6,1,a0) edge(6,4,b0)

path(0,1) path(0,4)abs(c0)edge(1,7,c0) edge(4,7,d0)

path(0,7)edge(7,2,c0) edge(7,5,d0)

path(0,2) path(0,5)

abs(a0)edge(0,6,a0)path(0,0)

abs(a0) abs(b0)

abs(c0) abs(d0)

a0c0

24 6/10/2014

Iteration 1 - derivation graph

Programming Language Design and Implementation, 2014

abs(d0)

path(0,6)edge(6,1,a0) edge(6,4,b0)

path(0,1) path(0,4)abs(c0)edge(1,7,c0) edge(4,7,d0)

path(0,7)edge(7,2,c0) edge(7,5,d0)

path(0,2) path(0,5)

abs(a0)edge(0,6,a0)path(0,0)

abs(a0) abs(b0)

abs(c0) abs(d0)

a0b0d0

25 6/10/2014

Iteration 1 - derivation graph

Programming Language Design and Implementation, 2014

0

1

2

3

4

5

7

66’

7’

6’’

7’’

b0

d0

b0

d0

a0

c0

c0

a0

a1

c1

a1

c1

d1

b1

d1

b1

abs(a0)abs(a1), abs(b0)abs(b1),abs(c0)abs(c1), abs(d0)abs(d1).

Query Eliminated Abstractions

q1: path(0, 5)

a0c0d0, a0b0d0 (4/16)

q2: path(0, 2)

a0c0 (4/16)

26 6/10/2014

Encoded as MAXSAT

Programming Language Design and Implementation, 2014

MAXSAT(Find thatMaximize Subject to

Hard Constraints

Soft Constraints

27 6/10/2014

Encoded as MAXSAT

Programming Language Design and Implementation, 2014

abs(a0)abs(a1), abs(b0)abs(b1),abs(c0)abs(c1), abs(d0)abs(d1).

Hard constraints:

Soft constraints:

Avoid all the counterexam

ples

Minimize the abstraction

cost

28 6/10/2014

Encoded as MAXSAT

Programming Language Design and Implementation, 2014

Hard constraints:

Soft constraints:

Query Eliminated Abstractions

q1: path(0, 5)

a0c0d0, a0b0d0 (4/16)

q2: path(0, 2)

a0c0 (4/16)

Solution:

a1c0d0

29 6/10/2014

Iteration 2 and beyond

Programming Language Design and Implementation, 2014

MAXSAT solver

Datalog

solver

a1c0d0

Iteration 1 Derivation

Query Answer Eliminated Abstractions

q1: path(0, 5)

a0c0d0, a0b0d0, a1d0 (4/16)

q2: path(0, 2)

a0c0, a1c0 (4/16)

Constraint s

 𝑪𝟏 𝑪𝟏∧

30 6/10/2014

Iteration 2 and beyond

Programming Language Design and Implementation, 2014

MAXSAT solver

Datalog

solver

a1c0d0

Iteration 2 Derivation

Query Answer Eliminated Abstractions

q1: path(0, 5)

a0c0d0, a0b0d0, a1d0 (4/16)

q2: path(0, 2)

a0c0, a1c0 (4/16)

Constraint s

 𝑪𝟏 𝑪𝟏∧

31 6/10/2014

Iteration 2 and beyond

Programming Language Design and Implementation, 2014

MAXSAT solver

Datalog

solver

a1c0d0

Iteration 2 Derivation

Query Answer Eliminated Abstractions

q1: path(0, 5)

a0c0d0, a0b0d0, a1d0 (4/16)

q2: path(0, 2)

a0c0

(4/16)

Constraint s

 𝑪𝟏

32 6/10/2014

Iteration 2 and beyond

Programming Language Design and Implementation, 2014

MAXSAT solver

Datalog

solver

a1c1d0

Iteration 2 Derivation

Query Answer Eliminated Abstractions

q1: path(0, 5)

a0c0d0, a0b0d0, a1c0d0

(6/16)

q2: path(0, 2)

a0c0, a1c0 (8/16)

Constraint s

 𝑪𝟏

33

Query Answer Eliminated Abstractions

q1: path(0, 5)

a0c0d0, a0b0d0, a1c0d0 (6/16)

q2: path(0, 2)

a0c0, a1c0 (8/16) 6/10/2014

Iteration 2 and beyond

Programming Language Design and Implementation, 2014

MAXSAT solver

Datalog

solver

a1c1d0

Iteration 3

q1 is proven.

a1c1d0

Derivation

Constraint s

 𝑪𝟏

34

Constraint s

 𝑪𝟏

6/10/2014

Iteration 2 and beyond

Programming Language Design and Implementation, 2014

MAXSAT solver

Datalog

solver

a1c1d0

Query Answer Eliminated Abstractions

q1: path(0, 5)

a0c0d0, a0b0d0, a1c0d0 (6/16)

q2: path(0, 2)

a0c0, a1c0, a1c1, a0c1 (16/16)

Iteration 3

q2 is impossible to prove.

Impossibility

Derivation

q1 is proven.

a1c1d0

35 6/10/2014

Mixing counterexamples

Programming Language Design and Implementation, 2014

Iteration 1 Iteration 3

a0c0 a1c1Eliminated Abstractio

ns:

36 6/10/2014

Mixing counterexamples

Programming Language Design and Implementation, 2014

Iteration 1 Iteration 3

a0c0 a1c1a0c1

Mixed!

Eliminated Abstractio

ns:

37

Implemented in JChord using off-the-shelf solvers: Datalog: bddbddb MAXSAT: MiFuMaX

Applied to two analyses that are challenging to scale: k-object-sensitivity pointer analysis:

flow-insensitive, weak updates, cloning-based typestate analysis:

flow-sensitive, strong updates, summary-based

Evaluated on 8 Java programs from DaCapo and Ashes.

6/10/2014

Experimental setup

Programming Language Design and Implementation, 2014

38 6/10/2014

Benchmark characteristics

Programming Language Design and Implementation, 2014

classes methods bytecode(KB)

KLOC

toba-s 1K 6K 423 258

javasrc-p 1K 6.5K 434 265

weblech 1.2K 8K 504 326

hedc 1K 7K 442 283

antlr 1.1K 7.7K 532 303

luindex 1.3K 7.9K 508 295

lusearch 1.2K 8K 511 314

schroeder-m

1.9k 12K 708 460

39 6/10/2014

Results: pointer analysis

Programming Language Design and Implementation, 2014

queries abstraction size

iterationstotal

resolved

current

baseline

final max

toba-s 7 7 0 170 18K 10

javasrc-p 46 46 0 470 18K 13

weblech 5 5 2 140 31K 10

hedc 47 47 6 730 29K 18

antlr 143 143 5 970 29K 15

luindex 138 138 67 1K 40K 26

lusearch 322 322 29 1K 39K 17

schroeder-m

51 51 25 450 58K 15

4-object-sensitivity

< 50%

< 3% of max

40 6/10/2014

Performance of Datalog: pointer analysis

Programming Language Design and Implementation, 2014

lusearchk = 1, 153s

k = 2, 214s

k = 4, 3h28m

k = 3, 590s

Baseline

41 6/10/2014

Performance of MAXSAT: pointer analysis

Programming Language Design and Implementation, 2014

lusearch

42 6/10/2014

Statistics of MAXSAT formulae

Programming Language Design and Implementation, 2014

pointer analysis

variables

clauses

toba-s 0.7M 1.5M

javasrc-p 0.5M 0.9M

weblech 1.6M 3.3M

hedc 1.2M 2.7M

antlr 3.6M 6.9M

luindex 2.4M 5.6M

lusearch 2.1M 5M

schroeder-m

6.7M 23.7M

43 6/10/2014

Conclusion

Programming Language Design and Implementation, 2014

Abstraction

Datalog MAXSAT

44 6/10/2014

Conclusion

Programming Language Design and Implementation, 2014

Datalog

Soundness

Tradeoffs

MAXSAT

Hard Constraints

Soft Constraints

Scalability vs. Precision

Sound vs. Complete

A(x, y):- B(x, z), C(z, y)

𝐴 (𝑥 , 𝑦 )∨¬𝐵 (𝑥 , 𝑧 )∨¬𝐶 (𝑧 , 𝑦 )

1 0 1 1 0

𝑎𝑏𝑠 (𝑎0 ) 𝐰𝐞𝐢𝐠𝐡𝐭𝟏