bddbddb:bddbddb:Using Datalog and BDDs for Using Datalog and BDDs for
Program AnalysisProgram Analysis
John WhaleyStanford University and moka5 Inc.
June 11, 2006
June 11, 2006 Using Datalog and BDDsfor Program Analysis
2
Implementing Program Analysis
…56 pages!
vs.
• 2x faster• Fewer bugs• Extensible
June 11, 2006 Using Datalog and BDDsfor Program Analysis
3
Is it really that easy?
• Requires:– A different way of thinking– Knowledge, experience, and intuition– Perseverance to try different techniques– A lot of tuning and tweaking– Luck
• Despite all this, people who use it swear by it and could “never go back”
June 11, 2006 Using Datalog and BDDsfor Program Analysis
4
Tutorial StructurePart I: Essential Background
– …
Part II: Using the Tools– …
Part III: Developing Advanced Analyses– …
Part IV: Profiling, Debugging, Avoiding Gotchas– …
Short break every 30 minutes
June 11, 2006 Using Datalog and BDDsfor Program Analysis
5
Tutorial StructurePart I: Essential Background
– Datalog for Program Analysis– Binary Decision Diagrams
Part II: Using the Tools– …
Part III: Developing Advanced Analyses– …
Part IV: Profiling, Debugging, Avoiding Gotchas– …
June 11, 2006 Using Datalog and BDDsfor Program Analysis
6
Tutorial StructurePart I: Essential Background
– …
Part II: Using the Tools– bddbddb– Compiler interface (Joeq compiler)– Datalog editor in Eclipse– Interactive mode
Part III: Developing Advanced Analyses– …
Part IV: Profiling, Debugging, Avoiding Gotchas– …
June 11, 2006 Using Datalog and BDDsfor Program Analysis
7
Tutorial StructurePart I: Essential Background
– …
Part II: Using the Tools– …
Part III: Developing Advanced Analyses– Context sensitivity– Combining multiple analyses– Race detection examples– Using advanced bddbddb features
Part IV: Profiling, Debugging, Avoiding Gotchas– …
June 11, 2006 Using Datalog and BDDsfor Program Analysis
8
Tutorial StructurePart I: Essential Background
– …
Part II: Using the Tools– …
Part III: Developing Advanced Analyses– …
Part IV: Profiling, Debugging, Avoiding Gotchas– Variable ordering– Iteration order– Machine learning– What it’s good for, what it isn’t good for
June 11, 2006 Using Datalog and BDDsfor Program Analysis
9
Try it yourself…
• Available as moka5 LivePC– Non-intrusive installation in a VM– Automatically kept up to date– Easy to try, easy to share– Complete environment on a USB stick
June 11, 2006 Using Datalog and BDDsfor Program Analysis
10
Part I:Part I: Essential Background
Program Analysis in DatalogProgram Analysis in Datalog
June 11, 2006 Using Datalog and BDDsfor Program Analysis
11
Datalog
• Declarative language for deductive databases [Ullman 1989]– Like Prolog, but no function symbols,
no predefined evaluation strategy
June 11, 2006 Using Datalog and BDDsfor Program Analysis
12
Datalog Basics
Atom = Reach(d,x,i)
Literal = Atom or NOT Atom
Rule = Atom :- Literal & … & Literal
Predicate
Arguments:variables or constants
The body :For each assignment of valuesto variables that makes all thesetrue …
Make thisatom true(the head ).
June 11, 2006 Using Datalog and BDDsfor Program Analysis
13
Datalog Example
parent(x,y) :- child(y,x).grandparent(x,z) :- parent(x,y), parent(y,z).
ancestor(x,y) :- parent(x,y).ancestor(x,z) :- parent(x,y), ancestor(y,z).
June 11, 2006 Using Datalog and BDDsfor Program Analysis
14
Datalog
• Intuition: subgoals in the body are combined by “and” (strictly speaking: “join”).
• Intuition: Multiple rules for a predicate (head) are combined by “or.”
June 11, 2006 Using Datalog and BDDsfor Program Analysis
15
Another Datalog Example
_ means “Dont-care” (at least one)! means “Not”
hasChild(x) :- child(_,x).hasNoChild(x) :- !child(_,x).
hasSibling(x) :- child(x,y), child(z,y), z!=x.onlyChild(x) :- child(x,_), !hasSibling(x).
“!” inverts the relation, not the atom!
June 11, 2006 Using Datalog and BDDsfor Program Analysis
16
Reaching Defs in Datalog
Reach(d,x,j) :- Reach(d,x,i), StatementAt(i,s), !Assign(s,x), Follows(i,j).
Reach(s,x,j) :- StatementAt(i,s), Assign(s,x), Follows(i,j).
June 11, 2006 Using Datalog and BDDsfor Program Analysis
17
Definition: EDB Vs. IDB Predicates
• Some predicates come from the program, and their tuples are computed by inspection.– Called EDB, or extensional database
predicates.
• Others are defined by the rules only.– Called IDB, or intensional database
predicates.
June 11, 2006 Using Datalog and BDDsfor Program Analysis
18
Negation
• Negation makes things tricky.• Semantics of negation
– No negation allowed [Ullman 1988]– Stratified Datalog [Chandra 1985]– Well-founded semantics [Van Gelder
1991]
June 11, 2006 Using Datalog and BDDsfor Program Analysis
19
Stratification
• A risk occurs if there are negated literals involved in a recursive predicate.– Leads to oscillation in the result.
• Requirement for stratification :– Must be able to order the IDB predicates so
that if a rule with P in the head has NOT Q in the body, then Q is either EDB or earlier in the order than P.
June 11, 2006 Using Datalog and BDDsfor Program Analysis
20
Example: Nonstratification
P(x) :- E(x), !P(x).• If E(1) is true, is P(1) true?• It is after the first round.• But not after the second.• True after the third, not after the
fourth,…
June 11, 2006 Using Datalog and BDDsfor Program Analysis
21
Iterative Algorithm for Datalog
• Start with the EDB predicates = “whatever the code dictates,” and with all IDB predicates empty.
• Repeatedly examine the bodies of the rules, and see what new IDB facts can be discovered from the EDB and existing IDB facts.
June 11, 2006 Using Datalog and BDDsfor Program Analysis
22
Datalog evaluation strategy
• “Semi-naïve” evaluation– Remember that a new fact can be
inferred by a rule in a given round only if it uses in the body some fact discovered on the previous round.
• Evaluation strategy– Top-down (goal-directed) [Ullman 1985]– Bottom-up (infer from base facts)
[Ullman 1989]
June 11, 2006 Using Datalog and BDDsfor Program Analysis
23
Our Dialect of Datalog
• Totally-ordered finite domains– Domains are of a given, finite size– Makes all Datalog programs “safe”– Cannot mix variables of different domains
• Constants (named/integers)• Comparison operators:
= != < <= > >=
• Dont-care: _ Universe: *
June 11, 2006 Using Datalog and BDDsfor Program Analysis
24
Why Datalog?
• Developed a tool to translate inference rules to BDD implementation
• Later, discovered Datalog (Ullman, Reps)• Semantics of BDDs match Datalog exactly
– Obvious implementation of relations– Operations occur a set-at-a-time– Fast set compare, set difference– Wealth of literature about semantics,
optimization, etc.
June 11, 2006 Using Datalog and BDDsfor Program Analysis
25
Inference Rules
• Datalog rules directly correspond to inference rules.
Assign(v1, v2), vPointsTo(v2, o)Assign(v1, v2), vPointsTo(v2, o).
vPointsTo(v1, o)vPointsTo(v1, o)
:-
June 11, 2006 Using Datalog and BDDsfor Program Analysis
26
Flow-Insensitive Pointer Analysis
o1: p = new Object();
o2: q = new Object();
p.f = q; r = p.f;
p o1
q o2
fr
Input TuplesvPointsTo(p, o1)
vPointsTo(q, o2)
Store(p, f, q)Load(p, f, r)
Output RelationshPointsTo(o1, f, o2)
vPointsTo(r, o2)
June 11, 2006 Using Datalog and BDDsfor Program Analysis
27
vPointsTo(v1, o) :- Assign(v1, v2), vPointsTo(v2, o).
v1
ov2
Inference Rule in Datalog
v1 = v2;
Assignments:
June 11, 2006 Using Datalog and BDDsfor Program Analysis
28
hPointsTo(o1, f, o2):- Store(v1, f, v2),
vPointsTo(v1, o1), vPointsTo(v2, o2).
v1 o1
v2 o2
f
Inference Rule in Datalog
v1.f = v2;
Stores:
June 11, 2006 Using Datalog and BDDsfor Program Analysis
29
vPointsTo(v2, o2):- Load(v1, f, v2),
vPointsTo(v1, o1), hPointsTo(o1, f, o2).
v1 o1
v2 o2
f
Inference Rule in Datalog
v2 = v1.f;
Loads:
June 11, 2006 Using Datalog and BDDsfor Program Analysis
30
The Whole Algorithm
vPointsTo(v1, o) :- Assign(v1, v2), vPointsTo(v2, o).
hPointsTo(o1, f, o2):- Store(v1, f, v2),
vPointsTo(v1, o1), vPointsTo(v2, o2).
vPointsTo(v2, o2):- Load(v1, f, v2),
vPointsTo(v1, o1), hPointsTo(o1, f, o2).
vPointsTo(v, o) :- vPointsTo0(v, o).
June 11, 2006 Using Datalog and BDDsfor Program Analysis
31
Format of a Datalog file
• DomainsName Size ( map file )V 65536 var.mapH 32768
• RelationsName ( <attribute list> ) flagsStore (v1 : V, f : F, v2 : V) inputPointsTo (v : V, h : H) input, output
• RulesHead :- Body .PointsTo(v1,h) :- Assign(v1,v), PointsTo(v,h).
June 11, 2006 Using Datalog and BDDsfor Program Analysis
32
Key Point
• Program information is stored in a relational database.– Everything in the program is numbered.
• Write declarative inference rules to infer new facts about the program.
• Negations OK if they are not in a recursive cycle.
June 11, 2006 Using Datalog and BDDsfor Program Analysis
33
Take a break…(Next up: Binary Decision Diagrams)
June 11, 2006 Using Datalog and BDDsfor Program Analysis
34
Part I:Part I: Essential Background
Binary Decision DiagramsBinary Decision Diagrams
June 11, 2006 Using Datalog and BDDsfor Program Analysis
35
Call graph relation
• Call graph expressed as a relation.– Five edges:
• Calls(A,B)• Calls(A,C)• Calls(A,D)• Calls(B,D)• Calls(C,D)
B
D
C
A
June 11, 2006 Using Datalog and BDDsfor Program Analysis
36
Call graph relation
• Relation expressed as a binary function.– A=00, B=01, C=10, D=11
B
D
C
A 00
1001
11
Calls(A,B)Calls(A,C)Calls(A,D)Calls(B,D)Calls(C,D)
→ 00 01
→ 00 10
→ 00 11
→ 01 11
→ 10 11
June 11, 2006 Using Datalog and BDDsfor Program Analysis
37
Call graph relation
• Relation expressed as a binary function.– A=00, B=01, C=10, D=11
from to
x1 x2 x3 x4 f
0 0 0 0 00 0 0 1 10 0 1 0 10 0 1 1 10 1 0 0 00 1 0 1 00 1 1 0 00 1 1 1 11 0 0 0 01 0 0 1 01 0 1 0 01 0 1 1 11 1 0 0 01 1 0 1 01 1 1 0 01 1 1 1 0
B
D
C
A 00
1001
11
June 11, 2006 Using Datalog and BDDsfor Program Analysis
38
Binary Decision Diagrams (Bryant 1986)
• Graphical encoding of a truth table.
x2
x4
x3 x3
x4 x4 x4
0 0 0 1 0 0 0 0
x2
x4
x3 x3
x4 x4 x4
0 1 1 1 0 0 0 1
x1 0 edge
1 edge
June 11, 2006 Using Datalog and BDDsfor Program Analysis
39
Binary Decision Diagrams
• Collapse redundant nodes.
x2
x4
x3 x3
x4 x4 x4
0 0 0 0 0 0 0
x2
x4
x3 x3
x4 x4 x4
0 0 0 0
x1
11 1 1 1
0 edge
1 edge
June 11, 2006 Using Datalog and BDDsfor Program Analysis
40
Binary Decision Diagrams
• Collapse redundant nodes.
x2
x4
x3 x3
x4 x4 x4
x2
x4
x3 x3
x4 x4 x4
0
x1
1
0 edge
1 edge
June 11, 2006 Using Datalog and BDDsfor Program Analysis
41
Binary Decision Diagrams
• Collapse redundant nodes.
x2
x4
x3 x3
x2
x3 x3
x4 x4
0
x1
1
0 edge
1 edge
June 11, 2006 Using Datalog and BDDsfor Program Analysis
42
Binary Decision Diagrams
• Collapse redundant nodes.
x2
x4
x3 x3
x2
x3
x4 x4
0
x1
1
0 edge
1 edge
June 11, 2006 Using Datalog and BDDsfor Program Analysis
43
Binary Decision Diagrams
• Eliminate unnecessary nodes.
x2
x4
x3 x3
x2
x3
x4 x4
0
x1
1
0 edge
1 edge
June 11, 2006 Using Datalog and BDDsfor Program Analysis
44
Binary Decision Diagrams
• Eliminate unnecessary nodes.
x2
x3
x2
x3
x4
0
x1
1
0 edge
1 edge
June 11, 2006 Using Datalog and BDDsfor Program Analysis
45
Binary Decision Diagrams
• Size depends on amount of redundancy,NOT size of relation.– Identical subtrees share the same
representation.– As set gets very large, more nodes have
identical zero and one successors, so the size decreases.
June 11, 2006 Using Datalog and BDDsfor Program Analysis
46
BDD Variable Order is Important!
x1
x3
x4
0 1
x2
x1x2 + x3x4
x1<x2<x3<x4 x1<x3<x2<x4
x1
x3
x4
0 1
x2
x3
x2
June 11, 2006 Using Datalog and BDDsfor Program Analysis
47
Variable ordering is NP-hard
• No good general heuristic solutions• Dynamic reordering heuristics don’t
work well on these problems
• We use:– Trial and error– Active learning
June 11, 2006 Using Datalog and BDDsfor Program Analysis
48
Arguments Arguments AA, , BB, , opop AA and BB: Boolean Functions
Represented as OBDDs op: Boolean Operation (e.g., ^, &, |)
Apply Operation
• Concept– Basic technique for building OBDD from Boolean
formula.AA BBopop
ResultResult OBDD representing
composite function
AA opop BB
A op BA op B
0
d
c
b
1
a
0
d
c
b
1
a
0 1
d
c
a
b
0
d
1
c
a
|
June 11, 2006 Using Datalog and BDDsfor Program Analysis
49
0 1
d
c
a
B3 B4
B2
B5
B1
Argument A
Operation
Argument B
b
0
d
1
c
a
A4 A5
A3
A2
A6
A1
Apply Execution Example
• Optimizations– Dynamic programming– Early termination rules
|
Recursive Calls
A3,B2
A6,B2
A2,B2
A3,B4A5,B2
A6,B5
A1,B1
A5,B4A4,B3
June 11, 2006 Using Datalog and BDDsfor Program Analysis
50
0 1
d
c
b
11
c
a
Without Reduction With Reduction
0
d
c
b
1
a
Apply Result Generation
– Recursive calling structure implicitly defines unreduced BDD
– Apply reduction rules bottom-up as return from recursive calls
Recursive Calls
A3,B2
A6,B2
A2,B2
A3,B4A5,B2
A6,B5
A1,B1
A5,B4A4,B3
June 11, 2006 Using Datalog and BDDsfor Program Analysis
51
BDD implementation
• ‘Unique’ table– Huge hash table– Each entry: level, left, right, hash, next
• Operation cache– Memoization cache for operations
• Garbage collection– Mark and sweep, free list.
June 11, 2006 Using Datalog and BDDsfor Program Analysis
52
Code for BDD ‘and’.
Base case:
Memo cache lookup:
Recursive step:
Memo cache insert:
June 11, 2006 Using Datalog and BDDsfor Program Analysis
53
BDD Libraries• BuDDy
– Simple, fast, memory-friendly– Identifies BDD by index in unique table
• JavaBDD– 100% Java, based on BuDDy– Also native interface to BuDDY, CUDD, CAL, JDD
• CUDD– Most popular, most feature-complete– Not as fast as BuDDy– Other types: ZDD, ADD
• JDD– 100% Java, fresh implementation– Still under development
June 11, 2006 Using Datalog and BDDsfor Program Analysis
54
Depth-first vs. breadth-first
• BDD algorithms have natural depth-first recursive formulations.
• Some work on using breadth-first evaluation for better parallelism and locality– CAL: breadth-first BDD package
• General idea: Assume independent, fixup if not.
• Doesn’t perform well in practice.
June 11, 2006 Using Datalog and BDDsfor Program Analysis
55
Take a break…(Next up: Using the Tools)
June 11, 2006 Using Datalog and BDDsfor Program Analysis
56
Tutorial StructurePart I: Essential Background
– Datalog for Program Analysis– Binary Decision Diagrams
Part II: Using the Tools– bddbddb– Compiler interface (Joeq compiler)– Datalog editor in Eclipse– Interactive mode
Part III: Developing Advanced Analyses– Context sensitivity– Combining multiple analyses– Race detection examples– Using advanced bddbddb features
Part IV: Profiling, Debugging, Avoiding Gotchas– Variable ordering– Iteration order– Machine learning– What it’s good for, what it isn’t good for
June 11, 2006 Using Datalog and BDDsfor Program Analysis
57
bddbddbbddbddb((BDDBDD--bbased ased ddeductive eductive ddataatabbase)ase)
Part II:Part II: Using the Tools
June 11, 2006 Using Datalog and BDDsfor Program Analysis
58
bddbddb System Overview
Joeq frontend
Java bytecod
e
Datalogprogram
Input relations
Output relations
June 11, 2006 Using Datalog and BDDsfor Program Analysis
59
Compiler Frontend
• Convert IR into tuples• Tuples format:
# V0:16 F0:11 V1:160 0 10 1 21470 0 1464
header line
one tuple per line
June 11, 2006 Using Datalog and BDDsfor Program Analysis
60
Compiler Frontend
• Robust frontends:– Joeq compiler– Soot compiler– SUIF compiler (for C code)
• Still experimental:– Eclipse frontend– gcc frontend– …
June 11, 2006 Using Datalog and BDDsfor Program Analysis
61
Extracting Relations
• Idea: Iterate thru compiler IR, numbering and dumping relations of interest.– Types– Methods– Fields– Variables– …
June 11, 2006 Using Datalog and BDDsfor Program Analysis
62
joeq.Main.GenRelations
• Generate initial relations for points-to analysis.– Does initial pass to discover call graph.
• Options:-fly : dump on-the-fly call graph info-cs : dump context-sensitive info-ssa : dump SSA representation-partial : no call graph discovery-Dpa.dumppath= : where to save files-Dpa.icallgraph= : location of initial call graph-Dpa.dumpdotgraph : dump call graph in dot
June 11, 2006 Using Datalog and BDDsfor Program Analysis
63
Demo of joeq GenRelations
June 11, 2006 Using Datalog and BDDsfor Program Analysis
64
bddbddb:bddbddb:From Datalog to BDDsFrom Datalog to BDDs
Part II:Part II: Using the Tools
June 11, 2006 Using Datalog and BDDsfor Program Analysis
65
An Adventure in BDDs• Context-sensitive numbering scheme
– Modify BDD library to add special operations.– Can’t even analyze small programs. Time:
• Improved variable ordering– Group similar BDD variables together.– Interleave equivalence relations.– Move common subsets to edges of variable
order. Time: 40h• Incrementalize outermost loop
– Very tricky, many bugs. Time: 36h• Factor away control flow, assignments
– Reduces number of variables. Time: 32h
June 11, 2006 Using Datalog and BDDsfor Program Analysis
66
An Adventure in BDDs
• Exhaustive search for best BDD order– Limit search space by not considering intradomain
orderings. Time: 10h
• Eliminate expensive rename operations– When rename changes relative order, result is not
isomorphic. Time: 7h
• Improved BDD memory layout– Preallocate to guarantee contiguous. Time: 6h
• BDD operation cache tuning– Too small: redo work, too big: bad locality– Parameter sweep to find best values. Time: 2h
June 11, 2006 Using Datalog and BDDsfor Program Analysis
67
An Adventure in BDDs
• Simplified treatment of exceptions– Reduce number of variables, iterations necessary
for convergence. Time: 1h
• Change iteration order– Required redoing much of the code. Time:
48m
• Eliminate redundant operations– Introduced subtle bugs. Time: 45m
• Specialized caches for different operations– Different caches for and, or, etc. Time: 41m
June 11, 2006 Using Datalog and BDDsfor Program Analysis
68
An Adventure in BDDs
• Compacted BDD nodes– 20 bytes 16 bytes Time: 38m
• Improved BDD hashing function– Simpler hash function. Time: 37m
• Total development time: 1 year– 1 year per analysis?!?
• Optimizations obscured the algorithm.• Many bugs discovered, maybe still more.
June 11, 2006 Using Datalog and BDDsfor Program Analysis
69
bddbddb:bddbddb:BDDBDD--BBased ased DDeductive eductive
DDataataBBasease• Automatically generate from Datalog
– Optimizations based on my experience with handcoded version.
– Plus traditional compiler algorithms.
• bddbddb even better than handcoded– handcoded: 37m bddbddb: 19m
June 11, 2006 Using Datalog and BDDsfor Program Analysis
70
Datalog BDDsDatalog BDDs
Relations Boolean functions
Relation ops: ⋈, ∪, select, project
Boolean function ops:∧, ∨, −, ∼
Relation at a time Function at a time
Semi-naïve evaluation
Incrementalization
Fixed-point Iterate until stable
June 11, 2006 Using Datalog and BDDsfor Program Analysis
71
Compiling Datalog to BDDs
1. Apply Datalog source level transforms.2. Stratify and determine iteration order.3. Translate into relational algebra IR.4. Optimize IR and replace relational
algebra ops with equivalent BDD ops.5. Assign relation attributes to physical BDD
domains.6. Perform more optimizations after domain
assignment.7. Interpret the resulting program.
June 11, 2006 Using Datalog and BDDsfor Program Analysis
72
High-Level Transform:Magic Set Transformation
• Add “magic” predicates to control generated tuples [Bancilhon 1986, Beeri 1987]– Combines ideas from top-down and
bottom-up evaluation
• Doesn’t always help– Leads to more iterations– BDDs are good at large operations
June 11, 2006 Using Datalog and BDDsfor Program Analysis
73
Predicate Dependency Graph
vPointsTo
hPointsTo
Store
Load
vPointsTo0
Assign
vPointsTo(v, o) :- vPointsTo0(v, o).
add edge from RHS to LHS
vPointsTo(v1, o) :- Assign(v1, v2), vPointsTo(v2, o).
hPointsTo(o1, f, o2):- Store(v1, f, v2),
vPointsTo(v1, o1), vPointsTo(v2, o2).
vPointsTo(v2, o2):- Load(v1, f, v2),
vPointsTo(v1, o1), hPointsTo(o1, f, o2).
June 11, 2006 Using Datalog and BDDsfor Program Analysis
74
Determining Iteration Order
• Tradeoff between faster convergence and BDD cache locality
• Static heuristic– Visit rules in reverse post-order– Iterate shorter loops before longer loops
• Profile-directed feedback• User can control iteration order
– pri=# keywords on rules/relations
June 11, 2006 Using Datalog and BDDsfor Program Analysis
75
Predicate Dependency Graph
vPointsTo
hPointsTo
Store
Load
vPointsTo0
Assign
June 11, 2006 Using Datalog and BDDsfor Program Analysis
76
Datalog to Relational Algebra
vPointsTo(v1, o) :- Assign(v1, v2), vPointsTo(v2, o).
t1 = ρvariable→source(vPointsTo);
t2 = assign ⋈ t1;
t3 = πsource(t2);
t4 = ρdest→variable(t3);
vPointsTo = vPointsTo ∪ t4;
June 11, 2006 Using Datalog and BDDsfor Program Analysis
77
Incrementalization
t1 = ρvariable→source(vP);
t2 = assign ⋈ t1;
t3 = πsource(t2);
t4 = ρdest→variable(t3);
vP = vP ∪ t4;
vP’’ = vP – vP’;
vP’ = vP;
assign’’ = assign – assign’;
assign’ = assign;
t1 = ρvariable→source(vP’’);
t2 = assign ⋈ t1;
t5 = ρvariable→source(vP);
t6 = assign’’ ⋈ t5;
t7 = t2 ∪ t6;
t3 = πsource(t7);
t4 = ρdest→variable(t3);
vP = vP ∪ t4;
June 11, 2006 Using Datalog and BDDsfor Program Analysis
78
Optimize into BDD operationsvP’’ = vP – vP’;
vP’ = vP;
assign’’ = assign – assign’;
assign’ = assign;
t1 = ρvariable→source(vP’’);
t2 = assign ⋈ t1;
t5 = ρvariable→source(vP);
t6 = assign’’ ⋈ t5;
t7 = t2 ∪ t6;
t3 = πsource(t7);
t4 = ρdest→variable(t3);
vP = vP ∪ t4;
vP’’ = diff(vP, vP’);vP’ = copy(vP);t1 = replace(vP’’,variable→source);
t3 = relprod(t1,assign,source);
t4 = replace(t3,dest→variable);
vP = or(vP, t4);
June 11, 2006 Using Datalog and BDDsfor Program Analysis
79
Physical domain assignment
vP’’ = diff(vP, vP’);vP’ = copy(vP);t1 = replace(vP’’,variable→source);
t3 = relprod(t1,assign,source);
t4 = replace(t3,dest→variable);
vP = or(vP, t4);
vP’’ = diff(vP, vP’);vP’ = copy(vP);t3 =
relprod(vP’’,assign,V0);t4 = replace(t3, V1→V0);
vP = or(vP, t4);
• Minimizing renames is NP-complete• Renames have vastly different costs• Priority-based assignment algorithm
June 11, 2006 Using Datalog and BDDsfor Program Analysis
80
Other optimizations
• Dead code elimination• Constant propagation• Definition-use chaining• Redundancy elimination• Global value numbering• Copy propagation• Liveness analysis
June 11, 2006 Using Datalog and BDDsfor Program Analysis
81
Splitting rules
R(a,e) :- A(a,b), B(b,c), C(c,d), R(d,e).Can be split into:
T1(a,c) :- A(a,b), B(b,c).
T2(a,d) :- T1(a,c), C(c,d).
R(a,e) :- T2(a,d), R(d,e).
Affects incrementalization, iteration.Use “split” keyword to auto-split rules.
June 11, 2006 Using Datalog and BDDsfor Program Analysis
82
Other Tools
• Banshee (John Kodumal)– Results are harder to use (not relational)
• Paddle/Jedd (Ondrej Lhotak)– Imperative style: more expressive– Not as efficient, doesn’t scale as well
June 11, 2006 Using Datalog and BDDsfor Program Analysis
83
Jedd code
vP’’ = diff(vP, vP’);vP’ = copy(vP);t1 = replace(vP’’,variable→source);
t3 = relprod(t1,assign,source);
t4 = replace(t3,dest→variable);
vP = or(vP, t4);
• Jedd code is like bddbddb internal IR before domain assignment:
June 11, 2006 Using Datalog and BDDsfor Program Analysis
84
Demo of using bddbddb
June 11, 2006 Using Datalog and BDDsfor Program Analysis
85
Tutorial StructurePart I: Essential Background
– Datalog for Program Analysis– Binary Decision Diagrams
Part II: Using the Tools– bddbddb– Compiler interface (Joeq compiler)– Datalog editor in Eclipse– Interactive mode
Part III: Developing Advanced Analyses– Context sensitivity– Combining multiple analyses– Race detection examples– Using advanced bddbddb features
Part IV: Profiling, Debugging, Avoiding Gotchas– Variable ordering– Iteration order– Machine learning– What it’s good for, what it isn’t good for
June 11, 2006 Using Datalog and BDDsfor Program Analysis
86
Context SensitivityContext SensitivityPart III:Part III: Developing Advanced Analyses
June 11, 2006 Using Datalog and BDDsfor Program Analysis
87
Old Technique: Summary-Based Analysis
• Idea: Summarize the effect of a method on its callers.– Sharir, Pnueli [Muchnick 1981]– Landi, Ryder [PLDI 1992]– Wilson, Lam [PLDI 1995]– Whaley, Rinard [OOPSLA 1999]
June 11, 2006 Using Datalog and BDDsfor Program Analysis
88
Old Technique: Summary-Based Analysis
• Problems:– Difficult to summarize pointer analysis.– Composed summaries can get large.– Recursion is difficult: Must find fixpoint.– Queries (e.g. which context points to x)
require expanding an exponential number of contexts.
June 11, 2006 Using Datalog and BDDsfor Program Analysis
89
My Technique:Cloning-Based Analysis
• Simple brute force technique.– Clone every path through the call graph.– Run context-insensitive algorithm on
expanded call graph.
• The catch: exponential blowup
June 11, 2006 Using Datalog and BDDsfor Program Analysis
90
Cloning is exponential!
June 11, 2006 Using Datalog and BDDsfor Program Analysis
91
Recursion
• Actually, cloning is unbounded in the presence of recursive cycles.
• Technique: We treat all methods within a strongly-connected component as a single node.
June 11, 2006 Using Datalog and BDDsfor Program Analysis
92
Recursion
A
G
B C D
E F
A
G
B C D
E F E F E F
G G
June 11, 2006 Using Datalog and BDDsfor Program Analysis
93
Top 20 Sourceforge Java AppsNumber of Clones
1.E+00
1.E+02
1.E+04
1.E+06
1.E+08
1.E+10
1.E+12
1.E+14
1.E+16
1000 10000 100000 1000000
Size of program (variable nodes)
Nu
mb
er o
f cl
on
es
1016
1012
108
104
100
June 11, 2006 Using Datalog and BDDsfor Program Analysis
94
Cloning is infeasible (?)
• Typical large program has ~1014 paths.• If you need 1 byte to represent a clone:
– Would require 256 terabytes of storage– >12 times size of Library of Congress– Registered ECC 1GB DIMMs: $41.7 million
• Power: 96.4 kilowatts = Power for 128 homes
– 500 GB hard disks: 564 x $195 = $109,980• Time to read sequential: 70.8 days
June 11, 2006 Using Datalog and BDDsfor Program Analysis
95
Key Insight
• There are many similarities across contexts.– Many copies of nearly-identical results.
• BDDs can represent large sets of redundant data efficiently.– Need a BDD encoding that exploits the
similarities.
June 11, 2006 Using Datalog and BDDsfor Program Analysis
96
Expanded Call Graph
A
DB C
E
F G
H
A0
D0B0 C0
E1
F2 G0
H0
E0 E2
F0 F1 G2G1
H1 H2 H3 H4 H5
June 11, 2006 Using Datalog and BDDsfor Program Analysis
97
Numbering Clones
A
DB C
E
F G
H
0 0 0
0 1 2
0-2 0-2
0-2 3-5
0A0
D0B0 C0
E1
F2 G0
H0
E0 E2
F0 F1 G2G1
H1 H2 H3 H4 H5
June 11, 2006 Using Datalog and BDDsfor Program Analysis
98
Context-sensitive Pointer Analysis Algorithm
1. First, do context-insensitive pointer analysis to get call graph.
2. Number clones.3. Do context-insensitive algorithm on
the cloned graph.
• Results explicitly generated for every clone.• Individual results retrievable with Datalog
query.
June 11, 2006 Using Datalog and BDDsfor Program Analysis
99
Counting rule
• IEnum(i,m,vc2,vc1) :- roots(m), mI0(m,i), IE0(i,m). number
• Special rule to define numbering.• Head: result of numbering
– First two variables: edge you want to number– Second two variables: context numbering
• Subgoals: graph edges– Single variable: roots of graph
June 11, 2006 Using Datalog and BDDsfor Program Analysis
100
Demo of context-sensitive
June 11, 2006 Using Datalog and BDDsfor Program Analysis
101
Example: Race DetectionExample: Race DetectionPart III:Part III: Developing Advanced Analyses
June 11, 2006 Using Datalog and BDDsfor Program Analysis
102
Object Sensitivity• k-object-sensitivity (Milanova, Ryder, Rountev 2003)
• k=3 suffices in our experiments
• CHA/context-insensitive/k-CFA too imprecise
static main() { Contexts of method bar():h1: C a = new A();h2: C b = new B(); 1-CFA: { p4 }p1: foo(a); 2-CFA: { p1:p4, p2:p4, p3:p4 }p2: foo(b);p3: foo(a); } 1-objsens: { h1, h2 }static foo(C c) { 2-objsens: { h1, h2 }p4: c.bar(); }
June 11, 2006 Using Datalog and BDDsfor Program Analysis
103
Open Programs
• Analyzing open programs is important
– Many “programs” are libraries
– Developers need to understand behavior w/o a client
• Standard approach
– Write a “harness” manually
– A client exercising the interface of the open program
• Our approach
– Generate the harness automatically
June 11, 2006 Using Datalog and BDDsfor Program Analysis
104
Race Detection
• A multi-threaded program contains a race if:– Two threads can access a memory
location– At least one access is a write– No ordering between the accesses
• As a rule, races are bad– And common …– And hard to find …
June 11, 2006 Using Datalog and BDDsfor Program Analysis
105
Running Example
static public void main() {
A a;
a = new A();
a.get();
a.inc(); }
Harness
(Note: Single-threaded)
public A() {f = 0; }
public int get() {return rd(); }
public sync int inc() { int t = rd() + (new A()).wr(1);
return wr(t); }
private int rd() {return f; }
private int wr(int x) {f = x;return x; }
June 11, 2006 Using Datalog and BDDsfor Program Analysis
106
static public void main() {
A a;
a = new A();
a.get();
a.inc(); }
public A() {f = 0; }
public int get() {return rd(); }
public sync int inc() { int t = rd() + (new A()).wr(1);
return wr(t); }
private int rd() {return f; }
private int wr(int x) {f = x;return x; }
Example: Two Object-Sensitive Contexts
June 11, 2006 Using Datalog and BDDsfor Program Analysis
107
static public void main() {
A a;
a = new A();
a.get();
a.inc(); }
public A() {f = 0; }
public int get() {return rd(); }
public sync int inc() { int t = rd() + (new A()).wr(1);
return wr(t); }
private int rd() {return f; }
private int wr(int x) {f = x;return x; }
Example: 1st Context
June 11, 2006 Using Datalog and BDDsfor Program Analysis
108
static public void main() {
A a;
a = new A();
a.get();
a.inc(); }
public A() {f = 0; }
public int get() {return rd(); }
public sync int inc() { int t = rd() + (new A()).wr(1);
return wr(t); }
private int rd() {return f; }
private int wr(int x) {f = x;return x; }
Example: 2nd Context
June 11, 2006 Using Datalog and BDDsfor Program Analysis
109
All pairs of accesses such that
– Both references of one of the following forms:
• e1.f and e2.f (the same instance field)
• C.g and C.g (the same static field)
• e1[e3] and e2[e4] (any array elements)
– At least one is a write
Computing Original Pairs
June 11, 2006 Using Datalog and BDDsfor Program Analysis
110
public A() {f = 0; }
public int get() {return rd(); }
public sync int inc() { int t = rd() + (new A()).wr(1);
return wr(t); }
private int rd() {return f; }
private int wr(int x) {f = x;return x; }
public A() {f = 0; }
public int get() {return rd(); }
public sync int inc() { int t = rd() + (new A()).wr(1);
return wr(t); }
private int rd() {return f; }
private int wr(int x) {f = x;return x; }
Example: Original Pairs
June 11, 2006 Using Datalog and BDDsfor Program Analysis
111
Computing Reachable Pairs
• Step 1
– Access pairs with at least one write to same field
• Step 2
– Consider any access pair (e1, e2)
– To be a race e1 must be:
– Reachable from a thread-spawning call site s1
• Without “switching” threads
– Where s1 is reachable from main
– (and similarly for e2)
June 11, 2006 Using Datalog and BDDsfor Program Analysis
112
Example: Reachable Pairs
static public void main() {
A a;
a = new A();
a.get();
a.inc(); }
private int rd() { return f; }
private int wr(int x) {f = x;return x; }
public A() {f = 0; }
public int get() {return rd(); }
public sync int inc() { int t = rd() + (new A()).wr(1);
return wr(t); }
private int rd() {return f; }
private int wr(int x) {f = x;return x; }
June 11, 2006 Using Datalog and BDDsfor Program Analysis
113
Computing Aliasing Pairs
• Steps 1-2
– Access pairs with at least one write to same field
– And both are executed in a thread in some context
• Step 3
– To have a race, both must access the same memory location
– Use alias analysis
June 11, 2006 Using Datalog and BDDsfor Program Analysis
114
static public void main() {
A a;
a = new A();
a.get();
a.inc(); }
private int rd() { return f; }
private int wr(int x) {f = x;return x; }
public A() {f = 0; }
public int get() {return rd(); }
public sync int inc() { int t = rd() + (new A()).wr(1);
return wr(t); }
private int rd() {return f; }
private int wr(int x) {f = x;return x; }
Example: Aliasing Pairs
June 11, 2006 Using Datalog and BDDsfor Program Analysis
115
Computing Escaping Pairs
• Steps 1-3
– Access pairs with at least one write to same field
– And both are executed in a thread in some context
– And both can access the same memory location
• Step 4
– To have a race, the memory location must alsobe thread-shared
– Use thread-escape analysis
June 11, 2006 Using Datalog and BDDsfor Program Analysis
116
Example: Escaping Pairs
static public void main() {
A a;
a = new A();
a.get();
a.inc(); }
private int rd() { return f; }
private int wr(int x) {f = x;return x; }
public A() {f = 0; }
public int get() {return rd(); }
public sync int inc() { int t = rd() + (new A()).wr(1);
return wr(t); }
private int rd() { return f; }
private int wr(int x) {f = x;return x; }
June 11, 2006 Using Datalog and BDDsfor Program Analysis
117
Computing Unlocked Pairs
• Steps 1-4
– Access pairs with at least one write to same field
– And both are executed in a thread in some context
– And both can access the same memory location
– And the memory location is thread-shared
• Step 5
– Discard pairs where the memory location is guarded by a common lock in both accesses
June 11, 2006 Using Datalog and BDDsfor Program Analysis
118
Example: Unlocked Pairs
static public void main() {
A a;
a = new A();
a.get();
a.inc(); }
private int rd() { return f; }
private int wr(int x) {f = x;return x; }
public A() {f = 0; }
public int get() {return rd(); }
public sync int inc() { int t = rd() + (new A()).wr(1);
return wr(t); }
private int rd() {return f; }
private int wr(int x) {f = x;return x; }
June 11, 2006 Using Datalog and BDDsfor Program Analysis
119
Counterexamples
• Each pair of paths in the context-sensitive call graphfrom a pair of roots to a pair of accesses along which a common lock may not be held
• Different from most other systems
– Pairs of paths (instead of single interleaved path)
– At call-graph level
June 11, 2006 Using Datalog and BDDsfor Program Analysis
120
Example: Counterexample
// file Harness.java
static public void main() {
A a;
a = new A();
4: a.get();
5: a.inc(); }
field reference A.f (A.java:10) [Rd]A.get(A.java:4)Harness.main(Harness.java:4)
field reference A.f (A.java:12) [Wr]A.inc(A.java:7)Harness.main(Harness.java:5)
// file A.java public A() {
f = 0; } public int get() {4: return rd(); } public sync int inc() {
int t= rd() + (new A()).wr(1);7: return wr(t); }
private int rd() {10: return f; } private int wr(int x) {12: f = x;
return x; }
June 11, 2006 Using Datalog and BDDsfor Program Analysis
121
Race Checker Datalog
June 11, 2006 Using Datalog and BDDsfor Program Analysis
122
Map Sensitivity
• Maps with constant string keys are common
• Augment pointer analysis:
– Model Map.put/get operations specially
...
String username = request.getParameter(“user”)
map.put(“USER_NAME”, username);
...
String query = (String) map.get(“SEARCH_QUERY”);
stmt.executeQuery(query);
...
“USER_NAME” ≠ “SEARCH_QUERY”
June 11, 2006 Using Datalog and BDDsfor Program Analysis
123
Resolving Reflection
• Reflection is a dynamic language feature• Used to query object and class information
– static Class Class.forName(String className) • Obtain a java.lang.Class object• I.e. Class.forName(“java.lang.String”) gets an
object corresponding to class String– Object Class.newInstance()
• Object constructor in disguise• Create a new object of a given class
Class c = Class.forName(“java.lang.String”);Object o = c.newInstance();
• This makes a new empty string o
June 11, 2006 Using Datalog and BDDsfor Program Analysis
124
What to Do About Reflection?
1. Anything goes
+ Obviously conservative
- Call graph extremely big and imprecise
1. String className = ...;2. Class c = Class.forName(className);3. Object o = c.newInstance();4. T t = (T) o;
3. Subtypes of T
+ More precise
- T may have many subtypes
4. Analyze className
+ Better still- Need to
know where className comes from
2. Ask the user
+ Good results- A lot of work
for user, difficult to find answers
June 11, 2006 Using Datalog and BDDsfor Program Analysis
125
Analyzing Class Names
• Looking at className seems promising
• This is interprocedural const+copy prop on strings
String stringClass = “java.lang.String”;
foo(stringClass);
...
void foo(String clazz){
bar(clazz);
}
void bar(String className){
Class c = Class.forName(className);
}
June 11, 2006 Using Datalog and BDDsfor Program Analysis
126
Pointer Analysis Can Help
stringClass
clazz
className
Stack variables Heap objects
java.lang.String
June 11, 2006 Using Datalog and BDDsfor Program Analysis
127
Reflection Resolution Using Points-to
• Need to know what className is– Could be a local string constant like java.lang.String– But could be a variable passed through many layers of
calls• Points-to analysis says what className refers to
– className --> concrete heap object
1. String className = ...;2. Class c = Class.forName(className);3. Object o = c.newInstance();4. T t = (T) o;
June 11, 2006 Using Datalog and BDDsfor Program Analysis
128
Reflection ResolutionConstants
1. String className = ...;2. Class c = Class.forName(className);3. Object o = c.newInstance();4. T t = (T) o;
Specification points
1. String className = ...;2. Class c = Class.forName(className); Object o = new T1(); Object o = new T2(); Object o = new T3();4. T t = (T) o;
Q: what object does this create?
June 11, 2006 Using Datalog and BDDsfor Program Analysis
129
Resolution May Fail!
• Need help figuring out what className is• Two options
1. Can ask user for help• Call to r.readLine on line 1 is a specification point• User needs to specify what can be read from a file• Analysis helps the user by listing all specification points
2. Can use cast information• Constrain possible types instantiated on line 3 to subclasses of T• Need additional assumptions
1. String className = r.readLine();2. Class c = Class.forName(className);3. Object o = c.newInstance();4. T t = (T) o;
June 11, 2006 Using Datalog and BDDsfor Program Analysis
130
1. Specification Files
loadImpl() @ 43 InetAddress.java:1231 => java.net.Inet4AddressImpl
loadImpl() @ 43 InetAddress.java:1231 => java.net.Inet6AddressImpl
lookup() @ 86 AbstractCharsetProvider.java:126 => sun.nio.cs.ISO_8859_15
lookup() @ 86 AbstractCharsetProvider.java:126 => sun.nio.cs.MS1251
tryToLoadClass() @ 29 DataFlavor.java:64 => java.io.InputStream
Format: invocation site => class
June 11, 2006 Using Datalog and BDDsfor Program Analysis
131
2. Using Cast Information
• Providing specification files is tedious, time-consuming, error-prone
• Leverage cast data instead– o instanceof T– Can constrain type of o if
1. Cast succeeds2. We know all subclasses of T
1. String className = ...;2. Class c = Class.forName(className);3. Object o = c.newInstance();4. T t = (T) o;
June 11, 2006 Using Datalog and BDDsfor Program Analysis
132
Analysis Assumptions
1. Assumption: Correct casts.Type cast operations that always operate on the result of a call to Class.newInstance are correct; they will always succeed without throwing a ClassCastException.
2. Assumption: Closed world.We assume that only classes reachable from the class path at analysis time can be used by the application at runtime.
June 11, 2006 Using Datalog and BDDsfor Program Analysis
133
Casts Aren’t Always Present
• Can’t do anything if no cast post-dominating a Class.newInstance call
Object factory(String className){
Class c = Class.forName(className);
return c.newInstance();
}
...
SunEncoder t = (SunEncoder)
factory(“sun.io.encoder.” + enc);
SomethingElse e = (SomethingElse)
factory(“SomethingElse“);
June 11, 2006 Using Datalog and BDDsfor Program Analysis
134
Call Graph Discovery Process
Program IRProgram IR Call graph constructionCall graph
constructionReflection resolution
using points-to
Reflection resolution
using points-to
Resolved calls
Resolved calls
Final call graph
Final call graph
User-providedspec
User-providedspec
Cast-based approximationCast-based approximation
Specification points
Specification points
June 11, 2006 Using Datalog and BDDsfor Program Analysis
135
Implementation Details
• Call graph construction algorithm in the presence of reflection is integrated with pointer analysis– Pointer analysis already has to deal with virtual calls:
new methods are discovered, points-to relations for them are created
– Reflection analysis is another level of complexity
• See Datalog specification
June 11, 2006 Using Datalog and BDDsfor Program Analysis
136
Reflection Resolution Results• Applied to 6 large Java apps, 190,000 LOC combined
0
2,000
4,000
6,000
8,000
10,000
12,000
14,000
16,000
18,000
jgap freetts gruntspud jedit columba jfreechart
Meth
ods
Call graph sizes compared
June 11, 2006 Using Datalog and BDDsfor Program Analysis
137
Map relations
• Need to map from values in one domain to another?
• Use special operator “=>”• mapAtoB(a,b) :- a => b.• Elements in A are appended to
domain of B– A must have map file.– B must have enough space.
June 11, 2006 Using Datalog and BDDsfor Program Analysis
138
Using Code Fragments
• Execute a code fragment before/after every rule invocation.A(x,y) :- B(y,a), C(a,z). { code goes here }
• Can access:– Relations by name.– Rule information.– Solver information.
• Can also add code fragment to relations (triggered on change).
• Special keywords: “modifies”, “pre”, “post”
June 11, 2006 Using Datalog and BDDsfor Program Analysis
139
Take a break…(Next up: Profiling, Debugging)
June 11, 2006 Using Datalog and BDDsfor Program Analysis
140
Tutorial StructurePart I: Essential Background
– Datalog for Program Analysis– Binary Decision Diagrams
Part II: Using the Tools– bddbddb– Compiler interface (Joeq compiler)– Datalog editor in Eclipse– Interactive mode
Part III: Developing Advanced Analyses– Context sensitivity– Combining multiple analyses– Race detection examples– Using advanced bddbddb features
Part IV: Profiling, Debugging, Avoiding Gotchas– Variable ordering– Iteration order– Machine learning– What it’s good for, what it isn’t good for
June 11, 2006 Using Datalog and BDDsfor Program Analysis
141
Part IV:Part IV: Profiling, Debugging, Avoiding Gotchas
Variable OrderingVariable Ordering
June 11, 2006 Using Datalog and BDDsfor Program Analysis
142
TryDomainOrders
• Try all possible domain orders for a given operation and inputs.– Bounded: if an order takes longer than current
best, abort it.
• To profile slow-running operations:Run with -Ddumpslow, -Ddumpcutoff=5000java net.sf.bddbddb.TryDomainOrders
• If you know ordering constraints, you can add them to rules/relations– Constraints automatically propagated to other
rules/relations
June 11, 2006 Using Datalog and BDDsfor Program Analysis
143
Variable Numbering: Active Machine Learning
• Must be determined dynamically• Limit trials with properties of relations• Each trial may take a long time• Active learning:
select trials based on uncertainty– Can build up trial database to improve
accuracy
• Several hours• Comparable to exhaustive for small apps
June 11, 2006 Using Datalog and BDDsfor Program Analysis
144
Using Machine Learning
• -Dfindbestorder– Enable machine learning
• -Dfbocutoff=#– Minimum runtime (in ms) for an
operation to be considered• -Dfbotrials=#
– Maximum number of trials to run• -Dtrialfile=
– Filename to load/store trial information.
June 11, 2006 Using Datalog and BDDsfor Program Analysis
145
Changing Iteration Order
• bddbddb uses simple iteration order heuristic – not always optimal
• If a rule is iterating too many times:– Lower its priority with pri=5– Increase other rules with pri=-5– Can also adjust priority of relations
• Solver prints iteration order on startup• Also try reformulating the problem or
changing input relations
June 11, 2006 Using Datalog and BDDsfor Program Analysis
146
Reformulate the Problem
• Change rule form:A(a,c) :- A(a,b), A(b,c). vsA(a,c) :- A(a,b), A(b,c).
• Change input relations– Short-circuit paths
• Filter relations as you go
June 11, 2006 Using Datalog and BDDsfor Program Analysis
147
Debugging
• Debugging can be tricky– Relations are huge– Declarative: not so straightforward
• Adding code fragments can help.• Try it on a small example with full
trace information.• Best: Interactive solver
June 11, 2006 Using Datalog and BDDsfor Program Analysis
148
“Comes from” query
• Special kind of query:A(3,5) :- ?
“What contributed to (3,5) being added to A?”
• Add ‘single’ keyword to get only one path.• Doesn’t solve the negated problem
(missing tuples)
June 11, 2006 Using Datalog and BDDsfor Program Analysis
149
Solver options
-Dnoisy-Dtracesolve-Dfulltracesolve-Dbddvarorder=-Dbddnodes=-Dbddcache=-Dbddminfree=-Dfindbestorder
-Dbasedir=-Dincludedirs=-Ddumprulegraph-Duseir-Dprintir-Dsplit_all_rules-Dsplit_no_rules
June 11, 2006 Using Datalog and BDDsfor Program Analysis
150
Datalog directives
• .include• .split_all_rules• .report_stats• .noisy• .strict• .singleignore• .trace
• .bddvarorder• .bddnodes• .bddcache• .bddminfree• .findbestorder• .incremental• .dot• .basedir
June 11, 2006 Using Datalog and BDDsfor Program Analysis
151
Relation options
input / inputtuplesoutput / outputtuplesprinttuplesprintsizepri=#{ code fragment }x < y
splitnumbersinglecacheafterrenamefindbestordertrace / tracefullpre / post { code }modifies R
Rule options
June 11, 2006 Using Datalog and BDDsfor Program Analysis
152
Experimental Features
• Distributed computation: (dbddbddb?)• Profile-directed feedback of iteration
order• Eclipse integration• Touchgraph integration• Debugging interface• Tracing information• Include rules in come-from query
June 11, 2006 Using Datalog and BDDsfor Program Analysis
153
What works well
• Big sets of mostly redundant data– Pointer analysis– Context-sensitive analysis
• Short propagation paths– Each iteration takes quite a bit of time, so
>1000 iterations will hurt– Try to preprocess/reformulate problem to
shorten paths
• Natural ‘flow’ problems• Pure analysis problems (no transformations)
June 11, 2006 Using Datalog and BDDsfor Program Analysis
154
What doesn’t work well• Long propagation paths
– Traditional dataflow analysis (use sparse form instead)
• Huge problems with little redundancy– Too much context sensitivity
• Domains that are not easily countable– Need to manufacture names on the fly
• Problems that have inherently complicated formulations
• Problems optimized for particular data structures (union-find, etc.)
June 11, 2006 Using Datalog and BDDsfor Program Analysis
155
Using bddbddb in a class• bddbddb has been very useful in Stanford
advanced compiler course– Comparing/contrasting analyses becomes easier– Students can implement and evaluate multiple
techniques without much overhead• Projects:
– Implement an algorithm from a paper in Datalog, make a small change and evaluate its effectiveness
– Experiment with different kinds of context sensitivity on a given problem
– Improve on BDD solver efficiency– Build a tool based on analysis results
June 11, 2006 Using Datalog and BDDsfor Program Analysis
156
Questions?
June 11, 2006 Using Datalog and BDDsfor Program Analysis
157
That’s all, folks!
Thanks for sticking around for all 157 slides!
June 11, 2006 Using Datalog and BDDsfor Program Analysis
158
Experimental ResultsExperimental Results
June 11, 2006 Using Datalog and BDDsfor Program Analysis
159
Java Context-Insensitive Pointer AnalysisSpeed Comparison (Normalized to Handcoded)
0
0.5
1
1.5
2
2.5
joeq jgraph jbidwatch jedit umldot megamek
Sp
ee
d r
ela
tiv
e t
o h
an
dc
od
ed
Handcoded
No Opts
Incr
+DU
+Dom
+All
+Order
Experimental Results
June 11, 2006 Using Datalog and BDDsfor Program Analysis
160
Java Context-Insensitive Pointer AnalysisSpeed Comparison (Normalized to Handcoded)
0
0.5
1
1.5
2
2.5
joeq jgraph jbidwatch jedit umldot megamek
Sp
ee
d r
ela
tiv
e t
o h
an
dc
od
ed
Handcoded
No Opts
Incr
+DU
+Dom
+All
+Order
Experimental Results
June 11, 2006 Using Datalog and BDDsfor Program Analysis
161
Java Context-Insensitive Pointer AnalysisSpeed Comparison (Normalized to Handcoded)
0
0.5
1
1.5
2
2.5
joeq jgraph jbidwatch jedit umldot megamek
Sp
ee
d r
ela
tiv
e t
o h
an
dc
od
ed
Handcoded
No Opts
Incr
+DU
+Dom
+All
+Order
Experimental Results
June 11, 2006 Using Datalog and BDDsfor Program Analysis
162
Java Context-Insensitive Pointer AnalysisSpeed Comparison (Normalized to Handcoded)
0
0.5
1
1.5
2
2.5
joeq jgraph jbidwatch jedit umldot megamek
Sp
ee
d r
ela
tiv
e t
o h
an
dc
od
ed
Handcoded
No Opts
Incr
+DU
+Dom
+All
+Order
Experimental Results
June 11, 2006 Using Datalog and BDDsfor Program Analysis
163
Java Context-Insensitive Pointer AnalysisSpeed Comparison (Normalized to Handcoded)
0
0.5
1
1.5
2
2.5
joeq jgraph jbidwatch jedit umldot megamek
Sp
ee
d r
ela
tiv
e t
o h
an
dc
od
ed
Handcoded
No Opts
Incr
+DU
+Dom
+All
+Order
Experimental Results
June 11, 2006 Using Datalog and BDDsfor Program Analysis
164
Java Context-Insensitive Pointer AnalysisSpeed Comparison (Normalized to Handcoded)
0
0.5
1
1.5
2
2.5
joeq jgraph jbidwatch jedit umldot megamek
Sp
ee
d r
ela
tiv
e to
ha
nd
co
de
d
Handcoded
No Opts
Incr
+DU
+Dom
+All
+Order
Experimental Results
June 11, 2006 Using Datalog and BDDsfor Program Analysis
165
Java Context-Insensitive Pointer AnalysisSpeed Comparison (Normalized to Handcoded)
0
0.5
1
1.5
2
2.5
joeq jgraph jbidwatch jedit umldot megamek
Sp
ee
d r
ela
tiv
e t
o h
an
dc
od
ed
Handcoded
No Opts
Incr
+DU
+Dom
+All
+Order
Experimental Results
June 11, 2006 Using Datalog and BDDsfor Program Analysis
166
Java Context-Sensitive Pointer AnalysisSpeed Comparison (Normalized to Handcoded)
0
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
joeq jgraph jbidwatch jedit umldot megamek
Sp
ee
d r
ela
tiv
e to
ha
nd
co
de
d
Handcoded
No Opts
Incr
+DU
+Dom
+All
+Order
Experimental Results
June 11, 2006 Using Datalog and BDDsfor Program Analysis
167
C Pointer AnalysisSpeed Comparison (Normalized to Handcoded)
0
0.5
1
1.5
2
2.5
3
3.5
4
crafty enscript hypermail monkey
Sp
ee
d r
ela
tiv
e t
o h
an
dc
od
ed
Handcoded
No Opts
Incr
+DU
+Dom
+All
+Order
Experimental Results
June 11, 2006 Using Datalog and BDDsfor Program Analysis
168
External Lock AnalysisSpeed Comparison (Normalized to No Opts)
0
1
2
3
4
5
6
joeq jgraph jbidwatch jedit umldot megamek
Sp
ee
d r
ela
tiv
e t
o N
o O
pts
No Opts
Incr
+DU
+Dom
+All
+Order
Experimental Results
June 11, 2006 Using Datalog and BDDsfor Program Analysis
169
SQL Injection AnalysisSpeed Comparison (Normalized to Incr)
0
0.5
1
1.5
2
2.5
3
3.5
4
4.5
5
personalblog road2hibernate snipsnap roller
Sp
ee
d r
ela
tiv
e t
o In
cr
No Opts
Incr
+DU
+Dom
+All
+Order
Experimental Results
June 11, 2006 Using Datalog and BDDsfor Program Analysis
170
Related Work• Datalog in Program Analysis
– Specify as Datalog query [Ullman 1989]– Toupie system [Corsini 1993]– Demand-driven using magic sets [Reps 1994]– Program analysis with logic programming [Dawson
1996]– Crocopat system [Beyer 2003]– Modular class analysis [Besson 2003]
• BDDs in Program Analysis– Predicate abstraction [Ball 2000]– Shape analysis [Manevich 2002, Yavuz-Kahveci 2002]– Pointer Analysis [Zhu 2002, Berndl 2003, Zhu 2004]– Jedd system [Lhotak 2004]
June 11, 2006 Using Datalog and BDDsfor Program Analysis
171
Related Work• BDD Variable Ordering
– Variable ordering is NP-complete [Bollig 1996]– Interleaving [Fujii 1993]– Sifting [Rudell 1993]– Genetic algorithms [Drechsler 1995]– Machine learning for BDD orders [Grumberg 2003]
• Efficient Evaluation of Datalog– Semi-naïve evaluation [Balbin 1987]– Bottom-up evaluation [Ullman 1989, Ceri 1990, Naughton
1991]– Top-down evaluation with tabling [Tamaki 1986, Chen 1996]– Rule ordering [Ramakrishnan 1990]– Magic sets transformation [Bancilhon 1986]– Computing with BDDs [Iwaihara 1995]– Time and space guarantees [Liu 2003]
June 11, 2006 Using Datalog and BDDsfor Program Analysis
172
Program Analysis with bddbddb
• Context-sensitive Java pointer analysis
• C pointer analysis• Escape analysis• Type analysis• External lock analysis• Finding memory leaks• Interprocedural def-use• Interprocedural mod-ref
• Object-sensitive analysis• Cartesian product
algorithm• Resolving Java reflection• Bounds check
elimination• Finding race conditions• Finding Java security
vulnerabilities• And many more…
Performance better than handcoded!
June 11, 2006 Using Datalog and BDDsfor Program Analysis
173
Conclusion
• bddbddb: new paradigm in program analysis– Datalog compiled into optimized BDD operations– Efficiently and easily implement context-sensitive
analyses– Easier to develop correct analyses– Easily experiment with new ideas– Growing library of program analyses– Easily use and build upon work of others
• Available as open-source LGPL: http://bddbddb.sourceforge.net
June 11, 2006 Using Datalog and BDDsfor Program Analysis
174
My Contribution (2)
– Pointer analysis in 6 lines of Datalog (a database language)• Hard to create & debug efficient BDD-based
algorithms (3451 lines, 1 man-year)• Automatic optimizations in bddbddb
– Easy to create context-sensitive analyses using pointer analysis results (a few lines)
– Created many analyses using bddbddb
bddbddb(BDD-based deductive database)
June 11, 2006 Using Datalog and BDDsfor Program Analysis
175
Outline• Pointer Analysis
– Problem Overview– Brief History– Pointer Analysis in Datalog
• Context Sensitivity• Improving Performance• bddbddb: BDD-based deductive database• Experimental Results
– Analysis Time– Analysis Memory– Analysis Accuracy
• Conclusion
June 11, 2006 Using Datalog and BDDsfor Program Analysis
176
Performance is Tricky!• Context-sensitive numbering scheme
– Modify BDD library to add special operations.– Can’t even analyze small programs. Time:
• Improved variable ordering– Group similar BDD variables together.– Interleave equivalence relations.– Move common subsets to edges of variable
order. Time: 40h• Incrementalize outermost loop
– Very tricky, many bugs. Time: 36h• Factor away control flow, assignments
– Reduces number of variables. Time: 32h
June 11, 2006 Using Datalog and BDDsfor Program Analysis
177
due to V. Benjamin Livshits
Java Security VulnerabilitiesApplication Reported Errors Actual
NameClasses
context-insensitiv
e
context-sensitive
Errors
blueblog 306 1 1 1webgoat 349 81 6 6blojsom 428 48 2 2personalblog 611 350 2 2snipsnap 653 >321 27 15road2hiberna
867 15 1 1
pebble 889 427 1 1roller 989 >267 1 1
Total 5356 >1508 41 29
June 11, 2006 Using Datalog and BDDsfor Program Analysis
178
Vulnerabilities Found
SQL injection
HTTP splitting
Cross-site scripting
Path traversal
Total
Header 0 6 4 0 10Parameter 6 5 0 2 13Cookie 1 0 0 0 1Non-Web 2 0 0 3 5Total 9 11 4 5 29
June 11, 2006 Using Datalog and BDDsfor Program Analysis
179
Summary of Contributions• The first scalable context-sensitive subset-based
pointer analysis.– Cloning-based technique using BDDs– Clever context numbering– Experimental results on the effects of context sensitivity
• bddbddb: new paradigm in program analysis– Efficiently and easily implement context-sensitive
analyses– Datalog compiled into optimized BDD operations– Library of program analyses (with many others)– Active learning for BDD variable orders (with M. Carbin)
• Artifacts:– Joeq compiler and virtual machine– JavaBDD library and BuDDy library– bddbddb tool
June 11, 2006 Using Datalog and BDDsfor Program Analysis
180
Conclusion
• The first scalable context-sensitive subset-based pointer analysis.– Accurate: Results for up to 1014
contexts.– Scales to large programs.
• bddbddb: a new paradigm in prog analysis– High-level spec Efficient implementation
• System is publicly available at:http://bddbddb.sourceforge.net
Top Related