Constraint-Based Analysis CS 8803 FPL Oct 24, 2012 (Slides courtesy of Alex Aiken) 1.


Page 2: Code Example

    void f(state *x, state *y) {
        result = spin_trylock(&x->lock);
        spin_lock(&y->lock);
        …
        if (!result)
            spin_unlock(&x->lock);
        spin_unlock(&y->lock);
    }

Callouts on the code:
• Path Sensitivity: result, (!result)
• Pointers & Heap: (&x->lock), (&y->lock)
• Interprocedural, Flow Sensitivity: spin_trylock, spin_lock, spin_unlock

[Figure: lock state machine with states Locked, Unlocked, and Error; transitions are labeled lock and unlock.]

Page 3: Saturn

• What?
  – SAT-based approach to static bug detection
• How?
  – SAT-based approach
  – Program constructs → Boolean constraints
  – Inference → SAT solving
• Why SAT?
  – Lots of reasons, but for now:
  – Program states naturally expressed as bits
  – The theory for bits is SAT
  – Efficient solvers widely available

Page 4: Intuition

• Analyzing in one direction is problematic
  – Forwards or backwards
  – Consider null dereference analysis
    • No null ptr assignments: forwards is best
    • No dereferences: backwards is best
• Constraints
  – Give a global picture of the program
  – Allow more efficient order of solution

Page 5: Straight-line Code

    void f(int x, int y) {
        int z = x & y;
        assert(z == x);
    }

[Figure: bit-level encoding. The inputs $x_{31} \ldots x_0$ and $y_{31} \ldots y_0$ feed a Bitwise-AND that yields $z = x_{31} \land y_{31} \ldots x_0 \land y_0$; z is then compared (==) with x to produce the constraint R.]

Page 6: Straight-line Code

    void f(int x, int y) {
        int z = x & y;
        assert(z == x);
    }

R: the constraint for the assertion (z == x)

Query: Is-Satisfiable($\neg R$)

Answer: Yes

    x = [00…1]
    y = [00…0]

Negated assertion is satisfiable.

Therefore, the assertion may fail.
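The query above can be mimicked without a SAT solver by enumerating every input. The following is a minimal sketch, not Saturn's implementation: it assumes 4-bit integers instead of 32 bits so the search stays tiny, and the names `assertion_holds` and `find_counterexample` are hypothetical.

```python
from itertools import product

WIDTH = 4  # assumption: 4-bit integers; the slides use 32 bits


def assertion_holds(x, y):
    """R: the asserted condition z == x, where z = x & y."""
    z = x & y
    return z == x


def find_counterexample(width=WIDTH):
    """Look for an assignment that satisfies the negated assertion (not R)."""
    for x, y in product(range(1 << width), repeat=2):
        if not assertion_holds(x, y):
            return x, y
    return None


witness = find_counterexample()
print(witness)
```

The first witness found is x = 1, y = 0, the same shape as the slide's answer x = [00…1], y = [00…0]. A real SAT solver reaches such an assignment without enumerating all inputs.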

Page 7: Control Flow – Preparation

• Approach
  – Assumes loop free program
  – Unroll loops, drop backedges
• May miss errors that are deeply buried
  – Bug finding, not verification
  – Many errors surface in a few iterations
• Advantages
  – Simplicity, reduces false positives

Page 8: Control Flow – Example

    if (c)
        x = a;
    else
        x = b;
    res = x;

• Merges
  – preserve path sensitivity
  – select bits based on the values of incoming guards

Guards and values along the control-flow graph:
• Entry, if (c): $G = \mathit{true}$
• Branch $c$, after x = a: $G = c$, x: $[a_{31} \ldots a_0]$
• Branch $\neg c$, after x = b: $G = \neg c$, x: $[b_{31} \ldots b_0]$
• Merge, at res = x: $G = c \lor \neg c$, x: $[v_{31} \ldots v_0]$, where $v_i = (c \land a_i) \lor (\neg c \land b_i)$
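The merge rule can be tried out bit by bit. This is an illustrative sketch under stated assumptions: values are 8 bits wide rather than 32, and `to_bits`, `from_bits`, `merge`, and `res_after_branch` are hypothetical helper names.

```python
WIDTH = 8  # assumption: 8 bits for brevity; the slide uses 32


def to_bits(value, width=WIDTH):
    """Split an integer into booleans, least-significant bit first."""
    return [bool((value >> i) & 1) for i in range(width)]


def from_bits(bits):
    """Inverse of to_bits."""
    return sum(1 << i for i, bit in enumerate(bits) if bit)


def merge(c, a_bits, b_bits):
    """v_i = (c and a_i) or (not c and b_i) for every bit position i."""
    return [(c and a) or ((not c) and b) for a, b in zip(a_bits, b_bits)]


def res_after_branch(c, a, b):
    """Value of res after: if (c) x = a; else x = b; res = x;"""
    return from_bits(merge(c, to_bits(a), to_bits(b)))
```

Because each merged bit is selected by the guard, the merged value still records which branch produced it, which is what keeps the encoding path sensitive.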

Page 9: Pointers – Overview

• May point to different locations…
  – Thus, use points-to sets

    $p: \{ l_1, \ldots, l_n \}$

• … but path sensitive
  – Use guards on points-to relationships

    $p: \{ (g_1, l_1), \ldots, (g_n, l_n) \}$

Page 10: Pointers – Example

    p = &x;
    if (c) p = &y;
    res = *p;

Guards and points-to sets:
• After p = &x: $G = \mathit{true}$, $p: \{ (\mathit{true}, x) \}$
• Inside if (c), after p = &y: $G = c$, $p: \{ (\mathit{true}, y) \}$
• At res = *p: $G = \mathit{true}$, $p: \{ (c, y); (\neg c, x) \}$

The dereference res = *p is then treated as:

    if (c) res = y;
    else if (!c) res = x;
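A guarded location set can be prototyped with guards as plain predicates over an environment of branch conditions. The sketch below is an assumption-laden illustration, not Saturn's data structure; `always`, `guarded_assign`, and `deref` are hypothetical names, and the memory contents are made up.

```python
def always(env):
    """The guard `true`."""
    return True


def guarded_assign(points_to, guard, target):
    """Model `if (guard) p = &target;` on a guarded location set.

    Existing entries survive only where the new guard is false.
    """
    def weakened(old_guard):
        return lambda env: old_guard(env) and not guard(env)

    kept = [(weakened(g), loc) for g, loc in points_to]
    return [(guard, target)] + kept


def deref(points_to, memory, env):
    """Model `*p` as conditional assignments: read the location whose guard holds."""
    for guard, loc in points_to:
        if guard(env):
            return memory[loc]
    raise ValueError("no points-to guard holds in this environment")


def c(env):
    return env["c"]


p = [(always, "x")]            # p = &x;
p = guarded_assign(p, c, "y")  # if (c) p = &y;
# p now stands for { (c, y); (not c, x) }

memory = {"x": 10, "y": 20}    # made-up contents of x and y
```

Evaluating `deref(p, memory, {"c": True})` reads y, and with `c` false it reads x, matching the two conditional assignments on the slide.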

Page 11: Pointers – Recap

• Guarded Location Sets
    $\{ (g_1, l_1), \ldots, (g_n, l_n) \}$
• Guards
  – Condition under which points-to relationship holds
  – Collected from statement guards
• Pointer Dereference
  – Conditional Assignments

Page 12: Not Covered

• Other Constructs
  – Structs, …
• Modeling of the environment
• Optimizations
  – several to reduce size of formulas
  – some form of program slicing important

Page 13: What can we do with Saturn?

    int f(lock_t *l) {
        lock(l);
        …
        unlock(l);
    }

lock(l) is modeled as:

    if (l->state == Unlocked)
        l->state = Locked;
    else
        l->state = Error;

unlock(l) is modeled as:

    if (l->state == Locked)
        l->state = Unlocked;
    else
        l->state = Error;

[Figure: lock state machine with states Locked, Unlocked, and Error; transitions are labeled lock and unlock.]

Page 14: General FSM Checking

• Encode FSM in the program
  – State → Integer
  – Transition → Conditional Assignments
• Check code behavior
  – SAT queries
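The encoding can be sketched with states as integers and transitions as conditional assignments, with exhaustive enumeration standing in for the SAT query. Everything here is illustrative: the functions `well_behaved` and `double_lock` are hypothetical programs, and a real checker would hand the constraints to a solver instead of looping.

```python
from itertools import product

UNLOCKED, LOCKED, ERROR = 0, 1, 2  # State -> Integer


def lock(state):
    """Transition as a conditional assignment."""
    return LOCKED if state == UNLOCKED else ERROR


def unlock(state):
    return UNLOCKED if state == LOCKED else ERROR


def well_behaved(state, cond):
    state = lock(state)
    state = unlock(state)
    return state


def double_lock(state, cond):
    state = lock(state)
    if cond:
        state = lock(state)  # second acquire on the same lock
    state = unlock(state)
    return state


def can_reach_error(func, initial_states=(UNLOCKED,), num_conditions=1):
    """Stand-in for the SAT query: is there an input that ends in ERROR?"""
    for state in initial_states:
        for conds in product([False, True], repeat=num_conditions):
            if func(state, *conds) == ERROR:
                return state, conds
    return None
```

`can_reach_error(double_lock)` returns a witness in which the branch condition is true, while `can_reach_error(well_behaved)` returns `None`.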

Page 15: How are we doing so far?

• Precision:
• Scalability:
  – SAT limit is 1M clauses
  – About 10 functions
• Solution:
  – Divide and conquer
  – Function summaries

Page 16: Function Summaries (1st try)

• Function behavior can be summarized with a set of state transitions

    int f(lock_t *l) {
        lock(l);
        …
        …
        unlock(l);
        return 0;
    }

• Summary:
    *l: Unlocked $\to$ Unlocked
        Locked $\to$ Error

Page 17: A Difficulty

    int f(lock_t *l) {
        lock(l);
        …
        if (err) return -1;
        …
        unlock(l);
        return 0;
    }

• Problem
  – two possible output states
  – distinguished by return value (retval == 0)…
• Summary
  1. (retval == 0)
     *l: Unlocked $\to$ Unlocked
         Locked $\to$ Error
  2. $\neg$(retval == 0)
     *l: Unlocked $\to$ Locked
         Locked $\to$ Error

Page 18: FSM Function Summaries

• Summary representation (simplified):
    { Pin, Pout, R }
• User gives:
  – Pin: predicates on initial state
  – Pout: predicates on final state
  – Express interprocedural path sensitivity
• Saturn computes:
  – R: guarded state transitions
  – Used to simulate function behavior at call site

Page 19: Lock Summary (2nd try)

    int f(lock_t *l) {
        lock(l);
        …
        if (err) return -1;
        …
        unlock(l);
        return 0;
    }

• Output predicate:
  – Pout = { (retval == 0) }
• Summary (R):
  1. (retval == 0)
     *l: Unlocked $\to$ Unlocked
         Locked $\to$ Error
  2. $\neg$(retval == 0)
     *l: Unlocked $\to$ Locked
         Locked $\to$ Error
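The summary R can be reproduced by running a small model of f over every initial state and both outcomes of `err`, keyed by the value of the output predicate. This is a sketch under assumptions: `summarize` and `apply_at_call_site` are hypothetical names, and enumeration replaces the SAT queries Saturn would issue.

```python
UNLOCKED, LOCKED, ERROR = "Unlocked", "Locked", "Error"


def lock(state):
    return LOCKED if state == UNLOCKED else ERROR


def unlock(state):
    return UNLOCKED if state == LOCKED else ERROR


def f(state, err):
    """Lock-state model of f from the slide; returns (retval, final state)."""
    state = lock(state)
    if err:
        return -1, state
    state = unlock(state)
    return 0, state


def summarize(func):
    """Build R: (value of the Pout predicate, initial state) -> final state."""
    summary = {}
    for initial in (UNLOCKED, LOCKED):
        for err in (False, True):
            retval, final = func(initial, err)
            summary[(retval == 0, initial)] = final
    return summary


def apply_at_call_site(summary, caller_state, retval_is_zero):
    """Simulate a call using only the summary; unknown inputs go to Error."""
    return summary.get((retval_is_zero, caller_state), ERROR)


R = summarize(f)
```

With this summary, a caller that holds the lock in state Unlocked and sees a nonzero return value continues in state Locked, without the body of f being analyzed again.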

Page 20: Lock checker for Linux

• Parameters:
  – States: { Locked, Unlocked, Error }
  – Pin = {}
  – Pout = { (retval == 0) }
• Experiment:
  – Linux Kernel 2.6.5: 4.8 MLOC
  – ~40 lock/unlock/trylock primitives
  – 20 hours to analyze
    • 3.0 GHz Pentium IV, 1 GB memory

Page 21: Double Locking/Unlocking

    static void sscape_coproc_close(…) {
        spin_lock_irqsave(&devc->lock, flags);
        if (…)
            sscape_write(devc, DMAA_REG, 0x20);
        …
    }

    static void sscape_write(struct … *devc, …) {
        spin_lock_irqsave(&devc->lock, flags);
        …
    }

Page 22: Ambiguous Return State

    int i2o_claim_device(…) {
        down(&i2o_configuration_lock);
        if (d->owner) {
            up(&i2o_configuration_lock);
            return -EBUSY;
        }
        if (…) {
            return -EBUSY;
        }
        …
    }

Page 23: Bugs

    Type              Bugs   False Pos.   % Bugs
    Double Locking     134           99      57%
    Ambiguous State     45           22      67%
    Total              179          121      60%

Previous Work: MC (31), CQual (18), <20% Bugs

Page 24: Function Summary Database

• 63,000 functions in Linux
  – More than 23,000 are lock related
  – 17,000 with locking constraints on entry
  – Around 9,000 affect more than one lock
  – 193 lock wrappers
  – 375 unlock wrappers
  – 36 with return value/lock state correlation
• Available on the web . . .

Page 25: Another Checker

• Memory leaks
  – Common, esp. in error handling code
  – Hard to find
  – Problematic in long running applications
• Current techniques
  – Escape analysis
  – Ownership types
  – Region based analysis
  – …

Page 26: Simple Leak

    char *f() {
        char *p;
        p = (char*)malloc(…);
        …
        if (err) return NULL;
        …
        return p;
    }

Page 27: Scenario 1 – Malloc Wrappers

    char *f() {
        char *p;
        p = (char*)strdup(…);
        …
        if (err) return NULL;
        …
        return p;
    }

Page 28: Scenario 2 – External References

    char *f(struct *s) {
        char *p;
        p = (char*)malloc(…);
        s->name = p;
        if (err) return NULL;
        …
        return p;
    }

Page 29: Scenario 3 – Function Calls

    char *f(struct state *s) {
        char *p;
        p = (char*)malloc(…);
        g(s, p);
        if (err) return NULL;
        …
        return p;
    }

    void g(s, p) {
        s->name = p;
    }

Page 30: Scenario 4 – Data dependency

    void f(int len) {
        char fastbuf[10], *p;
        if (len < 10) p = fastbuf;
        else p = (char *)malloc(len);
        …
        if (p != fastbuf) free(p);
    }

Page 31: Requirements

• Track points-to relationships precisely
• Infer escaping functions
  – ones that create external references to objects passed in via parameters
• Infer allocation functions

Page 32: Analysis Part I – Points-to Rule

• PointsTo(p, l)
  – condition under which p points to l

Given the guarded location set $p: \{ (g_0, l_0), \ldots, (g_{n-1}, l_{n-1}) \}$:

$$\mathrm{PointsTo}(p, l) = \begin{cases} g_i & \text{if } l_i = l \\ \mathit{false} & \text{otherwise} \end{cases}$$

Page 33: Analysis Part II – EscapeVia

• EscapeVia(l, p, X)
  – the condition under which location l escapes via pointer p, excluding references in set X
• Access Roots
  – Every object in the function body is accessed through one of the following “roots”
    • Parameters (p1…pn)
    • The Return Value (ret_val)
    • Global Variables
    • Local Variables
    • Heap Allocated Objects

Page 34: Analysis Part II – EscapeVia

• Never escape through local variables

    $\mathrm{RootOf}(p) \in \mathit{Locals} \cup X \;\Rightarrow\; \mathrm{EscapeVia}(l, p, X) = \mathit{false}$

• Always escape through global variables

    $\mathrm{RootOf}(p) \in \mathit{Globals} \;\Rightarrow\; \mathrm{EscapeVia}(l, p, X) = \mathrm{PointsTo}(p, l)$

Page 35: Analysis Part II – EscapeVia

• Escaping through parameters/return

    $\mathrm{RootOf}(p) \in (\mathit{Params} \cup \{ \mathit{ret\_val} \}) - X \;\Rightarrow\; \mathrm{EscapeVia}(l, p, X) = \mathrm{PointsTo}(p, l)$

• Escaping via another allocated location

    $\mathrm{RootOf}(p) \in \mathit{NewLocs} - X \;\Rightarrow\; \mathrm{EscapeVia}(l, p, X) = \mathrm{PointsTo}(p, l) \land \mathrm{Escaped}(p, X \cup \{ \mathrm{RootOf}(l) \})$

Page 36: Analysis Part III – Escape/Leak

• Escape Condition

    $\mathrm{Escaped}(l, X) = \bigvee_{p} \mathrm{EscapeVia}(l, p, X)$

• Leak Condition

    $\mathrm{Leaked}(l, X) = \neg \mathrm{Escaped}(l, X)$

• Leak Checker
  For all new locations l, there is a leak if

    Satisfiable(Leaked(l, {}))
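The leak check can be exercised on the "Simple Leak" and "External References" scenarios with a deliberately simplified model. The assumptions are significant: each pointer is treated as its own root, the rule for escaping via another allocated location is omitted, guards are predicates over an environment of branch conditions, and satisfiability is decided by enumeration. All function names are hypothetical.

```python
from itertools import product


def always(env):
    return True


def no_error(env):
    return not env["err"]


def points_to(store, p, loc, env):
    """PointsTo(p, l): true when some guard g_i with l_i == loc holds in env."""
    return any(guard(env) for guard, target in store[p] if target == loc)


def escape_via(store, roots, loc, p, excluded, env):
    """EscapeVia(l, p, X), simplified: p is treated as its own root."""
    if roots[p] == "local" or p in excluded:
        return False
    # Globals, parameters, and the return value expose whatever they point to.
    return points_to(store, p, loc, env)


def escaped(store, roots, loc, excluded, env):
    """Escaped(l, X): disjunction of EscapeVia(l, p, X) over all pointers p."""
    return any(escape_via(store, roots, loc, p, excluded, env) for p in store)


def find_leak(store, roots, loc, variables):
    """Return an assignment satisfying Leaked(l, {}) = not Escaped(l, {}), or None."""
    for values in product([False, True], repeat=len(variables)):
        env = dict(zip(variables, values))
        if not escaped(store, roots, loc, frozenset(), env):
            return env
    return None


# "Simple Leak": p = malloc(...); if (err) return NULL; ... return p;
roots = {"p": "local", "ret_val": "return"}
store = {"p": [(always, "l")], "ret_val": [(no_error, "l")]}
leak = find_leak(store, roots, "l", ["err"])

# "External References": s->name = p holds the object on every path.
roots_ext = {**roots, "s->name": "param"}
store_ext = {**store, "s->name": [(always, "l")]}
no_leak = find_leak(store_ext, roots_ext, "l", ["err"])
```

In the first scenario the check finds the path with `err` true, where the allocated location is reachable only from the local p. In the second, the reference stored through the parameter makes the location escape on every path, so no leak is reported.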

Page 37: Results

               LOC (K)   # Alloc Func.    # Bugs   FP (%)
    Samba          404              80        83    8.79%
    OpenSSL        296             101       117    0.85%
    BinUtils       909              91   136(66)    3.55%
    OpenSSH         36              19    29(10)       0%
    Total        1,646             291       365    3.69%

Page 38: Why SAT? (Revisited …)

• Moore’s Law
• Uniform modeling of constructs as bits
• Constraints
  – Local specification
  – Global solution
• Incremental SAT solving
  – makes multiple queries efficient

Page 39: Why SAT? (Cont.)

• Path sensitivity is important
  – To find bugs
  – To reduce false positives
  – Much easier to model precisely with SAT
• Compositionality is important
  – Function summaries critical for scalability
  – Easy to construct with SAT queries