
Post on 29-Mar-2015


Statically Validating Must Summaries for Incremental Compositional Dynamic Test Generation

Patrice Godefroid, Shuvendu K. Lahiri, Cindy Rubio-González

International Static Analysis Symposium – September 2011

Microsoft Research, University of Wisconsin – Madison

Background

• Systematic Dynamic Test Generation (= DART)
  o Run the program on a valid input and record the trace
  o Symbolically execute the program along the recorded trace to collect constraints
  o Negate and solve the constraints to obtain new inputs
  o The process repeats (possibly forever!)
• Used in many tools
  o EXE, CUTE, SAGE, PEX, KLEE, BitScope, Apollo, etc.

SAGE @ Microsoft

• 1st whitebox fuzzer for security testing
  o 200+ machines (since 2008)
  o 1 billion+ constraints
  o 100s of apps, 100s of security bugs
  o Example: Win7 file fuzzing, found ~1/3 of all fuzzing bugs
  o Millions of dollars saved for Microsoft + time/energy for the world
• #1 application for SMT solvers today (CPU usage)

Compositional Test Generation

• Systematically executing all feasible paths does not scale
• Compositional Dynamic Test Generation
  o Compute summaries that can be reused later
  o Avoid retesting
  o Can provide the same path coverage exponentially faster!

Example of Function Summary

    int is_positive(int x) {
        if (x > 0) return 1;
        return 0;
    }

In the summary of is_positive, ret denotes the value returned by the function.
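The summary formula that accompanied this code did not survive in the transcript. Assuming the usual encoding, in which each intraprocedural path contributes one disjunct pairing a constraint on the input with a constraint on the output, the summary of is_positive would plausibly read as follows (a reconstruction, not the slide's own notation):

```latex
\varphi_{\mathit{is\_positive}} \;=\; (x > 0 \wedge \mathit{ret} = 1) \;\vee\; (x \le 0 \wedge \mathit{ret} = 0)
```

Each disjunct corresponds to one of the two paths through the function.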

Function Summaries

• Function summary for a function f
  o Logic formula over constraints
  o Derived by successive iterations and defined as a disjunction of formulas of the form $\varphi_{w_f} = \mathit{pre}_{w_f} \wedge \mathit{post}_{w_f}$
  o $\mathit{pre}_{w_f}$: conjunction of constraints on the inputs of f
  o $\mathit{post}_{w_f}$: conjunction of constraints on the outputs of f
  o Can be computed automatically from the path constraint for the intraprocedural path

Must Summaries

• Symbolic execution of large programs is imprecise
  o Complex program statements
  o Calls to operating-system and library functions
• Concrete values are used to simplify constraints
  o Under-approximate path constraints
  o Summaries become must summaries

    int g(int x, int y) {
        if ((x > 0) && (hash(y) > 10))
            return 1;
        return 0;
    }

• Assume hash is a complex or unknown function
• Assume that when g is invoked with y = 45, hash(45) = 987
• Resulting summary, an under-approximation with a smaller precondition: $\varphi_g = (x > 0 \wedge y = 45 \wedge \mathit{ret} = 1)$
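The under-approximation can be illustrated with a small C sketch. The body of hash below is a hypothetical stand-in (only hash(45) = 987 comes from the slide), and the helper names pre_g, summary_is_sound and uncovered_inputs are introduced here for illustration. The sketch checks by enumeration that every input satisfying the precondition x > 0 ∧ y = 45 makes g return 1, and counts inputs that also return 1 but fall outside the precondition.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the unknown function: only hash(45) = 987
   is taken from the slide, the rest is an arbitrary choice. */
static int hash(int y) {
    return (y == 45) ? 987 : (y % 20);
}

static int g(int x, int y) {
    if ((x > 0) && (hash(y) > 10))
        return 1;
    return 0;
}

/* Precondition of the must summary: x > 0 and y = 45. */
static bool pre_g(int x, int y) {
    return x > 0 && y == 45;
}

/* True if every input in [-bound, bound] x [-bound, bound] that satisfies
   the precondition makes g return 1 (the postcondition ret = 1). */
bool summary_is_sound(int bound) {
    for (int x = -bound; x <= bound; x++)
        for (int y = -bound; y <= bound; y++)
            if (pre_g(x, y) && g(x, y) != 1)
                return false;
    return true;
}

/* Counts inputs on which g returns 1 although the precondition does not
   hold: behaviours that the under-approximate summary does not describe. */
int uncovered_inputs(int bound) {
    int count = 0;
    for (int x = -bound; x <= bound; x++)
        for (int y = -bound; y <= bound; y++)
            if (!pre_g(x, y) && g(x, y) == 1)
                count++;
    return count;
}
```

With this stand-in, summary_is_sound(50) holds while uncovered_inputs(50) is non-zero, which is what "smaller precondition" means: the summary is correct for what it covers but does not cover everything.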

Must Summaries

• Defined as a quadruple ⟨lp, P, lq, Q⟩ where:
  o lp and lq are locations in the program Prog
  o P is the summary precondition holding at lp
  o Q is the summary postcondition holding at lq

Some Facts About Summaries

• Time to be produced: weeks/months


• Number of summaries: millions

• Number of instructions executed between lp and lq: can be hundreds of thousands

Incremental Compositional Test Generation

• Without incrementality, test generation has to start from scratch even after a small code change
• Incremental compositional test generation
  o As in smart/selective regression testing
  o Reuse summaries still valid in the new program
  o Recompute invalid summaries

Must Summary Checking

• Given a valid must summary for a program and a new version of the program, is the summary still valid for the new version?
• Intraprocedural summaries
  o Locations lp and lq are in the same function f
  o Function f does not return between lp and lq when the summary is generated

Some proposals

• Naïve
  o For each summary, record executed instructions
  o Too expensive: ~100K instructions executed, runtime overhead
• Our proposal
  o Verify statically which summaries are valid in order to reuse them
  o Less precise than recomputing summaries from scratch, but cheaper

Algorithms

1. Static Change Impact Analysis


2. Predicate-Sensitive Change Impact Analysis

3. Must Summary Validity Checking Analysis

Phase 1: Static Change Impact Analysis

• Impact analysis of code changes in the control-flow and call graphs of the program

[Figure: the old program and the new program side by side, each with the locations lp and lq marked]

Modified Instructions and Functions

• Instruction i of a program Prog is modified if:
  o i is changed or deleted in Prog’, or
  o its ordered set of immediate successors has changed
• Function f in a program Prog is modified if f:
  o contains a modified instruction
  o calls a modified function
  o calls an unknown function
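The "calls a modified function" and "calls an unknown function" rules amount to propagating a mark from callees to their callers until nothing changes. A minimal sketch in C, assuming a small adjacency-matrix call graph; the type CallGraph, the constant MAX_FUNCS and the function propagate_to_callers are hypothetical names, not taken from the authors' tool:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_FUNCS 16

typedef struct {
    size_t n;                          /* number of functions */
    bool calls[MAX_FUNCS][MAX_FUNCS];  /* calls[f][g]: f calls g */
} CallGraph;

/* Computes in out[] every function that is in seed[] or that calls,
   directly or transitively, a function in seed[]. */
void propagate_to_callers(const CallGraph *cg,
                          const bool seed[MAX_FUNCS],
                          bool out[MAX_FUNCS]) {
    assert(cg->n <= MAX_FUNCS);
    for (size_t f = 0; f < cg->n; f++)
        out[f] = seed[f];

    /* Iterate to a fixpoint: a caller of a marked function gets marked. */
    bool grew = true;
    while (grew) {
        grew = false;
        for (size_t f = 0; f < cg->n; f++) {
            if (out[f])
                continue;
            for (size_t g = 0; g < cg->n; g++) {
                if (cg->calls[f][g] && out[g]) {
                    out[f] = true;
                    grew = true;
                    break;
                }
            }
        }
    }
}
```

Calling it once with the functions that contain a modified instruction as the seed, and once with the unknown functions as the seed, yields the two impacted sets separately.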

Phase 1: Static Change Impact Analysis

1. Construct call graph for the program
2. Find modified (M) and unknown (U) functions
3. Find indirectly modified (IM) and indirectly unknown (IU) functions
4. Map summaries, construct control-flow graphs
5. Classify summaries as valid or invalid

[Figure: call graph of the program with nodes labelled M, U, IM, IU and S]
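One plausible reading of the last step is that a summary survives Phase 1 only if no impacted instruction lies on any control-flow path from lp to lq. The C sketch below works under that assumption. The Cfg type, the impacted flag (standing for a modified instruction or a call to an impacted function) and the function names are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 32

typedef struct {
    size_t n;                          /* number of CFG nodes */
    bool edge[MAX_NODES][MAX_NODES];   /* edge[a][b]: b is an immediate successor of a */
    bool impacted[MAX_NODES];          /* node is modified or calls an impacted function */
} Cfg;

/* Depth-first marking of the nodes reachable from `from`, following
   edges forward, or backward when `forward` is false. */
static void mark(const Cfg *g, size_t from, bool forward, bool seen[MAX_NODES]) {
    if (seen[from])
        return;
    seen[from] = true;
    for (size_t k = 0; k < g->n; k++) {
        bool linked = forward ? g->edge[from][k] : g->edge[k][from];
        if (linked)
            mark(g, k, forward, seen);
    }
}

/* A node lies on some path from lp to lq exactly when it is reachable
   from lp and lq is reachable from it. The summary survives if no such
   node is impacted. */
bool summary_survives_phase1(const Cfg *g, size_t lp, size_t lq) {
    assert(g->n <= MAX_NODES && lp < g->n && lq < g->n);
    bool from_lp[MAX_NODES] = {false};
    bool to_lq[MAX_NODES] = {false};
    mark(g, lp, true, from_lp);
    mark(g, lq, false, to_lq);
    for (size_t k = 0; k < g->n; k++)
        if (from_lp[k] && to_lq[k] && g->impacted[k])
            return false;
    return true;
}
```

The check is purely structural: it ignores P and Q, which is why it can reject summaries whose impacted branch is in fact unreachable under P.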


Phase 2: Predicate-Sensitive Change Impact Analysis

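• Exploit the predicates P and Q in a summary

[Figure: control-flow graph of the old program between lp and lq, branching on x > 0 and on a test of y, with assignments w = w + 1, w = 0 and w = 1; summary with P: x > 0 ∧ y < 10 and Q: w = 0; this summary is invalidated by Phase 1]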

Old program, with summary P: x > 0 ∧ y < 10 and Q: w = 0:

    void foo() {
        ...
        // location lp
        if (x > 0) {
            if (y == 10) w++;   // MODIFIED
            else w = 0;
        }
        else {
            w = 1;              // MODIFIED
        }
        // location lq
        ...
        return;
    }

Instrumented old program (P: x > 0 ∧ y < 10, Q: w = 0):

    void foo() {
        goto lp;
        ...
        // location lp
        assume P;
        modified = false;
        if (x > 0) {
            if (y == 10) { modified = true; w++; }
            else w = 0;
        }
        else { modified = true; w = 1; }
        // location lq
        assert(Q ⇒ ¬modified);
        ...
        return;
    }
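To see why the instrumented assertion holds for this summary, the fragment can be run over a bounded range of inputs. The C sketch below replaces the program verifier with plain enumeration, which is only a stand-in and proves nothing beyond the chosen bound. The names run_ok and phase2_validates, and the flag p_constrains_y (used to show what happens if P lacked the conjunct y < 10), are hypothetical.

```c
#include <stdbool.h>

/* One run of the instrumented fragment from location lp to location lq.
   Returns true when the run is blocked by `assume P` or when the final
   assertion Q => !modified holds. */
static bool run_ok(int x, int y, int w, bool p_constrains_y) {
    /* assume P: x > 0 && y < 10 (the second conjunct is optional here) */
    bool p = (x > 0) && (!p_constrains_y || y < 10);
    if (!p)
        return true;

    bool modified = false;
    if (x > 0) {
        if (y == 10) { modified = true; w++; }
        else w = 0;
    } else {
        modified = true;
        w = 1;
    }

    bool q = (w == 0);           /* Q: w = 0 */
    return !q || !modified;      /* assert(Q => !modified) */
}

/* Enumerates x, y and w in [-bound, bound]; true if no run violates
   the assertion. */
bool phase2_validates(int bound, bool p_constrains_y) {
    for (int x = -bound; x <= bound; x++)
        for (int y = -bound; y <= bound; y++)
            for (int w = -bound; w <= bound; w++)
                if (!run_ok(x, y, w, p_constrains_y))
                    return false;
    return true;
}
```

Under the full precondition the modified branches are unreachable, so phase2_validates(12, true) holds. Dropping y < 10 from P makes the run x = 1, y = 10, w = -1 reach lq with Q true after executing modified code, so phase2_validates(12, false) fails.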

• Check that the assertion in the instrumented code does not fail for any possible input
• Verification-condition-based program verifier
  o Create a logic formula from the program with assertions
  o Check the formula's validity using a theorem prover
  o If valid, the assertion does not fail in any execution

Phase 3: Must Summary Validity Checking

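• Check must summary validity against some code, independently of code changes

[Figure: control-flow graphs of the old and new programs between lp and lq, branching on x < 0 and y < 0, with assignments r = 1, r = 0 and w = 1 in the old program and r = 4 replacing r = 0 in the new program; summary with P: x < 0 and Q: r ≥ 0; this summary is invalidated by Phase 1 and Phase 2]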

New program, with summary P: x < 0 and Q: r ≥ 0:

    void bar() {
        ...
        // location lp
        if (x < 0) {
            if (y < 0) r = 1;
            else {
                r = 4;   // r = 0 in old code
            }
        }
        // location lq
        ...
        return;
    }

Instrumented new program (P: x < 0, Q: r ≥ 0):

    void bar() {
        reach_lq = false;
        goto lp;
        ...
        // location lp
        assume P;
        if (x < 0) {
            if (y < 0) r = 1;
            else {
                r = 4;   // r = 0 in old code
            }
        }
        // location lq
        assert(Q);
        reach_lq = true;
        ...
        assert(reach_lq);
        return;
    }
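The Phase 3 check can likewise be mimicked by enumeration over a bounded input range, again only as a stand-in for a program verifier. In the C sketch below the names run_ok and phase3_validates are hypothetical, and the parameter r_else stands for the value assigned in the changed branch (4 in the new program), so that a change that breaks the summary can also be tried.

```c
#include <stdbool.h>

/* One run of the instrumented new code from location lp. Returns true
   when the run is blocked by `assume P` or when both assertions hold. */
static bool run_ok(int x, int y, int r_else) {
    int r = -1;              /* value of r before lp; chosen to violate Q */
    bool reach_lq = false;

    if (!(x < 0))            /* assume P: x < 0 */
        return true;

    if (x < 0) {
        if (y < 0) r = 1;
        else r = r_else;     /* r = 4 in the new program */
    }

    if (!(r >= 0))           /* assert(Q): r >= 0 */
        return false;
    reach_lq = true;
    return reach_lq;         /* assert(reach_lq) */
}

/* Enumerates x and y in [-bound, bound]; true if every run passes. */
bool phase3_validates(int bound, int r_else) {
    for (int x = -bound; x <= bound; x++)
        for (int y = -bound; y <= bound; y++)
            if (!run_ok(x, y, r_else))
                return false;
    return true;
}
```

With the new program's assignment the summary is validated, since phase3_validates(10, 4) holds even though the code changed. A hypothetical change to a negative value, as in phase3_validates(10, -4), would invalidate it.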

• Check that the assertions hold in the instrumented program for all possible inputs

Result

• Validated summaries can be reused
  o Because of soundness
• Invalidated summaries are discarded and need to be recomputed
  o New tests are generated to cover their preconditions
• Algorithms can be used in isolation or in a pipeline

Experimental Results


Implementation Details

• Inputs: old DLLs, new DLLs, and summaries produced by SAGE
• Vulcan: library to statically analyze Windows binaries
• Map summaries, find modified instructions and functions (C++)
• Phase 1 (Change Impact), Phase 2 (Predicate Sensitive), Phase 3 (Validity Checking)
  o Used in pipeline or isolation
• Output: valid/invalid summaries

Implementation Details

• Translator from x86 to BoogiePL
  o Inputs: procedure (x86, read through Vulcan) and summary ⟨lp, P, lq, Q⟩
  o Sound translation
  o Output: instrumented BPL file (Phase 2 or Phase 3), checked with Boogie/Z3

Benchmarks

• Image parsers embedded in Windows
  o ANI, GIF and JPEG
• Ran SAGE to generate summaries (small sample)
  o 286 for ANI, 288 for GIF and 517 for JPEG
• Identified the DLLs involved
  o 3 for ANI, 4 for GIF and 8 for JPEG
• Compared old version against a randomly picked newer version
  o Delta ~1 to 3 years

Difference Between Program Versions

[Figure: bar chart, number of functions per benchmark: ANI 6978, GIF 13897, JPEG 20357]

• Modified functions: 3% - 10%
• Indirectly modified functions: 30% - 45%
• Unknown functions: 27% - 37%
• Indirectly unknown functions: 60% - 74%

Applying Phases in Isolation

Number of validated summaries when each phase is applied on its own:

Benchmark (summaries)   Phase 1      Phase 2      Phase 3      Total validated
ANI (286)               167 (58%)    244 (85%)    86 (30%)     256/286 (90%)
GIF (288)               198 (69%)    264 (92%)    90 (31%)     274/288 (95%)
JPEG (517)              317 (61%)    487 (94%)    173 (33%)    501/517 (97%)

Phase 1: Change Impact. Phase 2: Predicate Sensitive. Phase 3: Validity Checking.

Applying Phases in Pipeline Fashion: Phase 1 → Phase 2 → Phase 3

Number of summaries validated by each phase, where each phase only receives the summaries left unvalidated by the previous one:

Benchmark (summaries)   Phase 1      Phase 2      Phase 3     Total validated
ANI (286)               167 (58%)    77 (27%)     12 (4%)     256/286 (90%)
GIF (288)               198 (69%)    73 (25%)     3 (1%)      274/288 (95%)
JPEG (517)              317 (61%)    179 (35%)    5 (1%)      501/517 (97%)

Phase 1: Change Impact. Phase 2: Predicate Sensitive. Phase 3: Validity Checking.

Running Time (Isolation)

Running time in minutes when each phase is applied on its own:

Benchmark   Phase 1   Phase 2   Phase 3
ANI         5         37        42
GIF         8         23        35
JPEG        12        31        37

Phase 1: Change Impact. Phase 2: Predicate Sensitive. Phase 3: Validity Checking.

Running Time (Pipeline): Phase 1 → Phase 2 → Phase 3

[Figure: stacked bar chart of pipeline running time in minutes for ANI, GIF and JPEG, broken down into mapping etc., Phase 1, Phase 2 and Phase 3; the bar totals are labelled 43 min, 28 min and 41 min]

Preliminary results show that statically validating must summaries is up to 20 times faster than recomputing them!

Summary

• Formulated the problem of statically validating must summaries
• Demonstrated the effectiveness of static must summary checking
  o Validated hundreds of must summaries in minutes
• Described three approaches for validating must summaries
• Presented a preliminary evaluation on three large Windows image parsers

Questions?
