KLEE: Effective Testing of Systems Programs
Cristian Cadar
Joint work with Daniel Dunbar and Dawson Engler
April 16th, 2009

Writing Systems Code Is Hard
• Code complexity
  – Tricky control flow
  – Complex dependencies
  – Abusive use of pointer operations
• Environmental dependencies
  – Code has to anticipate all possible interactions
  – Including malicious ones

KLEE
• Based on symbolic execution and constraint solving techniques
• Automatically generates high coverage test suites
  – Over 90% on average on ~160 user-level apps
• Finds deep bugs in complex systems programs
  – Including higher-level correctness ones
[OSDI 2008, Best Paper Award]

Toy Example
int bad_abs(int x)
{
  if (x < 0)
    return -x;
  if (x == 1234)
    return -x;
  return x;
}
Explored paths (one generated test case per path: test1.out, test2.out, test3.out):
• x < 0 is TRUE: return -x; example input x = -2
• x < 0 is FALSE (x ≥ 0) and x == 1234 is TRUE: return -x; example input x = 1234
• x < 0 is FALSE (x ≥ 0) and x == 1234 is FALSE (x ≠ 1234): return x; example input x = 3
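A minimal sketch of how the three generated inputs could be replayed against the function outside KLEE. Only bad_abs and the inputs -2, 1234 and 3 come from the slide; the reference_abs oracle and the replay_slide_inputs helper are assumptions made for illustration.

```c
#include <stddef.h>

/* Function under test, as on the slide: the x == 1234 branch is the bug. */
int bad_abs(int x)
{
    if (x < 0)
        return -x;
    if (x == 1234)
        return -x;
    return x;
}

/* Hypothetical oracle, used only for this illustration. */
static int reference_abs(int x)
{
    return x < 0 ? -x : x;
}

/* Replays the three inputs from the slide and returns how many of them
 * make bad_abs disagree with the oracle. */
int replay_slide_inputs(void)
{
    static const int inputs[] = { -2, 1234, 3 };
    int failures = 0;
    for (size_t i = 0; i < sizeof inputs / sizeof inputs[0]; i++) {
        if (bad_abs(inputs[i]) != reference_abs(inputs[i]))
            failures++;
    }
    return failures;
}
```

Each input drives execution down a different path, so the single failing input (x = 1234) is enough to expose the faulty branch.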

KLEE Architecture
[Diagram: C code is compiled by LLVM to LLVM bytecode, which KLEE executes on top of a symbolic environment. KLEE sends path constraints (e.g. x ≥ 0, x ≠ 1234) to the constraint solver (STP), which returns a solution (e.g. x = 3). Generated test inputs: x = 3, x = -2, x = 1234.]
Outline
• Motivation
• Example and Basic Architecture
• Scalability Challenges
• Experimental Evaluation

Three Big Challenges
• Motivation
• Example and Basic Architecture
• Scalability Challenges
  – Exponential number of paths
  – Expensive constraint solving
  – Interaction with environment
• Experimental Evaluation

Exponential Search Space
Naïve exploration can easily get “stuck”
Use search heuristics:
• Coverage-optimized search
  – Select path closest to an uncovered instruction
  – Favor paths that recently hit new code
• Random path search
  – See [KLEE – OSDI’08]
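A rough sketch of what a coverage-optimized choice between pending paths might look like. The slide does not say how the two criteria are combined, so this deterministic tie-break is an assumption, as are the state_info_t fields and the function names; KLEE's real searcher is not reproduced here.

```c
#include <stddef.h>

/* Hypothetical per-state bookkeeping; field names are illustrative. */
typedef struct {
    unsigned dist_to_uncovered;   /* distance to nearest uncovered instruction */
    unsigned steps_since_new_cov; /* instructions run since last new coverage */
} state_info_t;

/* Returns the index of the preferred state, or -1 if there are none.
 * Prefers the smallest distance to uncovered code; ties go to the state
 * that covered new code most recently. */
int pick_state(const state_info_t *states, size_t n)
{
    if (n == 0)
        return -1;
    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        const state_info_t *s = &states[i];
        const state_info_t *b = &states[best];
        if (s->dist_to_uncovered < b->dist_to_uncovered ||
            (s->dist_to_uncovered == b->dist_to_uncovered &&
             s->steps_since_new_cov < b->steps_since_new_cov))
            best = i;
    }
    return (int)best;
}

/* Three pending states: the last two are equally close to uncovered code,
 * and the last one hit new code more recently. */
int pick_state_example(void)
{
    const state_info_t states[] = { { 5, 1 }, { 2, 9 }, { 2, 3 } };
    return pick_state(states, 3);
}
```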

Constraint Solving
• Dominates runtime
  – Inherently expensive (NP-complete)
  – Invoked at every branch
• Two simple and effective optimizations
  – Eliminating irrelevant constraints
  – Caching solutions
• Dramatic speedup on our benchmarks

Eliminating Irrelevant Constraints
• In practice, each branch usually depends on a small number of variables
…
if (x < 10) {
  …
}
Path constraints so far: x + y > 10, z & -z = z
Branch query: x < 10 ?
Only x + y > 10 shares a variable with the query; z & -z = z is irrelevant and need not be sent to the solver.
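One way to picture the elimination step is as a reachability computation over shared variables. The sketch below is an illustration under assumptions: constraints are summarised as bitmasks of the variables they mention, and the names varset_t, relevant_constraints and elimination_example are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Each constraint is summarised by a bitmask of the variables it mentions
 * (bit 0 = x, bit 1 = y, bit 2 = z in the slide's example). */
typedef uint32_t varset_t;

/* Sets keep[i] = 1 for every constraint connected to the query through
 * shared variables (directly or transitively); returns how many were kept. */
size_t relevant_constraints(const varset_t *constraints, size_t n,
                            varset_t query_vars, int *keep)
{
    size_t kept = 0;
    int changed = 1;
    for (size_t i = 0; i < n; i++)
        keep[i] = 0;
    while (changed) {
        changed = 0;
        for (size_t i = 0; i < n; i++) {
            if (!keep[i] && (constraints[i] & query_vars)) {
                keep[i] = 1;
                query_vars |= constraints[i]; /* pull in its variables too */
                kept++;
                changed = 1;
            }
        }
    }
    return kept;
}

/* Slide example: constraints {x + y > 10, z & -z = z}, query on x.
 * Returns a bitmask of the kept constraint indices. */
unsigned elimination_example(void)
{
    const varset_t cs[2] = { 0x3 /* x, y */, 0x4 /* z */ };
    int keep[2];
    relevant_constraints(cs, 2, 0x1 /* x */, keep);
    return (unsigned)keep[0] | ((unsigned)keep[1] << 1);
}
```

For the slide's example only the first constraint is kept, so the solver sees x + y > 10 together with the query and never sees the constraint on z.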

Caching Solutions
• Static set of branches: lots of similar constraint sets
Cached: { 2 * y < 100, x > 3, x + y > 10 } has solution x = 5, y = 15
• Eliminating constraints cannot invalidate solution
  – { 2 * y < 100, x + y > 10 }: x = 5, y = 15 still works
• Adding constraints often does not invalidate solution
  – { 2 * y < 100, x > 3, x + y > 10, x < 10 }: x = 5, y = 15 still works
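The reuse check itself is cheap: evaluate the cached assignment against the new constraint set before calling the solver. The sketch below shows only that check, under assumed names (assignment_t, constraint_fn, cached_solution_holds); it does not model how KLEE stores or looks up cached sets.

```c
#include <stddef.h>

/* A concrete assignment to the two variables used on the slide. */
typedef struct { int x, y; } assignment_t;

/* A constraint is modelled as a predicate over an assignment. */
typedef int (*constraint_fn)(const assignment_t *);

static int c_2y_lt_100(const assignment_t *a) { return 2 * a->y < 100; }
static int c_x_gt_3(const assignment_t *a)    { return a->x > 3; }
static int c_xy_gt_10(const assignment_t *a)  { return a->x + a->y > 10; }
static int c_x_lt_10(const assignment_t *a)   { return a->x < 10; }

/* Returns 1 if the cached assignment satisfies every constraint in the set,
 * in which case the solver call can be skipped. */
int cached_solution_holds(const assignment_t *cached,
                          const constraint_fn *set, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (!set[i](cached))
            return 0;
    return 1;
}

/* Checks the slide's smaller and larger sets against the cached solution
 * x = 5, y = 15. Returns 1 only if it is reusable for both. */
int caching_example(void)
{
    const assignment_t cached = { 5, 15 };
    const constraint_fn subset[] = { c_2y_lt_100, c_xy_gt_10 };
    const constraint_fn superset[] = { c_2y_lt_100, c_x_gt_3,
                                       c_xy_gt_10, c_x_lt_10 };
    return cached_solution_holds(&cached, subset, 2) &&
           cached_solution_holds(&cached, superset, 4);
}

/* An added constraint can also invalidate a cached solution:
 * x = 2, y = 15 fails x > 3, so the solver would have to be called. */
int caching_counterexample(void)
{
    const assignment_t other = { 2, 15 };
    const constraint_fn set[] = { c_x_gt_3 };
    return cached_solution_holds(&other, set, 1);
}
```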

Dramatic Speedup
Aggregated data over 73 applications
[Figure: time (s) vs. executed instructions (normalized) for four configurations: Base, Irrelevant Constraint Elimination, Caching, Irrelevant Constraint Elimination + Caching.]

Environment: Calling Out Into OS
int fd = open("t.txt", O_RDONLY);
• If all arguments are concrete, forward to OS
int fd = open(sym_str, O_RDONLY);
• Otherwise, provide models that can handle symbolic files
  – Goal is to explore all possible legal interactions with the environment

Environmental Modeling
// actual implementation: ~50 LOC
ssize_t read(int fd, void *buf, size_t count) {
  exe_file_t *f = get_file(fd);
  …
  memcpy(buf, f->contents + f->off, count);
  f->off += count;
  …
}
• Plain C code run by KLEE
  – Users can extend/replace environment w/o any knowledge of KLEE internals
• Currently: effective support for symbolic command line arguments, files, links, pipes, ttys, environment vars
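To make the idea of a model written in plain C concrete, here is a self-contained sketch of a read-like function over an in-memory file. The type model_file_t and the functions model_read and model_read_example are hypothetical names chosen for this illustration; they are not KLEE's exe_file_t or its actual read model, and the end-of-file clamp is an assumption about what the elided parts might cover.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory file used by the model. */
typedef struct {
    const char *contents; /* file bytes (could be symbolic under KLEE) */
    size_t size;          /* total length in bytes */
    size_t off;           /* current read offset */
} model_file_t;

/* Model of read(): copies up to count bytes and advances the offset.
 * Returns the number of bytes copied, 0 at end of file. */
long model_read(model_file_t *f, void *buf, size_t count)
{
    size_t remaining = f->size - f->off;
    if (count > remaining)
        count = remaining; /* never read past the end of the file */
    memcpy(buf, f->contents + f->off, count);
    f->off += count;
    return (long)count;
}

/* Reads the 5-byte file "hello" in chunks of 3 and packs the three
 * return values into one decimal number. */
int model_read_example(void)
{
    model_file_t f = { "hello", 5, 0 };
    char buf[8];
    long a = model_read(&f, buf, 3);
    long b = model_read(&f, buf, 3);
    long c = model_read(&f, buf, 3);
    return (int)(a * 100 + b * 10 + c);
}
```

Because the model is ordinary C, the same code works whether the file contents are concrete bytes or symbolic data supplied by the tool.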

Does KLEE work?
• Motivation
• Example and Basic Architecture
• Scalability Challenges
• Evaluation
  – Coverage results
  – Bug finding
  – Crosschecking

GNU Coreutils Suite
• Core user-level apps installed on many UNIX systems
• 89 stand-alone (i.e. excluding wrappers) apps (v6.10)
  – File system management: ls, mkdir, chmod, etc.
  – Management of system properties: hostname, printenv, etc.
  – Text file processing: sort, wc, od, etc.
  – …
Variety of functions, different authors, intensive interaction with environment
Heavily tested, mature code

Coreutils ELOC (incl. called lib)
[Figure: histogram of the number of applications by executable lines of code (ELOC).]

Methodology
• Fully automatic runs
• Run KLEE one hour per utility, generate test cases
• Run test cases on uninstrumented version of utility
• Measure line coverage using gcov
  – Coverage measurements not inflated by potential bugs in our tool

High Line Coverage (Coreutils, non-lib, 1h/utility = 89 h)
Overall: 84%, Average 91%, Median 95%
16 at 100%
[Figure: coverage (ELOC %) per application; apps sorted by KLEE coverage.]

Beats 15 Years of Manual Testing
Avg/utility: KLEE 91%, Manual 68%
Manual tests also check correctness
[Figure: KLEE coverage minus manual coverage per application; apps sorted by KLEE coverage minus manual coverage.]

Busybox Suite for Embedded Devices
Overall: 91%, Average 94%, Median 98%
31 at 100%
[Figure: coverage (ELOC %) per application; apps sorted by KLEE coverage.]

Busybox – KLEE vs. Manual
Avg/utility: KLEE 94%, Manual 44%
[Figure: KLEE coverage minus manual coverage per application; apps sorted by KLEE coverage minus manual coverage.]

GNU Coreutils Bugs
• Ten crash bugs
  – More crash bugs than in approximately the last three years combined
  – KLEE generates actual command lines exposing crashes

Ten command lines of death
md5sum -c t1.txt
mkdir -Z a b
mkfifo -Z a b
mknod -Z a b p
seq -f %0 1
pr -e t2.txt
tac -r t3.txt t3.txt
paste -d\\ abcdefghijklmnopqrstuvwxyz
ptx -F\\ abcdefghijklmnopqrstuvwxyz
ptx x t4.txt
File contents:
t1.txt: \t \tMD5(
t2.txt: \b\b\b\b\b\b\b\t
t3.txt: \n
t4.txt: A

Finding Correctness Bugs
• KLEE can prove asserts on a per-path basis
  – Constraints have no approximations
  – An assert is just a branch, and KLEE proves feasibility/infeasibility of each branch it reaches
  – If KLEE determines infeasibility of the false side of an assert, the assert was proven on the current path

Crosschecking
Assume f(x) and f’(x) implement the same interface
1. Make input x symbolic
2. Run KLEE on assert(f(x) == f’(x))
3. For each explored path:
   a) KLEE terminates w/o error: paths are equivalent
   b) KLEE terminates w/ error: mismatch found
Coreutils vs. Busybox:
1. UNIX utilities should conform to IEEE Std.1003.1
2. Crosschecked pairs of Coreutils and Busybox apps
3. Verified paths, found mismatches
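The crosschecking recipe can be illustrated with two small implementations of the same interface. The functions mod_ref and mod_opt and the exhaustive loop are assumptions made for this sketch: the loop enumerates a small input domain as a stand-in for symbolic execution, whereas under KLEE the inputs would be marked symbolic and the comparison written as a single assert.

```c
/* Reference implementation: plain remainder (requires y != 0). */
unsigned mod_ref(unsigned x, unsigned y)
{
    return x % y;
}

/* Alternative implementation: uses a mask when y is a power of two. */
unsigned mod_opt(unsigned x, unsigned y)
{
    if ((y & (y - 1u)) == 0u) /* y is a power of two (y != 0 assumed) */
        return x & (y - 1u);
    return x % y;
}

/* Stand-in for symbolic execution: enumerates a small input domain and
 * counts the inputs on which the two implementations disagree. */
unsigned count_mismatches(void)
{
    unsigned mismatches = 0;
    for (unsigned x = 0; x < 256; x++)
        for (unsigned y = 1; y < 256; y++) /* y = 0 excluded: undefined */
            if (mod_ref(x, y) != mod_opt(x, y))
                mismatches++;
    return mismatches;
}
```

A result of zero corresponds to case (a) on every checked input; any nonzero count would correspond to case (b), a mismatch with a concrete witness.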

Mismatches Found

| Input | Busybox | Coreutils |
|---|---|---|
| tee "" <t1.txt | [infinite loop] | [terminates] |
| tee - | [copies once to stdout] | [copies twice] |
| comm t1.txt t2.txt | [doesn’t show diff] | [shows diff] |
| cksum / | "4294967295 0 /" | "/: Is a directory" |
| split / | "/: Is a directory" | |
| tr | [duplicates input] | "missing operand" |
| [ 0 "<" 1 ] | | "binary op. expected" |
| tail -2l | [rejects] | [accepts] |
| unexpand -f | [accepts] | [rejects] |
| split - | [rejects] | [accepts] |

t1.txt: a, t2.txt: b (no newlines!)
Related Work
Very active area of research. E.g.:
• EGT / EXE / KLEE [Stanford]
• DART [Bell Labs]
• CUTE [UIUC]
• SAGE, Pex [MSR Redmond]
• Vigilante [MSR Cambridge]
• BitScope [Berkeley/CMU]
• CatchConv [Berkeley]
• JPF [NASA Ames]
KLEE:
  – Hundred distinct benchmarks
  – Extensive coverage numbers
  – Symbolic crosschecking
  – Environment support

KLEE: Effective Testing of Systems Programs
• KLEE can effectively:
  – Generate high coverage test suites
    • Over 90% on average on ~160 user-level applications
  – Find deep bugs in complex software
    • Including higher-level correctness bugs, via crosschecking