Penumbra: Automatically Identifying Failure-Relevant Inputs (ISSTA 2009)
Transcript of Penumbra: Automatically Identifying Failure-Relevant Inputs (ISSTA 2009)
Penumbra: Automatically Identifying Failure-Relevant Inputs

James Clause and Alessandro Orso
College of Computing, Georgia Institute of Technology

Supported in part by NSF awards CCF-0725202 and CCF-0541080 to Georgia Tech
Automated Debugging

Code-centric techniques:
• Gupta and colleagues ’05
• Jones and colleagues ’02
• Korel and Laski ’88
• Liblit and colleagues ’05
• Nainar and colleagues ’07
• Renieris and Reiss ’03
• Seward and Nethercote ’05
• Tucek and colleagues ’07
• Weiser ’81
• Zhang and colleagues ’05
• Zhang and colleagues ’06
• ...
What about inputs which cause the failure?
Data-centric Techniques

• Chan and Lakhotia ’98
• Zeller and Hildebrandt ’02 (Delta Debugging)
• Misherghi and Su ’06

Delta Debugging requires:
1. Multiple executions
2. Large amounts of manual effort (oracle creation, setup)

Penumbra requires:
1. A single execution
2. Reduced manual effort
with comparable performance.
Intuition and Terminology

• Failure-revealing input vector
• Failure-relevant subset (inputs that are useful for investigating the failure)

Penumbra approximates failure-relevant subsets by identifying inputs that reach the failure along program dependencies.
Motivating Example

fileinfo:

int main(int argc, char **argv) {
 1.  int verbose, i, total_size = 0;
 2.  struct stat buf;
 3.  verbose = atoi(argv[1]);
 4.  for(i = 2; i < argc; i++) {
 5.    int fd = open(argv[i], O_RDONLY);
 6.    fstat(fd, &buf);
 7.    char *out = malloc(60);
 8.    sprintf(out, "%d", buf.st_size);
 9.    if(verbose) {
10.      char *pview = malloc(51);
11.      read(fd, pview, 50);
12.      pview[50] = '\0';
13.      strcat(out, pview);
14.    }
15.    printf("%s: %s\n", argv[i], out);
16.    total_size += buf.st_size;
17.  }
18.  printf("total: %d\n", total_size);
}

Inputs (the input vector):
• Command line arguments (flag, list of file names)
• File statistics, for each file (size, last modified date, ...)
• File contents, for each file (first 50 characters)

The failure: out overflows at the strcat on line 13 when buf.st_size ≥ 1GB, verbose is true, and 50 characters are read.

Observations:
1. There are many more inputs than lines of code.
2. Understanding the failure requires tracing interactions between inputs from multiple sources.
3. Only a small percentage of all inputs are relevant for the failure.
Penumbra Overview

Example scenario: fileinfo is run on three files, Foo (512B), Bar (1KB), and Baz (1.5GB), and crashes after printing:
  foo: 512 ...
  bar: 1024 ...
  baz: 150...
  total: 150...

Relevant context:
1. When the failure occurs.
2. Which data are involved in the failure.
Here, the relevant context is statement 13 (strcat(out, pview);). In general, it is chosen using traditional debugging methods.

The technique has three steps:
1. Taint inputs: each input is assigned a taint mark (0 through 9 in the example) as it enters the application.
2. Propagate taint marks: marks flow through the execution along program dependencies.
3. Identify relevant inputs: the marks that reach the relevant context (0, 8, and 9 in the example) identify the failure-relevant inputs, which correspond to the failure conditions: verbose is true, 50 characters are read, and buf.st_size ≥ 1GB.
Outline

• Penumbra approach
  1. Tainting inputs
  2. Propagating taint marks
  3. Identifying relevant inputs
• Evaluation
• Conclusions and future work
1: Tainting Inputs

Assign a taint mark to each input as it enters the application. When a taint mark is assigned to an input, log the input’s value and where the input was read from.

Three granularities:
• Per-byte: assign a unique taint mark to each byte (e.g., data read from files). Precise identification, but unnecessarily expensive.
• Per-entity: assign the same taint mark to related bytes (argv, argc, fstat, ...). Maintains per-byte precision and increases scalability.
• Domain specific: assign taint marks based on user-provided information. Maintains per-byte precision and further increases scalability.
2: Propagating Taint Marks

Two propagation policies:
• Data-flow propagation (DF): taint marks flow along only data dependencies.
• Data- and control-flow propagation (DF + CF): taint marks flow along both data and control dependencies.

Example: in C = A + B;, if A carries mark 1 and B carries mark 2, then C receives marks {1, 2} under both policies. If the statement is guarded, as in if(X) { C = A + B; }, and X carries mark 3, then C receives {1, 2} under DF but {1, 2, 3} under DF + CF.

The effectiveness of each option depends on the particular failure.
3: Identifying Relevant Inputs

1. The relevant context indicates which data are involved in the considered failure.
2. Identify which taint marks are associated with the data indicated by the relevant context.
3. Use the recorded logs to reconstruct the inputs identified by those taint marks.
Prototype Implementation

The prototype has two components:
• Trace generator: takes the input vector, the executable, and the relevant context, and produces a trace. Implemented using Dytan, a generic x86 tainting framework developed in previous work [Clause and Orso 2007].
• Trace processor: consumes the trace and produces the failure-relevant input subsets (DF and DF + CF).
Evaluation

Study 1: Effectiveness for debugging real failures
Study 2: Comparison with Delta Debugging

Subjects:

Application      KLoC    Fault location
bc 1.06          10.5    more_arrays:177
gzip 1.24         6.3    get_istat:828
ncompress 4.24    1.4    comprexx:896
pine 4.44       239.1    rfc822_cat:260
squid 2.3        69.9    ftpBuildTitleUrl:1024

We selected a failure-revealing input vector for each subject.
Data Generation

Penumbra:
• Setup (manual): choose a relevant context.
  - Location: the statement where the failure occurs.
  - Data: any data read by that statement.
• Execution (automated): use the prototype tool to identify failure-relevant inputs (DF and DF + CF).

Delta Debugging:
• Setup (manual): create an automated oracle.
  - Use gdb to inspect the stack trace and program data.
  - One-second timeout to prevent incorrect results.
• Execution (automated): use the standard Delta Debugging implementation to minimize inputs.
Study 1: Effectiveness

Is the information that Penumbra provides helpful for debugging real failures?
Study 1 Results: gzip & ncompress

Crash when a file name is longer than 1,024 characters.

[Figure: ./gzip invoked on files foo, bar, and a long file name; each file contributes its contents and attributes to the input vector.]

# Inputs: 10,000,056
# Relevant (DF): 1
# Relevant (DF + CF): 3
Study 1 Results: pine

Crash when a “from” field contains 22 or more double-quote characters.

...
From clause@boar Tue Feb 20 11:49:53 2007
Return-Path: <clause@boar>
X-Original-To: clause
Delivered-To: clause@boar
Received: by boar (Postfix, from userid 1000)
    id 88EDD1724523; Tue, 20 Feb 2007 11:49:53 -0500 (EST)
To: clause@boar
Subject: test
Message-Id: <20070220164953.88EDD1724523@boar>
Date: Tue, 20 Feb 2007 11:49:53 -0500 (EST)
From: "\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\""@host.fubar
X-IMAPbase: 1172160370 390
Status: O
X-Status:
X-Keywords:
X-UID: 5
...

# Inputs: 15,103,766
# Relevant (DF): 26 (the quote and escape characters in the “from” field)
# Relevant (DF + CF): 15,100,344
Study 1: Conclusions

1. Data-flow propagation is always effective; data- and control-flow propagation is sometimes effective.
   ➡ Use data-flow first; then, if necessary, use control-flow.
2. The inputs identified by Penumbra correspond to the failure conditions.
   ➡ Our technique is effective in assisting the debugging of real failures.
Study 2: Comparison with Delta Debugging

RQ1: How much manual effort does each technique require?
RQ2: How long does it take to fix a considered failure given the information provided by each technique?
RQ1: Manual Effort

Use setup time as a proxy for manual (developer) effort.

[Bar chart: setup time in seconds for Penumbra vs. Delta Debugging on gzip, ncompress, bc, pine, and squid. Delta Debugging setup times range from 1,800 s up to 12,600 s; Penumbra setup times are far smaller on every subject.]

Penumbra requires considerably less setup time than Delta Debugging (although more time overall for gzip and ncompress).
RQ2: Debugging Effort

Use the number of relevant inputs as a proxy for debugging effort.

Subject      Penumbra (DF)   Penumbra (DF + CF)   Delta Debugging
bc                 209                  743                  285
gzip                 1                    3                    1
ncompress            1                    3                    1
pine                26           15,100,344                   90
squid               89                2,056                    —

• Penumbra (DF) is comparable to (slightly better than) Delta Debugging.
• Penumbra (DF + CF) is likely less effective for bc, pine, and squid.
Conclusions & Future Work

• Novel technique for identifying failure-relevant inputs.
• Overcomes limitations of existing approaches:
  • Single execution
  • Minimal manual effort
  • Comparable effectiveness
• Future work: combine Penumbra with existing code-centric techniques.