Using Likely Invariants For Automated Software Fault Localization Swarup Kumar Sahoo John Criswell...
-
Upload
kolton-wynter -
Category
Documents
-
view
224 -
download
0
Transcript of Using Likely Invariants For Automated Software Fault Localization Swarup Kumar Sahoo John Criswell...
Using Likely Invariants For Automated Software Fault Localization
Swarup Kumar Sahoo
John Criswell
Chase Geigle
Vikram Adve
1
Department of Computer ScienceUniversity of Illinois at Urbana-Champaign
Motivation - 1
2
Software bugs cost ~$59.5 billion
annually (about 0.6% of the GDP)
NIST Report
Windows 2000, 35M LOC, 63,000 known bugs at release time, 2 per 1000 lines [Mary Jo Foley]
In 2006, a Mozilla developer admitted that everyday, almost 300 bugs appear in their Bugzilla [Anvik et.al.]
Motivation - 2
3
• Cost of fault localization increases exponentially with life cycle
• Bug fix cost very high during operational/maintenance phase
• Important to quickly to fix the bugs before the appl ships
(Source: Barry Boehm, “Equity Keynote Address” 2007 &Stefan Priebsch in Advanced OOP and Design Pattern, Codeworks 2009)
Motivation - 3
• Debugging - process of eliminating a software failure
• Automatic fault localization – Automatically identify the root cause responsible for a failure
• Automatic fault localization can reduce dev cost and time
4
Reproduce failure
Locate and understand root
cause
Fix Root Cause
Goal
• Need an automated system to detect root cause of bugs
Efficient, scalable and report few false positives
5
Bug Localization
Tool
.C
Program
Faulty Input
Contributions
• Novel diagnosis mechanism for automated bug localization– Likely invariants using auto generated inputs "close to" failing input
– First to combine invariants-approach with dynamic slicing in s/w
– Two novel heuristics for reducing false positives
• Used 8 bugs in Squid, Apache, MySQL for evaluation– Tool provides only 5 to 17 program expressions as root cause
6
Diagnosis with Invariants
• We use likely range invs to find potential root causes
• Range of values computed by program insts in correct runs– Violated invariants give us a set of candidate root cause(s)
• Invariants on load values, store values, return values.
8
Value Type Example Instructions Example Invariants
Return return int %weekday 0 <= %weekday <= 6Load %value = load int* %p 0 <= %valueStore store int %val, int* %q 100 <= %val <= 100
Key Insights
• Tighter Likely invariants:– Compact (not precise) way to summarize & compare memory states
– Efficiently isolates initial candidates of root causes to a small set
• Novel way to generate invariants – Automatically generated good inputs "close to" failing input
– Few close good inputs to train invariants
Much tighter and more relevant invariants
Missed root causes less likely (though possibly many candidates)
• Sequence of novel filtering techniques to reduce false+ves
9
Generate Inputs
Generate Invariants
Test with Bad Input
False +ve Filters
Good Inputs
BadInputs
InstrumentedProgram
Failed Invariants
Failure-inducing (Bad) input
Optional Input SpecificationProgram
Backward Slicing
Dependence Filtering
Multi-faulty Inp Filter
Diagnosis Tool Architecture
False+ve Filters
1 long day_nr ( uint year, uint month, uint day ) {
2 if ( year == 0 && … )
3 return ( 0 ) ;
4 Delsum = 365 * year + … ;
5 if (month <= 2)
6 year--;
7 else delsum -= (month*4 + … ;
8 temp = ( year / 100 + … ;
9 return ( delsum + year/4 * temp ) ;
10 }
11 int week_day ( long daynr , bool first_weekday ) {
12 return ( daynr + … ) % 7 ; }
13 bool make_date ( … ) {
14 . . .
15 Weekday = week_day ( day_nr (year, month, day ) , 0 ) ;
16 str->append ( …names [ weekday ] …. ;
17 . . . }
Source Code Example
11
Failing MySQL input : SELECT DATE (”0000-01-01”,’%W %d %M %Y’) as a
1 long day_nr ( uint year, uint month, uint day ) {
2 if ( year == 0 && … )
3 return ( 0 ) ;
4 Delsum = 365 * year + … ;
5 if (month <= 2)
6 year--;
7 else delsum -= (month*4 + … ;
8 temp = ( year / 100 + … ;
9 return ( delsum + year/4 * temp ) ;
10 }
11 int week_day ( long daynr , bool first_weekday ) {
12 return ( daynr + … ) % 7 ; }
13 bool make_date ( … ) {
14 . . .
15 Weekday = week_day ( day_nr (year, month, day ) , 0 ) ;
16 str->append ( …names [ weekday ] …. ;
17 . . . }
Source Code Example – Invariant Failures
12
Invariant Failures – candidate Root causesInvariant Failures – candidate Root causes
Insights for Training Invariants
• Use training inputs very “similar” to the failing input
Capture the key relevant differences between 2 similar runs
• Very few “similar” good inputs to train invariants
Much tighter and more relevant invariants
Less likely to miss root causes
Though possibly many false-positive candidates
13
Training Input Construction – 2 Approaches
• Deletion-based specification-independent approach
– A variation of ddmin algorithm [Zeller et.al.]
– Apply character-level rewriting/deletion
• Replacement-based specification-dependent approach
– User gives input specification
Tokens in input grammar and alternative tokens for each token
– Create variations for each input token depending upon type
– Create inputs by using variations of 1 token at a time
– Possible to implement this by modifying inbuilt parsers in application• Can be automated for given input specifications
14
Replacement-Based Spec-Dependent Approach – Example
15
SELECT NAME_CONST('flag',1) * MAX(a) FROM t1;
SELECT NAME_CONST('flag',2) * MAX(a) FROM t1;
SELECT NAME_CONST('flag',3) * MAX(a) FROM t1;
SELECT NAME_CONST('flag',5) * MAX(a) FROM t1;
SELECT NAME_CONST('flag',9) * MAX(a) FROM t1;
SELECT NAME_CONST('flag',1) * SUM(a) FROM t1;
SELECT NAME_CONST('flag',1) * MIN(a) FROM t1;
SELECT NAME_CONST('flag',1) * AVG(a) FROM t1;
SELECT NAME_CONST('flag',1) * STD(a) FROM t1;
Selecting Candidate Root Causes
• Invariant generation:– Select a set of “close” good inputs based on edit distance
– Run Instrumented program on good inputs to generate invariants
• Candidate root cause selection:– Execute the program with the inserted invariants for the bad input
– Failed invariants provide a set of candidate root-causes
16
Filtering Techniques
Remove candidates that do not affect symptom
Leverages DFS traversal of DDG during Slicing
Discard dep failed inv, if no intervening passing inv
Processing time is linear in the #edges in DDG
Intersection of root causes for similar failing inputs
17
Dynamic Backward Slicing
Dependence Filtering
Multiple faulty Input Filter
Dependence Filtering - 1
18
Inv Pass
Inv Fail
Inv Fail
Crash
Symptom
Inv Pass
Inv Pass
Probably not a root cause
A possible root cause
Dependence Filtering - 2
19
Inv Pass
Inv Pass
Inv Fail
Crash
Symptom
Inv Pass
Inv Pass
A possible root cause
Inv Fail
A possible root cause
1 long day_nr ( uint year, uint month, uint day ) {
2 if ( year == 0 && … )
3 return ( 0 ) ;
4 Delsum = 365 * year + … ;
5 if (month <= 2)
6 year--;
7 else delsum -= (month*4 + … ;
8 temp = ( year / 100 + … ;
9 return ( delsum + year/4 * temp ) ;
10 }
11 int week_day ( long daynr , bool first_weekday ) {
12 return ( daynr + … ) % 7 ; }
13 bool make_date ( … ) {
14 . . .
15 Weekday = week_day ( day_nr (year, month, day ) , 0 ) ;
16 str->append ( …names [ weekday ] …. ;
17 . . . }
Souce Code Example – Dependence Filtering
20
Probably not a root cause
A possible root cause
• Slice from root cause to symptom span several functions.
• This distance is high for incorrect output bugs.
Methodology - Characteristics of 8 Bugs
21
Bug-Name SymptomDistance (Dyn#LLVM inst)
Distance(Static #LOC)
Distance(Static #Functions)
Squid-len Buffer Overflow 12 6 2SQL-int-overflow Buffer Overflow 18 7 3
SQL-convert Incorrect Output 86 27 10SQL-aggregate Buffer Overflow 4 4 2
SQL-precision Incorrect Output 124 41 17SQL-mul-overflow Incorrect Output 114 36 17SQL-dataloss Incorrect Output 429 16 5Apache- overflow Buffer Overflow 32780 6 2
• 8 s/w bugs in 3 large server appl: Squid, Apache & MySQL
• Overall 12 bugs - 4 missing code bugs not considered
• LLVM-2.6 (LLVM-gcc) to compile programs & run our passes
27 10
41 1736 1716 5
Experimental Results – Bug Loc Effectiveness
22
Bug-Name #Invs #Failed-Invs #Final Candidate Root Causes
Squid-len 3358 357 9
SQL-int-overflow 5917 95 16
SQL-convert 5942 93 6
SQL-aggregate 6847 156 8
SQL-precision 4566 130 17
SQL-mul-overflow 4652 83 5
SQL-dataloss 5836 153 11
Apache- overflow 2295 120 6
• Tool provides only 5 –17 exprs as candidate root causes
#Final Candidate Root Causes
916
68
175
116
#Failed-Invs
3579593
156130
83153120
Experimental Results – Effectiveness of Filters
23
Bug-Name#Failed-Invs Slice
Squid-len 357 30
SQL-int-overflow 95 36
SQL-convert 93 27
SQL-aggregate 156 44
SQL-precision 130 34
SQL-mul-overflow 83 13
SQL-dataloss 153 35
Apache- overflow 120 12% Reduction 80%
Experimental Results – Effectiveness of Filters
24
Bug-NameFailed-Invs Slice
Dependence-filter
Squid-len 357 30 9
SQL-int-overflow 95 36 16
SQL-convert 93 27 9
SQL-aggregate 156 44 14
SQL-precision 130 34 18
SQL-mul-overflow 83 13 7
SQL-dataloss 153 35 17
Apache- overflow 120 12 6% Reduction 80% 58%
Experimental Results – Effectiveness of Filters
25
Bug-NameFailed-Invs Slice
Dependence-filter
Multiple-faulty-inputs
Squid-len 357 30 9 9
SQL-int-overflow 95 36 16 12
SQL-convert 93 27 9 6
SQL-aggregate 156 44 14 8
SQL-precision 130 34 18 17
SQL-mul-overflow 83 13 7 5
SQL-dataloss 153 35 17 11
Apache- overflow 120 12 6 6% Reduction 80% 58% 23%
• Slicing removes 80%, Dep-filtering 58%, multy-inp 23% false+ves
Comparison With Tarantula and Ochiai
27
• Statistical techniques rank each statement based on formula
• Performed better for SQL-int-overflow & SQL-precision bugs
• Good for bugs where control flow diverges from good runs
• Our approach better than Tarantula/Ochiai in 6 out of 8 bugs