Cleansing test suites from coincidental correctness to enhance fault localization


Page 1

Cleansing Coincidental Correctness to Enhance Fault Localization

Tao He, [email protected]

Software Engineering Laboratory, Department of Computer Science, Sun Yat-Sen University

The 2nd Joint Winter Workshop on Software Engineering, December 2010

Sun Yat-Sen University, Guangzhou, China

1/37

Page 2

Outline

- Coverage-Based Fault Localization: Introduction, Methodology, Evaluation, Discussion
- Cleansing Coincidental Correctness: Methodology, Evaluation
- Conclusion and Future Work

2/37

Page 3

Software debugging is an arduous task [1] that requires time, effort, and a good understanding of the source code.

Debugging has three steps [2]: fault detection, fault localization, and fault correction.

We focus on automatic fault localization.

Introduction

[1] I. Vessey. Expertise in debugging computer programs: A process analysis. International Journal of Man-Machine Studies, 23(5):459–494, November 1985.

[2] D. Wieland. Model-Based Debugging of Java Programs Using Dependencies. PhD thesis, Technischen Universitat Wien, 2001.

3/37

Page 4

Input of Fault Localization: the source code and the test cases

// Find the maximum among a, b and c
int max(int a, int b, int c) {
    int temp = a;        // S1
    if (b > temp) {      // S2
        temp = b + 1;    // S3: bug
    }                    // S4
    if (c > temp) {      // S5
        temp = c;        // S6
    }                    // S7
    return temp;         // S8
}

Test cases:

Input (a, b, c)   Oracle
3, 2, 1           3
2, 1, 3           3
1, 2, 3           3
1, 2, 4           4
1, 2, 3           3
1, 3, 2           3

4/37

Page 5

Output of Fault Localization

The suspiciousness of each statement, based on its likelihood of containing a fault. Statements with higher suspiciousness should be examined before statements with lower suspiciousness.

Statement        S1    S2    S3    S4    S5    S6    S7    S8
Suspiciousness   0.33  0.33  0.5   0.33  0.33  0.25  0.33  0.33

Suspiciousness results for the Jaccard coefficient


S3, the buggy statement, is the most suspicious (0.5).

5/37

Page 6

Coverage-Based Fault Localization (CBFL)

CBFL is based on which executable statements each test hits (its coverage).

Input of CBFL: the coverage of each test and its execution result (passed or failed).

a, b, c   S1  S2  S3  S4  S5  S6  S7  S8  r
3, 2, 1    1   1   0   1   1   0   1   1  p
2, 1, 3    1   1   0   1   1   1   1   1  p
1, 2, 3    1   1   1   1   1   0   1   1  p
1, 2, 4    1   1   1   1   1   1   1   1  p
1, 2, 3    1   1   1   1   1   1   1   1  f
1, 3, 2    1   1   1   1   1   0   1   1  f


6/37

Page 7

a, b, c   S3  S6  Others  r
3, 2, 1    0   0    1     p
2, 1, 3    0   1    1     p
1, 2, 3    1   0    1     p
1, 2, 4    1   1    1     p
1, 2, 3    1   1    1     f
1, 3, 2    1   0    1     f

Input of CBFL


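For brevity, the statements other than S3 and S6, which every test covers, are merged into a single column, Others.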

7/37

Page 8

Intuitively, four factors contribute to the suspiciousness of each statement.

Methodology

a, b, c   S3  S6  Others  r
3, 2, 1    0   0    1     p
2, 1, 3    0   1    1     p
1, 2, 3    1   0    1     p
1, 2, 4    1   1    1     p
1, 2, 3    1   1    1     f
1, 3, 2    1   0    1     f
a00(S)     2   2    0
a10(S)     2   2    4
a01(S)     0   1    0
a11(S)     2   1    2

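For each statement S (the table above is an example), the four factors and the effect of an increase in each on the suspiciousness of S:

Factor   Meaning                                  Suspiciousness
a00(S)   number of passed tests not covering S    ↑
a10(S)   number of passed tests covering S        ↓
a01(S)   number of failed tests not covering S    ↓
a11(S)   number of failed tests covering S        ↑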
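A minimal C sketch of how the four factors could be counted from a coverage matrix and the pass/fail results. It assumes a row-major 0/1 matrix `cov` (one row per test, one column per program element) and a `failed` array (1 = failing test); `Counts` and `count_factors` are hypothetical names, not part of the talk.

```c
#include <assert.h>

/* The four factors of one program element (hypothetical struct). */
typedef struct {
    int a00; /* passed tests that do not cover the element */
    int a10; /* passed tests that cover the element        */
    int a01; /* failed tests that do not cover the element */
    int a11; /* failed tests that cover the element        */
} Counts;

/* cov: row-major n_tests x n_elems matrix of 0/1 coverage flags.
   failed[t]: 1 if test t failed, 0 if it passed.
   elem: column index of the element to count. */
Counts count_factors(const int *cov, const int *failed,
                     int n_tests, int n_elems, int elem)
{
    assert(elem >= 0 && elem < n_elems);
    Counts c = {0, 0, 0, 0};
    for (int t = 0; t < n_tests; t++) {
        int covers = cov[t * n_elems + elem];
        if (covers && failed[t])
            c.a11++;
        else if (covers)
            c.a10++;
        else if (failed[t])
            c.a01++;
        else
            c.a00++;
    }
    return c;
}
```

For the six tests above, with the columns ordered S3, S6, Others, column 0 yields a00 = 2, a10 = 2, a01 = 0, a11 = 2, matching the S3 column of the table.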

8/37

Page 9

Jaccard [3]

[3] M. Y. Chen, E. Kiciman, E. Fratkin, A. Fox, and E. A. Brewer. Pinpoint: Problem determination in large, dynamic internet services. In Proceedings of 2002 International Conference on Dependable Systems and Networks (DSN 2002), pages 595–604, Bethesda, MD, USA, 23-26 June 2002. IEEE Computer Society.

a, b, c   S3   S6    Others  r
3, 2, 1    0    0      1     p
2, 1, 3    0    1      1     p
1, 2, 3    1    0      1     p
1, 2, 4    1    1      1     p
1, 2, 3    1    1      1     f
1, 3, 2    1    0      1     f
SJ(s)     0.5  0.25   0.33

$$S_J(s) = \frac{a_{11}(s)}{a_{11}(s) + a_{01}(s) + a_{10}(s)} = \frac{\mathit{failed}(s)}{\mathit{totalfailed} + \mathit{passed}(s)}$$

The Jaccard coefficient measures the similarity of asymmetric binary attributes.
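A short C sketch of the Jaccard coefficient, SJ(s) = a11(s) / (a11(s) + a01(s) + a10(s)). Returning 0.0 when the denominator is zero is an assumed convention that the slide does not specify; `jaccard` is a hypothetical name.

```c
#include <assert.h>

/* Jaccard suspiciousness of one statement from its factor counts.
   Denominator 0 means: no failing tests and no passing test covers
   the statement; 0.0 is returned in that case (assumed convention). */
double jaccard(int a11, int a01, int a10)
{
    assert(a11 >= 0 && a01 >= 0 && a10 >= 0);
    int denom = a11 + a01 + a10;
    return denom == 0 ? 0.0 : (double)a11 / denom;
}
```

With the counts of the running example, jaccard(2, 0, 2), jaccard(1, 1, 2) and jaccard(2, 0, 4) give 0.5, 0.25 and about 0.33, the SJ values for S3, S6 and Others.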

9/37

Page 10

$$S_T(s) = \frac{\frac{a_{11}(s)}{a_{11}(s) + a_{01}(s)}}{\frac{a_{11}(s)}{a_{11}(s) + a_{01}(s)} + \frac{a_{10}(s)}{a_{10}(s) + a_{00}(s)}}$$

Tarantula [4]

[4] J. A. Jones and M. J. Harrold. Empirical evaluation of the Tarantula automatic fault-localization technique. In D. F. Redmiles, T. Ellman, and A. Zisman, editors, 20th IEEE/ACM International Conference on Automated Software Engineering (ASE 2005), pages 273–282, Long Beach, CA, USA, November 7-11 2005. ACM.

a, b, c   S3    S6   Others  r
3, 2, 1    0     0     1     p
2, 1, 3    0     1     1     p
1, 2, 3    1     0     1     p
1, 2, 4    1     1     1     p
1, 2, 3    1     1     1     f
1, 3, 2    1     0     1     f
ST(s)     0.66  0.5   0.5

$$S_T(s) = \frac{\mathit{failed}(s) / \mathit{totalfailed}}{\mathit{failed}(s) / \mathit{totalfailed} + \mathit{passed}(s) / \mathit{totalpassed}}$$

Used in the Tarantula fault localization tool
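A C sketch of Tarantula in its count form, ST(s) = (failed(s)/totalfailed) / (failed(s)/totalfailed + passed(s)/totalpassed). It assumes the suite contains at least one failing and one passing test (checked with `assert`), and returns 0.0 for a statement no test covers, which is an assumed convention; `tarantula` is a hypothetical name.

```c
#include <assert.h>

/* failed, passed: failing/passing tests that cover the statement.
   total_failed, total_passed: sizes of the failing/passing test sets. */
double tarantula(int failed, int passed, int total_failed, int total_passed)
{
    assert(total_failed > 0 && total_passed > 0);
    double f = (double)failed / total_failed;  /* fraction of failing tests covering s */
    double p = (double)passed / total_passed;  /* fraction of passing tests covering s */
    return (f + p == 0.0) ? 0.0 : f / (f + p);
}
```

For the running example (2 failing and 4 passing tests), tarantula(2, 2, 2, 4) is about 0.667 for S3, while tarantula(1, 2, 2, 4) and tarantula(2, 4, 2, 4) both give 0.5 for S6 and Others.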

10/37

Page 11

Ochiai [5]

[5] R. Abreu, P. Zoeteweij, and A. J. van Gemund. On the accuracy of spectrum-based fault localization. In P. McMinn, editor, Proceedings of the Testing: Academia and Industry Conference - Practice And Research Techniques (TAIC PART’07), pages 89–98, Windsor, United Kingdom, September 2007. IEEE Computer Society.

a, b, c   S3   S6    Others  r
3, 2, 1    0    0      1     p
2, 1, 3    0    1      1     p
1, 2, 3    1    0      1     p
1, 2, 4    1    1      1     p
1, 2, 3    1    1      1     f
1, 3, 2    1    0      1     f
SO(s)     0.7  0.41   0.57

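$$S_O(s) = \frac{a_{11}(s)}{\sqrt{\big(a_{11}(s) + a_{01}(s)\big)\big(a_{11}(s) + a_{10}(s)\big)}} = \frac{\mathit{failed}(s)}{\sqrt{\mathit{totalfailed} \times \big(\mathit{failed}(s) + \mathit{passed}(s)\big)}}$$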

Used in the molecular biology domain to measure genetic similarity.

11/37

Page 12

Evaluation: assign a score to every faulty version of each subject program.

Score [6]: the percentage of the program that need not be examined before the first bug-containing statement is reached.

Assumption: perfect bug detection, i.e., programmers always correctly classify faulty code as faulty and non-faulty code as non-faulty.

[6] J. A. Jones and M. J. Harrold. Empirical evaluation of the Tarantula automatic fault-localization technique. In D. F. Redmiles, T. Ellman, and A. Zisman, editors, 20th IEEE/ACM International Conference on Automated Software Engineering (ASE 2005), pages 273–282, Long Beach, CA, USA, November 7-11 2005. ACM.

12/37

Page 13

Evaluation (cont’) - An example

Statement        S6   S3   S1    S2    S4    S5    S7    S8
Suspiciousness   0.7  0.5  0.33  0.33  0.33  0.33  0.33  0.33

Sorted suspiciousness results


Step 1: S6, the most suspicious statement, is examined first. It is not the bug.

13/37

Page 14

Step 1: S6 is examined. It is not the bug.

Step 2: S3 is examined. The bug is found!

Evaluation (cont’) - An example

Statement        S6   S3   S1    S2    S4    S5    S7    S8
Suspiciousness   0.7  0.5  0.33  0.33  0.33  0.33  0.33  0.33

Sorted suspiciousness results


14/37

Page 15

Two statements have been examined, out of eight statements in the program, so the score of this program is 1 - (2 ÷ 8) = 0.75: 75% of the statements need not be examined.
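The score computation can be sketched in C as follows. Statements are examined in decreasing order of suspiciousness. The slides do not say how ties with the faulty statement are broken, so this sketch assumes the best case (ties are resolved in favour of the faulty statement); `score` is a hypothetical name.

```c
#include <assert.h>

/* susp[i]: suspiciousness of statement i (0-based); bug: index of the
   faulty statement. Returns the fraction of statements that need not
   be examined before the fault is reached. */
double score(const double *susp, int n, int bug)
{
    assert(n > 0 && bug >= 0 && bug < n);
    int examined = 1;                 /* the faulty statement itself */
    for (int i = 0; i < n; i++)
        if (susp[i] > susp[bug])
            examined++;               /* ranked strictly above the fault */
    return 1.0 - (double)examined / n;
}
```

With the sorted results above (S6 = 0.7, S3 = 0.5, all others 0.33) and S3 as the faulty statement, two statements are examined and the sketch returns 0.75.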

Evaluation (cont’) - An example

15/37

Page 16

Evaluation (cont’): assign a score to every faulty version of the Siemens suite.

The effectiveness of existing techniques has been limited…

16/37

Page 17

Discussion

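Original coefficients (for brevity, the argument s is omitted):

$$S_J = \frac{a_{11}}{a_{11} + a_{01} + a_{10}}, \qquad S_T = \frac{\frac{a_{11}}{a_{11} + a_{01}}}{\frac{a_{11}}{a_{11} + a_{01}} + \frac{a_{10}}{a_{10} + a_{00}}}, \qquad S_O = \frac{a_{11}}{\sqrt{(a_{11} + a_{01})(a_{11} + a_{10})}}$$

Rewrite the coefficients as below [7]:

- Tarantula: divide the numerator and the denominator by $\frac{a_{11}}{a_{11} + a_{01}}$ (for $a_{11} > 0$):

$$S_T = \frac{1}{1 + C_T \frac{a_{10}}{a_{11}}}, \qquad C_T = \frac{a_{11} + a_{01}}{a_{10} + a_{00}}$$

- Jaccard: replace $a_{11} + a_{01}$ by $C_J$:

$$S_J = \frac{a_{11}}{C_J + a_{10}}, \qquad C_J = a_{11} + a_{01}$$

- Ochiai: square (which preserves the ranking, since $S_O \ge 0$), and replace $a_{11} + a_{01}$ by $C_J$:

$$S_O^2 = \frac{a_{11}^2}{C_J \, (a_{11} + a_{10})}$$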

[7] R. Abreu, P. Zoeteweij, R. Golsteijn, and A. J. C. van Gemund. A practical evaluation of spectrum-based fault localization. Journal of Systems and Software, 82(11):1780–1792, 2009.

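Both $C_T = (a_{11} + a_{01})/(a_{10} + a_{00})$ (total failed / total passed) and $C_J = a_{11} + a_{01}$ (total failed) are the same for every statement, so they do not influence the suspiciousness ranking.

So the rankings produced by all three coefficients depend only on $a_{11}$ and $a_{10}$.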
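The claim can be checked numerically with the rank-equivalent forms, which take only a11 and a10 plus the suite-wide constants C_T and C_J. A C sketch; the function names are hypothetical and the inputs come from the running example (2 failing tests, 4 passing tests, so C_T = 2/4 and C_J = 2).

```c
#include <assert.h>

/* Tarantula after dividing by a11/(a11+a01); valid for a11 > 0.
   c_t = (a11 + a01) / (a10 + a00), identical for every statement. */
double tarantula_reduced(int a11, int a10, double c_t)
{
    assert(a11 > 0);
    return 1.0 / (1.0 + c_t * a10 / a11);
}

/* Jaccard with a11 + a01 replaced by the constant c_j. */
double jaccard_reduced(int a11, int a10, int c_j)
{
    assert(c_j + a10 > 0);
    return (double)a11 / (c_j + a10);
}

/* c_j * SO^2 = a11^2 / (a11 + a10): orders statements exactly as SO does. */
double ochiai_sq_reduced(int a11, int a10)
{
    assert(a11 + a10 > 0);
    return (double)a11 * a11 / (a11 + a10);
}
```

For S3 (a11 = 2, a10 = 2), S6 (1, 2) and Others (2, 4), tarantula_reduced gives about 0.667, 0.5 and 0.5 and jaccard_reduced gives 0.5, 0.25 and about 0.33, the same values as the full coefficients, while ochiai_sq_reduced orders S3 above Others above S6, as SO does.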

17/37

Page 18

The suspiciousness computed by these coefficients increases with a11 and decreases with a10.

Assume that:
- whenever the fault is executed, the execution fails (increasing the faulty statement's a11);
- whenever the fault is not executed, the execution passes (increasing a10 of the other statements it covers);
- the test suite is adequate.

Then the faulty statement always ranks at the top.

Why, then, are these techniques ineffective in practice? Are there interferences?

The impact of a11 and a10

18/37

Page 19

Interferences

Factors that impair CBFL (interferences):

- Coincidental correctness [8]: the fault is executed, but the execution does not fail.
- Multiple faults: the fault is not executed, but the execution fails.
- Coverage equivalence: some statements always have the same coverage.

[8] W. Masri, R. Abou-Assi, M. El-Ghali, and N. Al-Fatairi. An empirical study of the factors that reduce the effectiveness of coverage-based fault localization. In B. Liblit, N. Nagappan, and T. Zimmermann, editors, Proceedings of the 2nd International Workshop on Defects in Large Software Systems: Held in conjunction with ISSTA 2009, pages 1–5, Chicago, Illinois, July 19 2009. ACM.

19/37

Page 20

Coincidental Correctness

Condition          a, b, c   S3  S6  Others  r
a < b, b + 1 = c   1, 2, 3    1   0    1     p
a < b, b + 1 < c   1, 2, 4    1   1    1     p


Not all conditions for failure are met. The RIP (reachability-infection-propagation) model [9]:

- Condition 1: the fault is executed.
- Condition 2: the program has transitioned into an infectious state.
- Condition 3: the infection has propagated to the output.
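The sketch below replays the coincidental-correctness example in C: it runs the buggy max with a flag recording whether S3 executes (RIP condition 1) and compares the result with a correct version used as the oracle. It checks only reachability and the final output and does not track infection (condition 2). The names `max_buggy`, `max_correct`, `Outcome` and `classify` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Buggy max from the slide; *reached_s3 records whether S3 (the fault) runs. */
int max_buggy(int a, int b, int c, bool *reached_s3)
{
    assert(reached_s3);
    *reached_s3 = false;
    int temp = a;
    if (b > temp) {
        temp = b + 1;          /* S3: the fault */
        *reached_s3 = true;
    }
    if (c > temp) {
        temp = c;
    }
    return temp;
}

/* Correct max, used here as the test oracle. */
int max_correct(int a, int b, int c)
{
    int temp = a;
    if (b > temp) temp = b;
    if (c > temp) temp = c;
    return temp;
}

typedef enum { PASSED_FAULT_NOT_REACHED, FAILED, COINCIDENTALLY_CORRECT } Outcome;

/* A passing test that reaches the fault is coincidentally correct. */
Outcome classify(int a, int b, int c)
{
    bool reached;
    int out = max_buggy(a, b, c, &reached);
    if (out != max_correct(a, b, c))
        return FAILED;
    return reached ? COINCIDENTALLY_CORRECT : PASSED_FAULT_NOT_REACHED;
}
```

Both inputs in the table, (1, 2, 3) and (1, 2, 4), are classified as coincidentally correct, while (1, 3, 2) fails and (3, 2, 1) passes without reaching S3.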

[9] P. Ammann and J. Offutt. Introduction to Software Testing. Cambridge University Press, 2008.

20/37

Page 21

Multiple Faults

// Find the maximum among a, b and c
int max(int a, int b, int c) {
    int temp = a;        // S1
    if (b > temp) {      // S2
        temp = b + 1;    // S3: bug
    }                    // S4
    if (c > temp) {      // S5
        temp = c + 1;    // S6: bug
    }                    // S7
    return temp;         // S8
}

Condition          a, b, c   S3  S6  r
a < b, b + 1 ≥ c   1, 2, 4    1   0  f
a ≥ b, a < c       3, 2, 4    0   1  f

A fault is not executed, but the execution still fails (because another fault is executed).

21/37

Page 22

Coverage Equivalence

// Find the maximum among a, b and c
int max(int a, int b, int c) {
    int temp = a + 1;    // S1: bug
    if (b > temp) {      // S2
        temp = b;        // S3
    }                    // S4
    if (c > temp) {      // S5
        temp = c;        // S6
    }                    // S7
    return temp;         // S8
}

Condition        a, b, c   S1  S8  r
a < b or a < c   1, 2, 3    1   1  p
otherwise        7, 2, 4    1   1  f

Some statements always have the same coverage (here, every test executes both S1 and S8). This is due to either the inadequacy of the test suite or an inherent property of the program.

22/37

Page 23

Empirical Study

Coincidental correctness (72.1%) [8]

- Strong coincidental correctness (15.7%): meets Conditions 1 and 2 of the RIP (reachability-infection-propagation) model.
- Weak coincidental correctness (56.4%): meets only Condition 1 of the RIP model.

Coincidental correctness is a safety-reducing factor: it causes the faulty statement to have a lower suspiciousness score than other statements.

[8] W. Masri, R. Abou-Assi, M. El-Ghali, and N. Al-Fatairi. An empirical study of the factors that reduce the effectiveness of coverage-based fault localization. In B. Liblit, N. Nagappan, and T. Zimmermann, editors, Proceedings of the 2nd International Workshop on Defects in Large Software Systems: Held in conjunction with ISSTA 2009, pages 1–5, Chicago, Illinois, July 19 2009. ACM.

23/37

Page 24

Cleansing Coincidental Correctness [10]

Input: a test suite and its coverage matrix.

Output: the subset of passing tests that are likely to be coincidentally correct.

Assumption: a good candidate for a cce is a program element that occurs in all failing runs and in a non-zero but not excessively large percentage of passing runs.

[10] W. Masri and R. Abou Assi. Cleansing Test Suites from Coincidental Correctness to Enhance Fault-Localization. In 2010 Third International Conference on Software Testing, Verification and Validation (ICST 2010), pages 165–174. IEEE, 2010.

24/37

Page 25

Technique - I

We estimate:

- CCE: the set of program elements that are likely to be correlated with coincidentally correct tests;
- cce: an element of CCE;
- cct: a test that induces a cce;
- CCT: an estimate of $T_{CC}$.

Assumption: $f_T(cce) = 1.0$ and $0 < p_T(cce) \le \theta$, where $f_T(cce)$ is the percentage of $T_F$ executing cce, $p_T(cce)$ is the percentage of $T_P$ executing cce, and $\theta < 1.0$.

Notation: $T$: a test suite; $T_F$: failing tests; $T_P$: passing tests; $T_{CC}$: coincidentally correct tests.

Populate CCE with program elements that are totally correlated with failures, i.e., executed by every failing test, and that satisfy the assumption above.

25/37

Page 26

Technique - I (cont’)


Populate CCT with the passing tests that execute one or more cce's.
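Technique I can be sketched in C as two passes over the coverage matrix: first mark the cce's (executed by every failing test and by a non-zero fraction of passing tests no larger than θ), then mark the passing tests that execute at least one cce. The matrix layout, the function name `find_cct`, and the value θ = 0.5 used in the example below are assumptions for illustration, not values from the talk.

```c
#include <assert.h>
#include <stdbool.h>

/* cov: row-major n_tests x n_elems 0/1 matrix; failed[t]: 1 if test t failed.
   theta: threshold with 0 < theta < 1. Outputs is_cce[e] (element in CCE)
   and is_cct[t] (test in CCT). Returns the number of cct's. */
int find_cct(const int *cov, const int *failed, int n_tests, int n_elems,
             double theta, bool *is_cce, bool *is_cct)
{
    assert(theta > 0.0 && theta < 1.0);
    int n_fail = 0, n_pass = 0;
    for (int t = 0; t < n_tests; t++) {
        if (failed[t]) n_fail++;
        else n_pass++;
    }
    assert(n_fail > 0 && n_pass > 0);

    /* Pass 1: f_T(e) = 1.0 and 0 < p_T(e) <= theta. */
    for (int e = 0; e < n_elems; e++) {
        int f = 0, p = 0;
        for (int t = 0; t < n_tests; t++) {
            if (!cov[t * n_elems + e]) continue;
            if (failed[t]) f++;
            else p++;
        }
        double p_t = (double)p / n_pass;
        is_cce[e] = (f == n_fail) && p > 0 && p_t <= theta;
    }

    /* Pass 2: passing tests that execute at least one cce. */
    int n_cct = 0;
    for (int t = 0; t < n_tests; t++) {
        is_cct[t] = false;
        if (failed[t]) continue;
        for (int e = 0; e < n_elems; e++) {
            if (is_cce[e] && cov[t * n_elems + e]) {
                is_cct[t] = true;
                break;
            }
        }
        if (is_cct[t]) n_cct++;
    }
    return n_cct;
}
```

On the example that follows (columns S3, S6, Others; tests in the order 1,2,3 p; 1,2,4 p; 3,2,1 p; 2,1,3 p; 1,2,3 f; 1,3,2 f) with θ = 0.5, only S3 is marked as a cce and the first two tests are marked as cct's.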

26/37

Page 27

Technique - I - An example

a, b, c   S3  S6  Others  r
1, 2, 3    1   0    1     p
1, 2, 4    1   1    1     p
3, 2, 1    0   0    1     p
2, 1, 3    0   1    1     p
1, 2, 3    1   1    1     f
1, 3, 2    1   0    1     f

cce: S3, which is executed by every failing test and by 2 of the 4 passing tests.

27/37

Page 28

Technique - I - An example (cont’)

       a, b, c   S3  S6  Others  r
cct    1, 2, 3    1   0    1     p
cct    1, 2, 4    1   1    1     p
       3, 2, 1    0   0    1     p
       2, 1, 3    0   1    1     p
       1, 2, 3    1   1    1     f
       1, 3, 2    1   0    1     f

cce: S3

Find them! The tests marked cct (1, 2, 3 and 1, 2, 4) are exactly the coincidentally correct tests.

28/37

Page 29

Technique - II

Each cce is given a weight that correlates with its suspiciousness. Each cct is then ranked by:

(average weight of the covered cce's) + (percentage of cce's covered)

A cct with a high value is more likely to be a coincidentally correct test. The lower-ranked cct's are discarded.
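A C sketch of the Technique II ranking value for a single cct. How the cce weights are derived is not given on the slide, so they are taken as input here, and the "percentage of cce's covered" is assumed to be a fraction in [0, 1] so that both terms have a similar scale; `cct_rank_value` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>

/* cce_weight[k]: weight of cce k (assumed to correlate with suspiciousness).
   covers[k]: true if the cct executes cce k.
   Returns (average weight of covered cce's) + (fraction of cce's covered). */
double cct_rank_value(const double *cce_weight, const bool *covers, int n_cce)
{
    assert(n_cce > 0);
    double weight_sum = 0.0;
    int n_covered = 0;
    for (int k = 0; k < n_cce; k++) {
        if (covers[k]) {
            weight_sum += cce_weight[k];
            n_covered++;
        }
    }
    if (n_covered == 0)
        return 0.0;                       /* covers no cce: not a cct */
    return weight_sum / n_covered + (double)n_covered / n_cce;
}
```

With two cce's weighted 0.8 and 0.4, a cct covering both scores about 1.6, one covering only the first scores 1.3, and one covering only the second scores 0.9; the caller would sort the cct's by this value and drop the lowest-ranked ones, with the cutoff left open here.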

29/37

Page 30

Technique - III

Technique III partitions the cct's into two clusters based on the similarity of the suspicious cce's they execute.

Assumptions: typically, some cce's are relevant to the fault and others are not. The coincidentally correct tests exercise the fault-relevant cce's, whereas the correct tests do not.
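As an illustration only, the partitioning step can be sketched as a two-cluster k-means over each cct's 0/1 vector of executed cce's. The choice of k-means, squared Euclidean distance, the seeding rule, and the `MAX_CCE` bound are all assumptions of this sketch; the talk does not specify the clustering algorithm or similarity measure.

```c
#include <assert.h>

#define MAX_CCE 64   /* hypothetical upper bound on the number of cce's */

/* Squared Euclidean distance between a 0/1 vector and a centroid. */
static double dist2(const int *x, const double *c, int d)
{
    double s = 0.0;
    for (int k = 0; k < d; k++) {
        double diff = x[k] - c[k];
        s += diff * diff;
    }
    return s;
}

/* x: row-major n x d 0/1 matrix (row i = cce's executed by cct i).
   label[i] receives the cluster (0 or 1) of cct i. */
void two_means(const int *x, int n, int d, int *label)
{
    assert(n >= 2 && d > 0 && d <= MAX_CCE);
    double cen[2][MAX_CCE];

    /* Seed with row 0 and the row farthest from it. */
    int far = 0;
    double best = -1.0;
    for (int k = 0; k < d; k++) cen[0][k] = x[k];
    for (int i = 1; i < n; i++) {
        double dd = dist2(&x[i * d], cen[0], d);
        if (dd > best) { best = dd; far = i; }
    }
    for (int k = 0; k < d; k++) cen[1][k] = x[far * d + k];

    for (int iter = 0; iter < 100; iter++) {
        int changed = 0;
        /* Assignment step: nearest centroid (ties go to cluster 0). */
        for (int i = 0; i < n; i++) {
            int l = dist2(&x[i * d], cen[1], d) < dist2(&x[i * d], cen[0], d);
            if (iter == 0 || l != label[i]) { label[i] = l; changed = 1; }
        }
        if (!changed) break;
        /* Update step: each centroid becomes the mean of its members. */
        for (int c = 0; c < 2; c++) {
            int cnt = 0;
            double sum[MAX_CCE] = {0};
            for (int i = 0; i < n; i++) {
                if (label[i] != c) continue;
                cnt++;
                for (int k = 0; k < d; k++) sum[k] += x[i * d + k];
            }
            if (cnt > 0)
                for (int k = 0; k < d; k++) cen[c][k] = sum[k] / cnt;
        }
    }
}
```

For four cct's with cce vectors {1,1,0}, {1,1,0}, {0,0,1} and {0,1,1}, the sketch puts the first two in one cluster and the last two in the other. Deciding which cluster holds the coincidentally correct tests is left to the caller.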

30/37

Page 31

Evaluation

- false negatives
- false positives
- safety change
- precision change
- coverage reduction

31/37

Page 32

Evaluation (cont’)

32/37

Page 33

Evaluation (cont’)

Comparative results summaries

33/37

Page 34

Conclusion

Without interferences, CBFL techniques are effective and efficient at automating fault localization.

Well-designed coefficients can tolerate some interferences, but not all of them.

Three variations of a technique were presented to identify coincidental correctness, a safety-reducing factor for CBFL.

34/37

Page 35

Future Work

- Investigate more algorithms to identify coincidental correctness, e.g., cluster analysis and failure classification.
- Evaluate whether different program elements (e.g., predicates, function calls, program paths) can further reduce the rate of false positives.
- Assess the impact of cleansing coincidental correctness on other fault-localization approaches.

35/37

Page 36

Q & A

36/37

Page 37

Thank you! Contact me via [email protected]

37/37