Code Coverage and Test Suite Effectiveness: Empirical Study with Real Bugs in Large Systems
Pavneet Singh Kochhar, Ferdian Thung, David Lo
Singapore Management University
{kochharps.2012,ferdiant.2013,davidlo}@smu.edu.sg
International Conference on Software Analysis, Evolution, and Reengineering (SANER’15)
Software Testing, Why Bother?
Functionality -- Requirements
Bugs -- Software reliability
Costs -- Late bugs cost more
• Horgan and Mathur [1]
  – Adequate testing is critical to developing reliable software
• Tassey [2]
  – Inadequate testing costs the US economy 59 billion dollars annually
[1] J. R. Horgan and A. P. Mathur, “Software testing and reliability,” McGraw-Hill, Inc., 1996.
[2] G. Tassey, “The economic impacts of inadequate infrastructure for software testing,” National Institute of Standards and Technology, 2002.
Previous Studies
• Gopinath et al. [1]
  – Analyze hundreds of open-source projects to measure the quality of test suites
  – Projects used are small, i.e., 10 LOC to 10,000 LOC
• Inozemtseva and Holmes [2]
  – Analyze the relationship between test suite size, coverage and effectiveness
  – Five large software systems
Both of these studies use mutants, i.e., artificially injected bugs.
[1] R. Gopinath, C. Jensen, and A. Groce, “Code coverage for suite evaluation by developers,” ICSE, 2014.
[2] L. Inozemtseva and R. Holmes, “Coverage is not strongly correlated with test suite effectiveness,” ICSE, 2014.
Code Coverage
• Percentage of the code executed by test cases
• Used as a proxy for adequacy of testing
• Types:
  – Statement Coverage
  – Branch Coverage
• We measure coverage using Cobertura*
*http://cobertura.github.io/cobertura/
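As a rough illustration of the two coverage types listed above, the sketch below computes statement and branch coverage as percentages from hypothetical execution data. The line numbers, branch outcomes, and the helper name coverage_percent are assumptions made for this example; it does not reproduce how Cobertura instruments code or reports results.

```python
def coverage_percent(covered: int, total: int) -> float:
    """Return covered/total as a percentage; 0.0 when there is nothing to cover."""
    if total == 0:
        return 0.0
    return 100.0 * covered / total


# Hypothetical execution data for one small class (not taken from the study).
executable_lines = {10, 11, 12, 14, 15, 18, 19, 20}
executed_lines = {10, 11, 12, 14}

# Each conditional contributes two branches: its true and its false outcome.
all_branches = {(11, True), (11, False), (14, True), (14, False)}
taken_branches = {(11, True), (14, True), (14, False)}

statement_cov = coverage_percent(
    len(executed_lines & executable_lines), len(executable_lines)
)
branch_cov = coverage_percent(len(taken_branches & all_branches), len(all_branches))

print(f"statement coverage: {statement_cov:.1f}%")  # statement coverage: 50.0%
print(f"branch coverage: {branch_cov:.1f}%")        # branch coverage: 75.0%
```

Note that the two measures can diverge: a line containing a conditional counts as executed once it runs, while branch coverage also requires both outcomes of that conditional to be exercised.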
Study Goals
To understand the correlation between the test suite size, coverage and effectiveness.
Is code coverage effective in killing real bugs?
Outline
• Motivation and Goals
• Overall Process
• Dataset
• Empirical Results
• Conclusion and Future Work
Overall Process
Dataset
Project       Lines of Code   Number of Bugs*
HTTPClient    122,288         67
Rhino         116,065         92

Project           HTTPClient                                    Rhino
Description       Java library for client-side HTTP services   JavaScript engine
Developed by      Apache                                        Mozilla
Build Tool        Maven                                         Ant
Issue Tracking    JIRA                                          Bugzilla

* K. Herzig, S. Just, and A. Zeller, “It’s not a Bug, it’s a Feature: How Misclassification Impacts Bug Prediction,” ICSE, 2013.
Test Suite Size & Coverage
Used the Randoop tool to generate JUnit tests for 5 minutes.

Project       % of Original Test Suite Size
              0.2     0.5     1       5        10       100
HTTPClient    7.43    15.62   39.13   197.82   396.17   3967.00
Rhino         7.64    16.01   40.10   202.52   405.46   4059.28

Project       Coverage   % of Original Test Suite Size
                         0.2    0.5    1      5      10     100
HTTPClient    Line       7.5    11.0   17.2   28.0   31.8   37.4
              Branch     2.8    4.4    7.6    14.4   17.2   22.5
Rhino         Line       6.4    8.7    11.6   17.0   19.4   27.1
              Branch     3.0    4.2    5.8    9.0    10.5   16.5
Test Suite Effectiveness
A test suite kills a bug if it runs successfully on the non-buggy version (i.e., all test cases pass) and fails on the buggy version (i.e., at least one test case fails).
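The kill criterion above can be written as a small predicate. This is only a sketch: the function names and the representation of a test run as a mapping from test name to pass/fail are assumptions for illustration, not the study's tooling.

```python
from typing import Mapping


def suite_passes(results: Mapping[str, bool]) -> bool:
    """A suite run is successful only if every test case passed."""
    return all(results.values())


def kills_bug(fixed_results: Mapping[str, bool],
              buggy_results: Mapping[str, bool]) -> bool:
    """True if the suite passes on the non-buggy version and fails on the buggy one."""
    return suite_passes(fixed_results) and not suite_passes(buggy_results)


# Hypothetical outcomes of the same two tests on both versions.
on_fixed = {"testA": True, "testB": True}
on_buggy = {"testA": True, "testB": False}

print(kills_bug(on_fixed, on_buggy))  # True: one test fails only on the buggy version
print(kills_bug(on_fixed, on_fixed))  # False: the suite cannot tell the versions apart
```

Requiring a clean run on the non-buggy version rules out suites that fail for reasons unrelated to the bug.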
Point Biserial Correlation
• Measures the correlation between two variables when one of them is naturally dichotomous, i.e., it takes only the value 0 or 1.
• Interpretation of correlation strength, following Pett [1]:

Value Range              Correlation
r_pb² ≥ 0.81             Very strong
0.49 ≤ r_pb² < 0.81      Strong
0.25 ≤ r_pb² < 0.49      Moderate
0.09 ≤ r_pb² < 0.25      Weak
0.00 ≤ r_pb² < 0.09      Very weak

[1] M. A. Pett, “Nonparametric Statistics for Health Care Research: Statistics for Small Samples and Unusual Distributions,” Sage Publications, Inc., 1997.
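A minimal sketch of the computation, assuming the standard textbook formula r_pb = (M1 − M0) / s_n · sqrt(n1·n0 / n²), where M1 and M0 are the means of the continuous variable in the two groups and s_n is its population standard deviation. The function names and sample data are hypothetical; the thresholds in strength() are the value ranges from the table above.

```python
import math
from statistics import mean, pstdev


def point_biserial(binary, continuous):
    """Point-biserial correlation between a 0/1 variable and a continuous one."""
    if len(binary) != len(continuous):
        raise ValueError("inputs must have the same length")
    if any(b not in (0, 1) for b in binary):
        raise ValueError("binary variable must contain only 0 and 1")
    ones = [x for b, x in zip(binary, continuous) if b == 1]
    zeros = [x for b, x in zip(binary, continuous) if b == 0]
    n1, n0, n = len(ones), len(zeros), len(binary)
    spread = pstdev(continuous)  # population standard deviation
    if n1 == 0 or n0 == 0 or spread == 0:
        raise ValueError("correlation is undefined for this input")
    return (mean(ones) - mean(zeros)) / spread * math.sqrt(n1 * n0 / n**2)


def strength(r_squared):
    """Map r_pb squared to the strength labels in the table above."""
    if r_squared >= 0.81:
        return "Very strong"
    if r_squared >= 0.49:
        return "Strong"
    if r_squared >= 0.25:
        return "Moderate"
    if r_squared >= 0.09:
        return "Weak"
    return "Very weak"


# Hypothetical data: 1 if a sampled suite killed the bug, paired with its coverage (%).
killed = [0, 0, 0, 1, 0, 1, 1, 1]
coverage = [7.5, 11.0, 17.2, 28.0, 17.2, 31.8, 31.8, 37.4]

r = point_biserial(killed, coverage)
print(f"r_pb = {r:.2f}, r_pb^2 = {r * r:.2f}, strength = {strength(r * r)}")
```

The same value is obtained by computing the Pearson correlation with the dichotomous variable coded as 0 and 1, so a library routine for Pearson correlation could be used instead.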
Research Questions
RQ1: Is there a correlation between a test suite’s size and its effectiveness?
RQ2: Is there a correlation between a test suite’s coverage and its effectiveness?
RQ1: Size vs Effectiveness
Test suite size is weakly to strongly correlated with test suite effectiveness.
Point Biserial Correlation
            HTTPClient   Rhino
r_pb²       0.49         0.14
p-value     *            *
* Statistically significant
RQ2: Coverage vs Effectiveness
Code coverage of a test suite is moderately to strongly correlated with its effectiveness.
Point Biserial Correlation
            Statement               Branch
            HTTPClient   Rhino      HTTPClient   Rhino
r_pb²       0.33         0.59       0.36         0.55
p-value     *            *          *            *
* Statistically significant
Conclusion & Future Work
Using real bugs, we find that:
• Test suite size is weakly to strongly correlated with test suite effectiveness.
• Code coverage is moderately to strongly correlated with the effectiveness of a test suite.
Future Work:
• Expand the study to include more projects
  – Address threats to external validity
• Use human-generated test cases
Thank you!
Questions? Comments? Advice?
{kochharps.2012,ferdiant.2013,davidlo}@smu.edu.sg
Threats to Validity
• Internal validity:
  – We link bug reports to commits using bug IDs
  – We use Randoop for 5 minutes
• External validity:
  – We only analyze 2 large software systems
• Construct validity:
  – We use point biserial correlation
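The first internal-validity threat concerns linking bug reports to commits through bug IDs. One common way to do such linking is to search commit messages for issue identifiers, sketched below. The regular expressions, commit hashes, messages, and bug IDs are all hypothetical and chosen for illustration; the slides do not describe the actual matching rules used in the study.

```python
import re
from collections import defaultdict

# Assumed ID formats: JIRA-style keys such as "HTTPCLIENT-1001" and
# Bugzilla-style references such as "bug 123456".
JIRA_ID = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")
BUGZILLA_ID = re.compile(r"\bbug\s*#?\s*(\d+)\b", re.IGNORECASE)


def link_commits_to_bugs(commits, known_bug_ids):
    """Map each known bug ID to the commits whose message mentions it."""
    links = defaultdict(list)
    for sha, message in commits:
        candidates = set(JIRA_ID.findall(message))
        candidates.update(BUGZILLA_ID.findall(message))
        # Keep only IDs that correspond to reports classified as bugs.
        for bug_id in sorted(candidates & known_bug_ids):
            links[bug_id].append(sha)
    return dict(links)


# Hypothetical commit log and bug list.
commits = [
    ("a1b2c3", "HTTPCLIENT-1001: fix connection leak"),
    ("d4e5f6", "Fix for bug 123456 - wrong scope lookup"),
    ("0a0b0c", "Update documentation"),
]
known_bugs = {"HTTPCLIENT-1001", "123456"}

links = link_commits_to_bugs(commits, known_bugs)
print(links)  # {'HTTPCLIENT-1001': ['a1b2c3'], '123456': ['d4e5f6']}
```

Commits that fix a bug without mentioning its ID are missed by this kind of matching, which is why the linking step is listed as a threat.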
Related Work
• Empirical studies on testing and coverage
  – Gligoric et al. show that branch coverage is the best measure of test suite quality [1]
  – Namin and Andrews show that test suite size and coverage are correlated with test suite effectiveness [2]
  – Gopinath et al. investigate the correlation between coverage and a test suite’s effectiveness in killing mutants [3]
[1] M. Gligoric, A. Groce, C. Zhang, R. Sharma, M. A. Alipour, and D. Marinov, “Comparing non-adequate test suites using coverage criteria,” ISSTA, 2013.
[2] A. S. Namin and J. H. Andrews, “The influence of size and coverage on test suite effectiveness,” ISSTA, 2009.
[3] R. Gopinath, C. Jensen, and A. Groce, “Code coverage for suite evaluation by developers,” ICSE, 2014.