An Empirical Study of Adoption of Software Testing in Open Source Projects
Pavneet Singh Kochhar¹, Tegawendé F. Bissyandé², David Lo¹, Lingxiao Jiang¹
¹Singapore Management University; ²University of Luxembourg
Importance of Software Testing
Functionality -- Requirements
Debugging -- Software complexity
Costs -- $59 billion* for inadequate testing
To what extent are test cases adopted in open-source projects?
*G. Tassey, “The economic impacts of inadequate infrastructure for software testing,” National Institute of Standards and Technology, RTI Project, 2002.
Objective & Contributions
Popularity of test cases
Presence of test cases – project characteristics
Influence of software development artifacts
Large-scale study of over 20,000 GitHub projects
Dataset & Statistical Computations
Downloaded over 100,000 projects from GitHub
Randomly selected 50,000 projects (preliminary study)
Filtered out projects with < 500 lines of code (LOC): 20,817 projects remain
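The selection steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline. The `Project` record, the `select_projects` helper and the fixed random seed are hypothetical. LOC is assumed to be precomputed per project, for example with SLOCCount.

```python
import random
from dataclasses import dataclass


@dataclass
class Project:
    name: str  # hypothetical identifier, e.g. "owner/repo"
    loc: int   # lines of code, assumed precomputed (e.g. by SLOCCount)


def select_projects(downloaded, sample_size=50_000, min_loc=500, seed=0):
    """Randomly sample projects, then drop those with fewer than min_loc LOC."""
    rng = random.Random(seed)  # fixed seed (an assumption) keeps the sample reproducible
    sample = rng.sample(downloaded, min(sample_size, len(downloaded)))
    return [p for p in sample if p.loc >= min_loc]
```

For example, `select_projects(projects)` over the downloaded list would return the subset that survives the 500-LOC filter.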
Dataset & Statistical Computations
Metric -- How it is measured
Lines of code -- LOC* by programming language
Number of test cases -- Count of test files
Developer contributions -- Project team size
Bug count -- Issue tags
Bug reporters -- User names
*SLOCCount (http://www.dwheeler.com/sloccount/)
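The slides say test cases are counted as test files found by heuristics, but they do not give the heuristic. The sketch below assumes a hypothetical rule: a file is a test file if any component of its path contains "test", matched case-insensitively. `is_test_file` and `count_test_files` are illustrative names only.

```python
import re
from pathlib import PurePosixPath

# Hypothetical heuristic: any path component containing "test" (any case)
TEST_PATTERN = re.compile(r"test", re.IGNORECASE)


def is_test_file(path: str) -> bool:
    """True if some directory or file name in the path matches the pattern."""
    return any(TEST_PATTERN.search(part) for part in PurePosixPath(path).parts)


def count_test_files(paths) -> int:
    """Number of test files in a project's file list."""
    return sum(1 for p in paths if is_test_file(p))
```

A name such as `src/latest.c` also matches this rule. That kind of misclassification is why test-case heuristics appear under threats to validity.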
Lines of Code (LOC)
RQ1 – Popularity of Test Cases
Projects              % of Projects
Without test cases    38.34%
With test cases       61.65%
84.87% of the projects have < 100 test cases
10.7% of the projects have > 100 and < 500 test cases
4.4% of the projects have > 500 test cases
Distribution of Test Cases
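The three shares can be reproduced from per-project test counts with a simple bucketing step. Two points are assumptions. First, the boundary handling: the slide leaves counts of exactly 100 and exactly 500 unassigned. Second, the input population: the slide does not state whether projects without test cases are part of the base.

```python
from collections import Counter


def bucket(n_tests: int) -> str:
    # Assumed boundaries: 100..500 inclusive go to the middle bucket
    if n_tests < 100:
        return "<100"
    if n_tests <= 500:
        return "100-500"
    return ">500"


def bucket_shares(test_counts):
    """Percentage of projects in each test-case bucket."""
    counts = Counter(bucket(n) for n in test_counts)
    total = len(test_counts)
    return {b: 100.0 * counts[b] / total for b in ("<100", "100-500", ">500")}
```

For example, `bucket_shares([1, 5, 150, 700])` gives 50% below 100, 25% in the middle bucket and 25% above 500.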
Box Plot
Median: line inside the box
Lower quartile to upper quartile: the box, 50% of data
Lower whisker: 25% of data
Upper whisker: 25% of data
Outliers: points beyond the whiskers
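The box-plot elements listed above can be computed directly. This sketch assumes Tukey's 1.5 × IQR rule for the whiskers, which the slides do not state. When there are no outliers, each whisker then spans the lowest or highest 25% of the data, as the legend describes. The function name `box_plot_stats` is illustrative.

```python
import statistics


def box_plot_stats(data, k=1.5):
    """Quartiles, whiskers and outliers (Tukey's k * IQR convention, assumed)."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "lower_whisker": min(inside),  # smallest point within the fences
        "upper_whisker": max(inside),  # largest point within the fences
        "outliers": sorted(x for x in data if x < low_fence or x > high_fence),
    }
```

Consider `[1, 2, 3, 4, 5, 6, 7, 8, 100]`. The median is 5, the quartiles are 3 and 7, and 100 is reported as an outlier.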
RQ1 – Popularity of Test Cases: LOC (Projects with & without Test Cases)
Difference between the distributions is statistically significant (p-value < 0.05)
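The slides report p-values for differences between two distributions but do not name the test. A nonparametric rank test such as Mann-Whitney U is a common choice for skewed metrics like LOC. The sketch below assumes that choice. It uses the normal approximation without tie or continuity correction, so it is an illustration rather than the authors' computation.

```python
import math


def rank_with_ties(values):
    """1-based ranks; tied values get the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test via the normal approximation."""
    n1, n2 = len(x), len(y)
    ranks = rank_with_ties(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2  # U statistic for sample x
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return u1, p
```

Passing the LOC values of projects with and without test cases as `x` and `y` would yield a U statistic and a p-value to compare against 0.05.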
RQ1 – Popularity of Test Cases: LOC & Test Cases
Positive correlation between #LOC and #Test Cases (ρ=0.427) (p-value < 0.05)
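The ρ values reported in RQ1 to RQ4 are most likely Spearman rank correlations. This is an assumption: the slides use ρ but do not name the coefficient. The minimal pure-Python sketch below computes Spearman's ρ as the Pearson correlation of average ranks, which handles ties. It does not compute the accompanying p-value.

```python
import math
from collections import Counter


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    first = {}
    for i, v in enumerate(sorted(values)):
        first.setdefault(v, i)  # first 0-based position of each value
    counts = Counter(values)
    return [first[v] + (counts[v] + 1) / 2 for v in values]


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)
```

Because ρ depends only on ranks, a monotonic but non-linear relation such as y = x² still gives ρ = 1.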
RQ1 – Popularity of Test Cases: LOC & Test Cases/LOC
Negative correlation between #LOC and #Test Cases/LOC (ρ=-0.451) (p-value < 0.05)
RQ2 – Developers & Test Cases: Developers (Projects with & without Test Cases)
Difference between the distributions is statistically significant (p-value < 0.05)
RQ2 – Developers & Test Cases: Developers & Test Cases
Weak correlation between #Developers and #Test Cases (ρ=0.207) (p-value < 0.05)
RQ2 – Developers & Test Cases: Developers & Test Cases/Developer
Negative correlation between Team size and #Test Cases per developer (ρ=-0.444) (p-value < 0.05)
RQ3 – Bug Count and Test Cases: Identifying Bugs (Tags)
Keyword -- Example tags
bug -- bug; T bug; Bug Confirmed; bugs; starter bug; bug fix; etc.
defect -- defect; Type-Defect; minor defect
error -- error; Wow error; build error; error page; user error; etc.
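The tag-based bug identification above can be sketched as case-insensitive keyword matching. The slide lists only the keywords and example tags, so two things here are assumptions: the matching rule, and the `issues` input shape (one list of tag strings per issue).

```python
BUG_KEYWORDS = ("bug", "defect", "error")  # keywords listed on the slide


def is_bug_tag(tag: str) -> bool:
    """Assumed rule: a tag marks a bug if it contains any keyword, ignoring case."""
    t = tag.lower()
    return any(k in t for k in BUG_KEYWORDS)


def count_bugs(issues) -> int:
    """Count issues that carry at least one bug-like tag.

    `issues` is a hypothetical list of tag lists, one per issue.
    """
    return sum(1 for tags in issues if any(is_bug_tag(t) for t in tags))
```

Substring matching also accepts tags such as `debug`. This is one reason counting bugs via tags appears under threats to validity.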
RQ3 – Bug Count and Test Cases: Test Cases & Bugs
Weak correlation between # bugs and #Test Cases (ρ=0.181) (p-value < 0.05)
RQ4 – Bug Reporters and Test Cases: Bug Reporters (Projects with & without Test Cases)
Difference between the distributions is statistically significant (p-value < 0.05)
RQ4 – Bug Reporters and Test Cases: Test Cases & Bug Reporters
Weak correlation between # bug reporters and #Test Cases (ρ=0.171) (p-value < 0.05)
RQ5 – Programming Languages and Test Cases: Projects (Top 10 Languages)
1. Java
2. Ruby
3. PHP
4. Python
5. ANSI C
6. C++
7. Objective-C
8. C#
9. JavaScript
10. Perl
RQ5 – Programming Languages and Test Cases: Test Cases/Project (Top 10 Languages)
Language      # of Projects   # of Test Cases   Test Cases/Project
C++           1,920           648,773           337.90
ANSI C        2,197           286,009           130.18
PHP           2,902           255,553           88.06
C#            1,042           81,334            78.05
Java          3,112           196,703           63.20
Ruby          3,016           173,864           57.64
JavaScript    819             39,070            47.70
Python        2,536           103,600           40.85
Objective-C   1,153           21,343            18.51
Perl          630             7,690             12.20
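The Test Cases/Project column is the number of test cases divided by the number of projects. The values appear to be truncated rather than rounded to two decimals. For Java, 196,703 / 3,112 = 63.2079..., which rounds to 63.21 but is listed as 63.20; the C#, Ruby and Perl rows behave the same way. The sketch below recomputes the column with integer arithmetic under that truncation assumption. `per_project_truncated` is an illustrative name.

```python
# (language, # of projects, # of test cases), copied from the RQ5 table
ROWS = [
    ("C++", 1920, 648773),
    ("ANSI C", 2197, 286009),
    ("PHP", 2902, 255553),
    ("C#", 1042, 81334),
    ("Java", 3112, 196703),
    ("Ruby", 3016, 173864),
    ("JavaScript", 819, 39070),
    ("Python", 2536, 103600),
    ("Objective-C", 1153, 21343),
    ("Perl", 630, 7690),
]


def per_project_truncated(tests: int, projects: int) -> str:
    """Test cases per project, truncated (not rounded) to two decimals."""
    hundredths = tests * 100 // projects  # integer division avoids float error
    return f"{hundredths // 100}.{hundredths % 100:02d}"


for lang, projects, tests in ROWS:
    print(f"{lang:12} {per_project_truncated(tests, projects)}")
```

Under this assumption, every row reproduces the listed Test Cases/Project value.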
RQ5 – Programming Languages and Test Cases: Test Cases (Median) (Top 10 Languages)
Distribution of test cases (C++)
Threats to Validity
Heuristics to detect test cases
Counting bugs using tags: bug, error, defect
Not all projects use GitHub’s issue tracking system
Conclusion
Findings:
- Projects with test cases are bigger in size.
- The number of test cases per LOC decreases as LOC increases.
- The more developers, the more test cases.
- The more developers, the lower the ratio of test cases per developer.
- There is a weak correlation between the number of test cases and the number of bugs.
- The number of test cases and the number of bug reporters have a weak positive correlation.
- Projects written in popular languages such as C++, ANSI C & PHP have higher mean numbers of test cases.
Future agenda:
- Explore the influence of more project characteristics/metrics
- Check with other open-source datasets
- Use language-specific heuristics
Appendix
Bug Tags
installation; rich; Improvement; Reporting
duplicated; pat; New feature; community
feature; mark; Confirmed; documentation
routing; needs review; In Progress; categorization
optimization; Samples; Feature request; publishing
security; Unable to reproduce; Wont fix; ranker
translations; nack; Resolved; server
ui; rich; Bug confirmed; Fatal
TODO; pat; backend; Build System
low priority; mark; low-priority; MS AspNet
Sam; presentation; frontend; OAuth2
C++ test cases
URL Language # of test cases
https://github.com/isis-project/WebKit cpp 166,488
https://github.com/cswei/Olympia_on_Desktop cpp 94,591
https://github.com/librelab/qtmoko-test cpp 52,039
https://github.com/mozilla/mozilla-central cpp 36,671
https://github.com/weissms/owb-mirror cpp 29,340
Distribution of test cases (C#)
RQ5 – Programming Languages and Test Cases: Test Cases (Top 10 Languages)
[Box plots per language; legend: median, lower and upper quartile (box, 50% of data), lower and upper whisker, outliers]