Evaluating the presence and impact of bias in bug-fix datasets
![Page 1: Evaluating the presence and impact of bias in bug-fix datasets](https://reader033.fdocuments.in/reader033/viewer/2022060111/55687586d8b42a3b7b8b4ce7/html5/thumbnails/1.jpg)
Evaluating the presence and impact of bias in bug-fix datasets
Israel Herraiz, UPM
http://mat.caminos.upm.es/~iht
Talk at University of California, Davis
April 11, 2012
This presentation is available at http://www.slideshare.net/herraiz/evaluating-the-presence-and-impact-of-bias-in-bugfix-datasets
![Page 2: Evaluating the presence and impact of bias in bug-fix datasets](https://reader033.fdocuments.in/reader033/viewer/2022060111/55687586d8b42a3b7b8b4ce7/html5/thumbnails/2.jpg)
1 / 34 http://mat.caminos.upm.es/~iht
Outline
1. Who am I and what do I do
2. The problem
3. Preliminary results
4. The road ahead
5. Take away and discussion
![Page 3: Evaluating the presence and impact of bias in bug-fix datasets](https://reader033.fdocuments.in/reader033/viewer/2022060111/55687586d8b42a3b7b8b4ce7/html5/thumbnails/3.jpg)
1. Who am I and what do I do
![Page 4: Evaluating the presence and impact of bias in bug-fix datasets](https://reader033.fdocuments.in/reader033/viewer/2022060111/55687586d8b42a3b7b8b4ce7/html5/thumbnails/4.jpg)
About me
• PhD in Computer Science from Universidad Rey Juan Carlos (Madrid)
• “A statistical examination of the evolution and properties of libre software”
• http://herraiz.org/phd.html
• Assistant Professor at the Technical University of Madrid
• http://mat.caminos.upm.es/~iht
• Visiting UC Davis from April to July, hosted by Prof. Devanbu
• Kindly funded by a MECD “José Castillejo” grant (JC2011-0093)
![Page 5: Evaluating the presence and impact of bias in bug-fix datasets](https://reader033.fdocuments.in/reader033/viewer/2022060111/55687586d8b42a3b7b8b4ce7/html5/thumbnails/5.jpg)
What do I do?
![Page 6: Evaluating the presence and impact of bias in bug-fix datasets](https://reader033.fdocuments.in/reader033/viewer/2022060111/55687586d8b42a3b7b8b4ce7/html5/thumbnails/6.jpg)
2. The problem
![Page 7: Evaluating the presence and impact of bias in bug-fix datasets](https://reader033.fdocuments.in/reader033/viewer/2022060111/55687586d8b42a3b7b8b4ce7/html5/thumbnails/7.jpg)
Replication in Empirical Software Engineering
Empirical Software Engineering studies
are hard to replicate.
Verification and replication are crucial
features of an empirical research
discipline.
Reusable datasets lower the barrier for
replication.
![Page 9: Evaluating the presence and impact of bias in bug-fix datasets](https://reader033.fdocuments.in/reader033/viewer/2022060111/55687586d8b42a3b7b8b4ce7/html5/thumbnails/9.jpg)
The case of the Eclipse dataset
http://www.st.cs.uni-saarland.de/softevo/bug-data/eclipse/
Defects data for all packages in the releases
2.0, 2.1 and 3.0
Size and complexity metrics for all the files
![Page 10: Evaluating the presence and impact of bias in bug-fix datasets](https://reader033.fdocuments.in/reader033/viewer/2022060111/55687586d8b42a3b7b8b4ce7/html5/thumbnails/10.jpg)
Bug-fix datasets
• The Eclipse data is a bug-fix dataset
• To cross-correlate bugs with files, classes or packages, the data is extracted from
• Bug tracking systems (fixed bug reports)
• Version control systems (commits)
• Heuristics to detect relationships between bug-fix reports and commits
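A minimal sketch of the kind of linking heuristic described above, assuming that developers mention the bug number in the commit message. The regular expression, the function name and the sample data are hypothetical; the slides do not say which patterns the Eclipse dataset actually used.

```python
import re

# Hypothetical pattern: real projects follow many different conventions.
BUG_ID_PATTERN = re.compile(r"(?:bug|fix(?:es|ed)?|issue)\s*#?\s*(\d+)", re.IGNORECASE)


def link_commits_to_bugs(commits, fixed_bug_ids):
    """Return (commit_id, bug_id) pairs where the message names a fixed bug."""
    links = []
    for commit_id, message in commits:
        for match in BUG_ID_PATTERN.finditer(message):
            bug_id = int(match.group(1))
            if bug_id in fixed_bug_ids:
                links.append((commit_id, bug_id))
    return links


# Made-up commit log and set of fixed bug reports.
commits = [
    ("c1", "Fix bug #1234: NPE in parser"),
    ("c2", "Refactor the build scripts"),
    ("c3", "Fixed 42 and cleaned up imports"),
    ("c4", "Handle null input in parser"),  # a fix that names no bug
]
fixed_bug_ids = {1234, 42, 77}
links = link_commits_to_bugs(commits, fixed_bug_ids)
```

In this toy data the commit `c4` and the bug `77` stay unlinked: a heuristic of this kind can only recover the pairs that developers chose to mark.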
![Page 11: Evaluating the presence and impact of bias in bug-fix datasets](https://reader033.fdocuments.in/reader033/viewer/2022060111/55687586d8b42a3b7b8b4ce7/html5/thumbnails/11.jpg)
A study using the Eclipse dataset
![Page 12: Evaluating the presence and impact of bias in bug-fix datasets](https://reader033.fdocuments.in/reader033/viewer/2022060111/55687586d8b42a3b7b8b4ce7/html5/thumbnails/12.jpg)
The distribution of software faults
• The distribution of software faults (over
packages) is a Weibull distribution
• This study can be easily replicated thanks to the
Eclipse reusable bug-fix dataset
• If the same data is obtained for other case
studies, it can also be easily verified and
extended
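For reference, the standard two-parameter Weibull distribution can be written as follows. The symbols (shape k, scale λ) are chosen here for illustration; the slides do not state which parameterization the study used.

```latex
F(x) = 1 - \exp\!\left[-\left(\frac{x}{\lambda}\right)^{k}\right],
\qquad
\bar{F}(x) = 1 - F(x) = \exp\!\left[-\left(\frac{x}{\lambda}\right)^{k}\right],
\qquad x \ge 0

\ln\!\left(-\ln \bar{F}(x)\right) = k \ln x - k \ln \lambda
```

The last identity is why a Weibull sample looks like a straight line with slope k when ln(-ln F̄(x)) is plotted against ln x.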
![Page 13: Evaluating the presence and impact of bias in bug-fix datasets](https://reader033.fdocuments.in/reader033/viewer/2022060111/55687586d8b42a3b7b8b4ce7/html5/thumbnails/13.jpg)
But…
![Page 14: Evaluating the presence and impact of bias in bug-fix datasets](https://reader033.fdocuments.in/reader033/viewer/2022060111/55687586d8b42a3b7b8b4ce7/html5/thumbnails/14.jpg)
What’s the difference between the two conflicting
studies?
• According to the authors there are
methodological differences
• Zhang uses Alberg diagrams
• Concas et al. use CCDF plots to fit different
distributions, and reason about the generative
process as a model for software maintenance
• What I suspect is a crucial difference
• Zhang reused the Eclipse bug-fix dataset
• Concas et al. gathered the data themselves
• So the bias in the two datasets will differ
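The CCDF plots mentioned above start from an empirical complementary cumulative distribution, that is, the fraction of packages with at least x bugs. A minimal sketch, with a hypothetical function name and made-up counts:

```python
from collections import Counter


def empirical_ccdf(values):
    """Return sorted (x, P(X >= x)) pairs for the distinct values in `values`."""
    n = len(values)
    counts = Counter(values)
    ccdf = []
    remaining = n  # number of observations that are >= the current x
    for x in sorted(counts):
        ccdf.append((x, remaining / n))
        remaining -= counts[x]
    return ccdf


# Made-up number of bugs per package.
bugs_per_package = [0, 0, 1, 1, 1, 2, 3, 8]
ccdf = empirical_ccdf(bugs_per_package)
```

Candidate distributions are then fitted to these points, usually on logarithmic axes. Two datasets with different bias yield different points, and possibly a different best fit.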
![Page 15: Evaluating the presence and impact of bias in bug-fix datasets](https://reader033.fdocuments.in/reader033/viewer/2022060111/55687586d8b42a3b7b8b4ce7/html5/thumbnails/15.jpg)
What’s wrong with the Eclipse bug-fix dataset?
![Page 16: Evaluating the presence and impact of bias in bug-fix datasets](https://reader033.fdocuments.in/reader033/viewer/2022060111/55687586d8b42a3b7b8b4ce7/html5/thumbnails/16.jpg)
Bug feature bias
There are other kinds of bias (commit features), but in the two Eclipse
papers the distribution concerns package features, not bug or commit
features.
RQ1: Will this kind of bias hold for package / class / file
features?
RQ2: What’s the impact on defect prediction?
![Page 17: Evaluating the presence and impact of bias in bug-fix datasets](https://reader033.fdocuments.in/reader033/viewer/2022060111/55687586d8b42a3b7b8b4ce7/html5/thumbnails/17.jpg)
Impact on prediction
![Page 18: Evaluating the presence and impact of bias in bug-fix datasets](https://reader033.fdocuments.in/reader033/viewer/2022060111/55687586d8b42a3b7b8b4ce7/html5/thumbnails/18.jpg)
Impact on prediction
J48 tree to classify files as defective or not
![Page 19: Evaluating the presence and impact of bias in bug-fix datasets](https://reader033.fdocuments.in/reader033/viewer/2022060111/55687586d8b42a3b7b8b4ce7/html5/thumbnails/19.jpg)
Conclusions so far
• Developers only mark a subset of the bug-fix pairs,
and so heuristics-based recovery methods only find
a subset of the overall bug-fix pairs
• The bias appears as a difference in the distribution
of bug and commit features
• The conflict between the two studies about the
distribution of bugs in Eclipse is likely due to
differences in the distributions caused by bias
• The bias has a great impact on the accuracy of
prediction models
![Page 20: Evaluating the presence and impact of bias in bug-fix datasets](https://reader033.fdocuments.in/reader033/viewer/2022060111/55687586d8b42a3b7b8b4ce7/html5/thumbnails/20.jpg)
3. Preliminary results
![Page 21: Evaluating the presence and impact of bias in bug-fix datasets](https://reader033.fdocuments.in/reader033/viewer/2022060111/55687586d8b42a3b7b8b4ce7/html5/thumbnails/21.jpg)
The distribution of bugs over files
• Number of bugs per file for the case of ZXing
![Page 22: Evaluating the presence and impact of bias in bug-fix datasets](https://reader033.fdocuments.in/reader033/viewer/2022060111/55687586d8b42a3b7b8b4ce7/html5/thumbnails/22.jpg)
The distribution of bugs over files
• Number of bugs per file for the case of Eclipse
![Page 23: Evaluating the presence and impact of bias in bug-fix datasets](https://reader033.fdocuments.in/reader033/viewer/2022060111/55687586d8b42a3b7b8b4ce7/html5/thumbnails/23.jpg)
The distribution of bugs over files
• Comparison between the ReLink and the biased
bug-fix sets (results of the χ² test, p-values)
![Page 24: Evaluating the presence and impact of bias in bug-fix datasets](https://reader033.fdocuments.in/reader033/viewer/2022060111/55687586d8b42a3b7b8b4ce7/html5/thumbnails/24.jpg)
The distribution of bugs over files
• Comparison between the ReLink and the biased
bug-fix sets (results of the χ² test, p-values)
RQ1: Will this kind of bias hold for package /
class / file features?
Not supported by these examples
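One way to run such a comparison is a χ² test of homogeneity on binned bugs-per-file counts. The sketch below computes only the statistic and the degrees of freedom. The binning, the function name and the numbers are assumptions, not the procedure or data behind the slides.

```python
def chi_square_homogeneity(counts_a, counts_b):
    """Chi-square statistic and degrees of freedom for two binned samples."""
    if len(counts_a) != len(counts_b):
        raise ValueError("both samples must use the same bins")
    total_a, total_b = sum(counts_a), sum(counts_b)
    if total_a == 0 or total_b == 0:
        raise ValueError("both samples must contain observations")
    grand_total = total_a + total_b
    statistic = 0.0
    used_bins = 0
    for a, b in zip(counts_a, counts_b):
        column_total = a + b
        if column_total == 0:
            continue  # a bin that is empty in both samples carries no information
        used_bins += 1
        for observed, row_total in ((a, total_a), (b, total_b)):
            expected = row_total * column_total / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic, used_bins - 1


# Made-up counts of files with 0, 1 and 2+ bugs in each bug-fix set.
biased_counts = [50, 30, 20]
relink_counts = [40, 40, 20]
statistic, dof = chi_square_homogeneity(biased_counts, relink_counts)
```

Turning the statistic into a p-value requires the χ² distribution with `dof` degrees of freedom, for instance from a statistics library. A large p-value means the test finds no evidence that the two sets are distributed differently.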
![Page 25: Evaluating the presence and impact of bias in bug-fix datasets](https://reader033.fdocuments.in/reader033/viewer/2022060111/55687586d8b42a3b7b8b4ce7/html5/thumbnails/25.jpg)
Time over!
• So there is no difference between the biased
and non-biased datasets?
• And how come the ReLink paper (and others)
report improved accuracies when using the
non-biased datasets?
• What could explain these differences?
![Page 26: Evaluating the presence and impact of bias in bug-fix datasets](https://reader033.fdocuments.in/reader033/viewer/2022060111/55687586d8b42a3b7b8b4ce7/html5/thumbnails/26.jpg)
Impact on prediction accuracy
• What is the prediction accuracy using different
(biased and non-biased) datasets?
• Three datasets
• Biased datasets recovered using heuristics
• “Golden” dataset manually recovered
• By Sung Kim et al., not me!
• Non-biased dataset obtained using the ReLink
tool
• J48 tree classifier, 10-fold cross-validation
• Test datasets always extracted from the golden
dataset
![Page 27: Evaluating the presence and impact of bias in bug-fix datasets](https://reader033.fdocuments.in/reader033/viewer/2022060111/55687586d8b42a3b7b8b4ce7/html5/thumbnails/27.jpg)
F-measure values
• Procedure
• Extract 100 subsamples of the same size for
both datasets
• Calculate the F-measure using 10-fold
cross-validation
• The test set is always extracted from the “golden”
set
• Repeat for several subsample sizes
• Only results for the case of OpenIntents so far
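The procedure above can be sketched as follows. This is a simplified illustration, not the code behind the slides: it omits the 10-fold cross-validation, and `fit` is a hypothetical callback standing in for the J48 tree. The toy classifier and the data are made up.

```python
import random


def f_measure(actual, predicted):
    """Harmonic mean of precision and recall for boolean labels."""
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)
    fp = sum(1 for a, p in zip(actual, predicted) if not a and p)
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def subsample_scores(train_pool, test_set, size, fit, n_subsamples=100, seed=0):
    """Train on random subsamples of `train_pool`; score each on `test_set`."""
    rng = random.Random(seed)
    actual = [label for _, label in test_set]
    scores = []
    for _ in range(n_subsamples):
        sample = rng.sample(train_pool, size)
        predict = fit(sample)  # `fit` returns a function: features -> bool
        predicted = [predict(features) for features, _ in test_set]
        scores.append(f_measure(actual, predicted))
    return scores


def fit_size_threshold(sample):
    """Toy stand-in for a real classifier: flag files at least as large as
    the smallest defective file seen in the training sample."""
    defective_sizes = [size for size, defective in sample if defective]
    threshold = min(defective_sizes) if defective_sizes else float("inf")
    return lambda size: size >= threshold


# Made-up (file size, is_defective) pairs; the test set plays the "golden" set.
train_pool = [(10, False), (20, False), (200, True),
              (300, True), (50, False), (400, True)]
golden_test = [(15, False), (250, True), (500, True), (30, False)]
scores = subsample_scores(train_pool, golden_test, size=4, fit=fit_size_threshold)
```

Running the same call with the same `size` for each training dataset gives one set of scores per dataset, which can then be compared in a boxplot for each subsample size.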
![Page 28: Evaluating the presence and impact of bias in bug-fix datasets](https://reader033.fdocuments.in/reader033/viewer/2022060111/55687586d8b42a3b7b8b4ce7/html5/thumbnails/28.jpg)
![Page 29: Evaluating the presence and impact of bias in bug-fix datasets](https://reader033.fdocuments.in/reader033/viewer/2022060111/55687586d8b42a3b7b8b4ce7/html5/thumbnails/29.jpg)
RQ2: Impact on prediction
Not clear whether there is any impact
![Page 30: Evaluating the presence and impact of bias in bug-fix datasets](https://reader033.fdocuments.in/reader033/viewer/2022060111/55687586d8b42a3b7b8b4ce7/html5/thumbnails/30.jpg)
RQ2: Impact on prediction
Not clear whether there is any impact
Little warning!
The size is not exactly the same for
the three cases in each boxplot.
The biased dataset is always the smallest
of the three.
I have to repeat this using exactly
the same size for the three
datasets.
![Page 31: Evaluating the presence and impact of bias in bug-fix datasets](https://reader033.fdocuments.in/reader033/viewer/2022060111/55687586d8b42a3b7b8b4ce7/html5/thumbnails/31.jpg)
Preliminary conclusions
• The biased dataset does not provide the worst
accuracy when predicting fault proneness for a
set of (supposedly) unbiased bug fixes and files
• Contrary to what is reported in previous work
• What is the cause of the reported differences in
accuracy?
• By definition, the so-called biased dataset
will always be smaller
• Dataset size does have an impact on the
F-measure
![Page 32: Evaluating the presence and impact of bias in bug-fix datasets](https://reader033.fdocuments.in/reader033/viewer/2022060111/55687586d8b42a3b7b8b4ce7/html5/thumbnails/32.jpg)
4. The road ahead
![Page 33: Evaluating the presence and impact of bias in bug-fix datasets](https://reader033.fdocuments.in/reader033/viewer/2022060111/55687586d8b42a3b7b8b4ce7/html5/thumbnails/33.jpg)
My workplan at UC Davis
• Discuss the ideas shown here
• Is bias really a problem for defect prediction?
• Extend the study to more cases
• Do you have a dataset of files, bugs, commits,
metrics? Please let me know!
• Improve the study
• What happens if we break down the data into more
coherent subgroups?
• Do the results change at different levels of
granularity?
![Page 34: Evaluating the presence and impact of bias in bug-fix datasets](https://reader033.fdocuments.in/reader033/viewer/2022060111/55687586d8b42a3b7b8b4ce7/html5/thumbnails/34.jpg)
5. Take away and conclusions
![Page 35: Evaluating the presence and impact of bias in bug-fix datasets](https://reader033.fdocuments.in/reader033/viewer/2022060111/55687586d8b42a3b7b8b4ce7/html5/thumbnails/35.jpg)
Systematic difference in bug-fixes collected
by heuristics
No observable difference in the
statistical properties of the so-called biased
dataset
Impact on prediction accuracy not clear
Ecological inference
What happens at other scales?
With other subgroups?