Benevol 2010
Software metrics are usually right-skewed
[Figure: Histogram of SLOC(org.argouml.ui). x-axis: SLOC for classes in org.argouml.ui (0 to 500); y-axis: Frequency (0 to 25).]
2/11
Aggregation of software metrics using the “softnometric” index
Bogdan Vasilescu
Eindhoven University of Technology, The Netherlands
March 9, 2011
3/11 Aggregation techniques
Classical:
- Mean
- Sum
- Cardinality
Distribution fitting:
- Log-normal
- Exponential
- Negative binomial
Inequality indices:
- Theil
- Gini
- Kolm
- Atkinson
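4/11 Gini index
The Gini index is based on the Lorenz curve:
- The Lorenz curve plots the proportion of the total income of the population (y-axis) cumulatively earned by the bottom x% of the people.
- 0 means perfect equality: every person receives the same income.
- 1 means perfect inequality: one person receives all the income.
$I_{\mathrm{Gini}}(X) = \frac{A}{A+B}$, where A is the area between the line of perfect equality and the Lorenz curve, and B is the area under the Lorenz curve.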
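A minimal sketch of how the Gini index could be computed for a list of class sizes. It assumes the usual discrete rank-based formula, which equals the mean absolute difference divided by twice the mean; for n values this formula peaks at 1 − 1/n, so it only approaches 1 as n grows. The function name `gini_index` and the sample values are illustrative, not taken from the slides.

```python
def gini_index(values):
    """Gini index of non-negative values (discrete rank-based formula)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one value and a positive total")
    # Equivalent to sum_i sum_j |x_i - x_j| / (2 * n^2 * mean),
    # computed in O(n log n) using the rank i of each sorted value.
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)


# Hypothetical SLOC values for the classes of one package.
equal_sizes = [50, 50, 50, 50]    # every class has the same size
one_big_class = [0, 0, 0, 200]    # one class holds all the code

print(gini_index(equal_sizes))
print(gini_index(one_big_class))
```

With four values, the "one class holds everything" case yields 0.75 rather than 1, which is the 1 − 1/n ceiling of the discrete formula.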
5/11 Theoretical comparison
Criteria:
- Domain → determines applicability
- Range → determines interpretation
- Invariance
  - w.r.t. addition → LOC, ignore headers
  - w.r.t. multiplication → LOC, percentages vs. absolute values
- Decomposability → explain inequality by partitioning the population into groups
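To make the decomposability criterion concrete, here is a sketch that splits the Theil index of a population into a within-group part and a between-group part. It assumes the common form T = (1/n) Σ (x_i/μ) ln(x_i/μ), which needs strictly positive values; the grouping, the helper names, and the numbers are made up for illustration.

```python
import math


def theil(values):
    """Theil index; assumes every value is strictly positive."""
    n = len(values)
    mu = sum(values) / n
    return sum((x / mu) * math.log(x / mu) for x in values) / n


def theil_decomposition(groups):
    """Return (within, between) such that within + between == theil(all values)."""
    all_values = [x for group in groups for x in group]
    total = sum(all_values)
    mu = total / len(all_values)
    within = 0.0
    between = 0.0
    for group in groups:
        share = sum(group) / total          # group's share of the total
        mu_g = sum(group) / len(group)      # group mean
        within += share * theil(group)      # inequality inside the group
        between += share * math.log(mu_g / mu)  # inequality across group means
    return within, between


# Hypothetical SLOC values, partitioned into two groups of classes.
groups = [[10, 20, 30], [200, 400]]
within, between = theil_decomposition(groups)
overall = theil([x for group in groups for x in group])
print(within, between, overall)
```

The two parts add up to the overall index, so one can tell how much of the inequality is explained by the partition itself (between) and how much remains inside the groups (within).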
6/11 Theoretical comparison
Agg. technique   Domain  Range          Invariance  Decomposability
Mean             R       R              -           N/A
Sum              R       R              -           N/A
Cardinality      R       N              -           N/A
Gini Index       R+      [0, 1]         mult.       -
                 R       R              mult.       -
Theil Index      R+      [0, log n]     mult.       yes
Kolm Index       R       R+             add.        yes
Atkinson Index   R+      [0, 1 − 1/n]   mult.       -
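The invariance column can be checked numerically. The sketch below assumes the common definitions of the Theil index and of the Kolm index, K = (1/α) ln((1/n) Σ exp(α(μ − x_i))); the parameter value α = 0.01 and the sample values are assumptions chosen only to keep the numbers small.

```python
import math


def theil(values):
    """Theil index; assumes every value is strictly positive."""
    n = len(values)
    mu = sum(values) / n
    return sum((x / mu) * math.log(x / mu) for x in values) / n


def kolm(values, alpha=0.01):
    """Kolm index with inequality-aversion parameter alpha (assumed value)."""
    n = len(values)
    mu = sum(values) / n
    return math.log(sum(math.exp(alpha * (mu - x)) for x in values) / n) / alpha


sample = [10, 20, 30, 200]            # hypothetical SLOC values
shifted = [x + 5 for x in sample]     # e.g. a 5-line header added to every class
scaled = [x * 3 for x in sample]      # e.g. a change of unit

# Theil is unchanged by scaling but changes under a shift.
print(theil(sample), theil(scaled), theil(shifted))
# Kolm is unchanged by a shift but changes under scaling.
print(kolm(sample), kolm(shifted), kolm(scaled))
```

This mirrors the "ignore headers" remark: an index invariant under addition gives the same answer whether or not a fixed-size header is counted in every class.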
7/11 Empirical comparison
Research questions:
- Does LOC relate to bugs?
- Do the aggregation techniques influence the presence/strength of this relation?
- Is there any difference between the aggregation techniques? Do they express the same thing?
8/11 Empirical comparison
Case study: ArgoUML
- Open-source, ∼1200 Java classes, ∼100 packages.
Methodology:
- Tool chain to automatically process issue tracker and version control system data.
- Mapped defects to Java classes and then packages.
- Measured SLOC of each class, aggregated to package level.
- For each aggregation technique, statistically studied correlation with bugs.
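The last two methodology steps can be sketched as follows. This is not the tool chain used in the study: the class names, SLOC values, and defect counts are invented, and Kendall's tau-b is an assumed choice of correlation coefficient, since the slides do not name one.

```python
import math
from collections import defaultdict
from statistics import mean


def aggregate_by_package(class_sloc, aggregator):
    """Group class-level SLOC by package and apply one aggregation technique."""
    by_package = defaultdict(list)
    for class_name, sloc in class_sloc.items():
        package = class_name.rsplit(".", 1)[0]  # drop the class name
        by_package[package].append(sloc)
    return {pkg: aggregator(values) for pkg, values in by_package.items()}


def kendall_tau_b(xs, ys):
    """Kendall's tau-b rank correlation (assumed coefficient, handles ties)."""
    concordant = discordant = only_x_tied = only_y_tied = 0
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):
            dx, dy = xs[i] - xs[j], ys[i] - ys[j]
            if dx == 0 and dy == 0:
                continue                 # tied in both: counts for neither side
            elif dx == 0:
                only_x_tied += 1
            elif dy == 0:
                only_y_tied += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied_x = concordant + discordant + only_y_tied
    untied_y = concordant + discordant + only_x_tied
    if untied_x == 0 or untied_y == 0:
        raise ValueError("correlation undefined: one variable is constant")
    return (concordant - discordant) / math.sqrt(untied_x * untied_y)


# Invented data: SLOC per class and defects per package.
class_sloc = {
    "a.ui.A": 10, "a.ui.B": 300, "a.ui.C": 20,
    "a.model.D": 50, "a.model.E": 60,
    "a.util.F": 5, "a.util.G": 7, "a.util.H": 200,
}
defects = {"a.ui": 9, "a.model": 2, "a.util": 4}

aggregated = aggregate_by_package(class_sloc, mean)
packages = sorted(aggregated)
tau = kendall_tau_b([aggregated[p] for p in packages],
                    [defects[p] for p in packages])
print(aggregated, tau)
```

Any other aggregation technique (sum, an inequality index) can be passed in place of `mean`, which is how the techniques could be compared against the same defect counts.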
9/11 Results
             I_Gini  I_Theil  I_Kolm   I_Atkinson  defects
mean         0.170   0.192    0.676¹   0.203       0.0096
I_Gini               0.908    0.467    0.903       0.27
I_Theil                       0.488    0.918       0.273
I_Kolm                                 0.501       0.119
I_Atkinson                                         0.229
- I_Gini, I_Theil and I_Atkinson show the strongest, and statistically significant, correlation with the number of defects. However, the correlation between these three indices is itself high and statistically significant.
- Mean shows the lowest correlation with the number of defects.
¹ Statistically significant correlations, with two-sided p-values not exceeding 0.01, are typeset in boldface.
10/11 Threats to validity
No control over the issue tracker → mapping of defects to classes.
- Bugs missing from the issue tracker.
- Bug fixes not showing up in the commit log.
How representative is the case? How about the version?
- Replicate on more systems and more versions.
Is LOC the most suitable metric?
- Replicate with more metrics.
11/11 Conclusions
Software metrics are not distributed normally.
Theoretical comparison: domain, range, invariance and decomposability of each aggregation technique (table on slide 6/11).
Empirical comparison: correlations between the aggregated values and with the number of defects (table on slide 9/11).
Classical aggregation techniques have problems when distributions are skewed. Inequality indices look more promising.