Metrics - You can't control the unfamiliar

Paper: You Can't Control the Unfamiliar: A Study on the Relations Between Aggregation Techniques for Software Metrics
Authors: Bogdan Vasilescu, Alexander Serebrenik and Mark van den Brand
Session: Research Track 11 - Metrics

Transcript of Metrics - You can't control the unfamiliar

Page 1: Metrics - You can't control the unfamiliar

/ W&I / MDSE PAGE 0 5-10-2011

Metrics are usually computed at a low level: classes, methods, …

Page 2: Metrics - You can't control the unfamiliar

Multitude of data values obscures a general picture of the system maintainability

Page 3: Metrics - You can't control the unfamiliar

That we are actually interested in!

Page 4: Metrics - You can't control the unfamiliar

You Can't Control the Unfamiliar: A Study on the Relations Between Aggregation Techniques for Software Metrics

Bogdan Vasilescu

Alexander Serebrenik

Mark van den Brand

Page 5: Metrics - You can't control the unfamiliar

Two kinds of aggregation

• Same artifact, different metrics

• Same metrics, different artifacts

Page 6: Metrics - You can't control the unfamiliar

Various techniques can be found in the literature

Same metrics, different artifacts

Traditional: mean, median, sum, …

Econometric inequality indices: Gini, Theil, Hoover, Kolm, Atkinson
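
To make the listed inequality indices concrete, here is a minimal Python sketch using common textbook definitions. It is not taken from the slides or the paper: the function names, the Atkinson parameter `eps=0.5`, and the Kolm parameter `alpha=1.0` are assumptions chosen for illustration, and the paper may use different variants. Inputs are assumed to be non-negative metric values (e.g. SLOC per class) with a positive mean.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def gini(xs):
    # Mean absolute difference over all ordered pairs, normalised by 2 * mean.
    n, mu = len(xs), mean(xs)
    return sum(abs(a - b) for a in xs for b in xs) / (2 * n * n * mu)

def theil(xs):
    # Theil T index; zero values contribute 0 (limit of t * ln t as t -> 0).
    mu = mean(xs)
    return sum((x / mu) * math.log(x / mu) for x in xs if x > 0) / len(xs)

def hoover(xs):
    # Share of the total that would have to be moved to reach equality.
    mu = mean(xs)
    return sum(abs(x - mu) for x in xs) / (2 * sum(xs))

def atkinson(xs, eps=0.5):
    # Assumed parameter eps (must differ from 1 in this formula).
    mu = mean(xs)
    ede = mean([x ** (1 - eps) for x in xs]) ** (1 / (1 - eps))
    return 1 - ede / mu

def kolm(xs, alpha=1.0):
    # Assumed parameter alpha > 0; unlike the others, Kolm measures
    # absolute rather than relative inequality.
    mu = mean(xs)
    return math.log(mean([math.exp(alpha * (mu - x)) for x in xs])) / alpha

# Hypothetical class sizes (SLOC) within one package.
sloc = [12, 15, 20, 22, 180]
for index in (gini, theil, hoover, atkinson, kolm):
    print(f"{index.__name__:>8}: {index(sloc):.3f}")
```

In this sketch Gini, Theil, Hoover and Atkinson do not change when all values are multiplied by a constant, whereas Kolm does not change when a constant is added to all values.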

Page 7: Metrics - You can't control the unfamiliar

Which aggregation technique should we use?

Page 8: Metrics - You can't control the unfamiliar

Questions

1. Which of the different aggregation techniques agree, and to what extent?

2. What is the nature of the relation between the various aggregation techniques?

3. How does the correlation coefficient change as the systems evolve?

Page 9: Metrics - You can't control the unfamiliar

Qualitas Corpus 20101126

• Qualitas Corpus 20101126r, 106 systems

• FitJava v1.1, 2 packages, 2240 SLOC

• NetBeans v6.9.1, 3373 packages, 1890536 SLOC

Page 10: Metrics - You can't control the unfamiliar

1) Agreement between different techniques

• Agreement:

• Aggregation: class-level SLOC is aggregated to the package level

• Techniques agree if they rank the packages similarly

We use a rank-based correlation coefficient: Kendall’s τ
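
The notion of agreement used here (two techniques agree if they rank the packages similarly) can be sketched with a small pure-Python Kendall rank correlation. This is an illustration only: the tau-b variant (which corrects for ties), the function name `kendall_tau_b`, and the package data are all assumptions, not the authors' tooling or data.

```python
import math
from itertools import combinations

def kendall_tau_b(xs, ys):
    """Kendall's tau-b between two equally long sequences of scores."""
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue              # tied in both: ignored by tau-b
        elif dx == 0:
            ties_x += 1           # tied only in the first ranking
        elif dy == 0:
            ties_y += 1           # tied only in the second ranking
        elif dx * dy > 0:
            concordant += 1       # both techniques order the pair the same way
        else:
            discordant += 1       # the techniques disagree on this pair
    denom = math.sqrt((concordant + discordant + ties_x)
                      * (concordant + discordant + ties_y))
    return (concordant - discordant) / denom if denom else float("nan")

# Hypothetical data: SLOC of the classes in four packages.
packages = {
    "a": [10, 12, 11],
    "b": [5, 200],
    "c": [50, 55, 60, 52],
    "d": [300],
}
by_mean = [sum(v) / len(v) for v in packages.values()]
by_sum = [sum(v) for v in packages.values()]
tau = kendall_tau_b(by_mean, by_sum)
print(f"Kendall tau-b between mean and sum: {tau:.3f}")
```

A value near 1 means the two aggregation techniques order the packages almost identically, a value near 0 means the orderings are unrelated, and a value near -1 means they are reversed.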

Page 11: Metrics - You can't control the unfamiliar

1) Agreement: different inequality indices?

• Gini, Theil, Hoover, Atkinson – agree

• aggregates obtained convey the same information

• Kolm does not!

Page 12: Metrics - You can't control the unfamiliar

1) Agreement: traditional and inequality indices?

• mean

• Kolm: strong (0.8) and statistically significant (92%)

• median, standard deviation, and variance

• sum

• does not correlate with any other aggregation technique

Page 13: Metrics - You can't control the unfamiliar

2) Nature of the relation: Typical patterns

• Theil is known to be more sensitive to the rich

• Theil increases faster when Gini increases

• Linear relation with a “fat” head
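
The remark that Theil is more sensitive to the rich than Gini can be illustrated with a small synthetic experiment. The sketch below is an assumption-laden toy, not the paper's analysis: it uses textbook definitions of Gini and Theil, and a made-up package of nine classes with 1 SLOC each plus one class whose size grows.

```python
import math

def gini(xs):
    n, mu = len(xs), sum(xs) / len(xs)
    return sum(abs(a - b) for a in xs for b in xs) / (2 * n * n * mu)

def theil(xs):
    mu = sum(xs) / len(xs)
    return sum((x / mu) * math.log(x / mu) for x in xs if x > 0) / len(xs)

def one_large_class(k, n_small=9):
    # Hypothetical package: n_small classes of 1 SLOC and one class of k SLOC.
    return [1] * n_small + [k]

rows = []
for k in (2, 10, 100):
    xs = one_large_class(k)
    rows.append((k, gini(xs), theil(xs)))

for k, g, t in rows:
    print(f"largest class = {k:>3} SLOC  Gini = {g:.3f}  Theil = {t:.3f}  Theil/Gini = {t / g:.2f}")
```

In this toy setting both indices grow as the single large class grows, but the Theil/Gini ratio also grows, so Theil reacts more strongly to the one "rich" class.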

Page 14: Metrics - You can't control the unfamiliar

Which aggregation technique? (1)

• Theil, Hoover, Gini and Atkinson agree

• Any can be chosen from the correlation point of view

• Some might be “better” in each specific case

• easy to interpret: Gini [0,1]

• provide additional insights: Theil (explanation)

• negative values: Gini, Hoover

− affects the domain!

• sensitive to high values: Theil, Atkinson

• deviations from uniformity: Gini, Hoover

Page 15: Metrics - You can't control the unfamiliar

Which aggregation technique? (2)

• Kolm and mean agree

• Kolm is reliable for skewed distributions

− better alternative (“by no means”)

• Not in the paper:

− agreement observed for NOC

− but not for DIT!

Page 16: Metrics - You can't control the unfamiliar

Conclusions
