Idea


Transcript of Idea

Page 1: Idea

Learning to Change Projects

Raymond Borges, Tim Menzies

Lane Department of Computer Science & Electrical Engineering, West Virginia University

PROMISE’12: Lund, Sweden, Sept 21, 2012


Page 2: Idea

Sound bites

Less prediction, more decision

Data has shape

“Data mining” = “carving” out that shape

To reveal shape, remove irrelevancies

Cut the cr*p. Use reduction operators: dimension, column, row, rule

Show, don’t code

Once you can see the shape, inference is superfluous. Implications for other research.


Pages 3-5: Idea

Decisions, Decisions...

Tom Zimmermann:

“We forget that the original motivation for predictive modeling was making decisions about software projects.”

ICSE 2012 Panel on Software Analytics

“Prediction is all well and good, but what about decision making?”

Predictive models are useful

They focus an inquiry onto particular issues

but predictions are sub-routines of decision processes


Pages 6-14: Idea

Q: How to Build Decision Systems?

1996: T. Menzies, Applications of abduction: knowledge-level modeling, International Journal of Human-Computer Studies

Score contexts e.g. Hate, Love; count frequencies of ranges in each:

Diagnosis = what went wrong. δ = Hate(now) − Love(past)

Monitor = what not to do. δ = Hate(next) − Love(now)

Planning = what to do next. δ = Love(next) − Hate(now)

δ = X − Y = contrast set = things frequent in X but rare in Y

TAR3 (2003), WHICH (2010), etc.

But for PROMISE effort estimation data

Contrast sets are obvious...

... Once you find the underlying shape of the data.
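A minimal sketch of the δ = X − Y contrast-set idea in Python. The data layout, thresholds, and function name here are illustrative assumptions, not the paper's TAR3/WHICH implementations: given two sets of projects scored into contexts, return the attribute ranges frequent in X but rare in Y.

from collections import Counter

def contrast_set(x_rows, y_rows, min_x=0.5, max_y=0.2):
    # Each row is a dict of {attribute: discretized range}.
    # delta = X - Y: ranges frequent in X (>= min_x) but rare in Y (<= max_y).
    def frequencies(rows):
        n = float(len(rows))
        counts = Counter(pair for row in rows for pair in row.items())
        return {pair: count / n for pair, count in counts.items()}
    fx, fy = frequencies(x_rows), frequencies(y_rows)
    return [pair for pair, f in fx.items()
            if f >= min_x and fy.get(pair, 0.0) <= max_y]

# e.g. diagnosis = contrast_set(hate_now, love_past)
#      planning  = contrast_set(love_next, hate_now)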


Page 15: Idea

Q: How to find the underlying shape of the data?

Data mining = data carving

To find the signal in the noise...

Timm’s algorithm

1 Find some cr*p

2 Throw it away

3 Go to 1


Pages 16-18: Idea

IDEA = Iterative Dichotomization on Every Attribute

Timm’s algorithm

1 Find some cr*p

2 Throw it away

3 Go to 1

1 Dimensionality reduction

2 Column reduction

3 Row reduction

4 Rule reduction

And in the reduced data, inference is obvious.


Pages 19-23: Idea

IDEA = Iterative Dichotomization on Every Attribute

1 Dimensionality reduction (recursive fast PCA)

Fastmap (Faloutsos’94)

W = anything

X = furthest from W

Y = furthest from X

Takes time O(2N)

Let c = dist(X,Y)

If Z has distances a, b to X, Y, then Z projects to x = (a² + c² − b²) / (2c)

Platt’05: Fastmap = Nystrom algorithm = fast & approximate PCA
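A sketch of that projection step, assuming a caller-supplied distance function and distinct pivots (c > 0); IDEA's pivot selection and recursion details may differ.

import random

def fastmap_1d(rows, dist):
    # Pick two distant pivots with O(2N) distance calls:
    # W = anything, X = furthest from W, Y = furthest from X.
    w = random.choice(rows)
    x = max(rows, key=lambda r: dist(r, w))
    y = max(rows, key=lambda r: dist(r, x))
    c = dist(x, y)
    # Cosine rule: Z with distances a, b to X, Y projects to (a² + c² − b²)/(2c).
    def project(z):
        a, b = dist(z, x), dist(z, y)
        return (a * a + c * c - b * b) / (2 * c)
    return [project(z) for z in rows]

Recursing on each half of the projected axis gives the "recursive fast PCA" clustering used in the later steps.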


Pages 24-29: Idea

IDEA = Iterative Dichotomization on Every Attribute

1 Dimensionality reduction (recursive fast PCA)

2 Column reduction (info gain)

Sort columns by their diversity

Keep columns that select for fewest clusters

e.g. nine rows in two clusters

cluster c1 has acap=2,3,3,3,3; pcap=3,3,4,5,5

cluster c2 has acap=2,2,2,3; pcap=3,4,4,5

p(acap = 2) = 0.44   p(acap = 3) = 0.55
p(pcap = 3) = p(pcap = 4) = p(pcap = 5) = 0.33

p(acap = 2|c1) = 0.25   p(acap = 2|c2) = 0.75
p(acap = 3|c1) = 0.80   p(acap = 3|c2) = 0.20
p(pcap = 3|c1) = 0.67   p(pcap = 3|c2) = 0.33
p(pcap = 4|c1) = 0.33   p(pcap = 4|c2) = 0.67
p(pcap = 5|c1) = 0.67   p(pcap = 5|c2) = 0.33

I(col) = ∑_x p(x) · ( ∑_c −p(x|c) · log p(x|c) )

I(acap) = 0.239 ← keep
I(pcap) = 0.273 ← prune
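A sketch of that diversity score on the slide's worked example. Assumptions: log base 10 and the conditional cluster frequencies shown above; the exact rounding may differ from the slide's 0.239/0.273, but the ranking, and hence the keep/prune decision, comes out the same.

import math
from collections import Counter

def diversity(values_by_cluster):
    # I(col): expected entropy of the cluster label given a column's value.
    # Lower diversity = the column's ranges select for fewer clusters = keep.
    n = sum(len(vals) for vals in values_by_cluster.values())
    value_counts, cluster_counts = Counter(), {}
    for cluster, vals in values_by_cluster.items():
        for v in vals:
            value_counts[v] += 1
            cluster_counts.setdefault(v, Counter())[cluster] += 1
    score = 0.0
    for v, nv in value_counts.items():
        p_v = nv / n
        entropy = -sum((k / nv) * math.log10(k / nv)
                       for k in cluster_counts[v].values())
        score += p_v * entropy
    return score

acap = {"c1": [2, 3, 3, 3, 3], "c2": [2, 2, 2, 3]}
pcap = {"c1": [3, 3, 4, 5, 5], "c2": [3, 4, 4, 5]}
print(diversity(acap), diversity(pcap))   # acap scores lower, so acap is kept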


Pages 30-32: Idea

IDEA = Iterative Dichotomization on Every Attribute

1 Dimensionality reduction (recursive fast PCA)

2 Column reduction (info gain)

3 Row reduction (replace clusters with their mean)

Replace all leaf cluster instances with their centroid

Centroids are described using only the columns within 50% of the minimum diversity.

e.g. Nasa93 reduces to 12 columns and 13 centroids.
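A minimal sketch of that row-reduction step, assuming purely numeric columns (the paper's handling of symbolic COCOMO ranges may differ, e.g. via modes or medians):

def centroid(rows):
    # One representative row per leaf cluster: the column-wise mean.
    return [sum(col) / len(col) for col in zip(*rows)]

leaf = [[2, 3, 1.1],
        [3, 5, 0.9],
        [3, 4, 1.0]]
print(centroid(leaf))   # ≈ [2.67, 4.0, 1.0]: one centroid replaces three rows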


Page 33: Idea

Nasa93 reduces to 12 columns and 13 centroids


Pages 34-35: Idea

IDEA = Iterative Dichotomization on Every Attribute

1 Dimensionality reduction (recursive fast PCA)

2 Column reduction (info gain)

3 Row reduction (replace clusters with their mean)

4 Rule reduction (contrast home vs neighbors)

Surprise: after steps 1,2,3...

Further computation is superfluous.

Visuals sufficient for contrast set generation


Page 36: Idea

Manual Construction of Contrast Sets

Table 5 = your “home” cluster

Table 6 = projects of similar size

Table 7 = a nearby project with fearsome effort

Contrast set = the delta on the last line


Page 37: Idea

Why Cluster120?

Is it valid that cluster120 costs so much?

Yes, if building core services with cost amortized over N future apps.

No, if racing to get products to a competitive market

We do not know, but at least we are focused on that issue.


Pages 38-39: Idea

Reductions on PROMISE data sets

[Figure: size of the reduced data sets, plotting columns (0-25) against rows (1-100, log scale)]

reduced data set     rows   columns
Albrecht                4         4
China                  66        15
Cocomo81                8        18
Cocomo81e               4        16
Cocomo81o               4        16
Cocomo81s               2        16
Desharnais              8        19
Desharnais L1           6        10
Desharnais L2           4        10
Desharnais L3           2        10
Finnish                 6         2
Kemerer                 2         7
Miyazaki’94             6         3
Nasa93                 13        12
Nasa93 center 5         7        16
Nasa93 center1          2        15
Nasa93 center2          5        16
SDR                     4        21
Telcom1                 2         1

Q: throwing away too much?


Pages 40-42: Idea

Q: Throwing Away Too Much?

Estimates = class variable of nearest centroid in reduced space

Compare to the 90 pre-processor*learner combinations from Kocagueneli et al., TSE 2011, On the Value of Ensemble Learning in Effort Estimation.

Performance measure = MRE = (pred − actual) / actual

9 pre-processors:

1 norm: normalize numerics to 0..1, min..max
2 log: replace numerics of the non-class columns with their logarithms
3 PCA: replace non-class columns with principal components
4 SWReg: cull uninformative columns with stepwise regression
5 Width3bin: divide numerics into 3 bins with boundaries (max-min)/3
6 Width5bin: divide numerics into 5 bins with boundaries (max-min)/5
7 Freq3bins: split numerics into 3 equal-size percentiles
8 Freq5bins: split numerics into 5 equal-size percentiles
9 None: no pre-processor

10 learners:

1 1NN: simple one-nearest-neighbor
2 ABE0-1nn: analogy-based estimation using the nearest neighbor
3 ABE0-5nn: analogy-based estimation using the median of the five nearest neighbors
4 CART(yes): regression trees, with sub-tree post-pruning
5 CART(no): regression trees, no post-pruning
6 NNet: two-layered neural net
7 LReg: linear regression
8 PLSR: partial least squares regression
9 PCR: principal components regression
10 SWReg: stepwise regression
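Returning to the estimate defined above, a sketch of how such a prediction and its MRE might be computed. The centroid record layout and names here are illustrative assumptions, and the MRE code uses the usual absolute value.

def estimate(project, centroids, dist):
    # Prediction = effort value stored with the nearest centroid
    # in the reduced space.
    nearest = min(centroids, key=lambda c: dist(project, c["features"]))
    return nearest["effort"]

def mre(predicted, actual):
    # Magnitude of relative error.
    return abs(predicted - actual) / actual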


Pages 43-44: Idea

Results

Perennial problem with assessing different effort estimation tools: MRE is not normally distributed; it has a low valley and high hills (which injects much variance)

IDEA’s predictions are not better or worse than the others, but they avoid all the hills


Pages 45-46: Idea

Related Work

Cluster using (a) centrality (e.g. k-means); (b) connectedness (e.g. dbScan); (c) separation (e.g. IDEA)

Who                          case-based   clustering                  feature selection   task
Shepperd (1997)              √                                                            predict
Boley (1998)                              recursive PCA                                   predict
Bettenburg et al. (MSR’12)                recursive regression                            predict
Posnett et al. (ASE’11)                   on file/package divisions                       predict
Menzies et al. (ASE’11)      √            FastMap                                         contrast
IDEA                         √            √                           √                   contrast


Page 47: Idea

Back to the Sound bites

Less prediction, more decision

Data has shape

“Data mining” = “carving” out that shape

To reveal shape, remove irrelevancies

Cut the cr*p. IDEA = reduction operators: dimension, column, row, rule

Show, don’t code

Once you can see the shape, inference is superfluous. Implications for other research.


Page 48: Idea

Questions? Comments?
