Learning to Change Projects
Raymond Borges, Tim Menzies
Lane Department of Computer Science & Electrical Engineering, West Virginia University
PROMISE'12: Lund, Sweden, Sept 21, 2012
[email protected] (LCSEE, WVU, USA) Learning to Change Projects PROMISE '12 1 / 18
Sound bites
Less prediction, more decision
Data has shape
"Data mining" = "carving" out that shape
To reveal shape, remove irrelevancies
Cut the cr*p: use reduction operators: dimension, column, row, rule
Show, don't code
Once you can see shape, inference is superfluous. Implications for other research.
Decisions, Decisions...
Tom Zimmermann:
"We forget that the original motivation for predictive modeling was making decisions about software projects."

ICSE 2012 Panel on Software Analytics:
"Prediction is all well and good, but what about decision making?"

Predictive models are useful:
they focus an inquiry onto particular issues,
but predictions are sub-routines of decision processes.
Q: How to Build Decision Systems?
1996: T. Menzies, "Applications of abduction: knowledge-level modeling", International Journal of Human Computer Studies

Score contexts e.g. Hate, Love; count frequencies of ranges in each:
Diagnosis = what went wrong. δ = Hate(now) − Love(past)
Monitor = what not to do. δ = Hate(next) − Love(now)
Planning = what to do next. δ = Love(next) − Hate(now)
δ = X − Y = contrast set = things frequent in X but rare in Y

TAR3 (2003), WHICH (2010), etc.

But for PROMISE effort estimation data:
Contrast sets are obvious...
... once you find the underlying shape of the data.
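The δ = X − Y idea above can be sketched in a few lines of Python. This is a toy illustration, not the TAR3 or WHICH code; the function name and threshold are hypothetical:

```python
from collections import Counter

def contrast_set(x_rows, y_rows, min_delta=0.4):
    """Return (attribute, value) ranges whose frequency in X exceeds
    their frequency in Y by at least min_delta (illustrative helper)."""
    def freqs(rows):
        counts = Counter((a, v) for row in rows for a, v in row.items())
        return {k: n / len(rows) for k, n in counts.items()}
    fx, fy = freqs(x_rows), freqs(y_rows)
    keys = set(fx) | set(fy)
    return sorted(k for k in keys if fx.get(k, 0) - fy.get(k, 0) >= min_delta)

# delta = Love - Hate: ranges frequent in liked projects, rare in hated ones
love = [{"acap": 3, "pcap": 4}, {"acap": 3, "pcap": 5}, {"acap": 3, "pcap": 4}]
hate = [{"acap": 2, "pcap": 4}, {"acap": 2, "pcap": 3}, {"acap": 3, "pcap": 3}]
print(contrast_set(love, hate))  # [('acap', 3)]
```

Swapping the arguments gives the Hate − Love direction used for diagnosis and monitoring.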
Q: How to find the underlying shape of the data?
Data mining = data carving
To find the signal in the noise...
Timm’s algorithm
1 Find some cr*p
2 Throw it away
3 Go to 1
IDEA = Iterative Dichotomization on Every Attribute
Timm’s algorithm
1 Find some cr*p
2 Throw it away
3 Go to 1
1 Dimensionality reduction
2 Column reduction
3 Row reduction
4 Rule reduction
And in the reduced data, inference is obvious.
IDEA = Iterative Dichotomization on Every Attribute

1 Dimensionality reduction (recursive fast PCA)

Fastmap (Faloutsos'94)
W = anything
X = furthest from W
Y = furthest from X
Takes time O(2N)
Let c = dist(X,Y)
If Z has distances a, b to X, Y then Z projects to x = (a² + c² − b²) / (2c)

Platt'05: Fastmap = Nyström algorithm = fast & approximate PCA
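The Fastmap pass above fits in a few lines of Python. A minimal sketch; `fastmap_axis` and the toy points are illustrative, not the paper's code:

```python
import math
import random

def fastmap_axis(rows, dist):
    """One Fastmap pass (Faloutsos'94): pick W at random, X = furthest
    from W, Y = furthest from X, then project every row onto the X-Y
    axis via the cosine rule. O(2N) distance calls to find the pivots."""
    w = random.choice(rows)
    x = max(rows, key=lambda r: dist(w, r))   # furthest from W
    y = max(rows, key=lambda r: dist(x, r))   # furthest from X
    c = dist(x, y)
    def project(z):
        a, b = dist(x, z), dist(y, z)
        return (a * a + c * c - b * b) / (2 * c)
    return [project(z) for z in rows]

euclid = lambda p, q: math.dist(p, q)
pts = [(0, 0), (1, 0), (2, 0), (3, 0)]
print(fastmap_axis(pts, euclid))  # 0,1,2,3 along the axis (direction depends on the random pivot)
```

Recursing on the two halves of each axis gives the "recursive fast PCA" of step 1.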
IDEA = Iterative Dichotomization on Every Attribute

1 Dimensionality reduction (recursive fast PCA)
2 Column reduction (info gain)

Sort columns by their diversity
Keep columns that select for fewest clusters

e.g. nine rows in two clusters:
cluster c1 has acap = 2,3,3,3,3; pcap = 3,3,4,5,5
cluster c2 has acap = 2,2,2,3; pcap = 3,4,4,5

p(acap = 2) = 0.44; p(acap = 3) = 0.55
p(pcap = 3) = p(pcap = 4) = p(pcap = 5) = 0.33

p(c1|acap = 2) = 0.25; p(c2|acap = 2) = 0.75
p(c1|acap = 3) = 0.8;  p(c2|acap = 3) = 0.2
p(c1|pcap = 3) = 0.67; p(c2|pcap = 3) = 0.33
p(c1|pcap = 4) = 0.33; p(c2|pcap = 4) = 0.67
p(c1|pcap = 5) = 0.67; p(c2|pcap = 5) = 0.33

I(col) = Σ_x p(x) · (Σ_c −p(c|x) · log p(c|x))

I(acap) = 0.239 ← keep
I(pcap) = 0.273 ← prune
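The diversity score I(col) above can be sketched directly: the expected entropy of the cluster label given the column's value (lower = the column better selects for clusters). Base-2 logs are an assumption here, so the absolute numbers use a different scaling than the slide's, but the ranking (keep acap, prune pcap) agrees:

```python
import math
from collections import Counter

def diversity(values, clusters):
    """I(col): sum over values x of p(x) times the entropy of the
    cluster label among rows where the column takes value x."""
    n = len(values)
    score = 0.0
    for v, nv in Counter(values).items():
        px = nv / n
        cs = Counter(c for x, c in zip(values, clusters) if x == v)
        ent = -sum((k / nv) * math.log2(k / nv) for k in cs.values())
        score += px * ent
    return score

# nine rows in two clusters, from the example above
acap = [2, 3, 3, 3, 3, 2, 2, 2, 3]
pcap = [3, 3, 4, 5, 5, 3, 4, 4, 5]
cls = ["c1"] * 5 + ["c2"] * 4
print(diversity(acap, cls) < diversity(pcap, cls))  # True: keep acap, prune pcap
```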
IDEA = Iterative Dichotomization on Every Attribute
1 Dimensionality reduction (recursive fast PCA)
2 Column reduction (info gain)
3 Row reduction (replace clusters with their mean)
Replace all leaf cluster instances with their centroid
Each centroid is described using only the columns within 50% of the minimum diversity.
e.g. Nasa93 reduces to 12 columns and 13 centroids.
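Step 3, row reduction, is a column-wise mean per leaf cluster. A minimal sketch, assuming the cluster labels come from the recursive Fastmap splits of step 1; names and data are illustrative:

```python
def centroids(rows, labels):
    """Replace each leaf cluster's rows with a single centroid
    (the column-wise mean). rows: numeric tuples; labels: one
    leaf-cluster id per row."""
    groups = {}
    for row, lab in zip(rows, labels):
        groups.setdefault(lab, []).append(row)
    return {lab: tuple(sum(col) / len(col) for col in zip(*members))
            for lab, members in groups.items()}

rows = [(1, 10), (3, 12), (8, 2), (10, 4)]
labels = ["left", "left", "right", "right"]
print(centroids(rows, labels))  # {'left': (2.0, 11.0), 'right': (9.0, 3.0)}
```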
Nasa93 reduces to 12 columns and 13 centroids
IDEA = Iterative Dichotomization on Every Attribute
1 Dimensionality reduction (recursive fast PCA)
2 Column reduction (info gain)
3 Row reduction (replace clusters with their mean)
4 Rule reduction (contrast home vs neighbors)
Surprise: after steps 1,2,3...
Further computation is superfluous.
Visuals sufficient for contrast set generation
Manual Construction of Contrast Sets
Table 5 = your "home" cluster
Table 6 = projects of similar size
Table 7 = nearby project with fearsome effort
Contrast set = delta on last line
Why Cluster120?
Is it valid that cluster120 costs so much?
Yes, if building core services with cost amortized over N future apps.
No, if racing to get products to a competitive market.
We do not know, but at least we are focused on that issue.
Reductions on PROMISE data sets
[Figure: rows (x-axis, log scale, 1 to 100) vs. columns (y-axis, 0 to 25) of the reduced data sets]

Sizes of the reduced data sets:

data set         rows   columns
Albrecht            4         4
China              66        15
Cocomo81            8        18
Cocomo81e           4        16
Cocomo81o           4        16
Cocomo81s           2        16
Desharnais          8        19
Desharnais L1       6        10
Desharnais L2       4        10
Desharnais L3       2        10
Finnish             6         2
Kemerer             2         7
Miyazaki'94         6         3
Nasa93             13        12
Nasa93 center 5     7        16
Nasa93 center1      2        15
Nasa93 center2      5        16
SDR                 4        21
Telcom1             2         1
Q: throwing away too much?
Q: Throwing Away Too Much?
Estimates = class variable of nearest centroid in reduced space

Compare to 90 pre-processor * learner combinations from Kocagueneli et al., TSE 2011, "On the Value of Ensemble Learning in Effort Estimation".

Performance measure = MRE = |predicted − actual| / actual

9 pre-processors:
1 norm: normalize numerics 0..1, min..max
2 log: replace numerics of the non-class columns with their logarithms
3 PCA: replace non-class columns with principal components
4 SWReg: cull uninformative columns with stepwise regression
5 Width3bin: divide numerics into 3 bins with boundaries (max−min)/3
6 Width5bin: divide numerics into 5 bins with boundaries (max−min)/5
7 Freq3bins: split numerics into 3 equal-size percentiles
8 Freq5bins: split numerics into 5 equal-size percentiles
9 None: no pre-processor

10 learners:
1 1NN: simple one nearest neighbor
2 ABE0-1nn: analogy-based estimation using the nearest neighbor
3 ABE0-5nn: analogy-based estimation using the median of the five nearest neighbors
4 CART(yes): regression trees, with sub-tree post-pruning
5 CART(no): regression trees, no post-pruning
6 NNet: two-layered neural net
7 LReg: linear regression
8 PLSR: partial least squares regression
9 PCR: principal components regression
10 SWReg: stepwise regression
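The evaluation above can be sketched as nearest-centroid estimation scored by MRE. A minimal sketch under stated assumptions: the helper names, the Euclidean distance, and the toy centroids are illustrative, not the study's code:

```python
import math

def estimate(project, reduced):
    """Predict effort as the class value of the nearest centroid
    in the reduced space. reduced: list of (features, effort) pairs."""
    _, effort = min(reduced, key=lambda c: math.dist(project, c[0]))
    return effort

def mre(predicted, actual):
    """Magnitude of relative error: |predicted - actual| / actual."""
    return abs(predicted - actual) / actual

reduced = [((2, 3), 100), ((5, 5), 400)]   # hypothetical (centroid, effort) pairs
project, actual = (2, 4), 120
pred = estimate(project, reduced)
print(pred, round(mre(pred, actual), 2))   # 100 0.17
```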
Results
A perennial problem with assessing different effort estimation tools: MRE is not normally distributed. It has a low valley and high hills, which injects much variance.
IDEA's predictions are neither better nor worse than the others', but they avoid all the hills.
Related Work
Cluster using (a) centrality (e.g. k-means); (b) connectedness (e.g. DBSCAN); (c) separation (e.g. IDEA)

Who                         case-based   clustering                  feature selection   task
Shepperd (1997)             √                                                            predict
Boley (1998)                             recursive PCA                                   predict
Bettenburg et al. (MSR'12)               recursive regression                            predict
Posnett et al. (ASE'11)                  on file/package divisions                       predict
Menzies et al. (ASE'11)     √            FastMap                                         contrast
IDEA                        √            √                           √                   contrast
Back to the Sound bites
Less prediction, more decision
Data has shape
"Data mining" = "carving" out that shape
To reveal shape, remove irrelevancies
Cut the cr*p: IDEA = reduction operators: dimension, column, row, rule
Show, don't code
Once you can see shape, inference is superfluous. Implications for other research.
Questions? Comments?