Promise 2011: "Local Bias and its Impacts on the Performance of Parametric Estimation Models"
Institute of Software, Chinese Academy of Sciences
Local Bias and its Impacts on the Performance of Parametric Estimation Models
Ye Yang, Lang Xie, Zhimin He (ISCAS)
Qi Li, Vu Nguyen, Barry Boehm (USC)
Ricardo Valerdi (MIT/Univ. of Arizona)
Sep. 21, 2011
Promise 2011, Banff, Canada

Outline
• Background
• Research questions
• Measuring local bias
• Measuring the impacts of local bias
• Handling local bias
• Conclusions and future work

Background
• Continuously calibrated and validated parametric models are necessary for realistic software estimates.
[Figure: roles around a parametric model: model user, model maintainer, model researcher]

Background (Cont.)
• Typical parametric models are calibrated over a broad range of industry data.
• Local calibration is advocated to improve accuracy over the default model calibration.
• Pros and cons of local calibration (local tuning):
  • Pros: better model performance
  • Cons: the locally tuned model is less likely to remain fully compliant with the general model

Background (Cont.)
• The evolution cycle of a parametric model
• Mismatches between “general assumptions” and “local assumptions”
• The resulting tuning variance has drawn increasing research attention
• Counter-intuitive calibration results
• Challenges in using unbalanced datasets to develop and evaluate the general model
[Figure: evolution cycle of a parametric model with the stages Model Building, Model Calibration, Model Localization, and Model Usage; other labels: general assumptions, underlying model, calibration data, local data, local assumptions, historical data, model updates]

Example: COCOMO II model
• COCOMO II model:
$$\mathit{Effort} = A \times \mathit{Size}^{\,B + 0.01 \sum_{j=1}^{5} SF_j} \times \prod_{i=1}^{17} EM_i$$
• Range of local tuning parameters:
  • Yang and Clark, CII database experience: 1 <= A <= 4
  • Menzies: (2.2 <= A <= 9.18) and (0.88 <= B <= 1.09)
[Figure: plot with axes Ln_Size and Ln_effort]
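
The COCOMO II effort equation can be sketched in a few lines of Python. This is a minimal illustration rather than a calibrated tool: the function name `cocomo2_effort` and the example ratings are hypothetical, size is assumed to be in KSLOC, and the default constants A=2.94 and B=0.91 are the ones quoted later in this deck.

```python
import math

def cocomo2_effort(size_ksloc, scale_factors, effort_multipliers, a=2.94, b=0.91):
    """Effort = A * Size^(B + 0.01 * sum(SF_j)) * prod(EM_i)."""
    exponent = b + 0.01 * sum(scale_factors)
    return a * size_ksloc ** exponent * math.prod(effort_multipliers)

# Hypothetical ratings: 5 scale factors at 3.0 each, 17 effort multipliers at 1.0.
effort = cocomo2_effort(100.0, [3.0] * 5, [1.0] * 17)
```

Local calibration, as discussed in this deck, only changes `a` and `b`; the scale factors and effort multipliers keep their meaning.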

Research questions
• Research questions:
  • Is there a way to measure local bias?
  • As historical data accumulates from multiple companies, how will the associated local bias impact the performance of the general parametric estimation model?
  • Are there any correlation patterns between local bias and variation in model performance?
• Assumptions:
  • The general parametric model follows a structure similar to that of COCOMO II.
  • In the model localization stage, constants A and B are tuned with local data.
  • In the model usage stage, the locally calibrated A and B are used for project estimation.

Local Bias Definition
• Local bias: the degree of deviation between a local model and the general model
• In the context of the CII model:
$$\mathit{local\_bias} = \left| \ln\frac{\mathit{Effort}'}{\mathit{Effort}} \right| = \left| \ln\frac{A'}{A} + (B' - B)\,\ln(\mathit{Size}) \right|$$
where A’ and B’ are the model parameters calibrated from the local data of each organization, A and B are the default constant values of the COCOMO II model (A=2.94, B=0.91), and a standard size of 100 KLOC is used to normalize local bias.
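
A minimal Python sketch of this definition follows. The function name `local_bias` and the example parameters are hypothetical; the defaults are the COCOMO II constants and the 100 KLOC standard size quoted above, with size assumed to be expressed in KLOC.

```python
import math

DEFAULT_A = 2.94        # default COCOMO II constant A
DEFAULT_B = 0.91        # default COCOMO II constant B
STANDARD_SIZE = 100.0   # standard size (KLOC) used to normalize local bias

def local_bias(a_local, b_local, a=DEFAULT_A, b=DEFAULT_B, size=STANDARD_SIZE):
    """Return |ln(A'/A) + (B' - B) * ln(Size)|."""
    return abs(math.log(a_local / a) + (b_local - b) * math.log(size))

# Made-up local parameters for illustration, not values from the dataset.
bias = local_bias(a_local=3.5, b_local=1.0)
```

A local model that keeps the default constants has a local bias of exactly zero, and the measure grows as either A’ or B’ moves away from the defaults.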

Summary of Dataset
[Figure: the CII 2010 dataset, labeled with the CII 2000 subset and the After2000 subset]

Analysis procedure
• Break the After2000 subset into 10 subsets, grouped by Organization_ID.
• Conduct a local calibration on each subset to produce its A’ and B’.
• Calculate the local bias of each subset against the default constants A and B, and compare among groups.
[Figure: the CII 2010 dataset consists of the CII 2000 subset and the After2000 subset; the After2000 subset is grouped by Organization_ID into Subset1, Subset2, …, Subsetn, yielding (A1’, B1’), (A2’, B2’), …, (An’, Bn’) and local_bias1, local_bias2, …, local_biasn relative to the default constants A, B]
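
The three steps can be sketched as follows. The slides do not state how the local calibration is performed, so this sketch assumes ordinary least squares on ln(Effort) = ln(A’) + B’ · ln(Size), with scale factors and effort multipliers left out. The record layout `(org_id, size, effort)` and the function names are hypothetical, and the data are synthetic.

```python
import math
from collections import defaultdict

def calibrate(projects):
    """Fit ln(effort) = ln(A') + B' * ln(size) by ordinary least squares.

    Needs at least two projects with distinct sizes.
    """
    xs = [math.log(size) for size, _ in projects]
    ys = [math.log(effort) for _, effort in projects]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b_local = sxy / sxx
    a_local = math.exp(mean_y - b_local * mean_x)
    return a_local, b_local

def local_bias_by_group(records, a=2.94, b=0.91, standard_size=100.0):
    """Group records by organization, calibrate each group, return its local bias."""
    groups = defaultdict(list)
    for org_id, size, effort in records:
        groups[org_id].append((size, effort))
    biases = {}
    for org_id, projects in groups.items():
        a_local, b_local = calibrate(projects)
        biases[org_id] = abs(math.log(a_local / a)
                             + (b_local - b) * math.log(standard_size))
    return biases

# Synthetic records: org1 follows A'=4.0, B'=1.0; org2 follows the defaults.
records = [("org1", s, 4.0 * s ** 1.0) for s in (10.0, 50.0, 200.0)]
records += [("org2", s, 2.94 * s ** 0.91) for s in (20.0, 80.0, 300.0)]
biases = local_bias_by_group(records)
```

With noise-free synthetic data the fit recovers each group's parameters, so the group that follows the default constants ends up with a local bias of essentially zero.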

Measuring local bias - Results
[Table: parameters of local models]
[Figure: local bias of each group]
• Local A and B differ in each group, indicating that local calibration introduces local bias.
• Local bias varies across groups, ranging from 0.06 to 2.25.
  • E.g., in group 9 the ratio between the local model’s estimate and the CII model’s estimate is as large as EXP(2.25) = 9.49 for a project at the standard size of 100 KSLOC.
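
Because local bias is the absolute value of a log ratio, exponentiating it gives the factor by which the two estimates differ. A short check of the two extremes reported here (the helper name `effort_ratio` is hypothetical):

```python
import math

def effort_ratio(bias):
    """Factor between local and general estimates implied by a local bias value."""
    return math.exp(bias)

# The smallest and largest local bias values reported for the groups.
ratios = {bias: round(effort_ratio(bias), 2) for bias in (0.06, 2.25)}
```

The absolute value hides the direction, so the factor alone does not say whether the local estimate is the larger or the smaller of the two.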

Measuring the impacts of local bias
• Performance assessment
  • Basic performance indicators: MMRE (the mean of MRE) and stdMRE (the standard deviation of MRE)
  • Assessment procedure:
    1. Split the data set into a training set and a test set.
    2. Tune the model parameters on the training set.
    3. Evaluate model performance on the test set, recording MMRE and stdMRE.
    4. Repeat the above steps 2000 times to obtain 2000 (MMRE, stdMRE) pairs.
  • Average MMRE, range of MMRE, average stdMRE, and range of stdMRE, computed over the 2000 pairs, are used to assess the performance of an estimation model.
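
The assessment loop can be sketched as below. Several details are assumptions, since the slides do not give them: a random 70/30 split, tuning of A and B by least squares in log space, MRE taken as |actual − estimate| / actual, stdMRE as the sample standard deviation, and "range" as maximum minus minimum. The names `fit_ab` and `assess` are hypothetical, and the projects are synthetic.

```python
import math
import random
import statistics

def fit_ab(train):
    """Tune A and B by least squares on ln(effort) = ln(A) + B * ln(size)."""
    xs = [math.log(size) for size, _ in train]
    ys = [math.log(effort) for _, effort in train]
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return math.exp(mean_y - b * mean_x), b

def assess(projects, repeats=2000, test_fraction=0.3, seed=0):
    """Repeat split / tune / evaluate and summarize the (MMRE, stdMRE) pairs."""
    rng = random.Random(seed)
    n_test = max(2, round(len(projects) * test_fraction))
    mmres, std_mres = [], []
    for _ in range(repeats):
        shuffled = list(projects)
        rng.shuffle(shuffled)
        test, train = shuffled[:n_test], shuffled[n_test:]
        a, b = fit_ab(train)
        mres = [abs(effort - a * size ** b) / effort for size, effort in test]
        mmres.append(statistics.mean(mres))
        std_mres.append(statistics.stdev(mres))
    return {
        "avg_mmre": statistics.mean(mmres),
        "range_mmre": max(mmres) - min(mmres),
        "avg_std_mre": statistics.mean(std_mres),
        "range_std_mre": max(std_mres) - min(std_mres),
    }

# Synthetic projects (size, effort) with log-normal noise, for illustration only.
noise = random.Random(1)
projects = [(float(s), 2.94 * s ** 0.91 * math.exp(noise.gauss(0, 0.3)))
            for s in range(10, 310, 10)]
summary = assess(projects, repeats=200)
```

The two range values capture how much the result depends on which projects happen to land in the test set, which is why the deck treats them as a sign of performance uncertainty.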

Analysis procedure
• First, for each group ssi in the After2000 subset:
  1. Combine ssi with the CII 2000 data set to produce a new data set dsi.
  2. Assess model performance on data set dsi and record the values of the performance indicators.
• Then conduct a correlation analysis between local bias and model performance.
[Figure: each group (SS1, SS2, …) is combined with the CII 2000 subset; the resulting performance and the group’s local bias feed the correlation analysis]

Results
• Model performance:

          CII 2000   CII 2010
MMRE      0.3478     0.4063
StdMRE    0.3261     0.3401

• Model performance decreases as new subsets are introduced, reflecting the uncertainty inherent in model performance when just a small group of new data points is added to the CII 2000 baseline dataset.

Measuring the impacts of local bias (cont.)
• Spearman correlation coefficients between local bias and model performance:
[Table: Spearman correlation coefficients]
• At a significance level of p < 0.05, the range of stdMRE is significantly positively correlated with both local bias and local_bias*num (local bias multiplied by the number of data points in the group). Both the average stdMRE and the average MMRE are significantly positively correlated with local_bias*num.
• The range of stdMRE reflects the uncertainty of model performance. Hence, the larger the local bias, the weaker the performance.
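
For readers unfamiliar with the statistic, the Spearman coefficient is the Pearson correlation of the ranks of the two series. The sketch below computes it in plain Python with average ranks for ties. It does not compute p-values; a statistics package such as SciPy (`scipy.stats.spearmanr`) would normally be used for that. The input values are made up for illustration and are not the study's data.

```python
import math

def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)

# Made-up per-group values: local bias and range of stdMRE.
group_bias = [0.1, 0.5, 0.8, 1.4, 2.0]
group_range_std = [0.02, 0.05, 0.04, 0.09, 0.12]
rho = spearman(group_bias, group_range_std)
```

Using ranks makes the coefficient sensitive only to whether performance degrades monotonically with local bias, not to the shape of that relationship.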

Discussions
• Two types of measures:
  • Local bias: useful for bridging the potential gaps between the “model building” stage and the “model localization” stage
  • Performance measures: the range and average of MMRE and stdMRE are easy to produce and reflect a certain profile of the bias’s influence
• Two components drive the decrease in model performance: the degree of local bias and the number of data points associated with each additional group.

Implications for Parametric Model Calibration
• Previous approaches:
  • Data pre-processing: reducing factors, removing outliers, etc.
  • Regression-based approaches: variants of standard linear regression, incorporating a priori knowledge
  • Machine learning approaches: mainly focus on optimizing model accuracy
• Need to pay attention to balancing accuracy and stability

Threats to Validity
• Other sources of bias? Chronological bias, influences of new technologies, etc.
• Other performance indicators? PRED, MRE, etc.
• Other parametric models?

Ongoing work on handling local bias
• Assumption: a local historical data set with higher local bias follows a cost estimation pattern that differs more from the general one, so it should be assigned a lower weight when used for model calibration.
• Constraints for the weight distribution function Weight = F(LocalBias):
  • IF LocalBias = 0, THEN Weight = 1;
  • IF LocalBias → +∞, THEN Weight → 0;
  • F should be a decreasing function on the interval [0, +∞).
• Three functions:
$$F_1: \mathit{Weight} = \frac{1}{1 + \mathit{LocalBias}}$$
$$F_2: \mathit{Weight} = \frac{1}{1 + \ln(1 + \mathit{LocalBias})}$$
$$F_3: \mathit{Weight} = \frac{1}{e^{\mathit{LocalBias}}}$$
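
The three candidate weight functions can be written down and checked against the stated constraints as in the sketch below. The function names are hypothetical. The logarithmic form is assumed to be 1 / (1 + ln(1 + LocalBias)), since that is the reading under which Weight = 1 at LocalBias = 0.

```python
import math

def weight_f1(bias):
    """F1: Weight = 1 / (1 + LocalBias)."""
    return 1.0 / (1.0 + bias)

def weight_f2(bias):
    """F2 (assumed form): Weight = 1 / (1 + ln(1 + LocalBias))."""
    return 1.0 / (1.0 + math.log(1.0 + bias))

def weight_f3(bias):
    """F3: Weight = 1 / e^LocalBias."""
    return math.exp(-bias)

def is_decreasing(func, points):
    """Check that func strictly decreases over an increasing list of points."""
    values = [func(p) for p in points]
    return all(earlier > later for earlier, later in zip(values, values[1:]))

grid = [0.0, 0.06, 0.5, 1.0, 2.25, 10.0]
checks = {f.__name__: (f(0.0) == 1.0, is_decreasing(f, grid))
          for f in (weight_f1, weight_f2, weight_f3)}
```

The three functions differ mainly in how quickly they discount a data set: at the same local bias, the exponential form drops fastest and the logarithmic form slowest.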

Conclusions
• Provided a definition for consistently understanding and measuring local bias.
• The impact assessment and correlation analysis verify that local bias can be harmful to general model performance.
• Offered insights to ease parametric model evolution by identifying and avoiding local bias early in the data collection stage.
• A better approach to handling local bias is needed, e.g., employing machine learning to learn local bias and to learn how to improve the model structure to counteract the bias.

Thank you!
Contact:
Ye Yang ([email protected])