On the application of GP for software engineering predictive modeling: A systematic review
Expert Systems with Applications, Vol. 38, no. 9, 2011
Wasif Afzal, Richard Torkar
Blekinge Institute of Technology,
Karlskrona, Sweden.
{waf,rto}@bth.se
Agenda
• Research question
• Symbolic regression
• Prediction and estimation in sw engineering
• GP for prediction and estimation in sw engineering
• Application of GP for sw quality classification
• Application of GP for sw cost/effort/size estimation
• Application of GP for sw fault prediction and sw reliability growth modeling
• Future work
• Conclusions
• Recommendations
Our research question
• Is there evidence that symbolic regression using GP is an effective method for prediction and estimation, in comparison with regression, machine learning and other models (including expert opinion and different improvements over the standard GP algorithm)?
It is about symbolic regression!
• Symbolic regression
– One of the many application areas of GP
– Finds a function whose outputs match the desired outcomes.
– Makes no assumptions about:
• Structure of the function
• Data distribution
• Relationship between independent and dependent variables
• Helps in identifying the significant variables in subsequent modeling attempts
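To make the idea concrete, here is a toy sketch of symbolic regression in Python. It is not the setup used in any of the reviewed studies: the function set (+, -, *), the mutation-only variation, the truncation selection and all names (`random_tree`, `mutate`, `evolve`) are assumptions chosen to keep the example short. Standard GP would normally also use crossover.

```python
import operator
import random

# Assumed function set for this sketch; real GP setups are usually richer.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def random_tree(rng, depth=3):
    """Grow a random expression tree over x, small integer constants and OPS."""
    if depth <= 0 or rng.random() < 0.3:
        return "x" if rng.random() < 0.5 else rng.randint(-2, 2)
    op = rng.choice(list(OPS))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))

def evaluate(tree, x):
    """Evaluate a tree: tuples are operators, 'x' is the input, ints are constants."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, x), evaluate(right, x))
    return x if tree == "x" else tree

def mse(tree, xs, ys):
    """Fitness: mean squared error of the tree on the data (lower is better)."""
    return sum((evaluate(tree, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def mutate(tree, rng, depth=3):
    """Replace one randomly chosen subtree with a freshly grown one."""
    if not isinstance(tree, tuple) or rng.random() < 0.2:
        return random_tree(rng, depth)
    op, left, right = tree
    if rng.random() < 0.5:
        return (op, mutate(left, rng, depth - 1), right)
    return (op, left, mutate(right, rng, depth - 1))

def evolve(xs, ys, generations=50, pop_size=200, seed=0):
    """Keep the best quarter each generation and refill with mutated copies."""
    rng = random.Random(seed)
    population = [random_tree(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda t: mse(t, xs, ys))
        survivors = population[: pop_size // 4]
        children = [mutate(rng.choice(survivors), rng)
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    return min(population, key=lambda t: mse(t, xs, ys))

if __name__ == "__main__":
    xs = list(range(-5, 6))
    ys = [x * x + x for x in xs]  # hidden target the search should approximate
    best = evolve(xs, ys)
    print(best, mse(best, xs, ys))
```

Note that the search is given only data points: neither the structure of the function nor the data distribution is specified in advance, which is the property the slide highlights.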
Prediction and estimation in sw engineering
• Software quality
– Software quality classification
– Software fault prediction
– Software reliability growth modeling
• Software size
• Software development cost/effort
• Maintenance task effort
• Software release timing
GP for prediction and estimation in sw engineering
• 23 identified primary studies
– Software quality classification (8)
– Software cost/effort/size estimation (7)
– Software fault prediction and software reliability growth modeling (8)
GP for prediction and estimation in sw engineering cntd…
Application of GP for sw quality classification (8 studies)
• Variations of the dependent variable:
– Fault proneness
– Quality ranking of program modules (high risk to low risk)
• Variations in sampling of training and testing sets:
– Simple hold-out and 10-fold CV.
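The two sampling schemes can be sketched as index splits in plain Python. The function names, the 30% test fraction and the fixed seed are assumptions for illustration; the primary studies may have sampled differently.

```python
import random

def holdout_split(n, test_fraction=0.3, seed=0):
    """Simple hold-out: shuffle the indices once and cut off a test portion."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(round(n * test_fraction))
    return idx[n_test:], idx[:n_test]  # (train, test)

def k_fold_splits(n, k=10, seed=0):
    """k-fold CV: every index lands in exactly one test fold."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]

if __name__ == "__main__":
    train, test = holdout_split(100)
    print(len(train), len(test))
    for train, test in k_fold_splits(100, k=10):
        print(len(train), len(test))
```

With hold-out, a model is judged on a single split; with 10-fold CV, the reported figure is typically the average over the ten test folds.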
Application of GP for sw quality classification cntd…
• Variations in fitness function
– Single objective
• Minimization of root mean square error
• Minimization of average cost of misclassification
– Multi-objective
• Minimization of average cost of misclassification + minimization of tree size
• Maximization of the best percentage of the actual faults averaged over the percentile levels of interest + controlling the tree size.
• Balancing the over sampling and under sampling in each class for a decision tree.
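As a rough illustration of the first few fitness functions listed above, the sketch below computes a root mean square error, an average misclassification cost, and a two-objective fitness that adds tree size. The cost weights, the 0/1 label encoding and the lexicographic tie-break on tree size are assumptions made for this example, not the definitions used in the primary studies.

```python
import math

def rmse(predicted, actual):
    """Root mean square error between predicted and actual values."""
    return math.sqrt(
        sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)
    )

def average_misclassification_cost(predicted, actual,
                                   cost_type1=1.0, cost_type2=10.0):
    """Average cost per module, with labels 1 = fault-prone, 0 = not fault-prone.

    Type I error: a not-fault-prone module flagged as fault-prone.
    Type II error: a fault-prone module that is missed.
    The default cost weights are placeholders, not values from the studies.
    """
    type1 = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 0)
    type2 = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 1)
    return (cost_type1 * type1 + cost_type2 * type2) / len(actual)

def two_objective_fitness(predicted, actual, tree_size):
    """Tuple fitness: compare cost first, then prefer the smaller tree."""
    return (average_misclassification_cost(predicted, actual), tree_size)

if __name__ == "__main__":
    predicted = [1, 0, 0, 1]
    actual = [0, 1, 0, 1]
    print(average_misclassification_cost(predicted, actual))
    # Same cost, so the smaller tree (5 nodes) ranks ahead of the larger one.
    print(two_objective_fitness(predicted, actual, 5)
          < two_objective_fitness(predicted, actual, 9))
```

Python compares tuples element by element, so sorting candidates by `two_objective_fitness` ranks by cost and uses tree size only to break ties. Other schemes, such as weighted sums or Pareto ranking, are equally plausible.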
Application of GP for sw quality classification cntd…
• Variations in comparison groups:
– Neural networks
– k-nearest neighbour
– Regression (linear, logistic)
– Humans
Application of GP for sw quality classification cntd…
• Results:
– Majority of the studies (6 out of 8) reported results in favor of using GP for the classification task.
• Limitations:
– Increase the comparisons with a more representative set of techniques.
– Increase the use of publicly available data sets for easier replications.
Application of GP for sw quality classification cntd…
• Encouraging aspects:
– The datasets used represent real-world projects.
– Problem dependent objectives represented in fitness functions perform better than standard GP.
Application of GP for sw cost/effort/size (CES) estimation (7 studies)
• Variations of the dependent variable
– Software effort
– Software cost
– Software size
• Variations in fitness function
– Single objective
• Minimization of mean squared error or mean magnitude of relative error (MMRE)
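For reference, the two error measures named above can be computed as follows. MMRE is taken here in its usual form, the mean of |actual - predicted| / actual over all projects; the function names and sample figures are illustrative only.

```python
def mean_squared_error(actual, predicted):
    """Mean of squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def mmre(actual, predicted):
    """Mean magnitude of relative error; assumes every actual value is non-zero."""
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)

if __name__ == "__main__":
    actual_effort = [100, 200]     # made-up effort values, e.g. person-hours
    predicted_effort = [90, 250]
    print(mean_squared_error(actual_effort, predicted_effort))
    print(mmre(actual_effort, predicted_effort))
```

Because MMRE is relative, a 10-unit miss on a small project weighs more than the same miss on a large one, whereas mean squared error is dominated by the largest absolute misses. A model can therefore rank differently under the two measures.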
Application of GP for sw cost/effort/size (CES) estimation cntd…
• Variations in comparison groups
– ANN, nearest neighbour and different forms of regression.
• Variations in sampling of training and testing sets
– Simple hold-out.
Application of GP for sw cost/effort/size (CES) estimation cntd…
• Results
– No strong evidence of GP performing consistently on all evaluation measures used.
• Limitations
– Evaluation measures used are not standardized.
– Different hold-out samplings for train and test sets.
– Lack of statistical hypothesis testing.
– Lack of comparison groups.
Application of GP for sw fault prediction and sw reliability growth modeling (8 studies)
• Variations of the dependent variable
– SW fault prediction
– SW reliability growth modeling
• Variations in fitness function
– Single objective:
• Minimization of standard error
Application of GP for sw fault prediction and sw reliability growth modeling cntd…
• Variations in comparison groups
– Standard GP, Naive Bayes, traditional software reliability growth models.
• Variations in sampling of training and testing sets
– Hold-out and 10-fold CV
Application of GP for sw fault prediction and sw reliability growth modeling cntd…
• Results:
– 7 out of 8 studies favor the use of GP.
• Limitations:
– Poor representation of comparison groups
– Absence of a baseline to compare to.
Promising future work to undertake
• Multi-objective fitness evaluation (e.g. minimization of standard error and maximization of correlation coefficient)
• Simplification of GP solutions to help interpretation of relationships between variables.
• Evaluation of techniques to minimize overfitting of GP solutions.
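One possible reading of the multi-objective suggestion above is a Pareto comparison over two objectives. In the sketch below, "standard error" is approximated by the root mean squared residual and the correlation coefficient is Pearson's r, negated so that both objectives are minimized. These choices and the function names are assumptions, not a method taken from the reviewed studies.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation; undefined (division by zero) for constant inputs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def objectives(predicted, actual):
    """Return (error, -correlation) so that smaller is better on both."""
    error = math.sqrt(
        sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)
    )
    return (error, -pearson_r(predicted, actual))

def dominates(a, b):
    """a Pareto-dominates b: no worse on every objective, better on at least one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

if __name__ == "__main__":
    actual = [1, 2, 3, 4]
    exact = objectives([1, 2, 3, 4], actual)
    shifted = objectives([2, 3, 4, 5], actual)  # same correlation, larger error
    print(dominates(exact, shifted))
```

A GP run built on this would keep the set of non-dominated solutions and would not collapse the two objectives into a single score.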
Conclusions
• A total of 23 studies apply GP for predictive studies in sw engineering:
– sw quality classification (8)
– sw cost/effort/size estimation (7)
– sw fault prediction and sw reliability growth modeling (8)
• There is evidence in support of using GP for:
– sw quality classification
– sw fault prediction and SW reliability growth modeling
• but not for:
– sw cost/effort/size estimation.
Recommendations
• Use public data sets wherever possible.
• Apply commonly used sampling strategies.
• Use techniques to avoid overfitting in GP solutions.
• Report the settings of GP parameters.
• Compare the performances against a commonly used baseline.
• Use statistical experimental designs.