On the application of GP for software engineering predictive modeling: A systematic review

Expert Systems with Applications, Vol. 38, No. 9, 2011

Wasif Afzal, Richard Torkar

Blekinge Institute of Technology,

Karlskrona, Sweden.

{waf,rto}@bth.se

Agenda

• Research question

• Symbolic regression

• Prediction and estimation in sw engineering

• GP for prediction and estimation in sw engineering

• Application of GP for sw quality classification

• Application of GP for sw cost/effort/size estimation

• Application of GP for sw fault prediction and sw reliability growth modeling

• Future work

• Conclusions

• Recommendations

Our research question

• Is there evidence that symbolic regression using GP is an effective method for prediction and estimation, in comparison with regression, machine learning and other models (including expert opinion and different improvements over the standard GP algorithm)?

It is about symbolic regression!

• Symbolic regression: one of the many application areas of GP

– Finds a function whose outputs match the desired outcomes.

– Makes no assumptions about:

• Structure of the function

• Data distribution

• Relationship between independent and dependent variables

• Helps in identifying the significant variables in subsequent modeling attempts
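To make the idea concrete, here is a minimal, mutation-only sketch of symbolic regression in plain Python. It is not the setup of any of the reviewed studies: the function set, the terminal set, the population size, the tournament size and the target function x^2 + x are all assumptions chosen for illustration, and a full GP system would normally also use crossover.

```python
import math
import operator
import random

# Function set: binary operators the evolved expressions may use (assumed).
FUNCTIONS = [(operator.add, "+"), (operator.sub, "-"), (operator.mul, "*")]
# Terminal set: the input variable "x" and two constants (assumed).
TERMINALS = ["x", 1.0, 2.0]


def random_tree(rng, depth):
    """Grow a random expression tree as nested tuples (func_index, left, right)."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    return (rng.randrange(len(FUNCTIONS)),
            random_tree(rng, depth - 1),
            random_tree(rng, depth - 1))


def evaluate(tree, x):
    """Evaluate an expression tree for one value of the input variable."""
    if not isinstance(tree, tuple):
        return x if tree == "x" else tree
    idx, left, right = tree
    return FUNCTIONS[idx][0](evaluate(left, x), evaluate(right, x))


def mse(tree, data):
    """Fitness: mean squared error over (x, y) pairs; lower is better."""
    total = 0.0
    for x, y in data:
        diff = evaluate(tree, x) - y
        total += diff * diff
    return total / len(data) if math.isfinite(total) else math.inf


def mutate(rng, tree, depth=2):
    """Replace one randomly chosen subtree with a freshly grown one."""
    if not isinstance(tree, tuple) or rng.random() < 0.3:
        return random_tree(rng, depth)
    idx, left, right = tree
    if rng.random() < 0.5:
        return (idx, mutate(rng, left, depth), right)
    return (idx, left, mutate(rng, right, depth))


def evolve(data, pop_size=60, generations=30, seed=0):
    """Evolve expressions with tournament selection, mutation and elitism."""
    rng = random.Random(seed)
    population = [random_tree(rng, 3) for _ in range(pop_size)]
    best = min(population, key=lambda t: mse(t, data))
    for _ in range(generations):
        offspring = [best]  # elitism: the best individual always survives
        while len(offspring) < pop_size:
            parent = min(rng.sample(population, 3), key=lambda t: mse(t, data))
            offspring.append(mutate(rng, parent))
        population = offspring
        best = min(population, key=lambda t: mse(t, data))
    return best


def to_string(tree):
    """Render an expression tree in infix notation."""
    if not isinstance(tree, tuple):
        return str(tree)
    idx, left, right = tree
    return f"({to_string(left)} {FUNCTIONS[idx][1]} {to_string(right)})"


# Training data sampled from the assumed target function y = x^2 + x.
data = [(x / 2.0, (x / 2.0) ** 2 + x / 2.0) for x in range(-4, 5)]
best = evolve(data)
print(to_string(best), mse(best, data))
```

Note that the search is given only a set of building blocks and the data: neither the structure of the function nor the form of the relationship between x and y is fixed in advance, which is the property the slide emphasizes.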

Prediction and estimation in sw engineering

• Software quality

– Software quality classification

– Software fault prediction

– Software reliability growth modeling

• Software size

• Software development cost/effort

• Maintenance task effort

• Software release timing

GP for prediction and estimation in sw engineering

• 23 identified primary studies

– Software quality classification (8)

– Software cost/effort/size estimation (7)

– Software fault prediction and software reliability growth modeling (8)

GP for prediction and estimation in sw engineering cntd…

Application of GP for sw quality classification (8 studies)

• Variations of the dependent variable:

– Fault proneness

– Quality ranking of program modules (high risk to low risk)

• Variations in sampling of training and testing sets:

– Simple hold-out and 10-fold CV.
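The two sampling schemes can be sketched as index-splitting helpers. This is an illustrative sketch only: the function names, the 30% test fraction and the fixed seed are assumptions, not settings reported by the primary studies.

```python
import random


def hold_out_split(n_samples, test_fraction=0.3, seed=0):
    """Shuffle indices once and cut them into one training and one test part."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]


def k_fold_splits(n_samples, k=10, seed=0):
    """Yield (train, test) index lists; every sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


train, test = hold_out_split(100)
print(len(train), len(test))
for fold_train, fold_test in k_fold_splits(25, k=10):
    print(len(fold_train), len(fold_test))
```

With a simple hold-out the reported accuracy depends on one particular split, whereas 10-fold CV averages over ten splits in which each module appears in a test set exactly once.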

Application of GP for sw quality classification cntd…

• Variations in fitness function

– Single objective

• Minimization of root mean square

• Minimization of average cost of misclassification

– Multi-objective

• Minimization of average cost of misclassification + minimization of tree size

• Maximization of the best percentage of the actual faults averaged over the percentile levels of interest + controlling the tree size.

• Balancing the oversampling and undersampling in each class for a decision tree.
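As an illustration of the "average cost of misclassification + tree size" objective listed above, the sketch below scores a classifier by its mean misclassification cost and uses tree size as a tie-breaker. The cost values, the 0/1 label encoding, the nested-tuple tree representation and the lexicographic way of combining the two objectives are all assumptions made for this example; the primary studies may combine the objectives differently.

```python
def average_misclassification_cost(actual, predicted, cost_fp=1.0, cost_fn=10.0):
    """Mean cost per module; labels are 1 (fault-prone) and 0 (not fault-prone)."""
    total = 0.0
    for a, p in zip(actual, predicted):
        if a == 0 and p == 1:
            total += cost_fp  # Type I error: sound module flagged as fault-prone
        elif a == 1 and p == 0:
            total += cost_fn  # Type II error: fault-prone module missed
    return total / len(actual)


def tree_size(tree):
    """Count the nodes of an expression tree given as nested tuples."""
    if not isinstance(tree, tuple):
        return 1
    return 1 + sum(tree_size(child) for child in tree[1:])


def fitness(tree, actual, predicted):
    """Two objectives to minimize: cost first, tree size as a tie-breaker."""
    return (average_misclassification_cost(actual, predicted), tree_size(tree))


actual = [1, 1, 0, 0]
predicted = [1, 0, 1, 0]
small_tree = ("+", "x", ("*", "x", "y"))
print(fitness(small_tree, actual, predicted))
```

Because Python compares tuples element by element, `min(population, key=...)` with this fitness prefers the cheaper classifier and, among equally cheap ones, the smaller tree.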


• Variations in comparison groups:

– Neural networks

– k-nearest neighbour

– Regression (linear, logistic)

– Humans


• Results:

– Majority of the studies (6 out of 8) reported results in favor of using GP for the classification task.

• Limitations:

– Increase the comparisons with a more representative set of techniques.

– Increase the use of publicly available data sets for easier replications.


• Encouraging aspects:

– The datasets used represent real-world projects.

– Problem-dependent objectives represented in fitness functions perform better than standard GP.

Application of GP for sw cost/effort/size (CES) estimation (7 studies)

• Variations of the dependent variable

– Software effort

– Software cost

– Software size

• Variations in fitness function

– Single objective

• Minimization of mean squared error (MSE) or mean magnitude of relative error (MMRE)
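The two error measures named above can be written down directly. The sketch below uses the common definitions (MSE as the mean of squared residuals, MMRE as the mean of |actual - predicted| / actual) and assumes strictly positive actual values, as is the case for effort or size; the sample numbers are made up.

```python
def mean_squared_error(actual, predicted):
    """Mean of squared residuals; dominated by projects with large errors."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mmre(actual, predicted):
    """Mean magnitude of relative error; assumes every actual value is > 0."""
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical effort values in person-hours, for illustration only.
actual = [100.0, 200.0]
predicted = [110.0, 150.0]
print(mean_squared_error(actual, predicted), mmre(actual, predicted))
```

The two measures can rank the same models differently: MSE weights errors by their absolute size, while MMRE weights them relative to project size, which is one reason non-standardized evaluation measures make studies hard to compare.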

Application of GP for sw cost/effort/size (CES) estimation cntd…

• Variations in comparison groups

– ANN, nearest neighbour and different forms of regression.

• Variations in sampling of training and testing sets

– Simple hold-out.

Application of GP for sw cost/effort/size (CES) estimation cntd…

• Results

– No strong evidence of GP performing consistently on all evaluation measures used.

• Limitations

– Evaluation measures used are not standardized.

– Different hold-out samplings for train and test sets.

– Lack of statistical hypothesis testing.

– Lack of comparison groups.

Application of GP for sw fault prediction and sw reliability growth modeling (8 studies)

• Variations of the dependent variable

– SW fault prediction

– SW reliability growth modeling

• Variations in fitness function

– Single objective:

• Minimization of standard error

Application of GP for sw fault prediction and sw reliability growth modeling cntd …

• Variations in comparison groups

– Standard GP, Naive Bayes, traditional software reliability growth models.

• Variations in sampling of training and testing sets

– Hold-out and 10-fold CV

Application of GP for sw fault prediction and sw reliability growth modeling cntd …

• Results:

– 7 out of 8 studies favor the use of GP.

• Limitations:

– Poor representation of comparison groups

– Absence of a baseline to compare to.

Promising future work to undertake

• Multi-objective fitness evaluation (e.g., minimization of standard error and maximization of correlation coefficient)

• Simplification of GP solutions to help interpretation of relationships between variables.

• Evaluation of techniques to minimize overfitting of GP solutions.
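For the multi-objective fitness evaluation mentioned in the first item, one common way to compare candidates without collapsing the objectives into a single number is Pareto dominance. The sketch below assumes each candidate has already been scored as a (standard error, correlation coefficient) pair; the function names and the sample scores are hypothetical.

```python
def dominates(a, b):
    """True if candidate a Pareto-dominates candidate b.

    Each candidate is (standard_error, correlation): the error is
    minimized and the correlation is maximized.
    """
    err_a, corr_a = a
    err_b, corr_b = b
    no_worse = err_a <= err_b and corr_a >= corr_b
    strictly_better = err_a < err_b or corr_a > corr_b
    return no_worse and strictly_better


def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Hypothetical (standard error, correlation) scores of three GP solutions.
scores = [(0.5, 0.9), (0.4, 0.8), (0.6, 0.7)]
print(pareto_front(scores))
```

Here the third solution is worse on both objectives and drops out, while the first two remain because each is better on one objective.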

Conclusions

• A total of 23 studies apply GP for predictive studies in sw engineering:

– sw quality classification (8)

– sw cost/effort/size estimation (7)

– sw fault prediction and sw reliability growth modeling (8)

• There is evidence in support of using GP for:

– sw quality classification

– sw fault prediction and SW reliability growth modeling

• but not for:

– sw cost/effort/size estimation.

Recommendations

• Use public data sets wherever possible.

• Apply commonly used sampling strategies.

• Use techniques to avoid overfitting in GP solutions.

• Report the settings of GP parameters.

• Compare the performances against a commonly used baseline.

• Use statistical experimental designs.
