On the application of GP for software engineering predictive modeling: A systematic review
Expert Systems with Applications, Vol. 38, no. 9, 2011
Wasif Afzal, Richard Torkar
Blekinge Institute of Technology,
Karlskrona, Sweden.
{waf,rto}@bth.se
Agenda
• Research question
• Symbolic regression
• Prediction and estimation in sw engineering
• GP for prediction and estimation in sw engineering
• Application of GP for sw quality classification
• Application of GP for sw cost/effort/size estimation
• Application of GP for sw fault prediction and sw reliability growth modeling
• Future work
• Conclusions
• Recommendations
Our research question
• Is there evidence that:
symbolic regression using GP is an effective method for:
prediction and estimation, in comparison with:
regression, machine learning and other models (including expert opinion and different improvements over the standard GP algorithm)?
It is about symbolic regression!
• Symbolic regression
– One of the many application areas of GP
– Finds a function whose outputs match the desired outcomes.
– Makes no assumptions about:
• Structure of the function
• Data distribution
• Relationship between independent and dependent variables
• Helps in identifying the significant variables in subsequent modeling attempts
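To make the idea concrete, here is a minimal sketch of symbolic regression with an evolutionary loop. It is not the setup of any reviewed study: the function set, terminal set, target data, population size and the use of truncation selection with subtree mutation only (no crossover) are all illustrative assumptions.

```python
import math
import operator
import random

# Illustrative function and terminal sets (assumptions, not from the slides).
FUNCS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}
TERMS = ["x", 1.0, 2.0]

def random_tree(rng, depth=3):
    """Grow a random expression tree stored as nested tuples."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMS)
    name = rng.choice(list(FUNCS))
    return (name, random_tree(rng, depth - 1), random_tree(rng, depth - 1))

def evaluate(tree, x):
    """Evaluate a tree for one value of the independent variable x."""
    if isinstance(tree, tuple):
        name, left, right = tree
        return FUNCS[name](evaluate(left, x), evaluate(right, x))
    return x if tree == "x" else tree  # variable or numeric constant

def mse(tree, data):
    """Fitness: mean squared error over (x, y) pairs; lower is better."""
    total = 0.0
    for x, y in data:
        diff = evaluate(tree, x) - y
        total += diff * diff
    error = total / len(data)
    return error if math.isfinite(error) else math.inf

def mutate(tree, rng, depth=2):
    """Replace one randomly chosen subtree with a fresh random subtree."""
    if not isinstance(tree, tuple) or rng.random() < 0.3:
        return random_tree(rng, depth)
    name, left, right = tree
    if rng.random() < 0.5:
        return (name, mutate(left, rng, depth), right)
    return (name, left, mutate(right, rng, depth))

def evolve(data, pop_size=60, generations=30, seed=0):
    """Keep the best quarter each generation and refill with mutants."""
    rng = random.Random(seed)
    pop = [random_tree(rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=lambda t: mse(t, data))
        history.append(mse(pop[0], data))
        elite = pop[: pop_size // 4]
        mutants = [mutate(rng.choice(elite), rng) for _ in range(pop_size - len(elite))]
        pop = elite + mutants
    best = min(pop, key=lambda t: mse(t, data))
    return best, history + [mse(best, data)]

# Hypothetical target relationship y = x^2 + x, sampled at nine points.
data = [(x / 2, (x / 2) ** 2 + x / 2) for x in range(-4, 5)]
best, history = evolve(data)
print("best expression:", best)
print("best error per generation:", history)
```

Note that nothing in the loop fixes the form of the function in advance: the structure of `best` is itself a result of the search, which is the property the slide emphasises.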
Prediction and estimation in sw engineering
• Software quality
– Software quality classification
– Software fault prediction
– Software reliability growth modeling
• Software size
• Software development cost/effort
• Maintenance task effort
• Software release timing
GP for prediction and estimation in sw engineering
• 23 identified primary studies
– Software quality classification (8)
– Software cost/effort/size estimation (7)
– Software fault prediction and software reliability growth modeling (8)
GP for prediction and estimation in sw engineering cntd…
Application of GP for sw quality classification (8 studies)
• Variations of the dependent variable:
– Fault proneness
– Quality ranking of program modules (high risk to low risk)
• Variations in sampling of training and testing sets:
– Simple hold-out and 10-fold CV.
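The two sampling schemes named above can be sketched as index splits. The 30% test fraction, the seeds and the function names are assumptions chosen for illustration; the reviewed studies may have used other proportions.

```python
import random

def hold_out_split(n_samples, test_fraction=0.3, seed=0):
    """Return (train_idx, test_idx) for a single random hold-out split."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_test = int(round(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]

def k_fold_splits(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx

train, test = hold_out_split(100)
folds = list(k_fold_splits(100, k=10))
print(len(train), len(test), len(folds))
```

With hold-out, the reported result depends on one particular split; with 10-fold CV, every module contributes to the test estimate once, which makes results from different studies easier to compare.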
Application of GP for sw quality classification cntd…
• Variations in fitness function
– Single objective
• Minimization of root mean square
• Minimization of average cost of misclassification
– Multi-objective
• Minimization of average cost of misclassification + minimization of tree size
• Maximization of the best percentage of the actual faults averaged over the percentile levels of interest + controlling the tree size.
• Balancing the over sampling and under sampling in each class for a decision tree.
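As a rough illustration of the first multi-objective variant, the sketch below computes an average misclassification cost and a tree size for one candidate classifier. The cost values, the 0/1 label encoding, the nested-tuple tree representation and the example tree are assumptions made for this example, not details taken from the reviewed studies.

```python
def average_misclassification_cost(actual, predicted, cost_fp=1.0, cost_fn=10.0):
    """Mean cost per module; labels are 1 (fault-prone) or 0 (not fault-prone)."""
    total = 0.0
    for a, p in zip(actual, predicted):
        if a == 0 and p == 1:
            total += cost_fp  # false alarm on a module that is not fault-prone
        elif a == 1 and p == 0:
            total += cost_fn  # missed fault-prone module, assumed costlier
    return total / len(actual)

def tree_size(tree):
    """Count the nodes of an expression tree stored as nested tuples."""
    if isinstance(tree, tuple):
        return 1 + sum(tree_size(child) for child in tree[1:])
    return 1

def objectives(tree, actual, predicted):
    """Return both objectives as a tuple; each one is to be minimized."""
    return (average_misclassification_cost(actual, predicted), tree_size(tree))

# Hypothetical classifier "loc + cc > 50" and made-up labels.
tree = ("gt", ("add", "loc", "cc"), 50.0)
actual = [1, 0, 1, 0]
predicted = [1, 1, 0, 0]
print(objectives(tree, actual, predicted))
```

Keeping tree size as a separate objective, rather than folding it into the cost, leaves the trade-off between accuracy and model simplicity visible to whoever selects the final model.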
Application of GP for sw quality classification cntd…
• Variations in comparison groups:
– Neural networks
– k-nearest neighbour
– Regression (linear, logistic)
– Humans
Application of GP for sw quality classification cntd…
• Results:
– Majority of the studies (6 out of 8) reported results in favor of using GP for the classification task.
• Limitations:
– Increase the comparisons with a more representative set of techniques.
– Increase the use of publicly available data sets for easier replications.
Application of GP for sw quality classification cntd…
• Encouraging aspects:
– The datasets used represent real-world projects.
– Problem-dependent objectives represented in fitness functions perform better than standard GP.
Application of GP for sw cost/effort/size (CES) estimation (7 studies)
• Variations of the dependent variable
– Software effort
– Software cost
– Software size
• Variations in fitness function
– Single objective
• Minimization of mean squared error or MMRE
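For reference, the two error measures named above can be computed as follows. MMRE is taken here in its usual sense, the mean magnitude of relative error; the effort values are made up for illustration.

```python
def mean_squared_error(actual, predicted):
    """Average of squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def mmre(actual, predicted):
    """Mean magnitude of relative error; assumes every actual value is non-zero."""
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)

# Made-up effort values (for example, person-hours) for three projects.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]
print(mean_squared_error(actual, predicted))
print(mmre(actual, predicted))
```

The two measures can rank models differently: mean squared error is dominated by large projects, while MMRE weighs each project by its relative error.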
Application of GP for sw cost/effort/size (CES) estimation cntd…
• Variations in comparison groups
– ANN, nearest neighbour and different forms of regression.
• Variations in sampling of training and testing sets
– Simple hold-out.
Application of GP for sw cost/effort/size (CES) estimation cntd…
• Results
– No strong evidence of GP performing consistently on all evaluation measures used.
• Limitations
– Evaluation measures used are not standardized.
– Different hold-out samplings for train and test sets.
– Lack of statistical hypothesis testing.
– Lack of comparison groups.
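One way to address the missing hypothesis testing is a paired test on per-project errors of two techniques. The slides do not name a test; the exact two-sided sign test below is chosen only because it is simple and assumption-light, and the error values are made up.

```python
from math import comb

def sign_test_p_value(errors_a, errors_b):
    """Two-sided exact sign test on paired errors; ties are dropped."""
    wins_a = sum(1 for a, b in zip(errors_a, errors_b) if a < b)
    wins_b = sum(1 for a, b in zip(errors_a, errors_b) if b < a)
    n = wins_a + wins_b
    if n == 0:
        return 1.0  # all pairs tied: no evidence of a difference
    k = min(wins_a, wins_b)
    # Probability of k or fewer wins under the null hypothesis p = 0.5.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical relative errors of GP and linear regression on eight projects.
errors_gp = [0.2, 0.3, 0.1, 0.4, 0.2, 0.3, 0.1, 0.2]
errors_lr = [0.3, 0.4, 0.2, 0.5, 0.3, 0.4, 0.2, 0.1]
print(sign_test_p_value(errors_gp, errors_lr))
```

Here GP has the lower error on 7 of 8 projects, yet the p-value is about 0.07, which shows why a raw win count on a small sample is weak evidence on its own.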
Application of GP for sw fault prediction and sw reliability growth modeling (8 studies)
• Variations of the dependent variable
– SW fault prediction
– SW reliability growth modeling
• Variations in fitness function
– Single objective:
• Minimization of standard error
Application of GP for sw fault prediction and sw reliability growth modeling cntd …
• Variations in comparison groups
– Standard GP, Naive Bayes, traditional software reliability growth models.
• Variations in sampling of training and testing sets
– Hold-out and 10-fold CV
Application of GP for sw fault prediction and sw reliability growth modeling cntd …
• Results:
– 7 out of 8 studies favor the use of GP.
• Limitations:
– Poor representation of comparison groups
– Absence of a baseline to compare to.
Promising future work to undertake
• Multi-objective fitness evaluation (e.g. Minimization of standard error and maximization of correlation coefficient)
• Simplification of GP solutions to help interpretation of relationships between variables.
• Evaluation of techniques to minimize overfitting of GP solutions.
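The multi-objective idea in the first bullet can be sketched with Pareto dominance over the two named objectives. The definition of standard error used here (root of the mean squared residual) is one common choice and an assumption, as are the candidate scores.

```python
from math import sqrt

def standard_error(actual, predicted):
    """Root of the mean squared residual (assumed definition for this sketch)."""
    n = len(actual)
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def correlation(actual, predicted):
    """Pearson correlation coefficient; assumes neither series is constant."""
    n = len(actual)
    mean_a, mean_p = sum(actual) / n, sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / sqrt(var_a * var_p)

def dominates(a, b):
    """Scores are (standard_error, correlation): lower error, higher correlation wins."""
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better

def pareto_front(scores):
    """Keep every score that no other score dominates."""
    return [s for s in scores if not any(dominates(o, s) for o in scores)]

# Hypothetical (standard_error, correlation) scores of three candidate models.
scores = [(1.0, 0.9), (0.8, 0.7), (1.2, 0.6)]
print(pareto_front(scores))
```

The third candidate is worse on both objectives and drops out; the first two remain because each is better on one objective, so the choice between them is left to the analyst.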
Conclusions
• A total of 23 studies apply GP for predictive studies in sw engineering:
– sw quality classification (8)
– sw cost/effort/size estimation (7)
– sw fault prediction and sw reliability growth modeling (8)
• There is evidence in support of using GP for:
– sw quality classification
– sw fault prediction and sw reliability growth modeling
• but not for:
– sw cost/effort/size estimation.
Recommendations
• Use public data sets wherever possible.
• Apply commonly used sampling strategies.
• Use techniques to avoid overfitting in GP solutions.
• Report the settings of GP parameters.
• Compare the performances against a commonly used baseline.
• Use statistical experimental designs.