Systematic Mining of Software...
![Page 1: Systematic Mining of Software Repositorieslaser.inf.ethz.ch/2014/material/gall/gall-lecture-5-retrospective.pdf · ‣ Evolution analysis data repositories à la PROMISE ‣ Flossmole,](https://reader033.fdocuments.in/reader033/viewer/2022050313/5f75f66307aa081cac5fa1ef/html5/thumbnails/1.jpg)
software evolution & architecture lab
University of Zurich, Switzerland http://seal.ifi.uzh.ch @ LASER summer school 2014
Harald Gall
Systematic Mining of Software Repositories
Lecture 5 - Retrospective
2009 Roundtable on the Future of Mining Software Archives
2009 Future of Mining Software Repos
| Vision statement | Author | Status open |
| --- | --- | --- |
| Answer Commonly Asked Project Questions | Michael W. Godfrey | partly |
| Software Repositories: A Strategic Asset | Ahmed E. Hassan | yes |
| Create Centralized Data Repositories | James Herbsleb | yes |
| Embed Mining in Developer Tools | Gail C. Murphy | partly |
| Help Developers Search for Information | Martin Robillard | partly |
| Deploy Mining to Industry | Audris Mockus | ongoing |
| Let Us Not Mine for Fool’s Gold | David Notkin | ongoing |
based on a Software Roundtable published in IEEE Software 2009
2013 MSRconf revisited
MSRconf.org: Status in 2013
‣ A Trend Analysis on Past MSR Papers, by Serge Demeyer et al., MSR 2013
  ‣ RQ 1: Which are the popular and outdated research topics? (by text analysis, with n-grams)
  ‣ RQ 2: Which are the frequently and less frequently cited cases?
  ‣ RQ 3: Which is the popular and emerging mining infrastructure?
  ‣ RQ 4: What is the “actionable information” which we are deemed to uncover?
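RQ 1 mentions text analysis with n-grams. A minimal sketch of that idea, assuming the input is a list of (year, text) pairs; the function names and the toy strings are illustrative and not taken from the paper:

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def ngrams(text: str, n: int) -> List[Tuple[str, ...]]:
    """Split text into lowercase word tokens and return all n-grams."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts_by_year(
    papers: Iterable[Tuple[int, str]], n: int = 2
) -> Dict[int, Counter]:
    """Count n-grams per publication year to compare topic popularity."""
    counts: Dict[int, Counter] = {}
    for year, text in papers:
        counts.setdefault(year, Counter()).update(ngrams(text, n))
    return counts


# Toy input (made up for illustration, not real paper titles).
papers = [
    (2005, "mining cvs logs"),
    (2005, "cvs logs and bug reports"),
    (2012, "mining git commits"),
]
by_year = ngram_counts_by_year(papers, n=2)
print(by_year[2005].most_common(1))  # [(('cvs', 'logs'), 2)]
```

Comparing the most common n-grams of early and late years gives a rough view of which topics rise and which fade.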
RQ 1: Popularity of topics
RQ 2: Frequently cited cases
RQ 3: SCMs
CfP of MSRconf:
‣ Goal is “to uncover interesting and actionable information about software systems and projects”
The LNCS book chapter
LNCS book chapter
‣ Revisiting Mining Studies
  Katja Kevic (UZH), Stefanie Beyer (AAU), Ilias Rousinopoulos (AAU), Sven Amann (TUD)
  ‣ what’s a mining study: setup, resources, machinery, ..
  ‣ what sources (archives) can be used for what kind of study (a catalog)
  ‣ what questions have been addressed so far
  ‣ what questions and conclusions (answers) so far
  ‣ which studies can be automated in terms of tooling and infrastructure
  ‣ what is a benchmark for mining studies
A retrospective overview of topics addressed in the lectures
The Screening Plant of a SW Miner
Which data sources?
‣ Evolution analysis data repositories à la PROMISE
  ‣ Flossmole, Sourcerer, Ultimate Debian DB
  ‣ Provide benchmark (raw) data
‣ Interactive online web platforms that provide various analyses
  ‣ Boa, FOSSology, Alitheia core, Ohloh
  ‣ Analyses offered by design
  ‣ Data produced is best used within the system
‣ Industrial project data (not widely accessible!)
What kind of studies?
‣ Source code
  ‣ Which entities co-evolve/co-change?
  ‣ How to identify code smells or design disharmonies?
‣ Bugs and changes
  ‣ Who should / how long will it take to fix this bug?
  ‣ When do changes induce fixes?
  ‣ Predicting bugs and their components?
‣ Project and process
  ‣ Do code and comments co-evolve?
  ‣ Who are the experts of a piece of code?
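The co-change question can be approximated by counting how often two files appear in the same commit. A minimal sketch, assuming each commit is already available as a list of changed file names (extracting them from a version control system is out of scope here); `co_change_counts` and the file names are hypothetical:

```python
from collections import Counter
from itertools import combinations
from typing import Iterable


def co_change_counts(commits: Iterable[Iterable[str]]) -> Counter:
    """Count, for every pair of files, the commits that changed both."""
    pairs: Counter = Counter()
    for files in commits:
        # Sort and deduplicate so each pair has one canonical key per commit.
        for a, b in combinations(sorted(set(files)), 2):
            pairs[(a, b)] += 1
    return pairs


# Toy history (made up for illustration).
commits = [
    ["A.java", "B.java"],
    ["A.java", "B.java", "C.java"],
    ["C.java"],
]
pairs = co_change_counts(commits)
print(pairs.most_common(1))  # [(('A.java', 'B.java'), 2)]
```

Pairs with high counts are candidates for logical coupling; a real study would also normalize by how often each file changes on its own.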
Example: Bug Prediction
Using Code Churn vs. Fine-Grained Changes
Using the Gini Coefficient for Bug Prediction
Predicting the Method
Predicting the Types of Code Changes
Using developer networks for Bug Prediction
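One of the titles above names the Gini coefficient. A minimal sketch of the coefficient itself, assuming it is applied to the number of changes each developer made to a file (that application is an assumption here; the slide only gives the title):

```python
from typing import Sequence


def gini(values: Sequence[float]) -> float:
    """Gini coefficient of non-negative values.

    0.0 means perfectly even; the maximum for n values is (n - 1) / n.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the ascending values (ranks start at 1).
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Changes per developer for two hypothetical files.
print(gini([5, 5, 5, 5]))   # 0.0  (work spread evenly)
print(gini([0, 0, 0, 20]))  # 0.75 (one developer did everything)
```

A value near 0 means changes are spread evenly across developers; a value near the maximum means a single developer dominates.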
• Learn a prediction model from historic data
• Predict defects for the same project
• Hundreds of prediction models / learners exist
• Models work fairly well with precision and recall of up to 80%.
| Predictor | Precision | Recall |
| --- | --- | --- |
| Pre-Release Bugs | 73.80% | 62.90% |
| Test Coverage | 83.80% | 54.40% |
| Dependencies | 74.40% | 69.90% |
| Code Complexity | 79.30% | 66.00% |
| Code Churn | 78.60% | 79.90% |
| Org. Structure | 86.20% | 84.00% |

From: N. Nagappan, B. Murphy, and V. Basili. The influence of organizational structure on software quality. ICSE 2008.
Performance of bug prediction
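Precision and recall follow the usual definitions: precision is the share of predicted-defective entities that really are defective, recall is the share of defective entities that the model finds. A minimal sketch with made-up labels, not data from the cited study:

```python
from typing import Sequence, Tuple


def precision_recall(
    actual: Sequence[bool], predicted: Sequence[bool]
) -> Tuple[float, float]:
    """Return (precision, recall) for binary defect predictions."""
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)
    fp = sum(1 for a, p in zip(actual, predicted) if not a and p)
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# True = defective. Six hypothetical files.
actual = [True, True, True, False, False, True]
predicted = [True, True, False, True, False, False]
precision, recall = precision_recall(actual, predicted)
print(f"precision={precision:.2f} recall={recall:.2f}")
# precision=0.67 recall=0.50
```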
Example: Code Ownership
C. Bird, N. Nagappan, B. Murphy, H. Gall, P. Devanbu, Don't touch my code! Examining the effects of ownership on software quality, ESEC/FSE ’11
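Ownership measures of this kind can be derived from the commit authors of a component. A minimal sketch, assuming ownership is the commit share of the top contributor and that contributors below a 5% share count as minor; the threshold and the function name are assumptions to check against the paper:

```python
from collections import Counter
from typing import Dict, Iterable


def ownership_metrics(
    commit_authors: Iterable[str], threshold: float = 0.05
) -> Dict[str, float]:
    """Summarize who contributed how much to one component."""
    counts = Counter(commit_authors)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("component has no commits")
    shares = [n / total for n in counts.values()]
    return {
        "ownership": max(shares),  # share of the top contributor
        "minor": sum(1 for s in shares if s < threshold),
        "major": sum(1 for s in shares if s >= threshold),
        "total": len(shares),
    }


# Hypothetical commit log of one component: one author name per commit.
authors = ["alice"] * 30 + ["bob"] * 9 + ["carol"] * 1
metrics = ownership_metrics(authors)
print(metrics)
# {'ownership': 0.75, 'minor': 1, 'major': 2, 'total': 3}
```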
Performance/Time variance
J. Ekanayake, J. Tappolet, H. Gall, A. Bernstein, Time variance and defect prediction in software projects, Empirical Software Engineering, Vol. 17 (4-5), 2012
Workflows & Mashups
Conclusions
‣ Bug predictions do work
‣ Cross-project predictions do not really work
‣ Data sets (systems) need to be “harmonized”
‣ Data preprocessing and learners need to be calibrated
‣ Studies need to be replicable (systematically)
‣ Periods of stability vs. drift