Systematic Mining of Software Repositories
Lecture 5 - Retrospective

Harald Gall
software evolution & architecture lab
University of Zurich, Switzerland http://seal.ifi.uzh.ch
@ LASER summer school 2014

2009 Roundtable on the Future of Mining Software Archives

Vision statement                            Author               Status
Answer Commonly Asked Project Questions     Michael W. Godfrey   partly
Software Repositories: A Strategic Asset    Ahmed E. Hassan      yes
Create Centralized Data Repositories        James Herbsleb       yes
Embed Mining in Developer Tools             Gail C. Murphy       partly
Help Developers Search for Information      Martin Robillard     partly
Deploy Mining to Industry                   Audris Mockus        ongoing
Let Us Not Mine for Fool’s Gold             David Notkin         ongoing

based on a Software Roundtable published in IEEE Software 2009

2013 MSRconf revisited

MSRconf.org: Status in 2013

‣ A Trend Analysis on Past MSR Papers, by Serge Demeyer et al., MSR 2013
‣ RQ 1: Which are the popular and outdated research topics? (by text analysis, with n-grams; see the sketch below)
‣ RQ 2: Which are the frequently and less frequently cited cases?
‣ RQ 3: Which is the popular and emerging mining infrastructure?
‣ RQ 4: What is the “actionable information” that we are supposed to uncover?
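To make RQ 1 concrete: the n-gram text analysis can be bootstrapped with very little machinery. Below is a minimal Python sketch, assuming paper titles are already grouped by year; the titles, years, and grouping shown here are invented for illustration, and Demeyer et al. describe their actual setup in the paper.

    from collections import Counter

    def ngrams(text, n=2):
        """Split a title into lowercase word n-grams."""
        words = text.lower().split()
        return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

    # Hypothetical corpus: MSR paper titles grouped by year.
    titles_by_year = {
        2005: ["mining version histories", "mining email archives"],
        2013: ["predicting bugs from code churn", "mining app store reviews"],
    }

    # Count bigram frequencies per year to expose rising and fading topics.
    trends = {year: Counter(g for t in titles for g in ngrams(t))
              for year, titles in titles_by_year.items()}

    for year, counts in sorted(trends.items()):
        print(year, counts.most_common(3))

Comparing the most frequent n-grams across years is what separates "popular" from "outdated" topics in such a trend analysis.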

RQ 1: Popularity of topics

RQ 2: Frequently cited cases

RQ 3: SCMs

CfP of MSRconf:

‣ Goal is “to uncover interesting and actionable information about software systems and projects”

The LNCS book chapter

‣ Revisiting Mining Studies, by Katja Kevic (UZH), Stefanie Beyer (AAU), Ilias Rousinopoulos (AAU), Sven Amann (TUD)
‣ what’s a mining study: setup, resources, machinery, ...
‣ what sources (archives) can be used for what kind of study (a catalog)
‣ what questions have been addressed so far
‣ what questions and conclusions (answers) so far
‣ which studies can be automated in terms of tooling and infrastructure
‣ what is a benchmark for mining studies

A retrospective overview of topics addressed in the lectures

The Screening Plant of a SW Miner

Which data sources?

‣ Evolution analysis data repositories à la PROMISE
  ‣ FLOSSmole, Sourcerer, Ultimate Debian DB
  ‣ Provide benchmark (raw) data (see the loading sketch below)
‣ Interactive online web platforms that provide various analyses
  ‣ Boa, FOSSology, Alitheia Core, Ohloh
  ‣ Analyses offered by design
  ‣ Data produced is best used within the system
‣ Industrial project data (not widely accessible!)
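As a concrete illustration of consuming such benchmark data, the sketch below loads a PROMISE-style defect data set from CSV. The file name and the column names (loc, cyclomatic, defective) are assumptions for illustration; actual PROMISE exports vary in format and metric sets.

    import csv

    # Hypothetical PROMISE-style export: one row per module,
    # static metrics plus a defect label.
    with open("promise_dataset.csv", newline="") as f:
        rows = list(csv.DictReader(f))

    # Separate metrics from the label; column names are assumptions.
    X = [[float(r["loc"]), float(r["cyclomatic"])] for r in rows]
    y = [r["defective"] == "true" for r in rows]

    print(f"{len(rows)} modules, {sum(y)} defective")

This is exactly the appeal of raw benchmark repositories: the data can be taken out of the platform and fed into any analysis pipeline, unlike the platform-bound analyses of the second category.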

What kind of studies?

‣ Source code
  ‣ Which entities co-evolve/co-change? (see the sketch after this list)
  ‣ How to identify code smells or design disharmonies?
‣ Bugs and changes
  ‣ Who should fix this bug, and how long will it take?
  ‣ When do changes induce fixes?
  ‣ Predicting bugs and their components?
‣ Project and process
  ‣ Do code and comments co-evolve?
  ‣ Who are the experts of a piece of code?
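The co-change question can be answered directly from version-control logs: entities that repeatedly appear in the same commit are logically coupled. A minimal sketch, assuming the history has already been parsed into per-commit file lists (the commits shown are made up):

    from collections import Counter
    from itertools import combinations

    # Hypothetical parsed history: one list of touched files per commit.
    commits = [
        ["Parser.java", "Lexer.java"],
        ["Parser.java", "Lexer.java", "README"],
        ["Ui.java"],
    ]

    # Count how often each file pair is changed together.
    coupling = Counter()
    for files in commits:
        for pair in combinations(sorted(set(files)), 2):
            coupling[pair] += 1

    # The strongest couplings suggest entities that co-evolve.
    for pair, n in coupling.most_common(3):
        print(pair, n)

Real studies refine this with time windows, author filters, and significance thresholds, but the core signal is this pairwise co-occurrence count.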

Example: Bug Prediction

‣ Using Code Churn vs. Fine-Grained Changes
‣ Using the Gini Coefficient for Bug Prediction (sketch below)
‣ Predicting Bugs at the Method Level
‣ Predicting the Types of Code Changes
‣ Using Developer Networks for Bug Prediction
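For the Gini-based approach, the underlying idea is to measure how unequally changes are distributed, for instance over the developers touching a file. Below is a minimal sketch of the standard Gini computation applied to per-developer change counts; the counts are invented, and the cited approach defines its own concrete metrics.

    def gini(values):
        """Gini coefficient of a non-negative distribution
        (0 = perfectly equal, approaching 1 = maximally unequal)."""
        xs = sorted(values)
        n = len(xs)
        total = sum(xs)
        if n == 0 or total == 0:
            return 0.0
        # Standard formula based on the rank-weighted sum of sorted values.
        cum = sum((i + 1) * x for i, x in enumerate(xs))
        return (2 * cum) / (n * total) - (n + 1) / n

    # Hypothetical: number of changes each developer made to one file.
    print(gini([1, 1, 1, 1]))   # equal ownership -> 0.0
    print(gini([0, 0, 0, 12]))  # one dominant committer -> 0.75

The intuition connecting this to defects is that files with very unequal (or very fragmented) change distributions tend to behave differently from files with balanced ownership.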

• Learn a prediction model from historic data

• Predict defects for the same project

• Hundreds of prediction models / learners exist

• Models work fairly well with precision and recall of up to 80%.

Predictor          Precision   Recall
Pre-Release Bugs   73.80%      62.90%
Test Coverage      83.80%      54.40%
Dependencies       74.40%      69.90%
Code Complexity    79.30%      66.00%
Code Churn         78.60%      79.90%
Org. Structure     86.20%      84.00%

From: N. Nagappan, B. Murphy, and V. Basili. The influence of organizational structure on software quality. ICSE 2008.

Performance of bug prediction
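To illustrate the learn-then-predict loop behind such numbers: the sketch below trains a within-project defect predictor and reports precision and recall, using scikit-learn on made-up per-module features (churn and complexity); real studies use the richer predictors listed in the table above.

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import precision_score, recall_score
    from sklearn.model_selection import train_test_split

    # Hypothetical per-module features: [code churn, complexity].
    X = [[120, 14], [5, 2], [300, 30], [8, 3],
         [150, 22], [2, 1], [90, 9], [4, 2]]
    y = [1, 0, 1, 0, 1, 0, 1, 0]  # 1 = post-release defect reported

    # Learn a model from "historic" data, then predict on held-out
    # modules of the same project (within-project prediction).
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.5, random_state=0, stratify=y)
    model = RandomForestClassifier(
        n_estimators=100, random_state=0).fit(X_train, y_train)

    pred = model.predict(X_test)
    print("precision", precision_score(y_test, pred))
    print("recall", recall_score(y_test, pred))

Any of the hundreds of learners mentioned above can be dropped into this loop; the harder part in practice is assembling and labeling the historic data.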

Example: Code Ownership

C. Bird, N. Nagappan, B. Murphy, H. Gall, P. Devanbu, Don't touch my code! Examining the effects of ownership on software quality, ESEC/FSE ’11

Performance/Time variance

J. Ekanayake, J. Tappolet, H. Gall, A. Bernstein, Time variance and defect prediction in software projects, Empirical Software Engineering, Vol. 17 (4-5), 2012

Workflows & Mashups

Conclusions

‣ Bug predictions do work
‣ Cross-project predictions do not really work
‣ Data sets (systems) need to be “harmonized” (see the sketch below)
‣ Data preprocessing and learners need to be calibrated
‣ Studies need to be replicable (systematically)
‣ Periods of stability vs. drift
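One common reading of "harmonizing" data sets for cross-project prediction is to standardize each project's features separately before training, so the learner sees comparable distributions despite different metric scales. The sketch below illustrates that idea with invented metrics; it is one technique from the cross-project prediction literature, not necessarily the exact calibration meant here.

    from sklearn.linear_model import LogisticRegression
    from sklearn.preprocessing import StandardScaler

    # Hypothetical per-module metrics from two different projects.
    source_X = [[120, 14], [5, 2], [300, 30], [8, 3]]  # training project
    source_y = [1, 0, 1, 0]
    target_X = [[40, 80], [2, 5], [35, 70], [1, 4]]    # differently scaled project

    # "Harmonize" both data sets by standardizing each project
    # separately, so features become comparable across projects.
    src = StandardScaler().fit_transform(source_X)
    tgt = StandardScaler().fit_transform(target_X)

    model = LogisticRegression().fit(src, source_y)
    print(model.predict(tgt))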