Predicting Defects Using Change Genealogies (ISSE 2013)

Post on 04-Jul-2015



Predicting Defects Using Change Genealogies

Kim Herzig*, Sascha Just†, Andreas Rau†, Andreas Zeller†

* Microsoft Research, UK
† Saarland University, Germany

Prediction Models

• Goal: determine the likelihood of bugs in code entities

Quality assurance is limited by time and money.

Can be helpful for project outsiders.

• Trained on “ground truth”

Known instances and their properties.

Idea: learn from the past to predict the future.

• Predicting / estimating defect likelihood of new, unknown code entities
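
The train-then-predict idea above can be illustrated with a minimal sketch. The slides do not name a specific learner at this point, so a k-nearest-neighbour vote stands in for one here; the metric vectors, labels, and the name `defect_likelihood` are hypothetical.

```python
from math import dist

# Hypothetical ground truth: (metric vector, was_defective) per known entity.
# The two metrics (e.g. size and number of past changes) are made up.
TRAINING = [
    ((120.0, 3.0), False),
    ((150.0, 4.0), False),
    ((300.0, 6.0), False),
    ((900.0, 25.0), True),
    ((950.0, 18.0), True),
    ((1100.0, 30.0), True),
]


def defect_likelihood(features, training=TRAINING, k=3):
    """Fraction of the k nearest known entities that were defective."""
    nearest = sorted(training, key=lambda row: dist(row[0], features))[:k]
    return sum(1 for _, defective in nearest if defective) / k


# Estimate the likelihood for a new, unknown code entity.
new_entity = (1000.0, 22.0)
likelihood = defect_likelihood(new_entity)
```

In practice the metrics would be normalized first, since the raw distance is dominated by whichever metric has the largest scale.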

Fine-Tuning Prediction Models

Prediction Target

Machine Learner

Training Methods

Metrics (independent variables)

(Social) Network Metrics

Some participants more active and central than others.

Are these participants also more crucial?

Assumption: “Central binaries tend to be defect-prone”.

Code Network Metrics

Code entities communicate with each other.

Use call graph network to compute network metrics.

[2008] Zimmermann and Nagappan: “Predicting Defects using Network Analysis on Dependency Graphs”
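
As a rough illustration of computing a network metric on a call graph, the sketch below derives normalized degree centrality. It is not the metric set used by Zimmermann and Nagappan; the graph and the function name `degree_centrality` are assumptions made for the example.

```python
# Hypothetical call graph: caller -> set of callees.
CALL_GRAPH = {
    "parse": {"tokenize", "log"},
    "tokenize": {"log"},
    "render": {"parse", "log"},
    "log": set(),
}


def degree_centrality(graph):
    """Incoming plus outgoing edges per node, divided by (node count - 1)."""
    nodes = set(graph) | {c for callees in graph.values() for c in callees}
    degree = {node: 0 for node in nodes}
    for caller, callees in graph.items():
        for callee in callees:
            degree[caller] += 1  # outgoing call
            degree[callee] += 1  # incoming call
    scale = max(len(nodes) - 1, 1)
    return {node: count / scale for node, count in degree.items()}


centrality = degree_centrality(CALL_GRAPH)
```

Under the "central entities tend to be defect-prone" assumption, `log` and `parse` would be the first candidates for inspection in this toy graph.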


Call graphs do not change significantly over time!

Assumption: “Code that is crucially changed tends to be defect-prone”.

Change Network Metrics

Code changes depend on each other.

Central code changes tend to be crucial.

Idea: Use dependencies between code changes

Change Genealogies

Change Genealogies (in a nutshell)

[2013] Kim Herzig: “Mining and Untangling Change Genealogies” (PhD thesis)

Directed graph structure

Method level dependencies

Multi-dimensional (space & time)
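
One way to picture these three properties is a small data structure: a directed graph whose nodes are method-level changes carrying both a location (space) and a timestamp (time). This is only a sketch; the class names, fields, and change kinds are hypothetical and not taken from the thesis.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Change:
    change_id: str
    day: int      # time dimension: days since project start (assumed unit)
    file: str     # space dimension: where the change happened
    method: str   # method-level granularity
    kind: str     # e.g. "add_method_definition", "add_method_call"


@dataclass
class Genealogy:
    changes: dict = field(default_factory=dict)   # change id -> Change
    parents: dict = field(default_factory=dict)   # change id -> parent ids

    def add(self, change, depends_on=()):
        """Add a change with directed edges to the changes it depends on."""
        for parent_id in depends_on:
            parent = self.changes[parent_id]  # KeyError if parent is unknown
            if parent.day > change.day:
                raise ValueError("a change cannot depend on a later change")
        self.changes[change.change_id] = change
        self.parents[change.change_id] = set(depends_on)

    def children(self, change_id):
        """Changes that directly depend on the given change."""
        return {cid for cid, ps in self.parents.items() if change_id in ps}


genealogy = Genealogy()
genealogy.add(Change("c1", 0, "A.java", "A.foo()", "add_method_definition"))
genealogy.add(Change("c2", 5, "B.java", "B.bar()", "add_method_call"),
              depends_on=["c1"])
```

Here `c2` adds a call to a method that `c1` defined, so `c1` is a parent of `c2`.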

Change Genealogy Metrics

EGO network metrics: measure the immediate impact of changes on other changes.

GLOBAL network metrics: express the long-term impact of changes on other changes.

Considering the type of the change: e.g. adding a method definition, modifying a method call.

Considering parent age: how old are the parent changes a change depends on?

Change genealogy metrics must be aggregated to source file level.
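
The metric families above can be sketched on a toy genealogy. The data, the function names, and in particular the choice of the mean as file-level aggregation are assumptions made for illustration; the slides do not state which aggregation was used.

```python
# Hypothetical genealogy: change id -> (file, day of change, parent change ids).
CHANGES = {
    "c1": ("A.java", 0, []),
    "c2": ("B.java", 10, ["c1"]),
    "c3": ("B.java", 40, ["c1", "c2"]),
    "c4": ("A.java", 45, ["c3"]),
}


def dependents(change_id):
    """Ego view: changes that directly depend on change_id."""
    return {c for c, (_, _, parents) in CHANGES.items() if change_id in parents}


def transitive_dependents(change_id):
    """Global view: every change reachable through dependency edges."""
    seen, stack = set(), [change_id]
    while stack:
        for child in dependents(stack.pop()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def mean_parent_age(change_id):
    """Average age (in days) of the parents at the time of the change."""
    _, day, parents = CHANGES[change_id]
    if not parents:
        return 0.0
    return sum(day - CHANGES[p][1] for p in parents) / len(parents)


def file_level(metric):
    """Aggregate a per-change metric to file level (mean is an assumption)."""
    per_file = {}
    for change_id, (file, _, _) in CHANGES.items():
        per_file.setdefault(file, []).append(metric(change_id))
    return {file: sum(vals) / len(vals) for file, vals in per_file.items()}


parent_age_by_file = file_level(mean_parent_age)
```

Each source file then carries one value per metric, which is the granularity the prediction models operate on.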

Comparing change genealogies against:

Code complexity models (e.g. McCabe)

Code dependency models (Zimmermann & Nagappan)

Combined network models (change genealogy & code dependency network metrics)

Experimental Setup


Study subjects

Multiple machine learners

Prediction Precision

Code complexity metrics

Code dependency network metrics (Zimmermann & Nagappan)

Change genealogy metrics

Combined (NM & CGM): code dependency network metrics and change genealogy metrics

Confirmed: network metrics outperform complexity metrics.

Change genealogy models report fewer false positives (higher precision).

Change genealogy models report slightly more false negatives (lower recall).

Combining network metrics: good recall but worse precision.
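
For reference, precision and recall relate to false positives and false negatives as sketched below. The file names and both example predictions are hypothetical and are not results from the study.

```python
def precision_recall(predicted, actual):
    """Precision and recall for a set of entities predicted as defective."""
    true_pos = len(predicted & actual)
    false_pos = len(predicted - actual)   # flagged, but not defective
    false_neg = len(actual - predicted)   # defective, but missed
    precision = true_pos / (true_pos + false_pos) if predicted else 0.0
    recall = true_pos / (true_pos + false_neg) if actual else 0.0
    return precision, recall


# Hypothetical set of files that actually turned out defective.
actual = {"A.java", "B.java", "C.java", "D.java"}

# A cautious model: no false positives, but it misses two defective files.
cautious = precision_recall({"A.java", "B.java"}, actual)

# A broader model: finds more defects, but also flags two clean files.
broad = precision_recall(
    {"A.java", "B.java", "C.java", "E.java", "F.java"}, actual
)
```

The cautious model trades recall for precision and the broad model does the opposite, which is the same kind of trade-off the slide describes between the change genealogy model and the combined model.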

Influential Metrics

Network efficiency is among the top 10 most influential metrics.

The relationship between changes and the type of dependency account for the top 2 metrics (for all projects).

The higher the number of old parents, the higher the probability of adding bugs.

Code entities combining multiple older functionalities are more defect-prone.


Change genealogies are well suited for defect prediction (better precision, close recall).

Adapting social network metrics to change dependency graphs.

Comparing prediction models.

Summary