Post on 05-Dec-2014
Towards Building a Universal Defect Prediction Model
Feng Zhang
Audris Mockus
Iman Keivanloo
Ying Zou
2
ONE ring that rules the other rings of power.
3
A universal model that predicts defects for all the projects.
4
Most successful prediction models are within-project models
5
How about cross-project models?
6
Deriving a universal model with cross-project models?
7
Select the training set of projects like this?
8
Or select the training set of projects like this?
9
Is it still possible to build a universal model? If so, then how?
10
What context factors to consider?
11
Steps towards building a universal model
1. Partition projects by context factors: programming language (C++, Java) and system size (small, large). This gives four groups: C++/S, C++/L, Java/S, Java/L.
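The partition step can be sketched as follows. This is only an illustration: the project names, the lines-of-code figures, and the 10,000-line threshold separating small from large systems are hypothetical and do not come from the slides.

```python
from collections import defaultdict


def size_class(loc, threshold):
    """Return 'S' for a small system and 'L' for a large one."""
    return "S" if loc <= threshold else "L"


def partition(projects, threshold=10_000):
    """Group project names by (programming language, size class).

    `threshold` is a hypothetical cut-off in lines of code.
    """
    groups = defaultdict(list)
    for name, language, loc in projects:
        groups[(language, size_class(loc, threshold))].append(name)
    return dict(groups)


# Hypothetical projects: (name, language, lines of code).
projects = [
    ("alpha", "C++", 2_000),
    ("beta", "C++", 50_000),
    ("gamma", "Java", 3_500),
    ("delta", "Java", 120_000),
    ("epsilon", "Java", 900),
]

groups = partition(projects)
for key in sorted(groups):
    print(key, groups[key])
```

Each key of `groups` corresponds to one cell of the language-by-size grid.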
12
Steps towards building a universal model
1. Partition: C++/S, C++/L, Java/S, Java/L
2. Cluster: C++/S, C++/L, Java
3. Obtain ranking functions: R1(x), R1(x), R3(x)
4. Rank, using quantiles of metric values:
   (-∞, 10%] => level 1
   (10%, 20%] => level 2
   …
   (90%, +∞) => level 10
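The quantile-based ranking in step 4 can be sketched as below. This is a minimal sketch, not the authors' implementation: the helper name `make_ranking_function` is hypothetical, the metric values are made up, and the decile cut points come from Python's `statistics.quantiles` with its default interpolation, which may differ from the method used in the study.

```python
from bisect import bisect_left
from statistics import quantiles


def make_ranking_function(metric_values):
    """Build a ranking function from one cluster's metric values.

    The nine cut points are the 10%, 20%, ..., 90% quantiles.
    A value in (-inf, 10%] maps to level 1, (10%, 20%] to level 2,
    and so on, up to level 10 for values above the 90% quantile.
    """
    cuts = quantiles(metric_values, n=10)  # nine cut points

    def rank(x):
        # bisect_left counts the cut points strictly below x, so a value
        # equal to a cut point stays in the lower level (closed on the right).
        return 1 + bisect_left(cuts, x)

    return rank


# Hypothetical metric values (for example, lines of code per file) in one cluster.
cluster_values = list(range(1, 101))
rank = make_ranking_function(cluster_values)

print([rank(x) for x in (1, 50, 100, 1000)])
```

Because every cluster is mapped onto the same ten levels, rank-transformed values from different clusters share one scale, whatever the raw ranges of the metrics were.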
13
Build a universal model
After the four steps (partition, cluster, obtain ranking functions, rank), build a universal defect prediction model using rank-transformed values.
14
Case study setup
[Figure: bar charts of the studied projects by version control system, by issue tracking system (using vs. not using), and by programming language; data labels as extracted: 937, 461]
15
Research Questions
16
RQ1. Is our rank transformation good?
[Figure: bar chart comparing rank transformation and log transformation on precision, recall, and AUC; data labels as extracted: 0.48, 0.48, 0.57, 0.58, 0.62, 0.61]
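For reference, the three measures used in the comparisons (precision, recall, and AUC) can be computed as sketched below. The labels, scores, and the 0.5 decision threshold are hypothetical and only illustrate the definitions; AUC is computed here as the fraction of (defective, clean) pairs in which the defective entity gets the higher score, with ties counted as half.

```python
def precision_recall(actual, predicted):
    """Return (precision, recall) for boolean defect labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)      # defective, flagged
    fp = sum(1 for a, p in pairs if not a and p)  # clean, flagged
    fn = sum(1 for a, p in pairs if a and not p)  # defective, missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def auc(actual, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for a, s in zip(actual, scores) if a]
    neg = [s for a, s in zip(actual, scores) if not a]
    if not pos or not neg:
        raise ValueError("need at least one defective and one clean entity")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical ground truth (1 = defective) and predicted defect probabilities.
actual = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
predicted = [s >= 0.5 for s in scores]  # hypothetical 0.5 threshold

precision, recall = precision_recall(actual, predicted)
print(round(precision, 2), round(recall, 2), round(auc(actual, scores), 2))
```

Precision and recall depend on the chosen threshold, while AUC summarizes the ranking quality across all thresholds.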
17
RQ2. How good is the universal model?
[Figure: bar chart comparing the universal model and within-project models on precision, recall, and AUC; data labels as extracted: 0.45, 0.48, 0.58, 0.63, 0.64, 0.62]
18
RQ3. Does the universal model work for external projects?
[Figure: the universal model is used to predict defects in external projects]
19
RQ3. Precision comparison
[Figure: bar chart of precision for Eclipse, Equinox, PDE, Mylyn, and Lucene, universal model vs. within-project model; data labels as extracted: 0.31, 0.47, 0.63, 0.66, 0.21, 0.13, 0.23, 0.28, 0.23, 0.28]
20
RQ3. Recall comparison
[Figure: bar chart of recall for Eclipse, Equinox, PDE, Mylyn, and Lucene, universal model vs. within-project model; data labels as extracted: 0.57, 0.79, 0.54, 0.61, 0.61, 0.34, 0.47, 0.72, 0.42, 0.60]
21
RQ3. AUC comparison
[Figure: bar chart of AUC for Eclipse, Equinox, PDE, Mylyn, and Lucene, universal model vs. within-project model; data labels as extracted: 0.76, 0.77, 0.78, 0.79, 0.69, 0.67, 0.70, 0.70, 0.68, 0.69]
22
Summary