Cross-project Defect Prediction Using A Connectivity-based Unsupervised Classifier
Feng Zhang, Quan Zheng, Ying Zou, Ahmed E. Hassan
Cross-project defect prediction
Diagram: a supervised classifier is trained on the software metrics and defect data of a training project, then applied to the software metrics of a target project to predict defect proneness.
Issue highlighted on the diagram: heterogeneity (between the training project and the target project).
Our Previous Solution (MSR 2014)
Diagram: the same pipeline (training project: software metrics and defect data; supervised classifier; target project: software metrics; defect proneness), first shown with the Heterogeneity label, then without it.
How About Using Unsupervised Classifiers?
Diagram: the supervised classifier is replaced by an unsupervised classifier; the other labels (training project, software metrics, defect data, target project, defect proneness, heterogeneity) are unchanged.
Initial attempts using K-means were not very successful.
Far away in distance, but may be connected!
Are defective software entities connected to each other?
Figure: within-community and cross-community connections (within-community connections labeled "Stronger", cross-community connections labeled "Weaker").
Defective entities tend to connect to other defective entities.
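The idea of splitting entities by connectivity rather than by distance can be illustrated with a minimal spectral-clustering sketch. Everything specific here is an assumption made for illustration and is not stated on the slides: the function name `spectral_defect_labels`, the column-wise z-score normalization, the use of the non-negative dot product as connection weight, the normalized Laplacian, and the heuristic that the cluster with the larger average metric values is the defective one.

```python
import numpy as np

def spectral_defect_labels(metrics):
    """Split entities into two connected groups; True marks the group
    assumed to be defective (hypothetical helper, illustration only)."""
    X = np.asarray(metrics, dtype=float)

    # Assumption: z-score each metric (column) so metrics are comparable.
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant metrics carry no information
    Z = (X - X.mean(axis=0)) / std

    # Assumption: connection weight = dot product, negatives cut to zero,
    # and no self-connections.
    W = Z @ Z.T
    W[W < 0] = 0.0
    np.fill_diagonal(W, 0.0)

    # Symmetric normalized Laplacian: I - D^(-1/2) W D^(-1/2).
    degree = W.sum(axis=1)
    inv_sqrt = np.zeros_like(degree)
    connected = degree > 0
    inv_sqrt[connected] = 1.0 / np.sqrt(degree[connected])
    laplacian = np.eye(len(X)) - inv_sqrt[:, None] * W * inv_sqrt[None, :]

    # Eigenvector of the second-smallest eigenvalue (eigh sorts ascending);
    # its sign pattern gives the two-way partition.
    _, vectors = np.linalg.eigh(laplacian)
    v = inv_sqrt * vectors[:, 1]
    in_first = v > 0

    # Fallback if the split is degenerate: flag nothing.
    if in_first.all() or not in_first.any():
        return np.zeros(len(X), dtype=bool)

    # Assumption: the group with larger average metric values is defective.
    row_sums = Z.sum(axis=1)
    if row_sums[in_first].mean() < row_sums[~in_first].mean():
        in_first = ~in_first
    return in_first
```

A call such as `spectral_defect_labels(metric_matrix)` takes one row per software entity and one column per metric; no defect data from any training project is needed, which is the point of the unsupervised setting.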
Research questions
RQ1. How does the spectral clustering based classifier perform in cross-project defect prediction?
RQ2. Does the spectral clustering based classifier perform well in within-project defect prediction?
Subject projects (Total: 26)
AEEEM (5 projects): Equinox, JDT, Lucene, Mylyn, PDE
NASA (11 projects): CM1, JM1, KC3, MC1, MC2, MW1, PC1, PC2, PC3, PC4, PC5
PROMISE (10 projects): Ant, Camel, Ivy, Jedit, Log4j, Lucene, POI, Tomcat, Xalan, Xerces
Classifiers for comparison (Total: 9)
Unsupervised
1. K-means clustering (KM)
2. Partition around medoids (PAM)
3. Fuzzy C-means (FCM)
4. Neural-gas (NG)
Supervised
1. Random forest (RF)
2. Naïve Bayes (NB)
3. Logistic regression (LR)
4. Decision tree (DT)
5. Logistic model tree (LMT)
RQ1. How does the spectral clustering based classifier perform in cross-project defect prediction?
Figure: for each dataset (NASA, AEEEM, PROMISE), the average AUC of every classifier is computed, and the classifiers are then ranked with the Scott-Knott test.
Figure: resulting ranks (Rank 1 to Rank 4). Red text: unsupervised classifiers. Blue text: supervised classifiers.
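The average AUC used as the performance measure can be sketched as follows. This is a minimal pairwise (Mann-Whitney style) formulation, not the tooling used in the study; the function names `auc` and `average_auc` are hypothetical, and the scores may be any numeric defect-proneness values.

```python
import numpy as np

def auc(scores, is_defective):
    """Probability that a random defective entity is scored above a random
    clean one, counting ties as one half (hypothetical helper)."""
    scores = np.asarray(scores, dtype=float)
    y = np.asarray(is_defective, dtype=bool)
    pos, neg = scores[y], scores[~y]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("AUC needs at least one defective and one clean entity")
    # Compare every defective score with every clean score.
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / diff.size)

def average_auc(per_project):
    """Mean AUC over an iterable of (scores, is_defective) pairs."""
    return float(np.mean([auc(s, y) for s, y in per_project]))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking of defective entities above clean ones, which makes the values comparable across projects with different defect ratios.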
RQ1. Results (cross-project)
Our approach can compete with the supervised classifiers under study,
and is sometimes even better.
RQ2. Does the spectral clustering based classifier perform well in within-project defect prediction?
Figure: each project is randomly split into two halves (50% / 50%). A classifier is trained on one half and tested on the other to obtain an AUC value, then the roles of the two halves are swapped (500 random splits, thus 1,000 evaluations).
Rank classifiers (Scott-Knott Test)
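The within-project protocol (500 random 50% / 50% splits, thus 1,000 evaluations) can be sketched as an index generator. This is an illustrative reading of the slides: the function name `half_split_evaluations` and the `seed` parameter are hypothetical, and whether the original splits were stratified is not stated, so this sketch uses plain random permutations.

```python
import numpy as np

def half_split_evaluations(n_entities, n_splits=500, seed=0):
    """Yield (train_idx, test_idx) pairs: each random halving is used twice,
    once in each direction, giving 2 * n_splits evaluations."""
    rng = np.random.default_rng(seed)
    half = n_entities // 2
    for _ in range(n_splits):
        order = rng.permutation(n_entities)
        first, second = order[:half], order[half:]
        yield first, second  # train on the first half, test on the second
        yield second, first  # swap the roles of the two halves
```

Each yielded pair would be passed to a classifier and an AUC computed on the testing half; the resulting 1,000 AUC values per classifier are what a ranking procedure such as the Scott-Knott test would then compare.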
RQ2. Results (within-project)
Gold (rank 1): Random forest
Silver (rank 2): Logistic regression, Spectral clustering, Logistic model tree, Naïve Bayes
Bronze (rank 3): Fuzzy C-means
Our approach achieves performance similar to that of the supervised classifiers,
except random forest.
Feng Zhang ([email protected]) (http://www.feng-zhang.com)