Automatic Fine-Grained Issue Report Reclassification


Automatic Fine-Grained Issue Report Reclassification

Pavneet Singh Kochhar, Ferdian Thung, David Lo
Singapore Management University

{kochharps.2012, ferdiant.2013, davidlo}@smu.edu.sg

2/24

Misclassification of Issue Reports

Herzig et al. *
• 40% of issue reports are misclassified.
• 1/3 of issue reports are wrongly classified as bugs.

* It’s not a Bug, it’s a Feature: How Misclassification Impacts Bug Prediction, K. Herzig, S. Just, A. Zeller, ICSE 2013

(Slide figure: a BUG label surrounded by other issue categories — DOCUMENTATION, IMPROVEMENT, REFACTORING, BACKPORT, CLEANUP, DESIGN DEFECT, TASK, TEST)

Impact of Misclassification

• Well-known projects receive a large number of issue reports.

• A large number of bug reports can overwhelm developers.

• Mozilla developer: “Everyday, almost 300 bugs appear that need triaging.” *

• Triaging is a manual process.

• Misclassified reports take more time to fix. +

* J. Anvik, L. Hiew, and G. C. Murphy, “Coping with an open bug repository,” in ETX, pp. 35–39, 2005.
+ X. Xia, D. Lo, M. Wen, E. Shihab, and B. Zhou, “An empirical study of bug report field reassignment,” in CSMR-WCRE, pp. 174–183, 2014.

3/24

Related Work

• Herzig et al. [1]
  • Manually classify over 7,000 issue reports.
  • 14 different categories.
  → We use the same dataset; we use 13 categories (merging UNKNOWN & OTHERS).

• Antoniol et al. [2]
  • Classify issue reports as either “bug” or “enhancement”.
  → We consider the “reclassification” problem; we use 13 different categories.

[1] It’s not a Bug, it’s a Feature: How Misclassification Impacts Bug Prediction, K. Herzig, S. Just, A. Zeller, ICSE 2013.
[2] G. Antoniol, K. Ayari, M. D. Penta, F. Khomh, and Y.-G. Gueheneuc, “Is it a bug or an enhancement? A text-based approach to classify change requests,” in CASCON, pp. 23:304–23:318, 2008.

4/24

Our Study

Fine-Grained Issue Report Reclassification

13 Categories*: BUG, RFE, IMPROVEMENT, DOCUMENTATION, TASK, BUILD, REFACTORING, DESIGN DEFECT, TEST, CLEANUP, BACKPORT, SPECIFICATION, OTHERS

(Slide annotations on individual categories: “Adaptive Maintenance”, “Perfective Maintenance”, “Deallocating memory”, “Removing duplicate methods”)

* It’s not a Bug, it’s a Feature: How Misclassification Impacts Bug Prediction, K. Herzig, S. Just, A. Zeller, ICSE 2013

5/24

Overall Framework

(Slide diagram) Training Phase: training issue reports with ground-truth categories* → feature extraction → model building → model.
Deployment Phase: new issue reports → feature extraction → trained model → predicted reclassified categories.

* Herzig et al.
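To make the two phases concrete, here is a high-level sketch in Python; the report fields, the TF-IDF-only features, and the linear-kernel SVM are simplifying assumptions (the full feature set and LibSVM setup appear on later slides), not the authors’ exact code.

```python
# A simplified sketch of the framework, assuming scikit-learn.
# extract_features and the toy reports are hypothetical illustrations.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC

def extract_features(reports, vectorizer, fit=False):
    """Feature extraction step: turn report text into TF-IDF vectors."""
    texts = [r["summary"] + " " + r["description"] for r in reports]
    return vectorizer.fit_transform(texts) if fit else vectorizer.transform(texts)

# Training phase: issue reports with ground-truth categories -> model.
train_reports = [
    {"summary": "NullPointerException on startup", "description": "crash trace", "category": "BUG"},
    {"summary": "Support Maven builds", "description": "add maven support", "category": "RFE"},
]
vectorizer = TfidfVectorizer()
X_train = extract_features(train_reports, vectorizer, fit=True)
model = SVC(kernel="linear").fit(X_train, [r["category"] for r in train_reports])

# Deployment phase: new issue reports -> predicted reclassified categories.
new_reports = [{"summary": "NPE in parser", "description": "stack trace attached"}]
print(model.predict(extract_features(new_reports, vectorizer)))
```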

6/24

Pre-Processing

• Text Pre-Processing
  • Summary & Description fields

• Stop-word removal
  • e.g., “is”, “are”, “if”

• Stemming (reducing words to their root form)
  • e.g., “reads” and “reading” → “read”
  • Use Porter Stemmer*

*http://tartarus.org/martin/PorterStemmer/
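A minimal sketch of these steps in Python, assuming NLTK (the slide names only the Porter stemmer, not a library):

```python
# Pre-processing sketch: stop-word removal + Porter stemming with NLTK.
# Requires: nltk.download("punkt"), nltk.download("stopwords").
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

stemmer = PorterStemmer()
stop_words = set(stopwords.words("english"))

def preprocess(summary: str, description: str) -> list[str]:
    """Tokenize the Summary & Description fields, drop stop words, stem the rest."""
    tokens = word_tokenize((summary + " " + description).lower())
    return [stemmer.stem(t) for t in tokens if t.isalpha() and t not in stop_words]

print(preprocess("Reader reads twice", "reading is broken"))
# -> ['reader', 'read', 'twice', 'read', 'broken']
```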

7/24

Feature Extraction

1. TF-IDF (TF: Term Frequency; IDF: Inverse Document Frequency)

2. Reported Category (C1–C13): Cn = 1 if the report was filed under the n-th category and 0 otherwise, for n = 1 to 13
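A sketch of these two feature groups, assuming scikit-learn and NumPy (the slides do not name a library):

```python
# Feature sketch: TF-IDF text features plus a 13-dim one-hot reported category.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

CATEGORIES = ["BUG", "RFE", "IMPROVEMENT", "DOCUMENTATION", "TASK", "BUILD",
              "REFACTORING", "DESIGN DEFECT", "TEST", "CLEANUP", "BACKPORT",
              "SPECIFICATION", "OTHERS"]

def category_onehot(reported: str) -> np.ndarray:
    """Cn = 1 for the report's assigned category, 0 for the other twelve."""
    vec = np.zeros(len(CATEGORIES))
    vec[CATEGORIES.index(reported)] = 1.0
    return vec

corpus = ["read fail npe", "add maven build support"]   # toy pre-processed reports
tfidf = TfidfVectorizer().fit_transform(corpus).toarray()
onehots = np.vstack([category_onehot("BUG"), category_onehot("RFE")])
features = np.hstack([tfidf, onehots])                   # one row per report
print(features.shape)   # (2, vocabulary_size + 13)
```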

8/24

Feature Extraction

3. Exception Trace (S)
   a) Phrase: “Exception in thread”
   b) Regex: [A-Za-z0-9$.]+Exception
      e.g., java.lang.NullPointerException
   c) Regex: [A-Za-z0-9$.]+\([A-Za-z0-9]+(\.java:[0-9]+)?\)
      e.g., oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:447)

4. Issue Reporter (R1–RM), where M is the total number of reporters
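A sketch of the exception-trace feature using Python’s re module (the matching logic, beyond the phrase and regexes above, is an assumption):

```python
# Exception-trace feature (S): flag reports whose text looks like a stack trace.
import re

PHRASE = "Exception in thread"
EXC_RE = re.compile(r"[A-Za-z0-9$.]+Exception")                           # e.g., java.lang.NullPointerException
FRAME_RE = re.compile(r"[A-Za-z0-9$.]+\([A-Za-z0-9]+(\.java:[0-9]+)?\)")  # e.g., Foo.bar(Foo.java:42)

def has_exception_trace(text: str) -> bool:
    return (PHRASE in text
            or EXC_RE.search(text) is not None
            or FRAME_RE.search(text) is not None)

print(has_exception_trace("oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:447)"))  # True
```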

9/24

Model Building

• LibSVM (Support Vector Machine)*
• Multi-class classification

• Inputs
  • L, a learner (training algorithm)
  • X, the set of training data, i.e., issue reports
  • y, where yi ∈ {1, …, K}, the labels, i.e., the 13 categories

• Output
  • A list of classifiers fk for k ∈ {1, …, K}

• Classifiers are applied on unseen data to predict a label k

* http://www.csie.ntu.edu.tw/~cjlin/libsvm/

10/24
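A minimal sketch of this step; scikit-learn’s SVC wraps LibSVM, though its multi-class strategy (one-vs-one) and the kernel shown are assumptions rather than the authors’ exact configuration:

```python
# Multi-class SVM sketch via scikit-learn's LibSVM wrapper.
from sklearn.svm import SVC

X_train = [[0.1, 0.0, 1.0],   # toy feature vectors (TF-IDF + category + trace + reporter)
           [0.0, 0.7, 0.0],
           [0.9, 0.1, 0.0]]
y_train = ["BUG", "RFE", "BUG"]   # ground-truth categories

model = SVC(kernel="linear")       # kernel choice is an assumption
model.fit(X_train, y_train)
print(model.predict([[0.2, 0.1, 0.8]]))   # predicted reclassified category
```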

Dataset

Project      Organization  Tracker   Number of Issue Reports
HTTPClient   Apache        JIRA        746
Jackrabbit   Apache        JIRA       2402
Lucene-Java  Apache        JIRA       2443
Rhino        Mozilla       BugZilla   1226
Tomcat5      Apache        BugZilla    584

Total = 7401 Issue Reports *

* It’s not a Bug, it’s a Feature: How Misclassification Impacts Bug Prediction, K. Herzig, S. Just, A. Zeller, ICSE 2013

11/24

Evaluation Metrics

Per category $k$ (standard definitions):

Precision: $P_k = \frac{TP_k}{TP_k + FP_k}$

Recall: $R_k = \frac{TP_k}{TP_k + FN_k}$

F-Measure: $F1_k = \frac{2 \cdot P_k \cdot R_k}{P_k + R_k}$

Weighted F-Measure: $WF1 = \sum_{k} \frac{n_k}{N} \cdot F1_k$, where $n_k$ is the number of reports in category $k$ and $N$ is the total number of reports

We use Weighted Precision, Recall & F-Measure (per-category scores averaged by category frequency).
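A sketch of computing the weighted variants, assuming scikit-learn:

```python
# Weighted precision/recall/F-measure: per-category scores averaged by
# category frequency (the 'weighted' average in scikit-learn).
from sklearn.metrics import precision_recall_fscore_support

y_true = ["BUG", "RFE", "BUG", "TEST"]   # toy ground-truth categories
y_pred = ["BUG", "BUG", "BUG", "TEST"]   # toy predictions

prec, rec, wf1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted", zero_division=0)
print(f"Prec={prec:.2f} Rec={rec:.2f} WF1={wf1:.2f}")
```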

12/24

Baselines

• Baseline-1: predicts the reclassified category to be the same as the originally assigned category.

• Baseline-2: predicts the reclassified category as “BUG” (the majority of issues are bugs).
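Both baselines are simple enough to state as code (a direct transcription of the rules above; both ignore the report text entirely):

```python
def baseline_1(assigned_category: str) -> str:
    """Baseline-1: keep the originally assigned category."""
    return assigned_category

def baseline_2(assigned_category: str) -> str:
    """Baseline-2: always predict the majority category, BUG."""
    return "BUG"
```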

13/24

Research Questions

RQ1: Effectiveness of Our Approach

RQ2: Varying the Amount of Training Data

RQ3: Most Discriminative Features

RQ4: Analysis of Correctly & Wrongly Classified Issue Reports

RQ5: Comparison to Other Classification Algorithms

14/24

RQ1: Effectiveness of Our Approach

                    HTTPClient           Jackrabbit           Lucene-Java
                    Prec   Rec    WF1    Prec   Rec    WF1    Prec   Rec    WF1
Ours                0.61   0.63   0.60   0.71   0.72   0.71   0.63   0.62   0.63
Baseline-1          0.54   0.52   0.43   0.61   0.62   0.54   0.50   0.50   0.43
Baseline-2          0.16   0.40   0.23   0.15   0.39   0.21   0.08   0.28   0.12
Improvement-1 (%)   12.96  21.15  39.53  16.39  16.12  31.48  24.00  26.00  44.18
Improvement-2 (%)   281.2  57.4   160.8  373.3  84.6   238.0  675.0  125.0  416.6

                    Rhino                Tomcat5
                    Prec   Rec    WF1    Prec   Rec    WF1
Ours                0.58   0.61   0.57   0.58   0.62   0.58
Baseline-1          0.35   0.57   0.43   0.36   0.58   0.45
Baseline-2          0.26   0.51   0.35   0.30   0.54   0.38
Improvement-1 (%)   65.71  7.01   32.55  61.11  6.89   28.88
Improvement-2 (%)   123.0  19.6   62.85  93.3   14.8   52.63

15/24

RQ2: Varying Training Data

% of Issue   HTTPClient          Jackrabbit          Lucene-Java
Reports      Prec   Rec   WF1    Prec   Rec   WF1    Prec   Rec   WF1
10           0.49   0.56  0.47   0.63   0.65  0.60   0.55   0.57  0.53
20           0.54   0.55  0.46   0.64   0.66  0.61   0.57   0.57  0.54
30           0.58   0.60  0.54   0.68   0.70  0.67   0.59   0.60  0.58
40           0.54   0.53  0.48   0.69   0.71  0.68   0.59   0.58  0.56
50           0.58   0.61  0.57   0.69   0.71  0.69   0.62   0.63  0.61
60           0.59   0.62  0.58   0.64   0.65  0.62   0.61   0.62  0.61
70           0.60   0.62  0.58   0.70   0.72  0.70   0.62   0.63  0.62
80           0.62   0.68  0.61   0.70   0.72  0.70   0.63   0.64  0.63
90           0.61   0.64  0.60   0.71   0.73  0.71   0.62   0.63  0.62

16/24

RQ2: Varying Training Data

% of Issue   Rhino               Tomcat5
Reports      Prec   Rec   WF1    Prec   Rec   WF1
10           0.45   0.52  0.40   0.47   0.54  0.43
20           0.46   0.50  0.39   0.50   0.55  0.45
30           0.46   0.50  0.40   0.54   0.60  0.53
40           0.47   0.48  0.40   0.56   0.62  0.56
50           0.52   0.58  0.50   0.56   0.61  0.56
60           0.55   0.59  0.53   0.50   0.48  0.42
70           0.56   0.60  0.54   0.49   0.44  0.38
80           0.58   0.61  0.56   0.57   0.62  0.58
90           0.59   0.61  0.56   0.54   0.59  0.55

17/24

RQ3: Most Discriminative Features

HTTPClient
Feature                    Fisher Score
Stemmed word “test”        1.73
Reported Category (TASK)   0.58
Stemmed word “privat”      0.56
Reported Category (BUG)    0.54
Stemmed word “cleanup”     0.50

Jackrabbit
Feature                    Fisher Score
Reported Category (BUG)    0.72
Stemmed word “test”        0.55
Stemmed word “maven”       0.51
Stemmed word “backport”    0.46
Reported Category (IMPR)   0.43

18/24

RQ3: Most Discriminative Features

Lucene-Java
Feature                    Fisher Score
Stemmed word “test”        0.94
Reported Category (BUG)    0.61
Reported Category (TEST)   0.50
Stemmed word “backport”    0.45
Stemmed word “remov”       0.38

Rhino
Feature                    Fisher Score
Stemmed word “test”        3.84
Stemmed word “suit”        0.43
Stemmed word “patch”       0.32
Stemmed word “driver”      0.29
Stemmed word “regress”     0.27

Tomcat5
Feature                    Fisher Score
Stemmed word “longer”      1.15
Issue Reporter “starksm”   0.71
Stemmed word “class”       0.64
Stemmed word “ant”         0.62
Reported Category (BUG)    0.56
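The slides report Fisher scores without the formula; a sketch using the standard definition (an assumption) follows:

```python
# Fisher score of one feature: between-class variance over within-class variance.
import numpy as np

def fisher_score(feature: np.ndarray, labels: np.ndarray) -> float:
    overall_mean = feature.mean()
    between, within = 0.0, 0.0
    for c in np.unique(labels):
        vals = feature[labels == c]
        between += len(vals) * (vals.mean() - overall_mean) ** 2
        within += len(vals) * vals.var()
    return between / within if within > 0 else 0.0

# Toy example: a feature that cleanly separates two categories scores high.
feat = np.array([0.9, 0.8, 0.1, 0.2])
labs = np.array(["TEST", "TEST", "BUG", "BUG"])
print(round(fisher_score(feat, labs), 2))   # 49.0
```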

19/24

RQ4: Correctly & Wrongly Classified Reports

Ground-truth labels (rows) vs. predicted labels (columns):

          BUG   RFE   IMPR  TEST  DOC   BUILD  CLEANUP  REFAC
BUG       2631  48    119   26    23    8      8        1
RFE       139   765   223   6     13    7      13       31
IMPR      320   214   658   8     12    13     16       19
TEST      84    12    15    220   1     8      4        3
DOC       95    39    37    0     209   13     17       2
BUILD     29    17    19    11    10    127    5        1
CLEANUP   58    30    42    6     11    5      104      12
REFAC     20    51    61    1     2     0      16       91

The table shows 8 of the 13 categories.

BUG – 2631/2914 (90.3%)
RFE – 765/1221 (62.7%)
TEST – 220/349 (63%)
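For reference, such a matrix can be produced with scikit-learn’s confusion_matrix (an assumption; the slides show only the result):

```python
# Confusion-matrix sketch: rows are ground truth, columns are predictions,
# matching the slide's layout.
from sklearn.metrics import confusion_matrix

labels = ["BUG", "RFE", "IMPR", "TEST", "DOC", "BUILD", "CLEANUP", "REFAC"]
y_true = ["BUG", "RFE", "IMPR", "BUG"]   # toy ground-truth labels
y_pred = ["BUG", "BUG", "IMPR", "BUG"]   # toy predictions

print(confusion_matrix(y_true, y_pred, labels=labels))
```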

20/24

RQ4: Correctly & Wrongly Classified Reports

(The same confusion matrix as on the previous slide, repeated.)

21/24

RQ5: Comparison with Other Algorithms

Approach             HTTPClient           Jackrabbit           Lucene-Java
                     Prec   Rec    WF1    Prec   Rec    WF1    Prec   Rec    WF1
Ours (LibSVM)        0.61   0.63   0.60   0.71   0.72   0.71   0.62   0.63   0.62
Naïve Bayes          0.49   0.47   0.48   0.51   0.39   0.43   0.46   0.37   0.40
NB Multinomial       0.53   0.60   0.54   0.64   0.66   0.61   0.60   0.59   0.56
K-Nearest Neighbors  0.47   0.29   0.34   0.60   0.58   0.59   0.46   0.40   0.42
Random Forest        0.45   0.56   0.46   0.54   0.58   0.53   0.45   0.48   0.43
RBF Network          0.37   0.39   0.37   0.39   0.41   0.40   0.31   0.31   0.30

22/24

RQ5: Comparison with Other Algorithms

Approach             Rhino                Tomcat5
                     Prec   Rec    WF1    Prec   Rec    WF1
Ours (LibSVM)        0.58   0.61   0.57   0.58   0.62   0.58
Naïve Bayes          0.51   0.51   0.51   0.48   0.40   0.42
NB Multinomial       0.52   0.58   0.49   0.51   0.58   0.47
K-Nearest Neighbors  0.50   0.43   0.43   0.43   0.43   0.42
Random Forest        0.51   0.56   0.47   0.45   0.56   0.46
RBF Network          0.40   0.43   0.41   0.33   0.54   0.39

23/24

Conclusion & Future Work

• Automated approach to reclassify issue reports
• Evaluated over 7,000 issue reports
• Extracted features: TF-IDF, reported category, exception trace, and issue reporter
• Performed multi-class classification (13 categories)
• F-Measure scores of 0.57–0.71
• Improvement of 28.88%–414.66% over the baselines

Future Work:
• Analyse more issue reports
• Design a more advanced multi-class solution

24/24

Thank You!

Email: kochharps.2012@smu.edu.sg