Mining Version Archives for Co-changed Lines · ropas.snu.ac.kr/seminar/20080717.pdf
Predicting Bugs by Analyzing History
Sunghun Kim
Research On Program Analysis System
Seoul National University
Around the World in 80 days
Around the World in 8 years
Predicting Bugs
Severe consequences
• July 28, 1962: Mariner I space probe
• 1982: Soviet gas pipeline
• 1985 ~ 1987: Therac-25 medical accelerator
• June 4, 1996: Ariane 5 Flight 501
• …
Boring and labor intensive work
The Ariane 5 exploded seconds after launching.
Analyzing History
"Those who cannot learn from history are doomed to repeat it."
- George Santayana
Analyzing History
"Developers who cannot learn from history are doomed to repeat it."
- George Santayana
Analyzing History
"Developers who cannot learn from Software history are doomed to repeat it."
- George Santayana
[Figure: raw data (tests, e-mails, bug reports, models) is mined into history information. The available information supports bug prediction, resource allocation, software understanding, and change impact analysis. Arrows are labeled "Mine", "Feedback", and "Produce".]
Dream
All modules are bug-free!
[Figure: grid of modules, all bug-free.]

Some are Buggy
Some modules are buggy!
[Figure: grid of modules; X marks a buggy module.]

Changes introduce Bugs
Changes often introduce bugs!
[Figure: grid of modules; X marks a buggy module, and "change" marks a change next to a buggy module.]
Two Bug Prediction Algorithms
• Change Classification: predicting if a change introduces a bug
• Bug cache: predicting buggy modules
[Figure: grid of modules; X marks a buggy module.]
Change Classification [TSE08, the featured article of March/April issue]
Change Classification
[Figure: development history of JEditTextArea.java, Rev 1 through Rev 4, with a change between consecutive revisions. After the latest change the developer asks: "Did I just introduce a bug?"]
What to do when a change is likely to introduce a bug?
Review the submitted code carefully
• The submitted code change is fresh
Focus additional software quality assurance (QA) efforts on those changes
• Software inspections
• Additional test cases
Change Classification
It classifies all changes (as buggy or clean) with 70% recall and 94% precision.
[Figure: historical changes (Rev n to Rev n+1), each labeled buggy or clean, are fed to a machine learner. For a new change (Rev n to Rev n+1) the learner answers the question "clean or buggy?"]
Label Historical Changes [MSR06, ASE06]
[Figure: development history of JEditTextArea.java: Rev 1 … Rev 100, Rev 101, Rev 102, with a change between consecutive revisions.]
Rev 101 (with BUG): ...setText("\t")...
Rev 102 (no BUG): ...insertTab()...
The change from Rev 101 to Rev 102 fixed the bug.
Change message: "fix for bug 28434"
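The slide identifies a fix from its change message ("fix for bug 28434"). A minimal sketch of that idea follows. The keyword patterns and the function names `is_fix_message` and `bug_ids` are assumptions made for illustration; the slides do not show the actual matching rules.

```python
import re

# Hypothetical patterns: the slide only shows one message, "fix for bug 28434".
FIX_PATTERN = re.compile(r"\b(fix(e[sd])?|bug)\b", re.IGNORECASE)
BUG_ID_PATTERN = re.compile(r"\bbug\s*#?(\d+)", re.IGNORECASE)


def is_fix_message(message: str) -> bool:
    """Return True when a change message looks like a bug fix."""
    return FIX_PATTERN.search(message) is not None


def bug_ids(message: str) -> list:
    """Extract the bug report numbers mentioned in a change message."""
    return BUG_ID_PATTERN.findall(message)
```

A change whose message matches would be treated as a fix, and the revision it repairs becomes a candidate for the "with BUG" label.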
Tracking Line Changes [MSR05]
[Figure: line-by-line mapping across Rev 12 (12 lines), Rev 23 (16 lines), Rev 42 (16 lines), and Rev 101 (14 lines). Hunks between revisions are marked CHG, DEL, or ADD.]
Line origins of the 14 lines in Rev 101:
• Lines 1-2: Rev 12
• Lines 3-5: Rev 23
• Lines 6-7: Rev 12
• Lines 8-9: Rev 12, 23
• Line 10: Rev 12, 23, 42
• Lines 11-14: Rev 12
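Line origins can be carried from one revision to the next by diffing the two versions. The sketch below uses Python's `difflib.SequenceMatcher` as a stand-in differ; the tracking approach cited as [MSR05] may work differently, and the function name `track_origins` and the positional pairing of changed lines are assumptions.

```python
from difflib import SequenceMatcher


def track_origins(old_lines, old_origins, new_lines, new_rev):
    """Carry line origins from one revision to the next.

    old_origins[i] is the list of revisions that touched old_lines[i].
    Unchanged lines keep their origins, changed lines (CHG) append
    new_rev, added lines (ADD) start fresh, deleted lines (DEL) vanish.
    """
    new_origins = []
    matcher = SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            new_origins.extend(old_origins[i1:i2])
        elif tag == "replace":
            for k in range(j2 - j1):
                # Pair changed lines by position; surplus new lines count as added.
                if i1 + k < i2:
                    new_origins.append(old_origins[i1 + k] + [new_rev])
                else:
                    new_origins.append([new_rev])
        elif tag == "insert":
            new_origins.extend([new_rev] for _ in range(j2 - j1))
        # tag == "delete": nothing to carry over
    return new_origins
```

Applying the function revision by revision yields, for every line of the latest revision, the list of revisions that created or changed it.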
Label Historical Changes
[Figure: the line origins of the code fixed in Rev 102 (...insertTab()..., no BUG) are followed back from Rev 101 (...setText("\t")..., with BUG) through the development history of JEditTextArea.java. The originating change is marked "buggy change"; the change to Rev 102 is marked "fixed".]
[Figure: Rev 22 through Rev 26, with the four changes between them labeled buggy, buggy, clean, clean.]
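Once fixes and line origins are known, labeling reduces to a lookup: a change is buggy if a later fix touched lines that it introduced. A minimal sketch under that assumption follows; the function name `label_changes` and the example revision numbers are hypothetical.

```python
def label_changes(all_revisions, fixes):
    """Label each revision "buggy" or "clean".

    fixes maps a fix revision to the origin revisions of the lines it
    changed or deleted (the line origins recovered from history).
    A revision is buggy when a later fix touched lines it introduced.
    """
    buggy = set()
    for fix_rev, origin_revs in fixes.items():
        buggy.update(r for r in origin_revs if r < fix_rev)
    return {rev: ("buggy" if rev in buggy else "clean") for rev in all_revisions}


# Hypothetical history: the fix in revision 26 rewrote lines from revisions 22 and 23.
labels = label_changes([22, 23, 24, 25], {26: [22, 23]})
```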
Extracting Features
[Figure: a change to JEditTextArea.java between Revision 10 and Revision 11, shown with its metadata.]
Author: hunkim
Check-in time: March 23, 2006 11:30 AM
Log: Never convert propresult from utf-16
Historical changes (label, then feature vector):
1    0 1 0 1 0 1 0 1 … 0
0    0 0 0 1 0 1 0 1 … 0
0    0 1 1 1 0 1 1 1 … 0
1    0 1 0 3 0 0 0 1 … 0
0    0 1 0 1 0 1 0 1 … 0
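To make the step from a change to a feature vector concrete, here is a small sketch that encodes the metadata shown on the slide (author, check-in time, log message). The encoding itself (term counts, check-in hour, one-hot author) and the names `extract_features`, `vocabulary`, and `authors` are assumptions; the slides do not list the features actually used.

```python
import re
from datetime import datetime


def extract_features(author, check_in_time, log, vocabulary, authors):
    """Turn one change into a numeric feature vector.

    vocabulary is an ordered list of terms; each contributes its count
    in the log message. The check-in hour and a one-hot encoding of the
    author are appended as metadata features.
    """
    words = re.findall(r"[a-z0-9]+", log.lower())
    term_counts = [words.count(term) for term in vocabulary]
    hour = datetime.strptime(check_in_time, "%B %d, %Y %I:%M %p").hour
    return term_counts + [hour] + [int(author == a) for a in authors]


vector = extract_features(
    "hunkim",
    "March 23, 2006 11:30 AM",
    "Never convert propresult from utf-16",
    vocabulary=["fix", "convert", "utf"],   # hypothetical vocabulary
    authors=["hunkim", "other"],            # hypothetical author list
)
```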
Training Classifiers
Machine learning techniques
• Bayesian Network, SVM
Classifying New Changes
New change: 0 1 0 1 0 0 0 1 … 0
Prediction: 0
[Figure: the classifier trained on the labeled historical changes predicts the label of the new change.]
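The train-then-classify loop can be sketched without external libraries. Note the swap: the slides name Bayesian Network and SVM, while this example uses a Bernoulli naive Bayes classifier (the simplest Bayesian network structure) purely to stay dependency-free. The function names and the toy data are assumptions.

```python
import math


def train(vectors, labels):
    """Fit a Bernoulli naive Bayes model with Laplace smoothing."""
    model = {}
    for label in set(labels):
        rows = [v for v, y in zip(vectors, labels) if y == label]
        prior = len(rows) / len(vectors)
        # P(feature present | label), smoothed so no probability is 0 or 1.
        present = [
            (sum(1 for r in rows if r[i] > 0) + 1) / (len(rows) + 2)
            for i in range(len(vectors[0]))
        ]
        model[label] = (prior, present)
    return model


def classify(model, vector):
    """Return the label with the highest log-posterior for vector."""
    def score(label):
        prior, present = model[label]
        total = math.log(prior)
        for value, p in zip(vector, present):
            total += math.log(p if value > 0 else 1 - p)
        return total
    return max(model, key=score)


# Toy data: feature 0 is present only in buggy (label 1) changes.
model = train([[1, 0], [1, 1], [0, 1], [0, 0]], [1, 1, 0, 0])
```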
Evaluation
Training and testing sets
• 10-fold cross-validation
Performance measurement
• Precision
• Recall
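For reference, 10-fold cross-validation splits the labeled changes into ten parts and uses each part once for testing while training on the other nine. A minimal index-splitting sketch follows; the name `k_fold_splits` is hypothetical, and the sketch does not shuffle the data, which a real evaluation normally would.

```python
def k_fold_splits(n_samples, k=10):
    """Yield (train_indices, test_indices) for k-fold cross-validation."""
    indices = list(range(n_samples))
    for fold in range(k):
        test = indices[fold::k]          # every k-th sample, offset by fold
        test_set = set(test)
        train = [i for i in indices if i not in test_set]
        yield train, test


splits = list(k_fold_splits(25))
```

Each sample appears in exactly one testing set, so every change is predicted once by a model that never saw it during training.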
Training and Testing Sets
[Figure: the labeled feature vectors form the training set. A held-out testing vector (0 1 0 1 0 0 0 1 … 0) receives a prediction (0), which is compared with its real label (0).]
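Performance Measurement
4 possible outcomes from using a classifier

                 Prediction: Buggy    Prediction: Clean
Real: Buggy      $n_{b \to b}$        $n_{b \to c}$
Real: Clean      $n_{c \to b}$        $n_{c \to c}$

Precision: $\frac{n_{b \to b}}{n_{b \to b} + n_{c \to b}}$, Recall: $\frac{n_{b \to b}}{n_{b \to b} + n_{b \to c}}$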
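The precision and recall definitions above translate directly into code. In this sketch the name `precision_recall`, the label encoding (1 for buggy), and the sample data are assumptions.

```python
def precision_recall(real, predicted, buggy=1):
    """Compute buggy-class precision and recall from paired labels."""
    pairs = list(zip(real, predicted))
    n_bb = sum(1 for r, p in pairs if r == buggy and p == buggy)  # buggy -> buggy
    n_bc = sum(1 for r, p in pairs if r == buggy and p != buggy)  # buggy -> clean
    n_cb = sum(1 for r, p in pairs if r != buggy and p == buggy)  # clean -> buggy
    precision = n_bb / (n_bb + n_cb) if n_bb + n_cb else 0.0
    recall = n_bb / (n_bb + n_bc) if n_bb + n_bc else 0.0
    return precision, recall


# Hypothetical outcome: 2 buggy changes caught, 2 missed, 1 false alarm.
precision, recall = precision_recall(
    real=[1, 1, 1, 1, 0, 0, 0],
    predicted=[1, 1, 0, 0, 1, 0, 0],
)
```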
Subject Systems
PostgreSQL, jEdit, Mozilla, Bugzilla, and more …
Bug Prediction Accuracy (Bayesian Network after feature selection)
[Figure: bar chart of bug recall and bug precision (scale 0 to 1) for the projects A1, BUG, COL, GAI, GFO, JED, MOZ, ECL, PLO, POS, SCA, SVN.]
Related Work
[Figure: bar chart comparing recall and precision (scale 0 to 100).]
Approach                      Recall    Precision
Change Classification         70        94
Aversano et al. [IWPSE07]     59        59
Mizuno et al. [FSE07]         59        51
Change Classification Summary
Use machine learning techniques to analyze changes
After training, can predict whether or not a new change has a bug
• Average recall is 70%
• Average precision is 94%
Applicable in real development process
• Yahoo and Apple are interested
Bug Cache [ICSE07, Distinguished paper]
Bug cache: locating buggy modules
[Figure: grid of modules; X marks a buggy module.]
Motivation
Which files should we focus on?
Where are the Bugs?
Temporal locality: files that had defects are likely to have more soon. [Ostrand, Hassan]
In modified files! [Khoshgoftaar et al.]
In new files! [Graves et al.]
Spatial locality: near other bugs! [Zimmermann et al.]
Our Solution
• List of most bug-prone files
• Dynamically adaptive and intuitive
• Combine bug prediction models
A BugCache holding 10% of the files predicts 73~95% of bugs
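Bug Cache Model
Cache Model
[Figure: files A, B, C checked against a cache of size 2; the access shown is a miss.]
Cache Update
• Load missed files
• Pre-fetch nearby files (spatial locality)
[Table: files A, B, C, D listed by number of common changes with the missed file; the counts are garbled in extraction (140, 4).]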
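The table on this slide measures "nearby" by the number of common changes with the missed file. A sketch of that pre-fetch selection follows. It assumes that the block size counts the missed file itself, so `block_size - 1` neighbors are loaded; the function names and commit data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_change_counts(commits):
    """Count how often each pair of files changed in the same commit."""
    counts = Counter()
    for files in commits:
        for a, b in combinations(sorted(set(files)), 2):
            counts[(a, b)] += 1
    return counts


def nearby_files(missed, counts, block_size):
    """Return the block_size - 1 files that co-changed most with missed."""
    scores = Counter()
    for (a, b), n in counts.items():
        if a == missed:
            scores[b] += n
        elif b == missed:
            scores[a] += n
    return [f for f, _ in scores.most_common(block_size - 1)]


# Hypothetical commit history: each entry is the set of files in one commit.
counts = co_change_counts([["A", "B"], ["A", "B", "C"], ["A", "C"], ["A", "B"]])
```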
Cache Model
[Figure: cache trace with cache size 2 and block size 2, showing hits and misses for files A, B, C, D.]
Which one should be replaced?
Replacement Policies
• Least recently used (LRU): unload the files that have the least recently found defect.
• Least frequently changed (CHANGE): unload the files that have the fewest changes.
• Least frequent defects (BUG): unload the files that have the fewest defects.
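Each of the three policies picks the cached file with the smallest value of one per-file statistic. The sketch below makes that explicit; the function name `choose_victim`, the dictionary-based bookkeeping, and the example numbers are assumptions.

```python
def choose_victim(cache, policy, last_bug_time, change_count, bug_count):
    """Pick the cached file to unload under one of the three policies.

    last_bug_time, change_count, and bug_count map file -> number.
    """
    if policy == "LRU":
        key = last_bug_time      # least recently found defect
    elif policy == "CHANGE":
        key = change_count       # fewest changes
    elif policy == "BUG":
        key = bug_count          # fewest defects
    else:
        raise ValueError(f"unknown policy: {policy}")
    return min(cache, key=lambda f: key.get(f, 0))


# Hypothetical statistics for a cache holding files B and C.
stats = dict(
    last_bug_time={"B": 5, "C": 9},
    change_count={"B": 12, "C": 7},
    bug_count={"B": 2, "C": 1},
)
```

With these numbers, LRU unloads B (its last defect is older), while CHANGE and BUG both unload C.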
[Figure: cache trace with cache size 2, block size 2, and replacement policy BUG. A table of per-file bug counts (2 and 1) marks the file with 1 bug for replacement.]
Cache Evaluation
[Figure: cache trace with cache size 2, block size 2, and replacement policy BUG, showing two hits and two misses.]
Hit rate = #Hits / #Bugs = 50%
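The hit rate can be computed by replaying the bug history against the cache. The following is a simplified sketch, not the evaluated implementation: it uses the BUG replacement policy, loads files only on a miss, and takes pre-fetch neighbors from a given mapping. The function name `simulate` and the bug sequence are hypothetical.

```python
def simulate(bug_sequence, cache_size, neighbors=None):
    """Replay bugs against the cache and return the hit rate.

    bug_sequence lists the file of each bug in time order.
    neighbors maps a file to the files pre-fetched with it on a miss.
    Replacement uses the BUG policy: unload the file with fewest bugs.
    """
    if not bug_sequence:
        return 0.0
    neighbors = neighbors or {}
    cache, bug_count, hits = [], {}, 0
    for f in bug_sequence:
        bug_count[f] = bug_count.get(f, 0) + 1
        if f in cache:
            hits += 1
            continue
        # Miss: load the file, then pre-fetch its neighbors.
        for g in [f] + neighbors.get(f, []):
            if g in cache:
                continue
            if len(cache) >= cache_size:
                # Never evict the file that just had the bug.
                candidates = [c for c in cache if c != f]
                if not candidates:
                    break
                victim = min(candidates, key=lambda c: bug_count.get(c, 0))
                cache.remove(victim)
            cache.append(g)
    return hits / len(bug_sequence)


# Hypothetical history: misses on A, B, C and hits on A, A, C.
rate = simulate(["A", "B", "A", "C", "A", "C"], cache_size=2)
```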
Hit Rates
File cache size = 10% of all files
[Figure: bar chart of hit rates (scale 0 to 75 and beyond) for Subversion, PostgreSQL, Mozilla, JEdit, Eclipse, Columba, and Apache 1.3. Bar values in extraction order: 73, 79, 88, 85, 95, 83, 82; the pairing of values with projects is not recoverable from the extraction.]
Related Work
[Figure: bar chart (scale 0 to 100) comparing BugCache (top 10%), Hassan et al. (top 10%), Ostrand et al. (top 20%), Khoshgoftaar et al. (top 20%), and Khoshgoftaar et al. (top 10%). Bar values in extraction order: 64%, 82%, 71~93%, 44~78%, 73~95%.]
Previous state of the art: 10% predicts 44~78%; 20% predicts 71~93%
BugCache: 10% predicts 73~95%
Conclusion
Analyzing history is an effective way to predict bug locations
• Change classification can classify changes as buggy or clean with very good accuracy
• BugCache can identify the most bug-prone files
Research Overview and Future Work
Research Goal
Developer productivity
Reliable software
Predicting bugs
Research Overview
History Mining
• Kenyon [FSE05]
• Bug Introducing Changes [ASE06]
• Prioritization of Warnings [FSE07]
Software Understanding
• Signature Change Patterns [ICSM06]
• Micro Pattern Evolution [MSR06]
• Matching Name Changes [WCRE06]
Bug Prediction
• Memories of Bug Fixes [FSE06]
• Change Classification [TSE08]
• Bug Cache [ICSE07]
Dynamic Monitoring
• ReCrash [ECOOP08]
• Zero-day patch
[Figure labels: Static, Dynamic]
Future Work
Change classification aware repository
• Micro commits and explicit fix-change marks
• Change feedback to developers
• Adaptive change classification
Personal coding assistance
• Mining common error patterns in my code
• Showing code survival rates
Mining bug and crash reports
• Identifying/predicting crashed methods
• Predicting locations based on bug reports
• Increasing bug report quality
Mining APIs
• Example oriented API documents
• Identifying more/less error prone APIs
• Automatic API version upgrading
Combining complementary techniques
• To find bugs effectively and efficiently
• Static and Dynamic (ReCrash + BugCache)
Static and Dynamic Analysis
Combine static and dynamic analysis
• BugCache + ReCrash

            Overhead    False positives
Static      Low         High
Dynamic     High        Low
Reproducing Crashes
Reproducing crashes (faults) is hard!
• Requires the exact configuration of the crash (in the field)
• Crashes usually involve nondeterministic factors
Must be able to reproduce crashes to fix bugs and validate fixes
ReCrash [ECOOP08]
[Figure: ReCrash monitors the subject program and produces a test case that reproduces the crash.]
13-64% performance overhead
ReCrash + BugCache
ReCrash
• Monitoring all modules
BugCache
• Identify crash-prone modules
ReCrash + BugCache
• Monitoring only identified modules
• 10% of modules account for 70% of crashes
• ReCrash + BugCache can possibly
  • run with 1~6% overhead
  • reproduce 70% of crashes
[Figure: grid of modules; X marks a module identified by BugCache.]
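The 1~6% figure is consistent with scaling the 13-64% ReCrash overhead down to the 10% of modules that BugCache selects. The arithmetic below assumes that overhead grows linearly with the share of monitored modules; that linearity is an assumption for illustration and is not stated on the slides.

```python
def scaled_overhead(full_overhead, monitored_fraction):
    """Estimate overhead when only a fraction of modules is monitored.

    Assumes overhead is proportional to the share of monitored modules.
    """
    return full_overhead * monitored_fraction


low = scaled_overhead(0.13, 0.10)    # lower bound of the ReCrash overhead
high = scaled_overhead(0.64, 0.10)   # upper bound of the ReCrash overhead
```

Under this assumption the estimate is roughly 1.3% to 6.4%, in line with the 1~6% stated on the slide.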
Acknowledgement
Advisors
• Jim Whitehead (UCSC), Michael Ernst (MIT)
Collaborators (co-authors)
• MIT: Shay Artzi, Adam Kiezun, Danny Dig
• UCSC: Guozhang Ge, Yi Zhang, Kai Pan, Ramak Akella, Jen Bevan, Elias Sinderson
• Saarland U: Andreas Zeller, Nicolas Bettenburg, Rahul Premraj
• Iowa: Tien Nguyen, Hojun Jaygarl, Sean Chen (Taiwan)
• Canada: Tom Zimmermann (U. Calgary), Michael Godfrey (Waterloo), Ahmed Hassan (Queen's U)
• Europe: Tudor Girba (SW-ENG), Martin Pinzger (U. Zurich)
• Industry / UW: Audris Mockus (Avaya), Shiv Shivaji (Yahoo), Miryung Kim (UW)
Summary
Predicting Bugs by Analyzing History
Sunghun Kim
Research On Program Analysis System
Seoul National University
http://people.csail.mit.edu/hunkim