MSR presentation

Description: MSR 2011 Talk slides

Transcript of MSR presentation

Page 1: MSR presentation


Retrieval from Software Libraries for Bug Localization: A Comparative Study of Generic and Composite Text Models

Shivani Rao and Avinash Kak

School of ECE, Purdue University

May 21, 2011, MSR, Hawaii

Page 2: MSR presentation

Outline

1 Bug localization

2 IR (Information Retrieval)-based bug localization

3 Text Models

4 Preprocessing of the source files

5 Evaluation Metrics

6 Results

7 Conclusion

Page 3: MSR presentation


Bug localization

Bug localization means to locate the files, methods, classes, etc., that are directly related to the problem causing abnormal execution behavior of the software.

IR-based bug localization means to locate a bug from its textual description.

Page 4: MSR presentation

Background

A typical bug localization process

Page 5: MSR presentation

Background

A typical bug report: JEdit

Page 6: MSR presentation

Background

Past work on IR-based bug localization

Authors/Paper       Model             Software dataset
Marcus et al. [1]   VSM               JEdit
Cleary et al. [2]   LM, LSA and CA    Eclipse JDT
Lukins et al. [3]   LDA               Mozilla, Eclipse, Rhino and JEdit

Drawbacks

1 None of the work reported has been evaluated on a standard dataset.

2 Inability to compare with the static and dynamic techniques.

3 Number of bugs is of the order of 5-30.

Page 7: MSR presentation

Background

iBUGS

Created by Dallmeier and Zimmermann [4], iBUGS contains a large number of real bugs with corresponding test suites in order to generate failing and passing test runs.

ASPECTJ software

Software Library Size (Number of files)    6546
Lines of Code                              75 KLOC
Vocabulary Size                            7553
Number of bugs                             291

Table: The iBUGS dataset after preprocessing

Page 8: MSR presentation

Background

A typical bug report in the iBUGS repository

Page 9: MSR presentation

Text Models

Text models

VSM : Vector Space Model

LSA : Latent Semantic Analysis Model

UM : Unigram Model

LDA : Latent Dirichlet Allocation Model

CBDM : Cluster-Based Document Model

Page 10: MSR presentation

Text Models

Vector Space Model

If V is the vocabulary, then queries and documents are |V|-dimensional vectors.

$\text{sim}(q, d_m) = \frac{\vec{w}_q \cdot \vec{w}_m}{|\vec{w}_q|\,|\vec{w}_m|}$

Sparse yet high-dimensional space.
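As a concrete illustration (not from the slides), a minimal Python sketch of this scoring step, assuming queries and documents have already been reduced to term-frequency dictionaries:

    # Minimal VSM sketch (illustrative only): cosine similarity between
    # term-frequency vectors stored as {term: count} dictionaries.
    import math

    def cosine_sim(q, d):
        # dot product over the shared terms only (the vectors are sparse)
        dot = sum(q[t] * d[t] for t in q if t in d)
        norm_q = math.sqrt(sum(v * v for v in q.values()))
        norm_d = math.sqrt(sum(v * v for v in d.values()))
        if norm_q == 0 or norm_d == 0:
            return 0.0
        return dot / (norm_q * norm_d)

    query = {"parser": 1, "crash": 2}
    doc = {"parser": 3, "aspect": 1, "crash": 1}
    print(cosine_sim(query, doc))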

Page 11: MSR presentation

Text Models

Latent semantic analysis: Eigen decomposition

$A = U \Sigma V^T$

Page 12: MSR presentation

Text Models

LSA based models

Topic-based representation: $\vec{w}_K(m)$, a K-dimensional vector in the eigenspace that represents the mth document $\vec{w}_m$:

$\vec{w}_K(m) = \Sigma_K^{-1} U_K^T \vec{w}_m, \qquad \vec{q}_K = \Sigma_K^{-1} U_K^T \vec{q}$

$\text{sim}(q, d_m) = \frac{\vec{q}_K \cdot \vec{w}_K(m)}{|\vec{q}_K|\,|\vec{w}_K(m)|}$

LSA2: Fold the K-dimensional representation back into a smoothed |V|-dimensional representation, $A_K = U_K \Sigma_K V_K^T$, and compare directly with the query q.

Combined representation: combines LSA2 with the VSM representation using the mixture parameter λ: $A_{combined} = \lambda A + (1 - \lambda) A_K$
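A small numpy sketch of these projections (illustrative only, with random toy data standing in for the term-document matrix A and the query):

    # Illustrative LSA sketch: truncated SVD, projection into the K-dimensional
    # eigenspace, and the LSA2 fold-back to the smoothed term space.
    import numpy as np

    A = np.random.rand(50, 8)          # toy |V| x |D| term-document matrix
    q = np.random.rand(50)             # query as a |V|-dimensional term vector
    K = 3

    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_K, S_K = U[:, :K], np.diag(s[:K])

    def to_eigenspace(w):
        # w_K = Sigma_K^{-1} U_K^T w
        return np.linalg.inv(S_K) @ U_K.T @ w

    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    q_K = to_eigenspace(q)
    scores = [cos(q_K, to_eigenspace(A[:, m])) for m in range(A.shape[1])]

    # LSA2: rank-K smoothed term-document matrix, compared directly with q
    A_K = U_K @ S_K @ Vt[:K, :]
    scores_lsa2 = [cos(q, A_K[:, m]) for m in range(A.shape[1])]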

Page 13: MSR presentation

Text Models

Unigram model to represent documents using a probability distribution [5]

The term frequencies in a document are considered to be its probability distribution.

The term frequencies in a query become the query's probability distribution.

The similarities are established by comparing the probability distributions using KL divergence.

To add smoothing, we add the probability distribution over the entire source library:

$p_{uni}(w \mid D_m) = \mu \frac{c(w, d_m)}{|d_m|} + (1 - \mu) \frac{\sum_{m=1}^{|D|} c(w, d_m)}{\sum_{m=1}^{|D|} |d_m|}$

$p_{uni}(w \mid q) = \mu \frac{c(w, q)}{|q|} + (1 - \mu) \frac{\sum_{m=1}^{|D|} c(w, d_m)}{\sum_{m=1}^{|D|} |d_m|}$
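A short illustrative sketch of the smoothed unigram model and KL-divergence ranking above, with made-up toy documents (lower KL divergence means a better match):

    # Illustrative unigram-model sketch: mixture smoothing with the collection
    # distribution, then ranking documents by KL divergence from the query model.
    import math
    from collections import Counter

    docs = [["parser", "crash", "null"], ["weaver", "aspect", "crash"]]
    query = ["parser", "crash"]
    mu = 0.5

    collection = Counter(w for d in docs for w in d)
    coll_total = sum(collection.values())
    vocab = set(collection)

    def p_uni(w, counts, total):
        # mu * c(w, d)/|d| + (1 - mu) * collection probability
        return mu * counts[w] / total + (1 - mu) * collection[w] / coll_total

    def kl(p, q):
        # KL(p || q); both are smoothed, so q(w) > 0 for every w in the vocabulary
        return sum(p[w] * math.log(p[w] / q[w]) for w in vocab if p[w] > 0)

    q_counts = Counter(query)
    p_q = {w: p_uni(w, q_counts, len(query)) for w in vocab}
    for d in docs:
        d_counts = Counter(d)
        p_d = {w: p_uni(w, d_counts, len(d)) for w in vocab}
        print(d, kl(p_q, p_d))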

Page 14: MSR presentation


Text Models

LDA: A mixture model to represent documents using topics/concepts [6]

Page 15: MSR presentation

Text Models

LDA based models [7]

Topic-based representation: $\theta_m$, a K-dimensional probability vector that indicates the topic proportions present in the mth document.

Maximum-likelihood representation: folds back to the |V|-dimensional term space:

$p_{lda}(w \mid D_m) = \sum_{t=1}^{K} p(w \mid z = t)\, p(z = t \mid D_m) = \sum_{t=1}^{K} \phi(t, w)\, \theta_m(t)$

Combined representation: combines the Unigram representation of the document and the MLE-LDA representation of the document:

$p_{combined}(w \mid D_m) = \lambda\, p_{lda}(w \mid D_m) + (1 - \lambda)\, p_{uni}(w \mid D_m)$
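An illustrative numpy sketch of the MLE-LDA fold-back and the combined model, assuming the topic-word matrix phi and the topic proportions theta_m have already been estimated by some LDA inference step (the numbers below are random stand-ins):

    # Illustrative MLE-LDA fold-back: p_lda(w|D_m) = sum_t phi(t, w) * theta_m(t),
    # then mixed with a unigram estimate via the weight lambda.
    import numpy as np

    K, V = 4, 10
    rng = np.random.default_rng(0)

    phi = rng.dirichlet(np.ones(V), size=K)      # K x V topic-word matrix, rows sum to 1
    theta_m = rng.dirichlet(np.ones(K))          # topic proportions of document m
    p_uni_m = rng.dirichlet(np.ones(V))          # stand-in unigram model of document m
    lam = 0.9

    p_lda_m = theta_m @ phi                      # |V|-dimensional probability vector
    p_combined_m = lam * p_lda_m + (1 - lam) * p_uni_m

    assert np.isclose(p_combined_m.sum(), 1.0)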

Page 16: MSR presentation


Text Models

Cluster Based Document Model (CBDM) [8]

Cluster the documents into K clusters using deterministic algorithms like K-means, hierarchical agglomerative clustering, and so on.

Represent each of the clusters using a multinomial distribution over the terms in the vocabulary. This distribution is commonly denoted by $p_{ML}(w \mid Cluster_j)$, and we can express the probability distribution for a word w in a document $d_m \in Cluster_j$ by:

$p_{cbdm}(w \mid \vec{w}_m) = \lambda_1 \times \frac{w_m(n)}{\sum_{n=1}^{|V|} w_m(n)} + \lambda_2 \times p_c(w) + \lambda_3 \times p_{ML}(w \mid Cluster_j) \qquad (1)$
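An illustrative sketch of the three-way mixture in Eq. (1), with toy counts and an assumed cluster assignment standing in for the output of K-means or another deterministic clustering step:

    # Illustrative CBDM sketch: mix the document's ML estimate, the collection
    # model, and the ML model of the document's cluster.
    import numpy as np

    V, D = 12, 6
    rng = np.random.default_rng(1)
    counts = rng.integers(1, 5, size=(D, V)).astype(float)   # toy term counts
    cluster_of = np.array([0, 0, 1, 1, 1, 0])                # assumed clustering result

    p_doc = counts / counts.sum(axis=1, keepdims=True)       # ML per-document model
    p_coll = counts.sum(axis=0) / counts.sum()               # collection model p_c(w)

    def p_cluster(j):
        c = counts[cluster_of == j].sum(axis=0)
        return c / c.sum()                                   # p_ML(w | Cluster_j)

    l1, l2, l3 = 0.81, 0.09, 0.10                            # lambda_1 + lambda_2 + lambda_3 = 1
    m = 2
    p_cbdm = l1 * p_doc[m] + l2 * p_coll + l3 * p_cluster(cluster_of[m])
    assert np.isclose(p_cbdm.sum(), 1.0)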

Page 17: MSR presentation

Text Models

Summary of Text Models used in the comparative study

Page 18: MSR presentation

Text Models

Summary of Text Models used in the comparative study (cont.)

Model     Representation                                   Similarity Metric
VSM       frequency vector                                 Cosine similarity
LSA       K-dimensional vector in the eigenspace           Cosine similarity
Unigram   |V|-dimensional probability vector (smoothed)    KL divergence
LDA       K-dimensional probability vector                 KL divergence
CBDM      |V|-dimensional combined probability vector      KL divergence or likelihood

Table: Generic models used in the comparative evaluation

Page 19: MSR presentation

Text Models

Summary of Text Models used in the comparative study (cont.)

Model     Representation                                   Similarity Metric
LSA2      |V|-dimensional representation in term space     Cosine similarity
MLE-LDA   |V|-dimensional MLE-LDA probability vector       KL divergence or likelihood

Table: The variations on two of the generic models used in the comparative evaluation

Page 20: MSR presentation

Text Models

Summary of Text Models used in the comparative study (cont.)

Model           Representation                                        Similarity Metric
Unigram + LDA   |V|-dimensional combined probability vector           KL divergence or likelihood
VSM + LSA       |V|-dimensional combined VSM and LSA representation   Cosine similarity

Table: The two composite models used

Page 21: MSR presentation


Preprocessing of the source files

If a patch file does not exist in /trunk, then it is searched for and added to the source library from the other branches/tags of ASPECTJ.

The source library consists of “.java” files only. After this step, our library ended up with 6546 Java files.

The repository.xml file documents all the information related to a bug. This includes the BugID, the bug description, the relevant source files, and so on. We shall refer to this ground-truth information as relevance judgements.

The bugs that are documented in iBUGS but do not have any relevant source files in the source library that results from the previous step are eliminated. After this step, we are left with 291 bugs.

Page 22: MSR presentation

Preprocessing of the source files

Preprocessing of the source files (cont.)

Hard words, camel-case words and soft words are handled by using popular identifier-splitting methods [9, 10].

The stop list consists of the most commonly occurring words, for example: “for,” “else,” “while,” “int,” “double,” “long,” “public,” “void,” etc. There are 375 such words in the iBUGS ASPECTJ software. We also drop from the vocabulary all Unicode strings.

The vocabulary is pruned further by calculating the relative importance of terms and eliminating ubiquitous and rarely occurring terms.
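An illustrative sketch of these steps (a regex-based simplification of the identifier-splitting methods cited above, with a tiny stand-in stop list):

    # Illustrative preprocessing sketch: split identifiers on underscores and
    # camel-case boundaries, lower-case, and drop stop words (a tiny stop list
    # stands in for the 375-word list used on the slides).
    import re

    STOP = {"for", "else", "while", "int", "double", "long", "public", "void"}

    def split_identifier(identifier):
        parts = re.split(r"[_\W]+", identifier)                  # hard words
        camel = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+")
        return [t.lower() for p in parts for t in camel.findall(p)]

    def preprocess(tokens):
        words = [w for tok in tokens for w in split_identifier(tok)]
        return [w for w in words if w not in STOP and len(w) > 1]

    print(preprocess(["public", "void", "getMessageHandler", "ajc_compiler_error"]))
    # -> ['get', 'message', 'handler', 'ajc', 'compiler', 'error']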

Page 23: MSR presentation

Evaluation Metrics


Mean Average Precision (MAP)

Calculated using the following two sets:

retrieved(Nr) set: consists of the top Nr documents from a ranked list of documents retrieved vis-à-vis the query.

relevant set: extracted from the relevance judgements available from repository.xml.

Precision and Recall:

$\text{Precision}(P@N_r) = \frac{|\{relevant\} \cap \{retrieved\}|}{|\{retrieved\}|}$

$\text{Recall}(R@N_r) = \frac{|\{relevant\} \cap \{retrieved\}|}{|\{relevant\}|}$
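These two definitions translate directly into code; an illustrative sketch, assuming the ranked list and the relevance judgements are plain Python collections:

    # Illustrative precision/recall sketch: 'ranked' is the retrieval output for
    # one bug (best match first), 'relevant' the files named in repository.xml.
    def precision_recall_at(ranked, relevant, n_r):
        retrieved = set(ranked[:n_r])
        hits = len(retrieved & set(relevant))
        return hits / len(retrieved), hits / len(relevant)

    ranked = ["Parser.java", "Weaver.java", "Lexer.java", "Shadow.java"]
    relevant = ["Lexer.java", "Shadow.java", "Advice.java"]
    print(precision_recall_at(ranked, relevant, 4))   # (0.5, 0.666...)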

Page 24: MSR presentation

Evaluation Metrics

Mean Average Precision (MAP)

Mean Average Precision (MAP) (cont.)

1 If we were to plot a typical P-R curve from the values for P@Nr and R@Nr, we would get a monotonically decreasing curve that has high values of Precision for low values of Recall and vice versa.

2 Area under the P-R curve is called the Average Precision.

3 Taking the mean of the Average Precision over all the queries gives the Mean Average Precision (MAP).

4 Physical significance of MAP: Same as that of Precision.
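An illustrative sketch of one common way to compute MAP (average precision taken as the mean of P@rank over the ranks of the relevant files, then averaged over all queries; a discrete simplification of the area under the P-R curve):

    # Illustrative MAP sketch: average precision per query, then mean over queries.
    def average_precision(ranked, relevant):
        relevant, hits, precisions = set(relevant), 0, []
        for i, doc in enumerate(ranked, start=1):
            if doc in relevant:
                hits += 1
                precisions.append(hits / i)      # P@i at each relevant rank
        return sum(precisions) / len(relevant) if relevant else 0.0

    def mean_average_precision(queries):
        # queries: list of (ranked_list, relevant_set) pairs, one per bug
        return sum(average_precision(r, rel) for r, rel in queries) / len(queries)

    print(mean_average_precision([
        (["a.java", "b.java", "c.java"], {"a.java", "c.java"}),
        (["x.java", "y.java"], {"y.java"}),
    ]))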

Page 25: MSR presentation

Evaluation Metrics

Rank of Retrieved Files

Rank of Retrieved Files [3]

The number of queries/bugs for which relevant source files were retrieved with ranks $r_{low} \le R \le r_{high}$ is reported.

For the retrieval performance reported in [3], ranks used areR = 1, 2 ≤ R ≤ 5, 6 ≤ R ≤ 10 and R > 10.
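An illustrative sketch of this rank-based report, bucketing each bug by the best rank at which any of its relevant files was retrieved:

    # Illustrative rank-bucket sketch: count bugs whose best relevant-file rank
    # falls in R = 1, 2 <= R <= 5, 6 <= R <= 10, or R > 10.
    def best_rank(ranked, relevant):
        for i, doc in enumerate(ranked, start=1):
            if doc in relevant:
                return i
        return None

    def rank_report(queries):
        buckets = {"R=1": 0, "2-5": 0, "6-10": 0, ">10": 0}
        for ranked, relevant in queries:
            r = best_rank(ranked, relevant)
            if r is None:
                continue
            key = "R=1" if r == 1 else "2-5" if r <= 5 else "6-10" if r <= 10 else ">10"
            buckets[key] += 1
        return buckets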

Page 26: MSR presentation

Evaluation Metrics

SCORE

SCORE [11]

1 Indicates the proportion of the program that needs to be examined in order to locate or localize a fault.

2 For each range of this proportion (for example, 10-20%), the number of test runs (bugs) is reported.
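A minimal illustrative sketch of a SCORE-style tally, approximating the "proportion of the program examined" by the fraction of library files ranked at or above the first relevant file (an assumption made for this sketch, not the metric's original definition):

    # Illustrative SCORE-style sketch: fraction of files a developer would examine
    # before reaching the first relevant file, tallied into 10% ranges.
    from collections import Counter

    def score_histogram(queries, total_files):
        hist = Counter()
        for ranked, relevant in queries:
            rank = next((i for i, d in enumerate(ranked, 1) if d in relevant), None)
            if rank is None:
                continue
            examined = rank / total_files            # assumed proportion examined
            bucket = min(int(examined * 10), 9) * 10
            hist[f"{bucket}-{bucket + 10}%"] += 1
        return hist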

Page 27: MSR presentation

Results

Models using LDA

Figure: MAP using the three LDA models for different values of K. The experimental parameters for the LDA+Unigram model are λ = 0.9, µ = 0.5, β = 0.01 and α = 50/K.

Page 28: MSR presentation

Results

The combined LDA+Unigram model

Figure: MAP plotted for different values of the mixture proportions (λ and µ) of the LDA+Unigram combined model.

Page 29: MSR presentation

Results

Models using LSA

Figure: MAP using the LSA model and its variations and combinations for different values of K. The experimental parameter for the LSA+VSM combined model is λ = 0.5.

Page 30: MSR presentation

Results

CBDM

Model parameters (λ1, λ2, λ3)      K=100      K=250      K=500      K=1000
0.25    0.25    0.5                0.093144   0.0914     0.08666    0.07664
0.15    0.35    0.5                0.0883     0.0897     0.0963     0.0932
0.81    0.09    0.1                0.143      0.102      0.108      0.09952
0.27    0.63    0.1                0.1306     0.117      0.111      0.0998
0.495   0.495   0.01               0.141      0.141      0.141      0.141
0.05    0.05    0.99               0.069      0.075      0.072      0.065

Table: Retrieval performance (MAP) with the CBDM. λ1 + λ2 + λ3 = 1; λ1 weights the unigram model, λ2 the collection model, and λ3 the cluster model.

Page 31: MSR presentation

Results

Rank based metric

Figure: The height of the bars shows the number of queries (bugs) for which at least one relevant source file was retrieved at rank 1.

Page 32: MSR presentation

Results

SCORE: IR-based bug localization tools

Page 33: MSR presentation

Results

SCORE: Compare with AMPLE and FINDBUGS

Figure: SCORE values calculated over 44 bugs in iBUGS ASPECTJ using AMPLE [12].

SCORE with FINDBUGS

None of the bugs were localized correctly.

Page 34: MSR presentation


Conclusion

IR-based bug localization techniques are as effective as, or more effective than, static or dynamic bug localization tools.

Sophisticated models like LDA, LSA or CBDM do not outperform simpler models like the Unigram model or VSM for IR-based bug localization on large software systems.

An analysis of the spread of the word distributions over the source files, with the help of measures such as tf and idf, can give useful insights into the usability of topic- and cluster-based models for localization.

Page 35: MSR presentation

Conclusion

End of Presentation

Thanks to

Questions?

Page 36: MSR presentation

Conclusion

Threats to validity

We have tested on only a single dataset, iBUGS. How does this generalize?

We have eliminated XML files from those that are indexed and queried. Maybe not a valid assumption?

Page 37: MSR presentation

Conclusion

References

A. Marcus, A. Sergeyev, V. Rajlich, and J. I. Maletic, “An Information Retrieval Approach to Concept Location in Source Code,” in Proceedings of the 11th Working Conference on Reverse Engineering (WCRE 2004), pp. 214–223, IEEE Computer Society, 2004.

B. Cleary, C. Exton, J. Buckley, and M. English, “An Empirical Analysis of Information Retrieval based Concept Location Techniques in Software Comprehension,” Empirical Softw. Engg., vol. 14, no. 1, pp. 93–130, 2009.

S. K. Lukins, N. A. Kraft, and L. H. Etzkorn, “Source Code Retrieval for Bug Localization using Latent Dirichlet Allocation,” in 15th Working Conference on Reverse Engineering, 2008.

Page 38: MSR presentation

Conclusion

References (cont.)

V. Dallmeier and T. Zimmermann, “Extraction of Bug Localization Benchmarks from History,” in ASE ’07: Proceedings of the Twenty-Second IEEE/ACM International Conference on Automated Software Engineering, (New York, NY, USA), pp. 433–436, ACM, 2007.

C. Zhai and J. Lafferty, “A Study of Smoothing Methods for Language Models Applied to Information Retrieval,” ACM Transactions on Information Systems, pp. 179–214, 2004.

D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent Dirichlet Allocation,” Journal of Machine Learning Research, pp. 993–1022, 2003.

Page 39: MSR presentation

Conclusion

References (cont.)

X. Wei and W. B. Croft, “LDA-Based Document Models for Ad-hoc Retrieval,” in Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, 2006.

X. Liu and W. B. Croft, “Cluster-Based Retrieval Using Language Models,” in ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, 2004.

D. Binkley, H. Feild, and D. Lawrie, “An Empirical Comparison of Techniques for Extracting Concept Abbreviations from Identifiers,” in Proceedings of the IASTED International Conference on Software Engineering and Applications, 2006.

Page 40: MSR presentation

Conclusion

References (cont.)

E. Enslen, E. Hill, L. Pollock, and K. Vijay-Shanker, “Mining Source Code to Automatically Split Identifiers for Software Analysis,” in Proceedings of the 2009 6th IEEE International Working Conference on Mining Software Repositories (MSR ’09), (Washington, DC, USA), pp. 71–80, IEEE Computer Society, 2009.

J. A. Jones and M. J. Harrold, “Empirical Evaluation of the Tarantula Automatic Fault-Localization Technique,” in Automated Software Engineering, 2005.

V. Dallmeier and T. Zimmermann, “Automatic Extraction of Bug Localization Benchmarks from History,” tech. rep., Universität des Saarlandes, Saarbrücken, Germany, June 2007.
