Dependency Tracking in Software Systems
Presented by: Ashgan Fararooy
Related Papers
• Supporting Software Evolution Analysis with Historical Dependencies and Defect Information (ICSM 2008)
• A Flexible Framework to Support Collaborative Software Evolution Analysis (CSMR 2008)
• Mining Software Repositories for Traceability Links (ICPC 2007)
• Tracking Objects to Detect Feature Dependencies (ICPC 2007)
• Software Repositories: A Source for Traceability Links (TEFSE-GTC 2007)
• Mining Version Archives for Co-changed Lines (ICSE 2006)
• Understanding Semantic Impact of Source Code Changes: An Empirical Study
Mining Version Archives for Co-changed Lines
Thomas Zimmermann, Sunghun Kim, Andreas Zeller, E. James Whitehead Jr.
(ICSE 2006)
Abstract
• Research on co-change has frequently investigated files, classes, or methods
• Presents a first study at the level of lines
• Introduces the annotation graph, which captures how lines evolve over time
• Provides more fine-grained software evolution information (based on lines)
Overview
• Co-change: items that are changed together are related to each other
• Any granularity: modules, files, classes, methods
• What about more fine-grained items: blocks, lines, …?
Co-Change in More Fine-Grained Items
• Seemed infeasible
• Hard to identify across different versions
• Line numbers are not suitable identifiers
• The annotation feature of SCM systems is not enough
• Line content is not a good identifier either
Annotation Graph
Definition:
– A multipartite graph where each part corresponds to one version of a file
– Within each part/version, every line is represented by a single node
– Edges between nodes indicate that a line originates from another, by modification or movement
– Node labels (e.g. a bold node) indicate a changed line
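The definition above can be pictured as a small data structure. The following is a minimal sketch, not the authors' implementation; the class name `AnnotationGraph`, its fields, and the `(revision, line_index)` node encoding are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AnnotationGraph:
    """Hypothetical container: one part per revision, one node per line."""
    # lines[rev] is the text of that revision; node (rev, i) is its i-th line
    lines: dict = field(default_factory=dict)
    # each edge points from a node in an earlier revision to a later one
    edges: set = field(default_factory=set)
    # nodes labeled as changed (the "bold" nodes)
    changed: set = field(default_factory=set)

    def add_revision(self, rev, text_lines):
        self.lines[rev] = list(text_lines)

    def add_edge(self, src, dst, changed=False):
        self.edges.add((src, dst))
        if changed:
            self.changed.add(dst)

    def predecessors(self, node):
        """Nodes of earlier revisions that `node` originates from."""
        return sorted(s for s, d in self.edges if d == node)


g = AnnotationGraph()
g.add_revision(1, ["int x;", "x = 1;"])
g.add_revision(2, ["int x;", "x = 2;"])
g.add_edge((1, 0), (2, 0))                 # unchanged line
g.add_edge((1, 1), (2, 1), changed=True)   # modified line
```

Storing edges as a plain set keeps the sketch short; a real tool would likely index them by target node.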
Annotation Graph
Construction:
– One needs to compare each pair of subsequent revisions of a file
– Using the GNU diff tool for computing textual differences
– The diff tool returns a list of regions (“hunks”) that differ between the two files
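As a rough illustration of what a list of hunks looks like, the sketch below uses Python's standard `difflib` module as a stand-in for GNU diff. This substitution is an assumption: the two tools can disagree on how they align lines, and the helper name `hunks` is hypothetical.

```python
import difflib


def hunks(old_lines, new_lines):
    """Return the differing regions as (tag, old_start, old_end, new_start, new_end).

    tag is "replace" (modification), "insert" (addition), or "delete" (deletion).
    """
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    # keep only the regions that differ; "equal" regions are unchanged lines
    return [op for op in matcher.get_opcodes() if op[0] != "equal"]


old = ["a", "b", "c"]
new = ["a", "x", "c", "d"]
print(hunks(old, new))
# [('replace', 1, 2, 1, 2), ('insert', 3, 3, 3, 4)]
```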
Annotation Graph
Three different kinds of changes:
– Modifications
• Result in a complete bipartite subgraph
– Additions
• Do not result in any edges
• Positions of the following lines are updated
– Deletions
• Have the same effect as additions
Annotation Graph
Computation:
– Creates nodes for each revision and each line
– Two approaches
• 1. Forward-directed
• 2. Backward-directed
Annotation Graph
Computation (Forward-Directed Algorithm):
– Iterate over all pairs of subsequent revisions
– For each pair, compute the differences (hunks)
– Process the hunks to create edges
• Exactly one edge between unchanged lines (nodes)
• For modified lines all possible edges
• For inserted and deleted lines no edges
– Label the nodes of the later revision in modifications and additions
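The forward-directed steps can be sketched as follows. This is an illustrative approximation, not the paper's code: it assumes `difflib` in place of GNU diff, encodes nodes as `(revision_index, line_index)`, and the function name `build_annotation_graph` is hypothetical.

```python
import difflib


def build_annotation_graph(revisions):
    """revisions: list of files (each a list of lines), oldest first.

    Returns (edges, changed), with nodes written as (revision_index, line_index).
    """
    edges, changed = set(), set()
    for r in range(len(revisions) - 1):
        old, new = revisions[r], revisions[r + 1]
        matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                # unchanged lines: exactly one edge per line
                for k in range(i2 - i1):
                    edges.add(((r, i1 + k), (r + 1, j1 + k)))
            elif tag == "replace":
                # modification: all possible edges (complete bipartite subgraph)
                for i in range(i1, i2):
                    for j in range(j1, j2):
                        edges.add(((r, i), (r + 1, j)))
                changed.update((r + 1, j) for j in range(j1, j2))
            elif tag == "insert":
                # addition: no edges, but the new lines are labeled as changed
                changed.update((r + 1, j) for j in range(j1, j2))
            # deletion: no edges, and nothing to label in the later revision
    return edges, changed


edges, changed = build_annotation_graph([
    ["a", "b", "c"],
    ["a", "x", "y", "c", "d"],
])
```

In this example the line `b` is modified into the two lines `x` and `y`, so node `(0, 1)` gets an edge to both `(1, 1)` and `(1, 2)`, while the added line `d` gets a label but no edge.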
Annotation Graph
Problem:
– Some changes modify large parts of a file
– These result in a large number of edges
– Such edges are not reasonable for evolution analysis
Annotation Graph
Solution:
– Treat large modifications as combined deletions and additions
– Create no edges for them in the annotation graph
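To see why this helps: a modification that replaces n lines with m lines yields n × m edges. The sketch below suppresses edges for large hunks. The size-based cutoff `max_lines` and the function name `modification_edges` are hypothetical; the slides do not state how large modifications are actually recognized.

```python
def modification_edges(rev, old_range, new_range, max_lines=4):
    """Edges for one modification hunk between revisions rev and rev + 1.

    old_range and new_range are (start, end) line index pairs, end exclusive.
    max_lines is a hypothetical threshold, not a value from the paper.
    """
    old_idx = range(*old_range)
    new_idx = range(*new_range)
    if len(old_idx) > max_lines or len(new_idx) > max_lines:
        # large modification: treated as deletion plus addition, so no edges
        return set()
    # small modification: complete bipartite subgraph between the two regions
    return {((rev, i), (rev + 1, j)) for i in old_idx for j in new_idx}


small = modification_edges(0, (0, 2), (0, 3))    # 2 x 3 = 6 edges
large = modification_edges(0, (0, 50), (0, 60))  # would be 3000 edges; none created
```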
Annotation Graph
Recognizing Large Modifications:
Annotating Lines
Comparison
– Most SCM systems have an annotation feature that provides the latest change information for each line
– Annotation graphs can be used to get such information
– Furthermore, they provide information on all past changes
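One way to read annotations off the graph is to walk backward from a line along its edges and collect every revision in which an ancestor was labeled as changed. This is a minimal sketch under assumed conventions: nodes are `(revision, line_index)` pairs, and the function name `annotate` is hypothetical.

```python
def annotate(node, edges, changed):
    """Return the sorted revisions in which `node` or one of its ancestors changed."""
    # index edges by target so the walk can go backward in time
    preds = {}
    for src, dst in edges:
        preds.setdefault(dst, set()).add(src)

    seen, stack, revisions = set(), [node], set()
    while stack:
        cur = stack.pop()
        if cur in seen:
            continue
        seen.add(cur)
        if cur in changed:
            revisions.add(cur[0])  # cur[0] is the revision of this node
        stack.extend(preds.get(cur, ()))
    return sorted(revisions)


edges = {((0, 0), (1, 0)), ((1, 0), (2, 0))}
changed = {(1, 0), (2, 0)}
history = annotate((2, 0), edges, changed)  # all past changes of the line
latest = history[-1] if history else None   # what an SCM annotation would report
```

The last element corresponds to the latest change that SCM annotation reports; the full list is the extra history the graph provides.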
Life Cycle of Lines
Investigated the life cycle of lines for the Eclipse Project
– How frequently are lines changed?
• Computed the change count for each line
• The change count is the number of distinct revisions in its annotation
– How many developers change a line?
– What are the most frequently changed lines?
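These three questions reduce to simple counting once each line's annotation is available. The sketch below assumes a hypothetical input shape, a mapping from a line identifier to its list of `(revision, author)` pairs; the identifiers and author names are made up for illustration.

```python
def line_statistics(annotations):
    """annotations: {line_id: [(revision, author), ...]} (hypothetical shape)."""
    stats = {}
    for line_id, changes in annotations.items():
        stats[line_id] = {
            # change count: number of distinct revisions in the annotation
            "change_count": len({rev for rev, _ in changes}),
            # number of distinct developers who changed the line
            "developers": len({author for _, author in changes}),
        }
    return stats


def most_changed(stats, n=1):
    """Return the n line ids with the highest change count."""
    return sorted(stats, key=lambda l: stats[l]["change_count"], reverse=True)[:n]


annotations = {
    "Foo.java:10": [(1, "ann"), (2, "bob"), (5, "ann")],
    "Foo.java:11": [(1, "ann")],
}
stats = line_statistics(annotations)
top = most_changed(stats)
```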
Finding Related Lines
– Computed related lines using frequent pattern mining
– Used transaction ids instead of revision ids
– Used the Apriori algorithm
– Inferred useful association rules
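The mining step can be illustrated with a compact, textbook-style Apriori. This is a sketch rather than the study's actual setup: each transaction is assumed to be the set of line identifiers changed together, and the identifiers, thresholds, and function names are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(items): support_count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    candidates = {frozenset([i]) for t in transactions for i in t}
    frequent, k = {}, 1
    while candidates:
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # join step: unions of frequent sets that have exactly k items
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # prune step: every (k-1)-subset must itself be frequent
        candidates = {c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent


def rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) rules with one consequent item."""
    out = []
    for itemset, support in frequent.items():
        if len(itemset) < 2:
            continue
        for item in itemset:
            antecedent = itemset - {item}
            confidence = support / frequent[antecedent]
            if confidence >= min_confidence:
                out.append((antecedent, frozenset([item]), confidence))
    return out


transactions = [
    {"f.c:10", "f.c:42"},
    {"f.c:10", "f.c:42"},
    {"f.c:10"},
    {"g.c:7"},
]
frequent = apriori(transactions, min_support=2)
found = rules(frequent, min_confidence=0.9)
# found holds one rule: changes to f.c:42 imply changes to f.c:10 (confidence 1.0)
```

The reverse rule is dropped because line `f.c:10` also changes alone, which gives it a confidence of only 2/3.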
Thank you