Model-based Mining of Software Repositories

Markus Scheidgen, Joachim Fischer ({scheidge,fischer}@informatik.hu-berlin.de)
Tuesday, 30. September 2014

description

The Mining Software Repositories (MSR) field analyzes the rich data available in source code repositories (SCR) to uncover interesting and actionable information about software system evolution. Major obstacles in MSR are the heterogeneity of software projects and the amount of data that is processed. Model-driven software engineering (MDSE) can deal with heterogeneity through abstraction, its core strength, but only recent efforts in adopting NoSQL databases for persisting and processing very large models have made MDSE a feasible approach for MSR. This paper is a work-in-progress report on srcrepo, a model-based MSR system. Srcrepo uses the NoSQL-based EMF-model persistence layer EMF-Fragments and Eclipse’s MoDisco reverse engineering framework to create EMF-models of whole SCRs that comprise all code of all revisions at the abstract syntax tree (AST) level. An OCL-like language is then used as an accessible way to gather information such as software metrics from these SCR models.

Transcript of Model-based Mining of Software Repositories

Page 1: Model-based Mining of Software Repositories

Model-based Mining of Source Code Repositories

Markus Scheidgen, Joachim Fischer
{scheidge,fischer}@informatik.hu-berlin.de

Tuesday, 30. September 2014

Page 2: Model-based Mining of Software Repositories

Agenda

▶ Mining Software Repositories (MSR) and Software Evolution Research (SER)

▶ srcrepo – a model-based MSR system

■ components and mining process

■ distributed storage of very large models with emf-fragments

■ a meta-model for source code repositories

■ gathering software metrics with an OCL-like internal Scala DSL

▶ work in progress - discussion of remaining problems and limitations



Page 4: Model-based Mining of Software Repositories

Relevant Research Fields

Mining Software Repositories (MSR)
The term mining software repositories (MSR) has been coined to describe a broad class of techniques for the examination of software repositories. The premise of MSR is that empirical and systematic investigations of repositories will shed new light on the process of software evolution. [1]

Software Metrics
A software metric is a mathematical definition mapping the entities of a software system to numeric metrics values. [...] to express features of software with numbers in order to facilitate software quality assessment. [2]

Reverse Engineering
Reverse engineering is the process of analyzing a subject system to (1) identify the system’s components and their interrelationships and (2) create representations of the system in another form or at a higher level of abstraction [3]

Software Evolution Research (SER)

■ (dis-)proving Lehman’s Laws of software evolution
■ empirical investigations of software repositories through statistical analysis of software metrics and software change metrics over the evolutionary course of many software systems

Model-based Mining Software Repositories (with srcrepo)
Overcoming heterogeneity and accessibility by raising the level of abstraction, while ensuring scalability and retaining meaningful information depth.

1. H. Kagdi, M.L. Collard, J.I. Maletic: A survey and taxonomy of approaches for mining software repositories in the context of software evolution; Journal of Software Maintenance and Evolution: Research and Practice; Vol.19/Nr.2/2007

2. R. Lincke, J. Lundberg, W. Löwe: Comparing Software Metrics Tools; 8th International Symposium on Software Testing and Analysis; 2008

3. E.J. Chikofsky, J.H. Cross: Reverse engineering and design recovery: A taxonomy; IEEE Software; Vol.7/Nr.1/1990


Page 10: Model-based Mining of Software Repositories

srcrepo – Components and Process

[Figure: srcrepo components and mining process, built up step by step over several slides. Labels recovered from the diagram:]

■ large scale software repositories (e.g. GitHub, SourceForge) containing software projects
■ per software project: source code repository (e.g. controlled by Git, SVN, CVS), source code (e.g. Java, C++, eclipse*), issue tracker, mailing lists, wiki
■ import: revision tree and AST-models of new and changed CUs
■ srcrepo storage (EMF-models via EMF-Fragments, e.g. on MongoDB)
■ srcrepo runtime (headless Eclipse RCP)
■ analysis: revision tree with fully resolved snapshot models
■ OCL; EMF-Compare and OCL
■ store: timelines of metrics
■ export: statistics software (e.g. R, Matlab)

Page 19: Model-based Mining of Software Repositories

Distributed Model Store with emf-fragments

[Figure: meta-model excerpt contrasting normal references (*) with fragmenting references (* «fragments»)]

Page 21: Model-based Mining of Software Repositories

A Meta-Model for Source Code Repositories

[Figure: meta-model for source code repositories, shown on two slides. Its parts are labelled revision tree, source code, and MoDisco; two references are marked «fragments».]

Page 23: Model-based Mining of Software Repositories

An OCL-like internal Scala DSL for Computing Metrics

▶ OCL-like internal Scala DSL analogous to our internal Scala model transformation language [1]

▶ OCL collection operations mapped to Scala’s higher-order functions [2]:

pure OCL:

context Model: self.ownedElements->collect(p|p.ownedElements)->size

OCL-like expression in Scala:

def numberOfFirstPackageLevelTypes(self: Model): Int =
  self.getOwnedElements().collect(p=>p.getOwnedElements()).size()

[Figure: meta-model excerpt with the classes Model, Package, and AbstractTypeDeclaration and the references ownedElements (*, Model to Package), ownedElements (*, Package to AbstractTypeDeclaration), and ownedPackages (*, Package to Package)]

1. L. George, A. Wider, M. Scheidgen: Type-Safe Model Transformation Languages as Internal DSLs in Scala; Theory and Practice of Model Transformations - 5th International Conference, ICMT; 2012

2. Filip Krikava: Enriching EMF Models with Scala; Slideshare
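The mapping from OCL collection operations to Scala's higher-order functions can be sketched on plain Scala collections. This is only an illustration under assumptions: the case classes `TypeDecl`, `Pkg`, and `Model` are hypothetical stand-ins for the meta-model classes on the slide, not the MoDisco API, and the standard library methods `flatMap`, `filter`, and `forall` take the place of the DSL's own operations. OCL's `collect` flattens nested collections, which is why it corresponds to `flatMap` rather than `map` here.

```scala
object OclToScala {
  // Hypothetical stand-ins for the meta-model classes; not the MoDisco API.
  final case class TypeDecl(name: String)
  final case class Pkg(name: String, ownedElements: List[TypeDecl])
  final case class Model(ownedElements: List[Pkg])

  // OCL: self.ownedElements->collect(p | p.ownedElements)->size
  // OCL's collect flattens nested collections, so flatMap is the counterpart.
  def numberOfFirstPackageLevelTypes(self: Model): Int =
    self.ownedElements.flatMap(p => p.ownedElements).size

  // OCL: self.ownedElements->select(p | p.ownedElements->notEmpty())
  def nonEmptyPackages(self: Model): List[Pkg] =
    self.ownedElements.filter(p => p.ownedElements.nonEmpty)

  // OCL: self.ownedElements->forAll(p | p.ownedElements->size() <= n)
  def allPackagesAtMost(self: Model, n: Int): Boolean =
    self.ownedElements.forall(p => p.ownedElements.size <= n)

  // Small example model: three top-level packages with 2, 1, and 0 types.
  val example: Model = Model(List(
    Pkg("a", List(TypeDecl("A1"), TypeDecl("A2"))),
    Pkg("b", List(TypeDecl("B1"))),
    Pkg("c", Nil)))
}
```

For the example model, `OclToScala.numberOfFirstPackageLevelTypes(OclToScala.example)` counts the types of all top-level packages together.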

Page 26: Model-based Mining of Software Repositories

An OCL-like internal Scala DSL for Computing Metrics

▶ Extending OCL’s collection operations:
■ convenience operations
■ closure
■ aggregation
■ execution

trait OclCollection[E] extends java.lang.Iterable[E] {
  def size(): Int
  def first(): E
  def exists(predicate: (E) => Boolean): Boolean
  def forAll(predicate: (E) => Boolean): Boolean
  def select(predicate: (E) => Boolean): OclCollection[E]
  def reject(predicate: (E) => Boolean): OclCollection[E]
  def collect[R](expr: (E) => R): OclCollection[R]
  def selectOfType[T]:OclCollection[T]
  def collectNotNull[R](expr: (E) => R): OclCollection[R]
  def collectAll[R](expr: (E) => OclCollection[R]): OclCollection[R]
  def closure(expr: (E) => OclCollection[E]): OclCollection[E]

  def aggregate[R,I](expr: (E) => I, start: () => R, aggr: (R, I) => R): R
  def sum(expr: (E) => Double): Double
  def product(expr: (E) => Double): Double
  def max(expr: (E) => Double): Double
  def min(expr: (E) => Double): Double
  def stats(expr: (E) => Double): Stats

  def run(runnable: (E) => Unit): Unit
}
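As a rough illustration of what the closure and aggregation operations might do, the following sketch implements them as free functions over standard Scala collections. It is not the srcrepo implementation: the object name `OclOps` and the function shapes are assumptions made for this example, and it works on `Iterable` and `List` instead of `OclCollection`. Whether the start elements belong to the closure is a design choice; this sketch includes them, and a visited set guards against cycles.

```scala
import scala.collection.mutable

object OclOps {
  // Transitive closure: all elements reachable from `start` by repeatedly
  // applying `expr`. Start elements are included; each element appears once.
  def closure[E](start: Iterable[E])(expr: E => Iterable[E]): List[E] = {
    val seen = mutable.LinkedHashSet.empty[E] // keeps discovery order
    val work = mutable.Queue.empty[E]
    work ++= start
    while (work.nonEmpty) {
      val e = work.dequeue()
      // add returns true only for elements not seen before
      if (seen.add(e)) work ++= expr(e)
    }
    seen.toList
  }

  // Generic aggregation: map each element with `expr`, then fold the mapped
  // values into an accumulator that starts at `start()`.
  def aggregate[E, I, R](xs: Iterable[E])(expr: E => I, start: () => R, aggr: (R, I) => R): R =
    xs.foldLeft(start())((acc, e) => aggr(acc, expr(e)))

  // sum expressed through aggregate
  def sum[E](xs: Iterable[E])(expr: E => Double): Double =
    aggregate[E, Double, Double](xs)(expr, () => 0.0, _ + _)
}
```

Expressing `sum` through `aggregate` shows how the other numeric operations of the trait (`product`, `max`, `min`) could be derived in the same way by changing the start value and the combining function.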

Page 28: Model-based Mining of Software Repositories

Complex Example: Average Weighted Methods per Class (WMC)

▶ WMC is the first CK-metric [1]. There are different commonly used weights; here we use cyclomatic complexity.

def classes(model:Model):OclCollection[ClassDeclaration] =
  model.getOwnedElements()
    .collectClosure(pkg=>pkg.getOwnedPackages())
    .collectAll(pkg=>pkg.getOwnedElements())
    .collectClosure(typeDcl=>
      typeDcl.getBodyDeclarations()
        .selectOfType[ClassDeclaration])

def WMC(model:Model):Double =
  classes(model).stats(clazz=>
    clazz.getBodyDeclarations()
      .selectOfType[MethodDeclaration]()
      .sum(method=>cyclomaticComplexity(method))).average

def cyclomaticComplexity(method:MethodDeclaration):Int = ...

[Figure: meta-model excerpt with the classes Model, Package, AbstractTypeDeclaration, ClassDeclaration, AbstractBodyDeclaration, MethodDeclaration, AbstractMethodInvocation, and TypeAccess, and the references ownedElements, ownedPackages, bodyDeclarations, superInterfaces, superClass, returnType, type, usagesInTypeAccess, and method]

1. S.R. Chidamber, C.F. Kemerer: A Metrics Suite for Object Oriented Design; IEEE Transactions on Software Eng.; Vol.20/Nr.6/1994
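The arithmetic behind average WMC can be followed on a toy example. The sketch below is an assumption-laden illustration, not srcrepo code: `Method` and `Clazz` are hypothetical case classes, and each method simply carries a precomputed cyclomatic complexity instead of deriving it from an AST. The WMC of a class is the sum of its method weights, and the reported metric is the mean of that sum over all classes.

```scala
object WmcExample {
  // Hypothetical toy model; the complexity values are given, not computed.
  final case class Method(name: String, cyclomaticComplexity: Int)
  final case class Clazz(name: String, methods: List[Method])

  // WMC of one class: sum of the weights of its methods.
  def wmc(c: Clazz): Int = c.methods.map(_.cyclomaticComplexity).sum

  // Average WMC over all classes; 0.0 for empty input to avoid dividing by zero.
  def averageWmc(classes: List[Clazz]): Double =
    if (classes.isEmpty) 0.0
    else classes.map(wmc).sum.toDouble / classes.size

  // Three classes with WMC 6, 1, and 0.
  val classes: List[Clazz] = List(
    Clazz("Parser", List(Method("parse", 5), Method("reset", 1))),
    Clazz("Token", List(Method("text", 1))),
    Clazz("Empty", Nil))
}
```

With the three example classes the per-class values are 6, 1, and 0, so the average WMC is 7/3.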

Page 31: Model-based Mining of Software Repositories

Implementation of the OCL-Collection Operations

▶ Just-in-time, iterator-based implementation rather than straightforward aggregation of result collections.

[Figure: object tree with one :RepositoryModel at the root, three :Rev objects below it, six :Par... objects below those, and ten :Diff objects at the leaves, annotated with the numbers 1 to 4]
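The difference between an iterator-based evaluation and the aggregation of intermediate result collections can be made visible with a small counter. This is a sketch under assumptions, not the srcrepo implementation: `load` is a hypothetical stand-in for an expensive model access, and Scala's standard `Iterator` plays the role of the just-in-time evaluation.

```scala
object LazyVsEager {
  // Counts how often the (hypothetical) expensive access is performed.
  final class Counter { var loads = 0 }

  // Hypothetical stand-in for an expensive model access.
  def load(counter: Counter, id: Int): Int = { counter.loads += 1; id * 10 }

  // Eager: List.map builds the whole intermediate list before filtering.
  def firstEager(ids: List[Int], c: Counter): Option[Int] =
    ids.map(id => load(c, id)).filter(_ >= 20).headOption

  // Lazy: Iterator.map and Iterator.filter run on demand, so elements are
  // loaded only up to the first match.
  def firstLazy(ids: List[Int], c: Counter): Option[Int] = {
    val it = ids.iterator.map(id => load(c, id)).filter(_ >= 20)
    if (it.hasNext) Some(it.next()) else None
  }
}
```

For five input ids where the second one already matches, the eager variant performs five loads and the lazy variant two, while both return the same result.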

Page 32: Model-based Mining of Software Repositories

Future Work, Remaining Problems, and Limitations

Scalability

■ very large compilation units
■ incremental snapshot creation
■ batching OCL execution
■ experiments with large scale repository (e.g. git.eclipse.org)

Heterogeneity

■ MoDisco for different programming languages
■ common metrics meta-model (e.g. OMG, KDM)
■ VCS abstraction and support for different VCS

Accessibility

■ relating results to software repository entities
■ persisting and exporting results

Information Depth

■ diff-models from comparison of compilation units

▶ Very large compilation units (CU): e.g. a 3 MB, 600 kLOC CU in org.eclipse.emf
■ tends to have lots of dependencies ➞ changes often ➞ makes problem even bigger
■ CUs are the smallest common denominator between text-based VCS view and syntax-based AST view
■ smaller units require model-comparison or text-to-AST mappings

▶ Support for different programming languages: either abstraction, parallel meta-models, or mixed approach
■ MoDisco is extendable, but only Java support exists; other languages need to be implemented ➞ parallel meta-models
■ A reasonable abstraction for multiple (or all) programming languages probably does not exist.
■ A shared abstract meta-model that all language meta-models extend could be a sensible compromise.
