Software Reengineering & Evolution - UAntwerpen


Software Reengineering & Evolution

Serge Demeyer, Stéphane Ducasse, Oscar Nierstrasz

http://scg.unibe.ch/download/oorp/

© S. Demeyer, S. Ducasse, O. Nierstrasz Software Reengineering and Evolution.2

Schedule

1. Introduction: There are OO legacy systems too!
2. Reverse Engineering: How to understand your code
3. Visualization: Scalable approach
4. Restructuring: How to refactor your code
5. Code Duplication: The most typical problems
6. Software Evolution: Learn from the past
7. Conclusion

Goals

We will try to convince you:

•  Yes, Virginia, there are object-oriented legacy systems too!
•  Reverse engineering and reengineering are essential activities in the lifecycle of any successful software system. (And especially OO ones!)
•  There is a large set of lightweight tools and techniques to help you with reengineering.
•  Despite these tools and techniques, people must do the job and they represent the most valuable resource.

What is a Legacy System?

“legacy”: A sum of money, or a specified article, given to another by will; anything handed down by an ancestor or predecessor. (Oxford English Dictionary)

A legacy system is a piece of software that:
•  you have inherited, and
•  is valuable to you.

Typical problems with legacy systems:
•  original developers not available
•  outdated development methods used
•  extensive patches and modifications have been made
•  missing or outdated documentation

⇒ so, further evolution and development may be prohibitively expensive

Software Maintenance - Cost

Relative Cost of Fixing Mistakes (by phase):
•  requirement: x 1
•  design: x 5
•  coding: x 10
•  testing: x 20
•  delivery: x 200

Relative Maintenance Effort: between 50% and 75% of global effort is spent on “maintenance”!

Solution?
•  Better requirements engineering?
•  Better software methods & tools (database schemas, CASE-tools, objects, components, …)?

Continuous Development

Maintenance effort by category (data from [Lien78a]):
•  17.4% Corrective (fixing reported errors)
•  18.2% Adaptive (new platforms or OS)
•  60.3% Perfective (new functionality)
•  4.1% Other

The bulk of the maintenance cost is due to new functionality
⇒ even with better requirements, it is hard to predict new functions

Modern Methods & Tools?

[Glas98a] quoting empirical study from Sasa Dekleva (1992)
•  Modern methods(*) lead to more reliable software
•  Modern methods lead to less frequent software repair
•  and ...
•  Modern methods lead to more total maintenance time

Contradiction? No!
•  modern methods make it easier to change ... this capacity is used to enhance functionality!

(*) process-oriented structured methods, information engineering, data-oriented methods, prototyping, CASE-tools (not OO!)

Lehman's Laws

A classic study by Lehman and Belady [Lehm85a] identified several “laws” of system change.

Continuing change

•  A program that is used in a real-world environment must change, or become progressively less useful in that environment.

Increasing complexity

•  As a program evolves, it becomes more complex, and extra resources are needed to preserve and simplify its structure.

Those laws are still applicable…

What about Objects?

Object-oriented legacy systems
•  = successful OO systems whose architecture and design no longer respond to changing requirements

Compared to traditional legacy systems
•  The symptoms and the source of the problems are the same
•  The technical details and solutions may differ

OO techniques promise better
•  flexibility,
•  reusability,
•  maintainability
•  …

⇒ they do not come for free

What about Components?

Components are very brittle …
After a while one inevitably resorts to glue :)

Soccer Field Metaphor

•  Assume 10 lines of code = 40 tiles of 1 x 1 cm
•  12.5 million lines of code ≈ 40 soccer fields

Imagine 400 developers concurrently moving tiles around on 40 soccer fields

© A. van Deursen, De software-evolutieparadox, inaugural lecture, TU Delft, 23 Feb 2005

How to deal with Legacy?

New or changing requirements will gradually degrade the original design … unless extra development effort is spent to adapt the structure

New Functionality

Hack it in?
•  duplicated code
•  complex conditionals
•  abusive inheritance
•  large classes/methods
Take a loan on your software ⇒ pay back via reengineering

First …
•  refactor
•  restructure
•  reengineer
Investment for the future ⇒ paid back during maintenance

Common Symptoms

Lack of Knowledge
•  obsolete or no documentation
•  departure of the original developers or users
•  disappearance of inside knowledge about the system
•  limited understanding of entire system
⇒ missing tests

Process symptoms
•  too long to turn things over to production
•  need for constant bug fixes
•  maintenance dependencies
•  difficulties separating products
⇒ simple changes take too long

Code symptoms
•  duplicated code
•  code smells
⇒ big build times

The Reengineering Life-Cycle

[Figure: life-cycle across three levels (Requirements, Designs, Code) with the steps (0) requirement analysis, (1) model capture, (2) problem detection, (3) problem resolution, (4) program transformation]

•  people centric
•  lightweight


A Map of Reengineering Patterns

Tests: Your Life Insurance

Detailed Model Capture

Initial Understanding

First Contact

Setting Direction

Migration Strategies

Detecting Duplicated Code

Redistribute Responsibilities

Transform Conditionals to Polymorphism

2. Reverse Engineering

•  What and Why
•  First Contact
☞  Interview during Demo
•  Initial Understanding
☞  Analyze the Persistent Data
•  Detailed Model Capture
☞  Look for the Contracts

What and Why?

Definition
Reverse Engineering is the process of analysing a subject system
☞  to identify the system’s components and their interrelationships and
☞  create representations of the system in another form or at a higher level of abstraction. (Chikofsky & Cross, ’90)

Motivation
Understanding other people’s code (cf. newcomers in the team, code reviewing, original developers left, ...)

Generating UML diagrams is NOT reverse engineering ... but it is a valuable support tool

The Reengineering Life-Cycle

[Figure: the reengineering life-cycle (Requirements, Designs, Code; steps (0) to (4)), highlighting (0) req. analysis and (1) model capture]

(0) req. analysis, (1) model capture: issues
•  scale
•  speed
•  accuracy
•  politics

First Contact

feasibility assessment (one week time)

System experts (talk about it)
•  Talk with developers ⇒ Chat with the Maintainers
•  Talk with end users ⇒ Interview during Demo

Software System
•  Read it ⇒ Read All the Code in One Hour
•  Read about it ⇒ Skim the Documentation
•  Compile it ⇒ Do a Mock Installation

Verify what you hear

First Project Plan

Use standard templates, including:
•  project scope
☞  see "Setting Direction"
•  opportunities
☞  e.g., skilled maintainers, readable source-code, documentation
•  risks
☞  e.g., absent test-suites, missing libraries, …
☞  record likelihood (unlikely, possible, likely) & impact (high, moderate, low) for causing problems
•  go/no-go decision
•  activities
☞  fish-eye view

Interview during Demo

Problem: What are the typical usage scenarios?

Solution: Ask the user!
•  ... however
☞  Which user?
☞  Users complain
☞  What should you ask?

•  Solution: interview during demo
-  select several users
-  demo puts a user in a positive mindset
-  demo steers the interview

Initial Understanding

•  Top down: Speculate about Design (recover design)
•  Bottom up: Analyze the Persistent Data (recover database)
•  Study the Exceptional Entities (identify problems)

understand ⇒ higher-level model

Analyze the Persistent Data

Problem: Which objects represent valuable data?
Solution: Analyze the database schema

•  Prepare Model
☞  tables ⇒ classes; columns ⇒ attributes
☞  candidate keys (naming conventions + unique indices)
☞  foreign keys (column types + naming conventions + view declarations + join clauses)
•  Incorporate Inheritance
☞  one to one; rolled down; rolled up
•  Incorporate Associations
☞  association classes (e.g. many-to-many associations)
☞  qualified associations
•  Verification
☞  Data samples + SQL statements
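The "Prepare Model" step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `prepare_model` function, the `ClassModel` type and the `<table>ID` naming convention are hypothetical and not taken from the slides; a real schema would also need the other hints listed above (unique indices, view declarations, join clauses).

```python
from dataclasses import dataclass, field


@dataclass
class ClassModel:
    name: str
    attributes: dict = field(default_factory=dict)  # attribute name -> column type
    references: list = field(default_factory=list)  # classes this one points to


def prepare_model(schema):
    """Map each table to a class and each column to an attribute.

    Foreign keys are guessed from an assumed naming convention: a column
    called <table>ID (case-insensitive) is taken to reference <table>.
    """
    by_lower_name = {table.lower(): table for table in schema}
    model = {}
    for table, columns in schema.items():
        cls = ClassModel(table)
        for column, col_type in columns.items():
            cls.attributes[column] = col_type
            if column.lower().endswith("id") and len(column) > 2:
                target = by_lower_name.get(column[:-2].lower())
                if target is not None and target != table:
                    cls.references.append(target)
        model[table] = cls
    return model


# Hypothetical schema, loosely based on the Patient/Treatment tables in this deck.
schema = {
    "Patient": {"id": "char(5)", "insuranceID": "char(7)"},
    "Treatment": {"patientID": "char(5)", "date": "date", "nr": "integer"},
}
model = prepare_model(schema)
```

Here `Treatment.patientID` is reported as a candidate foreign key to `Patient`, while `insuranceID` is left alone because no `Insurance` table exists. Such guesses still need the verification step (data samples + SQL statements).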

Example: One To One

Tables (one table per class):
Person(id: char(5), name: char(40), address: char(60))
Patient(id: char(5), insuranceID: char(7), insurance: char(5))
Salesman(id: char(5), company: char(40))

Recovered classes (Patient and Salesman inherit from Person):
Person(id: char(5), name: char(40), address: char(60))
Patient(id: char(5), insuranceID: char(7), insurance: char(5))
Salesman(id: char(5), company: char(40))

Example: Rolled Down

Tables (superclass columns repeated in each subclass table):
Patient(id: char(5), name: char(40), address: char(60), insuranceID: char(7), insurance: char(5))
Salesman(id: char(5), name: char(40), address: char(60), company: char(40))

Recovered classes (Patient and Salesman inherit from Person):
Person(id: char(5), name: char(40), address: char(60))
Patient(id: char(5), insuranceID: char(7), insurance: char(5))
Salesman(id: char(5), company: char(40))

Example: Rolled Up

Table (subclass columns collected in the superclass table):
Person(id: char(5), name: char(40), address: char(60), insuranceID: char(7) «optional», insurance: char(5) «optional», company: char(40) «optional»)

Recovered classes (Patient and Salesman inherit from Person):
Person(id: char(5), name: char(40), address: char(60))
Patient(id: char(5), insuranceID: char(7), insurance: char(5))
Salesman(id: char(5), company: char(40))

Example: Qualified Association

Tables:
Patient(id: char(5), …)
Treatment(patientID: char(5), date: date, nr: integer, comment: varchar(255))

Recovered classes:
Patient(id: char(5), …) with operations addTreatment(d, n, t) and lookupTreatment(d, n)
Treatment(comment: Text)
Association: Patient 1 to 1 Treatment, qualified by (date: Date, nr: Integer)
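A qualified association is commonly realised as a lookup table keyed by the qualifier. The Python sketch below assumes that reading; the method names mirror `addTreatment` and `lookupTreatment` from the slide, while the dictionary storage, the duplicate-key check and the sample data are illustrative choices.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Treatment:
    comment: str


@dataclass
class Patient:
    id: str
    # Qualified association: the pair (date, nr) selects exactly one Treatment.
    _treatments: dict = field(default_factory=dict)

    def add_treatment(self, d, n, t):
        key = (d, n)
        if key in self._treatments:
            raise KeyError(f"treatment {key} already recorded")
        self._treatments[key] = t

    def lookup_treatment(self, d, n):
        # Returns None when no treatment matches the qualifier.
        return self._treatments.get((d, n))


# Hypothetical sample data.
patient = Patient("P0001")
patient.add_treatment(date(2005, 2, 23), 1, Treatment("first visit"))
```

The `patientID` column of the table disappears from the class: it is replaced by the fact that each `Patient` object owns its treatments.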

Initial Understanding (revisited)

•  Top down: Speculate about Design (recover design)
•  Bottom up: Analyze the Persistent Data (recover database)
•  Study the Exceptional Entities (identify problems)

ITERATION

understand ⇒ higher-level model

3. Software Visualization

•  Introduction
☞  The Reengineering life-cycle
•  Examples
•  Lightweight Approaches
☞  CodeCrawler
•  Dynamic Analysis
•  Conclusion

The Reengineering Life-cycle

[Figure: the reengineering life-cycle (Requirements, Designs, Code; steps (0) to (4)), highlighting (2) problem detection]

(2) problem detection: issues
•  Tool support
•  Scalability
•  Efficiency

Visualising Hierarchies

•  Euclidean cones
☞  Pros: more info than 2D
☞  Cons: lack of depth; navigation
•  Hyperbolic trees
☞  Pros: good focus; dynamic
☞  Cons: copyright

Bottom Up Visualisation

[Figure: all program entities and relations, passed through a Filter]

A lightweight approach

•  A combination of metrics and software visualization
☞  Visualize software using colored rectangles for the entities and edges for the relationships
☞  Render up to five metrics on one node:
   •  Size (1+2)
   •  Color (3)
   •  Position (4+5)

[Figure: an Entity drawn as a rectangle with Width, Height, Color tone, X Coordinate and Y Coordinate; an edge marks a Relationship]

System Complexity View

Nodes: Classes
Edges: Inheritance Relationships
Width: Number of attributes
Height: Number of methods
Color: Number of lines of code
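To make the metric mapping concrete, the sketch below computes the three node metrics of the System Complexity View for Python source, using the standard `ast` module. It is an illustration only: the tools named in the slides target other languages, and the `system_complexity_metrics` function, the attribute-counting heuristic and the `SAMPLE` class are assumptions of this sketch.

```python
import ast


def system_complexity_metrics(source):
    """Return {class name: (width, height, color)} for a Python module.

    width  = attributes assigned as self.<name> or at class level
    height = number of methods defined directly in the class
    color  = lines of code spanned by the class definition
    """
    metrics = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ClassDef):
            continue
        methods = [n for n in node.body if isinstance(n, ast.FunctionDef)]
        attributes = set()
        for sub in ast.walk(node):
            if isinstance(sub, ast.Attribute) and isinstance(sub.ctx, ast.Store):
                if isinstance(sub.value, ast.Name) and sub.value.id == "self":
                    attributes.add(sub.attr)
        for stmt in node.body:
            if isinstance(stmt, ast.Assign):
                for target in stmt.targets:
                    if isinstance(target, ast.Name):
                        attributes.add(target.id)
        loc = node.end_lineno - node.lineno + 1
        metrics[node.name] = (len(attributes), len(methods), loc)
    return metrics


# Hypothetical class used as input.
SAMPLE = '''
class Account:
    currency = "EUR"

    def __init__(self, owner):
        self.owner = owner
        self.balance = 0

    def deposit(self, amount):
        self.balance += amount
'''

metrics = system_complexity_metrics(SAMPLE)
```

A renderer would then draw each class as a rectangle whose width, height and color tone follow the tuple, and lay the rectangles out along the inheritance edges.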

Inheritance Classification View

Boxes: Classes
Edges: Inheritance
Width: Number of Methods Added
Height: Number of Methods Overridden
Color: Number of Methods Extended

Data Storage Class Detection View

Boxes: Classes
Width: Number of Methods
Height: Lines of Code
Color: Lines of Code

Industrial Validation

•  Nokia (C++ 1.2 MLOC >2300 classes)
•  Nokia (C++/Java 120 kLOC >400 classes)
•  MGeniX (Smalltalk 600 kLOC >2100 classes)
•  Bedag (COBOL 40 kLOC)
•  ...

Personal experience: 2-3 days to get something

Used by developers + Consultants

[Figure: COBOL call graph]

Program Dynamics

•  Simple
•  Reproducible
•  Scales well

Frequency Spectrum

•  Visualization of similarities in event traces
•  Eliminate similarities

Key Concept Identification

•  Extract run-time coupling
•  Apply datamining (“google”)
•  Experiment with documented open-source cases (Ant, JMeter)
☞  recall: ± 90%
☞  precision: ± 60%

Class | IC_CC’ + webmining | Ant docs
Project | √ | √
UnknownElement | √ | √
Task | √ | √
Main | √ | √
IntrospectionHelper | √ | √
ProjectHelper | √ | √
RuntimeConfigurable | √ | √
Target | √ | √
ElementHandler | √ | √
TaskContainer | × | √
Recall (%) | 90 | -
Precision (%) | 60 | -

Replication

T. Eisenbarth, R. Koschke, and D. Simon. Locating features in source code. IEEE Transactions on Software Engineering, 29(3):210–224, March 2003.

Replication is not supported, industrial cases are rare, …. In order to help the discipline mature, we think that more systematic empirical evaluation is needed. [Tonella et al., in Empirical Software Engineering]

Pilot Study: ATM Simulation

Assumptions
•  Feature: invoked from the outside.
•  Map: scenario-feature map exists
•  Recompile: recompile or instrumentation possible
•  Isolate: system can run in isolation (prevent noise)
•  Manual: perform dynamic analysis without help (i.e. no operator)
•  Generic: no limit to granularity of computational unit

[Figure: scenario-feature map; concept lattice]


Case Study: Portfolio Management

2nd iteration

3rd iteration

4. Restructuring

Most common situations

Transform Conditionals to Polymorphism
☞  Transform Self Type Checks
☞  Transform Client Type Checks

Redistribute Responsibilities
☞  Move Behaviour Close to Data
☞  Eliminate Navigation Code
☞  Split up God Class
☞  Empirical Validation

Transform Conditionals to Polymorphism

•  Test self type ⇒ Transform Self Type Checks
•  Test provider type ⇒ Transform Client Type Checks
•  Test external attribute ⇒ Transform Conditionals into Registration
•  Test null values ⇒ Introduce Null Object
•  Test object state ⇒ Factor Out State; Factor Out Strategy

Example: Transform Conditional

    class Message {
    private:
      int type_;
      void* data;
      ...
      void send (Channel* ch) {
        switch (type_) {
          case TEXT : { ch->nextPutAll(data); break; }
          case ACTION : { ch->doAction(data); ...

⇒ Transform Self Type Checks

    void makeCalls (Telephone* phoneArray[]) {
      // iterate over the null-terminated array of Telephone pointers
      for (Telephone** p = phoneArray; *p; p++) {
        switch ((*p)->phoneType()) {
          case TELEPHONE::POTS : {
            POTSPhone* potsp = (POTSPhone*)*p;
            potsp->tourne(); potsp->call(); ...
          case TELEPHONE::ISDN : {
            ISDNPhone* isdnp = (ISDNPhone*)*p;
            isdnp->initLine(); isdnp->connect(); ...

⇒ Transform Client Type Checks

Transform Self Type Check

Before: Client1 and Client2 use Message, whose send() switches on its own type field:

    switch (type_) {
      case TEXT : { ch->nextPutAll(data); break; }
      case ACTION : { ch->doAction(data);
      ...

After: Client1 and Client2 use Message send(), which is overridden by the subclasses TextMessage send() and ActionMessage send()
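The target structure of this transformation can be sketched as follows. The sketch is in Python rather than the C++ of the slides, and the `Channel` class with its recording `log` is a hypothetical stand-in used only to make the example self-contained.

```python
class Channel:
    """Hypothetical channel that records what it is asked to do."""

    def __init__(self):
        self.log = []

    def next_put_all(self, data):
        self.log.append(("text", data))

    def do_action(self, data):
        self.log.append(("action", data))


class Message:
    def __init__(self, data):
        self.data = data

    def send(self, channel):
        # Each subclass supplies the behaviour that used to be one switch case.
        raise NotImplementedError


class TextMessage(Message):
    def send(self, channel):
        channel.next_put_all(self.data)


class ActionMessage(Message):
    def send(self, channel):
        channel.do_action(self.data)


channel = Channel()
for message in (TextMessage("hello"), ActionMessage("beep")):
    message.send(channel)  # dispatch replaces the switch on a type field
```

Adding a new kind of message now means adding a subclass; neither `Message.send` nor its clients need to change.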

Transform Client Type Check

Before: TelephoneBox makeCall() switches over the type of each Telephone (subclasses POTSPhone ..., ISDNPhone ...)

After: TelephoneBox makeCall() calls Telephone makeCall(), which is implemented by POTSPhone makeCall() ... and ISDNPhone makeCall() ...
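A minimal Python sketch of the result, assuming the call sequences from the C++ example (`tourne`/`call` and `initLine`/`connect`) are represented as plain lists of step names; those return values are an illustrative device, not part of the slides.

```python
class Telephone:
    def make_call(self):
        raise NotImplementedError


class POTSPhone(Telephone):
    def make_call(self):
        # Steps that the client used to perform after a cast to POTSPhone.
        return ["tourne", "call"]


class ISDNPhone(Telephone):
    def make_call(self):
        # Steps that the client used to perform after a cast to ISDNPhone.
        return ["initLine", "connect"]


class TelephoneBox:
    def __init__(self, phones):
        self.phones = phones

    def make_calls(self):
        # The client no longer inspects the provider's type.
        return [phone.make_call() for phone in self.phones]


box = TelephoneBox([POTSPhone(), ISDNPhone()])
steps = box.make_calls()
```

The type-specific protocol moves from the client into the provider hierarchy, so the client loop stays the same when a new phone type appears.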

Redistribute Responsibilities

•  Data containers ⇒ Move Behaviour Close to Data
•  Chains of data containers ⇒ Eliminate Navigation Code
•  Monster client of data containers ⇒ Split Up God Class

Move Behavior Close to Data (example 1/2)

Employee
+telephoneNrs
+name(): String
+address(): String

Payroll
+printEmployeeLabel()

    System.out.println(currentEmployee.name());
    System.out.println(currentEmployee.address());
    for (int i=0; i < currentEmployee.telephoneNumbers.length; i++) {
      System.out.print(currentEmployee.telephoneNumbers[i]);
      System.out.print(" ");
    }
    System.out.println("");

TelephoneGuide
+printEmployeeTelephones()

    … for … System.out.print(" -- "); …

Move Behavior Close to Data (example 2/2)

Employee
- telephoneNrs
- name(): String
- address(): String
+printLabel(String)

    public void printLabel (String separator) {
      System.out.println(_name);
      System.out.println(_address);
      for (int i=0; i < telephoneNumbers.length; i++) {
        System.out.print(telephoneNumbers[i]);
        System.out.print(separator);
      }
      System.out.println("");
    }

Payroll
+printEmployeeLabel()

    … emp.printLabel(" "); …

TelephoneGuide
+printEmployeeTelephones()

    … emp.printLabel(" -- "); …

Eliminate Navigation Code

Step 1 (before):
Car -engine +increaseSpeed(): … engine.carburetor.fuelValveOpen = true
Engine +carburetor
Carburetor +fuelValveOpen

Step 2:
Car -engine +increaseSpeed(): … engine.speedUp()
Engine -carburetor +speedUp(): carburetor.fuelValveOpen = true
Carburetor +fuelValveOpen

Step 3:
Car -engine +increaseSpeed(): … engine.speedUp()
Engine -carburetor +speedUp(): carburetor.openFuelValve()
Carburetor -fuelValveOpen +openFuelValve(): fuelValveOpen = true
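The end state of this refactoring, where every object talks only to its direct collaborator, might look like the Python sketch below. The `is_accelerating` and `is_fuel_valve_open` query methods are hypothetical additions so that the effect can be observed without reaching through the chain.

```python
class Carburetor:
    def __init__(self):
        self._fuel_valve_open = False

    def open_fuel_valve(self):
        self._fuel_valve_open = True

    def is_fuel_valve_open(self):
        return self._fuel_valve_open


class Engine:
    def __init__(self):
        self._carburetor = Carburetor()

    def speed_up(self):
        # Engine is the only class that knows a carburetor exists.
        self._carburetor.open_fuel_valve()

    def is_accelerating(self):
        return self._carburetor.is_fuel_valve_open()


class Car:
    def __init__(self):
        self._engine = Engine()

    def increase_speed(self):
        # No more engine.carburetor.fuelValveOpen = true
        self._engine.speed_up()

    def is_accelerating(self):
        return self._engine.is_accelerating()


car = Car()
was_accelerating = car.is_accelerating()
car.increase_speed()
```

Replacing the carburetor with another mechanism now only affects `Engine`.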

Split Up God Class

Problem: How to break up a class which monopolizes control?
Solution: Incrementally eliminate navigation code

•  Detection:
☞  measuring size
☞  class names containing Manager, System, Root, Controller
☞  the class that all maintainers are avoiding
•  How:
☞  move behaviour close to data + eliminate navigation code
☞  remove or deprecate façade
•  However:
☞  If God Class is stable, then don't split ⇒ shield client classes from the god class

Split Up God Class: 5 variants

Mail client filters incoming mail

•  A: Controller
•  B: Controller, Filter1, Filter2 (from A: extract behavioral class)
•  C: Controller, Filter1, Filter2, MailHeader (from B: extract data class)
•  D: Controller, Filter1, Filter2, MailHeader, FilterAction (from C: extract behavioral class)
•  E: Controller, Filter1, Filter2, MailHeader, FilterAction, NameValuePair (from D: extract data class)

Empirical Validation

•  Controlled experiment with 63 last-year master-level students (CS and ICT)

Independent Variables: experimental task; institution; God class decomposition
Dependent Variables: time; accuracy

[Figure: experimental design; residual labels 9, 6, 3]

Interpretation of Results

•  “Optimal decomposition” differs with respect to training
☞  Computer science: preference towards C-E
☞  ICT-electronics: preference towards A-C
•  Advanced OO training can induce a preference towards particular styles of decomposition
☞  Consistent with [Arisholm et al. 2004]

“Good” design is in the eye of the beholder

5. Code Duplication

a.k.a. Software Cloning, Copy&Paste Programming

•  Code Duplication
☞  What is it?
☞  Why is it harmful?
•  Detecting Code Duplication
•  Approaches
•  A Lightweight Approach
•  Visualization (dotplots)
•  Duploc
•  Recent trends

The Reengineering Life-Cycle

[Figure: the reengineering life-cycle (Requirements, Designs, Code), highlighting (2) problem detection]

(2) Problem detection: issues
•  Scale
•  Unknown a priori

Code is Copied

Small Example from the Mozilla Distribution (Milestone 9)
Extract from /dom/src/base/nsLocation.cpp

    NS_IMETHODIMP
    LocationImpl::GetPathname(nsString…
    {
      nsAutoString href;
      nsIURI *url;
      nsresult result = NS_OK;

      result = GetHref(href);
      if (NS_OK == result) {
    #ifndef NECKO
        result = NS_NewURL(&url, href);
    #else
        result = NS_NewURI(&url, href);
    #endif // NECKO
        if (NS_OK == result) {
    #ifdef NECKO
          char* file;
          result = url->GetPath(&file);
    #else
          const char* file;
          result = url->GetFile(&file);
    #endif
          if (result == NS_OK) {
            aPathname.SetString(file);
    #ifdef NECKO
            nsCRT::free(file);
    #endif
          }
          NS_IF_RELEASE(url);
        }
      }

      return result;
    }

    NS_IMETHODIMP
    LocationImpl::SetPathname(const nsString…
    {
      nsAutoString href;
      nsIURI *url;
      nsresult result = NS_OK;

      result = GetHref(href);
      if (NS_OK == result) {
    #ifndef NECKO
        result = NS_NewURL(&url, href);
    #else
        result = NS_NewURI(&url, href);
    #endif // NECKO
        if (NS_OK == result) {
          char *buf = aPathname.ToNewCString();
    #ifdef NECKO
          url->SetPath(buf);
    #else
          url->SetFile(buf);
    #endif
          SetURL(url);
          delete[] buf;
          NS_RELEASE(url);
        }
      }

      return result;
    }

    NS_IMETHODIMP
    LocationImpl::GetPort(nsString& aPort)
    {
      nsAutoString href;
      nsIURI *url;
      nsresult result = NS_OK;

      result = GetHref(href);
      if (NS_OK == result) {
    #ifndef NECKO
        result = NS_NewURL(&url, href);
    #else
        result = NS_NewURI(&url, href);
    #endif // NECKO
        if (NS_OK == result) {
          aPort.SetLength(0);
    #ifdef NECKO
          PRInt32 port;
          (void)url->GetPort(&port);
    #else
          PRUint32 port;
          (void)url->GetHostPort(&port);
    #endif
          if (-1 != port) {
            aPort.Append(port, 10);
          }
          NS_RELEASE(url);
        }
      }

      return result;
    }

How Much Code is Duplicated?

Usual estimates: 8 to 12% in normal industrial code
15 to 25% is already a lot!

Case Study | LOC | Duplication without comments | Duplication with comments
gcc | 460’000 | 8.7% | 5.6%
Database Server | 245’000 | 36.4% | 23.3%
Payroll | 40’000 | 59.3% | 25.4%
Message Board | 6’500 | 29.4% | 17.4%

Copied Code Problems

•  General negative effect:
☞  Code bloat
•  Negative effects on Software Maintenance
☞  Copied defects
☞  Changes take double, triple, quadruple, ... work
☞  Dead code
☞  Add to the cognitive load of future maintainers
•  Copying as additional source of defects
☞  Errors in the systematic renaming produce unintended aliasing
•  Metaphorically speaking:
☞  Software Aging, “hardening of the arteries”
☞  “Software Entropy” increases: even small design changes become very difficult to effect

Code Duplication Detection

Nontrivial problem:
•  No a priori knowledge about which code has been copied
•  How to find all clone pairs among all possible pairs of segments?

Kinds of clones:
•  Lexical Equivalence: Type I
•  Syntactical Equivalence: Type II (& Type III)
•  Semantic Equivalence: Type IV

General Schema of Detection Process

Source Code ⇒ (Transformation) ⇒ Transformed Code ⇒ (Comparison) ⇒ Duplication Data

Author | Level | Transformed Code | Comparison Technique
[John94a] | Lexical | Substrings | String-Matching
[Duca99a] | Lexical | Normalized Strings | String-Matching
[Bake95a] | Syntactical | Parameterized Strings | String-Matching
[Mayr96a] | Syntactical | Metric Tuples | Discrete comparison
[Kont97a] | Syntactical | Metric Tuples | Euclidean distance
[Baxt98a] | Syntactical | AST | Tree-Matching

Simple Detection Approach (i)

•  Assumption:
   •  Code segments are just copied and changed at a few places
•  Code Transformation Step
   •  remove white space, comments
   •  remove lines that contain uninteresting code elements (e.g., just ‘else’ or ‘}’)

Before:

    …
    //assign same fastid as container
    fastid = NULL;
    const char* fidptr = get_fastid();
    if(fidptr != NULL) {
      int l = strlen(fidptr);
      fastid = new char[ l + 1 ];

After:

    …
    fastid=NULL;
    constchar*fidptr=get_fastid();
    if(fidptr!=NULL)
    intl=strlen(fidptr);
    fastid=newchar[l+1];

Simple Detection Approach (ii)

•  Code Comparison Step
☞  Line based comparison (Assumption: Layout did not change during copying)
☞  Compare each line with each other line.
☞  Reduce search space by hashing:
    1.  Preprocessing: Compute the hash value for each line
    2.  Actual Comparison: Compare all lines in the same hash bucket
•  Evaluation of the Approach
☞  Advantages: Simple, language independent
☞  Disadvantages: Difficult interpretation
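The transformation and comparison steps can be sketched compactly in Python. This is an illustration under assumptions, not the authors' tool: the function names, the `UNWANTED` set, the window size of 3 and the two sample files are all hypothetical, and block comments (`/* */`) are deliberately not handled.

```python
import re
from collections import defaultdict

# Lines that carry no information once normalized (an illustrative choice).
UNWANTED = {"", "else", "{", "}", ";"}


def normalize(line):
    """Drop // comments and all white space (block comments are ignored here)."""
    line = re.sub(r"//.*$", "", line)
    return re.sub(r"\s+", "", line)


def find_duplicates(files, window=3):
    """Return {normalized chunk: [(file name, first line number), ...]}.

    `files` maps a file name to its source text. Chunks of `window`
    consecutive interesting lines are used as dictionary keys, so equal
    chunks land in the same hash bucket; only buckets holding more than
    one location are reported.
    """
    buckets = defaultdict(list)
    for name, text in files.items():
        kept = []
        for number, raw in enumerate(text.splitlines(), start=1):
            code = normalize(raw)
            if code not in UNWANTED:
                kept.append((number, code))
        for i in range(len(kept) - window + 1):
            chunk = "\n".join(code for _, code in kept[i:i + window])
            buckets[chunk].append((name, kept[i][0]))
    return {chunk: locs for chunk, locs in buckets.items() if len(locs) > 1}


# Hypothetical input: b.c contains a re-laid-out copy of a.c.
files = {
    "a.c": "x = 1;\ny = 2;  // set y\nz = x + y;\n",
    "b.c": "int q;\nx=1;\n{\ny = 2;\nz = x+y;\n}\n",
}
clones = find_duplicates(files)
```

Because only equal keys are compared, the cost stays close to linear in the number of lines instead of comparing every line with every other line.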

A Perl script for C++ (i)

    $equivalenceClassMinimalSize = 1;
    $slidingWindowSize = 5;
    $removeKeywords = 0;
    @keywords = qw(if then else );
    $keywordsRegExp = join '|', @keywords;
    @unwantedLines = qw( else return return; { } ; );
    push @unwantedLines, @keywords;

    while (<>) {
      chomp;
      $totalLines++;

      # remove comments of type /* */
      my $codeOnly = '';
      while(($inComment && m|\*/|) || (!$inComment && m|/\*|)) {
        unless($inComment) { $codeOnly .= $` }
        $inComment = !$inComment;
        $_ = $';
      }
      $codeOnly .= $_ unless $inComment;
      $_ = $codeOnly;

      s|//.*$||;   # remove comments of type //
      s/\s+//g;    # remove white space
      s/$keywordsRegExp//og if $removeKeywords;   # remove keywords

A Perl script for C++ (ii)

      $codeLines++;
      push @currentLines, $_;
      push @currentLineNos, $.;
      if($slidingWindowSize < @currentLines) {
        shift @currentLines;
        shift @currentLineNos;
      }
      #print STDERR "Line $totalLines >$_<\n";
      my $lineToBeCompared = join '', @currentLines;
      my $lineNumbersCompared = "<$ARGV>";   # append the name of the file
      $lineNumbersCompared .= join '/', @currentLineNos;
      #print STDERR "$lineNumbersCompared\n";
      if($bucketRef = $eqLines{$lineToBeCompared}) {
        push @$bucketRef, $lineNumbersCompared;
      } else {
        $eqLines{$lineToBeCompared} = [ $lineNumbersCompared ];
      }
      if(eof) { close ARGV }   # Reset linenumber-count for next file

•  Handles multiple files
•  Removes comments and white spaces
•  Controls noise (if, {,)
•  Granularity (number of lines)
•  Possible to remove keywords

Output Sample

    Lines:
      create_property(pd,pnImplObjects,stReference,false,*iImplObjects);
      create_property(pd,pnElttype,stReference,true,*iEltType);
      create_property(pd,pnMinelt,stInteger,true,*iMinelt);
      create_property(pd,pnMaxelt,stInteger,true,*iMaxelt);
      create_property(pd,pnOwnership,stBool,true,*iOwnership);
    Locations:
      </face/typesystem/SCTypesystem.C>6178/6179/6180/6181/6182
      </face/typesystem/SCTypesystem.C>6198/6199/6200/6201/6202
    Lines:
      create_property(pd,pnSupertype,stReference,true,*iSupertype);
      create_property(pd,pnImplObjects,stReference,false,*iImplObjects);
      create_property(pd,pnElttype,stReference,true,*iEltType);
      create_property(pd,pMinelt,stInteger,true,*iMinelt);
      create_property(pd,pnMaxelt,stInteger,true,*iMaxelt);
    Locations:
      </face/typesystem/SCTypesystem.C>6177/6178
      </face/typesystem/SCTypesystem.C>6229/6230

Lines = duplicated lines
Locations = file names and line numbers

Visualization of Duplicated Code

•  Visualization provides insights into the duplication situation
•  A simple version can be implemented in three days
•  Scalability issue
•  Dotplots: technique from DNA analysis
   •  Code is put on vertical as well as horizontal axis
   •  A match between two elements is a dot in the matrix

[Figure: four dotplot patterns over sample sequences such as "a b c d e f" versus "a b x y e f": Exact Copies, Copies with Variations, Inserts/Deletes, Repetitive Code Elements]
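A dotplot is simple enough to compute in a few lines. The Python sketch below uses the two element sequences from the slide figure; the `dotplot` function and its text rendering (`*` for a match, `.` otherwise) are illustrative choices.

```python
def dotplot(rows, columns):
    """Return one string per row element; '*' marks rows[i] == columns[j]."""
    return [
        "".join("*" if r == c else "." for c in columns)
        for r in rows
    ]


sequence = list("abcdef")
edited = list("abxyef")

exact = dotplot(sequence, sequence)    # unbroken diagonal: exact copy
variation = dotplot(sequence, edited)  # diagonal with a gap: copy with changes
```

Printing `variation` line by line shows the broken diagonal that signals a copy with variations:

    *.....
    .*....
    ......
    ......
    ....*.
    .....*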

Visualization of Copied Code Sequences

All examples are made using Duploc from an industrial case study (1 million LOC C++ system)

Detected Problem
File A contains two copies of a piece of code
File B contains another copy of this code

Possible Solution
Extract Method

[Figure: dotplot of File A and File B against File A and File B]

Visualization of Repetitive Structures

Detected Problem
4 Object factory clones: a switch statement over a type variable is used to call individual construction code

Possible Solution
Strategy Method

Visualization of Cloned Classes

Detected Problem
Class A is an edited copy of class B. Editing & Insertion

Possible Solution
Subclassing …

[Figure: dotplot of Class A and Class B against Class A and Class B]

Visualization of Clone Families

20 Classes implementing lists for different data types

[Figure: Overview and Detail dotplots]

Recent Trends


Clone Detection Inside

6. Software Evolution

•  Exploiting the Version Control System
☞  Visualizing CVS changes
•  The Evolution Matrix
•  Yesterday's weather

It is not age that turns a piece of software into a legacy system, but the rate at which it has been developed and adapted without being reengineered.
[Demeyer, Ducasse and Nierstrasz: Object-Oriented Reengineering Patterns]


The Reengineering Life-Cycle

[Figure: reengineering life-cycle across the levels Requirements, Designs and Code, with the steps (0) requirement analysis, (1) model capture, (2) problem detection, (3) problem resolution]

(2) Problem detection
Issues
•  scale


Analyse CVS changes

1) Vertical lines = Frequent Changers
2) Horizontal line = Shotgun Surgery
3) Triangle = Core Reduces
4) Block Shift = Design Change
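Two of these patterns can also be looked for without a picture. The sketch below is a rough approximation under stated assumptions: the history is assumed to be available as a list of commits, each being the list of files it touched, and the function names and thresholds are hypothetical. A "frequent changer" is taken to be a file that appears in many commits, and a "shotgun surgery" candidate a commit that touches many files.

```python
from collections import Counter


def frequent_changers(commits, min_commits):
    """Files that were changed in at least min_commits commits."""
    # set(files) guards against a file being listed twice in one commit.
    counts = Counter(path for files in commits for path in set(files))
    return sorted(path for path, n in counts.items() if n >= min_commits)


def shotgun_commits(commits, min_files):
    """Indices of commits that touched at least min_files distinct files."""
    return [i for i, files in enumerate(commits) if len(set(files)) >= min_files]


history = [
    ["a.c", "b.c"],
    ["a.c"],
    ["a.c", "b.c", "c.c", "d.c"],
    ["d.c"],
]
print(frequent_changers(history, min_commits=3))
print(shotgun_commits(history, min_files=4))
```

Suitable thresholds depend on the project; the values above are chosen only to make the toy history produce a result.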


Ownership Map: Developer Activity

[Figure: ownership map annotated with the patterns Dialogue, Monologue, Edit, Takeover and Familiarization]

What to (re)test ?


Data from Windows Vista and Windows 7

Software components with a high level of ownership will have fewer failures than components with lower top ownership levels.

Software components with many minor contributors will have more failures than software components that have fewer.
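To make the two notions concrete, here is a minimal sketch that computes them for one component. It rests on assumptions that the slide does not state: ownership is measured as the share of commits made by the most active contributor, and a "minor" contributor is anyone below a configurable share (the 5% default is an assumption). The function name `ownership` is hypothetical.

```python
from collections import Counter


def ownership(commit_authors, minor_threshold=0.05):
    """Return (top contributor, top contributor's share, sorted minor contributors)."""
    if not commit_authors:
        raise ValueError("component has no commits")
    counts = Counter(commit_authors)
    total = sum(counts.values())
    shares = {author: n / total for author, n in counts.items()}
    top_author = max(shares, key=shares.get)
    minors = sorted(a for a, share in shares.items() if share < minor_threshold)
    return top_author, shares[top_author], minors


# One entry per commit to the component, naming the commit's author.
authors = ["ann"] * 18 + ["bob"] + ["cy"]
top, share, minors = ownership(authors, minor_threshold=0.10)
print(top, round(share, 2), minors)
```

A component with a high top share and few minor contributors would, according to the two statements above, be expected to have fewer failures.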


The Evolution Matrix

[Figure: evolution matrix with columns ordered by TIME (Versions), from First Version to Last Version; annotated regions: Added Classes, Removed Classes, Growth, Stabilisation, Major Leap]
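The raw data behind such a matrix is simple to derive. The sketch below assumes that each version is available as a set of class names (an assumption, as is the function name `evolution_rows`) and reports which classes were added and removed at each step. Phases where many classes are added suggest growth or a major leap; phases with few changes suggest stabilisation.

```python
def evolution_rows(versions):
    """For each pair of consecutive versions, return (added classes, removed classes)."""
    return [
        (sorted(new - old), sorted(old - new))
        for old, new in zip(versions, versions[1:])
    ]


# Oldest version first; each version is the set of class names it contains.
versions = [{"A", "B"}, {"A", "B", "C"}, {"A", "C"}]
for step, (added, removed) in enumerate(evolution_rows(versions), start=1):
    print(f"version {step} -> {step + 1}: added {added}, removed {removed}")
```

The full Evolution Matrix also draws each class in each version; this sketch covers only the added and removed classes.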


Example: MooseFinder (38 Versions)


Test history

[Figure: test history view annotated with: single test, unit tests, integration tests, phased testing, "… affect unit tests"]

System under study = checkstyle

Selenium Tests


GitHub Repository | Project | Description | # Commit | # Sel. Commit | Java LoC | Sel. LoC
gxa/atlas | Gene Expression Atlas | Portal for sharing gene expression data | 2118 | 358 | 32375 | 5374
INCF/eeg-database | EEG/ERP portal | Portal for sharing EEG/ERP portal clinical data | 854 | 17 | 68262 | 7158
mifos/head | MiFos | Portfolio management for microfinance institutions | 7977 | 505 | 338705 | 18735
motech/TAMA-Web | Tama | Front office application for clinics | 2358 | 239 | 62034 | 2815
OpenLMIS/open-lmis | OpenLMIS | Logistics management information system | 4714 | 1153 | 72275 | 19195
xwiki/xwiki-enterprise | XWiki Enterprise | Enterprise-level wiki | 688 | 164 | 28405 | 13506
zanata/zanata-server | Zanata | Software for translators | 3430 | 81 | 111698 | 3509
Zimbra-Community/zimbra-sources | Zimbra | Enterprise collaboration software | 377 | 243 | 1025410 | 189413

TABLE IV: The 8 repositories in the high-quality corpus.

[Figure 3: three scatter-plot panels. X-axis: CommitId; Y-axis: FileId; legend ChangeType: added-regular, added-selenium, delete-regular, delete-selenium, edit-regular, edit-selenium]

Fig. 3: Change histories of the XWIKI-ENTERPRISE (left), OPEN-LMIS (middle), and ATLAS projects (right).

B. Project-specific Change Histories

Before answering RQ2 quantitatively with two new corpus-wide metrics, we provide some visual insights into the commit histories of individual projects from our high-quality corpus. Figure 3 depicts a variant of the Change History Views introduced by Zaidman et al. [23] for three randomly selected projects. For each commit in a project's history, our variant visualizes whether a SELENIUM file (rather than a unit test file) or an application file was added, removed or edited. The X-axis depicts the different commits ordered by their timestamp. The Y-axis depicts the files of the project. To this end, we assign a unique identifier to each file. We ensure that SELENIUM files get the lowest identifiers by processing them first. As a result, they are located at the bottom of the graph.

Figure 3 clearly demonstrates that SELENIUM tests are modified as the projects evolve. However, the modifications do not appear to happen in a coupled manner. This is to be expected as SELENIUM tests concern the user interface of an application, while application commits also affect the application logic underlying the user interface. Any evolutionary coupling will therefore be less pronounced than, for instance, between a unit test and the application logic under test.

Note that a large number of files is removed and added around revision 1000 of the xwiki-enterprise project. The corresponding commit message (footnote 6) reveals that this is due to a change in the project's directory structure. We see this occur in several other projects. In providing a more quantitative answer to RQ2, the remainder of this section will therefore take care to track files based on their content rather than their path alone.
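The text does not say how files are tracked by content. One simple possibility, shown here purely as an assumption, is to pair files removed and added in the same commit when their contents hash to the same value, so that a directory restructuring is seen as a move rather than as a deletion plus an addition. The function names are hypothetical, and only byte-identical content is matched.

```python
import hashlib


def content_key(data):
    """Fingerprint of a file's bytes."""
    return hashlib.sha1(data).hexdigest()


def match_moves(removed, added):
    """Pair removed and added paths with identical content.

    Both arguments map a path to the file's bytes; the result is a list of
    (old_path, new_path) pairs.
    """
    by_hash = {}
    for path, data in sorted(removed.items()):
        by_hash.setdefault(content_key(data), []).append(path)
    moves = []
    for path, data in sorted(added.items()):
        candidates = by_hash.get(content_key(data))
        if candidates:
            moves.append((candidates.pop(0), path))
    return moves


removed = {"src/LoginTest.java": b"class LoginTest {}", "src/Old.java": b"x"}
added = {"tests/ui/LoginTest.java": b"class LoginTest {}", "tests/New.java": b"y"}
print(match_moves(removed, added))
```

A file that is moved and edited in the same commit would not be matched by this sketch; handling that case needs a similarity measure rather than an exact hash.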

C. Corpus-Wide Commit Activity Metrics

We first aim to provide quantitative insights into the pace at which SELENIUM-based functional tests are changed as the web application under test evolves.

1) Research method: To this end, we evaluate the high-quality corpus against several metrics that are based on the following categorization of commit activities:

Footnote 6: Commit 74feec18b81dec12d9d9359f8fc793587b4ed329

Repository | #S | AS_C | SS_C | AS_D | SS_D
gxa/atlas | 258 | 6.67 | 1.39 | 1.5 | 0.33
INCF/eeg-database | 11 | 75.36 | 1.55 | 98.59 | 2.8
mifos/head | 381 | 19.58 | 1.33 | 6.7 | 0.4
motech/TAMA-Web | 170 | 11.52 | 1.41 | 3.33 | 0.44
OpenLMIS/open-lmis | 704 | 5.05 | 1.64 | 0.45 | 0.16
xwiki/xwiki-enterprise | 96 | 5.33 | 1.71 | 6.94 | 1.95
zanata/zanata-server | 51 | 65.65 | 1.59 | 35.82 | 0.72
. . . /zimbra-sources | 66 | 1.74 | 3.73 | 1.81 | 7.09

TABLE V: Averaged commit activity metrics for the high-quality corpus. The first column denotes either #SS or #AS (cannot diverge by more than 1).

SC  SELENIUM commit: commit that adds, modifies or deletes at least one SELENIUM file.
AC  Application commit: commit that does not add, modify or delete any SELENIUM file.

The same categorization transposes to commit sequences:

SS  SELENIUM span: maximal sequence of successive SC.
AS  Application span: maximal sequence of successive AC.

Finally, this categorization enables computing the following metrics about each kind of span:

AS_D, AS_C  Length of an AS in days and in commits.
SS_D, SS_C  Length of a SS in days and in commits.
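The span definitions above can be sketched as follows. The input format is an assumption: a chronologically ordered list with one boolean per commit, True for a SELENIUM commit (SC) and False for an application commit (AC). Only the length in commits is computed; the length in days would additionally need commit timestamps. The function names are hypothetical.

```python
from itertools import groupby


def spans(is_selenium_commit):
    """Split the commit sequence into maximal spans: (is_selenium, length in commits)."""
    return [(kind, len(list(group))) for kind, group in groupby(is_selenium_commit)]


def mean(values):
    return sum(values) / len(values) if values else 0.0


def average_span_lengths(is_selenium_commit):
    """Return (average AS length, average SS length), both measured in commits."""
    all_spans = spans(is_selenium_commit)
    app_spans = [n for kind, n in all_spans if not kind]
    sel_spans = [n for kind, n in all_spans if kind]
    return mean(app_spans), mean(sel_spans)


# AC AC AC SC SC AC SC
flags = [False, False, False, True, True, False, True]
print(spans(flags))
print(average_span_lengths(flags))
```

Because spans are maximal, SELENIUM spans and application spans alternate, so their counts can differ by at most one.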

2) Results: The next table depicts the results for the commit activity metrics for the repositories in the high-quality corpus. The most revealing entries are related to the average length of the non-SELENIUM spans measured in commits (AS_C). It takes on average about 11.23 non-SELENIUM commits (or 4.33 days) before a commit affects a SELENIUM file. However, this mean is largely influenced by outliers since the third quartile is even lower with only 9 non-SELENIUM commits (2.05 days). These results suggest that SELENIUM-based functional tests do co-evolve with the web application under test.

Statistic | AS_C | SS_C | AS_D | SS_D
Mean | 11.23 | 1.59 | 4.33 | 0.66
Std Deviation | 73.07 | 1.36 | 33.66 | 4.9
1st Quartile | 2 | 1 | 0.05 | 0.02
Median | 4 | 1 | 0.54 | 0.06
3rd Quartile | 9 | 2 | 2.05 | 0.36

Git repositories of the XWiki, OpenLMIS and Atlas
© Laurent Christophe (Vrije Universiteit Brussel)

[Figure 6: bar chart. X-axis: Change Classification (assertion, command, constant, demarcator, location); Y-axis: Change Hit Ratio (0.0 to 0.4)]

Fig. 6: Summary of the corpus-wide change classification.

Project | Total | Locator | Command | Demarcator | Asserts | Constants
Atlas | 8068 | 90 | 3 | 104 | 3282 | 2586
XWiki | 68665 | 115 | 154 | 24 | 1490 | 3114
Tama | 31821 | 95 | 89 | 43 | 36 | 571
Zanata | 12959 | 497 | 119 | 0 | 1 | 906
EEG/ERP | 248 | 3 | 0 | 0 | 6 | 24
OpenLMIS | 69792 | 2550 | 401 | 8 | 3454 | 8972

TABLE VII: Project-scoped change classification.

changes. Table VII lists project-scoped results. Our tool timed out on two projects of the corpus with an extensive history.

The most frequently made changes are those to constants and asserts. These are the two test components that are most prone to changes. Constants occur frequently in locator expressions to retrieve DOM elements from a web page and in assert statements as the values compared against (footnote 10).

Focusing future tool support for test maintenance on these areas might therefore benefit test engineers most. Existing work about repairing assertions in unit tests [4], and about repairing references to GUI widgets in functional tests for desktop applications [12] suggests that this is not infeasible. Note that existing work also targets repairing changes in test command sequences [15], but such repairs do not seem to occur much in practice.

Both outliers in our results stem from the ATLAS project. Its test scripts contain hardcoded genome strings inside assert statements that are frequently updated.

D. Threats to Validity

The edit script generated by CHANGENODES is not always minimal. It may incorrectly output a sequence of redundant operations for nodes that are not modified. This is due to some of the heuristics used by the differencing algorithm. These unneeded operations only form a small set of the total operations. We have performed random validations of distilled changes to ensure the correctness of our results.

Several changes are classified by looking at names of methods, without using type information. Computing this information would be too expensive to do our experiments on multiple large-scale projects. As a result some changes may be incorrectly classified.

We have been unable to find examples of some of the change categories from Section II. This is either due to our change query being too strict, the patterns not being present in the examined projects or due to incorrectly distilling the changes made to the SELENIUM scripts.

Footnote 10: Our tool classifies such changes also in the locator or assertion category.

VI. RELATED WORK

Little is known about how automated functional tests are used in practice. A notable exception is Berner et al. [1]'s account of their experiences with automated testing in industrial applications. Their observation “The Capability To Run Automated Tests Diminishes If Not Used” underlines the importance of test maintenance. In interviews with experts from different organizations, an industrial survey [16] found that the main obstacles in adopting automated tests are development expenses and upkeep costs. Finally, Leotta et al. [17] recently reported on an experiment in which the authors manually created two distinct sets of SELENIUM-based automated functional tests for 6 randomly selected open-source PHP projects. While the first set of tests corresponds to the test scripts studied in this paper, the second set of tests is created using a capture-and-replay functionality of the SELENIUM IDE. The authors find that the latter are less expensive to develop in terms of time required, but much more expensive to maintain. To the best of our knowledge, ours is the first large-scale study on the prevalence and maintenance of SELENIUM-based functional tests, albeit on open-source software.

Unit tests have received more attention in the literature. The work of Zaidman et al. [23] on the co-evolution of unit tests with the application under test is seminal. Apart from “Change History Views” (cf. Section IV-B), they also proposed to plot the relative growth of test and production code over time in “Growth History Views”. This enables observing whether their development proceeds synchronously or in separate phases. Fluri et al. follow a similar method in their study on co-evolution of code and comments [8]. The same goes for a study on co-evolution of database schema and application code by Goeminne et al. [11]. Our metrics from Section IV aim to provide quantitative rather than visual insights into this matter, based on commit activities rather than code growth.

Method-wise, there are several works tangential to ours in mining software repositories. German and Hindle [18], for instance, classify metrics for change along the dimensions of whether the metric is aware of changes between two distinct program versions, of whether the metric is scoped to entities or commits, and of whether the metric is applied at specific events or at evenly spaced intervals. The commit activity and maintenance metrics from Section IV are scoped to commits and SELENIUM files, unaware and aware of program changes, and applied at every and specific types of commits respectively. Several techniques and metrics have been proposed for detecting co-changing files in a software repository (e.g., [13], [24]). We expect such fine-grained evolutionary couplings to be less outspoken in our setting because test scripts exercise an implemented user interface along extensive scenarios, rather than the implementation itself. More research is needed.

Section V distilled and subsequently categorized changes within each commit to a SELENIUM file. Similar analyses have been performed using the CHANGEDISTILLER [9] tool. The aforementioned study by Fluri et al. [8], for instance, includes a distribution of the types of source code changes (e.g., return type change) that induce a comment change. Another fine-grained classification of changes to Java code has also been used to better quantify evolutionary couplings of files [7]. In contrast to these general-purpose change classifications, ours is specific to automated functional tests. More coarse-grained

Avoid Magic Constants !!

Recommender Systems


Stack Trace ⇒ link to source code

Description ⇒ text mining

Who to fix ? How long to fix ?

Misclassified bug reports ?


7. Conclusion

1. Introduction: There are OO legacy systems too!
2. Reverse Engineering: How to understand your code
3. Visualization: Scalable approach
4. Restructuring: How to Refactor Your Code
5. Code Duplication: The most typical problems
6. Software Evolution: Learn from the past
7. Conclusion: Did we convince you?


Goals
We will try to convince you:
•  Yes, Virginia, there are object-oriented legacy systems too!
   ☞  … actually, that's a sign of health
•  Reverse engineering and reengineering are essential activities in the lifecycle of any successful software system. (And especially OO ones!)
   ☞  … consequently, do not consider it second class work
•  There is a large set of lightweight tools and techniques to help you with reengineering.
   ☞  … check our book, but remember the list is growing
•  Despite these tools and techniques, people must do the job and they represent the most valuable resource.
   ☞  … pick them carefully and reward them properly

⇒ Did we convince you ?