Software Reengineering & Evolution - UAntwerpen

87
Software Reengineering & Evolution Serge Demeyer Stéphane Ducasse Oscar Nierstrasz http://scg.unibe.ch/download/oorp/

Transcript of Software Reengineering & Evolution - UAntwerpen

Page 1: Software Reengineering & Evolution - UAntwerpen

Software Reengineering���& Evolution

Serge Demeyer���Stéphane Ducasse���Oscar Nierstrasz

http://scg.unibe.ch/download/oorp/

Page 2: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Software Reengineering and Evolution.2

Schedule1. Introduction

There are OO legacy systems too !

2. Reverse EngineeringHow to understand your code

3. VisualizationScaleable approach

4. RestructuringHow to Refactor Your Code

5. Code DuplicationThe most typical problems

6. Software EvolutionLearn from the past

7. Conclusion

Page 3: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.3

GoalsWe will try to convince you:

•  Yes, Virginia, there are object-oriented legacy systems too!

•  Reverse engineering and reengineering are essential activities in the lifecycle of any successful software system. (And especially OO ones!)

•  There is a large set of lightweight tools and techniques to help you with reengineering.

•  Despite these tools and techniques, people must do job and they represent the most valuable resource.

Page 4: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.4

What is a Legacy System ?“legacy”

A sum of money, or a specified article, given to another by will; anything handed down by an ancestor or predecessor.

— Oxford English Dictionary

⇒ so, further evolution and development may be prohibitively expensive

A legacy system is a piece of software that:

•  you have inherited, and•  is valuable to you.

Typical problems with legacy systems:•  original developers not available•  outdated development methods used•  extensive patches and modifications have

been made•  missing or outdated documentation

Page 5: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.5

Software Maintenance - Cost

requirementdesign

codingtesting

delivery

x 1

x 5

x 10

x 20

x 200Relative Maintenance EffortBetween 50% and 75% of global effort is spent on

“maintenance” !

Relative Costof Fixing Mistakes

Solution ?•  Better requirements engineering?•  Better software methods & tools���

(database schemas, CASE-tools, objects, components, …)?

Page 6: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.6

Continuous Development

17.4% Corrective(fixing reported errors)

18.2% Adaptive(new platforms or OS)

60.3% Perfective(new functionality)

The bulk of the maintenance cost is due to new functionality⇒ even with better requirements, it is hard to predict new functions

data from [Lien78a]

4.1% Other

Page 7: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.7

(*) process-oriented structured methods, information engineering,data-oriented methods, prototyping, CASE-tools – not OO !

Contradiction ? No!•  modern methods make it easier to change���

... this capacity is used to enhance functionality!

Modern Methods & Tools ?[Glas98a] quoting empirical study from Sasa Dekleva (1992)

•  Modern methods(*) lead to more reliable software•  Modern methods lead to less frequent software repair

•  and ...•  Modern methods lead to more total maintenance time

Page 8: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.8

Lehman's LawsA classic study by Lehman and Belady [Lehm85a] identified several “laws”

of system change.

Continuing change

•  A program that is used in a real-world environment must change, or become progressively less useful in that environment.

Increasing complexity

•  As a program evolves, it becomes more complex, and extra resources are needed to preserve and simplify its structure.

Those laws are still applicable…

Page 9: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.9

What about Objects ?Object-oriented legacy systems•  = successful OO systems whose architecture and design no longer

responds to changing requirements

Compared to traditional legacy systems•  The symptoms and the source of the problems are the same

•  The technical details and solutions may differ

OO techniques promise better•  flexibility, •  reusability,

•  maintainability

•  …

⇒ they do not come for free

Page 10: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.10

What about Components ?

Components are very brittle …After a while one inevitably resorts to glue :)

Page 11: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Reengineering Legacy Systems.11

Soccer Field Metaphor

© A. Van Deursen

•  Assume 10 lines of code���= 40 tiles of 1 x 1 cm

•  12.5 million lines of code���≈ 40 soccer fields

A. van Deursen, De software-evolutieparadoxIntreerede TU Delft, 23 feb 2005

Imagine 400 developers concurrentlymoving tiles around on 40 soccer fields

Page 12: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.12

How to deal with Legacy ?New or changing requirements will gradually degrade original design… unless extra development effort is spent to adapt the structure

New Functionality

Hack it in ?

•  duplicated code•  complex conditionals•  abusive inheritance•  large classes/methods

First …•  refactor•  restructure•  reengineer

Take a loan on your software⇒ pay back via reengineering

Investment for the future⇒ paid back during maintenance

Page 13: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.13

Common SymptomsLack of Knowledge•  obsolete or no documentation

•  departure of the original developers or users

•  disappearance of inside knowledge about the system

•  limited understanding of entire system

⇒ missing tests

Process symptoms•  too long to turn things over to

production•  need for constant bug fixes•  maintenance dependencies

•  difficulties separating products

⇒ simple changes take too long

Code symptoms•  duplicated code•  code smells⇒ big build times

Page 14: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.14

The Reengineering Life-Cycle

Requirements

Designs

Code

(0) requirementanalysis

(1) modelcapture

(2) problemdetection (3) problem

resolution

(4) program transformation

•  people centric•  lightweight

Page 15: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.15

A Map of Reengineering Patterns

Tests: Your Life Insurance

Detailed Model Capture

Initial Understanding

First Contact

Setting Direction

Migration Strategies

Detecting Duplicated Code

Redistribute Responsibilities

Transform Conditionals to Polymorphism

Page 16: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.16

2. Reverse Engineering•  What and Why•  First Contact

☞  Interview during Demo

•  Initial Understanding☞ Analyze the Persistent Data

•  Detailed Model Capture☞ Look for the Contracts

Page 17: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.17

What and Why ?DefinitionReverse Engineering is the process of analysing a subject system

☞  to identify the system’s components and their interrelationships and

☞  create representations of the system in another form or at a higher level of abstraction. — Chikofsky & Cross, ’90

MotivationUnderstanding other people’s code(cfr. newcomers in the team, code reviewing,original developers left, ...)

Generating UML diagrams is NOT reverse engineering ... but it is a valuable support tool

Page 18: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.18

The Reengineering Life-Cycle

(0) req. analysis(1) model captureissues•  scale•  speed•  accuracy•  politics

Requirements

Designs

Code

(0) requirementanalysis

(1) modelcapture

(2) problemdetection

(3) problemresolution

(4) program transformation

Page 19: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.19

First Contact

System experts

Chat with the ���Maintainers

Interview���during Demo

Talk with ���developers

Talk withend users

Talk about it

Verify what���you hear

feasibility assessment(one week time)

Software System

Read All the Codein One Hour

Do a Mock���Installation

Read it Compile it

Skim the ���Documentation

Read about it

Page 20: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.20

First Project PlanUse standard templates, including:•  project scope

☞  see "Setting Direction"

•  opportunities☞  e.g., skilled maintainers, readable source-code, documentation

•  risks☞  e.g., absent test-suites, missing libraries, …☞  record likelihood (unlikely, possible, likely) ���

& impact (high, moderate, low) for causing problems

•  go/no-go decision•  activities

☞  fish-eye view

Page 21: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.21

•  Solution: interview during demo-  select several users-  demo puts a user in a positive mindset-  demo steers the interview

Interview during Demo

Solution: Ask the user!

•  ... however☞ Which user ?

☞ Users complain☞ What should you ask ?

Problem: What are the typical usage scenarios?

Page 22: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.22

Initial Understanding

understand ⇒higher-level model

Top down

Speculate about Design

Recover design

Analyze the Persistent Data

Study the Exceptional

Entities

Recover database

Bottom up

Identify problems

Page 23: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.23

Analyze the Persistent DataProblem: Which objects represent valuable data?Solution: Analyze the database schema•  Prepare Model

☞  tables ⇒ classes; columns ⇒ attributes☞  candidate keys (naming conventions + unique indices)☞  foreign keys (column types + naming conventions���

+ view declarations + join clauses)•  Incorporate Inheritance

☞  one to one; rolled down; rolled up•  Incorporate Associations

☞  association classes (e.g. many-to-many associations)☞  qualified associations

•  Verification☞  Data samples + SQL statements

Page 24: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.24

Example: One To One

Patient id: char(5) insuranceID: char(7) insurance: char(5)

Salesman id: char(5) company: char(40)

Person id: char(5) name: char(40) addresss: char(60)

Patient id: char(5) insuranceID: char(7) insurance: char(5)

Salesman id: char(5) company: char(40)

Person id: char(5) name: char(40) addresss: char(60)

Page 25: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.25

Example: Rolled DownPatient id: char(5) name: char(40) addresss: char(60) insuranceID: char(7) insurance: char(5)

Salesman id: char(5) name: char(40) addresss: char(60) company: char(40)

Patient id: char(5) insuranceID: char(7) insurance: char(5)

Salesman id: char(5) company: char(40)

Person id: char(5) name: char(40) addresss: char(60)

Page 26: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.26

Example: Rolled Up

Person id: char(5) name: char(40) addresss: char(60) insuranceID: char(7) «optional» insurance: char(5) «optional» company: char(40) «optional»

Patient id: char(5) insuranceID: char(7) insurance: char(5)

Salesman id: char(5) company: char(40)

Person id: char(5) name: char(40) addresss: char(60)

Page 27: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.27

Example: Qualified Association

Patient id: char(5) …

Treatment patientID: char(5) date: date nr: integer comment: varchar(255)

Patient id: char(5) …

Treatment comment: Text

date: Date nr: Integer

1

1

addTreatment(d, n, t) lookupTreatment(d, n)

Page 28: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.28

Initial Understanding (revisited)Top down

Speculate about Design

Analyze the Persistent Data

Study the Exceptional

Entities

understand ⇒higher-level model

Bottom up

ITERATION

Recover design

Recover database

Identify problems

Page 29: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.29

3. Software Visualization•  Introduction

☞  The Reengineering life-cycle

•  Examples•  Lightweight Approaches

☞  CodeCrawler

•  Dynamic Analysis•  Conclusion

Page 30: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.30

The Reengineering Life-cycle

Requirements

Designs

Code

(0) requirementanalysis

(1) modelcapture

(2) problemdetection (3) problem

resolution

(4) program transformation

(2) problem detectionissues•  Tool support•  Scalability•  Efficiency

Page 31: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.31

Visualising Hierarchies•  Euclidean cones

☞ Pros:•  More info than 2D

☞ Cons: •  Lack of depth•  Navigation

•  Hyperbolic trees☞ Pros:

•  Good focus•  Dynamic

☞ Cons: •  Copyright

Page 32: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.32

Bottom Up Visualisation

Filter

All program entities and

relations

Page 33: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.33

A lightweight approach•  A combination of metrics

and software visualization☞ Visualize software using

colored rectangles for the entities and edges for the relationships

☞ Render up to five metrics on one node:•  Size (1+2)•  Color (3)•  Position (4+5)

Relationship

Entity

Y Coordinate

Height Color tone

Width

X Coordinate

Page 34: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.34

Nodes: ClassesEdges: Inheritance RelationshipsWidth: Number of attributesHeight: Number of methodsColor: Number of lines of code

System Complexity View

Page 35: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.35

Inheritance Classification View

Boxes: ClassesEdges: InheritanceWidth: Number of Methods AddedHeight: Number of Methods OverriddenColor: Number of Method Extended

Page 36: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.36

Data Storage Class Detection View

Boxes: ClassesWidth: Number of Methods Height: Lines of CodeColor: Lines of Code

Page 37: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.37

Industrial Validation

Nokia (C++ 1.2 MLOC >2300 classes)Nokia (C++/Java 120 kLOC >400 classes)MGeniX (Smalltalk 600 kLOC >2100classes)Bedag (COBOL 40 kLOC)...

Personal experience2-3 days to get something

Used by developers + Consultants

Page 38: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Reengineering Legacy Systems.38

CO

BO

L C

ALL

GR

AP

H

Page 39: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.39

Program Dynamics

•  Simple•  Reproducible•  Scales well

Page 40: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.40

•  Visualization of similarities in event traces

•  Eliminate similarities

Frequency Spectrum

Page 41: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.41

•  Extract run-time coupling•  Apply datamining (“google”)•  Experiment with documented

open-source cases (Ant, JMeter)☞  recall: +- 90 %☞  precision: +- 60 %

Key Concept Identification

Class

IC_C

C’ +

web-

mining

Ant docs

Project √ √ UnknownElement √ √ Task √ √ Main √ √ IntrospectionHelper √ √ ProjectHelper √ √ RuntimeConfigurable √ √ Target √ √ ElementHandler √ √ TaskContainer × √ Recall (%) 90 - Precision (%) 60 -

Page 42: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Reengineering Legacy Systems.42

Replication

T. Eisenbarth, R. Koschke, and D. Simon. Locating features in source code. IEEE Transactions on Software Engineering, 29(3):210–224, March 2003.

Replication is not supported, industrial cases are rare, …. In order to help the discipline mature, we think that more systematic empirical evaluation is needed. ���[Tonella et. Al, in Empirical Software Engineering]

Page 43: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Reengineering Legacy Systems.43

Pilot Study: ATM Simulation

Assumptions•  Feature: invoked from the outside.•  Map: scenario-feature map exists•  Recompile: recompile or instrumentation

possible•  Isolate: system can run in isolation

(prevent noise)•  Manual: perform dynamic analysis

without help (I.e. no operator)•  Generic: no limit to granularity of

computational unit

scenario-feature map

concept lattice

Page 44: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Reengineering Legacy Systems.44

Case Study: Portfolio Management

2nd iteration

3rd iteration

Page 45: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.45

4. RestructuringMost common situations

Transform Conditionals to Polymorphism☞  Transform Self Type Checks☞  Transform Provider Type Checks

Redistribute Responsibilities☞  Move Behaviour Close to Data ☞  Eliminate Navigation Code☞  Split up God Class☞  Empirical Validation

Page 46: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.46

Transform Conditionals to Polymorphism

Transform ���Self Type Checks

Test providertype

Test self type Test externalattribute

Transform ���Client Type Checks

Transform Conditionalsinto Registration

Testnull values

Introduce ���Null Object

Factor Out Strategy

Factor Out State

Test object state

Page 47: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.47

class Message { private: int type_; void* data;

... void send (Channel* ch) {

switch (type_) { case TEXT : { ch->nextPutAll(data); break; } case ACTION : { ch->doAction(data); ...

void makeCalls (Telephone* phoneArray[]) {

for (Telephone *p = phoneArray; p; p++) { switch (p-> phoneType()) { case TELEPHONE::POTS : { POTSPhone* potsp = (POTSPhone*)p potsp->tourne(); potsp->call();... case TELEPHONE::ISDN : { ISDNPhone* isdnp = (ISDNPhone*)p isdnp->initLine(); isdnp->connect();...

Example: Transform Conditional

⇒ Transform Self Type Checks ⇒ Transform Client Type Checks

Page 48: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.48

Message send()

Message send()

ActionMessage send()

TextMessage send()

switch (type_) { case TEXT : { ch->nextPutAll(data); break;} case ACTION : { ch->doAction(data);

...

Client1

Client1

Client2

Client2

Transform Self Type Check

Page 49: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.49

TelephoneBox makeCall ()

Telephone

POTSPhone ...

ISDNPhone ...

TelephoneBox makeCall ()

Telephone makeCall()

POTSPhone makeCall()

...

ISDNPhone makeCall

...

Transform Client Type Check

Page 50: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.50

Redistribute Responsibilities

Eliminate Navigation Code

Data containers

Monster client���of data containers

Split Up God Class

Move Behaviour Close to Data

Chains ofdata containers

Page 51: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.51

Move Behavior Close to Data (example 1/2) Employee +telephoneNrs +name(): String +address(): String

Payroll +printEmployeeLabel()

System.out.println(currentEmployee.name() ); System.out.println(currentEmployee.address() ); for (int i=0; i < currentEmployee.telephoneNumbers.length; i++) {

System.out.print(currentEmployee.telephoneNumbers[i]); System.out.print(" "); }

System.out.println("");

TelephoneGuide +printEmployeeTelephones()

**

… for …

System.out.print(" -- "); …

Page 52: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.52

Move Behavior Close to Data (example 2/2)

Employee - telephoneNrs - name(): String - address(): String +printLabel(String)

Payroll +printEmployeeLabel()

public void printLabel (String separator) { System.out.println(_name); System.out.println(_address); for (int i=0; i < telephoneNumbers.length; i++) {

System.out.print(telephoneNumbers[i]); System.out.print(separator); }

System.out.println("");}

TelephoneGuide +printEmployeeTelephones()

**

… emp.printLabel(" -- "); …

… emp.printLabel(" "); …

Page 53: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.53

Car -engine +increaseSpeed()

Eliminate Navigation Code

… engine.carburetor.fuelValveOpen = true

Engine +carburetor

Car -engine +increaseSpeed()

Carburetor +fuelValveOpen

Engine -carburetor +speedUp()

Car -engine +increaseSpeed()

… engine.speedUp()

carburetor.fuelValveOpen = true

Carburetor -fuelValveOpen +openFuelValve()

Engine -carburetor +speedUp()

carburetor.openFuelValve() fuelValveOpen = true

Carburetor +fuelValveOpen

Page 54: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.54

Split Up God ClassProblem: Break a class which monopolizes control?Solution: Incrementally eliminate navigation code

•  Detection:☞  measuring size☞  class names containing Manager, System, Root, Controller☞  the class that all maintainers are avoiding

•  How:☞  move behaviour close to data + eliminate navigation code☞  remove or deprecate façade

•  However:☞  If God Class is stable, then don't split���

⇒ shield client classes from the god class

Page 55: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.55

Split Up God Class: 5 variants

ControllerA

ControllerFilter1Filter2

B

ControllerFilter1Filter2

MailHeader

C

ControllerFilter1Filter2

MailHeader

FilterActionD

ControllerFilter1Filter2

MailHeader

FilterAction

NameValuePair

E

Mail client filters incoming mail

Extract behavioral class

Extract data class

Extract behavioral class

Extract data class

Page 56: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.56

Empirical Validation•  Controlled experiment with 63 last-

year master-level students (CS and ICT)

Independent Variables Dependent Variables

Experimental ���task

Institution

God classdecomposition

9

6

3 Time

Accuracy

Page 57: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.57

Interpretation of Results•  “Optimal decomposition” differs with respect to training

☞  Computer science: preference towards C-E

☞  ICT-electronics: preference towards A-C

•  Advanced OO training can induce a preference towards particular styles of decomposition☞  Consistent with [Arisholm et al. 2004]

“Good” design is in the

eye of the beholder

Page 58: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.58

5. Code Duplicationa.k.a. Software Cloning, Copy&Paste

Programming

•  Code Duplication☞  What is it?☞  Why is it harmful?

•  Detecting Code Duplication•  Approaches•  A Lightweight Approach•  Visualization (dotplots)•  Duploc•  Recent trends

Page 59: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.59

The Reengineering Life-Cycle

Requirements

Designs

Code

(0) requirementanalysis

(1) modelcapture

(2) problemdetection (3) problem

resolution

(2) Problem detection

(2) Problem detection

issues•  Scale•  Unknown a priori

Page 60: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.60

Code is CopiedSmall Example from the Mozilla Distribution (Milestone 9)Extract from /dom/src/base/nsLocation.cpp

[432] NS_IMETHODIMP [433] LocationImpl::GetPathname(nsString[434] {[435] nsAutoString href;[436] nsIURI *url;[437] nsresult result = NS_OK;[438] [439] result = GetHref(href);[440] if (NS_OK == result) {[441] #ifndef NECKO[442] result = NS_NewURL(&url, href);[443] #else[444] result = NS_NewURI(&url, href);[445] #endif // NECKO[446] if (NS_OK == result) {[447] #ifdef NECKO[448] char* file;[449] result = url->GetPath(&file);[450] #else[451] const char* file;[452] result = url->GetFile(&file);[453] #endif[454] if (result == NS_OK) {[455] aPathname.SetString(file);[456] #ifdef NECKO[457] nsCRT::free(file);[458] #endif[459] }[460] NS_IF_RELEASE(url);[461] }[462] }[463] [464] return result;[465] }[466]

[467] NS_IMETHODIMP [468] LocationImpl::SetPathname(const nsString[469] {[470] nsAutoString href;[471] nsIURI *url;[472] nsresult result = NS_OK;[473] [474] result = GetHref(href);[475] if (NS_OK == result) {[476] #ifndef NECKO[477] result = NS_NewURL(&url, href);[478] #else[479] result = NS_NewURI(&url, href);[480] #endif // NECKO[481] if (NS_OK == result) {[482] char *buf = aPathname.ToNewCString();[483] #ifdef NECKO[484] url->SetPath(buf);[485] #else[486] url->SetFile(buf);[487] #endif[488] SetURL(url);[489] delete[] buf;[490] NS_RELEASE(url); [491] }[492] }[493] [494] return result;[495] }[496]

[497] NS_IMETHODIMP [498] LocationImpl::GetPort(nsString& aPort)[499] {[500] nsAutoString href;[501] nsIURI *url;[502] nsresult result = NS_OK;[503] [504] result = GetHref(href);[505] if (NS_OK == result) {[506] #ifndef NECKO[507] result = NS_NewURL(&url, href);[508] #else[509] result = NS_NewURI(&url, href);[510] #endif // NECKO[511] if (NS_OK == result) {[512] aPort.SetLength(0);[513] #ifdef NECKO[514] PRInt32 port;[515] (void)url->GetPort(&port);[516] #else[517] PRUint32 port;[518] (void)url->GetHostPort(&port);[519] #endif[520] if (-1 != port) {[521] aPort.Append(port, 10);[522] }[523] NS_RELEASE(url);[524] }[525] }[526] [527] return result;[528] }[529]

Page 61: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.61

Case Study LOCDuplication

without comments

with comments

gcc 460’000 8.7% 5.6%

Database Server 245’000 36.4% 23.3%

Payroll 40’000 59.3% 25.4%

Message Board 6’500 29.4% 17.4%

How Much Code is Duplicated?Usual estimates: 8 to 12% in normal industrial code15 to 25 % is already a lot!

Page 62: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.62

Copied Code Problems•  General negative effect:

☞  Code bloat•  Negative effects on Software Maintenance

☞  Copied Defects ☞  Changes take double, triple, quadruple, ... Work☞  Dead code☞  Add to the cognitive load of future maintainers

•  Copying as additional source of defects ☞  Errors in the systematic renaming produce unintended aliasing

•  Metaphorically speaking:☞  Software Aging, “hardening of the arteries”, ☞  “Software Entropy” increases even small design changes become very

difficult to effect

Page 63: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.63

Code Duplication DetectionNontrivial problem:

•  No a priori knowledge about which code has been copied•  How to find all clone pairs among all possible pairs of segments?

Lexical Equivalence

Semantic Equivalence

Syntactical Equivalence

Type I

Type II(& Type III)Type IV

Page 64: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.64

General Schema of Detection Process

Source Code Transformed Code Duplication Data

Transformation Comparison

Author Level Transformed CodeComparison Technique

[John94a] Lexical Substrings String-Matching

[Duca99a] Lexical Normalized Strings String-Matching

[Bake95a] Syntactical Parameterized Strings String-Matching

[Mayr96a] Syntactical Metric Tuples Discrete comparison

[Kont97a] Syntactical Metric Tuples Euclidean distance

[Baxt98a] Syntactical AST Tree-Matching

Page 65: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.65

Simple Detection Approach (i)•  Assumption:

•  Code segments are just copied and changed at a few places•  Code Transformation Step

•  remove white space, comments•  remove lines that contain uninteresting code elements

(e.g., just ‘else’ or ‘}’)

… //assign same fastid as container fastid = NULL; const char* fidptr = get_fastid(); if(fidptr != NULL) { int l = strlen(fidptr); fastid = newchar[ l + 1 ];

… fastid=NULL; constchar*fidptr=get_fastid(); if(fidptr!=NULL) intl=strlen(fidptr) fastid = newchar[l+]

Page 66: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.66

Simple Detection Approach (ii)•  Code Comparison Step

☞  Line based comparison (Assumption: Layout did not change during copying)

☞  Compare each line with each other line. ☞  Reduce search space by hashing:

1.  Preprocessing: Compute the hash value for each line2.  Actual Comparison: Compare all lines in the same hash bucket

•  Evaluation of the Approach☞  Advantages: Simple, language independent ☞  Disadvantages: Difficult interpretation

Page 67: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.67

A Perl script for C++ (i)while (<>) { chomp; $totalLines++;

# remove comments of type /* */ my $codeOnly = ''; while(($inComment && m|\*/|) || (!$inComment && m|/\*|)) { unless($inComment) { $codeOnly .= $` } $inComment = !$inComment; $_ = $'; } $codeOnly .= $_ unless $inComment; $_ = $codeOnly;

s|//.*$||; # remove comments of type // s/\s+//g; #remove white space s/$keywordsRegExp//og if $removeKeywords; #remove keywords

$equivalenceClassMinimalSize = 1;$slidingWindowSize = 5;$removeKeywords = 0;@keywords = qw(if then else );

$keywordsRegExp = join '|', @keywords;

@unwantedLines = qw( else return return; { } ; );push @unwantedLines, @keywords;

Page 68: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.68

A Perl script for C++ (ii)$codeLines++; push @currentLines, $_; push @currentLineNos, $.; if($slidingWindowSize < @currentLines) { shift @currentLines; shift @currentLineNos;} #print STDERR "Line $totalLines >$_<\n"; my $lineToBeCompared = join '', @currentLines; my $lineNumbersCompared = "<$ARGV>"; # append the name of the file $lineNumbersCompared .= join '/', @currentLineNos; #print STDERR "$lineNumbersCompared\n"; if($bucketRef = $eqLines{$lineToBeCompared}) { push @$bucketRef, $lineNumbersCompared; } else {$eqLines{$lineToBeCompared} = [ $lineNumbersCompared ];} if(eof) { close ARGV } # Reset linenumber-count for next file

•  Handles multiple files•  Removes comments

and white spaces•  Controls noise (if, {,)•  Granularity (number of

lines)•  Possible to remove

keywords

Page 69: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.69

Output Sample

Lines: create_property(pd,pnImplObjects,stReference,false,*iImplObjects); create_property(pd,pnElttype,stReference,true,*iEltType); create_property(pd,pnMinelt,stInteger,true,*iMinelt); create_property(pd,pnMaxelt,stInteger,true,*iMaxelt); create_property(pd,pnOwnership,stBool,true,*iOwnership); Locations: </face/typesystem/SCTypesystem.C>6178/6179/6180/6181/6182 </face/typesystem/SCTypesystem.C>6198/6199/6200/6201/6202 Lines: create_property(pd,pnSupertype,stReference,true,*iSupertype); create_property(pd,pnImplObjects,stReference,false,*iImplObjects); create_property(pd,pnElttype,stReference,true,*iEltType); create_property(pd,pMinelt,stInteger,true,*iMinelt); create_property(pd,pnMaxelt,stInteger,true,*iMaxelt); Locations: </face/typesystem/SCTypesystem.C>6177/6178 </face/typesystem/SCTypesystem.C>6229/6230

Lines = duplicated linesLocations = file names and line number

Page 70: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.70

Visualization of Duplicated Code• Visualization provides insights into the duplication situation• A simple version can be implemented in three days• Scalability issue

• Dotplots — Technique from DNA Analysis •  Code is put on vertical as well as horizontal axis•  A match between two elements is a dot in the matrix

Exact Copies Copies with Inserts/Deletes Repetitive

a b c d e f a b c d e f a b c d e fa b x y e f b c d e a b x y dc ea x b c x d e x f xg ha

Variations Code Elements

Page 71: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.71

Visualization of Copied Code Sequences

All examples are made using Duploc from an industrial case study (1 Mio LOC C++ System)

Detected ProblemFile A contains two copies of a piece of code

File B contains another copy of this code

Possible SolutionExtract Method

File A

File A

File B

File B

Page 72: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.72

Visualization of Repetitive Structures

Detected Problem4 Object factory clones: a switch statement over a type variable is used to call individual construction code

Possible SolutionStrategy Method

Page 73: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.73

Visualization of Cloned Classes

Class A

Class B

Class BClass A

Detected Problem:Class A is an edited copy of class B. Editing & Insertion

Possible SolutionSubclassing …

Page 74: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.74

Visualization of Clone Families

20 Classes implementing lists for different data types

DetailOverview

Page 75: Software Reengineering & Evolution - UAntwerpen

Recent���Trends

© S. Demeyer, S. Ducasse, O. NierstraszObject-Oriented Reengineering.75

Clone Detection Inside

Page 76: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.76

6. Software Evolution•  Exploiting the Version Control System

☞  Visualizing CVS changes•  The Evolution Matrix•  Yesterday's weather

It is not age that turns a piece of software into a legacy system, ���but the rate at which it has been developed and adapted without being reengineered.

[Demeyer, Ducasse and Nierstrasz: Object-Oriented Reengineering Patterns]

Page 77: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.77

The Reengineering Life-Cycle

Requirements

Designs

Code

(0) requirementanalysis

(1) modelcapture

(2) problemdetection (3) problem

resolution

(2) Problem detection

(2) Problem detection Issues•  scale

Page 78: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.78

Analyse CVS changes

4) Block Shift = Design Change

3) Triangle = Core Reduces

1) Vertical lines = Frequent Changers

2) Horizontal line = Shotgun Surgery

Page 79: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.79

Ownership Map: ���Developer Activity

DialogueMonologue

Edit Takeover

Familiarization

Page 80: Software Reengineering & Evolution - UAntwerpen

What to (re)test ?

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.80

Data from Windows Vista and Windows 7

Software components with a high level of ownership will have fewer failures than components with lower top ownership levels.

Software components with many minor contributors will have more failures than software components that have fewer.

Page 81: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.81

The Evolution Matrix

Last Version

First Version

Major Leap

Removed Classes

TIME (Versions) Growth Stabilisation

Added Classes

Page 82: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.82

Example: MooseFinder (38 Versions)

Page 83: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Reengineering Legacy Systems.83

Test history

single test

unit tests

integration tests

… affect unit tests … affect unit tests

phased testing

System under study = checkstyle

Page 84: Software Reengineering & Evolution - UAntwerpen

Selenium Tests

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.84

GitHub Repository Project Description # Commit # Sel. Commit Java LoC Sel. LoCgxa/atlas Gene Expression Atlas Portal for sharing gene expression data 2118 358 32375 5374

INCF/eeg-database EEG/ERP portal Portal for sharing EEG/ERP portal clinical data 854 17 68262 7158mifos/head MiFos Portfolio management for microfinance institutions 7977 505 338705 18735

motech/TAMA-Web Tama Front office application for clinics 2358 239 62034 2815OpenLMIS/open-lmis OpenLMIS Logistics management information system 4714 1153 72275 19195

xwiki/xwiki-enterprise XWiki Enterprise Enterprise-level wiki 688 164 28405 13506zanata/zanata-server Zanata Software for translators 3430 81 111698 3509

Zimbra-Community/zimbra-sources Zimbra Enterprise collaboration software 377 243 1025410 189413

TABLE IV: The 8 repositories in the high-quality corpus.

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●● ●●●●●●●●●●●●●●

●●●●●● ●●●●●●●●●●●●●●●●●●● ●●●●●●● ● ●●●●●●●●●●●●●● ●●●●●●●●●

●●●●●●●●●●●●● ●●●●●●●●● ●●●●●●●●●●

●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●● ●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●● ●● ●●●●● ●●●●●●●●●

●● ●●●●●● ●●●●●●●●●●●●●●●● ●● ●●●●●● ● ●●● ●●●●

0

500

1000

0 500 1000 1500CommitId

FileId

ChangeType● added−regular

added−seleniumdelete−regulardelete−seleniumedit−regularedit−selenium

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●

●●●●●●●●●●●●

●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●

●● ●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●

●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●

●●●●●●●●●●●●●

●●●●●●●●●●

●●●●

●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●

0

500

1000

1500

0 1000 2000 3000 4000CommitId

FileId

ChangeType● added−regular

added−seleniumdelete−regulardelete−seleniumedit−regularedit−selenium

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●

●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●

●●●●●●

●●

●●●●●

●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●

●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●

●●●

●●●●●●●●●●●●●●●●●●

●●●●●●●●●

●●●●●●

●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●

●●●●●●●●●●●●●●

●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●

●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●● ●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●

●●

●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●

●●●●

●●

●●

●●●●●●●●●●●●

●●

●●

●●●●●●●●●●●●●●●●●

●●

●●●●●

●●●●

●●●●●●●●●●●●●●●●●●●

0

500

1000

1500

0 500 1000 1500 2000CommitId

FileId

ChangeType● added−regular

added−seleniumdelete−regulardelete−seleniumedit−regularedit−selenium

Fig. 3: Change histories of the XWIKI-ENTERPRISE (left), OPEN-LMIS (middle), and ATLAS projects (right).

B. Project-specific Change Histories

Before answering RQ2 quantitatively with two new corpus-wide metrics, we provide some visual insights into the commithistories of individual projects from our high-quality corpus.Figure 3 depicts a variant of the Change History Viewsintroduced by Zaidman et al. [23] for three randomly selectedprojects. For each commit in a project’s history, our variantvisualizes whether a SELENIUM (rather than a unit test file)or application file was added, removed or edited. The X-axisdepicts the different commits ordered by their timestamp. TheY-axis depicts the files of the project. To this end, we assign aunique identifier to each file. We ensure that SELENIUM filesget the lowest identifiers by processing them first. As a result,they are located at the bottom of the graph.

Figure 3 clearly demonstrates that SELENIUM tests aremodified as the projects evolve. However, the modificationsdo not appear to happen in a coupled manner. This is to beexpected as SELENIUM tests concern the user interface of anapplication, while application commits also affect the appli-cation logic underlying the user interface. Any evolutionarycoupling will therefore be less outspoken than, for instance,between a unit test and the application logic under test.

Note that a large number of files is removed and addedaround revision 1000 of the xwiki-enterprise project. Thecorresponding commit message6 reveals that this is due to achange in the project’s directory structure. We see this occur inseveral other projects. In providing a more quantitative answerto RQ2, the remainder of this section will therefore take careto track files based on their content rather than their path alone.

C. Corpus-Wide Commit Activity Metrics

We first aim to provide quantitative insights into the paceat which SELENIUM-based functional tests are changed as theweb application under test evolves.

1) Research method: To this end, we evaluate the high-quality corpus against several metrics that are based on thefollowing categorization of commit activities:

6Commit 74feec18b81dec12d9d9359f8fc793587b4ed329

Repository #S AS

C

SS

C

AS

D

SS

D

gxa/atlas 258 6.67 1.39 1.5 0.33INCF/eeg-database 11 75.36 1.55 98.59 2.8

mifos/head 381 19.58 1.33 6.7 0.4motech/TAMA-Web 170 11.52 1.41 3.33 0.44

OpenLMIS/open-lmis 704 5.05 1.64 0.45 0.16xwiki/xwiki-enterprise 96 5.33 1.71 6.94 1.95

zanata/zanata-server 51 65.65 1.59 35.82 0.72. . . /zimbra-sources 66 1.74 3.73 1.81 7.09

TABLE V: Averaged commit activity metrics for the high-quality corpus. The first column denotes either #SS or #AS(cannot diverge by more than 1).

SC SELENIUM commit: commit that adds, modifies or deletes atleast one SELENIUM file.

AC Application commit: commit that does not add, modify or deleteany SELENIUM file.

The same categorization transposes to commit sequences:

SS SELENIUM span: maximal sequence of successive SC.AS Application span: maximal sequence of successive AC.

Finally, this categorization enables computing the followingmetrics about each kind of span:

ASD,C

Length of an AS in days and in commits.SS

D,C

Length of a SS in days and in commits.

2) Results: The next table depicts the results for the com-mit activity metrics for the repositories in the high-quality cor-pus. The most revealing entries are related to the average lengthof the non-SELENIUM spans measured in commits (AS

C

).It takes on average about 11.23 non-SELENIUM commits(or 4.33 days) before a commit affects a SELENIUM file.However, this mean is largely influenced by outliers sincethe third quartile is even lower with only 9 non-SELENIUMcommits (2.05 days). These results suggest that SELENIUM-based functional tests do co-evolve with the web applicationunder test.

AS

C

SS

C

AS

D

SS

D

Mean 11.23 1.59 4.33 0.66Std Deviation 73.07 1.36 33.66 4.9

1st Quartile 2 1 0.05 0.02Median 4 1 0.54 0.06

3rd Quartile 9 2 2.05 0.36

Git repositories of the XWiki, OpenLMIS and Atlas© Laurent Christophe (Vrije Universiteit Brussel)

0.0

0.1

0.2

0.3

0.4

assertion command constant demarcator locationChange Classification

Cha

nge

Hit

Rat

io

Fig. 6: Summary of the corpus-wide change classification.Project Total Locator Command Demarcator Asserts ConstantsAtlas 8068 90 3 104 3282 2586XWiki 68665 115 154 24 1490 3114Tama 31821 95 89 43 36 571Zanata 12959 497 119 0 1 906EEG/ERP 248 3 0 0 6 24OpenLMIS 69792 2550 401 8 3454 8972

TABLE VII: Project-scoped change classification.

changes. Table VII lists project-scoped results. Our tool timedout on two projects of the corpus with an extensive history.

The most frequently made changes are those to con-stants and asserts. These are the two test components thatare most prone to changes. Constants occur frequently inlocator expressions to retrieve DOM elements from a webpage and in assert statements as the values compared against.10

Focusing future tool support for test maintenance on theseareas might therefore benefit test engineers most. Existingwork about repairing assertions in unit tests [4], and aboutrepairing references to GUI widgets in functional tests fordesktop applications [12] suggests that this is not infeasible.Note that existing work also targets repairing changes in testcommand sequences [15], but such repairs do not seem tooccur much in practice.

Both outliers in our results stem from the ATLAS project.Its test scripts contain hardcoded genome strings inside assertstatements that are frequently updated.

D. Threats to Validity

The edit script generated by CHANGENODES is not alwaysminimal. It may incorrectly output a sequence of redundantoperations for nodes that are not modified. This is due tosome of the heuristics used by the differencing algorithm.These unneeded operations only form a small set of the totaloperations. We have performed random validations of distilledchanges to ensure the correctness of our results.

Several changes are classified by looking at names ofmethods, without using type information. Computing thisinformation would be too expensive to do our experiments onmultiple large-scale projects. As a result some changes maybe incorrectly classified.

We have been unable to find examples of some of thechange categories from Section II. This is either due to ourchange query being too strict, the patterns not being present inthe examined projects or due incorrectly distilling the changesmade to the SELENIUM scripts.

10Our tool classifies such changes also in the locator or assertion category.

VI. RELATED WORK

Little is known about how automated functional tests areused in practice. A notable exception is Berner et al. [1]’saccount of their experiences with automated testing in in-dustrial applications. Their observation “The Capability ToRun Automated Tests Diminishes If Not Used” underlinesthe importance of test maintenance. In interviews with ex-perts from different organizations, an industrial survey [16]found that the main obstacles in adopting automated testsare development expenses and upkeep costs. Finally, Leottaet al. [17] recently reported on an experiment in which theauthors manually created two distinct sets of SELENIUM-basedautomated functional tests for 6 randomly selected open-sourcePHP projects. While the first set of tests corresponds to the testscripts studied in this paper, the second set of tests is createdusing a capture-and-replay functionality of the SELENIUMIDE. The authors find that the latter are less expensive todevelop in terms of time required, but much more expensive tomaintain. To the best of our knowledge, ours is the first large-scale study on the prevalence and maintenance of SELENIUM-based functional tests —albeit on open-source software.

Unit tests have received more attention in the literature. Thework of Zaidman et al. [23] on the co-evolution of unit testswith the application under test is seminal. Apart from “ChangeHistory Views” (cf. Section IV-B), they also proposed to plotthe relative growth of test and production code over time in“Growth History Views”. This enables observing whether theirdevelopment proceeds synchronously or in separate phases.Fluri et al. follow a similar method in their study on co-evolution of code and comments [8]. The same goes for astudy on co-evolution of database schema and application codeby Goeminne et al. [11]. Our metrics from Section IV aim toprovide quantitative rather than visual insights into this matter,based on commit activities rather than code growth.

Method-wise, there are several works tangential to oursin mining software repositories. German and Hindle [18], forinstance, classify metrics for change along the dimensions ofwhether the metric is aware of changes between two distinctprogram versions, of whether the metric is scoped to entitiesor commits, and of whether the metric is applied at specificevents or at evenly spaced intervals. The commit activity andmaintenance metrics from Section IV are scoped to commitsand SELENIUM files, unaware and aware of program changes,and applied at every and specific types of commits respec-tively. Several techniques and metrics have been proposed fordetecting co-changing files in a software repository (e.g., [13],[24]). We expect such fine-grained evolutionary couplings tobe less outspoken in our setting because test scripts exercisean implemented user interface along extensive scenarios, ratherthan the implementation itself. More research is needed.

Section V distilled and subsequently categorized changeswithin each commit to a SELENIUM file. Similar analyses havebeen performed using the CHANGEDISTILLER [9] tool. Theaforementioned study by Fluri et al. [8], for instance, includesa distribution of the types of source code changes (e.g., returntype change) that induce a comment change. Another fine-grained classification of changes to Java code has also beenused to better quantify evolutionary couplings of files [7]. Incontrast to these general-purpose change classifications, oursis specific to automated functional tests. More coarse-grained

Avoid Magic Constants !!

Page 85: Software Reengineering & Evolution - UAntwerpen

Recommender Systems

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.85

Stack Trace ⇒ link to source code

Description ⇒ text mining

Who to fix ? How long to fix ?

Misclassified bug reports ?

Page 86: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.86

7. Conclusion1. Introduction

There are OO legacy systems too !

2. Reverse EngineeringHow to understand your code

3. VisualizationScaleable approach

4. RestructuringHow to Refactor Your Code

4. Code DuplicationThe most typical problems

5. Software EvolutionLearn from the past

6. ConclusionDid we convince you?

Page 87: Software Reengineering & Evolution - UAntwerpen

© S. Demeyer, S. Ducasse, O. Nierstrasz Object-Oriented Reengineering.87

GoalsWe will try to convince you:•  Yes, Virginia, there are object-oriented legacy systems too!

☞  … actually, that's a sign of health

•  Reverse engineering and reengineering are essential activities in the lifecycle of any successful software system. (And especially OO ones!)☞  … consequently, do not consider it second class work

•  There is a large set of lightweight tools and ���techniques to help you with reengineering.☞  … check our book, but remember the list is growing

•  Despite these tools and techniques, ���people must do job and represent the most valuable resource.☞  … pick them carefully and reward them properly

⇒ Did we convince you ?