Search-based Refactoring: Towards Semantics Preservation€¦ · in next generation . 10 NSGA-II...

32
Search-based Refactoring: Towards Semantics Preservation 1 DIRO, Université de Montréal, Canada 2 CS, Missouri University of Science and Technology, USA 3 IT Department, ABMMC, Qatar 28th IEEE International Conference on Software Maintenance Riva del Garda, September 23-30, 2012 Ali Ouni 1,2 , Marouane Kessentini 2 , Houari Sahraoui 1 , And Mohamed Salah Hamdi 3

Transcript of Search-based Refactoring: Towards Semantics Preservation€¦ · in next generation . 10 NSGA-II...

Page 1: Search-based Refactoring: Towards Semantics Preservation€¦ · in next generation . 10 NSGA-II adaptation • Representation of the individuals • Creation of a population of individuals

Search-based Refactoring: Towards Semantics

Preservation

1DIRO, Université de Montréal, Canada 2CS, Missouri University of Science and Technology, USA

3IT Department, ABMMC, Qatar

28th IEEE International Conference on Software Maintenance

Riva del Garda, September 23-30, 2012

Ali Ouni1,2, Marouane Kessentini2, Houari Sahraoui1,

And Mohamed Salah Hamdi3

Page 2: Search-based Refactoring: Towards Semantics Preservation€¦ · in next generation . 10 NSGA-II adaptation • Representation of the individuals • Creation of a population of individuals

2

Motivating example

Context. Problem statement. Multi-Objective Refactoring. Validation. Conclusion.

Employee

ID

Name

FamilyName

Natinality

DateOfBirth

Sex

. . .

. . .

getPhoneNumber()

calculateLocalTax()

getAge()

calculateSalary()

setMaritalStatus()

getCurrentNatinality()

. . .

. . .

Car

IdNumber

TowingCapacity

OwnerName

. . .

getHistoryReport()

getTowingCapacity()

setInsuranceNum()

. . .

defect: Blob Position

PositionId

Grade

CompanyName

. . .

getPosition()

setGrade)

. . .

Page 3: Search-based Refactoring: Towards Semantics Preservation€¦ · in next generation . 10 NSGA-II adaptation • Representation of the individuals • Creation of a population of individuals

3

Outline

• Problem statement

• Approach : Multi-Objective Refactoring

• Evaluation

• Conclusion

Context. Problem statement. Multi-Objective Refactoring. Validation. Conclusion.

Page 4: Search-based Refactoring: Towards Semantics Preservation€¦ · in next generation . 10 NSGA-II adaptation • Representation of the individuals • Creation of a population of individuals

4

• Design defect introduced during the initial

design or during evolution

– Anomalies, anti-patterns, bad smells…

– Design situations that adversely affect the development of a

software

– Examples: Blob, spagheti code, functional decomposition, ...

Context. Problem statement. Multi-Objective Refactoring. Validation. Conclusion.

Problem statement

Page 5: Search-based Refactoring: Towards Semantics Preservation€¦ · in next generation . 10 NSGA-II adaptation • Representation of the individuals • Creation of a population of individuals

5

• Refactoring to correct them and to improve code

quality – The process of improving a code after it has been written by

changing the internal structure of the code without changing the external behavior (Fowler et al., ‘99)

– Examples: Move method, extract class, move attribute, ...

• Refactoring implementation may produce

semantic errors

Context. Problem statement. Multi-Objective Refactoring. Validation. Conclusion.

Problem statement

Page 6: Search-based Refactoring: Towards Semantics Preservation€¦ · in next generation . 10 NSGA-II adaptation • Representation of the individuals • Creation of a population of individuals

6

Problem statement

• Automate the refactoring task

• Existing approaches – See the refactoring as a single-objective problem

– Improve the internal structure

– The semantics is not a major concern

– Produce semantic errors / incoherencies

– Fastidious manual inspection is needed

• How to find the best compromise between quality

improvement and semantics preservation

Context. Problem statement. Multi-Objective Refactoring. Validation. Conclusion.

Page 7: Search-based Refactoring: Towards Semantics Preservation€¦ · in next generation . 10 NSGA-II adaptation • Representation of the individuals • Creation of a population of individuals

7

Approach Overview

Context. Problem statement. Multi-Objective Refactoring. Validation. Conclusion.

Source code

with defects

Search-based Refactoring

(NSGA-II)

Detection rules Refactoring

operations

Proposed

refactorings

Call Graph

Page 8: Search-based Refactoring: Towards Semantics Preservation€¦ · in next generation . 10 NSGA-II adaptation • Representation of the individuals • Creation of a population of individuals

8

Multi-Objective Refactoring

• See the refactoring task as a multi-objective

optimization problem

– Improve software quality

• Defects correction

– Preserve domain semantics / coherence

• Program analysis : call graph, derive metrics, ...

• Information retrieval : cosine similarity

Context. Problem statement. Multi-Objective Refactoring. Validation. Conclusion.

Meta Heuristic Search Using Multi-Objective

Optimization (NSGA-II)

Page 9: Search-based Refactoring: Towards Semantics Preservation€¦ · in next generation . 10 NSGA-II adaptation • Representation of the individuals • Creation of a population of individuals

9

NSGA-II overview

Context. Problem statement. Multi-Objective Refactoring. Validation. Conclusion.

• NSGA-II: Non-dominated Sorting Genetic Algorithm (K. Deb et al., ’02)

Parent

Population

Offspring

Population

Non-dominated

sorting

F1

F2

F3

F4 Crowding distance

sorting

Population

in next

generation

Page 10: Search-based Refactoring: Towards Semantics Preservation€¦ · in next generation . 10 NSGA-II adaptation • Representation of the individuals • Creation of a population of individuals

10

NSGA-II adaptation

• Representation of the individuals

• Creation of a population of individuals

• Creation of new individuals using genetic operators

(crossover and mutation)

• Definition of fitness functions

Context. Problem statement. Multi-Objective Refactoring. Validation. Conclusion.

Page 11: Search-based Refactoring: Towards Semantics Preservation€¦ · in next generation . 10 NSGA-II adaptation • Representation of the individuals • Creation of a population of individuals

11

– Individual = Refactoring solution

– Sequence of refactoring operations

Context. Problem statement. Multi-Objective Refactoring. Validation. Conclusion.

Representation of individuals

RO1 moveMethod

RO2 pullUpAttribute

RO3 extractClass

RO4 inlineClass

RO5 extractSuperClass

RO6 inlineMethod

RO7 extractClass

RO8 moveMethod

Page 12: Search-based Refactoring: Towards Semantics Preservation€¦ · in next generation . 10 NSGA-II adaptation • Representation of the individuals • Creation of a population of individuals

12

– Population: set of refactoring solutions

Context. Problem statement. Multi-Objective Refactoring. Validation. Conclusion.

Population creation

RO1 moveMethod

RO2 pullUpAttribute

RO3 extractClass

RO4 inlineClass

RO5 extractSuperClass

RO6 inlineMethod

RO7 extractClass

RO8 pullUpAttribute

RO9 extractClass

RO10 moveMethod

RO1 moveMethod

RO2 pullUpAttribute

RO3 extractClass

RO4 inlineClass

RO5 extractSuperClass

RO6 inlineMethod

RO7 extractClass

RO8 moveMethod

RO1 moveMethod

RO2 pullUpAttribute

RO3 extractClass

RO4 inlineClass

RO5 extractSuperClass

RO6 inlineMethod

RO7 extractClass

RO8 moveMethod

RO1 moveMethod

RO2 pullUpAttribute

RO3 extractClass

RO4 inlineClass

RO5 extractSuperClass

RO6 inlineMethod

RO1 moveMethod

RO2 pullUpAttribute

RO3 extractClass

RO4 inlineClass

RO5 extractSuperClass

RO6 inlineMethod

RO7 inlineClass

RO8

inlineMethod

RO9 extractSuperClass

RO1 moveMethod

RO2 pullUpAttribute

RO3 extractClass

RO4 inlineClass

RO5 extractSuperClass

RO6 inlineMethod

RO7 extractClass

RO1 moveMethod

RO2 pullUpAttribute

RO3 extractClass

RO4 inlineClass

RO5 extractSuperClass

RO6 inlineMethod

RO7 extractClass

RO8 moveMethod

RO1 moveMethod

RO2 pullUpAttribute

RO3 extractClass

RO4 inlineClass

RO5 extractSuperClass

RO6 inlineMethod

RO7 extractClass

RO8 moveMethod

Page 13: Search-based Refactoring: Towards Semantics Preservation€¦ · in next generation . 10 NSGA-II adaptation • Representation of the individuals • Creation of a population of individuals

13

Population creation

- Specify the controlling parameters

Context. Problem statement. Multi-Objective Refactoring. Validation. Conclusion.

a1 a2 . . . an m1 m2 . . . mm moyenne

m1 1 0 1 1 1 0 0.42

m2 0 1 1 1 1 0 0.6

.

.

.

mm 1 0 1 0 1 1 0.32

Refactorings Controlling parameters

move method (sourceClass, targetClass, method)

move field (sourceClass, targetClass, field)

pull up field (sourceClass, targetClass, field)

pull up method (sourceClass, targetClass, method)

push down field (sourceClass, targetClass, field)

push down method (sourceClass, targetClass, method)

inline class (sourceClass, targetClass)

extract class (sourceClass, newClass)

Extract class

Page 14: Search-based Refactoring: Towards Semantics Preservation€¦ · in next generation . 10 NSGA-II adaptation • Representation of the individuals • Creation of a population of individuals

14

1. Crossover

2. Mutation

Genetic Operators

(k=3)

(i=3, j=5)

Context. Problem statement. Multi-Objective Refactoring. Validation. Conclusion.

child A

child B

Page 15: Search-based Refactoring: Towards Semantics Preservation€¦ · in next generation . 10 NSGA-II adaptation • Representation of the individuals • Creation of a population of individuals

15

Fitness Functions

1. Quality - Maximize the number of corrected defects

2. Semantics - Minimize domain-semantics errors

Context. Problem statement. Multi-Objective Refactoring. Validation. Conclusion.

Page 16: Search-based Refactoring: Towards Semantics Preservation€¦ · in next generation . 10 NSGA-II adaptation • Representation of the individuals • Creation of a population of individuals

16

Quality Fitness Function

• For each candidate refactoring solution:

Context. Problem statement. Multi-Objective Refactoring. Validation. Conclusion.

oringore_refactefects_befdetected_d#

defectscorrected_#Quality

Page 17: Search-based Refactoring: Towards Semantics Preservation€¦ · in next generation . 10 NSGA-II adaptation • Representation of the individuals • Creation of a population of individuals

17

Semantics Fitness Function

• Intuition : – The correctness of proposed refectorings increase

when applied to semantically connected elements.

Context. Problem statement. Multi-Objective Refactoring. Validation. Conclusion.

Page 18: Search-based Refactoring: Towards Semantics Preservation€¦ · in next generation . 10 NSGA-II adaptation • Representation of the individuals • Creation of a population of individuals

18

Semantics Fitness Function

• Vocabulary-based similarity

– Vocabulary similarity

• Dependency-based similarity

– Shared methods call

– Shared field access

Context. Problem statement. Multi-Objective Refactoring. Validation. Conclusion.

Page 19: Search-based Refactoring: Towards Semantics Preservation€¦ · in next generation . 10 NSGA-II adaptation • Representation of the individuals • Creation of a population of individuals

19 Context. Problem statement. Multi-Objective Refactoring. Validation. Conclusion.

• Vocabulary similarity

....

IGanttProj

project

mainFrame

rdrojectWizawizardNewP

jectWizardshowNewPro

GanttProj

PrjInfos

rojectcreateNewP

JFrame

myMainFram

WizardNewProject

1V

...

6

20

19

1

7

6

18

6

5

2

15

1V

...

2

26

4

0

0

10

9

1

0

6

3

2V

Class c1 Class c2

2*1

2*1arccos

VV

VV

Term frequency

Semantics Fitness Function

Cosine similarity

Page 20: Search-based Refactoring: Towards Semantics Preservation€¦ · in next generation . 10 NSGA-II adaptation • Representation of the individuals • Creation of a population of individuals

20

• Shared methods

– Use call graph • Assumption: if two classes share many common

method calls then they are likely to be semantically

related

Context. Problem statement. Multi-Objective Refactoring. Validation. Conclusion.

OutsharedCallInsharedCall ** c2) ods(c1,sharedMeth

Semantics Fitness Function

Page 21: Search-based Refactoring: Towards Semantics Preservation€¦ · in next generation . 10 NSGA-II adaptation • Representation of the individuals • Creation of a population of individuals

21

• Shared Field access – Assumption: two software elements are

semantically related if they read or modify the same

fields.

Context. Problem statement. Multi-Objective Refactoring. Validation. Conclusion.

|dsRW(c2)accessFiel dsRW(c1)accessFiel|

|dsRW(c2)accessFiel dsRW(c1)accessFiel| c2) dsRW(c1,sharedFiel

Semantics Fitness Function

Page 22: Search-based Refactoring: Towards Semantics Preservation€¦ · in next generation . 10 NSGA-II adaptation • Representation of the individuals • Creation of a population of individuals

22

• For one refactoring operation

where

• For a refactoring solution

Context. Problem statement. Multi-Objective Refactoring. Validation. Conclusion.

dssharedFiel* odCallsharedMeth*_*Sem(RO) similarityvocabulary

1

n

ROSem

Semantic

n

i

i

1

0

)(

Semantics Fitness Function

Page 23: Search-based Refactoring: Towards Semantics Preservation€¦ · in next generation . 10 NSGA-II adaptation • Representation of the individuals • Creation of a population of individuals

23

• Two research questions

– RQ1. To what extent can the proposed approach

correct design defects and preserve the semantics

when applying refactorings?

– RQ2. To what extent can the semantics preservation

improve the results provided by existing work?

Evaluation

Context. Problem statement. Multi-Objective. Refactoring. Validation. Conclusion.

Page 24: Search-based Refactoring: Towards Semantics Preservation€¦ · in next generation . 10 NSGA-II adaptation • Representation of the individuals • Creation of a population of individuals

24

Evaluation

• Data: Two large open source Java projects

Systems Number of

classes

KLOC Number of

Defects

GanttProject v1.10.2 245 31 41

Xerces-J v2.7.0 991 240 66

Context. Problem statement. Multi-Objective. Refactoring. Validation. Conclusion.

Page 25: Search-based Refactoring: Towards Semantics Preservation€¦ · in next generation . 10 NSGA-II adaptation • Representation of the individuals • Creation of a population of individuals

25

Evaluation

• Method: Two metrics

– defect correction ratio (DCR)

– refactoring precision (RP)

Context. Problem statement. Multi-Objective. Refactoring. Validation. Conclusion.

gsrefactorin applying before defects #

defects corrected #DCR

gsrefactorin proposed#

gsrefactorin meaningful #RP

Page 26: Search-based Refactoring: Towards Semantics Preservation€¦ · in next generation . 10 NSGA-II adaptation • Representation of the individuals • Creation of a population of individuals

26

Results & Comparison

Systems Approach Corrected defects

(DCR)

Meaningful refactorings

(RP)

GanttProject

v1.10.2

Multi-objective (NSGA-II) 87% (36|41) 91% (128|146)

Single-objective (GA)

Kessentini et al. 11 95% (39|41) 73% (158|216)

Harman et al. 07 N.A 69% (218|312)

Xerces-J v2.7.0

Multi-objective (NSGA-II) 78% (51|66) 89% (197|221)

Single-objective (GA)

Kessentini et al. 11 89% (59|66) 69% (212|304)

Harman et al. 07 N.A 63% (262|417)

Context. Problem statement. Multi-Objective. Refactoring. Validation. Conclusion.

Page 27: Search-based Refactoring: Towards Semantics Preservation€¦ · in next generation . 10 NSGA-II adaptation • Representation of the individuals • Creation of a population of individuals

27

Approach sensitivity

An example of multiple executions on Xerces-J

Context. Problem statement. Multi-Objective. Refactoring. Validation. Conclusion.

Page 28: Search-based Refactoring: Towards Semantics Preservation€¦ · in next generation . 10 NSGA-II adaptation • Representation of the individuals • Creation of a population of individuals

28

Conclusion

• A novel search-based approach for refactoring suggestion – Quality vs semantics preservation

• Evaluation – Two large open-source systems

– Our approach succeeded in correcting the majority of defects while

preserving the domain-semantics of the original program

• Future Work – Extend the semantics approximation

– Test this approach with other projects

– Comparative study with other different existing techniques

Context. Problem statement. Multi-Objective. Refactoring. Validation. Conclusion.

Page 29: Search-based Refactoring: Towards Semantics Preservation€¦ · in next generation . 10 NSGA-II adaptation • Representation of the individuals • Creation of a population of individuals

29

Thanks for your attention

Questions? Comments? Suggestions?

Page 30: Search-based Refactoring: Towards Semantics Preservation€¦ · in next generation . 10 NSGA-II adaptation • Representation of the individuals • Creation of a population of individuals

30

Selection of best solutions

An example of Pareto optimal solutions for Xerces-J

Context. Problem statement. Multi-Objective. Refactoring. Validation. Conclusion.

Page 31: Search-based Refactoring: Towards Semantics Preservation€¦ · in next generation . 10 NSGA-II adaptation • Representation of the individuals • Creation of a population of individuals

31

Dominance Principle

• Problem: How to identify optimal solutions for a given

population?

• Pareto optimality

min f2

min f1

Non-dominated individuals

Pareto Front

Dominated individuals

Context. Problem statement. Multi-Objective Refactoring. Validation. Conclusion.

Page 32: Search-based Refactoring: Towards Semantics Preservation€¦ · in next generation . 10 NSGA-II adaptation • Representation of the individuals • Creation of a population of individuals

32

The Blob example

• Definition – Procedural-style design leads to one

object with numerous responsibilities

and most other objects only holding

data or executing simple processes.

• Symptoms – A Blob is a controller class, abnormally

large, with almost no parents and no

children. It mainly uses data classes,

i.e. very small classes with almost no

parents and no children (Brown et al.

’98).

Context. Problem statement. Multi-Objective Refactoring. Validation. Conclusion.

Main Controller Class

Start

Stop

Initialize

Set_Mode

Login

Set_Status

Begin_Review_A

Refresh_Controle

Load

Do_This

Do_That

Do_The_Other

Etc

,,,

Data_List_Pointer

Status

Mode

User

Group

Date_Time

ACL

Module_Button_1

Module_Button_2

Module_Button_3

Iterator_1

Iterator_2

Contrôle

Console

Etc

,,,

Records

Data_1

Curr_Fmt

Figures

Error_Set

Users

Table_1

Value_Nmc

Rows

Cols

Buffer

,,,

Image_1

X_Pos

Y_Pos

Loc_Ptr

Buffer

,,,

Image_1

X_Pos

Y_Pos

Loc_Ptr

Buffer

,,,

Image_1

X_Pos

Y_Pos

Loc_Ptr

Buffer

,,,