WCRE 1999 / 2009
Experiments with clustering as a software remodularization method
Nicolas Anquetil, Timothy C. Lethbridge
Forewarning
Nicolas: After this research I became suspicious of the usefulness of clustering for remodularization.
I still am.
You have been warned
(although note that Tim has a less gloomy view)
Agenda
Background of the research
Overview of the paper
From then until now
And now what?
An analogy
Another analogy
Background of the research
Context:
KBRE group, U. of Ottawa, Canada
CSER project (Consortium for Software Engineering Research)
Pairs: university/company (U. of Ottawa/telecom company)
Focus on real problems and/or real situations
Background of the research
The project: one company's PBX
2+ MLOC
2+ K files
10+ possible configurations
10+ years old (in 1999)
2 proprietary languages
1 directory
0 packages
Background of the research
Company situation:
High turnover (18 months)
High entry barrier (6+ months to be productive)
Aging software (and languages)
Configuration management difficulties
Overview of the paper
"providing solutions to help software engineers understand, restructure or migrate old software towards more modern architecture and/or languages"
Overview of the paper
Possible solution:
"Clustering is used to gather software components into modules significant to the software engineers."
Overview of the paper
Seminal paper by Theo Wiggerts, "Using Clustering Algorithms in Legacy Systems Remodularization", WCRE'97
Summary of the literature on clustering
Lists all the possible choices
Lists some advantages and drawbacks of these choices
Overview of the paper
"Clustering is a sophisticated research domain with many methods [...] Reverse engineering is a young domain [...] Clustering has been used with no deep understanding of all the issues involved."
Overview of the paper
"Conclusions of Wiggerts' paper are those of the literature, which may not entirely hold for reverse engineering."
Overview of the paper
For example:
Living things naturally fit in an evolution tree (more or less)
Not so with software modularization
This must impact the tools we use and how we use them
Overview of the paper
Three issues:
What clustering algorithms to use?
How to compute cohesion?
How to describe entities?
How to evaluate the results?
Overview of the paper
Algorithms
We mainly tested hierarchical agglomerative algorithms
Some tests with hill-climbing algorithms ("Bunch" tool: Mancoridis)
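A minimal sketch of a hierarchical agglomerative algorithm, assuming each file is described by a set of features and that similarity is the Jaccard coefficient. Complete linkage and the `threshold` stopping rule are illustrative choices, not necessarily the configuration used in the experiments; the function names, file names and features are hypothetical.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Similarity that ignores features absent from both entities."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def agglomerative(features: dict, threshold: float) -> list:
    """Complete-linkage hierarchical agglomerative clustering.

    Starts with one cluster per entity and repeatedly merges the two most
    similar clusters, stopping when no pair reaches `threshold`.
    """
    clusters = [{name} for name in sorted(features)]

    def linkage(c1, c2):
        # Complete linkage: similarity of the least similar cross-cluster pair.
        return min(jaccard(features[x], features[y]) for x in c1 for y in c2)

    while len(clusters) > 1:
        (i, j), best = max(
            (((p, q), linkage(clusters[p], clusters[q]))
             for p, q in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        clusters[i] |= clusters[j]  # i < j, so deleting j leaves i in place
        del clusters[j]
    return clusters


# Hypothetical files, each described by the identifiers it uses.
files = {
    "call_setup.c": {"dial", "ring", "line"},
    "call_teardown.c": {"hangup", "ring", "line"},
    "billing.c": {"rate", "invoice"},
    "report.c": {"invoice", "print"},
}
print(agglomerative(files, threshold=0.3))
```

With a threshold of 0.3 the two call-handling files and the two billing files end up in separate clusters. Lowering the threshold to 0.0 merges everything into one cluster: the algorithm always imposes some structure, whatever the input.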
Overview of the paper
Entities
We clustered files (into packages)
Description: elements contained in the files (types, variables, routines, macros, comments, identifiers)
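As a sketch of how a file could be described by the elements it contains, assume a hypothetical cross-reference table listing, for each file, the names it refers to, grouped by kind. Each description scheme keeps one or more kinds, so schemes can also be combined (routine + variable + type). The `xref` data and `describe` function are illustrative, not the tooling used in the paper.

```python
# Hypothetical cross-reference data: for each file, the names it refers to,
# grouped by kind of element.
xref = {
    "call_setup.c": {"routine": {"dial", "ring"}, "variable": {"line_state"}, "type": {"Line"}},
    "call_teardown.c": {"routine": {"hangup", "ring"}, "variable": {"line_state"}, "type": {"Line"}},
    "billing.c": {"routine": {"rate"}, "variable": {"tariff"}, "type": {"Invoice"}},
}


def describe(entry, kinds):
    """Describe a file as a set of (kind, name) features, keeping only the
    requested kinds. Tagging each name with its kind keeps, e.g., a routine
    and a variable that happen to share a name from counting as one feature."""
    return {(kind, name) for kind in kinds for name in entry.get(kind, set())}


routines_only = describe(xref["call_setup.c"], ["routine"])
combined = describe(xref["call_setup.c"], ["routine", "variable", "type"])
print(len(routines_only), len(combined))  # prints: 2 4
```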
Overview of the paper
Reminder:
"Clustering algorithms do not discover some hidden structure in a system, but impose a structure on the set of entities they are given."
Overview of the paper
Some results
Redundancies among description schemes:
File, routine, variable, macro, type
Comments, identifiers
Overview of the paper
Some results
Combining features (routine + variable + ...) improves the results
Overview of the paper
Some results
Direct vs. sibling links: sibling links are more commonly used and give better results
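The difference between the two kinds of link can be sketched as follows, assuming a hypothetical `refers_to` table recording which files each file refers to (for instance through includes or calls). A direct link relates two files only when one refers to the other; a sibling link relates them when they refer to something in common, which is what a feature-based description captures.

```python
# Hypothetical "refers to" relation between files (e.g. includes or calls).
refers_to = {
    "call_setup.c": {"line.h", "tones.c"},
    "call_teardown.c": {"line.h"},
    "tones.c": set(),
}


def direct_link(a, b):
    """Direct link: one of the two files refers to the other."""
    return b in refers_to.get(a, set()) or a in refers_to.get(b, set())


def sibling_link(a, b):
    """Sibling link: the two files refer to at least one common entity."""
    return bool(refers_to.get(a, set()) & refers_to.get(b, set()))


print(direct_link("call_setup.c", "call_teardown.c"))   # False
print(sibling_link("call_setup.c", "call_teardown.c"))  # True: both use line.h
```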
Overview of the paper
Some results
Avoid "sparse" descriptive features
Avoid similarity metrics that consider the absence of a feature as significant
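Why absence matters can be shown with a small sketch comparing the simple matching coefficient, which counts shared absence as agreement, with the Jaccard coefficient, which ignores it. The 1000-identifier vocabulary and the two files are hypothetical; with sparse descriptions, two files that share nothing still look almost identical under simple matching.

```python
def simple_matching(a, b, universe):
    """Fraction of features on which a and b agree, counting joint absence as agreement."""
    agree = sum(1 for f in universe if (f in a) == (f in b))
    return agree / len(universe)


def jaccard(a, b):
    """Shared features over features present in either; joint absence is ignored."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical vocabulary of 1000 identifiers; each file uses only three of them.
universe = {f"id{i}" for i in range(1000)}
file_a = {"id1", "id2", "id3"}
file_b = {"id7", "id8", "id9"}

print(simple_matching(file_a, file_b, universe))  # 0.994
print(jaccard(file_a, file_b))                     # 0.0
```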
From then until now
Raw numbers
What extensions?
From then until now
References (volume)
[Chart: references to the paper per year, 1999 to 2009 (axis scale 0 to 18)]
[data from Google Scholar]
From then until now
References (authors)
P. Tonella (8), F. Ricca (7), C. Girardi (5), E. Pianta (5)
O. Maqbool (7), H.A. Babri (6), C. Tjortjis (5), N. Anquetil (5), S. Ducasse (5), K. Sartipi (4)
[data from Google Scholar]
From then until now
References (venue)
Thesis = 11
CSMR = 6
IWPC = 6
WCRE = 5
J. Soft. Maint. Evol. = 4
J. Syst. Soft. = 4
ICSM = 3
ICSE = 2
Trans. Syst. Eng. = 2
[data from Google Scholar]
From then until now
Some extensions
Clustering, how?
New/improved algorithms
New/improved distance metrics
Clustering what?
New entities (and/or descriptions)
Clustering, why?
Other extensions
From then until now
New algorithms
Genetic algorithm [Mahdavi]
"Combined algorithm" [Saeed, Maqbool, Babri, Hassan, Sarwar]
From then until now
New distance metric
Minimization of information loss [Andritsos, Tzerpos]
From then until now
New entities
Static web pages [Di Lucca, Fasolino, Tramontana], [Tonella, Ricca, Pianta, Girardi]
Association rules [Maqbool, Babri]
Data vs. control [Davey, Burd], [Sartipi, Kontogiannis]
Dynamic data [Stroulia, Systä]
Co-change records
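One way co-change records could serve as an entity description, sketched under the assumption of a hypothetical `commits` mapping from commit ids to the files each commit modified: describe each file by the set of commits that touched it. Files that often change together then share many features and can be fed to the same similarity metrics and algorithms as static descriptions.

```python
from collections import defaultdict

# Hypothetical change history: commit id -> files modified by that commit.
commits = {
    "c1": {"call_setup.c", "call_teardown.c"},
    "c2": {"call_setup.c", "call_teardown.c", "line.h"},
    "c3": {"billing.c", "report.c"},
}


def co_change_description(commits):
    """Describe each file by the set of commits that modified it."""
    description = defaultdict(set)
    for commit_id, changed in commits.items():
        for path in changed:
            description[path].add(commit_id)
    return dict(description)


desc = co_change_description(commits)
print(sorted(desc["call_setup.c"]))  # ['c1', 'c2']
```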
From then until now
Other extensions
Evaluations / comparisons [Tonella], [Wu, Holt], [Parsa, Bushehrian]
Framework
From then until now
Other extensions
Needs of maintainers? [Tjortjis, Layzell]
Input for visualization tools [Ducasse]
Naming clusters [Tzerpos], [Maqbool, Babri]
And now what?
Back to the paper's results
Wild ideas in clustering
Related topics
And now what?
Paper's results
Choice of (traditional) algorithm matters little
It will give a result
Not significantly better or worse than the others
And now what?
Paper's results
Choice of similarity metric matters little
As long as it does not consider the absence of a feature as a sign of similarity
And now what?
Paper's results
Choice of description scheme for entities matters a bit more
May be a source of short-term progress? Using dynamic information?
And now what?
Wild ideas
Consider new entities? Individual instructions? Non-code: requirements, model elements, tests, …?
Process-wise modularization? Clustering requirements, model elements, ...
And now what?
Related topics
Problem without solution?
Software modularization is highly subjective
Packages are not mutually exclusive
Decisions must be made that are always wrong (and always correct)
And now what?
Related topics
Modularization is a logical (virtual) decomposition based on semantics
High cohesion and low coupling may only be an (imperfect) by-product of a pre-chosen modularization
Cohesion/coupling not a driving force but a secondary goal?
Other forces, e.g. packages of "comparable" sizes
And now what?
Related topics
Typical example: utility package
Low cohesion, high coupling
java.util: BitSet, Calendar, Currency, Dictionary, EventListenerProxy, Formatter, Observable, Random, ResourceBundle, Scanner, UUID, TimeZone, ...
And now what?
Related topics
How to evaluate results? (open question in the paper)
Cohesion/coupling: normally useless, because it is the function optimized by the algorithms
Gold standard:
Manually: expensive, not precise
Automatically: biased
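Comparing a clustering against a gold standard can be sketched with precision and recall computed on intra-cluster pairs (pairs of entities placed in the same cluster). This is one common choice among several comparison measures, assumed here for illustration; the partitions are hypothetical, and scoring an empty pair set as 1.0 is a convention of this sketch.

```python
from itertools import combinations


def intra_pairs(partition):
    """All unordered pairs of entities that share a cluster."""
    return {
        frozenset(pair)
        for cluster in partition
        for pair in combinations(sorted(cluster), 2)
    }


def precision_recall(candidate, gold):
    """Compare a clustering against a gold-standard partition on intra-cluster pairs."""
    found, expected = intra_pairs(candidate), intra_pairs(gold)
    common = found & expected
    # Convention assumed here: with no pairs to judge, the score is 1.0.
    precision = len(common) / len(found) if found else 1.0
    recall = len(common) / len(expected) if expected else 1.0
    return precision, recall


gold = [{"a", "b", "c"}, {"d", "e"}]       # hypothetical expert decomposition
candidate = [{"a", "b"}, {"c", "d", "e"}]  # hypothetical clustering result
print(precision_recall(candidate, gold))   # (0.5, 0.5)
```

The gold standard itself carries the limits listed above: an expert decomposition is costly and imprecise, and one derived automatically inherits the bias of whatever produced it.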
And now what?
Related topics
How to evaluate results? Other metrics, e.g. stability, non-extremity [Wu]
And now what?
Paper's results
"The fact that all six algorithms are ranked low on authoritativeness suggests that they may not be mature enough for use in production on large systems undergoing evolutionary change. However ..."
[Wu, Holt, 2005]
An analogy
A short story of Belo Horizonte:
In 1893 a new capital is planned in the state of Minas Gerais (Brazil)
The architects/urbanists draw inspiration from Washington D.C.
An analogy
The initial architecture:
Planned Belo Horizonte
An analogy
The city grew (2.5 Mhab., area=5.1 Mh.)
An analogy
The city grew (2.5 Mhab.)
An analogy
Could we remodularize that?
An analogy
Analogy with software clustering:
The initial architecture is completely lost in the overall city
Regularities would only allow us to find small "clusters"
There are large "empty" parts that are difficult to cluster (automatically)
A division into districts would necessarily be subjective
Another analogy
You are a 21-year-old leaving university
You buy a large house because you have a good job
You are not well organized
You have a general concept that "food goes in the kitchen and clothes go in the bedroom"
But much of your stuff is strewn around
Another analogy
Initially you do not have many things, so the disorganization doesn't matter
After a while, you accumulate very many worldly goods
You constantly can't find things
Your new partner starts complaining
Another analogy
You realize it is time to organize things better
You are a computer scientist, so you want to apply a clustering algorithm
Another analogy
But what criteria to use?
Things made in the same country go together?
– Oops, the 'China' cluster is too big
Temporal cohesion?
Things used in the morning in one place, things used in the evening in another place?
– Where does 'toothbrush' go?
Another analogy
Functional cohesion
Everything for each recipe I make is kept together
But utilities (things used commonly) are separately organized as a cluster
Too awkward
Another analogy
In the end, your approach is pragmatic:
1. You decide from general experience on a set of general categories and storage locations
2. You spend a weekend moving things into these locations (yes, there are thousands of things)
Another analogy
3. As you proceed, you notice:
Some things do not fit in any category
Some categories are not so well chosen
Some categories overlap
4. You refactor the categories a bit and move things around
How can this be applied to software?
Use a clustering tool mainly to give you a sense of the possibilities
Combine it with other RE tools to learn about the functionality of each module as well as other properties
But also apply general wisdom about good software design
How can this be applied to software?
Play with the parameters of the clustering tool and other RE tools, refactoring until you have achieved a remodularization that you understand
Ideally, tools would allow instant adjustment with good visualization
Retain documents describing the resulting design