![Page 1: Evolution in Open Source Software: A Case Study Michael W. Godfrey Qiang Tu [paper in ICSM 2000] Software Architecture Group University of Waterloo.](https://reader035.fdocuments.in/reader035/viewer/2022062803/56649f1c5503460f94c3248a/html5/thumbnails/1.jpg)
Evolution in Open Source Software: A Case Study
Michael W. Godfrey Qiang Tu
[paper in ICSM 2000]
Software Architecture Group University of Waterloo
![Page 2: Evolution in Open Source Software: A Case Study Michael W. Godfrey Qiang Tu [paper in ICSM 2000] Software Architecture Group University of Waterloo.](https://reader035.fdocuments.in/reader035/viewer/2022062803/56649f1c5503460f94c3248a/html5/thumbnails/2.jpg)
What is software evolution?
“Evolution is what happens while you’re busy
making other plans.”
Usually, we consider evolution to begin once the first version has been delivered:
Maintenance is the planned set of tasks to effect changes.
Evolution is what actually happens to the software.
![Page 3: Evolution in Open Source Software: A Case Study Michael W. Godfrey Qiang Tu [paper in ICSM 2000] Software Architecture Group University of Waterloo.](https://reader035.fdocuments.in/reader035/viewer/2022062803/56649f1c5503460f94c3248a/html5/thumbnails/3.jpg)
Previous research
- Lehman’s laws
- Parnas on software geriatrics
- Eick et al. on code decay (10 MLOC telecom)
- Gall et al. (10 MLOC telecom)
- Munro, Burd et al. (2 MLOC gcc)
![Page 4: Evolution in Open Source Software: A Case Study Michael W. Godfrey Qiang Tu [paper in ICSM 2000] Software Architecture Group University of Waterloo.](https://reader035.fdocuments.in/reader035/viewer/2022062803/56649f1c5503460f94c3248a/html5/thumbnails/4.jpg)
Lehman’s Laws in a nutshell
Observations:
- (Most) useful software must evolve or die.
- As a software system gets bigger, its resulting complexity tends to limit its ability to grow.
- Development progress/effort is (more or less) constant; growth is at best constant.
Advice:
- Need to manage complexity.
- Do periodic redesigns.
- Treat software and its development process as a feedback system (and not as a passive theorem).
![Page 5: Evolution in Open Source Software: A Case Study Michael W. Godfrey Qiang Tu [paper in ICSM 2000] Software Architecture Group University of Waterloo.](https://reader035.fdocuments.in/reader035/viewer/2022062803/56649f1c5503460f94c3248a/html5/thumbnails/5.jpg)
Lehman’s examples
![Page 6: Evolution in Open Source Software: A Case Study Michael W. Godfrey Qiang Tu [paper in ICSM 2000] Software Architecture Group University of Waterloo.](https://reader035.fdocuments.in/reader035/viewer/2022062803/56649f1c5503460f94c3248a/html5/thumbnails/6.jpg)
A case study in evolution: The Linux OS kernel
![Page 7: Evolution in Open Source Software: A Case Study Michael W. Godfrey Qiang Tu [paper in ICSM 2000] Software Architecture Group University of Waterloo.](https://reader035.fdocuments.in/reader035/viewer/2022062803/56649f1c5503460f94c3248a/html5/thumbnails/7.jpg)
A case study in evolution: The Linux OS kernel
- It’s Linux!
- Large system, very stable, many releases over several years, many developers
- Growing mainstream adoption
- Open source development model
- Interesting phenomenon in itself
- Easy to track, can publish results, many experts
- Not much previous study
![Page 8: Evolution in Open Source Software: A Case Study Michael W. Godfrey Qiang Tu [paper in ICSM 2000] Software Architecture Group University of Waterloo.](https://reader035.fdocuments.in/reader035/viewer/2022062803/56649f1c5503460f94c3248a/html5/thumbnails/8.jpg)
Methodology
- Examined 96 versions of Linux kernel
  - 34 of the 67 stable releases
  - 62 of the 369 development releases
- All measures considered only .c/.h files contained in the tarball
- Counted LOC using "wc -l" and an awk script that ignored comments and blank lines
- Counted # of fcns/vars/macros using ctags
- Architectural model (SSs hierarchy) based on default directory structure
- We plotted growth against calendar time
  - Lehman suggests plotting growth against release number
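The two LOC measures described above can be sketched in a few lines of Python. This is a minimal illustration, not the awk script used in the study: the function names are hypothetical, and the comment stripper is a simplification that does not handle comment markers inside string literals.

```python
import re

# Matches C block comments (/* ... */) and C++-style line comments (// ...).
# Simplification: comment markers inside string literals are not handled.
_COMMENT_RE = re.compile(r"/\*.*?\*/|//[^\n]*", re.DOTALL)


def total_loc(source):
    """Count physical lines, like `wc -l` on text that ends with a newline."""
    return source.count("\n")


def uncommented_loc(source):
    """Count lines that still contain text once comments are stripped."""
    def keep_newlines(match):
        # Preserve line breaks so code after a multi-line comment keeps its own line.
        return "\n" * match.group(0).count("\n")

    stripped = _COMMENT_RE.sub(keep_newlines, source)
    return sum(1 for line in stripped.splitlines() if line.strip())


# Hypothetical sample file, not taken from the kernel.
sample = """/* header comment
   spanning two lines */
#include <stdio.h>

int main(void) {
    return 0; /* done */
}
"""
print(total_loc(sample), uncommented_loc(sample))
```

Applying both functions to every .c/.h file in a release tarball and summing would give the two totals per release.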
![Page 9: Evolution in Open Source Software: A Case Study Michael W. Godfrey Qiang Tu [paper in ICSM 2000] Software Architecture Group University of Waterloo.](https://reader035.fdocuments.in/reader035/viewer/2022062803/56649f1c5503460f94c3248a/html5/thumbnails/9.jpg)
Growth of # of source files
[Chart: # of source code files (*.[ch]), 0 to 6000, plotted over Jan 1993 to Apr 2001. Series: development releases (1.1, 1.3, 2.1, 2.3) and stable releases (1.0, 1.2, 2.0, 2.2).]
![Page 10: Evolution in Open Source Software: A Case Study Michael W. Godfrey Qiang Tu [paper in ICSM 2000] Software Architecture Group University of Waterloo.](https://reader035.fdocuments.in/reader035/viewer/2022062803/56649f1c5503460f94c3248a/html5/thumbnails/10.jpg)
Growth of # of global fcns, variables, and macros
[Chart: # of global fcns, variables, and macros, 0 to 140,000, plotted over Jan 1993 to Apr 2001. Series: development releases (1.1, 1.3, 2.1, 2.3) and stable releases (1.0, 1.2, 2.0, 2.2).]
![Page 11: Evolution in Open Source Software: A Case Study Michael W. Godfrey Qiang Tu [paper in ICSM 2000] Software Architecture Group University of Waterloo.](https://reader035.fdocuments.in/reader035/viewer/2022062803/56649f1c5503460f94c3248a/html5/thumbnails/11.jpg)
Growth of Lines of Code (LOC)
[Chart: total LOC, 0 to 2,500,000, plotted over Jan 1993 to Apr 2001. Series: total LOC ("wc -l") and total uncommented LOC, each for development and stable releases.]
![Page 12: Evolution in Open Source Software: A Case Study Michael W. Godfrey Qiang Tu [paper in ICSM 2000] Software Architecture Group University of Waterloo.](https://reader035.fdocuments.in/reader035/viewer/2022062803/56649f1c5503460f94c3248a/html5/thumbnails/12.jpg)
Average/median .c file size
[Chart: uncommented LOC per .c file, 0 to 700, plotted over Jan 1993 to Apr 2001. Series: average and median .c file size, each for development and stable releases.]
![Page 13: Evolution in Open Source Software: A Case Study Michael W. Godfrey Qiang Tu [paper in ICSM 2000] Software Architecture Group University of Waterloo.](https://reader035.fdocuments.in/reader035/viewer/2022062803/56649f1c5503460f94c3248a/html5/thumbnails/13.jpg)
Average/median .h file size
[Chart: uncommented LOC per .h file, 0 to 140, plotted over Jan 1993 to Apr 2001. Series: average and median .h file size, each for development and stable releases.]
![Page 14: Evolution in Open Source Software: A Case Study Michael W. Godfrey Qiang Tu [paper in ICSM 2000] Software Architecture Group University of Waterloo.](https://reader035.fdocuments.in/reader035/viewer/2022062803/56649f1c5503460f94c3248a/html5/thumbnails/14.jpg)
Growth of major SSs (dev. releases)
[Chart: total uncommented LOC, 0 to 1,200,000, plotted over Jan 1993 to Apr 2001. Series: drivers, arch, include, net, fs, kernel, mm, ipc, lib, init.]
![Page 15: Evolution in Open Source Software: A Case Study Michael W. Godfrey Qiang Tu [paper in ICSM 2000] Software Architecture Group University of Waterloo.](https://reader035.fdocuments.in/reader035/viewer/2022062803/56649f1c5503460f94c3248a/html5/thumbnails/15.jpg)
SS LOC as percentage of total system
[Chart: percentage of total system uncommented LOC, 0.0 to 70.0, plotted over Jan 1993 to Apr 2001. Series: drivers, arch, include, net, fs, kernel, mm, ipc, lib, init.]
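Since the architectural model follows the default directory structure, each subsystem's share of the system can be computed by grouping per-file LOC by top-level directory. The sketch below assumes a hypothetical mapping from relative path to uncommented LOC; the function name and the sample numbers are illustrative only and are not data from the study.

```python
from collections import defaultdict


def subsystem_shares(file_loc):
    """Map each top-level directory to its percentage of total LOC."""
    totals = defaultdict(int)
    for path, loc in file_loc.items():
        # The first path component names the subsystem, e.g. "drivers/net/a.c" -> "drivers".
        subsystem = path.split("/", 1)[0]
        totals[subsystem] += loc
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {name: 100.0 * loc / grand_total for name, loc in totals.items()}


# Hypothetical per-file counts, not measurements of any kernel release.
example = {
    "drivers/net/a.c": 600,
    "drivers/scsi/b.c": 200,
    "fs/c.c": 150,
    "kernel/d.c": 50,
}
shares = subsystem_shares(example)
for name in sorted(shares):
    print(name, round(shares[name], 1))
```

Dropping the "drivers" entries from the input before calling the function gives the "ignoring drivers" view.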
![Page 16: Evolution in Open Source Software: A Case Study Michael W. Godfrey Qiang Tu [paper in ICSM 2000] Software Architecture Group University of Waterloo.](https://reader035.fdocuments.in/reader035/viewer/2022062803/56649f1c5503460f94c3248a/html5/thumbnails/16.jpg)
SS LOC as percentage of total system (ignoring drivers)
[Chart: percentage of total system uncommented LOC, 0.0 to 30.0, plotted over Jan 1993 to Apr 2001. Series: arch, include, net, fs, kernel, mm, ipc, lib, init.]
![Page 17: Evolution in Open Source Software: A Case Study Michael W. Godfrey Qiang Tu [paper in ICSM 2000] Software Architecture Group University of Waterloo.](https://reader035.fdocuments.in/reader035/viewer/2022062803/56649f1c5503460f94c3248a/html5/thumbnails/17.jpg)
Growth of small core SSs
[Chart: total uncommented LOC, 0 to 9000, plotted over Jan 1993 to Apr 2001. Series: kernel, mm, ipc, lib, init.]
![Page 18: Evolution in Open Source Software: A Case Study Michael W. Godfrey Qiang Tu [paper in ICSM 2000] Software Architecture Group University of Waterloo.](https://reader035.fdocuments.in/reader035/viewer/2022062803/56649f1c5503460f94c3248a/html5/thumbnails/18.jpg)
Growth of arch SSs
[Chart: total uncommented LOC, 0 to 40,000, plotted over Jan 1993 to Apr 2001. Series: arch/ppc/, arch/sparc/, arch/sparc64/, arch/m68k/, arch/mips/, arch/i386/, arch/alpha/, arch/arm/, arch/sh/, arch/s390/.]
![Page 19: Evolution in Open Source Software: A Case Study Michael W. Godfrey Qiang Tu [paper in ICSM 2000] Software Architecture Group University of Waterloo.](https://reader035.fdocuments.in/reader035/viewer/2022062803/56649f1c5503460f94c3248a/html5/thumbnails/19.jpg)
Growth of drivers SSs
[Chart: total uncommented LOC, 0 to 300,000, plotted over Jan 1993 to Apr 2001. Series: drivers/net, drivers/scsi, drivers/char, drivers/video, drivers/isdn, drivers/sound, drivers/acorn, drivers/block, drivers/cdrom, drivers/usb, drivers/"others".]
![Page 20: Evolution in Open Source Software: A Case Study Michael W. Godfrey Qiang Tu [paper in ICSM 2000] Software Architecture Group University of Waterloo.](https://reader035.fdocuments.in/reader035/viewer/2022062803/56649f1c5503460f94c3248a/html5/thumbnails/20.jpg)
Observations and hypotheses
Growth along devel. path is super-linear
- y = .21*x^2 + 252*x + 90,055, r2 = .997
- y = size in LOC, x = days since v1.0
- r2 is the “coefficient of determination” using least squares
- [Lehman/Turski’s model: y’ = y + E/y^2, i.e., y grows roughly as (3Ex)^(1/3)]
Linux’s strong growth is continuing.
This is stronger growth at MLOC level than observed by others (Lehman, Gall), even for other OSs.
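To make the fitted curve concrete, the sketch below evaluates the quadratic model with the coefficients quoted on this slide and shows how a coefficient of determination is computed. The function names are hypothetical, and the study's raw data points are not reproduced here, so `r_squared` is shown only as a general formula.

```python
def linux_loc(days):
    """Size in LOC predicted by the fitted quadratic, x = days since v1.0."""
    return 0.21 * days**2 + 252 * days + 90_055


def linux_growth_rate(days):
    """Derivative of the fitted curve: predicted LOC added per day."""
    return 2 * 0.21 * days + 252


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - (residual SS / total SS)."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


for days in (0, 1000, 2000):
    print(days, round(linux_loc(days)), round(linux_growth_rate(days)))
```

Because the derivative increases with x, the predicted daily growth keeps rising, which is what "super-linear" means here.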
![Page 21: Evolution in Open Source Software: A Case Study Michael W. Godfrey Qiang Tu [paper in ICSM 2000] Software Architecture Group University of Waterloo.](https://reader035.fdocuments.in/reader035/viewer/2022062803/56649f1c5503460f94c3248a/html5/thumbnails/21.jpg)
Why has Linux been able to continue its geometric growth?
- Core code quality is carefully maintained
- Architecture/problem domain
  - It’s largely drivers
  - Much of the code is “parallel”
  - It’s not as big as you might think
  - Vanilla configuration used only 15% of files
- Development model (OSD) and its sociology
  - Popularity and visibility have encouraged outsiders (both hackers and industry) to contribute
![Page 22: Evolution in Open Source Software: A Case Study Michael W. Godfrey Qiang Tu [paper in ICSM 2000] Software Architecture Group University of Waterloo.](https://reader035.fdocuments.in/reader035/viewer/2022062803/56649f1c5503460f94c3248a/html5/thumbnails/22.jpg)
Growth of pine (email client)
[Chart: # of modules, 0 to 350, plotted over Jan-93 to Apr-01.]
![Page 23: Evolution in Open Source Software: A Case Study Michael W. Godfrey Qiang Tu [paper in ICSM 2000] Software Architecture Group University of Waterloo.](https://reader035.fdocuments.in/reader035/viewer/2022062803/56649f1c5503460f94c3248a/html5/thumbnails/23.jpg)
Growth of gcc/g++/egcs
[Chart: # of modules, 0 to 1000, plotted over Aug-87 to Apr-01. Series: g++, gcc, egcs.]
![Page 24: Evolution in Open Source Software: A Case Study Michael W. Godfrey Qiang Tu [paper in ICSM 2000] Software Architecture Group University of Waterloo.](https://reader035.fdocuments.in/reader035/viewer/2022062803/56649f1c5503460f94c3248a/html5/thumbnails/24.jpg)
Growth of vim (text editor)
[Chart: total LOC, 0 to 160,000, plotted over May 1990 to Apr 2001. Series: total LOC ("wc -l") and total LOC ignoring comments and blank lines.]
![Page 25: Evolution in Open Source Software: A Case Study Michael W. Godfrey Qiang Tu [paper in ICSM 2000] Software Architecture Group University of Waterloo.](https://reader035.fdocuments.in/reader035/viewer/2022062803/56649f1c5503460f94c3248a/html5/thumbnails/25.jpg)
vim avg % comments and blank lines per file
[Chart: average percent comments + blank lines per file, 25.0 to 31.0, plotted over May 1990 to Apr 2001.]
![Page 26: Evolution in Open Source Software: A Case Study Michael W. Godfrey Qiang Tu [paper in ICSM 2000] Software Architecture Group University of Waterloo.](https://reader035.fdocuments.in/reader035/viewer/2022062803/56649f1c5503460f94c3248a/html5/thumbnails/26.jpg)
vim avg/median file size
[Chart: uncommented LOC, 0 to 1000, plotted over May 1990 to Apr 2001. Series: average and median uncommented LOC per source file.]
![Page 27: Evolution in Open Source Software: A Case Study Michael W. Godfrey Qiang Tu [paper in ICSM 2000] Software Architecture Group University of Waterloo.](https://reader035.fdocuments.in/reader035/viewer/2022062803/56649f1c5503460f94c3248a/html5/thumbnails/27.jpg)
vim’s architecture
![Page 28: Evolution in Open Source Software: A Case Study Michael W. Godfrey Qiang Tu [paper in ICSM 2000] Software Architecture Group University of Waterloo.](https://reader035.fdocuments.in/reader035/viewer/2022062803/56649f1c5503460f94c3248a/html5/thumbnails/28.jpg)
Hypotheses
Factors affecting evolution include
- Size and age of system
- Use of traditional sw. eng. principles during development
PLUS
- Problem domain
- Problem complexity, multi-platform, multi-features
- Software architecture
- Process model
- Sociology, market forces, and acts-of-God
![Page 29: Evolution in Open Source Software: A Case Study Michael W. Godfrey Qiang Tu [paper in ICSM 2000] Software Architecture Group University of Waterloo.](https://reader035.fdocuments.in/reader035/viewer/2022062803/56649f1c5503460f94c3248a/html5/thumbnails/29.jpg)
Software evolution research: What next?
So far, we have examined only growth.
- More case studies needed
  - Qualitative and quantitative
  - Industrial and open source systems
  - Different problem domains, architectures
- Supporting tools to aid analysing, visualizing, and querying program evolution
  - More than just RCS and perl
  - Support for architecture repair
- Codified knowledge: Why and how does software change?
  - Build catalogue of change patterns and evolutionary narratives
![Page 30: Evolution in Open Source Software: A Case Study Michael W. Godfrey Qiang Tu [paper in ICSM 2000] Software Architecture Group University of Waterloo.](https://reader035.fdocuments.in/reader035/viewer/2022062803/56649f1c5503460f94c3248a/html5/thumbnails/30.jpg)
Codified knowledge
- Mature engineering disciplines codify knowledge and experience. Arguably, this is lacking in software engineering.
  - Software architecture styles [Shaw]
  - Design patterns [GoF]
- Codified knowledge of how and why programs evolve:
  - Evolutionary narratives [Godfrey]: long term, coarse granularity
  - Change patterns: short term, fine granularity
![Page 31: Evolution in Open Source Software: A Case Study Michael W. Godfrey Qiang Tu [paper in ICSM 2000] Software Architecture Group University of Waterloo.](https://reader035.fdocuments.in/reader035/viewer/2022062803/56649f1c5503460f94c3248a/html5/thumbnails/31.jpg)
Change patterns and evolutionary narratives
- Cathedral style [Raymond]
  - careful control and management
  - debugging done before committing code
  - evolution is slow, planned, rarely undone
- Bazaar style (OSD)
  - lots of low-level changes, frequent fixes
  - lots of “building around” rather than wholesale changing, occasional redesigns
  - creeping feature-itis, “complete” dependency graph
![Page 32: Evolution in Open Source Software: A Case Study Michael W. Godfrey Qiang Tu [paper in ICSM 2000] Software Architecture Group University of Waterloo.](https://reader035.fdocuments.in/reader035/viewer/2022062803/56649f1c5503460f94c3248a/html5/thumbnails/32.jpg)
Change patterns and evolutionary narratives
- Band-aid evolution (just add a layer)
  - quick & dirty way to add new functionality, esp. if system is not well understood
  - e.g., Y2K fixing, adding portability, new features
- “Vestigial features”
  - design artifact persists after rationale dies
  - e.g., whale fin bone structure resembles hand
![Page 33: Evolution in Open Source Software: A Case Study Michael W. Godfrey Qiang Tu [paper in ICSM 2000] Software Architecture Group University of Waterloo.](https://reader035.fdocuments.in/reader035/viewer/2022062803/56649f1c5503460f94c3248a/html5/thumbnails/33.jpg)
Change patterns and evolutionary narratives
Phenomena observed in Linux evolution
- Bandwagon effect
- Contributed third party code
- “Mostly parallel” enables sustained growth
- Clone and hack
- Careful control of core code; more flexibility on contributed drivers, experimental features
![Page 34: Evolution in Open Source Software: A Case Study Michael W. Godfrey Qiang Tu [paper in ICSM 2000] Software Architecture Group University of Waterloo.](https://reader035.fdocuments.in/reader035/viewer/2022062803/56649f1c5503460f94c3248a/html5/thumbnails/34.jpg)
Defining, Transforming, and Exchanging High-Level Schemas
A guided journey through the outback
Presented by Michael W. Godfrey
Software Architecture Group (SWAG)
Dept of Comp Sci, Univ of Waterloo
This presentation is available from http://plg.uwaterloo.ca/~migod/papers/
![Page 35: Evolution in Open Source Software: A Case Study Michael W. Godfrey Qiang Tu [paper in ICSM 2000] Software Architecture Group University of Waterloo.](https://reader035.fdocuments.in/reader035/viewer/2022062803/56649f1c5503460f94c3248a/html5/thumbnails/35.jpg)
What is a High-Level Schema?
My answer: Any schema above the statement level
I see two distinct levels of abstraction:
1. Programming language entity level
– Entities are (shared) fcns, vars, types, classes, …
2. Architectural level
– Entities are modules, subsystems, classes, interfaces, …
![Page 36: Evolution in Open Source Software: A Case Study Michael W. Godfrey Qiang Tu [paper in ICSM 2000] Software Architecture Group University of Waterloo.](https://reader035.fdocuments.in/reader035/viewer/2022062803/56649f1c5503460f94c3248a/html5/thumbnails/36.jpg)
Previous Work
- Lots of motivational work
  - ad hoc extractor snarfing
  - experimental translation mechanisms
- Examples (many others exist)
  - CORUM I and II
  - GRAX
  - TAXForm (TA eXchange FORMat) using Acacia, Rigiparse
  - Rigi using VisualAge C++
  - Dali using Sniff+
![Page 37: Evolution in Open Source Software: A Case Study Michael W. Godfrey Qiang Tu [paper in ICSM 2000] Software Architecture Group University of Waterloo.](https://reader035.fdocuments.in/reader035/viewer/2022062803/56649f1c5503460f94c3248a/html5/thumbnails/37.jpg)
My (selfish) goals
- I would like to be able to use other extractors …
  - Want to perform architectural analyses of systems written in languages other than C
  - Want to implement BEAGLE (a tool for exploring software evolution)
- … but extractors differ in languages modelled, level of detail, robustness, bugs, data format, …
- I want to be able to convert data between tools.
- Need agreement (awareness) from tool creators
![Page 38: Evolution in Open Source Software: A Case Study Michael W. Godfrey Qiang Tu [paper in ICSM 2000] Software Architecture Group University of Waterloo.](https://reader035.fdocuments.in/reader035/viewer/2022062803/56649f1c5503460f94c3248a/html5/thumbnails/38.jpg)
TAXForm Utopia
[Diagram. Boxes: System Artifacts; PBS Extractor (cfx); Rigi Extractor (rigiparse); Dali Extractor (SNiFF+); TAXForm Repository; PBS Viewer and Abstraction Tools; Bunch Clustering Tool; Rigi SHriMP Viewer. Converters: cfx to TAXForm, Rigi to TAXForm, Dali to TAXForm, Bunch / TAXForm, TAXForm to Rigi.]
![Page 39: Evolution in Open Source Software: A Case Study Michael W. Godfrey Qiang Tu [paper in ICSM 2000] Software Architecture Group University of Waterloo.](https://reader035.fdocuments.in/reader035/viewer/2022062803/56649f1c5503460f94c3248a/html5/thumbnails/39.jpg)
Transforming Between Schemas
[Diagram: hierarchy of schemas. Labels: Universal; High-Level; Procedural (PL/I, C); Object-Oriented (C++, Java); Acacia C, Rigi C, PBS C.]
![Page 40: Evolution in Open Source Software: A Case Study Michael W. Godfrey Qiang Tu [paper in ICSM 2000] Software Architecture Group University of Waterloo.](https://reader035.fdocuments.in/reader035/viewer/2022062803/56649f1c5503460f94c3248a/html5/thumbnails/40.jpg)
TAXForm — Procedural schema
[Diagram: E/R schema. Entities: SourceFile, Data Type, Procedure, Data. Relations: usesfile, defines, usestype, usesdata, usesprocedure.]
![Page 41: Evolution in Open Source Software: A Case Study Michael W. Godfrey Qiang Tu [paper in ICSM 2000] Software Architecture Group University of Waterloo.](https://reader035.fdocuments.in/reader035/viewer/2022062803/56649f1c5503460f94c3248a/html5/thumbnails/41.jpg)
TAXForm — High level schema
[Diagram: E/R schema. Entities: Module, Subsystem. Relations: depends-on, contains (two edges).]
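One use of a high-level schema like this is lifting module-level dependencies to the subsystem level. The sketch below is a minimal illustration under stated assumptions: the function name, the dictionary/tuple encoding of the contains and depends-on relations, and the sample module names are all hypothetical, and only a single level of containment is handled (nested subsystems are not).

```python
def lift_dependencies(contains, depends_on):
    """Lift module-level depends-on edges to the subsystem level."""
    # Invert the containment relation: module -> containing subsystem.
    parent = {module: subsystem
              for subsystem, members in contains.items()
              for module in members}
    lifted = set()
    for source, target in depends_on:
        src_ss, dst_ss = parent.get(source), parent.get(target)
        # Skip modules with no known container and edges internal to one subsystem.
        if src_ss is None or dst_ss is None or src_ss == dst_ss:
            continue
        lifted.add((src_ss, dst_ss))
    return lifted


# Hypothetical facts, not extracted from a real system.
contains = {"fs": ["inode.c", "super.c"], "mm": ["vmalloc.c"]}
depends_on = [("inode.c", "super.c"), ("inode.c", "vmalloc.c")]
print(lift_dependencies(contains, depends_on))
```

Here the dependency inside fs disappears and only the cross-subsystem edge from fs to mm survives.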
![Page 42: Evolution in Open Source Software: A Case Study Michael W. Godfrey Qiang Tu [paper in ICSM 2000] Software Architecture Group University of Waterloo.](https://reader035.fdocuments.in/reader035/viewer/2022062803/56649f1c5503460f94c3248a/html5/thumbnails/42.jpg)
Back to my (selfish) goals
- Would like to concentrate on procedural and OO languages. Others are interested in COBOL, JCL etc.
- I am interested in high-level info (f calls g) but not in ASGs, code-level metrics
- Need to agree on
  - Syntax
  - Level of granularity and detail
  - What to do in case of X, e.g., X = “missing files”
![Page 43: Evolution in Open Source Software: A Case Study Michael W. Godfrey Qiang Tu [paper in ICSM 2000] Software Architecture Group University of Waterloo.](https://reader035.fdocuments.in/reader035/viewer/2022062803/56649f1c5503460f94c3248a/html5/thumbnails/43.jpg)
My schema wish list
[influenced by Acacia’s C and C++ data models]
- Top-level programming language entities:
  - functions, variables, constants, type definitions (procedural languages)
  - methods, class member data, static methods and member data (object-oriented languages)
- Entity containers:
  - files, modules, classes, packages
![Page 44: Evolution in Open Source Software: A Case Study Michael W. Godfrey Qiang Tu [paper in ICSM 2000] Software Architecture Group University of Waterloo.](https://reader035.fdocuments.in/reader035/viewer/2022062803/56649f1c5503460f94c3248a/html5/thumbnails/44.jpg)
My schema wish list
- Entity attributes:
  - Name, unique identifier (UID -- see next section)
  - UID of container, UID of containing file (if container is not a file)
  - Signature/data type
  - Line number information (see below)
  - Declared scope/visibility, static or not, final or not
  - Definition or declaration (see below)
- Entity container attributes:
  - name, UID
  - relative path (if a file)
  - version identifier (if provided)
  - UID of container (if not a file), UID of cont. file (if not a file)
![Page 45: Evolution in Open Source Software: A Case Study Michael W. Godfrey Qiang Tu [paper in ICSM 2000] Software Architecture Group University of Waterloo.](https://reader035.fdocuments.in/reader035/viewer/2022062803/56649f1c5503460f94c3248a/html5/thumbnails/45.jpg)
My schema wish list
- Relationships:
  - Function calls, variable uses
  - Line number information (see below)
  - Container use/inclusion (by other containers)
  - Inheritance (various kinds)
  - “Friendship”, various template relationships
- Relationship attributes:
  - Line number information (see below)
  - Scope/permission of inheritance
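The wish list above can be written down as a small data model. This is only a sketch of one possible encoding: the class names, field names, defaults, and the sample UIDs are assumptions made for illustration and do not correspond to any existing exchange format.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Container:
    uid: str
    name: str
    kind: str                               # "file", "module", "class", or "package"
    relative_path: Optional[str] = None     # only if the container is a file
    version: Optional[str] = None           # only if provided
    container_uid: Optional[str] = None     # enclosing container, if not a file
    file_uid: Optional[str] = None          # containing file, if not a file


@dataclass(frozen=True)
class Entity:
    uid: str
    name: str
    kind: str                               # "function", "variable", "method", ...
    container_uid: str
    file_uid: Optional[str] = None          # set when the container is not a file
    signature: Optional[str] = None         # signature or data type
    lines: Optional[Tuple[int, int]] = None # (first, last) line numbers
    visibility: Optional[str] = None        # declared scope/visibility
    is_static: bool = False
    is_final: bool = False
    is_definition: bool = True              # False for a declaration


@dataclass(frozen=True)
class Relationship:
    kind: str                               # "calls", "uses", "includes", "inherits", ...
    source_uid: str
    target_uid: str
    lines: Optional[Tuple[int, int]] = None
    inheritance_scope: Optional[str] = None # only meaningful for inheritance


# Hypothetical facts for illustration.
entry_c = Container(uid="c1", name="entry.c", kind="file", relative_path="entry.c")
close_fn = Entity(uid="e1", name="closeTagFile", kind="function",
                  container_uid=entry_c.uid, signature="void (const boolean)")
call = Relationship(kind="calls", source_uid="e2", target_uid=close_fn.uid, lines=(10, 10))
```

Freezing the dataclasses makes facts hashable, so they can be stored in sets and compared across extractor outputs.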
![Page 46: Evolution in Open Source Software: A Case Study Michael W. Godfrey Qiang Tu [paper in ICSM 2000] Software Architecture Group University of Waterloo.](https://reader035.fdocuments.in/reader035/viewer/2022062803/56649f1c5503460f94c3248a/html5/thumbnails/46.jpg)
Problems
Some technical problems:
- UID generation? (name-mangling?)
- Line numbering (ranges)?
- Incomplete information?
  - ill-formed code, gcc/K&R-isms
  - missing header files
  - resolving entity use to dfn/dcl (esp. with polymorphism, overloading)
- Pre or post preprocessing?
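As one possible answer to the UID question, an identifier can be derived by hashing the attributes that distinguish an entity. The sketch below is an assumption for illustration, not the scheme used by any of the tools discussed; the function name, the choice of key attributes, and the 8-digit length are all hypothetical.

```python
import hashlib


def make_uid(file_path, kind, name, signature=""):
    """Derive a short hex UID from the attributes that identify an entity."""
    # A NUL separator keeps ("ab", "c") and ("a", "bc") from colliding.
    key = "\x00".join((file_path, kind, name, signature))
    return hashlib.sha1(key.encode("utf-8")).hexdigest()[:8]


dec_uid = make_uid("entry.h", "function", "closeTagFile", "(const boolean)")
def_uid = make_uid("entry.c", "function", "closeTagFile", "(const boolean)")
print(dec_uid, def_uid)
```

Including the file path gives a declaration and its definition different UIDs, so a separate step is still needed to resolve uses to the right one; leaving the path out would merge them, at the cost of collisions between same-named static entities in different files.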
![Page 47: Evolution in Open Source Software: A Case Study Michael W. Godfrey Qiang Tu [paper in ICSM 2000] Software Architecture Group University of Waterloo.](https://reader035.fdocuments.in/reader035/viewer/2022062803/56649f1c5503460f94c3248a/html5/thumbnails/47.jpg)
Problems
We’ve had these conversations before …
“Getting academics to agree on anything is like herding cats.”
![Page 48: Evolution in Open Source Software: A Case Study Michael W. Godfrey Qiang Tu [paper in ICSM 2000] Software Architecture Group University of Waterloo.](https://reader035.fdocuments.in/reader035/viewer/2022062803/56649f1c5503460f94c3248a/html5/thumbnails/48.jpg)
Example Extractors/Systems
Included here:
PBS [UWloo]
Acacia [AT&T]
cxref, ctags, cscope
TA++ [UOttawa]
BAUHAUS [UStuttgart]
GUPRO [UKoblenz]
Others:
Rigi [UVictoria]
SPOOL [UMontréal]
Datrix [Bell Canada]
MOOSE [UBern]
SHORE [SD&M]
Neuhold [UVienna]
VisualAge C++ [IBM]
… [many others]
![Page 49: Evolution in Open Source Software: A Case Study Michael W. Godfrey Qiang Tu [paper in ICSM 2000] Software Architecture Group University of Waterloo.](https://reader035.fdocuments.in/reader035/viewer/2022062803/56649f1c5503460f94c3248a/html5/thumbnails/49.jpg)
Dimensions of Variation
- Intended use
  - Level of schema (entity level, architectural level, or mixed)
  - Amount of detail
- Languages modelled
  - Multi-lingual
  - Common super schemas
  - Explicit model “cross-overs” (e.g., JCL, embedded SQL)
- Hidden assumptions
- Known limitations
- Notation/approach to store factbase
- Support for translations and transformations
- What’s particularly novel and noteworthy
![Page 50: Evolution in Open Source Software: A Case Study Michael W. Godfrey Qiang Tu [paper in ICSM 2000] Software Architecture Group University of Waterloo.](https://reader035.fdocuments.in/reader035/viewer/2022062803/56649f1c5503460f94c3248a/html5/thumbnails/50.jpg)
PBS [Holt et al. @ UWaterloo]
Portable Bookshelf is a reverse engineering tool for creating software architecture models of large systems:
Guinea pigs: Mozilla, Linux, Apache, VIM, Mitel, TOBEY, …
Consists of fact extractor, fact manipulation engine (“grok”), and visualization tool (“landscape”)
[Diagram: source code -> cfx -> entity-level facts -> grok -> architectural facts -> landscape viewer]
![Page 51: Evolution in Open Source Software: A Case Study Michael W. Godfrey Qiang Tu [paper in ICSM 2000] Software Architecture Group University of Waterloo.](https://reader035.fdocuments.in/reader035/viewer/2022062803/56649f1c5503460f94c3248a/html5/thumbnails/51.jpg)
PBS C Language E/R View
![Page 52: Evolution in Open Source Software: A Case Study Michael W. Godfrey Qiang Tu [paper in ICSM 2000] Software Architecture Group University of Waterloo.](https://reader035.fdocuments.in/reader035/viewer/2022062803/56649f1c5503460f94c3248a/html5/thumbnails/52.jpg)
PBS Architectural Schema
![Page 53: Evolution in Open Source Software: A Case Study Michael W. Godfrey Qiang Tu [paper in ICSM 2000] Software Architecture Group University of Waterloo.](https://reader035.fdocuments.in/reader035/viewer/2022062803/56649f1c5503460f94c3248a/html5/thumbnails/53.jpg)
Acacia [Chen, Gansner et al. @ AT&T]
History: CIA -> CIAO -> Acacia
Consists of
- C and C++ extractors
- SQL-like query engine
- visualization with auto-layout
![Page 54: Evolution in Open Source Software: A Case Study Michael W. Godfrey Qiang Tu [paper in ICSM 2000] Software Architecture Group University of Waterloo.](https://reader035.fdocuments.in/reader035/viewer/2022062803/56649f1c5503460f94c3248a/html5/thumbnails/54.jpg)
Acacia C++/C Schemas
- Entity attributes: Hex UID, name, kind (file, function, type, var, macro), filename, datatype (string), typeclass (enum, struct, etc.), linenum info for def/dec, def/dec/undef, param list, template info, scope, storage spec (static, const, inline, inline virtual, etc.), signature
- Relationship attributes: Linenum info, rel. kind (refers, contains, inherits, instantiates, typedef, etc.), relationship scope
![Page 55: Evolution in Open Source Software: A Case Study Michael W. Godfrey Qiang Tu [paper in ICSM 2000] Software Architecture Group University of Waterloo.](https://reader035.fdocuments.in/reader035/viewer/2022062803/56649f1c5503460f94c3248a/html5/thumbnails/55.jpg)
Acacia Queries
SQL-like queries for entities and relationships produce “;”-delimited textual output:
% ksh cdef -u fu closeTagFile
26f53ece;closeTagFile;function;entry.h;void;regular;83;0;83;dec;00000000;(const boolean);;extern;;;;
76e7ae31;closeTagFile;function;entry.c;void;regular;551;553;563;def;00000000;(const boolean);;extern;;;;
% ksh cref -u - - m - file2='osdeps.h'
<all entity1 attrs> ; <all entity2 attrs> ; <rel attrs>
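A consumer of this output only needs to split each record on ";". The sketch below parses one of the cdef records quoted on this slide. The labels given to the first five fields follow the order of the entity attribute list on the schema slide, which is an assumption; the remaining fields are kept unlabelled rather than guessed, and the function name is hypothetical.

```python
# Labels for the leading fields; the mapping is assumed, not documented here.
LEADING_FIELDS = ("uid", "name", "kind", "file", "datatype")


def parse_record(line):
    """Split one ';'-delimited record into labelled leading fields plus the rest."""
    fields = line.rstrip("\n").split(";")
    record = dict(zip(LEADING_FIELDS, fields))
    record["rest"] = fields[len(LEADING_FIELDS):]
    return record


record = parse_record(
    "26f53ece;closeTagFile;function;entry.h;void;regular;83;0;83;dec;"
    "00000000;(const boolean);;extern;;;;"
)
print(record["uid"], record["name"], record["kind"], record["file"])
```

Empty strings in `rest` correspond to attributes with no value for this entity.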
![Page 56: Evolution in Open Source Software: A Case Study Michael W. Godfrey Qiang Tu [paper in ICSM 2000] Software Architecture Group University of Waterloo.](https://reader035.fdocuments.in/reader035/viewer/2022062803/56649f1c5503460f94c3248a/html5/thumbnails/56.jpg)
ctags, cxref, cscope
These are “open source” Unix tools that perform extractions:
- ctags extracts only entity info
  - e.g., file, name, line num, kind, etc.
  - works with C, C++, Eiffel, Fortran, and Java.
  - Used for fast context switching while editing source code with vim/emacs
- cxref generates cross-reference table for C systems.
  - Often used for webifying source code (e.g., Linux, Mozilla).
- cscope used for program comprehension of C systems (e.g., who calls f, who uses v)
  - Older commercial Unix tool, recently open sourced.
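To show what "entity info" from ctags looks like to a downstream tool, the sketch below parses one line of a tags file in the extended format (tag name, file, and search address separated by tabs, followed by `;"` and optional extension fields). The function name and the sample line are hypothetical, and the parser assumes a well-formed line whose search pattern contains neither a tab nor the `;"` sequence.

```python
def parse_tag_line(line):
    """Parse one line of an extended-format tags file into a dict."""
    if line.startswith("!_TAG_"):
        return None  # pseudo-tag carrying file metadata, not an entity
    name, path, rest = line.rstrip("\n").split("\t", 2)
    # The search address ends at ;" and extension fields follow, tab-separated.
    address, sep, extension = rest.partition(';"')
    tag = {"name": name, "file": path, "address": address, "kind": "", "fields": {}}
    if sep:
        for field in extension.split("\t"):
            if not field:
                continue
            key, colon, value = field.partition(":")
            if not colon:
                tag["kind"] = field          # bare kind letter, e.g. "f"
            elif key == "kind":
                tag["kind"] = value          # long form, e.g. "kind:function"
            else:
                tag["fields"][key] = value
    return tag


# Hypothetical tags-file line for illustration.
sample_line = 'closeTagFile\tentry.c\t/^void closeTagFile(void)$/;"\tf\tline:551'
tag = parse_tag_line(sample_line)
print(tag["name"], tag["file"], tag["kind"], tag["fields"])
```

Note that the result holds entities only; ctags records no relationships such as calls or uses.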
![Page 57: Evolution in Open Source Software: A Case Study Michael W. Godfrey Qiang Tu [paper in ICSM 2000] Software Architecture Group University of Waterloo.](https://reader035.fdocuments.in/reader035/viewer/2022062803/56649f1c5503460f94c3248a/html5/thumbnails/57.jpg)
TA++ [Lethbridge et al. @ UOttawa]
- TKSee aids programming comprehension
  - i.e., what programmers do all day
  - TA++ is the data modelling language
- Want “full story” from the source code:
  - Want pre-preprocessing view of code for all platforms and environments (text editor’s view)
  - … but most extractors use a compiler front end and preprocess toward a particular target and option set
  - Some extractors keep some macro info
![Page 58: Evolution in Open Source Software: A Case Study Michael W. Godfrey Qiang Tu [paper in ICSM 2000] Software Architecture Group University of Waterloo.](https://reader035.fdocuments.in/reader035/viewer/2022062803/56649f1c5503460f94c3248a/html5/thumbnails/58.jpg)
TA++ Combined E/R Model
![Page 59: Evolution in Open Source Software: A Case Study Michael W. Godfrey Qiang Tu [paper in ICSM 2000] Software Architecture Group University of Waterloo.](https://reader035.fdocuments.in/reader035/viewer/2022062803/56649f1c5503460f94c3248a/html5/thumbnails/59.jpg)
BAUHAUS [Koschke et al. @ UStuttgart]
Software architecture recovery system
Parse code, look for hidden/decayed abstractions, then redesign
Uses various heuristics to perform “clustering”
Works both at entity level and subsystem level
Built from many tools …
… including Rigi viewer and a customized C parser/extractor that (optionally) dumps RSF
Example WoSEF problem:
Cannot derive full includes hierarchy from Bauhaus extracted facts; this was a design decision, as the researchers were not interested in this information
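To make the includes example concrete, the sketch below shows how a full includes hierarchy could be derived if direct include facts were available. It assumes facts arrive as simple whitespace-separated triples of the form `relation source target`, in the style of RSF. The relation names `include` and `call`, the file names, and both functions are hypothetical and do not reflect actual Bauhaus output.

```python
from collections import defaultdict

# Hypothetical facts as "relation source target" triples.
RSF = """\
include main.c util.h
include util.h types.h
include main.c stdio.h
call main helper
"""


def parse_rsf(text: str) -> list[tuple[str, str, str]]:
    """Parse whitespace-separated triples, skipping malformed lines."""
    triples = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3:
            triples.append((parts[0], parts[1], parts[2]))
    return triples


def includes_closure(triples, relation: str = "include") -> dict[str, set[str]]:
    """Map each including file to everything it includes, directly or not."""
    direct = defaultdict(set)
    for rel, src, dst in triples:
        if rel == relation:
            direct[src].add(dst)

    closure = {}
    for start in direct:
        seen = set()
        stack = list(direct[start])
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                # .get avoids adding keys to the defaultdict mid-iteration.
                stack.extend(direct.get(node, ()))
        closure[start] = seen
    return closure


closure = includes_closure(parse_rsf(RSF))
```

With these facts, `closure["main.c"]` contains `util.h`, `types.h`, and `stdio.h`. If the extractor never emits the direct `include` facts, no amount of downstream processing can recover this hierarchy.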
![Page 60: Evolution in Open Source Software: A Case Study Michael W. Godfrey Qiang Tu [paper in ICSM 2000] Software Architecture Group University of Waterloo.](https://reader035.fdocuments.in/reader035/viewer/2022062803/56649f1c5503460f94c3248a/html5/thumbnails/60.jpg)
BAUHAUS Entities
![Page 61: Evolution in Open Source Software: A Case Study Michael W. Godfrey Qiang Tu [paper in ICSM 2000] Software Architecture Group University of Waterloo.](https://reader035.fdocuments.in/reader035/viewer/2022062803/56649f1c5503460f94c3248a/html5/thumbnails/61.jpg)
BAUHAUS Relationships
![Page 62: Evolution in Open Source Software: A Case Study Michael W. Godfrey Qiang Tu [paper in ICSM 2000] Software Architecture Group University of Waterloo.](https://reader035.fdocuments.in/reader035/viewer/2022062803/56649f1c5503460f94c3248a/html5/thumbnails/62.jpg)
BAUHAUS Combined E/R
![Page 63: Evolution in Open Source Software: A Case Study Michael W. Godfrey Qiang Tu [paper in ICSM 2000] Software Architecture Group University of Waterloo.](https://reader035.fdocuments.in/reader035/viewer/2022062803/56649f1c5503460f94c3248a/html5/thumbnails/63.jpg)
GUPRO [Ebert, Kullbach, Winter et al.@ UKoblenz]
GUPRO supports simultaneous modelling of inter-related systems written in different programming languages
In particular, concerned with the COBOL/MVS/JCL mainframe world
GUPRO is notable because:
Simultaneously multilingual
Explicitly models “boundary crossings” (!)
Looks at (very real) problems of the mainframe world
COBOL, JCL, database migration
![Page 64: Evolution in Open Source Software: A Case Study Michael W. Godfrey Qiang Tu [paper in ICSM 2000] Software Architecture Group University of Waterloo.](https://reader035.fdocuments.in/reader035/viewer/2022062803/56649f1c5503460f94c3248a/html5/thumbnails/64.jpg)
GUPRO
Candidate system is modelled in an object-based repository using a graph-based approach:
EER (modelling language) + GRAL (constraint language)
GReQL mechanism supports structured queries on the repository via restricted first-order logic
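The flavour of a restricted first-order query over a graph repository can be sketched in plain Python. This is not GReQL syntax, which the slides do not show, and the vertex types, edge labels, and names below are hypothetical. The query reads: find every JCL step for which there exists a COBOL program it executes, and for which that program reads some file.

```python
# Hypothetical typed graph: vertex name -> vertex type.
vertex_type = {
    "STEP1": "JclStep",
    "STEP2": "JclStep",
    "PAYROLL": "CobolProgram",
    "REPORT": "CobolProgram",
    "EMPFILE": "File",
}

# Hypothetical labelled edges: (source, label, target).
edges = {
    ("STEP1", "executes", "PAYROLL"),   # JCL -> COBOL boundary crossing
    ("STEP2", "executes", "REPORT"),
    ("PAYROLL", "reads", "EMPFILE"),
}


def targets(src: str, label: str) -> set[str]:
    """All vertices reachable from src over one edge with the given label."""
    return {d for (s, l, d) in edges if s == src and l == label}


# Existential quantifiers map onto any(); the outer generator is the
# "for each vertex of type JclStep" part of the query.
steps_reaching_files = sorted(
    s
    for s, t in vertex_type.items()
    if t == "JclStep"
    and any(
        vertex_type[p] == "CobolProgram"
        and any(vertex_type[f] == "File" for f in targets(p, "reads"))
        for p in targets(s, "executes")
    )
)
```

Only `STEP1` satisfies the query, because `REPORT` reads no file in this toy repository. The `executes` edge is the kind of cross-language link that a single-language schema cannot express.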
![Page 65: Evolution in Open Source Software: A Case Study Michael W. Godfrey Qiang Tu [paper in ICSM 2000] Software Architecture Group University of Waterloo.](https://reader035.fdocuments.in/reader035/viewer/2022062803/56649f1c5503460f94c3248a/html5/thumbnails/65.jpg)
GUPRO
JCL schema COBOL schema
![Page 66: Evolution in Open Source Software: A Case Study Michael W. Godfrey Qiang Tu [paper in ICSM 2000] Software Architecture Group University of Waterloo.](https://reader035.fdocuments.in/reader035/viewer/2022062803/56649f1c5503460f94c3248a/html5/thumbnails/66.jpg)
GUPRO
Integrated schemas for JCL and COBOL
![Page 67: Evolution in Open Source Software: A Case Study Michael W. Godfrey Qiang Tu [paper in ICSM 2000] Software Architecture Group University of Waterloo.](https://reader035.fdocuments.in/reader035/viewer/2022062803/56649f1c5503460f94c3248a/html5/thumbnails/67.jpg)
GUPRO Multi-Language Model
![Page 68: Evolution in Open Source Software: A Case Study Michael W. Godfrey Qiang Tu [paper in ICSM 2000] Software Architecture Group University of Waterloo.](https://reader035.fdocuments.in/reader035/viewer/2022062803/56649f1c5503460f94c3248a/html5/thumbnails/68.jpg)
Summary — High-Level Schemas
Lots of sticky issues at the prog. lang. level:
To pre- or not to pre-process
Entity resolution often not done (e.g., Datrix)
What is a function: def, dec, polymorphism, overloading, templates, …
How to deal with missing libraries, incremental extractions, versioned extractions, non-ANSI-isms, …
Conceptual gaps:
COBOL/JCL world very different from C/C++/Java world
“I didn’t know you wanted full includes info…”
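Entity resolution, in the sense of linking each declaration to its definition, can be sketched as follows. This is a minimal illustration that assumes facts are `(name, kind, file)` tuples and matches by name alone. The fact format, the `resolve` function, and the sample names are hypothetical.

```python
from collections import defaultdict

# Hypothetical extracted facts: (name, kind, file), kind is "dec" or "def".
facts = [
    ("parse", "dec", "parser.h"),
    ("parse", "def", "parser.c"),
    # Declared but never defined in the extracted code, e.g. because the
    # defining library was not available to the extractor.
    ("log_msg", "dec", "log.h"),
]


def resolve(facts):
    """Link each declaration to its unique definition, matching by name.

    Returns (resolved, unresolved): resolved maps (name, declaring file)
    to the defining file; unresolved lists declarations with zero or
    several candidate definitions.
    """
    defs = defaultdict(list)
    for name, kind, file in facts:
        if kind == "def":
            defs[name].append(file)

    resolved, unresolved = {}, []
    for name, kind, file in facts:
        if kind != "dec":
            continue
        candidates = defs.get(name, [])
        if len(candidates) == 1:
            resolved[(name, file)] = candidates[0]
        else:
            unresolved.append((name, file))
    return resolved, unresolved


resolved, unresolved = resolve(facts)
```

Here `parse` resolves to `parser.c` and `log_msg` stays unresolved. Matching by name alone breaks down as soon as overloading or templates are involved, which is one reason the question of what counts as a function is hard to settle in a schema.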
![Page 69: Evolution in Open Source Software: A Case Study Michael W. Godfrey Qiang Tu [paper in ICSM 2000] Software Architecture Group University of Waterloo.](https://reader035.fdocuments.in/reader035/viewer/2022062803/56649f1c5503460f94c3248a/html5/thumbnails/69.jpg)
Summary — Good News
Many of us seem to be doing similar kinds of extractions.
It seems likely that:
Many extractors can be used within other tools
Some form of common interchange format is feasible, though it may not please everyone.
Challenges:
May want to use multiple tools together
I have been working on a standalone cxref-based hack to add full includes information to a BAUHAUS converter
Can we take advantage of the web to set up some sort of distributed fact extraction/conversion factory? [Holt]
![Page 70: Evolution in Open Source Software: A Case Study Michael W. Godfrey Qiang Tu [paper in ICSM 2000] Software Architecture Group University of Waterloo.](https://reader035.fdocuments.in/reader035/viewer/2022062803/56649f1c5503460f94c3248a/html5/thumbnails/70.jpg)