Managing Code Base

Traditional and Agile Software Engineering
Michele Marchesi and Giancarlo Succi
Chapter 22 – Managing the Code Base

22. Managing the Code Base
22.1 Maintaining a good coffee shop
22.2 Issues in Managing the Code
22.3 Managing Dependencies in Building the System
22.4 Debugging
22.5 Version Control and Configuration Management
    22.5.1 Version Control
    22.5.2 Configuration Management
22.6 Frequent Integrations
22.7 Tracking Bugs and Issues
    22.7.1 Information to Track
    22.7.2 Issue and Bug Tracking Systems
    22.7.3 Using Bugs to Predict Reliability
22.8 Constantly Improving the Code (a.k.a. “Refactoring”)
    22.8.1 Problems When Improving the Code
    22.8.2 Example of Refactoring
    22.8.3 Guidelines for Refactoring
22.9 Do Tools Help?
References

Please do not quote or distribute this manuscript, as it is a very preliminary version, still subject to several modifications, corrections, etc.
© The author retains all copyrights.
If you have any questions or suggestions, please e-mail: [email protected], [email protected]

Reading material for the Software Process Management course at FUB


22. Managing the Code Base

As we have repeatedly emphasized, the code base is by far the single most valuable asset of the overall development process. This does not mean that coding is the only important activity in software development. It means that the final working product is the code. Therefore, we need to do our best to keep the code in a safe environment. To develop code, we need planning, analysis, design, coding, and all the activities discussed throughout this book. However, we also need to develop and maintain code safely from a pure code perspective.

The beast we are fighting here is complexity, and it lies in the code itself. A program may span hundreds of pages of code, and the developers of a portion of code might completely forget about it within a few months. We have only one beast, so we are not in such bad shape: we have learned how to manage beasts when they appear one at a time. To tame complexity, we need a clean and neat environment.

The code is therefore both the value and the issue, and utmost care must be taken. There is no doubt: if a significant piece of code is not neat and clean, it is going to create disasters. Do not believe you can skip this rule. It is going to happen anyway. If you are a bad programmer, it is going to happen very soon. (If you are a bad programmer, you had probably better look for a job in a different field anyway.) If you are a genius, an Albert Einstein or a Kent Beck, do not worry; it is going to happen sooner or later. It may be a matter of 10 days, or 20, or 30, but it will happen. The complexity of poorly managed code grows exponentially with the size of the code, and in a short while you will be toast. People have realized this from the very beginning of software engineering.
You might think, “If I write bad code, no one is going to understand it, so I will become essential to my company!” People used to say, “Poorly written code is the best job security!” and they meant exactly that. If you wrote a piece of code that became a core part of a company's flagship product, no one would dare to lay you off, because no one else could manage that code. For this to happen, however, your poor code must not be detected before it becomes institutionalized. In the past, with heaps of incomprehensible assembly code, this might have been easy. Today, software engineering has progressed so far that it is much easier to spot who writes garbage code before that code becomes part of a company's essential code base. Ethical considerations aside, you would also ruin your reputation, which is a very valuable asset in a knowledge-based field like ours.


Therefore, we need to set clear rules to ensure that our code is properly managed. In this chapter, we summarize a set of rules described in various parts of this text and introduce a few more. (Rule number 0 is that software engineers are not stupid. Is it worth mentioning? Just remember rule number 0, and always make daily backups of your work. You cannot imagine how many times daily backups have saved people!)

22.1 Maintaining a good coffee shop

Think about it! Every day, you go to your favorite coffee shop and drink good coffee. Coffee beans are to coffee what code is to software development: they are the ultimate resource used to give value to customers. There is no coffee without coffee beans; likewise, there is no program without code. Coffee beans require a lot of attention. Here are a few of the rules about them:

• They should be stored in a cool and dry place.
• They are to be ground at the very last moment before making the espresso.
• Varieties should not be mixed unless you really know what you are doing.
• After grinding, keep them in the fridge to preserve the whole aroma.

Within a coffee shop, these rules should be strictly enforced. Suitable bins or separate cool areas are reserved for storing the coffee. On top of most espresso machines, there is a container holding a pound or so of coffee beans, connected to a grinder. In this way, very few beans are exposed to the air and lose aroma, and those few stay there for a minimal amount of time. They are ground in strict FIFO order, just before they are put in the container to prepare the espresso. Altogether, this demands a strict, waterfall organization. Likewise, managing the code base requires a strict, waterfall organization, regardless of whether you opt for agile or traditional development processes.

22.2 Issues in Managing the Code

We need to determine how we can manage the code base in the least invasive way possible.


[Figure 22.1 places the following activities along a time axis (coding starts, first build, first deployment, final deployment) and a scope axis (individual developer, team, organization): configuring the compilation unit; testing and debugging; managing code development in team; constantly coding, building, and integrating the code; refactoring; managing a web of potentially multiple customer configurations; tracking issues coming from customers, etc.]

Figure 22.1: Issues in managing the code.

Figure 22.1 gives an overall picture of the issues related to managing the code and of how they span the organization. Typical issues are:

• How can I ensure that the code and the compilation instructions are properly configured to compile?

• How can I safely guide a group of people in developing code without everyone damaging everyone else's code?

• How can I find bugs in my code?

• How can I re-create specific configurations of my code for a specific execution environment and/or customer?

• How can I ensure every day that the new build is coherent with the big picture of the overall system being developed?

• How can I track the bugs that have been identified in my code and ensure that they have all been fixed?

• How can I perform constant housekeeping on my code in a safe way, so that its value increases as time progresses?

A variety of issues arise at many points throughout development:

• During development, we must take care of the physical and logical safety of the code we develop, of the correctness of the code with respect to our intentions, and of the coherence of the code with the overall goals of the project.

• While shipping, we need to ensure that the right configuration is shipped to the right customer.

• After the code has been distributed to the customer, it is important to take care of all the issues that arise during and after implementation and to use our spare time to constantly revisit the code so that it becomes simpler and simpler.

Some of these topics are broad enough that specific chapters or sections are devoted to them, such as testing and JUnit. The other topics are briefly presented in this chapter. We follow the order of Figure 22.1, except for refactoring, which is presented last because it contains a large example that is better suited to the end of the chapter.


22.3 Managing Dependencies in Building the System

Often a portion of code depends on another portion of code. Consider an example in Java. At the beginning of our development (Code Fragment 22.1), one package defines the class CoffeeMaker with the method prepareCoffee(), and another package defines the class CoffeeShop, which uses prepareCoffee(). We need to ensure that each time the code of class CoffeeMaker changes, the developers of class CoffeeShop are informed, so that they can take the appropriate actions, if needed. Suppose now that the method prepareCoffee() is changed into the method prepareCoffee(int sugar), which allows users to specify the amount of sugar to put in the coffee (Code Fragment 22.2).

// file coffeeMaker/CoffeeMaker.java
package coffeeMaker;

public class CoffeeMaker {
    public void prepareCoffee() {
        /* Prepare the coffee with a standard amount of sugar */
    }
}

// file coffeeShop/CoffeeShop.java
package coffeeShop;

import coffeeMaker.CoffeeMaker;

public class CoffeeShop {
    public static void main(String[] args) {
        CoffeeMaker aCoffeeMaker = new CoffeeMaker();
        aCoffeeMaker.prepareCoffee();
    }
}

Code Fragment 22.1: Situation at the beginning of development.

After the change, there is a dangerous inconsistency: CoffeeShop still calls prepareCoffee() without arguments. The developers of package coffeeShop have to be made aware of the change; otherwise, the code will not inter-operate correctly.


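// file coffeeMaker/CoffeeMaker.java
package coffeeMaker;

public class CoffeeMaker {
    public void prepareCoffee(int sugar) {
        /* Prepare the coffee with a variable amount of sugar */
    }
}

// file coffeeShop/CoffeeShop.java
package coffeeShop;

import coffeeMaker.CoffeeMaker;

public class CoffeeShop {
    public static void main(String[] args) {
        CoffeeMaker aCoffeeMaker = new CoffeeMaker();
        // Inconsistent: the no-argument prepareCoffee() no longer exists
        aCoffeeMaker.prepareCoffee();
    }
}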

Code Fragment 22.2: Situation after the change.

The situation may be tricky because the developers may have old versions of the specification documents or of the jar files containing the declarations of obsolete interfaces or classes. There are three complementary approaches for keeping this dangerous situation under control:

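• Avoid making changes abruptly: always warn users of future changes by first marking as obsolete the classes, methods, or interfaces you plan to change. In Java, this is done with the javadoc @deprecated tag (and, since Java 5, the @Deprecated annotation).

• Keep the libraries you use up to date; if you supply your libraries or functions to someone else, be sure to communicate any changes you plan to make.

• Use tools to verify that there have been no unexpected changes in the packages you use. The Java compiler does this automatically: when CoffeeShop is recompiled, javac checks each method call against the classes it finds and reports that prepareCoffee() with no arguments no longer exists. If an old, already compiled CoffeeShop is run against the new CoffeeMaker, the mismatch surfaces only at run time, as a NoSuchMethodError.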
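To make the first approach concrete, here is a minimal Java sketch. The class name DeprecationDemo, the String return value, and the default of one unit of sugar are assumptions for illustration, not the book's code. Instead of replacing prepareCoffee() abruptly, CoffeeMaker adds prepareCoffee(int sugar) and keeps the old method as a deprecated wrapper.

```java
// Hypothetical sketch: evolving CoffeeMaker without breaking existing callers.
public class DeprecationDemo {

    public static class CoffeeMaker {
        // Assumed value: the text only speaks of "a standard amount of sugar".
        static final int STANDARD_SUGAR = 1;

        /**
         * Old signature, kept so that existing callers such as CoffeeShop still compile.
         *
         * @deprecated use {@link #prepareCoffee(int)} to choose the amount of sugar.
         */
        @Deprecated
        public String prepareCoffee() {
            return prepareCoffee(STANDARD_SUGAR);
        }

        /** New signature: the caller chooses the amount of sugar. */
        public String prepareCoffee(int sugar) {
            if (sugar < 0) {
                throw new IllegalArgumentException("sugar must not be negative");
            }
            return "coffee with " + sugar + " sugar";
        }
    }

    public static void main(String[] args) {
        CoffeeMaker maker = new CoffeeMaker();
        System.out.println(maker.prepareCoffee());  // old call: still works during the transition
        System.out.println(maker.prepareCoffee(3)); // new call
    }
}
```

Callers compiled against the old signature keep working, while javac notes every use of the deprecated method from other classes (listing each one with -Xlint:deprecation), so the developers of CoffeeShop learn about the change before the old method is finally removed.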

We can take a more general view, beyond Java. Dependencies stem from “uses” relationships, and “uses” may have different meanings. It may mean, “I use the functionality of a given function,” as in Java or C++ when I simply call a function. It may also mean, “I use the text of another portion of code,” as with C++ include files, which are inserted into the code to compile during preprocessing. In the latter case, the situation is more complex, as the check is not done at the language level. The compiler sees only the preprocessed text, so it has no way of noticing that an included file has changed since the last compilation. Therefore, with languages such as C++, we need to tell the system about dependencies that the compiler does not handle.

The Unix make command is a very popular and handy utility that serves this purpose. It was developed with Unix and C, but it is now available on most platforms. Without going into the details of make, we can say that in make we specify dependencies among files, together with the commands to execute when a file that others depend on is modified. Suppose, for instance, that we want to model the situation of Figure 22.2.


[Figure 22.2 shows modules A to G connected by “depends on” arrows; among them, G depends on D and F, D depends on A and B, and F depends on B and C, as modeled in Code Fragment 22.3.]

Figure 22.2: Example of dependencies.

We need to inform make of the dependencies. We write a dependency file called Makefile, where we describe (Code Fragment 22.3):

• The structure of the dependencies
• What to do if an entity we depend on is modified

# Makefile for Figure 22.2
G: D F
<TAB> Command 1 to execute if D or F are modified
<TAB> Command 2 to execute if D or F are modified
...
<TAB> Command n to execute if D or F are modified
<EMPTY LINE>
D: A B
<TAB> Command 1 to execute if A or B are modified
<TAB> Command 2 to execute if A or B are modified
...
<TAB> Command n to execute if A or B are modified
<EMPTY LINE>
F: B C
<TAB> Command 1 to execute if B or C are modified
<TAB> Command 2 to execute if B or C are modified
...
<TAB> Command n to execute if B or C are modified
<EMPTY LINE>

Code Fragment 22.3: Structure of the Makefile.

Because make was born in Unix, the entities we use for describing dependencies are files, nothing finer-grained or coarser than that. A dependency is described by first writing the name of the file that depends on something, for instance G, followed by the colon “:” and the list of what G depends on, D and F in our case.
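To see what these rules imply, the following Java sketch stores the three rules of Code Fragment 22.3 as a small graph and computes which targets become out of date, directly or transitively, when one file is modified. DependencyGraph, addRule, and affectedBy are hypothetical names for illustration; the sketch deliberately ignores dates and command execution, which are make's job.

```java
import java.util.*;

// Hypothetical sketch: the rules of Code Fragment 22.3 as an in-memory graph.
public class DependencyGraph {
    // target -> the files it depends on (the part after the colon in a Makefile rule)
    private final Map<String, List<String>> rules = new LinkedHashMap<>();

    public void addRule(String target, String... prerequisites) {
        rules.put(target, List.of(prerequisites));
    }

    /** Returns every target that becomes out of date, directly or transitively, when 'changed' is modified. */
    public Set<String> affectedBy(String changed) {
        Set<String> affected = new TreeSet<>();
        Deque<String> toVisit = new ArrayDeque<>();
        toVisit.push(changed);
        while (!toVisit.isEmpty()) {
            String file = toVisit.pop();
            for (Map.Entry<String, List<String>> rule : rules.entrySet()) {
                // A target that depends on 'file' is out of date too; visit it once.
                if (rule.getValue().contains(file) && affected.add(rule.getKey())) {
                    toVisit.push(rule.getKey());
                }
            }
        }
        return affected;
    }

    public static void main(String[] args) {
        DependencyGraph g = new DependencyGraph();
        g.addRule("G", "D", "F");
        g.addRule("D", "A", "B");
        g.addRule("F", "B", "C");
        System.out.println(g.affectedBy("B")); // [D, F, G]
        System.out.println(g.affectedBy("A")); // [D, G]
        System.out.println(g.affectedBy("G")); // []
    }
}
```

With these rules, a change to B makes both D and F out of date, and therefore G as well, while a change to A affects only D and G. make additionally rebuilds D and F before G; that ordering is left out of this sketch.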


For make, a dependency is satisfied when the target was modified more recently than everything it depends on: in our example, when the date and time of the last modification of G are later than those of D and F. There is no semantic check here; it is just a plain comparison of dates and times. If G is older than D or F, make assumes that a potentially dangerous modification has occurred and executes the list of commands associated with G. This list is a sequence of lines, each starting with a Tab. By convention, the list is followed by an empty line, although strictly it ends at the first non-blank line that does not start with a Tab. Clearly, we could list commands that have nothing to do with our goal of keeping the system under control: we specify the list of actions, and make does not check it. Code Fragment 22.4 contains an example of dependencies in C++. The file Main.cpp depends on the file CoffeeMaker.h, and the executable Main.exe depends on both.

// file CoffeeMaker.h
#ifndef COFFEEMAKER_H
#define COFFEEMAKER_H

class CoffeeMaker {
private:
    int sugarAmount;
public:
    int pumpPressure;
    void clean() { /* ... */ }
private:
    void removeFilter() { /* ... */ }
};

#endif

// file Main.cpp
#include "CoffeeMaker.h"

int main() {
    CoffeeMaker aCoffeeMaker;
    aCoffeeMaker.clean();
}

Code Fragment 22.4: Example of dependencies in C++.

Code Fragment 22.5 contains the Makefile for Code Fragment 22.4. We produce the executable Main.exe with the GNU C++ compiler command g++ -o Main.exe Main.cpp. Here we say that we need to recompile Main.cpp if the last modification to CoffeeMaker.h or the last modification to Main.cpp occurred after the last modification of Main.exe. There is no rule about Main.cpp even if it depends on CoffeeMaker.h. Main.cpp is not built in any way; it is coded by the programmer, so the programmer has to take care of possible changes to CoffeeMaker.h manually.


# Makefile for Code Fragment 22.4
Main.exe: Main.cpp CoffeeMaker.h
<TAB> g++ -o Main.exe Main.cpp

Code Fragment 22.5: Structure of the Makefile for Code Fragment 22.4.
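The date comparison that make performs for the rule of Code Fragment 22.5 can be sketched in Java with the standard java.nio.file API. OutOfDateCheck and isOutOfDate are hypothetical names; the sketch covers only the decision, not running the commands, and, like make, it treats a missing target as out of date.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;

// Hypothetical sketch of make's date check: no semantics, only modification times.
public class OutOfDateCheck {

    /** True if the target is missing or older than at least one of its prerequisites. */
    public static boolean isOutOfDate(Path target, Path... prerequisites) throws IOException {
        if (!Files.exists(target)) {
            return true; // nothing built yet: the commands must run
        }
        FileTime targetTime = Files.getLastModifiedTime(target);
        for (Path prerequisite : prerequisites) {
            if (Files.getLastModifiedTime(prerequisite).compareTo(targetTime) > 0) {
                return true; // a prerequisite changed after the target was last built
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("make-demo");
        Path header = Files.createFile(dir.resolve("CoffeeMaker.h"));
        Path source = Files.createFile(dir.resolve("Main.cpp"));
        Path exe = dir.resolve("Main.exe");

        System.out.println(isOutOfDate(exe, source, header)); // true: Main.exe does not exist

        Files.createFile(exe);
        // Explicit times keep the demo independent of the file system's clock resolution.
        Files.setLastModifiedTime(header, FileTime.fromMillis(1_000_000L));
        Files.setLastModifiedTime(source, FileTime.fromMillis(1_000_000L));
        Files.setLastModifiedTime(exe, FileTime.fromMillis(2_000_000L));
        System.out.println(isOutOfDate(exe, source, header)); // false: up to date

        Files.setLastModifiedTime(header, FileTime.fromMillis(3_000_000L)); // "edit" the header
        System.out.println(isOutOfDate(exe, source, header)); // true: the header is newer
    }
}
```

The check is purely about dates, as described above: touching CoffeeMaker.h without changing its content would still make Main.exe out of date.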

The make command is very powerful and has many options. It can be used whenever a dependency of any kind has to be enforced. Documents formed by several subdocuments scattered over the file system are another example of use. Process dependencies and workflows can also be implemented with make. We recommend that you consult a suitable manual (Oram and Talbott, 1991; Stallman and McGrath, 1998) or run “man make” on a Unix machine.

22.4 Debugging

Sometimes our tests are not complete enough, and there is a bug in the code that the unit tests do not reveal. This is a very realistic scenario, because tests cannot cover all the possibilities. It cannot be said often enough: testing can prove the presence of bugs, not their absence. In the good old times, the search for a hidden bug was conducted with a sequence of print statements scattered throughout the code, showing the values of the variables (Code Fragment 22.6).

// file coffeeMaker/CoffeeMaker.java
package coffeeMaker;

public class CoffeeMaker {
    public int coffeeAmount;
    public int status;
    static int howMany = 0;

    public void prepareCoffee(int sugar) {
        /* Prepare the coffee with a variable amount of sugar */
        System.out.println("The value of coffeeAmount is: " + coffeeAmount);
        System.out.println("The value of status is: " + status);
        System.out.println("The value of howMany is: " + howMany);
    }
}

// file coffeeShop/CoffeeShop.java
package coffeeShop;

import coffeeMaker.CoffeeMaker;

public class CoffeeShop {
    public static void main(String[] args) {
        int howMuchSugar = 3;
        System.out.println("The value of howMuchSugar is: " + howMuchSugar);
        CoffeeMaker aCoffeeMaker = new CoffeeMaker();
        aCoffeeMaker.prepareCoffee(howMuchSugar);
    }
}

Code Fragment 22.6: Adding print statements to help in debugging.

Such dissemination of print statements helps debugging in two essential ways:


1. It tells the programmer which lines the program had reached when it crashed, although the program may actually have gone a few statements further if its output was buffered and not flushed.

2. It tells the programmer the values of the variables.

The major disadvantage is that it fills up the program with code that is useful only for debugging. This has four major drawbacks:

1. It takes time to write such statements, delete them at the end of debugging, and possibly write them again when a new bug is found.

2. Adding and deleting them may introduce new mistakes; for instance, you may delete a useful line without noticing it.

3. The print statements make the original code difficult to read.

4. The output may be so full of debugging messages that it becomes difficult to use.

Development environments address these issues with debuggers. A debugger does what print statements do, letting you “watch” the value of any variable you want, but without modifying the code. Debuggers also support:

1. Step-by-step execution of the program.

2. A regular run of the program up to a given instruction, the “breakpoint,” where the program stops so that you can examine the values of the objects and the variables at that point.

3. A regular run of the program until a variable or an object reaches a given value, so that, again, you can examine the values of the objects and the variables when that condition occurs.

Figure 22.3 is a snapshot of the debugging support of Eclipse.

Figure 22.3: Structure of the debugging facility in Eclipse. [Callouts: threads of execution; original code with a breakpoint; values of the variables and objects; output of the program.]


As the figure shows, Eclipse provides specific views that display at a glance:

• The original code, with any breakpoints attached to it
• The values of variables and objects
• The output of the program
• The running threads

Much more information is available. You can consult the excellent tutorials on debugging in Eclipse (Shavor et al., 2003; Leszek, 2003). Additional information on Eclipse can be found in (Gamma and Beck, 2000).

22.5 Version Control and Configuration Management

Several software engineers and developers may work on the same project. This is a common scenario, which occurs even at the high school or university level. How can we ensure that people work cooperatively and do not obstruct each other? A typical obstruction occurs when everyone works on the same piece of code: they risk modifying it concurrently without notifying each other, and the result can easily be complete confusion. Another problem arises when the same code base is used for multiple customers and the software engineers have to track the different configurations of code shipped to the various customers. These two interconnected problems have generated two interconnected solutions, called “Version Control” and “Configuration Management,” respectively. Version Control has often been considered a subset of Configuration Management.

22.5.1 Version Control

How can we prevent a developer from altering what someone else is already altering? Imagine multiple people asking for a cappuccino at our coffee shop. How can we ensure that everyone gets the right one, and that the waiters, each taking care of several customers, neither mix ingredients in the wrong way nor serve the wrong order? We have one resource, the code base, and we want to share it among multiple developers. A simple approach would be to grant access to the code base to only one developer at a time. However, this would severely hamper everyone else: we would serialize work that could be performed in parallel. It would be as if we allowed only one person at a time to go through the whole sequence of ordering a coffee beverage, waiting for it to be served, and getting it from the waiter. We can do better.


We can grant write access to only one developer and read access to everyone else. Only one developer could modify the code, but everyone could study solutions, improvements, fixes, and so forth simultaneously. It would be as if in our coffee shop we let everyone read the menu and ask the waiters questions in parallel, but still only one person could place an order. This would still slow down our process significantly.

A more refined approach is to slice our code into independent units and let only one person at a time access a given unit in write mode, while everyone else can still access it in read-only mode. In this way, the restriction to a single modifier applies only to an individual unit, and everyone else can still read that unit. This is definitely better. Parallelism can go ahead! In our coffee shop, it means that we grant exclusive access to each of the tools available to the “developers” of coffee beverages (the cash register, the steamer, the espresso maker, and so forth), but we let different people use different tools simultaneously. This is not yet the solution, but it is very close to it.

We use this scenario to introduce a bit of software engineering jargon for version control. “Version control” is the system that ensures that units of development are accessed safely by multiple developers. The “versioned unit” is the smallest unit put under version control; typically, this is a file. There are situations where parts of a file, for instance a class or a method, are put under version control, but this requires more refined tools and procedures that are not often available. Whenever we take control of a versioned unit (a file, in most cases), we say that we “check out” the unit. Whenever we return the unit, with our modifications, to the system, we say that we “check in” the unit. Only the person who checked out a unit is allowed to check it back in, and no one else can check out a unit while it is still checked out: only one check-out at a time is allowed.
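The exclusive check-out/check-in rule can be sketched in a few lines of Java. This is a minimal in-memory illustration under assumed names (LockingUnit, checkOut, checkIn, latest), not the interface of any real version control system; it keeps only the content of each version and omits time stamps, authors, and comments.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of one versioned unit under exclusive check-out.
public class LockingUnit {
    private final List<String> versions = new ArrayList<>(); // versions.get(n - 1) is version n
    private String checkedOutBy = null;                      // null means "checked in"

    public LockingUnit(String initialContent) {
        versions.add(initialContent);
    }

    /** Grants write access to one developer at a time; returns the latest content. */
    public synchronized String checkOut(String developer) {
        if (checkedOutBy != null) {
            throw new IllegalStateException("already checked out by " + checkedOutBy);
        }
        checkedOutBy = developer;
        return latest();
    }

    /** Only the developer holding the check-out may check in; returns the new version number. */
    public synchronized int checkIn(String developer, String newContent) {
        if (!developer.equals(checkedOutBy)) {
            throw new IllegalStateException(developer + " has not checked out this unit");
        }
        versions.add(newContent);
        checkedOutBy = null;
        return versions.size();
    }

    /** Read-only access: always allowed and not tracked. */
    public synchronized String latest() {
        return versions.get(versions.size() - 1);
    }

    public static void main(String[] args) {
        LockingUnit unit = new LockingUnit("v1 of CoffeeMaker.h");
        unit.checkOut("alice");
        try {
            unit.checkOut("bob"); // refused: only one check-out at a time
        } catch (IllegalStateException e) {
            System.out.println("bob: " + e.getMessage());
        }
        System.out.println("alice checked in version " + unit.checkIn("alice", "v2 of CoffeeMaker.h"));
        System.out.println(unit.checkOut("bob")); // now bob may proceed
    }
}
```

The developer names are placeholders. Note that if alice never checked the unit back in, bob would stay blocked indefinitely, which is exactly the weakness that the optimistic approach addresses.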

[States: unit checked in, unit checked out, and unit checked out read-only (drawn dashed). Transitions: check out; check in, which records version number, time stamp, author, and comments; check out read-only; dispose.]

Figure 22.4: Lifecycle of a versioned unit.


Each time the unit is checked in, a number, the so-called “version number,” increases automatically, so that we can track earlier versions of the unit. The version number is typically a positive integer. Each version also has an associated time stamp, plus optional comments from the software engineer(s) who checked the unit in. This structure allows us to ask the system to retrieve version 24 of CoffeeMaker.h, or the latest version of CoffeeMaker.h checked in before December 17, 2001. To review a unit without getting write access to it, we can check the unit out in “read-only mode,” but then we are not allowed to check it back in; after reviewing it, we must dispose of it. Figure 22.4 describes the overall lifecycle of a versioned unit in the version control system. The unit checked out in read-only mode is drawn with a dashed line because this is not properly a state of the unit: there may be many copies of an entity checked out in read-only mode, the version control system does not keep track of them, and they can never be checked back in.

When our code is ready to ship, we say that we have a “release” of the code. Releases are also numbered to identify them uniquely. As versions progress to releases, it is common that:

• Each “specific” version number restarts from 1 whenever there is a new release.

• The “full” version number is the release number followed by a dot and a “specific” version number.

So, if we have 25 versions before the first release, we identify them as versions 0.1 to 0.25. If there are 7 versions in release 1, we identify them as versions 1.1 to 1.7. If there are 12 versions in release 2, we identify them as versions 2.1 to 2.12. And so on. We can also distinguish between minor and major releases, with more deeply nested numbering schemes, but nothing changes substantially.
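The numbering scheme just described can be captured by a tiny counter. This Java sketch uses the hypothetical class ReleaseNumbering; it assumes, as in the text, that versions before the first release belong to release 0 and that the specific number restarts from 1 after each release.

```java
// Hypothetical sketch of the "release.specific" numbering scheme.
public class ReleaseNumbering {
    private int release = 0;   // 0 until the first release ships
    private int specific = 0;  // restarts from 1 within every release

    /** Called at each check-in; returns the full version number, e.g. "0.25". */
    public String nextVersion() {
        specific++;
        return release + "." + specific;
    }

    /** Called when a release ships: later versions belong to the next release. */
    public void newRelease() {
        release++;
        specific = 0;
    }

    public static void main(String[] args) {
        ReleaseNumbering numbering = new ReleaseNumbering();
        String last = "";
        for (int i = 0; i < 25; i++) {
            last = numbering.nextVersion();
        }
        System.out.println(last);                    // 0.25
        numbering.newRelease();
        System.out.println(numbering.nextVersion()); // 1.1
    }
}
```

Such version numbers are compared component by component rather than as decimal fractions: version 0.25 comes after version 0.3.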

22.5.1.1 An Optimistic Approach to Version Control

The approach described so far appears to be the most open approach to version control that still ensures the integrity of the system. However, what if someone forgets to check back in an entity checked out by mistake? No one else could modify it. Checking out an entity by mistake can easily happen when someone is not yet sure what needs to be modified in the code, and after browsing a file that requires no modification, it is quite easy to forget to check it back in. Such situations are quite common: we know how absent-minded we are as programmers and software engineers.


Since the early days of software engineering, people have proposed the opposite approach. We let everyone check out any entity, and when an entity is checked back in, we verify that the version it was checked out from is still the latest one. This is tricky to say, so let us review the concept with an example. Each time we check in an entity, it is assigned a version number: 1, 2, 3, …, n. The version control system tracks the version of each entity, and we also record it on the computer where we check the entity out. This needs a bit more machinery, but it is easily done. Suppose we check out an entity at version number 25. We can check it back in only if the latest version in the version control system is still 25. If in the meantime someone else has checked the entity out and then back in, the latest version number is 26, 27, or higher. We have a conflict: we cannot check our modification of version 25 back in automatically. The system informs us of the inconsistency and proposes several options, such as reviewing the differences between the versions. Another possibility would be to have multiple running configurations of the system. This leads to the issue of configuration management, which we describe later in this chapter.
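The optimistic check-in rule can be sketched in Java as follows. OptimisticUnit and WorkingCopy are hypothetical names, and the sketch only detects the conflict: where a real system would help the developer review and merge the differences, as described above, this one simply refuses the check-in.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of optimistic version control for a single unit.
public class OptimisticUnit {

    /** A local working copy remembers the version it was checked out from. */
    public static final class WorkingCopy {
        public final String content;
        public final int baseVersion;

        WorkingCopy(String content, int baseVersion) {
            this.content = content;
            this.baseVersion = baseVersion;
        }
    }

    private final List<String> versions = new ArrayList<>(); // versions.get(n - 1) is version n

    public OptimisticUnit(String initialContent) {
        versions.add(initialContent);
    }

    public synchronized int latestVersion() {
        return versions.size();
    }

    /** Anyone may check out at any time; nothing is locked. */
    public synchronized WorkingCopy checkOut() {
        return new WorkingCopy(versions.get(versions.size() - 1), latestVersion());
    }

    /** Accepted only if no one has checked in since our check-out; returns the new version number. */
    public synchronized int checkIn(WorkingCopy base, String newContent) {
        if (base.baseVersion != latestVersion()) {
            throw new IllegalStateException("conflict: checked out version " + base.baseVersion
                    + ", but the latest is " + latestVersion());
        }
        versions.add(newContent);
        return latestVersion();
    }

    public static void main(String[] args) {
        OptimisticUnit unit = new OptimisticUnit("original");
        WorkingCopy mine = unit.checkOut();   // both copies start from version 1
        WorkingCopy theirs = unit.checkOut();
        System.out.println(unit.checkIn(theirs, "their change")); // 2
        try {
            unit.checkIn(mine, "my change");  // still based on version 1
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // differences must be reviewed first
        }
    }
}
```

Forgetting a check-out costs nothing here, because no lock is ever held; the price is that conflicts are discovered only at check-in time.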

Figure 22.5: Lifecycle of a versioned unit with an optimistic approach. [Diagram: states “unit checked in,” “unit checked out,” and “unit checked out read only”; transitions “check out” (recording the version number), “check in” (recording version number, time stamp, author, and comments), “check out read only,” and “dispose.”]

Figure 22.5 shows a revised lifecycle of a versioned unit in a system taking an optimistic approach. As mentioned, we now need to track the status of the checked-out entity as well. We still have the option of checking out an entity in read-only mode, even if it is less relevant here: if we check it out in read-only mode, there is no need to track any information on the entity. It is also possible to gather status information on all the entities, for instance by querying the system to discover whether someone has introduced new entities or changed entities that we have checked out.
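The check performed at check-in time can be sketched in a few lines of C. This is a minimal illustration under simplifying assumptions, not how CVS or any other specific tool is implemented: the repository holds a single entity with an integer version number, and each working copy remembers the version it was checked out at. The names `repo_entity`, `working_copy`, `checkout`, `checkin`, and `is_up_to_date` are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MAX_CONTENT 256

/* Hypothetical repository entry: the latest version of one entity. */
typedef struct {
    int version;                /* incremented at every successful check-in */
    char content[MAX_CONTENT];
} repo_entity;

/* Hypothetical working copy: what a developer holds after check-out. */
typedef struct {
    int base_version;           /* repository version at check-out time */
    char content[MAX_CONTENT];
} working_copy;

typedef enum { CHECKIN_OK, CHECKIN_CONFLICT } checkin_result;

/* Anyone may check out: copy the content and remember the base version. */
working_copy checkout(const repo_entity *repo) {
    working_copy wc;
    wc.base_version = repo->version;
    memcpy(wc.content, repo->content, MAX_CONTENT);
    return wc;
}

/* Status query: is our base version still the latest one in the repository? */
int is_up_to_date(const repo_entity *repo, const working_copy *wc) {
    return wc->base_version == repo->version;
}

/* Optimistic check-in: accepted only if no one else checked in meanwhile. */
checkin_result checkin(repo_entity *repo, working_copy *wc) {
    if (!is_up_to_date(repo, wc)) {
        return CHECKIN_CONFLICT;  /* the developer must review the differences */
    }
    memcpy(repo->content, wc->content, MAX_CONTENT);
    repo->version += 1;
    wc->base_version = repo->version;  /* the working copy now matches the latest version */
    return CHECKIN_OK;
}
```

With the entity at version 25, two developers can both check it out; the first check-in succeeds and moves the entity to version 26, while the second is refused with `CHECKIN_CONFLICT`, which is the situation described in the text.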

22.5.1.2 Basic Commands for the CVS System

Version control has been offered as a free utility since the early days of the Unix operating system. The two most common version control systems were RCS (Revision Control System) and SCCS (Source Code Control System) (Bolinger and Bronson, 1995). Nowadays, an open source extension of RCS, CVS (Concurrent Versions System), is very popular on both Unix and Windows. A brief review of the basic commands in CVS follows. A full description of CVS can be found in (Vesperman, 2003; Bar and Fogel, 2003).

• cvs init initializes a new repository for version control.

• cvs checkout checks an entity out of the version control system and stores its status on the local machine where the developer works.

• cvs commit checks an entity back into the version control system, with updated status information.

• cvs update checks whether the local copy of an entity is the latest version and, if someone else has checked in new versions of it, merges those changes into the local copy.

• cvs remove removes an entity from the version control system (the removal takes effect at the next commit).

Entities are always files, given the Unix background of CVS. A successor to CVS, Subversion, is now growing in popularity (Collins-Sussman et al., 2003).

22.5.2 Configuration Management

There are situations where we need to track different versions of an entity, all relevant and all up to date. Imagine, for instance, that we develop a system for different hardware profiles. It is now quite common to write applications running on both laptops and PDAs; in Java, for instance, we may want to keep the bulk of the system the same across both. In addition to the issues taken care of by a version control system, we also need to ensure that the appropriate configuration of the system is always available to be checked out, so that we can promptly respond to requests for bug fixes, upgrades, and so forth. In the good old days, configuration was done within the code. In C and C++, it is possible to mark parts of the code to be included in the compiled system only if certain preprocessor symbols are defined.

[1] // Code good for any configuration
[2]
[3] #ifdef LAPTOP
[4] // Some code specific for the laptop
[5] #endif
[6]
[7] #ifdef PDA
[8] // Some code specific for the PDA
[9] #endif

Code Fragment 22.7: Managing multiple configurations within the source file.

In Code Fragment 22.7 there is a sample use of such preprocessor directives. Lines [1], [2], and [6] are shared across all the preprocessed versions of the source file. Line [4] is included in the preprocessed version only when the symbol LAPTOP is defined, and line [8] only when the symbol PDA is defined. It is up to the developer:

• To write the preprocessor directives to manage the different configurations within the same source files.

• To preprocess the source files with the appropriate definitions to ensure that the right code is produced to go to the compiler.
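To make the second task concrete, here is a small compilable variant of Code Fragment 22.7. The function names and the pixel widths are hypothetical, chosen only for illustration. The configuration is selected at preprocessing time, for example by compiling with `cc -DPDA file.c`; the `-D` option, which defines a preprocessor symbol, is accepted by common C compilers such as gcc and clang.

```c
#include <assert.h>

/* Hypothetical configuration-dependent constant, chosen by the preprocessor. */
#if defined(LAPTOP) && defined(PDA)
#error "Define at most one target configuration"
#elif defined(LAPTOP)
#define SCREEN_WIDTH 1024   /* code specific for the laptop */
#elif defined(PDA)
#define SCREEN_WIDTH 240    /* code specific for the PDA */
#else
#define SCREEN_WIDTH 800    /* default when no configuration symbol is given */
#endif

/* Code good for any configuration: it only sees the selected value. */
int screen_width(void) {
    return SCREEN_WIDTH;
}

/* Also shared: a layout computation that adapts to the selected width. */
int columns_for(int column_width) {
    assert(column_width > 0);
    return screen_width() / column_width;
}
```

Even in this tiny file, every new configuration adds another branch that each reader must mentally skip, which is the maintenance problem the text goes on to describe.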

These two tasks are pretty complex. In addition, the source code ends up containing all the possible variations, which makes it really difficult to read, debug, and upgrade. A different strategy is required: we need to use “configuration management systems.” A configuration management system tracks multiple versions and different configurations of the same code base to adapt to different situations, such as:

(a) Different target hardware: laptop vs. PDA
(b) Different operating systems
(c) Different customers with slightly different views on how the system should work
(d) …

Tracking multiple configurations of the same code base requires a lot of effort but does not pose any substantial difficulty. The strategy used by most configuration management systems is to keep a common main stream for the common part of a system and then to adopt various identification schemes to identify the parts that are specific to given configurations. While version control focuses on one stream of evolution of the code, configuration management manages multiple parallel streams of evolution of the same code base. It is possible to manage configurations with version control systems, but this is suboptimal. Since most configuration management systems are expensive, it can nevertheless be a cost-saving strategy, which may work in small companies with a very limited number of configurations to track. It is also possible to use configuration management systems for simple version control, but it is like using a bazooka to shoot a mouse. Recent configuration management systems present themselves to the developer as if they were the file system. In the case described above, the developer tells the configuration management system that she wants to work on the tool for the PDA. The configuration management system offers a tree-like view of resources that resemble files and that refer only to the parts of the tool related to the PDA. The configuration management system then takes on the duty of properly integrating the source code, putting each item in the right place.
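The idea of a common main stream plus configuration-specific parts, and of a view that shows a developer only the items of one configuration, can be sketched as follows. The item paths, the `config_item` layout, and `select_view` are hypothetical; real configuration management systems use much richer identification schemes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Configurations the system is built for; COMMON marks the shared main stream. */
typedef enum { COMMON, LAPTOP, PDA } configuration;

/* Hypothetical configuration-managed item: a path plus the configuration it belongs to. */
typedef struct {
    const char *path;
    configuration config;
} config_item;

/* Collects into `view` the items a developer working on `target` should see:
   every item of the common stream plus the items specific to `target`.
   Returns the number of items placed in `view` (at most `max_view`). */
size_t select_view(const config_item *items, size_t n, configuration target,
                   const config_item **view, size_t max_view) {
    size_t count = 0;
    for (size_t i = 0; i < n && count < max_view; i++) {
        if (items[i].config == COMMON || items[i].config == target) {
            view[count++] = &items[i];
        }
    }
    return count;
}
```

A developer asking for the PDA view of a four-item system with two common files, one laptop file, and one PDA file would see three items: the two common ones and the PDA one.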

22.5.2.1 Features of Recent Configuration Management Systems

The major features of modern configuration management systems are the following:

• Be able to identify every system item to be included in configuration management.

• Keep track of all changes made to the system, and be able to know who made the change and why.

• Provide a change policy stating who is in charge of change control, and prescribing the steps to accept and validate changes (or reject them).

• Be able to return not only the configuration of all current variations of the system, but also all the versions of each variation approved in the past.

• Manage concurrent access to system modules, avoiding inconsistencies.

• Manage dependencies among modules and versions, avoiding conflicts and enforcing proper recompilations and changes to modules affected by changes in other modules.

• Manage a system configuration repository in an efficient way.

22.5.2.2 Baselines

Look at a software shop where software is built. Developers here spend most of their time behind a display, typing at a keyboard. Every keystroke is a change, or an addition, made to the current system. Clearly, we do not want to keep track of all these changes. When working freely, a developer creates, experiments, writes, throws away parts of his work, writes again, and so on. From time to time, the object of a developer’s work reaches a stage where it can be considered finished. At this point, the item is ready to be officially declared part of the system. In configuration management (CM), a “baseline” is an element that has been formally reviewed and agreed upon, and that thereafter serves as the basis for future development. For instance, let us consider our subway system. Let us suppose that a developer is in charge of writing class TrainDisplayWindow, which displays the line and the movement of trains along it. The target operating system is Linux.

Starting from analysis requirements, design documents, and her knowledge of the window system and graphic display libraries, our developer figures out the data structure and the methods of this class. In two weeks or so of development, she writes the code, starting with a rough draft and then refining it with additions, deletions, and re-writings. Of course, the new class makes use of existing classes and libraries already “frozen” as baselines. Once our developer decides that the TrainDisplayWindow class just written is ready to be delivered, she asks for formal review. The developer(s) in charge of quality assurance review the class, test it, and if all is OK, officially approve it. The class is given a name and a version number (for instance, class TrainDisplayWindow of package UserInterface, version 0.1), and is entered into the CM system. Now it is a baseline. The new baseline is then made available to other developers, who make use of this class. Despite the review, some bugs are found. These bugs are made known to our developer, who makes the necessary corrections. A new version of the class is thus produced, reviewed, and approved. It is a new bug-fix baseline, called version 0.2. As system development keeps going, new requirements emerge, and class TrainDisplayWindow must be improved, adding new features and maybe deleting some old ones that are no longer deemed relevant. Our developer in charge of this class makes the needed changes and again submits the class to formal review. Since the new baseline is significantly different from previous ones, its version number becomes 1.0. In the meantime, a Microsoft Windows version of the subway control system needs to be developed. In this new system, our class TrainDisplayWindow needs substantial changes, since the Windows management philosophy and graphic libraries are very different from Linux. So, a variation of class TrainDisplayWindow needs to be made. 
This variation will, in turn, become a new baseline, possibly with a version number independent from that of the original Linux variation. A new family of baselines has started. Of course, design patterns and techniques to minimize code duplication among different variations should be used. The concept of baseline is very important, since it allows us to clearly separate changes that are not relevant to configuration management from changes that are. Each baseline is unequivocally identified and is the building block of the system configuration.
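The numbering in this story (0.1, then 0.2 after a bug fix, then 1.0 after a significant change) follows a common major.minor convention, and the identifier of a baseline is the identifier of its CI plus the exact version. A minimal sketch, assuming exactly that convention; the `baseline` struct, the function names, and the identifier format are hypothetical, as is the choice to restart a new variation at 0.1 (the text only says its numbering may be independent).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical baseline identifier: item name, variation, and major.minor version. */
typedef struct {
    const char *item;       /* e.g. "UserInterface.TrainDisplayWindow" */
    const char *variation;  /* e.g. "Linux" or "Windows" */
    int major;
    int minor;
} baseline;

/* Bug-fix baseline: same features, minor number incremented (0.1 -> 0.2). */
baseline bugfix_baseline(baseline b) {
    b.minor += 1;
    return b;
}

/* Significantly different baseline: major number incremented, minor reset (0.2 -> 1.0). */
baseline major_baseline(baseline b) {
    b.major += 1;
    b.minor = 0;
    return b;
}

/* New variation: starts its own family of baselines (assumed to restart at 0.1). */
baseline new_variation(baseline b, const char *variation) {
    b.variation = variation;
    b.major = 0;
    b.minor = 1;
    return b;
}

/* Writes the unique baseline identifier (CI identifier plus exact version) into buf. */
int baseline_id(baseline b, char *buf, size_t size) {
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "%s[%s]:%d.%d", b.item, b.variation, b.major, b.minor);
}
```

Starting from version 0.1 of the Linux TrainDisplayWindow, one bug fix and one significant change yield the identifier `UserInterface.TrainDisplayWindow[Linux]:1.0`.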

22.5.2.3 Configuration Items

Each element in the configuration management system is called a “configuration item” (CI). As we mentioned previously, any artifact of software production can be considered a configuration item. The granularity (size) of these items may vary significantly. A class may be a configuration item. A baseline is always a configuration item. Each CI must be uniquely identified in order to be managed in a CM system. Since a CI evolves through baselines, every baseline also needs to be uniquely identified.

Usually, the identifier of a baseline is the identifier of the corresponding CI plus the exact version number. As a matter of fact, every kind of configuration item evolves: code, documentation, analysis, design, and so on.

Figure 22.6: Different kinds of configuration items. [Diagram: a composition hierarchy. The Trading Software System has system-related artifacts (vision document, requirement documents, system architecture documents, data dictionary, user manual, …) and Subsystems 1, 2, and 3. Sub-System 3, Derivative Management, has its own artifacts (requirement documents, architecture documents, OOA documents, OOD documents, test cases, programmer manual, …) and Subsystems 3.1, 3.2, and 3.3. Sub-System 3.2, Derivative Value computation, has package architecture documents, OOA and OOD documents, test cases, and programmer documentation, and contains software modules: the packages Black-Scholes, XX, and YY, down to individual code modules and their object modules.]

Figure 22.6 shows the various kinds of configuration items present in a real system. Each configuration item may contain other configuration items, and it can have different types of relationships with other configuration items. Only composition relationships are shown here. If “transversal relationships” were shown, they would amount to a deeply intricate spaghetti web. Transversal relationships represent dependencies between high-level documentation, such as general architecture diagrams, and lower-level subsystems; between analysis diagrams and the design diagrams derived from them; between design diagrams and the source code classes and functions designed from them; between user and programmer manuals and the parts they document; and so on. Even a medium-sized software system is composed of a huge number of deeply interrelated modules. This would be an issue even if the system were frozen at a given time, so that its structure were static and only one version were present for each module. In reality, a few or many developers keep working on the system for months or years. New modules are continuously added to the system, and every module is continuously subject to change, evolving through a substantial number of revisions.
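Composition relationships like those in Figure 22.6 form a tree, which a CM tool can walk to find every item contained in a higher-level CI. A minimal sketch, assuming a fixed maximum number of parts per item; the `ci` struct, `add_part`, and `count_contained` are hypothetical. The transversal relationships mentioned in the text would need a general graph rather than a tree.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PARTS 8

/* Hypothetical configuration item that may be composed of lower-level CIs. */
typedef struct ci {
    const char *name;
    struct ci *parts[MAX_PARTS];
    size_t n_parts;
} ci;

/* Adds `part` as a component of `whole`; returns 0 on success, -1 if `whole` is full. */
int add_part(ci *whole, ci *part) {
    if (whole->n_parts == MAX_PARTS) return -1;
    whole->parts[whole->n_parts++] = part;
    return 0;
}

/* Counts all CIs transitively contained in `item`, excluding `item` itself. */
size_t count_contained(const ci *item) {
    size_t total = 0;
    for (size_t i = 0; i < item->n_parts; i++) {
        total += 1 + count_contained(item->parts[i]);
    }
    return total;
}
```

Following one branch of the figure (system, Sub-System 3, Sub-System 3.2, package Black-Scholes), the top-level system contains three CIs and the package contains none.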

22.5.2.4 Repositories of Legacy Systems and Tools

Many software organizations also place software tools under configuration control. This means that every version of a compiler, development environment, or CASE tool is identified and controlled, and old versions of the tools are kept in the system. In this way, it is possible to reproduce exactly not only every baseline, including old ones, but also the results of compiling or otherwise processing them. At this point, one could wonder why it is so important to keep track of older versions of the system. Wouldn’t it be simpler to keep only the most recent version and throw away the old ones? In fact, for most systems things are not so easy. Some customers may have upgraded their systems to new versions, and some not. There are cases in which software is part of a bigger engineering system, which is “frozen” and should not be changed, except for correcting severe errors. If a bug report originates from a customer with an old version of the system (and this may happen even years after shipping), it is important to be able to recreate the failure, and therefore to recreate the original system shipped to the customer. This is the reason for placing software tools under configuration control. All the approved CIs that constitute the system are placed in a repository, often a database management system. A CM repository must be able to:

• uniquely identify each CI, its baselines, and the developer(s) in charge of them;

• store the CIs in such a way as to minimize redundancies among versions and variations;

• manage CIs composed of other, lower-level CIs, and manage dependence relationships among CIs (for instance, dependence between a design document and the software modules described in it);

• reconstruct a given system configuration, and the history of every CI;

• manage synchronization of check-in and check-out of modules by different developers, maintaining the consistency of the repository;

• be efficiently and securely accessed through a network.

Usually CM systems store information incrementally. This means that, given a subsequent version or variation of an existing CI, only the differences between the old baseline and the new one are recorded in the repository. In this way, the redundancy of the repository is kept to a minimum. Often, many baselines depend on a single module or on a single part of a module. For instance, let us return to our TrainDisplayWindow class. Let us suppose we have three variations of it, one for Linux, one for Windows, and one for MacOS. Let us also suppose that a change is needed to the part of the graphic engine common to all variations. This change, once made and approved, should be immediately reflected in a new version of all three baselines, without the need for manually changing their code. When a CI is composed of other, lower-level CIs, its specification includes only the references to its parts and the information specific to the higher-level component. When a new baseline is added to the repository, all the CIs that could somehow be affected by the change, because they are composed of, or are related to, the changed CI, should be marked, and a message should be sent to the developers in charge of these CIs. This message signals that a new version of the module is available. The developers are left to decide whether to use the new CI, and thus create new versions of the modules that make use of or depend upon it, or to keep the old version.
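The marking step can be seen as a reachability search over the “depends on” relationships: every CI that directly or indirectly depends on the changed one is flagged, and its owner is notified. A minimal sketch using an adjacency matrix and a recursive depth-first search; `N_ITEMS`, the item numbering, and `mark_affected` are hypothetical, and notification is reduced to setting a flag.

```c
#include <assert.h>
#include <stdbool.h>

#define N_ITEMS 5

/* depends_on[a][b] is true when CI a uses (depends on) CI b.
   Marks in affected[] every CI that directly or transitively depends on `changed`.
   affected[] must be all false on the first call; it doubles as the visited set,
   so shared dependents are visited once and the search terminates even with cycles. */
void mark_affected(bool depends_on[N_ITEMS][N_ITEMS], int changed, bool affected[N_ITEMS]) {
    assert(changed >= 0 && changed < N_ITEMS);
    for (int a = 0; a < N_ITEMS; a++) {
        if (depends_on[a][changed] && !affected[a]) {
            affected[a] = true;          /* a real tool would notify the owner of CI a */
            mark_affected(depends_on, a, affected);
        }
    }
}
```

For the example in the text, let item 0 be the common graphic engine, items 1 to 3 the Linux, Windows, and MacOS variations of TrainDisplayWindow, and item 4 a module using the Linux variation: a change to item 0 marks items 1 to 4, the last one indirectly.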

22.5.2.5 The Change Control Process

In the case of a large project, controlling change is essential to the success of the project. Having a good VC/CM system in place is only the starting point for controlling change. A process is needed to state how a change request may be generated, who is in charge of what, which information needs to be recorded, and which activities and checks must be performed. Change requests may originate:

• from the customer or from the marketing department, wishing to add or to update one or more system requirements;

• from a user who detects an error in the system;

• from a developer who sees a way to improve the system;

• from the quality assurance group, requiring changes to address specific quality issues;

• from the team manager, to improve the system architecture or to address requirements coming from upper management.

The change control process first needs to identify the roles and the persons or teams in charge of performing the various requests, activities, and audits. Possible roles are:

• Evaluator: authority in charge of compiling official change requests. This is usually accomplished by a developer or a team of developers, who filter change requests coming from the sources described above.

• Change Authority: authority in charge of accepting or rejecting change requests. Usually, it is a team composed of managers, user representatives, and chief developers.

• Auditor: authority in charge of auditing changes. Its task is to evaluate and test the new baselines developed after a change request. Usually, the auditor is a team composed of chief developers and software quality assurance (SQA) people.

The change control process is usually something like the following:

1. The change request is originated by one of the sources described above.

2. The change request is evaluated by the Evaluator. If the request is deemed sensible, an official change report is written, describing the originator of the change request, the requested change, its motivation, its advantages and disadvantages, and the estimated time and cost of the change.

3. The change report is recorded.

4. The Change Authority examines the report and decides whether to approve the request.

5. The Change Authority’s decision is recorded. If the request is not approved, the originator of the change request is informed of the rejection.

6. If the change is approved, it is queued for action, taking into account the priority of the change.

7. When the change has to take place, it is assigned to one or more developers, usually those in charge of the modules affected by the change.

8. The change assignment is recorded.

9. The developers work on the system as described in the preceding sections, implementing the change.

10. When the modules related to the change are ready, the developers ask for them to be audited and approved.

11. The changed modules are recorded.

12. The Auditor reviews the changed modules, tests them, and evaluates the impact of the new modules on the system.

13. The Auditor’s decision is recorded. If a module does not pass the audit, it is sent back to the developer for further modifications and error corrections, and the process resumes at Step 9.

14. If the module passes the audit, it becomes a new baseline. Its version number is updated, and it is inserted into the CM system.
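The fourteen steps above amount to a small state machine for each change request. The sketch below encodes the main states and the two loops in the text: rejection by the Change Authority, and failed audits returning to implementation at Step 9. The state and event names are hypothetical labels for the steps, and the `DISCARDED` state for requests the Evaluator does not find sensible is an assumption; recording and priority queueing are left out.

```c
#include <assert.h>

/* States a change request passes through (hypothetical labels for the steps). */
typedef enum {
    SUBMITTED,    /* step 1: request originated */
    REPORTED,     /* steps 2-3: change report written and recorded */
    REJECTED,     /* step 5: not approved, originator informed */
    QUEUED,       /* step 6: approved, waiting according to priority */
    IN_PROGRESS,  /* steps 7-9: assigned and being implemented */
    UNDER_AUDIT,  /* steps 10-12: changed modules submitted to the Auditor */
    BASELINED,    /* step 14: audit passed, new baseline in the CM system */
    DISCARDED     /* assumption: Evaluator judged the request not sensible */
} cr_state;

typedef enum {
    EVALUATED_OK, EVALUATED_NOT_SENSIBLE, APPROVED, NOT_APPROVED,
    ASSIGNED, READY_FOR_AUDIT, AUDIT_PASSED, AUDIT_FAILED
} cr_event;

/* Returns the next state, or the current state if the event is not allowed there. */
cr_state next_state(cr_state s, cr_event e) {
    switch (s) {
    case SUBMITTED:
        if (e == EVALUATED_OK) return REPORTED;
        if (e == EVALUATED_NOT_SENSIBLE) return DISCARDED;
        break;
    case REPORTED:
        if (e == APPROVED) return QUEUED;
        if (e == NOT_APPROVED) return REJECTED;
        break;
    case QUEUED:
        if (e == ASSIGNED) return IN_PROGRESS;
        break;
    case IN_PROGRESS:
        if (e == READY_FOR_AUDIT) return UNDER_AUDIT;
        break;
    case UNDER_AUDIT:
        if (e == AUDIT_PASSED) return BASELINED;
        if (e == AUDIT_FAILED) return IN_PROGRESS;  /* back to step 9 */
        break;
    default:
        break;  /* REJECTED, BASELINED, and DISCARDED are final */
    }
    return s;
}
```

Keeping the transitions in one function makes it easy to reject out-of-order actions, such as auditing a change that was never assigned.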

The full change control process described above is clearly suited for big software projects, with many dozens, or hundreds, of developers. For smaller projects, it can be simplified, and the authorities may be composed of just one person. As explained by Bach (1998), such a big process might become mindless resistance to change, regardless of its potential reward, or it might degenerate into a ritual that allows any change to be made, as long as the ritual is honored. Moreover, the change control process might hinder the project, delaying changes for too long and constraining developers to produce too many documents to justify changes. In the end, controlling changes is crucial for every software project. The bigger the project, the more crucial it is to control its changes. The change control process, however, should be carefully engineered and tuned.

22.6 Frequent Integrations

One of the tenets of modern process control methodologies is to make quality evident to everyone. Lean management, just-in-time development, total quality control, and all these very fancy methodologies advocate making evident all sorts of situations where quality is lacking. We have already seen a variety of ways to determine when a piece of code does not conform to the overall specifications. Preconditions, postconditions, invariant assertions, and test first are very effective methods to reach such goals. A full check for conformance is far more difficult (actually, it is undecidable), but we will not discuss this now. It is not so easy to ensure that a piece of integrated code as a whole does not break the specifications of the entire system. As usual, such specifications can be defined in terms of a sequence of tests.

To check that the system as a whole satisfies the set of system tests, it is important to frequently integrate the system, and to launch a set of system tests. If the system is built using test first and daily integration, it is possible to have daily checks of the overall (non) conformance. Such daily verification of the conformance of the system adheres to the tenets of lean methodologies, which advocate a very specific control on the quality of the artifacts being produced and it is fully supported by several plan-based methodologies and most agile methodologies. There are tools that automatically launch a set of system tests after the system is integrated, and then inform the managers of possible breakage. One such tool is CruiseControl, developed by Thoughtworks (http://www.thoughtworks.com) and available free (http://cruisecontrol.sourceforge.net). The principles behind CruiseControl are well described by Fowler and Foemmel in their groundbreaking article (2003). Frequent integrations are hard to achieve in embedded systems, where integration requires you not only to assemble pieces of code but also to configure hardware components. In such cases, it is often possible to use simulators of the overall system behavior and to use the simulator for the frequent builds. Still, it is important every now and then to have complete system integration, not just a simulated integration. There are always aspects that cannot be fully captured in simulators, and it is very valuable to have such sanity checks often, well before the final release.

22.7 Tracking Bugs and Issues

Once the code is developed, it enters the testing phase. After testing, it is released to the customers. At each stage, there may be problems related to bugs, that is, behavior that is not conformant to the specifications. There may also be other issues to consider for improving the system. Bugs and issues are very important to record because they are the reference points for improving the system and making it more suited to customers’ needs. Note that bugs are a type of issue. They are, however, the first to track, and they are often the most relevant to getting the customer to accept the system. Therefore, we speak explicitly of bug tracking in addition to issue tracking.

22.7.1 Information to Track

Bugs and issues may have several attributes to record, so that they are properly described and taken care of. Bugs and issues usually refer to a variety of situations, involving different pieces of code. They are originated by different kinds of stakeholders: developers, testers, marketing staff, customer relations people, customers, and others.

They may vary in severity: a bug that completely destroys the system where it runs is different from a menu item with a typo in its label. They may also vary in priority: porting a system to a language with several potential customers may be more important than making the lemon yellow color of a background window look paler. Altogether, the situation can be complex to track. As in the other cases in this chapter, however, complex does not mean dreadfully difficult; a systematic approach helps in solving the problem. Bug tracking and issue tracking systems have been conceived with the goal of storing information on bugs or issues in a systematic way, so that the information is easy to retrieve later. In the jargon of software engineers, we say that we “serve” a bug or an issue when we are trying to resolve it. Typical information collected for bugs and issues includes:

1. When the bug or the issue entered the system.

2. Who entered it.

3. The code and the other artifacts the bug or the issue refers to. (By “other artifacts,” we mean design documents, documentation, analysis documents, test cases, and so forth.)

4. If possible, some hints on how to re-create the bug or the issue on the machines used by the developers, so that they can better understand and fix the problem.

5. The severity of the bug or the issue; a four-step ordinal scale is often used to measure severity.

6. The priority of the bug or the issue; a four-step ordinal scale is often used to measure priority.

7. The current status of servicing of the bug or the issue.

8. Whom the service of the bug or the issue has been assigned to, if any.

9. When the service of the bug or the issue was terminated.
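The nine items above map naturally onto a record, with enumerations for the four-step severity and priority scales whose example levels are given in the paragraph that follows. This is a hypothetical layout (the `issue` struct, field names, and the use of `time_t` are assumptions), together with a `qsort` comparator that orders issues for servicing. Sorting by priority first and severity second is a design choice made for the sketch, not a rule from the text.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Four-step ordinal scales; a lower value means more serious or more urgent. */
typedef enum { SEV_CRASHES_SYSTEM, SEV_APP_DOES_NOT_RUN, SEV_LOCAL_MISBEHAVIOR, SEV_COSMETIC } severity_level;
typedef enum { PRI_TOP_24_7, PRI_HIGH, PRI_BEFORE_NEXT_RELEASE, PRI_NICE_TO_HAVE } priority_level;
typedef enum { ST_WAITING, ST_ASSIGNED, ST_BEING_SERVICED, ST_IGNORED, ST_DEFERRED, ST_RESOLVED } service_status;

/* Hypothetical issue record holding the nine items of information listed above. */
typedef struct {
    time_t entered_at;            /* 1. when it entered the system */
    const char *entered_by;       /* 2. who entered it */
    const char *artifacts;        /* 3. code and other artifacts it refers to */
    const char *how_to_recreate;  /* 4. hints to re-create it, or NULL */
    severity_level sev;           /* 5. severity */
    priority_level pri;           /* 6. priority */
    service_status status;        /* 7. current status of servicing */
    const char *assigned_to;      /* 8. NULL while unassigned */
    time_t closed_at;             /* 9. 0 while servicing is not terminated */
} issue;

/* qsort comparator: higher priority first, then higher severity, then older issues first. */
int triage_order(const void *pa, const void *pb) {
    const issue *a = pa;
    const issue *b = pb;
    if (a->pri != b->pri) return a->pri < b->pri ? -1 : 1;
    if (a->sev != b->sev) return a->sev < b->sev ? -1 : 1;
    if (a->entered_at != b->entered_at) return a->entered_at < b->entered_at ? -1 : 1;
    return 0;
}
```

Calling `qsort(list, n, sizeof list[0], triage_order)` puts a top-priority system crash ahead of a top-priority local misbehavior, and both ahead of a nice-to-have cosmetic fix.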

The severity of the bug or issue is often defined on a four-step ordinal scale, such as “crashing the system,” “making the application not run,” “local misbehavior,” and “cosmetic deficiency.” The priority of the bug or issue is also often defined on a four-step ordinal scale, such as “top priority: requires working 24/7 until the problem is resolved,” “high priority: resolved during regular business hours,” “must be resolved before the next release of the system,” and “nice to have.” The status of the servicing of a bug or issue is often described with a state diagram. A simplified state diagram for the servicing of a bug or issue is shown in Figure 22.7. When a bug or issue is entered into the system, its status is “waiting to be assigned” to some developer or other resource to take care of it.

Once someone takes ownership of it, the status becomes “assigned.” That person then determines whether the bug or issue is to be considered, can be “ignored” forever, or can be “deferred” to a future time. If it is worth consideration, its status becomes “being serviced” and remains so until it has been “resolved.” It is also possible that, after an issue has been serviced for a while, a better understanding of it is gained and it is decided that the issue can be ignored or deferred for later consideration.

Figure 22.7: Typical lifecycle of the status of an issue.

The reality can be more complex than the situation presented here. More statuses and more arcs can be present, with loops between stakeholders, such as customers, testing department, marketing personnel, developers, and others.
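The simplified lifecycle of Figure 22.7, as described in the text, can be encoded as a table of allowed transitions, which a tracking tool can consult before accepting a status change. A minimal sketch; the state names mirror the text, while `allowed`, `can_move`, and the arc from “deferred” back to “waiting to be assigned” are assumptions (the text only says deferred issues are reconsidered at a future time).

```c
#include <assert.h>
#include <stdbool.h>

/* States of the simplified lifecycle described for Figure 22.7. */
typedef enum {
    WAITING_TO_BE_ASSIGNED, ASSIGNED, BEING_SERVICED, IGNORED, DEFERRED, RESOLVED, N_STATES
} issue_state;

/* allowed[from][to] is true for each arc of the lifecycle; all other entries are false. */
static const bool allowed[N_STATES][N_STATES] = {
    [WAITING_TO_BE_ASSIGNED] = { [ASSIGNED] = true },
    [ASSIGNED]       = { [BEING_SERVICED] = true, [IGNORED] = true, [DEFERRED] = true },
    [BEING_SERVICED] = { [RESOLVED] = true, [IGNORED] = true, [DEFERRED] = true },
    [DEFERRED]       = { [WAITING_TO_BE_ASSIGNED] = true },  /* assumption: reconsidered later */
};

/* Returns true if the status of an issue may move from `from` to `to`. */
bool can_move(issue_state from, issue_state to) {
    assert(from < N_STATES && to < N_STATES);
    return allowed[from][to];
}
```

Adding the extra statuses and loops that the text mentions for real organizations only requires new enumerators and new `true` entries in the table.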

22.7.2 Issue and Bug Tracking Systems

Tracking bugs and issues can be done in a variety of ways. It is possible to send e-mail to developers. It is also possible to compile a written list of problems to consider and store it in a database. Because the information to consider can be substantial, it is important to proceed in a systematic way. Modern bug tracking systems help to track all the information related to bugs and issues and to access it in a convenient way, even via web interfaces. There are very popular open source ones, such as Bugzilla (Barnson and Steenhagen, 2003) and the new, emerging Scarab (Scarab Web Site). Bugzilla is used to track the development of many large open source projects, which suggests that it is quite adequate. Tracking the arrival and closure times of bugs can also be useful to predict the overall reliability of the system.

22.7.3 Using Bugs to Predict Reliability

Bugs often cause failures of the system. Various methods for software reliability management and control operate on data describing the time between failures, the failure rate, or the cumulative count of failures over time (Littlewood, 1981). A Software Reliability Growth Model (SRGM) is a formal equation describing the cumulative number of errors discovered over time.

Based on the error-detection rate, SRGMs can be classified as concave or S-shaped (Figure 22.8). S-shaped models start with a convex shape, reflecting the initial learning phase during which the detection rate increases (Lyu, 1996). The convex shape then gradually becomes concave as time progresses. Both types of models assume a finite number of bugs in the software product. As most of the errors are detected and the product reaches a stable state, the error-detection rate decreases in both types of SRGMs.

Figure 22.8: Concave and S-shaped SRGMs. [Two panels, each plotting bugs discovered against time: one concave curve and one S-shaped curve.]
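For concreteness, two well-known representatives of these families, not named in the text and shown here only as examples, are the Goel-Okumoto model (concave) and the delayed S-shaped model. Here m(t) is the expected cumulative number of bugs discovered by time t, a is the total (finite) number of bugs, and b > 0 is a detection-rate parameter. The derivatives show the behavior described above: the concave model’s detection rate decreases from the start, while the S-shaped model’s rate rises until t = 1/b and then falls.

```latex
\begin{aligned}
m_{\text{concave}}(t) &= a\left(1 - e^{-bt}\right), &
m'_{\text{concave}}(t) &= a\,b\,e^{-bt} \\
m_{\text{S}}(t) &= a\left(1 - (1 + bt)\,e^{-bt}\right), &
m'_{\text{S}}(t) &= a\,b^{2}\,t\,e^{-bt}
\end{aligned}
```

In both cases m(t) approaches a as t grows, consistent with the assumption of a finite number of bugs.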

By fitting known SRGM equations to the arrival times of bugs, it is possible to predict the future arrival of bugs and when the system is likely to be bug free. A solid prediction of when the system will be bug free requires multiple equations and, clearly, a very solid bug tracking system (Succi et al., 2003).
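As an illustration of such a fit, the sketch below estimates the concave (Goel-Okumoto) model from cumulative bug counts per period by least squares. It is a deliberately simple method assumed for illustration, not the procedure of Succi et al. (2003). Time is measured in whole periods (for example, weeks), and the model is written as m(k) = a(1 - r^k) with r = e^{-b}, which avoids the math library. For each candidate r on a grid, the best a has a closed form, and the pair with the smallest squared error wins. The function names and the grid size are hypothetical.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Discrete-time Goel-Okumoto fit: m(k) = a * (1 - r^k), with r = e^{-b}. */
typedef struct { double a, r, sse; } srgm_fit;

/* r^k for a non-negative integer k, by repeated multiplication. */
double power_int(double r, int k) {
    double p = 1.0;
    for (int i = 0; i < k; i++) p *= r;
    return p;
}

/* cum_bugs[k-1] is the number of bugs found by the end of period k.
   Grid search over r in (0, 1); for a fixed r the model is linear in a,
   so the least-squares a is sum(y*g) / sum(g*g) with g = 1 - r^k. */
srgm_fit fit_goel_okumoto(const double *cum_bugs, int periods, int grid) {
    assert(periods > 0 && grid > 1);
    srgm_fit best = {0.0, 0.0, DBL_MAX};
    for (int j = 1; j < grid; j++) {
        double r = (double)j / grid;
        double syg = 0.0, sgg = 0.0;
        for (int k = 1; k <= periods; k++) {
            double g = 1.0 - power_int(r, k);
            syg += cum_bugs[k - 1] * g;
            sgg += g * g;
        }
        double a = syg / sgg;  /* sgg > 0 because 0 < r < 1 */
        double sse = 0.0;
        for (int k = 1; k <= periods; k++) {
            double e = cum_bugs[k - 1] - a * (1.0 - power_int(r, k));
            sse += e * e;
        }
        if (sse < best.sse) { best.a = a; best.r = r; best.sse = sse; }
    }
    return best;
}

/* Expected number of bugs not yet found after `periods` periods: a - m(k) = a * r^k. */
double remaining_bugs(srgm_fit f, int periods) {
    return f.a * power_int(f.r, periods);
}
```

The estimate of a is the model’s guess at the total number of bugs, so `remaining_bugs` is what the text calls predicting when the system is likely to be bug free: one can look for the first period in which it drops below, say, one bug. Real data are noisy, which is why the text recommends comparing several equations.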

22.8 Constantly Improving the Code – a.k.a. “Refactoring”
There are times when you simply want the code working and out the door. Suppose that tomorrow is the deadline for a major delivery. You have just discovered that your banking system wrongly assumes that $1 US is always equal to $1.40 CDN. However, the market fluctuates; now $1 US is worth $1.35 CDN, and you badly need a different conversion rate inside the system. To get everything working, you simply scan the code, replace all the occurrences of the factor 1.4 with 1.35, and keep your fingers crossed. Clearly, this does not make your system any better. However, it saves your job, and that is a good thing. After taking this risk, as soon as you have some spare time, you factor out the exchange rate into a global variable that is constantly updated through a connection to a web site that publishes such information. You have improved your code, so that in the future you will no longer have to worry about it.
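Here is a minimal sketch of the cleaned-up version. It uses hypothetical names (ExchangeRate, update, usdToCad) and leaves out the web-site connection, which the text does not specify. The conversion factor now lives in exactly one place, and whatever feeds it calls a single update method.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical holder for the USD->CAD rate. Before the refactoring the literal 1.4
// was scattered through the code; now every conversion reads the rate from here.
final class ExchangeRate {
    private static final AtomicReference<BigDecimal> USD_TO_CAD =
            new AtomicReference<>(new BigDecimal("1.40"));

    private ExchangeRate() { }

    // Called by whatever component fetches fresh rates (not shown).
    static void update(BigDecimal newRate) {
        if (newRate.signum() <= 0) {
            throw new IllegalArgumentException("rate must be positive");
        }
        USD_TO_CAD.set(newRate);
    }

    // Converts an amount in US dollars to Canadian dollars, rounded to cents.
    static BigDecimal usdToCad(BigDecimal usd) {
        return usd.multiply(USD_TO_CAD.get()).setScale(2, RoundingMode.HALF_EVEN);
    }
}
```

The static field plays the role of the global variable mentioned above. BigDecimal avoids binary rounding errors on money, and AtomicReference makes an update from a background refresher visible to all threads.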

22.8.1 Problems When Improving the Code
In this simple example, there are a couple of potentially dangerous issues. First, the developer needs some spare time to make the improvement, which may be a challenge. Second, you need to ensure that while you improve the system, you do not introduce any additional mistakes.

Getting the time to fix the issue may be a challenge. Preventive maintenance should be common practice in your organization. It may happen at the end of the development of the product, when the product is out the door and you do your final cleanup. It may also happen earlier, during development: you may determine that there is a better, more effective way to handle a piece of code, so that restructuring it ends up saving your team time and effort. Striving for quality is a must, and time should be reserved for restructuring the system to make it easier to handle.

Such restructuring is often called “refactoring.” Formally stated, refactoring is the process of modifying the code base without altering its external behavior, with the goal of improving its quality. We need to improve the code without introducing bugs, and having a thorough set of tests contributes to this goal. Remember that tests can show only the presence of bugs, never their absence. Still, a large, smartly designed test base helps identify bugs that may be introduced while refactoring the code.

Refactoring is an investment made now in the expectation of future benefits. As such, refactoring adheres to the principles of plan-based software engineering. However, refactoring is also essential for agile methodologies. Because agile methodologies advocate a step-wise elicitation of requirements, it must be possible to adapt the code base to the evolving desires of the customer. The code base therefore needs to be as simple and well designed as possible, and refactoring is a prerequisite to achieving this goal.

22.8.2 Example of Refactoring
There are rules governing refactoring, but we will not analyze them now. Rather, we will use one of our examples to show concrete cases of refactoring, presenting the process on the Subway Control System (SCS) case study. SCS is built upon NHD Toolkit, a Java class library for traffic control, so it is essential to understand NHD Toolkit first. In this example we use Java; the process would be essentially the same in C++.

22.8.2.1 Introduction to NHD Toolkit
NHD Toolkit is a special-purpose Java class library for traffic control. The Verona City Subway System uses NHD Toolkit to implement the SCS. In this section, we introduce the toolkit. Figure 22.9 shows its class diagram.

Figure 22.9: Class Diagram of NHD Toolkit (classes Locatable, RealTime, Station, Train, Signal, SensorListener, Door, and the singleton SCS, described below, plus Operator with getRole, getUserID, authenticate, and isAuthenticated, and TimeUtil with currentTimestamp; enumerations SignalStatus {GREEN, RED}, SensorEvent {TRAIN_ARRIVING, TRAIN_LEFT}, OpMode {MONITOR, SIMULATION}, and Direction {FRO, TO}).

Every public attribute or association indicates that there is an accessor (getter) and a mutator (setter) for it. For example, the Train class has a directed association direction with the Direction class; this means that Train has the methods getDirection and setDirection. The following tables describe the attributes and methods of each class.

Locatable
This is the base class for all the objects that have an ID and a location (coordinate).
Attributes
  coord: The position of the object, in meters.
Methods
  getDistanceTo: Calculates the distance from this object to another locatable object.
  getID: Returns the ID of this object.
  objectsAt: Searches for the objects located at a certain position.
  objectWithID: Returns the object with a given ID.

RealTime
This is the base class for the objects that perform real-time operations. RealTime is a subclass of Locatable.
Methods
  start: Called when the system starts. A timestamp is passed to this method.
  timeElapsed: Invoked each time a time quantum passes. In this example, the time quantum is 0.1 second. Two timestamps are passed to this method: the current time, and the last time this method was invoked.

Signal
This represents the control signals of the subway system. This is a subclass of Locatable.
Associations
  status: The status of the signal. The value can be either SignalStatus.GREEN or SignalStatus.RED.

Station
This class describes a subway station. This is a subclass of RealTime.
Associations
  protectSignal: Retrieves/sets the protection signal of the station (one for each direction).
  startSignal: Retrieves/sets the start signal of the station (one for each direction).
Methods
  getName: Returns the name of the station.
  isTerminal: Determines whether or not the station is a terminal station.
  getTrains: Returns the trains that are currently at the station (the head of the train has passed the protection signal, and the tail has not passed the start signal).
  addSensorListener, removeSensorListener: Add/remove a sensor listener to/from the station. When a train arrives at or leaves the station, the sensor listeners are notified of the event.
  getNextStation, getPrevStation: Get the next and the previous station.
  getProtectSignal, setProtectSignal, getStartSignal, setStartSignal: Get/set the protection or start signal of the station for a given direction.

SensorListener
The station’s sensor notifies the sensor listeners when a train enters or leaves the station.
Methods
  trigger: Called when a train enters or leaves a station. The station that fires the event, the train that triggers it, and the event type are passed to this method. The event type can be either SensorEvent.TRAIN_ARRIVING or SensorEvent.TRAIN_LEFT.

Train
The base class for the trains. This class is a subclass of RealTime.
Attributes
  acceleration: The acceleration rate of the train, in km/min².
Associations
  direction: The direction the train is heading. The value can be either Direction.FRO or Direction.TO.
Methods
  getSpeed: Returns the current speed of the train, in km/h.
  move: Called when a time quantum has passed. All subclasses of Train must implement this method to specify the behavior of a specific kind of train.
  getLastStation: Returns the station the train last visited. When the train’s head passes the protection signal of a station, this value is set to the station the train has just entered.
  getLength: Returns the length of the train, in meters.
  reverse: Initiates the reverse operation. The train is brought to the other side of the terminal station, where it waits for the green signal.
  defineDoor: A factory method that creates a door of the train.

Door
This class describes the behavior of the doors on the train.
Methods
  close: Closes the door. In the simulation, there is a 10% probability that the door will get jammed. If the door is not jammed, it closes in 1 to 2.8 seconds.
  open: Opens the door immediately.
  isClosed: Determines whether or not the door is closed.

SCS
A singleton object that represents the Subway Control System.
Associations
  opMode: The operation mode of the system. This can be either OpMode.MONITOR or OpMode.SIMULATION.
Methods
  start: Starts the system.
  stop: Stops the system.
  addRealTime, removeRealTime: Add/remove a real-time object to/from the system.
  getInstance: Returns the singleton instance of the Subway Control System.
  reset: Removes all the real-time objects and stops the system.

In the following example, we use NHD Toolkit to implement the subway control system. We will implement the following functionalities and, along the way, show how to refactor the code to make it better:

1. Change the signals.
2. Close the doors and start the train when the signal turns green.

22.8.2.2 Basics
There are four essential steps to refactor the code:

1. Write a test.
2. Write the program to pass the test.
3. Refactor the program.
4. Make sure the tests still pass.

To make the program testable, we set up a simple subway system, which we use throughout this example (Figure 22.10). The code in Code Fragment 22.8 creates it. Note that this subway system exists only in the test code, and that the code in Code Fragment 22.8 is not final: as you will see in the example, we will extend the functionality of the default Station class.

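Figure 22.10: The Simple Subway System for Testing (three stations on a line running from 0 m to 4,250 m, with directions TO and FROM: 1st Street at 65 m, with signals at 0 m and 130 m; Central Square at 2,110 m, with signals at 2,050 m and 2,170 m; Endever at 4,200 m, with signals at 4,150 m and 4,250 m. Each station has a protection signal (P) and a start signal (S) for each direction).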

package verona.subway;

import nhd.*;

public class SimpleSCS {
    public static SCS createSimpleSCS() {
        SCS scs = SCS.createInstance();
        scs.addRealTime(createStation("S01", "1st Street", 65, 0, 130));
        scs.addRealTime(createStation("S02", "Central Square", 2110, 2050, 2170));
        scs.addRealTime(createStation("S03", "Endever", 4200, 4150, 4250));
        return scs;
    }

    public static Station createStation(String id, String name,
            float coord, float start, float end) {
        Station station = new Station(id, coord, name);
        // One signal per direction at each end of the platform
        Signal s1 = new Signal(id + "1", start);
        Signal s2 = new Signal(id + "2", end);
        Signal s3 = new Signal(id + "3", start);
        Signal s4 = new Signal(id + "4", end);
        // TO direction: protection signal at the lower coordinate, start signal at the higher one
        station.setProtectSignal(Direction.TO, s1);
        station.setStartSignal(Direction.TO, s2);
        // FRO direction: the roles of the two ends are swapped
        station.setStartSignal(Direction.FRO, s3);
        station.setProtectSignal(Direction.FRO, s4);
        return station;
    }
}

Code Fragment 22.8: Snippet to Create A Simple Subway System.

22.8.2.3 Changing Signals
First, let’s list the rules for the signals:

1. When a train approaches a station, the protection signal turns red.
2. When a train has left a station, the start signal turns red and the protection signal turns green.
3. When a train stops at a station, the start signal turns green after the protection signal of the next station is green and a certain amount of time has passed.

22.8.2.3.1 Rule 1
How do we write a test to ascertain that when a train approaches a station, the protection signal turns red? We can put a train at the protection signal of Central Square Station (ID: S02) and check whether the signal turns red shortly after the system has started. We start by creating a special subclass of Train, FastTrain (Code Fragment 22.9). (The trains in Verona are always 100 meters long ☺.)

package verona.subway;

import java.sql.Timestamp;
import nhd.*;

public class FastTrain extends Train {
    public FastTrain(String id, Station station, Direction dir) {
        super(id, station, dir, 100);   // every Verona train is 100 m long
        setAcceleration(360);           // km/min², i.e., 100 m/s²
    }

    // No extra behavior is needed for the tests
    public void move(Timestamp time, Timestamp lastTimestamp) { }

    public void start(Timestamp time) { }
}

Code Fragment 22.9: Structure of the class FastTrain.

This kind of train moves so fast that it travels approximately 50 meters in the first second after it starts, and more than 10 meters in the first half second. With FastTrain, we spend less time running the tests. Now we can write the logic of the tests (Code Fragment 22.10).
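These figures follow from motion from rest under constant acceleration, d = at²/2. The stand-alone check below uses a hypothetical helper, FastTrainKinematics, which is not part of NHD Toolkit. It assumes the km/min² unit documented for Train.acceleration and a train that starts from rest.

```java
// Hypothetical helper: distance covered from rest under constant acceleration, d = a t^2 / 2.
// Train.acceleration is documented in km/min^2, so it is converted to m/s^2 first.
final class FastTrainKinematics {
    private FastTrainKinematics() { }

    static double kmPerMin2ToMPerS2(double kmPerMin2) {
        return kmPerMin2 * 1000.0 / (60.0 * 60.0); // 1 km = 1000 m, 1 min^2 = 3600 s^2
    }

    static double distanceMeters(double accelKmPerMin2, double seconds) {
        double a = kmPerMin2ToMPerS2(accelKmPerMin2);
        return 0.5 * a * seconds * seconds;
    }
}
```

With the value 360 passed to setAcceleration, a = 100 m/s², so the train covers 50 m after 1 s and 12.5 m after 0.5 s. After the 0.2 s wait used in the Rule 1 test, it has covered about 2 m, which is enough to move a train placed exactly on the protection signal past it.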

package verona.subway;

import junit.framework.TestCase;
import nhd.*;

public class SignalTest extends TestCase {
    private SCS simple;

    public SignalTest(String name) {
        super(name);
    }

    protected void setUp() throws Exception {
        simple = SimpleSCS.createSimpleSCS();
    }

    public void testSignalRule1() {
        Station s02 = (Station) Locatable.objectWithID("S02");
        Signal pSignal = s02.getProtectSignal(Direction.TO);
        // Place a fast train exactly at the protection signal
        Train t = new FastTrain("TrainSR1", s02, Direction.TO);
        t.setCoord(pSignal.getCoord());
        simple.addRealTime(t);
        simple.start();
        try {
            Thread.sleep(200);   // let the train run for 0.2 s
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        simple.stop();
        assertEquals(SignalStatus.RED, pSignal.getStatus());
    }
}

Code Fragment 22.10: Test cases for the system.

In the test in Code Fragment 22.10, we put a fast train at the protection signal of Central Square Station. After 200 milliseconds (0.2 second), the train should have passed the protection signal, and the signal should be red. As expected, this test fails because we have not yet implemented the logic to change the signal. The simplest way to implement it is to add a sensor listener to each station: whenever a train arrives, the sensor listener changes the protection signal. Because every station needs this sensor listener, it is easiest to create a subclass of the default station (Code Fragment 22.11).

public class VeronaStation extends Station {
    public VeronaStation(String id, float coord, String name) {
        super(id, coord, name);
        addSensorListener(new SensorListener() {
            public void trigger(Station station, Train train, SensorEvent event) {
                // Some actions when a train leaves or enters
            }
        });
    }
}

Code Fragment 22.11: Structure of subclass VeronaStation.

We add an anonymous sensor listener to the station in the constructor, so when a VeronaStation is instantiated, the sensor listener is installed automatically. Now we can fill in the actions to take when the event is triggered (Code Fragment 22.12). In VeronaStation’s constructor:

addSensorListener(new SensorListener() {
    public void trigger(Station station, Train train, SensorEvent event) {
        if (event.equals(SensorEvent.TRAIN_ARRIVING)) {
            // Turn red the protection signal for the arriving train's direction
            Direction d = train.getDirection();
            Signal pSignal = getProtectSignal(d);
            pSignal.setStatus(SignalStatus.RED);
        }
    }
});

Code Fragment 22.12: Addition to an anonymous sensor listener.

We also need to update the code that creates our simple subway system (Code Fragment 22.13). In SimpleSCS’s createStation, the line

Station station = new Station(id, coord, name);

becomes

VeronaStation station = new VeronaStation(id, coord, name);

Code Fragment 22.13: Update of the process when creating the simple subway system.

We can pass the test now. Next, we look at the code we have and ask ourselves: does it have some bad smell? Yes, it does. In our simple subway system, all the stations are created via the createStation method of SimpleSCS. This is a factory method that instantiates a VeronaStation, so it belongs in VeronaStation. We can move the createStation method from SimpleSCS to VeronaStation (Code Fragment 22.14). Before the move:

public class SimpleSCS {
    public static SCS createSimpleSCS() {
        SCS scs = SCS.createInstance();
        scs.addRealTime(createStation("S01", "1st Street", 65, 0, 130));
        ...
    }

    public static Station createStation(String id, String name,
            float coord, float start, float end) {
        ...
    }
}

After the move:

public class SimpleSCS {
    public static SCS createSimpleSCS() {
        SCS scs = SCS.createInstance();
        scs.addRealTime(VeronaStation.createStation("S01", "1st Street", 65, 0, 130));
        ...
    }
}

public class VeronaStation extends Station {
    ...
    public static Station createStation(String id, String name,
            float coord, float start, float end) {
        ...
    }
}

Code Fragment 22.14: Moving the method createStation.

The test still passes, so we are confident that the refactoring does no harm to our program. We now have two options to make the code better:

1. Put the code of createStation in the constructor. To do so, we need to add two more parameters (start and end) to the constructor. After the refactoring, we can only use “new” to create a new instance of VeronaStation. Code Fragment 22.15 shows the SimpleSCS code after this refactoring.

public class SimpleSCS {
    public static SCS createSimpleSCS() {
        SCS scs = SCS.createInstance();
        scs.addRealTime(new VeronaStation("S01", "1st Street", 65, 0, 130));
        ...
    }
}

Code Fragment 22.15: Refactoring of the constructor.

2. Change the visibility of the constructor of VeronaStation to protected or private. After this refactoring, VeronaStation can only be instantiated via the factory method.

Each option gives the code a different shape. In this example, we use the second option, because factory methods are usually an elegant way to instantiate objects.
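The second option can be shown in isolation with a hypothetical, self-contained stand-in, StationSketch, which is not NHD's Station and records only signal positions. The constructor is private, so the static factory is the only way to obtain an instance, and every instance therefore has its signals set up. The argument check in the factory is our own addition and shows one more benefit of funneling creation through a single method.

```java
// Hypothetical stand-in for VeronaStation: the private constructor prevents callers
// from creating a station whose signals have not been configured.
final class StationSketch {
    final String id;
    final String name;
    final float coord;
    private float protectSignalAt;
    private float startSignalAt;

    private StationSketch(String id, String name, float coord) {
        this.id = id;
        this.name = name;
        this.coord = coord;
    }

    // Factory method: the only entry point, so every instance is fully configured.
    static StationSketch createStation(String id, String name, float coord,
                                       float start, float end) {
        if (start > coord || coord > end) {
            throw new IllegalArgumentException("station must lie between its signals");
        }
        StationSketch s = new StationSketch(id, name, coord);
        s.protectSignalAt = start; // TO direction: protection signal at the lower coordinate
        s.startSignalAt = end;     // TO direction: start signal at the higher coordinate
        return s;
    }

    float protectSignalAt() { return protectSignalAt; }

    float startSignalAt() { return startSignalAt; }
}
```

The first option would instead make the constructor public and move the signal setup into it; both designs keep station creation in one place.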

Always remember to run the test to make sure that refactoring does not break anything.

22.8.2.3.2 Rule 2
To test rule 2, we can put the fast train at Central Square and check the status of the signals after some time. It takes a while before the train passes the start signal, so we have to wait longer in this test (Code Fragment 22.16). In the body of SignalTest:

public void testSignalRule2() {
    Station s02 = (Station) Locatable.objectWithID("S02");
    Signal pSignal = s02.getProtectSignal(Direction.TO);
    Signal sSignal = s02.getStartSignal(Direction.TO);
    Train t = new FastTrain("TrainSR1", s02, Direction.TO);
    simple.addRealTime(t);
    simple.start();
    try {
        Thread.sleep(1800);   // give the train time to pass the start signal
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
    }
    simple.stop();
    assertEquals(SignalStatus.GREEN, pSignal.getStatus());
    assertEquals(SignalStatus.RED, sSignal.getStatus());
}

Code Fragment 22.16: Testing Rule 2.

The test fails: the protection signal is expected to be green, but it is red. We modify the sensor listener of the station so that when a train leaves, the protection signal turns green (Code Fragment 22.17).

public void trigger(Station station, Train train, SensorEvent event) {
    if (event.equals(SensorEvent.TRAIN_ARRIVING)) {
        Direction d = train.getDirection();
        Signal pSignal = getProtectSignal(d);
        pSignal.setStatus(SignalStatus.RED);
    } else if (event.equals(SensorEvent.TRAIN_LEFT)) {
        Direction d = train.getDirection();
        Signal pSignal = getProtectSignal(d);
        pSignal.setStatus(SignalStatus.GREEN);
    }
}

Code Fragment 22.17: Structure of trigger().

We re-run the test and see that we have solved the first problem. However, the start signal does not turn red, so we modify the code once more (Code Fragment 22.18).

public void trigger(Station station, Train train, SensorEvent event) {
    if (event.equals(SensorEvent.TRAIN_ARRIVING)) {
        Direction d = train.getDirection();
        Signal pSignal = getProtectSignal(d);
        pSignal.setStatus(SignalStatus.RED);
    } else if (event.equals(SensorEvent.TRAIN_LEFT)) {
        Direction d = train.getDirection();
        Signal pSignal = getProtectSignal(d);
        Signal sSignal = getStartSignal(d);
        pSignal.setStatus(SignalStatus.GREEN);
        sSignal.setStatus(SignalStatus.RED);
    }
}

Code Fragment 22.18: New version of trigger().

The test shows a green bar, so it is time to refactor. Looking at the code we just added, we find duplicated fragments in the sensor listener code of the station. We can consolidate the duplicate conditional fragments (Code Fragment 22.19). Before:

public void trigger(Station station, Train train, SensorEvent event) {
    if (event.equals(SensorEvent.TRAIN_ARRIVING)) {
        Direction d = train.getDirection();
        Signal pSignal = getProtectSignal(d);
        pSignal.setStatus(SignalStatus.RED);
    } else if (event.equals(SensorEvent.TRAIN_LEFT)) {
        Direction d = train.getDirection();
        Signal pSignal = getProtectSignal(d);
        Signal sSignal = getStartSignal(d);
        pSignal.setStatus(SignalStatus.GREEN);
        sSignal.setStatus(SignalStatus.RED);
    }
}

After:

public void trigger(Station station, Train train, SensorEvent event) {
    Direction d = train.getDirection();
    Signal pSignal = getProtectSignal(d);
    if (event.equals(SensorEvent.TRAIN_ARRIVING)) {
        pSignal.setStatus(SignalStatus.RED);
    } else if (event.equals(SensorEvent.TRAIN_LEFT)) {
        Signal sSignal = getStartSignal(d);
        pSignal.setStatus(SignalStatus.GREEN);
        sSignal.setStatus(SignalStatus.RED);
    }
}

Code Fragment 22.19: Refactoring of trigger().
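To see why a passing test suite gives confidence here, the following self-contained model reproduces both versions of trigger() and replays the same events through each. Everything in it is a hypothetical stand-in for NHD's types: the enums TrackDir, SensorEvt, and Status, and the classes Sig and StationModel. The refactoring is behavior-preserving if every snapshot of the four signals matches.

```java
import java.util.EnumMap;
import java.util.Map;

enum TrackDir { TO, FRO }
enum SensorEvt { TRAIN_ARRIVING, TRAIN_LEFT }
enum Status { GREEN, RED }

// Hypothetical signal: just a mutable status, initially green.
final class Sig {
    Status status = Status.GREEN;
}

// Hypothetical station with one protection and one start signal per direction.
final class StationModel {
    private final Map<TrackDir, Sig> protect = new EnumMap<>(TrackDir.class);
    private final Map<TrackDir, Sig> start = new EnumMap<>(TrackDir.class);

    StationModel() {
        for (TrackDir d : TrackDir.values()) {
            protect.put(d, new Sig());
            start.put(d, new Sig());
        }
    }

    Sig getProtectSignal(TrackDir d) { return protect.get(d); }

    Sig getStartSignal(TrackDir d) { return start.get(d); }

    // Shape of Code Fragment 22.18: the lookups of d and pSignal are duplicated.
    void triggerBefore(TrackDir trainDir, SensorEvt event) {
        if (event == SensorEvt.TRAIN_ARRIVING) {
            TrackDir d = trainDir;
            Sig pSignal = getProtectSignal(d);
            pSignal.status = Status.RED;
        } else if (event == SensorEvt.TRAIN_LEFT) {
            TrackDir d = trainDir;
            Sig pSignal = getProtectSignal(d);
            Sig sSignal = getStartSignal(d);
            pSignal.status = Status.GREEN;
            sSignal.status = Status.RED;
        }
    }

    // Shape of Code Fragment 22.19: the common fragment is hoisted above the branches.
    void triggerAfter(TrackDir trainDir, SensorEvt event) {
        TrackDir d = trainDir;
        Sig pSignal = getProtectSignal(d);
        if (event == SensorEvt.TRAIN_ARRIVING) {
            pSignal.status = Status.RED;
        } else if (event == SensorEvt.TRAIN_LEFT) {
            Sig sSignal = getStartSignal(d);
            pSignal.status = Status.GREEN;
            sSignal.status = Status.RED;
        }
    }

    // Text snapshot of all four signals, used to compare the two versions.
    String snapshot() {
        StringBuilder sb = new StringBuilder();
        for (TrackDir d : TrackDir.values()) {
            sb.append(d).append(':').append(getProtectSignal(d).status)
              .append('/').append(getStartSignal(d).status).append(' ');
        }
        return sb.toString();
    }
}
```

The real test suite plays the same role against the real classes; the model only makes the "same inputs, same outputs" argument explicit.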

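22.8.2.3.3 Rule 3
We cannot use the fast train for rule 3. We need a train that stops at a station so that we can test the station’s signals; a stopped train is more useful in this test (Code Fragment 22.20). The three-argument constructor, which the test below uses, defaults to the standard 100-meter length.

package verona.subway;

import java.sql.Timestamp;
import nhd.*;

public class StoppedTrain extends Train {
    public StoppedTrain(String id, Station station, Direction dir, float length) {
        super(id, station, dir, length);
        setAcceleration(0);   // the train never moves
    }

    public StoppedTrain(String id, Station station, Direction dir) {
        this(id, station, dir, 100);   // Verona trains are 100 m long
    }

    public void move(Timestamp time, Timestamp lastTimestamp) { }

    public void start(Timestamp time) { }
}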

Code Fragment 22.20: Structure of StoppedTrain.

We need to specify the amount of time a train stops at a station. In Verona City, each station has a different stop time, so we add the stop time as an attribute of VeronaStation (Code Fragment 22.21). In the body of VeronaStation:

// stop times in milliseconds, one per direction
private long _toStopTime, _froStopTime;

public static Station createStation(String id, String name, float coord,
        float start, float end, long toStopTime, long froStopTime) {
    VeronaStation station = new VeronaStation(id, coord, name);
    station._froStopTime = froStopTime;
    station._toStopTime = toStopTime;
    ...
}

public long getStopTime(Direction dir) {
    if (dir.equals(Direction.TO)) {
        return _toStopTime;
    } else {
        return _froStopTime;
    }
}

Code Fragment 22.21: Revision of subclass VeronaStation.

We initialize our simple subway system so that each station has a stop time of three seconds. Now we are ready to write the test: we put a StoppedTrain at Central Square Station and make sure the signal changes after the stop time has passed (Code Fragment 22.22).

public void testSignalRule3() {
    VeronaStation s02 = (VeronaStation) Locatable.objectWithID("S02");
    Signal sSignal = s02.getStartSignal(Direction.TO);
    // Wait for the stop time plus a 300 ms margin
    long waitTime = s02.getStopTime(Direction.TO) + 300;
    Train t = new StoppedTrain("TrainSR3", s02, Direction.TO);
    simple.addRealTime(t);
    simple.start();
    try {
        Thread.sleep(waitTime);
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
    }
    simple.stop();   // stop before asserting, as in the other tests
    assertEquals(SignalStatus.GREEN, sSignal.getStatus());
}

Code Fragment 22.22: Test for signal rule 3.

Of course, this test fails. Now we implement the logic that turns on the expected signals. The signal-switching logic behind the scenes is a little complicated, so it is a good idea to think it through before we start. Figure 22.11 shows the state chart for the signals and events. Dashed arrows are the transitions we have already implemented; now we need to implement the others.

Figure 22.11: State Chart for Signals and Events (states Empty, TrainArriving, CountDown, TrainDeparting, WaitingClearance, and WaitingForTrain, each showing its start (S) and protection (P) signals, connected by the transitions Train Arriving, Train Stopped, Time Up, Next Station Green, and Train Left).

We add some code to VeronaStation so that it knows its current state for each direction (Code Fragment 22.23).

Revision of the class VeronaStation:

private int _toState = 0, _froState = 0;

private void setState(Direction dir, int newState) {
    if (dir.equals(Direction.TO)) {
        _toState = newState;
    } else {
        _froState = newState;
    }
}

private int getState(Direction dir) {
    if (dir.equals(Direction.TO)) {
        return _toState;
    } else {
        return _froState;
    }
}

Code Fragment 22.23: Revision of class VeronaStation.

We can now write a first draft of the method timeElapsed() (Code Fragment 22.24). In the class VeronaStation:

public void timeElapsed(Timestamp time, Timestamp lastTimestamp) {
    super.timeElapsed(time, lastTimestamp);
    //process state change for the TO side
    int toState = getState(Direction.TO);
    if (toState == 0) {
        //wait for a train to stop
        //change state to 1
    } else if (toState == 1) {
        //wait for time up
        //change state to 2
    } else if (toState == 2) {
        //wait for next station's signal
        //turn start signal to green
        //change state to 0
    }
    //process state change for the FRO side
    int froState = getState(Direction.FRO);
    if (froState == 0) {
        //wait for a train to stop
        //change state to 1
    } else if (froState == 1) {
        //wait for time up
        //change state to 2
    } else if (froState == 2) {
        //wait for next station's signal
        //turn start signal to green
        //change state to 0
    }
}

Code Fragment 22.24: Structure of method timeElapsed().

We can see two segments of code that are almost the same. This is because each station has two sides, and each side uses the same logic. We can extract the logic into a method (Code Fragment 22.25).

In the class VeronaStation:

public void timeElapsed(Timestamp time, Timestamp lastTimestamp) {
    super.timeElapsed(time, lastTimestamp);
    processState(Direction.TO);
    processState(Direction.FRO);
}

private void processState(Direction dir) {
    int state = getState(dir);
    if (state == 0) {
        //wait for a train to stop
        //change state to 1
    } else if (state == 1) {
        //wait for time up
        //change state to 2
    } else if (state == 2) {
        //wait for next station's signal
        //turn start signal to green
        //change state to 0
    }
}

Code Fragment 22.25: Restructure of timeElapsed().

Now we can complete the code (Code Fragment 22.26). Because processState now needs the current time to run the count-down, it gains a Timestamp parameter, and timeElapsed passes the time along: processState(Direction.TO, time) and processState(Direction.FRO, time). In the class VeronaStation:

// timestamp (ms) at which each side's count-down started
private long _toCountDown, _froCountDown;

private long getCountDown(Direction dir) {
    if (dir.equals(Direction.TO)) {
        return _toCountDown;
    } else {
        return _froCountDown;
    }
}

private void setCountDown(Direction dir, long countDown) {
    if (dir.equals(Direction.TO)) {
        _toCountDown = countDown;
    } else {
        _froCountDown = countDown;
    }
}

private void processState(Direction dir, Timestamp time) {
    int state = getState(dir);
    if (state == 0) {
        //wait for a train to stop
        //change state to 1
        Train[] trains = getTrains();
        if (trains != null) {
            for (int x = 0; x < trains.length; x++) {
                if (trains[x].getDirection().equals(dir)
                        && trains[x].getSpeed() == 0
                        && trains[x].getAcceleration() == 0) {
                    setState(dir, 1);
                    setCountDown(dir, time.getTime());
                }
            }
        }
    } else if (state == 1) {
        //wait for time up
        if (time.getTime() - getCountDown(dir) < getStopTime(dir)) {
            return;
        }
        //change state to 2
        setState(dir, 2);
    } else if (state == 2) {
        //wait for next station's signal
        Signal pSignal;
        if (getNextStation() == null) {
            pSignal = getProtectSignal(dir.inverse());
        } else {
            pSignal = getNextStation().getProtectSignal(dir);
        }
        if (pSignal.getStatus().equals(SignalStatus.RED)) return;
        //turn start signal to green
        getStartSignal(dir).setStatus(SignalStatus.GREEN);
        //change state to 0
        setState(dir, 0);
    }
}

Code Fragment 22.26: Complete code for processState.
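The paired _toCountDown/_froCountDown fields repeat the pattern already used for the state and the stop time, each pair with its own if/else accessors. One further refactoring, not taken in the text, groups everything a station keeps per direction into one object and looks it up in a map. The sketch assumes a hypothetical enum Side in place of NHD's Direction, which may not be a Java enum (a HashMap keyed by Direction would work the same way). The classes SideState and StationSides are hypothetical too.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical stand-in for NHD's Direction, written as a Java enum.
enum Side {
    TO, FRO;

    Side inverse() {
        return this == TO ? FRO : TO;
    }
}

// Everything one side of a station keeps, replacing the _toX/_froX field pairs.
final class SideState {
    int state;           // WAIT_FOR_TRAIN, COUNT_DOWN or WAIT_FOR_CLEARANCE
    long countDown;      // timestamp (ms) at which the stopped train was detected
    final long stopTime; // stop time for this side, in ms

    SideState(long stopTime) {
        this.stopTime = stopTime;
    }
}

final class StationSides {
    private final Map<Side, SideState> sides = new EnumMap<>(Side.class);

    StationSides(long toStopTime, long froStopTime) {
        sides.put(Side.TO, new SideState(toStopTime));
        sides.put(Side.FRO, new SideState(froStopTime));
    }

    // One lookup replaces every if (dir.equals(Direction.TO)) ... else ... accessor.
    SideState get(Side dir) {
        return sides.get(dir);
    }

    // Same condition as isTimeUp in the text, expressed on the grouped state.
    boolean isTimeUp(Side dir, long nowMillis) {
        SideState s = sides.get(dir);
        return nowMillis - s.countDown >= s.stopTime;
    }
}
```

Adding a third per-direction attribute would then mean adding one field to SideState instead of two fields and two accessors to the station.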

Although we can pass the test now, the code is a little messy. We can use symbolic constants instead of numeric values, as in the initial example in this chapter (Code Fragment 22.27). In the class VeronaStation:

private static final int WAIT_FOR_TRAIN = 0;
private static final int COUNT_DOWN = 1;
private static final int WAIT_FOR_CLEARANCE = 2;
...
private void processState(Direction dir, Timestamp time) {
    int state = getState(dir);
    if (state == WAIT_FOR_TRAIN) {
        //wait for a train to stop
        //change state to COUNT_DOWN
        Train[] trains = getTrains();
        if (trains != null) {
            for (int x = 0; x < trains.length; x++) {
                if (trains[x].getDirection().equals(dir)
                        && trains[x].getSpeed() == 0
                        && trains[x].getAcceleration() == 0) {
                    setState(dir, COUNT_DOWN);
                    setCountDown(dir, time.getTime());
                }
            }
        }
    } else if (state == COUNT_DOWN) {
        //wait for time up
        if (time.getTime() - getCountDown(dir) < getStopTime(dir)) {
            return;
        }
        //change state to WAIT_FOR_CLEARANCE
        setState(dir, WAIT_FOR_CLEARANCE);
    } else if (state == WAIT_FOR_CLEARANCE) {
        //wait for next station's signal
        Signal pSignal;
        if (getNextStation() == null) {
            pSignal = getProtectSignal(dir.inverse());
        } else {
            pSignal = getNextStation().getProtectSignal(dir);
        }
        if (pSignal.getStatus().equals(SignalStatus.RED)) return;
        //turn start signal to green
        getStartSignal(dir).setStatus(SignalStatus.GREEN);
        //change state to WAIT_FOR_TRAIN
        setState(dir, WAIT_FOR_TRAIN);
    }
}

Code Fragment 22.27: Use of symbolic constants in processState.

We notice that the code needs comments to be understandable. This is a hint that we should refactor by extracting methods. First, let’s extract the method isTrainStopped (Code Fragment 22.28).

private void processState(Direction dir, Timestamp time) {
    int state = getState(dir);
    if (state == WAIT_FOR_TRAIN) {
        if (isTrainStopped(dir)) {
            setState(dir, COUNT_DOWN);
            setCountDown(dir, time.getTime());
        }
    }
    ...
}

private boolean isTrainStopped(Direction dir) {
    Train[] trains = getTrains();
    if (trains != null) {
        for (int x = 0; x < trains.length; x++) {
            if (trains[x].getDirection().equals(dir)
                    && trains[x].getSpeed() == 0
                    && trains[x].getAcceleration() == 0) {
                return true;
            }
        }
    }
    return false;
}

Code Fragment 22.28: Extraction of the method isTrainStopped.

The test still passes. Now we can extract the method that determines whether the stop time is up (Code Fragment 22.29).

private void processState(Direction dir, Timestamp time) {
    int state = getState(dir);
    ...
    } else if (state == COUNT_DOWN) {
        if (isTimeUp(dir, time)) {
            setState(dir, WAIT_FOR_CLEARANCE);
        }
    }
    ...
}

private boolean isTimeUp(Direction dir, Timestamp time) {
    if (time.getTime() - getCountDown(dir) < getStopTime(dir)) {
        return false;
    }
    return true;
}

Code Fragment 22.29: Extraction of the method isTimeUp.
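The if/return false/return true shape of isTimeUp can be collapsed into a single boolean expression with the same meaning. This is a minimal sketch. It assumes the current time, the count-down start, and the stop time are plain long values in the same unit (java.sql.Timestamp.getTime() returns milliseconds). The parameters are hypothetical replacements for the calls to time.getTime(), getCountDown(dir), and getStopTime(dir).

```java
final class CountDownCheck {
    // Equivalent to: if (now - start < stop) return false; return true;
    // i.e. time is up once the elapsed time reaches the required stop time.
    // Parameter names are hypothetical; all values share one time unit.
    static boolean isTimeUp(long now, long countDownStart, long stopTime) {
        return now - countDownStart >= stopTime;
    }
}
```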

Last, we can extract the logic that determines whether the protection signal of the next station is green (Code Fragment 22.30).

    private void processState(Direction dir, Timestamp time) {
        int state = getState(dir);
        // ...
        } else if (state == WAIT_FOR_CLEARANCE) {
            if (isNextStationCleared(dir)) {
                getStartSignal(dir).setStatus(SignalStatus.GREEN);
                setState(dir, WAIT_FOR_TRAIN);
            }
        }
    }

    private boolean isNextStationCleared(Direction dir) {
        Signal pSignal;
        if (getNextStation() == null) {
            pSignal = getProtectSignal(dir.inverse());
        } else {
            pSignal = getNextStation().getProtectSignal(dir);
        }
        if (pSignal.getStatus().equals(SignalStatus.RED)) {
            return false;
        } else {
            return true;
        }
    }

Code Fragment 22.30: Extraction of the method isNextStationCleared.
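The same simplification applies to isNextStationCleared. The choice of signal becomes a conditional expression, and the final if/else becomes one boolean expression. In this sketch, Signal and SignalStatus are minimal stand-ins for the book's classes. The caller is assumed to pass two signals: the next station's protection signal (null when there is no next station) and this station's own protection signal for the opposite direction. This parameter split is a hypothetical simplification of getNextStation() and getProtectSignal(dir.inverse()).

```java
// Hypothetical stand-ins for the book's SignalStatus and Signal classes.
enum SignalStatus { RED, GREEN }

final class Signal {
    private SignalStatus status;

    Signal(SignalStatus status) { this.status = status; }

    SignalStatus getStatus() { return status; }
    void setStatus(SignalStatus status) { this.status = status; }
}

final class Clearance {
    // nextStationSignal is null when there is no next station (assumption),
    // mirroring the getNextStation() == null branch of the original.
    static boolean isNextStationCleared(Signal nextStationSignal, Signal ownInverseSignal) {
        Signal pSignal = (nextStationSignal == null) ? ownInverseSignal : nextStationSignal;
        // Enum constants are singletons, so != is a safe comparison here.
        return pSignal.getStatus() != SignalStatus.RED;
    }
}
```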

You may object that we could refactor further and write some classes to maintain the state machine used in Rule 3. This may be a good idea, because it would remove the conditional expressions from method processState. However, it is a big refactoring, and we do not need other state machines for now. The program is good enough, and it is here to stay. It is now far cleaner than it was before refactoring, and the tests still pass after all the modifications. Refactoring has clearly improved the readability and quality of the program.
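For readers curious about the larger refactoring mentioned here, the following is a minimal sketch of one way it could look. It uses a Java enum whose constants each implement their own transition, a variant of the State pattern. The StationContext interface and its method names are hypothetical; they stand for the per-direction operations VeronaStation already offers. Assuming the state is stored as a StationState instead of an int, processState would shrink to something like setState(dir, getState(dir).next(context, time.getTime())).

```java
// Hypothetical per-direction view of the station used by the states.
// In the book, these operations live directly on VeronaStation.
interface StationContext {
    boolean isTrainStopped();
    boolean isTimeUp(long now);
    boolean isNextStationCleared();
    void startCountDown(long now);
    void setStartSignalGreen();
}

// Each constant implements its own transition, replacing the
// if/else-if chain that tests an int state.
enum StationState {
    WAIT_FOR_TRAIN {
        StationState next(StationContext s, long now) {
            if (s.isTrainStopped()) {
                s.startCountDown(now);
                return COUNT_DOWN;
            }
            return this;
        }
    },
    COUNT_DOWN {
        StationState next(StationContext s, long now) {
            return s.isTimeUp(now) ? WAIT_FOR_CLEARANCE : this;
        }
    },
    WAIT_FOR_CLEARANCE {
        StationState next(StationContext s, long now) {
            if (s.isNextStationCleared()) {
                s.setStartSignalGreen();
                return WAIT_FOR_TRAIN;
            }
            return this;
        }
    };

    // Returns the state to store after processing one tick.
    abstract StationState next(StationContext s, long now);
}
```

The trade-off is the one described above. The conditional chain disappears, at the price of a new type and an extra indirection. That price is mainly worth paying if more state machines appear.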

22.8.3 Guidelines for Refactoring

We have seen a few very simple rules governing refactoring; the following list summarizes them. A thorough description of refactoring situations can be found in the two reference books on refactoring (Fowler et al., 1999; Kerievsky, 2004).

1. Use symbolic constants instead of numeric constants. This is a well-known coding practice. It can be extended further to values that are not really constants (☺) but program-wide values that may be updated from the outside, like the exchange rate discussed at the beginning of this section.


2. Extract classes. When a cohesive portion of a class has a self-standing role and well-defined behavior, it is worth extracting it into an independent class. This supports an appropriate assignment of roles and responsibilities and also improves readability. Subclassing may help this strategy, because we can assign common behavior to the base class and specialize it further in a subclass. However, attention should be paid to shared variables and to communication among methods.

3. Extract methods. When there is a large method, a portion of which is devoted to a specific goal, it may be wise to extract the portion and make it a self-standing method. The rationale and the caveat are the same as for the rule on the extraction of classes.
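The three guidelines can be seen together in a small hypothetical example: the count-down bookkeeping of VeronaStation pulled out into its own class. The class name CountDownTimer, the 30-second default, and the setter for the program-wide default are illustrative assumptions, not part of the book's code.

```java
// Hypothetical class extracted from VeronaStation (guideline 2): the
// count-down start time and stop time now travel with the logic that uses them.
final class CountDownTimer {
    // Guideline 1: a named constant instead of a bare number.
    // The 30-second value is an assumption for illustration.
    static final long DEFAULT_STOP_TIME_MS = 30_000L;

    // Extension of guideline 1: a program-wide value that is not really
    // constant and may be updated from the outside (e.g. by configuration).
    private static volatile long defaultStopTimeMs = DEFAULT_STOP_TIME_MS;

    static void setDefaultStopTimeMs(long ms) {
        if (ms <= 0) {
            throw new IllegalArgumentException("stop time must be positive");
        }
        defaultStopTimeMs = ms;
    }

    private final long stopTimeMs;
    private long startedAtMs;
    private boolean running;

    // Uses the program-wide default current at construction time.
    CountDownTimer() { this(defaultStopTimeMs); }

    CountDownTimer(long stopTimeMs) { this.stopTimeMs = stopTimeMs; }

    void start(long nowMs) {
        startedAtMs = nowMs;
        running = true;
    }

    // Guideline 3: small, intention-revealing methods.
    boolean isTimeUp(long nowMs) {
        return running && elapsed(nowMs) >= stopTimeMs;
    }

    void reset() { running = false; }

    private long elapsed(long nowMs) { return nowMs - startedAtMs; }
}
```

A station could then keep one CountDownTimer per direction instead of separate count-down and stop-time values. Whether that is worth it depends on how much of the station's logic touches those values.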

There are also tools supporting refactoring. For instance, the Eclipse workbench lets you select among a set of predefined refactoring operations (Figure 22.12).

Figure 22.12: Refactoring support in Eclipse.

22.9 Do Tools Help?

In this chapter we have discussed several tools that support solid management of the code. Clearly, tools are not enough: it is possible to obtain the same result with a rigorous process and without specific tools. In their "Agile Manifesto" (http://agilemanifesto.org), agile methodologists state that they value people more than tools. However, some tools are truly instrumental in doing a good software engineering job. Especially when we deal with the beast of complexity, tools can be the solution, because they help programmers master the huge complexity of the code. Debuggers, version control systems, and configuration management tools have significantly improved the life of developers.


Tools also play a very important role in institutionalizing a methodology, because they can make evident the steps required to complete a task or achieve a goal. Tools are also part of the common "language" that a team shares while developing a project. It is a language of symbols and keystrokes, but still a very important one. Needless to say, tools must be used with care and taken with a grain of salt. Tools are not replacements for good people. A good tool given to an inexperienced programmer may even cause harm if the novice focuses on learning all its subtle details instead of using it to reach the actual goal. Tools are instruments, not goals!

References

Bar M. and K. Fogel (2003) Open Source Development with CVS, Paraglyph Publishing

Barnson M.P. and J. Steenhagen (2003) “The Bugzilla Guide,” URL: http://www.bugzilla.org/docs/html/

Bolinger D. and T. Bronson (1995) Applying RCS and SCCS: From Source Control to Project Control, O'Reilly & Associates

Collins-Sussman B., B.W. Fitzpatrick, C.M. Pilato (2004) Version Control with Subversion, URL: http://svnbook.red-bean.com/

Fowler M. and M. Foemmel (2003) "Continuous Integration," URL: http://www.martinfowler.com/articles/continuousIntegration.html

Gamma E. and K. Beck (2004) Contributing to Eclipse: Principles, Patterns, and Plug-Ins, Addison Wesley

Kerievsky J. (2004) Refactoring to Patterns, Addison Wesley

Leszek P. (2003) “Debugging with Eclipse,” URL: http://linuxdevices.com/articles/AT6046208714.html

Littlewood B. (1981) “Stochastic Reliability Growth: A Model for Fault Removal in Computer Programs and Hardware Design,” IEEE Transactions on Reliability, Dec., pp. 313-320

Lyu M.R. (1996) Handbook of Software Reliability Engineering, McGraw Hill

Oram A. and S. Talbott (1991) Managing Projects with Make, O'Reilly & Associates

Scarab Web Site (2004) "The Scarab Documentation," URL: http://scarab.tigris.org/project_docs.html

Shavor S., J. D'Anjou, S. Fairbrother, D. Kehn, J. Kellerman, P. McCarthy (2003) The Java Developer's Guide to Eclipse, Addison Wesley

Stallman R. and R. McGrath (1998) GNU Make, Free Software Foundation

Succi G., W. Pedrycz, M. Stefanovic, B. Russo (2003) "An Investigation on the Occurrence of Service Requests in Commercial Software Applications," Empirical Software Engineering, 8(2):197-215

Vesperman J. (2003) Essential CVS, O'Reilly & Associates