An Artificial Neural Network Based Software Reengineering Tool for Extracting Objects

J Brant Arseneau and Tim Spracklen

University of Aberdeen, Electronic Research Group, Department of Engineering, Fraser-Noble Building, Old Aberdeen, AB9 2UE, Scotland, U.K.

Abstract: Given the current level of interest in software reengineering and object-oriented methodologies, a tool that could reengineer non-object-oriented source code (COBOL) into an object-oriented formal specification and then into an implementation might attract widespread commercial interest. In this paper the authors address the issues in reengineering to an object-oriented form, in particular the extraction of objects from existing source code.

The paper is divided into three sections. The first section introduces software reengineering and the object-oriented methodology and refers the reader to key papers in these areas. The second section describes the three components of the software reengineering tool, which are (1) the decomposer, (2) the information base, and (3) the composers, and presents a detailed description of the artificial neural network. The third section describes results obtained from attempting the extraction of objects from existing systems.

1. Introduction

Software Reengineering (SRE) refers to any process which improves one's understanding of software or improves the software itself (Arnold 1993). Many papers highlight the benefits of software reengineering, which include increased maintainability, automation, evolvability, and reusability (Britcher 1990, Slovin & Malik 1991, Sneed 1991). Companies have invested heavily in developing complex systems in the past, and rather than ignore the existing software they can apply reengineering technologies to partially recoup their software investment. These technologies have several underlying themes which can be divided into three categories: (1) understanding software, (2) improving software, and (3) capturing, preserving, and extending knowledge about the software. Understanding software has been advanced by the development of advanced browsers, measurement techniques, and design recovery tools. Technology for improving software includes tools which restructure, redocument, remodularise, and identify reusable components in software. Finally, capturing, preserving, and extending knowledge about the software can be achieved by decomposition, program understanding, transformations and object recovery. The reengineering framework described in this paper addresses these themes directly through several tools which attempt to improve existing software by representing it in a new form.

Several papers have underlined the importance of the object-oriented representation for improving program understanding, migration, and for providing a suitable framework for reusability (Meyer 1987, Korson & McGregor 1990, Wirfs-Brock & Johnson 1990, Rumbaugh, Blaha et al. 1991). However, a large amount of the code in use today has been written in a non-object-oriented language, such as COBOL. This may suggest a need for a tool that would reengineer non-object-oriented source code to an object-oriented form and further into an implementation. Developing such a tool using conventional programming techniques may prove to be difficult (Liebowitz 1993). For example, with a pure knowledge-based system (KBS) it would either be difficult to find the rules, or the number of rules found would be excessive. A possible solution may consist of a hybrid system combining artificial neural networks (ANNs) and a rule-based system. The ANN could reduce the number of rules that describe the system, allowing the KBS to work with a reasonable number of rules. This is an area where ANNs are known to perform functions beyond the capability of conventional KBS; they have the ability to make functional use of experimental knowledge (Aleksander 1989).

If an object-oriented representation of non-object-oriented source code (COBOL) is to be generated, objects must first be extracted from the original source code. This paper describes a software reengineering tool that incorporates KBS and ANN technology to identify and extract objects from non-object-oriented source code through a set of views. Results are presented and analysed on test data which include C and COBOL source code.

0-7803-1901-X/94 $4.00 © 1994 IEEE


2. Project Demonstrator

The basic framework of most reengineering tools consists of a decomposer, an information base and a composer (Chikofsky & Cross 1990), see Figure 1. The decomposer parses the input source and stores it in an information base. The composer then generates several views from the information base, each conveying different information about the software. Views of software, which can be specifications, source code, measurements, reports or graphics, are then used to understand the software better or to transform it into new representations which attempt to improve the software.

The software reengineering tool introduced in this paper is composed of several interacting programs which attempt to extract objects from non-object-oriented source code. This section describes the three major components of the tool: (1) the decomposer, which is a parser and semantic analyser for breaking down the original code, (2) the information base, which is an object-oriented database representing information about the software, and (3) the composer, which is a combination of two translators. The first translator generates a dependency diagram view by extracting data and method objects from the information base, and the second generates an object view, which is used by the ANN for object extraction.


2.1. The Decomposer and Information Base

Decomposition is the process of transforming a particular view of software into objects and relationships which are stored in an information base. Working with a decomposed view rather than on the source code directly saves the time and effort of parsing the code for each transformation. Several papers introduce object-based representations (Kozaczynski & Ning 1989, Harandi & Ning 1990) which allow additional program abstractions to be derived or added to the existing hierarchy, giving the information base great flexibility.

Figure 1. A typical automated reengineering process. [Diagram flow: software work product, decomposer (parser, semantic analyser), information base, composer (view composers), new views of software (format, graphics, documentation, metrics, logic, reports).]

In this system the source code is first decomposed, using lex and yacc, into an implementation-language-independent object-oriented information base which will be referred to as the source base, see Figure 2. The parser uses the programming language syntax to translate the source information; there is a corresponding parser for each source language. From the source base a set of views can be composed, see section 2.2, with each level depending on the previous one.

Figure 2. Object-oriented information source base hierarchy. [Diagram: a class hierarchy rooted at program-object; legible node labels include constant, module, file, variable, statement, function, nested-var, normal-var, program and sub-program.]

2.2. Composing Views

Composing views is the process of generating visual information about the software from the information base. A composer is a tool which inspects the information base, collects relevant objects and relationships, builds visual representations, and displays a view. This section describes the set of composers within the system, what technology they use and how they relate to object extraction.
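The information base of section 2.1 and the composer described above can be pictured with a small Python sketch. It is only an illustration under stated assumptions: the class names `ProgramObject` and `SourceBase`, the relation labels `"uses"` and `"calls"`, and the example program objects are all hypothetical and are not taken from the authors' tool.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ProgramObject:
    """A node in the source base; `kind` mirrors labels such as 'variable' or 'function'."""
    name: str
    kind: str


@dataclass
class SourceBase:
    """Language-independent store of program objects and the relationships between them."""
    objects: dict = field(default_factory=dict)
    relations: set = field(default_factory=set)  # (source name, relation, target name)

    def add(self, name, kind):
        self.objects[name] = ProgramObject(name, kind)

    def relate(self, source, relation, target):
        self.relations.add((source, relation, target))

    def compose_view(self, relation):
        """A minimal composer: collect every pair linked by one relation type."""
        return sorted((s, t) for s, r, t in self.relations if r == relation)


# A decomposer would normally fill the base while parsing; here it is filled by hand.
base = SourceBase()
base.add("balance", "variable")
base.add("deposit", "function")
base.add("report", "function")
base.relate("deposit", "uses", "balance")
base.relate("report", "uses", "balance")
base.relate("report", "calls", "deposit")

uses_view = base.compose_view("uses")
print(uses_view)
```

Because every view is composed from the stored relations, the source code only has to be parsed once, which is the saving the text attributes to working on a decomposed view.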

2.2.1. Composing a Dependency Diagram View

The dependency diagram view displays relationships between data and method objects. The visual representation of the dependency diagram consists of two rows of rectangles, with rectangles in the top row representing the data objects and rectangles in the bottom row representing the method objects. The relationship between data and method objects is represented by a connection between the two objects: if a method works on a certain data object a connection is made between the two objects, and conversely, if the method does not work on a certain data object no connection is made.

These relationships represent design coupling; procedures that share data share design decisions. This idea was first put forward by Parnas (1971), and forms the basis of his information hiding principle. Information hiding occurs when a module's access to data which is not needed by the module is denied by using the scope rules of the programming language. One advantage of hiding such unnecessary information is that it cannot be changed or deleted by units which are not supposed to use that information.

Object-oriented design is based on the idea of information hiding. To extract objects from non-object-oriented source code, procedures that share data must be identified within the dependency diagram view.
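The dependency diagram can be held as a simple 0/1 matrix. The sketch below, in Python, builds such a matrix and lists the methods that share each data object. The object names and both helper functions are hypothetical and serve only to illustrate the idea of finding procedures that share data.

```python
def dependency_matrix(data_objects, method_objects, uses):
    """Rows are data objects, columns are method objects; 1 marks a connection."""
    return [[1 if (m, d) in uses else 0 for m in method_objects]
            for d in data_objects]


def methods_sharing_data(data_objects, method_objects, uses):
    """Map each data object to the methods that work on it (an object candidate)."""
    return {d: [m for m in method_objects if (m, d) in uses]
            for d in data_objects}


data = ["balance", "rate", "log_file"]
methods = ["deposit", "add_interest", "write_log"]
# Each pair (method, data object) means "the method works on the data object".
uses = {("deposit", "balance"), ("add_interest", "balance"),
        ("add_interest", "rate"), ("write_log", "log_file")}

matrix = dependency_matrix(data, methods, uses)
shared = methods_sharing_data(data, methods, uses)
print(matrix)
print(shared)
```

In this toy example `deposit` and `add_interest` both work on `balance`, so the three would be natural members of one candidate object, while `write_log` and `log_file` form another.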


2.2.2. Composing Object Views: Extracting Objects using ANNs

Observations show that, as the lengths of the connections between related data and method objects are minimised by rearranging the positions of the objects in the dependency diagram, a clustering effect occurs. These clusters intuitively represent a movement towards the identification of object candidates, because the clusters contain methods that share data; in essence, the dependency diagram is partitioned into usable objects. As most dependency diagrams would be very large, because they represent massive application programs, it would be very difficult to rearrange the objects manually.

An ANN can be used to automate this process. The ANN attempts to optimise a function which describes the energy required to move object x to location y. This is known, in general, as a quadratic assignment problem (QAP). The energy is a function of the total Euclidean length of the connections (edges) between the object being moved and the objects related to it. Solving an optimisation problem on a neural network requires the problem to be mapped onto the network by constructing suitable energy functions. In general the problem can be defined by the energy function:

E = cost + global constraints,    (1)

where the QAP is

[Equation (2): the QAP cost function E(v); the formula is missing from this copy.]
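For reference, the textbook statement of the QAP that matches the symbols $v_{ik}$, $c_{ij}$ and $d_{kl}$ used in this section is given below. It is assumed, not confirmed, that the authors' equation (2) has this form.

```latex
\min_{v}\; E(v) = \sum_{i=1}^{N} \sum_{j=1}^{N} \sum_{k=1}^{M} \sum_{l=1}^{M}
    c_{ij}\, d_{kl}\, v_{ik}\, v_{jl}
\quad \text{subject to} \quad
\sum_{i=1}^{N} v_{ik} = 1 \;\; \forall k, \qquad
\sum_{k=1}^{M} v_{ik} \le 1 \;\; \forall i, \qquad
v_{ik} \in \{0, 1\}
```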

The QAP can be thought of as the optimal location of $M$ objects at $N$ possible positions, where $N \geq M$. The variable $v_{ik} = 1$ if object $k$ is located at position $i$, and $v_{ik} = 0$ if it is not. The coefficient $c_{ij}$ represents the cost of transporting an object from position $i$ to position $j$, and $d_{kl}$ is the number of objects to be moved (in this case $d_{kl} = 1$ and it can be removed from the expression). The objective is then to minimise the cost function $E(v)$, equation (2), of moving all objects to a position under the set of blocking constraints.

To solve the QAP using a neural network, an appropriate representation of the problem must be decided upon and a suitable energy function constructed. The QAP is represented by a neural network containing $c = NM$ neurons arranged in a two-dimensional array, where $N = n + n'$ and $M = m + m'$, see Figure 3. The $m + m'$ rows represent the data and method objects and the $n + n'$ columns represent the positions of the data and method objects. In other words, doubly indexed neurons are used to represent the assignment of dependency diagram objects to positions in the dependency diagram. For such a neural network representation we can formulate the energy function as:

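[Equation (3): energy function of the assignment network, with one cost term and three constraint terms; the formula is not legible in this copy.]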
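A form that is consistent with the four terms described in this section, written in the style of Hopfield-type assignment networks, is sketched below. The weights $A$, $B$, $C$ and $D$ and the exact shape of each penalty are assumptions made for illustration; they are not taken from the paper.

```latex
E = \frac{A}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \sum_{k=1}^{M} \sum_{l=1}^{M} c_{ij}\, v_{ik}\, v_{jl}
  + \frac{B}{2} \sum_{k=1}^{M} \sum_{i=1}^{N} \sum_{j \neq i} v_{ik}\, v_{jk}
  + \frac{C}{2} \sum_{i=1}^{N} \sum_{k=1}^{M} \sum_{l \neq k} v_{ik}\, v_{il}
  + \frac{D}{2} \Bigl( \sum_{i=1}^{N} \sum_{k=1}^{M} v_{ik} - N \Bigr)^{2}
```

In this sketch the $B$ term vanishes when no object is ON at more than one position, the $C$ term vanishes when no position holds more than one object, and the $D$ term vanishes when exactly $N$ neurons are ON.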

The first term of the energy function is the cost function, which measures the total distance of the connections for a given configuration of data and method objects (assuming that the final state of the network is valid with respect to the constraint functions). The last three terms of the energy function represent constraint functions. The second term achieves a minimum value of zero when there is one neuron ON in each row (i.e. each data or method object occupies only one position in the dependency diagram). The third term achieves a minimum value of zero when there is one neuron ON in each column (i.e. each position in the dependency diagram contains only one data or method object). The fourth term is expected to force the total number of objects in the optimum to $N$. The minimisation of the energy function, equation (3), enables us to map to a set of mean field annealing (MFA) equations:

Figure 3. Assignment array of neurons for the arrangement of the dependency diagram. [Diagram: rows 1 to m + m' are the data and method objects; columns 1 to n are data positions and columns n + 1 to n + n' are method positions.]

[Mean field annealing equations; the formulas are missing from this copy.]

which can be simulated by a mean field theory Potts neural network, where $u_{ik}$ represents the internal potential of each neuron (Cichocki & Unbehauen 1993). When the neural network completes the optimisation process, in which all neurons are forced to either 1 or 0, there will be exactly one neuron ON and the rest OFF in each group of neurons. The neural network's output representation of the optimising process can then be translated to the arranged dependency diagram view, see Figure 4, which can then be analysed by the programmer.

3. Experimental Results and Summary

The current reengineering tool can read in C and COBOL code, break it down into objects and relationships, and store them in an object-oriented database. An ANN then works on a dependency diagram view of the software, which displays the relationships between method and data objects, attempting to extract potential objects from the existing code. These objects can then be used to build new representations of the software, such as an object-oriented representation.

In one experiment, using a COBOL business application consisting of 5k of code, the software reengineering tool extracted 21 classes (a small example of what the ANN produces can be seen in Figure 4). Some of these classes were actually the same objects coded in different modules. Manual revision of the classes produced 11 distinct classes, which may seem to be a small recovery from 5k of code. However, since the declarations and operations of these 11 unique classes were duplicated several times, the size of the classes extracted was one quarter of the total size of the application. By replacing the original code with the 11 classes, the size of the application reduces significantly.

In another experiment, using a C user-interface application consisting of 11k of code, the software reengineering tool extracted 45 classes. Again some of these classes were actually the same objects coded in different modules. Manual revision of the classes, once again, produced fewer distinct classes. Not all the objects the tool extracts can be used; however, in such cases the system still benefits because it can be reorganised, localising plans, i.e. bringing together code that is related increases the program understanding of the code.

                   Data position     Method position
                   1 2 3 4 5 6 7     1 2 3 4 5 6
    Data object 1  0 0 1 0 0 0 0     0 0 0 0 0 0
                2  0 0 0 0 0 1 0     0 0 0 0 0 0
                3  1 0 0 0 0 0 0     0 0 0 0 0 0
                4  0 0 0 0 0 0 1     0 0 0 0 0 0
                5  0 0 0 1 0 0 0     0 0 0 0 0 0
                6  0 1 0 0 0 0 0     0 0 0 0 0 0
                7  0 0 0 0 1 0 0     0 0 0 0 0 0
  Method object 1  0 0 0 0 0 0 0     0 0 0 0 1 0
                2  0 0 0 0 0 0 0     0 0 1 0 0 0
                3  0 0 0 0 0 0 0     0 0 0 0 0 1
                4  0 0 0 0 0 0 0     1 0 0 0 0 0
                5  0 0 0 0 0 0 0     0 1 0 0 0 0
                6  0 0 0 0 0 0 0     0 0 0 1 0 0

Figure 4. Neural network's representation of the arranged dependency diagram

Our goal is to develop a software reengineering tool that would reengineer non-object-oriented source code to a formal specification and then into an implementation. We have described the framework of a hybrid system that combines ANNs and a rule-based system to construct this tool. A procedure for extracting objects from a non-object-oriented form using ANNs has been developed using only code as input; it does not need design diagrams or documents.
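The rearrangement step of section 2.2.2 can be illustrated with a small mean field annealing sketch in Python. It is not the authors' network. The function names, the parameter values, the sequential softmax (Potts) update, the crowding penalty and the greedy decoding step are all assumptions chosen to keep the example short, and the object names are hypothetical.

```python
import math
import random


def anneal_layout(data, methods, edges, t_start=2.0, t_end=0.05, cooling=0.9,
                  sweeps=20, penalty=2.0, seed=0):
    """Place data objects (top row) and method objects (bottom row) so that
    connected objects end up close together. Returns {object name: slot index}."""
    rng = random.Random(seed)
    groups = {"data": list(data), "method": list(methods)}
    row_y = {"data": 1.0, "method": 0.0}
    group_of = {name: g for g, names in groups.items() for name in names}
    slots = {g: list(range(len(names))) for g, names in groups.items()}
    neighbours = {name: [] for name in group_of}
    for m, d in edges:  # each edge joins a method object and a data object
        neighbours[m].append(d)
        neighbours[d].append(m)

    # v[name][i]: mean-field probability that `name` sits at slot i of its own row.
    v = {}
    for name, g in group_of.items():
        raw = [1.0 + 0.01 * rng.random() for _ in slots[g]]
        total = sum(raw)
        v[name] = [r / total for r in raw]

    def distance(g1, i, g2, j):
        return math.hypot(i - j, row_y[g1] - row_y[g2])

    t = t_start
    while t > t_end:
        for _ in range(sweeps):
            for name, g in group_of.items():
                u = []  # internal potential of each neuron belonging to `name`
                for i in slots[g]:
                    # Cost term: expected connection length if `name` sat at slot i.
                    cost = sum(v[nb][j] * distance(g, i, group_of[nb], j)
                               for nb in neighbours[name]
                               for j in slots[group_of[nb]])
                    # Constraint term: discourage sharing slot i within the same row.
                    crowd = sum(v[other][i] for other in groups[g] if other != name)
                    u.append(cost + penalty * crowd)
                # Potts update: a softmax keeps one unit of activity per object.
                lowest = min(u)
                weights = [math.exp(-(ui - lowest) / t) for ui in u]
                total = sum(weights)
                v[name] = [w / total for w in weights]
        t *= cooling

    # Decode: the most confident objects choose first, so no slot is used twice.
    layout = {}
    for g, names in groups.items():
        free = set(slots[g])
        for name in sorted(names, key=lambda n: -max(v[n])):
            best = max(free, key=lambda i: v[name][i])
            layout[name] = best
            free.remove(best)
    return layout


def total_length(layout, edges):
    """Total Euclidean length of all connections (the two rows are one unit apart)."""
    return sum(math.hypot(layout[m] - layout[d], 1.0) for m, d in edges)


data = ["acct", "rate", "log"]
methods = ["post", "accrue", "audit"]
edges = [("post", "acct"), ("accrue", "acct"), ("accrue", "rate"), ("audit", "log")]
layout = anneal_layout(data, methods, edges)
print(layout, total_length(layout, edges))
```

The greedy decoding at the end is a convenience of this sketch: it guarantees a valid one-to-one assignment even if the annealed activities have not fully saturated to 0 or 1.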

References

Aleksander, I. (1989) Neural Computing Architectures: The Design of Brain-Like Machines. North Oxford Academic Publishers Ltd. ISBN 0-946536-47-3.
Arnold, R.S. (1993) Software Engineering. IEEE Computer Society Press.
Britcher, R.N. (1990) "Re-engineering Software: A Case Study". IBM Systems Journal, Vol. 29, No. 5, 551-567.
Chikofsky, E., and Cross, J. (1990) "Reverse Engineering and Design Recovery: A Taxonomy". IEEE Software, Vol. 7, No. 1, Jan., 13-17.
Cichocki, A., and Unbehauen, R. (1993) "Neural Networks for Optimization and Signal Processing". John Wiley & Sons.
Harandi, T.H., and Ning, J.Q. (1990) "Knowledge-Based Program Analysis". IEEE Software, Jan., 74-81.
Korson, T., and McGregor, J. (1990) "Understanding Object-Oriented: A Unifying Paradigm". Communications of the ACM, Sept.
Kozaczynski, W., and Ning, J.Q. (1989) "SRE: A Knowledge-based Environment for Large-scale Software Re-engineering Activities". Proceedings of the 11th Conference on Software Engineering, 113-122.
Liebowitz, J. (1993) "Roll Your Own Hybrids". Byte, 18-8, 133-115.
Meyer, B. (1987) "Reusability: The Case for Object-Oriented Design". IEEE Software, Vol. 4, No. 2, March.
Parnas, D. (1971) "On the Criteria To Be Used In Decomposing System into Modules". Tech. Report, Computer Science Department, Carnegie-Mellon University.
Rumbaugh, J., Blaha, M., Premerlani, W., Eddy, F., and Lorensen, W. (1991) Object oriented Modelling and Design. Prentice Hall.
Slovin, M., and Malik, S. (1991) "Reengineering to Reduce System Maintenance: A Case Study". Software Engineering, Jul./Aug., 14-24.
Sneed, H.M. (1991) "Economics of Software Re-engineering". Journal of Software Maintenance: Research and Practice, Sept., 163-182.
Wirfs-Brock, R., Johnson, R. (1990) "Surveying Current Research in Object Oriented Design". Communications of the ACM, Sept.
