Distributed dynamic slicing of Java programs


www.elsevier.com/locate/jss

The Journal of Systems and Software 79 (2006) 1661–1678


Durga P. Mohapatra a, Rajeev Kumar b,*, Rajib Mall b, D.S. Kumar b, Mayank Bhasin b

a Department of Computer Science and Engineering, National Institute of Technology, Rourkela, Orissa 769 008, India
b Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur WB 721 302, India

Received 21 February 2005; received in revised form 15 January 2006; accepted 15 January 2006. Available online 28 February 2006.

Abstract

We propose a novel dynamic slicing technique for distributed Java programs. We first construct the intermediate representation of a distributed Java program in the form of a set of Distributed Program Dependence Graphs (DPDGs). We mark and unmark the edges of the DPDG appropriately as and when dependencies arise and cease during run-time. Our algorithm can run in parallel on a network of computers, with each node in the network contributing to the dynamic slice in a fully distributed fashion. Our approach does not require any trace files to be maintained. Another advantage of our approach is that a slice is available even before a request for a slice is made. This appreciably reduces the response time of slicing commands. We have implemented the algorithm in a distributed environment. The results obtained from our experiments show promise.
© 2006 Elsevier Inc. All rights reserved.

Keywords: Program slicing; Dynamic slicing; Program dependence graph; Debugging; Object-oriented program; Multithreading; Java; Distributed programming

1. Introduction

As software applications grow larger and become more complex, program maintenance activities such as adding new functionalities, porting to new platforms, and correcting reported bugs consume enormous effort. This is especially true for distributed object-oriented programs. In order to cope with this scenario, programmers need effective computer-supported techniques for decomposition and dependence analysis of programs. Program slicing is one technique for such decomposition and dependence analysis. A program slice with respect to a specified variable v at some program point p consists of those parts of the program which potentially affect the value of v at p. The pair ⟨v, p⟩ is known as the slicing criterion. A static slice is valid for all possible executions of a program, while a dynamic slice is meaningful for only a particular execution of a program (Ashida et al., 1999; Binkley et al., 1996; Korel and Laski, 1988). Program slicing has been found to be useful in a variety of applications such as debugging, program understanding, testing and maintenance (Agrawal et al., 1993; Gallagher and Lyle, 1991; Goswami and Mall, 2002; Kamkar, 1993; Mall, 2003; Mund et al., 2002; Zhang et al., 2004).

0164-1212/$ - see front matter © 2006 Elsevier Inc. All rights reserved.

doi:10.1016/j.jss.2006.01.009

* Corresponding author. Tel.: +91 5122596074; fax: +91 5122597586. E-mail addresses: [email protected] (D.P. Mohapatra), [email protected] (R. Kumar), [email protected] (R. Mall).

Many real-life object-oriented programs are distributed in nature and run on different machines connected to a network. The emergence of message passing standards, such as MPI, and the commercial success of high-speed networks have contributed to making message passing programming commonplace. Message passing programming has become an attractive option for tackling the vexing issues of portability, performance, and cost effectiveness. As distributed computing gains momentum, development and maintenance tools for these distributed systems assume utmost importance.

Development of real-life distributed object-oriented programs presents a formidable challenge to the programmer. It is generally accepted that understanding and debugging distributed object-oriented programs are much harder than for sequential programs. The typical nature of distributed programs, lack of global states, unsynchronized interactions among threads, multiple threads of control and a dynamically varying number of processes are some reasons for this difficulty. An increasing amount of effort is being spent in debugging, testing and maintaining these products. Slicing techniques promise to come in handy at this point. Through the computation of a slice for a message passing program, one can significantly reduce the amount of code that a maintenance engineer has to analyze to accomplish a maintenance task. However, research attempts in the program slicing area have focused largely on sequential programs. Many research reports address efficient handling of data structures such as arrays and pointers in the sequential framework. Attempts have also been made to deal with unstructured constructs such as goto and break. Although researchers have reported extensions of the traditional concept of program slicing to static slicing of distributed programs, dynamic slicing of distributed object-oriented programs has scarcely been reported in the literature.

Any effective dynamic slicing technique for distributed object-oriented programs needs to address the important concepts associated with object-oriented programs such as encapsulation, inheritance, polymorphism and message passing. These pose new challenges during slice computation which are not encountered in traditional program slicing, and render representation and analysis techniques developed for imperative-language programs inadequate. So, the object-oriented features need to be considered carefully in the process of slicing.

We have already mentioned that object-oriented programs are often large. Therefore, to be practically useful in interactive applications such as debugging, program traces should be avoided in the slicing process. Maintaining execution traces becomes unacceptable due to slow file I/O operations. Further, to be useful in a distributed environment, slices should preferably be constructed in a distributed manner. Each node in a distributed system should contribute to the slice by determining its local portion of the global slice in a fully distributed fashion.

Keeping the above objectives in mind, in this paper we propose an algorithm for computing dynamic slices of distributed Java programs. Though we have considered only Java programs, programs in other languages can be handled by making only small changes to our algorithm. We have concentrated only on the communication and concurrency issues in Java. Standard sequential and object-oriented features are not discussed in this paper, as these are easily found in the literature (Ashida et al., 1999; Larson and Harrold, 1996; Liang and Larson, 1998; Krishnaswamy, 1994; Umemori et al., 2003; Wakinshaw et al., 2002; Wang and RoyChoudhury, 2004). For example, Larson and Harrold (1996) have discussed techniques to represent the basic object-oriented features. Their technique can easily be incorporated into our algorithm to represent the basic object-orientation features.

We have named our proposed algorithm the distributed dynamic slicing (DDS) algorithm for Java programs. To achieve fast response time, our algorithm can run in a fully distributed manner on several machines connected through a network, rather than running on a centralized machine. We use local slicers at each node in a network. A local slicer is responsible for slicing the part of the program execution occurring on the local machine.

Our algorithm uses a modified program dependence graph (PDG) (Horwitz et al., 1990) as the intermediate representation. We have named this representation the distributed program dependence graph (DPDG). We first statically construct the DPDG before run-time. Our algorithm marks and unmarks the edges of the DPDG appropriately as and when dependencies arise and cease during run-time. Such an approach is more time and space efficient and also completely does away with the necessity to maintain a trace file. This eliminates the slow file I/O operations that occur while accessing a trace file. Another advantage of our approach is that when a request for a slice for any slicing criterion is made, the required slice is already available. This appreciably reduces the response time of slicing commands.

The rest of the paper is organized as follows. In Section 2, we present some basic concepts and definitions that will be used in our algorithm. In Section 3, we discuss the intermediate program representation: the distributed program dependence graph (DPDG). In Section 4, we present our distributed dynamic slicing (DDS) algorithm for distributed object-oriented programs. In Section 5, we briefly present an implementation of our algorithm. In Section 6, we compare our algorithm with related algorithms. Finally, Section 7 concludes the paper.

2. Basic concepts

Before presenting our dynamic slicing algorithm, we first briefly discuss the relevant features of Java. Then, we introduce a few basic concepts and definitions that will be used in our algorithm. In the following discussions and throughout the rest of the paper, we use the terms program statement, node and vertex interchangeably.

2.1. Concurrency and communication in Java

Java supports concurrent programming using threads. A thread is a single sequential flow of control within a program. A thread is similar to a sequential program in the sense that each thread also has a beginning, an execution sequence and an end. Also, at any given time during the run of a thread, there is a single point of execution. However, a thread itself is not a program; it cannot run on its own. To support thread programming, Java provides a Thread class library, which defines a set of standard operations on a thread such as start( ), stop( ), join( ), suspend( ), resume( ) and sleep( ) (Naughton and Schildt, 1998).
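The thread life cycle described above can be sketched with a minimal example. This is our own illustration, not code from the paper; note also that stop( ), suspend( ) and resume( ) are deprecated in modern Java, while start( ), join( ) and sleep( ) remain standard.

```java
// Minimal illustration of a thread's beginning, execution sequence and end.
public class ThreadDemo {
    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            // A single sequential flow of control within the program.
            for (int i = 1; i <= 3; i++) {
                System.out.println("worker step " + i);
                try {
                    Thread.sleep(10);   // pause this thread briefly
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        worker.start();   // begin concurrent execution
        worker.join();    // wait for the worker thread to end
        System.out.println("worker done");
    }
}
```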


Java supports communication among threads both through shared memory and message passing. Objects shared by two or more threads are called condition variables. Access to these variables must be synchronized for the proper working of the system. The Java language and run-time system support thread synchronization through the use of monitors. A monitor is associated with a specific data item and functions to lock that data. When a thread holds the monitor for some data item, other threads cannot inspect or modify the data. The code segments within a program that access the same data from within separate threads are known as critical sections. In Java programs, critical sections need to be marked with the keyword synchronized for synchronized access to shared data. Java provides the methods wait( ), notify( ), and notifyAll( ) to support synchronization among different threads.
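A small sketch of the monitor mechanism described above (our own illustration): a shared boolean plays the role of the condition variable, the synchronized blocks are the critical sections, and wait( )/notify( ) coordinate the two threads.

```java
// Sketch of thread synchronization through a monitor and a condition variable.
public class MonitorDemo {
    private static final Object lock = new Object();  // monitor object
    private static boolean flag = false;              // shared condition variable

    public static void main(String[] args) throws InterruptedException {
        Thread waiter = new Thread(() -> {
            synchronized (lock) {            // critical section
                while (!flag) {
                    try {
                        lock.wait();         // release the monitor and sleep until notified
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
                System.out.println("waiter: flag observed true");
            }
        });
        waiter.start();
        Thread.sleep(50);                    // give the waiter time to block first
        synchronized (lock) {                // critical section
            flag = true;
            lock.notify();                   // wake the waiting thread
            System.out.println("notifier: flag set and notified");
        }
        waiter.join();
    }
}
```

Because the notifier prints while still holding the monitor, its line always appears before the waiter's.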

When a thread needs to send a message to another thread, it calls the method getOutputStream( ). To receive a message, the receiving thread calls the method getInputStream( ). Java provides sockets to support distributed programming among component programs running on different machines. By using the keyword Socket, a client program can specify the IP address and the port number of a server program with which it wants to communicate.

A distributed object-oriented program P = (P1, P2, . . . , Pn) is a collection of concurrent individual programs Pi such that each of the Pi's may communicate with other programs through the reception and transmission of messages. We refer to the individual programs Pi as the component programs. In our DDS algorithm, we assume asynchronous (non-blocking) send and synchronous (blocking) receive message passing among component programs. However, other models can easily be accommodated through minor alterations to our proposed algorithm. Each component program may contain multiple threads. We assume the use of sockets for message passing among threads of different component programs, and the use of shared objects for message passing among different threads within a single component program.
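The socket-based exchange described above can be sketched as follows. This is our own illustration: for self-containment the "server" runs as a thread in the same JVM, whereas in the paper's setting it would be a separate component program on a remote machine; port 0 asks the OS for a free port.

```java
import java.io.*;
import java.net.*;

// Minimal sketch of message passing via sockets, using the
// getInputStream()/getOutputStream() calls discussed above.
public class SocketDemo {
    public static void main(String[] args) throws Exception {
        ServerSocket server = new ServerSocket(0);  // listen on any free port
        Thread serverThread = new Thread(() -> {
            try (Socket s = server.accept();
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(s.getInputStream()));
                 PrintWriter out = new PrintWriter(s.getOutputStream(), true)) {
                int x = Integer.parseInt(in.readLine()); // blocking receive
                out.println(x * 2);                      // send the result back
            } catch (IOException e) {
                e.printStackTrace();
            }
        });
        serverThread.start();

        // Client side: specify the host and port, then exchange messages.
        try (Socket s = new Socket("localhost", server.getLocalPort());
             PrintWriter out = new PrintWriter(s.getOutputStream(), true);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(s.getInputStream()))) {
            out.println(21);                             // send
            System.out.println("client received: " + in.readLine());
        }
        serverThread.join();
        server.close();
    }
}
```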

We assume that the number of nodes on which a distributed object-oriented program runs is predefined. We make no assumptions on the order in which messages arrive at the destination once they are sent. The only assumption we make is that messages sent by one thread to another are received in the same order as dispatched by the sending thread. However, messages sent concurrently from different threads to one thread may arrive in any arbitrary order. All messages that arrive at a thread are collected in a message queue. A thread executing a getInputStream( ) call removes the oldest message, which is available at the front of the queue. If the queue is empty, the receiving thread waits for a sending thread to put a new entry in the queue. Here, communication is non-deterministic in the sense that the receiving thread continues with its execution by selecting whichever message arrives first.
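The message-queue semantics above — per-sender FIFO delivery and a blocking receive on an empty queue — are modeled directly by java.util.concurrent.BlockingQueue, as this small sketch (our own illustration) shows:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch of the message queue: dispatch order is preserved per sender,
// and take() blocks when the queue is empty.
public class MessageQueueDemo {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        Thread sender = new Thread(() -> {
            try {
                queue.put("msg-1");   // dispatched first
                queue.put("msg-2");   // dispatched second
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        sender.start();
        // take() removes the oldest message, waiting if the queue is empty.
        System.out.println(queue.take());
        System.out.println(queue.take());
        sender.join();
    }
}
```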

We explain the message passing mechanism in distributed Java programs by taking a sample distributed Java program. We subsequently use the example program to explain the notations we have used in our algorithm. We construct the intermediate representation of this example program in the next section. Finally, we use this example program to explain the working of our proposed algorithm.

Let us consider the distributed Java program shown in Figs. 1 and 2. In this example, Fig. 1 represents a client program and Fig. 2 represents a server program. The client program specifies the IP address of the machine where the server program runs and the port number for connection. The client program reads the value of the integer variable x from the keyboard and performs some arithmetic computations using it. Then, it sends the results of the computation to the server program through a socket. The client program in Fig. 1 contains one thread called clthd. The server program in Fig. 2 contains two threads, thread1 and thread2. We distinguish the threads by assigning a unique thread-id to each of the threads. The thread thread1 receives the result sent by the client program and performs some arithmetic computations. Then, thread1 sends the results to thread2 through a shared object obj. The thread thread2 performs some computations using the shared object obj and sends the results to the client program through a socket. Using this result, the client program performs the final computations and displays the computed value.

2.2. Notations and terminologies

We now introduce a few terms and notations which will be used throughout the rest of the paper.

D1. Precise Dynamic Slice. A dynamic slice is said to be precise iff it includes only those statements that actually affect the value of a variable at a program point for the given execution. It is very difficult to determine whether a given slice is precise, since determining a precise slice is an undecidable problem (Weiser, 1982). However, using the notion of a precise slice, we can determine whether a given slice is more precise than another in most cases, excepting a few pathological cases.

D2. Correct Dynamic Slice. A dynamic slice is said to be correct iff it contains all the statements of the program that affect the slicing criterion. A dynamic slice is said to be incorrect iff it fails to include some statements of the program that affect the slicing criterion. Note that the whole program is always a correct slice for any slicing criterion. A correct slice is imprecise if and only if it contains at least one statement that does not affect the slicing criterion.

D3. def(var). Let var be an instance variable in a class in an object-oriented program. A node x is said to be a def(var) node if x represents a definition of the variable var. In Fig. 1, node 14 is a def(z) node.

D4. defSet(var). The set defSet(var) denotes the set of all def(var) nodes. In Fig. 1, defSet(z) = {14, 15}.

Fig. 1. An example client program.

D5. use(var) node. Let var be a variable defined in a class in an object-oriented program. A node x is said to be a use(var) node iff it uses the variable var. In Fig. 1, node 17 is a use(z) node.

D6. recentDef(thread, var). Let s be a def(var) node of a component program Pi. Let pi and pj be threads in Pi. Then, recentDef(pi, var) represents the most recent definition of the variable var available to the thread pi. Further, recentDef(pi, var) = (pj, s) indicates that the most recent definition of the variable var in thread pj is also the most recent definition of the variable var with respect to thread pi. pi may or may not be the same as pj, and the variable var can either be a local or a shared variable.

D7. Distributed Control Flow Graph (DCFG). A distributed control flow graph (DCFG) G of a component program Pi of a distributed program P = (P1, . . . , Pn) is a flow graph (N, E, Start, Stop), where each node n ∈ N represents a statement of Pi, and each edge e ∈ E represents potential control transfer among the nodes. Nodes Start and Stop are two unique nodes representing the entry and exit nodes of the component program Pi, respectively. There is a directed edge representing a control flow from node a to node b if control may flow from node a to node b.

D8. Post Dominance. Let x and y be two nodes in a DCFG G. Node y post dominates node x iff every directed path from x to Stop passes through y.

D9. Control Dependence. Let G be a DCFG and x be a test (predicate) node. A node y is said to be control dependent on a node x iff there exists a directed path D from x to y such that:

(1) y post dominates every node z ≠ x in D;
(2) y does not post dominate x.

D10. Data Dependence. Let x be a def(var) node and y be a use(var) node in a DCFG G. The node y is said to be data dependent on the node x iff there exists a directed path D from x to y such that there is no intervening def(var) node in D.

D11. Thread Dependence. For a DCFG G, let x be the node representing the run( ) statement of thread pi. A node y is said to be thread dependent on x iff there exists a directed path D from x to y such that none of the nodes in D is a run node.

D12. Synchronization Dependence. A statement y in a thread is synchronization dependent on a statement x in another thread iff the execution of y is dependent on the execution of x due to a synchronization operation.

Let y be a wait( ) node in thread t1 and x be the corresponding notify( ) node in thread t2. Then the node y is said to be synchronization dependent on node x. For example, in Fig. 2, node 10 represents a wait( ) call (which is invoked in thread2) and node 6 represents the corresponding notify( ) call (which is invoked in thread1). So, in Fig. 2, node 10 is synchronization dependent on node 6.

Fig. 2. An example server program.

D13. Communication Dependence. In a Java program, two types of communication dependencies may exist. We restrict communication dependency among threads belonging to the same component program to be only of the S-Communication dependence type, whereas communication dependency among threads belonging to different component programs is termed the M-Communication dependence type. In S-Communication dependence, shared memory may be used to support communication among threads; in this type, two threads exchange data via shared objects. In M-Communication dependence, communication among threads occurs through sockets.

S-Communication Dependence. For two threads belonging to the same component program, a statement y in one thread is S-Communication dependent on a statement x in another thread if the value of an object defined at x is directly used at y through inter-thread communication.

Let x be a def(var) node on a shared object present in a component program P1 and let y be the corresponding use(var) node on the same shared object. Then node y is said to be S-Communication dependent on node x. For example, in Fig. 2, node 9 represents a use(flag) node (flag is used in thread2) and node 7 represents the corresponding def(flag) node (in thread1). So, in Fig. 2, node 9 is S-Communication dependent on node 7. Similarly, node 11 (in thread2) is S-Communication dependent on node 5 (in thread1). Note that both thread1 and thread2 belong to the same component program.

M-Communication Dependence. In a component program P1, let x be a node representing a statement which invokes a getOutputStream( ) method, and let y be the corresponding node representing a statement which invokes a getInputStream( ) method in another component program P2. Let both P1 and P2 use the same socket for communication. Then, the node y is said to be M-Communication dependent on node x. For example, node 18 in Fig. 1 represents a statement which invokes a getInputStream( ) method. Node 52 in Fig. 2 represents the statement which invokes the corresponding getOutputStream( ) method. So, node 18 of Fig. 1 is M-Communication dependent on node 52 of Fig. 2.

3. Intermediate representation

In this section, we introduce an intermediate representation for distributed Java programs. We have named our intermediate representation the Distributed Program Dependence Graph (DPDG). We use this representation to compute dynamic slices of distributed Java programs. We first discuss some issues that must be addressed to be able to accurately capture the dependencies existing in a distributed program, we then introduce our DPDG, and finally we explain how it can be constructed.

The intermediate representation for a concurrent object-oriented program on a single machine can be constructed statically as in Mohapatra et al. (2004b). But this intermediate representation cannot be used to accurately model a distributed object-oriented program where true concurrency exists among the different threads running on different machines. For distributed object-oriented programs, we can have communication dependency among threads running on different machines. A getInputStream( ) call executed on one machine might have a pairing getOutputStream( ) call on some other remote machine. To represent this aspect, we introduce a logical (dummy) node in the DPDG. We call this logical node a C-node. In the following, we define a C-node.

D14. C-Node. Let GD1 and GD2 be the DPDGs of two component programs P1 and P2, respectively. Let x be a node in GD1 representing a statement invoking a getOutputStream( ) method. Let y be a node in GD2 representing the statement invoking the corresponding getInputStream( ) method. A C-node represents a logical connection of the node y of DPDG GD2 with the node x of the remote DPDG GD1. The C-node records the pairing of the getOutputStream( ) call at node x with the getInputStream( ) call at node y. Node y is M-Communication dependent on node x.

As an example, consider node 18 in Fig. 3 and node 25 in Fig. 4, which represent statements invoking getInputStream( ) methods. At those nodes, the messages sent by the sending threads (e.g., from statement 52 in Fig. 2 and from statement 16 in Fig. 1, respectively) are received. So, the algorithm associates C-nodes C(18) and C(25) with nodes 18 and 25, respectively. Node 18 is M-Communication dependent on node C(18) and node 25 is M-Communication dependent on node C(25) due to message passing.

The C-nodes maintain the logical connectivity among DPDGs representing different component programs. We therefore call them logical nodes. A C-node does not represent any specific statement in the source code of a component program. Rather, it encapsulates the triplet ⟨send_TID, send_node_number, dynamic_slice_at_send_node⟩

Fig. 3. DPDG of the example client program of Fig. 1. (Edge legend: control, data, thread and communication dependence edges; slice point at node 17; C-node C(18).)

representing the pairing of the components in a distributed program. Here, send_TID represents the id of the thread sending the message, send_node_number represents the particular label number of the statement sending the message, and dynamic_slice_at_send_node represents the dynamic slice at the sending node. C-nodes capture communication dependencies among the threads of different component programs. Since C-nodes are not mapped to any specific program statement, we call them dummy nodes.

The triplet ⟨send_TID, send_node_number, dynamic_slice_at_send_node⟩, in the case of inter-thread communication through sockets, should be made available at the C-node C(x) corresponding to the getInputStream( ) node x of the DPDG. For this, the thread executing a getOutputStream( ) call needs to perform the following. The thread passes the message to be sent to the slicer. The slicer piggybacks this triplet on the message. Whenever any thread executes a getInputStream( ) call, the slicer extracts the triplet from the message in the message queue and passes the actual message to the receiving thread. Thus the slicer updates the information on C-nodes and establishes the communication dependency.

It may be noted that the number of C-nodes in the DPDGs of a distributed Java program equals the number of getInputStream( ) calls present in the program. In the DPDG, for a getInputStream( ) node x, the corresponding C-node is represented as C(x).
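The piggybacking of the triplet can be sketched as follows. This is our own illustrative sketch, not the paper's implementation: the class and field names are ours, the channel is simulated with an in-memory queue instead of a socket, and the node numbers echo the running example.

```java
import java.util.*;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch: the sending slicer piggybacks <send_TID, send_node_number,
// dynamic_slice_at_send_node> on the message; the receiving slicer strips
// it off, records it on the C-node, and hands the payload to the thread.
public class CNodeDemo {
    // The piggybacked triplet plus the application payload (names are ours).
    static class TaggedMessage {
        final int sendTid, sendNode;      // sending thread id and statement label
        final Set<Integer> sliceAtSender; // dynamic slice at the send node
        final String payload;
        TaggedMessage(int tid, int node, Set<Integer> slice, String payload) {
            this.sendTid = tid; this.sendNode = node;
            this.sliceAtSender = slice; this.payload = payload;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<TaggedMessage> channel = new LinkedBlockingQueue<>();

        // Sending side: the slicer wraps the message before the actual send.
        Set<Integer> sliceAt52 = new TreeSet<>(Arrays.asList(46, 50, 52));
        channel.put(new TaggedMessage(2, 52, sliceAt52, "result=8"));

        // Receiving side: extract the triplet, update C(18), deliver payload.
        TaggedMessage m = channel.take();
        Map<String, Object> cNode18 = new LinkedHashMap<>();
        cNode18.put("send_TID", m.sendTid);
        cNode18.put("send_node_number", m.sendNode);
        cNode18.put("dynamic_slice_at_send_node", m.sliceAtSender);
        System.out.println("payload: " + m.payload);
        System.out.println("C(18): " + cNode18);
    }
}
```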

Fig. 4. DPDG of the example server program of Fig. 2.

Using the discussed terminology and concepts, we can now define a Distributed Program Dependence Graph (DPDG). Let P = (P1, . . . , Pn) be a distributed Java program, and let Pi be a component program of P. P is represented using a set of DPDGs (GD1, . . . , GDn). The distributed program dependence graph (DPDG) GDi of the component program Pi is a directed graph (NDi, EDi), where each node n (excepting the dummy nodes) represents a statement in Pi. For x, y ∈ NDi, (y, x) ∈ EDi iff any one of the following holds:

(1) y is control dependent on x. Such an edge is called a control dependence edge.
(2) y is data dependent on x. Such an edge is called a data dependence edge.
(3) y is thread dependent on x. Such an edge is called a thread dependence edge.
(4) y is synchronization dependent on x. Such an edge is called a synchronization dependence edge.
(5) y is communication dependent on x. Such an edge is called a communication dependence edge.

For every node x representing a getInputStream( ) call in the component program Pi, a dummy node C(x) is created, and a corresponding dummy M-Communication edge (x, C(x)) is added.

A Distributed Program Dependence Graph (DPDG) captures the basic thread structure of a distributed Java program component along with its run-time behavior. Thus a DPDG represents dynamic thread creation, synchronization of threads, and inter-thread communication using message passing. This graph contains the information available from other remote slicers by having additional logical nodes (C-nodes). A DPDG can contain nine different types of nodes. In the following, we list these types of nodes:

(1) A def (assignment) node represents a statement defining a variable.
(2) A use node represents a statement using a variable.
(3) A predicate node represents a statement containing an if construct.
(4) A run node represents a statement containing a run( ) statement.
(5) A notify node represents a statement containing a notify( ) method call.
(6) A wait node represents a statement containing a wait( ) method call.
(7) A getInputStream( ) node represents a statement invoking a getInputStream( ) method.
(8) A getOutputStream( ) node represents a statement invoking a getOutputStream( ) method.
(9) A C-node is a dummy node associated with a getInputStream( ) node, and represents its logical connection with the corresponding getOutputStream( ) node of a remote DPDG.

The DPDGs of the example programs given in Figs. 1 and 2 are shown in Figs. 3 and 4, respectively. In these figures, circles represent program statements and ellipses represent the C-nodes. Edges represent the various dependencies existing among program statements. Since the S-Communication dependence and M-Communication dependence are handled in a similar fashion in our DDS algorithm, we have used the same notation (dash-dotted edge) to represent them in the DPDG.

It can be observed that control dependencies do not vary with the choice of input values and hence can be determined statically at compile time. We refer to control dependencies as static dependencies. The dependencies arising due to data definitions, statements appearing under the scope of selection and loop constructs, inter-thread synchronization and inter-thread communication are handled at run-time after execution of every statement. These dependencies are dynamic dependencies and have to be handled appropriately at run-time.

4. Distributed dynamic slicing (DDS) algorithm

In this section, we first briefly explain our DDS algorithm. Subsequently, we illustrate the working of our algorithm through an example. Next, we investigate the space and time complexities of the DDS algorithm.

4.1. Overview of DDS algorithm

We now provide a brief overview of our dynamic slicing algorithm. Before execution of a distributed Java program P = (P1, . . . , Pn), the DCFG of each component program Pi is constructed statically. Next, we statically construct the DPDG of each component program Pi from the corresponding DCFG. During execution of a component program Pi, we mark an edge of the DPDG when its associated dependence exists, and unmark the edge when its associated dependence ceases to exist (Mohapatra et al., 2004a). Since control dependencies do not change during run-time, we permanently mark the control dependence edges. We consider all the data dependence edges, thread dependence edges, synchronization dependence edges and communication dependence edges for marking and unmarking. We support communication across different machines. The following activities are explicitly carried out in our DDS algorithm to capture this communication. Inter-machine communication is captured at run-time by adding C-nodes in the DPDG. The addition of C-nodes in the DPDG takes care of any communication dependency that might exist at run-time between communicating threads on different machines.

Whenever a statement invoking a getInputStream( ) method is executed during a run of the program, the slicer checks the message queue for the availability of a message from any communicating thread. It then extracts the triplet ⟨send_TID, send_node_number, dynamic_slice_at_send_node⟩ that was piggybacked on the actual message. Then, the slicer updates the information on the C-node regarding the execution of the pairing getOutputStream( ) node in some thread on a remote or a local machine.

We compute the dynamic slice of a distributed Java program with respect to a distributed slicing criterion. We define a distributed slicing criterion for a component program Pi as a triplet ⟨p, u, var⟩, where u is the statement of interest in thread p and var is a variable used or defined at statement u. During execution of the component program Pi, let Dynamic_Slice(p, u, var) denote the dynamic slice with respect to the variable var in the most recent execution of the statement u in thread p. Let (x1, u), . . . , (xk, u) be all the marked incoming dependence edges of u in the updated DPDG after an execution of the statement u. Then, we define the dynamic slice with respect to the present execution of the statement u, for the variable var in thread p, as

Dynamic_Slice(p, u, var) = {(p, x1), . . . , (p, xk)} ∪ Dynamic_Slice(p, x1, var) ∪ · · · ∪ Dynamic_Slice(p, xk, var).

Let {var_1, var_2, . . . , var_k} be the set of all the variables used or defined at a statement u in some thread p. Then, we define the dynamic slice of the statement u as

Dynamic_Slice(p, u) = Dynamic_Slice(p, u, var_1) ∪ Dynamic_Slice(p, u, var_2) ∪ · · · ∪ Dynamic_Slice(p, u, var_k).

Our slicing algorithm operates in the following three main stages:

Stage 1: Construct the intermediate program representation graph statically.
Stage 2: Manage the DPDG at run-time.
Stage 3: Compute the required dynamic slice.

In the first stage, the DCFG of each component program Pi is constructed from a static analysis of the source code. Also in this stage, the static DPDG is constructed using the DCFG. Stage 2 of the algorithm handles run-time updates and is responsible for maintaining the DPDG as the program execution proceeds. The maintenance of the DPDG at run-time involves marking and unmarking the different dynamic dependencies as they arise and cease, and creating nodes for dynamically created threads, objects, etc. Stage 3 is responsible for computing the dynamic slice for a given slicing criterion using the DPDG. Once a slicing criterion is specified, our DDS algorithm immediately displays the dynamic slice with respect to that criterion by looking up the corresponding Dynamic_Slice computed during run-time.
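The stored-slice bookkeeping that turns Stage 3 into a simple look-up can be sketched as below. This is a minimal illustration under assumed names (SliceTable, updateOnExecution); the actual DSDJ data structures are not specified here:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch of the stored-slice update: after node u executes in thread p,
// its slice is the set of marked predecessor nodes unioned with their
// previously computed slices. Stage 3 then only reads the table.
class SliceTable {
    // key: "thread:node"; value: set of "thread:node" entries in the slice
    private final Map<String, Set<String>> dynamicSlice = new HashMap<>();

    void updateOnExecution(String p, int u, List<Integer> markedPredecessors) {
        Set<String> slice = new HashSet<>();
        for (int x : markedPredecessors) {
            String key = p + ":" + x;
            slice.add(key);                                          // {(p, x1), ..., (p, xk)}
            slice.addAll(dynamicSlice.getOrDefault(key, Set.of()));  // ∪ Dynamic_Slice(p, xi)
        }
        dynamicSlice.put(p + ":" + u, slice);
    }

    // Stage 3: slice look-up is a plain table read, hence the fast response time.
    Set<String> lookUp(String p, int u) {
        return dynamicSlice.getOrDefault(p + ":" + u, Set.of());
    }
}
```

Because each executed statement folds its predecessors' slices into its own entry eagerly, the slice for any criterion is already available before it is requested.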

To achieve fast response time, our DDS algorithm runs in parallel on several machines connected through a network. For this purpose, we use local slicers at each remote machine. Our slicing algorithm in effect operates as the coordinated activities of the local slicers running at the remote machines. Each local slicer contributes to the dynamic slice by determining its local portion of the global slice in a fully distributed fashion. We now present our DDS algorithm for distributed Java programs in pseudo-code form.

Algorithm: Distributed Dynamic Slicing (DDS) algorithm.
Input: Slicing criterion ⟨p, u, var⟩
Output: Dynamic_Slice(p, u, var)

Stage 1: Constructing static graphs
(1) DCFG construction
    (a) Node construction
        (i) Create two special nodes start and stop.
        (ii) For each statement s of the sub-program Pi do the following:
            (A) Create a node s.
            (B) Initialize the node with its type, list of variables used or defined, and its scope.
    (b) Add control flow edges
        for each node x do
            for each node y do
                Add control flow edge (y, x) if control may flow from node y to node x.
(2) DPDG construction
    (a) Add control dependence edges
        for each test (predicate) node u do
            for each node x in the scope of u do
                Add control dependence edge (u, x) and mark it.
    (b) Add data dependence edges
        for each node x do
            for each variable var used at x do
                for each reaching definition u of var do
                    Add data dependence edge (u, x) and unmark it.
    (c) Add thread dependence edges
        for each run node u do
            Add thread dependence edge (u, x) for every node x that is thread dependent on u and unmark it.
    (d) Add synchronization dependence edges
        for each wait node x in thread t1 do
            for the corresponding notify node u in thread t2 do
                Add synchronization dependence edge (u, x) and unmark it.
    (e) Add S-Communication dependence edges
        for each use(var) node x in thread t1 do
            for the corresponding def(var) node u in thread t2 do
                Add S-Communication dependence edge (u, x) and unmark it.
    (f) Add M-Communication dependence edges
        for each getInputStream( ) node u do
            Add a C-node C(u).
            Add M-Communication dependence edge (u, C(u)) and unmark it.

Stage 2: Managing the DPDG at run-time
(1) Initialization: Do the following before execution of each component program Pi at the corresponding local slicer:
    (a) Set Dynamic_Slice(NULL, u, var) = ∅ for every variable var used or defined at every node u of the DPDG.
    (b) Set recentDef(NULL, var) = ∅ for every variable var in Pi.
    (c) Set message queue = ∅.
    (d) Set ⟨send_TID, send_node_number, dynamic_slice_at_send_node⟩ = NULL for every C-node C(x).
(2) Run-time updates: Run the component programs in parallel. For a component program Pi, carry out the following at the corresponding local slicer after each statement (p, u) of Pi is executed:
    (a) Unmark all incoming marked dependence edges to (p, u), excluding the control dependence edges, if any, associated with the variable var, corresponding to the previous execution of the node u.
    (b) Update data dependencies: For every variable var used at node (p, u), mark the data dependence edge corresponding to the most recent definition recentDef(p, var) of the variable var.
    (c) Update thread dependencies: For every node u, mark the thread dependence edge between the most recently executed run node and the node (p, u).
    (d) Update synchronization dependencies: If u is a wait node, then mark the incoming synchronization dependence edge corresponding to the associated notify node.
    (e) Update S-Communication dependencies: If u is a use(var) node in thread t1, then mark the incoming S-Communication dependence edge, if any, from the corresponding def(var) node in thread t2.
    (f) Update M-Communication dependencies: If (p, u) is a getInputStream( ) node, then mark the incoming M-Communication dependence edge, if any, from the corresponding C-node C(u).
    (g) Update the dynamic slice for the different dependencies:
        (i) Handle data dependency: Let {(d1, u), . . . , (dj, u)} be the set of marked incoming data dependence edges to u in thread p, where d1, d2, . . . , dj are the initial vertices of the corresponding marked incoming edges of u. Then, Dynamic_Slice(p, u) = {(p, d1), . . . , (p, dj)} ∪ Dynamic_Slice(p, d1) ∪ · · · ∪ Dynamic_Slice(p, dj).
        (ii) Handle control dependency: Let (c, u) be the marked control dependence edge. Then, Dynamic_Slice(p, u) = Dynamic_Slice(p, u) ∪ {(p, c)} ∪ Dynamic_Slice(p, c).
        (iii) Handle thread dependency: Let (t, u) be the marked thread dependence edge. Then, Dynamic_Slice(p, u) = Dynamic_Slice(p, u) ∪ {(p, t)} ∪ Dynamic_Slice(p, t).
        (iv) Handle synchronization dependency: Let u be a notify node in thread p, s be a wait node in thread p1, and (s, u) be the marked synchronization dependence edge. Then, Dynamic_Slice(p, u) = Dynamic_Slice(p, u) ∪ {(p1, s)} ∪ Dynamic_Slice(p1, s).
        (v) Handle S-Communication dependency: Let u be a use(var) node in thread p and (z, u) be the marked S-Communication dependence edge from the corresponding def(var) node z in thread p1. Then, Dynamic_Slice(p, u) = Dynamic_Slice(p, u) ∪ {(p1, z)} ∪ Dynamic_Slice(p1, z).
        (vi) Handle M-Communication dependency: Let u be a getInputStream( ) node and (u, C(u)) be the marked communication dependence edge associated with the corresponding C-node C(u). Then, Dynamic_Slice(p, u) = Dynamic_Slice(p, u) ∪ {(p, C(u))} ∪ Dynamic_Slice(p, C(u)).

Stage 3: Computing the dynamic slice for a given slicing criterion
(1) Dynamic slice computation:
    (a) For every variable var used at node u in thread p of the component program Pi, do:
        Let (d, u) be a marked data dependence edge corresponding to the most recent definition of the variable var, (c, u) be the marked control dependence edge, (s, u) be the marked synchronization dependence edge, (t, u) be the marked thread dependence edge, (z, u) be the marked S-Communication dependence edge, and (C(u), u) be the marked M-Communication dependence edge. Then,
        Dynamic_Slice(p, u, var) = {(p, d), (p, c), (p1, s), (p, t), (p1, z), (p, C(u))} ∪ Dynamic_Slice(p, d) ∪ Dynamic_Slice(p, c) ∪ Dynamic_Slice(p1, s) ∪ Dynamic_Slice(p, t) ∪ Dynamic_Slice(p1, z) ∪ Dynamic_Slice(p, C(u)) // p and p1 may be different threads
    (b) For a variable var defined at node u, do Dynamic_Slice(p, u, var) = Dynamic_Slice(p, u).
(2) Slice look-up:
    (a) If a slicing command ⟨p, u, var⟩ is given for a component program Pi, carry out the following:
        (i) Look up Dynamic_Slice(p, u, var) for the content of the slice.
        (ii) Display the resulting slice.
    (b) If the program has not terminated, go to Step 2 of Stage 2.
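For concreteness, the kind of wait/notify pairing tracked by Step 2(d) of Stage 2 looks like the following minimal Java sketch (the class is illustrative, not taken from the example programs):

```java
// Illustrative wait/notify pair: the notify node u in one thread and the
// wait node x in another thread induce a synchronization dependence edge
// (u, x), which the slicer marks when the wait node executes.
class SharedFlag {
    private boolean ready = false;

    synchronized void await() throws InterruptedException {
        while (!ready) {
            wait();     // the wait node x: resumes only after the pairing notify
        }
    }

    synchronized void signal() {
        ready = true;
        notify();       // the notify node u
    }
}
```

In the updated DPDG, the wait statement is included in a slice only when this edge is marked, i.e., only when the notify actually released it in the execution under consideration.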

4.2. Working of the DDS algorithm

We illustrate the working of the algorithm with the help of an example. Consider the distributed Java program given in Figs. 1 and 2. The threads in the client and server programs are identified by unique thread-ids. Let the thread-id of clthd in Fig. 1 be 1001, the thread-id of thd1 in Fig. 2 be 2001 and the thread-id of thd2 in Fig. 2 be 2002. The updated DPDGs, obtained after applying Stage 2 of the DDS algorithm, are shown in Figs. 5 and 6. Let us compute the dynamic slice with respect to the variable q at statement 23 of the thread clthd in the client program (Fig. 1). This gives us the slicing criterion ⟨1001, 23, q⟩ in the client program. With input data s = 20 in the client program in Fig. 1 and b = 2 in the server program in Fig. 2, we explain how our DDS algorithm computes the dynamic slice.

During the initialization step, our algorithm first unmarks all the edges except the control dependence edges of the DPDG and sets Dynamic_Slice(p, u, var) = ∅ for every node u of the DPDG. The algorithm has marked the synchronization dependence edge (6, 10) in Fig. 6, as a synchronization dependency exists between statements 6 and 10 due to their wait–notify relationship. Statement 9 is communication dependent on statement 7, and statement 11 is communication dependent on statement 5, due to the shared objects flag and s, respectively; so, in Fig. 6, the algorithm marks the S-Communication dependence edges (7, 9) and (5, 11). Node 9 is communication dependent on node C(9), and node 63 is communication dependent on node C(63), due to message passing; so, the algorithm marks the M-Communication dependence edges (C(9), 9) as shown in Fig. 5 and (C(63), 63) in Fig. 6. Similarly, the algorithm marks the thread and data dependence edges when the respective dependencies arise. All the marked edges are shown in bold lines in Figs. 5 and 6.

Now we explain how the DDS algorithm finds the backward dynamic slice with respect to the slicing criterion ⟨1001, 23, q⟩. According to our DDS algorithm, the dynamic slice at statement 23 of the client program is given by the expression Dynamic_Slice(1001, 23, q) = {(1001, 21), (1001, 6)} ∪ Dynamic_Slice(1001, 21) ∪ Dynamic_Slice(1001, 6). By applying the DDS algorithm, we get the final dynamic slice at statement 23 of Fig. 1. The statements included in the dynamic slice are shown as shaded vertices in Figs. 5 and 6. The dynamic slice is also shown as the statements in rectangular boxes in Figs. 7 and 8.

4.3. Correctness of DDS algorithm

In this section, we sketch the proof of correctness of our DDS algorithm.

Fig. 5. Updated DPDG of client program.

Fig. 6. Updated DPDG of server program.

Fig. 7. Dynamic slice in the client program (Fig. 1) for slicing criterion (1001, 23, q).

Theorem 1. The DDS algorithm always finds a correct dynamic slice with respect to a given slicing criterion.

Proof. The proof is by mathematical induction. Let P = (P1, . . . , Pn) be a distributed Java program for which a dynamic slice is to be computed using the DDS algorithm, and let Pi be a component program of P. For any given set of input values to Pi, the dynamic slice with respect to the first executed statement is certainly correct, according to the definition; from this, we can argue that the dynamic slice with respect to the second executed statement is also correct. During execution of the component program Pi, assume that the algorithm has computed correct dynamic slices prior to the execution of a statement u. To complete the proof, we only need to show that the dynamic slice computed after this execution of the statement u is correct. Note that the statements that affect this execution of the statement u must have been executed prior to it, and by the induction hypothesis the dynamic slices computed for those earlier executions are correct. Steps 2(b), 2(c), 2(d), 2(e) and 2(f) of Stage 2 of the DDS algorithm ensure that the node u is dependent (with respect to its present execution) on a node v if and only if the edge (v, u) is marked in the DPDG of the component program Pi. Hence the dynamic slice Dynamic_Slice(p, u, var) contains all those statements which have affected the current value of the variable var used at u, since our DDS algorithm has marked the incoming edges to u only from those nodes on which node u is dependent; a node that has no effect on the variable var is not included in Dynamic_Slice(p, u, var). Therefore, Steps 2(g)(i)–2(g)(vi) of Stage 2 and Steps 1(a) and 1(b) of Stage 3 of the DDS algorithm ensure that the dynamic slices computed after the execution of the statement u are correct. Further, Step 2(b) of Stage 3 of the DDS algorithm guarantees that the algorithm stops when execution of the component program Pi terminates. This establishes the correctness of the algorithm. □

4.4. Algorithm analysis

In this section, we analyze the space and time complexity of our DDS algorithm.

Fig. 8. Dynamic slice in the server program (Fig. 2) for slicing criterion (1001, 23, q).

Space complexity. Let Pi be a component program of the distributed Java program P. We assume that the number of threads in a component program Pi is bounded by a small positive number. Let Pi contain ni program statements, and let k be the number of component programs in the distributed program P; the value of k is usually a small finite number in a loosely coupled environment. Let N be the total number of statements in all the component programs of P, so that N = n1 + n2 + · · · + nk. We use a C-node for each getInputStream( ) node in the DPDG of every component program. The C-nodes maintain logical connectivity among the various component programs running on different machines, so the slice at some arbitrary node of one DPDG may contain nodes of other, remote DPDGs. The number of C-nodes in the DPDGs of a distributed Java program equals the number of getInputStream( ) calls present in the program. Since the number of getInputStream( ) calls present in a component program is bounded, the number of C-nodes in all the DPDGs of the distributed program P is also bounded. It can be easily seen that the space requirement for the DPDG of a component program Pi having ni statements is O(ni²). Since each ni is bounded by the total number of statements in the whole distributed program, and n1² + · · · + nk² ≤ (n1 + · · · + nk)² = N², the space requirement for all the DPDGs of the distributed program P having N statements is O(N²). We need the following additional space at run-time for manipulating the DPDG:

(1) To store Dynamic_Slice(p, u, var) for a statement u of the component program Pi, at most O(N) space is required, as the maximum size of a slice equals the size of the distributed program P. So, for the ni statements of the component program Pi, at most O(niN) space is required. Since ni is bounded by N, in the worst case the space requirement for Dynamic_Slice(p, u, var) becomes O(N²), where N is the total number of statements in P.

(2) Let there be v variables present in the component program Pi. To store recentDef(thread, var) for one variable var of Pi, at most O(ni) space is required. Since the number of variables v is less than the number of statements ni, our DDS algorithm requires O(ni²) space to store recentDef(thread, var) for all the variables.

Since the space complexity of the DPDGs and the run-time storage requirements are both O(N²), the space complexity of our DDS algorithm is O(N²), N being the total number of statements of the distributed program P.

Time complexity. To determine the time complexity of our DDS algorithm, we need to consider the two factors making up the time required to compute a slice: the execution time required for the run-time manipulation of the DPDG, and the time required to look up the data structure Dynamic_Slice to extract the dynamic slice once the slicing command is given. Let Si be the length of the execution of the component program Pi, and let S = S1 + · · · + Sk, where k is the number of component programs in P. Let N be the total number of statements in all the component programs of P, i.e., N = n1 + · · · + nk. The time required for computing and updating the information corresponding to the execution of one statement is O(kN²); since the updates occur simultaneously at the local nodes and the value of k is usually a small finite number in a loosely coupled environment, this reduces to O(N²). Hence, the run-time complexity of the DDS algorithm for computing the dynamic slice over the entire execution of the distributed program P is O(N²S). Next, consider the complexity of extracting the slice once a slicing criterion is specified; this excludes the run-time computations required to maintain the DPDG. The DPDG is annotated with the dynamic slices for the executed statements, so a dynamic slice can be looked up in O(N) time, where N is the total number of statements in the distributed program P.

5. Implementation

In this section, we present a brief description of a tool which we have developed to implement our dynamic slicing algorithm for distributed object-oriented programs. Our tool can compute the dynamic slice of a distributed object-oriented program with respect to a given slicing criterion. The current version handles only a subset of Java language constructs; we are now extending the tool to handle the complete Java syntax. We have named our tool Dynamic Slicer for Distributed Java programs (DSDJ). To construct the intermediate graphs, we have used the compiler writing tools JLEX and JYACC (Levine et al., 2002). A distributed Java program is given as the input to the JLEX program, which automatically generates the DPDGs for the component programs. The schematic design of our implementation is shown in Fig. 9.

A distributed Java program is read as input to our slicer. The lexical analyzer, parser and semantic analyzer components are combined, and the joint component is termed the program analysis component (Aho et al., 1986). The lexical analyzer part has been implemented using the standard lexical analysis tool JLEX. The semantic analyzer component has been implemented using JYACC, the standard tool for LALR(1) parsers. During semantic analysis, the Java source code is analyzed token by token to gather the various program dependencies.

The tokens are first used to construct the DCFG (Distributed Control Flow Graph). Next, using the DCFG, the corresponding DPDG (Distributed Program Dependence Graph) is constructed as mentioned in Stage 1 (DPDG construction) of the DDS algorithm. The source program is then automatically instrumented by adding calls to the slicer module after every statement in the source program.
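The shape of such instrumentation can be sketched as follows; the stub, node numbers and statements are purely illustrative, since the paper does not show DSDJ's instrumented output:

```java
// Illustrative shape of source-level instrumentation: a call into the
// local slicer is inserted after every original statement, reporting the
// executing thread and the DPDG node number of that statement.
class SlicerStub {
    static void afterStatement(int threadId, int nodeNumber) {
        // the local slicer would unmark/mark dependence edges and update
        // Dynamic_Slice for this (thread, node) execution here
        System.out.println("executed node " + nodeNumber + " in thread " + threadId);
    }
}

class InstrumentedExample {
    public static void main(String[] args) {
        int tid = 1001;                       // hypothetical thread-id
        int s = 20;                           // an original statement (say, node 5)
        SlicerStub.afterStatement(tid, 5);    // inserted call
        int q = s * 2;                        // another original statement (say, node 23)
        SlicerStub.afterStatement(tid, 23);   // inserted call
        System.out.println(q);
    }
}
```

Driving the slicer from inserted calls, rather than from a recorded trace, is what lets DSDJ avoid trace files entirely.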

Fig. 9. Schematic diagram of DSDJ.

Fig. 10. Communication among different local slicers.

We have used local slicers at every node of the distributed system for running the algorithm in a distributed manner. Each local slicer contributes to the dynamic slice by determining its local portion of the global slice in a fully distributed fashion. The local slicers update the DPDGs as and when the dependencies arise and cease at run-time, and compute the dynamic slice for the slicing criterion specified through the GUI. The local slicers also communicate with each other to resolve the inter-component dependencies. Fig. 10 shows how the local slicers communicate with each other in a distributed environment.
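A minimal sketch of how two local slicers might exchange slice information over a socket is shown below; the protocol and names are assumptions for illustration, not DSDJ's actual implementation:

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

// Illustrative exchange between two local slicers over a loopback socket:
// the sending slicer transmits its slice information; the receiving slicer
// reads it, as done when resolving M-Communication dependencies.
class SlicerLink {
    static String roundTrip(String sliceInfo) throws IOException {
        try (ServerSocket server = new ServerSocket(0)) {  // ephemeral port
            Thread sender = new Thread(() -> {
                try (Socket s = new Socket("127.0.0.1", server.getLocalPort());
                     DataOutputStream out = new DataOutputStream(s.getOutputStream())) {
                    out.writeUTF(sliceInfo);  // piggybacked slice information
                } catch (IOException ignored) { }
            });
            sender.start();
            try (Socket s = server.accept();
                 DataInputStream in = new DataInputStream(s.getInputStream())) {
                return in.readUTF();          // received by the remote slicer
            }
        }
    }
}
```

In a real deployment each local slicer would merge the received slice into the C-node for the pairing receive statement rather than simply returning it.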

Table 1
Average run-time requirement and overhead cost of DDS algorithm

Sl. no.  Prg. size (# stmts.)  # Component programs  Normal exec. time (ms)  Avg. run-time reqmt. (ms)  Overhead cost (ms)
1        250                   2                     94                      142                        48
2        355                   2                     117                     185                        68
3        462                   2                     141                     237                        96
4        558                   3                     165                     294                        129
5        670                   3                     190                     355                        165
6        782                   3                     215                     416                        201
7        894                   3                     247                     482                        235

Table 2
Memory requirement of DDS algorithm

Sl. no.  Total prg. size (# stmts.)  # Component programs  # Objects present  Memory reqmt. (KB)
1        250                         2                     55                 502
2        355                         2                     76                 714
3        462                         2                     88                 930
4        558                         3                     105                1125
5        670                         3                     127                1350
6        782                         3                     145                1575
7        894                         3                     167                1798

We have tested the working of our slicing tool, DSDJ, using a large number of distributed Java programs and for several slicing criteria. Our tool supports inter-thread synchronization and inter-thread communication using sockets and shared memory. We studied the run-time requirements of our DDS algorithm for several programs and for several runs. Table 1 summarizes the average run-time requirements and overhead costs of the DDS algorithm. Since we could not find any existing algorithm for dynamic slicing of distributed object-oriented programs, we do not present any comparative results; we present only the results obtained from our experiments. The performance results of our implementation agree with the theoretical analysis. From the experimental results, it can be observed that the average run-time requirement increases slowly as the program size increases; consequently, the overhead cost also increases slowly with program size. Table 2 summarizes the memory requirements of the DDS algorithm. It can be observed that the memory requirement also increases slowly as the program size increases, because the number of nodes and edges of the intermediate graph and the number of objects increase with program size. It may be noted that we conducted the experiments on some typical example programs, so results such as the average run-time and memory requirements may vary from program to program. As the tool DSDJ does not need any trace files to store the execution history, it imposes no restrictions on the size of the distributed programs and avoids expensive file I/O operations. Another advantage is that the marking and unmarking technique used in our approach obviates the need to create any new nodes in the different iterations of a loop; thus, the run-time data structure remains bounded even in the presence of loops.


6. Comparison with related work

Slicing of concurrent object-oriented programs has been investigated by many researchers (Chen and Xu, 2001; Zhao, 1999a,b; Zhao et al., 1996). Slicing of distributed procedural programs (Cheng, 1993; Duesterwald et al., 1992; Kamkar and Krajina, 1995; Korel and Ferguson, 1992; Li et al., 2004) has also drawn the attention of many researchers. To the best of our knowledge, no algorithm for dynamic slicing of distributed object-oriented programs has been reported in the literature. In the absence of any directly comparable work, we compare our algorithm with the existing dynamic slicing algorithms for distributed procedural programs.

Korel and Ferguson (1992) have proposed an extension of their dynamic slicing algorithm (Korel and Laski, 1988) to distributed programs with Ada-type rendezvous communication. In their approach, each process generates a complete execution trace, and the dependence information necessary to construct program slices is determined postmortem by analyzing the stored traces. The slicing algorithm of Korel and Ferguson (1992) thus operates on complete execution traces whose lengths may be unbounded. The computed slices are not independent programs; they are executed using an explicit run-time scheduler which ensures the replay of the recorded communication events.

Duesterwald et al. (1992) presented a hybrid parallel algorithm for computing dynamic slices of procedural distributed programs using a Distributed Dependence Graph (DDG) as the program representation. Their algorithm combines both static and dynamic information to compute a slice. A DDG contains a separate vertex for each statement and control predicate in the program. Control dependencies between statements are determined statically, prior to execution; edges for data and communication dependencies are added to the graph at run-time. Slices are computed in the usual way, by determining the set of DDG vertices from which the vertices specified in the criterion can be reached. Both the construction of the DDG and the computation of slices are performed in a distributed manner: a separate DDG construction procedure and slicing procedure are assigned to each process pi in the program, and the processes communicate when a send or a receive construct is encountered. Additionally, they proposed to transform non-deterministic communication constructs into deterministic ones to provide re-executable slices. Their approach requires the user to specify a slicing criterion in terms of a particular process and execution position. However, since a single vertex is used for all occurrences of a statement in the execution history, inaccurate slices may be computed in the presence of loops. They did not consider communication through shared objects. Also, their method cannot be applied to programs in which processes send messages asynchronously, due to the assumption of synchronous message send.

Cheng (1993) presented an alternative dependence graph-based algorithm for computing dynamic slices of procedural distributed and concurrent programs. He used the Program Dependence Net (PDN) as the intermediate representation. The PDN representation of a concurrent program is basically a generalization of the initial approach proposed by Agrawal and Horgan (1990). The PDN vertices corresponding to the executed statements are marked, and the static slicing algorithm is applied to the PDN sub-graph induced by the marked vertices. So, if a statement in a while loop is executed in some iteration, then the corresponding vertex is marked and included in the slice; but if that statement is not executed in some other iteration, the marked vertex is not removed from the slice. This approach therefore yields inaccurate slices for programs having loops. Our algorithm, in contrast, unmarks the edges of the DPDG when a dependency ceases to exist: if a statement was executed in some iteration of a loop but is not executed in some later iteration, our algorithm successfully omits that statement from the slice.

Li et al. (2004) and Rilling et al. (2002) presented two novel predicate-based dynamic slicing algorithms for procedural distributed programs. Their algorithms are based on a partially ordered multi-set (POMSET) model. Unlike traditional slicing criteria, which focus only on the parts of the program that influence a variable of interest at a specific position in the program, a predicate focuses on those parts of the program that influence the predicate. The dynamic predicate slices capture some global requirements or suspected error properties of a distributed program, and the algorithms compute all statements that are relevant. Their algorithms handle distributed programs that communicate through message passing. They did not consider communication through shared objects, nor did they consider object-orientation aspects.

Goel et al. (2003) proposed compression schemes for representing execution profiles of shared memory parallel programs. Their representation captures control, data flow and synchronization in the execution of a shared memory multi-threaded program running on a multiprocessor architecture. In their approach, the control and data flow of each processor is maintained individually as whole program paths (WOP). The total order of the synchronization operations executed by all processors, and the annotation of each processor's WOP with synchronization counts, help to capture the inter-processor communications which are protected via synchronization primitives such as lock, unlock and barriers. They have illustrated the applications of compact execution traces in program debugging, program comprehension, code optimization, memory layout, etc. They have used trace files to store the execution history, which leads to slow I/O operations. They also assume that communication across different threads occurs only via synchronization primitives; communication via shared variable accesses is not explicitly represented in their method. We have considered communication among threads through shared variables as well as through message passing.

Garg and Mittal (2001) introduced the notion of a slice of a distributed computation. They have defined the slice of a distributed computation with respect to a global predicate as a computation which captures those, and only those, consistent cuts of the original computation which satisfy the global predicate. A computation slice differs from a dynamic slice in that it is defined for a property rather than for a set of variables of a program. Unlike a program slice, which always exists, a computation slice may not always exist. They have proved that the slice of a distributed computation with respect to a predicate exists iff the set of consistent cuts that satisfy the predicate forms a sub-lattice of the lattice of consistent cuts. Mittal and Garg (2001a,b) presented an efficient algorithm to graft two slices, that is, given two slices, either compute the smallest slice that contains all consistent cuts common to both slices, or compute the smallest slice that contains all consistent cuts belonging to at least one of the slices. They have not considered object-orientation aspects.

Some of the existing techniques (Duesterwald et al., 1992; Korel and Ferguson, 1992) for debugging distributed programs include event-based debugging based on recorded event histories and execution replay. During instant replay, the original execution of a program (or an individual process) is reproduced based on the recorded order of received messages. Most existing methods (Duesterwald et al., 1992; Goel et al., 2003; Kamkar and Krajina, 1995; Korel and Ferguson, 1992) use an execution trace file whose size is proportional to the number of executed statements, which itself can be unbounded in the presence of loops; on top of this, they use graph reachability to compute dynamic slices, which can take a large amount of time. Our use of the DPDG does not involve trace files. Since no trace files are used in our method, it significantly improves the space as well as time complexity.

Our graph representation is substantially different from all the other existing methods, e.g., Cheng (1993), Duesterwald et al. (1992), and Kamkar and Krajina (1995), and takes care of dynamically created threads and of message passing using message queues. Our DPDG can handle thread creation, inter-thread synchronization and inter-thread communication. Using our method, messages can be sent asynchronously from one thread to another. In our approach, messages are stored in message queues and are later retrieved from the queue by the receiving thread. This is a more elaborate message passing mechanism compared to the techniques developed by Cheng (1993), Duesterwald et al. (1992), Goel et al. (2003), Kamkar and Krajina (1995), Korel and Ferguson (1992), and Li et al. (2004). Our dynamic slicing algorithm successfully handles the complications created by this message passing mechanism.
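The queue-based asynchronous message passing described above can be sketched with Java's standard blocking queues. This is a minimal illustration under our own naming (class `MessageQueueDemo` and the message strings are ours), not code from the DPDG implementation:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch: each receiving thread owns a message queue; a sender enqueues
// a message and proceeds without waiting, and the receiver retrieves
// messages later, in arrival order.
public class MessageQueueDemo {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();

        Thread receiver = new Thread(() -> {
            try {
                // Messages are taken out of the queue in FIFO order.
                for (int i = 0; i < 3; i++) {
                    System.out.println("received: " + queue.take());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        receiver.start();

        // The sender does not block on the receiver: the send is asynchronous.
        queue.put("m1");
        queue.put("m2");
        queue.put("m3");
        receiver.join();
    }
}
```

In a slicing context, each `take` by the receiving thread is the point at which a communication dependence on the corresponding `put` arises.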

7. Conclusions

In this paper, we have proposed a novel technique for computing dynamic slices of distributed Java programs. We have introduced the notion of the distributed program dependence graph (DPDG) as the intermediate program representation used by our slicing algorithm. We have named our algorithm the distributed dynamic slicing (DDS) algorithm. It is based on marking and unmarking the edges of the DPDG as and when the dependencies arise and cease at run-time. To achieve fast response time, our algorithm runs on several machines connected through a network in a distributed fashion. Our algorithm addresses the concurrency issues of Java programs while computing the dynamic slices. It also handles the communication dependency arising due to objects shared among threads on the same machine and due to message passing among threads on different machines. Our algorithm does not require any trace file to store the execution history. Another important advantage of our algorithm is that when a slicing command is given, the dynamic slice is extracted immediately by looking up the data structure Dynamic_Slice, as it is already available during run-time. Although we have presented our dynamic slicing technique for Java programs, the technique can easily be adapted to other object-oriented languages such as C++.
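The edge marking idea and the constant-time slice lookup can be sketched in outline as follows. This is a simplified illustration under assumed names (everything except Dynamic_Slice is our own; the actual DDS data structures differ):

```java
import java.util.*;

// Sketch: dependence edges are marked when a dependence arises and
// unmarked when it ceases; the slice of each statement is kept up to
// date in Dynamic_Slice, so a slicing request is a plain lookup.
public class EdgeMarkingSketch {
    // Currently marked edges: statement -> statements it depends on.
    private final Map<Integer, Set<Integer>> markedEdges = new HashMap<>();
    // Dynamic_Slice: statement -> its current dynamic slice.
    private final Map<Integer, Set<Integer>> dynamicSlice = new HashMap<>();

    // A dependence from u to d arises at run-time: mark the edge.
    void mark(int u, int d) {
        markedEdges.computeIfAbsent(u, k -> new HashSet<>()).add(d);
        recompute(u);
    }

    // The dependence ceases (e.g., the variable is redefined): unmark it.
    void unmark(int u, int d) {
        Set<Integer> deps = markedEdges.get(u);
        if (deps != null) { deps.remove(d); recompute(u); }
    }

    // Slice(u) = {u} ∪ {d} ∪ Dynamic_Slice(d), over all marked edges (u, d).
    private void recompute(int u) {
        Set<Integer> slice = new HashSet<>();
        slice.add(u);
        for (int d : markedEdges.getOrDefault(u, Collections.emptySet())) {
            slice.add(d);
            slice.addAll(dynamicSlice.getOrDefault(d, Collections.emptySet()));
        }
        dynamicSlice.put(u, slice);
    }

    // Answering a slicing command needs no graph traversal at request time.
    Set<Integer> slice(int u) {
        return dynamicSlice.getOrDefault(u, Collections.singleton(u));
    }

    public static void main(String[] args) {
        EdgeMarkingSketch g = new EdgeMarkingSketch();
        g.mark(2, 1);                                  // statement 2 uses 1
        g.mark(3, 2);                                  // statement 3 uses 2
        System.out.println(new TreeSet<>(g.slice(3))); // [1, 2, 3]
        g.unmark(3, 2);                                // the dependence ceases
        System.out.println(new TreeSet<>(g.slice(3))); // [3]
    }
}
```

The point of the sketch is only that the slice is maintained incrementally as edges are marked and unmarked, which is why the response to a slicing command reduces to a table lookup.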

References

Agrawal, H., DeMillo, R., Spafford, E., 1993. Debugging with dynamic slicing and backtracking. Software—Practice and Experience 23 (6), 589–616.

Agrawal, H., Horgan, J., 1990. Dynamic program slicing. In: Proceedings of the ACM SIGPLAN'90 Conference on Programming Language Design and Implementation, SIGPLAN Notices, vol. 25. White Plains, New York, pp. 246–256.

Aho, A., Sethi, R., Ullman, J., 1986. Compilers: Principles, Techniques and Tools. Addison-Wesley.

Ashida, Y., Ohata, F., Inoue, K., 1999. Slicing methods using static and dynamic analysis information. In: Proceedings of the Sixth Asia-Pacific Software Engineering Conference (APSEC-99), Takamatsu, Japan, pp. 344–350.

Binkley, D., Gallagher, K., Zelkowitz, M., 1996. Program Slicing. Advances in Computers, vol. 43. Academic Press, San Diego, CA.

Chen, Z., Xu, B., 2001. Slicing concurrent Java programs. ACM SIGPLAN Notices 36, 41–47.

Cheng, J., 1993. Slicing concurrent programs—a graph theoretical approach. In: Automated and Algorithmic Debugging, AADEBUG'93, LNCS. Springer-Verlag.

Duesterwald, E., Gupta, R., Soffa, M.L., 1992. Distributed slicing and partial re-execution for distributed programs. In: Fifth Workshop on Languages and Compilers for Parallel Computing, New Haven, Connecticut. LNCS. Springer-Verlag, pp. 329–337.

Gallagher, K., Lyle, J., 1991. Using program slicing in software maintenance. IEEE Transactions on Software Engineering SE-17 (8), 751–761.

Garg, V., Mittal, N., 2001. On slicing a distributed computation. In: Proceedings of 21st IEEE International Conference on Distributed Computing Systems (ICDCS), pp. 322–329.

Goel, A., RoyChoudhury, A., Mitra, T., 2003. Compactly representing parallel program executions. In: Proceedings of ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), pp. 191–202.

Goswami, D., Mall, R., 2002. An efficient method for computing dynamic program slices. Information Processing Letters 81, 111–117.

Horwitz, S., Reps, T., Binkley, D., 1990. Interprocedural slicing using dependence graphs. ACM Transactions on Programming Languages and Systems 12 (1), 26–61.

Kamkar, M., 1993. Interprocedural dynamic slicing with applications to debugging and testing. Ph.D. thesis, Linkoping University, Sweden.


Kamkar, M., Krajina, P., 1995. Dynamic slicing of distributed programs. In: International Conference on Software Maintenance. IEEE CS Press, pp. 222–229.

Korel, B., Ferguson, R., 1992. Dynamic slicing of distributed programs. Applied Mathematics and Computer Science 2, 199–215.

Korel, B., Laski, J., 1988. Dynamic program slicing. Information Processing Letters 29 (3), 155–163.

Krishnaswamy, A., 1994. Program slicing: an application of program dependency graphs. Tech. rep., Department of Computer Science, Clemson University.

Larson, L., Harrold, M., 1996. Slicing object-oriented software. In: Proceedings of the 18th International Conference on Software Engineering, Germany.

Levine, J., Mason, T., Brown, D., 2002. Lex and Yacc. O'Reilly.

Li, H., Rilling, J., Goswami, D., 2004. Granularity-driven dynamic predicate slicing algorithms for message passing systems. Automated Software Engineering 11, 63–89.

Liang, D., Larson, L., 1998. Slicing objects using system dependence graphs. In: Proceedings of International Conference on Software Maintenance, pp. 358–367.

Mall, R., 2003. Fundamentals of Software Engineering. Prentice Hall, India.

Mittal, N., Garg, V., 2001a. Computation slicing: techniques and theory. In: Proceedings of Symposium on Distributed Computing.

Mittal, N., Garg, V., 2001b. Computation slicing: techniques and theory. Technical Report TR-PDS-2001-02, The Parallel and Distributed Systems Laboratory, Department of Electrical and Computer Engineering, The University of Texas, Austin.

Mohapatra, D., Mall, R., Kumar, R., 2004a. An edge marking dynamic slicing technique for object-oriented programs. In: Proceedings of 28th IEEE Annual International Computer Software and Applications Conference. IEEE CS Press, pp. 60–65.

Mohapatra, D., Mall, R., Kumar, R., 2004b. An efficient technique for dynamic slicing of concurrent Java programs. In: Proceedings of Asian Applied Computing Conference (AACC-2004), Kathmandu, LNCS 3285. Springer-Verlag, pp. 255–262.

Mund, G., Mall, R., Sarkar, S., 2002. An efficient dynamic program slicing technique. Information and Software Technology 44, 123–132.

Naughton, P., Schildt, H., 1998. Java—The Complete Reference. McGraw-Hill.

Rilling, J., Li, H.F., Goswami, D., 2002. Predicate based dynamic slicing of message passing programs. In: Proceedings of IEEE International Workshop on Source Code Analysis and Manipulation, pp. 133–144.

Umemori, F., Konda, K., Yokomori, R., Inoue, K., 2003. Design and implementation of bytecode-based Java slicing system. In: Proceedings of the 3rd IEEE International Workshop on Source Code Analysis and Manipulation (SCAM-03), Netherlands, pp. 26–27.

Walkinshaw, N., Roper, M., Wood, M., 2002. The Java system dependence graph. In: Proceedings of IEEE International Workshop on Source Code Analysis and Manipulation, pp. 145–154.

Wang, T., RoyChoudhury, A., 2004. Using compressed bytecode traces for slicing Java programs. In: Proceedings of IEEE International Conference on Software Engineering, pp. 512–521.

Weiser, M., 1982. Programmers use slices when debugging. Communications of the ACM 25 (7), 446–452.

Zhang, X., Gupta, R., Zhang, Y., 2004. Efficient forward computation of dynamic slices using reduced ordered binary decision diagrams. In: International Conference on Software Engineering.

Zhao, J., 1999a. Multithreaded dependence graphs for concurrent Java programs. In: Proceedings of the 1999 International Symposium on Software Engineering for Parallel and Distributed Systems (PDSE'99).

Zhao, J., 1999b. Slicing concurrent Java programs. In: Proceedings of the 7th IEEE International Workshop on Program Comprehension.

Zhao, J., Cheng, J., Ushijima, K., 1996. Static slicing of concurrent object-oriented programs. In: 20th IEEE Annual International Computer Software and Applications Conference, pp. 312–320.

Durga P. Mohapatra is a senior lecturer of computer science and engineering at National Institute of Technology (NIT), Rourkela. He received his Ph.D. from IIT, Kharagpur, and M.Tech. from Regional Engineering College (now NIT), Rourkela. His research interests include software engineering and distributed computing.

Rajeev Kumar is an associate professor of computer science and engineering at Indian Institute of Technology (IIT), Kharagpur. Prior to joining IIT, he worked for Birla Institute of Technology & Science (BITS), Pilani and Defence Research & Development Organization (DRDO). He received his Ph.D. from University of Sheffield, and M.Tech. from University of Roorkee (now IIT Roorkee), both in computer science and engineering. His main research interests include programming languages and software engineering, multimedia and embedded systems, and multi-objective combinatorial optimization. He is a member of ACM, senior member of IEEE, and a fellow of IETE.

Rajib Mall has been with the Department of Computer Science and Engineering of IIT, Kharagpur for the last 12 years, where he is now a full professor. He obtained his Bachelor's, Master's, and Ph.D. degrees, all from Indian Institute of Science (IISc), Bangalore. His primary research interests are in the areas of program analysis and program testing. He is a senior member of IEEE.