
Distributed Slicing and Partial Re-execution for Distributed Programs

E. Duesterwald, R. Gupta and M. L. Soffa

University of Pittsburgh, Pittsburgh, PA

Abstract

We present a parallel algorithm to compute dynamic slices for distributed programs. Dynamic slices are used in debugging to re-execute only those statements of the original program that actually influenced an observed erroneous result. We introduce the notion of a Distributed Dependence Graph (DDG) as the graphical representation of the relevant dependencies among statements that arise during execution of a distributed program. Based on the DDG, we developed a parallel and fully distributed slicing algorithm, where each process determines its local section of the global slice. The potential for non-determinism in distributed programs is addressed by constructing a slice such that non-deterministic selections that were made during execution of the original program are reproduced when re-executing the program slice.

1 Introduction

The development of program slicing was motivated by the observation that programmers, when debugging, often construct "mental slices" [19]. Tracing backwards from the program point where an error is first observed, programmers attempt to identify the statements that influence the incorrect value. To aid in debugging by automating this process, a static program slice was introduced as the subset of a sequential program that contains the statements that may influence the values of a selected set of variables at given program points [15, 20]. Dynamic slices [1, 2, 9] were developed as a further refinement to a static slice with respect to a specific input by ruling out statements that do not influence the selected values for that input.

The task of developing a mental slice for debugging parallel programs is more difficult due to the timing related interdependencies among processes that have an influence on observed erroneous behavior. Thus, it is critical that techniques be developed that automate the slicing of parallel programs. By eliminating irrelevant portions of the program with respect to an incorrect result, slices reduce the complexity involved in locating errors in distributed programs. A current debugging strategy for distributed programs is the replay [14] of a particular program execution in an attempt to localize errors. In contrast, a program slice allows the programmer to re-execute and inspect only those statements that actually have an influence on the computation of some selected values. In this paper, we develop a technique to compute dynamic slices for distributed programs. The slices are constructed for use in partial re-execution when debugging distributed programs.

Given a program point s in one process of a distributed program P and an input set I for P, we define a Distributed Dynamic Slice to be the executable subset of P that contains the statements in each process that actually affected the values computed or used in s for the input I. Thus, a distributed dynamic slice (or slice for short) is a distributed subprogram that enables the re-execution of only the portion of the program that is of interest with respect to the computation of some selected values for a specific input.

Developing a slicing technique for partial re-execution of distributed programs introduces two challenges. First, dynamic slicing principles developed for sequential programs must be extended to distributed programs. Importantly, to be useful in debugging, program slicing must be both efficient and effective. In particular, the generation of complete execution traces should be avoided, as tracing may become prohibitively expensive for large programs. Furthermore, to be useful in a distributed environment, the construction of slices should not rely on a central coordinator. Instead, each process should contribute to the slice by determining its local portion of the global slice in a fully distributed fashion.

The second challenge is concerned with the potential of non-determinism in distributed programs. Distributed programs often make non-deterministic decisions; the order in which concurrently sent messages are received is time dependent and thus may vary from one execution to the next. It follows that repeated executions of a program on the same input may result in the execution of different program paths. However, when debugging, the programmer is interested in the exact execution that exhibits the error. Thus, unlike re-executing sequential program slices, a slice for re-execution of a distributed program must be based on dynamically collected information. In order to precisely reproduce the original behavior, all non-deterministic decisions that are made during execution must be recorded, and program slices must be instrumented to reproduce the recorded decisions.

We addressed the above issues in the development of our slicing algorithm. We account for the potential of non-determinism in distributed programs by transforming the non-determinism in the executing program into determinism in the computed slice. To achieve this, we collect run-time information about the relative execution order of received messages. By incorporating the recorded information in the constructed slices, we can effectively transform the non-deterministic communication constructs in the original program into deterministic ones in the slice. The amount of tracing is kept to a minimum by only reporting the relative order of communication events. We do not require the contents of transmitted messages to be traced, nor does our technique generate additional message traffic.

To determine which statements to include in a slice, we utilize static as well as dynamic information about dependencies among program statements. The static information describes the control dependencies among statements, which do not change when a program executes. Hence we determine control dependencies prior to execution. The dynamic information describes the actual data and communication dependencies that occur during execution. For efficiency, we avoid the computation of static data flow information, as only a dynamic subset is actually needed for a specific execution. Dynamic data flow information is computed on-the-fly as the program executes through a simple pointer mechanism requiring no run-time analysis. Since all static ambiguities involving dynamically referenced variables are resolved at run-time, we can effectively handle pointers and arrays.

The static and dynamic dependence information is represented in a common structure, the Distributed Dependence Graph (DDG). The DDG is distributed among the participating processes, and explicitly represents both static and dynamic dependencies among program statements. Prior to execution, the static portion of the DDG is constructed and the code of the original program is instrumented to enable the on-the-fly construction of the dynamic dependence subgraph as the program executes.

After execution, the complete DDG is available for the computation of dynamic slices. By using a dependence graph, the slice computation reduces to a simple vertex reachability problem [17]. A parallel algorithm extracts program slices from the DDG in a fully distributed fashion, where each process identifies its local portion of the global slice.

The basic concepts of distributed slicing are introduced in Section 2. The distributed dependence graph (DDG) is defined in Section 3. Section 4 describes the construction of the DDG and Section 5 discusses our parallel slicing algorithm that operates on the DDG. Our technique to enable partial re-execution by eliminating non-determinism for slices is presented in Section 6. Section 7 presents related work and a conclusion is given in Section 8.

2 Distributed Dynamic Slicing

A distributed program is a collection of sequential processes P = (P1, ..., Pn) that communicate through the reception and transmission of messages. We assume in this work a synchronous (i.e., blocking) message passing mechanism. However, an asynchronous (i.e., non-blocking) model can easily be incorporated. There are no shared variables in P and each process Pi executes a possibly distinct program over a separate address space. The language constructs for message passing are send and receive statements. The syntax and semantics of send and receive are informally stated below.

send(msg, dest): Executing a send statement results in the transmission of the message stored in msg to the process identified by the expression dest, where dest must evaluate to a valid process identification number (process ID). The send statement is blocking, i.e., the sending process is blocked from execution until process dest has received the message.

receive(msg {, src}): Executing a receive statement causes the assignment of a sent message to variable msg. The second argument is optional, and if provided, identifies the sending process of the message. The receive statement is blocking.

There are no assumptions made on the order in which messages arrive at their destination except that messages sent by one process to another are received in the order they were sent. However, messages sent concurrently from different sources to one process may arrive in any order. All messages that arrive at a process are collected in a message queue. A process executing a receive(msg, src) statement removes the first message sent by process src from the queue. If argument src is not provided, the first message in the queue is removed regardless of its source process. A receive statement that does not specify a source process for the message to be received is semantically similar to the select statement in Ada [4]. Communication occurs non-deterministically by selecting whatever message arrives first. We call such a receive statement a non-deterministic receive. The communication in a program is deterministic if and only if all receive statements specify a source process.
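To make these queue semantics concrete, the following sketch models them in C. It is purely illustrative: the names MsgQueue, receive_from and receive_any are ours, not the paper's, and blocking (waiting for a message to arrive) is elided.

/* Illustrative model of a process's incoming message queue.              */
typedef struct { int src_pid; int payload; } Message;
typedef struct { Message buf[256]; int len; } MsgQueue;

/* Remove and return the message at position i, shifting the rest down.   */
static Message take_at(MsgQueue *q, int i) {
    Message m = q->buf[i];
    for (int j = i + 1; j < q->len; j++) q->buf[j - 1] = q->buf[j];
    q->len--;
    return m;
}

/* receive(msg, src): remove the first message sent by process src_pid.   */
/* A real implementation would block here until such a message arrives.   */
Message receive_from(MsgQueue *q, int src_pid) {
    for (int i = 0; i < q->len; i++)
        if (q->buf[i].src_pid == src_pid) return take_at(q, i);
    Message none = { -1, 0 };   /* placeholder for the elided blocking case */
    return none;
}

/* receive(msg): non-deterministic receive; whatever message arrived first */
/* is taken, regardless of its source (queue assumed non-empty here).      */
Message receive_any(MsgQueue *q) {
    return take_at(q, 0);
}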

Using Weiser's terminology [20], we define a dynamic slice for a distributed program based on a slicing criterion.

Definition: Given a distributed program P = (P1, ..., Pn), a distributed dynamic slicing criterion for P is a tuple C = <(I1, X1), ..., (In, Xn)>, where Ii is the input to process Pi and Xi is a set of statements in Pi.

Definition: Given a slicing criterion C = <(I1, X1), ..., (In, Xn)> for a distributed program P = (P1, ..., Pn), a distributed dynamic slice S = (P'1, ..., P'n) with respect to an execution E of P is an executable subset of P, such that P'i is a subset of Pi and when S is executed on input (I1, ..., In) it produces the same values for the variables in (X1, ..., Xn) as P did in execution E.

There can be more than one dynamic slice for a given distributed program and a slicing criterion. Moreover, determining the statement-minimal dynamic slice is undecidable, as it is in the case of static slices [20]. Thus, we have developed our slicing algorithm to construct slices conservatively, such that they satisfy the slicing criterion but may not be statement-minimal.

In order to define the statements that must be included to satisfy the slicing criterion, we utilize the notion of dependence among program statements. In sequential programs, statements depend on each other with respect to two relations: control and data dependence. According to conventional dependence definitions, a statement s1 is control dependent on statement s2 if s2 is a control predicate and control reaches s1 depending on the result of evaluating s2. Data dependencies describe the definition-use pairs in a program. Since we are concerned with a specific execution of a distributed program, we consider the actual (i.e., dynamic) definition-use pairs as opposed to the potential (i.e., static) ones. Thus, a statement s1 is dynamically data dependent on statement s2 if s2 computes a value that is used in s1.
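As a small illustration (ours, not the paper's example), the annotated C fragment below marks these relations.

#include <stdio.h>

int main(void) {
    int x = 0;
    scanf("%d", &x);      /* s1: defines x                                   */
    if (x > 0)            /* s2: control predicate, data dependent on s1     */
        x = x + 1;        /* s3: control dependent on s2 and, in executions  */
                          /*     where the branch is taken, dynamically data */
                          /*     dependent on s1                             */
    printf("%d\n", x);    /* s4: dynamically data dependent on s3 if s3      */
                          /*     executed, otherwise on s1                   */
    return 0;
}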

Control and data dependence are the relations among statements that hold within one process. Distributed programs introduce a new form of dependence that crosses process boundaries and is called communication dependence. The execution of a receive statement depends on the execution of a corresponding send statement. In a blocking message passing mechanism, the reverse also holds, i.e., an executed send statement depends on the corresponding executed receive statement.

Definition: A statement s1 is dynamically communication dependent on statement s2, if s1 is a receive (send) statement and s2 is a send (receive) statement and communication occurs between s1 and s2.

Control dependence information is static, since for a given statement s, the statements that s is control dependent upon do not change in different executions. However, the statements that s is data or communication dependent upon may vary in different executions. Hence, the dynamic dependencies are the data and communication dependencies among statements. Both static and dynamic dependence information is represented in a common structure, the Distributed Dependence Graph (DDG). A slice for a given slicing criterion C is the closure of the statements in C with respect to control, data and communication dependence.

3 The Distributed Dependence Graph (DDG)

The DDG is a distributed and modified version of the program dependence graph [6]. The DDG for a distributed program P = (P1, ..., Pn) is a directed distributed graph G = (G1, ..., Gn, EC). Gi = (Ni, Ei) is a local subgraph for process Pi, where Ni is a set of nodes and Ei a set of control dependence and dynamic data dependence edges. The nodes in Ni represent the assignment statements, the send and receive statements, and the control predicates in Pi, with one distinguished control predicate called entry Pi. The physically distributed subgraphs are logically connected through communication dependence edges in the set EC between send and receive nodes.

Each process Pi determines its local subgraph Gi of the DDG and collects information about the logical connections at send and receive nodes. The set of edges in each subgraph Gi is divided into a static and a dynamic portion. The static portion is given by the control dependence edges. There is a control dependence edge (n1, n2) if n2 is immediately control dependent on n1. Dynamic dependence edges are added to the graph on-the-fly. During execution, the dynamic reaching definitions are determined and a corresponding edge is created in the DDG. Dynamic data flow analysis, which has previously been used in program testing [10], entirely eliminates the need to statically analyze a program for data dependencies. Instead, the original code is instrumented to create dynamic dependence edges on-the-fly through a simple pointer mechanism requiring no run-time analysis.
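One possible in-memory layout for a local subgraph Gi is sketched below; the paper does not prescribe concrete data structures, so the types and field names are illustrative. Control dependence edges are filled in before execution; data and communication information is appended by the instrumented code as the program runs.

enum NodeKind { ASSIGN, SEND, RECEIVE, PREDICATE, ENTRY };

typedef struct { int pid; int node; } RemoteRef;       /* (process, node) pair */

typedef struct {
    enum NodeKind kind;
    int       ctrl_dep[4];   int n_ctrl;   /* static control dependence edges  */
    int       data_dep[16];  int n_data;   /* dynamic data dependence edges    */
    RemoteRef src[8];        int n_src;    /* comm. edges recorded at receives */
    int       dest[8];       int n_dest;   /* destination pids stored at sends */
    int       visited;                     /* used later by Extract_Slice      */
} DDGNode;

typedef struct { DDGNode nodes[64]; int n_nodes; } LocalDDG;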

Consider the distributed program in Fig. 1 (i) consisting of three processes, Consumer, Producer1 and Producer2, with process IDs 0, 1 and 2, respectively. Process Consumer consumes data sent by the two producer processes and each time non-deterministically selects the message to be received (lines 2 and 5). The values received by Consumer are maintained in a local array and Consumer computes and outputs the minimum received value. Consider now a particular execution with input I0 = <n=3>, I1 = <n=3, a=(10,-50,100)>, I2 = <n=3, b=(10,-50,100)> for processes 0, 1 and 2 and assume process Consumer first receives a message from Producer2. The output value produced by Consumer is -50.

A programmer can understand how the output value -50 for variable *p has been computed by constructing a mental slice across the three processes. However, even in such a simple program example, determining a mental slice from the program text is difficult due to the presence of non-deterministic communication and the use of pointers and arrays. The actual dynamic slice for the output value in process Consumer is shown in Fig. 1 (ii). Note that not all executed statements are included in the slice. For example, although process Producer1 executed its loop and sent values to process Consumer, the communicated values had no influence on the output value in Consumer. Hence, no statement in process Producer1 besides the input statement is included in the slice. The section of code the programmer might have examined to understand how *p obtained the output value -50 is likely to be much larger than this slice and would probably include communication between process Consumer and Producer1.


Process Consumer /*pid=0*/
int n,min,*p,a[10];
(1) input(n);
(2) receive(min);
(3) p:=&min;
(4) while n>0 {
(5)   receive(a[n]);
(6)   if a[n]<*p
(7)     p:=&a[n];
(8)   n:=n-1; }
(9) output('min value received',*p);

Process Producer1 /*pid=1*/
int n,a[10];
(10) input(n,a);
(11) while n>0 {
(12)   if a[n]>0 then
(13)     send(a[n],0);
(14)   n:=n-1; }

Process Producer2 /*pid=2*/
int n,b[10];
(15) input(n,b);
(16) while n>0 {
(17)   if b[n]<0 then
(18)     send(b[n],0);
(19)   n:=n-1; }

(i) A distributed program for processes 0, 1, and 2.

Process Consumer /*pid=0*/
int n,min,*p,a[10];
(1) input(n);
(2) receive(min,2);
(3) p:=&min;
(9) output('min value received',*p);

Process Producer1 /*pid=1*/
int n,a[10];
(10) input(n,a);

Process Producer2 /*pid=2*/
int n,b[10];
(15) input(n,b);
(16) while n>0 {
(17)   if b[n]<0 then
(18)     send(b[n],0);
(19)   n:=n-1; }

(ii) Dynamic slice for the output value in process 0 with respect to input (I0, I1, I2).

Fig. 1: A distributed program (i) and a dynamic slice for the output value in process 0 with respect to input I0 = <n=3>, I1 = <n=3, a=(10,-50,100)>, I2 = <n=3, b=(10,-50,100)>.

[Figure: control dependence subgraphs (dashed edges) for processes Consumer, Producer 1 and Producer 2; the graph drawing is not reproduced here.]

Fig. 2: Control dependence subgraphs for each process for the program in Fig. 1.


[Figure: the DDG for processes Consumer, Producer 1 and Producer 2, with control dependence edges (dashed), data dependence edges (solid), and slice nodes shown in bold; the graph drawing is not reproduced here.]

Fig. 3: The DDG for the program of Fig. 1 and the input I0 = <n=3>, I1 = <n=3, a=(10,-50,100)>, I2 = <n=3, b=(10,-50,100)>, with the slice for the output value in process 0 shown in bold nodes.

Fig. 2 shows the control dependence subgraphs of the DDG that are built prior to execution by each process. In Fig. 3 we show the DDG for the program after execution. Communication edges are omitted in the figure for clarity and merely the source processes of received messages are listed in <>'s at the receive nodes 2 and 5. The slice for the output value in node 9 in process Consumer is determined as follows. First, the slice is initialized with the input statement in each process and, in addition, in process Consumer with node 9. Then, all nodes that are reachable in the graph are included. Note that the inclusion of input statements ensures that the slice consumes the same input as the original program. The resulting set of nodes is shown in bold in the figure.

4 Constructing the DDG through Code Instrumentation

Each process statically determines its local control dependence subgraph. There is one node for each statement in Pi and a control dependence edge from a node n1 to a node n2 if n1 is immediately control dependent on n2. The remaining portion of the DDG consists of the set of dynamic data and communication edges. We discuss in this section the code instrumentation that is added to the original program for the on-the-fly creation of dynamic data and communication dependence edges in the DDG.


4.1 Constructing Dynamic Data Dependence Edges

We first consider the construction of dynamic data dependence edges. A scheme similar to Korel's dynamic data flow analysis [10] is used to collect dynamic information through a simple pointer mechanism. To determine the actual reaching definitions during execution, a pointer v.dptr is associated with each variable v. At any point during execution v.dptr points to the program statement that last defined a value for v. Thus, for each executed use of variable v, the dynamic reaching definition is immediately found through v.dptr and a corresponding dependence edge is created. Every definition of a variable v that is executed at a statement s causes an update of v.dptr to point to s.
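A minimal sketch of this bookkeeping is shown below. The paper describes the mechanism only at this abstract level, so the helper names (TracedInt, add_data_edge, use, def) are ours; add_data_edge stands for whatever routine appends an edge to the local DDG subgraph.

typedef struct { int value; int dptr; } TracedInt;  /* dptr: node of last definition */

void add_data_edge(int use_node, int def_node);     /* assumed: appends edge to DDG  */

/* Instrumented use of v at statement node `node`: the dynamic reaching    */
/* definition is found directly through v->dptr and an edge is created.    */
static int use(TracedInt *v, int node) {
    if (v->dptr >= 0) add_data_edge(node, v->dptr);
    return v->value;
}

/* Instrumented definition of v at statement node `node`.                  */
static void def(TracedInt *v, int value, int node) {
    v->value = value;
    v->dptr  = node;     /* v is now last defined at `node` */
}

Composite structures, arrays and pointers require additional calls of this kind, as described next.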

To handle composite structures such as records, the individual record components are treated in the same way as described above. If, however, the entire record is used or defined, code instrumentation is inserted for each component of the record. The same procedure follows for a reference to an entire array. For the reference of individual array elements, code instrumentation is added for the evaluated array element and also for every variable that occurs in the subscript expression. The use of pointer variables requires special treatment. Pointer variable references are similar to array references in that they may access a different variable in different instances. However, the reference of a pointer may actually access two variables: the pointer itself and the variable pointed to by the pointer. If a pointer p points to a variable v and v is accessed through p, we also insert code for the appropriate run-time actions for the reference to v.

4.2 Constructing Dynamic Communication Dependence Edges

Communication dependence edges cross process boundaries and establish the logical connections among the physically distributed local DDG subgraphs. The information needed to establish a communication edge from a node n in process P to a remote node m in a process Q consists of the pair (Q, m) that is stored at node n and describes the remote sink node. However, the pair (Q, m) cannot be determined locally by P due to the separate address spaces.

Consider first the case of establishing a communication edge from a receive node nr in a process Pr to the respective remote send node ns in a process Ps. If the sending process Ps appends the pair (Ps, ns) to the sent message, the receiver Pr can immediately establish the communication edge to remote node (Ps, ns) upon message reception. Thus, the source node information is appended to every sent message and a process receiving a message at node n adds this information to a set src[n] representing the communication edges emerging from node n.

Storing communication edges at send nodes would require additional message traffic to inform the sender about the specific receive node identity. Additional message traffic clearly imposes an undesirable execution overhead. Fortunately, if all communication edges are reported at receive nodes, it is not necessary to duplicate the same edges at send nodes. To retrieve the communication edges from a send node ns at a process Ps after execution, a search for all receive nodes nr such that the pair (Ps, ns) is part of src[nr] is performed. However, such a search may become unnecessarily expensive; since the DDG is distributed, process Ps may have to send messages to every other process to retrieve the communication edges for send node ns.


(10) send(msg:m, dest:p);
(11) receive(msg:m);
(12) receive(msg:m, src:c);

(i)

(10) send(msg:m, src:my_pid, node:10, dest:p); Add(dest[10], p);
(11) receive(msg:m, src:p, node:n); Add(src[11], (p,n));
(12) receive(msg:m, src:c, node:n); Add(src[12], (c,n));

(ii)

Fig. 4: Original message transfer commands (i) and the instrumented commands for the construction of communication edges (ii). The system variable my_pid is automatically assigned the actual process identification number.

Ideally, only the processes that have actually communicated with Ps at send node ns are involved in the search. Thus, to quickly retrieve the communication edges that emerge from a send node, we store additional information. The additional information consists of a set dest[ns] that contains all destination processes of messages sent from node ns. Whenever a message is sent from a node ns the destination process is added to the set dest[ns]. The complete set of communication edges emerging from send node ns can then be retrieved after execution by collecting the specific receive nodes in the processes contained in dest[ns].

The code instrumentation to report communication edges is illustrated in Fig. 4. For clarity, the actual arguments in send and receive statements are preceded by the name of their matching formal argument specifier. The code instrumentation enables the construction of the dynamic dependence subgraph as the program executes. After execution the complete DDG is available for the extraction of program slices.
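The run-time bookkeeping behind Fig. 4 can be pictured as follows; this is an illustrative sketch, not the paper's implementation, and the array bounds and helper names are ours. The sender records the destination pid in dest[send_node] and piggybacks (my_pid, send_node) on the message; the receiver stores the piggybacked pair in src[receive_node].

typedef struct { int pid; int node; } NodeRef;

#define MAX_NODES   64
#define MAX_ENTRIES 32

static NodeRef src_set[MAX_NODES][MAX_ENTRIES];   /* src[n] at receive nodes */
static int     src_len[MAX_NODES];
static int     dest_set[MAX_NODES][MAX_ENTRIES];  /* dest[n] at send nodes   */
static int     dest_len[MAX_NODES];

/* Called by the instrumented send executed at node send_node.              */
void record_send(int send_node, int dest_pid) {
    dest_set[send_node][dest_len[send_node]++] = dest_pid;
    /* (my_pid, send_node) is appended to the outgoing message itself */
}

/* Called by the instrumented receive at node recv_node; from_pid/from_node */
/* is the pair the sender appended to the message.                          */
void record_receive(int recv_node, int from_pid, int from_node) {
    NodeRef r = { from_pid, from_node };
    src_set[recv_node][src_len[recv_node]++] = r;
}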

5 Parallel Slicing Algorithm

By using the DDG the extraction of a dynamic slice for a given slicing criterion reduces to a simple vertex reachability problem. A slice is computed as a set of nodes in the DDG and the corresponding subprogram is obtained by restricting the original program to only those statements that are represented by a node in the computed slice set. The slice set is initialized with nodes in the DDG according to the slicing criterion, and the slice is computed by adding all nodes that are reachable from a node already in the slice. However, each process has access to only its local section of the DDG. Thus, the slice extraction is performed by a fully distributed algorithm, where each process contributes by determining the local statements in the slice. Communication among processes only takes place when a send or receive node is included in the local slice by some process. Note that all local DDG subgraphs could also have been collected at a single site. However, for large programs, the central collection may become too expensive. Moreover, the potential parallelism in the slice extraction cannot be exploited. For this reason, we developed the parallel slice extraction algorithm depicted in Fig. 5.
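Restricted to a single process and ignoring communication edges, this reachability computation is an ordinary worklist closure over the outgoing control and data dependence edges. The sketch below (our illustration, using a simplified adjacency-array graph) shows just that local part; Fig. 5 gives the full distributed algorithm, including the exchange of request messages at send and receive nodes.

#define MAX_NODES 64
#define MAX_SUCCS  8

typedef struct {
    int succ[MAX_NODES][MAX_SUCCS];   /* control/data dependence successors */
    int n_succ[MAX_NODES];
    int n_nodes;
} LocalGraph;

/* Mark in_slice[v] for every node reachable from the initial nodes         */
/* (the criterion nodes plus the input statements of the process).          */
void local_slice(const LocalGraph *g, const int *init, int n_init,
                 int in_slice[MAX_NODES]) {
    int work[MAX_NODES];
    int top = 0;
    for (int v = 0; v < g->n_nodes; v++) in_slice[v] = 0;
    for (int i = 0; i < n_init; i++) {              /* seed the worklist */
        if (!in_slice[init[i]]) { in_slice[init[i]] = 1; work[top++] = init[i]; }
    }
    while (top > 0) {
        int s = work[--top];
        for (int i = 0; i < g->n_succ[s]; i++) {    /* follow dependence edges */
            int t = g->succ[s][i];
            if (!in_slice[t]) { in_slice[t] = 1; work[top++] = t; }
        }
    }
}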

Algorithm Extract_Slice is executed by each process. All processes start by initializing their local working set WS in parallel according to the slicing criterion (line 2). Since we require that slices, when executed, consume the same input as the original program, all input statements in a process are added during initialization (line 2). Note that the statements named in the slicing criterion will also be included in the slice. A process slices on the statements in WS in the loop in lines 3 through 16. Nodes that are reachable via data or control dependence edges are handled locally by each process (line 12).


Algorithm: Extract_Slice
Input: The set Xi for process Pi from a slicing criterion C
Output: Process Pi's section of the dynamic slice for C

Begin
(1)  Slice := ∅;
(2)  WS := { s | s is an input statement or s ∈ Xi }
(3)  While not termination Do
(4)    While working set WS ≠ ∅ Do
(5)      remove a statement node s from WS
(6)      Slice := Slice ∪ {s}
(7)      mark s as 'visited'
(8)      If s is a receive node Then
(9)        For each pair (pid, node) stored at s Do send message (type:"receive", node) to pid
(10)     Else If s is a send node Then
(11)       For each destination pid stored at s Do send message (type:"send", s) to pid
(12)     Else WS := WS ∪ { t | s→t is a control or data dependence edge and t is not marked 'visited' }
       EndWhile
(13)   While message queue MQ ≠ ∅ Do
(14)     remove a message with contents (type, node) from the queue;
(15)     If type = "receive" Then WS := WS ∪ { node | node is not marked 'visited' }
(16)     Else WS := WS ∪ { t | t→node is a communication edge and t is a receive node not marked 'visited' }
       EndWhile
     EndWhile
(17) output(Slice);
End

Fig. 5: Algorithm Extract_Slice to be executed by each process.

Cooperation among processes takes place when communication dependencies are met. When a process P includes a receive node nr, the slice crosses process boundaries and P sends a request message to all processes that communicated with P at receive node nr. The request message contains the send node identity that was stored in the set src[nr] (lines 8-9). As described in Section 4.2, communication edges are reported only at receive nodes. Thus, the inclusion of a send node ns is treated differently, since only the destination processes of messages sent from node ns, and not the specific receive nodes, are stored at dest[ns]. A different request message is sent to every process P contained in dest[ns] that requests the inclusion of all receive nodes that have a communication edge emerging to ns (lines 10-11).

Whenever a process has exhausted the nodes in the local working set WS, it checks whether a request has arrived from another process (line 13). If so, the respective nodes are included in the set WS, and the process continues slicing on the nodes in WS (lines 14-16). The slicing terminates when no process has any more work. Termination is detected via a standard algorithm for distributed detection of global termination [5]; we have omitted the code for termination detection in Fig. 5.

We now apply the distributed slicing algorithm to our sample DDG from Fig. 3 to determine the slice for the output value in process Consumer. Thus, we consider the slicing criterion C = <(<n=3>, {9}), (<n=3, a=(10,-50,100)>, ∅), (<n=3, b=(10,-50,100)>, ∅)>. The three processes initialize their local slice sets with input statements, and process 0 (Consumer) also adds node 9 to the initialization. Slicing on input statements adds in each process the respective entry node. In addition, process 0 includes nodes reachable from node 9, i.e., the two nodes 2 and 3. When node 2 is reached a message is sent to process 2 (Producer2). Process 2 then adds node 18 to its working set and in turn includes all local nodes in the slice. The resulting slice is shown in bold nodes in Fig. 3.

A critical aspect in developing a distributed algorithm is the degree of parallelism in the computation. The degree of parallelism in algorithm Extract_Slice depends mainly on the parallelism in the original program (i.e., if the original program executes mostly serially, the slice extraction will do so too). However, the process activities are mostly independent. Besides the initialization work, a process's activities consist of repeatedly computing slices on send or receive statements (communication-slices). The parallelism in the computation increases with a decrease in the idle times that the processes have between the computation of communication-slices. However, the communication-slices that are needed cannot be anticipated by a process unless an already actively slicing process sends a request. Thus, although the total work done by a process Pi is O(|Ei|), where Ei is the local edge set in Pi, the communication delays may require a worst case time of O(|E|), where E is the set of all edges in the DDG. The worst case occurs when the slice computation is performed strictly serially, i.e., there is a chain of processes and each process starts slicing only after all its predecessors in the chain have finished their contribution to the slice computation.

We can optimize the amount of parallelism even in the worst case scenario if the idle times in a process are filled by precomputing communication-slices that may be used later. Whenever a process becomes idle during the distributed slice computation, it starts precomputing communication-slices that are predicted to be needed later. The precomputation is interrupted whenever a request message from another actively slicing process is received. We benefit from the precomputation if the received message requests a communication-slice that has already been completely or partially precomputed.

6 Eliminating Non-determinism for Re-execution of Slices

If the slicing algorithm from the previous section is applied to programs that make non-deterministic decisions, the constructed slices may inherit some of the non-determinism that was present in the original program. Whereas non-determinism in slices is of no relevance if slices are merely textually displayed to the programmer, non-deterministic communication constructs in a slice cause problems when the slice is to be used for partial re-execution. We discuss in this section how the non-determinism in slices is eliminated to enable a re-execution of a slice that reproduces the original program behavior. Although the elimination of non-determinism is discussed in the context of slicing, our techniques are generally applicable to enable the reproduction of non-deterministic behavior in distributed programs.

A simple approach to reproduce non-deterministic decisions is to first generate complete execution traces and then simulate re-execution by replaying the recorded traces. However, the overhead of storing the traces may become overly expensive for large programs, in particular since the traces must be available for every replay. We developed an alternative approach to the slice re-execution problem based on the techniques used in Instant Replay [14]. Similar to Instant Replay, only information about the relative execution order of non-deterministic decisions that are made during execution is recorded. By using the recorded information, we transform the non-deterministic receive statements in the original program into deterministic ones in the constructed slices. The transformation is semantically similar to transforming the non-deterministic select statement in Ada into a conditional. Thus, our constructed program slices are independent deterministic distributed programs that do not require a special environment for re-execution.


process 0:  r: send(m,1)
process 2:  t: send(n,1)
process 1:  s: receive(m)

(i) original program

process 0:  r: send(msg:m, src:0, node:r, dest:1)
process 2:  t: send(msg:m, src:2, node:t, dest:1)
process 1:  s: receive(msg:m, src:p, node:n); enqueue(src_q[s], (p,n))

(ii) instrumented code

process 0:  r: send(m,1)
process 2:  t: send(n,1)
process 1:  p := dequeue(src_q[s]); s: receive(m,p)

(iii) constructed slice

Fig. 6: Transforming non-determinism in the original program into determinism in the slice.


To implement the transformation, additional information is collected at non-deterministic receive nodes. Note that for the establishment of communication edges, as described in Section 4, it was sufficient to collect the source of messages received at a node n in an unordered set src[n]. However, to enable replication of the identical communication events, it is necessary to store additional information about the relative order of received messages.

Instead of the unordered set src[n], a queue src_q[n] is maintained at every non-deterministic receive node n. Two operations, enqueue() and dequeue(), are defined to insert and remove elements in the queue. The queue src_q[n] is filled during the original execution by enqueuing the sender process of every message received at n. After execution, the queue src_q[n] describes the relative order of the source processes for messages received at n. When constructing a slice, the filled queues are incorporated as part of the slice's data section. Determinism is ensured by instrumenting the constructed slice to dequeue the appropriate source process immediately before executing a receive statement.

When a queue contains a single source process src (possibly multiple times), it is not necessary to incorporate the entire queue in the constructed slice. Instead, a non-deterministic receive(m) statement in the original program is directly transformed into a deterministic receive(m,src) statement.
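What the instrumented slice does at a formerly non-deterministic receive can be sketched as follows; the paper expresses the transformation at the source level (Fig. 6), so the C names here are illustrative. The recorded queue becomes part of the slice's data section and is dequeued immediately before each execution of the receive.

typedef struct { int pids[128]; int head; int len; } SrcQueue;

static int dequeue(SrcQueue *q) { return q->pids[q->head++]; }

/* Assumed deterministic receive primitive, corresponding to receive(m, src). */
extern int receive_msg_from(int src_pid);

/* Replacement for a non-deterministic receive(m) at statement s: the sender  */
/* recorded for this execution of s is dequeued, making the receive           */
/* deterministic.                                                             */
int replay_receive(SrcQueue *src_q_s) {
    int src = dequeue(src_q_s);
    return receive_msg_from(src);
}

When the recorded queue names a single sender throughout, the queue can be dropped and the statement rewritten directly, which corresponds to the receive(min) to receive(min,2) rewriting in the example slice.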


The transformation is illustrated in Fig. 6. A fragment of a distributed program with three processes is shown in Fig. 6 (i). Processes 0 and 2 concurrently send messages to process 1, and process 1 non-deterministically receives one message from either process in each loop iteration. Fig. 6 (ii) shows the original program after code instrumentation. The queue src_q[s] in process 1 is used to record the exact sequence in which messages are received at statement s. Fig. 6 (iii) shows the code in a constructed slice. The reproduction of the original communication events at statement s is enforced by always dequeueing the respective sender process prior to executing the receive statement.

Returning to the slice in Fig. 3, we note that there is one non-deterministic receive statement (node 2) included in the slice. During execution only Producer2 communicated with Consumer at this receive statement. Thus, the non-deterministic receive(min) is transformed into the deterministic receive(min,2) in the slice as shown in Fig. 3.

7 Related Work

Several variations of the concept of a program slice have been developed for sequential programs. A formal treatment of these various notions of a sequential slice in a denotational framework is given by Venkatesh [18]. Originally, Weiser introduced static program slicing based on iteratively solving sets of data flow equations [20]. A different approach to compute static slices by solving a vertex reachability problem in the Program Dependence Graph (PDG) is presented in [6, 12]. The PDG based approach was further extended for static interprocedural slicing [8]. Dynamic slices for sequential programs have been developed as a refinement to the static slice by ruling out statements that have no influence for a specific input. In Korel and Laski's dynamic slicing technique [9] a complete execution trace is generated at run-time and a dynamic slice is computed by solving data flow equations over the generated trace. A different approach to compute dynamic slices based on dependence graphs is presented in [1, 2]. A form of dynamic expansion of the PDG is used, capable of distinguishing dependencies that hold for different instances of a statement.

Recently, Korel et al. have proposed an extension of their dynamic slicing algorithm to distributed programs [11]. In their approach, each process generates a complete execution trace. The necessary dependence information to construct program slices is computed post-mortem by analyzing the generated traces. Unlike our algorithm that uses the DDG, their slicing algorithm operates on complete execution traces whose lengths are unbounded with respect to the program size. The constructed slices are not independent programs and are executed using an explicit run-time scheduler that ensures the replay of the recorded communication events.

Using dynamic dependence graphs for debugging parallel programs is also described in [16]. Unlike our dynamic dependence graphs, the graphs used in this technique are dynamic expansions of a static dependence graph that are incrementally generated during debugging to be displayed to the user. Other approaches to debugging distributed programs include event-based debugging based on recorded event histories [3, 13] and execution replay [14]. During Instant Replay [14] the original execution of a program (or an individual process) is reproduced based on the recorded order of received messages. Our technique relates to Instant Replay in that we avoid the re-execution of an entire process by constructing dynamic slices.


8 Conclusion

We have presented an extension of the concept of dynamic program slicing to distributed programs. Distributed slices are used to aid in error location by allowing the programmer to re-execute only those portions of the program (i.e., the slice) that have an influence on the computation of some selected values. To develop such a partial re-execution tool for debugging distributed programs we addressed and solved two problems. First, we developed a dynamic slicing technique for distributed programs that is both efficient and effective. Using a parallel algorithm, program slices are determined after execution in a fully distributed fashion. Secondly, we addressed the issue of non-determinism in distributed programs. We demonstrated a transformation of the non-deterministic communication constructs in the original program into deterministic ones in a slice to reproduce non-deterministic decisions that were made during the original execution. Although presented in the context of slicing, this transformation is of general interest to reproduce non-deterministic behavior.

The development of our algorithm was motivated by the need for efficient debugging tools for distributed programs that offer the programmer an alternative to complete execution trace generation, which is part of most existing distributed debuggers. A problem that is shared by all parallel debugging strategies that are based on code instrumentation is the so-called "probe effect" [7]. That is, the additional code instrumentation may affect the timing of non-deterministic events, and when removing the instrumentation after debugging, the program may exhibit different behavior. If the instrumentation overhead is kept to a minimum it may be acceptable to permanently keep the instrumentation in the code. We are currently investigating methods to reduce the instrumentation overhead in our technique by incorporating static analysis. The objective is to avoid dynamic analysis through code instrumentation whenever static analysis can provide sufficiently accurate information.

References

1. H. Agrawal and J. R. Horgan, "Dynamic program slicing," Proc. of the SIGPLAN '90 Symposium on Programming Language Design and Implementation, SIGPLAN Notices, vol. 25, no. 6, pp. 246-256, 1990.

2. H. Agrawal, R. A. DeMillo, and E. H. Spafford, "Dynamic slicing in the presence of unconstrained pointers," Proc. of the Symposium on Testing, Analysis, and Verification, pp. 60-73, Victoria, British Columbia, 1991.

3. P. Bates, "Debugging heterogeneous distributed systems using event-based models of behavior," Proc. of the Workshop on Parallel and Distributed Debugging, SIGPLAN Notices, vol. 24, no. 1, pp. 11-22, 1989.

4. United States Department of Defense, "Reference manual for the Ada programming language," ANSI/MIL-STD-1815A, Washington, D.C., 1983.

5. E. W. Dijkstra, W. H. Feijen, and A. J. van Gasteren, "Derivation of a termination detection algorithm for distributed computations," Information Processing Letters, North-Holland, no. 16, pp. 217-219, 1983.

6. J. Ferrante, K. J. Ottenstein, and J. D. Warren, "The program dependence graph and its use in optimization," ACM Transactions on Programming Languages and Systems, vol. 9, no. 3, pp. 319-349, 1987.

7. J. Gait, "A debugger for concurrent programs," Software - Practice and Experience, vol. 15, no. 6, pp. 539-554, 1985.


8. S. Horwitz, T. Reps, and D. Binkley, "Interprocedural slicing using dependence graphs," ACM Transactions on Programming Languages and Systems, vol. 12, no. 1, pp. 26-60, 1990.

9. B. Korel and J. Laski, "Dynamic program slicing," Information Processing Letters, vol. 29, no. 3, pp. 155-163, 1988.

10. B. Korel, "Automated Software Test Data Generation," IEEE Transactions on Software Engineering, vol. 16, no. 8, pp. 870-879, 1990.

11. B. Korel, H. Wedde, and R. Ferguson, "Dynamic program slicing for distributed programs," Technical Report CSC-91-006, Computer Science Department, Wayne State University, Detroit, Michigan, 1991.

12. D. J. Kuck, R. H. Kuhn, B. Leasure, D. A. Padua, and M. Wolfe, "Dependence graphs and compiler optimizations," Proc. of the 8th Annual ACM Symposium on Principles of Programming Languages, pp. 207-218, Williamsburg, Virginia, 1981.

13. L. Lamport, "Time, clocks, and the ordering of events in a distributed system," Communications of the ACM, vol. 21, no. 7, pp. 558-565, 1978.

14. T. J. LeBlanc and J. M. Mellor-Crummey, "Debugging parallel programs with instant replay," IEEE Transactions on Computers, vol. 36, no. 4, pp. 471-482, 1987.

15. J. Lyle and M. Weiser, "Automatic program bug location by program slicing," Proc. of the 2nd IEEE Symposium on Computers and Applications, pp. 877-883, Peking, 1987.

16. B. P. Miller and J.-D. Choi, "A mechanism for efficient debugging of parallel pro- grams," Proc. of the SIGPLAN '88 Conference on Programming Language Design and Implementation, pp. 135-144, Atlanta, Georgia, 1988.

17. K. Ottenstein and L. Ottenstein, "The program dependence graph in a software development environment," Proc. of the ACM SIGSOFT/SIGPLAN Symposium on Practical Software Development Environments, SIGPLAN Notices, vol. 19, no. 5, pp. 177-184, 1984.

18. G. Venkatesh, "The semantic approach to program slicing," Proc. of the ACM SIGPLAN '91 Conference on Programming Language Design and Implementation, pp. 107-119, Toronto, Ontario, Canada, 1991.

19. M. Weiser, "Programmers use slices when debugging," Communications of the ACM, vol. 25, pp. 446-452, 1982.

20. M. Weiser, "Program slicing," IEEE Transactions on Software Engineering, vol. 10, no. 4, pp. 352-357, 1984.