
IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. SE-5, NO. 6, NOVEMBER 1979


Proving Total Correctness of Parallel Programs

ALAN F. BABICH, MEMBER, IEEE

Abstract-An approach to proving parallel programs correct is presented. The steps are 1) model the parallel program, 2) prove partial correctness (proper synchronization), and 3) prove the absence of deadlock, livelock, and infinite loops. The parallel program model is based on Keller's model. The main contributions of the paper are techniques for proving the absence of deadlock and livelock. A connection is made between Keller's work and Dijkstra's work with serial nondeterministic programs. It is shown how a variant function may be used to prove finite termination, even if the variant function is not strictly decreasing, and how finite termination can be used to prove the absence of livelock. Handling of the finite delay assumption is also discussed. The illustrative examples include one which occurred in a commercial environment and a classic synchronization problem solved without the aid of special synchronization primitives.

Manuscript received April 12, 1978; revised April 30, 1979.
The author is with the Basic Four Corporation, Santa Ana, CA 92711.

Index Terms-Concurrent program, correctness, deadlock, finite delay, finite termination, infinite loops, livelock, mutual exclusion, parallel program, termination, variant function, verification.

INTRODUCTION

AN abstract model general enough to capture most notions of parallel computation is highly desirable. Three crucial parts of such a model seem to be as follows:

1) The state must factor into a control part and a data part, so that such topics as "the number of processes at a given point in the program" may conveniently be discussed.
2) The atomic actions must be specifiable, for no coarser level of detail will, in general, suffice for rigorously proving the correctness of parallel programs.
3) It must be possible to ignore irrelevant details of the computation, including absolute and relative execution timings,


for otherwise the model will be too cumbersome to be useful in practical situations.

In the following sections we present a model which is based on Keller's model in [6]. We then discuss how to use the model to prove the correctness of parallel programs, including proper termination behavior, and illustrate the techniques with several examples. The details of many proofs have been sketched or omitted due to considerations of space. In many cases they can be found in [8], which also contains additional examples.

I. THE MODEL

Following Keller [6] we model a computation as a pair (s, T) called a transition system, where s is a set of states (possibly infinite) and T is a binary relation on s. The relation T represents the results of the indivisible ("atomic") actions which result in successive states of the system. Since any given state of the system may have more than one possible successor, the relation T does not usually have a corresponding single-valued function. We take the position that events which can happen simultaneously during the execution of the parallel program may be modeled as successive single transitions, where these transitions can happen in any arbitrary order. Furthermore, we do not model how long it takes a transition to complete. Thus, execution time is not modeled directly. This seems a necessary simplification to achieve the desired goal that all proofs based on the model are completely independent of relative execution timings.

For proofs to be practical, we must develop the model further. Following Keller [6] we may represent a parallel program as a directed graph having only two types of nodes: 1) place nodes, which represent points in the program code at which instruction pointers of processes may dwell, and 2) transition nodes, which represent the possible "indivisible" computations of the program code. There are no restrictions in forming the directed graph other than that there are no arcs directed from a node of one type to a node of the same type, or more than one arc in a given direction between any two nodes. (The graph does not even have to be completely connected.) When there is a directed arc from node x to node Y, we say that x is an input node of Y, and that Y is an output node of x.

It is convenient to label the transition nodes and the place nodes with two sets of names, Ti and Pi, respectively. These names are distinct from each other and from the names of any program variables. It is useful to associate a unique nonnegative integer variable, the place variable, with each place node, whose value is the number of instruction pointers dwelling at that place node. Place variables enable us to talk about the control state and data state on the same ground-as the contents of variables-and seem to have been first introduced by Keller in [6]. To avoid confusion, we adopt the notational convention that "Pi" represents the name of the place node, and Pi represents the number of instruction pointers dwelling at the place node.

Each transition node Ti has an expression associated with it of the form "EP → AA", where EP is the enabling predicate, and AA is the atomic (indivisible) action associated with the transition. The enabling predicate is a Boolean expression without side effects involving the place variables of the input place nodes of the transition and some subset of the program variables. The atomic action is a program which advances process instruction pointers across the transition (by changing the values of the place variables of the input and output place nodes of the transition), and which may also change the values of some subset of the program variables. The atomic action is required to terminate in a finite amount of time. The intent is that when the enabling predicate is satisfied, the transition is enabled. At each step of program execution, exactly one of the enabled transitions is randomly selected, and its atomic action executed.

Although the atomic actions of simultaneously enabled

transitions could theoretically be executed simultaneously insome cases, it makes no difference that they are performedserially in some arbitrary order-the final effect on the pro-gram and place variables is the same for those cases. A con-sequence is that the execution of one transition does not neces-sarily correspond to an amount of elapsed simulated time.However, we are not concerned with modeling executiontime exactly. We have achieved the goal that any proofs ofprogram correctness will be valid for any possible set ofrelative execution timings.
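As an illustration only (a minimal Python sketch, not from the paper), the state can be represented as a mapping of place and program variables, and each transition as a pair (enabling predicate, atomic action); one enabled transition is fired at random per step. The "fork" and "join" constructs of Fig. 1 are used here, composed so that the join consumes the tokens produced by the fork; the function and variable names are hypothetical.

import random

def fork_enabled(s): return s["P1"] > 0
def fork_action(s):  s["P1"] -= 1; s["P2"] += 1; s["P3"] += 1

def join_enabled(s): return s["P2"] > 0 and s["P3"] > 0
def join_action(s):  s["P2"] -= 1; s["P3"] -= 1; s["P4"] += 1

TRANSITIONS = [(fork_enabled, fork_action), (join_enabled, join_action)]

def step(state):
    """Fire one randomly chosen enabled transition; return False if none."""
    enabled = [action for pred, action in TRANSITIONS if pred(state)]
    if not enabled:
        return False              # terminal state: no transition is enabled
    random.choice(enabled)(state)
    return True

state = {"P1": 2, "P2": 0, "P3": 0, "P4": 0}   # two processes dwelling at P1
while step(state):
    pass
print(state)                      # terminal state: P4 = 2, all other places empty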

It is usually convenient to factor the enabling predicate andthe atomic action into two parts-the control part and thedata part-as follows.

EPc(Pin) ∧ EPd(x) → AAc(Pin, Pout); AAd(x)

where

Pin is the vector of place variables corresponding tothe input place nodes of the transition;

Pout is the vector of place variables corresponding tothe output place nodes of the transition;

x is the vector of all program variables (bothglobal and local to processes);

AAc, AAd are the atomic actions on the control and data variables, respectively.

In the normal case EPc(Pin) is just the condition that all input place variables (there is usually only one) are positive, i.e., that all input place nodes are occupied by the instruction pointer of at least one process. The part of the atomic action that affects the control state, AAc(Pin, Pout), is normally just the action of subtracting one from each input place variable, and adding one to each output place variable, i.e., advancing the processes across the transition. (The normal behavior with respect to the place variables is the same as that of a Petri net.)

As examples, the "fork" construct is modeled in Fig. 1(a), and the "join" construct is modeled in Fig. 1(b).

We have departed from Keller's notational convention that

the conditions and actions on the place variables are implicit, for we have discovered that it is sometimes safer and less confusing if the actions on the place variables are explicit, and that it is sometimes desirable to depart slightly from standard action on the place variables. (For example, see the discussion of the modeling of WAITANDRESET on an EVENT variable in [8, sect. 3].)

Fig. 1. (a) Transition diagram for "fork": T1: P1>0 → P1:=P1-1; P2:=P2+1; P3:=P3+1. (b) Transition diagram for "join": T1: P1>0 ∧ P2>0 → P1:=P1-1; P2:=P2-1; P3:=P3+1.

To illustrate some considerations in the modeling of variables global to a process, consider the program "s := s+1" where s is a global variable. This program is modeled in Fig. 2. The temporary Tx represents the variable T (the central processor's accumulator register in most machines) local to the process whose process number is x. This example illustrates how counts can be lost when this program is executed by multiple processes. For example, process x executes transition T1. Then process y executes both transitions T1 and T2. When process x finally executes transition T2, the assignment "s := Tx" will have the effect of losing the increase in s made by process y.

Fig. 2. Transition diagram for "s := s + 1": T1: P1>0 → P1:=P1-1; P2:=P2+1; Tx:=s+1. T2: P2>0 → P2:=P2-1; P3:=P3+1; s:=Tx.

By definition, variables local to a process cannot be directly

affected by any actions of other processes. Thus a sequence of code referencing only local variables could be modeled as an indivisible action.

The above considerations lead to the concept of an "effectively indivisible" computation. A computation is effectively indivisible if the final values of all the variables which it may change are always determined only by the initial values of the variables it references. That is, the final values for the variables it may change are not affected by other processes. Thus, in general, an effectively indivisible computation makes at most one access, read or write, to a global variable, unless there are special constraints on the multiple global accesses. The concept of effectively indivisible actions can be used to greatly reduce the number of transitions when modeling parallel programs.
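To make the lost-update scenario concrete, the following Python sketch (an illustration, not the paper's figure) replays the interleaving described above for the two-transition model of "s := s + 1"; the helper names t1, t2, and temp are hypothetical.

s = 0
temp = {}                     # temp[p] plays the role of Tx, local to process p

def t1(p):                    # effectively indivisible: one read of the global s
    temp[p] = s + 1

def t2(p):                    # effectively indivisible: one write of the global s
    global s
    s = temp[p]

t1("x")                       # process x reads s = 0 and computes 1
t1("y"); t2("y")              # process y reads s = 0, computes 1, stores 1
t2("x")                       # process x stores its stale 1: y's update is lost
print(s)                      # prints 1, although s was "incremented" twice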

II. THE APPROACH

In the literature, parallel programs often take one of twoforms: Either there is a fixed number of processes repeatedlyexecuting the program forever, or there is an infinite numberof processes, each of which executes the program once.Neither of these forms lends itself well to proving the absenceof livelock. The alternative proposed is to have a finite butarbitrarily large number of processes each execute the pro-gram once or a finite but arbitrary number of times. The pro-posed alternative has several advantages as follows:An infinite number of input processes is unrealistic, even on

ideal computers.Having an infinite number of input processes can obscure


certain deadlocks. An example occurred in a commercial environment where one or more processes could be hung, but could be unblocked if two or more processes later executed the program concurrently. If there were an infinite number of input processes, then such a hung process could potentially always be unblocked. However, if the number of input processes were finite, the possibility of unblocking the hung process eventually disappears.

Suppose the processes are required to execute the parallel

program repeatedly. It is desirable to be sure that there areno deadlocks or livelocks caused by one or more processeslooping back and executing the program again. To help provethis is not the case, the program may be modeled as a loop;each process would go around the loop exactly Mx times, andthen halt, where Mx is an arbitrary positive integer constantwhich may have a different value for each process x. Further-more, suppose that the number of processes executing theprogram is an arbitrary positive integer N. Let the initial statesatisfy the initial state predicate, Qo. If it can be proven thatthe program terminates in a finite number of steps under theseconditions, then there are no livelocks or infinite loops. Also,if any final state must satisfy the final state predicate, thenthere are no unwanted deadlocks.The model of the program can sometimes be simplified

slightly. Suppose it makes no practical difference (it cannot be distinguished) whether a process is looping back to reexecute the program, or a process is beginning to execute the program for the first time. Then the processes can be modeled as executing the program only once. (When the supposition holds, if N' processes loop back and reexecute the program an average of M times each, that is a subcase of N processes executing the program exactly once each, where N = N'*M.)

We conclude that it is often best to model a finite but

arbitrarily large number of processes executing the program either once or a finite but arbitrary number of times, as appropriate, so that finite termination may be used to show the absence of livelock and infinite loops.

In [2] and [3] Dijkstra presents excellent techniques for

constructing correct programs and for proving programs correct utilizing serial nondeterministic programs. His repetitive construct is

do B1 → S1 [] B2 → S2 [] ... [] Bn → Sn od

Here the "Bi → Si" are guarded commands. The guards Bi are Boolean expressions without side effects, and the Si are ordinary program statements or lists of statements separated by semicolons. At each iteration, one of the guarded commands is randomly selected for execution from among those whose guards are satisfied. When all guards are false, the next statement in sequence is executed.

Recalling the form of the expression associated with each

transition node in the program's transition diagram, it is seen


that each transition expression can be considered as one ofDijkstra's guarded commands, where the enabling predicateis the guard. Simulating the parallel program consists ofrandomly selecting one of the enabled transitions for execu-tion at each iteration, but this is exactly the same controlstructure associated with Dijkstra's repetitive construct.Therefore, we can immediately construct an equivalent non-deterministic serial program from the parallel program's transi-tions. (Keller also constructs an equivalent nondeterministicserial program in [6].) This program consists of a single in-stance of Dijkstra's repetitive construct, the guarded com-mands of which are the collection of all the transition expres-sions in the program's transition diagram. Having constructedthis program, Dijkstra's results can then be applied.The above ideas were used to formulate a strategy for

proving parallel programs which is summarized in the fol-lowing six steps.

1) Make a model of the code to be proven. Suppress detailthat is irrelevant for the purposes of the proof. It may be de-sirable to make the proof in stages. (For example, first provethat the normal mutual exclusion mechanism works. Then, inthe next stage of the proof, the locking mechanism can bemodeled as a primitive, and critical sections can probably bemodeled as occurring in one transition.) If a process reexecut-ing the program can be distinguished from a process first ex-ecuting the program, model the program as one big loop ex-ecuted an arbitrary but finite number of times by each process.Otherwise, a reexecution cannot be distinguished from aninitial execution, so model the program as being executedonce by each of an arbitrary but finite number of processes.2) Make a transition diagram for the modeled code. Use

the concept of effective indivisibility to minimize the num-ber of transitions.3) Write down the predicates characterizing the initial and

desired final states-QO and Qf, respectively. These predicateswill involve both the program and place variables. It is usuallybest and most realistic to have one or a small number of placenodes which are initialized to contain a finite but arbitrarilylarge number of processes, and upon correct termination tohave all or almost all place nodes empty, and no transitionsenabled.4) Tabulate all the variables (including all program and

place variables), the initial and final state predicates, andall of the transition expressions. This table and the transitiondiagram will be referred to in the course of the proofs. Itsaves space and often increases clarity if references to theplace variables are omitted in the transition diagram. How-ever, the place variables should be included in the table oftransition expressions to prevent confusion and to minimizethe chance of error when referring to the table during thecourse of a proof.5) Prove partial correctness: Devise predicates which specify

the desired synchronization behavior. The place variables willbe very suggestive in devising such predicates. Typically, how-ever, the original predicates will have to be strengthened by"ANDing" additional conditions which are suggested by thetransitions for which the original predicates cannot be proven.It can also happen that new program variables will have to beintroduced in order to prove the desired invariants.

The goal is to prove the predicates are Q0-invariant (invariantly true between transitions provided the initial state satisfies Q0). This can be done by induction over the Q0-reachable states (all states which satisfy Q0 or are reachable by a finite legal execution sequence of transitions from at least one state which satisfies Q0). Then, for each transition, prove that if the induction hypothesis holds for the current state, and the transition is enabled, then the predicate must hold in any state resulting from an execution of the transition. The induction hypothesis is threefold: that the predicate holds for the current state, that the current state is Q0-reachable, and that the enabling predicate of the transition is satisfied.

6) Prove that, provided the initial state satisfies Q0, the pro-

gram terminates in a state satisfying the final state predicateQf after executing at most a finite number of transitions, asfollows:

a) Prove that a Qo-reachable state satisfying the finalstate predicate is, indeed, a final state: Prove that in any Qo-reachable state, no transitions are enabled if the final statepredicate Qf holds, by examining the enabling predicates ofall the transitions. This is usually straightforward.

b) Prove that there are no Qo-reachable final states whichdo not satisfy the final state predicate Qf. This proves there isno deadlock. (Note: Presumably Qf was chosen so that itimplies no deadlock. What is actually proven is that any Qo-reachable deadlock state must satisfy Qf.)Let ACTIVE(q) be the predicate which is true if and only if at

least one transition is enabled in state q. (Throughout this paper, q represents the vector of all state variables. It includes all program, place, and introduced variables.) Thus ¬ACTIVE(q) holds if and only if q is a terminal or final state (a state in which no transitions are enabled). ACTIVE(q) may be expressed as {EP1(q) ∨ EP2(q) ∨ ... ∨ EPn(q)}, where EPi is the enabling predicate of the ith transition expression, and n is the number of transitions. Then the objective of this step may be stated as proving {¬ACTIVE(q) ⊃ Qf(q)} Q0-invariant. Because {A ⊃ B} is equivalent to {¬B ⊃ ¬A}, the predicate to be proven may be rewritten as {¬Qf(q) ⊃ ACTIVE(q)}. The latter form is usually easier to prove directly. Proving this predicate may be nontrivial, and normally involves insight into the program. Usually, additional Q0-invariants will have to be proven in addition to the invariants already proven for partial correctness. Also, additional program variables may have to be introduced.

(Note: If Qf includes any conjuncts that are Q0-invariant,

they are Q0-invariantly false (and, therefore, irrelevant) disjuncts in ¬Qf by De Morgan's rule. They do not help to distinguish the final state, and their Q0-invariance must be proven to prove {¬Qf(q) ⊃ ACTIVE(q)}. Therefore, it is more convenient and less confusing if Qf does not include any conjuncts that are Q0-invariant.)

c) Prove that each transition guarantees progress toward the final state provided that the initial state satisfies Q0. This proves there are no infinite loops and no livelock. This step is often most conveniently accomplished with the aid of Dijkstra's variant function v. (Note: Dijkstra calls it t in [2].) Let v be a finite integer function on the state of the system. Then we want to prove the following two predicates on v to be Q0-invariant: i) {v(q) > 0} and ii) {q → q' ⊃ v(q) > v(q')}. (Here


q → q' means that if the state vector is q, then the state vector q' results after the execution of exactly one, but any one, of the transitions which are enabled in state q.) The two conditions mean that if the initial state of the program satisfies Q0, then 1) the value of v is bounded from below by zero and 2) the value of v decreases by at least one after the execution of any transition. This step c) proves that the program must terminate after a finite legal execution sequence of transitions, and the final state in which it terminates must satisfy Qf (provided the initial state satisfies Q0).

Unfortunately, for some programs a strictly decreasing

variant function cannot conveniently be found. In such cases it is necessary to prove that (provided Q0 holds initially) the variant function fails to decrease at most a finite number of times, and that any increases in the variant function are finite. This proves the program must terminate in a finite number of steps (if Q0 holds initially). However, the proof of finite termination in such cases is often more difficult than if a strictly decreasing variant function can be found. The proof in such cases consists primarily of showing that the transitions which do not decrease the variant function can occur at most a finite number of times.

In the following sections, the proof strategy will be illustrated

with the aid of examples. The next section explains certainextensions to Algol 60 which are incorporated in BurroughsAlgol, so that the code to be proven correct can be understood.
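To make the correspondence concrete, the following Python sketch (not the paper's notation) treats each transition expression as a guarded command "B → S" and runs a single instance of Dijkstra's repetitive construct until no guard holds; the two-command program, Q0, and Qf below are illustrative placeholders.

import random

def repetitive_construct(state, guarded_commands):
    """do B1 -> S1 [] B2 -> S2 [] ... [] Bn -> Sn od"""
    while True:
        enabled = [s_i for b_i, s_i in guarded_commands if b_i(state)]
        if not enabled:
            return state                    # all guards false: od is reached
        random.choice(enabled)(state)       # fire one enabled command at random

# Illustrative program: N processes each move from P1 to P2 and then finish.
def advance(s): s["P1"] -= 1; s["P2"] += 1
def finish(s):  s["P2"] -= 1

commands = [(lambda s: s["P1"] > 0, advance),
            (lambda s: s["P2"] > 0, finish)]
q0 = {"P1": 4, "P2": 0}                     # an initial state satisfying Q0
qf = lambda s: s["P1"] == 0 and s["P2"] == 0

final = repetitive_construct(dict(q0), commands)
print(final, "satisfies Qf:", qf(final))    # the terminal state satisfies Qf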

III. BURROUGHS ALGOL CONSTRUCTS

The B6000 series large-scale digital computer systems havemany interesting aspects. One of these aspects is that the sys-tem software is written in high-level languages which are ex-tensions to Algol 60. This is practical because the architectureof the machine was designed with the execution of Algol inmind. A few high-level language constructs have been addedto various dialects of Burroughs Algol in order to effectivelyutilize some interesting hardware operators. One of theseis the readlock operator.The readlock operator makes it possible to write a word

of memory and return the previous contents of the wordin one uninterruptible memory cycle. For the expressionREADLOCK(A, B), first A is evaluated. Then, the old valueof B is retrieved (and is the value of the expression), and thevalue computed for A is stored into B in one uninterruptiblememory cycle. The name "READLOCK" comes from the ex-pected usage of the operator-to read a lock variable and lockit at the same time.In the operating system a "buzz" loop can be written as

follows:

DISALLOW;
while READLOCK(true, LOCKVAR) do;
  [critical section]
  LOCKVAR := false;
ALLOW;

The ALLOW and DISALLOW turn external interrupts on andoff, respectively, so the central processor executes the loopuntil the central processor that owns the lock releases it.
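A rough Python analogue of the buzz loop may help (an assumption-laden sketch, not Burroughs hardware or software): a small lock stands in for the uninterruptible memory cycle, and interrupt control (ALLOW/DISALLOW) has no analogue here and is omitted; all names below are hypothetical.

import threading

_memory_cycle = threading.Lock()

def readlock(new_value, cell):
    """Store new_value into cell[0] and return the previous contents,
    as one indivisible operation (the analogue of READLOCK(A, B))."""
    with _memory_cycle:
        old = cell[0]
        cell[0] = new_value
        return old

lockvar = [False]                        # the lock word; False means unlocked

def enter_critical_section():
    while readlock(True, lockvar):       # buzz until the old value was False
        pass

def leave_critical_section():
    lockvar[0] = False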

User programs, however, are not permitted to turn interruptson or off directly, since that might cause the system to hang.Also, it is not always desirable to use a buzz loop in the op-erating system. Therefore a mechanism was provided forblocking and unblocking processes which frees the centralprocessor for other work while the process is blocked. It iscalled the event mechanism.

It is not the intent of this paper to debate the pros and consof the event mechanism versus alternative synchronizationprimitives. Rather, the intent is to explain just enough of itto motivate the code for the buffer synchronization example,which is necessary so that correctness proofs of it can beunderstood.The software interrupt and available/unavailable properties

of events will be ignored for purposes of this discussion. For further information see [1] and [10]. An event variable is declared exactly analogous to a real variable, e.g., "EVENT E", and causes a two-word event variable to be allocated. An event may be in one of two states-"happened" or "not happened". The state of the event may be tested with the Boolean function HAPPENED(E). If a process waits on an event [e.g., WAIT(E)], then it is blocked if the event is "not happened," otherwise it proceeds. Another process will presumably later cause the event [e.g., CAUSE(E)], causing any blocked processes waiting on the event to resume and simultaneously setting the state of the event to "happened."

A variation on WAIT is WAITANDRESET: When the event is

caused, all processes waiting on the event resume, and if any process had executed WAITANDRESET on the event, then the event is left in the "not happened" state. An event may also be put into the "not happened" state with the reset function, e.g., RESET(E).

The procedures CAUSE, WAIT, WAITANDRESET, and RESET

are in the operating system, but the details of these proceduresare not of interest in the example to be presented.The only remaining detail which must be explained is the

stack number register SNR. The central processors of the B6000 series are stack machines, and every process has a physical hardware stack in main memory. (See [9] or [10] for a good description of the stack and process structure of the B6000 series.) The stack number does not change during the life of a process, and is an integer greater than one and less than 1024. The SNR register of each central processor always contains the stack number of the process it is currently executing. SNR can be read directly by programs written in the dialects of Algol for which the SNR construct is implemented. Thus, the value of SNR is a nearly ideal process number.
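A condition-variable sketch of the event mechanism (an analogue written for this transcript, not the B6000 implementation) may help fix the semantics of HAPPENED, CAUSE, RESET, WAIT, and WAITANDRESET described above; the internal counter of CAUSE calls is an implementation assumption used only to wake waiters.

import threading

class Event:
    def __init__(self):
        self._cv = threading.Condition()
        self._happened = False
        self._reset_on_cause = False     # set if any waiter used WAITANDRESET
        self._causes = 0                 # number of CAUSE calls so far

    def happened(self):
        with self._cv:
            return self._happened

    def reset(self):
        with self._cv:
            self._happened = False

    def cause(self):
        with self._cv:
            self._happened = not self._reset_on_cause
            self._reset_on_cause = False
            self._causes += 1
            self._cv.notify_all()        # resume every blocked process

    def _block_until_caused(self):
        if self._happened:               # "happened": proceed at once
            return
        ticket = self._causes
        while self._causes == ticket:    # otherwise wait for the next CAUSE
            self._cv.wait()

    def wait(self):
        with self._cv:
            self._block_until_caused()

    def wait_and_reset(self):
        with self._cv:
            self._reset_on_cause = True
            self._block_until_caused()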

IV. BUFFER SYNCHRONIZATION EXAMPLE

Because it is desirable to allow as much parallelism as pos-

sible, the data management system on the B6000 and B7000series gives up the exclusive lock before waiting for an I/Ooperation to complete. This gives rise to the need for ascheme to synchronize the usage of buffers. The data manage-ment system uses the following scheme for synchronizationof index table buffers.A process may be in one of three states in relation to a


buffer: It can either be a reader, a potential changer, or anactual changer (exclusive controller) of the buffer. A processbecomes a reader if it will not change the buffer. The buffermay not be changed by any process while there are anyreaders using it.

If the process will or might change the buffer, the processbecomes a potential changer. The existence of a potentialchanger does not preclude processes from becoming readersof the buffer. This facilitates maximum accessibility of thebuffer while a process is deciding whether to change or is pre-paring to change the buffer. However, there may be at mostone potential changer at any point in time.The potential changer will decide whether to actually change

the buffer or not. If so, it must first obtain exclusive controlof the buffer. Once it has signaled its intention to become theexclusive controller, no new processes may become readersof the buffer. When the last existing reader releases the buffer,the potential changer obtains exclusive control. When the ex-clusive controller is through changing the buffer, it releases it.Until it has done so, no processes may become readers or po-tential changers. At any time there is at most one potentialchanger or exclusive controller. An exclusive controller mustfirst have been a potential changer.Thus, there are three paths through the code:

1) Become a reader.
   [reader section]
   Release the buffer.

2) Become the potential changer.
   [changer section]
   Release the buffer.

3) Become the potential changer.
   [changer section]
   Become the exclusive controller.
   [exclusive section]
   Release the buffer.

Simplifying things by modeling only one buffer, these werecoded approximately as follows in DMALGOL, a dialect ofBurroughs Algol:

1) BEREADER;
   [reader section]
   DIVEST(READER);

2) BECHANGER;
   [changer section]
   DIVEST(CHANGER);

3) BECHANGER;
   [changer section]
   BEEXCLUSIVE;
   [exclusive section]
   DIVEST(XCLUSIVE);

where the declarations and procedures were approximately as follows:

Boolean
  LOCKOUT,            % TRUE ⇔ THERE IS A POTENTIAL CHANGER PROCESS.
  EXCLUSIVE;          % TRUE ⇒ A PROCESS DESIRES OR HAS EXCLUSIVE CONTROL OF THE BUFFER.
real
  READCOUNT,          % NUMBER OF READER PROCESSES.
  WAITCOUNT,          % NUMBER OF PROCESSES WAITING ON BUFFEREVENT OR EXCLUSIVEEVENT.
  BLOCKLOCK;          % LOCK VARIABLE.
event
  BUFFEREVENT,        % USED FOR BLOCKING PROCESSES WAITING ON THE BUFFER.
  BLOCKLOCKEV,        % EVENT FOR THE LOCK VARIABLE "BLOCKLOCK".
  EXCLUSIVEEVENT;     % BLOCKS THE EXCLUSIVE PROCESS.

define
  READER = 1 #,
  CHANGER = 2 #,
  XCLUSIVE = 3 #,
  LOCKBLOCKLOCK =
    begin
      if READLOCK(SNR, BLOCKLOCK) ≠ 0 then
        do WAITANDRESET(BLOCKLOCKEV)
        until READLOCK(1, BLOCKLOCK) = 0
    end #,
  UNLOCKBLOCKLOCK =
    begin
      if READLOCK(0, BLOCKLOCK) ≠ SNR then
        CAUSE(BLOCKLOCKEV);
    end #;

procedure BEREADER;
begin
  LOCKBLOCKLOCK;
  while EXCLUSIVE do BIGWAITER(false);
  READCOUNT := READCOUNT + 1;
  UNLOCKBLOCKLOCK;
end;

procedure BECHANGER;
begin
  LOCKBLOCKLOCK;
  while LOCKOUT do BIGWAITER(false);
  LOCKOUT := true;
  UNLOCKBLOCKLOCK;
end;

procedure BEEXCLUSIVE;
begin
  LOCKBLOCKLOCK;
  EXCLUSIVE := true;
  while READCOUNT ≠ 0 do BIGWAITER(true);
  UNLOCKBLOCKLOCK;
end;

procedure BIGWAITER(BEEXCL); value BEEXCL; Boolean BEEXCL;
begin
  WAITCOUNT := WAITCOUNT + 1;
  if BEEXCL then
    begin
      RESET(EXCLUSIVEEVENT);
      UNLOCKBLOCKLOCK;
      WAIT(EXCLUSIVEEVENT);
    end
  else
    begin
      RESET(BUFFEREVENT);
      UNLOCKBLOCKLOCK;
      WAIT(BUFFEREVENT);
    end;
  LOCKBLOCKLOCK;
  WAITCOUNT := WAITCOUNT - 1;
end of BIGWAITER;

procedure DIVEST(TYPE); value TYPE; real TYPE;
begin
  LOCKBLOCKLOCK;
  if TYPE = READER then
    begin
      READCOUNT := READCOUNT - 1;
      if READCOUNT = 0 then
        if WAITCOUNT ≠ 0 then
          if EXCLUSIVE then
            CAUSE(EXCLUSIVEEVENT)
          else
            CAUSE(BUFFEREVENT);
    end
  else if TYPE = CHANGER then
    begin
      LOCKOUT := false;
      if WAITCOUNT ≠ 0 then
        CAUSE(BUFFEREVENT);
    end
  else if TYPE = XCLUSIVE then
    begin
      EXCLUSIVE := false;
      LOCKOUT := false;
      if WAITCOUNT ≠ 0 then
        CAUSE(BUFFEREVENT);
    end;
  UNLOCKBLOCKLOCK;
end of DIVEST;

Critical sections are started by executing LOCKBLOCKLOCK, and finished by executing UNLOCKBLOCKLOCK. The code for these two constructs is compiled in-line using the powerful DEFINE (macro) facility, and insures that at most one process is in a critical section at any point in time. For a complete discussion of the mutual exclusion code and a proof of its total correctness, see [8]. Here we will simplify the situation by only modeling the effect of the code: The variable BLOCKLOCK is defined to be 0 or 1 according to whether the lock is unlocked or locked, respectively. (In reality it can also be the stack number of a process.) BLOCKLOCKEV is ignored entirely. Then the transition diagram for LOCKBLOCKLOCK is shown in Fig. 3(a), and the transition diagram for UNLOCKBLOCKLOCK is shown in Fig. 3(b).

Whenever possible, we will simplify even further by merging the code

LOCKBLOCKLOCK;
[action]
UNLOCKBLOCKLOCK;

into one transition, as shown in Fig. 4, with no net effect on the lock variable BLOCKLOCK.

For simplification, we will abbreviate the variable names as

follows:

C   LOCKOUT,
E   BUFFEREVENT,
EX  EXCLUSIVEEVENT,
L   BLOCKLOCK,
R   READCOUNT,
W   WAITCOUNT,
X   EXCLUSIVE,
u   introduced variable.

We introduce the variable u in order to produce a strictly de-creasing variant function to help prove proper termination.Its only usage in the program will be to be decremented by 1as an indivisible action in the procedure BIGWAITER immedi-ately after the process waits on BUFFEREVENT or EXCLUSIVE-EVENT. Thus, it will count down by one every time a processproceeds after waiting on either event.

Fig. 3. (a) Transition diagram for "LOCKBLOCKLOCK": T1: P1>0 ∧ BLOCKLOCK=0 → P1:=P1-1; P2:=P2+1; BLOCKLOCK:=1. (b) Transition diagram for "UNLOCKBLOCKLOCK": T1: P1>0 → P1:=P1-1; P2:=P2+1; BLOCKLOCK:=0.

Fig. 4. Transition diagram for "LOCKBLOCKLOCK; [ACTION]; UNLOCKBLOCKLOCK": T1: P1>0 ∧ BLOCKLOCK=0 → P1:=P1-1; P2:=P2+1; [ACTION].

With the above in mind, the transition diagram is shown inFig. 5. (Again, the place variables have been omitted from thetransition expressions for convenience and readability.)The initial state predicate Qo specifies that all place variables

except P1 are zero, and that L, R, W, and X are zero. It specifies that the initial value for u is the square of the initial value for P1. The processes make a nondeterministic choice at P1 of whether to become a reader or a potential changer. At P8 the processes make another nondeterministic choice as to whether to become the exclusive controller or to release the buffer.

The final state predicate Qf specifies that all place variables are zero, and that L, C, W, R, and X are zero. The states of the events and u are irrelevant.

The above is summarized in Table I.

The variable u decreases by one as an indivisible action

each time a process goes around any of the three loops. The


Fig. 5. Transition diagram for buffer management algorithm.

problem is to prove that it is bounded from below by zero.

This will be shown below.

Partial Correctness

The following includes the partial correctness conditions and can be proven Q0-invariant.

I1: I1a ∧ I1b ∧ I1c ∧ I1d ∧ I1e ∧ I1f

where

I1a: P13*P12 = 0
I1b: P8+P9+P10+P11+P13 ≤ 1
I1c: X=1 ⊃ "P12 does not increase"
I1d: C=0 ⊃ P8+P9+P10+P11+P13 = 0
I1e: R = P12
I1f: X=0 ⊃ P9+P10+P11+P13 = 0.

In writing the above predicates, use has been made of the fact that all the place variables are nonnegative.

Predicate I1a means that there are no readers if there is an exclusive process, and that there is no exclusive process if there are any readers.

Predicate I1b implies that P8 ≤ 1 and P13 ≤ 1. Thus, there is at most one changer, and there is at most one exclusive process. Furthermore, there is never both a changer and an exclusive process, for I1b also implies P8+P13 ≤ 1.

Predicate I1c means that once a process has signaled its intent to become the exclusive controller of the buffer, no new readers are allowed.

Predicates I1d through I1f show that the variables C, R, and X accurately reflect the state of certain place variables, and had to be included in order to prove I1a through I1c.

The proof is straightforward and is not reproduced here.

Termination, Step A: Qf(q) Implies ¬ACTIVE(q)

If Qf(q), then all place variables are zero. Since every transition requires a place variable to be positive, no transitions are enabled. Thus, {Qf(q) ⊃ ¬ACTIVE(q)} is Q0-invariant.

Termination, Step B: ¬ACTIVE(q) Implies Qf(q)

The proof that {¬ACTIVE(q) ⊃ Qf(q)} is Q0-invariant is broken into separate cases. Each case is of the form {x ≠ 0 ⊃ ACTIVE(q)}, where x is one of the state variables that must be zero for Qf to hold. The proof will be facilitated if the following Q0-invariants are proven first:

(L=0 ⊃ P2+P3+P9=0) ∧ (L≠0 ⊃ P2+P3+P9=1)

W = P4+P5+P6+P7+P10+P11

P4>0 ∧ E=0 ⊃ X>0

P5>0 ∧ E=0 ⊃ C>0

P10>0 ∧ EX=0 ⊃ R>0.

The first two predicates show how L and W reflect the values of certain place variables. The last three predicates show why the three key transitions T7, T8, and T15 do not result in deadlock.

The proof will be easiest if the variables are taken in the following order: L, then all the place variables except P4, P5, and P10, then P10, P4, P5, and, finally, C, R, W, X. The strategy for the program variables is to use previously proven invariants to show that if the variable is positive, then the sum of certain place variables is positive. Since at that point, {Pi>0 ⊃ ACTIVE(q)} will have been proven for all the place variables Pi, the proof of the current case will then follow. The details of the proof are not included here.

Termination, Step C: Finite Termination

The introduced variable u may be proven nonnegative (provided the initial state satisfies Q0) by the following argument. We assume the initial state satisfies Q0. There are exactly three loops, and they go through T7, T8, and T15.



TABLE I
BUFFER MANAGEMENT ALGORITHM

State Variables:
  Place Variables: P1, ..., P14
  Program Variables: C, E, EX, L, R, W, X, U

State Predicates:
  Q0: All state variables zero except: P1>0, E,EX = 0 or 1, U = P1*P1.
  Qf: All state variables zero, except don't care about E, EX, or U.

Transitions:
       Enabling Predicate        Atomic Action
  T1   P1>0 ∧ L=0            →   P1:=P1-1; P2:=P2+1; L:=1;
  T2   P1>0 ∧ L=0            →   P1:=P1-1; P3:=P3+1; L:=1;
  T3   P2>0 ∧ X=0            →   P2:=P2-1; P12:=P12+1; L:=0; R:=R+1;
  T4   P2>0 ∧ X≠0            →   P2:=P2-1; P4:=P4+1; L:=0; E:=0; W:=W+1;
  T5   P3>0 ∧ C=0            →   P3:=P3-1; P8:=P8+1; L:=0; C:=1;
  T6   P3>0 ∧ C≠0            →   P3:=P3-1; P5:=P5+1; L:=0; E:=0; W:=W+1;
  T7   P4>0 ∧ E=1            →   P4:=P4-1; P6:=P6+1; U:=U-1;
  T8   P5>0 ∧ E=1            →   P5:=P5-1; P7:=P7+1; U:=U-1;
  T9   P6>0 ∧ L=0            →   P6:=P6-1; P2:=P2+1; L:=1; W:=W-1;
  T10  P7>0 ∧ L=0            →   P7:=P7-1; P3:=P3+1; L:=1; W:=W-1;
  T11  P8>0 ∧ L=0            →   P8:=P8-1; P14:=P14+1; C:=0; if W≠0 then E:=1;
  T12  P8>0 ∧ L=0            →   P8:=P8-1; P9:=P9+1; L:=1; X:=1;
  T13  P9>0 ∧ R=0            →   P9:=P9-1; P13:=P13+1; L:=0;
  T14  P9>0 ∧ R≠0            →   P9:=P9-1; P10:=P10+1; L:=0; EX:=0; W:=W+1;
  T15  P10>0 ∧ EX=1          →   P10:=P10-1; P11:=P11+1; U:=U-1;
  T16  P11>0 ∧ L=0           →   P11:=P11-1; P9:=P9+1; L:=1; W:=W-1;
  T17  P12>0 ∧ L=0           →   P12:=P12-1; P14:=P14+1; R:=R-1;
                                   if R=0 ∧ W≠0 then
                                     if X=1 then EX:=1 else E:=1;
  T18  P13>0 ∧ L=0           →   P13:=P13-1; P14:=P14+1; C:=0; X:=0; if W≠0 then E:=1;
  T19  P14>0                 →   P14:=P14-1;

These three key transitions cause processes to wait on event E or EX, which is reset in the previous transition. Thus, a process waiting on E or EX at T7, T8, or T15 depends on another process to unblock it. E and EX are only caused by transitions T11, T17, and T18. A process passing through any of these transitions will never cause either event again, for it will eventually be destroyed by T19. Thus, the events will not be caused more than n times, where n is the value of P1 in the initial state Q0. Each time an event is caused, fewer than n processes resume and decrement u by 1. Thus, u will be decremented by less than a total of n*n. Since n*n is the initial value of u in the initial state, {u > 0} is Q0-invariant.

Given the fact that {u > 0} is Q0-invariant, the following

is a strictly decreasing nonnegative variant function for the program:

v = 3*u + 6*P1 + 3*P2 + 5*P3 + 2*P4 + 4*P5 + 4*P6 + 6*P7 + 4*P8 + 3*P9 + 2*P10 + 4*P11 + 2*P12 + 2*P13 + P14.

The proof is straightforward, and is not included here.
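As a mechanical cross-check (not part of the paper), the following Python fragment verifies that v strictly decreases under every transition of Table I; each transition is summarized only by its effect on u and the place variables, since the remaining assignments do not occur in v.

COEFF = {"u": 3, "P1": 6, "P2": 3, "P3": 5, "P4": 2, "P5": 4, "P6": 4,
         "P7": 6, "P8": 4, "P9": 3, "P10": 2, "P11": 4, "P12": 2,
         "P13": 2, "P14": 1}

EFFECTS = {
    "T1":  {"P1": -1, "P2": +1},            "T2":  {"P1": -1, "P3": +1},
    "T3":  {"P2": -1, "P12": +1},           "T4":  {"P2": -1, "P4": +1},
    "T5":  {"P3": -1, "P8": +1},            "T6":  {"P3": -1, "P5": +1},
    "T7":  {"P4": -1, "P6": +1, "u": -1},   "T8":  {"P5": -1, "P7": +1, "u": -1},
    "T9":  {"P6": -1, "P2": +1},            "T10": {"P7": -1, "P3": +1},
    "T11": {"P8": -1, "P14": +1},           "T12": {"P8": -1, "P9": +1},
    "T13": {"P9": -1, "P13": +1},           "T14": {"P9": -1, "P10": +1},
    "T15": {"P10": -1, "P11": +1, "u": -1}, "T16": {"P11": -1, "P9": +1},
    "T17": {"P12": -1, "P14": +1},          "T18": {"P13": -1, "P14": +1},
    "T19": {"P14": -1},
}

for name, effect in EFFECTS.items():
    change = sum(COEFF[var] * d for var, d in effect.items())
    assert change < 0, f"{name} fails to decrease v"
    print(name, "changes v by", change)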

Discussion

In the version of the program for which a proof was first at-tempted, EXCLUSIVEEVENT was not used. This version of theprogram can be obtained by substituting BUFFEREVENT forEXCLUSIVEEVENT in the program presented above. (Then, ifdesired, one could simplify the resulting program by eliminat-ing the parameter to BIGWAITER, and by collapsing the IF state-ment in BIGWAITER and the IF statement in the READER case

of DIVEST.) Partial correctness was proven, i.e., that buffer

usage was synchronized properly. Finite termination was also proven with a strictly decreasing variant function which included the place variables and u. This left only the absence of deadlock to be proven, i.e., that {¬Qf(q) ⊃ ACTIVE(q)} is Q0-invariant.

The absence of deadlock, however, could not be proven.

(The proof of {P10>0 ∧ E=0 ⊃ R>0}, which corresponds to {P10>0 ∧ EX=0 ⊃ R>0} for the above program, would not go through.) After many frustrating attempts, the possibility that the code was actually incorrect was considered, albeit reluctantly, for the code had been in the field for two and a half years since the only reported deadlock condition had been corrected. However, a few minutes later a counterexample was found. This was so surprising that full confidence in this result was not felt until the deadlock had been observed in an actual test case.

The deadlock could occur if the process trying to obtain exclusive control of the buffer were suspended just before waiting on the event at P10, the last reader caused the event, a new reader reset the event, and the exclusive process then resumed and waited on the event. (For a full discussion, see [8, sect. 4].) The code was corrected by adding a new event, EXCLUSIVEEVENT (abbreviated EX), to avoid the problem of BUFFEREVENT being reset before the exclusive process had a chance to wait on it. The corrected code is slightly more efficient, because the last reader only unblocks the exclusive process instead of all blocked processes.
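For comparison only (a monitor-style Python sketch, not the paper's DMALGOL and not the data management system's code), the same reader / potential-changer / exclusive-controller discipline can be built from one mutex and two condition variables standing in for BUFFEREVENT and EXCLUSIVEEVENT. Because a condition variable releases the mutex and blocks as a single operation, the reset-before-wait window described above cannot arise in this formulation.

import threading

class BufferSync:
    def __init__(self):
        self._m = threading.Lock()
        self._buffer_ev = threading.Condition(self._m)      # ~ BUFFEREVENT
        self._exclusive_ev = threading.Condition(self._m)   # ~ EXCLUSIVEEVENT
        self.readcount = 0       # R: number of readers
        self.lockout = False     # C: a potential changer exists
        self.exclusive = False   # X: exclusive control desired or held

    def be_reader(self):
        with self._m:
            while self.exclusive:              # no new readers once X is set
                self._buffer_ev.wait()
            self.readcount += 1

    def divest_reader(self):
        with self._m:
            self.readcount -= 1
            if self.readcount == 0:
                self._exclusive_ev.notify()    # last reader wakes the exclusive process

    def be_changer(self):
        with self._m:
            while self.lockout:                # at most one potential changer
                self._buffer_ev.wait()
            self.lockout = True

    def divest_changer(self):
        with self._m:
            self.lockout = False
            self._buffer_ev.notify_all()

    def be_exclusive(self):                    # caller is already the changer
        with self._m:
            self.exclusive = True
            while self.readcount != 0:
                self._exclusive_ev.wait()

    def divest_exclusive(self):
        with self._m:
            self.exclusive = False
            self.lockout = False
            self._buffer_ev.notify_all()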

cause there are so many possible execution sequences, it can


be practical to apply proof techniques to eliminate errors in parallel programs, even though the code has apparently been working correctly for years.

The buffer synchronization example illustrates a strategy to

structure proofs hierarchically: The basic locking code provencorrect in the mutual exclusion example was modeled as aprimitive, which simplified the proof.The current example also illustrates the technique of intro-

ducing program variables in order to aid in the proof.The example also illustrates that it can be useful to refer

to the transition diagram itself rather than only to a list oftransition expressions, because the loop structure is obscuredin such a list. In the current example, reference to the loopstructure simplified the argument that the introduced variableu was bounded from below.The buffer synchronization example illustrates the difficulty

in thinking accurately in detail about three or more processesexecuting a parallel program concurrently. The number ofcases to consider grows swiftly as the number of processesincreases.

V. LIVELOCK AND FINITE DELAY

Consider the following Burroughs Algol program.

P1: while READLOCK(true, L) do;
P2: [critical section]
    L := false;

This program is modeled in Fig. 6 with {true, false} mapped onto {1, 0}. Referring to Fig. 6, the transitions may be summarized as follows:

T1: P1>0 ∧ L=0 → P1:=P1-1; P2:=P2+1; L:=1;
T2: P1>0 ∧ L=1 → L:=1;
T3: P2>0 → P2:=P2-1; L:=0;

The state predicates are as follows:

Q0: L=0 ∧ P1>0 ∧ P2=0
Qf: L=0 ∧ P1=0 ∧ P2=0.
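As an aside (a sketch, not from the paper), for a fixed number of processes the model above is finite, so its Q0-reachable states can be enumerated and the mutual exclusion property checked exhaustively; the Python function names below are hypothetical.

from collections import deque

def successors(state):
    p1, p2, l = state
    if p1 > 0 and l == 0:            # T1: acquire the lock and enter
        yield (p1 - 1, p2 + 1, 1)
    if p1 > 0 and l == 1:            # T2: buzz; the state does not change
        yield (p1, p2, l)
    if p2 > 0:                       # T3: leave the critical section, unlock
        yield (p1, p2 - 1, 0)

def check(n):
    start = (n, 0, 0)                # Q0: n processes at P1, none at P2, L=0
    seen, work = {start}, deque([start])
    while work:
        state = work.popleft()
        p1, p2, l = state
        # mutual exclusion: at most one process at P2, and L tracks it
        assert p2 <= 1 and (l == 1) == (p2 == 1), state
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                work.append(nxt)
    return len(seen)

print(check(5), "reachable states; the invariant holds in all of them")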

Proving mutual exclusion is not difficult. Proving finite termination is harder. Processes at P1 will loop until the process at P2 unlocks the lock. In theory, processes at P1 can loop forever, because at each step it is possible to select T2 for firing instead of T3. However, the scheduling algorithms of most operating systems try to avoid starving out processes indefinitely. An interrupt from an I/O completion or a periodic interrupt from a system timer provides an opportunity for the operating system to decide which process to resume executing after handling the interrupt. By accounting for the time spent executing each process, the operating system can avoid individual process starvation. This avoidance of complete starvation of an individual process is referred to as the finite delay assumption-no transition may be enabled forever without firing. It means that all processes run at finite speed.

(Note: There is an undesirable property that parallel pro-

grams can have which has also been referred to as starvation.

Fig. 6. Transition diagram for readlock loop.

It occurs when a process can make no progress because of ac-tions of other processes. For example, in the classic diningphilosophers problem [7], a philosopher can "starve" if it sohappens that his two neighbors never put down their forkssimultaneously. Discussion of this topic is beyond the scopeof the present paper.)The finite delay assumption must be used to prove that the

above program terminates. When the finite delay assumption is needed for proof of finite termination, it should be taken as a warning that the program might possibly be coded more effectively for a multiprogramming system. On a multiprocessing/multiprogramming system, the use of synchronization primitives frees the central processor to execute other programs, while the use of lock loops with interrupts enabled generally wastes central processor time unnecessarily.

Consider the most obvious attempt at a variant function v

for the above program: v = 2*P1 + P2. The following table illustrates its behavior under the three possible transitions:

        v' - v
T1:      -1
T2:       0
T3:      -1

The value of v is decreased by T1 and T3 but not by T2, and thus v is not strictly decreasing. To prove finite termination, we must show that starting from a state satisfying Q0, any transitions that do not decrease v can fire at most a finite number of times, and that any increase in v is finite. In the present example v never increases, and only T2 does not decrease v.

For T2 to fire, L=1 must hold. However, the following predicate is easily proven Q0-invariant:

(L=1 ⊃ P2=1) ∧ (L=0 ⊃ P2=0)

This predicate includes the partial correctness condition (mutual exclusion at P2). Thus, L=1 implies P2=1. But P2>0 means T3 is enabled and must fire after at most a finite number of transitions by the finite delay assumption. When it does, v is decreased by one. The proof then follows by induction on the number of processes (which is finite, but arbitrarily large initially).

In the operating system on the B6700 there are buzz loops

(readlock loops) similar to the above but with external inter-rupts turned off. (These control state buzz loops are used tosynchronize the central processors on a B6700 multiprocessor


(These control-state buzz loops are used to synchronize the central processors on a B6700 multiprocessor system.) Once external interrupts enter the picture, the model of the situation is at the central processor level, not the process level. The proof strategy we have presented is applicable to proving parallel programs at the central processor level as well as at the process level. (The examples up to this point have been at the process level.)

The examples so far have illustrated two different synchronization techniques. In the examples of the previous sections, a process could block another from proceeding at all by means of a common global variable. In the current example, no process ever blocks another completely from making transitions, but may steer other processes out of the critical section by means of a common global variable. In the first case deadlock must be avoided. In the second case deadlock is impossible, but livelock must be avoided.

Consider the following program in pseudo-Algol from [5]. (The notation "{c := c+1}" means that c is increased by one as an indivisible operation.)

P1: while {c := c+1} ≠ 1 do
P2:   {c := c-1};
P3: [critical section]
    {c := c-1};

The transition diagram is shown in Fig. 7.

Again, it is relatively easy to prove mutual exclusion, but not finite termination. If there are at least three processes initially, it is possible in theory for the system of processes to oscillate in an "after you", "after you" situation.

For example, assume P1 initially holds three processes, and c is 0. The notation "P1 [T1] P3" means that the process advances from place node P1 to place node P3 via transition T1.

    Process 1                   Process 2                   Process 3
    At P1.                      At P1.                      At P1.
    c=0. P1[T1]P3. c=1.
                                c=1. P1[T2]P2. c=2.
                                                            c=2. P1[T2]P2. c=3.
    c=3. P3[T4]. c=2.
                                c=2. P2[T3]P1[T2]P2. c=2.
                                                            c=2. P2[T3]P1[T2]P2. c=2.
                                etc.                        etc.
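The oscillation tabulated above can be replayed mechanically. The following Python sketch (ours; the paper argues this informally) drives three processes through the Fig. 7 transitions under exactly this schedule: the first process enters and leaves the critical section, after which the other two keep trading a decrement for an increment of c, and each increment yields 2, never 1.

def fire(loc, c):
    """One transition of a single process, following Fig. 7 (a sketch of the semantics above)."""
    if loc == "P1":                       # atomic {c := c+1}, result tested against 1
        c += 1
        return ("P3" if c == 1 else "P2"), c
    if loc == "P2":                       # retry path: atomic {c := c-1}, back to P1
        return "P1", c - 1
    if loc == "P3":                       # leave the critical section: atomic {c := c-1}
        return "done", c - 1
    return loc, c

locs, c = ["P1", "P1", "P1"], 0           # processes indexed 0, 1, 2
for i in (0, 1, 2):                       # process 0 succeeds; processes 1 and 2 fail the test
    locs[i], c = fire(locs[i], c)
locs[0], c = fire(locs[0], c)             # process 0 leaves the critical section
print(locs, "c =", c)                     # ['done', 'P2', 'P2'] c = 2

for _ in range(5):                        # the "after you, after you" cycle, repeatable forever
    for i in (1, 2):
        locs[i], c = fire(locs[i], c)     # back to P1, c drops to 1
        locs[i], c = fire(locs[i], c)     # increment yields 2, so the process returns to P2
    assert locs[1] == locs[2] == "P2" and c == 2
print("after five rounds:", locs, "c =", c)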

Fig. 7. Transition diagram for livelock. (Place nodes P1, P2, P3, with P3 the critical section; T1: c=0 → c:=c+1, P1 to P3; T2: c≠0 → c:=c+1, P1 to P2; T3: true → c:=c-1, P2 to P1; T4: true → c:=c-1, P3 to exit.)

The oscillation can, in theory, continue forever, and is called "livelock." Livelock can theoretically occur whether the finite delay assumption holds or not. Reliance on the finite delay property in proofs is generally not considered objectionable. On the other hand, the possibility of livelock is generally considered objectionable, and it is desirable to prove its absence. Proving finite termination using a variant function is a general strategy which can be used to prove the absence of livelock in most parallel programs. The example presented in the next section is another program in which deadlock cannot occur, but for which the absence of livelock must be proven.
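Mutual exclusion and deadlock freedom for the Fig. 7 program can also be confirmed mechanically for a fixed number of processes. The Python sketch below (ours; the paper establishes this with a Q0-invariant instead) enumerates every reachable state by breadth-first search, checks that no state ever has two processes in the critical section, and checks that the only states with no enabled transition are those in which every process has finished. It says nothing about livelock, which is exactly the property that needs the variant-function argument.

from collections import deque

def replace_at(locs, i, v):
    return locs[:i] + (v,) + locs[i + 1:]

def successors(state):
    """All states reachable in one Fig. 7 transition (a sketch of the semantics above)."""
    locs, c = state
    out = []
    for i, loc in enumerate(locs):
        if loc == "P1":                               # atomic {c := c+1}, test result against 1
            out.append((replace_at(locs, i, "P3" if c + 1 == 1 else "P2"), c + 1))
        elif loc == "P2":                             # atomic {c := c-1}, retry
            out.append((replace_at(locs, i, "P1"), c - 1))
        elif loc == "P3":                             # leave the critical section
            out.append((replace_at(locs, i, "done"), c - 1))
    return out

def check(n):
    start = (("P1",) * n, 0)
    seen, frontier = {start}, deque([start])
    while frontier:
        state = frontier.popleft()
        locs, c = state
        assert locs.count("P3") <= 1, "mutual exclusion violated"
        succ = successors(state)
        if not succ:                                  # terminal state: every process finished
            assert all(l == "done" for l in locs), "deadlock"
        for s in succ:
            if s not in seen:
                seen.add(s)
                frontier.append(s)
    return len(seen)

if __name__ == "__main__":
    for n in (1, 2, 3, 4):
        print(n, "processes:", check(n), "reachable states; mutual exclusion holds")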

VI. CLASSIC SYNCHRONIZATION PROBLEM

Our final example illustrates how to handle variables local to processes, how to distinguish particular individual processes, and how to use the finite delay assumption. To the author's knowledge, the solution to the problem of synchronizing processes without the aid of special synchronization primitives was first published by Dijkstra in [4]. He required a symmetrical solution free from livelock, and assumed only that memory accesses are indivisible operations (which is true for nearly all computers). We have freely adapted his program here as follows.

The quantities N, K, A, and B are global to all processes. The positive integer constant N is equal to the number of processes. The value of the integer variable K is the process number of one of the next processes to enter the critical section. The variable K is used to steer some of the processes away from part of the program code. The array A[1:N] is of type Boolean. If A[z]=true, then process number z desires to enter or is in the critical section. Once true, A[z] remains true until process z has entered and left the critical section. It is then set false. The array B[1:N] is of type Boolean. If B[z]=true, then process number z is attempting to enter the critical section, and other processes are steered away from the critical section. The value of B[z] may change several times before process z gains access to the critical section.

The quantities J, C, M, and z are local to each process. The integer variable J is used as a loop counter. The integer variable C indicates the number of times the process is yet to repeat the program. The positive integer constant M is equal to the total number of times the process is to execute the program. The integer constant z is equal to the process number of the process currently executing the program, and 1 ≤ z ≤ N. When the process which owns the local variable must be made explicit, the variable is subscripted with the process number. For example, Jz indicates J of process z.


Program:

        thru M do
        begin
P1:       A[z] := true;
P2:       LOOP: while K ≠ z do
            begin
P3:           B[z] := false;
P4:           if ¬A[K] then
P5:             K := z;
            end;
P6:       J := 1;
P7:       B[z] := true;
P8:       while J ≤ N do
P9:         if z = J ∨ ¬B[J] then
P10:          J := J + 1
            else
              go to LOOP;
P11:      [critical section]
          B[z] := false;
P12:      A[z] := false;
P13:    end;

The transition diagram, with references to the place variables omitted in the transition expressions, is shown in Fig. 8.

For this section, the following things hold. The integers x and y are arbitrary process numbers, and z is the process number of the process currently executing the program. The global variable K may be accessed or changed by any process. The global Boolean arrays A and B may be accessed by any process, but A[x] and B[x] are only changed by the process whose number is x. The variables J and C are local to a process and may be accessed only by that process. Jx and Cx are J and C for process x. The value of the instruction counter for process x is loc(x) and may take on the values "P1", "P2", ..., "P13" (as before, P1 represents the number of process instruction counters at place node "P1"). The transitions are summarized in Table II.

The Q0-invariant {P11 ≤ 1} expresses mutual exclusion in the critical section, but it is convenient to first prove several other Q0-invariants for later reference. The following predicates show the relationship between A[x], B[x], and loc(x). They may be proven Q0-invariant using the observation that only process x changes A[x] or B[x].

    I1: A[x] ≡ loc(x) ∈ {"P2", "P3", ..., "P12"}
    I2: loc(x) ∈ {"P8", "P9", "P10", "P11"} ⊃ B[x]
    I3: B[x] ⊃ loc(x) ∈ {"P2", "P3", "P6", "P7", ..., "P11"}

The proofs of I1 through I3 are not like any of the previous proofs in that all possible paths to a place node must be considered.

The next Q0-invariant, I4, is the key to proving mutual exclusion. The predicate {B[x] ⊃ x ≥ Jy} expresses the fact that process x prevents process y from entering the critical section when B[x] holds (assuming y ≠ x). The predicate I4 means that when B[x] and B[y] hold, either process x prevents process y from entering the critical section, or y prevents x, or they both prevent each other.

    I4: B[x] ∧ B[y] ∧ x≠y ⊃ x ≥ Jy ∨ y ≥ Jx

Fig. 8. Transition diagram for mutual exclusion example.

The proof that I4 is Q0-invariant is straightforward, except for transition T16. For transition T16, part of the past history must be considered. For details, see [8].

The predicate I5 expresses mutual exclusion in the critical section.

    I5: P11 ≤ 1

Condition I5 can be proven Q0-invariant by contradiction using I2, I4, and the enabling predicate of T10. For details, see [8].

The fact that more than one process cannot be at "P11" simultaneously does not imply that any processes ever get there. However, if we succeed in proving finite termination, then all processes will have entered the critical section exactly once, for they must go through "P11" exactly once to get from "P1" to "P13".

Termination, Step A: Qf Implies Terminal State

If Qf holds, all place variables except P13 are zero, and all Cx are zero; thus no enabling predicates are satisfied.

Termination, Step B: Terminal State Implies Qf

It is easy to see that the only way a process can be blocked completely from proceeding is by reaching "P13" with C=0. Thus, all place variables must be zero except P13, and P13=N in any terminal state. From I1, all A[i] are false if all processes are at "P13", and, from the contrapositive of I3, all B[i] are false if all processes are at "P13".

Termination, Step C: Finite Termination

"Time" is now defined more formally so that we can be more precise when discussing it in proving finite termination.


TABLE II
MUTUAL EXCLUSION EXAMPLE

State variables and constants:
    Global variables:  K, A[1:N], B[1:N].
    Global constants:  N.
    Local variables:   J, C.
    Local constants:   M, z.

State predicates:

    Q0: {1 ≤ K ≤ N} ∧ P1=N ∧ P2=P3=...=P13=0 ∧
        for 1 ≤ z ≤ N: ¬A[z] ∧ ¬B[z] ∧ {Jz ≤ N+1} ∧ {Cz = Mz-1}

    Qf: P1=P2=...=P12=0 ∧ P13=N ∧
        for 1 ≤ z ≤ N: ¬A[z] ∧ ¬B[z] ∧ Cz=0

Transitions:

          Enabling Predicate                    Atomic Action
    T1:   loc(z)="P1"                           P1:=P1-1; P2:=P2+1; loc(z):="P2"; A[z]:=true;
    T2:   loc(z)="P2" ∧ K=z                     P2:=P2-1; P6:=P6+1; loc(z):="P6";
    T3:   loc(z)="P2" ∧ K≠z                     P2:=P2-1; P3:=P3+1; loc(z):="P3";
    T4:   loc(z)="P6"                           P6:=P6-1; P7:=P7+1; loc(z):="P7"; Jz:=1;
    T5:   loc(z)="P3"                           P3:=P3-1; P4:=P4+1; loc(z):="P4"; B[z]:=false;
    T6:   loc(z)="P7"                           P7:=P7-1; P8:=P8+1; loc(z):="P8"; B[z]:=true;
    T7:   loc(z)="P4" ∧ ¬A[K]                   P4:=P4-1; P5:=P5+1; loc(z):="P5";
    T8:   loc(z)="P4" ∧ A[K]                    P4:=P4-1; P2:=P2+1; loc(z):="P2";
    T9:   loc(z)="P5"                           P5:=P5-1; P2:=P2+1; loc(z):="P2"; K:=z;
    T10:  loc(z)="P8" ∧ Jz>N                    P8:=P8-1; P11:=P11+1; loc(z):="P11";
    T11:  loc(z)="P8" ∧ Jz≤N                    P8:=P8-1; P9:=P9+1; loc(z):="P9";
    T12:  loc(z)="P11"                          P11:=P11-1; P12:=P12+1; loc(z):="P12"; B[z]:=false;
    T13:  loc(z)="P9" ∧ (z=Jz ∨ ¬B[Jz])         P9:=P9-1; P10:=P10+1; loc(z):="P10";
    T14:  loc(z)="P9" ∧ z≠Jz ∧ B[Jz]            P9:=P9-1; P2:=P2+1; loc(z):="P2";
    T15:  loc(z)="P12"                          P12:=P12-1; P13:=P13+1; loc(z):="P13"; A[z]:=false;
    T16:  loc(z)="P10"                          P10:=P10-1; P8:=P8+1; loc(z):="P8"; Jz:=Jz+1;
    T17:  loc(z)="P13" ∧ Cz>0                   P13:=P13-1; P1:=P1+1; loc(z):="P1"; Cz:=Cz-1;

Note: Underlined terms in tables are shown in boldface in text.
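Table II is mechanical enough to execute. The Python sketch below (ours, not part of the paper) encodes each transition as an enabling condition plus an atomic action on a state (K, loc, A, B, J, C), then enumerates every reachable state for a deliberately tiny instance, N=2 processes making M=1 pass each, and checks I5 (at most one process at "P11") in every reachable state and Qf in every terminal state. The exhaustive search stands in for the invariance and deadlock-freedom arguments of Steps A and B on this one instance; it does not, of course, replace the general proof.

from collections import deque

N, M = 2, 1                    # kept tiny so the reachable state space can be enumerated

def initial_states():
    # Q0: every process at P1, A and B false, C = M-1, J admissible, 1 <= K <= N
    return [(k, ("P1",) * N, (False,) * N, (False,) * N, (1,) * N, (M - 1,) * N)
            for k in range(1, N + 1)]

def successors(state):
    """Every state reachable by firing one transition of Table II."""
    K, loc, A, B, J, C = state
    out = []
    for z in range(1, N + 1):                      # z is the process number, 1-based as in the text
        i = z - 1
        def move(dest, **upd):                     # helper: move process z to place `dest`
            out.append((upd.get("K", K),
                        loc[:i] + (dest,) + loc[i+1:],
                        A if "A" not in upd else A[:i] + (upd["A"],) + A[i+1:],
                        B if "B" not in upd else B[:i] + (upd["B"],) + B[i+1:],
                        J if "J" not in upd else J[:i] + (upd["J"],) + J[i+1:],
                        C if "C" not in upd else C[:i] + (upd["C"],) + C[i+1:]))
        l = loc[i]
        if l == "P1":
            move("P2", A=True)                     # T1: request entry
        elif l == "P2" and K == z:
            move("P6")                             # T2: K = z, go start the scan
        elif l == "P2":
            move("P3")                             # T3: K != z, defer
        elif l == "P3":
            move("P4", B=False)                    # T5
        elif l == "P4" and not A[K - 1]:
            move("P5")                             # T7
        elif l == "P4":
            move("P2")                             # T8
        elif l == "P5":
            move("P2", K=z)                        # T9: claim K
        elif l == "P6":
            move("P7", J=1)                        # T4: start the scan
        elif l == "P7":
            move("P8", B=True)                     # T6
        elif l == "P8" and J[i] > N:
            move("P11")                            # T10: scan passed, enter the critical section
        elif l == "P8":
            move("P9")                             # T11
        elif l == "P9" and (z == J[i] or not B[J[i] - 1]):
            move("P10")                            # T13
        elif l == "P9":
            move("P2")                             # T14: scan failed, go back to LOOP
        elif l == "P10":
            move("P8", J=J[i] + 1)                 # T16
        elif l == "P11":
            move("P12", B=False)                   # T12: leave the critical section
        elif l == "P12":
            move("P13", A=False)                   # T15
        elif l == "P13" and C[i] > 0:
            move("P1", C=C[i] - 1)                 # T17: repeat the program
    return out

seen = set(initial_states())
frontier = deque(seen)
terminals = 0
while frontier:
    state = frontier.popleft()
    K, loc, A, B, J, C = state
    assert loc.count("P11") <= 1, "I5 violated"
    succ = successors(state)
    if not succ:                                   # terminal state: check Qf
        terminals += 1
        assert all(l == "P13" for l in loc) and not any(A) and not any(B) and not any(C)
    for s in succ:
        if s not in seen:
            seen.add(s)
            frontier.append(s)
print(f"N={N}, M={M}: {len(seen)} reachable states, {terminals} terminal; I5 and Qf hold")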

We define t as a special integer state variable unique from all others, and include it in the state vector. The value of t is initially zero and increases by one at the completion of each transition. (Note: Since in a "real" system, transitions can happen simultaneously, t does not quite model "real" time. Rather, it is the number of transitions executed thus far.) Normally, a predicate involves one instant of time (between transitions) and thus only one set of values for the variables of the state vector. In this case there is no confusion. However, when more than one instant of time is involved, more than one set of values for the variables of the state vector is involved. In such cases the notation v(q) or Q(q) can be used to emphasize which state vector (and thus which instant of time) is involved, where v can be any state variable (including t), and Q can be any predicate involving the state variables. For example, the statement "P is Q0-invariant" may be represented as {(t=0 ∧ Q0)(q0) ⊃ (t≥0 ∧ P)(q1)}, or simply {(t=0 ∧ Q0)(q0) ⊃ P(q1)}. And {q → q' ⊃ v(q) > v(q')} may be represented as {t(q)+1 = t(q') ⊃ v(q) > v(q')}. Infinity is excluded as a value for t. Thus, proving the existence of


a value for t such that some predicate, say Q, holds means that the truth of the predicate Q is inevitable; i.e., that the predicate Q will become true after at most a finite number of transitions have fired. (It may subsequently become false, however.) For programs that halt, there is a maximum value for t for any particular execution sequence.

It appears that the proof of finite termination must be structured around a key idea of the program: the global variable K holds the process number of one of the processes which will eventually enter the critical section. The situation is complicated by the fact that K can change several times before the process whose process number is stored in K enters the critical section. This is because the testing and setting of K is not (and cannot be) an indivisible operation.

To simplify the logic of the proof, it will be useful to first prove several lemmas (I6-I8) to be Q0-invariant.

    I6: (t=t0 ∧ A[K])(q0) ∧ (t0 < t ≤ t1 ⊃ "T15 was not just executed by the process
        whose number is currently equal to K")(q1) ⊃ (t0 < t ≤ t1 ⊃ A[K])(q2)

Once A[K] holds, A[K] will remain true until (possibly) it is set false by a process which executes T15 and has process number equal to K when it does so. This is true even though K may change.

Proof: The value of the expression A[K] can only become false if the value of the variable A[K] becomes false, or if the value of the variable K changes. In the former case, I6 holds (only process z can change A[z]). In the latter case the only transition to consider is T9. For T9 to fire for process x, loc(x)="P5", and from I1 we have that A[x] holds. After T9 fires, {K=x ∧ A[x]} is unchanged, so A[K] will still hold. Q.E.D. [I6]

    I7: If at any time A[K] holds, then K can change at most a finite number of times
        while A[K] continues to hold.

Proof: The variable K can only change at T9. For T9 to fire, P5 must be greater than zero. P5 can increase only by T7 firing, but once A[K] holds, T7 is disabled. Therefore, T9 can fire at most P5 more times as long as A[K] holds. Since P5 ≤ N, K can change at most a finite number of times while A[K] continues to hold. Q.E.D. [I7]

    I8: x≠y ∧ (t=t0 ∧ K=x ∧ A[K])(q0) ⊃ (∃t1)[t1 ≥ t0
        ∧ (t0 ≤ t < t1 ⊃ K=x ∧ A[K])(q1)
        ∧ {(t1 ≤ t ≤ t2 ⊃ K=x ∧ A[K])(q2)
           ⊃ (t1 ≤ t ≤ t2 ⊃ ¬B[y])(q3)}].

If x and y are different processes, then if {K=x ∧ A[K]} holds long enough, B[y] will eventually (at t1) be false and will stay false for at least as long as {K=x ∧ A[K]} continues to hold.

Proof: Let the present time be t0. Assume {x≠y ∧ K=x ∧ A[K]} holds, for otherwise I8 is vacuously true. Assume {K=x ∧ A[K]} continues to hold (for an unspecified time), for otherwise the theorem can be satisfied by letting t1 be the time at which {K=x ∧ A[K]} first becomes false. Using I3, we break up the proof into four cases, as follows:

Case 1: loc(y)="P3".
Transition T5 will inevitably insure that B[y] is false. The variable B[y] will remain false until process y can execute T6 (T6 is the only transition which can set B[y] true). Starting from "P4", process y cannot execute T6 without first executing T2. But T2 is disabled for process y for at least as long as {K=x ∧ A[K]} holds. Thus, once process y executes T5, B[y] will remain false at least until {K=x ∧ A[K]} no longer holds.

Case 2: loc(y)="P2".
Process y will inevitably execute T3, resulting in loc(y)="P3", but then Case 1 proves the theorem.

Case 3: loc(y)="P11".
Transition T12 inevitably fires for process y and sets B[y] false. Process y will then inevitably reach "P13" and will either be blocked there forever (if Cy=0) or will inevitably reach "P2". In the former case B[y] remains false forever, satisfying the theorem. The latter case was proven by Case 2.

Case 4: loc(y) ∈ {"P6", "P7", "P8", "P9", "P10"}.
If loc(y)="P6" or loc(y)="P7", then inevitably loc(y)="P8". If loc(y) is in the loop "P8", "P9", "P10", "P8", then inevitably process y exits the loop by going to either "P2" or "P11" (it cannot go around the loop more than N times). If it reaches "P2", then Case 2 proves the theorem. If it reaches "P11", then Case 3 proves the theorem. Q.E.D. [I8]

A nonnegative finite variant function for the program is the sum of N individual variant functions as follows:

    v = v1 + v2 + v3 + ... + vN

where the nonnegative finite individual variant functions vx are defined as follows.

    vx = (if loc(x)="P1"  then 3*N+7 else
          if loc(x)="P2"  then 3*N+6 else
          if loc(x)="P3"  then 2 else
          if loc(x)="P4"  then 1 else
          if loc(x)="P5"  then 0 else
          if loc(x)="P6"  then 3*N+5 else
          if loc(x)="P7"  then 4 else
          if loc(x)="P8"  then 3 else
          if loc(x)="P9"  then 2 else
          if loc(x)="P10" then 1 else
          if loc(x)="P11" then 2 else
          if loc(x)="P12" then 1 else
          if loc(x)="P13" then 0)
         + 3*(N+1-Jx) + (3*N+8)*Cx

Since {Jx ≤ N+1} is Q0-invariant, all transitions decrease v by at least one except for T8, T9, and T14, which increase v by 3*N+5, 3*N+6, and 3*N+4, respectively. There is no apparent variant function which is strictly decreasing. The program really does seem to lose progress when one of these three transitions loops back to "P2" without changing any program variables. The program terminates in a finite number of steps if and only if these transitions can be executed at most a finite number of times.
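These figures can be checked by simple arithmetic. The Python sketch below (ours) records, for each transition, its source and destination place weights and its effect on Jx and Cx, and then verifies the quoted changes in v for every admissible prior value of Jx, using the Q0-invariant Jx ≤ N+1.

N = 5          # the check goes through for any positive N; change it to spot-check others

# weight contributed to v_x by each place node, copied from the definition above
W = {"P1": 3*N + 7, "P2": 3*N + 6, "P3": 2, "P4": 1, "P5": 0,
     "P6": 3*N + 5, "P7": 4, "P8": 3, "P9": 2, "P10": 1,
     "P11": 2, "P12": 1, "P13": 0}

def v(place, J, C):
    return W[place] + 3 * (N + 1 - J) + (3*N + 8) * C

# transition -> (source place, destination place, effect on J, change in C)
T = {"T1":  ("P1", "P2", None, 0),   "T2":  ("P2", "P6", None, 0),
     "T3":  ("P2", "P3", None, 0),   "T4":  ("P6", "P7", "J:=1", 0),
     "T5":  ("P3", "P4", None, 0),   "T6":  ("P7", "P8", None, 0),
     "T7":  ("P4", "P5", None, 0),   "T8":  ("P4", "P2", None, 0),
     "T9":  ("P5", "P2", None, 0),   "T10": ("P8", "P11", None, 0),
     "T11": ("P8", "P9", None, 0),   "T12": ("P11", "P12", None, 0),
     "T13": ("P9", "P10", None, 0),  "T14": ("P9", "P2", None, 0),
     "T15": ("P12", "P13", None, 0), "T16": ("P10", "P8", "J:=J+1", 0),
     "T17": ("P13", "P1", None, -1)}

increases = {"T8": 3*N + 5, "T9": 3*N + 6, "T14": 3*N + 4}   # the three exceptions quoted above

for name, (src, dst, j_effect, dC) in T.items():
    for J in range(1, N + 2):              # the Q0-invariant gives Jx <= N+1
        for C in range(1, 4):              # T17 needs Cx > 0 to be enabled; others ignore Cx
            J2 = 1 if j_effect == "J:=1" else J + 1 if j_effect == "J:=J+1" else J
            dv = v(dst, J2, C + dC) - v(src, J, C)
            if name in increases:
                assert dv == increases[name], (name, dv)
            else:
                assert dv <= -1, (name, dv)
print("every transition decreases v by at least 1, except T8, T9, T14 as quoted")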


Proof of Finite Termination: Because the finite delay assumption must be relied upon, finite termination might, perhaps, be best expressed as "Qf is Q0-inevitable," i.e., {(∃t0)(t=t0 ∧ Qf)(q)} is Q0-invariant. First we prove the predicate {(t=t0 ∧ ¬Qf)(q0) ⊃ (∃t1)(∃x)(t1 > t0 ∧ t=t1 ∧ loc(x)="P13")(q1)} Q0-invariant. That is, if the program has not yet halted, there is at least one process that will reach "P13" after a finite (i.e., positive) number of transitions. Proof of finite termination (i.e., that Qf is Q0-inevitable) then follows by induction on the number of times processes can reach "P13".

If Qf does not hold, then there is at least one process that is not at place node "P13" with C=0. There are two cases to consider: A[K] and ¬A[K].

Case 1: A[K].
There must be a process x whose location counter is not at "P13" (from I1).

Case 1.1: Suppose K does not change as long as process x has not yet reached "P13". Since no process can be blocked completely except at "P13", if process x does not reach "P13" in a finite number of steps, then it must be in an infinite loop. The variant function vx for an individual process shows that the only transitions which do not make progress towards "P13" are T8, T9, and T14.

Case 1.1.1: Consider T8.
Any loop through T8 must go through T3. But T3 is disabled (by the assumption of Case 1.1). Therefore, there is no infinite loop through T8.

Case 1.1.2: Consider T9.
There is no infinite loop through T9 by the same argument as the previous Case 1.1.1.

Case 1.1.3: Consider T14.
The only way an infinite loop through T14 can occur is if the enabling predicate of T14 can always be satisfied when process x is at "P9". Let y=Jx. For T14 to be enabled, {y≠x ∧ B[y]} must hold. From I8, B[y] will eventually become false and remain false until K changes or process x reaches "P13". Since y could be any process other than x, and since there are at most N-1 of these, there can be no infinite loop through T14 for process x.

Since the only three transitions which do not decrease the variant function for process x can be executed at most a finite number of times by process x before it reaches "P13", there is no infinite loop for process x, and it inevitably reaches "P13".

Case 1.2: Suppose K changes before process x reaches "P13". A[K] remains invariantly true by I6. Thus Case 1 still applies. Furthermore, K can change at most a finite number of times in this fashion by I7. When K has changed for the last time before the process whose number is equal to K reaches "P13" (or the last time K changes, whichever happens first), we have the previous case (Case 1.1).

Case 2: ¬A[K].
From I1 and the fact that Qf does not hold, there is a process, say x, at "P1" or at "P13" with Cx>0. Assume A[K] remains false, for otherwise Case 1 applies. If loc(x)="P13", then inevitably T17 fires, and then loc(x)="P1". Starting at "P1", process x inevitably follows the path P1[T1]P2[T3]P3[T5]P4[T7]P5[T9]. The value of A[K] is then true, and Case 1 applies.

Thus, if Qf does not hold, a process will inevitably reach "P13" after making at least one transition. Each process x can reach "P13" at most a finite number of times (Mx) and is then blocked forever. The number of processes (N) is finite. By induction on the number of times processes can reach "P13", the program inevitably halts. Q.E.D. [Finite Termination]

Discussion

The proof of partial correctness and finite termination for this example was much more difficult than for the previous examples. The proofs seem to center around the key ideas behind the program, and the last program was much more difficult than the others. The relative simplicity of the previous programs compared to the last illustrates the utility of synchronization primitives.

In the mutual exclusion and buffer synchronization examples, it made no significant difference whether a process reexecuted the program or executed it for the first time. Thus, the programs were modeled so that each process executed the program only once. In the current example, the processes could be distinguished by their effect on the global A and B arrays: A[x] and B[x] could only be changed by process x. Therefore, the program was modeled so that each process could repeatedly execute the program a finite number of times.

The current example illustrates how to handle the proofs when specific processes need to be identified, how to handle variables local to processes, how to use variant functions for individual processes, how to use the finite delay assumption, and how to handle time.

VII. DISCUSSION

The parallel program model presented above is based on Keller's model [6], which is very general. A connection between Keller's work and Dijkstra's work with serial nondeterministic programs [2] was shown. The main contributions of the paper are general techniques for proving proper termination behavior of parallel programs: the absence of deadlock, livelock, and infinite loops. The program is modeled so as to terminate, and the initial state satisfies the predicate Q0. The absence of deadlock is shown by proving that all terminal states must satisfy the final state predicate Qf, provided Q0 holds initially. The absence of livelock and infinite loops is shown by proving finite termination, provided Q0 holds initially. Finite termination is proven with the aid of an integer variant function which is not necessarily strictly decreasing. Reliance on the assumption of finite delay is sometimes necessary in proving finite termination.

The approach presented is general and does not depend upon any special constructs, such as the "resource" of Owicki and Gries [7], or any special synchronization primitives, etc. Proofs can be carried out both at the process level and at the central processor level. It is possible to suppress details of the program irrelevant to the proofs, and to structure proofs


hierarchically in a stepwise decomposition. The effort required for proofs grows linearly with the size of the program, and does not grow with the number of processes executing the program.

The use of place variables is very suggestive in forming synchronization conditions and variant functions. However, to carry out proofs, insight into the program behavior is required, for the key properties of the program must be captured in invariantly true predicates. It seems unavoidable that the complexity of the program is reflected in the complexity of its proof.

The need for correctness proofs is often greater for parallel programs than for serial programs. The number of possible cases grows enormously, both with the length of the program and the number of processes. Exhaustive testing and reproducibility are normally out of the question, because it is not practical to control the relative execution sequences of the processes. Sometimes, even extremely short programs, such as the LOCKBLOCKLOCK/UNLOCKBLOCKLOCK code of the buffer synchronization example, are difficult to get correct. (In fact, [8, sect. 3] reveals that the original version of the code for LOCKBLOCKLOCK/UNLOCKBLOCKLOCK was incorrect.) Consequently, parallel programs are released with errors, errors which may not be encountered for months or years.

The method presented above is simple enough that many programmers could understand it without undue effort, and practical enough to be used at least to prove relatively small critical routines. However, constructing correct programs is often more to the point than proving programs correct. The area of constructive methods for parallel programs, therefore, seems to merit further study. Starvation of processes and fairness in schedulers are also possible areas of further research, among very many others.

ACKNOWLEDGMENT

This work was based on a large body of previously published literature in the field of program correctness. The two most important influences were Keller's inspiring work with parallel programs [6] and Dijkstra's very beautiful work with serial nondeterministic programs in [2] and [3]. The author wishes to thank P. Garrett, T. Gottfried, B. Forbes, and D. Weinberger for help with typographical errors, and D. Mihatovic, G. Brown, and M. Oswalt for typing the manuscript. Thanks to L. Potter and B. Fiori for doing the figures. Special thanks to the editor and the anonymous referees for many valuable corrections and suggestions.

REFERENCES

[1] Burroughs Corp., B7000/B6000 ALGOL Reference Manual (5001639). Detroit, MI: Burroughs Corp., 1977.
[2] E. W. Dijkstra, "Guarded commands, nondeterminacy, and formal derivation of programs," Commun. Assoc. Comput. Mach., vol. 18, pp. 453-457, Aug. 1975.
[3] E. W. Dijkstra, A Discipline of Programming. Englewood Cliffs, NJ: Prentice-Hall, 1976.
[4] E. W. Dijkstra, "Solution of a problem in concurrent programming control," Commun. Assoc. Comput. Mach., vol. 8, p. 569, Sept. 1965.
[5] E. W. Dijkstra, "Hierarchical ordering of sequential processes," Acta Inform., vol. 1, pp. 115-138, 1971.
[6] R. M. Keller, "Formal verification of parallel programs," Commun. Assoc. Comput. Mach., vol. 19, pp. 371-384, July 1976.
[7] S. Owicki and D. Gries, "Verifying properties of parallel programs: An axiomatic approach," Commun. Assoc. Comput. Mach., vol. 19, pp. 279-285, May 1976.
[8] A. F. Babich, "Proving the correctness of parallel programs," IEEE Computer Society Repository, R78-21, Feb. 1978.
[9] J. C. Cleary, "Process handling on the Burroughs B6500," in Proc. 4th Australian Comput. Conf., Adelaide, South Australia, 1969, pp. 231-239.
[10] E. I. Organick, Computer System Organization: The B5700/B6700 Series. New York: Academic, 1973.

Alan F. Babich (M'72) was born in Sewickley, PA, on November 21, 1943. He received the B.S. degree in physics in 1965, the M.S. degree in electrical engineering in 1966, and the Ph.D. degree in electrical engineering in 1972, all from Carnegie-Mellon University, Pittsburgh, PA. His doctoral thesis evaluated a novel method of computer simulation.

He worked as a Computer Scientist for Burroughs Corporation, Detroit, MI, from 1971 to 1972. From 1972 to 1974 he worked as a Senior Programmer for Burroughs in the City of Industry and Mission Viejo, CA. From 1974 to 1975 he was a Management Systems Analyst, and from 1975 to 1979 he was Project Leader of data management for Burroughs Corporation, Mission Viejo, CA. Since February 1979 he has worked as a Senior Project Engineer on the technical staff of Basic Four Corporation, Santa Ana, CA. Some of his duties for Burroughs were research, design, and development of data management systems. He was a principal architect and project leader of Burroughs' DMSII database management system. DMSII has been on the Datapro magazine honor roll. It received a perfect 4.0 rating for overall user satisfaction. His duties at Basic Four Corporation include research, planning, and developing advanced software and hardware-software products. His interests have included reliable software, proof of correctness, parallel programming, computer simulation techniques, data management, database recovery techniques, computer architecture, database machines, and hardware-software tradeoffs.

Dr. Babich is a member of the Association for Computing Machinery and Sigma Xi.
