Bounds on Multiprocessing Timing Anomaliespeople.irisa.fr/Sophie.Pinchinat/AA/Graham1969SIAM.pdf ·...

14
SIAM J. APPL. MATH. Vol. 17, No. 2, March 1969 BOUNDS ON MULTIPROCESSING TIMING ANOMALIES* R. L. GRAHAMS" 1. Introduction. It is well known (cf. [5], [6], [8]) to workers in the field of parallel computation that a multiprocessing system consisting of many identical processors acting in parallel may exhibit certain somewhat unexpected "anomal- ies," even though the system operates under a rather natural set of rules; e.g., it can happen that increasing the number of processors can increase the length of time required to execute a given set of tasks. In this paper we study a typical model of such a multiprocessing system, and we determine the precise extent by which the execution time for a set of tasks can be influenced because of these timing anomalies. A special case of this model will be shown to generate an interesting number- theoretic question, partial answers to which are given in the latter half of the paper. 2. Description of the system; examples of anomalies. Let us suppose we are given n (abstract) identical processing units Pi, 1,..., n, and a set of tasks T { T1, "", T} which is to be processed by the Pi. We are also given a partial order -< on T and a function/: T --, (0, oe). Once a processor Pi begins to execute a task T, it works without interruption until the completion of that task, requiring altogether/z(T) units of time. It is also required that the partial order be respected in the following sense: If T -< T then T cannot be started until T has been com- pleted. Finally, we are given a sequence L (TI, ..., T,)consisting of all the tasks of T and called a priority list. The P execute the T as follows: Initially, at time 0, all the processors (instantaneously) scan the list L from the beginning, searching for tasks T which are "ready" to be executed, i.e., which have no pre- decessors under -<. The first ready task T in L which Pi comes to is started by P; P continues to execute T for the #(T) units of time required to complete T. In general, at any time a processor P completes a task, it immediately scans L for the first available ready task to execute. If there are currently no such tasks, then P becomes idle. (We shall also say that P is executing an empty task denoted by p.) P remains idle until some other P completes a task, at which time P (and, of course, P) immediately scans L for ready tasks (which may now exist because of the completion of P). If two (or more) processors both attempt to start executing a task, it will be our convention to assign the task to the processor with the smaller index. The least time at which all tasks of T have been completed will be denoted by o. In the interests of mathematical rigor, it will be convenient to consider the t(T) units of time required for the execution of to be a half-open interval It, + #(T)) on the time axis. Received by the editors January 7, 1968. Presented by invitation at the Symposium on Com- binatorial Mathematics, sponsored by the Office of Naval Research, at the 1967 Fall Meeting of Society for Industrial and Applied Mathematics held at the University of California, Santa Barbara, November 29-December 2, 1967. f Bell Telephone Laboratories, Incorporated, Murray Hill, New Jersey 07974. Compare with [4]. 416 Downloaded 11/13/16 to 128.93.162.74. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

Transcript of Bounds on Multiprocessing Timing Anomaliespeople.irisa.fr/Sophie.Pinchinat/AA/Graham1969SIAM.pdf ·...

Page 1: Bounds on Multiprocessing Timing Anomaliespeople.irisa.fr/Sophie.Pinchinat/AA/Graham1969SIAM.pdf · 2016. 11. 14. · activity ofeach Pi is conveniently represented bya timing diagram

SIAM J. APPL. MATH.Vol. 17, No. 2, March 1969

BOUNDS ON MULTIPROCESSING TIMING ANOMALIES*R. L. GRAHAMS"

1. Introduction. It is well known (cf. [5], [6], [8]) to workers in the field ofparallel computation that a multiprocessing system consisting of many identicalprocessors acting in parallel may exhibit certain somewhat unexpected "anomal-ies," even though the system operates under a rather natural set of rules; e.g., itcan happen that increasing the number of processors can increase the length oftime required to execute a given set of tasks. In this paper we study a typical modelof such a multiprocessing system, and we determine the precise extent by which theexecution time for a set oftasks can be influenced because ofthese timing anomalies.A special case of this model will be shown to generate an interesting number-theoretic question, partial answers to which are given in the latter half of the paper.

2. Description of the system; examples of anomalies. Let us suppose we aregiven n (abstract) identical processing units Pi, 1,..., n, and a set of tasksT { T1, "", T} which is to be processed by the Pi. We are also given a partialorder -< on T and a function/: T --, (0, oe). Once a processor Pi begins to executea task T, it works without interruption until the completion of that task, requiringaltogether/z(T) units of time. It is also required that the partial order be respectedin the following sense: If T -< T then T cannot be started until T has been com-pleted. Finally, we are given a sequence L (TI, ..., T,)consisting of all thetasks of T and called a priority list. The P execute the T as follows: Initially, attime 0, all the processors (instantaneously) scan the list L from the beginning,searching for tasks T which are "ready" to be executed, i.e., which have no pre-decessors under -<. The first ready task T in L which Pi comes to is started byP; P continues to execute T for the #(T) units of time required to completeT. In general, at any time a processor P completes a task, it immediately scansL for the first available ready task to execute. If there are currently no such tasks,then P becomes idle. (We shall also say that P is executing an empty task denotedby p.) P remains idle until some other P completes a task, at which time P(and, of course, P) immediately scans L for ready tasks (which may now existbecause of the completion of P). If two (or more) processors both attempt tostart executing a task, it will be our convention to assign the task to the processorwith the smaller index. The least time at which all tasks of T have been completedwill be denoted by o. In the interests of mathematical rigor, it will be convenientto consider the t(T) units of time required for the execution of to be a half-openinterval It, + #(T)) on the time axis.

Received by the editors January 7, 1968. Presented by invitation at the Symposium on Com-binatorial Mathematics, sponsored by the Office of Naval Research, at the 1967 Fall Meeting of Societyfor Industrial and Applied Mathematics held at the University of California, Santa Barbara, November29-December 2, 1967.

f Bell Telephone Laboratories, Incorporated, Murray Hill, New Jersey 07974.Compare with [4].

416Dow

nloa

ded

11/1

3/16

to 1

28.9

3.16

2.74

. Red

istr

ibut

ion

subj

ect t

o SI

AM

lice

nse

or c

opyr

ight

; see

http

://w

ww

.sia

m.o

rg/jo

urna

ls/o

jsa.

php

Page 2: Bounds on Multiprocessing Timing Anomaliespeople.irisa.fr/Sophie.Pinchinat/AA/Graham1969SIAM.pdf · 2016. 11. 14. · activity ofeach Pi is conveniently represented bya timing diagram

MULTIPROCESSING TIMING ANOMALIES 417

We noW consider an example which illustrates the working of the precedingmultiprocessing system and various anomalies associated with it. We indicate thepartial order -< on T and the function/t by a directed graph G(-<, l). In G(-<,the vertices correspond to the T and a directed edge from T to T denotes T -< T.The vertex T of G(-<,/) will actually be labeled with the symbols T//4T). Theactivity of each Pi is conveniently represented by a timing diagram D. D consistsof n horizontal half-lines (labeled by the Pi) in which each line is a time axis startingfrom time 0 and is subdivided into labeled half-open segments according to thecorresponding activity of P.

Example. n 3; L (T1, T2, T9).

6(-4, I:

TI/3 0 ,0 T9/9T2/2 )0 T5/4

T6/4T3/2 T7/4T4/2 ’0 Ta/4

rl T9

D" [TIT T5 T o9=12.2 2 4 4

T3 qg, T6 Ts2 2 4 4

Note that in D we have labeled the intervals above by the task and below by itslength.

It is evident from the definition of o that it is a function of L, #, -4 and n.Let us vary each of these four parameters in the example and see the effect thisvariation has on o.

(i) Replace L by L’= (Tx, T2, T4, T5, T6, T3, T9, TT, T8), leaving #, -< andn unchanged. In this case we obtain

T1 T3 T93 2 9

T T T o2 4 4 4

T T6 T8 02 4 4 4

and o’ ’(L’, p,--<, n) 14.

Dow

nloa

ded

11/1

3/16

to 1

28.9

3.16

2.74

. Red

istr

ibut

ion

subj

ect t

o SI

AM

lice

nse

or c

opyr

ight

; see

http

://w

ww

.sia

m.o

rg/jo

urna

ls/o

jsa.

php

Page 3: Bounds on Multiprocessing Timing Anomaliespeople.irisa.fr/Sophie.Pinchinat/AA/Graham1969SIAM.pdf · 2016. 11. 14. · activity ofeach Pi is conveniently represented bya timing diagram

418 R.L. GRAHAM

(ii) Change -< to -<’ by removing T4 T5 and T4 --, T6. Then

3 4 9

[ 2 4 8

T.! T5 T8 CPe2 4 4 6

and 09’ co’ (L, p, -<’, n) 1 6.(iii) Decrease p to p’ by defining p’(T)= p(T)- 1 for all i. In this case

G(-<, p) becomes

and

T1/2 0 ,0 T9/8

Tel1T6/3

Tdl T/3T,/1 .0 T8/3

T T T8 (’t93

2 3 3 5

D’" TeT4IT61113 T981311 T7 (’192

3 8

with co’ co’(L,/t’, -<, n) 1 3.(iv) Increase n from 3 to 4. Then

T T9

T 24 9

T7 cp3

4 9

and co’ 15.

Dow

nloa

ded

11/1

3/16

to 1

28.9

3.16

2.74

. Red

istr

ibut

ion

subj

ect t

o SI

AM

lice

nse

or c

opyr

ight

; see

http

://w

ww

.sia

m.o

rg/jo

urna

ls/o

jsa.

php

Page 4: Bounds on Multiprocessing Timing Anomaliespeople.irisa.fr/Sophie.Pinchinat/AA/Graham1969SIAM.pdf · 2016. 11. 14. · activity ofeach Pi is conveniently represented bya timing diagram

MULTIPROCESSING TIMING ANOMALIES 419

The examples in (ii), (iii) and (iv) show that contrary to what might be generallyexpected, relaxing -<, decreasing p, or increasing n can all cause co to increase.In the next section we obtain an upper bound on the factor by which co can increaseby simultaneously changing L, relaxing -<, decreasing p and changing n. Thisbound is optimal in the sense that it cannot be replaced by any smaller function ofthe same variables.

3. The general bound. Suppose we are given a set T of tasks which we wishto execute two separate times. The first time we are given a time function p, apartial order -<, a priority list L and a multiprocessing system composed of nidentical processors Pi, 1, ..-, n. The second time we are given a time functionp’ __< p, a partial order2 -<’

_-<, a priority list L’, and n’ identical processors

Pi, 1, ..., n’. Let co and co’ denote the respective finishing times. Our next goalis to establish the following theorem.

TIOnM 1.

<1+

Proof. Consider the timing diagram D’ obtained by executing the tasks Tof T using the primed parameters.

0 fO

The set of all points of time in [0, co’) can be partitioned into two subsets A and B.A is defined to be the set of all points of time for which all processors are executingsome task of T. Similarly, B is defined to be the set of all points of time for whichat least one processor is idle (but not all processors are idle). We note that bothA and B are the disjoint union of half-open intervals. Let T, denote a task whichfinishes in D’ at time co’. Let S(TI denote the time at which TI is started. There aretwo possibilities. If S(T)e B and S(T,) is not a boundary point of B, then by thedefinition of B there is some processor Pi which for some e > 0 is idle during thetime [S(T) e, S(T,)). The question which naturally occurs is why TI was notstarted until time S(TI when P was idle before and during this time. The onlypossible answer is that there must be some task T2 in D’ such that T2 -<’ T and

T is completed at time S(T1 (for this would certainly cause T to wait untiltime S(TI) to be started).

On the other hand, suppose S(T,)e A or S(T) - 0 is a boundary point ofB and there exists x < S(TI) such that x B. Let xl 1.u.b. {x:x < S(T,) and

Since a partial order on T is a subset of T T, -<’_

-< has the obvious meaning.

Dow

nloa

ded

11/1

3/16

to 1

28.9

3.16

2.74

. Red

istr

ibut

ion

subj

ect t

o SI

AM

lice

nse

or c

opyr

ight

; see

http

://w

ww

.sia

m.o

rg/jo

urna

ls/o

jsa.

php

Page 5: Bounds on Multiprocessing Timing Anomaliespeople.irisa.fr/Sophie.Pinchinat/AA/Graham1969SIAM.pdf · 2016. 11. 14. · activity ofeach Pi is conveniently represented bya timing diagram

420 R.L. GRAHAM

X e B}. By the construction of A and B, we see that x e A, and for some processorPi and some e > 0, Pi is idle during the time Ix1 e, xl). We again ask why T,was not started during this time period. The only possible answer is that sometask T2 must have been executed during this period upon which Tj, dependedin order to be started. For certainly if this were not the case, then either T orsome predecessor of T, should have been started during this time.

In either case (S(T,) A or B) we have seen that either there exists a taskwhich is completed at time F(T2) such that T2 -<’ T, and y [F(Tj), S(T,))impliesy A or x < S(T) implies x A or x < 0. We can repeat this construction in-ductively, forming T3, T4, ..., etc., until we reach a task T,, for which x < S(T,,)implies x e A or x < 0. Consequently, we have shown the existence of a chain oftasks

in D’ such that at every time B, some Tj is being executed. We say that this chaincovers B. The important thing to notice about this chain is

(2) ’(rp) =< (n’ 1) ] /(Tj),D’ k=

where the left-hand sum is over all empty tasks o’ in D’. But (1) and the hypothesis-<’

_-< imply.

(3)

Thus

(4) 09 > ,u(Tj,,) >_ ,u’(Tj,,).k=l k=l

Consequently, by (2) and (4),

(5) n TtceT qeD’

<= n’(nee + (n’ 1)co).

From this we obtain

(6) < -t---ee n

and the theorem is proved.Examples are given in [2] which show that the bound in (6) is best possible.

In fact, for n n’, it is shown that the ratio of 2 1/n for ee’/ee can be achieved (towithin an arbitrary e > 0) by the variation of any one of L,/ or -<. It can be notedthat for n 1, co’ is never greater than co, while for n > 1, co’ can be greater thanco even though n’ is quite large.

Dow

nloa

ded

11/1

3/16

to 1

28.9

3.16

2.74

. Red

istr

ibut

ion

subj

ect t

o SI

AM

lice

nse

or c

opyr

ight

; see

http

://w

ww

.sia

m.o

rg/jo

urna

ls/o

jsa.

php

Page 6: Bounds on Multiprocessing Timing Anomaliespeople.irisa.fr/Sophie.Pinchinat/AA/Graham1969SIAM.pdf · 2016. 11. 14. · activity ofeach Pi is conveniently represented bya timing diagram

MULTIPROCESSING TIMING ANOMALIES 421

4. A modified system. It may be pointed out that it is quite reasonable toconsider a multiprocessor system in which the priority list L is "dynamicallyformed" as opposed to the fixed list we have used thus far. For example one quitereasonable way of doing this is as follows: At any time a processor is free, itimmediately begins to execute the ready task which currently heads the longestchain of unexecuted tasks (in the sense that the sum of the task times in the chainis maximal). Suppose by following this algorithm of choosing tasks we have afinishing time of mr.. If we denote by 090 the minimum possible finishing time(for all possible lists), then we would like to assert something about the ratio09L/09o. It follows from Theorem 1 that 09J09o =< 2 i/n; we could hope in factthat 09/090 would always be considerably closer to 1 than this. Unfortunately,however, this is not the case since it can be shown that the best possible bound onthis ratio is given by

09. 2<2

090-- n+l

(which is a slight improvement). Similarly, we might use the algorithm that aprocessor always tries to execute the ready task T which has the largest sum#(T) + T,.<Tjt(T) (i.e., the sum of the descendant lengths is maximal). If wedenote the finishing time using this algorithm by 09M, then it is again possible toproduce examples for which 09M/09o is as large as 2 2/(n + 1).

However, there is a special case for which it is possible to lower the precedingbounds significantly and still use algorithms which require relatively little effort.This is the case in which -,( is empty, and it is this case to which we shall restrictour attention for the remainder of the paper.

5. The special case in which -( is empty. As before we are given tasksT {T1,..., Tr}, a time function /: T- (0, ), and n processing units. Wecould ask for an algorithm for choosing the T for which the finishing time isoptimal, i.e., as small as possible. However, in general, it seems quite likely3 that thiscould only be achieved by an exponential (in r) number of steps, and even formoderate r, this would be prohibitive. It is more reasonable to ask for a method ofobtaining a finishing time 09 such that 09/090 is known to be relatively close to 1,and only a modest amount of energy is expended in obtaining 09. The algorithmwhich generates 09 described in the preceding section is an example of such amethod. In this algorithm, since -( is now empty, a free processor always startsto execute the longest remaining unexecuted task. With 09. as the finishing timefor this algorithm we have the following theorem.

THEOREM 2.

(7)09. < 4

COo- 3 3n

and this bound is best possible.

This has never been proved.

Dow

nloa

ded

11/1

3/16

to 1

28.9

3.16

2.74

. Red

istr

ibut

ion

subj

ect t

o SI

AM

lice

nse

or c

opyr

ight

; see

http

://w

ww

.sia

m.o

rg/jo

urna

ls/o

jsa.

php

Page 7: Bounds on Multiprocessing Timing Anomaliespeople.irisa.fr/Sophie.Pinchinat/AA/Graham1969SIAM.pdf · 2016. 11. 14. · activity ofeach Pi is conveniently represented bya timing diagram

422 R.L. GRAHAM

Proof. Assume there exist a set of tasks T { T1, T} and #" T (0, )which contradict (7). Let i denote #(T) and let us renumber the T so that

(8)

The theorem clearly holds for n 1; hence, we can assume that n >__ 2 and r isminimal.

First note that, by the definition of ogL, the order in which the tasks areexecuted corresponds precisely to using the priority list L (T1, T2,..., T).Suppose in the corresponding timing diagram DL, Tm is a task with the latestfinishing time (which must be coL), where m < r. If we consider the truncated setT’ { TI, Tin} with the list L’ (T, Tin), we see that the execution timeo’ for T’ using L’ is exactly ooL. On the other hand, for the optimal value o9)for T’, it is true that o9 < O9o, where 09o denotes the optimal time for the set T.Hence

09’ % 4

o) Oo 3 3n

and the set T’ forms a smaller counterexample to the theorem. This contradictsthe minimality assumption on r. Thus, we can assume that T is the only task whichfinishes at time ogL.

It is immediate that

1(9) Oo >

ni=l

Also, it follows that if denotes the starting time of T, thenr-1

(10) ] i >- nr, L=r +.i=1

since no processor is idle before starts being executed. Therefore

0 0 0 HO =(n 1)

+n(Do ncoo i=

Since T contradicts (7),

(n 1)o,< +1.

(n 1), oL 41+ >-->

nOo o 3 3n

(n- 1) n- 1>

nogo 3 3n 3n

Dow

nloa

ded

11/1

3/16

to 1

28.9

3.16

2.74

. Red

istr

ibut

ion

subj

ect t

o SI

AM

lice

nse

or c

opyr

ight

; see

http

://w

ww

.sia

m.o

rg/jo

urna

ls/o

jsa.

php

Page 8: Bounds on Multiprocessing Timing Anomaliespeople.irisa.fr/Sophie.Pinchinat/AA/Graham1969SIAM.pdf · 2016. 11. 14. · activity ofeach Pi is conveniently represented bya timing diagram

MULTIPROCESSING TIMING ANOMALIES 423

and finally(.Oo(11)3

Hence, if (7) is false, in an optimal solution (which has timing diagram Do), noprocessor can execute more than two tasks.

Suppose the following configuration occurs in Do:

If we interchange i, and j, to form

then in Db the (possibly) new finishing time o9’ certainly satisfies o’ =< O9o, i.e., thissingle interchange could not have caused Oo to increase. Also, if the configuration

0 0

occurs in Do, then moving i from the line with i to the line with j cannot cause09o to increase. Let us call either of these two preceding operations a Type 1operation. By a Type 2 operation on Do we mean changing any occurrence of

to the form

Dow

nloa

ded

11/1

3/16

to 1

28.9

3.16

2.74

. Red

istr

ibut

ion

subj

ect t

o SI

AM

lice

nse

or c

opyr

ight

; see

http

://w

ww

.sia

m.o

rg/jo

urna

ls/o

jsa.

php

Page 9: Bounds on Multiprocessing Timing Anomaliespeople.irisa.fr/Sophie.Pinchinat/AA/Graham1969SIAM.pdf · 2016. 11. 14. · activity ofeach Pi is conveniently represented bya timing diagram

424 R.L. GRAHAM

Clearly, this operation does not affect COo. For any timing diagram D we definea function 5e(D) as follows" Let F denote the least time such that for every timet’ >= t, the processor P is idle in D. Then

6e(D)-- IF,- FjI.<=i<j<_n

It is not difficult to check"(a) If D’ is obtained from D by a Type operation, then 5e(D’) < 5(D);(b) if D’ is obtained from D by a Type 2 operation, then 5(D’) 5e(D).Now we start from Do and apply all possible Type 1 and Type 2 operations

until the resulting timing diagram D* has no internal configurations to whicheither type of operation may be applied. That such a D* exists follows from thefacts that there are only a finite number of possible arrangements of the r tasks onn lines, between two Type operations only a finite number of Type 2 operationsmay be performed, and only a finite number ofType operations may be performedbecause of (a). Hence, in D* it follows that for any configuration of the form

(Z (X

Zj

k

we have"

(12) > , implies , (xi, 0 : 0k and (Xi > i’’

Thus, by a suitable rearrangement of the lines of D* we can bring D* into the form

Ok

Ok

i2 Oi

Oit

Dow

nloa

ded

11/1

3/16

to 1

28.9

3.16

2.74

. Red

istr

ibut

ion

subj

ect t

o SI

AM

lice

nse

or c

opyr

ight

; see

http

://w

ww

.sia

m.o

rg/jo

urna

ls/o

jsa.

php

Page 10: Bounds on Multiprocessing Timing Anomaliespeople.irisa.fr/Sophie.Pinchinat/AA/Graham1969SIAM.pdf · 2016. 11. 14. · activity ofeach Pi is conveniently represented bya timing diagram

MULTIPROCESSING TIMING ANOMALIES 425

where

and, by (12),

> >k k Oks

Oil 0i2

But (12) also implies ks >- i, and 0it 0i. Hence we combine these to obtain

(13)

Since none of the operations applied to Do caused COo to increase, by the optimalityof COo the finishing time of/ must also be COo. But note that looks very much like4

the timing diagram DL obtained by using the decreasing length list (T1, ..., Tr).In fact the only way in which DL could differ from D is in the assignment of thesecond-layer tasks T/. Specifically, a difference could occur only if for some pairT/, T/ we have oiu -]- (Xi (Xim for some m < k,

Oi

In this case, in DL, T;, with length i, might be assigned to Pj instead of Pi. However,if this situation were possible, then, in ,

and it would be possible to move ti from Pi to ej; and since the finishing time is notincreased, it is still COo. This is a contradiction since this is now an optimal solutionwhich has three tasks assigned to one processor. Hence, we conclude that DLand D are isomorphic (in the obvious sense) and COo i.. But this contradictsthe hypothesis that COL/COo > 4/3 1/(3n), and the validity of the bound givenby the theorem is established.

To show that this bound is best possible, we consider the following set of tasklengths"

(01, 02, "’’, r) (2n 1,2n 1,2n 2,2n 2, ..., n + l,n + 1, n,n,n),

’ Up to renaming tasks of equal length.

Dow

nloa

ded

11/1

3/16

to 1

28.9

3.16

2.74

. Red

istr

ibut

ion

subj

ect t

o SI

AM

lice

nse

or c

opyr

ight

; see

http

://w

ww

.sia

m.o

rg/jo

urna

ls/o

jsa.

php

Page 11: Bounds on Multiprocessing Timing Anomaliespeople.irisa.fr/Sophie.Pinchinat/AA/Graham1969SIAM.pdf · 2016. 11. 14. · activity ofeach Pi is conveniently represented bya timing diagram

426 R.L. GRAHAM

where r 2n + 1. Specifically we have

k-- 1,-..,2n,

In this case

P1 0(1 0(2n

0(2

0(3

0(n+ 2

P0(n 0(n +

[,

0(2n +

and 0(2n+1 /7.

COL 4n 1,

and

P1 0(1

0(2

0(2n 0(2n +

Oo 3n,

and therefore

OL 4 1

COo 3 3n

It will be admitted that the bound 4/3 1/(3n) is not the first expression whichcomes to mind if one were to make an a priori guess at the answer. In the followingsection, however, it will be seen that this as well as the earlier bound of 2 1/n areboth simple special cases of a more general result.

6. A general algorithm. Following a suggestion ofD. Kleitman and D. Knuth,we were led to consider the following algorithm (still for the case in which -< isempty): For an integer k _>_ 0, choose the k longest tasks of the set of tasksT-- {T1, ..., T} and arrange them in a list L which gives the optimal solution

Dow

nloa

ded

11/1

3/16

to 1

28.9

3.16

2.74

. Red

istr

ibut

ion

subj

ect t

o SI

AM

lice

nse

or c

opyr

ight

; see

http

://w

ww

.sia

m.o

rg/jo

urna

ls/o

jsa.

php

Page 12: Bounds on Multiprocessing Timing Anomaliespeople.irisa.fr/Sophie.Pinchinat/AA/Graham1969SIAM.pdf · 2016. 11. 14. · activity ofeach Pi is conveniently represented bya timing diagram

MULTIPROCESSING TIMING ANOMALIES 427

09k for these k tasks. Now, extend L to a sequence containing all the tasks of T byadjoining the remaining r k tasks arbitrarily to L, forming the list L(k). Let09(k) denote the finishing time in the timing diagram D(k) for T using L(k). Again,let 09o denote the minimum possible finishing time for T. Our final result is thefollowing theorem (where . denotes the greatest integer function).

THEOREM 3.

co(k) 1- 1/n_<1+

090 1 + [k/n]

This bound is best possible for k =- 0 (mod n).Proof. If og(k) (_Ok, then 09(k) 090 and the theorem holds. We can therefore

assume 09(k) > 09k- We can also assume r > k. Let a* denote maxk+ lt=<r {at}where, as before, a p(T). By the definition of a* and the rules of operation of themultiprocessor system it follows that no processor can be idle before time09(k) a*. Hence,

(14)j=l

and consequently,

(1 5) 090 >- o >= 09(k)nj=

There are at least k + tasks T which have length _>_ *. Hence, some processormust execute at least 1 + [k/n] of these "long" tasks. This implies

(16) 090 1+

By (15) and (1 6) we finally obtain

(17) 09(k) < 090 + * < 09ol 1 +n 1 + [k/n]

which is the bound stated in the theorem.To show that this bound is best possible when k _= 0 (mod n) we present the

following example" Define i for 1 __< <_ k + 1 + n(n 1) by

{for l__<i<__k+ 1,i=

fork+2_<_i<=k+ + n(n- 1).

For this set of tasks and the list L(k) (T1, ..., Tk, Tk+ 2, "’", Tk+ 1+,,0,- 1), Tk+ 1)we have 09(k) k + 2n 1. Since 090 k + n,

09(k) k + 2n- 1 n- 1 1 1In 1 1In=1+--=1+ =1+090 k+ n k + n 1 +kin 1 + [k/n]

This completes the proof of the theorem.

Dow

nloa

ded

11/1

3/16

to 1

28.9

3.16

2.74

. Red

istr

ibut

ion

subj

ect t

o SI

AM

lice

nse

or c

opyr

ight

; see

http

://w

ww

.sia

m.o

rg/jo

urna

ls/o

jsa.

php

Page 13: Bounds on Multiprocessing Timing Anomaliespeople.irisa.fr/Sophie.Pinchinat/AA/Graham1969SIAM.pdf · 2016. 11. 14. · activity ofeach Pi is conveniently represented bya timing diagram

428 R.L. GRAHAM

We have already seen several special cases of Theorem 3. For k 0, we have

(0) 1_<2

090

which is also implied by Theorem 1 for n n’. Theorem 3 implies

1/n 409L_< 09(2n)_ _< +090- 090 1 + [2n/n] 3 3n

which is also the bound of Theorem 2.There is an obvious algorithm for achieving the optimal solution for the n

largest tasks namely, just assign one task to each processor. If the remaining r ntasks are chosen arbitrarily, then by Theorem 3 we conclude

co(n) -< 3

COo 2 2n

It would be interesting to know other simple algorithms which are optimal forthe cases r 3n, 4n, etc.

The problem we have been considering is equivalent to the following" Weare given a set (with possible repetition) of positive real numbers A {01, 2, "’",

0r}. For each partition rt of A into n subsets A1,..., An, let m(rt) denote maxi

’otAi and let mo denote min m(rc). For a given e > 0, we wish to "efficiently"determine a partition rt 7r()such that

mo-< +e.

We must keep in mind, of course, that we can always find a partition rc such thatm(zc) =mo just by a finite enumeration of all partitions of A. This algorithm,however, requires an "exponential" amount of work (in terms of the number oftasks r). On the other hand, by simply ordering the 0i into a nonincreasing sequence,which can be done in roughly r log2 r comparisons, we can form a partition rc forwhich m(rc)/mo < 1 + (by Theorem 2).

Clearly a basic problem in this area is to make the preceding concept of"amount of work" precise and to develop strong upper and lower bounds on thework needed to achieve near optimal solutions. For example, suppose we restrictourselves to the two operations of addition and comparison and assume n 2.It is probably true that there exists a constant C > 1 such that any algorithm whichdetermines an optimal partition rt (i.e., such that m(rt) too) for any finite set oftasks Tmust require at least CITI operations. However, this is not known at present.

REFERENCES

[1] E. L. CODD, Multiprogram scheduling, Comm. ACM, 3 (1960), pp. 347-350.[2] R. L. GRAHAM, Bounds for certain multiprocessing anomalies, Bell System Tech. J., 45 (1966),

pp. 1563-1581.[3] J. HELLER, Sequencing aspects ofmultiprogramming, J. Assoc. Comput. Math., 8 (1961), pp. 426-

439.

Dow

nloa

ded

11/1

3/16

to 1

28.9

3.16

2.74

. Red

istr

ibut

ion

subj

ect t

o SI

AM

lice

nse

or c

opyr

ight

; see

http

://w

ww

.sia

m.o

rg/jo

urna

ls/o

jsa.

php

Page 14: Bounds on Multiprocessing Timing Anomaliespeople.irisa.fr/Sophie.Pinchinat/AA/Graham1969SIAM.pdf · 2016. 11. 14. · activity ofeach Pi is conveniently represented bya timing diagram

MULTIPROCESSING TIMING ANOMALIES 429

[4] J. L. KELLEY, General Topology, Van Nostrand, Princeton, 1955.[5] B. LIEBESMAN, The use ofa special algebra in schedule analysis, to appear.[6] G. K. MANACHER, Production and stabilization of real-time task schedules, J. Assoc. Comput.

Mach., 14 (1967), pp. 439-465.[7] B. P. OCHSNER, Controlling a multiprocessor system, Record 44, Bell Laboratories, 1966, pp.

59-62.[8] P. RICHARDS, Parallelprogramming, Rep. TD-B60-27, Technical Operations Inc., 1960.[9] M. ROTHKOPF, Scheduling independent tasks on one or more processors, Interim Tech. Rep. 2,

Operations Research Center, M.I.T., Cambridge, 1964.

Dow

nloa

ded

11/1

3/16

to 1

28.9

3.16

2.74

. Red

istr

ibut

ion

subj

ect t

o SI

AM

lice

nse

or c

opyr

ight

; see

http

://w

ww

.sia

m.o

rg/jo

urna

ls/o

jsa.

php