
Transcript of Scheduling Techniques for Operating Systems


Scheduling Techniques for Operating Systems

R. B. Bunt, University of Saskatchewan

I notice so many people slipping away
And many more waiting in the lines

Paul Simon, "Congratulations," © 1971 Charing Cross Music

Introduction

One of the primary functions of an operating system is to distribute the resources under its control among the users of the system in such a way as to achieve installation standards of performance (including service). One of the most important resources in a computer system is, undeniably, the processor. For example, all system activities require time on at least one processor. So it is hardly surprising that processor scheduling has received considerable attention since the very early days of computing, and that many techniques for accomplishing this essential task have been developed. These have been variously simulated, analyzed mathematically, and, occasionally, implemented in actual systems. This paper looks at some techniques for scheduling processors and compares their implementations in a number of familiar operating systems.

A general model of processor scheduling

According to one definition, given by Hellerman and Conroy [1], an operating system scheduler "is an algorithm that uniquely specifies which job is to receive next service by a resource."

Schedulers are usually described in isolation (if at all), and consequently it is sometimes difficult to see how they are related to one another. In this section, to provide a common framework for the description of actual implementations, a general model will be presented. For simplicity, this discussion is based on a single-processor system; extensions to multiprocessor systems are straightforward.

For the sake of convenience, the overall scheduling function is often divided into distinct levels, as shown in Figure 1. Both the operating constraints and the performance objectives change with the levels. Although the terminology may differ from author to author, the basic division remains much the same. At the lowest level, decisions are made concerning the allocation of physical resources such as CPU cycles to processes* in the system. This level of scheduling will be referred to as process management. Since actual physical resources of the system are being managed at this level, the performance objectives should be given in terms of measures of resource utilization and efficiency. In effect, the process manager takes a real processor and through its scheduling presents the illusion of (or simulates) a number of independent virtual processors. Higher-level decisions, concerning the allocation of these virtual processors, are made at the level of job management. At this level it is assumed that a certain number of these virtual processors exists (the maximum allowed level of multiprogramming).

Figure 1. The division of the scheduling function. [Figure shows user jobs handled by the job manager, which allocates virtual processors; the process manager allocates the physical processor.]

*It is assumed that the reader is familiar with the concept of process or task. The literature abounds with definitions (see, for example, Dijkstra [2] and Horning and Randell [3]). For the purpose of this paper it suffices to view a process simply as a program in execution.

COMPUTER



The job manager sees a set of user-submitted jobs competing for these virtual processors and allocates them according to some predetermined policy. The process manager sees a community of sequential processes, each executing on its own virtual processor, requesting actual physical resources such as CPU cycles. Whereas the process manager is primarily concerned with measures of resource utilization, the job manager, since it deals directly with users of the system, ought to have its performance assessed in terms of measures of service to the users (such as turnaround time).

It is possible now to chart the history of a user request under this model. A user requests some action of the system by submitting a job to the system. The job manager allocates a virtual processor to this job if one is free, and a corresponding process (or possibly more than one) is created to perform the requested action. If there are no free virtual processors, the job is queued until one becomes available. At this level it is assumed that the virtual processor will actually execute the process. The process manager, however, must allocate to the process whatever resources it may need (including CPU time) for the action to be performed.

Allocation strategies are required for each level. Again, it is important to keep in mind both the operating constraints and the performance objectives at each level. Decisions by the process manager must be made very quickly and generally last for only a brief period. Strategies involving complicated queuing methods, complex data structures, or a lengthy analysis of process characteristics may consume unwarranted quantities of the very resource being scheduled. At the job level more effort can be (and ought to be) expended on making decisions resulting in the best service to the users of the system. In terms of performance objectives, evaluation of the job manager should focus on job-oriented measures such as turnaround or throughput. Hellerman and Conroy [1] refer to these as "job performance measures," normal indicators of the level of service afforded users of the system. On the other hand, the process manager is most appropriately evaluated in terms of system-oriented measures, such as CPU utilization. Hellerman and Conroy refer to these as "resource performance measures." In the remaining portions of this section some classical allocation strategies used at both the process and job levels will be described. (A very good presentation along the same lines is given by Muntz [4].)

Strategies for process management. Process managers appear in the literature under a number of aliases. The process manager is known variously as the "dispatcher" [5], the "traffic controller" [6], the "CPU scheduler" [7], the "short-term scheduler" [8], the "process scheduler" [9], or simply "the scheduler" [10].

Most strategies operate by moving processes through a series of well-defined states. For the model of the paper, a process having instructions executed by its virtual processor is said to be in the running state; it is blocked if it is waiting for some wakeup signal (from another process or an external agent) to resume execution. For example, if a running process wants to perform an input operation, it issues the appropriate input command and blocks, remaining blocked until a signal that the requested input operation has been completed permits it to resume. The process manager handles the block and wakeup instructions.

October 1976

Although all the virtual processors are logically operating in parallel, the real processor (here, the CPU) can in fact be executing only one instruction at a time. Thus, while several processes may be running, only one process at a time can actually be executing. Therefore, from the point of view of the actual processor, three states are possible for each process. Processes that are neither executing nor blocked are said to be ready. If the executing process becomes blocked, the process manager decides which of the ready processes is to run next. The movement through states is summarized in Figure 2.
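This three-state model can be sketched as a small state machine. The `State` names and the `move` helper below are illustrative conveniences, not drawn from any particular system.

```python
from enum import Enum

class State(Enum):
    READY = "ready"
    RUNNING = "running"
    BLOCKED = "blocked"

# Legal transitions in the three-state model: a ready process is
# dispatched to run; a running process blocks (e.g., on I/O) or is
# preempted back to ready; a blocked process is awakened to ready.
TRANSITIONS = {
    (State.READY, State.RUNNING),    # dispatch
    (State.RUNNING, State.BLOCKED),  # block, awaiting a wakeup signal
    (State.RUNNING, State.READY),    # preemption / quantum expiry
    (State.BLOCKED, State.READY),    # wakeup (e.g., I/O completion)
}

def move(current, new):
    """Apply a transition, rejecting moves the model does not allow."""
    if (current, new) not in TRANSITIONS:
        raise ValueError(f"illegal transition {current} -> {new}")
    return new

s = State.READY
s = move(s, State.RUNNING)   # dispatched by the process manager
s = move(s, State.BLOCKED)   # issues an I/O request and blocks
s = move(s, State.READY)     # wakeup: the I/O operation completed
```

Note that a blocked process cannot move directly to running; it must first become ready and be selected by the process manager.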

Figure 2. A two-level scheduling model. [Figure labels: JOB MANAGEMENT, PROCESS MANAGEMENT.]

The running processes are often ordered in priority according to some criterion, and each time the executing process blocks, the process manager passes control to the highest-priority ready process. Priority may be purchased by the user, deserved by his status or rank, or earned by way of the executing process displaying certain desirable characteristics [11]. For example, in some systems the I/O activity of the running processes is monitored dynamically and higher priority is given to those processes that seem to be doing a lot of I/O. An I/O-bound process will frequently release control of the CPU for its I/O operations. The time necessary to complete the requested operation can then be overlapped with processing of a lower-priority process. This technique improves system throughput and at the same time alleviates a possible system bottleneck by keeping I/O devices busy. This scheme can be elaborated to include other characteristics of running processes as well.

A second method of distributing the CPU is through time slicing. Rather than allow processes to run until they block themselves, each ready process receives in turn some small amount of CPU time known as a time slice (or quantum), at the end of which it is interrupted by a signal from an interval timer (or clocking process). The time slice may be fixed or varied, and the sequence in which ready processes are allocated time slices (as well as the time slice duration) may be determined in a variety of ways. For example, the ready processes may be allocated fixed-length time slices in a cyclical or round robin fashion.
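The cyclical fixed-quantum variant just described can be sketched as follows. The process names and CPU demands are hypothetical, time is in abstract units, and switching overhead is ignored.

```python
from collections import deque

def round_robin(bursts, quantum):
    """Simulate round robin scheduling with a fixed time slice.

    bursts: dict mapping process name -> total CPU demand.
    Returns a list of (name, completion_time) pairs in finish order.
    """
    ready = deque(bursts)          # cyclical ready queue
    remaining = dict(bursts)
    clock = 0
    done = []
    while ready:
        name = ready.popleft()
        run = min(quantum, remaining[name])  # one time slice
        clock += run
        remaining[name] -= run
        if remaining[name] == 0:
            done.append((name, clock))
        else:
            ready.append(name)     # preempted: back to the end of the queue
    return done

# Three processes with demands of 3, 1, and 4 units, quantum of 2 units.
print(round_robin({"A": 3, "B": 1, "C": 4}, quantum=2))
# -> [('B', 3), ('A', 6), ('C', 8)]
```

The short process B finishes within its first slice, illustrating why the strategy amounts to sampling the jobs for those that can complete with little additional time.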

In theoretical or simulation studies, true parallel execution is often assumed. A method known as processor sharing serves all the ready processes simultaneously at a rate inversely proportional to their current number. Thus a process that would run at a rate r if it were the only one being processed runs at a rate r/n if there are n ready processes. This assumption makes possible the parallel advancement of jobs without requiring the complication introduced by the mechanics of process switching. Although this strategy cannot be implemented in conventional computers (and therefore will be considered no further in this paper), it is often useful as a yardstick against which other strategies can be compared in theoretical or simulation studies. A good discussion of processor sharing is given by Coffman and Kleinrock [12].
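For a batch of jobs all present at time 0, processor-sharing completion times follow directly from the r/n rule: the n jobs advance together until the shortest finishes, then n-1 share the processor, and so on. The sketch below assumes a unit-rate processor and is for illustration only.

```python
def processor_sharing_completions(demands):
    """Completion times under processor sharing for jobs all present at
    time 0: with n jobs active, each advances at rate 1/n of the CPU."""
    times = {}
    clock = 0.0
    done_work = 0.0          # service already received by every active job
    active = len(demands)
    for name, d in sorted(demands.items(), key=lambda kv: kv[1]):
        # Finishing the next-shortest job takes (d - done_work) * active
        # time units, since the processor is split `active` ways.
        clock += (d - done_work) * active
        done_work = d
        times[name] = clock
        active -= 1
    return times

# Two jobs of 1 and 2 units: each runs at rate 1/2 until the short one
# finishes at t=2; the long job then gets the whole CPU and ends at t=3.
print(processor_sharing_completions({"short": 1.0, "long": 2.0}))
```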

Strategies for job management. At the level of job management, the real processor has been replaced logically by some number of virtual processors (as determined by the level of multiprogramming). In a standard batch processing environment, users submit jobs to the computer system in the form of programs. The number of virtual processors is usually considerably smaller than the number of jobs competing for them. If there are no available virtual processors, the jobs are queued by the job manager according to some priority. When a virtual processor becomes available, the job manager allocates it to the waiting job with the highest priority.

The decision algorithm used by the job manager, commonly referred to as the "scheduling algorithm," enforces a sequencing discipline on the jobs waiting for virtual processors and determines the order in which they will be allowed access to the virtual processors. Although in large part a political decision, the choice of scheduling discipline is a significant factor in the performance of the system and should be made according to the needs of the particular system. For example, in a system devoted to interactive use, emphasis may be placed on minimizing worst-case response to short terminal requests. In other systems average turnaround might be considered most important. These two systems would require different scheduling strategies at the job level.

The simplest scheduling discipline is "first-come-first-served" (FCFS), also known as "first-in-first-out" (FIFO).* All jobs are assumed to be equally preferred, and thus are serviced to completion in the order that they arrive. Although it has been said [13] that "an inherent sense of fair play has elevated [the FCFS rule] to an eminence out of all proportion to its basic virtue," it can be quite adequate in certain situations, and is often used as a basis of comparison for other disciplines. Little is required in the way of system overhead for queue management, but the system performance can be very erratic, particularly under heavy loading.

An important class of rules selects certain users as preferred and gives them better service (possibly at increased cost). A valuable principle to be borne in mind is known as Kleinrock's conservation law [14], which states, informally, that for given arrival and service patterns, a particular weighted sum of average waiting times for all jobs is invariant to the scheduling discipline used. This says that scheduling can only improve the service afforded some jobs at the expense of that given others. Preferential scheduling algorithms differ in their choice of users to be given preferential treatment.

A common preferential discipline is "shortest first."
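The tradeoff the conservation law implies (better service for some jobs only at the expense of others) can be seen in a small example. This sketch compares per-job waiting times for three hypothetical jobs, all present at time 0, served in arrival order versus shortest-first; it illustrates the tradeoff, not the law's weighted sum itself.

```python
def waits(order):
    """Per-job waiting time when jobs run to completion in the given
    order (all jobs assumed present at time 0, no preemption)."""
    w, clock = {}, 0
    for name, demand in order:
        w[name] = clock       # time spent waiting before service begins
        clock += demand
    return w

jobs = [("long", 10), ("mid", 4), ("short", 1)]

fcfs = waits(jobs)                              # arrival order
spt = waits(sorted(jobs, key=lambda j: j[1]))   # shortest first

# Shortest-first helps the short jobs only by delaying the long one.
print(fcfs)   # {'long': 0, 'mid': 10, 'short': 14}
print(spt)    # {'short': 0, 'mid': 1, 'long': 5}
```

Mean waiting time drops from 8 to 2 units under shortest-first, but the long job's wait grows from 0 to 5: the improvement is a redistribution, not a free gain.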

This rule requires a priori knowledge of each job's running time and bases the job's priority on this information. Each time a job is completed, its virtual processor is allocated to the waiting job having the smallest processing requirement. A distinction may be made [13] between the case in which the processing requirements are known exactly [called "shortest processing time" (SPT) or "shortest job first" (SJF)] and the case wherein they are estimated in some fashion [such as "shortest expected processing time" (SEPT)]. This is not a difficult rule to implement but requires more overhead for queue management than is required by FCFS. It gives much better service to jobs with small processing requirements, but does so by giving poorer service to the longer jobs. Since users with short jobs would be quite unhappy with delays that might be tolerated by users with much longer jobs, this seems a reasonable approach. In job mixes distributed in such a way that there are many more short jobs than long jobs (which is often the case in the normal operating environment**), this rule makes many people happy at the expense of relatively few. Of all the rules not using preemption, this rule yields the smallest mean turnaround time [13], provided that accurate a priori knowledge is available.

In both FCFS and SPT, a job once scheduled (or allocated a virtual processor) is served until it is completed. The introduction of a preemption mechanism leads to interesting variations of nonpreemptive rules. Preemption involves interrupting the job currently executing on a particular virtual processor, recording the current state of its execution, perhaps rolling the job out to secondary storage, and allocating the virtual processor to a new job. A certain amount of processor time (called "preemption overhead" or cost) is consumed by this operation.*** Normally, preempted jobs are returned to the same queues in which the arriving jobs are held. When a preempted job again comes up for service, execution resumes at the point of interruption; consequently, this technique is known as resume preemption.

The incorporation of preemption in the SPT rule, yielding "shortest remaining processing time" (SRPT) or "preemptive shortest job first" (PSJF), results in still sharper service discrimination between short jobs and long jobs with the added cost of some preemption overhead. SRPT is simply the natural extension to SPT, applying the "shortest first" rule at every arrival as well as every completion. If the new arrival has a smaller processing requirement than that remaining for the job currently in service, the job being serviced is preempted and replaced by the new arrival. It can be shown [17] that SRPT scheduling yields the smallest average turnaround time (but with the highest variance) when arrivals occur intermittently. However, the cost of preemption in some systems may negate the advantage of SRPT over SPT.

Both SPT and SRPT require the scheduler to have exact a priori knowledge of each job's processing requirements. Such information is not generally available in most systems. In some systems, estimates provided by the users themselves form the basis of the scheduling decisions. Although experienced users can, with some practice, learn to estimate fairly closely, in many cases for a variety of reasons (such as rapidly changing environments, novice users, weak penalties for bad estimates) the overall accuracy of estimates may be questionable.
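A minimal event-driven sketch of SRPT follows, under the assumption that preemption itself costs nothing; the job names, arrival times, and demands are hypothetical.

```python
import heapq

def srpt(jobs):
    """Shortest-remaining-processing-time (SRPT) schedule.

    jobs: list of (name, arrival_time, demand) tuples. The running job is
    preempted whenever a newly arrived job has a smaller remaining demand.
    Returns a dict of completion times. Preemption overhead is ignored.
    """
    jobs = sorted(jobs, key=lambda j: j[1])   # by arrival time
    ready = []                                # heap of (remaining, name)
    clock, i, done = 0, 0, {}
    while i < len(jobs) or ready:
        if not ready:                         # idle until the next arrival
            clock = max(clock, jobs[i][1])
        while i < len(jobs) and jobs[i][1] <= clock:
            heapq.heappush(ready, (jobs[i][2], jobs[i][0]))
            i += 1
        rem, name = heapq.heappop(ready)      # smallest remaining demand
        # Run until this job finishes or the next arrival, whichever first.
        horizon = jobs[i][1] if i < len(jobs) else clock + rem
        run = min(rem, horizon - clock)
        clock += run
        if run == rem:
            done[name] = clock
        else:
            heapq.heappush(ready, (rem - run, name))  # preemption point
    return done

# A long job arrives first; a short job arriving at t=1 preempts it.
print(srpt([("long", 0, 5), ("short", 1, 2)]))
# -> {'short': 3, 'long': 7}
```

The short arrival finishes at t=3 instead of waiting for the long job, which completes at t=7 either way, mirroring the sharper discrimination described above.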

*In a multiprogramming system, jobs may not complete in the order they arrive because of the interleaved execution enforced by the process manager. Thus "first-in" does not necessarily imply "first-out."

**In one study of a CP-67 system [15] it was observed that 85% of the jobs comprised only 7% of the demand for CPU. Similar findings have been reported elsewhere.

***The cost of preemption varies from system to system depending on the amount of work involved, the number of programs in the system, and the proportion of time that can be overlapped [16].


A final class of rules requires no a priori knowledge of processing requirements for scheduling decisions. In the round robin discipline shown in Figure 3, each job, in turn, is allocated a quantum of uninterrupted service. If it does not complete within the quantum, it is preempted and returned to the waiting queues, and a new job is started. The strategy is basically one of sampling the jobs in turn to see which of them can complete with a small amount of additional time.

Figure 3. The round robin scheduling strategy. [Figure shows arrivals entering a system of waiting queues, a processor serving jobs to completion, and preempted jobs returning to the queues.]

Some form of round robin scheduling is used in many interactive systems, and is appropriate since most of the jobs (or requests for service) are short (perhaps requiring only a single quantum) and fast response is essential.

The size of the quantum is a design parameter and a critical factor in the performance of the algorithm. The expected time any given job might wait to receive its next quantum of service is proportional to the number of active jobs n, the quantum size q, and the overhead due to switching jobs s (assume s << q). This overhead comprises the cost required for preempting the job currently running and starting a new one, and is clearly associated with the sampling operation. As n increases, the expected response time must also increase if q and s remain constant. A variable quantum size can allow the algorithm to be more responsive to the current load. For example, under light loads n is small and adequate response can be achieved with a large quantum; therefore, the amount of switching done (and thus the total sampling overhead) can be reduced by choosing a fairly large quantum size. As the load increases, response degrades; once it becomes unacceptable the quantum size should be reduced. If q becomes too small, however, the s term will become significant and the cost of repeated switching will limit the achievable response.

An important variation of the round robin rule is the multilevel feedback rule (FB, sometimes called "multilevel foreground/background") shown in Figure 4. An ordering is imposed on the system of waiting queues corresponding to the number of service periods that the jobs in the queue have already had. Thus QUEUE0 contains new arrivals, QUEUE1 contains jobs that have been preempted once, and so on. Each of the queues is ordered FCFS. After the quantum of a job being serviced is exhausted, the first job in the lowest-numbered nonempty queue is selected for service, and the preempted job is returned to a queue one level higher than the queue from which it was chosen. If there are at most N waiting queues, the rule is sometimes known as FBN. The FB rule results in sharper short-job discrimination than the round robin rule by ensuring that long jobs do not interfere. The result of the movement to lower-priority queues is an implicit ordering of the jobs by length of running time (within the limits of accuracy of the quantum size). Thus the effect is similar to that of a "shortest first" rule, but is achieved without any advance knowledge of running times. Like round robin, however, there is associated sampling overhead.

Figure 4. The FBN scheduling strategy.
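The FB rule can be sketched as follows, assuming all jobs are present at time 0 and ignoring preemption overhead; the names and demands are hypothetical.

```python
from collections import deque

def fb(jobs, quantum, levels):
    """Multilevel feedback (FB) scheduling sketch.

    QUEUE0 holds new arrivals; each preemption demotes a job one level
    (down to levels-1 at most). Each queue is FCFS, and service always
    goes to the first job in the lowest-numbered nonempty queue.
    Returns a dict of completion times.
    """
    queues = [deque() for _ in range(levels)]
    for name, demand in jobs:
        queues[0].append((name, demand))
    clock, done = 0, {}
    while any(queues):
        level = next(q for q in range(levels) if queues[q])  # lowest nonempty
        name, rem = queues[level].popleft()
        run = min(quantum, rem)
        clock += run
        if run == rem:
            done[name] = clock
        else:
            demoted = min(level + 1, levels - 1)   # one level lower priority
            queues[demoted].append((name, rem - run))
    return done

# Quantum 1: the 1-unit job finishes in its first slice; the longer jobs
# sink to lower-priority queues, approximating "shortest first".
print(fb([("long", 3), ("short", 1), ("mid", 2)], quantum=1, levels=3))
# -> {'short': 2, 'mid': 5, 'long': 6}
```

The completion order (short, mid, long) emerges without any advance knowledge of running times, exactly the implicit ordering described above.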

In this section, a number of classical scheduling methods have been described. These can be classified in a variety of ways. First, we have non-preemptive rules, such as FCFS and SPT, versus preemptive rules, such as SRPT, round robin, and FB. The preemption mechanism enhances the discriminatory capability of the scheduling rule, at a cost of a certain amount of system overhead required for the preemption. If the cost of preemption can be kept reasonably small, preemptive rules should outperform non-preemptive rules [7].

Scheduling rules could also be classified according to the information they require about the jobs a priori. Rules such as FCFS, round robin, and FB require no information, whereas rules such as SPT and SRPT require exact information. As expected, schedulers perform better if they can take advantage of information about the nature of the jobs they are scheduling, but in many systems exact knowledge is not available. Rules such as SEPT ("shortest expected processing time") rely on estimates.

Most operating system schedulers resemble one (or may combine several) of the classical rules. These are normally altered to meet certain special requirements of the particular system. In the remainder of this paper, several important operating systems will be described. In the descriptions, an attempt will be made to separate the scheduling component from other aspects of resource allocation, an approach that is neither easy to do nor advisable in practice. This approach has been taken in the interests of space and, hopefully, clarity of description. Apologies are extended in advance to those who might rightfully object to this simplification.

IBM System/360 and System/370 Operating Systems

IBM provides a family of operating systems for its System/360 and System/370 lines of computers. More complete descriptions of these systems, the services they provide, and details of their design are found in the literature [8, 9]. In this section the techniques for both job management and process management in some of the more common systems will be described.

The terminology used by IBM differs in some respects from both that used elsewhere and that used earlier in this paper. Here we will be as uniform as possible, and that means some liberties will have to be taken with actual system terminology. In IBM systems, a job is actually submitted by a user as a collection of job steps. For purposes of simplification each job step will be referred to simply as a job in this discussion. Similarly, the term "process" will be used in place of "task," and "virtual processor" in place of "initiator," "main storage," or "region."

The IBM operating systems operate essentially on a job classification basis. The user is allowed considerable opportunity to make input to the classification decision. In some of the more sophisticated systems, scheduling parameters are modified dynamically, with the user information giving starting values.

OS/MFT. The operating system OS/MFT (multiprogramming with a fixed number of tasks) is the simplest of the systems offering a multiprogramming capability. Essentially, a fixed number of virtual processors (no more than 15, typically less) are made available for users of the system. Associated with each is a fixed amount of main memory. The virtual processors are numbered P0, P1, ..., PN, with the index used to determine the dispatching (or process management) priority. The user classifies his job according to the nature of its resource demands according to the class definitions established by the installation. For example, class A might indicate a job that is I/O-bound, class B a job that is CPU-bound, class C a short express run, class D a job requiring tape and/or disk mounts, and so forth. A system queue is established for each class. Since several jobs will often belong to the same class, a scheme is needed to break ties. First, the user is allowed to specify a priority for his job. If this fails to produce a unique candidate, the jobs are selected in the order they entered the queue (i.e., FCFS within priority and class).

The system operator assigns up to three of the possible job classes to each virtual processor. The order the classes are assigned indicates the job scheduling priority for that particular virtual processor. That is, the first class assigned to a virtual processor has highest priority for the use of that processor, the second class (if there is one) has second priority, and so on. A job may not be scheduled on a virtual processor unless it is from a class assigned to that virtual processor.

In the example illustrated in Figure 5, virtual processor P0 has been assigned, in order of decreasing priority, classes C, A, and B. This means that first call on this virtual processor goes to class C jobs, second call goes to class A jobs, and in the event that both class C and A queues are empty, a class B job is scheduled. Class D jobs are not eligible to be scheduled on P0. Classes have been assigned to the remaining virtual processors in a similar manner.

The process management technique used in OS/MFT is known as "highest static priority first" (with preemption). Associated with each virtual processor is a fixed priority, with processes running on P0 having the highest priority and processes running on PN the lowest. The highest-priority ready process is scheduled and executes until one of two events occurs:

(1) the process blocks, for example, on a request for an I/O operation, or


Figure 5. Scheduling from classes in OS/MFT. [Figure shows job queues for classes A through D feeding virtual processors; the assigned class lists shown include C-A-B, A-C-B, and D-B.]

(2) a process with a higher priority (i.e., running on a higher-priority virtual processor) becomes ready, for example, by completing an I/O operation (signaled by an interrupt).
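The job-selection side of this class scheme can be sketched as below. The queue representation and tuple layout are assumptions of the sketch, not MFT data structures.

```python
def select_job(assigned_classes, class_queues):
    """Pick the next job for a free virtual processor, OS/MFT style.

    assigned_classes: class letters in decreasing priority, e.g. "CAB".
    class_queues: dict mapping class letter to a list of
    (priority, seq, name) tuples, where seq is arrival order,
    used FCFS to break ties in user-specified priority.
    """
    for cls in assigned_classes:         # first call, second call, ...
        queue = class_queues.get(cls)
        if queue:
            # Highest user priority first; FCFS within equal priority.
            job = min(queue, key=lambda j: (-j[0], j[1]))
            queue.remove(job)
            return job[2]
    return None                          # no eligible job for this processor

queues = {
    "A": [(4, 0, "a1"), (4, 1, "a2")],
    "C": [(2, 2, "c1")],
    "D": [(9, 3, "d1")],
}
# P0 assigned classes C, A, B: class C has first call, so c1 is chosen
# even though d1 carries a higher user priority (class D is not assigned).
print(select_job("CAB", queues))
# -> c1
```

With the class C queue emptied, a second call for the same processor would fall through to class A and pick a1 (the earlier arrival of the two equal-priority class A jobs).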

The assignment of classes to virtual processors has a significant effect on performance as a result of the actions of the process manager, illustrating the effect of interactions between the two levels of scheduling. For example, if CPU-bound processes are assigned to high-priority virtual processors they will seldom relinquish control of the real CPU, and I/O-bound processes with lower priority will have little chance to execute. As a result, system throughput will be low. Note also that the user has only indirect control over the attention his job receives at the process management (or dispatching) level. The definition of the classes and the assignment of priorities to these classes (through their assignment to certain virtual processors) is out of his hands. He is allowed to specify a job management priority that affects the service he receives relative to other jobs of the same class. It is important for the performance of the entire system for the installation to ensure the users do not attempt to misrepresent their jobs.

OS/MVT. OS/MVT (multiprogramming with a variable number of tasks) is similar in many ways to OS/MFT. The job class and priority within class are once again determined at the time of job submission. The assignment of classes to virtual processors and the selection of jobs are done as in MFT. Because the amount of main memory associated with a virtual processor in MVT can vary, further processing of a scheduled job may have to be postponed until the requested amount of memory can be made available. This additional complication is described by Hellerman and Conroy [1] and will not be dealt with here.

Process management techniques in MVT differ from those of MFT. Once again the "highest-static-priority-first" rule is used, but in MVT the virtual processors have no inherent priority; the process management priority is taken from the priority specified by the user on submission of his job (explicitly or by default), which now serves two purposes:

(1) for breaking ties within classes at the level of job management, and

(2) as the process priority at the level of process management.

This now allows the user to assign high process management priority to I/O-bound jobs and low priority to CPU-bound jobs to achieve the high resource usage and improved throughput described earlier. Of course, this presumes that the user knows his job characteristics, and further, that they remain constant throughout the job's execution. Neither of these may be the case. Once again, care must be taken to prevent abuse of this system.

It is possible for an MVT (or MFT) installation to employ the time slice option as well as the "highest-static-priority-first" rule at the level of process management. Under this option, all processes at a certain installation-specified priority are scheduled in a round robin fashion as described earlier. Processes with priorities above or below this value are handled in the normal manner. When the priority of the time slice group becomes the highest among all the ready processes, each ready process of the time slice group receives a fixed quantum of CPU time in turn until interrupted by a higher-priority process or until all processes in the time slice group enter the blocked state. Conventional scheduling then takes over.

As mentioned, it is often difficult for users to make judgments on the execution characteristics of jobs, particularly when these characteristics change as the job executes (for example, jobs may alternate between periods of CPU-boundedness and I/O-boundedness) or when inferences on the characteristics of jobs other than the user's own are required. The HASP (Houston Automatic Spooling Priority) system offers an enhancement that attempts to meet these difficulties.5

HASP was originally developed as part of an enhancement to OS/360 for real-time spaceflight control for NASA's Apollo spaceflights (see the work of Johnstone18 for a description of the extensions made to OS/360 for this purpose), but it soon became a popular addition to many OS/360 installations. HASP is primarily concerned with peripheral functions, such as the collecting of the job stream and its output (following execution) on direct access devices, and the scheduling of printing and punching of this output from the direct access devices. Many of the HASP functions have been designed directly into the more recent IBM operating systems.

Among the features offered by HASP is an option called heuristic dispatching, which tries to improve resource utilization and increase system throughput by giving high priority to I/O-bound processes. This is done by monitoring process characteristics as the processes execute. Each executing process is given a quantum of CPU time. If the process uses the entire quantum, it is assumed to be CPU-bound and placed in the CPU subgroup. If the process blocks for an I/O operation during its quantum, it is assumed to be I/O-bound and placed in the I/O subgroup. The heuristic dispatcher gives higher priority to processes in the I/O subgroup and schedules the CPU subgroup only if all processes in the I/O subgroup are blocked. Processes in the I/O subgroup are allowed to preempt processes in the CPU subgroup. As processes change their characteristics during execution, HASP will change their subgroup. The effectiveness of the current quantum size in making the distinction between I/O-bound processes and CPU-bound processes is also monitored at specified long intervals (of many quanta). If the proportion of processes identified as I/O-bound is more than a proportion specified as desired by the installation, the quantum size is shortened so as to increase the number of processes identified as CPU-bound and bring the ratio down. If the observed ratio is less than desired, the quantum size is lengthened. The adjustments are made within specified upper and lower bounds. The technique of heuristic dispatching has been found to be very effective, with throughput improvements of almost 19 percent reported by Marshall.19

IBM also offers a number of operating systems capable of supporting virtual storage on the System/370 (the reader not familiar with the concept of virtual storage is referred to treatments by Doran,20 Shaw,21 or Hellerman and Conroy1). Basically they are enhanced versions of OS/MFT and OS/MVT, originally developed for the System/360. The major differences are in the area of memory management; the scheduling techniques are similar to those already described for MFT and MVT. OS/VS2 is the enhanced version of OS/MVT (OS/VS1 is the enhanced MFT).

The major differences in job management between VS2 and MVT are the support of more virtual processors (up to 63 as compared to 15) and the inclusion of techniques to reduce contention for I/O devices (see the IBM guide22 for details). The process manager incorporates a facility called "automatic priority grouping," based on HASP's heuristic dispatching. A particular priority level can be specified as an automatic priority group to which the techniques described earlier are applied. A restriction imposed is that this priority level cannot also be specified as a time-slicing group. Job management in VS2 is closely tied to memory management. For example, a "load leveler" can interrupt and temporarily halt an active job if the paging rate is assessed to be too high. Thus the number of running virtual processors is dynamically varied. A good description of the facility is given by Hellerman and Conroy.1
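The heuristic dispatching scheme can be sketched as follows. The class name, quantum values, and desired ratio below are illustrative assumptions, not figures taken from HASP; newly seen processes are optimistically assumed I/O-bound until observed.

```python
IO, CPU = "io", "cpu"

class HeuristicDispatcher:
    def __init__(self, quantum=50, q_min=10, q_max=200, desired_io_ratio=0.6):
        self.quantum = quantum            # current quantum (ms, hypothetical)
        self.q_min, self.q_max = q_min, q_max
        self.desired_io_ratio = desired_io_ratio
        self.subgroup = {}                # process id -> IO or CPU

    def record_run(self, pid, used_full_quantum):
        # A process that consumes its whole quantum is assumed CPU-bound;
        # one that blocks for I/O first is assumed I/O-bound.
        self.subgroup[pid] = CPU if used_full_quantum else IO

    def pick_next(self, ready):
        # The I/O subgroup has priority; the CPU subgroup is scheduled only
        # when no I/O-subgroup process is ready.
        io_ready = [p for p in ready if self.subgroup.get(p, IO) == IO]
        if io_ready:
            return io_ready[0]
        return ready[0] if ready else None

    def adjust_quantum(self):
        # Invoked periodically (every many quanta): compare the observed
        # fraction of I/O-bound processes with the installation's desired
        # ratio, and tighten or loosen the test within fixed bounds.
        if not self.subgroup:
            return
        ratio = sum(1 for g in self.subgroup.values() if g == IO) / len(self.subgroup)
        if ratio > self.desired_io_ratio:
            self.quantum = max(self.q_min, self.quantum // 2)   # stricter test
        elif ratio < self.desired_io_ratio:
            self.quantum = min(self.q_max, self.quantum * 2)    # looser test
```

Shortening the quantum makes it harder for a process to block before expiry, so more processes are classified CPU-bound, pushing the observed ratio toward the desired value; lengthening it has the opposite effect.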

MULTICS

The IBM operating systems described in the previous section are primarily oriented to a batch environment, although options such as TSO (timesharing option) are available. An example of an operating system designed to meet a somewhat different need is MULTICS (Multiplexed Information and Computing Service), developed jointly by MIT and General Electric. MULTICS offers both interactive and batch service with considerable emphasis placed on the concept of information sharing. Performance objectives vary in such a system, and as a result the scheduling function is handled in a different fashion. A very complete discussion of all aspects of the MULTICS system is given by Organick.23 The MULTICS virtual storage structure is discussed by Doran.20

In a system oriented toward interactive timesharing, the distinction between jobs and processes is somewhat fuzzy. "Jobs" are normally very short requests entered from some type of terminal. The request might be for an edit of some line of text, or it might be for the execution of some previously saved job, such as a compiler. Once a request is received, a "process" is created to perform the requested action. For the sake of uniformity in the presentation, the distinction between job management and process management will be retained in this description. In general, the processor time required to service a request is not known in advance. Consequently, the scheduler is usually one that assumes no knowledge.

The MULTICS job manager is a modified FBN scheduler. To provide for different service requirements, each job (or request) is assigned, on submission, a range of priority levels (l1, l2) and given the initial priority l1. The range of priorities indicates roughly the type of service the job will receive. Highly interactive jobs (such as line edits) will require very fast response and therefore will be given high priority. Longer-running interactive jobs (such as compilations) or "absentee user" jobs (such as batch jobs) are given a lower-priority range. The levels may, in fact, overlap. Corresponding to the priority levels is a set of N queues from which jobs are scheduled according to the FBN rule, with the additional complication that each job begins at the queue corresponding to its assigned l1 value, and is not allowed to drop to queues lower than its assigned l2 value. As described earlier, the FB rule implicitly determines the amount of service required by a job and relegates longer jobs to lower-priority queues. The quantum allocated in MULTICS varies with the level, doubling at each successive lower-priority level. This policy tends to reduce total sampling overhead. The number of jobs permitted to be active at any time (or the multiprogramming level) is determined dynamically from an assessment of the current memory demands of all the active jobs. This is similar to the load leveler embodied in OS/VS2.

At the process management level, control is given to the highest-priority process that is ready to run. If the process blocks, control passes to the next highest-priority process. If the quantum allocated to the job by the job manager expires, the job is deactivated and returned to the system of FB queues. The policy is designed to increase the amount of "effective work" done, or minimize resource wastage. Service considerations, such as fast response, are the province of the job manager.
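A minimal sketch of this kind of bounded feedback scheduling follows, assuming illustrative queue counts, quanta, and job fields; none of the specific values are taken from MULTICS. Each job enters at its l1 queue, receives a quantum that doubles at each lower level, and sinks no lower than its l2 bound.

```python
from collections import deque
from dataclasses import dataclass

N = 4                      # number of priority queues (0 = highest); hypothetical
BASE_QUANTUM = 1           # quantum at queue 0; doubles at each lower level

@dataclass
class Job:
    name: str
    l1: int                # initial (highest) queue
    l2: int                # lowest queue the job may sink to
    remaining: int         # total service still needed

def run(jobs):
    """Run jobs to completion, returning the sequence of quantum grants."""
    queues = [deque() for _ in range(N)]
    for j in jobs:
        queues[j.l1].append(j)                     # each job starts at its l1 queue
    trace = []
    while any(queues):
        level = next(i for i, q in enumerate(queues) if q)  # highest nonempty queue
        job = queues[level].popleft()
        quantum = BASE_QUANTUM * (2 ** level)      # quantum doubles per level
        job.remaining -= min(quantum, job.remaining)
        trace.append((job.name, level))
        if job.remaining > 0:
            # Demote one level, but never below the job's l2 bound.
            queues[min(level + 1, job.l2)].append(job)
    return trace
```

A short job admitted high finishes in one grant, while a longer job sinks through the queues, each sampling of which costs proportionally less because its quantum is larger; this is the reduced sampling overhead noted above.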

UNIX

UNIX is a general-purpose multiuser, interactive operating system developed by a group at Bell Laboratories for the Digital Equipment Corporation's PDP-11/40 and PDP-11/45 computers.24 Unlike the systems described previously, it is quite feasible to run UNIX on relatively small and inexpensive machines, yet the system still offers very effective interactive service. It was primarily designed with objectives such as simplicity, elegance, and ease of use in mind.

Because it is devoted entirely to interactive use, scheduling decisions are somewhat simpler than those made by the MULTICS system. The UNIX job manager examines the jobs in the waiting queue and selects the one that has been waiting the longest. If there is sufficient main memory available to accommodate the needs of this job, it is immediately transferred to the running state. If this is not the case, the job manager tries to find enough "easy core," that is, memory occupied by processes currently in the blocked state. If this additional search fails to meet the specified needs, then a decision is made to acquire the needed memory by deactivating the job that has been active for the longest uninterrupted period, provided it has been active for more than 2 seconds. If all efforts to acquire memory fail, the job manager itself is put to sleep until either a period of 1 second elapses, or until one of the executing jobs blocks, at which time the job manager is reinvoked and the above operation is repeated.

The objective of this job manager is to attempt to give each user a "fair crack" at the processor; that is, no user will wait excessively long for a virtual processor, and once given a virtual processor, each user has a chance to do a reasonable amount of work before deactivation. This objective is consistent with the overall response needs of an interactive system.

The UNIX process manager is very simple. The CPU is given to the highest-priority ready process, which retains control for up to 1 second. If the process should block before 1 second has elapsed, control will pass to the next process in the ready queue. This is much the same strategy as that employed in the MULTICS system.

Concluding remarks

In an attempt to provide a common framework for the description of diverse schedulers, a general model was proposed. A number of classical scheduling techniques were described using this model and their characteristics were assessed. Actual implementations of these techniques often compromise the classical definitions to accommodate some special requirements or constraints of the particular system. A common problem is that of balancing the total resource demand of the jobs in the system against the resources available. For example, a scheduling decision resulting from the application of one of the classical rules may have to be overridden because of insufficient available memory. Clearly, any discussion that attempts to concentrate solely on processor scheduling will be deficient in some of these areas. A scheduler must be an integral part of the resource allocation component of an operating system.

In this paper the scheduling methods of a number of popular operating systems have been described. The IBM systems described (OS/MFT, OS/MVT, OS/VS2) are all primarily oriented to an environment of batch submissions. The MULTICS system and the UNIX system, offering different types of service, have different performance objectives and hence employ a different scheduling approach to meet these objectives.

Acknowledgments

The preparation of this paper was supported in part by the Defence Research Board of Canada, Grant No. 9931-40. The author is grateful for the help of Chris Thomson for gathering material and of Dianne Good for typing the manuscript.

References

1. H. Hellerman and T. F. Conroy, Computer System Performance, McGraw-Hill, New York, 1975.

2. E. W. Dijkstra, "Cooperating Sequential Processes," Programming Languages (F. Genuys, ed.), Academic Press, New York, 1968, pp. 43-112.

3. J. J. Horning and B. Randell, "Process Structuring," ACM Computing Surveys, Vol. 5, No. 1 (March 1973), pp. 5-30.

4. R. R. Muntz, Software Systems Principles: A Survey (P. Freeman, ed.), Chapter 7, Science Research Associates, 1975.

5. K. D. Ryder, "A Heuristic Approach to Task Dispatching," IBM Systems Journal, Vol. 9, No. 3 (1970), pp. 189-198.

6. J. H. Saltzer, "Traffic Control in a Multiplexed Computer System," Sc.D. thesis, Dept. of EE, MIT, Cambridge, Massachusetts, 1966.

7. S. Sherman, F. Baskett, and J. C. Browne, "Trace-Driven Modeling and Analysis of CPU Scheduling in a Multiprogramming System," CACM, Vol. 15, No. 12 (December 1972), pp. 1063-1069.

8. P. Brinch Hansen, Operating Systems Principles, Prentice-Hall, Englewood Cliffs, New Jersey, 1973.


9. S. E. Madnick and J. J. Donovan, Operating Systems, McGraw-Hill, New York, 1974.

10. B. W. Lampson, "A Scheduling Philosophy for Multiprogramming Systems," CACM, Vol. 11, No. 5 (May 1968), pp. 347-360.

11. E. G. Coffman, Jr., and L. Kleinrock, "Computer Scheduling Methods and Their Countermeasures," Proc. AFIPS SJCC, Vol. 32 (1968), pp. 11-21.

12. E. G. Coffman, Jr., and L. Kleinrock, "Feedback Queueing Models for Time-Shared Systems," JACM, Vol. 15, No. 4 (October 1968), pp. 549-576.

13. R. W. Conway, W. L. Maxwell, and L. W. Miller, Theory of Scheduling, Addison-Wesley, Reading, Massachusetts, 1967.

14. L. Kleinrock, "A Conservation Law for a Wide Class of Queueing Disciplines," Naval Research Logistics Quarterly, Vol. 12, No. 2 (June 1965), pp. 181-192.

15. J. Rodriguez-Rosell and J. Dupuy, "The Design, Implementation and Evaluation of a Working Set Dispatcher," CACM, Vol. 16, No. 4 (April 1973), pp. 247-253.

16. U. N. Bhat and R. E. Nance, "Dynamic Quantum Allocation and Swap Time Variability in Time-Sharing Operating Systems," Tech. Report No. CP-73009, Department of Computer Science and Operations Research, Southern Methodist University, Dallas, Texas, April 1973.

17. L. Schrage, "A Proof of the Optimality of the Shortest Remaining Processing Time Discipline," Operations Research, Vol. 16, No. 3 (May-June 1968), pp. 687-690.

18. J. L. Johnstone, Software Systems Principles: A Survey (P. Freeman, ed.), Chapter 17, Science Research Associates, 1975.

19. B. S. Marshall, "Dynamic Calculation of Dispatching Priorities Under OS/360 MVT," Datamation (August 1969), pp. 93-97.

20. R. W. Doran, "Virtual Memory," Computer, Vol. 9, No. 10 (October 1976), pp. 27-37.

21. A. C. Shaw, The Logical Design of Operating Systems, Prentice-Hall, Englewood Cliffs, New Jersey, 1974.

22. IBM Corporation, OS/VS2 Planning and Use Guide, Form No. GC28-0600-2, September 1972.

23. E. I. Organick, The MULTICS System: An Examination of Its Structure, MIT Press, Cambridge, Massachusetts, 1972.

24. D. M. Ritchie and K. Thompson, "The UNIX Time-Sharing System," CACM, Vol. 17, No. 7 (July 1974), pp. 365-375.

Rick Bunt is an associate professor in the Department of Computational Science at the University of Saskatchewan, Saskatoon, Canada. He has been at the university since 1972. His research interests include operating systems, simulation, and the study of programming and programmers. He received the B.Sc. degree from Queen's University, Kingston, Ontario, and the M.Sc. and Ph.D. degrees in computer science from the University of Toronto. He is a member of ACM, the IEEE Computer Society, and the Computer Science Association (Canada).
