operating system lecture notes

A.V.C. COLLEGE OF ENGINEERING
MANNAMPANDAL, MAYILADUTHURAI-609 305

COURSE MATERIAL FOR THE SUBJECT OF OPERATING SYSTEMS

Subject Code : CS 2254
Semester : IV SEMESTER
Department : B.E CSE
Academic Year : 2012-2013
Name of the Faculty : M.PARVATHI
Designation and Dept : Asst Prof /CSE












Regulations 2008 Curriculum

CS1253 – OPERATING SYSTEMS (Common to CSE and IT)


Introduction to operating systems – Review of computer organization – Operating system structures – System calls – System programs – System structure – Virtual machines – Processes – Process concept – Process scheduling – Operations on processes – Cooperating processes – Interprocess communication – Communication in client-server systems – Case study – IPC in linux – Threads – Multi-threading models – Threading issues – Case study – Pthreads library.

UNIT II PROCESS SCHEDULING AND SYNCHRONIZATION 10
CPU scheduling – Scheduling criteria – Scheduling algorithms – Multiple-processor scheduling – Real time scheduling – Algorithm evaluation – Case study – Process scheduling in Linux – Process synchronization – The critical-section problem – Synchronization hardware – Semaphores – Classic problems of synchronization – Critical regions – Monitors – Deadlock system model – Deadlock characterization – Methods for handling deadlocks – Deadlock prevention – Deadlock avoidance – Deadlock detection – Recovery from deadlock.

UNIT III STORAGE MANAGEMENT 9
Memory management – Background – Swapping – Contiguous memory allocation – Paging – Segmentation – Segmentation with paging – Virtual memory – Background – Demand paging – Process creation – Page replacement – Allocation of frames – Thrashing – Case study – Memory management in Linux.

UNIT IV FILE SYSTEMS 9
File system interface – File concept – Access methods – Directory structure – File system mounting – Protection – File system implementation – Directory implementation – Allocation methods – Free space management – Efficiency and performance – Recovery – Log structured file systems – Case studies – File system in Linux – File system in Windows XP.

UNIT V I/O SYSTEMS 8
I/O Systems – I/O Hardware – Application I/O interface – Kernel I/O subsystem – Streams – Performance – Mass-storage structure – Disk scheduling – Disk management – Swap-space management – RAID – Disk attachment – Stable storage – Tertiary storage – Case study – I/O in Linux.

Total: 45


1. Silberschatz, Galvin and Gagne, “Operating System Concepts”, 6th Edition, Wiley India Pvt. Ltd., 2003.


1. Tanenbaum, A.S., “Modern Operating Systems”, 2nd Edition, Pearson Education, 2004.
2. Gary Nutt, “Operating Systems”, 3rd Edition, Pearson Education, 2004.
3. William Stallings, “Operating Systems”, 4th Edition, Prentice Hall of India, 2003.



1.1 Introduction
An operating system acts as an intermediary between the user of a computer and the computer hardware. The purpose of an operating system is to provide an environment in which a user can execute programs in a convenient and efficient manner. An operating system is software that manages the computer hardware. The hardware must provide appropriate mechanisms to ensure the correct operation of the computer system and to prevent user programs from interfering with the proper operation of the system.

1.2 Operating System
1.2.1 Definition of Operating System
An operating system is a program that controls the execution of application programs and acts as an interface between the user of a computer and the computer hardware.


A more common definition is that the operating system is the one program running at all times on the computer (usually called the kernel), with all else being application programs.

An Operating system is concerned with the allocation of resources and services, such as memory, processors, devices and information. The Operating System correspondingly includes programs to manage these resources, such as a traffic controller, a scheduler, memory management module, I/O programs, and a file system.

1.2.2 Functions of Operating System
An operating system performs three functions:

1. Convenience: An OS makes a computer more convenient to use.

2. Efficiency: An OS allows the computer system resources to be used in an efficient manner.

3. Ability to Evolve: An OS should be constructed in such a way as to permit the effective development, testing and introduction of new system functions without at the same time interfering with service.

1.2.3 Operating System as User Interface
Every general-purpose computer consists of hardware, an operating system, system programs, and application programs. The hardware consists of memory, the CPU, the ALU, I/O devices, peripheral devices, and storage devices. System programs include compilers, loaders, editors, the OS, etc. Application programs include business programs and database programs.

Fig. 1.1 shows the conceptual view of a computer system.

Fig 1.1 Conceptual view of a computer system

Every computer must have an operating system to run other programs. The operating system controls and coordinates the use of the hardware among the various system programs and application programs for the various users. It simply provides an environment within which other programs can do useful work.

The operating system is a set of special programs that run on a computer system and allow it to work properly. It performs basic tasks such as recognizing input from the keyboard, keeping track of files and directories on the disk, sending output to the display screen, and controlling peripheral devices.

The OS is designed to serve two basic purposes:

1. It controls the allocation and use of the computing system's resources among the various users and tasks.

2. It provides an interface between the computer hardware and the programmer that simplifies, and makes feasible, the coding, creation and debugging of application programs.

The operating system must support the following tasks:

1. Provide facilities to create and modify program and data files using an editor.

2. Provide access to the compiler for translating the user program from a high-level language to machine language.

3. Provide a loader program to move the compiled program code to the computer's memory for execution.

4. Provide routines that handle the details of I/O programming.

1.3 I/O System Management
The module that keeps track of the status of devices is called the I/O traffic controller. Each I/O device has a device handler that resides in a separate process associated with that device.

The I/O subsystem consists of

1. A memory management component that includes buffering, caching and spooling.

2. A general device driver interface.

3. Drivers for specific hardware devices.

1.4 Assembler
Input to an assembler is an assembly language program. Output is an object program plus information that enables the loader to prepare the object program for execution. At one time, the computer programmer had at his disposal a basic machine that interpreted, through hardware, certain fundamental instructions. He would program this computer by writing a series of ones and zeros (machine language) and placing them into the memory of the machine.

1.5 Compiler
High-level languages (examples are FORTRAN, COBOL, ALGOL and PL/I) are processed by compilers and interpreters. A compiler is a program that accepts a source program in a "high-level language" and produces a corresponding object program. An interpreter is a program that appears to execute a source program as if it were machine language. The same name (FORTRAN, COBOL, etc.) is often used to designate both a compiler and its associated language.

1.6 Loader
A loader is a routine that loads an object program and prepares it for execution. There are various loading schemes: absolute, relocating and direct-linking. In general, the loader must load, relocate, and link the object program. In a simple loading scheme, the assembler outputs the machine language translation of a program on a secondary device and a loader is placed in core. The loader places into memory the machine language version of the user's program and transfers control to it. Since the loader program is much smaller than the assembler, this makes more core available to the user's program.

1.7 History of Operating System
Operating systems have been evolving through the years. The following table shows the history of OS.

Generation   Year          Electronic devices used    Types of OS and devices
First        1945 – 55     Vacuum tubes               Plug boards
Second       1955 – 1965   Transistors                Batch system
Third        1965 – 1980   Integrated Circuit (IC)    Multiprogramming
Fourth       Since 1980    Large scale integration


The 1960's definition of an operating system is "the software that controls the hardware". However, today, due to microcode, we need a better definition. We see an operating system as the programs that make the hardware usable. In brief, an operating system is the set of programs that controls a computer. Some examples of operating systems are UNIX, Mach, MS-DOS, MS-Windows, Windows/NT, Chicago, OS/2, MacOS, VMS, MVS, and VM.

Controlling the computer involves software at several levels. We will differentiate kernel services, library services, and application-level services, all of which are part of the operating system. Processes run applications, which are linked together with libraries that perform standard services. The kernel supports the processes by providing a path to the peripheral devices. The kernel responds to service calls from the processes and interrupts from the devices.

The core of the operating system is the kernel, a control program that functions in privileged state (an execution context that allows all hardware instructions to be executed), reacting to interrupts from external devices and to service requests and traps from processes. Generally, the kernel is a permanent resident of the computer. It creates and terminates processes and responds to their requests for service.

Batch Systems


A batch operating system is one where programs and data are collected together in a batch before processing starts. A job is a predefined sequence of commands, programs and data that are combined into a single unit.

Fig. 2.1 shows the memory layout for a simple batch system. Memory management in a batch system is very simple. Memory is usually divided into two areas: the operating system area and the user program area.

Scheduling is also simple in a batch system. Jobs are processed in the order of submission, i.e., in first-come, first-served fashion.
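The first-come, first-served order described above can be illustrated with a small simulation. This is only a sketch: the function name `fcfs_schedule`, the job names, and the run times are made up for illustration, and every job is assumed to be submitted at time 0.

```python
def fcfs_schedule(jobs):
    """Run jobs strictly in submission order.

    jobs: list of (name, run_time) tuples, already in submission order.
    Returns a list of (name, start_time, finish_time) tuples.
    """
    clock = 0
    timeline = []
    for name, run_time in jobs:
        start = clock
        clock += run_time          # the job runs to completion, no preemption
        timeline.append((name, start, clock))
    return timeline


# Hypothetical batch of three jobs, all submitted at time 0.
batch = [("payroll", 5), ("report", 3), ("backup", 8)]
for name, start, finish in fcfs_schedule(batch):
    print(f"{name}: starts at {start}, finishes at {finish}")
```

Because all jobs are assumed to arrive at time 0, each job's finish time is also its turnaround time, which shows how a short job queued behind long ones waits for all of them.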

When a job completes execution, its memory is released and the output for the job is copied into an output spool for later printing.

A batch system often provides simple forms of file management. Access to files is serial. Batch systems do not require any time-critical device management.

Batch systems are inconvenient for users because users cannot interact with their jobs to fix problems. There may also be long turnaround times. An example of this kind of system is generating monthly bank statements.

Advantages of Batch System

Much of the work of the operator is moved to the computer.

Performance increases, since a job can start as soon as the previous job finishes.

Disadvantages of Batch System

Turnaround time can be large from the user's standpoint.

Programs are difficult to debug.

A job could enter an infinite loop.

A job could corrupt the monitor, thus affecting pending jobs.

Due to the lack of a protection scheme, one batch job can affect pending jobs.

2.5 Time Sharing Systems
Multiprogrammed batched systems provide an environment where the various system resources (for example, CPU, memory, peripheral devices) are utilized effectively.

Time sharing, or multitasking, is a logical extension of multiprogramming. Multiple jobs are executed by the CPU switching between them, but the switches occur so frequently that the users may interact with each program while it is running.

An interactive, or hands-on, computer system provides on-line communication between the user and the system. The user gives instructions to the operating system or to a program directly, and receives an immediate response. Usually, a keyboard is used to provide input, and a display screen (such as a cathode-ray tube (CRT) or monitor) is used to provide output.

If users are to be able to access both data and code conveniently, an on-line file system must be available. A file is a collection of related information defined by its creator. Batch systems are appropriate for executing large jobs that need little interaction.

Time-sharing systems were developed to provide interactive use of a computer system at a reasonable cost. A time-shared operating system uses CPU scheduling and multiprogramming to provide each user with a small portion of a time-shared computer. Each user has at least one separate program in memory. A program that is loaded into memory and is executing is commonly referred to as a process. When a process executes, it typically executes for only a short time before it either finishes or needs to perform I/O. I/O may be interactive; that is, output is to a display for the user and input is from a user keyboard. Since interactive I/O typically runs at people speeds, it may take a long time to complete.

A time-shared operating system allows many users to share the computer simultaneously. Since each action or command in a time-shared system tends to be short, only a little CPU time is needed for each user. As the system switches rapidly from one user to the next, each user is given the impression that she has her own computer, whereas actually one computer is being shared among many users.
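The rapid switching between users can be sketched as a round-robin loop. The notes do not name a specific scheduling algorithm here, so round robin with a fixed time slice is an assumption, and `round_robin`, the user names, and the quantum value are illustrative only.

```python
from collections import deque


def round_robin(jobs, quantum):
    """Give each job at most `quantum` units of CPU in turn.

    jobs: dict mapping a job name to the CPU time it still needs.
    Returns the sequence of (name, units_run) slices in execution order.
    """
    ready = deque(jobs.items())
    trace = []
    while ready:
        name, remaining = ready.popleft()
        used = min(quantum, remaining)
        trace.append((name, used))
        if remaining > used:
            # Not finished: go to the back of the ready queue.
            ready.append((name, remaining - used))
    return trace


# Two hypothetical users sharing one CPU with a time slice of 2 units.
print(round_robin({"alice": 3, "bob": 5}, quantum=2))
```

With a short enough time slice, every user's job makes progress within a small interval, which is what creates the impression of a private computer.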


Time-sharing operating systems are even more complex than are multi-programmed operating systems. As in multiprogramming, several jobs must be kept simultaneously in memory, which requires some form of memory management and protection.

2.6 Multiprogramming
When two or more programs are in memory at the same time, sharing the processor, the system is referred to as a multiprogramming operating system. Multiprogramming assumes a single processor that is being shared. It increases CPU utilization by organizing jobs so that the CPU always has one to execute.

Fig. 2.2 shows the memory layout for a multiprogramming system.

The operating system keeps several jobs in memory at a time. This set of jobs is a subset of the jobs kept in the job pool. The operating system picks one of the jobs in memory and begins to execute it.

Multiprogrammed systems provide an environment in which the various system resources are utilized effectively, but they do not provide for user interaction with the computer system.

Jobs entering the system are kept in memory. Having several programs in memory at the same time requires some form of memory management.

The multiprogramming operating system monitors the state of all active programs and system resources. This ensures that the CPU is never idle unless there are no jobs.

Advantages

1. High CPU utilization.

2. It appears that many programs are allotted the CPU almost simultaneously.

Disadvantages

1. CPU scheduling is required.

2. To accommodate many jobs in memory, memory management is required.

2.7 Spooling
Spooling is an acronym for simultaneous peripheral operations on line. It refers to putting jobs in a buffer, a special area in memory or on a disk where a device can access them when it is ready.

Spooling is useful because devices access data at different rates. The buffer provides a waiting station where data can rest while the slower device catches up. Fig 2.3 shows spooling.
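The buffer idea behind spooling can be sketched with a simple queue. The class name `Spooler` and its methods are hypothetical; a real spooler would keep the buffer on disk and run the device side concurrently.

```python
from collections import deque


class Spooler:
    """A waiting station between fast producers and a slow device."""

    def __init__(self):
        self._buffer = deque()

    def submit(self, job):
        # The producer returns immediately; it never waits for the device.
        self._buffer.append(job)

    def device_ready(self):
        # Called when the slow device can accept more work.
        # Returns the oldest pending job, or None if the buffer is empty.
        return self._buffer.popleft() if self._buffer else None


spool = Spooler()
spool.submit("statement-january.txt")
spool.submit("statement-february.txt")
print(spool.device_ready())  # the printer takes the oldest job first
```

The producer and the device never have to run at the same speed; they only agree on the order in which jobs leave the buffer.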

System Components

Even though not all systems have the same structure, many modern operating systems share the same goal of supporting the following types of system components.

Process Management

The operating system manages many kinds of activities, ranging from user programs to system programs such as printer spoolers, name servers, file servers, etc. Each of these activities is encapsulated in a process. A process includes the complete execution context (code, data, PC, registers, OS resources in use, etc.).

It is important to note that a process is not a program. A process is only ONE instance of a program in execution. Many processes can be running the same program. The five major activities of an operating system in regard to process management are

Creation and deletion of user and system processes.

Suspension and resumption of processes.

A mechanism for process synchronization.

A mechanism for process communication.

A mechanism for deadlock handling.

Main-Memory Management

Primary memory, or main memory, is a large array of words or bytes. Each word or byte has its own address. Main memory provides storage that can be accessed directly by the CPU. That is to say, for a program to be executed, it must be in main memory.

The major activities of an operating system in regard to memory management are:

Keep track of which parts of memory are currently being used and by whom.

Decide which processes are loaded into memory when memory space becomes available.

Allocate and deallocate memory space as needed.
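The three activities listed above can be sketched with a toy allocator. The notes do not prescribe a placement strategy, so the first-fit search used here is an assumption, and the class `MemoryManager` and its method names are hypothetical.

```python
class MemoryManager:
    """Toy model: tracks which process owns which region of memory."""

    def __init__(self, size):
        self.size = size
        self.allocations = {}  # owner -> (start_address, length)

    def allocate(self, owner, length):
        """First fit (an assumed strategy): take the lowest hole that is large enough."""
        cursor = 0
        for start, used in sorted(self.allocations.values()):
            if start - cursor >= length:   # the hole before this region fits
                break
            cursor = start + used          # skip past the allocated region
        if cursor + length > self.size:
            return None                    # no hole is large enough
        self.allocations[owner] = (cursor, length)
        return cursor

    def free(self, owner):
        self.allocations.pop(owner, None)

    def owner_of(self, address):
        """Answer 'which part of memory is used, and by whom'."""
        for owner, (start, length) in self.allocations.items():
            if start <= address < start + length:
                return owner
        return None


memory = MemoryManager(100)
memory.allocate("editor", 40)
memory.allocate("compiler", 30)
print(memory.owner_of(45))  # address 45 falls inside the compiler's region
```

A request that does not fit returns `None`; in a real system the process would wait until memory space becomes available.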

File Management

A file is a collection of related information defined by its creator. Computers can store files on disk (secondary storage), which provides long-term storage. Some examples of storage media are magnetic tape, magnetic disk and optical disk. Each of these media has its own properties, such as speed, capacity, data transfer rate and access method.

File systems are normally organized into directories to ease their use. These directories may contain files and other directories.

The five major activities of an operating system in regard to file management are

The creation and deletion of files.

The creation and deletion of directories.

The support of primitives for manipulating files and directories.

The mapping of files onto secondary storage.

The backup of files on stable storage media.

I/O System Management

The I/O subsystem hides the peculiarities of specific hardware devices from the user. Only the device driver knows the peculiarities of the specific device to which it is assigned.


Secondary-Storage Management

Generally speaking, systems have several levels of storage, including primary storage, secondary storage and cache storage. Instructions and data must be placed in primary storage or cache to be referenced by a running program. Because main memory is too small to accommodate all data and programs, and its data are lost when power is lost, the computer system must provide secondary storage to back up main memory. Secondary storage consists of tapes, disks, and other media designed to hold information that will eventually be accessed in primary storage. Storage (primary, secondary, cache) is ordinarily divided into bytes or words consisting of a fixed number of bytes. Each location in storage has an address; the set of all addresses available to a program is called an address space.

The three major activities of an operating system in regard to secondary storage management are:

Managing the free space available on the secondary-storage device.

Allocation of storage space when new files have to be written.

Scheduling the requests for disk access.


A distributed system is a collection of processors that do not share memory, peripheral devices, or a clock. The processors communicate with one another through communication lines called a network. The communication-network design must consider routing and connection strategies, and the problems of contention and security.

Protection System

If a computer system has multiple users and allows the concurrent execution of multiple processes, then the various processes must be protected from one another's activities. Protection refers to a mechanism for controlling the access of programs, processes, or users to the resources defined by a computer system.

Command Interpreter System

A command interpreter is an interface between the operating system and the user. The user gives commands, which are executed by the operating system (usually by turning them into system calls). The main function of a command interpreter is to get and execute the next user-specified command. The command interpreter is usually not part of the kernel, since multiple command interpreters (shells, in UNIX terminology) may be supported by an operating system, and they do not really need to run in kernel mode. There are two main advantages to separating the command interpreter from the kernel.

First, if we want to change the way the command interpreter looks, i.e., change its interface, we can do so only if the command interpreter is separate from the kernel. Users cannot change the code of the kernel, so they could not modify an interface that is built into it.

Second, if the command interpreter is a part of the kernel, it is possible for a malicious process to gain access to parts of the kernel that it should not have. To avoid this scenario, it is advantageous to keep the command interpreter separate from the kernel.
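The "get and execute the next command" loop of a command interpreter can be sketched as an ordinary user-level program. This is only an illustration: the function name `run_commands` is hypothetical, commands come from a list instead of a keyboard, and returning 127 for an unknown command follows the common UNIX shell convention and is not something the notes specify.

```python
import shlex
import subprocess


def run_commands(lines):
    """Execute each command line in turn and collect the exit codes."""
    exit_codes = []
    for line in lines:
        argv = shlex.split(line)       # split the line into program and arguments
        if not argv:
            continue                   # empty line: nothing to execute
        try:
            # The operating system creates a process for the command
            # and the interpreter waits for it to finish.
            result = subprocess.run(argv)
            exit_codes.append(result.returncode)
        except FileNotFoundError:
            exit_codes.append(127)     # assumed convention for "command not found"
    return exit_codes
```

Nothing in this loop needs kernel mode: it only asks the operating system to start programs, which is why a shell can be replaced or customized without touching the kernel.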

Operating Systems Services

Following are the five services provided by an operating system for the convenience of the users.

Program Execution

The purpose of a computer system is to allow the user to execute programs. So the operating system provides an environment where the user can conveniently run programs. The user does not have to worry about memory allocation, multitasking, or similar details. These things are taken care of by the operating system.

Running a program involves allocating and deallocating memory and, in the case of multiple processes, CPU scheduling. These functions cannot be given to user-level programs. So user-level programs cannot help the user to run programs independently, without help from the operating system.

I/O Operations

Each program requires input and produces output. This involves the use of I/O. The operating system hides from the user the details of the underlying I/O hardware. All the user sees is that the I/O has been performed, without any details. So the operating system, by providing I/O, makes it convenient for users to run programs.

For efficiency and protection, users cannot control I/O directly, so this service cannot be provided by user-level programs.

File System Manipulation

The output of a program may need to be written into new files, or input taken from some files. The operating system provides this service. The user does not have to worry about secondary storage management. The user gives a command for reading or writing a file and sees the task accomplished. Thus the operating system makes it easier for user programs to accomplish their tasks.

This service involves secondary storage management. The speed of I/O that depends on secondary storage management is critical to the speed of many programs, and hence it is best left to the operating system to manage rather than giving individual users control of it. It is not difficult for user-level programs to provide these services, but for the reasons mentioned above it is best if this service is left with the operating system.


There are instances where processes need to communicate with each other to exchange information. The communication may be between processes running on the same computer or on different computers. By providing this service, the operating system relieves the user of the worry of passing messages between processes. In the case where messages need to be passed to processes on other computers through a network, it can be done by user programs. The user program may be customized to the specifics of the hardware through which the message transits and provides the service interface to the operating system.

Error Detection

An error in one part of the system may cause malfunctioning of the complete system. To avoid such a situation, the operating system constantly monitors the system to detect errors. This relieves the user of the worry of errors propagating to various parts of the system and causing malfunctions.

This service cannot be handled by user programs because it involves monitoring and, in some cases, altering areas of memory, deallocating the memory of a faulty process, or taking the CPU away from a process that goes into an infinite loop. These tasks are too critical to be handed over to user programs. A user program, if given these privileges, can interfere with the correct (normal) operation of the operating system.

System Calls and System Programs

System calls provide an interface between a process and the operating system. System calls allow user-level processes to request services from the operating system that the process itself is not allowed to perform. A system call causes a trap; in handling the trap, the operating system enters kernel mode, where it has access to privileged instructions, and can perform the desired service on behalf of the user-level process. It is because of the critical nature of these operations that the operating system itself performs them every time they are needed. For example, for I/O a process invokes a system call telling the operating system to read or write a particular area, and this request is satisfied by the operating system.
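The read and write requests mentioned above can be seen from a user-level program. In Python, `os.pipe`, `os.write`, `os.read`, and `os.close` are thin wrappers around the corresponding operating system calls; the helper name `echo_through_kernel` is made up for this sketch, and the message is assumed to be small enough to fit in the pipe's buffer.

```python
import os


def echo_through_kernel(message: bytes) -> bytes:
    """Send bytes into a kernel-managed pipe and read them back."""
    read_fd, write_fd = os.pipe()          # ask the OS for a pipe
    try:
        os.write(write_fd, message)        # request: write these bytes
        return os.read(read_fd, len(message))  # request: read them back
    finally:
        os.close(read_fd)
        os.close(write_fd)


print(echo_through_kernel(b"hello"))
```

The program never touches the pipe's storage itself; it only names a file descriptor and a buffer, and the operating system carries out the transfer on its behalf.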

System programs provide basic functionality to users so that they do not need to write their own environment for program development (editors, compilers) and program execution (shells). In some sense, they are bundles of useful system calls.

Layered Approach Design

In a layered design, the system is easier to debug and modify, because changes affect only limited portions of the code, and the programmer does not have to know the details of the other layers. Information is also kept only where it is needed and is accessible only in certain ways, so bugs affecting that data are limited to a specific module or layer.

Mechanisms and Policies

Policy specifies what is to be done, while mechanism specifies how it is to be done. For instance, the timer construct for ensuring CPU protection is a mechanism. On the other hand, the decision of how long the timer is set for a particular user is a policy decision.

The separation of mechanism and policy is important to provide flexibility to a system. If the interface between mechanism and policy is well defined, a change of policy may affect only a few parameters. On the other hand, if the interface between the two is vague or not well defined, a change of policy might involve much deeper changes to the system.

Once the policy has been decided, the programmer is free to choose his/her own implementation. Also, the underlying implementation may be changed for a more efficient one without much trouble if the mechanism and policy are well defined.

Specifically, separating these two provides flexibility in a variety of ways.

First, the same mechanism can be used to implement a variety of policies, so changing the policy might not require the development of a new mechanism, but just a change in parameters for that mechanism, chosen from a library of mechanisms.

Second, the mechanism can be changed, for example to increase its efficiency or to move to a new platform, without changing the overall policy.
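The timer example can be sketched to show where the line between mechanism and policy falls. The class `Timer`, the function `ticks_until_preempted`, and the policy tables are all hypothetical names chosen for this illustration.

```python
class Timer:
    """Mechanism: counts ticks and reports when the allowed time has expired."""

    def __init__(self, quantum):
        self.quantum = quantum
        self.elapsed = 0

    def tick(self):
        self.elapsed += 1
        return self.elapsed >= self.quantum


def ticks_until_preempted(user, policy):
    """Policy is only a table of parameters handed to the mechanism."""
    timer = Timer(policy.get(user, policy["default"]))
    ticks = 1
    while not timer.tick():
        ticks += 1
    return ticks


# Two different policies, one unchanged mechanism.
favour_interactive = {"default": 3, "interactive": 1}
long_slices = {"default": 5}
print(ticks_until_preempted("interactive", favour_interactive))
print(ticks_until_preempted("interactive", long_slices))
```

Switching from one policy table to the other changes how long each user runs without touching the `Timer` code, which is the flexibility the separation is meant to provide.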


Definition of Process

The notion of process is central to the understanding of operating systems. There are quite a few definitions presented in the literature, but no "perfect" definition has yet appeared.

The term "process" was first used by the designers of MULTICS in the 1960s. Since then, the term has been used somewhat interchangeably with 'task' or 'job'. The process has been given many definitions, for instance

A program in execution.

An asynchronous activity.

The 'animated spirit' of a procedure in execution.

The entity to which processors are assigned.

The 'dispatchable' unit.

Many more definitions have been given.

As we can see from the above, there is no universally agreed-upon definition, but the definition "program in execution" seems to be the most frequently used. This is the concept we will use in the present study of operating systems.

Now that we have agreed upon the definition of process, the question is what the relation between process and program is. Is it the same beast with a different name, called a program when it is sleeping (not executing) and a process when it is executing? To be precise, a process is not the same as a program. In the following discussion we point out some of the differences between process and program.

A process is more than program code. A process is an 'active' entity, as opposed to a program, which is considered to be a 'passive' entity. A program is an algorithm expressed in some suitable notation (e.g., a programming language). Being passive, a program is only a part of a process. A process, on the other hand, includes:

Current value of Program Counter (PC)

Contents of the processor's registers

Value of the variables

The process stack (SP), which typically contains temporary data such as subroutine parameters, return addresses, and temporary variables.

A data section that contains global variables.

A process is the unit of work in a system.

In the process model, all software on the computer is organized into a number of sequential processes. A process includes PC, registers, and variables. Conceptually, each process has its own virtual CPU. In reality, the CPU switches back and forth among processes. (The rapid switching back and forth is called multiprogramming.)

Process State

The process state consists of everything necessary to resume the process execution if it is somehow put aside temporarily. The process state consists of at least the following:

Code for the program.

Program's static data.

Program's dynamic data.

Program's procedure call stack.

Contents of general purpose registers.

Contents of program counter (PC).

Contents of program status word (PSW).

Operating system resources in use.


Process Creation

In general-purpose systems, some way is needed to create processes as needed during
operation. There are four principal events that lead to process creation:

System initialization.

Execution of a process-creation system call by a running process.

A user request to create a new process.

Initiation of a batch job.

Foreground processes interact with users. Background processes stay in the background,
sleeping, but spring to life to handle activity such as email, web pages, printing, and
so on. Background processes are called daemons.

A process may create a new process with a process-creation system call such as 'fork'.
This call creates an exact clone of the calling process. The creating process is called
the parent process and the created one is called the child process. Only one parent is
needed to create a child process: unlike plants and animals that use sexual reproduction,
a process has only one parent. Process creation therefore yields a hierarchical structure
of processes like the one in the figure. Each child has only one parent, but each parent
may have many children.

After the fork, the two processes, the parent and the child, have the same memory image,
the same environment strings and the same open files. However, the parent and the child
have their own distinct address spaces: if either process changes a word in its address
space, the change is not visible to the other process.
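The separation of address spaces after a fork can be seen in a small experiment. The following is only a sketch, assuming a UNIX-like system where Python's os.fork is available; the function name fork_demo and the values 10 and 5 are made up for illustration.

```python
import os

def fork_demo():
    """Fork once; return (value seen by the parent, value computed by the child)."""
    value = 10
    pid = os.fork()                  # the child starts as a clone of the parent
    if pid == 0:
        # Child: this change touches only the child's copy of the variable.
        value += 5
        os._exit(value)              # report the child's value through its exit status
    # Parent: wait for the child to terminate, then read its exit status.
    _, status = os.waitpid(pid, 0)
    return value, os.WEXITSTATUS(status)

if __name__ == "__main__":
    parent_value, child_value = fork_demo()
    print("parent sees", parent_value, "child computed", child_value)
```

The parent still sees its original value even though the child changed its own copy, which is the behaviour described above.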

Following are some reasons for the creation of a process:

User logs on.

User starts a program.

The operating system creates a process to provide a service, e.g., to manage a printer.

Some program starts another process, e.g., Netscape calls xv to display a picture.

Process Termination

A process terminates when it finishes executing its last statement. Its resources
are returned to the system, it is purged from any system lists or tables, and its process
control block (PCB) is erased, i.e., the PCB's memory space is returned to a free memory
pool. A process usually terminates for one of the following reasons:

Normal Exit

Most processes terminate because they have done their job. In UNIX, this call is exit.


Error Exit

The process discovers a fatal error. For example, a user tries to compile a
program that does not exist.

Fatal Error

An error caused by the process, often due to a bug in the program, for example executing an
illegal instruction, referencing nonexistent memory, or dividing by zero.

Killed by another Process

A process executes a system call telling the operating system to terminate some
other process. In UNIX, this call is kill. In some systems, when a process is killed, all
processes it created are killed as well (UNIX does not work this way).

Process States

A process goes through a series of discrete process states.

New State

The process being created.

Terminated State

The process has finished execution.

Blocked (waiting) State

When a process blocks, it does so because logically it cannot continue, typically

because it is waiting for input that is not yet available. Formally, a process is said to be

blocked if it is waiting for some event to happen (such as an I/O completion) before it can

proceed. In this state a process is unable to run until some external event happens.

Running State

A process is said to be running if it currently has the CPU, that is, it is actually using
the CPU at that particular instant.

Ready State

A process is said to be ready if it could use a CPU if one were available. It is runnable but
temporarily stopped to let another process run.

Logically, the 'Running' and 'Ready' states are similar. In both cases the process is willing
to run; only in the case of the 'Ready' state, there is temporarily no CPU available for it. The
'Blocked' state is different from the 'Running' and 'Ready' states in that the process cannot
run, even if the CPU is available.

Process State Transitions

Following are the six possible transitions among the five states mentioned above.

Transition 1

occurs when a process discovers that it cannot continue. If a running process
initiates an I/O operation before its allotted time expires, it voluntarily
relinquishes the CPU.

This state transition is:

Block (process-name): Running → Blocked.

Transition 2

occurs when the scheduler decides that the running process has run long enough and it is

time to let another process have CPU time.

This state transition is:

Time-Run-Out (process-name): Running → Ready.

Transition 3

occurs when all other processes have had their share and it is time for the first process to
run again.

This state transition is:

Dispatch (process-name): Ready → Running.

Transition 4

occurs when the external event for which a process was waiting (such as arrival of input)
happens.

This state transition is:

Wakeup (process-name): Blocked → Ready.

Transition 5

occurs when the process is created.

This state transition is:

Admitted (process-name): New → Ready.

Transition 6

occurs when the process has finished execution.

This state transition is:

Exit (process-name): Running → Terminated.
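The six transitions can be written down as a small lookup table. This is a minimal sketch rather than how any real kernel stores its states; the lower-case event and state names and the helper functions apply_event and run_events are chosen here purely for illustration.

```python
# Legal transitions from the list above: (event, current state) -> next state.
TRANSITIONS = {
    ("block", "running"): "blocked",        # Transition 1
    ("time_run_out", "running"): "ready",   # Transition 2
    ("dispatch", "ready"): "running",       # Transition 3
    ("wakeup", "blocked"): "ready",         # Transition 4
    ("admitted", "new"): "ready",           # Transition 5
    ("exit", "running"): "terminated",      # Transition 6
}

def apply_event(state, event):
    """Return the next state, or raise ValueError if the transition is not allowed."""
    try:
        return TRANSITIONS[(event, state)]
    except KeyError:
        raise ValueError(f"illegal transition: {event} from {state}") from None

def run_events(events, state="new"):
    """Feed a sequence of events to a process that starts in the given state."""
    for event in events:
        state = apply_event(state, event)
    return state

if __name__ == "__main__":
    life = ["admitted", "dispatch", "block", "wakeup", "dispatch", "exit"]
    print(run_events(life))
```

Note that there is no entry from Blocked to Running: a blocked process must first be woken up into the Ready state and then dispatched.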

Process Control Block

A process in an operating system is represented by a data structure known as a process

control block (PCB) or process descriptor. The PCB contains important information

about the specific process including

The current state of the process i.e., whether it is ready, running, waiting, or whatever.

Unique identification of the process in order to track "which is which" information.

A pointer to parent process.

Similarly, a pointer to child process (if it exists).


The priority of process (a part of CPU scheduling information).

Pointers to locate memory of processes.

A register save area.

The processor it is running on.

The PCB is a central store that allows the operating system to locate key information
about a process. Thus, the PCB is the data structure that defines a process to the operating
system.
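The fields listed above can be pictured as a record. The following is a sketch only: the field names, types and the helper spawn are assumptions made for illustration and do not correspond to the layout used by any particular operating system.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class PCB:
    pid: int                                        # unique identification of the process
    state: str = "new"                              # ready, running, waiting, ...
    parent: Optional[int] = None                    # pointer to the parent process (its pid here)
    children: List[int] = field(default_factory=list)   # pointers to child processes
    priority: int = 0                               # part of the CPU scheduling information
    memory_base: int = 0                            # pointers to locate the process's memory
    memory_limit: int = 0
    registers: Dict[str, int] = field(default_factory=dict)  # register save area
    cpu: Optional[int] = None                       # the processor it is running on, if any

def spawn(parent: PCB, pid: int) -> PCB:
    """Create a child PCB and link it to its parent (hypothetical helper)."""
    child = PCB(pid=pid, parent=parent.pid, priority=parent.priority)
    parent.children.append(pid)
    return child

if __name__ == "__main__":
    init = PCB(pid=1, state="running")
    shell = spawn(init, pid=2)
    print(shell)
```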




Threads

Despite the fact that a thread must execute in a process, the process and its
associated threads are different concepts. Processes are used to group resources together,
and threads are the entities scheduled for execution on the CPU.

A thread is a single sequential stream of execution within a process. Because threads have
some of the properties of processes, they are sometimes called lightweight processes. Threads
allow multiple streams of execution within one process and are a popular way to improve
application performance through parallelism. The CPU switches rapidly back and forth
among the threads, giving the illusion that the threads are running in parallel. Like a
traditional process (i.e., a process with one thread), a thread can be in any of several states
(Running, Blocked, Ready or Terminated). Each thread has its own stack, since threads
generally call different procedures and thus have different execution histories.

In an operating system that has a thread facility, the basic unit of CPU utilization is a
thread. A thread consists of a program counter (PC), a register set, and a stack space.
Unlike processes, threads are not independent of one another: a thread shares with its peer
threads the code section, data section, and OS resources such as open files and signals,
collectively known as a task.

Processes Vs Threads

As mentioned earlier, in many respects threads operate in the same way as
processes. Some of the similarities and differences are:

Similarities

Like processes, threads share the CPU, and only one thread is active (running) at a time.

Like processes, threads within a process execute sequentially.

Like processes, threads can create children.

And like processes, if one thread is blocked, another thread can run.

Differences

Unlike processes, threads are not independent of one another.

Unlike processes, all threads can access every address in the task.

Unlike processes, threads are designed to assist one another. Note that processes might or
might not assist one another, because processes may have different origins.


Why Threads?

Following are some reasons why threads are used in designing operating systems.

A process with multiple threads makes a great server, for example a printer server.

Because threads can share common data, they do not need to use interprocess
communication.

By their very nature, threads can take advantage of multiprocessors.

Threads are cheap to create in the sense that they only need a stack and storage for
registers.


Threads use very few resources of the operating system in which they are
working. That is, threads do not need a new address space, global data, program code or
operating system resources. Context switching is fast when working with threads, because
only the PC, SP and registers have to be saved and restored.

But this cheapness does not come free: the biggest drawback is that there is no
protection between threads.

User-Level Threads

User-level threads are implemented in user-level libraries rather than via system calls,
so thread switching does not need to call the operating system or cause an interrupt to the
kernel. In fact, the kernel knows nothing about user-level threads and manages the processes
that contain them as if they were single-threaded processes.


Advantages:

The most obvious advantage of this technique is that a user-level threads package
can be implemented on an operating system that does not support threads. Some other
advantages are:

User-level threads do not require modification to the operating system.

Simple Representation:

Each thread is represented simply by a PC, registers, stack and a small control
block, all stored in the user process address space.

Simple Management:

Creating a thread, switching between threads and
synchronization between threads can all be done without intervention of the kernel.

Fast and Efficient:

Thread switching is not much more expensive than a procedure call.


Disadvantages:

There is a lack of coordination between threads and the operating system kernel.
Therefore, the process as a whole gets one time slice irrespective of whether it has one
thread or 1000 threads within it. It is up to each thread to relinquish control to other threads.

User-level threads require non-blocking system calls, i.e., a multithreaded kernel.
Otherwise, the entire process will block in the kernel, even if there are runnable threads left
in the process. For example, if one thread causes a page fault, the whole process blocks.

Kernel-Level Threads

In this method, the kernel knows about and manages the threads. No runtime
system is needed in this case. Instead of a thread table in each process, the kernel has a
thread table that keeps track of all threads in the system. In addition, the kernel also
maintains the traditional process table to keep track of processes. The operating system
kernel provides system calls to create and manage threads.


Advantages:

Because the kernel has full knowledge of all threads, the scheduler may decide to give
more time to a process having a large number of threads than to a process having a small
number of threads.

Kernel-level threads are especially good for applications that frequently block.


Disadvantages:

Kernel-level threads are slow and inefficient. For instance, thread operations
can be hundreds of times slower than those of user-level threads.

Since the kernel must manage and schedule threads as well as processes, it requires a full
thread control block (TCB) for each thread to maintain information about threads. As a
result, there is significant overhead and increased kernel complexity.

Advantages of Threads over Multiple Processes

Context Switching

Threads are very inexpensive to create and destroy, and they are inexpensive to
represent. For example, they require space to store the PC, the SP, and the general-purpose
registers, but they do not require space for memory-management information,
information about open files or I/O devices in use, etc. With so little context, it is much
faster to switch between threads. In other words, a context switch between threads is
relatively cheap.

Sharing

Threads allow the sharing of many resources that cannot be shared between processes, for
example, the code section, the data section, and operating system resources such as open files.


Disadvantages of Threads over Multiprocesses


Blocking

The major disadvantage is that if the kernel is single-threaded, a system call by
one thread will block the whole process, and the CPU may be idle during the blocking period.

Security

Since there is extensive sharing among threads, there is a potential security
problem. It is quite possible for one thread to overwrite the stack of another thread (or
damage shared data), although this is unlikely since threads are meant to cooperate on
a single task.

Application that Benefits from Threads

A proxy server satisfying the requests of a number of computers on a LAN
would benefit from a multi-threaded process. In general, any program that has to do
more than one task at a time could benefit from multithreading. For example, a program
that reads input, processes it, and writes output could have three threads, one for each task.

Application that cannot Benefit from Threads

Any sequential process that cannot be divided into parallel tasks will not benefit
from threads, as each task would block until the previous one completes. For example, a
program that displays the time of day would not benefit from multiple threads.

Resources used in Thread Creation and Process Creation

When a new thread is created, it shares its code section, data section and operating
system resources such as open files with the other threads of the process. But it is
allocated its own stack, register set and program counter.
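The split between shared and private data can be illustrated with Python's standard threading module. This is a sketch under assumptions: the names run_workers, shared and local_total are invented here, and the lock is added because, as noted earlier, nothing protects threads from one another.

```python
import threading

def run_workers(n_workers=4, increments=1000):
    """Start n_workers threads that all add into one shared counter."""
    shared = {"counter": 0}        # one copy, visible to every thread in the process
    lock = threading.Lock()        # threads are not protected from each other

    def worker():
        local_total = 0            # lives on this thread's own stack
        for _ in range(increments):
            local_total += 1
        with lock:                 # serialize the update to the shared data
            shared["counter"] += local_total

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["counter"]

if __name__ == "__main__":
    print(run_workers())
```

Each thread keeps its own local_total, while all of them update the same shared dictionary without any interprocess communication.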

The creation of a new process differs from that of a thread mainly in that all the
resources that threads share must be provided explicitly for each process. So even though
two processes may be running the same piece of code, each needs its own copy of the
code in main memory to be able to run. Two processes also do not share other
resources with each other. This makes the creation of a new process very costly
compared to that of a new thread.

Context Switch

To give each process on a multiprogrammed machine a fair share of the CPU, a
hardware clock generates interrupts periodically. This allows the operating system to
schedule all processes in main memory (using a scheduling algorithm) to run on the CPU at
equal intervals. Each time a clock interrupt occurs, the interrupt handler checks how
much time the currently running process has used. If it has used up its entire time slice,
the CPU scheduling algorithm (in the kernel) picks a different process to run. Each
switch of the CPU from one process to another is called a context switch.

Major Steps of Context Switching

The values of the CPU registers are saved in the process table of the process that

was running just before the clock interrupt occurred.

The registers are loaded from the process picked by the CPU scheduler to run next.
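The two steps can be mimicked with dictionaries standing in for the CPU registers and the process table. This is a toy model under stated assumptions: the function name context_switch, the register names pc and sp, and the table layout are all hypothetical.

```python
def context_switch(cpu, process_table, old_pid, new_pid):
    """Switch the simulated CPU (a dict of registers) from old_pid to new_pid."""
    # Step 1: save the CPU registers in the entry of the process that was running.
    process_table[old_pid]["registers"] = dict(cpu)
    process_table[old_pid]["state"] = "ready"
    # Step 2: load the registers saved for the process picked by the scheduler.
    cpu.clear()
    cpu.update(process_table[new_pid]["registers"])
    process_table[new_pid]["state"] = "running"

if __name__ == "__main__":
    cpu = {"pc": 100, "sp": 900}
    table = {
        1: {"state": "running", "registers": {}},
        2: {"state": "ready", "registers": {"pc": 40, "sp": 500}},
    }
    context_switch(cpu, table, old_pid=1, new_pid=2)
    print(cpu, table[1])
```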

In a multiprogrammed uniprocessor computing system, context switches occur frequently

enough that all processes appear to be running concurrently. If a process has more than

one thread, the Operating System can use the context switching technique to schedule the

threads so they appear to execute in parallel. This is the case if threads are implemented

at the kernel level. Threads can also be implemented entirely at the user level in run-time

libraries. Since in this case no thread scheduling is provided by the Operating System, it

is the responsibility of the programmer to yield the CPU frequently enough in each thread

so all threads in the process can make progress.
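Cooperative user-level threading of this kind can be imitated with Python generators, where each yield plays the role of a thread voluntarily giving up the CPU. This is an illustrative sketch, not a real threads package; worker and run_cooperative are hypothetical names.

```python
from collections import deque

def worker(name, steps, log):
    """A 'thread' that does one unit of work per turn and then yields the CPU."""
    for i in range(steps):
        log.append(f"{name}{i}")
        yield                      # voluntarily relinquish control

def run_cooperative(threads):
    """User-level scheduler: run each thread until it yields, in round-robin order."""
    ready = deque(threads)
    while ready:
        thread = ready.popleft()
        try:
            next(thread)           # resume the thread until its next yield
            ready.append(thread)   # still has work: back of the ready queue
        except StopIteration:
            pass                   # thread finished; drop it

if __name__ == "__main__":
    log = []
    run_cooperative([worker("A", 2, log), worker("B", 3, log)])
    print(log)
```

If a worker never reached a yield, no other worker would ever run, which is exactly the programmer's responsibility mentioned above.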

Action of Kernel to Context Switch Among Threads

Threads share many resources with the other peer threads belonging to the same
process, so a context switch among threads of the same process is easy. It involves
switching the register set, the program counter and the stack, which is relatively easy for
the kernel to accomplish.

Action of kernel to Context Switch Among Processes

Context switches among processes are expensive. Before a process can be

switched its process control block (PCB) must be saved by the operating system. The

PCB consists of the following information:

The process state.

The program counter, PC.

The values of the different registers.


The CPU scheduling information for the process.

Memory management information regarding the process.

Possible accounting information for this process.

I/O status information of the process.

Once the PCB of the currently executing process has been saved, the operating system loads
the PCB of the next process to run on the CPU. This is a heavy task and it takes a lot
of time.

Solaris-2 Operating Systems


At user-level

At Intermediate-level

At kernel-level


The Solaris-2 operating system supports:

threads at the user-level.

threads at the kernel-level.

symmetric multiprocessing and

real-time scheduling.

The entire thread system in Solaris is depicted in following figure.


At user-level

The user-level threads are supported by a library for creation and scheduling,
and the kernel knows nothing of these threads.

These user-level threads are supported by lightweight processes (LWPs). Each LWP is
connected to exactly one kernel-level thread, whereas each user-level thread is independent
of the kernel.

Many user-level threads may perform one task. These threads may be scheduled
and switched among LWPs without intervention of the kernel.

User-level threads are extremely efficient because no context switch is needed to
block one thread and start another running.

Resource needs of User-level Threads

A user-level thread needs a stack and a program counter. Absolutely no kernel resources
are required. Since the kernel is not involved in scheduling these user-level threads,
switching among user-level threads is fast and efficient.

At Intermediate-level

The lightweight processes (LWPs) are located between the user-level threads and
kernel-level threads. These LWPs serve as "virtual CPUs" on which user-level threads can run.
Each task contains at least one LWP.

The user-level threads are multiplexed on the LWPs of the process.

Resource needs of LWP

An LWP contains a process control block (PCB) with register data, accounting

information and memory information. Therefore, switching between LWPs requires quite

a bit of work and LWPs are relatively slow as compared to user-level threads.

At kernel-level

The standard kernel-level threads execute all operations within the kernel. There
is a kernel-level thread for each LWP, and there are some threads that run only on the
kernel's behalf and have no associated LWP, for example a thread to service disk requests.
By request, a kernel-level thread can be pinned to a processor (CPU). See the rightmost
thread in the figure. The kernel-level threads are scheduled by the kernel's scheduler. If a
kernel-level thread blocks, its LWP blocks, and so does the user-level thread running on
that LWP.

SEE the diagram in NOTES

In modern Solaris-2, a task no longer has to block just because a kernel-level thread
blocks; the processor (CPU) is free to run another thread.

Resource needs of Kernel-level Thread

A kernel thread has only a small data structure and a stack. Switching between kernel
threads does not require changing memory access information, and therefore kernel-level
threads are relatively fast and efficient.

Unit 2

CPU/Process Scheduling

The assignment of physical processors to processes allows processors to

accomplish work. The problem of determining when processors should be assigned and

to which processes is called processor scheduling or CPU scheduling.

When more than one process is runnable, the operating system must decide which
one to run first. The part of the operating system concerned with this decision is called the
scheduler, and the algorithm it uses is called the scheduling algorithm.

Goals of Scheduling (objectives)

In this section we try to answer the following question: what does the scheduler try to
achieve?


Many objectives must be considered in the design of a scheduling discipline. In
particular, a scheduler should consider fairness, efficiency, response time, turnaround
time, throughput, etc. Some of these goals depend on the kind of system in use, for
example batch, interactive or real-time, but there are also some
goals that are desirable in all systems.

General Goals

Fairness

Fairness is important under all circumstances. A scheduler makes sure that each process
gets its fair share of the CPU and that no process suffers indefinite postponement. Note
that giving every process equal time is not necessarily fair. Think of safety control and
payroll at a nuclear plant.

Policy Enforcement

The scheduler has to make sure that the system's policy is enforced. For example, if the
local policy is safety, then the safety control processes must be able to run whenever they
want to, even if it means a delay in payroll processes.

Efficiency

The scheduler should keep the system (in particular the CPU) busy 100 percent of the
time when possible. If the CPU and all the input/output devices can be kept running all
the time, more work gets done per second than if some components are idle.

Response Time

A scheduler should minimize the response time for interactive users.

Turnaround Time

A scheduler should minimize the time batch users must wait for output.

Throughput

A scheduler should maximize the number of jobs processed per unit time.

A little thought will show that some of these goals are contradictory. It can be shown that

any scheduling algorithm that favors some class of jobs hurts another class of jobs. The

amount of CPU time available is finite, after all.

Preemptive Vs Nonpreemptive Scheduling

The Scheduling algorithms can be divided into two categories with respect to how they

deal with clock interrupts.

Nonpreemptive Scheduling

A scheduling discipline is nonpreemptive if, once a process has been given the CPU, the

CPU cannot be taken away from that process.

Following are some characteristics of nonpreemptive scheduling

In a nonpreemptive system, short jobs are made to wait by longer jobs, but the overall
treatment of all processes is fair.

In a nonpreemptive system, response times are more predictable because incoming
high-priority jobs cannot displace waiting jobs.

In nonpreemptive scheduling, the scheduler dispatches a new job only in the following two
situations:

When a process switches from the running state to the waiting state.

When a process terminates.

Preemptive Scheduling

A scheduling discipline is preemptive if, once a process has been given the CPU, the CPU
can be taken away from it.

The strategy of allowing processes that are logically runnable to be temporarily suspended
is called preemptive scheduling, and it is in contrast to the "run to completion" method.


First-Come-First-Served (FCFS) Scheduling

Other names of this algorithm are:

First-In-First-Out (FIFO)



First-Come-First-Served is perhaps the simplest scheduling algorithm. Processes are
dispatched according to their arrival time on the ready queue. Being a nonpreemptive
discipline, once a process has the CPU, it runs to completion.

FCFS scheduling is fair in the formal or human sense of fairness, but it
is unfair in the sense that long jobs make short jobs wait and unimportant jobs make
important jobs wait.

FCFS is more predictable than most other schemes. The FCFS
scheme is not useful for scheduling interactive users because it cannot guarantee good
response time. The code for FCFS scheduling is simple to write and understand. One of
the major drawbacks of this scheme is that the average waiting time is often quite long.

The First-Come-First-Served algorithm is rarely used as a master scheme in modern
operating systems, but it is often embedded within other schemes.
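A few lines of code are enough to show how long jobs make short jobs wait under FCFS. The sketch below assumes each process is given as a (name, arrival time, CPU burst) tuple; the function name fcfs and the three sample bursts are illustrative and not taken from these notes.

```python
def fcfs(processes):
    """processes: list of (name, arrival, burst). Returns {name: waiting_time}."""
    clock = 0
    waiting = {}
    # Dispatch strictly in order of arrival; sorted() keeps ties in list order.
    for name, arrival, burst in sorted(processes, key=lambda p: p[1]):
        start = max(clock, arrival)     # the CPU may sit idle until the job arrives
        waiting[name] = start - arrival
        clock = start + burst           # nonpreemptive: the job runs to completion
    return waiting

if __name__ == "__main__":
    jobs = [("P1", 0, 24), ("P2", 0, 3), ("P3", 0, 3)]
    w = fcfs(jobs)
    print(w, "average:", sum(w.values()) / len(w))
```

With the long job first, the two short jobs wait 24 and 27 time units; if the same jobs were listed in the order P2, P3, P1, the waits would drop to 0, 3 and 6.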

Round Robin Scheduling

Round robin (RR) is one of the oldest, simplest, fairest and most widely used algorithms.
In round robin scheduling, processes are dispatched in a FIFO manner but are
given a limited amount of CPU time called a time-slice or a quantum. If a process does
not complete before its CPU time expires, the CPU is preempted and given to the next
process waiting in the queue. The preempted process is then placed at the back of the ready
list. Round robin scheduling is preemptive (at the end of the time-slice); therefore it is
effective in time-sharing environments in which the system needs to guarantee reasonable
response times for interactive users.

The only interesting issue with the round robin scheme is the length of the quantum.
Setting the quantum too short causes too many context switches and lowers CPU
efficiency. On the other hand, setting the quantum too long may cause poor response time
and approximates FCFS. In any event, the average waiting time under round robin
scheduling is often quite long.
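The effect of the quantum can be explored with a short simulation. This is a simplified sketch: it assumes all processes arrive at time 0 and ignores context-switch cost, and the name round_robin and the sample bursts are made up for illustration.

```python
from collections import deque

def round_robin(bursts, quantum):
    """bursts: {name: cpu_time}, all arriving at time 0. Returns {name: completion_time}."""
    if quantum <= 0:
        raise ValueError("quantum must be positive")
    remaining = dict(bursts)
    ready = deque(bursts)               # FIFO ready list, in the given order
    clock = 0
    finish = {}
    while ready:
        name = ready.popleft()
        run = min(quantum, remaining[name])
        clock += run
        remaining[name] -= run
        if remaining[name] == 0:
            finish[name] = clock        # done within this time-slice
        else:
            ready.append(name)          # preempted: back of the ready list
    return finish

if __name__ == "__main__":
    print(round_robin({"P1": 24, "P2": 3, "P3": 3}, quantum=4))
```

With a very large quantum every process finishes within its first slice, so the schedule is the same as FCFS.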


Shortest-Job-First (SJF) Scheduling

Another name for this algorithm is Shortest-Process-Next (SPN).

Shortest-Job-First (SJF) is a non-preemptive discipline in which the waiting job (or
process) with the smallest estimated run-time-to-completion is run next. In other words,
when the CPU is available, it is assigned to the process that has the smallest next CPU burst.

SJF scheduling is especially appropriate for batch jobs for which the run
times are known in advance. Since the SJF scheduling algorithm gives the minimum
average waiting time for a given set of processes, it is provably optimal in that respect.

The SJF algorithm favors short jobs (or processes) at the expense of longer ones.

The obvious problem with the SJF scheme is that it requires precise knowledge of how long
a job or process will run, and this information is not usually available. The best the SJF
algorithm can do is to rely on user estimates of run times.

In a production environment where the same jobs run regularly, it may be
possible to provide a reasonable estimate of run time based on the past performance of the
process. But in a development environment users rarely know how their programs will
execute. Like FCFS, SJF is non-preemptive; therefore, it is not useful in a timesharing
environment in which reasonable response time must be guaranteed.
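The selection rule can be sketched as follows, assuming the run times are known exactly and each process is a (name, arrival time, CPU burst) tuple. The function name sjf and the sample jobs are hypothetical.

```python
def sjf(processes):
    """Non-preemptive SJF. processes: list of (name, arrival, burst).
    Returns {name: waiting_time}."""
    pending = sorted(processes, key=lambda p: p[1])   # ordered by arrival time
    clock = 0
    waiting = {}
    while pending:
        arrived = [p for p in pending if p[1] <= clock]
        if not arrived:
            clock = pending[0][1]       # CPU idles until the next arrival
            continue
        job = min(arrived, key=lambda p: p[2])        # smallest CPU burst wins
        pending.remove(job)
        name, arrival, burst = job
        waiting[name] = clock - arrival
        clock += burst                  # runs to completion, no preemption
    return waiting

if __name__ == "__main__":
    jobs = [("P1", 0, 6), ("P2", 0, 8), ("P3", 0, 7), ("P4", 0, 3)]
    w = sjf(jobs)
    print(w, "average:", sum(w.values()) / len(w))
```

For these four jobs the average waiting time is 7 time units, whereas running them in the listed order (FCFS) would give 10.25.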


Shortest-Remaining-Time (SRT) Scheduling

SRT is the preemptive counterpart of SJF and is useful in time-sharing
environments.

In SRT scheduling, the process with the smallest estimated run-time to
completion is run next, including new arrivals.

In the SJF scheme, once a job begins executing, it runs to completion.

In the SRT scheme, a running process may be preempted by a newly arrived process
with a shorter estimated run-time.

SRT has higher overhead than its counterpart SJF.

SRT must keep track of the elapsed time of the running process and must
handle occasional preemptions.

In this scheme, small processes run almost immediately on arrival. However,
longer jobs have an even longer mean waiting time.
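Preemption by new arrivals can be simulated one time unit at a time. The sketch below assumes integer arrival times and positive integer bursts; the name srt and the sample workload are chosen for illustration only.

```python
def srt(processes):
    """Preemptive SRT, simulated in unit time steps.
    processes: list of (name, arrival, burst) with integer times and burst > 0.
    Returns {name: completion_time}."""
    remaining = {name: burst for name, _, burst in processes}
    arrival = {name: arr for name, arr, _ in processes}
    clock = 0
    finish = {}
    while remaining:
        ready = [n for n in remaining if arrival[n] <= clock]
        if not ready:
            clock += 1                  # nothing has arrived yet: CPU idles
            continue
        # Re-evaluated every tick, so a new arrival with less remaining time preempts.
        current = min(ready, key=lambda n: remaining[n])
        remaining[current] -= 1
        clock += 1
        if remaining[current] == 0:
            del remaining[current]
            finish[current] = clock
    return finish

if __name__ == "__main__":
    jobs = [("P1", 0, 8), ("P2", 1, 4), ("P3", 2, 9), ("P4", 3, 5)]
    print(srt(jobs))
```

Here P1 starts first but is preempted at time 1 when the shorter P2 arrives, and the longest job P3 finishes last.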


Priority Scheduling

The basic idea is straightforward: each process is assigned a priority, and the runnable
process with the highest priority is allowed to run. Equal-priority processes are scheduled
in FCFS order. The Shortest-Job-First (SJF) algorithm is a special case of the general
priority scheduling algorithm: an SJF algorithm is simply a priority algorithm where the
priority is the inverse of the (predicted) next CPU burst. That is, the longer the CPU
burst, the lower the priority, and vice versa.

Priority can be defined either internally or externally. Internally defined priorities

use some measurable quantities or qualities to compute priority of a process.

Examples of Internal priorities are

Time limits.

Memory requirements.

File requirements, for example, number of open files.

CPU vs. I/O requirements.

Externally defined priorities are set by criteria that are external to the operating system,
such as:


The importance of process.

Type or amount of funds being paid for computer use.

The department sponsoring the work.


Priority scheduling can be either preemptive or non-preemptive.

A preemptive priority algorithm will preempt the CPU if the priority of the
newly arrived process is higher than the priority of the currently running process.

A non-preemptive priority algorithm will simply put the new process at the head
of the ready queue.

A major problem with priority scheduling is indefinite blocking, or starvation. A
solution to the problem of indefinite blockage of low-priority processes is aging. Aging
is a technique of gradually increasing the priority of processes that wait in the system for
a long period of time.
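Aging can be seen at work in a toy simulation. The sketch below makes several simplifying assumptions that are not part of these notes: every job needs exactly one time unit, a lower number means a higher priority, and aging_step is how much a waiting job's priority number improves each time it is passed over. The name run_priority is hypothetical.

```python
def run_priority(jobs, aging_step=0):
    """jobs: list of (name, arrival, priority); each job needs one time unit.
    Lower number = higher priority (assumed convention).
    Returns the names in the order they run."""
    pending = sorted(jobs, key=lambda j: j[1])      # ordered by arrival time
    ready = []                                      # entries: [priority, arrival, name]
    order = []
    clock = 0
    while pending or ready:
        while pending and pending[0][1] <= clock:   # admit jobs that have arrived
            name, arrival, prio = pending.pop(0)
            ready.append([prio, arrival, name])
        if ready:
            ready.sort()                            # best priority first, ties by arrival
            order.append(ready.pop(0)[2])
            for entry in ready:                     # aging: waiting jobs gain priority
                entry[0] -= aging_step
        clock += 1
    return order

if __name__ == "__main__":
    jobs = [("L", 0, 5), ("H0", 0, 1), ("H1", 1, 1), ("H2", 2, 1), ("H3", 3, 1)]
    print(run_priority(jobs))                 # without aging
    print(run_priority(jobs, aging_step=2))   # with aging
```

Without aging, the low-priority job L runs only after the stream of high-priority jobs dries up; with aging it overtakes the later arrivals.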

Multilevel Queue Scheduling

A multilevel queue scheduling algorithm partitions the ready queue into several
separate queues.

In multilevel queue scheduling, processes are permanently assigned to one
queue, based on some property of the process, such as

Memory size

Process priority

Process type

The algorithm chooses the process from the occupied queue that has the highest priority
and runs that process either

Preemptively or

Non-preemptively


Each queue has its own scheduling algorithm or policy.

Possibility I

If each queue has absolute priority over lower-priority queues, then no process in a
queue can run unless the queues for all higher-priority processes are empty.

For example, in the above figure no process in the batch queue could run unless the
queues for system processes, interactive processes, and interactive editing processes were
all empty.

Possibility II

If there is a time slice between the queues, then each queue gets a certain amount of CPU time, which it can then schedule among the processes in its queue. For instance:

80% of the CPU time to the foreground queue using RR.

20% of the CPU time to the background queue using FCFS.

Since processes do not move between queues, this policy has the advantage of low scheduling overhead, but it is inflexible.

Multilevel Feedback Queue Scheduling

The multilevel feedback queue scheduling algorithm allows a process to move between queues. It uses many ready queues and associates a different priority with each queue.

The algorithm chooses the process with the highest priority from the occupied queues and runs that process either preemptively or non-preemptively. If a process uses too much CPU time, it is moved to a lower-priority queue. Similarly, a process that waits too long in a lower-priority queue may be moved to a higher-priority queue. Note that this form of aging prevents starvation.

A process entering the ready queue is placed in queue 0.

If it does not finish within 8 milliseconds, it is moved to the tail of queue 1.

If it does not complete within the time quantum of queue 1, it is preempted and placed into queue 2.

Processes in queue 2 run on an FCFS basis, and only when queue 0 and queue 1 are empty.
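The three-queue example can be traced with a small simulation. The sketch below is illustrative only: the function name mlfq is hypothetical, all processes are assumed to arrive at time 0 (so no preemption by new arrivals is modelled), and the 16-millisecond quantum for queue 1 is an assumption, since the notes give only the 8-millisecond quantum of queue 0.

```python
from collections import deque


def mlfq(bursts, quanta=(8, 16, None)):
    """Simulate a three-level feedback queue.

    `bursts` maps a process name to its total CPU burst in milliseconds.
    A quantum of None means the queue is FCFS and runs jobs to completion.
    Returns a list of (name, queue_level, milliseconds_run) slices.
    """
    queues = [deque(bursts.items()), deque(), deque()]
    trace = []
    while any(queues):
        # Always serve the highest-priority (lowest-numbered) non-empty queue.
        level = next(i for i, q in enumerate(queues) if q)
        name, remaining = queues[level].popleft()
        quantum = quanta[level]
        ran = remaining if quantum is None else min(quantum, remaining)
        trace.append((name, level, ran))
        if remaining > ran:
            # Quantum expired: demote to the tail of the next lower queue.
            queues[level + 1].append((name, remaining - ran))
    return trace


trace = mlfq({"P1": 5, "P2": 20, "P3": 40})
```

Short jobs such as P1 finish in queue 0, while the long job P3 drifts down to the FCFS queue, where it runs only after the two higher queues have drained.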


Deadlock

A set of processes is in a deadlock state if each process in the set is waiting for an event that can be caused only by another process in the set. In other words, each member of the set of deadlocked processes is waiting for a resource that can be released only by another deadlocked process. None of the processes can run, none of them can release any resources, and none of them can be awakened. Note that the number of processes and the number and kind of resources possessed and requested are unimportant.


The resources may be either physical or logical. Examples of physical resources are printers, tape drives, memory space, and CPU cycles. Examples of logical resources are files, semaphores, and monitors.

The simplest example of deadlock is where process 1 has been allocated non-shareable resource A, say a tape drive, and process 2 has been allocated non-shareable resource B, say a printer. Now, if process 1 needs resource B (the printer) to proceed and process 2 needs resource A (the tape drive) to proceed, and these are the only two processes in the system, each blocks the other and all useful work in the system stops. This situation is termed deadlock. The system is in a deadlock state because each process holds a resource being requested by the other process, and neither process is willing to release the resource it holds.

Preemptable and Nonpreemptable Resources

Resources come in two flavors: preemptable and nonpreemptable. A preemptable resource is one that can be taken away from the process with no ill effects. Memory is an example of a preemptable resource. On the other hand, a nonpreemptable resource is one that cannot be taken away from a process without causing ill effects. For example, CD recorders are not preemptable at an arbitrary moment.

Reallocating resources can resolve deadlocks that involve preemptable resources. Deadlocks that involve nonpreemptable resources are difficult to deal with.

Necessary and Sufficient Deadlock Conditions

1. Mutual Exclusion Condition

The resources involved are non-shareable.

Explanation: At least one resource must be held in a non-shareable mode, that is, only one process at a time claims exclusive control of the resource. If another process requests that resource, the requesting process must be delayed until the resource has been released.

2. Hold and Wait Condition

A requesting process already holds resources while waiting for the requested resources.

Explanation: There must exist a process that is holding a resource already allocated to it while waiting for additional resources that are currently being held by other processes.

3. No-Preemption Condition

Resources already allocated to a process cannot be preempted.

Explanation: Resources cannot be removed from a process until they are used to completion or released voluntarily by the process holding them.

4. Circular Wait Condition

The processes in the system form a circular list or chain where each process in the list is

waiting for a resource held by the next process in the list.

As an example, consider the traffic deadlock in the following figure


Consider each section of the street as a resource.

Mutual exclusion condition applies, since only one vehicle can be on a section of the

street at a time.

Hold-and-wait condition applies, since each vehicle is occupying a section of the street,

and waiting to move on to the next section of the street.

No-preemption condition applies, since a section of the street that is occupied by a vehicle cannot be taken away from it.

Circular wait condition applies, since each vehicle is waiting on the next vehicle to move.

That is, each vehicle in the traffic is waiting for a section of street held by the next

vehicle in the traffic.

The simple rule to avoid traffic deadlock is that a vehicle should only enter an

intersection if it is assured that it will not have to stop inside the intersection.

It is not possible to have a deadlock involving only a single process. Deadlock involves a circular “hold-and-wait” condition between two or more processes, so one process cannot hold a resource yet be waiting for another resource that it itself holds. In addition, deadlock over process-level resources is not possible between two threads in a process, because it is the process that holds those resources, not the thread; that is, each thread has access to the resources held by the process. (Threads can still deadlock on locks that they acquire individually, such as mutexes.)

Deadlock Prevention

Elimination of “Mutual Exclusion” Condition

The mutual exclusion condition must hold for non-shareable resources. That is, several processes cannot simultaneously share a single resource. This condition is difficult to eliminate because some resources, such as the tape drive and printer, are inherently non-shareable. Note that shareable resources, such as a read-only file, do not require mutually exclusive access and thus cannot be involved in deadlock.

Elimination of “Hold and Wait” Condition

There are two possibilities for eliminating the second condition. The first alternative is that a process be granted all of the resources it needs at once, prior to execution. The second alternative is to disallow a process from requesting resources whenever it already holds previously allocated resources. The first strategy requires that all of the resources a process will need be requested at once. The system must grant resources on an “all or none” basis. If the complete set of resources needed by a process is not currently available, then the process must wait until the complete set is available. While the process waits, however, it may not hold any resources. Thus the “wait for” condition is denied and deadlocks simply cannot occur. This strategy can lead to serious waste of resources. For example, a program requiring ten tape drives must request and receive all ten drives before it begins executing. If the program needs only one tape drive to begin execution and does not need the remaining tape drives for several hours, then substantial computer resources (9 tape drives) will sit idle for several hours. This strategy can also cause indefinite postponement (starvation), since not all the required resources may become available at once.

Elimination of “No-preemption” Condition

The no-preemption condition can be alleviated by forcing a process waiting for a resource that cannot immediately be allocated to relinquish all of its currently held resources, so that other processes may use them to finish. Suppose a system does allow processes to hold resources while requesting additional resources. Consider what happens when a request cannot be satisfied. A process holds resources that a second process may need in order to proceed, while the second process may hold the resources needed by the first process. This is a deadlock. This strategy requires that when a process that is holding some resources is denied a request for additional resources, the process must release its held resources and, if necessary, request them again together with the additional resources. Implementation of this strategy effectively denies the “no-preemption” condition.

High cost: when a process releases resources, it may lose all its work to that point. One serious consequence of this strategy is the possibility of indefinite postponement (starvation). A process might be held off indefinitely as it repeatedly requests and releases the same resources.

Elimination of “Circular Wait” Condition

The last condition, the circular wait, can be denied by imposing a total ordering on all of the resource types and then forcing all processes to request the resources in order (increasing or decreasing). That is, each resource type is assigned a number, and each process must request resources in numerical order of enumeration. With this rule, the resource allocation graph can never have a cycle.

For example, provide a global numbering of all the resources, in which the printer is numbered 2, the plotter 3, and the tape drive 4.

Now the rule is this: processes can request resources whenever they want to, but all requests must be made in numerical order. A process may request first a printer and then a tape drive (order: 2, 4), but it may not request first a plotter and then a printer (order: 3, 2). The problem with this strategy is that it may be impossible to find an ordering that satisfies everyone.
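The ordering rule can be illustrated with ordinary locks. In the sketch below, the resource numbers follow the example in the text; everything else (the names RESOURCE_ORDER, acquire_in_order, worker, and the use of Python threads to stand in for processes) is an assumption made for illustration.

```python
import threading

# Global numbering of resources, following the example in the text.
RESOURCE_ORDER = {"printer": 2, "plotter": 3, "tape drive": 4}
LOCKS = {name: threading.Lock() for name in RESOURCE_ORDER}


def acquire_in_order(names):
    """Acquire the named resources in increasing numerical order."""
    ordered = sorted(names, key=RESOURCE_ORDER.__getitem__)
    for name in ordered:
        LOCKS[name].acquire()
    return ordered


def release_all(names):
    for name in names:
        LOCKS[name].release()


def worker(wanted, log):
    held = acquire_in_order(wanted)   # never holds 4 while waiting for 2
    log.append(held)
    release_all(held)


log = []
threads = [
    threading.Thread(target=worker, args=(["tape drive", "printer"], log)),
    threading.Thread(target=worker, args=(["printer", "tape drive"], log)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Both workers end up requesting the printer (2) before the tape drive (4), whatever order they asked in, so neither can hold the higher-numbered resource while waiting for the lower-numbered one.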

Deadlock Avoidance

This approach to the deadlock problem anticipates deadlock before it actually occurs. It employs an algorithm to assess the possibility that deadlock could occur and acts accordingly. This method differs from deadlock prevention, which guarantees that deadlock cannot occur by denying one of the necessary conditions of deadlock.

If the necessary conditions for a deadlock are in place, it is still possible to avoid deadlock by being careful when resources are allocated. Perhaps the most famous deadlock avoidance algorithm, due to Dijkstra [1965], is the Banker’s algorithm, so named because the process is analogous to that used by a banker in deciding if a loan can be safely made.

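In this analogy, the customers are processes, the units of credit are resources, and the banker is the operating system.

Banker’s Algorithm

Customers   Used   Max
A           0      6
B           0      5
C           0      4
D           0      7

Available Units = 10

Fig. 1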

In the above figure, we see four customers, each of whom has been granted a number of credit units. The banker reserved only 10 units rather than 22 units to service them. At a certain moment, the situation becomes:

Customers   Used   Max
A           1      6
B           1      5
C           2      4
D           4      7

Available Units = 2

Fig. 2

Safe State

The key to a state being safe is that there is at least one way for all users to finish. In this analogy, the state of figure 2 is safe because, with 2 units left, the banker can delay any request except C's, thus letting C finish and release all four of its resources. With four units in hand, the banker can let either D or B have the necessary units, and so on.

Unsafe State

Consider what would happen if a request from B for one more unit were granted in the situation above. We would have the following situation:

Customers   Used   Max
A           1      6
B           2      5
C           2      4
D           4      7

Available Units = 1

This is an unsafe state.

If all the customers, namely A, B, C, and D, asked for their maximum loans, then the banker could not satisfy any of them and we would have a deadlock.

Important Note:

An unsafe state does not imply the existence, or even the eventual existence, of a deadlock. What an unsafe state does imply is simply that some unfortunate sequence of events might lead to a deadlock.

The Banker's algorithm is thus to consider each request as it occurs, and see if granting it leads to a safe state. If it does, the request is granted; otherwise, it is postponed until later.
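The safety check for the single-resource case can be sketched as follows, using the numbers from the tables in this section. The function name is_safe and the dictionary layout are assumptions of this sketch, and ties between customers who could finish are broken alphabetically just to make the result deterministic.

```python
def is_safe(used, maximum, available):
    """Return an order in which all customers can finish, or None if unsafe."""
    need = {c: maximum[c] - used[c] for c in used}
    free = available
    unfinished = set(used)
    order = []
    while unfinished:
        # Customers whose remaining need fits in the units currently free.
        runnable = sorted(c for c in unfinished if need[c] <= free)
        if not runnable:
            return None           # nobody is guaranteed to finish: unsafe
        c = runnable[0]
        free += used[c]           # the customer finishes and repays the loan
        unfinished.remove(c)
        order.append(c)
    return order


maximum = {"A": 6, "B": 5, "C": 4, "D": 7}
safe_state = {"A": 1, "B": 1, "C": 2, "D": 4}      # 2 units available
unsafe_state = {"A": 1, "B": 2, "C": 2, "D": 4}    # 1 unit available
```

A request would be granted only if the state that results from granting it still passes is_safe; otherwise the request is postponed.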

Deadlock Detection

Deadlock detection is the process of actually determining that a deadlock exists and identifying the processes and resources involved in the deadlock.

The basic idea is to check allocation against resource availability for all possible allocation sequences to determine if the system is in a deadlocked state. Of course, the deadlock detection algorithm is only half of this strategy. Once a deadlock is detected, there needs to be a way to recover. Several alternatives exist:

Temporarily preempt resources from deadlocked processes.

Back off a process to some checkpoint, allowing preemption of a needed resource, and restart the process at the checkpoint later.

Successively kill processes until the system is deadlock free.

These methods are expensive in the sense that each iteration calls the detection algorithm until the system proves to be deadlock free. The complexity of the algorithm is O(N^2), where N is the number of processes. Another potential problem is starvation: the same process may be killed repeatedly.
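One common way to detect deadlock, when every resource has a single instance, is to look for a cycle in a wait-for graph. This is a swapped-in simplification, not the allocation-sequence check described above. The function name find_deadlock and the dictionary representation are assumptions of this sketch.

```python
def find_deadlock(waits_for):
    """Return the set of processes that lie on a cycle of the wait-for graph.

    `waits_for` maps each process to the processes that hold a resource
    it is waiting for.
    """
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited, on current path, done
    color = {p: WHITE for p in waits_for}
    deadlocked = set()

    def visit(p, path):
        color[p] = GREY
        path.append(p)
        for q in waits_for.get(p, ()):
            state = color.setdefault(q, WHITE)
            if state == GREY:
                # q is already on the current path: the path from q onward is a cycle.
                deadlocked.update(path[path.index(q):])
            elif state == WHITE:
                visit(q, path)
        path.pop()
        color[p] = BLACK

    for p in list(waits_for):
        if color[p] == WHITE:
            visit(p, [])
    return deadlocked
```

For the two-process example given earlier, find_deadlock({"P1": ["P2"], "P2": ["P1"]}) reports both processes, which are then the candidates for preemption, rollback, or termination.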


File System Implementation

File-System Structure

File-System Implementation

Directory Implementation

Allocation Methods

Free-Space Management

Efficiency and Performance

Recovery

Log-Structured File Systems

NFS

Example: WAFL File System

Objectives

To describe the details of implementing local file systems and directory structures

To describe the implementation of remote file systems

To discuss block allocation and free-block algorithms and trade-offs

File-System Structure

File structure

Logical storage unit

Collection of related information

File system resides on secondary storage (disks)

File system organized into layers

File control block – storage structure consisting of information about a file

Layered File System


A Typical File Control Block


The following figure illustrates the necessary file system structures provided by the

operating systems.

Virtual File Systems

Virtual File Systems (VFS) provide an object-oriented way of implementing file

systems. VFS allows the same system call interface (the API) to be used for different

types of file systems.

The API is to the VFS interface, rather than any specific type of file system.

Schematic View of Virtual File System


Directory Implementation

Linear list of file names with pointer to the data blocks.

simple to program

time-consuming to execute

Hash Table – linear list with hash data structure.

decreases directory search time

collisions – situations where two file names hash to the same location

fixed size

Allocation Methods

An allocation method refers to how disk blocks are allocated for files:


Contiguous allocation

Linked allocation


Indexed allocation

Contiguous Allocation

Each file occupies a set of contiguous blocks on the disk

Simple – only starting location (block #) and length (number of blocks) are required

Random access

Wasteful of space (dynamic storage-allocation problem)

Files cannot grow

Mapping from logical to physical

Contiguous Allocation of Disk Space
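The logical-to-physical mapping for contiguous allocation can be sketched as below. The formula itself did not survive in these notes, so this uses the usual quotient/remainder form; the 512-word block size is borrowed from the indexed-allocation example later in the section, and the function name contiguous_lookup is hypothetical.

```python
BLOCK_SIZE = 512  # words per block (assumed for this sketch)


def contiguous_lookup(start_block, length, logical_address):
    """Map a logical word address to (physical block, offset within block)."""
    q, r = divmod(logical_address, BLOCK_SIZE)
    if not 0 <= q < length:
        raise ValueError("address outside the file")
    # Block to access = starting block + quotient; displacement = remainder.
    return start_block + q, r
```

Because the block number is computed directly from the starting block, any position in the file can be reached without reading the blocks before it, which is why contiguous allocation supports random access.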

Extent-Based Systems

Many newer file systems (e.g., Veritas File System) use a modified contiguous allocation scheme

Extent-based file systems allocate disk blocks in extents

An extent is a contiguous set of disk blocks

Extents are allocated for file allocation

A file consists of one or more extents.

Linked Allocation

Each file is a linked list of disk blocks: blocks may be scattered anywhere on the disk.

Simple – need only starting address

Free-space management system – no waste of space


No random access


Indexed Allocation

Brings all pointers together into the index block.

Logical view.

Need index table

Random access

Dynamic access without external fragmentation, but has the overhead of an index block.

Mapping from logical to physical in a file of maximum size of 256K words and block size of 512 words: we need only 1 block for the index table.

Mapping from logical to physical in a file of unbounded length (block size of 512 words).
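For the bounded case (256K words, 512-word blocks), the arithmetic and the lookup can be sketched as follows. The function name indexed_lookup and the sample index block are hypothetical, and the sketch assumes each index entry occupies one word.

```python
BLOCK_SIZE = 512  # words per block, as in the text


def indexed_lookup(index_block, logical_address):
    """Map a logical word address to (physical block, offset) via the index block."""
    # q selects the entry in the index table; r is the displacement in the block.
    q, r = divmod(logical_address, BLOCK_SIZE)
    return index_block[q], r


# A 256K-word file has 256K / 512 = 512 blocks, so 512 index entries are
# needed, which fit in a single 512-word index block.
entries_needed = (256 * 1024) // BLOCK_SIZE

sample_index = [9, 16, 1, 10, 25]   # hypothetical block numbers
```

For example, logical address 1030 falls in the third entry of the index block (q = 2) at displacement 6.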

Free-Space Management

Bit map requires extra space

Example:

block size = 2^12 bytes

disk size = 2^30 bytes (1 gigabyte)

n = 2^30/2^12 = 2^18 bits (or 32K bytes)

Easy to get contiguous files

Linked list (free list)

Cannot get contiguous space easily

No waste of space
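The bit-map size calculation and a first-free-block search can be sketched as below. The function names are hypothetical, and the sketch assumes that a bit value of 1 means "allocated" and 0 means "free", which matches the allocation steps given later in these notes (set bit[i] = 1, then allocate block[i]).

```python
def bitmap_size_bytes(disk_bytes, block_bytes):
    """Bytes needed for a bit map holding one bit per disk block."""
    n_blocks = disk_bytes // block_bytes
    return n_blocks // 8


def first_free_block(bitmap):
    """Index of the first free block (bit == 0), or None if the disk is full."""
    for i, bit in enumerate(bitmap):
        if bit == 0:
            return i
    return None
```

A run of consecutive 0 bits corresponds directly to a run of free blocks, which is why contiguous space is easy to find with a bit map and hard to find with a linked free list.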



Free-Space Management

Need to protect:

Pointer to free list

Bit map

Must be kept on disk

Copy in memory and disk may differ

Cannot allow for block[i] to have a situation where bit[i] = 1 in memory and bit[i] = 0 on disk

Solution:

Set bit[i] = 1 in disk

Allocate block[i]

Set bit[i] = 1 in memory


Linked Free Space List on Disk

Efficiency and Performance

Efficiency dependent on:

disk allocation and directory algorithms

types of data kept in file’s directory entry


Performance:

disk cache – separate section of main memory for frequently used blocks

free-behind and read-ahead – techniques to optimize sequential access

improve PC performance by dedicating section of memory as virtual disk, or RAM disk


Page Cache

A page cache caches pages rather than disk blocks using virtual memory techniques

Memory-mapped I/O uses a page cache

Routine I/O through the file system uses the buffer (disk) cache

This leads to the following figure

I/O Without a Unified Buffer Cache


Unified Buffer Cache

A unified buffer cache uses the same page cache to cache both memory-mapped

pages and ordinary file system I/O

I/O Using a Unified Buffer Cache


Recovery

Consistency checking – compares data in directory structure with data blocks on disk, and tries to fix inconsistencies

Use system programs to back up data from disk to another storage device (floppy disk, magnetic tape, other magnetic disk, optical)

Recover lost file or disk by restoring data from backup

Log Structured File Systems

Log structured (or journaling) file systems record each update to the file system

as a transaction

All transactions are written to a log

A transaction is considered committed once it is written to the log

However, the file system may not yet be updated

The transactions in the log are asynchronously written to the file system

When the file system is modified, the transaction is removed from the log

If the file system crashes, all remaining transactions in the log must still be performed
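The commit, apply, and recover steps listed above can be mimicked with a toy in-memory model. This is only a sketch: the class JournaledStore and its methods are hypothetical, a dictionary stands in for the file system, and the log is simply assumed to survive a crash.

```python
class JournaledStore:
    """Toy key-value 'file system' with a transaction log."""

    def __init__(self):
        self.log = []     # committed transactions not yet applied
        self.data = {}    # the file system proper

    def commit(self, key, value):
        # A transaction counts as committed once it is written to the log,
        # even though the file system itself is not yet updated.
        self.log.append((key, value))

    def apply_one(self):
        # Write the oldest logged transaction to the file system,
        # then remove it from the log.
        key, value = self.log[0]
        self.data[key] = value
        self.log.pop(0)

    def recover(self):
        # After a crash, every transaction remaining in the log is performed.
        while self.log:
            self.apply_one()
```

In this model a crash between commit and apply_one loses nothing, because recover replays whatever is still in the log.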


The Sun Network File System (NFS)

An implementation and a specification of a software system for accessing remote files across LANs (or WANs).

The implementation is part of the Solaris and SunOS operating systems running on Sun workstations, using an unreliable datagram protocol (UDP/IP) and Ethernet.

Interconnected workstations are viewed as a set of independent machines with independent file systems, which allows sharing among these file systems in a transparent manner.

A remote directory is mounted over a local file system directory. The mounted directory looks like an integral subtree of the local file system, replacing the subtree descending from the local directory.

Specification of the remote directory for the mount operation is nontransparent; the host name of the remote directory has to be provided. Files in the remote directory can then be accessed in a transparent manner.

Subject to access-rights accreditation, potentially any file system (or directory within a file system) can be mounted remotely on top of any local directory.

NFS is designed to operate in a heterogeneous environment of different machines, operating systems, and network architectures; the NFS specification is independent of these media.

This independence is achieved through the use of RPC primitives built on top of an External Data Representation (XDR) protocol used between two implementation-independent interfaces.

The NFS specification distinguishes between the services provided by a mount mechanism and the actual remote-file-access services.

NFS Mount Protocol

Establishes initial logical connection between server and client

Mount operation includes name of remote directory to be mounted and name of server machine storing it

Mount request is mapped to corresponding RPC and forwarded to mount server running on server machine

Export list – specifies local file systems that server exports for mounting, along with names of machines that are permitted to mount them

Following a mount request that conforms to its export list, the server returns a file handle, a key for further accesses

File handle – a file-system identifier, and an inode number to identify the mounted directory within the exported file system

The mount operation changes only the user’s view and does not affect the server side

NFS Protocol

Provides a set of remote procedure calls for remote file operations. The procedures

support the following operations:

searching for a file within a directory

reading a set of directory entries

manipulating links and directories

accessing file attributes

reading and writing files

NFS servers are stateless; each request has to provide a full set of arguments (NFS V4 is just becoming available – very different, stateful)

Modified data must be committed to the server’s disk before results are returned to the client (lose advantages of caching)

The NFS protocol does not provide concurrency-control mechanisms

Three Major Layers of NFS Architecture

UNIX file-system interface (based on the open, read, write, and close calls, and file descriptors)

Virtual File System (VFS) layer – distinguishes local files from remote ones, and local files are further distinguished according to their file-system types

The VFS activates file-system-specific operations to handle local requests according to their file-system types

Calls the NFS protocol procedures for remote requests

NFS service layer – bottom layer of the architecture

Implements the NFS protocol

NFS Path-Name Translation

Performed by breaking the path into component names and performing a separate NFS lookup call for every pair of component name and directory vnode

To make lookup faster, a directory name lookup cache on the client’s side holds the vnodes for remote directory names
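Component-by-component translation with a client-side lookup cache can be sketched as follows. Nothing here is a real NFS API: translate, fake_lookup, and the string "vnodes" are all hypothetical stand-ins, with fake_lookup playing the role of one lookup RPC to the server.

```python
def translate(path, root, lookup, cache):
    """Resolve `path` one component at a time, starting at directory vnode `root`.

    `lookup(dir_vnode, name)` stands in for one remote lookup call.
    `cache` plays the role of the directory name lookup cache.
    """
    vnode = root
    for name in filter(None, path.split("/")):
        key = (vnode, name)
        if key not in cache:
            cache[key] = lookup(vnode, name)   # one remote call per component
        vnode = cache[key]
    return vnode


# A pretend server-side directory tree and a lookup that counts its calls.
tree = {("root", "usr"): "v_usr", ("v_usr", "local"): "v_local"}
calls = []


def fake_lookup(dir_vnode, name):
    calls.append((dir_vnode, name))
    return tree[(dir_vnode, name)]


cache = {}
first = translate("/usr/local", "root", fake_lookup, cache)
second = translate("/usr/local", "root", fake_lookup, cache)
```

The first translation costs one lookup per path component; the second is answered entirely from the cache.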

NFS Remote Operations

Nearly one-to-one correspondence between regular UNIX system calls and the NFS protocol RPCs (except opening and closing files)

NFS adheres to the remote-service paradigm, but employs buffering and caching techniques for the sake of performance

File-blocks cache – when a file is opened, the kernel checks with the remote server whether to fetch or revalidate the cached attributes

Cached file blocks are used only if the corresponding cached attributes are up to date

File-attribute cache – the attribute cache is updated whenever new attributes arrive from the server

Clients do not free delayed-write blocks until the server confirms that the data have been written to disk

Example: WAFL File System

Used on Network Appliance “Filers” – distributed file system appliances

“Write-anywhere file layout”

Serves up NFS, CIFS, http, ftp

Random I/O optimized, write optimized

NVRAM for write caching

Similar to Berkeley Fast File System, with extensive modifications


File-System Interface

File Concept

Access Methods

Directory Structure

File-System Mounting

File Sharing

Protection

Objectives

To explain the function of file systems

To describe the interfaces to file systems

To discuss file-system design tradeoffs, including access methods, file sharing, file locking, and directory structures

To explore file-system protection

File Concept

Contiguous logical address space







File Structure

None - sequence of words, bytes

Simple record structure


Fixed length


Variable length

Complex Structures

Formatted document

Relocatable load file

Can simulate last two with first method by inserting appropriate control characters

Who decides:

Operating system


File Attributes

Name – only information kept in human-readable form

Identifier – unique tag (number) identifies file within file system

Type – needed for systems that support different types

Location – pointer to file location on device

Size – current file size

Protection – controls who can do reading, writing, executing

Time, date, and user identification – data for protection, security, and usage monitoring

Information about files is kept in the directory structure, which is maintained on the disk

File Operations

File is an abstract data type

Reposition within file

Open(Fi) – search the directory structure on disk for entry Fi, and move the content of entry to memory

Close(Fi) – move the content of entry Fi in memory to directory structure on disk

Open Files

Several pieces of data are needed to manage open files:

File pointer: pointer to last read/write location, per process that has the file open

File-open count: counter of number of times a file is open – to allow removal of data from open-file table when last process closes it

Disk location of the file: cache of data access information

Access rights: per-process access mode information

Open File Locking

Provided by some operating systems and file systems

Mediates access to a file

Mandatory or advisory:

Mandatory – access is denied depending on locks held and requested

Advisory – processes can find status of locks and decide what to do

File Locking Example – Java API

import java.io.*;
import java.nio.channels.*;

public class LockingExample {
    public static final boolean EXCLUSIVE = false;
    public static final boolean SHARED = true;

    public static void main(String[] args) throws IOException {
        FileLock sharedLock = null;
        FileLock exclusiveLock = null;
        try {
            RandomAccessFile raf = new RandomAccessFile("file.txt", "rw");
            // get the channel for the file
            FileChannel ch = raf.getChannel();
            // lock(position, size, shared): the second argument is the size of the region
            // this locks the first half of the file - exclusive
            exclusiveLock = ch.lock(0, raf.length()/2, EXCLUSIVE);
            /** Now modify the data . . . */
            // release the lock
            exclusiveLock.release();
            // this locks the second half of the file - shared
            sharedLock = ch.lock(raf.length()/2+1, raf.length(), SHARED);
            /** Now read the data . . . */
            // release the lock
            sharedLock.release();
        } catch (java.io.IOException ioe) {
            System.err.println(ioe);
        } finally {
            // releasing a lock that was already released has no effect
            if (exclusiveLock != null)
                exclusiveLock.release();
            if (sharedLock != null)
                sharedLock.release();
        }
    }
}

File Types – Name, Extension

Access Methods

Sequential Access

read next

write next



no read after last write


Direct Access

read n

write n

position to n

read next

write next

rewrite n

n = relative block number

Sequential-access File

Simulation of Sequential Access on a Direct-access File
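Simulating sequential access on a direct-access file amounts to keeping a current-position variable. In the sketch below, the class DirectAccessFile and the variable name cp are assumptions, and a Python list of blocks stands in for the file.

```python
class DirectAccessFile:
    """Direct-access file with sequential operations layered on top."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.cp = 0                      # current position (relative block number)

    # Direct-access operations
    def read(self, n):                   # read n
        return self.blocks[n]

    def write(self, n, block):           # write n
        if n == len(self.blocks):
            self.blocks.append(block)    # writing just past the end grows the file
        else:
            self.blocks[n] = block

    # Sequential operations expressed through the direct ones
    def reset(self):                     # reset:       cp = 0
        self.cp = 0

    def read_next(self):                 # read next:   read cp; cp = cp + 1
        block = self.read(self.cp)
        self.cp += 1
        return block

    def write_next(self, block):         # write next:  write cp; cp = cp + 1
        self.write(self.cp, block)
        self.cp += 1
```

The reverse simulation, direct access on a purely sequential file, would be far less efficient, since reaching block n means reading the n blocks before it.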

Example of Index and Relative Files

Directory Structure

A collection of nodes containing information about all files

A Typical File-system Organization

Operations Performed on Directory

Search for a file

Create a file

Delete a file

List a directory

Rename a file

Traverse the file system

Organize the Directory (Logically) to Obtain

Efficiency – locating a file quickly

Naming – convenient to users

Two users can have same name for different files

The same file can have several different names

Grouping – logical grouping of files by properties (e.g., all Java programs, all games, …)

Single-Level Directory

A single directory for all users

Two-Level Directory

Separate directory for each user

Tree-Structured Directories

Efficient searching

Grouping Capability

Current directory (working directory)

Absolute or relative path name

Creating a new file is done in current directory

Delete a file

rm <file-name>

Creating a new subdirectory is done in current directory

mkdir <dir-name>

Example: if in current directory /mail

mkdir count

Acyclic-Graph Directories

Have shared subdirectories and files

Two different names (aliasing)

If dict deletes list, a dangling pointer results

Solutions:

Backpointers, so we can delete all pointers

Variable size records a problem

Backpointers using a daisy chain organization

Entry-hold-count solution

New directory entry type

Link – another name (pointer) to an existing file

Resolve the link – follow pointer to locate the file

General Graph Directory

How do we guarantee no cycles?

Allow only links to files, not subdirectories

Garbage collection

Every time a new link is added, use a cycle detection algorithm to determine whether it is OK

File System Mounting

A file system must be mounted before it can be accessed

An unmounted file system (i.e., Fig. 11-11(b)) is mounted at a mount point

(a) Existing. (b) Unmounted Partition

Mount Point

File Sharing

Sharing of files on multi-user systems is desirable

Sharing may be done through a protection scheme

On distributed systems, files may be shared across a network

Network File System (NFS) is a common distributed file-sharing method

File Sharing – Multiple Users

User IDs identify users, allowing permissions and protections to be per-user

Group IDs allow users to be in groups, permitting group access rights

File Sharing – Remote File Systems

Uses networking to allow file system access between systems


Manually via programs like FTP

Automatically, seamlessly using distributed file systems

Semi automatically via the world wide web

Client-server model allows clients to mount remote file systems from servers

Server can serve multiple clients

Client and user-on-client identification is insecure or complicated

NFS is standard UNIX client-server file sharing protocol

CIFS is standard Windows protocol

Standard operating system file calls are translated into remote calls

Distributed Information Systems (distributed naming services) such as LDAP, DNS, NIS, Active Directory implement unified access to information needed for remote computing

File Sharing – Failure Modes

Remote file systems add new failure modes, due to network failure, server failure

Recovery from failure can involve state information about status of each remote request

Stateless protocols such as NFS include all information in each request, allowing easy

recovery but less security

File Sharing – Consistency Semantics

Consistency semantics specify how multiple users are to access a shared file

simultaneously Similar to process synchronization algorithms.

Tend to be less complex due to disk I/O and network latency (for remote file systems

Andrew File System (AFS) implemented complex remote file sharing semantics

Unix file system (UFS) implements:

Writes to an open file visible immediately to other users of the same open file

Sharing file pointer to allow multiple users to read and write concurrently

AFS has session semantics

Writes only visible to sessions starting after the file is closed


File owner/creator should be able to control:

Types of access







Access Lists and Groups

Mode of access: read, write, execute

Three classes of users

                        RWX
a) owner access    7 => 111
b) group access    6 => 110
c) public access   1 => 001

Ask manager to create a group (unique name), say G, and add some users to the group.

For a particular file (say game) or subdirectory, define an appropriate access.
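The three access classes above combine into one octal mode, here 761. A minimal sketch, assuming the UNIX bit order read, write, execute within each class; the helper name mode_string is made up for illustration.

```python
def mode_string(mode: int) -> str:
    """Render a 9-bit UNIX-style mode (e.g. 0o761) as three rwx triples."""
    flags = "rwx"
    out = []
    for shift in (6, 3, 0):                 # owner, group, public
        bits = (mode >> shift) & 0b111      # the three bits of this class
        out.append("".join(f if bits & (4 >> i) else "-"
                           for i, f in enumerate(flags)))
    return "".join(out)

print(mode_string(0o761))   # rwxrw---x
```

On a UNIX system the same mode would be applied to the file with chmod 761 game.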

Mass-Storage Systems

Overview of Mass Storage Structure

Disk Structure

Disk Attachment

Disk Scheduling

Disk Management

Swap-Space Management

RAID Structure

Stable-Storage Implementation

Tertiary Storage Devices

Operating System Issues

Performance Issues


Describe the physical structure of secondary and tertiary storage devices and

the resulting effects on the uses of the devices

Explain the performance characteristics of mass-storage devices

Discuss operating-system services provided for mass storage, including RAID and HSM

Overview of Mass Storage Structure

Magnetic disks provide bulk of secondary storage of modern computers

Drives rotate at 60 to 200 times per second

Transfer rate is rate at which data flow between drive and computer

Positioning time (random-access time) is time to move disk arm to desired cylinder

(seek time) and time for desired sector to rotate under the disk head (rotational latency)

Head crash results from disk head making contact with the disk surface

That’s bad

Disks can be removable

Drive attached to computer via I/O bus

Busses vary, including EIDE, ATA, SATA, USB, Fibre Channel, SCSI

Host controller in computer uses bus to talk to disk controller built into drive or storage array


Moving-head Disk Mechanism

Magnetic tape

Was early secondary-storage medium

Relatively permanent and holds large quantities of data

Access time slow

Random access ~1000 times slower than disk

Mainly used for backup, storage of infrequently-used data, transfer medium between systems


Kept in spool and wound or rewound past read-write head

Once data under head, transfer rates comparable to disk

20-200GB typical storage

Common technologies are 4mm, 8mm, 19mm, LTO-2 and SDLT

Disk Structure

Disk drives are addressed as large 1-dimensional arrays of logical blocks, where the

logical block is the smallest unit of transfer.

The 1-dimensional array of logical blocks is mapped into the sectors of the disk sequentially.


Sector 0 is the first sector of the first track on the outermost cylinder.

Mapping proceeds in order through that track, then the rest of the tracks in that cylinder,

and then through the rest of the cylinders from outermost to innermost.
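The mapping order described above can be sketched as follows. This is an idealized sketch, assuming every track holds the same number of sectors; real drives store more sectors on outer tracks and remap bad sectors, so the arithmetic is not exact for them. The function name and the geometry are hypothetical.

```python
def block_to_chs(block: int, heads: int, sectors_per_track: int):
    """Map a logical block number to (cylinder, head, sector).

    Idealized: every track has the same number of sectors, and
    cylinder 0 is the outermost cylinder.
    """
    # Blocks fill one track, then the other tracks of the same cylinder,
    # then move on to the next cylinder.
    cylinder, rest = divmod(block, heads * sectors_per_track)
    head, sector = divmod(rest, sectors_per_track)
    return cylinder, head, sector

# Hypothetical geometry: 4 tracks per cylinder, 8 sectors per track.
print(block_to_chs(0, 4, 8))    # (0, 0, 0)
print(block_to_chs(8, 4, 8))    # (0, 1, 0)
print(block_to_chs(32, 4, 8))   # (1, 0, 0)
```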

Disk Attachment

Host-attached storage accessed through I/O ports talking to I/O busses

SCSI itself is a bus, up to 16 devices on one cable, SCSI initiator requests operation and

SCSI targets perform tasks

Each target can have up to 8 logical units (disks attached to device controller)

FC is high-speed serial architecture

Can be switched fabric with 24-bit address space – the basis of storage area networks

(SANs) in which many hosts attach to many storage units

Can be arbitrated loop (FC-AL) of 126 devices

Network-Attached Storage

Network-attached storage (NAS) is storage made available over a network rather than

over a local connection (such as a bus)

NFS and CIFS are common protocols

Implemented via remote procedure calls (RPCs) between host and storage

New iSCSI protocol uses IP network to carry the SCSI protocol

Storage Area Network

Common in large storage environments (and becoming more common)

Multiple hosts attached to multiple storage arrays - flexible

Disk Scheduling

The operating system is responsible for using hardware efficiently — for the disk drives,

this means having a fast access time and disk bandwidth.

Access time has two major components

Seek time is the time for the disk arm to move the heads to the cylinder containing the

desired sector.

Rotational latency is the additional time waiting for the disk to rotate the desired sector to

the disk head.

Minimize seek time

Seek time is approximately proportional to seek distance

Disk bandwidth is the total number of bytes transferred, divided by the total time between

the first request for service and the completion of the last transfer.

Several algorithms exist to schedule the servicing of disk I/O requests.

We illustrate them with a request queue of cylinder numbers (cylinders 0-199):

98, 183, 37, 122, 14, 124, 65, 67

Head pointer starts at cylinder 53



SSTF

Selects the request with the minimum seek time from the current head position.

SSTF scheduling is a form of SJF scheduling; may cause starvation of some requests.

Illustration shows total head movement of 236 cylinders.
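The 236-cylinder total can be checked with a short calculation on the request queue given earlier (head at 53). A minimal sketch; the function names are illustrative, and first-come-first-served (FCFS) order is included only as a baseline for comparison.

```python
def total_movement(start, order):
    """Sum of cylinder distances when requests are served in `order`."""
    total, pos = 0, start
    for cyl in order:
        total += abs(cyl - pos)
        pos = cyl
    return total

def sstf_order(start, requests):
    """Repeatedly pick the pending request closest to the current head position."""
    pending, pos, order = list(requests), start, []
    while pending:
        nxt = min(pending, key=lambda c: abs(c - pos))
        pending.remove(nxt)
        order.append(nxt)
        pos = nxt
    return order

queue = [98, 183, 37, 122, 14, 124, 65, 67]
head = 53

print(total_movement(head, queue))   # served in arrival order (FCFS): 640
order = sstf_order(head, queue)
print(order)                         # [65, 67, 37, 14, 98, 122, 124, 183]
print(total_movement(head, order))   # SSTF: 236
```

Note how 98 to 183 wait while the head services the nearby low cylinders first; a steady stream of nearby requests could postpone them indefinitely, which is the starvation risk mentioned above.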


SCAN

The disk arm starts at one end of the disk and moves toward the other end, servicing requests until it gets to the other end of the disk, where the head movement is reversed and servicing continues.

Sometimes called the elevator algorithm.

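Illustration shows total head movement of 236 cylinders if the head starts at 53 and sweeps down to cylinder 0 before reversing (53 + 183); the figure of 208 applies only when the arm reverses at cylinder 14, the last request in that direction.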
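A minimal sketch comparing the elevator sweep with the variant that reverses at the last request (LOOK), on the same queue. It assumes the head starts at 53 and first moves toward cylinder 0; the direction and the function names are assumptions, not something the notes state.

```python
def total_movement(start, positions):
    """Sum of cylinder distances along a sequence of head positions."""
    total, pos = 0, start
    for cyl in positions:
        total += abs(cyl - pos)
        pos = cyl
    return total

def scan_positions(start, requests, lowest=0):
    """SCAN: sweep down to the edge of the disk (`lowest`), then back up."""
    down = sorted((c for c in requests if c <= start), reverse=True)
    up = sorted(c for c in requests if c > start)
    return down + [lowest] + up

def look_positions(start, requests):
    """LOOK: same sweep, but reverse at the last request, not at the disk edge."""
    down = sorted((c for c in requests if c <= start), reverse=True)
    up = sorted(c for c in requests if c > start)
    return down + up

queue = [98, 183, 37, 122, 14, 124, 65, 67]
print(total_movement(53, scan_positions(53, queue)))   # 236 (53 down to 0, then 0 up to 183)
print(total_movement(53, look_positions(53, queue)))   # 208 (53 down to 14, then 14 up to 183)
```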


C-SCAN

Provides a more uniform wait time than SCAN.

The head moves from one end of the disk to the other, servicing requests as it goes. When it reaches the other end, however, it immediately returns to the beginning of the disk, without servicing any requests on the return trip.

Treats the cylinders as a circular list that wraps around from the last cylinder to the first one.



C-LOOK

Version of C-SCAN.

Arm only goes as far as the last request in each direction, then reverses direction immediately, without first going all the way to the end of the disk.
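The circular variants can be sketched on the same queue (head at 53, cylinders 0-199). This sketch assumes the head services requests while moving upward and counts the return seek as head movement; both are assumptions, and the function names are illustrative.

```python
def total_movement(start, positions):
    """Sum of cylinder distances along a sequence of head positions."""
    total, pos = 0, start
    for cyl in positions:
        total += abs(cyl - pos)
        pos = cyl
    return total

def c_scan_positions(start, requests, lowest=0, highest=199):
    """C-SCAN: serve upward to the disk edge, jump to the other edge, serve upward again."""
    up = sorted(c for c in requests if c >= start)
    wrapped = sorted(c for c in requests if c < start)
    return up + [highest, lowest] + wrapped

def c_look_positions(start, requests):
    """C-LOOK: same service order, but travel only as far as the last request."""
    up = sorted(c for c in requests if c >= start)
    wrapped = sorted(c for c in requests if c < start)
    return up + wrapped

queue = [98, 183, 37, 122, 14, 124, 65, 67]
print(c_look_positions(53, queue))                        # [65, 67, 98, 122, 124, 183, 14, 37]
print(total_movement(53, c_scan_positions(53, queue)))    # 382
print(total_movement(53, c_look_positions(53, queue)))    # 322
```

Both variants serve the requests in the same order; C-LOOK only saves the travel to and from the disk edges.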

Selecting a Disk-Scheduling Algorithm

SSTF is common and has a natural appeal

SCAN and C-SCAN perform better for systems that place a heavy load on the disk.

Performance depends on the number and types of requests.

Requests for disk service can be influenced by the file-allocation method.

The disk-scheduling algorithm should be written as a separate module of the operating

system, allowing it to be replaced with a different algorithm if necessary.

Either SSTF or LOOK is a reasonable choice for the default algorithm.

Disk Management

Low-level formatting, or physical formatting — Dividing a disk into sectors that the disk

controller can read and write.

To use a disk to hold files, the operating system still needs to record its own data

structures on the disk.

Partition the disk into one or more groups of cylinders.

Logical formatting or “making a file system”.

Boot block initializes system.

The bootstrap is stored in ROM.

Bootstrap loader program.

Methods such as sector sparing used to handle bad blocks.

Booting from a Disk in Windows 2000

Swap-Space Management

Swap-space — Virtual memory uses disk space as an extension of main memory.

Swap-space can be carved out of the normal file system, or, more commonly, it can be in a separate disk partition.

Swap-space management

4.3BSD allocates swap space when process starts; holds text segment (the program) and

data segment.

Kernel uses swap maps to track swap-space use.

Solaris 2 allocates swap space only when a page is forced out of physical memory,

not when the virtual memory page is first created.

Data Structures for Swapping on Linux Systems

RAID Structure

RAID – multiple disk drives provide reliability via redundancy.

RAID is arranged into six different levels.

RAID (cont)

Several improvements in disk-use techniques involve the use of multiple disks working cooperatively.


Disk striping uses a group of disks as one storage unit.

RAID schemes improve performance and improve the reliability of the storage system

by storing redundant data.

Mirroring or shadowing keeps duplicate of each disk.

Block interleaved parity uses much less redundancy.
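Why parity needs less redundancy than mirroring: one parity block, computed as the XOR of the data blocks, is enough to rebuild any single lost block. A minimal sketch with made-up block contents; the function name is illustrative and no particular RAID level's layout is implied.

```python
from functools import reduce

def parity(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three data disks (contents invented for the example) and one parity disk.
data = [b"\x0f\xf0", b"\x33\x55", b"\xaa\x01"]
p = parity(data)

# Disk 1 fails: XOR the surviving data blocks with the parity block.
rebuilt = parity([data[0], data[2], p])
print(rebuilt == data[1])   # True
```

Here three data disks cost one extra disk, where mirroring would cost three.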

RAID Levels

RAID (0 + 1) and (1 + 0)

Stable-Storage Implementation

Write-ahead log scheme requires stable storage.

To implement stable storage:

Replicate information on more than one nonvolatile storage media with independent

failure modes.

Update information in a controlled manner to ensure that we can recover the stable data

after any failure during data transfer or recovery.

Tertiary Storage Devices

Low cost is the defining characteristic of tertiary storage.

Generally, tertiary storage is built using removable media

Common examples of removable media are floppy disks and CD-ROMs; other types

are available.

Removable Disks

Floppy disk — thin flexible disk coated with magnetic material, enclosed in a protective

plastic case.

Most floppies hold about 1 MB; similar technology is used for removable disks that hold

more than 1 GB.

Removable magnetic disks can be nearly as fast as hard disks, but they are at a

greater risk of damage from exposure.

Removable Disks (Cont.)

A magneto-optic disk records data on a rigid platter coated with magnetic

material. Laser heat is used to amplify a large, weak magnetic field to record a bit.

Laser light is also used to read data (Kerr effect).

The magneto-optic head flies much farther from the disk surface than a magnetic disk

head, and the magnetic material is covered with a protective layer of plastic or glass;

resistant to head crashes.

Optical disks do not use magnetism; they employ special materials that are altered by laser light.

WORM Disks

The data on read-write disks can be modified over and over.

WORM (“Write Once, Read Many Times”) disks can be written only once.

Thin aluminum film sandwiched between two glass or plastic platters.

To write a bit, the drive uses a laser light to burn a small hole through the aluminum;

information can be destroyed but not altered.

Very durable and reliable.

Read-only disks, such as CD-ROM and DVD, come from the factory with the data pre-recorded.



Compared to a disk, a tape is less expensive and holds more data, but random access is

much slower.

Tape is an economical medium for purposes that do not require fast random access, e.g.,

backup copies of disk data, holding huge volumes of data.

Large tape installations typically use robotic tape changers that move tapes between

tape drives and storage slots in a tape library.

stacker – library that holds a few tapes

silo – library that holds thousands of tapes

A disk-resident file can be archived to tape for low cost storage; the computer can stage it

back into disk storage for active use.

Operating System Issues

Major OS jobs are to manage physical devices and to present a virtual machine

abstraction to applications

For hard disks, the OS provides two abstractions:

Raw device – an array of data blocks.

File system – the OS queues and schedules the interleaved requests from several applications.


Application Interface

Most OSs handle removable disks almost exactly like fixed disks — a new cartridge is

formatted and an empty file system is generated on the disk.

Tapes are presented as a raw storage medium, i.e., an application does not open a file on the tape, it opens the whole tape drive as a raw device.

Usually the tape drive is reserved for the exclusive use of that application.

Since the OS does not provide file system services, the application must decide how to

use the array of blocks.

Since every application makes up its own rules for how to organize a tape, a tape full of

data can generally only be used by the program that created it.

Tape Drives

The basic operations for a tape drive differ from those of a disk drive.

locate positions the tape to a specific logical block, not an entire track (corresponds to seek).


The read position operation returns the logical block number where the tape head is.

The space operation enables relative motion.

Tape drives are “append-only” devices; updating a block in the middle of the tape also

effectively erases everything beyond that block.

An EOT mark is placed after a block that is written.
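The tape operations above can be modelled in a few lines. This is a toy model, not a real driver interface: the class and method names are invented, and the end-of-tape (EOT) mark is represented simply by the end of the list.

```python
class Tape:
    """Toy model of a tape drive: a list of blocks plus a head position."""

    def __init__(self):
        self.blocks = []   # blocks written before the EOT mark
        self.pos = 0       # logical block number under the head

    def locate(self, block):
        """Position the head at a specific logical block."""
        if not 0 <= block <= len(self.blocks):
            raise ValueError("cannot locate past the EOT mark")
        self.pos = block

    def space(self, delta):
        """Relative motion, forward or backward, in blocks."""
        self.locate(self.pos + delta)

    def read_position(self):
        return self.pos

    def write(self, data):
        # Writing here discards everything beyond this block:
        # the EOT mark now follows the block just written.
        del self.blocks[self.pos:]
        self.blocks.append(data)
        self.pos += 1

tape = Tape()
for block in (b"A", b"B", b"C", b"D"):
    tape.write(block)

tape.locate(1)
tape.write(b"X")              # update block 1 in the middle of the tape
print(tape.blocks)            # [b'A', b'X']  (blocks C and D are gone)
print(tape.read_position())   # 2
```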

File Naming

The issue of naming files on removable media is especially difficult when we want to

write data on a removable cartridge on one computer, and then use the cartridge in

another computer.

Contemporary OSs generally leave the name space problem unsolved for removable media, and depend on applications and users to figure out how to access and interpret the data.


Some kinds of removable media (e.g., CDs) are so well standardized that all

computers use them the same way.

Hierarchical Storage Management (HSM)

A hierarchical storage system extends the storage hierarchy beyond primary memory

and secondary storage to incorporate tertiary storage — usually implemented as a

jukebox of tapes or removable disks.

Usually incorporate tertiary storage by extending the file system.

Small and frequently used files remain on disk.

Large, old, inactive files are archived to the jukebox.

HSM is usually found in supercomputing centers and other large installations that have

enormous volumes of data.


Two aspects of speed in tertiary storage are bandwidth and latency.

Bandwidth is measured in bytes per second.

Sustained bandwidth – average data rate during a large transfer; # of bytes/transfer time.

Data rate when the data stream is actually flowing.

Effective bandwidth – average over the entire I/O time, including seek or locate, and

cartridge switching.

Drive’s overall data rate.
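The difference between the two bandwidth figures is only the denominator. A minimal sketch with made-up numbers; the function names and the 100-second overhead are assumptions chosen for easy arithmetic.

```python
def sustained_bandwidth(nbytes, transfer_seconds):
    """Bytes per second while the data stream is actually flowing."""
    return nbytes / transfer_seconds

def effective_bandwidth(nbytes, transfer_seconds, overhead_seconds):
    """Bytes per second over the whole I/O, including locate and cartridge switching."""
    return nbytes / (transfer_seconds + overhead_seconds)

# Invented example: 1 GB streamed in 100 s, after 100 s spent
# switching cartridges and locating the first block.
nbytes = 1_000_000_000
print(sustained_bandwidth(nbytes, 100.0) / 1e6)          # 10.0 (MB/s)
print(effective_bandwidth(nbytes, 100.0, 100.0) / 1e6)   # 5.0 (MB/s)
```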

Access latency – amount of time needed to locate data.

Access time for a disk – move the arm to the selected cylinder and wait for the rotational

latency; < 35 milliseconds.

Access on tape requires winding the tape reels until the selected block reaches the tape

head; tens or hundreds of seconds.

Generally say that random access within a tape cartridge is about a thousand times

slower than random access on disk.

The low cost of tertiary storage is a result of having many cheap cartridges share a

few expensive drives.

A removable library is best devoted to the storage of infrequently used data, because

the library can only satisfy a relatively small number of I/O requests per hour.


A fixed disk drive is likely to be more reliable than a removable disk or tape drive.

An optical cartridge is likely to be more reliable than a magnetic disk or tape.

A head crash in a fixed hard disk generally destroys the data, whereas the failure of a

tape drive or optical disk drive often leaves the data cartridge unharmed.


Main memory is much more expensive than disk storage

The cost per megabyte of hard disk storage is competitive with magnetic tape if only

one tape is used per drive.

The cheapest tape drives and the cheapest disk drives have had about the same storage

capacity over the years.

Tertiary storage gives a cost savings only when the number of cartridges is considerably

larger than the number of drives.

Price per Megabyte of DRAM, From 1981 to 2004

Price per Megabyte of Magnetic Hard Disk, From 1981 to 2004

Price per Megabyte of a Tape Drive, From 1984-2000

I/O Systems

I/O Hardware

Application I/O Interface

Kernel I/O Subsystem

Transforming I/O Requests to Hardware Operations




Explore the structure of an operating system’s I/O subsystem

Discuss the principles of I/O hardware and its complexity

Provide details of the performance aspects of I/O hardware and software

I/O Hardware

A Typical PC Bus Structure

Device I/O Port Locations on PCs (partial)


Polling

Determines state of device




Busy-wait cycle to wait for I/O from device
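A busy-wait (polling) loop can be sketched against a simulated device. FakeDevice is a stand-in invented for this example; a real driver would read a hardware status register instead.

```python
class FakeDevice:
    """Stand-in for a controller with a busy bit in its status register."""

    def __init__(self, busy_polls):
        self._remaining = busy_polls   # how many status reads report "busy"
        self.data = 0x2A               # value in the data register

    def status_busy(self):
        if self._remaining > 0:
            self._remaining -= 1
            return True
        return False

def polled_read(dev):
    """Busy-wait until the device is ready, then read its data register."""
    polls = 0
    while dev.status_busy():   # the CPU does no useful work in this loop
        polls += 1
    return dev.data, polls

print(polled_read(FakeDevice(busy_polls=3)))   # (42, 3)
```

The wasted iterations are the cost of polling; it is reasonable when the device is fast and inefficient when the wait is long.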


Interrupts

CPU Interrupt-request line triggered by I/O device

Interrupt handler receives interrupts

Maskable to ignore or delay some interrupts

Interrupt vector to dispatch interrupt to correct handler

Based on priority

Some nonmaskable

Interrupt mechanism also used for exceptions
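Dispatch through an interrupt vector, with maskable and nonmaskable interrupts, can be sketched as a table of handlers. This is a simulation only: the class, the vector numbers and the handlers are hypothetical, and priority levels are left out to keep it short.

```python
NMI = 2   # vector number treated as nonmaskable in this sketch (an assumption)

class InterruptController:
    def __init__(self):
        self.vector = {}      # interrupt number -> handler
        self.masked = set()   # maskable interrupts currently disabled
        self.log = []

    def register(self, number, handler):
        self.vector[number] = handler

    def mask(self, number):
        if number != NMI:     # a nonmaskable interrupt cannot be disabled
            self.masked.add(number)

    def raise_interrupt(self, number):
        if number in self.masked:
            return False      # ignored here; a real controller might defer it
        self.vector[number](number)   # dispatch via the vector table
        return True

ic = InterruptController()
ic.register(NMI, lambda n: ic.log.append("nmi"))
ic.register(33, lambda n: ic.log.append("device"))   # 33: made-up device vector

ic.mask(33)
ic.mask(NMI)                     # has no effect
print(ic.raise_interrupt(33))    # False
print(ic.raise_interrupt(NMI))   # True
print(ic.log)                    # ['nmi']
```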

Interrupt-Driven I/O Cycle

Intel Pentium Processor Event-Vector Table

Direct Memory Access

Used to avoid programmed I/O for large data movement

Requires DMA controller

Bypasses CPU to transfer data directly between I/O device and memory

Six Step Process to Perform DMA Transfer

Application I/O Interface

I/O system calls encapsulate device behaviors in generic classes

Device-driver layer hides differences among I/O controllers from kernel

Devices vary in many dimensions

Character-stream or block

Sequential or random-access

Sharable or dedicated

Speed of operation

read-write, read only, or write only

A Kernel I/O Structure

Characteristics of I/O Devices

Block and Character Devices

Block devices include disk drives

Commands include read, write, seek

Raw I/O or file-system access

Memory-mapped file access possible

Character devices include keyboards, mice, serial ports

Commands include get, put

Libraries layered on top allow line editing

Network Devices

Varying enough from block and character to have own interface

Unix and Windows NT/9x/2000 include socket interface

Separates network protocol from network operation

Includes select functionality

Approaches vary widely (pipes, FIFOs, streams, queues, mailboxes)
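The socket interface and its select functionality can be tried in a single process. A minimal sketch using Python's standard socket and select modules; socketpair stands in for a real network connection, which is an assumption made to keep the example self-contained.

```python
import select
import socket

a, b = socket.socketpair()   # two connected sockets in one process

# Nothing has been sent yet, so `b` is not readable (timeout 0 = just check).
before, _, _ = select.select([b], [], [], 0)

a.sendall(b"ping")

# Now select reports `b` as ready, so recv will not block.
after, _, _ = select.select([b], [], [], 1.0)
msg = b.recv(16)

print(before == [], after == [b], msg)   # True True b'ping'

a.close()
b.close()
```

select lets one process watch many sockets and touch only those that are ready, instead of blocking on each in turn.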

Clocks and Timers

Provide current time, elapsed time, timer

Programmable interval timer used for timings, periodic interrupts

ioctl (on UNIX) covers odd aspects of I/O such as clocks and timers

Blocking and Nonblocking I/O

Blocking - process suspended until I/O completed

Easy to use and understand

Insufficient for some needs

Nonblocking - I/O call returns as much as available

User interface, data copy (buffered I/O)

Implemented via multi-threading

Returns quickly with count of bytes read or written

Asynchronous - process runs while I/O executes

Difficult to use

I/O subsystem signals process when I/O completed
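The nonblocking case can be observed directly: a read on an empty channel returns at once instead of suspending the process, and a later read returns as much as is available. A minimal sketch with Python's standard socket module; the socket pair is a stand-in for any I/O channel.

```python
import select
import socket

a, b = socket.socketpair()
b.setblocking(False)          # reads on `b` now return immediately

try:
    b.recv(1024)
    first = "data"
except BlockingIOError:
    first = "would block"     # nothing to read yet; the process keeps running

a.sendall(b"hello")
select.select([b], [], [], 1.0)   # wait until the bytes have arrived
chunk = b.recv(1024)              # asks for 1024 bytes, gets what is available

print(first, len(chunk))      # would block 5

a.close()
b.close()
```

With a blocking socket the first recv would have suspended the process until data arrived.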

Two I/O Methods

Kernel I/O Subsystem


Scheduling - some I/O request ordering via per-device queue

Some OSs try fairness

Buffering - store data in memory while transferring between devices

To cope with device speed mismatch

To cope with device transfer size mismatch

To maintain “copy semantics”

Device-status Table

Sun Enterprise 6000 Device-Transfer Rates

Kernel I/O Subsystem

Caching - fast memory holding copy of data

Always just a copy

Key to performance

Spooling - hold output for a device

If device can serve only one request at a time

e.g., printing

Device reservation - provides exclusive access to a device

System calls for allocation and deallocation

Watch out for deadlock

Error Handling

OS can recover from disk read, device unavailable, transient write failures

Most return an error number or code when I/O request fails

System error logs hold problem reports

I/O Protection

User process may accidentally or purposefully attempt to disrupt normal operation

via illegal I/O instructions

All I/O instructions defined to be privileged

I/O must be performed via system calls

Memory-mapped and I/O port memory locations must be protected too

Use of a System Call to Perform I/O

Kernel Data Structures

Kernel keeps state info for I/O components, including open file tables, network

connections, character device state

Many, many complex data structures to track buffers, memory allocation, “dirty” blocks

Some use object-oriented methods and message passing to implement I/O

UNIX I/O Kernel Structure

I/O Requests to Hardware Operations

Consider reading a file from disk for a process:

Determine device holding file

Translate name to device representation

Physically read data from disk into buffer

Make data available to requesting process

Return control to process

Life Cycle of An I/O Request


STREAM – a full-duplex communication channel between a user-level process and a

device in Unix System V and beyond

A STREAM consists of:

- STREAM head interfaces with the user process

- driver end interfaces with the device

- zero or more STREAM modules between them.

Each module contains a read queue and a write queue

Message passing is used to communicate between queues

The STREAMS Structure


I/O a major factor in system performance:

Demands CPU to execute device driver, kernel I/O code

Context switches due to interrupts

Data copying

Network traffic especially stressful

Intercomputer Communications

Improving Performance

Reduce number of context switches

Reduce data copying

Reduce interrupts by using large transfers, smart controllers, polling


Balance CPU, memory, bus, and I/O performance for highest throughput

Device-Functionality Progression

***The End***