Chapter 1


Transcript of Chapter 1

Page 1: Chapter 1

INDEX

What is a Distributed Operating System?

Evolution

Distributed Communicating System Models

Why are they so popular?

Issues in designing DOS

Distributed Computing Environment

Page 2: Chapter 1

1.1 What is a Distributed Operating System

Advancements in microelectronic and communication technology have increased the use of multiple processors instead of a single processor.

Multiprocessor architectures are of two types:

Tightly Coupled Systems

Known as Parallel Processing Systems

Single system-wide primary memory

Loosely Coupled Systems

Known as Distributed Computing Systems

No shared memory; each processor has its own local memory

Page 3: Chapter 1

Fig 1.1 Difference between tightly coupled and loosely coupled systems

Page 4: Chapter 1

A distributed computing system is a collection of processors interconnected by a communication network, in which each processor has its own local memory and other peripherals, and communication between any two processors of the system takes place by message passing over the communication network.

From a processor's point of view, its own resources are local, whereas the other processors and their resources are remote.

A processor together with its resources is referred to as a node, site, or machine of the distributed computing system.
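As an illustrative sketch (not from the text; the port handling and payload are invented for the example), message passing between two nodes with private memories can be modeled with sockets, where the loopback interface stands in for the communication network:

```python
import socket
import threading

# Node B's "local memory": only messages can put data here.
received = []

# The listening socket stands in for node B's network interface.
# Port 0 asks the OS for any free port.
server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server_sock.bind(("127.0.0.1", 0))
server_sock.listen(1)
port = server_sock.getsockname()[1]

def node_b():
    """Node B waits for a message; it shares no memory with node A."""
    conn, _ = server_sock.accept()
    with conn:
        received.append(conn.recv(1024).decode())
    server_sock.close()

t = threading.Thread(target=node_b)
t.start()

# Node A cannot touch node B's memory directly; it can only pass a message.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
    client.connect(("127.0.0.1", port))
    client.sendall(b"hello from node A")
t.join()
```

The only way data reaches the `received` list is through the network; that is exactly the loosely-coupled property the definition describes.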

Page 5: Chapter 1

1.2 Evolution of Distributed Computing System

Early computers were very expensive and large in size, and job setup time was a real problem in those days.

During the 1950s and 1960s, advancements in technology introduced batch processing.

Batching similar jobs, automatic sequencing of jobs, off-line processing by buffering and spooling, and multiprogramming increased CPU utilization.

But multiple users still could not directly interact with the computer system and share its resources.

Page 6: Chapter 1

In 1970, the time-sharing concept was introduced.

Now multiple users could simultaneously execute interactive jobs and share the resources of the computer system.

Parallel advancements in hardware allowed a reduction in the size and an increase in the processing speed of computers, giving more processing capability. These systems were called minicomputers.

The advent of time-sharing systems was the first step toward distributed computing systems because it provided sharing of resources and access to remote computers.

The merging of computer and networking technologies gave birth to distributed computing systems in the late 1970s.

Page 7: Chapter 1

1.3 Distributed Computing System Models

The various models of distributed computing systems are:

Minicomputer

Workstation

Workstation-server

Processor-pool

Hybrid

Page 8: Chapter 1

1.3.1 Minicomputer Model

It is an extension of the centralized time-sharing system.

In this model, a few minicomputers are interconnected by a network.

Each minicomputer has multiple users logged on to it, with several interactive terminals connected to each minicomputer.

Each user is logged on to one specific minicomputer, and the network allows a user to access remote resources that are available on some machine other than the one onto which the user is logged.

This model is useful when resource sharing with remote machines is desired.

E.g. ARPAnet

Page 9: Chapter 1

Fig 1.2 A distributed computing system based on minicomputer model

Page 10: Chapter 1

1.3.2 Workstation Model

In this model, several workstations are interconnected by a communication network.

Each workstation has its own disk and serves as a single-user computer.

The workstations are interconnected by a high-speed LAN so that idle workstations may be used to process jobs of users who are logged onto other workstations and do not have sufficient power at their own workstations to get their jobs processed efficiently.

Page 11: Chapter 1

Fig 1.3 A distributed computing system based on workstation model

Page 12: Chapter 1

In this model, a user logs onto his own workstation. When the system finds that the user's workstation does not have sufficient processing power for a submitted job, it transfers one or more of the processes from the user's workstation to some other workstation that is currently idle, gets the process executed there, and finally returns the result of the execution to the user's workstation.

E.g. the Sprite system and an experimental system developed at Xerox PARC.

Page 13: Chapter 1

Major issues with this model:

How does the system find an idle workstation?

How is a process migrated from one workstation to another?

What happens to a remote process when the owner returns to the workstation on which it is running?

Solutions for the third issue:

Allow the remote process to share the resources of the workstation.

Kill the remote process.

Migrate the remote process back to its home workstation.

Page 14: Chapter 1

1.3.3 Workstation-Server Model

The workstation model is a network of personal workstations, each with its own disk and local file system.

A workstation with its own local disk is called a diskful workstation, and a workstation without a local disk is called a diskless workstation.

A distributed computing system based on the workstation-server model consists of a few minicomputers and several workstations interconnected by a communication network.

When diskless workstations are used, the minicomputers implement the file system and also provide some special services such as database service, print service, etc.

Page 15: Chapter 1

Fig 1.4 A distributed computing system based on workstation-server model

Page 16: Chapter 1

In this model, there are specialized machines for running server processes that manage and provide access to shared resources.

A user logs onto a workstation called his or her home workstation. Normal computation activities required by the user's processes are performed at the user's home workstation, but requests for special services are sent to a server providing that type of service, which performs the requested activity and returns the result of the request processing to the user's workstation.

E.g. the V-System

Page 17: Chapter 1

Some advantages over the workstation model:

It is much cheaper to use a few minicomputers with large, fast disks than a large number of diskful workstations.

Backup and hardware maintenance are easier than with many small disks scattered all over the network.

Since all files are managed by the servers, users have the flexibility to use any workstation and access files in the same manner irrespective of which workstation they are currently logged onto.

A request-response protocol is used to access the services of the server machines, so no process migration (which is very complex) is required.

A user has guaranteed response time because workstations are not used for executing remote processes.

Page 18: Chapter 1

Request-Response Protocol

It is also known as the client-server model of communication.

In this model, a client process (on a workstation) sends a request to a server process (on a minicomputer) to get some service, such as reading a block of a file.

The server executes the request and sends back a reply to the client that contains the result of the request processing.
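The protocol can be sketched in miniature as follows (file names and request fields are hypothetical; in a real system the request and reply would cross the network rather than being a direct function call):

```python
# "files" stands in for the server's local disk (contents are invented).
files = {"report.txt": b"quarterly figures for all branches"}

def server_handle(request):
    """Server side: execute the request and build the reply."""
    if request["op"] == "read_block":
        data = files.get(request["file"], b"")
        block = data[request["offset"]: request["offset"] + request["size"]]
        return {"status": "ok", "data": block}
    return {"status": "error", "reason": "unknown operation"}

def client_read_block(name, offset, size):
    """Client side: send a request, wait for the reply, extract the result."""
    request = {"op": "read_block", "file": name, "offset": offset, "size": size}
    reply = server_handle(request)   # in a real system this crosses the network
    return reply["data"] if reply["status"] == "ok" else None

block = client_read_block("report.txt", 0, 9)   # read the first 9 bytes
```

Note that the client never touches the server's disk directly; every interaction is a request followed by a reply, which is why no process migration is needed in this model.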

Page 19: Chapter 1

1.3.4 Processor-Pool Model

In this model, processors are pooled together to be shared by the users as needed.

The pool consists of a large number of microcomputers and minicomputers attached to the network.

Each processor has its own memory to load and run a system program or an application program of distributed computing system.

The processors in the pool have no terminals attached directly to them, and users access the system from terminals that are attached to the network via special devices.

A special server (called a run server) manages and allocates the processors in the pool to different users on a demand basis.
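A run server's allocation role might be sketched as follows (the class shape, processor names, and user names are all invented for the illustration):

```python
class RunServer:
    """Sketch of a run server: allocates pool processors to users on demand."""

    def __init__(self, processors):
        self.free = list(processors)   # processors currently unallocated
        self.allocated = {}            # user -> processors in use

    def allocate(self, user, n=1):
        """Hand a user n processors, or [] if demand exceeds the free pool."""
        if len(self.free) < n:
            return []
        given = [self.free.pop() for _ in range(n)]
        self.allocated.setdefault(user, []).extend(given)
        return given

    def release(self, user):
        """Return all of a user's processors to the pool when the job ends."""
        self.free.extend(self.allocated.pop(user, []))

pool = RunServer(["p1", "p2", "p3", "p4"])   # hypothetical pool members
mine = pool.allocate("alice", 2)             # a terminal user asks for two
pool.release("alice")                        # back to the pool for other users
```

Because allocation is temporary and centralizes no user state, there is no notion of a home machine here, which matches the next slide.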

Page 20: Chapter 1

Fig 1.5 A distributed computing system based on processor-pool model

Page 21: Chapter 1

There is no concept of a home machine.

It allows better utilization of the available processing power of a distributed computing system.

It gives greater flexibility: the system's services can be easily expanded without the need to install any more computers.

E.g. Amoeba and the Cambridge Distributed Computing System.

It is not suitable for high-performance interactive applications.

Page 22: Chapter 1

1.3.5 Hybrid Model

Of the four models above, the workstation-server model is the most widely used, because a large number of computer users only perform simple tasks such as editing jobs, sending mail, and executing small programs.

The processor-pool model is useful for massive computations.

A hybrid model is a combination of these two: it is the workstation-server model with an added pool of processors.

Processors in the pool can be dynamically allocated for computations that are too large for workstations or that require several computers concurrently for efficient execution.

Page 23: Chapter 1

1.4 Why are Distributed computing systems gaining popularity ?

The two main demerits of DCS are its complexity and the difficulty of building distributed systems.

But the advantages of DCS outweigh the disadvantages. They are:

Inherently Distributed Applications

Information Sharing among Distributed Users

Resource Sharing

Better price-performance ratio

Shorter Response Times and higher throughput

Higher Reliability

Extensibility

Better Flexibility

Page 24: Chapter 1

1.4.1 Inherently Distributed Applications

Several applications are inherently distributed.

E.g. in an employee database of a nationwide organization, the data for an employee are generated at the employee's branch office; the global need is to view the entire database, while there is also a local need for frequent and immediate access to locally generated data at each branch office.

Some other examples are online reservation, online banking, etc.

Page 25: Chapter 1

1.4.2 Information Sharing among Distributed Users

A DS provides an efficient person-to-person communication facility by sharing information over great distances.

Even though the users are geographically separated from each other, they can work in cooperation by transferring project files, logging onto each other's remote computers to run programs, and exchanging messages by email.

The use of a distributed computing system by a group of users to work cooperatively is known as Computer-Supported Cooperative Working (CSCW).

Page 26: Chapter 1

1.4.3 Resource Sharing

Using a DS, both hardware and software resources can easily be shared.

Hardware resources include printers, scanners, hard disks, etc.

Software resources include files, databases, etc.

Page 27: Chapter 1

1.4.4 Better Price-Performance Ratio

With the increasing power and falling price of processors, and the increasing speed of networks, a DS can potentially have a better price-performance ratio.

Another reason for a more cost-effective price-performance ratio is good sharing of information and resources among multiple computers.

Page 28: Chapter 1

1.4.5 Shorter Response Times and Higher Throughput

The two most important performance metrics are response time and throughput.

In a DS, the multiple processors are utilized in such a way that they provide responses in a short time period and with a large amount of output.

A DS with a fast network is increasingly being used as a parallel computer to solve single complex problems.

To improve performance, a DS uses load-distribution techniques.

Page 29: Chapter 1

1.4.6 Higher Reliability

Reliability refers to the degree of tolerance against errors and component failures in a system.

A reliable system prevents loss of information even in the event of component failures.

The multiplicity of storage devices and processors in a DS allows maintenance of multiple copies of critical information within the system and execution of a computation at several nodes, protecting it against catastrophic failures.

An important aspect of reliability is availability which means the fraction of time for which a system is available for use.

In a DS, a few parts of the system can be down without interrupting the jobs of the users.

In processor pool model, if some of the processors are down at any time, the system can continue to function normally.

Page 30: Chapter 1

1.4.7 Extensibility and Incremental Growth

In a DS, it is possible to easily extend the power and functionality of the system by simply adding additional resources (hardware/software).

Properly designed DS that have the property of extensibility and incremental growth are called Open Distributed Systems.

Page 31: Chapter 1

1.4.8 Better Flexibility

A distributed computing system may have a pool of different types of computers.

Different types of computers are usually more suitable for performing different types of computation.

Page 32: Chapter 1

1.5 What is a Distributed Operating System

It is a program that controls the resources of a computer system

and provides its users with an interface or virtual machine that is

more convenient to use than the bare machine.

Its two main tasks are:

To present users with a virtual machine that is easier to program than the

underlying hardware.

To manage the various resources of the system.

Operating systems used for distributed environments are of two types:

Network Operating System

Distributed Operating System

Page 33: Chapter 1

The three most common features for differentiating between the two types of OS:

1) System Image :

In the case of a network operating system, the users view the distributed computing system as a collection of distinct machines connected by a communication subsystem.

In the case of a distributed OS, the system hides the existence of multiple computers and provides a single-system image to its users.

With a network OS, a user can run a job on any machine of the distributed computing system, but he/she is aware of the machine on which the job is executed.

With a distributed OS, the system dynamically and automatically allocates jobs to the various machines of the system for processing.

Page 34: Chapter 1

With a network OS, a user is required to know the location of a resource to access it, and different sets of system calls have to be used for accessing local and remote resources.

With a distributed OS, users need not keep track of the locations of various resources to access them, and the same set of system calls is used for accessing both remote and local resources.

Page 35: Chapter 1

2) Autonomy

In a network OS, each computer of the system has its own local OS, and there is no coordination at all among the computers except for the rules that govern how two processes of different computers communicate with each other.

In this type of OS, all computers work independently.

In a distributed OS, there is a single system-wide OS, and each computer of the distributed computing system runs a part of this global OS.

Here, all computers work in close cooperation with each other for effective and efficient utilization of the various resources.

The kernel (the set of system calls) manages and controls the hardware of each computer to provide the facilities and resources that are accessed by other programs.

Page 36: Chapter 1

3) Fault Tolerance

With a network OS, there is very little fault-tolerance capability.

If 10% of machines are down, at least 10% of users are unable to work.

With Distributed OS, most of the users are mostly unaffected by the failed

machines and can continue to perform their work.

A distributed computing system that uses a network OS is referred to as a network system, whereas a distributed computing system that uses a distributed OS is referred to as a true distributed system.

Page 37: Chapter 1

1.6 Issues in Designing a Distributed OS

Designing a distributed OS is very difficult.

It must be designed with the assumption that complete information about the system environment will never be available.

In a distributed system, the resources are physically separated, there is no common clock among the multiple processors, delivery of messages is delayed, and messages can even be lost.

The designer of a distributed OS has to focus on the following design issues for a good design.

Page 38: Chapter 1

A distributed system is more difficult to design than a centralized system.

Design Issues:

Transparency

Reliability

Flexibility

Performance

Scalability

Heterogeneity

Security

Emulation of Existing Operating Systems

Page 39: Chapter 1

1.6.1 Transparency

A main goal of a distributed OS is to make the existence of multiple computers invisible and to provide a single-system image to all users, so that the system appears to them as a virtual uniprocessor. The eight forms of transparency are:

Access Transparency

Location Transparency

Replication Transparency

Failure Transparency

Migration Transparency

Concurrency Transparency

Performance Transparency

Scaling Transparency

Page 40: Chapter 1

Access Transparency

Users should not be able to tell whether a resource (hardware/software) is remote or local; a user can access a remote resource in the same way as a local one.

Location Transparency

(1) Name transparency:

The name of a resource should not reveal any hint of the physical location of the resource.

Resource names must be unique system-wide.

(2) User mobility:

No matter which machine a user is logged onto, he or she should be able to access a resource with the same name.
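Name transparency and user mobility can be illustrated with a small name-server table (all resource names and node identifiers below are invented for the sketch):

```python
# Sketch of a name server that hides resource locations.
# Users refer only to a system-wide unique name; the binding to a
# physical node is internal and can change without the name changing.
name_table = {
    "/printers/laser1": ("node7", "lpt0"),
    "/files/projectX/plan.txt": ("node2", "/disk1/plan.txt"),
}

def resolve(name):
    """Map a location-transparent name to its current (node, local id)."""
    return name_table[name]

# The same name works no matter which machine the user is logged onto,
# and nothing in the name reveals where the resource actually lives.
node, local_id = resolve("/printers/laser1")
```

Moving the printer to another node would only change the table entry, not the name users see, which is exactly the point of name transparency.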

Page 41: Chapter 1

Replication Transparency

All distributed operating systems have provision to create replicas of files and other resources on different nodes.

Both the existence of replicas and the replication activity should be transparent to the users.

Also, replication control decisions, such as the number of copies, the locations of replicas, and creation/deletion times, should be made entirely automatically by the system in a user-transparent manner.

Failure Transparency

Users are not aware of partial failures in the system, such as a communication link failure, a machine failure, or a storage device crash.

The design of a completely failure-transparent system is practically impossible.

Page 42: Chapter 1

Migration Transparency

The aim is to ensure that the movement of an object is handled automatically by the system in a user-transparent manner.

Three important issues:

Migration decisions, such as which object should be moved from where to where, should be made automatically by the system.

Migration of an object from one node to another should not require a change in its name.

When a process migrates, its interprocess communication should continue transparently, so that messages sent to its old location still reach it.

Page 43: Chapter 1

Concurrency Transparency

For concurrency transparency, the resource-sharing mechanisms of the distributed OS must have the following four properties:

An event-ordering property ensures that all access requests to the various system resources are properly ordered, to provide a consistent view to all users of the system.

A mutual-exclusion property ensures that at any time at most one process accesses a shared resource that must not be used simultaneously by multiple processes if program operation is to be correct.

A no-starvation property ensures that if every process that is granted such a resource eventually releases it, then every request for that resource is eventually granted.

Page 44: Chapter 1

A no-deadlock property ensures that a situation will never occur in which competing processes prevent their mutual progress even though no single one of them requests more resources than are available in the system.
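The event-ordering property is commonly realized with logical clocks. As a minimal sketch (Lamport clocks are one standard technique for this; the slides do not prescribe a particular mechanism), timestamps give all nodes a consistent order for resource-access events:

```python
class LamportClock:
    """Sketch of a Lamport logical clock for ordering distributed events."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """Local event (e.g. issuing a resource-access request)."""
        self.time += 1
        return self.time

    def receive(self, msg_time):
        """On receiving a message, jump past the sender's timestamp,
        so causally later events always carry larger timestamps."""
        self.time = max(self.time, msg_time) + 1
        return self.time

a, b = LamportClock(), LamportClock()
t_send = a.tick()            # node A requests the resource and tells B
t_recv = b.receive(t_send)   # B's clock now orders its events after A's request
```

Comparing such timestamps (with node ids as tie-breakers) yields a total order of requests, which is the "consistent view" the event-ordering property asks for.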

Performance Transparency

The aim of this transparency is to allow the system to be automatically reconfigured to improve performance, as loads vary dynamically in the system.

The processing capability of the system should be uniformly distributed among the currently available jobs in the system.

Scaling Transparency

The aim of scaling transparency is to allow the system to expand in scale without disrupting the activities of the users.

Page 45: Chapter 1

1.6.2 Reliability

A distributed OS is potentially more reliable than a centralized OS because of the existence of multiple copies of resources.

For higher reliability, the fault-handling mechanisms of a distributed OS must be designed properly to avoid faults, to tolerate faults, and to detect and recover from faults.

A fault is a mechanical or algorithmic defect that may generate an error.

System failures are of two types:

Fail-stop: the system stops functioning after changing to a state in which its failure can be detected.

Byzantine: the system continues to function but produces wrong results.

Page 46: Chapter 1

Commonly used methods for dealing with these issues are:

Fault Avoidance

Fault Tolerance

Fault Detection & Recovery

Fault Avoidance

It deals with designing the components of the system in such a way that the occurrence of faults is minimized.

Fault Tolerance

It is the ability of a system to continue functioning in the event of partial system failure.

Some concepts used to improve fault tolerance are:

Redundancy techniques: avoid a single point of failure by replicating critical hardware/software, so that if one copy fails, the others can be used.

Distributed control: for better reliability, a distributed OS must employ distributed control mechanisms; there should be multiple control servers rather than a single one.

Page 47: Chapter 1

Fault Detection & Recovery

This method uses hardware and software mechanisms to determine the occurrence of a failure and then to restore the system to a state acceptable for continued operation.

Atomic transactions: an atomic transaction is a computation consisting of a collection of operations that take place indivisibly in the presence of failures and concurrent computations. That is, either all of the operations are performed successfully or none of them is.
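The all-or-nothing behavior of an atomic transaction can be sketched with a simple snapshot-and-rollback (the account data and the transfer operation are hypothetical examples, not from the text):

```python
def transfer(accounts, src, dst, amount):
    """All-or-nothing transfer: on any failure, restore the old state."""
    snapshot = dict(accounts)          # remember the state for rollback
    try:
        accounts[src] -= amount
        if accounts[src] < 0:
            raise ValueError("insufficient funds")
        accounts[dst] += amount
    except Exception:
        accounts.clear()
        accounts.update(snapshot)      # failure: undo, as if nothing ran
        return False
    return True                        # success: every operation took effect

accounts = {"A": 100, "B": 50}
ok1 = transfer(accounts, "A", "B", 30)    # succeeds: both updates applied
ok2 = transfer(accounts, "A", "B", 500)   # fails: state rolled back intact
```

A real distributed transaction must also survive node crashes mid-way (e.g. via logging and two-phase commit), which this in-memory sketch does not attempt.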

Stateless servers: a stateless server does not depend on the history of requests serviced between client and server; the history does not affect the execution of the next service request.

Page 48: Chapter 1

Acknowledgement & timeout-based retransmission of messages:

In a distributed OS, a node or communication link failure may interrupt a communication event, so a message may be lost. The receiver must therefore send an acknowledgement for every message, and the sender retransmits any message that is not acknowledged within a timeout period.

A reliable IPC mechanism should also be capable of detecting and handling lost and duplicate messages.
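A minimal simulation of this acknowledgement/retransmission scheme (the lossy link is simulated with a seeded random drop, not real networking; sequence numbers handle the duplicate-detection part):

```python
import random
random.seed(1)   # fixed seed so the simulated losses are reproducible

delivered = []   # what the receiver's application actually sees
seen = set()     # sequence numbers already accepted (duplicate detection)

def unreliable_send(seq, payload):
    """Simulated lossy link: returns an ack only if the message got through."""
    if random.random() < 0.5:          # message lost in transit
        return None
    if seq not in seen:                # duplicates are detected and dropped
        seen.add(seq)
        delivered.append(payload)
    return seq                         # acknowledgement echoes the sequence number

def reliable_send(seq, payload, max_retries=10):
    """Retransmit until an acknowledgement arrives (or give up)."""
    for _ in range(max_retries):
        ack = unreliable_send(seq, payload)
        if ack == seq:
            return True                # acked: the receiver definitely has it
    return False

reliable_send(1, "msg-1")
reliable_send(2, "msg-2")
```

Retransmission is what makes duplicates possible in the first place, which is why the receiver-side sequence check is an inseparable half of the technique.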

The main drawback of increasing reliability is the potential loss of execution-time efficiency due to the extra overhead involved in these techniques.

Page 49: Chapter 1

1.6.3 Flexibility

The design of a distributed OS should be flexible for the following reasons:

Ease of modification

Ease of enhancement

The most important design factor is the model used for the kernel. There are two models:

Monolithic kernel: most services, such as process management, memory management, and file management, are provided by the kernel.

Microkernel: the kernel is a small nucleus of software that provides only minimal facilities. The only services provided by the kernel are IPC, low-level device management, and some memory management.

Page 50: Chapter 1

Fig 1.6 Modes of Kernel

Page 51: Chapter 1

Advantages of the microkernel model:

A microkernel is easy to design, implement, and install.

Since most of the services are implemented as user-level server processes, it is easy to modify the design or add new services.

Disadvantage:

Sometimes poor performance.

Each server is an independent process with its own address space, so the servers have to use some form of message-based IPC for communication.

Message passing between server processes and the microkernel requires context switches, resulting in additional performance overhead.

Page 52: Chapter 1

1.6.4 Performance

Some design principles for better performance are as follows:

Batch if possible

Cache whenever possible

Minimize copying of data

Minimize network traffic

Take advantage of parallelism for multiprocessing

Page 53: Chapter 1

1.6.5 Scalability

Scalability means the capability of a system to adapt to an increased service load.

A DS will grow with time, since it is very common to add new machines or an entire sub-network to the system to take care of increased workload.

Some principles to follow when designing a scalable DS:

Avoid centralized entities

In a distributed OS, the use of centralized entities such as a single central file server or a single database makes the distributed system non-scalable.

Avoid centralized algorithms

A centralized algorithm operates by collecting information from all nodes, processing this information on a single node, and then distributing the result to the other nodes.

Perform most operations on client workstations

Page 54: Chapter 1

1.6.6 Heterogeneity

A heterogeneous DS consists of interconnected sets of dissimilar hardware or software systems.

Heterogeneous systems are more difficult to design than homogeneous ones.

In a heterogeneous DS, some form of data translation is necessary for interaction between two incompatible nodes.

To reduce the complexity of the translation process, an intermediate standard data format should be adopted.
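One common choice of intermediate standard format is a fixed big-endian ("network order") encoding. A sketch using Python's struct module (the module choice is an assumption for illustration; any agreed wire format would do):

```python
import struct

# With one agreed wire format, n machine types need only n translators
# (native <-> standard) instead of one translator per machine pair.

def to_wire(value):
    """Native int -> standard format: 4-byte big-endian, network order."""
    return struct.pack("!i", value)

def from_wire(data):
    """Standard format -> native int, whatever this node's byte order is."""
    return struct.unpack("!i", data)[0]

wire = to_wire(1234)
restored = from_wire(wire)   # round-trips regardless of sender/receiver hardware
```

Each node only ever translates between its native representation and the standard one, which is exactly how the intermediate format reduces translation complexity.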

Page 55: Chapter 1

1.6.7 Security

The various resources of a computer system must be protected against destruction and unauthorized access.

Providing security in a DS is very difficult because of its use of a communication network and the client-server model of communication.

Additional requirements for security in a distributed OS:

It should be possible for the sender of a message to know that the message was received by the intended receiver.

It should be possible for the receiver of a message to know that the message was sent by the genuine sender.

It should be possible for both the sender and the receiver of a message to be guaranteed that the contents of the message were not changed while it was in transfer.

While a client is sending a message, an intruder may pretend to be an authorized client or may change the message contents during transmission.

Cryptography is the only known practical method for dealing with these security issues.
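The sender-authentication and integrity requirements can be sketched with an HMAC over a shared secret key (the key and messages are hypothetical; this covers authenticity and integrity, not confidentiality):

```python
import hmac
import hashlib

SECRET = b"shared-key"   # hypothetical key known only to sender and receiver

def send(message: bytes):
    """Sender side: attach an authentication tag computed from the key."""
    tag = hmac.new(SECRET, message, hashlib.sha256).digest()
    return message, tag            # both travel over the insecure network

def receive(message: bytes, tag: bytes):
    """Receiver side: accept only if the tag matches, proving the message
    came from a key holder and was not altered in transit."""
    expected = hmac.new(SECRET, message, hashlib.sha256).digest()
    return hmac.compare_digest(tag, expected)   # constant-time comparison

msg, tag = send(b"transfer 100 to B")
genuine = receive(msg, tag)                     # accepted
forged = receive(b"transfer 900 to B", tag)     # tampered contents rejected
```

An intruder without the key can neither forge a valid tag for a new message nor alter an existing message undetected, which addresses the second and third requirements above.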

Page 56: Chapter 1

1.6.8 Emulation of existing OS

For commercial success, it is important that a newly designed distributed OS be able to emulate existing popular operating systems.

With this property, new software can be written using the system call interface of the new OS to take full advantage of its special features of distribution, while existing software continues to run.