CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

166
CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Transcript of CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

Page 1: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Page 2: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

1. Goal

2. Transparency

3. Service

Page 3: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

1. Efficiency

2. Flexibility

3. Consistency

4. Robustness

Page 4: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

1. Access transparency

2. Location transparency

3. Migration transparency

4. Concurrency transparency

5. Replication transparency

6. Parallelism transparency7. Failure transparency

8. Performance transparency

9. Size transparency

10. Revision transparency

Page 5: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

1. Primitive Services

2. Services by System Servers

3. Value added services.

Page 6: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

1. Workstation-server model

2. Processor-pool model

CommunicationNetworkServers Workstations

Intelligent Terminals

CommunicationNetwork ServersProcessor Pool

Server

Page 7: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

1. LAN – Local Area Network

2. MAN – Metropolitan Area Network

3. WAN – Wide Area Network

4. On board interconnection networks

Page 8: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

1. Physical Layer

2. Data Link Control (DLC) Layer

3. Network Layer

4. Transport Layer

5. Session Layer

6. Presentation Layer

7. Application Layer

Note: TCP/IP Protocol Suite Correspondence. 1, 2, as it is, 3 is IP, 4 and parts of 5 is TCP/UDP.

Page 9: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

1. Object models and naming schemes.

2. Distributed Coordination.

3. Interprocess Communication.

4. Distributed Resources.

5. Fault tolerance and security.

Page 10: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

• Everything in a computer system are objects.

Objects are characterized by operations.

• Object identification schemes

• By name.

• By address.

• By service.

Page 11: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

1. Barrier Synchronization

2. Condition Coordination

3. Mutual Exclusion

Related Problems:

1. Deadlock handling

2. Agreement protocols

Page 12: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

1. Point-to-point communication (unicast)

2. Group communication (multicast or broadcast)

1. Client/server model (socket)

2. Remote Procedure Call (RPC)

Page 13: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

1. Processing Power (processors) – load sharing, multiprocessor scheduling

2. Distributed Data (distributed shared memory and distributed file system) - synchronization, replica management

Page 14: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

1. Transparency

2. Recovery

3. Confidentiality, authentication and authorization

Page 15: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Page 16: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

•Processes are programs in execution --- Each process has its own PCB, page table

and address space.

•Threads are lightweight processes --- Multiple threads run on a process sharing the

address space of the process.

--- Each thread has its own TCB TCB consisting of

program counter, stack pointers and register set.

--- Further each thread has its own stack.

Page 17: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Ease of creation – no need to copy address space

Context switching is cheaper, particularly for threads implemented in user space

Allow concurrency as well as blocking system calls making the programming easier

Simpler and more efficient resource sharing

Page 18: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Dispatcher-workers model.

Team model.

Pipeline model.

Page 19: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Dispatcher thread

Requests

Wor

ker

th

read

Wor

ker

th

read

Wor

ker

th

read

Page 20: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

ThreadThread Thread

Requests

o Identical threads

o Not identicalthreads

o Static

o Dynamic

Page 21: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Requests

Thread Thread Thread

Page 22: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Thread creation

-- Static vs. Dynamic.

Thread termination

Thread synchronization

Thread scheduling

-- Priority assignment, quantum size variation,

handoff and affinity scheduling.

Signal handling

-- Signal handler, signals must not be lost.

Page 23: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Implementing in user space

Implementing in kernel space

Hybrid approach

Page 24: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

As an add-on library

OS schedules the process, not threads, limitedscheduling choice – non-preemptivepriority.

Low context switching overhead.

One thread may block all other.

Possible unfair process scheduling.

Page 25: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

All scheduling policies may be implemented.

OS is aware of threads and schedule them.

Context switching overhead is higher.

Page 26: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

LWP LWP LWP LWP

User space threads

Kernel space threads

Multiprocessor system

Page 27: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

To analyze the performance requirements, the processes should be represented in abstract ways, hiding unnecessary detail.

Models for process representation:

Synchronous process graph.

Asynchronous process graph.

Timing diagram.

Page 28: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Represents the precedence relation between communicating processes using Directed Acyclic Graph (DAG).

Can be used for total completion time analysis etc.

Fork/join or cobegin/coend constructs may be used to implement the precedence relation.

Page 29: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Represents communicating processes without any reference to the precedence relation.

Used for processor allocation etc.

One-way communication

Client/server communication

Peer to peer communication

Page 30: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

The events alongwith their time of occurrence is shown in this method. Provides greater detail and both the synchronous and asynchronous process graphs may be derived from the timing diagram.

Time

P1

P2

P3

Page 31: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Fact: Many applications need the timing information.Ex: Two processes updating a file. The file server received two simultaneous write requests. Without a time stamp it is not possible for the file server to determine which request is older.

Fact: It is not possible to perfectly synchronize clocks in a distributed system.

Fact: Even if the clocks may be synchronized, physical clocks will not run at same pace. Therefore, clock skew will develop over time.

Hence, we need some protocol to ensure that the clocks run in reasonable agreement. Such protocols are called clock synchronization protocols.

Page 32: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Two types of clocks:

• Physical Clock (a close approximation to the real life clock).

• Logical Clock (used to provide ordering of the events).

Different algorithms are used to synchronize physical and logical clocks.

Page 33: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Many Universal Coordinated Time (UTC) sources are available for use by computers.

NIST short wave radio

Automated Computer Time Service (via modem)

GPS satellites

Using these time services are costly and/or complicated.Using these time services are costly and/or complicated.

Therefore, a few time servers (TS) access the UTC sources directly. Other computers obtain the timing information from one or more TS.

Page 34: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Fact: Accessing a TS via network requires a non-deterministic message exchange time. Therefore, the following methods are used to compensate the delay.

1. Cristian’s Algorithm

2. The Berkeley Algorithm

3. Averaging Algorithm

Page 35: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Sender Time Server

T0

T1

Request

Time, T

I, time to process interrupt

If d is the estimate of delay from the TS to the sender, the sender should set its clock to T+d.

If nothing is known about I, then estimated d=(T1-T0)/2.

If I is known, then estimated d=(T1-T0-I)/2.

Or d may be estimated using average or minimum of multiple requests.

Page 36: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

The time server periodically gathers time data from all other machines and determines the average time. Then it communicates the average time (in fact the difference) to all machines.

Suitable for systems which can not access any UTC server.

Page 37: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

The Berkeley algorithm is centralized and therefore has a single point of failure.

Every computer broadcasts its time information at a regular interval known systemwide. Every machine averages the broadcast information to compute its own time.

Page 38: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Note that the clock of the system can not be set backwards. Doing so may result in an inconsistent system state.

To solve the problem the clock is slowed down for a certain period of time. Once the system clock reaches the desired state, the rate of change of the clock may be restored to the old state.

Page 39: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

It is possible to put an upper bound on the rate at which the clock skew develops. That means, the clock in a system may deviate at most t from the real clock over time t.

Therefore, if two clocks are synchronized at time 0, then they will differ by at most 2t at the end of time period t. If the system specification allows the clocks to differ by at most T, then every computer will have to synchronize its clock at least once every T/2 time.

Page 40: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Lamport’s algorithm

Vector logical clocks

Matrix logical clocks

Page 41: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Lamport’s algorithm is based on two facts.

1. The events within a processor may be consistently ordered using the system clock provided that the system clock is monotonically increasing.

2. The ordering of the events occurring in different processors do not matter as long as it is consistent with the communication pattern. In other words, If processor Pj sends a messsage to processor Pk at time T, then Pk can not receive it on or before T.

Page 42: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Let T be the universal physical time and Cj(T) be the value of the logical clock in processor j at time T. Further, Cj(T) is monotonically non-decreasing.

1. Stamp each internal event by the logical clock value of that processor.

2. Whenever a message is sent by processor k, stamp it by Ck(T).

3. When a message is received by processor k with a time stamp C’ set the logical clock of k to max(C’+1,Ck(T)).

Page 43: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Note that the version of the Lamport’s algorithm provides only a partial ordering of the events across the system.

If a total ordering of the events is necessary, then the time stamp may be appended with the processor id and the process id.

Page 44: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

One problem with Lamport’s algorithm is it does not help to recognize the concurrent events in the sense that two concurrent events may have different times stamps. Therefore, if T(a)<T(b) (where T(a) is the time stamp associated with the event a), it is not possible to say whether a causally happened before b, or they are concurrent.

Definition: a causally happened before b if either

1. a occurred before b in the same processor

2. A message sent by the processor of a after a occurred is received by the processor of b before b occurred.

3. a causally happened before c and c causally happened before b.

Page 45: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

The problem may be solved using vector logical clock.

Each processor (k) maintains an array of logical clocks Ck(T)=(Ck1

(T),…,Ckn(T)) where the ith entry corresponds to the logical clock

of processor I.

Page 46: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Definition: Ck(T)<Cj(T) if all coordinates of Ck(T) are less than or equal to the corresponding coordinates of Cj(T) and at least one coordinate of Ck(T) is strictly less than that of Cj(T).

If neither Ck(T)<Cj(T) nor Cj(T)< Ck(T) then they are called incomparable.

The vector logical clock algorithm uses this vector clock to time stamp the events. The vector logical clock is maintained in such a way that the time stamp for two concurrent events will be incomparable. If the time stamps are comparable then the event with smaller time stamp occurred before.

Page 47: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

The processor k maintains its vector clock as follows.

• The clock Ck(T) is initiated at (0,…,0).

• The kth coordinate (Ckk(T)) changes as the

local clock changes.

• Each message sent out is stamped by the vector clock.

• Whenever a message with time stamp Ci(T) is received, coordinate p of the logical clock (Ck

p(T)) is replaced by max(Ck

p(T), Cip(T)) for p=1,…,n.

Page 48: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Instead of maintaining an array of logical clocks, an array of vector clock is maintained. The matrix thus obtained is used to stamp messages. Upon receipt of a message, the matrix clock is updated by taking the pair-wise maximum.

Page 49: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Page 50: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

1. Message Passing (lowest level of communication, unstructured)

2. Request/Reply (Client/server, RPC)

3. Transactions (Satisfies ACID property)

Page 51: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

• Send(destination,message)

• Receive(source, message)

Source/destination may be described in four different ways

1. Process name

2. Link

3. Mailbox

4. port

Page 52: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

1. Unique global process id. required. May be obtained by adding machine address to process id.

2. Allows only one logical communication link between two processes.

3. Process identifiers need to be known at coding time.

Page 53: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

1. Allows multiple logical communication path between communicating processes.

2. The communicating processes need to know about each other.

Page 54: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Mailboxes are global data structures shared by some sender and some receiver processes.

1. Allows indirect communication between sender and receiver.

2. Allows multipoint and multipath communication.

Page 55: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Ports are finite mailboxes with first-in-first-out (FIFO) message queues.

1. Created by user processes using system calls.

2. May have restrictions in terms of access rights.

Page 56: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Send/receive primitives may be blocking or non-blocking.

Blocking receive implies the process cannot continue till the message is received.

Blocking send may be of different types.

• Ordinary blocking send

• Reliable blocking send

• Explicit blocking send

• Request and reply

Page 57: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Used in both Windows and UNIX.

• Pipes: Byte streams shared by parent process and children.

• Named pipes: FIFO files shared by unrelated processes

• Sockets: two-way communication links shared by processes in different domains.

Page 58: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Provides Privacy, Integrity and Authenticity.

Authentication is done by third-party certification authority.

Privacy and integrity are maintained by handshake protocol and cryptography.

Page 59: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

• Reliability (Best effort vs. Reliable)

• Delivery Order (FIFO, Causal Order, Total Order)

• Failure of recipient(s) vs. Failure of originator

• Overlapping groups

Page 60: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

The algorithm is very similar to the vector logical clock.

1. Each message is time stamped by a vector where each entry in the vector is the number of messages received by the sender from that group member.

2. Accept a message from process i if a) you have received all previous messages from i and b) you have received all messages seen by i. Otherwise,delay accepting the message.

3. Reject any duplicated message.

Page 61: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Atomic multicast alongwith total order delivery provided by two-phase, total-order multicast.Originator:

• Send the message, collect acknowledgements with time stamp.

• Send the commit message with the highest logical ack time stamp (commit stamp).

Recipient:

• Send acknowledgement with the logical clock value as time stamp (local ack stamp).

• Do not deliver a message with commit stamp t until the commit message for all messages with local ack stamp < t has been committed.

• Deliver messages in the order of the commit stamp.

Page 62: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

In request/reply communication the sender is blocked (or the message is considered not delivered) until it receives a reply.

RPC (Remote Procedure Call) is a language-level abstraction to support request/reply communication.

RPC is implemented by stub procedures at both the client end and the server end.

Page 63: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Implementation issues:

• Parameter passing and data conversion - Call-by-value and call-by-copy/restore are favorite choices.

- Data typing, data representation, data transfer syntax problems. Use a canonical data representation.

• Binding: directory server and port mapper.

• RPC compilation

• RPC exception and failure in-band or out-band signaling, idempotent services, request id., reliable transport layer, handling server crash, handling client crash

Page 64: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Transactions are communications with ACID property:

• Atomicity: all or nothing

• Consistency: interleaving results in serial execution in some order

• Isolation: Partial results are not visible outside

• Durability: After committing, the results are permanent

Page 65: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

There is one coordinator and multiple participants. Each of them have access to some stable storage. Activity log is maintained in the stable storage.

Coordinator:

1. Prepare to commit the transaction by writing every update in activity log.

2. Write a precommit message in the activity log. Send a voting message to all participants asking whether they are ready to commit.

3. If all participants vote yes within the time-out period, multicast a commit message. Otherwise, multicast an abort message.

Page 66: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Participant:

1. Prepare to commit the transaction by writing every update in activity log.

2. Write a precommit message in the activity log. Wait for request to vote from coordinator.

3. Wait for commit message from the coordinator. If received, commit the transaction. If abort message is received, abort the transaction.

Page 67: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Coordinator:

1. If the processor crashes, it will check the activity log for the transaction.

2. If the precommit message is not in the log, abort the transaction.

3. If the commit message is not in the log, retake the vote.

4. If the commit message is there in the log, finish the transaction.

Page 68: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

• Contention basedLamport’s algorithm, Ricart & Agrawala’s algorithm, Voting

algorithm

• Token basedRing structure, tree structure, broadcast structure

Page 69: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Every process maintains a queue of pending requests for entering critical section ordered according to the logical time-stamp.

Requestor:

1. Send a request to every process.

2. After all replies are collected, enter its own request in its own queue

3. If own request is at the head of the queue, enter critical section.

4. Upon exiting the critical section, send a release message to every process.

Page 70: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Other processes:

1. After receiving a request, send a reply and enter the request in the queue

2. After receiving release message, remove the corresponding request from the queue.

3. If own request is at the head of the queue, enter critical section.

Problems:

• 3(N-1) messages per requests

• Multiple points of failure

Page 71: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Requestor:

1. Send a request to all other process.

2. Enter critical section upon receipt of reply from all processes.Other Processes:

1. Upon receipt of a request check whether it has any older (by logical clock) pending request of its own or whether it is executing in critical section.

2. If neither of the above conditions hold, send reply. Otherwise delay the reply message till both the conditions are false.

Problems:

1. 2(N-1) message exchange per request.

2. Multiple points of failure.

Page 72: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Requestor:

1. Send a request to all other process.

2. Enter critical section once REPLY from a majority is received

3. Broadcast RELEASE upon exit from the critical section.

Other processes:

1. REPLY to a request if no REPLY has been sent. Otherwise, hold the request in a queue.

2. If a REPLY has been sent, do not send another REPLY till the RELEASE is received.

Observations:

1. No single point of failure, but possibility of deadlock.

2. O(N) messages per request.

Page 73: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Requestor:

1. Send a request with time stamp to all other process.

2. Enter critical section once REPLY from a majority is received

3. Return the REPLY message upon receipt of an INQUIRY if not currently executing in the critical section.

4. Broadcast RELEASE upon exit from the critical section.

Deadlock prevention method is used by ensuring no hold-and-wait

Page 74: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Other processes:

1. If no REPLY has been sent, REPLY to any REQUEST.

2. If a REPLY has been sent but the RELEASE has not been received, compare the time stamp of the new REQUEST with the time stamp of the replied REQUEST. If the new REQUEST has an earlier time stamp, try to retract the old REPLY sending INQUIRY message. If the older REPLY is returned, send REPLY to the new REQUEST. Otherwise, wait for the RELEASE.

Observation:

1. High (O(N)) messages per critical section entry.

Page 75: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Requestor:

1. Need to secure the REPLY messages from all members of its quorum only.

To reduce message overhead, the set of (N) processes is divided into N sets S1 to Sn such that Si Sk is non-null for all i and k. Si is called the quorum of process i.

Observations:

1. It is possible to reduce number of messages significantly.

2. Depending on the structure of the quorum and the exact algorithm, single point of failure may exist.

Page 76: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Control-token circulates in the system in some fixed order. Possession of the token grants permission to enter critical section.

• Ring structure – simple, deadlock-free, fair. The token circulates even in the absence of any request (unnecessary traffic). Long path (O(N)) – the wait for token may be high.

• Tree structure

• Broadcast structure

Page 77: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

The processes are organized in a logical tree structure, each node pointing to its parent. Further, each node maintains a FIFO list of token requesting neighbors. Each node has a variable Tokenholder initialized to false for everybody except for the first token holder (token generator).

Entry section:

If not Tokenholder

If the request queue empty

request token from parent;

put itself in request queue;

block self until Tokenholder is true;

Page 78: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Exit section:

If the request queue is not emptyparent = dequeue(request

queue); send token to parent; set Tokenholder to false;

if the request queue is still not empty, request token from parent;

Page 79: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Upon receipt of a request:

If TokenholderIf in critical section put the requestor

in the queue else parent = requestor;

Tokenholder = false; send token to parent; endif

else if the queue is empty

send a request to the parent; put the requestor

in queue;

Page 80: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Upon receipt of a token:

Parent = Dequeue(request queue); if self is the parent

Tokenholder = true else

send token to the parent;

if the queue is not emptyrequest token from parent;

Note that all requests/token receipts must be processed in FIFO order and the routines must execute atomically.

Page 81: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Data Structure:

The token containsToken vector T(.) – number of completion of the critical section

for every process.Request queue Q(.) – queue of requesting processes.

Every process (i) maintains the followingseq_no – how many times i requested critical section.Si(.) – the highest sequence number from every process i heard of.

Page 82: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Entry Section (process i):

Broadcast a REQUEST message stamped with seq_no.

Enter critical section after receiving token.

Exit Section (process i):

Update the token vector T by setting T(i) to Si(i).

If process k is not in request queue Q and there are pending requests from k (Si(k)>T(k)), append process k to Q.

If Q is non-empty, remove the first entry from Q and send the token to the process indicated by the top entry.

Page 83: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Processing a REQUEST (process j):

Set Sj(k) to max(Sj(k), seq_no) after receiving a REQUEST from process k.

If holds an idle token, send it to k.

Requires broadcast. Therefore message overhead is high.

Page 84: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Used to elect a centralized controller if the older one fails. The election of a new controller helps to alleviate some of the problems arising from having a single point of failure.

Two types of election algorithm

• Extrema- finding – based on global priority

• Preference-based – based on local preferences.

Page 85: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Election algorithms and mutual-exclusion algorithms are similar in the sense that they try to find one process that will be leader or enter the critical section. However, there are significant differences between two.

1. Leader election is one time. A process may yield to another process. In mutual exclusion algorithm, a process competes until it succeeds.

2. In leader election starvation is not an issue.

3. Once the leader is elected, all other processes must know who is the leader.

Page 86: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Election algorithms are designed on the basis of the logical topology of the group of the processes.

1. Complete topology

2. Ring

3. Tree

Page 87: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Assumptions:

1. A process may reach all other processes in one logical hop.

2. Communication is reliable, only processes may fail.

3. Time to handle a message is bounded.

4. Process faults are benign, i.e., a faulty process stops to generate messages. In other words, a process never generates confusing wrong messages.

5. Upon recovery, a process knows that it experienced a failure.

6. A failed process may rejoin the group upon recovery.

Page 88: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Our algorithm is an extrema-finding algorithm. It is called the bully algorithm.

Upon detecting that the leader has failed, a process sends an inquiry message to all higher-priority processes. If no reply is received within the time-out period, the initiator sends an enter-election message to all lower priority processes. After the initiator receives a reply from all lower-priority processes or after the time-out, it declares itself as new leader and broadcasts that information to all processes.

If a process receives an inquiry message from a lower priority process, it send inquiry message to all higher priority process and enters election.

Page 89: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Our algorithm is an extrema-finding algorithm. The processes are organized in a logical ring.

The initiator sends a message along ring with its own id. Requesting everybody to elect it the leader.

A process compares its own priority with that of the id. on an incoming message. If its own priority is higher, and it never forwarded any other messages, replace the message with its own id. and forward it. If its own priority is lower, forward the message. If the message carries its own id. then elect itself the leader and forward the election message.Message complexity O(N2) in the worst-case, O(N) in the best-case.

Page 90: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

There are two types of election algorithms those use the logical tree topology.

1. The algorithm runs after the spanning tree is constructed.

2. The algorithm runs before the spanning tree is constructed and constructs the spanning tree in the process.

Page 91: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Note that the tree is an undirected tree without any parent-child relation. When the election begins there are possible more than one contender.

1. If receives an elect-me message from some other process before step 2 is completed, stop contending.

2. Send an elect-me message to all neighbors with a time-stamp.

3. If receives reply from all neighbors, declare itself as leader by broadcasting I-am-leader message.

4. If receives I-am-leader message from some other process, stop contending.

Contender:

Page 92: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

1. Among possible multiple elect-me messages pick the one with lowest time-stamp and reply to it. Do not send multiple replies.

Leaf Nodes:

Internal nodes:

1. Upon receiving an elect-me message from neighbor u, compare its time stamp with other such messages received earlier (if any). If this one is the oldest elect-me message, forward it to all neighbors except u. Designate u as parent.

2. Upon receiving reply from all neighbors except parent, forward it to the parent.

Page 93: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Page 94: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

1. The effect of communication overhead

2. The effect of underlying architecture

3. Dynamic behavior of the system

4. Resource utilization

5. Turnaround time

6. Location and performance transparency

7. Task and data migration

Page 95: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

1. Static or off-line

2. Dynamic or on-line

3. Real-time

Page 96: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

1. Precedence graph

2. Communication graph

3. Disjoint process model

4. Statistical load modeling

Page 97: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

1. Communication system model

2. Processor pool model

3. Isolated workstation model

4. Migration workstation model

Page 98: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

1. Speedup

2. Resource utilization

3. Makespan or completion time

4. Load sharing and load balancing

Page 99: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Turnaround time=1/(

Page 100: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Turnaround time

2

Page 101: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

The general problem is NP-complete.

1. Stone’s algorithm (2 heterogeneous processors, arbitrary communication process graph).

2. Bokhari’s algorithm (n homogeneous processors, linear communication process graph, linear communication system)

Page 102: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Assumptions:

1. Two heterogeneous processors. Therefore, the execution cost of each process depends on the processor. Further, the execution cost for each process is known for both processors.

2. Communication cost between each pair of processes is known.

3. Interprocess communication incurs negligible cost if both the processes are in the same processor.

Page 103: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Objective Function:

Stone’s Algorithm minimizes the total execution and communication cost by properly allocating the processors among the processes.

Let, G=(V,E)=(V,E) be the communication process model. Let A and B be two processors. Let wA(u) and wB(u) be the cost of executing process u on processors A and B respectively. Let, c(u,v) be the cost of interprocess communication between processes u and v if they are allocated different processors. Further, SS be the set of processes to be executed on processor A and V-SV-S be the set of process to be executed on processor B. Stone’s algorithm computes SS such that the objective function uS S wA(u)+vV-S V-S wB(v)+uSS, vV-SV-S c(u,v) is minimized.

Page 104: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Example:

Processes Cost at Processor A

Cost at Processor B

1 3 2

2 1 5

3 2 4

4 3 3

1 2 3 4

1 4 5 0

2 1 2

3 2

4

Computation Costs Communication Costs

If process 1 and 2 are allocated to processor A, and the rest in processor B, then the total cost is 3+1+4+3+5+0+1+2=19.

Page 105: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Commodity-flow problem:

Stone’s algorithm reduces the scheduling problem to the commodity-flow problem described below.

Let G=(V,E) be a graph with two special nodes u (source) and v (sink). Each edge has a maximum capacity to carry some commodity. What is the maximum amount of the commodity that can be carried from the source to the sink.

1. There is a known polynomial time algorithm to solve the commodity-flow problem

2. Let S be a subset of V such that the source is in S and the sink is in V-S. A set of edges (say C) with one end in S and the other end in V-S is called a cut and the sum of the capacities of the edges in C is called the weight of the cut. It can be shown that the maximum-flow is equal to the minimum possible cut-weight. This is called max-flow, min-cut theorem.

Page 106: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Reduction to the commodity-flow problem:

Stone’s algorithm reduces the scheduling problem to the commodity-flow problem as follows.

Let G=(V,E) be the communication process graph. Construct a new graph G’=(V’,E’) by adding two new nodes a (source, corresponding to processor A) and b (sink, corresponding to processor B). For every node u in G, add the edges (a,u) and (b,u) in G’. The weights (capacities) of the new edges will be wB(u) and wA(u) respectively.

1. Note that a cut in G’ gives a processor allocation for the job.

2. Further, the weight of the cut is same as the cost (objective function) of execution + cost of communication for the processor allocation given by the cut.

3. Therefore, if we compute the max-flow on G’, the corresponding min-cut gives the best processor allocation.

Page 107: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Example:

1

2

3

4

4

12

2

5A

B2

3

1

5

24

3

3

Page 108: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Assumptions:

1. n homogeneous processors, connected in a linear topology.

2. k>n processes, linear communication graph.

3. Communication links are homogeneous.

4. Computation and communication costs are known.

5. Two processes not communicating directly will not be allocated the same processor.

6. Two communicating processes, if not allocated same processor, must be in the adajacent processors.

Page 109: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Objective function:

Bokhari’s algorithm minimizes the sum of the total communication cost and the maximum execution cost in a processor. Note the difference with the objective function in Stone’s algorithm.

Let P1,…,Pn be processors in linear order and p1,…,pk be processes in linear order. Let the execution cost of pi be wi and the communication cost between pi and pi+1 be ci. If processes pj(j) to pi(j+1)-1 be allocated to processor Pi, then the objective function for Bokhari’s algorithm will be maxiWj+i=1

ncj(i) where Wj is the execution cost for processor j. Wj=wi(j)+wj(i+1)-1.

Page 110: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Example:

23 2 4 1 2 1

4 2 1 3

P1 P2 P3

The value of the objective function max(3+2,4,1+2+1)+(2+1)=8.

Page 111: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

To solve the problem Bokhari constructed the a layered graph. The graph has n+2 layers, where n is the number of processors. The top and bottom layer has 1 node each. All other layers have k nodes where k is the number of processes. We number the layers from 0 thru k+1 and denote the ith node of jth layer by v(i,j). The node in the top (0th) layer is adjacent to all nodes in layer 1. The node in the bottom layer is adjacent to all nodes in the kth layer. Other v(i,j)s are adjacent to v(i,j+1),…,v(k,j+1).

Example of a layered graph constructed for n=2 and k=3. Note that any path from the top layer to the bottom layer gives a processor allocation subject to the restrictions described earlier and vice versa.

Page 112: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Each edge e in the graph has two weights, w1(e) and w2(e). If e is an edge connecting v(i,j) and v(r,j+1) then w1(e) is the total execution time of the processes pi to pr-1 and w2(e) is the communication cost between pr-1 and pr. Note that, a shortest path from the top to bottom on the basis of w2 will minimize the total communication cost. Any known shortest path algorithm may be used to do that.

Further, w1 may take only certain values. To be precise, there are k+1C2 possible values of w1. We can compute those values and sort them. Then we can restrict the w1 by deleting all edges with w1(e) greater than the restricted value. A shortest path in the new layered graph will minimize the total communication cost subject to the restriction that the maximum of the execution cost in any processor will not exceed the threshold value. Using this method we can compute the objective function for each of the possible k+1C2 threshold values. The best processor allocation can thus be computed.

Note that a clever binary search over the possible values of w1 may reduce the computational complexity of the algorithm significantly.

Page 113: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

The Goal of dynamic scheduling is to load share and balance dynamically. This goal is achieved by migrating tasks from heavily loaded processors to lightly loaded processors.

Two types of task migration algorithms:

• Sender-initiated

- Transfer policy (when to transfer?)

- Selection policy (which task to transfer?)

- Location policy (where to transfer?)

Potentially unstable.

• Receiver-initiated

Page 114: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

A real-time task consists of Si, the earliest start time, Di, the deadline to finish the task and Ci, the worst-case completion time.

A set of real-time tasks is called a real-time task set or a real-time system.

A system of real-time tasks is called hard real-time system if all of the tasks in the system must complete execution within their respective deadlines. Even if one task misses its deadline, the whole system is considered incorrectly executed.

A system of real-time tasks is called soft real-time system if the performance is judged by how many tasks missed deadline and by how much.

Page 115: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

A schedule for a real-time system is a CPU assignment to the tasks of the system such that a CPU is assigned to no more than one task at one time.

A schedule for a real-time system is called valid if no task is assigned CPU before its earliest-start time and all tasks complete execution within their respective deadlines.

A real-time system is called feasible if there is a valid schedule for the system.

Page 116: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Page 117: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

1. Dispersed files and clients- login transparency, access transparency- location transparency, location independence

2. Multiple users and files- concurrency transparency- replication transparency

Page 118: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

1. Files and file systems

2. Services and servers

3. File mounting and server registration

4. Stateful and stateless file servers

5. File access and semantics od sharing

6. Version control

Basic Concepts

Page 119: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

1. Files have names, attributes and data.

2. File organization – flat or hierarchical

3. File access – sequential, direct, indexed/indexed sequential

4. Key components of file service – directory, authorization, file service (basic and transaction) and system services.

Page 120: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

1. A service may be provided by one/many servers. Similarly one process may provide multiple services.

2. Client/server relationship is relative. A schematic is given below.

Clients

Directoryservices

Auth.services

Fileservices

Systemservices

Page 121: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

1. Mounting is method to build users’ directory structure by using files and directories dispersed over multiple systems.

2. Different types of mounting – Explicit, Boot and Auto-mounting.

3. Mounting may or may not provide uniform global view.

4. Mounting is not a location transparent protocol, hence server registration is useful

Page 122: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

1. A stateful server maintains state information for each open file. Staeless server does not maintain state information.Examples of state information – list of open files and clients, file descriptors and handles, file position pointers, mounting information, lock status, session keys, cache or buffer etc.

2. For stateless servers there are certain issues to be taken care of- Idempotency requirement- File locking mechanism- Session key management- Cache consistency

Page 123: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

space

time

Remote

Access

Cache

Access

Down/upload acc.

Simple

R/W

No true

Sharing

Coherency control

Coherency

Control

Transaction Concurrency

Control

Coherency and Concurrency

Control

Coherency and Concurrency Control

Session Not applicable

Not applicable

Ignore sharing

Page 124: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

1. UNIX Semantics- Can be implemented using only server-side cache.- Closely approximated using client-side cache with write-through and write-invalidate/update- As a compromise write-back (file updated and cache invalidated only at the end of the batch) cache coherence policy may be used.

2. Transaction Semantics- File at server is updated only at the end of a successful transaction

3. Session Semantics- File at server is updated only at the end of the session.

Page 125: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

One solution of the file sharing/replication problem is to create a new version of the file upon every write or upon every close.

Problem: What if two applications open an older version of a file, then application one closes the file (and thus creates a newer version) before application 2. Then application 2 closes the file.

Three solutions:

1. Ignore Conflict

2. Resolve version conflict

3. Resolve serializability conflict.

Page 126: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Transaction requirements:

1. The execution of transactions are all or none.

2. The interleaving of multiple transactions is serializable.

3. Update is atomic.

Clients Transactionmanager

SchedulerObject

managerObjects

ClientsTransaction

managerScheduler

Objectmanager

Objects

Page 127: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

A serializable schedule is a legal schedule such that its execution will produce result equivalent to some serial execution of all transactions.

Three concurrency control protocol for transaction management

1. Two-phase locking

2. Timestamp ordering

3. Optimistic concurrency control

A simple, inefficient method to achieve serializability: Complete transactions in private space, then update distributed objects using total order multicast.

Page 128: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

An object must be locked before accessing it.

No new lock can be acquired after releasing a lock.

Problems:

1. Potential deadlock. May be solved by rollback or abort.

2. Commit dependence resulting in probable rolling aborts if locks are released as early as possible. May be solved using strict two-phase locking – locks released only at commit or abort point.

Page 129: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Each transaction has a time-stamp. Transactions are serialized according to this time-stamp. In other words, older transactions abort and restart if they have a conflict with an younger transaction.

Each object has two time stamps RD and WR – times of the transactions which read (wrote) that object last.

Further each object will have a list of tentative transaction times for pending writes. Let Tmin be the minimum of these tentative times.

Page 130: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

The following actions are taken with each event:

•READ: Does not conflict with other reads. If the time-stamp of this read is smaller than WR, this transaction is aborted. If it is allowed to proceed if it’s time stamp is in between WR and Tmin. It is kept in a queue otherwise for some other writes to commit.

•WRITE: Allowed to proceed only if it’s time stamp is greater than both RD and WR. Then the write is marked tentative and put in the list.

•ABORT: Aborted read has no effect. Aborted write is removed from the tentative list and if a pending read reaches the head of the queue, it is performed.

•COMMIT: A commit may not involve any pending read. If it has any pending write then all preceding tentative writes aborted and this transaction is committed.

Page 131: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Transactions are allowed to proceed freely in the private work space. Before committing it is validated. Once validated, the objects may be updated in updation phase.

Each transaction ti has two time stamps, TSi and TVi for the start of the execution and the validation phase respectively. Each object Oj records the times for its last read and write commits, RDj and WRj, respectively. Ri and Wi are read set and write set for ti. The transactions are serialized in the order of TVi.

VALIDATION: If tk is already in validation phase, and TVi < TVk, then ti can not be validated and must be aborted.

If ti has no overlap with any other transaction, it is validated.

If ti execution phase overlaps with update phase of tk, then Ri and Wk needs to be disjoint to validate ti.

If ti execution phase overlaps with validate and update of tk, then Ri and Wk needs to be disjoint as well as Wi and Wk need to be disjoint.

Page 132: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Reading operation:

1. Read-one-primary

2. Read-one

3. Read-quorum

Writing operation:

1. Write-one-primary

2. Write-all

3. Write-all-available

4. Write-quorum

5. Write-Gossip

Page 133: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

1. 2*write-quorum > number of replicas

2. Read-quorum+write_quorum > number of replicas

3. Usually read-quorum is chosen to be smaller than write-quorum

4. Voting by witnesses

5. Weighted voting schemes.

Page 134: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Each File service agent (FSA) f maintains a time stamp of last updation – TSf

Each Replica manager (RM) i maintains a time stamp of last updation – TSi

Basic Gossip Protocol:

Read: If TSf > TSi the replica manager do not have current data. Either wait or contact a different replica manager. Otherwise proceed with reading.

Write: Increase TSf. If TSf > TSi update and ask the replica manager to propagate the update by gossip. Otherwise, depending on the application, either overwrite or update and read. In both cases increase TSf.

Gossip message: If the gossip message carries a newer time-stamp, update data, TSi and depending on application, repeat the gossip with Tsi.

Page 135: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Page 136: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

1. Uniform Memory Access (UMA)

2. Non-Uniform Memory Access (NUMA)

Network

MM

MM

PE

PE

Network

MM

PE

MM

PE

Cache

Cache

Cache

Cache

Page 137: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Virtual Memory Space

MM MM MM MM MM

MM Frame Offset

Page Table Entry

Page 138: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

1. Temporal and Spatial locality of references may be exploited using cache

2. Scalability may be improved by using improved network and/or by using cache.

3. DSM helps the application programmer by providing communication transparency.

4. Programmers are more familiar with shared memory. Further many software are available for single processor and multiprocessor shared memory systems.

Page 139: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

The performance of a DSM may be improved by 1. Placing a data block in appropriate memory module, 2. By migrating data block to an appropriate memory module and 3. by replicating block.

The performance of migration and replication strategies are measured by hit ratio.

Block-migration strategies suffer from block-bouncing problem.

Both block-migration and block-replication strategies suffer from the problem of false-sharing.

Page 140: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

In a distributed system, it is not possible to enforce uniprocessor-like coherency for shared data items.

1. There is no global clock. Therefore, it is not possible to determine latest write.

2. There will always be delays (non-deterministic).

Hence we use some weaker consistency model for shared data in distributed environment. The programmer is supposed to know what kind of consistency is guaranteed and act accordingly.

Page 141: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

If the user can not give any synchronization information, then general access consistency models are used. The general access consistency models are

1. Atomic consistency (usual uniprocessor consistency, a read never gets stale data)

2. Sequential consistency

3. Causal consistency

4. Processor consistency

5. Slow memory consistency.

Page 142: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

The operation of all processors are executed in some sequential order and the operations of each individual processor are executed in the order specified by its program.

P1: W(X)1

P2: W(Y)2

P3: R(Y)2 R(X)0 R(X)1

Page 143: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Only causally related writes must appear in same order to all processors.

P1: W(X)1 W(X)3

P2: RX(1) W(X)2

P3: R(X)1 R(X)3 R(X)2

P4: R(X)1 R(X)2 R(X)3

Causally consistent but not sequentially consistent.

Page 144: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Writes from same processor must appear in the order they were issued.

P1: W(X)1

P2: RX(1) W(X)2

P3: R(X)1 R(X)2

P4: R(X)2 R(X)1

Processor consistent, but neither causally consistent nor sequentially consistent.

Page 145: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Writes from the same processor only in the same location (memory address) must appear in the order they were issued.

P1: W(X)1 W(Y)0 W(Y)2 W(X)3

P2: R(X)1 R(X)3 R(Y)0

Slow memory consistent, but not processor consistent.

Page 146: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

The user controls access to the ordinary shared variable by using semaphore or other explicit synchronizing variables. The system provides some consistency only for synchronizing variables.

Synchronization access consistency models are:

1. Weak consistency

2. Release consistency

3. Entry consistency.

Page 147: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

1. Access to all synchronizing variables are sequentially consistent

2. A processor must not access a synchronizing variable with pending access to any data

3. A processor must not access any variable with pending access to a synchronizing variable.

Access to a synchronization variable is synch(S).

Page 148: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

1. Access to all synchronizing variables are processor consistent

2. Access to all shared variables between acquire and release are exclusive.

3. Changes made to shared variables are made visible to the outside world after release.

Access to a synchronization variable are acquire(S) and release(S).

Page 149: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

1. Access to all synchronizing variables are processor consistent

2. Access to shared variable X between acquire and release are exclusive.

3. Changes made to X are made visible to the outside world after release.

Access to a synchronization variable are acquire(X) and release(X).

Page 150: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Master copy E

Replicated block V E

Replicated block V E

Replicated block V E

< Processor- bits >

V – valid/invalid

E - exclusive

Page 151: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Any cache manager faces the following events, and must take appropriate actions to maintain functionality

1. Read-hit

2. Read-miss

3. Write-hit

4. Write-miss

The performance of a cache management algorithm is judged by both hit-ratios and the overhead that occurs at each of the four events.

Page 152: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Read hit Use data from local cache Do nothing

Read miss Get the block from master copy. Set V bit. May set E bit also.

Send the block to the requestor. Set P bit, may set E bit also.

Write hit Change own cache, send data and invalidate message to the directory.

Update master copy, send invalidate message to other copies.

Write miss Same as write hit Same as write hit.

Event Action by processor Action by directory

Page 153: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Read hit Use data from local cache Do nothing

Read miss Get the block from master copy. Set V bit. May set E bit also.

Send the block to the requestor. Set P bit, may set E bit also.

Write hit Change own cache, send data and invalidate message to the directory.

Update master copy, send update message to other copies.

Write miss Same as write hit Same as write hit.

Event Action by processor Action by directory

Page 154: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

If the communication medium is a broadcast medium like bus, there is no need of the directory, as all processors can monitor all cache related messages and take appropriate actions. Clearly, such systems can implement sequential consistency.

However, bus is a potential bottleneck and a single point of failure. Several approaches are taken to alleviate the problem.1. Cache traffic is carried on a separate bus.

2. Multiple bus in hierarchical configuration.

3. Multistage interconnection networks (MIN) made of bus.

Page 155: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

In a DSM system, memory modules are strictly local and communication is only via high-latency links. Therefore, such systems simulates/emulates shared memory by message passing.

Obviously, a process in such a system finds its address-space distributed in the memories of different processors. Therefore, it has three options while accessing a non-local memory page.

1. Remote access

2. Migration

3. Replication

Page 156: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

With three access mechanisms and two access types (read and write), there are nine possible DSM management algorithms. Out of nine, the following four are more interesting.

1. Read-remote-write-remote

2. Read-migrate-write-migrate

3. Read-replicate-write-migrate

4. Read-replicate-write-replicate

Page 157: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

One or multiple server serves the memory pages.

1. Simple to implement

2. If implemented with a single server, the system is always sequentially consistent provided that the server serializes request and response services.

3. High latency

4. Servers are potential bottlenecks.

Page 158: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

A memory page migrates to the new processors memory upon access

1. Coherence problem need not be handled separately

2. Exploits program localities well

3. Upon migration, all processors sharing the page need to update their virtual page to physical block mapping

4. Ping-pong effect is a problem that becomes more severe by false sharing. May be handled by smaller page size (associated with high overhead), or a combination of migrate-replicate-remote access strategy.

Page 159: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Same as the write-invalidate strategy for cache management.

1. Popular because many software use the multiple-read/exclusive-write semantic.

2. Performs well when reads are dominant operation.

3. In the absence of broadcast support in hardware, write becomes a costly operation.

4. Maintains strong consistency.

Page 160: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Same as the write-update strategy for cache management.

1. Performs well when reads are not dominant operation.

2. In the absence of broadcast support in hardware, write becomes a very costly operation.

3. Maintains strong consistency only when used with a suitable protocol like two-phase commit.

Page 161: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Two problems need to be solved:

1. Finding the current owner of a page (if there is migration).

2. Finding the set of processors with a copy of the page (if there is replication.)

Page 162: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

1. If there is a master owner, that may keep track of the current owner

2. Otherwise, the following data structure may be used.

Block probable owner

Block probable owner

Block probable owner

Block probable owner

Block probable owner

Page 163: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

From To’s From To’s From To’s

From NILFrom NIL

Spanning Tree

Head master next Head master next Head master next Head master next

Requestor

Update Invalidate Request

Forwarded

Acknowledgement

Linked list

Page 164: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Distributed Shared Memory systems are classified on the basis of the following criteria

1. Implementation – Hardware, Software or Hybrid

2. Architecture Configuration or Interconnection Topology – Bus, Ring, Cube, MIN etc.

3. Shared Data Organization – Structured (Objects, higher level types etc.) or Non-structured.

4. Granularity/Coherence unit – Word, Cache, Block, Page, Object etc.

5. DSM Algorithms – SRSW, MRSW, MRMW.

6. Management Responsibility – Distributed or centralized

7. Consistency Model

8. Coherence Protocol

Page 165: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Processor Processor Processor Processor

Cache Cache Cache Cache

I/O I/O

LB to VBInterface

Node LocalMemory

Lock SpaceControl Spc

Data Spc

Read/Write NetworkInterconnect

Write

RM BUS

Write

Local Bus

VME Bus

Page 166: CSC 8420 Advanced Operating Systems Georgia State University Yi Pan.

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

1. Implementation – Hybrid, Reflective memory + library routines

2. Architecture Configuration or Interconnection Topology – Hierarchical bus

3. Shared Data Organization – Structured (Data structure)

4. Granularity/Coherence unit – Data structure

5. DSM Algorithms – MRMW.

6. Management Responsibility - Distributed

7. Consistency Model – Entry consistency

8. Coherence Protocol – Write-update