Distributed Scheduling

1. INTRODUCTION:- Distributed systems offer tremendous processing capacity. However, to realize this capacity and take advantage of it, good resource allocation schemes are needed. A distributed scheduler is a resource management component of a distributed operating system that focuses on judiciously and transparently redistributing the load of the system among the computers so that the overall performance of the system is maximized. Because wide-area networks have high communication delays, distributed scheduling is more suitable for distributed systems based on local area networks.

2. MOTIVATION:- A locally distributed system consists of a collection of autonomous computers connected by a local area communication network. Users submit tasks at their host computers for processing. The need for load distributing arises in such an environment because, due to the random arrival of tasks and their random CPU service time requirements, there is a good possibility that several computers are heavily loaded (and hence suffer performance degradation) while others are idle or lightly loaded, as shown in Figure 1.

Clearly, if the workload at some computers is typically heavier than at others, or if some processors execute tasks more slowly than others, this situation is likely to occur often. The usefulness of load distributing is less obvious in systems in which all processors are equally powerful and, over the long term, have equally heavy workloads. Livny and Melman have shown that even in such homogeneous distributed systems, statistical fluctuations in the arrival of tasks and in task service time requirements lead to a high probability that at least one computer is idle while a task is waiting for service elsewhere. Their analysis models a computer in a distributed system as an M/M/1 server. Therefore, even in a homogeneous distributed system, system performance can potentially be improved by appropriately transferring load from heavily loaded computers (senders) to idle or lightly loaded computers (receivers).

3. ISSUES IN LOAD DISTRIBUTING:- Load distributing raises two questions. FIRST, what is meant by performance? One widely used performance metric is the average response time of tasks. The response time of a task is the length of the time interval between its origination and its completion. Minimizing the average response time is often the goal of load distributing. SECOND, what constitutes a proper characterization of load at a node?


Defining a proper load index is very important, as load distributing decisions are based on the load measured at one or more nodes. It is also crucial that the mechanism used to measure load is efficient and imposes minimal overhead. These issues are discussed below.

i. LOAD :- Resource queue lengths, and particularly the CPU queue length, are good indicators of load because they correlate well with task response time. Moreover, measuring the CPU queue length is fairly simple and carries little overhead. If a task transfer involves significant delays, however, simply using the current CPU queue length as a load indicator can result in a node accepting tasks while other tasks it accepted earlier are still in transit. As a result, when all the tasks that the node has accepted have arrived, the node can become overloaded and require further task transfers to reduce its load. This undesirable situation can be prevented by artificially incrementing the CPU queue length at a node whenever it accepts a remote task. To avoid anomalies when task transfers fail, a timeout can be employed: if the task has not arrived by the time the timeout expires, the CPU queue length is decremented. While the CPU queue length has been used extensively as a load indicator, it has been reported that little correlation exists between CPU queue length and processor utilization, particularly in interactive environments. Hence, some designers have used CPU utilization as the indicator of the load at a site. This approach requires a background process that monitors CPU utilization continuously, and it imposes more overhead than simply reading the queue length at a node.
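As a concrete illustration of such a load index, the following minimal Python sketch (illustrative only; the class, method names, and timeout value are not from the original text) combines the CPU queue length with an artificial increment for each accepted remote task that is still in transit, and expires the increment on a timeout:

import time

class LoadIndex:
    # CPU-queue-length load index with accounting for accepted,
    # still-in-transit remote tasks.
    def __init__(self, transfer_timeout=5.0):
        self.cpu_queue_length = 0      # tasks currently queued for the CPU
        self.in_transit = {}           # task_id -> time the task was accepted
        self.transfer_timeout = transfer_timeout

    def accept_remote_task(self, task_id):
        # Artificially increment the load as soon as a remote task is accepted,
        # so the node does not over-commit while transfers are still in transit.
        self.in_transit[task_id] = time.time()

    def remote_task_arrived(self, task_id):
        # The reservation becomes a real queue entry.
        self.in_transit.pop(task_id, None)
        self.cpu_queue_length += 1

    def load(self):
        # Drop reservations whose transfer apparently failed (timeout expired),
        # then report the queue length plus the outstanding reservations.
        now = time.time()
        self.in_transit = {t: ts for t, ts in self.in_transit.items()
                           if now - ts < self.transfer_timeout}
        return self.cpu_queue_length + len(self.in_transit)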
ii. CLASSIFICATION OF LOAD DISTRIBUTING ALGORITHMS :- The basic function of a load distributing algorithm is to transfer load (tasks) from heavily loaded computers to idle or lightly loaded computers. Load distributing algorithms can be broadly classified as static, dynamic, or adaptive.

a. STATIC ALGORITHMS:- Static load distributing algorithms make no use of system-state information. Decisions are hardwired in the algorithm using a priori knowledge of the system. These algorithms can potentially make poor assignment decisions: because they do not consider node states, they can transfer a task initiated at an otherwise idle node to a node with a serious backlog of tasks.

b. DYNAMIC ALGORITHMS:- Dynamic load distributing algorithms use system-state information (the loads at nodes), at least in part, to make load-distributing decisions. Dynamic algorithms have the potential to outperform static algorithms by using system-state information to improve the quality of their decisions. Essentially, dynamic algorithms improve performance by exploiting short-term fluctuations in the system state. Because they must collect, store, and analyze state information, dynamic algorithms incur more overhead than their static counterparts, but this overhead is often well spent.

c. ADAPTIVE ALGORITHMS:- Adaptive load-distributing algorithms are a special class of dynamic algorithms. They adapt their activities by dynamically changing their parameters, or even their policies, to suit the changing system state. For example, if one load distributing policy performs better under certain conditions while another performs better under other conditions, a simple adaptive algorithm might choose between the two based on observations of the system state. Even when the system is so uniformly and heavily loaded that no performance advantage can be gained by transferring tasks, a non-adaptive dynamic algorithm might continue operating (and incurring overhead).


To avoid overloading such a system, an adaptive algorithm might instead curtail its load distributing activity when it observes this condition.

iii. LOAD BALANCING vs. LOAD SHARING :- Load-distributing algorithms can be further classified as load-sharing or load-balancing algorithms, based on their load distributing principle. Both types of algorithms strive to reduce the likelihood of an unshared state (a state in which one computer lies idle while tasks contend for service at another computer) by transferring tasks to lightly loaded nodes. Load balancing algorithms, however, go a step beyond load sharing algorithms by attempting to equalize the loads at all computers. Because a load balancing algorithm requires a higher transfer rate than a load sharing algorithm, the higher overhead incurred may outweigh the potential performance improvement. Task transfers are not instantaneous because of communication delays and the delays that occur while collecting the task's state. Delays in transferring a task increase the duration of an unshared state, as an idle computer must wait for the arrival of the transferred task. To avoid lengthy unshared states, anticipatory task transfers from overloaded computers to computers that are likely to become idle shortly can be used. Anticipatory transfers increase the task transfer rate of a load sharing algorithm, making it less distinguishable from a load balancing algorithm. In this sense, load balancing algorithms can be considered a special case of load sharing algorithms, performing a particular level of anticipatory task transfers.

iv. PREEMPTIVE vs. NONPREEMPTIVE TRANSFERS :- Preemptive task transfers involve transferring a partially executed task. This operation is generally expensive, since collecting a task's state (which can be quite large and complex) is often difficult. Typically, a task's state consists of a virtual memory image, a process control block, unread I/O buffers and messages, file pointers, timers that have been set, and so on. Nonpreemptive task transfers, on the other hand, involve only tasks that have not begun execution and hence do not require transferring the task's state. In both types of transfers, information about the environment in which the task will execute must be transferred to the receiving node. This information may include the user's current working directory and the privileges inherited by the task. Nonpreemptive task transfers are also called task placements.

4. COMPONENTS OF A LOAD DISTRIBUTING ALGORITHM:- Typically, a dynamic load distributing algorithm has four components: a transfer policy, a selection policy, a location policy, and an information policy (a minimal sketch combining these four components is given at the end of this section).

i. TRANSFER POLICY :- A transfer policy determines whether a node is in a suitable state to participate in a task transfer, either as a sender or as a receiver. It typically requires information on the local node's state to make its decisions. Many proposed transfer policies are threshold policies, with thresholds expressed in units of load. When a new task originates at a node, the transfer policy decides that the node is a sender if the load at that node exceeds a threshold T1. On the other hand, if the load at a node falls below a threshold T2, the transfer policy decides that the node can be a receiver for a remote task. Depending on the algorithm, T1 and T2 may or may not have the same value. Alternatives to threshold transfer policies include relative transfer policies, which consider the load of a node in relation to the loads at other system nodes.
For example, a relative policy might consider a node to be a suitable receiver if its load is lower than that of some other node by at least some fixed amount δ. Alternatively, a node might be considered a receiver if its load is among the lowest in the system.
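Both kinds of transfer policy can be captured in a few lines of Python; this sketch is illustrative only (the function names and the "neither" return value are assumptions, not part of the original description):

def threshold_transfer_policy(load, t1, t2):
    # Threshold policy: sender if the load exceeds T1, receiver if it falls below T2.
    if load > t1:
        return "sender"
    if load < t2:
        return "receiver"
    return "neither"

def is_relative_receiver(my_load, other_loads, delta):
    # Relative policy: a suitable receiver if this node's load is lower than
    # that of some other node by at least a fixed delta.
    return any(other - my_load >= delta for other in other_loads)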


ii. SELECTION POLICY :- A selection policy determines which task should be transferred. It selects a task for transfer once the transfer policy decides that a node is a sender. Should the selection policy fail to find a suitable task to transfer, the node is no longer considered a sender. The simplest approach is to select one of the newly originated tasks that caused the node to become a sender by pushing its load beyond the threshold. Such tasks are relatively cheap to transfer, since the transfer is nonpreemptive. A basic criterion that a task selected for transfer should satisfy is that the overhead incurred in the transfer should be compensated for by the reduction in response time realized by the task. In general, long-lived tasks satisfy this criterion. A task can also be selected for remote execution if the estimated average execution time for that type of task is greater than some execution-time threshold. Another approach is based on the reduction in response time that can be obtained for a task by transferring it elsewhere; in this method, a task is selected for transfer only if its response time will be improved by the transfer. There are other factors to consider in the selection of a task:

a. The overhead incurred by the transfer should be minimal. For example, a small task carries less overhead. The selected task should also be long lived, so that it is worthwhile to incur the transfer overhead.

b. The number of location-dependent system calls made by the selected task should be minimal. Location-dependent calls are system calls that must be executed at the node where the task originated, because they use resources such as windows, the clock, or the mouse that exist only at that node.

iii. LOCATION POLICY :- A location policy determines to which node a task selected for transfer should be sent. It is likely to require information on the states of remote nodes in order to make its decisions. The responsibility of the location policy is to find a suitable "transfer partner" (sender or receiver) for a node to share load with. A widely used decentralized policy finds a suitable node through polling: a node polls another node to find out whether it is suitable for load sharing. Nodes can be polled either serially or in parallel (for example, by multicast). A node can be selected for polling at random, on the basis of information collected during previous polls, or on a nearest-neighbour basis. An alternative to polling is to broadcast a query seeking any node available for load sharing. In a centralized policy, a node contacts one specified node, called the coordinator, to locate a suitable node for load sharing. The coordinator collects information about the system (which is the responsibility of the information policy), and the transfer policy uses this information at the coordinator to select receivers.

iv. INFORMATION POLICY :- An information policy is responsible for triggering the collection of system state information. It decides when information about the states of other nodes is to be collected, from where it is to be collected, and what information is collected. There are three types of information policies:

a. DEMAND-DRIVEN POLICIES: Under these decentralized policies, a node collects the state of other nodes only when it becomes either a sender or a receiver, making it a suitable candidate to initiate load sharing. A demand-driven information policy is inherently a dynamic policy, as its actions depend on the system state. Demand-driven policies may be sender initiated, receiver initiated, or symmetrically initiated.
In sender-initiated policies, senders look for receivers to which they can transfer their load. In receiver-initiated policies, receivers solicit loads from senders. A symmetrically initiated policy is a combination of both: Load-sharing actions are triggered by the demand for extra processing power or extra work.


b. PERIODIC POLICIES: Under these policies, which may be either centralized or decentralized, nodes exchange information periodically. Depending on the information collected, the transfer policy at a node may decide to transfer tasks. Periodic information policies generally do not adapt their rate of activity to the system state. For example, the benefits resulting from load distributing are minimal at high system loads because most nodes in the system are busy; nevertheless, the overhead of periodic information collection continues to increase the system load and thus worsens the situation.

c. STATE-CHANGE-DRIVEN POLICIES: Under state-change-driven policies, nodes disseminate information about their states whenever their states change by a certain degree. A state-change-driven policy differs from a demand-driven policy in that it disseminates information about the state of a node rather than collecting information about other nodes. Under centralized state-change-driven policies, nodes send state information to a central collection point; under decentralized state-change-driven policies, nodes send the information to peers.
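To show how the four components fit together, here is a minimal Python sketch of one node running a demand-driven, sender-initiated scheme with a threshold transfer policy, a newly-arrived-task selection policy, and a random-polling location policy. It is an assumption-laden illustration, not a prescribed implementation; the poll and transfer callbacks stand in for whatever communication mechanism the system actually uses:

import random

class LoadSharingNode:
    def __init__(self, node_id, peers, threshold, poll_limit):
        self.node_id = node_id
        self.peers = peers            # identifiers of the other nodes
        self.threshold = threshold    # T, in units of CPU queue length
        self.poll_limit = poll_limit
        self.queue = []               # local CPU queue

    # Transfer policy: threshold on the CPU queue length.
    def is_sender(self):
        return len(self.queue) > self.threshold

    # Selection policy: only the newly originated task (nonpreemptive, cheap).
    def select_task(self, new_task):
        return new_task

    # Location policy: poll randomly chosen nodes, up to the poll limit.
    def find_receiver(self, poll):
        for peer in random.sample(self.peers, min(self.poll_limit, len(self.peers))):
            if poll(peer):            # poll(peer) asks whether peer can accept a task
                return peer
        return None

    # Information policy: demand driven; state is collected (by polling)
    # only when this node becomes a sender.
    def on_new_task(self, task, poll, transfer):
        self.queue.append(task)
        if self.is_sender():
            receiver = self.find_receiver(poll)
            if receiver is not None:
                self.queue.remove(task)
                transfer(self.select_task(task), receiver)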
5. STABILITY:- We first informally describe two views of stability: the queuing-theoretic perspective and the algorithmic perspective. According to the queuing-theoretic perspective, when the long-term arrival rate of work to a system is greater than the rate at which the system can perform work, the CPU queues grow without bound and the system is termed unstable. For example, consider a load distributing algorithm that performs excessive message exchanges to collect state information. The sum of the load due to the external work arriving and the load due to the overhead imposed by the algorithm can exceed the service capacity of the system, causing system instability. On the other hand, an algorithm can be stable but still cause a system to perform worse than the same system without the algorithm. Hence, we need a more restrictive criterion for evaluating algorithms: the effectiveness of an algorithm. A load-distributing algorithm is effective under a given set of conditions if it improves performance relative to a system not using load distributing. An effective algorithm cannot be unstable, but a stable algorithm can be ineffective. According to the algorithmic perspective, if an algorithm can perform fruitless actions indefinitely with nonzero probability, the algorithm is unstable. For example, consider processor thrashing: the transfer of a task to a receiver may increase the receiver's queue length to the point of overloading it, necessitating the transfer of that task to yet another node. This process may repeat indefinitely, with a task moving from one node to another in search of a lightly loaded node without ever receiving any service.

6. LOAD DISTRIBUTING ALGORITHMS:- During the past decade, many load distributing algorithms have been proposed. They illustrate how the components of load distributing algorithms fit together and show how the choice of components affects system stability. Representative load distributing algorithms are described below.

I. SENDER-INITIATED ALGORITHMS :- Under sender-initiated algorithms, load-distributing activity is initiated by an overloaded node (sender) trying to send a task to an underloaded node (receiver). Eager, Lazowska, and Zahorjan studied three simple, yet effective, fully distributed sender-initiated algorithms. The features of these policies are as follows:

a. TRANSFER POLICY: Each of the algorithms uses the same transfer policy, a threshold policy based on the CPU queue length. A node is identified as a sender if a new task originating at the node makes its queue length exceed a threshold T. A node identifies itself as a suitable receiver for a task transfer if accepting the task will not cause its queue length to exceed T.

b. SELECTION POLICY: All three algorithms use the same selection policy, considering only newly arrived tasks for transfer.


c. LOCATION POLICY: The algorithms differ only in their location policies, which are reviewed below (a sketch of the three policies is given at the end of this subsection).

i. Random: One algorithm has a simple dynamic location policy called random, which uses no remote state information. A task is simply transferred to a node selected at random, with no information exchange between the nodes to aid the decision. A problem with this approach is that useless task transfers can occur when a task is transferred to a node that is already heavily loaded (its queue length exceeds the threshold). A related issue is how a node should treat a transferred task. If a transferred task is treated as a new arrival, it can again be transferred to another node whenever the local queue length exceeds the threshold. In that case, irrespective of the average load of the system, the system can eventually enter a state in which the nodes spend all their time transferring tasks and no time executing them. A simple solution is to limit the number of times a task can be transferred. Despite its simplicity, the random location policy provides substantial performance improvement over systems that do not use load distributing.

ii. Threshold: A location policy can avoid useless task transfers by polling a node (selected at random) to determine whether transferring a task would make its queue length exceed T (see Figure 11.3). If not, the task is transferred to the selected node, which must execute the task regardless of its state when the task actually arrives. Otherwise, another node is selected at random and polled. To keep the overhead low, the number of polls is limited by a parameter called the poll limit. If no suitable receiver is found within the poll limit, the node at which the task originated must execute it. By avoiding useless task transfers, the threshold policy provides a substantial performance improvement over the random location policy.

iii. Shortest: The two previous approaches make no effort to choose the best destination node for a task. Under the shortest location policy, a number of nodes (equal to the poll limit) are selected at random and polled to determine their queue lengths. The node with the shortest queue is selected as the destination, unless its queue length is greater than or equal to T. The destination node executes the task regardless of its queue length when the transferred task arrives. The performance improvement obtained by the shortest policy over the threshold policy was found to be marginal, indicating that using more detailed state information does not necessarily improve system performance significantly.

d. INFORMATION POLICY: With either the shortest or the threshold location policy, polling starts when the transfer policy identifies a node as the sender of a task. Hence, the information policy is demand driven.

e. STABILITY: Sender-initiated algorithms using any of the three location policies cause system instability at high system loads. At such loads, no node is likely to be lightly loaded, so a sender is unlikely to find a suitable destination node. However, the polling activity in sender-initiated algorithms increases as the task arrival rate increases, eventually reaching a point where the cost of load sharing is greater than its benefit. At an extreme, the workload that cannot be offloaded from a node, together with the overhead incurred by polling, exceeds the node's CPU capacity and instability results.
Thus, the actions of sender-initiated algorithms are not effective at high system loads and cause system instability, because the algorithms fail to adapt to the system state.
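The three location policies differ only in how a destination is chosen; the following sketch (illustrative, with the queue-length polling function passed in as a callback) contrasts them:

import random

def random_location(nodes):
    # Random: pick a destination using no remote state information at all.
    return random.choice(nodes)

def threshold_location(nodes, queue_length_of, T, poll_limit):
    # Threshold: poll randomly chosen nodes and take the first one whose queue
    # would not exceed T after receiving the task.
    for node in random.sample(nodes, min(poll_limit, len(nodes))):
        if queue_length_of(node) + 1 <= T:
            return node
    return None          # no suitable receiver: execute the task locally

def shortest_location(nodes, queue_length_of, T, poll_limit):
    # Shortest: poll poll_limit nodes and pick the one with the shortest queue,
    # unless even that queue is at or above the threshold.
    polled = random.sample(nodes, min(poll_limit, len(nodes)))
    best = min(polled, key=queue_length_of)
    return best if queue_length_of(best) < T else None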

II. RECEIVER-INITIATED ALGORITHMS :- In receiver-initiated algorithms, load distributing activity is initiated by an underloaded node (receiver), which tries to obtain a task from an overloaded node (sender), as shown in Figure 11.4.

a. TRANSFER POLICY: The algorithm's threshold transfer policy bases its decision on the CPU queue length and is triggered when a task departs. If the local queue length falls below the threshold T, the node is identified as a receiver for obtaining a task from a node (a sender) to be determined by the location policy. A node is identified as a sender if its queue length exceeds the threshold T.

b. SELECTION POLICY: The algorithm considers all tasks for load distributing and can use any of the approaches discussed earlier.


c. LOCATION POLICY: The location policy selects a node at random and polls it to determine whether transferring a task from that node would place its queue length below the threshold (a short polling sketch is given at the end of this subsection). If not, the polled node transfers a task. Otherwise, another node is selected at random, and the procedure is repeated until either a node that can transfer a task (a sender) is found or a static poll limit number of tries has failed to find one. A problem with this location policy is that if all polls fail to find a sender, the processing power available at the receiver is lost to the system until another task originates locally at the receiver (which may not happen for a long time). The problem severely affects performance in systems where only a few nodes generate most of the system workload, since random polling by receivers can easily miss them. The remedy is simple: if all the polls fail to find a sender, the node waits until another task departs, or for a predetermined period, before reinitiating the load distributing activity, provided it is still a receiver.

d. INFORMATION POLICY: The information policy is demand driven, since polling starts only after a node becomes a receiver.

e. STABILITY: Receiver-initiated algorithms do not cause system instability because, at high system loads, a receiver is likely to find a suitable sender within a few polls. Consequently, polls become increasingly effective as the system load increases, and little CPU capacity is wasted. At low system loads there are fewer sender-initiated polls and more receiver-initiated polls; these polls do not cause system instability, as spare CPU cycles are available at low system loads.

A DRAWBACK: Under the most widely used CPU scheduling disciplines (such as round-robin and its variants), a newly arrived task is quickly given a quantum of service. In receiver-initiated algorithms, polling starts when a node becomes a receiver; such polls seldom arrive at senders just after new tasks have arrived there but before those tasks have begun executing. Consequently, most transfers are preemptive and therefore expensive. Sender-initiated algorithms, on the other hand, make greater use of nonpreemptive transfers, since they can initiate load-distributing activity as soon as a new task arrives.
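The receiver-initiated location policy can be sketched in the same style as the earlier examples (again illustrative; queue_length_of and request_task abstract the actual polling and transfer messages):

import random

def receiver_initiated_poll(peers, queue_length_of, T, poll_limit, request_task):
    # Triggered when a task departure leaves this node's queue below T.
    for peer in random.sample(peers, min(poll_limit, len(peers))):
        # The polled node transfers a task only if doing so would not drop
        # its own queue length below the threshold, i.e. it really is a sender.
        if queue_length_of(peer) > T:
            return request_task(peer)   # often a preemptive (expensive) transfer
    # All polls failed: wait for the next task departure, or for a timeout,
    # before reinitiating load-distributing activity.
    return None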
III. SYMMETRICALLY INITIATED ALGORITHMS :- Under symmetrically initiated algorithms, both senders and receivers initiate load-distributing activities for task transfers. These algorithms combine the advantages of sender- and receiver-initiated algorithms: at low system loads, the sender-initiated component is more successful at finding underloaded nodes, while at high system loads, the receiver-initiated component is more successful at finding overloaded nodes. However, they may also suffer the disadvantages of both. As with sender-initiated algorithms, polling at high system loads may result in system instability; as with receiver-initiated algorithms, a preemptive task transfer facility is necessary. A simple symmetrically initiated algorithm can be constructed by combining the transfer and location policies described for sender-initiated and receiver-initiated algorithms. Another symmetrically initiated algorithm, called the above-average algorithm, is described below.

THE ABOVE-AVERAGE ALGORITHM: The above-average algorithm, proposed by Krueger and Finkel, tries to maintain the load at each node within an acceptable range of the system average. Striving to maintain the load at a node at exactly the system average can cause processor thrashing, as the transfer of a single task may turn a node into either a sender (load above average) or a receiver (load below average). The description of this algorithm is as follows (a message-flow sketch is given at the end of the description):


a. TRANSFER POLICY: The transfer policy is a threshold policy that uses two adaptive thresholds, equidistant from the node's estimate of the average load across all nodes. For example, if a node's estimate of the average load is 2, then the lower threshold is 1 and the upper threshold is 3. A node whose load is less than the lower threshold is considered a receiver, while a node whose load is greater than the upper threshold is considered a sender. Nodes whose loads fall between these thresholds lie within the acceptable range and are neither senders nor receivers.

b. LOCATION POLICY: The location policy has the following two components:

i. Sender-Initiated Component :- A sender (a node whose load is above the acceptable range) broadcasts a TooHigh message, sets a TooHigh timeout alarm, and listens for an Accept message until the timeout expires. A receiver (a node whose load is below the acceptable range) that receives a TooHigh message cancels its TooLow timeout, sends an Accept message to the source of the TooHigh message, increases its load value (taking into account the task to be received), and sets an AwaitingTask timeout. Increasing its load value prevents the receiver from over-committing itself to accepting remote tasks. If the AwaitingTask timeout expires without the arrival of the transferred task, the load value at the receiver is decreased. On receiving an Accept message, if the node is still a sender, it chooses the best task to transfer and transfers it to the node that responded. When a sender that is waiting for a response to its TooHigh message receives a TooLow message, it sends a TooHigh message to the node that sent the TooLow message. On expiration of the TooHigh timeout, if no Accept message has been received, the sender infers that its estimate of the average system load is too low. To correct this, the sender broadcasts a ChangeAverage message to increase the average load estimate at the other nodes.

ii. Receiver-Initiated Component :- A node, on becoming a receiver, broadcasts a TooLow message, sets a TooLow timeout alarm, and starts listening for a TooHigh message. If a TooHigh message is received, the receiver performs the same actions as under the sender-initiated negotiation. If the TooLow timeout expires before any TooHigh message is received, the receiver broadcasts a ChangeAverage message to decrease the average load estimate at the other nodes.

c. SELECTION POLICY :- This algorithm can use any of the approaches discussed earlier.

d. INFORMATION POLICY :- The information policy is demand driven. A highlight of this algorithm is that the average system load is determined individually at each node, imposing little overhead and requiring few message exchanges. Another key point is that the acceptable range determines the responsiveness of the algorithm: when the communication network is heavily (or lightly) loaded, the acceptable range can be increased (or decreased) by each node individually, so that the load balancing actions adapt to the state of the communication network as well.
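The message flow of the above-average algorithm can be summarized by the following event-handler sketch. It is an interpretation of the description above; the transport, timers, and message formats are abstracted as callbacks and are not prescribed by the algorithm's authors:

class AboveAverageNode:
    def __init__(self, load, avg_estimate, margin, broadcast, send):
        self.load = load
        self.avg = avg_estimate      # this node's estimate of the system average
        self.margin = margin         # half-width of the acceptable range
        self.broadcast = broadcast   # broadcast(message) to all nodes
        self.send = send             # send(message, destination)

    def state(self):
        if self.load > self.avg + self.margin:
            return "sender"
        if self.load < self.avg - self.margin:
            return "receiver"
        return "acceptable"

    def on_become_sender(self):
        self.broadcast("TooHigh")    # then wait for an Accept until a timeout

    def on_become_receiver(self):
        self.broadcast("TooLow")     # then wait for a TooHigh until a timeout

    def on_too_high(self, sender_id):
        if self.state() == "receiver":
            self.load += 1           # reserve capacity so we do not over-commit
            self.send("Accept", sender_id)   # and set an AwaitingTask timeout

    def on_too_high_timeout(self):
        # No Accept arrived: the local estimate of the average is too low.
        self.broadcast("ChangeAverage: increase")

    def on_too_low_timeout(self):
        # No TooHigh arrived: the local estimate of the average is too high.
        self.broadcast("ChangeAverage: decrease")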


IV. ADAPTIVE ALGORITHMS :-

i. A STABLE SYMMETRICALLY INITIATED ALGORITHM :- The main cause of system instability due to load sharing in the previously reviewed algorithms is indiscriminate polling by the sender's negotiation component. The stable symmetrically initiated algorithm uses the information gathered during polling (instead of discarding it, as the previous algorithms do) to classify the nodes in the system as sender (overloaded), receiver (underloaded), or OK (having a manageable load). The knowledge about the state of nodes is maintained at each node by a data structure composed of a senders list, a receivers list, and an OK list (a sketch of these lists and the two polling orders is given at the end of this algorithm's description). These lists are maintained using an efficient scheme in which list-manipulative actions, such as moving a node from one list to another or determining to which list a node belongs, impose a small and constant overhead, irrespective of the number of nodes in the system. Consequently, this algorithm scales well to large distributed systems. Initially, each node assumes that every other node is a receiver. This state is represented at each node by a receivers list containing all nodes (except the node itself) and an empty senders list and OK list.

a. TRANSFER POLICY: The transfer policy is a threshold policy in which decisions are based on the CPU queue length. It is triggered when a new task originates or when a task departs, and it uses two threshold values, a lower threshold (LT) and an upper threshold (UT), to classify the nodes. A node is a sender if its queue length is greater than its upper threshold, a receiver if its queue length is less than its lower threshold, and OK otherwise.

b. LOCATION POLICY: The location policy has two components: a sender-initiated component and a receiver-initiated component. The sender-initiated component is triggered at a node when it becomes a sender. The sender polls the node at the head of its receivers list to determine whether it is still a receiver. The polled node removes the sender's ID from whatever list it is currently in, puts it at the head of its senders list, and informs the sender whether it is currently a receiver, a sender, or OK. On receipt of this reply, the sender transfers the new task if the polled node has indicated that it is a receiver; otherwise, the polled node's ID is removed from the receivers list and put at the head of the OK list or the senders list, according to its reply, and the sender polls the node now at the head of its receivers list. Polling stops if a suitable receiver is found for the newly arrived task, if the number of polls reaches a poll limit (a parameter of the algorithm), or if the receivers list at the sender becomes empty. If polling fails to find a receiver, the task is processed locally, though it may later be transferred preemptively as a result of receiver-initiated load sharing. The goal of the receiver-initiated component is to obtain tasks from a sender node. It is triggered at a node when the node becomes a receiver. The nodes to be polled are selected in the following order:

(1) Head to tail in the senders list, so that the most up-to-date information is used first.
(2) Tail to head in the OK list, so that the most out-of-date information is used first, in the hope that the node has since become a sender.
(3) Tail to head in the receivers list, again using the most out-of-date information first.

The receiver polls the selected node to determine whether it is a sender. On receipt of the poll, the polled node, if it is a sender, transfers a task to the polling node and informs it of its state after the transfer. If the polled node is not a sender, it removes the receiver's ID from whatever list it is currently in, puts it at the head of its receivers list, and informs the receiver whether it (the polled node) is a receiver or OK.
On receipt of this reply, the receiver removes the polled node's ID from whatever list it is presently in and puts it at the head of its receivers list or OK list, based on the reply.


Polling stops if a sender is found, if the receiver is no longer a receiver, or if the number of polls reaches a static poll limit.

c. SELECTION POLICY: The sender-initiated component considers only newly arrived tasks for transfer. The receiver-initiated component can use any of the approaches discussed under "Selection policy" in the "Issues in load distributing" section.

d. INFORMATION POLICY: The information policy is demand driven, as polling starts when a node becomes either a sender or a receiver.

DISCUSSION: At high system loads, the probability of a node being underloaded is negligible, so polls by the sender-initiated component are unsuccessful. Unsuccessful polls result in the removal of polled node IDs from the receivers lists. Unless receiver-initiated polls to these nodes fail to find senders there (which is unlikely at high system loads), the receivers lists remain empty. This prevents future sender-initiated polls at high system loads (which would most likely fail). Hence, the sender-initiated component is deactivated at high system loads, leaving only receiver-initiated load sharing (which is effective at such loads). At low system loads, receiver-initiated polls are frequent and generally fail. These failures do not adversely affect performance, since extra processing capacity is available at low system loads; moreover, the polls have the positive effect of updating the receivers lists. With the receivers lists accurately reflecting the system's state, future sender-initiated load sharing will generally succeed within a few polls. Thus, by using sender-initiated load sharing at low system loads, receiver-initiated load sharing at high loads, and symmetrically initiated load sharing at moderate loads, the stable symmetrically initiated algorithm achieves improved performance over a wide range of system loads while preserving system stability.
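The senders/receivers/OK bookkeeping and the two polling orders can be sketched as follows (illustrative only; the list operations shown are linear-time for clarity, whereas the algorithm assumes a constant-overhead implementation):

from collections import deque

class StatusLists:
    def __init__(self, my_id, all_nodes):
        # Initially every other node is assumed to be a receiver.
        self.senders = deque()
        self.ok = deque()
        self.receivers = deque(n for n in all_nodes if n != my_id)

    def record_reply(self, node, status):
        # Move the polled node to the head of the list matching its reply.
        for lst in (self.senders, self.ok, self.receivers):
            if node in lst:
                lst.remove(node)
        {"sender": self.senders,
         "ok": self.ok,
         "receiver": self.receivers}[status].appendleft(node)

    def sender_poll_order(self):
        # Sender-initiated component: poll the receivers list from the head.
        return list(self.receivers)

    def receiver_poll_order(self):
        # Receiver-initiated component: senders head to tail, then the OK and
        # receivers lists tail to head (most out-of-date information first).
        return (list(self.senders) + list(reversed(self.ok))
                + list(reversed(self.receivers)))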
ii. A STABLE SENDER-INITIATED ALGORITHM :- This algorithm has two desirable properties. FIRST, it does not cause instability. SECOND, load sharing is performed entirely with nonpreemptive transfers, which are cheaper than preemptive transfers. The algorithm uses the sender-initiated load-sharing component of the previous approach but has a modified receiver-initiated component whose purpose is to attract future nonpreemptive task transfers from sender nodes. The stable sender-initiated algorithm is very similar to the stable symmetrically initiated algorithm, except that the data structure at each node is augmented by an array called the statevector (a sketch of this bookkeeping is given at the end of this description). Each node uses its statevector to keep track of which list (senders, receivers, or OK) it belongs to at every other node in the system; for example, statevector[nodeid] indicates to which list the node belongs at the node identified by nodeid. As in the stable symmetrically initiated algorithm, the overhead for maintaining this data structure is small and constant, irrespective of the number of nodes in the system. The sender-initiated load sharing is augmented with the following step: when a sender polls a selected node, the sender's statevector is updated to show that the sender now belongs to the senders list at the selected node, and likewise the polled node updates its statevector, based on the reply it sent, to reflect which list it will belong to at the sender. The receiver-initiated component is replaced by the following protocol:


When a node becomes a receiver, it informs only those nodes that are misinformed about its current state, namely the nodes whose receivers lists do not contain the receiver's ID. This information is available in the statevector at the receiver, which is then updated to reflect that the node now belongs to the receivers list at all the nodes that were misinformed. By this technique, the algorithm avoids having receivers broadcast messages to inform other nodes that they are receivers; broadcast messages impose message handling overhead at every node in the system, and this overhead can be high if nodes change their state frequently. There are no preemptive transfers of partly executed tasks in this algorithm: the sender-initiated load-sharing component performs any task transfers, if possible, upon the arrival of a new task. The reasons for this algorithm's stability are the same as for the stable symmetrically initiated algorithm.
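The statevector bookkeeping and the targeted (non-broadcast) receiver notification can be sketched as follows (illustrative; notify stands in for whatever message is actually sent to a misinformed node):

class StateVector:
    def __init__(self, my_id, all_nodes):
        # Initially this node assumes it is on the receivers list everywhere.
        self.where = {n: "receivers" for n in all_nodes if n != my_id}

    def on_sender_poll(self, polled_node):
        # After polling, this node is on the senders list at the polled node.
        self.where[polled_node] = "senders"

    def misinformed_nodes(self):
        # Nodes whose receivers lists do not contain this node.
        return [n for n, lst in self.where.items() if lst != "receivers"]

    def on_become_receiver(self, notify):
        # Inform only the misinformed nodes, instead of broadcasting.
        for n in self.misinformed_nodes():
            notify(n)
            self.where[n] = "receivers"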

7. PERFORMANCE COMPARISON:- The general performance trends of some of the algorithms discussed earlier are shown in Figures 11.5 through 11.7, which plot the average response time against the offered system load for the load-sharing algorithms. In addition, we compare their performance with that of a system that performs no load distributing and with that of an ideal system that performs perfect load distributing without incurring any overhead. For the simulation we made the following assumptions: task interarrival times and service demands are independently and exponentially distributed, the average task CPU service demand is one time unit, the system load is homogeneous (that is, all nodes have the same long-term task arrival rate), and the system contains 40 identical nodes. The notations used in the figures correspond to the algorithms as follows:

M/M/1: a distributed system that performs no load distributing;
RAND: a sender-initiated algorithm with a random location policy, assuming that a task can be transferred at most once;
SEND: a sender-initiated algorithm with a threshold location policy;
SYM: a symmetrically initiated algorithm (sender- and receiver-initiated algorithms combined);
ADSEND: a stable sender-initiated algorithm;
RECV: a receiver-initiated algorithm;
ADSYM: a stable symmetrically initiated algorithm;
M/M/K: a distributed system that performs ideal load distributing without incurring any overhead for load distributing.

A fixed threshold of T = upper threshold = lower threshold = 1 was used for each algorithm. In principle, however, the value of T should adapt to the system load and the task transfer cost, because a node is identified as a sender or a receiver by comparing its queue length with T. At low system loads, many nodes are likely to be idle, so a low value of T results in nodes with small queue lengths being identified as senders that can benefit by transferring load. At high system loads, most nodes are likely to be busy, so a high value of T results in only those nodes with significant queue lengths, which can benefit the most from transferring load, being identified as senders. While a scheduling algorithm may adapt to the system load by using an adaptive T, the adaptive stable algorithms adapt to the system load by, in effect, varying the poll limit with the help of their lists. Also, low thresholds are desirable when transfer costs are low, since smaller differences in node queue lengths can then be exploited; high transfer costs demand higher thresholds.

For these comparisons, we assumed a small fixed poll limit. A small limit is sufficient: if P is the probability that a particular node is below threshold, then the probability that a node below threshold is first encountered on the i-th poll is P(1 - P)^(i-1). (This result assumes that nodes are independent, a valid assumption if the poll limit is small relative to the number of nodes in the system.) For large P, this expression decreases rapidly with increasing i, so the probability of succeeding within the first few polls is high. For small P, the quantity decreases more slowly; however, since most nodes are then above threshold, the improvement in systemwide response time that would result from locating a node below threshold is small, and quitting the search after the first few polls does not carry a substantial penalty.
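A small numerical check of this expression (illustrative): summing P(1 - P)^(i-1) over the first k polls gives 1 - (1 - P)^k, the probability of finding a below-threshold node within a poll limit of k.

def success_within_poll_limit(p, poll_limit):
    # Probability of encountering a below-threshold node within poll_limit
    # independent random polls, each succeeding with probability p.
    return 1 - (1 - p) ** poll_limit

# For example, with p = 0.5 and a poll limit of 5, the search succeeds with
# probability 1 - 0.5**5 = 0.96875, so the first few polls almost always suffice.
print(success_within_poll_limit(0.5, 5))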


MAIN RESULT: The ability of load distributing to improve performance is intuitively obvious when work arrives at some nodes at a greater rate than at others, or when some nodes have faster processors than others. The performance advantages are less obvious when all nodes are equally powerful and have equal workloads over the long term. Figure 11.5 plots the average task response time against the offered system load for such a homogeneous system under each load-distributing algorithm. Comparing M/M/1 with the sender-initiated algorithm with a random location policy, we see that even this simple load-distributing scheme provides a substantial performance improvement over a system that does not use load distributing. Considerable further improvement can be gained through the simple sender-initiated (threshold location policy) and receiver-initiated load-sharing schemes. The performance of the best algorithm, the stable symmetrically initiated algorithm, approaches that of M/M/K, though this optimistic lower bound can never be reached, since it assumes no load-distribution overhead.

a. RECEIVER-INITIATED vs. SENDER-INITIATED LOAD SHARING :- Figure 11.5 shows that the sender-initiated algorithm with a threshold location policy performs marginally better than the receiver-initiated algorithm at light to moderate system loads, while the receiver-initiated algorithm performs substantially better at high system loads (even though the preemptive transfers it uses are much more expensive than the nonpreemptive transfers used by the sender-initiated algorithm). Receiver-initiated load sharing is less effective at low system loads because load sharing is not initiated at the time that one of the few nodes becomes a sender, and thus load sharing often occurs late.

Figure 11.5: Average response time vs. system load.

In robustness, the receiver-initiated policy has an edge over the sender-initiated policy. The receiver-initiated policy performs acceptably over the entire system load spectrum, whereas the sender-initiated policy causes system instability at high loads. At such loads, the receiver-initiated policy maintains system stability because its polls generally find busy nodes, while polls under the sender-initiated policy are generally ineffective and waste resources in the effort to find underloaded nodes.


b. SYMMETRICALLY INITIATED LOAD SHARING :- This policy takes advantage of its sender-initiated load-sharing component at low system loads, its receiver-initiated component at high system loads, and both at moderate system loads. Hence, its performance is better than or matches that of the sender-initiated algorithm with a threshold location policy at all levels of system load, and is better than that of the receiver-initiated policy at low to moderate system loads, as shown in Figure 11.6.

Nevertheless, this policy also causes system instability at high system loads because of the ineffective polling by its sender-initiated component at such loads.

c. STABLE LOAD-SHARING ALGORITHMS :- The performance of the stable symmetrically initiated algorithm (ADSYM) approaches that of M/M/K (Figure 11.7), though this optimistic lower bound can never be reached, as it assumes no load-distributing overhead. The performance of ADSYM matches that of the best of the other algorithms at low system loads and offers substantial improvement at high loads (greater than 0.85) over all the nonadaptive algorithms. This improvement results from its judicious use of the knowledge gained by polling. Furthermore, the algorithm does not cause system instability. The stable sender-initiated algorithm yields performance as good as or better than the sender-initiated algorithm with a threshold location policy, with marked improvement at loads greater than 0.6, and yields better performance than the receiver-initiated policy for system loads less than 0.85; it, too, does not cause system instability. While it is not as good as the stable symmetrically initiated algorithm, it does not require expensive preemptive task transfers.


d. PERFORMANCE UNDER HETEROGENEOUS WORKLOADS :- Heterogeneous workloads are common in distributed systems. Figure 11.8 plots the mean response time against the number of load-generating nodes in the system. All system workload is assumed to originate at this subset of nodes, with none originating at the remaining nodes; a smaller subset of load-generating nodes therefore indicates a higher degree of heterogeneity. A system load of 0.85 is assumed. Without load distributing, the system becomes unstable even at low levels of heterogeneity under this load; while these results are not plotted, instability occurs for M/M/1 when the number of load-generating nodes is less than or equal to 33. Among the load-distributing algorithms, Figure 11.8 shows that the receiver-initiated algorithm becomes unstable at a much lower degree of heterogeneity than any other algorithm. The instability occurs because random polling is unlikely to find a sender when only a few nodes are senders. The sender-initiated algorithm with a threshold location policy also becomes unstable at relatively low levels of heterogeneity.

As fewer nodes receive the entire system load, they must transfer tasks more quickly, but the senders become overwhelmed as random polling results in many wasted polls. The symmetrically initiated algorithm also becomes unstable, though at higher levels of heterogeneity, because of ineffective polling; it outperforms the receiver- and sender-initiated algorithms because it can transfer tasks at a higher rate than either. The stable sender-initiated algorithm remains stable for higher levels of heterogeneity than the sender-initiated algorithm with a threshold location policy because it is able to poll more effectively. Its eventual instability results from the absence of preemptive transfers, which prevents senders from transferring existing tasks even after they learn about receivers; thus the senders become overwhelmed. The sender-initiated algorithm with a random location policy, the simplest algorithm of all, performs better than most algorithms at extreme levels of heterogeneity: by simply transferring tasks from the load-generating nodes to randomly selected nodes without any regard to their status, it essentially balances the load across all nodes in the system, thus avoiding instability. Only the stable symmetrically initiated algorithm remains stable for all levels of heterogeneity. Interestingly, it performs better with increasing heterogeneity: as heterogeneity increases, senders rarely change their states and will generally appear in the senders lists at the non-load-generating nodes, while the non-load-generating nodes alternate between the OK and receiver states and appear in the OK or receivers lists at the load-generating nodes.


With the lists accurately representing the system state, nodes are often successful in finding transfer partners.

8. SELECTING A SUITABLE LOAD SHARING ALGORITHM:- Based on the performance trends of load sharing algorithms, one may select a load sharing algorithm appropriate to the system under consideration as follows (a small decision sketch summarizing these guidelines appears after Section 9):

a. If the system never attains high loads, sender-initiated algorithms will give an improved average response time over no load sharing at all.

b. Stable scheduling algorithms are recommended for systems that can reach high loads. These algorithms perform better than nonadaptive algorithms for the following reasons. Under sender-initiated algorithms, an overloaded processor must send inquiry messages, delaying the existing tasks. If an inquiry fails, two overloaded processors are adversely affected because of unnecessary message handling; the performance impact of an inquiry is therefore quite severe at high system loads, where most inquiries fail. Receiver-initiated algorithms remain effective at high loads but require the use of preemptive task transfers, which are expensive compared to nonpreemptive task transfers because they involve saving and communicating a far more complicated task state.

c. For a system that experiences a wide range of load fluctuations, the stable symmetrically initiated scheduling algorithm is recommended, because it provides improved performance and stability over the entire spectrum of system loads.

d. For a system that experiences wide fluctuations in load and has a high cost for the migration of partly executed tasks, stable sender-initiated algorithms are recommended: they perform better than unstable sender-initiated algorithms at all loads, perform better than receiver-initiated algorithms over most system loads, and are stable at high loads.

e. For a system that experiences heterogeneous work arrivals, adaptive stable algorithms are preferable, as they provide substantial performance improvement over nonadaptive algorithms.

9. REQUIREMENTS FOR LOAD DISTRIBUTING:- While improving system performance is the main objective of a load distributing scheme, there are other important requirements it must satisfy.

a. SCALABILITY: The scheme should work well in large distributed systems. This requires the ability to make quick scheduling decisions with minimal overhead.

b. LOCATION TRANSPARENCY: A distributed system should hide the location of tasks, just as a network file system hides the location of files from the user. In addition, the remote execution of tasks should not require any special provisions in the programs.

c. DETERMINISM: A transferred task must produce the same results that it would have produced had it not been transferred.

d. PREEMPTION: While utilizing idle workstations in their owners' absence improves the utilization of resources, a workstation's owner must not see degraded performance upon returning. Guaranteeing the availability of the workstation's resources to its owner requires that remotely executed tasks be preempted and migrated elsewhere on demand; alternatively, such tasks may be executed at a lower priority.

e. HETEROGENEITY: The scheme should be able to distinguish among different architectures, processors of different processing capability, servers equipped with special hardware, and so on.
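The selection guidance of Section 8 can be condensed into a small decision sketch (illustrative only; the boolean parameters are paraphrases of the conditions listed above, not part of the original text):

def recommend_algorithm(reaches_high_loads, wide_load_fluctuations,
                        high_migration_cost, heterogeneous_arrivals):
    if heterogeneous_arrivals:
        return "adaptive stable algorithm (e.g. stable symmetrically initiated)"
    if wide_load_fluctuations and high_migration_cost:
        return "stable sender-initiated algorithm"
    if wide_load_fluctuations:
        return "stable symmetrically initiated algorithm"
    if reaches_high_loads:
        return "a stable (adaptive) scheduling algorithm"
    return "a simple sender-initiated algorithm"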


10. TASK MIGRATION:- The performance comparison of several load sharing algorithms showed that receiver-initiated task transfers can improve system performance at high system loads. However, receiver-initiated transfers require preemptive task transfers. Even though most systems do not operate at high system loads, an occasional occurrence of high system load can disrupt service to the users; if such circumstances are frequent, system designers may consider providing a preemptive task transfer facility. Also, some distributed schedulers for workstation environments guarantee the workstation to its owner by preempting foreign tasks and migrating them to another workstation, while other distributed schedulers for this environment require preemptive task transfers to avoid starvation. Another situation where preemptive transfers are beneficial is when most of the system load originates at a few nodes; in this case, receiver-initiated task transfers result in improved system performance.

Task migration facilities allow preemptive transfers. Task placement refers to the transfer of a task that has yet to begin execution to a new location, where its execution is started. Task migration refers to the transfer of a task that has already begun execution to a new location, where it continues executing. To migrate a partially executed task to a new location, the task's state must be made available at the new location. The general steps involved in task migration are as follows (a code-level outline is given after the list of benefits):

a. STATE TRANSFER: The transfer of the task's state to the new machine. The task's state includes information such as the contents of the registers, the task stack, the task's status (ready, blocked, and so on), its virtual memory address space, file descriptors, any temporary files the task might have created, and buffered messages. In addition, the current working directory, signal masks and handlers, resource usage statistics, references to child processes (if any), and so on may be maintained by the kernel as part of the task's state. The task is suspended at some point during the transfer so that its state does not change further, and then the transfer of the task's state is completed.

b. UNFREEZE: The task is installed at the new machine and placed in the ready queue so that it can continue executing.

BENEFITS OF TASK MIGRATION:
a. Load balancing.
b. Reduction in communication overheads.
c. Resource access.
d. Fault tolerance.
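The two migration steps can be outlined as follows. This is a hypothetical sketch: the source and destination objects and their freeze/collect_state/install_state/unfreeze operations are stand-ins for whatever interfaces a real migration facility provides.

def migrate_task(task, source, destination):
    # 1. State transfer: suspend the task so its state stops changing, then
    #    collect and ship the state (registers, stack, address space, open
    #    files, buffered messages, and so on) to the new machine.
    source.freeze(task)
    state = source.collect_state(task)
    destination.install_state(task, state)
    source.remove(task)

    # 2. Unfreeze: install the task at the new machine and put it on the
    #    ready queue so that it continues executing there.
    destination.unfreeze(task)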


Residual dependencies are undesirable for several reasons, relating to reliability, performance, and complexity:
i. Residual dependencies reduce reliability because the migrated task depends on its previous host(s). If any one of the hosts from which the task previously migrated fails, the task might be unable to make progress.
ii. Residual dependencies affect the performance of the migrated task. Since a memory access or a system call made by a migrated task may have to be redirected to a previous host, the communication delays of these remote operations can slow the task's progress.
iii. Residual dependencies also reduce the availability of previous hosts by increasing their loads, due to the remote operations initiated by the tasks migrated from them.
iv. Finally, residual dependencies complicate the system's operation by distributing a task's state among several nodes. For instance, the checkpointing and recovery of a process become much more complex if its state is distributed among many nodes. As another example, memory management may become more complex because the memory manager must distinguish between memory segments that belong to local tasks and those that belong to remote (migrated) tasks. The situation can get much worse if a task migrates several times.
b. LOCATION TRANSPARENCY: Many distributed systems support the notion of location transparency, wherein services are provided to user processes irrespective of the location of the processes and services. In distributed systems that support task migration, it is essential that location transparency be supported. That is, task migration should hide the location of tasks, just as a distributed file system hides the location of files. In addition, the remote execution of tasks should not require any special provisions in the programs. Location transparency in principle requires that names be independent of their locations. By implementing a uniform name space throughout the system, a task can be guaranteed the same access to resources independent of its present location of execution. In addition, the migration of tasks should be transparent to the rest of the system. In other words, any operation or communication that was possible before the migration of a task should also be possible after its migration.
Typically, the mapping of names to physical addresses in distributed systems is handled in two ways. FIRST, addresses are maintained as hints. If an access fails, hints can be updated either by multicasting a query or through some other means. This method poses no serious hindrance to task migration; the effect of a migration in such a system is simply that hints holding the task's address are no longer correct (a small sketch of hint-based resolution follows this subsection). SECOND, an object can be accessed with the help of pointers. In such cases, whenever a task migrates, the pointers may have to be updated to enable continued access to and from the new location. If the pointers are maintained by the kernel, then it is relatively easy to update them. On the other hand, if the pointers are maintained in the address space of tasks, then updating them can become more difficult.
Transferring the entire state of a migrating task to the new location also aids in achieving location transparency, because it allows most kernel calls to be local rather than remote. For example, the kernel at the new machine can handle the requests for virtual memory management, file I/O, IPC, and so on.
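The hint-based (FIRST) approach can be sketched as follows. The hosts table, query_all_hosts, and resolve are hypothetical stand-ins used only to illustrate the idea: the cached hint is tried first, and a stale hint left behind by a migration is repaired by a system-wide query.

    # Illustrative sketch only: hint-based name resolution under task migration.
    # hosts maps each host to the set of task ids currently running there
    # (a stand-in for asking the host directly).
    hosts = {"nodeA": {"t1"}, "nodeB": set()}
    location_hints = {"t1": "nodeA"}           # last known host of each task

    def query_all_hosts(task_id):
        # Stand-in for multicasting a location query to every host.
        for host, tasks in hosts.items():
            if task_id in tasks:
                return host
        raise KeyError(task_id)

    def resolve(task_id):
        host = location_hints.get(task_id)
        if host is not None and task_id in hosts.get(host, set()):
            return host                        # hint still valid
        host = query_all_hosts(task_id)        # hint stale (e.g. after migration)
        location_hints[task_id] = host         # refresh the hint
        return host

    # After "t1" migrates from nodeA to nodeB, the stale hint is detected,
    # corrected by the query, and later accesses go straight to nodeB.
    hosts["nodeA"].discard("t1"); hosts["nodeB"].add("t1")
    print(resolve("t1"))                       # -> nodeB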
c. STRUCTURE OF A MIGRATION MECHANISM: The first issue in the design of a task migration facility is deciding whether to separate the policy-making modules from the mechanism modules. The mechanism modules are those responsible for collecting,


transferring, and reinstating the state of migrating tasks. This decision is important, as it has implications for both performance and the ease of development. By separating the two, one can easily test different policies without having to change the mechanisms, and vice versa. Thus, the separation of policy and mechanism modules simplifies the development effort (a small sketch of this separation appears at the end of this section).
The second issue in the design of a task migration facility is deciding where the policy and mechanism modules should reside. The first step in the migration of a task is to collect the task's state. Typically, some part of the state, such as file pointers, references to child processes, etc., is maintained in the kernel's data structures. In addition, the migration mechanism is closely intertwined with the interprocess communication (IPC) mechanisms, which are generally inside the kernel. Hence, the migration mechanism may best fit inside the kernel. Policy modules decide whether a task transfer should occur. If the process of making these decisions is simple, the policy modules can be placed in the kernel; this makes the implementation more efficient, as the two types of modules can interact directly. If the policy modules require large amounts of state information from the kernel to make decisions, then it may also be more efficient to place them in the kernel. If, however, the policy modules do not impose a heavy overhead on the system due to their interactions with the kernel, then they fit best in utility processes.
The third issue is the interplay between the task migration mechanism and various other mechanisms, which plays an important role in deciding where a module resides. Typically, there will be interaction between the task migration mechanism, the memory management system, the interprocess communication mechanisms, and the file system. The mechanisms can be designed to be independent of one another, so that if one mechanism's protocol changes, the others need not change. Another flexibility provided by this principle is that the migration mechanism can be turned off without interfering with the other mechanisms. On the other hand, integrating the mechanisms can reduce redundancy and make use of existing facilities. One serious disadvantage of integrated mechanisms, however, is that if one mechanism breaks down, all the other mechanisms that depend on it will also break down.
d. PERFORMANCE: Comparing the performance of task migration mechanisms implemented in different systems is difficult, because of the different hardware, operating systems, IPC mechanisms, file systems, policy mechanisms, etc., on which the mechanisms are based.
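To make the separation of policy and mechanism concrete, the sketch below keeps a TransferPolicy interface (deciding whether to transfer) apart from a MigrationMechanism (which only knows how to collect, send, and reinstate state). All class and method names are hypothetical; the point is only that policies can be swapped without touching the mechanism, and vice versa.

    # Illustrative sketch only: separating transfer policy from the migration mechanism.
    from abc import ABC, abstractmethod

    class TransferPolicy(ABC):
        @abstractmethod
        def should_transfer(self, local_queue_len: int, threshold: int) -> bool:
            ...

    class ThresholdPolicy(TransferPolicy):
        # One possible policy: transfer when the local queue exceeds a threshold.
        def should_transfer(self, local_queue_len, threshold):
            return local_queue_len > threshold

    class MigrationMechanism:
        # The mechanism collects, transfers, and reinstates task state; it knows
        # nothing about when a transfer should happen.
        def migrate(self, task, destination):
            state = self.freeze_and_collect(task)
            self.send_state(state, destination)      # unfreeze happens at the destination

        def freeze_and_collect(self, task):
            return {"task": task}                    # placeholder for real state capture

        def send_state(self, state, destination):
            print(f"sending {state} to {destination}")

    # A scheduler composes the two: the policy can be replaced without changing
    # the mechanism, and the mechanism can change without touching the policy.
    policy, mechanism = ThresholdPolicy(), MigrationMechanism()
    if policy.should_transfer(local_queue_len=7, threshold=4):
        mechanism.migrate(task="t42", destination="nodeB")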