ENGG4420 ‐‐ CHAPTER 5 ‐‐ LECTURE 1 ‐‐ By Radu Muresan

REAL‐TIME TASKS SCHEDULING IN MULTIPROCESSOR AND DISTRIBUTED SYSTEMS

Real‐time scheduling on multiprocessors and distributed systems is much more difficult than on a uniprocessor ‐‐ determining an optimal schedule for a set of real‐time tasks on a multiprocessor or a distributed system is an NP‐hard problem.

• Multiprocessor systems are known as tightly coupled systems ‐‐ shared physical memory is present in the system.
  ○ Interprocess communication (IPC) is inexpensive, and its cost can be ignored in comparison to task execution times.
  ○ A centralized dispatcher/scheduler may be used; this requires the system to maintain the state of the various tasks in a centralized data structure.

• Distributed systems are called loosely coupled ‐‐ no shared physical memory is present.
  ○ IPC times are comparable to task execution times.
  ○ A centralized dispatcher/scheduler cannot be used ‐‐ the communication needed to update the centralized data structure is too costly.

ENGG4420 ‐‐ CHAPTER 5 ‐‐ LECTURE 1 (November‐20‐09, 4:14 PM)

CHAPTER 5 By Radu Muresan University of Guelph

SCHEDULING REAL‐TIME TASKS on distributed and multiprocessor systems consists of two subproblems:

1. Task allocation to processors
   a. The task assignment problem is concerned with how to partition a set of tasks and then how to assign these tasks to processors ‐‐ task assignment can be: 1) static or 2) dynamic.
   b. In the static allocation scheme, the allocation of tasks to nodes is permanent and does not change with time.
   c. In dynamic task assignment, tasks are assigned to the nodes as they arise ‐‐ different instances of a task may be allocated to different nodes.

2. Scheduling of tasks on the individual processors
   a. Uniprocessor scheduling algorithms can be used for the task set allocated to a particular processor.


ENGG4420. CHAPTER 4: Uniprocessor and Multiprocessor Scheduling. Developed by Radu Muresan, University of Guelph

Multiprocessor Schedule

The vast majority of assignment/scheduling problems on systems with more than two processors are NP-complete. We must therefore use heuristics.

Development of multiprocessor schedule is divided into two steps: assign tasks to processors; run a uniprocessor schedule for each processor.

[Figure: flowchart of the two‐step procedure. Make an allocation → Schedule each processor based on the allocation → Are all these schedules feasible? If yes: output schedule. If no: check stopping criteria; continue: change allocation and schedule again; stop: declare failure.]

• Most algorithms deployed in practice for multiprocessor scheduling are heuristic algorithms. Most heuristics are motivated by the fact that the uniprocessor scheduling problem (i.e., scheduling a set of tasks on a single processor) is usually tractable. If one or more of the per‐processor schedules turn out to be infeasible, we must either return to the allocation step and change the allocation, or declare that a schedule cannot be found and stop ‐‐ the steps are presented in the diagram on this slide.
• Many variations of this approach are possible; for example, one can check for schedulability after the allocation of each task.

NP‐complete: no polynomial‐time algorithm is known for these problems (and none exists unless P = NP).
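The allocate/schedule/check loop from the flowchart can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the lecture's method: the uniprocessor feasibility check is the EDF utilization test (U ≤ 1, valid for independent, preemptible, periodic tasks with deadlines equal to periods), the "change allocation" step randomly moves one task off an infeasible processor, and the stopping criterion is a fixed iteration budget. The names `allocate_and_schedule`, `edf_feasible`, and `max_iters` are hypothetical.

```python
import random


def edf_feasible(utils):
    """Assumed uniprocessor check: EDF with deadlines = periods needs sum(u) <= 1."""
    return sum(utils) <= 1.0


def allocate_and_schedule(utils, n_procs, max_iters=1000, seed=0):
    """Flowchart loop: allocate, check every processor, change allocation, or stop.

    utils: utilization e_i/P_i of each task.
    Returns alloc (alloc[i] = processor of task i), or None to declare failure.
    """
    rng = random.Random(seed)
    alloc = [i % n_procs for i in range(len(utils))]  # initial allocation (round-robin)
    for _ in range(max_iters):  # stopping criterion: iteration budget
        loads = [[] for _ in range(n_procs)]
        for task, p in enumerate(alloc):
            loads[p].append(utils[task])
        bad = [p for p in range(n_procs) if not edf_feasible(loads[p])]
        if not bad:
            return alloc  # every per-processor schedule is feasible: output it
        # Change allocation: move one task from an infeasible processor elsewhere.
        victim = rng.choice([t for t, p in enumerate(alloc) if p in bad])
        alloc[victim] = rng.randrange(n_procs)
    return None  # declare failure
```

For example, `allocate_and_schedule([0.6, 0.6, 0.3, 0.3], 2)` is feasible on the first check, while three tasks of utilization 0.9 can never fit on two processors, so the loop exhausts its budget and returns `None`.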


MULTIPROCESSOR TASK ALLOCATION

STATIC ALLOCATION ALGORITHMS ‐‐ the tasks are pre‐allocated to processors ‐‐ no overhead is incurred at run time, since tasks are permanently assigned to processors at system initialization time.
1. Utilization Balancing Algorithm
2. Next‐Fit Algorithm for RMA
3. Bin Packing Algorithm for EDF

DYNAMIC ALLOCATION ALGORITHMS ‐‐ in many applications, tasks arrive sporadically at different nodes ‐‐ the tasks are assigned to processors as and when they arise ‐‐ the dynamic approach incurs a high run‐time overhead, since the allocator component running at every node needs to keep track of the instantaneous load position at every other node.
1. Focused Addressing and Bidding (FAB)
2. The Buddy Strategy Algorithm

The task allocation algorithms for multiprocessors do not try to minimize communication costs, because interprocess communication time is low ‐‐ communication time is the same as memory access time. The above algorithms may therefore not work satisfactorily in distributed environments.



Utilization-Balancing Algorithm

This algorithm attempts to balance processor utilization, and proceeds by allocating the tasks one by one and selecting the least utilized processor.

For each task Ti do
    Allocate one copy of the task to each of the ri least utilized processors.
    Update the processor utilizations to account for the allocation of task Ti.
end do

where ri is the redundancy, i.e., the number of copies of task Ti that must be scheduled.

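In a perfectly balanced system, the utilization ui of each processor equals the overall average utilization u of the processors in the system. For a set of tasks STi assigned to a processor pi we have:

$$u_i = \sum_{T_j \in ST_i} \frac{e_j}{P_j}, \qquad u = \frac{1}{n_p} \sum_{i=1}^{n_p} u_i$$

where np is the number of processors.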
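A minimal Python sketch of the utilization‐balancing loop follows. It assumes utilizations are known up front, that the ri copies of a task must go to ri distinct processors, and that ties between equally loaded processors are broken by the lower processor index (an arbitrary choice). The function name `utilization_balancing` is hypothetical.

```python
def utilization_balancing(tasks, n_procs):
    """Allocate each task's r_i copies to the r_i least-utilized processors.

    tasks: list of (name, u_i, r_i) with u_i = e_i / P_i and r_i the redundancy.
    Returns (alloc, util): alloc[p] lists the task copies on processor p and
    util[p] is processor p's total utilization.
    """
    util = [0.0] * n_procs
    alloc = {p: [] for p in range(n_procs)}
    for name, u, r in tasks:
        if r > n_procs:
            raise ValueError(f"{name} needs {r} copies but only {n_procs} processors exist")
        # r distinct processors with the lowest current utilization (ties: lower index)
        targets = sorted(range(n_procs), key=lambda p: (util[p], p))[:r]
        for p in targets:
            alloc[p].append(name)
            util[p] += u  # update the utilization to account for this copy
    return alloc, util
```

With tasks of utilization 0.5, 0.3, and 0.2 on two processors, the third task goes to the processor holding 0.3, so both processors end at a utilization of 0.5.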



A Next-Fit Algorithm for RM-Scheduling

Task set properties: independence, preemptibility, and periodicity

Other assumptions: identical processors; tasks require no resources other than processor time.

Define M > 3 classes as follows: task Ti is in class j < M if

$$2^{1/(j+1)} - 1 < \frac{e_i}{P_i} \le 2^{1/j} - 1,$$

and in class M if $0 < e_i/P_i \le 2^{1/M} - 1$.

We allocate tasks one by one to the appropriate processor class until all the tasks have been scheduled, adding processors to classes if that is needed for RM‐schedulability.

• This is a utilization‐based allocation heuristic.
• The task set has the same properties as for the RM uniprocessor scheduling algorithm (i.e., independence, preemptibility, and periodicity).
• M is picked by the user.
• Corresponding to each task class is a set of processors that is allocated only to tasks of that class.
• It is possible to show that this approach uses no more than N times the minimum possible number of processors, where:

$$N = \begin{cases} 1.911 & \text{if there is no task with utilization in } (2^{1/2} - 1,\ 0.5] \\ 2.340 & \text{otherwise} \end{cases}$$



Example

Class   Bound
C1      (0.41, 1]
C2      (0.26, 0.41]
C3      (0.19, 0.26]
C4      (0.00, 0.19]

        T1    T2    T3    T4    T5    T6    T7    T8    T9    T10   T11
ei      5     7     3     1     10    16    1     3     9     17    21
Pi      10    21    22    24    30    40    50    55    70    90    95
u(i)    0.50  0.33  0.14  0.04  0.33  0.40  0.02  0.05  0.13  0.19  0.22
Class   C1    C2    C4    C4    C2    C2    C4    C4    C4    C4    C3

Note: u(i) = ei/Pi; solution presented on the overhead.

• Suppose we have M = 4 classes. The table lists the utilization bounds corresponding to each class.
• Since we have at least one task in each of the four classes, let us begin by earmarking one processor for each class.
• In particular, let processor pi be reserved for tasks in class Ci, 1 ≤ i ≤ 4.
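The class‐based next‐fit idea can be sketched in Python as follows. The admission rule is an assumption, not something stated on the slides: a task joins the most recently opened processor of its class only if the Liu–Layland bound n(2^(1/n) − 1) still holds for the enlarged set; otherwise a new processor is opened for that class. The helper names `rm_bound`, `task_class`, and `next_fit_rm` are hypothetical.

```python
def rm_bound(n: int) -> float:
    """Liu-Layland RM utilization bound for n tasks: n(2^(1/n) - 1)."""
    return n * (2 ** (1 / n) - 1)


def task_class(u: float, M: int) -> int:
    """Class j < M if 2^(1/(j+1)) - 1 < u <= 2^(1/j) - 1; otherwise class M."""
    if not 0 < u <= 1:
        raise ValueError("utilization must lie in (0, 1]")
    for j in range(1, M):
        if 2 ** (1 / (j + 1)) - 1 < u <= 2 ** (1 / j) - 1:
            return j
    return M


def next_fit_rm(tasks, M=4):
    """tasks: list of (name, e, P). Returns {class j: [processor, ...]}, where each
    processor is a dict holding its task names and total utilization."""
    procs = {j: [] for j in range(1, M + 1)}
    for name, e, P in tasks:
        u = e / P
        cls = procs[task_class(u, M)]
        if cls:
            cur = cls[-1]  # next-fit: only the most recently opened processor is tried
            if cur["util"] + u <= rm_bound(len(cur["tasks"]) + 1):
                cur["tasks"].append(name)
                cur["util"] += u
                continue
        cls.append({"tasks": [name], "util": u})  # open a new processor for this class
    return procs


# Example data from the table above: (name, e_i, P_i).
EXAMPLE = [("T1", 5, 10), ("T2", 7, 21), ("T3", 3, 22), ("T4", 1, 24),
           ("T5", 10, 30), ("T6", 16, 40), ("T7", 1, 50), ("T8", 3, 55),
           ("T9", 9, 70), ("T10", 17, 90), ("T11", 21, 95)]
```

On the example data, `task_class` reproduces the class row of the table. Under this admission rule `next_fit_rm(EXAMPLE)` opens five processors (one each for C1, C3, and C4, and two for C2, because T2 + T5 + T6 exceeds the three‐task bound); this follows from the assumed rule and is not necessarily the solution presented on the overhead.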



A Bin-Packing Assignment Algorithm for EDF

Problem. Schedule a set of periodic, independent, preemptible tasks on a multiprocessor system consisting of identical processors. The task deadlines equal their periods, and the tasks require no resources other than processor time.

Solution: EDF scheduling on a processor: if U ≤ 1 (for the task set assigned to the processor), then the set is schedulable on that processor.

The problem reduces to making task assignments to processors such that U ≤ 1 on every processor.



First-Fit Decreasing Algorithm

• We would like to minimize the number of processors needed. This is the famous bin‐packing problem, and many algorithms exist for solving it.
• The algorithm we present here is the first‐fit decreasing algorithm. Suppose there are nT tasks to be assigned.
• Prepare a list L of the tasks so that their utilizations (i.e., u(i) = ei/Pi) are in decreasing order.

Initialize i to 1. Set U(j) = 0 for all j.
while i ≤ nT do
    Let j = min{k | U(k) + u(i) ≤ 1}.
    Assign the ith task in L to pj.
    U(j) ← U(j) + u(i).
    i ← i + 1.
end while

In this algorithm, j selects the first processor that meets the schedulability inequality.



Example

        T1    T2    T3    T4    T5    T6    T7    T8    T9    T10   T11
ei      5     7     3     1     10    16    1     3     9     17    21
Pi      10    21    22    24    30    40    50    55    70    90    95
u(i)    0.50  0.33  0.14  0.04  0.33  0.40  0.02  0.05  0.13  0.19  0.22

The ordered list L is: L = (T1, T6, T2, T5, T11, T10, T3, T9, T8, T4, T7). The assignment is presented on the next slide.


Solution

Step  Ti    u(i)  Assign to  Post‐assignment U vector
1     T1    0.50  p1         (0.50)
2     T6    0.40  p1         (0.90)
3     T2    0.33  p2         (0.90, 0.33)
4     T5    0.33  p2         (0.90, 0.67)
5     T11   0.22  p2         (0.90, 0.89)
6     T10   0.19  p3         (0.90, 0.89, 0.19)
7     T3    0.14  p3         (0.90, 0.89, 0.33)
8     T9    0.13  p3         (0.90, 0.89, 0.45)
9     T8    0.05  p1         (0.95, 0.89, 0.45)
10    T4    0.04  p1         (1.00, 0.89, 0.45)
11    T7    0.02  p2         (1.00, 0.91, 0.45)

The U entries are sums of the exact utilizations ei/Pi, rounded to two decimals (the final load of p1 is about 0.996).

• The vector U = (U1, U2, U3, …) holds the processor loads: Ui is the total utilization of processor pi.
• It is possible to show that, when the number of processors required is large, the ratio (number of processors used by the first‐fit decreasing algorithm)/(number of processors used by an optimal algorithm) approaches 11/9 ≈ 1.22. In fact, this limit is approached quickly, so 1.22 is a good measure even for relatively small systems.
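The first‐fit decreasing pseudocode translates directly into Python. This is a sketch assuming the EDF condition U ≤ 1 per processor and tasks given as (name, ei, Pi) tuples; the function name `first_fit_decreasing` is hypothetical. Python's sort is stable, so T2 and T5 (both with utilization 1/3) keep their table order, which matches the list L above.

```python
def first_fit_decreasing(tasks):
    """tasks: list of (name, e, P). Returns (assignment, U): assignment[name] is a
    1-based processor index and U[k-1] is the total utilization of processor p_k."""
    order = sorted(tasks, key=lambda t: t[1] / t[2], reverse=True)  # list L
    U = []
    assignment = {}
    for name, e, P in order:
        u = e / P
        for k, load in enumerate(U):
            if load + u <= 1:  # EDF: U <= 1 keeps p_k schedulable
                U[k] += u
                assignment[name] = k + 1
                break
        else:  # no existing processor fits: open a new one
            U.append(u)
            assignment[name] = len(U)
    return assignment, U


# Example data from the table above: (name, e_i, P_i).
EXAMPLE = [("T1", 5, 10), ("T2", 7, 21), ("T3", 3, 22), ("T4", 1, 24),
           ("T5", 10, 30), ("T6", 16, 40), ("T7", 1, 50), ("T8", 3, 55),
           ("T9", 9, 70), ("T10", 17, 90), ("T11", 21, 95)]
```

On the example data this uses three processors and reproduces the assignment in the solution table: p1 gets T1, T6, T8, T4; p2 gets T2, T5, T11, T7; p3 gets T10, T3, T9.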


FOCUSED ADDRESSING AND BIDDING (FAB) ALGORITHM

• FAB is a simple algorithm that can be used as an on‐line procedure for task sets consisting of both critical and non‐critical real‐time tasks. Critical tasks must have sufficient time reserved for them so that they continue to execute successfully, even if they need their worst‐case execution time.
• The non‐critical tasks are either processed or not, depending on the system's ability to do so.
  ○ For a noncritical task, the guarantee can be based on the expected run time of the task rather than its worst‐case run time.

THE UNDERLYING SYSTEM MODEL IS: when a noncritical task arrives at processor pi, the processor checks to see if it has the resources and time to execute the task without missing any deadlines of the critical tasks or of the previously guaranteed noncritical tasks ‐‐ if yes, pi accepts this new noncritical task, adds it to its list of tasks to be executed, and reserves time for it.

• THE FAB ALGORITHM IS USED WHEN pi determines that it does not have the resources or time to execute the task ‐‐ in this case, it tries to ship that task out to some other processor in the system.


THE FAB ALGORITHM WORKS AS FOLLOWS

• Every processor maintains two tables, called the status table and the system load table.
  ○ The STATUS TABLE indicates which tasks have already been committed to run, including the set of critical tasks (which were preassigned statically) and any additional noncritical tasks that have been accepted, together with the execution times and periods of these tasks.
  ○ The LOAD TABLE contains the latest load information for all other processors in the system, from which the surplus computing capacity available at the different processors can be determined.
• THE TIME AXIS is divided into windows, which are intervals of fixed duration ‐‐ at the end of each window, each processor broadcasts to all other processors the fraction of its computing power in the next window for which it has no committed tasks.
  ○ Every processor, on receiving a broadcast from a node about its load position, updates its system load table.
  ○ Since the system is distributed, this information may never be completely up to date.
  ○ When a task arrives at a node, the node first checks whether the task can be processed locally ‐‐ if yes, it updates its status table; if not, it looks for a processor to offload the task to.


THE PROCESS OF OFFLOADING A TASK is based on the contents of the system load table ‐‐ an overloaded processor checks its surplus information and:
1) Selects a processor (called the focused processor) ps that is believed to be the most likely to be able to execute the task successfully by its deadline.
2) Because the system load table information might be out of date, the overloaded processor, as insurance, also sends requests for bids (RFBs) to other lightly loaded processors in parallel with sending the task to the focused processor ps ‐‐ this saves time in case ps refuses the task.
   a. The RFB contains the vital statistics of the task ‐‐ its expected execution time, any other resource requirements, its deadline, etc.
3) The RFB asks any processor that can successfully execute the task to send a bid to the focused processor ps stating how quickly it can process the task.
4) An RFB is only sent out if the sending (overloaded) processor estimates that there will be enough time for a timely response to it.
5) Specifically, two times, tbid and toffload, are calculated ‐‐ if tbid ≤ toffload, the RFB is sent out.


TIME CALCULATIONS BY THE FAB ALGORITHM:
• tbid = (estimated time taken by the RFB to reach its destination) + (estimated time taken by the destination to respond with a bid) + (estimated time taken to transmit the bid to the focused processor);
  toffload = (task deadline) ‐ [(current time) + (time to move the task) + (task‐execution time)];
  ○ If tbid ≤ toffload, then the RFB is sent out.
• When a processor pt receives an RFB, it checks to see if it can meet the task requirements and still execute its already‐scheduled tasks successfully.
  ○ First, it estimates when the new task will arrive and how long it will take to be either guaranteed or rejected:
    tarr = (current time) + (time for the bid to be received by ps) + (time taken by ps to make a decision) + (time taken to transfer the task) + (time taken by pt to either guarantee or reject the task);
  ○ Next, it calculates the surplus time from the absolute deadline D, the current time, and the computational time already spoken for in the interval [tarr, D]:
    tcomp = (time allotted to critical tasks in [tarr, D]) + (time needed in [tarr, D] to run already‐accepted noncritical tasks) + (fraction of recently accepted bids) × (time needed in [tarr, D] to honor pending bids);
    tsurplus = D ‐ (current time) ‐ tcomp;
  ○ If the task worst‐case run times are used, the bids will be very conservative; if average‐case values are used, the bids will be less conservative.


• COURSE OF ACTION ‐‐ if tsurplus < (task execution time), then no bid is sent out ‐‐ if tsurplus ≥ (task execution time), pt sends a bid to ps (the focused processor). The bid contains tarr, tsurplus, and an estimate of how long a task transferred to pt will have to wait before it is either guaranteed or rejected.
• All bids are sent to the focused processor ps ‐‐ if ps is unable to process the task successfully, it can review the bids it gets, see which other processor is most likely to be able to run the task, and transfer the task to that processor.
• ps waits until a certain minimum number of bids have arrived or until a specified time has expired since receiving the task ‐‐ for each bidding node pi, ps computes the estimated arrival time η(i) of the task at that node ‐‐ denoting by tsurplus(i) and tarr(i) the tsurplus and tarr values contained in the bid received from pi, it computes the following:
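The timing tests used by FAB can be written as a few small Python functions. This is a sketch under assumptions: every argument is one of the estimates named in the time calculations, expressed in a common time unit; the function and parameter names (`t_offload`, `should_send_rfb`, `arrival_time`, `make_bid`, `t_wait`, and so on) are hypothetical; and tsurplus is computed exactly as stated on the slide. How ps ranks the bids it receives is not reproduced here.

```python
def t_offload(deadline, now, t_move, t_exec):
    """Slack left for bidding: D - (now + time to move the task + execution time)."""
    return deadline - (now + t_move + t_exec)


def should_send_rfb(t_rfb_travel, t_bid_reply, t_bid_travel, slack):
    """The overloaded processor sends an RFB only if t_bid <= t_offload."""
    t_bid = t_rfb_travel + t_bid_reply + t_bid_travel
    return t_bid <= slack


def arrival_time(now, t_bid_to_ps, t_ps_decide, t_transfer, t_guarantee):
    """t_arr as estimated by the bidding processor p_t."""
    return now + t_bid_to_ps + t_ps_decide + t_transfer + t_guarantee


def make_bid(now, D, t_exec, t_arr, t_critical, t_accepted, frac_bids, t_pending, t_wait):
    """Return the bid p_t sends to p_s, or None when t_surplus < task execution time."""
    # Computation already spoken for in [t_arr, D].
    t_comp = t_critical + t_accepted + frac_bids * t_pending
    t_surplus = D - now - t_comp
    if t_surplus < t_exec:
        return None  # no bid is sent
    return {"t_arr": t_arr, "t_surplus": t_surplus, "t_wait": t_wait}
```

For a task with deadline 100 at time 10, a 5‐unit move, and 20 units of execution, toffload is 65, so an RFB whose round trip takes 8 units is worth sending; with a deadline of 40 the slack drops to 5 and no RFB is sent.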


CARDINAL RULE IN BIDDING ‐‐ no new noncritical task can be allowed to cause any critical or previously guaranteed noncritical task to miss its guaranteed deadline.

WAYS TO ASSESS the schedulability of newly arrived tasks:
1. Introduce into the schedule a periodic task that checks for schedulability every t seconds.
2. Set a flag that indicates whether the processor has time to check the schedulability of a new task and still meet all guaranteed deadlines ‐‐ if the flag is set when a new task arrives, the executing task is preempted and the processor deals with the new task ‐‐ if the flag is reset, the processor cannot be interrupted and the new task must wait.
   a. A similar flag can be used to decide whether there is enough time to respond to an RFB.


THE BUDDY STRATEGY

• The buddy strategy tries to solve the same problem as the FAB algorithm ‐‐ soft real‐time tasks arrive at the various processors of a multiprocessor and, if an individual processor finds itself overloaded, it tries to offload some tasks onto more lightly loaded processors. The buddy strategy differs from the FAB algorithm in the manner in which the target processors are found.

STRATEGY:
1. Each processor has three thresholds of loading: under TU, full TF, and over TV.
2. The loading is determined by the number of jobs awaiting service in the processor's queue. If the queue length is Q, the processor is said to be in:
   a. State U (underloaded) if Q ≤ TU;
   b. State F (fully loaded) if TF < Q ≤ TV;
   c. State V (overloaded) if Q > TV.
3. In state U, a processor is in a position to execute tasks transferred from other processors ‐‐ in state V, it looks for other processors on which to offload some tasks ‐‐ in state F, it will neither accept tasks from nor offload tasks to other processors.
4. When a processor makes a transition into or out of state U, it broadcasts an announcement to this effect.
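The state rules of the buddy strategy can be sketched in Python as below. The defaults TU = 1, TF = 2, TV = 3 are the simulation‐derived values quoted in these notes; the function names are hypothetical. The slide's definition leaves TU < Q ≤ TF unclassified, so this sketch returns `None` for such queue lengths (an assumption); since only state U accepts and only state V offloads, such a processor neither accepts nor offloads tasks.

```python
from enum import Enum
from typing import Optional


class State(Enum):
    U = "underloaded"
    F = "fully loaded"
    V = "overloaded"


def load_state(q, TU=1, TF=2, TV=3) -> Optional[State]:
    """Classify a processor by its queue length q."""
    if q <= TU:
        return State.U
    if q > TV:
        return State.V
    if TF < q <= TV:
        return State.F
    return None  # TU < q <= TF: not classified by the slide's definition


def must_announce(old_q, new_q, TU=1):
    """Broadcast to the buddy set only on a transition into or out of state U."""
    return (old_q <= TU) != (new_q <= TU)


def offload_target(q, buddies, TU=1, TF=2, TV=3):
    """If overloaded, pick the first buddy last announced as underloaded.

    buddies: list of (buddy_id, announced_underloaded) in preference order.
    """
    if load_state(q, TU, TF, TV) is not State.V:
        return None  # only overloaded processors offload
    for buddy, underloaded in buddies:
        if underloaded:
            return buddy
    return None
```

A processor with five queued jobs is in state V and offloads to the first buddy it believes to be underloaded; because the buddy flags come from the last broadcast, they may be stale, which is the same staleness issue FAB faces.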


THE TRANSITION into or out of state U is broadcast to a limited subset of processors called the processor's buddy set ‐‐ each processor is aware of whether any member of its buddy set is in state U ‐‐ if it is overloaded, it chooses an underloaded member (if any) of its buddy set on which to offload a task.

ISSUES RELATED TO THE BUDDY SET
1. How is the buddy set to be chosen in a multihop network? ‐‐ if it is too large, the state‐change broadcasts will heavily load the interconnection network ‐‐ if it is too small, the chance of finding an available processor in state U diminishes.
   a. If a multihop network is used and the buddy set is restricted to processors that are "close" to the processor (in terms of the number of hops between them), there will be a substantial saving of network bandwidth ‐‐ keeping the size of the buddy sets constant (some simulation experiments suggest a buddy set size of 10 ‐ 15), independent of the system size, results in constant traffic per processor when state updates occur.
2. If a node is in the buddy set of many overloaded processors and it delivers a state‐change message to them saying that it is now underloaded, each of the overloaded processors may dump its load on this one processor and make it overloaded.
   a. SOLUTION to reduce the probability of this happening: construct an ordered list of preferred processors ‐‐ first list the processors that are only one hop away from the processor, then those that are two hops away, etc.
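An ordered list of preferred processors, nearest first, is a breadth‐first search over the network graph. This sketch assumes the multihop network is given as an adjacency list and that processors at the same hop distance keep the neighbour order of that list (an arbitrary choice); `preferred_buddies` is a hypothetical name.

```python
from collections import deque


def preferred_buddies(adj, src, size):
    """Return up to `size` processors ordered by hop distance from src (BFS).

    adj: {processor: [neighbouring processors]} describing the multihop network.
    """
    seen = {src}
    order = []
    queue = deque([src])
    while queue and len(order) < size:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in seen:
                seen.add(nb)
                order.append(nb)  # BFS discovers nodes in nondecreasing hop order
                queue.append(nb)
    return order[:size]


# A 6-processor ring: each processor links to its two neighbours.
RING = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
```

On the ring, processor 0 lists its one‐hop neighbours 5 and 1 first, then the two‐hop processors 4 and 2; capping `size` keeps the buddy set constant regardless of the system size.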


3. THE CHOICE OF THE THRESHOLDS TU, TF, AND TV ‐‐ in general, the greater the value of TV, the smaller the rate at which tasks are transferred from one node to another.
   a. Simulation experiments have shown that setting TU = 1, TF = 2, and TV = 3 produces good results for a wide range of system parameters.
   b. Which thresholds are best for a given system depends on the particular characteristics of that system, including the size of the buddy set, the prevailing load, and the bandwidth and topology of the interconnection network.


ASSIGNMENT WITH PRECEDENCE CONDITIONS

This algorithm assigns and schedules tasks with precedence conditions and additional resource constraints ‐‐ the basic idea is to reduce communication costs by assigning (if possible) tasks that communicate heavily with one another to the same processor.

UNDERLYING TASK MODEL: each task may be composed of one or more subtasks ‐‐ the release time of each task and the worst‐case execution time of each subtask are given ‐‐ the subtask communication pattern is represented by a task precedence graph ‐‐ we are also given the volume of communication between subtasks ‐‐ it is assumed that if subtask s1 sends output to s2, this is done at the end of s1.

• Associated with each subtask is a latest finishing time (LFT), which will be explained through an example.
• The algorithm is a trial‐and‐error process ‐‐ subtasks are assigned to the processors one by one in order of their LFT values (smallest first) ‐‐ for equal LFT values, the subtask with the greatest number of successors goes first.
• Feasibility is checked after each assignment ‐‐ if an assignment is not feasible, another one is tried, and so on.
• A threshold policy with parameter kc is used to characterize the volume of communication between subtasks.
  ○ When subtasks communicate a lot, they are assigned, if possible, to the same processor.


EXAMPLE: Consider the task graph below, with the execution times described in the table. The labels within the circles are the subtask subscripts, and the arc labels denote the volume of communication between subtasks. Suppose the overall deadline for this task (i.e., the deadline of subtask s4) is 30. Assume that s1, s2, and s3 run in parallel. Calculate the LFT for each subtask, consider a threshold policy with kc = 3, and apply the algorithm to this task set.
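The LFT values can be computed backwards from the deadline. This sketch assumes the usual definition (the notes defer it to the example): a subtask with no successors has LFT equal to the overall deadline, and otherwise LFT(s) = min over successors n of LFT(n) − e(n), with communication delays ignored. The execution times below are hypothetical, because the example's table is not reproduced in these notes; only the shape (s1, s2, s3 feeding s4) and the deadline of 30 come from the text. The function names are hypothetical.

```python
def latest_finish_times(exec_time, succ, deadline):
    """LFT(s) = deadline if s has no successors, else min over successors n of
    LFT(n) - e(n). exec_time: {subtask: WCET}; succ: {subtask: [successors]}.
    The graph is assumed acyclic; communication delays are ignored."""
    lft = {}

    def visit(s):
        if s not in lft:
            nexts = succ.get(s, [])
            lft[s] = deadline if not nexts else min(visit(n) - exec_time[n] for n in nexts)
        return lft[s]

    for s in exec_time:
        visit(s)
    return lft


def assignment_order(lft, succ):
    """Increasing LFT; ties go to the subtask with more successors."""
    return sorted(lft, key=lambda s: (lft[s], -len(succ.get(s, []))))


# Hypothetical execution times; s1, s2, s3 run in parallel and all feed s4.
E = {"s1": 4, "s2": 5, "s3": 6, "s4": 10}
SUCC = {"s1": ["s4"], "s2": ["s4"], "s3": ["s4"]}
```

With these numbers, s4 must finish by 30, so each of s1, s2, and s3 must finish by 30 − 10 = 20, and they are considered for assignment before s4.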


kc is a TUNABLE PARAMETER ‐‐ if kc is 0, intersubtask communication is not taken into consideration when we do the assignment ‐‐ if kc is too high, then most of the time we will be forced to assign communicating subtasks to the same processor, and this may result in an infeasible schedule ‐‐ when we obtain an infeasible schedule, we reduce kc adaptively to relax this condition.

The algorithm can be described as follows:
1. Start with the set of tasks that need to be assigned and scheduled. Choose a value for kc and determine, based on this value, which tasks need to be on the same processor.
2. Start assigning and scheduling the subtasks in order of precedence. If a particular allocation is infeasible, reallocate if possible. If feasibility is impossible to achieve because of the tasks that need to be assigned to the same processor, reduce kc suitably and go back to the previous step.
3. STOP when either the tasks have been successfully assigned and scheduled, or the algorithm has tried more than a certain number of iterations ‐‐ in the former case, output the completed schedule; in the latter, declare failure.
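The kc threshold and its adaptive relaxation can be sketched in Python. The exact threshold test is not given in these notes, so this sketch assumes the rule "subtasks si and sj must share a processor when (ei + ej)/vij < kc", which matches the behaviour described above: kc = 0 ignores communication, and a large kc forces most communicating pairs together. The feasibility test is a caller‐supplied callback standing in for the real assign‐and‐schedule step; all names and example numbers are hypothetical.

```python
def forced_groups(exec_time, volume, kc):
    """Group subtasks that must share a processor under threshold kc.

    Assumed rule: s_i and s_j are co-located when (e_i + e_j) / v_ij < kc.
    volume: {(s_i, s_j): communication volume}. Uses union-find.
    """
    parent = {s: s for s in exec_time}

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path halving
            s = parent[s]
        return s

    for (a, b), v in volume.items():
        if v > 0 and (exec_time[a] + exec_time[b]) / v < kc:
            parent[find(a)] = find(b)
    groups = {}
    for s in exec_time:
        groups.setdefault(find(s), []).append(s)
    return sorted(sorted(g) for g in groups.values())


def assign_with_relaxation(exec_time, volume, kc, is_feasible, shrink=0.5, max_iters=10):
    """Steps 1-3: try kc; if the forced co-locations are infeasible, reduce kc."""
    for _ in range(max_iters):
        groups = forced_groups(exec_time, volume, kc)
        if is_feasible(groups):
            return kc, groups
        kc *= shrink  # relax the co-location constraint
    return None  # declare failure
```

For instance, with hypothetical times e = (4, 5, 6, 10) for s1 to s4 and volumes 6, 1, 2 on the arcs into s4, kc = 3 only forces s1 and s4 together, while kc = 20 forces all four subtasks onto one processor.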


ASSIGNMENT

Consider a system composed of one task that is subdivided into 8 subtasks. The table below gives the execution times and the deadline for the output to the outside world. The task graph, presented in the figure below, indicates the precedence and the communication volume. Apply the algorithm presented on the previous slide and come up with a feasible schedule. Note that the value of kc can be relaxed in order to obtain a feasible schedule. Calculate the LFT values for all subtasks. To generate a feasible schedule, start with kc = 1.5.


FAULT‐TOLERANT SCHEDULING

• Static schedules must be able to respond to hardware failures ‐‐ they do this by having sufficient reserve capacity and a sufficiently fast failure‐response mechanism to continue meeting critical‐task deadlines despite a certain number of failures.

THE APPROACH TO FAULT‐TOLERANT SCHEDULING CONSIDERED HERE uses additional ghost copies of tasks, which are embedded into the schedule and activated whenever a processor carrying one of their corresponding primary or previously activated ghost copies fails.
  ○ These ghost copies do not need to be identical to the primary copies ‐‐ they can be alternative versions that are simplified and produce poorer, but still acceptable, results.

ASSUMPTIONS ‐‐ a set of periodic critical tasks ‐‐ multiple copies of each version are assumed to be executed in parallel, with voting or some other error‐masking mechanism.

WHEN A PROCESSOR FAILS there are two types of tasks affected by that failure:
a. The task running at the time of failure ‐‐ the use of forward‐error recovery is assumed to be sufficient to compensate for this loss.
b. Tasks that need to be run by the processor in the future ‐‐ the fault‐tolerant algorithm presented here is meant to handle these tasks by finding replacement processors for them.

ENGG4420 ‐‐ CHAPTER 5 ‐‐ LECTURE 2 ‐ 3 (November‐21‐09, 11:51 AM)


DEFINITION OF THE PROBLEM

• We assume the existence of a nonfault‐tolerant algorithm for allocation and scheduling, which is called by the fault‐tolerant procedure introduced here.
• We assume that this allocation/scheduling procedure consists of an assignment part and an EDF scheduling part.

PROBLEM: Suppose that the system is meant to run nc(i) copies of each version (or iteration) of task Ti, and is supposed to tolerate up to nsust processor failures. The fault‐tolerant schedule must ensure that, after some time for reacting to the failure(s), the system can still execute nc(i) copies of each version of task Ti despite the failure of up to nsust processors ‐‐ the processor failures may occur in any order.

THE OUTPUT of the fault‐tolerant scheduling algorithm will be a ghost schedule plus one or more primary schedules for each processor.


FEASIBLE PAIR: a ghost schedule and a primary schedule are said to form a feasible pair if all deadlines continue to be met even if the primary tasks are shifted right by the time needed to execute the ghosts. Ghosts may overlap in the ghost schedule of a processor ‐‐ in this case, only one of the overlapping ghost copies can be activated.

THEOREM. Conditions C1 and C2 for ghosts are necessary and sufficient conditions for up to nsust processor failures to be tolerated.

CONDITION C1: each version must have ghost copies scheduled on nsust distinct processors. Two or more copies (primary or ghost) of the same version must not be scheduled on the same processor.

CONDITION C2: ghosts are conditionally transparent ‐‐ they must satisfy the following two properties:
a. Two ghost copies may overlap in the schedule of a processor if no other processor carries a copy (either primary or ghost) of both tasks.
b. Primary copies may overlap the ghosts in the schedule only if there is sufficient slack time in the schedule to continue to meet the deadlines of all primary and activated ghost copies on that processor.

The theorem above provides the conditions that the fault‐tolerant scheduling algorithm must follow in producing the ghost and primary schedules. Fault‐tolerant algorithms: FA1 and FA2.
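Condition C1 and property (a) of condition C2 are purely structural, so they can be checked on an allocation before any scheduling is done. This is a sketch under assumptions: an allocation is a hypothetical mapping from processor to a list of (version, kind) pairs with kind "primary" or "ghost", and "ghosts on nsust distinct processors" is read as "at least nsust". Property (b) depends on slack in the actual schedule and is not checked here. The function names are hypothetical.

```python
def check_c1(copies, nsust):
    """Condition C1: no processor holds two copies of the same version, and every
    version has ghost copies on at least nsust distinct processors.

    copies: {processor: [(version, kind)]}, kind in {"primary", "ghost"}.
    """
    ghost_procs = {}
    for p, items in copies.items():
        versions = [v for v, _ in items]
        if len(versions) != len(set(versions)):
            return False  # two copies of one version on the same processor
        for v, kind in items:
            if kind == "ghost":
                ghost_procs.setdefault(v, set()).add(p)
    all_versions = {v for items in copies.values() for v, _ in items}
    return all(len(ghost_procs.get(v, set())) >= nsust for v in all_versions)


def ghosts_may_overlap(copies, p, a, b):
    """Condition C2(a): ghosts of a and b may overlap on p only if no other
    processor carries a copy (primary or ghost) of both a and b."""
    for q, items in copies.items():
        if q != p:
            held = {v for v, _ in items}
            if a in held and b in held:
                return False
    return True
```

For example, if some processor q carries the primaries of both T4 and T5, the ghosts of T4 and T5 on p must not overlap, because a failure of q would activate both at once.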


FA1: under FA1, the primary copies always execute in the positions specified in schedule S (a static schedule), regardless of whether any ghosts happen to be activated, since the ghost and primary schedules do not overlap.

FA1 ALGORITHM
1. RUN the assignment part of the nonfault‐tolerant algorithm to obtain a candidate allocation of copies to processors, giving the primary copies and the ghost copies allocated to processor pi, i = 1, ..., np.
2. RUN the EDF scheduling part on the primary and ghost copies allocated to each pi. If the resultant schedule is found to be infeasible, the allocation produced by the assignment part is infeasible; return control to the assignment part in step 1. Otherwise, record the positions of the ghost copies (as output by the EDF scheduling part) in ghost schedule Gi, and the positions of the primary copies in schedule S.

DRAWBACK OF FA1: the primary tasks are needlessly delayed when the ghosts do not have to be executed ‐‐ while all the tasks will meet their deadlines, it is frequently best to complete the execution of tasks early, to provide slack time for recovering from transient failures.


ALGORITHM FA2
1. Run the assignment part of the nonfault‐tolerant algorithm to obtain a candidate allocation of copies to processors, giving the primary copies and the ghost copies allocated to processor pi, i = 1, ..., np. For each processor pi, do steps 2 and 3.
2. Run the EDF scheduling part on the primary and ghost copies allocated to pi. If the resultant schedule is found to be infeasible, the allocation produced by the assignment part is infeasible; return control to the assignment part in step 1. Otherwise, record the positions of the ghost copies (as output by the EDF scheduling part) in ghost schedule Gi. Assign static priorities to the primary tasks in the order in which they finish executing, i.e., if primary πj completes before πk in the schedule generated in this step, πj will have higher priority than πk.
3. Generate primary schedule Si by running the static‐priority preemptive scheduler on the primary copies of pi, with the priorities assigned in step 2.

The drawback of FA1 does not occur in FA2 ‐‐ here an additional scheduling algorithm, a static‐priority preemptive scheduler, is used: given a set of tasks, each with its own unique static priority, it schedules them by assigning the processor to the highest‐priority task that has been released but is not yet completed.

THEOREM. The ghost schedule Gi and primary schedule Si form a feasible pair. PROOF. The primary tasks will complete no later than the times specified in the schedule generated in step 2, even if all the space allocated to ghosts in Gi is, in fact, occupied by them.
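Steps 2 and 3 of FA2 for a single processor can be sketched with a unit‐time preemptive simulator. This is a sketch under assumptions: each primary or ghost copy is a single job with integer release time, execution time, and deadline (one version per task); EDF ties are broken by name; and `Job`, `preemptive_schedule`, and `fa2` are hypothetical names. The assignment part (step 1) is assumed to have already produced the primaries and ghosts for this processor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    release: int
    exec_time: int  # integer time units, at least 1
    deadline: int


def preemptive_schedule(jobs, key):
    """Unit-time preemptive simulation: in each tick, run the released, unfinished
    job with the smallest key(job). Returns {name: list of ticks it runs in}."""
    if any(j.exec_time < 1 for j in jobs):
        raise ValueError("execution times must be >= 1")
    remaining = {j.name: j.exec_time for j in jobs}
    slots = {j.name: [] for j in jobs}
    t = 0
    while any(remaining.values()):
        ready = [j for j in jobs if j.release <= t and remaining[j.name] > 0]
        if ready:
            j = min(ready, key=key)
            slots[j.name].append(t)
            remaining[j.name] -= 1
        t += 1
    return slots


def finish_times(slots):
    return {name: ticks[-1] + 1 for name, ticks in slots.items()}


def fa2(primaries, ghosts):
    """FA2 steps 2 and 3 for one processor; returns None if step 2 is infeasible."""
    everything = primaries + ghosts
    # Step 2: EDF over primaries and ghosts together (ties broken by name).
    edf = preemptive_schedule(everything, key=lambda j: (j.deadline, j.name))
    edf_finish = finish_times(edf)
    if any(edf_finish[j.name] > j.deadline for j in everything):
        return None  # allocation infeasible: return to step 1
    ghost_schedule = {g.name: edf[g.name] for g in ghosts}  # G_i
    # Earlier completion in the EDF schedule -> higher static priority (0 = highest).
    order = sorted(primaries, key=lambda j: edf_finish[j.name])
    priority = {j.name: rank for rank, j in enumerate(order)}
    # Step 3: static-priority preemptive schedule of the primaries only (S_i).
    primary_schedule = preemptive_schedule(primaries, key=lambda j: priority[j.name])
    return {"G": ghost_schedule, "S": primary_schedule,
            "priority": priority, "edf_finish": edf_finish}
```

With primaries pi1 (release 0, execution 2, deadline 6) and pi2 (release 1, execution 2, deadline 8) and a ghost g4 (release 0, execution 2, deadline 4), EDF runs g4 first, so the primaries finish at 4 and 6 in step 2; when the ghost is not activated, the static‐priority schedule finishes them at 2 and 4, early and never later than in step 2, as the theorem states.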


EXAMPLE (Homework). Consider the case where a processor p has been allocated ghosts g4, g5, g6, and primaries π1, π2, π3. The release times, execution times, and deadlines are given in the table below.
CASES TO BE SCHEDULED:
1. Suppose that there exists some processor q to which the primary copies of g4 and g5 have been allocated, so we cannot overlap g4 and g5 in the ghost schedule of processor p.
2. There is another allocation of the same ghost and primary tasks to p, in which no other single processor is allocated the primaries of both g4 and g5. As a result, we can overlap g4 and g5.


IN TRADITIONAL SYSTEMS safety and reliability are normally considered to be independent issues ‐‐ there are traditional systems that are safe but unreliable, and vice versa. EXAMPLE 1: word processing software may not be very reliable but is safe ‐‐ a failure of the software does not usually cause any significant damage or financial loss. EXAMPLE 2: a handgun can be unsafe but reliable ‐‐ a handgun rarely fails ‐‐ however, if it fails for some reason, it can misfire or even explode and cause damage.

IN REAL‐TIME SYSTEMS safety and reliability are coupled together.

DEFINITIONS:
1. FAIL‐SAFE STATE of a system is a state which, if entered when the system fails, results in no damage. EXAMPLE: the fail‐safe state of a word processing program is a state where the document being processed has been saved on the disk.
   If no damage can result when a system enters a fail‐safe state just before it fails, then through a careful transition to the fail‐safe state upon a failure, it is possible to turn an extremely unreliable and unsafe system into a safe system.
2. SAFETY‐CRITICAL SYSTEM is a system whose failure can cause severe damage. EXAMPLE: the navigation system on board an aircraft. In a safety‐critical system, the absence of fail‐safe states implies that safety can only be ensured through increased reliability.

SAFETY AND RELIABILITY (November‐22‐09, 3:30 PM)


HOW TO ACHIEVE HIGH RELIABILITY?

For safety‐critical systems, safety and reliability become interrelated ‐‐ safety can only be ensured through increased reliability. Highly reliable software can be developed by adopting all three of the following techniques:
1. ERROR AVOIDANCE. The possibility of errors occurring should be minimized as much as possible during product development ‐‐ adopt well‐founded software engineering practices, sound design methodologies, etc.
2. ERROR DETECTION AND REMOVAL. In spite of using the best available error‐avoidance techniques, many errors are still possible in the code ‐‐ by conducting thorough reviews and testing, the errors can be detected and then removed.
3. FAULT‐TOLERANCE. It is virtually impossible to make a practical software system entirely error free; a few errors persist even after thorough reviews and testing. Therefore, to achieve high reliability even in situations where errors are present, the system should be able to tolerate the faults and compute the correct results. This is called fault tolerance ‐‐ it can be achieved by carefully incorporating redundancy.


FAULT TOLERANCE IN HARDWARE

BUILT‐IN SELF TEST (BIST). In BIST, the system periodically performs self tests of its components. Upon detection of a failure, the system automatically reconfigures itself by switching out the faulty component and switching in one of the redundant good components.

TRIPLE MODULAR REDUNDANCY (TMR). In TMR, three redundant copies of all critical components are made to run concurrently.


ENGG4420: Real-Time Systems Design. Developed by Radu Muresan; University of Guelph

Fault Tolerance through Redundancy (follows Real‐Time Systems by C. M. Krishna)

If a system is to be kept running despite the failure of some of its parts, it must have spare capacity.

Hardware redundancy: the system is provided with far more hardware than it would need, typically between 2 and 3 times as much.

Software redundancy: the system is provided with different software versions of tasks, ...

Time redundancy: the task schedule has some slack in it, so that some tasks can be rerun if necessary and still meet critical deadlines

Information redundancy: the data are coded in such a way that a certain number of bit errors can be detected and/or corrected

Additional hardware can be used in 2 ways:

(1) Fault detection, correction, and masking (a short‐term measure) ‐‐ multiple hardware units may be assigned to the same task in parallel and their results compared ‐‐ when units become faulty, this shows up as a disagreement in the results. We can mask the faults with the majority result (if only a minority of the units are faulty). Example: in a chemical plant, the computer is accessible for repair and replacement, so we are interested in short‐term measures to respond to failures.

(2) Replace the malfunctioning units (a long‐term measure) ‐‐ systems can be designed so that spares can be switched in to replace any faulty units. Example: a computer used aboard an unmanned deep‐space probe must include sufficient spare modules and self‐repair mechanisms to sustain long‐term operation.



Voting and Consensus

By using redundancy, multiple units can execute the same task and their outputs can be compared. If at least 3 units are involved, the comparison can choose the majority value, a process called voting. Voting can mask some effects of failures.

The designer must decide whether exact or approximate agreement is expected.

If there are 3 units A, B, and C, and both A and B produce the value x while C produces the value x ± α, for what value of α should C be considered faulty?
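The question of when C counts as faulty comes down to a designer-chosen tolerance. The sketch below is a minimal three-unit voter, assuming scalar outputs and a hypothetical `tolerance` parameter that plays the role of the largest acceptable α; `tmr_vote` and `disagreeing_units` are illustrative names, not from the source.

```python
from typing import List, Optional, Sequence


def tmr_vote(outputs: Sequence[float], tolerance: float = 0.0) -> Optional[float]:
    """Vote over the outputs of three units A, B, C.
    tolerance == 0.0 demands exact agreement; a positive tolerance accepts
    approximate agreement |x - y| <= tolerance. Returns the value from the
    first agreeing pair, or None if no two units agree."""
    if len(outputs) != 3:
        raise ValueError("tmr_vote expects exactly three outputs")
    for i in range(3):
        for j in range(i + 1, 3):
            if abs(outputs[i] - outputs[j]) <= tolerance:
                return outputs[i]
    return None


def disagreeing_units(outputs: Sequence[float], tolerance: float = 0.0) -> Optional[List[int]]:
    """Indices of units whose output lies farther than `tolerance` from the
    voted value; None if the units could not reach agreement at all."""
    voted = tmr_vote(outputs, tolerance)
    if voted is None:
        return None
    return [i for i, x in enumerate(outputs) if abs(x - voted) > tolerance]
```

With A = B = 5.0 and C = 5.3 (so α is about 0.3), exact agreement flags C as faulty, while a tolerance of 0.5 accepts all three outputs as agreeing.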


Types of Voters

Approximate agreement may be used in cases where sensors are measuring the physical environment.

There are three main types of voters that can function when approximate agreement is required:
• the formalized majority voter;
• the generalized k-plurality voter;
• the generalized median voter.

Each uses a distance metric: $d(x_1, x_2)$ denotes the distance between outputs $x_1$ and $x_2$. If $x_1$ and $x_2$ are real numbers, $d(x_1, x_2) = |x_1 - x_2|$; if they are vectors of real numbers, the Cartesian (Euclidean) distance may be chosen.



This table assumes that there are N outputs to be voted on, and that N is an odd number.

Table 7.1 Comparison of Voter Types

| Case | Majority voter | k-plurality voter | Median voter |
|---|---|---|---|
| All outputs correct and SE | Correct | Correct | Correct |
| Majority correct and SE | Correct | Correct | Correct |
| k correct and SE | No output | Correct | Possibly correct |
| All outputs correct but none SE | No output | No output | Correct |
| All outputs incorrect and not SE | No output | No output | Incorrect |
| Majority incorrect and SE | Incorrect | Incorrect | Incorrect |

Note: SE = sufficiently equal.

Adapted from Lorczak, Caglayan, and Eckhardt.



Formalized Majority Voter

If $d(x_1, x_2) \le \varepsilon$, then $x_1$ and $x_2$ are sufficiently equal for all practical purposes. Note that this relation is not transitive.

The voter constructs a set of classes $P_1, \dots, P_n$ such that $x, y \in P_i$ iff $d(x, y) \le \varepsilon$, and $P_i$ is maximal; that is, if $z \notin P_i$, then there exists some $w \in P_i$ such that $d(w, z) > \varepsilon$. We say $P_i$ is maximal in the sense that no other element can be added to it.

Example. Let $\varepsilon = 0.001$ for some 5-unit system. Let the five outputs be 1.0000, 1.0010, 0.9990, 1.0005, and 0.9970. The classes will be: …

The classes may share some elements; take the largest $P_i$ thus generated. If it has more than $N/2$ elements, any of its elements can be chosen as the output of the voter; otherwise, the voter produces no output.
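The class construction above can be sketched directly for a small number of units. This minimal version assumes scalar outputs with $d(x, y) = |x - y|$ by default (any distance function can be passed in) and enumerates subsets by brute force, which costs $O(2^N)$ and is only reasonable for the small N typical of voting clusters; the function names are hypothetical.

```python
from itertools import combinations
from typing import Callable, List, Optional, Sequence, Tuple


def abs_distance(x: float, y: float) -> float:
    """Distance metric for real-valued outputs: d(x, y) = |x - y|."""
    return abs(x - y)


def maximal_classes(outputs: Sequence[float], eps: float,
                    d: Callable[[float, float], float] = abs_distance) -> List[Tuple[int, ...]]:
    """Return every maximal class P (as a tuple of indices into `outputs`):
    all pairs in P are within eps, and no further output can be added."""
    n = len(outputs)

    def pairwise_equal(group: Tuple[int, ...]) -> bool:
        return all(d(outputs[i], outputs[j]) <= eps for i, j in combinations(group, 2))

    classes: List[Tuple[int, ...]] = []
    # Examine the largest groups first, so a compatible group that is already
    # contained in a found class is recognised as non-maximal and skipped.
    for size in range(n, 0, -1):
        for group in combinations(range(n), size):
            if pairwise_equal(group) and not any(set(group) <= set(c) for c in classes):
                classes.append(group)
    return classes


def formalized_majority_vote(outputs: Sequence[float], eps: float,
                             d: Callable[[float, float], float] = abs_distance) -> Optional[float]:
    """Output any member of the largest class if it holds more than N/2 outputs."""
    classes = maximal_classes(outputs, eps, d)
    if not classes:
        return None
    largest = max(classes, key=len)
    if len(largest) > len(outputs) / 2:
        return outputs[largest[0]]
    return None  # no majority class: the voter produces no output
```

The overlapping classes returned for `[1.0, 1.4, 1.8]` with ε = 0.5, namely `[(0, 1), (1, 2)]`, show why the relation is not transitive.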



Generalized k-plurality voter

The generalized k-plurality voter works along the same lines as the formalized majority voter, except that it simply chooses any output from the largest partition $P_i$, so long as $P_i$ contains at least k elements.

The value of k is selected by the system designer.
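For real-valued outputs with $d(x, y) = |x - y|$, a group is pairwise sufficiently equal exactly when its largest and smallest values differ by at most ε, so the largest class is the widest window of the sorted outputs whose span is at most ε. The sketch below uses that sliding-window shortcut (an assumption that only holds for scalar outputs) to implement a k-plurality voter; the function names are hypothetical.

```python
from typing import List, Optional, Sequence


def largest_class_scalar(outputs: Sequence[float], eps: float) -> List[float]:
    """Largest set of scalar outputs whose pairwise distances are all <= eps."""
    xs = sorted(outputs)
    best_lo, best_hi = 0, 0
    lo = 0
    for hi in range(len(xs)):
        # Shrink the window from the left until its span is at most eps.
        while xs[hi] - xs[lo] > eps:
            lo += 1
        if hi - lo > best_hi - best_lo:
            best_lo, best_hi = lo, hi
    return xs[best_lo:best_hi + 1]


def k_plurality_vote(outputs: Sequence[float], eps: float, k: int) -> Optional[float]:
    """Output any member of the largest class if it holds at least k outputs."""
    largest = largest_class_scalar(outputs, eps)
    if len(largest) >= k:
        return largest[0]
    return None  # largest class too small: no output
```

With outputs 5.0, 5.1, 9.0, 12.0, 20.0 and ε = 0.5, the largest class has only two members, so a k-plurality voter with k = 2 still produces an output while a majority voter (which would need three) does not.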


Generalized Median Voter

The median voter selects the middle value (N is odd) by successively throwing away outlying values until only the middle value is left.

Algorithm. Let the outputs being voted on be the set $S = \{x_1, \dots, x_n\}$.

Step 1. Compute $d_{ij} = d(x_i, x_j)$ for all $x_i, x_j \in S$ with $i \ne j$.

Step 2. Let $d_{kl}$ be the maximum such $d_{ij}$ (break any ties arbitrarily), and set $S = S - \{x_k, x_l\}$. If S contains only one element, that element is the output of the voter; otherwise, go back to Step 1.
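Steps 1 and 2 translate almost line for line into code. This is a minimal sketch, assuming an odd number of outputs and a caller-supplied distance function (absolute difference for scalars, `math.dist` for vectors); ties are broken by taking the first maximal pair found, which is one arbitrary choice the algorithm permits. The function names are hypothetical.

```python
import math
from itertools import combinations
from typing import Any, Callable, Sequence


def abs_distance(x: float, y: float) -> float:
    return abs(x - y)


def euclidean_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Cartesian (Euclidean) distance for vector-valued outputs."""
    return math.dist(x, y)


def generalized_median_vote(outputs: Sequence[Any],
                            d: Callable[[Any, Any], float] = abs_distance) -> Any:
    if len(outputs) % 2 == 0:
        raise ValueError("the median voter assumes an odd number of outputs")
    remaining = list(outputs)
    while len(remaining) > 1:
        # Step 1: distances between all pairs; Step 2: find the farthest pair.
        k, l = max(combinations(range(len(remaining)), 2),
                   key=lambda p: d(remaining[p[0]], remaining[p[1]]))
        # Remove both outliers (higher index first so the lower one stays valid).
        del remaining[l]
        del remaining[k]
    return remaining[0]
```

For scalar outputs the result is the ordinary median, for example 2.5 for the outputs 1.0, 2.0, 3.0, 100.0, 2.5, even though one unit produced the wild value 100.0.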



Static Pairing Scheme

FIGURE 7.6 Static Pairing: processors P1 and P2 with an interface to/from the network.

FIGURE 7.7 Use of a Monitor: the P1/P2 pair and its interface to/from the network, with an added monitor.

TECHNIQUE FOR AUTOMATIC HARDWARE REPLACEMENT: Static pairing is a simple scheme that hardwires processors in pairs and discards the entire pair when one of the processors fails. The pair runs identical software using identical inputs and compares the output of each task. If the outputs are identical, the pair is functional; if either processor in the pair detects nonidentical outputs, that indicates that at least one of the processors in the pair is faulty.

• The processor that detects this discrepancy switches off the interface to the rest of the system, thus isolating the pair.

• PROBLEMS WITH THIS SCHEME: 1) the interface itself may fail; 2) both processors may fail identically and at around the same time. The interface problem can be solved by introducing an interface monitor, so that the monitor and the interface can check each other.
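The compare-and-isolate behaviour of a static pair can be modelled as follows. This is a minimal sketch, assuming each processor is represented by a function from a task input to an output; the class `StaticPair` and its attribute names are hypothetical and not from the source.

```python
from typing import Callable, Optional


class StaticPair:
    """Hypothetical model of two hard-wired processors running identical software."""

    def __init__(self, p1: Callable[[int], int], p2: Callable[[int], int]) -> None:
        self.p1, self.p2 = p1, p2
        self.interface_on = True  # connection to the rest of the system

    def run(self, x: int) -> Optional[int]:
        if not self.interface_on:
            return None  # the pair has already been isolated
        r1, r2 = self.p1(x), self.p2(x)
        if r1 != r2:
            # Nonidentical outputs: at least one processor is faulty,
            # so switch off the interface and isolate the whole pair.
            self.interface_on = False
            return None
        return r1
```

The model also exposes the second weakness listed above: if both processors compute the same wrong value, the comparison passes and the error reaches the system undetected.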



A cluster of N = 2m + 1 processors is sufficient to guard against up to m failures.

N-MODULAR REDUNDANCY. N-modular redundancy (NMR) is a scheme for forward error recovery. It works by using N processors instead of one and voting on their outputs; N is usually odd. Figure 7.8 illustrates this scheme for N = 3. Two designs are possible: in design (a), there are N voters and the entire cluster produces N outputs; in design (b), there is just one voter.

To sustain up to m failed units, the NMR system requires (2m + 1) units in all: with at most m failures, at least m + 1 units remain correct and still form a majority. The most popular configuration is the triplex, which consists of a total of 3 units and can mask the effects of up to one failure.

Usually, NMR clusters are designed to allow the purging of malfunctioning units.

Figure 7.8 Structure of an NMR cluster (shown for processors P1, P2, P3): (a) N voters; (b) single voter.
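The 2m + 1 rule can be checked with an exact-agreement majority voter. This is a minimal sketch; it assumes the worst case in which all faulty units agree on the same wrong value, and the names `units_needed` and `majority_vote` are hypothetical.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def units_needed(m: int) -> int:
    """Units an NMR cluster needs to mask up to m failed units."""
    return 2 * m + 1


def majority_vote(outputs: Sequence[Hashable]) -> Optional[Hashable]:
    """Exact-agreement voter: return the value produced by more than half of
    the units, or None when no value has a majority."""
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) // 2 else None
```

With m = 2 (five units) and a correct output of 42, two colluding faulty units reporting 7 are outvoted, but three are not. In a triplex where the two faulty units disagree with each other, no value has a majority and the voter gives no output.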


SOFTWARE FAULT‐TOLERANCE TECHNIQUES

Three methods are popular for software fault‐tolerance: 1) N‐version programming technique, 2) recovery block technique, and 3) roll‐back recovery.

N-VERSION PROGRAMMING. This technique is an adaptation of the TMR technique for hardware fault-tolerance. In the N-version programming technique:

○ Independent teams develop N different versions of a software component (module); the value of N depends on the degree of fault-tolerance required. The central idea is that independent teams would commit different types of mistakes, which would be eliminated when the results they produce are subjected to voting.

○ The redundant modules are run concurrently (possibly on redundant hardware).

○ The results produced by the different versions of the module are subjected to voting at run time, and the result on which the majority of the components agree is accepted.

THE SCHEME is not very successful in achieving fault-tolerance, and the problem can be attributed to "statistical correlation of failures": even when independent teams develop the different versions, the versions tend to fail for identical reasons. FOR EXAMPLE, programmers commit errors in those parts of a problem that they perceive to be difficult, and what is difficult for one team is difficult for all teams. SO, identical errors remain in the most complex and least understood parts of a software component.
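The N-version scheme, and the correlated-failure problem, can be illustrated with three versions of one small module. In this sketch the integer-square-root versions are hypothetical stand-ins for modules written by independent teams, and `isqrt_faulty` is a deliberately seeded bug; `math.isqrt` requires Python 3.8 or later.

```python
import math
from collections import Counter
from typing import Callable, List, Optional


def isqrt_v1(n: int) -> int:
    return math.isqrt(n)


def isqrt_v2(n: int) -> int:
    # Integer Newton iteration, starting from the first step taken from x = n.
    if n < 2:
        return n
    x, y = n, (n + 1) // 2
    while y < x:
        x, y = y, (y + n // y) // 2
    return x


def isqrt_v3(n: int) -> int:
    # Binary search for the largest r with r * r <= n.
    lo, hi = 0, n
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid * mid <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


def isqrt_faulty(n: int) -> int:
    # Hypothetical version with an off-by-one bug on nonzero perfect squares.
    r = math.isqrt(n)
    return r + 1 if n > 0 and r * r == n else r


def n_version_run(versions: List[Callable[[int], int]], n: int) -> Optional[int]:
    """Run every version on the same input and accept the majority result."""
    results = [v(n) for v in versions]
    value, count = Counter(results).most_common(1)[0]
    return value if count > len(results) // 2 else None
```

A single faulty version is outvoted (the three-version run on 49 still returns 7), but if two teams make the same mistake, the vote accepts the wrong answer 8, which is exactly the statistical correlation of failures described above.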


RECOVERY BLOCKS

In the recovery block scheme, the redundant components are called "try blocks". Each try block computes the same end result as the others but is intentionally written using a different algorithm.

• In this scheme, the try blocks are run one after another.

• The result produced by a try block is subjected to an acceptance test; if the acceptance test fails, the next try block is tried.

• The process is repeated in sequence until the result produced by a try block passes the test.

• The scheme can use a common acceptance test for all blocks.
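The sequence of try blocks and a common acceptance test can be sketched as follows. This is a minimal illustration, assuming pure-function try blocks; a full recovery-block implementation would also restore the system state to the recovery point before each alternate runs, which is unnecessary here because nothing is mutated. The sorting try blocks and all function names are hypothetical.

```python
from collections import Counter
from typing import Callable, List, Sequence

TryBlock = Callable[[List[int]], List[int]]


def recovery_block(try_blocks: Sequence[TryBlock],
                   acceptance_test: Callable[[List[int], List[int]], bool],
                   data: List[int]) -> List[int]:
    for block in try_blocks:
        try:
            result = block(data)
        except Exception:
            continue  # a crashing try block is treated like a failed acceptance test
        if acceptance_test(data, result):
            return result
    raise RuntimeError("all try blocks failed the acceptance test")


def sorted_acceptance_test(data: List[int], result: List[int]) -> bool:
    """Common test: output is in order and is a permutation of the input."""
    in_order = all(a <= b for a, b in zip(result, result[1:]))
    return in_order and Counter(result) == Counter(data)


def primary_sort(data: List[int]) -> List[int]:
    # Hypothetical primary try block with a bug: it silently drops duplicates.
    return sorted(set(data))


def alternate_sort(data: List[int]) -> List[int]:
    # Alternate try block written with a different algorithm (insertion sort).
    out: List[int] = []
    for x in data:
        i = len(out)
        while i > 0 and out[i - 1] > x:
            i -= 1
        out.insert(i, x)
    return out
```

On input without duplicates the primary block passes the test and its result is used; on `[3, 1, 2, 1]` it fails the permutation check, so the alternate block's result `[1, 1, 2, 3]` is accepted instead.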



Software Redundancy

To provide reliability in the face of software faults, we must use redundancy. Simply replicating the same software N times will not work, because identical copies contain identical bugs and fail on the same inputs. Instead, the N versions of the software must be diverse, so that the probability that they fail on the same input is acceptably small. This can be done by having different software teams generate software for the same task.

There are 2 approaches to handling multiple versions of software: N-version programming and the recovery-block approach.


Software Redundancy Structures

FIGURE Software fault-tolerant structures: (a) N-version programming, in which the outputs of the versions are fed to a voter; (b) recovery-block approach, in which each version's output goes through an acceptance test (pass: the result is accepted; fail: the next version is tried).


CHECKPOINTING AND ROLL‐BACK RECOVERY

In this scheme, as the computation proceeds, the system state is tested each time some meaningful progress in the computation is made. Immediately after a state check succeeds, the state of the system is backed up on stable storage.

• If the next test does not succeed, the system can be made to roll back to the last checkpointed state.

• After a rollback, a fresh computation can be initiated from the checkpointed state.

• This technique is especially useful if there is a chance that the system state may be corrupted as the computation proceeds.
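The check, checkpoint, and roll-back cycle can be simulated in a few lines. In this minimal sketch an in-memory deep copy stands in for stable storage, a transient fault is injected by a hypothetical step that corrupts the state on its first call, and a retry limit (an added assumption) stops a permanently failing step from looping forever.

```python
import copy
from typing import Callable, Dict, List

State = Dict[str, int]


def run_with_checkpoints(state: State, steps: List[Callable[[State], None]],
                         state_check: Callable[[State], bool],
                         max_retries: int = 3) -> State:
    stable_storage = copy.deepcopy(state)  # stand-in for stable storage
    i, retries = 0, 0
    while i < len(steps):
        steps[i](state)
        if state_check(state):
            stable_storage = copy.deepcopy(state)  # checkpoint right after a passing check
            i, retries = i + 1, 0
        else:
            # Roll back to the last checkpointed state and retry this step.
            state.clear()
            state.update(copy.deepcopy(stable_storage))
            retries += 1
            if retries > max_retries:
                raise RuntimeError(f"step {i} kept failing its state check")
    return state


def increment(state: State) -> None:
    state["A"] += 1


def make_flaky_increment(fail_times: int) -> Callable[[State], None]:
    """Hypothetical step that corrupts the state on its first `fail_times` calls."""
    calls = {"n": 0}

    def step(state: State) -> None:
        calls["n"] += 1
        if calls["n"] <= fail_times:
            state["A"] = -999  # simulated transient fault
        else:
            state["A"] += 1
    return step
```

Running `[increment, make_flaky_increment(1), increment]` from `{"A": 0}` with the check `A >= 0` rolls back once after the corrupted step and still finishes with `{"A": 3}`.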



Time Redundancy: Implementing Backward Error Recovery

Critical to the successful implementation of backward error recovery is the restoration of the state of the affected processor to what it was before the error occurred

Corrective action, such as assigning another processor to carry on with the execution beyond this point or retrying on the same processor with the corrected state information, can be taken


Recovery Points

FIGURE Checkpointing a program ($R_i$ indicate recovery points; $C_i$ indicate checkpoints). The figure shows the program over time alongside the checkpoint contents:

Program:
A := 100; C := 35; F := 40;
A := 99; Q := 35; L := 134;
C := 36; D := 44; F := 39;
A := 89; C := 44;

Checkpoint contents:
C1: A 100, C 35, F 40
C2: A 99, Q 35, C 35, L 135, F 40
C3: A 99, F 40, C 35, Q 35, D 44, L 135

Recovery points: R1, R2, R3



Example

FIGURE Checkpoints and rolling back the process ($C_i$ indicate checkpoints): checkpoints C1 through C8 are taken along the time line; the figure marks where the error occurred, where the error was detected, and the point to which the system rolls back.
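Because an error may be detected some time after it occurs, checkpoints taken in between can already contain corrupted state. The sketch below picks the latest checkpoint taken strictly before the error occurred. It assumes the occurrence time is known or can be bounded, which is an assumption beyond the figure, and `rollback_checkpoint` is a hypothetical name.

```python
import bisect
from typing import Optional, Sequence


def rollback_checkpoint(checkpoint_times: Sequence[float], error_time: float) -> Optional[int]:
    """Index of the latest checkpoint taken strictly before the error occurred.
    `checkpoint_times` must be sorted; returns None if no such checkpoint exists."""
    i = bisect.bisect_left(checkpoint_times, error_time)
    return i - 1 if i > 0 else None
```

With checkpoints C1 to C8 at times 0, 10, ..., 70, an error that occurs at time 35 and is detected at time 62 should roll back to index 3 (C4 at time 30), not to the most recent checkpoint before detection (index 6, C7 at time 60).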


Recovery Cache

FIGURE Recovery caches. The program runs over time (labels t1, t2, t3; recovery points R1, R2, R3), and each recovery cache RCi holds old values of the variables:

Program:
A := 100; C := 35; F := 40;
A := 99; Q := 35; L := 134;
C := 36; D := 44; F := 39;
A := 89; C := 44;

Recovery cache contents:
RC1: A 100
RC2: C 35, F 40
RC3: A 99, C 36
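A recovery cache saves a variable's old value only the first time it is overwritten after a recovery point, so it stores far less than a full checkpoint. The sketch below replays the program from the figure; it assumes state is a dictionary of variables and, unlike the figure, also records variables that did not yet exist (such as Q and L) so that a rollback can delete them. `RecoveryCache` and its methods are hypothetical names.

```python
from typing import Dict

_ABSENT = object()  # marks a variable that did not exist at the recovery point


class RecoveryCache:
    """Hypothetical sketch of state variables guarded by a recovery cache."""

    def __init__(self, state: Dict[str, int]) -> None:
        self.state = dict(state)
        self.cache: Dict[str, object] = {}

    def establish_recovery_point(self) -> None:
        self.cache = {}

    def write(self, name: str, value: int) -> None:
        # Save the old value only on the first write since the recovery point.
        if name not in self.cache:
            self.cache[name] = self.state.get(name, _ABSENT)
        self.state[name] = value

    def old_values(self) -> Dict[str, object]:
        """Cached values of variables that existed at the recovery point."""
        return {k: v for k, v in self.cache.items() if v is not _ABSENT}

    def roll_back(self) -> None:
        for name, old in self.cache.items():
            if old is _ABSENT:
                del self.state[name]
            else:
                self.state[name] = old
        self.cache = {}
```

Replaying the program, the old values cached after each of the last three program segments are {A: 100}, {C: 35, F: 40}, and {A: 99, C: 36}, matching the cache contents listed in the figure.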



Cyclic Codes [Reading]

Cyclic coding is carried out by multiplying the word to be coded by a polynomial, called the generator polynomial.

All additions in this process are modulo-2.

Multiplication by $X^n$ essentially means shifting by n places.

Example ...


Let us consider a more complex multiplication. Multiply the word to be coded, $1 + X + X^5$ (representing the number 100011), by the generator polynomial $1 + X + X^2$ (representing 111). We have

$$(1 + X + X^5)(1 + X + X^2) = 1 + (1+1)X + (1+1)X^2 + 1X^3 + 0X^4 + 1X^5 + 1X^6 + 1X^7.$$

Doing the additions modulo-2 (which means putting them through exclusive-OR gates) results in

$$1 + 0X + 0X^2 + 1X^3 + 0X^4 + 1X^5 + 1X^6 + 1X^7 = 1 + X^3 + X^5 + X^6 + X^7,$$

representing 11101001. The coded value corresponding to 100011 is therefore 11101001. The circuit in Figure 7.20 will carry out this coding operation. To begin, all flip-flops have their value set to 0. Each flip-flop represents multiplication by $X$, that is, a one-bit delay.
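The worked multiplication can be checked with carry-less (modulo-2) polynomial multiplication. This sketch assumes each polynomial is encoded as an integer whose bit k is the coefficient of $X^k$, so 100011 is `0b100011` and $1 + X + X^2$ is `0b111`; `gf2_poly_multiply` is a hypothetical name.

```python
def gf2_poly_multiply(word: int, generator: int) -> int:
    """Multiply two polynomials with coefficients modulo 2.
    Bit k of each integer is the coefficient of X^k."""
    product = 0
    shift = 0
    while generator >> shift:
        if (generator >> shift) & 1:
            product ^= word << shift  # add word * X^shift; XOR is modulo-2 addition
        shift += 1
    return product
```

`format(gf2_poly_multiply(0b100011, 0b111), "b")` gives `"11101001"`, the coded value above. Ordinary integer multiplication of the same numbers (35 × 7 = 245) gives a different bit pattern, because it propagates carries instead of adding modulo 2.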

The coding circuit can be written down by inspection of the generator polynomial. Returning to Figure 7.20, note that the input is fed in serially, bit by bit. When we ask for multiplication (with modulo-2 addition) by $1 + X + X^2$, we are, in effect, saying: "add, modulo-2, the present input bit to the previous one (representing $X$) and to the input before that one (representing $X^2$)." The circuit follows immediately from this: the flip-flops produce the required delays, and the exclusive-OR gate carries out the modulo-2 addition.

FIGURE 7.20 Coding with the generator polynomial $1 + X + X^2$ (serial input, two D flip-flops, serial output).
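The serial coder can be simulated clock by clock. This sketch assumes the input is fed lowest-order coefficient first (the coefficient of $X^0$ first), which the text does not specify, and appends two zero clocks to flush the flip-flops so that the $X^6$ and $X^7$ terms appear; `serial_encode` is a hypothetical name.

```python
from typing import List


def serial_encode(bits_low_first: List[int]) -> List[int]:
    """Simulate the serial coder for generator 1 + X + X^2.
    ff1 holds the previous input bit (X term), ff2 the one before it (X^2 term);
    both are reset to 0. Each clock outputs present input XOR ff1 XOR ff2."""
    ff1 = ff2 = 0
    out: List[int] = []
    for b in bits_low_first + [0, 0]:  # two extra clocks flush the flip-flops
        out.append(b ^ ff1 ^ ff2)
        ff2, ff1 = ff1, b  # shift: X term moves to X^2, present input becomes X term
    return out
```

Feeding 100011 as `[1, 1, 0, 0, 0, 1]` produces `[1, 0, 0, 1, 0, 1, 1, 1]`, which, read from the highest-order bit down, is 11101001, the same code word obtained by multiplication.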

