CmpE 516: Fault Tolerant Scheduling in Multiprocessor Systems. Betül Demiröz, 5 May 2005.

5 May 2005 1

CmpE 516

Fault Tolerant Scheduling in Multiprocessor Systems

Betül Demiröz

5 May 2005 2

Outline

General concepts about tasks and scheduling
Real time systems
Fault Tolerant Scheduling
Basic approaches used in Fault Tolerant Scheduling
Algorithms and their execution details

5 May 2005 3

Task

Deadline: the time by which the task should be finished
Preemptive tasks can be stopped during execution and restarted
Nonpreemptive tasks cannot be interrupted during execution or restarted

5 May 2005 4

Task Properties

Periodic
Aperiodic
  activated only when certain events occur
  arrival times are not known
  scheduled dynamically
Dependent
Independent

5 May 2005 5

Task Scheduling

Distribution of tasks to the processors according to a given policy.
Major goals of task scheduling:
  distribute the system load
  reduce total execution time

5 May 2005 6

Static & Dynamic Scheduling

Static Scheduling
  compile time scheduling
  an accurate weight estimation is needed
  schedules of all tasks are predetermined
Dynamic Scheduling
  scheduling at run time
  uses actual values of execution times of processes and communication times

5 May 2005 7

Real Time Systems

Hard Real Time
  Correctness depends on
    logical results
    the result production time
  missing a deadline may be catastrophic
  mission-critical or life-critical applications
  fault tolerance is extremely important
Soft Real Time

5 May 2005 8

Processors In The System

Uniprocessor
  there is a single processor
Multiprocessor
  there are n processors in the system
  can be identical (homogeneous)
  can have different properties (heterogeneous)

5 May 2005 9

Hard Real Time Systems

Use multiprocessor
Advantages
  more reliable
    unless a processor failure causes the whole system to fail
    can happen if no fault-tolerant capability is provided
  one processor failure does not cause the whole system to fail
  more computational power
Disadvantage
  the probability of processor failure is higher

5 May 2005 10

Fault Tolerant System

The system should produce correct results even in the presence of faults
Important for most real time applications
Tasks can have deadlines, and should be finished before the deadline
  fault tolerance required
  hard real time systems

5 May 2005 11

Error Detection in Fault Tolerant Scheduling

Fail-Signal: notify other processors of a detected fault
Alarms or watchdogs: detection of timing failures
Signatures: detection of HW/SW faults
Acceptance Tests: test results for HW/SW faults

5 May 2005 12

Fault Tolerance In Multiprocessor Systems

Multiple copies of tasks scheduled on different processors
Aim: the task completes before its deadline

5 May 2005 13

Fault Tolerant Scheduling In Multiprocessor Systems (Cont.)

Multiple copies of tasks are scheduled to different processors
One or more copies can run to ensure task completion before deadline
  PB (Primary/Backup Approach)
  TMR (Triple Modular Redundancy)

Error checking is done by comparing results
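Comparing results under TMR can be sketched as a majority vote over three copies. This is a minimal illustration and is not taken from the slides: the function name `tmr_vote` and the choice to return `None` when all three copies disagree are assumptions.

```python
from collections import Counter

def tmr_vote(results):
    """Return the majority value among three redundant results, or None."""
    if len(results) != 3:
        raise ValueError("TMR expects exactly three results")
    value, count = Counter(results).most_common(1)[0]
    # Two or more matching copies outvote a single faulty one.
    return value if count >= 2 else None

print(tmr_vote([42, 42, 17]))  # one faulty copy is masked
print(tmr_vote([1, 2, 3]))     # no majority: the error is detected but not masked
```

With three copies, a single faulty result is masked without re-execution, at the cost of running every task three times.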

5 May 2005 14

PB (Primary/Backup Approach)

If incorrect results are generated from primary processor, backup processor is activated
Small HW resource requirements
Tasks are nonpreemptive, aperiodic, real-time
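The activation rule above can be sketched as follows. The sketch assumes the fault is detected by an acceptance test (one of the detection mechanisms listed earlier in these slides). The names `run_primary_backup`, `primary`, `backup` and `acceptance_test` are hypothetical, and timing is not modeled.

```python
def run_primary_backup(primary, backup, acceptance_test):
    """Run the primary copy; fall back to the backup copy if its result is rejected."""
    try:
        result = primary()
        if acceptance_test(result):
            return result, "primary"
    except Exception:
        # ASSUMPTION: a crash of the primary is treated like an incorrect result.
        pass
    # The backup copy is activated only after the primary has failed.
    return backup(), "backup"

# Hypothetical task whose primary copy produces an invalid (negative) result.
print(run_primary_backup(lambda: -1, lambda: 4, lambda r: r >= 0))
```

Only one copy runs in the fault-free case, which is why this approach needs fewer hardware resources than running all copies at once.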

5 May 2005 15

An Algorithm For Real Time Fault Tolerant Scheduling in Multiprocessor Systems

N periodic tasks are scheduled on a number of processors
For each task i, there is a primary copy Pi and a backup copy Bi
If primary copy fails, backup copy is activated
Enough time needed to execute backup copies
Static scheduling of tasks

5 May 2005 16

Scheduling Requirements

Each task is executed by one processor at a time
All tasks should meet their deadlines
Maximize the number of processor failures to be tolerated
Pi and Bi are each assigned to exactly one processor, and the two processors are different
Tasks are preemptive
The number of processors used should be minimized

5 May 2005 17

Scheduling Algorithm

Primary tasks are arranged in order of decreasing computation times
Primary copies are scheduled (m processors are used)
  assign each copy to existing processors
Primary schedule is duplicated for the backup copies (m processors are used)
Any pair of primary and backup copies should not overlap
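The steps above can be sketched roughly as below. The slides do not say how a copy is matched to a processor, so the least-loaded rule used here is an assumption, and the resulting assignment need not match the example distribution given in the slides. The function name and the processor numbering (primaries on 0..m-1, backups on m..2m-1) are also hypothetical. The sketch does not check that a primary and its backup avoid overlapping in time.

```python
def schedule_primaries_and_backups(comp_times, m):
    """comp_times: dict task -> computation time; m: processors per group.

    Returns (primary, backup), two dicts mapping task -> processor index.
    """
    # Step 1: order tasks by decreasing computation time.
    order = sorted(comp_times, key=lambda t: comp_times[t], reverse=True)
    loads = [0] * m
    primary = {}
    for task in order:
        # Step 2. ASSUMPTION: pick the least-loaded processor (not stated in the slides).
        p = min(range(m), key=lambda i: loads[i])
        primary[task] = p
        loads[p] += comp_times[task]
    # Step 3: duplicate the primary schedule onto a second group of m processors,
    # so that Pi and Bi always sit on different processors.
    backup = {task: p + m for task, p in primary.items()}
    return primary, backup

C = {"T1": 5, "T2": 4, "T3": 4, "T4": 3, "T5": 2}
primary, backup = schedule_primaries_and_backups(C, m=2)
print(primary)
print(backup)
```

Mirroring the primary schedule onto a separate group keeps the sketch short and guarantees the different-processor requirement, but it uses 2m processors in total.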

5 May 2005 18

An Example Distribution

S={T1, T2, T3, T4, T5}

C={5, 4, 4, 3, 2}

T1 -> P1

T2 -> P2

T3 -> P1

T4 -> P2

T5 -> P2
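As a quick check of this distribution, the load on each processor can be summed up. The sketch assumes that C lists the computation times of T1 to T5 in order and that P1 and P2 denote processors here.

```python
comp = {"T1": 5, "T2": 4, "T3": 4, "T4": 3, "T5": 2}
assignment = {"T1": "P1", "T2": "P2", "T3": "P1", "T4": "P2", "T5": "P2"}

# Sum the computation times of the tasks placed on each processor.
loads = {}
for task, proc in assignment.items():
    loads[proc] = loads.get(proc, 0) + comp[task]
print(loads)  # {'P1': 9, 'P2': 9}
```

Under these assumptions both processors carry the same total computation time of 9.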

5 May 2005 19

Example Cont.

5 May 2005 20

Another Algorithm

Two copies of a task are allowed to start execution at different times
Improves schedulability of tasks
N identical processors and a scheduling processor are used
Dynamic scheduling

5 May 2005 21

System Model

A task is scheduled if
  the previously scheduled tasks and the newly arrived task all meet their deadlines
Otherwise
  the task is rejected, because its deadline cannot be met in the presence of a fault

5 May 2005 22

Techniques Used

Backup copies are activated only when a fault occurs on the processor executing the primary copy
Backup Overloading
  overlapping multiple slots for backups
Backup De-allocation
  release the slot for a backup copy when its primary copy is completed successfully
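A minimal sketch of both techniques follows. The slides do not state when two backup slots may overlap. The rule used here is an assumption: overlap is allowed only when the two primaries run on different processors, so that a single processor fault activates at most one of the overlapping backups. The `BackupSlot` class and both function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class BackupSlot:
    task: str
    start: int
    end: int            # half-open interval [start, end)
    primary_proc: int   # processor running this task's primary copy

def can_add_backup(slots, new):
    """Check whether `new` may be placed on a processor already holding `slots`."""
    for s in slots:
        overlaps = s.start < new.end and new.start < s.end
        # ASSUMPTION: overlapping backups must have primaries on different processors.
        if overlaps and s.primary_proc == new.primary_proc:
            return False
    return True

def deallocate(slots, task):
    """Backup de-allocation: drop the slot once the task's primary has succeeded."""
    return [s for s in slots if s.task != task]

slots = [BackupSlot("T1", 10, 14, primary_proc=0)]
print(can_add_backup(slots, BackupSlot("T2", 12, 16, primary_proc=1)))  # True
print(can_add_backup(slots, BackupSlot("T3", 12, 16, primary_proc=0)))  # False
print(deallocate(slots, "T1"))                                          # []
```

Both techniques free processor time that later tasks can use.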

5 May 2005 23

Backup Overloading

5 May 2005 24

Backup Deallocation

5 May 2005 25

Proposed Technique

The primary copy and backup copy are scheduled and executed in parallel
The backup copy is divided into
  a preceding part, executed together with the primary copy (redundant part)
  a remaining part, executed after the primary copy is completed (backup part)
Backup overloading and backup deallocation are used

5 May 2005 26

Proposed Technique (Cont.)

5 May 2005 27

Scheduling Algorithm

Schedule primary copy
  try to find a free slot between the arrival time and the deadline
Schedule backup copy
  schedule both redundant and backup parts
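The free-slot search for the primary copy might look like the sketch below, for a single processor. The function name, the half-open interval representation and the earliest-fit choice are assumptions. Placing the redundant and backup parts of the backup copy is not modeled.

```python
def find_free_slot(busy, arrival, deadline, duration):
    """Return the earliest start time for `duration` within [arrival, deadline], or None.

    busy: list of (start, end) half-open intervals already reserved on one processor.
    """
    t = arrival
    for start, end in sorted(busy):
        if end <= t:
            continue            # interval lies entirely before the candidate start
        if start - t >= duration:
            break               # the gap before this interval is large enough
        t = max(t, end)         # otherwise skip past the busy interval
    return t if t + duration <= deadline else None

busy = [(0, 3), (5, 9)]
print(find_free_slot(busy, arrival=1, deadline=12, duration=2))  # 3
print(find_free_slot(busy, arrival=1, deadline=12, duration=4))  # None
```

A `None` result on every processor corresponds to the case where the arriving task has to be rejected.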

5 May 2005 28

System Overview

5 May 2005 29

Experiments

Basic parameters used in experiments
  system load
  number of processors and tasks used
  computation time
  window size
Analysing results
  rejection rate
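The slides name the rejection rate as the metric but do not define it. A plausible reading, assumed here, is the fraction of arriving tasks that the scheduler could not accept. The numbers in the usage line are made up for illustration.

```python
def rejection_rate(num_rejected, num_arrived):
    """Fraction of arrived tasks that the scheduler could not accept."""
    if num_arrived == 0:
        return 0.0
    return num_rejected / num_arrived

print(rejection_rate(50, 200))  # 0.25
```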

5 May 2005 30

Experimental Results

5 May 2005 31

Thank You

ANY QUESTIONS?