5 May 2005 1
CmpE 516
Fault Tolerant Scheduling in Multiprocessor Systems
Betül Demiröz
Outline
General concepts about tasks and scheduling
Real time systems
Fault Tolerant Scheduling
Basic approaches used in Fault Tolerant Scheduling
Algorithms and their execution details
Task
Deadline
  time by which the task should be finished
Preemptive tasks can be
  stopped during execution
  restarted
Nonpreemptive tasks cannot be
  interrupted during execution
  restarted
Task Properties
Periodic
Aperiodic
  activated only when certain events occur
  arrival times are not known
  scheduled dynamically
Dependent
Independent
Task Scheduling
Distribution of tasks to the processors according to a given policy.
Major goals of task scheduling:
  distribute the system load
  reduce total execution time
Static & Dynamic Scheduling
Static Scheduling
  compile time scheduling
  an accurate weight estimation is needed
  schedules of all tasks are predetermined
Dynamic Scheduling
  scheduling at run time
  uses actual values of execution times of processes and communication times
Real Time Systems
Hard Real Time
  Correctness depends on
    logical results
    the result production time
  missing a deadline may be catastrophic
  mission-critical or life-critical applications
  fault tolerance is extremely important
Soft Real Time
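The hard real-time correctness rule can be written down in a few lines. This is a minimal sketch; the helper name `is_correct` and the use of plain numbers for times are assumptions made for illustration, not part of the slides.

```python
def is_correct(result, expected, finish_time, deadline):
    """Hard real-time correctness: right value AND produced on time."""
    logically_correct = result == expected
    on_time = finish_time <= deadline
    return logically_correct and on_time
```

A right answer that arrives after the deadline is treated the same as a wrong answer.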
Processors In The System
Uniprocessor
  there is a single processor
Multiprocessor
  there are n processors in the system
  can be identical (homogeneous)
  can have different properties (heterogeneous)
Hard Real Time Systems
Use multiprocessor
Advantages
  more reliable
    one processor failure does not cause the whole system to fail
    unless no fault-tolerant capability is provided, in which case a single processor failure can still cause the whole system to fail
  more computational power
Disadvantage
  the probability of a processor failure is higher
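The disadvantage can be made concrete with a small calculation. The sketch below assumes that each processor fails independently with the same probability `p`; that failure model and the function name `prob_any_failure` are assumptions, not taken from the slides.

```python
def prob_any_failure(p, n):
    """Probability that at least one of n processors fails.

    Assumes each processor fails independently with probability p.
    """
    return 1.0 - (1.0 - p) ** n
```

Under this model the chance of seeing some failure grows with `n`, which is why the extra processors only pay off when the scheduler can tolerate the failure.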
Fault Tolerant System
The system should produce correct results even in the presence of faults
Important for most real time applications
Tasks can have deadlines, and should be finished before the deadline
  hard real time systems
  fault tolerance required
Error Detection in Fault Tolerant Scheduling
Fail-Signal
  notify other processors of a detected fault
Alarms or watchdogs
  detection of timing failures
Signatures
  detection of HW/SW faults
Acceptance Tests
  test results for HW/SW faults
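Two of the mechanisms listed, the timing check and the acceptance test, can be sketched together. The wrapper `run_with_checks` and its return convention are hypothetical. A real watchdog would interrupt an overrunning task, whereas this sketch only checks the elapsed time after the task returns.

```python
import time

def run_with_checks(task, acceptance_test, time_limit):
    """Run task(), then apply a timing check and an acceptance test.

    Returns (result, None) on success or (None, reason) on a detected fault.
    """
    start = time.monotonic()
    try:
        result = task()
    except Exception:
        return None, "crash"
    elapsed = time.monotonic() - start
    if elapsed > time_limit:          # watchdog-style check, done after the fact
        return None, "timing failure"
    if not acceptance_test(result):   # acceptance test on the produced value
        return None, "acceptance test failed"
    return result, None
```

The reason string stands in for a fail-signal that would be sent to the other processors.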
Fault Tolerance In Multiprocessor Systems
Multiple copies of tasks scheduled on different processors
Aim: the task completes before its deadline
Fault Tolerant Scheduling In Multiprocessor Systems (Cont.)
Multiple copies of tasks are scheduled to different processors
One or more copies can run to ensure task completion before deadline
  PB (Primary/Backup Approach)
  TMR (Triple Modular Redundancy)
Error checking is done by comparing results
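For TMR, comparing results usually means a majority vote over the three copies. A minimal sketch, assuming results are hashable values and using a hypothetical function name `tmr_vote`:

```python
from collections import Counter

def tmr_vote(results):
    """Return the majority value among three results, or None if all differ."""
    if len(results) != 3:
        raise ValueError("TMR expects exactly three results")
    value, count = Counter(results).most_common(1)[0]
    return value if count >= 2 else None
```

With one faulty copy the two agreeing copies outvote it; if all three disagree, no result can be trusted.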
PB (Primary/Backup Approach)
If incorrect results are generated by the primary processor, the backup processor is activated
Small HW resource requirements
Tasks are
  nonpreemptive, aperiodic, real-time
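The primary/backup control flow can be sketched as follows. The function `run_primary_backup` is hypothetical, and treating a crash of the primary like an incorrect result is an assumption of this sketch.

```python
def run_primary_backup(primary, backup, acceptance_test):
    """Run the primary copy; activate the backup only if the primary fails."""
    try:
        result = primary()
        if acceptance_test(result):
            return result, "primary"
    except Exception:
        pass  # a crash of the primary is treated like an incorrect result
    return backup(), "backup"
```

In the fault-free case only one copy runs, which is where the small hardware requirement comes from.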
An Algorithm For Real Time Fault Tolerant Scheduling in Multiprocessor Systems
N periodic tasks are scheduled on a number of processors
For each task i, there is a primary copy Pi and a backup copy Bi
  If the primary copy fails, the backup copy is activated
  Enough time is needed to execute backup copies
Static scheduling of tasks
Scheduling Requirements
Each task is executed by one processor at a time
All tasks should meet their deadlines
Maximize the number of processor failures to be tolerated
Pi and Bi are each assigned to exactly one processor, and the two processors are different
Tasks are preemptive
The number of processors used should be minimized
Scheduling Algorithm
Primary tasks are arranged in order of decreasing computation times
Primary copies are scheduled (m processors are used)
  assign each copy to existing processors
Primary schedule is duplicated for the backup copies (m processors are used)
Any pair of primary and backup copies should not overlap
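The steps above can be sketched as a first-fit placement followed by a mirrored backup placement. The slides do not state the exact placement rule, so the first-fit rule, the per-processor `capacity` budget, and both function names are assumptions of this sketch. It covers only which processor each copy lands on, not the time offsets needed to keep a primary and its backup from overlapping.

```python
def schedule_primaries(times, capacity):
    """First-fit in order of decreasing computation time.

    times: dict task name -> computation time.
    capacity: assumed time budget per processor (an assumption of this sketch).
    Returns a list of processors, each a list of task names.
    """
    processors, loads = [], []
    for name in sorted(times, key=lambda t: times[t], reverse=True):
        if times[name] > capacity:
            raise ValueError(f"{name} does not fit on any processor")
        for j, load in enumerate(loads):
            if load + times[name] <= capacity:   # fits on an existing processor
                processors[j].append(name)
                loads[j] += times[name]
                break
        else:                                    # otherwise open a new one
            processors.append([name])
            loads.append(times[name])
    return processors

def add_backups(primary_schedule):
    """Duplicate the primary schedule on m further processors for the backups."""
    m = len(primary_schedule)
    placement = {}
    for j, tasks in enumerate(primary_schedule):
        for name in tasks:
            placement[name] = {"primary": j, "backup": m + j}
    return placement
```

Because the backups live on processors m to 2m-1, a primary and its backup never share a processor, which matches the requirement that Pi and Bi be placed on different processors.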
An Example Distribution
S={T1, T2, T3, T4, T5}
C={5, 4, 4, 3, 2}
T1 -> P1
T2 -> P2
T3 -> P1
T4 -> P2
T5 -> P2
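The load that this distribution places on each processor can be checked with a few lines. The sketch assumes that C lists the computation times of T1 to T5 in order; the function name `processor_loads` is hypothetical.

```python
C = {"T1": 5, "T2": 4, "T3": 4, "T4": 3, "T5": 2}
assignment = {"T1": "P1", "T2": "P2", "T3": "P1", "T4": "P2", "T5": "P2"}

def processor_loads(times, placement):
    """Sum the computation times of the tasks placed on each processor."""
    loads = {}
    for task, proc in placement.items():
        loads[proc] = loads.get(proc, 0) + times[task]
    return loads

loads = processor_loads(C, assignment)  # {"P1": 9, "P2": 9}
```

Both processors end up with 9 time units of primary work, so the load is evenly distributed.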
Example Cont.
Another Algorithm
Two copies of a task are allowed to start execution at different times
Improves schedulability of tasks
N identical processors and a scheduling processor are used
Dynamic scheduling
System Model
A task is scheduled if
  the previously scheduled tasks and the newly arrived task all meet their deadlines
Otherwise
  the task is rejected, because its deadline cannot be met in the presence of a fault
Techniques Used
Backup copies are activated only when a fault occurs on the processor executing the primary copy
Backup Overloading
  overlapping multiple slots for backups
Backup De-allocation
  release the slot for a backup copy when its primary copy is completed successfully
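Both techniques can be sketched on a simple slot representation. The condition used here, that two backup slots on one processor may overlap only if their primaries run on different processors, is the usual single-fault rule from the primary/backup literature and is an assumption as far as these slides go. The dict layout and function names are hypothetical.

```python
def overlaps(a, b):
    """True if half-open intervals a=(start, end) and b=(start, end) intersect."""
    return a[0] < b[1] and b[0] < a[1]

def can_place_backup(new, existing):
    """Check a new backup slot against backups already on the same processor.

    Each backup is a dict with 'slot' (start, end) and 'primary_proc'.
    Overlap is tolerated only when the primaries run on different processors,
    which relies on the assumption of at most one processor fault at a time.
    """
    return all(
        not overlaps(new["slot"], b["slot"]) or new["primary_proc"] != b["primary_proc"]
        for b in existing
    )

def deallocate_backup(backups, task):
    """Backup de-allocation: drop the slot once the primary has succeeded."""
    return [b for b in backups if b["task"] != task]
```

Overloading saves processor time because at most one of the overlapping backups will ever need to run; de-allocation returns the slot for use by later tasks.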
Backup Overloading
Backup Deallocation
Proposed Technique
The primary copy and backup copy are scheduled and executed in parallel
The backup copy is divided into
  preceding part executed together with primary copy (redundant part)
  remaining part executed after the primary copy is completed (backup part)
Backup overloading and backup deallocation are used
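The split of the backup copy follows directly from the two time windows. A minimal sketch, assuming both copies need the same computation time `comp_time` and that times are plain numbers; the function name `split_backup` is hypothetical.

```python
def split_backup(primary_start, primary_end, backup_start, comp_time):
    """Split a backup copy into (redundant_part, backup_part) lengths."""
    backup_end = backup_start + comp_time
    overlap = min(primary_end, backup_end) - max(primary_start, backup_start)
    redundant = max(0, overlap)       # part that runs alongside the primary
    return redundant, comp_time - redundant
```

For example, a primary in [0, 4] with a backup starting at 2 gives a redundant part of 2 and a backup part of 2. A backup starting at 4 has no redundant part, which is the plain primary/backup case.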
Proposed Technique (Cont.)
Scheduling Algorithm
Schedule primary copy
  try to find a free slot between arrival time and deadline
Schedule backup copy
  schedule both redundant and backup parts
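The slot search for the primary copy can be sketched for a single processor. The interval representation, the earliest-fit rule, and the name `find_free_slot` are assumptions of this sketch; the slides only say that a free slot between arrival time and deadline is sought.

```python
def find_free_slot(busy, arrival, deadline, comp_time):
    """Earliest start in [arrival, deadline - comp_time] that avoids busy slots.

    busy: list of non-overlapping (start, end) intervals reserved on one processor.
    Returns the start time, or None if the copy cannot finish by the deadline.
    """
    start = arrival
    for b_start, b_end in sorted(busy):
        if b_end <= start:
            continue                  # reserved slot ends before we could start
        if start + comp_time <= b_start:
            break                     # the gap before this slot is large enough
        start = max(start, b_end)     # otherwise skip past the reserved slot
    return start if start + comp_time <= deadline else None
```

The same search could be repeated on a different processor for the backup copy; if no slot is found for either copy, the task would be rejected.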
System Overview
Experiments
Basic parameters used in experiments
  system load
  number of processors and tasks used
  computation time
  window size
Analysing results
  rejection rate
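The slides do not define the rejection rate. The sketch below assumes the common definition, the fraction of arrived tasks that the scheduler could not accept; the function name is hypothetical.

```python
def rejection_rate(arrived, rejected):
    """Fraction of arrived tasks that the scheduler rejected."""
    if arrived == 0:
        return 0.0
    return rejected / arrived
```

A lower rejection rate at the same system load indicates better schedulability.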
Experimental Results
Thank You
ANY QUESTIONS?