Implementation Study - Courses (please see the UBC course...
Transcript of Implementation Study - Courses (please see the UBC course...
![Page 1: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/1.jpg)
1 Real-Time Scalability
On the Scalability of Real-Time Scheduling Algorithms on
Multicore Platforms: A Case Study
Sathish Gopalakrishnan The University of British Columbia
(based on work by others at the University of North Carolina)
![Page 2: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/2.jpg)
2 Real-Time Scalability
Focus of this Talk
Multicore platforms are predicted to get much larger in the future. » 10s or 100s of cores per chip, multiple
hardware threads per core.
Research Question: How will different real-time scheduling algorithms scale?
» Scalability is defined w.r.t. schedulability (more on this later).
![Page 3: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/3.jpg)
3 Real-Time Scalability
Outline
Background. » Real-time workload assumed. » Scheduling algorithms evaluated. » Some properties of these algorithms.
Research questions addressed. Experimental results. Observations/speculation. Future work.
![Page 4: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/4.jpg)
4 Real-Time Scalability
Real-Time Workload Assumed in this Talk
Set τ of periodic tasks scheduled on M cores:
0 10 20 30
T = (2,5)
5 15 25
U = (9,15)
2 5
One Core Here
![Page 5: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/5.jpg)
5 Real-Time Scalability
Real-Time Workload Assumed in this Talk
Set τ of periodic tasks scheduled on M cores: » Task T = (T.e,T.p) releases a job with exec. cost T.e every
T.p time units. – T’s utilization (or weight) is U(T) = T.e/T.p. – Total utilization is U(τ) = ΣT T.e/T.p.
0 10 20 30
T = (2,5)
5 15 25
U = (9,15)
2 5
One Core Here
![Page 6: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/6.jpg)
6 Real-Time Scalability
Real-Time Workload Assumed in this Talk
Set τ of periodic tasks scheduled on M cores: » Task T = (T.e,T.p) releases a job with exec. cost T.e every
T.p time units. – T’s utilization (or weight) is U(T) = T.e/T.p. – Total utilization is U(τ) = ΣT T.e/T.p.
0 10 20 30
T = (2,5)
5 15 25
U = (9,15)
2 5
One Core Here
![Page 7: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/7.jpg)
7 Real-Time Scalability
Real-Time Workload Assumed in this Talk
Set τ of periodic tasks scheduled on M cores: » Task T = (T.e,T.p) releases a job with exec. cost T.e every
T.p time units. – T’s utilization (or weight) is U(T) = T.e/T.p. – Total utilization is U(τ) = ΣT T.e/T.p.
» Each job of T has a deadline at the next job release of T.
0 10 20 30
T = (2,5)
5 15 25
U = (9,15)
2 5
One Core Here
![Page 8: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/8.jpg)
8 Real-Time Scalability
Real-Time Workload Assumed in this Talk
Set τ of periodic tasks scheduled on M cores: » Task T = (T.e,T.p) releases a job with exec. cost T.e every
T.p time units. – T’s utilization (or weight) is U(T) = T.e/T.p. – Total utilization is U(τ) = ΣT T.e/T.p.
» Each job of T has a deadline at the next job release of T.
0 10 20 30
T = (2,5)
5 15 25
U = (9,15)
2 5
One Core Here
![Page 9: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/9.jpg)
9 Real-Time Scalability
Real-Time Workload Assumed in this Talk
Set τ of periodic tasks scheduled on M cores: » Task T = (T.e,T.p) releases a job with exec. cost T.e every
T.p time units. – T’s utilization (or weight) is U(T) = T.e/T.p. – Total utilization is U(τ) = ΣT T.e/T.p.
» Each job of T has a deadline at the next job release of T.
0 10 20 30
T = (2,5)
5 15 25
U = (9,15)
2 5
One Core Here
This is an earliest-deadline-first schedule. Much of our work pertains to EDF scheduling…
![Page 10: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/10.jpg)
10 Real-Time Scalability
Scheduling vs. Schedulability
W.r.t. scheduling, we actually care about two kinds of algorithms: » Scheduling algorithm (of course).
– Example: Earliest-deadline-first (EDF): Jobs with earlier deadlines have higher priority.
» Schedulability test.
Test for EDF
τ yes no
no timing requirement will be violated if τ is scheduled with EDF
a timing requirement will (or may) be violated …
![Page 11: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/11.jpg)
11 Real-Time Scalability
Multiprocessor Real-Time Scheduling
Two Approaches:
Steps: 1. Assign tasks to processors (bin
packing). 2. Schedule tasks on each
processor using a uniprocessor algorithm.
Partitioning Global Scheduling
Important Differences: • One task queue. • Tasks may migrate among
the processors.
![Page 12: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/12.jpg)
12 Real-Time Scalability
Scheduling Algorithms Considered
Partitioned EDF: PEDF. Preemptive & Non-preemptive Global
EDF: GEDF & NP-GEDF. Clustered EDF: CEDF.
» Partition onto clusters of cores, globally schedule within each cluster
L2
From other 8 cores…
L1
C C C C
L1
C C C C clusters
![Page 13: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/13.jpg)
13 Real-Time Scalability
Scheduling Algorithms (Continued)
PD2, a global Pfair algorithm. » Schedule jobs one quantum at a time at a
“uniform” rate. – May preempt and migrate jobs frequently.
Staggered PD2: S-PD2. » Same as PD2 but quanta are “staggered” to
avoid excessive bus contention.
![Page 14: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/14.jpg)
14 Real-Time Scalability
PD2 Example
Under partitioning & most global algorithms, overall utilization must be capped to avoid deadline misses. » Due to connections to bin-packing.
Exception: Global “Pfair” algorithms do not require caps. » Such algorithms schedule jobs one quantum at a time.
– May therefore preempt and migrate jobs frequently. – Perhaps less of a concern on a multicore platform.
Under most global algorithms, if utilization is not capped, deadline tardiness is bounded. » Sufficient for soft real-time systems.
3 tasks with parameters (2,3) on two processors…
0 10 20 30
T = (2,3)
5 15 25
U = (2,3)
V = (2,3)
On Processor 1 On Processor 2
![Page 15: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/15.jpg)
15 Real-Time Scalability
Schedulability
HRT: No deadline is missed. SRT: Deadline tardiness is bounded. For some scheduling algorithms,
utilization loss is inherent when checking schedulability. » That is, schedulability cannot be
guaranteed for all task systems with total utilization at most M.
![Page 16: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/16.jpg)
16 Real-Time Scalability
Example: PEDF
Under partitioning & most global algorithms, overall utilization must be capped to avoid deadline misses. » Due to connections to bin-packing.
Exception: Global “Pfair” algorithms do not require caps. » Such algorithms schedule jobs one quantum at a time.
– May therefore preempt and migrate jobs frequently. – Perhaps less of a concern on a multicore platform.
Under most global algorithms, if utilization is not capped, deadline tardiness is bounded. » Sufficient for soft real-time systems.
Example: Partitioning three tasks with parameters (2,3) on two processors will overload one processor.
In terms of bin-packing…
Processor 1 Processor 2
Task 1
Task 2 Task 3
0
1
![Page 17: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/17.jpg)
17 Real-Time Scalability
Schedulability Summary
HRT SRT
PEDF util. loss util. loss (same as HRT) GEDF util. loss no loss NP-GEDF util. loss no loss CEDF util. loss util. loss (not as bad as PEDF)
PD2 no loss no loss S-PD2 slight loss no loss
(must shrink periods by one quantum)
![Page 18: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/18.jpg)
18 Real-Time Scalability
GEDF SRT Example
Under partitioning & most global algorithms, overall utilization must be capped to avoid deadline misses. » Due to connections to bin-packing.
Exception: Global “Pfair” algorithms do not require caps. » Such algorithms schedule jobs one quantum at a time.
– May therefore preempt and migrate jobs frequently. – Perhaps less of a concern on a multicore platform.
Under most global algorithms, if utilization is not capped, deadline tardiness is bounded. » Sufficient for soft real-time systems.
Earlier example with GEDF…
0 10 20 30
T = (2,3)
5 15 25
U = (2,3)
V = (2,3)
Tardiness is at most one quantum.
![Page 19: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/19.jpg)
19 Real-Time Scalability
Outline
Background. » Real-time workload assumed. » Scheduling algorithms evaluated. » Some properties of these algorithms.
Research questions addressed. Experimental results. Observations/speculation. Future work.
![Page 20: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/20.jpg)
20 Real-Time Scalability
Research Questions
In theory, PD2 is always preferable. » It is optimal (no utilization loss).
What about in practice? » That is, what happens if system overheads
are taken into account? Do migrations really matter on a
multicore platform with a shared cache? As multicore platforms get larger, will
global algorithms scale?
Focus of this Talk: An Experimental comparison of these scheduling algorithms on the basis of schedulability.
![Page 21: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/21.jpg)
21 Real-Time Scalability
Test System
HW platform: Sun Niagara (UltraSPARC T1).
– OS has 32 “logical CPUs” to manage. – Far larger than any system considered before in RT literature. – Note: CEDF “cluster” = 4 HW threads on a core.
Core 1 Core 8
L1 L1
L2
… • 1.2 GHz “RISC-like” cores.
• Relatively simple, e.g., no instr. reordering or branch prediction.
• Caches somewhat small compared to Intel.
4 HW threads per core
16K (8K) L1 instr. (data) cache per core Shared 3MB L2
![Page 22: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/22.jpg)
22 Real-Time Scalability
Test System (Cont’d)
Operating System: LITMUSRT: LInux Testbed for MUltiprocessor Scheduling in Real-Time systems. » Developed at UNC. » Extends Linux by allowing different schedulers to
be linked as “plug-in” components. » Several (real-time) synchronization protocols are
also supported. » Code is available at http://www.cs.unc.edu/
~anderson/litmus-rt/.
![Page 23: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/23.jpg)
23 Real-Time Scalability
Methodology
Ran several hundred (synthetic) task sets on the test system.
Collected 70 GB of raw overhead samples. Distilled expressions for average (for SRT)
and worst-case (for HRT) overheads. Conducted schedulability experiments
involving 8.5 million randomly-generated task sets with overheads considered.
Note: This step is offline. It does not involve the Niagara.
![Page 24: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/24.jpg)
24 Real-Time Scalability
Kinds of Overheads
Tick scheduling overhead. » Incurred when the kernel is invoked at the beginning of
each quantum (timer “tick”). A quantum is 1ms. Release overhead.
» Incurred when the kernel is invoked to handle a job release.
Scheduling overhead. » Incurred when the scheduler (in the kernel) is invoked.
Context-switching overhead. » Non-cache-related costs associated with a context switch.
Preemption/migration overhead. » Costs incurred upon a preemption/migration due to a loss
of cache affinity.
These overheads can be accounted for in schedulability tests by inflating
job execution costs.
(Doing this correctly is a little tricky.)
![Page 25: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/25.jpg)
25 Real-Time Scalability
Kernel Overheads
Alg Scheduling Overhead (in µs) PD2 32.7 S-PD2 43.1 GEDF/NP-GEDF 55.2+.26N (N = no. of tasks)
Most overheads were small (2-15µs) except worst-case overheads impacted by global queues. » Most notable: Worst-case scheduling overheads
for PD2, S-PD2, and GEDF/NP-GEDF:
![Page 26: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/26.jpg)
26 Real-Time Scalability
Preemption/Migration Overheads
Obtained by measuring synthetic tasks, each with a 64K working set & 75/25 read/write ratio. » Interesting trends: PD2 is terrible, staggering really
helps, preempt. cost ≈ mig. cost per algorithm, but algorithms that migrate have higher costs.
Worst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster Mig Inter-Cluster Mig PD2 681.1 649.4 654.2 681.1 S-PD2 104.1 103.4 103.4 104.1 GEDF 375.4 375.4 326.8 321.1 CEDF 171.6 171.6 167.3 --- PEDF 139.1 139.1 --- ---
![Page 27: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/27.jpg)
27 Real-Time Scalability
Schedulability Results
Generated random tasks using 6 distributions and checked schedulability using “state-of-the-art” tests (with overheads considered). » 8.5 million task sets in total.
Distributions: » Utilizations uniform over
– [0.001,01] (light), – [0.1,0.4] (medium), and – [0.5,09] (heavy).
» Bimodal with utilizations distributed over either [0.001,05) or [0.5,09] with probabilities of – 8/9 and 1/9 (light), – 6/9 and 3/9 (medium), and – 4/9 and 5/9 (heavy).
![Page 28: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/28.jpg)
28 Real-Time Scalability
Schedulability Results
Generated random tasks using 6 distributions and checked schedulability using “state-of-the-art” tests (with overheads considered). » 8.5 million task sets in total.
Distributions: » Utilizations uniform over
– [0.001,01] (light), – [0.1,0.4] (medium), and – [0.5,09] (heavy).
» Bimodal with utilizations distributed over either [0.001,05) or [0.5,09] with probabilities of – 8/9 and 1/9 (light), – 6/9 and 3/9 (medium), and – 4/9 and 5/9 (heavy).
will only show graphs for these
![Page 29: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/29.jpg)
29 Real-Time Scalability
HRT Summary
PEDF usually wins. » Exception: Lots of heavy tasks (makes bin-packing
hard). S-PD2 usually does well.
» Staggering has an impact.
PD2 and GEDF are quite poor. » PD2 is negatively impacted by high preemption and
migration costs due to aligned quanta. » GEDF suffers from high scheduling costs (due to
the global queue).
![Page 30: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/30.jpg)
30 Real-Time Scalability
HRT, Bimodal Light
PEDF peforms pretty well if most task utilizations are low.
S-PD2 performs pretty well too.
![Page 31: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/31.jpg)
31 Real-Time Scalability
HRT, Bimodal Light
![Page 32: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/32.jpg)
32 Real-Time Scalability
HRT, Bimodal Medium
In this and the next slide, as the fraction of heavy tasks grows, the gap between S-PD2 and PEDF narrows.
![Page 33: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/33.jpg)
33 Real-Time Scalability
HRT, Bimodal Medium
![Page 34: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/34.jpg)
34 Real-Time Scalability
HRT, Bimodal Heavy
![Page 35: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/35.jpg)
35 Real-Time Scalability
SRT Summary
PEDF is not as effective as before, but still OK in light-mostly cases.
CEDF performs the best in most cases. S-PD2 still performs generally well. GEDF is still negatively impacted by
higher scheduling costs. » Note: SRT schedulability for GEDF entails
no utilization loss. » NP-GEDF and GEDF are about the same.
Note: The scale is different from before.
![Page 36: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/36.jpg)
36 Real-Time Scalability
SRT, Bimodal Light
PEDF and CEDF perform well if tasks are mostly light.
Note: S-PD2 never performs really badly in any experiment.
![Page 37: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/37.jpg)
37 Real-Time Scalability
SRT, Bimodal Light
![Page 38: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/38.jpg)
38 Real-Time Scalability
SRT, Bimodal Medium
This and the next slide show that as the frequency of heavy tasks increases, PEDF degrades. CEDF isn’t affected by this increase much.
![Page 39: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/39.jpg)
39 Real-Time Scalability
SRT, Bimodal Medium
![Page 40: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/40.jpg)
40 Real-Time Scalability
SRT, Bimodal Heavy
![Page 41: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/41.jpg)
41 Real-Time Scalability
Outline
Background. » Real-time workload assumed. » Scheduling algorithms evaluated. » Some properties of these algorithms.
Research questions addressed. Experimental results. Observations/speculation. Future work.
![Page 42: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/42.jpg)
42 Real-Time Scalability
Observations/Speculation
Global algorithms are really sensitive to how shared queues are implemented. » Saw 100X performance improvement by switching
from linked lists to binomial heaps. » Still working on this… » Speculation: Can reduce GEDF costs to close to
PEDF costs for systems with ≤ 32 cores. Per algorithm, preempt. cost ≈ mig. cost.
» Due to having a shared cache. » One catch: Migrations increase both costs.
Quantum staggering is very effective.
![Page 43: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/43.jpg)
43 Real-Time Scalability
Observations/Speculation (Cont’d)
No one “best” algorithm. Intel has claimed they will produce an 80-
core general-purpose chip. If they do… » the cores will have to be simple ⇒ high
execution costs ⇒ high utilizations ⇒ PEDF will suffer;
» “pure” global algorithms will not scale; » some instantiation of CEDF (or maybe CS-
PD2) will hit the “sweet spot”.
![Page 44: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/44.jpg)
44 Real-Time Scalability
Future Work
Thoroughly study “how to implement shared queues”.
Repeat this study on Intel and embedded machines.
Examine mixed HRT/SRT workloads. Factor in synchronization and dynamic
behavior. » In past work, PEDF was seen to be more
negatively impacted by these things.
![Page 45: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/45.jpg)
45 Real-Time Scalability
Thanks!
Questions?
![Page 46: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/46.jpg)
46 Real-Time Scalability
SRT Tardiness, Uniform Medium
![Page 47: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/47.jpg)
47 Real-Time Scalability
Measuring Overheads
Done using a UNC-produced tracer called Feather-Trace. » http://www.cs.unc.edu/~bbb/feathertrace/
Highest 1% of values were tossed. » Eliminates “outliers” due to non-deterministic
behavior in Linux, warm-up effects, etc. Used worst-case (average-case) values for
HRT (SRT) schedulability. Used linear regression analysis to produce
linear (in the task count) overhead expressions.
![Page 48: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/48.jpg)
48 Real-Time Scalability
Obtaining Kernel Overheads
Ran 90 (synthetic) task sets per scheduling algorithm for 30 sec.
In total, over 600 million individual overheads were recorded (45 GB of data).
![Page 49: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/49.jpg)
49 Real-Time Scalability
Kernel Overheads (in µs) (N = no. of tasks)
Worst-Case Alg Tick Schedule Context SW Release PD2 11.2 +.3N 32.7 3.1+.01N --- S-PD2 4.8+.3N 43.1 3.2+.003N --- GEDF 3+.003N 55.2+.26N 29.2 45+.3N CEDF 3.2 14.8+.01N 6.1 30.3 PEF 2.7+.002N 8.6+.01N 14.9+.04N 4.7+.009N
Average Alg Tick Schedule Context SW Release PD2 4.3+.03N 4.7 2.6+.001N --- S-PD2 2.1+.02N 4.2 2.5+.001N --- GEDF 2.1+.002N 11.8+.06N 7.6 5.8+.1N CEDF 2.8 6.1+.01N 3.2 16.5 PEDF 2.1+.002N 2.7+.008N 4.7+.005N 4+.005N
![Page 50: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/50.jpg)
50 Real-Time Scalability
Kernel Overheads (in µs) (N = no. of tasks)
Worst-Case Alg Tick Schedule Context SW Release PD2 11.2 +.3N 32.7 3.1+.01N --- S-PD2 4.8+.3N 43.1 3.2+.003N --- GEDF 3+.003N 55.2+.26N 29.2 45+.3N CEDF 3.2 14.8+.01N 6.1 30.3 PEF 2.7+.002N 8.6+.01N 14.9+.04N 4.7+.009N
Average Alg Tick Schedule Context SW Release PD2 4.3+.03N 4.7 2.6+.001N --- S-PD2 2.1+.02N 4.2 2.5+.001N --- GEDF 2.1+.002N 11.8+.06N 7.6 5.8+.1N CEDF 2.8 6.1+.01N 3.2 16.5 PEDF 2.1+.002N 2.7+.008N 4.7+.005N 4+.005N
![Page 51: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/51.jpg)
51 Real-Time Scalability
Obtaining Preemption/Migration Overheads
Ran 90 (synthetic) task sets per scheduling algorithm for 60 sec.
Each task has a 64K working set (WS) that it accesses repeatedly with a 75/25 read/write ratio.
Recorded time to access WS after preemption/migration minus “cache-warm access”.
In total, over 105 million individual preemption/migration overheads were recorded (15 GB of data).
![Page 52: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/52.jpg)
52 Real-Time Scalability
Preemption/Migration Overheads (in µs) (N = no. of tasks)
Worst-Case Alg Overall Preemption Intra-Cluster Mig Inter-Cluster Mig PD2 681.1 649.4 654.2 681.1 S-PD2 104.1 103.4 103.4 104.1 GEDF 375.4 375.4 326.8 321.1 CEDF 171.6 171.6 167.3 --- PEDF 139.1 139.1 --- ---
Average Alg Overall Preemption Intra-Cluster Mig Inter-Cluster Mig PD2 172 131.4 141.8 187.6 S-PD2 89.3 86.2 87.8 90.2 GEDF 73 95.1 73.5 72.6 CEDF 67 78.5 64.8 --- PEDF 72.3 72.3 --- ---
![Page 53: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/53.jpg)
53 Real-Time Scalability
Preemption/Migration Overheads (in µs) (N = no. of tasks)
Worst-Case Alg Overall Preemption Intra-Cluster Mig Inter-Cluster Mig PD2 681.1 649.4 654.2 681.1 S-PD2 104.1 103.4 103.4 104.1 GEDF 375.4 375.4 326.8 321.1 CEDF 171.6 171.6 167.3 --- PEDF 139.1 139.1 --- ---
Average Alg Overall Preemption Intra-Cluster Mig Inter-Cluster Mig PD2 172 131.4 141.8 187.6 S-PD2 89.3 86.2 87.8 90.2 GEDF 73 95.1 73.5 72.6 CEDF 67 78.5 64.8 --- PEDF 72.3 72.3 --- ---
![Page 54: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/54.jpg)
54 Real-Time Scalability
HRT, Uniform Light
This is the easiest case for partitioning, so PEDF wins.
S-PD2 does pretty well too.
![Page 55: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/55.jpg)
55 Real-Time Scalability
HRT, Uniform Light
![Page 56: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/56.jpg)
56 Real-Time Scalability
HRT, Uniform Medium
Similar to before.
Utilizations aren’t high enough to start causing problems for partitioning.
![Page 57: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/57.jpg)
57 Real-Time Scalability
HRT, Uniform Medium
![Page 58: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/58.jpg)
58 Real-Time Scalability
HRT, Uniform Heavy
Utilizations are high enough to cause problems for partitioning.
S-PD2 wins now.
![Page 59: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/59.jpg)
59 Real-Time Scalability
HRT, Uniform Heavy
![Page 60: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/60.jpg)
60 Real-Time Scalability
SRT, Uniform Light
PEDF wins, S-PD2 performs pretty well.
![Page 61: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/61.jpg)
61 Real-Time Scalability
SRT, Uniform Light
![Page 62: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/62.jpg)
62 Real-Time Scalability
SRT, Uniform Medium
CEDF really benefits from using a “no utilization loss” schedulability test within each cluster.
![Page 63: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/63.jpg)
63 Real-Time Scalability
SRT, Uniform Medium
![Page 64: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/64.jpg)
64 Real-Time Scalability
SRT, Uniform Heavy
GEDF and NP-GEDF actually win in this case.
CEDF and S-PD2 perform pretty well.
PEDF loses.
![Page 65: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/65.jpg)
65 Real-Time Scalability
SRT, Uniform Heavy
![Page 66: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/66.jpg)
On the Implementationof Global Real-Time Schedulers
Sathish GopalakrishnanThe University of British Columbia
Work supported by IBM, SUN, and Intel Corps., NSF grants CNS 0834270, CNS 0834132, and CNS 0615197, and ARO grant W911NF-06-1-0425.
Simon Fraser UniversityApril 15, 2010
Tuesday, April 5, 2011
![Page 67: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/67.jpg)
On the Implementation of Global Real-Time Schedulers
UNC’s Implementation Studies (I)
2
Calandrino et al. (2006)➡ Are commonly-studied RT schedulers implementable?➡ In Linux on common hardware platforms?
Calandrino et al. (2006), LITMUSRT: A testbed for empirically comparing real-time multiprocessor schedulers. In: Proceedings of the 27th IEEE Real-Time Systems Symposium, pages 111–123.Brandenburg et al. (2008), On the scalability of real-time scheduling algorithms on multicore platforms: A case study. In: Proceedings of the 29th IEEE Real-Time Systems Symposium, pages 157–169.
Tuesday, April 5, 2011
![Page 68: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/68.jpg)
On the Implementation of Global Real-Time Schedulers
Main M
emory
L2 Cache
L2 Cache
Proc. 1
Proc. 2
Proc. 4
Proc. 3
L2 Cache
L2 Cache
UNC’s Implementation Studies (I)
3
Calandrino et al. (2006), LITMUSRT: A testbed for empirically comparing real-time multiprocessor schedulers. In: Proceedings of the 27th IEEE Real-Time Systems Symposium, pages 111–123.Brandenburg et al. (2008), On the scalability of real-time scheduling algorithms on multicore platforms: A case study. In: Proceedings of the 29th IEEE Real-Time Systems Symposium, pages 157–169.
Intel 4x 2.7 GHz Xeon SMP(few, fast processors; private caches)
Calandrino et al. (2006)➡ Are commonly-studied RT schedulers implementable?➡ In Linux on common hardware platforms?
Tuesday, April 5, 2011
![Page 69: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/69.jpg)
On the Implementation of Global Real-Time Schedulers
Main M
emory
L2 Cache
L2 Cache
Proc. 1
Proc. 2
Proc. 4
Proc. 3
L2 Cache
L2 Cache
UNC’s Implementation Studies (I)
4
Calandrino et al. (2006), LITMUSRT: A testbed for empirically comparing real-time multiprocessor schedulers. In: Proceedings of the 27th IEEE Real-Time Systems Symposium, pages 111–123.Brandenburg et al. (2008), On the scalability of real-time scheduling algorithms on multicore platforms: A case study. In: Proceedings of the 29th IEEE Real-Time Systems Symposium, pages 157–169.
G-EDF
P-EDF
S-PD2
G-NP-EDF
PD2
partitioned EDF
2 x global EDF
2 x PFAIR
Calandrino et al. (2006)➡ Are commonly-studied RT schedulers implementable?➡ In Linux on common hardware platforms?
Tuesday, April 5, 2011
![Page 70: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/70.jpg)
On the Implementation of Global Real-Time Schedulers
Main M
emory
L2 Cache
L2 Cache
Proc. 1
Proc. 2
Proc. 4
Proc. 3
L2 Cache
L2 Cache
UNC’s Implementation Studies (I)
5
Calandrino et al. (2006), LITMUSRT: A testbed for empirically comparing real-time multiprocessor schedulers. In: Proceedings of the 27th IEEE Real-Time Systems Symposium, pages 111–123.Brandenburg et al. (2008), On the scalability of real-time scheduling algorithms on multicore platforms: A case study. In: Proceedings of the 29th IEEE Real-Time Systems Symposium, pages 157–169.
G-EDF
P-EDF
S-PD2
G-NP-EDF
PD2
Calandrino et al. (2006)➡ Are commonly-studied RT schedulers implementable?➡ In Linux on common hardware platforms?
“for each tested scheme, scenarios exist in which it is a viable choice”
Tuesday, April 5, 2011
![Page 71: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/71.jpg)
On the Implementation of Global Real-Time Schedulers
UNC’s Implementation Studies (II)
6
Brandenburg et al. (2008)➡ What if there are many slow processors?
Calandrino et al. (2006), LITMUSRT: A testbed for empirically comparing real-time multiprocessor schedulers. In: Proceedings of the 27th IEEE Real-Time Systems Symposium, pages 111–123.Brandenburg et al. (2008), On the scalability of real-time scheduling algorithms on multicore platforms: A case study. In: Proceedings of the 29th IEEE Real-Time Systems Symposium, pages 157–169.
G-EDF
P-EDF
S-PD2
G-NP-EDF
PD2
Main M
emory
L2 Cache
L2 Cache
Proc. 1
Proc. 2
Proc. 4
Proc. 3
L2 Cache
L2 Cache
Tuesday, April 5, 2011
![Page 72: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/72.jpg)
On the Implementation of Global Real-Time Schedulers
Main
Mem
ory
L2 Cache
L1 Cache
Core 1
Thread
1
Thread
2
Thread
3
Thread
4
L1 Cache
Core 2
Thread
1
Thread
2
Thread
3
Thread
4
L1 Cache
Core 3
Thread
1
Thread
2
Thread
3
Thread
4
L1 Cache
Core 4
Thread
1
Thread
2
Thread
3
Thread
4
L1 Cache
Core 6
Thread
1
Thread
2
Thread
3
Thread
4
L1 Cache
Core 5
Thread
1
Thread
2
Thread
3
Thread
4
L1 Cache
Core 7
Thread
1
Thread
2
Thread
3
Thread
4
L1 Cache
Core 8
Thread
1
Thread
2
Thread
3
Thread
4
UNC’s Implementation Studies (II)
7
Brandenburg et al. (2008)➡ What if there are many slow processors?➡ Explored scalability of RT schedulers on a Sun Niagara.
Calandrino et al. (2006), LITMUSRT: A testbed for empirically comparing real-time multiprocessor schedulers. In: Proceedings of the 27th IEEE Real-Time Systems Symposium, pages 111–123.Brandenburg et al. (2008), On the scalability of real-time scheduling algorithms on multicore platforms: A case study. In: Proceedings of the 29th IEEE Real-Time Systems Symposium, pages 157–169.
G-EDF
P-EDF
S-PD2
G-NP-EDF
PD2
Tuesday, April 5, 2011
![Page 73: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/73.jpg)
On the Implementation of Global Real-Time Schedulers
Main
Mem
ory
L2 Cache
L1 Cache
Core 1
Thread
1
Thread
2
Thread
3
Thread
4
L1 Cache
Core 2
Thread
1
Thread
2
Thread
3
Thread
4
L1 Cache
Core 3
Thread
1
Thread
2
Thread
3
Thread
4
L1 Cache
Core 4
Thread
1
Thread
2
Thread
3
Thread
4
L1 Cache
Core 6
Thread
1
Thread
2
Thread
3
Thread
4
L1 Cache
Core 5
Thread
1
Thread
2
Thread
3
Thread
4
L1 Cache
Core 7
Thread
1
Thread
2
Thread
3
Thread
4
L1 Cache
Core 8
Thread
1
Thread
2
Thread
3
Thread
4
UNC’s Implementation Studies (II)
8
Brandenburg et al. (2008)➡ What if there are many slow processors?➡ Explored scalability of RT schedulers on a Sun Niagara.
Calandrino et al. (2006), LITMUSRT: A testbed for empirically comparing real-time multiprocessor schedulers. In: Proceedings of the 27th IEEE Real-Time Systems Symposium, pages 111–123.Brandenburg et al. (2008), On the scalability of real-time scheduling algorithms on multicore platforms: A case study. In: Proceedings of the 29th IEEE Real-Time Systems Symposium, pages 157–169.
G-EDF
P-EDF
S-PD2
G-NP-EDF
PD2
G-EDF: high overheads, low schedulability.
Tuesday, April 5, 2011
![Page 74: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/74.jpg)
On the Implementation of Global Real-Time Schedulers
Today’s discussion
9
How to implement global schedulers?
Calandrino et al. (2006), LITMUSRT: A testbed for empirically comparing real-time multiprocessor schedulers. In: Proceedings of the 27th IEEE Real-Time Systems Symposium, pages 111–123.Brandenburg et al. (2008), On the scalability of real-time scheduling algorithms on multicore platforms: A case study. In: Proceedings of the 29th IEEE Real-Time Systems Symposium, pages 157–169.
Main
Mem
ory
L2 Cache
L1 Cache
Core 1
Thread
1
Thread
2
Thread
3
Thread
4
L1 Cache
Core 2
Thread
1
Thread
2
Thread
3
Thread
4
L1 Cache
Core 3
Thread
1
Thread
2
Thread
3
Thread
4
L1 Cache
Core 4
Thread
1
Thread
2
Thread
3
Thread
4
L1 Cache
Core 6
Thread
1
Thread
2
Thread
3
Thread
4
L1 Cache
Core 5
Thread
1
Thread
2
Thread
3
Thread
4
L1 Cache
Core 7
Thread
1
Thread
2
Thread
3
Thread
4
L1 Cache
Core 8
Thread
1
Thread
2
Thread
3
Thread
4
G-EDF
P-EDF
S-PD2
G-NP-EDF
PD2
Tuesday, April 5, 2011
![Page 75: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/75.jpg)
On the Implementation of Global Real-Time Schedulers
Main
Mem
ory
L2 Cache
L1 Cache
Core 1
Thread
1
Thread
2
Thread
3
Thread
4
L1 Cache
Core 2
Thread
1
Thread
2
Thread
3
Thread
4
L1 Cache
Core 3
Thread
1
Thread
2
Thread
3
Thread
4
L1 Cache
Core 4
Thread
1
Thread
2
Thread
3
Thread
4
L1 Cache
Core 6
Thread
1
Thread
2
Thread
3
Thread
4
L1 Cache
Core 5
Thread
1
Thread
2
Thread
3
Thread
4
L1 Cache
Core 7
Thread
1
Thread
2
Thread
3
Thread
4
L1 Cache
Core 8
Thread
1
Thread
2
Thread
3
Thread
4
Today’s discussion
10
How to implement global schedulers?➡ Explore how implementation tradeoffs affect schedulability.
Calandrino et al. (2006), LITMUSRT: A testbed for empirically comparing real-time multiprocessor schedulers. In: Proceedings of the 27th IEEE Real-Time Systems Symposium, pages 111–123.Brandenburg et al. (2008), On the scalability of real-time scheduling algorithms on multicore platforms: A case study. In: Proceedings of the 29th IEEE Real-Time Systems Symposium, pages 157–169.
Instead of considering
one implementation
of severaldifferent scheduling
algorithms…
Tuesday, April 5, 2011
![Page 76: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/76.jpg)
On the Implementation of Global Real-Time Schedulers
Main
Mem
ory
L2 Cache
L1 Cache
Core 1
Thread
1
Thread
2
Thread
3
Thread
4
L1 Cache
Core 2
Thread
1
Thread
2
Thread
3
Thread
4
L1 Cache
Core 3
Thread
1
Thread
2
Thread
3
Thread
4
L1 Cache
Core 4
Thread
1
Thread
2
Thread
3
Thread
4
L1 Cache
Core 6
Thread
1
Thread
2
Thread
3
Thread
4
L1 Cache
Core 5
Thread
1
Thread
2
Thread
3
Thread
4
L1 Cache
Core 7
Thread
1
Thread
2
Thread
3
Thread
4
L1 Cache
Core 8
Thread
1
Thread
2
Thread
3
Thread
4
Today’s discussion
11
How to implement global schedulers?➡ Explore how implementation tradeoffs affect schedulability.➡ Case study: nine G-EDF variants on a Sun Niagara.
Calandrino et al. (2006), LITMUSRT: A testbed for empirically comparing real-time multiprocessor schedulers. In: Proceedings of the 27th IEEE Real-Time Systems Symposium, pages 111–123.Brandenburg et al. (2008), On the scalability of real-time scheduling algorithms on multicore platforms: A case study. In: Proceedings of the 29th IEEE Real-Time Systems Symposium, pages 157–169.
G-EDF G-EDF
G-EDF G-EDF
G-EDF
G-EDF G-EDF
G-EDF
G-EDF
Tuesday, April 5, 2011
![Page 77: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/77.jpg)
On the Implementation of Global Real-Time Schedulers
Design Choices
12Tuesday, April 5, 2011
![Page 78: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/78.jpg)
On the Implementation of Global Real-Time Schedulers
Design Choices
13
➡ When to schedule.➡ Quantum alignment.➡ How to handle interrupts.➡ How to queue pending jobs.➡ How to manage future releases.➡ How to avoid unnecessary preemptions.
Tuesday, April 5, 2011
![Page 79: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/79.jpg)
On the Implementation of Global Real-Time Schedulers
Scheduler Invocation
14Tuesday, April 5, 2011
![Page 80: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/80.jpg)
On the Implementation of Global Real-Time Schedulers
Scheduler Invocation
15
Event-Driven➡ on job release➡ on job completion➡ preemptions occur
immediately
P1
P2T x
1T y2
T z3 T y
2
5 10 150
release completion
Tuesday, April 5, 2011
![Page 81: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/81.jpg)
On the Implementation of Global Real-Time Schedulers
release completion
Scheduler Invocation
16
Event-Driven➡ on job release➡ on job completion➡ preemptions occur
immediately
Quantum-Driven➡ on every timer tick➡ easier to implement➡ on release a job is just
enqueued; scheduler is invoked at next tick
P1
P2T x
1T y2
T z3 T y
2
5 10 150
P1
P2T x
1T y2
T z3 T y
2
delay partially-used quantum
5 10 150
Tuesday, April 5, 2011
![Page 82: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/82.jpg)
On the Implementation of Global Real-Time Schedulers
Quantum Alignment
17
Aligned➡ Tick synchronized
across processors.➡ Contention at
quantum boundary!
P1
P2
5 10 150release completion
Tuesday, April 5, 2011
![Page 83: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/83.jpg)
On the Implementation of Global Real-Time Schedulers
Quantum Alignment
18
Aligned➡ Tick synchronized
across processors.➡ Contention at
quantum boundary!
Staggered➡ Ticks spread out
across quantum.➡ Reduced bus and
lock contention.➡ Additional latency.
P1
P2
5 10 150
P1
P2
5 10 150
release completion
Tuesday, April 5, 2011
![Page 84: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/84.jpg)
On the Implementation of Global Real-Time Schedulers
Quantum Alignment
19
Aligned➡ Tick synchronized
across processors.➡ Contention at
quantum boundary!
Staggered➡ Ticks spread out
across quantum.➡ Reduced bus and
lock contention.➡ Additional latency.
P1
P2T x
1T y2
T z3 T y
2
delay partially-used quantum
5 10 150
P1
P2
5 10 150
release completion
Tuesday, April 5, 2011
![Page 85: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/85.jpg)
On the Implementation of Global Real-Time Schedulers
Quantum Alignment
20
Aligned➡ Tick synchronized
across processors.➡ Contention at
quantum boundary!
Staggered➡ Ticks spread out
across quantum.➡ Reduced bus and
lock contention.➡ Additional latency.
P1
P2T x
1T y2
T z3 T y
2
delay partially-used quantum
5 10 150
P1
P2T x
1T y2
T z3 T y
2
staggering delays
5 10 150
release completion
Tuesday, April 5, 2011
![Page 86: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/86.jpg)
On the Implementation of Global Real-Time Schedulers
Quantum Alignment
21
Aligned➡ Tick synchronized
across processors.➡ Contention at
quantum boundary!
Staggered➡ Ticks spread out
across quantum.➡ Reduced bus and
lock contention.➡ Additional latency.
P1
P2T x
1T y2
T z3 T y
2
staggering delays
5 10 150
P1
P2T x
1T y2
T z3 T y
2
delay partially-used quantum
5 10 150release completion
Tuesday, April 5, 2011
![Page 87: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/87.jpg)
On the Implementation of Global Real-Time Schedulers
Interrupt Handling
22Tuesday, April 5, 2011
![Page 88: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/88.jpg)
On the Implementation of Global Real-Time Schedulers
Interrupt Handling
23
Global interrupt handling.➡ Job releases triggered by interrupts.➡ Interrupts may fire on any processor.➡ Jobs may execute on any processor.➡ Thus, in the worst case, a job may be
delayed by each interrupt.
Mai
n M
emor
y
L2 Cache
L1 Cache
Core 1
Thread1
Thread2
Thread3
Thread4
L1 Cache
Core 2
Thread1
Thread2
Thread3
Thread4
L1 Cache
Core 3
Thread1
Thread2
Thread3
Thread4
L1 Cache
Core 4
Thread1
Thread2
Thread3
Thread4
L1 Cache
Core 6
Thread1
Thread2
Thread3
Thread4
L1 Cache
Core 5
Thread1
Thread2
Thread3
Thread4
L1 Cache
Core 7
Thread1
Thread2
Thread3
Thread4
L1 Cache
Core 8
Thread1
Thread2
Thread3
Thread4
Tuesday, April 5, 2011
![Page 89: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/89.jpg)
On the Implementation of Global Real-Time Schedulers
Interrupt Handling
24
Global interrupt handling.➡ Job releases triggered by interrupts.➡ Interrupts may fire on any processor.➡ Jobs may execute on any processor.➡ Thus, in the worst case, a job may be
delayed by each interrupt.
Mai
n M
emor
y
L2 Cache
L1 Cache
Core 1
Thread1
Thread2
Thread3
Thread4
L1 Cache
Core 2
Thread1
Thread2
Thread3
Thread4
L1 Cache
Core 3
Thread1
Thread2
Thread3
Thread4
L1 Cache
Core 4
Thread1
Thread2
Thread3
Thread4
L1 Cache
Core 6
Thread1
Thread2
Thread3
Thread4
L1 Cache
Core 5
Thread1
Thread2
Thread3
Thread4
L1 Cache
Core 7
Thread1
Thread2
Thread3
Thread4
L1 Cache
Core 8
Thread1
Thread2
Thread3
Thread4
Mai
n M
emor
y
L2 Cache
L1 Cache
Core 1
Thread1
Thread2
Thread3
Thread4
L1 Cache
Core 2
Thread1
Thread2
Thread3
Thread4
L1 Cache
Core 3
Thread1
Thread2
Thread3
Thread4
L1 Cache
Core 4
Thread1
Thread2
Thread3
Thread4
L1 Cache
Core 6
Thread1
Thread2
Thread3
Thread4
L1 Cache
Core 5
Thread1
Thread2
Thread3
Thread4
L1 Cache
Core 7
Thread1
Thread2
Thread3
Thread4
L1 Cache
Core 8
Thread1
Thread2
Thread3
Thread4
Dedicated interrupt handling.➡ Only one processor services interrupts.➡ Jobs may execute on other processors.➡ Jobs are not delayed by release interrupts.➡ Well-known technique; used in the Spring
kernel (Stankovic and Ramamritham, 1991).➡ How does it affect schedulability?
J.A. Stankovic and K. Ramamritham (1991), The Spring kernel: A new paradigm for real-time systems. IEEE Software, 8(3):62–72.
Tuesday, April 5, 2011
![Page 90: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/90.jpg)
On the Implementation of Global Real-Time Schedulers
Ready Queue
25
Main Memory
L2 Cache
L2 Cache
Core 1 Core 2 Core 4Core 3
Q1
T a1
T b2
T c3
T d4
Tuesday, April 5, 2011
![Page 91: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/91.jpg)
On the Implementation of Global Real-Time Schedulers
Main Memory
L2 Cache
L2 Cache
Core 1 Core 2 Core 4Core 3
Q1
T a1
T b2
T c3
T d4
Ready Queue
26
Globally-shared priority queue.➡ Problem: hyper-period boundaries.➡ Problem: lock contention.➡ Problem: bus contention.
Tuesday, April 5, 2011
![Page 92: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/92.jpg)
On the Implementation of Global Real-Time Schedulers
Main Memory
L2 Cache
L2 Cache
Core 1 Core 2 Core 4Core 3
Q1
T a1
T b2
T c3
T d4
Ready Queue
27
Globally-shared priority queue.➡ Problem: hyper-period boundaries.➡ Problem: lock contention.➡ Problem: bus contention.
Requirements.➡ Mergeable priority queue: release n
jobs in O(log n) time.➡ Parallel enqueue / dequeue operations.➡ Mostly cache-local data structures.
Tuesday, April 5, 2011
![Page 93: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/93.jpg)
On the Implementation of Global Real-Time Schedulers
Ready Queue
28
Globally-shared priority queue.➡ Problem: hyper-period boundaries. ➡ Problem: lock contention.➡ Problem: bus contention.
P1 P2
…
P32
Coarse-Grained Heap Hierarchical Heaps Fine-Grained Heap
In this study, we consider three queue implementations.
Tuesday, April 5, 2011
![Page 94: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/94.jpg)
On the Implementation of Global Real-Time Schedulers
Ready Queue: Coarse-Grained Heap
29
Binomial heap + single lock.➡ Lock used to synchronize all G-EDF state.➡ Mergeable queue.➡ No parallel updates.➡ No cache-local updates.➡ Low locking overhead
(only single lock acquisition).
Tuesday, April 5, 2011
![Page 95: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/95.jpg)
On the Implementation of Global Real-Time Schedulers
Ready Queue: Hierarchical Heaps
30
P1 P2
…
P32
Per-processor queues + master queue.➡ Each queue protected by a lock.➡ Master queue holds min element of each per-
processor queue.➡ Global, sequential dequeue operations.➡ Mostly-local enqueue operations.
Tuesday, April 5, 2011
![Page 96: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/96.jpg)
On the Implementation of Global Real-Time Schedulers
Ready Queue: Hierarchical Heaps
31
P1 P2
…
P32
Per-processor queues + master queue.➡ Each queue protected by a lock.➡ Master queue holds min element of each per-
processor queue.➡ Global, sequential dequeue operations.➡ Mostly-local enqueue operations.
Locking.➡ Dequeue: top-down.➡ Enqueue: bottom-up.➡ Enqueue may have to
drop lock, retry.➡ Additional complexity
wrt. dequeue (see paper).➡ Bottom line: expensive.
Tuesday, April 5, 2011
![Page 97: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/97.jpg)
On the Implementation of Global Real-Time Schedulers
Ready Queue: Fine-Grained Heap
32
Parallel binary heap.➡ One lock per heap node.➡ Proposed by Hunt et al. (1996).➡ Not mergeable.➡ Parallel enqueue / dequeue.➡ No cache-local data.
Hunt et al. (1996), An efficient algorithm for concurrent priority queue heaps. Information Processing Letters, 60(3):151–157.
Tuesday, April 5, 2011
![Page 98: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/98.jpg)
On the Implementation of Global Real-Time Schedulers
Ready Queue: Fine-Grained Heap
33
Parallel binary heap.➡ One lock per heap node.➡ Proposed by Hunt et al. (1996).➡ Not mergeable.➡ Parallel enqueue / dequeue.➡ No cache-local data.
Locking.➡ Many lock acquisitions.➡ Atomic peek+dequeue
operation needed to check for preemptions.
Hunt et al. (1996), An efficient algorithm for concurrent priority queue heaps. Information Processing Letters, 60(3):151–157.
Tuesday, April 5, 2011
![Page 99: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/99.jpg)
On the Implementation of Global Real-Time Schedulers
Additional Components
34
Release queue.➡ Support mergeable queues.➡ Support dedicated interrupt handling.
Job-to-processor mapping.➡ Quickly determine whether preemption is required.➡ Avoid unnecessary preemptions.➡ Used to linearize concurrent scheduling decisions.
Tuesday, April 5, 2011
![Page 100: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/100.jpg)
On the Implementation of Global Real-Time Schedulers
Implementation in LITMUSRT
35Tuesday, April 5, 2011
![Page 101: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/101.jpg)
On the Implementation of Global Real-Time Schedulers
36
Linux Testbed for Multiprocessor Scheduling in Real-Time systems
Tuesday, April 5, 2011
![Page 102: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/102.jpg)
On the Implementation of Global Real-Time Schedulers
37
Linux Testbed for Multiprocessor Scheduling in Real-Time systems
UNC’s Linux patch.➡ Used in several previous studies.➡ On-going development.➡ Currently, based off of Linux 2.6.24.
Tuesday, April 5, 2011
![Page 103: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/103.jpg)
On the Implementation of Global Real-Time Schedulers
38
Linux Testbed for Multiprocessor Scheduling in Real-Time systems
UNC’s Linux patch.➡ Used in several previous studies.➡ On-going development.➡ Currently, based off of Linux 2.6.24.
Scheduler Plugin API.➡ scheduler_tick()➡ schedule()➡ release_jobs()
Tuesday, April 5, 2011
![Page 104: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/104.jpg)
On the Implementation of Global Real-Time Schedulers
Considered G-EDF Variants
39
Name Ready Q Scheduling InterruptsCEm coarse-grained event-driven global
CQm coarse-grained quantum (aligned) global
FEm fine-grained event-driven global
HEm hierarchical event-driven global
S-CQm coarse-grained quantum (staggered) global
CE1 coarse-grained event-driven dedicated
FE1 fine-grained event-driven dedicated
CQ1 coarse-grained quantum (aligned) dedicated
S-CQ1 coarse-grained quantum (staggered) dedicated
Tuesday, April 5, 2011
![Page 105: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/105.jpg)
On the Implementation of Global Real-Time Schedulers
Considered G-EDF Variants
40
Name Ready Q Scheduling InterruptsCEm coarse-grained event-driven global
CQm coarse-grained quantum (aligned) global
S-CQm coarse-grained quantum (staggered) global
HEm hierarchical event-driven global
FEm fine-grained event-driven global
CE1 coarse-grained event-driven dedicated
FE1 fine-grained event-driven dedicated
CQ1 coarse-grained quantum (aligned) dedicated
S-CQ1 coarse-grained quantum (staggered) dedicated
Tuesday, April 5, 2011
![Page 106: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/106.jpg)
On the Implementation of Global Real-Time Schedulers
Considered G-EDF Variants
41
Name Ready Q Scheduling InterruptsCEm coarse-grained event-driven global
CQm coarse-grained quantum (aligned) global
S-CQm coarse-grained quantum (staggered) global
HEm hierarchical event-driven global
FEm fine-grained event-driven global
CE1 coarse-grained event-driven dedicated
FE1 fine-grained event-driven dedicated
CQ1 coarse-grained quantum (aligned) dedicated
S-CQ1 coarse-grained quantum (staggered) dedicated
Baseline from(Brandenburg et al., 2008)
Tuesday, April 5, 2011
![Page 107: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/107.jpg)
On the Implementation of Global Real-Time Schedulers
Considered G-EDF Variants
42
Name Ready Q Scheduling InterruptsCEm coarse-grained event-driven global
CQm coarse-grained quantum (aligned) global
S-CQm coarse-grained quantum (staggered) global
HEm hierarchical event-driven global
FEm fine-grained event-driven global
CE1 coarse-grained event-driven dedicated
FE1 fine-grained event-driven dedicated
CQ1 coarse-grained quantum (aligned) dedicated
S-CQ1 coarse-grained quantum (staggered) dedicated
No fine-grained heaps + quantum-driven scheduling.(Parallel updates not beneficial due to quantum barrier.)
Tuesday, April 5, 2011
![Page 108: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/108.jpg)
On the Implementation of Global Real-Time Schedulers
Considered G-EDF Variants
43
Name Ready Q Scheduling InterruptsCEm coarse-grained event-driven global
CQm coarse-grained quantum (aligned) global
S-CQm coarse-grained quantum (staggered) global
HEm hierarchical event-driven global
FEm fine-grained event-driven global
CE1 coarse-grained event-driven dedicated
CQ1 coarse-grained quantum (aligned) dedicated
S-CQ1 coarse-grained quantum (staggered) dedicated
FE1 fine-grained event-driven dedicated
Tuesday, April 5, 2011
![Page 109: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/109.jpg)
On the Implementation of Global Real-Time Schedulers
Considered G-EDF Variants
44
Name Ready Q Scheduling InterruptsCEm coarse-grained event-driven global
CQm coarse-grained quantum (aligned) global
S-CQm coarse-grained quantum (staggered) global
HEm hierarchical event-driven global
FEm fine-grained event-driven global
CE1 coarse-grained event-driven dedicated
CQ1 coarse-grained quantum (aligned) dedicated
S-CQ1 coarse-grained quantum (staggered) dedicated
FE1 fine-grained event-driven dedicated
No hierarchical heaps + dedicated interrupt handling.(Hierarchical heaps not beneficial if only one proc. enqueues.)
Tuesday, April 5, 2011
![Page 110: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/110.jpg)
On the Implementation of Global Real-Time Schedulers
Schedulability Study
45Tuesday, April 5, 2011
![Page 111: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/111.jpg)
On the Implementation of Global Real-Time Schedulers
Objective
46
Compare the discussed implementationsin terms of the ratio of randomly-generated task sets
that can be shown to be schedulableunder consideration of system overheads.
Tuesday, April 5, 2011
![Page 112: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/112.jpg)
On the Implementation of Global Real-Time Schedulers
Scheduling Overheads
47Tuesday, April 5, 2011
![Page 113: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/113.jpg)
On the Implementation of Global Real-Time Schedulers
Scheduling Overheads
48
Release overhead.➡ The cost of a one-shot timer interrupt.
Scheduling overhead.➡ Selecting the next job to run.
Context switch overhead.➡ Changing address space.
P1
P2T x
1T y2
T z3 T y
2
context switchrelease schedule
T z3csr r
cscsr
cs
cs
cs
5 10 150
release completion
Tuesday, April 5, 2011
![Page 114: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/114.jpg)
On the Implementation of Global Real-Time Schedulers
Scheduling Overheads
49
Release overhead.➡ The cost of a one-shot timer interrupt.
Scheduling overhead.➡ Selecting the next job to run.
Tick overhead.➡ Cost of a periodic timer interrupt.➡ Beginning of a new quantum.
Context switch overhead.➡ Changing address space.
Preemption and migration overhead.➡ Loss of cache affinity.➡ Known from (Brandenburg et al., 2008).
P1
P2T x
1T y2
T z3 T y
2
context switchrelease schedule
T z3csr r
cscsr
cs
cs
cs
5 10 150
release completion
Tuesday, April 5, 2011
![Page 115: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/115.jpg)
On the Implementation of Global Real-Time Schedulers
IPI Latency
50
P1
P2 T x1T y
2
T z3 T y
2
IPI latency
5 10 150
Inter-processor interrupts (IPIs).➡ Interrupt may be processed by a processor different from the one
that will schedule a newly-arrived job.➡ Requires notification of remote processor.➡ Event-based scheduling incurs added latency.
release completion
Tuesday, April 5, 2011
![Page 116: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/116.jpg)
On the Implementation of Global Real-Time Schedulers
Test Platform
51
LITMUSRT
➡ UNC’s Linux-based Real-Time Testbed
Sun UltraSPARC T1 “Niagara”➡ 8 cores, 4 HW threads per core = 32 logical processors.➡ 3 MB shared L2 cache
on
— SUN UltraSPARC T1 “Niagara”
Tuesday, April 5, 2011
![Page 117: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/117.jpg)
On the Implementation of Global Real-Time Schedulers
Test Platform
52
LITMUSRT
➡ UNC’s Linux-based Real-Time Testbed
Sun UltraSPARC T1 “Niagara”➡ 8 cores, 4 HW threads per core = 32 logical processors.➡ 3 MB shared L2 cache
on
— SUN UltraSPARC T1 “Niagara”
0
100
200
300
400
500
600
700
PFAIR S-PFAIR G-EDF C-EDF P-EDF
Overheads➡ Traced overheads under each of the plugins.➡ Collected more than 640,000,000 samples (total).➡ Computed worst-case and average-case overheads.➡ Over 20 graphs; see online version.
Outliers➡ Removed top 1% of samples to discard outliers.
Tuesday, April 5, 2011
![Page 118: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/118.jpg)
On the Implementation of Global Real-Time Schedulers
Example: Tick Overhead
53
0
50
100
150
200
250
300
50 100 150 200 250 300 350 400 450
over
head
(us)
number of tasks
worst-case tick overhead
CEm tick overhead (worst-case)CE1 tick overhead (worst-case)FEm tick overhead (worst-case)FE1 tick overhead (worst-case)
CQm tick overhead (worst-case)CQ1 tick overhead (worst-case)HEm tick overhead (worst-case)“Higher is worse.”
number of tasks
mic
rose
cond
s
Tuesday, April 5, 2011
![Page 119: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/119.jpg)
On the Implementation of Global Real-Time Schedulers
Example: Tick Overhead
54
0
50
100
150
200
250
300
50 100 150 200 250 300 350 400 450
over
head
(us)
number of tasks
worst-case tick overhead
CEm tick overhead (worst-case)CE1 tick overhead (worst-case)FEm tick overhead (worst-case)FE1 tick overhead (worst-case)
CQm tick overhead (worst-case)CQ1 tick overhead (worst-case)HEm tick overhead (worst-case)
Event-Driven
Quantum-Driven
Tuesday, April 5, 2011
![Page 120: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/120.jpg)
On the Implementation of Global Real-Time Schedulers
0
50
100
150
200
250
300
350
400
50 100 150 200 250 300 350 400 450
over
head
(us)
number of tasks
worst-case release overhead
CEm release overhead (worst-case)CE1 release overhead (worst-case)FEm release overhead (worst-case)FE1 release overhead (worst-case)
CQm release overhead (worst-case)CQ1 release overhead (worst-case)HEm release overhead (worst-case)
Example: Release Overhead
55
Quantum-Driven
Event-Driven
Tuesday, April 5, 2011
![Page 121: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/121.jpg)
On the Implementation of Global Real-Time Schedulers
Study Setup
56
Methodology.➡ Randomly generate task set.➡ Apply overheads (for each G-EDF implementation).➡ Test whether task set can be claimed schedulable (for
each G-EDF implementation). 0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32
ratio
of s
ched
ulab
le ta
sk s
ets
[har
d]
task set utilization cap (prior to inflation)
utilization uniformly in [0.1, 0.4]; period uniformly in [10, 100]
G-EDF CEm CE1
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32
ratio
of s
ched
ulab
le ta
sk s
ets
[har
d]
task set utilization cap (prior to inflation)
utilization uniformly in [0.001, 0.1]; period uniformly in [10, 100]
G-EDF CQ1 S-CQ1
Tuesday, April 5, 2011
![Page 122: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/122.jpg)
On the Implementation of Global Real-Time Schedulers
Study Setup
57
Methodology.➡ Randomly generate task set.➡ Apply overheads (for each G-EDF implementation).➡ Test whether task set can be claimed schedulable (for
each G-EDF implementation). 0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32
ratio
of s
ched
ulab
le ta
sk s
ets
[har
d]
task set utilization cap (prior to inflation)
utilization uniformly in [0.1, 0.4]; period uniformly in [10, 100]
G-EDF CEm CE1
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32
ratio
of s
ched
ulab
le ta
sk s
ets
[har
d]
task set utilization cap (prior to inflation)
utilization uniformly in [0.001, 0.1]; period uniformly in [10, 100]
G-EDF CQ1 S-CQ1
Schedulability.➡ Hard real-time: worst-case overheads, no tardiness.➡ Soft real-time: average-case overheads, bounded
tardiness. 0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32
ratio
of s
ched
ulab
le ta
sk s
ets
[har
d]
task set utilization cap (prior to inflation)
utilization uniformly in [0.5, 0.9]; period uniformly in [10, 100]
G-EDF CE1 FE1
Tuesday, April 5, 2011
![Page 123: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/123.jpg)
On the Implementation of Global Real-Time Schedulers
Study Setup
58
Methodology.➡ Randomly generate task set.➡ Apply overheads (for each G-EDF implementation).➡ Test whether task set can be claimed schedulable (for
each G-EDF implementation). 0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32
ratio
of s
ched
ulab
le ta
sk s
ets
[har
d]
task set utilization cap (prior to inflation)
utilization uniformly in [0.1, 0.4]; period uniformly in [10, 100]
G-EDF CEm CE1
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32
ratio
of s
ched
ulab
le ta
sk s
ets
[har
d]
task set utilization cap (prior to inflation)
utilization uniformly in [0.001, 0.1]; period uniformly in [10, 100]
G-EDF CQ1 S-CQ1
Task set generation.➡ Six utilization distributions (uniform and bimodal).➡ Three period distributions (uniform).➡ Over 300 graphs; see online version.
Schedulability.➡ Hard real-time: worst-case overheads, no tardiness.➡ Soft real-time: average-case overheads, bounded
tardiness. 0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32
ratio
of s
ched
ulab
le ta
sk s
ets
[har
d]
task set utilization cap (prior to inflation)
utilization uniformly in [0.5, 0.9]; period uniformly in [10, 100]
G-EDF CE1 FE1
Tuesday, April 5, 2011
![Page 124: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/124.jpg)
On the Implementation of Global Real-Time Schedulers
Results
59
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32
ratio
of s
ched
ulab
le ta
sk s
ets
[har
d]
task set utilization cap (prior to inflation)
utilization uniformly in [0.1, 0.4]; period uniformly in [10, 100]
G-EDF CEm CE1
increasing utilizationsche
dula
ble
task
set
s
“Higher is better.”
Tuesday, April 5, 2011
![Page 125: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/125.jpg)
On the Implementation of Global Real-Time Schedulers
Interrupt Handling
60
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32
ratio
of s
ched
ulab
le ta
sk s
ets
[har
d]
task set utilization cap (prior to inflation)
utilization uniformly in [0.1, 0.4]; period uniformly in [10, 100]
G-EDF CEm CE1
Dedicated
Zero Overh.Global
Dedicated interrupt handlingwas generally preferable (or no worse).
Tuesday, April 5, 2011
![Page 126: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/126.jpg)
On the Implementation of Global Real-Time Schedulers
Quantum Staggering
61
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32
ratio
of s
ched
ulab
le ta
sk s
ets
[har
d]
task set utilization cap (prior to inflation)
utilization uniformly in [0.001, 0.1]; period uniformly in [10, 100]
G-EDF CQ1 S-CQ1
Staggered
Zero OverheadsAligned
Staggered quantawere generally preferable (or no worse).
Tuesday, April 5, 2011
![Page 127: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/127.jpg)
On the Implementation of Global Real-Time Schedulers
Quantum- vs. Event-Driven
62
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32
ratio
of s
ched
ulab
le ta
sk s
ets
[har
d]
task set utilization cap (prior to inflation)
utilization uniformly in [0.1, 0.4]; period uniformly in [10, 100]
G-EDF FE1 S-CQ1
QuantumEvent
Event-driven scheduling was preferable in most cases.
Zero Overh.
Tuesday, April 5, 2011
![Page 128: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/128.jpg)
On the Implementation of Global Real-Time Schedulers
Choice of Ready Queue (1)
63
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32
ratio
of s
ched
ulab
le ta
sk s
ets
[har
d]
task set utilization cap (prior to inflation)
utilization uniformly in [0.1, 0.4]; period uniformly in [10, 100]
G-EDF CEm HEm
HierarchicalCoarse-Grained
Zero Overh.
The coarse-grained ready queueperformed better than the hierarchical queue.
Tuesday, April 5, 2011
![Page 129: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/129.jpg)
On the Implementation of Global Real-Time Schedulers
Choice of Ready Queue (II)
64
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32
ratio
of s
ched
ulab
le ta
sk s
ets
[har
d]
task set utilization cap (prior to inflation)
utilization uniformly in [0.5, 0.9]; period uniformly in [10, 100]
G-EDF CE1 FE1
Zero O.Coarse-Grained
Fine-Grained
The fine-grained ready queueperformed marginally better than the coarse-grained queue if used together with dedicated interrupt handling.
Tuesday, April 5, 2011
![Page 130: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/130.jpg)
On the Implementation of Global Real-Time Schedulers
Conclusion
65Tuesday, April 5, 2011
![Page 131: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/131.jpg)
On the Implementation of Global Real-Time Schedulers
Summary of Results
66
Implementation choices can impact schedulability as much as
scheduling-theoretic tradeoffs.
Unless task counts are very highor periods very short,
G-EDF can scale to 32 processors.
Tuesday, April 5, 2011
![Page 132: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/132.jpg)
On the Implementation of Global Real-Time Schedulers
Recommendation
67
Mai
n M
emor
y
L2 Cache
L1 Cache
Core 1
Thread1
Thread2
Thread3
Thread4
L1 Cache
Core 2
Thread1
Thread2
Thread3
Thread4
L1 Cache
Core 3
Thread1
Thread2
Thread3
Thread4
L1 Cache
Core 4
Thread1
Thread2
Thread3
Thread4
L1 Cache
Core 6
Thread1
Thread2
Thread3
Thread4
L1 Cache
Core 5
Thread1
Thread2
Thread3
Thread4
L1 Cache
Core 7
Thread1
Thread2
Thread3
Thread4
L1 Cache
Core 8
Thread1
Thread2
Thread3
Thread4
Best results obtained with combination of:
fine-grained heapevent-driven scheduling
dedicated interrupt handling
Tuesday, April 5, 2011
![Page 133: Implementation Study - Courses (please see the UBC course ...courses.ece.ubc.ca/494/files/MP-ImplementationStudy.pdfWorst-Case Overheads (in µs) Alg Overall Preemption Intra-Cluster](https://reader036.fdocuments.in/reader036/viewer/2022071606/61439ef36b2ee0265c0228e3/html5/thumbnails/133.jpg)
On the Implementation of Global Real-Time Schedulers
Future Work
68
Platform.➡ Repeat study on embedded hardware platform.
Implementation.➡ Simplify locking requirements.➡ Parallel mergeable heaps?
Analysis.➡ Less pessimistic hard real-time G-EDF schedulability tests.➡ Less pessimistic interrupt accounting.
Tuesday, April 5, 2011