Cache-Related Preemption and Migration Delays: Empirical Approximation and Impact on Schedulability
OSPERT 2010, Brussels
July 6, 2010
Andrea Bastoni, Björn B. Brandenburg, James H. Anderson
University of Rome “Tor Vergata”; The University of North Carolina at Chapel Hill
Work supported by AT&T, IBM, and Sun Corps.; NSF grants CNS 0834270, CNS 0834132, and CNS 0615197; ARO grant W911NF-09-1-0535; and AFOSR grant FA 9550-09-1-0549.
The Problem
[Figure: schedule of tasks T1, T2, and T3 on processors P1 and P2 over t = 0 to 20, marking releases, deadlines, and overheads. T2 migrates to P1.]
Cache-Related Preemption and Migration Delays Bastoni, Brandenburg, and Anderson 2
Multiprocessor Global EDF schedule with overheads.
The Problem
Kernel overheads (e.g., release overhead, scheduling overhead, etc.) are “easy” to measure.
Can directly measure delay (code instrumentation).
[Figure: T2 on P1 and P2 over t = 0 to 20.]
The Problem
Overheads due to preemption / migrations are not!
Delay does not occur consecutively, but in many “little pieces”.
[Figure: T2 on P1 and P2 over t = 0 to 20.]
Outline
Cache-related preemption and migration delays (CPMD).
Two methods to measure CPMD.
Experimental results and discussion.
Impact on schedulability (sketch).
Cache-Related Preemption and Migration Delays (CPMD)
Cache-related preemption and migration delays:
• Delays due to additional cache misses when resuming execution after a preemption or a migration.
[Figure: T2 resumes. Data in local cache: (re)fetch. Data in remote cache: invalidate.]
Heavily dependent on working set size (WSS).
No effective WCET analysis techniques available for current multiprocessors with cache hierarchies.
Need to rely on empirical measurements.
Detecting CPMD
In this study: empirical approximation of CPMD.
Complete access to J's working set (WS), cache-cold.
Complete access to J's working set (WS), cache-warm.
Complete access to J's working set (WS), after preemption.
[Figure: the three accesses of job J on a timeline, with the CPMD marked.]
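The three working-set accesses suggest a simple estimate. This is a minimal sketch, assuming (the slide does not spell this out) that the delay is taken as the post-preemption access time minus the cache-warm access time. The function name and the sample numbers are hypothetical.

```python
def estimate_cpmd(warm_access: float, post_pm_access: float) -> float:
    """Estimate CPMD as the extra time the access after a preemption or
    migration takes compared with the cache-warm access (assumed definition)."""
    return post_pm_access - warm_access


# Hypothetical access times in microseconds for one job.
cold, warm, after = 950.0, 120.0, 610.0
print(estimate_cpmd(warm, after))  # extra delay attributed to lost cache affinity
```

The cache-cold time is not used in the estimate itself; under this assumption it only serves as a sanity check that the warm access really was faster.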
Measuring CPMD
Directly measure delay with low-overhead clock device.
Measure access times.
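A rough user-space sketch of the direct approach follows. It uses Python's `time.perf_counter_ns` as a stand-in for a low-overhead clock device, which is an assumption made for portability; interpreter overhead will dominate the numbers, so this only illustrates the structure of the measurement. The buffer size, stride, and helper name are hypothetical.

```python
import time


def timed_ws_access(ws: bytearray, stride: int = 64) -> int:
    """Read one byte per `stride` bytes of the working set and return the
    elapsed time in nanoseconds."""
    start = time.perf_counter_ns()
    checksum = 0
    for i in range(0, len(ws), stride):
        checksum += ws[i]  # read access; the value itself is irrelevant
    end = time.perf_counter_ns()
    return end - start


ws = bytearray(256 * 1024)      # hypothetical 256 KB working set
first_ns = timed_ws_access(ws)  # first pass over the buffer
second_ns = timed_ws_access(ws) # second pass, possibly served from cache
print(first_ns, second_ns)
```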
Measuring CPMD
Indirectly measure delay with hardware perf. counters.
Count cache misses.
Schedule-Sensitive Method
On-line recording of delays:
• Execute instrumented synthetic tasks under desired scheduling policy.
• Wide range of WSS, TSS, and read/write ratios.
Can reveal dependencies on:
• Scheduling policy.
• Task set size (TSS).
Cannot explicitly control preemptions/migrations (P/M):
• P/M depends on the scheduling policy.
• Not every job yields a valid measure.
• A job may be preempted too early, too late, or not at all.
Synthetic Method
Fine-grained control on measurement process:
• Artificially trigger preemptions and migrations.
• Explicit control on preemptions and different types of migrations (through L2 cache, L3 cache, and memory).
Fixed-prio scheduling policy (e.g., SCHED_FIFO):
• Single high-prio task accesses wide range of WSS.
• Wide range of read/write ratios.
Every job yields a valid measure.
Cannot detect dependencies on:
• Scheduling policy, TSS.
Implementation
Operating system:
LITMUS^RT, UNC’s real-time Linux extension.
Developed as kernel patch (currently based on Linux 2.6.32).
Code is available at http://www.cs.unc.edu/~anderson/litmus-rt/.
Implementation Issues
Low-overhead clock device.
• Time-stamp counter (TSC). Per-core clock device.
Clock skew among cores.
• WS access times only based on samples from the same processor.
Interrupt interference.
• Interrupts disabled during WS access.
How to detect when a preemption/migration occurred.
• Low-overhead kernel-user communication mechanism.
• Per-task memory page shared with the kernel.
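The clock-skew rule can be illustrated as follows. This is a minimal sketch, assuming each sample records the CPU on which the start and end timestamps were read; the `Sample` record and its field names are hypothetical and not taken from the actual implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    start_cpu: int  # CPU on which the start timestamp was read
    end_cpu: int    # CPU on which the end timestamp was read
    start_tsc: int  # timestamp before the WS access
    end_tsc: int    # timestamp after the WS access


def valid_access_times(samples: List[Sample]) -> List[int]:
    """Keep only access times whose two timestamps come from the same CPU,
    so that per-core clock skew cannot distort the difference."""
    return [s.end_tsc - s.start_tsc
            for s in samples
            if s.start_cpu == s.end_cpu]


samples = [
    Sample(0, 0, 1000, 1400),  # both reads on CPU 0: kept
    Sample(0, 1, 2000, 2300),  # reads on different CPUs: discarded
    Sample(2, 2, 500, 900),    # both reads on CPU 2: kept
]
print(valid_access_times(samples))
```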
Test Platform
Intel Xeon L7455 “Dunnington”:
• 4 physical sockets.
• 6 cores per socket.
• 12 MB L3 cache.
• 3 MB L2 cache.
• 32 KB + 32 KB L1 cache.
Study Setup
Schedule-sensitive method:
• G-EDF algorithm (but can be applied to other algos).
• TSS between 25 and 250, tasks randomly generated. Uniform distribution. Periodic tasks with utilizations in [0.001, 0.1] and periods in [10, 100] ms.
• WSS in the range from 4 KB to 2048 KB.
• Per-WSS write ratio 1/2 and 1/4.
Synthetic method:
• Single SCHED_FIFO task at the highest priority.
• WSS in the range from 4 KB to 12 MB.
• Per-WSS write ratio in the range from 0 to 1.
• Preemption length uniformly distributed in [0, 50] ms.
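Task sets for the schedule-sensitive method could be generated along these lines. This is a sketch only: the slide gives the utilization and period ranges, while the tuple layout, the use of continuous (non-integer) periods, and the seed are assumptions.

```python
import random
from typing import List, Tuple


def generate_task_set(tss: int, rng: random.Random) -> List[Tuple[float, float]]:
    """Return `tss` periodic tasks as (execution_cost_ms, period_ms) pairs,
    with utilization uniform in [0.001, 0.1] and period uniform in [10, 100] ms."""
    tasks = []
    for _ in range(tss):
        utilization = rng.uniform(0.001, 0.1)
        period = rng.uniform(10.0, 100.0)
        tasks.append((utilization * period, period))
    return tasks


rng = random.Random(42)                # fixed seed, chosen arbitrarily
task_set = generate_task_set(25, rng)  # 25 is the smallest TSS in the study
total_util = sum(e / p for e, p in task_set)
print(len(task_set), round(total_util, 3))
```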
Tested two configurations for each method:
• Idle system.
• Under load.
Results
Logarithmic scale (log-log plots).
Graphs display delays.
Results from synthetic method only.
Results from schedule-sensitive method revealed that CPMD does not depend on TSS (on our platform).
“Higher is worse”.
[Plot axes: increasing WSS (KB) on x, increasing delay (us) on y.]
Results (System under load)
Size of L2 cache impacts predictability.
Standard deviation.
Results (System under load)
Size of L2 cache impacts predictability.
Linear trend for WSS < L2 cache.
Results (System under load)
No substantial difference between preemption and migration costs.
Results (System under load)
Worst-case scenario: must reload cache under intense bus contention.
Applies unless no background activity and only small WSS for all tasks.
Results (Idle system)
Best-case scenario: virtually no contention for the memory sub-system.
[Plot curves: L3 and memory migrations; L2 migrations; preemptions.]
Impact on Schedulability
Standard schedulability plot:
[Plot: schedulable task sets (y) vs. increasing utilization (x); curves for C-EDF (L3), G-EDF, P-EDF, and C-EDF (L2). Soft real-time.]
Impact on Schedulability
Standard schedulability plot:
Fixed CPMD value (fixed WSS value).
[Plot: schedulable task sets (y) vs. increasing utilization (x).]
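One common way to account for a fixed CPMD value in a schedulability test is to inflate each task's execution cost by that value before testing. The sketch below assumes this inflation approach together with a simple soft real-time G-EDF condition (tardiness is bounded if no task utilization exceeds 1 and total utilization does not exceed the processor count). The slide does not state which tests were used, and all names and numbers here are hypothetical.

```python
from typing import List, Tuple


def soft_rt_gedf_schedulable(tasks: List[Tuple[float, float]],
                             m: int, cpmd_ms: float) -> bool:
    """tasks: (execution_cost_ms, period_ms) pairs; m: processor count.
    Inflate every execution cost by the CPMD, then check that no task
    exceeds utilization 1 and that total utilization is at most m."""
    utils = [(e + cpmd_ms) / p for e, p in tasks]
    return all(u <= 1.0 for u in utils) and sum(utils) <= m


tasks = [(2.0, 10.0), (5.0, 20.0), (10.0, 40.0), (30.0, 50.0)]
print(soft_rt_gedf_schedulable(tasks, m=2, cpmd_ms=0.0))
print(soft_rt_gedf_schedulable(tasks, m=2, cpmd_ms=5.0))
```

Sweeping the utilization of randomly generated task sets while holding `cpmd_ms` constant would yield one curve of a standard schedulability plot under these assumptions.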
Impact on Schedulability
Weighted schedulability plot:
Compact > 300 plots in ~10 plots.
[Plot: aggregate schedulability (y) vs. increasing CPMD (us) (x).]
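Weighted schedulability collapses a whole schedulability curve into a single number per CPMD value, which is what allows many plots to be compacted into a few. This is a minimal sketch, assuming the definition in which each utilization cap U weights the fraction S(U) of task sets schedulable at that cap, W = ΣU·S(U) / ΣU; the sample curve is made up.

```python
from typing import Dict


def weighted_schedulability(sched_by_util: Dict[float, float]) -> float:
    """sched_by_util maps a utilization cap U to the fraction of task sets
    found schedulable at that cap. Returns sum(U * S(U)) / sum(U)."""
    total_weight = sum(sched_by_util)  # iterating a dict sums its keys
    weighted_sum = sum(u * s for u, s in sched_by_util.items())
    return weighted_sum / total_weight


# Hypothetical schedulability curve for one fixed CPMD value.
curve = {2.0: 1.0, 4.0: 1.0, 6.0: 0.5, 8.0: 0.0}
print(weighted_schedulability(curve))
```

Because high-utilization task sets carry more weight, a scheduler that only handles lightly loaded systems scores low under this metric.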
Evaluation of Scheduling Algorithms
C-EDF (L3) > P-EDF for CPMD < 600 us.
Soft real-time.
[Plot: aggregate schedulability (y) vs. increasing CPMD (us) (x).]
Conclusions
Empirical approximation of CPMD:
• Schedule-sensitive method.
• Synthetic method.
CPMD strongly impacts evaluation of schedulers:
• Preemptions are not necessarily (much) cheaper than migrations (for worst-case overheads).
• If there is memory bus contention, then this is also true for average-case overheads.
Future Work
Validate TSC-based results with performance counters.
Apply the methodologies on NUMA and embedded platforms.
Investigate impact of bus locking on CPMD.
• For example: DMA transfers, atomic instructions, etc.