Timing Analysis of Concurrent Programs Running on Shared Cache Multi-Cores
Zheng Wu
Outline
• Background
• Motivation
• Analysis Framework
• Intra-Core Cache Analysis
• Cache Conflict Analysis
• Optimization Techniques
• WCRT Analysis
• Experiment Setup
• Experiment Results
• Contribution
• Conclusion
Background
• Hard Real-time Systems
– Worst-case execution time (WCET): essential input for schedulability analysis
• Static Program Analysis
– Program path modeling: infeasible path and loop bound detection
– Micro-architecture modeling: instruction/data cache, branch prediction, out-of-order pipeline
[Figure: distribution of execution times (x-axis: Execution Time), marking the observed WCET and the actual WCET.]
Background
• Concurrent Programs
– Task interaction: control/data dependency, preemption
– Resource contention: shared cache multi-core architectures
• Problem: shared L2 instruction cache contention
[Figure: Core 1 through Core n, each with a CPU and its own L1 cache, all connected to one shared L2 cache.]
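System Model
• Message Sequence Chart (MSC)
[Figure: MSC with a time axis and three processes (Process 1, Process 2, Process 3) mapped onto Core 1 and Core 2, containing tasks fm1, fm2, fm4, fr0, fr1, fs0. Legend: task, message communication.]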
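Problem: Shared L2 cache conflicts
• Different cores conflict in the shared L2 cache.
[Figure: the MSC tasks (fm1, fm2, fm4, fr0, fr1, fs0) of Process 1, Process 2, and Process 3 on Core 1 and Core 2, next to an L2 cache with Sets 0 to 3. Memory blocks m and m' of concurrent tasks map to the same L2 set.]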
Related Work
• J. Yan and W. Zhang, RTAS 2008
– T and T' are tasks from different cores that conflict for cache set C.
– In the worst case, all accesses from T and T' to C are cache misses.
[Figure: blocks m1, …, mk from Core 1 and m'1, …, m'k from Core 2 map to Set C of the L2 cache (between Set C - 1 and Set C + 1); both groups are marked as misses. The MSC tasks fm1, fm2, fm4, fr0, fr1, fs0 of Process 1, Process 2, and Process 3 are shown alongside.]
Motivation
• Task Execution Lifetime
[Figure: two scenarios, each showing Task 1 on Core 1 and Task 2 on Core 2 with their start and end times. Scenario 1: overlapping lifetimes, conflicts. Scenario 2: disjoint lifetimes, no conflicts.]
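The two scenarios on this slide amount to an interval-overlap test. The sketch below is illustrative only: the `Lifetime` type and `may_interfere` function are hypothetical names, and it assumes a lifetime is the half-open interval from a task's earliest start to its latest finish.

```python
from typing import NamedTuple


class Lifetime(NamedTuple):
    """Hypothetical record: conservative bounds on when a task can be active."""
    earliest_start: int
    latest_finish: int


def may_interfere(a: Lifetime, b: Lifetime) -> bool:
    """Tasks on different cores can conflict in L2 only if their lifetimes overlap."""
    return a.earliest_start < b.latest_finish and b.earliest_start < a.latest_finish


# Scenario 1: overlapping lifetimes, so conflicts are possible.
scenario1 = may_interfere(Lifetime(0, 50), Lifetime(30, 80))
# Scenario 2: disjoint lifetimes, so no conflicts.
scenario2 = may_interfere(Lifetime(0, 50), Lifetime(60, 90))
```

Only task pairs for which the test returns true need to be considered by the shared-cache conflict analysis.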
Analysis Framework
[Figure: flowchart. Intra-core cache analysis runs separately for each core (Core 1 … Core n). Its results, together with the initial task interference, feed the L2 cache conflict analysis, which is followed by WCRT analysis. If the task interference changes ("yes"), the modified task interference is fed back into the L2 cache conflict analysis; if not ("no"), the estimated WCRT is reported.]
Intra-core Cache Analysis
• Must Analysis: Always Hit (AH)
– Memory blocks guaranteed to be present in the cache
• May Analysis: Always Miss (AM)
– Memory blocks guaranteed to be absent from the cache (not among the blocks that may be present)
• Persistence Analysis
– Memory blocks never evicted from the cache after the first iteration
• Others: Non Classified (NC)
H. Theiling, C. Ferdinand, and R. Wilhelm. Fast and precise WCET prediction by separated cache and path analyses. Real-Time Systems, 2000.
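To make the must analysis concrete, here is a minimal sketch for a single LRU cache set, written in the usual abstract-interpretation style. The function names are hypothetical, and representing the state as a map from block to an upper bound on its age (1 = youngest) is an assumption of this sketch, not something taken from the slides.

```python
from typing import Dict


def must_update(state: Dict[str, int], block: str, assoc: int) -> Dict[str, int]:
    """Abstract LRU update: state maps each block to an upper bound on its age."""
    old_age = state.get(block, assoc + 1)  # blocks not in the state count as "too old"
    new_state = {block: 1}                 # the accessed block becomes the youngest
    for other, age in state.items():
        if other == block:
            continue
        # Only blocks younger than the accessed block grow older by one.
        new_age = age + 1 if age < old_age else age
        if new_age <= assoc:               # older than the associativity: evicted
            new_state[other] = new_age
    return new_state


def must_join(a: Dict[str, int], b: Dict[str, int]) -> Dict[str, int]:
    """At a control-flow merge keep blocks present on both paths, with the larger age."""
    return {m: max(a[m], b[m]) for m in a.keys() & b.keys()}


def is_always_hit(state: Dict[str, int], block: str) -> bool:
    """An access is AH if the block is in the must state just before the access."""
    return block in state


state: Dict[str, int] = {}
for accessed in ["m0", "m1", "m0", "m2"]:
    state = must_update(state, accessed, assoc=2)
```

In this 2-way example the final access to m2 pushes m1 past the associativity, so only m2 and m0 remain guaranteed to be cached.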
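Intra-core Cache Analysis
• L1 Intra-core Cache Analysis
– Always Hit (AH), Always Miss (AM), Non Classified (NC)
• L2 Intra-core Cache Analysis
[Figure: L1 cache analysis classifies each reference as AH, AM, or NC. That result determines whether the reference accesses the L2 cache ("access") or not ("not access"), and L2 cache analysis then classifies the accesses as AH, AM, or NC.]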
L2 Cache Conflict Analysis
• Initial task interference graph
• Analyze each cache set individually
• Intra-core L2 analysis
– Always miss
– Non classified
– Always hit
[Figure: L2 cache with Set i and Set j. Task T maps blocks m0, m1 and m2, m3 to these sets; conflicting tasks map m'0, m'1 to one of them. Blocks of T in the set shared with conflicting tasks become Non classified; blocks in the other set stay Always hit.]
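The per-set rule on this slide can be sketched as follows. This is an illustration under assumptions: the function name and data layout are hypothetical, the placement of m0, m1 in Set i and m2, m3 in Set j is read off the figure, and the sketch covers only the basic rule (no age information).

```python
from typing import Dict, Set


def apply_l2_conflicts(
    classification: Dict[str, str],  # intra-core L2 result per block: "AH", "AM", "NC"
    cache_set: Dict[str, str],       # L2 set each block maps to
    conflicting_sets: Set[str],      # sets also used by interfering tasks on other cores
) -> Dict[str, str]:
    """Downgrade AH to NC in every set that an interfering task may touch."""
    updated = {}
    for block, label in classification.items():
        if label == "AH" and cache_set[block] in conflicting_sets:
            updated[block] = "NC"    # the hit is no longer guaranteed
        else:
            updated[block] = label   # AM and NC are unaffected by conflicts
    return updated


intra_core = {"m0": "AH", "m1": "AH", "m2": "AH", "m3": "AH"}
mapping = {"m0": "i", "m1": "i", "m2": "j", "m3": "j"}
result = apply_l2_conflicts(intra_core, mapping, conflicting_sets={"i"})
```

Here the conflicting blocks m'0 and m'1 are assumed to map to Set i, so only m0 and m1 lose their Always hit guarantee.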
Optimization for Set Associativity
• Consider memory block age: LRU replacement
– age(m): upper bound on the age of m
[Figure: a cache set with age positions 1 to 4 holding blocks m0, m1, m2 of task T, all Always hit after intra-core analysis. Conflicting tasks map 2 memory blocks, m'0 and m'1, to the same set. Without optimization: m0, m1, m2 all become Non classified. With optimization: m0 and m1 stay Always hit, and m2 becomes Non classified.]
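One way to read the optimization is that a block stays Always hit as long as its age bound plus the number of conflicting blocks still fits within the associativity. The sketch below encodes that reading. The function name is hypothetical, and the ages 1, 2, 3 for m0, m1, m2 are an assumption chosen to match the figure's outcome.

```python
from typing import Dict


def refine_with_age(ages: Dict[str, int], conflicting_blocks: int, assoc: int) -> Dict[str, str]:
    """Keep AH for blocks that survive even if every conflicting block ages them by one."""
    return {
        block: "AH" if age + conflicting_blocks <= assoc else "NC"
        for block, age in ages.items()
    }


# Assumed ages of the task's blocks in a 4-way set, with 2 conflicting blocks (m'0, m'1).
refined = refine_with_age({"m0": 1, "m1": 2, "m2": 3}, conflicting_blocks=2, assoc=4)
```

With these numbers m0 and m1 can age to at most 3 and 4 and remain cached, while m2 could reach age 5 and is therefore downgraded, which matches the "with optimization" column of the figure.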
BCET and WCET Analysis
• Access latency for best case and worst case

L1 cache   L2 cache   Best-Case   Worst-Case
AH         ---        L1 hit      L1 hit
AM         AH         L2 hit      L2 hit
AM         AM         L2 miss     L2 miss
AM         NC         L2 hit      L2 miss
NC         AH         L1 hit      L2 hit
NC         AM         L1 hit      L2 miss
NC         NC         L1 hit      L2 miss

– Assumption: no timing anomalies with other architecture features
• BCET (WCET): shortest (longest) path
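The table above maps directly onto a lookup. The sketch below is a hedged illustration: the names are hypothetical, the cycle counts are the ones listed under the experiment parameters (1, 10, 100), and treating each figure as the full cost of an access served at that level is an assumption.

```python
from typing import Optional, Tuple

# Cycle counts from the experiment parameters; assumed to be the total cost
# of an access served at that level (not cumulative across levels).
LATENCY = {"L1 hit": 1, "L2 hit": 10, "L2 miss": 100}

# (L1 classification, L2 classification) -> (best-case outcome, worst-case outcome)
OUTCOME = {
    ("AH", None): ("L1 hit", "L1 hit"),
    ("AM", "AH"): ("L2 hit", "L2 hit"),
    ("AM", "AM"): ("L2 miss", "L2 miss"),
    ("AM", "NC"): ("L2 hit", "L2 miss"),
    ("NC", "AH"): ("L1 hit", "L2 hit"),
    ("NC", "AM"): ("L1 hit", "L2 miss"),
    ("NC", "NC"): ("L1 hit", "L2 miss"),
}


def access_latency(l1: str, l2: Optional[str]) -> Tuple[int, int]:
    """Return (best-case, worst-case) latency in cycles for one memory access."""
    # An L1 always-hit never reaches L2, so its L2 classification is ignored.
    key = (l1, None) if l1 == "AH" else (l1, l2)
    best, worst = OUTCOME[key]
    return LATENCY[best], LATENCY[worst]
```

For example, `access_latency("AM", "NC")` yields a best case of 10 cycles and a worst case of 100 cycles.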
WCRT Analysis
• Compute earliest and latest ready and finish times
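A heavily simplified sketch of this computation is shown below. It assumes tasks form a dependency graph given in topological order, and it ignores preemption and contention between tasks on the same core, both of which a full WCRT analysis would have to handle. All names are hypothetical.

```python
from typing import Dict, List, Tuple

Window = Tuple[int, int]  # (ready time, finish time)


def ready_and_finish_times(
    order: List[str],             # tasks in topological order of their dependencies
    preds: Dict[str, List[str]],  # predecessors of each task
    bcet: Dict[str, int],
    wcet: Dict[str, int],
) -> Tuple[Dict[str, Window], Dict[str, Window]]:
    """Propagate earliest times with BCETs and latest times with WCETs."""
    earliest: Dict[str, Window] = {}
    latest: Dict[str, Window] = {}
    for task in order:
        # A task becomes ready once all of its predecessors have finished.
        e_ready = max((earliest[p][1] for p in preds.get(task, [])), default=0)
        l_ready = max((latest[p][1] for p in preds.get(task, [])), default=0)
        earliest[task] = (e_ready, e_ready + bcet[task])
        latest[task] = (l_ready, l_ready + wcet[task])
    return earliest, latest


earliest, latest = ready_and_finish_times(
    order=["Z", "Y"],
    preds={"Y": ["Z"]},
    bcet={"Z": 40, "Y": 10},
    wcet={"Z": 55, "Y": 20},
)
```

A task's lifetime can then be taken as the span from its earliest ready time to its latest finish time, here 40 to 75 for task Y.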
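Putting Together
[Figure: flowchart. Starting from the initial task interference, L2 cache conflict analysis and WCRT analysis produce an interference graph. If it changed ("Yes"), the analyses are repeated; if not ("No"), the estimated WCRT is reported.]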
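The loop in this flowchart is a fixed-point iteration over the interference relation. The driver below is a generic sketch with hypothetical names. `toy_analyze` is a made-up stand-in for the combined L2 conflict and WCRT analysis, with invented numbers, and exists only so that the loop can be run.

```python
from typing import Callable, FrozenSet, Tuple

Pair = FrozenSet[str]
Interference = FrozenSet[Pair]


def iterate_wcrt(
    initial: Interference,
    analyze: Callable[[Interference], Tuple[int, Interference]],
    max_rounds: int = 100,
) -> Tuple[int, Interference]:
    """Repeat the analysis until the interference relation stops changing."""
    interference = initial
    for _ in range(max_rounds):
        wcrt, refined = analyze(interference)
        if refined == interference:  # "Change? No": report the estimate
            return wcrt, interference
        interference = refined       # "Change? Yes": analyze again
    raise RuntimeError("no fixed point within max_rounds")


XY: Pair = frozenset({"X", "Y"})
XZ: Pair = frozenset({"X", "Z"})


def toy_analyze(interference: Interference) -> Tuple[int, Interference]:
    """Toy stand-in: each interfering pair costs 5 time units, and the
    lifetimes of X and Y are pretended to turn out disjoint."""
    wcrt = 70 + 5 * len(interference)
    return wcrt, interference - {XY}


wcrt, final_interference = iterate_wcrt(frozenset({XY, XZ}), toy_analyze)
```

In the toy run the first round drops the X-Y pair, the second round confirms that nothing else changes, and the estimate tightens from 80 to 75.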
Experiments Parameters
• Cache access latency
– L1 hit: 1 cycle, L2 hit: 10 cycles, memory access: 100 cycles
• Various core numbers
– 1 core, 2 cores, and 4 cores
• Various cache configurations
– Cache size, block size, associativity
• Real-World Benchmarks: DEBIE
– Space Debris Monitoring Software
– 8 MSCs, 35 tasks
[Figure: Code Size Distribution. Histogram of the number of tasks (y-axis, 0 to 12) per task code size bin (x-axis): 0-1k, 1k-2k, 2k-4k, 4k-8k, 8k-16k, 16k and above.]
Experimental Results
• Comparison with Yan-Zhang, RTAS 2008
• Direct-mapped cache only
[Figure: (a) WCRT Comparison and (b) Inter-core Eviction Comparison. Both panels compare Yan-Zhang's Method with Our Method across core configurations (L1: 2KB): 1-core, L2: 8KB; 2-core, L2: 16KB; 4-core, L2: 32KB. y-axes: Estimated WCRT in (a), Inter-core Evictions in (b).]
Experimental Results
• Vary L1 and L2 size
[Figure: (a) Varying L1 Size (512B, 1KB, 2KB, 4KB; 2-core, L2: 16KB) and (b) Varying L2 Size (4KB, 8KB, 16KB, 32KB; 2-core, L1: 2KB). y-axis: Estimated WCRT. Series: Yan-Zhang's Method, Our Method.]
Experimental Results
• Set associative cache optimizations
[Figure: Estimated WCRT for 1-way, 2-way, 4-way, and 8-way caches (2-core, L1: 2KB), without and with optimization. An inset repeats the age example (ages 1 to 4): blocks m0, m1, m2 are all Non classified without optimization; with optimization they are Always hit, Always hit, Non classified.]
Experimental Results
• Runtime of our iterative analysis
[Figure: Analysis Time (sec) versus Shared L2 Cache Size (2KB, 4KB, 8KB, 16KB, 32KB) for L1 configurations 2x512B, 2x1KB, and 2x2KB.]
Conclusion
• WCRT analysis of concurrent programs running on shared cache multi-cores.
• Use task lifetimes to identify real conflicts.
• Optimizations for set associative caches.
• Experiments: tighter WCRT estimates than the state of the art.
• Future work: data caches, other replacement policies.
Thank You!