Instruction Cache Memory Issues in Real-Time Systems
Licentiate dissertation
Filip Sebek October 11th, 2002
Opponent: Axel Jantsch (KTH)
Examinator: Lars Wanhammar (LiTH)
2
Outline of this dissertation seminar
About this thesis (Lennart Lindh)
Thesis presentation (Filip Sebek)
Comments and questions (Axel Jantsch and Filip Sebek)
Questions from the audience
Consideration (Lars Wanhammar, Axel Jantsch, and Lennart Lindh)
Festivity (?) at the department
3
Organisation
RT Systems Design Lab
Computer Architecture Lab
Computer Science Lab
Graduate Education
Lic school
Int’l MSc school
Undergraduate Education
4
Mohammed El Shobaki: System Monitoring/Debugging of S/Multiprocessor Systems
Tommy Klevin: Bus analyzer (RealFast)
Stefan Sjöberg: Design ASIC/FPGA with Top Down Design Flow and VHDL (RealFast ABB)
Joakim Persson: Redundant System (ProTang, KK)
Johan Stärner: Multiprocessor Architecture (KK)
Leif Enblom (ABB APR): Multiprocessor system for (ABB KK)
Filip Sebek: Instruction Cache Memory Issues in RTS
Stefan Stjernen: IP Design (RealFast, Industrial Research School: Electronic Design)
Raimo Haukilahti KTH/MDH: Low-Power Techniques for HW-RTOS (KTH)
5
The title and the questions
Title: Instruction Cache Memory Issues in Real-Time Systems
Initial questions:
– How do I measure the cache-related preemption delay in a real-time system?
– Is a cache memory really a problem in real-time systems?
6
Automatic control – Real-time system
1. Get input – sample…
2. Compute – execute instructions
3. Actuate – control the process…
4. = Action!
A real-time system must produce correct results in time
Examples:
– Air bag in action
– An armored tank in movement shoots
– Supertanker turns
– Toaster
7
Real-time system implementation
Often as many “small” cyclic programs (tasks or processes) that communicate with each other:
– Alarm task
– Sample task
– Computation
– Actuate
8
What Real-Time research is about:
Predicting execution time (of a task)
– Difficult: many parameters
  – Input data sensitive
  – Program design
  – Hardware dependent
  – Compiler dependent
– Several methods
Scheduling tasks
– Static or dynamic
– May allow pre-emption
10
What is a cache memory?
Cache memories are faster than primary memory and keep pace with CPU speed
Reduces congesting bus traffic
Saves energy
Instruction fetch time becomes variable with caches: hit-time and miss-penalty
[Diagram: CPU, I/O, MEM and CACHE; fast path (~95%) and slow path (~5%)]
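The effect of hit-time and miss-penalty on the average fetch time can be sketched in a few lines of C. This is only an illustration: the function name and the numbers in the usage note (1 cycle hit-time, 20 cycles miss-penalty) are assumptions, not values from the thesis; only the ~5% miss share comes from the slide.

```c
#include <assert.h>

/* Average fetch time per instruction.
 * hit_time and miss_penalty use the same unit (cycles or ns);
 * miss_ratio is a fraction in [0, 1].
 * Hypothetical helper, not taken from the thesis. */
double avg_fetch_time(double hit_time, double miss_penalty, double miss_ratio)
{
    assert(miss_ratio >= 0.0 && miss_ratio <= 1.0);
    /* Every fetch pays the hit-time; a miss pays the penalty on top. */
    return hit_time + miss_ratio * miss_penalty;
}
```

With the assumed values, avg_fetch_time(1.0, 20.0, 0.05) gives about 2 cycles per instruction, twice the hit-time even though only one fetch in twenty misses.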
11
How does a cache memory work?
Cache hit and cache miss
Locality
– Temporal locality: memory references close in time (loops and functions)
– Spatial locality: memory references close in space (cache block and wide data bus)
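#define SIZE 100   /* assumed value; the slide does not define SIZE */

int funk(int term)
{
    int vector[SIZE] = {0};   /* initialised so that += below is well defined */
    int i, sum = 0;

    for (i = 0; i < SIZE; i++) {   /* loop body is re-executed: temporal locality */
        vector[i] += term;         /* neighbouring elements: spatial locality */
        sum += vector[i];
    }
    return sum;
}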
13
Cache memories and real-time
Cache memories make execution time variable
[Timeline: “Sample, execute, actuate – action!” repeated three times]
Analysis is non-trivial:
– cache contents depend on the execution path
– the execution path depends on the cache contents
Missed deadline?
14
Predicting cache behavior
Avoidance and simplifications
– Disable cache!
– Specially designed processors and caches
Static analysis (Paper C)
+ no probe effects
+ safe overestimation
- modern hardware
Simulation
+ simple
- simulator must model correctly
Real measurement (Papers A, B, D)
+ measure on complex systems
- probe effect
16
Measurement and probe effect
Most measurements affect the measured object when the probe is included in or removed from the measured environment. Examples:
– A warm thermometer measures a glass of cold water
– A computer monitoring system measures CPU load
Reduce the intrusion (probe effect) to a minimum!
17
Facts and Problems Solutions
18
The Built-in Performance Monitor
Exploit the performance monitor that the CPU is equipped with
– 4 registers on MPC750
– Counts events:
  – L1 instruction fetch miss
  – Branch miss
  – Processor clocks
  – Completed instructions
  – Completed load/stores
  – …
NON-INTRUSIVE!
19
SARA CPU Card
20
SARA MP-system and MAMon
21
My questions revised
Initial questions:
– How do I measure the cache-related preemption delay in a real-time system?
– Is a cache memory really a problem in real-time systems?
Modified questions:
– Is there a simple(r) way to predict or measure cache misses in a real-time system?
– Can an instruction cache cause a missed deadline when it is enabled?
– How large is the cache-related preemption delay in absolute and relative terms?
22
Outline of this presentation
Introduction
The cache memory and real-time
Measurement and probe effect
CPX2000 – “SARA system”
My own questions
Synthetic code generation
Analysis
Determine worst-case cache miss-ratio of a program
Measure instruction execution time w/wo cache
Measure cache related preemption delay
Conclusion and future work
23
Current state in presentation:
We have 3 questions!
We have an experimental system!
We can measure on it with a small intrusion!
Q: Measure on what program?
24
Code generation: size
Workbench
– Standard benchmark? (Rhealstone, EEMBC etc.)
– Measure worst-case situations
Synthetic code – size specific
– One big loop
– addis r3,r3,0x0000 = 4 bytes
Not representative code – no problem!
– Swap out cache contents – find maximum cost
– Code size measured in “cache size”
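A size-specific workbench of this kind could be produced by a small generator such as the sketch below. It is a minimal illustration, not the generator used in the thesis: the function names are hypothetical, the cache size is a parameter (the 32 KB in the usage note is an assumption), and the closing branch is counted as one of the instructions.

```c
#include <assert.h>
#include <stdio.h>

enum { INSTR_BYTES = 4 };   /* one instruction = 4 bytes, as on the slide */

/* Number of instructions for a task whose code size is `percent` percent
 * of a cache holding `cache_bytes` bytes. Hypothetical helper. */
unsigned long instr_count(unsigned long cache_bytes, unsigned percent)
{
    assert(cache_bytes % INSTR_BYTES == 0);
    return cache_bytes / INSTR_BYTES * percent / 100;
}

/* Emit one big loop of n instructions as assembly text:
 * n - 1 filler instructions followed by the branch that closes the loop.
 * Returns the number of instructions written. */
unsigned long emit_loop(FILE *out, unsigned long n)
{
    unsigned long i, emitted = 0;

    assert(n >= 1);
    fprintf(out, "loop:\n");
    for (i = 0; i + 1 < n; i++) {
        fprintf(out, "\taddis r3,r3,0x0000\n");   /* filler from the slide */
        emitted++;
    }
    fprintf(out, "\tb loop\n");
    return emitted + 1;
}
```

For an assumed 32 KB instruction cache, emit_loop(stdout, instr_count(32768, 150)) would write a loop that is 1.5 times the cache size, so the task size can be stepped in units of “cache size” as the slide describes.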
25
Code generation: miss-ratio
One (out of several) methods: “Play with spatial locality”
– Method: jump instructions break spatial locality
– Requirements: code size 2 × cache size
– Result: miss-ratio from 1/block size up to 100% cache misses

Block contents for a block of 4 words (m = miss, h = hit, n.u. = not used):
Miss-ratio   Block L1                           Block L2
25%          nop (m)  nop (h)  nop (h)  nop (h)    nop (m)  nop (h)  nop (h)  nop (h)
100%         J L2 (m) n.u.     n.u.     n.u.       J L3 (m) n.u.     n.u.     n.u.
50%          nop (m)  J L2 (h) n.u.     n.u.       nop (m)  J L3 (h) n.u.     n.u.
33%          nop (m)  nop (h)  J L2 (h) n.u.       nop (m)  nop (h)  J L3 (h) n.u.
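The block layouts above follow one rule: if only the first k words of each block are executed, the block costs one miss per k instructions, so the miss-ratio is 1/k. The sketch below shows that rule and a block emitter. It is an assumption-laden illustration: the function names, the `b` branch mnemonic and the `.long 0` padding are choices made here, not details from the thesis.

```c
#include <assert.h>
#include <stdio.h>

/* Miss-ratio obtained when `used` words of every block are executed. */
double miss_ratio(unsigned used)
{
    assert(used >= 1);
    return 1.0 / used;
}

/* Emit block number idx with `used` executed words out of block_words.
 * A full block falls through into the next one; otherwise the last used
 * word jumps to the next block and the rest is padding (not used).
 * Returns the number of words written, which is always block_words. */
unsigned emit_block(FILE *out, unsigned idx, unsigned used, unsigned block_words)
{
    unsigned w, words = 0;

    assert(used >= 1 && used <= block_words);
    fprintf(out, "L%u:\n", idx);
    if (used == block_words) {
        for (w = 0; w < block_words; w++, words++)
            fprintf(out, "\tnop\n");
        return words;
    }
    for (w = 0; w + 1 < used; w++, words++)
        fprintf(out, "\tnop\n");
    fprintf(out, "\tb L%u\n", idx + 1);          /* breaks spatial locality */
    words++;
    for (w = used; w < block_words; w++, words++)
        fprintf(out, "\t.long 0\n");             /* n.u. padding */
    return words;
}
```

Calling emit_block(stdout, 1, 2, 4) writes the 50% layout from the table (one nop, one jump, two unused words); miss_ratio(2) returns 0.5 for it.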
26
Analysis!
27
1. Code interpretation: miss-ratio
Each executed instruction is marked miss or hit and given a local miss-ratio; “-” marks an instruction that is not executed.

Instr.     Block size = 4 words    Block size = 8 words
i1         miss   1/4              miss   1/6
i2         hit    1/4              hit    1/6
i3         hit    1/4              hit    1/6
i4         hit    1/4              hit    1/6
i5         miss   1/2              hit    1/6
beq 10     hit    1/2              hit    1/6
i7         -      -                -      -
i8         -      -                -      -
i9         -      -                -      -
i10        miss   1/3              miss   1/4
i11        hit    1/3              hit    1/4
i12        hit    1/3              hit    1/4
jmp 18     miss   1/1              hit    1/4
i14        -      -                -      -
i15        -      -                -      -
i16        -      -                -      -
Total      4/10 = 40% miss-ratio   2/10 = 20% miss-ratio

(reversed process to generate code with a fixed miss-ratio)
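The counting in this example can be reproduced with a short C sketch. It assumes a cold cache and counts a miss whenever execution enters a block other than the one just fetched, so it models spatial locality only (no reuse of earlier blocks). The function and array names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Executed instruction numbers from the slide:
 * i1..i5, beq 10 (instruction 6, taken), i10..i12, jmp 18 (instruction 13). */
const unsigned slide_path[] = {1, 2, 3, 4, 5, 6, 10, 11, 12, 13};
const size_t slide_path_len = sizeof slide_path / sizeof slide_path[0];

/* Count misses along a path of 1-based instruction numbers.
 * A miss is counted each time the path enters a different block. */
unsigned count_misses(const unsigned *path, size_t n, unsigned block_words)
{
    size_t i;
    unsigned misses = 0;
    long last_block = -1;          /* no block fetched yet */

    assert(block_words > 0);
    for (i = 0; i < n; i++) {
        long block;

        assert(path[i] >= 1);
        block = (long)((path[i] - 1) / block_words);
        if (block != last_block) {
            misses++;
            last_block = block;
        }
    }
    return misses;
}
```

For slide_path this gives 4 misses with 4-word blocks and 2 misses with 8-word blocks, matching the 4/10 and 2/10 totals.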
28
1. Code interpretation: miss-ratio
Line size = 4 words

Instructions                  Miss/hit                 Local miss-ratio
i1      i2      i3   i4       miss  hit   hit  hit     1/4  1/4  1/4  1/4
i5      beq 10  i7   i8       miss  hit   -    -       1/2  1/2  -    -
i9      i10     i11  i12      -     miss  hit  hit     -    1/3  1/3  1/3
jmp 18  i14     i15  i16      miss  -     -    -       1/1  -    -    -

Further line patterns shown on the slide:
Miss/hit                 Local miss-ratio
miss  hit   hit  hit     1/4  1/4  1/4  1/4
miss  hit   hit  hit     1/2  1/2  1/4  1/4
miss  miss  hit  hit     1/4  1/3  1/3  1/3
miss  hit   hit  hit     1/1  1/4  1/4  1/4

(reversed process to generate code with a fixed miss-ratio)
29
1. Code interpretation: miss-ratio
Determine the worst-case cache miss-ratio (WCCMR)
– The highest frequency of misses possible for a program!
– Depends on execution path (actually input data)
[Figure: execution paths labelled “> Miss%” and “< Miss%”]
The WCCMR-path is the most energy consuming!
Optimize for
– Speed or size
– Energy consumption
30
1. Key concepts bounding WCCMR
Spatial locality analysis: determine each instruction’s “local miss-ratio”
Search: find the execution path with the highest cache miss-ratio
Execution path analysis: determine the weight of each basic block (loop dependent)
31
1. Result (finding WCCMR)

Path   Miss ratio   # Instr   Execution time
1      20.6%        43        132
2      18.9%        37        107
3      19.7%        40        119
4      17.6%        34        94
5      21.6%        87        275     max !!
6      18.0%        43        121

Analysed code, with paths (1) to (6) running through it:

...
if(a>b) {
    ...
    ...
    do{
        ...
    }while(c>d);
}
else {
    ...
    ...
    while(e<3){
        ...
    }
}
...
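Once every path has a miss-ratio, the search step reduces to picking the maximum. The sketch below does this by plain enumeration over the six rows of the result table; that exhaustive approach and all identifiers are assumptions made for illustration, and the thesis may search differently.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int    id;           /* path number from the table */
    double miss_ratio;   /* percent */
    int    instr;        /* executed instructions */
    int    exec_time;    /* unit as in the table (not stated on the slide) */
} path_result;

/* Rows copied from the result table. */
const path_result paths[] = {
    {1, 20.6, 43, 132},
    {2, 18.9, 37, 107},
    {3, 19.7, 40, 119},
    {4, 17.6, 34,  94},
    {5, 21.6, 87, 275},
    {6, 18.0, 43, 121},
};
const size_t path_count = sizeof paths / sizeof paths[0];

/* Return the id of the path with the highest miss-ratio (the WCCMR path). */
int worst_case_path(const path_result *p, size_t n)
{
    size_t i, worst = 0;

    assert(n > 0);
    for (i = 1; i < n; i++)
        if (p[i].miss_ratio > p[worst].miss_ratio)
            worst = i;
    return p[worst].id;
}
```

worst_case_path(paths, path_count) selects path 5, the row marked “max !!”. In this table the same path also has the most instructions and the longest execution time.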
33
2. When is a cache memory beneficial?
On cache misses, the complete cache block is loaded
– If the cache block is larger than one instruction, this gives a miss-penalty
A cache can reduce system performance!
– High miss-ratio AND long miss-penalty
Experiment:
– Generate code with a fixed miss-ratio
– Measure time
– Plot the average execution time
34
2. Threshold miss-ratio level (@CPX2000)
[Figure: execution time (ns/instruction) and cache miss-ratio (%), with curves for cache disabled and cache enabled; threshold level at 84%]
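The idea of a threshold level can be illustrated with a simple linear model, which is an assumption made here and not the measured CPX2000 behaviour: with the cache enabled, the time per instruction is t_hit + m × t_miss_penalty for miss-ratio m, and with the cache disabled it is a constant t_uncached. The threshold is the m at which both are equal. All names and the numbers in the usage note are hypothetical.

```c
#include <assert.h>

/* Miss-ratio (fraction) above which the enabled cache becomes slower than
 * running with the cache disabled, under the linear model
 *   t_enabled(m) = t_hit + m * t_miss_penalty,  t_disabled = t_uncached.
 * All times use the same unit, for example ns per instruction. */
double threshold_miss_ratio(double t_hit, double t_miss_penalty,
                            double t_uncached)
{
    assert(t_miss_penalty > 0.0);
    return (t_uncached - t_hit) / t_miss_penalty;
}
```

With made-up values of 10 ns per hit, 60 ns miss-penalty and 40 ns per uncached fetch, threshold_miss_ratio(10.0, 60.0, 40.0) returns 0.5: code with a miss-ratio below 50% would run faster with the cache enabled.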
35
2. When is a cache memory beneficial?
Concluding question: “When is instruction caching beneficial?”
Answer: “Always” (!!)
– “No code is so jumpy”
– “No missed deadlines”
– “Safe!”
(New Q&As)
– “Why 84% miss?” “Low refill penalty”
– “Why?” “Burst refill!”
[Diagrams: CPU, I/O, MEM and CACHE; a request that misses refills the block, a following request hits]
37
3. Cache Related Preemption Delay
Extrinsic cache behavior: task interference
– Non-preemptive systems
– Preemptive systems: Cache Related Preemption Delay (CRPD)
[Figure: miss-ratio over time for T1 followed by T2]
[Figure: miss-ratio over time for T1, T2, T1; T2 preempts T1, then T1 resumes]
38
3. CRPDmax measurement
[Figure: miss-ratio over time for T1, T2, T1 (T2 preempts T1, T1 resumes); T1 iterations 1, 2 and 3 are non-preempted, iteration 4 is preempted and continues after T1 resumes]
39
3. CRPDmax measurement
CRPD = ((e - d) + (c - b)) - (b - a) = 195 500 ns = 195.5 µs
Timestamps (ns):
a = 915 399 425
b = 918 791 225
c = 921 219 825
d = 921 592 925
e = 922 751 625
[Timeline labels: non-preempted, preempted]
OS: 43-87 µs
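The CRPD formula can be checked against the five timestamps with a few lines of C. The reading of the timestamps is an interpretation of the slide: a to b is taken as the non-preempted iteration, and b to c plus d to e as the two parts of the preempted iteration. The function and type names are hypothetical.

```c
#include <assert.h>

typedef long long ns_t;   /* timestamp in nanoseconds */

/* CRPD = time of the preempted iteration (its two parts, b..c and d..e)
 * minus the time of the non-preempted iteration (a..b). */
ns_t crpd_ns(ns_t a, ns_t b, ns_t c, ns_t d, ns_t e)
{
    assert(a <= b && b <= c && c <= d && d <= e);
    return ((e - d) + (c - b)) - (b - a);
}
```

With the values from the slide, crpd_ns(915399425, 918791225, 921219825, 921592925, 922751625) returns 195500, that is 195 500 ns or 195.5 µs.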
40
3. CRPD (@CPX2000)
[Figure: CRPD (microseconds) as a function of T1 task size (% of cache size); marked value 195.5 µs]
41
Conclusions and summary of results
1. The worst-case cache miss-ratio of a program can be identified to quantify the energy usage of the memory system
2. The CPX2000 system cannot miss any deadline because of an enabled instruction cache.
3. Synthetic workbenches can force a system into a worst-case state
• The cache related preemption delay has been measured as a function of task size.
42
Future Work
None!
Develop the analysis method of worst-case cache miss-ratio levels by including temporal locality
Data caches
– (Generate synthetic code)
– Measure CRPD
– Measure threshold miss-ratio level
43
Acknowledgements
Research was funded by
– KK-stiftelsen
– Department of Computer Science and Engineering (Mälardalen University)
Thank you…
– Supervisor Professor Dr. Ing. Lennart Lindh
– All people at the Computer Architecture Lab
– My family