Instruction Cache Memory Issues in Real-Time Systems

Instruction Cache Memory Issues in Real-Time Systems. Licentiate dissertation, Filip Sebek, October 11th, 2002. Opponent: Axel Jantsch (KTH). Examinator: Lars Wanhammar (LiTH).


Transcript of Instruction Cache Memory Issues in Real-Time Systems

Page 1: Instruction Cache Memory  Issues in Real-Time Systems

Instruction Cache Memory Issues in Real-Time Systems

Licentiate dissertation

Filip Sebek October 11th, 2002

Opponent: Axel Jantsch (KTH)

Examinator: Lars Wanhammar (LiTH)

Page 2: Instruction Cache Memory  Issues in Real-Time Systems


Outline of this dissertation Seminar

About this thesis (Lennart Lindh)

Thesis presentation (Filip Sebek)

Comments and questions (Axel Jantsch and Filip Sebek)

Questions from the audience

Consideration (Lars Wanhammar, Axel Jantsch, and Lennart Lindh)

Festivity (?) at the department

Page 3: Instruction Cache Memory  Issues in Real-Time Systems


Organisation

RT Systems Design Lab Comp. Architecture Lab Computer Science Lab

Graduate Education

Lic school

Int’l MSc school

Undergraduate Education

Page 4: Instruction Cache Memory  Issues in Real-Time Systems

Mohammed El Shobaki: System Monitoring/Debugging of S/Multiprocessor Systems
Tommy Klevin: Bus analyzer (RealFast)
Stefan Sjöberg: Design ASIC/FPGA with Top Down Design Flow and VHDL (RealFast ABB)
Joakim Persson: Redundant System (ProTang, KK)
Johan Stärner: Multiprocessor Architecture (KK)
Leif Enblom (ABB APR): Multiprocessor system for (ABB KK)
Filip Sebek: Instruction Cache Memory Issues in RTS
Stefan Stjernen: IP Design (RealFast, Industrial Research School: Electronic Design)
Raimo Haukilahti KTH/MDH: Low-Power Techniques for HW-RTOS (KTH)

Page 5: Instruction Cache Memory  Issues in Real-Time Systems

The title and the questions

Title: Instruction Cache Memory Issues in Real-Time Systems

Initial questions:
- How do I measure the cache-related preemption delay in a real-time system?
- Is a cache memory really a problem in real-time systems?

Page 6: Instruction Cache Memory  Issues in Real-Time Systems

Automatic control - Real-time system

1. Get input - sample...
2. Compute - execute instructions
3. Actuate - control the process...
4. = Action!

A real-time system must produce correct results in time

Examples:
- Air bag in action
- An armored tank in movement shoots
- Supertanker turns
- Toaster

Page 7: Instruction Cache Memory  Issues in Real-Time Systems

Real-time system implementation

Often as many "small" cyclic programs (tasks or processes) that communicate with each other

- Alarm task
- Sample task
- Computation
- Actuate

Page 8: Instruction Cache Memory  Issues in Real-Time Systems

What Real-Time research is about:

Predicting execution time (of a task)
- Difficult: many parameters
  - Input data sensitive
  - Program design
  - Hardware dependent
  - Compiler dependent
- Several methods

Scheduling tasks
- static or dynamic
- may allow pre-emption

Page 9: Instruction Cache Memory  Issues in Real-Time Systems

The title and the questions

Title: Instruction Cache Memory Issues in Real-Time Systems

Initial questions:
- How do I measure the cache-related preemption delay in a real-time system?
- Is a cache memory really a problem in real-time systems?

Page 10: Instruction Cache Memory  Issues in Real-Time Systems

What is a cache memory?

- Cache memories are faster than primary memory and keep pace with CPU speed
- Reduce congesting bus traffic
- Save energy
- Instruction fetch time becomes variable with caches: hit time and miss penalty

[Figure: CPU, I/O, MEM and CACHE blocks; labels "Fast (~95%)" and "Slow (~5%)"]
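The variable fetch time can be sketched as a weighted average of the hit time and the miss penalty. This is a minimal sketch, not a model taken from the thesis: the function name and all timing values are assumptions chosen for illustration, and only the roughly 95%/5% split comes from the slide.

```c
#include <stdio.h>

/* Average instruction fetch time with a cache (simple linear model).
 * hit_time_ns:     time to fetch an instruction that is in the cache
 * miss_penalty_ns: extra time needed when the fetch misses
 * miss_ratio:      fraction of fetches that miss, 0.0 .. 1.0
 * All three parameters are hypothetical inputs, not measured values. */
double avg_fetch_time_ns(double hit_time_ns, double miss_penalty_ns,
                         double miss_ratio)
{
    return hit_time_ns + miss_ratio * miss_penalty_ns;
}

/* Print the average fetch time for one configuration. */
void print_avg_fetch_time(double hit_time_ns, double miss_penalty_ns,
                          double miss_ratio)
{
    printf("miss ratio %.0f%%: %.1f ns per fetch on average\n",
           miss_ratio * 100.0,
           avg_fetch_time_ns(hit_time_ns, miss_penalty_ns, miss_ratio));
}
```

With an invented 10 ns hit time and 100 ns miss penalty, a 5% miss ratio gives about 15 ns per fetch, while an individual fetch takes either 10 ns or 110 ns. That spread is what makes execution time variable.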

Page 11: Instruction Cache Memory  Issues in Real-Time Systems

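How does a cache memory work?

Cache hit and cache miss

Locality
- Temporal locality: memory references close in time (loops and functions)
- Spatial locality: memory references close in space (cache block and wide data bus)

/* SIZE is the array length, defined elsewhere */
int funk(int term)
{
    int vector[SIZE] = {0};   /* initialised so the sum is well defined */
    int i, sum = 0;

    for (i = 0; i < SIZE; i++) {
        vector[i] += term;    /* neighbouring elements: spatial locality */
        sum += vector[i];     /* loop body re-executed: temporal locality */
    }
    return sum;
}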
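How the two kinds of locality turn into hits can be sketched with a tiny direct-mapped instruction cache model. The cache geometry (`BLOCK_WORDS`, `NUM_BLOCKS`) and the function name are assumptions for illustration and do not describe the cache of any particular processor.

```c
#include <stddef.h>

#define BLOCK_WORDS 4   /* hypothetical: instructions per cache block */
#define NUM_BLOCKS  8   /* hypothetical: number of blocks in the cache */

/* Simulate a direct-mapped instruction cache while a loop whose body
 * occupies word addresses 0 .. body_words-1 runs `iterations` times.
 * Returns the total number of cache misses, starting from a cold cache. */
unsigned long loop_misses(unsigned body_words, unsigned iterations)
{
    long tags[NUM_BLOCKS];
    unsigned long misses = 0;

    for (size_t i = 0; i < NUM_BLOCKS; i++)
        tags[i] = -1;                       /* cold cache: nothing loaded */

    for (unsigned it = 0; it < iterations; it++) {
        for (unsigned addr = 0; addr < body_words; addr++) {
            long block = (long)(addr / BLOCK_WORDS);
            size_t index = (size_t)(block % NUM_BLOCKS);
            if (tags[index] != block) {     /* miss: refill the whole block */
                tags[index] = block;
                misses++;
            }
        }
    }
    return misses;
}
```

A 16-word loop body run 10 times misses only 4 times in this model: once per block in the first iteration (spatial locality) and never again (temporal locality). A 64-word body does not fit in the 32-word cache, so every block is evicted before it is reused and each iteration misses on all 16 blocks.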

Page 12: Instruction Cache Memory  Issues in Real-Time Systems

The title and the questions

Title: Instruction Cache Memory Issues in Real-Time Systems

Initial questions:
- How do I measure the cache-related preemption delay in a real-time system?
- Is a cache memory really a problem in real-time systems?

Page 13: Instruction Cache Memory  Issues in Real-Time Systems

Cache memories and real-time

Cache memories make execution time variable

[Figure: timeline of three cycles, each labelled "Sample, execute, actuate - action!", with the question "Missed deadline?"]

Analysis is non-trivial:
- cache contents depend on execution path
- execution path depends on cache contents

Page 14: Instruction Cache Memory  Issues in Real-Time Systems

Predicting cache behavior

Avoidance and simplifications
  Disable cache!
  Specially designed processors and caches

Static analysis (Paper C)
  + no probe effects
  + safe overestimation
  - modern hardware

Simulation
  + simple
  - simulator must model correctly

Real measurement (Papers A, B, D)
  + measure on complex systems
  - probe effect

Page 15: Instruction Cache Memory  Issues in Real-Time Systems

The title and the questions

Title: Instruction Cache Memory Issues in Real-Time Systems

Initial questions:
- How do I measure the cache-related preemption delay in a real-time system?
- Is a cache memory really a problem in real-time systems?

Page 16: Instruction Cache Memory  Issues in Real-Time Systems

Measurement and probe effect

Most measurements affect the measured object when the probe is included in or removed from the measured environment.

Examples:
- A warm thermometer measures a glass of cold water
- A computer monitoring system measures CPU load

Reduce the intrusion (probe effect) to a minimum!

Page 17: Instruction Cache Memory  Issues in Real-Time Systems


Facts and Problems Solutions

Page 18: Instruction Cache Memory  Issues in Real-Time Systems

The Built-in Performance Monitor

Exploit the performance monitor built into the CPU
- 4 registers on MPC750
- Counts events:
  - L1 instruction fetch miss
  - Branch miss
  - Processor clocks
  - Completed instructions
  - Completed Load/Stores
  - ...

NON-INTRUSIVE!
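Once two event counts have been read before and after a code section, a miss ratio follows from their differences. The sketch below assumes free-running 32-bit counters and uses a made-up `pm_snapshot` structure; it does not show how the counters are programmed or read on the MPC750, and it takes "misses per completed instruction" as the ratio of interest.

```c
#include <stdint.h>

/* Hypothetical snapshot of two performance-monitor event counters. */
struct pm_snapshot {
    uint32_t icache_misses;     /* L1 instruction fetch misses */
    uint32_t completed_instr;   /* completed instructions */
};

/* Miss ratio over the interval between two snapshots.
 * Unsigned subtraction yields the correct delta even if a 32-bit
 * counter wrapped around once during the interval.
 * Returns 0.0 when no instruction completed. */
double miss_ratio_between(struct pm_snapshot start, struct pm_snapshot end)
{
    uint32_t misses = end.icache_misses - start.icache_misses;
    uint32_t instr  = end.completed_instr - start.completed_instr;

    if (instr == 0)
        return 0.0;
    return (double)misses / (double)instr;
}
```

Because the counting is done by the hardware, the measured code itself needs no added instrumentation between the two snapshots.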

Page 19: Instruction Cache Memory  Issues in Real-Time Systems


SARA CPU Card

Page 20: Instruction Cache Memory  Issues in Real-Time Systems


SARA MP-system and MAMon

Page 21: Instruction Cache Memory  Issues in Real-Time Systems

My questions revised

Initial questions:
- How do I measure the cache-related preemption delay in a real-time system?
- Is a cache memory really a problem in real-time systems?

Modified questions:
- Is there a simple(r) way to predict or measure cache misses in a real-time system?
- Can an instruction cache cause a missed deadline when it is enabled?
- How much is the cache-related pre-emption delay in absolute and relative terms?

Page 22: Instruction Cache Memory  Issues in Real-Time Systems

Outline of this presentation

Introduction

The cache memory and real-time

Measurement and probe effect

CPX2000 – “SARA system”

My own questions

Synthetic code generation

Analysis

Determine worst-case cache miss-ratio of a program

Measure instruction execution time w/wo cache

Measure cache related preemption delay

Conclusion and future work

Page 23: Instruction Cache Memory  Issues in Real-Time Systems


Current state in presentation:

We have 3 questions!

We have an experimental system!

We can measure on it with a small intrusion!

Q: Measure on what program?

Page 24: Instruction Cache Memory  Issues in Real-Time Systems

Code generation: size

Workbench
- Standard benchmark? (Rhealstone, EEMBC etc.)
- Measure worst-case situations

Synthetic code - size specific
- One big loop
- addis r3,r3,0x0000 = 4 bytes
- Not representative code - no problem!
- Swap out cache contents - find maximum cost
- Code size measured in "cache size"
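A generator for such a size-specific loop could look like the sketch below. It is an illustration under assumptions, not the generator used in the thesis: the function names, the `loop` label and the closing `b loop` branch are invented here, and the cache size is a parameter supplied by the caller.

```c
#include <stdio.h>
#include <stddef.h>

#define INSTR_BYTES 4   /* one addis r3,r3,0x0000 occupies 4 bytes */

/* Number of filler instructions for a loop whose total size is
 * `percent` % of `cache_bytes`. One slot is reserved for the closing
 * branch, so the loop as a whole has the requested size. */
size_t filler_count(size_t cache_bytes, unsigned percent)
{
    size_t total = cache_bytes / INSTR_BYTES * percent / 100;
    return total > 0 ? total - 1 : 0;
}

/* Write the loop as assembly text to `out`.
 * Returns the number of instructions emitted, including the branch. */
size_t emit_loop(FILE *out, size_t cache_bytes, unsigned percent)
{
    size_t n = filler_count(cache_bytes, percent);

    fprintf(out, "loop:\n");
    for (size_t i = 0; i < n; i++)
        fprintf(out, "\taddis r3,r3,0x0000\n");   /* filler, adds zero */
    fprintf(out, "\tb loop\n");                   /* hypothetical loop-back */
    return n + 1;
}
```

Calling `emit_loop(stdout, cache_bytes, 200)` would print a loop twice the size of the cache, which is one way to express code size "in cache size" units.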

Page 25: Instruction Cache Memory  Issues in Real-Time Systems

Code generation: miss-ratio

One (out of several) methods: "Play with spatial locality"
- Method: jump instructions break spatial locality
- Requirements: code size 2×cache size
- Result: 1/block size - 100% cache misses

Block layouts and resulting miss-ratio ((m) = miss, (h) = hit, n.u. = not used). Block L1 is shown; block L2 has the same layout with "J L3" as its jump.

Miss-ratio   Word 1      Word 2      Word 3      Word 4
25%          nop (m)     nop (h)     nop (h)     nop (h)
100%         J L2 (m)    n.u.        n.u.        n.u.
50%          nop (m)     J L2 (h)    n.u.        n.u.
33%          nop (m)     nop (h)     J L2 (h)    n.u.
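The four layouts follow one rule: if only the first `used` words of every block are executed and no block is found in the cache again, exactly one fetch in `used` misses. A minimal sketch of that rule, with invented function names and a 4-word block as on the slide:

```c
#define BLOCK_WORDS 4   /* instructions per cache block, as on the slide */

/* Miss ratio when `used` instructions (1 .. BLOCK_WORDS) are executed in
 * every block and no block is ever reused: only the first fetch misses.
 * Returns -1.0 for an invalid `used`. */
double pattern_miss_ratio(unsigned used)
{
    if (used == 0 || used > BLOCK_WORDS)
        return -1.0;
    return 1.0 / (double)used;
}

/* Describe one block: 'n' = nop, 'J' = jump to the next block,
 * '-' = not used. A full block needs no jump, it falls through. */
void block_layout(unsigned used, char layout[BLOCK_WORDS + 1])
{
    for (unsigned i = 0; i < BLOCK_WORDS; i++) {
        if (i + 1 < used)
            layout[i] = 'n';
        else if (i + 1 == used)
            layout[i] = (used == BLOCK_WORDS) ? 'n' : 'J';
        else
            layout[i] = '-';
    }
    layout[BLOCK_WORDS] = '\0';
}
```

For `used` = 1, 2, 3, 4 the layouts are `J---`, `nJ--`, `nnJ-` and `nnnn`, matching the 100%, 50%, 33% and 25% cases.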

Page 26: Instruction Cache Memory  Issues in Real-Time Systems


Analysis!

Page 27: Instruction Cache Memory  Issues in Real-Time Systems

1. Code interpretation: miss-ratio

Executed path: i1 to i5, beq 10 (taken), i10 to i12, jmp 18 (10 instructions). "-" marks an instruction that is not executed.

Instruction   Block size = 4 words        Block size = 8 words
              Result   Local miss-ratio   Result   Local miss-ratio
i1            miss     1/4                miss     1/6
i2            hit      1/4                hit      1/6
i3            hit      1/4                hit      1/6
i4            hit      1/4                hit      1/6
i5            miss     1/2                hit      1/6
beq 10        hit      1/2                hit      1/6
i7            -        -                  -        -
i8            -        -                  -        -
i9            -        -                  -        -
i10           miss     1/3                miss     1/4
i11           hit      1/3                hit      1/4
i12           hit      1/3                hit      1/4
jmp 18        miss     1/1                hit      1/4
i14           -        -                  -        -
i15           -        -                  -        -
i16           -        -                  -        -

Miss-ratio    4/10 = 40%                  2/10 = 20%

(The process is reversed to generate code with a fixed miss-ratio)
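The two miss-ratios can be checked by replaying the executed path against a cold cache. The sketch below assumes a cache large enough that no block is evicted, which holds for this 16-instruction example; the function name and the `SLIDE_TRACE` array are introduced here for illustration.

```c
#include <stddef.h>

#define MAX_BLOCKS 64   /* hypothetical limit of this small model */

/* Count cold-cache misses for a trace of 1-based instruction numbers,
 * assuming no block is ever evicted. A block outside the modelled range
 * is conservatively counted as a miss on every access. */
unsigned count_misses(const unsigned *trace, size_t n, unsigned block_words)
{
    unsigned char seen[MAX_BLOCKS] = {0};
    unsigned misses = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned block = (trace[i] - 1) / block_words;
        if (block >= MAX_BLOCKS) {
            misses++;
        } else if (!seen[block]) {   /* first touch loads the block */
            seen[block] = 1;
            misses++;
        }
    }
    return misses;
}

/* Executed path from the slide: i1..i5, beq 10 (taken), i10..i12, jmp 18.
 * beq 10 is instruction 6 and jmp 18 is instruction 13. */
static const unsigned SLIDE_TRACE[] = {1, 2, 3, 4, 5, 6, 10, 11, 12, 13};
```

`count_misses(SLIDE_TRACE, 10, 4)` gives 4 misses in 10 fetches (40%), and `count_misses(SLIDE_TRACE, 10, 8)` gives 2 misses (20%), in agreement with the slide.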

Page 28: Instruction Cache Memory  Issues in Real-Time Systems

1. Code interpretation: miss-ratio

Line size = 4 words

Block   Instructions          Hit/miss             Local miss-ratio
1       i1 i2 i3 i4           miss hit hit hit     1/4 1/4 1/4 1/4
2       i5 beq 10 i7 i8       miss hit - -         1/2 1/2 - -
3       i9 i10 i11 i12        - miss hit hit       - 1/3 1/3 1/3
4       jmp 18 i14 i15 i16    miss - - -           1/1 - - -

Further block patterns shown on the slide:

Hit/miss             Local miss-ratio
miss hit hit hit     1/4 1/4 1/4 1/4
miss hit hit hit     1/2 1/2 1/4 1/4
miss miss hit hit    1/4 1/3 1/3 1/3
miss hit hit hit     1/1 1/4 1/4 1/4

(The process is reversed to generate code with a fixed miss-ratio)

Page 29: Instruction Cache Memory  Issues in Real-Time Systems

1. Code interpretation: miss-ratio

Determine the worst-case cache miss-ratio (WCCMR)
- The highest frequency of misses possible for a program!
- Depends on execution path (actually input data)

[Figure: two execution paths, labelled "> Miss%" and "< Miss%"]

The WCCMR path is the most energy consuming!

Optimize for
- Speed or size
- Energy consumption

Page 30: Instruction Cache Memory  Issues in Real-Time Systems

1. Key concepts bounding WCCMR

- Spatial locality analysis: determine each instruction's "local miss-ratio"
- Search: find the execution path with the highest cache miss-ratio
- Execution path analysis: determine the weight of each basic block (loop dependent)
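One way to combine these concepts is to weight each basic block's misses and instruction count by how often the block runs on a path. The sketch below is an assumption about how such a combination could be computed, not the method of the thesis: the `basic_block` structure and its fields are invented, and misses are counted on every pass, which ignores temporal locality and therefore overestimates.

```c
#include <stddef.h>

/* Hypothetical summary of one basic block on an execution path. */
struct basic_block {
    unsigned instructions;  /* instructions executed per pass */
    unsigned misses;        /* misses per pass, from spatial locality analysis */
    unsigned weight;        /* passes on this path, e.g. loop iterations */
};

/* Miss ratio of a whole path: weighted misses over weighted instructions.
 * Returns 0.0 for an empty path. */
double path_miss_ratio(const struct basic_block *blocks, size_t n)
{
    unsigned long misses = 0, instr = 0;

    for (size_t i = 0; i < n; i++) {
        misses += (unsigned long)blocks[i].weight * blocks[i].misses;
        instr  += (unsigned long)blocks[i].weight * blocks[i].instructions;
    }
    return instr ? (double)misses / (double)instr : 0.0;
}
```

For example, a 4-instruction block with 1 miss that runs once, followed by an 8-instruction loop body with 2 misses that runs 10 times, gives (1 + 20) / (4 + 80) = 25%.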

Page 31: Instruction Cache Memory  Issues in Real-Time Systems

1. Result (finding WCCMR)

Path   Miss ratio   # Instr   Execution time
1      20.6%        43        132
2      18.9%        37        107
3      19.7%        40        119
4      17.6%        34        94
5      21.6%        87        275              max !!
6      18.0%        43        121

Analysed code; the figure marks its execution paths (1) to (6):

...
if (a > b) {
    ...
    ...
    do {
        ...
    } while (c > d);
}
else {
    ...
    ...
    while (e < 3) {
        ...
    }
}
...
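With the per-path results in hand, the search step reduces to picking the path with the highest miss ratio. The sketch below does this by a plain linear scan over the six rows of the table; the structure and function names are invented for illustration, and the thesis may search the paths differently.

```c
#include <stddef.h>

/* One row of the result table. The unit of exec_time is not stated
 * on the slide, so it is kept as a plain number. */
struct path_result {
    int      path;
    double   miss_ratio_pct;
    unsigned instructions;
    unsigned exec_time;
};

static const struct path_result SLIDE_PATHS[] = {
    {1, 20.6, 43, 132}, {2, 18.9, 37, 107}, {3, 19.7, 40, 119},
    {4, 17.6, 34,  94}, {5, 21.6, 87, 275}, {6, 18.0, 43, 121},
};

/* Return the path with the highest miss ratio, or NULL if n == 0. */
const struct path_result *worst_path(const struct path_result *paths, size_t n)
{
    const struct path_result *worst = NULL;

    for (size_t i = 0; i < n; i++) {
        if (worst == NULL || paths[i].miss_ratio_pct > worst->miss_ratio_pct)
            worst = &paths[i];
    }
    return worst;
}
```

`worst_path(SLIDE_PATHS, 6)` selects path 5 (21.6%), the row marked "max" on the slide.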

Page 32: Instruction Cache Memory  Issues in Real-Time Systems

Outline of this presentation

Introduction

The cache memory and real-time

Measurement and probe effect

CPX2000 – “SARA system”

My own questions

Synthetic code generation

Analysis

Determine worst-case cache miss-ratio of a program

Measure instruction execution time w/wo cache

Measure cache related preemption delay

Conclusion and future work

Page 33: Instruction Cache Memory  Issues in Real-Time Systems

2. When is a cache memory beneficial?

- On cache misses, the complete cache block is loaded
- If cache block > instruction size => miss penalty
- A cache can reduce system performance! (High miss-ratio AND long miss-penalty)

Experiment:
1. Generate code with fixed miss-ratio
2. Measure time
3. Plot the average execution time

Page 34: Instruction Cache Memory  Issues in Real-Time Systems

2. Threshold miss-ratio level (@CPX2000)

[Figure: execution time (ns/instruction) plotted against cache miss-ratio (%), one curve for cache disabled and one for cache enabled; threshold level at 84%]
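The threshold is the miss-ratio at which the cache-enabled curve crosses the cache-disabled one. Under a simple linear model (an assumption, not the measured CPX2000 behaviour) it can be solved for directly. The function name and the timing values in the note below are invented and are not the values behind the 84% figure.

```c
/* Threshold miss ratio of a linear timing model.
 *   cached time per instruction   = hit_ns + m * miss_penalty_ns
 *   uncached time per instruction = uncached_ns
 * Solving for equal times gives m = (uncached_ns - hit_ns) / miss_penalty_ns.
 * A result above 1.0 means the cache is faster at every possible miss ratio.
 * Returns -1.0 if miss_penalty_ns is not positive. */
double threshold_miss_ratio(double hit_ns, double miss_penalty_ns,
                            double uncached_ns)
{
    if (miss_penalty_ns <= 0.0)
        return -1.0;
    return (uncached_ns - hit_ns) / miss_penalty_ns;
}
```

With invented values of 10 ns per hit, a 100 ns miss penalty and 60 ns per uncached fetch, the threshold is 0.5: code that misses on more than half of its fetches would run faster with the cache disabled.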

Page 35: Instruction Cache Memory  Issues in Real-Time Systems

2. When is a cache memory beneficial?

Concluding question: "When is instruction caching beneficial?"

Answer: "Always" (!!)
- "No code is so jumpy"
- "No missed deadlines"
- "Safe!"

(New Q&As)
- "Why 84% miss?" "Low refill penalty"
- "Why?" "Burst refill!"

[Figure: two diagrams of CPU, I/O, MEM and CACHE; one labelled "Request, MISS! Refill block", the other "Request, HIT"]

Page 36: Instruction Cache Memory  Issues in Real-Time Systems

Outline of this presentation

Introduction

The cache memory and real-time

Measurement and probe effect

CPX2000 – “SARA system”

My own questions

Synthetic code generation

Analysis

Determine worst-case cache miss-ratio of a program

Measure instruction execution time w/wo cache

Measure cache related preemption delay

Conclusion and future work

Page 37: Instruction Cache Memory  Issues in Real-Time Systems

3. Cache Related Preemption Delay

Extrinsic cache behavior - task interference
- Non-preemptive systems
- Preemptive systems: Cache Related Preemption Delay (CRPD)

[Figure: miss-ratio over time for tasks T1 and T2 run one after the other, and for T1, T2, T1 where T2 preempts T1 and T1 resumes]

Page 38: Instruction Cache Memory  Issues in Real-Time Systems

3. CRPDmax measurement

[Figure: miss-ratio over time for T1, T2, T1 (T2 preempts T1, T1 resumes); labels: "non-preempted", "preempted", "iteration 1", "iteration 2", "i3", "i4", "i4 (cont.)"]

Page 39: Instruction Cache Memory  Issues in Real-Time Systems

3. CRPDmax measurement

Timestamps (ns):
a = 915 399 425
b = 918 791 225
c = 921 219 825
d = 921 592 925
e = 922 751 625

CRPD = ((e - d) + (c - b)) - (b - a) = 195 500 ns = 195.5 µs

Figure labels: non-preempted, preempted

OS: 43-87 µs
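The arithmetic on this slide can be reproduced directly from the five timestamps. The sketch below only restates the slide's formula; the function name is invented, and the reading in the comments (the interval from a to b is a non-preempted iteration, b to c and d to e are the two parts of the preempted one) is an interpretation of the figure labels.

```c
#include <stdint.h>

/* Cache-related preemption delay from five timestamps in nanoseconds.
 * (b - a):             duration of a non-preempted iteration
 * (c - b) + (e - d):   duration of the preempted iteration, split into the
 *                      part before preemption and the part after resuming
 * The difference is the extra time attributed to the preemption. */
int64_t crpd_ns(int64_t a, int64_t b, int64_t c, int64_t d, int64_t e)
{
    return ((e - d) + (c - b)) - (b - a);
}
```

`crpd_ns(915399425, 918791225, 921219825, 921592925, 922751625)` evaluates to 195500 ns, that is 195.5 µs.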

Page 40: Instruction Cache Memory  Issues in Real-Time Systems

3. CRPD (@CPX2000)

[Figure: CRPD (microseconds) plotted against T1 task size (cache size %); marked value 195.5 µs]

Page 41: Instruction Cache Memory  Issues in Real-Time Systems


Conclusions and summary of results

1. The worst-case cache miss-ratio of a program can be identified to quantify the energy usage of the memory system

2. The CPX2000 system cannot miss any deadline because of an enabled instruction cache.

3. Synthetic workbenches can force a system into a worst-case state

• The cache related preemption delay has been measured as a function of task size.

Page 42: Instruction Cache Memory  Issues in Real-Time Systems

Future Work

None!

- Develop the analysis method of worst-case cache miss-ratio levels by including temporal locality
- Data caches: (generate synthetic code), measure CRPD, measure threshold miss-ratio level

Page 43: Instruction Cache Memory  Issues in Real-Time Systems

Acknowledgements

Research was funded by
- KK-stiftelsen
- Department of Computer Science and Engineering (Mälardalen University)

Thank you...
- Supervisor Professor Dr. Ing. Lennart Lindh
- All people at the Computer Architecture Lab
- My family