LECTURE 6 WELCOME - Utrecht University · LECTURE 6 WELCOME. 2. 3. 4 PART 1 THE CACHE. 5 cache. 6...

LECTURE 6

WELCOME

PART 1

THE CACHE

cache

Why is RAM slow?

It runs at a lower clock speed.

It is too far from the CPU:

c = 300,000 km/s

at 4 GHz: 7.5 cm per cycle

the signal speed in copper is lower:

actually about 5 cm per cycle,

so only 2.5 cm there and back within one cycle.
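The arithmetic on this slide can be checked with a few lines of C. This is a sketch; the copper propagation speed of roughly 2×10^8 m/s (about two thirds of c) is an assumption chosen to match the slide's 5 cm per cycle, and the function names are hypothetical.

```c
#include <assert.h>

/* Distance (in cm) a signal covers during one clock cycle.
   speed_m_per_s: propagation speed in m/s; clock_hz: clock frequency in Hz. */
double cm_per_cycle(double speed_m_per_s, double clock_hz)
{
    assert(speed_m_per_s > 0.0 && clock_hz > 0.0);
    return speed_m_per_s / clock_hz * 100.0;   /* metres -> centimetres */
}

/* How far RAM may be from the CPU if a request and its answer must
   both arrive within one cycle: half the per-cycle distance. */
double max_round_trip_distance_cm(double speed_m_per_s, double clock_hz)
{
    return cm_per_cycle(speed_m_per_s, clock_hz) / 2.0;
}
```

With c = 3e8 m/s at 4 GHz this gives 7.5 cm; with the assumed copper speed of 2e8 m/s it gives 5 cm, and halving that for the round trip yields the slide's 2.5 cm.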

Registers: 0 cycles

Level 1 cache (L1): 2 cycles

Level 2 cache (L2): 15 cycles

RAM: 80 cycles

Registers: 0 cycles

Level 1 cache (L1, 32KB): 4 cycles

Level 2 cache (L2, 256KB): 11 cycles

Level 3 cache (L3, 6MB): 39 cycles

RAM: 107 cycles
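To see why these numbers matter, one can estimate an average load latency from the fraction of loads served by each level. This is a simplified model and an assumption, not something stated on the slide: each latency above is treated as the full cost of a load served by that level, and `average_latency` is a hypothetical helper.

```c
#include <assert.h>

/* Load latencies in cycles from the slide: L1, L2, L3, RAM. */
static const double LATENCY[4] = { 4.0, 11.0, 39.0, 107.0 };

/* Average load latency, given the fraction of all loads served by each
   level (L1, L2, L3, RAM). The four fractions must sum to 1.
   Simplifying assumption: each latency is the complete load-to-use time
   for a load that is served by that level. */
double average_latency(const double fraction[4])
{
    double sum = 0.0, total = 0.0;
    for (int i = 0; i < 4; i++) {
        sum   += fraction[i];
        total += fraction[i] * LATENCY[i];
    }
    assert(sum > 0.999 && sum < 1.001);
    return total;
}
```

If every load hits L1 the result is 4 cycles; if just 10% of the loads go all the way to RAM, the average rises to 0.9 × 4 + 0.1 × 107 = 14.3 cycles.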

CACHE

slot  address  data
0     0050     411CBB372B37
1     0000     0A3246F3762B
2     0030     8910EE24BACF
3     0080     2AB348FE376C

RAM

address  data
0000     0A3246F3762B
0010     64000101EA67
0020     2BD634633642
0030     8910EE24BACF
0040     374C34648232
0050     411CBB372B37
0060     283E34A8623A
0070     A83829200176
0080     2AB348FE376C

Fully associative cache

Retrieving data:

The CPU wants to read from RAM.

The cache searches for the address.

If found, the data is returned.

Otherwise, the data is read from RAM,

and the obtained data is stored in the cache.

Writing data:

The CPU wants to write to RAM.

The cache searches for the address.

If found, the data is written to the cache.

Otherwise, a new entry is created,

and the data to be written is stored in the cache.

The stored data is written to RAM ‘later’ (e.g. when its entry is evicted).
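The read and write steps above can be sketched as a tiny software model of a fully associative, write-back cache. Everything here is hypothetical and for illustration only: the 4-entry size matches the example slide, RAM is modelled as a small array indexed by address, and the round-robin choice of which entry to evict is an assumption (the slides do not say how a victim is picked).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CACHE_SLOTS 4    /* the slide's example cache has 4 entries */
#define RAM_SIZE    256  /* hypothetical tiny RAM, one word per address */

typedef struct {
    bool     valid;
    bool     dirty;      /* written in the cache, not yet in RAM */
    uint32_t address;    /* the full address serves as the tag */
    uint64_t data;
} Entry;

static Entry    cache[CACHE_SLOTS];
static uint64_t ram[RAM_SIZE];
static int      next_victim = 0;  /* round-robin replacement (an assumption) */
static int      ram_reads = 0;    /* how often we had to go to RAM */

/* Fully associative: any entry may hold any address, so search them all. */
static int find(uint32_t address)
{
    for (int i = 0; i < CACHE_SLOTS; i++)
        if (cache[i].valid && cache[i].address == address)
            return i;
    return -1;
}

/* Claim an entry for a new address; a dirty victim is written back first. */
static int allocate(uint32_t address)
{
    int i = next_victim;
    next_victim = (next_victim + 1) % CACHE_SLOTS;
    if (cache[i].valid && cache[i].dirty)
        ram[cache[i].address] = cache[i].data;  /* the 'later' write to RAM */
    cache[i].valid = true;
    cache[i].dirty = false;
    cache[i].address = address;
    return i;
}

uint64_t cache_read(uint32_t address)
{
    assert(address < RAM_SIZE);
    int i = find(address);
    if (i < 0) {                 /* not found: use RAM, keep a copy */
        i = allocate(address);
        cache[i].data = ram[address];
        ram_reads++;
    }
    return cache[i].data;
}

void cache_write(uint32_t address, uint64_t value)
{
    assert(address < RAM_SIZE);
    int i = find(address);
    if (i < 0)                   /* not found: create a new entry */
        i = allocate(address);
    cache[i].data = value;
    cache[i].dirty = true;       /* RAM is only updated on eviction */
}
```

Note how `find` compares against every entry; that linear search is why a fully associative cache must stay small.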

CACHE

line  tag   data
0000  0000  000000000000
0001  0000  000000000000
0002  1A50  8910EE24BACF
0003  0B70  2AB348FE376C
0004  0000  000000000000
0005  0000  000000000000
0006  0000  000000000000
0007  0000  000000000000

Direct-mapped (1-way set associative) cache

Address: 0B700003

line 0003, tag 0B70

Steps:

Split the address into ‘line’ and ‘tag’.

At cache line ‘line’, verify ‘tag’.

If the tag matches, return the data.

Otherwise, get the data from RAM.
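The four steps can be sketched in C for the cache on these slides, in which each address can live in exactly one line (usually called a direct-mapped cache). Following the slide's toy split, the lower 16 bits of an address select the line and the upper 16 bits are the tag, so the model has 65536 lines (an assumption; the slide's table only shows lines 0000 to 0007). Real caches additionally reserve the lowest address bits as an offset within a cache line. `ram_read` is a hypothetical stand-in for the slow RAM access.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy split from the slide: upper 16 bits = tag, lower 16 bits = line. */
#define LINE_BITS 16
#define NUM_LINES (1u << LINE_BITS)

typedef struct {
    bool     valid;
    uint16_t tag;
    uint64_t data;
} CacheLine;

static CacheLine cache[NUM_LINES];

/* Hypothetical stand-in for a (slow) RAM read. */
static uint64_t ram_read(uint32_t address)
{
    return (uint64_t)address * 3u;
}

/* Step 1: split the address into 'line' and 'tag'. */
void split_address(uint32_t address, uint32_t *line, uint16_t *tag)
{
    *line = address & (NUM_LINES - 1u);
    *tag  = (uint16_t)(address >> LINE_BITS);
}

/* Direct-mapped read; *hit reports whether the cache already had the data. */
uint64_t cache_read(uint32_t address, bool *hit)
{
    assert(hit != NULL);
    uint32_t line;
    uint16_t tag;
    split_address(address, &line, &tag);
    CacheLine *l = &cache[line];          /* step 2: only this line can hold it */
    *hit = l->valid && l->tag == tag;     /* step 3: tag match means a hit */
    if (!*hit) {                          /* step 4: otherwise go to RAM */
        l->valid = true;                  /* ...and keep a copy in this line */
        l->tag   = tag;
        l->data  = ram_read(address);
    }
    return l->data;
}
```

Only one tag comparison is needed per lookup, which is what makes this design cheap.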

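Address: 0CA00006

line 0006, tag 0CA0

Address: 098A0006

line 0006, tag 098A

Both addresses map to line 0006, so only one of them can be cached at a time: each access evicts the other.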
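The conflict between these two addresses can be made visible by counting misses. The sketch below reuses the same toy address split (an assumption carried over from the previous slides) and tracks only valid bits and tags; `count_misses` is a hypothetical helper, and the cache state persists between calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_BITS 16                  /* toy split: lower 16 bits = line */
#define NUM_LINES (1u << LINE_BITS)

static bool     valid[NUM_LINES];
static uint16_t tags[NUM_LINES];

/* One direct-mapped access: true on a hit; on a miss the address is
   installed, evicting whatever this line held before. Data is omitted
   because only hits and misses matter here. */
static bool access_line(uint32_t address)
{
    uint32_t line = address & (NUM_LINES - 1u);
    uint16_t tag  = (uint16_t)(address >> LINE_BITS);
    if (valid[line] && tags[line] == tag)
        return true;
    valid[line] = true;
    tags[line]  = tag;
    return false;
}

/* Counts misses for a sequence of accesses (cache state persists between calls). */
size_t count_misses(const uint32_t *addresses, size_t n)
{
    assert(addresses != NULL || n == 0);
    size_t misses = 0;
    for (size_t i = 0; i < n; i++)
        if (!access_line(addresses[i]))
            misses++;
    return misses;
}
```

Alternating between 0CA00006 and 098A0006 six times yields six misses: every access evicts the other address, which is exactly the situation in which this cache becomes useless.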

N-way set associative cache (here N = 2)

CACHE

line  tag 1  data 1        tag 2  data 2
0000  0000   000000000000  0000   000000000000
0001  0000   000000000000  0000   000000000000
0002  1A50   8910EE24BACF  0000   000000000000
0003  0B70   2AB348FE376C  0FC0   1056BBA001FF
0004  0000   000000000000  0000   000000000000
0005  0000   000000000000  0000   000000000000
0006  0000   000000000000  0000   000000000000
0007  0000   000000000000  0000   000000000000
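With two ways per line, the two conflicting addresses from the previous slide can be cached side by side. The sketch below models a 2-way set associative cache with the toy address split from the earlier slides; the least-recently-used replacement policy and the helper names are assumptions, since the slide does not specify which way gets replaced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_BITS 16                  /* same toy split as the slides */
#define NUM_LINES (1u << LINE_BITS)
#define WAYS      2                   /* N = 2, as on the slide */

_Static_assert(WAYS == 2, "the single LRU index below only works for 2 ways");

typedef struct {
    bool     valid[WAYS];
    uint16_t tag[WAYS];
    int      lru;                     /* way to replace on the next miss */
} Set;

static Set sets[NUM_LINES];

/* Returns true on a hit; on a miss, replaces the least recently used way. */
static bool access_set(uint32_t address)
{
    Set *s = &sets[address & (NUM_LINES - 1u)];
    uint16_t tag = (uint16_t)(address >> LINE_BITS);
    for (int w = 0; w < WAYS; w++) {  /* check all N tags */
        if (s->valid[w] && s->tag[w] == tag) {
            s->lru = 1 - w;           /* the other way becomes LRU */
            return true;
        }
    }
    int victim = s->lru;
    s->valid[victim] = true;
    s->tag[victim]   = tag;
    s->lru = 1 - victim;
    return false;
}

/* Counts misses for a sequence of accesses (cache state persists between calls). */
size_t count_misses(const uint32_t *addresses, size_t n)
{
    assert(addresses != NULL || n == 0);
    size_t misses = 0;
    for (size_t i = 0; i < n; i++)
        if (!access_set(addresses[i]))
            misses++;
    return misses;
}
```

Alternating between 0CA00006 and 098A0006 now costs only the two initial misses. A third address that maps to line 0006 (for example 0B700006) still forces an eviction, so N-way associativity reduces conflicts without eliminating them.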

Caching – Summary

Fully associative cache:

Based on an address, we search through all cache lines to see if the requested data is available. This kind of cache must be small, or the number of tag comparisons per lookup becomes huge.

Direct-mapped (1-way set associative) cache:

Based on the address, we determine the single cache line where our data could be, and we check only that line. Addresses that map to the same cache line keep evicting each other, which can render the cache useless.

N-way set associative cache:

Every cache line can now hold N addresses. We need to check all N tags, so N is kept small. However, several addresses that map to the same cache line can now be cached at the same time.

So… How does this affect your program?

1. 64 bytes per cache line:

2. 32KB L1 cache, 8-way set associative:

3. Memory latency of 107 cycles:

4. Prefetching:

5. L1 instruction cache:
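The numbers in items 1 and 2 fix the geometry of this L1 cache: 32KB / (64 bytes × 8 ways) = 64 sets. The sketch below derives that, computes which set an address maps to, and counts the cache lines a loop touches. The helper names are hypothetical, and the model ignores details such as virtual versus physical addressing.

```c
#include <assert.h>
#include <stdint.h>

#define LINE_SIZE 64u                                   /* item 1 */
#define L1_SIZE   (32u * 1024u)                         /* item 2: 32KB */
#define L1_WAYS   8u                                    /* item 2: 8-way */
#define L1_SETS   (L1_SIZE / (LINE_SIZE * L1_WAYS))     /* 32768 / 512 = 64 */

/* Which of the 64 sets an address maps to: skip the 6 offset bits
   (64-byte lines), keep the next 6 bits. */
unsigned set_index(uintptr_t address)
{
    return (unsigned)((address / LINE_SIZE) % L1_SETS);
}

/* Distinct cache lines touched when reading `count` elements of `elem_size`
   bytes, `stride` elements apart, from a line-aligned start. Assumes
   elem_size divides LINE_SIZE, so no element straddles two lines. */
unsigned long lines_touched(unsigned long count, unsigned long elem_size,
                            unsigned long stride)
{
    assert(count > 0 && elem_size > 0 && stride > 0);
    assert(LINE_SIZE % elem_size == 0);
    unsigned long lines = 1, prev = 0;   /* element 0 sits in line 0 */
    for (unsigned long i = 1; i < count; i++) {
        unsigned long line = i * stride * elem_size / LINE_SIZE;
        if (line != prev) {              /* line numbers only grow */
            lines++;
            prev = line;
        }
    }
    return lines;
}
```

Two consequences follow. Addresses exactly 4096 bytes apart (64 sets × 64 bytes) land in the same set, and at most 8 of them fit in L1 at once. And reading 1024 consecutive 4-byte ints touches 64 lines, while reading 1024 ints spaced 64 bytes apart touches 1024 lines: the same amount of useful data, but 16 times as many lines fetched.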

PART 2

TOTAL RECAP

“Dear Charles,

In almost every computation a great variety of arrangements for the succession of the processes is possible, and various considerations must influence the selection amongst them (...). One essential object is to choose that arrangement which shall tend to reduce to a minimum the time necessary for completing the calculation. Therefore, one should attend PR3 and learn from it.

Love, Ada.”

10 TIPS straight from Ada Lovelace & Charles Babbage!

“HOW TO PASS PR3”

(0. Read the slides once more.)

1. Choose your tools. (timer, compiler, SVN, Excel, etc.)

2. Measure & note. (original performance, scalability, time for various parts of the app)

3. Take a step back. (think, don’t type: what could be done smarter? – then research)

4. Resist the urge. (don’t touch that sqrtf yet. Improve algorithms instead.)

5. Measure & note. (things changed radically, so measure again, and write things down)

6. Now give in to the urge. (go wild: Cache. Low level. Multithread.)

7. Measure. Note. (don’t forget! More results mean a better report and a higher grade.)

8. Goto 6. (there’s always more to tweak. Mind diminishing returns though.)

9. Add some SIMD. It’s mandatory. (Really. Don’t forget.)

10. Add polish. Hand in. (at least make it *look* professional, it really helps)

Wednesday in the exam week – By MAIL!

FRIDAY
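Tips 2, 5 and 7 all rely on a trustworthy timer. A minimal stopwatch sketch, assuming a POSIX system that provides `clock_gettime` with `CLOCK_MONOTONIC`; on Windows, `QueryPerformanceCounter` serves the same purpose. The `Timer` type and its functions are hypothetical helpers, not part of any course framework.

```c
#define _POSIX_C_SOURCE 199309L   /* exposes clock_gettime on POSIX systems */
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stopwatch built on a monotonic clock (unaffected by
   changes to the wall-clock time). */
typedef struct {
    struct timespec start;
} Timer;

/* Start (or restart) the stopwatch. */
void timer_reset(Timer *t)
{
    assert(t != NULL);
    clock_gettime(CLOCK_MONOTONIC, &t->start);
}

/* Milliseconds elapsed since the last timer_reset. */
double timer_elapsed_ms(const Timer *t)
{
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    return (double)(now.tv_sec - t->start.tv_sec) * 1e3
         + (double)(now.tv_nsec - t->start.tv_nsec) / 1e6;
}
```

Typical use: call `timer_reset` before the code under test and `timer_elapsed_ms` after it, and repeat the measurement a few times, since a single run is easily disturbed by other processes.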

THE END