Post on 17-Jul-2020
LECTURE 6
WELCOME
PART 1
THE CACHE
cache
Why is RAM slow?
Runs at a lower clock speed
Too far from the CPU
c = 300,000 km/s
at 4 GHz: 7.5 cm per cycle
signal speed in copper is lower
actually 5 cm per cycle
so at most 2.5 cm away for a round trip within one cycle
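The arithmetic on this slide can be checked with a few lines of C++. This is a minimal sketch: the function and constant names are made up for illustration, and the copper signal speed of 2.0e8 m/s is an assumption derived backwards from the slide's 5 cm figure, not a measured value.

```cpp
// Distance a signal can cover during one clock cycle, in centimetres.
// Both parameter names are illustrative.
constexpr double distance_per_cycle_cm(double speed_m_per_s, double clock_hz) {
    return speed_m_per_s / clock_hz * 100.0;  // metres -> centimetres
}

constexpr double kLightSpeed  = 3.0e8;  // c, about 300,000 km/s
constexpr double kCopperSpeed = 2.0e8;  // assumed: implied by "5 cm per cycle"
constexpr double kClockHz     = 4.0e9;  // 4 GHz

// A request has to travel to the memory and back within the cycle,
// so the reachable distance is half of the per-cycle distance.
constexpr double round_trip_reach_cm(double speed_m_per_s, double clock_hz) {
    return distance_per_cycle_cm(speed_m_per_s, clock_hz) / 2.0;
}
```

Calling `distance_per_cycle_cm(kLightSpeed, kClockHz)` gives roughly 7.5, and `round_trip_reach_cm(kCopperSpeed, kClockHz)` gives roughly 2.5, matching the slide.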
Level 1 cache
Level 2 cache
Registers: 0 cycles
L1: 2 cycles
L2: 15 cycles
RAM: 80 cycles
Registers: 0 cycles
Level 1 cache (32KB): 4 cycles
Level 2 cache (256KB): 11 cycles
Level 3 cache (6MB): 39 cycles
RAM: 107 cycles
CACHE
0 0050 411CBB372B37
1 0000 0A3246F3762B
2 0030 8910EE24BACF
3 0080 2AB348FE376C
RAM
0000 0A3246F3762B
0010 64000101EA67
0020 2BD634633642
0030 8910EE24BACF
0040 374C34648232
0050 411CBB372B37
0060 283E34A8623A
0070 A83829200176
0080 2AB348FE376C
Fully associative cache
Retrieving data:
CPU wants to read from RAM
Cache searches for address
If found, data is returned
Otherwise, RAM is used
Obtained data is stored in cache
Writing data:
CPU wants to write to RAM
Cache searches for address
If found, data is written
Otherwise, new entry is created
Data to be written is stored in cache
Stored data is written to RAM ‘later’
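The read and write steps above can be sketched as a toy simulation. This is only an illustration under stated assumptions: the names (`ToyCache`, `Entry`, `find`, `allocate`) are hypothetical, RAM is modelled as a map, and the round-robin choice of which entry to replace is an assumption, since the slide does not say how a victim is picked.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// One cache entry: the full address acts as the tag.
struct Entry {
    bool valid = false;
    bool dirty = false;          // written by the CPU, not yet in RAM
    std::uint32_t address = 0;
    std::uint64_t data = 0;
};

// Toy cache with 4 entries, as drawn on the slide.
struct ToyCache {
    std::array<Entry, 4> entries{};
    std::unordered_map<std::uint32_t, std::uint64_t> ram;  // stand-in for RAM
    std::size_t next_victim = 0;  // round-robin replacement (assumption)
    int hits = 0;
    int misses = 0;

    // Search every entry for the address.
    Entry* find(std::uint32_t address) {
        for (Entry& e : entries)
            if (e.valid && e.address == address) return &e;
        return nullptr;
    }

    // Claim a slot for a new address; write the old contents back if needed.
    Entry& allocate(std::uint32_t address) {
        Entry& e = entries[next_victim];
        next_victim = (next_victim + 1) % entries.size();
        if (e.valid && e.dirty) ram[e.address] = e.data;  // the 'later' write
        e.valid = true;
        e.dirty = false;
        e.address = address;
        e.data = 0;
        return e;
    }

    std::uint64_t read(std::uint32_t address) {
        if (Entry* e = find(address)) { ++hits; return e->data; }
        ++misses;
        Entry& e = allocate(address);
        e.data = ram[address];    // fetch from RAM, keep a copy in the cache
        return e.data;
    }

    void write(std::uint32_t address, std::uint64_t value) {
        Entry* e = find(address);
        if (e) { ++hits; } else { ++misses; e = &allocate(address); }
        e->data = value;
        e->dirty = true;          // RAM is only updated on eviction
    }
};
```

In this sketch a written value reaches `ram` only when its entry is evicted, which is one way to read the slide's 'later'; a real cache controller has more options than this.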
CACHE
line tag data
0000 0000 000000000000
0001 0000 000000000000
0002 1A50 8910EE24BACF
0003 0B70 2AB348FE376C
0004 0000 000000000000
0005 0000 000000000000
0006 0000 000000000000
0007 0000 000000000000
Set associative cache
Address: 0B700003
line: 0003, tag: 0B70
Steps:
Split address into ‘line’ and ‘tag’
At cache line ‘line’, verify ‘tag’
If tag matches, return data
Otherwise, get data from RAM
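The four steps above can be sketched in C++. The address layout follows the slide's example (upper 16 bits are the tag, lower 16 bits select the line); everything else is an assumption for illustration: the names `split`, `Line`, `LineCache` and `ram_read` are hypothetical, the cache is given one slot for every possible 16-bit line value rather than the 8 lines drawn on the slide, and a `valid` flag replaces the slide's all-zero empty lines.

```cpp
#include <cstdint>
#include <vector>

struct SplitAddress {
    std::uint16_t line;
    std::uint16_t tag;
};

// Step 1: split the address into 'line' (low 16 bits) and 'tag' (high 16 bits).
inline SplitAddress split(std::uint32_t address) {
    SplitAddress a;
    a.line = static_cast<std::uint16_t>(address & 0xFFFFu);
    a.tag  = static_cast<std::uint16_t>(address >> 16);
    return a;
}

// Hypothetical RAM stand-in: returns a value derived from the address.
inline std::uint64_t ram_read(std::uint32_t address) {
    return 0x1000000000ULL + address;
}

struct Line {
    bool valid = false;
    std::uint16_t tag = 0;
    std::uint64_t data = 0;
};

struct LineCache {
    std::vector<Line> lines = std::vector<Line>(1u << 16);
    int hits = 0;
    int misses = 0;

    std::uint64_t read(std::uint32_t address) {
        const SplitAddress a = split(address);
        Line& l = lines[a.line];             // step 2: go to cache line 'line'
        if (l.valid && l.tag == a.tag) {     // step 3: tag matches, return data
            ++hits;
            return l.data;
        }
        ++misses;                            // step 4: get the data from RAM
        l.valid = true;
        l.tag = a.tag;
        l.data = ram_read(address);
        return l.data;
    }
};
```

For the slide's address, `split(0x0B700003)` yields line `0x0003` and tag `0x0B70`. Only one tag comparison is needed per access, which is the point of this design.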
Address: 0CA00006
line: 0006, tag: 0CA0
Address: 098A0006
line: 0006, tag: 098A
Both addresses map to line 0006, so they compete for the same cache line.
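To see what two addresses on the same line do to this kind of cache, the sketch below counts misses for a sequence of accesses. It assumes the slide's layout (tag in the upper 16 bits, line in the lower 16 bits) and one slot per line; `count_misses` is a hypothetical helper, not part of any library.

```cpp
#include <cstdint>
#include <vector>

// Counts misses for a sequence of addresses in a cache with exactly one
// slot per line. Only tags are tracked; the data itself is not needed here.
inline int count_misses(const std::vector<std::uint32_t>& accesses) {
    std::vector<std::uint32_t> tags(1u << 16, 0);
    std::vector<bool> valid(1u << 16, false);
    int misses = 0;
    for (std::uint32_t address : accesses) {
        const std::uint32_t line = address & 0xFFFFu;
        const std::uint32_t tag = address >> 16;
        if (!valid[line] || tags[line] != tag) {
            ++misses;
            valid[line] = true;
            tags[line] = tag;   // the previous occupant of this line is evicted
        }
    }
    return misses;
}
```

Alternating between `0x0CA00006` and `0x098A0006` misses on every single access, because each one evicts the other. If the second address were on line 0007 instead, only the first access to each address would miss.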
N-Set associative cache
CACHE
line tag 1 data 1
0000 0000 000000000000
0001 0000 000000000000
0002 1A50 8910EE24BACF
0003 0B70 2AB348FE376C
0004 0000 000000000000
0005 0000 000000000000
0006 0000 000000000000
0007 0000 000000000000
CACHE
line tag 2 data 2
0000 0000 000000000000
0001 0000 000000000000
0002 0000 000000000000
0003 0FC0 1056BBA001FF
0004 0000 000000000000
0005 0000 000000000000
0006 0000 000000000000
0007 0000 000000000000
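The two tables above can be modelled as N slots per cache line. The sketch below is a minimal illustration: `NWayCache`, `Slot` and `Set` are hypothetical names, the address layout (tag in the upper 16 bits, line in the lower 16 bits) follows the earlier slides, and round-robin replacement within a line is an assumption.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

template <std::size_t N>
struct NWayCache {
    struct Slot {
        bool valid = false;
        std::uint16_t tag = 0;
    };
    struct Set {
        std::array<Slot, N> slots{};
        std::size_t next_victim = 0;   // round-robin replacement (assumption)
    };

    std::vector<Set> sets = std::vector<Set>(1u << 16);
    int hits = 0;
    int misses = 0;

    void access(std::uint32_t address) {
        Set& set = sets[address & 0xFFFFu];
        const std::uint16_t tag = static_cast<std::uint16_t>(address >> 16);
        for (Slot& s : set.slots) {    // check all N tags of this line
            if (s.valid && s.tag == tag) { ++hits; return; }
        }
        ++misses;
        Slot& victim = set.slots[set.next_victim];
        set.next_victim = (set.next_victim + 1) % N;
        victim.valid = true;
        victim.tag = tag;
    }
};
```

With `NWayCache<2>`, the tags 0B70 and 0FC0 from the tables can both live on line 0003: alternating between `0x0B700003` and `0x0FC00003` misses only on the first access to each.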
Caching – Summary
Fully associative cache:
Based on an address, we search through all cache lines to see if the requested data is available. This kind of cache must be small, or the number of tag comparisons becomes huge.
Set associative cache (one slot per line, usually called direct mapped):
Based on the address, we determine the single cache line where our data could be, and check only that line. Addresses that map to the same cache line keep evicting each other, which renders the cache useless for them.
N-Set associative cache (N-way set associative):
Every cache line can now hold N addresses. We need to check all N tags, so N must stay small. In return, several addresses sharing the same cache line can be cached at the same time.
So… How does this affect your program?
1. 64 bytes per cache line:
2. 32KB L1 cache, 8-way set associative:
3. Memory latency of 107 cycles:
4. Prefetching:
5. L1 instruction cache:
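As one illustration of point 1, the sketch below sums the same matrix in two traversal orders. It assumes 4-byte `int` values, so a 64-byte cache line holds 16 of them; the function names are made up for this example. Both functions return the same result, and only the memory access pattern differs. How much the difference matters in practice depends on the hardware and the prefetcher, so it should be measured rather than assumed.

```cpp
#include <cstddef>
#include <vector>

// The matrix is stored row by row in one contiguous vector.

// Walks memory in address order: every int of a fetched cache line is used
// before the next line is needed.
inline long long sum_row_order(const std::vector<int>& m,
                               std::size_t rows, std::size_t cols) {
    long long sum = 0;
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            sum += m[r * cols + c];
    return sum;
}

// Jumps cols * sizeof(int) bytes between accesses: for wide matrices each
// access lands on a different cache line.
inline long long sum_column_order(const std::vector<int>& m,
                                  std::size_t rows, std::size_t cols) {
    long long sum = 0;
    for (std::size_t c = 0; c < cols; ++c)
        for (std::size_t r = 0; r < rows; ++r)
            sum += m[r * cols + c];
    return sum;
}
```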
PART 2
TOTAL RECAP
“Dear Charles,
In almost every computation a great variety of arrangements for the succession of the processes is possible, and various considerations must influence the selection amongst them (...).
One essential object is to choose that arrangement which shall tend to reduce to a minimum the time necessary for completing the calculation.
Therefore, one should attend PR3 and learn from it.
Love, Ada.”
10 TIPS straight from Ada Lovelace & Charles Babbage!
“HOW TO PASS PR3”
(0. Read the slides once more.)
1. Choose your tools. (timer, compiler, SVN, Excel, etc.)
2. Measure & note. (original performance, scalability, time for various parts of the app)
3. Take a step back. (think, don’t type: what could be done smarter? – then research)
4. Resist the urge. (don’t touch that sqrtf yet. Improve algorithms instead)
5. Measure & note. (things changed radically, so measure again, and write down things)
6. Now give in to the urge. (go wild: Cache. Low level. Multithread.)
7. Measure. Note. (don’t forget! More results means a better report and a higher grade.)
8. Goto 6. (there’s always more to tweak. Mind diminishing returns though.)
9. Add some SIMD. It’s mandatory. (really. Don’t forget.)
10. Add polish. Hand in. (at least make it *look* professional, it really helps)
Wednesday in the exam week – By MAIL!
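For the "Measure & note" tips, a timer can be as small as the sketch below. It uses `std::chrono::steady_clock` from the C++ standard library; the helper name `time_ms` is made up for this example, and it is only one possible choice of timer, not the one prescribed for the course.

```cpp
#include <chrono>

// Runs `work` once and returns the elapsed wall-clock time in milliseconds.
template <typename F>
double time_ms(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Passing a lambda that calls the code under test, for example `time_ms([&] { render_frame(); })` with a hypothetical `render_frame`, gives one measurement. Repeating the run several times and noting all results gives a more reliable picture than a single number.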
FRIDAY
THE END