LECTURE 6 WELCOME - Utrecht University

LECTURE 6

WELCOME

PART 1

THE CACHE

Why is RAM slow?

Runs at a lower clock speed

Too far from the CPU

c = 300,000 km/s

at 4 GHz: 7.5 cm per cycle

signal speed in copper is lower

actually 5 cm per cycle

so 2.5 cm there and back
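The distances above can be checked with a few lines of arithmetic. This is only a sketch: the helper name `distance_per_cycle_cm` is made up for illustration, and the factor 2/3 for the signal speed in copper is an assumption chosen because it reproduces the slide's 5 cm figure.

```python
# Back-of-the-envelope check of the slide's numbers.
C_VACUUM_M_PER_S = 300_000_000   # speed of light, rounded as on the slide
CLOCK_HZ = 4_000_000_000         # 4 GHz

def distance_per_cycle_cm(signal_speed_m_per_s, clock_hz):
    """Distance a signal covers during one clock cycle, in centimetres."""
    return signal_speed_m_per_s * 100 / clock_hz

vacuum_cm = distance_per_cycle_cm(C_VACUUM_M_PER_S, CLOCK_HZ)

# Assumption: signals in copper travel at roughly 2/3 of c.
copper_cm = distance_per_cycle_cm(C_VACUUM_M_PER_S * 2 / 3, CLOCK_HZ)

# A request has to travel to the memory and the answer has to come back,
# so only half of that distance is reachable within a single cycle.
round_trip_reach_cm = copper_cm / 2

print(vacuum_cm, copper_cm, round_trip_reach_cm)
```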

Level 1 cache

Level 2 cache

Registers: 0 cycles

L1: 2 cycles

L2: 15 cycles

RAM: 80 cycles

Level 1 cache

Level 2 cache

Registers: 0 cycles

L1: 4 cycles

L2: 11 cycles

L3: 39 cycles

RAM: 107 cycles

Level 3 cache

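To get a feeling for what these latencies mean on average, the sketch below computes the expected cost of one memory access. The latencies are the ones listed above. The hit rates are hypothetical, picked only for illustration, and the model assumes that each listed latency is the total cost of being served by that level.

```python
# Latencies in cycles, as listed on the slide.
LATENCY = {"L1": 4, "L2": 11, "L3": 39, "RAM": 107}

def average_access_cycles(hit_rates, latency=LATENCY):
    """Expected cycles per access when each level is tried only after
    the previous one missed."""
    expected = 0.0
    p_reach = 1.0                      # probability that we get this far
    for level in ("L1", "L2", "L3"):
        p_hit = hit_rates[level]
        expected += p_reach * p_hit * latency[level]
        p_reach *= 1.0 - p_hit
    return expected + p_reach * latency["RAM"]

# Hypothetical hit rates, not measured values.
example_rates = {"L1": 0.9, "L2": 0.8, "L3": 0.5}
print(average_access_cycles(example_rates))
```

With these made-up rates the average lands just under 6 cycles, even though RAM costs 107: as long as most accesses hit in L1, the slow levels hardly matter.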

Cache (slot, address, data):

0  0050  411CBB372B37
1  0000  0A3246F3762B
2  0030  8910EE24BACF
3  0080  2AB348FE376C

RAM (address, data):

0000  0A3246F3762B
0010  64000101EA67
0020  2BD634633642
0030  8910EE24BACF
0040  374C34648232
0050  411CBB372B37
0060  283E34A8623A
0070  A83829200176
0080  2AB348FE376C

Fully associative cache


Retrieving data:

CPU wants to read from RAM

Cache searches for address

If found, data is returned

Otherwise, RAM is used

Obtained data is stored in cache
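The read steps above can be sketched as a small simulation. The RAM contents are taken from the slide. The class name `FullyAssociativeCache` and the round-robin replacement are assumptions: the slides do not say which entry is replaced when the cache is full.

```python
# RAM contents as shown on the slide (address -> data).
RAM = {
    0x0000: "0A3246F3762B", 0x0010: "64000101EA67", 0x0020: "2BD634633642",
    0x0030: "8910EE24BACF", 0x0040: "374C34648232", 0x0050: "411CBB372B37",
    0x0060: "283E34A8623A", 0x0070: "A83829200176", 0x0080: "2AB348FE376C",
}

class FullyAssociativeCache:
    def __init__(self, slots=4):
        self.entries = [None] * slots   # each entry is (address, data)
        self.next_victim = 0            # assumption: round-robin replacement
        self.hits = 0
        self.misses = 0

    def read(self, address):
        # The cache searches every slot for the address.
        for entry in self.entries:
            if entry is not None and entry[0] == address:
                self.hits += 1
                return entry[1]
        # Not found: use RAM and store the obtained data in the cache.
        self.misses += 1
        data = RAM[address]
        self.entries[self.next_victim] = (address, data)
        self.next_victim = (self.next_victim + 1) % len(self.entries)
        return data

cache = FullyAssociativeCache()
for address in (0x0050, 0x0000, 0x0030, 0x0080, 0x0050):
    cache.read(address)
print(cache.hits, cache.misses)   # prints "1 4"
```

After this sequence the four slots hold the same addresses as the cache on the slide, and only the repeated read of 0050 is a hit.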

Writing data:

CPU wants to write to RAM

Cache searches for address

If found, data is written

Otherwise, new entry is created

Data to be written is stored in cache

Stored data is written to RAM ‘later’
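The write steps above describe what is commonly called a write-back cache. A minimal sketch, under stated assumptions: 'later' is modelled as "when the entry is evicted, or when `flush` is called", the oldest entry is evicted first, and the names `WriteBackCache` and `flush` are made up for this example.

```python
class WriteBackCache:
    def __init__(self, ram, slots=4):
        self.ram = ram
        self.slots = slots
        self.entries = {}   # address -> [data, dirty]; dicts keep insertion order

    def write(self, address, data):
        # If the address is not cached and the cache is full, make room first.
        if address not in self.entries and len(self.entries) == self.slots:
            self._evict_oldest()
        # The data is stored in the cache only and marked as not yet in RAM.
        self.entries[address] = [data, True]

    def _evict_oldest(self):
        oldest = next(iter(self.entries))
        data, dirty = self.entries.pop(oldest)
        if dirty:
            self.ram[oldest] = data      # the 'later' write to RAM

    def flush(self):
        # Write everything that is still pending back to RAM.
        for address, entry in self.entries.items():
            if entry[1]:
                self.ram[address] = entry[0]
                entry[1] = False

ram = {}
cache = WriteBackCache(ram, slots=2)
cache.write(0x0010, "AA")
print(ram)          # still empty: the write has not reached RAM yet
cache.flush()
print(ram)          # now contains the value for address 0x0010
```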

line tag data

0000 0000 000000000000

0001 0000 000000000000

0002 1A50 8910EE24BACF

0003 0B70 2AB348FE376C

0004 0000 000000000000

0005 0000 000000000000

0006 0000 000000000000

0007 0000 000000000000

Set associative cache


Address: 0B700003

line tag

0003 0B70

Steps:

Split address in ‘line’ and ‘tag’

At cache line ‘line’, verify ‘tag’

If tag matches, return data

Otherwise, get data from RAM
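The steps above can be sketched as follows. The split uses the slide's example (upper four hex digits are the tag, lower four the line), and the two RAM entries are the ones shown in the table. The names `split_address` and `OneEntryPerLineCache` are made up for illustration, and a dictionary stands in for the array of cache lines.

```python
def split_address(address):
    """Split a 32-bit address as on the slide: low 16 bits = line, high 16 bits = tag."""
    return address & 0xFFFF, address >> 16

class OneEntryPerLineCache:
    def __init__(self, ram):
        self.ram = ram
        self.lines = {}   # line -> (tag, data); absent lines are empty

    def read(self, address):
        line, tag = split_address(address)
        entry = self.lines.get(line)
        # At cache line 'line', verify 'tag'.
        if entry is not None and entry[0] == tag:
            return entry[1], True            # tag matches: return cached data
        # Otherwise get the data from RAM and remember it in this line.
        data = self.ram[address]
        self.lines[line] = (tag, data)
        return data, False

# Data taken from the table on the slide.
ram = {0x1A500002: "8910EE24BACF", 0x0B700003: "2AB348FE376C"}
cache = OneEntryPerLineCache(ram)
print(cache.read(0x0B700003))   # first read: miss, data comes from RAM
print(cache.read(0x0B700003))   # second read: hit
```

Only one tag comparison is needed per lookup, which is what makes this scheme cheaper than searching every entry.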


Address: 0CA00006

line tag

0006 0CA0

Address: 098A0006

line tag

0006 098A

Both addresses map to line 0006 but have different tags, so they compete for the same cache line.
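The effect of two addresses sharing one line can be made visible by counting misses. A sketch, using the two addresses from the slide and the same 16-bit/16-bit split; `count_misses` is a hypothetical helper that tracks only tags, not data.

```python
def split_address(address):
    """Low 16 bits = line, high 16 bits = tag, as in the slide's example."""
    return address & 0xFFFF, address >> 16

def count_misses(addresses):
    lines = {}          # line -> tag currently stored in that line
    misses = 0
    for address in addresses:
        line, tag = split_address(address)
        if lines.get(line) != tag:
            misses += 1
            lines[line] = tag       # the new tag replaces the old one
    return misses

alternating = [0x0CA00006, 0x098A0006] * 4   # both map to line 0006
same_address = [0x0CA00006] * 8

print(count_misses(alternating))    # prints 8: every access misses
print(count_misses(same_address))   # prints 1: only the first access misses
```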

N-way set associative cache

line tag 1 data 1

0000 0000 000000000000

0001 0000 000000000000

0002 1A50 8910EE24BACF

0003 0B70 2AB348FE376C

0004 0000 000000000000

0005 0000 000000000000

0006 0000 000000000000

0007 0000 000000000000

line tag 2 data 2

0000 0000 000000000000

0001 0000 000000000000

0002 0000 000000000000

0003 0FC0 1056BBA001FF

0004 0000 000000000000

0005 0000 000000000000

0006 0000 000000000000

0007 0000 000000000000
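With two tag/data pairs per line, the two entries for line 0003 in the tables above (tags 0B70 and 0FC0) can live in the cache together. A sketch with N = 2: the class name `NWayCache` is made up, and evicting the least recently used entry is an assumption, since the slides do not name a replacement policy.

```python
from collections import OrderedDict

class NWayCache:
    def __init__(self, ways=2):
        self.ways = ways
        self.sets = {}   # line -> OrderedDict of tag -> data, least recently used first

    def access(self, address, data_from_ram):
        line, tag = address & 0xFFFF, address >> 16
        entries = self.sets.setdefault(line, OrderedDict())
        if tag in entries:                    # check all N tags of this line
            entries.move_to_end(tag)          # mark as most recently used
            return entries[tag], True
        if len(entries) == self.ways:
            entries.popitem(last=False)       # assumption: evict least recently used
        entries[tag] = data_from_ram
        return data_from_ram, False

# Both addresses map to line 0003; the data is taken from the two tables.
accesses = [(0x0B700003, "2AB348FE376C"), (0x0FC00003, "1056BBA001FF")] * 4

cache = NWayCache(ways=2)
misses = sum(1 for address, data in accesses if not cache.access(address, data)[1])
print(misses)   # prints 2: after the first two misses, both entries stay cached
```

Running the same sequence with `ways=1` gives 8 misses, which is the single-entry behaviour where the two addresses keep replacing each other.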

Caching – Summary

Fully associative cache:

Based on an address, we search through all cache lines to see if the requested data is available. This kind of cache must be small, or the number of tests is huge.

Set associative cache:

Based on the address, we determine the one cache line where our data could be, and check only that line to see if the data is available. Addresses that end up in the same cache line keep evicting each other, which renders the cache useless for that data. (With a single entry per line, this scheme is usually called direct-mapped.)

N-way set associative cache:

Every cache line can now hold N addresses. We need to check all N tags, so N must be small. In return, several addresses sharing the same cache line can be cached at the same time.

So… How does this affect your program?

1. 64 bytes per cache line:

2. 32 KB L1 cache, 8-way set associative:

3. Memory latency of 107 cycles:

4. Prefetching:

5. L1 instruction cache:
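The first two points can be explored with a small simulation of a 32 KB, 8-way set associative cache with 64-byte lines. This is a sketch only: it counts misses rather than measuring time, it assumes least-recently-used replacement, and it splits addresses into offset, set index and tag in the usual hardware fashion rather than with the simplified 16-bit split used in the earlier slides. `count_misses` is a hypothetical helper.

```python
from collections import OrderedDict

LINE_BYTES = 64
WAYS = 8
CACHE_BYTES = 32 * 1024
NUM_SETS = CACHE_BYTES // (LINE_BYTES * WAYS)    # 64 sets

def count_misses(addresses):
    sets = [OrderedDict() for _ in range(NUM_SETS)]
    misses = 0
    for address in addresses:
        block = address // LINE_BYTES    # which 64-byte line of memory
        index = block % NUM_SETS         # which set that line maps to
        tag = block // NUM_SETS
        ways = sets[index]
        if tag in ways:
            ways.move_to_end(tag)        # hit: mark as most recently used
        else:
            misses += 1
            if len(ways) == WAYS:
                ways.popitem(last=False) # assumption: evict least recently used
            ways[tag] = True
    return misses

set_stride = NUM_SETS * LINE_BYTES       # 4096 bytes: same set, different tag

every_byte = range(0, 64 * 1024)                      # 65536 accesses
one_per_line = range(0, 64 * 1024, LINE_BYTES)        # 1024 accesses
eight_in_one_set = [i * set_stride for i in range(8)] * 10
sixteen_in_one_set = [i * set_stride for i in range(16)] * 10

print(count_misses(every_byte))          # 1024: one miss per 64-byte line
print(count_misses(one_per_line))        # 1024: same misses, far fewer accesses
print(count_misses(eight_in_one_set))    # 8: eight lines fit in one 8-way set
print(count_misses(sixteen_in_one_set))  # 160: sixteen lines thrash the set
```

Reading every byte costs no more misses than touching one byte per line, and addresses that are a multiple of 4096 bytes apart all compete for the same eight ways.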

PART 2

TOTAL RECAP

“Dear Charles,

In almost every computation a great variety of arrangements for the succession of the processes is possible, and various considerations must influence the selection amongst them (...). One essential object is to choose that arrangement which shall tend to reduce to a minimum the time necessary for completing the calculation. Therefore, one should attend PR3 and learn from it.

Love, Ada.”

10 TIPS straight from Ada Lovelace & Charles Babbage!

“HOW TO PASS PR3”

(0. Read the slides once more.)

1. Choose your tools. (timer, compiler, SVN, Excel, etc.)

2. Measure & note. (original performance, scalability, time for various parts of the app)

3. Take a step back. (think, don’t type: what could be done smarter? – then research)

4. Resist the urge. (don’t touch that sqrtf yet. Improve algorithms instead)

5. Measure & note. (things changed radically, so measure again, and write down things)

6. Now give in to the urge. (go wild: Cache. Low level. Multithread.)

7. Measure. Note. (don’t forget! More results means a better report and a higher grade.)

8. Goto 6. (there’s always more to tweak. Mind diminishing returns though.)

9. Add some SIMD. It’s mandatory. (really. Don’t forget.)

10. Add polish. Hand in. (at least make it *look* professional, it really helps)
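For the "Measure & note" steps, any timer that is applied consistently will do. A minimal sketch in Python, assuming that the best of several runs is a reasonable figure to write down; `measure` and `work` are hypothetical names, and the course itself does not prescribe this helper.

```python
import time

def measure(func, *args, repeats=5):
    """Run func several times and return the best wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best

def work(n):
    # Stand-in for the part of the application being measured.
    return sum(i * i for i in range(n))

# Note down the number before and after every change.
print(f"work(100000): {measure(work, 100_000) * 1000:.3f} ms")
```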

Wednesday in the exam week – By MAIL!

FRIDAY

THE END
