Lecture 14: Caching, cont. EEN 312: Processors: Hardware, Software, and Interfacing Department of...
-
Upload
dora-eaton -
Category
Documents
-
view
219 -
download
0
Transcript of Lecture 14: Caching, cont. EEN 312: Processors: Hardware, Software, and Interfacing Department of...
![Page 1: Lecture 14: Caching, cont. EEN 312: Processors: Hardware, Software, and Interfacing Department of Electrical and Computer Engineering Spring 2014, Dr.](https://reader036.fdocuments.in/reader036/viewer/2022062322/5697bff31a28abf838cbc9d3/html5/thumbnails/1.jpg)
Lecture 14: Caching, cont.
EEN 312: Processors: Hardware, Software, and Interfacing
Department of Electrical and Computer Engineering
Spring 2014, Dr. Rozier (UM)
![Page 2: Lecture 14: Caching, cont. EEN 312: Processors: Hardware, Software, and Interfacing Department of Electrical and Computer Engineering Spring 2014, Dr.](https://reader036.fdocuments.in/reader036/viewer/2022062322/5697bff31a28abf838cbc9d3/html5/thumbnails/2.jpg)
EXAM
![Page 3: Lecture 14: Caching, cont. EEN 312: Processors: Hardware, Software, and Interfacing Department of Electrical and Computer Engineering Spring 2014, Dr.](https://reader036.fdocuments.in/reader036/viewer/2022062322/5697bff31a28abf838cbc9d3/html5/thumbnails/3.jpg)
Exam is one week from Thursday
• Thursday, April 3rd
• Covers:– Chapter 4– Chapter 5– Buffer Overflows– Exam 1 material fair game
![Page 4: Lecture 14: Caching, cont. EEN 312: Processors: Hardware, Software, and Interfacing Department of Electrical and Computer Engineering Spring 2014, Dr.](https://reader036.fdocuments.in/reader036/viewer/2022062322/5697bff31a28abf838cbc9d3/html5/thumbnails/4.jpg)
LAB QUESTIONS?
![Page 5: Lecture 14: Caching, cont. EEN 312: Processors: Hardware, Software, and Interfacing Department of Electrical and Computer Engineering Spring 2014, Dr.](https://reader036.fdocuments.in/reader036/viewer/2022062322/5697bff31a28abf838cbc9d3/html5/thumbnails/5.jpg)
CACHING
![Page 6: Lecture 14: Caching, cont. EEN 312: Processors: Hardware, Software, and Interfacing Department of Electrical and Computer Engineering Spring 2014, Dr.](https://reader036.fdocuments.in/reader036/viewer/2022062322/5697bff31a28abf838cbc9d3/html5/thumbnails/6.jpg)
Caching Review
• What is temporal locality?
• What is spatial locality?
![Page 7: Lecture 14: Caching, cont. EEN 312: Processors: Hardware, Software, and Interfacing Department of Electrical and Computer Engineering Spring 2014, Dr.](https://reader036.fdocuments.in/reader036/viewer/2022062322/5697bff31a28abf838cbc9d3/html5/thumbnails/7.jpg)
Caching Review
• What is a cache set?
• What is a cache line?
• What is a cache block?
![Page 8: Lecture 14: Caching, cont. EEN 312: Processors: Hardware, Software, and Interfacing Department of Electrical and Computer Engineering Spring 2014, Dr.](https://reader036.fdocuments.in/reader036/viewer/2022062322/5697bff31a28abf838cbc9d3/html5/thumbnails/8.jpg)
Caching Review
• What is a direct mapped cache?
• What is a set associative cache?
![Page 9: Lecture 14: Caching, cont. EEN 312: Processors: Hardware, Software, and Interfacing Department of Electrical and Computer Engineering Spring 2014, Dr.](https://reader036.fdocuments.in/reader036/viewer/2022062322/5697bff31a28abf838cbc9d3/html5/thumbnails/9.jpg)
Caching Review
• What is a cache miss?
• What happens on a miss?
![Page 10: Lecture 14: Caching, cont. EEN 312: Processors: Hardware, Software, and Interfacing Department of Electrical and Computer Engineering Spring 2014, Dr.](https://reader036.fdocuments.in/reader036/viewer/2022062322/5697bff31a28abf838cbc9d3/html5/thumbnails/10.jpg)
ASSOCIATIVITY AND MISS RATES
![Page 11: Lecture 14: Caching, cont. EEN 312: Processors: Hardware, Software, and Interfacing Department of Electrical and Computer Engineering Spring 2014, Dr.](https://reader036.fdocuments.in/reader036/viewer/2022062322/5697bff31a28abf838cbc9d3/html5/thumbnails/11.jpg)
How Much Associativity
• Increased associativity decreases miss rate– But with diminishing returns
• Simulation of a system with 64KBD-cache, 16-word blocks, SPEC2000– 1-way: 10.3%– 2-way: 8.6%– 4-way: 8.3%– 8-way: 8.1%
![Page 12: Lecture 14: Caching, cont. EEN 312: Processors: Hardware, Software, and Interfacing Department of Electrical and Computer Engineering Spring 2014, Dr.](https://reader036.fdocuments.in/reader036/viewer/2022062322/5697bff31a28abf838cbc9d3/html5/thumbnails/12.jpg)
Set Associative Cache Organization
![Page 13: Lecture 14: Caching, cont. EEN 312: Processors: Hardware, Software, and Interfacing Department of Electrical and Computer Engineering Spring 2014, Dr.](https://reader036.fdocuments.in/reader036/viewer/2022062322/5697bff31a28abf838cbc9d3/html5/thumbnails/13.jpg)
Multilevel Caches
• Primary cache attached to CPU– Small, but fast
• Level-2 cache services misses from primary cache– Larger, slower, but still faster than main memory
• Main memory services L-2 cache misses• Some high-end systems include L-3 cache
![Page 14: Lecture 14: Caching, cont. EEN 312: Processors: Hardware, Software, and Interfacing Department of Electrical and Computer Engineering Spring 2014, Dr.](https://reader036.fdocuments.in/reader036/viewer/2022062322/5697bff31a28abf838cbc9d3/html5/thumbnails/14.jpg)
Multilevel Cache Example
• Given– CPU base CPI = 1, clock rate = 4GHz– Miss rate/instruction = 2%– Main memory access time = 100ns
• With just primary cache– Miss penalty = 100ns/0.25ns = 400 cycles– Effective CPI = 1 + 0.02 × 400 = 9
![Page 15: Lecture 14: Caching, cont. EEN 312: Processors: Hardware, Software, and Interfacing Department of Electrical and Computer Engineering Spring 2014, Dr.](https://reader036.fdocuments.in/reader036/viewer/2022062322/5697bff31a28abf838cbc9d3/html5/thumbnails/15.jpg)
Example (cont.)
• Now add L-2 cache– Access time = 5ns– Global miss rate to main memory = 0.5%
• Primary miss with L-2 hit– Penalty = 5ns/0.25ns = 20 cycles
• Primary miss with L-2 miss– Extra penalty = 500 cycles
• CPI = 1 + 0.02 × 20 + 0.005 × 400 = 3.4• Performance ratio = 9/3.4 = 2.6
![Page 16: Lecture 14: Caching, cont. EEN 312: Processors: Hardware, Software, and Interfacing Department of Electrical and Computer Engineering Spring 2014, Dr.](https://reader036.fdocuments.in/reader036/viewer/2022062322/5697bff31a28abf838cbc9d3/html5/thumbnails/16.jpg)
Multilevel Cache Considerations
• Primary cache– Focus on minimal hit time
• L-2 cache– Focus on low miss rate to avoid main memory
access– Hit time has less overall impact
• Results– L-1 cache usually smaller than a single cache– L-1 block size smaller than L-2 block size
![Page 17: Lecture 14: Caching, cont. EEN 312: Processors: Hardware, Software, and Interfacing Department of Electrical and Computer Engineering Spring 2014, Dr.](https://reader036.fdocuments.in/reader036/viewer/2022062322/5697bff31a28abf838cbc9d3/html5/thumbnails/17.jpg)
Interactions with Advanced CPUs
• Out-of-order CPUs can execute instructions during cache miss– Pending store stays in load/store unit– Dependent instructions wait in reservation
stations• Independent instructions continue
• Effect of miss depends on program data flow– Much harder to analyse– Use system simulation
![Page 18: Lecture 14: Caching, cont. EEN 312: Processors: Hardware, Software, and Interfacing Department of Electrical and Computer Engineering Spring 2014, Dr.](https://reader036.fdocuments.in/reader036/viewer/2022062322/5697bff31a28abf838cbc9d3/html5/thumbnails/18.jpg)
Interactions with Software• Misses depend on
memory access patterns– Algorithm behavior– Compiler optimization
for memory access
![Page 19: Lecture 14: Caching, cont. EEN 312: Processors: Hardware, Software, and Interfacing Department of Electrical and Computer Engineering Spring 2014, Dr.](https://reader036.fdocuments.in/reader036/viewer/2022062322/5697bff31a28abf838cbc9d3/html5/thumbnails/19.jpg)
Replacement Policy
• Direct mapped: no choice• Set associative
– Prefer non-valid entry, if there is one– Otherwise, choose among entries in the set
• Least-recently used (LRU)– Choose the one unused for the longest time
• Simple for 2-way, manageable for 4-way, too hard beyond that
• Random– Gives approximately the same performance as
LRU for high associativity
![Page 20: Lecture 14: Caching, cont. EEN 312: Processors: Hardware, Software, and Interfacing Department of Electrical and Computer Engineering Spring 2014, Dr.](https://reader036.fdocuments.in/reader036/viewer/2022062322/5697bff31a28abf838cbc9d3/html5/thumbnails/20.jpg)
Replacement Policy
• Optimal algorithm:– The clairvoyant algorithm– Not a real possibility
– Why is it useful to consider?
![Page 21: Lecture 14: Caching, cont. EEN 312: Processors: Hardware, Software, and Interfacing Department of Electrical and Computer Engineering Spring 2014, Dr.](https://reader036.fdocuments.in/reader036/viewer/2022062322/5697bff31a28abf838cbc9d3/html5/thumbnails/21.jpg)
Replacement Policy
• Optimal algorithm:– The clairvoyant algorithm– Not a real possibility
– Why is it useful to consider?
– We can never beat the Oracle– Good baseline to compare to
• Get as close as possible
![Page 22: Lecture 14: Caching, cont. EEN 312: Processors: Hardware, Software, and Interfacing Department of Electrical and Computer Engineering Spring 2014, Dr.](https://reader036.fdocuments.in/reader036/viewer/2022062322/5697bff31a28abf838cbc9d3/html5/thumbnails/22.jpg)
LRU
• Least Recently Used– Discard the data which was used least recently.
– Need to maintain information on the age of a cache. How do we do so efficiently?
Break into GroupsPropose an algorithm
![Page 23: Lecture 14: Caching, cont. EEN 312: Processors: Hardware, Software, and Interfacing Department of Electrical and Computer Engineering Spring 2014, Dr.](https://reader036.fdocuments.in/reader036/viewer/2022062322/5697bff31a28abf838cbc9d3/html5/thumbnails/23.jpg)
Using a BDD
![Page 24: Lecture 14: Caching, cont. EEN 312: Processors: Hardware, Software, and Interfacing Department of Electrical and Computer Engineering Spring 2014, Dr.](https://reader036.fdocuments.in/reader036/viewer/2022062322/5697bff31a28abf838cbc9d3/html5/thumbnails/24.jpg)
LRU is Expensive
• Age bits take a lot to store
• What about pseudo LRU?
![Page 25: Lecture 14: Caching, cont. EEN 312: Processors: Hardware, Software, and Interfacing Department of Electrical and Computer Engineering Spring 2014, Dr.](https://reader036.fdocuments.in/reader036/viewer/2022062322/5697bff31a28abf838cbc9d3/html5/thumbnails/25.jpg)
Using a BDD
• BDDs have decisions at their leaf nodes
• BDDs have variables at their non-leaf nodes
• Each path from a non-leaf to a leaf node is an evaluation of the variable at the node
• Standard convention:– Left node on 0– Right node on 1
![Page 26: Lecture 14: Caching, cont. EEN 312: Processors: Hardware, Software, and Interfacing Department of Electrical and Computer Engineering Spring 2014, Dr.](https://reader036.fdocuments.in/reader036/viewer/2022062322/5697bff31a28abf838cbc9d3/html5/thumbnails/26.jpg)
BDD for LRU 4-way associative cache
all lines valid
all lines valid
b0 is 0b0 is 0Choose invalid
Line
Choose invalid
Line
b1 is 0b1 is 0 b1 is 0b1 is 0
line0line0 line1line1 line2line2 line3line3
![Page 27: Lecture 14: Caching, cont. EEN 312: Processors: Hardware, Software, and Interfacing Department of Electrical and Computer Engineering Spring 2014, Dr.](https://reader036.fdocuments.in/reader036/viewer/2022062322/5697bff31a28abf838cbc9d3/html5/thumbnails/27.jpg)
BDD for LRU 4-way associative cache
all lines valid
all lines valid
b0 is 0b0 is 0Choose invalid
Line
Choose invalid
Line
b1 is 0b1 is 0 b1 is 0b1 is 0
line0line0 line1line1 line2line2 line3line3
![Page 28: Lecture 14: Caching, cont. EEN 312: Processors: Hardware, Software, and Interfacing Department of Electrical and Computer Engineering Spring 2014, Dr.](https://reader036.fdocuments.in/reader036/viewer/2022062322/5697bff31a28abf838cbc9d3/html5/thumbnails/28.jpg)
BDD for LRU 4-way associative cache
• Pseudo-LRU– Uses only 3-bits
![Page 29: Lecture 14: Caching, cont. EEN 312: Processors: Hardware, Software, and Interfacing Department of Electrical and Computer Engineering Spring 2014, Dr.](https://reader036.fdocuments.in/reader036/viewer/2022062322/5697bff31a28abf838cbc9d3/html5/thumbnails/29.jpg)
WRAP UP
![Page 30: Lecture 14: Caching, cont. EEN 312: Processors: Hardware, Software, and Interfacing Department of Electrical and Computer Engineering Spring 2014, Dr.](https://reader036.fdocuments.in/reader036/viewer/2022062322/5697bff31a28abf838cbc9d3/html5/thumbnails/30.jpg)
For next time
• Read Chapter 5, section 5.4