Block size for caches
CS 352: Computer Organization and Design, University of Wisconsin-Eau Claire, Dan Ernst
Block size for caches (worked example; the slide figure is animated over several frames)
• Cache: fully associative, 2 cache lines, 2-byte blocks, 3-bit tag field, one valid bit (V) per line, LRU replacement
• Memory: addresses 0 to 15 hold 100, 110, 120, ..., 250 (blocks 0 to 7)
• Reference stream: Ld R1 M[1]; Ld R2 M[5]; Ld R3 M[1]; Ld R3 M[4]; Ld R2 M[0]
  – Ld R1 M[1]: address 0001, miss; block (100, 110) is loaded with tag 0, R1 = 110 (Misses: 1, Hits: 0)
  – Ld R2 M[5]: address 0101, miss; block (140, 150) is loaded with tag 2, R2 = 150 (Misses: 2, Hits: 0)
  – Ld R3 M[1]: hit, R3 = 110 (Misses: 2, Hits: 1)
  – Ld R3 M[4]: hit, R3 = 140 (Misses: 2, Hits: 2)
  – Ld R2 M[0]: hit, R2 = 100 (Misses: 2, Hits: 3)
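The walkthrough above can be replayed in code. The following is a minimal C sketch, assuming a fully associative cache with LRU replacement, 2 lines, and 2-byte blocks. The names `Cache`, `cache_init`, and `cache_access` are illustrative and do not come from the slides, and the sketch tracks only tags, not the data bytes.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_LINES  2   /* two cache lines, as in the example */
#define BLOCK_SIZE 2   /* bytes per block */

typedef struct {
    int valid;           /* V bit */
    unsigned tag;        /* block number: address / BLOCK_SIZE */
    unsigned last_used;  /* time of last access, used to find the LRU line */
} Line;

typedef struct {
    Line lines[NUM_LINES];
    unsigned now;        /* access counter serving as a clock */
    unsigned hits;
    unsigned misses;
} Cache;

/* Mark every line invalid and clear the counters. */
void cache_init(Cache *c) {
    assert(c != NULL);
    for (size_t i = 0; i < NUM_LINES; i++) {
        c->lines[i].valid = 0;
        c->lines[i].tag = 0;
        c->lines[i].last_used = 0;
    }
    c->now = 0;
    c->hits = 0;
    c->misses = 0;
}

/* Look up one byte address. Returns 1 on a hit, 0 on a miss. */
int cache_access(Cache *c, unsigned addr) {
    unsigned tag = addr / BLOCK_SIZE;
    size_t victim = 0;

    c->now++;
    /* Fully associative: compare against the tag of every valid line. */
    for (size_t i = 0; i < NUM_LINES; i++) {
        if (c->lines[i].valid && c->lines[i].tag == tag) {
            c->lines[i].last_used = c->now;
            c->hits++;
            return 1;
        }
    }
    /* Miss: fill an invalid line if one exists, else replace the LRU line. */
    for (size_t i = 0; i < NUM_LINES; i++) {
        if (!c->lines[i].valid) {
            victim = i;
            break;
        }
        if (c->lines[i].last_used < c->lines[victim].last_used) {
            victim = i;
        }
    }
    c->lines[victim].valid = 1;
    c->lines[victim].tag = tag;
    c->lines[victim].last_used = c->now;
    c->misses++;
    return 0;
}
```

Calling `cache_access` with the addresses 1, 5, 1, 4, 0 in that order should leave the counters at 2 misses and 3 hits, the same totals the slide shows.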
Basic Cache organization
• Decide on the block size
  – How? Simulate lots of different block sizes and see which one gives the best performance
  – Most systems use a block size between 32 bytes and 128 bytes
  – Longer sizes reduce the overhead by:
    • Reducing the number of bits in each TAG
    • Reducing the size of each TAG Array
  – Very large sizes reduce the “usefulness” of the extra data
    • Spatial Locality: the closer it is, the more likely it will be used
• Address layout: Tag | Block offset
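The tag / block offset split can be sketched in a few lines of C. This is an illustrative sketch, assuming a fully associative cache (so every address bit above the block offset belongs to the tag) and a power-of-two block size; the function names are made up for this example.

```c
#include <assert.h>

/* Number of bits needed to index n items; n must be a power of two. */
unsigned log2_exact(unsigned n) {
    unsigned bits = 0;
    assert(n != 0 && (n & (n - 1)) == 0);
    while (n > 1) {
        n >>= 1;
        bits++;
    }
    return bits;
}

/* Low-order bits: which byte within the block. */
unsigned block_offset(unsigned addr, unsigned block_size) {
    return addr & (block_size - 1);
}

/* Remaining high-order bits: the tag of a fully associative cache. */
unsigned tag_of(unsigned addr, unsigned block_size) {
    return addr >> log2_exact(block_size);
}

/* Width of the tag field for a given address width and block size. */
unsigned tag_bits(unsigned addr_bits, unsigned block_size) {
    return addr_bits - log2_exact(block_size);
}
```

With 4-bit addresses and 2-byte blocks, `tag_bits(4, 2)` is 3, matching the 3-bit tag field of the small example cache. With 32-bit addresses (an assumed width), growing the block from 32 to 128 bytes shrinks the tag from 27 to 25 bits, which is the overhead reduction the slide refers to.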
Questions to ask about a cache
• What is the block size?
• How many lines?
• How many bytes of data storage?
• How much overhead storage?
• What is the hit rate?
• What is the latency of an access?
• What is the replacement policy?
  – LRU? LFU? FIFO? Random?
The Design Space is Large
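Two of these questions, data storage and overhead storage, are plain arithmetic. The sketch below is one way to compute them, assuming the overhead per line is a tag, a valid bit, and optionally a dirty bit; replacement state such as LRU bits is left out, and the function names are illustrative.

```c
#include <assert.h>

/* Bytes of data storage: one block per line. */
unsigned data_bytes(unsigned num_lines, unsigned block_size) {
    return num_lines * block_size;
}

/* Overhead bits: a tag plus a valid bit (and optionally a dirty bit) per line.
   Replacement state such as LRU bits is deliberately left out of this sketch. */
unsigned overhead_bits(unsigned num_lines, unsigned tag_bits, int has_dirty_bit) {
    unsigned per_line = tag_bits + 1u + (has_dirty_bit ? 1u : 0u);
    assert(num_lines > 0);
    return num_lines * per_line;
}
```

For a cache with 2 lines, 2-byte blocks, and a 3-bit tag, this gives 4 bytes of data and 8 bits of overhead, or 10 bits once a dirty bit is added to each line.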
What about stores?
• Where should you write the result of a store?
  – If that memory location is in the cache?
    • Send it to the cache
    • Should we also send it to memory? (write-through policy)
  – If it is not in the cache?
    • Write it directly to memory without allocation? (write-around policy)
    • OR allocate the line (put it in the cache)? (allocate-on-write policy)
Handling stores (write-through) (worked example; the slide figure is animated over several frames)
• Cache: fully associative, 2 lines, 2-byte blocks, valid bit (V) and tag per line, LRU replacement
• Memory: addresses 0 to 15 hold 78, 29, 120, 123, 71, 150, 162, 173, 18, 21, 33, 28, 19, 200, 210, 225
• Reference stream: Ld R1 M[1]; Ld R2 M[7]; St R2 M[0]; St R1 M[5]; Ld R2 M[10]
  – REF 1, Ld R1 M[1]: miss; block (78, 29) is loaded with tag 0, R1 = 29 (Misses: 1, Hits: 0)
  – REF 2, Ld R2 M[7]: miss; block (162, 173) is loaded with tag 3, R2 = 173 (Misses: 2, Hits: 0)
  – REF 3, St R2 M[0]: hit; 173 is written to the cached block and to M[0] (Misses: 2, Hits: 1)
  – REF 4, St R1 M[5]: miss; block (71, 150) replaces the LRU line (tag 3) and gets tag 2, then 29 is written to the cached block and to M[5] (Misses: 3, Hits: 1)
  – REF 5, Ld R2 M[10]: miss; block (33, 28) replaces the LRU line (tag 0) and gets tag 5, R2 = 33 (Misses: 4, Hits: 1)
How many memory references?
• Every time we STORE, we go all the way to memory
  – Even if we hit in the cache!
• Caches generally miss on fewer than 10% of accesses
Write-through vs. Write-back
• Can we design the cache to NOT write all stores to memory immediately?
  – We can keep the most current copy JUST in the cache
  – If that data gets evicted from the cache, update memory (a write-back policy)
    • We don’t want to lose the data!
  – Do we need to write-back all evicted blocks?
    • No, only blocks that have been stored into
  – Keep a “dirty bit”, reset when the block is allocated, set when the block is stored into. If a block is “dirty” when evicted, write its data back into memory.
Handling stores (write-back) (worked example; the slide figure is animated over several frames)
• Cache: fully associative, 2 lines, 2-byte blocks, valid bit (V), dirty bit (d) and tag per line, LRU replacement
• Memory: addresses 0 to 15 hold 78, 29, 120, 123, 71, 150, 162, 173, 18, 21, 33, 28, 19, 200, 210, 225
• Reference stream: Ld R1 M[1]; Ld R2 M[7]; St R2 M[0]; St R1 M[5]; Ld R2 M[10]
  – REF 1, Ld R1 M[1]: miss; block (78, 29) is loaded with tag 0, d = 0, R1 = 29 (Misses: 1, Hits: 0)
  – REF 2, Ld R2 M[7]: miss; block (162, 173) is loaded with tag 3, d = 0, R2 = 173 (Misses: 2, Hits: 0)
  – REF 3, St R2 M[0]: hit; 173 is written only to the cached block, d = 1, and M[0] still holds 78 (Misses: 2, Hits: 1)
  – REF 4, St R1 M[5]: miss; the LRU line (tag 3) is clean, so it is dropped without a write-back; block (71, 150) is loaded with tag 2, 29 is written to the cached block, d = 1 (Misses: 3, Hits: 1)
  – REF 5, Ld R2 M[10]: miss; the LRU line (tag 0) is dirty, so its data is written back and M[0] becomes 173; block (33, 28) is loaded with tag 5, d = 0, R2 = 33 (Misses: 4, Hits: 1)
Where does write-back save us?
• We write the data to memory eventually anyway. How is this better than write-through?
• If a value is written repeatedly, it only gets updated in the cache. It doesn’t have to store to memory every time!
  – Think: loop counter, running sum, etc.
• Result: fewer total trips to memory, lower latency for stores
• If your data set fits in the cache, you can essentially skip going to memory beyond the initial load-up of program values!
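The loop-counter case can be made concrete by counting the writes that actually reach memory. The sketch below is a toy model, not a real cache: it assumes a single cache line, 2-byte blocks, and allocate-on-write for store misses under both policies. `TinyCache`, `store`, `evict`, and `repeated_store_writes` are hypothetical names.

```c
#include <assert.h>

#define BLOCK_SIZE 2  /* bytes per block (assumed, as in the slide examples) */

typedef enum { WRITE_THROUGH, WRITE_BACK } Policy;

/* A one-line cache: just enough state to count writes that reach memory. */
typedef struct {
    Policy policy;
    int valid;
    int dirty;
    unsigned tag;
    unsigned mem_writes;
} TinyCache;

/* Evict the current block, writing it back only if it is dirty. */
static void evict(TinyCache *c) {
    if (c->valid && c->dirty) {
        c->mem_writes++;
    }
    c->valid = 0;
    c->dirty = 0;
}

/* Perform one store to byte address addr (allocate-on-write assumed). */
static void store(TinyCache *c, unsigned addr) {
    unsigned tag = addr / BLOCK_SIZE;
    if (!c->valid || c->tag != tag) {
        evict(c);
        c->valid = 1;
        c->tag = tag;     /* the dirty bit was reset by evict() */
    }
    if (c->policy == WRITE_THROUGH) {
        c->mem_writes++;  /* every store goes all the way to memory */
    } else {
        c->dirty = 1;     /* memory is updated later, on eviction */
    }
}

/* Store to the same address `times` times, then evict; return memory writes. */
unsigned repeated_store_writes(Policy policy, unsigned addr, unsigned times) {
    TinyCache c = { policy, 0, 0, 0, 0 };
    assert(times > 0);
    for (unsigned i = 0; i < times; i++) {
        store(&c, addr);
    }
    evict(&c);
    return c.mem_writes;
}
```

In this model, 100 stores to the same address cost 100 memory writes under write-through and a single write (at eviction) under write-back.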
What about instructions?
• Instructions should be cached as well.
• We have two choices:
  1. Treat instruction fetches as normal data and allocate cache blocks when fetched.
  2. Create a second cache (called the instruction cache or ICache) which caches instructions only.
• What are advantages of a separate ICache?
• Can anything go wrong with this?
Cache Associativity
Balancing speed with capacity
Associativity
• We designed a fully associative cache.
  – Any memory location can be copied to any cache block.
  – We check every cache tag to determine whether the data is in the cache.
• This approach is too slow for large caches
  – Parallel tag searches are slow and use a lot of power
  – OK for a few entries, but hundreds/thousands is not feasible
Direct mapped cache
• We can redesign the cache to eliminate the requirement for parallel tag lookups.
  – Direct mapped caches partition memory into as many regions as there are cache lines
  – Each memory block has a single cache line in which data can be placed.
  – You then only need to check a single tag: the one associated with the region the reference is located in.
• Think: Modulus Hash Function
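The “modulus hash function” idea can be written out directly. This is a small sketch under the usual assumption that the line is chosen by the block number modulo the number of lines; the function names and parameter order are illustrative.

```c
#include <assert.h>

/* Which cache line a byte address maps to: block number modulo line count. */
unsigned line_index(unsigned addr, unsigned block_size, unsigned num_lines) {
    assert(block_size > 0 && num_lines > 0);
    return (addr / block_size) % num_lines;
}

/* The bits left over after removing the block offset and the line index. */
unsigned line_tag(unsigned addr, unsigned block_size, unsigned num_lines) {
    assert(block_size > 0 && num_lines > 0);
    return (addr / block_size) / num_lines;
}

/* Two addresses in different blocks conflict if they map to the same line. */
int conflicts(unsigned a, unsigned b, unsigned block_size, unsigned num_lines) {
    assert(block_size > 0 && num_lines > 0);
    return (a / block_size) != (b / block_size) &&
           line_index(a, block_size, num_lines) ==
           line_index(b, block_size, num_lines);
}
```

With 2-byte blocks and 4 lines (assumed values), address 5 maps to line 2 with tag 0 and address 13 maps to line 2 with tag 1, so the two evict each other even if the other three lines are empty. Only one tag comparison is needed per access, which is the point of the design.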
Mapping memory to cache (figure)
• Memory: addresses 0 to 15, grouped into 2-byte blocks
• Cache: 4 lines (line index 0 to 3), each with a tag and data
• Address: tag (1 bit) | line index (2 bits) | block offset (1 bit)
Direct-mapped cache (worked example; the slide figure is animated over several frames)
• Cache: direct mapped, 4 lines, 2-byte blocks; each line has a valid bit (V), dirty bit (d), tag and data
• Memory: addresses 0 to 15 hold 78, 29, 120, 123, 71, 150, 162, 173, 18, 21, 33, 28, 19, 200, 210, 225
• Reference stream: Ld R1 M[1]; Ld R2 M[5]; St R2 M[2]; St R1 M[7]; Ld R2 M[4]
  – REF 1, Ld R1 M[1]: miss; block (78, 29) goes to line 0, R1 = 29 (Misses: 1, Hits: 0)
  – REF 2, Ld R2 M[5]: miss; block (71, 150) goes to line 2, R2 = 150 (Misses: 2, Hits: 0)
  – REF 3, St R2 M[2]: miss; block (120, 123) goes to line 1 and becomes (150, 123) (Misses: 3, Hits: 0)
  – REF 4, St R1 M[7]: miss; block (162, 173) goes to line 3 and becomes (162, 29) (Misses: 4, Hits: 0)
  – REF 5, Ld R2 M[4]: hit in line 2, R2 = 71 (Misses: 4, Hits: 1)
Split the difference
• Direct mapped costs us in performance
  – Certain memory access patterns can turn out poorly
• Set associative caches:
  – Partition memory into regions
    • like direct mapped but fewer partitions
  – Associate a region to a set of cache blocks
    • Check tags for all blocks in a set to determine a HIT
• Treat each set like a small fully associative cache.
  – LRU (or LRU-like) policy generally used.
Set Associative Cache (figure)
• Memory: addresses 0 to 15, grouped into 2-byte blocks
• Cache: 2 sets (set index 0 and 1), 2 lines per set, each line with a tag and data
• Address: tag (2 bits) | set index (1 bit) | block offset (1 bit)
Set Associative Cache using the book’s style (figure)
• The same cache drawn as two ways (Way 1, Way 2), each with its own tag and data columns and one row per set (0 and 1)
Set-associative cache example (worked example; the slide figure is animated over several frames)
• Cache: 2 sets, 2 ways, 2-byte blocks; each line has a valid bit (V), dirty bit (d), tag and data; LRU within each set
• Memory: addresses 0 to 15 hold 78, 29, 120, 123, 71, 150, 162, 173, 18, 21, 33, 28, 19, 200, 210, 225
• Reference stream: Ld R1 M[1]; Ld R2 M[5]; St R2 M[7]; St R1 M[4]; Ld R3 M[0]; Ld R2 M[8]
  – REF 1, Ld R1 M[1]: miss; block (78, 29) is loaded into set 0 with tag 0, R1 = 29 (Misses: 1, Hits: 0)
  – REF 2, Ld R2 M[5]: miss; block (71, 150) is loaded into set 0 with tag 1, R2 = 150 (Misses: 2, Hits: 0)
  – REF 3, St R2 M[7]: miss; block (162, 173) is loaded into set 1 with tag 1 and becomes (162, 150), d = 1 (Misses: 3, Hits: 0)
  – REF 4, St R1 M[4]: hit; the tag 1 block in set 0 becomes (29, 150), d = 1 (Misses: 3, Hits: 1)
  – REF 5, Ld R3 M[0]: hit, R3 = 78 (Misses: 3, Hits: 2)
  – REF 6, Ld R2 M[8]: miss; the LRU line of set 0 (tag 1) is dirty, so its data is written back and M[4] becomes 29; block (18, 21) is loaded with tag 2, R2 = 18 (Misses: 4, Hits: 2)
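The set-associative walkthrough can also be replayed in code. The sketch below assumes 2 sets, 2 ways, 2-byte blocks, and true LRU within each set. It tracks tags only (no data, no dirty bits), and `SetCache`, `set_cache_init`, and `set_cache_access` are illustrative names.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_SETS   2
#define NUM_WAYS   2
#define BLOCK_SIZE 2

typedef struct {
    int valid;
    unsigned tag;
    unsigned last_used;  /* time of last access, for LRU within the set */
} Way;

typedef struct {
    Way sets[NUM_SETS][NUM_WAYS];
    unsigned now;
    unsigned hits;
    unsigned misses;
} SetCache;

/* Mark every way invalid and clear the counters. */
void set_cache_init(SetCache *c) {
    assert(c != NULL);
    for (size_t s = 0; s < NUM_SETS; s++) {
        for (size_t w = 0; w < NUM_WAYS; w++) {
            c->sets[s][w].valid = 0;
            c->sets[s][w].tag = 0;
            c->sets[s][w].last_used = 0;
        }
    }
    c->now = 0;
    c->hits = 0;
    c->misses = 0;
}

/* Look up one byte address. Returns 1 on a hit, 0 on a miss. */
int set_cache_access(SetCache *c, unsigned addr) {
    unsigned block = addr / BLOCK_SIZE;
    unsigned set = block % NUM_SETS;   /* set index bits */
    unsigned tag = block / NUM_SETS;   /* remaining high-order bits */
    Way *ways = c->sets[set];
    size_t victim = 0;

    c->now++;
    /* Only the ways of one set are searched, not the whole cache. */
    for (size_t w = 0; w < NUM_WAYS; w++) {
        if (ways[w].valid && ways[w].tag == tag) {
            ways[w].last_used = c->now;
            c->hits++;
            return 1;
        }
    }
    /* Miss: use an invalid way if there is one, else the LRU way of this set. */
    for (size_t w = 0; w < NUM_WAYS; w++) {
        if (!ways[w].valid) {
            victim = w;
            break;
        }
        if (ways[w].last_used < ways[victim].last_used) {
            victim = w;
        }
    }
    ways[victim].valid = 1;
    ways[victim].tag = tag;
    ways[victim].last_used = c->now;
    c->misses++;
    return 0;
}
```

Feeding it the addresses 1, 5, 7, 4, 0, 8 should end with 4 misses and 2 hits, the same totals as the slide's final frame. Setting `NUM_SETS` to 1 turns the model into a fully associative cache, and setting `NUM_WAYS` to 1 makes it direct mapped.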
Reasons for cache misses
• First reference to an address
  – Compulsory miss
    • Reduce by increasing block size
    • or pre-fetching
• Cache is too small to hold all the data
  – Capacity miss
    • Reduce misses by building a bigger cache
• Replaced it from a busy set
  – Conflict miss
    • Reduce by increasing associativity
Sample Cache hit rates (chart)
• y-axis: Cache miss rate (ticks at 0, 10, 20, 30)
• x-axis: Cache Size (block data only): 8K byte, 16K byte, 32K byte, 64K byte
• Series: Direct mapped, 4-way set associative, Fully Associative
Itanium-2 On-chip Caches (Original)
• L1, 16KB, 4-way s.a., 64B line
  – quad-port (2 load + 2 store)
  – L1D for data, L1I for instructions
  – 1 cycle latency
• L2, 256KB, 4-way s.a., 128B line
  – quad-port (4 load or 4 store)
  – 5 cycle latency
• L3, 3MB, 12-way s.a., 128B line
  – single 32B port
  – 12 cycle latency
Itanium-2 On-chip Caches (More Recent)
• L1, 16KB, 4-way s.a., 64B line
  – quad-port (2 load + 2 store)
  – L1D for data, L1I for instructions
  – 2 cycle latency
• L2, 96KB, 4-way s.a., 128B line
  – quad-port (4 load or 4 store)
  – 9 cycle latency
• L3, 4MB, 12-way s.a., 128B line
  – single 32B port
  – 24 cycle latency
Specializing Caches (Intel Pentium 4 Trace Cache)
• Intel IA-32 (x86) instructions are CISC (Complex)
  – They can take many cycles to decode because of complexity and variable length
• Current (and recent) Intel chips have adopted a “RISC-like” organization
  – Different Interior Instruction Set (using “micro-ops”)
  – Exterior Instruction Set remains the same (for compatibility)
• Need sophisticated (and slow) hardware to translate between the two instruction sets
• Intel introduced their Trace Cache to avoid having to repeatedly do this translation
Specializing Caches (Intel Pentium 4 Trace Cache)
• The first time an instruction enters the processor, it is decoded into its respective microinstructions
  – SAVE the string of micro-ops in a cache!
  – String together multiple micro-ops in a sequential order called a trace
    • Break up traces by “basic blocks”
• If that instruction is accessed again, just grab the micro-ops directly from the trace cache.
• The trace cache operates in much the same way as an L1 Instruction cache
  – There is a bigger penalty for missing, since you have to:
    • Load from L2 cache
    • Decode into micro-ops
Specializing Caches (Intel Pentium 4 Trace Cache)
Trace Cache:
• 12k ops
• 8 way s.a.
• 256 lines
• “block size” is 6 ops
• ~ 80 KBytes
Pitfall: How you access a 2-D array
In C/C++:
int bigArray[100][16];
How do we map this to a 1-D “storage array” (AKA memory)?
C/C++ uses row-major order: store the first row, then the second, etc.
When accessing a row, how does a cache do?
When accessing a column, how does a cache do?
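One way to explore these two questions is to count how often a traversal moves to a different block. The sketch below is a deliberately pessimistic model: it assumes the array starts on a block boundary and that the cache holds only the single most recently touched block. `flat_index` and `block_fetches` are illustrative names, and the block size is given in array elements rather than bytes to avoid depending on `sizeof(int)`.

```c
#include <assert.h>
#include <stddef.h>

#define ROWS 100
#define COLS 16

/* Row-major position of bigArray[row][col] in the flat storage array. */
size_t flat_index(size_t row, size_t col) {
    assert(row < ROWS && col < COLS);
    return row * COLS + col;
}

/* Count block fetches for a full traversal, assuming a cache that holds
   only the most recently touched block (a deliberately pessimistic model). */
size_t block_fetches(int by_rows, size_t elems_per_block) {
    size_t fetches = 0;
    size_t current = (size_t)-1;  /* sentinel: no block cached yet */
    size_t outer = by_rows ? ROWS : COLS;
    size_t inner = by_rows ? COLS : ROWS;

    assert(elems_per_block > 0);
    for (size_t i = 0; i < outer; i++) {
        for (size_t j = 0; j < inner; j++) {
            size_t row = by_rows ? i : j;
            size_t col = by_rows ? j : i;
            size_t block = flat_index(row, col) / elems_per_block;
            if (block != current) {
                current = block;
                fetches++;
            }
        }
    }
    return fetches;
}
```

With 16 elements per block (64-byte blocks if `int` is 4 bytes), walking row by row touches each of the 100 blocks once, while walking column by column jumps 16 elements on every step and fetches a block for all 1600 accesses. A real cache large enough to keep the whole array resident would hide much of this difference, so the numbers illustrate the access pattern rather than predict measured performance.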