Counting Stream Registers: An Efficient and Effective Snoop Filter Architecture
Aanjhan Ranganathan (ETH Zurich), Ali Galip Bayrak (EPFL), Theo Kluter (BFH), Philip Brisk (UC Riverside), Edoardo Charbon (TU Delft), Paolo Ienne (EPFL)
Multicore Embedded Systems
• Increasing number of multiprocessor based embedded systems.
• Low energy requirement with little compromise on performance.
• Significant energy consumption in the memory subsystem (caches, shared bus, main memory).
Symmetric Multiprocessor System
[Figure: CPU 1, CPU 2, ..., CPU n, each with its own data cache (D$) and instruction cache (I$), all connected to a shared memory.]
Cache Coherency Problem
[Figure: CPU 1, CPU 2, ..., CPU n, each with its own data cache (D$) and instruction cache (I$), all connected to a shared memory.]
Snoopy Hardware Coherence Protocols
[Figure: CPU 1, CPU 2, ..., CPU n, each with its own data cache (D$) and instruction cache (I$), all connected to a shared memory.]
Snoop misses consume excessive energy.
Snoop Filters
[Figure: CPU 1, CPU 2, ..., CPU n, each with its own data cache (D$), instruction cache (I$), and snoop filter (SF), all connected to a shared memory.]
A snoop filter lookup costs less energy than a cache lookup.
Snoop Filters in Prior Art
• Include, Exclude and Hybrid JETTY
  – Expensive for an embedded system in terms of area.
  – Energy consumed by the JETTYs themselves is significant.
• Stream Registers
  – Present in IBM's BlueGene supercomputer.
  – Inclusive filter.
  – Use a base and mask register pair to track the cache lines.
Stream Registers
Base      Mask      Valid   Address shown beside the row
1 0 0 1   1 1 1 1   1       0b1001
1 0 0 1   1 1 0 0   1       0b1010
---       ---       0

Addresses matching 10XX result in a snoop filter hit.
There is no general mechanism to remove an address from an SR without compromising correctness.
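
The base/mask behaviour in the example above can be sketched in a few lines. This is only an illustration under stated assumptions: addresses are 4 bits wide as in the example, the base is fixed at the first tracked address, and a mask bit is cleared wherever a later address differs from the base. The class and method names are hypothetical and are not taken from the BlueGene design.

```python
class StreamRegister:
    """Illustrative base/mask pair (hypothetical names, 4-bit addresses assumed)."""

    def __init__(self, width=4):
        self.full = (1 << width) - 1  # all-ones mask for the address width
        self.base = 0
        self.mask = self.full
        self.valid = False

    def insert(self, addr):
        if not self.valid:
            # First tracked line: exact match required on every bit.
            self.base, self.mask, self.valid = addr, self.full, True
        else:
            # Clear the mask bits in which addr differs from the base.
            self.mask &= ~(self.base ^ addr) & self.full

    def may_contain(self, addr):
        # A hit means "the line may be cached"; a miss means "certainly not cached".
        return self.valid and ((addr ^ self.base) & self.mask) == 0


sr = StreamRegister()
sr.insert(0b1001)
sr.insert(0b1010)
print(format(sr.mask, "04b"))      # 1100
print(sr.may_contain(0b1011))      # True: matches 10XX although never inserted
print(sr.may_contain(0b0101))      # False: snoop can be filtered
```

Because the mask only ever loses bits, the register becomes less selective over time, which is the weakness the next slide describes.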
Drawbacks of Stream Register based Snoop Filters
• No efficient way to update the registers when a line is removed from the cache
  – Degraded filtering performance over time
  – Additional logic units introduced but not efficient (e.g., cache wrap detection)
Our Contribution
• Counting Stream Registers
  – Eliminates cache wrap detection logic
  – Counter to track cache lines
  – More robust to workload variability
  – Better or similar energy savings compared to SRs
Counting Stream Registers
Base      Mask      Counter   Address shown beside the row
1 0 0 1   1 1 1 1   0x01      0b1001
1 0 0 1   1 1 0 0   0x02      0b1010
---       ---       0

Removes the need for extra logic such as cache wrap detection, active register history, etc.
Invalidated cache lines can be tracked by decrementing the counter.
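
A minimal sketch of the counter idea follows. It assumes 4-bit addresses as in the example, that the counter is incremented for every line added and decremented for every line invalidated or evicted, and that a register whose counter returns to zero holds no lines and can be reinitialised by the next insertion. That last point is an assumption of this sketch; the slides only state that invalidations are tracked by decrementing. All names are hypothetical.

```python
class CountingStreamRegister:
    """Illustrative base/mask/counter triple (hypothetical names)."""

    def __init__(self, width=4):
        self.full = (1 << width) - 1
        self.base = 0
        self.mask = self.full
        self.count = 0  # number of cached lines currently covered

    def insert(self, addr):
        if self.count == 0:
            # Empty register: start over with an exact match (assumed behaviour).
            self.base, self.mask = addr, self.full
        else:
            # Clear the mask bits in which addr differs from the base.
            self.mask &= ~(self.base ^ addr) & self.full
        self.count += 1

    def remove(self):
        # Call once per line, previously inserted here, that leaves the cache.
        if self.count > 0:
            self.count -= 1

    def may_contain(self, addr):
        return self.count > 0 and ((addr ^ self.base) & self.mask) == 0


csr = CountingStreamRegister()
csr.insert(0b1001)
csr.insert(0b1010)
print(csr.count, format(csr.mask, "04b"))  # 2 1100
csr.remove()
csr.remove()
print(csr.may_contain(0b1001))             # False: register is empty again
```

In this sketch the mask is only restored when the counter reaches zero; individual removals do not tighten it.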
Snoop Filter Architecture
[Figure: snoop filter address fields, labelled: "Index to direct-mapped snoop filter table", "Set of cache lines grouped into a page", "Used for comparison with base register".]
Experimental Analysis
• Virtex 2 FPGA running OpenRISC soft cores
  – Configurable number of processors, associativity and size of data and instruction caches, cache type, and coherence protocol
• EEMBC Multibench benchmarks
• CACTI 5.3 energy model
  – Total memory subsystem energy accounts for main memory read/write energy, data and instruction cache read/write energy, leakage, and snoop energy
Cache Design Space Exploration
Results: Filtering Percentage
CSR achieves a higher filtering percentage with a smaller number of registers.
Analysis: RGB2CMYK Benchmark
Discussion: Energy Consumption
• For most benchmarks, snoop energy was around 8-10% of the total memory subsystem energy without snoop filters
• CSR filters are more effective for certain benchmarks (H.264, image rotation)
  – Better filtering performance with a smaller number of stream registers
• Small reduction in overall energy
  – Platform limited to 32 MB of off-chip SDRAM
  – No complex data sharing and a limited number of multiple producers of the same data
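
As a rough back-of-the-envelope illustration of how the snoop share bounds the overall saving, the sketch below uses a simple model that is an assumption of this note, not the authors' energy model: every snoop pays for one filter lookup, and only unfiltered snoops pay for the cache lookup. The function name and the input values in the usage line are hypothetical; only the 8-10% snoop share comes from the slide.

```python
def memory_energy_saving(snoop_share, filter_rate, filter_cost_ratio):
    """Fraction of total memory-subsystem energy saved (simplified model).

    snoop_share:       snoop energy / total energy without a filter
    filter_rate:       fraction of snoops the filter removes (0..1)
    filter_cost_ratio: filter lookup energy / snoop cache lookup energy
    """
    # Snoop energy with a filter, relative to snoop energy without one:
    # every snoop pays the filter, unfiltered snoops also pay the cache lookup.
    remaining = filter_cost_ratio + (1.0 - filter_rate)
    return snoop_share * (1.0 - remaining)


# Hypothetical inputs: 10% snoop share, 80% filtered, filter costs 10% of a lookup.
print(round(memory_energy_saving(0.10, 0.80, 0.10), 3))  # 0.07
```

Under this model the saving can never exceed the snoop share itself, so a share of 8-10% caps the achievable reduction at the same level.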
Summary
• Introduced a snoop filter architecture based on counting stream registers
  – Lower hardware complexity and the ability to track cache line invalidations
• Experimental evaluation shows a better filtering percentage than stream registers, with less performance variation across workloads.