Integrating Adaptive On-Chip Storage Structures for Reduced Dynamic Power
Steve Dropsho,
Alper Buyuktosunoglu, Rajeev Balasubramonian,
David H. Albonesi, Sandhya Dwarkadas,
Greg Semeraro, Grigorios Magklis, and Michael Scott
ECE and CS Departments
University of Rochester
Why Adaptive Structures?
• General-purpose microprocessors are “one size fits all”
• But needs vary across (and within) applications
• Can save considerable energy by matching resources to the application
Objective: Less energy for the same performance by adapting storage structures to the application
Related Work
• Adaptable cache
  – Balasubramonian et al., MICRO 2000
  – Dhodapkar and Smith, ISCA 2002
• Adaptable issue logic
  – Buyuktosunoglu et al., GLS VLSI 2001
  – Folegnani and Gonzalez, ISCA 2000
Common Themes
• A single adaptive structure
• Use of global information for feedback
• Exploration-based (caches)
Related Work (cont)
• Adaptable IQ, LSQ, and ROB
  – Ponomarev et al., MICRO 2001
  – Three (3) adaptable structures
  – Reconfigurations based on local state
Integrating Multiple Adaptive Structures
[Figure: processor block diagram. Labeled blocks: L1 Icache, Branch predict, FetchQ, Rename map, ROB, IIQ, FPQ, LSQ, IPREG, FPREG, Int FUs, FP FUs, L1 Dcache, L2 Unified Cache. Labeled regions: Integer, Floating Pt, Memory.]
Challenges
• Multiple (9) adaptive structures create a state explosion problem
• Use of global information makes assigning cause and effect difficult
• Potential for additive performance effects among the structures
Approach: Local Management
• Local information for configuration decisions
• Tight control over performance variance
Part I: The Caches
[Figure: the processor block diagram from "Integrating Multiple Adaptive Structures", repeated.]
The Accounting Cache
A access (primary)
B access (secondary)
• Sequential accesses, A then B
• Save energy on A access hit
• Swap blocks on A access miss
[Figure: a four-way set (ways 0 to 3) shown under four A/B partitionings: A1 B3, A2 B2, A3 B1, and A4 B0, with a swap between the partitions.]
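The A-then-B access rule above can be sketched for a single set. This is a minimal illustration, not the authors' implementation: it assumes the set is kept in most-recently-used order and that partition A is simply the first `a_ways` positions, so that moving a block to the front stands in for the swap. The class name `AccountingSet` and its parameters are hypothetical.

```python
class AccountingSet:
    """One cache set in MRU order; the first `a_ways` positions act as partition A.

    Assumption for illustration: moving a block to the front models the
    "swap blocks on A access miss" step, with A's least recent block
    sliding into B.
    """

    def __init__(self, ways=4, a_ways=2):
        self.ways = ways
        self.a_ways = a_ways
        self.lines = []  # tags; index 0 is the most recently used

    def access(self, tag):
        """Return 'A' (primary hit), 'B' (secondary hit), or 'miss'."""
        if tag in self.lines:
            pos = self.lines.index(tag)
            self.lines.pop(pos)
            result = "A" if pos < self.a_ways else "B"
        else:
            result = "miss"
            if len(self.lines) == self.ways:
                self.lines.pop()  # evict the least recently used block
        self.lines.insert(0, tag)  # the block now sits in partition A
        return result


s = AccountingSet(ways=4, a_ways=2)
for t in "abcd":
    s.access(t)          # four cold misses fill the set
first = s.access("d")    # most recent block: found by the A access
second = s.access("a")   # oldest block: found only by the B access
third = s.access("a")    # after the swap it is found by the A access
```

Only the A ways are probed on an A hit, which is where the energy saving on the slide comes from; the B access is paid only when A misses.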
Most-Recently-Used Statistics
[Figure: "MRU State Transitions" for a four-way set (ways 1 to 4 holding lines A to D, MRU positions 0 to 3) and "MRU State Counters" MRU[0], MRU[1], MRU[2], MRU[3], and Misses, with example counts 3, 2, 1, 0, and 0.]
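A small sketch of how such counters could be collected for one set: each hit increments the counter for the MRU position where the line was found, then the line moves to position 0. The function name `count_mru_hits` and the access sequence are illustrative assumptions; the sequence was chosen here only because it reproduces the example counts (3, 2, 1, 0, 0), and it is not claimed to be the sequence drawn on the slide.

```python
def count_mru_hits(accesses, initial=(), ways=4):
    """Count hits per MRU position (0 = most recent) and misses for one set."""
    counters = [0] * ways
    misses = 0
    order = list(initial)  # index 0 is the most recently used line
    for tag in accesses:
        if tag in order:
            pos = order.index(tag)
            counters[pos] += 1   # hit at MRU position `pos`
            order.pop(pos)
        else:
            misses += 1
            if len(order) == ways:
                order.pop()      # evict the least recently used line
        order.insert(0, tag)     # the accessed line becomes most recent
    return counters, misses


counters, misses = count_mru_hits("AABBCB", initial="ABCD")
print(counters, misses)  # [3, 2, 1, 0] 0
```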
Configuration Evaluation
Counters: MRU[0] (mru) = 3, MRU[1] = 2, MRU[2] = 1, MRU[3] (lru) = 0, Misses = 0
• A1 B3: Delay = 6 DA + 3 DB, Energy = 6 E1 + 3 E3
• A2 B2: Delay = 6 DA + 1 DB, Energy = 7 E2
• A3 B1: Delay = 6 DA, Energy = 6 E3
• A4 B0 (BASE): Delay = 6 DA, Energy = 6 E4
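The delay and energy figures above follow from the counters alone, which a short sketch can make explicit. With k ways in A, every access pays one A access, and accesses that hit beyond position k-1 also pay a B access to the remaining 4-k ways. The function name `evaluate_configs`, the unit delays, and the per-access energy table are placeholder assumptions; treating a miss as a B access (when B is non-empty) is also an assumption, since the example has no misses.

```python
def evaluate_configs(mru, misses, d_a, d_b, energy):
    """Return {config: (delay, energy)} for every A/B split of one cache.

    mru[i]    -- hits at MRU position i (0 = most recent)
    energy[k] -- energy of one access to a k-way partition (hypothetical values)
    """
    ways = len(mru)
    total = sum(mru) + misses
    results = {}
    for k in range(1, ways + 1):
        if k < ways:
            # Accesses not satisfied by the first k MRU positions go on to B.
            b_accesses = sum(mru[k:]) + misses
            cost = total * energy[k] + b_accesses * energy[ways - k]
        else:
            b_accesses = 0  # A4 B0: there is no B partition to access
            cost = total * energy[k]
        delay = total * d_a + b_accesses * d_b
        results[f"A{k} B{ways - k}"] = (delay, cost)
    return results


# Placeholder numbers: D_A = D_B = 1 and E_k = k energy units per access.
table = evaluate_configs([3, 2, 1, 0], 0, d_a=1, d_b=1,
                         energy={1: 1, 2: 2, 3: 3, 4: 4})
for name, (delay, cost) in table.items():
    print(name, delay, cost)
```

With these placeholder values A2 B2 gives 7 delay units and 14 energy units (7 E2), and A1 B3 gives 9 and 15 (6 E1 + 3 E3), matching the structure of the expressions above.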
Tolerance and the Bank Account
• Tolerance allows more delay than BASE
  – D_TOL = D_BASE (1 + TOL)
  – TOL = {0.015, 0.062, 0.25} (1/64, 1/16, 1/4)
• Bank account allows accumulation of unused tolerance
• Use account credits in later intervals
  – Allows aggressive resizing
  – Amortizes mistakes over many intervals
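One way to read the tolerance and bank-account bullets is sketched below. It is an assumption-laden simplification, not the published algorithm: each interval, the lowest-energy configuration whose delay fits within D_BASE (1 + TOL) plus the account balance is chosen, and the unused part of that budget is carried forward. The function name `choose_config` and the candidate numbers are hypothetical.

```python
def choose_config(candidates, d_base, tol, account=0.0):
    """Pick the lowest-energy config within the delay budget.

    candidates -- {name: (delay, energy)} for the interval just measured
    Returns (chosen name, new account balance).
    """
    budget = d_base * (1 + tol) + account  # D_TOL plus saved-up credit
    feasible = [n for n, (d, _) in candidates.items() if d <= budget]
    if not feasible:
        # Assumed fallback: nothing fits, so take the fastest configuration.
        feasible = [min(candidates, key=lambda n: candidates[n][0])]
    name = min(feasible, key=lambda n: candidates[n][1])
    return name, budget - candidates[name][0]  # unused delay becomes credit


candidates = {"A1 B3": (9, 15), "A2 B2": (7, 14),
              "A3 B1": (6, 18), "A4 B0": (6, 24)}
choice, account = choose_config(candidates, d_base=6, tol=0.25)
print(choice, account)  # A2 B2 0.5
```

In this sketch the balance only grows when the chosen configuration is faster than the budget; a fuller model would also debit the account when an interval turns out slower than predicted, which is how mistakes would be amortized over later intervals.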
Memory Hierarchy
[Figure: one possible configuration of the memory hierarchy: L1 I-Cache (A/B), L1 D-Cache (A, no B), L2 Unified Cache (A/B).]
Environment
• SimpleScalar simulator
• Microarchitecture is similar to Alpha 21264
• Benchmarks are a mix of SPEC95, SPEC2K, and Olden
• Energy models for buffers and caches from Buyuktosunoglu et al., GLS VLSI 2001 and Balasubramonian et al., MICRO 2000
Cache Results
Part II: Queues, Regs, and ROB
[Figure: the processor block diagram from "Integrating Multiple Adaptive Structures", repeated.]
Resizable Queues/Reg File
[Figure: a buffer divided into N partitions, P1 to PN, of m elements each.]
Buffer Sizing with Limited Histogramming
[Figure: three plots titled "Distribution of Buffer Size", each on an axis from 0 to Full, labeled "Grow buffer", "Proper size", and "Precise shrink", with "ave" markers.]
• 8K cycle period
• Tolerances:
  – 1.5% (1/64)
  – 6.2% (1/16)
  – 25.0% (1/4)
Resizing the Register File
• Issue: Do not know when registers expire
• Solution: To make the register file smaller, move values out of the partition (P) to be turned off
  – First, inhibit new assignments to P
  – Next, use a software interrupt routine to move values via normal rename logic: mov r1 r1
  – Register mappings automatically updated
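The shrink procedure can be sketched with a toy rename map. This is a simplified model under stated assumptions: the rename map is a plain dictionary from logical to physical register, the free list is a Python list, and in-flight instructions and commit-time freeing are ignored. The function name `shrink_register_file` and the register numbers are hypothetical.

```python
def shrink_register_file(rename_map, free_list, partition):
    """Vacate every physical register in `partition` so it can be turned off.

    rename_map -- dict: logical register -> physical register (updated in place)
    free_list  -- list of free physical registers (updated in place)
    Returns the list of (logical, old_physical, new_physical) moves performed.
    """
    # Step 1: inhibit new assignments to P by hiding its registers
    # from the free list.
    free_list[:] = [p for p in free_list if p not in partition]

    # Step 2: for each live value in P, model "mov rX rX". Renaming the
    # destination allocates a register outside P, and the map is updated
    # as a side effect of ordinary renaming.
    moves = []
    for logical, phys in sorted(rename_map.items()):
        if phys in partition:
            if not free_list:
                raise RuntimeError("no free register outside the partition")
            new_phys = free_list.pop(0)
            rename_map[logical] = new_phys
            moves.append((logical, phys, new_phys))
    return moves


rename_map = {"r1": 4, "r2": 1, "r3": 5}
free_list = [2, 6, 3, 7]
moves = shrink_register_file(rename_map, free_list, partition={4, 5, 6, 7})
print(moves)       # [('r1', 4, 2), ('r3', 5, 3)]
print(rename_map)  # {'r1': 2, 'r2': 1, 'r3': 3}
```

After the moves no mapping points into the partition, so in this model it can be powered down without tracking when each register would otherwise expire.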
Floating Point App Results
Summary Results
Conclusion
• Simultaneous adaptation of all major regular structures
  – Accounting cache
  – Limited histogramming for buffers
  – Adaptable register file
• Local control yet tolerable performance loss
• Future work
  – Augment local control with global control for bounded performance loss