Integrating Adaptive On-Chip Storage Structures for Reduced Dynamic Power

Steve Dropsho, Alper Buyuktosunoglu, Rajeev Balasubramonian, David H. Albonesi, Sandhya Dwarkadas, Greg Semeraro, Grigorios Magklis, and Michael Scott

ECE and CS Departments, University of Rochester



Why Adaptive Structures?

• General-purpose microprocessors are “one size fits all”

• But needs vary across (and within) applications

• Can save considerable energy by matching resources to the application

Objective: Less energy for the same performance by adapting storage structures to the application


Related Work

• Adaptable cache

– Balasubramonian et al., MICRO 2000

– Dhodapkar and Smith, ISCA 2002

• Adaptable issue logic

– Buyuktosunoglu et al., GLS VLSI 2001

– Folegnani and Gonzalez, ISCA 2000


Common Themes

• A single adaptive structure

• Use of global information for feedback

• Exploration-based (caches)


Related Work (cont)

• Adaptable IQ, LSQ, and ROB

– Ponomarev et al., MICRO 2001

– Three (3) adaptable structures

– Reconfigurations based on local state


Integrating Multiple Adaptive Structures

[Figure: processor block diagram. Block labels: L1 Icache, Branch predict, FetchQ, Rename map, ROB, IIQ, FPQ, LSQ, IPREG, FPREG, Int FUs, FP FUs, L1 Dcache, L2 Unified Cache. Region labels: Integer, Memory, Floating Pt.]


Challenges

• Multiple (9) adaptive structures create a state explosion problem

• Use of global information makes assigning cause and effect difficult

• Potential for additive performance effects among the structures


Approach: Local Management

• Local information for configuration decisions

• Tight control over performance variance


Part I: The Caches

[Figure: processor block diagram. Block labels: L1 Icache, Branch predict, FetchQ, Rename map, ROB, IIQ, FPQ, LSQ, IPREG, FPREG, Int FUs, FP FUs, L1 Dcache, L2 Unified Cache. Region labels: Integer, Memory, Floating Pt.]


The Accounting Cache

A access (primary)

B access (secondary)

• Sequential accesses, A then B

• Save energy on A access hit

• Swap blocks on A access miss

[Figure: a 4-way cache set (ways 0 to 3) shown under each A/B partitioning, with a swap between the partitions. Configurations: A1 B3, A2 B2, A3 B1, A4 B0.]
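The access sequence on this slide can be illustrated with a small sketch of a single set. It is a minimal model under stated assumptions, not the authors' design: the class name `AccountingSet`, the timestamp-based LRU tracking, and the replacement choice on a miss (evict the least recently used block in B and demote A's least recently used block into its place) are all hypothetical details that the slide does not specify.

```python
class AccountingSet:
    """One cache set split into an A (primary) and a B (secondary) partition."""

    def __init__(self, num_ways=4, a_ways=2):
        self.tags = [None] * num_ways      # tag held in each physical way
        self.last_use = [0] * num_ways     # timestamp of the last access per way
        self.a_ways = a_ways               # ways [0, a_ways) form A, the rest form B
        self.clock = 0

    def _lru(self, ways):
        # Least recently used way among the given ways.
        return min(ways, key=lambda w: self.last_use[w])

    def access(self, tag):
        self.clock += 1
        a = range(self.a_ways)
        b = range(self.a_ways, len(self.tags))

        # First probe: A partition only (the cheap access).
        for w in a:
            if self.tags[w] == tag:
                self.last_use[w] = self.clock
                return "A hit"

        victim_a = self._lru(a)

        # Second probe: B partition. On a hit, swap the block into A.
        for w in b:
            if self.tags[w] == tag:
                self.tags[w], self.tags[victim_a] = self.tags[victim_a], self.tags[w]
                self.last_use[w] = self.last_use[victim_a]
                self.last_use[victim_a] = self.clock
                return "B hit"

        # Miss (assumed policy): demote A's LRU block into B's LRU way,
        # then place the new block in A.
        if len(b) > 0:
            victim_b = self._lru(b)
            self.tags[victim_b] = self.tags[victim_a]
            self.last_use[victim_b] = self.last_use[victim_a]
        self.tags[victim_a] = tag
        self.last_use[victim_a] = self.clock
        return "miss"
```

With `a_ways=1` (the A1 B3 configuration), a block that was pushed out of A is found on the second probe and swapped back, so the next access to it hits in A.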


Most-Recently-Used Statistics

[Figure: MRU state transitions for a 4-way set (ways 1 to 4 holding lines A to D, MRU positions 0 to 3) over a short access sequence, and the resulting MRU state counters: MRU[0] = 3, MRU[1] = 2, MRU[2] = 1, MRU[3] = 0, Misses = 0.]
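A minimal sketch of how such counters could be gathered is shown below. The function name `mru_counts` and the access sequence in the usage note are hypothetical; the sequence was chosen only because it reproduces the counter values on the slide, and it is not claimed to be the sequence in the original figure.

```python
def mru_counts(accesses, initial_order):
    """Count hits per MRU position for one set.

    order[0] is the most recently used line, order[-1] the least recently used.
    Returns (counts, misses), where counts[i] is the number of hits at MRU position i.
    """
    order = list(initial_order)
    counts = [0] * len(order)
    misses = 0
    for tag in accesses:
        if tag in order:
            pos = order.index(tag)
            counts[pos] += 1          # hit at MRU position pos
            order.pop(pos)
        else:
            misses += 1
            order.pop()               # evict the least recently used line
        order.insert(0, tag)          # accessed line becomes most recently used
    return counts, misses
```

For example, `mru_counts("AABACC", "ABCD")` yields three hits at position 0, two at position 1, one at position 2, and no misses.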


Configuration Evaluation

MRU[0] (mru)   MRU[1]   MRU[2]   MRU[3] (lru)   Misses
3              2        1        0              0

Configuration    Delay            Energy
A1 B3            6 D_A + 3 D_B    6 E_1 + 3 E_3
A2 B2            6 D_A + 1 D_B    7 E_2
A3 B1            6 D_A            6 E_3
A4 B0 (BASE)     6 D_A            6 E_4
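The delay and energy expressions on this slide follow from the MRU counters: every access probes A, and any access that does not hit within the first k MRU positions also probes B. The sketch below reproduces that arithmetic. The function name `evaluate_configs`, the treatment of misses as extra B probes, and the numeric per-access costs in the usage note are assumptions made for illustration; the cost of servicing a miss at the next level is not modeled.

```python
def evaluate_configs(mru, misses, d_a, d_b, energy_per_ways):
    """Estimate delay and energy of every A/B split from one set of MRU counters.

    mru[i]             hits at MRU position i
    d_a, d_b           delay of one A probe and one B probe
    energy_per_ways[k] energy of probing a partition that is k ways wide
    Returns {config_name: (delay, energy)}.
    """
    total = sum(mru) + misses
    n = len(mru)
    results = {}
    for a_ways in range(1, n + 1):
        b_ways = n - a_ways
        # Accesses that miss in A go on to probe B (if B exists).
        b_accesses = sum(mru[a_ways:]) + misses if b_ways else 0
        delay = total * d_a + b_accesses * d_b
        energy = total * energy_per_ways[a_ways]
        if b_ways:
            energy += b_accesses * energy_per_ways[b_ways]
        results[f"A{a_ways} B{b_ways}"] = (delay, energy)
    return results
```

With the slide's counters `[3, 2, 1, 0]`, zero misses, unit delays, and a hypothetical cost of k energy units for a k-way probe, A2 B2 costs 6 + 1 = 7 probes of a 2-way partition, which matches the 7 E_2 term.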


Tolerance and the Bank Account

• Tolerance allows more delay than BASE

– D_TOL = D_BASE (1 + TOL)

– TOL = {0.015, 0.062, 0.25} (1/64, 1/16, 1/4)

• Bank account allows accumulation of unused tolerance

• Use account credits in later intervals

– Allows aggressive resizing

– Amortizes mistakes over many intervals
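One way to read the tolerance and bank-account bullets is as a per-interval budget: the allowed delay is D_TOL plus any saved credits, the cheapest configuration that fits the budget is chosen, and whatever is left over is carried forward. The sketch below follows that reading. The function name `choose_config`, the exact credit update, and the fallback when nothing fits are assumptions; the slide does not give the precise rule.

```python
def choose_config(candidates, d_base, tol, account):
    """Pick the lowest-energy configuration within the delay budget.

    candidates  {name: (delay, energy)} estimated for the interval
    d_base      delay of the BASE configuration
    tol         tolerance, e.g. 1/64, 1/16, or 1/4
    account     unused delay credit carried over from earlier intervals
    Returns (chosen_name, new_account).
    """
    budget = d_base * (1 + tol) + account          # D_TOL plus saved credits
    feasible = [n for n, (delay, _) in candidates.items() if delay <= budget]
    if feasible:
        name = min(feasible, key=lambda n: candidates[n][1])
    else:
        # Assumed fallback: nothing fits, so take the fastest configuration.
        name = min(candidates, key=lambda n: candidates[n][0])
    new_account = budget - candidates[name][0]     # unused tolerance is banked
    return name, new_account
```

Under this reading, an interval that uses less delay than its budget leaves credit behind, which lets a later interval afford a more aggressive (slower, lower-energy) configuration.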


Memory Hierarchy

[Figure: one possible configuration of the memory hierarchy, each cache drawn as a 4-way set (ways 0 to 3): L1 I-cache (A/B), L1 D-cache (A, no B), L2 unified cache (A/B).]


Environment

• SimpleScalar simulator

• Microarchitecture is similar to Alpha 21264

• Benchmarks are a mix of SPEC95, SPEC2K, and Olden

• Energy models for buffers and caches from Buyuktosunoglu et al., GLS VLSI 2001 and Balasubramonian et al., MICRO 2000


Cache Results


Part II: Queues, Regs, and ROB

[Figure: processor block diagram. Block labels: L1 Icache, Branch predict, FetchQ, Rename map, ROB, IIQ, FPQ, LSQ, IPREG, FPREG, Int FUs, FP FUs, L1 Dcache, L2 Unified Cache. Region labels: Integer, Memory, Floating Pt.]


Resizable Queues/Reg File

[Figure: a buffer divided into partitions P1 through PN, each m elements wide.]

N partitions of m elements


Buffer Sizing

[Figure: Distribution of Buffer Size. Panels with an x-axis from 0 to Full and "ave" markers, labeled "Grow buffer", "Proper size", and "Precise shrink".]

• 8K cycle period

• Tolerances:

– 1.5% (1/64)

– 6.2% (1/16)

– 25.0% (1/4)

With Limited Histogramming
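The slide does not spell out the sizing rule, so the following is only one plausible reading of a "precise shrink" driven by an occupancy histogram: over a sampling period, record how many partitions the buffer actually needed, then pick the smallest size that would have been exceeded in no more than a TOL fraction of samples. The function name `choose_partitions`, the per-partition histogram granularity, and the overflow criterion are all assumptions, and the grow decision (which would need a count of buffer-full events) is not modeled.

```python
def choose_partitions(histogram, tol):
    """Pick the number of active partitions for the next period.

    histogram[i]  number of samples in which occupancy needed exactly i + 1 partitions
    tol           fraction of samples allowed to exceed the chosen size
    Returns the smallest partition count whose overflow stays within tol.
    """
    total = sum(histogram)
    allowed = tol * total
    overflow = 0                       # samples that would not have fit
    size = len(histogram)
    # Drop partitions from the top while the overflow stays within the allowance.
    while size > 1 and overflow + histogram[size - 1] <= allowed:
        overflow += histogram[size - 1]
        size -= 1
    return size
```

For a hypothetical histogram `[700, 250, 40, 10]`, a tolerance of 1/64 keeps three partitions active, while 1/16 and 1/4 both shrink the buffer to two.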


Resizing the Register File

• Issue: Do not know when registers expire

• Solution: to make the register file smaller, move values out of the partition (P) to be turned off

– First, inhibit new assignments to P

– Next, use a software interrupt routine to move values via normal rename logic (mov r1 r1)

– Register mappings are automatically updated
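The steps above can be sketched with a toy rename map. A `mov r1 r1` leaves the architectural value unchanged, but renaming its destination allocates a fresh physical register, and with allocations to P inhibited that register must lie outside P. The function name `drain_partition`, the dictionary-based rename map, and the simple free list are hypothetical; in-flight instructions and the copying of the values themselves are not modeled.

```python
def drain_partition(rename_map, free_list, partition):
    """Move every logical register out of a physical partition that will be turned off.

    rename_map  {logical_name: physical_index}, updated in place
    free_list   free physical register indices
    partition   set of physical indices belonging to P
    Returns (moves, remaining_free), where moves lists (logical, old_phys, new_phys).
    """
    # Step 1: inhibit new assignments to P by hiding its registers from the allocator.
    usable = [p for p in free_list if p not in partition]

    # Step 2: issue the equivalent of "mov rX rX" for each value still living in P.
    moves = []
    for logical, phys in sorted(rename_map.items()):
        if phys in partition:
            new_phys = usable.pop(0)        # raises IndexError if no register is free
            rename_map[logical] = new_phys  # step 3: the mapping updates as a side effect
            moves.append((logical, phys, new_phys))
    return moves, usable
```

After the call, no logical register maps into P, so the partition can be powered down without tracking when its old values would otherwise have expired.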


Floating Point App Results


Summary Results


Conclusion

• Simultaneous adaptation of all major regular structures

– Accounting cache

– Limited histogramming for buffers

– Adaptable register file

• Local control yet tolerable performance loss

• Future work

– Augment local control with global control for bounded performance loss