Dynamic Power Redistribution in Failure-Prone CMPs

Paula Petrica, Jonathan A. Winter* and David H. Albonesi

Cornell University

*Google, Inc.

Paula Petrica, WEED 2010

Motivation

Hardware failures expected to become prominent in future generations

Figure: Core with Front End (FE), Back End (BE), and Load-Store Queue (LSQ)

Motivation

Deconfiguration tolerates defects at the expense of performance
Pipeline imbalance
  Units correlated with the deconfigured one might become overprovisioned
  Power inefficiencies
  Application specific

Figure: Core with Front End (FE), Back End (BE), and Load-Store Queue (LSQ)

Research Goal

Given a CMP with a set of failures and a power budget:
  Eliminate power inefficiencies
  Improve performance

Outline

Motivation
Architecture
  Power Harnessing
  Performance Boosting
  Power Transfer Runtime Manager
Conclusions and future work

Architecture

Two-step approach
  Transfer power
  Harness power

Figure: Core 1 and Core 2, each with Front End (FE), Back End (BE), and Load-Store Queue (LSQ)

Power Harnessing

Figure: pipeline diagram showing BPred, I-Cache, FQ, Decode/Rename, Dispatch, ROB, IQ, Select, RF, and D-Cache, with FE, BE, and LSQ regions marked

Pipeline Imbalance

Figure: plot of Performance Loss and Power Saved

Performance Boosting

Distribute accumulated margin of power to boost performance
  Temporarily enable a previously dormant feature

Requirements
  Small area and fast power-up
  Small PPR (Power-Performance Ratio)
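As a rough illustration of the PPR requirement, the sketch below ranks candidate boosting features by power-performance ratio. The slides do not define PPR numerically, so the definition used here (percent power increase divided by percent performance increase) is an assumption, and the feature names and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoostOption:
    name: str
    power_increase_pct: float  # extra power drawn when enabled (hypothetical value)
    perf_increase_pct: float   # resulting performance gain (hypothetical value)


def ppr(option: BoostOption) -> float:
    """Assumed PPR: percent power paid per percent performance gained.

    Smaller is better. An option with no performance gain gets an infinite PPR.
    """
    if option.perf_increase_pct <= 0:
        return float("inf")
    return option.power_increase_pct / option.perf_increase_pct


def rank_by_ppr(options):
    """Return the options sorted from most to least power-efficient."""
    return sorted(options, key=ppr)


options = [
    BoostOption("feature_a", 6.0, 3.0),
    BoostOption("feature_b", 4.0, 4.0),
    BoostOption("feature_c", 9.0, 3.0),
]

for opt in rank_by_ppr(options):
    print(f"{opt.name}: PPR = {ppr(opt):.2f}")
```

Under this assumed definition, a feature is worth enabling first when it buys the most performance per unit of harnessed power.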

Performance Boosting Techniques

Speculative Cache Access
  Speculatively send L1 requests to the L2 cache
  Speculatively access both tag and data in the L2 cache at the same time (rather than serially)
  Turned on independently or in combination
  Approximately linear power-performance relationship
  Benefits applications limited by L1 cache capacity

Figure: Load, L1 Cache, L2 Cache (Tag, Data), and Lower Hierarchy Level, with hit and miss paths
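A minimal latency sketch of the two speculative modes may help show why they can be enabled independently or together. It assumes that a speculative L2 request starts alongside the L1 lookup rather than after the L1 miss is detected; that reading, the function names, and the cycle counts are assumptions, not figures from the slides.

```python
def l2_hit_latency(tag_cycles, data_cycles, parallel_tag_data):
    """L2 hit latency: serial access pays tag then data; parallel access overlaps them."""
    if parallel_tag_data:
        return max(tag_cycles, data_cycles)
    return tag_cycles + data_cycles


def load_latency_on_l1_miss(l1_cycles, tag_cycles, data_cycles,
                            speculative_l2_request, parallel_tag_data):
    """Latency of a load that misses in the L1 cache and hits in the L2 cache.

    With a speculative L2 request, the L2 access is assumed to overlap the
    L1 lookup instead of waiting for the L1 miss.
    """
    l2 = l2_hit_latency(tag_cycles, data_cycles, parallel_tag_data)
    if speculative_l2_request:
        return max(l1_cycles, l2)
    return l1_cycles + l2


# Hypothetical cycle counts: L1 lookup 3, L2 tag 4, L2 data 8.
for spec_request in (False, True):
    for parallel in (False, True):
        cycles = load_latency_on_l1_miss(3, 4, 8, spec_request, parallel)
        print(f"speculative request={spec_request}, parallel tag/data={parallel}: {cycles} cycles")
```

The model covers only latency. The power cost comes from the extra L2 activity each mode causes, such as data-array reads that a failed tag check would otherwise have avoided.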

Performance Boosting Techniques

Boosting main memory performance
  CLEAR [N. Kirman et al., HPCA 2005]
  Predict and speculatively retire long-latency loads
  Supply predicted values to destination registers
  Free processor resources for non-dependent instructions
  Linear power-performance relationship
  Benefits memory-bound applications

Performance Boosting Techniques

DVFS
  Scale up voltage and frequency
  Already built in
  Cubic power cost for linear performance benefit
  Benefits high-IPC applications
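The cubic cost can be illustrated with the usual first-order model: dynamic power scales as C·V²·f, and if voltage is scaled in proportion to frequency, power grows with the cube of the frequency scale. The sketch below applies that model to ask how much frequency a given power margin buys. It ignores static power, and the function names and wattages are hypothetical.

```python
def dvfs_power(base_power_w, freq_scale):
    """Power after scaling frequency (and voltage proportionally) by freq_scale.

    First-order model: P is proportional to V^2 * f with V proportional to f,
    so P scales with freq_scale cubed.
    """
    return base_power_w * freq_scale ** 3


def max_freq_scale(base_power_w, power_margin_w):
    """Largest frequency scale whose cubic power cost fits in the extra margin."""
    return ((base_power_w + power_margin_w) / base_power_w) ** (1.0 / 3.0)


base = 10.0    # hypothetical core power in watts
margin = 3.31  # hypothetical harnessed power in watts
scale = max_freq_scale(base, margin)
print(f"A {margin / base:.1%} power margin allows a frequency scale of about {scale:.3f}")
```

In this model a 33.1% power margin buys only about a 10% frequency increase, which is why DVFS pays off mainly when performance tracks frequency closely.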

Comparison of Boosting Techniques

Figure: plot of Performance Improvement

Architecture

Two-step approach
  Transfer power
  Harness power

Figure: Core 1 and Core 2, each with Front End (FE), Back End (BE), and Load-Store Queue (LSQ)

Power Transfer Runtime Manager

Periodically coordinate chip-wide effort to relocate power among cores
  Obtain current local hardware deconfiguration status (due to faults)
  Determine additional components to be deconfigured
  Transfer power to one or more mechanisms that make best use of it

Power Transfer Runtime Manager

Figure: flowchart with a Sampling Phase and a Steady Phase

Steps shown: Sample deconfigurations; Choose additional deconfiguration; Sample performance boosting; Compute global throughput with fairness; Choose best 4-core configuration; Apply DVFS (greedy)

Labels: Local decisions, Global decisions
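The global steps named on this slide can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the harmonic mean of per-core speedups stands in for "global throughput with fairness", the search over combinations is exhaustive, the greedy DVFS rule (always take the cheapest next frequency step) is a guess, and all names and numbers are hypothetical.

```python
from itertools import product


def harmonic_mean(values):
    """Stand-in for a throughput metric with fairness (assumption)."""
    return len(values) / sum(1.0 / v for v in values)


def choose_global_config(per_core_options, power_budget_w):
    """Pick one sampled option per core so that total power fits the budget.

    per_core_options: one list per core of (label, power_w, speedup) tuples.
    Returns (best_combination, score), or (None, -inf) if nothing fits.
    """
    best, best_score = None, float("-inf")
    for combo in product(*per_core_options):
        if sum(power for _, power, _ in combo) > power_budget_w:
            continue
        score = harmonic_mean([speedup for _, _, speedup in combo])
        if score > best_score:
            best, best_score = combo, score
    return best, best_score


def greedy_dvfs(core_powers_w, power_budget_w, step=0.05, max_scale=1.3):
    """Spend leftover power on frequency steps, cheapest step first.

    Uses a cubic power model (power proportional to frequency scale cubed).
    """
    scales = [1.0] * len(core_powers_w)
    while True:
        used = sum(p * s ** 3 for p, s in zip(core_powers_w, scales))
        candidates = []
        for i, (p, s) in enumerate(zip(core_powers_w, scales)):
            if s + step > max_scale + 1e-9:
                continue
            cost = p * ((s + step) ** 3 - s ** 3)
            if used + cost <= power_budget_w:
                candidates.append((cost, i))
        if not candidates:
            return scales
        _, cheapest = min(candidates)
        scales[cheapest] += step


# Hypothetical sampled options for a 4-core chip.
sampled = [
    [("base", 10.0, 1.00), ("boost", 12.0, 1.20)],
    [("base", 10.0, 0.80), ("boost", 12.0, 1.00)],
    [("base", 9.0, 0.90), ("deconfig", 7.0, 0.85)],
    [("base", 9.0, 0.70), ("boost", 11.0, 0.90)],
]
budget = 44.0
config, score = choose_global_config(sampled, budget)
print("chosen:", [label for label, _, _ in config], f"score={score:.3f}")
print("DVFS scales:", greedy_dvfs([power for _, power, _ in config], budget))
```

Exhaustive search is workable here because four cores with a handful of options each yield few combinations; the count grows exponentially with the number of cores.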

Global vs Local Optimization

100 4-core configurations, random errors and random SPEC CPU2000 benchmarks

Figure: Speedup chart with labeled values 22.2% and 10.0%

Diversity of Boosting Techniques

100 4-core configurations, random errors and random SPEC CPU2000 benchmarks

Figure: Speedup chart with labeled values 22.2% and 6.3%

Power Transfer Runtime Manager

100 4-core configurations, random errors and random SPEC CPU2000 benchmarks

Figure: Speedup chart with labeled values 22.2%, 15.3%, 10.0%, and 6.3%

Conclusions

We proposed a technique to increase performance given a certain power budget in the presence of hard faults
Exploited the deconfiguration capabilities already built into microprocessors
Demonstrated that pipeline imbalances and additional deconfiguration are application-dependent
Proposed several boosting techniques
Demonstrated the potential for substantial performance gains for a 4-core CMP

Future Work

Heuristic approaches to scale this problem to many cores
  Simulated Annealing, Genetic Algorithm
Pareto optimal fronts to reduce the number of combinations
Hierarchical optimization
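One way to read the Pareto-front idea is as a per-core filter applied before combinations are enumerated: any option that another option beats on both power and performance is dropped. The sketch below is a generic Pareto filter under that reading; the tuple layout and the numbers are hypothetical.

```python
def pareto_front(configs):
    """Keep the configurations that no other configuration dominates.

    configs: list of (label, power_w, performance) tuples.
    One configuration dominates another if it uses no more power, delivers
    no less performance, and is strictly better in at least one of the two.
    """
    front = []
    for label, power, perf in configs:
        dominated = any(
            other_power <= power and other_perf >= perf
            and (other_power < power or other_perf > perf)
            for _, other_power, other_perf in configs
        )
        if not dominated:
            front.append((label, power, perf))
    return front


# Hypothetical per-core options: (label, power in watts, relative performance).
core_options = [
    ("A", 10.0, 1.00),
    ("B", 12.0, 1.20),
    ("C", 12.0, 1.10),  # dominated by B: same power, lower performance
    ("D", 9.0, 0.80),
    ("E", 11.0, 0.95),  # dominated by A: more power, lower performance
]
print([label for label, _, _ in pareto_front(core_options)])
```

Pruning five options per core down to three would cut a 4-core search from 5^4 = 625 combinations to 3^4 = 81. Dropping dominated options should not change the best result as long as the global objective never decreases when a single core's performance increases.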

Questions?