Combining Software and Hardware Monitoring for Improved Power and Performance Tuning

Eric Chi, A. Michael Salem, and R. Iris Bahar, Brown University, Division of Engineering
Richard Weiss, Hampshire College, School of Cognitive Science




BROWN UNIVERSITY BARC January 30, 2003

Motivation
- Performance drives high-end processor design
  - Processors include many complex architectural features
  - Resources may not always be optimally utilized
- Resources dissipate some power regardless of utilization
- Dynamic schemes allow the processor to reconfigure resources according to the program’s needs
- Some means of monitoring the program is needed to drive reconfiguration


Monitoring Options
- Hardware monitoring
  - Relatively easy to implement
  - Can easily adjust to changing patterns
  - Must first recognize a pattern before reacting
  - Restricted to fixed-size sampling windows
- Software profiling
  - Reconfiguration occurs in anticipation of changing needs
  - Sampling ranges are adaptable
  - Requires instruction annotation and an initial sampling overhead
  - Only applicable to instructions with very deterministic behavior
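The reaction-time difference between the two options can be illustrated with a small model. This is a sketch under stated assumptions, not the authors' simulator: the IPC trace, the block names, and the threshold of 2.0 are hypothetical; only the idea of a fixed sampling window (256 cycles in the issue-width experiments) comes from the slides.

```python
def hw_switch_cycle(ipc_trace, window, threshold):
    """Cycle at which a fixed-window monitor first enters low-power (LP) mode.

    ipc_trace[c] is the number of instructions committed in cycle c. The
    monitor only evaluates average IPC when a full window closes, so the
    switch takes effect at a window boundary. Returns None if never triggered.
    """
    for start in range(0, len(ipc_trace) - window + 1, window):
        avg_ipc = sum(ipc_trace[start:start + window]) / window
        if avg_ipc < threshold:
            return start + window
    return None


def sw_switch_cycle(block_entries, annotated_blocks):
    """Cycle at which the first annotated block is decoded, or None.

    block_entries is a list of (block_name, entry_cycle) pairs.
    """
    hits = [cycle for name, cycle in block_entries if name in annotated_blocks]
    return min(hits) if hits else None


# Hypothetical trace: IPC 4 for 300 cycles, then a low-IPC phase at IPC 1.
trace = [4] * 300 + [1] * 700
entries = [("loop_a", 0), ("loop_b", 300)]  # hypothetical block names

print(hw_switch_cycle(trace, window=256, threshold=2.0))      # 512
print(sw_switch_cycle(entries, annotated_blocks={"loop_b"}))  # 300
```

The hardware monitor switches 212 cycles after the phase change, because it must first observe a window dominated by low IPC. The annotation takes effect at block entry, but only helps if the training profile predicted the block's behavior correctly.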


Why Not Combine?

- Each has its particular benefits
- If hardware and software techniques can be combined, can we improve the control policies driving processor reconfiguration?
- This could potentially lead to better energy savings and higher overall performance


Our Goal

- Have HW and SW profiling work together to better identify program behavior
  - Allow the processor to react more quickly to strongly deterministic behavior
  - Allow HW monitoring to assist with hard-to-predict cases, using hints from software profiling


Low Power Configurations

We consider two different configurations separately:
- Reducing issue width and ALUs
  - Saves power in the issue queue arbitration logic
  - Saves power from underutilized ALUs
- Fetch halting
  - Triggered by a critical load missing to main memory
  - Fetching is disabled for the duration of the miss
  - Reduces occupancy rates in the fetch and issue queues
  - Reduces the number of wrong-path instructions fetched
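The fetch-halting configuration amounts to a small piece of control state. The sketch below is hypothetical (the class and method names are invented for illustration), and tracking a set of outstanding misses so that overlapping critical misses are handled is an assumption beyond what the slide states.

```python
from dataclasses import dataclass, field


@dataclass
class FetchHaltController:
    """Hypothetical model of the fetch-halting configuration.

    Fetch is disabled while any critical load is waiting on main memory and
    re-enabled when the last such miss returns. Tracking a set of outstanding
    misses (to handle overlapping misses) is an assumption of this sketch.
    """
    outstanding: set = field(default_factory=set)

    def on_memory_miss(self, load_id: int, critical: bool) -> None:
        # Only critical loads that miss all the way to main memory halt fetch.
        if critical:
            self.outstanding.add(load_id)

    def on_miss_return(self, load_id: int) -> None:
        # discard() is a no-op for non-critical loads that were never tracked.
        self.outstanding.discard(load_id)

    @property
    def fetch_enabled(self) -> bool:
        return not self.outstanding


ctrl = FetchHaltController()
ctrl.on_memory_miss(1, critical=False)
print(ctrl.fetch_enabled)  # True: non-critical miss, keep fetching
ctrl.on_memory_miss(2, critical=True)
print(ctrl.fetch_enabled)  # False: halted for the duration of the miss
ctrl.on_miss_return(2)
print(ctrl.fetch_enabled)  # True
```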


Pipeline Organization

[Figure: pipeline organization. Blocks: annotation decoder, branch predictor, fetch unit, instruction cache, instruction decoder, instruction scheduler, register file, integer ALU clusters 1 and 2, floating-point ALU clusters 1 and 2, load/store units, data cache, and low-power state logic. Control actions labeled: disable fetch unit; disable auxiliary ALU cluster and reduce issue width.]


Adjusting Issue Width
- Adjust the issue width between 8 and 4 and disable the second integer ALU cluster
- SW approach profiles IPC on the train dataset
  - Annotates blocks with low IPC
  - Decoding the start of an annotated block triggers entry to low-power (LP) mode
- HW approach uses built-in counters to monitor IPC
  - Uses a fixed 256-cycle window
  - If integer IPC < threshold, enter LP mode
- Combined approach: SW steers blocks with consistent behavior; HW handles the remaining blocks
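The three policies can be sketched side by side. This is an illustrative model under assumptions, not the authors' implementation: the function and class names are hypothetical, the IPC threshold of 1.5 and the `max_spread` test for "consistent behavior" are invented, and only the 256-cycle window and the rule "SW steers consistent blocks, HW handles the rest" come from the slide.

```python
from statistics import mean


def annotate_blocks(train_profile, ipc_threshold, max_spread):
    """SW approach: derive block annotations from a training-input profile.

    train_profile maps a block id to the IPC samples seen for that block.
    Blocks whose samples vary by more than max_spread are treated as
    inconsistent and left unannotated so that the HW monitor handles them.
    """
    annotations = {}
    for block, samples in train_profile.items():
        if max(samples) - min(samples) > max_spread:
            continue  # inconsistent behavior: defer to hardware
        annotations[block] = "LP" if mean(samples) < ipc_threshold else "FULL"
    return annotations


class WindowIPCMonitor:
    """HW approach: integer IPC measured over a fixed window of cycles."""

    def __init__(self, window=256, threshold=1.5):
        self.window = window
        self.threshold = threshold  # hypothetical value
        self.cycles = 0
        self.committed = 0
        self.low_power = False

    def tick(self, int_committed):
        """Advance one cycle; re-decide the mode when a window closes."""
        self.cycles += 1
        self.committed += int_committed
        if self.cycles == self.window:
            self.low_power = self.committed / self.window < self.threshold
            self.cycles = 0
            self.committed = 0
        return self.low_power


def combined_low_power(block, annotations, monitor):
    """Combined approach: an SW annotation decides; otherwise use the HW monitor."""
    tag = annotations.get(block)
    if tag is not None:
        return tag == "LP"
    return monitor.low_power


profile = {"A": [0.8, 0.9, 1.0], "B": [3.5, 3.6], "C": [0.5, 3.8]}
ann = annotate_blocks(profile, ipc_threshold=1.5, max_spread=1.0)
print(ann)  # {'A': 'LP', 'B': 'FULL'}
mon = WindowIPCMonitor(window=4)  # tiny window just for the demo
for _ in range(4):
    mon.tick(1)
print(combined_low_power("C", ann, mon))  # True: HW saw IPC 1.0 < 1.5
print(combined_low_power("B", ann, mon))  # False: SW annotation wins
```

Block C is left to hardware because its training samples disagree, which mirrors the division of labor described above.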


Results for Reduced Issue Width

[Figure: % Time w/ Reduced Issue Width (y-axis 0% to 100%) for gzip, mgrid, vpr, gcc, mcf, equake, vortex, and Average; series: SW, HW, COMB.]

- SW and HW results are comparable
- Combined (COMB) results show that the SW and HW methods identify different opportunities for saving power


Results for Reduced Issue Width

[Figure: Performance w/ Reduced Issue Width (y-axis 93% to 100%) for gzip, mgrid, vpr, gcc, mcf, equake, vortex, and Average; series: SW, HW, COMB.]

SW performance is more consistent because thresholds can be tuned on a per-application basis
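Per-application tuning could look like the selection step below. This is a hypothetical sketch: the slides do not say how thresholds were chosen, and the sweep values are made-up numbers, not results from the slides. It picks, for one application, the threshold that keeps the processor in reduced-width mode longest while staying above a performance floor.

```python
def pick_threshold(sweep, min_perf):
    """Choose one application's IPC threshold from a training-input sweep.

    sweep: (threshold, relative_performance, fraction_of_time_reduced) tuples.
    Returns the threshold that maximizes reduced-width time while keeping
    relative performance at or above min_perf, or None if none qualifies.
    """
    eligible = [row for row in sweep if row[1] >= min_perf]
    if not eligible:
        return None
    return max(eligible, key=lambda row: row[2])[0]


# Made-up sweep for one application (not data from the slides).
sweep = [(1.0, 0.995, 0.10), (1.5, 0.980, 0.30), (2.0, 0.950, 0.55)]
print(pick_threshold(sweep, min_perf=0.97))  # 1.5
```

A single fixed hardware threshold cannot make this per-application trade-off, which is one plausible reading of why the SW results are more consistent.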


Fetch Halting
Requires a combination of SW and HW monitoring:
- SW profiling:
  - Identify critical loads that miss to main memory
  - Profiled metrics: IPC, occupancy rates, dead cycles, “miss stride”
- HW monitoring:
  - Using annotations from SW profiling, HW tracks miss behavior only for “promising” load instructions
  - The miss stride from the annotations is compared to a miss counter in HW to capture dynamic miss behavior
- For now, we simulate a perfect miss predictor
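The stride-versus-counter comparison might work roughly as follows. The slides simulate a perfect miss predictor, so this is only a hypothetical sketch: it assumes "miss stride" means the number of executions of a load between successive misses, and the class name, PC value, and stride of 8 are illustrative.

```python
class MissStridePredictor:
    """Hypothetical per-load miss predictor driven by SW-annotated miss strides.

    Only annotated ("promising") loads are tracked. A load's stride is the
    expected number of executions between misses to main memory; a hardware
    counter per load counts executions since the last observed miss.
    """

    def __init__(self, annotated_strides):
        self.strides = dict(annotated_strides)          # load PC -> miss stride
        self.counters = {pc: 0 for pc in self.strides}  # executions since last miss

    def predict_miss(self, pc):
        """Called as a load executes; True if it is predicted to miss."""
        if pc not in self.strides:
            return False  # unannotated loads are not tracked
        self.counters[pc] += 1
        return self.counters[pc] >= self.strides[pc]

    def update(self, pc, missed):
        """Resynchronize with the load's actual behavior when it misses."""
        if missed and pc in self.strides:
            self.counters[pc] = 0


# Example: a load that misses once every 8 executions, e.g. 8-byte elements
# streaming through 64-byte cache lines (an illustrative assumption).
pred = MissStridePredictor({0x400: 8})
actual = [(i % 8 == 0) for i in range(1, 17)]
predicted = []
for missed in actual:
    predicted.append(pred.predict_miss(0x400))
    pred.update(0x400, missed)
print(predicted == actual)  # True
```

Resetting the counter on each observed miss is what lets the hardware follow dynamic behavior that drifts away from the profiled stride.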


Fetch Halting Potential

Memory access rates show that the fetch-halting potential varies by benchmark

Benchmark   % DL1 miss   % L2 miss   % mem access
mgrid       3.9%         22.8%       0.9%
vpr         4.5%         24.7%       1.1%
gcc         0.5%         12.8%       0.1%
mcf         23.8%        48.0%       11.4%
twolf       6.4%         20.1%       1.3%
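The last column is consistent with multiplying the first two. For mgrid, 3.9% × 22.8% ≈ 0.9%. This suggests (an inference, not stated on the slide) that “% L2 miss” is the local L2 miss rate, i.e. the fraction of DL1 misses that also miss in L2. The snippet below checks that relation against every row of the table.

```python
# (benchmark, % DL1 miss, % L2 miss, % mem access), copied from the table above
ROWS = [
    ("mgrid", 3.9, 22.8, 0.9),
    ("vpr", 4.5, 24.7, 1.1),
    ("gcc", 0.5, 12.8, 0.1),
    ("mcf", 23.8, 48.0, 11.4),
    ("twolf", 6.4, 20.1, 1.3),
]


def mem_access_pct(dl1_miss_pct, l2_local_miss_pct):
    """Percent of data accesses reaching main memory, assuming the L2 rate is local."""
    return dl1_miss_pct * l2_local_miss_pct / 100


for name, dl1, l2, reported in ROWS:
    estimate = mem_access_pct(dl1, l2)
    print(f"{name:6s} estimated {estimate:.2f}%  reported {reported}%")
```

Every estimate rounds to the reported value, which is why mcf, with roughly one data access in nine reaching memory, stands out as the benchmark with the most halting potential.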


Results for fetch halting

Restricting fetch halting based on criticality information benefits performance

[Figure: Fetch Halting Performance and Percent Time Halting for mgrid, vpr, gcc, mcf, and twolf (two y-axes, each 0% to 100%); series: Perfect w/o Crit Performance, Perfect w/ Crit Performance, Perfect w/o Crit Halting Time, Perfect w/ Crit Halting Time.]


Fetch Halting and RUU Occupancy

A perfect miss predictor combined with criticality information (perfect w/ crit) results in an average 10% drop in RUU (Register Update Unit) occupancy

[Figure: Fetch Halting's Effect on RUU Occupancy for mgrid, vpr, gcc, mcf, and twolf (y-axis 50% to 100%); series: baseline, perfect w/o crit, perfect w/ crit.]


Conclusions and Future Work
- HW and SW predict different low-power events and can be combined, offering greater power-saving potential
- Future work:
  - Improve the HW/SW combination scheme
  - Improve the criticality predictor
  - Currently working on a HW miss predictor
  - Adjust the halt period