Combining Software and Hardware Monitoring for Improved Power and Performance Tuning
Eric Chi, A. Michael Salem, and R. Iris Bahar
Brown University, Division of Engineering
Richard Weiss
Hampshire College, School of Cognitive Science
BROWN UNIVERSITY BARC January 30, 2003

Motivation
- Performance drives high-end processor design
- Include many complex architectural features
- Resources may not always be optimally utilized
- Resources dissipate some power regardless of utilization
- Dynamic schemes allow processor to reconfigure resources according to program’s needs
- Some means of monitoring program is needed to drive reconfiguration

Monitoring Options
- Hardware monitoring
  - Relatively easy to implement
  - Can easily adjust to changing patterns
  - Must first recognize pattern before reacting
  - Restricted to fixed-sized sampling windows
- Software profiling
  - Reconfiguration occurs in anticipation of changing needs
  - Sampling ranges are adaptable
  - Requires instruction annotation and initial sampling overhead
  - Only applicable to instructions with very deterministic behavior

Why Not Combine?
- Each has its particular benefits
- If hardware and software techniques can be combined, can we improve the control policies driving processor reconfiguration?
- Potentially lead to better energy savings and higher overall performance

Our Goal
- Have HW and SW profiling work together to better identify program behavior
  - Allow processor to react more quickly to strongly deterministic behavior
  - Allow HW monitoring to assist with hard-to-predict cases with hints from software profiling

Low Power Configurations
We consider 2 different configurations separately:
- Reducing issue width and ALUs
  - Save power in issue queue arbitration logic
  - Save power from underutilized ALUs
- Fetch Halting
  - Triggered by a critical load missing to main memory
  - Fetching is disabled for the duration of the miss
  - Reduces occupancy rates in fetch and issue queues
  - Reduces number of wrong path instructions fetched

Pipeline Organization
[Figure: pipeline block diagram. Blocks: Annotation Decoder, Branch Predictor, Fetch Unit, Instruction Cache, Instruction Decoder, Instruction Scheduler, Register File, Integer ALU Cluster 1, Integer ALU Cluster 2, Floating Point ALU Cluster 1, Floating Point ALU Cluster 2, Load/Store Units, Data Cache, Low-Power State Logic. Control labels: "Disable Fetch Unit" and "Disable auxiliary ALU cluster and reduce issue width".]

Adjusting Issue Width
- Adjust issue width between 8 and 4 and disable second integer ALU cluster
- SW approach profiles IPC from train dataset
  - Annotates blocks with low IPC
  - Decoding start of block triggers entry to LP mode
- HW approach using built-in counters to monitor IPC
  - Use fixed 256 cycle window
  - If integer IPC < threshold, enter LP mode
- Combined approach
  - SW steers blocks with consistent behavior
  - HW handles remaining blocks
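
The three approaches on the issue-width slide can be sketched roughly as follows. This is only an illustrative model, not the authors' simulator code: the class name, the `ipc_threshold` parameter, and the `sw_annotation` argument are hypothetical, and the rule for leaving LP mode (a later window with integer IPC at or above the threshold) is an assumption, since the slides only describe entry.

```python
from dataclasses import dataclass
from typing import Optional

WINDOW_CYCLES = 256   # fixed HW sampling window stated on the slide
FULL_WIDTH = 8        # normal issue width
REDUCED_WIDTH = 4     # issue width in low-power (LP) mode


@dataclass
class IssueWidthController:
    """Hypothetical model of HW-only and combined SW+HW control."""
    ipc_threshold: float        # assumed tuning parameter, not from the slides
    cycles: int = 0             # cycles seen in the current window
    int_instructions: int = 0   # integer instructions seen in the current window
    hw_low_power: bool = False  # decision taken at the last window boundary

    def tick(self, int_committed: int) -> None:
        """Record one cycle; re-evaluate the HW decision every 256 cycles."""
        self.cycles += 1
        self.int_instructions += int_committed
        if self.cycles == WINDOW_CYCLES:
            int_ipc = self.int_instructions / WINDOW_CYCLES
            # Assumption: the same comparison is used to enter and leave LP mode.
            self.hw_low_power = int_ipc < self.ipc_threshold
            self.cycles = 0
            self.int_instructions = 0

    def issue_width(self, sw_annotation: Optional[bool] = None) -> int:
        """Return the issue width for the block being decoded.

        sw_annotation is True/False when software profiling annotated the
        block as consistently low/high IPC, and None when the block carries
        no annotation, in which case the HW window decision is used.
        """
        if sw_annotation is None:
            low_power = self.hw_low_power
        else:
            low_power = sw_annotation
        return REDUCED_WIDTH if low_power else FULL_WIDTH
```

Calling `issue_width()` with no argument models the HW-only scheme; passing an annotation models the combined scheme, where software overrides the window-based decision only for blocks it profiled as consistent.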

Results for Reduced Issue Width
[Figure: "% Time w/ Reduced Issue Width". Bar chart, y-axis 0% to 100%; benchmarks gzip, mgrid, vpr, gcc, mcf, equake, vortex, and Average; series SW, HW, COMB.]
- SW and HW results are comparable
- COMBined results show that SW + HW methods identify different opportunities for saving power

Results for Reduced Issue Width
[Figure: "Performance w/ Reduced Issue Width". Bar chart, y-axis 93% to 100%; benchmarks gzip, mgrid, vpr, gcc, mcf, equake, vortex, and Average; series SW, HW, COMB.]
- SW performance is more consistent because thresholds can be tuned on a per-application basis

Fetch Halting
Requires a combination of SW and HW monitoring:
- SW profiling:
  - Identify critical loads that miss to main memory
  - IPC, occupancy rates, dead cycles, “miss stride”
- HW monitoring:
  - Using annotations from SW profiling, HW tracks miss behavior only for “promising” load instructions
  - Miss stride from annotations is compared to miss counter in HW to capture dynamic miss behavior
- For now we simulate a perfect miss-predictor
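
One way the comparison between an annotated miss stride and a HW miss counter might work is sketched below. The slides do not define "miss stride" or describe the counter, and the results shown use a simulated perfect miss predictor, so everything here is an assumption: the stride is taken to mean "this load misses to main memory on every Nth execution", and the class, its methods, and the annotation format (load PC mapped to stride) are hypothetical.

```python
from typing import Dict


class MissStrideTracker:
    """Hypothetical per-load counter compared against a profiled miss stride."""

    def __init__(self, annotated_strides: Dict[int, int]) -> None:
        # Assumed annotation format: load PC -> profiled miss stride.
        self.strides = dict(annotated_strides)
        # Executions of each annotated load since its last observed miss.
        self.since_miss = {pc: 0 for pc in annotated_strides}

    def predict_miss(self, pc: int) -> bool:
        """Predict whether the upcoming execution of the load at pc will miss."""
        if pc not in self.strides:
            return False  # only annotated ("promising") loads are tracked
        return self.since_miss[pc] + 1 >= self.strides[pc]

    def update(self, pc: int, missed: bool) -> None:
        """Feed back the observed outcome so the counter follows dynamic behavior."""
        if pc not in self.strides:
            return
        if missed:
            self.since_miss[pc] = 0
        else:
            self.since_miss[pc] += 1
```

In this sketch a predicted miss on a load that software also marked as critical would be the trigger for halting fetch; resetting the counter on every observed miss lets the prediction resynchronize when the dynamic pattern drifts from the profiled one.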

Fetch Halting Potential
Memory access rates show that the fetch halting potential varies for each benchmark

Benchmark   % DL1 miss   % L2 miss   % mem access
mgrid        3.9%        22.8%        0.9%
vpr          4.5%        24.7%        1.1%
gcc          0.5%        12.8%        0.1%
mcf         23.8%        48.0%       11.4%
twolf        6.4%        20.1%        1.3%
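
The "% mem access" column appears to be the product of the two miss rates, which would hold if "% L2 miss" is a local rate (misses per L2 access). The slides do not state this, so the short check below treats it as an inference; the function name is illustrative.

```python
# (benchmark, % DL1 miss, % L2 miss, reported % mem access) from the table
TABLE = [
    ("mgrid", 3.9, 22.8, 0.9),
    ("vpr", 4.5, 24.7, 1.1),
    ("gcc", 0.5, 12.8, 0.1),
    ("mcf", 23.8, 48.0, 11.4),
    ("twolf", 6.4, 20.1, 1.3),
]


def mem_access_pct(dl1_miss_pct: float, l2_miss_pct: float) -> float:
    """Percent of data accesses reaching main memory, assuming a local L2 miss rate."""
    return dl1_miss_pct * l2_miss_pct / 100.0


for name, dl1, l2, reported in TABLE:
    computed = mem_access_pct(dl1, l2)
    print(f"{name:6s} computed {computed:6.3f}%  reported {reported:5.1f}%")
```

Under that reading, every computed value agrees with the reported column to within rounding at one decimal place, and mcf stands out as sending by far the most accesses to main memory.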

Results for Fetch Halting
- Restricting fetch halting based on criticality information benefits performance
[Figure: "Fetch Halting Performance and Percent Time Halting". Two y-axes, each 0% to 100%; benchmarks mgrid, vpr, gcc, mcf, twolf; series Perfect w/o Crit Performance, Perfect w/ Crit Performance, Perfect w/o Crit Halting Time, Perfect w/ Crit Halting Time.]

Fetch Halting and RUU Occupancy
- Perfect + crit results in average 10% RUU occupancy drop
[Figure: "Fetch Halting's Effect on RUU Occupancy". Bar chart, y-axis 50% to 100%; benchmarks mgrid, vpr, gcc, mcf, twolf; series baseline, perfect w/o crit, perfect w/ crit.]

Conclusions and Future Work
- HW and SW predict different low power events and can be combined, offering greater power saving potential
- Future work:
  - Improve HW/SW combination scheme
  - Improve criticality predictor
  - Currently working on HW miss predictor
  - Adjust the halt period