Advanced Computer Architecture Lab, The University of Michigan
Razor DVS, Dan Ernst – 12/3/2003
Razor: Dynamic Voltage Scaling Based on Circuit-Level Timing Speculation
Advanced Computer Architecture Laboratory, The University of Michigan
Dan Ernst, Nam Sung Kim, Shidhartha Das, Sanjay Pant, Rajeev Rao, Toan Pham, and Conrad Ziesler
Faculty Members: David Blaauw, Todd Austin, and Trevor Mudge
Krisztián Flautner, ARM Ltd.
December 3rd, 2003
Dynamic Voltage Scaling and Design Uncertainty
• DVS: adapting voltage/frequency to meet the performance demands of the workload
– Lower processor voltage during periods of low utilization
– Lower voltage is a Good Thing™ for power
• Minimum voltage is limited by safety margins
– Error-free operation must be guaranteed!
• Technology trends are maximizing the minimums
– Process and temperature variation
– Capacitive and inductive noise
• Key observation: worst-case conditions are also highly improbable
– Significant gain for circuits optimized for the common case
– Efficient mechanisms needed to tolerate infrequent worst-case scenarios
[Figure: Intra-die variations in ILD thickness]
Shaving Voltage Margins with Razor
• Goal: reduce voltage margins with in-situ error detection and correction for delay failures
• Proposed approach:
– Remove safety margins and tolerate occasional errors
– Tune processor voltage based on error rate
– Purposely run below critical voltage to capture data-dependent latency margins
• Trade-off: voltage power savings vs. overhead of correction
– Analogous to wireless power modulation
[Figure: Percentage errors (0 to 60) vs. supply voltage (0.8 V to 2.0 V), marking the Traditional DVS, Zero margin, and Sub-critical operating points]
Razor Timing Error Detection
• Second sample of logic value used to validate earlier sample
• Key design issues:
– Maintaining pipeline forward progress
– Meta-stable results in main flip-flop
– Short path impact on shadow latch
– Recovering pipeline state after errors
– Power overhead of error detection and correction
[Figure: Main FF and Shadow Latch sampling the same logic output, the main FF on clk and the shadow latch on clk_del]
Razor Short Path Constraint
[Figure: Main FF on clk and Shadow Latch on clk_del; the delayed shadow-latch clock imposes a hold constraint of ~1/2 cycle]
Centralized Razor Pipeline Error Recovery
• One-cycle penalty for timing failure
• Global synchronization may be difficult for fast, complex designs
[Figure: Pipeline IF, ID, EX, MEM, WB (reg/mem) with Razor FFs between stages and at the PC, error and recover signals, and inst1 to inst6 over cycles 0 to 6]
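A minimal cycle-level sketch of the one-cycle penalty, assuming an in-order five-stage pipeline that fetches one instruction per cycle and treats every flagged error as a global one-cycle stall while shadow-latch values are restored. The function name and stall model are illustrative assumptions, not the prototype's control logic.

```python
STAGES = ("IF", "ID", "EX", "MEM", "WB")


def run_centralized(num_insts: int, error_cycles: set) -> int:
    """Cycles needed to retire num_insts instructions (hypothetical model).

    In every cycle listed in error_cycles some Razor FF flags an error; the
    global recover signal then holds every stage for one cycle while the
    correct shadow-latch values are copied into the main flip-flops.
    """
    pipe = [None] * len(STAGES)  # pipe[i]: instruction id in stage i
    next_inst = 0
    retired = 0
    cycle = 0
    while retired < num_insts:
        if cycle in error_cycles:
            cycle += 1  # global stall: nothing advances this cycle
            continue
        # Normal cycle: fetch a new instruction and shift the rest forward.
        fetched = next_inst if next_inst < num_insts else None
        if fetched is not None:
            next_inst += 1
        pipe = [fetched] + pipe[:-1]
        if pipe[-1] is not None:  # the instruction in WB retires this cycle
            retired += 1
        cycle += 1
    return cycle


print(run_centralized(6, set()))   # 10
print(run_centralized(6, {2, 7}))  # 12
```

Each flagged cycle adds exactly one cycle; the price is that the stall has to reach every stage within that same cycle, which is the global-synchronization concern noted on the slide.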
Distributed Razor Pipeline Error Recovery
• Multiple-cycle penalty for timing failure
• Scalable design since all recovery communication is local
• Builds on existing branch / data speculation recovery framework
[Figure: Pipeline IF, ID, EX, MEM (read-only), WB (reg/mem) with Razor FFs, a Stabilizer FF, PC, error-triggered bubbles, and flushID signals to Flush Control; inst1 to inst8 over cycles 0 to 9]
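A minimal sketch of why the distributed scheme costs several cycles, under a hypothetical recovery timing: one cycle after an error, the failing instruction stays in its stage (re-evaluated with its shadow-latch value, leaving a bubble behind it), older instructions keep moving, younger ones are flushed, and fetch restarts right after the failing instruction. The exact flush timing of the real design is not given on the slide, so treat the cycle counts as illustrative.

```python
STAGES = ("IF", "ID", "EX", "MEM", "WB")


def run_distributed(num_insts: int, errors: dict) -> int:
    """Cycles needed to retire num_insts instructions (hypothetical model).

    errors maps a cycle number to the stage index (0=IF .. 3=MEM) whose Razor
    FF flags a timing error in that cycle. In the following cycle:
      * the failing instruction stays in its stage, re-evaluated with the
        shadow-latch value, so a bubble enters the next stage;
      * older instructions (later stages) keep moving;
      * younger instructions (earlier stages) are flushed and fetch restarts
        at the instruction after the failing one.
    """
    n = len(STAGES)
    if any(not 0 <= s < n - 1 for s in errors.values()):
        raise ValueError("errors are only modeled in stages IF..MEM")
    pipe = [None] * n
    next_inst = 0
    retired = 0
    cycle = 0
    pending = None  # (stage, instruction) flagged in the previous cycle

    def fetch():
        nonlocal next_inst
        if next_inst >= num_insts:
            return None
        next_inst += 1
        return next_inst - 1

    while retired < num_insts:
        if pending is None:
            pipe = [fetch()] + pipe[:-1]
        else:
            stage, failing = pending
            new = [None] * n
            new[stage + 2:] = pipe[stage + 1:n - 1]  # older instructions advance
            new[stage] = failing  # redo with the correct shadow-latch value
            next_inst = failing + 1  # younger instructions are flushed
            if stage > 0:
                new[0] = fetch()  # refetch behind the failing instruction
            pipe = new
            pending = None
        if pipe[-1] is not None:  # the instruction in WB retires this cycle
            retired += 1
        flagged = errors.get(cycle)
        if flagged is not None and pipe[flagged] is not None:
            pending = (flagged, pipe[flagged])
        cycle += 1
    return cycle


print(run_distributed(6, {}))      # 10
print(run_distributed(6, {2: 2}))  # 12: an EX error costs two cycles here
```

In this model an isolated error costs more the deeper it occurs in the pipeline, versus the single cycle of the centralized scheme, but every signal involved travels only between neighboring stages.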
Error-Rate Studies – Hardware Measurement
Error Rate Studies – Empirical Results
[Figure: 18x18-bit Multiplier Block at 90 MHz and 27 C: error rate (log scale, 0.0000000% to 100%) vs. supply voltage (1.14 V to 1.78 V) for random inputs, marking the zero-margin point @ 1.54 V and the environmental-margin point @ 1.69 V]
Plot annotations: 35% energy savings with 1.3% error; 22% saving; once every 20 seconds!
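Reading the "once every 20 seconds" annotation means converting between a per-operation error rate and wall-clock time. A minimal sketch, assuming the block completes one multiply per 90 MHz cycle and that the plotted error rate is per operation (both assumptions, not stated on the slide):

```python
import math


def seconds_between_errors(error_rate_percent: float, freq_hz: float) -> float:
    """Mean time between errors if every clock cycle performs one operation
    and error_rate_percent is the fraction of those operations that fail."""
    errors_per_second = (error_rate_percent / 100.0) * freq_hz
    return math.inf if errors_per_second == 0 else 1.0 / errors_per_second


def error_rate_for_interval(seconds: float, freq_hz: float) -> float:
    """Per-operation error rate (in percent) that yields one error every `seconds`."""
    return 100.0 / (seconds * freq_hz)


f_clk = 90e6  # the multiplier block's test frequency
print(error_rate_for_interval(20.0, f_clk))  # ~5.6e-08 (percent)
print(seconds_between_errors(1.3, f_clk))    # ~8.5e-07 (seconds)
```

Under these assumptions one error every 20 seconds corresponds to roughly 5.6e-8 %, below 1e-7 % on the log axis, while the 1.3% operating point produces an error in under a microsecond on average.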
Error Rate Studies – SPICE-Level Simulations
• Based on SPICE-level simulations of a Kogge-Stone adder
[Figure: Kogge-Stone Adder at 870 MHz and 27 C: error rate (log scale, 0.00% to 100%) vs. supply voltage (0.6 V to 2 V) for random, bzip, and ammp inputs; a 200 mV interval is marked]
Razor I - Prototype Razor Implementation
• 4-stage 64-bit Alpha pipeline:
– 200 MHz expected operation in 0.18 µm technology, 1.8 V, ~500 mW
– Tunable via software from 50-200 MHz, 1.1-1.8 V
– Razor applied to combinational logic
• Razor overhead:
– Total of 192 Razor flip-flops out of 2408 total (9%)
– Error-free power overhead: ~3%
[Figure: Razor I layout, 3.3 mm x 3 mm, showing I-Cache, D-Cache, Register File, and the IF, ID, EX, MEM, WB stages]
Effects of Razor DVS
[Figure: Energy vs. decreasing supply voltage, showing energy of processor operations (E_proc), energy of pipeline recovery (E_recovery), total energy E_total = E_proc + E_recovery with its optimum (Optimal E_total), pipeline throughput (IPC), and energy of the processor without Razor support]
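The optimum in this figure comes from two opposing curves. A minimal sketch of the sweep, assuming dynamic energy per operation scales as (V/V_nom)^2, a made-up exponential error-rate curve, and a recovery cost of 18 operations per error (borrowing the "18x an add" figure from the EX-stage analysis slides). All constants and function names are illustrative, not measured or taken from the authors' simulator.

```python
import math


def error_rate(v: float, v_fail: float = 1.0, slope: float = 0.05) -> float:
    """Hypothetical per-operation error rate: 100% at v_fail, falling
    e-fold for every `slope` volts above it."""
    return min(1.0, math.exp(-(v - v_fail) / slope))


def total_energy(v: float, v_nom: float = 1.8, recovery_cost: float = 18.0) -> float:
    """Relative energy per operation, E_total = E_proc + E_recovery.

    E_proc is dynamic energy, (V / V_nom)^2 relative to nominal voltage.
    E_recovery charges `recovery_cost` extra operations for every error.
    """
    e_proc = (v / v_nom) ** 2
    e_recovery = error_rate(v) * recovery_cost * e_proc
    return e_proc + e_recovery


def optimal_voltage(voltages):
    """Return the (voltage, E_total) pair with the lowest total energy."""
    return min(((v, total_energy(v)) for v in voltages), key=lambda p: p[1])


sweep = [round(1.0 + 0.01 * i, 2) for i in range(81)]  # 1.00 V to 1.80 V
v_opt, e_opt = optimal_voltage(sweep)
print(f"optimal Vdd ~ {v_opt:.2f} V, relative energy {e_opt:.2f}")
```

With these invented constants the minimum lands near 1.27 V, where the modeled error rate is roughly half a percent; the sketch shows the shape of the trade-off, not the prototype's numbers.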
EX-Stage Analysis – Optimal Voltage Sweep: BZIP
[Figure: Rel Energy and Rel Performance (relative IPC and energy, 0.2 to 1.4) vs. voltage (0.6 V to 1.8 V)]
• 0.31% error rate, 58% energy savings
• Recovery cost includes energy to recover entire pipeline (18x an add)
EX-Stage Analysis – Optimal Voltage Sweep: GCC
[Figure: Rel Energy and Rel Performance (relative IPC and energy, 0.2 to 1.4) vs. voltage (0.6 V to 1.8 V)]
• 1.62% error rate, 24% energy savings
Simulation Analysis – Energy-Optimal Voltage
[Figure: Total Energy and IPC as a percentage of baseline (zero-margin), 0 to 120, for bzip, crafty, eon, gap, gcc, gzip, mcf, parser, twolf, vortex, vpr, and Average]
Simulation Analysis – Razor DVS Execution: GCC
[Figure: Supply voltage (0 V to 2 V) and error rate (0% to 40%) over time for GCC]
Simulation Analysis – Razor DVS Performance
[Figure: Total Energy, DVS Energy, IPC, and DVS IPC as a percentage of baseline (zero-margin), 0 to 120, for bzip, crafty, eon, gap, gcc, gzip, mcf, parser, twolf, vortex, vpr, and Average]
Conclusions
• In-situ detection/correction of timing errors
– Eliminate process, temperature, and safety margins
– Tune processor voltage based on error rate
– Purposely run below critical voltage to capture data-dependent latency margins
• Implemented with architecture/circuit support
– Double-sampling, metastability-tolerant Razor flip-flops validate logic results
– Pipeline initiates recovery after circuit timing errors; no voltage/clock re-tuning needed
• Trade-off: supply voltage power savings vs. overhead of correction
– Running with errors is good!
[Figure: Razor FF (Main Flip-Flop on clk, Shadow Latch on clk_del, Error comparator producing Error_L) and the distributed-recovery pipeline with Razor FFs, Stabilizer FF, bubbles, flushID signals, and Flush Control]
Future Directions
• Research opportunities
– Razor for caches/memory and control logic
– Voltage control algorithms, especially per-stage tuning
– Typical-case energy-optimized designs (instead of worst-case latency-optimized)
– Turnkey application of Razor technology
• Prototype design, fabrication, evaluation
– Razor I (Q4 2003): Razor-ized combinational logic, global tuning
– Razor II (Q3 2004): Razor-ized caches and control logic, per-stage tuning
• Other applications
– Single-event upset (SEU) protection using Razor error detection/re-execution
– Over-clocking for performance improvement (large gains among hobbyists)
Questions
Back-up Slides
Other Approaches to Dynamic Voltage Scaling
• Traditional DVS
– Valid voltage/delay combinations “blessed” at design time
– Approach leaves a significant amount of energy “on the table”
– Temperature, process, data, and safety margins placed on voltage
• Other approaches miss some margins
– Slack detector: automatic tuning
• ARM’s Intelligent Energy Manager (IEM)
• Processor voltage automatically tuned to external ambient conditions
• Inverter chain designed to track most restrictive critical path; margin still required
[Figure: Processor floorplan with L2 Cache, L2 Cache control, L2 tags, Floating point and graphics, Data cache, Cache control, Ex Unit, Control Unit, IO Unit, and Mem Control]
Razor Flip-Flop Implementation
• Compare latched data with shadow latch on delayed clock
• Upon failure: place data from shadow latch in main latch
– Ensure shadow latch always correct using conservative design techniques
– Correct value in shadow latch guarantees forward progress
• Recover pipeline using microarchitectural recovery mechanism
[Figure: RAZOR FF between Logic Stage L1 and Logic Stage L2: Main Flip-Flop on clk, Shadow Latch on clk_del, a 0/1 multiplexer selecting between them, and an Error comparator producing Error_L]
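A behavioral sketch of the double-sampling idea above, assuming a single logic output that switches from an old to a new value at some arrival time, a main flip-flop sampling at t_clk, and a shadow latch sampling t_del later. The class and helper names are hypothetical; this models logic values only, not the circuit.

```python
from dataclasses import dataclass


def value_at(t: float, old: int, new: int, arrival: float) -> int:
    """Logic-stage output observed at time t: the previous value until the
    new result arrives at time `arrival`."""
    return new if t >= arrival else old


@dataclass
class RazorFF:
    """Behavioral sketch of a Razor flip-flop (not a circuit model)."""
    q: int = 0       # main flip-flop output, sampled on clk
    shadow: int = 0  # shadow latch, sampled on the delayed clk_del

    def capture(self, old: int, new: int, arrival: float,
                t_clk: float, t_del: float) -> bool:
        """Sample D at t_clk (main FF) and at t_clk + t_del (shadow latch).

        Returns True when the error comparator sees a mismatch. The shadow
        sample is assumed correct (its timing is designed conservatively), so
        on an error its value is restored into the main flip-flop; in hardware
        this restore happens in the following cycle.
        """
        self.q = value_at(t_clk, old, new, arrival)
        self.shadow = value_at(t_clk + t_del, old, new, arrival)
        error = self.q != self.shadow
        if error:
            self.q = self.shadow
        return error


ff = RazorFF()
print(ff.capture(old=3, new=7, arrival=0.8, t_clk=1.0, t_del=0.5), ff.q)  # False 7
print(ff.capture(old=7, new=9, arrival=1.2, t_clk=1.0, t_del=0.5), ff.q)  # True 9
```

If a result arrived even after t_clk + t_del, both samples would hold the old value and no error would be flagged; that is why the shadow latch's timing must be guaranteed conservatively.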
Razor Flip-Flop Circuit
[Figure: Razor flip-flop circuit with main flip-flop (clk, clk_b, D, Q), Shadow Latch (clk_del, clk_del_b), Meta-stability detector (Inv_n, Inv_p), and Error_L output]
Overcoming Short Path Constraints
• Delayed clock imposes a short-path constraint
– Razor necessary only for latches on slow paths
– Pad fast path for latches with mixed path delays
– Trade-off between DVS headroom and short path constraints
• Constraint: Min. path delay > t_delay + t_hold
[Figure: Long paths end at Razor_ff and short paths at ff, with short paths padded with extra delay; clock and clock_del waveforms mark t_delay and t_hold for the intended path and a short path]
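The constraint above can be checked per path. A minimal sketch, assuming delays in whole picoseconds (so the strict inequality is met by one extra picosecond of padding); the path names and delay values are hypothetical.

```python
def meets_short_path(min_path_delay_ps: int, t_delay_ps: int, t_hold_ps: int) -> bool:
    """Short-path constraint: min path delay > t_delay + t_hold."""
    return min_path_delay_ps > t_delay_ps + t_hold_ps


def padding_needed_ps(min_path_delay_ps: int, t_delay_ps: int, t_hold_ps: int) -> int:
    """Smallest whole-picosecond buffer delay that makes the constraint hold."""
    return max(0, t_delay_ps + t_hold_ps + 1 - min_path_delay_ps)


# Hypothetical paths feeding Razor flip-flops: name -> shortest path delay (ps).
paths = {"alu_result": 120, "bypass_mux": 900}
for t_delay in (250, 500):  # a later clk_del buys more DVS headroom...
    pads = {name: padding_needed_ps(d, t_delay, t_hold_ps=50) for name, d in paths.items()}
    print(t_delay, pads)    # ...but demands more padding on short paths
```

Doubling t_delay from 250 ps to 500 ps raises the padding on the short path from 181 ps to 431 ps, which is the headroom vs. short-path trade-off listed on the slide.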
Hardware Measurement Setup
[Figure: Measurement setup with four 48-bit LFSRs, three 18x18 multipliers (Fast Pipeline on clk; Slow Pipeline A and Slow Pipeline B on clk/2), 18-bit inputs, 36-bit products, a != comparator, two 40-bit error counters, and a stabilize signal]
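A software sketch of the same measurement idea: pseudo-random operands from a 48-bit LFSR drive a multiplier, and each product is compared against an always-correct reference standing in for the slow pipelines. The tap set (48, 47, 21, 20), the seed, the operand bit slicing, and the `flaky` multiplier are all assumptions for illustration; the slide does not specify them, and the 40-bit counter width is not modeled.

```python
MASK18 = (1 << 18) - 1
MASK48 = (1 << 48) - 1
TAPS = (48, 47, 21, 20)  # assumed feedback taps (1-indexed bit positions)


def lfsr48(state: int):
    """Fibonacci LFSR: yields successive non-zero 48-bit states."""
    if (state & MASK48) == 0:
        raise ValueError("an XOR LFSR must start from a non-zero state")
    while True:
        feedback = 0
        for tap in TAPS:
            feedback ^= (state >> (tap - 1)) & 1
        state = ((state << 1) | feedback) & MASK48
        yield state


def count_errors(fast_multiply, n_ops: int, seed: int = 0xACE1) -> int:
    """Drive a multiplier with LFSR operands and count results that differ
    from an always-correct reference (standing in for the slow pipelines)."""
    states = lfsr48(seed)
    errors = 0
    for _ in range(n_ops):
        s = next(states)
        a, b = s & MASK18, (s >> 18) & MASK18  # two 18-bit operands
        if fast_multiply(a, b) != a * b:       # 36-bit reference product
            errors += 1
    return errors


def exact(a, b):
    return a * b


def flaky(a, b):
    # Hypothetical fast pipeline that flips the LSB whenever b's top bit is set.
    return (a * b) ^ 1 if b >> 17 else a * b


print(count_errors(exact, 10_000), count_errors(flaky, 10_000))
```

Because the feedback includes the bit shifted out (tap 48), the LFSR update is invertible and never reaches the all-zero state from a non-zero seed.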
Simulation Methodology
• Challenge: instruction latency depends on circuit evaluation latency
– May vary with changes in stage inputs, stage logic, voltage, temperature…
• Dynamic timing simulation combines architectural/circuit simulation
• Initial implementation utilized a hand-generated EX-stage circuit model
– Effort ongoing to automate extraction/decomposition/integration into SimpleScalar
Supply Voltage Control System
[Figure: Feedback loop in which error signals from the pipeline produce E_sample, E_diff = E_ref - E_sample drives the Voltage Control Function, and a Voltage Regulator supplies Vdd to the pipeline; a reset signal is also shown]
• Current design utilizes a very simple proportional control function
– Control algorithm implemented in software
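A minimal sketch of a proportional control step built on E_diff = E_ref - E_sample, assuming a hypothetical gain of 2 V per unit error rate, clamping to the 1.1-1.8 V range quoted for Razor I, and a made-up exponential error-rate curve. E_sample here is the model's expected error rate rather than an error count over a sampling window.

```python
import math


def error_rate(v: float) -> float:
    """Hypothetical error-rate curve: 100% at 1.0 V, falling e-fold every 50 mV."""
    return min(1.0, math.exp(-(v - 1.0) / 0.05))


def control_step(vdd: float, e_sample: float, e_ref: float,
                 gain: float = 2.0, v_min: float = 1.1, v_max: float = 1.8) -> float:
    """One proportional-control update driven by E_diff = E_ref - E_sample.

    E_diff > 0 (fewer errors than targeted) lowers Vdd; E_diff < 0 raises it.
    The result is clamped to the regulator's range.
    """
    e_diff = e_ref - e_sample
    return min(v_max, max(v_min, vdd - gain * e_diff))


vdd = 1.8
for _ in range(200):
    vdd = control_step(vdd, error_rate(vdd), e_ref=0.01)
print(f"settled Vdd ~ {vdd:.3f} V, error rate ~ {error_rate(vdd):.2%}")
```

Starting from 1.8 V, the loop walks the voltage down until the modeled error rate matches the 1% reference. The gain trades settling speed against overshoot; too large a gain would make the voltage oscillate around the target.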
Pipeline Recovery
[Figure: Timing diagram (clk, clk_d, error, ID.d, EX.d, MEM.d) of an instruction flowing through IF, ID, EX, MEM, WB, contrasting the Error and No Error cases; on an error the instruction is redone in MEM]
Voltage Scaling under Dynamic Workloads
• Adapt frequency/voltage to performance demands of workload
– Software-controlled processor speed
– Lower processor voltage during periods of low operating frequency
• Quadratic reduction in dynamic power and energy
• Super-quadratic reduction in leakage
[Figure: Utilization, frequency, and Vdd over time]
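The quadratic claim follows from the standard switching-power relation P = C_eff * Vdd^2 * f. A worked example, assuming a hypothetical effective capacitance and two hypothetical operating points chosen inside the 1.1-1.8 V, 50-200 MHz range quoted for Razor I:

```python
def dynamic_power(c_eff: float, vdd: float, freq_hz: float) -> float:
    """Switching power P = C_eff * Vdd^2 * f (watts)."""
    return c_eff * vdd ** 2 * freq_hz


def energy_per_cycle(c_eff: float, vdd: float) -> float:
    """Switching energy per cycle E = C_eff * Vdd^2 (joules); independent of f."""
    return c_eff * vdd ** 2


C_EFF = 1e-9         # hypothetical effective switched capacitance (farads)
fast = (1.8, 200e6)  # hypothetical full-speed point (V, Hz)
slow = (1.2, 100e6)  # hypothetical low-utilization point (V, Hz)

energy_ratio = energy_per_cycle(C_EFF, slow[0]) / energy_per_cycle(C_EFF, fast[0])
power_ratio = dynamic_power(C_EFF, *slow) / dynamic_power(C_EFF, *fast)
print(f"energy/op: {energy_ratio:.3f}x, power: {power_ratio:.3f}x")  # energy/op: 0.444x, power: 0.222x
```

Dropping from 1.8 V to 1.2 V cuts energy per operation to (1.2/1.8)^2, about 44%, and halving the frequency as well cuts power to about 22%. Leakage, which the slide says falls super-quadratically, is not modeled here.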
Simulation Flow
• Automatic creation of very detailed power/delay C-models
[Figure: Simulation flow with High-level HDL Specification, Architecture Specification, Circuit Extraction with Parasitics, Variable Voltage SDF generation, Detailed Power/Delay Analysis, Power/Delay C-model, SimpleScalar + DTA, and Voltage Control Algorithm, applied to the IF/ID/EX/MEM/WB pipeline with PC and FFs]
Simulation Methodology
• Dynamic timing simulation combines architectural/circuit simulation
– Contrast to static timing simulation, which is only concerned with the critical path
– SimpleScalar/Alpha architectural-level simulation
– Gate-level simulation of per-stage logic blocks
• Logic block model describes cells, local and global interconnect
• Cells characterized with SPICE at varied slew/cap-load/voltage
• Each cycle, circuit simulator evaluates the delay of each stage's logic block
[Figure: Gate-level logic block with example 0/1 signal values]
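A minimal sketch of the per-cycle delay check described above, assuming a 32-bit ripple-carry adder as the EX-stage logic (a deliberately simpler stand-in for the gate-level models and the Kogge-Stone adder used in these slides), an alpha-power-law gate delay with made-up constants, and a fixed clock period. Each cycle's delay depends on the operands, so the same voltage passes some cycles and fails others.

```python
import random


def longest_carry_chain(a: int, b: int, width: int = 32) -> int:
    """Bit positions traversed by the longest carry in a + b (ripple-carry view)."""
    longest = run = 0
    carry = False
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        if ai and bi:              # generate: a new carry starts here
            carry, run = True, 1
        elif (ai ^ bi) and carry:  # propagate the incoming carry one more bit
            run += 1
        else:                      # kill, or propagate with no carry to pass on
            carry, run = False, 0
        longest = max(longest, run)
    return longest


def gate_delay(vdd: float, vt: float = 0.4, alpha: float = 1.3, k: float = 1e-11) -> float:
    """Alpha-power-law gate delay in seconds (all constants hypothetical)."""
    return k * vdd / (vdd - vt) ** alpha


def stage_delay(a: int, b: int, vdd: float) -> float:
    """Data-dependent EX-stage delay: carry chain plus two gates of overhead."""
    return (longest_carry_chain(a, b) + 2) * gate_delay(vdd)


def timing_error_rate(ops, vdd: float, period: float = 250e-12) -> float:
    """Fraction of cycles whose evaluated delay exceeds the clock period."""
    late = sum(1 for a, b in ops if stage_delay(a, b, vdd) > period)
    return late / len(ops)


rng = random.Random(0)
ops = [(rng.getrandbits(32), rng.getrandbits(32)) for _ in range(10_000)]
for v in (1.8, 1.4, 1.0, 0.8):
    print(v, timing_error_rate(ops, v))
```

Unlike a static critical-path check, which would flag the full 32-bit carry chain at every voltage, this per-cycle evaluation only counts the operand pairs that actually exercise long carries.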
Simulation Analysis – Razor DVS Execution: Gap
[Figure: Supply voltage (0.6 V to 2 V) and error rate (0% to 30%) over time for Gap]
Razor Demo
More Details on Meta-Stability
• Sub-critical operation invites meta-stability
– Meta-stability detector itself can become meta-stable
– Double-latch the error signal to obtain a sufficiently small probability
[Figure: Razor flip-flop (clk, clk_b, clk_del, clk_del_b, D, Q) with restore, bubble, and flush signals, a Dynamic OR / Latch stage, and pos, neg, fail, and error signals]
• Callout in figure:
– Flush entire pipe
– No forward progress
– Reduce frequency
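Why double latching helps can be estimated with the standard synchronizer MTBF approximation, MTBF = exp(t_r/tau) / (T_w * f_clk * f_event), which comes from general synchronizer theory, not from these slides. A sketch with hypothetical values for tau, the aperture T_w, and the event rate, assuming the second latch adds one clock period of resolution time:

```python
import math


def synchronizer_mtbf(t_resolve: float, tau: float, t_window: float,
                      f_clk: float, f_event: float) -> float:
    """Textbook synchronizer estimate: exp(t_resolve / tau) / (T_w * f_clk * f_event).

    tau      : regeneration time constant of the latch (s)
    t_window : aperture during which an input change can cause metastability (s)
    f_event  : rate of events that may hit the aperture (1/s)
    """
    return math.exp(t_resolve / tau) / (t_window * f_clk * f_event)


# Hypothetical numbers: 200 MHz clock, 20 ps tau and aperture, events on 1% of cycles.
f_clk, tau, t_w = 200e6, 20e-12, 20e-12
f_event = 0.01 * f_clk
single = synchronizer_mtbf(0.5e-9, tau, t_w, f_clk, f_event)              # half a cycle to resolve
double = synchronizer_mtbf(0.5e-9 + 1 / f_clk, tau, t_w, f_clk, f_event)  # second latch adds a cycle
print(f"single latch: {single:.3g} s, double latch: {double:.3g} s")
```

Because resolution time sits in the exponent, one extra cycle multiplies the MTBF by exp(T_clk/tau), an enormous factor for any realistic tau. This is why a second latch on the error signal is enough to make the residual probability negligible.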
Short Path Failure
[Figure: Timing diagram (clk, clk_d, error, ID.d, EX.d, MEM.d) with inst1 (I1) and inst2 (I2) in IF, ID, EX, MEM, WB, showing a Short Path]