Methods for Achieving RTL to Gate Power Consistency
© 2014 ANSYS, Inc. 6/23/2014
Design Automation Conference 2014
PowerArtist™: RTL Design-for-Power Platform
• Power Analysis and Debug
• Automated Power Reduction: Original RTL → Low-Power RTL
• Links with Physical: Physical Power ↔ RTL Power (PACE, RPM)
Objectives of RTL Power Analysis
• Power trade-off analysis using relative accuracy
• Sign off power with absolute accuracy
• Analysis driven power reduction
[Figure: Cumulative Area Overhead (normalized) and Total Power Savings Available (normalized), each on a 0.00 to 1.00 axis, plotted against # RTL Changes (Design Effort), 1 to 291. Annotations: maximum acceptable area impact; maximum possible power savings; only 5 changes gave 50% saving.]
RTL Power: Inputs for PowerArtist
• RTL (VHDL, Verilog, SystemVerilog)
    module PA (
    ...
    always @ (posedge clk) begin
      dout <= din1;
    end
    assign out = sel ? dout : din2;
    ...
    endmodule
• Power domains (UPF / CPF), e.g. Vdd1, Vdd2
• Capacitance model (WLM / PACE)
• Activity (FSDB / VCD / SAIF)
• Clock tree, gating (SDC, PACE, user input)
• Power models (Liberty .lib)
[Diagram: the inputs above feed RTL Power Analysis. Diagram labels on the inferred logic: register, mux, and, clk.]
Factors Affecting RTL Power Accuracy
• Synthesis Modeling: Inferencing, Multi-VT, Cell Selection, Micro-architecture
• Algorithmic
• RTL Models: Activity Propagation, Timing, Power Computation
• Physical Models: Clock Tree, Wire Cap, Transition Time
• Low Power Structures: Voltage / Power Domains, CPF / UPF
NOTE: Algorithmic and Low Power structures are not configured for accuracy
Synthesis Modeling Aspects for RTL Power
Inferencing
• Keep optimization settings consistent with synthesis
• Enable DesignWare flow (if DW components are present)
Multi-VT
• Apply consistent multi-VT settings from synthesis
Cell Selection
• Fine-tune cell selection based on synthesis netlist
• Apply boundary conditions based on load / frequency
Microarchitecture
• Apply microarchitectures for macros (e.g. adders, multipliers)
Synthesis Modeling Aspects in PowerArtist
Constant Multipliers
    b = 8'b11000100;
    assign z = a * b;
  [Diagram label: CSA]
Chains of Adders
    assign z = a + b + c + d;
  [Diagram: a serial chain ((a + b) + c) + d, compared with CSA(a, b, c) feeding a second CSA with d, followed by a final +]
Look-Up Table Optimization
    case (address)
      8'd0 : data = {32'd0};
      8'd1 : data = {32'd12};
      …
    endcase
  [Diagram: address → data. Labels: un-encoded mux; cell mapping to basic 2-input cells; modeled using AOIs; OR plane; optimized and-or plane by sharing common logic]
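The carry-save idea behind the constant-multiplier and adder-chain diagrams can be sketched in a few lines of Python. This is only a behavioral illustration, not PowerArtist's internal model: the function names are hypothetical, operands are assumed to be non-negative integers, and the constant is the b = 8'b11000100 value from the slide.

```python
def carry_save_add(x, y, z):
    """3:2 compressor: reduce three non-negative ints to a (sum, carry) pair."""
    partial_sum = x ^ y ^ z                       # bitwise sum, carries ignored
    carry = ((x & y) | (x & z) | (y & z)) << 1    # majority bits, moved up one position
    return partial_sum, carry


def sum_with_csa(operands):
    """Reduce operands with carry-save stages, then one carry-propagate add."""
    operands = list(operands)
    while len(operands) > 2:
        s, c = carry_save_add(*operands[:3])
        operands = [s, c] + operands[3:]
    return sum(operands)                          # the single final '+'


def constant_multiply(a, constant):
    """Multiply by a constant: one shifted copy of `a` per set bit, summed via CSA."""
    partials = [a << bit for bit in range(constant.bit_length())
                if (constant >> bit) & 1]
    return sum_with_csa(partials)


# Hypothetical operand values, chosen only for illustration.
a, b, c, d = 23, 196, 7, 1000
product = constant_multiply(a, 0b11000100)        # z = a * b with b = 8'b11000100
total = sum_with_csa([a, b, c, d])                # z = a + b + c + d
```

With four operands, `sum_with_csa` performs two carry-save stages and one final addition, which mirrors the CSA, CSA, + structure in the diagram. The constant 8'b11000100 has three set bits, so the multiplier needs only three shifted partial products.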
RTL Power Accuracy Using Wire Load Models
– Large difference seen with simple wire load models
– Clock and Combo power show the largest difference
– Total power shows 40% difference wrt gate level
Mobile SoC Case Study
** Note: GATE considered to be most accurate
[Figure: RTL Wire Load Models vs. Gate Level (Different Power Categories). Series: RTL WLM, GATE, %diff. Axes: Power (Watts), 0.000 to 0.120; % Difference, -100% to 100%. %diff data labels as extracted: 28.8%, 11.0%, -9.2%, 69.2%, 41.2%, 32.3%, 40.2%.]
Physical Aspects Modeling for Power
Clock Tree
• Modeling clock tree
• Balanced and Clock Mesh topology
Wire Cap
• Accurately model post-layout wire capacitance
• Model capacitance profile for different types of nets
Transition Time
• Accurately model slew for realistic power
• Both clock and logic nets
Physical Modeling: Clock Tree
• RTL clock power accuracy requirements
– Understand clock gating methodology
– Understand clock tree topology and buffering
• Difficult for RTL designers to get data from backend team
[Diagrams: Balanced Clock Tree; Clock Mesh Topology]
Physical Modeling: Wire Cap
Traditional Wire Load Models
• Not available in some vendor libraries; often not calibrated
• Custom WLMs not portable across blocks and designs
• Simplistic modeling results in poor accuracy
Example (40nm, 45k nets with fanout 1): WLM assigns 1 fF to all nets, whereas SPEF capacitance varies from 0.2 fF to >129 fF
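To see why a flat 1 fF assignment hurts accuracy, the sketch below compares net switching power under a flat wire load model with power under per-net extracted capacitance. It uses the textbook relation of 0.5 · C · Vdd² energy per transition. The five capacitance values, the toggle rate, and the supply voltage are hypothetical numbers picked to span the range quoted on the slide; they are not data from the case study.

```python
FEMTO = 1e-15


def switching_power_watts(net_caps_farads, toggles_per_second, vdd_volts):
    """Sum 0.5 * C * Vdd^2 * toggle_rate over all nets."""
    return sum(0.5 * cap * vdd_volts ** 2 * rate
               for cap, rate in zip(net_caps_farads, toggles_per_second))


# Hypothetical extracted (SPEF-like) caps for five fanout-1 nets, in farads.
spef_caps = [0.2 * FEMTO, 0.8 * FEMTO, 3.0 * FEMTO, 20.0 * FEMTO, 130.0 * FEMTO]
# Flat wire load model: every net gets 1 fF regardless of its routing.
wlm_caps = [1.0 * FEMTO] * len(spef_caps)

toggle_rates = [100e6] * len(spef_caps)   # hypothetical: 100M transitions/s per net
vdd = 0.9                                 # hypothetical supply voltage in volts

p_wlm = switching_power_watts(wlm_caps, toggle_rates, vdd)
p_spef = switching_power_watts(spef_caps, toggle_rates, vdd)
percent_diff = 100.0 * (p_wlm - p_spef) / p_spef
```

In this made-up sample a single long net dominates the extracted total, so the flat model underestimates switching power by a wide margin. Real designs will differ, but the mechanism is the same: the error follows the spread of the capacitance distribution, which a single per-fanout value cannot capture.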
PACE™ for RTL Power Accuracy
PACE applies from RTL to Pre-layout Power
• Clock tree models
– Determine buffer and CG cells per inferred clock tree
– Supports both balanced clock tree as well as clock mesh
• Wire capacitance models
– Granular, power-oriented vs. traditional WLMs
[Diagram: Bridge the RTL ↔ Implementation Gap. PowerArtist Calibration (PACE) takes the Post-Layout Power of a Representative Layout and produces Statistical Models (Wire Cap and Clock). These supply clock distribution, parasitics, multiple Vt, and low-power structures to RTL Power for the example RTL (module PA).]
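The slides do not describe how PACE builds its statistical models, so the following is only a generic illustration of the idea of calibrating capacitance from a representative layout. It groups extracted nets by type and fanout, then uses the mean capacitance of each group as the estimate for similar nets at RTL. The grouping keys, function names, and sample values are all assumptions made for this sketch.

```python
from collections import defaultdict
from statistics import mean


def build_cap_model(extracted_nets):
    """Map (net_type, fanout) -> mean extracted capacitance in fF.

    `extracted_nets` is an iterable of (net_type, fanout, cap_ff) tuples,
    standing in for data pulled from a representative post-layout design.
    """
    buckets = defaultdict(list)
    for net_type, fanout, cap_ff in extracted_nets:
        buckets[(net_type, fanout)].append(cap_ff)
    return {key: mean(caps) for key, caps in buckets.items()}


def estimate_cap(model, net_type, fanout, default_ff=1.0):
    """Look up the calibrated cap; fall back to a flat default when uncalibrated."""
    return model.get((net_type, fanout), default_ff)


# Hypothetical reference data: (net type, fanout, capacitance in fF).
reference_nets = [
    ("clock", 1, 4.0), ("clock", 1, 6.0),
    ("data", 1, 0.5), ("data", 1, 1.5),
    ("data", 2, 3.0),
]
model = build_cap_model(reference_nets)
clock_cap = estimate_cap(model, "clock", 1)
unknown_cap = estimate_cap(model, "data", 8)   # no samples, so the default applies
```

Keeping clock and data nets in separate groups reflects the slide's point about modeling a capacitance profile for different types of nets. A production model would likely use finer groupings than this toy example.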
RTL Power Accuracy Using PACE Cap Models
– Tighter correlation seen with PACE Cap models
– Register and Combo power are within +/-20%
– Total power shows <5% difference wrt gate level
Mobile SoC Case Study
** Note: GATE considered to be most accurate
[Figure: PACE Cap Models vs. WLM & Gate Level (Different Power Categories). Series: RTL WLM, RTL w PACE Cap, GATE, %diff. Axes: Power (Watts), 0.000 to 0.120; % Difference, -100% to 100%. %diff data labels as extracted: -13.4%, 5.1%, -9.2%, 22.8%, 8.1%, -37.4%, 3.0%.]
RTL Power Accuracy Using PACE Cap + Clock Models
– Best correlation seen with PACE Cap + Clock models
– Overall correlation is within +/-15%
Mobile SoC Case Study
** Note: GATE considered to be most accurate
[Figure: PACE Cap+Clk Models vs. WLM & Gate Level (Different Power Categories). Series: RTL WLM, RTL w PACE Cap+Clock, GATE, %diff w/ PACE, %diff w/ WLM. Axes: Power (Watts), 0.000 to 0.120; % Difference, -100.0% to 100.0%. %diff data labels as extracted: -13.4%, 9.9%, -9.2%, -12.8%, -9.0%, -13.6%, -9.4%.]
RTL Power Accuracy Using PACE Cap + Clock Models
– Total power with WLM differs from gate level by more than +/-30%
– With PACE models, total power is within +/-20%
Mobile SoC Blocks Case Study
** Note: GATE considered to be most accurate
[Figure: Total Power Comparison for Design 1, Design 2, Design 3. Series: RTL WLM, RTL PACE, GATE. Axis: Power (Watts), 0.000 to 0.120.]
RTL Power Accuracy Using PACE Cap + Clock Models
– Total power with WLM differs from gate level by more than +/-30%
– With PACE models, total power is within +/-20%
– Clock power with PACE is within +/-20% as well
Mobile SoC Blocks Case Study
** Note: GATE considered to be most accurate
[Figure: Clock Power wrt RTL PACE vs. GATE for Design 1, Design 2, Design 3. Series: GATE, RTL PACE, %diff. Axes: Power (Watts), 0.00E+00 to 8.00E-02; % diff, 0.0% to 25.0%. %diff data labels as extracted: 15.5%, 19.0%, 20.7%.]
Nvidia Case Study: RTL Power Accuracy
Average power in mW. Column groups, in the order the slide lists them: (A) RTL PowerArtist, (B) Post-synthesis PT-PX, (%) RTL PowerArtist vs Post-synthesis PT-PX.

DESIGN  Instances  Black-boxed DW  A: Dynamic  A: Leakage  A: Total  B: Dynamic  B: Leakage  B: Total  % Dynamic  % Leakage  % Total
PR         580320               0      82.524     114.210   196.735      92.900     111.734   204.635     12.57%     -2.17%    4.02%
TD         268993               0      89.209      38.713   127.923     101.755      35.089   136.844     14.06%     -9.36%    6.97%
TTM        158407              14      64.828      21.353    86.181      63.583      20.212    83.795     -1.92%     -5.34%   -2.77%
TTF        134152              64      47.850      14.874    62.724      32.563      13.431    45.995    -31.95%     -9.70%  -26.67%
SMI       1137155             101     145.497     201.661   347.158     125.133     135.635   260.768    -14.00%    -32.74%  -24.88%
SRF        509095              24     263.894      75.515   339.409     258.332      73.897   332.229     -2.11%     -2.14%   -2.12%

Average Power, overall designs        115.634      77.721   193.355     112.378      65.000   177.378     -2.82%    -16.37%   -8.26%
Average Power, excluding SMI/TTF      125.114      62.448   187.562     129.143      60.233   189.376      3.22%     -3.55%    0.97%
Average Power, PR/TD only              85.867      76.462   162.329      97.328      73.412   170.739     13.35%     -3.99%    5.18%

• Power correlation performed for 6 designs (130K - 1.13M instances)
• In general, very good average power correlation observed (SMI and TTF having DWs)
• 8-16 tests being run across the blocks
** Source: Nvidia-Apache Webinar, July 2013 (Miki)
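The percentage columns and average rows in the table can be reproduced from the per-design numbers. The sketch below does this for dynamic power only. The formula (second group minus first group, divided by first group) is inferred from the arithmetic of the published values, not stated on the slide, and the variable names are hypothetical.

```python
def percent_diff(first, second):
    """Relative difference of the second value with respect to the first, in percent."""
    return 100.0 * (second - first) / first


# Average dynamic power in mW per design: (first column group, second column group).
dynamic_mw = {
    "PR":  (82.524, 92.900),
    "TD":  (89.209, 101.755),
    "TTM": (64.828, 63.583),
    "TTF": (47.850, 32.563),
    "SMI": (145.497, 125.133),
    "SRF": (263.894, 258.332),
}


def average_percent_diff(designs):
    """Average each column group over the chosen designs, then compare the averages."""
    first_avg = sum(dynamic_mw[name][0] for name in designs) / len(designs)
    second_avg = sum(dynamic_mw[name][1] for name in designs) / len(designs)
    return first_avg, second_avg, percent_diff(first_avg, second_avg)


pr_diff = percent_diff(*dynamic_mw["PR"])
overall = average_percent_diff(list(dynamic_mw))
without_outliers = average_percent_diff(["PR", "TD", "TTM", "SRF"])
```

Note that the average rows compare averaged powers instead of averaging the per-design percentages, which is why large blocks such as SRF weigh more heavily in the overall figure.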
Summary
• RTL power enables early design trade-offs for high power impact
• PowerArtist provides predictable RTL power accuracy wrt GATE
• PowerArtist has advanced synthesis and physical modeling techniques
• PowerArtist PACE modeling is proven across designs
• Use PowerArtist for RTL power sign-off with absolute accuracy