MicroFix: Exploiting Path-grained Timing Adaptability for Improving Power-Performance Efficiency...
![Page 1: MicroFix: Exploiting Path-grained Timing Adaptability for Improving Power-Performance Efficiency Guihai Yan, Yinhe Han, Hui Liu, Xiaoyao Liang, Xiaowei.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1e5503460f94c362bf/html5/thumbnails/1.jpg)
MicroFix: Exploiting Path-grained Timing Adaptability for Improving Power-Performance Efficiency
Guihai Yan, Yinhe Han, Hui Liu, Xiaoyao Liang, Xiaowei Li
Key Laboratory of Computer System and Architecture, Institute of Computing Technology (ICT), CAS, Beijing, P.R. China
NVIDIA Corporation, USA
![Page 2: MicroFix: Exploiting Path-grained Timing Adaptability for Improving Power-Performance Efficiency Guihai Yan, Yinhe Han, Hui Liu, Xiaoyao Liang, Xiaowei.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1e5503460f94c362bf/html5/thumbnails/2.jpg)
Outline
• What’s Path-grained Timing Adaptability (PTA)
• Potential of PTA for Efficiency Improvement
• How to Exploit PTA
• Case Study Results
• Conclusions
![Page 3: MicroFix: Exploiting Path-grained Timing Adaptability for Improving Power-Performance Efficiency Guihai Yan, Yinhe Han, Hui Liu, Xiaoyao Liang, Xiaowei.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1e5503460f94c362bf/html5/thumbnails/3.jpg)
Impact of DVFS on Path Delay
[Figure: the (K-1)th and Kth pipeline stages, bounded by flip-flops (FF), each with cycle period T; labels: critical path, non-critical path, P1, P2.]
• Traditionally, if scaling the voltage down makes P1 and P2 timing-critical, then what?
• The frequency is scaled down for all stages of the pipeline
Question:
• Can these emerging critical paths be salvaged in exchange for further voltage scaling?
• Maybe yes! By fine-grained time stealing
![Page 4: MicroFix: Exploiting Path-grained Timing Adaptability for Improving Power-Performance Efficiency Guihai Yan, Yinhe Han, Hui Liu, Xiaoyao Liang, Xiaowei.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1e5503460f94c362bf/html5/thumbnails/4.jpg)
Timing Imbalance
[Figure: pipeline stages bounded by flip-flops (FF), each with cycle period T; paths labeled PCP and NCP.]
• Generous Flip-flop (GFF): Slack_up > TH, Slack_dn > TH
• Backward Adaptable Flip-flop (BAFF): Slack_up > TH, Slack_dn ≤ TH
• Forward Adaptable Flip-flop (FAFF): Slack_up ≤ TH, Slack_dn > TH
• Unadaptable Flip-flop (UAFF): Slack_up ≤ TH, Slack_dn ≤ TH
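The four flip-flop categories above amount to a two-way threshold test, which can be sketched as below. This is only an illustration: the function name `classify_flip_flop` is hypothetical, the slacks and TH are assumed to share one unit (fraction of the cycle period), and the BAFF/FAFF conditions are assumed to pair with the names in the order the slide lists them.

```python
def classify_flip_flop(slack_up: float, slack_dn: float, th: float) -> str:
    """Classify a flip-flop by the slack of its upstream and downstream paths.

    Assumption: slack_up, slack_dn, and th use the same unit
    (here, fraction of the cycle period).
    Assumption: BAFF/FAFF conditions follow the slide's listing order.
    """
    up_has_margin = slack_up > th
    dn_has_margin = slack_dn > th
    if up_has_margin and dn_has_margin:
        return "GFF"   # margin above TH on both sides
    if up_has_margin:
        return "BAFF"  # margin above TH only on the upstream side
    if dn_has_margin:
        return "FAFF"  # margin above TH only on the downstream side
    return "UAFF"      # no side has margin above TH


if __name__ == "__main__":
    # Hypothetical slack values, chosen only to hit each category once.
    for up, dn in [(0.30, 0.25), (0.30, 0.10), (0.10, 0.30), (0.10, 0.05)]:
        print(up, dn, classify_flip_flop(up, dn, th=0.2))
```

A slack exactly equal to TH counts as "≤ TH", matching the inequalities on the slide.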
![Page 5: MicroFix: Exploiting Path-grained Timing Adaptability for Improving Power-Performance Efficiency Guihai Yan, Yinhe Han, Hui Liu, Xiaoyao Liang, Xiaowei.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1e5503460f94c362bf/html5/thumbnails/5.jpg)
Intrinsic Timing Imbalance
• Case study
  • FPU, adopted by OpenSPARC T1
  • Supports all IEEE 754 floating-point data types
  • Synthesized by Synopsys Design Compiler with UMC 0.18um technology
  • Cycle period: (1 + 10%) × T_critical
[Figure: number of flip-flops (log scale, 1 to 10000) of each type (GFF, FAFF, BAFF, UAFF) for TH = 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, and 0.4 cycle.]
The GFFs, FAFFs, and BAFFs account for a considerable, even dominant, proportion of the flip-flops!
Attractive Potential
![Page 6: MicroFix: Exploiting Path-grained Timing Adaptability for Improving Power-Performance Efficiency Guihai Yan, Yinhe Han, Hui Liu, Xiaoyao Liang, Xiaowei.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1e5503460f94c362bf/html5/thumbnails/6.jpg)
DVFS Exacerbating Imbalance
• Generally, the timing margin of longer paths diminishes much faster than that of shorter ones
• Assume that the path delay is the sum of the delays of the gates on the path
• TG: the gate delay
• Delta: the change in gate delay caused by scaling the voltage down
• Define: ΔS = |Slack_dn − Slack_up|
• Before voltage scaling down: ΔS1 = (n − m) × TG
• After voltage scaling down: ΔS2 = (n − m) × (TG + Delta)
• Hence ΔS1 < ΔS2
[Figure (example): a flip-flop (FF) between a path of n gates and a path of m gates, with slacks Slack_dn and Slack_up.]
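To make the growth of the imbalance concrete, the two expressions above can be evaluated for a pair of paths. This is a small numeric sketch, not data from the study: the gate counts, gate delay, and Delta are hypothetical values in arbitrary time units, and `slack_imbalance` is an illustrative helper name.

```python
def slack_imbalance(n: int, m: int, gate_delay: float) -> float:
    """Slack difference between an n-gate path and an m-gate path.

    Uses the slide's assumption that path delay is the sum of gate delays,
    so the slack difference is |n - m| gate delays.
    """
    return abs(n - m) * gate_delay


if __name__ == "__main__":
    # Hypothetical values (arbitrary time units), for illustration only.
    n, m = 20, 12
    tg, delta = 1.0, 0.5

    ds1 = slack_imbalance(n, m, tg)          # before voltage scaling down
    ds2 = slack_imbalance(n, m, tg + delta)  # after voltage scaling down
    print("imbalance before scaling:", ds1)  # 8.0
    print("imbalance after scaling: ", ds2)  # 12.0
```

With these numbers the imbalance grows by the factor (TG + Delta) / TG, which is the sense in which DVFS exacerbates it.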
![Page 7: MicroFix: Exploiting Path-grained Timing Adaptability for Improving Power-Performance Efficiency Guihai Yan, Yinhe Han, Hui Liu, Xiaoyao Liang, Xiaowei.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1e5503460f94c362bf/html5/thumbnails/7.jpg)
If the Imbalance Is Utilized…
• Check the lower bound of the cycle period T
• Traditionally: T1 = n × (TG + Delta)
• From MicroFix’s perspective: T2 = (m + n)/2 × (TG + Delta) ≤ T1 − TH
[Figure: left, cycle period T versus δ, with lines labeled n (without MicroFix) and (m+n)/2 (with MicroFix); right, frequency F versus 1/V, with lines labeled 1/n (without MicroFix) and 2/(m+n) (with MicroFix); F = 1/T, δ = δ(V).]
Note: this analysis excludes the UAFFs
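The two lower bounds on the cycle period can be compared numerically. The sketch below assumes hypothetical gate counts and delays (arbitrary time units) and hypothetical function names; it only evaluates the two formulas from the slide and does not model UAFFs.

```python
def min_period_traditional(n: int, tg: float, delta: float) -> float:
    """Lower bound without time stealing: the longer (n-gate) path sets T."""
    return n * (tg + delta)


def min_period_microfix(n: int, m: int, tg: float, delta: float) -> float:
    """Lower bound when the n-gate and m-gate paths share their time evenly."""
    return (m + n) / 2 * (tg + delta)


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    n, m = 20, 12
    tg, delta = 1.0, 0.5

    t1 = min_period_traditional(n, tg, delta)
    t2 = min_period_microfix(n, m, tg, delta)
    print("T1 (traditional):", t1)   # 30.0
    print("T2 (MicroFix):   ", t2)   # 24.0
    print("frequency ratio F2/F1:", t1 / t2)  # 1.25
```

The ratio T1/T2 equals 2n/(m+n), so the benefit depends only on how unbalanced the two paths are, not on the gate delay itself.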
![Page 8: MicroFix: Exploiting Path-grained Timing Adaptability for Improving Power-Performance Efficiency Guihai Yan, Yinhe Han, Hui Liu, Xiaoyao Liang, Xiaowei.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1e5503460f94c362bf/html5/thumbnails/8.jpg)
How to deal with UAFFs?
• Two-supply voltage scheme [Usami, JSSC’98] [Ghosh, TCAD’07]
• Critical Isolation: the critical paths resulting in UAFFs
• The supply voltage of the Critical Isolation is more conservative than that of the logic outside it
[Figure: the Critical Isolation is powered by the conservative voltage; the exploitable scope of MicroFix is powered by the aggressive voltage.]
![Page 9: MicroFix: Exploiting Path-grained Timing Adaptability for Improving Power-Performance Efficiency Guihai Yan, Yinhe Han, Hui Liu, Xiaoyao Liang, Xiaowei.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1e5503460f94c362bf/html5/thumbnails/9.jpg)
How to “Fix”?
• Two-supply voltage scheme
• Timing sensors [Yan, DATE’09] [Agarwal, VTS’07]
• Multiple-phase Clocks (generated by a DLL)
[Figure: MicroFix architecture. Labels: target pipeline with (K-1)th and Kth stage logic and their flip-flops; timing sensors producing delay error prediction signals; voltage/frequency control; normal voltage supply and conservative voltage supply; clocks CLK, FCLK, and BCLK, with offsets labeled T×TH; flip-flop types UAFF, FAFF, GFF, and BAFF.]
![Page 10: MicroFix: Exploiting Path-grained Timing Adaptability for Improving Power-Performance Efficiency Guihai Yan, Yinhe Han, Hui Liu, Xiaoyao Liang, Xiaowei.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1e5503460f94c362bf/html5/thumbnails/10.jpg)
Operational Principles
(a) Traditional DVFS
• Reducing power: reduce the frequency from F to a lower level, then reduce the voltage from V to a lower level
• Increasing performance: increase the voltage from V to a higher level, then increase the frequency from F to a higher level
(b) MicroFix enhanced DVFS
• Reducing power: after the frequency and voltage reduction, keep stepping the voltage down (V → V − v) under monitoring while no error is predicted; once an error is predicted, step back up (V → V + v) to restore a tight margin
• Increasing performance: after the voltage and frequency increase, keep stepping the frequency up (F → F + f) under monitoring while no error is predicted; once an error is predicted, step back down (F → F − f) to restore a tight margin
• Ensure that the restored margins ‘v’ and ‘f’ can guard safe voltage and frequency tuning.
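The voltage-tuning loop described above (step down while no error is predicted, step back up once one is) can be sketched as follows. This is a behavioral illustration under stated assumptions, not the authors' controller: `tune_voltage_down` and the `error_predicted` callback are hypothetical names, the sensor response is a stand-in lambda, and voltages are integers in millivolts to avoid floating-point drift.

```python
from typing import Callable


def tune_voltage_down(v_start_mv: int, step_mv: int, v_min_mv: int,
                      error_predicted: Callable[[int], bool]) -> int:
    """Lower the supply in steps until the sensors predict a timing error.

    error_predicted(v_mv) is a hypothetical stand-in for the timing sensors:
    it returns True when an impending timing error is flagged at v_mv.
    """
    v = v_start_mv
    while v - step_mv >= v_min_mv:
        v -= step_mv              # V -> V - v
        if error_predicted(v):
            v += step_mv          # V -> V + v: restore a tight margin
            break
    return v


if __name__ == "__main__":
    # Hypothetical sensor model: errors are predicted below 800 mV.
    settled = tune_voltage_down(1000, 25, 600, lambda mv: mv < 800)
    print("settled voltage (mV):", settled)  # 800
```

Frequency tuning would follow the same pattern with the step direction reversed (F → F + f while no error is predicted, F → F − f otherwise).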
![Page 11: MicroFix: Exploiting Path-grained Timing Adaptability for Improving Power-Performance Efficiency Guihai Yan, Yinhe Han, Hui Liu, Xiaoyao Liang, Xiaowei.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1e5503460f94c362bf/html5/thumbnails/11.jpg)
Experimental Setup
• Gate-level
  • Study the adaptability and overhead with a synthesized FPU
  – Timing info. -> PrimeTime
• Transistor-level
  • Investigate the power-performance tradeoffs with Hspice simulations
  – 32nm PTM models dedicated to HP and LP applications, respectively
![Page 12: MicroFix: Exploiting Path-grained Timing Adaptability for Improving Power-Performance Efficiency Guihai Yan, Yinhe Han, Hui Liu, Xiaoyao Liang, Xiaowei.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1e5503460f94c362bf/html5/thumbnails/12.jpg)
Exploring Design Tradeoffs
• ‘TH’ plays a critical role in determining the ultimate efficiency
[Figure: two partitions of the design into the Critical Isolation and the exploitable scope of MicroFix.]
Smaller ‘TH’, smaller CI, but less aggressive voltage reduction!
[Figure: number of flip-flops (log scale, 1 to 10000) of each type (GFF, FAFF, BAFF, UAFF) for TH = 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, and 0.4 cycle.]
Larger ‘TH’, larger CI, but more aggressive voltage reduction!
Which ‘TH’ is optimal?
![Page 13: MicroFix: Exploiting Path-grained Timing Adaptability for Improving Power-Performance Efficiency Guihai Yan, Yinhe Han, Hui Liu, Xiaoyao Liang, Xiaowei.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1e5503460f94c362bf/html5/thumbnails/13.jpg)
Exploring Design Tradeoffs /2
• Percentage of Cells in Critical Isolation
| TH | 0.1 | 0.15 | 0.2 | 0.25 | 0.3 | 0.35 | 0.4 |
|---|---|---|---|---|---|---|---|
| Percentage of cells | 0.00% | 0.00% | 0.16% | 2.04% | 10.82% | 22.70% | 33.52% |
![Page 14: MicroFix: Exploiting Path-grained Timing Adaptability for Improving Power-Performance Efficiency Guihai Yan, Yinhe Han, Hui Liu, Xiaoyao Liang, Xiaowei.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1e5503460f94c362bf/html5/thumbnails/14.jpg)
Exploring Design Tradeoffs /3
• Sensor Area Overhead
  • A sensor is about 8× the size of a pipeline flip-flop (based on the number of transistors) [Yan, DATE’09]
  • The paths in the Critical Isolation and those with overly large slack (i.e., slack > T × TH + t_margin) do not need to be monitored by sensors
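The monitoring rule above can be written as a simple predicate over each path. The sketch below is illustrative only: `needs_sensor` and its parameter names are hypothetical, and slack, cycle period, and t_margin are assumed to share one time unit.

```python
def needs_sensor(slack: float, in_critical_isolation: bool,
                 cycle: float, th: float, t_margin: float) -> bool:
    """Decide whether a path must be monitored by a timing sensor.

    Paths inside the Critical Isolation are skipped, as are paths whose
    slack exceeds T x TH + t_margin.
    """
    if in_critical_isolation:
        return False
    return slack <= cycle * th + t_margin


if __name__ == "__main__":
    # Hypothetical paths as (slack, in_critical_isolation), arbitrary time units.
    paths = [(1.0, False), (4.0, False), (0.2, True)]
    monitored = [p for p in paths
                 if needs_sensor(p[0], p[1], cycle=10.0, th=0.2, t_margin=0.5)]
    print("paths needing sensors:", len(monitored))
```

Counting the monitored paths this way is one plausible route to an area estimate, given that each sensor costs roughly eight flip-flops in transistor count.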
| TH | 0.1 | 0.15 | 0.2 | 0.25 | 0.3 | 0.35 | 0.4 |
|---|---|---|---|---|---|---|---|
| Sensor area overhead | 0.00% | 2.10% | 3.75% | 9.20% | 12.34% | 10.95% | 9.97% |
![Page 15: MicroFix: Exploiting Path-grained Timing Adaptability for Improving Power-Performance Efficiency Guihai Yan, Yinhe Han, Hui Liu, Xiaoyao Liang, Xiaowei.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1e5503460f94c362bf/html5/thumbnails/15.jpg)
Exploring Design Tradeoffs /4
• Sensor Power Overhead
  • In the most pessimistic case (TH = 0.3, all sensors simultaneously flag timing errors): 14%
• However, such a worst-case power overhead is unlikely to occur, for three reasons:
1) Sensors do not need to be always on
2) It is almost impossible for all sensors to flag impending timing errors simultaneously
3) TH = 0.3 is actually not an optimal configuration
Therefore, the pessimistic power overhead will not offset much of MicroFix’s efficiency gain!
![Page 16: MicroFix: Exploiting Path-grained Timing Adaptability for Improving Power-Performance Efficiency Guihai Yan, Yinhe Han, Hui Liu, Xiaoyao Liang, Xiaowei.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1e5503460f94c362bf/html5/thumbnails/16.jpg)
Hspice Simulations
• Objective: investigate the detailed delay-power relation of the target pipeline
• Ideally, the transistor-level model of the target pipeline would be simulated directly with Hspice; however, that is very labor-intensive and time-consuming
• So we took an indirect approach to the Hspice simulations:
P_total(V, F) = P_comb(V, F) + P_ff(V, F)
1/F = T = t_c + t_setup + t_c-to-q
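The two relations above suggest how separately simulated combinational and flip-flop results can be combined into one operating point. The sketch below is an assumption-laden illustration: the function names and all numbers are hypothetical, and PDP and EDP use their standard definitions (power × delay, and energy × delay), which the slide does not spell out.

```python
def cycle_period(t_c: float, t_setup: float, t_c_to_q: float) -> float:
    """T = t_c + t_setup + t_c-to-q, with 1/F = T."""
    return t_c + t_setup + t_c_to_q


def total_power(p_comb: float, p_ff: float) -> float:
    """P_total = P_comb + P_ff at one (V, F) operating point."""
    return p_comb + p_ff


def pdp(power: float, period: float) -> float:
    """Power-delay product (standard definition: power x delay)."""
    return power * period


def edp(power: float, period: float) -> float:
    """Energy-delay product (standard definition: power x delay^2)."""
    return power * period * period


if __name__ == "__main__":
    # Hypothetical normalized values at one operating point.
    t = cycle_period(t_c=1.5, t_setup=0.25, t_c_to_q=0.25)
    p = total_power(p_comb=0.5, p_ff=0.25)
    print("T =", t, "F =", 1.0 / t)  # T = 2.0 F = 0.5
    print("P_total =", p)            # P_total = 0.75
    print("PDP =", pdp(p, t), "EDP =", edp(p, t))  # PDP = 1.5 EDP = 3.0
```

Evaluating these at each candidate voltage, using the simulated delay and power curves, would give the efficiency comparison across operating points.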
![Page 17: MicroFix: Exploiting Path-grained Timing Adaptability for Improving Power-Performance Efficiency Guihai Yan, Yinhe Han, Hui Liu, Xiaoyao Liang, Xiaowei.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1e5503460f94c362bf/html5/thumbnails/17.jpg)
Combinational Component
[Figure: (a) normalized power (0 to 1) versus voltage (1 V down to 0.6 V) and (b) normalized delay (1 to 6) versus voltage (1 V down to 0.6 V), each with High Perf. and Low Power curves.]
• ISCAS85 (c432, c499, c880, c1355, c1908, c2670)
• 32nm PTM models (HP and LP versions)
The normalized V-D and V-P relations hold well across all of the simulated benchmarks!
![Page 18: MicroFix: Exploiting Path-grained Timing Adaptability for Improving Power-Performance Efficiency Guihai Yan, Yinhe Han, Hui Liu, Xiaoyao Liang, Xiaowei.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1e5503460f94c362bf/html5/thumbnails/18.jpg)
Sequential Component
• V-D
[Figure (a): normalized delay (1 to 6) of t_setup + t_c-to-q versus voltage (1 V down to 0.6 V), with Low Power and High Perf. curves.]
• V-P
[Figure (b): normalized power (0 to 1) versus voltage (1 V down to 0.6 V) for α = 1, 0.5, and 0.25, with Low Power and High Perf. curves.]
![Page 19: MicroFix: Exploiting Path-grained Timing Adaptability for Improving Power-Performance Efficiency Guihai Yan, Yinhe Han, Hui Liu, Xiaoyao Liang, Xiaowei.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1e5503460f94c362bf/html5/thumbnails/19.jpg)
Efficiency Comparison
• TH = 0.2 is an optimal choice!
• Efficiency improvement: 35% EDP, 28% PDP
[Figure: sensor area overhead versus TH (same data as on the Exploring Design Tradeoffs /3 slide).]
![Page 20: MicroFix: Exploiting Path-grained Timing Adaptability for Improving Power-Performance Efficiency Guihai Yan, Yinhe Han, Hui Liu, Xiaoyao Liang, Xiaowei.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1e5503460f94c362bf/html5/thumbnails/20.jpg)
Conclusion
• MicroFix can improve DVFS efficiency by exploiting the path-grained adaptability
• The timing imbalance threshold, TH, governs a critical design tradeoff
• MicroFix improves EDP by up to 35% for HP applications and PDP by up to 28% for LP applications, at the expense of only 7% area overhead
![Page 21: MicroFix: Exploiting Path-grained Timing Adaptability for Improving Power-Performance Efficiency Guihai Yan, Yinhe Han, Hui Liu, Xiaoyao Liang, Xiaowei.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1e5503460f94c362bf/html5/thumbnails/21.jpg)
Thanks!
Q&A