Dynamic Voltage/Frequency Scaling in Loop Accelerators using BLADES
University of Michigan, Electrical Engineering and Computer Science
Ganesh Dasika (1), Shidhartha Das (2), Kevin Fan (1), Scott Mahlke (1), David Bull (2)
(1) University of Michigan, Advanced Computer Architecture Laboratory, Ann Arbor, MI
(2) ARM Ltd., Cambridge, United Kingdom
Introduction
[Austin, IEEE Computer March 04]
Razor
• Allows for voltage/frequency scaling beyond the first-failure point
• Exploits the difference between design-time conditions (“slow”) and actual conditions (“typical”)
[Das, JSSC 2006]
Razor in General Purpose Processors
• Requires detailed analysis of microarchitectural impact
  – Analyze what state should be stored
  – Lengthening pipeline for stabilization increases complexity of forwarding logic
• Unpredictable control and data flow
• Difficult to determine worst-case vectors
BLADES
• Better-than-worst-case Loop Accelerator Design
• Incorporate DVFS into ASICs using Razor
  – Shave off some of the high NRE using HLS
  – Develop generic methodology for any application
  – Razor solution for a templated architecture
• Create ASIC design flow that is aware of Razor-ization costs
Loop Accelerator Template
• Hardware realization of modulo-scheduled loop
• Parameterized execution resources, storage, connectivity
• Control is statically determined, simple and not timing-critical
• Opportunity to make application-specific optimizations
Razorized Loop Accelerator
[Figure: loop accelerator datapath with Razor flip-flops. Labeled additions: extended register queues, added interconnect, “roll-back” muxes.]
• R is the number of extra entries required
• R is a function of max pipeline depth and error-detection delay
Error “Life-Cycle”
[Figure: error life-cycle in the Razorized loop accelerator. Labels: error, error OR-tree, error stabilization, roll-back pipelining, error processing, control, error reset.]
Issues with Razor
• Area, added hold-fixing
[Figure: timing diagram. Labels: CLK, D, tspec.]
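The hold-fixing cost noted on this slide can be illustrated with a small sketch. It assumes the usual Razor short-path constraint: every path into a Razor flip-flop must be slower than the speculation window (tspec) plus a hold time, otherwise the path is padded with delay buffers. The function name, the path names and all timing numbers below are hypothetical and are not taken from the slides.

```python
import math

def hold_fix_buffers(min_path_delay_ps, t_spec_ps, t_hold_ps, buffer_delay_ps):
    """Return how many delay buffers a short path needs (assumed constraint).

    Assumption: a path is safe when min_path_delay >= t_spec + t_hold.
    """
    required = t_spec_ps + t_hold_ps
    shortfall = required - min_path_delay_ps
    if shortfall <= 0:
        return 0  # path is already slower than the speculation window
    return math.ceil(shortfall / buffer_delay_ps)

# Hypothetical minimum path delays in picoseconds.
paths = {"bypass": 120, "adder_lsb": 300, "shifter": 520}

buffers = {
    name: hold_fix_buffers(delay, t_spec_ps=400, t_hold_ps=30, buffer_delay_ps=50)
    for name, delay in paths.items()
}
print(buffers)
```

In this toy model a wider speculation window allows deeper voltage scaling but increases the number of padded paths, which is one way to read the area overhead listed above.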
Opcode-chaining
[Figure: four example schedules of the ops Add0, Add1, Or0, Or1, with the label II.
 – FU 0: Add0, Add1; FU 1: Or0, Or1 (Time 0-2)
 – FU 0: Add0, Add0, Add1, Add1; FU 1: Or0, Or0, Or1, Or1 (Time 0-5)
 – FU 0: Add0; FU 1: Add1; FU 2: Or0; FU 3: Or1 (Time 0-2)
 – FU 0: Add-Or0, Add-Or1 (Time 0-3)]
• 50% FU utilization removes the need for hold-fixing, but requires halving performance or doubling area
• Use a hybrid scheme to execute >2 ops per FU
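The performance/area trade-off on this slide can be sketched with the standard resource-constrained bound on the initiation interval (II) from modulo scheduling, II >= ceil(ops x cycles per op / FUs). This is a simplified model, not the authors' tool: the helper name `res_mii` is hypothetical, and the model ignores that a chained Add-Or FU is larger than a plain FU.

```python
import math

def res_mii(num_ops, num_fus, cycles_per_op=1):
    """Resource-constrained lower bound on the initiation interval (II)."""
    return math.ceil(num_ops * cycles_per_op / num_fus)

ops = ["Add0", "Add1", "Or0", "Or1"]

# Baseline: every FU accepts a new op each cycle.
baseline = res_mii(len(ops), num_fus=2)

# 50% utilization: each op holds its FU for two cycles.
half_performance = res_mii(len(ops), num_fus=2, cycles_per_op=2)
double_area = res_mii(len(ops), num_fus=4, cycles_per_op=2)

# Opcode chaining: Add-Or0 and Add-Or1 each count as one two-cycle op.
chained_ops = ["Add-Or0", "Add-Or1"]
chained_one_fu = res_mii(len(chained_ops), num_fus=1, cycles_per_op=2)
chained_two_fus = res_mii(len(chained_ops), num_fus=2, cycles_per_op=2)

print(baseline, half_performance, double_area, chained_one_fu, chained_two_fus)
```

Under these assumptions, chaining on two FUs recovers the baseline II while each FU is still enabled only every other cycle.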
Identifying Opcode Chains
• Compiler identifies subgraphs of 3-4 input, 1 output instructions
  – All arith. ops supported
• Greedy selection algorithm
[Figure: dataflow graph of <<, +, >>, &, LD and ST operations, with labels 1-7.]
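A greedy selection over candidate subgraphs might look like the following minimal sketch. The slides do not state the ranking criterion, so largest-subgraph-first is an assumption here, and the candidate names and node ids are hypothetical rather than taken from the figure.

```python
def greedy_select(candidates):
    """Pick non-overlapping candidate subgraphs, largest first.

    candidates: list of (name, set of dataflow-graph node ids).
    Ranking by size (ties broken by name) is an assumed heuristic.
    """
    chosen = []
    covered = set()
    ranked = sorted(candidates, key=lambda c: (-len(c[1]), c[0]))
    for name, nodes in ranked:
        if covered.isdisjoint(nodes):  # skip subgraphs that reuse a node
            chosen.append(name)
            covered |= nodes
    return chosen

# Hypothetical candidates over node ids 1-8.
candidates = [
    ("shl_add_shr", {1, 2, 3}),
    ("add_shr", {2, 3}),
    ("add_and", {4, 5}),
    ("and_add", {5, 6}),
    ("shr_add", {7, 8}),
]

selected = greedy_select(candidates)
print(selected)
```

Each selected subgraph would then be a candidate for a custom FU; a real implementation would also have to check the input/output limits stated in the bullets above.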
Custom FUs
[Figure: dataflow graph of <<, +, >>, &, LD and ST operations (labels 1-7) alongside a custom FU composed of >>, + and << operations. Labels: “Enabled every 2 cycles”, “Razor DFF”.]
Results
• idct, sharp, systolic_dct had multiple CFUs, and an overall lower number of FUs
• Viterbi, dequant had significant control flow that restricted opportunities for creating custom ops
• 22% reduction in hold-fixing overhead in sobel
Conclusion
• Application-specific optimizations definitely help to mitigate Razor costs
  – 24% reduction in overhead
  – 33% energy savings overall
• Can optimize Razor-ization with further input from the compiler
  – Critical-instruction analysis
  – Error impact analysis
Thank you!
http://cccp.eecs.umich.edu
Future Work
• Errors in different FUs affect the system differently
  – Error “impact-analysis”
  – Data computation not necessarily error-sensitive
  – Address, branch target/direction critical to functionality
• Razor-ization of arbitrary Verilog
Motivation
• Using Razor has significant design overhead
  – Error-recovery system
  – Added “backup” state
  – Additional hold-time fixing
• Each microarchitecture requires different modifications
• Workload information cannot be exploited, since the design must preserve generality