SPIRAL DSP Transform Compiler (inst.eecs.berkeley.edu/~cs294-88/sp13/lectures/SPIRAL...)
![Page 1: SPIRAL DSP Transform Compilerinst.eecs.berkeley.edu/~cs294-88/sp13/lectures/SPIRAL... · 2013-04-08 · SPIRAL DSP Transform Compiler: Application Specific Hardware Synthesis Peter](https://reader033.fdocuments.in/reader033/viewer/2022050507/5f986ff825639807ee3abd24/html5/thumbnails/1.jpg)
SPIRAL DSP Transform Compiler: Application Specific Hardware Synthesis
Peter A. Milder¹ ([email protected]), Franz Franchetti, James C. Hoe, and Markus Pueschel²
Department of ECE, Carnegie Mellon University
CMU/ECE/Hoe, February 2013, slide‐1
¹ now with SUNY Stony Brook; ² now with ETH
![Page 2: SPIRAL DSP Transform Compilerinst.eecs.berkeley.edu/~cs294-88/sp13/lectures/SPIRAL... · 2013-04-08 · SPIRAL DSP Transform Compiler: Application Specific Hardware Synthesis Peter](https://reader033.fdocuments.in/reader033/viewer/2022050507/5f986ff825639807ee3abd24/html5/thumbnails/2.jpg)
The SPIRAL Project
• High-performance implementations of linear DSP transforms (DFT, DCT, DWT, filters, etc.) are an important class of design problems
• Hand design and tuning is tricky and expensive
  – needs both math and implementation knowledge
  – time-consuming and tedious
  – needs to repeat effort for every new context
• SPIRAL research goal: a flexible push-button design generator that produces SW & HW implementations comparable with expert hand design
![Page 3: SPIRAL DSP Transform Compilerinst.eecs.berkeley.edu/~cs294-88/sp13/lectures/SPIRAL... · 2013-04-08 · SPIRAL DSP Transform Compiler: Application Specific Hardware Synthesis Peter](https://reader033.fdocuments.in/reader033/viewer/2022050507/5f986ff825639807ee3abd24/html5/thumbnails/3.jpg)
Why we can do better than hand design
• SPIRAL focuses only on linear DSP transforms
• These transforms are highly structured, highly regular, and very well understood mathematically
• Algorithmic implementations of a transform can be enumerated following a known set of rules
• For a given objective function and mapping target, a computer generates a solution at least as good as the best human effort by trying enough implementations
![Page 4: SPIRAL DSP Transform Compilerinst.eecs.berkeley.edu/~cs294-88/sp13/lectures/SPIRAL... · 2013-04-08 · SPIRAL DSP Transform Compiler: Application Specific Hardware Synthesis Peter](https://reader033.fdocuments.in/reader033/viewer/2022050507/5f986ff825639807ee3abd24/html5/thumbnails/4.jpg)
SPIRAL Framework
[Figure: the SPIRAL design flow. User request: "I want a DFT of size 1024 on a {Xilinx, P4, Cell, ...}". Annotations: "SPIRAL automation starts here" and "where most tools begin automating the problem".]
Principle 1: Domain knowledge in the system
Principle 2: Optimization at a high level of abstraction
![Page 5: SPIRAL DSP Transform Compilerinst.eecs.berkeley.edu/~cs294-88/sp13/lectures/SPIRAL... · 2013-04-08 · SPIRAL DSP Transform Compiler: Application Specific Hardware Synthesis Peter](https://reader033.fdocuments.in/reader033/viewer/2022050507/5f986ff825639807ee3abd24/html5/thumbnails/5.jpg)
www.spiral.net/hardware/dftgen.html
![Page 6: SPIRAL DSP Transform Compilerinst.eecs.berkeley.edu/~cs294-88/sp13/lectures/SPIRAL... · 2013-04-08 · SPIRAL DSP Transform Compiler: Application Specific Hardware Synthesis Peter](https://reader033.fdocuments.in/reader033/viewer/2022050507/5f986ff825639807ee3abd24/html5/thumbnails/6.jpg)
High-Level, Quality, and Specialization
High-level: tools know better than you
RTL Synthesis: general-purpose, but special handling of structures like FSM, arith, etc.
Place-and-Route: works the same no matter what design
![Page 7: SPIRAL DSP Transform Compilerinst.eecs.berkeley.edu/~cs294-88/sp13/lectures/SPIRAL... · 2013-04-08 · SPIRAL DSP Transform Compiler: Application Specific Hardware Synthesis Peter](https://reader033.fdocuments.in/reader033/viewer/2022050507/5f986ff825639807ee3abd24/html5/thumbnails/7.jpg)
Outline
• SPIRAL Formula Framework
• SPIRAL for HW FFT cores
• SPIRAL for HW FFT “un”-core
![Page 8: SPIRAL DSP Transform Compilerinst.eecs.berkeley.edu/~cs294-88/sp13/lectures/SPIRAL... · 2013-04-08 · SPIRAL DSP Transform Compiler: Application Specific Hardware Synthesis Peter](https://reader033.fdocuments.in/reader033/viewer/2022050507/5f986ff825639807ee3abd24/html5/thumbnails/8.jpg)
Linear Transforms
• A linear transform is a matrix-vector multiplication
  – computing by definition takes O(N^2) operations
  – the matrix has structure
• E.g. discrete Fourier transform: y = DFT_N x

$$
\begin{bmatrix} y_0 \\ y_1 \\ \vdots \\ y_j \\ \vdots \\ y_{N-1} \end{bmatrix}
=
\Big[\, e^{-2\pi i\, jk/N} \,\Big]_{j,k = 0,\dots,N-1}
\begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_k \\ \vdots \\ x_{N-1} \end{bmatrix}
$$
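As a minimal sketch of the definition above (assuming NumPy and the sign convention ω = e^{-2πi/N}; the function names are illustrative and not part of SPIRAL), the transform can be computed directly as a dense matrix-vector product, which costs N² multiply-adds:

```python
import numpy as np

def dft_matrix(n):
    """Dense DFT_n matrix with entries exp(-2*pi*i*j*k/n) (sign convention assumed)."""
    idx = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(idx, idx) / n)

def dft_by_definition(x):
    """y = DFT_N x as a plain matrix-vector product: N*N multiply-adds."""
    x = np.asarray(x, dtype=complex)
    return dft_matrix(len(x)) @ x

x = np.arange(8, dtype=float)
y = dft_by_definition(x)
```

The result agrees with a library FFT up to rounding; the point of the following slides is that the same `y` can be obtained with far fewer operations.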
![Page 9: SPIRAL DSP Transform Compilerinst.eecs.berkeley.edu/~cs294-88/sp13/lectures/SPIRAL... · 2013-04-08 · SPIRAL DSP Transform Compiler: Application Specific Hardware Synthesis Peter](https://reader033.fdocuments.in/reader033/viewer/2022050507/5f986ff825639807ee3abd24/html5/thumbnails/9.jpg)
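“Fast” Algorithms
• A “fast” algorithm factors the matrix into a sequence of structured, sparse matrices
  ⇒ cheaper sparse multiplies, O(N log(N)) operations
• E.g. Cooley-Tukey factorization of DFT_4 (dots denote zeros)

$$
\mathrm{DFT}_4 =
\begin{bmatrix} 1&1&1&1\\ 1&-i&-1&i\\ 1&-1&1&-1\\ 1&i&-1&-i \end{bmatrix}
=
\begin{bmatrix} 1&\cdot&1&\cdot\\ \cdot&1&\cdot&1\\ 1&\cdot&-1&\cdot\\ \cdot&1&\cdot&-1 \end{bmatrix}
\begin{bmatrix} 1&\cdot&\cdot&\cdot\\ \cdot&1&\cdot&\cdot\\ \cdot&\cdot&1&\cdot\\ \cdot&\cdot&\cdot&-i \end{bmatrix}
\begin{bmatrix} 1&1&\cdot&\cdot\\ 1&-1&\cdot&\cdot\\ \cdot&\cdot&1&1\\ \cdot&\cdot&1&-1 \end{bmatrix}
\begin{bmatrix} 1&\cdot&\cdot&\cdot\\ \cdot&\cdot&1&\cdot\\ \cdot&1&\cdot&\cdot\\ \cdot&\cdot&\cdot&1 \end{bmatrix}
$$

• Matrix formula representation

$$
\mathrm{DFT}_4 = (\mathrm{DFT}_2 \otimes I_2)\, D^4_2\, (I_2 \otimes \mathrm{DFT}_2)\, L^4_2
$$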
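The size-4 factorization can be checked numerically. The sketch below assumes NumPy and the convention ω = e^{-2πi/4} = -i for the twiddle entry; the variable names are illustrative only. It multiplies the four sparse factors and compares the product with the dense DFT_4 matrix.

```python
import numpy as np

F2 = np.array([[1, 1], [1, -1]], dtype=complex)   # DFT_2 butterfly
I2 = np.eye(2)

# The four sparse factors, written out explicitly.
butterflies_strided = np.kron(F2, I2)        # DFT_2 (x) I_2
twiddles = np.diag([1, 1, 1, -1j])           # diagonal D; -i assumes w = exp(-2*pi*i/4)
butterflies_blocked = np.kron(I2, F2)        # I_2 (x) DFT_2
stride_perm = np.eye(4)[[0, 2, 1, 3]]        # L: reorders x0,x1,x2,x3 -> x0,x2,x1,x3

product = butterflies_strided @ twiddles @ butterflies_blocked @ stride_perm

idx = np.arange(4)
dft4 = np.exp(-2j * np.pi * np.outer(idx, idx) / 4)   # dense DFT_4 for comparison

factors = [butterflies_strided, twiddles, butterflies_blocked, stride_perm]
nonzeros = [int(np.count_nonzero(f)) for f in factors]
```

Each factor has at most two nonzeros per row, which is where the savings over the dense 16-entry matrix come from.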
![Page 10: SPIRAL DSP Transform Compilerinst.eecs.berkeley.edu/~cs294-88/sp13/lectures/SPIRAL... · 2013-04-08 · SPIRAL DSP Transform Compiler: Application Specific Hardware Synthesis Peter](https://reader033.fdocuments.in/reader033/viewer/2022050507/5f986ff825639807ee3abd24/html5/thumbnails/10.jpg)
Factorization Rules
E.g. Cooley-Tukey

$$
\mathrm{DFT}_{mn} = (\mathrm{DFT}_n \otimes I_m)\, D^{mn}_n\, (I_n \otimes \mathrm{DFT}_m)\, L^{mn}_n
$$

– DFT_2 is $\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$
– D is a diagonal matrix of twiddle factors
– L is a stride permutation matrix
– A ⊗ B = [a_{j,k} B] is the tensor (or Kronecker) product

$$
\text{e.g.,}\quad
A \otimes I_n = \begin{bmatrix} a_{0,0} I_n & a_{0,1} I_n \\ a_{1,0} I_n & a_{1,1} I_n \end{bmatrix},
\qquad
I_n \otimes B = \begin{bmatrix} B & & \\ & \ddots & \\ & & B \end{bmatrix}
$$
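A hedged sketch of the general rule, built from explicit matrices with NumPy. It assumes the split DFT_{mn} = (DFT_n ⊗ I_m) D (I_n ⊗ DFT_m) L with ω = e^{-2πi/(mn)}; the exact index conventions for D and L, and all function names, are assumptions of this sketch rather than SPIRAL's definitions.

```python
import numpy as np

def dft_matrix(n):
    idx = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(idx, idx) / n)

def stride_perm(size, stride):
    """Stride permutation L: (L x)[i * (size // stride) + j] = x[j * stride + i]."""
    src = np.arange(size).reshape(size // stride, stride).T.ravel()
    return np.eye(size)[src]

def twiddle_diag(n, m):
    """Diagonal D for the split mn = n * m: entry (i*m + l) is w^(i*l), w = exp(-2*pi*i/(mn))."""
    i, l = np.meshgrid(np.arange(n), np.arange(m), indexing="ij")
    return np.diag(np.exp(-2j * np.pi * (i * l).ravel() / (n * m)))

def cooley_tukey(n, m):
    """(DFT_n (x) I_m) D (I_n (x) DFT_m) L as one dense product, for checking only."""
    return (np.kron(dft_matrix(n), np.eye(m)) @ twiddle_diag(n, m)
            @ np.kron(np.eye(n), dft_matrix(m)) @ stride_perm(n * m, n))

# The rule holds for any split of the size, not only powers of two.
checks = [np.allclose(cooley_tukey(n, m), dft_matrix(n * m))
          for n, m in [(2, 2), (2, 4), (4, 2), (3, 5)]]
```

Building dense matrices is only for verification; an implementation would apply each sparse factor directly instead of multiplying them out.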
![Page 11: SPIRAL DSP Transform Compilerinst.eecs.berkeley.edu/~cs294-88/sp13/lectures/SPIRAL... · 2013-04-08 · SPIRAL DSP Transform Compiler: Application Specific Hardware Synthesis Peter](https://reader033.fdocuments.in/reader033/viewer/2022050507/5f986ff825639807ee3abd24/html5/thumbnails/11.jpg)
“Fast” Fourier Transform Algorithms
• Recursively factorize by the Cooley-Tukey rule until only leaf cases remain (e.g. DFT_r for radix-r)

$$
\begin{aligned}
\mathrm{DFT}_8 &= (\mathrm{DFT}_2 \otimes I_4)\, D^8_2\, (I_2 \otimes \mathrm{DFT}_4)\, L^8_2 \\
&= (\mathrm{DFT}_2 \otimes I_4)\, D^8_2\, \big(I_2 \otimes (\mathrm{DFT}_2 \otimes I_2)\, D^4_2\, (I_2 \otimes \mathrm{DFT}_2)\, L^4_2\big)\, L^8_2
\end{aligned}
$$

• Exponential number of alternatives
[Figure: two ruletrees for DFT_8. Left: DFT_8 splits into DFT_2 and DFT_4, and DFT_4 into DFT_2 and DFT_2. Right: DFT_8 splits into DFT_4 and DFT_2, and DFT_4 into DFT_2 and DFT_2.]
• Each ruletree corresponds to a different algorithm
• All cost O(N log(N))
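The recursion and the ruletree idea can be sketched as follows. This is a simplified model that assumes only the Cooley-Tukey rule with DFT_2 leaves and power-of-two sizes; SPIRAL's actual rule set is larger, so its alternative counts are larger too. The helpers `ruletrees`, `size`, and `apply_tree` are hypothetical names.

```python
import numpy as np

def ruletrees(n):
    """All binary Cooley-Tukey ruletrees for DFT_n with DFT_2 leaves (n a power of two).
    A tree is either the leaf 2 or a pair (left_subtree, right_subtree)."""
    if n == 2:
        return [2]
    trees = []
    a = 2
    while a < n:
        for left in ruletrees(a):
            for right in ruletrees(n // a):
                trees.append((left, right))
        a *= 2
    return trees

def size(tree):
    return 2 if tree == 2 else size(tree[0]) * size(tree[1])

def apply_tree(tree, x):
    """Compute DFT x by applying the Cooley-Tukey rule along the given ruletree."""
    if tree == 2:
        return np.array([x[0] + x[1], x[0] - x[1]])
    left, right = tree
    a, b = size(left), size(right)                 # outer size a, inner size b
    rows = x.reshape(b, a).T                       # stride permutation: row i = x[i], x[i+a], ...
    inner = np.array([apply_tree(right, row) for row in rows])       # I_a (x) DFT_b
    i, l = np.meshgrid(np.arange(a), np.arange(b), indexing="ij")
    inner = inner * np.exp(-2j * np.pi * i * l / (a * b))             # twiddle factors
    outer = np.array([apply_tree(left, col) for col in inner.T]).T   # DFT_a (x) I_b
    return outer.ravel()

counts = {n: len(ruletrees(n)) for n in (2, 4, 8, 16, 32)}
# counts == {2: 1, 4: 1, 8: 2, 16: 5, 32: 14}

x = np.random.default_rng(0).standard_normal(16) + 0j
results = [apply_tree(t, x) for t in ruletrees(16)]   # five different algorithms, one answer
```

Every tree computes the same transform; the trees differ in data flow and operation order, which is what a search over implementations exploits.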
![Page 12: SPIRAL DSP Transform Compilerinst.eecs.berkeley.edu/~cs294-88/sp13/lectures/SPIRAL... · 2013-04-08 · SPIRAL DSP Transform Compiler: Application Specific Hardware Synthesis Peter](https://reader033.fdocuments.in/reader033/viewer/2022050507/5f986ff825639807ee3abd24/html5/thumbnails/12.jpg)
A System of Transforms and Rules
[Figure: collage of breakdown rules for transforms including DFT, DCT (types II and IV), DST, WHT, DWT, and filters; the individual formulas are not recoverable from the extraction]
50+ transforms, 150+ rules
![Page 13: SPIRAL DSP Transform Compilerinst.eecs.berkeley.edu/~cs294-88/sp13/lectures/SPIRAL... · 2013-04-08 · SPIRAL DSP Transform Compiler: Application Specific Hardware Synthesis Peter](https://reader033.fdocuments.in/reader033/viewer/2022050507/5f986ff825639807ee3abd24/html5/thumbnails/13.jpg)
Algorithmic Design Space

| size | # of DFT | # of DCT-IV |
|------|----------|-------------|
| 2 | 1 | 1 |
| 4 | 6 | 10 |
| 8 | 40 | 126 |
| 16 | 296 | 31242 |
| 32 | 27744 | 1924443362 |
| 64 | 162570361280 | 7343815121631354242 |
| 128 | ~1.01 × 10^27 | ~1.07 × 10^38 |
| 256 | ~2.31 × 10^61 | ~2.30 × 10^76 |
| 512 | ~2.86 × 10^133 | ~1.06 × 10^153 |

Different characteristics: data flow, numerical stability, operation ordering, working set size, datapath regularity
![Page 14: SPIRAL DSP Transform Compilerinst.eecs.berkeley.edu/~cs294-88/sp13/lectures/SPIRAL... · 2013-04-08 · SPIRAL DSP Transform Compiler: Application Specific Hardware Synthesis Peter](https://reader033.fdocuments.in/reader033/viewer/2022050507/5f986ff825639807ee3abd24/html5/thumbnails/14.jpg)
Design Space: SW DCT 32 on P4
Histogram of 10,000 randomly selected algorithms
[Figure: two histograms, one by runtime (P4, 3.2 GHz) and one by numerical accuracy]
![Page 15: SPIRAL DSP Transform Compilerinst.eecs.berkeley.edu/~cs294-88/sp13/lectures/SPIRAL... · 2013-04-08 · SPIRAL DSP Transform Compiler: Application Specific Hardware Synthesis Peter](https://reader033.fdocuments.in/reader033/viewer/2022050507/5f986ff825639807ee3abd24/html5/thumbnails/15.jpg)
Outline
• SPIRAL Formula Framework
• SPIRAL for HW FFT cores
• SPIRAL for HW FFT “un”-core
![Page 16: SPIRAL DSP Transform Compilerinst.eecs.berkeley.edu/~cs294-88/sp13/lectures/SPIRAL... · 2013-04-08 · SPIRAL DSP Transform Compiler: Application Specific Hardware Synthesis Peter](https://reader033.fdocuments.in/reader033/viewer/2022050507/5f986ff825639807ee3abd24/html5/thumbnails/16.jpg)
Formula to HW (Combinational)
• Given y = Px, where P is:
  – A·B: apply B, then A
  – L: a permutation ⇒ permute
  – I_n ⊗ A: apply A, n times in parallel
  – D: a diagonal ⇒ scale
[Figure: dataflow fragments for each case: block B feeding block A, parallel copies of A, and constant multipliers labeled ×7, ×8, ×4, ×2]
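The mapping from formula constructs to datapath operations can be sketched in software as small functions on vectors. This is an illustration only, assuming NumPy; the names `compose`, `permute`, `parallel`, and `scale` are hypothetical and do not correspond to SPIRAL's internal representation.

```python
import numpy as np

def compose(a, b):
    """A * B: apply B first, then A (formulas are read right to left)."""
    return lambda x: a(b(x))

def permute(src):
    """Permutation: pure data reordering (wires in hardware), out[p] = x[src[p]]."""
    src = np.asarray(src)
    return lambda x: x[src]

def parallel(n, a, block):
    """I_n (x) A: apply A independently to n contiguous blocks of length `block`."""
    return lambda x: np.concatenate([a(x[i * block:(i + 1) * block]) for i in range(n)])

def scale(diagonal):
    """Diagonal matrix: one constant multiplier per element."""
    diagonal = np.asarray(diagonal)
    return lambda x: diagonal * x

def dft2(x):
    return np.array([x[0] + x[1], x[0] - x[1]])

# Example: D (I_2 (x) DFT_2) L applied to a length-4 vector, right to left.
datapath = compose(scale([1, 1, 1, -1j]),
                   compose(parallel(2, dft2, 2), permute([0, 2, 1, 3])))
out = datapath(np.array([1.0, 2.0, 3.0, 4.0]))
# out == [4, -2, 6, 2j]
```

No matrix is ever built: each construct becomes a structural element (wiring, replicated blocks, multipliers), which is why a formula maps naturally onto a combinational datapath.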
![Page 17: SPIRAL DSP Transform Compilerinst.eecs.berkeley.edu/~cs294-88/sp13/lectures/SPIRAL... · 2013-04-08 · SPIRAL DSP Transform Compiler: Application Specific Hardware Synthesis Peter](https://reader033.fdocuments.in/reader033/viewer/2022050507/5f986ff825639807ee3abd24/html5/thumbnails/17.jpg)
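DFT_8 Datapath Example
[Figure: combinational DFT_8 datapath; nodes marked × in the dataflow]

$$
\mathrm{DFT}_8 = (\mathrm{DFT}_2 \otimes I_4)\, D^8_2\, \big(I_2 \otimes (\mathrm{DFT}_2 \otimes I_2)\, D^4_2\, (I_2 \otimes \mathrm{DFT}_2)\, L^4_2\big)\, L^8_2
$$

(formula is applied from right to left)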
![Page 18: SPIRAL DSP Transform Compilerinst.eecs.berkeley.edu/~cs294-88/sp13/lectures/SPIRAL... · 2013-04-08 · SPIRAL DSP Transform Compiler: Application Specific Hardware Synthesis Peter](https://reader033.fdocuments.in/reader033/viewer/2022050507/5f986ff825639807ee3abd24/html5/thumbnails/18.jpg)
Pease DFT_8 Example
[Figure: Pease DFT_8 dataflow in three stages (stage 1, stage 2, stage 3); nodes marked × in the dataflow]
![Page 19: SPIRAL DSP Transform Compilerinst.eecs.berkeley.edu/~cs294-88/sp13/lectures/SPIRAL... · 2013-04-08 · SPIRAL DSP Transform Compiler: Application Specific Hardware Synthesis Peter](https://reader033.fdocuments.in/reader033/viewer/2022050507/5f986ff825639807ee3abd24/html5/thumbnails/19.jpg)
How about good HW?
• Matrix formulas have a natural mapping to dataflow and hence to a combinational datapath
• However, real hardware designs must fit a given resource constraint
  ⇒ sequential datapath that reuses available HW
  – identify repeated kernels
  – instantiate kernels under resource constraints
  – schedule computation to reuse instantiated kernels
We want to do the analysis and mapping at formula level, with high-level algorithm knowledge
![Page 20: SPIRAL DSP Transform Compilerinst.eecs.berkeley.edu/~cs294-88/sp13/lectures/SPIRAL... · 2013-04-08 · SPIRAL DSP Transform Compiler: Application Specific Hardware Synthesis Peter](https://reader033.fdocuments.in/reader033/viewer/2022050507/5f986ff825639807ee3abd24/html5/thumbnails/20.jpg)
Tensor as Streaming Parallelism
[Figure: three datapaths for the same tensor product: fully parallel, partially streamed, fully streamed]
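A toy model of the three options, assuming NumPy. It computes (I_n ⊗ A) x with a chosen number of instances of A and counts "cycles", where one cycle means every instantiated kernel processes one block. The cycle model and the name `stream_tensor` are simplifications made for this sketch, not SPIRAL's cost model.

```python
import numpy as np

def stream_tensor(x, kernel, block, units):
    """Compute (I_n (x) A) x using only `units` instances of A.
    Each cycle consumes units*block elements; returns (result, cycles)."""
    n = len(x) // block
    assert n % units == 0, "sketch assumes units divides the number of blocks"
    out, cycles = [], 0
    for start in range(0, n, units):            # one cycle per group of blocks
        for u in range(units):                  # these run side by side in hardware
            lo = (start + u) * block
            out.append(kernel(x[lo:lo + block]))
        cycles += 1
    return np.concatenate(out), cycles

def dft2(x):
    return np.array([x[0] + x[1], x[0] - x[1]])

x = np.arange(8, dtype=float)
fully_parallel, c_par = stream_tensor(x, dft2, block=2, units=4)      # 4 kernels, 1 cycle
partially_streamed, c_mid = stream_tensor(x, dft2, block=2, units=2)  # 2 kernels, 2 cycles
fully_streamed, c_seq = stream_tensor(x, dft2, block=2, units=1)      # 1 kernel, 4 cycles
```

All three variants produce the same output; they trade the number of kernel instances (cost) against the number of cycles (throughput).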
![Page 21: SPIRAL DSP Transform Compilerinst.eecs.berkeley.edu/~cs294-88/sp13/lectures/SPIRAL... · 2013-04-08 · SPIRAL DSP Transform Compiler: Application Specific Hardware Synthesis Peter](https://reader033.fdocuments.in/reader033/viewer/2022050507/5f986ff825639807ee3abd24/html5/thumbnails/21.jpg)
Pease DFT Example: DFT_8
[Figure: Pease DFT_8 dataflow in three stages (stage 1, stage 2, stage 3); nodes marked × in the dataflow]
![Page 22: SPIRAL DSP Transform Compilerinst.eecs.berkeley.edu/~cs294-88/sp13/lectures/SPIRAL... · 2013-04-08 · SPIRAL DSP Transform Compiler: Application Specific Hardware Synthesis Peter](https://reader033.fdocuments.in/reader033/viewer/2022050507/5f986ff825639807ee3abd24/html5/thumbnails/22.jpg)
Pease DFT Example: DFT_8 Streaming
[Figure: streaming Pease DFT_8 datapath in three stages (stage 1, stage 2, stage 3); the stages are connected by streaming permutation blocks labeled f(L^8_2), with one block labeled f(R_8)]
![Page 23: SPIRAL DSP Transform Compilerinst.eecs.berkeley.edu/~cs294-88/sp13/lectures/SPIRAL... · 2013-04-08 · SPIRAL DSP Transform Compiler: Application Specific Hardware Synthesis Peter](https://reader033.fdocuments.in/reader033/viewer/2022050507/5f986ff825639807ee3abd24/html5/thumbnails/23.jpg)
Regular Structure for HW
• Simple regular structure embodied in Pease FFT
• Example: [figure not recovered from the extraction]
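The regularity can be seen in a software sketch of a radix-2 Pease-style FFT, assuming NumPy and a power-of-two size. It uses the form DFT_N = (∏ L^N_2 (I_{N/2} ⊗ DFT_2) T_s) R_N, with the bit reversal R_N applied first; the per-stage twiddle ordering was derived for this sketch by permuting the standard decimation-in-time twiddles and is not taken from the slides.

```python
import numpy as np

def stride2(v):
    """Stride permutation L^N_2: even-indexed elements first, then odd-indexed."""
    return v.reshape(-1, 2).T.ravel()

def butterflies(v):
    """I_{N/2} (x) DFT_2: a butterfly on every adjacent pair."""
    pairs = v.reshape(-1, 2)
    return np.stack([pairs[:, 0] + pairs[:, 1], pairs[:, 0] - pairs[:, 1]], axis=1).ravel()

def bit_reverse(v):
    n = len(v)
    k = n.bit_length() - 1
    idx = np.arange(n)
    rev = np.zeros(n, dtype=int)
    for b in range(k):
        rev |= ((idx >> b) & 1) << (k - 1 - b)
    return v[rev]

def pease_fft(x):
    y = bit_reverse(np.asarray(x, dtype=complex))
    n = len(y)
    k = n.bit_length() - 1
    idx = np.arange(n)
    for stage in range(1, k + 1):
        half = 1 << (stage - 1)
        # Twiddles of the standard radix-2 stage, then moved into the constant geometry.
        tw = np.exp(-2j * np.pi * ((idx >> (stage - 1)) & 1) * (idx % half) / (2 * half))
        for _ in range(stage - 1):
            tw = stride2(tw)
        # Every stage has the same shape: scale, adjacent butterflies, same permutation.
        y = stride2(butterflies(tw * y))
    return y

x = np.random.default_rng(1).standard_normal(16)
y = pease_fft(x)
```

Because every stage has the same butterfly and permutation structure and only the twiddle constants change, one hardware stage can in principle be reused across iterations.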
![Page 24: SPIRAL DSP Transform Compilerinst.eecs.berkeley.edu/~cs294-88/sp13/lectures/SPIRAL... · 2013-04-08 · SPIRAL DSP Transform Compiler: Application Specific Hardware Synthesis Peter](https://reader033.fdocuments.in/reader033/viewer/2022050507/5f986ff825639807ee3abd24/html5/thumbnails/24.jpg)
Formally representing horizontal reuse
[Figure: three datapaths annotated with "hr": not horizontally reused, horizontally reused, partially horizontally reused]
![Page 25: SPIRAL DSP Transform Compilerinst.eecs.berkeley.edu/~cs294-88/sp13/lectures/SPIRAL... · 2013-04-08 · SPIRAL DSP Transform Compiler: Application Specific Hardware Synthesis Peter](https://reader033.fdocuments.in/reader033/viewer/2022050507/5f986ff825639807ee3abd24/html5/thumbnails/25.jpg)
Iterative Reuse of Logic
[Figure: datapath variants compared by cost and latency]
Fine-grained control over cost/latency tradeoff
![Page 26: SPIRAL DSP Transform Compilerinst.eecs.berkeley.edu/~cs294-88/sp13/lectures/SPIRAL... · 2013-04-08 · SPIRAL DSP Transform Compiler: Application Specific Hardware Synthesis Peter](https://reader033.fdocuments.in/reader033/viewer/2022050507/5f986ff825639807ee3abd24/html5/thumbnails/26.jpg)
Example: rewriting rules for streaming reuse
![Page 27: SPIRAL DSP Transform Compilerinst.eecs.berkeley.edu/~cs294-88/sp13/lectures/SPIRAL... · 2013-04-08 · SPIRAL DSP Transform Compiler: Application Specific Hardware Synthesis Peter](https://reader033.fdocuments.in/reader033/viewer/2022050507/5f986ff825639807ee3abd24/html5/thumbnails/27.jpg)
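Applicability to other transforms?
• DFT radix 2: $\mathrm{DFT}_{2^k} = R_{2^k} \prod_{i=0}^{k-1} T_i\,(I_{2^{k-1}} \otimes \mathrm{DFT}_2)\,L^{2^k}_{2^{k-1}}$
• DFT radix $2^r$: $\mathrm{DFT}_{2^k} = R_{2^k} \prod_{i=0}^{k/r-1} T_i\,(I_{2^{k-r}} \otimes \mathrm{DFT}_{2^r})\,L^{2^k}_{2^{k-r}}$
• 2-D DFT: $\mathrm{DFT}_{n \times n} = \prod_{i=0}^{1} L^{n^2}_{n}\,(I_n \otimes \mathrm{DFT}_n)$
• WHT: $\mathrm{WHT}_{2^k} = \prod_{i=0}^{k/r-1} (I_{2^{k-r}} \otimes \mathrm{WHT}_{2^r})\,L^{2^k}_{2^r}$
• DCT (type II): [formula not recoverable from the extraction]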
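Two of these regular factorizations have no twiddle factors and are easy to check numerically. The sketch below assumes NumPy, an unnormalized WHT, and the stride-permutation convention (L x)[i·(N/s) + j] = x[j·s + i]; these conventions and the function names are assumptions of the sketch.

```python
import numpy as np

def stride_perm(size, stride):
    src = np.arange(size).reshape(size // stride, stride).T.ravel()
    return np.eye(size)[src]

def wht(k):
    """WHT_{2^k} as the k-fold Kronecker power of [[1, 1], [1, -1]] (unnormalized)."""
    m = np.array([[1.0]])
    for _ in range(k):
        m = np.kron(m, np.array([[1.0, 1.0], [1.0, -1.0]]))
    return m

def streaming_wht(k, r):
    """Product of k/r identical stages (I (x) WHT_{2^r}) L."""
    assert k % r == 0
    stage = np.kron(np.eye(2 ** (k - r)), wht(r)) @ stride_perm(2 ** k, 2 ** r)
    return np.linalg.matrix_power(stage, k // r)

def dft_matrix(n):
    idx = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(idx, idx) / n)

def two_d_dft(n):
    """Two identical stages L (I_n (x) DFT_n) on a row-major n x n array."""
    stage = stride_perm(n * n, n) @ np.kron(np.eye(n), dft_matrix(n))
    return stage @ stage

wht_ok = all(np.allclose(streaming_wht(k, r), wht(k))
             for k, r in [(3, 1), (4, 1), (4, 2), (6, 3)])
dft2d_ok = np.allclose(two_d_dft(4), np.kron(dft_matrix(4), dft_matrix(4)))
```

In both cases the whole transform is one stage repeated, which is the property the streaming-reuse rewriting relies on.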
![Page 28: SPIRAL DSP Transform Compilerinst.eecs.berkeley.edu/~cs294-88/sp13/lectures/SPIRAL... · 2013-04-08 · SPIRAL DSP Transform Compiler: Application Specific Hardware Synthesis Peter](https://reader033.fdocuments.in/reader033/viewer/2022050507/5f986ff825639807ee3abd24/html5/thumbnails/28.jpg)
FPGA: Area vs. Throughput
[Figure: area vs. throughput of generated designs, with the Pareto-optimal designs marked; annotation: 49x slices, 132x throughput]
![Page 29: SPIRAL DSP Transform Compilerinst.eecs.berkeley.edu/~cs294-88/sp13/lectures/SPIRAL... · 2013-04-08 · SPIRAL DSP Transform Compiler: Application Specific Hardware Synthesis Peter](https://reader033.fdocuments.in/reader033/viewer/2022050507/5f986ff825639807ee3abd24/html5/thumbnails/29.jpg)
Outline
• SPIRAL Formula Framework
• SPIRAL for HW FFT cores
• SPIRAL for HW FFT “un”-core
![Page 30: SPIRAL DSP Transform Compilerinst.eecs.berkeley.edu/~cs294-88/sp13/lectures/SPIRAL... · 2013-04-08 · SPIRAL DSP Transform Compiler: Application Specific Hardware Synthesis Peter](https://reader033.fdocuments.in/reader033/viewer/2022050507/5f986ff825639807ee3abd24/html5/thumbnails/30.jpg)
A 2D-FFT Algorithm
• Row-column algorithm:
[Figure: the 2D-DFT computed as a row stage followed by a column stage over the dataset (logical abstraction of the 2D dataset)]
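A minimal software sketch of the row-column algorithm, assuming NumPy and using `np.fft.fft` as a stand-in for the 1D DFT kernel; the function name is illustrative.

```python
import numpy as np

def fft2_row_column(a):
    """2D DFT of an n x n array: 1D DFTs over every row, then over every column."""
    a = np.asarray(a, dtype=complex)
    row_stage = np.array([np.fft.fft(row) for row in a])               # row stage
    col_stage = np.array([np.fft.fft(col) for col in row_stage.T]).T   # column stage
    return col_stage

a = np.random.default_rng(2).standard_normal((8, 8))
result = fft2_row_column(a)
```

The row stage walks the array along its storage order, while the column stage walks across it, which becomes the central issue once the dataset lives in off-chip memory.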
![Page 31: SPIRAL DSP Transform Compilerinst.eecs.berkeley.edu/~cs294-88/sp13/lectures/SPIRAL... · 2013-04-08 · SPIRAL DSP Transform Compiler: Application Specific Hardware Synthesis Peter](https://reader033.fdocuments.in/reader033/viewer/2022050507/5f986ff825639807ee3abd24/html5/thumbnails/31.jpg)
Off-chip Data Sets
[Figure: off-chip DRAM connected to on-chip SRAM, which feeds a DFT_n kernel]
• Need to balance
  – kernel processing bandwidth
  – off-chip memory bandwidth
  – on-chip storage capacity
![Page 32: SPIRAL DSP Transform Compilerinst.eecs.berkeley.edu/~cs294-88/sp13/lectures/SPIRAL... · 2013-04-08 · SPIRAL DSP Transform Compiler: Application Specific Hardware Synthesis Peter](https://reader033.fdocuments.in/reader033/viewer/2022050507/5f986ff825639807ee3abd24/html5/thumbnails/32.jpg)
Inefficient DRAM Access Patterns
• Row-wise traversal -> sequential accesses
• Column-wise traversal -> large strided accesses
[Figure: a row-major n × n 2D array mapped onto the linear memory space (addresses 0 to n^2), divided into row-buffer-sized regions]
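The difference between the two traversals can be illustrated with a deliberately simple model: a single DRAM bank with one open row, where any access outside the open row counts as a miss. The sizes (a 64 × 64 array and a 256-element row buffer) and the function names are hypothetical choices for this sketch.

```python
def row_buffer_misses(addresses, row_buffer_size):
    """Count accesses that fall outside the currently open DRAM row.
    Simplified model: one bank, one open row, addresses counted in elements."""
    misses, open_row = 0, None
    for addr in addresses:
        dram_row = addr // row_buffer_size
        if dram_row != open_row:
            misses += 1
            open_row = dram_row
    return misses

def row_wise(n):
    """Addresses touched when scanning a row-major n x n array row by row."""
    return [r * n + c for r in range(n) for c in range(n)]

def column_wise(n):
    """Addresses touched when scanning the same array column by column."""
    return [r * n + c for c in range(n) for r in range(n)]

n, row_buffer = 64, 256          # hypothetical sizes
sequential = row_buffer_misses(row_wise(n), row_buffer)
strided = row_buffer_misses(column_wise(n), row_buffer)
# sequential == 16, strided == 1024
```

In this model the column-wise scan opens a new DRAM row 64 times more often than the row-wise scan for the same amount of data.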
![Page 33: SPIRAL DSP Transform Compilerinst.eecs.berkeley.edu/~cs294-88/sp13/lectures/SPIRAL... · 2013-04-08 · SPIRAL DSP Transform Compiler: Application Specific Hardware Synthesis Peter](https://reader033.fdocuments.in/reader033/viewer/2022050507/5f986ff825639807ee3abd24/html5/thumbnails/33.jpg)
How to Optimize the Access Patterns
[Figure: row-major "blocked" layout: the n × n array is stored in the linear memory space (addresses 0 to n^2) as k × k tiles of k^2 elements each, so that data is accessed in row-buffer-sized chunks]
[Akin, et al., FCCM 2012]
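A sketch of the blocked layout under the same simplified single-bank model (one open DRAM row; a miss whenever an access leaves it). It assumes k × k tiles with k² equal to the row-buffer size, tiles stored in row-major order; the address mapping and names are illustrative assumptions, not the layout from the cited paper.

```python
def row_buffer_misses(addresses, row_buffer_size):
    """Count accesses that fall outside the currently open DRAM row (one-bank model)."""
    misses, open_row = 0, None
    for addr in addresses:
        dram_row = addr // row_buffer_size
        if dram_row != open_row:
            misses += 1
            open_row = dram_row
    return misses

def tiled_address(r, c, n, k):
    """Address of element (r, c) when the n x n array is stored as k x k tiles,
    tiles in row-major order and elements row-major inside each tile."""
    tiles_per_row = n // k
    tile = (r // k) * tiles_per_row + (c // k)
    return tile * k * k + (r % k) * k + (c % k)

def column_wise_tiles(n, k):
    """Visit tiles column by column, reading each tile completely before moving on."""
    order = []
    for tc in range(n // k):
        for tr in range(n // k):
            for r in range(tr * k, (tr + 1) * k):
                for c in range(tc * k, (tc + 1) * k):
                    order.append(tiled_address(r, c, n, k))
    return order

n, k = 64, 16                      # hypothetical sizes: k*k = 256 = row-buffer size
addresses = column_wise_tiles(n, k)
misses = row_buffer_misses(addresses, k * k)
# misses == 16
```

With each tile filling exactly one DRAM row, a column-direction traversal in this model costs no more row openings than a row-direction one.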
![Page 34: SPIRAL DSP Transform Compilerinst.eecs.berkeley.edu/~cs294-88/sp13/lectures/SPIRAL... · 2013-04-08 · SPIRAL DSP Transform Compiler: Application Specific Hardware Synthesis Peter](https://reader033.fdocuments.in/reader033/viewer/2022050507/5f986ff825639807ee3abd24/html5/thumbnails/34.jpg)
Design Generator w/ Tensor Formalism
• 2D row-column algorithm: row stage, column stage
• symmetric algorithm
• symmetric algorithm with tiling:
  – read tiles row-wise
  – linearize on-chip
  – FFT processing
  – transpose and re-tile on-chip
  – write tiles column-wise
[Formulas for the three 2D-DFT variants are not recoverable from the extraction]
[Akin, et al., FCCM 2012]
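One way to read the symmetric algorithm is as two identical passes, each doing row-direction FFTs followed by a transpose, so that both passes read memory in the same direction. The sketch below assumes NumPy and that interpretation; it leaves out the tiling steps and is not the generator's actual output.

```python
import numpy as np

def fft2_symmetric(a):
    """2D DFT as two identical passes: 1D FFTs along rows, then a transpose."""
    a = np.asarray(a, dtype=complex)
    for _ in range(2):
        a = np.fft.fft(a, axis=1).T
    return a

a = np.random.default_rng(3).standard_normal((8, 8))
result = fft2_symmetric(a)
```

The transpose is where the on-chip re-tiling would take place in a hardware design, so that every off-chip access stays in row-buffer-sized chunks.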
![Page 35: SPIRAL DSP Transform Compilerinst.eecs.berkeley.edu/~cs294-88/sp13/lectures/SPIRAL... · 2013-04-08 · SPIRAL DSP Transform Compiler: Application Specific Hardware Synthesis Peter](https://reader033.fdocuments.in/reader033/viewer/2022050507/5f986ff825639807ee3abd24/html5/thumbnails/35.jpg)
2D-FFT (double) Raw Performance
[Figure: raw performance vs. problem size]
[Akin, et al., FCCM 2012]
![Page 36: SPIRAL DSP Transform Compilerinst.eecs.berkeley.edu/~cs294-88/sp13/lectures/SPIRAL... · 2013-04-08 · SPIRAL DSP Transform Compiler: Application Specific Hardware Synthesis Peter](https://reader033.fdocuments.in/reader033/viewer/2022050507/5f986ff825639807ee3abd24/html5/thumbnails/36.jpg)
2D-FFT (double) BW Efficiency
[Figure: bandwidth efficiency vs. problem size]
[Akin, et al., FCCM 2012]
![Page 37: SPIRAL DSP Transform Compilerinst.eecs.berkeley.edu/~cs294-88/sp13/lectures/SPIRAL... · 2013-04-08 · SPIRAL DSP Transform Compiler: Application Specific Hardware Synthesis Peter](https://reader033.fdocuments.in/reader033/viewer/2022050507/5f986ff825639807ee3abd24/html5/thumbnails/37.jpg)
2D-FFT (double) Power Efficiency
[Figure: power efficiency vs. problem size]
[Akin, et al., FCCM 2012]
![Page 38: SPIRAL DSP Transform Compilerinst.eecs.berkeley.edu/~cs294-88/sp13/lectures/SPIRAL... · 2013-04-08 · SPIRAL DSP Transform Compiler: Application Specific Hardware Synthesis Peter](https://reader033.fdocuments.in/reader033/viewer/2022050507/5f986ff825639807ee3abd24/html5/thumbnails/38.jpg)
Conclusions
• Encapsulating domain knowledge in a domain-specific tool for high-level design automation
• SPIRAL
  – mathematical approach to DSP transform implementation (cores and “un”-core)
  – generalizable to other linear DSP transforms
  – as good as best expert designer
• Thank you