Overlays: a solution paradigm for FPGA high-level design?
Tarek S. Abdelrahman
The Edward S. Rogers Department of Electrical and Computer Engineering
University of Toronto
Reconfigurable Systems on the Rise
• FPGAs are increasingly integrated in computing systems
  – Massive parallelism can lead to high performance
  – Lower power
  – Customizability
• A newer generation of high-performance systems integrates FPGAs with multicores, targeting data centers
  – Example systems from Intel, IBM and Xilinx
  – Used mainly by software developers
16-07-11 2
FPGA Programmability Burdens
• FPGAs are programmed using a hardware design abstraction, which is foreign to the bulk of software developers
  – HDL, timing, fitting, seed sweeps, etc.
• FPGA development tools lead to extremely long development cycles compared to their software counterparts
  – A large circuit can take days to compile (synthesis, place, route, timing, etc.) and may need several compiles
• There is a pressing need to alleviate these burdens and make FPGA design accessible to software developers
Tackling the Burden
• High-Level Synthesis (HLS)
  – Generated hardware is increasingly competitive with HDL design
• High-level programming models
  – Dataflow model from Maxeler
• Nonetheless:
  – The developer remains exposed to various aspects of hardware design
  – Use of FPGA design tools is still required! ⇒ long development cycles
Overlays
• Pre-compiled FPGA circuits that are in themselves configurable/programmable, i.e., run-time configurable
  – Examples: soft processors, GPU-on-FPGA, mesh-of-FUs, etc.
[Figure: Soft Processor; a 3x3 array of PEs. Source: Andryc et al.: FlexGrip: A Soft GPGPU for FPGAs, FPT13]
FPGA vs. Overlay Design Flows
• FPGA flow (harder): Application (HDL) → FPGA Design Tools (hours/days) → bitstream → FPGA
• Overlay flow (simpler): Application (C, CUDA, DFG, etc.) → Application-to-Overlay Tools (seconds) → Configuration Stream (µseconds) → pre-compiled overlay on the FPGA
Mesh-of-FUs Overlays [FPL2013]
[Figure: a 4-NN connected array of cells, each cell consisting of a function unit (e.g., ADD, SUB, MUL, DIV, EXP, SHF) and routing logic; next to it, a Data Flow Graph with inputs I1–I6, five operation nodes A–E (ADD, SUB, SUB, MUL, DIV) and outputs O1, O2]
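To make the structure concrete, here is a minimal Python model of a mesh-of-FUs overlay and a DFG. The 3x3 functional layout is the one that appears on the routing slide; the DFG wiring (which inputs feed which node) is not recoverable from the slide, so the edge list is a hypothetical example built from the same five operations. The names `MESH`, `DFG`, `neighbours` and `fits` are illustrative only, not part of the FPL2013 toolchain.

```python
from collections import Counter

# 3x3 functional layout of the example overlay: one FU type per cell.
MESH = [
    ["ADD", "SUB", "SUB"],
    ["ADD", "MUL", "ADD"],
    ["DIV", "EXP", "SHF"],
]

# Hypothetical DFG: node -> (operation, operands). Operands are primary
# inputs (I1..I6) or other DFG nodes. Only the operation types come from
# the slide; the wiring is assumed for illustration.
DFG = {
    "A": ("ADD", ["I1", "I2"]),
    "B": ("SUB", ["I3", "I4"]),
    "C": ("SUB", ["I5", "I6"]),
    "D": ("MUL", ["A", "B"]),
    "E": ("DIV", ["D", "C"]),
}
OUTPUTS = {"O1": "D", "O2": "E"}  # assumed: O1 taps D, O2 taps E


def neighbours(r, c, rows=len(MESH), cols=len(MESH[0])):
    """In-bounds 4-nearest-neighbour (4-NN) cells of cell (r, c)."""
    candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [(i, j) for i, j in candidates if 0 <= i < rows and 0 <= j < cols]


def fits(dfg, mesh):
    """True if the mesh has at least as many FUs of each type as the DFG needs."""
    need = Counter(op for op, _ in dfg.values())
    have = Counter(fu for row in mesh for fu in row)
    return all(have[op] >= n for op, n in need.items())


print(fits(DFG, MESH))  # True
```

A corner cell has only two neighbours and an edge cell three, which is one reason placement matters on a 4-NN mesh.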
Mapping DFGs to Overlay – Place
[Figure: the DFG nodes A–E are assigned to mesh cells whose function units match their operations, and the inputs I1–I6 and outputs O1, O2 are assigned to positions around the mesh]
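Placement assigns each DFG node to a cell whose FU implements that node's operation. The sketch below assumes a hypothetical DFG wiring (only the operation types come from the slides) and uses total Manhattan wirelength between connected nodes as the cost; it simply enumerates every legal assignment, which is only feasible for toy meshes. The actual FPL2013 placer may use a different algorithm and cost function.

```python
from itertools import product

MESH = [
    ["ADD", "SUB", "SUB"],
    ["ADD", "MUL", "ADD"],
    ["DIV", "EXP", "SHF"],
]
# Hypothetical wiring: node -> (operation, operands); I1..I6 are primary inputs.
DFG = {
    "A": ("ADD", ["I1", "I2"]),
    "B": ("SUB", ["I3", "I4"]),
    "C": ("SUB", ["I5", "I6"]),
    "D": ("MUL", ["A", "B"]),
    "E": ("DIV", ["D", "C"]),
}


def cells_for(op, mesh):
    """All cells whose FU implements `op`."""
    return [(r, c) for r, row in enumerate(mesh) for c, fu in enumerate(row) if fu == op]


def wirelength(place, dfg):
    """Sum of Manhattan distances over node-to-node DFG edges (primary inputs ignored)."""
    total = 0
    for node, (_, operands) in dfg.items():
        for src in operands:
            if src in place:
                (r1, c1), (r2, c2) = place[src], place[node]
                total += abs(r1 - r2) + abs(c1 - c2)
    return total


def place_exhaustive(dfg, mesh):
    """Try every assignment of DFG nodes to matching, distinct cells; keep the cheapest."""
    nodes = list(dfg)
    options = [cells_for(dfg[n][0], mesh) for n in nodes]
    best, best_cost = None, float("inf")
    for choice in product(*options):
        if len(set(choice)) < len(choice):  # two nodes on one cell: illegal
            continue
        place = dict(zip(nodes, choice))
        cost = wirelength(place, dfg)
        if cost < best_cost:
            best, best_cost = place, cost
    return best, best_cost


placement, cost = place_exhaustive(DFG, MESH)
print(placement, cost)
```

With only one MUL and one DIV cell, D and E are forced; the search only has to choose among the ADD and SUB cells for A, B and C.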
Mapping DFGs to Overlay – Route
[Figure: the placed DFG on the 3x3 mesh (rows: ADD SUB SUB / ADD MUL ADD / DIV EXP SHF), with the DFG edges routed between cells through the routing logic and pipeline registers/FIFOs on the routes]
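Routing then connects each pair of placed nodes through the routing logic of intermediate cells. A minimal sketch, assuming a route may pass through any cell not marked as `blocked` and that each hop adds one pipeline register (both assumptions for illustration), is a breadth-first search on the 4-NN grid:

```python
from collections import deque


def route(src, dst, rows, cols, blocked=frozenset()):
    """Shortest 4-NN path from src to dst (both included), found by BFS.

    `blocked` holds cells whose routing resources are assumed to be used up.
    Returns a list of (row, col) cells, or None if dst is unreachable.
    """
    prev = {src: None}
    queue = deque([src])
    while queue:
        cur = queue.popleft()
        if cur == dst:
            path = []
            while cur is not None:  # walk predecessors back to src
                path.append(cur)
                cur = prev[cur]
            return path[::-1]
        r, c = cur
        for nxt in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            i, j = nxt
            if 0 <= i < rows and 0 <= j < cols and nxt not in prev and nxt not in blocked:
                prev[nxt] = cur
                queue.append(nxt)
    return None


path = route((0, 2), (2, 0), 3, 3)
hops = len(path) - 1  # pipeline registers on this route, under the one-per-hop assumption
print(path, hops)
```

A production router would route all edges together and negotiate for shared switch resources; routing edges one at a time, as here, can fail where a joint solution exists.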
Pipelined Execution
[Figure: the mapped DFG on the mesh with pipeline registers/FIFOs along its routes, shown alongside copies of the Data Flow Graph]
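The benefit of pipelining can be quantified with simple cycle counts. This sketch assumes every FU takes one cycle, one pipeline stage per DFG level, routes balanced by the pipeline registers/FIFOs, and one new loop iteration entering per cycle (initiation interval `ii` = 1); real floating-point FUs have multi-cycle latencies. The DFG wiring is again hypothetical.

```python
from functools import lru_cache

# Hypothetical DFG (only the operation types come from the slides).
DFG = {
    "A": ("ADD", ["I1", "I2"]),
    "B": ("SUB", ["I3", "I4"]),
    "C": ("SUB", ["I5", "I6"]),
    "D": ("MUL", ["A", "B"]),
    "E": ("DIV", ["D", "C"]),
}


def dfg_depth(dfg):
    """Number of DFG nodes on the longest input-to-output path."""

    @lru_cache(maxsize=None)
    def level(node):
        preds = [s for s in dfg[node][1] if s in dfg]  # ignore primary inputs
        return 1 + max((level(p) for p in preds), default=0)

    return max(level(n) for n in dfg)


def pipelined_cycles(n_iters, depth, ii=1):
    """Cycles to finish n_iters iterations when a new one enters every `ii` cycles."""
    return depth + (n_iters - 1) * ii


def sequential_cycles(n_iters, depth):
    """Cycles if each iteration must drain before the next one starts."""
    return n_iters * depth


d = dfg_depth(DFG)
print(d, pipelined_cycles(1000, d), sequential_cycles(1000, d))  # 3 1002 3000
```

Once the pipeline is full, one result emerges per cycle regardless of DFG depth, so throughput is set by the clock rate and the initiation interval rather than by the length of the dataflow chain.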
Mesh-of-FUs Tools
• Application-to-overlay toolchain that:
  – Extracts the DFG of the bodies of parallel loops in C code
  – Places and routes the DFG nodes onto the overlay
  – Configures the switches to establish DFG connectivity
  – Generates the configuration stream of the overlay
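The first step of the toolchain, extracting a DFG from a loop body, can be illustrated with Python's standard `ast` module standing in for a C front end. The real toolchain parses C, so this substitution is an assumption of the sketch, as is the node naming `n1`, `n2`, .... It handles only straight-line assignments built from `+`, `-`, `*` and `/`.

```python
import ast
import itertools

OPS = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL", ast.Div: "DIV"}


def extract_dfg(body_src):
    """Return (dfg, outputs) for a straight-line loop body.

    dfg maps node name -> (operation, [lhs, rhs]); operands are other nodes
    or loop inputs (free variable names). outputs maps each assigned
    variable to the node that produces it.
    """
    dfg, env, outputs = {}, {}, {}
    ids = itertools.count(1)

    def visit(expr):
        if isinstance(expr, ast.Name):
            return env.get(expr.id, expr.id)  # earlier result, or a loop input
        if isinstance(expr, ast.BinOp) and type(expr.op) in OPS:
            lhs, rhs = visit(expr.left), visit(expr.right)
            name = f"n{next(ids)}"
            dfg[name] = (OPS[type(expr.op)], [lhs, rhs])
            return name
        raise ValueError(f"unsupported construct: {ast.dump(expr)}")

    for stmt in ast.parse(body_src).body:
        if not (isinstance(stmt, ast.Assign) and len(stmt.targets) == 1
                and isinstance(stmt.targets[0], ast.Name)):
            raise ValueError("only simple assignments are supported")
        target = stmt.targets[0].id
        env[target] = outputs[target] = visit(stmt.value)
    return dfg, outputs


dfg, outputs = extract_dfg("o1 = (i1 + i2) * (i3 - i4)\no2 = o1 / (i5 - i6)")
for node, (op, args) in dfg.items():
    print(node, op, args)
print(outputs)
```

Because `o1` is looked up in `env`, the second statement consumes node `n3` directly instead of re-reading a variable, which is exactly the value forwarding that the overlay's routing implements in hardware.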
High Performance with no Hardware Design

| Benchmark | DFG size (nodes) | Overlay GFLOPS | Overlay compile time (sec) | HDL GFLOPS | HDL compile time (sec) |
|---|---|---|---|---|---|
| n-Body | 125 | 18.72 | 0.44 | 21.52 | 2724 |
| BlackScholes | 131 | 21.22 | 1.33 | 22.10 | 2508 |
| MatMul | 96 | 19.66 | 1.05 | 25.21 | 2045 |
| MatMulAdd | 114 | 22.46 | 3.80 | 28.79 | 919 |

• Example mesh-of-FUs overlay on a Stratix IV [FPL2013]
  – Single-precision floating-point operations
  – 288 cells implemented as an 18x16 mesh
  – fMAX of 312 MHz and 32.4 GFLOPS peak (integer at 415 MHz)
• Others also report high-performance results
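The trade-off in the table is easier to see as ratios. This short computation uses only the numbers in the table; by this arithmetic the overlay attains about 78% to 96% of the HDL throughput while compiling roughly 240x to 6,200x faster.

```python
# Values copied from the table.
# benchmark: (overlay GFLOPS, overlay compile s, HDL GFLOPS, HDL compile s)
RESULTS = {
    "n-Body": (18.72, 0.44, 21.52, 2724),
    "BlackScholes": (21.22, 1.33, 22.10, 2508),
    "MatMul": (19.66, 1.05, 25.21, 2045),
    "MatMulAdd": (22.46, 3.80, 28.79, 919),
}


def ratios(results):
    """Per benchmark: (overlay/HDL throughput, HDL/overlay compile time)."""
    return {
        name: (ov_gf / hdl_gf, hdl_s / ov_s)
        for name, (ov_gf, ov_s, hdl_gf, hdl_s) in results.items()
    }


for name, (perf, compile_speedup) in ratios(RESULTS).items():
    print(f"{name:13s} throughput {perf:5.0%} of HDL, compiles {compile_speedup:6.0f}x faster")
```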
Software-Friendly Target
• Overlays raise the level of abstraction of using FPGAs to one that is more familiar to software designers
  – C programming for a soft processor
  – CUDA/OpenCL for GPU overlays
  – Dataflow graphs for mesh-of-FUs
• This opens up opportunities for "standard" software tools to target FPGAs
JIT Compilation to Hardware
• Profile code
        ...
        ADD  R9,R7,R10
        BEQZ end
    L1: ADD  R1,R3,R7
        MULT R11,R12,R13
        ADD  R8,R1,R11
        SUB  R9,R8,#8
        SLT  R8,R9,R7
        BNZ  R8,L1
        ADD  R7,R6,R1
        ...
[Figure: the CPU running this code, alongside an FPGA overlay made of an array of FUs]
JIT Compilation to Hardware
• Identify hot segments of code
[Figure: the same code on the CPU alongside the FPGA overlay; the loop starting at L1 is the hot segment]
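One common way to profile and identify hot segments, assumed here for illustration (the slides do not say how the prototype profiles), is to count taken backward branches: the target of a backward branch is a loop head, and heads taken at least a threshold number of times are hot. In the synthetic trace, addresses are instruction indices in the slide listing, counting from 0 at `ADD R9,R7,R10` and ignoring elided code, so `BNZ R8,L1` at index 7 branches back to `L1` at index 2. The threshold and the other addresses are arbitrary.

```python
from collections import Counter


def hot_loops(taken_branches, threshold):
    """Return loop-head addresses whose backward branches were taken >= threshold times.

    taken_branches: iterable of (branch_pc, target_pc) pairs, one per taken branch.
    A branch is backward when its target does not lie after the branch itself.
    """
    counts = Counter(target for pc, target in taken_branches if target <= pc)
    return sorted(head for head, n in counts.items() if n >= threshold)


# Synthetic trace: BNZ at pc 7 jumps back to L1 (pc 2) 500 times, one forward
# branch (BEQZ at pc 1 to a hypothetical `end` at pc 9), and a cold loop at pc 15.
trace = [(7, 2)] * 500 + [(1, 9)] + [(20, 15)] * 3
print(hot_loops(trace, threshold=100))  # [2]
```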
JIT Compilation to Hardware
• Extract DFG and configure the overlay
[Figure: the DFG of the loop body (ADD, MULT, ADD, SUB, SLT) is extracted from the CPU code and mapped onto the FUs of the FPGA overlay]
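For a straight-line loop body, the DFG can be recovered from register def-use chains: an edge runs from the instruction that last wrote a register to each later instruction that reads it. This sketch assumes the three-operand `OP Rd,Rs,Rt` syntax of the slide listing and yields the five operations shown in the figure, with ADD and MULT feeding the second ADD, which feeds SUB, which feeds SLT. It deliberately ignores loop-carried dependences and the closing branch, which a real JIT would also have to handle.

```python
# Straight-line body of the loop at L1 (the closing BNZ is treated as
# control, not as a DFG node, in this sketch).
BODY = [
    "ADD R1,R3,R7",
    "MULT R11,R12,R13",
    "ADD R8,R1,R11",
    "SUB R9,R8,#8",
    "SLT R8,R9,R7",
]


def dfg_from_block(instrs):
    """Return (ops, edges): ops[i] is the opcode of instruction i, and each
    edge (p, c) means instruction c reads a register last written by p.
    Immediates (e.g. #8) and registers live on entry have no producer."""
    last_writer = {}
    ops, edges = [], set()
    for i, ins in enumerate(instrs):
        opcode, operands = ins.split(None, 1)
        dst, *srcs = [x.strip() for x in operands.split(",")]
        for src in srcs:  # read sources before recording the new definition
            if src in last_writer:
                edges.add((last_writer[src], i))
        ops.append(opcode)
        last_writer[dst] = i
    return ops, edges


ops, edges = dfg_from_block(BODY)
print(ops)            # ['ADD', 'MULT', 'ADD', 'SUB', 'SLT']
print(sorted(edges))  # [(0, 2), (1, 2), (2, 3), (3, 4)]
```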
JIT Compilation to Hardware
• Re-write the code
        ...
        ADD  R9,R7,R10
        BEQZ end
    L1: ADD  R7,R6,R1
        ...
[Figure: the loop body no longer appears in the CPU code; its DFG (ADD, MULT, ADD, SUB, SLT) is configured on the FPGA overlay]
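The rewrite step can be sketched as a transformation on the instruction list: the loop, from its head label to the backward branch that closes it, is replaced by a single call into the overlay. The slides do not show the replacement instruction, so `OVL_EXEC` is a hypothetical pseudo-instruction, and the (label, instruction) list representation is likewise an assumption of this sketch.

```python
# (label, instruction) pairs for the listing on the slides (elided code omitted).
PROGRAM = [
    (None, "ADD R9,R7,R10"),
    (None, "BEQZ end"),
    ("L1", "ADD R1,R3,R7"),
    (None, "MULT R11,R12,R13"),
    (None, "ADD R8,R1,R11"),
    (None, "SUB R9,R8,#8"),
    (None, "SLT R8,R9,R7"),
    (None, "BNZ R8,L1"),
    (None, "ADD R7,R6,R1"),
]

BRANCHES = {"BNZ", "BEQZ"}


def find_loop(program, head):
    """Return (start, end) indices of the loop from label `head` to its closing branch."""
    start = next(i for i, (label, _) in enumerate(program) if label == head)
    for j in range(start, len(program)):
        opcode, _, operands = program[j][1].partition(" ")
        if opcode in BRANCHES and operands.split(",")[-1] == head:
            return start, j
    raise ValueError(f"no branch back to {head}")


def rewrite_loop(program, head, call="OVL_EXEC"):
    """Replace the whole loop with one (hypothetical) overlay-invocation instruction."""
    start, end = find_loop(program, head)
    return program[:start] + [(head, f"{call} {head}")] + program[end + 1:]


for label, ins in rewrite_loop(PROGRAM, "L1"):
    print(f"{label + ':' if label else '':4s}{ins}")
```

Keeping the label on the replacement means any other branch into `L1` still lands on the overlay call, so the rest of the program needs no changes.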
JIT Compilation to Hardware
• Transfer execution to the overlay
[Figure: the rewritten code on the CPU; the loop now executes on the FPGA overlay]
User-Transparent Dynamic Program Acceleration
A Prototype JIT Compiler
• Target: Intel QuickAssist Platform
• The compiler prototype:
  – Built around LLVM, targets innermost loops of scientific code
  – Mitigates run-time overhead by moving much of it to compile time
• Overlay currently being integrated into the target platform
[Figure (after Intel literature): a Xeon multicore processor (CPUs) and a Stratix FPGA (QPI IP, AFU) connected by the QPI coherent interconnect, with system memory]
Acceleration Potential
[Figure: speedup (0 to 7) per application; FPGA simulation results based on measured system parameters]
Customizability
• One of the key advantages of FPGAs is that they can be customized for applications
• Overlays can also be "user" customizable
  – With minimal usage of FPGA design tools
• In the context of our mesh-of-FUs, we can vary the choice of the FU at each location of the mesh, i.e., the functional layout, to make the overlay more efficient for an application
A Library-Based Approach
[Figure: a Desired Overlay (an 8x4 functional layout whose rows repeat A M D D / A M S S / A M A M / A M A M) is stitched together from a Library of Pre-Placed and Pre-Routed Overlays (small groups of cells such as M M, A A, D D, S S and A M) to form the Stitched Overlay]
• A bottom-up flow allows (restricted) relocation of pre-placed and pre-routed groups of cells [FPL2014]
• Example 12x15 overlay: 35 minutes vs. 15 hours
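The stitching idea can be sketched as decomposing the desired functional layout into groups that already exist in the library. The sketch assumes the library holds horizontal two-cell groups (the figure's exact group shapes are not recoverable) and treats the letters A, M, D and S only as FU-type labels; `stitch`, `DESIRED` and `LIBRARY` are hypothetical names.

```python
from collections import Counter

DESIRED = ["AMDD", "AMSS", "AMAM", "AMAM"] * 2  # 8x4 layout from the slide
LIBRARY = {"MM", "AA", "DD", "SS", "AM"}         # assumed 1x2 pre-placed, pre-routed groups


def stitch(layout, library, width=2):
    """Return a plan of (group, (row, col)) placements covering `layout`.

    Raises KeyError if some part of the layout has no pre-compiled group.
    """
    plan = []
    for r, row in enumerate(layout):
        if len(row) % width:
            raise ValueError(f"row {r} is not a multiple of the group width")
        for c in range(0, len(row), width):
            group = row[c:c + width]
            if group not in library:
                raise KeyError(f"no pre-compiled group for {group!r} at ({r},{c})")
            plan.append((group, (r, c)))
    return plan


plan = stitch(DESIRED, LIBRARY)
print(Counter(g for g, _ in plan))
```

Only the relocation and stitching of already-placed-and-routed groups is left for the CAD tools, which is where the saving comes from; the slide's 12x15 example (35 minutes vs. 15 hours) amounts to roughly a 26x reduction in build time.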
Program Analysis for Customization
[Figure: a Program DFG (nodes A, M, S, D) feeds a Program Analysis step that produces Candidate Overlays (functional layouts such as A M D D / A M S S / A M A M / A M A M); marked as work to be done]
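Since the slide marks this analysis as work to be done, the following is only a hypothetical criterion: a candidate overlay must have at least as many FUs of each type as the program DFG uses, and among those that fit, the one with the fewest idle cells is preferred. A real analysis would also have to consider placement and routability; the candidate names and layouts are invented for illustration.

```python
from collections import Counter


def idle_cells(layout, need):
    """Cells left unused when hosting a DFG with op demand `need`, or None if it does not fit."""
    have = Counter("".join(layout))
    if any(have[op] < n for op, n in need.items()):
        return None
    return sum(have.values()) - sum(need.values())


def best_candidate(candidates, dfg_ops):
    """Pick the candidate layout that fits the DFG with the fewest idle cells."""
    need = Counter(dfg_ops)
    fitting = []
    for name, layout in candidates.items():
        idle = idle_cells(layout, need)
        if idle is not None:
            fitting.append((idle, name))
    return min(fitting)[1] if fitting else None


CANDIDATES = {
    "uniform": ["AMAM"] * 4,
    "mixed": ["AMDD", "AMSS", "AMAM", "AMAM"],
    "tight": ["AMDS", "AMSS"],
}
print(best_candidate(CANDIDATES, "AAMMSD"))  # tight
```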
System Integration
• Must be able to virtualize the FPGA
  – Take snapshots
  – Migrate
  – Share and manage as a resource
[Figure: applications running on Spark, Hadoop, GraphLab and TensorFlow, hosted in VMs over nodes that each contain CPUs, GPUs and FPGAs]
Overlays Facilitate Virtualization
• FPGA virtualization is only now being explored
  – Requires specialized hardware
  – A very large "state"
• Overlays naturally have a much smaller state, facilitating snapshots and context switching
  – We would like to explore this support in our mesh-of-FUs overlay
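The point that overlay state is small can be illustrated with a toy model in which an overlay's entire state is its configuration stream plus the contents of its pipeline registers/FIFOs. The class names, the 32-bit register width and the sizes used are hypothetical, since the slide gives no figures; a real mesh-of-FUs implementation would also have to drain or capture in-flight data inside the FUs.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class OverlayState:
    config: bytes                                  # configuration stream (FU ops + switch settings)
    registers: list = field(default_factory=list)  # pipeline register / FIFO contents


class Overlay:
    def __init__(self, config, n_regs):
        self.state = OverlayState(config, [0] * n_regs)

    def snapshot(self):
        """Capture the full overlay state (configuration + in-flight data)."""
        return copy.deepcopy(self.state)

    def restore(self, snap):
        """Reload a captured state, e.g. after a context switch or migration."""
        self.state = copy.deepcopy(snap)

    def state_bits(self, reg_width=32):
        """Size of the captured state in bits, assuming `reg_width`-bit registers."""
        return 8 * len(self.state.config) + reg_width * len(self.state.registers)


ovl = Overlay(config=bytes(10), n_regs=4)
ovl.state.registers[0] = 7
saved = ovl.snapshot()                        # tenant A is switched out
ovl.state = OverlayState(bytes(10), [0] * 4)  # tenant B uses the overlay
ovl.restore(saved)                            # tenant A is switched back in
print(ovl.state.registers[0], ovl.state_bits())  # 7 208
```

By contrast, snapshotting a raw FPGA design would mean capturing every flip-flop and memory bit the design uses, which is the "very large state" the slide refers to.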
Challenges to Overlays
• Resource overhead
  – That is, the FPGA resources used by the overlay compared to a dedicated circuit (HDL) that implements the same application
  – ~4X for our FP overlay, and can be higher
  – Difficult to quantify design effort
  – FPGAs are increasing in size
  – Hard floating-point units
  – Hardening the overlay once design is over?
Challenges to Overlays – Cont'd
• Overlay architectures need more exploration: which architecture for a given application domain?
  – How to ensure scalability?
  – Taking into account the underlying FPGA device constraints
  – How to implement well (e.g., data-driven execution, FIFOs, etc.)?
  – Fixed-function vs. multi-function FUs?
  – How to reduce resource overhead?
  – Time multiplexed?
  – Multiple devices?
Challenges to Overlays – Cont'd
• Evolving the FPGA design tools
  – Modular architectures do not lead to modular circuits
    • The tools do not understand the modularity
    • At present we must "fight with them" [FPL2014]
  – The tools must evolve to allow developers to express, and the tools to recognize, the modularity of the architecture
    • Scalable circuits from scalable architectures
Concluding Remarks
• A case for overlays
  – Performance, software-friendliness, customizability and system integration
• They can serve as a "middle ground" between hardware design and software programming
  – Either for production or for debugging and prototyping
• Challenges to architecture, programming models, implementation and resource overhead
Questions?