An FPGA Design And Implementation Framework Combined With Commercial VLSI CADs
ReCoSoC 2013
Qian Zhao・Motoki Amagasaki Masahiro Iida・Morihiro Kuga・Toshinori Sueyoshi
(Kumamoto University, Japan)
Kumamoto University Background
• FPGA IP core development – Make SoC programmable – Requirements:
• Synthesizable • Architecture exploration • Scalability • Other features (3D, dependability,etc)
• FPGA IP design framework is required
– Efficiently evaluate & implement various arch. – Connect to commercial VLSI design flow
1
MCU Mem.
ADC FPGA
IP
System-On-a-Chip
Traditional academic flows
are not applicable
Kumamoto University
• Goal: Develop an FPGA IP design framework • Method: Develop novel routing tool EasyRouter • New features:
– Implement new architectures easily • C#(Mono) Platform (Linux/Windows) • C# script based Arch. definition
– Arch. Exploration -> HDL & Bitstream generation • Same arch. exploration functions as traditional tools • Generate HDL & BitStream with simple templates • Performance analysis with commercial VLSI flow
Goal of this research
2
Kumamoto University Compare with VPR
• VPR is the main academic placement & routing tool
• Only support Island style FPGA • Arch. is defined with XML&C
• Hard to implement new
architectures 3
• C# language • Script based
arch. definition
Novel EasyRouter • HDL & Bitstream
generation • High accuracy
analysis with VLSI CADs
VPR6.0 (current version)
• Not for synthesizable FPGA
• Delay model is not suitable for
standard cells based design
Kumamoto University
• Only support “Island Style” FPGA • Parameters can be changed in XML based Arch. File
• Only small changes can be made for the supposed Arch. • For unsupported Arch., need to modify the source code
4
VPR
void MakeSB(type, fs) {
…
…
}
type=“Wilton“
fs="3"
Arch. File(XML based)
Example of making “Switch box”
VPR: XML based Arch. definition
Kumamoto University VPR: Delay model
5
Routing delay Logic delay
Synthesis
HDL(Gate) & .sdc
HDL(Gate) & .spef Layout
Static Timing Analysis
Logic Block Verilog
Write delay info. to Arch. file
L
WHρ
Wire model
L : calc. from LB area (length of segment wire) W, ρ,H : determined by process
R=ρ𝐿/𝐻∗𝑊
𝐶↓𝑐𝑏_𝑡𝑜_𝐿𝐵
𝐶↓𝑐𝑏_𝑓𝑟𝑜𝑚_𝐿𝐵
𝐶↓𝑡𝑜_𝑠𝑤𝑖𝑡𝑐ℎ 𝐶↓𝑤𝑖𝑟𝑒
𝐶↓𝑓𝑟𝑜𝑚_𝑠𝑤𝑖𝑡𝑐ℎ
Elmore delay model
Critical path delay (Composed with different
delay model)
R
C
Kumamoto University EasyRouter blocks
Clustered netlist
Placement result
EasyRouter
FPGA HDL codes
RRGraph building block
Routing block
HDL codes Generation
block
BitStream Generation
block
Reports
Report Generation
block
Bitstream
Netlist & placement files import
FPGA architecture script file
Arch. & physical parameters setup
RRGraph build
HDL templates
Mapped netlist
6
Breadth_first & Timing_driven routing algorithm (same with VPR)
HDL & Bitstream Generation with routed info.
RRGraph: Routing-resource graph
Arch. Scrip File HDL templates
Kumamoto University
• Implement FPGA Arch. in a script file – Dynamic compiled during execution – Contains both Arch. Implement codes and parameters – Any architecture can be implemented in low cost
EasyRouter
load switch block object
from c# script
int Fs=3; Public switchbox makeSB() { … … }
Arch. Script
Structure can be modified freely
Script based Arch. definition
7
Example of making “Switch box”
Kumamoto University Verilog & BitStream generation
VerilogHDL • Logic block
– Prepared in template • Routing
– Generated according to Channel width / CB,SB topology
• Chip :Top level module according to array size BitStream • Routing & Local connection block
– Generated according to routed info. • Logic block
– Technology mapping results 8
CB: Connection Block SB: Switch Box
Kumamoto University FPGA design flow based on EasyRouter
9
Standard Cell Library
HDL or Blif file
VTR Project (ODIN II , ABC , VPR)
Clustered Netlists
Placement results
EasyRouter (evaluation mode)
Clustered Netlist
Placement result
EasyRouter (CW exploring mode)
Minimum Channel Width ArraySize
Device scale exploration
Fixed Channel Width & Array Size
HDL Template
Device implementation
HDL Codes (RTL) BitStream
Function Verification
VLSI CADs
Synthesis
Timing Report
HDL(Gate) & .sdc
HDL(Gate) & .spef Layout
Static Timing Analysis
GDS
Formal Verification
Formal Verification
FPGA CADs
Kumamoto University Fast evaluation with EasyRouter
Synthesis
Layout
STA
Tile HDL code
EasyRouter (Fast performance analysis)
Chip area & timing report
Tile physical information
Fixed Channel
Width & Array Size
Netlist & placement reading block
RRGraph building block
Routing block
Report generation block
(a) Tile info. (b) Fast evaluation with EasyRouter 10
Placement results
Tile physical information
• The fast evaluation with EasyRouter can be used when • VLSI analysis is too slow (big chip) • No available VLSI flow(3D-FPGA)
Clustered Netlists (.blif)
Kumamoto University Evaluation : EasyRouter
• Device structure – Island style FPGA
• MCNC Benchmarks • Evaluation items
– Minimum channel width for routability evaluation – Execution time for tool efficiency evaluation
11
Item value Logic cluster 4-LUT×4 # of LB inputs 10 SB Wilton(Fs=3) CB Fc_in=0.5, Fc_out=1.0 Channel Single lines
Kumamoto University Minimum channel width (Routability)
0 5
10 15 20 25 30 35 40 45 50
Cha
nnel
wid
th
Benchmarks
VPR EasyRouter
12
• EasyRouter has almost the same routability with VPR • Minor differences are from different routing orders
Kumamoto University Execution time (CPU time)
0
2
4
6
8
10
12
14
Exec
utio
n tim
e no
rmal
ized
by
VPR
Benchmarks
13
• EasyRouter is 8 times slower than VPR on average • Big circuits are about 5 times slower
Kumamoto University Evaluation for proposed design flow
• Process:e-Shuttle 65nm • Device parameters
• MCNC Benchmarks • Evaluation items
– Critical path delay on average(10 placement seeds) 14
Item value Array size 15×15 Logic cluster 4-LUT×4 # of LB inputs 10 SB Wilton(Fs=3) CB Fc_in=0.5, Fc_out=1.0 IOB Fc=1.0 # of routing tracks(single line) 50/channel
Kumamoto University Device architecture for evaluation
15
Homogeneous FPGA Tile structure
IOB
IOB
IOB
IOB
IOB
IOB
IOB
IOB IOB IOB IOB
IOB IOB IOB IOB
Tile Tile Tile Tile
Tile Tile Tile Tile
Tile Tile Tile Tile
Tile Tile Tile Tile IOB
LB
Local Connection
BLE_3
BLE_0
BLE_1
BLE_2
CB_Y
CB_X SB
Kumamoto University Critical path delay results
16
0
50
100
150
200
250
s298 elliptic diffeq alu4 misex3 apex4
Crit
ical
pat
h de
lay
(ns)
Benchmarks
Full FPGA STA
EasyRouter
VPR
• Delay from VPR is about 10x worse than STA results • Delay from EasyRouter is about 1.5x worse then STA results
Kumamoto University Conclusions
EasyRouter • Simple:New arch. can be implemented in low cost • High accuracy:Link academic FPGA tools with
commercial VLSI CADs
Other works with the proposed flow • 3D-FPGA(VLSI-SoC2013)
– “Three-Dimensional Stacking FPGA Architecture Using Face-to-Face Integration”
• Hierarchy routing FPGA(FPL2013) – “Defect-Robust FPGA Architectures For Intellectual Property
Cores In System LSI” 17
Kumamoto University VPR(MIN-MAX-AVG. value from 10 seeds results)
0.0
50.0
100.0
150.0
200.0
250.0
300.0
misex3 alu4 apex4 diffeq elliptic s298
Crit
ical
pat
h de
lay
(ns)
Benchmarks
• Change range of delay results of VPR is large when placement seed value changes 19
76.19ns
Kumamoto University EasyRouter(MIN-MAX-AVG. value from 10 seeds results)
20
Benchmarks
Crit
ical
pat
h de
lay
(ns)
• Change range of delay results of STA results are smaller than VPR when placement seed changes
0.0
2.5
5.0
7.5
10.0
12.5
15.0
17.5
20.0
22.5
25.0
misex3 alu4 apex4 diffeq elliptic s298
4.17ns
Top Related