6/22/2001 DAC 2001 Tutorial: Jason Cong 1
PLD Synthesis Algorithms
Professor Jason Cong
Computer Science Department
University of California, Los Angeles
Los Angeles, CA 90095
http://cadlab.cs.ucla.edu/~cong
What to Synthesize
Structured logic:
  Examples: datapath, register files, …
  Best provided by FPGA vendors as libraries and function generators
Random logic:
  Examples: control circuits, finite state machines
  Good candidates for synthesis by automatic tools
Programmable Logic Blocks (PLBs) in FPGAs
Lookup-table based
  Altera APEX and FLEX devices
  Lucent Technologies ORCA devices
  Xilinx Virtex and XC4K devices
MUX-based
  Actel ACT1 and ACT2
  QuickLogic Eclipse
PLA-based (CPLD)
  Altera MAX7000
  Cypress CY37000 and CY39000
Focus of This Talk
Synthesis for random logic
  Need high degree of automation
  Much room for optimization
  Extensive research
Synthesis for SRAM-based (LUT-based) FPGAs
  Has the largest share in the FPGA market
  Synthesis-friendly
  Reconfigurability provides many potential applications
Formulation of LUT-Based Synthesis Problems
Logic optimization (network transformation)
  Transform the input network into another network that is more suitable for mapping into LUT networks
Technology mapping (LUT covering)
  Cover the optimized network with LUTs for one or more objectives
Logic Optimization Operations
Example: decomposition
  structural: abcd = ((ab)c)d
  functional: f(a,b,c,d) = g(y1(a,b,c), y2(a,b,c), d)
[Figure: f realized as block g fed by sub-functions y1 and y2]
Logic Optimization Operations (Cont’d)
Extraction
  f = ac + bc, g = ad + bd  then  f = ec, g = ed, e = (a+b)
Substitution
  f = a+bc, h = bc  then  f = a + h
Elimination
  f = a+bc, b = d+e  then  f = a+cd+ce
Critical path re-synthesis
…
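The three algebraic rewrites above (extraction, substitution, elimination) can be checked by exhaustive truth-table comparison. The sketch below is only an illustration; the helper name `equivalent` is an assumption and does not come from the slides.

```python
from itertools import product

def equivalent(f, g, n):
    """Return True if f and g agree on all 2**n Boolean input vectors."""
    return all(f(*v) == g(*v) for v in product([False, True], repeat=n))

# Extraction: f = ac + bc, g = ad + bd  ->  e = a + b, f = ec, g = ed
extraction_f = equivalent(lambda a, b, c, d: (a and c) or (b and c),
                          lambda a, b, c, d: (a or b) and c, 4)
extraction_g = equivalent(lambda a, b, c, d: (a and d) or (b and d),
                          lambda a, b, c, d: (a or b) and d, 4)

# Substitution: f = a + bc with h = bc  ->  f = a + h
def f_substituted(a, b, c):
    h = b and c
    return a or h

substitution = equivalent(lambda a, b, c: a or (b and c), f_substituted, 3)

# Elimination: f = a + bc with b = d + e  ->  f = a + cd + ce
elimination = equivalent(lambda a, c, d, e: a or ((d or e) and c),
                         lambda a, c, d, e: a or (c and d) or (c and e), 4)

print(extraction_f, extraction_g, substitution, elimination)
```

Exhaustive comparison is only practical for a handful of inputs; it is used here because the slide examples have at most four variables.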
Technology Mapping for K-LUT
Cover the network using K-LUTs
Duplication-free vs. duplicated mapping (K = 3)
[Figure: original circuit; duplication-free mapping; mapping with duplication]
Outline
Early results (1990-95)
Recent advances (1995-1999)
New challenges (2000 - )
Outline
Timeline axis: 1990, 1995, 1998, 2000

Early results
  Architecture
    • Simple, homogeneous LUTs
    • E.g. XC2K, Flex8K
  Synthesis
    • Homogeneous K-LUT mapping for depth, area min.
    • Focus on combinational circuits
Recent advances
  Architecture
    • Heterogeneous FPGAs
    • Embedded memory blocks
    • Complex PLBs
  Synthesis
    • Heterogeneous FPGA mapping
    • Mapping for EMBs
    • Boolean matching
    • Simultaneous mapping + retiming
New challenges
  Architecture
    • Million-gate FPGAs
    • Field-programmable system-on-a-chip
  Synthesis
    • Layout-driven synthesis
    • Use of IP blocks
    • Synthesis for FPSOC
Outline
Early results (1990-95)
  Developed for homogeneous LUTs
  Focus on combinational circuits
Recent advances (1995-1999)
New challenges (2000 - )
Early Results: Depth Minimization
Optimal mapping for trees
  Chortle-d [Francis, Rose, Vranesic, ICCAD’91]
Optimal mapping for general networks
  FlowMap [Cong&Ding, ICCAD’92]
Early Result: FlowMap
Depth-optimal technology mapping [Cong&Ding, TCAD’94]
BASIC APPROACH
  Compute a label for each node
    Label of a node represents the minimum possible depth of the node in any mapping solution
  Dynamic programming
    Starting from PI nodes, compute node labels in topological order: compute the label of a node based on labels of its predecessors
  Labels of PO nodes give the depth of the optimal mapping solution.
Cuts in a Network
Given a cut (X, X̄) and a label l(v) on each node v
  Node cut-size: n(X, X̄) = |{v ∈ X : (v, u) is a cut edge for some u ∈ X̄}|
  K-feasible cut: n(X, X̄) ≤ K
  Height of a cut: h(X, X̄) = max{l(v) | v ∈ X}
[Figure: example network between source s and sink t, with node labels 0 to 4 and a cut (X, X̄)]
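As a small illustration of the cut definitions above, the sketch below computes the node cut-size and height of a cut on a toy network. The network, the node names, and the function names are hypothetical and chosen only for this example.

```python
def node_cut_size(edges, X):
    """n(X, X-bar): number of nodes in X with at least one edge into X-bar."""
    return len({v for (v, u) in edges if v in X and u not in X})

def cut_height(labels, X):
    """h(X, X-bar): maximum label over the nodes in X."""
    return max(labels[v] for v in X)

def is_k_feasible(edges, X, K):
    """A cut is K-feasible when its node cut-size is at most K."""
    return node_cut_size(edges, X) <= K

# Hypothetical toy network: primary inputs p1, p2, p3; gate a = g(p1, p2);
# root t = g(a, p3). Labels for p1, p2, p3, a are assumed already computed.
edges = [("p1", "a"), ("p2", "a"), ("a", "t"), ("p3", "t")]
labels = {"p1": 0, "p2": 0, "p3": 0, "a": 1}

X_inputs_only = {"p1", "p2", "p3"}       # cut right after the primary inputs
X_with_gate = {"p1", "p2", "p3", "a"}    # cut right before the root t

print(node_cut_size(edges, X_inputs_only), cut_height(labels, X_inputs_only))
print(node_cut_size(edges, X_with_gate), cut_height(labels, X_with_gate))
```

On this toy network the cut closer to the inputs has lower height but larger cut-size, which is the trade-off the K-feasibility bound controls.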
Label Computation in FlowMap
Dynamic programming: compute each node label (optimal mapping depth) by computing a min-height K-feasible cut.
A min-height K-feasible cut can be computed in O(Km) time using flow computation.
[Figure: example with LUT input size K = 3; primary inputs labeled 0; nodes u, v, w; an infeasible cut with h = 0 and a K-feasible cut with h = 1]
FlowMap Algorithm: Summary
Phase 1: Label computation
  Process each node t in topological order starting from PIs:
    Compute minimum-height K-feasible cut (Xt, X̄t) in Nt;
    l(t) = h(Xt, X̄t) + 1;
Phase 2: Generate necessary K-LUTs
  L = list of POs;
  WHILE L ≠ ∅ DO
    remove a node t from L;
    LUT(t) = X̄t;
    L = L ∪ {non-PI inputs to LUT(t)}
  END.
Produces a depth-optimal mapping for any K-bounded network in O(Kmn) time, where m: # edges; n: # nodes in the network
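The two phases can be sketched in a few lines. Note the substitution: this sketch finds min-height cuts by exhaustive cut enumeration rather than by the flow computation FlowMap uses, so it does not have the O(Kmn) bound and only suits small networks. It assumes a K-bounded network; all function and variable names are hypothetical.

```python
from itertools import product

def compute_labels(fanins, order, K):
    """Phase 1: label each node with its minimum depth over K-feasible cuts.

    fanins: dict node -> list of fanin nodes (primary inputs map to []).
    order:  nodes in topological order (fanins before fanouts).
    Returns (labels, best_cut); best_cut[t] is the LUT input set chosen for t.
    """
    labels, cuts, best_cut = {}, {}, {}
    for t in order:
        if not fanins[t]:                         # primary input
            labels[t] = 0
            cuts[t] = [frozenset([t])]
            continue
        # Merge one cut per fanin; keep merged sets with at most K nodes.
        merged = set()
        for combo in product(*(cuts[u] for u in fanins[t])):
            c = frozenset().union(*combo)
            if len(c) <= K:
                merged.add(c)
        best = min(merged, key=lambda c: max(labels[v] for v in c))
        labels[t] = max(labels[v] for v in best) + 1   # l(t) = height + 1
        best_cut[t] = best
        cuts[t] = list(merged) + [frozenset([t])]      # trivial cut for fanouts
    return labels, best_cut

def generate_luts(outputs, fanins, best_cut):
    """Phase 2: walk back from the POs, emitting one LUT per needed node."""
    luts, worklist = {}, list(outputs)
    while worklist:
        t = worklist.pop()
        if t in luts or not fanins[t]:
            continue
        luts[t] = best_cut[t]
        worklist.extend(v for v in best_cut[t] if fanins[v])
    return luts

# Hypothetical chain of 2-input gates over primary inputs a, b, c, d, with K = 3.
fanins = {"a": [], "b": [], "c": [], "d": [],
          "g1": ["a", "b"], "g2": ["g1", "c"], "g3": ["g2", "d"]}
order = ["a", "b", "c", "d", "g1", "g2", "g3"]
labels, best_cut = compute_labels(fanins, order, K=3)
luts = generate_luts(["g3"], fanins, best_cut)
print(labels["g3"], len(luts))
```

In this example the three-gate chain has depth 3 before mapping; the labeling finds a depth-2 mapping that uses two 3-LUTs.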
Early Results: Area Minimization
Optimal mapping for trees with bounded or unbounded fanins
  Chortle-crf [Francis, Rose, Vranesic, DAC’91]
Optimal mapping without logic duplication for general networks
  DF-Map [Cong&Ding, DAC’93]
NP-hard for general networks with possible logic duplication
  [Farrahi&Sarrafzadeh, TCAD’94]
Early Results: Combined Synthesis with Mapping
Extension of traditional logic optimization techniques + covering & functional decomposition
  MIS-pga and MIS-pga-delay [Murgai et al., DAC’90, ICCAD’91]
Use of functional decomposition to generate a LUT network directly
  FGSyn [Lai, Pedram, Vrudhula, DAC’93], IMODEC [Wurth et al., DAC’95], BoolMap-D [Legl et al., DAC’96]
Mapping with re-synthesis
  FlowSYN [Cong&Ding, ICCAD’93], ALTO [Huang, Jou, Shen, ICCAD’96]
Outline
Early results
Recent advances
  Optimization and mapping for sequential circuits
  Synthesis for heterogeneous FPGAs
  Synthesis for FPGAs with embedded memory blocks
  Use of Boolean matching instead of pattern matching
  Combined decomposition and mapping
  UCLA RASP FPGA synthesis system
New challenges
Direct Optimization and Mapping for Sequential Circuits
[Figure: original circuit and its 3-LUT mappings; Φ = 2 without retiming, Φ = 1 with retiming]
Traditional approaches
  Assume the positions of FFs are fixed
  Map each combinational subcircuit separately
  The optimal solutions for all subcircuits may not lead to the optimal solution of the entire circuit
Difficulties and Solutions
Difficulties:
  When to retime?
    Before mapping? -- delay is unknown for retiming
    After mapping? -- FF positions are fixed during mapping
  How to compute an equivalent initial state?
Solutions:
  Simultaneous mapping with retiming [Pan&Liu, DAC’96] [Cong&Wu, ICCD’96]
  Optimal mapping + forward retiming [Cong&Wu, DAC’98]
Simultaneous Mapping with Retiming
Key idea -- expanded circuit
  a DAG rooted at a node, in which every path from a node to the root has the same #FFs
Usage: to form all possible LUTs under retiming
[Figure: original circuit with nodes a, b, c and its expanded circuits, with node copies annotated by FF counts 0, 1, 2; a 3-LUT is formed on the expanded circuit]
Simultaneous Mapping + Retiming (Cont’d)
Polynomial-time optimal algorithm for mapping + retiming
First proposed in SeqMapII [Pan&Liu, DAC’96]
Significant speed-up (over 2000x) achieved by TurboMap [Cong&Wu, ICCD’96 ]
Automatic pipelining with use of re-synthesis to reduce max. loop’s delay-to-register ratio
TurboSYN [Cong&Wu, DAC’97]
Experimental Results: Mapping + Retiming + Pipelining
16 MCNC FSMs and ISCAS sequential benchmarks with 30~10,000 simple gates
[Chart: avg. clock period; bar values as extracted: 3.3, 6.9, 7.9]
Legend:
  TurboSYN: resynthesis + retiming + pipelining
  TurboMap: mapping + retiming
  FlowMap+retiming: separate mapping with retiming
Experimental Results: Mapping + Retiming + Pipelining (Cont’d)
16 MCNC FSMs and ISCAS sequential benchmarks with 30~10,000 simple gates
[Chart: avg. #LUT and avg. #Flipflop; bar values as extracted: 206, 134, 139 and 84, 36, 17]
Legend:
  TurboSYN: resynthesis + retiming + pipelining
  TurboMap: mapping + retiming
  FlowMap+retiming: separate mapping with retiming
Retiming with Initial States
Many sequential circuits have initial states
Retiming will change the initial state!
Equivalent initial state computation for (backward) retiming is NP-hard
[Figure: moving an FF forward across f(X) (FRT): y = f(X), initial state guaranteed. Moving an FF backward across f(X) (BRT): does there exist X with f(X) = y? NP-complete]
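The asymmetry between forward and backward retiming can be illustrated for a single gate. This is a simplified sketch with hypothetical function names: forward retiming only evaluates f, while backward retiming must search for a preimage, done here by brute force. Real circuits also have to handle FFs shared across fanouts, which this sketch ignores.

```python
from itertools import product

def forward_initial_state(f, fanin_states):
    """FFs move from the inputs of gate f to its output: new state is y = f(X)."""
    return f(*fanin_states)

def backward_initial_state(f, n_inputs, y):
    """An FF moves from the output of f to its inputs: find X with f(X) = y.

    Exhaustive search, exponential in n_inputs; returns None if no X exists.
    """
    for X in product([0, 1], repeat=n_inputs):
        if f(*X) == y:
            return X
    return None

def and2(a, b):
    return a & b

def always_zero(a):
    return a & (1 - a)

print(forward_initial_state(and2, (1, 0)))      # always well defined
print(backward_initial_state(and2, 2, 1))       # some preimage of 1 under AND
print(backward_initial_state(always_zero, 1, 1))  # no preimage exists
```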
Conventional Approaches
Initial-state computation for a given retiming is NP-hard
Iteration may not find a feasible retiming solution
[Flowchart: original circuit → compute a retiming → can find an equivalent init-state? → yes: finish; no: compute a retiming again]
Optimal Mapping with Forward Retiming
Optimal mapping with forward retiming (FRT) in polynomial time => guaranteed initial state computation
  TurboMap-frt [Cong&Wu, DAC’98]
New flow for retiming:
  Step 1: move FFs backward as much as possible
    create large freedom for mapping + FRT
  Step 2: optimal mapping + FRT
    clock period min. with guaranteed equivalent initial states
Experimental Results of Optimal Mapping with Forward Retiming
18 benchmarks with 30~10,000 simple gates
10 out of 18 TurboMap solutions cannot compute init-states
[Chart: avg. clock period; bar values as extracted: 5.8, 5.6, 7.0]
Legend:
  TurboMap-frt: mapping + forward retiming
  TurboMap: mapping + retiming
  FlowMap-frt: separate mapping with forward retiming
Experimental Results of Optimal Mapping with Forward Retiming
18 benchmarks with 30~10,000 simple gates
10 out of 18 TurboMap solutions cannot compute init-states
[Chart: avg. #5-LUTs and avg. #Flipflops; bar values as extracted: 92, 94, 100 and 23, 24, 15]
Legend:
  TurboMap-frt: mapping + forward retiming
  TurboMap: mapping + retiming
  FlowMap-frt: separate mapping with forward retiming
Technology Mapping for FPGAs with Heterogeneous LUTs
Almost all recent FPGA architectures support heterogeneous LUTs
“One-size fits all” is not good enough
Examples
  Xilinx XC4000
    1 CLB = 2 x 4-LUTs = 1 x 5-LUT
  Lucent ORCA2C
    1 PFU = 4 x 4-LUTs = 2 x 5-LUTs = 1 x 6-LUT
XC4000 Block Diagram
1 CLB = 2 x 4-LUTs = 1 x 5-LUT
ORCA2C Block Diagram
1 PFU = 4 x 4-LUTs
1 PFU = 2 x 5-LUTs
1 PFU = 1 x 6-LUT
Problem Formulation
Problem
  Heterogeneous LUTs have different delays and areas
  Two types of heterogeneous LUT-based FPGAs
    Fully configurable, no fixed ratio between different types of LUTs
    Fixed combination of several different LUTs in a PLB (discussed later using Boolean matching)
Objective:
  Delay or area minimization
Mapping for Heterogeneous FPGAs
Solutions
  Compute multiple cuts at each node in the network
    network-flow computation
    cut enumeration
  Select the most appropriate LUT implementation
Depth minimization
  HeteroMap [Cong&Xu, DAC’98]: polynomial-time delay-optimal for general networks
Area minimization
  Optimal for trees [Korupolu, Lee, Wong, DAC’98]
  Heuristic for general networks [He&Rose, FPGA’94]
Heterogeneous vs. Homogeneous Mapping: XC4000
[Chart: Comparison between FlowMap and HeteroMap on XC4000 Series FPGAs; y-axis: comparison ratio (0 to 1.2); categories: Mapping-Delay, PostLayout-Delay, #PLB; series: FlowMap(5), HeteroMap(5,4); annotations: -19%, -7%, +2%]
Delay(4-LUT) : Delay(5-LUT) = 1 : 1.5
[Cong&Xu, DAC’98]
Comparison between Homogeneous and Heterogeneous FPGAs
[Chart: Performance Comparison between Homogeneous and Heterogeneous FPGAs; y-axis: comparison ratio (0 to 2.5); categories: Mapping-Delay, MemoryCell-Area; series: 3-LUT-FPGA, 4-LUT-FPGA, 5-LUT-FPGA, 6-LUT-FPGA, 3-4-5-6-LUT-HeteroFPGA]
Delay(3-LUT) : Delay(4-LUT) : Delay(5-LUT) : Delay(6-LUT) = 1 : 1.3 : 1.7 : 2
Mapping for FPGAs with Embedded Memory Blocks
Embedded memory blocks (EMBs)
  On-chip memories
  Logic functions
[Figure: FLEX10K Device Block Diagram]
Problem Formulation
Minimize delay and/or area
[Figure: unmapped circuit and mapped circuit built from LUTs and EMBs]
Limited number of EMBs in one chip
Configuration flexibility of EMBs
  E.g. each EMB in FLEX10K has 2K cells and can be configured as a 2Kx1, 1Kx2, 512x4, or 256x8 memory
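To make the configuration flexibility concrete, the sketch below lists the address/data shapes of one EMB and tests whether a logic function with a given number of inputs and outputs fits into a single EMB used as a ROM. It assumes "2K cells" means 2048 bits; the function names are hypothetical.

```python
def emb_configs(total_bits=2048, min_width=1, max_width=8):
    """List (address_bits, data_width) pairs for one EMB of total_bits cells."""
    configs = []
    width = min_width
    while width <= max_width:
        depth = total_bits // width                    # number of words
        configs.append((depth.bit_length() - 1, width))  # log2(depth) address bits
        width *= 2
    return configs

def fits_in_one_emb(n_inputs, n_outputs, configs=None):
    """True if an n_inputs-to-n_outputs function fits in a single EMB as a ROM."""
    configs = emb_configs() if configs is None else configs
    return any(n_inputs <= a and n_outputs <= w for a, w in configs)

print(emb_configs())
print(fits_in_one_emb(9, 4), fits_in_one_emb(10, 3))
```

A 9-input, 4-output function fits the 512x4 shape, whereas a 10-input, 3-output function fits none of the four shapes and would need more than one EMB or additional LUTs.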
Solutions
EMB_Pack [Cong&Xu, FPGA’98]
Use EMBs to minimize the circuit area
Maintain the circuit delay
Post-mapping processing and pre-mapping processing
Smap [Wilton, FPGA’98]
Use EMBs to minimize the circuit area
Post-mapping processing
Results of EMB_Pack
[Chart: Comparison between CutMap [Cong&Hwang, FPGA’95] and CutMap followed by EMB_Pack on MCNC benchmarks on the FLEX10K device family; y-axis: comparison ratio (0 to 1.2); categories: #LUT, Layout Delay; series: CutMap, CutMap followed by EMB_Pack; annotation: -10%]
Boolean Matching for Complex PLBs
PLB: Programmable Logic Block
[Figure: XC4K CLB with function generators F, G, H, input x, and output f(X)]
Example: given a 9-input function f:
  f = x’1x2 + x2x’3 + x’2x3x8 + x5x6a + x’5x’7a + x4x’5x6x7 + x’5x’6x’7a + x5x’6x’7a’
  a = x’0x4 + x0x’4
Target: Xilinx XC4K FPGAs
  LUT covering + packing: 4 CLBs
  Boolean matching: 1 CLB
Advantage: significant area & delay reduction
Direct Mapping to Programmable Logic Blocks (PLBs)
Benefits
  May have significant area and delay reduction
Difficulties: need to perform Boolean matching
  Given an arbitrary function f and a PLB, determine if the PLB can implement f.
Example: Boolean Matching for XC4K CLB
Functional decomposition
  f(X) = H(F(X1), G(X2)),
  f(X) = H(F(X1), G(X2), x),
  f(X) = H(F(X1,x), G(X2), x),
  f(X) = H(F(X1,x), G(X2,x), x).
Conditions
  F and G input sizes ≤ 4
[Figure: XC4K CLB with function generators F, G, H, input x, and output f(X)]
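For the first form only, f(X) = H(F(X1), G(X2)) with a fixed, disjoint partition (X1, X2), a decomposition with single-output F and G exists exactly when the decomposition chart of f has at most two distinct rows and at most two distinct columns. The sketch below checks that condition by enumeration. It is an illustration of this one case, not the matching procedure used in the cited work, and the function names are hypothetical.

```python
from itertools import product

def decomposes_as_H_F_G(f, n1, n2):
    """Check whether f(X1, X2) = H(F(X1), G(X2)) for single-output F, G, H.

    f takes n1 + n2 arguments in {0, 1}; X1 is the first n1, X2 the last n2.
    Rows of the chart are indexed by X1 and columns by X2.
    """
    X1s = list(product([0, 1], repeat=n1))
    X2s = list(product([0, 1], repeat=n2))
    rows = {tuple(f(*x1, *x2) for x2 in X2s) for x1 in X1s}
    cols = {tuple(f(*x1, *x2) for x1 in X1s) for x2 in X2s}
    return len(rows) <= 2 and len(cols) <= 2

def xor_of_and_or(a, b, c, d):
    # F = a AND b, G = c OR d, H = XOR
    return (a & b) ^ (c | d)

def sum_of_cross_products(a, b, c, d):
    # ac + bd: needs more than one bit of information about (a, b)
    return (a & c) | (b & d)

print(decomposes_as_H_F_G(xor_of_and_or, 2, 2))
print(decomposes_as_H_F_G(sum_of_cross_products, 2, 2))
```

A full matcher would also have to try the different input partitions, the three forms with the extra input x, and the input-size limit of 4 on F and G.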
Boolean Matching Results for MCNC benchmarks
XC4K CLB can implement
  98% of 6-input functions
  88% of 7-input functions
Experiment: enumerate all K-input functions

Circuits   5-input   6-input   7-input
9sym           651      1256      2333
C499          3649      9716     27599
alu2          2696      6666     18231
alu4          5889     14841     40332
des          28875     65245    157028
Application to Technology Mapping (for XC4000 and XC5200 FPGAs)
Compared with LUT mapping results, the PLB mapping obtains
  for XC5200 FPGAs
    7% depth reduction
    13% area reduction
  for XC4000 FPGAs
    17% depth reduction
    3% area increase
Application to Architecture Evaluation (logic capability vs. silicon area)
[Figure: candidate PLB structures built from function generators F, G, H]
  XC4K(0,4,3): 24 memory cells (> 4 inputs)
  XC4K(3/4,4,2): 28, 36 memory cells (> 4 inputs)
  XC4K CLB: 40 memory cells (> 5 inputs)
  XC5K: 24-48 memory cells (> 4 or 5 inputs)
Architecture Evaluation (for wide function implementation)
[Chart: # implementable functions / # memory cells for each type of PLB; y-axis 0 to 4000; series: 5-input funcs, 6-input funcs; x-axis labels as extracted: B.2, XC4K (4,4,MUX), (0,4,3), XC4K (3,4,2), (4,4,2), XC5K, XC4K]
Combined Decomposition with Mapping
[Figure, inputs a to g: (a) Initial 5-bounded network; (b) Best mapping after dmig: depth 3, area 5]
Impact of Decomposition
[Figure, inputs a to g: (a) Initial 5-bounded network; (c) Optimal decomposition: depth 2, area 3]
Problem Formulation
Structural Gate Decomposition in a W-bounded network for K-LUT mapping (W-SGD/K)
Goal: find a decomposition with minimum depth after mapping
The W-SGD/K problem is NP-hard for W ≥ K ≥ 5 [Cong&Hwang, DAC’96]
Available Solutions
Simultaneous decomposition and mapping for trees (Chortle-crf or Chortle-d algorithms)
Combine bin-packing with flow computation (for computing the min-height cuts): DOGMA [Cong&Hwang, DAC’96]
Use a mapping graph to encode all possible (or a large class of) decompositions, and compute a mapping solution on it: SLDmap [Chen&Cong, FPGA’01]
Mapping Graph Definition
A modified AND2/INV network to encode a set of circuit structures in a single graph [Lehman et al., ICCAD’95]
  choice nodes (logical equivalence)
  ugates (two choice nodes and fanins)
  cycles
Reduction
  unique choice node
  unique INV and AND2 nodes
Mapping Graph Example
[Figure sequence over five slides: mapping graph example with inputs A, B, C, D, nodes labeled 1 to 9 and a to h, and output Z]
Overview of SLDMap [Chen&Cong, FPGA’01]
  Initial W-bounded network generation
  Mapping graph construction
  Depth optimal labeling
  Label relaxation and area minimization
  Decomposition selection
  Fixed decomposition mapping
Experimental Flow
MCNC Circuit Set, UCLA RASP package, CUDD
  dmig [Chen et al., IEEE Design & Test ’92]
  dogma [Cong, DAC’96]
[Flow diagram boxes: Initial W-bounded network; dmig, dogma, sldmap; cutmap; greedy_pack; Xilinx Foundation 3.1 P&R]
Depth/Area Comparison
[Chart: depth and area ratios (axis 0.94 to 1.12) for dmig, dogma, sldmap]
Post-layout Delay
[Chart: post-layout delay in ns (axis 0 to 100) for dogma and sldmap on k2(V), 9sym(S), i3(V), x1(S), C499(3K)]
V: Virtex  S: Spartan  3K: XC3K
Outline
Early results
Recent advances
  Synthesis and optimization for sequential circuits
  Synthesis for heterogeneous FPGAs
  Synthesis for FPGAs with embedded memory blocks
  Use of Boolean matching instead of pattern matching
  Combined decomposition with mapping
  UCLA RASP FPGA synthesis system
New challenges
UCLA RASP Synthesis System
http://cadlab.cs.ucla.edu
[Flow diagram: EDIF netlist / HDL design → Internal netlist → LUT Mapping Engine → LUT netlist → PLB Mapping Engine → Vendor-specific netlist (Xilinx, Altera, ORCA) → Placement, Routing → Chip Programming Information]
Objective 1: A Flexible and Efficient FPGA Synthesis Engine
Delay Optimal Mapping
  FlowMap/HeteroMap, FlowSYN, TurboMap, TurboSYN
Area Optimal Mapping
  DF-map, CutMap-E, MarkMap
Delay/Area Trade-off
  FlowMap-r, CutMap, CutSyn
PLB Mapping
  PDDmap, PDDSYN, Match-4K/3K, EAB-pack
Gate Decomposition for Mapping
  DMIG, DOGMA, SLDmap
Objective 2: FPGA Architecture Evaluation
Outline
Early results
Recent advances
New challenges
  Integration of synthesis and layout
  Field-programmable system-on-a-chip
Logic vs. Interconnect Delays
Example: Altera FPGA (EPF8282A A-2 speed, Altera Data Book ’98)
  LE delay: 2.4 ns
  connection between LEs in same LAB: 0.5 ns
  connection between LEs in same row, different LAB: 4.7 ns
  connection between LEs in different rows: 7.2 ns
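A quick calculation with the numbers above shows how strongly placement affects path delay. The sketch assumes a simple additive model (LE delays plus one connection delay per hop); the model and all names are assumptions for illustration only.

```python
LE_DELAY_NS = 2.4
WIRE_DELAY_NS = {
    "same_lab": 0.5,        # LEs in the same LAB
    "same_row": 4.7,        # same row, different LAB
    "different_row": 7.2,   # different rows
}

def path_delay_ns(connections):
    """Delay of a path through len(connections) + 1 LEs joined by the given hops."""
    n_les = len(connections) + 1
    return n_les * LE_DELAY_NS + sum(WIRE_DELAY_NS[c] for c in connections)

# The same 4-LE path under two hypothetical placements.
clustered = path_delay_ns(["same_lab"] * 3)
scattered = path_delay_ns(["different_row"] * 3)
print(round(clustered, 1), round(scattered, 1), round(scattered / clustered, 2))
```

Under this model the logic delay is identical in both placements; the whole difference comes from interconnect, which is why mapping depth alone is a weak predictor of post-layout delay.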
Layout-Driven Synthesis
Iterative design flow
  construct-by-correction
  need to guarantee convergence
Concurrent design flow
  correct-by-construction
  need to handle design abstraction, constraint propagation, design refinement
Best candidate: combination of iterative and concurrent design approaches
  concurrent synthesis, layout planning, and solution refinement
  limited number of iterations within the same or adjacent levels to correct unacceptable estimation errors
Layout-Driven Synthesis Flow of ADT
[Flow diagram: HDL design or netlist from third-party synthesis tool → Global logic optimization and interconnect planning → Placement-driven synthesis and architecture embedding → FPGA vendor P&R tool]
Source: www.aplus-dt.com
Courtesy of Aplus Design Technologies, Inc. (ADT)
Use of IP Blocks
Classification of IP Blocks
  Soft
  Hard
  Firm
Challenges
  IP representation and characterization
  Interface with synthesis tools
  IP protection
Field-Programmable System-on-a-Chip (FPSOC)
[Figure: General-purpose FPSOC: processor, memory, programmable logic. Application-specific FPSOC: processor, memory, programmable logic, ASIC]
Design Challenges
Integration of
  Embedded operating systems
  Compilers
  Synthesis tools
  Layout tools
Need for architecture evaluation
  Explore and choose the best embedded FPGA architecture (for the given application domain)
ArchEvaluator: ADT’s PLD Architecture Evaluation Tool
Evaluation of programmable logic blocks
  Sizes
  Configurations
Evaluation of on-chip hierarchy
  Number of levels
  Sizes, configuration, and delays at each level
Evaluation of heterogeneous architectures
  Multiple sizes and/or configurations of the same type of logic blocks
  Multiple types of logic blocks
  Different kinds of resources on the same chip
Embedded array configuration
  Array aspect ratio
  Single vs. multiple arrays
  …
Source: www.aplus-dt.com
Courtesy of Aplus Design Technologies, Inc. (ADT)
Conclusions
Synthesis and technology mapping for homogeneous LUTs is a well-understood problem (some room for area min.)
Recent advances in FPGA synthesis enable many new architecture innovations
  Embedded memory blocks
  Heterogeneous LUT-based FPGAs
  Architectures for efficient retiming and pipelining
New FPGA synthesis tools and algorithms have to
  consider layout design
  support efficient IP re-use
Field-programmable logic will be an important component of system-on-a-chip designs
Acknowledgments
Contributions from
  Current and former graduate students from my group: Michael Chen (UCLA), Eugene Ding (Agere), Yean-Yow Hwang (Altera), John Peck (AMD), Chang Wu (ADT), and Songjie Xu (ADT)
  Other colleagues: Peichen Pan (ADT)
Support from the National Science Foundation
Support from Actel, Altera, Lucent Technologies, Quickturn, Vantis/Lattice, and Xilinx under the California MICRO Program
Further Information
Visit
  http://cadlab.cs.ucla.edu/~cong
  http://cadlab.cs.ucla.edu/projects/fpga
Updated copy of the slides of this talk
Survey/tutorial paper on FPGA synthesis
  Cong and Ding, ACM TODAES, 1996
Recent research publications and software on FPGA synthesis from UCLA
Speaker Bio
JASON CONG received his B.S. degree in computer science from Peking University in 1985, and his M.S. and Ph.D. degrees in computer science from the University of Illinois at Urbana-Champaign in 1987 and 1990, respectively. Currently, he is a Professor and Co-Director of the VLSI CAD Laboratory in the Computer Science Department of the University of California, Los Angeles. His research interests include layout synthesis and logic synthesis for high-performance low-power VLSI circuits, design and optimization of high-speed VLSI interconnects, and synthesis and architecture design for FPGAs. Dr. Cong is a fellow of IEEE and serves as a consultant or advisory board member for several semiconductor or EDA companies. In 1998, Dr. Cong founded Aplus Design Technologies, Inc. (www.aplus-dt.com), which provides innovative layout-driven synthesis solutions and architecture evaluation solutions for both stand-alone FPGAs/CPLDs and embedded FPGAs for SOC designs.