Functionally Linear Decomposition and Synthesis of Logic Circuits for FPGAs
June 10, 2008
Tomasz S. Czajkowski
and
Stephen D. Brown
University of Toronto
FPGA CAD Background
Start with HDL
Convert HDL to gates
Gates to logic components on FPGA
Place and route
Get Results
Program FPGA
HDL Description
Logic Synthesis
Technology Mapping
Place and Route
Timing Analysis
Bitstream Generation
Motivation
Synthesis of XOR-based logic circuits is difficult and time consuming
XOR-based logic is very useful for circuits that deal with arithmetic, error correction, and communication
Focus on area optimization in this work
Why Use XOR Gates?
Karnaugh map of f (rows ab, columns cd):
ab\cd 00 01 11 10
00 0 0 0 0
01 0 0 1 1
11 0 1 0 1
10 0 1 1 0
[Circuit diagram: inputs c, b, d, a; output f]
$f = bc \oplus ad$
Basic Idea
Express a k-input logic function in a truth table (2^n rows, 2^m columns, n+m=k)
Find a set of linearly independent columns, also known as a basis
Express each column as a weighted sum of basis functions
Column Selector Functions are the weighting factors
Synthesize
Truth table of f (rows cd, columns ab):
cd\ab 00 01 10 11
00 0 0 0 0
01 0 0 1 1
10 0 1 0 1
11 0 1 1 0
Column 01 is G1, column 10 is G2, column 11 is G1 XOR G2
G1 = c
G2 = d
[Circuit diagram: inputs c, b, d, a; output f]
$f = bc \oplus ad$
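As a minimal sketch of the first step, the truth table of the running example can be built as a matrix with one row per assignment of (c, d) and one column per assignment of (a, b). The function f = bc XOR ad is the one read off the Karnaugh map; the helper name `truth_table_matrix` and the list-of-lists representation are illustrative assumptions, not part of the talk.

```python
from itertools import product

def f(a, b, c, d):
    # Running example from the slides: f = bc XOR ad
    return (b & c) ^ (a & d)

def truth_table_matrix():
    """Rows are indexed by cd = 00, 01, 10, 11; columns by ab = 00, 01, 10, 11."""
    rows = []
    for c, d in product((0, 1), repeat=2):
        rows.append([f(a, b, c, d) for a, b in product((0, 1), repeat=2)])
    return rows

matrix = truth_table_matrix()
for row in matrix:
    print("".join(map(str, row)))
```

Each column of this matrix is a function of c and d only, which is why columns can be treated as vectors over GF(2).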
Finding Basis Functions
Use Gaussian Elimination to determine the basis columns
Perform elementary row operations (add rows, swap rows)
Reduce the matrix until, for each row, the column with the leftmost 1 has only 0s below it
Result
The leftmost 1 element of each non-zero row points to the basis vector in the original truth table
Note:
Linear independence is guaranteed
Number of basis vectors is minimum
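Truth table:
0000
0011
0101
0110
Swap rows:
0110
0101
0011
0000
Add row 1 to row 2:
0110
0011
0011
0000
Add row 2 to row 3:
0110
0011
0000
0000
The leftmost 1s are in the second and third columns, so columns 01 and 10 of the truth table are the basis vectors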
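The elimination can be sketched in a few lines of Python. This is an illustrative implementation under the assumption that the truth table is stored as a list of 0/1 rows; the function name `basis_columns` is hypothetical. Row addition over GF(2) is XOR, and the pivot rows may be picked in a different order than on the slide, but the pivot columns are the same.

```python
def basis_columns(matrix):
    """Return the indices of the pivot columns after GF(2) row reduction."""
    rows = [row[:] for row in matrix]
    n_rows, n_cols = len(rows), len(rows[0])
    pivots = []
    r = 0  # index of the next pivot row
    for col in range(n_cols):
        # Look for a row at or below r that has a 1 in this column.
        pivot = next((i for i in range(r, n_rows) if rows[i][col]), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]  # swap rows
        for i in range(r + 1, n_rows):
            if rows[i][col]:
                # Add (XOR) the pivot row to clear the 1 below the pivot.
                rows[i] = [x ^ y for x, y in zip(rows[i], rows[r])]
        pivots.append(col)
        r += 1
        if r == n_rows:
            break
    return pivots

table = [[0, 0, 0, 0],
         [0, 0, 1, 1],
         [0, 1, 0, 1],
         [0, 1, 1, 0]]
pivots = basis_columns(table)
print(pivots)  # [1, 2]
```

The result points at columns ab = 01 and ab = 10 of the original table.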
Express Each Column in terms of G1 and G2
Trivial for columns of all zeros or those that are either G1 or G2
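Other columns Ci are expressed as
$$C_i = h_{1i} G_1 \oplus h_{2i} G_2$$
h1i and h2i are the solution to the following equation
$$\begin{bmatrix} G_1 & G_2 \end{bmatrix} \begin{bmatrix} h_{1i} \\ h_{2i} \end{bmatrix} = C_i$$
For the column G1 XOR G2 of the example
$$\begin{bmatrix} 0 & 0 \\ 0 & 1 \\ 1 & 0 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} h_{1i} \\ h_{2i} \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \\ 1 \\ 0 \end{bmatrix}$$
Easy to see
$$\begin{bmatrix} h_{1i} \\ h_{2i} \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$$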
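For a small basis the coefficients can be found by simply trying every 0/1 combination. The sketch below does that for the example column; the exhaustive search is an assumption made for brevity (the slides do not say how the equation is solved), and `express_column` is a hypothetical helper name.

```python
from itertools import product

def express_column(column, basis):
    """Return a tuple h with column == XOR of basis[j] over all j where h[j] == 1."""
    for h in product((0, 1), repeat=len(basis)):
        combo = [0] * len(column)
        for hj, g in zip(h, basis):
            if hj:
                combo = [x ^ y for x, y in zip(combo, g)]
        if combo == list(column):
            return h
    return None  # the column is not in the span of the basis

G1 = [0, 0, 1, 1]  # column ab = 01 of the example table
G2 = [0, 1, 0, 1]  # column ab = 10 of the example table
h = express_column([0, 1, 1, 0], [G1, G2])
print(h)  # (1, 1)
```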
Create Column Selector Functions
For each basis function, G1 and G2, record for which columns h1i and h2i are 1
Create Truth Tables H1 and H2 to identify columns in which h1i and h2i are 1.
H1 and H2 are the selector functions
Truth table of f (rows cd, columns ab):
cd\ab 00 01 10 11
00 0 0 0 0
01 0 0 1 1
10 0 1 0 1
11 0 1 1 0
Selector truth tables:
ab H1 H2
00 0 0
01 1 0
10 0 1
11 1 1
H1 = b
H2 = a
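The selector truth tables can be derived mechanically once the basis is known. The following sketch assumes the example table (rows cd, columns ab) and the basis columns 01 and 10; the names `combine` and `span` are illustrative. It also checks that recombining the selectors with the basis functions reproduces every entry of the table.

```python
from itertools import product

table = [[0, 0, 0, 0],
         [0, 0, 1, 1],
         [0, 1, 0, 1],
         [0, 1, 1, 0]]
columns = [list(col) for col in zip(*table)]
basis = [columns[1], columns[2]]  # G1 and G2

def combine(h, basis):
    """XOR together the basis vectors selected by the 0/1 tuple h."""
    out = [0] * len(basis[0])
    for hj, g in zip(h, basis):
        if hj:
            out = [x ^ y for x, y in zip(out, g)]
    return tuple(out)

# Map every vector in the span of the basis to its coefficient tuple.
span = {combine(h, basis): h for h in product((0, 1), repeat=len(basis))}
coeffs = [span[tuple(col)] for col in columns]

# H1 and H2 as truth tables over ab = 00, 01, 10, 11.
H1 = [h[0] for h in coeffs]
H2 = [h[1] for h in coeffs]

# Entry (row r, column c) should equal H1[c]*G1[r] XOR H2[c]*G2[r].
recomposed = [[(H1[c] & basis[0][r]) ^ (H2[c] & basis[1][r]) for c in range(4)]
              for r in range(4)]
print(H1, H2, recomposed == table)
```

Here H1 is 1 exactly when b = 1 and H2 is 1 exactly when a = 1.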
Synthesize
Put G1, G2, H1 and H2 together to synthesize function f
$f = H_1 G_1 \oplus H_2 G_2$
G1 = c, H1 = b
G2 = d, H2 = a
$f = bc \oplus ad$
How to order variables?
Partition variables between rows (bound set) and columns (free set)
Which one is the better choice?
For a function with k variables the largest number of possible variable partitions is
$$\binom{k}{k/2} = \frac{k!}{(k/2)!\,(k - k/2)!}$$
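Rows cd, columns ab:
cd\ab 00 01 10 11
00 0 0 0 0
01 0 0 1 1
10 0 1 0 1
11 0 1 1 0
Rows bc, columns ad:
bc\ad 00 01 10 11
00 0 0 0 1
01 0 0 0 1
10 0 0 0 1
11 1 1 1 0
[Circuit diagram: inputs ad and bc, output f]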
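To get a feel for how partitions compare, the sketch below counts the basis functions (the GF(2) rank of the truth table) for every way of putting two of the four variables on the rows. It assumes the running example f = bc XOR ad; `gf2_rank` and `basis_size` are hypothetical helper names. For this small function every partition happens to need two basis functions, so the partitions differ only in how costly the resulting basis and selector functions are.

```python
from itertools import combinations, product

VARIABLES = "abcd"

def f(a, b, c, d):
    return (b & c) ^ (a & d)

def gf2_rank(vectors):
    """Rank over GF(2) of bit vectors stored as integers."""
    vectors = list(vectors)
    rank = 0
    while vectors:
        pivot = vectors.pop()
        if pivot:
            rank += 1
            low = pivot & -pivot  # lowest set bit of the pivot
            vectors = [v ^ pivot if v & low else v for v in vectors]
    return rank

def basis_size(bound):
    """Number of basis functions when `bound` indexes the rows."""
    free = [v for v in VARIABLES if v not in bound]
    rows = []
    for row_bits in product((0, 1), repeat=len(bound)):
        value = 0
        for col_bits in product((0, 1), repeat=len(free)):
            assignment = dict(zip(bound, row_bits))
            assignment.update(zip(free, col_bits))
            value = (value << 1) | f(**assignment)
        rows.append(value)
    return gf2_rank(rows)

sizes = {"".join(bound): basis_size(bound) for bound in combinations(VARIABLES, 2)}
print(sizes)
```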
Heuristic Variable Ordering: Procedure
Step 1: Starting with n=2, determine all possible partitions with bound set size of 2. Pick the k/2 best such that each variable is in exactly one grouping.
Step 2: For (n=4; n <= m; n=n*2) repeat the procedure in Step 1, except now group the groupings generated in the step for n/2.
Step 3: If m is not a power of 2, use the generated groupings to form valid bound sets and pick the best one (longest step).
Step 4: Reorder variables in f to match the best grouping of size m found.
Example with 8 variables: a b c d e f g h -> ab cd ef gh -> abcd efgh -> best
Heuristic Variable Ordering: Runtime
For k=16, m=8 the number of partitions tested is 154, versus 12870 possible partitions
120 tested for n=2, picked 8 best
28 tested for n=4, picked 4 best
6 tested for n=8, picked 2 best
If m were 7, then in addition we would test combinations of valid partitions formed from the initial inputs, as well as the n=2 and n=4 groups: 4*6*10 = 240
Thus for a 16-variable function we are testing at most 388 partitions (instead of 11440 partitions)
Basis and Selector Optimization
Variable ordering can change the area of the final implementation of the logic function
A set of basis/selector functions for a given variable partition is a minimum set, but
Not unique
Other sets can be better (less costly to implement) than the one we found
We need to explore alternate solutions
Example
Same function as before
bound set {b,c}, free set {a,d}
bc\ad 00 01 10 11
00 0 0 0 1
01 0 0 0 1
10 0 0 0 1
11 1 1 1 0
Basis-selector pairs are:
$G_1 = bc, \quad H_1 = \overline{ad}$
$G_2 = \overline{bc}, \quad H_2 = ad$
G1H1 = !(ad)*bc
bc\ad 00 01 10 11
00 0 0 0 0
01 0 0 0 0
10 0 0 0 0
11 1 1 1 0
G2H2 = ad*!(bc)
bc\ad 00 01 10 11
00 0 0 0 1
01 0 0 0 1
10 0 0 0 1
11 0 0 0 0
Let
$G' = G_1 \oplus G_2 = 1$
We can replace G2 with G' and then we have basis-selector pairs:
$G_1 = bc, \quad H_1 \oplus H_2 = 1$
$G' = 1, \quad H' = H_2 = ad$
G1 with the new selector = bc
bc\ad 00 01 10 11
00 0 0 0 0
01 0 0 0 0
10 0 0 0 0
11 1 1 1 1
G'H' = ad
bc\ad 00 01 10 11
00 0 0 0 1
01 0 0 0 1
10 0 0 0 1
11 0 0 0 1
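The replacement can be checked exhaustively. The sketch below assumes the basis-selector pairs G1 = bc, H1 = !(ad), G2 = !(bc), H2 = ad, which agree with the product tables labelled !(ad)*bc and ad*!(bc), and verifies that swapping G2 for G' = G1 XOR G2 leaves f unchanged when the selector of G1 becomes H1 XOR H2. The function name `check_replacement` is illustrative.

```python
from itertools import product

def check_replacement():
    """Return True if both basis-selector sets implement f = bc XOR ad."""
    for a, b, c, d in product((0, 1), repeat=4):
        G1, H1 = b & c, 1 - (a & d)
        G2, H2 = 1 - (b & c), a & d
        original = (G1 & H1) ^ (G2 & H2)

        G_new = G1 ^ G2               # constant 1 for this example
        H1_new = H1 ^ H2              # also constant 1
        replaced = (G1 & H1_new) ^ (G_new & H2)

        target = (b & c) ^ (a & d)
        if not (G_new == 1 and H1_new == 1 and original == replaced == target):
            return False
    return True

ok = check_replacement()
print(ok)  # True
```

The second set is cheaper to build because two of its four functions are the constant 1.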
Multi-Output Synthesis
Put truth tables side by side
Apply Gaussian Elimination to all functions simultaneously
Create a common set of basis functions
Selector functions are different for each output
Example: 2-bit Adder
Synthesize S1 and Cout as
$C_{out} = (x_1 \oplus y_1) x_0 y_0 \oplus x_1 y_1$
$S_1 = x_1 \oplus y_1 \oplus x_0 y_0$
Truth tables side by side (rows x1y1, columns x0y0):
x1y1   Cout          S1            S0
       00 01 10 11   00 01 10 11   00 01 10 11
00      0  0  0  0    0  0  0  1    0  1  1  0
01      0  0  0  1    1  1  1  0    0  1  1  0
10      0  0  0  1    1  1  1  0    0  1  1  0
11      1  1  1  1    0  0  0  1    0  1  1  0
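The side-by-side table and the two expressions can be cross-checked against ordinary 2-bit addition. This is a sketch under the assumption that rows are indexed by x1y1 and columns by x0y0 for Cout, S1 and S0 in that order; `adder`, `side_by_side_rows` and `gf2_rank` are hypothetical helper names. The GF(2) rank of the combined table gives the size of the common basis.

```python
from itertools import product

def adder(x1, y1, x0, y0):
    """Reference 2-bit addition; returns (Cout, S1, S0)."""
    total = (2 * x1 + x0) + (2 * y1 + y0)
    return (total >> 2) & 1, (total >> 1) & 1, total & 1

def side_by_side_rows():
    """One row per x1y1; columns are x0y0 for Cout, then S1, then S0."""
    rows = []
    for x1, y1 in product((0, 1), repeat=2):
        row = []
        for out in range(3):
            row += [adder(x1, y1, x0, y0)[out] for x0, y0 in product((0, 1), repeat=2)]
        rows.append(row)
    return rows

def gf2_rank(rows):
    """Rank over GF(2) of a list of 0/1 rows."""
    vectors = [int("".join(map(str, r)), 2) for r in rows]
    rank = 0
    while vectors:
        pivot = vectors.pop()
        if pivot:
            rank += 1
            low = pivot & -pivot
            vectors = [v ^ pivot if v & low else v for v in vectors]
    return rank

rows = side_by_side_rows()
formulas_hold = all(
    adder(x1, y1, x0, y0)[:2]
    == (((x1 ^ y1) & x0 & y0) ^ (x1 & y1), x1 ^ y1 ^ (x0 & y0))
    for x1, y1, x0, y0 in product((0, 1), repeat=4)
)
print(gf2_rank(rows), formulas_hold)
```

A rank of 3 means three shared basis functions of x1 and y1 are enough for all three outputs.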
Circuit for Example 2
Let x0y0 be Cin
[Circuit diagram: inputs x1, y1 and Cin; outputs Cout and S1]
Duplication Reduction
Replace a duplicate function (related by equality or complementation) with a wire/inverter
Store a list of functions with k inputs or fewer created in the process of synthesis
If the same function is repeated then connect to it via a wire/inverter
Both methods are utilized frequently
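One way to picture the stored list of functions is a lookup table keyed by truth table. The class below is a hypothetical sketch, not the data structure used in FLDS: it assumes all stored functions share the same ordered inputs and encodes each truth table as an integer bit mask.

```python
class FunctionCache:
    """Hypothetical cache of already-synthesized functions over the same inputs."""

    def __init__(self, num_inputs):
        # A truth table over num_inputs variables has 2**num_inputs bits.
        self.mask = (1 << (1 << num_inputs)) - 1
        self.nodes = {}

    def lookup_or_add(self, truth_table, name):
        """Return (node_name, inverted).

        If an equal function exists, reuse it through a wire (inverted=False).
        If its complement exists, reuse it through an inverter (inverted=True).
        Otherwise register the new function under `name`.
        """
        if truth_table in self.nodes:
            return self.nodes[truth_table], False
        complement = truth_table ^ self.mask
        if complement in self.nodes:
            return self.nodes[complement], True
        self.nodes[truth_table] = name
        return name, False

cache = FunctionCache(num_inputs=2)
first = cache.lookup_or_add(0b1000, "and_bc")    # new function
second = cache.lookup_or_add(0b0111, "nand_bc")  # complement of and_bc
third = cache.lookup_or_add(0b1000, "dup_bc")    # exact duplicate
print(first, second, third)
```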
Results
99 MCNC circuits tested
25 XOR-based, as determined by prior research
Circuits known to have a lot of XOR gates inside
Set used in many XOR-based logic synthesis papers
74 non-XOR
Compiled BDS-PGA 2.0, ABC, and our tool (FLDS) under Windows XP
Dual Xeon 2.8GHz with 2GB of RAM
Synthesized each circuit with BDS-PGA 2.0, ABC and FLDS
Used ABC to map logic into 4-LUTs
XOR circuits (1 of 2)
               ABC                 BDS-PGA-2.0         FLDS                                 FLDS vs. BDS    FLDS vs. ABC
Name           LUTs Depth Time(s)  LUTs Depth Time(s)  XHS  LUTs Depth Time(s) Cone Size    Area   Depth    Area   Depth
5xp1             40     4    0.08    25     4    0.07  X      22     4    0.02         8  -12.00    0.00  -45.00    0.00
9sym            114     6    0.08    58     6    0.54  X      13     4   0.031        20  -77.59  -33.33  -88.60  -33.33
9symml           85     5    0.08    19     6    0.17  X      13     4   0.051        12  -31.58  -33.33  -84.71  -20.00
alu2            150     9    0.09   208    12    0.82  XX    102     7   0.082        24  -50.96  -41.67  -32.00  -22.22
C1355            75     4    0.14    77     6    3.87  XXX    80     6   0.036         8    3.75    0.00    6.25   33.33
C1908           107     8    0.11   122     9    0.91  XX    167    13   0.077         8   26.95   30.77   35.93   38.46
C3540           363    12    0.22   418    14    2.99        613    17   0.236         8   31.81   17.65   40.78   29.41
C499             77     5    0.13    80     4    3.42         77     6   0.035         8   -3.75   33.33    0.00   16.67
C880            112     9    0.14   123     9    1.21  XXX   167    11   0.067         8   26.35   18.18   32.93   18.18
cordic          304     8    0.16   161    18    1501         24     5   9.735        24  -85.09  -72.22  -92.11  -37.50
count            42     5    0.06    39     5    0.12  XX     52     5   0.036         8   25.00    0.00   19.23    0.00
dalu            255     6    0.16   326     9    2.34  XX    363     9   0.128         8   10.19    0.00   29.75   33.33
des            1340     6    0.67  1338     7    6.03       1155     8   5.456        16  -13.68   12.50  -13.81   25.00
f51m             41     4    0.06    32     5    0.12         18     4       0        12  -43.75  -20.00  -56.10    0.00
inc              51     4    0.06    45     4     0.1         51     4   0.035         8   11.76    0.00    0.00    0.00
Cordic: two 23-input functions, small area, fast synthesis
Neither ABC nor BDS-PGA can synthesize it well
XOR circuits (2 of 2)
               ABC                 BDS-PGA-2.0         FLDS                                 FLDS vs. BDS    FLDS vs. ABC
Name           LUTs Depth Time(s)  LUTs Depth Time(s)  XHS  LUTs Depth Time(s) Cone Size    Area   Depth    Area   Depth
my-adder         32    16    0.08    33    16    0.89  XXX    35    16   0.051        20    5.71    0.00    8.57    0.00
rd53             10     3    0.05    12     3       0  XX      7     2       0        12  -41.67  -33.33  -30.00  -33.33
rd73             63     4    0.09    15     4    0.09  X      11     3   0.015        12  -26.67  -25.00  -82.54  -25.00
rd84            106     6    0.09    25     5     0.4         15     3   0.031         8  -40.00  -40.00  -85.85  -50.00
sqrt8            26     4    0.06    22     4    0.06         12     3       0        12  -45.45  -25.00  -53.85  -25.00
sqrt8ml          20     5    0.08    26     6    0.07  XX     12     3    0.02        16  -53.85  -50.00  -40.00  -40.00
squar5           18     3    0.06    18     3    0.03  XX     15     2   0.015        24  -16.67  -33.33  -16.67  -33.33
t481            135     6    0.11    70     6   22.81  XX      5     2   0.078        24  -92.86  -66.67  -96.30  -66.67
xor5              2     2    0.05     2     2       0          2     2   0.015         8    0.00    0.00    0.00    0.00
z4ml              7     3    0.05     6     3    0.01  XX      8     3    0.02         8   25.00    0.00   12.50    0.00
Total/Average                2.96                1548                    16.27            -18.76  -14.46  -25.26   -7.68
Ratio                                          522.81                    5.497
Good results
Win on both area and depth
Synthesis is fast
Non-XOR circuits vs. BDS-PGA
[Chart: Percent Difference, FLDS vs. BDS-PGA 2.0; vertical axis from -100 to 60; Average (4.78)]
Non-XOR circuits vs. ABC
[Chart: Percent Difference, FLDS vs. ABC; vertical axis from -100 to 60; Average (6.2)]
Circuits not included in comparison
Two circuits failed to synthesize with BDS-PGA 2.0
Ex1010
ABC results: 4094 LUTs, Depth 8, Time 1.52 seconds
FLDS results: 1063 LUTs, Depth 7, Time 13.94 seconds, Cone size set to 12
Comparison: Area: -74.04 %, Depth: -12.5 %
Misex3
ABC results: 1093 LUTs, Depth 6, Time 0.44 seconds
FLDS results: 493 LUTs, Depth 10, Time 3.8 seconds, Cone size set to 16
Comparison: Area: -54.89 %, Depth: +40.0 %
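The comparison percentages for Ex1010 and Misex3 appear consistent with dividing the difference by the larger of the two values. That normalization is inferred from the reported numbers, not stated in the slides, and `percent_difference` is a hypothetical helper that assumes positive inputs.

```python
def percent_difference(flds, other):
    """Difference of FLDS relative to the larger of the two values, in percent."""
    return 100.0 * (flds - other) / max(flds, other)

ex1010_area = round(percent_difference(1063, 4094), 2)
ex1010_depth = round(percent_difference(7, 8), 2)
misex3_area = round(percent_difference(493, 1093), 2)
misex3_depth = round(percent_difference(10, 6), 2)
print(ex1010_area, ex1010_depth, misex3_area, misex3_depth)
```

Under this convention a negative value means FLDS produced the smaller (better) result.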
Interesting Experiment
Does FLDS work in tandem with other synthesis tools?
Optimize circuit with FLDS and then apply ABC's optimizations
Compared to ABC alone
Results:
XOR circuits: Area: -24.2 %, Depth: -16.2 %
Non-XOR circuits: Area: -4.25 %, Depth: +1 %
Overall: Area: -9.3 %, Depth: -3.3 %
FLDS with ABC vs. ABC (all circuits included)
[Chart: Percent Difference, FLDS+ABC vs. ABC; vertical axis from -100 to 60; Average (-9.34)]
Observations
FLDS is good for XOR based logic
Performs reasonably well for non-XOR logic
Most gains due to synthesis of multi-output logic functions
FLDS is fast
Runtime in seconds for functions larger than 16 inputs
Future Work
Look at non-disjoint decomposition
Combine with tools such as ABC to synthesize all types of logic well
Acknowledgements
Valavan Manohararajah, Deshanand Singh of Altera Corporation
Professors Zvonko G. Vranesic and Jianwen Zhu from the University of Toronto for their input during the course of this research
We would also like to take this opportunity to thank Altera Corporation for funding this research
Questions?