Beyond the Arithmetic Constraint: Depth-Optimal Mapping of Logic Chains in LUT-based FPGAs
Optimality Study of Logic Synthesis for LUT-Based FPGAs
Transcript of Optimality Study of Logic Synthesis for LUT-Based FPGAs
Optimality Study of Logic Synthesis for LUT-Based FPGAs
Jason Cong and Kirill Minkovich
VLSI CAD Lab
Computer Science Department
University of California, Los Angeles
Supported by Altera, Xilinx, and Magma under the California MICRO program.
UCLA VLSICAD LAB
Outline
Motivation and background
• Current test cases hinted that the algorithms do not have much room for improvement.
LEKO: Logic synthesis Examples with Known Optimals
• Creation, optimality, and results
LEKU: Logic synthesis Examples with Known Upper bounds
• Creation and results
Conclusion
Goals of Paper
The goal was to test the optimality of two design steps in logic synthesis:
• Technology mapping
• Logic optimization combined with technology mapping
Definitions
• Technology mapping
• Logic optimization
• Logic synthesis = logic optimization + technology mapping
[Figure: three diagrams of a function f over inputs a, b, c, d, e]
Motivation
Logic synthesis is NP-hard in general
• Combining logic optimization and mapping is much harder
• Academic tools mostly focus on mapping
Problems with current test cases
• How far from optimal?
• Logic optimization?
Decrease in FPGA synthesis papers
• Suggests fewer improvements are possible
Why there is a need for new ones
• Test specific properties of logic synthesis tools
• LEKO & LEKU
Construction Overview (LEKO)
First create a small "core" graph, G5, with a known optimal mapping (and possibly logic synthesis) solution.
G5 has to have the following properties:
1. 5 inputs (x1, x2, …, x5)
2. 5 outputs (y1, y2, …, y5)
3. yi = f(x1, x2, …, x5)
4. Internal nodes have exactly two inputs.
5. The optimal (in terms of area/depth) mapping of G5 into a 4-LUT mapping solution has only 4-LUTs (no 3-LUTs or 2-LUTs).
Why these properties?
• Simplest G5 for a 4-LUT architecture
• Can be cascaded into larger structures
[Figure: G5 block with inputs x1 … x5 and outputs y1 … y5]
G5 example (optimal: 7 4-LUTs)
[Figure: G5 example graph with input nodes I1 to I5, internal nodes N1 to N24, and output nodes O1 to O5]
Node values
O1 = N1 ∙ N10
O2 = N13 ∙ N14
O3 = N12 + N17
O4 = N13 ∙ N16
O5 = N9 + N11
N1 = I1 ∙ I2' + I2
N2 = N1 ∙ I3' + N1' ∙ I3
N3 = N1' ∙ N7'
N4 = N1 + N6
N5 = N3' + N4'
N6 = I2' + I5'
N7 = N1 ∙ N6
N8 = I3 ∙ I4' + I3' ∙ I4
N9 = N8 ∙ N2
N10 = N9 ∙ I5' + N9' ∙ I5
N11 = I5 ∙ N5
N12 = N18 ∙ I5
N13 = I1 + I1' ∙ I2
N14 = N15 + I2
N15 = N17 ∙ I5
N16 = N17 ∙ I5 + N17' ∙ I5'
N17 = N20 ∙ N19
N18 = N23' + N24'
N19 = N13 + N13' ∙ I3
N20 = I3 + I4
N21 = I2' + I5'
N22 = N13 ∙ N21
N23 = N13' ∙ N22'
N24 = N13 + N21
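The node equations of the example G5 can be checked mechanically. The sketch below is only an illustration, not the authors' tooling: it transcribes the equations into Python (the names `eval_g5` and `support` are hypothetical) and verifies by exhaustive simulation that every output depends on all five inputs, which is how property 3 is read here.

```python
from itertools import product

def eval_g5(i1, i2, i3, i4, i5):
    """Evaluate the example G5 node equations on 0/1 inputs; returns (O1, ..., O5)."""
    n = lambda v: 1 - v  # complement, written x' on the slide
    n1 = (i1 & n(i2)) | i2
    n2 = (n1 & n(i3)) | (n(n1) & i3)
    n6 = n(i2) | n(i5)
    n7 = n1 & n6
    n3 = n(n1) & n(n7)
    n4 = n1 | n6
    n5 = n(n3) | n(n4)
    n8 = (i3 & n(i4)) | (n(i3) & i4)
    n9 = n8 & n2
    n10 = (n9 & n(i5)) | (n(n9) & i5)
    n11 = i5 & n5
    n13 = i1 | (n(i1) & i2)
    n19 = n13 | (n(n13) & i3)
    n20 = i3 | i4
    n17 = n20 & n19
    n15 = n17 & i5
    n14 = n15 | i2
    n16 = (n17 & i5) | (n(n17) & n(i5))
    n21 = n(i2) | n(i5)
    n22 = n13 & n21
    n23 = n(n13) & n(n22)
    n24 = n13 | n21
    n18 = n(n23) | n(n24)
    n12 = n18 & i5
    return (n1 & n10, n13 & n14, n12 | n17, n13 & n16, n9 | n11)

def support(output_index):
    """Return the input positions (0..4) whose flip can change the given output."""
    deps = set()
    for bits in product((0, 1), repeat=5):
        for k in range(5):
            flipped = list(bits)
            flipped[k] ^= 1
            if eval_g5(*bits)[output_index] != eval_g5(*flipped)[output_index]:
                deps.add(k)
    return deps

# True when each of O1..O5 depends on every one of I1..I5.
full_support = all(support(j) == {0, 1, 2, 3, 4} for j in range(5))
print("every output depends on all five inputs:", full_support)
```

The check only covers functional dependence; it says nothing about the optimal 7-LUT mapping, which the slides take from the paper.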
Construction Overview (LEKO)
Algorithm steps
1. Create a G5.
2. Duplicate it and connect the copies in such a way that there is a unique traversal of G5s from PO to PI.
This creates a new graph with the following properties:
• There exists a known optimal mapping solution.
• This also provides a tight upper bound on the optimal logic synthesis solution.
By using different G5s we can construct different LEKO networks with a variety of properties.
• G5 can have different mapping and logic synthesis solutions.
• G5 can be based on realistic designs (multipliers, adders, etc.).
Construction Examples (LEKO)
[Figure: LEKO networks built from layers of interconnected G5 blocks]
Optimality
Theorem: The optimal mapping solution of an arbitrarily sized LEKO circuit without logic optimization is achieved when every G5 in the circuit is mapped optimally without overlapping any other G5.
Proof idea: A LUT spanning two layers will not reduce the area of the solution. This can be easily shown by looking at what would happen to the G5 at layer i and at layer i+1.
The complete proof is in the paper.
[Figure: G5 blocks at layer i and layer i+1, with a 3-LUT and a 4-LUT marked]
LEKO Examples
LEKO: Logic synthesis Examples with Known Optimals
Naming
• G25 has 25 inputs and 25 outputs
• Gx has x inputs and x outputs
Tools tested
• Altera's Quartus 5.0, Xilinx's ISE 7.1i, UCLA's DAOmap, and Berkeley's ABC
• 4-LUT architecture
• Area optimization only (NP-hard)

Circuit       # Nodes   Depth   # I/O   Optimal # LUTs   Optimal depth
LEKO(G25)         305      13      50               70               4
LEKO(G125)      2,350      20     225              525               6
LEKO(G625)     15,875      27   1,250            3,500               8
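The optimal columns follow a simple pattern if the optimality theorem is combined with an assumed layer structure. The sketch below is an inference from the table, not something the slides state: it assumes a circuit with 5^k inputs consists of k layers of 5^(k-1) G5 copies, each mapped into 7 4-LUTs with depth 2. The function name `leko_optimal` is hypothetical.

```python
def leko_optimal(k, luts_per_g5=7, depth_per_g5=2, width=5):
    """Optimal 4-LUT area and depth for a LEKO circuit with width**k inputs.

    Assumption (inferred from the table, not stated in the slides):
    k layers, each holding width**(k-1) non-overlapping G5 copies.
    """
    copies = k * width ** (k - 1)
    return {
        "inputs": width ** k,
        "luts": copies * luts_per_g5,
        "depth": k * depth_per_g5,
    }

for k in (2, 3, 4):
    print(leko_optimal(k))
```

Under these assumptions the function reproduces the 70, 525, and 3,500 LUT counts and the depths 4, 6, and 8 listed for G25, G125, and G625.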
Results (LEKO)
Only mapping is needed to produce optimal results.
What do these mean?
• Scaled fairly well
• Average gap = 15%
Why Quartus and ISE did so well
• Performed extra non-mapping steps

Circuit               DAOmap     ABC   Quartus     ISE   Optimal
LEKO(G25)    Area         83      80        72      80        70
             Ratio      1.19    1.14      1.03    1.14         1
LEKO(G125)   Area        650     609       561     588       525
             Ratio      1.24    1.16      1.07    1.12         1
LEKO(G625)   Area      4,435   4,072     3,737   3,974     3,500
             Ratio      1.27    1.16      1.07    1.14         1
Average ratio           1.23    1.16      1.05    1.13         1
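Each ratio in the table is the tool's area divided by the optimal area, and the last row averages the per-circuit ratios. A minimal sketch that recomputes them from the area values in the table (the dictionary layout and function names are illustrative only):

```python
# Areas (4-LUT counts) copied from the LEKO results table.
optimal = {"G25": 70, "G125": 525, "G625": 3500}
areas = {
    "DAOmap":  {"G25": 83, "G125": 650, "G625": 4435},
    "ABC":     {"G25": 80, "G125": 609, "G625": 4072},
    "Quartus": {"G25": 72, "G125": 561, "G625": 3737},
    "ISE":     {"G25": 80, "G125": 588, "G625": 3974},
}

def ratios(tool):
    """Area of the tool's solution divided by the known optimal area, per circuit."""
    return {c: areas[tool][c] / optimal[c] for c in optimal}

def average_ratio(tool):
    """Arithmetic mean of the per-circuit ratios."""
    r = ratios(tool)
    return sum(r.values()) / len(r)

for tool in areas:
    per_circuit = ", ".join(f"{c}: {v:.2f}" for c, v in ratios(tool).items())
    print(f"{tool:8s} {per_circuit}  average: {average_ratio(tool):.2f}")
```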
Creating LEKU
LEKU: Logic synthesis Examples with Known Upper bounds
Constructed from LEKO G25 (25 inputs and 25 outputs)
• Collapse, then decompose the graph
• Creates a much larger graph that is logically equivalent to the original
• LEKU-CD: collapsed, then decomposed into AND/OR gates
• LEKU-CB: collapsed, then balanced
LEKU-CD'
• LEKU-CD was too large for Xilinx as a single input
• Split LEKU-CD into 25 separate designs, one for each PO

Circuit           # Nodes   Depth   # I/O   Upper bound # LUTs   Upper bound depth
LEKU-CD(G25)    1,166,655      19      50                   70                   4
LEKU-CB(G25)          814      16      50                   70                   4
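Why collapsing and then decomposing inflates a network while keeping it logically equivalent can be seen on a toy case. The sketch below is not the flow used to build LEKU-CD, which these slides do not describe. It assumes a hypothetical 4-input XOR chain as the original network, collapses it to its minterms, and counts the 2-input AND/OR gates of the flat sum-of-products form (inverters are not counted).

```python
from itertools import product

def original(bits):
    """Toy multi-level network: a chain of 2-input XOR nodes (hypothetical example)."""
    acc = 0
    for b in bits:
        acc ^= b
    return acc

def collapse(func, n):
    """Collapse a network into its on-set: the input patterns where the output is 1."""
    return [bits for bits in product((0, 1), repeat=n) if func(bits)]

def evaluate_sop(minterms, bits):
    """Evaluate the decomposed form: an OR over ANDs of literals, one AND term per minterm."""
    return int(any(all(b == m for b, m in zip(bits, term)) for term in minterms))

def two_input_gate_count(minterms, n):
    """Count 2-input AND/OR gates in the flat sum-of-products form."""
    and_gates = len(minterms) * (n - 1)   # n literals per term need n-1 ANDs
    or_gates = max(len(minterms) - 1, 0)  # t terms need t-1 ORs
    return and_gates + or_gates

n = 4
minterms = collapse(original, n)
equivalent = all(evaluate_sop(minterms, bits) == original(bits)
                 for bits in product((0, 1), repeat=n))
gates = two_input_gate_count(minterms, n)
print(f"equivalent: {equivalent}, XOR nodes: {n - 1}, AND/OR gates after collapse: {gates}")
```

The 3-node XOR chain becomes 31 AND/OR gates with the same function, so a tool has to rediscover the original structure to get back to a small solution.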
Results on LEKU
Logic optimization and mapping were both needed
• Academic tools were allowed to use preprocessing tools
What does this mean?
• There exist designs on which these tools perform very badly
• Average gap = 171x
• Suggests that all of these tools lack global minimization heuristics

Circuit                                  DAOmap      ABC   Quartus     ISE   Upper bound
LEKU-CD(G25)     Area                    22,717   30,511    10,381       *            70
                 Ratio                      325      436       148       *             1
LEKU-CD(G25)'    Area                    25,247   35,271     5,005   9,717            70
                 Ratio                      361      504        72     139             1
LEKU-CB(G25)     Area                       322      191       239     280            70
                 Ratio                      4.6      2.7       3.4       4             1
Average ratio (last 2 designs)              183      255        38      72             1
Average ratio (all)                         230      314        74       *             1
LEKO/LEKU vs Real Designs
Limitations
• The whole circuit is combinational logic
• The original circuits contain highly repeated structures
• Does not mean tools are 70x away from optimal on real designs
Different uses than real designs
LEKO
• Tests the mapping phase of the algorithm
  • Tools perform well on current LEKO benchmarks
  • Will construct larger core graphs: worse results?
LEKU
• Tests the logic optimization phase of the algorithm
  • Ability to reproduce the original structure
  • Duplication removal
  • Logic identification
  • Other global heuristics
Conclusions
LEKO
• Only circuits that test optimality of technology mapping
• Have an optimal mapping solution
LEKU
• Test global area-minimizing heuristics
• Have a very tight upper bound on the optimal solution
These circuits address a need for specific method testing
Current state of technology
Technology mapping
• Current tools do very well
Overall logic synthesis
• Current tools cannot produce good solutions that require global minimization heuristics.
Conclusions (continued)
Download every test case mentioned here
• http://cadlab.cs.ucla.edu/
• Click on "Optimality Study"
• Click on "LEKO/LEKU"
• Harder and larger LEKO and LEKU circuits will be posted soon!
Check out the article in EE Times
• Just search EE Times for "kirill"
• Thank you EE Times for your interest!
• http://eetimes.com/showArticle.jhtml?articleID=180204087
Questions?
Additional Slides
Construction Algorithm (LEKO)
Variations
LEKO
• Using larger core graphs to create more complex designs
• Using commonly used cells as the core graphs
• Using a collection of core graphs
LEKU
• Using LEKO and adding in specific things to test
  • Duplicating some specific parts
  • Adding wires that will be removed when don't-cares are computed
Interesting New Results
After seeing the results, we got several responses
ABC
• Repeating the sequence "map 4-LUTs, don't-care calculation" led to a 3x improvement on the largest LEKU example
DAOmap
• Multiple iterations of "map 5-LUTs, simplify, map 4-LUTs" showed similar improvements on the LEKU examples
Altera
• For LEKO, the sequence "map 5-LUT, map 4-LUT" was able to achieve near-optimal solutions
• This result would not extend if we used a larger G5
Different G5s
Assuming a K-LUT architecture, G5 has to have the following properties:
1. It has m inputs and m outputs.
2. Every output is a function of all five inputs.
3. Each internal node of G5 has exactly two inputs.
4. There exists an optimal (in terms of area/depth) mapping of G5 into a K-LUT mapping solution, denoted M5, such that M5 only has K-LUTs.
Where m ≥ K + 1
The larger the m, the harder the G5 is to map