Combinational and Sequential Mapping with Priority Cuts Alan Mishchenko Sungmin Cho Satrajit Chatterjee Robert Brayton UC Berkeley


Combinational and Sequential Mapping with Priority Cuts

Alan Mishchenko, Sungmin Cho, Satrajit Chatterjee, Robert Brayton

UC Berkeley


Outline

1. Traditional cut-based LUT mapping

2. Improved technology mapping with priority cuts

3. Sequential mapping

4. Other applications of priority cuts

5. Experimental results


Technology Mapping

Input: A Boolean network (And-Inverter Graph)

Output: A netlist of K-LUTs implementing the Boolean network optimizing some cost function

[Figure: technology mapping takes the subject graph (inputs a, b, c, d, e; output f) to the mapped netlist with the same inputs and output.]


k-feasible Cuts

[Figure: example network with primary inputs a, b, c; internal nodes p, q; and root node r.]

A cut of a node n is a set of nodes in its transitive fan-in such that every path from the node to the PIs is blocked by nodes in the cut.

A cut is k-feasible if its size is k or less.

The set {p, b, c} is a 3-feasible cut of node r. (It is also a 4-feasible cut.)

k-feasible cuts are important in FPGA mapping since the logic between a node and the nodes in its cut can be replaced by a k-LUT.
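The definition can be checked mechanically. The sketch below assumes a network structure that is not spelled out on this slide (p driven by a and b, q by b and c, r by p and q); the names `FANINS`, `is_cut`, and `is_k_feasible_cut` are illustrative only.

```python
# Assumed example network: p = f(a, b), q = f(b, c), r = f(p, q).
# Primary inputs (PIs) have an empty fanin list.
FANINS = {
    "a": [], "b": [], "c": [],
    "p": ["a", "b"],
    "q": ["b", "c"],
    "r": ["p", "q"],
}


def is_cut(node, leaves, fanins=FANINS):
    """Return True if every path from `node` to a PI is blocked by `leaves`."""
    if node in leaves:
        return True           # this path is blocked here
    if not fanins[node]:
        return False          # reached a PI without meeting a leaf
    return all(is_cut(f, leaves, fanins) for f in fanins[node])


def is_k_feasible_cut(node, leaves, k, fanins=FANINS):
    """A cut is k-feasible if it is a cut and has at most k leaves."""
    return len(leaves) <= k and is_cut(node, leaves, fanins)


print(is_k_feasible_cut("r", {"p", "b", "c"}, 3))
print(is_cut("r", {"p"}))
```

Under the assumed structure, {p, b, c} passes the check for k = 3 and k = 4, while {p} alone fails because the path from r through q to b is not blocked.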


k-feasible Cut Computation

Example network: PIs a, b, c; internal nodes p, q; root r. Cut sets computed at each node:

a: { {a} }    b: { {b} }    c: { {c} }
p: { {p}, {a, b} }
q: { {q}, {b, c} }
r: { {r}, {p, q}, {p, b, c}, {a, b, q}, {a, b, c} }

The set of cuts of a node is a ‘cross product’ of the sets of cuts of its children

Any cut that is of size greater than k is discarded

Computation is done bottom-up

(P. Pan et al, FPGA ’98; J. Cong et al, FPGA ’99)
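A minimal sketch of the bottom-up cross-product computation, assuming two-input nodes and the example network above (p driven by a and b, q by b and c, r by p and q). The function name `enumerate_cuts` and the data layout are illustrative, not taken from any tool.

```python
from itertools import product

# Assumed example network; PIs have an empty fanin list.
FANINS = {
    "a": [], "b": [], "c": [],
    "p": ["a", "b"],
    "q": ["b", "c"],
    "r": ["p", "q"],
}
ORDER = ["a", "b", "c", "p", "q", "r"]  # a topological order, PIs first


def enumerate_cuts(fanins, order, k):
    """Compute all k-feasible cuts of every node, bottom-up."""
    cuts = {}
    for node in order:
        trivial = frozenset([node])
        if not fanins[node]:
            cuts[node] = {trivial}          # a PI has only its trivial cut
            continue
        left, right = fanins[node]
        merged = {trivial}
        # Cross product of the fanin cut sets; discard unions larger than k.
        for c1, c2 in product(cuts[left], cuts[right]):
            union = c1 | c2
            if len(union) <= k:
                merged.add(union)
        cuts[node] = merged
    return cuts


cuts = enumerate_cuts(FANINS, ORDER, k=3)
for cut in sorted(cuts["r"], key=lambda c: (len(c), sorted(c))):
    print(sorted(cut))
```

For k = 3 this yields the five cuts of r listed on the slide; lowering k to 2 leaves only {r} and {p, q}.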


Basic Mapping Algorithm

Depth-optimal LUT mapping of a DAG using all cuts at each node

Input: And-Inverter Graph

1. Compute K-feasible cuts for each node

2. Compute best arrival time at each node
   • In topological order (from PI to PO)
   • Compute the depth of all cuts and choose the best one

3. Perform area recovery
   • Using area flow
   • Using exact local area

4. Choose the best cover
   • In reverse topological order (from PO to PI)

Output: Mapped Netlist
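Steps 1, 2, and 4 can be sketched as follows, assuming two-input nodes, unit LUT delay, K >= 2, and an illustrative network (p driven by a and b, q by b and c, r by p and q). Area recovery (step 3) is left out, and the function name `map_for_depth` is hypothetical.

```python
from itertools import product


def map_for_depth(fanins, order, outputs, k):
    """Depth-oriented LUT mapping using all k-feasible cuts (no area recovery)."""
    cuts, arrival, best = {}, {}, {}
    for node in order:                          # topological order, PI to PO
        if not fanins[node]:
            cuts[node], arrival[node] = [frozenset([node])], 0
            continue
        left, right = fanins[node]
        candidates = {c1 | c2
                      for c1, c2 in product(cuts[left], cuts[right])
                      if len(c1 | c2) <= k}

        def depth(cut):
            return 1 + max(arrival[leaf] for leaf in cut)

        # Best cut: smallest depth, ties broken by size, then by name.
        best[node] = min(candidates, key=lambda c: (depth(c), len(c), sorted(c)))
        arrival[node] = depth(best[node])
        cuts[node] = [frozenset([node])] + list(candidates)

    # Choose the cover by walking back from the POs through the best cuts.
    cover, stack = {}, list(outputs)
    while stack:
        node = stack.pop()
        if node in cover or not fanins[node]:
            continue
        cover[node] = best[node]
        stack.extend(best[node])
    return arrival, cover


FANINS = {"a": [], "b": [], "c": [],
          "p": ["a", "b"], "q": ["b", "c"], "r": ["p", "q"]}
ORDER = ["a", "b", "c", "p", "q", "r"]

arrival, cover = map_for_depth(FANINS, ORDER, outputs=["r"], k=3)
print(arrival["r"], {n: sorted(c) for n, c in cover.items()})
```

With K = 3 the whole example fits in one LUT rooted at r; with K = 2 the cover needs three LUTs (r, p, q) and depth 2.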


Area Recovery Summary

• Area recovery heuristics
  – Area flow (global view)
    • Chooses cuts with better logic sharing
  – Exact local area (local view)
    • Minimizes the number of LUTs needed to map each node

• The results of area recovery depend on
  – The order of processing nodes
  – The order of applying the two passes
  – The number of iterations

• This scheme works for the constant-delay model
  – Any change off the critical path does not affect the critical path
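The slides do not give a formula for area flow. The sketch below uses one common formulation as an assumption: the flow of a cut is one LUT plus the flow of each leaf divided by that leaf's fanout count, so shared logic looks cheaper. The function name and the example numbers are illustrative.

```python
def area_flow(cut, flow, fanout):
    """Area flow of a cut: one LUT plus each leaf's flow shared over its fanouts.

    Assumed formulation, not taken from the slides. `flow` holds the area
    flow already computed for each leaf; `fanout` holds fanout counts.
    """
    return 1.0 + sum(flow[leaf] / max(1, fanout[leaf]) for leaf in cut)


# PIs cost nothing; p and q each need one LUT over their PI cuts.
flow = {"a": 0.0, "b": 0.0, "c": 0.0}
fanout = {"a": 1, "b": 2, "c": 1, "p": 1, "q": 1}
flow["p"] = area_flow({"a", "b"}, flow, fanout)
flow["q"] = area_flow({"b", "c"}, flow, fanout)

print(area_flow({"p", "q"}, flow, fanout))        # r mapped on top of p and q
print(area_flow({"a", "b", "c"}, flow, fanout))   # r mapped directly on the PIs

# If p also feeds another node, its cost is shared and {p, q} looks cheaper.
fanout["p"] = 2
print(area_flow({"p", "q"}, flow, fanout))
```

This illustrates the "global view": a leaf with many fanouts contributes only a fraction of its cost to each cut that uses it.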


Drawbacks of Traditional Mapping Based on Exhaustive Cut Enumeration

• For large designs, there may be many k-feasible cuts
  – On the order of millions

• Previous ways of dealing with the problem
  – Detect and remove cut dominance
  – Perform cut pruning
  – Store only cuts on the frontier of mapping

k    Average number of cuts per node
4    6
5    20
6    80
7    150
8    240


Outline

1. Traditional cut-based technology mapping

2. Improved technology mapping

3. Sequential mapping

4. Other applications of priority cuts

5. Experimental results


New Mapping Algorithm

Near-depth-optimal LUT mapping of a DAG using several cuts at each node

Input: And-Inverter Graph

1. Compute K-feasible cuts for each node

2. Compute arrival time at each node
   • In topological order (from PI to PO)
   • Compute the depth of all cuts and choose the best one
   • Compute at most C good cuts and choose the best one

3. Perform area recovery
   • Using area flow
   • Using exact local area
   • Re-compute at most C good cuts and choose the best one in each iteration

4. Choose the best cover
   • In reverse topological order (from PO to PI)

Output: Mapped Netlist


Computing Priority Cuts

• Consider nodes in a topological order
  – At each node, merge two sets of fanin cuts (each containing C cuts), getting (C+1) * (C+1) + 1 cuts
  – Sort these cuts using a given cost function, select C best cuts, and use them for computing priority cuts of the fanouts
  – Select one best cut, and use it to map the node

• Sorting criteria

Mapping pass    Primary metric    Tie-breaker 1    Tie-breaker 2
Depth           depth             cut size         area flow
Area flow       area flow         fanin refs       depth
Exact area      exact area        fanin refs       depth
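A minimal sketch of the depth-oriented pass, assuming two-input nodes, unit LUT delay, and an assumed area-flow formula (one LUT plus each leaf's flow divided by its fanout count). Each node keeps at most C non-trivial cuts plus its trivial cut, so merging two fanins considers up to (C+1) * (C+1) pairs. All names are illustrative.

```python
from itertools import product


def priority_cuts(fanins, order, fanout, k, C):
    """Keep at most C best cuts per node, sorted by (depth, cut size, area flow)."""
    cuts, depth, flow, best = {}, {}, {}, {}
    for node in order:                            # topological order
        trivial = frozenset([node])
        if not fanins[node]:
            cuts[node], depth[node], flow[node] = [trivial], 0, 0.0
            continue
        left, right = fanins[node]
        candidates = {c1 | c2
                      for c1, c2 in product(cuts[left], cuts[right])
                      if len(c1 | c2) <= k}

        def cost(cut):
            d = 1 + max(depth[leaf] for leaf in cut)
            af = 1.0 + sum(flow[leaf] / max(1, fanout[leaf]) for leaf in cut)
            # Depth pass: depth first, then cut size, then area flow.
            return (d, len(cut), af, sorted(cut))

        ranked = sorted(candidates, key=cost)[:C]  # select the C best cuts
        best[node] = ranked[0]                     # the cut used to map the node
        depth[node], _, flow[node], _ = cost(best[node])
        cuts[node] = ranked + [trivial]            # passed on to the fanouts
    return cuts, best, depth


FANINS = {"a": [], "b": [], "c": [],
          "p": ["a", "b"], "q": ["b", "c"], "r": ["p", "q"]}
ORDER = ["a", "b", "c", "p", "q", "r"]
FANOUT = {"a": 1, "b": 2, "c": 1, "p": 1, "q": 1, "r": 0}

cuts, best, depth = priority_cuts(FANINS, ORDER, FANOUT, k=3, C=1)
print(sorted(best["r"]), depth["r"], len(cuts["r"]))
```

Other passes would swap in a different `cost` tuple following the sorting-criteria table; the fanin-reference tie-breaker is not modeled here.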


Discussion

• Complexity analysis
  – Traditional mapping algorithm
    • FlowMap O(Kmn) (J. Cong et al, TCAD ’94)
    • CutMap O(2KmnK) (J. Cong et al, FPGA ’95)
  – Proposed mapping algorithm
    • O(K C^2 n)

• K - max cut size
• C - max number of cuts
• n - number of nodes
• m - number of edges


Priority Cuts: A Bag of Tricks

• Compute and use priority cuts (a subset of all cuts)
• Dynamically update the cuts in each mapping pass
• Use different sorting criteria in each mapping pass
• Include the best cut from the previous pass into the set of candidate cuts of the current pass
• Consider several depth-oriented mappings to get a good starting point for area recovery
• Use complementary heuristics for area recovery
• Perform cut expansion as part of area recovery
• Use efficient memory management


Outline

1. Traditional cut-based technology mapping

2. Improved technology mapping

3. Sequential mapping

4. Other applications of priority cuts

5. Experimental results


Sequential Mapping

• That is, combinational mapping and retiming combined
• Minimizes clock period in the combined solution space
• Previous work:
  – Pan et al, FPGA’98
  – Cong et al, TCAD’98
• Our contribution: divide sequential mapping into steps
  1. Find the best clock period via sequential arrival time computation (Pan et al, FPGA’98)
  2. Run combinational mapping with the resulting arrival/required times of the register outputs/inputs
  3. Perform final retiming to bring the circuit to the best clock period computed in Step 1
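Step 1 can be illustrated with a heavily simplified sketch. It assumes every node costs one unit of delay, ignores cut selection entirely, and assumes every node is reachable from some PI; it is not the algorithm of Pan et al. A clock period phi is treated as feasible if the sequential arrival times, where each register on an edge subtracts phi, converge and do not exceed phi at the outputs. All names are hypothetical.

```python
def feasible(fanins, outputs, phi):
    """Check a clock period by iterating sequential arrival times to a fixpoint.

    fanins maps node -> list of (driver, registers_on_edge); PIs have [].
    """
    arrival = {n: (0 if not fs else float("-inf")) for n, fs in fanins.items()}
    for _ in range(len(fanins) + 1):
        changed = False
        for node, fs in fanins.items():
            if not fs:
                continue
            # Unit node delay; each register on the edge buys one clock period.
            new = 1 + max(arrival[u] - phi * regs for u, regs in fs)
            if new > arrival[node]:
                arrival[node], changed = new, True
        if not changed:                           # converged
            return all(arrival[o] <= phi for o in outputs)
    return False                                  # a loop is too slow for phi


def min_period(fanins, outputs):
    """Smallest integer clock period accepted by `feasible` (linear search)."""
    for phi in range(1, len(fanins) + 1):
        if feasible(fanins, outputs, phi):
            return phi
    return None


# Chain x -> a -> b -> c -[1 register]-> d: retiming can balance it to 2 + 2.
chain = {"x": [], "a": [("x", 0)], "b": [("a", 0)],
         "c": [("b", 0)], "d": [("c", 1)]}
# Loop a -> b -> c -[1 register]-> a: three delays per register.
loop = {"x": [], "a": [("x", 0), ("c", 1)], "b": [("a", 0)], "c": [("b", 0)]}

print(min_period(chain, ["d"]), min_period(loop, ["c"]))
```

In the chain the register can be moved to balance the path, while in the loop the bound is set by the ratio of delay to registers around the cycle.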


Sequential Mapping (continued)

• Advantages
  – Uses priority cuts (L=1) for computing sequential arrival times
    • very fast
  – Reuses efficient area recovery available in combinational mapping
    • almost no degradation in LUT count and register count
  – Greatly simplifies implementation
    • due to not computing sequential cuts (cuts crossing register boundary)

• Quality of results
  – Leads to quality that is better (by ~15%) than combinational mapping followed by retiming
    • due to searching the combined search space
  – Achieves almost the same (-1%) clock period as the general sequential mapping with sequential cuts
    • due to using transparent register boundary without computing sequential cuts


Outline

1. Traditional cut-based technology mapping

2. Improved technology mapping

3. Sequential mapping

4. Other applications of priority cuts

5. Experimental results


Speeding Up SAT Solving

• Perform technology mapping into K-LUTs for area
  – Define area as the number of CNF clauses needed to represent the Boolean function of the cut
  – Run several iterations of area recovery

• Reduced the number of CNF clauses by ~50%
  – Compared to a smart circuit-to-CNF translation (M. Velev)

• Improves SAT solver runtime by 3-10x
  – Experimental results will be given later


Minimizing the Total Number of BDD Nodes Needed to Represent a Boolean Network

• Perform technology mapping into K-LUTs for minimizing area under delay constraints
  – Define area of a cut as the number of BDD nodes needed to represent the Boolean function of the cut
  – Run delay-oriented mapping, followed by several iterations of area recovery


Cut Sweeping

• Reduce the circuit by detecting and merging shallow equivalences (proposed by Niklas Een)
  – By “shallow” equivalences, we mean equivalent points, A and B, for which there exists a K-cut C (K < 16) such that F_A(C) = F_B(C)
  – A subset of “good” K-input priority cuts can be computed
  – The quality of a cut is determined by the number of fanouts of the cut leaves
    • The more fanouts, the more likely the cut is a common cut for two nodes

• Cut sweeping quickly reduces the circuit
  – Typically ~50% of the gain of SAT sweeping (fraiging)

• Cut sweeping is much faster than SAT sweeping
  – Typically 10-100x, for large designs

• Can be used as fast preprocessing for (or a low-cost substitute for) SAT sweeping
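A minimal sketch of the idea, assuming a network of two-input AND nodes without inverters. Each node keeps at most C cuts ranked by the total fanout of their leaves, and two nodes are reported as equivalent when they share a cut with the same truth table. The names and the example network are illustrative.

```python
from itertools import product


def evaluate(node, values, fanins):
    """Evaluate an AND node given 0/1 values for the leaves of one of its cuts."""
    if node in values:
        return values[node]
    left, right = fanins[node]
    return evaluate(left, values, fanins) & evaluate(right, values, fanins)


def truth_table(node, leaves, fanins):
    """Function of `node` over `leaves`, as a tuple of 2^len(leaves) bits."""
    return tuple(evaluate(node, dict(zip(leaves, bits)), fanins)
                 for bits in product((0, 1), repeat=len(leaves)))


def cut_sweep(fanins, order, k, C):
    """Return (node, representative) pairs with identical functions on a common cut."""
    fanout = {n: 0 for n in fanins}
    for fs in fanins.values():
        for f in fs:
            fanout[f] += 1
    cuts, seen, merges = {}, {}, []
    for node in order:                             # topological order
        trivial = frozenset([node])
        if not fanins[node]:
            cuts[node] = [trivial]
            continue
        left, right = fanins[node]
        candidates = {c1 | c2
                      for c1, c2 in product(cuts[left], cuts[right])
                      if len(c1 | c2) <= k}
        # Prefer cuts whose leaves have many fanouts: more likely to be shared.
        ranked = sorted(candidates,
                        key=lambda c: (-sum(fanout[l] for l in c), sorted(c)))[:C]
        for cut in ranked:
            leaves = tuple(sorted(cut))
            key = (leaves, truth_table(node, leaves, fanins))
            if key in seen:
                merges.append((node, seen[key]))   # same function, same cut
                break
            seen[key] = node
        cuts[node] = ranked + [trivial]
    return merges


# n1 = a & (b & c) and n2 = (a & b) & c differ structurally but not functionally.
FANINS = {"a": [], "b": [], "c": [],
          "t1": ["b", "c"], "n1": ["a", "t1"],
          "t2": ["a", "b"], "n2": ["t2", "c"]}
ORDER = ["a", "b", "c", "t1", "n1", "t2", "n2"]
print(cut_sweep(FANINS, ORDER, k=3, C=2))
```

A real implementation would also merge the equivalent nodes and handle complemented edges; this sketch only reports the candidate pairs.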


Sequential Resynthesis for Delay

• Restructure logic along the tightest sequential loops to reduce delay after retiming (Soviani/Edwards, TCAD’07)
  – Similar to sequential mapping
  – Computes sequential arrival times for the circuit
  – Uses the current logic structure, as well as the logic structure transformed using Shannon expansion w.r.t. the latest variables
  – Accepts transforms leading to delay reduction
  – In the end, retimes to the best clock period

• The improvement is 7-60% in delay with 1-12% area degradation (ISCAS circuits)

• This algorithm could benefit from the use of priority cuts


Outline

1. Traditional cut-based technology mapping

2. Improved technology mapping

3. Sequential mapping

4. Other applications of priority cuts

5. Experimental results


Experimental Comparison

• Compare the new mapping against the traditional mapping in terms of
  – Delay
  – Area
  – Runtime
  – Memory

• Compare on large industrial benchmarks with choices

• Analyze the performance of the new mapping for
  – Large designs
  – Large LUTs

• Explore the potential of sequential mapping

• Computer used for experiments
  – IBM ThinkPad laptop with a 1.6 GHz CPU and 2 GB RAM


Priority Cuts vs. Cut Enumeration (C=8)

           K = 4         K = 6         K = 8         K = 10
Ratio      old   new     old   new     old   new     old   new
Depth      1.00  1.00    1.00  1.00    1.00  0.93    1.00  0.82
Area       1.00  0.99    1.00  1.00    1.00  0.96    1.00  0.84
Memory     1.00  0.12    1.00  0.06    1.00  0.05    1.00  0.05
Runtime    1.00  0.78    1.00  0.15    1.00  0.02    1.00  0.03

Used a set of large public benchmarks


Priority Cuts vs. Cut Enumeration (K=6, C=16)

[Figure: two panels, "Mapping w/o choices" and "Mapping with choices", each comparing priority cuts with cut enumeration.]

Used a set of large industrial benchmarks


Performance on Large Designs (C=1)

AIG statistics                FPGA mapping statistics    Computer resources
Frames   Levels   Nodes       Depth   Number of LUTs     Memory, MB   Runtime, sec
1        18       40381       4       11069              2.21         0.02
20       284      808135      61      205143             42.68        0.42
40       564      1616285     121     409149             85.28        0.84
60       844      2424435     181     613155             127.88       1.35
80       1124     3232585     241     817161             170.48       1.77
100      1404     4040735     301     1021167            213.09       2.25

Using design wb_conmax.v (part of the IWLS 2005 benchmarks)

This is a WISHBONE Interconnect Matrix IP core. It can interconnect up to 8 Masters and 16 Slaves.

Source: http://www.opencores.org


Performance for Large LUTs (C=1)

            FPGA mapping statistics    Computer resources
LUT size    Depth   Number of LUTs     Memory, MB   Runtime, sec
4           602     2279062            114.74       1.89
6           451     1704400            147.52       2.00
8           352     1205319            180.30       2.19
10          301     1021167            213.09       2.24
12          276     1044370            245.87       2.50
14          227     799618             278.65       2.55
16          202     694954             311.43       2.62

Using 100 timeframes of design wb_conmax.v


Sequential Mapping (K=6, C=8)

          Statistics     Depth (LUTs)      Area (LUTs)       Area (registers)  Time, sec
Name      PI  PO   AIG   M     M+R   MR    M     M+R   MR    M     M+R   MR    M     MR
s13207    31  121  2136  6     5     4     1047  1047  1056  648   666   733   0.06  0.23
s1423     17  5    441   10    10    9     131   131   146   74    74    80    0.01  0.04
s15850.1  77  150  2755  9     7     6     1012  1012  1042  516   552   533   0.09  0.38
s15850    14  87   2760  9     7     5     1002  1002  1015  563   640   640   0.09  0.43
s35932    35  320  8129  3     3     2     2320  2320  2320  1728  1728  1872  0.19  0.45
s382      3   6    100   3     3     2     36    36    34    21    21    22    0.00  0.04
s38417    28  106  8171  6     6     5     2623  2623  2901  1564  1564  1636  0.28  3.02
s38584.1  38  304  9967  6     6     5     2491  2491  2558  1276  1276  1299  0.31  0.81
s38584    12  278  9989  6     6     5     2504  2504  2517  1301  1301  1327  0.31  0.92
s9234.1   36  39   1349  5     5     3     319   319   332   145   145   171   0.03  0.10
s9234     19  22   1349  5     4     3     321   321   330   160   181   182   0.02  0.14
Ratio                    1.00  0.93  0.71  1.00  1.00  1.03  1.00  1.03  1.08  1.00  4.54

Used a subset of ISCAS benchmarks for which retiming reduced delay


Summary

• Reviewed traditional technology mapping
  – Cut computation
  – Optimum-depth mapping
  – Area recovery

• Presented an improved approach to mapping
  – Computes a small number of cuts at each node
  – Uses new ideas to dramatically reduce memory and runtime

• Reported experimental results
  – Compared priority cuts with exhaustive cut enumeration
    • Delay and area are comparable or better by 1-3%
    • Memory and runtime are greatly reduced (5x for 6-LUTs)
  – Showed performance on very large designs (2 sec to map 1M)
  – Compared combinational and sequential mapping

• Implemented in ABC
  – Google: “abc berkeley” (package “if”)


The End