Outline - University of California, Los Angeles (cadlab.cs.ucla.edu/~cong/CS258F/chap3.pdf)
CS258F 2002F
Prof. J. Cong 1
Jason Cong 1
Jason Cong 2
Outline
• Circuit partitioning formulation
• Importance of circuit partitioning
• Partitioning algorithms
• Circuit clustering formulation
• Clustering algorithms
Jason Cong 3
Circuit Partitioning Formulation
Bi-partitioning formulation: minimize interconnections between partitions
• Minimum cut: min c(X, X')
• Minimum bisection: min c(X, X') with |X| = |X'|
• Minimum ratio-cut: min c(X, X') / (|X|·|X'|)
[Figure: partitions X and X' with cut c(X, X')]
Jason Cong 4
A Bi-Partitioning Example
[Figure: six modules a-f joined by heavy (weight-100) intra-cluster edges and light inter-cluster edges; three cuts are marked: min-cut (size 13), min-bisection (size 300), min-ratio-cut (size 19)]
Ratio-cut helps to identify natural clusters
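The three metrics can be checked by brute force on a toy graph. A minimal sketch (the six-node instance below is illustrative, with weights chosen to mimic two heavy clusters joined by light edges; it is not the slide's exact figure):

```python
from itertools import combinations

def cut_cost(edges, part):
    # total weight of edges with exactly one endpoint in `part`
    return sum(w for (u, v, w) in edges if (u in part) != (v in part))

def best_partitions(nodes, edges):
    """Enumerate all bipartitions of a small graph and report the
    minimizers of cut, bisection cut, and ratio cut."""
    best = {"cut": None, "bisection": None, "ratio": None}
    n = len(nodes)
    for k in range(1, n):                      # size of side X
        for part in combinations(nodes, k):
            X = set(part)
            c = cut_cost(edges, X)
            if best["cut"] is None or c < best["cut"][0]:
                best["cut"] = (c, X)
            if 2 * k == n and (best["bisection"] is None or c < best["bisection"][0]):
                best["bisection"] = (c, X)
            r = c / (k * (n - k))              # ratio cut: c(X,X') / (|X||X'|)
            if best["ratio"] is None or r < best["ratio"][0]:
                best["ratio"] = (r, X)
    return best

# two weight-100 triangles joined by lighter bridge edges
nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a","b",100), ("a","c",100), ("b","c",100),
         ("d","e",100), ("d","f",100), ("e","f",100),
         ("c","d",9), ("b","e",10)]
res = best_partitions(nodes, edges)
```

Here all three metrics agree on the natural split {a, b, c} | {d, e, f}; with different weights (as on the slide) the three minimizers can differ.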
Jason Cong 5
Circuit Partitioning Formulation (Cont’d)
General multi-way partitioning formulation:
Partition a network N into N1, N2, ..., Nk such that
• each partition satisfies an area constraint:
    ∑_{v ∈ Ni} a(v) ≤ Ai
• each partition satisfies an I/O constraint:
    c(Ni, N - Ni) ≤ Ii
Minimize the total interconnection:
    ∑_i c(Ni, N - Ni)
Jason Cong 6
Importance of Circuit Partitioning
• Divide-and-conquer methodology
    The most effective way to solve problems of high complexity
    E.g.: min-cut based placement, partitioning-based test generation, ...
• System-level partitioning for multi-chip designs
    Inter-chip interconnection delay dominates system performance
• Circuit emulation/parallel simulation
    Partition large circuit into multiple FPGAs (e.g. Quickturn), or multiple special-purpose processors (e.g. Zycad)
• Parallel CAD development
    Task decomposition and load balancing
• In deep-submicron designs, partitioning defines local and global interconnect, and has significant impact on circuit performance
Jason Cong 7
Partitioning Algorithms
• Iterative partitioning algorithms
• Spectral based partitioning algorithms
• Net partitioning vs. module partitioning
• Multi-way partitioning
• Multi-level partitioning (to be discussed after clustering)
• Further study in partitioning techniques
Jason Cong 8
Iterative Partitioning Algorithms
• Greedy iterative improvement method
    [Kernighan-Lin 1970]
    [Fiduccia-Mattheyses 1982]
    [Krishnamurthy 1984]
• Simulated annealing
    [Kirkpatrick-Gelatt-Vecchi 1983]
    [Greene-Supowit 1984]
    (See Appendix; SA will be formally introduced in the Floorplan chapter)
Jason Cong 9
Kernighan-Lin's Algorithm
• Pair-wise exchange of nodes to reduce cut size
• Allow cut size to increase temporarily within a pass
    Compute the gain of a swap
    Repeat
        Perform a feasible swap of max gain
        Mark swapped nodes "locked"
        Update swap gains
    Until no feasible swap
    Find max prefix partial sum in the gain sequence g1, g2, ..., gm
    Make the corresponding swaps permanent
• Start another pass if the current pass reduces the cut size (usually converges after a few passes)
[Figure: u and v are swapped across the cut and become locked]
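The pass structure above can be sketched directly. A minimal, unoptimized sketch of one pass on an unweighted graph (function and variable names are mine; gains are recomputed from scratch, so this is the slow O(n^3) form, before the FM speedups on the next slide):

```python
def kl_pass(adj, A0, B0):
    """One Kernighan-Lin pass. adj: node -> set of neighbors;
    A0, B0: the two sides of the initial bipartition."""
    A, B = set(A0), set(B0)
    locked, gains, swaps = set(), [], []
    def D(v, own, other):
        # external cost minus internal cost of v
        return sum(u in other for u in adj[v]) - sum(u in own for u in adj[v])
    for _ in range(min(len(A), len(B))):
        # gain of swapping a and b: D(a) + D(b) - 2*(a,b adjacent)
        cand = [(D(a, A, B) + D(b, B, A) - 2 * (b in adj[a]), a, b)
                for a in A - locked for b in B - locked]
        g, a, b = max(cand)
        A.remove(a); B.remove(b); A.add(b); B.add(a)   # tentative swap
        locked |= {a, b}
        gains.append(g); swaps.append((a, b))
    # keep the prefix of swaps with the largest positive partial gain sum
    prefix, best, total = 0, 0, 0
    for i, g in enumerate(gains, 1):
        total += g
        if total > best:
            best, prefix = total, i
    A, B = set(A0), set(B0)
    for a, b in swaps[:prefix]:                        # make prefix permanent
        A.remove(a); B.remove(b); A.add(b); B.add(a)
    return A, B, best
```

On two triangles bridged by one edge, a single pass starting from a poor bipartition recovers the natural split with total gain 4.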
Jason Cong 10
Fiduccia-Mattheyses' Improvement
• Each pass in the KL algorithm takes O(n^3) or O(n^2 log n) time (n: #modules)
    Choosing the swap with max gain and updating swap gains take O(n^2) time
• The FM algorithm takes O(p) time per pass (p: #pins)
• Key ideas in the FM algorithm:
    - Each move affects only a few moves
      => constant time gain updating per move (amortized)
    - Maintain a list of gain buckets
      => constant time selection of the move with max gain
• Further improvement by Krishnamurthy: look-ahead in gain computation
[Figure: moving u1 from V1 to V2; gain buckets indexed from gmax down to -gmax]
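The gain-bucket idea can be sketched as a small data structure (a simplified sketch, not the exact FM implementation; names are mine). The key invariant is that between pops the max-gain pointer only moves downward, which is where the amortized O(1) selection comes from:

```python
from collections import defaultdict

class GainBuckets:
    """Bucket array keyed by gain, as in Fiduccia-Mattheyses: selecting a
    max-gain free cell and updating a cell's gain are O(1) amortized."""
    def __init__(self, pmax):
        self.pmax = pmax                  # gains lie in [-pmax, pmax]
        self.buckets = defaultdict(set)   # gain -> set of free cells
        self.gain = {}                    # cell -> current gain
        self.max_gain = -pmax
    def insert(self, cell, g):
        self.buckets[g].add(cell)
        self.gain[cell] = g
        self.max_gain = max(self.max_gain, g)
    def update(self, cell, delta):
        # O(1): move the cell between two buckets
        g = self.gain[cell]
        self.buckets[g].discard(cell)
        self.insert(cell, g + delta)
    def pop_max(self):
        # pointer only moves down between pops; raises KeyError if empty
        while not self.buckets[self.max_gain] and self.max_gain > -self.pmax:
            self.max_gain -= 1
        cell = self.buckets[self.max_gain].pop()
        del self.gain[cell]
        return cell, self.max_gain
```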
Jason Cong 11
Partitioning Algorithms
• Iterative partitioning algorithms
• Spectral based partitioning algorithms
• Net partitioning vs. module partitioning
• Multi-way partitioning
• Further study in partitioning techniques
Jason Cong 12
Spectral Based Partitioning Algorithms
[Figure: example graph on nodes a, b, c, d with weighted edges; its adjacency matrix A and diagonal degree matrix D are shown]
D: degree matrix; A: adjacency matrix; D-A: Laplacian matrix
The eigenvalues and eigenvectors of D-A form the Laplacian spectrum of G
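A minimal sketch of building D, A, and Q = D - A in pure Python, together with the quadratic form x^T Q x used on the following slides (function names are mine):

```python
def laplacian(n, weighted_edges):
    """Build degree matrix D, adjacency matrix A, and Q = D - A
    for an undirected graph on nodes 0..n-1, as nested lists."""
    A = [[0.0] * n for _ in range(n)]
    for i, j, w in weighted_edges:
        A[i][j] += w
        A[j][i] += w
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        D[i][i] = sum(A[i])               # weighted degree of node i
    Q = [[D[i][j] - A[i][j] for j in range(n)] for i in range(n)]
    return D, A, Q

def quad_form(Q, x):
    # x^T Q x = sum_{i,j} q_ij x_i x_j
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
```

For a partition vector x with entries +1/-1, x^T Q x equals 4 times the cut weight, which is exactly the identity derived two slides ahead.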
Jason Cong 13
Some Applications of Laplacian Spectrum
• Placement and floorplan [Hall 1970] [Otten 1982] [Frankle-Karp 1986] [Tsay-Kuh 1986]
• Bisection lower bound and computation [Donath-Hoffman 1973] [Barnes 1982] [Boppana 1987]
• Ratio-cut lower bound and computation [Hagen-Kahng 1991] [Cong-Hagen-Kahng 1992]
Jason Cong 14
Eigenvalues and Eigenvectors
If Ax = λx, then λ is an eigenvalue of A, and x is an eigenvector of A w.r.t. λ
(note that Kx is also an eigenvector, for any nonzero constant K)

    Ax = [ a11·x1 + a12·x2 + ... + a1n·xn ]
         [ ...                           ]
         [ an1·x1 + an2·x2 + ... + ann·xn ]
Jason Cong 15
A Basic Property
Expanding the matrix-vector product row by row:

    x^T A x = ∑_{i,j} a_ij · x_i · x_j
Jason Cong 16
Basic Idea of Laplacian Spectrum Based Graph Partitioning
• Given a bisection (X, X'), define a partitioning vector
      x = (x1, x2, ..., xn)  s.t.  xi = 1 if i ∈ X,  xi = -1 if i ∈ X'
  Clearly x ⊥ 1 and x ≠ 0, where 1 = (1, 1, ..., 1) and 0 = (0, 0, ..., 0)
• For a partition vector x:
      ∑_{(i,j)∈E} (xi - xj)^2 = 4 · C(X, X')
• Let S = {x | x ⊥ 1 and x ≠ 0}. Finding the best partition vector x minimizing the total edge cut C(X, X') is relaxed to finding x in S such that ∑_{(i,j)∈E} (xi - xj)^2 is minimum
• Linear placement interpretation: place module i at coordinate xi and minimize the total squared wirelength
[Figure: modules placed on a line at positions x1, x2, x3, ..., xn]
Jason Cong 17
Property of Laplacian Matrix
If x is a partition vector:

    x^T Q x = x^T D x - x^T A x
            = ∑_i d_i x_i^2 - ∑_{i,j} a_ij x_i x_j
            = ∑_{(i,j)∈E} a_ij (x_i - x_j)^2        ( 1 )
            = 4 · C(X, X')

(this is the squared wirelength of the linear placement x1, x2, ..., xn)
Therefore, we want to minimize x^T Q x
Jason Cong 18
Property of Laplacian Matrix (Cont'd)
( 2 ) Q is symmetric and positive semi-definite, i.e.
    (i)  x^T Q x = ∑_{(i,j)∈E} a_ij (x_i - x_j)^2 ≥ 0
    (ii) all eigenvalues of Q are ≥ 0
( 3 ) The smallest eigenvalue of Q is 0; the corresponding eigenvector is x0 = (1, 1, ..., 1)
    (not interesting: all modules overlap, and x0 ∉ S)
( 4 ) By the Courant-Fischer minimax principle, the 2nd smallest eigenvalue satisfies:
    λ = min_{x ∈ S} (x^T Q x) / |x|^2
Jason Cong 19
Results on Spectral Based Graph Partitioning
• Min bisection cost: c(X, X') ≥ n·λ/4
• Min ratio-cut cost: c(X, X') / (|X|·|X'|) ≥ λ/n  (homework)
• The eigenvector of the second smallest eigenvalue gives the best linear placement
• Compute the best bisection or ratio-cut based on the second smallest eigenvector
Jason Cong 20
Computing the Eigenvector
• Only need one eigenvector (the second smallest one)
• Q is symmetric and sparse
• Use the block Lanczos algorithm
Jason Cong 21
Interpreting the Eigenvector
• Linear ordering of modules by eigenvector value
• Try all splitting points, choose the best one using the bisection or ratio-cut metric
[Figure: modules in eigenvector order (e.g. 23, 10, 22, 34, 6, 21, 8) with candidate splitting points]
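The split-point sweep can be sketched as follows (a non-incremental O(n·m) scan for clarity; an incremental cut update as each node crosses the split line would bring it to O(n + m); names are mine):

```python
def sweep_ratio_cut(order, edges):
    """Given a linear ordering of the modules (e.g. by eigenvector value),
    try every splitting point and return the one minimizing the ratio cut
    c(X, X') / (|X| * |X'|)."""
    pos = {v: i for i, v in enumerate(order)}
    n = len(order)
    best = None
    for k in range(1, n):                  # X = first k modules of the ordering
        cut = sum(w for u, v, w in edges if (pos[u] < k) != (pos[v] < k))
        ratio = cut / (k * (n - k))
        if best is None or ratio < best[0]:
            best = (ratio, k)
    return best                            # (best ratio, split position)
```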
Jason Cong 22
Experiments on Special Graphs
• GBUI(2n, d, b) [Bui-Chaudhuri-Leighton 1987]
    2n nodes, d-regular, min-bisection ≈ b
• GGAR(m, n, Pint, Pext) [Garbe-Promel-Steger 1990]
    m clusters, n nodes in each cluster
    Pint: intra-cluster edge probability
    Pext: inter-cluster edge probability
[Figure: m clusters of n nodes each]
Jason Cong 23
GBUI Results
[Figure: eigenvector value Xi plotted against module i; modules separate cleanly into Cluster 1 and Cluster 2]
Jason Cong 24
GGAR Results
[Figure: eigenvector value Xi plotted against module i; modules separate into Clusters 1 through 4]
Jason Cong 25
Partitioning Algorithms
• Iterative partitioning algorithms
• Spectral based partitioning algorithms
• Net partitioning vs. module partitioning
• Multi-way partitioning
• Further study in partitioning techniques
Jason Cong 26
Transform Netlists to Graphs
• Transform each r-pin net into an r-clique
• Weight each edge in the r-clique by:
    1/(r-1)  (simple scaling), or
    2/r      (edge probability), or
    4/r^2    (bounds each net's contribution to the cut size)
• Transforming multi-pin nets to cliques destroys the sparsity of D-A
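A minimal sketch of the clique expansion with the three weighting options (the model-name strings are my own labels):

```python
def net_to_clique_edges(net, model="1/(r-1)"):
    """Expand one r-pin net into weighted clique edges under one of the
    three weighting schemes from the slide."""
    r = len(net)
    w = {"1/(r-1)": 1.0 / (r - 1),
         "2/r":     2.0 / r,
         "4/r^2":   4.0 / r**2}[model]
    # all r*(r-1)/2 pin pairs get the same weight
    return [(net[i], net[j], w) for i in range(r) for j in range(i + 1, r)]
```

Note the trade-off the slide alludes to: under 1/(r-1) scaling, a 4-pin net becomes 6 edges of weight 1/3, so a single hyperedge can contribute up to 4/3 (not 1) to a cut, while the 4/r^2 scheme bounds that contribution at the cost of underweighting uncut nets.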
Jason Cong 27
Net Partitioning vs. Module Partitioning
• Dual netlist representation -- the intersection graph
    Node: net
    Edge: two nets share a common gate
• The intersection graph also captures circuit connectivity, but does not use hyperedges
[Figure: netlist N with gates G1-G7 and nets a-j, and its intersection graph I(N)]
Jason Cong 28
Transform Net Partition to Module Partition
[Figure: a net bipartition (Y, Y') of the intersection graph I(N) induces module sets X ⊆ N and X'; the conflict graph B(Y, Y') is bipartite]
Every edge in B(Y, Y') indicates a conflict:
    G3 is wanted by both c and g
    G4 is wanted by both d and e
    G5 is wanted by both d and f
Jason Cong 29
Some Definitions
• Maximum Independent Set (MIS): maximum subset of vertices with no two connected
• Minimum Vertex Cover (MVC): minimum subset of vertices which "covers" all edges
[Figure: an MIS and an MVC of a small graph]
Jason Cong 30
Resolving Net Conflicts When Transforming a Net Partition to a Module Partition
• Some nets in B(Y, Y') will lose
    Each losing net corresponds to a cut net in the module partition
• Objective: maximize the # of winning nets, or equivalently, minimize the # of losing nets
• Theorem: the set of winning nets forms an independent set in B(Y, Y'), and the set of losers forms a vertex cover
    => Find a max independent set in B(Y, Y')
    => Solved optimally in polynomial time for bipartite graphs
Jason Cong 31
Bipartite Graph Properties
• |MIS| + |MVC| = |V|
• The size of any vertex cover is no smaller than that of a maximum matching
• |MVC| = |MM|  (MM: maximum matching; Konig's theorem)
    => Goal: find an MVC of size |MM|, or equivalently, an MIS of size |V| - |MM|
    MIS = winners, MVC = losers
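The |MVC| = |MM| relation is constructive: grow alternating paths from the unmatched left vertices, exactly the idea behind the alternating-path and Odd/Even-set slides that follow. A sketch (simple augmenting-path matching, adequate for small graphs; names are mine, and Hopcroft-Karp would be faster on large instances):

```python
def max_matching(L, R, adj):
    """Augmenting-path maximum matching; adj maps each left vertex to
    its list of right neighbors."""
    match_r = {}                                 # right vertex -> matched left
    def try_augment(l, seen):
        for r in adj.get(l, []):
            if r in seen:
                continue
            seen.add(r)
            if r not in match_r or try_augment(match_r[r], seen):
                match_r[r] = l
                return True
        return False
    for l in L:
        try_augment(l, set())
    return match_r

def min_vertex_cover(L, R, adj):
    """Konig's construction: alternating BFS from unmatched left vertices;
    MVC = (L - visited_L) | visited_R, and MIS = V - MVC."""
    match_r = max_matching(L, R, adj)
    match_l = {l: r for r, l in match_r.items()}
    visited_l = {l for l in L if l not in match_l}   # unmatched left vertices
    visited_r = set()
    frontier = list(visited_l)
    while frontier:
        l = frontier.pop()
        for r in adj.get(l, []):                     # unmatched edge to the right
            if r not in visited_r:
                visited_r.add(r)
                l2 = match_r.get(r)                  # matched edge back left
                if l2 is not None and l2 not in visited_l:
                    visited_l.add(l2)
                    frontier.append(l2)
    mvc = (set(L) - visited_l) | visited_r
    mis = (set(L) | set(R)) - mvc
    return mvc, mis
```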
Jason Cong 32
Finding an MIS
[Figure: bipartite graph with sides L and R; a maximum matching (MM) is shown, with unmatched vertices UL and UR]
Jason Cong 33
Finding an MIS
[Figure: alternating paths grown from the unmatched vertices UL and UR]
Jason Cong 34
Finding an MIS
[Figure: Odd and Even sets -- Odd(L), Odd(R), Even(L), Even(R) -- defined by alternating-path distance from the unmatched vertices]
Even(L)
Jason Cong 35
Improving the Module Partition
Net1 = {a, b}   Net2 = {b, c}   Net3 = {a, e}
Net4 = {b, d}   Net5 = {e, f}   Net6 = {g, e}
Even(L) = {Net1, Net2},  ModSL = {a, b, c}
Even(R) = {Net5, Net6},  ModSR = {e, f, g}
ModSL + {d} cuts 1 net
ModSR + {d} cuts 2 nets
Jason Cong 36
Algorithm IG-Match
• Get spectral linear ordering
• Construct bipartite graph B
• Find MIS of B
• Convert MIS into module partition
Theorem: IG-Match computes all |V|-1 module partitions in O(|V|(|V|+|E|)) time
Key idea: incremental updating of the bipartite graph and alternating paths
Jason Cong 37
Experimental Results

          Wei-Cheng            IG-Match
NAME      Areas / Cutsize      Areas / Cutsize     Improvement
bm1       9:873 / 1            21:861 / 1          57%
19ks      1011:1833 / 109      650:2194 / 85       -1%
Prim1     152:681 / 14         154:679 / 14        1%
Prim2     1132:1882 / 123      740:2274 / 77       21%
Test02    372:1291 / 95        211:1452 / 38       37%
Test03    147:1460 / 31        803:804 / 58        38%
Test04    401:1114 / 51        73:1442 / 6         50%
Test05    1204:1391 / 110      105:1442 / 6        53%
Test06    145:1607 / 18        141:1611 / 17       3%

• 28.8% average improvement
Jason Cong 38
Further Study in Partitioning Techniques
• Module replication in circuit partitioning [Kring-Newton 1991; Hwang-ElGamal 1992; Liu et al., TCAD'95; Enos et al., TCAD'99]
• Generating uni-directional partitioning [Iman-Pedram-Fabian-Cong 1993] or acyclic partitioning [Cong-Li-Bagrodia, DAC'94] [Cong-Lim, ASPDAC'2000]
• Logic restructuring during partitioning [Iman-Pedram-Fabian-Cong 1993]
• Communication based partitioning [Hwang-Owens-Irwin 1990; Beardslee-Lin-Sangiovanni 1992]
• Multi-level partitioning (will discuss after clustering)
Jason Cong 39
Multi-Way Partitioning
• Recursive bi-partitioning [Kernighan-Lin 1970]
• Generalization of Fiduccia-Mattheyses' and Krishnamurthy's algorithms [Sanchis 1989] [Cong-Lim, ICCAD'98]
• Generalization of ratio-cut and spectral methods to multi-way partitioning [Chan-Schlag-Zien 1993]
    - generalized ratio-cut value = sum of the flux of each partition
    - generalized ratio-cut cost of a k-way partition ≥ sum of the k smallest eigenvalues of the Laplacian matrix
Jason Cong 40
Circuit Clustering Formulation
Motivation:
• Reduce the size of flat netlists
• Identify natural circuit hierarchy
Objectives:
• Maximize the connectivity of each cluster
• Minimize the size, delay (or simply depth), and density of clustered circuits
Jason Cong 41
Measuring Cluster Connectivity
• Simple density measurements (cluster with n nodes and m edges)
    m/n -- include a new node when it connects to more than m/n edges
    2m/(n(n-1)) -- include a new node when it connects to more than 2m/(n-1) edges; biased for small clusters
• (k, l)-connectivity [Garbe-Promel-Steger 1990]
    Every two nodes in the cluster are connected by at least k edge-disjoint paths of length no more than l
    Proper choice of k and l may not be easy
• D/S-metric [Cong-Hagen-Kahng 1991]
    D/S = Degree / Separation
    Degree = average degree in the cluster
    Separation = average shortest path between a pair of nodes in the cluster
    Robust but expensive to compute
[Figure: k edge-disjoint paths P1, ..., Pk of length ≤ l between nodes u and v]
Jason Cong 42
Clustering Algorithms
• (k, l)-connectivity algorithms [Garbe-Promel-Steger 1990]
• Random-walk based algorithm [Cong-Hagen-Kahng 1991; Hagen-Kahng 1992]
• Multicommodity-flow based algorithm [Yeh-Cheng-Lin 1992]
• Clique based algorithm [Bui 1989; Cong-Smith 1993]
• Clustering for delay minimization (combinational circuits) [Lawler-Levitt-Turner 1969; Murgai-Brayton-Sangiovanni 1991; Rajaraman-Wong 1993]
• Clustering for delay minimization (with consideration of retiming for sequential circuits) [Pan et al., TCAD'99; Cong et al., DAC'99]
• Signal flow based clustering [Cong-Ding, DAC'93; Cong et al., ICCAD'97]
• Multi-level clustering [Karypis-Kumar, TR95/DAC97; Cong-Sung, ASPDAC'2000]
Jason Cong 43
Lawler's Labeling Algorithm [Lawler-Levitt-Turner 1969]
• Assumptions: cluster size ≤ K; intra-cluster delay = 0; inter-cluster delay = 1
• Objective: find a clustering of minimum delay
• Algorithm:
    Phase 1: label all nodes in topological order
        For each PI node v, L(v) = 0
        For each non-PI node v:
            p = maximum label of predecessors of v
            Xp = set of predecessors of v with label p
            if |Xp| < K then L(v) = p else L(v) = p + 1
    Phase 2: form clusters
        Start from the POs to generate the necessary clusters
        Nodes with the same label form a cluster
[Figure: labeling example around node v and its predecessor set Xp]
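Phase 1 can be sketched as follows (a sketch under my naming; it assumes nodes arrive in topological order, and "predecessors" means all transitive predecessors, as in the Lawler-Levitt-Turner labeling):

```python
def lawler_label(nodes, preds, K):
    """Phase 1 of the Lawler-Levitt-Turner labeling.
    nodes: in topological order; preds[v]: direct predecessors of v;
    K: cluster size bound. Returns the label L(v) of every node."""
    L = {}
    tpred = {}                                  # transitive predecessor sets
    for v in nodes:
        tpred[v] = set(preds[v])
        for u in preds[v]:
            tpred[v] |= tpred[u]
        if not preds[v]:                        # primary input
            L[v] = 0
            continue
        p = max(L[u] for u in tpred[v])         # max label among predecessors
        Xp = [u for u in tpred[v] if L[u] == p]
        L[v] = p if len(Xp) < K else p + 1      # bump label if Xp won't fit
    return L
```

On a 4-node chain with K = 2, the label increments exactly when the set of max-label predecessors outgrows a cluster.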
Jason Cong 44
Lawler's Labeling Algorithm (Cont'd)
• Performance of the algorithm
    - Efficient run-time
    - Minimum delay clustering solution
    - Allows node duplication
    - No attempt to minimize the number of clusters
• Extension to allow arbitrary gate delays
    - Heuristic solution [Murgai-Brayton-Sangiovanni 1991]
    - Optimal solution [Rajaraman-Wong 1993]
Jason Cong 45
Signal Flow and Logic Dependency
• Beyond connectivity
    - identifies logically dependent cells
    - useful in guiding
        • clustering
        • partitioning [CoLB94, CoLL+97]
        • placement [CoXu95]
Jason Cong 46
Maximum Fanout Free Cone (MFFC)
• Definition: for a node v in a combinational circuit,
    - cone of v (Cv): v and all of its predecessors such that any path connecting a node in Cv and v lies entirely in Cv
    - fanout free cone at v (FFCv): cone of v such that for any node u ≠ v in FFCv, output(u) ⊆ FFCv
    - maximum FFC at v (MFFCv): FFC of v such that for any non-PI node w, if output(w) ⊆ MFFCv then w ∈ MFFCv
Jason Cong 47
Properties of MFFCs
• If w ∈ MFFCv, then MFFCw ⊆ MFFCv [CoDi93]
• Two MFFCs are either disjoint or one contains the other [CoDi93]
• Area-optimal duplication-free LUT mapping can map each MFFC independently to get an optimal solution [CoDi93]
• For acyclic bipartitioning without area constraint, there exists an optimal cut which does not cut through any MFFCs [CoLB94]
Jason Cong 48
Maximum Fanout Free Subgraph (MFFS)
• Definition: for a node v in a sequential circuit,
    FFSv = {u | every path from u to some PO passes through v}
    MFFSv = the maximum such set at v
• Illustration
[Figure: MFFCs vs. the MFFS on a sequential circuit]
Jason Cong 49
MFFS Construction Algorithm
• For a single MFFS at node v
    - select root node v and cut all its fanout edges
    - mark all nodes reachable backwards from all POs
    - MFFSv = {unmarked nodes}
    - complexity: O(|N| + |E|)
[Figure: construction steps at node v]
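The single-MFFS construction above can be sketched directly (netlist representation and names are mine):

```python
def mffs(nodes, fanouts, POs, v):
    """Single MFFS at v: cut all of v's fanout edges, mark everything
    reachable backwards from the POs, and return the unmarked nodes."""
    # derive fanin lists from fanouts, with v's fanout edges removed
    fanin = {u: set() for u in nodes}
    for u, outs in fanouts.items():
        if u == v:
            continue                       # cut all fanout edges of v
        for w in outs:
            fanin[w].add(u)
    # backward reachability from the POs (v itself is never a seed)
    marked = set()
    stack = [p for p in POs if p != v]
    while stack:
        u = stack.pop()
        if u in marked:
            continue
        marked.add(u)
        stack.extend(fanin[u])
    return set(nodes) - marked             # unmarked nodes form MFFSv
```

In the test below, a and b feed only c, so every a/b-to-PO path passes through c and MFFSc = {a, b, c}.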
Jason Cong 53
MFFS Clustering Algorithm• Clusters Entire Netlist
– construct MFFS at a PO and remove it from netlist– include its inputs as new POs– repeat until all nodes are clustered– complexity : O(|N| · (|N| + |E|))
v
Jason Cong 58
Speedup of MFFS Clustering
• Approximation of MFFS
    - a cluster size limit is imposed in practice
    - perform MFFS construction on a subcircuit SC(v, h)
        • defined by search depth h for a given node v
        • internal nodes: all nodes visited by BFS of depth h from v
        • pseudo PIs/POs: all I/Os from/to nodes not in SC(v, h)
[Figure: subcircuit SC(v, h) of depth h around v, with pseudo PIs and pseudo POs at its boundary]
Jason Cong 60
LSR/MFFS Partitioning Algorithm
• Basic approach: clustering based partitioning
    - MFFS clustering: exploits logic dependency
    - LSR partitioning: focuses on loose and stable net removal
• Overview of the algorithm
    - cluster the netlist using the MFFS algorithm
    - partition the clustered netlist with the LSR algorithm
    - completely decompose the netlist
    - refine the flat netlist with the LSR algorithm
Jason Cong 61
Loose Net Removal (LR)
• Net status transition: free net -> loose net -> locked net
• Loose net removal
[Figure: blocks 0, 1, 2 with the free cells of a loose net being pulled into one block until the net is uncut]
Jason Cong 66
LR Implementation
• Basic approach
    - increase the gain of free cells of loose nets outside the anchor
    - focus on smaller nets
• Cell gain increase function
    - accumulate: until complete removal
    - set threshold: array-based bucket can be used
    - expanded gain range: no tie-breaking scheme (LIFO, LA3) needed

    incr(n) = (max_net_size / size(n)) · (∑_{c ∈ Ln} deg(c)) / (∑_{c ∈ Fn} deg(c))

    (Ln, Fn: locked and free cells of net n)
Jason Cong 67
Stable Net Transition (SNT)
• Stable net [Shibuya et al. 1995]
    - remains cut throughout an entire run
    - more than 80% of cut nets are stable
• Implementation
    - detect stable nets by comparing cutsets
    - move all unprocessed cells of a stable net into a single block
    - serve as the initial partition for the next run
        • fast convergence
        • efficiently explores more local minima
• Loose and Stable net Removal (LSR)
    - combination of two effective net removal schemes
        • LR: small loose nets
        • SNT: large stable nets
Jason Cong 68
Summary of LSR/MFFS Algorithm
[Flowchart: START -> cluster netlist with MFFS -> partition netlist with LSR -> (gain? yes: repartition) -> decluster netlist -> refine netlist with LSR -> (gain? yes: re-refine) -> END]
Jason Cong 69
Clustering Results
• MCNC benchmarks with signal flow information (AVR = max_area / min_area)

                         Optimal           Heuristic
NAME      SIZE    AVR   # CLST    TIME    # CLST   TIME
s1423      619    1.0      193     0.9       168    1.9
sioo       664    4.6      442     2.3       442    2.6
s1488      686    1.0      187     0.9       187    1.4
balu       801   12.9      287     2.0       284    3.8
prim1      833    6.7      394     3.2       379    5.9
struct    1952    2.5     1183    17.9      1183   12.1
prim2     3014    6.7     1404    34.8      1243   22.1
s9234     5866    1.0     3203    36.7      3177    9.7
biomed    6514    4.5     2142    87.8      1853   30.5
s13207    8772    1.0     2132    73.9      1994   14.1
s15850   10470    1.0     2279    84.9      2137   17.6
s35932   18148    1.0     5562   420.8      2943   32.1
s38584   20995    1.0     5139   565.1      4242   44.5
avq.sm   21918    4.5     8477  1287.3      8309  116.1
s38417   23949    1.0     5906   452.2      5295   45.1
avq.lg   25178    4.5     9103  1473.2      8658   90.2
TOTAL                    48033  4543.9     42494  449.7
Jason Cong 70
Cutsize Reduction by MFFS Clustering
• Bipartitioning under a 45-55% balance criterion
[Chart: total cutsize for FM, SNT, LR, LSR on flat vs. MFFS-clustered netlists; improvements over flat FM are 0.0%, -8.0%, -45.6%, -46.9% (flat) and -25.8%, -31.7%, -48.4%, -49.9% (MFFS)]
Jason Cong 71
Cutsize Reduction by MFFS Clustering

              No Clustering                MFFS Clustering
NAME       FM   SNT    LR   LSR         FM   SNT    LR   LSR
s1423      12    14    13    13         12    12    12    12
sioo       25    25    25    25         25    25    25    25
s1488      48    45    44    42         42    42    43    43
balu       27    27    27    27         27    27    27    27
prim1      43    42    42    42         43    42    42    43
struct     42    45    33    33         39    52    33    33
prim2     174   167   121   122        133   122   131   119
s9234      49    54    41    43         42    43    41    40
biomed     83    83    83    83        100    84    84    84
s13207     90    74    75    70         91    65    73    61
s15850    113    73    73    65         72    70    44    43
s35932    100   128    45    44         48    76    45    44
s38584     52    52    49    47         52    52    47    47
avq.sm    321   378   135   139        273   172   127   127
s38417    176   112    53    55        119   147    51    50
avq.lg    491   380   145   131        251   168   127   127
TOTAL    1846  1699  1004   981       1369  1261   952   925
% IMPR    0.0   8.0  45.6  46.9       25.8  31.7  48.4  49.9
TIME    121.4  59.4 162.3 138.9       84.2  61.3 113.6  89.4
Jason Cong 72
Cutsize Comparison with Other Methods
[Chart: total cutsize (roughly 800-1200) for CDIP, PropF, Straw, hMetis, MLc, and L/M (iterative improvement partitioners, IIPs) and for Panza, Parab, and FBB (non-IIPs)]
Jason Cong 73
Details on Cutsize Comparison

NAME     CDIP PROPf Straw hMetis MLc L/M | PANZA Parab FBB
s1423    12 15 16 13
sioo     25 25 45
s1488    43 44 50
balu     27 27 27 27 27 27 | 27 41
prim1    52 51 49 50 47 43
struct   36 33 33 33 33 33 | 33 40
prim2    152 152 143 145 139 119
s9234    44 42 42 40 40 40 | 40 74 70
biomed   83 84 83 83 83 84 | 83 135
s13207   70 71 57 55 55 61 | 66 91 74
s15850   67 56 44 42 44 43 | 44 91 67
s35932   73 42 47 42 41 44 | 43 62 49
s38584   47 51 49 47 47 47 | 47 55 47
avq.sm   148 144 131 130 128 127
s38417   79 65 53 51 49 50 | 49 49 58
avq.lg   145 143 140 127 128 127
TOTAL1   1023 961 898 872 861 845
TOTAL2   509 516 749
% IMPR   17.4 12.1 5.9 3.1 1.9 1.4 | 32.0 21.4
TIME     5817 5611 12577 3455 1338 24619

CDIP through L/M are iterative improvement partitioners (IIPs); PANZA, Parab, and FBB are non-IIPs
Jason Cong 74
Multi-level Paradigm
• Combination of bottom-up and top-down methods
    - from coarse-grain to finer-grain optimization
    - successfully used in partial differential equations, image processing, combinatorial optimization, etc., and circuit partitioning
[Figure: V-shaped flow -- coarsening down one side, initial partitioning at the coarsest level, uncoarsening back up the other side]
Jason Cong 75
General Framework
• Step 1: Coarsening
    - generate a hierarchical representation of the netlist
• Step 2: Initial solution generation
    - obtain an initial solution for the top-level clusters
    - reduced problem size: converges fast
• Step 3: Uncoarsening and refinement
    - project the solution to the next lower level (uncoarsening)
    - perturb the solution to improve quality (refinement)
• Step 4: V-cycle
    - additional improvement possible from new clustering
    - iterate Step 1 (with variation) + Step 3 until no further gain
Jason Cong 76
V-cycle Refinement
• Motivation
    - post-refinement scheme for multi-level methods
    - different clustering can give additional improvement
• Restricted coarsening
    - requires an initial partitioning
    - do not merge clusters in different partitions
    - maintains the cutline: cutsize degradation is not possible
• Two strategies: V-cycle vs. v-cycle
    - V-cycle: start from the bottom level
    - v-cycle: start from some middle level
    - tradeoff between quality and runtime
Jason Cong 77
Application in Partitioning
• Multi-level partitioning
    - Coarsening engine (bottom-up)
        • unrestricted and restricted coarsening
        • any bottom-up clustering algorithm can be used
        • cutsize oriented (MHEC, ESC) vs. delay oriented (PRIME)
    - Initial partitioning engine
        • move-based methods are commonly used
    - Refinement engine (top-down)
        • move-based methods are commonly used
        • cutsize oriented (FM, LR) vs. delay oriented (xLR)
• State-of-the-art algorithms
    - hMetis [DAC97] and hMetis-Kway [DAC99] for cutsize
    - HPM [DAC00] for delay
Jason Cong 78
hMetis Algorithm
• Best bipartitioning algorithm [DAC97]
    - contribution: 3 new coarsening schemes for hypergraphs
[Figure: original graph vs. its edge-coarsened version]
Edge coarsening = heavy-edge maximal matching
    1. Visit vertices randomly
    2. Compute edge weights (= 1/(|n|-1)) for all unmatched neighbors
    3. Match with an unmatched neighbor via max edge weight
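The edge-coarsening step can be sketched as follows (a simplified sketch of heavy-edge maximal matching over the clique view of the hyperedges, with names of my own; it is not hMetis's actual implementation):

```python
import random
from collections import defaultdict

def edge_coarsening(cells, nets, seed=0):
    """Heavy-edge maximal matching: each net of size s contributes
    1/(s-1) to the weight between every pair of its cells; vertices
    are visited in random order and matched to their heaviest
    still-unmatched neighbor."""
    wt = defaultdict(float)               # pair -> accumulated edge weight
    neigh = defaultdict(set)
    for net in nets:
        s = len(net)
        if s < 2:
            continue
        for i in range(s):
            for j in range(i + 1, s):
                a, b = net[i], net[j]
                wt[frozenset((a, b))] += 1.0 / (s - 1)
                neigh[a].add(b); neigh[b].add(a)
    rng = random.Random(seed)
    order = list(cells)
    rng.shuffle(order)                    # random visit order
    matched, pairs = set(), []
    for c in order:
        if c in matched:
            continue
        free = [u for u in neigh[c] if u not in matched]
        if not free:
            continue
        best = max(free, key=lambda u: wt[frozenset((c, u))])
        matched |= {c, best}
        pairs.append((c, best))           # (c, best) merge into one coarse cell
    return pairs
```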
Jason Cong 79
hMetis Algorithm (cont)
• Best bipartitioning algorithm [DAC97]
    - contribution: 3 new coarsening schemes for hypergraphs
[Figure: hyperedge coarsening vs. modified hyperedge coarsening]
Hyperedge coarsening = independent hyperedge merging
    1. Sort hyperedges in non-decreasing order of their size
    2. Pick a hyperedge with no merged vertices and merge it
Modified hyperedge coarsening = hyperedge coarsening + post-processing
    1. Perform hyperedge coarsening
    2. Pick a non-merged hyperedge and merge its non-merged vertices
Jason Cong 80
hMetis-Kway Algorithm
• Multiway partitioning algorithm [DAC99]
    - new coarsening: First Choice (variant of edge coarsening)
        • can match with either unmatched or matched neighbors
    - greedy refinement
        • on-the-fly gain computation
        • no bucket: not necessarily the max-gain cell moves
        • saves time and space
[Figure: original graph vs. First Choice coarsening]
Jason Cong 81
hMetis Results
• Bipartitioning on the ISPD98 benchmark suite
[Chart: scaled cutsize -- FM 1.61, LR 1.21, LR/ESC 1.03, hMetis 1.00]
Jason Cong 82
hMetis-Kway Results
• Multiway partitioning on the ISPD98 benchmark suite
[Chart: scaled cutsize for hMetis-Kway, KPM/LR, and LR/ESC-PM on 2-, 8-, 16-, and 32-way partitioning; reported values include 1.20, 1.03, 1.19, 1.02, 1.18, 1.01, 1.15, 0.97]
Jason Cong 83
Summary
• Partitioning is very important for applying the divide-and-conquer methodology (for complexity management)
• Partitioning also defines global/local interconnects and greatly impacts circuit performance
• Increasing emphasis on interconnect design has introduced many new partitioning formulations
• Clustering is effective in reducing circuit size and identifying natural circuit hierarchy
• Multi-level circuit clustering + iterative improvement based methods produce the best partitioning results (so far)
Jason Cong 84
Appendix: Simulated Annealing Based Partitioning
• Basic approach
    - Solution space: all partitions
    - Neighboring solution: move a module from one side to the other
    - Cost function: cost = c(X, X') + λ(|X| - |X'|)^2
    - Temperature schedule: standard cooling schedule
• Simulated annealing without rejection
    - Generate a move with probability proportional to its gain
    - All gains can be updated efficiently after a move (unique to partitioning!)
    - Substantial run-time reduction at low temperature
    - Use conventional SA at high temperature and rejectionless SA at low temperatures
Jason Cong 85
Simulated Annealing
• Local search
[Figure: cost function over the solution space; purely downhill local search can get trapped in a local minimum]
Jason Cong 86
Statistical Mechanics vs. Combinatorial Optimization
State {r}: configuration -- a set of atomic positions
Weight e^(-E({r})/kB·T) -- the Boltzmann distribution
    E({r}): energy of the configuration
    kB: Boltzmann constant; T: temperature
Low temperature limit??
Jason Cong 87
Analogy

Physical System          Optimization Problem
State (configuration)    Solution
Energy                   Cost function
Ground state             Optimal solution
Rapid quenching          Iterative improvement
Careful annealing        Simulated annealing
Jason Cong 88
Generic Simulated Annealing Algorithm
1. Get an initial solution S
2. Get an initial temperature T > 0
3. While not yet "frozen" do the following:
    3.1 For 1 ≤ i ≤ L, do the following:
        3.1.1 Pick a random neighbor S' of S
        3.1.2 Let ∆ = cost(S') - cost(S)
        3.1.3 If ∆ ≤ 0 (downhill move), set S = S'
        3.1.4 If ∆ > 0 (uphill move), set S = S' with probability e^(-∆/T)
    3.2 Set T = rT (reduce temperature)
4. Return S
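The generic loop above, instantiated for bipartitioning with the cost c(X, X') + λ(|X| - |X'|)^2 from the appendix slide (a minimal sketch; the schedule parameters T0, r, L and the cost recomputation are my simplifications, not tuned values):

```python
import math
import random

def sa_bipartition(nodes, edges, lam=1.0, T0=10.0, r=0.95, L=50, Tmin=0.01, seed=1):
    """Simulated annealing for bipartitioning.
    Neighbor = move one random module to the other side;
    cost = cut weight + lam * (|X| - |X'|)^2."""
    rng = random.Random(seed)
    side = {v: rng.randint(0, 1) for v in nodes}
    def cost():
        cut = sum(w for u, v, w in edges if side[u] != side[v])
        bal = sum(1 for v in nodes if side[v] == 0) - \
              sum(1 for v in nodes if side[v] == 1)
        return cut + lam * bal * bal
    cur = cost()
    T = T0
    while T > Tmin:
        for _ in range(L):
            v = rng.choice(list(nodes))
            side[v] ^= 1                       # propose: flip one module
            new = cost()
            d = new - cur
            if d <= 0 or rng.random() < math.exp(-d / T):
                cur = new                      # accept (downhill, or uphill by chance)
            else:
                side[v] ^= 1                   # reject: undo the move
        T *= r                                 # cool down
    return side, cur
```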
Jason Cong 89
Basic Ingredients for S.A.
• Solution space
• Neighborhood structure
• Cost function
• Annealing schedule
Jason Cong 90
SA Partitioning
"Optimization by Simulated Annealing" - Kirkpatrick, Gelatt, Vecchi
• Solution space = set of all partitions
• Neighborhood structure: randomly move one cell to the other side
[Figure: example solutions such as {a,b,c | d,e,f} and {a,b | c,d,e,f} connected by single-cell moves]
Jason Cong 91
SA Partitioning
• Cost function: f = C + λB
    - C is the partitioning cost as used before
    - B is a measure of how balanced the partitioning is
    - λ is a constant
Example of B:  B = (|S1| - |S2|)^2
[Figure: a partition into sides S1 = {a, b, ...} and S2 = {c, d, ...}]
Jason Cong 92
SA Partitioning
• Annealing schedule:
    - Tn = (T1/T0)^n · T0, with ratio T1/T0 = 0.9
    - At each temperature, continue until either
        1. there are 10 accepted moves on the average; or
        2. # of attempts ≥ 100 × total # of cells
    - The system is "frozen" if there are very low acceptance rates at 3 consecutive temperatures
Jason Cong 93
Graph Partitioning Using Simulated Annealing Without Rejections
• Greene and Supowit, ICCD-88, pp. 658-663
• Motivation: at low temperature, most moves are rejected!
    e.g. a 1/100 acceptance rate for 1,000 vertices
Jason Cong 94
• Key ideas
    (I) Biased selection
        If move i has probability ωi of being accepted, generate move i with probability
            ωi / ∑_{j=1..N} ωj
        (N: size of the neighborhood)
        In general, ωi = min{1, e^(-∆i/T)}
        In the conventional model, each move has probability 1/N of being generated
    (II) If a move is generated, it is always accepted

Graph Partitioning Using Simulated Annealing Without Rejections (Cont'd)
Jason Cong 95
Graph Partitioning Using Simulated Annealing Without Rejections (Cont'd)
• Main difficulties
    ( 1 ) ωi is dynamic (since ∆i is dynamic)
          It is too expensive to update the ωi's (∆i's) after every move
    ( 2 ) Weighted selection problem
          How to select move i with probability ωi / ∑_{j=1..N} ωj ?
Jason Cong 96
Solution to the Weighted Selection Problem (a general solution)
[Figure: a binary tree over the weights ω1, ..., ω7; each leaf holds one ωi and each internal node holds the sum over its subtree (e.g. ω1+ω2, ω5+ω6+ω7, ω1+...+ω4), with the root holding ω1+...+ω7]
Jason Cong 97
Solution to the Weighted Selection Problem (Cont'd)
Let W = ω1 + ω2 + ... + ωn. How do we select i with probability ωi / W ?
Equivalent to choosing x such that ω1 + ... + ω(i-1) < x ≤ ω1 + ... + ωi

    v ← root
    x ← random(0, 1) · ω(v)
    while v is not a leaf do
        if x < ω(left(v)) then v ← left(v)
        else x ← x - ω(left(v)); v ← right(v)
    end

Probability of ending up at leaf i:
    Prob( ω1 + ... + ω(i-1) < x ≤ ω1 + ... + ωi ) = ωi / W
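The tree walk can be sketched as an array-backed complete binary tree (a sketch with my own names; the update() method is what makes dynamically changing weights affordable, which is the point of the "main difficulty" slide):

```python
import random

class WeightTree:
    """Complete binary tree over the weights: leaf i holds w_i, each
    internal node holds the sum of its subtree, so drawing index i with
    probability w_i / W and updating one weight are both O(log n)."""
    def __init__(self, weights):
        n = 1
        while n < len(weights):
            n *= 2
        self.n = n
        self.t = [0.0] * (2 * n)
        for i, w in enumerate(weights):
            self.t[n + i] = w
        for i in range(n - 1, 0, -1):            # fill internal sums bottom-up
            self.t[i] = self.t[2 * i] + self.t[2 * i + 1]
    def update(self, i, w):
        # weights change after each move; fix the path to the root
        i += self.n
        self.t[i] = w
        while i > 1:
            i //= 2
            self.t[i] = self.t[2 * i] + self.t[2 * i + 1]
    def sample(self, rng):
        x = rng.random() * self.t[1]             # x uniform in [0, W)
        v = 1
        while v < self.n:                        # descend: left if x < left sum
            if x < self.t[2 * v]:
                v = 2 * v
            else:
                x -= self.t[2 * v]
                v = 2 * v + 1
        return v - self.n                        # leaf index
```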
Jason Cong 98
Application to Partitioning (a special solution to the first problem)
Given a partition (A, B):
    Cost F(A, B) = Fc(A, B) + FI(A, B)
    Fc(A, B) = net-cut between A and B
    FI(A, B) = C(|A|^2 + |B|^2)   (minimized when |A| = |B| = n/2)
For move i:  ∆i = F(A', B') - F(A, B)
After a move:
    ∆i_c = Fc(A', B') - Fc(A, B)   -- only a few of the ∆i_c change
    ∆i_I = FI(A', B') - FI(A, B)   -- all of the ∆i_I change
Jason Cong 99
Application to Partitioning: a special solution to the first problem (Cont'd)
Solution: two-step biased selection, with Pi = α(∆i_C, T) · α(∆i_I, T)
    (i)  choose side A or B, based on α(∆i_I, T)
    (ii) choose move i within A or B, based on α(∆i_C, T)
Note: the ∆i_I's are the same for every move within A (or within B),
so we keep one copy of ∆i_I for A and one copy for B,
and choose the moves within A or B using the tree algorithm