A Framework for Layout-Level Logic Restructuring

Hosung Leo Kim, John Lillis


Page 2

Motivation: Logical-to-Physical Disconnect

• Performance is determined largely by physical-level interconnect delay.

• Problem: timing optimization at the logic level ≠ actual performance.

[Figure: logic-level optimization passes a fixed netlist to physical-level optimization; because of this disconnect, physical-level optimization is limited by the structure obtained from the logic level.]

Page 3

Past Layout-Driven Restructuring Work: Replication Based

• Basic operations:
  – Gate splitting
  – Fanout partitioning

• Enables “path straightening”

• [Schabas, Brown. ISFPGA03]
• [Beraudo, Lillis. DAC03]
• [Hrkic, Lillis, Beraudo. TCAD06]
• [Chen, Cong. ISFPGA05]

Page 4

Limitation of Logic Replication

• While interconnect delay can be significantly reduced, the LUT-depth of a path remains unchanged.

• The LUT-depth is typically determined by a technology mapper which does not have an accurate view of critical paths.

Candidate: Remapping

Page 5

Other Work

• Redundant wires (e.g., [Chang, Cheng, Suaris, Marek-Sadowska. DAC00]): rewire connections while keeping logical equivalence. Predictable, but the optimization scope is limited.

• [Lin, Jagannathan, and Cong. ISFPGA03]: remap based on placement-level timing analysis. Significant restructuring, but the placement of remapped cells is determined by the initial placement (not simultaneous).

• [Singh and Brown. Integration07]: Shannon’s expansion / precomputation. Allows late signals to skip logic levels, but relatively local in nature.
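Shannon’s expansion, which the [Singh and Brown. Integration07] entry relies on, states that f = x·f|x=1 + x′·f|x=0 for any input x. The sketch below (function names are illustrative, not taken from that work) checks the identity exhaustively and shows why it helps a late signal: both cofactors depend only on the early inputs, so they can be precomputed and x only drives a final 2:1 selection.

```python
from itertools import product
from typing import Callable

BoolFn = Callable[..., bool]

def cofactor(f: BoolFn, index: int, value: bool) -> BoolFn:
    """Return f with the argument at position `index` fixed to `value`."""
    def g(*args: bool) -> bool:
        full = list(args)
        full.insert(index, value)
        return f(*full)
    return g

def shannon_rebuild(f: BoolFn, index: int) -> BoolFn:
    """Rebuild f as x ? f|x=1 : f|x=0, with x the argument at `index`.

    Both cofactors depend only on the remaining (early) inputs, so in
    hardware they can be precomputed and x only drives a 2:1 selection."""
    f1 = cofactor(f, index, True)
    f0 = cofactor(f, index, False)
    def g(*args: bool) -> bool:
        rest = args[:index] + args[index + 1:]
        return f1(*rest) if args[index] else f0(*rest)
    return g

def equivalent(f: BoolFn, g: BoolFn, n: int) -> bool:
    """Exhaustively compare two n-input Boolean functions."""
    return all(f(*bits) == g(*bits) for bits in product((False, True), repeat=n))

def f(a: bool, b: bool, c: bool, x: bool) -> bool:
    # Example function; the last input x is assumed to arrive late.
    return (a and b) != (c or x)

print(equivalent(f, shannon_rebuild(f, 3), 4))  # True
```

The identity holds for every input position, which is why precomputation is always legal; the cost is duplicating the logic that computes the two cofactors.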

Page 6

Objectives

• Overcome limitations of basic replication (e.g., fixed LUT depth)

• Large and flexible remapping space

• Explicitly account for the placement freedom of remapped LUTs

• Tight coupling with placement

Page 7

Components of Approach (FPGA Domain)

• Placement-level static timing analysis

• Timing-critical fan-in cone extraction

• Induce replication tree [Hrkic, TCAD06]
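A minimal sketch of these three steps, assuming a LUT netlist given as a DAG with placement-derived wire delays and a unit LUT delay; the data layout, the slack margin, and all function names are hypothetical. A LUT reached along several critical paths is copied once per path, which is one plausible reading of inducing a tree by replication; [Hrkic, TCAD06] may construct it differently.

```python
import math
from graphlib import TopologicalSorter
from typing import Dict, List, Set, Tuple, Union

LUT_DELAY = 1.0  # assumed unit LUT delay; wire delays would come from placement
Netlist = Dict[str, List[Tuple[str, float]]]  # LUT -> [(fan-in, wire delay)]
Tree = Union[str, Tuple[str, list]]

def topo_order(net: Netlist) -> List[str]:
    """Topological order of all signals (primary inputs included)."""
    deps = {n: [p for p, _ in fanins] for n, fanins in net.items()}
    return list(TopologicalSorter(deps).static_order())

def arrival_times(net: Netlist, pi_at: Dict[str, float]) -> Dict[str, float]:
    """Forward pass: latest fan-in arrival plus wire delay, plus the LUT delay."""
    at = dict(pi_at)
    for n in topo_order(net):
        if n not in at:  # primary inputs are already set
            at[n] = max(at[p] + d for p, d in net[n]) + LUT_DELAY
    return at

def critical_cone(net: Netlist, at: Dict[str, float], sink: str,
                  margin: float = 0.0) -> Set[str]:
    """Fan-in cone of `sink` restricted to signals whose slack is <= margin,
    with required times propagated backward from at[sink]."""
    req = {n: math.inf for n in at}
    req[sink] = at[sink]
    for n in reversed(topo_order(net)):
        if n in net and req[n] < math.inf:
            for p, d in net[n]:
                req[p] = min(req[p], req[n] - LUT_DELAY - d)
    return {n for n in at if req[n] < math.inf and req[n] - at[n] <= margin}

def replication_tree(net: Netlist, cone: Set[str], node: str) -> Tree:
    """Unfold the cone into a tree: a LUT reached along several critical
    paths appears once per path (i.e., it is replicated)."""
    if node not in net:
        return node  # primary input
    return (node, [replication_tree(net, cone, p) if p in cone else p
                   for p, _ in net[node]])

net = {"x": [("a", 1.0), ("b", 1.0)],
       "y": [("x", 1.0), ("c", 1.0)],
       "z": [("x", 3.0), ("y", 1.0)]}
at = arrival_times(net, {"a": 0.0, "b": 0.0, "c": 0.0})
cone = critical_cone(net, at, "z")
print(replication_tree(net, cone, "z"))
# ('z', [('x', ['a', 'b']), ('y', [('x', ['a', 'b']), 'c'])])
```

Here x feeds z along two critical paths, so it appears twice in the tree, while c (slack 2) stays a fixed leaf.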

Page 8

Components of Approach (cont’d)

• Replication tree

• Recursive, exhaustive Ashenhurst LUT decomposition

• Subject graph (choice tree)

• Mapper and embedder (dynamic programming)

• Legalizer

[Figure: (a) given LUT-tree with LUTs A, B, C; (b) the corresponding choice tree with choice nodes; an embedded tree over nodes a, bR, cR, d, e, f.]

Page 9

Remapping Example

[Figure: (a) given LUT-tree (LUTs A–E); (b) “mini-LUT” tree after LUT decompositions; (c) alternative mapping with new LUTs A′ and B′; (d) corresponding LUT-tree.]

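Page 10

Functional Decomposition

• Test for decomposability: Ashenhurst’s theorem

• Recursively decompose

• Simple disjoint functional decomposition: “simple” because the intermediate function g1 produces a single bit, “disjoint” because the inputs of g1 (the bound set) and the remaining inputs of g2 (the free set) share no variables

[Figure: a function f of x, y, w, z realized as g2 fed by g1, with its decomposition chart.]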
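Ashenhurst’s test for a simple disjoint decomposition f = g2(g1(B), F), with bound set B and free set F, is a column-multiplicity check on the decomposition chart: one column per assignment of B, and the decomposition exists if and only if there are at most two distinct columns. The sketch below (names are illustrative) applies the test exhaustively to every bound set of a given size; recursion on g1 and g2 is omitted.

```python
from itertools import combinations, product
from typing import Callable, Dict, List, Optional, Tuple

Bits = Tuple[int, ...]
BoolFn = Callable[[Bits], int]  # truth-table function on a tuple of 0/1 inputs

def chart_columns(f: BoolFn, n: int, bound: List[int]) -> Dict[Bits, Bits]:
    """Decomposition chart: one column per bound-set assignment, listing the
    value of f for every free-set assignment."""
    free = [i for i in range(n) if i not in bound]
    cols: Dict[Bits, Bits] = {}
    for b in product((0, 1), repeat=len(bound)):
        col = []
        for fr in product((0, 1), repeat=len(free)):
            x = dict(zip(bound, b)) | dict(zip(free, fr))
            col.append(f(tuple(x[i] for i in range(n))))
        cols[b] = tuple(col)
    return cols

def simple_disjoint(f: BoolFn, n: int, bound: List[int]) -> Optional[Dict[Bits, int]]:
    """Ashenhurst test: f = g2(g1(bound), free) with a single-output g1 exists
    iff the column multiplicity is at most 2. Returns g1's truth table
    (bound-set assignment -> 0/1), or None if no such decomposition exists."""
    cols = chart_columns(f, n, bound)
    distinct = sorted(set(cols.values()))
    if len(distinct) > 2:
        return None
    return {b: distinct.index(c) for b, c in cols.items()}

def decomposable_bound_sets(f: BoolFn, n: int, size: int) -> List[List[int]]:
    """Exhaustively try every bound set of the given size."""
    return [list(s) for s in combinations(range(n), size)
            if simple_disjoint(f, n, list(s)) is not None]

def f(x: Bits) -> int:
    return (x[0] & x[1]) ^ (x[2] | x[3])

print(decomposable_bound_sets(f, 4, 2))  # [[0, 1], [2, 3]]
print(simple_disjoint(f, 4, [0, 1]))
# {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
```

For f = (x0 ∧ x1) ⊕ (x2 ∨ x3), only {x0, x1} and {x2, x3} pass, and the g1 found for {x0, x1} is the AND of its inputs.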

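Page 11

All Recursive Decompositions

[Figure: four decomposition trees of f(a, b, c, d); the root of each tree (g1, g4, g6, g7) computes f.]

$$
\begin{aligned}
&f = a\,b\,c'\,d\\
&\text{(1)}\quad g_1 = g_2\,d,\quad g_2 = g_3\,c',\quad g_3 = a\,b\\
&\text{(2)}\quad g_4 = g_5\,c',\quad g_5 = g_3\,d,\quad g_3 = a\,b\\
&\text{(3)}\quad g_6 = g_3\,c'\,d,\quad g_3 = a\,b\\
&\text{(4)}\quad g_7 = a\,b\,g_8,\quad g_8 = c'\,d
\end{aligned}
$$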

Page 12

Choice Tree [Lehman, TCAD97]

[Figure: (a) given LUT-tree with LUTs A, B, C; (b) the corresponding choice tree, whose choice nodes collect the alternative decompositions.]

Page 13

Algorithms

• Mini-LUT tree mapping

• Fan-in tree embedding [Hrkic, TCAD06]

• Simultaneous remapping and embedding

Page 14

Logic Remapping Formulation

• Formulation
  – Given a “mini-LUT” tree and the arrival times at its leaves,
  – map the tree to K-input LUTs, minimizing cost subject to an arrival-time constraint at the root.

[Figure: example mini-LUT tree with internal nodes a–f and leaves g–m.]

Page 15

Solution Signature

• (c, a): for a subtree rooted at u, a solution is characterized by two parameters:
  – c: the cost of the embedding (and remapping) of the subtree;
  – a: the arrival time at u.

• Dominance relation: (c, a) is not dominated by (c′, a′) when c is better (lower) than c′ or a is better (earlier) than a′.

[Figure: solutions plotted as cost vs. arrival time.]
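The dominance relation amounts to keeping a Pareto frontier over (cost, arrival time), with lower being better for both. A minimal Python sketch, assuming numeric pairs (the helper names are hypothetical):

```python
from typing import Iterable, List, Tuple

Sol = Tuple[float, float]  # (cost, arrival time); lower is better for both

def dominates(p: Sol, q: Sol) -> bool:
    """True if p is no worse than q in both criteria and not identical to it."""
    return p[0] <= q[0] and p[1] <= q[1] and p != q

def non_dominated(sols: Iterable[Sol]) -> List[Sol]:
    """Pareto frontier, sorted by increasing cost (hence decreasing arrival).

    After sorting by (cost, arrival), a solution survives only if it arrives
    strictly earlier than every cheaper-or-equal solution kept so far."""
    frontier: List[Sol] = []
    for c, a in sorted(set(sols)):
        if not frontier or a < frontier[-1][1]:
            frontier.append((c, a))
    return frontier

# The S[a][v23] values from the simultaneous remapping and embedding example,
# plus a hypothetical extra point (21, 12) that (20, 11) dominates.
print(non_dominated([(22, 10), (19, 13), (21, 12), (20, 11)]))
# [(19, 13), (20, 11), (22, 10)]
```

Sorting first makes the pruning a single linear pass, and the surviving list is ordered along the trade-off curve.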

Page 16

Solution Sets

• Si[u] = {(c, a)}
  – u: signal produced by the root LUT
  – i: number of inputs of the root LUT
  – c: number of LUTs in the subtrees
  – a: the latest arrival time among the root LUT’s fan-ins
  Example: fan-ins i (0, 2) and h (0, 6) of u give S2[u] = {(0, 6)}.

• Finalized Si[u]: the solution derived from Si[u] once the root LUT is counted
  – c: number of LUTs in the subtrees + 1
  – a: arrival time with the root LUT included
  Example: finalizing S2[u] = {(0, 6)} gives {(1, 7)}.

• S[u] = non-dominated_sol(S2[u], …, SK[u]), taken over the finalized sets
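Under the unit model used in the Si[u] example (one LUT = one unit of cost and one unit of delay), finalizing adds 1 to both entries of every solution, and S[u] is the Pareto frontier of all finalized sets. A sketch with hypothetical helper names that reproduces S2[u] = {(0, 6)} → {(1, 7)}:

```python
from typing import Dict, Iterable, List, Set, Tuple

Sol = Tuple[int, int]  # (number of LUTs, arrival time) under the unit model

def non_dominated(sols: Iterable[Sol]) -> List[Sol]:
    """Pareto frontier over (cost, arrival), both minimized."""
    frontier: List[Sol] = []
    for c, a in sorted(set(sols)):
        if not frontier or a < frontier[-1][1]:
            frontier.append((c, a))
    return frontier

def finalize(s_i: Set[Sol]) -> Set[Sol]:
    """Close off the root LUT: +1 LUT of cost and +1 unit of delay."""
    return {(c + 1, a + 1) for c, a in s_i}

def best_solutions(s_by_inputs: Dict[int, Set[Sol]]) -> List[Sol]:
    """S[u]: non-dominated finalized solutions over all root input counts 2..K."""
    return non_dominated(set().union(*(finalize(s) for s in s_by_inputs.values())))

# Example from the slide: fan-ins i (0, 2) and h (0, 6) give S2[u] = {(0, 6)}.
print(best_solutions({2: {(0, 6)}}))  # [(1, 7)]
```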

Page 17

Si[u] Example

For simplicity:

• one LUT = one unit of cost

• one LUT = one unit of delay

Page 18

Si[u] and S[u] Example

• S[b] = non-dominated_sol(S2[b], … , SK[b]) = {(1,7)}

• Si[b]

Page 19

Computation of Si[u]

Let L and R (a and b in the figure) be the children of u, and let the root LUT take i of its inputs from L and K − i from R:

• i = 1: no collapsing of u and L

• i = K − 1: no collapsing of u and R

• Otherwise: collapsing of u, L, and R

For K = 4 (cases i = 1, 2, 3):

S4[u] = join(S[a], S3[b]) ∪ join(S2[a], S2[b]) ∪ join(S3[a], S[b])
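The recurrence can be sketched as a small bottom-up dynamic program over a binary mini-LUT tree. The assumptions here are illustrative: the unit model, join adding costs and taking the later arrival, a leaf contributing the single solution (0, arrival), and a child that feeds the root through one input using its finalized set S. Choice nodes and embedding are not modeled, the per-input-count Pareto pruning is a choice made for this sketch, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple, Union

Sol = Tuple[int, int]  # (number of LUTs, arrival time), unit cost/delay model

@dataclass
class Leaf:
    arrival: int

@dataclass
class Node:  # a 2-input mini-LUT with children L (left) and R (right)
    left: "Tree"
    right: "Tree"

Tree = Union[Leaf, Node]

def non_dominated(sols: Iterable[Sol]) -> List[Sol]:
    frontier: List[Sol] = []
    for c, a in sorted(set(sols)):
        if not frontier or a < frontier[-1][1]:
            frontier.append((c, a))
    return frontier

def join(A: Set[Sol], B: Set[Sol]) -> Set[Sol]:
    """Costs add; the root LUT waits for the later of the two arrivals."""
    return {(ca + cb, max(aa, ab)) for ca, aa in A for cb, ab in B}

def pick(S: List[Sol], Si: Dict[int, Set[Sol]], i: int) -> Set[Sol]:
    """i == 1: the child stays a separate signal (finalized set S);
    i >= 2: the child is collapsed into the root LUT using i of its inputs."""
    return set(S) if i == 1 else Si.get(i, set())

def remap(t: Tree, K: int) -> Tuple[List[Sol], Dict[int, Set[Sol]]]:
    """Return (S[t], {j: Sj[t]}) computed bottom-up."""
    if isinstance(t, Leaf):
        return [(0, t.arrival)], {}
    S_a, Si_a = remap(t.left, K)
    S_b, Si_b = remap(t.right, K)
    Si: Dict[int, Set[Sol]] = {}
    for j in range(2, K + 1):  # j = number of inputs of the root LUT
        cand = set().union(*(join(pick(S_a, Si_a, i), pick(S_b, Si_b, j - i))
                             for i in range(1, j)))
        Si[j] = set(non_dominated(cand))
    finalized = {(c + 1, a + 1) for s in Si.values() for c, a in s}
    return non_dominated(finalized), Si

# Example from the Solution Sets slide: leaves i (arrival 2) and h (arrival 6).
S, Si = remap(Node(Leaf(2), Leaf(6)), K=4)
print(Si[2], S)  # {(0, 6)} [(1, 7)]
```

On a balanced four-leaf tree with all leaves arriving at time 0, K = 4 yields a single LUT, (1, 1), while K = 3 needs two LUTs, (2, 2).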

Page 20

Remapping Algorithm Example

[Figure: (a) subject tree with internal nodes a–f and leaves annotated with (cost, arrival time): g (0, 4), h (0, 6), i (0, 2), j (0, 3), k (0, 2), l (0, 1), m (0, 4); solutions plotted as cost vs. arrival time.]

Page 21

Algorithms

• Mini-LUT tree mapping

• Fan-in tree embedding [Hrkic, TCAD06]

• Simultaneous remapping and embedding

Page 22

Tree Embedding [Hrkic, TCAD06]

• Inputs to the embedding algorithm: tree topology, arrival times, pin locations, target layout graph, cost metrics

[Figure: example tree with nodes a, bR, cR, d, e, f embedded into the target layout graph, with leaf labels (0, 2), (0, 3), (0, 4); cost vs. arrival time plot.]

Page 23

Algorithms

• Mini-LUT tree mapping

• Fan-in tree embedding [Hrkic, TCAD06]

• Simultaneous remapping and embedding

Page 24

Simultaneous Remapping and Embedding

• Formulation
  – Given a “mini-LUT” tree with fixed leaves and root, the arrival times at the leaves, and a target layout graph,
  – simultaneously map the tree to K-input LUTs and embed it.

Page 25

Solution Set Si[u][v]

• The remapped root produces signal u and is placed at v in the target layout graph.

Page 26

Solution Set Si[u][v] (finalized)

• Finalized solutions Si[u][w] drive vertex v in the target layout graph.

• Computed by a shortest weight-constrained path algorithm.

[Figure: root LUT with inputs h, i, j, k, placed at w and driving v.]
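One way to propagate finalized solutions from a vertex w to other vertices is a multi-criteria (Pareto) label-setting search: each label is a (cost, arrival) pair, every edge adds its own cost and delay, and a label is kept only if no label already settled at the same vertex dominates it. This computes the whole frontier rather than a single weight-constrained shortest path, so it is a swapped-in generalization and not necessarily the algorithm used in this work; the graph format, edge values, and names are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

Label = Tuple[int, int]  # (cost, arrival time)
Graph = Dict[str, List[Tuple[str, int, int]]]  # vertex -> [(next, edge cost, edge delay)]

def dominated(lbl: Label, frontier: List[Label]) -> bool:
    return any(c <= lbl[0] and a <= lbl[1] for c, a in frontier)

def propagate(graph: Graph, sources: Dict[str, List[Label]]) -> Dict[str, List[Label]]:
    """Multi-criteria label-setting search (nonnegative edge costs and delays).

    Labels are popped in lexicographic (cost, arrival) order, so a popped label
    not dominated by the labels already settled at its vertex is Pareto-optimal."""
    settled: Dict[str, List[Label]] = {v: [] for v in graph}
    heap = [(c, a, w) for w, labels in sources.items() for c, a in labels]
    heapq.heapify(heap)
    while heap:
        c, a, u = heapq.heappop(heap)
        if dominated((c, a), settled[u]):
            continue
        settled[u].append((c, a))
        for v, ec, ed in graph[u]:
            nxt = (c + ec, a + ed)
            if not dominated(nxt, settled[v]):
                heapq.heappush(heap, (nxt[0], nxt[1], v))
    return settled

# Hypothetical layout graph: a cheap but slow direct edge w -> v, and a
# costlier but faster detour through x. One finalized solution sits at w.
graph = {"w": [("v", 1, 5), ("x", 2, 1)], "x": [("v", 2, 1)], "v": []}
result = propagate(graph, {"w": [(3, 7)]})
print(result["v"])  # [(4, 12), (7, 9)]
```

A weight-constrained query ("cheapest way to reach v by time T") is then a lookup in the frontier at v.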

Page 27

Solution Set S[u][v]

• S[u][v] ← non-dominated_sol(S2[u][v], …, SK[u][v])

• The best remappings at v in the target layout graph, regardless of the number of root-LUT inputs.

Page 28

Simultaneous Remapping and Embedding Example

• S4[a][v23] = {(22, 10)}

• S[a][v23] = {(19, 13), (20, 11), (22, 10)}

[Figure: the mini-LUT tree (internal nodes a–f, leaves g–m) embedded with root a at vertex v23; cost vs. arrival time plot.]

Page 29

Experiment

• Benchmarks
  – 20 MCNC benchmark circuits
  – At least 20% white space

• Criteria of interest
  – LUT depth
  – Clock period of circuits

• Comparisons
  – Timing-driven VPR placer
  – Replication tree embedder
  – Arbor embedder [Kim, GLSVLSI06]
  – Remapping embedder

• Different logic-level mappers and the stability effect of the new algorithm

Page 30

Optimization Flow

Initial netlist & placement → static timing analysis & replication tree construction → tree embedding (Repl Tree embedder or Remapping embedder) → modified netlist → post-processing & legalization → modified netlist & placement

Page 31

LUT Depth Changes

LUT depth per circuit:

ckt         New ckt   Init. ckt
s298        14        16
apex2       10        10
seq         9         8
dsip        8         8
diffeq      13        14
alu4        9         9
misex3      9         9
apex4       9         8
tseng       12        12
ex5p        9         9
clma        14        15
s38584.1    10        10
s38417      10        10
pdc         11        11
ex1010      10        10
elliptic    18        17
spla        10        10
frisc       22        23
bigkey      5         5
des         8         8

[Charts: “Crit. Path” plots.]

Page 32

Routed Clock Period

• Average normalized clock period: Remap 0.826, Arbor 0.848, Repl 0.886, T-VPR 1

• Maximum reduction of Remap vs. Arbor: 11.7%

[Chart: normalized routed clock period of VPR, Repl, Arbor, and Remap per circuit (ex5p, tseng, apex4, misex3, alu4, diffeq, dsip, seq, apex2, s298, des, bigkey, frisc, spla, elliptic, ex1010, pdc, s38417, s38584.1, clma); y-axis 0 to 1.2.]
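As a quick check against the summary’s “average 17%” figure, the reductions implied by the slide’s average normalized clock periods (Remap 0.826, Arbor 0.848, Repl 0.886, T-VPR 1) are shown below. They are computed from the averaged values, which need not equal the average of per-circuit reductions.

$$
\begin{aligned}
\text{Remap vs. T-VPR:}\quad & 1 - \frac{0.826}{1} = 0.174 \approx 17\% \\
\text{Remap vs. Repl:}\quad & 1 - \frac{0.826}{0.886} \approx 0.068 \approx 6.8\% \\
\text{Remap vs. Arbor:}\quad & 1 - \frac{0.826}{0.848} \approx 0.026 \approx 2.6\%
\end{aligned}
$$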

Page 33

Different Logic-Level Mappers and Stability Effect of Remap

• FlowMap: optimal depth

• FlowMap-r: relaxed depth

• ZMap: optimal depth with simultaneous area minimization

• Praetor: minimized area

• Daomap

[Chart: results for circuit seq under VPR, Repl, and Remap for netlists from FlowMap, FlowMap-r, ZMap, Praetor, and Daomap (y-axis 0 to 90); annotated spans of 12% and 4%.]

Page 34

Summary

• Study of layout-level restructuring for interconnect optimization
  – Functional decomposition
  – Choice tree
  – Remapping algorithm
  – Simultaneous remapping and embedding

• Experimental result
  – Average 17% reduction in clock period compared with T-VPR

Page 35

Thank You!