
Giuseppe S. Garcea
Delft University of Technology
Delft, The Netherlands
[email protected]

Ralph H.J.M. Otten
Eindhoven University of Technology
Eindhoven, The Netherlands
[email protected]

are wires plannable?

Ralph Otten

wire planning

1987: providing floorplan design with alignment constraints
• a floorplan is a data structure capturing the relative positions (i.e. no geometry, possibly overlap, several optimizations)
• alignment to save wire area (data path generator)
• often a tremendous reduction in routing complexity
• in essence not limited to "data path" regularity

1998: fixing (and maximizing) time budgets for modules
• remove global iteration from synthesis
• fix total path delay
• provide pre-placement and pin positioning data
• enable early retiming, layer assignments, system partitioning
• ensure satisfaction of system timing requirements

this talk:
• iteration free synthesis: what is needed?
• trends in chip industry: where do the wires go?
• some directions

iteration free synthesis (silicon compilers)

[figure: synthesis flow. Stages: conceptual design, behavioral synthesis, logic synthesis, layout synthesis. Inputs: foot print, library, technology (through data preparation). Intermediate data: gate and net list, weighted incidence structure. Objective: wire length and area minimization under technology constraints]

timing was an incidental, usually surprisingly good, result of a synthesis flow with size as its prime objective



iterative timing optimization

[figure: layout synthesis followed by timing analysis, with foot print, library and technology entering through data preparation. Timing analysis reports wire loads, resistances and critical paths to a timing optimization step (buffer insertion, transistor sizing, fanout trees), which feeds back into the flow]

timing awareness in conventional flows

synthesis: uses delay models but has very limited information
layout synthesis: tries to reduce total wire length and area
resynthesis: accepts additional constraints and wire load models

timing is the (arbitrary) outcome of
• a sequence of optimizations with other objectives
• adding constraints and resynthesis bringing it to a local optimum
• adding more constraints and resynthesis bringing it to another local optimum

desired: a flow that satisfies timing constraints exactly whenever possible

TIMING CLOSURE

sutherland's delay formula

$R = \frac{r_o}{s}, \quad C_{in} = s\,c_o, \quad C_p = s\,c_p$

$\tau = b\,r_o c_o \frac{C_L}{C_{in}} + b\,r_o c_p = \frac{g}{f} + p$

note: the absence of resistance between driver and load

g: computing effort, size independent! depends on:
• function
• topology
• device size

p: inherent (parasitic) delay, size independent
1/f: restoring effort
g/f: effort delay

if f is kept constant, then delay stays constant
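The constant delay property can be checked numerically. The sketch below assumes the delay form tau = p + g/f with f taken as the ratio of input capacitance to load capacitance; the function name and the normalized values g = 1.0 and p = 1.0 are hypothetical and only serve the illustration.

```python
def gate_delay(g, p, c_in, c_load):
    """Delay in the assumed form tau = p + g / f, with f = c_in / c_load."""
    f = c_in / c_load  # effort reciprocal; 1/f is the restoring effort
    return p + g / f


# Hypothetical gate with g = 1.0 and p = 1.0 in normalized units.
small = gate_delay(g=1.0, p=1.0, c_in=2.0, c_load=8.0)
# Scale the gate and its load by the same factor, so f is unchanged.
large = gate_delay(g=1.0, p=1.0, c_in=4.0, c_load=16.0)
print(small, large)
```

Both calls use the same f, so the two delays coincide even though the second gate is twice as large.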

continuously sized networks

[figure: a gate with inputs a, b, ..., n and input capacitances c_a, c_b, ..., c_n, driving a load C; each input capacitance and the gate size are annotated with formulas in f, g, a and C that are not recoverable from the extracted text]

the size of a gate with constant delay varies linearly with the load:
gate size = a f C

f, the scaling factor, is the same for all inputs
a, the area sensitivity, is a property of the gate, that is function, topology, sizing

continuously sized networks

the size of a gate with constant delay varies linearly with the load:
gate size = a f C

[figure: gate i with imposed capacitance q_i, driving gates j and k with input capacitances c_j and c_k]

$c_i = f_i n_{ij} c_j + f_i n_{ik} c_k + q_i$

$c_i = \sum_{j \in fo(i)} f_i n_{ij} c_j + q_i$

in vector notation:

$c = D_f N c + q \;\Rightarrow\; (I - D_f N)\,c = q \;\Rightarrow\; c = (I - D_f N)^{-1} q$
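A minimal sketch of the size assignment, assuming the system reads c = D_f N c + q with D_f the diagonal matrix of the f_i, N the weighted incidence matrix and q the imposed capacitances. It also assumes an acyclic network, so that repeated substitution terminates with the exact solution; the function name and the three-gate example are hypothetical.

```python
def size_assignment(f, n, q):
    """Solve c = D_f N c + q for an acyclic network by repeated substitution.

    f: effort reciprocals f_i, one per gate
    n: weighted incidence matrix as a list of rows; n[i][j] weights the
       input capacitance of gate j as a load of gate i
    q: imposed capacitances q_i
    """
    size = len(f)
    c = list(q)
    # For an acyclic network D_f N is nilpotent, so `size` sweeps are enough
    # to reach the exact fixed point.
    for _ in range(size):
        c = [f[i] * sum(n[i][j] * c[j] for j in range(size)) + q[i]
             for i in range(size)]
    return c


# Hypothetical network: gate 0 drives gates 1 and 2, gate 1 drives gate 2,
# and gate 2 carries an imposed capacitance of 2.0.
f = [0.5, 0.5, 0.5]
n = [[0, 1, 1],
     [0, 0, 1],
     [0, 0, 0]]
q = [0.0, 0.0, 2.0]
c = size_assignment(f, n, q)
print(c)
```

For networks with feedback the substitution no longer terminates exactly, and a direct linear solve of (I - D_f N) c = q would be the natural replacement.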

timing closure

$\tau = p + \frac{g}{f}$

the size of a gate with constant delay varies linearly with the load:
gate size = a f C

Sutherland and Sproull: VLSI, 1991
Grodstein, e.a.: ICCAD, 1995

• gain-based synthesis
• constant delay methodology
• fixed delays, fixed timing
• performance planning, guaranteed timing

synthesis under timing constraints

[figure: flow with layout synthesis and timing analysis, with foot print, library and technology entering through data preparation; size assignment determines the area and is followed by area optimization]

solve the corresponding Leontief system

$c = (I - D_f N)^{-1} q$

insert buffers to reduce area

no iterative loop has been created!

size assignment

$c = (I - D_f N)^{-1} q$

[figure: data flow around size assignment, with the following labels]
• N: weighted incidence matrix, from synthesis + netlist
• f: vector of effort reciprocals, that is C_in/C_out (restoring effort)
• q: imposed capacitances
• sizes: a vector implied by the calculated input capacitances
• floorplan optimization, wire lengths
• N': netlist possibly modified by inserted buffers, passed to layout synthesis

TIMING GUARANTEED
• for f was fixed
• buffers inserted for area recovery only where enough slack is available!

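resistive interconnect

problem 1: how to cope with resistive interconnect while their delay models cannot be made size independent?

[figure: a driver with resistance R_tr and parasitic capacitance C_p, driving a wire of length l modeled as a resistance R_W between two capacitances 0.5 C_W, into a load C_in]

$\tau = R_{tr}\,(C_p + C_W + C_{in}) + R_W \left( \tfrac{1}{2} C_W + C_{in} \right) = p + g h + r$

r is not size independent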

the new synthesis problem

problem 1: how to cope with resistive interconnect while their delay models cannot be made size independent?

problem 2: how can we prevent synthesis from generating networks that preclude satisfying timing constraints, while timing correct networks exist?

• sutherland's principle of uniform stage effort
• brayton's uniform stage delay
• technology mapping for speed

logic synthesis is to provide an initial netlist and the restoring effort 1/f for every gate!

how can synthesis be guided to produce networks that lead to "fast enough" implementations?

wire planning

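synthesis with wire planning

[figure: the flow for synthesis under timing constraints, extended with a wire planning step. Wire planning provides timing budgets, preplacement, pin assignment, layer assignment and wire structures; the other blocks are layout synthesis, timing analysis, size assignment (area), area optimization, and data preparation from foot print, library and technology]

no iterative loop has been created!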

global wire theory

global wires are interconnects whose delay can be improved by inserting restoring circuitry

assumptions
• global interconnections are always point-to-point wires
• first moment matching is accurate enough
• restoring circuits are modeled with sakurai's first order model

• the length of a section, the critical length, depends on the wiring layer, but not on the buffer size, and tends to be constant when measured in feature sizes
• the delay of an optimally segmented line is linear in its length; path delay is therefore independent of the position of the restoring circuits on the path
• the delay of a section of an optimally buffered line is the same for all layers

wire planning considerations

the definition of global wires creates a two-level hierarchy

global wires will be optimally buffered

a wire planning scenario:
• allocate delays to global paths
• assign time budgets to modules
• create net lists for the modules
• assign size to all gates

given the path delays, and a convex trade-off between module size and delay, size optimization is efficiently solvable, and produces time budgets for each module

logic synthesis has to create net lists for the modules with given time budgets, and assign restoring efforts to the gates

size assignment is done by solving the Leontief system

remaining problems

problem 1: how to cope with resistive interconnect while their delay models cannot be made size independent?

problem 2: how can we prevent synthesis from generating networks that preclude satisfying timing constraints, while timing correct networks exist?

problem 3: optimally buffered lines fix input and output capacitances, and therefore constrain the total effort along a path, and thus the delay of that path.

[figure: an optimally buffered line with input capacitance c_in and output capacitance C_out]

optimally buffered lines have fixed input/output capacitance

discrete libraries

problem 2: how can we prevent synthesis from generating networks that preclude satisfying timing constraints, while timing correct networks exist?

problem 4: does the fact that libraries are not continuously sizable defeat timing closure by fixing individual gate delays?

• derivation assumes continuous sizability!
• libraries are mostly discrete and offer limited range in sizes

some problems of timing closure

problem 2: how can we prevent synthesis from generating networks that preclude satisfying timing constraints, while timing correct networks exist?

problem 4: does the fact that libraries are not continuously sizable defeat timing closure by fixing individual gate delays?

problem 5: can the efficiency of load independent mapping for speed be advantageous under a constant delay methodology?

are wires plannable?

iteration free synthesis: size assignment to achieve proper timing

resistive interconnect and guiding synthesis: wire planning

for that we need: a solid basis for wire planning
• pin placement for detour free routing
• valid retiming
• early layer assignment
• . . .

wire plans

a wire plan for a functional network is a position for each of its function nodes, and a pin assignment for all its primary inputs and outputs

a global wire plan is a wire plan of which all arcs represent global wires, and will be laid out as optimally buffered lines.

a wire plan is monotonic if all its arcs can be laid out such that the L1-length of every directed path in the network is equal to the L1-distance between its end points

given a pin assignment, no global wire plan is faster than a monotonic wire plan (if functions have fixed delays)

given a pin assignment, monotonic wire plans have the least wire capacitance
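The monotonicity condition can be illustrated for a single directed path. The sketch below assumes node positions are given as integer (x, y) pairs and that every arc is laid out as a shortest rectilinear connection between its end points; the function names are hypothetical, and a full check would repeat this test for every directed path of the network.

```python
def l1_distance(p, q):
    """L1 (Manhattan) distance between two points given as (x, y) pairs."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def is_monotonic_path(points):
    """True if the L1-length of the path equals the L1-distance of its end points."""
    length = sum(l1_distance(a, b) for a, b in zip(points, points[1:]))
    return length == l1_distance(points[0], points[-1])


# A path that only moves right and up has no detour.
direct = is_monotonic_path([(0, 0), (2, 1), (3, 4)])
# A path that overshoots in y and comes back is longer than necessary.
detour = is_monotonic_path([(0, 0), (2, 3), (3, 1)])
print(direct, detour)
```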

wire plans for given pin assignment

the inbox of a node is the smallest iso-rectangle containing its support

the outbox of a node is the smallest iso-rectangle containing its range

a bridge of a node is a minimum L2-length line connecting the inbox and the outbox

a functional network has a monotonic wire plan with respect to a given pin assignment iff every node has one and only one bridge
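A small sketch of the inbox, the outbox and the length of a bridge, assuming the positions of the pins in the support and in the range of a node are given as lists of (x, y) points. The function names and the tuple layout (xmin, ymin, xmax, ymax) are hypothetical. The sketch computes only the bridge length; it does not decide whether the bridge is unique.

```python
import math


def iso_rectangle(points):
    """Smallest axis-parallel rectangle containing the points, as (xmin, ymin, xmax, ymax)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def bridge_length(inbox, outbox):
    """Minimum L2 distance between two iso-rectangles (0.0 if they touch or overlap)."""
    ax0, ay0, ax1, ay1 = inbox
    bx0, by0, bx1, by1 = outbox
    dx = max(0, bx0 - ax1, ax0 - bx1)  # horizontal gap between the rectangles
    dy = max(0, by0 - ay1, ay0 - by1)  # vertical gap between the rectangles
    return math.hypot(dx, dy)


inbox = iso_rectangle([(0, 0), (2, 1)])    # hypothetical support positions
outbox = iso_rectangle([(5, 5), (6, 7)])   # hypothetical range positions
print(inbox, outbox, bridge_length(inbox, outbox))
```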

existence criterion

a functional network has a monotonic wire plan with respect to a given pin assignment iff every node has one and only one bridge

the existence of a monotonic wire plan of a functional network for a given pin assignment can be checked on a node-by-node basis:
• its in- or outbox is a single point
• its inbox and outbox are perpendicular iso-lines
• its outbox is in the projection of the inbox



iteration free synthesis: size assignment to achieve timing

delay prediction is needed and should be enabled: optimally buffered global interconnect

trends in chip industry

many laws in chip industry fit a specific generic form:

$\frac{dU}{dV} = \frac{f(U)}{h(V)}$

a differential equation with an integral (solvable by separation of variables)

moore's law

[figure: number of transistors (10^3 to 10^11) versus year (70 to 00), for intel microprocessors and static memory; memory generations marked from 1 K to 1 G]

$\frac{dN}{dt} = m N$

the growth rate of chip complexity will be proportional to the achieved complexity to date [Gordon Moore, 1964]

proportionality constant, "moore exponent m": 0.2 for processors, and 0.4 for memory

N = numerical complexity of the module (e.g. the chip)

rent's rule

$\frac{dT}{dN} = r\,\frac{T}{N}$

the growth rate of the terminal count with the complexity of the module will be proportional to the average number of terminals per submodule [Landman, Russo, 1971]

proportionality constant, "rent exponent r"

N = numerical complexity of the module (e.g. the chip)

T(N) = the number of terminals of a module with numerical complexity N
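Read as the differential equation dT/dN = r T / N, rent's rule integrates to the power law T(N) = K N^r, where K is the integration constant. The sketch below evaluates that power law; the function name is hypothetical, and the pair r = 0.5, K = 1.9 is one of the (r, K) pairs listed with rent's curves, used here without claiming which class of circuits it belongs to.

```python
import math


def rent_terminals(k, r, n):
    """Terminal count T(N) = K * N**r of a module with numerical complexity N."""
    return k * n ** r


# With r = 0.5, a module four times as complex has twice as many terminals.
t_small = rent_terminals(1.9, 0.5, 10_000)
t_large = rent_terminals(1.9, 0.5, 40_000)
print(t_small, t_large)
```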

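rent's curves

[figure: rent's curves on logarithmic axes, horizontal axis from 10 to 1,000,000 and vertical axis from 10 to 10,000 [Bakoglu, 1987]. Curves: static ram, dynamic ram, microprocessors, gate arrays, high performance computers (chip level and board level). Fitted values shown: r=0.45, K=0.82; r=0.63, K=1.4; r=0.25, K=82; r=0.5, K=1.9; r=0.12, K=6; r=0.1, K=4]

$\frac{dT}{dN} = r\,\frac{T}{N}$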

process exponents

[figure: gate oxide thickness, source/drain junction depth and minimum feature size in µm (10^-3 to 10^1) versus year (1960 to 2010), with a marker at 3 atom layers [Status2000, ICE, 2000]]

$\frac{dL}{dt} = -p L$

the reduction rate of device sizes will be proportional to the achieved device size

proportionality constants are pretty close in value, and will be called the "process exponent p"

straverius laws

many laws in chip industry have generic form:

$\frac{dU}{dV} = \frac{f(U)}{h(V)}$

a differential equation with an integral (solvable by separation of variables)

• moore's law on chip industry: $\frac{dN}{dt} = m N$
• rent's rule on intra-module communication: $\frac{dT}{dN} = r\,\frac{T}{N}$
• observed miniaturization in chip technology: $\frac{dL}{dt} = -p L$

there are many more!!!
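A worked version of the separation of variables may help here. The Greek symbols were lost in the slide text, so the sketch assumes the generic form dU/dV = f(U)/h(V), the exponent names m, r and p used on the individual slides, a negative sign for the miniaturization law, and integration constants N(t_0), K and L(t_0).

```latex
\frac{dU}{dV} = \frac{f(U)}{h(V)}
\;\Longrightarrow\;
\int \frac{dU}{f(U)} = \int \frac{dV}{h(V)}

\frac{dN}{dt} = m N
\;\Longrightarrow\;
N(t) = N(t_0)\, e^{m (t - t_0)}

\frac{dT}{dN} = r\,\frac{T}{N}
\;\Longrightarrow\;
T(N) = K N^{r}

\frac{dL}{dt} = -p L
\;\Longrightarrow\;
L(t) = L(t_0)\, e^{-p (t - t_0)}
```

Under these assumptions moore's law and the miniaturization law are exponentials in time, and rent's rule is a power law in the complexity.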

another old rule

in a balanced computer system the size of primary memory in bytes is close to the number of instructions per second [Richard P. Case, 60's]

amdahl's constant: how primary memory should be supplied to a processor with a given speed

[figure: memory size (Mb) and processor speed (MIPS), both from 1 to 1k, with data points for ibm 360, vax 11, cray 1, cray 2, 80486 and pentium IV; two regions are labeled massive memory machines and massive parallel machines]

memory-to-compute ratio

down scaling forces the memory-to-compute ratio to increase

• downscaling makes memory (by S_m) and processor (by S_c) smaller: M becomes S_m M and C becomes S_c C
• processing became A times faster due to downscaling
• to rebalance the system memory has to be extended

$\frac{M(t)}{C(t)} = A\,\frac{S_m}{S_c}\,\frac{M(t_o)}{C(t_o)}$

very fast !!! [Paul Stravers, 2000]

[the remaining expressions on this slide, which involve L, b, s and the process exponent p, are not recoverable from the extracted text]

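buffer area under global wire assumptions

[figure: a line of length l cut into n sections; each section has a buffer of size s with output resistance r_o/s and input capacitance c_o s, driving a wire segment with resistance r.l/n and capacitance c.l/n]

$T(l, s, n) = n\,b\,r_o c_o + b \left( \frac{c\,r_o}{s} + r\,c_o s \right) l + \frac{a\,r c\,l^2}{n}$

$\frac{\partial T}{\partial n} = b\,r_o c_o - \frac{a\,r c\,l^2}{n^2} = 0 \;\Rightarrow\; l_{crit} = \frac{l}{n_{opt}} = \sqrt{\frac{b\,r_o c_o}{a\,r c}}$

$\frac{\partial T}{\partial s} = b \left( r\,c_o - \frac{c\,r_o}{s^2} \right) l = 0 \;\Rightarrow\; s_{opt} = \sqrt{\frac{r_o c}{r\,c_o}}$

buffer area $\propto N\,\frac{s_{opt}}{l_{crit}} \int_{l_{crit}}^{l_{max}} l\,P(l)\,dl$

note: buffer area is independent of wire resistance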
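The claims about optimally buffered lines can be checked numerically. The sketch below assumes a first order line delay T(l, s, n) = n b r_o c_o + b (c r_o / s + r c_o s) l + a r c l^2 / n, treats the number of sections n as a real number, and uses hypothetical constants a = 0.4, b = 0.7 and hypothetical wire and buffer parameters.

```python
import math

# Assumed constants of the first order delay model (hypothetical values).
A = 0.4
B = 0.7


def critical_length(r, c, r_o, c_o):
    """Section length l / n that minimizes the line delay."""
    return math.sqrt(B * r_o * c_o / (A * r * c))


def optimal_size(r, c, r_o, c_o):
    """Buffer size s that minimizes the line delay."""
    return math.sqrt(r_o * c / (r * c_o))


def line_delay(length, r, c, r_o, c_o):
    """Delay of a line cut into sections of critical length with optimal buffers."""
    l_crit = critical_length(r, c, r_o, c_o)
    s = optimal_size(r, c, r_o, c_o)
    n = length / l_crit  # treated as a real number in this sketch
    return (n * B * r_o * c_o
            + B * (c * r_o / s + r * c_o * s) * length
            + A * r * c * length ** 2 / n)


# Hypothetical parameters: r in ohm/um, c in F/um, r_o in ohm, c_o in F.
WIRE = dict(r=0.1, c=2e-16, r_o=1e4, c_o=1e-15)

delay_1mm = line_delay(1000.0, **WIRE)
delay_2mm = line_delay(2000.0, **WIRE)


def size_per_length(r):
    """Buffer size per unit of wire length for a given wire resistance r."""
    return (optimal_size(r, WIRE["c"], WIRE["r_o"], WIRE["c_o"])
            / critical_length(r, WIRE["c"], WIRE["r_o"], WIRE["c_o"]))


print(delay_1mm, delay_2mm, size_per_length(0.1), size_per_length(0.4))
```

Doubling the length doubles the delay, and the buffer size per unit of length does not change with r, which is the quantity that enters the buffer area.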

wire length distribution

P(l), the wire length distribution, is usually obtained by requiring that rent's rule must be satisfied

• donath-feuer: pareto-levy distribution, $P(l) = g\,l^{2r-3}$
• sastry-parker: weibull distribution, $P(l) = g\,r\,l^{r-1} \exp(-g\,l^{r})$
• davis-de-meindl: explicit (long) formulas, separate for two regions

relative buffer area

[figure: total buffer area / die area versus L (0.25 down to 0.05 µm), for r = 0.45, 0.55, 0.63 and 0.75, shown once on a linear scale (0.1 to 0.5) and once on a logarithmic scale (10^-3 to 10^-1); using formulae of davis-de-meindl]




delay prediction is needed and should enable wire planning: optimally buffered global interconnect

the memory share of a balanced processor chip area will increase very fast with scaling: new architectures

optimal buffering forces almost all functionality from a single layer chip: new technologies

multilayer integration

[figure: taxonomy of multilayer integration with two branches, growing and stacking. Techniques shown: recrystallization, layer growth, seeding, sidewall metallization, film transfer, vertical integration. Annotations: "already tried before 1980", "the true 3D integration"]

main disadvantage: early layers have to go through many cycles

main disadvantage: poor alignment of inter-layer vias

benefits

• global interconnect length considerably reduced
• folding datapaths over layers and determining optimum crossing points can shorten cycle time
• much smaller total footprint for the same functionality
• different technologies for different layers are feasible

why not fully exploited today?
• industry sustained its miraculous growth up to now without it
• technological feasibility for vlsi only shown recently
• economical feasibility not yet proven
• virtually no adequate cad-support
• no design experience with multilayer integration

possible layer dedication

[figure: stack of Si layers with Al and SiO2, separated by polyimide [M.B. Kleiner, S.A. Kühn, P. Ramm, W. Weber, 1995]]

• buffers, optical receivers, i/o: optical clock receivers, line repeaters, regular i/o [Otten, 1980]
• processor, first level cache: processors (the main heat source), first level memory
• second level cache interfaces: second level cache for performance improvement
• advanced memory technology: high density advanced memory technology

thermal analysis

[figure: temperature increase (0.5 to 1.5 ºC) versus position in the stack (250 to 400 µm), for the layer stack of Si, SiO2, Al and polyimide with buffers, optical receivers and i/o; processor and first level cache; second level cache interfaces; advanced memory technology [M.B. Kleiner, S.A. Kühn, P. Ramm, W. Weber, 1995]]




delay prediction is needed and should enable wire planning: optimally buffered global interconnect

the memory share of a balanced processor chip area will increase very fast with scaling: new architectures

optimal buffering forces almost all functionality from a single layer chip: new technologies

multilayer integration may ease all of the above: new theories

today we are far from plannable wiring!
