The wire

EE 587 SoC Design & Test Partha Pande School of EECS Washington State University [email protected]

description

SOC design and test

Transcript of The wire

  • EE 587 SoC Design & Test, Partha Pande, School of EECS, Washington State University, [email protected]

  • SoC Physical Design Issues

  • Design Challenges
    Non-scalable global wire delay: moving signals across a large die within one clock cycle is not possible.
    Current interconnection architecture: buses are inherently non-scalable.
    Transmission of digital signals along wires is not reliable.

  • Interconnect Scaling Effects
    Dense multilayer metal increases coupling capacitance (figure labels: Old Assumption, DSM).
    Long, narrow lines further increase the resistance of the interconnect.

  • Effect of Advanced Interconnect

  • Effect of Wire Scaling on Delay
    What happens to wire delay? Many people claim that wire delay goes up, as shown in the famous plot from the 1995 SIA roadmap. But it depends on how you scale the wires and which wires you are talking about.
    In a technology shrink (s < 1) there are really two types of wires:
    a. Wires whose length L scales directly by s.
    b. Wires that span a constant percentage of the die size: the global wires of increasingly complex chips.
    Delay is different for these two cases as shown here:

  • Global Wire Delay
    Global wires have non-scalable delay.
    Delay exceeds one clock cycle.
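
A minimal sketch of the two cases, assuming ideal scaling (resistance per unit length grows as 1/s^2 because width and height both shrink, capacitance per unit length stays constant) and a first-order delay proportional to r*c*L^2. The function name and the choice s = 0.7 are illustrative assumptions, and the global wire is taken to keep a fixed length.

```python
# Sketch: first-order RC wire delay under a technology shrink s < 1.
# Assumption (not from the slides): ideal scaling, so resistance per unit
# length grows as 1/s^2 and capacitance per unit length stays constant.

def relative_wire_delay(s: float, length_factor: float) -> float:
    """Delay after one shrink divided by delay before, using delay ~ r*c*L^2."""
    r_factor = 1.0 / s**2   # cross-section shrinks in both width and height
    c_factor = 1.0          # capacitance per unit length roughly unchanged
    return r_factor * c_factor * length_factor**2

s = 0.7
local_ratio = relative_wire_delay(s, length_factor=s)     # case a: length scales by s
global_ratio = relative_wire_delay(s, length_factor=1.0)  # case b: length held fixed

print(f"local wire delay ratio per generation:  {local_ratio:.2f}")
print(f"global wire delay ratio per generation: {global_ratio:.2f}")
```

Under these assumptions the delay of a wire that shrinks with its block stays flat, while the delay of a fixed-length global wire grows by 1/s^2 per generation.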

  • Wire Modeling

  • Elmore Delay

  • Elmore Delay

  • Delay of a wire
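
A minimal sketch of the Elmore delay of a wire, assuming the wire is modeled as a ladder of equal RC segments. The segment model, the function name and the numbers (30 ohm and 200 fF for a 1 mm wire) are illustrative assumptions rather than values stated on these slides.

```python
# Sketch: Elmore delay of a wire modeled as an N-segment RC ladder.
# Each capacitor contributes (resistance between source and that node) * C.

def elmore_delay(r_total, c_total, n_segments, r_driver=0.0, c_load=0.0):
    """Elmore delay in seconds at the far end of the ladder."""
    r_seg = r_total / n_segments
    c_seg = c_total / n_segments
    delay = 0.0
    r_upstream = r_driver
    for _ in range(n_segments):
        r_upstream += r_seg            # resistance from source to this node
        delay += r_upstream * c_seg    # this node's capacitor charges through it
    delay += r_upstream * c_load       # load sits at the end of the wire
    return delay

R_WIRE = 30.0      # ohm, assumed total wire resistance
C_WIRE = 200e-15   # F, assumed total wire capacitance

for n in (1, 2, 10, 1000):
    print(n, elmore_delay(R_WIRE, C_WIRE, n))
```

With no driver resistance and no load, the result is R*C*(N+1)/(2N), so a single lumped segment gives R*C and a finely divided wire approaches R*C/2.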

  • FO4 vs. Wire Delay
    [Chart legend: FO4, 1mm, 2mm, 3mm]

    Sheet: Interconnect1

    Assumption and 0.35um Technology Parameters

    scaling        0.7      Rw (ohm/um)         0.03
    Rn (kohm/sq)   13.5     Cw (fF/um)          0.2
    Tr Size (sq)   1        L_wire limit (um)   1,040.96
    Cd (fF/um)     2        Wire Fitting        34%

    Results

    node (nm)  1T Delay (ps)  FO4 (ps)  1mm Al (ps)  2mm Al (ps)  3mm Al (ps)  Total Delay Cu + FO4 (ps)
    650        19.3           280.0     0.78         3.12         7.02         283.12
    500        13.5           200.0     1.59         6.37         14.34        206.37
    350        9.5            140.0     3.25         13.00        29.26        153.00
    250        6.6            100.0     6.63         26.54        59.71        126.54
    180        4.6            70.0      13.54        54.16        121.85       124.16
    130        3.2            50.0      27.63        110.53       248.68       160.53
    90         2.3            35.0      56.39        225.56       507.51       260.56
    65         1.6            25.0      115.08       460.33       1,035.74     485.33
    45         1.1            17.5      234.86       939.45       2,113.76     956.95
    32         0.8            12.5      479.31       1,917.25     4,313.80     1,929.75
    22         0.5            8.5       978.19       3,912.75     8,803.68     3,921.25

    To mimic the trend as plotted in the lecture notes, the following assumptions have to be made:
    1. Gate delay is calculated based on a minimum-size transistor;
    2. The interconnect length does not change over time (i.e. the interconnect used to calculate the delay at .65 um would be the same for .1 um technology);
    3. The K factor for the Cu case is changed from 3.9 to 2.0;
    4. All the parameters scale with the technology; and
    5. No coupling, fringing or lateral capacitance is assumed.

    There are problems or flaws in the above assumptions:
    i. (1) is not the proper or normal way of representing gate delay, and it gives rise to under-estimation. Instead, FO4 delay should be used;
    ii. (2) does not consider the majority of the wires in a chip, which are normally short. Also, no basis could be found for the length used in the study. Even assuming it is the worst case, this length should statistically represent 1% of the total wire population. Besides, this type of wire delay could easily be solved by buffer insertion;
    iii. The K factor used in (3) is too low compared to what is being implemented at this point. This portrays an over-optimistic view of switching the wires to Cu;
    iv. The all-scale assumption in (4) is too optimistic, as some parameters, such as wire height, do not scale in accordance with the normal scaling rule;
    v. The effects mentioned in (5) become very important in the deep sub-micron region and should not be ignored.

    A plot using FO4 delay shows the effect on the relative delay magnitude.
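
As a rough check on the Interconnect1 results, the sketch below finds the first node at which the 2 mm wire delay exceeds the FO4 delay, and the per-generation growth of that wire delay. The rows are copied from the results table; the function and variable names are illustrative.

```python
# Rows copied from the results table: (node nm, FO4 ps, 2 mm Al wire delay ps).
rows = [
    (650, 280.0, 3.12), (500, 200.0, 6.37), (350, 140.0, 13.00),
    (250, 100.0, 26.54), (180, 70.0, 54.16), (130, 50.0, 110.53),
    (90, 35.0, 225.56), (65, 25.0, 460.33), (45, 17.5, 939.45),
    (32, 12.5, 1917.25), (22, 8.5, 3912.75),
]

def first_wire_dominated_node(table):
    """Return the first (largest) node where the wire delay exceeds FO4."""
    for node, fo4_ps, wire_ps in table:
        if wire_ps > fo4_ps:
            return node
    return None

crossover = first_wire_dominated_node(rows)

# Ratio of the 2 mm wire delay between consecutive nodes.
growth = [later[2] / earlier[2] for earlier, later in zip(rows, rows[1:])]

print("wire delay first exceeds FO4 at node (nm):", crossover)
print("per-generation growth:", [round(g, 2) for g in growth])
```

The growth ratio stays close to 1/0.7^2, which is what a fixed-length wire gives when every parameter scales by 0.7 as in assumption 4.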

    [Plot: Interconnect Delay Trend (from lecture notes). X axis: Technology (nm); Y axis: Delay (ps); series: 1T Delay (ps), 1mm Al (ps), 2mm Al (ps), 3mm Al (ps).]

    [Plot: FO4. X axis: Technology (nm); Y axis: Delay (ps); series: FO4 (ps), 1mm Al (ps), 2mm Al (ps), 3mm Al (ps).]

    Sheet: Interconnect2

    Length (um)   2.00E+04

    node   Cw       Rw        Cg       Rt         P    beta  M    N       Wire   Gate   Coupled  Total  Total Delay Cu      N/gate Length
    (nm)   (fF/um)  (ohm/um)  (fF/um)  (kohm-um)                  (um)    Delay  Delay  (ns)     (ns)   Interconnect + FO4
                                                                          (ns)   (ns)                   Inverter (ns)
    650    0.2      0.007203  2.0      25.00      0.5  2     1    340.14  0.25   0.25   0.44     0.95   0.37                523.286237572
    500    0.20     0.0147    2.0      17.50      0.5  2     2    199.20  0.30   0.30   0.53     1.14   0.37                398.4095364448
    350    0.20     0.03      2.0      12.25      0.5  2     3    116.67  0.36   0.36   0.63     1.36   0.49                333.3333333333
    250    0.20     0.06      2.0      8.58       0.5  2     6    68.33   0.43   0.43   0.75     1.62   0.81                273.3089420011
    180    0.20     0.12      2.0      6.00       0.5  2     10   40.02   0.52   0.52   0.90     1.94   1.52                222.3148148148
    130    0.20     0.25      2.0      4.20       0.5  2     16   23.44   0.62   0.62   1.08     2.32   3.01                180.2787828969
    90     0.20     0.52      2.0      2.94       0.5  2     28   13.73   0.74   0.74   1.29     2.77   6.08                152.507962963
    65     0.20     1.06      2.0      2.06       0.5  2     48   8.04    0.89   0.89   1.54     3.31   12.37               123.6712450673
    45     0.20     2.17      2.0      1.44       0.5  2     82   4.71    1.06   1.06   1.84     3.96   25.21               104.6204625926
    32     0.20     4.42      2.0      1.01       0.5  2     140  2.76    1.27   1.27   2.20     4.73   51.43               86.1640752742
    22     0.20     9.03      2.0      0.71       0.5  2     238  1.61    1.51   1.51   2.62     5.65   104.94              73.4007654598

    Chooser: 2 (1 - Cu, 2 - Al)

    [Plot: M and N. X axis: Node (nm); series: M (Number of Segment), N (Size of Buffer, um), N/Gate Length.]

    [Plot: Wire + Buffer Delay. X axis: Node (um); series: Wire Delay (ns), Gate Delay (ns), Coupled (ns), Total (ns), Total Delay Cu Interconnect + FO4 Inverter (ns); legend: Buffered Delay (ns), Total Delay without Buffer (ns).]
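
The sketch below compares the two total-delay columns of the Interconnect2 sheet. It reads "Total (ns)" as the buffered delay and "Total Delay Cu Interconnect + FO4 Inverter (ns)" as the delay without buffers, which is an interpretation based on the chart legends in the sheet. The values are copied from the table; the names are illustrative.

```python
# (node nm, "Total (ns)" read as buffered, last delay column read as unbuffered)
rows = [
    (650, 0.95, 0.37), (500, 1.14, 0.37), (350, 1.36, 0.49),
    (250, 1.62, 0.81), (180, 1.94, 1.52), (130, 2.32, 3.01),
    (90, 2.77, 6.08), (65, 3.31, 12.37), (45, 3.96, 25.21),
    (32, 4.73, 51.43), (22, 5.65, 104.94),
]

def buffering_speedup(table):
    """Map each node to unbuffered delay divided by buffered delay."""
    return {node: unbuffered / buffered for node, buffered, unbuffered in table}

speedup = buffering_speedup(rows)

# Nodes are listed from largest to smallest, so this is the first node
# at which the buffered wire is the faster of the two.
first_win = next(node for node, ratio in speedup.items() if ratio > 1.0)

for node, ratio in speedup.items():
    print(f"{node:>4} nm  unbuffered/buffered = {ratio:.2f}")
print("buffering first pays off at node (nm):", first_win)
```

Read this way, buffers cost delay on the older nodes and pay off increasingly from the crossover node onward.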

  • Delay with Buffer Insertion
    Follow board notes (Chapter 10 of HJS).
    Refer to section 4.8 of HJS for the resistance of a transistor.

  • Buffer Insertion for Long Wires
    Make long wires into short wires by inserting buffers periodically. Divide the interconnect into N sections as follows:

    [Figure: repeated wire sections; labels Rw, Cw/2, M, W, 2W]

    Then the delay through buffers and interconnect is given by:
    $t_p = N \left[ R_{eff}\left(C_{self} + \frac{C_W}{2}\right) + \left(R_{eff} + R_W\right)\left(\frac{C_W}{2} + C_{fanout}\right) \right]$
    where $R_{eff} = R_{eqn}/M$, $C_{self} = C_{j3W} \cdot M$, $C_{fanout} = C_{g3W} \cdot M$, $R_W = R_{int} L / N$, $C_W = C_{int} L / N$.

    What is the optimal number of buffers? Find N such that $\partial t_p / \partial N = 0$:
    $N \approx \sqrt{\dfrac{0.4\, R_{int} C_{int} L^2}{t_{pbuf}}}$, where $t_{pbuf} = R_{eff}\left(C_{self} + C_{fanout}\right)$

    What size should the buffers be? Find M such that $\partial t_p / \partial M = 0$:
    $M = \sqrt{\dfrac{R_{eqn}}{C_{g3W}} \cdot \dfrac{C_{int}}{R_{int}}}$
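
A minimal sketch of the buffer-insertion expressions above. The wire values (20 mm length, 0.03 ohm/um, 0.2 fF/um) come from the 0.35 um sheet in this transcript; the unit-inverter values R_EQN, C_G3W and C_J3W are hypothetical numbers chosen only for illustration, and the function names are not from the slides.

```python
import math

def repeated_wire_delay(n, m, length_um, r_int, c_int, r_eqn, c_g3w, c_j3w):
    """Delay t_p of a wire split into n sections, each driven by a size-m buffer."""
    r_eff = r_eqn / m             # Reff = Reqn / M
    c_self = c_j3w * m            # Cself = Cj3W * M
    c_fanout = c_g3w * m          # Cfanout = Cg3W * M
    r_w = r_int * length_um / n   # Rw = Rint * L / N
    c_w = c_int * length_um / n   # Cw = Cint * L / N
    return n * (r_eff * (c_self + c_w / 2) + (r_eff + r_w) * (c_w / 2 + c_fanout))

LENGTH_UM = 2.0e4    # um, wire length from the sheet
R_INT = 0.03         # ohm/um, from the 0.35 um parameters
C_INT = 0.2e-15      # F/um, from the 0.35 um parameters
R_EQN = 12.5e3       # ohm, hypothetical unit-inverter resistance
C_G3W = 6.0e-15      # F, hypothetical unit-inverter gate capacitance
C_J3W = 6.0e-15      # F, hypothetical unit-inverter junction capacitance

# Closed-form buffer size and section-count estimate from the slide.
m_opt = math.sqrt((R_EQN / C_G3W) * (C_INT / R_INT))
t_pbuf = R_EQN * (C_J3W + C_G3W)   # Reff * (Cself + Cfanout); M cancels out
n_estimate = math.sqrt(0.4 * R_INT * C_INT * LENGTH_UM**2 / t_pbuf)

def delay(n, m=m_opt):
    return repeated_wire_delay(n, m, LENGTH_UM, R_INT, C_INT, R_EQN, C_G3W, C_J3W)

# Integer sweep of the lumped expression itself.
best_n = min(range(1, 21), key=delay)

print(f"M (buffer size)        = {m_opt:.1f}")
print(f"N (0.4-factor formula) = {n_estimate:.2f}")
print(f"N (integer sweep)      = {best_n}")
print(f"delay at best N        = {delay(best_n) * 1e9:.3f} ns")
print(f"delay with one buffer  = {delay(1) * 1e9:.3f} ns")
```

The integer sweep minimizes the lumped expression directly, so it need not coincide exactly with the 0.4-factor estimate; the two should land close to each other.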

  • Issues in Buffer Insertion
    An even number of repeaters is needed to avoid logic inversion.
    A better strategy is to optimize the delay-power product.
    Repeaters for global wires require many via cuts from the upper-layer wires all the way down to the substrate.
    Floorplanning.
    Area and power.
    Repeated wires offer increased bandwidth.

  • Gate Delay Scaling
    Gate delay has scaled almost linearly.
    Gate and diffusion capacitance also scale nicely.

  • Wire Scaling
    Resistance: resistance grows under scaling, since the width and height both scale down.
    Detailed analysis of capacitance in later classes.

    L_drawn                 0.18 um  0.13 um  0.10 um  0.07 um  0.05 um  0.035 um
    Semi-global pitch, um   0.36     0.26     0.20     0.14     0.10     0.07
    Global pitch, um        0.72     0.52     0.40     0.28     0.20     0.14
    Chip edge, mm           19       20.7     22.8     24.9     27.4     30.1
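
A rough sketch of how resistance grows with the global pitches in the table above. It assumes the wire width is half the pitch, a fixed thickness-to-width ratio of 2, and a copper resistivity of about 1.7e-8 ohm*m; these three choices are assumptions for illustration, not values from the slides.

```python
# Sketch: resistance of a global wire spanning one chip edge, per generation.
RHO_CU = 1.7e-8      # ohm*m, approximate bulk copper resistivity (assumed)
ASPECT_RATIO = 2.0   # wire thickness / width (assumed)

# (L_drawn um, global pitch um, chip edge mm), copied from the table.
generations = [
    (0.18, 0.72, 19.0), (0.13, 0.52, 20.7), (0.10, 0.40, 22.8),
    (0.07, 0.28, 24.9), (0.05, 0.20, 27.4), (0.035, 0.14, 30.1),
]

def resistance_per_um(pitch_um):
    """Ohms per micron of length for a wire of width pitch/2."""
    width_m = 0.5 * pitch_um * 1e-6
    height_m = ASPECT_RATIO * width_m
    ohm_per_m = RHO_CU / (width_m * height_m)   # R/L = rho / (W * H)
    return ohm_per_m * 1e-6

for l_drawn, pitch, edge_mm in generations:
    r_um = resistance_per_um(pitch)
    r_edge = r_um * edge_mm * 1000.0            # wire as long as the chip edge
    print(f"{l_drawn:>6} um  {r_um:.3f} ohm/um  {r_edge:8.0f} ohm across the die")
```

Because width and height shrink together, resistance per unit length rises with the inverse square of the pitch, and the growing chip edge compounds it.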

  • Delay and Bandwidth
    Classification of wires:
    1. Wires that connect gates locally within blocks. When devices and blocks get smaller, these wires get shorter.
    2. Wires that connect blocks together, spanning a significant portion of the die.

  • Delay and Bandwidth (Contd)
    Wires that scale in length:
    Delay scales with technology.
    Wires span a block of 50k gates.
    Wires that do not scale in length:
    Increasing delay disparity with gates.
    Delay relative to gate delay roughly doubles each generation.

  • Global Wire Delay
    Global wires limit the system performance.

  • Uniformly Repeated Lines

  • Non-uniform Buffer Insertion

  • Non-uniform Buffer Insertion (Contd)
    The gain in power consumption comes from using fewer buffers.

  • Summary
    A single synchronous clock region will span only a small fraction of the chip area.
    We should not try to distribute a single low-power clock across the whole chip.
    The whole SoC needs to be divided into multiple functional islands with independent frequencies.
    Synchronization of signals crossing multiple clock boundaries is important.
