vlsi

BiCMOS Technology

• Combines bipolar and CMOS transistors in a single integrated circuit.

• By retaining the benefits of both bipolar and CMOS, BiCMOS achieves VLSI circuits with speed-power-density performance previously unattainable with either technology individually.

• BiCMOS technology enables high-performance integrated circuits (ICs) but increases process complexity.

Characteristics of CMOS Technology

• Lower static power dissipation

• Higher noise margins

• Higher packing density – lower manufacturing cost per device

• High yield with large integrated complex functions

• High input impedance (low drive current)

• Scaleable threshold voltage

• High delay sensitivity to load (fan-out limitations)

• Low output drive current (issue when driving large capacitive loads)

• Low transconductance, where transconductance gm ∝ Vin

• Bi-directional capability (drain & source are interchangeable)

• A near ideal switching device

Advantages of CMOS over bipolar

Other CMOS Advantages

Characteristics of Bipolar Technology

• Higher switching speed

• Higher current drive per unit area, higher gain

• Generally better noise performance and better high frequency characteristics

• Better analogue capability

• Improved I/O speed (particularly significant with the growing importance of package limitations in high speed systems).

• high power dissipation

• lower input impedance (high drive current)

• low voltage swing logic

• low packing density

• low delay sensitivity to load

• high gm (gm ∝ e^Vin)

• high unity gain band width (ft) at low currents

• essentially unidirectional

Advantages of Bipolar over CMOS

Other Bipolar Advantages

BJT & CMOS Advantages

BJT over CMOS

• Switching speed

• Current drive per unit area

• Noise performance (low 1/f noise)

• Analog capability

• I/O speed

• High transconductance

CMOS over BJT

• Power dissipation

• Noise margin

• Packing density

• Ability to integrate large and complex circuits and functions with high yield

• Good switch

BiCMOS Circuit Advantages

• Improved speed over CMOS

• Lower power dissipation than BJT (simplifies packaging and board requirements)

• Flexible I/O (ECL, CMOS, or TTL)

• High performance analog

• Latch-up immunity

• High impedance input (FET)

• High gain (BJT)

• Low 1/f noise, >1 GHz toggle frequency

• Zero offset analog switches

• Gain-bandwidth product extended

• Good voltage reference

• Low input offset voltage for differential pair

The simplified BiCMOS Inverter

• Two bipolar transistors (T3 and T4), one nMOS and one pMOS transistor (both enhancement-type devices, OFF at Vin=0V)

• The MOS switches perform the logic function & bipolar transistors drive output loads

[Figure: simplified BiCMOS inverter; labels: Vin, Vout, Vdd, T1, T2, T3, T4, CL]

Vin = 0 :

• T1 is off. Therefore T3 is non-conducting.

• T2 ON - supplies current to base of T4.

• T4 base voltage set to Vdd.

• T4 conducts & acts as current source to charge load CL towards Vdd.

• Vout rises to Vdd - Vbe (of T4)

• Note : Vbe (of T4) is base-emitter voltage of T4.

(pullup bipolar transistor turns off as the output approaches 5V - Vbe (of T4))

Vin = Vdd :

• T2 is off. Therefore T4 is non-conducting.

• T1 is on and supplies current to the base of T3

• T3 conducts & acts as a current sink to discharge load CL towards 0V.

• Vout falls to 0V+ VCEsat (of T3)

• T3 & T4 present low impedances when turned on into saturation & load CL will be charged or discharged rapidly.

• Output logic levels will be good & will be close to rail voltages since VCEsat is quite small & VBE ≈ 0.7 V. Therefore, the inverter has high noise margins

• Inverter has high input impedance, i.e., MOS gate input

• Inverter has low output impedance

• Inverter has high drive capability but occupies a relatively small area

• However, this is not a good arrangement to implement, since no discharge path exists for current from the base of either bipolar transistor when it is being turned off, i.e.,

– when Vin = Vdd, T2 is off and no conducting path to the base of T4 exists

– when Vin = 0, T1 is off and no conducting path to the base of T3 exists

This will slow down the action of the circuit.
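The output levels described for the two input cases can be checked with a few lines of arithmetic. This is a minimal sketch, not a circuit simulation: the function name is hypothetical, VBE = 0.7 V is taken from the text, and the VCEsat value of 0.2 V is an assumed illustrative figure.

```python
def bicmos_output_levels(vdd, vbe=0.7, vce_sat=0.2):
    """Return (v_out_high, v_out_low) for the simplified BiCMOS inverter.

    vbe is the base-emitter drop of T4 (0.7 V, from the text).
    vce_sat is the saturation voltage of T3 (0.2 V is an assumed value).
    """
    v_out_high = vdd - vbe      # Vin = 0: T4 charges CL up to Vdd - Vbe
    v_out_low = 0.0 + vce_sat   # Vin = Vdd: T3 discharges CL down to VCEsat
    return v_out_high, v_out_low


v_high, v_low = bicmos_output_levels(5.0)
print(f"Vout(high) = {v_high:.1f} V, Vout(low) = {v_low:.1f} V")
```

With a 5 V supply the output swing stops one VBE short of the upper rail and one VCEsat above the lower rail.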

Further advantages of BiCMOS Technology

• High impedance CMOS transistors may be used for the input circuitry while the remaining stages and output drivers are realized using bipolar transistors.

• BiCMOS devices offer many advantages where high load current sinking and sourcing is required.

• The high current gain of the NPN transistor greatly improves the output drive capability of a conventional CMOS device.

• MOS speed depends on device parameters such as saturation current and capacitance.

• These in turn depend on oxide thickness, substrate doping and channel length.

• BiCMOS can take advantage of any advances in CMOS and/or bipolar technology, greatly accelerating the learning curve normally associated with new technologies.

Are there disadvantages with BiCMOS technology?

• Main disadvantage: greater process complexity when compared to CMOS.

• This results in a 1.25 to 1.4 times increase in die costs over conventional CMOS.

• Taking into account packaging costs, the total manufacturing cost of supplying a BiCMOS chip ranges from 1.1 to 1.3 times that of CMOS.

• Since BiCMOS does not have to be scaled as aggressively as CMOS, existing fabs can be utilized, resulting in lower capital costs.

• The extra costs incurred in developing a BiCMOS technology are more than offset by the fact that the enhanced chip performance extends the usefulness of manufacturing equipment & clean rooms by at least one technology generation.

Signal Strengths

• Signals such as 1 and 0 have strengths, which measure their ability to sink or source current.

• The VDD and GND rails are the strongest 1 and 0.

• Under the switch abstraction, G has complete control and S and D have no effect.

• In reality, the gate can turn the switch on only if a potential difference of at least Vt exists between the G and S.

• We will look at Vt in detail later on in the course.

• Thus signal strengths are related to Vt and therefore p and n transistors produce signals with different strengths

Tristate Inverter

• C = 1, C' = 0 =>

– inverter function Z = A'

• C = 0, C' = 1 =>

– high impedance output Z
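The two cases above can be captured in a small behavioural model. This is only a sketch: the function name and the use of the string "Z" for the high impedance state are illustrative choices, and the complement of the enable is derived inside the function rather than supplied separately.

```python
def tristate_inverter(a, c):
    """Behavioural model of a tristate inverter.

    a: data input (0 or 1)
    c: enable (0 or 1); its complement is derived internally
    Returns 0 or 1 when enabled, or the string "Z" for high impedance.
    """
    c_bar = 1 - c
    if c == 1 and c_bar == 0:
        return 1 - a   # enabled: acts as an ordinary inverter
    return "Z"         # disabled: output floats


for a in (0, 1):
    for c in (0, 1):
        print(f"A={a} C={c} -> Z={tristate_inverter(a, c)}")
```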

Building CMOS Gates (n-side)

Building CMOS Gates (p-side)

Building CMOS Gates (Final CMOS gate)

Pseudo-NMOS Logic

• In CMOS, use PMOS that is always ON to make pseudo-NMOS gates.

• PMOS transistor is always on and thus fights the pull-down NMOS network.

• Ratio of NMOS to PMOS strength very important design parameter.

• Uses a p-type transistor as a resistive pullup and an n-type network for pulldowns.

• Consumes static power.

• Has much smaller pullup network than static gate.

• Pulldown time is longer because pullup is fighting.

• For logic 0 output, pullup and pulldown form a voltage divider.

• Must choose n, p transistor sizes to create effective resistances of the required ratio.

• Effective resistance of the pulldown network must be computed in the worst case; series n-types mean larger transistors.

• Make PMOS about 1/4 effective strength of pulldown network

Pseudo-NMOS Example

Pseudo-NMOS is a ratioed circuit in which dc current flows when the N pull-down tree is conducting.

The ratio of the N device W/L to the P load device W/L must be designed so that, when the N pull-down leg with maximum resistance is conducting, the output is at a sufficiently low VOL.
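The voltage divider behind VOL can be sketched with a linear-resistor approximation. The function name and the resistance values below are hypothetical; real transistors are nonlinear, so this only shows how the roughly 4:1 strength ratio mentioned above sets the output low level.

```python
def pseudo_nmos_vol(vdd, r_pullup, r_pulldown):
    """Output low voltage of a pseudo-NMOS gate, treating the always-on
    PMOS and the conducting NMOS network as linear resistors."""
    return vdd * r_pulldown / (r_pullup + r_pulldown)


# Assumed illustrative values: PMOS has 1/4 the strength of the pulldown
# network, i.e. four times its effective resistance.
R_PULLDOWN = 10e3            # ohms, worst-case pulldown path (assumed)
R_PULLUP = 4 * R_PULLDOWN    # ohms
vol = pseudo_nmos_vol(5.0, R_PULLUP, R_PULLDOWN)
print(f"VOL = {vol:.2f} V")
```

Under these assumptions the output low level sits at one fifth of the supply, which is why a weaker pullup gives a better logic 0 at the cost of a slower rise.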

Dynamic CMOS Logic

• General Dynamic CMOS Logic:

– IDD path is turned off when the clock is disabled, and/or the output is evaluated when the clock is enabled.

• In general:

– Basic Dynamic CMOS n-Logic.

– Basic Dynamic CMOS p-Logic.

– Clocked CMOS Logic.

– Domino Logic

Basic Dynamic CMOS Logic with n-Logic

Example

• Erroneous evaluation in cascaded dynamic n-logic stages:

– They are evaluated at the same clock edges.

– Even slow evaluation may be impossible if the output has been discharged by pre-charged inputs.

Cascading Problem in Dynamic CMOS Logic

• If several stages of the previous circuit are cascaded together using the same clock, a problem in evaluation involving a built-in “race condition” will exist:

– During pre-charge, both Vout1 and Vout2 are pre-charged to Vdd

– When the clock goes high to begin evaluation, all inputs at stage 1 require some finite time to resolve, but during this time charge may erroneously be discharged from Vout2

• e.g. assume that eventually the 1st stage NMOS logic tree conducts and fully discharges Vout1, but since the inputs to the N-tree are not immediately resolved, it takes some time for the N-tree to finally discharge Vout1 to GND.

• If, during this time delay, the 2nd stage has the input condition shown with the bottom NMOS transistor gate at a logic 1, then Vout2 will start to fall and discharge its load capacitance until Vout1 finally evaluates and turns off the top series NMOS transistor in stage 2

– The result is an error in the output of the 2nd stage, Vout2

Cascaded Dynamic CMOS Logic Gates: Evaluate Problem

• With simple cascading of dynamic CMOS logic stages, a problem arises in the evaluate cycle:

– The pre-charged high voltage on node N2 in stage 2 may be inadvertently (partially) discharged by logic inputs to stage 2 which have not yet reached their final correct (low) values from the stage 1 evaluation operation.

– Dynamic CMOS logic gates cannot simply be cascaded without preventing unwanted bleeding of charge from pre-charged nodes

• Possible solutions:

– Two phase clocks

– Use of inverters to create Domino Logic

– NP Domino Logic

– Zipper/NORA logic

Single Phase Dynamic CMOS Gates

• When the gates are precharged, N1 and N2 are charged to Vdd.

• During evaluate, n-logic-1 will conditionally discharge N1 with some delay.

• N1 must be fully evaluated before N2 is evaluated; otherwise the output N2 will be in error.

• This cannot be assured with single-phase clocks.

• These problems can be overcome by using multi-phase clocks and a sample-and-hold circuit to isolate cascaded stages and control evaluate timing.

• Example:

• During precharge, N1 is automatically high and S is automatically ON

• Suppose that n-logic-1 is ON and n-logic-2 is ON

– Immediately after precharge ends, N2 will tend to be pulled down until N1 is pulled down by n-logic-1, thereby turning S OFF

• To make cascaded stages work (single-phase clock), S should be OFF until it is conditionally turned ON (by N1).

• This can be done by inserting an inverter between N1 and S => domino logic.

CMOS Domino Logic

• The problem with faulty discharge of precharged nodes in CMOS dynamic logic circuits can be solved by placing an inverter in series with the output of each gate:

– All inputs to N logic blocks (which are derived from inverted outputs of previous stages) therefore will be at zero volts during precharge and will remain at zero until the evaluation stage has logic inputs to discharge the precharged node PZ.

– However, all circuits only provide non-inverted outputs

• In (b) a weak P device compensates for charge loss due to charge sharing and leakage at low frequency clock operation

• In (c) the weak P device can be used to latch the output high
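The effect of the output inverter can be illustrated with a purely behavioural model. This is a sketch under simplifying assumptions (no timing, charge sharing or leakage); the class name, the two-input AND pulldown and the two-stage chain are hypothetical choices made for illustration.

```python
class DominoGate:
    """One domino stage: a precharged dynamic node followed by an inverter."""

    def __init__(self, pulldown):
        # pulldown: function of the inputs, True when the n-logic tree conducts
        self.pulldown = pulldown
        self.node = 1  # dynamic node, precharged high

    def precharge(self):
        self.node = 1

    def evaluate(self, *inputs):
        # During evaluate the node can only be discharged, never recharged
        if self.pulldown(*inputs):
            self.node = 0

    @property
    def out(self):
        # Inverted node: 0 after precharge, rises only if the node discharges
        return 1 - self.node


def and2(a, b):
    return bool(a and b)


stage1 = DominoGate(and2)
stage2 = DominoGate(and2)

stage1.precharge()
stage2.precharge()
precharge_outputs = (stage1.out, stage2.out)

# Stage 2 evaluates before stage 1 has resolved: it sees stage1.out == 0,
# so its precharged node is not discharged by mistake.
stage2.evaluate(stage1.out, 1)
early_node2 = stage2.node

# Stage 1 resolves, its output rises, and stage 2 then evaluates correctly.
stage1.evaluate(1, 1)
stage2.evaluate(stage1.out, 1)
final_outputs = (stage1.out, stage2.out)

print(precharge_outputs, early_node2, final_outputs)
```

Because every stage output starts at 0 and can only rise, a later stage cannot lose charge before the earlier stage has finished evaluating.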

NP Domino Logic (NORA Logic)

• An elegant solution to the dynamic CMOS logic “erroneous evaluation” problem is to use NP Domino Logic (also called NORA logic).

• Alternate stages of N logic with stages of P logic:

– N logic stages use the true clock, with normal precharge and evaluation phases and the N logic tree in the pull-down leg. P logic stages use the complement clock, with the P logic tree tied above the output node.

– During precharge, clk is low (-clk is high) and the P-logic outputs precharge to ground while the N-logic outputs precharge to Vdd.

– During evaluate, clk is high (-clk is low) and both types of stage go through evaluation; the N-logic tree logically evaluates to ground while the P-logic tree logically evaluates to Vdd.

• Inverter outputs can be used to feed other N-blocks from N-blocks, or to feed other P-blocks from P-blocks.

NORA CMOS Logic Circuit Example

• Example of NP or NORA (No Race) logic:

• During clk low (-clk high), each stage pre-charges

– N logic stages pre-charge to Vdd; P logic stages pre-charge to GND

• When clk goes high (-clk low), each stage enters the evaluation phase

– N logic stages evaluate to GND; P logic stages evaluate to Vdd

– All NMOS and PMOS stages evaluate one after another in succession, as in Domino logic

• Logic is:

– Stage 1 is X = (A · B)'

– Stage 2 is G = X' + Y'

– Stage 3 is Z = (F · G + H)'
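The three stage equations can be evaluated directly as Boolean expressions. This sketch only checks the logic, not the precharge and evaluate timing; the function name is hypothetical, and Y, F and H are treated as external inputs, which is an assumption since their origin is not stated.

```python
def nora_example(a, b, y, f, h):
    """Evaluate the three-stage NORA example logic for Boolean inputs."""
    x = not (a and b)           # stage 1: X = (A . B)'
    g = (not x) or (not y)      # stage 2: G = X' + Y'
    z = not ((f and g) or h)    # stage 3: Z = (F . G + H)'
    return x, g, z


print(nora_example(True, True, True, True, False))
print(nora_example(False, True, True, True, False))
```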

Zipper CMOS Dynamic Logic

• Zipper CMOS logic is a scheme for reducing charge leakage and charge sharing problems

• Pre-charge transistors receive a slightly modified clock, where the clock pulse (during pre-charge off time) holds the pre-charge transistor at weak conduction in order to provide a trickle pre-charge current during the evaluation phase

– PMOS pre-charge transistor gates are held at Vdd - |Vtp|

– NMOS pre-charge transistor gates are held at Vtn above GND

Clocked CMOS Logic (C2MOS)

Clock Distribution

• On a small chip, the clock distribution network is just a wire

– And possibly an inverter for clkb

• On practical chips, the RC delay of the wire resistance and gate load is very long

– Variations in this delay cause the clock to get to different elements at different times

– This is called clock skew

• Most chips use repeaters to buffer the clock and equalize the delay

– Reduces but doesn't eliminate skew

Solutions

• Reduce clock skew

– Careful clock distribution network design

– Plenty of metal wiring resources

• Analyze clock skew

– Only budget actual, not worst case skews

– Local vs. global skew budgets

• Tolerate clock skew

– Choose circuit structures insensitive to skew

Clock Skew

• Clock Skew Sources are:

– systematic,

– random,

– drift,

– jitter


Clock System Architecture

Synchronous chip interface with PLL

Phase-locked loop block diagram

Global Clock Distribution Networks

• Grids

• H-trees

• Spines

• Ad-hoc

• Hybrid

Clock Grids

• A clock grid is a mesh of horizontal and vertical wires driven from the middle or edges.

• Use grid on two or more levels to carry clock.

• Make wires wide to reduce RC delay.

• Ensures low skew between nearby points.

• But possibly large skew across die.

• Grids compensate for random skew.

• Grids consume a large amount of metal resources and hence have a high switching capacitance and power consumption.

Alpha Clock Grids

[Figure: global clock (gclk) grids of the Alpha 21064, Alpha 21164 and Alpha 21264, with PLL]

H-Trees

• Fractal structure

– Gets clock arbitrarily close to any point

– Matched delay along all paths

• Delay variations cause skew

• Buffers are added to serve as repeaters

• A and B might see big skew

• High random skew, drift and jitter between two nearby points

[Figure: H-tree with points A and B marked]

Ad-hoc

• The clock is routed haphazardly with some attempt to equalize wire lengths or add buffers to equalize delay.

• Have low systematic skews because the buffer sizes can be adjusted until nominal delays are nearly equal.

• Subject to random skew.

Clock Spine

• If loads are uniform, the spine avoids the systematic skew of the grid by matching the length of the clock wires.

• Save power by not switching certain wires.

• A system with many clocked elements may require a large number of serpentine routes, leading to high area and capacitance for the clock network.

• Clock spines have large skews between nearby elements driven by different serpentines.

Hybrid Networks

• Use H-tree to distribute clock to many points.

• Tie these points together with a grid.

• Hybrid combination of H-tree and grid offers lower skew.

• Hybrid approach has lower systematic skew and is less susceptible to skew from non-uniform load distribution.

• Hybrid approach is regular, making layout of well-controlled transmission line structures easier.

Clock Tree Distribution

• To prevent clock skew problems on a chip, clock distribution networks are designed very carefully

• Example shown: linear (E-W) clock tree distribution network

– Clock is buffered several times before driving FO=3

– Each FO3 buffer drives another high FO (FO=4 shown) buffer

– Finally another single buffer is used for each linear clock line to drive across the chip or a functional island on the chip

• H-tree distribution network is often used on chips with area pads (solder bumps)

– Master clock is brought on board near the central part of the chip and driven outward with a large H interconnection arrangement

Clock Generation

[Figure: clock generation circuit; labels: clken and clock outputs 1 to 4]

Scaling of MOS Circuits

• VLSI technology is constantly evolving towards smaller line widths.

• Reduced feature size generally leads to:

– Better / faster performance

– More gates per chip

• A more accurate description of modern technology is ULSI (ultra large scale integration).

Scaling Factors

• In our discussions we will consider two scaling factors, α and β

• 1/β is the scaling factor for VDD and oxide thickness D

• 1/α is the scaling factor for all other linear dimensions

• Scaling models are:

– Constant electric field

– Constant voltage

– Combined E & V

Scaling Factors for Device Parameters

• Simple derivations showing the effects of scaling are given in Pucknell and Eshraghian, pages 125 – 129.

• It is important that you understand how the following parameters are affected by scaling:

– Gate Area

– Gate Capacitance per unit area

– Gate Capacitance

– Charge in Channel

– Channel Resistance

– Transistor Delay

– Maximum Operating Frequency

– Transistor Current

– Switching Energy

– Power Dissipation Per Gate (Static and Dynamic)

– Power Dissipation Per Unit Area

– Power - Speed Product
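As a study aid, the first-order scaling of several of these parameters can be computed from α and β. This is a sketch based on the usual simple device relations (C = εA/D, saturation current proportional to Cox·(W/L)·V², delay = R·C), not a reproduction of the textbook derivations; the function and key names are hypothetical, and only a subset of the listed parameters is covered.

```python
def scaling_factors(alpha, beta):
    """Factor by which each parameter is multiplied after scaling.

    VDD and oxide thickness D are multiplied by 1/beta;
    all other linear dimensions are multiplied by 1/alpha.
    """
    voltage = 1 / beta
    gate_area = 1 / alpha**2                  # L * W
    cap_per_area = beta                       # eps_ox / D
    gate_cap = cap_per_area * gate_area       # beta / alpha^2
    current = cap_per_area * voltage**2       # W/L is unchanged by scaling
    resistance = voltage / current            # on-resistance ~ V / I
    delay = resistance * gate_cap             # R * C
    energy = gate_cap * voltage**2            # ~ Cg * VDD^2
    power_per_gate = energy / delay           # energy * max frequency
    return {
        "gate_area": gate_area,
        "gate_capacitance_per_unit_area": cap_per_area,
        "gate_capacitance": gate_cap,
        "transistor_current": current,
        "channel_resistance": resistance,
        "transistor_delay": delay,
        "max_operating_frequency": 1 / delay,
        "switching_energy": energy,
        "power_per_gate": power_per_gate,
        "power_per_unit_area": power_per_gate / gate_area,
        "power_speed_product": power_per_gate * delay,
    }


constant_field = scaling_factors(alpha=2, beta=2)    # beta = alpha
constant_voltage = scaling_factors(alpha=2, beta=1)  # beta = 1
for name, factor in constant_field.items():
    print(f"{name}: x{factor:g} (constant field), "
          f"x{constant_voltage[name]:g} (constant voltage)")
```

Setting β = α gives the constant electric field model and β = 1 gives the constant voltage model; in this sketch power per unit area stays constant in the first case and grows as α² in the second.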

Introduction

• Wiring-up of chip devices takes place through various conductors produced during processing

• Today, interconnects constitute the main source of delay in MOS circuits

• We will examine:

– Sheet Resistance – Resistance / Unit Area

– Area Capacitance

– Delay Units

– CMOS Inverter Delay

– Rise and Fall Time Estimation

Sheet Resistance

• Resistance of a square slab of material

• RAB = ρL/A

• => RAB = ρL/(t·W)

• Let L = W (square slab)

• => RAB = ρ/t = Rs ohm / square

• In general, RAB = Z·Rs, where Z = L/W

[Figure: slab of thickness t, width W and length L between faces A and B]

Typical sheet resistance values for materials are very well characterized

Typical Sheet Resistances for 5µm Technology

Layer                   Rs (Ohm / Sq)
Aluminium               0.03
N Diffusion             10 – 50
Silicide                2 – 4
Polysilicon             15 – 100
N-transistor Channel    10^4
P-transistor Channel    2.5 x 10^4

N-type Minimum Feature Device

[Figure: polysilicon gate crossing n-diffusion, L = 2λ, W = 2λ]

R = 1 sq x Rs = Rs = 10^4 Ω

[Figure: polysilicon gate crossing n-diffusion, L = 8λ, W = 2λ]

R = Z Rs = (L/W) x Rs = 4 x 10^4 Ω
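The two channel resistance calculations can be reproduced by counting squares. This is a minimal sketch: the function name is hypothetical, dimensions are given in units of λ, and the second device is taken as L = 8λ, W = 2λ, the geometry that yields the quoted 4 x 10^4 Ω.

```python
def layer_resistance(length, width, rs):
    """R = Z * Rs, where Z = L/W is the number of squares."""
    return (length / width) * rs


RS_N_CHANNEL = 1e4  # ohm/square, n-transistor channel, 5 um technology

r_min = layer_resistance(2, 2, RS_N_CHANNEL)    # minimum feature device
r_long = layer_resistance(8, 2, RS_N_CHANNEL)   # four squares in series
print(f"R(min device) = {r_min:.0f} ohm, R(8 x 2 device) = {r_long:.0f} ohm")
```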

Area Capacitance of Layers

• Conducting layers are separated from each other by insulators (typically SiO2)

• This may constitute a parallel plate capacitor, C = ε0 εox A / D (farads)

• D = thickness of oxide, A = area of plates

• εox ≈ 4 (relative permittivity of SiO2)

• Area capacitance given in pF/µm2

• Standard unit for a technology node is the gate-channel capacitance of the minimum sized transistor (2λ x 2λ), given as □Cg

• This is a 'technology specific' value

Delay Unit

• For a feature size square gate, τ = Rs x □Cg

• i.e. for 5µm technology, τ = 10^4 ohm/sq x 0.01 pF = 0.1 ns

• Because of the effects of parasitics which we have not considered in our model, delay is typically of the order of 0.2 - 0.3 ns

• Note that τ is very similar to channel transit time τsd

CMOS Inverter Delay

• Pull-down delay = Rpd x 2□Cg

• Pull-up delay = Rpu x 2□Cg

• Asymmetry in rise and fall is due to the resistance difference between pull-up and pull-down (factor of 2.5), which comes from the mobilities of the carriers

• Delay through a pair of inverters is 2τ (fall time) + 5τ (rise time)

• Delay through a pair of CMOS inverters is therefore 7τ

• Asymmetry can be improved by reducing the resistance of the pull-up

• Reduce the resistance of the pull-up by increasing channel width (typically by a factor of 2.5)

• Note that increasing channel width also increases the capacitance
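The 7τ figure follows from the delay unit and the 2.5:1 resistance ratio, as the arithmetic below shows. This is a sketch using the 5 µm values quoted earlier (10^4 ohm/square and 0.01 pF); the constant names are hypothetical and parasitics are ignored.

```python
RS_N_CHANNEL = 1e4    # ohm/square, n-channel sheet resistance (5 um technology)
CG_UNIT = 0.01e-12    # farads, standard gate capacitance unit (0.01 pF)
PULLUP_RATIO = 2.5    # pull-up resistance relative to pull-down

tau = RS_N_CHANNEL * CG_UNIT                           # basic delay unit
pull_down = RS_N_CHANNEL * 2 * CG_UNIT                 # Rpd x 2 Cg = 2 tau
pull_up = PULLUP_RATIO * RS_N_CHANNEL * 2 * CG_UNIT    # Rpu x 2 Cg = 5 tau
pair_delay = pull_down + pull_up                       # one fall plus one rise

print(f"tau = {tau * 1e9:.2f} ns")
print(f"inverter pair delay = {pair_delay / tau:.1f} tau = {pair_delay * 1e9:.2f} ns")
```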

CMOS Inverter Rise and Fall Time Estimation

• τf ≈ 3CL / (βn VDD)

• τr ≈ 3CL / (βp VDD)

• (Derivations for the above are in Pucknell and Eshraghian, pages 105 - 107)

• So, τr / τf = βn / βp

• Given that (due to mobilities) βn = 2.5 βp, rise time is slower by a factor of 2.5 when using minimum dimensions of n and p transistors

Large Capacitive Loads

• When trying to drive off-chip loads, large capacitances are often encountered

• Off-chip capacitances can be of the order of 10^4 □Cg

• Inverters intended to drive large capacitive loads must therefore present low pull-up and pull-down resistances

• The technique employed is to cascade a series of inverters, each one of which is larger than the preceding one by a width factor f

Cascaded Inverters

• If N is the number of stages required to drive the load, f is the width scaling factor, CL is the load capacitance, and Cmin is the minimum capacitance, then

• y = CL/Cmin = f^N, so N = ln(y) / ln(f)

• The total delay is minimized if f = e, the base of natural logarithms

• Details of this derivation are given in Pucknell and Eshraghian, pages 107 - 109
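The trade-off between the number of stages and the per-stage delay can be explored numerically. This sketch assumes each stage contributes a delay proportional to f, so the total is proportional to N·f; the function names and the load ratio of 10^4 are illustrative assumptions, and N is treated as a real number rather than rounded to an integer.

```python
import math


def stages_needed(y, f):
    """Number of stages N satisfying y = f**N (real-valued)."""
    return math.log(y) / math.log(f)


def relative_delay(y, f):
    """Total delay in units of the minimum-inverter delay, assuming each
    stage drives a load f times its own input capacitance."""
    return stages_needed(y, f) * f


Y = 1e4  # CL / Cmin, an assumed off-chip load ratio
for f in (2, math.e, 4, 10):
    print(f"f = {f:5.2f}: N = {stages_needed(Y, f):5.2f}, "
          f"delay = {relative_delay(Y, f):6.2f}")
```

The minimum around f = e is shallow, which is why larger width factors are sometimes chosen in practice to save area at a small cost in delay.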

Power and Energy

• Power is drawn from a voltage source attached to the VDD pin(s) of a chip.

• Instantaneous Power: $P(t) = i_{DD}(t)\,V_{DD}$

• Energy: $E = \int_0^T P(t)\,dt = \int_0^T i_{DD}(t)\,V_{DD}\,dt$

• Average Power: $P_{avg} = \frac{E}{T} = \frac{1}{T}\int_0^T i_{DD}(t)\,V_{DD}\,dt$

CMOS Power Dissipation

• Power dissipation in CMOS circuits comes from two components, i.e. Ptotal = Pstatic + Pdynamic

• Static dissipation due to:

– subthreshold conduction through OFF transistors

– tunneling current through gate oxide

– leakage through reverse-biased diodes

– contention current in ratioed circuits

• Dynamic dissipation due to:

– charging and discharging of load capacitances

– short circuit current while both PMOS and NMOS networks are partially ON

• Ratioed circuits (e.g. pseudo-NMOS) have more static dissipation.

Dynamic Power Dissipation

• Dynamic power is required to charge and discharge load capacitances when transistors switch.

• One cycle involves a rising and falling output.

• On rising output, charge Q = C·VDD is required

• On falling output, charge is dumped to GND

• This repeats T·fsw times over an interval of T

[Figure: supply VDD delivering current iDD(t) to a load capacitance C switching at frequency fsw]

Dynamic Power Cont.

$$P_{dynamic} = \frac{1}{T}\int_0^T i_{DD}(t)\,V_{DD}\,dt = \frac{V_{DD}}{T}\int_0^T i_{DD}(t)\,dt = \frac{V_{DD}}{T}\left[T f_{sw} C V_{DD}\right] = C V_{DD}^2 f_{sw}$$

Activity Factor

• Suppose the system clock frequency = f

• Let fsw = af, where a = activity factor

– If the signal is a clock, a = 1

– If the signal switches once per cycle, a = ½

– Dynamic gates:

• Switch either 0 or 2 times per cycle, a = ½

– Static gates:

• Depends on design, but typically a = 0.1

• Dynamic power: $P_{dynamic} = a\,C\,V_{DD}^2\,f$
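The dynamic power expression is easy to evaluate for a given design point. This is a minimal sketch; the function name is hypothetical and the numbers (1 nF of total switched capacitance, 1 V supply, 1 GHz clock) are assumed purely for illustration.

```python
def dynamic_power(activity, capacitance, vdd, frequency):
    """P_dynamic = a * C * VDD^2 * f, in watts."""
    return activity * capacitance * vdd**2 * frequency


# Assumed example values, not taken from any particular chip
p_static_gates = dynamic_power(activity=0.1, capacitance=1e-9, vdd=1.0, frequency=1e9)
p_clock_like = dynamic_power(activity=1.0, capacitance=1e-9, vdd=1.0, frequency=1e9)
print(f"a = 0.1: {p_static_gates:.2f} W, a = 1: {p_clock_like:.2f} W")
```

Because VDD enters squared, halving the supply voltage cuts dynamic power by a factor of four for the same capacitance, frequency and activity.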

Short Circuit Current

• When transistors switch, both nMOS and pMOS networks may be momentarily ON at once

• Leads to a blip of “short circuit” current.

• < 10% of dynamic power if rise/fall times are comparable for input and output

Static Power

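• Static power is consumed even when the chip is quiescent:

– Ratioed circuits burn power in the fight between ON transistors

– Leakage draws power from nominally OFF devices

$$I_{ds} = I_{ds0}\, e^{\frac{V_{gs} - V_t}{n v_T}} \left(1 - e^{\frac{-V_{ds}}{v_T}}\right)$$

$$V_t = V_{t0} - \eta V_{ds} + \gamma\left(\sqrt{\phi_s + V_{sb}} - \sqrt{\phi_s}\right)$$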

Low Power Design

• Reduce dynamic power

– a: clock gating, sleep mode

– C: small transistors (esp. on clock), short wires

– VDD: lowest suitable voltage

– f: lowest suitable frequency

• Reduce static power

– Selectively use ratioed circuits

– Selectively use low Vt devices

– Leakage reduction: stacked devices, body bias, low temperature

THANK YOU
