Performance of CMOS Circuits

67
Dec 2010 Performance of CMOS Circuits 1 Performance of CMOS Circuits Instructed by Shmuel Wimer Eng. School, Bar-Ilan University Credits: David Harris Harvey Mudd College (Some material copied/taken/adapted from Harris’ lecture notes)

description

Performance of CMOS Circuits. Instructed by Shmuel Wimer Eng. School, Bar-Ilan University Credits: David Harris Harvey Mudd College (Some material copied/taken/adapted from Harris’ lecture notes). Outline. Gate and Diffusion Capacitance RC Delay Models Power and Energy Dynamic Power - PowerPoint PPT Presentation

Transcript of Performance of CMOS Circuits

Page 1: Performance of CMOS Circuits

Dec 2010 Performance of CMOS Circuits 1

Performance of CMOS Circuits

Instructed by Shmuel WimerEng. School, Bar-Ilan University

Credits: David HarrisHarvey Mudd College

(Some material copied/taken/adapted from Harris’ lecture notes)

Page 2: Performance of CMOS Circuits

Dec 2010 Performance of CMOS Circuits 2

Outline

Gate and Diffusion Capacitance

RC Delay Models

Power and Energy

Dynamic Power

Static Power

Low Power Design

Page 3: Performance of CMOS Circuits

Dec 2010 Performance of CMOS Circuits 3

MOSFET Capacitance

gate to substrate

gate to source

gate to drain

Page 4: Performance of CMOS Circuits

Dec 2010 Performance of CMOS Circuits 4

Any two conductors separated by an insulator have

capacitance

Gate to channel capacitor is very important

– Creates channel charge necessary for operation

Source and drain have capacitance to body

– Across reverse-biased diodes

– Called diffusion capacitance because it is

associated with source/drain diffusion

Page 5: Performance of CMOS Circuits

Dec 2010 Performance of CMOS Circuits 5

Gate Capacitance Approximate channel as connected to source

Cgs = oxWL/tox = CoxWL = CpermicronW

Cpermicron is typically about 2 fF/m

n+ n+

p-type body

W

L

tox

SiO2 gate oxide(good insulator, ox = 3.90)

polysilicongate

Page 6: Performance of CMOS Circuits

Dec 2010 Performance of CMOS Circuits 6

Accumulation occurs

when Vg is negative (for

P material). Holes are

induced under the oxide.

Cgate = CoxA where Cox

= SiO2o/tox

Page 7: Performance of CMOS Circuits

Dec 2010 Performance of CMOS Circuits 7

Depletion occurs

when Vg is near zero

but < Vtn. Here the

Cgate is given by CoxA

in series with

depletion layer

capacitance Cdep

Page 8: Performance of CMOS Circuits

Dec 2010 Performance of CMOS Circuits 8

Inversion occurs when

Vg is positive and >

Vtn (for P material). A

model for inversion in

comprised of Cox A

connecting from gate-

to-channel and Cdep

connecting from

channel-to-substrate.

Page 9: Performance of CMOS Circuits

Dec 2010 Performance of CMOS Circuits 9

Normalized gate

capacitance versus

Gate voltage Vgs.

High freq behavior

is due to the

distributed

resistance of

channel

Page 10: Performance of CMOS Circuits

Dec 2010 Performance of CMOS Circuits 10

Normalized Experimental MOS Gate Capacitance

Measurements vs Vds, Vgs

For Vds = 0, the total gate capacitance Cox A splits

equally to the drain and source of the transistor.

Page 11: Performance of CMOS Circuits

Dec 2010 Performance of CMOS Circuits 11

For Vds > 0, the gate capacitance tilts more toward

the source and becomes roughly 2/3 CoxA to the

source and 0 to the drain for high Vds.

Page 12: Performance of CMOS Circuits

Dec 2010 Performance of CMOS Circuits 12

Higher Vgs – Vt forces this tilting to occur later,

since the device is linear up to Vgs – Vt = Vds.

Page 13: Performance of CMOS Circuits

Dec 2010 Performance of CMOS Circuits 13

MOS Transistor Gate Capacitance Model

Gate capacitance has different components in different modes, but total remains constant.

Page 14: Performance of CMOS Circuits

Dec 2010 Performance of CMOS Circuits 14

Gate capacitance has different components in different modes, but total remains constant.

Page 15: Performance of CMOS Circuits

Dec 2010 Performance of CMOS Circuits 15

Diffusion Capacitance

Csb, Cdb

Undesirable, called parasitic capacitance

Capacitance depends on area and perimeter

– Use small diffusion nodes

– Comparable to Cg

for contacted diff

– ½ Cg for uncontacted

– Varies with process

Page 16: Performance of CMOS Circuits

Dec 2010 Performance of CMOS Circuits 16

Diffusion Capacitance (Cont’d)

best

worst

Page 17: Performance of CMOS Circuits

Dec 2010 Performance of CMOS Circuits 17

Effective Resistance

Shockley models have limited value

– Not accurate enough for modern transistors

– Too complicated for much hand analysis

Simplification: treat transistor as resistor

– Replace Ids(Vds, Vgs) with effective resistance R

• Ids = Vds/R

– R averaged across switching of digital gate

Too inaccurate to predict current at any given time

– But good enough to predict RC delay

Page 18: Performance of CMOS Circuits

Dec 2010 Performance of CMOS Circuits 18

RC Delay Model Use equivalent circuits for MOS transistors

– Ideal switch + capacitance and ON resistance– Unit nMOS has resistance R, capacitance C– Unit pMOS has resistance 2R, capacitance C

Capacitance proportional to width Resistance inversely proportional to width

kg

s

d

g

s

d

kCkC

kCR/k

kg

s

d

g

s

d

kC

kC

kC

2R/k

Page 19: Performance of CMOS Circuits

Dec 2010 Performance of CMOS Circuits 19

RC Values

Capacitance

– C = Cg = Cs = Cd = 2 fF/m of gate width

– Values similar across many processes Resistance

– R 6 K*m in 0.6um process– Improves with shorter channel lengths

Unit transistors– May refer to minimum contacted device (4/2 )– Or maybe 1 m wide device– Doesn’t matter as long as you are consistent

Page 20: Performance of CMOS Circuits

Dec 2010 Performance of CMOS Circuits 20

Inverter Delay Estimate Estimate the delay of a fanout-of-1 inverter

d = 6RC

C

2C

C

2C

R

C

CR

2C

2C

R

C

2C

Y2

1A

Y 2

1

Page 21: Performance of CMOS Circuits

Dec 2010 Performance of CMOS Circuits 21

Transient Response

DC analysis tells us Vout if Vin is constant

Transient analysis tells us Vout(t) if Vin(t) changes

– Requires solving differential equations

Input is usually considered to be a step or ramp

– From 0 to VDD or vice versa

Page 22: Performance of CMOS Circuits

Dec 2010 Performance of CMOS Circuits 22

Inverter Step Response Ex: find step response of inverter driving load cap

0

0

( )

( ( ))

( )( )

DD

DD

dsn

loa

i

d

ut

n

o

out

V

V

u t t V

t t

V t

V

d I t

d

t

t C

Vin(t)Vout(t)

Cload

Idsn(t)

0

22

( ) )

)

(

0

(

2

dsn DD t DD t

DD

out

outout out Dt D t

t t

I t V V V VV

V t V t VV V V V

Page 23: Performance of CMOS Circuits

Dec 2010 Performance of CMOS Circuits 23

Inverter Step Response Ex: find step response of inverter driving load cap

Vin(t)Vout(t)

Cload

Idsn(t)

Vout(t)

Vin(t)

t0t

Page 24: Performance of CMOS Circuits

Dec 2010 Performance of CMOS Circuits 24

Delay Definitionsrising delay falling delay

high to low propagation

delay

low to high propagation

delay

Page 25: Performance of CMOS Circuits

Dec 2010 Performance of CMOS Circuits 25

Page 26: Performance of CMOS Circuits

Dec 2010 Performance of CMOS Circuits 26

Delay Definitions (Cont’d) tpdr: rising propagation delay

– Maximum time from input crossing 50% to rising output crossing 50%

tpdf: falling propagation delay

– Maximum time from input crossing 50% to falling output crossing 50%

tpd: average propagation delay

– tpd = (tpdr + tpdf)/2

tr: rise time

– From output crossing 0.2 VDD to 0.8 VDD

Page 27: Performance of CMOS Circuits

Dec 2010 Performance of CMOS Circuits 27

Delay Definitions (Cont’d) tf: fall time

– From output crossing 0.8 VDD to 0.2 VDD

tcdr: rising contamination delay

– Minimum time from input crossing 50% to rising

output crossing 50%

tcdf: falling contamination delay

– Minimum time from input crossing 50% to falling

output crossing 50%

tcd: average contamination delay

– tcd = (tcdr + tcdf)/2

Page 28: Performance of CMOS Circuits

Dec 2010 Performance of CMOS Circuits 28

Simulated Inverter Delay Solving differential equations by hand is too hard SPICE simulator solves the equations numerically

– Uses more accurate I-V models too! But simulations take time to write

(V)

0.0

0.5

1.0

1.5

2.0

t(s)0.0 200p 400p 600p 800p 1n

t pdf = 66ps

= 83ps

VinVout

t pdr

Page 29: Performance of CMOS Circuits

Dec 2010 Performance of CMOS Circuits 29

Delay Estimation

We would like to be able to easily estimate delay– Not as accurate as simulation– But easier to ask “What if?”

The step response usually looks like a 1st order RC response with a decaying exponential.

Use RC delay models to estimate delay– C = total capacitance on output node– Use effective resistance R– So that tpd = RC

Characterize transistors by finding their effective R– Depends on average current as gate switches

Page 30: Performance of CMOS Circuits

Dec 2010 Performance of CMOS Circuits 30

RC Delay Model Use equivalent circuits for MOS transistors

– Ideal switch + capacitance and ON resistance– Unit nMOS has resistance R, capacitance C– Unit pMOS has resistance 2R, capacitance C

Capacitance proportional to width Resistance inversely proportional to width

kg

s

d

g

s

d

kCkC

kCR/k

kg

s

d

g

s

d

kC

kC

kC

2R/k

Page 31: Performance of CMOS Circuits

Dec 2010 Performance of CMOS Circuits 31

Example: 3-input NAND

Sketch a 3-input NAND with transistor widths chosen to achieve effective rise and fall resistances equal to a unit inverter (R).

222In worst case of P

only one device is

opened.

3

3

3

all N devices must

be opened.

Page 32: Performance of CMOS Circuits

Dec 2010 Performance of CMOS Circuits 32

3-input NAND Caps

Annotate the 3-input NAND gate with gate and diffusion capacitance.

2C

2C

2C

2C

2C

2C2C 2C 2C

3C

3C

3C

3C3C

3C

3C

2 2 2

3

3

3

Page 33: Performance of CMOS Circuits

Dec 2010 Performance of CMOS Circuits 33

3-input NAND Caps (Cont’d)

Annotate the 3-input NAND gate with gate and diffusion capacitance.

9C

3C

3C3

3

3

222

5C

5C

5C

Page 34: Performance of CMOS Circuits

Dec 2010 Performance of CMOS Circuits 34

Elmore Delay

out0

tV t dt

out1 1 10

jn n n

i j j ii j i j i

tV t dt R C C R

Page 35: Performance of CMOS Circuits

Dec 2010 Performance of CMOS Circuits 35

Elmore Delay

ON transistors look like resistors Pullup or pulldown network modeled as RC ladder Elmore delay of RC ladder

nodes

1 1 1 2 2 1 2... ...

pd i to source ii

N N

t R C

RC R R C R R R C

R1 R2 R3 RN

C1 C2 C3 CN

Page 36: Performance of CMOS Circuits

Dec 2010 Performance of CMOS Circuits 36

For a step input Vin, the delay at any node can be estimated with the

Elmore delay equation tDi = Cj Rk

For example, the Elmore delay at node 7 is give by:

R1 ( C1 + C2 + C3 + C4 + C5+ C6+ C7 + C8 ) +

R6 ( C6+ C7+ C8 )+

R7 ( C7 + C8)

Page 37: Performance of CMOS Circuits

Dec 2010 Performance of CMOS Circuits 37

Page 38: Performance of CMOS Circuits

Dec 2010 Performance of CMOS Circuits 38

Page 39: Performance of CMOS Circuits

Dec 2010 Performance of CMOS Circuits 39

Page 40: Performance of CMOS Circuits

Dec 2010 Performance of CMOS Circuits 40

Example: 2-input NAND Estimate rising and falling propagation delays of a 2-

input NAND driving h identical gates.

6 4pdrt h RC

6C

2C2

2

22

4hC

B

Ax

Y

h copies

R

(6+4h)CY

Page 41: Performance of CMOS Circuits

Dec 2010 Performance of CMOS Circuits 41

Example: 2-input NAND Estimate rising and falling propagation delays of a

2-input NAND driving h identical gates.

2 2 22 6 4

7 4

R R Rpdft C h C

h RC

6C

2C2

2

22

4hC

B

Ax

Y

h copies

(6+4h)C2CR/2

R/2x Y

Page 42: Performance of CMOS Circuits

Dec 2010 Performance of CMOS Circuits 42

Delay Components

Delay has two parts– Parasitic delay

• 6 or 7 RC• Independent of load

– Effort delay• 4h RC• Proportional to load capacitance

Page 43: Performance of CMOS Circuits

Dec 2010 Performance of CMOS Circuits 43

Contamination Delay

Best-case (contamination) delay can be substantially less than propagation delay.

Ex: If both inputs fall simultaneously

6C

2C2

2

22

4hC

B

Ax

Y

R

(6+4h)CYR

3 2cdrt h RC

Page 44: Performance of CMOS Circuits

Dec 2010 Performance of CMOS Circuits 44

Diffusion Capacitance we assumed contacted diffusion on every s / d Good layout minimizes diffusion area Ex: NAND3 layout shares one diffusion contact

– Reduces output capacitance by 2C– Merged uncontacted diffusion might help too

7C

3C

3C3

3

3

222

3C

2C2C

3C3C

IsolatedContactedDiffusionMerged

UncontactedDiffusion

SharedContactedDiffusion

Page 45: Performance of CMOS Circuits

Dec 2010 Performance of CMOS Circuits 45

Layout Comparison Layout representation by stick diagram. What CKT?

Which layout is better?

AVDD

GND

B

Y

AVDD

GND

B

Y

Page 46: Performance of CMOS Circuits

Dec 2010 Performance of CMOS Circuits 46

Power is drawn from a voltage source attached to the VDD pin(s) of a chip.

Instantaneous Power:

Energy:

Average Power:

Power and Energy

( ) ( )DD DDP t i t V

0 0

( ) ( )T T

DD DDE P t dt i t V dt

avg0

1( )

T

DD DDE

P i t V dtT T

Page 47: Performance of CMOS Circuits

Dec 2010 Performance of CMOS Circuits 47

Dynamic Power Dynamic power is required to charge and discharge

load capacitances when transistors switch One cycle involves a rising and falling output On rising output, charge Q = CVDD is required

On falling output, charge is dumped to GND This repeats Tfsw times over an interval of T

Cfsw

iDD

(t)

VDD

Page 48: Performance of CMOS Circuits

Dec 2010 Performance of CMOS Circuits 48

dynamic0

1( )

T

DD DDE

P i t V dtT T

Cfsw

iDD

(t)

VDD

2sw sw

0

( )T

DD DDDD DD DD

V Vi t dt Tf CV CV f

T T

Page 49: Performance of CMOS Circuits

Dec 2010 Performance of CMOS Circuits 49

Activity Factor Suppose the system clock frequency = f Let fsw = f, where = activity factor

– If the signal is a clock, = 1– If the signal switches once per cycle, = ½– Static gates:

• Depends on design, but typically = 0.1– Dynamic gates:

• Switch either 0 or 2 times per cycle, = ½

Dynamic power:2

dynamic DDP CV f

Page 50: Performance of CMOS Circuits

Dec 2010 Performance of CMOS Circuits 50

Short Circuit Current

When transistors switch, both nMOS and pMOS

networks may be momentarily ON at once

Leads to a blip of “short circuit” current.

< 10% of dynamic power if rise/fall times are

comparable for input and output

Page 51: Performance of CMOS Circuits

Dec 2010 Performance of CMOS Circuits 51

Power Dissipation Sources Ptotal = Pdynamic + Pstatic

Dynamic power: Pdynamic = Pswitching + Pshortcircuit

– Switching load capacitances– Short-circuit current

Static power: Pstatic = (Isub + Igate + Ijunct + Icontention)VDD

– Sub-threshold leakage– Gate leakage– Junction leakage– Contention current (ratioed logic)

Page 52: Performance of CMOS Circuits

Dec 2010 Performance of CMOS Circuits 52

Dynamic Power Example

1 billion transistor chip– 50M logic transistors

• Average width: 12 • Activity factor = 0.1

– 950M memory transistors• Average width: 4 • Activity factor = 0.02

– 1.0 V 65 nm process– C = 1 fF/m (gate) + 0.8 fF/m (diffusion)

Estimate dynamic power consumption @ 1 GHz. Neglect wire capacitance and short-circuit current.

Page 53: Performance of CMOS Circuits

Dec 2010 Performance of CMOS Circuits 53

6logic 50 10 12 0.025 / 1.8 / 27C m fF m nF

Power Estimate Ex (Cont’d)

2dynamic logic mem0.1 0.02 1.0 1.0 6.1WP C C GHz

6mem 950 10 4 0.025 / 1.8 / 171C m fF m nF

Page 54: Performance of CMOS Circuits

Dec 2010 Performance of CMOS Circuits 54

Dynamic Power Reduction

Try to minimize:

– Activity factor

– Capacitance

– Supply voltage

– Frequency

2switching DDP CV f

Page 55: Performance of CMOS Circuits

Dec 2010 Performance of CMOS Circuits 55

Activity Factor Estimation Let Pi = Prob(node i = 1)

i = Pi *(1- Pi)

Completely random data has P = 0.5 and = 0.25 Data is often not completely random

– e.g. MSBs of 64-bit words in memory address bus. MSBs of data representing measurements of physical phenomena.

Data propagating through ANDs and ORs has lower activity factor– Depends on design, but typically ≈ 0.1

Page 56: Performance of CMOS Circuits

Dec 2010 Performance of CMOS Circuits 56

Switching Probability

What is the switching probability?

Page 57: Performance of CMOS Circuits

Dec 2010 Performance of CMOS Circuits 57

Example A 4-input AND is built out of two levels of gates Estimate the activity factor at each node if the inputs

have P = 0.5

Page 58: Performance of CMOS Circuits

Dec 2010 Performance of CMOS Circuits 58

Clock Gating The best way to reduce the activity is to turn

off the clock to registers in unused blocks– Saves clock activity ( = 1)– Eliminates all switching activity in the block– Requires determining if block will be used

Page 59: Performance of CMOS Circuits

Dec 2010 Performance of CMOS Circuits 59

Capacitance

Gate capacitance– Fewer stages of logic– Small gate sizes

Wire capacitance– Good floorplanning to keep communicating

blocks close to each other– Drive long wires with inverters or buffers rather

than complex gates

Page 60: Performance of CMOS Circuits

Dec 2010 Performance of CMOS Circuits 60

Voltage / Frequency

Run each block at the lowest possible voltage and frequency that meets performance requirements

Voltage Domains– Provide separate supplies to different blocks– Level converters required when crossing

from low to high VDD domains

Dynamic Voltage Scaling– Adjust VDD and f according to

workload

Page 61: Performance of CMOS Circuits

Dec 2010 Performance of CMOS Circuits 61

Static Power

Static power is consumed even when chip is

quiescent

– Ratioed circuits burn power in fight between ON

transistors. Occurs when output is low (0).

– Leakage draws power from nominally OFF

devices

Page 62: Performance of CMOS Circuits

Dec 2010 Performance of CMOS Circuits 62

Static Power Example

Revisit power estimation for 1 billion transistor chip Estimate static power consumption

– Subthreshold leakage

• Normal Vt: 100 nA/m

• High Vt: 10 nA/m

• High Vt used in all memories and in 95% of logic gates

– Gate leakage 5 nA/m– Junction leakage negligible

Page 63: Performance of CMOS Circuits

Dec 2010 Performance of CMOS Circuits 63

Solution

t

6 6normal-V 50 10 12 0.025 m / 0.05 0.75 10 mW

P 584 mA 275 mA 1.0 V 859 mWstatic

t

6 6high-V

6

50 10 12 0.95 950 10 4 0.025 m /

109.25 10 m

W

t tnormal-V high-V100 nA/ m+ 10 nA/ m / 2 584 mAsubI W W

t tnormal-V high-V 5 nA/ m / 2 275 mAgateI W W

Page 64: Performance of CMOS Circuits

Leakage Control

Leakage and delay trade off– Aim for low leakage in sleep and low delay in

active mode To reduce leakage:

– Increase Vt: multiple Vt

• Use low Vt only in critical circuits– Increase Vs: stack effect

• Input vector control in sleep

Dec 2010 Performance of CMOS Circuits 64

Page 65: Performance of CMOS Circuits

Dec 2010 Performance of CMOS Circuits 65

Gate Leakage

Extremely strong function of tox and Vgs

– Negligible for older processes– Approaches subthreshold leakage at 65 nm and

below in some processes An order of magnitude less for pMOS than nMOS Control leakage in the process using tox > 10.5 Å

– High-k gate dielectrics help

– Some processes provide multiple tox

• e.g. thicker oxide for 3.3 V I/O transistors Control leakage in circuits by limiting VDD

Page 66: Performance of CMOS Circuits

Power Gating Turn OFF power to blocks when they are idle to

save leakage– Use virtual VDD (VDDV)– Gate outputs to prevent

invalid logic levels to next block

Voltage drop across sleep transistor degrades performance during normal operation– Size the transistor wide enough to minimize

impact Switching wide sleep transistor costs dynamic power

– Only justified when circuit sleeps long enough

Dec 2010 Performance of CMOS Circuits 66

Page 67: Performance of CMOS Circuits

Dec 2010 Performance of CMOS Circuits 67

Low Power Design Reduce dynamic power

: clock gating, sleep mode

– C: small transistors (esp. on clock), short wires

– VDD: lowest suitable voltage

– f: lowest suitable frequency

Reduce static power

– Selectively use ratioed circuits (minimize)

– Selectively use low Vt devices (minimize)

– Leakage reduction:

stacked devices, body bias, low temperature