EE241 - Spring 2005bwrcs.eecs.berkeley.edu/Classes/icdesign/ee241_s05/...Transistor size small large...


EE241 - Spring 2005
Advanced Digital Integrated Circuits

Lecture 17: Process Variations

2

Variability Sources

Physical
• Changes in characteristics of devices and wires.
• Caused by IC manufacturing process & wear-out (electromigration).
• Time scale: 10^9 sec (years).

Environmental
• Changes in VDD, temperature, local coupling.
• Caused by the specifics of the design implementation.
• Time scale: 10^-6 to 10^-9 sec (clock tick).


3

4

Process Variations

Control of minimum features does not track feature scaling
• Relative device/interconnect variations increase

Sources:
• Random dopant fluctuations
• Feature size, oxide thickness variations

Effects:
• Speed
• Power, primarily leakage
• Yield


5

Increasing Process Variations

• Increase in variation of process parameters with scaling
• Worst-case design getting more expensive
• “Better than worst-case” design must be error tolerant

[Figure: percentage of total variation accounted for by within-die variation (device and interconnect)]

Original source: Sani Nassif, IBM

6


7

Vt Distribution

[Figure: histogram of # of chips (0 to 120) vs. ∆VTn (mV), from about -40 mV to +32 mV; annotation: ~30mV; 0.18 micron, ~1000 samples. Region labels: Low Freq / Low Isb; High Freq / Medium Isb; High Freq / High Isb.]

8

Sources of Variations

[Figure: Random Dopant Fluctuations. Mean number of dopant atoms (log scale, 10 to 10000) vs. technology node (1000, 500, 250, 130, 65, 32 nm).]

[Figure: Sub-wavelength Lithography. Lithography wavelength (365nm, 248nm, 193nm, 13nm EUV) and technology generation (180nm, 130nm, 90nm, 65nm, 45nm, 32nm) vs. year (1980 to 2020), with the gap between the two marked. Source: Mark Bohr, Intel]


9

10

Achieving Sub-wavelength Resolution


Causes Larger Frequency Distribution

Courtesy Intel


17

Frequency & SD Leakage

[Figure: normalized frequency (0.9 to 1.4) vs. normalized leakage Isb (0 to 20); 0.18 micron, ~1000 samples. Annotations: 20X (leakage axis), 30% (frequency axis). Region labels: Low Freq / Low Isb; High Freq / Medium Isb; High Freq / High Isb.]

18

Variation-tolerant Design

Balance power & frequency with variation tolerance

[Figure: four panels, each with a normalized vertical axis:
• # uArch critical paths (less to more)
• Logic depth (small to large); legend: frequency, target frequency probability
• Transistor size (small to large); legend: power, target frequency probability
• Low-Vt usage (low to high)]


19

Approaches

Worst-case design
• Leaves too many crumbs on the table. Huge concurrency overhead for performance.

Regular design strategies to reduce variation

Careful choice of logic styles

Self-adapting design
• Turns on-line knobs (Vdd, Vt) to guarantee operation of the design. Uses one-time correction for systematic errors.

Alternative Timing Approaches

Self-timed or clockless design
• Defers the decisions to the system level. Comes with large overhead.

Pseudo-synchronous design (e.g. Razor)
• Allows for occasional timing errors. Limited operation range.

30 March 2005 Slide 20

http://www.c2s2.org


Problem: Predictability ≈ (Chip Variability)^-1

Std library abstractions break: don’t “hide” the details anymore, as we scale down

Local printability problems
[Figure labels: defocus effect; exposure variation; resist effect; (shrinks), (grows)]

Global effects
[Figure labels: Cu thickness distribution; Cu thickness histogram]

• Demise of context-free layout design rules
• Correlated random variations hit ckt level


21

Yesterday’s Freelance Layout

[Figure: cell layouts with Vdd and Vss rails, input Ip and output Op.]

No layout restrictions

22

Transistor Orientation Restrictions

[Figure: cell layouts with Vdd and Vss rails, input Ip and output Op.]

Transistor orientation restricted to improve manufacturing control


23

Transistor Width Quantization

[Figure: cell layouts with Vdd and Vss rails, input Ip and output Op.]


“Fabrics” Idea: Atomic Regularity
(Make the Variability Small… Everywhere)

• Starting from basic manufacturing shapes, through circuits, logic and routing, everything is extremely regular
• Means radical re-architecting of flows
• How much predictability? At what cost?
• Initial motivation was “what’s after ASICs”, now more generally aimed at “predictability”

[Figure: from today’s designs to tomorrow’s designs. Labels: Regular Geometry Fabric; Regular Circuits; Regular/Structured Integrated System.]

ASPDAC, Jan. 2005

Regular Fabrics – A Plethora of Choices

• FPGA
• VPGA (CMU)
• River PLA (Berkeley)
• Structured ASIC (e.g. LSI RapidChip)

Trade-off between area, performance, power and time-to-market (factors 5 to 10)


Fabric Architectures: Via Patterned Gate Array

• Configurable with 4 masks for top vias
• Base architecture can be like an FPGA, but replace expensive switches with mask-config vias
• Many possible interconnect options: std cell routing, or fully regular top-level patterned routing
• Array offers fully predictable geom. patterning

VPGA regular logic + standard cell routing:

Network switch (80k Gates)    Area (um2)    Slack (ns)
Standard ASIC flow            1752048       -2.521
Regular Logic VPGA flow       1960000       -2.982

[Pileggi, CMU]


Fabric Analysis: Enhanced Manufacturability for Regular Ckt Fabrics

• Reduced CMP effects
• Copper dishing < 40Å
• Post-CMP copper thickness variation is less than 2-3%
• Highly promising as a manufacturable ‘logic’ replacement structure

[Figure panels: M4 Density of CMU VPGA FPU; Cu Dishing (M4); Final Post-CMP Cu Thickness (M4); Plated Thickness (M4); Oxide Erosion (M4)]

[Boning, MIT; Pileggi, CMU]


Fabric-level Custom Circuit Design: Limited-Switch Dynamic Logic (LSDL)

• Merges latch with every output stage
• Speed of domino – with less power

Experiment: 16-bit Kogge-Stone adders, full domino vs LSDL
• 58 extracted 0.18um fab run models
• Monte-Carlo for chip-to-chip & mismatch

LSDL: good for size & speed
• ~20% less area; ~2X faster

LSDL: good for regularity, var tolerance
• Cells more regular in content and size
• Less variation in pattern density
• More tolerant of manufacturing variation

[Figure: Poly and M1 pattern density for LSDL and Domino; interchip + mismatch variation for Domino and LSDL [Montoye 03]]

[Boning, MIT]


Fabric-Level Flow Design: “Regularizing” Cell-Based Flows

Complementary approach: regularize a library-based flow
• Every cell is identical, except for vias
• Regularity issues handled by cell layout generators, exclusively
• Extends lifetime of existing flows

Looking at impacts on performance
• Ex: granularity of available cell sizes (i.e. library size) is reduced

[Sechen, Washington]


2 Dec 2004, slide 34, LSI Logic Confidential © LSI Logic 2004

RapidChip® Platform ASIC

Configurable Platform
• Families of pre-manufactured slices
• Sea of transistors for high density, high performance user-configurable logic
• Up to 5 layers of metal personalization
• Flexible approach to IP:
    • Diffused only when performance dictates, e.g. high speed SerDes
    • On-demand for most other IPs, e.g. processors
• Rich portfolio of soft IPs available

Today’s Reconfigurable FPGA Platform (Xilinx)

• PowerPC™ Processor, 400+ MHz
• Programmable IO
• 10Mbit Dual-Port™ RAM
• High-speed 3.125 Gbps Serial Transceivers
• >500 DSP datapaths
• 10 Million gates

[Figure labels: 18 Bit, 18 Bit, 36 Bit; Z, VCCIO, Impedance Control]


36

Delay and Power Variability in CMOS

Goal: Investigate the effects of variations in Vth, Lpoly, W, tox and Vdd on the performance of a family of representative circuits.
• Quantify the statistical variability of circuit delay and power (active).
• Identify single parameter contributions to overall variability levels.

Circuits under study:
• NAND chain (six stages)
• Adders (16-bits, various architectures)
• Logic styles: Static, Dynamic Domino, Passgate
• All transistor sizes optimized for minimum delay under an area constraint

Experimental Setup:
• 90nm, pd-SOI technology
• Industrial research site
• All parameter distributions set by predictive BSIMSOI models, ITRS (2003)

37

Monte Carlo Simulation I

Goal I: Vary all parameters simultaneously; study the statistical variability of power and delay.

• Variable parameters: Vth, Lpoly, W, tox, Vdd (mean value: 1V)
• Temperature held at 85˚C
• Interdependencies between parameters reconciled within the simulation
• N = 200 for adders, N = 1000 for NANDs

The spatial correlation coefficient defines parameter matching between adjacent transistors
• Each parameter is assigned identically to all transistors within each circuit instance
• ρ is set to 1, indicating perfect correlation (worst-case)

Flow:
1. Choose circuit under study
2. Draw all parameter values randomly from respective distributions
3. Apply parameters to circuit to create a specific instance
4. Submit circuit instance to SPICE simulation, measure active power and delay
5. Repeat steps 2 to 4 N times
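The flow above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `toy_stage` function is a hypothetical alpha-power-law stand-in for the SPICE run, and the nominal values and spreads in `NOMINAL` and `SIGMA_FRACTION` are made up rather than taken from the BSIMSOI/ITRS distributions used in the lecture. One draw per parameter is shared by the whole circuit, which corresponds to ρ = 1.

```python
import random
import statistics

# Illustrative nominal values and relative 1-sigma spreads (assumptions,
# not the distributions used in the lecture).
NOMINAL = {"vth": 0.30, "L": 90e-9, "W": 1e-6, "tox": 2e-9, "vdd": 1.0}
SIGMA_FRACTION = {"vth": 0.10, "L": 0.05, "W": 0.03, "tox": 0.03, "vdd": 0.03}


def toy_stage(p, c_load=10e-15, alpha=1.3):
    """Hypothetical stand-in for SPICE: alpha-power-law delay, C*V^2 energy."""
    drive = p["W"] / (p["L"] * p["tox"])            # drive strength ~ W / (L * tox)
    i_on = drive * (p["vdd"] - p["vth"]) ** alpha   # arbitrary units
    delay = c_load * p["vdd"] / i_on
    energy = c_load * p["vdd"] ** 2
    return delay, energy


def draw_instance(rng):
    """Draw every parameter once; all transistors share the values (rho = 1)."""
    return {name: rng.gauss(mu, SIGMA_FRACTION[name] * mu)
            for name, mu in NOMINAL.items()}


def monte_carlo(n, seed=0):
    """Repeat draw -> apply -> simulate n times and collect the measurements."""
    rng = random.Random(seed)
    delays, energies = [], []
    for _ in range(n):
        delay, energy = toy_stage(draw_instance(rng))
        delays.append(delay)
        energies.append(energy)
    return delays, energies


def variability(samples):
    """Sigma over mean, the variability metric used on the result slides."""
    return statistics.stdev(samples) / statistics.mean(samples)


delays, energies = monte_carlo(1000)
print(f"delay  sigma/mean = {variability(delays):.3f}")
print(f"energy sigma/mean = {variability(energies):.3f}")
```

In a real setup the call to `toy_stage` would be replaced by writing the drawn parameters into a netlist and invoking the simulator.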


38

Interdependencies Between Parameters

The operating value of Vth is composed of its long channel Vth0 value modified by ∆Vth factors (BSIMSOI Model):

$$
V_{th,OPERATING} \cong V_{th0}(N_{channel}, \Phi_M, \Phi_S, t_{ox})
+ \Delta V_{th,BIAS}(V_{bs})
- \Delta V_{th,DIBL}(V_{ds}, L)
+ \Delta V_{th,HALO}(N_{halo}, L)
+ \Delta V_{th,NarrowWidth}(W)
$$

Interdependencies between parameters are reconciled within each simulation by separating Vth,OPERATING into independent and dependent components.

[Figure: Vth vs. L, showing Vth,OPERATING, Vth0, ∆Vth,HALO, ∆Vth,DIBL and ∆Vth,BIAS.]

39

Monte Carlo Simulation II

Goal II: Isolate individual parameter contributions to overall power/delay variability

• Parameter distributions same as in previous setup
• Again, perfect spatial correlation of parameters is assumed (ρ = 1)

Flow:
1. Choose circuit under study
2. Draw one parameter value randomly from its distribution, hold others at nominal
3. Apply parameters to circuit to create a specific instance
4. Submit circuit instance to SPICE simulation, measure active power and delay
5. Repeat steps 2 to 4 N times
6. Repeat steps 2 to 5 for all p parameters
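A minimal sketch of this one-parameter-at-a-time procedure follows. It rests on assumptions: `toy_delay` is a hypothetical analytic replacement for the SPICE measurement, the parameter spreads are illustrative, and expressing each contribution as a share of the summed single-parameter variances is one plausible normalization, not necessarily the one behind the lecture's charts.

```python
import random
import statistics

# Illustrative nominal values and relative 1-sigma spreads (assumptions).
NOMINAL = {"vth": 0.30, "L": 90e-9, "W": 1e-6, "tox": 2e-9, "vdd": 1.0}
SIGMA_FRACTION = {"vth": 0.10, "L": 0.05, "W": 0.03, "tox": 0.03, "vdd": 0.03}


def toy_delay(p, alpha=1.3):
    """Hypothetical delay model (arbitrary units) standing in for SPICE."""
    return p["vdd"] * p["L"] * p["tox"] / (p["W"] * (p["vdd"] - p["vth"]) ** alpha)


def one_at_a_time(n, seed=0):
    """Vary a single parameter per experiment, holding the rest at nominal."""
    rng = random.Random(seed)
    variances = {}
    for name, mu in NOMINAL.items():          # repeat for all p parameters
        samples = []
        for _ in range(n):                    # repeat N times
            params = dict(NOMINAL)            # others held at nominal
            params[name] = rng.gauss(mu, SIGMA_FRACTION[name] * mu)
            samples.append(toy_delay(params))
        variances[name] = statistics.variance(samples)
    total = sum(variances.values())
    return {name: var / total for name, var in variances.items()}


shares = one_at_a_time(200)
for name, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{name:>4}: {share:6.1%} of delay variance")
```

Because only one parameter moves at a time, any interaction between parameters is invisible to this breakdown; that is a property of the method, not of the toy model.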


40

NAND Chains (6-stages)

[Figure: transistor-level schematics of a three-input NAND (inputs a, b, c; output out) in four styles: Static CMOS; Static Passgate (LEAP); Pulsed Static (clocked by clk, clk'); Dynamic Domino (clocked by clk).]

Loads:
• Static capacitive load, CL = 10fF
• Active, FO3 load (value varies with parameter fluctuations)

41

Adders

Ripple carry with Manchester carry chain (passgate-based)
• Static
• Dynamic

Carry select, logarithmic configuration
• Static, Dynamic Domino, Passgate

[Figure: 16-bit carry-select adder (bit0 to bit15, Cin, Cout0, Cout2, Cout5, Cout9, Cout15) with 0/1 carry-in alternatives per block. Labels: Cout generation; bit level Cout and Sum selection; block level Cout selection.]


42

Adders: CLA Trees

[Figure: 16-bit carry-lookahead tree diagrams (inputs 00 to 15, outputs Sum 00 to Sum 15). Stage labels: P,G Generation; Group P,G; Rippled Carries; Sum Generation.]

• Kogge Stone, Radix 2
• Kogge Stone, Radix 4: large stack height (static) = 8
• Brent-Kung: large intermediate load capacitance along critical path (Sum07 node)
• Han-Carlson

43

Delay, Power Variability: NAND chains

[Figure: two bar charts, normalized delay variability (sigma/mean) and normalized power variability (sigma/mean), for STATIC, PSCMOS, LEAP and DOMINO; vertical axes 0 to 1.6.]

• The static CMOS implementation is the most robust to process parameter variations
• The passgate style (LEAP) displays the highest levels of delay and power variability (30% higher than static)


44

Delay Variability: Adders

[Figure: bar chart of normalized delay variability (sigma/mean), vertical axis 0.0 to 1.8, for: Manchester static; Manchester dynamic; carry select static; carry select passgate; carry select dynamic; Kogge-Stone static radix 2; Kogge-Stone static radix 4; Kogge-Stone passgate; Kogge-Stone dynamic; Han-Carlson; Brent-Kung.]

• Static carry select is the most robust
• The three most variable are passgate-based, between 31% - 67% more spread than static carry select

45

Power Variability: Adders

[Figure: bar chart of normalized power variability (sigma/mean), vertical axis 0.0 to 2.5, for: Manchester static; Manchester dynamic; carry select static; carry select passgate; carry select dynamic; Kogge-Stone static radix 2; Kogge-Stone static radix 4; Kogge-Stone passgate; Kogge-Stone dynamic; Han-Carlson; Brent-Kung.]

• Most robust: static ripple with Manchester carry chain
• The least robust: designs with large/irregular intermediate load capacitance along critical paths (radix 4 Kogge Stone, Brent Kung)


46

Single Parameter Breakdown: NAND Chains

[Figure: two bar charts of individual parameter contribution to delay variability (0% to 45%) from Vth, L, Vdd, tox and W, for STATIC, PSCMOS, LEAP and DOMINO; one panel for the static capacitive load, one for the FO3 load.]

• Results vary depending on final loading stage (static vs. FO3)
• Vth is most significant contributor in all cases
• For active, FO3 loads:
    • Passgate design is most sensitive to Vth variations
    • Increased significance of L variations

47

Single Parameter Breakdown: Adders (Delay)

[Figure: stacked bar chart of normalized individual parameter contribution to delay variability (0% to 100%) from W, tox, Vdd, L and Vth, for: Manchester static; Manchester dynamic; carry select static; carry select passgate; carry select dynamic; Kogge-Stone static radix 2; Kogge-Stone static radix 4; Kogge-Stone passgate; Kogge-Stone dynamic; Han-Carlson; Brent-Kung.]

• Vth is most significant contributor (33% average)
• Passgate designs are the most sensitive to Vth variations
• L is nearly as significant (28% average)


48

Single Parameter Breakdown: Adders (Power)

[Figure: stacked bar chart of normalized individual parameter contribution to power variability (0% to 100%) from W, tox, Vdd, L and Vth, for: Manchester static; Manchester dynamic; carry select static; carry select passgate; carry select dynamic; Kogge-Stone static radix 2; Kogge-Stone static radix 4; Kogge-Stone passgate; Kogge-Stone dynamic; Han-Carlson; Brent-Kung.]

• Vdd contributions dominate (41% average)
• Vth variations are also significant (30% average)

49

Conclusions

• Static CMOS implementations are generally the most robust to parameter variations, for both delay and power
• Passgate designs display the least amount of robustness:
    • Suffer spreads in delay and power variability between 30% – 70% higher than static designs
    • Tend to display highest sensitivity to Vth variations
    • These are worst-case results, due to the assumption of perfect parameter correlation
• Vth variations account for 35% - 40% of delay variability
• Power variability trends suggest a dependence upon large or irregular intermediate load capacitances
• Vth, L and Vdd are consistently the highest contributors to both delay (85%) and power (80%) variation.


50

A Self-adapting Approach

Motivation: Most timing variations are systematic, and can be adjusted for at start-up time using one-time calibration!

• Relevant parameters: Tclock, Vdd, Vth
• Vth control: the most effective and efficient at low voltages
• Can be easily extended to include leakage-reduction and power-down in standby

[Figure: a Module and a Test Module sharing Vdd, Vbb and Tclock, with test inputs and responses.]

• Achieves the maximum power saving under technology limit
• Inherently improves the robustness of design timing
• Minimum design overhead required over the traditional design methodology

51

Vth Tuning via Body Bias

• Less design cost than Vdd tuning
• Vth tunable range: >150mV for a 90nm technology

[Figure: Vth (V), 0 to 0.6, vs. Vbs (V), -2 to 2, with reverse-Vbs and forward-Vbs regions marked; inset: four-terminal transistor symbol (G, B, S, D).]
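As a rough illustration of why body bias moves Vth, the textbook long-channel body-effect relation, Vth = Vth0 + γ(√(2φF − Vbs) − √(2φF)), can be evaluated directly. This is a sketch only: the values of `vth0`, `gamma` and `phi_f` below are illustrative assumptions, not parameters of the 90nm technology on the slide, and short-channel devices deviate from this simple model.

```python
import math


def vth_with_body_bias(vbs, vth0=0.30, gamma=0.25, phi_f=0.45):
    """Textbook body-effect model for an NMOS device.

    vbs > 0 is forward body bias (lowers Vth); vbs < 0 is reverse body
    bias (raises Vth). vth0 [V], gamma [V^0.5] and phi_f [V] are
    illustrative assumptions.
    """
    surface = 2.0 * phi_f
    if vbs >= surface:
        raise ValueError("model is only valid for vbs < 2 * phi_f")
    return vth0 + gamma * (math.sqrt(surface - vbs) - math.sqrt(surface))


for vbs in (-0.5, -0.25, 0.0, 0.25, 0.5):
    print(f"Vbs = {vbs:+.2f} V -> Vth = {vth_with_body_bias(vbs):.3f} V")
```

The square-root dependence means that, in this model, a given amount of forward bias shifts Vth more than the same amount of reverse bias.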


52

Power and Timing Tradeoffs

Vth tuning can effectively gain performance back

[Figure: Eswitching (fJ), 5 to 50, vs. path delay (ps), 1.0E+03 to 1.0E+07, for Vdd: 200-500mV. Curves: worst case, w/o Vth tuning; worst case, w/ Vth tuning; nominal, w/o Vth tuning; nominal, w/ Vth tuning. Annotation: adaptive tuning.]

Adaptive Body Bias: Experiment

[Figure: die photo, 4.5 mm × 5.3 mm, with multiple subsites; each subsite is 1.6 × 0.24 mm and contains a resistor network, CUT, bias amplifier, delay, and PD & counter blocks. 21 sites per die, 150nm CMOS.]

Technology: 150nm CMOS
Number of subsites per die: 21
Body bias range: 0.5V FBB to 0.5V RBB
Bias resolution: 32 mV

Die frequency: Min(F1..F21)
Die power: Sum(P1..P21)
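The aggregation rule on this slide (die frequency is the minimum over subsites, die power is the sum) and the bias range and resolution can be put together in a small sketch. Everything beyond those slide values is an assumption: the linear frequency and exponential power dependence on bias in `die_metrics`, the coefficients `kf` and `kp`, the three-subsite example dies (the test chip has 21), and the selection policy in `pick_bias` are hypothetical and not the procedure used in the experiment.

```python
import math


def bias_steps(lo=-0.5, hi=0.5, step=0.032):
    """Bias settings from 0.5 V RBB (negative) to 0.5 V FBB (positive)."""
    count = int((hi - lo) / step + 1e-9)
    return [round(lo + i * step, 3) for i in range(count + 1)]


def die_metrics(f0, p0, bias, kf=0.3, kp=3.0):
    """Die frequency = min over subsites; die power = sum over subsites.

    f0 and p0 are per-subsite values at zero bias. The bias dependence
    (linear for frequency, exponential for power) is a toy assumption.
    """
    freqs = [f * (1.0 + kf * bias) for f in f0]
    powers = [p * math.exp(kp * bias) for p in p0]
    return min(freqs), sum(powers)


def pick_bias(f0, p0, f_target, p_limit):
    """Return the lowest-power setting meeting both limits, or None."""
    for bias in bias_steps():            # ascending, so lowest power first
        freq, power = die_metrics(f0, p0, bias)
        if freq >= f_target and power <= p_limit:
            return bias, freq, power
    return None


slow_die = pick_bias([0.95, 1.00, 0.97], [1.0, 1.0, 1.0], f_target=1.0, p_limit=6.0)
leaky_die = pick_bias([1.20, 1.25, 1.22], [2.5, 2.5, 2.5], f_target=1.0, p_limit=6.0)
print("slow die  ->", slow_die)    # ends up with forward bias
print("leaky die ->", leaky_die)   # ends up with reverse bias
```

With a 32 mV resolution the 1 V range yields 32 settings, so a 5-bit control word would be enough to address them in this sketch.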


Adaptive Body Bias: Results

[Figure: accepted die (0% to 100%) vs. frequency bin (higher frequency to the right) for noBB, ABB and within-die ABB. Annotations: 100% yield (ABB); 97% highest bin (within-die ABB).]

[Figure: number of dies vs. frequency around ftarget. Dies that are too slow receive FBB and dies that are too leaky receive RBB under ABB.]

For given Freq and Power density:
• 100% yield with ABB
• 97% highest freq bin with ABB for within-die variability

55

Adaptive Approach for Dealing with Variations

Source: Sam Naffziger, HP