PPGEE ’08
Reliability in Nanometer Technologies – Problems and Solutions
Dr.-Ing. Frank Sill
Department of Electrical Engineering, Federal University of Minas Gerais,
Av. Antônio Carlos 6627, CEP: 31270-010, Belo Horizonte (MG), Brazil
http://www.cpdee.ufmg.br/~frank/
Copyright Sill, 2008
Agenda
Motivation
Failures in Nanometer Technologies
Techniques to Increase Reliability
Shadow Transistors
Motivation
Reliability important for
Normal user
Companies
Medical applications
Cars
Air / Space Environment
…
Motivation
The probability of failures increases due to:
Increasing transistor count
Shrinking technology
[Figure: processor dies with transistor counts: Northwood, 55 million; Prescott, 125 million; Yonah, 151 million; Wolfdale, 410 million]
Dimensions
[Figure: length scale from 1 m through 10 cm, 1 cm, 1 mm, 100 µm and 10 µm down to 100 nm, ending at a „65 nm“ transistor. Sources: Intel; „Spektrum der Wissenschaften“]
Failures in Nanometer Technologies
Process Failures
Occur at production phase
Based on:
Process variations
Particles
…
Source: Mak
Sub-wavelength Lithography
[Figure: lithography wavelength [nm] (365 nm, 248 nm, 193 nm, 13 nm EUV) and technology generation [µm] (180 nm, 130 nm, 90 nm, 65 nm, 45 nm, 32 nm) over the years 1980 to 2020, logarithmic axes; the "Gap" between wavelength and generation is marked. Source: Mark Bohr, Intel]
Field-dependent Aberrations
[Figure: the same cell A placed at three positions (X1, Y1), (X0, Y0) and (X2, Y2) on a big chip, imaged through a lens onto the wafer plane. Center: minimal aberrations. Edge: high aberrations. Source: R. Pack, Cadence]
$$CELL\_A(X_1, Y_1) \neq CELL\_A(X_0, Y_0) \neq CELL\_A(X_2, Y_2)$$
Varying Line Width
[Figure: line width [nm] as a function of wafer position (Wafer X, Wafer Y). Source: Zhou, 2001]
Random Dopant Fluctuations
Causes Vth variations
[Figure: uniform versus non-uniform dopant distribution; mean number of dopant atoms (logarithmic axis, 10 to 10000) versus technology node (1000, 500, 250, 130, 65, 32 nm). Source: Borkar, Intel]
Power Density
[Figure: power density (W/cm²), logarithmic axis from 1 to 10000, versus year (1970 to 2010) for the processors 4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium®, P4 and Prescott Pentium®, with reference levels for a hot plate, a nuclear reactor, a rocket nozzle and the Sun’s surface. Source: Moore, ISSCC 2003]
Temperature Variation
Power density is not uniformly distributed across the chip
Silicon is not a good heat conductor
Max junction temperature is determined by hot-spots
Impact on packaging, cooling
[Figure: power map and on-die temperature. Source: Borkar, Intel]
Temperature Variation cont’d
[Figure: Power4 server chip. Source: Devgan, ICCAD’03]
Temperature Variation cont’d
Threshold voltage Vth changes with temperature → drain-source current changes → delay changes
[Figure: drain current IDS [pA] and delay [s] versus temperature [°C]. Source: Burleson, UMASS, 2007]
Supply Voltage Drop
Source: Trester, 2005
Failures Through Increasing Delay
[Figure: logic block between flip-flops (FF → Logic → FF), clocked by Clk]
Normal operation: data are processed before the clock phase is over
VDD↓, Temp.↑, ...: logic too slow!
→ Data processing takes longer than the clock phase
→ Wrong data in the next clock phase!
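The failure mechanism above can be written as a simple setup-time check. This is a minimal sketch, assuming the usual single-cycle constraint (clock-to-Q delay + logic delay + setup time must fit into one clock period); the function name and all delay values are hypothetical and only illustrate the idea.

```python
def meets_timing(t_clk_ns, t_clk_to_q_ns, t_logic_ns, t_setup_ns):
    """Return True if data launched by one flip-flop arrive at the next
    flip-flop early enough to be captured at the following clock edge."""
    return t_clk_to_q_ns + t_logic_ns + t_setup_ns <= t_clk_ns


# Hypothetical numbers: 1.0 ns clock period, nominal logic delay 0.7 ns.
nominal_ok = meets_timing(1.0, 0.1, 0.7, 0.1)

# Lower VDD or higher temperature slows the logic, here assumed by 30 %.
slow_ok = meets_timing(1.0, 0.1, 0.7 * 1.3, 0.1)

print(nominal_ok, slow_ok)
```

With these assumed numbers the nominal path fits into the clock period, while the slowed-down path does not, so the next flip-flop would capture wrong data.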
Soft Errors
Observed in the 1970s: DRAMs occasionally flip bits for no apparent reason
Ultimately linked to alpha particles and cosmic rays
Collisions with particles create electron-hole pairs in the substrate
These carriers are collected on dynamic nodes, disturbing the voltage
Source: Automotive 7-8, 2004
Soft Errors cont’d
Internal state of a node flips briefly
If the error isn’t masked by
Logic: a wrong input doesn’t lead to a wrong output
Electrical: the pulse is attenuated by following gates
Timing: data based on the pulse reach the flip-flop after the clock transition
→ wrong data
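Logical masking can be illustrated with a single gate. This is a minimal sketch, assuming a 2-input AND gate whose input `a` is hit by a transient; the function names are hypothetical and not taken from the slides.

```python
def and_gate(a, b):
    """2-input AND gate on single bits (0 or 1)."""
    return a & b


def upset_is_logically_masked(a, b):
    """A transient flips input `a`. The upset is logically masked if the
    AND output is the same with and without the flip."""
    return and_gate(a, b) == and_gate(a ^ 1, b)


# With b = 0 the output is 0 regardless of a, so the flip is masked.
masked = upset_is_logically_masked(a=1, b=0)

# With b = 1 the output follows a, so the flip propagates.
propagates = not upset_is_logically_masked(a=1, b=1)

print(masked, propagates)
```

Electrical and timing masking depend on pulse shape and arrival time and are not covered by this purely logical model.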
Electromigration
Electromigration: transport of material caused by the gradual movement of ions in a conductor
One of the major failure mechanisms in interconnects
Lifetime (MTTF) is proportional to the width and thickness of the metal lines
Lifetime (MTTF) is inversely proportional to the current density
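The dependence on current density is often modeled with Black's equation, MTTF = A · J^(-n) · exp(Ea / (k·T)). The equation is not part of the slides; it is shown here only as a sketch, and the prefactor, exponent n, activation energy and temperature below are hypothetical values. The exponent n is empirical, and n = 1 corresponds to the inverse proportionality stated above.

```python
import math

BOLTZMANN_EV_PER_K = 8.617e-5  # Boltzmann constant in eV/K


def black_mttf(prefactor, current_density, n, activation_energy_ev, temp_k):
    """Black's equation: MTTF = A * J^(-n) * exp(Ea / (k * T))."""
    return (prefactor * current_density ** (-n)
            * math.exp(activation_energy_ev / (BOLTZMANN_EV_PER_K * temp_k)))


# Hypothetical parameters; only the ratio between the two cases matters.
base = black_mttf(1.0, 1.0e6, 2.0, 0.7, 358.0)
doubled_j = black_mttf(1.0, 2.0e6, 2.0, 0.7, 358.0)

# Doubling the current density with n = 2 quarters the lifetime.
ratio = doubled_j / base
print(ratio)
```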
[Figure: top view of a Metal 1 line with a void; cross-section view of Metal 1 and Metal 2 in thick oxide with a whisker/hillock. Source: Plusquellic, UMBC]
Electromigration cont’d
[Figure: void in a 0.45 µm Al-0.5%Cu line. Source: IMM-Bologna]
[Figure: hillocks in ZnSn. Source: Ku & Lin, 2007]
[Figure: whiskers in Sn. Source: EPA Centre]
Time-Dependent Dielectric Breakdown (TDDB)
Tunneling currents
Wear-out of gate oxide
Creation of a conducting path between gate and substrate, drain, source
Depends on the electric field across the gate oxide, temperature (exp.), and gate oxide thickness (exp.)
Also: abrupt damage due to extreme overvoltage (e.g. electrostatic discharge)
Source: Pey & Tung
Variability Trends
[Figure: % variability (0 to 70) versus technology node (90, 80, 70, 65, 57, 50, 45, 40, 36, 32, 28 nm) for Vdd, Vth, performance, power and Lgate. Source: Burleson, UMASS, 2007]
Variability Trends cont’d
[Figure: relative soft error rate (SER) per chip (logic & memory), 0 to 150, versus technology (180, 130, 90, 65, 45, 32, 22, 16 nm). Source: Borkar, Intel]
Variability Trends cont’d
Frequency and sub-threshold leakage variations
[Figure: normalized frequency (0.9 to 1.4) versus normalized leakage Isub (1 to 5) for ~1000 samples in 130 nm: frequency varies by ~30%, leakage power by ~5-10X. Source: Borkar, Intel]
Variability Trends cont’d
[Figure: gate-oxide current density Jox (logarithmic axis, 1 to 10000) versus technology (180 nm, 90 nm, 45 nm, 22 nm). Source: Borkar, Intel]
Increasing probability of gate-oxide breakdown
[Figure: reliability (Weibull slope β, 0 to 16) versus gate oxide thickness [nm] (0 to 12), annotated "high-k?". Source: Kauerauf, EDL, 2002]
Future Designs
100 billion transistors (100 BT) integration capacity
Billions unusable (variations)
Some will fail over time
Intermittent failures
Source: Borkar, Intel
Approaches to Increase Reliability
Failure Measurement
Reliability R(t):
– Probability that a system performs as desired until time t
– Example: R(tx) = 0.8 → 80 % chance that the system is still running at time tx
Mean Time To Failure MTTF:
– Average time that a system runs until it fails
Failure rate λ:
– Probability that the system fails in a given time interval
For a constant failure rate λ:
$$R(t) = e^{-\lambda t}, \qquad MTTF = \int_0^{\infty} R(t)\,dt = \frac{1}{\lambda}$$
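The relation between R(t), λ and MTTF can be checked numerically. This is a minimal sketch, assuming a constant failure rate so that R(t) = exp(-λt); the value of λ and the helper names are hypothetical.

```python
import math


def reliability(t, failure_rate):
    """R(t) = exp(-lambda * t) for a constant failure rate lambda."""
    return math.exp(-failure_rate * t)


def mttf_numeric(failure_rate, steps=200_000):
    """Approximate MTTF = integral of R(t) dt from 0 to infinity with the
    trapezoidal rule, truncating the integral at 50 / lambda."""
    t_end = 50.0 / failure_rate
    dt = t_end / steps
    total = 0.5 * (reliability(0.0, failure_rate)
                   + reliability(t_end, failure_rate))
    for i in range(1, steps):
        total += reliability(i * dt, failure_rate)
    return total * dt


lam = 1e-4                     # hypothetical failure rate per hour
mttf = mttf_numeric(lam)       # numerically close to 1 / lam
t_x = -math.log(0.8) / lam     # time at which R(t_x) = 0.8

print(mttf, t_x)
```

The numerical integral agrees with the closed form 1/λ, and t_x is the time at which the system still runs with 80 % probability.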
Bathtub Failure Model
[Figure: failure rate versus time]
Infant mortality (1-40 weeks): declining failure rate, based on latent reliability defects
Normal lifetime (7-15 years): constant failure rate, based on TDDB, EM, hot electrons…
Wearout period: increasing failure rate, based on TDDB, EM, etc.
Classification
Failure
  Permanent: defects, wearout, out-of-range parameters, EM, TDDB ...
  Temporary
    Transient
      Radiation: soft errors
      Non-radiation: power supply, coupling, operation peaks
    Intermittent: process variations, infant mortality, random dopant fluctuation, ...
Source: Mitra, 2007
The Whole System Counts!
Triple Module Redundancy (TMR)
[Figure: the input feeds Logic L and two copies of Logic L; their outputs A, B and C feed a voter that drives the output]
Triple Module Redundancy: Voter
Hardware realization of a 1-bit majority voter
OUT = AB + AC + BC
Requires 2 gate delays
A B C OUT
0 1 1 1
0 1 0 0
0 0 1 0
1 1 0 1
: : : :
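The voter equation can be tried out directly. This is a minimal sketch of the 1-bit majority function OUT = AB + AC + BC; the function name is an assumption, and the full truth table is generated rather than taken from the slide.

```python
from itertools import product


def majority_voter(a, b, c):
    """1-bit majority voter: OUT = AB + AC + BC."""
    return (a & b) | (a & c) | (b & c)


# Full truth table as rows (A, B, C, OUT).
truth_table = [(a, b, c, majority_voter(a, b, c))
               for a, b, c in product((0, 1), repeat=3)]

for row in truth_table:
    print(*row)
```

A single wrong module output is outvoted by the two correct ones; two wrong outputs that agree are not.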
Triple Module Redundancy cont’d
After a certain time, the reliability of a TMR system is lower than that of a simplex system
Why: after some time, the probability that 2 modules are wrong is higher than the probability that 2 modules are working!
[Figure: reliability (0 to 1.0) versus time for a simplex system (only 1 module) and a TMR system. Note: for a constant module failure rate]
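The crossover can be reproduced with a short calculation. This is a sketch under stated assumptions: independent modules with the same constant failure rate λ, a perfect voter, and a TMR system that works as long as at least 2 of 3 modules work, which gives R_TMR = 3R² − 2R³. The normalized λ is a hypothetical value.

```python
import math


def r_simplex(t, lam):
    """Reliability of a single module with constant failure rate lam."""
    return math.exp(-lam * t)


def r_tmr(t, lam):
    """At least 2 of 3 independent modules work (voter assumed perfect):
    R_TMR = R^3 + 3 R^2 (1 - R) = 3 R^2 - 2 R^3."""
    r = r_simplex(t, lam)
    return 3 * r**2 - 2 * r**3


lam = 1.0                        # normalized failure rate (assumption)
t_cross = math.log(2) / lam      # both curves meet where R = 0.5

print(r_tmr(0.1, lam), r_simplex(0.1, lam))   # early: TMR is better
print(r_tmr(2.0, lam), r_simplex(2.0, lam))   # late: simplex is better
```

Under these assumptions TMR only pays off for mission times shorter than ln(2)/λ, that is, while the module reliability is above 0.5.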
Self Adaptive Design
Extend the idea of clock domains to adaptive power domains
Tackle static process variations and slowly varying timing variations
Control VDD, Vth (indirectly via body bias), and fclk by calibration at power-on
[Figure: a module and a test module; test inputs and responses are used to set VDD, VBB and fclk]
Self Adaptive Design: Example
21 submodules per die
Applying 0.5 V forward/reverse body biasing (FBB/RBB) in steps of 32 mV, respectively
For a given frequency and power density:
100% yield with ABB
97% in the highest frequency bin with ABB for within-die variability
[Figure: accepted dies (0% to 100%) per frequency bin for no body bias (noBB), ABB and within-die ABB. Source: Borkar, Intel]
Razor Flip-Flop
For uncertainty- and variation-tolerant design → Razor methodology
Voltage-scaling methodology based on real-time detection and correction of circuit timing errors
Use the actual hardware to check for errors
Latch the input data twice: once on the clock edge, and then a little later
If the data is not the same, you are going too fast
Source: Austin, Computer Magazine, 2004
Razor Flip-Flop cont’d
[Figure: Razor flip-flop between logic stage n and logic stage n+1: a MUX feeds the main flip-flop (clocked by CLK, D → Q), a shadow latch clocked by CLK_delayed samples the same data, and a comparator drives the Error signal (signals shown: Error, Error_Sl). Timing diagram with CLK, CLK_delayed, D, Q and Error for Instr 1 and Instr 2. Source: Austin, 2004]
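The double-sampling idea can be sketched as a behavioral model. This is not the circuit from the cited source; it is a minimal illustration that assumes the value seen on CLK_delayed is always the correct one, and the function name is hypothetical.

```python
def razor_stage(data_at_clk, data_at_clk_delayed):
    """Behavioral model of one Razor flip-flop for a single cycle.

    data_at_clk:         value sampled by the main flip-flop on the CLK edge
    data_at_clk_delayed: value sampled by the shadow latch on CLK_delayed
    Returns (value_passed_on, error_flag).
    """
    error = data_at_clk != data_at_clk_delayed
    # On an error, the shadow latch value is assumed to be the correct one
    # and is restored into the main flip-flop through the MUX.
    value = data_at_clk_delayed if error else data_at_clk
    return value, error


on_time = razor_stage(1, 1)    # logic was fast enough: no error
too_late = razor_stage(0, 1)   # data arrived after the CLK edge: error

print(on_time, too_late)
```

In a voltage-scaling loop, the error flag would be the signal that tells the controller the supply voltage is too low for the current clock frequency.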
Shadow Transistor Approach
TDDB Model
TDDB between gate and channel
Model: [Figure: cross section with gate, gate oxide, drain and source; the breakdown is represented by a resistance RGC, and the transistor of width W is split into two parts with W = W1 + W2]
For an inverter, 65nm-BPTM: [Figure: Vout/VDD (0% to 100%) and relative delay (0 to 20) versus RGC [kΩ]]
Based on: Segura et al., “A Detailed Analysis of GOS Defects in MOS Transistors: Testing Implications at Circuit Level”, 1995.
TDDB Model cont’d
TDDB between gate and source/drain
Model: [Figure: cross section with gate, gate oxide, drain and source; the breakdown is represented by resistances RGS and RGD, transistor width W]
For an inverter, 65nm-BPTM: [Figure: Vout/VDD (0% to 100%) versus RGC [kΩ]]
Based on: Segura et al., “A Detailed Analysis of GOS Defects in MOS Transistors: Testing Implications at Circuit Level”, 1995.
Shadow Transistors
1. Insertion of additional transistors in parallel to vulnerable transistors
→ Shadow transistors (ST)
For an inverter, 65nm-BPTM: [Figure: relative delay (0 to 10) and Vout/VDD (0% to 100%) versus RGC [kΩ], with (w/ ST) and without (wo/ ST) shadow transistors]
Shadow Transistors cont’d
2. Application of H-Vt/To transistors with:
– Higher threshold voltage
– Thicker gate oxide
→ Less vulnerable to TDDB
$$MTTF \propto 10^{t_{ox}/0.22}$$
$$\frac{MTTF_{H\text{-}Vt/To}}{MTTF_{L\text{-}Vt/To}} = 10^{0.15/0.22} \approx 4.81$$
Source: Srinivasan, “RAMP: A Model for Reliability Aware Microprocessor Design”; Stathis, J., “Reliability Limits for the Gate Insulator in CMOS Technology”
MTTF – Mean Time To Failure
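The factor of about 4.81 follows directly from the exponential dependence on oxide thickness. This is a minimal sketch, assuming MTTF ∝ 10^(tox / 0.22) with tox in nm as on the slide; the function and parameter names are hypothetical.

```python
def mttf_gain(delta_tox_nm, nm_per_decade=0.22):
    """Relative TDDB lifetime gain for a gate oxide that is delta_tox_nm
    thicker, assuming MTTF ~ 10^(t_ox / 0.22 nm)."""
    return 10 ** (delta_tox_nm / nm_per_decade)


gain = mttf_gain(0.15)   # oxide thicker by 0.15 nm
print(round(gain, 2))
```

In this model every additional 0.22 nm of oxide thickness increases the TDDB lifetime by one decade.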
Shadow Transistors cont’d
3. Selective insertion of shadow transistors in parallel to vulnerable transistors:
– Component reliability depends on activity, state, temperature, size, fabrication …
→ Most vulnerable components can be identified
Shadow transistors are only added in parallel to the most vulnerable devices
→ Netlist modification
New approach:
Estimation of stress factors
Determination of component reliability
Adding redundancy only at the most vulnerable components
Advantage: lower area, power and delay penalty compared to complete redundancy or random insertion [Sri04]
Source: [Sri04] Sirisantana, D&T, 2004
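The selective insertion flow can be sketched as follows. This is not the algorithm from the slides; it is a hypothetical illustration in which each transistor already has an estimated vulnerability score, and every name, score and threshold is an assumption.

```python
def select_for_shadowing(vulnerability, threshold):
    """Return the names of transistors whose estimated vulnerability
    exceeds `threshold`, most vulnerable first."""
    chosen = [name for name, score in vulnerability.items()
              if score > threshold]
    return sorted(chosen, key=lambda name: vulnerability[name], reverse=True)


def add_shadow_transistors(netlist, chosen):
    """Return a new netlist (dict: transistor -> list of parallel devices)
    with one shadow transistor added in parallel to each chosen device."""
    return {name: devices + [name + "_shadow"] if name in chosen
            else list(devices)
            for name, devices in netlist.items()}


# Hypothetical vulnerability scores derived from stress factors.
vulnerability = {"M1": 0.9, "M2": 0.2, "M3": 0.6, "M4": 0.1}
netlist = {name: [name] for name in vulnerability}

chosen = select_for_shadowing(vulnerability, threshold=0.5)
modified = add_shadow_transistors(netlist, chosen)
print(chosen, modified)
```

Only the devices above the threshold receive a shadow transistor, which is what keeps the area, power and delay penalty below that of complete redundancy.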
Shadow Transistors cont’d
Advantages
Increased reliability with respect to TDDB
H-Vt/To: reliability increases by ~5x (for Δtox = 0.15 nm)
Remarkable increase of system lifetime
Drawbacks
Higher input capacitance → higher delay and dynamic power dissipation
Area increase
Remarks
Only slight improvements for gate-drain/source breakdown
H-Vt/To has to be supported by the technology
ST – Improvement MTTF
Insertion of L-Vt/To shadow transistors
≈ 23 % additional transistors
[Figure: improvement of MTTF as regards TDDB (0% to 20%) for the circuits c17, c432, c499, c880, c1355, c1908, c2670, c3540, c5315, c6288 and c7552, comparing "our algorithm" with "random insertion"]
ST – Improvement MTTF (H-Vt/To)
Insertion of H-Vt/To shadow transistors
[Figure: improvement of MTTF as regards TDDB (0% to 250%) for the circuits c432, c499, c880, c1355, c1908, c2670, c3540, c5315, c6288 and c7552, comparing SPth = 30 with SPth = 55]
Take Home Messages
Integrated circuits face several kinds of failures
Decreasing structure sizes create more failure sources
Future designs should (have to) be failure tolerant
Possible approaches:
Triple Module Redundancy (TMR)
Self-Adapting Designs
Razor Flip-Flops
Shadow Transistors
There’s still a lot to do!