PPGEE ’08
Reliability in Nanometer Technologies: Problems and Solutions
Dr.-Ing. Frank Sill
Department of Electrical Engineering, Federal University of Minas Gerais, Av. Antônio Carlos 6627, CEP: 31270-010, Belo Horizonte (MG), Brazil
[email protected]
http://www.cpdee.ufmg.br/~frank/
-
Agenda
Motivation
Failures in Nanometer Technologies
Techniques to Increase Reliability
Shadow Transistors
Copyright Sill, 2008 PPGEE‘08, Reliability 2
-
Motivation
Reliability important for
Normal user
Companies
Medical applications
Cars
Air / Space Environment
…
-
Motivation
[Figure: transistor count [Mill.] versus year, 2002 to 2008. Northwood: 55 Mill.; Prescott: 125 Mill.; Yonah: 151 Mill.; Wolfdale: 410 Mill.]
Probability for failures increases due to:
Increasing transistor count
Shrinking technology
-
Motivation
[Figure: transistor count [Mill.] and technology node versus year, 2002 to 2008. Northwood: 55 Mill.; Prescott: 125 Mill.; Yonah: 151 Mill.; Wolfdale: 410 Mill. Technology nodes shown: 130 nm, 90 nm, 65 nm, 45 nm]
Probability for failures increases due to:
Increasing transistor count
Shrinking technology
-
Dimensions
[Figure: length scale from 1 m through 10 cm, 1 cm, 1 mm, 100 µm and 10 µm down to 100 nm. Source: „Spektrum der Wissenschaften“]
[Figure: „65 nm“ transistor. Source: Intel]
-
Failures in Nanometer Technologies
-
Process Failures
Occur at production phase
Based on
– Process variations
– Particles
– …
Source: Mak
-
Sub-wavelength Lithography
[Figure: lithography wavelength [nm] (365 nm, 248 nm, 193 nm, 13 nm EUV) and technology generation [µm] (180 nm, 130 nm, 90 nm, 65 nm, 45 nm, 32 nm) versus year, 1980 to 2020, with the gap between the two curves marked]
Source: Mark Bohr, Intel
-
Field-dependent Aberrations
$\mathrm{CELL\_A}(X_1,Y_1) \neq \mathrm{CELL\_A}(X_0,Y_0) \neq \mathrm{CELL\_A}(X_2,Y_2)$
[Figure: lens above the wafer plane, with directions towards lens and towards wafer]
Center: minimal aberrations
Edge: high aberrations
Source: R. Pack, Cadence
-
Varying Line Width
[Figure: line width [nm] as a function of wafer X and wafer Y position]
Source: Zhou, 2001
-
Random Dopant Fluctuations
Causes Vth variations
[Figure: mean number of dopant atoms (10 to 10000, logarithmic) versus technology node (1000, 500, 250, 130, 65, 32 nm)]
[Figure panels: Uniform, Non-uniform]
Source: Borkar, Intel
-
Power Density
[Figure: power density (W/cm2), 1 to 10000, versus year, 1970 to 2010, for 4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium®, P4 and Prescott, compared with hot plate, nuclear reactor, rocket nozzle and Sun’s surface]
Source: Moore, ISSCC 2003
-
Temperature Variation
[Figure panels: Power Map, On-Die Temperature]
Power density is not uniformly distributed across the chip
Silicon is not a good heat conductor
Max junction temperature is determined by hot-spots
Impact on packaging, cooling
Source: Borkar, Intel
-
Temperature Variation cont’d
[Figure: Power4 Server Chip]
Source: Devgan, ICCAD’03
-
Temperature Variation cont’d
[Figure: drain current IDS [pA] and delay [s] versus temperature [°C]]
Threshold voltage Vth changes with temperature
→ drain-source current changes
→ delay changes
Source: Burleson, UMASS, 2007
-
Supply Voltage Drop
Source: Trester, 2005
-
Failures Through Increasing Delay
Normal case: data are processed before the clock phase is over
Logic too slow!
→ Data processing takes longer than the clock phase
→ Wrong data in the next clock phase!
[Figure: clock (Clk) timing diagrams for both cases]
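The timing failure above can be illustrated with a minimal Python sketch. The function name `register_capture`, the nanosecond values and the simplification that a late result leaves the stale value on the wire are assumptions for illustration only; setup and hold times are ignored.

```python
def register_capture(logic_delay_ns, clock_period_ns, old_value, new_value):
    """Return (captured value, slack in ns) for one clock phase.

    Simplified model (assumption): the register captures the new result only
    if the logic finishes within the clock period; otherwise it captures the
    stale value that is still present at its input.
    """
    slack_ns = clock_period_ns - logic_delay_ns
    captured = new_value if slack_ns >= 0 else old_value
    return captured, slack_ns


# Logic fast enough: the new result is captured.
print(register_capture(0.75, 1.0, old_value=0, new_value=1))
# Logic too slow (for example after a temperature or voltage change):
# the stale value is captured and is wrong in the next clock phase.
print(register_capture(1.25, 1.0, old_value=0, new_value=1))
```

A negative slack marks the case called "Logic too slow!" on the slide.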
-
Soft Errors
In the 70’s observed: DRAMs occasionally flip bits for no apparent reason
Ultimately linked to alpha particles and cosmic rays
Collisions with particles create electron-hole pairs in substrate
These carriers are collected on dynamic nodes, disturbing the voltage
Source: Automotive 7-8, 2004
-
Soft Errors cont’d
Internal state of a node flips for a short time
If the error isn’t masked by
– Logic: wrong input doesn’t lead to wrong output
– Electrical: pulse is attenuated by following gates
– Timing: data based on the pulse reach the flip-flop after the clock transition
→ wrong data
-
Electromigration
Electromigration: transport of material caused by the gradual movement of ions in a conductor
One of the major failure mechanisms in interconnects
Lifetime is proportional to the width and thickness of the metal lines
Lifetime is inversely proportional to the current density
[Figure: top view and cross-section view of Metal 1 and Metal 2 lines, showing a void, a whisker and a hillock]
Source: Plusquellic, UMBC
-
Electromigration cont’d
[Figure: void in 0.45 µm Al-0.5%Cu line. Source: IMM-Bologna]
[Figure: whiskers in Sn. Source: EPA Centre]
[Figure: hillocks in ZnSn. Source: Ku & Lin, 2007]
-
Time-Dependent Dielectric Breakdown (TDDB)
Tunneling currents
→ Wear out of gate oxide
→ Creation of conducting path between Gate and Substrate, Drain, Source
Depending on electrical field over gate oxide, temperature (exp.), and gate oxide thickness (exp.)
Also: abrupt damage due to extreme overvoltage (e.g. Electro-Static Discharge)
Source: Pey & Tung
-
Variability Trends
[Figure: variability in % (0 to 70) versus technology node [nm] (90, 80, 70, 65, 57, 50, 45, 40, 36, 32, 28) for Vdd, Vth, Performance, Power and Lgate]
Source: Burleson, UMASS, 2007
-
Variability Trends cont’d
Soft Error / Chip (Logic & Mem)
[Figure: relative SER (0 to 150) versus technology [nm] (180, 130, 90, 65, 45, 32, 22, 16)]
Source: Borkar, Intel
-
Variability Trends cont’d
Frequency and sub-threshold leakage variations
[Figure: normalized frequency (0.9 to 1.4) versus normalized leakage Isub (1 to 5) for ~1000 samples at 130 nm. Frequency: ~30%; leakage power: ~5-10X]
Source: Borkar, Intel
-
Variability Trends cont’d
Increasing probability for Gate-Oxide-Breakdown
[Figure: current density Jox (1 to 10000) versus technology (180 nm, 90 nm, 45 nm, 22 nm), annotated "high-k?"]
[Figure: reliability (Weibull slope β, 0 to 16) versus gate oxide thickness [nm] (0 to 12)]
Source: Borkar, Intel
Source: Kauerauf, EDL, 2002
-
Future Designs
100 BT (100 billion transistors) integration capacity
– Billions unusable (variations)
– Some will fail over time
– Intermittent failures
Source: Borkar, Intel
-
Approaches to Increase Reliability
-
Failure Measurement
Reliability R(t):
– Probability of a system to perform as desired until time t
– Example: R(tx) = 0.8 → 80 % chance that system is still running at time tx
Mean Time To Failure MTTF:
– Average time that a system runs until it fails
Failure rate λ:
– Probability that system fails in given time interval
$R(t) = e^{-\lambda t}$
$\mathrm{MTTF} = \int_0^\infty R(t)\,dt = \frac{1}{\lambda}$
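As a small numerical check of the two relations on this slide, the following Python sketch evaluates R(t) = e^(−λt) and compares MTTF = 1/λ with a numerical integral of R(t). The function names and the value λ = 0.1 per year are illustrative assumptions, not values from the talk.

```python
import math


def reliability(t, lam):
    """R(t) = exp(-lam * t) for a constant failure rate lam."""
    return math.exp(-lam * t)


def mttf(lam):
    """Closed form: MTTF = 1 / lam."""
    return 1.0 / lam


def mttf_numeric(lam, steps=100_000):
    """Trapezoidal integration of R(t) from 0 to 20/lam (tail is negligible)."""
    upper = 20.0 / lam
    h = upper / steps
    total = 0.5 * (reliability(0.0, lam) + reliability(upper, lam))
    for i in range(1, steps):
        total += reliability(i * h, lam)
    return total * h


lam = 0.1  # assumed failure rate, failures per year
print(mttf(lam), mttf_numeric(lam))
print(reliability(mttf(lam), lam))  # reliability at t = MTTF is exp(-1)
```

Note that at t = MTTF the reliability has already dropped to e^(−1), roughly 37 %, so the MTTF is not a time up to which the system can be expected to work.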
-
Bathtub Failure Model
Infant mortality
– Declining failure rate
– Based on latent reliability defects
Normal lifetime
– Constant failure rate
– Based on TDDB, EM, hot-electrons…
Wearout period
– Increasing failure rate
– Based on TDDB, EM, etc.
[Figure: failure rate versus time; time-axis marks at 1-40 weeks and 7-15 years]
-
Classification
Failure
– Permanent: defects, wearout, out of range parameters, EM, TDDB ...
– Temporary
  – Transient
    – Radiation: soft errors
    – Non-radiation: power supply, coupling, operation peaks
  – Intermittent: process variations, infant mortality, random dopant fluctuation, ...
Source: Mitra, 2007
-
The Whole System Counts!
-
Triple Module Redundancy (TMR)
[Figure: the input feeds Logic L and two copies of Logic L; their outputs A, B and C go to a voter, which drives the output]
-
Triple Module Redundancy: Voter
Hardware realization of 1-bit majority voter
OUT = AB + AC + BC
Requires 2 gate delays
Truth table (excerpt):
A B C | OUT
1 1 0 | 1
0 1 0 | 0
0 0 1 | 0
0 1 1 | 1
: : : | :
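The voter equation OUT = AB + AC + BC can be checked in a few lines of Python. This is only a software sketch of the logic function, not the hardware realization from the slide; the function names are chosen here for illustration.

```python
from itertools import product


def majority_vote(a, b, c):
    """OUT = AB + AC + BC, evaluated bitwise on integers."""
    return (a & b) | (a & c) | (b & c)


def truth_table():
    """All eight input combinations with the voter output."""
    return [(a, b, c, majority_vote(a, b, c))
            for a, b, c in product((0, 1), repeat=3)]


print("A B C OUT")
for row in truth_table():
    print(*row)
```

Because the operators are bitwise, the same function also votes on whole words, one bit position at a time, for example `majority_vote(0b1100, 0b1010, 0b1001)`.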
-
Triple Module Redundancy cont’d
Note: for a constant module failure rate λ
[Figure: reliability (0 to 1.0) versus time for TMR and simplex (only 1 module)]
After a certain time, the reliability of the TMR system is lower than that of the simplex system
Why: after some time, the probability that 2 modules are wrong is higher than the probability that 2 modules are working!
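The crossover can be reproduced with a short Python sketch. It assumes independent modules with the same constant failure rate λ and a perfect voter, which gives the textbook expression R_TMR = R³ + 3R²(1 − R) = 3R² − 2R³ (all three modules work, or exactly two work). This formula and the function names are assumptions added here; the slide only shows the curves.

```python
import math


def simplex_reliability(t, lam):
    """Single module with constant failure rate: R(t) = exp(-lam * t)."""
    return math.exp(-lam * t)


def tmr_reliability(t, lam):
    """TMR with a perfect voter: at least two of three modules work."""
    r = simplex_reliability(t, lam)
    return 3 * r**2 - 2 * r**3


def crossover_time(lam):
    """3R^2 - 2R^3 = R has the non-trivial solution R = 0.5, so t = ln(2) / lam."""
    return math.log(2) / lam


lam = 0.1  # assumed module failure rate
for t in (1.0, crossover_time(lam), 20.0):
    print(round(t, 3), simplex_reliability(t, lam), tmr_reliability(t, lam))
```

Under these assumptions TMR helps only while the module reliability is above 0.5, that is for t < ln(2)/λ.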
-
Self Adaptive Design
Extend idea of clock domains to Adaptive Power Domains
Tackle static process and slowly varying timing variations
Control VDD, Vth (indirectly by body bias), fclk by calibration at Power On
[Figure: module controlled through VDD, fclk and VBB, exercised with test inputs and responses]
-
Self Adaptive Design: Example
21 submodules per die
Applying 0.5 V Forward/Reverse Body Biasing (FBB/RBB) in steps of 32 mV, respectively
[Figure: accepted dies (0% to 100%) per frequency bin, towards higher frequency, for noBB, ABB and within-die ABB]
For given frequency and power density:
– 100% yield with ABB
– 97% highest frequency bin with ABB for within-die variability
Source: Borkar, Intel
-
Razor Flip-Flop
For uncertainty- and variation-tolerant design
Razor methodology
– Voltage-scaling methodology based on real-time detection and correction of circuit timing errors
– Use the actual hardware to check for errors
– Latch the input data twice: once on the clock edge, and then a little later
– If the data is not the same, you are going too fast
Source: Austin, Computer Magazine, 2004
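The double-sampling idea can be sketched behaviorally in Python. This is a simplified model under stated assumptions, not the Razor circuit itself: the function name `razor_sample`, the time units and the rule that a late signal still shows the old value are all illustrative.

```python
def razor_sample(arrival, period, shadow_delay, old, new):
    """Sample a data line twice and compare the two samples.

    arrival:      time at which the logic output switches from old to new
    period:       time of the main clock edge
    shadow_delay: extra time before the delayed clock edge of the shadow latch
    Returns (main, shadow, error, corrected).
    """
    main = new if arrival <= period else old
    shadow = new if arrival <= period + shadow_delay else old
    error = main != shadow            # comparator output
    corrected = shadow if error else main  # restore the later sample on error
    return main, shadow, error, corrected


print(razor_sample(0.8, 1.0, 0.3, old=0, new=1))  # on time: no error
print(razor_sample(1.2, 1.0, 0.3, old=0, new=1))  # late: error flagged and corrected
print(razor_sample(1.5, 1.0, 0.3, old=0, new=1))  # too late even for the shadow sample
```

The last call shows the limit of this model: a result that arrives after the delayed sample as well goes undetected, so the delay window has to cover the worst-case lateness.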
-
Razor Flip-Flop cont’d
[Figure: logic stage n feeds, through a MUX, the main flip-flop (D, Q), which drives logic stage n+1. A shadow latch (shadow FF) clocked by CLK_delayed samples the same data; a comparator compares both values and raises Error (Error_Sl). The main flip-flop is clocked by CLK]
Source: Austin, 2004
-
Shadow Transistor Approach
-
TDDB Model
TDDB between gate and channel
[Figure: transistor cross-section with gate, gate oxide, drain and source]
[Figure: for an inverter, 65nm-BPTM: Vout/VDD (0% to 100%) and rel. delay (0 to 20) versus RGC [kΩ]]
Model: resistance RGC, transistor split into widths W1 and W2 with W = W1 + W2
Based on: Segura et al., “A Detailed Analysis of GOS Defects in MOS Transistors: Testing Implications at Circuit Level”, 1995.
-
TDDB Model cont’d
TDDB between gate and source/drain
[Figure: transistor cross-section with gate, gate oxide, drain and source]
[Figure: for an inverter, 65nm-BPTM: Vout/VDD (0% to 100%) versus RGC [kΩ]]
Based on: Segura et al., “A Detailed Analysis of GOS Defects in MOS Transistors: Testing Implications at Circuit Level”, 1995.
-
Shadow Transistors
1. Insertion of additional transistors in parallel to vulnerable transistors
→ Shadow transistors (ST)
[Figure: for an inverter, 65nm-BPTM: relative delay (0 to 10) and Vout/VDD (0% to 100%) versus RGC [kΩ], with (w/ ST) and without (wo/ ST) shadow transistors]
-
Shadow Transistors cont’d
2. Application of H-Vt/To transistors with:
– Higher threshold voltage
– Thicker gate oxide
→ Less vulnerable to TDDB
$\frac{\mathrm{MTTF}_{H\text{-}Vt/To}}{\mathrm{MTTF}_{L\text{-}Vt/To}} = 10^{\Delta t_{ox}/0.22} = 10^{0.15/0.22} = 4.81$
MTTF – Mean Time To Failure
Source: Srinivasan, “RAMP: A Model for Reliability Aware Microprocessor Design”; Stathis, J., “Reliability Limits for the Gate Insulator in CMOS Technology”
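The factor 4.81 follows directly from the oxide-thickness term 10^(Δtox/0.22) with Δtox = 0.15 nm. A short Python check, with a function name chosen here for illustration and the constant 0.22 taken as given on the slide (assumed to be in nm, like Δtox):

```python
def mttf_ratio(delta_tox_nm, slope_nm=0.22):
    """MTTF(H-Vt/To) / MTTF(L-Vt/To) = 10 ** (delta_tox / 0.22)."""
    return 10 ** (delta_tox_nm / slope_nm)


print(round(mttf_ratio(0.15), 2))  # thicker oxide by 0.15 nm
print(round(mttf_ratio(0.22), 2))  # every additional 0.22 nm gives one decade
```

The exponential dependence means that even a small increase in oxide thickness changes the TDDB lifetime by a large factor.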
-
Shadow Transistors cont’d
3. Selective insertion of shadow transistors in parallel to vulnerable transistors:
– Component reliability depends on activity, state, temperature, size, fabrication …
– Most vulnerable can be identified
[Figure: netlist modification. Shadow transistors only added in parallel to most vulnerable devices.]
-
Shadow Transistors cont’d
3. Selective insertion of shadow transistors in parallel to vulnerable transistors:
– Component reliability depends on activity, state, temperature, size, fabrication …
– Most vulnerable can be identified
New approach:
– Estimation of stress factors
– Determination of components’ reliability
– Adding redundancy only at most vulnerable components
Advantage: lower area, power and delay penalty compared to complete redundancy or random insertion [Sri04]
[Figure: netlist modification. Shadow transistors only added in parallel to most vulnerable devices.]
Source: [Sri04] Sirisantana, D&T, 2004
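The selection step can be sketched in Python as a simple ranking. This is not the algorithm from the talk: it assumes that a reliability estimate per transistor is already available (the stress-factor estimation is not modeled), and the names `select_for_shadowing`, `budget_fraction` and the example values are hypothetical.

```python
def select_for_shadowing(reliability, budget_fraction):
    """Pick the least reliable transistors for shadow-transistor insertion.

    reliability:     dict mapping transistor name -> estimated reliability (0..1)
    budget_fraction: share of transistors that may receive a shadow transistor
    """
    count = int(len(reliability) * budget_fraction)
    ranked = sorted(reliability, key=reliability.get)  # most vulnerable first
    return ranked[:count]


# Hypothetical per-transistor reliability estimates for a tiny netlist.
estimates = {"M1": 0.99, "M2": 0.80, "M3": 0.95, "M4": 0.70}
print(select_for_shadowing(estimates, budget_fraction=0.5))
```

Ranking by estimated reliability rather than inserting at random is what keeps the area, power and delay penalty low for a given number of added transistors.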
-
Shadow Transistors cont’d
Advantages
– Increased reliability with respect to TDDB
– H-Vt/To: reliability increases by ~5x (for ∆tox = 0.15 nm)
– Remarkable increase of system lifetime
Drawbacks
– Higher input capacitance → higher delay and dynamic power dissipation
– Area increase
Remarks
– Only slight improvements for Gate-Drain/Source breakdown
– H-Vt/To has to be supported by technology
-
ST – Improvement MTTF
≈ 23 % additional transistors
Insertion of L-Vt/To Shadow Transistors
[Figure: improvement of MTTF as regards TDDB (0% to 20%) for the circuits c17, c432, c499, c880, c1355, c1908, c2670, c3540, c5315, c6288 and c7552; series: our algorithm, random insertion]
-
ST – Improvement MTTF (H-Vt/To)
Insertion of H-Vt/To Shadow Transistors
[Figure: improvement of MTTF as regards TDDB (0% to 250%) for the circuits c432, c499, c880, c1355, c1908, c2670, c3540, c5315, c6288 and c7552; series: SPth = 30, SPth = 55]
-
Take Home Messages
Integrated circuits face several kinds of failures
Decreasing structure sizes create more failure sources
Future designs should (have to) be failure tolerant
Possible approaches:
– Triple Module Redundancy (TMR)
– Self-Adapting Designs
– Razor Flip-Flops
– Shadow Transistors
There’s still a lot to do!
-
Thank you!
[email protected]