Source: salishan.ahsc-nm.org/uploads/4/9/7/0/49704495/chandra.pdf

1 Salishan Conference on High-Speed Computing, 2015

Building reliable chips in future technologies: Fact, fiction or an oxymoron?

Vikas Chandra ARM Research

San Jose, CA

Acknowledgments

- Rob Aitken (ARM)
- Greg Yeric (ARM)
- Subhasish Mitra (Stanford)

Some things are beyond our control!

Courtesy Takahashi Kaito (SII Nanotechnology Inc.)

The eras of computing

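[Figure: number of units (log scale, 1M to 100 billion) vs. year (1960 to 2030). The 1st era covers mainframes and minicomputers. The 2nd era covers the PC, the desktop Internet, the mobile Internet (about 10 billion units) and the Internet of Things (about 100 billion units, around 2025 to 2030).]

- Cheap
- Low power
- Ubiquitous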

Need for reliability

Lifetime profiles

[Figure: usage (low to high) vs. expected lifetime (low to high).]

Feature size scaling

Impact of feature size scaling:
- Charge discretization
- Random manufacturing defects
- Increasing electric field
- Thin gate oxides
- Interface defects at Si/SiON interface
- Metal defects
- Susceptibility to soft errors

[Figure: density doubling roughly every 2.5 years across technology nodes from 3 µm down to 0.022 µm. The chart is annotated with the introduction of Cu, strain, low-k ILD, high-k and FinFET.]

- Beyond 14nm, scaling is more reliant on new materials
- This will require more reliability analysis

CMOS Switches: 19xx - 2015

Devices 2015 – 2025

Technology scaling overview

[Figure: complexity vs. year (1970 to 2020) in three rows. Transistors: PMOS, NMOS, planar CMOS, HKMG, strain. Patterning: LE at ~λ, then LE below λ with strong RET. Interconnect: Al wires, then Cu wires.]

The good ole days:
- Lithography scaling
- Wires negligible

Technology scaling overview

[Figure: complexity vs. year (1970 to 2020) for transistors, patterning and interconnect, as on the previous slide.]

Extrapolating past trends OK:
$\mathrm{Delay} \sim CV/I$
$\mathrm{Area} \sim \mathrm{Pitch}^2$
$\mathrm{Power} \sim CV^2 f$
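The three first-order relations can be exercised numerically. Below is a minimal sketch that assumes classic constant-field (Dennard) scaling with a hypothetical linear scale factor of 0.7 per node. The helper `scaled_metrics` and all input values are illustrative and do not come from the talk. Proportionality constants are dropped.

```python
def scaled_metrics(C, V, I, f, pitch):
    """Return (delay, area, power) up to proportionality constants."""
    delay = C * V / I        # Delay ~ CV/I
    area = pitch ** 2        # Area ~ Pitch^2
    power = C * V ** 2 * f   # Power ~ CV^2 f
    return delay, area, power

# Hypothetical constant-field (Dennard) step with linear scale factor 0.7:
# C, V, I and pitch scale by 0.7; frequency scales by 1/0.7.
k = 0.7
base = scaled_metrics(C=1.0, V=1.0, I=1.0, f=1.0, pitch=1.0)
new = scaled_metrics(C=k, V=k, I=k, f=1 / k, pitch=k)
for name, b, n in zip(("delay", "area", "power"), base, new):
    print(f"{name}: {n / b:.2f}x")
```

Under these assumptions, power per circuit and area both fall to about 0.49x. Power density therefore stays roughly constant, which is what made extrapolating past trends safe.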

Technology complexity inflection point?

[Figure: the complexity vs. year chart (1970 to 2020) extended with FinFET transistors and LELE patterning. It is annotated with the past-trend relations $\mathrm{Delay} \sim CV/I$, $\mathrm{Area} \sim \mathrm{Pitch}^2$ and $\mathrm{Power} \sim CV^2 f$ ("Extrapolating past trends OK"), with the question "Future product development?" at the inflection.]

Technology complexity inflection point?

[Figure: complexity vs. year (1970 to 2020) for transistors, patterning and interconnect, extended with FinFET and LELE.]

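Future technology

[Figure: complexity vs. year (2005 to 2025) roadmap for transistors, patterning and interconnect across the 10nm, 7nm, 5nm and 3nm nodes, with a "You are here" marker at the present.
- Device options: planar CMOS, FinFET, µ-enh, HNW, VNW, 2D (C, MoS2), 1D (CNT), NEMS and spintronics.
- Patterning options: LE, LELE, LELELE, SADP, SAQP, EUV, EUV LELE, EUV + DWEB and EUV + DSA.
- Interconnect and integration options: Al / Cu / W wires, W LI, Cu doping, graphene wire, CNT via, opto I/O, opto int, eNVM, // 3DIC and Seq. 3D.]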

Future transistors

[Figure: the 2005 to 2025 roadmap with the transistor row highlighted: planar CMOS, FinFET, µ-enh, HNW, VNW, 2D (C, MoS2), 1D (CNT), NEMS and spintronics, for the 10nm, 7nm, 5nm and 3nm nodes.]

Future technology: patterning

[Figure: the 2005 to 2025 roadmap with the patterning row highlighted: LE, LELE, LELELE, SADP, SAQP, EUV, EUV LELE, EUV + DWEB and EUV + DSA, for the 10nm, 7nm, 5nm and 3nm nodes.]

Device lifetime and failure rate

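[Figure: failure rate vs. time (bathtub curve). Early life failures dominate the first 1 to 20 weeks. The failure rate stays roughly flat during the normal lifetime, and wearout raises it again after 3 to 10 years.]

With scaling:
- Increasing manufacturing defects (early life failures)
- Increasing transient errors (normal lifetime)
- Acceleration of aging phenomena (wearout)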
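One common way to model the bathtub shape, not necessarily the one behind this slide, is to add three Weibull hazard rates:
- a decreasing one for early-life defects,
- a constant one for random events such as transient errors,
- an increasing one for wearout.

The shape and scale parameters below are hypothetical. They were chosen only so that the curve dips between the early-life and wearout regions.

```python
def weibull_hazard(t, beta, eta):
    """Weibull hazard rate h(t) = (beta / eta) * (t / eta) ** (beta - 1)."""
    return (beta / eta) * (t / eta) ** (beta - 1)

HOURS_PER_WEEK = 24 * 7
HOURS_PER_YEAR = 24 * 365

def bathtub_hazard(t_hours):
    """Sum of three hypothetical Weibull hazards (failures per hour)."""
    early = weibull_hazard(t_hours, beta=0.5, eta=200_000.0)   # beta < 1: decreasing, early-life defects
    random_ = weibull_hazard(t_hours, beta=1.0, eta=50_000.0)  # beta = 1: constant, random events
    wearout = weibull_hazard(t_hours, beta=4.0, eta=60_000.0)  # beta > 1: increasing, aging
    return early + random_ + wearout

for label, t in [("1 week", HOURS_PER_WEEK), ("20 weeks", 20 * HOURS_PER_WEEK),
                 ("3 years", 3 * HOURS_PER_YEAR), ("10 years", 10 * HOURS_PER_YEAR)]:
    print(f"{label:>8}: {bathtub_hazard(t):.2e} failures/hour")
```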

Unreliable transistors – 3 phases

1. Early life failures: gate to source shorts, insulator cracks, thin oxide defects, small opens, poor vias/contacts. Mitigation: burn-in.
2. Normal lifetime: soft errors (memory/flip-flops, combinational cells) and noise (electrical, RTN). Mitigation: design, architecture, ECC.
3. Wearout/aging: transistor wearout (BTI, TDDB), metal wearout (electromigration), ILD wearout. Mitigation: design margins, overdesign.

Soft error: Impact on circuits

- Storage cells (SRAM bit cells, flip-flops): upset of the stored bit
- Combinational cells: transient pulse

[Figure: soft error contribution rate of unprotected memory and flip-flops.]

Big system reliability
- “Always on” systems are especially vulnerable to soft errors
- Vulnerability increases substantially further at low supply voltage
- As technology scales down, the number of cores scales up
- Failure rates increase: the interval between failures shrinks from 2 weeks (current) to 1 hour

Source: Draft ICiS 2012 Reliability Workshop
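FIT (failures in time) counts failures per 10^9 device-hours. For independent components failing at constant rates, FIT values add. A minimal sketch under those assumptions shows why MTBF falls as core count grows; the per-core FIT value is hypothetical. For reference, going from 2 weeks (336 hours) to 1 hour between failures is a 336× increase in the total failure rate.

```python
FIT_DEVICE_HOURS = 1e9  # 1 FIT = 1 failure per 10^9 device-hours

def system_mtbf_hours(fit_per_core: float, n_cores: int) -> float:
    """MTBF (hours) of a system that fails when any core fails.

    Assumes independent cores with constant failure rates, so FIT values add.
    """
    total_fit = fit_per_core * n_cores
    return FIT_DEVICE_HOURS / total_fit

fit_per_core = 1_000.0  # hypothetical per-core soft-error FIT
for cores in (1_000, 10_000, 100_000, 1_000_000):
    mtbf = system_mtbf_hours(fit_per_core, cores)
    print(f"{cores:>9,} cores: MTBF = {mtbf:,.1f} hours")
```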

SRAM upsets

[Figure: SRAM bit arrays showing a single cell upset (one flipped bit) and a multi-cell upset (a cluster of neighboring flipped bits).]

Feature size scaling: cell area ↓, cell capacitances ↓, Qcoll ↓

[Figure: word arrangement in a mux-4 architecture, with bits from words 1, 2, 3, 4 interleaved along the row (1 2 3 4 1 2 3 4).]

FIT rate in 28nm: SRAM

Soft error mitigation - SRAM

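- ECC with physical interleaving reduces the FIT rate to ~0
- Physical interleaving: a multi-cell error → single-bit errors in different words
- Temporal scrubbing to correct single-bit errors

[Figure: ECC block diagram. M data bits are written to memory together with K check bits generated by f. On read, f regenerates the check bits and a compare stage checks them against the stored ones. A corrector then produces the data out and an error signal. Annotation: an error correcting code trades area and power for a lower error rate.]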
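The interleaving argument can be checked with a small model. The sketch below assumes a Hamming(7,4) single-error-correcting code and 4-way interleaving that mirrors the mux-4 word arrangement. Production SRAM ECC typically uses wider SECDED codes, so the code choice and all function names (`hamming74_encode`, `interleave`, etc.) are illustrative.

Scrubbing is not modeled here. It would periodically read, correct and rewrite each word so that single-bit errors do not accumulate into uncorrectable ones.

```python
def hamming74_encode(d):
    """Encode 4 data bits as Hamming(7,4); positions 1..7 = p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(c):
    """Correct up to one flipped bit and return the 4 data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3  # 1-based position of the flipped bit, 0 if none
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]

def interleave(codewords):
    """Physical column i holds bit i // n of word i % n (words 1 2 3 4 1 2 3 4 ...)."""
    n = len(codewords)
    return [codewords[i % n][i // n] for i in range(n * len(codewords[0]))]

def deinterleave(row, n):
    return [[row[j * n + w] for j in range(len(row) // n)] for w in range(n)]

def flip(bits, columns):
    """Model a multi-cell upset that flips physically adjacent cells."""
    out = list(bits)
    for col in columns:
        out[col] ^= 1
    return out

data = [[1, 0, 1, 1], [0, 1, 1, 0], [1, 1, 1, 1], [0, 0, 0, 1]]
codewords = [hamming74_encode(d) for d in data]

# With 4-way interleaving, a 4-cell burst touches each word only once.
row = flip(interleave(codewords), range(8, 12))
recovered = [hamming74_decode(w) for w in deinterleave(row, 4)]
print("interleaved, 4-cell upset corrected:", recovered == data)

# Without interleaving, a 2-cell burst lands in one word and SEC miscorrects it.
bad = hamming74_decode(flip(codewords[0], range(0, 2)))
print("not interleaved, 2-cell upset corrected:", bad == data[0])
```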

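SoC soft error trends

[Figure, left: bitcell SER FIT rate per node (0 to 700) vs. technology node (200 to 0 nm), for single-cell upset (SCU) and multi-cell upset (MCU) averages per node.]

[Figure, right: SoC SER FIT rate per node (log scale, 1 to 1000) vs. technology node (200 to 0 nm), for memory SER and logic SER.]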

Even though per memory bitcell SER sensitivity is decreasing, overall FIT per SoC is increasing
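The divergence between per-bit and per-SoC trends is simple arithmetic. Without ECC or derating, total memory FIT is the per-Mbit FIT times the number of Mbits. The numbers below are hypothetical. They were chosen only to show that halving per-bit sensitivity does not help if on-chip memory capacity quadruples.

```python
def soc_memory_fit(fit_per_mbit: float, mbits: float) -> float:
    """Raw memory SER of an SoC: per-Mbit FIT times capacity (no ECC, no derating)."""
    return fit_per_mbit * mbits

# Hypothetical node-to-node step: per-bit sensitivity halves, on-chip memory quadruples.
old = soc_memory_fit(fit_per_mbit=400.0, mbits=16)
new = soc_memory_fit(fit_per_mbit=200.0, mbits=64)
print(f"old node: {old:.0f} FIT, new node: {new:.0f} FIT ({new / old:.1f}x)")
```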

Are all bits equally vulnerable?

[Figure: the soft error rate is derived from the intrinsic soft error susceptibility, derated by the Architectural Vulnerability Factor (functional masking) and the Timing Vulnerability Factor (timing masking).]
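This sketch assumes a common first-order model that is not stated on the slide. It multiplies the intrinsic FIT of a storage element by its AVF and TVF, both fractions between 0 and 1. The flip-flop names and factor values below are hypothetical.

```python
def derated_ser(raw_fit: float, avf: float, tvf: float) -> float:
    """Effective SER of one storage element: raw FIT derated by AVF and TVF."""
    for name, factor in (("avf", avf), ("tvf", tvf)):
        if not 0.0 <= factor <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {factor}")
    return raw_fit * avf * tvf

# Hypothetical flip-flops: (raw FIT, AVF, TVF). A register whose value rarely
# matters (low AVF) contributes far less than one whose value almost always does.
flops = {
    "pc_reg": (1e-3, 0.9, 0.5),
    "debug_reg": (1e-3, 0.01, 0.5),
}
total = 0.0
for name, (raw, avf, tvf) in flops.items():
    fit = derated_ser(raw, avf, tvf)
    total += fit
    print(f"{name}: {fit:.2e} FIT")
print(f"total: {total:.2e} FIT")
```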

Soft error in logic
- No “cheap” way to protect: the bits are spatially distributed
- Vulnerability factor analysis helps identify the critical ones

[Figure: architecture and workload feed critical flip-flop identification, using fault injection, formal methods, …]

Error injection flow

Start application → pick random injection cycle → pick random register → pick random bit → bit flip → application outcome (Vanished, Output Mismatch, Unexpected Termination, Hang) → repeat
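Below is a toy version of this flow. It assumes a hypothetical 16-bit register machine running a summing loop instead of a real CPU. Each trial picks a random cycle, register and bit, flips that bit, and classifies the outcome against a golden (fault-free) run using the four categories above. The `tmp` register is overwritten every cycle, so flips there always vanish. This is one form of the masking that makes most injected faults harmless.

```python
import random

WIDTH = 16                       # hypothetical register width
MASK = (1 << WIDTH) - 1
MAX_CYCLES = 1_000               # runs longer than this are classified as hangs

def run(data, bound, inject=None):
    """Toy machine summing data[0:bound]; inject=(cycle, reg, bit) flips one register bit."""
    regs = {"i": 0, "acc": 0, "n": bound, "tmp": 0}
    for cycle in range(MAX_CYCLES):
        if inject is not None and inject[0] == cycle:
            _, reg, bit = inject
            regs[reg] = (regs[reg] ^ (1 << bit)) & MASK   # single-bit upset
        if regs["i"] == regs["n"]:
            return "done", regs["acc"]
        if regs["i"] >= len(data):
            return "crash", None                           # out-of-bounds access
        regs["tmp"] = data[regs["i"]]                      # tmp is overwritten every cycle
        regs["acc"] = (regs["acc"] + regs["tmp"]) & MASK
        regs["i"] += 1
    return "hang", None

def classify(golden, result):
    status, value = result
    if status == "crash":
        return "Unexpected Termination"
    if status == "hang":
        return "Hang"
    return "Vanished" if value == golden else "Output Mismatch"

def campaign(trials, seed=0, bound=16):
    rng = random.Random(seed)
    data = [rng.randrange(256) for _ in range(4096)]
    golden = run(data, bound)[1]
    counts = dict.fromkeys(("Vanished", "Output Mismatch", "Unexpected Termination", "Hang"), 0)
    for _ in range(trials):
        cycle = rng.randrange(bound)                       # pick random injection cycle
        reg = rng.choice(["i", "acc", "n", "tmp"])         # pick random register
        bit = rng.randrange(WIDTH)                         # pick random bit
        counts[classify(golden, run(data, bound, (cycle, reg, bit)))] += 1
    return counts

print(campaign(1_000))
```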

Fault injection results

The key message here is that even though most faults vanish, we still need to worry about the remaining 4-8% of faults.

Cross layer resilience approach

[Figure: cross-layer resilience exploration balancing reliability, performance, power and area. A resilience library spans circuit, logic, architecture, SIHFT (software-implemented hardware fault tolerance) and application techniques. Fault injection guides the cross-layer choice, with physical-design-aware exploration (ARM IP / ARM core, technology library, SP&R flow) and emulation.]

Circuit resilience example

[Figure: design and workload drive critical FF identification and FIT analysis via fault injection. An ECO using an SER-optimized library (built from library SER data) produces the optimized design, which is evaluated for FIT, power, area and performance.]

FinFET and soft error rate

$\mathrm{SER} = A_{\mathrm{diff}}\, e^{-Q_{\mathrm{crit}}/Q_{\mathrm{coll}}}$

$Q_{\mathrm{coll}}$ ↓ leads to SER ↓

[Figure: collected charge vs. LET for planar SRAM and FinFET SRAM. Source: Fang and Oates, TDMR 2011]

[Figure: soft error rate vs. technology node, showing the reduction due to FinFET. Source: S. Ramey, et al, IRPS 2013]
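Plugging hypothetical numbers into the slide's expression shows how sensitive SER is to collected charge. The sketch assumes illustrative values in femtocoulombs and uses a hypothetical `ser` helper. Like the slide, it drops the particle-flux factor. Halving Qcoll at fixed Qcrit lowers SER by a factor of e² (about 7.4) here.

```python
import math

def ser(a_diff: float, q_crit_fc: float, q_coll_fc: float) -> float:
    """SER = A_diff * exp(-Qcrit / Qcoll), arbitrary units (flux factor omitted)."""
    return a_diff * math.exp(-q_crit_fc / q_coll_fc)

# Hypothetical values: same A_diff and Qcrit, FinFET collects half the charge.
planar = ser(a_diff=1.0, q_crit_fc=1.0, q_coll_fc=0.5)
finfet = ser(a_diff=1.0, q_crit_fc=1.0, q_coll_fc=0.25)
print(f"FinFET / planar SER ratio: {finfet / planar:.3f}")
```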

Random telegraph noise (RTN)

- Discrete-level current fluctuation with time
- Caused by charge trapping/de-trapping in the dielectric
- For large FETs, this behavior results in 1/f noise

[Figure: drain current vs. time (s), showing discrete current levels with fluctuations of about 10%.]

Source: K. Takeuchi, Renesas, VMC 2011
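The discrete levels come from individual traps switching between empty and filled states. A minimal sketch below assumes a single trap with hypothetical per-step capture and emission probabilities, and produces a two-level telegraph trace. Summing many traps with widely spread time constants is what produces the 1/f spectrum in large FETs; this single-trap model does not attempt that.

```python
import random

def telegraph_current(steps, i_high=1.0, delta=0.1, p_capture=0.02, p_emit=0.05, seed=0):
    """Two-level RTN sketch: one trap toggles between empty and filled states.

    A filled trap lowers the drain current by the fraction `delta`.
    All parameter values are hypothetical.
    """
    rng = random.Random(seed)
    filled = False
    trace = []
    for _ in range(steps):
        if filled and rng.random() < p_emit:
            filled = False          # de-trapping (emission)
        elif not filled and rng.random() < p_capture:
            filled = True           # trapping (capture)
        trace.append(i_high * (1 - delta) if filled else i_high)
    return trace

trace = telegraph_current(10_000)
trapped = sum(1 for x in trace if x < 1.0) / len(trace)
print(sorted(set(trace)), f"fraction of time trapped: {trapped:.2f}")
```

With these probabilities, the long-run fraction of time spent in the trapped state is p_capture / (p_capture + p_emit), about 0.29.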

RTN scaling

RTN scales faster, as $1/(LW)$, than other variability phenomena, which scale as $1/\sqrt{LW}$.

Source: K. Takeuchi, Renesas, VMC 2011
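The gap between the two scaling laws grows quickly as devices shrink. The sketch below compares how each scales when the gate area LW is reduced, assuming the slide's proportionalities hold exactly. The helper name and shrink factors are illustrative.

```python
import math

def relative_sigma(area_ratio: float) -> tuple[float, float]:
    """Relative change in RTN (~1/(LW)) and in other variability (~1/sqrt(LW))
    when the gate area L*W is multiplied by area_ratio."""
    rtn = 1.0 / area_ratio
    other = 1.0 / math.sqrt(area_ratio)
    return rtn, other

for shrink in (0.5, 0.25, 0.1):
    rtn, other = relative_sigma(shrink)
    print(f"LW x {shrink}: RTN x {rtn:.1f}, other variability x {other:.2f}")
```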

Phase 3: Reliability towards end of life

[Figure: failure rate vs. time (bathtub curve, early life 1 to 20 weeks, wearout after 3 to 10 years), highlighting the wearout phase.]

- Transistor wearout: BTI, TDDB
- Metal wearout: electromigration

Wearout/aging

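[Figure: aging mechanisms and the parameters they degrade. Transistor Vt, gm, Ids, Ioff and Igate shift through BTI and TDDB; wire resistance R shifts through EM.]

- BTI → Bias Temperature Instability
- TDDB → Time Dependent Dielectric Breakdown
- EM → Electromigration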

Bias Temperature Instability (BTI)

- Reliability concern at the Si-SiON interface
- Gradual shift in transistor parameters with time

[Figure: NBTI in a PMOS transistor (silicon, gate oxide, poly).
- Stress stage (negative bias): Si-H bonds at the interface dissociate and release hydrogen, and ΔVt grows with time.
- Recovery stage (zero bias): some Si-H bonds recover, and ΔVt partially relaxes.]
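Under DC stress, the reaction-diffusion model is commonly summarized as a power law, ΔVt ∝ t^n, with n around 1/6 for H₂ diffusion. The sketch below assumes that form with a hypothetical prefactor. It ignores recovery (the zero-bias stage above) and any temperature or voltage dependence, so it only illustrates how slowly the shift grows late in life.

```python
def bti_delta_vt_mv(t_seconds: float, a_mv: float = 2.0, n: float = 1 / 6) -> float:
    """Hypothetical power-law BTI threshold-voltage shift (mV) under constant DC stress."""
    if t_seconds < 0:
        raise ValueError("time must be non-negative")
    return a_mv * t_seconds ** n

SECONDS_PER_DAY = 86_400
for label, days in (("1 day", 1), ("1 year", 365), ("3 years", 3 * 365), ("10 years", 10 * 365)):
    print(f"{label:>8}: dVt = {bti_delta_vt_mv(days * SECONDS_PER_DAY):.1f} mV")

# Doubling the stress time raises the shift by only 2**(1/6), about 12%.
print(bti_delta_vt_mv(2e8) / bti_delta_vt_mv(1e8))
```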

Wearout: Impact on circuits

Combinational cells:
- Fmax ↓
- Timing failure as circuits age

State-holding cells (bit cells, flip-flops):
- Static noise margin ↓
- Read and write stability ↓
- Parametric yield loss

Delay Degradation ≠ Failure

But aging in the clock path could cause hold errors
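Delay degradation alone does not cause failure as long as setup slack stays positive. Aging in the capture clock path is different: it delays the capture edge and eats into hold slack. The sketch below assumes a hypothetical launch/capture pair in which only the capture clock path and the short data path age. The delays and degradation percentages are made up.

```python
def hold_slack_ps(launch_clk_ps, data_path_ps, capture_clk_ps, hold_ps):
    """Hold slack = data arrival - (capture clock arrival + hold time)."""
    arrival = launch_clk_ps + data_path_ps
    required = capture_clk_ps + hold_ps
    return arrival - required

# Hypothetical fresh timing (ps).
fresh = hold_slack_ps(launch_clk_ps=100, data_path_ps=40, capture_clk_ps=110, hold_ps=20)
# Hypothetical aging: capture clock path slows 15%, short data path only 2%,
# launch clock path assumed unchanged.
aged = hold_slack_ps(launch_clk_ps=100, data_path_ps=40 * 1.02,
                     capture_clk_ps=110 * 1.15, hold_ps=20)
print(f"fresh hold slack: {fresh:.1f} ps, aged hold slack: {aged:.1f} ps")
```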

CPU workload dependent aging: A case study

- Mid-size ARM CPU
  - > 100K instances, > 10K sequentials
  - Large enough to be interesting
  - Small enough for rapid turnaround time
- Simulation settings
  - Vdd = 0.9V, Temp = 105 °C, Lifetime = 3 years
  - CP 28LP standard-cell library
  - >1K cell-topologies, >10K timing-arcs
- NBTI and PBTI aging model
  - RD: Reaction-Diffusion [Gielen 11, Zheng 09]
  - TD: Trapping-Detrapping [Velamala 12]

Instance based simulation flow

For more details: Workload Dependent NBTI and PBTI Analysis for a sub-45nm Commercial Microprocessor, Intl. Reliability Physics Symposium (IRPS), 2013.

Block and Path Timing Degradation

- Dhrystone workload

[Figure, left: path histogram (number of paths, ×10^4) vs. % timing degradation; max = 17.6%, avg = 9.5%, min = 4.9%.]

[Figure, right: % timing degradation per block (about 20 blocks; y axis 0 to 10%).]

Path Rank Analysis
- Rank paths in the fresh and aged design, sorted by slack
- Non-critical paths can become critical and vice versa

Top 10 paths in the fresh design:

Fresh rank   Aged rank (Dhrystone)   % timing degradation
1            14084                   7.64
2            9781                    7.94
3            9329                    8.02
4            12345                   7.87
5            6220                    8.31
6            36672                   7.16
7            7771                    8.19
8            11580                   7.96
9            28975                   7.40
10           20054                   7.66

Top 10 paths in the aged design (Dhrystone):

Aged rank (Dhrystone)   Fresh rank   % timing degradation
1                       179394       15.61
2                       145042       15.41
3                       134419       15.18
4                       1413427      17.57
5                       272323       15.67
6                       224034       15.46
7                       331934       15.76
8                       275422       15.56
9                       481425       16.06
10                      208561       15.24
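Re-ranking after aging can be sketched in a few lines. With a fixed clock period, least slack corresponds to longest delay, so the sketch ranks by delay. The path names, delays and degradation percentages are hypothetical. They were chosen only so that paths with more degradation overtake the fresh critical paths, as in the tables above.

```python
def rank_paths(delays):
    """Return {path: rank}, rank 1 = longest delay (least slack at a fixed clock period)."""
    order = sorted(delays, key=delays.get, reverse=True)
    return {path: rank for rank, path in enumerate(order, start=1)}

# Hypothetical path delays (ps) and workload-dependent aging (% degradation).
fresh = {"A": 500.0, "B": 495.0, "C": 480.0, "D": 470.0}
degradation_pct = {"A": 8.0, "B": 8.0, "C": 15.0, "D": 17.0}
aged = {p: d * (1 + degradation_pct[p] / 100) for p, d in fresh.items()}

fresh_rank, aged_rank = rank_paths(fresh), rank_paths(aged)
for p in fresh:
    print(f"path {p}: fresh rank {fresh_rank[p]}, aged rank {aged_rank[p]}, "
          f"aged delay {aged[p]:.1f} ps")
```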

Workload Power-State Trace

[Figure: workload power state (active / sleep) vs. time.]

Workload       Average active time
mp3            0.03
web-browse     0.07
3D rendering   0.24
Dhrystone      0.40
video H264     0.54

Workload-Dependent Processor Aging

% timing degradation by workload:

               Switching activity  Power state         Switching activity and power state
Workload       RD        TD        RD        TD        RD        TD
mp3            10.0      6.3       3.6       2.5       2.3       1.6
web-browse     10.6      7.3       6.1       4.2       4.1       2.8
3D rendering   12.3      8.4       6.9       4.7       5.4       3.7
Dhrystone      11.2      7.7       10.1      6.9       7.3       5.0
video H264     12.5      8.5       11.4      7.8       9.1       6.2
worst-case     15.6      10.7      15.6      10.7      15.6      10.7

RD: Reaction-Diffusion model; TD: Trapping-Detrapping model

Conclusions

- Reliability is more critical than ever for RAS-critical designs: servers, HPC, etc.
- Errors can be mitigated at the device, circuit, architecture or even software level
  - Lots of opportunities to design reliable systems from the ground up
- As we scale, some things are becoming worse, some better
  - BTI, TDDB, EM ↑
  - SER ↓
- Future technology challenges
  - Need to keep an eye on the evolution of future devices
  - Quantification of wearout impact on CPU PPA scaling
  - RTN scaling in the era of new devices (nanowires, compound semiconductors, CNT)

Fin