Boston Rs Meeting Jan09


D. J. Weidman January 2009


Daniel J. Weidman, Ph.D.

Advanced Electron Beams, Wilmington, MA

IEEE Boston-area Reliability Society Meeting

Wednesday, January 14, 2009

Practical Reliability Engineering

for Semiconductor Equipment

[email protected]

The pdf of this version of this presentation may be distributed freely.

No proprietary information is included.

If distributing any portion, please include my name as the author.

Thank you, Dan Weidman

Copyright 2009, D. J. Weidman



Practical Reliability Engineering for

Semiconductor Equipment

Abstract

Reliability data can be utilized to allocate efforts for improvements and for presentation to customers. This talk presents several practical techniques used to gather reliability data for these purposes. These techniques are based on basic reliability engineering concepts and are applied in simple ways. Data will be shown for illustrative purposes without details about specific components or subsystems. This presentation will review definitions of several reliability engineering metrics. Examples will illustrate Pareto plots over various time intervals and availability with planned and unplanned downtime. Important metrics such as Mean Time Between Failure (MTBF) and Mean Time Between Assists / Interrupts (MTBA/I) are used for quantifying failure rates.

Data is collected and analyzed from various sources and tallied in a variety of ways. Repair data can be collected from service technician or customer field reports. Reliability data can be collected from in-house or customer-site machines. In-house inventory statistics can indicate which parts are being replaced most frequently, by part number or cost.

Failure Analysis Reports should be communicated within the organization in a way that is effective. Vendors often have to be engaged to improve reliability of components or subsystems. Information that will be presented may be applicable to several other industries.



Daniel J. Weidman, Ph.D.

Dr. Daniel J. Weidman received his Bachelor's degree in Physics from MIT in 1985. He earned his Ph.D. in Electrical Engineering from the University of Maryland, College Park. He has authored or co-authored more than 20 journal articles and technical reports in publications and more than 60 conference presentations. He started working with electron beams more than 20 years ago, and has since returned to that industry. He brings a fresh perspective to reliability engineering in the semiconductor industry, because he has no formal training in reliability engineering and he had less than two years of experience in the semiconductor industry when he took a position as the Reliability Engineer at NEXX Systems.

NEXX Systems is located in Billerica, Massachusetts, and designs and sells semiconductor manufacturing equipment. Dr. Weidman was the Reliability Engineer there for almost five years. Dr. Weidman has resumed working in the field of electron beams, at Advanced Electron Beams of Wilmington, MA. He is the Principal Process Engineer, and his responsibilities include reliability testing of the electron-beam emitters and high-voltage power supplies.





Goal and scope of this talk

Review basic reliability engineering concepts and show how they can be used successfully

Applicable to equipment in the semiconductor industry, and other industries





Goal

Reliability program

Immediate issues

Reactive reliability engineering

Proactive reliability engineering





Goal

Reliability program

Immediate issues


Proactive reliability engineering



PVD Machine

Physical Vapor Deposition of thin metal film

Wafers carried on trays to minimize handling & time to change size

Up to five metals in a small footprint





Goal

Reliability program plan

Immediate issues


Overall process

Data gathering to record each issue

Data tallying

Reliability engineering metrics with examples



Reliability program

Failure Reporting, Analysis, and Corrective Action System

[Flow diagram. Recoverable box labels, in approximate order:]

Customer reports fault (internet access)

Field Service Engineer receives customer report or observes fault

Immediately: Service team addresses customer need (parts or information or both)

Longer term: Reliability team investigates

Implement / integrate identified improvement

Deliver on new machine / upgrade existing machine

[Other labels in the diagram: FSR, FAR, FTA, FMEA, ECO, TestTrack]



Machine faults from customers

About 300 service reports per product line per year

Copied from FSR database, pasted into Excel, and reviewed.

9 entries are shown as an example.





Goal


Reactive reliability engineering

Overall process

Reliability engineering metrics with examples

Fault vs. failure

Pareto plot

Uptime and Availability

MTBF, MTBA, MTBI

MTTR, MTR

Additional metrics specific to industry

Additional metrics



Reliability definitions: faults

Fault: anything that has gone wrong

Failure: an equipment problem

All failures are faults

Examples:

If a transport system stops due to particles that are normal to the process, then it is a failure (and a fault).

If it stops due to a wrench left inside, then it's a fault but not a failure.

[Diagram labels: faults, failures, action]



Pareto plots

[Figure: two Pareto plots of fault counts, one by Location (Location 1 to Location 8) and one by Function (Function A to Function I).]



Machine cross-section

[Figure: machine cross-section with labeled locations. Front end: Location 1. Locations 2, 3, 4, 5, and 8. Location 6, and Location 6 for new control system (sn323 et seq.). Chase: Location 7.]



Faults on all machines in one quarter

[Figure: two Pareto plots of fault counts for the quarter, one by Location (Location 1 to Location 8) and one by Function (Function A to Function I).]




Top faults shown by location and function

Allows focusing on the biggest types of issues

Few enough issues per category per quarter to investigate each issue

Note: A shorter interval, such as monthly,

Has the advantage of a faster response if a problem arises

Has the disadvantage of noise due to smaller sampling (issues shift back and forth)

[Plot: fault counts by location and function; categories shown: 1C, 2I, 2C, 4I, 4C, 1I]
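A tally of this kind can be sketched in a few lines of Python. This is only an illustration of the idea, not the spreadsheet procedure used for the talk; the fault records and the `pareto` helper below are hypothetical.

```python
from collections import Counter

# Hypothetical fault records as (location, function) pairs, as they might be
# transcribed from service reports. The values are illustrative only.
faults = [
    ("1", "C"), ("2", "I"), ("1", "C"), ("4", "I"),
    ("2", "C"), ("2", "I"), ("1", "C"), ("4", "C"),
]

def pareto(records):
    """Return (category, count) pairs sorted by descending count."""
    counts = Counter(location + function for location, function in records)
    return counts.most_common()

for category, count in pareto(faults):
    print(f"{category}: {count}")
```

Running the same tally over a month instead of a quarter uses fewer records per category, which is where the noise mentioned above comes from.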




Location 1 & Function C, 7

new subsystem

new subsystem

new dll

reboot controller

reboot controller

component ineffective

issue with test wafers

Most of these faults are not failures: upgrades of subsystem on older machines or rebooting

No predominant issue




Location 2 & Function I, 6

5 of 6 faults: same component

Validated a known issue and two ECOs to address it




[Figure: Pareto plots of fault counts by Location and by Function, repeated.]



Sample size & machine failures by month

[Figure: Pareto plots of machine failures by Location (Location 1 to Location 8) and by Function (Function A to Function I), one pair of plots for each of April, May, and June.]





Goal



Reliability engineering metrics with examples

Fault vs. failure

Pareto plot


MTBF, MTBA, MTBI

MTTR, MTR

Additional metrics specific to industry

Additional metrics



Reliability definitions: uptime, etc.

All time: either uptime or downtime

Uptime: either operating or idle time

Uptime (hours) gives availability (%)

Downtime: either PM (scheduled maintenance) or UM (Unscheduled Maintenance, i.e. repairs)

MTTR (mean time to repair) applies to PM and to UM

[Diagram: all time is divided into uptime (operating / productive, idle / standby) and downtime (PM / scheduled, UM / repairs); availability corresponds to the uptime portion.]
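As a minimal sketch of these definitions, availability can be computed as uptime divided by all time. The function name and the hour values below are assumptions chosen for illustration; they are not data from the talk.

```python
def availability(operating_h, idle_h, pm_h, um_h):
    """Availability (%) = uptime / (uptime + downtime) * 100."""
    uptime = operating_h + idle_h      # operating/productive plus idle/standby
    downtime = pm_h + um_h             # scheduled PM plus unscheduled repairs
    return 100.0 * uptime / (uptime + downtime)

# Illustrative 1000-hour interval (assumed values).
pct = availability(operating_h=800.0, idle_h=100.0, pm_h=70.0, um_h=30.0)
print(f"availability = {pct:.1f}%")
```

Idle time counts as uptime here, so a machine that is ready but waiting for wafers does not lower its availability.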



Reliability definitions: SEMI

Above plot is from SEMI E10

We assume that Total Time is Operations Time



Machine availability, on average

Specification: availability > 85%

Typical performance: 90%

Measured from Field Service Reports

[Plot: Machine Availability as portion of Operations time, by quarter]



Machine availability, on average

Specification: availability > 85%

Typical performance: 90%

Measured from Field Service Reports

Machine PM time approximately 7%. Customers report:

At beta customer, one machine: 94.3% availability, so better than 6% PM

At another customer: 6% PM reported

[Plot: Machine Availability as portion of Operations time, by quarter]





Goal



Reliability engineering metrics with examples

Fault vs. failure

Pareto plot

MTBF, MTBA, MTBI

MTTR, MTR

Additional metrics specific to industry

Additional metrics



SEMI E10 definitions

Assist: an unplanned interruption where

Externally resumed (human operator or host computer), and

No replacement of parts, other than specified consumables, and

No further variation from specifications of equipment operation

Failure: unplanned interruption that is not an assist

# of interrupts = # of assists + # of failures
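The three conditions for an assist can be written as a small classifier. This is a sketch of the definitions as stated on this slide, not of the SEMI E10 text itself; the `Interrupt` record and its field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Interrupt:
    """One unplanned interruption (hypothetical record layout)."""
    externally_resumed: bool      # resumed by a human operator or host computer
    parts_replaced: bool          # parts other than specified consumables
    within_spec_afterwards: bool  # no further variation from specifications

def classify(event: Interrupt) -> str:
    """An interrupt is an assist only if all three conditions hold."""
    is_assist = (event.externally_resumed
                 and not event.parts_replaced
                 and event.within_spec_afterwards)
    return "assist" if is_assist else "failure"

# Illustrative events only.
events = [
    Interrupt(True, False, True),   # operator restarts the machine: assist
    Interrupt(True, True, True),    # a part had to be replaced: failure
    Interrupt(True, False, False),  # runs out of spec afterwards: failure
]
labels = [classify(e) for e in events]
assists = labels.count("assist")
failures = labels.count("failure")
print(assists, failures, assists + failures == len(events))
```

Because every interrupt is labeled as exactly one of the two, the count of interrupts always equals assists plus failures.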



SEMI E10 definitions

MTBF, MTBA, MTBI

MTBF = Interval / (number of failures)

MTBA = Interval / (number of assists)

MTBI = Interval / (number of interrupts)

# of interrupts = # of assists + # of failures

Interval/MTBI = Interval/MTBA + Interval/MTBF

1/MTBI = 1/MTBA + 1/MTBF
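The relations above can be checked numerically. The sketch below uses assumed counts over an assumed interval; `mean_time_between` is a hypothetical helper, and returning `None` for a zero count is a choice made here, not something the slide specifies.

```python
def mean_time_between(interval_h, count):
    """Interval / count, or None when no events occurred in the interval."""
    return interval_h / count if count else None

# Illustrative values (assumed, not from the talk).
interval_h = 2000.0
failures = 4
assists = 6

mtbf = mean_time_between(interval_h, failures)
mtba = mean_time_between(interval_h, assists)
mtbi = mean_time_between(interval_h, failures + assists)

print(f"MTBF = {mtbf:.1f} h, MTBA = {mtba:.1f} h, MTBI = {mtbi:.1f} h")

# The reciprocal identity follows from interrupts = assists + failures.
assert abs(1 / mtbi - (1 / mtba + 1 / mtbf)) < 1e-12
```

MTBI is always smaller than both MTBF and MTBA, since interrupts include both kinds of event.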



MTBF (Mean Time Between Failure) in hrs per machine each month

250 hours is specified

Based on Field Service Reports

[Plot: MTBF by month]



MTBF (Mean Time Between Failure) in hrs per machine each quarter

250 hours is specified

Field Service Reports indicate we exceed this

Quarterly less noisy than monthly

[Plot: MTBF by quarter]



Customer-measured MTBF due to our improvements

Per machine, averaged over two machines

1: Adjusted one of the subsystems

2: Installed an upgraded version of the subsystem in one machine

[Plot: MTBF by week, 4-week rolling average, with spec line at 250 hours]
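One plausible way to form a 4-week rolling MTBF is to divide the hours in the trailing window by the failures in that window, which avoids dividing by zero in single weeks with no failures. The slide does not say how the customer computed the average, so this method, the function name, and the data are all assumptions.

```python
def rolling_mtbf(weekly_hours, weekly_failures, window=4):
    """MTBF over a trailing window: hours in window / failures in window.

    Returns one value per week once a full window is available, and
    None where the window contains no failures.
    """
    result = []
    for end in range(window, len(weekly_hours) + 1):
        hours = sum(weekly_hours[end - window:end])
        fails = sum(weekly_failures[end - window:end])
        result.append(hours / fails if fails else None)
    return result

# Illustrative data for one machine running 24/7 (168 hours per week).
weekly_hours = [168.0] * 6
weekly_failures = [2, 1, 0, 1, 0, 0]
print(rolling_mtbf(weekly_hours, weekly_failures))
```

In this made-up series the rolling MTBF rises as failures drop out of the window, which is the kind of trend the plot is meant to show after each improvement.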



MTTR (Mean Time To Repair) and MTR

MTTR is Mean Time to Repair (SEMI E10 definition): the average elapsed time (not person hours) to correct a failure and return the equipment to a condition where it can perform its intended function, including equipment test time and process test time (but not maintenance delay).

MTR is Mean Time to Restore: includes maintenance delays.
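The difference between the two metrics is only whether maintenance delay is counted. A minimal sketch, with hypothetical repair records in elapsed hours:

```python
# Each record: (maintenance_delay_h, repair_and_test_h). Values are assumed
# for illustration; the delay might be waiting for a part or a technician.
repairs = [
    (4.0, 2.0),
    (0.5, 1.5),
    (12.0, 3.0),
    (1.5, 1.5),
]

# MTTR excludes maintenance delay; MTR includes it.
mttr = sum(repair for _, repair in repairs) / len(repairs)
mtr = sum(delay + repair for delay, repair in repairs) / len(repairs)

print(f"MTTR = {mttr:.1f} h, MTR = {mtr:.1f} h")
```

With these numbers most of the time to restore is delay rather than repair work, which points at logistics rather than at the repair procedure.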





Goal



Reliability engineering metrics with examples

Fault vs. failure: all failures are faults

Pareto plot: location and function, sample size of several

Uptime and Availability: time is up or down

MTBF, MTBA, MTBI: I = F + A

MTTR, MTR: working time vs. clock time

Additional metrics specific to industry

Additional metrics



Broken wafers

Goals

Ideally zero

In practice, need fewer than 1 in 10k (or 1 in 100k)

Broken wafers reported on four different machines

Qty 1, Dec, Year 1

Qty 4, Feb, Year 2

Qty 4, March, Year 2

Qty 1, May, Year 2

Total broken wafers reported

1 in Year 1

9 in Year 2 Q1 and Q2

>7,000k wafers/year on all machines

within 1 in 300k

Total reported on newer-style machines: zero
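The "within 1 in 300k" figure can be checked from the numbers on this slide. The sketch below assumes that wafer throughput is spread evenly over the year and treats 7,000k wafers/year as a lower bound, as the ">" on the slide suggests.

```python
wafers_per_year = 7_000_000   # slide gives ">7,000k wafers/year" (lower bound)
broken_in_half_year = 9       # Year 2, Q1 and Q2, from the slide

# Assumption: throughput is even over the year, so half the wafers run in Q1+Q2.
wafers_in_half_year = wafers_per_year / 2
wafers_per_break = wafers_in_half_year / broken_in_half_year

print(f"about 1 broken wafer in {wafers_per_break:,.0f}")

# Compare against the goals stated on the slide.
meets_1_in_100k = wafers_per_break > 100_000
meets_1_in_300k = wafers_per_break > 300_000
```

Since the throughput is a lower bound, the actual breakage rate would be at least this good.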



Failures vs. time per customer machine each month

[Plot: # of failures per customer machine reporting (axis 0 to 12), by month, with these annotations:]

First two new-series machines being used 24/7.

First machine with two new major features shipped.

2 more machines both arrived at customer

dropped from >5 to



Unplanned downtime by machine life

Unplanned maintenance (UM) based on FSRs only

Actual UM is higher

Data scattered: 1 std dev is comparable to the values themselves

All machines have reported in time shown (6 quarters)

Not customer dependent

PM > UM

[Plot: unplanned maintenance vs. machine life (Q1 warranty start)]



Failures by component

[Plot: failure counts for Component 1 through Component 7]



Component failure rate normalized

Component 8 failure rate is 3 to 5 times the rate of other failures

Component 1 failures addressed by ECOs

Component 4 to be moved from baseline

Component 9: Eng project

[Plot: normalized failure rate for Components 1, 2, 4, 5, 8, 9, 10, 11, and 12]



Most expensive shipments

Includes all shipments

replacements

upgrades

5 quarters

[Plot: shipment cost by item, Item 13 through Item 23]



All failures

Two quarters

123 reported failures, which fell into 65 categories

One series and another series

Categories are named by cause not by symptom

Other faults were not included. If PM was required, then the fault was not counted as a failure.

Failures occurring twice or more are plotted, which are in 23 categories.

Failures occurring three times or more were analyzed: 13 categories. Next slide.

[Plot: number of failures (axis 0 to 9) by cause, for causes 1 through 23]
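The selection of categories by occurrence count can be sketched with a counter and a threshold. The cause names below are hypothetical. The bookkeeping at the end uses only the totals stated on this slide and assumes that every reported failure falls into exactly one category.

```python
from collections import Counter

def categories_at_least(cause_counts, minimum):
    """Keep only the causes that occurred at least `minimum` times."""
    return {cause: n for cause, n in cause_counts.items() if n >= minimum}

# Hypothetical failure causes, one entry per reported failure.
reports = ["seal wear", "loose cable", "seal wear",
           "sensor drift", "seal wear", "loose cable"]
counts = Counter(reports)
plotted = categories_at_least(counts, 2)    # occurring twice or more
analyzed = categories_at_least(counts, 3)   # occurring three times or more

# Bookkeeping from the slide's totals.
total_failures, total_categories = 123, 65
twice_or_more, three_or_more = 23, 13
singles = total_categories - twice_or_more        # categories seen once
exactly_twice = twice_or_more - three_or_more     # categories seen twice
in_analyzed = total_failures - singles - 2 * exactly_twice
print(singles, exactly_twice, in_analyzed)
```

Under that assumption, the 13 analyzed categories would account for roughly half of the 123 reported failures.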



Failures from previous slide: analysis

Failures occurring three times or more

1. 8

2. 7

3. 7

4. 6+2

5. 5

6. 5

7. 3 (not 4)

8. 4+1

9. 4-1

10. 4

11. 3

12. 3

13. 3

Status of design improvement

completed, 24

in progress, 7

not started, 33

Note: status has not been updated.






Acknowledgements

Thank you



End of presentation

Thank you
