Parametric-Degradation-Based Virtual Failures and Reliability Assessment and Monitoring

Feng-Bin Sun, HDD Reliability Engineering, Hitachi Global Storage Technologies, San Jose, California
©2011 ASQ & Presentation Feng-Bin Sun
Presented live on Oct 14th, 2010
http://reliabilitycalendar.org/The_Reliability_Calendar/Webinars_‐_English/Webinars_‐_English.html

description

This talk presents a parametric perspective on product reliability in terms of virtual failures, quantified by the degradation over time of a product's failure-governing critical parameters (CPs). Traditionally, product reliability is quantified and measured by failures confirmed through failure analysis (FA); that is, a product is characterized by two distinct states: either “functioning” (“surviving”) or “failed”. However, as in a human being, an ongoing health-degrading process inside a product can bring it to a critical, or “near-death”, condition. Depending on the degree of this sub-healthy condition, a product may not manifest a physical failure at the macro level during a life test, yet it can become a true failure after a short time in field operation. To account for the field reliability impact of such invisible but sub-healthy products, the concept of “virtual failure” is proposed, and its impact on long-term product reliability is quantified from the correlation between the incidence of each CP exceeding its trigger limit and the incidence of the product actually failing in the reliability life test. The objective of this paper is to describe the importance of tracking product parametric performance during reliability life tests, the identification of failure-indicative critical parameters, their degradation patterns, the measurement algorithm, the triggering mechanism and limits, and the quantification of their impact on long-term reliability.

Transcript of Parametric-Degradation-Based Virtual Failures and Reliability Assessment and Monitoring

Page 1: Parametric-Degradation-Based Virtual Failures and Reliability Assessment and Monitoring

Parametric-Degradation-Based Virtual Failures and Reliability Assessment and Monitoring

Feng-Bin Sun

HDD Reliability Engineering

Hitachi Global Storage Technologies

San Jose, California

©2011 ASQ & Presentation Feng-Bin Sun

Presented live on Oct 14th, 2010

http://reliabilitycalendar.org/The_Reliability_Calendar/Webinars_‐_English/Webinars_‐_English.html

Page 2: Parametric-Degradation-Based Virtual Failures and Reliability Assessment and Monitoring

ASQ Reliability Division English Webinar Series

One of the monthly webinars on topics of interest to reliability engineers.

To view recorded webinars (available to ASQ Reliability Division members only), visit asq.org/reliability

To sign up for the live webinars, which are free and open to anyone, visit reliabilitycalendar.org and select English Webinars to find links to register for upcoming events

http://reliabilitycalendar.org/The_Reliability_Calendar/Webinars_‐_English/Webinars_‐_English.html

Page 3: Parametric-Degradation-Based Virtual Failures and Reliability Assessment and Monitoring


October 14, 2010

by

Feng-Bin Sun

HDD Reliability Engineering

Hitachi Global Storage Technologies

San Jose, California

Page 4: Parametric-Degradation-Based Virtual Failures and Reliability Assessment and Monitoring


Disclaimers & Acknowledgements

• This presentation is based on a joint paper by F. Sun, E. Ou, and S. Zhang published at the ISSAT 15th International Conference on Reliability and Quality in Design, August 6 - 8, 2009, San Francisco, California.

• Deep appreciation to the speaker’s previous employer, especially its management, for their consistent support in the course of this study.

• This paper does not contain any company- or product-specific information.

• This presentation will focus on the concepts and philosophies, with minimal mathematics.


Page 5: Parametric-Degradation-Based Virtual Failures and Reliability Assessment and Monitoring


Table of Contents

• Introduction
• Parametric Degradation Pattern and Quantification During Product Reliability Test
• Parametric Trigger Limits Determination
• Population Dynamic Behavior Monitoring via Box-Whisker Plotting
• Individual Behavior Examination for the Potential Failure Candidates via Scatter (Line) Plotting
• Virtual Failure Concept and Holistic Health/Reliability
• Holistic Reliability Assessment/Monitoring Based on True and Virtual Failures
• Example and Conclusions


Page 6: Parametric-Degradation-Based Virtual Failures and Reliability Assessment and Monitoring

What’s Happening Inside Your Product?

Page 7: Parametric-Degradation-Based Virtual Failures and Reliability Assessment and Monitoring

Health Philosophy and Holistic Reliability Assessment

Physical Health + Mental Health → Holistic Health

True Failure (Macro Level) + Virtual Failures (Micro Level) → Holistic Failures (Macro + Micro)

Page 8: Parametric-Degradation-Based Virtual Failures and Reliability Assessment and Monitoring

Human Health Parametrics

• Body temperature
• Blood pressure
• Pulse rate, rhythm, and quality
• Heartbeat, or heart sound
• Respiratory rate, effort, and quality
• Cholesterol level
• Blood sugar level
• …

Page 9: Parametric-Degradation-Based Virtual Failures and Reliability Assessment and Monitoring


Product Sub-Healthy Condition vs Parametric Monitoring

• As in a human being, an ongoing health-degrading process inside a product can bring it to a critical, or “near-death”, condition.

• Depending on the degree of this sub-healthy condition, a product may not manifest a physical failure at the macro level during a life test, yet it can become a true failure after a short time in field operation.

• To account for the field reliability impact of such “invisible” but sub-healthy products, the concept of “virtual failure” is proposed, and its impact on long-term product reliability is quantified from the correlation between the incidence of each CP exceeding its trigger limit and the incidence of the product actually failing in pre-designed reliability testing.

• The objective of this paper is to describe the importance of tracking HDD parametric performance during reliability life testing, the identification of failure-indicative critical parameters, their degradation patterns, the measurement algorithm, the triggering mechanism and limits, and the quantification of their impact on long-term reliability.


Page 10: Parametric-Degradation-Based Virtual Failures and Reliability Assessment and Monitoring


Values & Benefits of Parametric Monitoring

• Parametric tracking provides faster feedback to engineering design teams for product and design improvements.

• Parametric tracking accelerates parametric feedback into product test, manufacturing, and field health monitoring.

• Parametric tracking increases product reliability and provides a better ability to proactively identify field excursions.

• Revealing latent defects and improving the manufacturing process and product design will reduce overall field returns.


Page 11: Parametric-Degradation-Based Virtual Failures and Reliability Assessment and Monitoring


Critical Parameters – HDD Example

There are more than 100 critical parameters, at both the head component level and the drive system level, that can be tracked and collected during HDD reliability tests:

• G-List: "growth" defect table for sectors that have gone bad after the drive was placed in use
• Error Rate (ER): raw bit error rate
• MR Bias: bias current applied to the magneto-resistive head
• Magneto-Resistive Asymmetry (MRA): channel amplitude asymmetry compensation
• MR Resistance: reader element resistance of the magneto-resistive head
• Non-Repeatable Run-Out (NRRO): measurement of how much a platter wobbles, or moves off-center
• Spin-Up Time: time required for the disk platters to reach full operational speed from a stationary start
• Variable Gain Amplifier (VGA): channel amplitude gain compensation
• Write Current: current going through the writer element during a write operation
• …


Page 12: Parametric-Degradation-Based Virtual Failures and Reliability Assessment and Monitoring

Identification of Failure Indicative Critical Parameters for Parametric Reliability Tracking

• Based on extensive studies of available historical CP data:
  ◦ from reliability life test failures, using degradation analysis, and
  ◦ from field failure returns, using multivariate analysis

• Four parameters, three at head level and one at drive level, were identified as the most failure-indicative candidates:
  ◦ Parameter A – Head Level
  ◦ Parameter B – Head Level
  ◦ Parameter C – Head Level
  ◦ Parameter D – Drive Level


Page 13: Parametric-Degradation-Based Virtual Failures and Reliability Assessment and Monitoring


Underlying Principle & Justifications

• The failure mechanism of a product is governed by its weakest link.

• The head-level reliability performance of an HDD, as a whole, is governed by its worst head (if one head fails, the whole drive fails).

• Therefore, the CP values of the worst head should be used to represent the head-level parametric performance of an HDD.

Page 14: Parametric-Degradation-Based Virtual Failures and Reliability Assessment and Monitoring

Parametric Degradation Pattern During Life Test - A sample line plot of Parameters A, B, C, and D

[Figure: four line plots, (a) Par A, (b) Par B, (c) Par C, (d) Par D]

Page 15: Parametric-Degradation-Based Virtual Failures and Reliability Assessment and Monitoring


Graphical Illustration of the Maximum Degradation Measurement


Page 16: Parametric-Degradation-Based Virtual Failures and Reliability Assessment and Monitoring

Maximum Degradation Measurement

• For monotonic parameters (Parameters A and B), use the maximum % change of the worst head:

$$\text{Max \% Change} = \max\left[\frac{\left|CP_{\text{current day}} - CP_{\text{1st day}}\right|}{CP_{\text{1st day}}}\right] \times 100\%$$

• For fluctuating parameters (Parameters C and D), use the maximum fluctuation (of the worst head, for head-level parameters):

$$\text{Max CP Fluctuation} = \frac{CP_{\text{Max}} - CP_{\text{Min}}}{\text{Constant}}$$
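
The two measurements above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the sample readings, and the value of the constant are hypothetical, the first-day reading is assumed to be positive, and "worst head" is read here as the head with the largest degradation value.

```python
from typing import Callable, Dict, List


def max_pct_change(readings: List[float]) -> float:
    """Largest absolute % deviation from the first-day reading.

    Assumes the first-day reading is positive (an assumption of this sketch).
    """
    first = readings[0]
    return max(100.0 * abs(value - first) / first for value in readings)


def max_fluctuation(readings: List[float], constant: float) -> float:
    """(CP_max - CP_min) / constant over the whole test."""
    return (max(readings) - min(readings)) / constant


def worst_head(per_head: Dict[str, List[float]],
               metric: Callable[[List[float]], float]) -> float:
    """Drive-level value: the largest metric across all heads."""
    return max(metric(readings) for readings in per_head.values())


# Hypothetical daily readings for a two-head drive.
par_a = {"head0": [100.0, 101.0, 103.0], "head1": [100.0, 104.0, 110.0]}
par_c = {"head0": [5.0, 7.5, 4.5], "head1": [5.0, 5.5, 5.0]}

drive_par_a = worst_head(par_a, max_pct_change)
drive_par_c = worst_head(par_c, lambda r: max_fluctuation(r, constant=2.0))
print(drive_par_a, drive_par_c)
```

Taking the maximum across heads follows the weakest-link argument: the drive is represented by whichever head has degraded the most.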


Page 17: Parametric-Degradation-Based Virtual Failures and Reliability Assessment and Monitoring

Parametric Trigger Limits Determination


Page 18: Parametric-Degradation-Based Virtual Failures and Reliability Assessment and Monitoring

Individual Behavior Examination For The Potential Failure Candidates


Page 19: Parametric-Degradation-Based Virtual Failures and Reliability Assessment and Monitoring

Potential Failure and Virtual Failure

• Potential failure candidates: those products for which the maximum % change or the maximum fluctuation of any failure-indicative parameter exceeds its trigger limit.

• Not every potential failure candidate will fail in the reliability test.

• There is a correlation, or likelihood, between the incidence of each CP exceeding its trigger limit and the incidence of the product actually failing in the reliability test.

• Potential Failure x Failure Likelihood => Virtual Failure

• A weighted linear function is introduced to convert potential failures to their equivalent virtual failures:

Virtual Failure Counts =
  Potential Failure Counts Triggered by Parameter A x a%
  + Potential Failure Counts Triggered by Parameter B x b%
  + Potential Failure Counts Triggered by Parameter C x c%
  + Potential Failure Counts Triggered by Parameter D x d%
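
As a rough illustration of the weighted linear function, the sketch below multiplies each parameter's potential failure count by its failure likelihood and sums the results. The counts and the likelihoods (the a%, b%, c%, d% weights, written here as fractions) are made-up numbers, and the function name is hypothetical.

```python
from typing import Dict


def virtual_failure_count(potential_counts: Dict[str, int],
                          equivalence_factors: Dict[str, float]) -> float:
    """Sum of (potential failures triggered by a parameter) x (its likelihood)."""
    return sum(count * equivalence_factors[param]
               for param, count in potential_counts.items())


# Hypothetical potential failure counts triggered by each parameter.
potential_counts = {"A": 4, "B": 2, "C": 5, "D": 1}
# Hypothetical failure likelihoods (a%, b%, c%, d%) expressed as fractions.
equivalence_factors = {"A": 0.50, "B": 0.25, "C": 0.10, "D": 0.80}

virtual_failures = virtual_failure_count(potential_counts, equivalence_factors)
print(f"virtual failures: {virtual_failures:.2f}")
```

Because the weights are fractions, the result is usually not a whole number; it is an expected count of failures, not a tally of specific units.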


Page 20: Parametric-Degradation-Based Virtual Failures and Reliability Assessment and Monitoring

Calculation of Failure-Equivalence Factor

The Failure-Equivalence Factor, or Failure Likelihood, is estimated from historical reliability test data as the conditional probability that a drive will fail given that its CP maximum % change exceeds the trigger limit:

$$\text{Failure-Equivalence Factor} = \frac{\text{Total \# of ORT HDDs whose maximum \% changes exceed the specified trigger limits and fail in ORT}}{\text{Total \# of ORT HDDs whose maximum \% changes in ORT exceed the specified trigger limits}}$$
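
The ratio above is a conditional relative frequency, which could be estimated from historical records along the following lines. The record layout (one pair of flags per drive) and the sample data are assumptions made for this sketch only.

```python
from typing import Iterable, Tuple


def failure_equivalence_factor(records: Iterable[Tuple[bool, bool]]) -> float:
    """Fraction of drives that failed among those that exceeded the trigger limit.

    Each record is (exceeded_trigger_limit, failed_in_test) for one drive.
    """
    outcomes = [failed for exceeded, failed in records if exceeded]
    if not outcomes:
        raise ValueError("no drive exceeded the trigger limit")
    return sum(outcomes) / len(outcomes)


# Hypothetical history for one parameter: (exceeded limit, failed in test).
history = [
    (True, True), (True, False), (True, False), (True, True),
    (False, False), (False, True),
]
factor = failure_equivalence_factor(history)
print(factor)
```

Drives that never exceeded the limit are ignored, including any that failed for other reasons, since the factor is conditioned on the trigger event.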

Page 21: Parametric-Degradation-Based Virtual Failures and Reliability Assessment and Monitoring

True Failure + Virtual Failure - Holistic Failure and Reliability

True Failure (Macro Level) + Virtual Failures (Micro Level) → Total Adjusted Failures (Holistic: Macro + Micro)

• Traditionally, product reliability during a life test, such as the annual failure rate (AFR), is assessed based on true failures.

• Such an approach ignores the sub-healthy condition and the failure likelihood of potential failures.

• Combining true failures (at the macro level) and virtual failures (at the micro level) for products with health degradation provides a holistic view of product failures.

Physical Health + Mental Health → Holistic Health
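
To make the effect of the adjustment concrete, the sketch below adds virtual failures to true failures and compares the resulting rates. The slides do not give an AFR formula, so the one used here (failures per accumulated drive-hour, scaled to 8760 hours per year) is a common simplification adopted as an assumption, and all numbers are hypothetical.

```python
HOURS_PER_YEAR = 8760  # 24 hours x 365 days


def total_adjusted_failures(true_failures: float, virtual_failures: float) -> float:
    """Holistic failure count: true (macro) plus virtual (micro) failures."""
    return true_failures + virtual_failures


def annualized_failure_rate(failures: float, drive_hours: float) -> float:
    """Simplified AFR estimate (assumed here, not taken from the slides)."""
    return failures / drive_hours * HOURS_PER_YEAR


# Hypothetical life test: 1000 drives run for 1000 hours each.
drive_hours = 1000 * 1000
true_failures = 5
virtual_failures = 3.8

adjusted = total_adjusted_failures(true_failures, virtual_failures)
afr_true = annualized_failure_rate(true_failures, drive_hours)
afr_holistic = annualized_failure_rate(adjusted, drive_hours)
print(f"AFR (true only): {afr_true:.2%}, AFR (holistic): {afr_holistic:.2%}")
```

With these made-up inputs the holistic rate is noticeably higher than the rate based on true failures alone, which is the kind of added sensitivity the virtual failure concept is meant to provide.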

Page 22: Parametric-Degradation-Based Virtual Failures and Reliability Assessment and Monitoring

Holistic Reliability Assessment – based on True and Virtual Failures


Enhanced sensitivity due to incorporation of “virtual failures”


Page 23: Parametric-Degradation-Based Virtual Failures and Reliability Assessment and Monitoring

Conclusions

• Introducing the parametric-degradation-based virtual failure concept adds a micro dimension to the traditional failure domain.

• Incorporating virtual failures into the reliability calculation enhances the detection sensitivity of the ongoing reliability test, and can therefore surface poor vintages with a potentially high future failure rate due to “invisible” high degradation of critical parameters.

• This approach is applicable not only in the HDD industry, but also to any other product with health degradation and measurable parameters that can be monitored.

• The accuracy of the “virtual failure” counts depends on how well the critical parameters are identified and how well the failure-equivalence factors are estimated.

• As more life test data accumulate, further analyses should be conducted to better estimate the failure-equivalence factors and to evaluate the linear function assumption.
