Reliability prediction of electronic products combining models, lab testing and field data analysis
NOOR CHOUDHURY
KTH ROYAL INSTITUTE OF TECHNOLOGY
INFORMATION AND COMMUNICATION TECHNOLOGY
DEGREE PROJECT IN COMMUNICATION SYSTEMS, SECOND LEVEL
STOCKHOLM, SWEDEN 2017
Reliability prediction of electronic products combining models, lab testing and field data analysis
Noor Choudhury
2017-01-16
Master’s Thesis
Examiner and Academic adviser Professor Elena Dubrova
Industrial adviser Romain Tiennot (Bombardier Transportation)
KTH Royal Institute of Technology
School of Information and Communication Technology (ICT)
Department of Communication Systems
SE-100 44 Stockholm, Sweden
Abstract
Several reliability standards are currently used for reliability prediction. They take different
factors, environments and data sources into account to provide reliability data for a wide range of
electronic components. However, users are often unaware of the differences between these
standards, because no benchmark exists that would help classify and compare them. Without such
a benchmark, users lack a top-down view of the standards and cannot choose the appropriate one,
based on qualitative judgement, when performing reliability prediction for a specific system.
To address this issue, a benchmark of a set of reliability standards is developed in this
dissertation.
The benchmark helps users of the selected reliability standards understand the similarities
and differences between them and, based on the evaluation criteria defined, choose the
appropriate standard for reliability prediction in different scenarios.
Theoretical reliability prediction of two electronic products at Bombardier is performed using
the standards that have been benchmarked. One of the products is mature, with incident reports
available from the field, while the other is a new product that is under development and has yet to
enter service. The field failure data analysis of the mature product is then compared and correlated
with the theoretical prediction. Adjustment factors are then derived to help bridge the gap between
the theoretical reliability prediction and the reliability of the product in field conditions.
Since no field data is available for the product under development, its theoretical prediction
could not be compared and correlated with field data. Instead, an accelerated life test (ALT) is used
to determine the product's reliability during its lifetime and to find any failure modes intrinsic to the
board. A crucial objective is realized as an appropriate algorithm/model is found to correlate
accelerated test temperature cycles to real product temperature cycles. The product under testing
(PUT) has lead-free solder joints; hence, whether any failures occur due to solder joint fatigue has
also been of interest. Additionally, a reliability testing simulation is performed in order to verify and
validate the performance of the product under development during ALT.
Finally, the goal of the thesis is achieved as separate models are proposed to predict product
reliability for both mature products and products under development. This will assist the
organization in predicting its product reliability with better accuracy and confidence.
Keywords
Reliability, Reliability Standards, Benchmark, Field data analysis, Thermal Cycling test, FIDES,
Siemens SN 29500, IEC 62380, MIL-HDBK-217F-Notice2, Reliability Prediction, Accelerated Life
Test, Product Reliability
Sammanfattning
At present, several different reliability standards are used to carry out reliability prediction.
They take different factors, environments and data sources into account to provide reliability data
for a wide range of electronic components. However, users are not aware of the differences between
the standards, owing to the absence of benchmarks that would help classify and compare them.
This lack of comparison denies users a top-down view of the different standards and the ability to
choose the appropriate standard, based on qualitative judgement, when performing reliability
prediction for a specific system.
To solve this problem, a benchmark of a set of reliability standards is developed in this
thesis.
The benchmark helps users of the selected reliability standards understand the similarities and
differences between them and, on the basis of the evaluation criteria defined, choose the
appropriate standard for reliability prediction in different scenarios.
Theoretical reliability prediction of two electronic products at Bombardier is performed using
the standards that have been benchmarked. One of the products is mature, with incident reports
available from the field, while the other is a new product that is under development and has not yet
entered service. The analysis of the mature product's field failure data is then compared and
correlated with the theoretical prediction. Adjustment factors are then derived to bridge the gap
between the theoretical reliability prediction and the reliability of the product in field conditions.
Since the theoretical prediction for the product under development cannot be compared and
correlated with any data, because none is available, the accelerated life test is used instead to
determine the product's reliability during its lifetime and to find any failure modes intrinsic to the
board. An important objective is realized as an appropriate algorithm/model is found to correlate
accelerated test temperature cycles to real product temperature cycles. The PUT has lead-free solder
joints; hence, whether any failures occur due to solder joint fatigue has also been of interest. In
addition, a reliability testing simulation is performed to verify and validate the performance of the
product under development during ALT.
Finally, the goal of the thesis is achieved as separate models are proposed to predict product
reliability for both mature products and products under development. This will help the
organization predict its product reliability with better accuracy and confidence.
Nyckelord
Reliability, Reliability Standards, Benchmark, Field data analysis, Thermal Cycling test, FIDES,
Siemens SN 29500, IEC 62380, MIL-HDBK-217F-Notice2, Reliability Prediction, Accelerated Life
Test, Product Reliability
Acknowledgments
Professor Elena Dubrova
My academic adviser and examiner for this thesis. Thank you for being an absolutely amazing
teacher for my course in “Design of Fault Tolerant Systems” at KTH. It is because of this course and
your teaching that I became interested in the field of reliability and ended up doing this research.
Thank you also for motivating me throughout this project. Your invaluable feedback and motivation
allowed me to accomplish this research successfully.
Romain Tiennot
My industrial adviser for this thesis at Bombardier Transportation. Thank you for your guidance
throughout the course of my research. Your advice and feedback helped me to be on course and
ensured high quality of my research. I very much appreciate the knowledge and experience that you
shared with me. They helped me in becoming more organized and professional.
Kenneth Nylund
My mentor at Bombardier Transportation. Thank you for sharing all your valuable experiences
regarding accelerated life testing, effect of solder fatigue on circuit card assemblies, etc. It has been a
privilege for me to have worked in close collaboration with you. I very much appreciate your
materials for literature study. They have been of great value and helped in enhancing my knowledge.
Laudier Ndikuriyo & Olga Eskova
My colleagues from Bombardier Transportation. Thanks for your support during my tenure at
BT. It certainly helped in my smooth transition to the organization.
My parents and my sister
Thank you for cheering me on all this time. Because of your encouragement, I had the strength
to get through all the hurdles.
Stockholm, January 2017 Noor Choudhury
Table of contents
Abstract
Keywords
Sammanfattning
Nyckelord
Acknowledgments
Table of contents
List of Figures
List of Tables
List of acronyms and abbreviations
1 Introduction
1.1 Problem definition
1.2 Purpose
1.3 Goals
1.4 Research Methodology
1.5 Delimitations
1.6 Structure of the thesis
2 Background
2.1 Reliability standards
2.2 Electronic products for reliability prediction, ALT and field failure data analysis
2.2.1 Board_A
2.2.2 Board_B
2.3 Accelerated Testing
2.4 ITEM QT
2.5 Sherlock
2.5.1 Life Cycle
2.5.2 Inputs
2.5.3 Analysis Modules
3 Benchmark of Reliability Prediction standards
3.1 Identification of reliability standards used at Bombardier Transportation
3.2 Benchmark Structure
3.3 Classification and Comparison
3.4 Evaluation Criterias
3.5 Evaluation of reliability standards based on the defined criterias
3.5.1 FIDES
3.5.2 IEC 62380
3.5.3 Siemens SN29500
3.5.4 MIL-HDBK-217F-Notice 2
3.6 References across different standards
4 Theoretical Reliability Prediction
4.1 Input parameters and Assumptions
4.1.1 FIDES
4.1.2 IEC 62380
4.1.3 Siemens SN 29500
4.1.4 MIL-HDBK-217F2
4.2 Prediction outcome
4.2.1 Board_A
4.2.2 Board_B
4.3 Theoretical Prediction Analysis
5 Field Failure Data Analysis for Board_A
5.1 Board_A version distinction
5.2 Data Sources and required Inputs
5.3 Elaboration of field failure data
5.4 Solder Fatigue analysis for Board_A
5.5 Conclusion
6 Lab Testing and Reliability testing simulation on BT Products
6.1 Accelerated Life Testing
6.1.1 Experimental Setup
6.1.2 Input conditions and duration of the thermal cycling
6.1.3 Observation
7 Sherlock Reliability Testing Simulation – Accelerated Life Testing
7.1 Observation
8 The Model
9 Conclusions and Future work
9.1 Conclusions
9.2 Limitations
9.3 Future work
References
Appendix A: Benchmark of Reliability Standards
Supplementary Data File
Description:
Filename:
List of Figures
Figure 2-1: Solder joint fatigue failure on a TSOP package. (Source: CALCE News, September 1993)
Figure 4-1: Block level prediction of Board_A
Figure 4-2: Functional level prediction of Board_A
Figure 4-3: Component level prediction of Board_A
Figure 4-4: Block level prediction of Board_B
Figure 4-5: Functional level prediction of Board_B
Figure 4-6: Component level prediction of Board_B
Figure 5-1: Timeline of Observation
Figure 5-2: Stackup of operating hours
Figure 5-3: Field failure data analysis of Board_A and Board_AE
Figure 5-4: MTBF of different versions of Board_A extracted from field failure data analysis
Figure 5-5: Failure statistics for Board_A
Figure 5-6: Solder Fatigue life prediction for Transceiver (D709)
Figure 5-7: Solder Fatigue life prediction for LED (D717)
Figure 5-8: Solder Fatigue life prediction for connector (J702)
Figure 5-9: Comparison of reliability data for Board_Av2.5
Figure 5-10: Overall Solder Joint fatigue life prediction
Figure 6-1: Thermal chamber
Figure 6-2: Powered on boards inside the subrack
Figure 6-3: Temperature Log
Figure 7-1: Solder Fatigue Life Prediction Curve, Board_B_ALT, Weibull curve
Figure 7-2: Solder Joint Fatigue Life Distribution_Board_B_ALT
Figure 7-3: Solder Joint fatigue Life prediction_Board_B_m=2.65
Figure 8-1: Theoretical Reliability Prediction of Board_A
List of Tables
Table 3-1: Classification and Comparison of reliability standards
Table 3-2: List of evaluation criteria
Table 5-1: Board_A version history
Table 5-2: List of failed components
Table 6-1: Calculation for duration of ALT for different solder materials
List of acronyms and abbreviations
AF Acceleration factor
ALT Accelerated Life Test
AT Accelerated Testing
BGA Ball Grid Array
BT Bombardier Transportation
CCU Communication Controller Unit
CIS Central Interlocking System
COM Communication Board
CoP Causes of Problem
COTS Commercial-off-the-shelf
DGA Délégation générale pour l'armement
EOS Electrical overstress
FFDA Field Failure Data Analysis
FIT Failures in Time
FR Failure Rate
HW Hardware
IEC International Electrotechnical Commission
IC Integrated Circuit
ICT Information and Communication Technology
KPI Key Product Information
MOS Mechanical overstress
MTBF Mean Time Between Failures
MTTF Mean Time to Failure
nf/h Nano-faults per hour
NTC Number of thermal cycles
OC Object Controller
OS Operating System
Pb Lead
PCB Printed Circuit Board
PoF Physics of Failure
PUT Product under testing
QFN Quad Flat No-Leads
RCS Rail control solutions
RH Relative humidity
ROI Return on Investment
T Ambient temperature
TOS Thermal overstress
TR Technical report
1 Introduction
This chapter describes the specific problem that this thesis addresses, the context of the problem,
and the goals of this thesis project, and outlines the structure of the thesis.
This dissertation aims to provide an overview and cross-check of the different methods for
predicting the reliability of electronic/electro-mechanical components used in the field of Rail
Signalling (Object Controller System), with a special emphasis on reliability lab testing. It also
derives and evaluates corrective and/or scaling factors to correlate the reliability figures
obtained by:
• Theoretical predictions using applicable reliability databases, norms, standards and tools
• Tests carried out in the laboratory (e.g. using temperature chambers, accelerated life test
approaches, etc.)
• Failure analysis from field data using Root Cause Analysis and a statistical approach
The key responsibilities have been to classify and compare the reliability prediction results
obtained when applying the various applicable standards/norms for reliability. Afterwards,
“typological” lab tests are performed on selected RCS products, the resulting reliability figures are
extrapolated, and the outcomes are compared with theoretical reliability predictions. For an RCS
product that is mature and has operational history available, the reliability figures based on the
field failure data are cross-checked against the theoretical reliability predictions.
All the results are accumulated and summarized to elaborate a global model for RCS BT to
predict product reliability. The model also integrates a derived algorithm to correlate accelerated
test temperature cycles to real product temperature cycles.
1.1 Problem definition
BT carries out reliability prediction for each of its products, and this practice applies globally.
However, no tool or standard provides guidelines for considering theoretical reliability prediction,
reliability testing and field data analysis together in product reliability predictions.
1.2 Purpose
The purpose of the thesis spans several objectives.
First, creating a benchmark of reliability standards increases the visibility and understanding
of the various standards, along with their usability across different projects.
Second, the thesis develops a scientific process to bridge the gap between theoretical
reliability prediction and actual product reliability in the field.
In addition, ALT is performed to obtain more information on product reliability and failure
modes. The observations are then verified and validated by performing a reliability testing
simulation.
Finally, the thesis produces a model that takes all of the above-mentioned processes into
consideration and efficiently determines product reliability.
1.3 Goals
The objective of this thesis project is based on the needs of the railway industry, more specifically
those within BT.
The goal of this project is to elaborate a ‘global’ model to predict product reliability within BT.
To ensure that the goal is attained by the end of the project, sub-goals are established. The
sub-goals are listed below:
1. Produce a benchmark of the reliability prediction standards that are used within BT and
have the potential to be used in the future.
2. Perform theoretical prediction of two electronic products belonging to BT using the
standards that have been selected for creation of the benchmark.
3. Evaluate the reliability figures of the mature product based on field failure data and
compare them to the theoretical reliability prediction.
4. Perform “typological” lab tests (e.g. accelerated life tests and simulation) on a product
under development to predict board reliability.
5. Summarize the results from the above four sub-goals to realize the overall goal
of elaborating a global model to predict product reliability.
1.4 Research Methodology
The project uses the quantitative method to draw conclusions. The method requires verifying
hypotheses or theories through quantitative measurements obtained from experiments or testing [1].
Here, the method is applied by comparing the predicted reliability results with the field failure data
analysis and then producing adjustment factors. In addition, the method aids the development of the
model by combining the statistics of the FFDA with the experimental data.
1.5 Delimitations
Accelerated life testing is performed on the product under development with temperature as the
acceleration variable. Other stress variables such as vibration, humidity and electrical stress are
not taken into consideration due to the time constraints of this thesis. In addition, during the
accelerated life tests only failures intrinsic to the board and its components are looked for.
Therefore, any reliability issues related to the software running on the board are considered out of
the scope of this project.
1.6 Structure of the thesis
Chapter 2 presents relevant background information about reliability standards and accelerated
testing. The chapter also introduces the electronic products that are worked with during this project
and the tools used. Chapter 3 presents the Benchmark of Reliability Standards. Chapter 4 presents
the input parameters and assumptions for the selected standards and the reliability prediction
outcome. The results obtained from the field failure data analysis are shown in Chapter 5.
Chapter 6 covers Accelerated Life Testing, whereas Chapter 7 presents the outcome of the reliability
testing simulation. Chapter 8 discusses the model derived after the completion of the project.
Chapter 9 concludes the dissertation and provides suggestions for future work.
2 Background
This chapter provides basic background information about reliability prediction, ALT, electronic
products within BT and the tools used in this project.
2.1 Reliability standards
Reliability is a measure of the continuous delivery of correct service. High reliability is required in
situations when a system is expected to operate without interruptions [2]. The communication
systems within BT are such systems: they are required to be highly reliable and to remain functional
over a long period of time without any failures.
According to [3], “Product reliability is an indicator that the product will perform satisfactorily
over its intended useful life when operated normally. It is of great interest to both customers and
manufacturers.” This holds true for both BT and its customers. From the perspective of BT, high
reliability of its products is necessary to meet customer requirements, remain competitive and
control warranty costs. The effects of poor product reliability are equally important for the
customers, as it results in an increased number of failures as well as increased maintenance costs
over the product’s lifetime. In addition, BT, being one of the frontrunners of the railway industry,
places great focus on safety, and the inability of its products to perform satisfactorily can have
severe implications. Thus the organization places great emphasis on predicting the reliability of its
products using different reliability prediction standards.
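The standards discussed in this thesis express their results as failure rates, commonly in FIT (failures per 10^9 device-hours). As a minimal sketch, assuming a constant failure rate (the exponential model, which is not stated in the text above and is introduced here only for illustration), the conversion between FIT, MTBF and survival probability can be written as follows. The function names, the 500 FIT figure and the 30-year horizon are hypothetical.

```python
import math

HOURS_PER_YEAR = 8760  # 365 days * 24 h


def fit_to_mtbf_hours(fit: float) -> float:
    """MTBF in hours for a failure rate given in FIT (failures per 1e9 h)."""
    if fit <= 0:
        raise ValueError("fit must be positive")
    return 1e9 / fit


def reliability(fit: float, hours: float) -> float:
    """Probability of surviving `hours`, assuming a constant failure rate."""
    failure_rate_per_hour = fit * 1e-9
    return math.exp(-failure_rate_per_hour * hours)


if __name__ == "__main__":
    fit = 500.0  # hypothetical board-level failure rate, not a thesis result
    print(f"MTBF: {fit_to_mtbf_hours(fit):,.0f} h")
    print(f"R(30 years): {reliability(fit, 30 * HOURS_PER_YEAR):.3f}")
```

The constant-rate assumption is only appropriate for the useful-life period; wear-out mechanisms such as solder fatigue need a different life distribution.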
Major work has been done in the field of reliability prediction since the 1950s, and a timeline of
the major events in this field of study can be found in [4]. The timeline includes the initial
publication of different reliability prediction standards and how they have evolved since. The
standards follow different methodologies to predict the reliability of electronic systems, which can
result in completely different reliability data for the same system. This was established by the
research work “A comparison of Electronic-Reliability Prediction Models”, published in 1999 by two
researchers from Loughborough University.
The research was carried out after anecdotal evidence suggested that there were problems with the
prediction systems in common use. It used reliability information that had been collected from
leading British and Danish manufacturers over many years. The collected data was regarded as
being of the highest possible quality and could therefore serve as a benchmark against which the
data from the reliability handbooks could be tested [5]. Six circuit boards were chosen, for which the
manufacturers provided extra data to match them with the handbook types. The five reliability
prediction handbooks used were MIL-HDBK-217E, HRD4, Siemens SN29500, CNET and Bellcore
(TR-TSY-000332). The methodology was to predict the reliability of the circuit boards using the
selected reliability standards and then to compare the results with the failure rates observed in the
field [5]. The outcome was that the reliability handbooks were not good at accurately predicting
reliability: their predictions were either too pessimistic or too optimistic and thus deviated
considerably from the actual reliability of the products. Further research also demonstrated that the
models in these handbooks were sensitive to different factors in different ways. Part of the work in
this dissertation implements the same methodology, comparing the field failure data with the
theoretical prediction from four different standards.
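The comparison described above reduces to simple arithmetic once field observations are available. The sketch below estimates a field failure rate from the number of confirmed failures and the accumulated operating hours, then expresses the gap to each handbook prediction as a ratio. All numbers and function names are hypothetical, and treating the plain ratio as the adjustment factor is an assumption made here for illustration rather than the procedure used in this thesis.

```python
def field_failure_rate_fit(failures: int, operating_hours: float) -> float:
    """Point estimate of the field failure rate in FIT (failures per 1e9 h)."""
    if operating_hours <= 0:
        raise ValueError("operating_hours must be positive")
    return failures / operating_hours * 1e9


def adjustment_factor(field_fit: float, predicted_fit: float) -> float:
    """Ratio field/predicted; a value above 1 means the prediction was optimistic."""
    if predicted_fit <= 0:
        raise ValueError("predicted_fit must be positive")
    return field_fit / predicted_fit


if __name__ == "__main__":
    # Hypothetical field data: 12 failures over 40 million unit-hours.
    field_fit = field_failure_rate_fit(failures=12, operating_hours=40e6)

    # Hypothetical handbook predictions for the same board, in FIT.
    predictions = {
        "FIDES": 250.0,
        "IEC 62380": 400.0,
        "Siemens SN 29500": 150.0,
        "MIL-HDBK-217F-Notice2": 900.0,
    }
    for standard, predicted in predictions.items():
        factor = adjustment_factor(field_fit, predicted)
        print(f"{standard:24s} predicted={predicted:6.0f} FIT  factor={factor:.2f}")
```

A factor below 1 indicates a pessimistic handbook and a factor above 1 an optimistic one, which mirrors the two kinds of deviation reported in [5].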
Little research has been done on the classification and comparison of different reliability
standards. One exception is a handbook published by IMdR, which presents different reliability
standards and the principles for selecting reliability models [6]. The reliability standards listed in
this handbook are:
1. MIL-HDBK-217
2. RDF 93
3. UTE-C 80810
4. FIDES
5. 217Plus
2.2 Electronic products for reliability prediction, ALT and field failure data analysis
2.2.1 Board_A
Board_A is a communication board that is part of the Communication Controller Unit (CCU)
in the Object Controller System. The function of the board is to receive telegrams from the
Central Interlocking System (CIS) and pass them on to the object controller boards via the OC-Link.
The board is manufactured using a Pb-soldering process.
The board is kept in a cabinet where the temperature ranges from 50°C to 70°C, and one
temperature cycle takes approximately 24 hours.
2.2.2 Board_B
Board_B is part of the HW and SW platform Base_B. This product, which will be part of the
object controller system, is still under development and has yet to be installed in the field. The
board consists of three processors and an FPGA, IC_B. Of the three processors, Processor A
and Processor B execute safety-critical applications. These two processors use diverse HW and
OS to ensure a higher level of redundancy and to prevent the same failure from affecting both
processors at the same time. Processor S provides non-safety-critical services to Processor A and
Processor B. The front-end switch is used for interfacing with the CIS, whereas IC_B is used for
interfacing with the wayside objects. Board_B is manufactured using a lead-free soldering process,
and this is the norm for all versions of Board_B.
2.3 Accelerated Testing
Accelerated testing (AT) experiments are run with the purpose of extracting reliability information.
During accelerated testing, the test units of a component, subsystem or system are subjected to
higher-than-usual levels of one or more accelerating variables such as temperature or stress [7]. The
results from the AT aid in predicting the life of the test units at use conditions. Present-day
electronic products are required to have high reliability and are expected to operate without failure
for many years. As a result, few units fail in a test of practical length at normal use conditions.
For example, the design and construction of an object controller board may allow only a few months
to test components that are expected to be in the field for nearly 30 years. If the testing
is done at normal use conditions over such a practical length of time, it is difficult to assess the
product reliability completely. AT helps to assess or demonstrate component and subsystem reliability
and to detect failure modes. If failure modes are detected, the manufacturer can correct them before
the product is put into use in the field.
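The difficulty of testing at normal use conditions can be illustrated with a rough calculation. The
sketch below assumes a constant failure rate (exponential lifetime model), which is an assumption
made only for this illustration. The sample size, test duration, MTBF and acceleration factor are
hypothetical values and are not taken from the products studied here.

```python
import math

HOURS_PER_YEAR = 8760


def expected_failures(n_units, test_hours, mtbf_hours, acceleration_factor=1.0):
    """Expected number of failed units, assuming a constant failure rate.

    acceleration_factor scales the test time to equivalent time at use
    conditions (1.0 means the test runs at normal use conditions).
    """
    equivalent_hours = acceleration_factor * test_hours
    p_fail = 1.0 - math.exp(-equivalent_hours / mtbf_hours)
    return n_units * p_fail


# Hypothetical figures: 20 boards, 3-month test, MTBF of 30 years.
test_hours = 0.25 * HOURS_PER_YEAR
mtbf_hours = 30 * HOURS_PER_YEAR

at_use = expected_failures(20, test_hours, mtbf_hours)
# The acceleration factor of 50 is an assumed value for illustration only.
accelerated = expected_failures(20, test_hours, mtbf_hours, acceleration_factor=50.0)

print(f"Use conditions: {at_use:.2f} expected failures")
print(f"Accelerated:    {accelerated:.2f} expected failures")
```

With these assumed numbers, fewer than one failure is expected at use conditions, whereas the
accelerated test is expected to produce several failures in the same calendar time.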
The idea behind performing accelerated life testing (ALT) during this project is to find failure
modes, if any, in one of the products under development within BT. In addition, the new product
under development uses lead-free (SAC305) solder joints, which are prone to reliability issues at
high temperature, unlike tin-lead (SnPb) solder joints. The change to lead-free solders has given
rise to new and unfamiliar failure modes [8]. A study has also shown that lead-free failure times,
failure mechanisms and failure locations are significantly different from those of tin-lead, and more
work is required to understand the consequences of lead-free soldering [9]. It is therefore of
interest to see whether any form of failure occurs in these lead-free boards due to solder joint fatigue.
2.4 ITEM QT
ITEM QT is a reliability, safety and risk assessment software that has been used during this project
to perform theoretical reliability prediction. The software includes all the reliability
standards selected for this project, embedded in the form of modules.
ITEM QT prompts the user if any key parameters have been left blank, allowing the user
to add the parameters accordingly. This ensures that the key factors affecting the prediction for the
different standards are not left out.
ITEM QT also allows the user to perform reliability prediction of a board using different
standards in the same project. In essence, this allows the user to apply the appropriate standards,
based on qualitative judgement, to different components of the same board at the same time.
One factor that is not considered in the FIDES module of the tool is the lead-free process
factor for boards manufactured using a lead-free soldering process.
If there is a component for which the failure rate (FR) is not modelled, it can be added to the
prediction using a component named “External”, where the user inputs the FR of the component
based on experience or other sources. This functionality is available for all the standards and can
be particularly useful for components whose FR is given in the company datasheets.
2.5 Sherlock
Sherlock is a tool that allows users to analyze the reliability of circuit card assemblies (CCAs)
based on their design files. The analysis is done using physics of failure, and different modules are
used, such as solder fatigue, PTH fatigue and CAF failure.
A Sherlock project consists of three basic sets of information: the Life Cycle definition, the
Project results and the CCA. Within the CCA or the main board, design files, analysis inputs and
results as well as the results for the individual circuit card are available.
Reliability testing simulation is performed for both Board_A and Board_B in the Sherlock
Automated Design Analysis software. For Board_A, the life cycle is defined so as to mimic the real
product environment. For Board_B, the life cycle is defined so as to mimic the accelerated life test.
2.5.1 Life Cycle
The life cycle in Sherlock can be defined so as to mimic the conditions experienced by the
circuit card during a real operational scenario and/or lab testing. This makes it possible to
validate reliability results between Sherlock and lab testing. The result can also be compared with
the actual board behavior in the field.
The “Life Cycle Editor” allows the user to set reliability goals for the CCA. The reliability goals
consist of two input parameters: Reliability Metric and Service Life.
The reliability metric is a quantified reliability goal set by the user for the CCA in any of the
following forms: Reliability (%), Probability of failure (%), MTBF (years), MTBF (hours), FITs (1E6
hours) and FITs (1E9 hours). Service life can be defined as the duration up to which the CCA is
expected to be in service with full functionalities.
The reliability goals for Board_A and Board_B are defined as follows (according to board
requirement specifications):
1. Reliability Metric: 0% probability of failure
2. Service Life: 30 years
In this document, the reliability goals are set so as to mimic the normal operating conditions of
the board in the field.
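The reliability metrics listed above can be converted into one another once a lifetime model is
chosen. The sketch below assumes a constant failure rate (exponential model), which is an
assumption made for illustration and not a statement about how Sherlock converts the metrics
internally. The MTBF value is hypothetical.

```python
import math

HOURS_PER_YEAR = 8760


def fits(mtbf_hours, per_hours=1e9):
    """Failures per `per_hours` device-hours (1e9 hours is the usual FIT basis)."""
    return per_hours / mtbf_hours


def reliability_pct(service_life_years, mtbf_years):
    """Probability (%) of surviving the service life under an exponential model."""
    return 100.0 * math.exp(-service_life_years / mtbf_years)


mtbf_years = 300.0          # hypothetical value, not a board requirement
service_life_years = 30.0   # service life goal stated in the text

mtbf_hours = mtbf_years * HOURS_PER_YEAR
r_pct = reliability_pct(service_life_years, mtbf_years)

print(f"MTBF: {mtbf_hours:.0f} hours")
print(f"Failures per 1E9 hours: {fits(mtbf_hours):.1f}")
print(f"Failures per 1E6 hours: {fits(mtbf_hours, per_hours=1e6):.4f}")
print(f"Reliability over service life: {r_pct:.2f} %")
print(f"Probability of failure: {100.0 - r_pct:.2f} %")
```

Under this model, a probability of failure of exactly 0% over a finite service life is only
approached as the MTBF grows without bound, so such a goal acts as a limiting target.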
A life cycle can have one or more “Phases”, each of which can have one or more “Events”.
The “Phase settings” segment in the “Life Phase Editor” allows the user to choose the environment
from a predefined set of 14 environments. The user sets the duration of the phase along with the #
of cycles for each phase.
The phase settings defined for Board_A and Board_B are:
1. Environment: Ground_Benign
2. Duration: 1 Day
3. # of cycles: 100 duty cycles (100 duty cycle corresponds to the cycle running for the whole
duration)
Once the phase has been defined, the user can add one or more of the following events to
the phase: Thermal cycle, Harmonic vibe, Random vibe and Shock event. During its life cycle, a
product can experience different forms of stress. In Sherlock, users can model the
stresses that the product undergoes during its lifetime as events
under different phases of the life cycle.
For Board_A and Board_B, only the thermal event is defined.
1. Thermal Event Editor
a. Thermal Event Settings
i. # of Cycles: 100 Duty Cycle
ii. Life cycle State - Parameter to define whether the CCA undergoing the
thermal cycling is in an operating state or in storage. For both the
boards, “Operating” is chosen as the life cycle state
b. Thermal Profile
i. Allows the user to create the thermal profile that the CCA experiences
during laboratory testing or in a real scenario.
1. The thermal profile for both boards has been set to the
following according to real operating conditions:
a. Minimum Temperature: 50°C
b. Maximum Temperature: 70°C
c. Dwell time (Minimum temperature): 6 hours
d. Dwell time (Maximum temperature): 6 hours
e. Ramp up: 6 hours
f. Ramp down: 6 hours
Due to lack of information, it is assumed that the 24-hour day is divided evenly among the four
segments, so that each part of the thermal cycle (two dwells and two ramps) is allocated 6 hours.
2.5.2 Inputs
Sherlock retrieves many of its inputs from the design files of the CCA (usually stored in an ODB++
file). The ODB++ archives for Board_A and Board_B are imported into Sherlock to provide it
with all the information required for designing and analyzing the boards. From the ODB++ file,
Sherlock is able to extract the following information as inputs.
1. Parts list - List of all parts defined for the CCA.
2. Stackup – Displays all the layers the CCA is composed of and their properties. In
addition, board properties are also shown based on the board outline and the individual
layer properties.
3. Layers – A layer viewer with a collection of graphical tools to review, analyze and
update circuit card information.
4. Pick & Place – Displays the pick and place data in the graphical layer viewer.
5. Drill Holes – Displays all the drill holes in the CCA.
6. Net List – Table containing all the net list information for the CCA.
The part properties for every part in the parts list need to be checked and confirmed before
performing any analysis. Sherlock relies on a number of critical properties, such as package names
and descriptions, to identify the parts that are being analyzed. The software compares this
information with its internal databases and attempts to standardize the property values whenever
possible. However, to ensure that these property values are correct, the user needs to confirm the
properties before carrying out the analysis.
2.5.3 Analysis Modules
The Sherlock analysis modules attempt to predict the reliability of an electronic circuit card and its
components based on the circuit card design and the environmental conditions that the circuit card
is expected to experience over its expected service life. Sherlock analysis modules include the
following:
1. Conductive Anodic Filament (CAF) failure analysis
2. Failure rate analysis
3. PTH fatigue analysis
4. Solder fatigue analysis
For this project, only the solder fatigue analysis is of importance and is performed for Board_B.
2.5.3.1 Solder Fatigue (Thermal Cycling) analysis
Solder joints provide the electrical, thermal and mechanical connections between a component and
the PCB. Board_A and Board_B contain thousands of solder joints. As mentioned previously,
the boards, being part of the object controller system, undergo thermal cycling once they are
installed in the field. During the course of this thermal cycling, the PCB and the components
mounted on the PCB expand or contract due to the change in temperature. However, the amount of
expansion or contraction differs between the PCB and the components due to the difference in their
coefficients of thermal expansion (CTE). This places the solder joint under considerable stress,
which damages the solder. Over time this damage accumulates and leads to crack propagation, which
in turn causes the solder joint to fail due to solder joint fatigue. Figure 2-1 displays a TSOP
package failing due to solder joint fatigue and losing connection with the PCB substrate.
Figure 2-1: Solder joint fatigue failure on a TSOP package. (Source: CALCE News, September 1993)
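The effect of the CTE mismatch can be illustrated with a commonly used first-order estimate of the
cyclic shear strain in a solder joint: the distance from the neutral point of the component,
multiplied by the CTE difference and the temperature swing, divided by the solder joint height.
This is a textbook simplification and not necessarily the model implemented in Sherlock. All
geometry and material values below are hypothetical, except the 20°C swing, which follows from the
50°C to 70°C cabinet temperature range.

```python
def shear_strain_range(distance_to_neutral_mm, cte_component_ppm, cte_board_ppm,
                       delta_t_c, joint_height_mm):
    """First-order cyclic shear strain in a solder joint due to CTE mismatch."""
    delta_alpha = abs(cte_board_ppm - cte_component_ppm) * 1e-6  # ppm/degC -> 1/degC
    return distance_to_neutral_mm * delta_alpha * delta_t_c / joint_height_mm


# Hypothetical part: 1.0 mm from the neutral point, 0.1 mm joint height,
# component CTE 6 ppm/degC, board CTE 17 ppm/degC (assumed values).
strain = shear_strain_range(distance_to_neutral_mm=1.0,
                            cte_component_ppm=6.0,
                            cte_board_ppm=17.0,
                            delta_t_c=20.0,
                            joint_height_mm=0.1)
print(f"Estimated shear strain range: {strain:.4e}")
```

The expression makes the listed influences visible: a larger component, a larger temperature swing
or a larger CTE difference increases the strain, while a taller solder joint reduces it.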
Solder joint fatigue can be influenced by the following:
Maximum temperature
Minimum temperature
Dwell time at maximum temperature
Component design (size, number of I/O, etc.)
Component material properties (CTE, elastic modulus, etc.)
Solder joint geometry (size and shape)
Solder joint material (SnPb, SAC305, etc.)
PCB thickness
Printed in-plane material properties (CTE, elastic modulus)
Sherlock Solder Fatigue analysis module makes use of the following input parameters to
perform solder joint fatigue failure analysis:
Life-Cycle Reliability Goals
Parts list
Circuit card mechanical properties (stackup data)
Component sizes and locations
Solder properties
Thermal events and associated thermal maps
Solder material: Lead-free (SAC305/SnPb)
Part temperature rise: 0°C
Part validation: Enabled
The sensitivity analysis for the Sherlock solder fatigue module gives a better understanding of
what affects the damage to the components. The procedure for this sensitivity analysis has
been to take a single part from Board_B and compare the result for the component with
its original properties against the result obtained when one property is varied. At any one point,
only the property of interest is changed while the others are kept the same in order to ensure an
accurate comparison of the results. The properties that have an effect on the solder fatigue of the
component are given below:
Package types: The larger the package of an electrical component, the higher will be the
damage. For instance, a part of package type “0805” (2 mm*1.2 mm*0.6 mm) will
suffer more damage than a part with package type “0603” (1.6 mm*0.8 mm*0.5 mm).
Material: Damage on components is dependent on the primary material of which the
part is made. For instance, a part made of barium titanate will suffer less damage
than a part made of alumina.
Pad size: If the pad size is big, the damage on the component will be less and vice versa.
Solder material: Lead-free solder material (SAC305) increases the damage on the
component more than tin-lead (63Sn37Pb) solder material.
Stencil Thickness: The thickness of the solder joint that connects the part to the PCB. A
thick stencil thickness will result in a conservative damage to the part than that of a thin
stencil thickness.
Part Temperature rise: The higher the part temperature rise, the higher will be the
damage experienced by the component due to solder fatigue.
Dimension of the die: Higher dimension of the die will result in higher damage due to
solder fatigue and vice versa.
Dimension of the flag: Higher dimension of the flag will result in higher damage due to
solder fatigue and vice versa.
The life prediction curve rises steeply if the damage to the component is high, which corresponds
to a higher probability of failure (%). The x-axis represents the lifetime in years. The prediction
curve is a 2-parameter Weibull curve.
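For reference, a 2-parameter Weibull curve gives the cumulative probability of failure as
F(t) = 1 - exp(-(t/η)^β), where η is the scale parameter (characteristic life) and β is the shape
parameter. The sketch below evaluates this expression; the parameter values are hypothetical and
are not results from the Sherlock analysis.

```python
import math


def weibull_failure_probability(t_years, eta_years, beta):
    """Cumulative probability of failure for a 2-parameter Weibull distribution."""
    return 1.0 - math.exp(-((t_years / eta_years) ** beta))


eta_years = 40.0  # hypothetical characteristic life
beta = 3.0        # hypothetical shape parameter (beta > 1: wear-out behaviour)

curve = [(t, weibull_failure_probability(t, eta_years, beta)) for t in (0, 10, 20, 30, 40)]
for t, f in curve:
    print(f"t = {t:2d} years -> probability of failure = {100.0 * f:.2f} %")
```

At t equal to the characteristic life η, the probability of failure is 1 - 1/e (about 63.2%)
regardless of β. A larger β makes the curve rise more steeply around η.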
3 Benchmark of Reliability Prediction standards
The reliability prediction standards used at RCS, Bombardier Transportation are identified after
classification of current and potential reliability prediction standards.
There are quite a few reliability standards that are used to carry out reliability
prediction for electronic boards, components, etc. However, what has been missing is a reliability
standards benchmark that would allow users to select the appropriate standard for the
prediction based on qualitative judgement. The benchmark is built upon a set of evaluation
criteria mapped against selected standards, assisting users in decision making.
3.1 Identification of reliability standards used at Bombardier Transportation
According to [10], 65 reliability predictions were performed on various components at RCS BT
between 1997 and 2015. From [10], we can retrieve the standards
that were used for the predictions as well as their usage statistics. The standards used
during this period and their usage statistics are as follows:
IEC 62380 TR (Former RDF 2000): Total predictions: 48, Percent contribution: 73.85%
MIL-HDBK-217F-Notice2: Total predictions: 11, Percent contribution: 16.92%
Telcordia Issue 1 (Former Bellcore): Total predictions: 4, Percent contribution: 6.15%
Siemens SN 29500: Total predictions: 2, Percent contribution: 3.08%
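The percent contributions follow directly from the prediction counts. The short check below
recomputes them from the counts listed above, using only figures stated in the text.

```python
# Prediction counts per standard, as listed in the text (source: [10]).
predictions = {
    "IEC 62380 TR": 48,
    "MIL-HDBK-217F-Notice2": 11,
    "Telcordia Issue 1": 4,
    "Siemens SN 29500": 2,
}

total = sum(predictions.values())
shares = {name: round(100.0 * count / total, 2) for name, count in predictions.items()}

print("Total predictions:", total)
for name, share in shares.items():
    print(f"{name}: {share:.2f} %")
```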
After the identification, all of these standards apart from Telcordia Issue 1 are selected, along
with FIDES, for classification and comparison. FIDES is chosen due to its potential
for future use at Bombardier, since there is a possibility that IEC 62380 may be replaced
by FIDES as a de facto standard. In addition, FIDES allows users to take into consideration complex
mission profiles for the components. Since IEC 62380 is used for the majority of the reliability
predictions within Bombardier, it is included in the benchmark. MIL-HDBK-217F is one of the
oldest reliability standards and has been used by different industries for many decades. Even
though its use has become very limited due to technological advancements, it is still of interest to
see how the reliability model of this standard compares with its more recent counterparts. Siemens
SN 29500 is included in the benchmark due to its common use within the railway industry.
3.2 Benchmark Structure
The benchmark consists of five segments in the form of five spreadsheets. The first spreadsheet is
named “Classification and Comparison”. This is where the set of evaluation criteria is mapped
against the standards.
The “Component Mapping” spreadsheet is dedicated to displaying the electronic components
and their variants that are included in the different standards. During reliability prediction, one
of the issues faced by the user is the variation in the naming of different types of components
across standards. This worksheet adds value to the benchmark by including a list of electronic
components and how they are addressed across the selected standards. This helps the user
performing the prediction to know which components are part of the benchmark and speeds up the
process of selecting the appropriate component in the tool during the prediction. The list of
components in this worksheet has been retrieved from [11] and [12].
The “Possible Problems & CoP” worksheet presents a list of components including the possible
problems that may occur with them and the potential cause. The worksheet can be used in the future
by BT RCS to perform FMECA, RCA and preventive maintenance analysis. The repair center
can also use this information when repairing failures that have been reported in the field
for a specific component family type. The list of addressed components with regard to the
component mapping is not necessarily exhaustive and can be extended if information on the
possible problems and their causes is found for the missing components.
Amongst all the standards, only IEC 62380 provides life expectancy information for a few
component families and their failure mode repartition. The list is included in the worksheet “IEC
62380-Miscellaneous”. The added value of this worksheet is that it will aid design work and the
scheduling of preventive maintenance. Also, if a failure occurs for any one of the
components in the list, the repair center can look at the failure repartition between the failure
modes to get an initial idea of the probable cause of failure.
Finally, the “References” spreadsheet displays the references that are indispensable for the use
of the various standards.
3.3 Classification and Comparison
This section is dedicated to the “Classification & Comparison” segment of the benchmark and
includes Table 3-1.
Table 3-1: Classification and Comparison of reliability standards
Criterion | FIDES guide 2009 | IEC 62380 TR | Siemens SN 29500 | MIL-HDBK-217F-N2
Type | Guide (Proposed as future IEC standard) | Standard | Standard | Standard
Status | To be continued | To be continued (Until 2017) | To be continued (on Siemens initiative) | Discontinued
Last update | 2010 | 2004 | Various releases | 1995
Application domain | Railway included | Commercial application but not railway | Railway included | Specific to Military use
Methodology | Families count analysis; Part count analysis; Part stress analysis | Only Part stress analysis | Only Part stress analysis | Part count analysis; Part stress analysis
Component Life cycle phases | Permanent working; On/Off cycling; Dormant application | Permanent working; On/Off cycling; Dormant application | Not covered | Not covered
Environment | Dynamic | Dynamic | Categorized | Categorized
COTS components | Covered | Covered | Covered | Not covered
Principles of construction | Physics of Failure | Empirical | Empirical | Empirical
Mission/Life profile | Covered | Covered | Not covered | Not covered
Failures derived from development/manufacturing errors | Covered | No information | Not covered | Not covered
Electrical overstress | Covered | Covered | Covered | Covered (Partial)
Mechanical overstress | Covered | Not covered | Covered | Not covered
Thermal overstress | Covered | Covered | Not covered | Covered
Process contributing factor (Component manufacturing factor) | Covered | Not covered | Not covered | Not covered
Process contributing factor (∏_Process factor) | Covered | Not covered | Not covered | Not covered
Humidity | Covered | Not covered | Not covered | Not covered
Lead-free Soldering | Covered | Not covered | Not covered | Not covered
Package Data | Covered | Covered | Not covered | Covered (Negligible)
Conformal Coating | Not covered | Not covered | Not covered | Not covered
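One way to use Table 3-1 for decision making is to filter the standards by the criteria that a
given prediction requires. The sketch below encodes a few covered/not covered rows of the table
and returns the standards that cover all requested criteria. The function and variable names are
hypothetical, and only a subset of the rows is encoded.

```python
STANDARDS = ("FIDES guide 2009", "IEC 62380 TR", "Siemens SN 29500", "MIL-HDBK-217F-N2")

# True = "Covered" in Table 3-1, False = "Not covered"; column order follows STANDARDS.
COVERAGE = {
    "COTS components":       (True, True, True, False),
    "Mission/Life profile":  (True, True, False, False),
    "Mechanical overstress": (True, False, True, False),
    "Thermal overstress":    (True, True, False, True),
    "Lead-free Soldering":   (True, False, False, False),
}


def standards_covering(required_criteria):
    """Return the standards that cover every criterion in required_criteria."""
    return [
        standard
        for index, standard in enumerate(STANDARDS)
        if all(COVERAGE[criterion][index] for criterion in required_criteria)
    ]


print(standards_covering(["COTS components", "Mission/Life profile"]))
print(standards_covering(["Lead-free Soldering"]))
```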
The standards use different methodologies to perform reliability analysis. The terms that are part
of the methodology in the benchmark are explained in detail below:
The parts count method requires less information, generally part quantities, quality level
and the application environment. This method is applicable during the early design
phases and during proposal formulation. Usually, this method of prediction will result
in a more conservative estimate of system reliability than the parts stress method.
The parts stress analysis method requires a greater amount of detailed information and is
applicable during the later phase when actual hardware and circuits are being designed.
The families count prediction method introduced in FIDES is particularly applicable
during the earliest phases of the project. This method can be used to produce a
reliability evaluation with the least amount of information about the product definition.
In particular, the technological description of items is very much simplified and
practically all application constraints are fixed at default values.
Physics of failure is a technique under the practice of Design for Reliability that uses the
knowledge and understanding of the processes and mechanisms that induce failure to
predict reliability and improve product performance.
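As an illustration of the parts count method, the equipment failure rate is commonly obtained by
summing, over all part categories, the quantity multiplied by a generic failure rate and a quality
factor. The sketch below shows this general form only. The part categories, generic failure rates
and quality factors are made-up placeholders and are not values from any of the handbooks.

```python
from dataclasses import dataclass


@dataclass
class PartGroup:
    name: str
    quantity: int
    generic_failure_rate: float  # failures per 1E6 hours (placeholder values below)
    quality_factor: float        # dimensionless quality multiplier (placeholder)


def parts_count_failure_rate(groups):
    """Equipment failure rate as the sum of quantity * generic rate * quality factor."""
    return sum(g.quantity * g.generic_failure_rate * g.quality_factor for g in groups)


# Hypothetical bill of materials, for illustration only.
board = [
    PartGroup("resistors", 100, 0.002, 3.0),
    PartGroup("capacitors", 50, 0.004, 3.0),
    PartGroup("integrated circuits", 10, 0.05, 2.0),
]

total_rate = parts_count_failure_rate(board)
print(f"Board failure rate: {total_rate:.2f} failures per 1E6 hours")
print(f"MTBF: {1e6 / total_rate:.0f} hours")
```

Because only quantities, quality levels and the environment are needed, this form can be evaluated
early in the design, which is why the method suits the early design phases.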
3.4 Evaluation Criteria
The evaluation criteria that have been identified are listed in Table 3-2.
Table 3-2: List of evaluation criteria
Evaluation Criteria
Type
Status
Last release/update
Benchmark version release
Next anticipated release/update/maintenance
Publisher
Objectives
Origins of data
Principles of construction
Methodology
Model coverage
Mathematical Model Type
Mathematical equation
Reliability metrics
Life/Mission Profile
Phases
Stresses
Environment
Possibility to consider additional environments
MIL-SPEC components
COTS components
Lead-free Process Factor
Bathtub curve coverage
Terms and definitions
Applicability indicators
Temperature cycling
Lead-free soldering
Solder joint failure rate
Conformal coating
Influence of environment
Composition
Warning/Limitations
Confidence level in the prediction
Package data
Software model
Vibration
Shock
Chemical
Covered Product Life Cycle Phases
Failure mode
Failure Distribution
Life expectancy
3.5 Evaluation of reliability standards based on the defined criteria
3.5.1 FIDES
FIDES Guide 2009 is the latest reliability prediction guide available at present. It has been
produced under the supervision of DGA by the companies in the FIDES Group. The Group consists of
the following companies: AIRBUS France, Eurocopter, Nexter Electronics, MBDA missile systems and
Thales Systèmes Aéroportés. Even though the group is dominated by companies from the
field of aeronautics and defense, the guide covers a broader application domain.
The guide was first published in 2004 under the name FIDES Guide 2004 issue A,
which was later accepted by the French standardisation organization with the reference UTE C 80
811. The rationale behind the latest publication has been to take technological advancements into
consideration, increase the coverage and make improvements. The guide was released on
2010-09-01.
The methodology takes into account failures that are derived from development or
manufacturing errors and overstresses such as electrical, thermal and mechanical. The methodology
also deals with non-functioning phases such as dormant application and genuine storage.
The evaluation method of FIDES does not consider the infant mortality and the wear out
periods of the components except for some special cases for some sub-assemblies [13].
The objectives of the creation of this standard have been:
To make a realistic evaluation of the reliability of the electronic products including
systems that encounter severe or non-aggressive environments (storage).
To provide a specific tool for the construction and control of this reliability.
To develop a new reliability assessment method for electronic components which takes
into consideration COTS and specific parts and new technologies.
3.5.2 IEC 62380
IEC 62380 TR is a reliability data handbook that is based on the French telecommunications
standard RDF 2000. This reliability handbook was released in 2005 and is defined as an
international standard by the International Electrotechnical Commission (IEC).
The IEC 62380 TR calculation model takes into consideration the influence of the environment.
The thermal cycling seen by the cards, as a function of the mission profiles undergone by the
equipment, replaces the environment factor, which is difficult to evaluate. These models can handle
permanent working, on/off cycling and dormant applications. In addition, the failure rate related to
component soldering is included in the component failure rate [14].
The initial motivation for IEC 62380 TR was to take the influence of the environment into account
more effectively.
3.5.3 Siemens SN29500
Siemens SN29500 is a reliability standard used by Siemens AG and the Siemens companies as a
uniform basis for reliability predictions.
The initial motivation for this reliability standard was the customer requirement to
demonstrate the reliability calculation of Siemens products. Another motivation was
to write a reliability engineering guide that provides an engineering process and tools to
improve reliability in the development of new electronic systems.
Siemens SN 29500 is based on the IEC standard IEC 61709. The standard comes in individual
documents for specific component groups, 12 to be exact. Instead of updating the whole standard at
once, Siemens has resorted to updating individual documents based on its needs.
The IEC 61709 standard is intended for reliability prediction of electronic components. The
standard describes how to state and use data belonging to an organization in order to perform
reliability predictions [15]. The standard can also be used by an organization to set up a failure rate
database and to describe the reference conditions for which field failure rates should be stated [15].
3.5.4 MIL-HDBK-217F-Notice 2
The purpose of this handbook has been to establish and maintain consistent and uniform methods
for estimating the inherent reliability of military electronic equipment and systems. During
acquisition programs for military electronic systems and equipment, there was a need for a
common basis for reliability predictions, hence the creation of this handbook. MIL-HDBK-217F
also creates the opportunity to compare and evaluate reliability predictions of related or
competitive designs. The handbook is intended to be used as a tool to increase the reliability of the
equipment being designed.
MIL-HDBK-217F is becoming obsolete as the technology coverage of electronic products and
systems widens. MIL-HDBK-217F is very pessimistic when it comes to components that
are not MIL-SPEC. However, at present it is commonplace for the military and the avionics
industry to use COTS components while building their systems.
3.6 References across different standards
From the “References” worksheet it can be seen that both the FIDES and Siemens standards
refer to “IEC 60050 (191) A1 (1999-03) Electromechanical vocabulary - Chapter 191: operating
dependability and service quality” as well as “IEC 61709 Electronic components - Reliability –
Reference conditions for failure rates and stress influence models for conversion”. FIDES has also
used data from the “Military standard mil-hdbk-217F (+notice 1 & notice 2)” and the “UTE C 80-810
RELIABILITY DATA HANDBOOK: RDF 2000 – A universal model for reliability prediction
calculations for components, electronic boards and equipment” (currently IEC 62380), both of
which are part of the benchmark. Siemens and IEC 62380 TR have both used IEC 60747 as a
reference. As for the military standard MIL-HDBK-217F2, most of its reference documents are
specific to different components, which the standard has compiled to create the failure database.
The benchmark can be viewed in the file attached in Appendix A.
4 Theoretical Reliability Prediction
Two products within RCS, BT have been identified for which reliability features are predicted. The
reliability prediction was performed for Board_A, Version 2.5 and Board_B, Version 1.4 using
selected relevant standards. The prediction is performed using the software ITEM QT. The
standards that have been implemented are FIDES, IEC 62380, MIL-HDBK-217F2 and Siemens SN
29500.
4.1 Input parameters and Assumptions
This section displays certain input parameters and assumptions that are used during the prediction
for both Board_A and Board_B across the chosen standards. The input parameters and
assumptions are made in accordance to the reference condition. Components across the different
standards are mapped accordingly.
4.1.1 FIDES
4.1.1.1 Life Profile
The life profile for the reliability prediction performed on Board_A and Board_B has been set to
mimic the real conditions that the boards experience in the field after they have been installed. The
boards are assumed to be in a permanent working mode, and no standby time due to repair is assumed.
The life profile parameters are as follows:
Permanent Working Phase: On
Calendar Time Hours: 8760 (represents 1 year)
Ambient Temperature: 25°C
Relative Humidity: 0 % (due to the board being powered on 100% of the time)
Temperature Amplitude, ΔT: 10°C
Number of Cycles Per year: 365 (1 cycle/day)
Cycle duration: 24 hours
Maximum temperature during cycling: 45°C
Random Vibration: 0 Grms (Assumed)
Component Junction Temperature: 60°C
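The life profile parameters can be collected in a single record so that simple consistency checks
are possible, for example that the thermal cycles fill the whole calendar time of the phase. This is
a minimal sketch: the class and field names are hypothetical and do not correspond to the ITEM QT
input fields, while the values are the ones listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LifePhase:
    name: str
    calendar_hours: float
    ambient_temp_c: float
    relative_humidity_pct: float
    delta_t_c: float
    cycles_per_year: int
    cycle_duration_h: float
    max_cycling_temp_c: float
    random_vibration_grms: float

    def cycling_hours(self):
        """Total time per year spent in thermal cycles."""
        return self.cycles_per_year * self.cycle_duration_h

    def is_time_consistent(self):
        """The cycles must not take longer than the calendar time of the phase."""
        return self.cycling_hours() <= self.calendar_hours


permanent_working = LifePhase(
    name="Permanent working",
    calendar_hours=8760,
    ambient_temp_c=25,
    relative_humidity_pct=0,
    delta_t_c=10,
    cycles_per_year=365,
    cycle_duration_h=24,
    max_cycling_temp_c=45,
    random_vibration_grms=0,
)

print("Cycling hours per year:", permanent_working.cycling_hours())
print("Time consistent:", permanent_working.is_time_consistent())
```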
4.1.1.2 General Input Parameters
The placement is selected to be “Non-interface” Digital function.
Default value is assumed for the “Ruggedizing calculation mode” and the “Process factor
calculation mode”.
Manufacturer Quality Assurance Level and Component Quality Assurance Level are set to
“Equivalent” which stands second best amongst four different levels. The Component Reliability
Assurance Level is set to “Very Reliable – Level B”. The Manufacture Experience Factor is chosen to
be “Recognised manufacturer: Mature processes for the item considered”.
4.1.1.3 Constraints
Since FIDES is almost 7 years old, the most recent packages are not included in the standard.
4.1.2 IEC 62380
The assumptions for the prediction using the IEC 62380 standard are kept similar to the
assumptions made during the earlier predictions of Board_A and Board_B. IEC 62380 is the only
standard amongst those used here that provides information on life expectancy for the
different components.
4.1.2.1 Mission Profile
“Ground, stationary: weather protected” and “Permanent working” are used as the mission profile.
The night and day temperature difference has been set to 10°C with 365 cycles per year.
The non-interface setting is used for the electrical environment since the boards are inside the
cabinet and do not have any cables going outside.
The average outside ambient temperature 𝑡𝑎𝑒 is selected to be 25°C and the average ambient
temperature of the board near the components 𝑡𝑎𝑐 is 40°C.
4.1.2.2 General Input Parameters
The general input parameters that have been used are:
Junction Temperature Estimation Mode: Junction-Ambient
Air Flow Type: Natural Convection
Function/Electrical Environment: Non Interface
Year of Manufacturing: Board_A (2008), Board_B (2014)
4.1.2.3 Constraints
IEC 62380 is unable to model QFN and BGA packages with 0.8 mm pitch. A lot of the packages are
missing and there is a limitation to the number of transistors and memory bits.
4.1.3 Siemens SN 29500
4.1.3.1 Mission Profile
No mission profile is available in this standard.
4.1.3.2 General Input Parameters
The general input parameters are as follows:
Junction Temperature Calculation Mode: Junction Temperature User Input
Junction Temperature, Input: 60°C
Stress Profile: Disabled
Integrated Circuits, Operating Time: 3000 hours [Default] [Maximum value]
4.1.3.3 Constraints
Failures intrinsic to the PCB cannot be modelled in Siemens SN 29500. Instead, the PCB block
contains the failures related to the connections for both boards.
The absence of temperature cycling and a mission/life profile prevents mimicking the real scenario
that the predicted products undergo.
Advanced IC packages cannot be modelled.
4.1.4 MIL-HDBK-217F2
4.1.4.1 Mission Profile
No mission profile is available in this standard.
4.1.4.2 General Input Parameters
Application
o Repair Mode: Non-repairable
o Environment: Ground, Benign
o MTTR: 0 hour
o Number of Standby: 0
o Ambient Temperature: 40°C (component ambient temperature inside cabinet)
o Voltage Stress: 0,8 (default)
o Current Stress: 0,7 (default)
o Power Stress: 0,75 (Default)
o Adjustment Factor: 1
o Connection type: Reflow Solder
Physical
o Technology: CMOS
o Package Type IC: Surface Mount Tech
o Quality, Microelectronics: Commercial or Unknown
o Number of Gates: 60000 (Maximum)
o Number of Transistors: 10000 (Maximum)
o Number of years in production: Board_A - 8 years, Board_B - 2 years
o Theta Case/Ambient: 40°C
o Theta junction Case: 60°C
o Quality, Other: Lower
o Quality Capacitors: Commercial or Unknown
4.1.4.3 Constraints
MIL-HDBK-217F was last published nearly 20 years ago and has not kept up with technological
advancements. The standard therefore lacks input parameters and models which are essential for
a good prediction. A few of these constraints are mentioned in this section.
Failures on the PCB of the products could not be modelled, since failures intrinsic to the PCB
are not modelled in this standard.
The voltage converter used in Board_A is not modelled.
Board_A and Board_B contain linear microcircuits with more than 10,000 transistors, which
exceeds the limit of the standard.
Bipolar and MOS circuits are limited to 60,000 gates and memory devices are limited to 1 million
bits, which is far less than the actual values in both the boards used for reliability prediction.
There are no prediction models for Flash memories and FPGAs. Hence, the flash memories and
the FPGA used in the Board_B and Board_A could not be modelled in the prediction.
Mission/life profile and temperature cycling are not modelled in this standard, which does not
allow for mimicking the exact conditions that the products undergo.
4.2 Prediction outcome
This section is dedicated to the outcome of the reliability prediction for the Board_A and the
Board_B.
To perform reliability prediction of the Board_A, all the components are divided into 7 blocks.
The blocks are as follows: Active, Block S, Connectors, Power, Passive, 8-Layer Equipped PCB and
Miscellaneous. Amongst the abovementioned blocks, the “Miscellaneous” block is part of the
analysis but as it has no impact on board operability, it is not taken as a contributor to the MTTF.
The block and the components within are presented in “Italics” in the figures. The block partitioning
and the classification of components for the Board_A can be found in [16].
The block partitioning and the classification of components for the Board_B can be found in
[17]. None of the selected reliability standards allows for performing prediction of lead-free boards.
However, FIDES 2009 has a small section discussing the consequences for reliability of the
transition to a lead-free manufacturing process. FIDES proposes to calculate the failure rate of a
product manufactured using the lead-free process as the product of the original failure
rate, the part manufacturing factor, the process manufacturing factor and the lead-free process
factor ∏_LF. The lead-free process factor varies between 1 (for a mature process) and 2 (for a
process for which no precautions were taken). Apparently, in the prediction for the Board_B that
was done in [17], the PCB failure rate was multiplied by a lead-free process factor of ∏_LF = 2.
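The lead-free adjustment described above can be written compactly as follows. This is only a sketch of the relation as stated in the text: the symbols \lambda (original failure rate), \Pi_{PM} (part manufacturing factor) and \Pi_{Process} (process manufacturing factor) are notation assumed here for illustration, and only \Pi_{LF} is named in the text.

```latex
\[
  \lambda_{\text{lead-free}}
    = \lambda \cdot \Pi_{PM} \cdot \Pi_{Process} \cdot \Pi_{LF},
  \qquad 1 \le \Pi_{LF} \le 2
\]
```

With \Pi_{LF} = 2, as apparently used for the PCB of Board_B, the affected failure rate is doubled relative to a mature process.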
The prediction results are displayed in three stages for both the boards.
The output from Stage 1 shows the failure rate of each block and their contribution to the failure
of the whole board. We refer to this as “Board X”. The X is replaced by Board_A and Board_B with
respect to the outcome displayed.
The output from Stage 2 is the prediction results on functional level showing all the components
in each block, their quantity, their failure rate and their contribution to the failure of the whole
board. We refer to this as “Board X Functional Level”.
The output from Stage 3 displays all the component types used in the board, their quantity, their
failure rate and their contribution to the total failure rate of the board.
We refer to this as the “Board X Component Level”.
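The contribution percentages reported at each stage can be reproduced with a few lines of Python. The sketch below is illustrative only: it uses the FIDES block-level failure rates of Board_A from the block level results, and it assumes that the contribution of a block is its failure rate divided by the sum over all counted blocks, with the “Miscellaneous” block left out of the sum. The function and variable names are hypothetical.

```python
# Block-level failure rates in FITs (FIDES column, Board_A block level results).
fides_board_a = {
    "8-layer Equipped PCB": 2.5,
    "Active": 167.0,
    "Block S": 293.0,
    "Power": 906.0,
    "Passive": 190.0,
    "Connectors": 25.3,
    "Miscellaneous": 180.0,
}

# Assumption: blocks without impact on board operability are not counted.
EXCLUDED = {"Miscellaneous"}


def contributions(rates, excluded=EXCLUDED):
    """Return the summed failure rate and the percentage share of each counted block."""
    counted = {name: fr for name, fr in rates.items() if name not in excluded}
    total = sum(counted.values())
    shares = {name: 100.0 * fr / total for name, fr in counted.items()}
    return total, shares


total, shares = contributions(fides_board_a)
for name, share in shares.items():
    print(f"{name:22s} {share:5.1f} %")
print(f"Sum: {total:.1f} FITs")
```

Under these assumptions the Power block accounts for a little more than half of the FIDES failure rate of Board_A, in line with the block level results.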
It shall be noted that for IEC 62380, SN 29500 and MIL-HDBK-217F2, the model for the
integrated circuits is limited due to insufficient IC packages.
4.2.1 Board_A
According to Figure 4-1, the MTBF values for Board_A for the different standards are as follows:
FIDES: 72.1 years
IEC 62380: 95.8 years
SN 29500: 95.9 years
MIL-HDBK-217F2: 5.5 years
Amongst all the results, MIL-HDBK-217F2 clearly gives the most conservative
result.
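The MTBF values listed above follow from the summed failure rates of the board. A minimal sketch, assuming that 1 FIT corresponds to one failure per 10^9 hours and that a year is counted as 8760 hours (the second point is an assumption, chosen because it reproduces the reported values):

```python
HOURS_PER_YEAR = 8760  # assumption: 365 days x 24 hours, leap days ignored


def mtbf_years(failure_rate_fits):
    """Convert a constant failure rate in FITs (failures per 1e9 h) to an MTBF in years."""
    mtbf_hours = 1e9 / failure_rate_fits
    return mtbf_hours / HOURS_PER_YEAR


# Summed failure rates of Board_A in FITs, taken from the block level results.
board_a_sums = {
    "FIDES": 1583.8,
    "IEC 62380": 1192.2,
    "SN 29500": 1190.4,
    "MIL-HDBK-217F2": 20888.0,
}

for standard, fits in board_a_sums.items():
    print(f"{standard:15s} {mtbf_years(fits):5.1f} years")
```

The same conversion applies to the Board_B sums reported later in this chapter.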
Figure 4-1: Block level prediction of Board_A
Upon closer inspection, although Siemens SN 29500 and IEC 62380 give very similar results and
FIDES does not differ from them nearly as much as MIL-HDBK-217F2 does, the blocks that
contribute the most differ between the standards.
It shall be noted that according to the Benchmark that has been produced during the course of
this thesis, only FIDES and IEC 62380 allow for thermal cycling to be taken into consideration
during reliability prediction while SN 29500 and MIL-HDBK-217F2 do not.
Since the intrinsic failure rate of the PCB is not modelled in MIL-HDBK-217F, the standard is in
this respect more optimistic than the other standards, where it is modelled.
For FIDES, the largest contribution to the failure rate comes from Block Power with a value of
906 FITs, which is approximately twice the failure rate for the block in IEC 62380, 12x greater than
the value in SN 29500 and nearly 4x greater than the FR in the MIL standard.
For IEC 62380, the trend is the same as that of FIDES, with Block Power and Block S
contributing the most with values of 40.85% and 31.96% respectively.
Block Passive and Block S dominate the failure rate contribution in SN 29500, contributing
46.96% and 31% of the total failure rate of the board respectively. For MIL-HDBK-217F, Block S
contributes the most to the failure of the board with 42.37%, followed by the Active block with
30.83%.
One very interesting observation from the outcome of the prediction is the difference between
the standards in the failure rate of the PCB. For Siemens SN 29500, the value provided as the
failure rate of the PCB is due to the connections rather than the intrinsic failure of the PCB. The
intrinsic failure of the PCB is modelled in FIDES and IEC 62380, where it contributes 0.2% and
9.8% respectively.
The Stage 2 output showing the predicted values for the board can be seen in Figure 4-2.
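Data for Figure 4-1 (block level prediction of Board_A); FR = failure rate, Ctrbn = contribution:
Blocks | FIDES FR [FITs] | FIDES Ctrbn(%) | IEC 62380 FR [FITs] | IEC 62380 Ctrbn(%) | SN 29500 FR [FITs] | SN 29500 Ctrbn(%) | MIL-HDBK-217F2 FR [FITs] | MIL-HDBK-217F2 Ctrbn(%)
8-layer Equipped PCB | 2,5 | 0,2% | 117,0 | 9,8% | 51,0 | 4,3% | No Model | 0,0%
Active | 167,0 | 10,5% | 36,5 | 3,1% | 124,0 | 10,4% | 6440,0 | 30,8%
Block S | 293,0 | 18,5% | 381,0 | 32,0% | 369,0 | 31,0% | 8850,0 | 42,4%
Power | 906,0 | 57,2% | 487,0 | 40,8% | 75,5 | 6,3% | 252,0 | 1,2%
Passive | 190,0 | 12,0% | 95,5 | 8,0% | 559,0 | 47,0% | 4710,0 | 22,5%
Connectors | 25,3 | 1,6% | 75,2 | 6,3% | 11,9 | 1,0% | 636,0 | 3,0%
Miscellaneous (not counted in the sum) | 180,0 | 0,0% | 648,0 | 0,0% | 400,0 | 0,0% | 782,0 | 0,0%
Sum | 1583,8 | 100% | 1192,2 | 100% | 1190,4 | 100% | 20888,0 | 100%
MTBF (Years): FIDES 72,1; IEC 62380 95,8; SN 29500 95,9; MIL-HDBK-217F2 5,5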
Figure 4-2: Functional level prediction of Board_A
Figure 4-2 gives a much better overview of which components contribute the most to the
failure of the board across the different standards.
The voltage converter used in the Board_A has a big impact on the failure rate of the board
according to FIDES and IEC 62380. Unfortunately, this component could not be modelled in
MIL-HDBK-217F2, which is another constraint of the standard. Integrated circuits have quite a big
impact as well across all four standards. However, the failure rate obtained for the ICs would be
different for SN 29500 and MIL-HDBK-217F, given that their input parameters are limited due to
the age of these standards.
Stage 3 of the prediction on component level for Board_A can be seen in Figure 4-3.
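Data for Figure 4-2 (functional level prediction of Board_A); FR = failure rate, Ctrbn = contribution:
Blocks | Component | Quantities | FIDES FR [FITs] | FIDES Ctrbn(%) | IEC 62380 FR [FITs] | IEC 62380 Ctrbn(%) | SN 29500 FR [FITs] | SN 29500 Ctrbn(%) | MIL-HDBK-217F2 FR [FITs] | MIL-HDBK-217F2 Ctrbn(%)
8-layer Equipped PCB | 8-layer Equipped PCB | 1 | 2,5 | 0,2% | 117,0 | 9,8% | 51,0 | 4,3% | No Model | 0,0%
Active | CNY17 Optocoupler/Fuse | 4 | 74,7 | 4,7% | 13,7 | 1,1% | 60,0 | 5,0% | 40,0 | 0,2%
Active | Integrated Circuits | 10 | 92,3 | 5,8% | 22,8 | 1,9% | 64,0 | 5,4% | 6400,0 | 30,6%
Block S | Integrated Circuits | 9 | 245,7 | 15,5% | 335,6 | 28,2% | 339,0 | 28,5% | 8740,0 | 41,8%
Block S | Oscillator | 1 | 47,3 | 3,0% | 45,4 | 3,8% | 30,0 | 2,5% | 114,0 | 0,5%
Power | Voltage Converter | 2 | 852,0 | 53,8% | 454,0 | 38,1% | 48,9 | 4,1% | No Model | 0,0%
Power | Integrated Circuits | 1 | 54,0 | 3,4% | 32,7 | 2,7% | 26,6 | 2,2% | 252,0 | 1,2%
Passive | Capacitor | 298 | 146,1 | 9,2% | 45,0 | 3,8% | 420,1 | 35,3% | 3310,4 | 15,8%
Passive | Inductor/Transformer | 25 | 2,5 | 0,2% | 32,3 | 2,7% | 39,6 | 3,3% | 96,1 | 0,5%
Passive | Power Switch | 1 | 4,5 | 0,3% | 12,9 | 1,1% | 32,0 | 2,7% | 430,0 | 2,1%
Passive | Resistor | 200 | 36,9 | 2,3% | 5,3 | 0,4% | 67,3 | 5,7% | 872,0 | 4,2%
Connectors | Connectors | 7 | 25,3 | 1,6% | 75,2 | 6,3% | 11,9 | 1,0% | 636,0 | 3,0%
Miscellaneous | Transistor | 5 | 26,1 | 0,0% | 6,0 | 0,0% | 35,5 | 0,0% | 23,5 | 0,0%
Miscellaneous | LED | 8 | 7,5 | 0,0% | 561,0 | 0,0% | 48,4 | 0,0% | 9,4 | 0,0%
Miscellaneous | Switch | 2 | 7,8 | 0,0% | 55,8 | 0,0% | 12,9 | 0,0% | 430,5 | 0,0%
Miscellaneous | Integrated Circuits | 2 | 35,5 | 0,0% | 15,0 | 0,0% | 81,7 | 0,0% | 249,3 | 0,0%
Miscellaneous | Diode | 13 | 103,0 | 0,0% | 11,0 | 0,0% | 222,0 | 0,0% | 68,8 | 0,0%
Sum | | 559 | 1584 | 100% | 1192 | 100% | 1190 | 100% | 20891 | 100%
MTBF (Years): FIDES 72,1; IEC 62380 95,8; SN 29500 95,9; MIL-HDBK-217F2 5,5
The rows of the Miscellaneous block are not counted in the sum.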
Figure 4-3: Component level prediction of Board_A
The output in Figure 4-3 gives an in-depth view of what we saw in the outputs from Stage 1 and
Stage 2. The output confirms that integrated circuits and the voltage converter are among the major
contributors to the failure of the Board_A. According to the comparatively older standards, SN
29500 and MIL-HDBK-217F2, capacitors contribute considerably to the failure of the board as well.
The italicized components belong to the “Miscellaneous” block. They are part of the analysis, but
as they have no impact on board operability, they are not taken as contributors to the MTTF.
4.2.2 Board_B
The result of the reliability prediction of Board_B can be seen in Figure 4-4. According to the results,
the MTBF values of the Board_B across the different standards are as follows:
FIDES: 33.2 years
IEC 62380: 35.5 years
SN 29500: 25.1 years
MIL-HDBK-217F: 1.5 years
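Data for Figure 4-3 (component level prediction of Board_A); FR = failure rate, Ctrbn = contribution:
Components | Quantity | FIDES FR [FITs] | FIDES Ctrbn(%) | IEC 62380 FR [FITs] | IEC 62380 Ctrbn(%) | SN 29500 FR [FITs] | SN 29500 Ctrbn(%) | MIL-HDBK-217F2 FR [FITs] | MIL-HDBK-217F2 Ctrbn(%)
Capacitor | 298 | 146,1 | 9,2% | 45,0 | 3,8% | 420,1 | 35,3% | 3310,4 | 15,8%
Connectors | 7 | 25,3 | 1,6% | 75,2 | 6,3% | 11,9 | 1,0% | 636,0 | 3,0%
Inductor/Transformer | 25 | 2,5 | 0,2% | 32,3 | 2,7% | 39,6 | 3,3% | 96,1 | 0,5%
Integrated Circuit | 20 | 392,0 | 24,8% | 391,1 | 32,8% | 429,6 | 36,1% | 15392,0 | 73,7%
Optocoupler/Fuse | 4 | 74,7 | 4,7% | 13,7 | 1,1% | 60,0 | 5,0% | 40,0 | 0,2%
Oscillator | 1 | 47,3 | 3,0% | 45,4 | 3,8% | 30,0 | 2,5% | 114,0 | 0,5%
PCB | 1 | 2,5 | 0,2% | 117,0 | 9,8% | 51,0 | 4,3% | No Model | 0,0%
Power Switch | 1 | 4,5 | 0,3% | 12,9 | 1,1% | 32,0 | 2,7% | 430,0 | 2,1%
Resistor | 200 | 36,9 | 2,3% | 5,3 | 0,4% | 67,3 | 5,7% | 872,0 | 4,2%
Voltage Converter | 2 | 852,0 | 53,8% | 454,0 | 38,1% | 48,9 | 4,1% | No Model | 0,0%
Transistor * | 5 | 26,1 | 0,0% | 6,0 | 0,0% | 35,5 | 0,0% | 23,5 | 0,0%
LED * | 8 | 7,5 | 0,0% | 561,0 | 0,0% | 48,4 | 0,0% | 9,4 | 0,0%
Switch * | 2 | 7,8 | 0,0% | 55,8 | 0,0% | 12,9 | 0,0% | 430,5 | 0,0%
Integrated Circuits * | 2 | 35,5 | 0,0% | 15,0 | 0,0% | 81,7 | 0,0% | 249,3 | 0,0%
Diode * | 13 | 103,0 | 0,0% | 11,0 | 0,0% | 222,0 | 0,0% | 68,8 | 0,0%
Sum | 559 | 1584 | 100% | 1192 | 100% | 1190 | 100% | 20891 | 100%
MTBF (Years): FIDES 72,1; IEC 62380 95,8; SN 29500 95,9; MIL-HDBK-217F2 5,5
* Component of the Miscellaneous block, not counted in the sum.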
Figure 4-4: Block level prediction of Board_B
In FIDES, the failure rate contribution is almost evenly distributed among the three
processor blocks (CPU S, CPU A and CPU B), MMI, IC_B and the Back end connectors.
In IEC 62380, nearly one-third of the contribution towards the failure of the board is predicted
to be due to the PCB. This is a huge difference compared to Siemens, where only the failures related
to the connections within the board are taken into consideration. The MMI and the CPU B also
contribute a considerable share of the failure rate according to IEC 62380. The abovementioned
blocks also remain the major contributors to the failure rate according to SN 29500 and
MIL-HDBK-217F.
Data for Figure 4-4 (block level prediction of Board_B); FR = failure rate, Ctrbn = contribution:
Blocks | FIDES FR [FITs] | FIDES Ctrbn(%) | IEC 62380 FR [FITs] | IEC 62380 Ctrbn(%) | SN 29500 FR [FITs] | SN 29500 Ctrbn(%) | MIL-HDBK-217F2 FR [FITs] | MIL-HDBK-217F2 Ctrbn(%)
PCB | 6,1 | 0,2% | 1010,0 | 31,4% | 111,0 | 2,4% | No Model | 0%
Power | 246,0 | 7,1% | 90,0 | 2,8% | 477,0 | 10,5% | 8840,0 | 11,2%
Reset | 102,0 | 3,0% | 13,7 | 0,4% | 88,7 | 1,9% | 4950,0 | 6,3%
Sub-rack links | 169,0 | 4,9% | 93,8 | 2,9% | 256,0 | 5,6% | 6040,0 | 7,7%
Front End switch | 200,0 | 5,8% | 124,0 | 3,9% | 186,0 | 4,1% | 3580,0 | 4,6%
MMI | 444,0 | 12,9% | 398,0 | 12,4% | 501,0 | 11,0% | 12300,0 | 15,6%
CPU S | 470,0 | 13,7% | 415,0 | 12,9% | 849,0 | 18,6% | 9940,0 | 12,6%
IC_B | 387,0 | 11,2% | 214,0 | 6,7% | 260,0 | 5,7% | 2520,0 | 3,2%
CPU A | 397,0 | 11,5% | 202,3 | 6,3% | 694,0 | 15,2% | 8680,0 | 11,0%
CPU B | 431,0 | 12,5% | 344,0 | 10,7% | 665,0 | 14,6% | 11300,0 | 14,4%
Back end links | 144,0 | 4,2% | 41,6 | 1,3% | 159,0 | 3,5% | 2840,0 | 3,6%
Back end connectors | 446,0 | 13,0% | 266,0 | 8,3% | 306,0 | 6,7% | 7650,0 | 9,7%
Sum | 3442 | 100% | 3212 | 100% | 4553 | 100% | 78640 | 100%
MTBF (Years): FIDES 33,2; IEC 62380 35,5; SN 29500 25,1; MIL-HDBK-217F2 1,5
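Data for Figure 4-5 (functional level prediction of Board_B), blocks PCB to Back end links; FR = failure rate, Ctrbn = contribution:
Blocks | Component | Quantities | FIDES FR [FITs] | FIDES Ctrbn(%) | IEC 62380 FR [FITs] | IEC 62380 Ctrbn(%) | SN 29500 FR [FITs] | SN 29500 Ctrbn(%) | MIL-HDBK-217F2 FR [FITs] | MIL-HDBK-217F2 Ctrbn(%)
PCB | PCB | 1 | 6,1 | 0,2% | 1010,0 | 31,4% | 111,0 | 2,4% | No Model | 0,0%
Power | Diode | 8 | 15,2 | 0,4% | 12,8 | 0,4% | 136,0 | 3,0% | 41,0 | 0,1%
Power | Capacitor | 70 | 85,7 | 2,5% | 25,8 | 0,8% | 151,3 | 3,3% | 1260,0 | 1,6%
Power | Fuse | 1 | 1,1 | 0,0% | 10,0 | 0,3% | 25,0 | 0,5% | 10,0 | 0,01%
Power | Inductor | 16 | 1,8 | 0,1% | 14,6 | 0,5% | 24,0 | 0,5% | 2,0 | 0,003%
Power | Transistor | 2 | 0,3 | 0,0% | 2,4 | 0,1% | 12,2 | 0,3% | 44,7 | 0,1%
Power | Resistor | 106 | 35,1 | 1,0% | 2,9 | 0,1% | 35,6 | 0,8% | 462,0 | 0,6%
Power | Integrated circuit | 12 | 106,7 | 3,1% | 21,6 | 0,7% | 92,4 | 2,0% | 7013,0 | 8,9%
Reset | Capacitor | 8 | 10,0 | 0,3% | 3,0 | 0,1% | 17,7 | 0,4% | 75,1 | 0,1%
Reset | Resistor | 28 | 5,2 | 0,2% | 0,7 | 0,0% | 9,4 | 0,2% | 122,0 | 0,2%
Reset | Integrated circuit | 8 | 86,9 | 2,5% | 10,0 | 0,3% | 61,6 | 1,4% | 4750,0 | 6,0%
Sub-rack links | Capacitor | 38 | 62,2 | 1,8% | 13,4 | 0,4% | 139,5 | 3,1% | 685,1 | 0,9%
Sub-rack links | Inductor | 4 | 2,3 | 0,1% | 9,1 | 0,3% | 6,0 | 0,1% | 0,5 | 0,001%
Sub-rack links | Connector | 3 | 32,4 | 0,9% | 6,2 | 0,2% | 2,4 | 0,1% | 630,0 | 0,8%
Sub-rack links | Resistor | 79 | 14,5 | 0,4% | 2,1 | 0,1% | 26,6 | 0,6% | 344,0 | 0,4%
Sub-rack links | Transformer | 4 | 1,0 | 0,0% | 25,0 | 0,8% | 12,6 | 0,3% | 372,0 | 0,5%
Sub-rack links | Integrated circuit | 5 | 46,2 | 1,3% | 15,4 | 0,5% | 38,5 | 0,8% | 3228,0 | 4,1%
Sub-rack links | Oscillator | 1 | 10,4 | 0,3% | 22,7 | 0,7% | 30,0 | 0,7% | 783,0 | 1,0%
Front End switch | Capacitor | 46 | 73,4 | 2,1% | 17,0 | 0,5% | 99,4 | 2,2% | 830,1 | 1,1%
Front End switch | Connector | 4 | 18,0 | 0,5% | 28,4 | 0,9% | 2,0 | 0,0% | 424,1 | 0,5%
Front End switch | Inductor | 9 | 3,0 | 0,1% | 8,2 | 0,3% | 13,5 | 0,3% | 1,1 | 0,001%
Front End switch | Oscillator | 1 | 50,3 | 1,5% | 22,7 | 0,7% | 30,0 | 0,7% | 783,0 | 1,0%
Front End switch | Resistor | 72 | 20,0 | 0,6% | 1,9 | 0,1% | 10,8 | 0,2% | 314,0 | 0,4%
Front End switch | Transformer | 2 | 0,5 | 0,0% | 12,5 | 0,4% | 6,3 | 0,1% | 186,0 | 0,2%
Front End switch | Integrated circuit | 3 | 34,2 | 1,0% | 33,0 | 1,0% | 23,1 | 0,5% | 1038,0 | 1,3%
MMI | LED | 12 | 9,6 | 0,3% | 210,0 | 6,5% | 72,7 | 1,6% | 15,7 | 0,02%
MMI | Diode | 6 | 178,4 | 5,2% | 25,9 | 0,8% | 28,2 | 0,6% | 44,6 | 0,1%
MMI | Capacitor | 47 | 35,1 | 1,0% | 15,3 | 0,5% | 92,7 | 2,0% | 703,1 | 0,9%
MMI | Connector | 2 | 31,4 | 0,9% | 13,9 | 0,4% | 2,0 | 0,0% | 106,3 | 0,1%
MMI | Resistor | 65 | 11,6 | 0,3% | 1,8 | 0,1% | 21,9 | 0,5% | 283,0 | 0,4%
MMI | Integrated Circuit | 10 | 112,6 | 3,3% | 35,6 | 1,1% | 136,0 | 3,0% | 5978,5 | 7,6%
MMI | Oscillator | 6 | 62,3 | 1,8% | 88,6 | 2,8% | 135,0 | 3,0% | 4700,0 | 6,0%
MMI | Switch | 1 | 2,7 | 0,1% | 6,4 | 0,2% | 12,0 | 0,3% | 430,0 | 0,5%
CPU S | Capacitor | 137 | 169,6 | 4,9% | 51,7 | 1,6% | 301,1 | 6,6% | 2468,0 | 3,1%
CPU S | Inductor | 3 | 1,7 | 0,0% | 2,7 | 0,1% | 4,5 | 0,1% | 0,4 | 0,0005%
CPU S | Connector | 1 | 20,1 | 0,6% | 8,2 | 0,3% | 1,0 | 0,0% | 210,0 | 0,3%
CPU S | Resistor | 131 | 24,1 | 0,7% | 3,3 | 0,1% | 44,1 | 1,0% | 397,6 | 0,5%
CPU S | Integrated Circuit | 13 | 204,4 | 5,9% | 304,1 | 9,5% | 437,5 | 9,6% | 6321,0 | 8,0%
CPU S | Oscillator | 2 | 50,3 | 1,5% | 45,4 | 1,4% | 60,0 | 1,3% | 540,0 | 0,7%
IC_B | LED | 1 | 0,9 | 0,0% | 17,5 | 0,5% | 6,1 | 0,1% | 0,9 | 0,001%
IC_B | Capacitor | 67 | 97,3 | 2,8% | 25,4 | 0,8% | 148,0 | 3,3% | 1210,0 | 1,5%
IC_B | Connector | 1 | 20,1 | 0,6% | 8,2 | 0,3% | 1,0 | 0,0% | 210,0 | 0,3%
IC_B | Resistor | 22 | 4,1 | 0,1% | 0,6 | 0,0% | 7,4 | 0,2% | 95,9 | 0,1%
IC_B | Integrated Circuit | 6 | 263,9 | 7,7% | 162,0 | 5,0% | 97,5 | 2,1% | 1002,0 | 1,3%
CPU A | Capacitor | 129 | 152,0 | 4,4% | 48,9 | 1,5% | 285,0 | 6,3% | 2330,0 | 3,0%
CPU A | Inductor | 3 | 1,7 | 0,0% | 2,7 | 0,1% | 4,5 | 0,1% | 0,4 | 0,0005%
CPU A | Connector | 1 | 20,1 | 0,6% | 8,2 | 0,3% | 1,0 | 0,0% | 210,0 | 0,3%
CPU A | Resistor | 93 | 17,1 | 0,5% | 2,5 | 0,1% | 31,3 | 0,7% | 122,0 | 0,2%
CPU A | Integrated Circuit | 10 | 195,5 | 5,7% | 117,3 | 3,7% | 341,6 | 7,5% | 5240,8 | 6,7%
CPU A | Oscillator | 1 | 10,4 | 0,3% | 22,7 | 0,7% | 30,0 | 0,7% | 783,0 | 1,0%
CPU B | Resistor | 80 | 14,7 | 0,4% | 2,1 | 0,1% | 26,9 | 0,6% | 349,0 | 0,4%
CPU B | Integrated Circuit | 14 | 148,0 | 4,3% | 247,6 | 7,7% | 233,0 | 5,1% | 7446,6 | 9,5%
CPU B | Oscillator | 1 | 10,4 | 0,3% | 22,7 | 0,7% | 30,0 | 0,7% | 270,0 | 0,3%
CPU B | Capacitor | 168 | 236,9 | 6,9% | 61,8 | 1,9% | 371,0 | 8,2% | 3030,0 | 3,9%
CPU B | Connector | 1 | 20,1 | 0,6% | 8,2 | 0,3% | 1,0 | 0,0% | 210,0 | 0,3%
CPU B | Inductor | 2 | 1,1 | 0,0% | 1,8 | 0,1% | 3,0 | 0,1% | 0,3 | 0,0003%
Back end links | Capacitor | 29 | 41,2 | 1,2% | 10,2 | 0,3% | 60,7 | 1,3% | 523,1 | 0,7%
Back end links | Inductor | 3 | 1,7 | 0,0% | 2,5 | 0,1% | 4,5 | 0,1% | 0,4 | 0,0005%
Back end links | Resistor | 71 | 13,1 | 0,4% | 2,8 | 0,1% | 23,9 | 0,5% | 310,0 | 0,4%
Back end links | Transformer | 3 | 0,8 | 0,0% | 2,7 | 0,1% | 9,5 | 0,2% | 279,0 | 0,4%
Back end links | Integrated circuit | 4 | 36,9 | 1,1% | 18,9 | 0,6% | 30,8 | 0,7% | 945,0 | 1,2%
Back end links | Oscillator | 1 | 50,3 | 1,5% | 4,5 | 0,1% | 30,0 | 0,7% | 783,0 | 1,0%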
Figure 4-5: Functional level prediction of Board_B
Figure 4-5 and Figure 4-6 display the functional level prediction and the component level
prediction of Board_B respectively.
In IEC 62380, it is not possible to model integrated circuits with BGA or QFN packages of 0.8
mm pitch. However, integrated circuits with these package types are present in certain blocks of
Board_B. These integrated circuits are thus mapped in IEC 62380 to the best case possible. The
failure rates for these specific integrated circuits have been marked in light orange in Figure 4-5. The
same colour code is used in Figure 4-6 for the accumulated failure rate of the integrated circuits
modelled in IEC 62380.
In FIDES, the rated voltage is assumed to be 10 V for all the capacitors that have been modelled.
The accumulated failure rate of the capacitors used in the Board_B can be seen in
Figure 4-6.
Figure 4-6: Component level prediction of Board_B
According to all the standards, integrated circuits contribute a large share of the
failure rate of Board_B. FIDES and Siemens mark capacitors as big contributors as well.
4.3 Theoretical Prediction Analysis
Amongst all the selected standards, FIDES uses a prediction model that is based on physics of
failure, whereas the other standards use prediction models that are empirical and are based on
statistical interpretation of test data analysis and previous models. Physics of failure is a technique
to predict reliability and improve product performance. The technique utilizes the knowledge and
understanding of the processes and mechanisms that induce failure. In contrast to the other
reliability standards, which build their models by empirical modelling of operational feedback,
FIDES builds its models on physics of failure, later supported by test data analysis, operational
feedback and existing modelling. Once the models have been created, they are
calibrated from operational feedback [6].
Data for Figure 4-5 (functional level prediction of Board_B), block Back end connectors and totals; FR = failure rate, Ctrbn = contribution:
Blocks | Component | Quantities | FIDES FR [FITs] | FIDES Ctrbn(%) | IEC 62380 FR [FITs] | IEC 62380 Ctrbn(%) | SN 29500 FR [FITs] | SN 29500 Ctrbn(%) | MIL-HDBK-217F2 FR [FITs] | MIL-HDBK-217F2 Ctrbn(%)
Back end connectors | Optocoupler | 2 | 37,4 | 1,1% | 30,7 | 1,0% | 100,0 | 2,2% | 30,9 | 0,04%
Back end connectors | Capacitor | 15 | 18,8 | 0,5% | 5,7 | 0,2% | 33,1 | 0,7% | 81,2 | 0,1%
Back end connectors | Connector | 12 | 251,0 | 7,3% | 169,0 | 5,3% | 36,0 | 0,8% | 2520,0 | 3,2%
Back end connectors | Transistor | 2 | 0,3 | 0,0% | 1,2 | 0,0% | 12,2 | 0,3% | 122,0 | 0,2%
Back end connectors | Resistor | 51 | 9,3 | 0,3% | 1,8 | 0,1% | 17,2 | 0,4% | 222,0 | 0,3%
Back end connectors | Integrated circuit | 14 | 128,9 | 3,7% | 57,2 | 1,8% | 108,0 | 2,4% | 4670,0 | 5,9%
Sum | | 1774 | 3440 | 100% | 3212 | 93% | 4550 | 100% | 78594 | 100%
MTBF (Years): FIDES 33,2; IEC 62380 35,5; SN 29500 25,1; MIL-HDBK-217F2 1,5
Data for Figure 4-6 (component level prediction of Board_B):
Components | Quantity | FIDES FR [FITs] | FIDES Ctrbn(%) | IEC 62380 FR [FITs] | IEC 62380 Ctrbn(%) | SN 29500 FR [FITs] | SN 29500 Ctrbn(%) | MIL-HDBK-217F2 FR [FITs] | MIL-HDBK-217F2 Ctrbn(%)
Capacitor | 754 | 982,00 | 28,546% | 278,40 | 8,668% | 1699,53 | 37,35% | 13195,70 | 16,79%
Connector | 25 | 413,15 | 12,010% | 250,07 | 7,786% | 46,40 | 1,02% | 4520,42 | 5,75%
Diode | 14 | 193,61 | 5,628% | 38,62 | 1,202% | 164,15 | 3,61% | 85,58 | 0,11%
Fuse | 1 | 1,05 | 0,031% | 10,00 | 0,311% | 25,00 | 0,55% | 10,00 | 0,01%
Inductor | 40 | 13,28 | 0,386% | 41,62 | 1,296% | 60,00 | 1,32% | 5,08 | 0,01%
Integrated Circuits | 99 | 1364,07 | 39,652% | 1022,63 | 31,839% | 1599,96 | 35,17% | 47632,90 | 60,61%
LED | 13 | 10,57 | 0,307% | 227,50 | 7,083% | 78,76 | 1,73% | 16,59 | 0,02%
Optocoupler | 2 | 37,40 | 1,087% | 30,70 | 0,956% | 100,00 | 2,20% | 30,90 | 0,04%
Oscillator | 13 | 244,40 | 7,104% | 229,33 | 7,140% | 345,00 | 7,58% | 8642,00 | 11,00%
PCB | 1 | 6,10 | 0,177% | 1010,00 | 31,446% | 111,00 | 2,44% | No Model | 0,00%
Resistor | 798 | 169,01 | 4,913% | 22,69 | 0,706% | 255,16 | 5,61% | 3021,50 | 3,84%
Switch | 1 | 2,69 | 0,078% | 6,43 | 0,200% | 12,00 | 0,26% | 430,00 | 0,55%
Transformer | 9 | 2,26 | 0,066% | 40,23 | 1,253% | 28,37 | 0,62% | 837,00 | 1,06%
Transistor | 4 | 0,51 | 0,015% | 3,61 | 0,112% | 24,40 | 0,54% | 166,73 | 0,21%
Sum | 1774 | 3440 | 100% | 3212 | 100% | 4550 | 100% | 78594 | 100%
MTBF (Years): FIDES 33,2; IEC 62380 35,5; SN 29500 25,1; MIL-HDBK-217F2 1,5
MIL-HDBK-217 is in general conservative because the components are non-MIL-spec. In
addition, the predictions of both the products show that the standard is
much more conservative for specific components than the other three standards. For selected
component types of both the boards, validated statements are provided below outlining the
differences between the standards.
1. Board_B
a. Integrated circuits
i. The output of MIL-HDBK-217F for integrated circuits in Board_B yields
a predicted failure rate which is approximately 35x, 47x and 30x
greater than the FR of FIDES, IEC 62380 and SN 29500 respectively.
ii. FR according to Siemens is 1.6x greater than IEC and 1.2x greater than
FIDES.
b. Capacitors
i. Output from MIL standard is 7.9x, 47.5x, 13x greater than SN, IEC and
FIDES respectively.
ii. FR according to Siemens is 6.1x greater than IEC and 1.73x greater than
FIDES.
c. PCB
i. FR according to IEC is 166x greater than FIDES and 9.1x greater than
Siemens.
d. Connectors
i. FR provided by FIDES is 1.65x, 8.9x greater than IEC and Siemens
respectively.
ii. Output from MIL standard is 97.4x, 18x, 11x greater than SN, IEC and
FIDES respectively.
Similarly, it can be seen that the FR of diodes is much higher for FIDES and SN than for IEC and
the MIL standard. The MIL standard gives very conservative values for oscillators, while Siemens SN
29500, FIDES and IEC give values that are similar to each other.
2. Board_A
a. Voltage converter
i. FR provided by FIDES is 1.88x, 17.4x greater than IEC and Siemens
respectively.
b. Resistors
i. Output from MIL standard is 13x, 164.5x, 23.6x greater than SN, IEC
and FIDES respectively.
ii. FR provided by SN is 12.7x, 1.8x greater than IEC and FIDES
respectively.
c. Integrated circuits
i. FR according to Siemens is 1.1x greater than IEC and 1.1x greater than
FIDES respectively.
ii. FR output from MIL is 35.8x, 39.4x and 39.3x greater than SN, IEC and
FIDES respectively.
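The ratios quoted in the statements above can be recomputed directly from the component level failure rates. The sketch below does this for four component types of Board_B, using the values of the component level prediction. The dictionary layout and the helper name are illustrative choices, not part of any tool used in the thesis.

```python
# Component level failure rates of Board_B in FITs (from the component level results).
# MIL-HDBK-217F2 has no PCB model, so that entry is omitted.
board_b = {
    "Integrated circuits": {"FIDES": 1364.07, "IEC": 1022.63, "SN": 1599.96, "MIL": 47632.90},
    "Capacitors": {"FIDES": 982.00, "IEC": 278.40, "SN": 1699.53, "MIL": 13195.70},
    "PCB": {"FIDES": 6.10, "IEC": 1010.00, "SN": 111.00},
    "Connectors": {"FIDES": 413.15, "IEC": 250.07, "SN": 46.40, "MIL": 4520.42},
}


def ratio(component, numerator, denominator, data=board_b):
    """How many times larger one standard's failure rate is than another's."""
    rates = data[component]
    return rates[numerator] / rates[denominator]


for component, rates in board_b.items():
    standards = list(rates)
    for a in standards:
        for b in standards:
            if a != b and rates[a] > rates[b]:
                print(f"{component}: {a} is {ratio(component, a, b):.1f}x {b}")
```

Listing every pairwise ratio in this way makes it easy to check the individual statements and to extend the comparison to further component types.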
The above statements clearly outline the differences between specific component types across
the various standards. However, these statements can be given more confidence once they are
compared with field data (for the Board_A), whose MTBF will allow a much better understanding
of which standard is closest to the real scenario.
Both Siemens SN 29500 and MIL-HDBK-217F are constrained by their age: they have not followed
technological advances and their models do not take modern components into consideration. Thus
the output from these standards should be trusted less than that of the other two, comparatively
new, standards. The reliability results of both these standards would likely have been worse if
the failures related to the PCB could have been modelled. The limitations in the number of
transistors and gates for ICs have also affected the prediction in these two standards.
In IEC 62380, the major contributors to the failure of both the boards are the PCB and the
integrated circuits. It is fair to say that IEC 62380 allocates approximately one third of the
failure rate contribution each to the PCB, the integrated circuits and the rest of the
components. IEC 62380 also considers temperature cycling, and the mission profile allows for a
better simulation of the real scenario in which the boards operate. However, this standard has
its boundaries as well, with a limitation on the number of transistors and pins in ICs. This is
because the standard has not been updated since 2004.
FIDES allows users to mimic real conditions with a mission profile as well. Its list of
components is more advanced and is in line with the components used in both Board_A and
Board_B. One of the important differences between FIDES and the other standards is that it
includes the manufacturing process factors in the prediction model.
Inclusion of the manufacturing process factor raises the confidence level in the prediction.
In terms of usability, FIDES and IEC 62380 are more user-friendly than the other two
standards.
None of the standards considers the lead-free soldering process of the board in its prediction
models. FIDES has a section in its handbook dedicated to the lead-free soldering process, but
this was not implemented in the FIDES module of the ITEM QT tool at the time this
prediction was done.
5 Field Failure Data Analysis for Board_A
Board_A is the mature product for which field failure data collection, elaboration and analysis is
performed. A substantial amount of field data is available for the product. The Board_A is a
communication board which is part of the Communication Controller Unit (CCU). The functionality
of the board is to receive telegrams from the Central Interlocking System (CIS) and pass them on to
the object controller boards via the OC-Link. The board is manufactured by using the SnPb-
soldering process. The board is kept in the cabinet of the object controller system where
temperature ranges from 50°C to 70°C and one temperature cycle takes approximately 24 hours.
5.1 Board_A version distinction
The very first version of Board_A was manufactured in 2009 and was labelled as Version 2.2. The
board was updated with new version releases up until the year of 2014. The version distinction
along those years including the changed components and the motivation behind those changes can
be found in Table 5-1.
Table 5-1: Board_A version history
Version | Manufactured year | Component added | Component removed | Motivation behind change
2.2 | 2009 | - | - | Hardware fix of “reset/handover”
2.3 | 2009 | - | - | IC_A_Service & OS update
2.4 | 2010 | - | - | J707 VGA connector alignment, label update
2.5 | 2009/2010/2011 | - | - | OS update
2.6 | 2011/2012 | - | - | Layout update, removed copper between pads, front panel screw fix
2.7 | 2012/2013/2014 | - | U106: Power module | 1 of 2 power modules removed to save parts in store due to pending obsolescence
2.9 | 2014 | U739: Switched power regulator | U104: Power module | The second power module removed due to obsolescence. New design with updated layout. Board depth including front panel reduced by 0.8 mm
5.2 Data Sources and required Inputs
Internal data sources within RCS, BT are used to gather the data required for the field failure data
analysis.
The required inputs needed to perform the field failure data analysis are:
Population of units per project
Commissioning date of the projects
Date of warranty expiration
Number of failures of the Board_A in each project
Operating hours of Board_A in different projects
Version of the board in different projects
Usage Rate (%)
The following attributes are used for RAM elaborations:
Customer name – Incident reports are related to specific projects. The name of the project is indicated in the “Customer name” attribute.
Case-ID – Each incident is attached with an ID, more commonly referred to as the Case ID. The Case ID is unique for each incident report.
Product Name – Indicates the name of the product according to BT RCS convention.
Product Number – Combination of characters and numerics, unique for each product.
Serial Number – A specific product can have many identical individuals. The serial number aids in differentiating between these identical products.
Revision – Version of the product.
Date of Error – Displays the date when the incident occurred in the field.
Answer – Indicates the mitigation/solution applied on the incident report.
Status – The life cycle of an incident report can be classified into 9 different statuses. For RAM elaborations, only incident reports that are classified as “Closed” are taken into consideration.
Fault Type 1 – Indicates the cause of the incident report.
5.3 Elaboration of field failure data
For the elaboration of field failure data, only incident reports whose status is reported as “Closed”
are taken into account for the analysis. In addition, for the purpose of confidentiality, the project
names that are used in this Master Thesis have been replaced by randomly chosen names.
One assumption made during the field failure data analysis is that the following fault
types are not considered:
No fault found
Upgrade
Handling
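The selection rule described above can be sketched as a simple filter. The record layout, the field names and the sample reports below are hypothetical and only serve to illustrate the rule; they do not reflect the structure of the actual incident database.

```python
from dataclasses import dataclass

# Fault types that are not considered in the analysis (from the list above).
EXCLUDED_FAULT_TYPES = {"No fault found", "Upgrade", "Handling"}


@dataclass
class IncidentReport:
    case_id: str      # hypothetical field names
    status: str
    fault_type: str


def select_for_analysis(reports):
    """Keep closed incident reports whose fault type is not excluded."""
    return [
        r for r in reports
        if r.status == "Closed" and r.fault_type not in EXCLUDED_FAULT_TYPES
    ]


# Invented sample data for illustration only.
reports = [
    IncidentReport("C-001", "Closed", "Failing components"),
    IncidentReport("C-002", "Closed", "Upgrade"),
    IncidentReport("C-003", "Open", "Damaged"),
    IncidentReport("C-004", "Closed", "No fault found"),
]

selected = select_for_analysis(reports)
print([r.case_id for r in selected])
```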
The observation period runs from the commissioning date of the project until the end of the
warranty period. If the end of the warranty period is not available or lies in the future,
2016-06-21 is considered as the end of the observation period. The timeline for the observation
period of the data for analysis can be seen in Figure 5-1.
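The rule for the end of the observation period can be expressed as a small helper. This is a sketch under stated assumptions: "in the future" is interpreted as "later than 2016-06-21", and the function names are hypothetical. How the calendar hours are scaled by the number of units and the usage rate is not specified here and is therefore left out.

```python
from datetime import date
from typing import Optional

CUTOFF = date(2016, 6, 21)  # end of observation when no usable warranty date exists


def observation_end(warranty_end: Optional[date], cutoff: date = CUTOFF) -> date:
    """Return the warranty end date, or the cutoff if it is missing or later than the cutoff."""
    if warranty_end is None or warranty_end > cutoff:
        return cutoff
    return warranty_end


def observation_hours(commissioning: date, warranty_end: Optional[date]) -> int:
    """Calendar hours between commissioning and the end of the observation period."""
    return (observation_end(warranty_end) - commissioning).days * 24
```

For example, a project commissioned on 2014-01-01 with a warranty running until 2018 would be observed only up to the cutoff date.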
Figure 5-1: Timeline of Observation
There have been 182 incident reports reported in [18] for Board_A. Amongst these, 21 reports
could be traced back to hardware-related failures. The fault types that have been considered for
this analysis are failing components, supplier, damaged, and wear & tear. For incident reports which
do not contain any fault type, the repair report and the failure code are examined in more detail to
find out whether there were any failures due to hardware.
There have been 38 incident reports where the project upgraded the board from version 2.4 to
2.5 which is basically an OS update mentioned in Table 5-1.
The data that has been collected for the field failure data analysis contains a total of 2571 boards,
including 2555 units of Board_A and 16 units of Board_AE. The stackup of hours for the Board_A
and Board_AE across the different projects which have been used for field failure data analysis is
given in Figure 5-2. Since Board_AE was manufactured very recently, it is yet to be installed in
different projects. We could retrieve information on one project in Kaxholmen where 16 units of
Board_AE v1.2 are installed; these boards have been in operation for much less time than the Board_A.
The equation used to find the MTBF of the boards is as follows:
\[ \mathrm{MTBF} = \frac{\text{Operating hours}}{\text{HW failures} + 0.69} \]
Figure 5-2: Stackup of operating hours
Figure 5-3 shows the analysis that has been performed for the boards based on the field failure
data. For clear visualisation, the MTBF of the respective versions of the boards is given in Figure 5-4.
According to the field data for the Board_A, version 2.4 has the highest MTBF followed by version
2.9, whereas Board_A version 2.5 has the lowest MTBF overall. Version 2.5 of the Board_A
experienced the most hardware failures with a tally of 16; in the next version of the board,
this has been reduced to just 1 failure.
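The field MTBF values can be reproduced from the operating hours and hardware failure counts with the MTBF equation given above. The sketch below is illustrative: the function and variable names are hypothetical, and the conversion to years assumes 8760 hours per year, which is consistent with the reported figures. The additive constant 0.69 keeps the estimate finite for versions without any recorded failure.

```python
HOURS_PER_YEAR = 8760  # assumption: 365 days x 24 hours


def field_mtbf_hours(operating_hours, hw_failures):
    """MTBF estimate: operating hours divided by (hardware failures + 0.69)."""
    return operating_hours / (hw_failures + 0.69)


# (total operating hours, hardware failures) per board version, from the field data.
field_data = {
    "Board_A v2.4": (25_722_120, 2),
    "Board_A v2.5": (20_668_968, 16),
    "Board_A v2.6": (4_125_360, 1),
    "Board_A v2.7": (8_582_904, 4),
    "Board_A v2.9": (2_548_488, 0),
    "Board_AE v1.2": (377_088, 0),
}

for version, (hours, failures) in field_data.items():
    mtbf_h = field_mtbf_hours(hours, failures)
    print(f"{version:14s} {mtbf_h:12,.0f} h  {mtbf_h / HOURS_PER_YEAR:7.0f} years")
```

Because the estimate for a failure-free version depends only on its accumulated hours, versions with few operating hours, such as Board_AE v1.2, should be interpreted with care.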
Figure 5-3: Field failure data analysis of Board_A and Board_AE
[Figure 5-2: bar chart of total hours of operation per board version. Board_A v2.4: 25 722 120; Board_A v2.5: 20 668 968; Board_A v2.6: 4 125 360; Board_A v2.7: 8 582 904; Board_A v2.9: 2 548 488; Board_AE v1.2: 377 088]
Data for Figure 5-3:
Board version | Total number of boards | Total operating hours | Total failures | MTBF (hours) | MTBF (years)
Board_A v2.4 | 682 | 25 722 120 | 2 | 9 562 126 | 1 092
Board_A v2.5 | 1020 | 20 668 968 | 16 | 1 238 404 | 141
Board_A v2.6 | 189 | 4 125 360 | 1 | 2 441 041 | 279
Board_A v2.7 | 479 | 8 582 904 | 4 | 1 830 043 | 209
Board_A v2.9 | 185 | 2 548 488 | 0 | 3 693 461 | 422
Board_AE v1.2 | 16 | 377 088 | 0 | 546 504 | 62
Figure 5-4: MTBF of different versions of Board_A extracted from field failure data analysis
It may seem counterintuitive to have developed the board further by introducing new versions when
the MTBF of Board_A v2.4 was very high. However, the high value is due to the fact that Board_A v2.4
accumulated more operating hours than the other versions and had fewer failures. Also, the idea of
having a newer version of a product is to increase the functionality and efficiency of the product and
to solve issues such as component obsolescence.
For each board that is reported, a failure code, a failure cause and a failure detection phase are
stated. The definition of each of these codes can be found in [19].
The incident reports of the “Haitai” and the “Sainan” projects did not have a failure code, failure
cause and failure detection phase assigned. Hence, they are assumed based on the fault description
from the failure reports. Since these codes are assumed, they have been represented in italics.
The failure detection phase for all the records is 5041, which according to [19] stands for
“Customer – Commercial Operation”. This phase is under the parent code 5000, which is defined
as “Warranty”. It follows that all the failures recorded here have been detected by the
customer while the boards were in operation. The failure code and the failure cause are more
diverse than the failure detection phase.
Three specific failure causes and one parent failure cause have been recorded for the projects that were analysed. They are as follows:
3006 – Engineering
4000 – External Causer (Parent failure cause)
4036 – Customer
4041 – Consortium partner
The failure codes recorded are:
6026 – Electrical shortage
7211 – Part Broken
8400 – Functional failure (Parent failure code)
8406 – Electric component (unit) does not work
Figure 5-5: Failure statistics for Board_A
Figure 5-5 includes the failure statistics for Board_A. The figure contains the failure code, the failure cause and the failure detection phase for the Board_A failures that occurred in different projects. Table 5-2 lists all the component failure records extracted after the FFDA was performed. All the components that failed are present in all the versions of Board_A.
One interesting observation from Figure 5-5 is that the failures of the J702 and D709 components are concentrated on specific projects. Six of the seven field failures caused by the J702 component occurred in two projects in the same country: three in the Randsburg project and three in the Kaxholmen project. All six failures occurred on version 2.5 of Board_A. All the failures in the Kaxholmen project occurred in 2014, while one of the failures in the Randsburg project occurred in 2014 and the rest in 2016. Both D709 failures occurred in the Randsburg project in 2016, also on version 2.5 of Board_A.
According to the prediction, the connectors contribute very little to the total failure rate of the system. Hence, the occurrence of so many connector failures, concentrated on specific projects, calls for an investigation of those projects. The failures may be caused by the environment, the installation or other external factors. In this scenario, investigating these two projects could give a good return on investment (ROI): if a specific reason for the concentrated failures is identified and action is taken to mitigate them, the field performance of Board_A v2.5 would rise by a considerable margin.
Had the concentrated failures in the Randsburg and Kaxholmen projects not occurred, the MTBF of Board_A version 2.5 would be 271.5 years instead of 141.4 years. This is far more plausible when compared with version 2.6, which has an MTBF of 278.7 years.
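The adjusted value can be reproduced from the MTBF equation of this chapter. The worked step below assumes that the eight concentrated failures (six J702 and two D709 failures) are removed from the 16 hardware failures of version 2.5, and that one year corresponds to 8760 operating hours; both assumptions are inferred from the numbers in the text rather than stated explicitly:
\[ MTBF_{v2.5,\,adjusted} = \frac{20\,668\,968}{(16 - 8) + 0.69} \approx 2\,378\,477 \text{ hours} \approx 271.5 \text{ years} \]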
Projects | Incidents | Failure Code | Count | Failure Causer | Count | Failure Detection phase | Count | D717 | D709 | J702 | F100 | Scrap due to obsolete component/unknown component | Version
Haitai 9 8400 9 4000 9 5041 9 - - - - 9 2,5
8406 2 4000 2 1 1 1 - -
7211 1 3006 1 - 1 - -
8406 1 4041 1 - 1 1 - -
Kaxholmen 3 7211 3 3006 3 5041 3 - - 3 - - 2,5
8406 1 4036 1 - - - - 1 2,4
8406 1 4036 1 - - 1 - - 2,6
Brownell 1 8406 1 4036 1 5041 1 1 - - - - 2,7
Aptos Hills 1 7211 1 3006 1 5041 1 - - - - 1 2,7
Babylon 1 8400 1 3006 1 5041 1 - - - - 1 2,7
Oketo 1 6026 1 4036 1 5041 1 - - - - 1 2,7
Sainan 1 8400 1 4000 1 5041 1 - - - 1 - 2,4
1 4000 1 - - - - - 2,7
1 4036 1 - - - - - 2,5
Haitai 1 9999 1 4000 1 5041 1 - - - - - 2,5
Brownell 2 9999 2 4036 2 5041 2 - - - - - 2,7
Westernport 3 9999 3 4036 3 5041 3 - - - - - 2,6
Willow Island 2 9999 2 4036 2 5041 2 - - - - - 2,7
Brogan 1 9999 1 4036 1 5041 1 - - - - - 2,9
Tuscola 1 9999 1 4036 1 5041 1 - - - - - 2,7
1 9999 - - - - -
1 8406 - - - - -
Sum 37 2 2 7 1 13
Wakefield 2,92 24036 50412
Hingham 2 5041 2
Randsburg 9999 2 5041 2
Randsburg 4 5041 4 2,5
Table 5-2: List of failed components
5.4 Solder Fatigue analysis for Board_A
In Sherlock, solder fatigue analysis is performed for Board_A to mimic field conditions with a ΔT of 20°C, ranging from 50°C to 70°C. Each cycle takes 24 hours to complete, and the cycling continues throughout the service life of Board_A, which is defined as 30 years.
From the analysis, the solder fatigue life prediction for the components that have failed in the field can be viewed. Figure 5-6, Figure 5-7 and Figure 5-8 show the life prediction for the transceiver, the LED and the connector respectively.
The simulated results show that, for all three components that have failed in the field, the probability of failure due to solder joint fatigue is negligible.
Figure 5-6: Solder Fatigue life prediction for Transceiver (D709)
The LED in Figure 5-7 is omitted from the theoretical prediction across all four standards.
However, the component has been included in the FFDA.
Component Description Failures
J702 Connector 7
D717 LED 2
D709 Transceiver 2
F100 Fuse 1
Scrapped Board/Unknown - 13
Figure 5-7: Solder Fatigue life prediction for LED (D717)
Figure 5-8: Solder Fatigue life prediction for connector (J702)
5.5 Conclusion
There have been many incident reports for Board_A; however, in many of them the failure code, the failure cause and the failure detection phase are missing. In the recent projects, their inclusion has made the incident reports easier to understand.
Several hardware failures could not be included in the analysis because of the limited observation period. One example of a limited observation period is a missing warranty expiry date. This can be improved if the person responsible for recording the incident report keeps track of it.
In comparison to the theoretical prediction in Figure 4-1, the MTBF of all the versions of Board_A is much higher in reality according to the field failure data analysis. Since the prediction was done for version 2.5 of Board_A, a bar chart showing how the MTBF obtained from the field failure data analysis differs from the theoretical prediction is given in Figure 5-9.
Figure 5-9: Comparison of reliability data for Board_Av2.5
The MTBF from the field failure data analysis is almost twice the value predicted by FIDES and about 1.5 times the values predicted by IEC 62380 and Siemens SN 29500. MIL-HDBK-217F2 predicted an MTBF that is approximately 26 times lower than the field value. This demonstrates the extremely conservative approach of its prediction, which gives values that are unrealistic and can be misleading.
Given that the field reliability of the board is higher than, but not far from, the values predicted by FIDES, IEC 62380 and Siemens SN 29500, the confidence in using these standards increases.
Even though both FIDES and IEC 62380 predicted a high failure rate for the voltage converters used in version 2.5, no converter failures were observed in the field. The MTBF of the converters according to the field failure data is thus 3419 years, which is consistent with applying the MTBF equation to the 20 668 968 operating hours of version 2.5 with zero failures.
It can also be concluded from the Sherlock simulation that the thermal cycling the board undergoes in field conditions is too weak to damage the solder joints of the components that have failed.
This conclusion not only stands true for the transceiver, LED and the connector that have failed
in the field but also for the overall board including the rest of the components. This statement is
validated by the simulation result in Figure 5-10, where the probability of failure for the overall
board stands at only 0.8% at 30 years’ time.
The failures have most likely been induced by external factors other than thermal cycling. These could include, but are not limited to, mechanical shock, vibration and thermal shock.
One important conclusion is that the LED and the switches should not be excluded from the theoretical prediction, since the FFDA shows that boards on which these components failed have been reported and sent for repair.
The designers do not consider these components in the reliability prediction because they do not affect board operability. However, once the customer detects a failure of one of these components in the field, the board is sent for repair regardless of the effect on board operability.
Finally, close observation of the field failure data shows that failures can at times be concentrated on specific projects, environments, etc. If analysis shows that mitigating these failures would increase the board performance substantially, those projects can be investigated to find the reasons for the failures and eliminate them. The penultimate chapter of this dissertation describes how this process can be integrated into a global model to predict RCS product reliability.
Figure 5-10: Overall Solder Joint fatigue life prediction
6 Lab Testing and Reliability testing simulation on BT Products
Board_B is one of the very first boards within Bombardier to have been manufactured using the
lead-free soldering process.
Lead-free solder has a higher Young’s modulus than lead-based solder which makes lead-free
solders stiffer. According to [20], “for lead-free SAC solder, the structure exhibits grain formation
due to recrystallization which results in finer grains that separate at grain boundaries resulting in
crack growth”.
The second-level interconnection is the interconnect between the PCB and the package. Since this second-level interconnection is now made with lead-free solder paste, the reliability of the board is affected.
New compound packages like 0.8 mm pitch BGA and QFN with smaller solder joints
are connected to the PCB. The combination of 0.8 mm pitch BGA and QFN with SAC
305 solder will reduce the reliability.
The idea behind performing the temperature cycling test for Board_B has been to find out
general failures on the board due to variation in temperature and failures occurring due to solder
joint fatigue on the components with BGA and QFN packages and other SMD components.
BGA packages are surface mount packages that are connected to the printed board via solder
balls.
Accelerated life testing (ALT) is used to reduce the time required for testing when the product is very reliable. During ALT, the product under test is subjected to environmental conditions that are much more severe than those it will encounter after installation. This makes it possible to evaluate the useful life of the product and of the electronic components and connections within it. The results can then be used to identify problems and correct them. The data from the ALT is also used to predict life under typical field conditions.
There are many forms of accelerated life testing in which the applied stresses accelerate the failure process. The stresses can be applied as high or low temperatures, humidity, temperature cycling, vibration, electrical stress, etc.
Since the duration of the Master's thesis is 5 months, this report is based on observations ending on 2016-07-14. The test started on 2016-03-11, making the observation period 125 days, which is equivalent to 4.17 months.
6.1 Accelerated Life Testing
6.1.1 Experimental Setup
The thermal cycling is performed inside the thermal chamber Vötsch, VT 3050. The chamber has a
test space volume of 500 liters with a temperature range from -30°C to +100°C. The temperature
rate of change while heating is 2.0 K/min and while cooling is 1.4 K/min. External dimensions of
the chamber are 1955x1030x940 mm and test space dimensions are 1250x590x710 mm. The
thermal chamber can be seen in Figure 6-1.
Figure 6-1: Thermal chamber
The boards are placed inside the subrack, which in turn is placed inside the test space of the thermal chamber. The boards are powered on and run an internal test program to mimic the operational state. The total population for the accelerated life test is 8 boards. Four sandwiches are formed from these 8 boards, with a pair of boards forming one sandwich. The boards and the subrack inside the thermal chamber can be seen in Figure 6-2.
Figure 6-2: Powered on boards inside the subrack
Sandwiches 2, 4 and 5 are coated with conformal coating while Sandwich 3 is left uncoated. Conformal coating is a thin polymeric film that is applied to the printed circuit board to provide protection against moisture, dust, chemicals, etc.
6.1.2 Input conditions and duration of the thermal cycling
To ensure that the boards undergoing the temperature cycling test are placed in similar conditions
to that of the real scenario, the condition inside the thermal chamber is adjusted to mimic real
conditions. The settings of the thermal chamber are as follows:
Temperature range programmed: -12°C ≤ θ ≤ +82°C
Temperature range logged within the PCB: -2°C ≤ θ ≤ +72°C
Temperature Cycle: fluctuates between 145 minutes to 180 minutes for one period
Dwell time (Minimum Temperature): 25 minutes
Dwell time (Maximum Temperature): 15 minutes
Ramp up: 50 minutes
Ramp down: 90 minutes
In real conditions, the temperature cycles between 50°C and 70°C with a 24-hour cycle. Hence, over the service life of 30 years, the board goes through a total of 10950 cycles (30 years × 365 cycles per year). For the accelerated life test, the temperature ranges from 0°C to 70°C, with one cycle taking roughly 145 to 180 minutes to complete. Hence, while the product undergoes one cycle each day in the field, the product under test (PUT) undergoes 8 to 10 cycles in the same duration (24 hours × 60 minutes / 180 minutes = 8 cycles for the longest cycle time).
One important factor to consider when performing an accelerated life test is how long it should run for the product under test to have experienced an amount of stress and a number of cycles equivalent to the real conditions.
The Norris-Landzberg model (a modified Coffin-Manson model) is used to model the acceleration of the fatigue mechanism due to temperature variation. The equation uses different fatigue coefficients for different types of solder material. For SnPb solder, the fatigue coefficient m is assigned the constant value of 1.9, whereas for lead-free (SAC 305) solder, m is assigned the constant value of 2.65 [21]. However, none of the equations include any impact of conformal coating on the second-level interconnect. The Norris-Landzberg model used to obtain the acceleration factor is given below:
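\[ AF = \left(\frac{\Delta T_t}{\Delta T_o}\right)^{m} \left(\frac{f_o}{f_t}\right)^{0.136} \exp\left\{\frac{E_a}{k}\left(\frac{1}{T_{max,o}} - \frac{1}{T_{max,t}}\right)\right\} \quad \text{(SAC 305)} \]
\[ AF = \left(\frac{\Delta T_t}{\Delta T_o}\right)^{m} \left(\frac{f_o}{f_t}\right)^{1/3} \exp\left\{\frac{E_a}{k}\left(\frac{1}{T_{max,o}} - \frac{1}{T_{max,t}}\right)\right\} \quad \text{(SnPb)} \]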
AF = acceleration factor
ΔT = temperature swing of one cycle (difference between the maximum and minimum temperature)
o = operating/field condition
t = test condition
T_{max,o} = maximum temperature in operating/field condition
T_{max,t} = maximum temperature in test condition
f = cycling frequency (number of cycles per unit time)
m = solder fatigue coefficient
k = Boltzmann constant
E_a = activation energy
The reliability prediction standard FIDES uses the Norris-Landzberg model as well to model the
acceleration on the solder joint fatigue mechanism due to temperature variations.
In the present case, the maximum temperature in the field condition and in the test condition is the same, so the exponential term equals 1. In addition, the acceleration in the number of cycles per day is already known, so the frequency term is left out. Hence, the equations reduce to:
\[ AF = \left(\frac{\Delta T_t}{\Delta T_o}\right)^{m} \]
\[ AF = \left(\frac{70}{20}\right)^{1.9} = 10.807 \quad \text{(SnPb)} \]
\[ AF = \left(\frac{70}{20}\right)^{2.65} = 27.655 \quad \text{(SAC 305)} \]
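The acceleration factors can be checked numerically. The following Python sketch implements the Norris-Landzberg expression with its three terms and the reduced form used here. The function names are illustrative, and the activation energy passed in the check is a placeholder, since no value for E_a is given in this chapter; with equal maximum temperatures it has no influence on the result. Setting the field and test frequencies equal mimics the omission of the frequency term.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant k in eV/K


def norris_landzberg_af(dt_test, dt_field, m, f_field, f_test,
                        freq_exponent, ea_ev, tmax_field_k, tmax_test_k):
    """Norris-Landzberg acceleration factor (temperatures in kelvin)."""
    swing_term = (dt_test / dt_field) ** m
    frequency_term = (f_field / f_test) ** freq_exponent
    thermal_term = math.exp(
        ea_ev / BOLTZMANN_EV_PER_K * (1.0 / tmax_field_k - 1.0 / tmax_test_k)
    )
    return swing_term * frequency_term * thermal_term


def reduced_af(dt_test, dt_field, m):
    """Reduced form: equal maximum temperatures, frequency term left out."""
    return (dt_test / dt_field) ** m


# Test swing 70 K (0 to 70 degC), field swing 20 K (50 to 70 degC)
AF_SNPB = reduced_af(70.0, 20.0, 1.9)
AF_SAC305 = reduced_af(70.0, 20.0, 2.65)

# Consistency check with the full expression. ea_ev=0.1 is a placeholder
# (hypothetical); it cancels because both maximum temperatures are 70 degC.
AF_FULL_CHECK = norris_landzberg_af(
    70.0, 20.0, 2.65, f_field=1.0, f_test=1.0, freq_exponent=0.136,
    ea_ev=0.1, tmax_field_k=343.15, tmax_test_k=343.15,
)

if __name__ == "__main__":
    print(f"AF (SnPb)    = {AF_SNPB:.3f}")
    print(f"AF (SAC 305) = {AF_SAC305:.3f}")
```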
Based on the acceleration factor for each solder material, the number of thermal cycles that need to be carried out can be found using the following formula:
\[ Cycles_t = \frac{Cycles_o}{AF} \]
Table 6-1 shows how the duration of the ALT is deduced from the above information for the different solder materials.
Table 6-1: Calculation for duration of ALT for different solder materials
Solder material                              SnPb                                              SAC 305
Cycles_t                                     10950 / 10.807 = 1013.2 cycles                    10950 / 27.655 = 395.95 cycles
Duration for cycles to complete (minutes)    1013.2 cycles × 180 minutes = 182376 minutes      395.95 cycles × 180 minutes = 71271 minutes
Duration for cycles to complete (days)       126.65 days ≈ 4.22 months                         49.5 days ≈ 1.65 months
From the calculations above it is possible to deduce that for the Board_B where components are
mounted using lead-free solders, 396 accelerated life cycles with ΔT=70°C is equivalent to
approximately 10950 cycles in field conditions. Hence, running the test for 49.5 days in the thermal
chamber is equivalent to 30 years of service life of the product in field conditions.
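The arithmetic behind Table 6-1 can be sketched in Python as follows. The helper name is illustrative; the inputs (10950 field cycles, the two acceleration factors and the 180-minute cycle time) are the values used in this section.

```python
MINUTES_PER_DAY = 24 * 60
FIELD_CYCLES = 30 * 365  # one cycle per day over a 30-year service life


def alt_duration(field_cycles, acceleration_factor, cycle_minutes):
    """Return (test cycles, total minutes, total days) for the ALT."""
    test_cycles = field_cycles / acceleration_factor
    total_minutes = test_cycles * cycle_minutes
    return test_cycles, total_minutes, total_minutes / MINUTES_PER_DAY


if __name__ == "__main__":
    for solder, af in (("SnPb", 10.807), ("SAC 305", 27.655)):
        cycles, minutes, days = alt_duration(FIELD_CYCLES, af, 180)
        print(f"{solder:8s} {cycles:8.1f} cycles {minutes:9.0f} min {days:7.2f} days")
```

Because the cycle count is not rounded before multiplying, the minutes may differ from the table by a few units; the durations in days agree to the precision shown.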
There are different ways the failures can be analysed if they occur. These methods include
optical and scanning electron microscopy (SEM), X-ray and coupled scanning acoustic microscopy
(CSAM), cross-section (transverse and parallel), and dye-and-pry (pressurized dye exposure of
assembled unit followed by mechanical package removal). In the case of any hardware related
failures, the test sample will be sent for micro-sectioning and finding out the failure cause.
If the accelerated life test completes without any failures, a minimum of three randomly selected boards from the overall test sample shall be sent for failure analysis. This is to ensure that no failures were missed because of the design of the test application, the hardware design or external factors.
All internal interfaces between processors on Board_B are supervised by test programs running on the processors. Figure 6-2 displays the Board_B test boards mounted in “sandwiches”, each consisting of two Board_B boards interlinked by a small backplane board.
The test application allows for one of the processors to send a packet/telegram to the same
processor on the opposite side of the sandwich. The other processor after receiving the packet sends
back an “ack” packet to the processor that initiated the message passing. This holds true for all the
processors. This is a way for the test application to check if all the interfaces under supervision are
functional or not. In case a packet goes missing and an interface stops working it can be identified
from the GUI.
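The telegram/ack supervision described above can be illustrated with a small simulation. This is only a sketch of the idea, not the actual test application: the class, the interface names and the rule of flagging an interface after three consecutive missed acks are all assumptions made for the example.

```python
class SupervisedInterface:
    """Hypothetical stand-in for one supervised processor-to-processor link."""

    def __init__(self, name, link_up=True):
        self.name = name
        self.link_up = link_up

    def send_telegram(self):
        """Send one telegram; return True if the peer replies with an 'ack'."""
        return self.link_up


def find_timed_out(interfaces, rounds=5, max_missed_acks=3):
    """Poll every interface each round and report those that stop answering.

    An interface is reported once it has missed `max_missed_acks`
    consecutive acks (assumed threshold, not taken from the thesis).
    """
    missed = {itf.name: 0 for itf in interfaces}
    timed_out = []
    for _ in range(rounds):
        for itf in interfaces:
            if itf.send_telegram():
                missed[itf.name] = 0
            else:
                missed[itf.name] += 1
                if missed[itf.name] == max_missed_acks:
                    timed_out.append(itf.name)
    return timed_out


if __name__ == "__main__":
    links = [
        SupervisedInterface("processor X, left <-> right"),
        SupervisedInterface("processor S, left <-> right", link_up=False),
    ]
    print(find_timed_out(links))
```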
Not all the interfaces are supervised by the test application. The interface around the power
module is manually supervised by observing the LED on the board which is green during operation.
If a component within the power module interface fails, this can be observed from the LED as it will
turn either orange or red or will completely go off.
6.1.3 Observation
The temperature log displaying the temperature cycling over time and the temperature variation can be seen in Figure 6-3.
Figure 6-3: Temperature Log
During the course of the observation period, the four sandwiches have accumulated up to 3929
cycles. Total cycles for each individual sandwich are as follows:
Sandwich 2: 928 cycles
Sandwich 3: 679 cycles
Sandwich 4: 1156 cycles
Sandwich 5: 1166 cycles
No HW failures have been observed. The monitor displaying the test application activities gave some warnings, but all of them were caused by issues related to the test application and cables. The software issues stemmed from the priority given to the tasks performed by the test application. The current test application version gives high priority to the “DD-test”, which is the NOR Flash test. The issues can be seen in the Monitor Log in [22].
The methodology used to detect HW failures in the boards has been to map the test application output to the temperature logged within the thermal chamber. The idea has been to look for any correlation between failures and temperature. One such observation was made when processor S in the right board of Sandwich 2 started to time out and become active again repeatedly. There were two different temperature ranges at which the processor would time out and become active again. With time, these ranges shifted towards the lower temperature region of the thermal cycle. Eventually, processor S sent a time-out message via the test app while the temperature was -5°C and never became active again. Upon further investigation, it was found that the cable connecting the two boards in Sandwich 2 had become loose and did not allow processor S on the right board to send any packets, eventually leading to the time-out messages via the test app.
The change in the temperature range at which the error messages appeared initially led to the understanding that this was a random failure that could be due to SW. However, following the discovery of the loose cable, this behaviour can serve as an indicator of similar failures in the future. At higher temperatures the cable would expand and have connectivity, but it would lose connectivity at low temperatures when contraction occurred. In the end, the final contraction left the cable in a position from which it could no longer connect to the port, even at higher temperatures. This failure is not considered a HW failure intrinsic to the board.
Another good indicator of this kind of failure, as observed, is that the frequency with which the processor times out and becomes active again decreases with time, eventually leading to failure.
7 Sherlock Reliability Testing Simulation – Accelerated Life Testing
Solder fatigue analysis is performed for the Board_B undergoing the Accelerated life testing. The
input parameters have been set in line with the parameters used for accelerated life testing and by
using the correlation derived using the acceleration factor. The input parameters for Sherlock
simulation are as follows:
Life cycle
    o Service Life = 50 days
Life phase editor
    Phase settings
        o Environment: Ground_Benign
        o Duration: 180 days
        o Number of cycles: 100 Duty cycles
Thermal event editor
    o Thermal event settings
        Number of cycles: 100 Duty cycles
        Life cycle status: Operating
Thermal Profile Editor
    o Time units: Minutes
    o Temperature units: °C
    o Step
        Minimum temperature: Hold for 25 minutes at 0°C
        Ramp up: 50 minutes until temperature reaches 70°C
        Maximum temperature: Hold for 15 minutes at 70°C
        Ramp down: 90 minutes until temperature reaches 0°C
According to [23], in 3.4.11, the dwell time in this case has been considered the time when the
temperature stays below 0°C for the lower end of the cycle and above 70°C for the upper end of the
cycle.
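As a quick sanity check on the thermal profile entered in Sherlock, the step durations can be summed and the ramp rates compared with the capability of the thermal chamber given in Section 6.1.1 (2.0 K/min heating, 1.4 K/min cooling). The sketch below is illustrative; the data layout and function names are assumptions.

```python
# (step name, duration in minutes, start temperature degC, end temperature degC)
THERMAL_PROFILE = [
    ("dwell at minimum", 25, 0.0, 0.0),
    ("ramp up", 50, 0.0, 70.0),
    ("dwell at maximum", 15, 70.0, 70.0),
    ("ramp down", 90, 70.0, 0.0),
]

# Chamber capability from Section 6.1.1
CHAMBER_MAX_HEATING_K_PER_MIN = 2.0
CHAMBER_MAX_COOLING_K_PER_MIN = 1.4


def cycle_minutes(profile):
    """Total duration of one thermal cycle in minutes."""
    return sum(duration for _, duration, _, _ in profile)


def ramp_rates(profile):
    """Signed ramp rate in K/min for every step that changes temperature."""
    return {
        name: (end - start) / duration
        for name, duration, start, end in profile
        if end != start
    }


def within_chamber_limits(profile):
    """True if every ramp is no faster than the chamber can deliver."""
    for rate in ramp_rates(profile).values():
        limit = (CHAMBER_MAX_HEATING_K_PER_MIN if rate > 0
                 else CHAMBER_MAX_COOLING_K_PER_MIN)
        if abs(rate) > limit + 1e-9:
            return False
    return True


if __name__ == "__main__":
    period = cycle_minutes(THERMAL_PROFILE)
    print(f"cycle period: {period} min, {24 * 60 / period:.1f} cycles per day")
    print(ramp_rates(THERMAL_PROFILE), within_chamber_limits(THERMAL_PROFILE))
```

The 180-minute period obtained this way matches the upper end of the 145 to 180 minute cycle time logged during the ALT.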
The life prediction curve from the solder fatigue analysis performed in Sherlock is given in
Figure 7-1.
Figure 7-1: Solder Fatigue Life Prediction Curve, Board_B_ALT, Weibull curve
From the solder fatigue analysis, the probability of Board_B having any failure after the completion of the ALT in approximately 50 days is 0.32%.
Figure 7-2 shows the distribution of TTF (days) values for all of the 1774 components analyzed. It can be seen from the chart that all 1774 components have a TTF that is 5 times higher than the service life defined for the ALT.
The following components are not part of the analysis because their package types are not supported:
F1
U42
U50
U144
Figure 7-2: Solder Joint Fatigue Life Distribution_Board_B_ALT
Even in the worst case, where the lowest acceleration factor (derived using the solder fatigue coefficient for SnPb solder) is used, an accelerated life test of 127 days is sufficient. Figure 7-3 displays a zoomed-out version of the life prediction curve in Figure 7-1 in order to show the probability of failure (%) due to solder joint fatigue for Board_B at the end of the ALT on day 127. The simulation result shows a 5.17% probability of failure due to solder joint fatigue for Board_B if the ALT is continued for 127 days, taking the lowest possible acceleration factor into consideration.
Figure 7-3: Solder Joint fatigue Life prediction_Board_B_m=2.65
7.1 Observation
During the ALT, no HW failures have been observed. According to the worst-case acceleration factor, Sandwich 4 and Sandwich 5 have already seen as many cycles as the real product sees in field conditions. As described above, three randomly selected boards from the overall sample can be sent for micro-sectioning to ensure that no failures are missed because of the design of the test application, the hardware design or external factors, once the other two sandwiches have also survived the ALT without any failures.
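The comparison between the accumulated cycles and the required cycles can be sketched as follows. The cycle counts are those reported in Section 6.1.3 and the acceleration factors are those derived in Section 6.1.2; the function names are illustrative.

```python
FIELD_CYCLES = 30 * 365  # 10950 cycles over the 30-year service life
ACCELERATION_FACTORS = {"SnPb": 10.807, "SAC 305": 27.655}

# Cycles accumulated per sandwich during the observation period
SANDWICH_CYCLES = {
    "Sandwich 2": 928,
    "Sandwich 3": 679,
    "Sandwich 4": 1156,
    "Sandwich 5": 1166,
}


def required_test_cycles(acceleration_factor):
    """ALT cycles equivalent to the full service life in the field."""
    return FIELD_CYCLES / acceleration_factor


def sandwiches_at_target(cycles, acceleration_factor):
    """Names of the sandwiches that have reached the required cycle count."""
    target = required_test_cycles(acceleration_factor)
    return sorted(name for name, count in cycles.items() if count >= target)


if __name__ == "__main__":
    for solder, af in ACCELERATION_FACTORS.items():
        print(solder, round(required_test_cycles(af), 1),
              sandwiches_at_target(SANDWICH_CYCLES, af))
```

With the worst-case (SnPb) factor only Sandwich 4 and Sandwich 5 have passed the roughly 1013 required cycles, whereas with the SAC 305 factor all four sandwiches have passed the roughly 396 required cycles.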
The ALT outcome is also supported by the Sherlock simulation result which, at least for failures due to solder joint fatigue, shows a very low probability of failure.
One important finding during the course of this ALT has been the acceleration factor correlating the ALT with the field conditions. This model for finding the acceleration factor can be used within Bombardier for other products that undergo thermal cycling as part of accelerated life testing. It allows a better correlation between the number of cycles that the product must undergo during ALT and the number of cycles the product sees under normal operating conditions over the course of its service life.
8 The Model
Circuit card assemblies, or PCBs, are one of the major focuses of development within BT RCS. It is vital that the reliability of these boards is predicted with the utmost care. A carefully constructed prediction gives the stakeholders involved a better understanding of the performance of the product. It also allows the organization to present an important element during the bid phase to familiarize its clients with the performance of the board.
The issue at hand is that the predicted reliability values often differ from the performance of the boards in field conditions. This document describes how to enhance the prediction process so that the predicted reliability is closer to the performance of the board in field conditions. This will increase the confidence in the prediction and provide the clients with better results during the bid phase.
The model is what has been implemented during the course of this thesis. First, reliability prediction across the selected prediction standards is performed on Board_A v2.5. The predicted results are shown in Figure 8-1. Afterwards, FFDA is performed using the substantial amount of field failure data reported for the different versions of Board_A. The MTTF obtained for Board_A v2.5 from the field failure data analysis is approximately 141 years. Comparing this with the predicted results for Board_A across the different standards in Figure 8-1, it is possible to correlate the board performance with each of the reliability results predicted by the standards. Following this procedure, it can be seen that the performance of Board_A v2.5 in the field is 1.96 times better than the result predicted by FIDES and 1.47 times better than the results predicted by IEC 62380 and Siemens SN 29500. The difference between the field performance and the result predicted by the MIL-HDBK-217F2 standard is the largest, with the field performance being 25.7 times greater than the predicted value.
Figure 8-1: Theoretical Reliability Prediction of Board_A
For future versions of Board_A, the same factors can be applied to the results of the respective prediction standards to get an MTTF that is closer to the performance of the board in the field.
The model can be applied to other boards within BT RCS. However, the prediction needs to be done again, since the complexity varies from board to board. As long as the board is a mature product with field failure data available, the result of the field failure analysis can be compared with the values predicted by the standards to find the correlation factors. These factors can later be applied to future versions of the board.
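The use of the correlation factors can be sketched in Python. The factors are the field-to-predicted ratios reported in this chapter for Board_A v2.5; the function names and the predicted value of 100 years in the example are hypothetical and serve only to show how a new prediction would be scaled.

```python
# Field MTTF divided by predicted MTTF for Board_A v2.5, per standard
FIELD_TO_PREDICTED_FACTOR = {
    "FIDES": 1.96,
    "IEC 62380": 1.47,
    "Siemens SN 29500": 1.47,
    "MIL-HDBK-217F2": 25.7,
}


def correlation_factor(field_mttf, predicted_mttf):
    """Ratio of field performance to predicted performance (same units)."""
    return field_mttf / predicted_mttf


def field_adjusted_mttf(predicted_mttf, standard):
    """Scale a new prediction by the factor found for that standard."""
    return predicted_mttf * FIELD_TO_PREDICTED_FACTOR[standard]


if __name__ == "__main__":
    # Hypothetical prediction for a future board version (illustration only)
    hypothetical_predicted_years = 100.0
    for standard in FIELD_TO_PREDICTED_FACTOR:
        adjusted = field_adjusted_mttf(hypothetical_predicted_years, standard)
        print(f"{standard:18s} {adjusted:8.1f} years")
```

For a different board, `correlation_factor` would first be evaluated from that board's own field MTTF and predictions, since the factors above are specific to Board_A.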
The model is constrained by the fact that it requires the product to have field failure data. Hence, it cannot be applied to boards that are not installed in the field or to boards without a substantial amount of failure data. This applies specifically to the lead-free boards within BT RCS, most of which are still under development.
To ensure that the products under development are not excluded from performance prediction because of a lack of field data, a separate model is suggested.
The boards are to undergo accelerated life testing similar to what Board_B has undergone as part of this thesis. The results obtained from the accelerated life testing make it possible to predict the board’s performance in field operating conditions. However, it is very important that an appropriate acceleration factor is derived to ensure that the ALT cycles undergone by the board are equivalent to the cycles it experiences during its service life in field operating conditions.
Some physics of failure models that can be used to relate the results obtained under ALT with
results under normal field conditions are given below:
Arrhenius Acceleration Model
o Thermal stress
Inverse Power Law Model
o Non-Thermal accelerated stress
Eyring Model
o Thermal stress and Electrical/Humidity stress
Norris-Landzberg Model
o Thermal cycling
During the course of this thesis, one of the major findings has been the acceleration factor correlating the test conditions with the operating conditions, which later helped in finding the number of ALT cycles equivalent to the cycles in the field. Since this was a test concerning thermal cycling, the Norris-Landzberg model was used. However, the relevant model needs to be chosen depending on the ALT.
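For the two simplest models in the list above, the acceleration factor can be sketched in Python using their standard textbook forms. These forms are not given in this thesis, so the sketch is an illustration under that assumption; the activation energy, temperatures, stress levels and exponent in the example are hypothetical values.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant k in eV/K


def arrhenius_af(ea_ev, t_use_k, t_test_k):
    """Arrhenius acceleration factor for constant thermal stress.

    AF = exp(Ea / k * (1 / T_use - 1 / T_test)), temperatures in kelvin.
    """
    return math.exp(ea_ev / BOLTZMANN_EV_PER_K * (1.0 / t_use_k - 1.0 / t_test_k))


def inverse_power_law_af(stress_test, stress_use, n):
    """Inverse power law acceleration factor for a non-thermal stress.

    AF = (S_test / S_use) ** n, where n is a fitted, stress-specific exponent.
    """
    return (stress_test / stress_use) ** n


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to illustrate the calls
    print(arrhenius_af(ea_ev=0.7, t_use_k=328.15, t_test_k=358.15))
    print(inverse_power_law_af(stress_test=2.0, stress_use=1.0, n=3.0))
```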
The proposed model states that, once the ALT is performed, the results can be validated using a reliability testing simulation tool, e.g. Sherlock, in which it is possible to mimic both the ALT condition and the field operating condition. This increases the confidence in the predicted reliability and in the performance of the board throughout its service life.
9 Conclusions and Future work
9.1 Conclusions
To conclude this dissertation, the goals defined in Section 1.3 have been realized. Achieving the sub-goals first and then using their outcomes helped in accomplishing the final goal of this dissertation. The ‘global’ model that is the outcome of this thesis project allows BT to perform product reliability prediction efficiently. Using the model allows BT to gather reliability information on its products that is more accurate and realistic. Presenting more accurate reliability information to potential clients in the bid phase is attractive both for BT and for the clients.
During the course of this project, many insights were gained. The most important is that
the knowledge acquired during the initial period through literature study was
successfully applied in practice. The planning of the project was crucial, and the
decision to hold bi-weekly meetings to provide updates on the progress of the project turned out to
be of great help. Thus, the project ended successfully and on time.
Initially, almost a month of literature study was spent on Board_C, which was supposed to
be the mature product on which to perform field failure data analysis. Later the board was changed to
Board_A, due to the incompatibility of the Board_C design files with the reliability testing simulation tool.
If I were to do the same work again, I would first check the compatibility of the design files of the board
with the reliability testing simulation tool. The time saved could then be spent on
other parts of the project.
9.2 Limitations
In general, the work during this dissertation ran smoothly. In a few instances,
there were hindrances that limited the effort towards a successful completion
of the tasks. They are listed in this section.
One limitation was the delay in the development of the test application software
designed to track failures occurring in the three processors on Board_B during the
accelerated life testing. Because of this delay, the processors of the
board could not be monitored for failures until almost a month after the accelerated life testing
began. However, this limitation did not have any impact on the results, since no failures were found
once the test application was ready and installed on the processors. This suggests that during
the period when the test application was not ready, the board was fully functional without any failures.
There was also a limitation in the field failure data analysis. In some of the field
incident reports, the version of Board_A for which the failure occurred was missing. To limit the
overall effect on the accuracy of the field failure data analysis, the date of the failure was matched
against the release dates of the Board_A versions. The version of Board_A released before
the failure date is assumed to be the one that failed and is used for the analysis.
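The matching rule described above can be sketched as a small lookup. The sketch assumes that "the version released before the failure date" means the most recent version released on or before that date; the function name, the version labels and the release dates are hypothetical and serve only to illustrate the rule.

```python
from bisect import bisect_right
from datetime import date


def assume_failed_version(failure_date, releases):
    """Return the version assumed to have failed.

    releases -- list of (release_date, version) tuples sorted by release_date.
    The most recent version released on or before failure_date is returned
    (assumed interpretation); None if no version had been released yet.
    """
    release_dates = [released for released, _ in releases]
    idx = bisect_right(release_dates, failure_date) - 1
    if idx < 0:
        return None
    return releases[idx][1]


if __name__ == "__main__":
    # Hypothetical release history, not actual Board_A data.
    releases = [
        (date(2012, 1, 1), "rev 1"),
        (date(2014, 6, 1), "rev 2"),
    ]
    print(assume_failed_version(date(2013, 5, 5), releases))
```

Incident reports for which the lookup returns no version would have to be excluded or handled manually.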
9.3 Future work
The field failure data analysis consisted of field data collections at different dates, namely the
project commissioning date, the warranty expiry date of the project, and the service contract of the project (if
any) after the project has ended. However, what has not been included in the scope of this master
thesis is the grey zone between the product entering service and the project commissioning date.
Apparently, the time between the installation date and the project commissioning date can
vary from a few weeks to a couple of years. During this time, there can be incident reports regarding
product failure. A similar grey zone can exist between the expiry date of the warranty and the end of
the service contract, when incident reports are likely not to be recorded or observed. In the future, it
should be possible to extract data from these grey zones. This would increase the coverage of the
failure incident reports for all the products used at Bombardier. As a result, the field failure
data analysis could be improved considerably compared with its present state.
In addition, future ALT of the products can be carried out with other accelerating variables, such
as vibration, humidity and electrical stress.
References | 53
References
[1] A. Hakansson, “Portal of research methods and methodologies for research projects and degree projects,” Steer. Comm. World Congr. Comput. Sci. Comput. Eng. Appl. Comput. WorldComp, p. 1, 2013.
[2] E. Dubrova, Fault-Tolerant Design | Elena Dubrova | Springer, 1st ed. Springer-Verlag New York, 2013.
[3] Murthy, Rausand, and Osteras, Product Reliability, 1st ed. London: Springer-Verlag London, 2008.
[4] J. A. Jones, “Electronic reliability prediction: a study over 25 years,” phd, University of Warwick, 2008.
[5] J. Jones and J. Hayes, “A comparison of electronic-reliability prediction models,” IEEE Trans. Reliab., vol. 48, no. 2, pp. 127–134, Jun. 1999.
[6] “Selection Guide for electronic components predictive reliability models.” IMdR, Oct-2009.
[7] L. Escobar and W. Meeker, “A Review of Accelerated Test Models,” vol. 21, no. 4, pp. 552–557, 2006.
[8] A. Kostic, “Lead-free Electronics Reliability - An Update,” GEOINT Development Office, Aug-2011.
[9] M. Meilunas, A. Primavera, and S. Dunford, “Reliability and Failure Analysis of Lead-Free Solder Joints,” in IPC Conference Proceedings, 2002.
[10] “RAM Products Catalogue: Components vs. RAM Deliverables.” Bombardier Transportation (RCS), 31-Mar-2016.
[11] C. Platt, Encyclopedia of Electronic Components, vol. 1. O’Reilly, 2012. [12] C. Platt, Encyclopedia of Electronic Components, vol. 2. O’Reilly, 2014. [13] “UTE_Guide_FIDES_2009_Ed_A_EN.pdf.” . [14]“IEC TR 62380: Reliability Data Handbook - Universal model for reliability prediction
of electronic componenets, PCBs and equipment,” International Electrotechnical Commission, Aug. 2004.
[15] “IEC 61709: Electric components - Reliability - Reference conditions for failure rates and stress models for conversion.” International Electrotechnical Commission, Jun-2011.
[16] K. Nylund, “Reliabilitity Prediction for Board_A.” Bombardier Transportation (RCS), 09-Nov-2011.
[17] M. Magnusson, “Reliability Prediction for Board_B.” EHC, Bombardier Transportation (RCS), 29-Apr-2014.
[18] W. Nualpluad, “All RAM view 21-06-2016.” Bombardier Transportation (RCS). [19] “GRP-40-10-25-007660 rev 03 en - NCR Standard Catalogues (data) (004).”
Bombardier Transportation (RCS), 21-Jun-2016. [20] M. Osterman, “Effect of Temperature Cycling Parameters (Dwell and Mean
Temperature) on the Durability of Pb-free solders,” 27-Jan-2010. [Online]. Available: http://www.calce.umd.edu/lead-free/CALCE-IMAPS2010.pdf. [Accessed: 09-Nov-2016].
[21] V. Vasudevan and X. Fan, “An acceleration model for lead-free (SAC) solder joint reliability under thermal cycling,” in 2008 58th Electronic Components and Technology Conference, 2008, pp. 139–145.
[22] K. Nylund, “Logbook_BoardB_TempCycling.” 30-Jun-2016. [23] “Performance Test Methods and Qualification Requirements for Surface Mount
Solder Attachments,” Feb-2006. [Online]. Available: http://www.ipc.org/TOC/IPC-9701A.pdf. [Accessed: 09-Nov-2016].
Appendix A: Benchmark of Reliability Standards
Supplementary Data File
Description:
The accompanying Excel spreadsheet consists of five worksheets. They are:
1. Classification and Comparison
2. Component Mapping
3. Possible Problems and CoP
4. Component Ratio Mode and LE
5. References
Filename:
Benchmark_of_Reliability_Standards.xlsx
TRITA-ICT-EX-2016:185
www.kth.se