
Reliability prediction of electronic products combining models, lab testing and field data analysis

NOOR CHOUDHURY

KTH ROYAL INSTITUTE OF TECHNOLOGY

INFORMATION AND COMMUNICATION TECHNOLOGY

DEGREE PROJECT IN COMMUNICATION SYSTEMS, SECOND LEVEL

STOCKHOLM, SWEDEN 2017


2017-01-16

Master’s Thesis

Examiner and Academic adviser Professor Elena Dubrova

Industrial adviser Romain Tiennot (Bombardier Transportation)

KTH Royal Institute of Technology

School of Information and Communication Technology (ICT)

Department of Communication Systems

SE-100 44 Stockholm, Sweden


Abstract

Several reliability standards are currently used for reliability prediction. They take different factors, environments and data sources into account to provide reliability data for a wide range of electronic components. However, users are often unaware of how the standards differ, because no benchmark exists that classifies and compares them. Without such a benchmark, users lack a top-down view of the standards and cannot make a qualitative judgement about which standard suits the reliability prediction of a specific system.

To address this issue, a benchmark of a set of reliability standards is developed in this dissertation.

The benchmark helps users of the selected reliability standards understand the similarities and differences between them and, based on the defined evaluation criteria, choose the appropriate standard for reliability prediction in different scenarios.

Theoretical reliability prediction of two electronic products at Bombardier is performed using the benchmarked standards. One product is mature, with incident reports available from the field; the other is a new product that is under development and has yet to enter service. The field failure data analysis of the mature product is then compared and correlated with the theoretical prediction. Adjustment factors are derived to bridge the gap between the theoretical reliability prediction and the reliability of the product in field conditions.

Since no field data are available for the product under development, its theoretical prediction cannot be compared or correlated with field data. Instead, an accelerated life test (ALT) is used to assess the product reliability during its lifetime and to reveal any failure modes intrinsic to the board. A crucial objective is met by finding an appropriate algorithm/model to correlate accelerated test temperature cycles with real product temperature cycles. The product under testing (PUT) has lead-free solder joints, so it has also been of interest to see whether any failures occur due to solder joint fatigue. Additionally, a reliability testing simulation is performed to verify and validate the performance of the product under development during ALT.

Finally, the goal of the thesis is achieved as separate models are proposed to predict product

reliability for both matured products and products under development. This will assist the

organization in realizing the goal of predicting their product reliability with better accuracy and

confidence.

Keywords

Reliability, Reliability Standards, Benchmark, Field data analysis, Thermal Cycling test, FIDES,

Siemens SN 29500, IEC 62380, MIL-HDBK-217F-Notice2, Reliability Prediction, Accelerated Life

Test, Product Reliability


Sammanfattning

At present there are various reliability standards that are used to carry out reliability prediction. They take different factors, environments and data sources into account to provide reliability data for a wide range of electronic components. However, users are not aware of the differences between the standards, because there are no benchmarks that would help classify and compare them. This lack of comparison denies users a top-down view of the different standards and the opportunity to choose the appropriate standard, based on qualitative judgement, when performing reliability prediction for a specific system.

To solve this problem, a benchmark of a set of reliability standards is developed in this thesis.

The benchmark helps users of the selected reliability standards understand the similarities and differences between them and, on the basis of the defined evaluation criteria, easily choose the appropriate standard for reliability prediction in different scenarios.

Theoretical reliability prediction of two electronic products at Bombardier is performed using the benchmarked standards. One of the products is mature, with incident reports available from the field, while the other is a new product that is under development and has not yet entered service. The analysis of the mature product's field failure data is then compared and correlated with the theoretical prediction. Adjustment factors are then derived to bridge the gap between the theoretical reliability prediction and the reliability of the product under field conditions.

Since the theoretical prediction for the product under development cannot be compared and correlated with any data, because none are available, the accelerated life test is used instead to determine the product's reliability during its lifetime and to identify any failure modes intrinsic to the board. An important objective is realized in that a suitable algorithm/model is found to correlate the temperature cycles of the accelerated test with real product temperature cycles. The PUT has lead-free solder joints, so it has also been of interest to see whether any failures occur due to solder joint fatigue. In addition, a reliability testing simulation is performed to verify and validate the performance of the product under development during ALT.

Finally, the goal of the thesis is achieved, as separate models are proposed to predict product reliability for both mature products and products under development. This will help the organization realize its goal of predicting product reliability with better accuracy and confidence.

Nyckelord (Keywords)

Reliability, Reliability Standards, Benchmark, Field data analysis, Thermal Cycling test, FIDES, Siemens SN 29500, IEC 62380, MIL-HDBK-217F-Notice2, Reliability Prediction, Accelerated Life Test, Product Reliability


Acknowledgments

Professor Elena Dubrova

My academic adviser and examiner for this thesis. Thank you for being an absolutely amazing teacher for my course in “Design of Fault Tolerant Systems” at KTH. It is because of this course and your teaching that I became interested in the field of reliability and ended up doing this research. Thank you also for motivating me throughout this project. Your invaluable feedback and motivation allowed me to complete this research successfully.

Romain Tiennot

My industrial adviser for this thesis at Bombardier Transportation. Thank you for your guidance

throughout the course of my research. Your advice and feedback helped me to be on course and

ensured high quality of my research. I very much appreciate the knowledge and experience that you

shared with me. They helped me in becoming more organized and professional.

Kenneth Nylund

My mentor at Bombardier Transportation. Thank you for sharing all your valuable experiences

regarding accelerated life testing, effect of solder fatigue on circuit card assemblies, etc. It has been a

privilege for me to have worked in close collaboration with you. I very much appreciate your

materials for literature study. They have been of great value and helped in enhancing my knowledge.

Laudier Ndikuriyo & Olga Eskova

My colleagues from Bombardier Transportation. Thanks for your support during my tenure at

BT. It certainly helped in my smooth transition to the organization.

My parents and my sister

Thank you for cheering me on all this time. Because of your encouragement, I had the strength to get through all the hurdles.

Stockholm, January 2017 Noor Choudhury


Table of contents

Abstract
  Keywords
Sammanfattning
  Nyckelord
Acknowledgments
Table of contents
List of Figures
List of Tables
List of acronyms and abbreviations
1 Introduction
1.1 Problem definition
1.2 Purpose
1.3 Goals
1.4 Research Methodology
1.5 Delimitations
1.6 Structure of the thesis
2 Background
2.1 Reliability standards
2.2 Electronic products for reliability prediction, ALT and field failure data analysis
2.2.1 Board_A
2.2.2 Board_B
2.3 Accelerated Testing
2.4 ITEM QT
2.5 Sherlock
2.5.1 Life Cycle
2.5.2 Inputs
2.5.3 Analysis Modules
3 Benchmark of Reliability Prediction standards
3.1 Identification of reliability standards used at Bombardier Transportation
3.2 Benchmark Structure
3.3 Classification and Comparison
3.4 Evaluation Criteria
3.5 Evaluation of reliability standards based on the defined criteria
3.5.1 FIDES
3.5.2 IEC 62380
3.5.3 Siemens SN29500
3.5.4 MIL-HDBK-217F-Notice 2
3.6 References across different standards
4 Theoretical Reliability Prediction
4.1 Input parameters and Assumptions
4.1.1 FIDES
4.1.2 IEC 62380
4.1.3 Siemens SN 29500
4.1.4 MIL-HDBK-217F2
4.2 Prediction outcome
4.2.1 Board_A
4.2.2 Board_B
4.3 Theoretical Prediction Analysis
5 Field Failure Data Analysis for Board_A
5.1 Board_A version distinction
5.2 Data Sources and required Inputs
5.3 Elaboration of field failure data
5.4 Solder Fatigue analysis for Board_A
5.5 Conclusion
6 Lab Testing and Reliability testing simulation on BT Products
6.1 Accelerated Life Testing
6.1.1 Experimental Setup
6.1.2 Input conditions and duration of the thermal cycling
6.1.3 Observation
7 Sherlock Reliability Testing Simulation – Accelerated Life Testing
7.1 Observation
8 The Model
9 Conclusions and Future work
9.1 Conclusions
9.2 Limitations
9.3 Future work
References
Appendix A: Benchmark of Reliability Standards
  Supplementary Data File
  Description:
  Filename:


List of Figures

Figure 2-1: Solder joint fatigue failure on a TSOP package. (Source: CALCE News, September 1993)
Figure 4-1: Block level prediction of Board_A
Figure 4-2: Functional level prediction of Board_A
Figure 4-3: Component level prediction of Board_A
Figure 4-4: Block level prediction of Board_B
Figure 4-5: Functional level prediction of Board_B
Figure 4-6: Component level prediction of Board_B
Figure 5-1: Timeline of Observation
Figure 5-2: Stackup of operating hours
Figure 5-3: Field failure data analysis of Board_A and Board_AE
Figure 5-4: MTBF of different versions of Board_A extracted from field failure data analysis
Figure 5-5: Failure statistics for Board_A
Figure 5-6: Solder Fatigue life prediction for Transceiver (D709)
Figure 5-7: Solder Fatigue life prediction for LED (D717)
Figure 5-8: Solder Fatigue life prediction for connector (J702)
Figure 5-9: Comparison of reliability data for Board_Av2.5
Figure 5-10: Overall Solder Joint fatigue life prediction
Figure 6-1: Thermal chamber
Figure 6-2: Powered on boards inside the subrack
Figure 6-3: Temperature Log
Figure 7-1: Solder Fatigue Life Prediction Curve, Board_B_ALT, Weibull curve
Figure 7-2: Solder Joint Fatigue Life Distribution_Board_B_ALT
Figure 7-3: Solder Joint fatigue Life prediction_Board_B_m=2.65
Figure 8-1: Theoretical Reliability Prediction of Board_A


List of Tables

Table 3-1: Classification and Comparison of reliability standards
Table 3-2: List of evaluation criteria
Table 5-1: Board_A version history
Table 5-2: List of failed components
Table 6-1: Calculation for duration of ALT for different solder materials


List of acronyms and abbreviations

AF Acceleration factor

ALT Accelerated Life Test

AT Accelerated Testing

BGA Ball Grid Array

BT Bombardier Transportation

CCU Communication Controller Unit

CIS Central Interlocking System

COM Communication Board

CoP Causes of Problem

COTS Commercial-off-the-shelf

DGA Délégation générale pour l'armement

EOS Electrical overstress

FFDA Field Failure Data Analysis

FIT Failures in Time

FR Failure Rate

HW Hardware

IEC International Electrotechnical Commission

IC Integrated Circuit

ICT Information and Communication Technology

KPI Key Product Information

MOS Mechanical overstress

MTBF Mean Time Between Failures

MTTF Mean Time to Failure

nf/h Nano-faults per hour

NTC Number of thermal cycles

OC Object Controller

OS Operating System

Pb Lead

PCB Printed Circuit Board

PoF Physics of Failure

PUT Product under testing

QFN Quad Flat No-Leads

RCS Rail control solutions

RH Relative humidity

ROI Return on Investment

T Ambient temperature

TOS Thermal overstress

TR Technical report


1 Introduction

This chapter describes the specific problem that this thesis addresses, the context of the problem,

the goals of this thesis project, and outlines the structure of the thesis.

This dissertation aims to provide an overview and cross-check of the different prediction methods for the reliability of electronic/electro-mechanical components used in rail signalling (Object Controller System), with special emphasis on reliability lab testing. It also derives and evaluates corrective and/or scaling factors to correlate the reliability figures obtained by:

• Theoretical predictions by use of applicable reliability databases, norms, standards and tools

• Tests carried out in the laboratory (e.g. by use of temperature chambers, accelerated life test approaches, etc.)

• Failure analysis from field data by use of Root Cause Analysis and a statistical approach

The key responsibilities have been to classify and compare the reliability prediction results obtained when applying the various applicable reliability standards/norms. Afterwards, “typological” lab tests are performed on selected RCS products, the resulting reliability figures are extrapolated, and the outcomes are compared with the theoretical reliability predictions. For an RCS product that is mature and has operational history available, the reliability figures based on field failure data are cross-checked with the theoretical reliability predictions.

All the results are accumulated and summarized to elaborate a global model for RCS BT to

predict product reliability. The model also integrates a derived algorithm to correlate accelerated

test temperature-cycles to real product temperature-cycles.

1.1 Problem definition

BT carries out reliability prediction for every product, and this applies globally. However, no tool or standard provides guidelines for considering theoretical reliability prediction, reliability testing and field data analysis together in product reliability predictions.

1.2 Purpose

The thesis has several purposes.

First, creating a benchmark of reliability standards increases the visibility and understanding of the various standards, along with their usability across different projects.

Second, a scientific process is developed to bridge the gap between theoretical reliability prediction and actual product reliability in the field.

In addition, ALT is performed to obtain more information on product reliability and failure modes. The observations are then verified and validated by performing a reliability testing simulation.

Finally, a model is produced that takes all the above-mentioned processes into consideration and efficiently determines product reliability.


1.3 Goals

The objective of this thesis project is based on the needs of the railway industry, more specifically

within BT.

The goal of this project is to elaborate a ‘global’ model to predict product reliability within BT.

To ensure that the goal is successfully attained at the end of the duration of this project, sub-goals

are established. The sub-goals are listed below:

1. Produce a benchmark of the reliability prediction standards that are used within BT and

have the potential to be used in the future.

2. Perform theoretical prediction of two electronic products belonging to BT using the

standards that have been selected for creation of the benchmark.

3. Evaluate the reliability figures of the mature product based on field failure data and compare them with the theoretical reliability prediction.

4. Perform “typological” lab tests (e.g. accelerated life tests and simulation) on a product under development to predict board reliability.

5. Summarize the overall results from the four sub-goals above to realize the overall goal of elaborating a global model to predict product reliability.

1.4 Research Methodology

The project uses the quantitative method to draw conclusions. This method verifies a hypothesis or theory by quantitative measurements obtained through experiments or testing [1]. Here, it is applied by comparing the predicted reliability results with the field failure data analysis (FFDA) and then deriving adjustment factors. In addition, the method aids the development of the model by combining the statistics of the FFDA with the experimental data.
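The comparison between predicted and field reliability can be illustrated with a minimal sketch. It assumes that an adjustment factor is simply the ratio of the field failure rate to the predicted failure rate; this definition, the function names and all numbers below are hypothetical and are not taken from the thesis results.

```python
def observed_failure_rate_fit(failures: int, cumulative_hours: float) -> float:
    """Point estimate of the field failure rate in FIT (failures per 1e9 hours)."""
    if cumulative_hours <= 0:
        raise ValueError("cumulative operating hours must be positive")
    return failures / cumulative_hours * 1e9


def adjustment_factor(predicted_fit: float, field_fit: float) -> float:
    """Assumed definition: factor that scales a handbook prediction to the field value."""
    if predicted_fit <= 0:
        raise ValueError("predicted failure rate must be positive")
    return field_fit / predicted_fit


if __name__ == "__main__":
    # Hypothetical predictions in FIT for one board, one value per standard.
    predicted = {
        "FIDES": 800.0,
        "IEC 62380": 1200.0,
        "Siemens SN 29500": 500.0,
        "MIL-HDBK-217F Notice 2": 2500.0,
    }
    # Hypothetical field data: 12 failures over 2e7 cumulative operating hours.
    field_fit = observed_failure_rate_fit(failures=12, cumulative_hours=2.0e7)
    for standard, fit in predicted.items():
        k = adjustment_factor(fit, field_fit)
        # k < 1: the standard predicted more failures than were observed.
        verdict = "pessimistic" if k < 1 else "optimistic"
        print(f"{standard}: adjustment factor {k:.2f} ({verdict} prediction)")
```

A factor below one means the standard over-predicts failures for this board, and a factor above one means it under-predicts them.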

1.5 Delimitations

Accelerated life testing on the product under development uses temperature as the only acceleration variable. Other stress variables, such as vibration, humidity and electrical stress, are not considered because of the time constraints of this thesis. In addition, only failures intrinsic to the board and its components are monitored during the accelerated life tests. Therefore, any reliability issues related to the software running on the board are out of the scope of this project.

1.6 Structure of the thesis

Chapter 2 presents relevant background information about reliability standards and accelerated

testing. The chapter also introduces the electronic products that are worked with during this project.

Information on the tools used in this project is also provided in this chapter. Chapter 3 presents

information on the Benchmark of Reliability Standards. Chapter 4 displays the input parameters

and assumptions for the selected standards and the reliability prediction outcome. The results

obtained after performing the field failure data analysis are shown in Chapter 5. Chapter 6 covers

the topic of Accelerated Life Testing whereas Chapter 7 presents the outcome from the reliability

testing simulation. Chapter 8 discusses the model derived after the completion of the project.

Chapter 9 concludes the dissertation and provides suggestions for future work.


2 Background

This chapter provides basic background information about reliability prediction, ALT, electronic

products within BT and the tools used in this project.

2.1 Reliability standards

Reliability is a measure of the continuous delivery of correct service. High reliability is required in situations where a system is expected to operate without interruption [2]. The communication systems within BT are such systems: they must be highly reliable and remain functional over a long period of time without failure.

According to [3], “Product reliability is an indicator that the product will perform satisfactorily over its intended useful life when operated normally. It is of great interest to both customers and manufacturers.” This holds true for both BT and its customers. From BT's perspective, high product reliability is needed to meet customer requirements, remain competitive and control warranty costs. Poor product reliability matters equally to customers, as it results in more failures and higher maintenance costs over the product’s lifetime. In addition, BT, as one of the frontrunners of the railway industry, places a strong focus on safety, and the inability of its products to perform satisfactorily can have severe implications. The organization therefore places great emphasis on predicting the reliability of its products using different reliability prediction standards.
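Predictions of this kind are commonly expressed as a failure rate in FIT (failures per 10^9 hours, listed as nf/h in the acronym list) or as an MTBF. The following is a minimal sketch of how these quantities relate, assuming a constant failure rate (the exponential model). That assumption, the function names and the 500 FIT input are illustrative and are not taken from the thesis.

```python
import math

HOURS_PER_YEAR = 8760  # 365 days * 24 hours, leap years ignored


def fit_to_mtbf_hours(fit: float) -> float:
    """Convert a failure rate in FIT (failures per 1e9 hours) to MTBF in hours."""
    if fit <= 0:
        raise ValueError("FIT must be positive")
    return 1e9 / fit


def reliability(fit: float, hours: float) -> float:
    """Survival probability R(t) = exp(-lambda * t) for a constant failure rate."""
    failure_rate_per_hour = fit * 1e-9
    return math.exp(-failure_rate_per_hour * hours)


if __name__ == "__main__":
    fit = 500.0  # hypothetical board-level prediction
    print(f"MTBF: {fit_to_mtbf_hours(fit):.0f} h")
    print(f"R(30 years): {reliability(fit, 30 * HOURS_PER_YEAR):.3f}")
```

Under this assumption, a lower FIT value gives both a longer MTBF and a higher probability of surviving a given mission time.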

Major work has been done in the field of reliability prediction since the 1950s, and a timeline of the major events in this field can be found in [4]. The timeline includes the initial publication of different reliability prediction standards and how they have evolved since. The standards follow different methodologies to predict the reliability of electronic systems, which can result in completely different reliability data for the same system. This was established in the research work “A comparison of Electronic-Reliability Prediction Models”, published in 1999 by two researchers from Loughborough University.

The research was carried out after it had become anecdotally obvious that there were problems with the prediction systems in common use. The study used reliability information collected over many years from leading British and Danish manufacturers. The collected data was regarded as being of the highest possible quality and could therefore serve as a benchmark against which the data from the reliability handbooks could be tested [5]. Six circuit boards were chosen, for which the manufacturers provided extra data to match them with the handbook types. The five reliability prediction handbooks used were MIL-HDBK-217E, HRD4, Siemens SN29500, CNET and Bellcore (TR-TSY-000332). The methodology was to predict the reliability of the circuit boards using the selected reliability standards and then to compare the results with the failure rates observed in the field [5]. The outcome was that the reliability handbooks did not predict reliability accurately: they were either too pessimistic or too optimistic and thus deviated considerably from the real reliability of the products. Further research also demonstrated that the models in these handbooks were sensitive to different factors in different ways. Part of the work in this dissertation applies the same methodology, comparing field failure data with the theoretical predictions from four different standards.

Little research has been done on the classification and comparison of different reliability standards. One exception is a handbook published by IMdR, which presents different reliability standards and the principles for selecting reliability models [6]. The reliability standards listed in this handbook are:

1. MIL-HDBK-217

2. RDF 93

3. UTE-C 80810

4. FIDES

5. 217Plus

2.2 Electronic products for reliability prediction, ALT and field failure data analysis

2.2.1 Board_A

The Board_A is a communication board which is part of the Communication Controller Unit (CCU)

in the Object Controller System. The functionality of the board is to receive telegrams from the

Central Interlocking System (CIS) and pass them on to the object controller boards via the OC-Link.

The board is manufactured by using the Pb-soldering process.

The board is kept in a cabinet where temperature ranges from 50°C to 70°C and one

temperature cycle takes approximately 24 hours.

2.2.2 Board_B

Board_B is part of the HW and SW platform Base_B. This product, which will be part of the object controller system, is still under development and has yet to be installed in the field. The board consists of three processors and an FPGA, IC_B. Among the three processors, Processor A and Processor B execute safety-critical applications. The two processors use diverse HW and OS to ensure a higher level of redundancy and to prevent the same failure from affecting both processors at the same time. Processor S provides non-safety-critical services to Processor A and Processor B. The front-end switch is used for interfacing with the CIS, whereas IC_B is used for interfacing with the wayside objects. Board_B is manufactured using a lead-free soldering process, and this is the norm for all versions of Board_B.

2.3 Accelerated Testing

Accelerated testing (AT) experiments are run to extract reliability information.
During accelerated testing, the test units of a component, subsystem or system are subjected to
higher-than-usual levels of one or more accelerating variables such as temperature or stress [7]. The
results from the AT aid in predicting the life of the test units at use conditions. Present-day
electronic products are required to be highly reliable and are expected to operate without failure
for many years. As a result, few units fail in a test of practical length at normal use conditions.
For example, the design and construction of an object controller board may allow only a few months
to test components that are expected to be in the field for nearly 30 years. If the testing
is done at normal use conditions over such a practical length, it is difficult to assess the
product reliability completely. AT helps to assess or demonstrate component and subsystem reliability
and to detect failure modes. If failure modes are detected, the manufacturer can correct them before
the product is put into use in the field.
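The gap between test time and field life described above can be put into numbers. The following is a minimal sketch, not the method used in this project. It assumes a test window of 3 months (the text only says "a few months") and uses the Arrhenius relation, a common temperature acceleration model that is not named in the text. The activation energy of 0.7 eV and the 125°C test temperature are placeholder assumptions, and solder fatigue under thermal cycling would normally call for a different model.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def required_acceleration_factor(field_life_years: float, test_months: float) -> float:
    """Acceleration factor needed to cover the field life within the test time."""
    return field_life_years * 12.0 / test_months


def arrhenius_acceleration_factor(t_use_c: float, t_test_c: float, ea_ev: float) -> float:
    """Arrhenius acceleration factor between use and test temperatures (deg C)."""
    t_use_k = t_use_c + 273.15
    t_test_k = t_test_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_use_k - 1.0 / t_test_k))


# Assumed example: 30 years of field life, 3 months of available test time.
needed = required_acceleration_factor(30.0, 3.0)

# Assumed example: 70 degC in use, 125 degC in test, Ea = 0.7 eV (placeholder).
achieved = arrhenius_acceleration_factor(70.0, 125.0, 0.7)

print(f"needed AF: {needed:.0f}, Arrhenius AF at 125 degC: {achieved:.1f}")
```

Comparing the two numbers shows whether a single elevated temperature could plausibly compress the field life into the test window under these assumptions.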


The idea behind performing ALT during this project is to find failure modes, if any, in one of
the products under development within BT. In addition, the new product under development
uses lead-free (SAC305) solder joints, which are prone to reliability issues at high
temperature, unlike tin-lead (SnPb) solder joints. The change to lead-free solders has given rise to
new and unfamiliar failure modes [8]. A study has also shown that lead-free failure times, failure
mechanisms and failure locations differ significantly from those of tin-lead, and that more work is
required to understand the consequences of lead-free soldering [9]. It is therefore of interest to see
whether any failure occurs in these lead-free boards due to solder joint fatigue.

2.4 ITEM QT

ITEM QT is a reliability, safety and risk assessment software package that has been used during this
project to perform theoretical reliability prediction. The software includes all the reliability
standards selected for this project, embedded in the form of modules.

ITEM QT prompts the user if any key parameters have been left blank, allowing the user
to add the parameters accordingly. This ensures that the key factors affecting the prediction for the
different standards are not left out.

ITEM QT also allows the user to perform reliability prediction of a board using different
standards in the same project. In essence, this allows the user to apply the appropriate standard,
based on qualitative judgement, to different components of the same board at the same time.

One factor that is not considered in the FIDES module of the tool is the lead-free process
factor for boards manufactured using a lead-free soldering process.

If there is a component for which the failure rate (FR) is not modelled, it can be added to the
prediction using a component named “External”, where the user inputs the FR of the component
based on experience or other sources. This functionality is available for all the standards and can
be particularly useful for components whose FR is given in the company datasheets.

2.5 Sherlock

Sherlock is a tool that allows users to analyze the reliability of circuit card assemblies (CCAs)
based on their design files. The analysis is based on Physics of Failure and uses modules such as
solder fatigue, PTH fatigue and CAF failure.

A Sherlock project consists of three basic sets of information: the Life Cycle definition, the
Project results and the CCA. Within the CCA, or main board, the design files, the analysis inputs and
results, as well as the results for the individual circuit card, are available.

Reliability testing simulation is performed for both Board_A and Board_B in the Sherlock
Automated Design Analysis software. For Board_A, a life cycle mimicking the real product environment
is defined. For Board_B, the life cycle is defined so as to mimic the accelerated life test.

2.5.1 Life Cycle

The life cycle in Sherlock can be defined so as to mimic the conditions experienced by the
circuit card during real operation and/or lab testing. This makes it possible to
compare and validate reliability results between Sherlock and lab testing. The results can also be
compared with the actual board behavior in the field.

The “Life Cycle Editor” allows the user to set reliability goals for the CCA. The reliability goals

consist of two input parameters: Reliability Metric and Service Life.


The reliability metric is a quantified reliability goal set by the user for the CCA in any of the

following forms: Reliability (%), Probability of failure (%), MTBF (years), MTBF (hours), FITs (1E6

hours) and FITs (1E9 hours). Service life can be defined as the duration up to which the CCA is

expected to be in service with full functionalities.
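The reliability metrics listed above are different views of the same quantity once a failure model is fixed. The sketch below converts between them under the assumption of a constant failure rate (exponential model), which is an assumption made here for illustration and not a statement about how Sherlock converts them internally. The function names and the example MTBF are hypothetical.

```python
import math

HOURS_PER_YEAR = 8760.0  # 365 days x 24 hours


def mtbf_hours_to_fit(mtbf_hours: float) -> float:
    """Failures per 1E9 device-hours for a constant failure rate."""
    return 1e9 / mtbf_hours


def fit_to_mtbf_years(fit: float) -> float:
    """MTBF in years for a failure rate given in FITs (per 1E9 hours)."""
    return 1e9 / fit / HOURS_PER_YEAR


def probability_of_failure(mtbf_hours: float, service_life_years: float) -> float:
    """Probability of failure by the end of the service life, 1 - exp(-t / MTBF)."""
    t_hours = service_life_years * HOURS_PER_YEAR
    return 1.0 - math.exp(-t_hours / mtbf_hours)


# Hypothetical example: an MTBF of 1E6 hours over a 30-year service life.
example_fit = mtbf_hours_to_fit(1e6)
example_pof = probability_of_failure(1e6, 30.0)
print(f"{example_fit:.0f} FIT, probability of failure {100 * example_pof:.1f} %")
```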

The reliability goals for Board_A and Board_B are defined as follows (according to board

requirement specifications):

1. Reliability Metric: 0% probability of failure

2. Service Life: 30 years

In this document the reliability goals are set so as to mimic the normal operating condition of

the board in field.

A life cycle can have one or more “Phases” which in turn can have one or more “Events”.

The “Phase settings” segment in the “Life Phase Editor” allows the user to choose the environment
from a predefined set of 14 environments. The user sets the duration of the phase along with the
number of cycles for each phase.

The phase settings defined for Board_A and Board_B are:

1. Environment: Ground_Benign

2. Duration: 1 Day

3. # of cycles: 100 duty cycle (a duty cycle of 100 corresponds to the cycle running for the whole
duration)

Once the Phase has been defined, the user can add any one or more of the following events to

the phase: Thermal cycle, Harmonic vibe, Random vibe and Shock event. During the life cycle of a

product, it can experience different forms of stresses. In Sherlock, users can model the different

stresses that the product undergoes during its lifetime by modeling the relevant stresses as events

under different phases of the lifecycle.

For the Board_A and Board_B, only the thermal event is defined.

1. Thermal Event Editor

a. Thermal Event Settings

i. # of Cycles: 100 Duty Cycle

ii. Life cycle State - Parameter to define whether the CCA undergoing the

thermal cycling is in an operating state or in storage. For both the

boards, “Operating” is chosen as the life cycle state

b. Thermal Profile

i. Allows the user to create the thermal profile that the CCA experiences

during laboratory testing or in real scenario.

1. The thermal profile for both the boards has been set to the
following according to real operating conditions:

a. Minimum Temperature: 50°C

b. Maximum Temperature: 70°C

c. Dwell time (Minimum temperature): 6 hours

d. Dwell time (Maximum temperature): 6 hours

e. Ramp up: 6 hours


f. Ramp down: 6 hours

Due to a lack of information, it is assumed that the day is divided evenly among the four
activities, so each part of the thermal cycle is allocated 6 hours.
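The thermal profile above can be written down as a small data structure to check that the four segments add up to the 24-hour cycle and to derive the quantities that follow from it. This is an illustrative sketch only: the class and field names are hypothetical and are not Sherlock inputs, and leap days are ignored when counting cycles over the service life.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThermalProfile:
    t_min_c: float
    t_max_c: float
    dwell_min_h: float
    dwell_max_h: float
    ramp_up_h: float
    ramp_down_h: float

    @property
    def cycle_hours(self) -> float:
        """Total duration of one thermal cycle in hours."""
        return self.dwell_min_h + self.dwell_max_h + self.ramp_up_h + self.ramp_down_h

    @property
    def delta_t_c(self) -> float:
        """Temperature swing of the cycle in deg C."""
        return self.t_max_c - self.t_min_c

    @property
    def ramp_rate_c_per_h(self) -> float:
        """Heating rate during ramp-up in deg C per hour."""
        return self.delta_t_c / self.ramp_up_h


# Values taken from the thermal profile listed above.
profile = ThermalProfile(50.0, 70.0, 6.0, 6.0, 6.0, 6.0)

SERVICE_LIFE_YEARS = 30  # service life from the reliability goals
cycles_in_service_life = SERVICE_LIFE_YEARS * 365 * 24 / profile.cycle_hours

print(profile.cycle_hours, profile.delta_t_c, cycles_in_service_life)
```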

2.5.2 Inputs

Sherlock retrieves many of its inputs from the design files of the CCA (usually stored in an ODB++
file). The ODB++ archives for Board_A and Board_B are imported into Sherlock to provide it
with all the information required for modeling and analyzing the boards. From the ODB++ file,
Sherlock is able to extract the following information as inputs.

1. Parts list - List of all parts defined for the CCA.

2. Stackup – Displays all the layers the CCA is composed of and their properties. In

addition, board properties are also shown based on the board outline and the individual

layer properties.

3. Layers – A layer viewer with a collection of graphical tools to review, analyze and

update circuit card information.

4. Pick & Place – Displays the pick and place data in the graphical layer viewer.

5. Drill Holes – Displays all the drill holes in the CCA.

6. Net List – Table containing all the net list information for the CCA.

The part properties for every part in the parts list need to be checked and confirmed before
performing any analysis. Sherlock relies on a number of critical properties, such as package names
and descriptions, to infer which parts are being analyzed. The software compares this
information with its internal databases and attempts to standardize the property values whenever
possible. However, to ensure that these property values are correct, the user needs to confirm the
properties before carrying out the analysis.

2.5.3 Analysis Modules

The Sherlock analysis modules attempt to predict the reliability of an electronic circuit card and its
components based on the circuit card design and the environmental conditions that the circuit card
is expected to experience over its expected service life. The Sherlock analysis modules include the
following:

1. Conductive Anodic Filament (CAF) failure analysis

2. Failure rate analysis

3. PTH fatigue analysis

4. Solder fatigue analysis

For this project, only the solder fatigue analysis is of importance and is performed for Board_B.

2.5.3.1 Solder Fatigue (Thermal Cycling) analysis

Solder joints provide the electrical, thermal and mechanical connections between a component and
the PCB. Board_A and Board_B contain thousands of solder joints. As mentioned previously,
the boards, being part of the object controller system, undergo thermal cycling once they are
installed in the field. During this thermal cycling, the PCB and the components
mounted on it expand or contract with the change in temperature. However, the rate of
expansion or contraction differs between the PCB and the components because of the difference in
CTE. This places the solder joint under stress, which damages the solder. Over time this damage
accumulates and leads to crack propagation, which in turn causes the solder joint to fail due to
solder joint fatigue. Figure 2-1 displays a TSOP package failing due to solder joint fatigue and
losing connection with the PCB substrate.

Figure 2-1: Solder joint fatigue failure on a TSOP package. (Source: CALCE News, September 1993)
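The CTE mismatch described above can be illustrated with a first-order estimate of the differential expansion between a part and the board. This is a simplified sketch and not the model used by Sherlock: it treats both bodies as free to expand, and the CTE values (16 ppm/°C for the PCB, 7 ppm/°C for the part) and the 2 mm part length are assumed, typical-order values chosen for illustration. The 20°C swing corresponds to the 50°C to 70°C cabinet range.

```python
import math


def cte_mismatch_um(length_mm: float, cte_pcb_ppm: float,
                    cte_part_ppm: float, delta_t_c: float) -> float:
    """Unconstrained differential expansion over the part length, in micrometres.

    delta_L = L * |alpha_pcb - alpha_part| * delta_T
    """
    length_um = length_mm * 1e3
    delta_alpha_per_c = abs(cte_pcb_ppm - cte_part_ppm) * 1e-6
    return length_um * delta_alpha_per_c * delta_t_c


# Assumed example values; only the 20 degC swing comes from the text.
mismatch = cte_mismatch_um(length_mm=2.0, cte_pcb_ppm=16.0,
                           cte_part_ppm=7.0, delta_t_c=20.0)
print(f"differential expansion: {mismatch:.2f} um per thermal cycle")
```

The solder joint has to absorb this displacement on every cycle, which is why larger parts, larger CTE differences and larger temperature swings all increase the fatigue damage.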

Solder joint fatigue can be influenced by the following:

Maximum temperature

Minimum temperature

Dwell time at maximum temperature

Component design (size, number of I/O, etc.)

Component material properties (CTE, elastic modulus, etc.)

Solder joint geometry (size and shape)

Solder joint material (SnPb, SAC305, etc.)

PCB thickness

PCB in-plane material properties (CTE, elastic modulus)

Sherlock Solder Fatigue analysis module makes use of the following input parameters to

perform solder joint fatigue failure analysis:

Life-Cycle Reliability Goals

Parts list

Circuit card mechanical properties (stackup data)

Component sizes and locations

Solder properties

Thermal events and associated thermal maps

Solder material: SAC305 (lead-free) or SnPb

Part temperature rise: 0°C

Part validation: Enabled

The sensitivity analysis for the Sherlock solder fatigue module gives a better understanding of
what affects the damage to the components. The procedure has been to take a single part from
Board_B and compare the result for the component with its original properties against the result
obtained with each property varied in turn. At any one point, only the property of interest is
changed while the others are kept the same, to ensure an accurate comparison of the results. The
properties that have an effect on the solder fatigue of the component are given below:

Package types: The larger the package of an electrical component, the higher the
damage. For instance, a part of package type “0805” (2 mm × 1.2 mm × 0.6 mm) will
suffer more damage than a part of package type “0603” (1.6 mm × 0.8 mm × 0.5 mm).

Material: Damage on components depends on the primary material of which the
part is made. For instance, a part made of barium titanate will suffer less damage
than a part made of alumina.

Pad size: The larger the pad size, the lower the damage on the component, and vice versa.

Solder material: Lead-free solder (SAC305) increases the damage on the
component more than tin-lead (63Sn37Pb) solder.

Stencil thickness: The thickness of the solder joint that connects the part to the PCB. A
thicker stencil results in lower damage to the part than a thinner stencil.

Part temperature rise: The higher the part temperature rise, the higher the
damage experienced by the component due to solder fatigue.

Dimension of the die: A larger die results in higher damage due to
solder fatigue, and vice versa.

Dimension of the flag: A larger flag results in higher damage due to
solder fatigue, and vice versa.

The life prediction curve rises steeply if the damage to the component is high, which corresponds
to a higher probability of failure (%). The x-axis represents the lifetime in years. The prediction
curve is a 2-parameter Weibull curve.
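A 2-parameter Weibull curve of this kind gives the cumulative probability of failure as F(t) = 1 - exp(-(t/η)^β), with scale parameter η and shape parameter β. The sketch below evaluates this expression. The parameter values are hypothetical and do not come from the Sherlock results; they only show how a smaller characteristic life shifts the curve towards earlier failures.

```python
import math


def weibull_failure_probability(t_years: float, eta_years: float, beta: float) -> float:
    """Cumulative probability of failure at time t for a 2-parameter Weibull model."""
    if t_years <= 0.0:
        return 0.0
    return 1.0 - math.exp(-((t_years / eta_years) ** beta))


# Hypothetical parameters: two parts with the same shape but different characteristic life.
for eta in (40.0, 120.0):
    pof = weibull_failure_probability(30.0, eta_years=eta, beta=3.0)
    print(f"eta = {eta:5.1f} years -> probability of failure at 30 years: {100 * pof:.1f} %")
```

At t = η the probability of failure is always 1 - 1/e, about 63.2%, regardless of β, which is why η is referred to as the characteristic life.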


3 Benchmark of Reliability Prediction standards

The reliability prediction standards used at RCS, Bombardier Transportation are identified after
classifying the current and potential reliability prediction standards.

Quite a few reliability standards are used to carry out reliability
prediction for electronic boards, components, etc. However, what has been missing is a benchmark
of reliability standards that would allow users to select the appropriate standard for the
prediction based on qualitative judgement. The benchmark is built upon a set of evaluation
criteria mapped against the selected standards, assisting the users in decision making.

3.1 Identification of reliability standards used at Bombardier Transportation

According to [10], between 1997 and 2015, 65 reliability predictions
were performed on various components at RCS BT. From [10], we can retrieve the standards
that were used for the predictions as well as their usage statistics. The standards that were used
during this time and their usage statistics are as follows:

IEC 62380 TR (Former RDF 2000): Total predictions: 48, Percent contribution: 73.85%

MIL-HDBK-217F-Notice2: Total predictions: 11, Percent contribution: 16.92%

Telcordia Issue 1 (Former Bellcore): Total predictions: 4, Percent contribution: 6.15%

Siemens SN 29500: Total predictions: 2, Percent contribution: 3.08%
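The percentage contributions follow directly from the prediction counts. The short sketch below recomputes them from the counts listed above; the dictionary and variable names are chosen here for illustration.

```python
# Number of predictions per standard, as listed above (source: [10]).
predictions = {
    "IEC 62380 TR": 48,
    "MIL-HDBK-217F-Notice2": 11,
    "Telcordia Issue 1": 4,
    "Siemens SN 29500": 2,
}

total = sum(predictions.values())

# Share of each standard in percent, rounded to two decimals.
shares = {name: round(100.0 * count / total, 2) for name, count in predictions.items()}

for name, share in shares.items():
    print(f"{name}: {predictions[name]} of {total} predictions ({share} %)")
```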

After the identification, all of these standards except Telcordia Issue 1, along with FIDES,
are selected for classification and comparison. FIDES is chosen for its potential
future use at Bombardier, since IEC 62380 may well be replaced
by FIDES as a de facto standard. In addition, FIDES allows users to take complex
mission profiles for the components into consideration. Since IEC 62380 is used for the majority of
the reliability predictions within Bombardier, it is included in the benchmark. MIL-HDBK-217F is one
of the oldest reliability standards and has been used by different industries for many decades. Even
though its use has become very limited due to technological advancements, it is still of interest to
see how the reliability model of this standard compares with its more recent counterparts. Siemens
SN 29500 is included in the benchmark due to its common use within the railway industry.

3.2 Benchmark Structure

The benchmark consists of five segments in the form of five spreadsheets. The first spreadsheet is
named “Classification and Comparison”. This is where the set of evaluation criteria is mapped
against the standards.

The “Component Mapping” spreadsheet displays the electronic components
and their variants that are included in the different standards. During reliability prediction, one
of the issues faced by the user is that the naming of component types varies between standards. This
worksheet adds value to the benchmark by listing the electronic components and
how they are addressed across the selected standards. This helps the user performing the
prediction to know which components are part of the benchmark and speeds up the
selection of the appropriate component in the tool during the prediction. The list of components
for this worksheet has been retrieved from [11] and [12].


The “Possible Problems & CoP” worksheet presents a list of components together with the possible
problems that may occur with them and their potential causes. The worksheet can be used in the
future by BT RCS to perform FMECA, RCA and preventive maintenance analysis. The repair center
can also use this information when repairing failures that have been reported in the field
for a specific component family type. The list of components addressed, relative to the
component mapping, is not necessarily exhaustive and can be extended if information on the
possible problems and their causes is found for the missing components.

Among all the standards, only IEC 62380 provides life expectancy information for a few
component families, together with their failure mode repartition. The list is included in the
worksheet “IEC 62380-Miscellaneous”. This worksheet also adds value by aiding
design and the scheduling of preventive maintenance. In addition, if a failure occurs in any of the
components in the list, the repair center can look at the repartition between the failure
modes to get a first idea of the probable cause of failure.

Finally, the “References” spreadsheet displays the references that are indispensable for the use

of the various standards.

3.3 Classification and Comparison

This section is dedicated to the “Classification & Comparison” segment of the benchmark and

includes Table 3-1.

Table 3-1: Classification and Comparison of reliability standards

Criterion | FIDES guide 2009 | IEC 62380 TR | Siemens SN 29500 | MIL-HDBK-217F-N2
Type | Guide (proposed as future IEC standard) | Standard | Standard | Standard
Status | To be continued | To be continued (until 2017) | To be continued (on Siemens initiative) | Discontinued
Last update | 2010 | 2004 | Various releases | 1995
Application domain | Railway included | Commercial application but not railway | Railway included | Specific to military use
Methodology | Families count analysis; Part count analysis; Part stress analysis | Only Part stress analysis | Only Part stress analysis | Part count analysis; Part stress analysis
Component life cycle phases | Permanent working; On/off cycling; Dormant application | Permanent working; On/off cycling; Dormant application | Not covered | Not covered
Environment | Dynamic | Dynamic | Categorized | Categorized
COTS components | Covered | Covered | Covered | Not covered
Principles of construction | Physics of Failure | Empirical | Empirical | Empirical
Mission/Life profile | Covered | Covered | Not covered | Not covered
Failures derived from development/manufacturing errors | Covered | No information | Not covered | Not covered
Electrical overstress | Covered | Covered | Covered | Covered (partial)
Mechanical overstress | Covered | Not covered | Covered | Not covered
Thermal overstress | Covered | Covered | Not covered | Covered
Process contributing factor (Component manufacturing factor) | Covered | Not covered | Not covered | Not covered
Process contributing factor (∏_Process factor) | Covered | Not covered | Not covered | Not covered
Humidity | Covered | Not covered | Not covered | Not covered
Lead-free soldering | Covered | Not covered | Not covered | Not covered
Package data | Covered | Covered | Not covered | Covered (negligible)
Conformal coating | Not covered | Not covered | Not covered | Not covered
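One way to use Table 3-1 for selecting a standard is to treat each row as a filter. The sketch below encodes a subset of the covered/not covered rows from the table and returns the standards that satisfy all requested criteria. The data structure and function name are hypothetical, and partial entries such as "Covered (partial)" are deliberately left out of this subset to keep the example unambiguous.

```python
STANDARDS = ("FIDES guide 2009", "IEC 62380 TR", "Siemens SN 29500", "MIL-HDBK-217F-N2")

# Subset of Table 3-1; True means "Covered", False means "Not covered".
COVERAGE = {
    "COTS components":       (True, True, True, False),
    "Mission/Life profile":  (True, True, False, False),
    "Mechanical overstress": (True, False, True, False),
    "Thermal overstress":    (True, True, False, True),
    "Humidity":              (True, False, False, False),
    "Lead-free soldering":   (True, False, False, False),
}


def standards_covering(*criteria: str) -> list:
    """Return the standards that cover every one of the given criteria."""
    return [
        name
        for index, name in enumerate(STANDARDS)
        if all(COVERAGE[criterion][index] for criterion in criteria)
    ]


print(standards_covering("Thermal overstress", "Mission/Life profile"))
print(standards_covering("Lead-free soldering"))
```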

The standards use different methodologies to perform reliability analysis. The terms that are part
of the methodology in the benchmark are explained in detail below:

Parts count method requires less information, generally part quantities, quality level

and the application environment. This method is applicable during the early design

phases and during proposal formulation. Usually, this method of prediction will result

in a more conservative estimate of system reliability than the Parts stress method.

Parts stress analysis method requires a greater amount of detailed information and is

applicable during the later phase when actual hardware and circuits are being designed.

The families count prediction method introduced in FIDES is particularly applicable
during the earliest phases of the project. This method can be used to produce a
reliability evaluation with the least amount of information about the product definition.
In particular, the technological description of items is very much simplified and
practically all application constraints are fixed at default values.

Physics of failure is a technique under the practice of Design for Reliability that uses the
knowledge and understanding of the processes and mechanisms that induce failure to
predict reliability and improve product performance.
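The parts count method described above reduces to a weighted sum: each part type contributes its quantity multiplied by a generic failure rate and a quality factor. The sketch below follows the general form used for parts count in MIL-HDBK-217F, λ = Σ N_i · λ_g,i · π_Q,i, with failure rates in failures per 10^6 hours. The part list and all numeric values are hypothetical and are not taken from the handbook or from the boards in this project.

```python
import math
from typing import NamedTuple


class PartType(NamedTuple):
    name: str
    quantity: int            # N_i: number of parts of this type
    generic_rate: float      # lambda_g: failures per 1E6 hours (hypothetical value)
    quality_factor: float    # pi_Q: quality factor (hypothetical value)


def parts_count_failure_rate(parts: list) -> float:
    """Equipment failure rate in failures per 1E6 hours (sum of N * lambda_g * pi_Q)."""
    return sum(p.quantity * p.generic_rate * p.quality_factor for p in parts)


# Hypothetical bill of materials for illustration only.
bom = [
    PartType("ceramic capacitor", 10, 0.002, 1.0),
    PartType("digital IC", 4, 0.010, 2.0),
]

rate_per_1e6_h = parts_count_failure_rate(bom)
mtbf_hours = 1e6 / rate_per_1e6_h
print(f"failure rate: {rate_per_1e6_h:.3f} per 1E6 h, MTBF: {mtbf_hours:.3g} h")
```

Because the method uses only quantities, quality levels and one environment, it needs far less input than the parts stress method, which is why it suits early design phases.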

3.4 Evaluation Criteria

The list of evaluation criteria that have been identified can be seen in Table 3-2.

Table 3-2: List of evaluation criteria

Evaluation Criteria

Type

Status



Last release/update

Benchmark version release

Next anticipated release/update/maintenance

Publisher

Objectives

Origins of data

Principles of construction

Methodology

Model coverage

Mathematical Model Type

Mathematical equation

Reliability metrics

Life/Mission Profile

Phases

Stresses

Environment

Possibility to consider additional environments

MIL-SPEC components

COTS components

Lead-free Process Factor

Bathtub curve coverage

Terms and definitions

Applicability indicators

Temperature cycling

Lead-free soldering

Solder joint failure rate

Conformal coating

Influence of environment

Composition

Warning/Limitations

Confidence level in the prediction

Package data

Software model


Vibration

Shock

Chemical

Covered Product Life Cycle Phases

Failure mode

Failure Distribution

Life expectancy

3.5 Evaluation of reliability standards based on the defined criteria

3.5.1 FIDES

FIDES Guide 2009 is the latest reliability prediction guide that is available as of now. It has been
produced under the supervision of DGA by companies in the FIDES Group. The Group consists of
the following companies: AIRBUS France, Eurocopter, Nexter Electronics, MBDA missile systems,
Thales Systèmes Aéroportés. Although the group is dominated by companies from the
field of aeronautics and defense, the guide covers a broader application domain.

The guide was first published in 2004 under the name FIDES Guide 2004 issue A,
which was later accepted by the French standardisation organization with the reference UTE C
80-811. The rationale behind the latest publication has been to take technological advancements
into consideration, to increase the coverage and to make improvements. The latest guide was
released on 2010-09-01.

The methodology takes into account failures that are derived from development or

manufacturing errors and overstresses such as electrical, thermal and mechanical. The methodology

also deals with non-functioning phases such as dormant application and genuine storage.

The evaluation method of FIDES does not consider the infant mortality and the wear out

periods of the components except for some special cases for some sub-assemblies [13].

The objectives of the creation of this standard have been:

To make a realistic evaluation of the reliability of the electronic products including

systems that encounter severe or non-aggressive environments (storage).

To provide a specific tool for the construction and control of this reliability.

To develop a new reliability assessment method for electronic components which takes

into consideration COTS and specific parts and new technologies.

3.5.2 IEC 62380

IEC 62380 TR is a reliability data handbook that is based on the French telecommunications
standard RDF 2000. This reliability handbook was released in 2004 and is defined as an
international standard by the International Electrotechnical Commission (IEC).

The IEC 62380 TR calculation model takes into consideration the influence of the environment.
The thermal cycling seen by the cards, as a function of the mission profiles undergone by the
equipment, replaces the environment factor, which is difficult to evaluate. These models can handle
permanent working, on/off cycling and dormant applications. In addition, the failure rate related to
the component soldering is included in the component failure rate [14].

The initiating motivation of IEC 62380 TR has been to take the influence of the environment
into consideration more effectively.

3.5.3 Siemens SN29500

Siemens SN29500 is a reliability standard used by Siemens AG and the Siemens companies as a

uniform basis for reliability predictions.

The initiating motivation for this reliability standard has been the customer requirement to
demonstrate the reliability calculation of Siemens products. Another motivation has
been to write a reliability engineering guide that provides an engineering process and tools to
improve reliability in the development of new electronic systems.

Siemens SN 29500 is based on the IEC standard IEC 61709. The standard comes as individual
documents for specific component groups, 12 in total. Instead of updating the whole standard at
once, Siemens updates individual documents as needed.

The IEC 61709 standard is intended for reliability prediction of electronic components. The

standard describes how to state and use data belonging to an organization in order to perform

reliability predictions [15]. The standard can also be used by an organization to set up a failure rate

database and to describe the reference conditions for which field failure rates should be stated [15].

3.5.4 MIL-HDBK-217F-Notice 2

The purpose of this handbook has been to establish and maintain consistent and uniform methods
for estimating the inherent reliability of military electronic equipment and systems. Acquisition
programs for military electronic systems and equipment needed a
common basis for reliability predictions, hence the creation of this handbook. MIL-HDBK-217F
also makes it possible to compare and evaluate reliability predictions of related or competitive
designs. The handbook is intended to be used as a tool to increase the reliability of the
equipment being designed.

MIL-HDBK-217F is becoming obsolete as the technology coverage of electronic products and
systems widens. MIL-HDBK-217F is very pessimistic for components that
are not MIL-SPEC. However, it is now commonplace for the military and avionics
industries to use COTS components when building their systems.

3.6 References across different standards

From the “References” worksheet it can be seen that both the FIDES and Siemens standards
refer to “IEC 60050 (191) A1 (1999-03) Electrotechnical vocabulary - Chapter 191: operating
dependability and service quality” as well as “IEC 61709 Electronic components - Reliability –
Reference conditions for failure rates and stress influence models for conversion”. FIDES has also
used data from the “Military standard mil-hdbk-217F (+notice 1 & notice 2)” and the “UTE C 80-810
RELIABILITY DATA HANDBOOK: RDF 2000 – A universal model for reliability prediction
calculations for components, electronic boards and equipment” (currently IEC 62380), both of
which are part of the benchmark. Siemens SN 29500 and IEC 62380 TR both use IEC 60747 as a
reference. As for the military standard MIL-HDBK-217F2, most of its reference documents are specific
to different components, which the standard has compiled to create the failure database.

The benchmark can be viewed in the file attached in Appendix A.


4 Theoretical Reliability Prediction

Two products within RCS, BT have been identified for which reliability features are predicted. The

reliability prediction was performed for Board_A, Version 2.5 and Board_B, Version 1.4 using

selected relevant standards. The prediction is performed using the software ITEM QT. The

standards that have been implemented are FIDES, IEC 62380, MIL-HDBK-217F2 and Siemens SN

29500.

4.1 Input parameters and Assumptions

This section displays certain input parameters and assumptions that are used during the prediction

for both Board_A and Board_B across the chosen standards. The input parameters and

assumptions are made in accordance to the reference condition. Components across the different

standards are mapped accordingly.

4.1.1 FIDES

4.1.1.1 Life Profile

The life profile for the reliability prediction performed on Board_A and Board_B has been set to
mimic the real conditions that the boards experience in the field after they have been installed. The
boards are assumed to be in a permanent working mode, with no standby time due to repair.

The life profile parameters are as follows:

Permanent Working Phase: On

Calendar Time Hours: 8760 (represents 1 year)

Ambient Temperature: 25°C

Relative Humidity: 0 % (due to the board being powered on 100% of the time)

Temperature Amplitude, ΔT: 10°C

Number of Cycles Per year: 365 (1 cycle/day)

Cycle duration: 24 hours

Maximum temperature during cycling: 45°C

Random Vibration: 0 Grms (Assumed)

Component Junction Temperature: 60°C

4.1.1.2 General Input Parameters

The placement is selected to be “Non-interface” Digital function.

Default value is assumed for the “Ruggedizing calculation mode” and the “Process factor

calculation mode”.

Manufacturer Quality Assurance Level and Component Quality Assurance Level are set to

“Equivalent” which stands second best amongst four different levels. The Component Reliability

Assurance Level is set to “Very Reliable – Level B”. The Manufacture Experience Factor is chosen to

be “Recognised manufacturer: Mature processes for the item considered”.


4.1.1.3 Constraints

Since FIDES is almost 7 years old, the most recent packages are not included in the standard.

4.1.2 IEC 62380

The assumptions made during the prediction using the IEC 62380 standard are kept similar to those
made during the earlier predictions of Board_A and Board_B. IEC 62380 is the only
standard among those used here that provides information on life expectancy for the
different components.

4.1.2.1 Mission Profile

“Ground, stationary: weather protected” and “Permanent working” are used as the mission profile.
The night and day temperature difference has been set to 10°C with 365 cycles per year.

The non-interface setting is used for the electrical environment since the boards are inside the
cabinet and do not have any cables going outside.

The average outside ambient temperature t_ae is set to 25°C and the average ambient
temperature of the board near the components t_ac is 40°C.


4.1.2.2 General Input Parameters

The general input parameters that have been used are:

Junction Temperature Estimation Mode: Junction-Ambient

Air Flow Type: Natural Convection

Function/Electrical Environment: Non Interface

Year of Manufacturing: Board_A (2008), Board_B (2014)

4.1.2.3 Constraints

IEC 62380 is unable to model QFN and BGA packages with 0.8 mm pitch. A lot of the packages are

missing and there is a limitation to the number of transistors and memory bits.

4.1.3 Siemens SN 29500


4.1.3.1 Mission Profile

No mission profile is available in this standard.


The general input parameters are as follows:

Junction Temperature Calculation Mode: Junction Temperature User Input

Junction Temperature, Input: 60°C

Stress Profile: Disabled

Integrated Circuits, Operating Time: 3000 hours [Default] [Maximum value]


4.1.3.3 Constraints

Failures intrinsic to the PCB cannot be modelled in Siemens SN 29500. Instead, the PCB block contains the
failures related to the connections for both the boards.

The absence of temperature cycling and of a mission/life profile prevents mimicking the real scenario
that the predicted products undergo.

Advanced IC packages cannot be modelled.

4.1.4 MIL-HDBK-217F2


No mission profile is available in this standard.


Application

o Repair Mode: Non-repairable

o Environment: Ground, Benign

o MTTR: 0 hour

o Number of Standby: 0

o Ambient Temperature: 40°C (component ambient temperature inside cabinet)

o Voltage Stress: 0,8 (default)

o Current Stress: 0,7 (default)

o Power Stress: 0,75 (Default)

o Adjustment Factor: 1

o Connection type: Reflow Solder

Physical

o Technology: CMOS

o Package Type IC: Surface Mount Tech

o Quality, Microelectronics: Commercial or Unknown

o Number of Gates: 60000 (Maximum)

o Number of Transistors: 10000 (Maximum)

o Number of years in production: Board_A - 8 years, Board_B - 2 years

o Theta Case/Ambient: 40°C

o Theta junction Case: 60°C

o Quality, Other: Lower

o Quality Capacitors: Commercial or Unknown

4.1.4.3 Constraints

Because MIL-HDBK-217F was last published nearly 20 years ago, it has not kept up with
technological advancements. The standard therefore lacks input parameters and models which are
essential for a good prediction. A few of these constraints are mentioned in this section.

Failures on the PCB of the products could not be modelled, since failures intrinsic to the PCB are
not modelled in this standard.

The voltage converter used in Board_A is not modelled.



Board_A and Board_B contain linear microcircuits with more than 10,000 transistors, which exceeds
the limit of the standard.

Bipolar and MOS circuits are limited to 60,000 gates and memory devices are limited to 1 million
bits, which is far less than what is actually present on both the boards used for reliability prediction.

There is no prediction model for Flash memories and FPGAs. Hence, the flash memories and
the FPGA used in Board_B and Board_A could not be modelled in the prediction.

Mission/life profile and temperature cycling are not modelled in this standard, which does not
allow mimicking the exact conditions that the products undergo.

4.2 Prediction outcome

This section is dedicated to the outcome of the reliability prediction for the Board_A and the

Board_B.

To perform reliability prediction of the Board_A, all the components are divided into 7 blocks.

The blocks are as follows: Active, Block S, Connectors, Power, Passive, 8-Layer Equipped PCB and

Miscellaneous. Amongst the abovementioned blocks, the “Miscellaneous” block is part of the

analysis but as it has no impact on board operability, it is not taken as a contributor to the MTTF.

The block and the components within are presented in “Italics” in the figures. The block partitioning

and the classification of components for the Board_A can be found in [16].

The block partitioning and the classification of components for the Board_B can be found in
[17]. None of the selected reliability standards allows for performing prediction of lead-free boards.
However, FIDES 2009 has a small section discussing the consequences for reliability of the
transition to a lead-free manufacturing process. FIDES proposes to calculate the failure rate of a
product manufactured using the lead-free process as the product of the original failure
rate, the part manufacturing factor, the process manufacturing factor and the lead-free process factor
∏_LF. The lead-free process factor varies from 1 (for a mature process) to 2 (for a
process for which no precautions were taken). For the prediction on the Board_B that
was done in [17], the PCB failure rate was multiplied by a lead-free process factor of ∏_LF=2.
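The lead-free adjustment described above can be written compactly as follows. The symbols $\lambda_{0}$, $\Pi_{PM}$ and $\Pi_{Process}$ are shorthand chosen here for the original failure rate, the part manufacturing factor and the process manufacturing factor; only $\Pi_{LF}$ is notation taken from the text.

```latex
\lambda_{LF} = \lambda_{0} \cdot \Pi_{PM} \cdot \Pi_{Process} \cdot \Pi_{LF},
\qquad 1 \le \Pi_{LF} \le 2
```

Under this reading, a factor of $\Pi_{LF} = 2$ doubles the failure rate relative to the same item built with a mature process ($\Pi_{LF} = 1$).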

The prediction results are displayed in three stages for both the boards.

The output from Stage 1 shows the failure rate of each block and their contribution to the failure

of the whole board. We refer to this as “Board X”. The X is replaced by Board_A and Board_B with

respect to the outcome displayed.

The output from Stage 2 is the prediction results on functional level showing all the components

in each block, their quantity, their failure rate and their contribution to the failure of the whole

board. We refer to this as “Board X Functional Level”.

The Stage 3 output displays all the component types used in the board, their quantity, their
failure rate and their contribution to the total failure rate of the board.
We refer to this as “Board X Component Level”.
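The Stage 1 contribution figures can be reproduced with a few lines of Python. This is only a sketch: the function name `block_contributions` is hypothetical, and it assumes that the contribution of a block is its failure rate divided by the sum over all blocks except “Miscellaneous”, which is consistent with the FIDES block-level values reported for Board_A.

```python
def block_contributions(block_fits, excluded=("Miscellaneous",)):
    """Return (total_fit, {block: percent}); excluded blocks count as 0 %."""
    # Sum only the blocks that affect board operability.
    total = sum(fr for name, fr in block_fits.items() if name not in excluded)
    shares = {
        name: (0.0 if name in excluded else 100.0 * fr / total)
        for name, fr in block_fits.items()
    }
    return total, shares


# FIDES block-level failure rates for Board_A in FITs (values from the thesis).
fides_board_a = {
    "8-layer Equipped PCB": 2.5,
    "Active": 167.0,
    "Block S": 293.0,
    "Power": 906.0,
    "Passive": 190.0,
    "Connectors": 25.3,
    "Miscellaneous": 180.0,  # analysed, but not counted towards the total
}

total_fit, shares = block_contributions(fides_board_a)
for name, pct in shares.items():
    print(f"{name:22s} {pct:5.1f} %")
print(f"Sum: {total_fit:.1f} FITs")
```

With these inputs the Power block comes out at about 57 % of a total of roughly 1584 FITs, matching the reported figures.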

It shall be noted that for IEC 62380, SN 29500 and MIL-HDBK-217F2, the model for the

integrated circuits is limited due to insufficient IC packages.

4.2.1 Board_A

According to Figure 4-1, the MTBF of Board_A for the different standards is as follows:

FIDES: 72.1 years


IEC 62380: 95.8 years

SN 29500: 95.9 years

MIL-HDBK-217F2: 5.5 years

Amongst all the results, it is clear that MIL-HDBK-217F2 gives the most conservative
result.
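The MTBF values in years follow from the total failure rate in FITs. The sketch below assumes a constant failure rate (so that MTBF is the reciprocal of the failure rate), the usual definition of 1 FIT as one failure per 10^9 hours, and 8760 hours per year; the function name `mtbf_years` is hypothetical. The input sums are the block-level totals reported for Board_A.

```python
HOURS_PER_YEAR = 8760  # 365 days x 24 h (assumption, not stated in the text)


def mtbf_years(total_fit):
    """Convert a failure rate in FITs (failures per 1e9 h) to MTBF in years."""
    mtbf_hours = 1e9 / total_fit  # constant failure rate assumed
    return mtbf_hours / HOURS_PER_YEAR


# Block-level failure rate sums for Board_A in FITs (values from the thesis).
board_a_sums = {
    "FIDES": 1583.8,
    "IEC 62380": 1192.2,
    "SN 29500": 1190.4,
    "MIL-HDBK-217F2": 20888.0,
}

board_a_mtbf = {name: mtbf_years(fit) for name, fit in board_a_sums.items()}
for name, years in board_a_mtbf.items():
    print(f"{name:15s} {years:5.1f} years")
```

Rounded to one decimal, this reproduces the four MTBF values listed above.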

Figure 4-1: Block level prediction of Board_A

Upon closer inspection, even though Siemens SN 29500 and IEC 62380 give very similar results, and
FIDES differs from them far less than MIL-HDBK-217F2 does, the standards differ in
which blocks contribute the most.

It shall be noted that, according to the benchmark that has been produced during the course of
this thesis, only FIDES and IEC 62380 allow thermal cycling to be taken into consideration
during reliability prediction, while SN 29500 and MIL-HDBK-217F2 do not.

Since the PCB intrinsic failure rate is not modelled in MIL-HDBK-217F, the standard is more
optimistic in this respect than the standards where it is modelled.

For FIDES, the largest contribution to the failure rate comes from Block Power, with a value of 906 FITs,
which is approximately twice the failure rate of the block in IEC 62380, 12x greater than the value
in SN 29500 and nearly 4x greater than the FR in the MIL standard.

For IEC 62380, the trend is the same as that of FIDES, with Block Power and Block S
contributing the most with values of 40.85% and 31.96% respectively.

Block Passive and Block S dominate the failure rate in SN 29500, contributing
46.96% and 31% of the total failure rate of the board. For MIL-HDBK-217F, Block S
contributes the most to the failure of the board with 42.37%, followed by the Active block
with 30.83%.

One very interesting observation from the outcome of the prediction is the difference in the values
of the PCB failure rate between the different standards. For Siemens SN 29500, the
value provided as the failure rate of the PCB is due to the connections rather than the intrinsic
failure of the PCB. The intrinsic failure of the PCB is well modelled in FIDES and IEC 62380, where it
contributes 0.2% and 9.8% respectively.

The Stage 2 output showing the predicted values for the board can be seen in Figure 4-2.

Figure 4-1 data (Board_A, block level):

Blocks                  FIDES               IEC 62380           SN 29500            MIL-HDBK-217F2
                        FR [FITs] Ctrbn(%)  FR [FITs] Ctrbn(%)  FR [FITs] Ctrbn(%)  FR [FITs] Ctrbn(%)
8-layer Equipped PCB    2,5       0,2%      117,0     9,8%      51,0      4,3%      No Model  0,0%
Active                  167,0     10,5%     36,5      3,1%      124,0     10,4%     6440,0    30,8%
Block S                 293,0     18,5%     381,0     32,0%     369,0     31,0%     8850,0    42,4%
Power                   906,0     57,2%     487,0     40,8%     75,5      6,3%      252,0     1,2%
Passive                 190,0     12,0%     95,5      8,0%      559,0     47,0%     4710,0    22,5%
Connectors              25,3      1,6%      75,2      6,3%      11,9      1,0%      636,0     3,0%
Miscellaneous           180,0     0,0%      648,0     0,0%      400,0     0,0%      782,0     0,0%
Sum                     1583,8    100%      1192,2    100%      1190,4    100%      20888,0   100%
MTBF (Years)            72,1                95,8                95,9                5,5



Figure 4-2: Functional level prediction of Board_A

Figure 4-2 gives a much better overview of which components contribute the most to
the failure of the board across the different standards.

The voltage converter used in the Board_A has a big impact on the failure rate of the board
according to FIDES and IEC 62380. Unfortunately, this component could not be modelled in
MIL-HDBK-217F2, which is another constraint of the standard. Integrated circuits have quite a big
impact as well across all four standards. However, the failure rates obtained for the ICs would be
different for SN 29500 and MIL-HDBK-217F, given that these standards have limitations in
their input parameters due to their age.

Stage 3 of the prediction on component level for Board_A can be seen in Figure 4-3.


Component               Quantity  FIDES               IEC 62380           SN 29500            MIL-HDBK-217F2
                                  FR [FITs] Ctrbn(%)  FR [FITs] Ctrbn(%)  FR [FITs] Ctrbn(%)  FR [FITs] Ctrbn(%)
8-layer Equipped PCB    1         2,5       0,2%      117,0     9,8%      51,0      4,3%      No Model  0,0%
CNY17 Optocoupler/Fuse  4         74,7      4,7%      13,7      1,1%      60,0      5,0%      40,0      0,2%
Integrated Circuits     10        92,3      5,8%      22,8      1,9%      64,0      5,4%      6400,0    30,6%

Oscillator              1         47,3      3,0%      45,4      3,8%      30,0      2,5%      114,0     0,5%
Voltage Converter       2         852,0     53,8%     454,0     38,1%     48,9      4,1%      No Model  0,0%

Capacitor               298       146,1     9,2%      45,0      3,8%      420,1     35,3%     3310,4    15,8%
Inductor/Transformer    25        2,5       0,2%      32,3      2,7%      39,6      3,3%      96,1      0,5%
Power Switch            1         4,5       0,3%      12,9      1,1%      32,0      2,7%      430,0     2,1%
Resistor                200       36,9      2,3%      5,3       0,4%      67,3      5,7%      872,0     4,2%
Connectors              7         25,3      1,6%      75,2      6,3%      11,9      1,0%      636,0     3,0%
Transistor              5         26,1      0,0%      6,0       0,0%      35,5      0,0%      23,5      0,0%
LED                     8         7,5       0,0%      561,0     0,0%      48,4      0,0%      9,4       0,0%
Switch                  2         7,8       0,0%      55,8      0,0%      12,9      0,0%      430,5     0,0%

Diode                   13        103,0     0,0%      11,0      0,0%      222,0     0,0%      68,8      0,0%
Sum                     559       1584      100%      1192      100%      1190      100%      20891     100%
MTBF (Years)                      72,1                95,8                95,9                5,5

Block labels in the figure: Active, Block S, Power, Passive, Miscellaneous.


Figure 4-3: Component level prediction of Board_A

The output in Figure 4-3 gives an in-depth view of what we saw in the outputs from Stage 1 and
Stage 2. The output confirms that the integrated circuits and the voltage converter are among the major
contributors to the failure of the Board_A. According to the comparatively older standards, SN
29500 and MIL-HDBK-217F2, capacitors contribute considerably to the failure of the board as well.
The italicized components belong to the “Miscellaneous” block. They are part of the analysis but, as they
have no impact on board operability, they are not taken as contributors to the MTTF.

4.2.2 Board_B

The result of the reliability prediction of Board_B can be seen in Figure 4-4. According to the results,
the MTBF of the Board_B across the different standards is as follows:

FIDES: 33.2 years

IEC 62380: 35.5 years

SN 29500: 25.1 years

MIL-HDBK-217F: 1.5 years


Figure 4-3 data (Board_A, component level):

Component               Quantity  FIDES               IEC 62380           SN 29500            MIL-HDBK-217F2
                                  FR [FITs] Ctrbn(%)  FR [FITs] Ctrbn(%)  FR [FITs] Ctrbn(%)  FR [FITs] Ctrbn(%)
Capacitor               298       146,1     9,2%      45,0      3,8%      420,1     35,3%     3310,4    15,8%
Connectors              7         25,3      1,6%      75,2      6,3%      11,9      1,0%      636,0     3,0%
Inductor/Transformer    25        2,5       0,2%      32,3      2,7%      39,6      3,3%      96,1      0,5%
Integrated Circuit      20        392,0     24,8%     391,1     32,8%     429,6     36,1%     15392,0   73,7%
Optocoupler/Fuse        4         74,7      4,7%      13,7      1,1%      60,0      5,0%      40,0      0,2%
Oscillator              1         47,3      3,0%      45,4      3,8%      30,0      2,5%      114,0     0,5%
PCB                     1         2,5       0,2%      117,0     9,8%      51,0      4,3%      No Model  0,0%
Power Switch            1         4,5       0,3%      12,9      1,1%      32,0      2,7%      430,0     2,1%
Resistor                200       36,9      2,3%      5,3       0,4%      67,3      5,7%      872,0     4,2%
Voltage Converter       2         852,0     53,8%     454,0     38,1%     48,9      4,1%      No Model  0,0%
Transistor              5         26,1      0,0%      6,0       0,0%      35,5      0,0%      23,5      0,0%
LED                     8         7,5       0,0%      561,0     0,0%      48,4      0,0%      9,4       0,0%
Switch                  2         7,8       0,0%      55,8      0,0%      12,9      0,0%      430,5     0,0%

Diode                   13        103,0     0,0%      11,0      0,0%      222,0     0,0%      68,8      0,0%
Sum                     559       1584      100%      1192      100%      1190      100%      20891     100%
MTBF (Years)                      72,1                95,8                95,9                5,5



Figure 4-4: Block level prediction of Board_B

In FIDES, the failure rate contribution is distributed almost evenly among the three
processor blocks, MMI, IC_B and the Back end connectors.

In IEC 62380, nearly one-third of the contribution towards the failure of the board is predicted
to be due to the PCB. This is a huge difference compared to Siemens SN 29500, where only the failures related
to the connections within the board are taken into consideration. The MMI and the CPU B also
contribute a considerable share of the failure rate according to IEC 62380. The abovementioned
blocks remain major contributors to the failure rate according to SN 29500 and
MIL-HDBK-217F as well.


Blocks                  FIDES               IEC 62380           SN 29500            MIL-HDBK-217F2
                        FR [FITs] Ctrbn(%)  FR [FITs] Ctrbn(%)  FR [FITs] Ctrbn(%)  FR [FITs] Ctrbn(%)
PCB                     6,1       0,2%      1010,0    31,4%     111,0     2,4%      No Model  0%
Power                   246,0     7,1%      90,0      2,8%      477,0     10,5%     8840,0    11,2%
Reset                   102,0     3,0%      13,7      0,4%      88,7      1,9%      4950,0    6,3%
Sub-rack links          169,0     4,9%      93,8      2,9%      256,0     5,6%      6040,0    7,7%
Front End switch        200,0     5,8%      124,0     3,9%      186,0     4,1%      3580,0    4,6%
MMI                     444,0     12,9%     398,0     12,4%     501,0     11,0%     12300,0   15,6%
CPU S                   470,0     13,7%     415,0     12,9%     849,0     18,6%     9940,0    12,6%
IC_B                    387,0     11,2%     214,0     6,7%      260,0     5,7%      2520,0    3,2%
CPU A                   397,0     11,5%     202,3     6,3%      694,0     15,2%     8680,0    11,0%
CPU B                   431,0     12,5%     344,0     10,7%     665,0     14,6%     11300,0   14,4%
Back end links          144,0     4,2%      41,6      1,3%      159,0     3,5%      2840,0    3,6%
Back end connectors     446,0     13,0%     266,0     8,3%      306,0     6,7%      7650,0    9,7%
Sum                     3442      100%      3212      100%      4553      100%      78640     100%
MTBF (Years)            33,2                35,5                25,1                1,5



Component               Quantity  FIDES               IEC 62380           SN 29500            MIL-HDBK-217F2
                                  FR [FITs] Ctrbn(%)  FR [FITs] Ctrbn(%)  FR [FITs] Ctrbn(%)  FR [FITs] Ctrbn(%)
PCB                     1         6,1       0,2%      1010,0    31,4%     111,0     2,4%      No Model  0,0%
Diode                   8         15,2      0,4%      12,8      0,4%      136,0     3,0%      41,0      0,1%
Capacitor               70        85,7      2,5%      25,8      0,8%      151,3     3,3%      1260,0    1,6%
Fuse                    1         1,1       0,0%      10,0      0,3%      25,0      0,5%      10,0      0,01%
Inductor                16        1,8       0,1%      14,6      0,5%      24,0      0,5%      2,0       0,003%
Transistor              2         0,3       0,0%      2,4       0,1%      12,2      0,3%      44,7      0,1%
Resistor                106       35,1      1,0%      2,9       0,1%      35,6      0,8%      462,0     0,6%
Integrated circuit      12        106,7     3,1%      21,6      0,7%      92,4      2,0%      7013,0    8,9%
Capacitor               8         10,0      0,3%      3,0       0,1%      17,7      0,4%      75,1      0,1%
Resistor                28        5,2       0,2%      0,7       0,0%      9,4       0,2%      122,0     0,2%

Capacitor               38        62,2      1,8%      13,4      0,4%      139,5     3,1%      685,1     0,9%
Inductor                4         2,3       0,1%      9,1       0,3%      6,0       0,1%      0,5       0,001%
Connector               3         32,4      0,9%      6,2       0,2%      2,4       0,1%      630,0     0,8%
Resistor                79        14,5      0,4%      2,1       0,1%      26,6      0,6%      344,0     0,4%
Transformer             4         1,0       0,0%      25,0      0,8%      12,6      0,3%      372,0     0,5%

Oscillator              1         10,4      0,3%      22,7      0,7%      30,0      0,7%      783,0     1,0%
Capacitor               46        73,4      2,1%      17,0      0,5%      99,4      2,2%      830,1     1,1%
Connector               4         18,0      0,5%      28,4      0,9%      2,0       0,0%      424,1     0,5%
Inductor                9         3,0       0,1%      8,2       0,3%      13,5      0,3%      1,1       0,001%
Oscillator              1         50,3      1,5%      22,7      0,7%      30,0      0,7%      783,0     1,0%
Resistor                72        20,0      0,6%      1,9       0,1%      10,8      0,2%      314,0     0,4%
Transformer             2         0,5       0,0%      12,5      0,4%      6,3       0,1%      186,0     0,2%

LED                     12        9,6       0,3%      210,0     6,5%      72,7      1,6%      15,7      0,02%
Diode                   6         178,4     5,2%      25,9      0,8%      28,2      0,6%      44,6      0,1%
Capacitor               47        35,1      1,0%      15,3      0,5%      92,7      2,0%      703,1     0,9%
Connector               2         31,4      0,9%      13,9      0,4%      2,0       0,0%      106,3     0,1%
Resistor                65        11,6      0,3%      1,8       0,1%      21,9      0,5%      283,0     0,4%

Oscillator              6         62,3      1,8%      88,6      2,8%      135,0     3,0%      4700,0    6,0%
Switch                  1         2,7       0,1%      6,4       0,2%      12,0      0,3%      430,0     0,5%
Capacitor               137       169,6     4,9%      51,7      1,6%      301,1     6,6%      2468,0    3,1%
Inductor                3         1,7       0,0%      2,7       0,1%      4,5       0,1%      0,4       0,0005%
Connector               1         20,1      0,6%      8,2       0,3%      1,0       0,0%      210,0     0,3%
Resistor                131       24,1      0,7%      3,3       0,1%      44,1      1,0%      397,6     0,5%

Oscillator              2         50,3      1,5%      45,4      1,4%      60,0      1,3%      540,0     0,7%
LED                     1         0,9       0,0%      17,5      0,5%      6,1       0,1%      0,9       0,001%
Capacitor               67        97,3      2,8%      25,4      0,8%      148,0     3,3%      1210,0    1,5%
Connector               1         20,1      0,6%      8,2       0,3%      1,0       0,0%      210,0     0,3%
Resistor                22        4,1       0,1%      0,6       0,0%      7,4       0,2%      95,9      0,1%

Capacitor               129       152,0     4,4%      48,9      1,5%      285,0     6,3%      2330,0    3,0%
Inductor                3         1,7       0,0%      2,7       0,1%      4,5       0,1%      0,4       0,0005%
Connector               1         20,1      0,6%      8,2       0,3%      1,0       0,0%      210,0     0,3%
Resistor                93        17,1      0,5%      2,5       0,1%      31,3      0,7%      122,0     0,2%

Oscillator              1         10,4      0,3%      22,7      0,7%      30,0      0,7%      783,0     1,0%
Resistor                80        14,7      0,4%      2,1       0,1%      26,9      0,6%      349,0     0,4%

Oscillator              1         10,4      0,3%      22,7      0,7%      30,0      0,7%      270,0     0,3%
Capacitor               168       236,9     6,9%      61,8      1,9%      371,0     8,2%      3030,0    3,9%
Connector               1         20,1      0,6%      8,2       0,3%      1,0       0,0%      210,0     0,3%
Inductor                2         1,1       0,0%      1,8       0,1%      3,0       0,1%      0,3       0,0003%
Capacitor               29        41,2      1,2%      10,2      0,3%      60,7      1,3%      523,1     0,7%
Inductor                3         1,7       0,0%      2,5       0,1%      4,5       0,1%      0,4       0,0005%
Resistor                71        13,1      0,4%      2,8       0,1%      23,9      0,5%      310,0     0,4%
Transformer             3         0,8       0,0%      2,7       0,1%      9,5       0,2%      279,0     0,4%

Oscillator              1         50,3      1,5%      4,5       0,1%      30,0      0,7%      783,0     1,0%

Block labels in the figure: MMI, CPU S, Power, Reset, Sub-rack links, Front End switch, IC_B, CPU A, CPU B, Back end links.

25

Figure 4-5: Functional level prediction of Board_B

Figure 4-5 and Figure 4-6 display the functional level prediction and the component level
prediction of Board_B respectively.

In IEC 62380, it is not possible to model integrated circuits with BGA or QFN packages of 0.8
mm pitch. However, integrated circuits with these package types are present in certain blocks of
Board_B. These integrated circuits are thus mapped in IEC 62380 to the best case possible. The
failure rates of these specific integrated circuits are marked in light orange in Figure 4-5. The
same colour code is used in Figure 4-6 for the accumulated failure rate of the integrated circuits
modelled in IEC 62380.

In FIDES, the rated voltage of all the capacitors is assumed to be 10 V. The accumulated
failure rate of the capacitors used in the Board_B can be seen in Figure 4-6.

Figure 4-6: Component level prediction of Board_B

According to all the standards, integrated circuits contribute substantially to the
failure rate of Board_B. FIDES and Siemens SN 29500 mark capacitors as big contributors as well.

Figure 4-5 data, continued (Board_B, functional level; block label: Back end connectors):

Component               Quantity  FIDES               IEC 62380           SN 29500            MIL-HDBK-217F2
                                  FR [FITs] Ctrbn(%)  FR [FITs] Ctrbn(%)  FR [FITs] Ctrbn(%)  FR [FITs] Ctrbn(%)
Optocoupler             2         37,4      1,1%      30,7      1,0%      100,0     2,2%      30,9      0,04%
Capacitor               15        18,8      0,5%      5,7       0,2%      33,1      0,7%      81,2      0,1%
Connector               12        251,0     7,3%      169,0     5,3%      36,0      0,8%      2520,0    3,2%
Transistor              2         0,3       0,0%      1,2       0,0%      12,2      0,3%      122,0     0,2%
Resistor                51        9,3       0,3%      1,8       0,1%      17,2      0,4%      222,0     0,3%

Sum                     1774      3440      100%      3212      93%       4550      100%      78594     100%
MTBF (Years)                      33,2                35,5                25,1                1,5

Figure 4-6 data (Board_B, component level):

Component               Quantity  FIDES               IEC 62380           SN 29500            MIL-HDBK-217F2
                                  FR [FITs] Ctrbn(%)  FR [FITs] Ctrbn(%)  FR [FITs] Ctrbn(%)  FR [FITs] Ctrbn(%)
Capacitor               754       982,00    28,546%   278,40    8,668%    1699,53   37,35%    13195,70  16,79%
Connector               25        413,15    12,010%   250,07    7,786%    46,40     1,02%     4520,42   5,75%
Diode                   14        193,61    5,628%    38,62     1,202%    164,15    3,61%     85,58     0,11%
Fuse                    1         1,05      0,031%    10,00     0,311%    25,00     0,55%     10,00     0,01%
Inductor                40        13,28     0,386%    41,62     1,296%    60,00     1,32%     5,08      0,01%

LED                     13        10,57     0,307%    227,50    7,083%    78,76     1,73%     16,59     0,02%
Optocoupler             2         37,40     1,087%    30,70     0,956%    100,00    2,20%     30,90     0,04%
Oscillator              13        244,40    7,104%    229,33    7,140%    345,00    7,58%     8642,00   11,00%
PCB                     1         6,10      0,177%    1010,00   31,446%   111,00    2,44%     No Model  0,00%
Resistor                798       169,01    4,913%    22,69     0,706%    255,16    5,61%     3021,50   3,84%
Switch                  1         2,69      0,078%    6,43      0,200%    12,00     0,26%     430,00    0,55%
Transformer             9         2,26      0,066%    40,23     1,253%    28,37     0,62%     837,00    1,06%
Transistor              4         0,51      0,015%    3,61      0,112%    24,40     0,54%     166,73    0,21%
Sum                     1774      3440      100%      3212      100%      4550      100%      78594     100%
MTBF (Years)                      33,2                35,5                25,1                1,5

4.3 Theoretical Prediction Analysis

Amongst all the selected standards, FIDES uses a prediction model that is based on physics of
failure, whereas the other standards use prediction models that are empirical and are based on
statistical interpretation of test data analysis and previous models. Physics of failure is a technique
to predict reliability and improve product performance. The technique utilizes the knowledge and
understanding of the processes and mechanisms that induce failure. Contrary to the other reliability
standards, which use empirical modelling of operational feedback to build their models, FIDES uses
physics of failure to build its models, which are later supported by test data analysis, operational
feedback and existing modelling. Once the creation of the models has been completed, they are
calibrated from operational feedback [6].

MIL-HDBK-217 is in general conservative because the components are non-MIL spec. In
addition, the predictions of both the products show that the standard is
much more conservative for specific components than the other three standards. For selected
component types of both the boards, validated statements are provided below, outlining the
differences between the standards.

1. Board_B

a. Integrated circuits

i. The output of MIL-HDBK-217F for integrated circuits in Board_B yields
a predicted failure rate which is approximately 35x, 47x and 30x
greater than the FR of FIDES, IEC 62380 and SN 29500 respectively.

ii. FR according to Siemens is 1.6x greater than IEC and 1.2x greater than

FIDES.

b. Capacitors

i. Output from MIL standard is 7.9x, 47.5x, 13x greater than SN, IEC and

FIDES respectively.

ii. FR according to Siemens is 6.1x greater than IEC and 1.73x greater than

FIDES.

c. PCB

i. FR according to IEC is 166x greater than FIDES and 9.1x greater than

Siemens.

d. Connectors

i. FR provided by FIDES is 1.65x, 8.9x greater than IEC and Siemens

respectively.

ii. Output from MIL standard is 97.4x, 18x, 11x greater than SN, IEC and

FIDES respectively.

Similarly, it can be seen that the FR of diodes is much higher for FIDES and SN than for IEC and
the MIL standard. The MIL standard gives very conservative values for oscillators, while Siemens SN
29500, FIDES and IEC have similar values in comparison.

2. Board_A

a. Voltage converter

i. FR provided by FIDES is 1.88x, 17.4x greater than IEC and Siemens

respectively.

b. Resistors

i. Output from MIL standard is 13x, 164.5x, 23.6x greater than SN, IEC

and FIDES respectively.

ii. FR provided by SN is 12.7x, 1.8x greater than IEC and FIDES

respectively.

c. Integrated circuits

i. FR according to Siemens is 1.1x greater than IEC and 1.1x greater than
FIDES.



ii. FR output from MIL is 35.8x, 39.4x and 39.3x greater than SN, IEC and

FIDES respectively.
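The ratios quoted in the statements above are simple quotients of the component-level failure rates. The sketch below recomputes one of them (resistors of Board_A) from the values reported in the component-level prediction; the helper name `ratios_against` is hypothetical.

```python
def ratios_against(rates, reference):
    """Ratio of the reference standard's failure rate to every other standard's."""
    ref = rates[reference]
    return {name: ref / fr for name, fr in rates.items() if name != reference}


# Board_A resistor failure rates in FITs (component-level values from the thesis).
resistors_board_a = {
    "FIDES": 36.9,
    "IEC 62380": 5.3,
    "SN 29500": 67.3,
    "MIL-HDBK-217F2": 872.0,
}

mil_ratios = ratios_against(resistors_board_a, "MIL-HDBK-217F2")
sn_ratios = ratios_against(resistors_board_a, "SN 29500")

for name, ratio in mil_ratios.items():
    print(f"MIL / {name}: {ratio:.1f}x")
```

The same helper can be applied to any other row of the component-level tables to check a quoted factor.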

The above statements clearly outline the differences between specific component types across
the various standards. However, these statements can be given more confidence once they are
compared with field data (for the Board_A), whose MTBF will allow a much better understanding
of which standard is closest to the real scenario.

Both Siemens SN 29500 and MIL-HDBK-217F have constraints due to technological advances and do not
take modern components into consideration in their models. Thus the output from these
standards should be trusted less than that of the other two, comparatively new, standards. The reliability
results of both these standards would likely have been worse if the failures related to the PCB
could have been modelled. The limitations on the number of transistors and gates for ICs have also
affected the prediction in these two standards.

In IEC 62380, the major contributors to failure for both the boards were the PCB and
the integrated circuits. It is fair to say that IEC 62380 allocated
approximately 1/3 of the failure rate contribution to the PCB, the integrated circuits and the rest of the
components. IEC 62380 also considers temperature cycling, and the mission profile allows for a
better simulation of the real scenario in which the boards operate. However, this standard has its limits
as well, with a limitation on the number of transistors and pins in ICs. This is due to
the standard not having been updated after 2004.

FIDES allows users to mimic real conditions as well, through its mission profile. The list of
components is more advanced and is in line with the components used in both Board_A and
Board_B. One of the important differences between FIDES and the other standards is that it
includes the manufacturing process factors in the prediction model.
Inclusion of the manufacturing process factor raises the confidence level in the prediction.

In terms of usability, FIDES and IEC 62380 are more user-friendly than the other two
standards.

None of the standards considers the lead-free soldering process of the board in its prediction
model. FIDES has a section in its handbook dedicated to the lead-free soldering process;
however, this was not implemented in the FIDES module of the ITEM QT tool at the time this
prediction was done.


5 Field Failure Data Analysis for Board_A

Board_A is the mature product for which field failure data collection, elaboration and analysis are
performed. A substantial amount of field data is available for the product. Board_A is a
communication board which is part of the Communication Controller Unit (CCU). The function
of the board is to receive telegrams from the Central Interlocking System (CIS) and pass them on to
the object controller boards via the OC-Link. The board is manufactured using the SnPb
soldering process. The board is kept in the cabinet of the object controller system, where the
temperature ranges from 50°C to 70°C and one temperature cycle takes approximately 24 hours.

5.1 Board_A version distinction

The very first version of Board_A was manufactured in 2009 and was labelled as Version 2.2. The

board was updated with new version releases up until the year of 2014. The version distinction

along those years including the changed components and the motivation behind those changes can

be found in Table 5-1.

Table 5-1: Board_A version history

Version  Manufactured year  Component added                 Component removed    Motivation behind change
2.2      2009               -                               -                    Hardware fix of “reset/handover”
2.3      2009               -                               -                    IC_A_Service & OS update
2.4      2010               -                               -                    J707 VGA connector alignment, label update
2.5      2009/2010/2011     -                               -                    OS update
2.6      2011/2012          -                               -                    Layout update, removed copper between pads, front panel screw fix
2.7      2012/2013/2014     -                               U106: Power module   1 of 2 power modules removed to save parts in store due to pending obsolescence
2.9      2014               U739: Switched power regulator  U104: Power module   The second power module removed due to obsolescence. New design with updated layout. Board depth including front panel reduced by 0.8 mm

5.2 Data Sources and required Inputs

Internal data sources within RCS, BT are used to gather the data required for the field failure data

analysis.

The required inputs needed to perform the field failure data analysis are:

Population of units per project

Commissioning date of the projects

Date of warranty expiration

Number of failures of the Board_A in each project

Operating hours of Board_A in different projects

Version of the board in different projects

Usage Rate (%)

The following attributes are used for RAM elaborations:

Customer name – Incident reports are related to specific projects. The name of the

project is indicated in the “Customer name” attribute.

Case-ID – Each incident is attached with an ID, more commonly referred to as the Case

ID. The Case ID is unique for each incident report.

Product Name – Indicates the name of the product according to BT RCS convention.

Product Number – Combination of characters and numerics, unique for each product.

Serial Number – A specific product can have many identical units. The serial number
aids in differentiating between these identical units.

Revision – Version of the product.

Date of Error – Displays the date when the incident occurred in the field.

Answer – Indicates the mitigation/solution applied to the incident report.

Status – The life cycle of an incident report can be classified into 9 different statuses. For
RAM elaborations, only incident reports that are classified as “Closed” are taken into
consideration.

Fault Type 1 – Indicates the cause of the incident report.

5.3 Elaboration of field failure data

For the elaboration of field failure data, incident reports whose status is reported as “Closed” are
taken into account for the analysis. In addition, for the purpose of confidentiality, the projects
referred to in this Master Thesis have been given random names.


An assumption made during the field failure data analysis is that the following fault
types are not considered:

No fault found

Upgrade

Handling
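The selection rule described above (only “Closed” reports, with the three listed fault types excluded) can be sketched as a simple filter. The record layout with the keys `case_id`, `status` and `fault_type` and the sample records are hypothetical; the actual export format of the incident database is not given in the text.

```python
EXCLUDED_FAULT_TYPES = {"No fault found", "Upgrade", "Handling"}


def select_reports(reports):
    """Keep closed incident reports whose fault type is not excluded."""
    return [
        r for r in reports
        if r["status"] == "Closed"
        and r["fault_type"] not in EXCLUDED_FAULT_TYPES
    ]


# Hypothetical sample records, for illustration only.
sample_reports = [
    {"case_id": "C-001", "status": "Closed", "fault_type": "Failing components"},
    {"case_id": "C-002", "status": "Closed", "fault_type": "Upgrade"},
    {"case_id": "C-003", "status": "Open", "fault_type": "Damaged"},
    {"case_id": "C-004", "status": "Closed", "fault_type": "No fault found"},
    {"case_id": "C-005", "status": "Closed", "fault_type": "Wear & tear"},
]

selected = select_reports(sample_reports)
print([r["case_id"] for r in selected])
```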

The observation period runs from the commissioning date of the project until the end of the
warranty period. Where the end of the warranty period is not available or is in the future,
2016-06-21 is considered as the end of the observation period. The timeline for the observation
period of the data for analysis can be seen in Figure 5-1.
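The rule for the end of the observation period can be sketched as follows. The function names are hypothetical, and “in the future” is interpreted here as “later than the cut-off date 2016-06-21”, which is an assumption.

```python
from datetime import date

ANALYSIS_CUTOFF = date(2016, 6, 21)  # fallback end of observation from the text


def observation_end(warranty_end, cutoff=ANALYSIS_CUTOFF):
    """End of the observation period for one project.

    warranty_end is None when the warranty end date is not available.
    """
    if warranty_end is None or warranty_end > cutoff:
        return cutoff
    return warranty_end


def observation_days(commissioning, warranty_end=None, cutoff=ANALYSIS_CUTOFF):
    """Length of the observation period in days."""
    return (observation_end(warranty_end, cutoff) - commissioning).days


# Example with made-up dates: warranty ends after the cut-off, so the cut-off is used.
print(observation_end(date(2017, 1, 1)))
print(observation_days(date(2016, 6, 1)))
```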

Figure 5-1: Timeline of Observation

There have been 182 incident reports reported in [18] for Board_A. Amongst these, 21 reports
could be traced back to hardware-related failures. The fault types that have been considered for
this analysis are failing components, supplier, damaged, and wear & tear. For incident reports which
do not contain any fault type, the repair report and the failure code are examined in more detail to
determine whether there was any failure due to hardware.

There have been 38 incident reports where the project upgraded the board from version 2.4 to
2.5, which is basically the OS update mentioned in Table 5-1.

The data that has been collected for the field failure data analysis contains a total of 2571 boards,
including 2555 units of Board_A and 16 units of Board_AE. The stackup of operating hours for the Board_A
and Board_AE across the different projects which have been used for the field failure data analysis is
given in Figure 5-2. Since Board_AE was manufactured very recently, it is yet to be installed in
different projects. We could retrieve information on one project in Kaxholmen where 16 units of
Board_AE v1.2 are installed, and these boards have been in operation for much less time than the Board_A.

The equation used to find the MTBF of the boards is as follows:

$$MTBF = \frac{\text{Operating Hours}}{\text{HW failures} + 0.69}$$
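The equation can be checked against the reported field data with a short script. The operating hours and failure counts below are the per-version values reported for Board_A and Board_AE; the function names and the conversion with 8760 hours per year are assumptions made for this sketch.

```python
HOURS_PER_YEAR = 8760  # 365 days x 24 h (assumption)


def field_mtbf_hours(operating_hours, hw_failures):
    """MTBF in hours from field data, using the 0.69 term of the equation above."""
    return operating_hours / (hw_failures + 0.69)


# (total operating hours, hardware failures) per board version, from the thesis.
field_data = {
    "Board_A v2.4": (25_722_120, 2),
    "Board_A v2.5": (20_668_968, 16),
    "Board_A v2.6": (4_125_360, 1),
    "Board_A v2.7": (8_582_904, 4),
    "Board_A v2.9": (2_548_488, 0),
    "Board_AE v1.2": (377_088, 0),
}

mtbf = {name: field_mtbf_hours(h, f) for name, (h, f) in field_data.items()}
for name, hours in mtbf.items():
    print(f"{name:14s} {hours:12.0f} h  {hours / HOURS_PER_YEAR:6.0f} years")
```

Because of the 0.69 term in the denominator, versions with zero recorded failures still receive a finite MTBF.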

Figure 5-2: Stackup of operating hours

Figure 5-3 shows the analysis that has been performed for the boards based on the field failure
data. For clear visualisation, the MTBF of the respective versions of the boards is given in Figure 5-4.
According to the field data for the Board_A, version 2.4 has the highest MTBF, followed by version
2.9, whereas Board_A version 2.5 has the lowest MTBF overall. Version 2.5 of the Board_A
experienced the most hardware failures, with a tally of 16; in the next version of the board, however,
this was reduced to just 1 failure.

Figure 5-3: Field failure data analysis of Board_A and Board_AE

[Figure 5-2 chart. Y-axis: Total Hours of Operation; X-axis: Board version. Board_A v2.4: 25 722 120; Board_A v2.5: 20 668 968; Board_A v2.6: 4 125 360; Board_A v2.7: 8 582 904; Board_A v2.9: 2 548 488; Board_AE v1.2: 377 088.]

Board version   Total number of boards  Total operating hours  Total failures  MTBF (Hours)  MTBF (Years)
Board_A v2.4    682                     25 722 120             2               9 562 126     1 092
Board_A v2.5    1020                    20 668 968             16              1 238 404     141
Board_A v2.6    189                     4 125 360              1               2 441 041     279
Board_A v2.7    479                     8 582 904              4               1 830 043     209
Board_A v2.9    185                     2 548 488              0               3 693 461     422
Board_AE v1.2   16                      377 088                0               546 504       62


Figure 5-4: MTBF of different versions of Board_A extracted from field failure data analysis

It may seem counterintuitive to have developed the board further by introducing new versions when
the MTBF of Board_A v2.4 was very high. However, this high MTBF is due to the fact that Board_A v2.4
accumulated more operating hours than the other versions and had fewer failures. Also, the purpose of
a newer version of a product is to increase the functionality and efficiency of the product and
to solve issues such as component obsolescence.

For each board that is reported, a failure code, a failure cause and a failure detection phase are
stated. The definition of each of these codes can be found in [19].

The incident reports of the “Haitai” and the “Sainan” projects did not have a failure code, failure
cause and failure detection phase assigned. Hence, these are assumed based on the fault description
from the failure reports. Because the codes are assumed, they are represented in “italics”.

The failure detection phase for all the records is 5041, which according to [19] stands for
“Customer – Commercial Operation”. This phase is under the parent code of 5000, which is defined
as “Warranty”. It is understandable that all the failures recorded here have been detected by the
customer while the boards were in operation. The failure code and the failure cause are more diverse than
the failure detection phase.

There are three specific failure causes and one parent failure cause that have been recorded for the projects for which the analysis is performed. They are as follows:

3006 – Engineering

4000 – External Causer (Parent failure cause)

4036 – Customer

4041 – Consortium partner

The failure codes recorded are:

6026 – Electrical shortage

7211 – Part Broken

8400 – Functional failure (Parent failure code)

8406 – Electric component (unit) does not work



Figure 5-5: Failure statistics for Board_A

Figure 5-5 includes the failure statistics for Board_A. The figure contains important

information regarding the failure code, failure cause and the failure detection phase for the Board_A

failures that happened in different projects. Table 5-2 displays all the component failure records

extracted after the FFDA was performed. Apparently, all the components that failed are solely

present in all the versions of Board_A.

One interesting observation from Figure 5-5 is that the failures of the J702 and D709 components are concentrated in specific projects. Six of the seven field failures caused by the J702 component occurred in two projects in the same country: three in the Randsburg project and three in the Kaxholmen project. All six occurred on version 2.5 of Board_A. All the failures in the Kaxholmen project occurred in 2014, while one of the failures in the Randsburg project occurred in 2014 and the rest in 2016. As for the D709 component, both failures occurred in the Randsburg project in 2016, on version 2.5 of Board_A.

According to the prediction, the connectors contribute very little to the total failure rate of the system. Hence, the occurrence of so many connector failures, concentrated moreover in specific projects, calls for an investigation of those projects. The failures may be due to the environment, the installation or other external factors. In this scenario, investigating these two projects can offer a good return on investment (ROI): if a specific reason for these concentrated failures is found and action is taken to mitigate them, the field performance of Board_A v2.5 would rise by a considerable margin.

Had the concentrated component failures in the Randsburg and Kaxholmen projects not occurred, the MTBF of Board_A version 2.5 would be 271.5 years instead of 141.4 years. That value is much more plausible when compared with version 2.6, which has an MTBF of 278.7 years.
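As a worked check of these two figures, assume that the field MTBF equals the total operating hours divided by the number of failures plus 0.69. This relation is inferred from the tabulated field values and is an assumption, not a formula stated in this chapter; a year is taken as 8760 hours. Version 2.5 has 20 668 968 operating hours and 16 failures, of which eight (six J702 and two D709) are the concentrated ones:

$$\text{MTBF}_{\text{all failures}} = \frac{20\,668\,968\ \text{h}}{16 + 0.69} \approx 1\,238\,404\ \text{h} \approx 141.4\ \text{years}$$

$$\text{MTBF}_{\text{without concentrated failures}} = \frac{20\,668\,968\ \text{h}}{(16 - 8) + 0.69} \approx 2\,378\,477\ \text{h} \approx 271.5\ \text{years}$$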

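| Project | Incidents | Failure code | Count | Failure causer | Count | Failure detection phase | Count | D717 | D709 | J702 | F100 | Scrap due to obsolete component/unknown component | Version |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Haitai | 9 | 8400 | 9 | 4000 | 9 | 5041 | 9 | - | - | - | - | 9 | 2.5 |
| Randsburg | 4 | 8406 | 2 | 4000 | 2 | 5041 | 4 | 1 | 1 | 1 | - | - | 2.5 |
| | | 7211 | 1 | 3006 | 1 | | | - | - | 1 | - | - | |
| | | 8406 | 1 | 4041 | 1 | | | - | 1 | 1 | - | - | |
| Kaxholmen | 3 | 7211 | 3 | 3006 | 3 | 5041 | 3 | - | - | 3 | - | - | 2.5 |
| Hingham | 2 | 8406 | 1 | 4036 | 1 | 5041 | 2 | - | - | - | - | 1 | 2.4 |
| | | 8406 | 1 | 4036 | 1 | | | - | - | 1 | - | - | 2.6 |
| Brownell | 1 | 8406 | 1 | 4036 | 1 | 5041 | 1 | 1 | - | - | - | - | 2.7 |
| Aptos Hills | 1 | 7211 | 1 | 3006 | 1 | 5041 | 1 | - | - | - | - | 1 | 2.7 |
| Babylon | 1 | 8400 | 1 | 3006 | 1 | 5041 | 1 | - | - | - | - | 1 | 2.7 |
| Oketo | 1 | 6026 | 1 | 4036 | 1 | 5041 | 1 | - | - | - | - | 1 | 2.7 |
| Sainan | 1 | 8400 | 1 | 4000 | 1 | 5041 | 1 | - | - | - | 1 | - | 2.4 |
| Randsburg | 2 | 9999 | 2 | 4000 | 1 | 5041 | 2 | - | - | - | - | - | 2.7 |
| | | | | 4036 | 1 | | | - | - | - | - | - | 2.5 |
| Haitai | 1 | 9999 | 1 | 4000 | 1 | 5041 | 1 | - | - | - | - | - | 2.5 |
| Brownell | 2 | 9999 | 2 | 4036 | 2 | 5041 | 2 | - | - | - | - | - | 2.7 |
| Westernport | 3 | 9999 | 3 | 4036 | 3 | 5041 | 3 | - | - | - | - | - | 2.6 |
| Willow Island | 2 | 9999 | 2 | 4036 | 2 | 5041 | 2 | - | - | - | - | - | 2.7 |
| Brogan | 1 | 9999 | 1 | 4036 | 1 | 5041 | 1 | - | - | - | - | - | 2.9 |
| Tuscola | 1 | 9999 | 1 | 4036 | 1 | 5041 | 1 | - | - | - | - | - | 2.7 |
| Wakefield | 2 | 9999 | 1 | 4036 | 2 | 5041 | 2 | - | - | - | - | - | 2.9 |
| | | 8406 | 1 | | | | | - | - | - | - | - | |
| Sum | 37 | | | | | | | 2 | 2 | 7 | 1 | 13 | |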


Table 5-2: List of failed components

5.4 Solder Fatigue analysis for Board_A

Solder fatigue analysis is performed for Board_A in Sherlock to mimic field conditions with a ΔT of 20°C, cycling between 50°C and 70°C. Each cycle takes 24 hours to complete, and the cycling continues throughout the service life of Board_A, which is defined as 30 years.

From the analysis, the solder fatigue life prediction for the components that have failed in the field can be viewed. Figure 5-6, Figure 5-7 and Figure 5-8 show the life prediction for the transceiver, the LED and the connector respectively.

The simulated results show that for all three components that have failed in the field, the probability of failure due to solder joint fatigue is negligible.

Figure 5-6: Solder Fatigue life prediction for Transceiver (D709)

The LED in Figure 5-7 is omitted from the theoretical prediction across all four standards.

However, the component has been included in the FFDA.

| Component | Description | Failures |
|---|---|---|
| J702 | Connector | 7 |
| D717 | LED | 2 |
| D709 | Transceiver | 2 |
| F100 | Fuse | 1 |
| Scrapped Board/Unknown | - | 13 |



Figure 5-7: Solder Fatigue life prediction for LED (D717)

Figure 5-8: Solder Fatigue life prediction for connector (J702)

5.5 Conclusion

There have been many incident reports for Board_A; however, in many of them the failure code, the failure cause and the failure detection phase are missing. In the recent projects, their inclusion has made the incident reports easier to understand.

Several hardware failures could not be included in the analysis because of the limited observation period. One example of a limited observation period is a missing warranty expiry date. This can be improved if the person responsible for recording the incident report keeps track of it.

In comparison to the theoretical prediction in Figure 4-1, the MTBF of all the versions of Board_A is much higher in reality according to the field failure data analysis. Since the prediction was done for version 2.5 of Board_A, a bar chart displaying how the MTBF acquired from the field failure data analysis differs from the theoretical prediction is given in Figure 5-9.


Figure 5-9: Comparison of reliability data for Board_Av2.5

The MTBF from the field failure data analysis is almost twice the value predicted by FIDES and about 1.5 times the values predicted by IEC 62380 and Siemens SN 29500. MIL-HDBK-217F2 predicted an MTBF that is approximately 26 times lower than what is observed in reality. This demonstrates the extremely conservative approach of its prediction, which gives values that are unrealistic and can be misleading.

Given that the reliability of the board is significantly higher than, but not far from, the values predicted by FIDES, IEC 62380 and Siemens SN 29500, the confidence in using these standards increases.

Even though both FIDES and IEC 62380 predicted a high failure rate for the voltage converters used in version 2.5, no converter failures were observed in reality. The MTBF of the converters according to the field failure data is thus 3419 years.

It can also be concluded from the Sherlock simulation that the thermal cycling the board undergoes in field conditions is too weak to damage the solder joints of the components that have failed.

This conclusion holds not only for the transceiver, LED and connector that have failed in the field but also for the overall board including the rest of the components. This statement is supported by the simulation result in Figure 5-10, where the probability of failure for the overall board stands at only 0.8% after 30 years.

The failures have most likely been induced by external factors other than thermal cycling. These may include, but are not limited to, mechanical shock, vibration and thermal shock.

One of the important conclusions is that the LED and the switches should not be excluded from the theoretical prediction, since the FFDA shows that boards on which these components failed have been reported and sent for repair.

The designers do not consider these components in the reliability prediction because they do not affect board operability. However, once the customer detects a failure of one of these components in the field, the board is sent for repair regardless of the effect on board operability.

Finally, close observation of the field failure data analysis shows that failures can at times be concentrated in specific projects, environments, etc. If analysis shows that mitigating these failures would increase the board performance substantially, those projects can be investigated to find the reasons for the failures and eliminate them. The penultimate chapter of this dissertation describes how this process can be integrated into a global model to predict RCS product reliability.

Figure 5-10: Overall Solder Joint fatigue life prediction


6 Lab Testing and Reliability testing simulation on BT Products

Board_B is one of the very first boards within Bombardier to have been manufactured using the

lead-free soldering process.

Lead-free solder has a higher Young’s modulus than lead-based solder which makes lead-free

solders stiffer. According to [20], “for lead-free SAC solder, the structure exhibits grain formation

due to recrystallization which results in finer grains that separate at grain boundaries resulting in

crack growth”.

Second-level interconnection is the interconnect between the PCB and the package. Now that this second-level interconnection is made with lead-free solder pastes, the reliability of the board is impacted.

New compound packages such as 0.8 mm pitch BGA and QFN, with smaller solder joints, are connected to the PCB. The combination of 0.8 mm pitch BGA and QFN with SAC 305 solder will reduce the reliability.

The idea behind performing the temperature cycling test for Board_B has been to find out

general failures on the board due to variation in temperature and failures occurring due to solder

joint fatigue on the components with BGA and QFN packages and other SMD components.

BGA packages are surface mount packages that are connected to the printed board via solder

balls.

Accelerated life testing (ALT) is used to reduce the time required for testing when the product is very reliable. During ALT, the product under test (PUT) is subjected to environmental conditions that are much more severe than those it will encounter after installation. This makes it possible to evaluate the useful life of the product and of the electronic components and connections within it. The results can then be used to identify problems and correct them. The data from the ALT is also used to predict life under typical field conditions.

There are many forms of accelerated life testing in which the applied stresses accelerate the failure process. The stresses can be applied as high or low temperatures, humidity, temperature cycling, vibration, electrical stress, etc.

Since the duration of the Master's thesis is 5 months, this report is based on observations ending on 2016-07-14. The test started on 2016-03-11, making the observation period 125 days, which is equivalent to 4.17 months.

6.1 Accelerated Life Testing

6.1.1 Experimental Setup

The thermal cycling is performed inside the thermal chamber Vötsch, VT 3050. The chamber has a

test space volume of 500 liters with a temperature range from -30°C to +100°C. The temperature

rate of change while heating is 2.0 K/min and while cooling is 1.4 K/min. External dimensions of

the chamber are 1955x1030x940 mm and test space dimensions are 1250x590x710 mm. The

thermal chamber can be seen in Figure 6-1.


Figure 6-1: Thermal chamber

The boards are placed inside the subrack, which in turn is placed inside the test space of the thermal chamber. The boards are powered on and run an internal test program to mimic the operational state. The total population for the accelerated life test is 8 boards. Four sandwiches are formed from these 8 boards, with a pair of boards forming one sandwich. The boards and the subrack inside the thermal chamber can be seen in Figure 6-2.


Figure 6-2: Powered on boards inside the subrack

Sandwiches 2, 4 and 5 are coated with conformal coating while sandwich 3 is left uncoated. Conformal coating is a thin polymeric film which is applied to the printed circuit board to provide protection against moisture, dust, chemicals, etc.

6.1.2 Input conditions and duration of the thermal cycling

To ensure that the boards undergoing the temperature cycling test are placed in similar conditions

to that of the real scenario, the condition inside the thermal chamber is adjusted to mimic real

conditions. The settings of the thermal chamber are as follows:

Temperature range programmed: -12°C ≤ θ ≤ +82°C

Temperature range logged within the PCB: -2°C ≤ θ ≤ +72°C

Temperature Cycle: fluctuates between 145 minutes to 180 minutes for one period

Dwell time (Minimum Temperature): 25 minutes



Dwell time (Maximum Temperature): 15 minutes

Ramp up: 50 minutes

Ramp down: 90 minutes

In real conditions, the temperature cycles between 50°C and 70°C with a 24-hour cycle. Hence, over the service life of 30 years, the board goes through a total of 10950 cycles (30 years * 365 cycles per year). For the accelerated life test, the temperature range is from 0°C to 70°C, with one cycle taking roughly 145 to 180 minutes to complete. Hence, while the product undergoes one cycle each day in the field, the PUT undergoes about 8 to 10 cycles in the same time (24 hours * 60 minutes / 180 minutes = 8 cycles; with a 145-minute cycle, about 9.9 cycles).

One of the important factors to consider when performing an accelerated life test is how long the test should run for the product under test to experience an amount of stress and a number of cycles equivalent to those of the real conditions.

The Norris-Landzberg model (a modified Coffin-Manson model) is used to model the acceleration of the fatigue mechanism due to temperature variation. The equation uses a different fatigue coefficient for different types of solder material. For SnPb solder, the fatigue coefficient m is assigned the constant value 1.9, whereas for lead-free (SAC 305) solder m is assigned the constant value 2.65 [21]. However, none of the equations includes any impact of conformal coating on the second-level interconnect. The Norris-Landzberg model used to obtain the acceleration factor is given below:

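$$AF = \left(\frac{\Delta T_t}{\Delta T_o}\right)^{m} \left(\frac{f_o}{f_t}\right)^{0.136} \exp\left\{\frac{E_a}{k}\left(\frac{1}{T_{max,o}} - \frac{1}{T_{max,t}}\right)\right\} \quad \text{(SAC 305)}$$

$$AF = \left(\frac{\Delta T_t}{\Delta T_o}\right)^{m} \left(\frac{f_o}{f_t}\right)^{1/3} \exp\left\{\frac{E_a}{k}\left(\frac{1}{T_{max,o}} - \frac{1}{T_{max,t}}\right)\right\} \quad \text{(SnPb)}$$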

AF = acceleration factor
ΔT = temperature change over one cycle (maximum minus minimum temperature)
o = operating/field condition
t = test condition
T_max,o = maximum temperature in operating/field condition
T_max,t = maximum temperature in test condition
f = cycling frequency (number of cycles per unit time)
m = solder fatigue coefficient
k = Boltzmann constant
E_a = activation energy

The reliability prediction standard FIDES uses the Norris-Landzberg model as well to model the

acceleration on the solder joint fatigue mechanism due to temperature variations.

In the present case, the maximum temperature is the same in the field condition and in the test condition, so the exponential term equals 1. In addition, the acceleration in the number of cycles per day is already known, so the frequency term is not applied. Hence, the equations reduce to:


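$$AF = \left(\frac{\Delta T_t}{\Delta T_o}\right)^{m}$$

$$AF = \left(\frac{70}{20}\right)^{1.9} = 10.807 \quad \text{(SnPb)}$$

$$AF = \left(\frac{70}{20}\right)^{2.65} = 27.655 \quad \text{(SAC 305)}$$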
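The acceleration factors can be recomputed with a short script. The sketch below implements the general Norris-Landzberg form with the frequency and temperature terms as optional arguments; with their default values it reduces to the temperature-swing term used here. The function name, the argument names and the defaults are illustrative assumptions, and no value for E_a/k is supplied because none is given in this section.

```python
import math


def norris_landzberg_af(dt_test, dt_field, m,
                        f_field=1.0, f_test=1.0, freq_exponent=0.0,
                        ea_over_k=0.0, tmax_field_k=343.15, tmax_test_k=343.15):
    """Acceleration factor of a thermal-cycling test relative to the field.

    With the defaults (no frequency term, equal maximum temperatures of
    70 degC = 343.15 K) only the temperature-swing term remains.
    """
    swing_term = (dt_test / dt_field) ** m
    frequency_term = (f_field / f_test) ** freq_exponent
    thermal_term = math.exp(ea_over_k * (1.0 / tmax_field_k - 1.0 / tmax_test_k))
    return swing_term * frequency_term * thermal_term


# Test swing 70 degC (0 to 70 degC), field swing 20 degC (50 to 70 degC)
af_snpb = norris_landzberg_af(dt_test=70.0, dt_field=20.0, m=1.9)
af_sac305 = norris_landzberg_af(dt_test=70.0, dt_field=20.0, m=2.65)

print(f"AF (SnPb):    {af_snpb:.3f}")
print(f"AF (SAC 305): {af_sac305:.3f}")
```

Keeping the frequency and temperature terms as parameters makes it possible to reuse the same function for a test in which the maximum temperatures differ, provided suitable values for the exponent and for E_a/k are taken from the literature.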

Based on the acceleration factor for each solder material, the number of thermal cycles that needs to be carried out can be found using the following formula:

$$Cycles_t = \frac{Cycles_o}{AF}$$

Table 6-1 shows how the duration of the ALT is deduced from the above information for the different solder materials.

Table 6-1: Calculation for duration of ALT for different solder materials

| Solder material | SnPb | SAC 305 |
|---|---|---|
| Cycles_t | 10950 / 10.807 = 1013.2 cycles | 10950 / 27.655 = 395.95 cycles |
| Duration for cycles to complete (minutes) | 1013.2 cycles * 180 minutes = 182376 minutes | 395.95 cycles * 180 minutes = 71271 minutes |
| Duration for cycles to complete (days) | 126.65 days ≈ 4.22 months | 49.5 days ≈ 1.65 months |

From the calculations above it can be deduced that for Board_B, where components are mounted using lead-free solder, 396 accelerated life cycles with ΔT = 70°C are equivalent to approximately 10950 cycles in field conditions. Hence, running the test for 49.5 days in the thermal chamber is equivalent to 30 years of service life of the product in field conditions.
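The steps of Table 6-1 can be sketched as a small function that converts the field cycle count and an acceleration factor into a test duration. The names are illustrative, and the 180-minute cycle is the longest cycle period reported for the chamber, so the result is a conservative estimate.

```python
MINUTES_PER_DAY = 24 * 60


def alt_duration(field_cycles, acceleration_factor, cycle_minutes=180):
    """Return (test cycles, duration in minutes, duration in days) for the ALT."""
    test_cycles = field_cycles / acceleration_factor
    minutes = test_cycles * cycle_minutes
    days = minutes / MINUTES_PER_DAY
    return test_cycles, minutes, days


FIELD_CYCLES = 10950  # 30 years at one cycle per day

cycles_snpb, minutes_snpb, days_snpb = alt_duration(FIELD_CYCLES, 10.807)
cycles_sac, minutes_sac, days_sac = alt_duration(FIELD_CYCLES, 27.655)

print(f"SnPb:    {cycles_snpb:.1f} cycles, {days_snpb:.2f} days")
print(f"SAC 305: {cycles_sac:.2f} cycles, {days_sac:.2f} days")
```

Small differences from the table in the minutes row come from rounding the cycle count before multiplying; the durations in days agree to the precision quoted.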

There are different ways in which failures can be analysed if they occur. These methods include optical and scanning electron microscopy (SEM), X-ray and coupled scanning acoustic microscopy (CSAM), cross-sectioning (transverse and parallel), and dye-and-pry (pressurized dye exposure of the assembled unit followed by mechanical package removal). In the case of any hardware-related failure, the test sample will be sent for micro-sectioning to find the failure cause.

If the boards complete the accelerated life test without any failures, a minimum of three randomly selected boards from the overall test sample shall be sent for failure analysis. This is to ensure that no failures were missed because of the design of the test application, the hardware design or external factors.

All internal interfaces between processors on Board_B are supervised by test programs running on the processors. Figure 6-2 displays the Board_B test boards mounted in “sandwiches”, each consisting of two Board_B boards interlinked by a small backplane board.

The test application lets each processor send a packet/telegram to the same processor on the opposite side of the sandwich. After receiving the packet, the other processor sends an “ack” packet back to the processor that initiated the exchange. This holds true for all the processors. In this way the test application checks whether all the interfaces under supervision are functional. If a packet goes missing and an interface stops working, this can be identified from the GUI.



Not all the interfaces are supervised by the test application. The interface around the power

module is manually supervised by observing the LED on the board which is green during operation.

If a component within the power module interface fails, this can be observed from the LED as it will

turn either orange or red or will completely go off.

6.1.3 Observation

The temperature log displaying the temperature cycling over time and the temperature variation can be seen in Figure 6-3.

Figure 6-3: Temperature Log

During the course of the observation period, the four sandwiches have accumulated up to 3929

cycles. Total cycles for each individual sandwich are as follows:

Sandwich 2: 928 cycles




No potential HW failures have been observed. The monitor displaying the test application activities gave some warnings, but all of them were caused by issues related to the test application and the cables. The software issues stemmed from the priority given to the tasks performed by the test application. The current test application version gives high priority to the “DD-test”, which is the NOR Flash test. The issues can be seen in the Monitor Log in [22].


The methodology used to observe whether any HW failure has occurred in the boards has been to map the test application output to the temperature logged within the thermal chamber. The idea has been to look for any correlation between the failures and the temperature. One interesting observation was made when Processor S on the right board of Sandwich 2 started to time out and become active again repeatedly. There were two different, shifting temperature ranges at which the processor would time out and become active again. With time, they moved towards the lower temperature region of the thermal cycle. Eventually, Processor S sent a time-out message via the test app while the temperature was -5°C and never became active again. Upon further investigation, it was found that the cable connecting the two boards in Sandwich 2 had become loose and no longer allowed Processor S on the right board in Sandwich 2 to send any packets, which eventually led to the time-out messages via the test app.

The change in the temperature range at which the error messages appeared initially led to the understanding that this was a random failure that could be due to SW. However, the finding of the loose cable can serve as an indicator for similar failures in the future. At higher temperatures the cable would expand and have connectivity, but it would lose connectivity at low temperatures when contraction occurred. In the end, the final contraction left the cable in a position from which it could no longer connect to the port even at higher temperatures. This failure is not considered a HW failure intrinsic to the board.

Another good indicator of this kind of failure, as observed, is that with progressing time the frequency with which the processor times out and becomes active again decreases, eventually leading to failure.


7 Sherlock Reliability Testing Simulation – Accelerated Life Testing

Solder fatigue analysis is performed for Board_B undergoing the accelerated life testing. The input parameters have been set in line with the parameters used for the accelerated life testing and with the correlation derived from the acceleration factor. The input parameters for the Sherlock simulation are as follows:

Life cycle

o Service Life = 50 days

Life phase editor

Phase settings

o Environment: Ground_Benign

o Duration: 180 days

o Number of cycles: 100 Duty cycles

Thermal event editor

o Thermal event settings

Number of cycles: 100 Duty cycles

Life cycle status: Operating

Thermal Profile Editor

o Time units: Minutes

o Temperature units: °C

o Step

Minimum temperature: Hold for 25 minutes at 0°C

Ramp up: 50 minutes until temperature reaches 70°C

Maximum temperature: Hold for 15 minutes at 70°C

Ramp down: 90 minutes until temperature reaches 0°C

According to [23], in 3.4.11, the dwell time in this case has been considered the time when the

temperature stays below 0°C for the lower end of the cycle and above 70°C for the upper end of the

cycle.

The life prediction curve from the solder fatigue analysis performed in Sherlock is given in

Figure 7-1.


Figure 7-1: Solder Fatigue Life Prediction Curve, Board_B_ALT, Weibull curve

From the solder fatigue analysis, the probability of Board_B having any failure after the completion of the ALT in approximately 50 days is 0.32%.

Figure 7-2 shows the distribution of TTF (days) values for all of the 1774 components analyzed. It can be seen from the chart that all 1774 components have a TTF that is 5 times higher than the service life defined for the ALT.

The following components are not part of the analysis due to package type being not supported:

F1

U42

U50

U144


Figure 7-2: Solder Joint Fatigue Life Distribution_Board_B_ALT

Even in the worst case, using the lowest acceleration factor, which is derived from the solder fatigue coefficient for SnPb solder, an accelerated life test of 127 days is sufficient. Figure 7-3 displays a zoomed-out version of the life prediction curve in Figure 7-1 in order to view the probability of failure (%) due to solder joint fatigue for Board_B at the end of the ALT on day 127. The simulation result shows a 5.17% probability of failure due to solder joint fatigue for Board_B if the ALT is continued for 127 days, considering the lowest possible acceleration factor.


Figure 7-3: Solder Joint fatigue Life prediction_Board_B_m=2.65

7.1 Observation

Based on the observations from the ALT, no HW failures have been observed. According to the worst-case acceleration factor, Sandwich 4 and Sandwich 5 have already seen as many cycles as the real product sees in field conditions. As stated above, three randomly selected boards from the overall sample can be sent for micro-sectioning to ensure that no failures are missed because of the design of the test application, the hardware design or external factors, once the other two boards survive the ALT without any failures.

The ALT outcomes can also be validated against the Sherlock simulation result, which, at least for failures due to solder joint fatigue, displays a very low probability.

One of the important findings during the course of this ALT has been the acceleration factor correlating the ALT with the field conditions. This model for finding the acceleration factor can be used within Bombardier for other products that undergo thermal cycling as part of accelerated life testing. It allows better correlation between the number of cycles that the product must undergo during ALT and the number of cycles the product sees during normal operating conditions throughout its service life.


8 The Model

Circuit card assemblies, or PCBs, are one of the major focuses of development within BT RCS. It is vital that the reliability of these boards is predicted with the utmost care. A carefully constructed prediction gives the stakeholders involved a better understanding of the performance of the product. It also allows the organization to present an important element during the bid phase to familiarize its clients with the performance of the board.

The issue at hand is that the predicted reliability values often differ from the performance of the boards in field conditions. This document shows how to enhance the process of predicting the reliability of the boards so that the prediction is more realistic with respect to the performance of the board in field conditions. This will increase the confidence in the prediction and also provide the clients with better results during the bid phase.

The model is what has been implemented during the course of this thesis. First of all, reliability prediction across the selected prediction standards is performed on Board_A v2.5. The predicted results are shown in Figure 8-1. Afterwards, FFDA is performed using the substantial amount of field failure data reported for different versions of Board_A. The MTTF obtained for Board_A v2.5 from the field failure data analysis is approximately 141 years. Comparing this with the predicted results for Board_A across the different standards in Figure 8-1, it is possible to correlate the board performance with each of the reliability results predicted by the standards. Following this procedure, it can be seen that the performance of Board_A v2.5 in the field is 1.96 times better than the result predicted by FIDES and 1.47 times better than the results predicted by IEC 62380 and Siemens SN 29500. The difference between the field performance and the result predicted by the MIL-HDBK-217F2 standard was the largest, with the field performance being 25.7 times greater than the predicted value.

Figure 8-1: Theoretical Reliability Prediction of Board_A

For future versions of Board_A, the same factors can be applied to the respective prediction standards to obtain an MTTF that is more relevant and closer to the performance of the board in the field.
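Applying the factors amounts to multiplying a theoretical prediction by the field-to-prediction ratio found for the corresponding standard. The sketch below uses the ratios reported for Board_A v2.5; the function name is illustrative and the predicted value of 100 years is a purely hypothetical input for a future board version, not a result from this thesis.

```python
# Ratio of field MTTF to predicted MTTF for Board_A v2.5, per standard
FIELD_TO_PREDICTION_FACTOR = {
    "FIDES": 1.96,
    "IEC 62380": 1.47,
    "Siemens SN 29500": 1.47,
    "MIL-HDBK-217F2": 25.7,
}


def field_adjusted_mttf(predicted_mttf_years, standard):
    """Scale a theoretical MTTF by the correlation factor of the given standard."""
    return predicted_mttf_years * FIELD_TO_PREDICTION_FACTOR[standard]


# Hypothetical FIDES prediction for a future board version (illustration only)
adjusted = field_adjusted_mttf(100.0, "FIDES")
print(f"Field-adjusted MTTF: {adjusted:.0f} years")
```

The factors are specific to the board family and the field conditions from which they were derived, so a different board would need its own set of factors from its own field failure data analysis.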

The model can be applied to other boards within BT RCS. However, the prediction needs to be done again, since the complexity varies from one board to another. As long as the board is a mature product with field failure data available, the result of the field failure analysis can be compared with the values predicted by the standards to find the correlation factors. These factors can later be applied to future versions of the board.

The model is constrained by the fact that it requires the product to have field failure data. Hence, it cannot be applied to boards that are not installed in the field or to boards without a substantial amount of failure data. This applies in particular to the lead-free boards within BT RCS, most of which are still under development.

To ensure that products under development are not excluded from field performance prediction because of a lack of field data, a separate model is suggested.

The boards are to undergo accelerated life testing similar to what Board_B has undergone as part of the thesis. The results obtained from the accelerated life testing make it possible to predict the board's performance in field operating conditions. However, it is very important that an appropriate acceleration factor is derived, so as to ensure that the ALT cycles undergone by the board are equivalent to the cycles it experiences during its service life in field operating conditions.

Some physics-of-failure models that can be used to relate the results obtained under ALT to results under normal field conditions are given below:

Arrhenius Acceleration Model

o Thermal stress

Inverse Power Law Model

o Non-Thermal accelerated stress

Eyring Model

o Thermal stress and Electrical/Humidity stress

Norris-Landzberg Model

o Thermal cycling

During the course of this thesis, one of the major findings has been the acceleration factor correlating the test conditions with the operating conditions, which later helped in finding the number of ALT cycles equivalent to the cycles in the field. Since this was a test concerning thermal cycling, the Norris-Landzberg model was used. However, the relevant model needs to be chosen depending on the ALT.

The proposed model states that once the ALT is performed, the results can be validated using a reliability testing simulation tool, e.g. Sherlock, in which it is possible to mimic both the ALT condition and the field operating condition. In this way the confidence in the reliability prediction outcome will increase, as will the confidence in the performance of the board throughout its service life.


9 Conclusions and Future work

9.1 Conclusions

To conclude this dissertation, the goals defined in Section 1.3 have been realized. The initial steps of achieving the sub-goals, and later the use of their outcomes, helped in accomplishing the final goal of this dissertation. The ‘global’ model that is the outcome of this thesis project would allow BT to perform product reliability prediction efficiently. Using the model would allow them to gather reliability information on their products that is more accurate and realistic. Presenting more accurate reliability information to potential clients in the bid phase will be very attractive both for BT and for the clients.

During the course of this project, many insights were gained. The most important of all is the fact that the knowledge gained during the initial period through the literature study was successfully implemented in practice. The planning of the project has been crucial, and the decision to have bi-weekly meetings to provide updates on the progress of the project turned out to be of great help. Thus, the project ended successfully in due time.

Initially, almost a month of literature study was spent on Board_C, which was supposed to be the mature product on which to perform field failure data analysis. Later the board was changed to Board_A, due to the incompatibility of the Board_C design files with the reliability testing simulation tool. If I were to do the same work again, I would first check the compatibility of the design files of the board with the reliability testing simulation tool. This would allow me to spend the time saved on other parts of the project.

9.2 Limitations

In general, the work ran smoothly during the course of this dissertation. In a few instances, however, there were hindrances that limited the effort put into accomplishing the tasks. They are described in this section.

One limitation was the delay in the development of the test application software designed to track failures occurring in the three processors of Board_B during the accelerated life testing. Because of this delay, the processors of the board could not be monitored for failures until almost a month after the accelerated life testing began. However, this limitation did not have any impact on the results, since no failures were found once the test application was ready and installed on the processors. This suggests that the board was fully functional, without any failures, during the period when the test application was not yet ready.

There was also a limitation in the field failure data analysis. In some of the field incident reports, the version of Board_A on which the failure occurred was missing. To limit the overall effect on the accuracy of the field failure data analysis, the date of the failure was matched against the release periods of the Board_A versions. The version of Board_A released before the failure date is assumed to be the one that failed and is used for the analysis.
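The matching rule described above can be sketched as follows. This is a minimal illustration, not the procedure actually implemented in the thesis: it assumes that "released before the failure date" means the most recent release strictly before that date, and the function name, version labels and release dates are hypothetical.

```python
from bisect import bisect_left
from datetime import date


def assumed_failed_version(failure_date, releases):
    """Return the version assumed to have failed, or None.

    releases: iterable of (release_date, version_label) tuples.
    Assumption: the most recent release strictly before the
    failure date is taken as the failed version.
    """
    ordered = sorted(releases)
    release_dates = [released for released, _ in ordered]
    # Count of releases dated strictly before the failure date.
    idx = bisect_left(release_dates, failure_date)
    if idx == 0:
        return None  # no version had been released yet
    return ordered[idx - 1][1]


# Hypothetical release history, for illustration only.
history = [(date(2012, 1, 1), "rev A"), (date(2013, 6, 1), "rev B")]
print(assumed_failed_version(date(2013, 1, 15), history))
```

Reports dated before the first release return `None`, so they can be flagged for manual review rather than silently assigned a version.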

9.3 Future work

The field failure data analysis consisted of field data collected with respect to different dates, namely the project commissioning date, the warranty expiry date of the project, and the service contract of the project (if any) after the project has ended. However, what has not been included in the scope of this master's thesis is the grey zone between the product entering service and the project commissioning date. The time between the installation date and the project commissioning date can vary from some weeks to a couple of years. During this time, there can be incident reports regarding product failure. A similar grey zone can exist between the expiry date of the warranty and the end of the service contract, when incident reports are likely not to be recorded or observed. Future work could extract data from these grey zones. This would increase the coverage of the failure incident reports for all the products being used at Bombardier. As a result, the field failure data analysis could be improved considerably compared with its present state.
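To make the grey zones concrete, an incident report could be tagged by the period in which it falls. The sketch below is only an illustration under stated assumptions: the phase labels, the function name and the handling of boundary dates are hypothetical and are not taken from the analysis performed in this thesis.

```python
from datetime import date


def classify_incident(incident, installed, commissioned,
                      warranty_expiry, service_end=None):
    """Tag an incident date with the project phase it falls in.

    Assumed boundaries: commissioning day and warranty expiry day
    count as part of the warranty period.
    """
    if incident < installed:
        return "before installation"
    if incident < commissioned:
        return "grey zone: installation to commissioning"
    if incident <= warranty_expiry:
        return "warranty period"
    if service_end is not None and incident <= service_end:
        return "grey zone: warranty expiry to end of service contract"
    return "after coverage"


# Hypothetical project dates, for illustration only.
print(classify_incident(date(2014, 3, 1),
                        installed=date(2014, 1, 1),
                        commissioned=date(2014, 9, 1),
                        warranty_expiry=date(2016, 9, 1)))
```

Counting reports per label would give a first estimate of how many incidents currently fall outside the analysed periods.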

In addition, future ALT of the products could be carried out with other accelerating variables such as vibration, humidity and electrical stress.


References

[1] A. Hakansson, “Portal of research methods and methodologies for research projects and degree projects,” Steer. Comm. World Congr. Comput. Sci. Comput. Eng. Appl. Comput. WorldComp, p. 1, 2013.

[2] E. Dubrova, Fault-Tolerant Design, 1st ed. Springer-Verlag New York, 2013.

[3] Murthy, Rausand, and Osteras, Product Reliability, 1st ed. London: Springer-Verlag London, 2008.

[4] J. A. Jones, “Electronic reliability prediction: a study over 25 years,” PhD thesis, University of Warwick, 2008.

[5] J. Jones and J. Hayes, “A comparison of electronic-reliability prediction models,” IEEE Trans. Reliab., vol. 48, no. 2, pp. 127–134, Jun. 1999.

[6] “Selection Guide for electronic components predictive reliability models.” IMdR, Oct-2009.

[7] L. Escobar and W. Meeker, “A Review of Accelerated Test Models,” vol. 21, no. 4, pp. 552–557, 2006.

[8] A. Kostic, “Lead-free Electronics Reliability - An Update,” GEOINT Development Office, Aug-2011.

[9] M. Meilunas, A. Primavera, and S. Dunford, “Reliability and Failure Analysis of Lead-Free Solder Joints,” in IPC Conference Proceedings, 2002.

[10] “RAM Products Catalogue: Components vs. RAM Deliverables.” Bombardier Transportation (RCS), 31-Mar-2016.

[11] C. Platt, Encyclopedia of Electronic Components, vol. 1. O’Reilly, 2012.

[12] C. Platt, Encyclopedia of Electronic Components, vol. 2. O’Reilly, 2014.

[13] “UTE_Guide_FIDES_2009_Ed_A_EN.pdf.”

[14] “IEC TR 62380: Reliability Data Handbook - Universal model for reliability prediction of electronic components, PCBs and equipment,” International Electrotechnical Commission, Aug. 2004.

[15] “IEC 61709: Electric components - Reliability - Reference conditions for failure rates and stress models for conversion.” International Electrotechnical Commission, Jun-2011.

[16] K. Nylund, “Reliability Prediction for Board_A.” Bombardier Transportation (RCS), 09-Nov-2011.

[17] M. Magnusson, “Reliability Prediction for Board_B.” EHC, Bombardier Transportation (RCS), 29-Apr-2014.

[18] W. Nualpluad, “All RAM view 21-06-2016.” Bombardier Transportation (RCS).

[19] “GRP-40-10-25-007660 rev 03 en - NCR Standard Catalogues (data) (004).” Bombardier Transportation (RCS), 21-Jun-2016.

[20] M. Osterman, “Effect of Temperature Cycling Parameters (Dwell and Mean Temperature) on the Durability of Pb-free solders,” 27-Jan-2010. [Online]. Available: http://www.calce.umd.edu/lead-free/CALCE-IMAPS2010.pdf. [Accessed: 09-Nov-2016].

[21] V. Vasudevan and X. Fan, “An acceleration model for lead-free (SAC) solder joint reliability under thermal cycling,” in 2008 58th Electronic Components and Technology Conference, 2008, pp. 139–145.

[22] K. Nylund, “Logbook_BoardB_TempCycling.” 30-Jun-2016.

[23] “Performance Test Methods and Qualification Requirements for Surface Mount Solder Attachments,” Feb-2006. [Online]. Available: http://www.ipc.org/TOC/IPC-9701A.pdf. [Accessed: 09-Nov-2016].


Appendix A: Benchmark of Reliability Standards

Supplementary Data File

Description:

The accompanying Excel spreadsheet consists of five worksheets. They are:

1. Classification and Comparison

2. Component Mapping

3. Possible Problems and CoP

4. Component Ratio Mode and LE

5. References

Filename:

Benchmark_of_Reliability_Standards.xlsx


TRITA-ICT-EX-2016:185

www.kth.se