7/30/2019 Applying Reliability Engineering Techniques
1/15
Tutorial Notes 2012 AR&MS
2012 Annual RELIABILITY and MAINTAINABILITY Symposium
Applying Reliability Engineering Techniques &
Best Practices to Achieve Functional Safety
William M. Goble, Ph.D. & Julia V. Bukowski, Ph.D.
William M. Goble, Ph.D., P.E., CFSE
Principal Partner, exida, LLC
61 N. Main Street
Sellersville, PA 18960 USA
Internet (e-mail): [email protected]
Julia V. Bukowski, Ph.D.
Dept of Electrical & Computer Engineering
Villanova University
Villanova, PA 19085 USA
Internet (e-mail): [email protected]
ii Goble & Bukowski 2012 AR&MS Tutorial Notes
SUMMARY & PURPOSE
The purpose of this tutorial is to introduce the basics of functional safety and illustrate how a variety of conventional
reliability engineering techniques and best practices can be applied to the problem of achieving it. The material presented has
wide applicability in industries as diverse as petro-chemical, nuclear, automotive, pharmaceuticals, railroads, and power
generation, to name a few. The material is relevant to engineers and managers who work in situations which require functional
safety to be achieved and maintained; thus it is equally beneficial to designers of safety systems and products and to end
users who rely on such safety systems and products. The tutorial assumes no prior knowledge of functional safety and is not
mathematically intense. After completing this tutorial, attendees should be conversant with the basic concepts of functional
safety, understand the System and Product Safety Lifecycles, be knowledgeable about which reliability engineering techniques
and best practices to consider applying at various points in the safety lifecycles, and have a broad overview of the IEC 61508
safety standard.
William M. Goble, Ph.D., P.E., CFSE
William M. Goble is currently Principal Partner and co-founder of exida, a product certification and engineering consulting
company focused on automation system safety and reliability. He has over 30 years of experience in electronic design,
software, reliability analysis and management. He has a BSEE from Penn State University, an MSEE from Villanova
University and a Ph.D. in Reliability Engineering from Eindhoven University of Technology. He is a registered professional
engineer in the State of Pennsylvania and a Certified Functional Safety Expert (CFSE). He is a fellow member of ISA and
author/co-author of three books.
Julia V. Bukowski, Ph.D.
Julia V. Bukowski recently retired from the Department of Electrical and Computer Engineering at Villanova University
where she was a member of the standing faculty for more than 25 years. She is currently engaged in a variety of research and
consulting activities as well as part-time teaching. She has more than 30 years of experience in the field of reliability and safety.
She received her BSEE and Ph.D. (Systems Engineering) from the University of Pennsylvania, and her DIC in Electronics
Engineering from Imperial College of Science and Technology, University of London. She has been a Fulbright Senior Lecturer
and Visiting Associate Professor with the Faculty of Industrial Engineering and Management at the Technion - Israel Institute of
Technology in Haifa, Israel. She is a senior member of the IEEE and has been a guest editor for a special issue of the IEEE
Transactions on Reliability.
Table of Contents
1. Introduction
2. Background
3. Overview of IEC 61508
4. Reliability Engineering Techniques & Best Practices: System Level Application
5. Reliability Engineering Techniques & Best Practices: Product Level Application
6. IEC 61508 Certification
7. Cyber Security
8. Conclusions
9. References
10. Tutorial Visuals
2012 Annual RELIABILITY and MAINTAINABILITY Symposium Goble & Bukowski 1
1. INTRODUCTION
Many industries use automatic protection equipment to
safeguard people, property and the environment from
potentially hazardous events. The reliability engineering
techniques for optimal design of such automatic protection
equipment have evolved over the years and international
standards have been written to document best practices. This
area of engineering design is known as functional safety.
Since a best practice for achieving functional safety is
meeting the requirements of an appropriate safety standard,
we will use a well-recognized international safety standard
(IEC 61508 [1]) as a vehicle for discussing techniques and
best practices. This standard is an especially good choice for
several reasons. First, it does not prescribe any specific
techniques or practices which must be used. Rather, it allows
the user to choose appropriate techniques and practices
provided the choice can be reasonably justified. Therefore,
many different techniques and practices can be highlighted in
this tutorial. Second, it relies on the concept of Safety
System and Product Lifecycles which permits us to highlight
techniques and practices used throughout the lifecycle,
beginning with initial system/product concept through design,
implementation, testing, validation, documentation,
commissioning, operations, and finally decommissioning.
Third, the standard covers complete systems as well as
hardware and software components. Various techniques and
practices for each of these areas are presented.
The remainder of this tutorial consists of the following
topics:
1. Background information to introduce basic concepts pertinent to functional safety
2. A broad overview of the IEC 61508 safety standard which is used as a framework for discussing various techniques and practices
3. Details regarding reliability engineering techniques and best practices to achieve functional safety applied at the
a. system level
b. product level
4. Information on IEC 61508 product certification
5. Highlights of issues regarding cyber security
1.1 Notation and Acronyms
ACOS Advisory Committee on Safety
A/CV actuator and control valve
A/SV actuator and safety valve
BPCS basic process control system
CMMI Capability Maturity Model Integration
E/E/PE electrical/electronic/programmable electronic
EN European norm
DD dangerous detected failure
DU dangerous undetected failure
FD fail dangerous
FMEA failure modes & effects analysis
FMEDA failure modes, effects & diagnostic analysis
FS fail safe
FSM functional safety management
HAZOP hazard and operability study
IEC International Electrotechnical Commission
ISCI ISA Security Compliance Institute
L/S logic solver
PES programmable electronic system
PFDavg average probability of failure on demand
PLC programmable logic controller
POS positioner
RR risk reduction
RRF risk reduction factor
RRFa risk reduction factor achieved by SF
RRFr risk reduction factor required of SF
SEN sensor
SF safety function
SFF safe failure fraction
SIL safety integrity level
SRS safety related system
SrS safety requirements specification
S/V solenoid valve
2. BACKGROUND
2.1 Hazards, Risks, & Risk Reduction Factor
IEC 61508 defines a hazard as a potential source of
physical injury, damage to the health of people, or damage to
property or the environment. It also links the hazard to its
potential consequences in order to establish a measure of risk.
Consider, for example, the steam turbine system and a basic
process control system (BPCS) illustrated in Figure 1. The
steam turbine system consists of a valve to control the inlet
steam, a turbine spun by the steam, a shaft turned by the
turbine, and an unseen load on the shaft. The BPCS consists
of a sensor (SEN) to monitor the shaft speed, a logic solver
(L/S) to determine if the shaft speed is appropriate or needs to
be altered, a positioner (POS) and an actuator and control
valve (A/CV) to adjust the amount of steam driving the
turbine.
Figure 1 - Steam turbine system with a BPCS.
One can identify a number of hazards for this system; we
consider a few examples and their possible consequences. If
the shaft spins too fast, flying projectiles may result; these
represent damage to the turbine system itself and may
further cause personnel injury or damage to adjacent
equipment. If the shaft spins too slowly, it will bend under the
load (equipment damage) unless the load is partially or fully
removed. If steam leaks, personnel in the vicinity may sustain
serious injury. A later example illustrates hazards with
environmental consequences as well.
Risk is a quantitative measure which incorporates both the
likelihood and the consequences of a hazard, i.e., how often
a hazard can occur and, if it does, what its consequences will
be and how severe they are. Risk can affect personnel, the
environment, and equipment/property, and can cause business
interruption, business liability, and damage to company image.
In performing risk analysis, we need to distinguish
between inherent and tolerable risk. Inherent risk is the risk
posed by the process (including its BPCS) unmitigated by
additional automatic protection equipment, i.e., unmitigated
by a safety function (SF) whose concept is detailed in the next
section. It is impossible and/or impractical to eliminate all
inherent risk. Tolerable risk is the level of risk designated as
acceptable to management, insurers, regulatory authorities, and the general public.
The concept of risk reduction factor (RRF) has two
distinct but related usages. The first is to specify the
minimum risk reduction required of an implemented SF in
order to decrease the overall risk from its inherent level to its
tolerable level. We designate this required RRF as RRFr
which is defined as
RRFr = inherent risk / tolerable risk. (1)
The second usage is to specify what RRF is achieved by a
particular SF implementation. We designate this achieved
RRF as RRFa which is defined as
RRFa = 1/PFDavg (2)
where PFDavg is the average probability of failure on
demand. Further details regarding PFDavg are described in
the sections following.
In order for an SF to be appropriate for a given
application, RRFa must be greater than or equal to RRFr.
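The two uses of RRF in (1) and (2), together with the condition RRFa ≥ RRFr, can be sketched in a few lines. This is a minimal illustration, not a procedure taken from the standard. The function names and the numbers in the usage line are hypothetical, and both risks are assumed to be expressed in the same units (e.g., hazardous events per year for a fixed consequence).

```python
def required_rrf(inherent_risk: float, tolerable_risk: float) -> float:
    """Equation (1): RRFr = inherent risk / tolerable risk (same units assumed)."""
    if inherent_risk <= 0 or tolerable_risk <= 0:
        raise ValueError("risks must be positive and in the same units")
    return inherent_risk / tolerable_risk


def achieved_rrf(pfd_avg: float) -> float:
    """Equation (2): RRFa = 1 / PFDavg."""
    if not 0 < pfd_avg <= 1:
        raise ValueError("PFDavg must lie in (0, 1]")
    return 1.0 / pfd_avg


def sf_is_appropriate(inherent_risk: float, tolerable_risk: float, pfd_avg: float) -> bool:
    """An SF is appropriate for the application when RRFa >= RRFr."""
    return achieved_rrf(pfd_avg) >= required_rrf(inherent_risk, tolerable_risk)


# Hypothetical example: inherent risk 0.1 events/yr and tolerable risk 1e-4 events/yr
# give an RRFr of about 1,000; a design with PFDavg = 5e-4 gives an RRFa of about 2,000.
print(required_rrf(0.1, 1e-4), achieved_rrf(5e-4), sf_is_appropriate(0.1, 1e-4, 5e-4))
```

A design with PFDavg = 2e-3 (RRFa of about 500) would fail the same check.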
2.2 Concepts of a Safety Function & Safety Related System
We have already referred to an SF and, here, further
explore this concept. An SF is a collection of sensors, a logic
solver, and final elements used to implement automatic
mitigation of a specific hazard; see Figure 2. Again, consider
the turbine system, now illustrated in Figure 3 with an SF also
present. The SF consists of a speed SEN, an L/S, a solenoid
valve (S/V), and an actuator and safety valve (A/SV); the S/V
and A/SV are referred to as final elements.
At first glance, it may appear that the SF directly
duplicates the function of the BPCS. However, there are
differences. For example, where the BPCS A/CV is designed
to adjust position to allow varying amounts of steam to drive
the turbine, the SF A/SV is designed merely to be either fully
opened (allowing the BPCS A/CV to control the amount of
steam) or fully closed (to deprive the turbine of steam in the
event of an over-speed hazard). The illustrated SF is designed
to protect against a specific hazard. Other SFs may be
required to protect against other hazards.
Figure 2 - Illustration of the basic components of a safety function.
Figure 3 - Steam turbine system with a BPCS and a safety function (SF).
For example, if the shaft spins too slowly, and the BPCS
does not or cannot appropriately compensate, the SF SEN will
measure the shaft speed, the SF L/S will determine that load
shedding is necessary and direct different (unseen) final
elements to perform this task. Thus, in general, a process will
need the additional protection of several SFs which will likely
share a common L/S and which may or may not share certain
sensors and final elements. A collection of SFs designed to
protect a process against several hazards is a safety related
system (SRS). Figure 4 illustrates this concept.
2.3 SF and SRS Failure Modes
SFs and SRSs are needed because we recognize that the process or its BPCS may fail. Clearly, an SF or SRS may also
fail due to the failure of one or more components of the SRS.
To properly analyze the impacts of an SF or SRS failure per
IEC 61508 we must distinguish between different failure
modes.
An SF or SRS is said to fail safe (FS) if, due to failure of
SF or SRS component(s), it erroneously determines that a
hazard exists and inappropriately intervenes in the process,
usually by executing a shutdown of the process that is not, in
fact, required. Safe failures are disruptive to the process but
do not pose any safety risks. On the other hand, an SF or SRS
is said to fail dangerously (FD) if, due to failure of SF or
SRS component(s), it is unable to intervene appropriately if it
is required to do so. Dangerous failures are of paramount
concern from a safety perspective because the RRFa of the SF
is determined by PFDavg, which can be interpreted as the average
fraction of time the SF spends in FD states, i.e., the fraction of
time during which the SF would be unable to respond to a hazard (demand).
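When PFDavg is interpreted as a fraction of time, its value for a given operating history is the time spent in FD states divided by the total time. The sketch below computes that fraction from a list of fail-dangerous intervals. It illustrates the interpretation only. The intervals are assumed not to overlap, and the function name and history are hypothetical. A single history yields only a sample estimate; PFDavg for a design is normally obtained from failure-rate models.

```python
from typing import Iterable, Tuple


def pfd_avg_from_history(fd_intervals: Iterable[Tuple[float, float]],
                         mission_time_h: float) -> float:
    """Fraction of [0, mission_time_h] spent in fail-dangerous (FD) states.

    fd_intervals: (start_h, end_h) pairs, assumed non-overlapping, during which
    the SF could not have responded to a demand.
    """
    if mission_time_h <= 0:
        raise ValueError("mission time must be positive")
    time_in_fd = 0.0
    for start, end in fd_intervals:
        if not 0 <= start <= end <= mission_time_h:
            raise ValueError(f"interval ({start}, {end}) lies outside the mission time")
        time_in_fd += end - start
    return time_in_fd / mission_time_h


# Hypothetical 10-year history (87,600 h): one dangerous failure that went
# unnoticed for 3,000 h and one that was repaired within 24 h.
pfd = pfd_avg_from_history([(10_000, 13_000), (50_000, 50_024)], 87_600)
print(f"PFDavg ~ {pfd:.4f}, RRFa ~ {1 / pfd:.1f}")
```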
Figure 4 - Illustration of a safety related system.
2.4 Automatic Diagnostics, Detected & Undetected Failures
SFs and SRSs usually have built-in diagnostics to monitor
the health of the safety system and to determine if the SRS has
entered an FD failure mode. The ability to automatically
detect FD failure modes is important because, once detected,
measures can be taken to reduce the amount of time the SF remains in an FD state, thereby reducing the PFDavg and
increasing the RRFa.
During design analysis it should be possible to identify all
FD states in an SF. Ideally, then, we would like to design and
implement automatic diagnostics to detect all FD states.
Sometimes, it is not possible to implement an automatic
diagnostic for a particular FD state. This often arises with
mechanical final elements. Furthermore, even if an automatic
diagnostic could be designed for every FD, it is neither
practical nor prudent to implement diagnostics to cover all
possible FD states. Thus, it is normal practice to provide
automatic diagnostic coverage where practical for the most
likely and the most critical failures. Therefore, some SF/SRS
failures will be detected and others will be undetected.
Two strategies for minimizing the time spent in
dangerous detected failure (DD) states are to
1. Automatically convert a DD state to an FS state, or
2. Minimize the repair time needed to leave the DD state and return the SF to a functioning state.
Undetected dangerous failures (DU), on the other hand,
can only be addressed by periodic manual testing.
Consequently, in most well designed SRS, the time spent in
states of DU is the principal contributor to PFDavg.
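A common simplified approximation for a single-channel (1oo1) SF shows why DU failures usually dominate: PFDavg ≈ λDU·TI/2 + λDD·MTTR. Here λDU and λDD are the DU and DD failure rates, TI is the manual proof test interval, and MTTR is the mean time to repair a detected failure. This formula does not come from this tutorial. It is a textbook approximation, assumed valid only when λ·TI is much less than 1 and only when DD failures are repaired (strategy 2 above) rather than converted to a trip. The rates below are hypothetical.

```python
def pfd_avg_1oo1(lambda_du: float, lambda_dd: float,
                 proof_test_interval_h: float, mttr_h: float) -> dict:
    """Simplified 1oo1 approximation (assumes lambda * TI << 1):

        PFDavg ~= lambda_DU * TI / 2 + lambda_DD * MTTR

    Failure rates are per hour; times are in hours.
    """
    du_term = lambda_du * proof_test_interval_h / 2  # a DU failure stays hidden ~half a test interval on average
    dd_term = lambda_dd * mttr_h                     # a DD failure lasts roughly one repair time
    return {"DU": du_term, "DD": dd_term, "PFDavg": du_term + dd_term}


# Hypothetical rates: DD failures are four times MORE frequent than DU failures,
# yet with a yearly proof test (8,760 h) and a 24 h repair the DU term dominates.
result = pfd_avg_1oo1(lambda_du=1e-6, lambda_dd=4e-6, proof_test_interval_h=8_760, mttr_h=24)
for name, value in result.items():
    print(f"{name:7s} {value:.2e}")
```

Under this model, only a shorter proof test interval (or better diagnostics that move failures from DU to DD) reduces the dominant term.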
2.5 Architectures
Classical k-out-of-n models are familiar to reliability
engineers. Thus, in Figure 5 we illustrate the familiar 1-out-
of-2 and 2-out-of-2 reliability models. When reliability is
being analyzed, continuity from input to output is the key.
Safety architectures are different. The de-energize-to-trip
design is the most common safety architecture, in which the SF
and SRS are designed to deprive the process of energy, i.e., to
shut down the process, in the event of a hazard. Thus, in safety
models, the key is the ability to interrupt continuity from input
to output. Figure 5 also illustrates two common de-energize-to-trip
safety models. Note how, in these two examples, the
nomenclature for the safety models is the reverse of that for
the comparable reliability models.
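The reversal can be checked numerically. In the sketch below, each of two channels (e.g., a sensor and its trip contact) is assumed to fail dangerously with the same probability p, independently of the other. Common-cause failures, which matter in practice, are ignored. The function names and the value of p are hypothetical.

```python
def pfd_1oo2(p: float) -> float:
    """1oo2 safety: two de-energize-to-trip contacts wired in SERIES.

    Opening either contact interrupts continuity, so the SF fails on demand only
    if BOTH channels have failed dangerously. In reliability terms (continuity
    needs both contacts closed) this same wiring is a 2oo2 model.
    """
    return p * p


def pfd_2oo2(p: float) -> float:
    """2oo2 safety: two contacts wired in PARALLEL.

    Both contacts must open to interrupt continuity, so the SF fails on demand
    if EITHER channel has failed dangerously. In reliability terms this wiring
    is a 1oo2 model.
    """
    return 1 - (1 - p) ** 2


p = 0.01  # hypothetical per-channel probability of failure on demand
print(f"1oo2: {pfd_1oo2(p):.2e}  2oo2: {pfd_2oo2(p):.2e}")
```

The series (1oo2) wiring also trips spuriously if either channel fails safe. That is the usual price of its lower PFD.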
Figure 5 - Comparison of reliability and safety models.
3. OVERVIEW OF IEC 61508
3.1 Historical Perspective
IEC 61508 is an international standard for the functional
safety of electric, electronic, and programmable electronic
equipment. Development of this standard began in the mid-1980s
when the International Electrotechnical Commission Advisory Committee on Safety (IEC ACOS) set up a task
force to consider standardization issues raised by the use of
programmable electronic systems (PES) in automatic
protection systems. At that time, many regulatory bodies
forbade the use of any software-based equipment in safety
critical applications. Work began within IEC
SC65A/Working Group 10 on a standard for PES used in
SRS. This group merged with Working Group 9 where a
standard on software safety was in progress. The combined
group treated safety as a system issue.
3.2 Structure of the Standard
The complete IEC 61508 standard is divided into seven
parts:
1. General requirements (required for compliance)
2. Requirements for electrical/electronic/programmable electronic safety-related systems (required for compliance)
3. Software requirements (required for compliance)
4. Definitions and abbreviations (supporting information)
5. Examples of methods for the determination of safety integrity levels (supporting information)
6. Guidelines on the application of parts 2 and 3 (supporting information)
7. Overview of techniques and measures (supporting information)
Parts 1, 3, 4, and 5 were approved in 1998. Parts 2, 6, and
7 were approved in February 2000.
Parts 1-4 are normative, meaning that the requirements
(interpreted using the official definitions) must be met for
compliance with the standard. Parts 5-7 are informative,
meaning that they provide examples, guidelines, techniques
and measures but do not mandate the use of any specific
guidelines, techniques or measures to be in compliance.
The normative parts of the standard comprise nearly 500
pages with thousands of requirements, i.e., sentences
including the term "shall" or "must", which need to be
correctly addressed for compliance with the standard. Broadly
speaking, these requirements fall into one of two groups
which relate directly to the two fundamental concepts of IEC
61508 discussed in Section 3.6 below:
One group of requirements covers the design lifecycle process. This is intended to provide a sufficient level of integrity against systematic failures of the system, i.e., fault avoidance.
The other group of requirements covers the probabilistic analysis of all hardware involved in any safety function. This is intended to provide a sufficient level of integrity against random failures of the system.
3.3 Philosophy and Consequences of the Standard
All of the requirements are intended to help designers
create systems that work correctly (are reliable) or fail in a
predictable (hopefully fail-safe) manner. Most designers
consider the requirements of IEC 61508 to be classical,
common sense practices that come directly from prior quality
standards and general engineering practices.
The standard focuses attention on risk-based safety-
related system design, which should result in higher levels of
safety and far more cost-effective implementation. The
standard also requires the attention to detail that is vital to any
safe system design. Finally, the standard offers flexibility by
not prescribing specific techniques and measures, instead
offering alternatives to achieve compliance. Because of these
features and the large degree of international acceptance for a
single set of documents, many consider the standard to be a
major advance for the technical world.
3.4 Goals of the Standard
IEC 61508 is a basic safety publication of the IEC.
Lacking industry-specific language, it is an umbrella
document covering multiple industries and applications. A
primary goal of the standard is to help individual industries
develop supplemental standards, tailored specifically to those
industries, based on the original 61508 standard. Several such
industry specific standards have now been developed with
more on the way. IEC 61511 [2] has been written for the
process industries. IEC 62061 [3] addresses machinery safety.
IEC 61513 [4] deals with the nuclear industry. There are even
product-specific standards now being released that follow the
framework and the concepts of IEC 61508. One of these is IEC
61800-5-2 [5], Safety Requirements - Functional Safety, for
variable speed motor controllers. All of these standards build
directly on IEC 61508 and reference it accordingly.
A secondary goal of the standard is to enable the
development of electrical/electronic/programmable electronic
(E/E/PE) SRS where specific application sector standards do not already exist.
3.5 Scope of the Standard
Although originally conceived as a standard for E/E/PE
SRS, the IEC 61508 standard also covers SRS that
incorporate mechanical devices in addition to E/E/PE
devices. Thus, these devices can include anything from ball
valves, clutch/brake assemblies, solenoid valves, electrical
relays and switches to complex computerized brake controls
and programmable logic controllers (PLC). The overall
program to ensure that the E/E/PE SRS brings about a safe
state when called upon to do so is defined as functional
safety. IEC 61508 does not cover safety issues such as electric
shock, hazardous falls, long-term exposure to a toxic
substance, etc.; these issues are covered by other standards.
3.6 Two Fundamental Concepts
The standard is based on two fundamental concepts:
1. The safety lifecycle, a detailed engineering design process, intended to reduce or eliminate failures due to systematic errors, and
2. Probabilistic failure performance analysis, quantified in order of magnitude levels (called safety integrity levels, SIL), intended to address random failures.
3.6.1 Safety Lifecycle
The safety lifecycle is defined as an engineering process
that includes all of the steps necessary to achieve required
functional safety. The safety lifecycle is included in the
standard to provide sufficient protection against systematic
errors, errors resulting in failures that are deterministically
related to a certain cause. Systematic errors are typically
design mistakes.
The basic philosophy of protection behind the safety
lifecycle is to develop and document a safety plan that
includes all engineering activities per the requirements of the
standard, execute that plan and document its execution (to show that the plan has been met). Changes along the way
must similarly follow the pattern of planning, execution,
validation, and documentation. Although the standard is
written in the context of a custom, turnkey system, the
requirements are applicable to general product design and
development.
SILs are order-of-magnitude levels of RRF. There are four
SILs defined in IEC 61508, as shown in Table 1. SIL 1 has the
lowest level of risk reduction (RR); SIL 4 has the highest level
of RR.
3.6.2 Probabilistic Failure Performance Analysis
Probabilistic failure performance analysis is the second
fundamental concept. Quantitative RR targets, i.e., RRFr, are
established and failure probability calculations are performed
to verify that each SF design meets its RRFr. This
performance-based approach allows the standard to avoid prescriptive rules for redundancy and self-test capability that
so often become obsolete soon after they are published.
Table 1 - Correspondence between SIL and RRF
Safety Integrity Level (SIL)   Risk Reduction Factor (RRF)
SIL 1                          (10, 100]
SIL 2                          (100, 1,000]
SIL 3                          (1,000, 10,000]
SIL 4                          (10,000, 100,000]
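Table 1 can be applied mechanically to place an RRF value in a SIL band. The sketch below reads each entry (a, b] as a < RRF ≤ b and returns None outside the tabulated range. The lookup, the function name, and the example PFDavg values are illustrative assumptions, not text from the standard. The sketch only places a number in a band; it does not check any other compliance requirement.

```python
from typing import Optional

# Table 1 bands as (SIL, lower bound exclusive, upper bound inclusive)
SIL_BANDS = ((1, 10, 100), (2, 100, 1_000), (3, 1_000, 10_000), (4, 10_000, 100_000))


def sil_from_rrf(rrf: float) -> Optional[int]:
    """Return the SIL whose RRF band in Table 1 contains rrf, else None."""
    for sil, low, high in SIL_BANDS:
        if low < rrf <= high:
            return sil
    return None  # RRF <= 10 or RRF > 100,000: not covered by Table 1


# Hypothetical designs, converted with RRFa = 1/PFDavg from equation (2)
for pfd_avg in (5e-2, 2e-3, 4e-4, 5e-5):
    rrf_a = 1 / pfd_avg
    print(f"PFDavg={pfd_avg:.0e}  RRFa={rrf_a:,.0f}  SIL={sil_from_rrf(rrf_a)}")
```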
IEC 61508 recognizes that not all failures are equal. Two
primary failure modes are defined, FS and FD, as discussed in
the Background section.
3.6.3 The Standard from Different Viewpoints
Both of the fundamental concepts and supporting
concepts will be dealt with in greater detail later in this
tutorial. However, it is worth noting at this point that from an
installed system level viewpoint, which is usually that of the
owner-operator, the entire safety lifecycle needs to be
addressed for IEC 61508 compliance and the requirements for
this are treated primarily in Part 1 of the standard, although
Parts 2 & 3 apply to hardware and software design issues in
the lifecycle. On the other hand, from the viewpoint of a
manufacturer who is producing a component or system used in
a safety related application, Parts 2 & 3 of the standard are
paramount, though some aspects of Part 1, such as
documentation issues, must still be addressed.
3.7 Compliance with the Standard
The IEC 61508 standard states: "To conform to this
standard it shall be demonstrated that the requirements have
been satisfied to the required criteria specified (e.g., SIL) and
therefore, for each clause or sub-clause, all the objectives have
been met." This is often demonstrated by the use of a Safety
Case.
The Safety Case / Safety Justification methodology
provides a systematic and complete way to show compliance
to one or more functional safety standards. The methodology
was established in industries which deal with functional safety
of computerized automation in nuclear and avionics
applications [6, 7].
For the IEC 61508 standard, all requirements from IEC
61508 have been compiled in a number of industry databases
[8, 9]. Each requirement should be precisely documented
along with the reasoning behind the requirement. Arguments
/ Solutions provide a description of how each requirement is
met by listing design arguments, verification activities and test
cases relevant to that requirement. For full traceability, each
design argument and verification/test activity is linked with
evidence documents showing the results of the work.
When a safety case for IEC 61508 compliance of a
product is completed it must show all requirements along with
an argument for each requirement as to how the system /
product meets the requirement. A link to the evidence
document that supports the argument is also provided.
Additional fields are provided for the independent assessor to
record the results of the assessment and to communicate their
expectations with other assessors and the certifying individuals.
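The entry structure described above (requirement, reasoning, arguments, linked evidence, and assessor fields) maps naturally onto a small data model. The sketch below is a hypothetical illustration, not the schema of the databases cited in [8, 9]. The class names, field names, and requirement IDs are invented for the example. Its completeness check flags any requirement that lacks an argument or has an argument without linked evidence.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Argument:
    """How a requirement is met: a design argument, verification activity or test case."""
    description: str
    evidence: List[str] = field(default_factory=list)  # references to evidence documents


@dataclass
class RequirementEntry:
    req_id: str                             # hypothetical identifier, e.g. "REQ-001"
    requirement: str                        # the requirement text
    rationale: str                          # reasoning behind the requirement
    arguments: List[Argument] = field(default_factory=list)
    assessor_result: Optional[str] = None   # recorded by the independent assessor


def incomplete_entries(safety_case: List[RequirementEntry]) -> List[str]:
    """IDs of requirements with no argument, or with an argument lacking evidence."""
    return [
        entry.req_id
        for entry in safety_case
        if not entry.arguments or any(not arg.evidence for arg in entry.arguments)
    ]


case = [
    RequirementEntry("REQ-001", "Hypothetical requirement A", "Why A matters",
                     [Argument("Design review", ["DR-report.pdf"])]),
    RequirementEntry("REQ-002", "Hypothetical requirement B", "Why B matters",
                     [Argument("Unit test plan")]),  # argument without evidence
    RequirementEntry("REQ-003", "Hypothetical requirement C", "Why C matters"),
]
print(incomplete_entries(case))
```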
Overall, the safety case concept provides a single place to
store compliance information in an organized manner. The
use of a safety case provides a systematic means to ensure
completeness of any assessment. The Safety Case method
supports company learning over multiple projects by
establishing a knowledge base consisting of patterns of
fundamental requirements and related design arguments.
Templates and previous examples of evidence documents
provide the ability to reduce effort on subsequent projects.
3.8 Legal Implications of the Standard
Because IEC 61508 is technically only a standard and not a regulation or law, compliance is not always legally required.
However, in many instances, compliance is identified as best
practice and thus can be cited in liability cases. Also, many
countries have incorporated IEC 61508 or large parts of the
standard directly into their safety codes, so in those instances,
it has the force of law. Finally, many industry and
government contracts for safety equipment, systems, and
services specifically require compliance with IEC 61508. So
although IEC 61508 originated as a standard, its wide
acceptance has led to legally required compliance in some
cases.
4. SYSTEM LEVEL APPLICATION
4.1 The System Safety Lifecycle Overview
IEC 61508 was written assuming that a complete custom
automatic protection system is being created. Thus the system
safety lifecycle process covers all activities from initial project
definition to de-commissioning of a system. These activities
are divided into three phases and are numbered to match their
depictions in the flowcharts described below. The three
system safety lifecycle phases are
Analysis phase consisting of the following activities:
1. Conceptual process design
2. Identification of potential risks
3. Consequence analysis
4. Layer of protection analysis
5. SF RRFr and SIL determination
6. Requirements documentation
Realization phase consisting of the following activities:
7a. SRS technology selection
7b. SRS architecture selection
7c. Test frequency determination
7d. Reliability and safety evaluation
8. SRS detailed design
9. SRS installation & commissioning planning
10. SRS installation, commissioning, & acceptance
testing
Operation phase consisting of the following activities:
11. Validation planning
12. Safety review
13. Operating & maintenance planning
14. Start-up, operation, maintenance, periodic proof testing
15. Modifications
16. Decommissioning
A brief flowchart illustrating the three phases of the
system safety lifecycle process is shown in Figure 6. Note
that during the lifecycle, all modifications are required to be
fed back to the analysis phase. The individual activities and
their relationships are readily depicted in an extensive
flowchart which is illustrated, by phase, in Figures 7, 8 & 9.
Figure 6 - Overview of system safety lifecycle.
Figure 7 - Details of analysis phase of system safety lifecycle.
There is a requirement that documented procedures exist
for all safety lifecycle activities. The results of all safety
lifecycle activities must also be documented. Additionally,
IEC 61508 requires quality auditing be performed to ensure
that the lifecycle process is actually being followed on a
project. This is called functional safety management
(FSM). Depending on the SIL level of a project, different
levels of FSM independence are required with an
independent organization required for the higher SIL levels.
Practical interpretations of this FSM independence
requirement, determined by SIL level, are as follows:
SIL 1: Independent FSM auditor is an independent person(s) outside of the immediate design team/development group.
SIL 2: Independent FSM auditor is an independent person(s) outside of the immediate department responsible for design/development.
SIL 3: Independent FSM auditor is an independent organization, commonly interpreted to mean an entity outside the design/development company.
The remainder of this section explains the three phases in
greater detail and highlights some of the specific activities.
Figure 8 - Details of realization phase of system safety lifecycle.
Figure 9 - Details of operation phase of system safety lifecycle.
4.2 Analysis Phase
The overall objective of activities 1-4 is to identify where
dangerous situations are and how dangerous they may be.
Thus, hazards and consequences are identified and
documented, and the inherent risk (likelihood and
consequence) of each hazard (in a process without automatic
protection equipment) is estimated or calculated. IEC 61508
does not specify how these activities are to be accomplished.
Accepted methods vary by industry; they are well documented
and widely practiced [10, 11]. Figures 10 and 11
provide an industrial example of hazards and consequences for
a platform separation process.
In activity 5, the inherent risk is compared to tolerable
risk criteria. Tolerable risk criteria are not included in IEC
61508.
Figure 10 Industrial example of platform separation process.
Figure 11 Industrial example of hazards and consequences.
In some countries government regulators establish
quantitative tolerable risk criteria, but in most cases tolerable
risk is established by the owner-operator of the process. If the
inherent risk exceeds the tolerable risk, then RR requirements
for each hazard are established. Often RR is simply specified
as an order-of-magnitude level designated SIL. In some cases,
when quantitative risk frequency methods are used, the RRFr
is calculated per (1). Figure 12 continues the platform
separation process example to the calculation of an RRFr and
Figure 13 indicates the SIL level requirements.
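As a sketch of this step, assume that (1) has the common form RRFr = F_inherent / F_tolerable, i.e., the hazard frequency without automatic protection divided by the tolerable frequency. The result can then be mapped to a low demand SIL using the order-of-magnitude bands that follow from IEC 61508 (SIL 1: RRF above 10 up to 100, SIL 2: up to 1000, SIL 3: up to 10 000, SIL 4: up to 100 000). The function names and frequencies are hypothetical and are not the values of Figures 12 and 13.

```python
# Low demand RRF bands: SIL n covers RRF in (10**n, 10**(n + 1)].
RRF_UPPER_BOUNDS = ((1, 1e2), (2, 1e3), (3, 1e4), (4, 1e5))


def required_rrf(inherent_per_year: float, tolerable_per_year: float) -> float:
    """Assumed form of (1): unprotected hazard frequency / tolerable frequency."""
    if inherent_per_year <= 0 or tolerable_per_year <= 0:
        raise ValueError("frequencies must be positive")
    return inherent_per_year / tolerable_per_year


def required_sil(rrf: float) -> int:
    """Lowest SIL whose RRF band reaches rrf; 0 means below the SIL 1 band."""
    if rrf <= 10:
        return 0
    for sil, upper in RRF_UPPER_BOUNDS:
        if rrf <= upper:
            return sil
    raise ValueError("RRFr exceeds the SIL 4 band")


# Illustrative numbers: an unprotected hazard expected once every 2 years,
# against a tolerable frequency of once in 10,000 years.
rrf_r = required_rrf(inherent_per_year=0.5, tolerable_per_year=1e-4)
print(f"RRFr = {rrf_r:.0f}, required SIL = {required_sil(rrf_r)}")
```

Using inclusive upper bounds means an RRFr of exactly 100 is met by a SIL 1 function, consistent with a PFDavg of exactly 0.01 lying in the SIL 1 band.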
In activity 6, an SF is defined to protect against each
hazard for which RR is required. The description of the SF
along with the required RR, i.e., the RRFr or the required SIL,
is documented in a safety requirements specification (SrS).
This document becomes the input to the realization phase of
the system safety lifecycle.
Figure 12 Example calculation of RRFr.
Figure 13 SIL level requirements to meet RRFr.
4.3 Realization Phase
After all the SFs are identified and documented, the
realization phase begins with a conceptual design. Specific
equipment is selected. Redundancy levels are chosen. Test
strategies are planned. Based on that information, a
probability of failure calculation is then performed to verify
that the design meets the RRFr. Often initial designs do not
meet the RRFr and the designer must make changes. When
the optimal design is reached through what is normally an
iterative process, the design details can be completed.
4.3.1 Equipment Selection
In activity 7a, a conceptual design is performed by
choosing the equipment to perform the safety function.
Equipment must be chosen to sense the hazardous condition.
Typical sensors measure pressure, temperature,
flow, level, proximity, velocity or other variables. Often a
microprocessor-based product is chosen to implement the
protection logic. IEC 61508 calls this device the L/S. An SF
will also need a final element. This set of equipment
performs the protective action. Commonly in the chemical /
petro-chemical industries this is a remote actuated valve that
opens or closes to reduce energy. In machine safety there is
often a clutch/brake assembly that dissipates kinetic energy.
In many applications an electrical relay will de-energize a
motor or other load.
The equipment is selected based on the classical
requirements for needed functionality, accuracy and
environmental constraints. For functional safety, it is also
necessary to justify the equipment choices. Justification should consider experience in using a product in similar
applications and the product functional safety design features.
Often products that are third party certified to meet
requirements of IEC 61508 are selected. The designer must
also obtain failure rate and failure mode data for each piece of
equipment. IEC 61508 certified equipment is supplied with a
Safety Manual which contains this information along with all
needed information to support compliance with functional
safety standards.
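The failure rate and failure mode data described above can be kept in a small per-device record. The sketch assumes the four-way split commonly used in IEC 61508 analyses (safe detected, safe undetected, dangerous detected, dangerous undetected), with rates in FIT; the class name, fields and all numbers are hypothetical rather than taken from any Safety Manual. It also derives the safe failure fraction, SFF = (λS + λDD)/(λS + λD), which the probabilistic analysis uses later.

```python
from dataclasses import dataclass

FIT = 1e-9  # 1 FIT = 1 failure per 10**9 device-hours


@dataclass(frozen=True)
class FailureRates:
    """Failure rates of one device by failure mode, in FIT (hypothetical layout)."""

    name: str
    sd: float  # safe detected
    su: float  # safe undetected
    dd: float  # dangerous detected by automatic diagnostics
    du: float  # dangerous undetected: hidden until a proof test or a demand

    @property
    def safe(self) -> float:
        return self.sd + self.su

    @property
    def dangerous(self) -> float:
        return self.dd + self.du

    @property
    def sff(self) -> float:
        """Safe failure fraction = (lambda_S + lambda_DD) / (lambda_S + lambda_D)."""
        total = self.safe + self.dangerous
        if total <= 0:
            raise ValueError(f"{self.name}: no failure rates given")
        return (self.safe + self.dd) / total

    def du_per_hour(self) -> float:
        return self.du * FIT


# Hypothetical data for the three parts of one SF: sensor, L/S, final element.
sf_equipment = [
    FailureRates("pressure transmitter", sd=0, su=80, dd=300, du=40),
    FailureRates("logic solver", sd=500, su=100, dd=1200, du=10),
    FailureRates("valve assembly", sd=0, su=600, dd=0, du=1400),
]
for device in sf_equipment:
    print(f"{device.name}: SFF = {device.sff:.1%}, lambda_DU = {device.du_per_hour():.1e} /h")
```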
4.3.2 Redundancy
In activity 7b, the safety architecture is specified.
Redundant equipment may be chosen to achieve high levels of safety integrity, high levels of availability, or a
combination of both [12]. Unlike prescriptive standards,
IEC 61508 contains no specific requirements for redundant
equipment. Instead the designer may choose the type of
redundancy that is best for the application considering
maintenance capabilities and cost issues. For the equipment
chosen, the reliability and safety models and failure rates must
be obtained. Some redundant controller manufacturers
provide calculation tools that model their redundant systems.
Others provide the models and data in the Safety Manual.
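To see how the choice of architecture enters the model, the sketch below uses widely published simplified low demand approximations for 1oo1, 1oo2 and 2oo3 groups of identical channels. It keeps only dangerous undetected failures (λDU), a perfect proof test every TI hours and a beta-factor for common cause failure, and ignores diagnostics and repair time, so it illustrates the idea rather than reproducing any vendor model; function names and numbers are hypothetical.

```python
# Simplified low demand PFDavg approximations for identical channels.
# Assumptions: only DU failures, perfect proof tests every ti hours, no repair
# time, beta-factor common cause model, and lam_du * ti well below 1.


def pfdavg_1oo1(lam_du: float, ti: float) -> float:
    return lam_du * ti / 2


def pfdavg_1oo2(lam_du: float, ti: float, beta: float) -> float:
    independent = ((1 - beta) * lam_du * ti) ** 2 / 3  # both channels fail
    common_cause = beta * lam_du * ti / 2  # one cause fails both at once
    return independent + common_cause


def pfdavg_2oo3(lam_du: float, ti: float, beta: float) -> float:
    independent = ((1 - beta) * lam_du * ti) ** 2  # any two of three fail
    common_cause = beta * lam_du * ti / 2
    return independent + common_cause


lam_du = 2e-6  # per hour, hypothetical
ti = 8760.0  # one-year proof test interval, in hours
beta = 0.05  # hypothetical common cause fraction
results = {
    "1oo1": pfdavg_1oo1(lam_du, ti),
    "1oo2": pfdavg_1oo2(lam_du, ti, beta),
    "2oo3": pfdavg_2oo3(lam_du, ti, beta),
}
for architecture, pfd in results.items():
    print(f"{architecture}: PFDavg = {pfd:.2e}")
```

With these numbers the common cause term (β·λDU·TI/2 ≈ 4.4e-4) dominates both redundant results, which illustrates why IEC 61508 requires common cause failures to be included when analyzing redundant systems.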
4.3.3 Testing
Once the technology and architecture have been chosen,
the designers plan any potential on-line testing methods
during activity 7c. In some applications the equipment
comprising an SF will, hopefully, not be called on to activate
frequently. This situation is called low demand. In a low
demand application, the equipment often sits dormant for
years at a time. There is no overt indication as to whether the
equipment is still working. Final element equipment in
particular can corrode, cold-weld or otherwise fail in a
completely hidden way such that the SF will not work when
needed, a condition of FD. Therefore the equipment must be
completely inspected and tested at specified time intervals.
Equipment with automatic testing and annunciation is the preferred choice. However, even with automatic testing there
is normally some manual testing that must be done. This
manual testing can verify that the automatic testing continues
to work correctly and can detect FD states not covered by the
automatic test.
In some industries, the target periodic test interval
corresponds with a process shutdown and major maintenance
cycle. In other industries, a periodic inspection/test must be
done more frequently. If these tests must be performed while
the process is operating, on-line test facilities are designed into
the SRS. A periodic inspection and test plan must be created
for all the equipment in each SF.
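A rough way to see how test interval and test quality interact is a commonly used single-channel approximation, PFDavg ≈ C·λDU·TI/2 + (1 − C)·λDU·LT/2, where C is the proof test coverage (the fraction of DU failures the manual test reveals), TI is the test interval and LT is the mission time. It covers only DU failures and ignores repair time and redundancy; the function name and all numbers are hypothetical.

```python
def pfdavg_with_proof_test(lam_du: float, ti_hours: float, coverage: float,
                           mission_hours: float) -> float:
    """Single-channel PFDavg with imperfect proof testing (simplified sketch).

    coverage: fraction of DU failures the manual proof test reveals.
    DU failures the test misses stay hidden for the whole mission time.
    """
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be between 0 and 1")
    if not 0 < ti_hours <= mission_hours:
        raise ValueError("need 0 < ti_hours <= mission_hours")
    revealed = coverage * lam_du * ti_hours / 2
    hidden = (1 - coverage) * lam_du * mission_hours / 2
    return revealed + hidden


HOURS_PER_YEAR = 8760.0
lam_du = 1.4e-6  # per hour, hypothetical final element
mission = 10 * HOURS_PER_YEAR  # hypothetical 10-year mission time
for years in (1, 2, 5):
    for coverage in (0.6, 0.9):
        pfd = pfdavg_with_proof_test(lam_du, years * HOURS_PER_YEAR, coverage, mission)
        print(f"TI = {years} y, coverage = {coverage:.0%}: PFDavg = {pfd:.2e}")
```

When coverage is below 100%, the hidden term grows with mission time and cannot be reduced by testing more often, which is why complete inspection of final elements matters.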
4.3.4 Probabilistic Failure Analysis
Once the equipment, redundant architecture, and test
strategies are defined, the designers engage in activity 7d by
performing a probabilistic failure analysis to verify that the
design has met the target SIL, RRFr, and reliability
requirements. This requires gathering failure rate data as a function of failure mode for each piece of equipment in the
SF.
Most manufacturers that supply equipment intended for
functional safety applications have a failure modes, effects and
diagnostic analysis (FMEDA) performed on their equipment
[13]. When such data are not available, designers can use
industry failure rate databases [14, 15, 16, 17].
IEC 61508 does not specify how to perform this failure
probability analysis. There is no specific requirement that
fault trees or Markov models be used although Part 6 of IEC
61508 does have example simplified equations. There is only
a statement that industry accepted methods shall be used.
There are, however, requirements that certain important factors be included in the analysis, such as common cause failures [18, 19]
in redundant systems. There is no specific failure rate or
failure mode database in the standard either. So, again the
reliability engineer is only expected to use industry accepted
practices. Specialized analysis tools [20, 21] are available,
some with built-in failure rate databases. There is a
requirement that databases and tools be publicly available.
The results of the probabilistic failure evaluation typically
include a number of safety integrity and availability
measurements. Most importantly, however, the PFDavg and
the safe failure fraction (SFF) are calculated for low demand
mode. The probability of dangerous failure per hour is
calculated for high demand mode. The SIL achieved by the
design is then determined from the tables in IEC 61508.
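Determining the achieved SIL from the computed measure is a band lookup. The sketch below encodes the IEC 61508 target bands: for low demand PFDavg, SIL 1 runs from 10^-2 up to 10^-1 and SIL 4 from 10^-5 up to 10^-4; for the high demand probability of dangerous failure per hour, SIL 1 runs from 10^-6 up to 10^-5 and SIL 4 from 10^-9 up to 10^-8. It deliberately omits the architectural constraint tables, which use SFF and hardware fault tolerance and may cap the SIL that can be claimed. The function names are hypothetical.

```python
# IEC 61508 target failure measure bands, as (exclusive upper bound, SIL).
LOW_DEMAND_PFDAVG = ((1e-4, 4), (1e-3, 3), (1e-2, 2), (1e-1, 1))
HIGH_DEMAND_PFH = ((1e-8, 4), (1e-7, 3), (1e-6, 2), (1e-5, 1))


def _sil_from_bands(value: float, bands) -> int:
    """Return the SIL band containing value; 0 if above the SIL 1 band.

    Values below the SIL 4 band are still reported as SIL 4.
    """
    if value < 0:
        raise ValueError("failure measure cannot be negative")
    for upper, sil in bands:
        if value < upper:
            return sil
    return 0


def sil_from_pfdavg(pfdavg: float) -> int:
    return _sil_from_bands(pfdavg, LOW_DEMAND_PFDAVG)


def sil_from_pfh(pfh: float) -> int:
    return _sil_from_bands(pfh, HIGH_DEMAND_PFH)


print(sil_from_pfdavg(5.3e-4), sil_from_pfdavg(8.8e-3), sil_from_pfh(5e-8))
```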
Figure 14 Continuation of platform separation process example.
Figure 15 - Output of a specialized analysis tool that computed the PFDavg, RRFa, and SIL achieved
by a conceptual design for an SF for the process.
Figures 14 and 15 continue the platform separation
process example. Figure 15 shows the output of a specialized
analysis tool that computed the PFDavg, RRFa, and SIL
achieved by a conceptual design for an SF for the process.
Note that the SFF was also computed (though it is not shown
in Figure 15) and used to determine the SIL permitted by the
architectural constraints (see the box on performance metrics
in Figure 15). Further note that the current design does not
meet the required SIL (see the lower left hand corner of
Figure 15) and that the greatest contributor to PFDavg is the
final element (see the first pie chart in the lower left hand
corner of Figure 15). Clearly, redesign is required, with
special attention to the final element.
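The kind of reading made from Figure 15 (a design falling short, with the final element dominating) can be reproduced in a few lines. The sketch assumes the usual series approximation, in which the SF PFDavg is the sum of the sensor, logic solver and final element PFDavg values when each is small, and RRFa = 1/PFDavg; the subsystem values and the RRFr are hypothetical, not those of Figure 15.

```python
def rrfa(pfdavg: float) -> float:
    """Achieved risk reduction factor."""
    return 1.0 / pfdavg


def contributions(subsystem_pfd: dict[str, float]) -> dict[str, float]:
    """Share of the SF PFDavg contributed by each subsystem."""
    total = sum(subsystem_pfd.values())
    return {name: pfd / total for name, pfd in subsystem_pfd.items()}


# Hypothetical subsystem results for one SF (not the values of Figure 15).
subsystem_pfd = {"sensor": 4e-4, "logic solver": 5e-5, "final element": 6.1e-3}
rrf_required = 1000.0  # hypothetical RRFr

sf_pfd = sum(subsystem_pfd.values())  # series chain: small PFDs approximately add
print(f"PFDavg = {sf_pfd:.2e}, RRFa = {rrfa(sf_pfd):.0f}, meets RRFr: {rrfa(sf_pfd) >= rrf_required}")
shares = contributions(subsystem_pfd)
for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"  {name}: {share:.0%} of PFDavg")
```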
Many initial designs do not meet failure probability
requirements, and the designers have a choice regarding which
changes to make. Designers may choose to:
• Increase the manual proof test frequency for low demand systems. This results in more manual testing over the life of the system. While this decreases the PFDavg, it increases ongoing maintenance cost.
• Choose equipment with higher SIL capability. Such equipment will have lower DU failure rates, which will reduce PFDavg and often increase SFF. However, it will typically increase capital expense.
• Add redundant equipment. Depending on the redundant architecture chosen, the PFDavg will decrease and the system availability may increase. Capital costs and ongoing maintenance costs will both increase.
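Choosing among these alternatives is a trade-off between meeting RRFr and cost. The sketch assumes each candidate has already been through the probabilistic failure analysis (its PFDavg is known) and picks the compliant option with the lowest simple, undiscounted lifecycle cost; the class, the cost model and every number are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DesignOption:
    name: str
    pfdavg: float  # from the probabilistic failure analysis of this option
    capital: float  # one-off cost, hypothetical currency units
    maintenance_per_year: float

    def lifecycle_cost(self, years: float) -> float:
        # Undiscounted: capital plus ongoing maintenance over the life.
        return self.capital + self.maintenance_per_year * years


def cheapest_compliant(options, rrf_required: float, years: float) -> Optional[DesignOption]:
    """Lowest lifecycle cost option whose RRFa = 1/PFDavg meets RRFr, else None."""
    compliant = [o for o in options if 1.0 / o.pfdavg >= rrf_required]
    return min(compliant, key=lambda o: o.lifecycle_cost(years), default=None)


options = [
    DesignOption("more frequent proof tests", pfdavg=2.0e-3, capital=0, maintenance_per_year=12_000),
    DesignOption("higher SIL capability valve", pfdavg=8.0e-4, capital=40_000, maintenance_per_year=3_000),
    DesignOption("redundant 1oo2 valves", pfdavg=4.0e-4, capital=70_000, maintenance_per_year=5_000),
]
best = cheapest_compliant(options, rrf_required=1000, years=15)
print(best.name if best else "no option meets RRFr; iterate the design again")
```

A real comparison would also weigh availability, since the choice of redundant architecture affects it as well as PFDavg.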
The advantage of the IEC performance-based approach is
that, unlike older prescriptive standards that dictated
levels of redundancy and equipment choice, IEC 61508 gives
the designers choices. The disadvantage of this approach,
however, is that designers must have the means to perform the
probabilistic failure analysis and enough knowledge to make
design tradeoffs. There is no cookbook in IEC 61508.
4.3.5 Detail Design through Acceptance Testing
When the optimal conceptual design is complete and
documented, the SrS is typically updated to include the new information about redundancy and test requirements.
Activities 8-10 can commence. Detailed design activities
include much of the normal project engineering that is
performed by integration companies and project engineering
teams. Wiring and piping diagrams are created. The PLC (if
used) is programmed and tested. A plan is created for the SRS
installation and commissioning. This step includes a
comprehensive test to validate that all requirements from the
original SrS have been completely and accurately
implemented. A revalidation plan, which is a subset of the
validation plan, is also completed for all changes. When the
installed system is tested and validated, the SRS is ready to
provide protection when actual operation begins.
4.4 Operation Phase
The operation phase of the system safety lifecycle
includes activities 11-16 and begins with a safety review of
the implemented SRS, which should ensure that all requirements
have been met and that the SRS has been
implemented, installed and commissioned correctly. All
maintenance procedures, management of change procedures,
and test procedures must be available. All training must have
been completed. The operation phase continues with all
needed maintenance and periodic testing. All changes must
feed back through the system safety lifecycle steps to be sure
that safety integrity is maintained. This continues until the
system is decommissioned.
5. RELIABILITY ENGINEERING TECHNIQUES & BEST PRACTICES: PRODUCT LEVEL APPLICATION
While IEC 61508 was written for a complete turnkey system, the more common usage of the standard is the design
and certification of equipment and components. In this
context the two fundamental concepts still completely apply;
however, some system level requirements no longer apply.
The product safety lifecycle process requirements of IEC
61508 are intended to ensure a sufficient level of safety
integrity against systematic faults. In effect the process
should reduce design errors. The process requirements are
detailed in Parts 2 and 3 of IEC 61508, and they are extensive.
However, a study of the standard should show any
professional that this material is not radical; it consists of
classic quality and software engineering techniques [22]
that have evolved over decades.
The level of detail and rigor varies with SIL Capability
rating. A SIL 1 process does not require as much procedure
and documentation as does a SIL 2 process. A SIL 3 process
has very high rigor with more methods and more
documentation requirements. A SIL 3 capable process has
been compared to somewhere between Capability Maturity
Model Integration (CMMI) Level 3 and Level 4 [23]. A SIL 4
process has the highest level of process requirements
including the use of formal methods. Most product
certifications per IEC 61508 have been performed to a SIL 3
capability level as many practitioners consider SIL 4 to be
impractical.
IEC 61508 treats products with documented field
experience differently than newly developed products. If a
product has a sufficient number of operational hours in the
field, this is considered partial evidence of systematic
integrity. Therefore, in these cases, certain documentation and
process steps are not required.
5.1 Analysis Phase
Since products may be used in many diverse applications,
the specific hazards of all possible processes cannot be
analyzed. Therefore system level risk analysis now becomes,
at the product application level, a market requirement. A
product must be specified to be designed to a particular SIL
level. That product can then only be used in a system at that
SIL level or lower. The SIL capability requirement is one of
the product market requirements. All safety requirements for
a product development are contained in the product SrS. This
document may be separate or part of a general product
requirements document.
5.2 Realization Phase
IEC 61508 requires a documented new product
development process with over 1000 specific requirements for
that process. Example methods are suggested and alternatives
are permissible with justification. This amount of detail and
flexibility has helped increase the acceptance level of the
standard. Because many alternatives are available, an
example process is used here to show the general concepts.
5.2.1 Requirements Review and Acceptance
The example process begins with a review of the SrS in
order to make certain that the designers understand the requirements. A concept system is designed and the design is
verified against the requirements by performing a traditional
design failure modes and effects analysis (FMEA) [24]. If
design issues are identified, new requirements are added to the
SrS and the system design is modified. Typically, after a
number of iterations, the system design shows no major
design flaws. IEC 61508 requires that at least a draft
validation concept document be created at this point in the
process. This is done primarily to show that the requirements
of the SrS are testable. When the requirements have been
shown to be understandable, sufficient and testable, they are
then allocated to various specific hardware or software
implementations.
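The two gates in this step, showing that each SrS requirement is testable (traced to at least one test in the draft validation concept) and then allocating it to hardware or software, can be checked mechanically once requirements are held in structured form. The sketch is a hypothetical illustration: requirement IDs, texts, test IDs and field names are invented, and IEC 61508 does not prescribe this format.

```python
from dataclasses import dataclass, field

ALLOCATION_TARGETS = {"hardware", "software"}


@dataclass
class Requirement:
    req_id: str
    text: str
    validation_tests: list[str] = field(default_factory=list)
    allocated_to: set[str] = field(default_factory=set)


def review(requirements: list[Requirement]) -> list[str]:
    """Return review findings; an empty list means every requirement is
    traced to a validation test and allocated to hardware and/or software."""
    findings = []
    for req in requirements:
        if not req.validation_tests:
            findings.append(f"{req.req_id}: no validation test, testability not shown")
        if not req.allocated_to:
            findings.append(f"{req.req_id}: not allocated to hardware or software")
        elif not req.allocated_to <= ALLOCATION_TARGETS:
            unknown = sorted(req.allocated_to - ALLOCATION_TARGETS)
            findings.append(f"{req.req_id}: unknown allocation {unknown}")
    return findings


srs = [
    Requirement("SRS-001", "De-energize trip output when input exceeds setpoint",
                ["VT-010"], {"hardware", "software"}),
    Requirement("SRS-002", "Detect a stuck analog input", [], {"software"}),
]
for finding in review(srs):
    print(finding)
```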
5.2.2 Hardware Design Process
The hardware design process requirements from IEC
61508 are primarily common sense quality issues. All design
tools must be qualified and judged fit for use. This typically
means that designers have a good understanding of how each
tool works including limitations and bugs. Today, most
tools for mechanical and electronic design and analysis meet
the SIL 3 requirements of IEC 61508. However, design teams
must be careful to re-evaluate any new release of a design tool
and these evaluations must be documented.
5.2.3 Software Design Process
Most of the process requirements of IEC 61508 are
software process requirements. An IEC 61508 example
software process starts with software safety requirements.
They are reviewed and if understandable, a prototype design is
performed. Several design verification methods are suggested,
but the most common is the software FMEA, also known as a
software hazard and operability study (HAZOP). When
software safety requirements are understandable, sufficient,
and testable, the software architecture design is complete.
Software design tools must be qualified and justified for
use under IEC 61508. Again, the key requirement is that those
using a tool understand how the tool works and its limitations.
Tool justification is typically done by a combination of testing
and experience. As with all design tools, any new revision of
a compiler or test tool must be completely evaluated, and
these evaluations must be documented.
Detail design and code implementation have a set of normal quality requirements. There are statements in IEC
61508 that require, for example, that "The source code shall be
readable, understandable and testable." Many ask how this
can be proven to a third party auditor. A good code review
process can solve the problem. Who better to judge the
quality of the source code than those who must understand it
and make future changes to it? In addition, there are specific
language requirements. For any software language that is not
completely and unambiguously defined, a coding standard is
required to restrict the language to unambiguously defined
features. Therefore, effectively, a coding standard must be
created and actually used. Language constructs that are prone
to error should be banned. Language constructs that can be compiled differently by different compilers must be banned or
only one, completely understood compiler can be used.
Strong data typing is required. Static source code analyzers
are strongly recommended.
Documented module/unit testing of software is required.
Even module testing must be planned and executed, with
documented test results. A fault tracking system with
documented resolution of problems is required. Documented
software integration testing is required.
5.2.4 Integration Testing
Hardware and software integration test planning is
required with test results recorded indicating a pass/fail result.
A fault tracking system with documented resolution of
problems is required with version control performed on any
design revisions done from this point forward in the process.
5.2.5 Failure Modes Effects and Diagnostic Analysis
In order to support probabilistic analysis at the system
level for each set of equipment used in an SF, the failure rates
for each failure mode must be estimated and published. If
extensive accurate field failure records are available, they may
be analyzed and used to determine the failure rates for each
mode. However, realistically this never happens. Many
failure records are missing important information. Most field failure recording systems (outside of NASA and nuclear
facilities) are not complete. Therefore the FMEDA technique
is used. In an FMEDA, all components of a product are
considered. The failure rate and failure modes of each
component are translated into product level failure rates
primarily by summing failure rates of individual components
for each failure mode.
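The FMEDA roll-up described above can be sketched as follows: each component's failure rate is split across its failure modes, each mode is classified by its effect on the product as safe or dangerous, a diagnostic coverage splits it into detected and undetected parts, and the product-level rates are the sums per category. Every component, rate, mode split and coverage below is hypothetical; a real FMEDA [13] draws rates from a component database and effects from analysis of the actual design.

```python
# One row per component failure mode (all values hypothetical):
# (component, component failure rate in FIT, mode, share of that rate,
#  effect on the product, diagnostic coverage of the mode)
FMEDA_ROWS = [
    ("R1 resistor", 2.0, "open", 0.7, "safe", 0.0),
    ("R1 resistor", 2.0, "short", 0.3, "dangerous", 0.9),
    ("U1 op-amp", 10.0, "output stuck high", 0.5, "dangerous", 0.95),
    ("U1 op-amp", 10.0, "output stuck low", 0.5, "safe", 0.95),
    ("K1 relay", 30.0, "contacts welded", 0.2, "dangerous", 0.0),
    ("K1 relay", 30.0, "coil open", 0.8, "safe", 0.0),
]


def fmeda(rows) -> dict[str, float]:
    """Sum mode-level rates into product-level SD, SU, DD and DU rates (FIT)."""
    totals = {"SD": 0.0, "SU": 0.0, "DD": 0.0, "DU": 0.0}
    for _component, rate, _mode, share, effect, coverage in rows:
        if effect not in ("safe", "dangerous"):
            raise ValueError(f"unknown effect {effect!r}")
        lam = rate * share  # failure rate of this specific mode
        prefix = "S" if effect == "safe" else "D"
        totals[prefix + "D"] += lam * coverage  # detected by diagnostics
        totals[prefix + "U"] += lam * (1 - coverage)  # undetected
    return totals


product = fmeda(FMEDA_ROWS)
for category in ("SD", "SU", "DD", "DU"):
    print(f"lambda_{category} = {product[category]:.2f} FIT")
```

Because each component's mode shares sum to one, the four product-level rates add back up to the total component failure rate, which is a useful sanity check on the worksheet.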
5.3 Operation Phase
At the product level, operations do not typically involve
the product manufacturer. However, in IEC 61508, the
product manufacturer has the responsibility to provide all
needed information for safe operation and maintenance of a
product. This includes suggested manual proof test
procedures to detect any internal failures of automatic
diagnostics or to detect any hidden FD states, i.e., any DU
states. Any special maintenance instructions must also be
provided to the user of the product.
6. IEC 61508 PRODUCT CERTIFICATION
There are several independent companies performing
third party technical assessments to certify products as IEC
61508 compliant at specific levels of SIL capability. Most
product certification programs are operated per EN45011 [25],
a product certification program quality guide.
A product could receive IEC 61508 certification if the
detailed assessment shows that the product meets all relevant
requirements of IEC 61508. In general this certification is an
indication of high design quality for hardware and software
and high manufacturing quality.
The certification trend is relatively new with few products
achieving this distinction prior to 2006. Starting in 2007, the available certified instrumentation products for the process
industries increased dramatically. See the charts in Figures 16
and 17.
Figure 16 Number of IEC 61508 certified sensors, 1996 to 2008.
Figure 17 Number of IEC 61508 certified mechanical devices.
As an example, a solenoid valve, which obviously does
not have any software, achieved certification with SIL 3
capability when the manufacturer:
• Demonstrated a design process that met IEC 61508 SIL 3 requirements
• Had an FMEDA performed on the product, resulting in failure rates per failure mode as well as useful life estimates
• Produced a safety manual, a document that contained a set of specific information as required by IEC 61508.
In another example, an operating system supplier received
a SIL 3 capability certification per IEC 61508 because the
supplier:
• Demonstrated a software development process that met SIL 3 requirements
• Added several features to the operating system, including scheduling timeout failure detection and task memory protection, so that this software product would make it easier for the user to create IEC 61508 certified products.
Because no hardware is part of the product, no failure rates were produced.
There are hundreds of examples of products with both
hardware and software. In these cases, certification was
achieved by using a combination of good design processes,
FMEDA hardware analysis and user documentation. Figure
18 provides some details for the certification of a product with
both hardware and software components.
Figure 18 Certification details for device with both hardware and software components.
7. CYBER SECURITY
IEC 61508 currently requires a cyber security threat
analysis to be performed for all SFs involving software. If a
credible threat is identified, that threat must be addressed.
For the process industries, the ISA Security Compliance
Institute (ISCI) has completed a set of requirements for
embedded product cyber security [26]. These requirements
are patterned after IEC 61508 and include:
• Specific design reviews in which the ability to withstand a cyber attack is the objective
• A design process audit with requirements very similar to those of IEC 61508 for software quality
• Actual network attack testing, called network robustness testing, which involves stress conditions that may originate with external hackers as well as internal failure of equipment on a network.
Most IEC 61508 users are addressing cyber security via
current ISCI requirements.
8. CONCLUSIONS
There are many conventional reliability engineering
techniques and best practices that can be applied to the
problem of achieving functional safety. Safety standards such
as IEC 61508 provide structured frameworks for selecting
from among all techniques and practices those appropriately
suited to different phases of the safety lifecycle.
The IEC 61508 functional safety standard has been in
existence for ten years now. During that time it has found
wide acceptance in many industries. A common usage is the
third party certification of products to be used in safety critical
systems. It is also used for system level design.
The standard has strong requirements for engineering
processes to defend against systematic faults. It also
utilizes probabilistic failure calculations for the equipment set
used in each safety function to show sufficient protection
against random faults.
This performance-based approach has allowed the
standard to remain relevant even with the rapid advances of
new technologies. The performance-based approach has
allowed innovation in safety designs for both products and
systems.
It is a complicated, detailed standard but allows justified
alternatives to the many methods, techniques, and practices
presented as examples. Many industry specific standards have
been derived from IEC 61508 showing its value. IEC 61508
is having a major impact on the field of reliability engineering.
9. REFERENCES
1. IEC 61508, Functional Safety of Electrical/Electronic/Programmable Electronic Safety-Related Systems, Geneva, Switzerland, 2000.
2. IEC 61511, Application of Safety Instrumented Systems for the Process Industries, Geneva, Switzerland, 2003.
3. IEC 62061, Safety of Machinery - Functional Safety of Safety-Related Electrical, Electronic and Programmable Electronic Control Systems, Geneva, Switzerland, 2005.
4. IEC 61513, Nuclear Power Plants - Instrumentation and Control for Systems Important to Safety - General Requirements for Systems, Geneva, Switzerland, 2001.
5. IEC 61800-5-2, Adjustable Speed Electrical Power Drive Systems, Part 5-2: Safety Requirements - Functional, Geneva, Switzerland, 2007.
6. Bishop, P. G. and Bloomfield, R. E., "A Methodology for Safety Case Development," Proc. 6th Safety-Critical Systems Symposium, Birmingham, U.K., Feb. 1998.
7. Defence Standard 00-55, Parts 1 and 2, Issue 2, U.K. Ministry of Defence, Aug. 1997.
8. exida Safety Case Database Users Manual, exida, Sellersville, PA, 2002.
9. The CASS Guide to Functional Safety Capability Assessment, The CASS Scheme Ltd., U.K., Apr. 2000.
10. Guidelines for Hazard Evaluation Procedures, AIChE Center for Chemical Process Safety, 1992.
11. Marszal, E. and Scharpf, E., Safety Integrity Level Selection, ISA, Research Triangle Park, NC, 2003.
12. Goble, W. M., Control System Safety Evaluation and Reliability, 3rd Ed., ISA, Research Triangle Park, NC, 2010.
13. Goble, W. M. and Brombacher, A. C., "Using a Failure Modes, Effects and Diagnostic Analysis (FMEDA) to Measure Diagnostic Coverage in Programmable Electronic Systems," Reliability Engineering and System Safety, Vol. 66, No. 2, Nov. 1999, pp. 145-148.
14. Telcordia SR-332, Issue 3, Reliability Prediction Procedure for Electronic Equipment, Jan. 2011.
15. Handbook of 217Plus Reliability Prediction Models, The Reliability Information Analysis Center, 2006.
16. OREDA-97, Offshore Reliability Data, DNV Industry, Hovik, Norway, 1997.
17. Safety Equipment Reliability Handbook, exida, Sellersville, PA, 2003.
18. Dhillon, B. S. and Rayapati, S. N., "Common-Cause Failures in Repairable Systems," 1988 Proc. Ann. Reliability and Maintainability Symp., Jan. 1988, pp. 283-289.
19. Hokstad, P. and Bodsberg, L., "Reliability Model for Computerized Safety Systems," 1989 Proc. Ann. Reliability and Maintainability Symp., Jan. 1989, pp. 435-440.
20. exSILentia Users Manual, exida, Sellersville, PA, 2008.
21. Industrie-Automatisierung, SILence Handbuch, HIMA Paul Hildebrandt GmbH, Brühl, Germany, 2003.
22. Pressman, R., Software Engineering: A Practitioner's Approach, McGraw-Hill, New York, NY, 2005.
23. Ahern, D. M., Clouse, A. and Turner, R., CMMI Distilled: A Practical Introduction to Integrated Process Improvement, Addison-Wesley, New York, NY, 2004.
24. McDermott, R. E., Mikulak, R. J. and Beauregard, M. R., The Basics of FMEA, Productivity, Inc., Portland, OR, 1996.
25. EN 45011, ISO/IEC Guide 65, General Requirements for Bodies Operating Product Certification Systems, Geneva, Switzerland, 1996.
26. ISASecure Embedded Device Security Assurance Certification, www.isasecure.org.