Fault-tree analysis for system design, development, modification, and verification



IEEE TRANSACTIONS ON RELIABILITY, VOL. 39, NO. 1, 1990 APRIL 87

Fault-Tree Analysis for System Design, Development, Modification, and Verification

Robert T. Hessian Jr.

Barbara B. Salter

Edwin F. Goodwin

Stone & Webster Engineering Corp., Cherry Hill

Stone & Webster Engineering Corp., Cherry Hill

Stone & Webster Engineering Corp., Cherry Hill

Key Words - Design analysis, Reliability enhancement, Availability assessment

Reader Aids -
Purpose: Case history, Tutorial
Special math needed for explanations: Boolean algebra
Special math needed to use results: Same
Results useful to: System engineers, reliability analysts, plant managers, and training supervisors

Summary & Conclusions - A methodology has been generated which uses fault-tree analysis (FTA) techniques to assess the weaknesses of a new chemical/process design at any time during system development. FTA provides a cost-effective means of improving or verifying the reliability and efficiency of chemical/process design. The methodology evaluates the consequences of conceivable failure to indicate where improvements are justified. FTA techniques were used to model the failure modes of an existing control-room heating, ventilation, and air-conditioning (HVAC) system of a large production facility. Application of this logic-based design methodology is treated in this paper.

The fault-tree reduction revealed 129 single-, 434 double-, and 442 triple-failure combinations, any of which could cause system failure. Single-failures, and double-failures consisting of an equipment malfunction and an operator error, were targeted for design and/or procedural modifications. These modifications were then incorporated into the operating system design to enhance system availability. In an iterative fashion, FTA techniques were reapplied to the modified design and used to verify the adequacy of proposed revisions prior to implementation. This resulted in a thorough review of system vulnerabilities and a clear understanding of how to correct them.

The logic-based methodology applies to any electrical/mechanical system and is especially useful for complex systems or those with strong dependence on support systems. Benefits from the method stem from -

• Awareness of the type & level of human involvement required by the design.
• The ability to determine the level of risk which a design creates or is capable of mitigating.
• Training of operators by revealing which operator actions, coupled with specific equipment failures, could be catastrophic or mitigative.
• Allowing prioritization of relevant maintenance tasks or modified design implementation.

INTRODUCTION

System design is often based on traditional engineering analysis, experience, and judgment. With rapid technology evolution and increasing system complexity, the additional use of more comprehensive analytic techniques in the development of safe, efficient plant designs is appropriate. Recently, the management of a production facility was questioned by a regulatory authority regarding the adequacy of the control room heating, ventilating, air-conditioning (HVAC) system. The system operating history was respectable; however, the system had been in continuous service for approximately 20 years, and the system design (see figure 1) had provided only minimal backup should a problem arise. It was decided to analyze the system reliability, focusing on components critical to continuous operation. A major criterion was that the methodology be flexible enough to incorporate and assess any modifications that resulted from the analysis. Thus, logic-based methodology was chosen to assess the design.

Successful application of the logic-based methodology described in this paper is predicated upon the following key events:

• Availability of engineers who are knowledgeable in system design and operation and instructed in the use of fault-tree analysis (FTA) techniques and the principles of Boolean algebra.
• Availability of design documents such as piping and instrumentation diagrams, control logic diagrams, and electrical 1-line diagrams. The pertinence of these documents depends upon the phase of plant life during which the analysis is being performed.
• Creation of a baseline logic model of the facility and/or its parts. This provides a basis to assess design changes, during the initial/conceptual development phases, on an existing system or process.
• Continual updating of the logic model. These updates test the impact of proposed changes on system/process failure, as well as add approved changes to provide a revised baseline to which future changes can be compared.

This logic-based methodology is the application of FTA techniques in conjunction with computer-graphics software. If the techniques are carefully applied, the result is a rigorous review or verification of existing plant designs and/or proposed modifications. Prior to using this or any other logic-based methodology to determine how an undesired event can occur, an overall hazard and operability [1] study is typically performed to identify what possible undesired events should be investigated.

0018-9529/90/1000-0087$01.00 ©1990 IEEE


[Figure 1 schematic not reproduced. Legible labels: intake; exhaust; heating steam from auxiliary boiler; new packaged room air-conditioning unit; normal intake/exhaust; to new unit; compressors; recirculation control damper; damper assembly.]

Figure 1. Control-Room HVAC Modifications

OBJECTIVES

The following methodology objectives relate to qualitative assessment of the failure modes of a proposed design or design modifications:

• Effect on overall performance and reliability.
• Prioritization of design implementation or testing & maintenance requirements.
• Insight into potentially sensitive human actions created as a result of the proposed design.
• Determination of operator-training requirements.

As the following discussion indicates, rigorous application of the logic-based methodology and these objectives enables the creation of a plant model complete with component failures, operator actions, and pertinent system design & operating data.

PROCEDURES

FTA [2] is a logic-based methodology that can be used to model the relationships between system components as a result of a postulated undesired event. FTA was used for this study.

Review of the control room (CR) HVAC design indicated that heating or cooling were crucial aspects of continued control-room habitability and safe plant operation. Consequently, CR HVAC Unavailability was selected as the top event of the fault tree.

[Figure 1 notes, partly legible: 1, contains electric heater; 2, chiller; 3, filtration.]

With the undesired event determined, the CR HVAC support systems were readily identified. Heating and cooling requirements for the CR HVAC system were supplied by auxiliary boiler and chilled-water systems, respectively. Air distribution was by compressed-air systems for modulating or isolation-type pneumatic dampers. Thus, the fault tree model for CR HVAC unavailable included these support functions. The complexity of a system model is a function of its dependence on other plant systems. A compressed-air system which requires support only from a power supply should produce a correspondingly simple fault tree model. However, a fault tree model whose top event is Control Room Uninhabitable would be very complex due to the interactions of the CR HVAC with compressed air, chilled water, and component cooling-water systems, as well as the power supply for each of these systems.
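The structure described above, a top event fed by support functions, can be sketched as a small gate dictionary. This is only an illustration under stated assumptions: the event names, the choice of OR gates, and the layout of the tree are hypothetical and are not taken from the facility model.

```python
# Hypothetical, heavily simplified tree; every event name is illustrative only.
# Each intermediate event maps to (gate type, list of inputs).
TREE = {
    "CR_HVAC_UNAVAILABLE": ("OR", ["LOSS_OF_HEATING", "LOSS_OF_COOLING",
                                   "LOSS_OF_AIR_DISTRIBUTION"]),
    "LOSS_OF_HEATING": ("OR", ["AUX_BOILER_STEAM_LOST", "HEATING_COIL_FAILS"]),
    "LOSS_OF_COOLING": ("OR", ["CHILLED_WATER_LOST", "COOLING_COIL_FAILS"]),
    "LOSS_OF_AIR_DISTRIBUTION": ("OR", ["FAN_FAILS", "DAMPERS_FAIL_CLOSED"]),
    "DAMPERS_FAIL_CLOSED": ("OR", ["COMPRESSED_AIR_LOST", "DAMPER_MECH_FAILURE"]),
}


def occurs(event, failed, tree=TREE):
    """Return True if `event` occurs given the set of failed basic events."""
    if event not in tree:  # basic event: nothing is modeled below it
        return event in failed
    gate, inputs = tree[event]
    results = (occurs(child, failed, tree) for child in inputs)
    return any(results) if gate == "OR" else all(results)


def basic_events(event, tree=TREE):
    """Collect the basic events that can contribute to `event`."""
    if event not in tree:
        return {event}
    found = set()
    for child in tree[event][1]:
        found |= basic_events(child, tree)
    return found
```

In this sketch, `occurs("CR_HVAC_UNAVAILABLE", {"COMPRESSED_AIR_LOST"})` is true because the loss of a support system propagates to the top event. Each additional support system adds branches and basic events, which is the sense in which model complexity follows system dependence.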

Support systems for this analysis were modeled only to the first major mechanical or electrical component, or subcomponent, which directly interfaced with the CR HVAC system (eg, pressure-regulating valve or power source dedicated to a component). It was not necessary to continue the model further since the directly interfacing components generally govern the availability of the CR HVAC interfacing systems. For example, plant power usually comes from more than one source. However, individual plant components are typically connected to the plant power supply via a single circuit breaker. Therefore, the single circuit breaker is the only means by which a component is powered, and thus is the controlling factor in power availability.

The SETS (Set Equation Transformation System) program [3] was chosen to reduce the Boolean equations to minimal cut-set form. The minimal cut-sets generated from the fault tree model reduction were not quantitatively analyzed, ie, neither the probabilities of the individual cut sets nor the probability of the top event was evaluated.

The initial reduced model of the CR HVAC system yielded more than 1000 cut sets, of which 129 were single-failure, 434 double-failure, and 442 triple-failure combinations. Single-failures, and double-failures consisting of an equipment failure in conjunction with an operator error, were targeted for corrective action.
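Reduction to minimal cut sets and the count by order can be illustrated with a generic top-down expansion followed by absorption of supersets. This is a minimal sketch and not the SETS program; the tree and its event names are hypothetical.

```python
from collections import Counter
from itertools import product

# Hypothetical tree, for illustration only.
TREE = {
    "TOP": ("OR", ["FAN_FAILS", "NO_POWER", "NO_COOLING"]),
    "NO_POWER": ("AND", ["PRIMARY_BREAKER_FAILS",
                         "OPERATOR_FAILS_TO_CLOSE_SECONDARY"]),
    "NO_COOLING": ("AND", ["CHILLER_A_FAILS", "CHILLER_B_SIDE"]),
    "CHILLER_B_SIDE": ("OR", ["CHILLER_B_FAILS", "FAN_FAILS"]),
}


def cut_sets(event, tree):
    """Expand an event into cut sets (not yet minimal)."""
    if event not in tree:  # basic event
        return {frozenset([event])}
    gate, inputs = tree[event]
    child_sets = [cut_sets(child, tree) for child in inputs]
    if gate == "OR":  # any input alone is sufficient
        return set().union(*child_sets)
    # AND gate: one cut set from every input must occur together
    return {frozenset().union(*combo) for combo in product(*child_sets)}


def minimize(sets):
    """Absorption: drop any cut set that strictly contains another one."""
    return {s for s in sets if not any(other < s for other in sets)}


def minimal_cut_sets(top, tree):
    return minimize(cut_sets(top, tree))


def count_by_order(mcs):
    """Number of minimal cut sets for each size (1 = single failure)."""
    return Counter(len(s) for s in mcs)
```

For the tree above, the combination of chiller A and the fan is absorbed by the single failure of the fan, leaving one single-failure and two double-failure cut sets. Exhaustive expansion of this kind grows quickly with tree size, so it is suitable only for small examples.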

The corrective action consists of

• Reviewing the list of targeted minimal cut-sets against the original-system fault tree
• Locating the cause of the undesired cut sets
• Eliminating each.

List Review

The model is reviewed to determine if the orientation of intrasystem components is the cause of any of the undesired cut sets. For example, improper valve configuration can lead to unreliability; changing a valve position from normally closed to normally open can increase system unreliability markedly.

Cause Location

Following verification of proper system configuration, the analyst determines whether any undesired cut sets can be eliminated by appropriate operator action(s) and/or administrative control(s). Examples of such controls are procedures which include signatures, times, and dates of completed actions, or charging one person with control of keyed locks or valves with a signout system to track usage.

Cause Elimination

All remaining undesired cut sets can be eliminated by design modification incorporating alternate/redundant components. These corrections ranged in complexity from new administrative procedures to redundant control circuits or additional system interlocks.

Figure 2 is an excerpt from the facility fault tree of the CR HVAC system and shows single failure SFMECH, loss of the system fan. The original design contained only one fan, with no backup, and was powered from load center 1B1. The load center was fed from a 4-kV bus via a stepdown transformer.

[Figure 2 fault-tree graphic not reproduced. Legible event labels: insufficient flow from SF-1-15; loss of SF-1-15 control power; SF-1-15 relay F not energized; relay F coil fails; relay F energizing circuit not established; two-way damper fails, blocking air from SF-1-15; mechanical failure of SF-1-15; SF-1-15 trips due to motor overload. Legible event identifiers: RLYFCONT, RLYFCOIL, SFMOTOVL, P2A2WD15, SFMECH.]

[This excerpt shows SFMECH reference to loss of system fan.]

Figure 2. CR HVAC Fault Tree for Single-Failure Analysis

The analyst readily determined that a redundant fan would eliminate single-failure SFMECH. However, the real value of the fault tree was the insight as to how the new fan should be incorporated into the system. The fault tree showed how the original fan was controlled and its source of power, load center 1B1. This information was crucial to avoid the addition of a redundant component which would have the same common-cause failures as the original component. While it is acceptable for the new fan to use the same electric power design as the original, it is unacceptable to connect the new fan to the same load center. This philosophy applies to redesign for control circuits and pneumatic supplies for other system components, such as dampers and sensors. The fault tree can be used to guide the modified configuration of the system and support systems (eg, two fans).
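The common-cause point can be made concrete with a small sketch that compares two ways of adding a redundant fan. The tree layout and event names are assumptions made for illustration; in particular, `LOAD_CENTER_X_LOST` stands for a hypothetical second load center that the paper does not name.

```python
from itertools import product


def minimal_cut_sets(event, tree):
    """Top-down expansion with absorption (generic sketch, not SETS)."""
    def expand(node):
        if node not in tree:  # basic event
            return {frozenset([node])}
        gate, inputs = tree[node]
        parts = [expand(child) for child in inputs]
        if gate == "OR":
            return set().union(*parts)
        return {frozenset().union(*combo) for combo in product(*parts)}

    sets = expand(event)
    return {s for s in sets if not any(other < s for other in sets)}


def two_fan_tree(second_fan_power_loss):
    """Two redundant fans; air flow is lost only if both fans are down."""
    return {
        "NO_AIR_FLOW": ("AND", ["FAN_1_DOWN", "FAN_2_DOWN"]),
        "FAN_1_DOWN": ("OR", ["FAN_1_MECH", "LOAD_CENTER_1B1_LOST"]),
        "FAN_2_DOWN": ("OR", ["FAN_2_MECH", second_fan_power_loss]),
    }


# New fan fed from the same load center as the original fan.
shared = minimal_cut_sets("NO_AIR_FLOW", two_fan_tree("LOAD_CENTER_1B1_LOST"))
# New fan fed from a different (hypothetical) load center.
separate = minimal_cut_sets("NO_AIR_FLOW", two_fan_tree("LOAD_CENTER_X_LOST"))
```

With the shared supply, loss of load center 1B1 remains a single-failure cut set even though the fan itself is now redundant. With separate supplies, every minimal cut set in this sketch has two contributors.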

Now with design modifications added, or administrative controls recognized, the updated fault tree is more important than the original. Through the fault tree, the analyst has identified areas of concern in the original design and has provided data for incorporation of alternate/redundant components as required. Following incorporation of design changes into the fault tree, the fault tree is again reduced to ensure that all previously identified undesired cut sets have been eliminated and that new ones have not been created. That is, the proposed modifications and their incorporation are verified.
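The verification step, re-reducing the modified tree and comparing the result with the original, might be sketched as a simple set comparison. The function name, the `max_order` threshold, and the cut sets below are hypothetical and serve only to show the bookkeeping.

```python
from collections import Counter


def cs(*names):
    """Build one cut set from basic-event names."""
    return frozenset(names)


def verify_modification(before, after, targeted, max_order=1):
    """Compare minimal cut sets before and after a design change.

    `max_order` is an assumed threshold: new cut sets of this size or
    smaller are flagged for review.
    """
    before, after, targeted = set(before), set(after), set(targeted)
    return {
        "still_present": targeted & after,
        "new_low_order": {s for s in after - before if len(s) <= max_order},
        "profile_before": Counter(len(s) for s in before),
        "profile_after": Counter(len(s) for s in after),
    }


# Hypothetical cut sets before and after adding a separately powered fan.
before = [cs("FAN_1_MECH"), cs("LC_1B1_LOST"), cs("CHILLER_A", "CHILLER_B")]
after = [
    cs("FAN_1_MECH", "FAN_2_MECH"),
    cs("FAN_1_MECH", "LC_X_LOST"),
    cs("LC_1B1_LOST", "FAN_2_MECH"),
    cs("LC_1B1_LOST", "LC_X_LOST"),
    cs("CHILLER_A", "CHILLER_B"),
]
targeted = [cs("FAN_1_MECH"), cs("LC_1B1_LOST")]

report = verify_modification(before, after, targeted)
```

An empty `still_present` and an empty `new_low_order` indicate that the targeted cut sets are gone and that no new single failures were introduced. The two profiles show how the cut sets shifted toward higher order.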

The FTA is independent of any quantitative assessment and its associated uncertainty. The following two concerns are not issues when employing this qualitative technique:

• Assignment of equipment failure rates, the source of that data, their statistical variability, and the use of generic industry data at a specific plant
• Calculations of operator error rates.

This methodology is designed to provide information on how a system fails, rather than how often it fails. When the failure modes of a system are accurately identified and properly corrected, and the corrections are verified, then system reliability is improved. Generally, even without assigning a numerical value to that reliability, it is increased by creating a system design with minimal cut sets that have only multiple contributors. For example, if an initial analysis reveals 50 single-failures, 100 double-failures, and 200 triple-failures, and after reconfiguration shows 0 single-failures, 50 double-failures, and 250 triple-failures, then system reliability has very likely been improved, and a numerical estimate of the degree of improvement is not required.

The goal for the analysis of the production facility was to determine the adequacy of the CR HVAC system by identifying those components critical to continued operation of the system. The fault tree was reduced qualitatively in order to find single failures and thus identify critical components. The review was assessed qualitatively by examining the size of each cut set which can cause the undesired event. As the size of a cut set increases, the likelihood that all of its contributors fail together generally decreases. In this particular analysis, the design assessment was focused on control-room habitability.

USE OF RESULTS

Prioritization of Testing and Maintenance Scheduling

Minimal cut sets have various uses in system analysis. One important application is the prioritization of maintenance activities. When a fault tree is reduced to its minimal cut-set form, components are grouped in combinations capable of causing the top event. Some components appear in minimal cut sets more than others. Thus, maintenance activities for components in a system can be prioritized according to how frequently a component appears in the various minimal cut sets.

Figure 3. Effect of Operator Error on Loss of Pump 1-1
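The prioritization rule described in this section, ranking components by how often they appear in the minimal cut sets, can be sketched in a few lines. The cut sets and component names below are hypothetical.

```python
from collections import Counter


def rank_components(minimal_cut_sets):
    """Rank components by the number of minimal cut sets they appear in."""
    counts = Counter(name for cut_set in minimal_cut_sets for name in cut_set)
    # Most frequent first; ties are broken alphabetically for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical minimal cut sets, for illustration only.
example_cut_sets = [
    {"LOAD_CENTER_1B1"},
    {"CHILLER_A", "CHILLER_B"},
    {"CHILLER_A", "CW_PUMP_1_1"},
    {"CW_PUMP_1_1", "STANDBY_PUMP"},
]

ranking = rank_components(example_cut_sets)
```

Here `CHILLER_A` and `CW_PUMP_1_1` each appear in two cut sets and rank first. A pure frequency count treats a single-failure cut set the same as a triple, so an analyst may also wish to look at cut-set size when setting priorities.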


Human Interaction with System Design or Proposed Design Modifications

When constructing a fault tree, human actions should be included. Experience has shown that the operator can be the dominating factor in system failure or unavailability. Figure 3 is an excerpt from the CR HVAC fault tree, and shows the effect of operator error on loss of a cooling-water pump. The ability of the operator to perform a variety of tasks under various conditions must be considered.

An operator action can be described as any human-based intervention capable of correcting, bypassing, aggravating, or mitigating the effects of a failed component. When determining the feasibility of an operator action, human response behavior, performance capability, and operations environment are taken into account.

Operator Training

Operator training is crucial to plant performance and survival. From a logic-based methodology, an operator can learn how his actions are coupled to components by seeing how those actions are included in minimal cut-sets. As an example, CR HVAC unavailable occurs because the electrical bus 1B1 primary breaker fails and the operator fails to close the secondary breaker.

Although this example is simplistic, it points out that the CR HVAC system is unavailable if the operator does not take proper action (viz, connecting the secondary breaker to an alternate power source). A training supervisor can measure an operator’s ability to recover the CR HVAC system by observing the operator’s response to the situation.
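One way to extract training scenarios of this kind from the reduced model is to list every minimal cut set that contains an operator action, together with the equipment failures it is coupled to. The sketch below assumes the analyst can identify which basic events are operator actions; the event names are hypothetical.

```python
def training_scenarios(minimal_cut_sets, operator_events):
    """Pair each operator action with the equipment failures in its cut set."""
    scenarios = []
    for cut_set in minimal_cut_sets:
        actions = sorted(cut_set & operator_events)
        if actions:  # only cut sets that involve the operator
            equipment = sorted(cut_set - operator_events)
            scenarios.append((equipment, actions))
    return sorted(scenarios)


# Hypothetical minimal cut sets, loosely following the bus 1B1 example.
example_cut_sets = [
    frozenset({"BUS_1B1_PRIMARY_BREAKER_FAILS",
               "OPERATOR_FAILS_TO_CLOSE_SECONDARY_BREAKER"}),
    frozenset({"FAN_MECH_FAILURE"}),
]
operator_events = {"OPERATOR_FAILS_TO_CLOSE_SECONDARY_BREAKER"}

scenarios = training_scenarios(example_cut_sets, operator_events)
```

Each resulting pair reads as a drill: given the listed equipment failure, the listed operator action decides whether the top event occurs. Cut sets with no operator contribution, such as the fan failure here, are left out.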

REFERENCES

[1] R. E. Knowlton, An Introduction to Hazard and Operability Studies, Chemetics International Ltd., 1981 February (available from Chemetics).

[2] N. H. Roberts, W. E. Vesely, D. F. Haasl, F. F. Goldberg, Fault Tree Handbook, NUREG-0492, 1981 January, Systems and Reliability Research, Office of Nuclear Regulatory Research, Nuclear Regulatory Commission, Washington DC USA.

[3] R. B. Worrell, D. W. Stack, “A SETS user’s manual for the fault tree analyst”, SAND 77-2051, Sandia Laboratories, Albuquerque, New Mexico USA, 1978 November.

[4] A. D. Swain, H. E. Guttmann, Handbook of Human Reliability Analysis with Emphasis on Nuclear Power Plant Applications, Final Report, SAND 80-0200, Sandia Laboratories, Albuquerque, New Mexico USA, 1983 August.

AUTHORS

R. T. Hessian Jr.; Senior Titled Engineer; Stone & Webster Engineering Corporation; POBox 5200; Cherry Hill, New Jersey 08034 USA.

Robert T. Hessian Jr. received a BE in Mechanical Engineering from Stevens Institute of Technology, Hoboken in 1978. During the past 7 years, he has performed and supervised hazard identification & analysis for power plants and chemical/process facilities, including reliability, fault-tree analysis, probabilistic risk assessment, and off-site consequence analyses. He has developed and conducted training seminars on hazard identification, hazard analysis, and risk-management program development for the chemical/process industry. Mr. Hessian has coauthored 9 technical papers related to hazard analysis and is coauthoring a handbook on the subject. He is a member of the American Society of Mechanical Engineers and the American Society of Safety Engineers, and a member of the Facility Safety Committee.

B. B. Salter, PE; Senior Titled Engineer; Stone & Webster Engineering Corporation; POBox 5200; Cherry Hill, New Jersey 08034 USA.

Barbara B. Salter received a BS in Chemical Engineering from Bucknell University, Lewisburg in 1978. She worked as a system engineer, performing design development and modification from 1978 to 1984. She is actively involved in performing fault-tree analysis, reliability/availability assessments of system designs, and consequence analysis. She has conducted training seminars on fault-tree analysis for the chemical/process industry. Ms. Salter is a member of the American Nuclear Society.

E. F. Goodwin; Principal Engineer; Stone & Webster Engineering Corporation; POBox 5200; Cherry Hill, New Jersey 08034 USA.

Edwin F. Goodwin received a BS in Nuclear Engineering from the University of Lowell, Lowell in 1973. He worked from 1973 to 1985 on engineered safety systems, and analyzed potential consequences associated with various plant designs. As an adjunct to this work, he developed and modified various software packages for accident analysis. He is involved in computer software development and application using computer-aided design systems. Mr. Goodwin is a member of the National Computer Graphics Association.

Manuscript TR88-049 received 1988 February 5; revised 1989 June 7.

IEEE Log Number 32657