System Verification Through Reliability, Availability

This is a preprint of a paper intended for publication in a journal or proceedings. Since changes may be made before publication, this preprint should not be cited or reproduced without permission of the author. This document was prepared as an account of work sponsored by an agency of the United States Government. Neither the United States Government nor any agency thereof, or any of their employees, makes any warranty, expressed or implied, or assumes any legal liability or responsibility for any third party’s use, or the results of such use, of any information, apparatus, product or process disclosed in this report, or represents that its use by such third party would not infringe privately owned rights. The views expressed in this paper are not necessarily those of the United States Government or the sponsoring agency. INL/CON-10-20342 PREPRINT System Verification Through Reliability, Availability, Maintainability (RAM) Analysis & Technology Readiness Levels (TRLs) INCOSE 2011 Emmanuel Ohene Opare, Jr. Charles Park June 2011

System Verification through Reliability, Availability, Maintainability (RAM) Analysis & Technology Readiness Levels (TRLs)

Emmanuel Ohene Opare Jr.
Idaho National Laboratory
208-526-0189
[email protected]

Charles Park
Idaho National Laboratory
208-526-1091
[email protected]

Idaho National Laboratory
2525 N. Fremont Ave.
P.O. Box 1625
Idaho Falls, ID 83415-3780

Copyright © 2010 by Battelle Energy Alliance, LLC. Published and used by INCOSE with permission.

Abstract

The Next Generation Nuclear Plant (NGNP) Project, managed by the Idaho National Laboratory (INL), is authorized by the Energy Policy Act of 2005 to research, develop, design, construct, and operate a prototype fourth generation nuclear reactor to meet the needs of the 21st century. A section of the Act proposes that the NGNP provide heat for process heat applications. As with all large projects developing and deploying new technologies, the NGNP is expected to meet high performance and availability targets relative to current state-of-the-art systems and technology. One requirement for the NGNP is to provide heat for large-scale hydrogen production, and this process heat application is required to be at least 90% available relative to other technologies currently on the market.

To reach this goal, a Reliability, Availability, and Maintainability (RAM) Roadmap was developed that highlights the actions needed to ensure that system development and maturation milestones concurrently meet availability requirements. Integral to the RAM Roadmap was a RAM analytical/simulation tool, which was used to estimate the availability of the deployed system based on the current design configuration and the maturation level of the system.

Introduction

This paper highlights some approaches the hydrogen production system (HPS) project uses to identify and track system vulnerabilities during maturation and to validate system operational requirements with the end user in mind. The objective of the RAM guide and flow chart is to ensure that end-user needs are identified and noted up front during system design, in order to minimize the risk of developing a system that does not meet end-user operational needs. Applying RAM in the early stages of system development helps identify the vulnerabilities within the system that may cause it to fail when deployed, and gives the decision maker and customer critical information for making early design alterations from an economic standpoint.

This paper also introduces the NGNP RAM system, which provides analytical and simulation capabilities specifically designed to facilitate the development and execution of activities that pertain to system operational requirements. RAM analysis is used to determine the feasibility of project plans and to identify potential problems that may affect life-cycle activities and the quality and performance of products. RAM analysis gives systems engineering an added capability to communicate and represent the needs of the end user during system design and development.

Background

The concepts discussed in this paper were used on the HPS. The HPS will utilize a high temperature steam electrolysis (HTSE) system, which uses high temperature heat from a nuclear reactor to split water into hydrogen and oxygen with no consumption of fossil fuels, no production of greenhouse gases, and no other forms of air pollution. High temperature electrolytic water-splitting supported by nuclear process heat and electricity has the potential to produce hydrogen with purity levels of 99.99%. The hydrogen production system is in the conceptual phase of development, and it is during this early design phase that risk mitigation and RAM improvement activities can be most beneficial to the customer. When RAM analysis is performed during the early stages of system development, it minimizes the risk of the system not accomplishing its mission. RAM analysis gives systems engineers a way to articulate, clarify, define, and verify that the most important requirement metrics have been met. Although customer requirements may specify system design features critical to satisfaction, the most important metric to the customer is the operational availability of the system. If the system has all the design features specified by a customer but is mostly unavailable to accomplish its mission, there is a risk of dissatisfying the customer. Therefore, to track RAM improvement as the system matures while mitigating risks, a RAM Roadmap was developed to trace the RAM activities that verify that system operational requirements are being adhered to. This paper discusses the measures being taken to advance system maturation while simultaneously improving system RAM.

RAM Process Development

As a part of the NGNP Project, INL staff developed a project-phased process to perform RAM analyses within the Department of Energy (DOE) environment. A careful review was conducted to identify relevant documents that might contain applicable requirements for RAM analysis at a DOE nuclear site. These documents included the INL Operations and Management contract, various DOE Orders and Guides, and national and international codes and standards. Best business practices were also identified and reviewed. Requirements were then systematically extracted from these documents and compiled. As expected, some of these requirements were duplicative or contradictory. A consolidated and reconciled set of applicable requirements was then developed for the project. Since the time-phased nature of the requirements was complicated, it was desirable to develop a RAM Roadmap that diagrammed the RAM activities and their interrelationships. These actions were then integrated with DOE critical decisions for authorization and funding. In addition, the NGNP Project Technology Readiness Levels (TRLs) were added to guide and correlate technical maturity with the needed RAM activities. RAM analysis methods and tools were highlighted along with key deliverable documents that support effective systems engineering.

Technology Readiness Assessment

The RAM process provides a visual representation of the sequential and interrelated activities needed to mature the HPS. The HPS project uses a Technology Readiness Assessment (TRA) process to determine the technology maturity of critical systems, subsystems, and components on a Technology Development Roadmap. Technology development activities are done concurrently with select RAM activities at each phase of system development. As the technology is studied, tested, and matured through design, its reliability is expected to improve accordingly. This is illustrated in Figure 1, where RAM activities are done concurrently with technology maturation activities. For the HPS, establishing and executing a Reliability, Availability, and Maintainability (RAM) Roadmap ensures that end-user needs are addressed early in system design and development. The RAM process provides a method for verifying and validating that system operational requirements are met at each design phase and TRL.

Figure 1 High Level Technology Maturation Schematic

Quantification of Customer Requirements

One method the HPS project employed to verify that system requirements are adhered to was to translate all loose requirements into quantifiable metrics. The process provides the capability to trace system design requirements through development. Usually in the DOE environment all requirements are of equal weight and value; therefore, all requirements must be addressed. In the case of the HPS, system requirements were too broad to be useful for verification and validation as the system matured. To firm up the requirements, the system was viewed in its future state of operation, and the characteristics of the system in that future state were quantified. This process entailed consolidating requirements with similar themes into measurable metrics, such as RAM, for verification and validation purposes. Grouping these metrics and showing their interconnectedness provides a way to condense, clarify, and articulate to the customer what the desirable target metrics of the system should be. This enables systems engineers to perform verification and validation work as the system matures through TRL space. Figure 2 shows an example of how system requirements may be grouped for iterative quantitative analysis as the system matures in development.

[Figure 1 content recovered from the plot text. Step labels: Assess Technology Maturity; Build Roadmap & Define Path Forward; Advance TRLs & Reduce Risk; Evaluate Technology Development Roadmap & Refine Path Forward; Evaluate RAM Roadmap & Perform RAM Tasks; Advance TRLs & Improve RAM; Reassess Technology Maturity. Panel "Risk vs. Technology Readiness": Normalized Risk Score (0 to 50) against TRL Maturity Score (2 to 8). Panel "RAM vs. Technology Readiness": Availability (0 to 100) against TRL Maturity Score (1 to 9), with trend line y = -0.191x^3 + 1.052x^2 + 18.69x - 21.03, R² = 0.997. The figure also carries the table below.]

NGNP Area | Min System TRL
NGNP | 3
Nuclear Heat Supply System (NHSS) | 4
  Reactor Pressure Vessel | 4
  Reactor Vessel Internals | 4
  Reactor Core and Core Structure | 4
  Fuel Elements | 4
  Reserve Shutdown System | 5
  Reactivity Control System | 4
  Core Conditioning System | 4
  Reactor Cavity Cooling System | 4
Heat Transfer System (HTS) | 3
  Circulators | 5
  Intermediate Heat Exchanger | 3
  Cross Vessel Piping | 4
  High Temperature Valves - Flapper | 6
  High Temperature Valves - Iso, Relief | 4
Power Conversion System (PCS) | 4
  Steam Generator | 4
Balance of Plant (BOP) | 3
  Fuel Handling System - Prismatic | 4
  Fuel Handling System - Pebble Bed | 5
  Instrumentation & Control | 3
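The trend line printed with the "RAM vs. Technology Readiness" panel of Figure 1 can be evaluated directly to see where the fitted availability crosses the 90% requirement. The sketch below is illustrative only: it assumes that "x3" and "x2" in the plot text are powers of x, and the function names are hypothetical rather than part of any project tool.

```python
from typing import Optional

# Cubic trend line printed with the Figure 1 availability panel.
# Assumption: "x3" and "x2" in the extracted plot text mean x**3 and x**2.
def availability_trend(trl: float) -> float:
    """Return the fitted availability (percent) for a TRL maturity score."""
    return -0.191 * trl**3 + 1.052 * trl**2 + 18.69 * trl - 21.03

TARGET_AVAILABILITY = 90.0  # percent, the HPS requirement stated in the abstract

def first_trl_meeting_target(target: float = TARGET_AVAILABILITY) -> Optional[int]:
    """Return the lowest integer TRL on the plotted 1-9 axis whose fitted value meets the target."""
    for trl in range(1, 10):
        if availability_trend(trl) >= target:
            return trl
    return None

if __name__ == "__main__":
    for trl in range(1, 10):
        print(f"TRL {trl}: {availability_trend(trl):6.1f}%")
    print("First TRL at or above target:", first_trl_meeting_target())
```

Because this is a curve fit and not a physical model, it is only meaningful inside the plotted range: it falls below zero at TRL 1 and turns downward after TRL 8.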

Figure 2 Inter-relationship of quantifiable metrics

[Figure 2 diagram labels: Affordable Operational Effectiveness; Reliability; Maintainability; Supportability; Availability; Production; Operation; Maintenance; Logistics; Process Efficiency; Capabilities; Functions; Priorities; System Performance; System Effectiveness; Total Ownership Cost; Technical Effectiveness]

As can be seen in Figure 2, system Availability is a function of Reliability, Maintainability, and Supportability (which is itself a function of maintainability). In cases where the customer did not know what the target metrics of the system should be, a benchmark of comparable systems currently operating in industry provided some target metrics for the system under development.

Mitigating Operational Risk through RAM Analysis

With some target operational requirement metrics established for the system, a RAM analysis was performed. RAM refers to three interrelated characteristics of a system (reliability, availability, and maintainability), as shown in Table 1. These metrics are defined as follows:

• Reliability: Reliability is the probability that an item will successfully perform a required function under stated conditions, without failure, for a specified period of time.

• Availability: Availability measures the degree to which an item is in an operable state and can be committed to operate at any point in time. Availability from an end-user perspective is a function of:

o How often system failure occurs

o How quickly failures can be isolated and repaired

o How long logistics support delays contribute to down time

o How often preventative maintenance is performed

• Maintainability: [definition illegible in the source text]
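For reference, these metrics are commonly written as shown below. None of the symbols appear in the paper; the failure rate, MTBF (mean time between failures), MTTR (mean time to repair), MTBM (mean time between maintenance), and MDT (mean down time) are introduced here as assumptions, and the reliability expression further assumes a constant failure rate.

```latex
% Reliability over mission time t, assuming a constant failure rate \lambda
R(t) = e^{-\lambda t}, \qquad \lambda = \frac{1}{\mathrm{MTBF}}

% Inherent availability: failures and corrective repair only
A_i = \frac{\mathrm{MTBF}}{\mathrm{MTBF} + \mathrm{MTTR}}

% Operational availability: all maintenance actions and delays included
A_o = \frac{\mathrm{MTBM}}{\mathrm{MTBM} + \mathrm{MDT}}
```

In the operational form, MDT collects repair time, logistics support delay, and preventive maintenance time, which corresponds to the four end-user factors listed under Availability.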

Table 1 RAM Inter-relationship Matrix

When system RAM improves with system development, system operational risks are reduced accordingly. Because the customer is most interested in the operational availability of the system and in its ability to perform its function without compromising safety standards or posing a threat to the public, these issues have to be addressed throughout system development. Anything that prevents the system from accomplishing its mission becomes an operational risk to the customer. If these operational risks are not mitigated, the potential consequences may include:

• Inability to operate the system due to regulatory standards

• Increase in system operational cost due to poor system operational availability

• Loss of system throughput due to poor system availability caused by frequent system failures

• Loss of customers for the end user due to inability to meet customer demand caused by poor system operational availability

To avoid these consequences, these risks must be mitigated through RAM analysis as the system matures. Though RAM analysis cannot address risks pertaining to system regulatory standards, the RAM analysis and simulation tool employed by the project produces two major outputs: the Failure Criticality Index (FCI) and the Total Ownership Cost (TOC).

The FCI identifies the critical entities within the system that are responsible for the majority of system failures. This information is important for improving system robustness and minimizing the availability challenges of the system when it is constructed and deployed.
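The idea behind the FCI can be sketched as each entity's share of the system failures it caused. The paper does not describe how the project's tool computes the index, so the share-of-failures definition, the function name, and the entity names in the log below are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable

def failure_criticality_index(system_failure_causes: Iterable[str]) -> Dict[str, float]:
    """Return each entity's share (percent) of system failures, largest share first.

    Assumption: the index is the fraction of system failures attributed to an entity.
    """
    counts = Counter(system_failure_causes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {entity: 100.0 * n / total for entity, n in counts.most_common()}

# Hypothetical log: the entity blamed for each simulated system failure.
failure_log = (
    ["electrolysis stack"] * 5
    + ["heat exchanger"] * 3
    + ["feedwater pump"] * 2
)

fci = failure_criticality_index(failure_log)

if __name__ == "__main__":
    for entity, share in fci.items():
        print(f"{entity}: {share:.0f}% of system failures")
```

Ranking entities this way points design effort at the few entities that drive most of the downtime.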

The TOC is another important output from RAM analysis because it helps estimate the cost of the system once it is in operation. Being able to estimate the cost of owning and operating the system gives the end user a way to influence system design and development.

If system vulnerabilities are ignored, project management will realize the risk of designing a system that is operationally unavailable, costly to maintain, and below the expectations of the end user. These risks can be reduced when RAM analysis is employed at the early stages of system development.

If Reliability | And Maintainability | Then Availability
Does not change | Decreases | Decreases
Does not change | Increases | Increases
Increases | Does not change | Increases
Decreases | Does not change | Decreases
Increases | Decreases (long repair time) | Decreases
Decreases | Increases (short repair time) | Increases
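The directions in the Table 1 matrix can be checked numerically with the common inherent availability ratio MTBF / (MTBF + MTTR). That formula is not stated in the paper and is used here as an assumption, with reliability represented by MTBF and maintainability by a shorter MTTR; all hour values are hypothetical.

```python
def inherent_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Assumed model: availability = MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

def direction(new: float, old: float) -> str:
    """Describe how a value moved relative to its baseline."""
    if new > old:
        return "Increases"
    if new < old:
        return "Decreases"
    return "Does not change"

BASE_MTBF, BASE_MTTR = 1000.0, 100.0  # hypothetical baseline, in hours
baseline = inherent_availability(BASE_MTBF, BASE_MTTR)

# (description, new MTBF, new MTTR), one case per matrix row
cases = [
    ("reliability unchanged, maintainability decreases", 1000.0, 200.0),
    ("reliability unchanged, maintainability increases", 1000.0, 50.0),
    ("reliability increases, maintainability unchanged", 2000.0, 100.0),
    ("reliability decreases, maintainability unchanged", 500.0, 100.0),
    ("reliability increases, long repair time", 1200.0, 300.0),
    ("reliability decreases, short repair time", 800.0, 20.0),
]

results = [
    direction(inherent_availability(mtbf, mttr), baseline)
    for _, mtbf, mttr in cases
]

if __name__ == "__main__":
    for (label, _, _), outcome in zip(cases, results):
        print(f"{label}: availability {outcome.lower()}")
```

The last two rows depend on magnitudes: a reliability gain lowers availability only when repair time grows enough to outweigh it, which is what the "long repair time" and "short repair time" qualifiers express.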

For the systems engineer, the purpose of RAM analysis is to verify that these potential operational risks are mitigated before the system is constructed and deployed. RAM analysis identifies the vulnerabilities within the system that prevent it from fulfilling its mission. If these vulnerabilities go unresolved, the consequence of operating the system is very high, thereby increasing its risk. The objective of risk management is to address all technical and non-technical issues that may prevent the system from fulfilling its mission. The risk of a system can be defined by the following equation:

Risk Priority Number = (PE × PC) × C × W    (1)

Where:

PE = Probability of occurrence

PC = Probability that the consequence occurs at the level of severity noted

C = Consequence of occurrence (loss if the event occurs)

W = Weighting factor, a function of the probability of the event times its consequence
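Equation (1) can be evaluated as in the sketch below. The paper does not give scales for C or W, so the input values are hypothetical, and the range check on the two probabilities is an added assumption.

```python
def risk_priority_number(p_event: float, p_consequence: float,
                         consequence: float, weight: float) -> float:
    """Evaluate Equation (1): RPN = (PE x PC) x C x W."""
    for name, p in (("p_event", p_event), ("p_consequence", p_consequence)):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must be a probability between 0 and 1")
    return (p_event * p_consequence) * consequence * weight

# Hypothetical inputs: PE = 0.3, PC = 0.5, C = 4 (unspecified loss scale), W = 2
example_rpn = risk_priority_number(0.3, 0.5, 4.0, 2.0)

if __name__ == "__main__":
    print(f"Risk Priority Number: {example_rpn:.2f}")
```

Lowering either probability through design changes or testing lowers the number proportionally, which is how RAM improvement feeds into the risk score.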

Figure 3

[Figure 3 plot: a normalized RAM curve and a normalized risk curve, on a 0 to 100 scale, plotted against TRL maturation intervals 1 to 10]

As the system advances in TRL space, operational risk is expected to decrease accordingly, as shown in Figure 3.

Technology Readiness Level (TRL) and RAM Integration

The NGNP uses TRLs with a tailored scale of 1 to 10, compared to the standard 1 to 9 scale used by NASA and DOD. The Technology Readiness Assessment process originated with NASA and the U.S. Department of Defense (DOD); it evaluates the deployment readiness of a technology and its readiness to function in an integrated environment. The 10-level scale allows the project to assess readiness for full commercialization following the construction and successful operation of the NGNP.

TRLs provide input to inform NGNP project management of the readiness of a particular technology, component, or system. For TRLs 1-5, assessment typically occurs on an individual technology or component, with a calculated roll-up TRL for the associated area, systems, and subsystems. As the technology or component progresses to further levels of maturity, integrated testing occurs and allows TRL assessments directly against subsystems and systems. The integrated testing or modelling occurs at increasingly larger scales and in increasingly relevant environments, thus achieving realistic availability metrics at higher TRL ratings.
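The text does not say how the roll-up TRL is calculated. The "Min System TRL" column in the Figure 1 table suggests that a parent takes the lowest TRL among its children, and the sketch below applies that assumption to the component values recovered from that table. The dictionary and function names are hypothetical.

```python
from typing import Dict

# Component TRLs as listed in the Figure 1 table.
NGNP_AREAS: Dict[str, Dict[str, int]] = {
    "Nuclear Heat Supply System (NHSS)": {
        "Reactor Pressure Vessel": 4,
        "Reactor Vessel Internals": 4,
        "Reactor Core and Core Structure": 4,
        "Fuel Elements": 4,
        "Reserve Shutdown System": 5,
        "Reactivity Control System": 4,
        "Core Conditioning System": 4,
        "Reactor Cavity Cooling System": 4,
    },
    "Heat Transfer System (HTS)": {
        "Circulators": 5,
        "Intermediate Heat Exchanger": 3,
        "Cross Vessel Piping": 4,
        "High Temperature Valves - Flapper": 6,
        "High Temperature Valves - Iso, Relief": 4,
    },
    "Power Conversion System (PCS)": {"Steam Generator": 4},
    "Balance of Plant (BOP)": {
        "Fuel Handling System - Prismatic": 4,
        "Fuel Handling System - Pebble Bed": 5,
        "Instrumentation & Control": 3,
    },
}

def roll_up(child_trls: Dict[str, int]) -> int:
    """Assumed rule: a parent is only as mature as its least mature child."""
    return min(child_trls.values())

area_trls = {area: roll_up(components) for area, components in NGNP_AREAS.items()}
system_trl = roll_up(area_trls)

if __name__ == "__main__":
    for area, trl in area_trls.items():
        print(f"{area}: TRL {trl}")
    print(f"NGNP: TRL {system_trl}")
```

Under this rule the computed area values (4, 3, 4, 3) and the overall NGNP value of 3 match the figures printed in the table.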

As the system matures through TRL space, RAM metrics are verified to ensure that system maturity equates to RAM improvement. Running RAM analysis in parallel with technology advancement activities ensures that TRL maturation, as it pertains to operational requirements such as system availability, is measured and traced throughout system development. Definitions for the TRLs are shown in Table 2, together with the RAM activities at each TRL advancement.

Table 2. TRL scale and criteria

TRL | Criteria | High Level Concurrent RAM Activity
1 | Basic principles observed and reported. |
2 | Technology concept and/or application formulated. | Create a system description document and gather relevant data for RAM analysis.
3 | Analytical and experimental critical function and/or characteristic proof of concept: lab level for pieces of components. | Define system RAM requirements and develop the RAM program plan.
4 | Lab-scale component validation in lab environment: demonstrate technical feasibility and functionality. Beginning of integration of some interfacing components into sub-assemblies. | Perform a high level RAM analysis to identify system risks and vulnerabilities and recommend improvement methods.
5 | Lab-scale component or sub-assembly validation in relevant environment. Beginning of integration of sub-assemblies into sub-systems. | Perform a high level RAM analysis to identify system risks and vulnerabilities and recommend improvement methods.
6 | Subsystem model or prototypical scale demonstration in relevant environment. | Repeat RAM analysis to mitigate system vulnerabilities and improve system RAM.
7 | Subsystem prototype demonstration in an operational environment. Beginning of integration of subsystems into the complete system. | Address all system vulnerabilities through RAM analysis to improve system RAM before construction and deployment. Verify that system RAM will not be compromised by manufacturing constraints.
8 | Total system completed, tested, and fully demonstrated and validated. | Improve and standardize manufacturing processes to eliminate human and machine errors in the manufacturing process that may compromise system RAM.
9 | Total system used successfully in project operations. | Establish a monitoring and knowledge management system to track system performance.
10 | Commercial scale-up, multiple units. | Repeat RAM activities 8 and 9 and re-implement best practices in manufacturing and system performance tracking.

As system vulnerabilities are minimized or eliminated, either through design changes or through research and development, the system’s operational risks are reduced, which in turn improves the system’s operational availability, as illustrated in Figure 4. The objective of RAM is to reduce system operational risks while increasing system availability as the system matures.

Application of RAM Analysis at Conceptual Design

RAM analysis at the conceptual stage uses various qualitative and quantitative techniques. One of these is RAM simulation, which uses real data to simulate the future state of the system based on current design configurations and component failure characteristics. This approach is iterative and is used to track the progress of the system toward its stated target operational requirements as the system matures at known TRLs.

RAM analysis at the early stages of system development can be challenging, especially when system components are not clearly defined. Nonetheless, at these early stages RAM analysis can be used to verify system adherence to target operational requirements. The current operational availability requirement of the HTSE is 90%, which means that the system, once constructed and deployed, should be operational 90% of the time during its life. When RAM is adopted at the early stages of system development, it drives system design toward an established reliability and operational availability as the system matures.
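What "operational 90% of the time" means in a simulation can be sketched with a small Monte Carlo model of one repairable entity that alternates between running and being repaired. This is not the project's tool or method. Exponentially distributed times to failure and repair, the MTBF and MTTR values, and the ten-year mission length are all assumptions chosen for illustration.

```python
import random

def simulate_availability(mtbf_hours: float, mttr_hours: float,
                          mission_hours: float, runs: int = 200,
                          seed: int = 1) -> float:
    """Estimate the fraction of mission time spent in the operable state."""
    rng = random.Random(seed)  # fixed seed so the estimate is repeatable
    total_up = 0.0
    for _ in range(runs):
        clock = 0.0
        up = 0.0
        while clock < mission_hours:
            # Run until the next failure or the end of the mission.
            time_to_failure = rng.expovariate(1.0 / mtbf_hours)
            up_span = min(time_to_failure, mission_hours - clock)
            up += up_span
            clock += up_span
            if clock >= mission_hours:
                break
            # Repair; the entity is down for this span.
            repair_time = rng.expovariate(1.0 / mttr_hours)
            clock += min(repair_time, mission_hours - clock)
        total_up += up
    return total_up / (runs * mission_hours)

# Hypothetical entity: MTBF 900 h, MTTR 100 h, ten-year mission.
estimate = simulate_availability(900.0, 100.0, mission_hours=87_600.0)
meets_requirement = estimate >= 0.90

if __name__ == "__main__":
    print(f"Simulated availability: {estimate:.3f}")
    print("Meets 90% requirement:", meets_requirement)
```

With these inputs the long-run expectation is 900 / (900 + 100) = 0.90, so the estimate lands close to the requirement and shows how little margin such a design would have.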

One important step prior to performing a RAM analysis through simulation is to create a system description document, which identifies the functions of all the sub-entities of the system and highlights the interfaces between them. This document also captures the anticipated performance metrics of the system’s entities, which are critical inputs for RAM analysis at the conceptual stages of development. The metrics to capture in the system description document for each system entity are its reliability, availability, and maintainability. These metrics must be specified based on the system’s similarity to the source of comparison in order to make the results of the RAM analysis useful. Benefits of the system description document include:

• Provides a clear description of system boundaries and interfaces

• Defines the function of system entities and performance metrics

In the case of the cutting-edge HPS technology, RAM analysis was possible because the various entities within the system were fundamentally similar to technologies commercially available today. This observation holds for many first-of-a-kind (F-O-A-K) systems. RAM analysis can therefore still be done during the early stages of F-O-A-K system development, on the assumption that an F-O-A-K system is built on existing technology whose application is new or whose entities are reassembled in a new way to yield new outcomes. Though this may not be true for all systems, it is true for many systems whose developments are incremental improvements of previous designs and applications.

Based on this notion, subject matter expert (SME) approximations of failure rates for F-O-A-K systems can be used as a starting point for RAM analysis at the conceptual stage of system development. Two areas to consider when approximating the failure rates of these system entities are changes in the operating environment of the new system and any entity within the system whose function underwent adjustments or changes. Knowing about these changes helps SMEs reduce the margin of error in approximating failure rates of new system entities.

The RAM tool used to capture system failure rates and roll them up to the system level is BlockSim by ReliaSoft. Within BlockSim, system failure rates, maintenance schedules, and end-user operational costs are rolled up to the highest level in the system hierarchy to yield a system-level availability metric. This metric is scored on a scale of 0% to 100%. The closer the simulation output is to 100%, the more reliable and available the system. In addition, this output is measured against the operational availability requirement established by the customer. If the simulation result is lower than that requirement, the system as currently designed will not meet customer target metrics when constructed and deployed for operation.
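The roll-up and comparison against the customer requirement can be illustrated in plain Python, without reproducing BlockSim or its interface. The sketch assumes a purely series arrangement, in which any entity failure takes the system down, and steady-state entity availability of MTBF / (MTBF + MTTR). The entity names and hour values are hypothetical.

```python
from math import prod
from typing import Dict, Tuple

REQUIRED_AVAILABILITY = 0.90  # operational availability requirement from the paper

# Hypothetical entities: name -> (MTBF hours, MTTR hours)
entities: Dict[str, Tuple[float, float]] = {
    "steam supply": (4000.0, 40.0),
    "electrolysis stack": (1500.0, 60.0),
    "hydrogen separation": (3000.0, 30.0),
}

def entity_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Assumed steady-state availability of a single entity."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

def series_availability(blocks: Dict[str, Tuple[float, float]]) -> float:
    """Roll up a series system: every entity must be up for the system to be up."""
    return prod(entity_availability(mtbf, mttr) for mtbf, mttr in blocks.values())

system_availability = series_availability(entities)
meets_target = system_availability >= REQUIRED_AVAILABILITY

if __name__ == "__main__":
    print(f"System availability: {system_availability:.1%}")
    print("Meets customer requirement:", meets_target)
```

A series product is always lower than the weakest entity's availability, so adding entities without redundancy pushes the system result down toward the requirement.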

Therefore, as the system matures at established TRLs, RAM analysis provides a way to verify and validate the current system at each TRL against the established operational requirements of the customer. Because simulations represent the ideal future state of the system, it must be understood that the complexities of manufacturing and subsystem integration reduce the availability of the system when deployed. To account for the impact of manufacturing on system availability, RAM analysis is used to improve system design beyond the operational availability requirement of the customer while ensuring that manufacturing limitations are addressed during the early stages of system development.

Conclusion

Through a disciplined approach, the NGNP Project has been able to apply various risk management techniques and tools to forecast effective risk reduction and verify system operational requirements for the project. As various project tasks are completed, the risk management system and the RAM analysis/simulation tool provide feedback and forecasting on system risk reductions and RAM improvements, respectively. By managing system risks and RAM verification activities, the Project is providing excellent information to all stakeholders and decision makers associated with the project.

Disclaimer

This manuscript has been authored by Battelle Energy Alliance, LLC under Contract No. DE-AC07-05ID14517 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a nonexclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes.


Biography

Emmanuel O. Opare. Emmanuel received a B.S. in Mechanical Engineering from Brigham Young University-Idaho and a master's degree in Engineering Management, and has work experience in Venture Capital, Entrepreneurship, and Systems Engineering. Currently a Systems Engineer at the Idaho National Lab, he supports the Next Generation Nuclear Plant (NGNP) project with Systems Engineering activities such as Risk Management and Analysis, Requirements Management, and RAM Analysis. Emmanuel is a member of American Society of Quality Engineers (ASQ) and INCOSE. In his spare time, Emmanuel develops tools to enable decision support in areas of strategic management, policy making, operational research, risk management, and project management. Emmanuel’s essential skills are blending form and function in designs and mining intelligence from data.

Charles Park. Charles Park is a Registered Professional Engineer (PE) who has worked at the Idaho National Laboratory (INL) since 1982, with an emphasis in Systems Engineering and Project Management. He currently supports the Next Generation Nuclear Plant (NGNP) and the production of hydrogen using nuclear power. He is also a Certified Systems Engineering Professional (CSEP) with the International Council on Systems Engineering (INCOSE) and a certified Project Management Professional (PMP) with the Project Management Institute (PMI). Charles helped oversee construction of the Columbia Generating Station (a General Electric 1100 MW Boiling Water Reactor) near Richland, Washington, and was a project engineer with a regional consulting engineering firm before that. He has a Bachelor of Science degree in Civil Engineering from the University of Idaho, and his essential skill is coaxing order out of chaos.