1
Constellation Reliability Engineering Process – Optimizing CxP Risk
Sean Carter, NASA JSC
Daniel Deans, ManTech SRS Technologies
Used with Permission
2
DFRAM Overview
Why does reliability engineering exist?
How does it fit within the life cycle?
Success space vs. failure space
Partnership on system engineering team
The value of “designing-out” failure modes
Where does it fit in the lifecycle?
What are some of the tools?
How are they applied?
Real examples
3
Reliability Engineering Value - Clichés
Failure is not an option…
A design engineer does not know what he does not know
An extra set of eyes and ears is always good
You have to spend money to make money
Mr. Murphy tends to rear his ugly head when you are not expecting it…
What all this means is: You have to work at it – nothing worth accomplishing comes easy
Reliability engineering is a discipline that adds value to the systems engineering process!
4
Typical System Engineering Lifecycle
5
Reliability Engineering Throughout Project Life
6
The Life Cycle Approach
Reliability is best designed-in; it is, for the most part, not:
  Analyzed in
  Tested in
  Operated in
Successful reliability performance begins with a diligent, intentional approach at the very beginning of a project
  Pre-phase A: requirements
  Phase A: allocation; plan; resources
  Phase B: analysis, design input, preliminary design review
  Phase C: detailed design inputs; more analysis; trade studies; design verification; critical design review
  Phase D: test planning, test readiness, manufacturing, final validation; flight readiness review
  Phase E/F: ops, growth, disposal and lessons learned
[Figure: system engineering lifecycle diagram. Recoverable labels follow.]
Tracks: System Engineering; Test and Assessment
Flow: System Concept Exploration; Preliminary Design; Design Synthesis; Component Fabrication, Assembly, Integrate, & Test; Element Integration & Test; System Integration Test; System Element Data Reduction and Assessment
Functions: Requirements Compliance; Configuration Management; Project Direction, Control, & Planning; Risk Management; System Analysis
• System, Element, Subsystem Models
• System Performance Analyses
• Specifications
• Verification
• Management Plan
• Budget Development & Control
• Project Plan Development
• Schedule Development & Control
• Design Data Base
• Problem/Failure Reports (PFR)
• Engineering Change Orders
• Risk Planning
• Risk Assessment
• Risk Handling/Mitigation
• Risk Monitoring
7
Success Space vs. Failure Space
A design engineer thinks in success space (typically)
  How will the widget work?
  When it is designed, what function will it perform?
  What are the performance requirements?
A reliability engineer is paid to think in failure space
  How will the widget fail?
  What about the operating environment will cause issues?
  What materials, processes, and tools will accentuate failure modes?
  Is redundancy required?
  Are there operational work-arounds?
  How will faults propagate through the system?
  What are the effects of a failure mode on the mission?
Superimpose the two processes, you get success!
8
Credibility: Partnership on System Engineering Team
Safety and Mission Assurance organization provides discipline experts to support design teams
Our job is to serve; not to inhibit
We help the system engineering teams identify hazards and failure modes and design them out
Our sole reason for existing is to ensure project/program success and to reduce/eliminate operational risk
We are partners for success
The aim in partnership is to duplicate our knowledge in the collective heads of our design-team partners
9
The Value of “Designing-Out” Failure Modes
A failure mode is an obstacle to mission success
Not every failure mode will cause mission failure, but any component failure has the potential to
In the commercial world, a failure in the field costs 10 times what it costs to mitigate in the design process
In the space business, a failure can and will cost the mission and quite possibly endanger people
Identifying and designing-out failure modes is important!
10
How Do We Design Out Failure Modes?
Methodical process; starts in pre-phase A, follows the lifecycle.
DMEDI – Define, Measure, Explore, Develop, Implement (12 steps)
  Define requirements
  Allocate requirements
  Plan activities and analysis, including test and verification
  Collect data and develop data sources
  Use RAM simulation, FMEA, FTA, worst case analysis, derating, proven design practices to drive the design
  Support design reviews and require improvement
  Verify and ensure that design will meet requirements
  Plan and implement thorough testing
  Finalize verification, ascertain flight readiness
  Identify reliability growth opportunities once design is complete
  Investigate and eliminate root causes of anomalies
  Develop lessons learned, provide feedback to future engineering teams
11
Pre-Phase A Concept Development
Very important part of process – DFRAM starts here
Develop requirements that will optimize RAM for program/project
Requirements include availability, mean time to failure, fault tolerance, mean time to repair, time to replace
Import lessons learned from similar programs/systems
Collect similar system failure history data
Begin development of system model
Begin development of RAM Plan
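The requirements named on this slide (availability, mean time to failure, mean time to repair) are tied together by a standard steady-state relationship, A = MTTF / (MTTF + MTTR). The sketch below is only an illustration of that relationship; the function names and the example numbers are assumptions, not values from the program.

```python
def steady_state_availability(mttf_hours: float, mttr_hours: float) -> float:
    """Inherent availability A = MTTF / (MTTF + MTTR)."""
    if mttf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTTF must be positive and MTTR non-negative")
    return mttf_hours / (mttf_hours + mttr_hours)


def required_mttf(target_availability: float, mttr_hours: float) -> float:
    """Rearranged form: MTTF = A * MTTR / (1 - A)."""
    if not 0.0 < target_availability < 1.0:
        raise ValueError("target availability must be strictly between 0 and 1")
    return target_availability * mttr_hours / (1.0 - target_availability)


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    print(steady_state_availability(mttf_hours=990.0, mttr_hours=10.0))
    print(required_mttf(target_availability=0.99, mttr_hours=10.0))
```

Used this way, an availability requirement and an expected repair time give a first-cut MTTF requirement that can be negotiated with the design teams.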
12
Phase A: Preliminary Analysis
Refine requirements, negotiate allocations with design elements
Finalize RAM Plan and educate design team on process; what role reliability engineering team will fill
Continue to develop preliminary model; begin FMEAs, FTAs, Probabilistic assessments
Allocate requirements to lowest design-to level
Negotiate failure definitions, failure budgets with design teams
Identify initial critical items, compare with lessons learned from previous systems
Continue to identify data sources
Identify critical suppliers; begin to form partnerships
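One simple way to picture requirements allocation and failure budgets is to split a system reliability requirement across elements in series. The sketch below assumes independent elements in series and hypothetical weights; it is one common textbook approach, not necessarily the method the program used.

```python
import math


def allocate_equal(system_reliability: float, n_elements: int) -> float:
    """Equal apportionment: each of n series elements gets R_sys ** (1/n)."""
    return system_reliability ** (1.0 / n_elements)


def allocate_weighted(system_reliability: float, weights: dict) -> dict:
    """Split the system failure budget -ln(R_sys) in proportion to the weights.

    A larger weight gives an element a larger share of the failure budget,
    and therefore a lower (easier) reliability requirement.
    """
    total_weight = sum(weights.values())
    budget = -math.log(system_reliability)
    return {
        name: math.exp(-budget * weight / total_weight)
        for name, weight in weights.items()
    }


if __name__ == "__main__":
    # Hypothetical elements and weights, for illustration only.
    allocation = allocate_weighted(0.95, {"propulsion": 2, "avionics": 1, "structure": 1})
    for name, requirement in allocation.items():
        print(name, round(requirement, 5))
    # The product of the allocations recovers the system requirement.
    print(round(math.prod(allocation.values()), 5))
```

The weights are where the negotiation with design teams would show up: complexity, heritage, or criticality can all justify a larger or smaller share of the budget.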
13
Phase B – Preliminary Design
Continue to build simulation (model) and add more details
Identify most effective analysis tools to use to drive design
Complete preliminary FMEA, FTA, PRA
Continue to develop supplier partnerships
Prepare for preliminary design review
Perform maintenance task analysis
Identify design improvement initiatives and optimize using simulation
Perform other sensitivity studies based on fault tolerance requirements
Begin developing and finalizing FRACAS, test plans, reliability growth strategy
Partner with designers to identify failure modes, design them out
Support concept of operations optimization
14
Phase C – Detailed Design
Perform detailed design analysis – PDR recovery
Focus on Pareto items identified from analyses (Top 10)
Continue to develop and use RAM simulation, FMEA, FTA, etc. to design out failure modes
Use Con-Ops to develop operational work-arounds as failure mode mitigation
Finalize test plans – review for reliability success criteria
Audit suppliers, provide support for reliability improvement
Mitigate schedule risks
Finalize critical items, document for testing
Begin life testing of components and subsystems as feasible
Perform specialized analysis (sneaks, fault propagation)
Prepare for and support CDR
15
Phase D – Development
Finalize design - CDR recovery, cut into manufacturing
Finalize FMEAs, FTAs, Simulations, CILs
Support testing, root cause investigations and corrective action
Begin collection of failure and operational history data (upon first application of power)
Finalize reliability growth strategy
Develop and begin implementation of reliability-centered maintenance approach
Make “last minute” improvements based on test results
Identify lessons learned and document
Update Con-Ops with operational work-arounds for critical items
16
Phase E/F – Ops and Disposal
Continue to gather data, monitor operations for anomalies
Support failure analyses, root cause investigations
Implement reliability growth process, identify areas for growth, design solutions
Document lessons learned
Use simulation to validate reliability growth strategy, sensitivities
Update RAM Plan with lessons learned
Support system disposal via identification of reliability challenges to shutdown
17
What are the Tools?
Some of the tools that we use are:
Requirements allocation
RAM simulation/probabilistic risk assessment
FMEA/FMECA
Fault tree analysis (FTA)/event tree assessment
Parts stress analysis/derating
Detailed design analysis
Worst case analysis
Redundancy screens
Extensive testing and verification analysis
Reliability growth planning and implementation
Others….
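As a small illustration of how fault tree analysis and redundancy screens interact, the sketch below evaluates a top-event probability from basic events through AND and OR gates. It assumes independent basic events; the tree and the probabilities are hypothetical.

```python
def p_and(*probabilities: float) -> float:
    """AND gate: the output event occurs only if every input event occurs."""
    result = 1.0
    for p in probabilities:
        result *= p
    return result


def p_or(*probabilities: float) -> float:
    """OR gate: the output event occurs if any input event occurs."""
    none_occur = 1.0
    for p in probabilities:
        none_occur *= 1.0 - p
    return 1.0 - none_occur


if __name__ == "__main__":
    # Hypothetical tree: loss of function if both redundant pumps fail
    # (AND gate) or the single shared controller fails (OR gate at the top).
    pump_failure = 0.01
    controller_failure = 0.001
    top_event = p_or(p_and(pump_failure, pump_failure), controller_failure)
    print(top_event)
```

In this made-up case the redundant pumps contribute far less to the top event than the single controller, which is the kind of result that points a design team at where redundancy is still missing.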
18
Reliability and Maintainability Simulation
A very powerful process
Can help design out failure modes without cutting metal
Provides for the Pareto Principle (20/80)
Gives design team a tool for sensitivity analysis
Allows for trying many different scenarios
Helps to optimize the return on investment based on cost to improve curve
[Figure: reliability vs. cost ($) curve, showing a region of high rate of return, the KITC point, and an area of diminishing return]
KITC = Point on Curve where rise becomes less than run (reliability improvement = rise, cost to improve = run)
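The KITC definition above can be turned into a simple check on a cost-to-improve curve. The sketch below reads KITC as the knee in the curve, which is an assumption, and normalizes both axes to the 0..1 range so that rise and run are comparable; that normalization and the sample points are also assumptions, not something the slides specify.

```python
def knee_in_the_curve(costs: list, reliabilities: list) -> int:
    """Return the index of the last point before rise becomes less than run.

    costs and reliabilities are parallel lists, sorted by increasing cost.
    """
    if len(costs) != len(reliabilities) or len(costs) < 2:
        raise ValueError("need two parallel lists with at least two points")
    cost_span = costs[-1] - costs[0]
    reliability_span = reliabilities[-1] - reliabilities[0]
    if cost_span <= 0 or reliability_span <= 0:
        raise ValueError("curve must increase in both cost and reliability")
    for i in range(1, len(costs)):
        run = (costs[i] - costs[i - 1]) / cost_span
        rise = (reliabilities[i] - reliabilities[i - 1]) / reliability_span
        if rise < run:
            return i - 1  # spending beyond this point gives diminishing return
    return len(costs) - 1  # no knee found: every step still pays for itself


if __name__ == "__main__":
    # Hypothetical cost-to-improve curve (cost in arbitrary units).
    costs = [0, 1, 2, 3, 4]
    reliabilities = [0.50, 0.80, 0.90, 0.95, 0.96]
    knee = knee_in_the_curve(costs, reliabilities)
    print(costs[knee], reliabilities[knee])
```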
19
Simulation Basics
Simulations are built based on the system architecture
Model provides for “RAM” characteristics of system
Input data includes failure rates, repair times, sparing information, logistics information, operational work-arounds
Simulation is run based on mission profiles
“Monte Carlo” methodology is used
Typically data is input using statistical distributions
Outputs are system availability and cutsets (and other failure “illuminators”)
Cutsets lead to sensitivity analyses which in turn can drive improvements (failure mode elimination)
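The basics above can be sketched as a toy Monte Carlo availability model. This is a minimal sketch, not the program's tool: it assumes a series system (any component down means the system is down), exponentially distributed failure and repair times, independent components, and hypothetical component data. Sparing, logistics, and operational work-arounds are left out.

```python
import random


def simulate_series_availability(components, mission_hours, runs=2000, seed=1):
    """Estimate availability of a series system over a mission.

    components: {name: (mttf_hours, mttr_hours)}, hypothetical values.
    Returns (availability, total downtime hours attributed to each component).
    """
    rng = random.Random(seed)
    total_up = 0.0
    downtime_by_component = {name: 0.0 for name in components}
    for _ in range(runs):
        down_intervals = []
        for name, (mttf, mttr) in components.items():
            t = 0.0
            while True:
                t += rng.expovariate(1.0 / mttf)  # draw time to next failure
                if t >= mission_hours:
                    break
                repair_end = min(t + rng.expovariate(1.0 / mttr), mission_hours)
                down_intervals.append((t, repair_end))
                downtime_by_component[name] += repair_end - t
                t = repair_end
        # Merge overlapping outages so simultaneous failures are not double counted.
        down_intervals.sort()
        system_down = 0.0
        current_start, current_end = None, None
        for start, end in down_intervals:
            if current_end is None or start > current_end:
                if current_end is not None:
                    system_down += current_end - current_start
                current_start, current_end = start, end
            else:
                current_end = max(current_end, end)
        if current_end is not None:
            system_down += current_end - current_start
        total_up += mission_hours - system_down
    return total_up / (runs * mission_hours), downtime_by_component


if __name__ == "__main__":
    parts = {"pump": (1000.0, 10.0), "valve": (2000.0, 20.0)}  # hypothetical
    availability, downtime = simulate_series_availability(parts, 10000.0, runs=500)
    print(round(availability, 4))
    print({name: round(hours / 500, 1) for name, hours in downtime.items()})
```

The per-component downtime plays the role of a simple failure “illuminator”: it shows which items drive system downtime and are worth a sensitivity study.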
20
RAM Simulation Example
Simulation is dynamic, not static analysis
Can provide much information about overall availability of system under many different sets of conditions
Today’s tools can include operational concepts and rules, optimization of spares (some automatic)
Requires specific input data
21
How Results are Used
Outputs of baseline simulations are verified and validated using expert elicitation
Once all agree that the simulation is in the “ballpark,” begin the sensitivity analyses (do not get wrapped around the axle on the numbers; it is the gap elimination that provides the most value)
Identify opportunities for improvement, plug those back into the sim, ascertain value of improvements
Continue this process until gaps are eliminated or at least reduced.
This can include block improvement of overall component failure rates – get the suppliers in on the act (supplier partnerships)
Ensure data from simulation is used in the design process
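The improve-and-rerun loop described above can be sketched with a closed-form stand-in for the simulation. The sketch assumes a series system whose availability is the product of MTTF / (MTTF + MTTR) per component, and it tries one candidate improvement at a time (here, doubling MTTF). The component data and the improvement factor are hypothetical.

```python
def series_availability(components: dict) -> float:
    """Analytic stand-in for the simulation: product of component availabilities."""
    availability = 1.0
    for mttf, mttr in components.values():
        availability *= mttf / (mttf + mttr)
    return availability


def rank_improvements(components: dict, mttf_factor: float = 2.0) -> list:
    """Improve one component at a time and rank by system availability gain."""
    baseline = series_availability(components)
    gains = []
    for name, (mttf, mttr) in components.items():
        trial = dict(components)
        trial[name] = (mttf * mttf_factor, mttr)  # candidate improvement
        gains.append((name, series_availability(trial) - baseline))
    return sorted(gains, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical (MTTF hours, MTTR hours) per component.
    parts = {"pump": (500.0, 10.0), "valve": (5000.0, 10.0), "sensor": (2000.0, 20.0)}
    print("baseline:", round(series_availability(parts), 4))
    for name, gain in rank_improvements(parts):
        print(name, round(gain, 5))
```

The ranking matters more than the absolute numbers, which matches the advice not to get wrapped around the axle on them: the top entries are the gaps worth taking to designers and suppliers.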
22
Success Stories: NASA Instrument Design
Validation of proper installation of sample cup retaining springs on Sample Manipulation System to preclude workmanship failures. (single ring failure would result in loss of solid sample science)
Use of physics of failure methods to identify and eliminate, where possible, failure modes of Pyrolysis Oven.
Implementation of HiPot test for Wide Range Pump motor to eliminate workmanship related failures.
Identification of Hall Effect Device on actuators as possible Radiation Sensitive device. Subsequent testing validated suitability of device.
Identification of thermal switch on Gas Trap as Reliability Issue. Redesign produced higher Reliability solution.
FMEA of Gas Processing System provided justification for addition of limited redundancy.
Improved reliability of instrument by approximately 25% based on initial predictions.
23
Complex Space Systems Application
Predicated on effective requirements implementation
Detailed RAM Plan developed and implemented at Program Level
RAM requirements, RAM Plan flowed down to systems, elements of systems
System owners responsible for DFRAM, but program will facilitate and audit
Program level analyses including simulation, FMEA, PRA being performed
Verification and validation will be program level functions
PRA will be part of flight readiness decision
Software included in DFRAM activities (no longer black box)
System Engineering organization partnering with S&MA organization for RAM implementation
24
SUMMARY
Success of a system is predicated on intentional implementation of DFRAM
It will not happen spontaneously
Must be married with the system engineering process
Program management must be disciples – will not work otherwise
It is always easier and more cost effective to do it right the first time
Implementation requires people skills and a service mentality