RCM: Reliability Centered Maintenance
Overview for RCM Teams
• Based on Reliability Centered Maintenance concepts
• Proven technology
• Provides a structured approach to establishing a maintenance strategy
• Provides a strategy for stocking spare parts
• Provides an easy way to track implementation of an RCM analysis
Course Outline
I. RCM History
II. Principles of RCM
III. Maintenance Tasks
IV. Understanding Failure
  • Failure Patterns
  • Effective Maintenance Tasks for each pattern
V. RCM Process Steps
  • Operating Context
  • Primary and Secondary Functions
  • Functional Failures
  • Failure Modes
  • Probability Rating
  • Failure Effects
  • Consequence Rating
VI. Case Study
Reliability Centered Maintenance History
• 1974: The United States Department of Defense commissioned United Airlines to report on the maintenance processes used in the aviation industry.
• The study was conducted by:
  • Stanley Nowlan
  • Howard Heap
• At that time, the airline industry relied on a large number of redundant systems, combined with scheduled discard and scheduled restoration tasks, to ensure safety.
• Records showed that these maintenance tasks had little or no effect on the reliability of the aircraft. In fact, some of the maintenance was shown to be intrusive and could itself cause failures.
• Today, RCM remains the process used to develop and refine aircraft maintenance programs.
Principles of RCM
• What function does this machine serve?
• What is the functional failure?
• How does the failure occur?
• What are its consequences?
• What can be done to PREVENT the failure?
• What can be done to reduce the consequences of a failure?
MAINTENANCE:
• Focuses on equipment performance
• Ensures physical assets continue to fulfill their intended function
• Improves equipment performance to meet business needs
• Is performed in a cost-effective manner
• The objective of RCM is to use existing process knowledge to develop a maintenance program that maximizes equipment uptime (reliability).
• RCM accomplishes this by applying a series of questions to a modified FMECA (Failure Mode, Effects, and Criticality Analysis).
• Maintenance tasks tend to fit into one of five categories:
  • On-Condition Tasks - condition monitoring
  • Restoration Tasks - restore equipment to its original state
  • Discard Tasks - complete replacement of a component
  • Failure Identification Tasks - operating checks of hidden functions to ensure they work when needed
  • Redesign Tasks
Understanding Failures
• To set up the proper maintenance for a piece of equipment or component, you must first understand how it fails.
• RCM requires an understanding of six failure patterns and designs a maintenance program using the techniques best suited to each pattern.
[Figure: Failure Patterns A through F]
Pattern A
• Commonly referred to as the bathtub curve
• For many years this was thought to be the only failure pattern for equipment.
• We now know it is really a combination of separate failure patterns.
• This failure pattern shows early-life failure, followed by a period of random failure, until the item reaches an age at which it becomes rapidly more prone to failure.
• 4% of the failures in the Nowlan and Heap study fit this failure pattern.
• On-Condition monitoring is the preferred maintenance strategy.
• An age limit may be an effective maintenance strategy, provided a large percentage of units survive to the age at which wear-out begins.
• Simple electromechanical systems fit this failure pattern.
Pattern B
• Shows age-related failures: a component has a low level of random failures until it reaches an age at which it becomes rapidly more prone to failure.
• 2% of the failures in the Nowlan and Heap study fit this pattern.
• On-Condition monitoring is the preferred maintenance strategy.
• Scheduled discard or restoration may also be an effective maintenance strategy.
• Aircraft reciprocating engines fit this failure pattern; belts, sheaves, chains, sprockets, and impellers are other examples.
Pattern C
• Shows a steadily increasing probability of failure, but no single point at which we can say the item becomes rapidly more prone to failure.
• 5% of the failures in the Nowlan and Heap study fit this failure pattern.
• On-Condition monitoring is the preferred maintenance strategy.
• Scheduled discard or restoration may also be an effective maintenance strategy, depending on the cost of downtime compared with the cost of component replacement.
• Aircraft turbine engines fit this pattern. Other examples of Pattern C components are pipes, tires, and clutches.
Pattern D
• This failure pattern shows equipment that starts up and runs for a short time with few or no failures; the failure rate then increases quickly over a short period to a consistent level of random failures.
• 7% of the failures in the Nowlan and Heap study fit this pattern.
• On-Condition monitoring is the preferred maintenance strategy.
• Hydraulic and pneumatic components fit this failure pattern.
Pattern E
• Random failure pattern: the probability that an item will fail is the same at any point in its life.
• Ball bearings are an example of a Pattern E component.
• 14% of the failures in the Nowlan and Heap study fit this failure pattern (exponential survival distribution).
• More recent studies indicate that between 42% and 72% of failures fit this pattern.
• On-Condition monitoring is the preferred maintenance strategy.
• Time-based maintenance is not effective in reducing these failures.
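The claim that time-based maintenance does not help with Pattern E follows from the exponential survival distribution named above: with a constant failure rate, the chance of failing in the next interval does not depend on how old the unit already is. The sketch below checks this numerically. The failure rate of 1 failure per 10,000 hours and the 500-hour window are hypothetical values chosen only for illustration.

```python
import math

# Hypothetical constant failure rate (failures per operating hour) for a Pattern E item.
FAILURE_RATE = 1 / 10_000

def survival(t_hours: float, rate: float = FAILURE_RATE) -> float:
    """Exponential survival function R(t) = exp(-rate * t)."""
    return math.exp(-rate * t_hours)

def conditional_failure_probability(age: float, window: float, rate: float = FAILURE_RATE) -> float:
    """Probability that a unit which has survived to `age` fails within the next `window` hours.

    P(fail in (age, age + window] | survived to age) = 1 - R(age + window) / R(age)
    """
    return 1 - survival(age + window, rate) / survival(age, rate)

if __name__ == "__main__":
    for age in (0, 1_000, 5_000, 20_000):
        p = conditional_failure_probability(age, window=500)
        print(f"age {age:>6} h: P(fail in next 500 h) = {p:.4f}")
```

Every age gives the same probability (about 0.0488), which is why replacing a Pattern E unit on a schedule does not reduce its chance of failing.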
Pattern F
• Early-life failure pattern: the probability of failure declines with age. The highest probability of failure occurs when the equipment is new.
• Electronic components are an example of this failure pattern.
• 68% of the failures in the Nowlan and Heap study fit this pattern.
• More recent studies indicate that between 16% and 29% of failures fit this pattern.
• Look to solve early-life failures by using “burn-in” techniques.
• If failure rates are too high, explore redesign possibilities.
Failure Patterns: Industry Today
Pattern | Share of failures | Effective maintenance tasks
A | 4% | On-Condition Task
B | 2% | On-Condition Task; Scheduled Discard Task; Scheduled Restoration Task
C | 5% | On-Condition Task; Scheduled Discard Task; Scheduled Restoration Task
D | 7% | On-Condition Task
E | 65% | On-Condition Task
F | 17% | On-Condition Task; Redesign Task
The RCM Process
The RCM process is structured and interactive; completing it takes the combined effort of a cross-functional team.
The remainder of this course uses a case study to simulate how the RCM process flows and to demonstrate the outputs of an RCM analysis.
We begin the RCM process by writing a formal Operating Context for the machine we are analyzing.
The Operating Context clearly describes:
• Why the asset exists
• When it was purchased and installed
• What is expected of this process
• The consequences of unscheduled downtime
• The present condition and performance of this process
Describing Primary and Secondary Functions
With an Operating Context and process drawings, the RCM team can now begin to describe the machine functions.
The first function of any machine is the Primary Function. The remaining machine functions are known as Secondary Functions.
The Primary Function is the major reason why an asset exists. A primary function statement should include the following:
• Performance standards
• Quality standards
• HSE standards
“Think about what you have when the process starts, and what you want when the process is finished.”
While the primary function is the reason an asset exists, it is important to list the remaining Secondary Functions. Secondary functions are often less obvious but should be considered just as crucial as the primary function.
Secondary functions can be active, such as “To be able to pump,” or passive, such as “To be capable of relieving an overpressure condition.”
Examples of Secondary Functions:
• To be able to contain fluid in a gearbox
• To be capable of shutting down the machine in the event of an emergency
• To be able to support the mixer
• To be able to provide visual indication
• To be capable of protecting personnel from rotating equipment
Class Exercise
• Read the Operating Context for the Methanol Unload System.
• List the primary function and three secondary functions.
Methanol Unload System Operation Performance Statement
The methanol unload system is located at Building Z, Kodak Park. The pumping system was designed to unload methanol from a tractor-trailer tanker to a holding tank. It was purchased and installed in 1995 and was designed and installed for offloading full and partial loads of methanol.
The holding tank is a 20,000-gallon stainless steel tank mounted inside a sealed concrete vault and surrounded by pea gravel. The holding tank has a level float that provides continuous level readings and a high-level shutdown at 18,000 gallons. Level alarms warn the operator in the control room with a light on the control panel, an audible bell, and a message on the CRT. A high-level condition shuts down the pump; the pump can restart when the level drops below 13,000 gallons. The emergency high-level device is a high-level probe set at 19,000 gallons. The emergency high-level probe alarms and shuts down the entire system, which cannot restart without operator intervention.
The concrete vault is designed for environmental protection should the tank overflow or leak. A Solvent Vapor Detector located inside the vault is capable of detecting minute amounts of vapor; a spill of approximately one quart will trigger the detector and shut down the entire system.
The trailer unload pump is a stainless steel centrifugal pump with a design flow and pressure of 120 GPM at 70 PSI. The pump is isolated by hand valves on either side and protected by a minimum flow indicator and a pressure tap. The minimum flow indicator must see a liquid flow greater than 100 GPM, or the pump will shut down after one minute of low flow. The pressure tap will shut down the pump if the discharge pressure falls below 50 PSI for more than one minute. Piping between the pump and holding tank is 2” stainless steel and has drain valve taps before and after the pump.
Trailer wagons average 5,000 gallons, and methanol costs $9.00 per gallon. Chemical spills are not tolerated at Kodak Park: leaks of more than 1 quart are reported to loss control at a minimum cost of $20,000 per incident. If a spill of more than one quart reaches the soil outside the concrete vault, it requires a total shutdown and cleanup of the spill area, resulting in at least 3 days of lost production.
Tank trucks are scheduled at two-hour intervals, starting at 7 am each day, with the last delivery at 3 pm; the average cycle time to unload is 40 minutes. If a truck is not unloaded when the next one arrives on site, the driver calls dispatch and cancels the next delivery, and we are charged for the undelivered load at the regular rate. Any time the tank runs dry, there are consequences of catastrophic proportions.
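The scheduling and cost figures in the Operating Context can be combined into a few quick consequence numbers for the case study. The sketch below uses only values stated there (2-hour spacing from 7 am to 3 pm, 40-minute unload cycle, 5,000-gallon loads at $9.00 per gallon, $20,000 minimum per reported spill); treating every arrival as exactly on schedule is an assumption.

```python
from datetime import datetime, timedelta

# Values taken from the Methanol Unload System Operating Context.
FIRST_DELIVERY = datetime(2000, 1, 1, 7, 0)   # 7 am (the date itself is arbitrary)
LAST_DELIVERY = datetime(2000, 1, 1, 15, 0)   # 3 pm
DELIVERY_SPACING = timedelta(hours=2)
UNLOAD_CYCLE = timedelta(minutes=40)
LOAD_GALLONS = 5_000
PRICE_PER_GALLON = 9.00
MIN_SPILL_COST = 20_000

def delivery_times() -> list[datetime]:
    """Scheduled truck arrivals, assuming every truck arrives exactly on schedule."""
    times, t = [], FIRST_DELIVERY
    while t <= LAST_DELIVERY:
        times.append(t)
        t += DELIVERY_SPACING
    return times

def slack_before_next_truck() -> timedelta:
    """Time left between finishing one unload and the next truck's arrival."""
    return DELIVERY_SPACING - UNLOAD_CYCLE

def cancelled_load_cost() -> float:
    """We are charged for an undelivered load at the regular rate."""
    return LOAD_GALLONS * PRICE_PER_GALLON

if __name__ == "__main__":
    print("Deliveries per day:", len(delivery_times()))
    print("Slack per delivery (min):", slack_before_next_truck().total_seconds() / 60)
    print("Cost of one cancelled load: $", cancelled_load_cost())
    print("Minimum cost of one reported spill: $", MIN_SPILL_COST)
```

With five trucks a day and roughly 80 minutes of slack per truck, a repair that runs much longer than that during the delivery window risks a cancelled load worth about $45,000.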
[Figure: Methanol Case Study P&ID drawing, showing the hose, pump, TV1, SV1, SV2, SV3, DV1, DV2, FS1, PS1, LS1, LS2, LS3, and SVD1]
Functional Failures
The next step in the RCM process is to list the Functional Failures for the primary and secondary functions.
Functional Failure - the point in time when a process or component can no longer perform its function at all, or is unable to meet its desired performance standard.
A Functional Failure statement:
• States an exact performance level that defines the point of failure.
• Is normally stated as either a total loss of functional capability or a reduced functional capacity.
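A functional failure statement turns a performance standard into a testable threshold. As an illustration only, the sketch below classifies a measured unload flow for the case study pump (design flow 120 GPM). Using the 100 GPM minimum-flow trip as the point of reduced capacity is an assumption, and the function and enum names are hypothetical.

```python
from enum import Enum

# Minimum-flow trip setting from the case study, assumed here as the failure threshold.
# (The pump's design flow is 120 GPM.)
MIN_FLOW_GPM = 100

class FunctionalState(Enum):
    OK = "meets performance standard"
    REDUCED = "functional failure: reduced functional capacity"
    TOTAL_LOSS = "functional failure: total loss of function"

def classify_flow(measured_gpm: float) -> FunctionalState:
    """Hypothetical check of the 'unload methanol' function against a stated flow threshold."""
    if measured_gpm <= 0:
        return FunctionalState.TOTAL_LOSS   # unable to unload methanol at all
    if measured_gpm <= MIN_FLOW_GPM:
        return FunctionalState.REDUCED      # pumping, but at or below the failure level
    return FunctionalState.OK

if __name__ == "__main__":
    for flow in (125, 95, 0):
        print(flow, "GPM ->", classify_flow(flow).value)
```

Flows between 100 and 120 GPM count as acceptable here because the stated threshold, not the design point, defines the failure.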
Class Exercise
List the functional failures for the primary function and three secondary functions of the Methanol Unload System.
Failure Modes
Once the Functions and Functional Failures have been listed, we identify the Failure Modes.
A Failure Mode is the specific manner in which, or the specific cause by which, a process or component fails.
Failure Mode statements should include:
• The specific component that has failed
• The specific cause of the failure
For example: Pump bearing fails due to lack of lubrication. The bearing is the specific component, and lack of lubrication is the specific cause.
When listing failure modes you should:
• List all failure modes that have occurred
• List failure modes that have not yet occurred but are likely to occur at some point in time
Probability Rating
Once a failure mode has been identified and listed, we assign a probability rating to it. The probability rating, combined with the consequence rating, helps us determine the criticality of a failure mode.
Probability - the likelihood that an event will happen. If there is no maintenance history for your process, we use the experience of the mechanics and operators to assess the probability of failure.
High - The failure has happened often enough to be considered a dominant failure mode.
Medium - The failure has happened often enough to remember and probably will happen again.
Low - The failure has never happened, or has happened so infrequently that no one can remember the last time it did.
Failure Effects
After listing a failure mode and defining its probability of failure, we can describe the Failure Effect.
The Failure Effect is the physical effect of a failure mode on the functional capability of the equipment.
Failure Effect statements include:
• The first evidence to the operating crew that the failure has occurred
• As much as possible of the chain of events that precedes the failure
• The chain of events that happens after the failure occurs
• The consequences resulting from the failure
• Any secondary damage caused by the failure
• Downtime that resulted from the failure
• How often the failure occurs
Consequence Rating
Once the failure effect has been clearly described, the team can assign a Consequence Rating to the failure mode.
The consequence of a failure mode is the impact the failure has on your business:
• HSE
• Cost
• Secondary damage
• Downtime
The Consequence Rating reflects the impact on your business, safety, or environment, or the cost of repair, if the failure occurs.
High - The failure causes a loss in capacity that will adversely impact production schedules. The failure has a direct adverse effect on HSE. High repair cost or waste.
Medium - The failure will cause the operating department to shift schedules; crews may have to work overtime, or additional costs will be incurred to manufacture product. The failure does not affect HSE.
Low - The failure has no impact on the end-user customer; there is a lot of unused machine capacity or a large amount of stored product. There is no secondary damage resulting from the failure and no impact on HSE.
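The Probability Rating slide notes that probability and consequence ratings are combined to determine the criticality of a failure mode, but it does not give the combining rule. A common convention, assumed here rather than taken from this course, is to score each rating 1 to 3 and multiply. The score bands and the example ratings below are hypothetical; replace them with your site's own risk matrix.

```python
from enum import IntEnum

class Rating(IntEnum):
    # Assumed numeric scores for the High/Medium/Low scales defined above.
    LOW = 1
    MEDIUM = 2
    HIGH = 3

def criticality(probability: Rating, consequence: Rating) -> tuple[int, str]:
    """Combine the two ratings into a score (1 to 9) and a band (hypothetical thresholds)."""
    score = int(probability) * int(consequence)
    if score >= 6:
        band = "critical"
    elif score >= 3:
        band = "significant"
    else:
        band = "minor"
    return score, band

if __name__ == "__main__":
    # Illustrative ratings only, not answers to the class exercise.
    modes = {
        "Pump seal leaking": (Rating.HIGH, Rating.MEDIUM),
        "Pump motor coupling fails": (Rating.HIGH, Rating.LOW),
        "DV1 left open": (Rating.LOW, Rating.HIGH),
    }
    ranked = sorted(modes.items(), key=lambda kv: criticality(*kv[1])[0], reverse=True)
    for name, (p, c) in ranked:
        print(name, criticality(p, c))
```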
Class Exercise
List five Failure Modes that would leave you unable to unload methanol at all.
- Assign a probability rating to each failure mode
- Clearly describe the effects of each failure
- Assign a consequence rating to each failure
Methanol Tank Farm Failure History
Component | # of failures | Repair time | Work-around SOPs
TV1 | 1 | 0 | Pump from top of trailer
Hose connector | 4 | 0 | Pump from top of trailer
SV1 failed closed | 2 | 6 hrs |
SV1 leaks by | Leaking now | |
DV1 left open | 1 | 72 hrs |
DV2 left open | 0 | |
DV1 or 2 failed closed | 0 | |
FS1 failed closed | 2 | 0 | Rely on pressure reading, extra operator
FS1 failed open | 0 | |
Pump seal leaking | 8 | 6 hrs |
Pump impeller | 1 | 10 hrs |
Pump motor | 0 | |
Pump motor coupling | 8 | 30 mins |
LS1 failed open | 1 | 4 hrs |
LS2 failed open | 1 | 4 hrs |
LS3 failed | 0 | |
PS1 failed low | 1 | 30 mins | Rely on flow reading, extra operator
PS1 failed high | 0 | |
Pump bearing | 1 | 6 hrs |
Pump pipe flange seal | 2 | 2 hrs |
SVD1 failed | 1 | 4 hrs | Substitute a portable sniffer

Component | Cost | Lead time | P/N
FS1 | $4,000 | 6 hrs | SW123
Hose coupling | $150 | 2 hrs | HC-99
Impeller | $10,000 | 3 months | IMP-345
LS1, LS2, LS3 | $75 | 2 hrs | LS-239
PS1 | $350 | 2 hrs | PS-124
Pump seal | $500 | 2 days | SEV-789
Pump bearing | $400 | 2 hrs | TMP-41
Flange seal | $25 | 2 hrs | FS-Y2K
SVD1 | $5,000 | 75 days | SVF-UALL
SV1 | $35 | 4 hrs | STV-1000
DV1 | $35 | 4 hrs | STV-1000
Pump motor coupling | $35 | 4 hrs | COUP-1
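One way to use this history when assigning probability ratings is to total the repair time each failure mode has already cost. The sketch below does that for the rows with a nonzero repair time. Treating the Repair Time column as hours per failure is an assumption, since the table does not say whether it is per incident or a total.

```python
# Failure history rows with a nonzero repair time: (failure mode, number of failures, repair hours each).
# Assumption: "Repair Time" is per failure; "30 mins" is entered as 0.5 h.
HISTORY = [
    ("SV1 failed closed", 2, 6.0),
    ("DV1 left open", 1, 72.0),
    ("Pump seal leaking", 8, 6.0),
    ("Pump impeller", 1, 10.0),
    ("Pump motor coupling", 8, 0.5),
    ("LS1 failed open", 1, 4.0),
    ("LS2 failed open", 1, 4.0),
    ("PS1 failed low", 1, 0.5),
    ("Pump bearing", 1, 6.0),
    ("Pump pipe flange seal", 2, 2.0),
    ("SVD1 failed", 1, 4.0),
]

def total_repair_hours(rows):
    """Return (failure mode, count, total hours) tuples sorted by total hours, largest first."""
    totals = [(mode, count, count * hours) for mode, count, hours in rows]
    return sorted(totals, key=lambda row: row[2], reverse=True)

if __name__ == "__main__":
    for mode, count, hours in total_repair_hours(HISTORY):
        print(f"{mode:<24} failures={count:<2} repair hours={hours:g}")
```

Counting failures alone would rank the pump seal and the motor coupling equally (8 each), but under this assumption the seal has cost about 48 repair hours against about 4 for the coupling, and the single DV1 incident accounts for the most downtime.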
RCM Decisions
The next step of the RCM analysis is the decision process, in which we determine the type of failure consequence and the proper maintenance strategy. There are four types of failure consequences:
• Hidden
• Health, Safety, and Environmental
• Economic
• Non-Economic
Hidden Failures
• Have no direct consequences on their own
• Become evident to the operating crew only when a multiple failure occurs
Examples:
• Safety devices
• Redundant systems
Health, Safety, and Environmental Failures
• Have a direct effect on the health and safety of individuals or on the environment
Examples:
• Leaks
• Spills
• Emissions
• Pinch points
Economic Failures
• Have a direct adverse effect on production costs
Examples:
• Non-conforming product
• Yield losses
• Maintenance costs
• Operating costs
• Parts costs
Determining the Correct Maintenance Strategy
In working our way through the decision process, the final step is to determine the correct maintenance strategy and interval. The strategy should:
• Predict that the failure is about to occur
• Prevent the failure from occurring
• Reduce the consequences should the failure occur
• Redesign to eliminate the failure
This requires an understanding of the different types of maintenance.
Types of Maintenance
• On-Condition - inspection, measurement, observation, or another non-destructive task for parts prone to failure (predictive)
• Restoration - once the end of useful life is reached, you restore the component to “like new” condition (preventive)
• Discard - once the end of useful life is reached, you remove and replace the component (preventive)
• Failure Identification - inspection for undetected failures of components not used during normal operation (consequence reduction)
On-Condition Monitoring
Traditional methods
• Vibration analysis
• Lubrication analysis
• Thermography
• Motor circuit evaluation
• MCEmax testing
• Ultrasonic testing
Contemporary methods
Process monitoring
• Temperature
• Pressure
• Electrical current
Statistical techniques
• Trend charts
• Control charts
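Control charts turn routine process readings into an on-condition check. The sketch below is a simplified Shewhart-style chart: it sets limits at the baseline mean plus or minus three sample standard deviations and flags new readings outside them. The discharge-pressure readings are hypothetical values around the case study's 70 PSI design pressure; charts for individual readings are often built with a moving-range estimate of sigma instead.

```python
import statistics

def control_limits(baseline: list[float], k: float = 3.0) -> tuple[float, float]:
    """Lower and upper control limits: baseline mean minus/plus k sample standard deviations."""
    mean = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)
    return mean - k * sigma, mean + k * sigma

def out_of_control(readings: list[float], limits: tuple[float, float]) -> list[float]:
    """Return the readings that fall outside the control limits."""
    low, high = limits
    return [r for r in readings if r < low or r > high]

if __name__ == "__main__":
    # Hypothetical pump discharge pressure readings (PSI) taken while the pump was healthy.
    baseline = [70, 71, 69, 70, 72, 68, 70, 70]
    limits = control_limits(baseline)
    print("Limits (PSI):", tuple(round(x, 2) for x in limits))
    print("Flagged:", out_of_control([70, 66, 74, 71], limits))
```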
Informal methods
• People’s senses
• Daily walk-around
• Cleaning
• Listening
• Visual observations
• Smell
• Touch
RCM Maintenance Intervals
• Intervals for scheduled discard and restoration tasks are determined by knowing the “useful life” of the component.
[Figure: Useful Life Failure Chart, plotting number of failures against time and marking the useful life and the MTBF]
For On-Condition Tasks
• When setting up On-Condition maintenance tasks, base the interval on the P-F interval of the failure.
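Typical Failure Curve
[Figure: Typical failure curve marking points P and F and the P-F interval between them]
P - The point in time when failure begins and can first be detected (potential failure)
F - The point in time when the equipment can no longer deliver its primary function (functional failure)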
[Figure: Ball Bearing P-F Curve, plotting performance level against time (units 1 to 7) and marking the P-F interval, the reaction time, and an inspection interval of 1/2 the P-F interval]
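The Ball Bearing P-F Curve sets the inspection interval at 1/2 of the P-F interval. The reasoning: if inspections are spaced I apart, a failure that becomes detectable at P is found no later than I after P, leaving at least the P-F interval minus I as reaction time before F. The sketch below shows that arithmetic; the 8-week P-F interval is a hypothetical value, not one given in the course.

```python
def inspection_plan(pf_interval: float, fraction: float = 0.5) -> dict[str, float]:
    """Inspection interval as a fraction of the P-F interval, plus the minimum reaction time it leaves.

    Worst case: the potential failure (P) becomes detectable just after an inspection,
    so it is found one full inspection interval later.
    """
    if not 0 < fraction < 1:
        raise ValueError("fraction must be between 0 and 1")
    interval = pf_interval * fraction
    return {
        "inspection_interval": interval,
        "min_reaction_time": pf_interval - interval,
    }

if __name__ == "__main__":
    # Hypothetical ball bearing with an 8-week P-F interval (illustrative value only).
    plan = inspection_plan(8.0)
    print(plan)  # inspection every 4 weeks leaves at least 4 weeks to react
```

A smaller fraction buys more reaction time at the cost of more inspections, which is where the cost-effectiveness test comes in.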
MTBF (Mean Time Between Failures) is not a valid or useful basis for determining maintenance task intervals.
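One way to see why MTBF makes a poor interval: for a Pattern E item (exponential survival distribution, R(t) = exp(-t / MTBF)), the fraction of units that have already failed by the time they reach the MTBF is 1 - e^(-1), about 63%. A task scheduled at the MTBF would therefore come after most failures. The sketch below evaluates that fraction; the exponential model is an assumption that applies only to Pattern E, and the 5,000-hour MTBF is hypothetical.

```python
import math

def fraction_failed_by(t: float, mtbf: float) -> float:
    """Fraction of units failed by age t, assuming exponential (Pattern E) survival with the given MTBF."""
    return 1 - math.exp(-t / mtbf)

if __name__ == "__main__":
    mtbf_hours = 5_000  # hypothetical MTBF, for illustration only
    for t in (0.1 * mtbf_hours, 0.5 * mtbf_hours, mtbf_hours):
        print(f"t = {t:>6.0f} h: {fraction_failed_by(t, mtbf_hours):.1%} failed")
```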
• When failure history data is not available, the team should be conservative in setting maintenance task intervals.
• Tasks should be cost effective.
Class Exercise
• Run each of your failure modes through the RCM Decision and Spare Parts flow charts.
• Determine the correct task and interval.
• Determine the correct spare part strategy.
[Figure: Reliability Centered Maintenance Decision Flow Chart]
Decision questions in the chart:
• Will the failure be detected while the operator is performing their normal duties?
• Will this failure on its own affect HSE?
• Will this failure have operational (economic) consequences?
• Is there a failure-finding task that would detect the failure?
• Does this failure affect HSE?
• Is there any early warning that the failure is going to occur?
• Is there an on-condition task that is applicable and cost effective?
• Is there a scheduled rework or discard task that would reduce the failure rate?
• Is this task applicable and cost effective?
• Is there a business case for a redesign?
Possible outcomes: no scheduled maintenance required; establish a failure-finding task; implement a predictive (on-condition) maintenance task; implement a preventive maintenance task (scheduled rework or discard); redesign of the equipment, procedures, or process is REQUIRED; consider redesign of the equipment, procedures, or process to reduce consequences to an acceptable level; redesign; no scheduled maintenance required (consequence reduction strategy).
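The arrows of the decision flow chart did not survive extraction, so the sketch below is one plausible reading that follows the usual RCM sequence: hidden or evident first, then HSE, then economic consequences, and within each branch an on-condition task before scheduled rework or discard before redesign. The branch order, field names, and outcome strings are assumptions; check them against the original chart before relying on the result.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Answers:
    """Yes/no answers to the flow chart questions for one failure mode (hypothetical field names)."""
    evident: bool = False                 # Detected while the operator performs normal duties?
    hse: bool = False                     # Does this failure affect HSE?
    economic: bool = False                # Operational (economic) consequences?
    failure_finding_task: bool = False    # Would a failure-finding task detect it?
    early_warning: bool = False           # Any early warning that the failure is going to occur?
    on_condition_ok: bool = False         # On-condition task applicable and cost effective?
    rework_discard_helps: bool = False    # Scheduled rework/discard would reduce the failure rate?
    rework_discard_ok: bool = False       # That task is applicable and cost effective?
    redesign_business_case: bool = False  # Business case for a redesign?

def proactive_task(a: Answers) -> Optional[str]:
    """Prefer a predictive task, then a preventive one; None if neither qualifies."""
    if a.early_warning and a.on_condition_ok:
        return "Implement predictive (on-condition) maintenance task"
    if a.rework_discard_helps and a.rework_discard_ok:
        return "Implement preventive maintenance task (scheduled rework or discard)"
    return None

def rcm_decision(a: Answers) -> str:
    """Assumed branch order; see the lead-in."""
    if not a.evident:  # hidden failure
        if a.failure_finding_task:
            return "Establish a failure-finding task"
        if a.hse:
            return "Consider redesign to reduce consequences to an acceptable level"
        return "No scheduled maintenance required"
    if a.hse:
        return proactive_task(a) or "Redesign of equipment, procedures, or process is REQUIRED"
    if a.economic:
        task = proactive_task(a)
        if task:
            return task
        if a.redesign_business_case:
            return "Redesign"
        return "No scheduled maintenance required (consequence reduction strategy)"
    return "No scheduled maintenance required"

if __name__ == "__main__":
    # Illustrative only: a pump bearing whose wear shows up in vibration readings.
    bearing = Answers(evident=True, economic=True, early_warning=True, on_condition_ok=True)
    print(rcm_decision(bearing))
```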
[Figure: Spare Parts Risk Matrix, plotting probability (Negligible to High) against consequence (Negligible, Low, Medium, High, Extensive), with regions labeled Do Not Stock Part, Risk, and Stock The Part]
CONSEQUENCE
• Negligible - no impact on production; no HSE risk; no impact on process
• Low - no impact on production; no HSE risk; some impact on process
• Medium - some impact on product; some impact on process; no HSE risk
• High - unacceptable product; failed process; HSE risk
• Extensive - defective product to customer; loss of this machine shuts down a customer's process; someone will be injured
PROBABILITY
• Negligible - cannot imagine this ever happening
• Low - cannot remember the last time it happened
• Medium - happens occasionally
• High - happens often
Formulas:
Part Stocking Cost = Purchase price + (0.30 × Purchase price × years of no use)
Waiting For Parts Cost = Out-of-pocket cost per hour × lead time in hours waiting for the part
DO NOT STOCK THE PART - The risk of not stocking the part is low.
RISK - The risk factor varies based on many local decision points. If your business has a high risk tolerance, do not stock the part. If your business has a low risk tolerance, stock the part.
STOCK THE PART - The part can be stocked as an LMSC, minus one, Stock 1, Sub-stock room, or KSC stock. Pick the option that suits your needs.
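The two spare-parts formulas can be compared directly to answer the decision question "Does waiting for the part to be delivered cost more than stocking the part?". The sketch below applies them to the pump seal from the case study ($500, 2-day lead time). The $250 per hour out-of-pocket cost and the 2 years of no use are hypothetical inputs, not figures from the course.

```python
def part_stocking_cost(purchase_price: float, years_of_no_use: float, carrying_rate: float = 0.30) -> float:
    """Part Stocking Cost = Purchase price + (0.30 x Purchase price x years of no use)."""
    return purchase_price + carrying_rate * purchase_price * years_of_no_use

def waiting_for_parts_cost(cost_per_hour: float, lead_time_hours: float) -> float:
    """Waiting For Parts Cost = out-of-pocket cost per hour x lead time in hours."""
    return cost_per_hour * lead_time_hours

def stock_recommended(purchase_price: float, years_of_no_use: float,
                      cost_per_hour: float, lead_time_hours: float) -> bool:
    """True when waiting for the part would cost more than stocking it."""
    waiting = waiting_for_parts_cost(cost_per_hour, lead_time_hours)
    return waiting > part_stocking_cost(purchase_price, years_of_no_use)

if __name__ == "__main__":
    # Pump seal: $500 and 2-day (48 h) lead time from the case study; other inputs are hypothetical.
    stocking = part_stocking_cost(500, years_of_no_use=2)
    waiting = waiting_for_parts_cost(250, lead_time_hours=48)
    print(f"Stocking: ${stocking:,.0f}  Waiting: ${waiting:,.0f}")
    print("Stock the part" if stock_recommended(500, 2, 250, 48) else "Do not stock the part")
```

The cost comparison is only one box in the decision model; early warning, known failure age, obsolescence, and the risk matrix still apply.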
[Figure: Spare Parts Risk/Cost Decision Model flow chart]
Decision questions in the chart:
• Is there any early warning that the failure is going to occur?
• Can a replacement part be acquired before the failure occurs?
• Is there a known age at which the part fails?
• Does waiting for the part to be delivered cost more than stocking the part?
• Is the part already stocked?
• Is the part obsolete?
Possible outcomes: do not stock part; continue to stock part; look for a replacement part or redesign.