Systems Prognostic Health Management April 1, 2008
Christopher Thompson
IBM Global Business Services
FCS LDMS Program
Systems Engineering Program
Disclaimer: This briefing is unclassified and contains no proprietary information. Any views expressed by the author are his, and in no way represent those of Lockheed Martin Corporation.
2
My Engineering Experience
IBM Global Business Services, Dallas TX
Requirements Lead/Prognostics SME
- FCS Logistics Data Management Service (LDMS)
Lockheed Martin Missiles and Fire Control, Dallas TX
Senior Systems Engineer
- Multifunction Utility/Logistics Equipment (MULE)
Lockheed Martin Aeronautics, Fort Worth TX
Vehicle Systems - Prognostic Health Management
- F-35 Joint Strike Fighter (Lightning II)
Lockheed Martin Missiles and Fire Control, Dallas TX
Reliability Engineer
- Army Tactical Missile System (TACMS)
SMU School of Engineering, Dallas TX
- TA for Dr. Stracener
3
My Education
B.S. in Electrical Engineering, SMU (1997)
M.S. in Mechanical Engineering, SMU (2001)
- Major: Fatigue/Fracture Mechanics
M.S. in Systems Engineering (2002)
- Major: Reliability, Statistical Analysis
Ph.D. in Applied Science (anticipated ~ 2009)
- Major: Systems Engineering (PHM)
4
My Dissertation
Fleet Based Analysis of Mission Equipment Sensor Configuration and Coverage Optimization for Systems Prognostic Health Management
5
Sensor Tradeoffs
As more sensors are added to your system:
TRADEOFFS
GOOD:
- AO increases
- P(FDI) increases
- P(Prog) increases
- MTBUMA increases
- MTTR decreases
- Life cost decreases
BAD:
- weight increases
- volume increases
- power increases
- cabling increases
- AUPC increases
- R(t) decreases
6
PHM Optimization
[Figure: Operational Availability (AO) and Cost ($), shown as AUPC and LCC curves, plotted against the number of sensors N, with the optimum marked.]
* Other metrics will include Weight/Volume, Power (K/W), Specants (Computing Power)
7
PHM Optimization
[Figure: Probability of Detection of Crack plotted against x, the mean distance between sensors on a structural element. The axis runs from x = 0 (N = # of sensors = ∞) to large x (N = 0), with the optimum solution marked.]
8
For a common LRU (such as an engine), plotting engine power against an environmental measure (such as temperature) over time:
[Figure: Engine Power and Engine Internal Temp. plotted against Time, with lines at the 100%, 125%, 150%, and 200% spec limits and regions labeled No Damage, Mild Damage, Moderate Damage, and Severe Damage.]
9
Estimating the damage accumulated (or life consumed)
[Figure: damage regions separated by the 150% and 200% specification limits, each labeled with a life-consumption multiplier.]
- Mild Damage: x1
- Moderate Damage: x2
- Severe Damage: x4 (or more)
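The multipliers above can be turned into a rough life-consumption tally. The following is a minimal sketch, not the author's method. It assumes (the slide does not state this explicitly) that operation below the 150% limit counts as mild damage (x1), between 150% and 200% as moderate (x2), and at or above 200% as severe (x4). The function names and the sample usage log are hypothetical.

```python
def damage_multiplier(load_pct):
    """Return the life-consumption multiplier for a load given as % of spec limit.

    Assumed mapping (not stated in the slides): <150% mild, 150-200% moderate,
    >=200% severe.
    """
    if load_pct < 150.0:
        return 1.0  # mild damage: life consumed at the nominal rate
    if load_pct < 200.0:
        return 2.0  # moderate damage: life consumed twice as fast
    return 4.0      # severe damage: "x4 (or more)", taken here as exactly 4


def life_consumed(usage_log):
    """Sum equivalent operating hours over (hours, load_pct) records."""
    return sum(hours * damage_multiplier(load_pct) for hours, load_pct in usage_log)


# Hypothetical usage log: (operating hours, load as % of spec limit)
usage_log = [(10.0, 110.0), (2.0, 160.0), (0.5, 210.0)]
equivalent_hours = life_consumed(usage_log)
print(equivalent_hours)
```

Here 12.5 clock hours of operation consume 16 equivalent hours of life, because the time spent above the specification limits is weighted more heavily.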
Hypothetical engine air/oil/fuel filter performance over its life
[Figure: filter performance (flow rate) plotted against filter life (in miles). Performance bands are labeled optimal performance, acceptable performance, degraded performance, and hazardous performance. Annotations mark where engine system damage is likely and where engine system failure is likely, along with the MTBF and the distribution of failure times.]
11
Common LRU used on multiple vehicle types with platform specific (hidden) failure modes
[Figure: bar chart of Failure Rate for the LRU on Platform 1, Platform 2, Platform 3, and Platform 4. Platform 4 shows a statistically significant difference.]
Why is the Failure Rate for the LRU in Platform 4 higher?
What is different about Platform 4?
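One way to check whether a platform's failure rate really differs from the rest of the fleet is a two-sample test on failure counts. The sketch below uses a normal approximation for comparing two Poisson rates. This is a swapped-in textbook technique, not something the slides specify, and the failure counts and operating hours are invented purely for illustration.

```python
import math


def rate_difference_z_test(failures_a, hours_a, failures_b, hours_b):
    """Two-sided z-test for equality of two failure rates (normal approximation).

    Assumes failures follow a Poisson process with a constant rate per group.
    Returns (z statistic, two-sided p-value).
    """
    rate_a = failures_a / hours_a
    rate_b = failures_b / hours_b
    # Under the null hypothesis both groups share one pooled rate.
    pooled_rate = (failures_a + failures_b) / (hours_a + hours_b)
    std_error = math.sqrt(pooled_rate * (1.0 / hours_a + 1.0 / hours_b))
    z = (rate_a - rate_b) / std_error
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical data: Platform 4 versus Platforms 1-3 combined.
z, p = rate_difference_z_test(failures_a=30, hours_a=10_000,
                              failures_b=45, hours_b=30_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value only says the difference is unlikely to be chance. It does not answer the slide's real question, which is what is physically different about Platform 4.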
12
Standard oil filter used in engines across FCS vehicles, replaced at a scheduled time/miles
[Figure: Wearout / Life Consumption plotted against Time for Vehicle 1, Vehicle 2, and Vehicle 3, comparing the Scheduled Replacement time with the Actual Condition of the oil filter. Annotations: wasted filter life, correct action, increased engine life consumption.]
13
Standard structural element across several vehicles (under cyclic loading)
Time (or miles, or load cycles, or on/off cycles, …)
[Figure: Damage Accumulation plotted against time for several LRU life histories, compared with a fleet-based estimate. One history is annotated "Repair needed before estimate".]
14
The MULE Program
Future Combat Systems Multifunction Utility/Logistics Equipment
15
Keys to the Success of FCS
• Reducing Logistics footprint
• Increasing Availability
• Reducing Total Cost of Ownership
• Implementing Performance Based Logistics
• Improvements in the ‘ilities’ (RAM-T)
  – Reliability
  – Availability
  – Maintainability
  – Testability
  – Supportability
16
Prognostics
Of or relating to prediction; a sign of a future happening; a portent.
The process of calculating an estimate of remaining useful life for a component, within sufficient time to repair or replace it before failure occurs.
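As an illustration of the second definition, remaining useful life can be estimated by fitting a trend to a damage indicator and extrapolating to a failure threshold. The sketch below assumes a linear wear trend and a known threshold. Both are simplifying assumptions and are not taken from the slides, and the function name and sample readings are hypothetical.

```python
def remaining_useful_life(times, damage, failure_threshold):
    """Estimate remaining life by least-squares linear extrapolation.

    times: observation times (hours, miles, cycles, ...), in increasing order
    damage: damage indicator at each time, same units as failure_threshold
    Returns the estimated time left after the last observation, or None if
    the fitted trend is not increasing.
    """
    n = len(times)
    mean_t = sum(times) / n
    mean_d = sum(damage) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(times, damage))
    slope = sxy / sxx
    if slope <= 0:
        return None  # no measurable wear trend, so no extrapolation is possible
    intercept = mean_d - slope * mean_t
    time_at_failure = (failure_threshold - intercept) / slope
    return max(0.0, time_at_failure - times[-1])


# Hypothetical readings: damage fraction observed every 100 hours.
rul = remaining_useful_life(times=[0, 100, 200, 300],
                            damage=[0.0, 0.1, 0.2, 0.3],
                            failure_threshold=1.0)
print(rul)
```

The "within sufficient time" part of the definition then becomes a comparison between this estimate and the lead time needed to schedule the repair or obtain the replacement part.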
17
Prognostic Health Management (PHM)
PHM is the integrated system of sensors which:
• Monitors system health, status and performance
• Tracks system consumables
  – oil, batteries, filters, ammunition, fuel…
• Tracks system configuration
  – software versions, component life history…
• Isolates faults/failures to their root causes
• Calculates remaining life of components
18
Diagnostics
The identification of a fault or failure condition of an element, component, sub-system or system, combined with the deduction of the lowest measurable cause of that condition through confirmation, localization, and isolation.
• Confirmation is the process of validation that a failure/fault has occurred, the filtering of false alarms, and assessment of intermittent behavior.
• Localization is the process of restricting a failure to a subset of possible causes.
• Isolation is the process of identifying a specific cause of failure, down to the smallest possible ambiguity group.
19
Faults and Failures
Fault: A condition that reduces an element’s ability to perform its required function at desired levels, or degrades performance.
Failure: The inability of a component, sub-system or system to perform its intended function. Failure may be the result of one or more faults.
Failure Cascade: The result when a failure occurs in a system where the successful operation of a component depends on a preceding component, so that one failure can trigger the failure of successive parts and amplify the result or impact.
20
Classes of Failures
Design Failures: These take place due to inherent errors or flaws in the system design.
Infant Mortality Failures: These cause newly manufactured systems to fail, and can generally be attributed to errors in the manufacturing process, or poor material quality control.
Random Failures: These can occur at any time during the entire life of a system. Electrical systems are more likely to fail in this manner.
Wear-Out Failures: As a system ages, degradation will cause systems to fail. Mechanical systems are more likely to fail in this manner.
21
The Ultimate Goal of Prognostics
The aim of Prognostics is to maximize system availability and life consumption while minimizing Logistical Downtime and Mean Time To Repair, by predicting failures before they occur. This is a notional diagram indicative of a wear out failure.
22
What is PHM?
Prognostic Health Management (PHM) is the integrated hardware and software system which:
• Monitors system health, status and performance
• Tracks system consumables
  – oil, batteries, filters, ammunition, fuel…
• Tracks system configuration
  – software versions, component life history…
• Diagnoses/Isolates faults/failures to their root causes
• Calculates remaining life of components
• Predicts failures before they occur
• Continually updates predictive models with failure data
23
What is PHM?
Prognostic Health Management is a methodology for establishing system status and health, and projecting remaining life and future operational condition, by comparing sensor-based operational parameters to threshold values within knowledge base models. These PHM models utilize predictive diagnostics, fault isolation and corroboration algorithms, and knowledge of the operational history of the system, allowing users to make appropriate decisions about maintenance actions based on system health, logistics and supportability concerns and operational demands, to optimize such characteristics as availability or operational cost.
24
PHM Stakeholders
SYSTEMS ENGINEERING
SOFTWARE & SIMULATION
TEST ENGINEERING
MECHANICAL ENGINEERING
ELECTRICAL ENGINEERING
TRAINING & PROD. SUPP.
PHM Model Design
Interface Management
Requirements Development
Sensor Optimization
CAIV/WAIV Analysis
Prognostic Trending
System Architecture
PHM Model Integration
Software Interfaces
Fault/Failure Simulation
Continuous BIT/PHM
Test Planning
Fault/Failure Criticality
Fault/Failure Propagation
Fault/Failure Simulation
Platform Integration
Crack Growth Sensing
Stress/Strain Sensing
Corrosion Sensing
Vibration Sensing
Consumables Monitoring
Acoustic Sensing
Thermal Sensing
Sensor Implementation
Sensor Integration
Data Management
Data Architecture
Reliability/Failure Modes
Maintainability & Testability
Logistics & Sustainment
Training
Safety
25
PHM Design Methodology
30
Availability Analysis
• Availability, Achieved
$$A_A = \frac{\text{Up Time}}{\text{Up Time} + \text{Down Time}} = \frac{MTBF}{MTBF + MTTR}$$
where
MTBF = Mean Time Between Failure
MTTR = Mean Time To Repair
31
Availability Analysis
• Availability, Operational
$$A_O = \frac{\text{Up Time}}{\text{Up Time} + \text{Down Time}} = \frac{MTBUMA}{MTBUMA + ALDT + MTTR}$$
where
MTBUMA = Mean Time Between Unscheduled Maintenance Actions
ALDT = Administrative Logistical Down Time
MTTR = Mean Time To Repair
32
Availability Analysis
• MTBUMA = Mean Time Between Unscheduled Maintenance Actions
$$MTBUMA = \frac{1}{\dfrac{1}{MTBF} + \dfrac{1}{MTBM_{\text{induced}}} + \dfrac{1}{MTBM_{\text{no defect}}}}$$
where
MTBF = Mean Time Between Failures
MTBM = Mean Time Between Maintenance (induced, or with no defect found)
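The availability formulas on these slides chain together: MTBUMA combines the three maintenance-action rates, and operational availability then divides MTBUMA by MTBUMA plus ALDT plus MTTR. Below is a minimal sketch of that calculation as read from the slides. The function names and the sample numbers (all in hours) are hypothetical.

```python
def achieved_availability(mtbf, mttr):
    """A_A = MTBF / (MTBF + MTTR)."""
    return mtbf / (mtbf + mttr)


def mean_time_between_unscheduled_maintenance(mtbf, mtbm_induced, mtbm_no_defect):
    """MTBUMA: reciprocal of the summed rates of failures, induced
    maintenance actions, and no-defect maintenance actions."""
    return 1.0 / (1.0 / mtbf + 1.0 / mtbm_induced + 1.0 / mtbm_no_defect)


def operational_availability(mtbuma, aldt, mttr):
    """A_O = MTBUMA / (MTBUMA + ALDT + MTTR)."""
    return mtbuma / (mtbuma + aldt + mttr)


# Hypothetical inputs, in hours.
mtbf, mtbm_induced, mtbm_no_defect = 500.0, 2000.0, 2000.0
aldt, mttr = 24.0, 2.0

mtbuma = mean_time_between_unscheduled_maintenance(mtbf, mtbm_induced, mtbm_no_defect)
a_a = achieved_availability(mtbf, mttr)
a_o = operational_availability(mtbuma, aldt, mttr)
print(f"MTBUMA = {mtbuma:.1f} h, A_A = {a_a:.3f}, A_O = {a_o:.3f}")
```

With these inputs A_O comes out well below A_A, because A_O also counts logistics delay and the maintenance actions that were induced or found no defect.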
33
Availability Analysis
• How can we improve AO?
- By decreasing Administrative & Logistical Down Time (ALDT)
- By increasing Mean Time Between Failures (MTBF)
- By decreasing Mean Time To Repair (MTTR)
- By increasing Mean Time Between Unscheduled Maintenance Actions (MTBUMA) – [by increasing MTBM induced and MTBM no defect]
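To see how these levers compare, each can be varied in turn against a fixed baseline. The following sketch does that using the operational availability relationship AO = MTBUMA / (MTBUMA + ALDT + MTTR). The baseline values and scenario names are hypothetical and chosen only to illustrate the comparison.

```python
def operational_availability(mtbuma, aldt, mttr):
    """A_O = MTBUMA / (MTBUMA + ALDT + MTTR), all inputs in hours."""
    return mtbuma / (mtbuma + aldt + mttr)


# Hypothetical baseline, in hours.
baseline = {"mtbuma": 300.0, "aldt": 40.0, "mttr": 10.0}

# Each scenario overrides one baseline parameter.
scenarios = {
    "baseline": {},
    "halve ALDT": {"aldt": 20.0},
    "halve MTTR": {"mttr": 5.0},
    "double MTBUMA": {"mtbuma": 600.0},
}

results = {
    name: operational_availability(**{**baseline, **change})
    for name, change in scenarios.items()
}

for name, ao in results.items():
    print(f"{name}: A_O = {ao:.3f}")
```

For this particular baseline, doubling MTBUMA helps most and halving MTTR helps least, but the ranking depends entirely on the starting values, so the comparison has to be rerun with real program data.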
34
Availability Analysis
• How can we decrease ALDT?
- By improving Logistics
  – Improve scheduling of inspections
  – Improve commonality of parts
  – Decrease time to get replacements
- By improving Prognostics
  – Replace parts before they fail, not after
  – Maximize use of component life
  – Improve off-board prognostics trending
  – More sensors!!
35
Availability Analysis
• How can we increase MTBF?
- By improving Reliability
  – Select more rugged components
  – Improve life screening and testing
  – Improve thermal management
- By improving Quality
  – Better parts screening
  – Better manufacturing processes
- By adding Redundancy
  – At the cost of Size, Weight and Power!
36
Availability Analysis
• How can we decrease MTTR?
- By improving Maintainability
  – Improve quality and efficacy of training
  – Simplify fault isolation
  – Decrease number of tools and special equipment
  – Decrease access time (panels, connectors…)
  – Improve Preventative Maintenance
- By improving Diagnostics
  – Improve BIT and BITE
  – Decrease ambiguity group size
  – Improve maintenance manuals and training
37
Availability Analysis
• How can we increase MTBM (induced/no defect)?
- By improving Safety
  – Limit the potential for accidental damage
- By improving Prognostics
  – Improve PHM models to monitor induced damage
- By improving Diagnostics
  – Lower the false alarm rate
  – Don’t repair/replace things which aren’t broken!