
Process Monitoring and Diagnosis: A Model-Based Approach

Daniel Dvorak, AT&T Bell Laboratories
Benjamin Kuipers, University of Texas at Austin

In a book with the curious title Normal Accidents,¹ Charles Perrow examines several of the most notable accidents involving complex systems in the modern industrial world, such as the 1979 Three Mile Island nuclear power accident, the 1977 New York City blackout, and the 1969 Texas City explosion of a butadiene refining unit. Perrow highlights the difficult job of plant operators who are responsible for physical systems with complex interactions and tight coupling. With current monitoring technology, alarms are triggered whenever fixed thresholds are exceeded. A nuclear power plant, for example, can have over a thousand distinct alarms, and hundreds of them can be activated within a minute, as in a loss-of-coolant accident. In such situations, process operators tend to overlook relevant information, respond too slowly, and panic when the rate of information flow is too great.

Process plants must deal with changing states, multiple faults, and incomplete and unreliable measurements. Our model-based advisory system exploits three recent technologies to improve process monitoring and diagnosis.

Not surprisingly, operator advisory systems have become an important area of application for expert systems. ESCORT² (an expert system for complex operations in real time) and REALM³ (a reactor emergency action level monitor) are two of many expert systems developed for process industries. These systems aim to reduce the cognitive load on operators, usually by helping to diagnose the cause of alarms and possibly by suggesting corrective actions. Most of these expert systems get their knowledge of symptoms, faults, and corrective actions through the usual process of codifying human expertise in rules or decision trees. The problem, as with all expert systems, is reliability. As Denning observes,⁴

“The trial-and-error process by which knowledge is elicited, programmed, and tested is likely to produce inconsistent and incomplete databases; hence, an expert system may exhibit important gaps in knowledge at unexpected times.”

Obviously, these “gaps in knowledge” can have serious consequences in some process industries.

An alternate approach, one that is not based on an expert’s imperfect recall of symptoms and faults, is to use a model of the process to predict its behavior or to check consistency among observed variables. When observations disagree with the model’s predictions, some diagnostic technique is initiated to identify the fault candidates. These model-based approaches to diagnosis have emerged from two different communities. In the engineering community, fault detection and isolation techniques generally rely on a precise mathematical model of the process and on pre-enumerated symptom-fault patterns known as fault signatures. In the computer science/artificial intelligence community, model-based diagnosis relies on models of structure and behavior.⁵ For example, given symptoms of misbehavior (as detected by the behavioral model), fault candidates

IEEE Expert, June 1991. 0885-9000/91/0600-0067 $1.00 © 1991 IEEE


Figure 1. The three tasks of process monitoring.

are identified using the structural model by following a dependency chain back from a violated prediction to each component and parameter that contributed to that prediction.

The model-based approach we discuss here has evolved within the AI community, but similar ideas have appeared independently in the fault detection and isolation literature.⁶ The type of model used in any model-based approach determines many of the capabilities and limitations of the specific method. Model types vary a great deal: There are numerical models, dynamic qualitative models, extended signed directed graphs, causal models and confluence equations, fuzzy qualitative models, and this article’s semiquantitative model.

We focus here on process monitoring and diagnosis, the basic elements of an operator advisory system. In this setting, several conditions hold that challenge diagnostic methods. First, the plant is a continuous-variable dynamic system with feedback loops and changing states. Second, diagnosis must be performed while the system operates. Third, many system quantities are not sensed. Finally, measurements are unreliable due to sensor failures.

The basic idea: mimicry

The key cognitive skill for process operators is the formation of a mental model that not only accounts for current observations but also lets the operators predict near-term behavior as well as the effect of possible control actions. This observation underlies our framework for process monitoring, named Mimic. The basic idea is quite simple: Mimic the physical system with a predictive model, and when the system changes behavior due to a fault or repair, change the model accordingly so that it continues to give accurate predictions of expected behavior.

Figure 1 depicts the Mimic framework. Two tasks maintain the model. The tracking task advances the model’s state in step with observations from the physical system. When observations disagree with predictions, Mimic uses model-based diagnosis to determine the possible faults. After identifying a fault, the diagnosis task injects it into the current model so that the predictions will continue to agree with actual observations. To be precise, Mimic maintains a set of candidate models since a given behavior might be caused by one of several faults. Each candidate model represents a possible condition of the system, including its state and faults.
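The interplay of the two model-maintaining tasks can be sketched in a few lines of Python. This is only an illustrative sketch, not Mimic’s implementation: the names `Candidate`, `consistent`, `monitor_step`, and `toy_hypothesize` are hypothetical, a candidate model is reduced to a set of assumed faults plus a predicted range per variable, and the temperature ranges are made-up numbers.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

Interval = Tuple[float, float]                  # (low, high) predicted range
Ranges = Dict[str, Interval]                    # variable name -> predicted range
Candidate = Tuple[FrozenSet[str], Ranges]       # (assumed faults, predictions)


def consistent(ranges: Ranges, obs: Dict[str, float]) -> bool:
    """True if every observed variable lies inside its predicted range.

    Variables without a reading are skipped, so missing data cannot
    cause a discrepancy.
    """
    return all(lo <= obs[v] <= hi for v, (lo, hi) in ranges.items() if v in obs)


def monitor_step(
    tracking_set: List[Candidate],
    obs: Dict[str, float],
    hypothesize: Callable[[Candidate], List[Candidate]],
) -> List[Candidate]:
    """One tracking/diagnosis cycle over the set of candidate models."""
    survivors: List[Candidate] = []
    for cand in tracking_set:
        if consistent(cand[1], obs):
            survivors.append(cand)              # tracking: model still agrees
            continue
        for variant in hypothesize(cand):       # diagnosis: try modified models
            if consistent(variant[1], obs) and variant not in survivors:
                survivors.append(variant)
    return survivors


# Illustrative numbers only (assumed, not taken from the article's simulation).
normal: Candidate = (frozenset(), {"temp": (58.0, 62.0)})


def toy_hypothesize(cand: Candidate) -> List[Candidate]:
    """Hypothetical diagnosis step: propose one new fault per variant."""
    return [
        (cand[0] | {"bad-h1"}, {"temp": (56.0, 60.0)}),
        (cand[0] | {"bad-h2"}, {"temp": (55.0, 59.0)}),
    ]


tracked = monitor_step([normal], {"temp": 57.1}, toy_hypothesize)
print([sorted(faults) for faults, _ in tracked])
```

In this toy run the normal model is dropped because the reading falls outside its range, and both fault variants are kept because each remains consistent with the reading.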

The key benefit of this approach is that we can use the model as our window into the physical system. Specifically, the model can

• detect early deviations from expected behavior more quickly than with fixed-threshold alarms (this method, known as analytical redundancy, uses known analytical relationships among sets of signals to check for mutual consistency);

• predict the values of unobserved variables (signal reconstruction) to permit alarms or other inferences on unseen variables, and to help the operator understand process conditions;

• predict near-term undesirable or hazardous conditions, thus providing early warnings; and

• predict the effect of proposed control actions to test if they will have the desired effect, a valuable capability in complex systems.

The end purpose of monitoring and diagnosis is advice to the operator about what’s happening and what to do about it. The advising task applies expert knowledge of safety conditions, recommended operating procedures, and performance objectives to produce advice in the form of alarms, warnings, and recommended actions. Although we do not discuss it further here, the advising task is a major beneficiary of the model-based approach in that the candidate models (and their tracked states) provide a testbed for generating warnings and for testing proposed control actions.

Three key technologies

Mimic exploits three relatively new technologies: semiquantitative simulation, measurement interpretation (tracking), and model-based diagnosis. These technologies work together in a hypothesize-build-simulate-match cycle, as shown in Figure 2. This figure gives a more detailed view of the tracking and diagnosis tasks of Figure 1.

Semiquantitative simulation. Industrial process plants (such as chemical refineries and nuclear power plants) are examples of continuous-variable dynamic systems. In modern control theory, these dynamic systems are modeled with a set of coupled first-order differential equations consisting of balance equations, physical-chemical state equations, and phenomenological laws. Given a set of initial values, a numerical simulation of the equations yields precise predictions of values for each variable over time.

Oddly enough, standard numeric simulation is too exact and too narrow for our purposes. Real process systems contain much imprecision. Sensors, actuators, and functional units operate within certain tolerances; parameter values and some functional relations are known only approximately. One approach, of course, is to do precise numerical simulation using average values, and then use some form of approximate matching of simulation results to observations. This approach presents two problems. First, given initial conditions, numerical simulation predicts only one behavior from a model, even though more than one might be possible, given the real imprecision. For example, a tiny difference in one parameter can determine whether or not a rocket achieves escape velocity. In effect, numerical simulation makes an inappropriate commitment to a single behavior. In contrast, qualitative simulation guarantees that all possible behaviors will be predicted. This capability is especially important in testing a fault hypothesis, which can exhibit several qualitatively distinct behaviors.

The second problem is the approximate-matching problem: how do you decide, in a principled way, when a difference between prediction and observation is due to imprecision and when it is due to a fault? What we really want is to explicitly express imprecise knowledge as part of the model and have the simulator use it, producing valid ranges for each variable and permitting direct matching of observations to predictions. Semiquantitative simulation provides this capability. Furthermore, when observations match the predictions of two (or more) distinct behaviors, we want to track both hypotheses until they diverge, which Mimic does.

Qualitative simulation,⁷ the foundation for semiquantitative simulation, has two important characteristics for our application. First, it uses a qualitative level of description that lets us express imprecise knowledge. This purely qualitative description uses no numbers (but can take advantage of quantitative information when available). Second, qualitative simulation generates all the qualitatively distinct behaviors that are attainable from a starting state and consistent with the given imprecise knowledge. This property is essential in monitoring a physical system, whether healthy or faulty.

Semiquantitative simulation⁸ takes advantage of quantitative knowledge when it is available, which is always the case in process plants. This knowledge consists of numeric ranges for landmark values (for example, the pressure relief valve opens at 200-210 pounds per square inch) plus envelope functions that define the limits of monotonic relationships (for instance, an approximate relationship between the volume of fluid in a tank and its height). Quantitative values are expressed as numeric ranges and simulated with a modified interval arithmetic. Interval arithmetic is normally subject to an uncertainty explosion, but this problem is avoided because all reasoning takes place with respect to the fixed set of landmarks provided by the qualitative behavior. The resulting semiquantitative simulation retains all the properties of qualitative simulation, but with two additional benefits: (1) it eliminates behaviors that are qualitatively possible but quantitatively inconsistent, and (2) it permits direct comparison of numeric sensor readings to the numeric ranges predicted for each variable. Qsim, which we have used in our research, provides this form of semiquantitative simulation.
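To make the idea of numeric ranges concrete, the following sketch uses ordinary textbook interval arithmetic, which is an assumption on our part: it is not the modified interval arithmetic that Qsim uses. The `Interval` class and the `valve_states` helper are hypothetical names; only the 200-210 psi landmark range comes from the text.

```python
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class Interval:
    """A closed numeric range [lo, hi] standing in for an imprecise value."""
    lo: float
    hi: float

    def __add__(self, other: "Interval") -> "Interval":
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other: "Interval") -> "Interval":
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other: "Interval") -> "Interval":
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return Interval(min(products), max(products))

    def contains(self, x: float) -> bool:
        return self.lo <= x <= self.hi


# Landmark range from the text: the relief valve opens somewhere in 200-210 psi.
RELIEF_OPENS_AT = Interval(200.0, 210.0)


def valve_states(pressure_psi: float) -> Set[str]:
    """Valve states that are consistent with a given pressure.

    Inside the landmark range both states remain possible, which is why
    a semiquantitative state can have more than one successor.
    """
    if pressure_psi < RELIEF_OPENS_AT.lo:
        return {"closed"}
    if pressure_psi > RELIEF_OPENS_AT.hi:
        return {"open"}
    return {"closed", "open"}


print(valve_states(195.0), valve_states(205.0), valve_states(215.0))
print(Interval(1.0, 2.0) - Interval(3.0, 5.0))
```

The subtraction shows how widths add up under naive interval arithmetic, which is the source of the uncertainty explosion the text mentions.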

The version of Mimic we describe here has evolved from an earlier design⁹ that based hypothesis generation on a decision tree and used a less sophisticated form of semiquantitative simulation.


Figure 2. The architecture of Mimic.

Tracking. Mimic seeks to maintain a model whose state and faults (if any) reflect the current state of the physical system. More precisely, it maintains a set of models, each in a state consistent with the most recent observations. This set is called the “tracking set,” and each model in the set embodies different fault hypotheses and therefore represents an alternate interpretation of the system. During diagnosis, Mimic adds models to the tracking set as it generates fault/repair hypotheses. During tracking, Mimic deletes models from the tracking set when their predictions fail to track observations.

Qualitative simulation generates a behavior graph, which is a directed graph of the system’s possible qualitative states and the transitions among them. A behavior is a path through the directed graph and comprises a sequence of states alternating between states representing an instant of time and states representing an interval of time. Tracking, also called measurement interpretation, is the process of using observations to follow a path (a behavior) through the behavior graph.¹⁰

Using the behavior graph fragment in Figure 3, we can describe several details of the process. If a model is in state E, Mimic compares each new set of observations to the values of state E. If the observations match the predictions (that is, if each observation falls within the predicted range), then the model remains in state E. The usual noise filtering of sensor readings should still be performed before matching.

When an observation does not match the current state E, Mimic uses incremental simulation to generate the immediate successor states F, G, and H. (Recall that a state in a semiquantitative simulation can have more than one possible successor state because of the imprecise knowledge expressed in the model.) If, say, the observations match state G, then the model is retained with its state now set to G. Incremental simulation refers to the control that the tracking task exerts over the simulator. When triggered, the simulator generates only the immediate qualitative successor states to the current state. Thus, the simulation is advanced only as needed; it is never “run to completion.”

If the observations do not match any of the immediate successor states, Mimic repeats the incremental simulation and compares the observations to the second-generation successor states (I, J, K, L, M). The model needs this limited-distance look-ahead to jump over instantaneous states that fall between consecutive observations.
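The tracking procedure just described can be sketched as a bounded, breadth-first search that asks the simulator for successors only when needed. This is a minimal sketch under stated assumptions: the function names `matches` and `track`, the edges of the toy graph, and the temperature ranges are all hypothetical; only the state names E through M and the two-generation look-ahead come from the text.

```python
from typing import Callable, Dict, List, Tuple

Interval = Tuple[float, float]
Ranges = Dict[str, Interval]


def matches(ranges: Ranges, obs: Dict[str, float]) -> bool:
    """True if each observed variable falls within its predicted range."""
    return all(lo <= obs[v] <= hi for v, (lo, hi) in ranges.items() if v in obs)


def track(
    state: str,
    obs: Dict[str, float],
    successors: Callable[[str], List[str]],
    predicted: Dict[str, Ranges],
    lookahead: int = 2,
) -> List[str]:
    """Return the states that match obs, expanding successors only as needed.

    An empty result means the model failed to track the observations.
    """
    if matches(predicted[state], obs):
        return [state]                          # remain in the current state
    frontier = [state]
    for _ in range(lookahead):
        # Incremental simulation: generate only the next generation.
        frontier = [nxt for cur in frontier for nxt in successors(cur)]
        hits = [s for s in frontier if matches(predicted[s], obs)]
        if hits:
            return hits
    return []


# Hypothetical behavior-graph fragment and ranges, for illustration only.
GRAPH = {"E": ["F", "G", "H"], "F": ["I", "J"], "G": ["K"], "H": ["L", "M"]}
PREDICTED = {
    "E": {"T": (60, 65)}, "F": {"T": (66, 70)}, "G": {"T": (55, 59)},
    "H": {"T": (50, 54)}, "I": {"T": (71, 75)}, "J": {"T": (76, 80)},
    "K": {"T": (45, 49)}, "L": {"T": (40, 44)}, "M": {"T": (35, 39)},
}


def toy_successors(state: str) -> List[str]:
    return GRAPH.get(state, [])


print(track("E", {"T": 62}, toy_successors, PREDICTED))
print(track("E", {"T": 57}, toy_successors, PREDICTED))
print(track("E", {"T": 47}, toy_successors, PREDICTED))
```

Because `successors` is a callable rather than a precomputed graph, a real simulator could be plugged in without ever building the full set of states.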

Figure 3. Tracking through a behavior graph.

Figure 4. An electric water heater.

Observations can include independent variables, that is, exogenous variables whose values cannot be predicted. When an independent variable changes value, tracking must reinitialize the models in the tracking set using the current observations and most recent predictions for integrated quantities. Thus the simulation, just like the physical system, is made to react to changes in independent variables.

Through progressive step-size refinements of the semiquantitative simulation, we can attain a desired precision for the quantitative predictions.⁸ Specifically, we can use the time interval between qualitative time points to refine the step size of the quantitative simulation, inserting new quantitative states that have time points within that interval. These new quantitative states more precisely bound the predicted behavior.

Mimic never has to generate the full behavior graph (envisionment), a computation that can be prohibitively expensive for complex systems because of the intractable branching problem of qualitative simulation. Instead, Mimic performs incremental simulation to generate only the states in the immediate vicinity of the last tracked state, abandoning any branches that do not track the observations. In effect, the observations act as a filter that eliminates irrelevant branches in the behavior graph.

Model-based diagnosis. Process systems are designed for continuous operation, and are therefore somewhat fault tolerant. Economic pressures to keep a plant in operation often mean that the system will continue running with multiple faults. Thus, single-fault diagnosis is inadequate. However, complete multiple-fault diagnosis is combinatorially explosive and therefore unrealistic for real-time monitoring. As a middle approach, Mimic incrementally constructs and tests multiple-fault hypotheses. Specifically, since sensor measurements are frequent, we assume that only a single new fault (or a single repair) can occur between successive measurements. Thus, Mimic can construct multiple-fault hypotheses, one hypothesis at a time.
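The single-change assumption keeps hypothesis generation linear in the number of known fault modes. The sketch below illustrates the counting argument only; the function name `single_change_variants` and the fault name `stuck-thermostat` are hypothetical, and a hypothesis is reduced to a set of fault names.

```python
from typing import FrozenSet, List, Set


def single_change_variants(
    current: FrozenSet[str], known_faults: Set[str]
) -> List[FrozenSet[str]]:
    """Fault sets reachable from `current` by one new fault or one repair."""
    new_fault = [current | {f} for f in sorted(known_faults - current)]
    repair = [current - {f} for f in sorted(current)]
    return new_fault + repair


known = {"bad-h1", "bad-h2", "stuck-thermostat"}
variants = single_change_variants(frozenset({"bad-h2"}), known)

# Each step considers len(known) variants instead of all 2**len(known) fault sets.
print(len(variants), "variants versus", 2 ** len(known), "possible fault sets")
for v in variants:
    print(sorted(v))
```

A multiple-fault hypothesis such as {bad-h1, bad-h2} is therefore reached in two steps, each of which must survive tracking, rather than being enumerated up front.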

Let’s examine when and how diagnosis occurs. The tracking task discards a model when it finds a discrepancy between predictions and observations. However, before discarding the model, Mimic tries to modify it to bring its predictions into agreement with observations. This is similar in intent to the debug phase of the generate-test-debug paradigm,¹¹ though it and Mimic differ in many other ways. Using the structural model containing components, connections, and parameters, Mimic’s algorithm traces upstream from the site of the discrepancy to identify all components and parameters that could have contributed to the discrepancy (dependency tracing). Assuming that the discrepancies are due to a single new fault or a single new repair, the only suspects are those that can account for all discrepancies. Mimic further checks these suspects for global consistency through constraint-suspension; if it finds no assignment of values consistent with all symptoms, then the suspect is exonerated. For each remaining suspect, each of its other operating modes is tested for compatibility with the observations. Mimic adds to the tracking set whatever model variations survive this test.
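Dependency tracing under the single-change assumption amounts to an upstream graph traversal per discrepancy followed by a set intersection. The sketch below shows only that step; constraint suspension and operating-mode testing are omitted. The names `upstream` and `suspects` and the connectivity in `DEPENDS_ON` are assumptions made for illustration, loosely patterned on the water heater discussed later, and do not reproduce the article’s structural model.

```python
from typing import Dict, Iterable, List, Set


def upstream(node: str, depends_on: Dict[str, List[str]]) -> Set[str]:
    """All nodes that can influence `node`, found by tracing dependencies."""
    seen: Set[str] = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for parent in depends_on.get(current, []):
            if parent not in seen:              # also guards against feedback loops
                seen.add(parent)
                stack.append(parent)
    return seen


def suspects(
    discrepancies: Iterable[str],
    depends_on: Dict[str, List[str]],
    components: Set[str],
) -> Set[str]:
    """Components upstream of every discrepancy (single new fault or repair)."""
    per_symptom = [upstream(d, depends_on) & components for d in discrepancies]
    return set.intersection(*per_symptom) if per_symptom else set()


# Hypothetical dependency structure, for illustration only.
DEPENDS_ON = {
    "temp_reading": ["temp_sensor", "tank_temp"],
    "tank_temp": ["upper_heater", "lower_heater", "measured_flow"],
    "measured_flow": ["flow_sensor"],
    "upper_heater": ["thermostat", "voltage_supply"],
    "lower_heater": ["thermostat", "voltage_supply"],
    "voltage_reading": ["voltage_sensor", "thermostat", "voltage_supply"],
}
COMPONENTS = {
    "temp_sensor", "flow_sensor", "voltage_sensor", "upper_heater",
    "lower_heater", "thermostat", "voltage_supply",
}

print(sorted(suspects(["temp_reading"], DEPENDS_ON, COMPONENTS)))
print(sorted(suspects(["temp_reading", "voltage_reading"], DEPENDS_ON, COMPONENTS)))
```

Note that sensors appear in the graph like any other component, so a faulty sensor is picked up by the same traversal.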

Unlike many diagnostic methods, model-based diagnosis does not rely on a set of symptom-fault patterns. Such patterns are often incomplete, since it is difficult for an expert to anticipate all possible faults and predict their symptoms, especially the symptoms of interacting faults. Even if we collect symptom-fault patterns from exhaustive fault-model simulations, using these patterns is not necessarily more efficient.

Another important property of model-based diagnosis is that it handles failed sensors and missing data naturally, not as a special case. A sensor is just another component that affects an observation; dependency tracing will identify it as a suspect in the usual way. As Scarl observes,¹² model-based reasoning avoids combinatoric problems in handling failed sensors and unavailable data because it matches against predictions rather than symptomatic patterns. If a sensor is bad and thus gives readings different from those predicted, the sensor becomes a suspect simply because it is upstream of the discrepancy. If a datum is unavailable, it is not compared with predictions and therefore cannot cause discrepancies.

An example

To illustrate Mimic at work, consider the electric water heater shown in Figure 4, which we have modeled and tested with Mimic. The water heater has a single thermostat that controls whether or not power is applied to the two heating elements (on-off control). Raw sensor information comes from a temperature sensor near the thermostat, from a flow-rate sensor on the cold-water inlet, and from a voltage sensor on the heating elements. In a real monitoring situation we would want to diagnose various possible faults such as defective heating elements, a stuck thermostat, a faulty flow-rate sensor, and loss of electrical power. However, to keep this example simple, we consider only the possibility of defective heating elements.

The water heater is modeled as a two-compartment model in which two masses of water (upper and lower) are connected with thermal flow and mass flow between them, as shown in Figure 5. Each compartment is treated as well mixed (the temperature is the same everywhere within the compartment). Each compartment’s temperature is affected by five heat flows: heat gain from the heating element, heat loss to the room through the insulating jacket, heat gain due to water inflow, heat loss due to water outflow, and heat transfer through thermal contact with the other compartment. The semiquantitative Qsim model of the water heater contains the usual equations that relate mass, mass flow, heat, heat flow, thermal resistance, and temperature. It also contains numeric ranges for landmark values such as room temperature, inlet water temperature, nominal heating rate, and thermal resistance of the insulation. This model does not require any envelope functions because it does not contain imprecise monotonic relationships.
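One plausible way to write the five heat flows as a single energy balance for compartment i (with neighbor j) is given below. This is a sketch under assumptions, not the equation set of the Qsim model: every symbol is introduced here for illustration, namely the heat capacity C_i of the water mass, heater power P_i, jacket thermal resistance R_ins, specific heat c, mass flow rate \dot{m}, temperature T_in,i of the water entering the compartment, and contact resistance R_ij. It also assumes the mass in each compartment stays constant.

```latex
C_i \frac{dT_i}{dt}
  = \underbrace{P_i}_{\text{heating element}}
  - \underbrace{\frac{T_i - T_{\mathrm{room}}}{R_{\mathrm{ins}}}}_{\text{loss through jacket}}
  + \underbrace{c\,\dot{m}\,T_{\mathrm{in},i}}_{\text{water inflow}}
  - \underbrace{c\,\dot{m}\,T_i}_{\text{water outflow}}
  + \underbrace{\frac{T_j - T_i}{R_{ij}}}_{\text{contact with other compartment}}
```

In a semiquantitative model, quantities such as P_i, R_ins, and T_room would be given as numeric ranges rather than point values, and a failed heating element corresponds to P_i = 0 while power is applied.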

In the normal (fault-free) model, all components (tank, heating elements, thermostat, temperature sensor, flow-rate sensor, voltage sensor, voltage supply) operate according to their intended purposes. In a fault model, a faulty component operates according to a failure mode, such as a heating element that generates no heat when power is applied.

Table 1 summarizes an example of monitoring the water heater, showing how monitoring progresses over eight moments in a series of observations. The observations are simulated from a separate numerical model of a faulty water heater in which the lower heating element produces no heat. For each moment, the table shows what hypotheses have been proposed and what models are being tracked. The water heater begins in a state where the tank’s water is hot, the heating elements are off, no water is flowing, and the temperature is slowly falling. These readings are consistent with the normal model. Now, someone starts to draw water for a bath. A high flow rate is measured, but all other readings remain the same. Since water flow is an independent variable, Mimic reinitializes every tracked model (just the normal model in this case) to reflect the change. Since the normal model remains consistent with the new values, it is retained.

As time continues, the temperature inside the tank drops because of the cooler inlet water. These readings are consistent with the current state of the normal model, so no change occurs to the tracking set. At moment 3 the temperature drops to the point where the heating elements turn on, as observed by the voltage sensor. Since this event is predicted by the normal model as an immediate successor state of the most recently tracked state, the normal model is retained with its state updated.

At moment 4 the temperature continues to drop. Although this observation is qualitatively consistent with the normal model, it is inconsistent with the associated quantitative ranges. In effect, the model is saying that for this flow rate, tank capacity, heating rate, and inlet temperature, the water temperature should not be dropping so fast. Thus, the tracking task discards the normal model. At the same time, this discrepancy triggers dependency tracing, which identifies two possible faults: a bad upper heating element or a bad lower heating element (denoted bad-h1 and bad-h2). This causes Mimic to build two fault models. Both models are successfully initialized, so Mimic is now tracking two models. Since Mimic assumes that only one fault occurs at a time, it does not hypothesize a double fault (bad-h1 and bad-h2). In a more detailed example, other hypotheses would also be proposed, such as a faulty temperature sensor, faulty flow sensor, and faulty thermostat, since all are upstream of the temperature discrepancy.

Figure 5. A structural model of water heater.

The water flow stops at moment 5 (somebody turned off the faucet). With this change in an independent variable, Mimic reinitializes the two models. At moment 6, the temperature is observed to be rising. Mimic then compares the observed temperature to the quantitative predictions of the two models. Because the observed temperature exceeds the range predicted by the bad-h1 model, Mimic discards that model. The predictions of the one remaining model, bad-h2, are compatible with the observations, so the model is retained. This model continues to track future readings, and emerges as the sole fault hypothesis.

Table 1. Diagnosing the water heater based on its dynamic behavior.

Moment                0         1            2              3          4                    5               6            7
Synopsis              Temp hot  Flow starts  Temp dropping  Heater on  Temp still dropping  Flow stops      Temp rising  Heater off
Time (min.)           0.0       1.0          2.0            2.4        2.7                  3.0             13.0         27.7
Flow (liters/min.)    0         30           30             30         30                   0               0
Temperature (deg. C)  64.9      64.8         61.4           58.9       57.1                 55.9            60.0         66.0
Power (on or off)     off       off          off            on         on                   on              on           off
New fault hypotheses  none      none         none           none       bad-h1, bad-h2       none            none         none
New fault models      none      none         none           none       bad-h1, bad-h2       none                         none
Tracked models        normal    normal       normal         normal     bad-h1, bad-h2       bad-h1, bad-h2  bad-h2       bad-h2
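The dynamic behavior in Table 1 is easier to see as rates of temperature change between consecutive moments. The short computation below uses only the time and temperature rows of the table; the variable names are ours, and the rates are simple finite differences rather than anything Mimic computes.

```python
# Time (min.) and temperature (deg. C) rows of Table 1, moments 0 through 7.
times = [0.0, 1.0, 2.0, 2.4, 2.7, 3.0, 13.0, 27.7]
temps = [64.9, 64.8, 61.4, 58.9, 57.1, 55.9, 60.0, 66.0]

# Average rate of change between consecutive moments, in deg. C per minute.
rates = [
    (temps[i + 1] - temps[i]) / (times[i + 1] - times[i])
    for i in range(len(times) - 1)
]

for i, rate in enumerate(rates):
    print(f"moment {i} -> {i + 1}: {rate:+.2f} deg. C/min")
```

Between moments 3 and 4 the temperature is still falling at about 6 deg. C per minute even though power is on, which is the drop the normal model’s quantitative ranges cannot accommodate. After the flow stops, the temperature rises at roughly 0.4 deg. C per minute.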


Advantages

The water heater example shows how Mimic can diagnose a system by observing its dynamic behavior with few observable variables. In general, the speed at which Mimic can narrow its diagnosis depends on the number of monitored variables and the dynamic activity of the system. With more monitored variables and more system activity, more opportunities arise to refute incorrect hypotheses.

Diagnostic systems often rank competing hypotheses by probability, based on the previously known fault probabilities of components. Because Mimic monitors continuously, it can also rank hypotheses by age. The longer a hypothesis survives changing observations, the stronger the evidence supporting that hypothesis. This age-ranking also focuses attention on hypotheses that account for the earliest manifestations of a fault, before other manifestations and corresponding hypotheses appear. In short, the system’s natural time delays help identify the correct hypothesis.
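Age-ranking can be sketched as a sort on how long each hypothesis has survived, with prior fault probability used only to break ties. The tie-breaking rule, the `Hypothesis` record, and all the numbers below are assumptions made for illustration; the text does not say how Mimic combines age with probability.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Hypothesis:
    name: str
    created_at: float   # time (min.) at which it entered the tracking set
    prior: float        # assumed prior fault probability of the component


def rank_by_age(hyps: List[Hypothesis], now: float) -> List[Hypothesis]:
    """Oldest surviving hypotheses first; higher prior breaks ties."""
    return sorted(hyps, key=lambda h: (-(now - h.created_at), -h.prior))


# Hypothetical tracking set, for illustration only.
tracked = [
    Hypothesis("stuck-thermostat", created_at=9.0, prior=0.02),
    Hypothesis("bad-h2", created_at=2.7, prior=0.05),
    Hypothesis("bad-temp-sensor", created_at=2.7, prior=0.10),
]

ranked = rank_by_age(tracked, now=13.0)
print([h.name for h in ranked])
```

The two hypotheses created at 2.7 minutes outrank the newer one regardless of its prior, reflecting the idea that surviving more observations is itself evidence.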

Mimic treats alarms in a new way, in that they are based primarily on the predictions of the models in the tracking set. This has several positive consequences:

• Alarm thresholds can be dynamic rather than fixed, thus allowing earlier alerting.

• Alarms can be based on unobserved variables, permitting more freedom in alarm design.

• Alarms can reveal any mutually inconsistent readings (extreme analytical redundancy).

• Alarms, called forewarnings, can be based on near-future predicted states.

• False alarms due to operating-mode changes (for example, startup or shutdown) should not occur if the model faithfully predicts such dynamic behavior.
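Several of the alarm types listed above can be illustrated with one small function that works from a tracked model’s predicted ranges. This is a sketch under assumptions: the function `model_based_alarms`, the variable names, the safety limits, and all numbers are hypothetical, and a real advising task would draw on expert knowledge that is not modeled here.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]


def model_based_alarms(
    predicted_now: Dict[str, Interval],
    predicted_soon: Dict[str, Interval],
    observed: Dict[str, float],
    safe_max: Dict[str, float],
) -> List[str]:
    """Alarms derived from model predictions rather than fixed thresholds."""
    messages: List[str] = []
    for var, (lo, hi) in predicted_now.items():
        if var in observed:
            # Dynamic threshold: the allowed band is the currently predicted range.
            if not (lo <= observed[var] <= hi):
                messages.append(f"ALARM {var}: reading outside predicted range")
        elif var in safe_max and hi > safe_max[var]:
            # Unobserved variable: rely on the reconstructed (predicted) value.
            messages.append(f"ALARM {var}: unobserved, may exceed safe limit")
    for var, (lo, hi) in predicted_soon.items():
        if var in safe_max and hi > safe_max[var]:
            # Forewarning: a near-future predicted state reaches the limit.
            messages.append(f"FOREWARNING {var}: may exceed safe limit soon")
    return messages


# Illustrative values only.
alarms = model_based_alarms(
    predicted_now={"outlet_temp": (58.0, 62.0), "upper_temp": (70.0, 85.0)},
    predicted_soon={"outlet_temp": (60.0, 95.0)},
    observed={"outlet_temp": 57.1},
    safe_max={"outlet_temp": 90.0, "upper_temp": 80.0},
)
for message in alarms:
    print(message)
```

Because the allowed band for a measured variable moves with the model’s predictions, a predicted startup or shutdown transient widens or shifts the band instead of tripping a fixed threshold.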

Limitations

If Mimic cannot quickly refute invalid hypotheses, the tracking set will grow and Mimic will slow down correspondingly. Mimic refutes hypotheses through tracking: there must be an observed discrepancy between the model’s predictions and the sensor readings. In practice, this means the model’s quantitative predictions must be reasonably well bounded and that there must be an adequate number of well-placed sensors.

Mimic assumes that faults occur one at a time. More precisely, it assumes that the manifestations of different faults appear at different times with respect to its sampling rate. This assumption can be violated in the case of a catastrophic event (such as an explosion) or cascading faults, where discrepancies due to more than one fault can appear simultaneously.

A qualitative simulation algorithm can predict spurious behaviors when the qualitative model does not explicitly preserve an invariant quantity (for example, energy) from a qualitative state to its successors. If the manifestations of an actual fault happen to match a predicted but spurious behavior, then the system will find no discrepancy and the fault will go undetected. However, this problem has been substantially reduced by introducing several global constraints, that is, constraints that eliminate spurious behaviors through global consistency checks, such as applying the nonintersection constraint to trajectories in qualitative phase space, automatically deriving energy constraints by recognizing conservative and nonconservative forces, and using higher-order derivative constraints.

THE SPEED AT WHICH MIMIC CAN NARROW ITS DIAGNOSIS DEPENDS ON THE NUMBER OF MONITORED VARIABLES AND THE DYNAMIC ACTIVITY OF THE SYSTEM.

The Qsim algorithm guarantees that all behaviors are predicted, and only under a qualitative level of description does this give a tractable set of possibilities. In simple cases such as the water heater, this is tractable in practice as well as in theory. For more complex systems, controlling the size of the hypothesis set is still a potential problem.

Related work

Kay13 has demonstrated the Mimic approach in monitoring the pump-down phase of a vacuum system for semiconductor fabrication, which requires ultrahigh vacuums (10^-9 Torr). Since no practical theory exists for the sorption of gases, it is difficult to model the process numerically. Kay’s semiquantitative model, with dynamic envelopes that bound the expected observations, permits reasoning with uncertainties and still achieves detection of faults early in the pump-down phase.

Abbott’s approach to monitoring and diagnosis,14 like Mimic, takes advantage of the sequence in which symptoms appear, although the mechanisms differ somewhat. Her Draphys system detects symptoms (discrepancies) by comparing sensor readings to expected values computed from a numerical simulation model of the fault-free system. Fault hypotheses are then generated by tracing upstream from the symptom through a graph model of the paths of interaction among components, tracing both functional and physical paths. As new symptoms appear, Draphys tests each existing hypothesis to see if propagating its effects further downstream in the graph model covers the new symptoms. This latter step is akin to Mimic’s tracking, but at a more abstract level. Specifically, the graph model in Draphys represents only that a fault in one component might affect another component; there is no information about whether the affected sensor should read high or low, nor about time delays in fault propagation. Such information could be used to refute some hypotheses. Mimic can refute some hypotheses because the semiquantitative model it uses during tracking provides such information.
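The graph-based steps described above (upstream tracing to generate candidates, then downstream propagation to test them against new symptoms) can be sketched as follows. This is not Abbott’s code: the component graph, the `AFFECTS` table, and the function names are hypothetical and serve only to show what a purely structural graph can and cannot rule out.

```python
from collections import deque

# Hypothetical plant graph: an edge A -> B means a fault in A might affect B.
AFFECTS = {
    "pump": ["pipe"],
    "pipe": ["tank"],
    "heater": ["tank"],
    "tank": ["level_sensor", "temp_sensor"],
    "level_sensor": [],
    "temp_sensor": [],
}


def downstream(component):
    """All nodes reachable from component by following AFFECTS edges."""
    seen, queue = set(), deque([component])
    while queue:
        node = queue.popleft()
        for successor in AFFECTS.get(node, []):
            if successor not in seen:
                seen.add(successor)
                queue.append(successor)
    return seen


def candidates(symptom):
    """Trace upstream: every component whose effects can reach the symptom."""
    return {c for c in AFFECTS if c == symptom or symptom in downstream(c)}


def filter_by_new_symptom(hypotheses, new_symptom):
    """Keep hypotheses whose downstream effects also cover the new symptom."""
    return {h for h in hypotheses if h == new_symptom or new_symptom in downstream(h)}


initial = candidates("level_sensor")
remaining = filter_by_new_symptom(initial, "temp_sensor")
print(sorted(initial))
print(sorted(remaining))
```

Because the graph carries no sign or timing information, the heater remains a candidate for a level discrepancy, and the second symptom eliminates only the level sensor itself. A model that predicts the direction and delay of each effect has more grounds for refutation.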

Isermann’s model-based approach to process fault diagnosis,6 like Mimic, uses dynamic mathematical models and measurable input and output signals to allow estimation of unmeasurable internal quantities, which can then be used for fault detection. Unlike Mimic, however, Isermann’s models are strictly quantitative and are expected to “describe the process behavior precisely.” The resulting approximate-matching problem (to determine if an observation is “normal”) is handled with a Bayes decision algorithm. After recognizing a symptom, this system classifies the fault by comparing it with fault signatures, which have been established beforehand. Although Isermann’s work uses different methods for simulation, measurement interpretation, and diagnosis, he reaches a conclusion that we share: “Dynamic process behavior yields considerably more information on process faults than can be achieved in the static case.”

Several expert systems have been built that share Mimic’s operational goal: that of relieving some of the monitoring burden from process operators.15 Mimic focuses primarily on determining the physical system’s state, but most of these expert systems have the broader scope of trying to advise the operator on corrective actions. ESCORT,2 for example, gets its knowledge of faults, anomalies, and corrective actions through the usual process of codifying human expertise in rules, rather than by encoding a predictive model of the physical system as Mimic does.

Compared to existing methods based on fixed-threshold alarms, fault dictionaries, decision trees, and expert systems, our method for monitoring and diagnosing process systems has several advantages beyond those we’ve already discussed:

• A semiquantitative model of a physical system can predict all possible behaviors that are consistent with the incomplete and imprecise knowledge of the system’s devices and processes. This ensures, for example, that rare but hazardous behaviors will not be overlooked.

• By using a structural model of the plant and tracing upstream from the site of unmatched observations, model-based diagnosis generates fault candidates efficiently, without resorting to precompiled (and often incomplete) symptom-fault patterns.

• By injecting a hypothesized fault into the model and tracking its predictions against observations, the dynamic behavior of the plant is exploited to corroborate or refute hypotheses.

• By simulating ahead in time from the current state, an operator can be warned of nearby undesirable states that the plant might enter. Similarly, the effects of proposed control actions can be determined by simulating from the current state.

WITH MORE MONITORED VARIABLES AND MORE SYSTEM ACTIVITY, MORE OPPORTUNITIES ARISE TO REFUTE INCORRECT HYPOTHESES.

The three foundational technologies that support Mimic’s method of monitoring and diagnosing process systems (semiquantitative simulation, tracking, and model-based diagnosis) continue to be active areas of research. Mimic stands to inherit many benefits from this research. For example, recent research8 has improved quantitative reasoning mechanisms to provide tighter bounds on predictions of semiquantitative models. Also, research on reasoning about energy16 has eliminated an important source of spurious behaviors in qualitative simulation.

An important task that we have not discussed is model building. Model-based reasoning can and should be decomposed into a model-building task, which creates semiquantitative differential equations from a higher-level description of a physical system, and a simulation task, which takes the equations and predicts possible behaviors. Although we described our example model at the level of semiquantitative differential equations used in Qsim, it is usually more convenient to describe a model at a higher level of abstraction. For example, a device ontology17 views a system as a collection of interconnected devices (such as tanks, pumps, and pipes), while a process ontology18 views a system as a set of processes (such as liquid flow and heat flow) plus the preconditions that enable each process. These popular ontologies can be compiled into the qualitative mathematics of Qsim, but additional work on automated model building is needed to add partial quantitative information and permit automatic injection of faults.

Acknowledgments

We thank the guest editors and anonymous referees for pointing out several shortcomings in the earlier version of this article. Daniel Dvorak has been supported by the AT&T Doctoral Support Program. Benjamin Kuipers has been supported in part by NSF grants IRI-8602665, IRI-8905494, and IRI-8904454, by NASA grants NAG 2-507 and NAG 9-200, and by the Texas Advanced Research Program under grant 003658175.

References

1. C. Perrow, Normal Accidents, Basic Books, New York, 1984.

2. P.A. Sachs, A.M. Paterson, and M.H.M. Turner, “ESCORT - An Expert System for Complex Operations in Real Time,” Expert Systems, Vol. 3, No. 1, Jan. 1986, pp. 22-29.

3. R.A. Touchton, “Emergency Classification: A Real-Time Expert System Application,” Proc. Southcon 1986, Electronics Conventions Management, Los Angeles, 1986, pp. 2,321-2,323.

4. P.J. Denning, “Towards a Science of Expert Systems,” IEEE Expert, Vol. 1, No. 2, Summer 1986, pp. 80-83.

5. R. Davis and W. Hamscher, “Model-Based Reasoning: Troubleshooting,” in Exploring Artificial Intelligence: Survey Talks from the Nat’l Conferences on Artificial Intelligence, H.E. Shrobe, ed., MIT Press, Cambridge, Mass., 1988, pp. 297-346.

6. R. Isermann, “Process Fault Diagnosis Based on Dynamic Models and Parameter Estimation Methods,” in Fault Diagnosis in Dynamic Systems: Theory and Applications, Chapter 7, R. Patton, P. Frank, and R. Clark, eds., Prentice Hall, Englewood Cliffs, N.J., 1989.

7. B. Kuipers, “Qualitative Simulation,” Artificial Intelligence, Vol. 29, No. 3, Sept. 1986, pp. 289-338.

8. D. Berleant and B. Kuipers, “Qualitative-Numeric Simulation with Q3,” to appear in Recent Advances in Qualitative Physics, B. Faltings and P. Struss, eds., MIT Press, Cambridge, Mass., 1991.

9. D. Dvorak and B. Kuipers, “Model-Based Monitoring of Dynamic Systems,” in Proc. 11th Int’l Joint Conf. Artificial Intelligence (IJCAI 89), Morgan Kaufmann, San Mateo, Calif., 1989, pp. 1,238-1,243.

10. K.D. Forbus, “Interpreting Measurements of Physical Systems,” Proc. Fifth Nat’l Conf. Artificial Intelligence (AAAI 86), MIT Press, Cambridge, Mass., 1986, pp. 113-117.

11. R. Simmons and R. Davis, “Generate, Test, and Debug: Combining Associational Rules and Causal Models,” Proc. 10th Int’l Joint Conf. Artificial Intelligence (IJCAI 87), Morgan Kaufmann, San Mateo, Calif., 1987, pp. 1,071-1,078.


12. E. Scarl, “Sensor Failure and Missing Data: Further Inducements for Reasoning with Models,” Proc. 1989 AAAI Workshop on Model-Based Reasoning, 1989, pp. 1-6, available from E. Scarl, ed., Boeing Computer Services, M/S 7L-64, Seattle, Wash.

13. H. Kay, Monitoring and Diagnosis of Multitank Flows Using Qualitative Reasoning, master’s thesis, Univ. of Texas at Austin, 1990.

14. K. Abbott, Robust Fault Diagnosis of Physical Systems in Operation, doctoral dissertation, Rutgers Univ., New Brunswick, N.J., 1990.

15. D. Dvorak, “Expert Systems for Monitoring and Control,” Tech. Report AI87-55, Dept. of Computer Sciences, Univ. of Texas at Austin, 1987.

16. P. Fouche and B. Kuipers, “Reasoning About Energy in Qualitative Simulation,” to be published in IEEE Trans. Systems, Man, and Cybernetics, 1991.

17. J. de Kleer and J.S. Brown, “A Qualitative Physics Based on Confluences,” in Qualitative Reasoning about Physical Systems, D.G. Bobrow, ed., MIT Press, Cambridge, Mass., 1985, pp. 7-83.

18. K.D. Forbus, “Qualitative Process Theory,” in Qualitative Reasoning about Physical Systems, D.G. Bobrow, ed., MIT Press, Cambridge, Mass., 1985, pp. 85-168.

Further reading

D. Dalle Molle, Qualitative Simulation of Dynamic Chemical Processes, doctoral dissertation, Univ. of Texas at Austin, 1989.

Fault Diagnosis in Dynamic Systems: Theory and Applications, R. Patton, P. Frank, and R. Clark, eds., Prentice Hall, Englewood Cliffs, N.J., 1989.

T.J. Laffey et al., “Real-Time Knowledge-Based Systems,” AI Magazine, Vol. 9, No. 1, 1988, pp. 27-45.


O.O. Oyeleye and M.A. Kramer, “Qualitative Simulation of Chemical Process Systems: Steady-State Analysis,” AIChE J., Vol. 34, No. 9, Sept. 1988.

Q. Shen and R. Leitch, “Synchronized Qualitative Simulation in Diagnosis,” to be published in Working Papers Fifth Int’l Workshop Qualitative Reasoning about Physical Systems, AI Lab, Univ. of Texas at Austin, 1991.

V. Venkatasubramanian and S.H. Rich, “An Object-Oriented Two-Tier Architecture for Integrating Compiled and Deep-Level Knowledge for Process Diagnosis,” Computers and Chemical Eng., Vol. 12, No. 9/10, 1988, pp. 903-921.


Daniel Dvorak is a distinguished member of technical staff at AT&T Bell Laboratories and a doctoral candidate in computer science at the University of Texas at Austin. His research interests include model-based reasoning, qualitative reasoning, and case-based reasoning. Recent work has focused on knowledge-based monitoring of the UUCP computer network.

Dvorak received his BS in electrical engineering from Rose-Hulman Institute of Technology in 1972 and his MS in computer engineering from Stanford University in 1974. He is a member of the IEEE Computer Society, IEEE, AAAI, and Computer Professionals for Social Responsibility.

His address is AT&T Bell Laboratories, 2000 N. Naperville Rd., Naperville, IL 60566-7033; e-mail, [email protected]

Benjamin Kuipers is an associate professor of computer science at the University of Texas at Austin and a new member of IEEE Expert’s editorial board (his photo appears on p. 2). His research interests include commonsense knowledge; qualitative reasoning with incomplete knowledge; resource-limited inference; and spatial exploration, learning, and problem solving.

Kuipers received his BA in mathematics from Swarthmore College in 1970 and his PhD in mathematics from the Massachusetts Institute of Technology in 1977. He is a member of AAAI, ACM, the Cognitive Science Society, and the New York Academy of Sciences.

His address is Department of Computer Science, University of Texas at Austin, Austin, TX 78712; e-mail, [email protected]