Integrating model-based diagnosis techniques into current...

25
99 Integrating model-based diagnosis techniques into current work processes – three case studies from the INDIA project Heiko Milde a,* , Thomas Guckenbiehl b , Andreas Malik c,1 , Bernd Neumann a and Peter Struss d a Laboratory for Artificial Intelligence, University of Hamburg, Vogt-Koelln-Str. 30, 22527 Hamburg, Germany E-mail: {milde, neumann}@informatik. uni-hamburg.de b Fraunhofer-Institut IITB, Fraunhoferstr. 1, 76131 Karlsruhe, Germany E-mail: [email protected] c ESG Elektroniksystem- und Logistik-GmbH, Germany E-mail: [email protected] d Technical University of Munich, Department of Computer Science, Orleansstr. 34, 81667 Munich, Germany E-mail: [email protected] Although the area of model-based diagnosis has developed a number of prototypes with impressive features that promised economic impact and, hence, caught industrial interest, the number of actual industrial applications is still close to zero. One of the reasons is that the successful techniques have not yet been turned into tools that reflect and support the current diagnostic work processes and their existing tools. The INDIA project joined eight German partners (research groups, software suppliers, and end users) in an attempt to take a major step in the transfer of model-based diagnosis techniques into industrial applications. This paper describes part of the work carried out in this project. Rather than pre- senting the theoretical foundations of the techniques in depth, we focus on the aspect of how model-based diagnostic tech- niques can be related to established tools and systems in or- der to provide some leverage for today’s work processes and to change them gradually, as opposed to postulating a radi- cal change in current practice and organizational structures. * Corresponding author. 1 Also affiliated with Robert Bosch GmbH during the project. From this perspective, we discuss the utilization of model- based techniques for the generation of fault trees for on-line testing and diagnosis of fork lifters, generation of test plans for an intelligent authoring system for car diagnosis manu- als, and the exploitation of existing state-chart process de- scriptions for post-mortem diagnosis of processes in a dyeing plant. 1. Introduction Research on model-based diagnosis has claimed for some time that its methods promise applications with attractive problem-solving capabilities and significant economical advantages. Reviewing the attractive fea- tures of model-based diagnosis, the main benefits are connected with the compositionality and transparency of the model, from which diagnosis knowledge can be generated. Compositionality bears the potential for re- using components, building component libraries and inheritance hierarchies alleviating version control and easing modifications. The transparency of component- based behavior descriptions may add further benefits, including complexity management, exploitation of in- formation from the design phase, and a large degree of compatibility with other life-cycle product data in- cluding documentation. Hence, by exploiting model- ing techniques important benefits can be gained from model-based diagnosis technology. In fact, the field has not only made these claims, but has recently provided some evidence for their correct- ness by producing prototypes for significant applica- tion areas [5,26]. Despite all this, transfer of the tech- nology encounters a number of difficulties and is not progressing as fast as expected or hoped for. There are a number of reasons for this experience. One of them is certainly the fact that applying model-based techniques requires significant initial efforts, mainly for produc- ing the appropriate models. Furthermore, in order to AI Communications 13 (2000) 99–123 ISSN 0921-7126 / $8.00 2000, IOS Press. All rights reserved

Transcript of Integrating model-based diagnosis techniques into current...

  • 99

    Integrating model-based diagnosis techniquesinto current work processes – three casestudies from the INDIA project

    Heiko Mildea,∗, Thomas Guckenbiehlb,Andreas Malikc,1, Bernd Neumanna

    and Peter StrussdaLaboratory for Artificial Intelligence, University ofHamburg, Vogt-Koelln-Str. 30, 22527 Hamburg,GermanyE-mail: {milde, neumann}@informatik.uni-hamburg.deb Fraunhofer-Institut IITB, Fraunhoferstr. 1, 76131Karlsruhe, GermanyE-mail: [email protected] ESG Elektroniksystem- und Logistik-GmbH,GermanyE-mail: [email protected] Technical University of Munich, Department ofComputer Science, Orleansstr. 34, 81667 Munich,GermanyE-mail: [email protected]

    Although the area of model-based diagnosis has developed anumber of prototypes with impressive features that promisedeconomic impact and, hence, caught industrial interest, thenumber of actual industrial applications is still close to zero.One of the reasons is that the successful techniques havenot yet been turned into tools that reflect and support thecurrent diagnostic work processes and their existing tools.The INDIA project joined eight German partners (researchgroups, software suppliers, and end users) in an attempt totake a major step in the transfer of model-based diagnosistechniques into industrial applications. This paper describespart of the work carried out in this project. Rather than pre-senting the theoretical foundations of the techniques in depth,we focus on the aspect of how model-based diagnostic tech-niques can be related to established tools and systems in or-der to provide some leverage for today’s work processes andto change them gradually, as opposed to postulating a radi-cal change in current practice and organizational structures.

    * Corresponding author.1Also affiliated with Robert Bosch GmbH during the project.

    From this perspective, we discuss the utilization of model-based techniques for the generation of fault trees for on-linetesting and diagnosis of fork lifters, generation of test plansfor an intelligent authoring system for car diagnosis manu-als, and the exploitation of existing state-chart process de-scriptions for post-mortem diagnosis of processes in a dyeingplant.

    1. Introduction

    Research on model-based diagnosis has claimed forsome time that its methods promise applications withattractive problem-solving capabilities and significanteconomical advantages. Reviewing the attractive fea-tures of model-based diagnosis, the main benefits areconnected with the compositionality and transparencyof the model, from which diagnosis knowledge can begenerated. Compositionality bears the potential for re-using components, building component libraries andinheritance hierarchies alleviating version control andeasing modifications. The transparency of component-based behavior descriptions may add further benefits,including complexity management, exploitation of in-formation from the design phase, and a large degreeof compatibility with other life-cycle product data in-cluding documentation. Hence, by exploiting model-ing techniques important benefits can be gained frommodel-based diagnosis technology.

    In fact, the field has not only made these claims, buthas recently provided some evidence for their correct-ness by producing prototypes for significant applica-tion areas [5,26]. Despite all this, transfer of the tech-nology encounters a number of difficulties and is notprogressing as fast as expected or hoped for. There area number of reasons for this experience. One of them iscertainly the fact that applying model-based techniquesrequires significant initial efforts, mainly for produc-ing the appropriate models. Furthermore, in order to

    AI Communications 13 (2000) 99–123ISSN 0921-7126 / $8.00 2000, IOS Press. All rights reserved

  • 100 H. Milde et al. / Integrating model-based diagnosis techniques into current work process

    promote the application of the technology, one has toconsider the current practice and role of diagnosis inindustry.

    First of all, many producers of technical systemsprovide only limited diagnosis support to their prod-ucts. There are only few large market segments whereproducers develop sophisticated diagnosis support(e.g., the automotive and aircraft industry). This ischanging, however, as devices grow more complex,the cost of maintenance personnel becomes more im-portant and improved service is required to increasecustomer satisfaction and to remain competitive. Of-ten, there is even enough computational power on-board the devices (such as cars, photo copiers, and evensimpler consumer products) or easily available to hu-man trouble shooters. Also expert knowledge exists, asprincipled knowledge about the devices in the heads ofdesigners and developers and as diagnostic expertiseof maintenance personal. What is needed, is computersupport for the effective and efficient generation of di-agnostics, be it for automated on-board or workshopdiagnosis, training or instructions for technicians, etc.This is exactly what model-based diagnosis is claimingto offer. However, to really provide some leverage, itstechniques have to deliver effective and efficient sup-port to thecurrent work processes!

    So far, many prototype systems have demonstratedthat there are principled solutions to the diagnosticproblems in certain industrial areas. However, most ofthem remained isolated programs that do not reflectand integrate into the actual work processes and thetools and techniques applied. Now, it is time to addressthese issues.

    To present examples, there exist written diagnosisinstructions and authoring tools to create and maintainthem, and many of today’s computer tools capture di-agnostic knowledge in the form of decision trees orfault trees. These techniques have been developed froma maintenance rather than from a design perspective.Traditionally, diagnosis matters are a concern only tothe service division of a company, not to the design di-vision. However, as technical systems become largerand more complicated, the design of decision treesbecomes more demanding and problems arise. Fur-thermore, frequent product changes cause excessivecosts for the maintenance of such diagnosis equipment.Hence there is growing awareness of the need for re-usability of diagnostic information. Diagnosis equip-ment should be developed in close connection with de-sign, and construction data and FMEA data should beexploited. This illustrates that, ultimately, model-based

    techniques will lead to significant changes in the workprocesses. Indeed, representing and sharing engineer-ing knowledge in computer-based model libraries willresult in a major change, and many benefits will stemexactly from such changes.

    However, industry usually prefers to perform suchchanges in small steps. The introduction of model-based reasoning for complete automation of diagnosisis often perceived as too different from the traditionalways of doing diagnosis. Existing know-how wouldbecome worthless and new know-how would have tobe acquired. As noted above, the organizational struc-ture would be affected, with diagnosis tasks shiftingfrom the service to the design division.

    Any realistic strategy for the deployment of the tech-nology has to be aware of how it can be related to ex-isting knowledge and organization of diagnosis tasksand design solutions that complement existing tools ina seamless fashion. Steps in this direction were under-taken in the German joint research project INDIA (In-telligent Diagnosis in Industrial Applications). In thisproject, three teams,1 each consisting of a research in-stitute, a software supplier, and an industrial produc-tion company, joined to apply model-based diagnosistechnology to real-life diagnosis problems and pavethe way for successful applications elsewhere. The par-ticular diagnosis problems provided by the industrialpartners represented an interesting subset of industrialdiagnosis applications. One application area is diag-nosis of automotive equipment, another one is main-tenance support for transport vehicles (forklifts), thethird one deals with operator assistance in post mortemdiagnosis of machinery in a dye house.

    Although in INDIA also solutions to automated di-agnosis were developed, this paper focuses on workthat attempted to exploit present forms of represent-ing and exploiting diagnostic knowledge and producemodel-based solutions as add-ons to existing technolo-gies. The next section presents a method for automat-ically generating fault-tree diagnosis systems for on-line diagnosis of forklifts from design data. The keyidea is to use modeling techniques of model-based di-agnosis for an exhaustive computation of faulty be-havior. Based on these data, fault trees can be gener-

    1Robert Bosch GmbH, STILL GmbH, THEN GmbH,ServiceXpert GmbH, R.O.S.E. Informatik GmbH, Artificial In-telligence Laboratory of the University of Hamburg, FraunhoferInstitute for Information Technology and Data Processing, Model-based Systems and Qualitative Reasoning Group of the TechnicalUniversity of Munich.

  • H. Milde et al. / Integrating model-based diagnosis techniques into current work process 101

    ated automatically and edited preserving correctnessand completeness properties.

    Section 3 discusses work on a new generation au-thoring system for diagnosis manuals for car subsys-tems. While maintaining a text-oriented view on themanuals, it provides an explicit representation of theunderlying knowledge. This allows to import plans fordiagnosis, testing, and repair which are generated au-tomatically by a model-based system.

    The third case study is on post mortem diagnosis ofthe chemical distributor (CHD), a system to dose anddistribute chemicals and colors in a dye house auto-matically. The solution is guided by the goal of linkingto existing engineering knowledge and to re-use by im-porting state charts from other engineering tools intothe diagnostic system.

    2. Generating fault trees

    2.1. Application scenario and challenges

    More than 100.000 forklifts made by the Germancompany STILL GmbH Hamburg are in daily use allover Europe. In order to reduce forklift downtimes, ap-proximately 1100 STILL service workshop trucks uti-lize fault tree-based computer diagnosis systems forworkshop diagnosis. In case of a malfunction a diag-nosis system is attached to a forklift performing au-tomated testing. The diagnosis system also instructsservice technicians to carry out manual tests. Test se-quences are specified by fault trees which are diagnos-tic decision trees allowing fault identification.

    Due to the complexity of the electrical circuits em-ployed in forklifts, fault trees may consist of more than5000 nodes. When forklift model ranges are modifiedor new model ranges are released, fault trees are man-ually generated or adapted by service engineers whoapply detailed expert knowledge concerning faults andtheir effects. Obviously, this practice is costly and qual-ity management is difficult. Furthermore, fault trees arenot optimized and fault identification cost is unneces-sarily high. Hence, there is a need for computer meth-ods to systematically support the design, modification,and optimization of fault trees. The introduction of newdiagnosis techniques, however, raises challenges.

    Dealing with this application scenario, the ArtificialIntelligence Laboratory of the University of Hamburgconsiders model-based techniques to be advantageousbecause they provide a systematic way for design,modification and optimization of diagnosis equipment.

    Fig. 1. Wiring diagram of a forklift accelerator pedal circuit.

    Since for STILL, completely replacing fault trees is notan immediate option, automatically generating faulttrees from device models is promising provided thatthe complexity of the device modeling process is min-imized for economical benefit. Thus, extensive re-useof diagnosis models and fault trees as well as integra-tion of available resources such as design data, serviceexpertise, and expert knowledge from the design pro-cess is essential.

    In our application, model-based approaches have todeal with electrical circuits of the transport vehicle do-main. As an example Fig. 1 shows the wiring diagramof a STILL forklift accelerator pedal circuit whichenables an electronic control unit (ECU) to measurethe accelerator pedal position. The pedal determinesa potentiometer position which controls the value ofvoltage UFG. In addition, the pedal is connected to aswitch controlling voltage UFGS. The ECU provides asupply voltage VCC and it measures UFG and UFGS.Thus, the pedal position is supplied redundantly to se-cure reliability of the measurements. Beside the quan-titative value of UFG, the ECU provides two errorflags UFG_HIGH and UFG_LOW with valuesOK andNOT_OK which are utilized for fault identification.These flags indicate that UFG exceeds or falls be-low certain threshold values. The quantitative value ofUFGS is also automatically mapped to valuesOPENandCLOSED. In addition to these automated measure-ments, for fault identification, service technicians canmanually measure the voltage drop from wire 8 at con-nector X16 to ground.

    Fault trees of the current diagnosis system allow toidentify the following faults: Wires located betweenthe ECU and connector X16 may break due to mechan-ical stress. The mechanical connection between the ac-celerator pedal and the switch 1S16 may also break.In this case, the switch is independent from the actualpedal position and it connects either wire 1 and wire 2

  • 102 H. Milde et al. / Integrating model-based diagnosis techniques into current work process

    or wire 2 and wire 5. The voltage VCC may be sup-plied incorrectly, i.e., the voltage is not between 9.8 Vand 10.1 V. Additionally, the mechanical connectionbetween accelerator pedal and potentiometer 1B1, maynot be adjusted correctly or may even be broken.

    This example demonstrates that, in our application,electrical circuits usually consist of components thatshow a variety of different behavior types, such asanalog, digital, static, dynamic, linear, nonlinear andsoftware-controlled behavior. Faults may slightly mod-ify component behavior or may even change circuitstructures. Hence, a wide range of different symptomssuch as slight deviations of parameter values and totalloss of functionality may occur. In principle, model-based techniques provide a systematic way for predict-ing the behavior of electrical circuits, including faultybehavior. However, in our application, adequate mod-eling of heterogeneous circuits is essential to securehigh quality of generated fault trees. Thus, accurate de-vice modeling was a major focus in our work. The fol-lowing subsections report on the progress which hasbeen achieved.

    2.2. Model-based fault tree generation

    In our application, nodes of fault trees representfault sets. Edges are labeled by the tests (involvingmeasurements, observations, display values and errorcodes) which must be carried out to verify the cor-responding child node. Although the basic conceptsof model-based fault tree generation are already de-scribed in [5,11], for the reader’s convenience, webriefly outline the main ideas of the approach in thefollowing.

    The first step to model-based fault tree generation isto model a device. This step is supported by compo-nent libraries and a device model archive (see Fig. 2).Design data and knowledge from the design process(knowledge concerning intended device behavior, ex-pected faults, available measurements) are integratedinto the device modeling process. In a second step, cor-rect and faulty device behavior is predicted automati-

    Fig. 2. Basic concepts of model-based fault tree generation.

    cally from the device model. Behavior predictions arestored in the so-called fault relation. The third step isto build fault trees from the fault relation. This stepis supported by a fault tree archive and a cost modelfor the tests which can be performed. Fault tree gen-eration can be performed automatically or guided byservice know-how, i.e., knowledge concerning cost ofmeasurements and preferable fault tree topologies. Inorder to realize these concepts, we implemented theMAD system (Modeling, Analyzing, and Diagnosing)that is described in the following.

    2.3. Device modeling and behavior prediction

    For model-based fault tree generation, behavior pre-dictions are computed taking fault models into accountwhich represent faulty component behavior. Each faultmodel of the device model is explicitly represented ingenerated fault trees. Thus, computation of behaviorpredictions is uncomplex and fault trees are clear andmanageable if the device model shows a small numberof fault models. Since qualitative models rather repre-sent fundamental system behavior than sharp quantita-tive parameter values, a single qualitative fault modelcan cover a wide range of faulty component behaviorand, thus, a limited number of fault models can sufficefor extensive faulty behavior representation. Therefore,in principle, qualitative circuit modeling is useful formodel-based fault tree generation. Moreover, in our ap-plication, qualitative electrical device models are ade-quate because, in STILL fault trees, component faultsand symptoms are described qualitatively. Addition-ally, qualitative models are advantageous because, inprinciple, dealing with product variants is possible.In the following, MAD’s qualitative network analysisis presented. As known from electrical engineering,MAD represents electrical circuits by equivalent net-works consisting of standard component models.

    2.3.1. Standard component modelsStandard component models show no internal struc-

    ture but they show well-defined and idealized behav-ior. MAD provides four different standard componentmodels: idealized voltage sources, consumers, conduc-tors and insulators. The behavior of idealized volt-age sources is well-known from electrical engineer-ing. Consumers are passive and their current/voltagecharacteristic is monotonic. Idealized conductors donot allow any voltage drop while idealized insulatorsdo not allow any current. Controlled versions of thesestandard component models exist. Standard compo-

  • H. Milde et al. / Integrating model-based diagnosis techniques into current work process 103

    nent models can be connected in combinations of se-ries, parallel, star and delta (triangular) groupings. Thissimple internal representation of electrical circuits issufficient for the following reasons.

    – In STILL service workshops, only steady-state di-agnosis of electrical circuits is performed. There-fore, only steady-state behavior of physical com-ponents has to be represented in component mod-els. In particular, an explicit representation oftemporal dependencies is not necessary.

    – A small number of qualitative standard com-ponent models suffices, because, often, differ-ent physical components show similar electri-cal behavior, i.e., their current/voltage charac-teristics differ only slightly. Qualitative versionsof these current/voltage characteristics are fre-quently identical.

    – MAD’s standard component models are deliber-ately selected so that important behavior classesof the application domain can be adequately rep-resented.

    Due to analogies between electrics, mechanics and hy-draulics, MAD’s internal circuit models are, in princi-ple, also adequate for other technical domains.

    2.3.2. Qualitative parameter representationIn general, any parameter value that is different from

    the expected value can be a fault symptom. Thus, rep-resenting actual parameter values as well as deviationsfrom reference values is helpful to characterize faultsand symptoms adequately. However, in order to in-crease the accuracy of automated behavior predictionsMAD’s qualitative parameter representation consistsof three attributes, i.e., actual value, deviation valueand reference value. In [21] we motivate MAD’s three-fold qualitative parameter representation in detail.

    Qualitative parameter values stand for intervals andlandmarks. Only signs of parameterdeviationsare rep-resented, i.e., ‘−’, ‘0’, and ‘+’ are MAD’s qualitativedeviation values. As an example, Fig. 3 shows MAD’squalitativeabsolutevoltage values and their semantics.These values are utilized to characterizeactual andreferencevalues of voltage. In MAD’s internal circuitmodels, there are no quantitative parameter values rep-resented but the value 0. Note that,negative_landmark= − positive_landmarkholds. The meaning of theselandmarks is further specified by their utilization in themodeling process. For instance, the value of the supplyvoltage VCC of the accelerator pedal circuit is mod-eled bypositive_landmark. Thus, for all voltage in the

    Fig. 3. Qualitative and quantitative absolute voltage values.

    device model,positive_landmarkrepresents the valueof VCC.

    Since MAD provides qualitative voltage valuespos-itive_landmarkand negative_landmark, in principle,dealing with logical circuits is possible. For instance,logical values (true, false) can be mapped to MAD’svoltage valueszero and positive_landmark. Dealingwith logical values is the basis for dealing with hybridsystems consisting of both analog and digital subsys-tems which are usually found in our application.

    2.3.3. Computation of qualitative valuesIn order to compute qualitative current and voltage

    values, local propagation methods have been inves-tigated [30]. Since detailed studies proved that localpropagation in electrical networks is inappropriate, wefollow a different approach first presented by Maussand Neumann in [20]. Networks are transformed intotrees representing the network structure. In particu-lar, series, parallel, star and delta groupings are repre-sented explicitly. Exploiting these structure trees, qual-itative behavior can in fact be computed by local prop-agation.

    In general, if interval-based qualitative values are lo-cally propagated, spurious solutions occur [30]. Thatis, some solutions are not consistent with a globalanalysis of the device model and, thus, provided themodel is accurate, these solutions do not describe elec-trical circuit behavior. Obviously, fault trees groundedon spurious behavior predictions are useless for faultidentification. In order to secure high quality of gen-erated fault trees we developed a new qualitative cal-culus which shows certain features to improve the ac-curacy of circuit behavior prediction. In the following,these features are briefly summarized. In [21] they aredescribed in detail.

    – First of all, rather than relying on qualitative ver-sions of basic arithmetics, MAD computes qual-itative values for currents and voltages by a setof qualitative operators which are qualitative ver-sions of complex quantitative equations. In ef-fect, these equations describe behavior of series,parallel, star and delta groupings. For example,utilizing the well-known equationRp = (R1 ∗R2)/(R1 + R2), the compensation resistanceRp

  • 104 H. Milde et al. / Integrating model-based diagnosis techniques into current work process

    of a parallel grouping of two resistorsR1 andR2 can be computed provided that values ofR1andR2 are given. MAD’s qualitative operatorsare defined by applying the corresponding quan-titative equation to the interval boundaries whichrepresent actual values and reference values ofinput parameters. Resulting boundaries representthe corresponding qualitative values of output pa-rameters. Qualitative deviation values are com-puted from actual and reference values. Addition-ally, output deviation values are inferred from in-put deviation values if parameter dependenciesare monotonous.

    To demonstrate that complex qualitative opera-tors are essential, a parallel grouping of two resis-torsR1 andR2 is considered. Obviously,Rp= 0holds ifR1 = infinite andR2 = 0 is valid. Thisresult cannot be computed applying qualitativebasic operators because qualitative multiplicationof R1 andR2 is undefined ifR1 = infinite andR2 = 0 holds. MAD provides a qualitative opera-tor based on the equationRp = (R1∗R2)/(R1+R2) such that, forR1 = infinite andR2 = 0,Rp= 0 is derived.

    In principle, for network analysis, a limitednumber of operators suffices because MAD’s in-ternal representation of electrical circuits offersa limited number of standard component modelsand elementary network structures. Operators arerepresented by a set of tables comprising morethan 30.000 entries which had to be generated bycomputer in order to secure reliability.

    – Second, for computation of qualitative values,MAD utilizes a set of equations which wouldbe redundant if quantitative values were used. Itcan be shown that if qualitative values are com-puted, MAD’s set of equations is not redundantbut avoids some spurious behavior predictions.

    To demonstrate that MAD utilizes a ‘redun-dant’ set of equations, we again consider a par-allel grouping of two resistorsR1 andR2 withcorresponding voltage drops and currentsU1, I1,U2, andI2. Up is the voltage drop across theparallel grouping.Ip is the current through theparallel grouping. If values ofR1, R2, and Ipare given, the value ofI1 can be computed ap-plying the well-known current divider equationI1 = R2/(R1 + R2)∗ Ip. MAD additionally cal-culates the value ofI1 by, first, applyingUp =(R1 ∗ R2)/(R1 + R2)∗ Ip and, second, utilizingI1 = Up /R1. If quantitative values were used,

    both computations would lead to the same resultbecause, obviously, the set of three equations isredundant. It can be shown that utilization of allthree equations avoids some spurious solutions ifqualitative values are computed.

    – Third, in addition to local propagation of qual-itative values, MAD globally analyses networkstructures and structure trees in order to elim-inate (some) spurious predictions for circuitswhich cannot be structured into series and parallelgroupings of standard components. For instance,a global analysis of the network structure allowsto determine current directions. Knowledge aboutcurrent directions can be used to eliminate certainqualitative current values. Therefore, global net-work analysis may prevent spurious current pre-dictions.

    Due to the properties of this qualitative calculus, spu-rious solutions do not occur at all if the network canbe structured into series and parallel groupings of stan-dard components.

    2.3.4. Automated behavior predictionBehavior predictions are performed for all operat-

    ing modes, faults and fault combinations for which di-agnosis support is required. For each operating modeand fault assumption, all symptoms (measurements,observations, error codes, display values) are com-puted which are in principle available for diagnosis.The output of the prediction step is model-based di-agnosis knowledge in form of fault-symptom associa-tions stored the fault relation. This fault relation is thebasis for fault tree generation. In Fig. 6, parts of thefault relation of the accelerator circuit example are pre-sented.

    2.3.5. COMEDI (COmponent Modeling EDItor)To improve acceptance among engineers, MAD pro-

    vides a user interface called COMEDI which is similarto a CAD tool (see Fig. 5). Figure 4 describes the mod-eling process utilizing predefined models from threedifferent libraries. Providing library models is funda-mental because utilization of libraries massively re-duces the complexity of the modeling process which isessential for the acceptance of MAD in our application.In the following, the libraries and the modeling processare briefly described. A more detailed presentation canbe found in [22].

    Due to MAD’s internal representation of electri-cal circuits, it is possible to distinguish 728 differentgeneric component class models. For application com-ponent modeling, all these models are provided by the

  • H. Milde et al. / Integrating model-based diagnosis techniques into current work process 105

    Fig. 4. COMEDI libraries and the device modeling process.

    generic component library which cannot be extended.Generic component class models show both correctand faulty behavior modes. Internally, these modes arerepresented by MAD’s threefold qualitative parameterrepresentation. To COMEDI users, behavior modes ofgeneric component class models are described in col-loquial language similar to engineers thinking of howcomponents work. By this means, MAD’s threefoldparameter representation is hidden from users to fa-cilitate the modeling process. For example, a certaingeneric component class model shows two behaviormodes. To COMEDI users, these behavior modes aresimply described byok_consumerandfault_insulator.Internally, these modes are represented by two differ-ent threefold qualitative resistance values. That is, ei-ther (R_act= positive,R_ref = positive,R_dev= 0)or (R_act = infinite, R_ref = positive, R_dev= +)holds.

    The application component library contains mod-els of component and subcircuit classes occurring in acertain application. Application component class mod-els can be interactively generated and stored in the li-brary. The ability to extend the application componentlibrary is essential for dealing with real world appli-cations because a limited number of predefined ap-plication component models cannot cope with a per-manently increasing number of different electrical andelectronic components utilized for electrical device de-sign. To build a certain application component model,structure and behavior of the model have to be deter-mined. The model structure is generated by assemblinggeneric component class models on the screen. Behav-ior of application component class models is implic-itly given by the model structure and the behavior ofgeneric component class models. Additionally, causalparameter dependencies can be represented in behaviortables based on user-defined qualitative parameter val-ues. By this means, generation of abstract qualitativemodels for complex components or subsystems such asamplifying circuits, logical circuits, and software con-trolled components is facilitated. To demonstrate ap-

    plication component class modeling, we consider lightbulbs which could be blown as a simple example.

    The generic component class model described in theprevious paragraph is a reasonable application compo-nent class model for light bulbs because if a light bulbis blown it shows insulator behavior, otherwise it isa consumer of electrical energy. This generic compo-nent class model completely determines structure andbehavior of a light bulb application component classmodel. There are no additional causal parameter de-pendencies defined. For accelerator pedal circuit mod-eling, only 11 different application component classmodels are required. Since we also modeled other cir-cuits, up to now, the application component libraryconsists of about 50 models.

    The device model archive allows systematic reuseand modification of device models that were createdduring former modeling sessions. These models are as-semblies of application component models. That is, fordevice modeling only a structure description is createdon the screen.

    2.3.6. Modeling and analyzing the accelerator pedalcircuit

    In Fig. 5, the COMEDI device model of the accel-erator pedal circuit is presented. The device operatingmode STANDARD is modeled which means that theswitch 1S16 is in the position shown in Fig. 1. Cor-rect and faulty behavior of wire 4 is presented in thesmall window in the center of Fig. 5. To model the errorflags UFG_LOW, UFG_HIGH, and UFGS, we utilizefunctional labeling [25], i.e., strings such as ‘OPEN’,‘CLOSED’, ‘OK’, and ‘NOT_OK’ can be attached toMAD’s qualitative parameter values. Note that, thesestrings occur in the fault relation shown in Fig. 6. Themanual voltage measurement UG described in Sec-tion 2.1 is modeled by a special multimeter compo-nent model. Automated behavior predictions are per-formed for all possible component behavior (correctand faulty) and all device operating modes consideredin the accelerator pedal circuit device model. The re-sults are stored in the fault relation. Figure 6 showsfault symptom associations which hold in the deviceoperating mode STANDARD.

  • 106 H. Milde et al. / Integrating model-based diagnosis techniques into current work process

    Fig. 5. COMEDI model of accelerator pedal circuit.

    Fig. 6. Parts of fault relation of the accelerator pedal circuit.

    2.4. Fault tree generation

    MAD offers three different possibilities to generatefault trees. First, based on fault relations, fault trees

    can be created automatically. Second, fault trees fromarchives can be reused. Third, in order to permit man-ual adoption and modification of fault trees, MAD of-fers basic editing operations, such as moving a certain

  • H. Milde et al. / Integrating model-based diagnosis techniques into current work process 107

    fault from one fault set to another and recomputing thecorresponding tests. In the following, automated faulttree generation is presented in more detail. One canchoose from the following criteria to guide fault treegeneration.

    – Grouping by observations, error codes, displayvalues.Fault trees are generated such that sub-sets of faults correspond to a prespecified symp-tom. For instance, all faults are grouped togetherwhich lead to an unexpected value of the acceler-ator pedal voltage UFG.

    – Grouping by aggregate structure. If the ag-gregate structure of the device is known, faulttrees can be generated such that subsets of faultscorrespond to the same physical component. Forinstance, faults occurring in the ECU may begrouped together.

    – Minimization of average diagnosis cost.Auto-mated fault tree generation uses the well-knownA*-algorithm to select the tests which minimizethe average fault identification cost according to acost model.

    Since minimization of average diagnosis cost is essen-tial for the acceptance of MAD in our application, inthe following, we describe MAD’s utilization of theA*-algorithm for optimized fault tree generation. Wedo not take fault probabilities into account because, inour application, these probabilities are not available.A promising approach for fault tree generation consid-ering fault probabilities is presented in [11].

    Cost optimized fault tree generation is based on theA*-algorithm which performs a search in a state space.A state contains a set of fault sets which are the cur-rent leaves of a growing fault tree. The start state con-sists of one fault set containing all faults of the faultrelation. The goal state consists of fault sets contain-ing faults which cannot be discriminated. A successorof a state is generated by partitioning one leaf fault setinto at least two subsets by selecting a partitioning testT . All successors of a state are generated by applyingeach partitioning test to each leaf. Each path in the statespace represents a possible fault tree. A test shows cor-responding costs and a set of different possible test re-sults, i.e., a test domain. In general, tests are not exclu-sive. That is, if a test partitions a fault set into subsets,certain faults can occur in more than one subset. Eachstate is evaluated by the functionsg andh whose defi-nitions are presented in Fig. 7.g is defined as the sumof the diagnostic effort for each faultf . The diagnos-tic effort of a faultf is the sum of all test costC(T )

    on the path between the current leaf fault set contain-ing f and the root fault set. To guide the search, theheuristic functionh estimates cost of fault identifica-tion assuming that, in the fault identification process, acertain state is already reached. For details of the A*-algorithm see [27].

    To demonstrate thath never overestimates real costof fault identification in a cost-optimized tree, a certainleaf b is considered. There is a set of available testsTinot yet used on the path betweenb and the root. Thesetest allow the generation of a cost-optimized subtreebelow b. Available tests show costsci and domains.kmx is the maximum size of these test domains. Thefollowing two properties guarantee that the heuristicfunctionh underestimates fault identification cost in acost-optimized subtree. First,h considers an impossi-ble subtree in which fault identification is cheaper thanin a cost-optimized subtree. Second, rather than pre-cisely computing fault identification cost in the impos-sible subtreeh underestimates diagnosis cost.

    The impossible subtree and the cost-optimized sub-tree show the same faults but different tests. The fol-lowing characteristics secure that fault identificationin the impossible subtree is cheaper than in the cost-optimized subtree. First, in the impossible subtree, allavailable test are exclusive. That is, if a test partitions afault set, each fault occurs in only one subset. Second,all available tests split fault sets intokmxsubsets. Dueto these two properties, in the impossible subtree, thediscriminating power of available tests is higher thanin the cost optimized subtree. Third, in the impossi-ble subtree, the test first performed for fault identifica-tion, is as expensive as the cheapest available test ofthe cost-optimized subtree. All tests performed in thesecond level of the impossible subtree are as expensiveas the second cheapest test of the cost-optimized sub-tree, and so on. Fourth, in the impossible subtree, if afault set is partitioned into subsets, all of these subsetscontain the same number of faults, i.e., the impossiblesubtree is balanced. Provided the first three propertieshold, a balanced tree yields to lowest fault identifica-tion cost.

    For the estimation of fault identification cost in theimpossible subtree, it is assumed that the depth of thesubtree isblogkmx(|b|)c, (b. . .c is the floor operation)which, in general, is an underestimation. Based on thisassumption, for each fault inb, cost of fault identifi-cation in the impossible subtree can be underestimatedas:

  • 108 H. Milde et al. / Integrating model-based diagnosis techniques into current work process

    Fig. 7. Cost functiong and heuristic functionh for computing an optimal fault tree withn faultsfi. de denotes the diagnostic effort.

    Fig. 8. Cost optimized fault tree of accelerator pedal circuit.

    blogkmx(|b|)c∑i=1

    C(Ti), with T i is

    i-th cheapest unused TestT.

    Figure 8 shows a cost optimized fault tree of the accel-erator pedal circuit. Note that, in the figure, the finalrow of a fault set is the corresponding test. The rootcontains 13 faults which are not explicitly shown. Notethat, in MAD fault trees, faults are sometimes discrim-inated by so-called direct fault identifications such asw3 = { BROKEN}. Direct fault identifications are de-scribed in [22].

    2.5. Evaluation

    The prototypical implementation of MAD allowsmodel-based behavior prediction and automatic gener-ation as well as manual modification of fault trees. All

    faults considered in the device model occur in the gen-erated fault tree, and tests are selected correctly to dis-criminate fault sets. This holds even when fault treesare modified manually. Furthermore, average diagno-sis cost is minimal within the constraints imposed by aprespecified fault tree structure. Fault trees have beensuccessfully integrated into existing STILL diagnosissystems.

    MAD has been evaluated in the STILL applicationscenario and it has been found that using the modelingtechniques of MAD with some extensions regardingthe implementation of certain network analysis con-cepts, more than 90% of the faults of the current hand-crafted diagnosis system can be handled successfully.In some cases, since component-dependent parameterthreshold values are not explicitly represented in MADmodels, in fault trees, correct and faulty behavior can-not be completely distinguished. Using MAD, an ac-celerator pedal fault tree was automatically generatedfrom the circuit model and imported into the STILL di-

  • H. Milde et al. / Integrating model-based diagnosis techniques into current work process 109

    agnosis system. STILL service experts found that thisfault tree can be used for fault identification.

    3. Model-based support to diagnosis and faultanalysis in the automotive industry

    3.1. The applications

    In collaboration with Robert Bosch GmbH as a ma-jor supplier of mechatronic car subsystems, the Model-based Systems and Qualitative Modeling Group(MQM) at the Technical University of Munich workedon three prototypes that support different tasks relatedto fault analysis during the life cycle of a product:

    – Failure-Mode-and-Effects Analysis (FMEA).This task is performed during the design phaseof a device. Its goal is to analyze the effects ofcomponent failures in a system implemented ac-cording to the respective design. The focus is onassessment of the criticality of such effects, i.e.,how severe or dangerous the resulting disturbanceof the functionality is in objective or subjectiveterms (e.g., inconvenience for the driver, environ-mental impact, risk of hazards). In addition, theprobability of the fault and its detectability is con-sidered. Based on this analysis, revisions of thedesign may be suggested. FMEA is performed ina growing number of areas. Particularly for vehi-cles, due to safety and environmental aspects, ithas to be as ‘complete’ as possible, not only w.r.t.the faults considered, but also in the sense that allpossible effects under all relevant circumstances(driving conditions of a vehicle, states of interact-ing subsystems, etc.) have to be detected.

    – Workshop diagnosis.The diagnostic task in therepair workshop starts from a set of initial symp-toms which are either customer complaints ortrouble codes generated and stored on-board byone of the electronic control units responsiblefor various subsystems of the vehicle. Except forthe obvious cases, some investigations, tests, andmeasurements have to be carried out in order tolocalize and remove the fault, usually by replace-ment of components.

    – Generation of diagnosis manuals.The mecha-nic in the repair workshop is educated or guidedby diagnosis manuals produced and distributed bythe corporate service department (on paper, CD-ROM, or via a network). Here, engineers compilevarious kinds of information (tables, figures, text)

    Fig. 9. The three tools and their software components.

    which is required or useful for carrying out the di-agnosis. Such documents have to be produced foreach variant of the various subsystems and spe-cific to a particular make, type, and special equip-ment of a vehicle, a broad set of issues for a sup-plier. The core of each document is a set of testplans that guide fault localization starting fromclassified customer complaints or trouble codes ofthe ECU.

    In practice, these tasks are usually not extremely dif-ficult to perform by an expert. However, they can betime-consuming since they have to be carried out foreach specific instance of a general device type whichincludes collection of all the information specific tothis instance. This situation of routine work applied toa large set of variants justifies computer support to bedeveloped. And because knowledge about physical de-vices is the key to solving the tasks, model-based sys-tems offer a perspective.

    In the INDIA project, we built software prototypesfor the above tasks by developing model libraries forthe domain and by using software components formodel-based reasoning of the commercial RAZ’R sys-tem. Figure 9 indicates the components used for thevarious systems and shows that the degree of re-useis fairly high. In this paper, we discuss the task of thegeneration of diagnosis manuals with a focus on thegeneration of test plans.

  • 110 H. Milde et al. / Integrating model-based diagnosis techniques into current work process

    3.2. Automated test generation

    Testing a system means manipulating it by chang-ing its structure and/or providing a certain stimulus (in-put) to it in order to obtain observations that help togain information about its mode of behavior. If a sys-tem is composed of individual components that are as-sembled in a fixed structure and if component faultsare considered as the only origins of malfunctions, test-ing aims at distinguishing different behavior modes ofits components. For instance, testing a newly manu-factured device aims to check whether all componentsfunction correctly or whether one of them has a fault(usually done by trying to rule out all, or the mostlikely, single faults). Testing performed to serve diag-nosis tries to identify the fault (or class of faults) that ispresent. Since it is impossible for principled reasons toidentify a fault (class) by directly confirming it throughobservations, it can be done only by looking for teststhat help to refute all other (classes of) faults. There-fore, test generation and testing, in contrast to diagno-sis, relies on knowledge about the (enumerable) set offaults that are considered.

    Model-based methods offer themselves as a basis forsupporting and automating these tasks, because behav-ior models can capture knowledge about the correctand the faulty behavior of the components involved.We have developed the theory underlying model-based(component-oriented) test generation some time ago([31,32]). Here, we refrain from a formal presentationand only try to convey the intuition. In the theory andthe respective algorithms, the behavior models of thesystem to be tested (e.g., the modes established by allsingle faults) are represented as relations in the spaceof its variables, state variables, and parameters. Figure10a shows two abstract behavior relations as Venn di-agrams (ignoring the temporal aspects of the behav-

    ior). Testing, i.e., mode discrimination, is the task ofdetecting the differences between these sets, i.e., tuplesthat are not part of all relations. More precisely, we arelooking for some input to the system (fixing the val-ues of some ‘causal’ variables, i.e., the ones that canbe influenced), e.g., vari = vali,1 on the vertical axisin Fig. 10a, such that the possible observations underthe different modes are differing as much as possible.The possible resulting observations in vobs, indicatedon the horizontal axis in Fig. 10a, b, are given by theprojections of the intersections of vari = vali,1 with thedifferent behavior relations to the observable variablesvobs. The example suggests that vari = vali,1 is a badchoice because these projections have significant over-lap, while vari = vali,2 is certain to distinguish mode1from mode2. If probabilities of the behavior modes areavailable (and/or if behavior descriptions are probabil-ity distributions over the relations), it is possible to usethe minimization of entropy as a criterion for selectingthe test with maximum information gain (see [32] forthe details).

    Obviously, for continuous systems, computing therespective projections for the possible inputs rangesfrom very complex to unfeasible. It can be made feasi-ble by applying qualitative instead of numerical mod-els, symbolized in Fig. 10b by moving to the finitegranularity of the rectangular models. When we assurethat the qualitative behavior relations cover the morefine-grained ones, we can apply the same computa-tional steps as above, which are now operations on fi-nite sets: the number of qualitative distinctions is finite,and so is the number of tuples. Furthermore, the set offaults becomes finite, because the qualitative abstrac-tion aggregates continuous sets of faults (e.g., leakagesof different size) into one fault model.

    When applied to electrical circuits, qualitative mod-els do not relate numerical values of current and volt-

    Fig. 10. Quantitative (a) and qualitative (b) abstract behavior relations.

  • H. Milde et al. / Integrating model-based diagnosis techniques into current work process 111

    age, but, for instance, capture only the direction of cur-rent, represented by values negative (‘neg’), positive(‘pos’), and zero (‘0’). The domain of the voltage vari-ables could consist of values ‘gnd’ (for ground), ‘bat’(representing the battery voltage), and ‘btw’ (the in-terval between). With these domains, a model of thecorrect behavior of a temperature sensor (or any resis-tor) is given by the relation shown in the left-hand ta-ble in Fig. 11, where the value pairs along the verticalaxis represent the combinations of voltages on eitherside of the sensor, and the horizontal dimension rep-resents current. An open circuit fault is characterizedby a zero current for all voltages which yields the rela-tion on the right-hand side. From a comparison of thetwo relations, one can directly identify (and computeby a straightforward algorithm) which inputs (qualita-tive voltage pairs) lead possibly or deterministically todifferent observed values for the current.

    Thus, qualitative modeling allowed us to implementtest generation algorithms and apply them to relaycircuits, a problem extracted from a real applicationof programming test automata for circuits of a trainstation control system ([18,19]). Although the imple-mented system produces provably correct sets of tests(and identifies behavior modes that cannot be discrim-inated under the given sets of causal and observablevariables) for examples of fifty or so components, forwhich the task of test generation is already beyond anexhaustive treatment by humans. However, it is still farfrom an application system or usable tool. There aretwo fundamental limitations. The solution treats testgeneration as a very abstract task and ignores the prac-tical aspects of testing (except for the restriction oncausal and observable variables): the equipment andcost involved, sequencing of tests to reduce these costs,

    Fig. 11. Qualitative behavior relations for a correct sensor/resistor(left) and one showing an open circuit fault (right).

    the fact that certain tests are risky or impossible unlesssome faults have been ruled out, etc.

    Secondly, even if the model-based test generationis extended appropriately to include such aspects, be-coming a useful tool requires to reflect the actual workprocesses that deal with the generation or executionof test plans. In our domain, this is mostly done inthe service department. Here, a significant amount ofwork is dedicated to creating, checking, adapting, up-dating, and translating diagnosis manuals which aredistributed as guidelines to the service stations (in ourcase on CD-ROM). The core of these documents isformed by the textual description of plans for perform-ing testing in the workshops with the goal of fault lo-calization. Hence, one natural way to introduce testgeneration is obviously to support the authors in theservice departments in their task of producing suchmanuals. We started this by specifying and implement-ing a new type of authoring system, called Genesis.

    3.3. Generation of diagnosis manuals

    Genesis is a software tool that combines test genera-tion with the authoring of repair manuals. It combinesmodel reuse with text reuse and provides a knowledgerepresentation level for device data.

    We analyzed the conventional authoring process ofa repair manual in order to couple test generation andrepair planning. Traditional authoring systems for re-pair manuals are essentially word processors or pub-lishing systems. They partially support the publishingprocess, like document release and translation, but donot exploit the strong structuring of repair manuals nordo they assist the author in keeping the repair manualconsistent. Furthermore, these text processors cannotexport documents using semantically tagged documentformats like SGML or XML because they have no no-tion of semantics.

    Our aim was to improve upon a conventional textprocessing system in three stages:

    1. Provide a stronger and more detailed structuringof repair manuals. Associate words, phrases orsections of a document with their semantics.

    2. Supply a knowledge representation level that re-lates text fragments from the repair manuals witha representations of the real world system struc-ture.

    3. Enhance the represented system structure with alibrary of behavioral models for the device andits components as a basis for the integration andexploitation of model-based techniques, such asautomated test generation.

  • 112 H. Milde et al. / Integrating model-based diagnosis techniques into current work process

    The first stage of Genesis aims at a sophisticated reuseof textual fragments from existing repair manuals. Thisaddresses translation issues, which are one of the mainexpense factors involved in the distribution of repairmanuals. There are hundreds of vehicle variants, butthe repair manuals commonly build upon only a hand-ful of different types of tests and actions. For the diag-nosis of electrical systems, which constitutes the mainpart of the manuals, these are steps like turning onthe ignition, pulling plugs, measuring voltage and re-sistance, reading signal values, and comparing themwith nominal values (refer to Fig. 12 for an example).Hence, the same verbalizations occur, with slight vari-ations, again and again. In Genesis, this issue is ex-ploited by splitting the text of a repair manual intosmall coherent phrases, which are translated individu-ally. The translation of a manual assembled from thesetext fragments does not require any additional effort.As a side effect, the repair manuals become more uni-form, even if they are written by different authors, be-cause they are all composed from the same repositoryof verbal expressions. Each text fragment serves a spe-cific semantic role in the final repair manual. The frag-ments are organized according to this role, which facil-itates their retrieval. Moreover, these semantics enablea transformation of the repair manual into a semanti-cally tagged document format like SGML or XML.

    Figure 12 is a screen-shot of this first stage of thesystem. The left part of the screen displays the struc-

    ture of the repair manual, by which the author cannavigate through the chapters and pages of the docu-ment (in this case a manual for a fuel injection sys-tem). When a page is selected, its contents is displayedin the second window. This window is also used forediting text. Words and phrases that are connected tophysical properties of the vehicle are displayed in bold– components are underlined, parameters are not. Thethird view verifies that the page is actually composedfrom semantic text fragments. It illustrates that textfragments can themselves be composed from other textfragments, for example aTestStepconsists of a list ofSetupActionsand aTest. Some text fragments are notvisible because the tree has not been fully expanded inthis display.

    The second stage of Genesis extends the semanticrepresentation of text with a model-based knowledgerepresentation of the vehicle. This vehicle model con-tains a hierarchical representation of parts and compo-nents, a description of the vehicle topology and datasheets with device parameters. The knowledge rep-resentation level associates vehicle specific text frag-ments with their physical correspondent in the model.A main advantage for the author is that he can ap-ply changes to the vehicle model without having to gothrough the whole repair manual. In fact, if the vehi-cle data are available in electronic form (e.g., as CADdata), they could be imported via an interface. Genesis

    Fig. 12. First stage of Genesis.

  • H. Milde et al. / Integrating model-based diagnosis techniques into current work process 113

    will automatically update all references to the changeddata. This leaves the arrangement of tests and actionsto repair plans unaffected. If the structure of the repairplan itself requires modifications (e.g. because a sen-sor plug is no longer accessible in the vehicle variant),with the current tools, these modifications have to becarried out by hand.

    Figure 13 shows a snap-shot of the additional viewsin this second program stage. The top left window dis-plays an hierarchical view of the device structure. Thewindow below presents the data sheet of the selectedcomponent, in this case the intake-air temperature sen-sor. The right windows contains a diagram for the se-lected component (if available). The author can createreferences to components, terminals or parameters byselecting them in either of the three views.

    The third and final stage of Genesis employs tech-niques of model-based reasoning. Stage two alreadyencourages the author to explicitly represent the struc-ture and characteristics of the vehicle subsystems. Thethird stage combines this information with models ofcomponent behavior, as they are used in model-baseddiagnosis or design. The resulting behavior model of

    the vehicle can be employed to perform automated rea-soning tasks, such as the generation of complete repairplans.

    3.4. Model-based test plan generation for genesis

    Further development of the automatic test genera-tion outlined in Section 3.2 into a useful add-on to Gen-esis required to include the practical aspects of the di-agnosis and testing task, such as the actions and costinvolved. In accordance with [13], we discriminate be-tween generic and pragmatic aspects:

    Generic modelsare not task specific. They describethe general physical principles underlying thebehav-ior of a type of component or device. An integratedmodel-based system will tend to share generic mod-els for testing with other reasoning tasks like design orquality analysis. The generic models have been createdusing the commercial model-based diagnosis environ-ment RAZ’R ([23]). This environment provides facil-ities for modeling generic behavior of components, aswell as for defining parameters, states and the structureof devices. RAZ’R includes algorithms for model com-

    Fig. 13. Second stage of Genesis.

  • 114 H. Milde et al. / Integrating model-based diagnosis techniques into current work process

    position and can export fully composed device modelsusing XML.

    Pragmaticsare given by aspects of the componentsor the device which are considered specific to a par-ticular device and its physical realization and/or a taskas it is carried out in reality. In the testing and repaircontext this comprises information like

    – probing points and their accessibility,– available test equipment,– assembly and disassembly actions,– setup, treatment and repair actions,– action costs, preconditions and safety constraints,– description of device symptoms,– failure probabilities.

    Such aspects have to be captured by a representationlayer that is separate from, but related to elements ofthe generic behavior models. For instance, a variable inthe behavior model might be linked to a probing pointand its accessibility conditions in the pragmatics.

    It is important to note that there are some interde-pendencies between the generic behavior model andthe pragmatic aspects of testing that can complicatethe task significantly. For instance, measuring the re-sistance of, say, a sensor requires disconnecting it fromthe control unit and attaching the measurement devicewhich amounts to a significant structural change of theoriginal device. In out current solution, we have chosento include the respective potential structural changes inthe device model by adding switch connections to themeasurement devices. Clearly, this is not a satisfactoryand principled solution.

    Nevertheless, we chose, to maintain the distinctionbetween behavior models and representation of thepragmatics for reasons of re-usability: because the for-mer are generic, they can be reused in different tasksand for devices that share the principled features of be-havior, the latter can vary considerably in form, con-tent, and the reasoning schemes required. For instance,for different types of vehicles, the device topology forthe temperature sensor circuit will look like the one de-picted in Fig. 13. But the physical layout and installa-tion could be vastly different, including variations inaccessibility of probing points, distance between com-ponents etc. A typical consequence is that the measure-ment device has to be connected to different plugs.

    As a consequence, we also decided to maintainthe test generation algorithm described in Section 3.1(which operates on the generic behavior model only) asa separate software component. Rather than extendingthisabstract test generatorto include the pragmatic as-

    Fig. 14. The three modules of the repair plan generator.

    pects and their exploitation, we organize this knowl-edge and reasoning in a separate component, theac-tion planner, and coordinate the two components by astrategistcomponent (see Fig. 14).

    This approach differs from the one applied in thefork lift application (Section 2) and decision tree gen-eration for on-board diagnosis as described in [5]. Theytreat the task basically as aclassification problem,while we consider aplanning approach as the ultimateappropriate solution which can take into account ac-tions, goals, cost etc. in a more flexible way (althoughwe have not employed a sophisticated planner at thistime). The alternative solutions can be seen to reflectdifferent importance of actions and cost in on-board vs.off-board diagnosis.

    Dependent on the relevant criteria for the quality andcost of repair plans, the action planner can vary, andthe strategist can be used to organize different waysof interaction between the abstract test generator andthe planner. Figure 15 shows the interfaces and majorphases of the iterative repair plan generation processfor Genesis which we will briefly explain in the fol-lowing.

    This input of the strategist comprises

    – a behavior model of the device that is to betested,

    – a description of the availableactionsand possibleobservations,

    – pragmatic information about their cost, precon-ditions, etc. (For example, the connector of theelectronic control unit will only be accessible ifthe ECU housing is accessible and open. To getto the ECU housing, depending on the location,the hood may need to be open and a splash guardremoved,)

    – a description of thecurrent situation whichchanges dynamically at each stage in the genera-tion process.

    The latter covers

    – thesystem stateas expressible by the behaviormodel (e.g., switch positions, engine temperature,etc.)

  • H. Milde et al. / Integrating model-based diagnosis techniques into current work process 115

    Fig. 15. Four phases of a cycle in the plan generation process.

    – the state of thephysical systemas a result ofprevious actions, for instance installation of mea-surement devices, disassembly, etc.

    – the state of the problem solving process, e.g.,given as the set of remaining fault hypotheses andtheir probabilities.

    At each stage (basically each node in a discriminationtree), the strategist generates a discrimination task (setsof faults to be discriminated) for the abstract test gen-erator and, according to the current situation, identi-fies the variables that can be influenced and observedas well as ascenario, the current system state and fur-ther restrictions (see Fig. 15a). The latter may reflect,for example, critical states to be avoided. This resultsin a restriction of the behavior model by an additionalrelation. For example, a first step in a test plan maybe to determine whether the device has some criticalfault, which requires additional safety measures and re-strictions on tests. After critical faults have been exon-erated, the discrimination focus can be refined to dis-criminate among the remaining faults, which will thenbe easier to perform because the safety measures canbe relaxed.

    Based on this input and fault probabilities, if avail-able, the abstract test generator produces candidatetests based on the principles described in Section 3.1along with a characterization of their discriminationpower in terms of remaining sets of fault hypothesesunder different test outcomes and, perhaps, changes inentropy (Figure 15b). A test is characterized in an ab-stract way as a set of input values including directlyaccessible states and a set of variables to be observed.

    While theabstract test, thus, characterizeswhathap-pens when the test is performed, the plan needscon-crete testsproviding details onhowto perform the test.For example, an abstract test for a car may state thatwe want to determine the voltage across the battery.A concrete realization of this abstract test could be toopen the hood, disconnect the battery, connect a volt-age meter and observe its display. Obviously the con-crete realization depends on the current situation, be-cause if the voltage meter is already connected to thebattery, the test is reduced to just observing the display.

    The strategist selects one of the abstract tests to beturned into a concrete test by the action planner, if pos-sible. Injecting the required input and observing therelevant variables specifies the goal which the planner

  • 116 H. Milde et al. / Integrating model-based diagnosis techniques into current work process

    Fig. 16. Test plan of temperature sensor circuit.

    has to achieve based on the available actions respectingor establishing their preconditions and observing costof actions (Fig. 15c). As mentioned earlier, we onlyhave a basic implementation of this component whichnevertheless seems appropriate for the problems posedby the existing repair plans. In general, though, we feelthat much more research needs to be dedicated to thedevelopment of more powerful model-based planners.

    In the final phase of each cycle, the strategist ex-tends the repair plan by a selected concrete test (mainlybased on cost criteria) and updates the current situationto the ones after the application of the test and the ob-servation of its different possible outcomes (Fig. 15d).

    The integration of abstract test generation and ac-tion planning makes the algorithm more effective, be-cause, in general, the process of computing a concreterealization for an abstract test involves a planning task,which can become cumbersome. Early pre-selectionthus helps to keep the plan generation task feasible.Moreover, abstract tests mainly depend on the devicemodel and accessibility conditions. If the test generatoris used in an interactive repair planning context, whereit is invoked at different stages, much of the reasoninginvolved in abstract test generation remains valid andcan be reused. If the device model does not change,only the ranking of tests and, of course, the concrete

    realizations, need to be re-computed to fit the new sit-uation.

    In Fig. 16, a plan (including some verbalization forthe sake of readability) is presented which is generatedby the system for a temperature sensor circuit simi-lar to the one depicted in Fig. 13. As can be seen, theplan has a structure and tagging of the abstract repre-sentation of a test plan in Genesis, as it is presentedin the right-hand window of Fig. 12. Hence it can beimported to the authoring system and automaticallyturned into a textual representation based on the prim-itive text building blocks for the different nodes in thegraph. Thus, the results of the model-based system takeon the same form as the current test plans and can befurther processed without requiring new tools or im-mediate changes in the actual work process.

    4. Exploiting statecharts in modelling for diagnosis

    4.1. Application scenario

    One of the three domains investigated in INDIAhas been dyehouse machinery as developed by THENGmbH. In particular, Fraunhofer IITB has looked atpost mortem diagnosis of the chemical distributor(CHD), a system to dose and distribute chemicals andcolours in a dyehouse automatically (see Fig. 17).A failure of this system may stop the overall produc-tion and damage material as well as environment con-siderably. It is therefore important to identify and re-move the underlying fault as fast as possible. A postmortem diagnostic system that assists dyehouse staffwith this task can considerably reduce downtime, sincein many cases it allows them to get the CHD up andrunning again without having to wait for THEN’s ser-vice technicians.

    Although the benefits of a diagnostic system forCHD are therefore quite clear, developing such a sys-tem has been out of question for THEN before westarted the INDIA project. One of the main reasons isthat introducing this technology into a company is toexpensive. Firstly, using today’s tools and techniquesrequires not only a lot of experience in diagnosis, butknowledge about concepts from Artificial Intelligenceor mathematics that are unknown to the ordinary do-main expert. Hence, either domain experts have to betrained on the tools or diagnostic experts have to beintroduced to the domain – both costly alternatives.Secondly, existing documents with engineering knowl-edge are hard to re-use or to import into the diagnostic

  • H. Milde et al. / Integrating model-based diagnosis techniques into current work process 117

    Fig. 17. Sketch of the Chemical Distributor CHD (c© THEN GmbH).

    knowledge base. Therefore, knowledge has to be rep-resented several times, with additional costs for main-taining consistency.

    These reasons are not specific for THEN but, in ourexperience, are typical for many manufacturers of ma-chinery. They have to be overcome to foster a morewidespread industrial application of intelligent diag-nostic systems.

    This chapter sketches Fraunhofer’s approach to re-duce these costs considerably. It is based on our vi-sion to integrate not only knowledge for diagnosis withexisting engineering or product data; but to integratebuisiness processes: development of diagnostic sys-tems for a device with development of the device it-self. To achieve a smooth integration, diagnostic devel-opment should rely on the same people and found onexisting knowledge and tools.

    4.2. The chemical distributor CHD

    In the course of dyeing, specific mixtures of chemi-cals have to be filled in the dyeing machine about threeor four times. CHD automates measuring every chemi-cal in the mixture off its tank, transporting the mixturethrough a network of pipes, valves and flaps to the re-

    questing dyeing machine and finally rinsing the pipesinvolved with water.

    Figure 18 shows the structure of the network. Thebasic components are the PLC, a flow meter, pipes,valves, pumps and flaps. Valves are grouped into col-lectors, which join different input pipes to an output,and distributors, which multiplex a single input pipeto multiple outputs. The flap between drain valve andair valve separates the collector side, which initially isfilled with water, from the distribution side, which ini-tially is filled with air.

    Roughly, CHD processes a request for chemicals asfollows: Initially, all the valves and flaps are closed,all the pumps are off. To compose the correct cock-tail of chemicals, the PLC signals all the flaps andvalves on the path from the tank of the first chemicalto the requesting machine to open. Then it starts thepump on that path. When the flow meter signals thatthe requested amount has been measured, the pumpstops and all the valves are signalled to close. Theother chemicals are dosed in the same way, with certainamounts of water between them.

    After the last chemical has been dosed, water isadded to push the cocktail past the flap into the distri-bution part of the network. Then the flap is closed, thevalves and flaps between air valve and requesting ma-

  • 118 H. Milde et al. / Integrating model-based diagnosis techniques into current work process

    Fig. 18. Structure of the CHD.

    chine are opened and compressed air is used to blowthe cocktail to the machine.

    Finally, a cocktail of water from all the segments ofthe collector network involved is composed in the sameway and transported to the drain valve close to the re-questing machine.

    4.3. Model based diagnosis of the chemicaldistributor with CHD-DIAS

    According to THEN, the very rare faults of thechemical distributor are typically valves stuck at closedor pumps that do not work. In that case an operator mayactivate a program like our post mortem diagnosticsystem CHD-DIAS (ChemicalDistributor-DiagnosticAssistant). CHD-DIAS is a model based system in thetradition of GDE [8], which analyses inconsistenciesbetween observations and device models.

    Initially, CHD-DIAS compares device models withhistories of the control signals to pumps and valvesas well as the history of sensor signals from the flowmeter. In principle, these can be observed by the pro-grammable logic controller (PLC) and reported to alogging facility. In practice, this proved impossible,since THEN was reluctant to modify the PLC-programcontrolling a real dyehouse to add a logging facility.This points at a typical problem in process industry:frequently suppliers of equipment do not have theirown test sites but have to test innovations at the sitesof certain pilot customers. This also supports our claimto integrate development of diagnostic functions with

    development of the device itself, since it involves con-siderably less risks than adding new functionality to analready running system.

    From the histories of the control signals CHD-DIAScan deduce if all components on a path through CHDshould have enabled flow of chemicals. Using thestructural model of CHD, this information is passed onto the model of the flowmeter. If this sensor does notdetect any flow, this contradiction can be exploited fordiagnosis: CHD-DIAS will suspect every componenton the path. To discriminate between them it will askthe dyehouse staff for observations about these compo-nents.

    Using the approach presented above CHD-DIAS isable to localize the typical faults of the chemical dis-tributor. Nevertheless, CHD-DIAS is still a prototype.To turn it into a product, one has to improve the userinterface and integrate it more closely with monitoringand control systems in the dyehouse. Furthermore, thetime required for diagnosis has to be decreased consid-erably.

    4.4. To reduce introduction costs

    Fraunhofer IITB has used the process of develop-ing CHD-DIAS as a guiding example to our searchfor ways to reduce the costs for introducing machineor plant building companies to diagnostic technology.The resulting approach seems to be applicable at leastto diagnosis of devices whose behaviour is describedby domain experts using (extended) state transition di-

  • H. Milde et al. / Integrating model-based diagnosis techniques into current work process 119

    agrams. Its main elements, which will be demonstratedin a second, are:

    1. We start from existing state transition diagramsdescribing the behaviour of the relevant devicecomponents or processes. If no such diagramsexist for certain components or processes, do-main experts may use a commercial CAE-toollike StateMate to design such diagrams.

    2. The diagrams are checked, and experts are askedto provide certain missing information aboutstates or transitions, which may be useful or nec-essary for diagnosis. The heuristics which en-able such pointed questions are a central resultof our work. Checking diagrams and questioningexperts should be done automatically in the fu-ture.

    3. We import the diagrams into the respective diag-nostic engine. To this end we export the diagramsfrom a commercial tool like StateMate into a newintermediate language called AML. AML is tai-lored to the representation of extended state tran-sition diagrams. Another program called SCI willthen translate the AML-representation into theformalism understood by our diagnostic engineMuDia. By this two step approach we minimizedouble work in translating models from differentcommercial discrete event modelling tools intoour own formalism.

    4. The models are used in standard consistencybased diagnosis.

    Thus, domain experts can specify much of the knowl-edge required for diagnosis without relying on ex-pensive diagnostic experts. In the following we willdemonstrate this approach while developing a diagnos-tic model for the behaviour of a valve.

    Initial diagram

    We start with the straightforward state transition di-agram shown in Figure 19. The eventsdoOpen anddoClose correspond to the positive flanks of the con-trol signals from the PLC. Since we cannot observe ifa particular valve is open or closed (at least not with-out recurring to the operator), the model specifies thatstatesOPENINGandCLOSINGwill have been left af-ter a certain time signalled by some timeout event. Thebehaviour of pumps, flaps and even blocks of valvesmay be described in a similar way: by specifying a flowblocking state, a flow enabling state and two interme-diate states.

    Fig. 19. State Transition Diagram describing the behavior of a valve.

    Augment the diagram

    The diagram from Fig. 19 is not very helpful withrespect to diagnosis, since it does not allow for any pre-dictions that could be compared to observations. Wehave developed a set of guidelines to help domain ex-perts in augmenting the model:

    1. For every state try to identifyinvariantswhichare refutable by considering the available signalhistories. An invariant is a feature wich does notchange throughout the state.In our diagram we can state that flow is en-abled throughoutOPENand disabled throughoutCLOSED. However, there do not seem to be anyinvariants forOPENINGor CLOSING. In partic-ular, flow may be disabled at the beginning ofOPENINGand enabled towards the end.

    2. Identify intermediate states, i.e., states duringwhich the operation mode of the device changesqualitatively.In our example,OPENINGand CLOSINGareobviously intermediate states whileOPENandCLOSEDare stable.

    3. For every intermediate state try to identify refu-table post conditionsthat indicate success ofthe mode transition. (Again, ‘refutable’ means‘refutable by considering the available signal his-tories’.)In our example the post conditions coincide withthe invariants of the successor states: flow has tobe enabled afterOPENINGand it has to be dis-abled afterCLOSING.

  • 120 H. Milde et al. / Integrating model-based diagnosis techniques into current work process

    4. For every intermediate state try to identify refu-tablepreconditions, i.e., conditions that necessar-ily have to be satisfied to successfully completethe corresponding operation mode transition.

    There are of course a lot of preconditions forsuccessfully opening or closing a valve; but noneof them can be refuted by considering the avail-able signal histories. However, if there were suchconditions and if some of them are not met, thena failing mode transition may be due to the samefailure that invalidated the precondition.

    5. For every intermediate state try to identify refu-tableconditions sufficient for successof the cor-responding mode transition.

    Again, there do not seem to be any such condi-tions for opening or closing the valve. However,if there were such conditions we could use themto prove that some failure is not an effect of someprevious failure by proving that at least one suf-ficient condition was satisfied.

    6. For every intermediate state try to identifyloweror upper bounds for the durationof the corre-sponding mode transition.

    In particular, the upper bound for the durationcan be combined e.g., with the post conditionsof an intermediate state to refute that the corre-sponding mode transition has finished in time.Fortunately, domain experts can very often givesuch bounds. In our example, a valve needs nolonger than 1 second to open and 0.5 s to close.

    7. Try to specify refutableoperation conditionsofthe device, i.e., conditions on the interaction ofthe device with its environment. For instancespecify conditions on the frequency of externalevents or whether certain events are not allowedin certain states.

    If some operation condition is not satisfied, one mighthypothesize a fault in the past of the system or itsenvironment rather than a fault in the current state.However, one has to be careful to meet the ‘No struc-ture in function’ principle: operation conditions haveto hold in every environment if the device is supposedto behave as intended. Therefore, we will not requirethat doClose is only allowed in stateOPENor thatdoOpen is only allowed in stateCLOSED:

    One may ask whether specification of invariants,preconditions and post conditions of a mode transi-tion does not suffer from the notorious frame problem,qualification problem and ramification problem, resp.But these problems do not seem to be significant inthis context, since we require conditions to be refutable

    Fig. 20. Augmented State Transition Diagram describing the behav-ior of a valve for diagnosis.

    from the limited set of available signal histories. Fol-lowing these guidelines to augment the valve model ofFig. 19 yields Fig. 20.

    In [14] we have argued that these heuristics maybe transferred to flow diagrams in general. Such dia-grams are used by experts in many domains, e.g., di-agrams of material or energy flow in process indus-try [9], function structures in engineering design [24],function graphs in plant automation [1] or phase mod-els to describe production processes [4]. This indicatesthat developing ‘wizards’ which rely on these heuris-tics to guide domain experts in modelling for diagnosisis indeed a promising approach.

    Import the state transition diagram into the diag-nostic engine

    Domain experts should use their favourite discretemodelling tool, e.g., StateMate or StateFlow, to spec-ify state transition diagrams. However, these tools willusually not provide export filters to languages under-stood by today’s diagnostic engines. In particular, mostdiagnostic engines require that implicit assumptionslike Persistence by Default (‘if evente arrives in states,but there is no transition labelled withe from s whoseguard conditions are satisfied, then the system stays ins’) are specified explicitly. Our diagnostic tool MuDiais no exception. Hence we have to translate the statetransition diagram and explicate such assumptions onthe way.

    To minimize double work in developing translatorsfrom different discrete event modelling tools into ourown formalism DEEP, we use a two step approach.In the first step we translate the representation con-

  • H. Milde et al. / Integrating model-based diagnosis techniques into current work process 121

    Fig. 21. AML-representation of the augmented state transition dia-gram.

    structed by the commercial tool into a new intermedi-ate language called AML. [3] describes AML in moredetail. Figure 21 shows the AML-representation of thediagram from Fig. 20.

    In the second step, a program called SCI (StateChart Integrator) translates AML-representations intoDEEP-representations. A first version of SCI has beendescribed in [3]. The current version described in [14]eliminates a couple of serious shortcomings.

    From the AML-representation of a state transitiondiagram SCL generates five types of implication for-mulae in a temporal logic language:

    – Transition rulesdescribe the preconditions andeffects of transitions.

    – Intra-state action rulesdescribe the actions whichare processed whenever entering or leaving a stateor reacting to an event without leaving the currentstate.

    – Activity rulesdescribe invariants of a state and ac-tivities that proceed as long as the state is active.They correspond to traditional intra-state con-straints.

    – Definition rulesdefine certain types of events, liketimeouts or changing values of system variables.

    – Explanation closure rulesdescribe under whichcircumstances state and variable values remainunchanged.

    Unfortunately there is not enough space to give exam-ples for these rules or the DEEP-representation of thevalve model. The reader should look at [14] for the for-mer and contact the author for the latter.

    4.5. Discussion

    The development of CHD-DIAS has shown again:while today’s diagnostic systems are powerful enoughto provide real value to operators of complex technicaldevices, high costs prevent the routine development ofsuch systems. In this chapter we have sketched an ap-proach to reduce these costs in domains, where expertsmodel behaviour with state transition diagrams. Theapproach strives at enabling domain experts to con-struct diagnostic models in the environment they areused to, thus minimizing the need for costly externalexperts.

    Other approaches like [2,6,7,17,28,29] do also usestate transition diagrams for model based diagnosis. Itis still unclear how they relate to our approach. Thereare also interconnections to systematic requirementsengineering and to FMEA which need to be further in-vestigated.

    Diagnosis based on (extended) state transition dia-grams like statecharts should gain relevance in the fu-ture, since they are increasingly used in industry tocompletely or partially model so called reactive sys-tems. This class encompasses e.g., many embeddedsystems or control systems [15]. Moreover, the ap-proach does also seem to be applicable to flow dia-grams in general, as argued in [14].

    During the last years the idea of identifying sets ofmore or less standardized modules for implementation,called ‘well known solutions’, ‘basic steps’ or ‘pat-tern’, resp. has come up in different disciplines (cf [10,12,16,24]). Founding diagnosis on libraries with diag-nostic models for these standard modules would notonly further reduce development costs but present an-other step towards the integration of diagnostic devel-opment into device development.

    5. Summary

    The INDIA project was an attempt to promote a ma-jor step in the transfer of model-based systems into in-dustrial applications. This required not only the fur-

  • 122 H. Milde et al. / Integrating model-based diagnosis techniques into current work process

    ther development of modeling and model-based rea-soning techniques, but also consideration of the contextin which the systems were to be used as tools support-ing the current work process. The solutions includedapproaches to

    – exploiting existing engineering knowledge, as inthe chemical distributor case study,

    – generating diagnostic output in a form that is fa-miliar to experts involved in the task and that canbe processed by existing tools like fault trees inthe fork lift application, and

    – exporting results to a (new generation of a) stan-dard tool like authoring systems for the automo-tive repair plans.

    The case studies indicate that with the mature partof the model-based technology, it is possible to pro-vide effective support and speed-up of current steps inthe work processes. This applies particularly to thosesteps that require technical knowledge about the de-vices and plants, but apply in a very structured way un-der well-defined goals. But it is often this part that istime-consuming and limits the exploitation of expertknowledge for more creative and ‘non-standard’ tasks.Model-based systems could be shown to provide a so-lution to the problem of having to carry out the sametask for a number of variants. Of course, the projectalso helped to reveal shortcomings and limitations ofthe current technology. We feel that much more work isneeded to effectively support the generation of modelsfor the various tasks and also to model the tasks to besupported as real WORK PROCESSES including theirpractical conditions and constraints in order to providethe appropriate tools.

    Acknowledgments

    We would like to thank our colleagues at IITB, LKIand TUM as well as our partners in INDIA for theirvaluable contributions. This work has been supportedby the Bundesministerium für Bildung, Wissenschaft,Forschung und Technologie (BMBF) under the grant01 IN 509 D 0, INDIA.

    References

    [1] W. Ahrens, H.-J. Scheurlen and G.-U. Spohr,Informationsori-entierte Leittechnik, Oldenbourg-Verlag, 1997 (in German).

    [2] P. Baroni, G. Lamperti, P. Pogliano and M. Zanella, Diagnosisof large active systems,Artificial Intelligence110(1999), 135–183.

    [3] A. Brinkop, Statecharts zur Simulation und Diagnose.Fraunhofer-Institut für Informations- und DatenverarbeitungIITB, Karlsruhe, INDIA-Report 99-01, 1999 (in German).

    [4] H. Buchner, J. Lauber and M. Polke, Das Informationsmodell:Basis für die interdisziplinäre Prozeßbeschreibung,AT – Au-tomatisierungstechnik42(1) (1994), 5–10 (in German).

    [5] F. Cascio, L. Console, M. Guagliumi, M. Osella, A. Panati,S. Sottano and D. Thes