Self-organizing maps for virtual sensors, fault detection and fault isolation in diesel engines

Master's thesis (examensarbete) in Automatic Control, Linköping Institute of Technology, by Conny Bergkvist and Stefan Wikner. Reg nr: LiTH-ISY-EX--05/3634--SE. Linköping 2005.



Self-organizing maps for virtual sensors, fault detection and fault isolation in diesel engines

Master's thesis (examensarbete) in Automatic Control, Linköping Institute of Technology

by Conny Bergkvist and Stefan Wikner

Reg nr: LiTH-ISY-EX--05/3634--SE

Supervisors: Johan Sjöberg (LiTH), Urban Walter (Volvo), Fredrik Wattwil (Volvo)

Examiners: Svante Gunnarsson (LiTH), Jonas Sjöberg (Chalmers)

Linköping, 18th February 2005

Avdelning, Institution / Division, Department: Institutionen för systemteknik (Department of Electrical Engineering), 581 83 Linköping

Datum / Date: 2005-02-11

Språk / Language: English

Rapporttyp / Report category: Examensarbete (master's thesis)

ISRN: LITH-ISY-EX--05/3634--SE

URL för elektronisk version / URL, electronic version: http://www.ep.liu.se/exjobb/isy/2005/3634/

Titel / Title: Self-organizing maps for virtual sensors, fault detection and fault isolation in diesel engines

Författare / Authors: Conny Bergkvist and Stefan Wikner

Sammanfattning / Abstract: This master's thesis discusses the use of self-organizing maps in a diesel engine management system. Self-organizing maps are one type of artificial neural network, good at visualizing data and at solving classification problems. The system studied is the Vindax® development system from Axeon Ltd. By rewriting the problem formulation, function estimation and conditioning problems can be solved in addition to classification problems. In this report a feasibility study of the Vindax® development system is performed, and as implementations the inlet air system is diagnosed and the engine torque is estimated. The results indicate that self-organizing maps can be used in future diagnosis functions, as well as in virtual sensors, when physical models are hard to accomplish.

Nyckelord / Keywords: self-organizing maps, neural network, virtual sensor, diesel engine, fault detection, fault isolation, automotive, development system

Abstract

This master's thesis discusses the use of self-organizing maps in a diesel engine management system. Self-organizing maps are one type of artificial neural network, good at visualizing data and at solving classification problems. The system studied is the Vindax® development system from Axeon Ltd. By rewriting the problem formulation, function estimation and conditioning problems can be solved in addition to classification problems.

In this report a feasibility study of the Vindax® development system is performed, and as implementations the inlet air system is diagnosed and the engine torque is estimated. The results indicate that self-organizing maps can be used in future diagnosis functions, as well as in virtual sensors, when physical models are hard to accomplish.

Keywords: self-organizing maps, neural network, virtual sensor, diesel engine, fault detection, fault isolation, automotive, development system


Preface

The thesis has been written by two students, one from Chalmers University of Technology and one from Linköping University. For practical purposes this report exists in two versions: one at Chalmers and one at Linköping. Their contents are identical; only the framework differs slightly to meet each university's rules.

Acknowledgments

Throughout the thesis work a number of people have been helpful and engaged in our work. We would like to thank our instructors Fredrik Wattwil and Urban Walter at Engine Diagnostics, Volvo Powertrain, for their support and engagement. In addition we thank our instructors and examiners, Svante Gunnarsson and Johan Sjöberg at Linköping University and Jonas Sjöberg at Chalmers University of Technology. The support team from Axeon Ltd (Chris Kirkham, Helge Nareid, Iain MacLeod and Richard Georgi), the project coordinator Carl-Gustaf Theen, and the other employees at the Engine Diagnostics group have also been of great help for the success of this thesis.

Notation

Symbols

x, X   Boldface letters are used for vectors and matrices.

Abbreviations

ABSE   Absolute Error
ANN    Artificial Neural Network
DC     Dynamic Cycle
EMS    Engine Management System
ETC    European Transient Cycle
PCI    Peripheral Component Interconnect
RMSE   Root-Mean-Square Error
SOM    Self-Organizing Map
VDS    Vindax® Development System
VP     Vindax® Processor

Contents

1 Introduction
  1.1 Background
  1.2 Problem description
  1.3 Purpose
  1.4 Goal
  1.5 Delimitations
  1.6 Method
  1.7 Thesis outline

2 Background theory
  2.1 Self-organizing maps
  2.2 Diesel engines
    2.2.1 The basics
    2.2.2 Four-stroke engine
    2.2.3 Turbo charger
    2.2.4 What happens when the gas pedal is pushed?

3 The Vindax® Development System
  3.1 Introduction
    3.1.1 Labeling step
    3.1.2 Classification step
  3.2 Conditioning of input signals
  3.3 Classification of cluster data
    3.3.1 Introduction
    3.3.2 Results and discussion
  3.4 Function estimation
    3.4.1 Introduction
    3.4.2 Results and discussion
  3.5 Handling dynamic systems
    3.5.1 Introduction
    3.5.2 Results and discussion
  3.6 Comparison with other estimation and classification methods
    3.6.1 Cluster classification through minimum distance estimation
    3.6.2 Function estimation through polynomial regression

4 Data collection and pre-processing
  4.1 Data collection
    4.1.1 Amount of data
    4.1.2 Sampling frequency
    4.1.3 Quality of data
    4.1.4 Measurement method
  4.2 Pre-processing

5 Conditioning - Fault detection
  5.1 Introduction
  5.2 Method
  5.3 Results and discussion

6 Classification - Fault detection and isolation
  6.1 Introduction
  6.2 Method
    6.2.1 Leakage isolation
    6.2.2 Reduced intercooler efficiency isolation
  6.3 Results and discussion

7 Function estimation - Virtual sensor
  7.1 Introduction
  7.2 Method
  7.3 Results and discussion

8 Conclusions and future work
  8.1 Conclusions
    8.1.1 Conditioning
    8.1.2 Classification
    8.1.3 Function estimation
  8.2 Method criticism
  8.3 Future Work

Bibliography

A RMSE versus ABSE

B Measured variables

C Fault detection results
  C.1 90 % activation frequency resolving
  C.2 80 % activation frequency resolving
  C.3 70 % activation frequency resolving
  C.4 60 % activation frequency resolving
  C.5 Most frequent resolving

Chapter 1

Introduction

This chapter gives the background, the purpose, the goal, the delimitations, the methods used and the outline of the thesis.

    1.1 Background

The demand for new methods and technologies in the Engine Management System, EMS, is increasing. Laws that govern the permissible level of pollutants in the exhaust of diesel engines have to be followed at the same time as drivers demand high performance and low fuel consumption, putting the EMS under pressure.

Applications in the EMS are based on a number of inputs, measured by sensors and estimated by models. Limitations of these sensors and models create situations where qualitative input signals cannot be obtained. Adding new sensors is expensive and creates an additional need for diagnostics. Together, this forces engineers to look at new methods to attain these values, or to find ways around the problem.

Identifying models using Artificial Neural Networks, ANNs, can be one solution. A lot of research has been put into this area, and the increasing number of applications with implemented ANNs indicates that the technology could be of use in an EMS. There are development systems available for producing hardware and software applications using ANNs to be deployed in the EMS. The Vindax® Development System, VDS, by Axeon Limited (www.axeon.com) is an example of this. The system may be capable of solving problems within the EMS.

The thesis work has been performed at Volvo Powertrain Corporation. The diagnosis group wants to know whether the VDS can be used to help in the area of diagnosing diesel engines.


    1.2 Problem description

    Three different kinds of problems are investigated in this thesis:

Conditioning - Fault detection: Conditioning a system is one way of determining if the system is fault free. Assume that a system has states x ∈ Θ, where x denotes the system states and Θ denotes a set, if it is fault free. Determining if x ∈ Θ is a conditioning problem.

Classification - Fault isolation: There can be different kinds of faults that need to be isolated. Assume that a system with states x is fault free if x ∈ Θ0. Assume also that k different classes of errors can occur and that each error causes the states to belong to one of the sets Θ1, Θ2, . . . , Θk. Determining which set x belongs to is a classification problem.

Function estimation - Virtual sensor: A system generates output according to y = f(u)


    1.4 Goal

    To fulfill the purpose, the thesis goal is divided into three parts:

• Estimate the output torque with better accuracy than the value estimated by the EMS

• Detect air leakages in the inlet air system, as well as a reduced intercooler efficiency, with an accuracy sufficient to make a diagnosis

• Isolate the same errors with an accuracy sufficient to make a diagnosis

    All experiments are performed on a Volvo engine using the VDS.

    1.5 Delimitations

The evaluation of the VDS will be based upon a PCI²-card version of the Vindax® hardware on a Windows PC.

For the fault detection and isolation problems, data from one single engine individual are used; however, two different types of test running cycles are used. For function estimation, data from one single type of test cycle are used, but the test cycle is run on two different types of engines. For all problems, the data used for verification are of the same kind as the data used for training.

    1.6 Method

To fulfill the purpose and reach the goal of this thesis, a feasibility study of the VDS is performed. The study reveals basic properties of the system and serves as a solid base for the continued work. With the feasibility study as a base, the actual development of the applications takes place.

² Peripheral Component Interconnect, PCI, is a standard for connecting expansion cards to a PC.


    1.7 Thesis outline

Chapter 2 Introduction to the theory behind the Vindax® Processor, VP, used in the VDS: self-organizing maps. The diesel engine is also described, to give a sense of the environment where the algorithm is going to be deployed.

Chapter 3 A feasibility study of the VDS and its three main application areas: conditioning, classification and function estimation.

Chapter 4 Here the collection and pre-processing of data are discussed. Important issues such as the representability of the data are handled, both in general and in specific terms.

Chapters 5-7 Each of these chapters describes the development of an application. These are very specific chapters and present results that can be expected to be achieved with the VDS.

    Chapter 8 Ends the report with conclusions and recommendations for futurework.

Chapter 2

Background theory

This chapter gives a short introduction to self-organizing maps and diesel engines for readers with little or no prior knowledge of these areas.

    2.1 Self-organizing maps

Professor Kohonen developed the SOM algorithm in its present form in 1982 and presents it in his book [1]. The algorithm maps high-dimensional data onto a low-dimensional array. It can be seen as a conversion of statistical relationships into geometrical ones, suitable for visualization of high-dimensional data. As this is a kind of compression of information, preserving topological and/or metric relationships of the primary data elements, it can also be seen as a kind of abstraction. These two aspects make the algorithm useful for process analysis, machine perception, control tasks and more.

A SOM is actually an ANN, categorized as a competitive learning network. These kinds of ANNs contain neurons that, for each input, compete over determining the output. Each time the SOM is fed with an input signal, a competition takes place: one neuron is chosen as the winner and decides the output of the network.

The algorithm starts with initiation of the SOM. Assume that the input variables, ξi, are fed to the SOM as a vector u = [ξ1, ξ2, . . . , ξn]T. Each neuron i holds a reference vector mi of the same dimension as u; at initiation these reference vectors are randomized. For each input sample, the distance between each reference vector and the


input sample is calculated. According to equation (2.1)², the winner mc(t) is the neuron closest to the input sample.

||u(t) − mc(t)|| ≤ ||u(t) − mi(t)||   ∀i   (2.1)

Here t is a discrete time coordinate of the input samples. When a winner is selected, the reference vectors of the network are updated: the winning neuron and all neurons that belong to its neighborhood are updated according to equation (2.2).

mi(t + 1) = mi(t) + hc,i(t)(u(t) − mi(t))   (2.2)

The function hc,i(t) is a neighborhood function, decreasing with the distance between mc and mi. The neighborhood function is traditionally implemented as a Gaussian (bell-shaped) function. For convergence reasons, hc,i(t) → 0 when t → ∞. This regression distributes the neurons to approximate the probability density function of the input signal.

To give a more qualitative description of the SOM algorithm, an example is considered. The input signal, generated by taking points on the arc of a half circle³, is shown in figure 2.1. This is a signal of dimension two that the SOM is going to map. The SOM in this case has a size of 16 neurons, i.e. a 4-by-4 array. To illustrate the learning process, the array is imagined as a net, with knots representing neurons, and elastic strings keeping a relation to each knot's neighbors. For initiation, the SOM is distributed over the input space according to figure 2.1: the reference vectors are randomized in the input space (using a Gaussian distribution).

During training, a regression process adjusts the SOM to the input signal. For each input, the closest neuron is chosen to be the winner. When a winner is selected, it and its neighbors are adjusted towards the input. This can be seen as taking the winning neuron and pulling it towards the input; the elastic strings in the net cause the neighbors to follow the winner. Referring to equation (2.2), the neighborhood function, hc,i(t), determines the elasticity of the net.

The regression continues over a finite number of steps to complete the training process. In figure 2.1, network shapes after 10 and 100 steps of training are illustrated. As can be seen, a nonlinear, ordered and smooth mapping of the input signal is formed after 100 steps. The network has formed an approximation of the input signal.⁴

² In equation (2.1) the norm used is usually the Euclidean (2-norm), but other norms can be used as well. The VDS uses the 1-norm.

³ Although there is probably no physical system giving this input, it suits the purpose of illustrating the SOM algorithm.

⁴ Additional training has no significant effect on the structure of the neurons.
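The training procedure above can be sketched in a few lines of Python. This is our own illustrative reconstruction of the half-circle example, not the VDS implementation: the learning-rate schedule, the neighborhood width and the use of the Euclidean norm for the winner search are assumptions chosen only to make the sketch run.

```python
import numpy as np

rng = np.random.default_rng(0)

# Input signal: points on the arc of a half circle (the 2-D example above).
theta = rng.uniform(0.0, np.pi, 1000)
inputs = np.column_stack([np.cos(theta), np.sin(theta)])

# A 4-by-4 SOM: 16 neurons, each holding a 2-D reference vector.
grid = np.array([(r, c) for r in range(4) for c in range(4)], dtype=float)
m = rng.normal(0.0, 0.5, size=(16, 2))  # randomized initiation

def train(m, inputs, steps, sigma0=2.0, alpha0=0.5):
    for t in range(steps):
        u = inputs[t % len(inputs)]
        # Equation (2.1): the winner is the neuron closest to the sample.
        c = int(np.argmin(np.linalg.norm(m - u, axis=1)))
        # Gaussian neighborhood, shrinking with time so that h -> 0.
        sigma = sigma0 * np.exp(-t / steps)
        alpha = alpha0 * np.exp(-t / steps)
        d2 = np.sum((grid - grid[c]) ** 2, axis=1)
        h = alpha * np.exp(-d2 / (2.0 * sigma ** 2))
        # Equation (2.2): pull the winner and its neighbors towards the input.
        m = m + h[:, None] * (u - m)
    return m

m = train(m, inputs, steps=1000)
```

After training, the reference vectors form an ordered approximation of the arc, mirroring the net-of-knots picture: each sample pulls its winner towards itself, and the winner's grid neighbors follow through h.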


Figure 2.1. Illustration of the SOM algorithm. Each step involves presenting one input sample to the SOM. The network has started approximating the distribution of the input signal already after 10 steps. After 100 steps the network has formed an approximation of the input signal.


    2.2 Diesel engines

It is not crucial to know how diesel engines work to read this report, but some knowledge might increase the understanding of why certain signals are used and how they affect the engine.

This thesis will only discuss the four-stroke diesel engine, because that is the type used in Volvo's trucks. To learn more about engines, both diesel and Otto (e.g. petrol), see e.g. [2], [3].

    2.2.1 The basics

A diesel engine, or compression ignition engine as it is sometimes called, converts air and fuel through combustion into torque and emissions.

    Figure 2.2. A picture from the side of one cylinder in a diesel engine.

Figure 2.2 shows the essential components in the engine. The air is brought into the engine through a filter to the intake manifold. There the air is guided into the different cylinders. In the cylinder the air is compressed, and when the fuel is injected the mixture self-ignites. The combustion results in a pressure that generates a force on the piston. This in turn is transformed into torque on the crankshaft, making it rotate. Another result of the combustion is the exhaust gases, which go through the exhaust manifold, the muffler and out into the air.

In this thesis, the measured intake manifold pressure and temperature are called boost pressure and boost temperature, respectively.


    2.2.2 Four-stroke engine

    The engine operates in four strokes (see figure 2.3):

    Figure 2.3. The four strokes in an internal combustion engine.

1. Intake stroke. (The inlet valve is open and the exhaust valve is closed.) The air in the intake manifold is sucked into the cylinder while the piston moves downwards. In engines equipped with a turbo charger, see section 2.2.3, air will be pressed into the cylinder.

2. Compression stroke. (Both valves are closed.) When the piston moves upwards the air is compressed. The compression causes the temperature to rise. Fuel is injected when the piston is close to the top position, and the high temperature causes the air-fuel mixture to ignite.

3. Expansion stroke. (Both valves are closed.) The combustion increases the pressure in the cylinder and pushes the piston downwards, which in turn rotates the crankshaft. This is the stroke where work is generated.

4. Exhaust stroke. (The exhaust valve is open and the inlet valve is closed.) When the piston moves upwards again it pushes the exhaust gases out into the exhaust manifold.

    After the exhaust stroke it all starts over with the intake stroke again. As a resultof this, each cylinder produces work during one stroke and consumes work (friction,heat, etc.) during three strokes.

    2.2.3 Turbo charger

The amount of work, and in the end the speed, generated by the engine depends on the amount of fuel injected, but the relative mass of fuel and oxygen is also important. For the fuel to be able to burn there must be enough oxygen. From the intake manifold pressure the available amount of oxygen is calculated, and in


turn, together with the wanted work, the amount of fuel to inject is calculated. So a manifold pressure that is not high enough might lead to lower performance.

    Figure 2.4. A schematic picture of a diesel engine.

To get more air, and with that more oxygen, into the cylinders, almost all large diesel engines of today use a turbo charger. This will increase the performance of the engine, as more fuel can be injected. The turbo charger uses the heat energy in the exhaust gases to rotate a turbine. The turbine is connected to a compressor (see figure 2.4) that pushes air from the outside into the cylinders.

One effect of the turbo charger is that the compressor increases the temperature of the air. When the air is warmer it contains less oxygen per volume. An intercooler can be used to increase the density of oxygen and thereby increase the performance of the engine. After the intercooler the air has (almost) the same temperature as before going through the compressor, but at a much higher pressure. From the ideal gas law, ρ = m/V = p/(RT), we get that the density has increased. This way we can get more air mass into the same cylinder volume, using only energy that would otherwise be thrown away.
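A small numeric illustration of this density argument (the pressures and temperatures below are hypothetical round figures chosen only to show the effect, not measurements from the engine studied):

```python
R = 287.05  # specific gas constant for dry air [J/(kg K)]

def air_density(p, T):
    """Ideal gas law: rho = m/V = p / (R * T), with p in Pa and T in K."""
    return p / (R * T)

# Hypothetical figures: ambient air versus charge air after the compressor
# and intercooler (back near ambient temperature, but at twice the pressure).
ambient = air_density(p=100e3, T=300.0)  # about 1.16 kg/m^3
charged = air_density(p=200e3, T=310.0)  # about 2.25 kg/m^3
print(charged / ambient)  # close to 2: almost twice the air mass per cylinder filling
```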

    2.2.4 What happens when the gas pedal is pushed?

When the driver pushes the pedal, the engine produces more torque. Simplified, the process looks like this:

    1. The driver pushes the pedal.

    2. More fuel is injected into the cylinders.

3. More fuel yields a larger explosion, which leads to more heat and in turn more pressure in the cylinder (ideal gas law).


    4. The increased pressure pushes the piston harder which leads to higher torque.

    But it does not stop here; after a little while the turbo kicks in.

5. The increased amount of exhaust gases, together with the increased temperature, accelerates the turbine.

6. The increased speed of the turbine makes the compressor push more air into the intake manifold, and the pressure rises.

7. This leads to more air in the cylinders.

8. With more air in the cylinders, more fuel can be used in the combustion, which in turn leads to an even larger explosion and even more torque.

The final four steps are repeated until a steady state is achieved. This is a rather slow process, taking a second or two before steady state is reached.


Chapter 3

The Vindax® Development System

In this chapter the VDS by Axeon Limited is introduced through a feasibility study. To be able to visualize and easily understand the way the VDS works, low-dimensional inputs are used to create illustrative examples. To finish the chapter, the VDS is compared with other systems trying to solve the same problems.

In this chapter the variables y and u denote the output signal and the input signal, respectively.

    3.1 Introduction

The VDS deploys a SOM algorithm in hardware through a software development system. The hardware used for this thesis consists of a PCI version of the VP. It consists of 256 RISC processors, each representing a neuron in a SOM. The architecture allows the computations to be done in parallel, which decreases the working time substantially. The dimension of the input vectors, i.e. the memory depth, is currently limited to 16.

Configuration of the processor is quite flexible. It can be divided into up to four separate networks that can work in parallel, individually or in other types of hierarchies. There is also a possibility to swap the memory (during operation), enabling more than one full-size network to run in the same processor.

The software is simple to use and provides good possibilities to handle and visualize data. It also includes some data pre-processing tools as well as output functions. In the software system, the VP is used with network sizes of 256 neurons and smaller. To test larger networks, a software emulation mode, where the maximum size is 1024 neurons and the memory depth is 1024, is available. The emulation mode does not use the VP and therefore increases the calculation times considerably.



The VDS is able to solve the classification problems described in section 1.2, as shown in section 3.3. By rewriting the conditioning and function estimation problems, these can be solved as well; how this is done is described in sections 3.2 and 3.4.

Development of applications using the VDS involves four steps:

    1. Data collection and pre-processing

    2. Training

    3. Labeling

    4. Classification

Data collection and pre-processing are described in chapter 4. During the training step, the network is organized using the SOM algorithm presented in section 2.1. The labeling and classification steps are described below.

    3.1.1 Labeling step

The training has formed an approximation of the input signal distribution, and this approximation should now be associated with output signals, i.e. be labeled.

The labeling step uses the same input data as the training step, together with measured output data. It requires the correct output value to be known for each input. The neuron responding to a particular input sample should have an output value matching the measured output value. Presenting the data to the network makes it possible to assign values to each neuron.

As there are more input samples than neurons, each neuron will have many output value candidates. If these candidates are not identical, which is most often the case, some kind of strategy is needed for choosing which one to use as the label. Either it is chosen manually for each neuron, or one of the automatic methods of the VDS is used. There are different kinds of methods suitable for different kinds of problems, and they are discussed in sections 3.2-3.4 in their respective contexts.

    3.1.2 Classification step

If correctly labeled, the network can be used for mapping new inputs using the classification step. This is done in a very straightforward manner and can be seen as function evaluation. For each new input a winning neuron is selected, as described in section 2.1. The label of the winner is the output of the network.
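The labeling and classification steps can be sketched as follows. This is a minimal reconstruction, not VDS code: the one-dimensional toy map, the sample data and the majority-vote conflict resolution (the "most frequent" method discussed in section 3.3) are our own assumptions, and an already trained map m is taken as given.

```python
import numpy as np
from collections import Counter

def winner(m, u):
    # Winner selection with the 1-norm, as used by the VP.
    return int(np.argmin(np.abs(m - u).sum(axis=1)))

def label(m, inputs, outputs):
    # Labeling step: collect, for each neuron, the measured output values
    # of the samples it won, then keep the most frequent one as its label.
    votes = [Counter() for _ in range(len(m))]
    for u, y in zip(inputs, outputs):
        votes[winner(m, u)][y] += 1
    return [v.most_common(1)[0][0] if v else None for v in votes]

def classify(m, labels, u):
    # Classification step: the label of the winning neuron is the output.
    return labels[winner(m, u)]

# Toy "trained" map: four neurons along one dimension.
m = np.array([[0.0], [1.0], [2.0], [3.0]])
inputs = np.array([[0.1], [0.2], [1.1], [2.0], [2.9], [3.1]])
outputs = ["low", "low", "low", "high", "high", "high"]
labels = label(m, inputs, outputs)
print(classify(m, labels, np.array([0.05])))  # prints "low"
```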


    3.2 Conditioning of input signals

To solve a conditioning problem, according to section 1.2, the distance measurement has to be involved. As discussed in section 2.1, the distance from the input sample to each neuron is calculated when a winner is selected. It is this distance that can be used for conditioning of signals.

The idea is to save the maximum distance between the neurons and the input signals that occurs when the network is presented with input signals from a fault-free system. An error has probably occurred if this maximum distance is exceeded when new inputs are presented to the network.

To do this, the network is first trained on data from a system with no faults. After that, the same data are fed to the network again, and the distance between the input signal and the winning neuron is saved for each sample. The maximum distance for each neuron is used as its label.

During the classification step, the label value of the winning neuron is subtracted from the distance between this neuron and the current sample, giving the difference between the current distance and the maximum distance that occurred during training. Previously unseen data have been presented to the VP when this difference is larger than zero. Assuming that the network has been trained on representative data (see section 4.1), this implies that an error has probably occurred.
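The max-distance labeling and the fault check can be sketched as follows (again our own reconstruction under the same assumptions as before: 1-norm distances and a toy one-dimensional map m taken as already trained on the fault-free data):

```python
import numpy as np

def condition_labels(m, fault_free_inputs):
    # Feed the fault-free training data through the trained network again and
    # save, for each neuron, the maximum distance to the samples it won.
    max_dist = np.zeros(len(m))
    for u in fault_free_inputs:
        d = np.abs(m - u).sum(axis=1)  # 1-norm distance to every neuron
        c = int(np.argmin(d))
        max_dist[c] = max(max_dist[c], d[c])
    return max_dist

def check(m, max_dist, u):
    # Classification step: subtract the winner's label (its fault-free maximum
    # distance) from the current distance. A positive result means previously
    # unseen data, i.e. an error has probably occurred.
    d = np.abs(m - u).sum(axis=1)
    c = int(np.argmin(d))
    return d[c] - max_dist[c]

m = np.array([[0.0], [1.0]])                           # toy trained map
fault_free = np.array([[0.05], [0.1], [0.95], [1.1]])
max_dist = condition_labels(m, fault_free)
print(check(m, max_dist, np.array([0.08])))  # <= 0: consistent with fault-free data
print(check(m, max_dist, np.array([0.5])))   # > 0: probable fault
```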

Applying this to the real problem is as simple as applying it to a constructed one; see chapter 5 for a real-world example that illustrates the use of this technique.


    3.3 Classification of cluster data

To explore the capability of VPs to solve classification problems (see section 1.2), an example with clusters in three dimensions is made.

    3.3.1 Introduction

Three Gaussian clusters are created, with mean values in three different points. They are labeled blue, red and black. Different variances are used to create two sets of input data, shown in figures 3.1-3.2. Printed in black and white it is hard to see which cluster the data points belong to; the purpose of the figures is, however, to show how the clusters are distributed.

Figure 3.1. First cluster setup with every tenth sample of the data plotted from different angles.

The task is to take an input signal, composed of the three coordinates for each point, and classify it as blue, red or black. To do this, the VP is first trained (using all 256 neurons). This creates an approximation of the input signal distribution that, hopefully, contains three regions. This can be visualized by looking at the distance between neurons, using the visualization functionality in the VDS. This is very useful when the input signal has more than three dimensions, as it is then not possible to visualize the actual data itself. In this way it is possible to get an idea of how the neurons are distributed over the input space also when working in higher dimensions. Then the SOM is labeled to be able to give some output. This is the part where the settings for the classification are made.


Figure 3.2. Second cluster setup with every tenth sample of the data plotted from different angles.

When labeling the SOM, there will be conflicts when a neuron is associated with inputs carrying more than one label. For example, if a neuron is associated with inputs labeled red 60 times and black 40 times, the neuron will be marked multi-classified after the labeling process in the VDS. This situation has to be resolved, and there are different methods for this.

    In the VDS, there are six different multi-classification resolving methods:

    1. Most frequent

    2. Activation frequency threshold

    3. Activation count threshold

    4. Activation frequency range threshold

    5. Activation count range threshold

    6. Manually

The first two are the most appropriate for the situations occurring in this thesis, which is why the other methods are not handled here.¹

¹ The third method is similar to the second one but harder to use, as the number of activations depends on the amount of data. There is no reason to use a range (methods 4-5) instead of a threshold, as the neurons with a high number of activations definitely should be labeled. Finally, the manual method is very time-consuming and does not really provide an advantage compared to the first two methods.


Using the first method, the VDS assigns each neuron the label that occurred most frequently. In the example above, this would mean that the neuron is classified as red.

With the second option it is possible to affect the accuracy of the classification. There will be a trade-off between the number of unclassified inputs and the accuracy of the ones classified. If the activation frequency threshold is set to a high value, say 90 percent, the classification will be accurate but there can be many inputs that are classified as unknown. In the example above, this would mean that the neuron is not classified, and the threshold has to be lowered to 60 percent to classify the neuron as red.
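The two resolution methods can be viewed as one labeling rule with an optional frequency threshold. A minimal sketch (all names here are hypothetical; the real VDS labeling is configured through its tools, not this code):

```python
import numpy as np

def label_neurons(winners, labels, n_neurons, freq_threshold=None):
    """Assign a label to each neuron from its activation counts.

    freq_threshold=None -> 'most frequent' resolution;
    a value in (0, 1]   -> 'activation frequency threshold' resolution,
    leaving the neuron unlabeled (None) if no label dominates enough.
    """
    neuron_labels = [None] * n_neurons
    for n in range(n_neurons):
        hits = labels[winners == n]          # labels of inputs that won neuron n
        if hits.size == 0:
            continue                         # neuron never activated
        classes, counts = np.unique(hits, return_counts=True)
        best = counts.argmax()
        freq = counts[best] / counts.sum()   # activation frequency of best label
        if freq_threshold is None or freq >= freq_threshold:
            neuron_labels[n] = classes[best]
    return neuron_labels
```

With the 60/40 neuron from the example, a threshold of 0.9 leaves it unlabeled, while lowering the threshold to 0.6 (or using most frequent) labels it red.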

Figure 3.3. Possible label regions for input classification when using data from figure 3.2 as input.

This is visualized in figure 3.3. In between the clusters there are inputs that belong to one cluster but are situated closer to another cluster center. For example, there are many inputs in figure 3.2 that belong to the red cluster but are closer to the black cluster center (and vice versa). Depending on how high the frequency threshold is set for the labeling, these inputs will be classified as black or not classified at all.

With a high threshold there will be many neurons without a label. This makes the unclassified region big and the accuracy of the classified inputs high. The drawback is that many inputs will not be classified at all, e.g. trying to distinguish between an engine with no faults and an engine with air leakage will return many answers saying nothing.


    3.3.2 Results and discussion

The results from using the VDS to perform this task are summarized in tables 3.1-3.4.

                  Classified   Classified   Classified   Not
                  as blue      as red       as black     classified
True value blue   95.74%       0.00%        0.06%        4.20%
True value red    0.00%        95.72%       0.06%        4.16%
True value black  0.00%        0.00%        98.41%       1.59%

Table 3.1. Classification on first cluster (figure 3.1) data with no multi-classification resolution.

                  Classified   Classified   Classified
                  as blue      as red       as black
True value blue   99.75%       0.00%        0.26%
True value red    0.13%        99.74%       0.13%
True value black  0.07%        0.50%        99.42%

Table 3.2. Classification on first cluster (figure 3.1) data using most frequent as multi-classification resolution.

                  Classified   Classified   Classified   Not
                  as blue      as red       as black     classified
True value blue   77.72%       0.13%        0.19%        21.96%
True value red    0.07%        84.13%       0.13%        15.68%
True value black  0.28%        0.28%        70.25%       29.19%

Table 3.3. Classification on second cluster (figure 3.2) data with no multi-classification resolution.

                  Classified   Classified   Classified
                  as blue      as red       as black
True value blue   96.76%       1.10%        2.14%
True value red    0.59%        96.80%       2.61%
True value black  2.46%        1.12%        96.42%

Table 3.4. Classification on second cluster (figure 3.2) data using most frequent as multi-classification resolution.

Using no multi-classification resolution, i.e. a frequency threshold of 100 percent, gives high accuracy but many unclassified samples. Using most frequent resolution instead gives a high classification rate but lower accuracy. Which one to use depends on the application.

These two are extremes: no multi-classification resolution leaves many neurons without a label, whereas the most frequent resolution method labels all neurons, except for those with equally many activations from two or more labels. The frequency threshold can be lowered to get a result somewhere in between, i.e. slightly lower accuracy but also fewer unclassified signals.


    3.4 Function estimation

This section investigates the capability of the VDS to estimate functions, see section 1.2.

    3.4.1 Introduction

    First a simple linear system is created:

    y1[t] = 600u1[t] + 750u2[t], (3.1)

with uniformly distributed random inputs u1, u2 ∈ [0, 2]. These are used to train the VP.

This is a very simple problem: parameter estimation. It is questionable whether a neural network should be used to solve it, as simpler methods will probably produce better results. It is, however, suitable for illustrating the characteristics of the VDS.

The labeling of the VP is done by, for each neuron, choosing the mean value of all label candidates. This is the most common method to select labels when estimating functions. It is appropriate as there will be a very large number of labels, i.e. measured outputs, and choosing the mean value gives a good approximation.
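Mean-value labeling can be sketched as follows (an illustrative stand-in for the VDS labeling step; the function name and arrays are hypothetical):

```python
import numpy as np

def label_neurons_mean(winners, outputs, n_neurons):
    """Label each neuron with the mean of the measured outputs that activated it.

    winners : index of the winning neuron for each training sample
    outputs : the measured output for each training sample
    Neurons that were never activated keep the label NaN.
    """
    labels = np.full(n_neurons, np.nan)
    for n in range(n_neurons):
        y = outputs[winners == n]
        if y.size:
            labels[n] = y.mean()
    return labels
```

At classification time, the estimated output is simply the label of the winning neuron, which is why the output is discrete with at most one level per neuron.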

    Three network sizes are used to estimate the system:

    - 64 neurons

    - 256 neurons, the physical limit of the VP

    - 1024 neurons, the limit of the software emulation mode

Increasing the number of neurons also increases the number of output values (one output value per neuron). On the same problem, a larger network should therefore give a higher output resolution and hence reduce the errors.

As a second step, noise is introduced to the system. The network is trained and labeled using noisy signals. The estimates can then be compared with the true output values to see whether the network performs worse or whether the method is robust.

The noise, band-limited white noise, is added to the input and output signals. Different amplitudes are used, as the input signals are much weaker than the output. The amplitudes are chosen according to table 3.5.

Signal   Noise amplitude
u1       5 · 10^-4
u2       5 · 10^-4
y        100

Table 3.5. Noise amplitudes.


    Three non-linear systems are compared to the linear system. These are:

y2[t] = 1200 atan(u1[t]) + |18u2[t]|^2   (3.2)

y3[t] = 1200 atan(u1[t]) + 100 e^(2u2[t])   (3.3)

plus a third system, identical to system (3.2) except for a backlash with a dead-band width of 200 and an initial output of 1000², that is applied to the output signal. All systems have (different) uniformly distributed random inputs u1, u2 ∈ [0, 2].

    3.4.2 Results and discussion

The results are summarized in table 3.6 with Root Mean Square Errors, RMSEs, and Absolute Errors, ABSEs, for the estimations. See appendix A for details on RMSE and ABSE.
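Assuming the usual definitions (appendix A is not reproduced in this excerpt), the RMSE and the per-sample absolute errors behind the mean/max ABSE columns can be computed as:

```python
import numpy as np

def rmse(y_true, y_est):
    """Root mean square error over all samples."""
    return float(np.sqrt(np.mean((y_true - y_est) ** 2)))

def abse(y_true, y_est):
    """Per-sample absolute errors; report mean() and max() as in table 3.6."""
    return np.abs(y_true - y_est)
```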

System                              RMSE    mean ABSE   max ABSE
System (3.1), 64 neurons            70.9    59.2        187.8
System (3.1), 256 neurons           35.3    29.6        93.2
System (3.1), 1024 neurons          18.5    15.6        51.5
System (3.1), 256 neurons (noisy)   254.0   199.2       1069
System (3.2), without backlash      39.8    31.2        164.5
System (3.2), with backlash         83.4    68.4        283.1
System (3.3)                        41.2    33.3        148.4

Table 3.6. Statistics after classification with the VDS on the four systems examined in this section. Systems (3.2) and (3.3) use 256-neuron networks and noise-free signals.

With a SOM of 256 neurons, the result shown in figure 3.4 is achieved. The estimated output values in the figure are gathered at a number of levels, i.e. the output from the VP is discrete. It is hard to see in the figure, but there are actually 256 different levels, corresponding to the number of neurons in the SOM.

Looking at how the SOM is distributed over the input space, the output levels are denser towards the middle. When trained, the SOM approximates the probability distribution of the input signal. In this example two uniformly distributed signals are used as input. Together these inputs have a higher probability for values in the middle range of the space. Extreme values are less probable, hence the look of figure 3.4.³

² See Matlab®/Simulink® help for details.
³ A classical example in probability theory is throwing dice. Each throw is uniformly distributed among the results. Summing two throws in a row, the probability of getting an outcome of 6 is higher than the probability of getting, for example, 2. That is very similar to the problem approached here.


Figure 3.4. Correlation plot with estimated values compared to the true values of system (3.1) using a network of 256 neurons. The straight line shows the ideal case where the network has infinitely many neurons and is optimally trained and labeled.

Figure 3.5. Correlation plot with estimated values compared to the true values using a network of 1024 neurons applied on system (3.1). The straight line shows the ideal case where the network has infinitely many neurons and is optimally trained and labeled.


Comparing figure 3.4 with figure 3.5, the horizontal lines are closer together and also shorter when the SOM is larger. This indicates smaller errors, and the values in table 3.6 confirm this.

The horizontal lines in figure 3.4 are approximately twice the length of the lines in figure 3.5. The RMSE and the mean ABSE are also twice as big with a 256-neuron SOM, see table 3.6.

The results when using only 64 neurons are, approximately, twice as large as for 256 neurons. Altogether the results indicate that, for this problem, the error is reduced with the square root of the network size. This is probably problem dependent, and more tests are needed to reveal the real relationship.

Figure 3.6. Correlation plot with estimated values compared to the true values using a SOM of 256 neurons with noisy signals, applied on system (3.1). The straight line shows the ideal case where the SOM has infinitely many neurons and is optimally trained and labeled.

As noise is introduced, the SOM performs considerably worse. Figure 3.6 shows the correlation plot, and the estimation error is now much bigger: about six times higher than the noise-free estimation error. The horizontal lines are distributed the same way as in figure 3.4, but they are longer. This shows that more inputs are misclassified than before, increasing the error. This concurs with the values in table 3.6.

Looking at non-linearities, the results from systems (3.2) and (3.3) do not differ much from the linear case. This can also be seen when comparing the topmost graph in figure 3.7 with figure 3.4. There is no big difference between a


Figure 3.7. Correlation plot between true and estimated values for system (3.2), without backlash (top figure) and with backlash (bottom figure). The bottom figure clearly shows worse results than the top figure. The straight line shows the ideal case where the network has infinitely many neurons and is optimally trained and labeled.

linear and a non-linear system as long as the system is static. The reason for this is that the SOM simply performs a mapping from input regions to output values, regardless of the shape of the function.

The addition of the backlash to system (3.2) decreases the ability of the VDS to model the system, as can be seen in table 3.6 and when comparing the two plots in figure 3.7. A SOM cannot handle a backlash: the backlash output depends on the direction from which the dead-band is entered. The SOM system is static, so it only looks at current signal values, and therefore it does not have the information required to handle dynamic systems, i.e. systems that depend on previous values. See section 3.5 for more about dynamic systems and SOMs.


    3.5 Handling dynamic systems

In this section, a method to estimate dynamic systems with SOMs is described. Although a function estimation problem is used as the example, the method applies to conditioning and classification problems as well.

In [4], the estimation of a dynamic system is done automatically by a modification of the SOM algorithm. In short, the output signal is used as an extra input signal during the training step to create a short-term memory mechanism. The technique is called Vector-Quantized Temporal Associative Memory. This solution is not possible in the VDS without remaking the core system.

Another, manual, way of handling dynamic data is adopted instead: lagged input and output signals are used as additional input signals to incorporate the information needed to handle dynamics.

    3.5.1 Introduction

The SOM can be seen as a function estimator, estimating the function f(·) in

    y(t) = f(u(t)),

i.e. the output at time t only depends on the input signals at time t. By adding historical input and output values to the input vector, the method can estimate functions of the form

y(t) = f(u(t), u(t-1), ..., u(t-k), y(t-1), ..., y(t-n)).

In this way, the dynamic function becomes, in theory, static when enough historical values are used. Since the SOM method has no problem with non-linearities, this means that, in theory, given enough historical signal values, all causal, identifiable systems can be estimated with this method. The only limitation is the discrete output; there can only be as many output values as there are neurons.
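Building such a lagged regressor vector is straightforward; a sketch (the function name and signal arrays are hypothetical, chosen only to mirror the formula above):

```python
import numpy as np

def lagged_inputs(u, y, n_u_lags, n_y_lags):
    """Build regressor rows [u(t), u(t-1), ..., u(t-k), y(t-1), ..., y(t-n)].

    u, y are 1-D signal arrays; the first max(k, n) samples are dropped
    because they lack a full history.
    """
    start = max(n_u_lags, n_y_lags)
    rows = []
    for t in range(start, len(u)):
        row = [u[t - i] for i in range(n_u_lags + 1)]        # u(t) ... u(t-k)
        row += [y[t - j] for j in range(1, n_y_lags + 1)]    # y(t-1) ... y(t-n)
        rows.append(row)
    return np.array(rows)
```

Each row can then be used as one (higher-dimensional) input sample when training the SOM.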

    This theory is illustrated using the discrete system

    y[t] = u[t], (3.4)

using both Gaussian and uniformly distributed random signals as input u. The VP is trained on the signal u[t] and labeled with y[t]. This example also shows the differences between using different kinds of distributions.

When this is verified, a time shift is introduced to the system:

y[t] = u[t-1].   (3.5)

Exactly as before, the processor is trained on the signal u[t] and labeled with y[t]. It should be impossible to approximate the system because of the random input data: the VP is not able to guess the value of y[t] knowing only u[t] and not u[t-1], as there is no correlation between them.

Then the VP is trained on the signal u[t-1] and labeled with y[t]. This problem is identical to the first problem, just with a change of variable, and there should


be no problem estimating the system with the same accuracy as the system in equation (3.4).

A situation where the exact dynamics of the system are known is unlikely to appear in practice. Therefore a test is performed to see what happens if the VP is over-fed with information: the processor is trained with both u[t] and historical values of u[t] as input.

    3.5.2 Results and discussion

    Table 3.7 summarizes these results.

System          Input signals        max ABSE   mean ABSE   RMSE
y[t] = u[t]     u[t]                 2.6        0.4         0.5
                u[t] (Gaussian)      38.1       0.4         0.8
y[t] = u[t-1]   u[t]                 134.7      64.0        73.9
                u[t-1]               2.0        0.4         0.5
                u[t], u[t-1]         12.7       4.0         4.8
                u[t], ..., u[t-3]    89.3       15.6        19.6

Table 3.7. Results of function estimation using the VDS on simple time-lagging examples.

Starting with the system in equation (3.4), the results show that the VP has no problem doing this estimation using a uniformly distributed random signal as input. The correlation plot in figure 3.8 visualizes how well the system is estimated, and the values in table 3.7 are very good.

Figure 3.8. Correlation plot of correct vs estimated values using a uniformly distributed input signal.

    Comparing figure 3.9 with figure 3.8 shows the differences between using a


Gaussian and a uniformly distributed signal as input. The estimation at extreme values is worse with a Gaussian signal. The reason for this is that the SOM algorithm approximates the distribution of the training data. Using a uniformly distributed input signal spreads the neurons evenly over the input space, while with Gaussian input more neurons are placed in the center of the input space and fewer neurons at the perimeter. Fewer neurons in an area give a larger error when discretizing the input signal, which most often results in a larger error in the output signal, as can be seen in figure 3.9.

    Figure 3.9. Correlation plot of correct vs estimated values using a Gaussian input signal.

Trying to estimate the system in equation (3.5), the result again confirms the expectations. The correlation between the correct and the estimated values of y is 0.0043; the system could not be estimated. As proposed, this is solved by lagging the input signal. Now the accuracy is almost identical to the first estimation. The small difference is due to the randomness of the input signals.

The VP is clearly confused when the dynamics are not exactly matched, as can be seen when comparing figure 3.10 with figure 3.8. Here both u[t] and u[t-1] are used as inputs. The result shows that it is possible to estimate system (3.5), although not as well as when using only u[t-1] as input. This is because the SOM method uses unsupervised learning; it does not know, during training, that the extra dimension in the input space is useless. This results in a lower resolution.

Using more historical signals as input gives even worse results. This shows the importance of having good knowledge of the time shifts and other dynamics in the system.


Figure 3.10. Correlation plot of correct vs estimated values when estimating y[t] = u[t-1] with the VP. The processor is given u[t] and u[t-1] as inputs.


3.6 Comparison with other estimation and classification methods

In this section a comparison is made between the VDS and some other simple methods to get an idea of the performance of the VDS. Classification and function estimation are the problems chosen as suitable for a comparison.

3.6.1 Cluster classification through minimum distance estimation

The classification of cluster data with the VP can be compared with a simple algorithm implemented in Matlab®. The idea is to estimate the cluster centres by taking an average over the input training data. These averaged centres then act as reference points to which a distance is calculated when an input data point is to be classified. The algorithm is simply⁴:

    1. Estimate each cluster centre by taking the mean value of all input data thatbelong to each cluster

2. Calculate the distance from each new input data point to all the cluster centres

    3. Classify the input as belonging to the closest cluster centre
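The three steps above can be sketched as a nearest-centroid classifier (the thesis used Matlab®; Python with NumPy is used here only for illustration, and the function names are made up):

```python
import numpy as np

def fit_centroids(X, labels):
    """Step 1: estimate each cluster centre as the mean of its training points."""
    classes = np.unique(labels)
    centres = np.array([X[labels == c].mean(axis=0) for c in classes])
    return classes, centres

def classify(x, classes, centres):
    """Steps 2-3: pick the class whose centre is closest to x."""
    d = np.linalg.norm(centres - x, axis=1)
    return classes[d.argmin()]
```

Fitting corresponds to the training/labeling steps in the VDS, and `classify` to the classification step.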

Classified as      blue      red       black
True value blue    97.10%    0.86%     2.04%
True value red     0.50%     98.18%    1.32%
True value black   1.84%     1.34%     96.82%

Table 3.8. Classification results using the algorithm described.

This algorithm is tested on the second set of cluster data (figure 3.2). The results are presented in table 3.8 and are compared with table 3.2, where most frequent multi-classification resolution is used. The performance is slightly better with the VDS approach.

This shows that the VDS has sufficient capabilities to classify data. The example was a simple one, which is why the comparison algorithm was easy to design. If, however, noisy signals are used or the clusters are not Gaussian, it may be much harder to find a suitable algorithm, mainly because the cluster centres are difficult to estimate. The VDS, however, is used the same way regardless, which gives it a strong advantage in simplicity.


Figure 3.11. Correlation plot with estimated values compared to true values using least squares estimation.

    3.6.2 Function estimation through polynomial regression

The estimation of the system in equation (3.1) with the VDS in section 3.4 is put into perspective with a least squares parameter estimation in Matlab®.

With no noise added to inputs and outputs, the estimation problem is deterministic. Therefore only the case with noise is interesting for a comparison. With knowledge of the system, the estimation problem is formulated as:

y = θu,   (3.6)

where θ = [θ1 θ2] are the parameters to be estimated and u = [u1 u2]^T the input signals. This is an overdetermined equation system that is solved with least squares minimization.
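The least squares step can be sketched as follows, using synthetic noisy data in place of the thesis measurements (the data, seed and noise level are made up for illustration):

```python
import numpy as np

# Synthetic stand-in for the noisy measurements of system (3.1):
# y = 600*u1 + 750*u2 plus noise.
rng = np.random.default_rng(0)
U = rng.uniform(0, 2, size=(1000, 2))            # inputs u1, u2 in [0, 2]
y = U @ np.array([600.0, 750.0]) + rng.normal(0, 100, size=1000)

# Least squares solution of the overdetermined system U @ theta ~ y
theta, *_ = np.linalg.lstsq(U, y, rcond=None)
```

With enough samples, `theta` comes out close to the true parameters [600, 750] despite the noise.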

Using exactly the same data set as in the noise example in section 3.4, but with Matlab®'s polynomial regression, the result shown in figure 3.11 is achieved. Comparing this to figure 3.6, the polynomial regression looks worse, and table 3.9 confirms this. The VDS performs better than the polynomial regression.

Method                  Max ABSE   Average ABSE   RMSE
VDS                     1078       189.3          254.0
Polynomial regression   1371       237.1          297.3

Table 3.9. Errors when using polynomial regression compared to the results from the VDS.

⁴ The first step of this algorithm corresponds to the training and labeling steps in the VDS, and steps 2-3 to the classification step.

Chapter 4

Data collection and pre-processing

Collecting and pre-processing data are two subjects that, due to their complexity, could be subjects for a thesis of their own. Therefore, this report does not treat them in depth and only relevant issues are discussed. It is, however, important to stress that data collection and pre-processing are two key factors for success in developing applications with SOMs.

    4.1 Data collection

The entirely data-driven property of the SOMs, and therefore of the VDS, makes data collection extremely important. The training of the SOM is affected by the way data are collected. The

- amount of data

- sampling frequency

- quality of data

- measurement method

are issues that have to be considered.

    4.1.1 Amount of data

A rule of thumb is that during the training sequence the VDS should be presented with, at least, a number of data points 500 times larger than the size of the SOM¹. This is to ensure that the network approximates the probability density function

¹ There is no risk of over-training (over-fitting). The SOM method has no such problems.



of the input data. For a standard-size VP with 256 neurons, this means 128 000 samples.

When not enough data are available, training data can be presented to the SOM in a cyclic way until the required number of data points has been processed². However, doing this may cause information to be lost, e.g. input data with gaps are not representative.

    In addition, enough data for both training and validation has to be available.

    4.1.2 Sampling frequency

The sampling frequency has to be considered when dealing with dynamics. As was shown in section 3.5, dynamics can be handled by lagging input and/or output signals. How many and which old values should be used depends on the dynamics of the system combined with the sampling frequency. With a high sampling frequency, many old values are needed to cover the dynamics of the system. On the other hand, a low sampling frequency might cause dynamic information to be lost.

In this thesis a sampling frequency of 10 Hz is used. This is the sampling frequency available in the test cell equipment used.

    4.1.3 Quality of data

The quality of data is the most important factor for good results. There are many aspects of this, but the two most important are that data have to be representative and correctly distributed. This is, again, due to the fact that the SOM algorithm is data driven and cannot extrapolate or interpolate. The VDS will, most likely, generate unreliable output if variables are not representative and correctly distributed.

Representative data contain values from the entire operating range. This means that the variables have varied from their maximum to their minimum values during the measurement. All possible combinations of variables should also be present to ensure representativeness.

Data from variables that are not representative should either be omitted or used for scaling/normalization. E.g. the ambient air pressure variable is hard to measure over its full range. Therefore it is better to use it to scale other pressure variables that depend upon it.

The distributions of the input variables depend on the test cycle used when collecting data. Therefore the test cycle has to produce data that are suitable for the intended application. Two different categories are used: uniform and real-life distributions.

Uniform distribution: The distribution should be uniform when estimating functions and doing condition monitoring. In both these cases it is important that

² In this thesis the data are also randomized to ensure that they are not grouped in a bad manner.

  • 4.1 Data collection 33

the application has equal performance over the entire input space, which is why input data should be uniformly distributed.

Real-life distribution: A classification problem does not require equal performance over the entire input space. It is sufficient, and often advantageous, to make the classification in areas where the input signal resides frequently in a real situation.

Although these are the distributions wanted for all input signals, often not all of them can be controlled individually, e.g. the boost temperature depends on the torque and the engine speed and cannot be controlled manually. This is something that needs to be considered from case to case when constructing the cycle used for gathering data.

    4.1.4 Measurement method

Data are collected in an engine test cell where an engine is run in a test bench. In the bench, an electromagnetic braking device is connected to the engine. This device brakes the engine to the desired speed, so that different operating conditions can be obtained. Many variables can be measured, but the ones in appendix B are those measured for this thesis.

As there are deviations between engine individuals, only one engine is used. In the laboratory, the engine is controlled with test cycles/schemes to simulate different driving styles and routes. It is important that the cycle generates appropriate data that suit the purpose, as discussed in section 4.1.3. For this thesis, two different test cycles are chosen: the Dynamic Cycle, DC, and the European Transient Cycle, ETC. A DC provides the possibility to control variables using different regulators, here engine speed and torque, whereas the ETC is a standardized cycle.

Aiming for data from the entire dynamic range of the engine, a randomized DC is created. The cycle starts with a torque of 600 Nm and an engine speed of 1350 RPM and then sets new desired torques and engine speeds twice per second. These values are set randomly under the constraint of preventing the engine from overloading³. This is achieved by not allowing the speed to vary by more than 100 RPM/second and the torque by more than 1000 Nm/second. In this way, the cycle takes the engine through a large number of operating conditions.

Two different change rates are used in this thesis. The first sets new desired values two times per second and runs for 2000 seconds; with a sampling frequency of 10 Hz, 20 000 samples are collected per cycle. The other sets new desired values five times per 10 seconds and runs for 20 000 seconds; with the same sampling frequency, 200 000 samples are collected.

    The ETC is a standardized cycle that contains three phases:

³ An overload situation can occur if the engine is running at a very high speed. If the demanded speed drops from a high level to a much lower level in a short time, for example going from 2500 RPM to 1000 RPM in one second, the braking of the engine will cause the torque to peak.

  • 34 Data collection and pre-processing

1. city driving - many idle points, red lights, and not very high speed

2. country road driving - smaller roads with many curves

3. highway driving - rather continuous high-speed driving

The data from this cycle are not as uniformly distributed as from the DC, but contain other types of information about the dynamic behavior of the engine. Sequences such as taking the engine from idle to full load are captured in the ETC but not in the DC.

The ETC takes approximately 1800 seconds. Again 10 Hz is used as sampling frequency, so approximately 18 000 samples are collected.

Measurements are performed for three different cases, listed in table 4.1. They are performed on different occasions (read: days), which is why the quality of the collected data has to be examined closely. Both the DC and the ETC are used, thus six sets of measurements are collected.

Case   Description
1      Fault-free engine
2      Engine with an 11 mm leakage on the air inlet pipe (after the intercooler, see figure 2.2)
3      Engine with intercooler efficiency reduced to 80%

Table 4.1. Description of the three different measurement cases.

    4.2 Pre-processing

The pre-processing of data can be divided into specific and general pre-processing.

The specific pre-processing deals with how to generate proper files for use as input to the VDS. This can differ a lot depending on how the data files are structured. In addition, data may have to be filtered, time shifted, differentiated, etc. This is also, more appropriately, called feature extraction and is more a part of the problem-solving method.⁴ When the input files are in order, data are ready to be used as input to the VDS, but not yet to the VP.

The general pre-processing involves file segmenting, statistics generation and channel splitting, where different toolboxes in the VDS can be used as well as other software. The general pre-processing also includes transforming data to a form the VP handles. The input interface of the VP uses 8-bit unsigned integers; therefore the data, usually provided as floats, are scaled to the range 0-255 and then converted to unsigned integers. It is convenient to use the VDS for these tasks.
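The 8-bit scaling step can be sketched as a generic min-max quantizer (an illustration, not the exact VDS routine; the function name is made up):

```python
import numpy as np

def to_uint8(x, lo=None, hi=None):
    """Scale a float signal to the VP's 8-bit input range [0, 255].

    lo/hi default to the signal's own min/max; fixed values should be
    used in practice so training and operation share the same scaling.
    """
    lo = x.min() if lo is None else lo
    hi = x.max() if hi is None else hi
    scaled = (x - lo) / (hi - lo) * 255.0
    return np.clip(np.round(scaled), 0, 255).astype(np.uint8)
```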

One variable that is handled in the same way for all applications (chapters 5-7) is the ambient air pressure. The mean value and the standard deviation in table 4.2

⁴ The feature extraction is handled in chapters 5-7, where the development of the applications is discussed.


Case                       1        2        3
Mean value [kPa]           97.3     98.1     99.7
Standard deviation [kPa]   0.0603   0.0448   0.0604

Table 4.2. Ambient air pressure mean values and standard deviations for the three cases, see table 4.1, using the DC.

reveals that this variable is not representative. The pressure values collected stem from the ambient air pressure outdoors, which varies from day to day⁵. Measurements from all different altitudes and weather conditions are not available, and therefore this variable is used to normalize the turbo boost pressure, which is the only other pressure signal used.

⁵ All cases are measured on different days. This would cause the VDS to produce very impressive results if, for example, a distinction were to be made between case one and case two: the VDS would only use the ambient pressure when deciding the winner. Such results are not applicable to a real situation, where the ambient air pressure varies more.


Chapter 5

Conditioning - Fault detection

In this chapter a fault detection problem is treated. This is done using the conditioning method in the VDS. The goal is to test whether Vindax® can be used to detect some faults in a diesel engine.

    5.1 Introduction

The functionality used for fault detection is described in section 3.2. The method is based on the distance measurement between neurons and input data that can be extracted from the VDS. When a combination of the engine states produces a larger distance between the winning neuron and the input signal than occurred during training, a fault is detected, or, at least, an unknown situation has occurred.
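The distance-based detection rule can be sketched as follows (an interpretation of the described method with made-up function names; the actual VDS computation may differ):

```python
import numpy as np

def fit_threshold(weights, train_data):
    """Record the largest winner distance seen during fault-free training."""
    # Distance from every training sample to every neuron weight vector
    d = np.linalg.norm(train_data[:, None, :] - weights[None, :, :], axis=2)
    return d.min(axis=1).max()   # worst-case winner distance during training

def difference(weights, x, threshold):
    """Positive values mean x lies farther from the map than any training point,
    i.e. an unknown input combination - a probable fault."""
    return np.linalg.norm(weights - x, axis=1).min() - threshold
```

This matches the "difference above zero" quantity reported in the results below.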

    5.2 Method

Data from a DC and an ETC, collected according to chapter 4, are used. The system is trained and labeled with 75% of the data from the engine with no faults. The verification data consist of the remaining 25% of the fault-free data plus data from an engine with an air leakage and an engine with reduced InterCooler, IC, efficiency.

    The following variables are used as input:

- moving average over 10 values of boost pressure

- moving average over 100 values of boost temperature

- moving average over 100 values of fuel value

- moving average over 100 values of engine speed



The smoothing with moving averages is done to solve the problem with dynamics; see section 6.2 for a more thorough discussion. It would also be possible to delay signals to solve dynamic problems. However, since different faults show up with different delays and no obvious delays can be detected, that approach is not used.
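A causal moving average of the kind used for these inputs can be sketched as follows (the window shortens at the start of the signal; how the thesis handles the initial samples is not stated):

```python
import numpy as np

def moving_average(x, n):
    """Causal moving average over the last n samples."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    for i in range(len(x)):
        # Average over the available history, at most n samples back.
        out[i] = x[max(0, i - n + 1):i + 1].mean()
    return out
```

For example, a 2-sample average of `[1, 2, 3, 4]` gives `[1.0, 1.5, 2.5, 3.5]`.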

    5.3 Results and discussion

The results from the conditioning are summarized in table 5.1 and visualized in figure 5.1.

                      %       mean
    Healthy engine    0.9     4.3
    Air leakage       11.8    9.1
    IC reduced        35.7    33.9

    Table 5.1. Results during fault detection with a fault-free engine, an engine with an air leakage and an engine with reduced IC efficiency. The second column shows the percentage of all samples for which the difference is above zero. The last column shows the mean value of the difference values when the difference is above zero. Values above zero indicate an unknown input combination, i.e. a probable fault.

This shows that faults can be detected. As explained in section 3.2, difference values above zero indicate a probable fault in the system. There are definitely a lot more samples with a difference above zero when there is a fault in the system. The difference values are also, in general, larger in the faulty cases than in the fault-free case.

In the results presented it is clear that the fault detection performs better when the error is a reduced IC efficiency than when it is an air leakage. This can be explained by comparing the chosen input variables to the variables chosen in chapter 6. They are similar to the variables chosen for isolation of reduced IC efficiency faults, which explains the better performance.

It would be the other way around if the chosen variables were more similar to the ones for isolation of an air leakage. Therefore, a change in variables in this direction would improve the fault detection when there is an air leakage. This is, however, not a solid approach, as the idea of conditioning is to detect new, i.e. unseen, types of faults. Input variables should not be adapted to a specific problem but should be quite general to ensure that unseen faults are detected.


Figure 5.1. Results during fault detection with (from top to bottom) a healthy engine, an engine with air system leakage and an engine with reduced IC. Values above zero indicate an unknown input combination, i.e. a probable fault.


Chapter 6

Classification - Fault detection and isolation

This chapter describes fault detection and isolation with the VDS. The purpose is to demonstrate how to develop such an application and to give an indication of the results that can be expected when using Vindax® for this task. The goal is to test whether or not Vindax® is capable of producing sufficient information to make a diagnosis on a diesel engine inlet air system.

    6.1 Introduction

As was described in section 4.1.4, data from three different cases are available. In engine diagnostics it is desirable to isolate the faults in these cases, i.e. distinguish a fault-free engine from an engine with a leakage in the inlet air system or a reduced intercooler efficiency. This is what the VDS is going to be used for in this chapter.

Two applications are developed to fulfill the purpose of this chapter: detection of a leakage and detection of a reduced intercooler efficiency. DC and ETC data from a fault-free engine and an engine with implemented faults are combined and used as input. For training and labeling, 75% of the data are used, and the remaining part is used for verification.

    6.2 Method

To develop these applications the network is trained to recognize fault situations. The engine state will reveal these situations by the combination of its state variables, e.g. load, speed, oil and water temperature. Not all variables will contain information about the fault, and hence the selection of the proper variables is vital.



    6.2.1 Leakage isolation

When there is an air leakage, the compressor is not able to build up the intake manifold pressure, i.e. the boost pressure signal, as fast and to the same level, compared to the fault-free engine. Therefore, the boost pressure signal and its derivative are suitable as inputs. Additional state variables are the amount of fuel injected, i.e. the fuel value, with its derivative. This variable is selected because the amount of fuel injected is correlated with the boost pressure. See section 2.2.4 for details.

    The following signals are used as inputs and pre-processed as follows:

- boost pressure: Smoothed by taking a 10 sample average and then normalized with the ambient air pressure.

- boost pressure derivatives: The differences between the current boost pressure and the values at t-4 and t-8 are used as inputs to incorporate the slope. These two were chosen simply by examining the slope of the boost pressure and trying to incorporate as much useful information about the derivative as possible.

- fuel value: Smoothed by taking a 10 sample average, delayed by 7 samples to account for dynamics; see section 3.5. The time constant, 7 samples, is estimated by maximizing the correlation between the fuel value and the boost pressure signals.

- fuel value derivatives: The differences between the current fuel value and the values at t-4 and t-8 are used as inputs to incorporate the slope. These two were chosen for the same reasons as the boost pressure derivatives.

The smoothing reduces the effect of noise and dynamic behavior. This makes a total of six input signals.
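The six-input vector described above could be assembled as in the following sketch. The function and variable names are hypothetical, and the signals are assumed to have been smoothed (10 sample average) already:

```python
import numpy as np

def leak_features(boost_p, fuel, ambient_p, t, delay=7):
    """Six-input feature vector for leakage isolation at sample t."""
    p = boost_p / ambient_p            # normalize with ambient air pressure
    return np.array([
        p[t],                          # smoothed, normalized boost pressure
        p[t] - p[t - 4],               # slope information at t-4 ...
        p[t] - p[t - 8],               # ... and t-8
        fuel[t - delay],               # fuel value lagged by the time constant
        fuel[t] - fuel[t - 4],         # fuel value slope at t-4 ...
        fuel[t] - fuel[t - 8],         # ... and t-8
    ])
```

Each such vector would then be scaled to 8-bit integers before being fed to the VP.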

Another important variable is the engine speed, which is however not suitable as an input. Experiments using engine speed as an input variable give worse results1. The information that resides in the engine speed signal is instead incorporated in another way. As the boost pressure depends on this variable, but not as strongly as on the fuel value, the input space is divided into three areas:

1. Low speed: 0 rpm - 1250 rpm

2. Middle speed: 1150 rpm - 1650 rpm

3. High speed: 1550 rpm and above

This is done to reduce the effect that the engine speed has on the boost pressure.

1 Why this is the case is not obvious. Maybe the VP is confused as more inputs are introduced. But it could also be that the dynamics are not properly handled, i.e. the signal lagging is not correct. Experiments could be done to try to solve this, but the solution presented here gives satisfactory results.
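The overlapping speed ranges above can be sketched as a routing function. Whether samples in the overlap regions are actually sent to both networks is not stated in the text; this sketch assumes so:

```python
def speed_range(rpm):
    """Map an engine speed sample to the per-range network indices
    (0 = low, 1 = middle, 2 = high); overlaps return both networks."""
    if rpm < 1150:
        return [0]          # low-speed network only
    if rpm <= 1250:
        return [0, 1]       # overlap between low and middle range
    if rpm < 1550:
        return [1]          # middle-speed network only
    if rpm <= 1650:
        return [1, 2]       # overlap between middle and high range
    return [2]              # high-speed network only
```

The overlap avoids hard switching between networks near the range boundaries.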


In each range, the fuel value will become more dominant, as the engine speed does not differ as much.

The ranges are chosen by looking at the distribution of the engine speed signal. The test cycles used, DC and ETC, generate more measurements of the engine speed in the mid range. This is due to the fact that the regulator used to control the engine speed is not optimal; hence the distribution will be slightly normalized. In addition, the ETC cycle is not designed to generate a uniform distribution of the engine speed.

A separate network is used for each area. Each network has 256 neurons, except for the middle range, where a 1024 neuron network is tested as well. This is done because the measurements contain more input signals in this area; there are approximately four times as many samples here, compared to the other ranges. Trying to increase the network size in the other ranges would probably not improve the results much, because the size is already sufficient.

    6.2.2 Reduced intercooler efficiency isolation

Reduced intercooler efficiency will result in a different behavior of the intake manifold temperature, i.e. the boost temperature signal. When the efficiency is reduced, the air flowing through the intercooler will not be cooled as efficiently as with a fault-free engine.

Both the engine speed and the fuel value, i.e. the amount of fuel injected, affect the boost temperature. In the process in section 2.2.4, it is described how an increase in fuel value causes the temperature to rise. This puts more pressure on the intercooler to cool the air. Also, an increase in engine speed means that more air flows through the intercooler. These two should therefore be the states that the boost temperature depends upon.

The relationship in time between the fuel value and engine speed states and the boost temperature cannot be concluded from figure 6.1. It is difficult to estimate a time constant and solve the dynamic behavior by delaying input signals.

Instead of delaying signals, a very large average is used. This means that the signals are smoothed and the time delay will not be as significant. This is actually how the temperature changes in the engine, and hence a way to solve the dynamic problems. Therefore, these three signals are used as inputs and pre-processed as follows:

- boost temperature: Smoothed by taking a 100 sample average.

- fuel value: Smoothed by taking a 100 sample average.

- engine speed: Smoothed by taking a 100 sample average.

Figure 6.2 shows the signals after smoothing. The correlation between the three states is much higher now. The actual changes in fuel value can be recognized in the


Figure 6.1. Boost temperature from an engine without faults during a DC. The boost temperature is presented in both graphs and compared with, top: the fuel value and, bottom: the engine speed.

boost temperature curve. It is harder to see the same relationship between engine speed and boost temperature, but the improvement in results indicates a higher correlation. There is a risk that information is lost when the signals are heavily smoothed, but experiments show good results.

As with the leakage detection, the problem is divided into three intervals to increase the resolution. This time, the intervals are in the boost pressure signal:

1. Low pressure: 0 kPa - 75 kPa

2. Middle pressure: 65 kPa - 155 kPa

3. High pressure: 145 kPa and above

    6.3 Results and discussion

Tables 6.1-6.2 show the results from the classification. This is achieved by using an activation frequency threshold of 80% for labeling; see section 3.3. This gives a high response frequency, still keeping the ratio between correct and incorrect


Figure 6.2. Averaged boost temperature from an engine without faults during a DC. The boost temperature is presented in both graphs and compared with, top: the averaged fuel value and, bottom: the averaged engine speed.

classifications at a low level. Appendix C shows results when using 90%, 80%, 70% and 60% as the activation frequency threshold, as well as the most frequent method.

The results presented in tables 6.1-6.2 were found to be the best. Different frequency thresholds produce different kinds of information. For the purpose here, passing classification results to a diagnosis algorithm, a high ratio between correct and incorrect classifications has to be achieved. Also, a sufficiently high response frequency, i.e. few non-classifiable signals, should be achieved to be able to make a diagnosis.
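The activation frequency thresholding used at labeling can be sketched as follows. Section 3.3 describes the actual VDS mechanism; the names and the -1 "not classifiable" convention here are assumptions:

```python
import numpy as np

def label_neurons(winner_ids, class_ids, n_neurons, threshold=0.8):
    """Label each neuron with a class only if that class accounts for at
    least `threshold` of the neuron's activations during labeling;
    otherwise mark it as not classifiable (-1)."""
    counts = np.zeros((n_neurons, np.max(class_ids) + 1))
    for w, c in zip(winner_ids, class_ids):
        counts[w, c] += 1                  # activation counts per neuron/class
    labels = np.full(n_neurons, -1)        # -1 = not classifiable
    for n in range(n_neurons):
        total = counts[n].sum()
        if total > 0 and counts[n].max() / total >= threshold:
            labels[n] = counts[n].argmax()
    return labels
```

A higher threshold yields fewer, but more reliable, labeled neurons, which is the trade-off between response frequency and classification quality discussed above.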


    true mode \ classified as    NF        FAL       Not cl.

    Low rpm
      NF                         13.80%     0.94%    85.27%
      FAL                         1.41%    10.01%    88.58%

    Mid rpm (256)
      NF                         38.92%     3.18%    57.89%
      FAL                         2.09%    29.98%    67.93%

    Mid rpm (1024)
      NF                         46.76%     4.65%    48.59%
      FAL                         2.13%    48.94%    48.92%

    High rpm
      NF                         68.95%     4.93%    26.12%
      FAL                         1.86%    65.72%    32.42%

    Over all (256)
      NF                         37.44%     2.89%    59.67%
      FAL                         1.86%    30.86%    67.27%

    Over all (256/1024/256)
      NF                         41.81%     3.70%    54.48%
      FAL                         1.89%    41.20%    56.91%

    Table 6.1. Classification results with an activation frequency of 80% at labeling. NF = No Fault, FAL = Air Leakage.

    true mode \ classified as    NF        FRIE      Not cl.

    Low BP
      NF                         48.85%     3.89%    47.25%
      FRIE                        2.61%    51.63%    45.76%

    Mid BP
      NF                         66.58%     2.41%    31.00%
      FRIE                        3.01%    70.99%    26.00%

    High BP
      NF                         69.60%     2.48%    27.93%
      FRIE                        4.66%    71.87%    23.48%

    Over all
      NF                         59.91%     3.03%    37.07%
      FRIE                        3.11%    62.69%    34.21%

    Table 6.2. Classification results with an activation frequency of 80% at labeling. NF = No Fault, FRIE = Reduced Intercooler Efficiency.

Chapter 7

Function estimation - Virtual sensor

In this chapter a function estimation problem is solved. The VDS is used as a virtual sensor to estimate the engine torque. The goal is to see how well Vindax® performs as a virtual sensor on a test problem.

    7.1 Introduction

When discussing the torque estimation problem, only the DC data are used. This is because the quality of the acquired ETC data is not as high as that of the DC data, i.e. the number of samples is too low.1

Two different DC runs have been used. First, the same cycle as in the conditioning and classification problems is used. Then, a DC that is slower and longer is used. The second cycle is run on a different engine. For training and labeling, 75% of the data are used and the rest are used for validation. Only data from engines with no faults are used.

    7.2 Method

As discussed in section 2.2, the torque value is connected to the amount of fuel injected into the cylinders, but the amount of oxygen pushed into the cylinder is also of interest. This leads to the use of the following variables:

- Fuel value

- Engine speed

- Boost pressure

1 See section 4.1.4 for a description of the two test cycles.



- Boost temperature

where the last three are connected to the amount of oxygen.

It is important to realize what using the fuel value as one of the input signals

leads to. This signal is an input signal to the fuel system in the engine, specifying the amount of fuel that should be injected. If there is a fault in the system, e.g. a clogged injector, this is not what will be realized. This might lead to a fault-intolerant torque estimation.

    7.3 Results and discussion

The quality of the estimation is measured by comparing the estimated torque value with the torque value used for labeling, i.e. the torque value measured in the test bench. This result is put in relation to the torque estimation done by the EMS.
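The RMSE figures quoted below follow the standard definition; a minimal sketch of the comparison:

```python
import numpy as np

def rmse(estimate, reference):
    """Root mean square error between an estimated and a measured signal."""
    estimate = np.asarray(estimate, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return np.sqrt(np.mean((estimate - reference) ** 2))
```

The same function would be applied to both the VP output and the EMS estimate against the test-bench torque, so that the two numbers are directly comparable.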

The best result on the first data set is achieved using fuel value, engine speed and boost pressure as input signals. The RMSE is then 146.5 Nm, compared to 165.5 Nm for the estimation made by the EMS. If boost temperature is also used, which might be desirable to compensate for faults in the system, the RMSE increases to 158.5 Nm.

On the second set of data the corresponding RMSE value is 90.1 Nm (using all four variables), compared to 66.9 Nm for the estimation by the EMS. Using 1024 neurons lowers the RMSE value to 74.6 Nm. This set of data has slower changes and not as many transients as the first set of data.

Two things can be seen that make the results worse. First, there is a resolution problem. It is not possible to get a much better result using only 256/1024 neurons. In addition, the I/O interface of the VP limits the resolution. The interface uses 8-bit integers for input/output, which causes quantization errors. The second problem is unaccountable peaks in the torque signal from the test bench, several hundred Nm in 0.1 seconds, as can be seen in figure 7.1.

These peaks cannot be explained2 and are of no interest, as they are too short to affect the truck. Therefore, they can be removed by a filter that eliminates peaks larger than 100 Nm. This filter is used on the signal before it is used for labeling and verification. Then the RMSE (without boost temperature) decreases to 67.0 Nm with 256 neurons and 55.4 Nm with 1024 neurons. The RMSE for the EMS estimation also decreases, down to 55.5 Nm.
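One plausible reading of this peak filter is to hold the previous value whenever the signal jumps by more than 100 Nm in a single sample; the exact filter design is not specified in the text, so this is only an assumed sketch:

```python
import numpy as np

def remove_peaks(torque, max_step=100.0):
    """Hypothetical peak filter: suppress sample-to-sample jumps larger
    than max_step (Nm) by holding the previous value."""
    out = np.asarray(torque, dtype=float).copy()
    for i in range(1, len(out)):
        if abs(out[i] - out[i - 1]) > max_step:
            out[i] = out[i - 1]      # clamp the spike to the last good value
    return out
```

A median filter over a short window would be an alternative design with a similar effect on isolated spikes.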

For the faster set of data, the estimation made by the VP is clearly better than the EMS estimation. For the slower set of data it is a more even game, but here a resolution issue in Vindax® can be seen. The increase in neurons from 256 to 1024 gives a clear increase in the accuracy of the estimation. This suggests that more neurons would probably improve the results even more.

Also, for the second, slower, set of data, more variables should probably be used. This set of data is collected from an engine equipped with an exhaust gas

2 They are probably a result of the test bench configuration and are not present when the engine is installed in a truck.


    Figure 7.1. Unaccountable peaks in the torque signal from the test bench.

recirculation system that takes some of the exhaust gases and leads them back into the inlet air system. This is a way to reduce the emissions, but it also results in less oxygen in the air that is pushed into the cylinders. The amount of oxygen is one of the key factors used in the estimation, so using more variables, incorporating the amount of oxygen lost, might improve the results for the slower set of data.


Chapter 8

    Conclusions and future work

In this chapter, conclusions from the work and suggestions for future work in the area are presented.

    8.1 Conclusions

The probability density functions of the input signals determine the structure of the SOM. This is all according to the SOM theory and a confirmation that the VDS follows it. Depending on the application area, the distributions of the input signals have to be analyzed before the data are used for training, i.e. the training data should be suitable for the purpose of the network.

A desired distribution of data can, however, be extremely hard to achieve. Often, measurements are made by controlling a limited number of variables. It is possible to control the distribution of these, but it is hard, or even impossible, to control the distribution of the rest of the measured variables.

An advantage, and disadvantage, of the SOM approach is that prior knowledge most often cannot be invoked to improve the results. System identification usually starts with assuming the order of the system and then estimating the parameters. With prior knowledge about the system it is sometimes very straightforward to estimate the parameters, but with a more complex system this becomes harder to do. Therefore this is both an advantage and a disadvantage.

Some types of prior knowledge might be incorporated by smart choices of input signals, e.g. using derivatives instead of absolute values, combining signals into new signals, etc.

Noise redundancy properties are hard to conclude. The numbers in table 3.6 showed an estimation error six times as high when the noise was introduced. Compared with the polynomial regression, the performance was slightly better. More studies should be performed to investigate the connection between the noise and the estimation error. It is, however, clear that the signals should not be too noisy if a good result is to be obtained.



Working with linear or non-linear systems does not make much of a difference. This is the nature of the SOM network. It does a mapping of the input data independently of what the system is like. Sending the same input data to a linear and a non-linear system gives the same network. In the end, the system layout only affects the labeling. This is a big benefit of the SOM based methods.

The SOM method is a mapping for purely static relationships. It has no ability to handle dynamic data. But, by feeding it with old input and/or output signals as inputs, the problem goes from being dynamic to being static. This way, SOMs (including the VDS) can be used to solve dynamic problems.
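Turning a dynamic problem into a static one by stacking lagged signals can be sketched as follows (a hypothetical helper; choosing the lags is the hard part, as discussed next):

```python
import numpy as np

def lag_matrix(u, y, lags_u, lags_y):
    """Build static regressor vectors from lagged input (u) and output (y)
    samples. Each row [u[t-k] for k in lags_u] + [y[t-k] for k in lags_y]
    can then be fed to the SOM as one static input vector."""
    start = max(lags_u + lags_y)           # first t where all lags exist
    rows = []
    for t in range(start, len(u)):
        rows.append([u[t - k] for k in lags_u] + [y[t - k] for k in lags_y])
    return np.array(rows)
```

For example, `lags_u=[0, 1]` and `lags_y=[1]` gives three-element regressors containing the current input, the previous input and the previous output.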

When dealing with dynamics, a problem is to find out which input (and/or output) signals need to be lagged and by how much. The optimum is when all the dynamics of the system are included but nothing more. Using signals that are irrelevant makes the estimation worse: the VP tries to adapt to information that is not relevant to the problem.

As was mentioned in section 3.1, the thesis evaluated a PCI version of the VP. The processor is accessed through the VDS. This creates a very fast system, and the time required to train and label a network is negligible compared to the pre-processing. Using the software emulation mode is, however, very slow and usually takes more than an hour.1

The VDS has a strong advantage in visualization. When the VP is trained, it is easy to visualize with different kinds of graphs how the training turned out. Then, for example, adjustments in the data pre-processing can be made to get better performance.

The conclusions so far have been general conclusions applicable to all usage areas. The following sections handle each area more specifically.

    8.1.1 Conditioning

The conditioning functionality of the VDS is questionable for detection of faults. The reason for this is that the VDS requires very precise handling of dynamics. A possible fault that is known beforehand can probably be detected by designing a VDS application, but it is impossible to say whether or not unknown errors will be recognized. Probably, such a new error shows up in a specific state, and this information has to be extracted in a proper way for the error to be detected.

There will be a problem if unseen data is presented to the VP, as it will be classified as faulty. This means that all possible combinations have to be measured and used to train the VP if the conditioning is going to be useful.

    8.1.2 Classification

When working with classification problems, the VDS has a strong advantage in simplicity. The method to design an application is basically the same, no matter the dimensionality or other aspects that complicate the problem. This also makes

    1This depends on the performance of the PC that is used.


it possible to get results very fast, and it is also very easy to change the classification boundaries to try out a good combination.

There is a high potential in this area. As the demands on the representability of the data are lower when making a diagnosis, i.e. a diagnosis does not have to be made during every operating condition, the training data do not need to incorporate all engine properties.

    8.1.3 Function estimation

The accuracy of the function estimation depends on how many neurons are used in the SOM and on the number of misclassified signals. In the example with noisy signals, the number of misclassified signals was high. This caused the error to grow, although the neuron labels had approximately the same distance, i.e. the network had the same resolution.

As was discussed in section 7.3, there is a resolution problem in some cases. In general this applies to function estimation, but it could be an issue for the other areas as well. The I/O interface uses 8-bit integers, i.e. 256 levels, which means that there will be quantization errors in both the input signals and the output signal. It is a disadvantage that the output is discrete and limited to as many values as the number of neurons. This might be compensated for with hierarchies in the VDS, but in the end it is a limitation.

    8.2 Method criticism

The same kind of data have been used for training as for verification. It is arguable that this approach does not give a true image of the VDS capabilities. Classifying other kinds of data, e.g. more stationary data, may not give as good results. This is because these are unseen data that the system is not trained to classify.

A temporary solution would be to incorporate part of the new data into the training set; the system would then be able to classify it. Then again, a new type of data could be introduced, and so on. The solution to this problem is to use training data that capture sufficient properties of the system to be able to classify signals and make a diagnosis.

    8.3 Future Work

    Going forth would probably involve developing a new and bette