Research Article
Fault Detection and Diagnosis for Gas Turbines Based on a Kernelized Information Entropy Model
Weiying Wang,1,2 Zhiqiang Xu,1,2 Rui Tang,2 Shuying Li,2 and Wei Wu3
1 College of Power and Energy Engineering, Harbin Engineering University, Harbin 150001, China
2 Harbin Marine Boiler & Turbine Research Institute, Harbin 150036, China
3 Harbin Institute of Technology, Harbin 150001, China
Correspondence should be addressed to Zhiqiang Xu; xuzhiqiang@hit.edu.cn
Received 28 April 2014; Accepted 19 June 2014; Published 28 August 2014
Academic Editor: Xibei Yang
Copyright © 2014 Weiying Wang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Gas turbines are among the most important devices in power engineering and have been widely used in power generation, airplanes, naval ships, and oil drilling platforms. However, in most cases they are monitored without personnel on duty. It is therefore highly desirable to develop techniques and systems to remotely monitor their conditions and analyze their faults. In this work, we introduce a remote system for online condition monitoring and fault diagnosis of gas turbines on offshore oil well drilling platforms based on a kernelized information entropy model. Shannon information entropy is generalized for measuring the uniformity of exhaust temperatures, which reflect the overall state of the gas path of a gas turbine. In addition, we extend the entropy to compute the information quantity of features in kernel spaces, which helps to select informative features for a certain recognition task. Finally, we introduce an information entropy based decision tree algorithm to extract rules from fault samples. Experiments on real-world data show the effectiveness of the proposed algorithms.
1. Introduction
Gas turbines, mechanical systems operating on a thermodynamic cycle usually with air as the working fluid, are among the most important devices in power engineering. Air is compressed, mixed with fuel, and burnt in a combustor, and the generated hot gas is expanded through a turbine to produce power, which drives the compressor and provides the means to overcome external loads. Gas turbines play an increasingly important role in mechanical drives in the oil and gas sectors, electricity generation in the power sector, and propulsion systems in the aerospace and marine sectors.
Safety and economy are always two fundamentally important factors in designing, producing, and operating gas turbine systems. Once a malfunction occurs to a gas turbine, a serious accident or even a disaster may take place. It was reported that about 25 accidents take place every year due to jet engine malfunctions; in 1989, 111 people were killed in a plane crash caused by an engine fault. Although great progress has been made in recent years in the area of condition monitoring and fault diagnosis, how to predict and detect malfunctions is still an open problem for complex systems. In some cases, such as offshore oil well drilling platforms, the main power system is self-monitoring without personnel on duty, so reliability and stability are of critical importance to these systems. There are hundreds of offshore platforms with gas turbines providing electricity and power in China. There is an urgent requirement to design and develop online remote monitoring and health management techniques for these systems.
Hindawi Publishing Corporation, The Scientific World Journal, Volume 2014, Article ID 617162, 13 pages, http://dx.doi.org/10.1155/2014/617162

More than two hundred sensors are installed in each gas turbine for monitoring its state. The data gathered by these sensors reflect the state and trend of the system. If we build a center to monitor two hundred gas turbine systems, we have to watch the data coming from more than forty thousand sensors. Obviously, it is infeasible to analyze them manually. Techniques for intelligent data analysis have therefore been employed in gas turbine monitoring and diagnosis. In 2007, Wang et al. designed a conceptual system for remote monitoring and fault diagnosis of gas turbine based power generation systems [1]. In 2008, Donat et al. discussed the issues of data visualization, data reduction, and ensemble learning for intelligent fault diagnosis in gas turbine engines [2]. In 2009, Li and Nilkitsaranont described a prognostic approach to estimating the remaining useful life of gas turbine engines before their next major overhaul, based on a combined regression technique with both linear and quadratic models [3]. In the same year, Bassily et al. proposed a technique which assesses whether the multivariate autocovariance functions of two independently sampled signals coincide, in order to detect faults in a gas turbine [4]. In 2010, Young et al. presented an offline fault diagnosis method for industrial gas turbines in a steady state using Bayesian data analysis; the authors employed multiple Bayesian models via model averaging to improve the performance of the resulting system [5]. In 2011, Yu et al. designed a sensor fault diagnosis technique for a micro gas turbine engine based on wavelet entropy, where wavelet decomposition is utilized to decompose the signal at different scales, and then the instantaneous wavelet energy entropy and instantaneous wavelet singular entropy are computed based on wavelet entropy theory [6].
In recent years, signal processing and data mining techniques have been combined to extract knowledge and build models for fault diagnosis. In 2012, Wu et al. studied the issue of bearing fault diagnosis based on multiscale permutation entropy and support vector machine [7]. In 2013, they designed a technique for defect diagnostics based on multiscale analysis and support vector machines [8]. Nozari et al. presented a model-based robust fault detection and isolation method with a hybrid structure, where time-delay multilayer perceptron models, local linear neurofuzzy models, and a linear model tree are used in the system [9]. Sarkar et al. [10] designed symbolic dynamic filtering by optimally partitioning sensor observations; the objective is to reduce the effects of sensor noise level variation and magnify the system fault signatures. Feature extraction and pattern classification are used for fault detection in aircraft gas turbine engines.
Entropy is a fundamental concept in the domains of information theory and thermodynamics. It was first defined as a measure of progress towards thermodynamic equilibrium; it was then introduced into information theory by Shannon [11] as a measure of the amount of information that is missing before reception. The concept became popular in both domains [12–16]. Now it is widely used in machine learning and data-driven modeling [17, 18]. In 2011, a new measure called the maximal information coefficient was reported; this function can be used to discover the association between two random variables [19]. However, it cannot be used to compute the relevance between feature sets.
In this work, we develop techniques to detect abnormality and analyze faults based on a generalized information entropy model. Moreover, we describe a system for state monitoring of gas turbines on offshore oil well drilling platforms. First, we describe a system developed for remote and online condition monitoring and fault diagnosis of gas turbines installed on oil drilling platforms. As a vast amount of historical records is gathered in this system, it is an urgent task to design algorithms for automatically detecting abnormality in the data online and for analyzing the data to obtain the causes and sources of faults. Due to the complexity of gas turbine systems, we focus on the gas path subsystem in this work. The entropy function is employed to measure the uniformity of exhaust temperatures, which is a key factor reflecting the health of the gas path of a gas turbine and also its overall performance. Then we extract features from the healthy and abnormal records. An extended information entropy model is introduced to evaluate the quality of these features for selecting informative attributes. Finally, the selected features are used to build models for automatic fault recognition, where support vector machines [20] and C4.5 are considered. Real-world data are collected to show the effectiveness of the proposed techniques.
The remainder of the work is organized as follows. Section 2 describes the architecture of the remote monitoring and fault diagnosis center for gas turbines installed on oil drilling platforms. Section 3 designs an algorithm for detecting abnormality in the exhaust temperatures. Then we extract features from the exhaust temperature data and select informative ones by evaluating the information bottleneck with the extended information entropy in Section 4. Support vector machines and C4.5 are introduced for building fault diagnosis models in Section 5; numerical experiments are also described in this section. Finally, conclusions and future work are given in Section 6.
2. Framework of Remote Monitoring and Fault Diagnosis Center for Gas Turbines
Gas turbines are widely used as sources of power and electricity. The structure of a general gas turbine is presented in Figure 1. This system transforms chemical energy into thermal energy, then mechanical energy, and finally electric energy. Gas turbines are usually considered the hearts of many mechanical systems.
As offshore oil well drilling platforms are usually unattended, an online and remote state monitoring system is very useful in this area, since it can help find abnormality before serious faults occur. However, the sensor data cannot be sent to a center over a ground-based Internet connection; they can only be transmitted via telecommunication satellite, which was too expensive in the past but is now available.
The system consists of four subsystems: the data acquisition and local monitoring subsystem (DALM), the data communication subsystem (DAC), the data management subsystem (DMS), and the intelligent diagnosis subsystem (IDS). The first subsystem gathers the outputs from different sensors and checks whether there is any abnormality in the system. The second one packs the acquired data and transmits them to the monitoring center; users in the center can also send a message to this subsystem to ask for some special data if an abnormality or fault occurs. The data management subsystem stores the historical information as well as fault data and fault cases. A data compression algorithm is embedded in the system: as most of the historical data are useless for the final analysis, they are compressed and removed to save storage space. Finally, the IDS watches the alarm information from the different unit assemblies and starts the corresponding module to analyze the related information. The system gives a decision and explains how the decision has been made. The structure of the system is shown in Figure 2.

Figure 1: Prototype structure of a gas turbine (a Titan 130 single-shaft gas turbine for power generation applications; labeled components: compressor, combustor, turbine, exhaust, and output shaft and gearbox).
One of the webpages of the system is given in Figure 3, where we can see the rose figure of the exhaust temperatures; some statistical parameters varying with time are also presented.
3. Abnormality Detection in Exhaust Temperatures Based on Information Entropy
Exhaust temperature is one of the most critical parameters of a gas turbine, as excessive turbine temperatures may lead to life reduction or catastrophic failures. In the current generation of machines, temperatures at the combustor discharge are too high for the type of instrumentation available, so the exhaust temperature is also used as an indicator of the turbine inlet temperature.
As the temperature profile out of a gas turbine is not uniform, a number of probes help pinpoint disturbances or malfunctions in the gas turbine by highlighting shifts in the temperature profile. Thus, a set of thermometers is usually fixed on the exhaust. If the system is operating normally, all the thermometers give similar outputs; however, if a fault occurs in some component of the turbine, different temperatures will be observed. The uniformity of the exhaust temperatures reflects the state of the system, so we should develop an index to measure it. In this work, we consider the entropy function, as it is widely used for measuring the uniformity of random variables; to the best of our knowledge, however, this function has not been used in this domain.
Assume that there are $n$ thermometers and their outputs are $T_i$, $i = 1, \dots, n$. Then we define the uniformity of these outputs as
$$E(T) = -\sum_{i=1}^{n} \frac{T_i}{T} \log_2 \frac{T_i}{T}, \qquad (1)$$
where $T = \sum_j T_j$. As $T_i \ge 0$, we define $0 \log 0 = 0$.
Obviously, we have $\log_2 n \ge E(T) \ge 0$. $E(T) = \log_2 n$ if and only if $T_1 = T_2 = \cdots = T_n$; in this case all the thermometers produce the same output, so the uniformity of the sensors is maximal. In the other extreme case, if $T_1 = T_2 = \cdots = T_{i-1} = T_{i+1} = \cdots = T_n = 0$ and $T_i = T$, then $E(T) = 0$.
It is notable that the value of the entropy is independent of the absolute values of the thermometer readings; it depends only on the distribution of the temperatures. The entropy is maximal if all the thermometers output the same value.
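The uniformity index of Eq. (1) can be sketched as follows (a minimal NumPy illustration; the function name is ours, not the authors'):

```python
import numpy as np

def exhaust_uniformity_entropy(temps):
    """Uniformity index E(T) of Eq. (1) for non-negative readings.

    Returns a value in [0, log2(n)]: the maximum when all probes agree,
    0 when a single probe carries the whole total.
    """
    temps = np.asarray(temps, dtype=float)
    p = temps / temps.sum()       # normalized share T_i / T of each probe
    p = p[p > 0]                  # convention: 0 * log2(0) = 0
    return float(-(p * np.log2(p)).sum())

# Thirteen equal readings give the maximal entropy log2(13) ≈ 3.7004:
print(exhaust_uniformity_entropy([500.0] * 13))
```

With one probe carrying all the heat the index drops to 0, matching the extreme cases discussed above.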
Now we show two sets of real exhaust temperatures measured on an oil well drilling platform where 13 thermometers are fixed. In the first set, the gas turbine starts from a time point, runs for several minutes, and finally stops.
Observing the curves in Figure 4, we can see that the 13 thermometers give almost the same outputs at the beginning. In fact, the outputs are the room temperature in this case, as shown in Figure 6(a); thus the entropy reaches its peak value.
Some typical samples are presented in Figure 6, where the temperature distributions around the exhaust at time points t = 5, 130, 250, 400, and 500 are given. Obviously, the distributions at t = 130, 250, and 400 are not desirable; it can be inferred that some abnormality occurs in the system. The entropy of the temperature distribution is given in Figure 5.
Figure 2: Structure of the remote system for condition monitoring and fault analysis, linking the oil drilling platforms (gas turbine, data acquisition, local monitoring) through firewalls and a satellite link to the diagnosis center (online database subsystem, intelligent diagnosis subsystem, human-system interface, video conference system) and Internet clients.
Figure 3: A typical webpage of the monitoring subsystem.
Figure 4: Exhaust temperatures from a set of thermometers, plotted against time.
Figure 5: Uniformity (information entropy) of the temperatures over time (red dashed line: ideal case; blue line: real case).
Figure 6: Samples of the temperature distribution at different times: rose maps of exhaust temperatures at (a) t = 5, (b) t = 130, (c) t = 250, (d) t = 400, and (e) t = 500.
Another example is given in Figures 7 to 9. In this example, there is a significant difference between the outputs of the 13 thermometers even when the gas turbine is not running, as shown in Figure 9(a). Thus, the entropy of the temperature distribution is a little lower than in the ideal case, as shown in Figure 8. Besides, some representative samples are given in Figure 9.
Considering the above examples, we can see that the entropy function is an effective measurement of uniformity and can be used to reflect the uniformity of the exhaust temperatures. If the uniformity is less than a threshold, some fault has possibly occurred in the gas path of the gas turbine. Thus, the entropy function is used as an index of the health of the gas path.
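The thresholding idea can be sketched as follows (our illustration; the alarm margin is a hypothetical value of our choosing and would, in practice, be tuned on healthy operating data):

```python
import numpy as np

def gas_path_alarm(temps, margin=0.2):
    """Flag a possible gas-path fault when the uniformity index of Eq. (1)
    drops more than `margin` bits below its ideal value log2(n).

    `margin` is illustrative only; it is not a threshold from the paper.
    """
    temps = np.asarray(temps, dtype=float)
    p = temps / temps.sum()
    p = p[p > 0]
    entropy = float(-(p * np.log2(p)).sum())
    return entropy < np.log2(len(temps)) - margin

print(gas_path_alarm([500.0] * 13))             # False: uniform profile, healthy
print(gas_path_alarm([100.0] * 12 + [900.0]))   # True: one probe runs hot
```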
Figure 7: Exhaust temperatures from another set of thermometers, plotted against time.

Figure 8: Entropy of the temperature distribution, where the red dashed line is the ideal case and the blue one is the real case.

4. Fault Feature Quality Evaluation with Generalized Entropy

The above section gives an approach to detecting abnormality in the exhaust temperature distribution. However, the entropy function cannot distinguish what kind of fault has occurred in the system, although it detects the abnormality. In order to analyze why the temperature distribution is not uniform, we should develop algorithms to recognize the fault.
Before training an intelligent model, we should construct features and select the most informative subset to represent different faults. In this section, we discuss this issue.
Intuitively, the temperatures of all thermometers reflect the state of the system. Besides, the temperature differences between neighboring thermometers also indicate the source of faults; these are considered as spatial neighboring information. Moreover, the temperature change of a thermometer over time gives hints for studying the faults; this can be viewed as temporal neighboring information. In fact, the inlet temperature $T_0$ is also an important factor. In summary, we can use the exhaust temperatures and their neighboring information along time and space to recognize different faults. If there are $n$ ($n = 13$ in our system) thermometers, we can form a feature vector to describe the state of the exhaust system as
$$F = \{T_0, T_1, T_2, \dots, T_n, T_1 - T_2, T_2 - T_3, \dots, T_n - T_1, T'_1, T'_2, \dots, T'_n\}, \qquad (2)$$
where $T'_i = T_i(j) - T_i(j-1)$ and $T_i(j)$ is the temperature at time $j$ of the $i$th thermometer.

Apart from the above features, we can also construct other attributes to reflect the conditions of the gas turbine. In this work, we consider a gas turbine with 13 thermometers around the exhaust, so we finally form a 40-attribute vector.
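The construction of Eq. (2) can be sketched as follows (a minimal NumPy illustration; the function name and argument layout are ours):

```python
import numpy as np

def exhaust_feature_vector(inlet_temp, temps_now, temps_prev):
    """Build the 40-dimensional state vector of Eq. (2).

    inlet_temp: inlet temperature T0.
    temps_now:  the n = 13 exhaust readings T_1..T_n at time j.
    temps_prev: the same readings at time j - 1.
    """
    t = np.asarray(temps_now, dtype=float)
    t_prev = np.asarray(temps_prev, dtype=float)
    space_diff = t - np.roll(t, -1)   # T_1 - T_2, ..., T_n - T_1
    time_diff = t - t_prev            # T'_i = T_i(j) - T_i(j - 1)
    return np.concatenate(([inlet_temp], t, space_diff, time_diff))

# With 13 thermometers the vector has 1 + 13 + 13 + 13 = 40 attributes:
f = exhaust_feature_vector(15.0, np.full(13, 500.0), np.full(13, 498.0))
print(f.shape)   # (40,)
```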
Some questions arise: whether all the extracted features are useful for the final modeling, and how we can evaluate the features and find the most informative ones. In fact, there are a number of measures for estimating feature quality, such as dependency in rough set theory [21], consistency [22], mutual information in information theory [23], and classification margin in statistical learning theory [24]. However, all these measures are computed in the original input space, while effective classification techniques usually implement a nonlinear mapping of the original space to a feature space via a kernel function. In this case, we require a new measure that reflects the classification information of the feature space. We now extend the traditional information entropy to measure it.
Given a set of samples $U = \{x_1, x_2, \dots, x_m\}$, each sample is described with $n$ features $F = \{f_1, f_2, \dots, f_n\}$. As to classification learning, each training sample $x_i$ is associated with a decision $y_i$. As to an arbitrary subset $F' \subseteq F$ and a kernel function $K$, we can calculate a kernel matrix
$$K = \begin{bmatrix} k_{11} & \cdots & k_{1m} \\ \vdots & \ddots & \vdots \\ k_{m1} & \cdots & k_{mm} \end{bmatrix}, \qquad (3)$$
where $k_{ij} = k(x_i, x_j)$. The Gaussian function is a representative kernel function:
$$k_{ij} = \exp\left(-\frac{\|x_i - x_j\|^2}{\sigma}\right). \qquad (4)$$
A number of kernel functions have the properties (1) $k_{ij} \in [0, 1]$; (2) $k_{ij} = k_{ji}$.
The kernel matrix plays a bottleneck role in kernel-based learning [25]; all the information that a classification algorithm can use is hidden in this matrix. At the same time, we can also calculate a decision kernel matrix
$$D = \begin{bmatrix} d_{11} & \cdots & d_{1m} \\ \vdots & \ddots & \vdots \\ d_{m1} & \cdots & d_{mm} \end{bmatrix}, \qquad (5)$$
where $d_{ij} = 1$ if $y_i = y_j$; otherwise $d_{ij} = 0$. In fact, the matrix $D$ is a matching kernel.
Definition 1. Given a set of samples $U = \{x_1, x_2, \dots, x_m\}$, each sample is described with $n$ features $F = \{f_1, f_2, \dots, f_n\}$, and $F' \subseteq F$. $K$ is a kernel matrix over $U$ in terms of $F'$. Then the entropy of $F'$ is defined as
$$E(K) = -\frac{1}{m} \sum_{i=1}^{m} \log_2 \frac{K_i}{m}, \qquad (6)$$
where $K_i = \sum_{j=1}^{m} k_{ij}$.

As to the above entropy function, if we use the Gaussian function as the kernel, we have $\log_2 m \ge E(K) \ge 0$: $E(K) = 0$ if and only if $k_{ij} = 1$, $\forall i, j$; $E(K) = \log_2 m$ if and only if $k_{ij} = 0$ for $i \ne j$. $E(K) = 0$ means that no pair of samples can be distinguished with the current features, while $E(K) = \log_2 m$ means that any pair of samples is different from each other, so they can all be distinguished. These are two extreme cases. In real-world applications, part of the samples can be discerned with the available features while others cannot; in this case, the entropy function takes values in the interval $[0, \log_2 m]$.

Moreover, it is easy to show that if $K_1 \subseteq K_2$, then $E(K_1) \ge E(K_2)$, where $K_1 \subseteq K_2$ means $K_1(x_i, x_j) \le K_2(x_i, x_j)$, $\forall i, j$.

Figure 9: Samples of the temperature distribution at different moments: rose maps of exhaust temperatures at (a) t = 500, (b) t = 758, (c) t = 820, and (d) t = 1220.
Definition 2. Given a set of samples $U = \{x_1, x_2, \dots, x_m\}$, each sample is described with $n$ features $F = \{f_1, f_2, \dots, f_n\}$, and $F_1, F_2 \subseteq F$. $K_1$ and $K_2$ are two kernel matrices induced by $F_1$ and $F_2$, and $K$ is a new kernel matrix computed with $F_1 \cup F_2$. Then the joint entropy of $F_1$ and $F_2$ is defined as
$$E(K_1, K_2) = E(K) = -\frac{1}{m} \sum_{i=1}^{m} \log_2 \frac{K_i}{m}, \qquad (7)$$
where $K_i = \sum_{j=1}^{m} k_{ij}$.

As to the Gaussian function, $K(x_i, x_j) = K_1(x_i, x_j) \times K_2(x_i, x_j)$. Thus $K \subseteq K_1$ and $K \subseteq K_2$; in this case, $E(K) \ge E(K_1)$ and $E(K) \ge E(K_2)$.
Definition 3. Given a set of samples $U = \{x_1, x_2, \dots, x_m\}$, each sample is described with $n$ features $F = \{f_1, f_2, \dots, f_n\}$. One has $F_1, F_2 \subseteq F$; $K_1$ and $K_2$ are two kernel matrices induced by $F_1$ and $F_2$, and $K$ is a new kernel matrix computed with $F_1 \cup F_2$. Knowing $F_1$, the conditional entropy of $F_2$ is defined as
$$E(K_2 \mid K_1) = E(K) - E(K_1). \qquad (8)$$
As to the Gaussian kernel, $E(K) \ge E(K_1)$ and $E(K) \ge E(K_2)$, so $E(K_1 \mid K_2) \ge 0$ and $E(K_2 \mid K_1) \ge 0$.
Definition 4. Given a set of samples $U = \{x_1, x_2, \dots, x_m\}$, each sample is described with $n$ features $F = \{f_1, f_2, \dots, f_n\}$. One has $F_1, F_2 \subseteq F$; $K_1$ and $K_2$ are two kernel matrices induced by $F_1$ and $F_2$, and $K$ is a new kernel matrix computed with $F_1 \cup F_2$. Then the mutual information of $K_1$ and $K_2$ is defined as
$$\mathrm{MI}(K_1, K_2) = E(K_1) + E(K_2) - E(K). \qquad (9)$$
As to the Gaussian kernel, $\mathrm{MI}(K_1, K_2) = \mathrm{MI}(K_2, K_1)$. If $K_1 \subseteq K_2$, we have $\mathrm{MI}(K_1, K_2) = E(K_2)$, and if $K_2 \subseteq K_1$, we have $\mathrm{MI}(K_1, K_2) = E(K_1)$.
Figure 10: Fuzzy dependency between a single feature and the decision.
Figure 11: Kernelized mutual information between a single feature and the decision.
Please note that if $F_1 \subseteq F_2$, we have $K_2 \subseteq K_1$. However, $K_2 \subseteq K_1$ does not imply $F_1 \subseteq F_2$.
Definition 5. Given a set of samples $U = \{x_1, x_2, \dots, x_m\}$, each sample is described with $n$ features $F = \{f_1, f_2, \dots, f_n\}$, and $F' \subseteq F$. $K$ is a kernel matrix over $U$ in terms of $F'$, and $D$ is the kernel matrix computed with the decision. Then the significance of the feature subset $F'$ related to the decision is defined as
$$\mathrm{MI}(K, D) = E(K) + E(D) - E(K, D). \qquad (10)$$
$\mathrm{MI}(K, D)$ measures the importance of the feature subset $F'$ in the kernel space for distinguishing different classes. It can be understood as a kernelized version of Shannon mutual information, which is widely used in feature evaluation and selection. In fact, it is easy to derive the equivalence between this entropy function and the Shannon entropy under the condition that the attributes are discrete and the matching kernel is used.
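Eq. (10) can be sketched as follows (our illustration, assuming the Gaussian feature kernel of Eq. (4), the matching kernel of Eq. (5), and the elementwise product as the joint kernel, as in Definition 2):

```python
import numpy as np

def gaussian_kernel(X, sigma=1.0):
    """Gaussian kernel matrix, Eq. (4)."""
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / sigma)

def kernel_entropy(K):
    """E(K) of Eq. (6)."""
    m = K.shape[0]
    return float(-np.log2(K.sum(axis=1) / m).mean())

def matching_kernel(y):
    """Decision matrix D of Eq. (5): d_ij = 1 iff y_i = y_j."""
    y = np.asarray(y)
    return (y[:, None] == y[None, :]).astype(float)

def feature_significance(X, y, sigma=1.0):
    """MI(K, D) = E(K) + E(D) - E(K, D), Eq. (10).

    The joint entropy E(K, D) is computed on the elementwise product
    of the feature kernel and the matching kernel.
    """
    K = gaussian_kernel(X, sigma)
    D = matching_kernel(y)
    return kernel_entropy(K) + kernel_entropy(D) - kernel_entropy(K * D)
```

A feature that separates the classes in the kernel space scores high, while a constant (uninformative) feature scores 0, since its kernel matrix is all ones.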
Now we show an example in gas turbine fault diagnosis. We collected 3581 samples from two sets of gas turbine systems; 1440 samples are healthy, and the others belong to four kinds of faults: load rejection, sensor fault, fuel switching, and salt spray corrosion, with 45, 588, 71, and 1437 samples, respectively. Thirteen thermometers are installed in the exhaust. According to the approach described above, we form a 40-dimensional vector to represent the state of the exhaust.
Figure 12: Fuzzy dependency between the selected features and the decision (Features 5, 37, 2, and 3 are selected sequentially; the dependency rises through 0.3475, 0.9006, 0.9969, and 0.9996).
Figure 13: Kernelized mutual information between the selected features and the decision (Features 39, 31, 38, and 40 are selected sequentially; the mutual information rises through 1.267, 1.510, 1.577, and 1.612).
Obviously, the classification task is hard to interpret in such a high-dimensional space. Moreover, some features may be redundant for classification learning, which may confuse the learning algorithm and reduce modeling performance. So it is a key preprocessing step to select a necessary and sufficient subset.
Here we compare the fuzzy rough set based feature evaluation algorithm with the proposed kernelized mutual information. Fuzzy dependency has been widely discussed and applied in feature selection and attribute reduction in recent years [26–28]. Fuzzy dependency can be understood as the average distance from the samples to their nearest neighbors belonging to different classes, while the kernelized mutual information reflects the relevance between the features and the decision in the kernel space.
Comparing Figures 10 and 11, a significant difference is observed. As to fuzzy rough sets, Feature 5 produces the largest dependency, followed by Feature 38; however, Feature 39 gets the largest mutual information, with Feature 2 second. Thus, different feature evaluation functions lead to completely different results.
Figures 10 and 11 present the significance of single features. In applications, we should combine a set of features. Now we consider a greedy search strategy: starting from an empty set, the best features are added one by one. In each round, we select the feature which produces the largest significance increment with respect to the selected subset. Both fuzzy dependency and kernelized mutual information increase monotonically as new attributes are added. If the selected features are sufficient for classification, these two functions will remain invariant when any new attribute is added, so we can stop the algorithm if the increment of significance is less than a given threshold. The significances of the selected feature subsets are shown in Figures 12 and 13, respectively.

Figure 14: Scatter plots in 2D spaces spanned by feature pairs selected by fuzzy dependency (classes 1–5).
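The greedy search above can be sketched as follows (our illustration; `evaluate` would be the kernelized mutual information MI(K, D) of Eq. (10), and `pair_separation` below is only a toy stand-in evaluator, not the paper's measure):

```python
import numpy as np

def greedy_select(X, y, evaluate, eps=1e-3):
    """Greedy forward selection: in each round add the feature giving the
    largest significance increment; stop when the increment is below eps."""
    n = X.shape[1]
    selected, best_score = [], 0.0
    while len(selected) < n:
        scores = {j: evaluate(X[:, selected + [j]], y)
                  for j in range(n) if j not in selected}
        j_best = max(scores, key=scores.get)
        if scores[j_best] - best_score < eps:
            break                        # no meaningful increment: stop
        selected.append(j_best)
        best_score = scores[j_best]
    return selected, best_score

# Toy evaluator: fraction of differently labeled sample pairs that the
# candidate subset separates (monotone in the subset, like MI(K, D)).
def pair_separation(Xs, y):
    diff_label = y[:, None] != y[None, :]
    diff_feat = np.abs(Xs[:, None, :] - Xs[None, :, :]).sum(-1) > 1e-9
    return float((diff_label & diff_feat).sum() / diff_label.sum())

y = np.array([0, 0, 1, 1])
X = np.array([[0.0, 5.0], [0.0, 7.0], [1.0, 5.0], [1.0, 9.0]])
print(greedy_select(X, y, pair_separation))   # ([0], 1.0): feature 0 suffices
```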
In order to show the effectiveness of the algorithm, we give the scatter plots in 2D spaces, as shown in Figures 14 to 16, which are spanned by the feature pairs selected by fuzzy dependency, kernelized mutual information, and Shannon mutual information. As to fuzzy dependency, we select Features 5, 37, 2, and 3; then there are 4 × 4 = 16 combinations of feature pairs. The subplot in the $i$th row and $j$th column of Figure 14 gives the scatter of samples in the 2D space spanned by the $i$th and $j$th selected features.
Observing the 2nd subplot in the first row of Figure 14, we can find that the classification task is nonlinear. The first class is dispersed, and the third class is also located in different regions, which leads to difficulty in learning classification models.
However, in the corresponding subplot of Figure 15, we can see that each class is relatively compact, which leads to a small intraclass distance. Moreover, the samples in the five classes can be classified with linear models, which also brings the benefit of learning a simple classification model.
Comparing Figures 15 and 16, we can find that in the feature spaces selected by Shannon mutual information, different classes overlap or get entangled, which leads to bad classification performance.

Figure 15: Scatter plots in 2D spaces spanned by feature pairs selected by kernelized mutual information (classes 1–5).

5. Diagnosis Modeling with Information Entropy Based Decision Tree Algorithm

After selecting the informative features, we now turn to classification modeling. There are a great number of learning algorithms for building a classification model. Generalization capability and interpretability are the two most important criteria for evaluating an algorithm. As to fault diagnosis, a domain expert usually accepts a model which is consistent with his common knowledge; he expects the model to be understandable, otherwise he will not trust its outputs. In addition, if the model is understandable, a domain expert can adapt it according to his prior knowledge, which makes the model suitable for different diagnosis objects.
Decision tree algorithms including CART [29] ID3[17] and C45 [18] are such techniques for training anunderstandable classification model The learned model canbe transformed into a set of rules All these algorithms builda decision tree from training samples They start from a rootnode and select one of the features to divide the samples withcuts into different branches according to their feature valuesThis procedure is interactively conducted until the branch ispure or a stopping criterion is satisfiedThe key difference liesin the evaluation function in selecting attributes or cuts InCART splitting rules GINI and Twoing are adopted whileID3 uses information gain and C45 takes information gainratioMoreover C45 can deal with numerical attributes com-pared with ID3 Competent performance is usually observedwith C45 in real-world applications compared with somepopular algorithms including SVM and Baysian net In this
work we introduce C45 to train classification models Thepseudocode of C45 is formulated as follows
Decision tree algorithm C4.5.
Input: a set of training samples U = {x_1, x_2, ..., x_m} with features F = {f_1, f_2, ..., f_n}; a stopping criterion \tau.
Output: a decision tree T.
(1) Check the sample set.
(2) For each attribute f, compute the normalized information gain ratio from splitting on f.
(3) Let f_best be the attribute with the highest normalized information gain ratio.
(4) Create a decision node that splits on f_best.
(5) Recurse on the sublists obtained by splitting on f_best and add those nodes as children of the node, until the stopping criterion \tau is satisfied.
(6) Output T.
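The attribute-selection step (2) hinges on the normalized information gain ratio. The following is a minimal sketch of that computation for a discrete attribute only (the helper names are ours; real C4.5 additionally searches numeric cut points and handles missing values):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(values, labels):
    """Information gain from splitting on `values`, normalized by the
    entropy of the partition itself (the 'split information')."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    cond = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - cond
    split_info = entropy(values)  # entropy of the attribute's value distribution
    return gain / split_info if split_info > 0 else 0.0
```

A perfectly class-separating attribute attains a gain ratio of 1, while an attribute independent of the class attains 0.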
Figure 16: Scatter plots in the 2D spaces spanned by the feature pairs selected by Shannon mutual information (axes normalized to [0, 1]; legend entries 1-5 denote the five classes).
Figure 17: Decision tree trained on the features selected with fuzzy rough sets (root split on F2 at 0.18 and 0.50; internal splits on F37 at 0.49 and F3 at 0.41; leaves are Classes 1-5).
We input the datasets into C4.5 and build the following two decision trees. Features 5, 37, 2, and 3 are included in the first dataset, and Features 39, 31, 38, and 40 are selected in the second dataset. The two trees are given in Figures 17 and 18, respectively.
Figure 18: Decision tree trained on the features selected with kernelized mutual information (root split on F39 at 0.17 and 0.45; internal splits on F40 at 0.42 and F38 at 0.80; leaves are Classes 1-5).
We start from the root node and follow a branch to a leaf node; a rule is then extracted from the tree. As to the first tree, we can get five decision rules:

(1) if F2 > 0.50 and F37 > 0.49, then the decision is Class 4;
(2) if F2 > 0.50 and F37 <= 0.49, then the decision is Class 1;
(3) if 0.18 < F2 <= 0.50 and F3 > 0.41, then the decision is Class 5;
(4) if 0.18 < F2 <= 0.50 and F3 <= 0.41, then the decision is Class 3;
(5) if F2 <= 0.18, then the decision is Class 2.
As to the second decision tree, we can also obtain five rules:

(1) if F39 > 0.45 and F38 > 0.80, then the decision is Class 4;
(2) if F39 > 0.45 and F38 <= 0.80, then the decision is Class 1;
(3) if 0.17 < F39 <= 0.45, then the decision is Class 2;
(4) if F39 <= 0.17 and F40 > 0.42, then the decision is Class 5;
(5) if F39 <= 0.17 and F40 <= 0.42, then the decision is Class 3.
We can see that the derived decision trees are rather simple, and five rules can be extracted from each. It is very easy for domain experts to understand the rules and even revise them. As the classification task is relatively simple, the accuracy of each model is as high as 97%. As new samples and faults are recorded by the system, more and more complex tasks may be stored; in that case, the model may become more and more complex.
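Rule sets of this kind translate directly into executable predicates. Below is a sketch of the five rules extracted from the first tree (Figure 17) as a classifier; the function name is ours, and the thresholds are the rule constants read as decimals (e.g. 050 denotes 0.50):

```python
def classify_tree1(f2, f3, f37):
    """Apply the five rules extracted from the first decision tree."""
    if f2 > 0.50:
        return 4 if f37 > 0.49 else 1   # rules (1) and (2)
    if f2 > 0.18:                        # i.e. 0.18 < f2 <= 0.50
        return 5 if f3 > 0.41 else 3     # rules (3) and (4)
    return 2                             # rule (5): f2 <= 0.18
```

Because the rules are mutually exclusive and exhaustive in F2, F3, and F37, every input falls into exactly one class.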
6. Conclusions and Future Work
Automatic fault detection and diagnosis are highly desirable in some industries, such as offshore oil well drilling platforms, for such systems are self-monitoring without staff on duty. In this work, we design an intelligent abnormality detection and fault recognition technique for the exhaust system of gas turbines based on information entropy, which is used in measuring the uniformity of exhaust temperatures, evaluating the significance of features in kernel spaces, and selecting splitting nodes for constructing decision trees. The main contributions of the work are in two parts. First, we introduce the entropy function to measure the uniformity of exhaust temperatures. The measure is easy to compute and understand, and numerical experiments show its effectiveness. Second, we extend Shannon entropy for evaluating the significance of attributes in kernelized feature spaces. We compute the relevance between a kernel matrix induced by a set of attributes and the matrix computed with the decision variable. Some numerical experiments are also presented, and good results are derived.
Although this work gives an effective framework for automatic fault detection and recognition, the proposed technique has not been tested on large-scale real tasks. We have developed a remote state monitoring and fault diagnosis system, and large-scale data are flooding into the center. In the future, we will improve these techniques and develop a reliable diagnosis system.
Conflict of Interests
The authors declare that they have no conflict of interests regarding the publication of this paper.
Acknowledgment
This work is partially supported by the National Natural Foundation under Grants 61222210 and 61105054.
References
[1] C. Wang, L. Xu, and W. Peng, "Conceptual design of remote monitoring and fault diagnosis systems," Information Systems, vol. 32, no. 7, pp. 996-1004, 2007.
[2] W. Donat, K. Choi, W. An, S. Singh, and K. Pattipati, "Data visualization, data reduction and classifier fusion for intelligent fault diagnosis in gas turbine engines," Journal of Engineering for Gas Turbines and Power, vol. 130, no. 4, Article ID 041602, 2008.
[3] Y. G. Li and P. Nilkitsaranont, "Gas turbine performance prognostic for condition-based maintenance," Applied Energy, vol. 86, no. 10, pp. 2152-2161, 2009.
[4] H. Bassily, R. Lund, and J. Wagner, "Fault detection in multivariate signals with applications to gas turbines," IEEE Transactions on Signal Processing, vol. 57, no. 3, pp. 835-842, 2009.
[5] K. Young, D. Lee, V. Vitali, and Y. Ming, "A fault diagnosis method for industrial gas turbines using Bayesian data analysis," Journal of Engineering for Gas Turbines and Power, vol. 132, Article ID 041602, 2010.
[6] B. Yu, D. Liu, and T. Zhang, "Fault diagnosis for micro-gas turbine engine sensors via wavelet entropy," Sensors, vol. 11, no. 10, pp. 9928-9941, 2011.
[7] S. Wu, P. Wu, C. Wu, J. Ding, and C. Wang, "Bearing fault diagnosis based on multiscale permutation entropy and support vector machine," Entropy, vol. 14, no. 8, pp. 1343-1356, 2012.
[8] S. Wu, C. Wu, T. Wu, and C. Wang, "Multi-scale analysis based ball bearing defect diagnostics using Mahalanobis distance and support vector machine," Entropy, vol. 15, no. 2, pp. 416-433, 2013.
[9] H. A. Nozari, M. A. Shoorehdeli, S. Simani, and H. D. Banadaki, "Model-based robust fault detection and isolation of an industrial gas turbine prototype using soft computing techniques," Neurocomputing, vol. 91, pp. 29-47, 2012.
[10] S. Sarkar, X. Jin, and A. Ray, "Data-driven fault detection in aircraft engines with noisy sensor measurements," Journal of Engineering for Gas Turbines and Power, vol. 133, no. 8, Article ID 081602, 10 pages, 2011.
[11] C. E. Shannon, "A mathematical theory of communication," The Bell System Technical Journal, vol. 27, pp. 379-423, 1948.
[12] S. M. Pincus, "Approximate entropy as a measure of system complexity," Proceedings of the National Academy of Sciences of the United States of America, vol. 88, no. 6, pp. 2297-2301, 1991.
[13] L. I. Kuncheva and C. J. Whitaker, "Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy," Machine Learning, vol. 51, no. 2, pp. 181-207, 2003.
[14] I. Csiszar, "Axiomatic characterizations of information measures," Entropy, vol. 10, no. 3, pp. 261-273, 2008.
[15] M. Zanin, L. Zunino, O. A. Rosso, and D. Papo, "Permutation entropy and its main biomedical and econophysics applications: a review," Entropy, vol. 14, no. 8, pp. 1553-1577, 2012.
[16] C. Wang and H. Shen, "Information theory in scientific visualization," Entropy, vol. 13, pp. 254-273, 2011.
[17] J. R. Quinlan, "Induction of decision trees," Machine Learning, vol. 1, no. 1, pp. 81-106, 1986.
[18] J. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, 1993.
[19] D. N. Reshef, Y. A. Reshef, H. K. Finucane et al., "Detecting novel associations in large data sets," Science, vol. 334, no. 6062, pp. 1518-1524, 2011.
[20] C. J. C. Burges, "A tutorial on support vector machines for pattern recognition," Data Mining and Knowledge Discovery, vol. 2, no. 2, pp. 121-167, 1998.
[21] Q. Hu, D. Yu, W. Pedrycz, and D. Chen, "Kernelized fuzzy rough sets and their applications," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 11, pp. 1649-1667, 2011.
[22] Q. Hu, H. Zhao, Z. Xie, and D. Yu, "Consistency based attribute reduction," in Advances in Knowledge Discovery and Data Mining, Z.-H. Zhou, H. Li, and Q. Yang, Eds., vol. 4426 of Lecture Notes in Computer Science, pp. 96-107, Springer, Berlin, Germany, 2007.
[23] Q. Hu, D. Yu, and Z. Xie, "Information-preserving hybrid data reduction based on fuzzy-rough techniques," Pattern Recognition Letters, vol. 27, no. 5, pp. 414-423, 2006.
[24] R. Gilad-Bachrach, A. Navot, and N. Tishby, "Margin based feature selection: theory and algorithms," in Proceedings of the 21st International Conference on Machine Learning (ICML '04), pp. 337-344, July 2004.
[25] J. Shawe-Taylor and N. Cristianini, Kernel Methods for Pattern Analysis, Cambridge University Press, 2004.
[26] Q. Hu, Z. Xie, and D. Yu, "Hybrid attribute reduction based on a novel fuzzy-rough model and information granulation," Pattern Recognition, vol. 40, no. 12, pp. 3509-3521, 2007.
[27] R. Jensen and Q. Shen, "New approaches to fuzzy-rough feature selection," IEEE Transactions on Fuzzy Systems, vol. 17, no. 4, pp. 824-838, 2009.
[28] S. Zhao, E. C. C. Tsang, and D. Chen, "The model of fuzzy variable precision rough sets," IEEE Transactions on Fuzzy Systems, vol. 17, no. 2, pp. 451-467, 2009.
[29] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone, Classification and Regression Trees, Wadsworth & Brooks/Cole Advanced Books & Software, Monterey, Calif, USA, 1984.
and ensemble learning for intelligent fault diagnosis in gas turbine engines [2]. In 2009, Li and Nilkitsaranont described a prognostic approach to estimating the remaining useful life of gas turbine engines before their next major overhaul, based on a combined regression technique with both linear and quadratic models [3]. In the same year, Bassily et al. proposed a technique which assessed whether or not the multivariate autocovariance functions of two independently sampled signals coincide, in order to detect faults in a gas turbine [4]. In 2010, Young et al. presented an offline fault diagnosis method for industrial gas turbines in a steady state using Bayesian data analysis; the authors employed multiple Bayesian models via model averaging to improve the performance of the resulting system [5]. In 2011, Yu et al. designed a sensor fault diagnosis technique for micro-gas turbine engines based on wavelet entropy, where wavelet decomposition was utilized to decompose the signal at different scales, and then the instantaneous wavelet energy entropy and instantaneous wavelet singular entropy were computed based on wavelet entropy theory [6].

In recent years, signal processing and data mining techniques have been combined to extract knowledge and build models for fault diagnosis. In 2012, Wu et al. studied the issue of bearing fault diagnosis based on multiscale permutation entropy and support vector machines [7]. In 2013, they designed a technique for defect diagnostics based on multiscale analysis and support vector machines [8]. Nozari et al. presented a model-based robust fault detection and isolation method with a hybrid structure, where time-delay multilayer perceptron models, local linear neurofuzzy models, and a linear model tree were used in the system [9]. Sarkar et al. [10] designed symbolic dynamic filtering by optimally partitioning sensor observations; the objective is to reduce the effects of sensor noise level variation and magnify the system fault signatures. Feature extraction and pattern classification are used for fault detection in aircraft gas turbine engines.

Entropy is a fundamental concept in the domains of information theory and thermodynamics. It was first defined as a measure of progress towards thermodynamic equilibrium; it was then introduced into information theory by Shannon [11] as a measure of the amount of information that is missing before reception. This concept became popular in both domains [12-16], and it is now widely used in machine learning and data-driven modeling [17, 18]. In 2011, a new measure called the maximal information coefficient was reported; this function can be used to discover the association between two random variables [19]. However, it cannot be used to compute the relevance between feature sets.

In this work, we develop techniques to detect abnormality and analyze faults based on a generalized information entropy model. Moreover, we also describe a system for state monitoring of gas turbines on offshore oil well drilling platforms. First, we describe a system developed for remote and online condition monitoring and fault diagnosis of gas turbines installed on oil drilling platforms. As a vast amount of historical records is gathered in this system, it is an urgent task to design algorithms for automatically detecting abnormality in the data online and analyzing the data to
obtain the causes and sources of faults. Due to the complexity of gas turbine systems, we focus on the gas-path subsystem in this work. The entropy function is employed to measure the uniformity of exhaust temperatures, which is a key factor reflecting the health of the gas path of a gas turbine and also reflecting the performance of the gas turbine. Then we extract features from the healthy and abnormal records. An extended information entropy model is introduced to evaluate the quality of these features for selecting informative attributes. Finally, the selected features are used to build models for automatic fault recognition, where support vector machines [20] and C4.5 are considered. Real-world data are collected to show the effectiveness of the proposed techniques.

The remainder of the work is organized as follows. Section 2 describes the architecture of the remote monitoring and fault diagnosis center for gas turbines installed on oil drilling platforms. Section 3 designs an algorithm for detecting abnormality in the exhaust temperatures. Then we extract features from the exhaust temperature data and select informative ones by evaluating the information bottleneck with extended information entropy in Section 4. Support vector machines and C4.5 are introduced for building fault diagnosis models in Section 5; numerical experiments are also described in that section. Finally, conclusions and future work are given in Section 6.
2. Framework of the Remote Monitoring and Fault Diagnosis Center for Gas Turbines
Gas turbines are widely used as power and electric power sources. The structure of a general gas turbine is presented in Figure 1. This system transforms chemical energy into thermal power, then mechanical energy, and finally electric energy. Gas turbines are usually considered the hearts of many mechanical systems.

As offshore oil well drilling platforms are usually unattended, an online and remote state monitoring system is very useful in this area, as it can help find abnormality before serious faults occur. However, the sensor data cannot be sent to a center over a ground-based internet connection; the data can only be transmitted via telecommunication satellite, which was too expensive in the past but is now affordable.

The system consists of four subsystems: the data acquisition and local monitoring subsystem (DALM), the data communication subsystem (DAC), the data management subsystem (DMS), and the intelligent diagnosis subsystem (IDS). The first subsystem gathers the outputs from different sensors and checks whether there is any abnormality in the system. The second one packs the acquired data and transmits them to the monitoring center; users in the center can also send a message to this subsystem to ask for some special data if an abnormality or fault occurs. The data management subsystem stores the historical information, as well as fault data and fault cases. A data compression algorithm is embedded in the subsystem: as most of the historical data are useless for the final analysis, they are compressed and removed to save storage space. Finally, the IDS watches the alarm information from the different unit assemblies and starts the corresponding module to analyze the related information. This subsystem gives
Figure 1: Prototype structure of a gas turbine (Titan 130 single-shaft gas turbine for power generation applications): compressor, combustor, turbine, exhaust, and output shaft with gearbox.
some decisions and explains how the decisions have been made. The structure of the system is shown in Figure 2.

One of the webpages of the system is given in Figure 3, where we can see the rose figure of exhaust temperatures; some statistical parameters varying with time are also presented.
3. Abnormality Detection in Exhaust Temperatures Based on Information Entropy
Exhaust temperature is one of the most critical parameters in a gas turbine, as excessive turbine temperatures may lead to life reduction or catastrophic failures. In the current generation of machines, temperatures at the combustor discharge are too high for the type of instrumentation available, so the exhaust temperature is also used as an indicator of turbine inlet temperature.

As the temperature profile out of a gas turbine is not uniform, a number of probes will help pinpoint disturbances or malfunctions in the gas turbine by highlighting shifts in the temperature profile. Thus there are usually a set of thermometers fixed on the exhaust. If the system is operating normally, all the thermometers give similar outputs. However, if a fault occurs in some component of the turbine, different temperatures will be observed. The uniformity of the exhaust temperatures reflects the state of the system, so we should develop an index to measure it. In this work, we consider the entropy function, for it is widely used in measuring the uniformity of random variables. However, to the best of our knowledge, this function has not been used in this domain.
Assume that there are n thermometers and their outputs are T_i, i = 1, ..., n. Then we define the uniformity of these outputs as

E(T) = -\sum_{i=1}^{n} \frac{T_i}{T} \log_2 \frac{T_i}{T},  (1)

where T = \sum_j T_j. As T_i >= 0, we define 0 log 0 = 0.

Obviously, we have log_2 n >= E(T) >= 0, and E(T) = log_2 n if and only if T_1 = T_2 = ... = T_n. In this case all the thermometers produce the same output, so the uniformity of the sensors is maximal. In the other extreme case, if T_1 = ... = T_{i-1} = T_{i+1} = ... = T_n = 0 and T_i = T, then E(T) = 0.
It is notable that the value of the entropy is independent of the absolute values of the thermometer outputs; it depends only on the distribution of the temperatures. The entropy is maximal if all the thermometers output the same values.
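Equation (1), with the 0 log 0 = 0 convention, can be sketched in a few lines of code (the function name and the example values are ours):

```python
import math

def uniformity(temps):
    """Entropy-based uniformity E(T) of thermometer outputs, Eq. (1)."""
    total = sum(temps)
    e = 0.0
    for t in temps:
        p = t / total
        if p > 0:  # convention: 0 * log2(0) = 0
            e -= p * math.log2(p)
    return e
```

Identical readings from n probes give the maximum log_2 n, while a single hot probe with all others at zero gives 0, matching the two extreme cases above.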
Now we show two sets of real exhaust temperatures measured on an oil well drilling platform where 13 thermometers are fixed. In the first set, the gas turbine starts from a time point, runs for several minutes, and finally stops.

Observing the curves in Figure 4, we can see that the 13 thermometers give almost the same outputs at the beginning. In fact, the outputs are the room temperature in this case, as shown in Figure 6(a). Thus the entropy reaches its peak value.

Some typical samples are presented in Figure 6, where the temperature distributions around the exhaust at time points t = 5, 130, 250, 400, and 500 are given. Obviously, the distributions at t = 130, 250, and 400 are not desirable; it can be inferred that some abnormality occurs in the system. The entropy of the temperature distribution is given in Figure 5.
Figure 2: Structure of the remote system of condition monitoring and fault analysis (oil drilling platform side: gas turbine, data acquisition, and local monitoring; satellite data communication through firewalls; diagnosis center side: online database subsystem, intelligent diagnosis subsystem, video conference system, human-system interface, and internet clients).
Figure 3: A typical monitoring webpage of the subsystem.
Figure 4: Exhaust temperatures from a set of thermometers.
Figure 5: Uniformity of the temperatures (red dashed line: ideal case; blue line: real case).
Figure 6: Samples of the temperature distribution at different times (rose maps of the exhaust temperatures at t = 5, 130, 250, 400, and 500).
Another example is given in Figures 7 to 9. In this example, there is a significant difference between the outputs of the 13 thermometers even when the gas turbine is not running, as shown in Figure 9(a). Thus the entropy of the temperature distribution is a little lower than in the ideal case, as shown in Figure 8. Some representative samples are also given in Figure 9.

Considering the above examples, we can see that the entropy function is an effective measure of uniformity. It can be used to reflect the uniformity of the exhaust temperatures. If the uniformity drops below a threshold, some fault has possibly occurred in the gas path of the gas turbine. Thus the entropy function is used as an index of the health of the gas path.
4. Fault Feature Quality Evaluation with Generalized Entropy
The above section gives an approach to detecting abnormality in the exhaust temperature distribution. However, the entropy function cannot distinguish what kind of fault
Figure 7: Exhaust temperatures from another set of thermometers.
Figure 8: Entropy of the temperature distribution (red dashed line: ideal case; blue line: real case).
occurs in the system, although it detects the abnormality. In order to analyze why the temperature distribution is not uniform, we should develop algorithms to recognize the fault.

Before training an intelligent model, we should construct some features and select the most informative subsets to represent different faults. In this section we discuss this issue.
Intuitively, we know that the temperatures of all the thermometers reflect the state of the system. Besides, the temperature difference between neighboring thermometers also indicates the source of faults; this is considered as space-neighboring information. Moreover, the temperature change of a thermometer over time necessarily gives hints for studying the faults; this can be viewed as time-neighboring information. In fact, the inlet temperature T_0 is also an important factor. In summary, we can use the exhaust temperatures and their neighboring information along time and space to recognize different faults. If there are n (n = 13 in our system) thermometers, we can form a feature vector describing the state of the exhaust system as

F = \{T_0, T_1, T_2, \ldots, T_n, T_1 - T_2, T_2 - T_3, \ldots, T_n - T_1, T'_1, T'_2, \ldots, T'_n\},  (2)

where T'_i = T_i(j) - T_i(j-1) and T_i(j) is the temperature at time j of the i-th thermometer.

Apart from the above features, we can also construct other attributes to reflect the conditions of the gas turbine. In this work, we consider a gas turbine with 13 thermometers around the exhaust, so we finally form a 40-attribute vector.
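With n = 13 probes, the 1 + 13 + 13 + 13 = 40 attributes of Eq. (2) can be assembled as follows. This is a sketch under our own naming: `t0` is the inlet temperature, and `prev`/`curr` are two consecutive temperature scans; the circular wrap for the last space difference (T_n - T_1) follows Eq. (2):

```python
def build_features(t0, prev, curr):
    """Form the 40-dimensional feature vector of Eq. (2): inlet
    temperature, n exhaust temperatures, n circular neighbor
    differences (space), and n scan-to-scan differences (time)."""
    n = len(curr)
    space_diff = [curr[i] - curr[(i + 1) % n] for i in range(n)]  # T_i - T_{i+1}, wrapping
    time_diff = [curr[i] - prev[i] for i in range(n)]             # T'_i = T_i(j) - T_i(j-1)
    return [t0] + list(curr) + space_diff + time_diff
```

The layout is then index 0 for T_0, indices 1-13 for the temperatures, 14-26 for the space differences, and 27-39 for the time differences.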
Some questions remain: whether all the extracted features are useful for final modeling, and how we can evaluate the features and find the most informative ones. In fact, there are a number of measures for estimating feature quality, such as dependency in rough set theory [21], consistency [22], mutual information in information theory [23], and classification margin in statistical learning theory [24]. However, all these measures are computed in the original input space, while effective classification techniques usually implement a nonlinear mapping of the original space to a feature space by a kernel function. In this case, we require a new measure to reflect the classification information of the feature space. We now extend the traditional information entropy to measure it.
Given a set of samples U = \{x_1, x_2, \ldots, x_m\}, each sample is described with n features F = \{f_1, f_2, \ldots, f_n\}. As to classification learning, each training sample x_i is associated with a decision y_i. For an arbitrary subset F' \subseteq F and a kernel function K, we can calculate a kernel matrix

K = \begin{bmatrix} k_{11} & \cdots & k_{1m} \\ \vdots & \ddots & \vdots \\ k_{m1} & \cdots & k_{mm} \end{bmatrix},  (3)

where k_{ij} = k(x_i, x_j). The Gaussian function is a representative kernel function:

k_{ij} = \exp\left(-\frac{\|x_i - x_j\|^2}{\sigma}\right).  (4)

A number of kernel functions have the properties (1) k_{ij} \in [0, 1] and (2) k_{ij} = k_{ji}.
The kernel matrix plays a bottleneck role in kernel-based learning [25]; all the information that a classification algorithm can use is hidden in this matrix. At the same time, we can also calculate a decision kernel matrix

D = \begin{bmatrix} d_{11} & \cdots & d_{1m} \\ \vdots & \ddots & \vdots \\ d_{m1} & \cdots & d_{mm} \end{bmatrix},  (5)

where d_{ij} = 1 if y_i = y_j and d_{ij} = 0 otherwise. In fact, the matrix D is a matching kernel.
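Both matrices are straightforward to materialize. A minimal sketch with the Gaussian kernel of Eq. (4) and the matching decision kernel of Eq. (5), assuming samples as plain lists of numbers and sigma = 1 by default (the function names are ours):

```python
import math

def gaussian_kernel_matrix(X, sigma=1.0):
    """K with k_ij = exp(-||x_i - x_j||^2 / sigma), Eq. (4)."""
    def k(a, b):
        return math.exp(-sum((p - q) ** 2 for p, q in zip(a, b)) / sigma)
    return [[k(a, b) for b in X] for a in X]

def decision_matrix(y):
    """D with d_ij = 1 if y_i == y_j else 0, Eq. (5)."""
    return [[1 if yi == yj else 0 for yj in y] for yi in y]
```

Both outputs are symmetric, and the Gaussian entries lie in [0, 1] with ones on the diagonal, matching properties (1) and (2) above.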
Definition 1. Given a set of samples U = \{x_1, x_2, \ldots, x_m\}, each sample described with n features F = \{f_1, f_2, \ldots, f_n\}, let F' \subseteq F and let K be a kernel matrix over U in terms of F'. Then the entropy of F' is defined as

E(K) = -\frac{1}{m} \sum_{i=1}^{m} \log_2 \frac{K_i}{m},  (6)

where K_i = \sum_{j=1}^{m} k_{ij}.

As to the above entropy function, if we use the Gaussian kernel, we have \log_2 m \geq E(K) \geq 0. E(K) = 0 if and only if k_{ij} = 1 for all i, j, and E(K) = \log_2 m if and only if k_{ij} = 0 for i \neq j. E(K) = 0 means that no pair of samples can be
Figure 9: Samples of the temperature distribution at different moments (rose maps of the exhaust temperatures at t = 500, 758, 820, and 1220).
distinguished with the current features, while E(K) = \log_2 m means that every pair of samples differs, so all of them can be distinguished. These are two extreme cases. In real-world applications, part of the samples can be discerned with the available features while others cannot; in this case the entropy function takes a value in the interval [0, \log_2 m].

Moreover, it is easy to show that if K_1 \subseteq K_2, then E(K_1) \geq E(K_2), where K_1 \subseteq K_2 means K_1(x_i, x_j) \leq K_2(x_i, x_j) for all i, j.
Definition 2. Given a set of samples U = \{x_1, x_2, \ldots, x_m\}, each sample described with n features F = \{f_1, f_2, \ldots, f_n\}, let F_1, F_2 \subseteq F, let K_1 and K_2 be the two kernel matrices induced by F_1 and F_2, and let K be the kernel matrix computed with F_1 \cup F_2. Then the joint entropy of F_1 and F_2 is defined as

E(K_1, K_2) = E(K) = -\frac{1}{m} \sum_{i=1}^{m} \log_2 \frac{K_i}{m},  (7)

where K_i = \sum_{j=1}^{m} k_{ij}.

As to the Gaussian function, K(x_i, x_j) = K_1(x_i, x_j) \times K_2(x_i, x_j). Thus K \subseteq K_1 and K \subseteq K_2; in this case E(K) \geq E(K_1) and E(K) \geq E(K_2).
Definition 3. Given a set of samples U = \{x_1, x_2, \ldots, x_m\}, each sample described with n features F = \{f_1, f_2, \ldots, f_n\}, let F_1, F_2 \subseteq F, let K_1 and K_2 be the two kernel matrices induced by F_1 and F_2, and let K be the kernel matrix computed with F_1 \cup F_2. Knowing F_1, the conditional entropy of F_2 is defined as

E(K_2 | K_1) = E(K) - E(K_1).  (8)

As to the Gaussian kernel, E(K) \geq E(K_1) and E(K) \geq E(K_2), so E(K_1 | K_2) \geq 0 and E(K_2 | K_1) \geq 0.
Definition 4. Given a set of samples U = {x_1, x_2, ..., x_m}, each sample is described with n features F = {f_1, f_2, ..., f_n}; F_1, F_2 ⊆ F; K_1 and K_2 are the two kernel matrices induced by F_1 and F_2; K is a new kernel matrix computed with F_1 ∪ F_2. Then the mutual information of K_1 and K_2 is defined as

$$MI(K_1, K_2) = E(K_1) + E(K_2) - E(K). \quad (9)$$

As to the Gaussian kernel, MI(K_1, K_2) = MI(K_2, K_1). If K_1 ⊆ K_2, we have MI(K_1, K_2) = E(K_2); and if K_2 ⊆ K_1, we have MI(K_1, K_2) = E(K_1).
8 The Scientific World Journal
Figure 10: Fuzzy dependency between a single feature and the decision (x-axis: feature index, 0–45; y-axis: dependency, 0–0.4).
Figure 11: Kernelized mutual information between a single feature and the decision (x-axis: feature index, 0–45; y-axis: mutual information, 0–1.4).
Please note that if F_1 ⊆ F_2, we have K_2 ⊆ K_1. However, K_2 ⊆ K_1 does not imply F_1 ⊆ F_2.
Definition 5. Given a set of samples U = {x_1, x_2, ..., x_m}, each sample is described with n features F = {f_1, f_2, ..., f_n}; F' ⊆ F; K is a kernel matrix over U in terms of F', and D is the kernel matrix computed with the decision. Then the significance of F' with respect to the decision is defined as

$$MI(K, D) = E(K) + E(D) - E(K, D). \quad (10)$$

MI(K, D) measures the importance of the feature subset F' in the kernel space for distinguishing different classes. It can be understood as a kernelized version of the Shannon mutual information that is widely used in feature evaluation and selection. In fact, it is easy to derive the equivalence between this entropy function and Shannon entropy when the attributes are discrete and the matching kernel is used.
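As a sketch of Definition 5 (our own illustration, not code from the paper; in line with the Gaussian product remark above, it assumes the joint term E(K, D) is computed from the elementwise product of K and D), the significance MI(K, D) can be evaluated as follows:

```python
import numpy as np

def kernel_entropy(K):
    """E(K) of Eq. (6): -(1/m) * mean of log2(K_i / m)."""
    m = K.shape[0]
    return -np.mean(np.log2(K.sum(axis=1) / m))

def feature_significance(X_sub, y, sigma=1.0):
    """MI(K, D) = E(K) + E(D) - E(K, D) of Eq. (10): K is the Gaussian kernel
    over the candidate feature subset, D the matching kernel of the decision."""
    sq = ((X_sub[:, None, :] - X_sub[None, :, :]) ** 2).sum(axis=2)
    K = np.exp(-sq / sigma)
    y = np.asarray(y)
    D = (y[:, None] == y[None, :]).astype(float)
    return kernel_entropy(K) + kernel_entropy(D) - kernel_entropy(K * D)

# Toy check: a feature aligned with the classes scores higher than pure noise
rng = np.random.default_rng(1)
y = np.array([0] * 30 + [1] * 30)
informative = (y + 0.05 * rng.standard_normal(60)).reshape(-1, 1)
noise = rng.random((60, 1))
print(feature_significance(informative, y) > feature_significance(noise, y))  # True
```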
Now we show an example in gas turbine fault diagnosis. We collected 3581 samples from two sets of gas turbine systems; 1440 samples are healthy and the others belong to four kinds of faults: load rejection, sensor fault, fuel switching, and salt spray corrosion, with 45, 588, 71, and 1437 samples, respectively. Thirteen thermometers are installed in the exhaust, and according to the approach described above we form a 40-dimensional vector to represent the state of the exhaust.
Figure 12: Fuzzy dependency between the selected features and the decision, rising from 0.3475 (one feature) through 0.9006 and 0.9969 to 0.9996 (four features). Features 5, 37, 2, and 3 are selected sequentially.
Figure 13: Kernelized mutual information between the selected features and the decision, rising from 1.267 (one feature) through 1.510 and 1.577 to 1.612 (four features). Features 39, 31, 38, and 40 are selected sequentially.
Obviously, the classification task is not understandable in such a high-dimensional space. Moreover, some features may be redundant for classification learning, which may confuse the learning algorithm and reduce modeling performance. Selecting a necessary and sufficient feature subset is therefore a key preprocessing step.
Here we compare the fuzzy rough set based feature evaluation algorithm with the proposed kernelized mutual information. Fuzzy dependency has been widely discussed and applied in feature selection and attribute reduction in recent years [26–28]. Fuzzy dependency can be understood as the average distance from the samples to their nearest neighbors belonging to different classes, while the kernelized mutual information reflects the relevance between the features and the decision in the kernel space.
Comparing Figures 10 and 11, a significant difference is observed. As to fuzzy rough sets, Feature 5 produces the largest dependency, followed by Feature 38; however, Feature 39 gets the largest mutual information, with Feature 2 second. Thus different feature evaluation functions lead to completely different results.
Figures 10 and 11 present the significance of single features. In applications we should combine a set of features, so we now consider a greedy search strategy: starting from an empty set, the best features are added one by one. In
Figure 14: Scatter plots in the 2D spaces expanded with the feature pairs selected by fuzzy dependency (classes 1–5; all axes normalized to [0, 1]).
each round we select the feature which produces the largest significance increment with respect to the selected subset. Both fuzzy dependency and kernelized mutual information increase monotonically as new attributes are added; if the selected features are sufficient for classification, the two functions remain invariant when any new attribute is added. So we can stop the algorithm when the increment of significance is less than a given threshold. The significance values of the selected feature subsets are shown in Figures 12 and 13, respectively.
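The greedy forward search described above can be sketched as follows (a hypothetical implementation with our own toy data; `significance` is the kernelized mutual information of Definition 5 under a Gaussian kernel):

```python
import numpy as np

def kernel_entropy(K):
    """E(K) of Eq. (6)."""
    m = K.shape[0]
    return -np.mean(np.log2(K.sum(axis=1) / m))

def significance(X, feats, y, sigma=1.0):
    """MI between the kernel induced by the chosen features and the decision."""
    sub = X[:, feats]
    sq = ((sub[:, None, :] - sub[None, :, :]) ** 2).sum(axis=2)
    K = np.exp(-sq / sigma)
    D = (y[:, None] == y[None, :]).astype(float)
    return kernel_entropy(K) + kernel_entropy(D) - kernel_entropy(K * D)

def greedy_select(X, y, threshold=1e-3):
    """Forward selection: in each round add the feature with the largest
    significance increment; stop when the increment falls below threshold."""
    selected, best = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        value, f = max((significance(X, selected + [f], y), f) for f in remaining)
        if value - best < threshold:
            break
        selected.append(f)
        remaining.remove(f)
        best = value
    return selected

# Toy data: feature 3 carries the class information, the rest are noise
rng = np.random.default_rng(2)
y = np.array([0] * 25 + [1] * 25)
X = rng.random((50, 5))
X[:, 3] = y + 0.05 * rng.standard_normal(50)
print(greedy_select(X, y)[0])  # feature 3 is picked first
```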
In order to show the effectiveness of the algorithm, we give scatter plots in 2D spaces, as shown in Figures 14 to 16, which are expanded by the feature pairs selected by fuzzy dependency, kernelized mutual information, and Shannon mutual information. As to fuzzy dependency, we select Features 5, 37, 2, and 3; there are then 4 × 4 = 16 combinations of feature pairs. The subplot in the i-th row and j-th column of Figure 14 gives the scatter of samples in the 2D space expanded by the i-th and the j-th selected features.
Observing the second subplot in the first row of Figure 14, we can find that the classification task is nonlinear. The first class is dispersed, and the third class is also located in different regions, which makes learning classification models difficult.
However, in the corresponding subplot of Figure 15, we can see that each class is relatively compact, which leads to a small intraclass distance. Moreover, the samples of the five classes can be separated with linear models, which also brings benefit for learning a simple classification model.
Comparing Figures 15 and 16, we can find that in the feature spaces selected by Shannon mutual information different classes overlap or get entangled, which leads to poor classification performance.
5. Diagnosis Modeling with Information Entropy Based Decision Tree Algorithm
After selecting the informative features, we now turn to classification modeling. There are a great number of learning algorithms for building a classification model. Generalization capability and interpretability are the two most important criteria for evaluating an algorithm. As to fault diagnosis, a domain expert usually accepts a model which is consistent
Figure 15: Scatter plots in the 2D spaces expanded with the feature pairs selected by kernelized mutual information (classes 1–5; all axes normalized to [0, 1]).
with his common knowledge. Thus he expects the model to be understandable; otherwise he will not trust its outputs. In addition, if the model is understandable, a domain expert can adapt it according to his prior knowledge, which makes the model suitable for different diagnosis objects.
Decision tree algorithms, including CART [29], ID3 [17], and C4.5 [18], are techniques for training an understandable classification model; the learned model can be transformed into a set of rules. All these algorithms build a decision tree from training samples: they start from a root node and select one of the features, with cuts, to divide the samples into different branches according to their feature values. This procedure is iteratively conducted until a branch is pure or a stopping criterion is satisfied. The key difference lies in the evaluation function used to select attributes or cuts: CART adopts the GINI and Twoing splitting rules, ID3 uses information gain, and C4.5 takes the information gain ratio. Moreover, C4.5 can deal with numerical attributes, unlike ID3. Competent performance is usually observed with C4.5 in real-world applications compared with some popular algorithms, including SVM and Bayesian networks. In this work we introduce C4.5 to train the classification models. The pseudocode of C4.5 is formulated as follows.
Decision tree algorithm C4.5
Input: a set of training samples U = {x_1, x_2, ..., x_m} with features F = {f_1, f_2, ..., f_n}; stopping criterion τ.
Output: decision tree T.
(1) Check the sample set.
(2) For each attribute f, compute the normalized information gain ratio from splitting on f.
(3) Let f_best be the attribute with the highest normalized information gain ratio.
(4) Create a decision node that splits on f_best.
(5) Recurse on the sublists obtained by splitting on f_best, and add those nodes as children of the node, until the stopping criterion τ is satisfied.
(6) Output T.
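Step (2), the normalized information gain ratio, can be illustrated for a binary split on a numerical attribute (a minimal sketch of ours; the full C4.5 algorithm additionally searches over candidate cut points and handles multiway and missing-value splits):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a class label list, in bits."""
    m = len(labels)
    return -sum(c / m * math.log2(c / m) for c in Counter(labels).values())

def gain_ratio(values, labels, cut):
    """Normalized information gain ratio of a binary split of a numerical
    attribute at threshold `cut` (the criterion C4.5 uses in step (2))."""
    left = [l for v, l in zip(values, labels) if v <= cut]
    right = [l for v, l in zip(values, labels) if v > cut]
    if not left or not right:
        return 0.0  # degenerate split carries no information
    m = len(labels)
    cond = len(left) / m * entropy(left) + len(right) / m * entropy(right)
    gain = entropy(labels) - cond
    split_info = entropy(['L'] * len(left) + ['R'] * len(right))
    return gain / split_info if split_info > 0 else 0.0

# A cut that separates the two classes perfectly gets ratio 1.0
values = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
labels = [1, 1, 1, 2, 2, 2]
print(gain_ratio(values, labels, 0.5))  # 1.0
```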
Figure 16: Scatter plots in the 2D spaces expanded with the feature pairs selected by Shannon mutual information (classes 1–5; all axes normalized to [0, 1]).
Figure 17: Decision tree trained on the features selected with fuzzy rough sets (splits on F2, F37, and F3).
We input the datasets into C4.5 and build the following two decision trees: Features 5, 37, 2, and 3 are included in the first dataset, and Features 39, 31, 38, and 40 are selected in the second. The two trees are given in Figures 17 and 18, respectively.
Figure 18: Decision tree trained on the features selected with kernelized mutual information (splits on F39, F40, and F38).
Following a branch from the root node to a leaf node, a rule is extracted from the tree. As to the first tree, we get five decision rules:
(1) if F2 > 0.50 and F37 > 0.49, then the decision is Class 4;
(2) if F2 > 0.50 and F37 ≤ 0.49, then the decision is Class 1;
(3) if 0.18 < F2 ≤ 0.50 and F3 > 0.41, then the decision is Class 5;
(4) if 0.18 < F2 ≤ 0.50 and F3 ≤ 0.41, then the decision is Class 3;
(5) if F2 ≤ 0.18, then the decision is Class 2.
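Such rules are directly executable. A minimal sketch encoding the five rules above (the function name and the test values are ours, not from the paper):

```python
def classify_first_tree(F2, F37, F3):
    """The five rules extracted from the tree in Figure 17."""
    if F2 > 0.50:
        return 4 if F37 > 0.49 else 1   # rules (1) and (2)
    if F2 > 0.18:
        return 5 if F3 > 0.41 else 3    # rules (3) and (4)
    return 2                            # rule (5)

print(classify_first_tree(F2=0.6, F37=0.3, F3=0.5))  # Class 1
print(classify_first_tree(F2=0.1, F37=0.9, F3=0.9))  # Class 2
```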
As to the second decision tree, we also obtain five rules:

(1) if F39 > 0.45 and F38 > 0.80, then the decision is Class 4;
(2) if F39 > 0.45 and F38 ≤ 0.80, then the decision is Class 1;
(3) if 0.17 < F39 ≤ 0.45, then the decision is Class 2;
(4) if F39 ≤ 0.17 and F40 > 0.42, then the decision is Class 5;
(5) if F39 ≤ 0.17 and F40 ≤ 0.42, then the decision is Class 3.
We can see that the derived decision trees are rather simple, and five rules can be extracted from each. It is very easy for domain experts to understand and even revise these rules. As the classification task is relatively simple, the accuracy of each model reaches 97%. As new samples and faults are recorded by the system, more and more complex tasks may be stored; in that case the models may become more and more complex.
6. Conclusions and Future Work
Automatic fault detection and diagnosis are highly desirable in some industries, such as offshore oil well drilling platforms, because such systems are self-monitoring without man on duty. In this work we design an intelligent abnormality detection and fault recognition technique for the exhaust system of gas turbines based on information entropy, which is used for measuring the uniformity of exhaust temperatures, evaluating the significance of features in kernel spaces, and selecting splitting nodes for constructing decision trees. The main contributions of the work are twofold. First, we introduce the entropy function to measure the uniformity of exhaust temperatures; the measurement is easy to compute and understand, and numerical experiments show its effectiveness. Second, we extend Shannon entropy to evaluate the significance of attributes in kernelized feature spaces, computing the relevance between a kernel matrix induced by a set of attributes and the matrix computed with the decision variable. Numerical experiments are also presented, and good results are derived.
Although this work gives an effective framework for automatic fault detection and recognition, the proposed technique has not yet been tested on large-scale real tasks. We have developed a remote state monitoring and fault diagnosis system, and large-scale data are flooding into the center. In the future we will improve these techniques and develop a reliable diagnosis system.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgment
This work is partially supported by the National Natural Science Foundation of China under Grants 61222210 and 61105054.
References
[1] C. Wang, L. Xu, and W. Peng, "Conceptual design of remote monitoring and fault diagnosis systems," Information Systems, vol. 32, no. 7, pp. 996–1004, 2007.
[2] W. Donat, K. Choi, W. An, S. Singh, and K. Pattipati, "Data visualization, data reduction and classifier fusion for intelligent fault diagnosis in gas turbine engines," Journal of Engineering for Gas Turbines and Power, vol. 130, no. 4, Article ID 041602, 2008.
[3] Y. G. Li and P. Nilkitsaranont, "Gas turbine performance prognostic for condition-based maintenance," Applied Energy, vol. 86, no. 10, pp. 2152–2161, 2009.
[4] H. Bassily, R. Lund, and J. Wagner, "Fault detection in multivariate signals with applications to gas turbines," IEEE Transactions on Signal Processing, vol. 57, no. 3, pp. 835–842, 2009.
[5] K. Young, D. Lee, V. Vitali, and Y. Ming, "A fault diagnosis method for industrial gas turbines using Bayesian data analysis," Journal of Engineering for Gas Turbines and Power, vol. 132, Article ID 041602, 2010.
[6] B. Yu, D. Liu, and T. Zhang, "Fault diagnosis for micro-gas turbine engine sensors via wavelet entropy," Sensors, vol. 11, no. 10, pp. 9928–9941, 2011.
[7] S. Wu, P. Wu, C. Wu, J. Ding, and C. Wang, "Bearing fault diagnosis based on multiscale permutation entropy and support vector machine," Entropy, vol. 14, no. 8, pp. 1343–1356, 2012.
[8] S. Wu, C. Wu, T. Wu, and C. Wang, "Multi-scale analysis based ball bearing defect diagnostics using Mahalanobis distance and support vector machine," Entropy, vol. 15, no. 2, pp. 416–433, 2013.
[9] H. A. Nozari, M. A. Shoorehdeli, S. Simani, and H. D. Banadaki, "Model-based robust fault detection and isolation of an industrial gas turbine prototype using soft computing techniques," Neurocomputing, vol. 91, pp. 29–47, 2012.
[10] S. Sarkar, X. Jin, and A. Ray, "Data-driven fault detection in aircraft engines with noisy sensor measurements," Journal of Engineering for Gas Turbines and Power, vol. 133, no. 8, Article ID 081602, 10 pages, 2011.
[11] C. E. Shannon, "A mathematical theory of communication," The Bell System Technical Journal, vol. 27, pp. 379–423, 1948.
[12] S. M. Pincus, "Approximate entropy as a measure of system complexity," Proceedings of the National Academy of Sciences of the United States of America, vol. 88, no. 6, pp. 2297–2301, 1991.
[13] L. I. Kuncheva and C. J. Whitaker, "Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy," Machine Learning, vol. 51, no. 2, pp. 181–207, 2003.
[14] I. Csiszar, "Axiomatic characterizations of information measures," Entropy, vol. 10, no. 3, pp. 261–273, 2008.
[15] M. Zanin, L. Zunino, O. A. Rosso, and D. Papo, "Permutation entropy and its main biomedical and econophysics applications: a review," Entropy, vol. 14, no. 8, pp. 1553–1577, 2012.
[16] C. Wang and H. Shen, "Information theory in scientific visualization," Entropy, vol. 13, pp. 254–273, 2011.
[17] J. R. Quinlan, "Induction of decision trees," Machine Learning, vol. 1, no. 1, pp. 81–106, 1986.
[18] J. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, 1993.
[19] D. N. Reshef, Y. A. Reshef, H. K. Finucane et al., "Detecting novel associations in large data sets," Science, vol. 334, no. 6062, pp. 1518–1524, 2011.
[20] C. J. C. Burges, "A tutorial on support vector machines for pattern recognition," Data Mining and Knowledge Discovery, vol. 2, no. 2, pp. 121–167, 1998.
[21] Q. Hu, D. Yu, W. Pedrycz, and D. Chen, "Kernelized fuzzy rough sets and their applications," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 11, pp. 1649–1667, 2011.
[22] Q. Hu, H. Zhao, Z. Xie, and D. Yu, "Consistency based attribute reduction," in Advances in Knowledge Discovery and Data Mining, Z.-H. Zhou, H. Li, and Q. Yang, Eds., vol. 4426 of Lecture Notes in Computer Science, pp. 96–107, Springer, Berlin, Germany, 2007.
[23] Q. Hu, D. Yu, and Z. Xie, "Information-preserving hybrid data reduction based on fuzzy-rough techniques," Pattern Recognition Letters, vol. 27, no. 5, pp. 414–423, 2006.
[24] R. Gilad-Bachrach, A. Navot, and N. Tishby, "Margin based feature selection: theory and algorithms," in Proceedings of the 21st International Conference on Machine Learning (ICML '04), pp. 337–344, July 2004.
[25] J. Shawe-Taylor and N. Cristianini, Kernel Methods for Pattern Analysis, Cambridge University Press, 2004.
[26] Q. Hu, Z. Xie, and D. Yu, "Hybrid attribute reduction based on a novel fuzzy-rough model and information granulation," Pattern Recognition, vol. 40, no. 12, pp. 3509–3521, 2007.
[27] R. Jensen and Q. Shen, "New approaches to fuzzy-rough feature selection," IEEE Transactions on Fuzzy Systems, vol. 17, no. 4, pp. 824–838, 2009.
[28] S. Zhao, E. C. C. Tsang, and D. Chen, "The model of fuzzy variable precision rough sets," IEEE Transactions on Fuzzy Systems, vol. 17, no. 2, pp. 451–467, 2009.
[29] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone, Classification and Regression Trees, Wadsworth & Brooks/Cole Advanced Books & Software, Monterey, Calif, USA, 1984.
Figure 1: Prototype structure of a gas turbine (Titan 130, a single-shaft gas turbine for power generation applications), with compressor, combustor, turbine, exhaust, and output shaft and gearbox.
some decision and explains how the decision has been made. The structure of the system is shown in Figure 2.
One of the webpages of the system is given in Figure 3, where we can see the rose figure of the exhaust temperatures; some statistical parameters varying with time are also presented.
3. Abnormality Detection in Exhaust Temperatures Based on Information Entropy
Exhaust temperature is one of the most critical parameters in a gas turbine, as excessive turbine temperatures may lead to life reduction or catastrophic failures. In the current generation of machines, temperatures at the combustor discharge are too high for the available types of instrumentation, so the exhaust temperature is also used as an indicator of the turbine inlet temperature.
As the temperature profile out of a gas turbine is not uniform, a number of probes help pinpoint disturbances or malfunctions in the gas turbine by highlighting shifts in the temperature profile. Thus a set of thermometers is usually fixed on the exhaust. If the system is operating normally, all the thermometers give similar outputs; however, if a fault occurs in some component of the turbine, different temperatures will be observed. The uniformity of the exhaust temperatures therefore reflects the state of the system, and we should develop an index to measure it. In this work we consider the entropy function, for it is widely used in measuring the uniformity of random variables; to the best of our knowledge, however, this function has not been used in this domain.
Assume that there are n thermometers and their outputs are T_i, i = 1, ..., n. Then we define the uniformity of these outputs as

$$E(T) = -\sum_{i=1}^{n}\frac{T_i}{T}\log_2\frac{T_i}{T}, \quad (1)$$

where $T = \sum_j T_j$. As T_i ≥ 0, we define 0 log 0 = 0. Obviously, log_2 n ≥ E(T) ≥ 0. E(T) = log_2 n if and only if T_1 = T_2 = ... = T_n; in this case all the thermometers produce the same output, so the uniformity of the sensors is maximal. In the other extreme case, if T_1 = ... = T_{i-1} = T_{i+1} = ... = T_n = 0 and T_i = T, then E(T) = 0.
It is notable that the value of the entropy is independent of the absolute values of the thermometer readings; it depends only on the distribution of the temperatures. The entropy is maximal when all the thermometers output the same value.
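A minimal sketch of the uniformity index of Eq. (1) (the function name and the temperature values are our own illustration; n = 13 as in our system):

```python
import math

def uniformity_entropy(temps):
    """Entropy-based uniformity index of exhaust temperatures, Eq. (1).

    temps: readings T_1..T_n of the exhaust thermometers (non-negative).
    Returns a value in [0, log2(n)], maximal when all thermometers
    report the same temperature.
    """
    total = sum(temps)
    e = 0.0
    for t in temps:
        if t > 0:  # by convention 0 * log 0 = 0
            p = t / total
            e -= p * math.log2(p)
    return e

# Uniform readings from 13 thermometers reach the maximal entropy log2(13)
uniform = [550.0] * 13
print(round(uniformity_entropy(uniform), 4))  # 3.7004, i.e. log2(13)

# A disturbed profile lowers the index
disturbed = [550.0] * 12 + [300.0]
print(uniformity_entropy(disturbed) < math.log2(13))  # True
```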
Now we show two sets of real exhaust temperatures measured on an oil well drilling platform where 13 thermometers are fixed. In the first set, the gas turbine starts from a time point, runs for several minutes, and finally stops.
Observing the curves in Figure 4, we can see that the 13 thermometers give almost the same outputs at the beginning; in fact, the outputs equal the room temperature in this case, as shown in Figure 6(a). Thus the entropy reaches its peak value.
Some typical samples are presented in Figure 6, where the temperature distributions around the exhaust at time points t = 5, 130, 250, 400, and 500 are given. Obviously, the distributions at t = 130, 250, and 400 are not desirable, from which it can be inferred that some abnormality has occurred in the system. The entropy of the temperature distribution is given in Figure 5.
Figure 2: Structure of the remote system of condition monitoring and fault analysis. On the oil drilling platforms, local monitoring and data acquisition for the gas turbine sit behind inner and outer firewalls and send data over a satellite link; at the diagnosis center, an online database subsystem, an intelligent diagnosis subsystem, a human-system interface, and a video conference system serve clients over the Internet.
Figure 3 A typical webpage for monitoring of the subsystem
Figure 4: Exhaust temperatures from a set of thermometers (temperature versus time).
Figure 5: Uniformity of the temperatures (red dashed line: the ideal case; blue line: the real case).
(a) Rose map of exhaust temperatures at t = 5. (b) Rose map at t = 130. (c) Rose map at t = 250. (d) Rose map at t = 400. (e) Rose map at t = 500.
Figure 6: Samples of temperature distribution at different times.
Another example is given in Figures 7 to 9. In this example there is a significant difference between the outputs of the 13 thermometers even when the gas turbine is not running, as shown in Figure 9(a). Thus the entropy of the temperature distribution is a little lower than in the ideal case, as shown in Figure 8. Some representative samples are also given in Figure 9.
Considering the above examples, we can see that the entropy function is an effective measurement of uniformity and can be used to reflect the uniformity of the exhaust temperatures. If the uniformity is less than a threshold, a fault has possibly occurred in the gas path of the gas turbine. Thus the entropy function is used as an index of the health of the gas path.
4. Fault Feature Quality Evaluation with Generalized Entropy
The above section gives an approach to detecting abnormality in the exhaust temperature distribution. However, the entropy function cannot distinguish what kind of fault
Figure 7: Exhaust temperatures from another set of thermometers (temperature versus time).
Figure 8: Entropy of the temperature distribution (red dashed line: the ideal case; blue line: the real case).
occurs in the system, although it detects the abnormality. In order to analyze why the temperature distribution is not uniform, we should develop algorithms to recognize the fault.
Before training an intelligent model, we should construct some features and select the most informative subsets to represent the different faults. In this section we discuss this issue.
Intuitively, the temperatures of all the thermometers reflect the state of the system. Besides, the temperature differences between neighboring thermometers also indicate the source of faults; these are considered as spatial neighboring information. Moreover, the temperature change of a thermometer over time necessarily gives hints for studying the faults; this can be viewed as temporal neighboring information. In fact, the inlet temperature T_0 is also an important factor. In summary, we can use the exhaust temperatures and their neighboring information along time and space to recognize different faults. If there are n (n = 13 in our system) thermometers, we can form a feature vector describing the state of the exhaust system as

$$F = \{T_0, T_1, T_2, \ldots, T_n, T_1 - T_2, T_2 - T_3, \ldots, T_n - T_1, T'_1, T'_2, \ldots, T'_n\}, \quad (2)$$

where T'_i = T_i(j) − T_i(j − 1) and T_i(j) is the temperature of the i-th thermometer at time j.

Apart from the above features, we can also construct other attributes to reflect the condition of the gas turbine. In this work we consider a gas turbine with 13 thermometers around the exhaust, so we finally form a 40-attribute vector.
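A sketch of the construction of Eq. (2) for n = 13 (function name and sample readings are ours; the 40 attributes are the inlet temperature, the 13 exhaust temperatures, the 13 differences between neighboring probes, and the 13 changes between consecutive sampling times):

```python
def exhaust_feature_vector(T0, T_now, T_prev):
    """Build the 40-dimensional state vector of Eq. (2) for n = 13
    thermometers: inlet temperature T0, the exhaust temperatures,
    neighboring differences (space), and per-probe changes since the
    previous sampling time (time)."""
    n = len(T_now)
    space = [T_now[i] - T_now[(i + 1) % n] for i in range(n)]  # T1-T2, ..., Tn-T1
    time = [T_now[i] - T_prev[i] for i in range(n)]            # T'_i = T_i(j) - T_i(j-1)
    return [T0] + list(T_now) + space + time

T_prev = [540.0] * 13
T_now = [542.0, 541.0, 543.0, 540.0, 542.0, 541.0, 539.0, 544.0,
         542.0, 540.0, 541.0, 543.0, 542.0]
features = exhaust_feature_vector(25.0, T_now, T_prev)
print(len(features))  # 40 = 1 + 13 + 13 + 13
```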
Two questions arise: whether all the extracted features are useful for the final modeling, and how we can evaluate the features and find the most informative ones. In fact, there are a number of measures for estimating feature quality, such as dependency in rough set theory [21], consistency [22], mutual information in information theory [23], and classification margin in statistical learning theory [24]. However, all these measures are computed in the original input space, while effective classification techniques usually implement a nonlinear mapping from the original space to a feature space through a kernel function. In this case we require a new measure reflecting the classification information of the kernel feature space, so we now extend the traditional information entropy to measure it.
Given a set of samples U = {x_1, x_2, ..., x_m}, each sample is described with n features F = {f_1, f_2, ..., f_n}. As to classification learning, each training sample x_i is associated with a decision y_i. As to an arbitrary subset F' ⊆ F and a kernel function K, we can calculate a kernel matrix

$$K = \begin{bmatrix} k_{11} & \cdots & k_{1m} \\ \vdots & \ddots & \vdots \\ k_{m1} & \cdots & k_{mm} \end{bmatrix}, \quad (3)$$

where k_{ij} = k(x_i, x_j). The Gaussian function is a representative kernel function:

$$k_{ij} = \exp\left(-\frac{\|x_i - x_j\|^2}{\sigma}\right). \quad (4)$$
A number of kernel functions have the properties (1) k_{ij} ∈ [0, 1] and (2) k_{ij} = k_{ji}. The kernel matrix plays a bottleneck role in kernel-based learning [25]: all the information that a classification algorithm can use is hidden in this matrix. At the same time, we can also calculate a decision kernel matrix as
$$D = \begin{bmatrix} d_{11} & \cdots & d_{1m} \\ \vdots & \ddots & \vdots \\ d_{m1} & \cdots & d_{mm} \end{bmatrix}, \quad (5)$$

where d_{ij} = 1 if y_i = y_j and d_{ij} = 0 otherwise. In fact, the matrix D is a matching kernel.
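Equations (3)–(5) can be sketched as follows (our own illustration with hypothetical data: the Gaussian kernel of Eq. (4) and the matching kernel of Eq. (5)):

```python
import numpy as np

def gaussian_kernel_matrix(X, sigma=1.0):
    """Kernel matrix of Eq. (3) with the Gaussian function of Eq. (4):
    k_ij = exp(-||x_i - x_j||^2 / sigma)."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq / sigma)

def matching_kernel(y):
    """Decision kernel matrix D of Eq. (5): d_ij = 1 iff y_i == y_j."""
    y = np.asarray(y)
    return (y[:, None] == y[None, :]).astype(float)

X = np.array([[0.1, 0.2], [0.1, 0.25], [0.9, 0.8]])
K = gaussian_kernel_matrix(X, sigma=0.5)
D = matching_kernel([1, 1, 2])

# K is symmetric with k_ii = 1 and entries in (0, 1]; D matches classes
print(np.allclose(K, K.T), np.all(np.diag(K) == 1.0))  # True True
print(D.astype(int))
```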
Definition 1. Given a set of samples U = {x_1, x_2, ..., x_m}, each sample is described with n features F = {f_1, f_2, ..., f_n}; F' ⊆ F; K is a kernel matrix over U in terms of F'. Then the entropy of F' is defined as

$$E(K) = -\frac{1}{m}\sum_{i=1}^{m}\log_2\frac{K_i}{m}, \quad (6)$$

where $K_i = \sum_{j=1}^{m} k_{ij}$.

As to the above entropy function, if we use the Gaussian function as the kernel, we have log_2 m ≥ E(K) ≥ 0: E(K) = 0 if and only if k_{ij} = 1 for all i, j, and E(K) = log_2 m if and only if k_{ij} = 0 for i ≠ j. E(K) = 0 means that any pair of samples cannot be
distinguished with the current features, while E(K) = log_2 m means that every pair of samples is different and can therefore be distinguished. These are two extreme cases. In real-world applications, part of the samples can be discerned with the available features while others cannot; in this case, the entropy function takes a value in the interval [0, log_2 m].

Moreover, it is easy to show that if K_1 ⊆ K_2, then E(K_1) ≥ E(K_2), where K_1 ⊆ K_2 means K_1(x_i, x_j) ≤ K_2(x_i, x_j) for all i, j.
Definition 2. Given a set of samples U = {x_1, x_2, ..., x_m}, where each sample is described with n features F = {f_1, f_2, ..., f_n}, let F_1, F_2 ⊆ F, let K_1 and K_2 be the two kernel matrices induced by F_1 and F_2, and let K be a new kernel matrix computed with F_1 ∪ F_2. Then the joint entropy of F_1 and F_2 is defined as

E(K_1, K_2) = E(K) = -(1/m) Σ_{i=1}^{m} log_2 (K_i / m),    (7)

where K_i = Σ_{j=1}^{m} k_ij.
As to the Gaussian function, K(x_i, x_j) = K_1(x_i, x_j) × K_2(x_i, x_j); thus K ⊆ K_1 and K ⊆ K_2. In this case, E(K) ≥ E(K_1) and E(K) ≥ E(K_2).
Definition 3. Given a set of samples U = {x_1, x_2, ..., x_m}, where each sample is described with n features F = {f_1, f_2, ..., f_n}, let F_1, F_2 ⊆ F, let K_1 and K_2 be the two kernel matrices induced by F_1 and F_2, and let K be a new kernel matrix computed with F_1 ∪ F_2. Knowing F_1, the conditional entropy of F_2 is defined as

E(K_2 | K_1) = E(K) - E(K_1).    (8)

As to the Gaussian kernel, E(K) ≥ E(K_1) and E(K) ≥ E(K_2), so E(K_1 | K_2) ≥ 0 and E(K_2 | K_1) ≥ 0.
Definition 4. Given a set of samples U = {x_1, x_2, ..., x_m}, where each sample is described with n features F = {f_1, f_2, ..., f_n}, let F_1, F_2 ⊆ F, let K_1 and K_2 be the two kernel matrices induced by F_1 and F_2, and let K be a new kernel matrix computed with F_1 ∪ F_2. Then the mutual information of K_1 and K_2 is defined as

MI(K_1, K_2) = E(K_1) + E(K_2) - E(K).    (9)

As to the Gaussian kernel, MI(K_1, K_2) = MI(K_2, K_1). If K_1 ⊆ K_2, we have MI(K_1, K_2) = E(K_2), and if K_2 ⊆ K_1, we have MI(K_1, K_2) = E(K_1).
Figure 10: Fuzzy dependency between each single feature and the decision.
Figure 11: Kernelized mutual information between each single feature and the decision.
Please note that if F_1 ⊆ F_2, we have K_2 ⊆ K_1. However, K_2 ⊆ K_1 does not imply F_1 ⊆ F_2.
Definition 5. Given a set of samples U = {x_1, x_2, ..., x_m}, where each sample is described with n features F = {f_1, f_2, ..., f_n}, let F' ⊆ F, let K be a kernel matrix over U in terms of F', and let D be the kernel matrix computed with the decision. Then the significance of F' related to the decision is defined as

MI(K, D) = E(K) + E(D) - E(K, D).    (10)
MI(K, D) measures the importance of the feature subset F' in the kernel space for distinguishing different classes. It can be understood as a kernelized version of Shannon mutual information, which is widely used in feature evaluation and selection. In fact, it is easy to derive the equivalence between this entropy function and Shannon entropy when the attributes are discrete and the matching kernel is used.
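As a sketch of how (10) can be evaluated (our illustration, with one assumption: mirroring the Gaussian case in Definition 2, we take the joint kernel behind E(K, D) to be the elementwise product K ∘ D):

```python
import numpy as np

def gaussian_kernel_matrix(X, sigma=1.0):
    sq = np.sum(X ** 2, axis=1)
    return np.exp(-np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0) / sigma)

def kernel_entropy(K):
    m = K.shape[0]
    return -np.mean(np.log2(K.sum(axis=1) / m))

def matching_kernel(y):
    """Decision kernel D of equation (5): d_ij = 1 iff y_i = y_j."""
    y = np.asarray(y)
    return (y[:, None] == y[None, :]).astype(float)

def significance(K, D):
    """MI(K, D) = E(K) + E(D) - E(K, D); the joint kernel is assumed to be
    the elementwise product K * D, by analogy with Definition 2."""
    return kernel_entropy(K) + kernel_entropy(D) - kernel_entropy(K * D)

# Toy check on synthetic data (not the paper's): a feature that separates
# the two classes should score higher than pure noise.
rng = np.random.default_rng(0)
y = np.array([0] * 20 + [1] * 20)
D = matching_kernel(y)
informative = np.r_[rng.normal(0, 0.1, 20), rng.normal(3, 0.1, 20)].reshape(-1, 1)
noise = rng.normal(0, 1, 40).reshape(-1, 1)
hi = significance(gaussian_kernel_matrix(informative), D)
lo = significance(gaussian_kernel_matrix(noise), D)
print(hi > lo)  # True
```

Note that for the 0/1 matching kernel D ∘ D = D, so significance(D, D) reduces to E(D), which is consistent with the equivalence to Shannon entropy mentioned above.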
Now we show an example in gas turbine fault diagnosis. We collected 3581 samples from two sets of gas turbine systems; 1440 samples are healthy, and the others belong to four kinds of faults: load rejection, sensor fault, fuel switching, and salt spray corrosion, with 45, 588, 71, and 1437 samples, respectively. Thirteen thermometers are installed in the exhaust. According to the approach described above, we form a 40-dimensional vector to represent the state of the exhaust.
Figure 12: Fuzzy dependency between the selected features and the decision (Features 5, 37, 2, and 3 are selected sequentially).
Figure 13: Kernelized mutual information between the selected features and the decision (Features 39, 31, 38, and 40 are selected sequentially).
Obviously, the classification task is hard to interpret in such a high-dimensional space. Moreover, some features may be redundant for classification learning, which may confuse the learning algorithm and reduce modeling performance. Selecting a necessary and sufficient feature subset is therefore a key preprocessing step.
Here we compare the fuzzy rough set based feature evaluation algorithm with the proposed kernelized mutual information. Fuzzy dependency has been widely discussed and applied in feature selection and attribute reduction in recent years [26–28]. Fuzzy dependency can be understood as the average distance from the samples to their nearest neighbors belonging to different classes, while the kernelized mutual information reflects the relevance between the features and the decision in the kernel space.

Comparing Figures 10 and 11, a significant difference is observed. As to fuzzy rough sets, Feature 5 produces the largest dependency, followed by Feature 38; however, Feature 39 gets the largest mutual information, with Feature 2 second. Thus, different feature evaluation functions lead to completely different results.
Figures 10 and 11 present the significance of single features. In applications, we should combine a set of features, so we now consider a greedy search strategy: starting from an empty set, the best features are added one by one. In
Figure 14: Scatter plots in the 2D spaces spanned by the feature pairs selected by fuzzy dependency.
each round, we select the feature which produces the largest significance increment with respect to the selected subset. Both fuzzy dependency and kernelized mutual information increase monotonically as new attributes are added; if the selected features are sufficient for classification, the two functions keep invariant when any new attribute is added. So we can stop the algorithm when the increment of significance is less than a given threshold. The significances of the selected feature subsets are shown in Figures 12 and 13, respectively.
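The greedy procedure just described can be sketched as follows (our illustration on synthetic data, not the authors' code; `sigma` and the stopping threshold `eps` are assumed parameters):

```python
import numpy as np

def gaussian_kernel_matrix(X, sigma=1.0):
    sq = np.sum(X ** 2, axis=1)
    return np.exp(-np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0) / sigma)

def kernel_entropy(K):
    return -np.mean(np.log2(K.sum(axis=1) / K.shape[0]))

def mi(K, D):
    return kernel_entropy(K) + kernel_entropy(D) - kernel_entropy(K * D)

def greedy_select(X, y, sigma=1.0, eps=1e-3):
    """Forward search: in each round add the feature giving the largest
    MI(K, D) increment; stop when the increment falls below eps.
    Multiplying per-feature Gaussian kernels equals the Gaussian kernel
    on the union of the features, as in Definition 2."""
    m, n = X.shape
    D = (y[:, None] == y[None, :]).astype(float)
    selected, K_sel, best = [], np.ones((m, m)), 0.0
    while len(selected) < n:
        gains = [mi(K_sel * gaussian_kernel_matrix(X[:, [f]], sigma), D)
                 if f not in selected else -np.inf for f in range(n)]
        f_best = int(np.argmax(gains))
        if gains[f_best] - best < eps:
            break
        selected.append(f_best)
        K_sel = K_sel * gaussian_kernel_matrix(X[:, [f_best]], sigma)
        best = gains[f_best]
    return selected

# Toy run on synthetic data: feature 2 carries the class information and
# should be picked first.
rng = np.random.default_rng(1)
y = np.array([0] * 15 + [1] * 15)
X = rng.normal(0.0, 1.0, (30, 4))
X[:, 2] = np.r_[np.zeros(15), 3.0 * np.ones(15)] + rng.normal(0, 0.05, 30)
print(greedy_select(X, y)[0])  # 2
```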
In order to show the effectiveness of the algorithm, we give the scatter plots in the 2D spaces, shown in Figures 14 to 16, spanned by the feature pairs selected by fuzzy dependency, kernelized mutual information, and Shannon mutual information. As to fuzzy dependency, we select Features 5, 37, 2, and 3; there are then 4 × 4 = 16 combinations of feature pairs. The subplot in the i-th row and j-th column of Figure 14 gives the scatter of the samples in the 2D space spanned by the i-th and j-th selected features.
Observing the second subplot in the first row of Figure 14, we can find that the classification task is nonlinear. The first class is dispersed, and the third class is also located in different regions, which makes learning classification models difficult.
However, in the corresponding subplot of Figure 15, we can see that each class is relatively compact, which leads to a small intraclass distance. Moreover, the samples of the five classes can be separated with linear models, which also brings benefit for learning a simple classification model.

Comparing Figures 15 and 16, we can find that in the feature spaces selected by Shannon mutual information the classes overlap or get entangled, which leads to bad classification performance.
5. Diagnosis Modeling with Information Entropy Based Decision Tree Algorithm
After selecting the informative features, we now turn to classification modeling. There are a great number of learning algorithms for building a classification model; generalization capability and interpretability are the two most important criteria in evaluating them. As to fault diagnosis, a domain expert usually accepts a model which is consistent
Figure 15: Scatter plots in the 2D spaces spanned by the feature pairs selected by kernelized mutual information.
with his common knowledge. Thus, he expects the model to be understandable; otherwise, he will not trust its outputs. In addition, if the model is understandable, a domain expert can adapt it according to his prior knowledge, which makes the model suitable for different diagnosis objects.
Decision tree algorithms, including CART [29], ID3 [17], and C4.5 [18], are techniques for training an understandable classification model; the learned model can be transformed into a set of rules. All these algorithms build a decision tree from training samples: they start from a root node and select one of the features to divide the samples into different branches with cuts according to their feature values, and this procedure is conducted recursively until the branch is pure or a stopping criterion is satisfied. The key difference lies in the evaluation function for selecting attributes or cuts: CART adopts the GINI and Twoing splitting rules, ID3 uses information gain, and C4.5 takes the information gain ratio. Moreover, C4.5 can deal with numerical attributes, unlike ID3. Competent performance is usually observed with C4.5 in real-world applications compared with popular algorithms such as SVM and Bayesian networks. In this work, we introduce C4.5 to train classification models. The pseudocode of C4.5 is formulated as follows.
Decision tree algorithm C4.5.
Input: a set of training samples U = {x_1, x_2, ..., x_m} with features F = {f_1, f_2, ..., f_n}, and a stopping criterion τ.
Output: a decision tree T.
(1) Check the sample set.
(2) For each attribute f, compute the normalized information gain ratio from splitting on f.
(3) Let f_best be the attribute with the highest normalized information gain ratio.
(4) Create a decision node that splits on f_best.
(5) Recurse on the sublists obtained by splitting on f_best and add those nodes as children of the node, until the stopping criterion τ is satisfied.
(6) Output T.
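Step (2) is the heart of C4.5. As an illustration (our sketch, not Quinlan's code), the normalized information gain ratio for a binary cut x ≤ t on a numeric attribute can be computed as:

```python
import numpy as np

def entropy(y):
    """Shannon entropy of a label vector, in bits."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gain_ratio(x, y, t):
    """C4.5-style information gain ratio for the binary split x <= t."""
    left = x <= t
    m = len(y)
    w = np.array([left.sum() / m, (~left).sum() / m])
    if w[0] == 0 or w[1] == 0:
        return 0.0  # degenerate split: no information
    gain = entropy(y) - (w[0] * entropy(y[left]) + w[1] * entropy(y[~left]))
    split_info = -np.sum(w * np.log2(w))  # normalizer penalizing skewed splits
    return gain / split_info

# A perfectly separating threshold yields the maximal ratio:
x = np.array([0.1, 0.2, 0.3, 0.7, 0.8, 0.9])
y = np.array([0, 0, 0, 1, 1, 1])
print(round(gain_ratio(x, y, 0.5), 3))              # 1.0
print(gain_ratio(x, y, 0.2) < gain_ratio(x, y, 0.5))  # True
```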
Figure 16: Scatter plots in the 2D spaces spanned by the feature pairs selected by Shannon mutual information.
Figure 17: Decision tree trained on the features selected with fuzzy rough sets.
We input the datasets into C4.5 and build two decision trees: Features 5, 37, 2, and 3 form the first dataset, and Features 39, 31, 38, and 40 form the second. The two trees are given in Figures 17 and 18, respectively.
Figure 18: Decision tree trained on the features selected with kernelized mutual information.
Walking from the root node to a leaf node along a branch, a rule is extracted from the tree. As to the first tree, we get five decision rules:
(1) if F2 > 0.50 and F37 > 0.49, then the decision is Class 4;
(2) if F2 > 0.50 and F37 ≤ 0.49, then the decision is Class 1;
(3) if 0.18 < F2 ≤ 0.50 and F3 > 0.41, then the decision is Class 5;
(4) if 0.18 < F2 ≤ 0.50 and F3 ≤ 0.41, then the decision is Class 3;
(5) if F2 ≤ 0.18, then the decision is Class 2.
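Such a rule set translates directly into code; the following sketch hard-codes the thresholds of the five rules of the first tree:

```python
def classify_tree1(F2, F3, F37):
    """Rule set extracted from the first decision tree (Figure 17)."""
    if F2 > 0.50:
        return 4 if F37 > 0.49 else 1
    if F2 > 0.18:            # 0.18 < F2 <= 0.50
        return 5 if F3 > 0.41 else 3
    return 2                 # F2 <= 0.18

print(classify_tree1(F2=0.6, F3=0.0, F37=0.5))  # 4
print(classify_tree1(F2=0.3, F3=0.2, F37=0.0))  # 3
print(classify_tree1(F2=0.1, F3=0.9, F37=0.9))  # 2
```

This is exactly the interpretability benefit discussed above: a domain expert can read, check, or revise each branch directly.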
As to the second decision tree, we can also obtain five rules:

(1) if F39 > 0.45 and F38 > 0.80, then the decision is Class 4;
(2) if F39 > 0.45 and F38 ≤ 0.80, then the decision is Class 1;
(3) if 0.17 < F39 ≤ 0.45, then the decision is Class 2;
(4) if F39 ≤ 0.17 and F40 > 0.42, then the decision is Class 5;
(5) if F39 ≤ 0.17 and F40 ≤ 0.42, then the decision is Class 3.
We can see that the derived decision trees are rather simple, and five rules can be extracted from each. It is very easy for domain experts to understand and even revise the rules. As the classification task is relatively simple, the accuracy of each model is as high as 97%. As new samples and faults are recorded by the system, more and more complex tasks may be stored; in that case, the models may become more and more complex.
6. Conclusions and Future Work
Automatic fault detection and diagnosis are highly desirable in some industries, such as offshore oil well drilling platforms, where systems are self-monitoring with no man on duty. In this work, we design an intelligent abnormality detection and fault recognition technique for the exhaust system of gas turbines based on information entropy, which is used for measuring the uniformity of exhaust temperatures, evaluating the significance of features in kernel spaces, and selecting splitting nodes for constructing decision trees. The main contributions of the work are two. First, we introduce the entropy function to measure the uniformity of exhaust temperatures; the measurement is easy to compute and understand, and numerical experiments show its effectiveness. Second, we extend Shannon entropy to evaluate the significance of attributes in kernelized feature spaces, computing the relevance between a kernel matrix induced with a set of attributes and the matrix computed with the decision variable. Some numerical experiments are also presented, and good results are derived.
Although this work gives an effective framework for automatic fault detection and recognition, the proposed technique has not yet been tested on large-scale real tasks. We have developed a remote state monitoring and fault diagnosis system, and large-scale data are flooding into the center. In the future, we will improve these techniques and develop a reliable diagnosis system.
Conflict of Interests

The authors declare that they have no conflict of interests regarding the publication of this paper.
Acknowledgment

This work is partially supported by the National Natural Foundation under Grants 61222210 and 61105054.
References

[1] C. Wang, L. Xu, and W. Peng, "Conceptual design of remote monitoring and fault diagnosis systems," Information Systems, vol. 32, no. 7, pp. 996–1004, 2007.
[2] W. Donat, K. Choi, W. An, S. Singh, and K. Pattipati, "Data visualization, data reduction and classifier fusion for intelligent fault diagnosis in gas turbine engines," Journal of Engineering for Gas Turbines and Power, vol. 130, no. 4, Article ID 041602, 2008.
[3] Y. G. Li and P. Nilkitsaranont, "Gas turbine performance prognostic for condition-based maintenance," Applied Energy, vol. 86, no. 10, pp. 2152–2161, 2009.
[4] H. Bassily, R. Lund, and J. Wagner, "Fault detection in multivariate signals with applications to gas turbines," IEEE Transactions on Signal Processing, vol. 57, no. 3, pp. 835–842, 2009.
[5] K. Young, D. Lee, V. Vitali, and Y. Ming, "A fault diagnosis method for industrial gas turbines using Bayesian data analysis," Journal of Engineering for Gas Turbines and Power, vol. 132, Article ID 041602, 2010.
[6] B. Yu, D. Liu, and T. Zhang, "Fault diagnosis for micro-gas turbine engine sensors via wavelet entropy," Sensors, vol. 11, no. 10, pp. 9928–9941, 2011.
[7] S. Wu, P. Wu, C. Wu, J. Ding, and C. Wang, "Bearing fault diagnosis based on multiscale permutation entropy and support vector machine," Entropy, vol. 14, no. 8, pp. 1343–1356, 2012.
[8] S. Wu, C. Wu, T. Wu, and C. Wang, "Multi-scale analysis based ball bearing defect diagnostics using Mahalanobis distance and support vector machine," Entropy, vol. 15, no. 2, pp. 416–433, 2013.
[9] H. A. Nozari, M. A. Shoorehdeli, S. Simani, and H. D. Banadaki, "Model-based robust fault detection and isolation of an industrial gas turbine prototype using soft computing techniques," Neurocomputing, vol. 91, pp. 29–47, 2012.
[10] S. Sarkar, X. Jin, and A. Ray, "Data-driven fault detection in aircraft engines with noisy sensor measurements," Journal of Engineering for Gas Turbines and Power, vol. 133, no. 8, Article ID 081602, 10 pages, 2011.
[11] C. E. Shannon, "A mathematical theory of communication," The Bell System Technical Journal, vol. 27, pp. 379–423, 1948.
[12] S. M. Pincus, "Approximate entropy as a measure of system complexity," Proceedings of the National Academy of Sciences of the United States of America, vol. 88, no. 6, pp. 2297–2301, 1991.
[13] L. I. Kuncheva and C. J. Whitaker, "Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy," Machine Learning, vol. 51, no. 2, pp. 181–207, 2003.
[14] I. Csiszar, "Axiomatic characterizations of information measures," Entropy, vol. 10, no. 3, pp. 261–273, 2008.
[15] M. Zanin, L. Zunino, O. A. Rosso, and D. Papo, "Permutation entropy and its main biomedical and econophysics applications: a review," Entropy, vol. 14, no. 8, pp. 1553–1577, 2012.
[16] C. Wang and H. Shen, "Information theory in scientific visualization," Entropy, vol. 13, pp. 254–273, 2011.
[17] J. R. Quinlan, "Induction of decision trees," Machine Learning, vol. 1, no. 1, pp. 81–106, 1986.
[18] J. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, 1993.
[19] D. N. Reshef, Y. A. Reshef, H. K. Finucane et al., "Detecting novel associations in large data sets," Science, vol. 334, no. 6062, pp. 1518–1524, 2011.
[20] C. J. C. Burges, "A tutorial on support vector machines for pattern recognition," Data Mining and Knowledge Discovery, vol. 2, no. 2, pp. 121–167, 1998.
[21] Q. Hu, D. Yu, W. Pedrycz, and D. Chen, "Kernelized fuzzy rough sets and their applications," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 11, pp. 1649–1667, 2011.
[22] Q. Hu, H. Zhao, Z. Xie, and D. Yu, "Consistency based attribute reduction," in Advances in Knowledge Discovery and Data Mining, Z.-H. Zhou, H. Li, and Q. Yang, Eds., vol. 4426 of Lecture Notes in Computer Science, pp. 96–107, Springer, Berlin, Germany, 2007.
[23] Q. Hu, D. Yu, and Z. Xie, "Information-preserving hybrid data reduction based on fuzzy-rough techniques," Pattern Recognition Letters, vol. 27, no. 5, pp. 414–423, 2006.
[24] R. Gilad-Bachrach, A. Navot, and N. Tishby, "Margin based feature selection: theory and algorithms," in Proceedings of the 21st International Conference on Machine Learning (ICML '04), pp. 337–344, July 2004.
[25] J. Shawe-Taylor and N. Cristianini, Kernel Methods for Pattern Analysis, Cambridge University Press, 2004.
[26] Q. Hu, Z. Xie, and D. Yu, "Hybrid attribute reduction based on a novel fuzzy-rough model and information granulation," Pattern Recognition, vol. 40, no. 12, pp. 3509–3521, 2007.
[27] R. Jensen and Q. Shen, "New approaches to fuzzy-rough feature selection," IEEE Transactions on Fuzzy Systems, vol. 17, no. 4, pp. 824–838, 2009.
[28] S. Zhao, E. C. C. Tsang, and D. Chen, "The model of fuzzy variable precision rough sets," IEEE Transactions on Fuzzy Systems, vol. 17, no. 2, pp. 451–467, 2009.
[29] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone, Classification and Regression Trees, Wadsworth & Brooks/Cole Advanced Books & Software, Monterey, Calif, USA, 1984.
Figure 2: Structure of the remote system for condition monitoring and fault analysis (local monitoring and data acquisition of the gas turbine on the oil drilling platforms; a satellite link through inner and outer firewalls; and a diagnosis center with an online database subsystem, an intelligent diagnosis subsystem, a video conference system, a human-system interface, and Internet clients).
Figure 3: A typical webpage of the monitoring subsystem.
Figure 4: Exhaust temperatures from a set of thermometers.
Figure 5: Uniformity of the temperatures (the red dashed line is the ideal case; the blue line is the real case).
Figure 6: Samples of temperature distribution at different times: rose maps of the exhaust temperatures at (a) t = 5, (b) t = 130, (c) t = 250, (d) t = 400, and (e) t = 500.
Another example is given in Figures 7 to 9. In this example, there is a significant difference between the outputs of the 13 thermometers even when the gas turbine is not running, as shown in Figure 9(a); thus, the entropy of the temperature distribution is a little lower than in the ideal case, as shown in Figure 8. Some representative samples are also given in Figure 9.
Considering the above examples, we can see that the entropy function is an effective measurement of uniformity and can be used to reflect the uniformity of exhaust temperatures. If the uniformity is less than a threshold, a fault has possibly occurred in the gas path of the gas turbine; thus, the entropy function is used as an index of the health of the gas path.
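As a sketch of this uniformity index (our illustration; the excerpt does not spell out the normalization, so we assume the natural-log entropy of the temperatures normalized to sum to one, whose ideal, perfectly uniform value is ln n):

```python
import math

def temperature_uniformity(temps):
    """Entropy of the normalized exhaust temperatures.

    Assumption of this sketch: p_i = T_i / sum(T) and natural-log entropy,
    so a perfectly uniform ring of n thermometers scores ln(n), and any
    imbalance pushes the value below that ideal line."""
    total = sum(temps)
    ps = [t / total for t in temps]
    return -sum(p * math.log(p) for p in ps if p > 0)

uniform = [500.0] * 13                 # ideal case: all 13 readings equal
skewed = [500.0] * 12 + [50.0]         # one abnormally cold thermometer
print(round(temperature_uniformity(uniform), 3))       # 2.565, i.e. ln(13)
print(temperature_uniformity(skewed) < math.log(13))   # True
```

A health check then reduces to comparing this value against a threshold just below ln(13), as described above.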
4. Fault Feature Quality Evaluation with Generalized Entropy
The above section gives an approach to detecting abnormality in the exhaust temperature distribution. However, the entropy function cannot distinguish what kind of fault
Figure 7: Exhaust temperatures from another set of thermometers.
Figure 8: Entropy of the temperature distribution (the red dashed line is the ideal case; the blue one is the real case).
occurs in the system, even though it detects the abnormality. In order to analyze why the temperature distribution is not uniform, we should develop algorithms to recognize the fault.

Before training an intelligent model, we should construct some features and select the most informative subset to represent the different faults. In this section, we discuss this issue.
Intuitively, we know that the temperatures of all thermometers reflect the state of the system. Besides, the temperature differences between neighboring thermometers also indicate the source of faults; these are considered as space-neighboring information. Moreover, the temperature change of a thermometer over time necessarily gives hints for studying the faults; this can be viewed as time-neighboring information. In fact, the inlet temperature T_0 is also an important factor. In summary, we can use the exhaust temperatures and their neighboring information along time and space to recognize different faults. If there are n (n = 13 in our system) thermometers, we can form a feature vector to describe the state of the exhaust system as

F = {T_0, T_1, T_2, ..., T_n, T_1 − T_2, T_2 − T_3, ..., T_n − T_1, T'_1, T'_2, ..., T'_n},    (2)

where T'_i = T_i(j) − T_i(j−1), and T_i(j) is the temperature at time j of the i-th thermometer.

Apart from the above features, we can also construct other attributes to reflect the condition of the gas turbine. In this work, we consider a gas turbine with 13 thermometers around the exhaust, so we finally form a 40-attribute vector.
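The 40-attribute vector of equation (2) can be assembled as follows (our sketch; note the circular space-neighboring difference T_n − T_1, which closes the ring of thermometers):

```python
def exhaust_state_vector(T0, temps_now, temps_prev):
    """Build the state vector of equation (2): inlet temperature T0, the
    n = 13 exhaust temperatures, the n circular space differences
    T_i - T_{i+1} (ending with T_n - T_1), and the n time differences
    T_i(j) - T_i(j-1); 1 + 13 + 13 + 13 = 40 attributes in total."""
    n = len(temps_now)
    space = [temps_now[i] - temps_now[(i + 1) % n] for i in range(n)]
    time = [now - prev for now, prev in zip(temps_now, temps_prev)]
    return [T0] + list(temps_now) + space + time

# Illustrative readings only (not the paper's data):
v = exhaust_state_vector(15.0,
                         [500.0 + i for i in range(13)],
                         [498.0 + i for i in range(13)])
print(len(v))  # 40
```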
There are some questions: are all the extracted features useful for final modeling, and how can we evaluate the features and find the most informative ones? In fact, there are a number of measures for estimating feature quality, such as dependency in rough set theory [21], consistency [22], mutual information in information theory [23], and classification margin in statistical learning theory [24]. However, all these measures are computed in the original input space, while effective classification techniques usually implement a nonlinear mapping of the original space to a feature space through a kernel function. In this case, we require a new measure that reflects the classification information of the feature space, so we extend the traditional information entropy to measure it.
Given a set of samples U = {x_1, x_2, ..., x_m}, each sample is described with n features F = {f_1, f_2, ..., f_n}. As to classification learning, each training sample x_i is associated with a decision y_i. As to an arbitrary subset F' ⊆ F and a kernel function K, we can calculate a kernel matrix

K = [ k_11 ⋯ k_1m
      ⋮    ⋱  ⋮
      k_m1 ⋯ k_mm ],    (3)

where k_ij = k(x_i, x_j). The Gaussian function is a representative kernel function:

k_ij = exp(−‖x_i − x_j‖² / σ).    (4)

A number of kernel functions have the properties: (1) k_ij ∈ [0, 1]; (2) k_ij = k_ji.
Kernel matrix plays a bottleneck role in kernel basedlearning [25] All the information that a classification algo-rithm can use is hidden in this matrix In the same time wecan also calculate a decision kernel matrix as
119863 =[
[
[
11988911
1198891119898
d
1198891198981
119889119898119898
]
]
]
(5)
where 119889119894119895= 1 if 119910
119894= 119910119895 otherwise 119889
119894119895= 0 In fact the matrix
119863 is a matching kernel
Definition 1 Given a set of samples119880 = 1199091 1199092 119909
119898 each
sample is described with 119899 features 119865 = 1198911 1198912 119891
119899 1198651015840 sube
119865119870 is a kernelmatrix over119880 in terms of1198651015840Then the entropyof 1198651015840 is defined as
119864 (119870) = minus
1
119898
119898
sum
119894=1
log2
119870119894
119898
(6)
where119870119894= sum119898
119895=1119896119894119895
As to the above entropy function if we use Gaussianfunction as the kernel we have log
2119898 ge 119864 (119870) ge 0 119864 (119870) = 0
if and only if 119896119894119895= 1 forall119894 119895 119864 (119870) = log
2119898 if and only if 119896
119894119895= 0
119894 = 119895 119864 (119870) = 0 means that any pair of samples cannot be
The Scientific World Journal 7
30
210
60
15
10
5
240
90
270
120
300
150
330
180 0
(a) Rose map of exhaust temperatures at 119905 = 500
30
210
800
600
400
200
60
240
90
270
120
300
150
330
180 0
(b) Rose map of exhaust temperatures at 119905 = 758
30
210
250
200
150
100
50
60
240
90
270
120
300
150
330
180 0
(c) Rose map of exhaust temperatures at 119905 = 820
30
210
60
240
90
270
120
300
150
330
180 0
400
300
200
100
(d) Rose map of exhaust temperatures at 119905 =1220
Figure 9 Samples of temperature distribution in different moments
distinguished with the current features while 119864 (119870) = log2119898
means any pair of samples is different from each other Sothey can be distinguished These are two extreme cases Inreal-world applications part of samples can be discernedwiththe available features while others are not In this case theentropy function takes value in the interval [0 log
2119898]
Moreover it is easy to show that if 1198701sube 1198702 119864(1198701) ge
119864(1198702) where119870
1sube 1198702means119870
1(119909119894 119909119895) le 1198702(119909119894 119909119895) forall119894 119895
Definition 2 Given a set of samples 119880 = 1199091 1199092 119909
119898
each sample is described with 119899 features 119865 = 1198911 1198912 119891
119899
1198651 1198652sube 119865 119870
1and 119870
2are two kernel matrices induced by 119865
1
and 1198652119870 is a new function computed with 119865
1cup 1198652 Then the
joint entropy of 1198651and 119865
2is defined as
119864 (1198701 1198702) = 119864 (119870) = minus
1
119898
119898
sum
119894=1
log2
119870119894
119898
(7)
where119870119894= sum119898
119895=1119896119894119895
As to the Gaussian function 119870(119909119894 119909119895) = 119870
1(119909119894 119909119895) times
1198702(119909119894 119909119895) Thus 119870 sube 119870
1and 119870 sube 119870
2 In this case 119864(119870) ge
119864(1198701) and 119864(119870) ge 119864(119870
2)
Definition 3 Given a set of samples 119880 = 1199091 1199092 119909
119898
each sample is described with 119899 features 119865 = 1198911 1198912
119891119899One has 119865
1 1198652sube 119865 119870
1and 119870
2are two kernel matrices
induced by 1198651and 119865
2 119870 is a new kernel function computed
with 1198651cup 1198652 Knowning 119865
1 the condition entropy of 119865
2is
defined as
119864 (1198701| 1198702) = 119864 (119870) minus 119864 (119870
1) (8)
As to the Gaussian kernel 119864(119870) ge 119864(1198701) and 119864(119870) ge 119864(119870
2)
so 119864(1198701| 1198702) ge 0 and 119864(119870
2| 1198701) ge 0
Definition 4. Given a set of samples $U = \{x_1, x_2, \ldots, x_m\}$, each sample is described with $n$ features $F = \{f_1, f_2, \ldots, f_n\}$. One has $F_1, F_2 \subseteq F$; $K_1$ and $K_2$ are two kernel matrices induced by $F_1$ and $F_2$; $K$ is a new kernel matrix computed with $F_1 \cup F_2$. Then the mutual information of $K_1$ and $K_2$ is defined as
\[
\mathrm{MI}(K_1, K_2) = E(K_1) + E(K_2) - E(K). \tag{9}
\]
As to the Gaussian kernel, $\mathrm{MI}(K_1, K_2) = \mathrm{MI}(K_2, K_1)$. If $K_1 \subseteq K_2$, we have $\mathrm{MI}(K_1, K_2) = E(K_2)$, and if $K_2 \subseteq K_1$, we have $\mathrm{MI}(K_1, K_2) = E(K_1)$.
8 The Scientific World Journal
Figure 10: Fuzzy dependency between a single feature and the decision (x-axis: feature index, 1 to 40; y-axis: dependency).
Figure 11: Kernelized mutual information between a single feature and the decision (x-axis: feature index, 1 to 40; y-axis: mutual information).
Please note that if $F_1 \subseteq F_2$, we have $K_2 \subseteq K_1$. However, $K_2 \subseteq K_1$ does not mean $F_1 \subseteq F_2$.
Definition 5. Given a set of samples $U = \{x_1, x_2, \ldots, x_m\}$, each sample is described with $n$ features $F = \{f_1, f_2, \ldots, f_n\}$. $F' \subseteq F$; $K$ is a kernel matrix over $U$ in terms of $F'$, and $D$ is the kernel matrix computed with the decision. Then the significance of the feature subset $F'$ related to the decision is defined as
\[
\mathrm{MI}(K, D) = E(K) + E(D) - E(K, D). \tag{10}
\]
$\mathrm{MI}(K, D)$ measures the importance of the feature subset $F'$ in the kernel space for distinguishing different classes. It can be understood as a kernelized version of Shannon mutual information, which is widely used in feature evaluation and selection. In fact, it is easy to derive the equivalence between this entropy function and Shannon entropy provided that the attributes are discrete and the matching kernel is used.
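The significance of (10) can be sketched as follows (my own minimal illustration; function names and the toy data are assumptions). The decision kernel $D$ is the matching kernel of (5), and an informative feature should score higher than a noisy one:

```python
import numpy as np

def kernel_entropy(K):
    """E(K) from Definition 1: -(1/m) * sum_i log2(K_i / m)."""
    m = K.shape[0]
    return float(-np.mean(np.log2(K.sum(axis=1) / m)))

def matching_kernel(y):
    """Decision kernel of (5): d_ij = 1 if y_i == y_j, else 0."""
    y = np.asarray(y)
    return (y[:, None] == y[None, :]).astype(float)

def feature_significance(K, D):
    """MI(K, D) = E(K) + E(D) - E(K, D), with joint kernel K * D."""
    return kernel_entropy(K) + kernel_entropy(D) - kernel_entropy(K * D)

def gauss(col):
    """Gaussian kernel matrix of a single feature column (sigma = 1)."""
    sq = (col[:, None] - col[None, :]) ** 2
    return np.exp(-sq)

# Feature 0 separates the two classes; feature 1 is uninformative
X = np.array([[0.00, 0.10], [0.05, 0.90], [1.00, 0.12], [0.95, 0.88]])
y = [0, 0, 1, 1]
D = matching_kernel(y)

informative = feature_significance(gauss(X[:, 0]), D)
noisy = feature_significance(gauss(X[:, 1]), D)
assert informative > noisy
```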
Now we show an example in gas turbine fault diagnosis. We collected 3581 samples from two sets of gas turbine systems; 1440 samples are healthy, and the others belong to four kinds of faults: load rejection, sensor fault, fuel switching, and salt spray corrosion, with 45, 588, 71, and 1437 samples, respectively. Thirteen thermometers are installed in the exhaust. According to the approach described above, we form a 40-dimensional vector to represent the state of the exhaust.
Figure 12: Fuzzy dependency between the selected features and the decision (Features 5, 37, 2, and 3 are selected sequentially; the dependency rises through 0.3475, 0.9006, 0.9969, and 0.9996).
Figure 13: Kernelized mutual information between the selected features and the decision (Features 39, 31, 38, and 40 are selected sequentially; the mutual information rises through 1.267, 1.510, 1.577, and 1.612).
Obviously, the classification task is not understandable in such a high-dimensional space. Moreover, some features may be redundant for classification learning, which may confuse the learning algorithm and reduce modeling performance. So it is a key preprocessing step to select a necessary and sufficient subset of features.
Here we compare the fuzzy rough set based feature evaluation algorithm with the proposed kernelized mutual information. Fuzzy dependency has been widely discussed and applied in feature selection and attribute reduction in recent years [26-28]. Fuzzy dependency can be understood as the average distance between the samples and their nearest neighbors belonging to different classes, while the kernelized mutual information reflects the relevance between the features and the decision in the kernel space.
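A minimal reading of that description can be sketched as follows (my own simplification for illustration; the actual fuzzy-rough dependency of [26-28] is defined through fuzzy lower approximations and is more involved): each sample contributes its distance to the nearest sample of a different class, clipped to $[0, 1]$, and the dependency is the average.

```python
import numpy as np

def fuzzy_dependency(X, y):
    """Average clipped distance from each sample to its nearest
    sample of a different class (a margin-style approximation)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    margins = [dist[i][y != y[i]].min() for i in range(len(y))]
    return float(np.mean(np.minimum(margins, 1.0)))

# Well-separated classes give dependency 1; overlapping classes give less
X_sep = np.array([[0.0], [0.1], [5.0], [5.1]])
X_mix = np.array([[0.0], [0.1], [0.2], [0.3]])
y = [0, 0, 1, 1]
assert fuzzy_dependency(X_sep, y) == 1.0
assert fuzzy_dependency(X_mix, y) < 1.0
```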
Comparing Figures 10 and 11, a significant difference is observed. As to fuzzy rough sets, Feature 5 produces the largest dependency, followed by Feature 38. However, Feature 39 gets the largest mutual information, and Feature 2 is the second one. Thus different feature evaluation functions lead to completely different results.
Figures 10 and 11 present the significance of single features. In applications, we should combine a set of features, so we consider a greedy search strategy: starting from an empty set, the best features are added one by one. In each round, we select the feature which produces the largest significance increment with respect to the selected subset. Both fuzzy dependency and kernelized mutual information increase monotonically as new attributes are added. If the selected features are sufficient for classification, these two functions remain invariant when any new attribute is added, so we can stop the algorithm if the increment of significance is less than a given threshold. The significances of the selected feature subsets are shown in Figures 12 and 13, respectively.

Figure 14: Scatter plots in 2D spaces spanned by feature pairs selected by fuzzy dependency (each panel shows the five classes on axes scaled to [0, 1]).
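The greedy strategy described above can be sketched as follows (an illustration with names and toy data of my own; the kernelized MI is used as the significance function, and the loop stops when the increment falls below the threshold):

```python
import numpy as np

def kernel_entropy(K):
    m = K.shape[0]
    return float(-np.mean(np.log2(K.sum(axis=1) / m)))

def mi(K, D):
    return kernel_entropy(K) + kernel_entropy(D) - kernel_entropy(K * D)

def greedy_select(X, y, sigma=1.0, threshold=0.01):
    """Forward selection: in each round add the feature giving the largest
    increment of kernelized MI with the decision kernel; stop when the
    increment falls below the threshold."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    m, n = X.shape
    D = (y[:, None] == y[None, :]).astype(float)  # matching kernel of (5)
    selected, current = [], 0.0
    while len(selected) < n:
        best_f, best_val = None, current
        for f in range(n):
            if f in selected:
                continue
            Xs = X[:, selected + [f]]
            sq = ((Xs[:, None, :] - Xs[None, :, :]) ** 2).sum(axis=2)
            val = mi(np.exp(-sq / sigma), D)
            if val > best_val:
                best_f, best_val = f, val
        if best_f is None or best_val - current < threshold:
            break
        selected.append(best_f)
        current = best_val
    return selected

X = np.array([[0.00, 0.10], [0.05, 0.90], [1.00, 0.12], [0.95, 0.88]])
y = [0, 0, 1, 1]
print(greedy_select(X, y))  # -> [0]: only the discriminative feature is kept
```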
In order to show the effectiveness of the algorithm, we give the scatter plots in 2D spaces, as shown in Figures 14 to 16, which are spanned by the feature pairs selected by fuzzy dependency, kernelized mutual information, and Shannon mutual information. As to fuzzy dependency, we select Features 5, 37, 2, and 3; then there are 4 x 4 = 16 combinations of feature pairs. The subplot in the $i$th row and $j$th column of Figure 14 gives the scatter of samples in the 2D space spanned by the $i$th selected feature and the $j$th selected feature.
Observing the second subplot in the first row of Figure 14, we can find that the classification task is nonlinear. The first class is dispersed, and the third class is also located in different regions, which leads to difficulty in learning classification models.
However, in the corresponding subplot of Figure 15, we can see that each class is relatively compact, which leads to a small intraclass distance. Moreover, the samples in the five classes can be separated with some linear models, which also brings benefit for learning a simple classification model.
Comparing Figures 15 and 16, we can find that different classes overlap or get entangled in the feature spaces selected by Shannon mutual information, which leads to bad classification performance.
5. Diagnosis Modeling with Information Entropy Based Decision Tree Algorithm
After selecting the informative features, we now turn to classification modeling. There are a great number of learning algorithms for building a classification model. Generalization capability and interpretability are the two most important criteria in evaluating an algorithm. As to fault diagnosis, a domain expert usually accepts a model which is consistent with his common knowledge; thus he expects the model to be understandable, otherwise he will not believe its outputs. In addition, if the model is understandable, a domain expert can adapt it according to his prior knowledge, which makes the model suitable for different diagnosis objects.

Figure 15: Scatter plots in 2D spaces spanned by feature pairs selected by kernelized mutual information (each panel shows the five classes on axes scaled to [0, 1]).
Decision tree algorithms, including CART [29], ID3 [17], and C4.5 [18], are such techniques for training an understandable classification model, and the learned model can be transformed into a set of rules. All these algorithms build a decision tree from training samples: they start from a root node and select one of the features to divide the samples, with cuts, into different branches according to their feature values. This procedure is iteratively conducted until the branch is pure or a stopping criterion is satisfied. The key difference lies in the evaluation function for selecting attributes or cuts. In CART, the Gini and twoing splitting rules are adopted, while ID3 uses the information gain and C4.5 takes the information gain ratio. Moreover, C4.5 can deal with numerical attributes, compared with ID3. Competent performance is usually observed with C4.5 in real-world applications, compared with some popular algorithms including SVM and Bayesian networks. In this work, we introduce C4.5 to train classification models. The pseudocode of C4.5 is formulated as follows.
Decision tree algorithm C4.5.
Input: a set of training samples $U = \{x_1, x_2, \ldots, x_m\}$ with features $F = \{f_1, f_2, \ldots, f_n\}$; stopping criterion $\tau$.
Output: decision tree $T$.
(1) Check the sample set for the base cases.
(2) For each attribute $f$, compute the normalized information gain ratio from splitting on $f$.
(3) Let f_best be the attribute with the highest normalized information gain ratio.
(4) Create a decision node that splits on f_best.
(5) Recurse on the sublists obtained by splitting on f_best, and add those nodes as children of the node, until stopping criterion $\tau$ is satisfied.
(6) Output $T$.
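Steps (2) and (3) of the pseudocode, the gain-ratio search over numerical attributes, can be sketched as follows (my own minimal version for illustration; C4.5 itself adds refinements such as missing-value handling and pruning). Candidate cuts are midpoints between consecutive values of each feature:

```python
import numpy as np

def entropy(y):
    """Shannon entropy of a label vector, in bits."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def gain_ratio(x, y, cut):
    """Information gain of the split x <= cut, divided by the split info."""
    left = x <= cut
    n, nl = len(y), left.sum()
    if nl == 0 or nl == n:
        return 0.0
    gain = (entropy(y)
            - (nl / n) * entropy(y[left])
            - ((n - nl) / n) * entropy(y[~left]))
    return gain / entropy(left.astype(int))

def best_split(X, y):
    """Return (f_best, cut, ratio) maximizing the gain ratio."""
    best = (None, None, -1.0)
    for f in range(X.shape[1]):
        vals = np.unique(X[:, f])
        for cut in (vals[:-1] + vals[1:]) / 2:
            r = gain_ratio(X[:, f], y, cut)
            if r > best[2]:
                best = (f, cut, r)
    return best

# Feature 0 splits the two classes cleanly at 0.5; feature 1 does not
X = np.array([[0.1, 0.7], [0.2, 0.1], [0.8, 0.6], [0.9, 0.2]])
y = np.array([2, 2, 4, 4])
f_best, cut, ratio = best_split(X, y)
print(f_best)  # -> 0
```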
Figure 16: Scatter plots in 2D spaces spanned by feature pairs selected by Shannon mutual information (each panel shows the five classes on axes scaled to [0, 1]).
Figure 17: Decision tree trained on the features selected with fuzzy rough sets (root split on F2 at 0.50 and 0.18; internal splits on F37 at 0.49 and on F3 at 0.41; leaves are Classes 1 to 5).
We input the datasets into C4.5 and build the following two decision trees. Features 5, 37, 2, and 3 are included in the first dataset, and Features 39, 31, 38, and 40 are selected in the second dataset. The two trees are given in Figures 17 and 18, respectively.
Figure 18: Decision tree trained on the features selected with kernelized mutual information (root split on F39 at 0.17 and 0.45; internal splits on F40 at 0.42 and on F38 at 0.80; leaves are Classes 1 to 5).
We start from the root node and move to a leaf node along a branch, and then a rule is extracted from the tree. As to the first tree, we can get five decision rules:
(1) if F2 > 0.50 and F37 > 0.49, then the decision is Class 4;
(2) if F2 > 0.50 and F37 <= 0.49, then the decision is Class 1;
(3) if 0.18 < F2 <= 0.50 and F3 > 0.41, then the decision is Class 5;
(4) if 0.18 < F2 <= 0.50 and F3 <= 0.41, then the decision is Class 3;
(5) if F2 <= 0.18, then the decision is Class 2.
As to the second decision tree, we can also obtain five rules:
(1) if F39 > 0.45 and F38 > 0.80, then the decision is Class 4;
(2) if F39 > 0.45 and F38 <= 0.80, then the decision is Class 1;
(3) if 0.17 < F39 <= 0.45, then the decision is Class 2;
(4) if F39 <= 0.17 and F40 > 0.42, then the decision is Class 5;
(5) if F39 <= 0.17 and F40 <= 0.42, then the decision is Class 3.
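The five rules of the second tree translate directly into code. A sketch (the function name is mine; the thresholds are those of the rules above):

```python
def classify(F39, F40, F38):
    """Apply the rules extracted from the tree on the kernelized-MI features."""
    if F39 > 0.45:
        return 4 if F38 > 0.80 else 1   # rules (1) and (2)
    if F39 > 0.17:
        return 2                        # rule (3)
    return 5 if F40 > 0.42 else 3       # rules (4) and (5)

assert classify(F39=0.50, F40=0.00, F38=0.90) == 4
assert classify(F39=0.30, F40=0.00, F38=0.00) == 2
assert classify(F39=0.10, F40=0.50, F38=0.00) == 5
```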
We can see that the derived decision trees are rather simple, and five rules can be extracted from each. It is very easy for domain experts to understand the rules and even revise them. As the classification task is relatively simple, the accuracy of each model is as high as 97%. As new samples and faults are recorded by the system, more and more complex tasks may be stored; in that case, the model may become more and more complex.
6. Conclusions and Future Work
Automatic fault detection and diagnosis are highly desirable in some industries, such as offshore oil well drilling platforms, for such systems are self-monitoring without man on duty. In this work, we design an intelligent abnormality detection and fault recognition technique for the exhaust system of gas turbines based on information entropy, which is used in measuring the uniformity of exhaust temperatures, evaluating the significance of features in kernel spaces, and selecting splitting nodes for constructing decision trees. The main contributions of the work are two parts. First, we introduce the entropy function to measure the uniformity of exhaust temperatures; the measurement is easy to compute and understand, and numerical experiments show its effectiveness. Second, we extend Shannon entropy for evaluating the significance of attributes in kernelized feature spaces, computing the relevance between a kernel matrix induced with a set of attributes and the matrix computed with the decision variable. Some numerical experiments are also presented, and good results are derived.
Although this work gives an effective framework for automatic fault detection and recognition, the proposed technique has not been tested on large-scale real tasks. We have developed a remote state monitoring and fault diagnosis system, and large-scale data are flooding into the center. In the future, we will improve these techniques and develop a reliable diagnosis system.
Conflict of Interests
The authors declare that they have no conflict of interests regarding the publication of this paper.
Acknowledgment
This work is partially supported by the National Natural Science Foundation of China under Grants 61222210 and 61105054.
References
[1] C. Wang, L. Xu, and W. Peng, "Conceptual design of remote monitoring and fault diagnosis systems," Information Systems, vol. 32, no. 7, pp. 996–1004, 2007.
[2] W. Donat, K. Choi, W. An, S. Singh, and K. Pattipati, "Data visualization, data reduction and classifier fusion for intelligent fault diagnosis in gas turbine engines," Journal of Engineering for Gas Turbines and Power, vol. 130, no. 4, Article ID 041602, 2008.
[3] Y. G. Li and P. Nilkitsaranont, "Gas turbine performance prognostic for condition-based maintenance," Applied Energy, vol. 86, no. 10, pp. 2152–2161, 2009.
[4] H. Bassily, R. Lund, and J. Wagner, "Fault detection in multivariate signals with applications to gas turbines," IEEE Transactions on Signal Processing, vol. 57, no. 3, pp. 835–842, 2009.
[5] K. Young, D. Lee, V. Vitali, and Y. Ming, "A fault diagnosis method for industrial gas turbines using Bayesian data analysis," Journal of Engineering for Gas Turbines and Power, vol. 132, Article ID 041602, 2010.
[6] B. Yu, D. Liu, and T. Zhang, "Fault diagnosis for micro-gas turbine engine sensors via wavelet entropy," Sensors, vol. 11, no. 10, pp. 9928–9941, 2011.
[7] S. Wu, P. Wu, C. Wu, J. Ding, and C. Wang, "Bearing fault diagnosis based on multiscale permutation entropy and support vector machine," Entropy, vol. 14, no. 8, pp. 1343–1356, 2012.
[8] S. Wu, C. Wu, T. Wu, and C. Wang, "Multi-scale analysis based ball bearing defect diagnostics using Mahalanobis distance and support vector machine," Entropy, vol. 15, no. 2, pp. 416–433, 2013.
[9] H. A. Nozari, M. A. Shoorehdeli, S. Simani, and H. D. Banadaki, "Model-based robust fault detection and isolation of an industrial gas turbine prototype using soft computing techniques," Neurocomputing, vol. 91, pp. 29–47, 2012.
[10] S. Sarkar, X. Jin, and A. Ray, "Data-driven fault detection in aircraft engines with noisy sensor measurements," Journal of Engineering for Gas Turbines and Power, vol. 133, no. 8, Article ID 081602, 2011.
[11] C. E. Shannon, "A mathematical theory of communication," The Bell System Technical Journal, vol. 27, pp. 379–423, 1948.
[12] S. M. Pincus, "Approximate entropy as a measure of system complexity," Proceedings of the National Academy of Sciences of the United States of America, vol. 88, no. 6, pp. 2297–2301, 1991.
[13] L. I. Kuncheva and C. J. Whitaker, "Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy," Machine Learning, vol. 51, no. 2, pp. 181–207, 2003.
[14] I. Csiszar, "Axiomatic characterizations of information measures," Entropy, vol. 10, no. 3, pp. 261–273, 2008.
[15] M. Zanin, L. Zunino, O. A. Rosso, and D. Papo, "Permutation entropy and its main biomedical and econophysics applications: a review," Entropy, vol. 14, no. 8, pp. 1553–1577, 2012.
[16] C. Wang and H. Shen, "Information theory in scientific visualization," Entropy, vol. 13, pp. 254–273, 2011.
[17] J. R. Quinlan, "Induction of decision trees," Machine Learning, vol. 1, no. 1, pp. 81–106, 1986.
[18] J. R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, 1993.
[19] D. N. Reshef, Y. A. Reshef, H. K. Finucane et al., "Detecting novel associations in large data sets," Science, vol. 334, no. 6062, pp. 1518–1524, 2011.
[20] C. J. C. Burges, "A tutorial on support vector machines for pattern recognition," Data Mining and Knowledge Discovery, vol. 2, no. 2, pp. 121–167, 1998.
[21] Q. Hu, D. Yu, W. Pedrycz, and D. Chen, "Kernelized fuzzy rough sets and their applications," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 11, pp. 1649–1667, 2011.
[22] Q. Hu, H. Zhao, Z. Xie, and D. Yu, "Consistency based attribute reduction," in Advances in Knowledge Discovery and Data Mining, Z.-H. Zhou, H. Li, and Q. Yang, Eds., vol. 4426 of Lecture Notes in Computer Science, pp. 96–107, Springer, Berlin, Germany, 2007.
[23] Q. Hu, D. Yu, and Z. Xie, "Information-preserving hybrid data reduction based on fuzzy-rough techniques," Pattern Recognition Letters, vol. 27, no. 5, pp. 414–423, 2006.
[24] R. Gilad-Bachrach, A. Navot, and N. Tishby, "Margin based feature selection: theory and algorithms," in Proceedings of the 21st International Conference on Machine Learning (ICML '04), pp. 337–344, 2004.
[25] J. Shawe-Taylor and N. Cristianini, Kernel Methods for Pattern Analysis, Cambridge University Press, 2004.
[26] Q. Hu, Z. Xie, and D. Yu, "Hybrid attribute reduction based on a novel fuzzy-rough model and information granulation," Pattern Recognition, vol. 40, no. 12, pp. 3509–3521, 2007.
[27] R. Jensen and Q. Shen, "New approaches to fuzzy-rough feature selection," IEEE Transactions on Fuzzy Systems, vol. 17, no. 4, pp. 824–838, 2009.
[28] S. Zhao, E. C. C. Tsang, and D. Chen, "The model of fuzzy variable precision rough sets," IEEE Transactions on Fuzzy Systems, vol. 17, no. 2, pp. 451–467, 2009.
[29] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone, Classification and Regression Trees, Wadsworth & Brooks/Cole, Monterey, Calif, USA, 1984.
Figure 6: Samples of temperature distribution at different times (rose maps of the exhaust temperatures at t = 5, 130, 250, 400, and 500).
Another example is given in Figures 7 to 9. In this example, there is a significant difference between the outputs of the 13 thermometers even when the gas turbine is not running, just as shown in Figure 9(a). Thus the entropy of the temperature distribution is a little lower than in the ideal case, as shown in Figure 8. Besides, some representative samples are given in Figure 9.
Considering the above examples, we can see that the entropy function is an effective measurement of uniformity. It can be used to reflect the uniformity of the exhaust temperatures. If the uniformity is less than a threshold, a fault has possibly occurred in the gas path of the gas turbine. Thus the entropy function is used as an index of the health of the gas path.
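The uniformity index can be sketched as the Shannon entropy of the normalized temperature readings (an illustration under my own normalization; the paper's exact scaling of the 13 readings may differ):

```python
import numpy as np

def uniformity_entropy(temps):
    """Shannon entropy of normalized exhaust temperatures; maximal
    (log2 n) when all n thermometers read the same value."""
    p = np.asarray(temps, dtype=float)
    p = p / p.sum()
    return float(-(p * np.log2(p)).sum())

uniform = uniformity_entropy([500.0] * 13)          # log2(13), about 3.70
skewed = uniformity_entropy([500.0] * 12 + [100.0])
assert skewed < uniform  # one cold thermometer lowers the uniformity index
```

A threshold on this index then flags possible gas-path faults, as described above.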
4. Fault Feature Quality Evaluation with Generalized Entropy
The above section gives an approach to detecting abnormality in the exhaust temperature distribution. However, the entropy function cannot distinguish what kind of fault occurs to the system, although it detects the abnormality. In order to analyze why the temperature distribution is not uniform, we should develop some algorithms to recognize the fault.

Figure 7: Exhaust temperatures from another set of thermometers.

Figure 8: Entropy of the temperature distribution, where the red dashed line is the ideal case and the blue one is the real case.
Before training an intelligent model, we should construct some features and select the most informative subset to represent different faults. In this section, we discuss this issue.
Intuitively, we know that the temperatures of all thermometers reflect the state of the system. Besides, the temperature differences between neighboring thermometers also indicate the source of faults, which can be considered as spatial neighboring information. Moreover, the temperature change of a thermometer over time necessarily gives hints for studying the faults, which can be viewed as temporal neighboring information. In fact, the inlet temperature $T_0$ is also an important factor. In summary, we can use the exhaust temperatures and their neighboring information along time and space to recognize different faults. If there are $n$ ($n = 13$ in our system) thermometers, we can form a feature vector to describe the state of the exhaust system as
\[
F = \{T_0, T_1, T_2, \ldots, T_n, T_1 - T_2, T_2 - T_3, \ldots, T_n - T_1, T'_1, T'_2, \ldots, T'_n\}, \tag{2}
\]
where $T'_i = T_i(j) - T_i(j-1)$ and $T_i(j)$ is the temperature at time $j$ of the $i$th thermometer.

Apart from the above features, we can also construct other attributes to reflect the conditions of the gas turbine. In this work, we consider a gas turbine with 13 thermometers around the exhaust, so we finally form a 40-attribute vector.
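The 40-dimensional vector of (2) can be assembled as follows (a sketch; the function name is mine, and two consecutive readings of the 13 thermometers plus the inlet temperature are assumed as inputs):

```python
import numpy as np

def exhaust_features(T_inlet, T_now, T_prev):
    """Build the 40-dimensional state vector of (2): inlet temperature,
    13 exhaust temperatures, 13 spatial differences between neighboring
    thermometers (T1-T2, ..., T13-T1), and 13 temporal differences."""
    T = np.asarray(T_now, dtype=float)
    space_diff = T - np.roll(T, -1)                   # neighbor differences
    time_diff = T - np.asarray(T_prev, dtype=float)   # T'_i = T_i(j) - T_i(j-1)
    return np.concatenate(([T_inlet], T, space_diff, time_diff))

v = exhaust_features(25.0, [500.0] * 13, [498.0] * 13)
print(v.shape)  # -> (40,): 1 + 13 + 13 + 13
```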
There are some questions: whether all the extracted features are useful for final modeling, and how we can evaluate the features and find the most informative ones. In fact, there are a number of measures for estimating feature quality, such as dependency in the rough set theory [21], consistency [22], mutual information in the information theory [23], and classification margin in the statistical learning theory [24]. However, all these measures are computed in the original input space, while effective classification techniques usually implement a nonlinear mapping of the original space to a feature space by a kernel function. In this case, we require a new measure to reflect the classification information of the feature space. Now we extend the traditional information entropy to measure it.
Given a set of samples $U = \{x_1, x_2, \ldots, x_m\}$, each sample is described with $n$ features $F = \{f_1, f_2, \ldots, f_n\}$. As to classification learning, each training sample $x_i$ is associated with a decision $y_i$. As to an arbitrary subset $F' \subseteq F$ and a kernel function $K$, we can calculate a kernel matrix
\[
K = \begin{bmatrix} k_{11} & \cdots & k_{1m} \\ \vdots & \ddots & \vdots \\ k_{m1} & \cdots & k_{mm} \end{bmatrix}, \tag{3}
\]
where $k_{ij} = k(x_i, x_j)$. The Gaussian function is a representative kernel function:
\[
k_{ij} = \exp\left(-\frac{\|x_i - x_j\|^2}{\sigma}\right). \tag{4}
\]
A number of kernel functions have the properties: (1) $k_{ij} \in [0, 1]$; (2) $k_{ij} = k_{ji}$.
The kernel matrix plays a bottleneck role in kernel based learning [25]: all the information that a classification algorithm can use is hidden in this matrix. At the same time, we can also calculate a decision kernel matrix as
\[
D = \begin{bmatrix} d_{11} & \cdots & d_{1m} \\ \vdots & \ddots & \vdots \\ d_{m1} & \cdots & d_{mm} \end{bmatrix}, \tag{5}
\]
where $d_{ij} = 1$ if $y_i = y_j$; otherwise $d_{ij} = 0$. In fact, the matrix $D$ is a matching kernel.
Definition 1. Given a set of samples $U = \{x_1, x_2, \ldots, x_m\}$, each sample is described with $n$ features $F = \{f_1, f_2, \ldots, f_n\}$. $F' \subseteq F$; $K$ is a kernel matrix over $U$ in terms of $F'$. Then the entropy of $F'$ is defined as
\[
E(K) = -\frac{1}{m}\sum_{i=1}^{m}\log_2\frac{K_i}{m}, \tag{6}
\]
where $K_i = \sum_{j=1}^{m} k_{ij}$.
As to the above entropy function, if the Gaussian function is used as the kernel, we have $\log_2 m \ge E(K) \ge 0$; $E(K) = 0$ if and only if $k_{ij} = 1$, $\forall i, j$; and $E(K) = \log_2 m$ if and only if $k_{ij} = 0$, $i \ne j$. $E(K) = 0$ means that no pair of samples can be distinguished with the current features, while $E(K) = \log_2 m$ means that any pair of samples is different from each other, so they can all be distinguished. These are two extreme cases. In real-world applications, part of the samples can be discerned with the available features while others cannot; in this case, the entropy function takes values in the interval $[0, \log_2 m]$.

Figure 9: Samples of temperature distribution at different moments (rose maps of the exhaust temperatures at t = 500, 758, 820, and 1220).
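The two extreme cases of $E(K)$ can be checked numerically (a short sketch of my own; the function name is an assumption):

```python
import numpy as np

def kernel_entropy(K):
    """E(K) of (6): -(1/m) * sum_i log2(K_i / m)."""
    m = K.shape[0]
    return float(-np.mean(np.log2(K.sum(axis=1) / m)))

m = 8
# k_ij = 1 for all i, j: no pair of samples can be distinguished
assert kernel_entropy(np.ones((m, m))) == 0.0
# k_ij = 0 for i != j: every pair of samples is distinguished
assert kernel_entropy(np.eye(m)) == np.log2(m)
```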
Moreover it is easy to show that if 1198701sube 1198702 119864(1198701) ge
119864(1198702) where119870
1sube 1198702means119870
1(119909119894 119909119895) le 1198702(119909119894 119909119895) forall119894 119895
Definition 2 Given a set of samples 119880 = 1199091 1199092 119909
119898
each sample is described with 119899 features 119865 = 1198911 1198912 119891
119899
1198651 1198652sube 119865 119870
1and 119870
2are two kernel matrices induced by 119865
1
and 1198652119870 is a new function computed with 119865
1cup 1198652 Then the
joint entropy of 1198651and 119865
2is defined as
119864 (1198701 1198702) = 119864 (119870) = minus
1
119898
119898
sum
119894=1
log2
119870119894
119898
(7)
where119870119894= sum119898
119895=1119896119894119895
As to the Gaussian function 119870(119909119894 119909119895) = 119870
1(119909119894 119909119895) times
1198702(119909119894 119909119895) Thus 119870 sube 119870
1and 119870 sube 119870
2 In this case 119864(119870) ge
119864(1198701) and 119864(119870) ge 119864(119870
2)
Definition 3 Given a set of samples 119880 = 1199091 1199092 119909
119898
each sample is described with 119899 features 119865 = 1198911 1198912
119891119899One has 119865
1 1198652sube 119865 119870
1and 119870
2are two kernel matrices
induced by 1198651and 119865
2 119870 is a new kernel function computed
with 1198651cup 1198652 Knowning 119865
1 the condition entropy of 119865
2is
defined as
119864 (1198701| 1198702) = 119864 (119870) minus 119864 (119870
1) (8)
As to the Gaussian kernel 119864(119870) ge 119864(1198701) and 119864(119870) ge 119864(119870
2)
so 119864(1198701| 1198702) ge 0 and 119864(119870
2| 1198701) ge 0
Definition 4 Given a set of samples 119880 = 1199091 1199092 119909
119898
each sample is described with 119899 features 119865 = 1198911 1198912 119891
119899
One has 1198651 1198652
sube 119865 1198701and 119870
2are two kernel matrices
induced by 1198651and 119865
2 119870 is a new kernel function computed
with 1198651cup 1198652 Then the mutual information of 119870
1and 119870
2is
defined as
MI (1198701 1198702) = 119864 (119870
1) + 119864 (119870
2) minus 119864 (119870) (9)
As to Gaussian kernel MI(1198701 1198702) = MI(119870
2 1198701) If 119870
1sube
1198702 we have MI(119870
1 1198702) = 119864(119870
2) and if 119870
2sube 1198701 we have
MI(1198701 1198702) = 119864(119870
1)
8 The Scientific World Journal
0 5 10 15 20 25 30 35 40 450
005
01
015
02
025
03
035
04
Feature index
Dep
ende
ncy
Figure 10 Fuzzy dependency between a single feature and decision
0 5 10 15 20 25 30 35 40 45
Feature index
Dep
ende
ncy
0
02
04
06
08
1
12
14
Figure 11 Kernelized mutual information between a single featureand decision
Please note that if 1198651sube 1198652 we have 119870
2sube 1198701 However
1198702sube 1198701does not mean 119865
1sube 1198652
Definition 5 Given a set of samples 119880 = 1199091 1199092 119909
119898
each sample is described with 119899 features 119865 = 1198911 1198912 119891
119899
1198651015840sube 119865119870 is a kernelmatrix over119880 in terms of1198651015840 and119863 is the
kernel matrix computed with the decision Then the featuresignificance 1198651015840 related to the decision is defined as
MI (119870119863) = 119864 (119870) + 119864 (119863) minus 119864 (119870119863) (10)
MI (119870119863) measures the importance of feature subset 1198651015840in the kernel space to distinguish different classes It can beunderstood as a kernelized version of Shannon informationentropy which is widely used feature evaluation selectionIn fact it is easy to derive the equivalence between thisentropy function and Shannon entropy in the condition thatthe attributes are discrete and the matching kernel is used
Now we show an example in gas turbine fault diagnosisWe collect 3581 samples from two sets of gas turbine systems1440 samples are healthy and the others belong to four kindsof faults load rejection sensor fault fuel switching and saltspray corrosion The numbers of samples are 45 588 71 and1437 respectively Thirteen thermometers are installed in theexhaust According to the approach described above we forma 40-dimensional vector to represent the state of the exhaust
1 2 3 4
0
02
04
06
08
1
Selected feature number
Fuzz
y de
pend
ency
03475
09006
0999609969
Figure 12 Fuzzy dependency between the selected features anddecisions (Features 5 37 2 and 3 are selected sequentially)
0
02
04
06
08
1
12
14
16
18
Kern
eliz
ed m
utua
l inf
orm
atio
n
1267
161215771510
1 2 3 4
Number of selected features
Figure 13 Kernelized mutual information between the selectedfeatures and decisions (Features 39 31 38 and 40 are selectedsequentially)
Obviously the classification task is not understandable insuch high dimensional space Moreover some features maybe redundant for classification learning which may confusethe learning algorithm and reduce modeling performanceSo it is a key preprocessing step to select the necessary andsufficient subsets
Here we compare the fuzzy rough set based feature evaluation algorithm with the proposed kernelized mutual information. Fuzzy dependency has been widely discussed and applied in feature selection and attribute reduction in recent years [26–28]. Fuzzy dependency can be understood as the average distance from each sample to its nearest neighbor belonging to a different class, while the kernelized mutual information reflects the relevance between the features and the decision in the kernel space.
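Under this reading, the single-feature fuzzy dependency of Figure 10 can be sketched as follows (a simplification of the fuzzy rough model of [26–28]; the function name and the assumption that feature values are scaled to [0, 1] are mine):

```python
import numpy as np

def fuzzy_dependency(x, y):
    """Average distance from each sample to its nearest neighbor of a
    different class, along a single feature x (values scaled to [0, 1])."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    deps = []
    for i in range(len(x)):
        # distances from sample i to all samples of the other classes
        other = np.abs(x[i] - x[y != y[i]])
        deps.append(other.min())
    return float(np.mean(deps))
```

A well-separated feature (large gaps between classes) scores close to 1, while an uninformative one scores close to 0, matching the per-feature ranking shown in Figure 10.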
Comparing Figures 10 and 11, a significant difference can be observed. As to fuzzy rough sets, Feature 5 produces the largest dependency, followed by Feature 38. However, Feature 39 gets the largest mutual information, and Feature 2 is the second. Thus, different feature evaluation functions lead to completely different results.
Figures 10 and 11 present the significance of single features. In applications, we should combine a set of features, so we consider a greedy search strategy: starting from an empty set, the best features are added one by one. In each round, we select the feature which produces the largest significance increment with respect to the selected subset. Both fuzzy dependency and kernelized mutual information increase monotonically as new attributes are added, and if the selected features are already sufficient for classification, the two functions keep invariant when any new attribute is added. So we can stop the algorithm when the increment of significance is less than a given threshold. The significances of the selected feature subsets are shown in Figures 12 and 13, respectively.

[Figure 14: Scatter plots in 2D spaces spanned by the feature pairs selected by fuzzy dependency.]
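The greedy forward search described above can be sketched as follows (a minimal sketch; the function name and the `evaluate` callback are mine, standing in for either fuzzy dependency or the kernelized mutual information with the decision):

```python
def greedy_forward_selection(n_features, evaluate, threshold=1e-3):
    """Greedily add the feature giving the largest significance increment.

    evaluate(subset) -> significance of a feature subset (e.g., fuzzy
    dependency or kernelized mutual information with the decision).
    Stops when the best available increment falls below `threshold`.
    """
    selected, score = [], 0.0
    remaining = set(range(n_features))
    while remaining:
        best_f, best_score = None, score
        for f in remaining:
            s = evaluate(selected + [f])
            if s > best_score:
                best_f, best_score = f, s
        if best_f is None or best_score - score < threshold:
            break  # no feature improves the significance enough
        selected.append(best_f)
        remaining.remove(best_f)
        score = best_score
    return selected, score
```

The monotonicity of both evaluation functions is what makes this stopping rule sound: once the subset is sufficient, every candidate increment drops below the threshold.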
In order to show the effectiveness of the algorithm, we give scatter plots in 2D spaces, as shown in Figures 14 to 16, spanned by the feature pairs selected by fuzzy dependency, kernelized mutual information, and Shannon mutual information, respectively. As to fuzzy dependency, we select Features 5, 37, 2, and 3; then there are 4 × 4 = 16 combinations of feature pairs. The subplot in the $i$th row and $j$th column of Figure 14 gives the scatter of samples in the 2D space spanned by the $i$th and $j$th selected features.
Observing the second subplot in the first row of Figure 14, we can find that the classification task is nonlinear. The first class is dispersed, and the third class is also located in several separate regions, which makes learning classification models difficult.
However, in the corresponding subplot of Figure 15, we can see that each class is relatively compact, which leads to a small intraclass distance. Moreover, the samples in the five classes can be separated with linear models, which also brings the benefit of learning a simple classification model.
Comparing Figures 15 and 16, we can find that in the feature spaces selected by Shannon mutual information the classes overlap or get entangled, which leads to bad classification performance.
5. Diagnosis Modeling with Information Entropy Based Decision Tree Algorithm
After selecting the informative features, we now turn to classification modeling. There are a great number of learning algorithms for building a classification model; generalization capability and interpretability are the two most important criteria in evaluating one. In fault diagnosis, a domain expert usually accepts a model which is consistent with his common knowledge. Thus, he expects the model to be understandable; otherwise, he will not believe its outputs. In addition, if the model is understandable, a domain expert can adapt it according to his prior knowledge, which makes the model suitable for different diagnosis objects.

[Figure 15: Scatter plots in 2D spaces spanned by the feature pairs selected by kernelized mutual information.]
Decision tree algorithms, including CART [29], ID3 [17], and C4.5 [18], are techniques for training an understandable classification model; the learned model can be transformed into a set of rules. All these algorithms build a decision tree from training samples: they start from a root node and select one of the features, with cuts, to divide the samples into different branches according to their feature values. This procedure is conducted recursively until a branch is pure or a stopping criterion is satisfied. The key difference lies in the evaluation function for selecting attributes and cuts: CART adopts the Gini and Twoing splitting rules, ID3 uses information gain, and C4.5 takes the information gain ratio. Moreover, C4.5 can deal with numerical attributes, unlike ID3. Competitive performance is usually observed with C4.5 in real-world applications compared with popular algorithms such as SVM and Bayesian networks. In this work, we introduce C4.5 to train classification models. The pseudocode of C4.5 is formulated as follows.
Decision tree algorithm C4.5.
Input: a set of training samples $U = \{x_1, x_2, \ldots, x_m\}$ with features $F = \{f_1, f_2, \ldots, f_n\}$; stopping criterion $\tau$.
Output: decision tree $T$.

(1) Check the sample set for the base cases.
(2) For each attribute $f$, compute the normalized information gain ratio from splitting on $f$.
(3) Let f_best be the attribute with the highest normalized information gain ratio.
(4) Create a decision node that splits on f_best.
(5) Recurse on the sublists obtained by splitting on f_best, and add those nodes as children of the current node, until the stopping criterion $\tau$ is satisfied.
(6) Output $T$.
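Step (2) is the heart of the algorithm. For a numerical attribute, C4.5 tries binary cuts between adjacent sorted values and scores each by the information gain ratio; a minimal sketch of this scoring (my own simplification, not Quinlan's code):

```python
import numpy as np
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label sequence, in bits."""
    m = len(labels)
    return -sum((c / m) * np.log2(c / m) for c in Counter(labels).values())

def best_gain_ratio(x, y):
    """Best binary cut on numeric attribute x, scored by information gain ratio.

    Returns (gain_ratio, cut); samples with x <= cut go left, others right.
    """
    order = np.argsort(x)
    x, y = np.asarray(x)[order], np.asarray(y)[order]
    base, m = entropy(y), len(y)
    best = (0.0, None)
    for i in range(1, m):
        if x[i] == x[i - 1]:
            continue  # no cut can separate equal values
        left, right = y[:i], y[i:]
        gain = base - (i / m) * entropy(left) - ((m - i) / m) * entropy(right)
        # split information: entropy of the partition sizes, used to normalize
        split_info = entropy([0] * i + [1] * (m - i))
        if split_info > 0 and gain / split_info > best[0]:
            best = (gain / split_info, (x[i - 1] + x[i]) / 2)
    return best
```

The attribute with the highest returned ratio becomes f_best in step (3), and its cut becomes the threshold at the decision node, which is exactly the form of the thresholds (e.g., F2 ≤ 0.50) appearing in Figures 17 and 18.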
[Figure 16: Scatter plots in 2D spaces spanned by the feature pairs selected by Shannon mutual information.]
[Figure 17: Decision tree trained on the features selected with fuzzy rough sets. The root splits on F2 (at 0.18 and 0.50); internal nodes split on F37 (at 0.49) and F3 (at 0.41).]
We input the two datasets into C4.5 and build two decision trees: Features 5, 37, 2, and 3 form the first dataset, and Features 39, 31, 38, and 40 form the second. The two trees are given in Figures 17 and 18, respectively.
[Figure 18: Decision tree trained on the features selected with kernelized mutual information. The root splits on F39 (at 0.17 and 0.45); internal nodes split on F38 (at 0.80) and F40 (at 0.42).]
Starting from the root node and following a branch to a leaf node, a rule is extracted from the tree. From the first tree, we can get five decision rules:
(1) if F2 > 0.50 and F37 > 0.49, then the decision is Class 4;
(2) if F2 > 0.50 and F37 ≤ 0.49, then the decision is Class 1;
(3) if 0.18 < F2 ≤ 0.50 and F3 > 0.41, then the decision is Class 5;
(4) if 0.18 < F2 ≤ 0.50 and F3 ≤ 0.41, then the decision is Class 3;
(5) if F2 ≤ 0.18, then the decision is Class 2.
From the second decision tree, we can also obtain five rules:

(1) if F39 > 0.45 and F38 > 0.80, then the decision is Class 4;
(2) if F39 > 0.45 and F38 ≤ 0.80, then the decision is Class 1;
(3) if 0.17 < F39 ≤ 0.45, then the decision is Class 2;
(4) if F39 ≤ 0.17 and F40 > 0.42, then the decision is Class 5;
(5) if F39 ≤ 0.17 and F40 ≤ 0.42, then the decision is Class 3.
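Rules of this form translate directly into code. A literal transcription of the five rules from the second tree (thresholds as given above; F38, F39, and F40 are the selected exhaust-state features):

```python
def classify_by_tree2(F38, F39, F40):
    """Decision rules extracted from the tree on the kernelized-MI features."""
    if F39 > 0.45:
        return 4 if F38 > 0.80 else 1   # rules (1) and (2)
    if F39 > 0.17:
        return 2                        # rule (3)
    return 5 if F40 > 0.42 else 3       # rules (4) and (5)
```

This is the sense in which the model is interpretable: a domain expert can read, check, and even edit each branch without retraining.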
We can see that the derived decision trees are rather simple, and five rules can be extracted from each. It is very easy for domain experts to understand and even revise the rules. As the classification task is relatively simple, the accuracy of each model reaches 97%. As new samples and faults are recorded by the system, more and more complex tasks may be stored; in that case, the models may become more and more complex.
6. Conclusions and Future Work
Automatic fault detection and diagnosis are highly desirable in some industries, such as offshore oil well drilling platforms, for such systems are self-monitoring with no staff on duty. In this work, we design an intelligent abnormality detection and fault recognition technique for the exhaust system of gas turbines based on information entropy, which is used for measuring the uniformity of exhaust temperatures, evaluating the significance of features in kernel spaces, and selecting splitting nodes for constructing decision trees. The main contributions of the work are in two parts. First, we introduce the entropy function to measure the uniformity of exhaust temperatures; the measurement is easy to compute and understand, and numerical experiments show its effectiveness. Second, we extend Shannon entropy to evaluate the significance of attributes in kernelized feature spaces, computing the relevance between a kernel matrix induced by a set of attributes and the matrix computed with the decision variable. Numerical experiments are also presented, and good results are derived.
Although this work gives an effective framework for automatic fault detection and recognition, the proposed technique has not yet been tested on large-scale real tasks. We have developed a remote state monitoring and fault diagnosis system, and large-scale data are flooding into the center. In the future, we will improve these techniques and develop a reliable diagnosis system.
Conflict of Interests
The authors declare that they have no conflict of interests regarding the publication of this paper.
Acknowledgment
This work is partially supported by the National Natural Science Foundation of China under Grants 61222210 and 61105054.
References
[1] C. Wang, L. Xu, and W. Peng, "Conceptual design of remote monitoring and fault diagnosis systems," Information Systems, vol. 32, no. 7, pp. 996–1004, 2007.
[2] W. Donat, K. Choi, W. An, S. Singh, and K. Pattipati, "Data visualization, data reduction and classifier fusion for intelligent fault diagnosis in gas turbine engines," Journal of Engineering for Gas Turbines and Power, vol. 130, no. 4, Article ID 041602, 2008.
[3] Y. G. Li and P. Nilkitsaranont, "Gas turbine performance prognostic for condition-based maintenance," Applied Energy, vol. 86, no. 10, pp. 2152–2161, 2009.
[4] H. Bassily, R. Lund, and J. Wagner, "Fault detection in multivariate signals with applications to gas turbines," IEEE Transactions on Signal Processing, vol. 57, no. 3, pp. 835–842, 2009.
[5] K. Young, D. Lee, V. Vitali, and Y. Ming, "A fault diagnosis method for industrial gas turbines using Bayesian data analysis," Journal of Engineering for Gas Turbines and Power, vol. 132, Article ID 041602, 2010.
[6] B. Yu, D. Liu, and T. Zhang, "Fault diagnosis for micro-gas turbine engine sensors via wavelet entropy," Sensors, vol. 11, no. 10, pp. 9928–9941, 2011.
[7] S. Wu, P. Wu, C. Wu, J. Ding, and C. Wang, "Bearing fault diagnosis based on multiscale permutation entropy and support vector machine," Entropy, vol. 14, no. 8, pp. 1343–1356, 2012.
[8] S. Wu, C. Wu, T. Wu, and C. Wang, "Multi-scale analysis based ball bearing defect diagnostics using Mahalanobis distance and support vector machine," Entropy, vol. 15, no. 2, pp. 416–433, 2013.
[9] H. A. Nozari, M. A. Shoorehdeli, S. Simani, and H. D. Banadaki, "Model-based robust fault detection and isolation of an industrial gas turbine prototype using soft computing techniques," Neurocomputing, vol. 91, pp. 29–47, 2012.
[10] S. Sarkar, X. Jin, and A. Ray, "Data-driven fault detection in aircraft engines with noisy sensor measurements," Journal of Engineering for Gas Turbines and Power, vol. 133, no. 8, Article ID 081602, 2011.
[11] C. E. Shannon, "A mathematical theory of communication," The Bell System Technical Journal, vol. 27, pp. 379–423, 1948.
[12] S. M. Pincus, "Approximate entropy as a measure of system complexity," Proceedings of the National Academy of Sciences of the United States of America, vol. 88, no. 6, pp. 2297–2301, 1991.
[13] L. I. Kuncheva and C. J. Whitaker, "Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy," Machine Learning, vol. 51, no. 2, pp. 181–207, 2003.
[14] I. Csiszar, "Axiomatic characterizations of information measures," Entropy, vol. 10, no. 3, pp. 261–273, 2008.
[15] M. Zanin, L. Zunino, O. A. Rosso, and D. Papo, "Permutation entropy and its main biomedical and econophysics applications: a review," Entropy, vol. 14, no. 8, pp. 1553–1577, 2012.
[16] C. Wang and H. Shen, "Information theory in scientific visualization," Entropy, vol. 13, pp. 254–273, 2011.
[17] J. R. Quinlan, "Induction of decision trees," Machine Learning, vol. 1, no. 1, pp. 81–106, 1986.
[18] J. R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, 1993.
[19] D. N. Reshef, Y. A. Reshef, H. K. Finucane et al., "Detecting novel associations in large data sets," Science, vol. 334, no. 6062, pp. 1518–1524, 2011.
[20] C. J. C. Burges, "A tutorial on support vector machines for pattern recognition," Data Mining and Knowledge Discovery, vol. 2, no. 2, pp. 121–167, 1998.
[21] Q. Hu, D. Yu, W. Pedrycz, and D. Chen, "Kernelized fuzzy rough sets and their applications," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 11, pp. 1649–1667, 2011.
[22] Q. Hu, H. Zhao, Z. Xie, and D. Yu, "Consistency based attribute reduction," in Advances in Knowledge Discovery and Data Mining, Z.-H. Zhou, H. Li, and Q. Yang, Eds., vol. 4426 of Lecture Notes in Computer Science, pp. 96–107, Springer, Berlin, Germany, 2007.
[23] Q. Hu, D. Yu, and Z. Xie, "Information-preserving hybrid data reduction based on fuzzy-rough techniques," Pattern Recognition Letters, vol. 27, no. 5, pp. 414–423, 2006.
[24] R. Gilad-Bachrach, A. Navot, and N. Tishby, "Margin based feature selection: theory and algorithms," in Proceedings of the 21st International Conference on Machine Learning (ICML '04), pp. 337–344, July 2004.
[25] J. Shawe-Taylor and N. Cristianini, Kernel Methods for Pattern Analysis, Cambridge University Press, 2004.
[26] Q. Hu, Z. Xie, and D. Yu, "Hybrid attribute reduction based on a novel fuzzy-rough model and information granulation," Pattern Recognition, vol. 40, no. 12, pp. 3509–3521, 2007.
[27] R. Jensen and Q. Shen, "New approaches to fuzzy-rough feature selection," IEEE Transactions on Fuzzy Systems, vol. 17, no. 4, pp. 824–838, 2009.
[28] S. Zhao, E. C. C. Tsang, and D. Chen, "The model of fuzzy variable precision rough sets," IEEE Transactions on Fuzzy Systems, vol. 17, no. 2, pp. 451–467, 2009.
[29] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone, Classification and Regression Trees, Wadsworth & Brooks/Cole, Monterey, Calif, USA, 1984.
1198651015840sube 119865119870 is a kernelmatrix over119880 in terms of1198651015840 and119863 is the
kernel matrix computed with the decision Then the featuresignificance 1198651015840 related to the decision is defined as
MI (119870119863) = 119864 (119870) + 119864 (119863) minus 119864 (119870119863) (10)
MI (119870119863) measures the importance of feature subset 1198651015840in the kernel space to distinguish different classes It can beunderstood as a kernelized version of Shannon informationentropy which is widely used feature evaluation selectionIn fact it is easy to derive the equivalence between thisentropy function and Shannon entropy in the condition thatthe attributes are discrete and the matching kernel is used
Now we show an example in gas turbine fault diagnosisWe collect 3581 samples from two sets of gas turbine systems1440 samples are healthy and the others belong to four kindsof faults load rejection sensor fault fuel switching and saltspray corrosion The numbers of samples are 45 588 71 and1437 respectively Thirteen thermometers are installed in theexhaust According to the approach described above we forma 40-dimensional vector to represent the state of the exhaust
1 2 3 4
0
02
04
06
08
1
Selected feature number
Fuzz
y de
pend
ency
03475
09006
0999609969
Figure 12 Fuzzy dependency between the selected features anddecisions (Features 5 37 2 and 3 are selected sequentially)
0
02
04
06
08
1
12
14
16
18
Kern
eliz
ed m
utua
l inf
orm
atio
n
1267
161215771510
1 2 3 4
Number of selected features
Figure 13 Kernelized mutual information between the selectedfeatures and decisions (Features 39 31 38 and 40 are selectedsequentially)
Obviously the classification task is not understandable insuch high dimensional space Moreover some features maybe redundant for classification learning which may confusethe learning algorithm and reduce modeling performanceSo it is a key preprocessing step to select the necessary andsufficient subsets
Here we compare the fuzzy rough set based featureevaluation algorithm with the proposed kernelized mutualinformation Fuzzy dependency has been widely discussedand applied in feature selection and attribute reduction theseyears [26ndash28] Fuzzy dependency can be understood as theaverage distance from the samples and their nearest neighborbelonging to different classes while the kernelized mutualinformation reflects the relevance between features anddecision in the kernel space
Comparing Figures 10 and 11 significant difference isobtained As to fuzzy rough sets Feature 5 produces thelargest dependency and then Feature 38 However Feature39 gets the largest mutual information and Feature 2 is thesecond one Thus different feature evaluation functions willlead to completely different results
Figures 10 and 11 present the significance of singlefeatures In applications we should combine a set of featuresNow we consider a greedy search strategy Starting from anempty set and the best features are added one by one In
The Scientific World Journal 9
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
1
2
3
4
5
1
2
3
4
5
1
2
3
4
5
1
2
3
4
5
Figure 14 Scatter plots in 2D space expended with feature pairs selected by fuzzy dependency
each round we select a feature which produces the largestsignificance increment with the selected subset Both fuzzydependency and kernelized mutual information increasemonotonically if new attributes are added If the selectedfeatures are sufficient for classification these two functionswill keep invariant by adding any new attributes So we canstop the algorithm if the increment of significance is less thana given threshold The significances of the selected featuresubset are shown in Figures 12 and 13 respectively
In order to show the effectiveness of the algorithm wegive the scatter plots in 2D spaces as shown in Figures 14 to16 which are expended by the feature pairs selected by fuzzydependency kernelized mutual information and Shannonmutual information As to fuzzy dependency we selectFeatures 5 37 2 and 3Then there are 4times4 = 16 combinationsof feature pairs The subplot in the 119894th row and 119895th column inFigure 14 gives the scatters of samples in 2D space expandedby the 119894th selected feature and the 119895th selected feature
Observing the 2nd subplots in the first row of Figure 14we can find that the classification task is nonlinear The firstclass is dispersed and the third class is also located at different
regions which leads to the difficulty in learning classificationmodels
However in the corresponding subplot of Figure 15 wecan see that each class is relatively compact which leads to asmall intraclass distanceMoreover the samples in five classescan be classified with some linear models which also bringbenefit for learning a simple classification model
Comparing Figures 15 and 16 we can find that differentclasses are overlapped in feature spaces selected by Shannonmutual information or get entangled which leads to the badclassification performance
5 Diagnosis Modeling with InformationEntropy Based Decision Tree Algorithm
After selecting the informative features we now go to clas-sification modeling There are a great number of learningalgorithms for building a classificationmodel Generalizationcapability and interpretability are the two most importantcriteria in evaluating an algorithm As to fault diagnosis adomain expert usually accepts a model which is consistent
10 The Scientific World Journal
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
1
2
3
4
5
1
2
3
4
5
1
2
3
4
5
1
2
3
4
5
Figure 15 Scatter in 2D space expended with feature pairs selected by kernelized mutual information
with his common knowledge Thus he expects the modelis understandable otherwise he will not believe the outputsof the model In addition if the model is understandable adomain expert can adapt it according to his prior knowledgewhich makes the model suitable for different diagnosisobjects
Decision tree algorithms including CART [29] ID3[17] and C45 [18] are such techniques for training anunderstandable classification model The learned model canbe transformed into a set of rules All these algorithms builda decision tree from training samples They start from a rootnode and select one of the features to divide the samples withcuts into different branches according to their feature valuesThis procedure is interactively conducted until the branch ispure or a stopping criterion is satisfiedThe key difference liesin the evaluation function in selecting attributes or cuts InCART splitting rules GINI and Twoing are adopted whileID3 uses information gain and C45 takes information gainratioMoreover C45 can deal with numerical attributes com-pared with ID3 Competent performance is usually observedwith C45 in real-world applications compared with somepopular algorithms including SVM and Baysian net In this
work, we introduce C4.5 to train classification models. The pseudocode of C4.5 is formulated as follows.

Decision tree algorithm C4.5
Input: a set of training samples U = {x_1, x_2, ..., x_m} with features F = {f_1, f_2, ..., f_n}; stopping criterion τ.
Output: decision tree T.
(1) Check the sample set against the stopping criterion τ.
(2) For each attribute f, compute the normalized information gain ratio from splitting on f.
(3) Let f_best be the attribute with the highest normalized information gain ratio.
(4) Create a decision node that splits on f_best.
(5) Recurse on the sublists obtained by splitting on f_best, and add those nodes as children of the node, until the stopping criterion τ is satisfied.
(6) Output T.
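As an illustrative sketch (not the authors' implementation), the normalized gain-ratio criterion of step (2) can be computed for a discrete attribute as follows; the function names are ours.

```python
import math
from collections import Counter

def entropy(labels):
    # Shannon entropy of a list of class labels, in bits
    m = len(labels)
    return -sum((c / m) * math.log2(c / m) for c in Counter(labels).values())

def gain_ratio(values, labels):
    # Normalized information gain ratio for splitting on a discrete
    # attribute, the criterion C4.5 uses to rank candidate attributes.
    m = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    conditional = sum(len(g) / m * entropy(g) for g in groups.values())
    gain = entropy(labels) - conditional
    split_info = entropy(values)  # penalizes attributes with many values
    return gain / split_info if split_info > 0 else 0.0
```

A perfectly predictive binary attribute yields a ratio of 1.0, while an attribute independent of the class yields 0.0.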
Figure 16: Scatter plots in 2D space spanned by feature pairs selected with Shannon mutual information.
Figure 17: Decision tree trained on the features selected with fuzzy rough sets.
We input the two data sets into C4.5 and build two decision trees. Features 5, 37, 2, and 3 are included in the first dataset, and Features 39, 31, 38, and 40 are selected in the second dataset. The two trees are given in Figures 17 and 18, respectively.
Figure 18: Decision tree trained on the features selected with kernelized mutual information.
We traverse from the root node to a leaf node along each branch, and a rule is extracted from the tree. As to the first tree, we can get five decision rules:
(1) if F2 > 0.50 and F37 > 0.49, then the decision is Class 4;
(2) if F2 > 0.50 and F37 ≤ 0.49, then the decision is Class 1;
(3) if 0.18 < F2 ≤ 0.50 and F3 > 0.41, then the decision is Class 5;
(4) if 0.18 < F2 ≤ 0.50 and F3 ≤ 0.41, then the decision is Class 3;
(5) if F2 ≤ 0.18, then the decision is Class 2.
As to the second decision tree, we can also obtain five rules:
(1) if F39 > 0.45 and F38 > 0.80, then the decision is Class 4;
(2) if F39 > 0.45 and F38 ≤ 0.80, then the decision is Class 1;
(3) if 0.17 < F39 ≤ 0.45, then the decision is Class 2;
(4) if F39 ≤ 0.17 and F40 > 0.42, then the decision is Class 5;
(5) if F39 ≤ 0.17 and F40 ≤ 0.42, then the decision is Class 3.
We can see that the derived decision trees are rather simple, and five rules can be extracted from each. It is very easy for domain experts to understand the rules and even revise them. As the classification task is relatively simple, the accuracy of each model reaches 97%. As new samples and faults are recorded by the system, more and more complex tasks may be stored. In that case, the model may become more and more complex.
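For illustration, the five rules of the second tree can be encoded directly as a small function over the three selected features; this is a hand translation of the rules listed above, with a function name of our choosing.

```python
def classify_second_tree(f39, f40, f38):
    # Decision rules extracted from the tree trained on the features
    # selected with kernelized mutual information (Figure 18).
    if f39 > 0.45:
        return 4 if f38 > 0.80 else 1   # rules (1) and (2)
    if f39 > 0.17:                      # 0.17 < F39 <= 0.45, rule (3)
        return 2
    return 5 if f40 > 0.42 else 3       # rules (4) and (5)
```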
6. Conclusions and Future Works

Automatic fault detection and diagnosis are highly desirable in some industries, such as offshore oil well drilling platforms, for such systems are self-monitoring without personnel on duty. In this work, we design an intelligent abnormality detection and fault recognition technique for the exhaust system of gas turbines based on information entropy, which is used in measuring the uniformity of exhaust temperatures, evaluating the significance of features in kernel spaces, and selecting splitting nodes for constructing decision trees. The main contributions of the work are two parts. First, we introduce the entropy function to measure the uniformity of exhaust temperatures. The measurement is easy to compute and understand, and numerical experiments also show its effectiveness. Second, we extend Shannon entropy for evaluating the significance of attributes in kernelized feature spaces. We compute the relevance between a kernel matrix induced with a set of attributes and the matrix computed with the decision variable. Some numerical experiments are also presented, and good results are derived.

Although this work gives an effective framework for automatic fault detection and recognition, the proposed technique has not been tested on large-scale real tasks. We have developed a remote state monitoring and fault diagnosis system, and large-scale data are flooding into the center. In the future, we will improve these techniques and develop a reliable diagnosis system.
Conflict of Interests

The authors declare that they have no conflict of interests regarding the publication of this paper.

Acknowledgment

This work is partially supported by the National Natural Science Foundation under Grants 61222210 and 61105054.
References
[1] C. Wang, L. Xu, and W. Peng, "Conceptual design of remote monitoring and fault diagnosis systems," Information Systems, vol. 32, no. 7, pp. 996–1004, 2007.
[2] W. Donat, K. Choi, W. An, S. Singh, and K. Pattipati, "Data visualization, data reduction and classifier fusion for intelligent fault diagnosis in gas turbine engines," Journal of Engineering for Gas Turbines and Power, vol. 130, no. 4, Article ID 041602, 2008.
[3] Y. G. Li and P. Nilkitsaranont, "Gas turbine performance prognostic for condition-based maintenance," Applied Energy, vol. 86, no. 10, pp. 2152–2161, 2009.
[4] H. Bassily, R. Lund, and J. Wagner, "Fault detection in multivariate signals with applications to gas turbines," IEEE Transactions on Signal Processing, vol. 57, no. 3, pp. 835–842, 2009.
[5] K. Young, D. Lee, V. Vitali, and Y. Ming, "A fault diagnosis method for industrial gas turbines using Bayesian data analysis," Journal of Engineering for Gas Turbines and Power, vol. 132, Article ID 041602, 2010.
[6] B. Yu, D. Liu, and T. Zhang, "Fault diagnosis for micro-gas turbine engine sensors via wavelet entropy," Sensors, vol. 11, no. 10, pp. 9928–9941, 2011.
[7] S. Wu, P. Wu, C. Wu, J. Ding, and C. Wang, "Bearing fault diagnosis based on multiscale permutation entropy and support vector machine," Entropy, vol. 14, no. 8, pp. 1343–1356, 2012.
[8] S. Wu, C. Wu, T. Wu, and C. Wang, "Multi-scale analysis based ball bearing defect diagnostics using Mahalanobis distance and support vector machine," Entropy, vol. 15, no. 2, pp. 416–433, 2013.
[9] H. A. Nozari, M. A. Shoorehdeli, S. Simani, and H. D. Banadaki, "Model-based robust fault detection and isolation of an industrial gas turbine prototype using soft computing techniques," Neurocomputing, vol. 91, pp. 29–47, 2012.
[10] S. Sarkar, X. Jin, and A. Ray, "Data-driven fault detection in aircraft engines with noisy sensor measurements," Journal of Engineering for Gas Turbines and Power, vol. 133, no. 8, Article ID 081602, 2011.
[11] C. E. Shannon, "A mathematical theory of communication," The Bell System Technical Journal, vol. 27, pp. 379–423, 1948.
[12] S. M. Pincus, "Approximate entropy as a measure of system complexity," Proceedings of the National Academy of Sciences of the United States of America, vol. 88, no. 6, pp. 2297–2301, 1991.
[13] L. I. Kuncheva and C. J. Whitaker, "Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy," Machine Learning, vol. 51, no. 2, pp. 181–207, 2003.
[14] I. Csiszar, "Axiomatic characterizations of information measures," Entropy, vol. 10, no. 3, pp. 261–273, 2008.
[15] M. Zanin, L. Zunino, O. A. Rosso, and D. Papo, "Permutation entropy and its main biomedical and econophysics applications: a review," Entropy, vol. 14, no. 8, pp. 1553–1577, 2012.
[16] C. Wang and H. Shen, "Information theory in scientific visualization," Entropy, vol. 13, pp. 254–273, 2011.
[17] J. R. Quinlan, "Induction of decision trees," Machine Learning, vol. 1, no. 1, pp. 81–106, 1986.
[18] J. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, 1993.
[19] D. N. Reshef, Y. A. Reshef, H. K. Finucane et al., "Detecting novel associations in large data sets," Science, vol. 334, no. 6062, pp. 1518–1524, 2011.
[20] C. J. C. Burges, "A tutorial on support vector machines for pattern recognition," Data Mining and Knowledge Discovery, vol. 2, no. 2, pp. 121–167, 1998.
[21] Q. Hu, D. Yu, W. Pedrycz, and D. Chen, "Kernelized fuzzy rough sets and their applications," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 11, pp. 1649–1667, 2011.
[22] Q. Hu, H. Zhao, Z. Xie, and D. Yu, "Consistency based attribute reduction," in Advances in Knowledge Discovery and Data Mining, Z.-H. Zhou, H. Li, and Q. Yang, Eds., vol. 4426 of Lecture Notes in Computer Science, pp. 96–107, Springer, Berlin, Germany, 2007.
[23] Q. Hu, D. Yu, and Z. Xie, "Information-preserving hybrid data reduction based on fuzzy-rough techniques," Pattern Recognition Letters, vol. 27, no. 5, pp. 414–423, 2006.
[24] R. Gilad-Bachrach, A. Navot, and N. Tishby, "Margin based feature selection: theory and algorithms," in Proceedings of the 21st International Conference on Machine Learning (ICML '04), pp. 337–344, July 2004.
[25] J. Shawe-Taylor and N. Cristianini, Kernel Methods for Pattern Analysis, Cambridge University Press, 2004.
[26] Q. Hu, Z. Xie, and D. Yu, "Hybrid attribute reduction based on a novel fuzzy-rough model and information granulation," Pattern Recognition, vol. 40, no. 12, pp. 3509–3521, 2007.
[27] R. Jensen and Q. Shen, "New approaches to fuzzy-rough feature selection," IEEE Transactions on Fuzzy Systems, vol. 17, no. 4, pp. 824–838, 2009.
[28] S. Zhao, E. C. C. Tsang, and D. Chen, "The model of fuzzy variable precision rough sets," IEEE Transactions on Fuzzy Systems, vol. 17, no. 2, pp. 451–467, 2009.
[29] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone, Classification and Regression Trees, Wadsworth & Brooks/Cole Advanced Books & Software, Monterey, Calif, USA, 1984.
Figure 9: Samples of temperature distribution at different moments: (a) rose map of exhaust temperatures at t = 500; (b) at t = 758; (c) at t = 820; (d) at t = 1220.
distinguished with the current features, while E(K) = log_2 m means any pair of samples is different from each other, so they can be distinguished. These are two extreme cases. In real-world applications, part of the samples can be discerned with the available features while the others cannot. In this case, the entropy function takes a value in the interval [0, log_2 m]. Moreover, it is easy to show that if K_1 ⊆ K_2, then E(K_1) ≥ E(K_2), where K_1 ⊆ K_2 means K_1(x_i, x_j) ≤ K_2(x_i, x_j) for all i, j.
Definition 2. Given a set of samples U = {x_1, x_2, ..., x_m}, where each sample is described with n features F = {f_1, f_2, ..., f_n}, let F_1, F_2 ⊆ F, let K_1 and K_2 be the two kernel matrices induced by F_1 and F_2, and let K be the new kernel matrix computed with F_1 ∪ F_2. Then the joint entropy of F_1 and F_2 is defined as

E(K_1, K_2) = E(K) = -(1/m) Σ_{i=1}^{m} log_2 (K_i / m),    (7)

where K_i = Σ_{j=1}^{m} k_{ij}.

As to the Gaussian function, K(x_i, x_j) = K_1(x_i, x_j) × K_2(x_i, x_j). Thus K ⊆ K_1 and K ⊆ K_2. In this case, E(K) ≥ E(K_1) and E(K) ≥ E(K_2).
Definition 3. Given a set of samples U = {x_1, x_2, ..., x_m}, where each sample is described with n features F = {f_1, f_2, ..., f_n}, let F_1, F_2 ⊆ F, let K_1 and K_2 be the two kernel matrices induced by F_1 and F_2, and let K be the new kernel matrix computed with F_1 ∪ F_2. Knowing F_1, the condition entropy of F_2 is defined as

E(K_2 | K_1) = E(K) - E(K_1).    (8)

As to the Gaussian kernel, E(K) ≥ E(K_1) and E(K) ≥ E(K_2), so E(K_1 | K_2) ≥ 0 and E(K_2 | K_1) ≥ 0.
Definition 4. Given a set of samples U = {x_1, x_2, ..., x_m}, where each sample is described with n features F = {f_1, f_2, ..., f_n}, let F_1, F_2 ⊆ F, let K_1 and K_2 be the two kernel matrices induced by F_1 and F_2, and let K be the new kernel matrix computed with F_1 ∪ F_2. Then the mutual information of K_1 and K_2 is defined as

MI(K_1, K_2) = E(K_1) + E(K_2) - E(K).    (9)

As to the Gaussian kernel, MI(K_1, K_2) = MI(K_2, K_1). If K_1 ⊆ K_2, we have MI(K_1, K_2) = E(K_2), and if K_2 ⊆ K_1, we have MI(K_1, K_2) = E(K_1).
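A minimal numerical sketch of Definitions 2-4, assuming Gaussian kernels so that the joint kernel matrix is the elementwise product; this is our reconstruction for illustration, not the authors' code.

```python
import numpy as np

def kernel_entropy(K):
    # E(K) = -(1/m) * sum_i log2(K_i / m), with K_i = sum_j k_ij  -- Eq. (7)
    m = K.shape[0]
    return -np.mean(np.log2(K.sum(axis=1) / m))

def gaussian_kernel(X, sigma=0.2):
    # Gaussian kernel matrix over the rows of X
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2 * sigma ** 2))

def kernel_mutual_info(K1, K2):
    # MI(K1, K2) = E(K1) + E(K2) - E(K), K = K1 * K2 elementwise  -- Eq. (9)
    return kernel_entropy(K1) + kernel_entropy(K2) - kernel_entropy(K1 * K2)
```

Since Gaussian kernel values lie in (0, 1], K_1 * K_2 ⊆ K_1, so the conditional entropy E(K_2 | K_1) = E(K_1 * K_2) - E(K_1) of Eq. (8) is nonnegative, as stated above.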
Figure 10: Fuzzy dependency between a single feature and the decision.
Figure 11: Kernelized mutual information between a single feature and the decision.
Please note that if F_1 ⊆ F_2, we have K_2 ⊆ K_1. However, K_2 ⊆ K_1 does not mean F_1 ⊆ F_2.
Definition 5. Given a set of samples U = {x_1, x_2, ..., x_m}, where each sample is described with n features F = {f_1, f_2, ..., f_n}, let F′ ⊆ F, let K be the kernel matrix over U in terms of F′, and let D be the kernel matrix computed with the decision. Then the significance of the features F′ related to the decision is defined as

MI(K, D) = E(K) + E(D) - E(K, D).    (10)

MI(K, D) measures the importance of the feature subset F′ in the kernel space for distinguishing different classes. It can be understood as a kernelized version of Shannon mutual information, which is widely used in feature evaluation and selection. In fact, it is easy to derive the equivalence between this entropy function and Shannon entropy in the condition that the attributes are discrete and the matching kernel is used.
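The claimed equivalence can be checked directly: under the matching kernel k(x, y) = 1 if x = y and 0 otherwise, K_i counts the samples taking the same value as x_i, and Eq. (7) reduces to the Shannon entropy of the value distribution. A small sketch (ours, for illustration):

```python
import math
from collections import Counter

def shannon_entropy(values):
    # Classical Shannon entropy of a discrete sample, in bits
    m = len(values)
    return -sum((c / m) * math.log2(c / m) for c in Counter(values).values())

def matching_kernel_entropy(values):
    # E(K) of Eq. (7) under the matching kernel: K_i = #{j : x_j = x_i},
    # so E(K) = -(1/m) * sum_i log2(K_i / m)
    m = len(values)
    counts = Counter(values)
    return -sum(math.log2(counts[v] / m) for v in values) / m
```

For the sample ['a', 'a', 'b', 'c'], both functions return 1.5 bits.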
Now we show an example in gas turbine fault diagnosis. We collected 3581 samples from two sets of gas turbine systems; 1440 samples are healthy, and the others belong to four kinds of faults: load rejection, sensor fault, fuel switching, and salt spray corrosion. The numbers of samples are 45, 588, 71, and 1437, respectively. Thirteen thermometers are installed in the exhaust. According to the approach described above, we form a 40-dimensional vector to represent the state of the exhaust.
Figure 12: Fuzzy dependency between the selected features and decisions (Features 5, 37, 2, and 3 are selected sequentially; the dependency rises through 0.3475, 0.9006, 0.9969, and 0.9996).
Figure 13: Kernelized mutual information between the selected features and decisions (Features 39, 31, 38, and 40 are selected sequentially; the mutual information rises through 1.267, 1.510, 1.577, and 1.612).
Obviously, the classification task is not understandable in such a high-dimensional space. Moreover, some features may be redundant for classification learning, which may confuse the learning algorithm and reduce modeling performance. So it is a key preprocessing step to select a necessary and sufficient feature subset.
Here we compare the fuzzy rough set based feature evaluation algorithm with the proposed kernelized mutual information. Fuzzy dependency has been widely discussed and applied in feature selection and attribute reduction in recent years [26–28]. Fuzzy dependency can be understood as the average distance from the samples to their nearest neighbors belonging to different classes, while the kernelized mutual information reflects the relevance between the features and the decision in the kernel space.
Comparing Figures 10 and 11, a significant difference is observed. As to fuzzy rough sets, Feature 5 produces the largest dependency, followed by Feature 38. However, Feature 39 gets the largest mutual information, and Feature 2 is the second one. Thus, different feature evaluation functions lead to completely different results.
Figures 10 and 11 present the significance of single features. In applications, we should combine a set of features. Now we consider a greedy search strategy: starting from an empty set, the best features are added one by one. In
Figure 14: Scatter plots in 2D space spanned by feature pairs selected with fuzzy dependency.
each round, we select the feature that produces the largest significance increment with the selected subset. Both fuzzy dependency and kernelized mutual information increase monotonically as new attributes are added. If the selected features are sufficient for classification, these two functions keep invariant when any new attribute is added. So we can stop the algorithm if the increment of significance is less than a given threshold. The significances of the selected feature subsets are shown in Figures 12 and 13, respectively.
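The greedy procedure described above can be sketched as follows, with either significance function (fuzzy dependency or kernelized mutual information) plugged in as `significance`; this is an illustrative reconstruction with names of our choosing, not the authors' code.

```python
def greedy_select(features, significance, threshold=1e-3):
    # Forward greedy search: start from the empty set and repeatedly add
    # the feature yielding the largest significance increment; stop when
    # the increment falls below the given threshold.
    selected, current = [], 0.0
    remaining = list(features)
    while remaining:
        best = max(remaining, key=lambda f: significance(selected + [f]))
        increment = significance(selected + [best]) - current
        if increment < threshold:
            break
        selected.append(best)
        current += increment
        remaining.remove(best)
    return selected
```

With a monotone significance function, the loop stops as soon as adding any remaining feature no longer improves the subset, matching the stopping rule described above.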
In order to show the effectiveness of the algorithm, we give the scatter plots in 2D spaces, as shown in Figures 14 to 16, which are spanned by the feature pairs selected by fuzzy dependency, kernelized mutual information, and Shannon mutual information. As to fuzzy dependency, we select Features 5, 37, 2, and 3. Then there are 4 × 4 = 16 combinations of feature pairs. The subplot in the i-th row and j-th column of Figure 14 gives the scatter of samples in the 2D space spanned by the i-th and the j-th selected features.
Observing the 2nd subplot in the first row of Figure 14, we can find that the classification task is nonlinear. The first class is dispersed, and the third class is also located in different regions, which leads to difficulty in learning classification models.
However, in the corresponding subplot of Figure 15, we can see that each class is relatively compact, which leads to a small intraclass distance. Moreover, the samples in the five classes can be separated with some linear models, which also brings benefit for learning a simple classification model.
Comparing Figures 15 and 16, we can find that in the feature spaces selected by Shannon mutual information, different classes overlap or get entangled, which leads to bad classification performance.
5 Diagnosis Modeling with InformationEntropy Based Decision Tree Algorithm
After selecting the informative features we now go to clas-sification modeling There are a great number of learningalgorithms for building a classificationmodel Generalizationcapability and interpretability are the two most importantcriteria in evaluating an algorithm As to fault diagnosis adomain expert usually accepts a model which is consistent
10 The Scientific World Journal
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
1
2
3
4
5
1
2
3
4
5
1
2
3
4
5
1
2
3
4
5
Figure 15 Scatter in 2D space expended with feature pairs selected by kernelized mutual information
with his common knowledge Thus he expects the modelis understandable otherwise he will not believe the outputsof the model In addition if the model is understandable adomain expert can adapt it according to his prior knowledgewhich makes the model suitable for different diagnosisobjects
Decision tree algorithms including CART [29] ID3[17] and C45 [18] are such techniques for training anunderstandable classification model The learned model canbe transformed into a set of rules All these algorithms builda decision tree from training samples They start from a rootnode and select one of the features to divide the samples withcuts into different branches according to their feature valuesThis procedure is interactively conducted until the branch ispure or a stopping criterion is satisfiedThe key difference liesin the evaluation function in selecting attributes or cuts InCART splitting rules GINI and Twoing are adopted whileID3 uses information gain and C45 takes information gainratioMoreover C45 can deal with numerical attributes com-pared with ID3 Competent performance is usually observedwith C45 in real-world applications compared with somepopular algorithms including SVM and Baysian net In this
work we introduce C45 to train classification models Thepseudocode of C45 is formulated as follows
Decision tree algorithm C45Input a set of training samples 119880 = 119909
1 1199092 119909
119898
with features 119865 = 1198911 1198912 119891
119899
Stopping criterion 120591Output decision tree 119879
(1) Check for sample set(2) For each attribute 119891 compute the normalized infor-
mation gain ratio from splitting on 119886(3) Let f best be the attribute with the highest normalized
information gain(4) Create a decision node that splits on f best(5) Recurse on the sublists obtained by splitting on
f best and add those nodes as children of node untilstopping criterion 120591 is satisfied
(6) Output 119879
The Scientific World Journal 11
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
1
2
3
4
5
1
2
3
4
5
1
2
3
4
5
1
2
3
4
5
Figure 16 Scatter in 2D space expended with feature pairs selected by Shannon mutual information
F2
le050gt050
F37 F2
le041
Class 2
gt018
Class 5
le049
Class 1
gt049
Class 4 F3
Class 3
gt041
le018
Figure 17 Decision tree trained on the features selected with fuzzyrough sets
We input the data sets into C45 and build the followingtwo decision trees Features 5 37 2 and 3 are included in thefirst dataset and Features 39 31 38 and 40 are selected in thesecond dataset The two trees are given in Figures 17 and 18respectively
F39le017gt017
F39 F40
le042
Class 3
gt042
Class 5
le045
Class 2
gt045
F38le080gt080
Class 1Class 4
Figure 18 Decision tree trained on the features selected withkernelized mutual information
We start from the root node to a leaf node along thebranch and then a piece of rule is extracted from the treeAs to the first tree we can get five decision rules
(1) if F2 gt 050 and F37 gt 049 then the decision is Class4
12 The Scientific World Journal
(2) if F2 gt 050 and F37 le 049 then the decision is Class1
(3) if 018 lt F2 le 050 and F3 gt 041 then the decision isClass 5
(4) if 018 lt F2 le 050 and F3 le 041 then the decision isClass 3
(5) if F2 le 018 then the decision is Class 2
As to the second decision tree we can also obtain somerules as
(1) if F39 gt 045 and F38 gt 080 then the decision is Class4
(2) if F39 gt 045 and F38 le 080 then the decision is Class1
(3) if 017 lt F39 le 045 then the decision is Class 2(4) if F39 le 017 and F40 gt 042 then the decision is Class
5(5) if F39 le 017 and F40 le 042 then the decision is Class
3
We can see the derived decision trees are rather simpleand each can extract five pieces of rules It is very easy fordomain experts to understand the rules and even revise therules As the classification task is a little simple the accuracyof each model is high to 97 As new samples and faults arerecorded by the systemmore andmore complex tasksmay bestored In that case the model may become more and morecomplex
6 Conclusions and Future Works
Automatic fault detection and diagnosis are highly desirablein some industries such as offshore oil well drilling plat-forms for such systems are self-monitoring without manon duty In this work we design an intelligent abnormalitydetection and fault recognition technique for the exhaustsystem of gas turbines based on information entropy whichis used in measuring the uniformity of exhaust temperaturesevaluating the significance of features in kernel spaces andselecting splitting nodes for constructing decision trees Themain contributions of the work are two parts First weintroduce the entropy function to measure the uniformity ofexhaust temperatures The measurement is easy to computeand understand Numerical experiments also show its effec-tiveness Second we extend Shannon entropy for evaluatingthe significance of attributes in kernelized feature spaces Wecompute the relevance between a kernel matrix induced witha set of attributes and the matrix computed with the decisionvariable Some numerical experiments are also presentedGood results are derived
Although this work gives an effective framework forautomatic fault detection and recognition the proposedtechnique is not tested on large-scale real tasks We havedeveloped a remote state monitoring and fault diagnosis sys-tem Large scale data are flooding into the center In thefuture we will improve these techniques and develop areliable diagnosis system
Conflict of Interests
The authors declare that they have no conflict of interestsregarding the publication of this paper
Acknowledgment
This work is partially supported by National Natural Founda-tion under Grants 61222210 and 61105054
References
[1] C Wang L Xu and W Peng ldquoConceptual design of remotemonitoring and fault diagnosis systemsrdquo Information Systemsvol 32 no 7 pp 996ndash1004 2007
[2] W Donat K ChoiW An S Singh and K Pattipati ldquoData visu-alization data reduction and classifier fusion for intelligent faultdiagnosis in gas turbine enginesrdquo Journal of Engineering for GasTurbines and Power vol 130 no 4 Article ID 041602 2008
[3] Y G Li and P Nilkitsaranont ldquoGas turbine performance prog-nostic for condition-based maintenancerdquo Applied Energy vol86 no 10 pp 2152ndash2161 2009
[4] H Bassily R Lund and JWagner ldquoFault detection inmultivari-ate signals with applications to gas turbinesrdquo IEEE Transactionson Signal Processing vol 57 no 3 pp 835ndash842 2009
[5] K Young D Lee V Vitali and Y Ming ldquoA fault diagnosismethod for industrial gas turbines using bayesian data analysisrdquoJournal of Engineering for Gas Turbines and Power vol 132Article ID 041602 2010
[6] B Yu D Liu and T Zhang ldquoFault diagnosis for micro-gasturbine engine sensors via wavelet entropyrdquo Sensors vol 11 no10 pp 9928ndash9941 2011
[7] S Wu P Wu C Wu J Ding and C Wang ldquoBearing faultdiagnosis based onmultiscale permutation entropy and supportvector machinerdquo Entropy vol 14 no 8 pp 1343ndash1356 2012
[8] S Wu C Wu T Wu and C Wang ldquoMulti-scale analysis basedball bearing defect diagnostics using Mahalanobis distance andsupport vector machinerdquo Entropy vol 15 no 2 pp 416ndash4332013
[9] H A Nozari M A Shoorehdeli S Simani andH D BanadakildquoModel-based robust fault detection and isolation of an indus-trial gas turbine prototype using soft computing techniquesrdquoNeurocomputing vol 91 pp 29ndash47 2012
[10] S Sarkar X Jin and A Ray ldquoData-driven fault detection inaircraft engines with noisy sensor measurementsrdquo Journal ofEngineering for Gas Turbines and Power vol 133 no 8 ArticleID 081602 10 pages 2011
[11] C E Shannon ldquoAmathematical theory of communicationrdquoTheBell System Technical Journal vol 27 pp 379ndash423 1948
[12] S M Pincus ldquoApproximate entropy as a measure of systemcomplexityrdquo Proceedings of the National Academy of Sciences ofthe United States of America vol 88 no 6 pp 2297ndash2301 1991
[13] L I Kuncheva and C J Whitaker ldquoMeasures of diversity inclassifier ensembles and their relationship with the ensembleaccuracyrdquoMachine Learning vol 51 no 2 pp 181ndash207 2003
[14] I Csiszar ldquoAxiomatic characterizations of information mea-suresrdquo Entropy vol 10 no 3 pp 261ndash273 2008
[15] M Zanin L Zunino O A Rosso and D Papo ldquoPermutationentropy and itsmain biomedical and econophysics applicationsa reviewrdquo Entropy vol 14 no 8 pp 1553ndash1577 2012
The Scientific World Journal 13
Figure 10: Fuzzy dependency between a single feature and the decision (x-axis: feature index, 0–45; y-axis: dependency, 0–0.4).
Figure 11: Kernelized mutual information between a single feature and the decision (x-axis: feature index, 0–45; y-axis: kernelized mutual information, 0–1.4).
Please note that if F_1 ⊆ F_2, we have K_2 ⊆ K_1. However, K_2 ⊆ K_1 does not imply F_1 ⊆ F_2.
Definition 5. Given a set of samples U = {x_1, x_2, ..., x_m}, where each sample is described by n features F = {f_1, f_2, ..., f_n}, let F′ ⊆ F, let K be the kernel matrix over U computed in terms of F′, and let D be the kernel matrix computed with the decision. Then the significance of F′ related to the decision is defined as

MI(K, D) = E(K) + E(D) − E(K, D).  (10)
MI(K, D) measures the importance of the feature subset F′ in the kernel space for distinguishing different classes. It can be understood as a kernelized version of Shannon mutual information, which is widely used in feature evaluation and selection. In fact, it is easy to derive the equivalence between this entropy function and Shannon entropy when the attributes are discrete and the matching kernel is used.
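As a concrete illustration, the quantities in (10) can be computed directly from Gram matrices. The sketch below is our minimal Python rendering; the specific entropy form E(K) = −(1/m) Σ_i log2((1/m) Σ_j K_ij) and the use of the elementwise product for the joint matrix are assumptions consistent with the definitions earlier in the paper, not code from the authors.

```python
import numpy as np

def kernel_entropy(K):
    """Entropy of a kernel (Gram) matrix K: the average over samples of
    -log2 of the normalized granule size sum_j K_ij / m (assumed form)."""
    m = K.shape[0]
    return float(-np.mean(np.log2(K.sum(axis=1) / m)))

def kernelized_mutual_information(K, D):
    """MI(K, D) = E(K) + E(D) - E(K, D), with the joint matrix taken as
    the elementwise (Hadamard) product K * D."""
    return kernel_entropy(K) + kernel_entropy(D) - kernel_entropy(K * D)

def matching_kernel(y):
    """Matching kernel for a discrete decision: D_ij = 1 iff labels agree."""
    y = np.asarray(y)
    return (y[:, None] == y[None, :]).astype(float)
```

With the matching kernel for the decision, MI(D, D) = E(D), while a constant all-ones kernel carries no information, so MI(1, D) = 0; this is consistent with the equivalence to Shannon entropy for discrete attributes noted above.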
Now we show an example in gas turbine fault diagnosis. We collected 3581 samples from two gas turbine systems; 1440 samples are healthy, and the others belong to four kinds of faults: load rejection, sensor fault, fuel switching, and salt spray corrosion, with 45, 588, 71, and 1437 samples, respectively. Thirteen thermometers are installed in the exhaust. According to the approach described above, we form a 40-dimensional vector to represent the state of the exhaust.
Figure 12: Fuzzy dependency between the selected features and the decision (Features 5, 37, 2, and 3 are selected sequentially; plotted dependency values: 0.3475, 0.9006, 0.9996, 0.9969; x-axis: number of selected features, 1–4).
Figure 13: Kernelized mutual information between the selected features and the decision (Features 39, 31, 38, and 40 are selected sequentially; plotted values: 1.267, 1.612, 1.577, 1.510; x-axis: number of selected features, 1–4).
Obviously, the classification task is not understandable in such a high-dimensional space. Moreover, some features may be redundant for classification learning, which can confuse the learning algorithm and reduce modeling performance. Selecting a necessary and sufficient feature subset is therefore a key preprocessing step.
Here we compare the fuzzy rough set based feature evaluation algorithm with the proposed kernelized mutual information. Fuzzy dependency has been widely discussed and applied in feature selection and attribute reduction in recent years [26–28]. Fuzzy dependency can be understood as the average distance from samples to their nearest neighbors belonging to different classes, while the kernelized mutual information reflects the relevance between features and the decision in the kernel space.
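Under the interpretation just given, the average (capped) distance from each sample to its nearest sample of a different class, fuzzy dependency can be sketched as follows. This is our illustrative reading of that description, not the paper's full fuzzy rough set machinery:

```python
import numpy as np

def fuzzy_dependency(X, y):
    """For every sample, take the distance to the nearest sample of a
    different class (truncated at 1, assuming features scaled to [0, 1]),
    then average over all samples."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    deps = []
    for i in range(len(y)):
        d = np.linalg.norm(X - X[i], axis=1)  # distances to all samples
        nearest_miss = d[y != y[i]].min()     # nearest different-class sample
        deps.append(min(nearest_miss, 1.0))
    return float(np.mean(deps))
```

Larger values mean the classes are farther apart in the feature space; overlapping classes drive the dependency toward zero, which matches its use here as a feature-quality score.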
Comparing Figures 10 and 11, a significant difference is observed. As to fuzzy rough sets, Feature 5 produces the largest dependency, followed by Feature 38. However, Feature 39 obtains the largest mutual information, with Feature 2 second. Thus different feature evaluation functions lead to completely different results.
Figure 14: Scatter plots in the 2D spaces spanned by the feature pairs selected by fuzzy dependency.

Figures 10 and 11 present the significance of single features. In applications, we should combine a set of features, so we consider a greedy search strategy: starting from the empty set, the best features are added one by one. In each round, we select the feature that produces the largest significance increment over the selected subset. Both fuzzy dependency and kernelized mutual information increase monotonically as new attributes are added; if the selected features are sufficient for classification, these two functions remain invariant under any further addition. So we can stop the algorithm when the increment of significance falls below a given threshold. The significances of the selected feature subsets are shown in Figures 12 and 13, respectively.
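The greedy search just described (start from the empty set, add the feature with the largest significance increment, stop below a threshold) can be sketched generically. Here `significance` stands for either fuzzy dependency or kernelized mutual information evaluated on a candidate subset, and the default threshold is our assumption:

```python
import numpy as np

def greedy_forward_selection(X, significance, threshold=1e-3):
    """Each round, add the feature yielding the largest significance
    increment over the selected subset; stop when the increment drops
    below the threshold."""
    selected, remaining = [], list(range(X.shape[1]))
    best = 0.0
    while remaining:
        # evaluate every candidate extension of the current subset
        sig, f = max((significance(X[:, selected + [g]]), g) for g in remaining)
        if sig - best < threshold:
            break
        selected.append(f)
        remaining.remove(f)
        best = sig
    return selected
```

The same loop yields different subsets (Features 5, 37, 2, 3 versus Features 39, 31, 38, 40) simply by swapping the significance function, which is exactly the comparison made in Figures 12 and 13.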
In order to show the effectiveness of the algorithm, we give scatter plots in 2D spaces, as shown in Figures 14 to 16, which are spanned by the feature pairs selected by fuzzy dependency, kernelized mutual information, and Shannon mutual information. As to fuzzy dependency, we select Features 5, 37, 2, and 3; there are then 4 × 4 = 16 combinations of feature pairs. The subplot in the i-th row and j-th column of Figure 14 shows the samples in the 2D space spanned by the i-th and j-th selected features.
Observing the second subplot in the first row of Figure 14, we can find that the classification task is nonlinear. The first class is dispersed, and the third class is also located in different regions, which makes learning classification models difficult.
However, in the corresponding subplot of Figure 15, we can see that each class is relatively compact, which leads to a small intraclass distance. Moreover, the samples in the five classes can be separated with linear models, which also benefits learning a simple classification model.
Comparing Figures 15 and 16, we can find that in the feature spaces selected by Shannon mutual information the different classes overlap or become entangled, which leads to bad classification performance.
5. Diagnosis Modeling with Information Entropy Based Decision Tree Algorithm
After selecting the informative features, we now turn to classification modeling. There are a great number of learning algorithms for building a classification model; generalization capability and interpretability are the two most important criteria in evaluating them. As to fault diagnosis, a domain expert usually accepts a model that is consistent with his common knowledge. Thus he expects the model to be understandable; otherwise he will not believe its outputs. In addition, if the model is understandable, a domain expert can adapt it according to his prior knowledge, which makes the model suitable for different diagnosis objects.

Figure 15: Scatter plots in the 2D spaces spanned by the feature pairs selected by kernelized mutual information.
Decision tree algorithms, including CART [29], ID3 [17], and C4.5 [18], are techniques for training an understandable classification model; the learned model can be transformed into a set of rules. All of these algorithms build a decision tree from training samples: they start from a root node and select one of the features to divide the samples, with cuts, into different branches according to their feature values. This procedure is conducted iteratively until a branch is pure or a stopping criterion is satisfied. The key difference lies in the evaluation function for selecting attributes or cuts: CART adopts the Gini and twoing splitting rules, ID3 uses information gain, and C4.5 takes the information gain ratio. Moreover, C4.5 can deal with numerical attributes, unlike ID3. Competitive performance is usually observed with C4.5 in real-world applications compared with popular algorithms such as SVM and Bayesian networks. In this work, we use C4.5 to train classification models. The pseudocode of C4.5 is as follows.
Decision tree algorithm C4.5.
Input: a set of training samples U = {x_1, x_2, ..., x_m} with features F = {f_1, f_2, ..., f_n}; a stopping criterion τ.
Output: decision tree T.
(1) Check the sample set for the base cases.
(2) For each attribute f, compute the normalized information gain ratio from splitting on f.
(3) Let f_best be the attribute with the highest normalized information gain ratio.
(4) Create a decision node that splits on f_best.
(5) Recurse on the sublists obtained by splitting on f_best, and add those nodes as children of the node, until the stopping criterion τ is satisfied.
(6) Output T.
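The attribute-selection step (2) hinges on the information gain ratio. A minimal sketch for categorical splits follows (continuous attributes would first be binarized by a cut, as C4.5 does; function names are ours):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label sequence, in bits."""
    m = len(labels)
    return -sum((c / m) * math.log2(c / m) for c in Counter(labels).values())

def gain_ratio(feature, labels):
    """Information gain of splitting on `feature`, normalized by the
    split information, as C4.5 uses to pick f_best."""
    m = len(labels)
    cond, split_info = 0.0, 0.0
    for v, c in Counter(feature).items():
        w = c / m
        # entropy of the labels falling into this branch, weighted by size
        cond += w * entropy([l for f, l in zip(feature, labels) if f == v])
        split_info -= w * math.log2(w)
    gain = entropy(labels) - cond
    return gain / split_info if split_info > 0 else 0.0
```

A perfectly class-separating split yields a gain ratio of 1, while a split that leaves the class distribution unchanged yields 0, so the ratio directly ranks candidate attributes in step (3).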
Figure 16: Scatter plots in the 2D spaces spanned by the feature pairs selected by Shannon mutual information.
Figure 17: Decision tree trained on the features selected with fuzzy rough sets (the root tests F2; internal nodes test F37, F2, and F3; the five leaves are Classes 1–5).
We input the datasets into C4.5 and build two decision trees: Features 5, 37, 2, and 3 are included in the first dataset, and Features 39, 31, 38, and 40 in the second. The two trees are given in Figures 17 and 18, respectively.
Figure 18: Decision tree trained on the features selected with kernelized mutual information (the root tests F39; internal nodes test F39, F40, and F38; the five leaves are Classes 1–5).
Tracing from the root node to a leaf node along each branch, a rule is extracted from the tree. From the first tree, we get five decision rules:
(1) if F2 > 0.50 and F37 > 0.49, then the decision is Class 4;
(2) if F2 > 0.50 and F37 ≤ 0.49, then the decision is Class 1;
(3) if 0.18 < F2 ≤ 0.50 and F3 > 0.41, then the decision is Class 5;
(4) if 0.18 < F2 ≤ 0.50 and F3 ≤ 0.41, then the decision is Class 3;
(5) if F2 ≤ 0.18, then the decision is Class 2.
From the second decision tree, we can also obtain the following rules:
(1) if F39 > 0.45 and F38 > 0.80, then the decision is Class 4;
(2) if F39 > 0.45 and F38 ≤ 0.80, then the decision is Class 1;
(3) if 0.17 < F39 ≤ 0.45, then the decision is Class 2;
(4) if F39 ≤ 0.17 and F40 > 0.42, then the decision is Class 5;
(5) if F39 ≤ 0.17 and F40 ≤ 0.42, then the decision is Class 3.
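The five rules from the second tree map directly onto nested conditionals. A sketch with thresholds taken from Figure 18 (feature values assumed normalized to [0, 1]; note that Feature 31 was selected but is never tested by the tree):

```python
def diagnose(f38, f39, f40):
    """Rule set extracted from the decision tree built on the features
    selected by kernelized mutual information. Returns the class (1-5)."""
    if f39 > 0.45:
        return 4 if f38 > 0.80 else 1   # rules (1) and (2)
    if f39 > 0.17:                      # 0.17 < f39 <= 0.45, rule (3)
        return 2
    return 5 if f40 > 0.42 else 3       # rules (4) and (5)
```

This kind of flat, threshold-based rule set is what makes the model easy for a domain expert to audit and, if necessary, revise by hand.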
We can see that the derived decision trees are rather simple, and five rules can be extracted from each. It is very easy for domain experts to understand the rules and even revise them. As the classification task is relatively simple, the accuracy of each model reaches 97%. As new samples and faults are recorded by the system, more and more complex tasks may be stored; in that case, the models may become more and more complex.
6. Conclusions and Future Work
Automatic fault detection and diagnosis are highly desirable in some industries, such as offshore oil well drilling platforms, for such systems are monitored without anyone on duty. In this work, we design an intelligent abnormality detection and fault recognition technique for the exhaust system of gas turbines based on information entropy, which is used in measuring the uniformity of exhaust temperatures, evaluating the significance of features in kernel spaces, and selecting splitting nodes for constructing decision trees. The main contributions of this work are twofold. First, we introduce the entropy function to measure the uniformity of exhaust temperatures; the measurement is easy to compute and understand, and numerical experiments show its effectiveness. Second, we extend Shannon entropy to evaluate the significance of attributes in kernelized feature spaces: we compute the relevance between a kernel matrix induced by a set of attributes and the matrix computed with the decision variable. Numerical experiments are also presented, with good results.
Although this work gives an effective framework for automatic fault detection and recognition, the proposed technique has not been tested on large-scale real tasks. We have developed a remote state monitoring and fault diagnosis system, and large-scale data are flooding into the center. In the future, we will improve these techniques and develop a reliable diagnosis system.
Conflict of Interests
The authors declare that they have no conflict of interests regarding the publication of this paper.
Acknowledgment
This work is partially supported by the National Natural Foundation under Grants 61222210 and 61105054.
References

[1] C. Wang, L. Xu, and W. Peng, "Conceptual design of remote monitoring and fault diagnosis systems," Information Systems, vol. 32, no. 7, pp. 996–1004, 2007.
[2] W. Donat, K. Choi, W. An, S. Singh, and K. Pattipati, "Data visualization, data reduction and classifier fusion for intelligent fault diagnosis in gas turbine engines," Journal of Engineering for Gas Turbines and Power, vol. 130, no. 4, Article ID 041602, 2008.
[3] Y. G. Li and P. Nilkitsaranont, "Gas turbine performance prognostic for condition-based maintenance," Applied Energy, vol. 86, no. 10, pp. 2152–2161, 2009.
[4] H. Bassily, R. Lund, and J. Wagner, "Fault detection in multivariate signals with applications to gas turbines," IEEE Transactions on Signal Processing, vol. 57, no. 3, pp. 835–842, 2009.
[5] K. Young, D. Lee, V. Vitali, and Y. Ming, "A fault diagnosis method for industrial gas turbines using Bayesian data analysis," Journal of Engineering for Gas Turbines and Power, vol. 132, Article ID 041602, 2010.
[6] B. Yu, D. Liu, and T. Zhang, "Fault diagnosis for micro-gas turbine engine sensors via wavelet entropy," Sensors, vol. 11, no. 10, pp. 9928–9941, 2011.
[7] S. Wu, P. Wu, C. Wu, J. Ding, and C. Wang, "Bearing fault diagnosis based on multiscale permutation entropy and support vector machine," Entropy, vol. 14, no. 8, pp. 1343–1356, 2012.
[8] S. Wu, C. Wu, T. Wu, and C. Wang, "Multi-scale analysis based ball bearing defect diagnostics using Mahalanobis distance and support vector machine," Entropy, vol. 15, no. 2, pp. 416–433, 2013.
[9] H. A. Nozari, M. A. Shoorehdeli, S. Simani, and H. D. Banadaki, "Model-based robust fault detection and isolation of an industrial gas turbine prototype using soft computing techniques," Neurocomputing, vol. 91, pp. 29–47, 2012.
[10] S. Sarkar, X. Jin, and A. Ray, "Data-driven fault detection in aircraft engines with noisy sensor measurements," Journal of Engineering for Gas Turbines and Power, vol. 133, no. 8, Article ID 081602, 10 pages, 2011.
[11] C. E. Shannon, "A mathematical theory of communication," The Bell System Technical Journal, vol. 27, pp. 379–423, 1948.
[12] S. M. Pincus, "Approximate entropy as a measure of system complexity," Proceedings of the National Academy of Sciences of the United States of America, vol. 88, no. 6, pp. 2297–2301, 1991.
[13] L. I. Kuncheva and C. J. Whitaker, "Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy," Machine Learning, vol. 51, no. 2, pp. 181–207, 2003.
[14] I. Csiszar, "Axiomatic characterizations of information measures," Entropy, vol. 10, no. 3, pp. 261–273, 2008.
[15] M. Zanin, L. Zunino, O. A. Rosso, and D. Papo, "Permutation entropy and its main biomedical and econophysics applications: a review," Entropy, vol. 14, no. 8, pp. 1553–1577, 2012.
[16] C. Wang and H. Shen, "Information theory in scientific visualization," Entropy, vol. 13, pp. 254–273, 2011.
[17] J. R. Quinlan, "Induction of decision trees," Machine Learning, vol. 1, no. 1, pp. 81–106, 1986.
[18] J. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, 1993.
[19] D. N. Reshef, Y. A. Reshef, H. K. Finucane et al., "Detecting novel associations in large data sets," Science, vol. 334, no. 6062, pp. 1518–1524, 2011.
[20] C. J. C. Burges, "A tutorial on support vector machines for pattern recognition," Data Mining and Knowledge Discovery, vol. 2, no. 2, pp. 121–167, 1998.
[21] Q. Hu, D. Yu, W. Pedrycz, and D. Chen, "Kernelized fuzzy rough sets and their applications," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 11, pp. 1649–1667, 2011.
[22] Q. Hu, H. Zhao, Z. Xie, and D. Yu, "Consistency based attribute reduction," in Advances in Knowledge Discovery and Data Mining, Z.-H. Zhou, H. Li, and Q. Yang, Eds., vol. 4426 of Lecture Notes in Computer Science, pp. 96–107, Springer, Berlin, Germany, 2007.
[23] Q. Hu, D. Yu, and Z. Xie, "Information-preserving hybrid data reduction based on fuzzy-rough techniques," Pattern Recognition Letters, vol. 27, no. 5, pp. 414–423, 2006.
[24] R. Gilad-Bachrach, A. Navot, and N. Tishby, "Margin based feature selection: theory and algorithms," in Proceedings of the 21st International Conference on Machine Learning (ICML '04), pp. 337–344, July 2004.
[25] J. Shawe-Taylor and N. Cristianini, Kernel Methods for Pattern Analysis, Cambridge University Press, 2004.
[26] Q. Hu, Z. Xie, and D. Yu, "Hybrid attribute reduction based on a novel fuzzy-rough model and information granulation," Pattern Recognition, vol. 40, no. 12, pp. 3509–3521, 2007.
[27] R. Jensen and Q. Shen, "New approaches to fuzzy-rough feature selection," IEEE Transactions on Fuzzy Systems, vol. 17, no. 4, pp. 824–838, 2009.
[28] S. Zhao, E. C. C. Tsang, and D. Chen, "The model of fuzzy variable precision rough sets," IEEE Transactions on Fuzzy Systems, vol. 17, no. 2, pp. 451–467, 2009.
[29] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone, Classification and Regression Trees, Wadsworth & Brooks/Cole Advanced Books & Software, Monterey, Calif, USA, 1984.
The Scientific World Journal 9
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
1
2
3
4
5
1
2
3
4
5
1
2
3
4
5
1
2
3
4
5
Figure 14 Scatter plots in 2D space expended with feature pairs selected by fuzzy dependency
each round we select a feature which produces the largestsignificance increment with the selected subset Both fuzzydependency and kernelized mutual information increasemonotonically if new attributes are added If the selectedfeatures are sufficient for classification these two functionswill keep invariant by adding any new attributes So we canstop the algorithm if the increment of significance is less thana given threshold The significances of the selected featuresubset are shown in Figures 12 and 13 respectively
In order to show the effectiveness of the algorithm wegive the scatter plots in 2D spaces as shown in Figures 14 to16 which are expended by the feature pairs selected by fuzzydependency kernelized mutual information and Shannonmutual information As to fuzzy dependency we selectFeatures 5 37 2 and 3Then there are 4times4 = 16 combinationsof feature pairs The subplot in the 119894th row and 119895th column inFigure 14 gives the scatters of samples in 2D space expandedby the 119894th selected feature and the 119895th selected feature
Observing the 2nd subplots in the first row of Figure 14we can find that the classification task is nonlinear The firstclass is dispersed and the third class is also located at different
regions which leads to the difficulty in learning classificationmodels
However in the corresponding subplot of Figure 15 wecan see that each class is relatively compact which leads to asmall intraclass distanceMoreover the samples in five classescan be classified with some linear models which also bringbenefit for learning a simple classification model
Comparing Figures 15 and 16 we can find that differentclasses are overlapped in feature spaces selected by Shannonmutual information or get entangled which leads to the badclassification performance
5 Diagnosis Modeling with InformationEntropy Based Decision Tree Algorithm
After selecting the informative features we now go to clas-sification modeling There are a great number of learningalgorithms for building a classificationmodel Generalizationcapability and interpretability are the two most importantcriteria in evaluating an algorithm As to fault diagnosis adomain expert usually accepts a model which is consistent
10 The Scientific World Journal
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
1
2
3
4
5
1
2
3
4
5
1
2
3
4
5
1
2
3
4
5
Figure 15 Scatter in 2D space expended with feature pairs selected by kernelized mutual information
with his common knowledge Thus he expects the modelis understandable otherwise he will not believe the outputsof the model In addition if the model is understandable adomain expert can adapt it according to his prior knowledgewhich makes the model suitable for different diagnosisobjects
Decision tree algorithms including CART [29] ID3[17] and C45 [18] are such techniques for training anunderstandable classification model The learned model canbe transformed into a set of rules All these algorithms builda decision tree from training samples They start from a rootnode and select one of the features to divide the samples withcuts into different branches according to their feature valuesThis procedure is interactively conducted until the branch ispure or a stopping criterion is satisfiedThe key difference liesin the evaluation function in selecting attributes or cuts InCART splitting rules GINI and Twoing are adopted whileID3 uses information gain and C45 takes information gainratioMoreover C45 can deal with numerical attributes com-pared with ID3 Competent performance is usually observedwith C45 in real-world applications compared with somepopular algorithms including SVM and Baysian net In this
work we introduce C45 to train classification models Thepseudocode of C45 is formulated as follows
Decision tree algorithm C45Input a set of training samples 119880 = 119909
1 1199092 119909
119898
with features 119865 = 1198911 1198912 119891
119899
Stopping criterion 120591Output decision tree 119879
(1) Check for sample set(2) For each attribute 119891 compute the normalized infor-
mation gain ratio from splitting on 119886(3) Let f best be the attribute with the highest normalized
information gain(4) Create a decision node that splits on f best(5) Recurse on the sublists obtained by splitting on
f best and add those nodes as children of node untilstopping criterion 120591 is satisfied
(6) Output 119879
The Scientific World Journal 11
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
1
2
3
4
5
1
2
3
4
5
1
2
3
4
5
1
2
3
4
5
Figure 16 Scatter in 2D space expended with feature pairs selected by Shannon mutual information
F2
le050gt050
F37 F2
le041
Class 2
gt018
Class 5
le049
Class 1
gt049
Class 4 F3
Class 3
gt041
le018
Figure 17 Decision tree trained on the features selected with fuzzyrough sets
We input the data sets into C45 and build the followingtwo decision trees Features 5 37 2 and 3 are included in thefirst dataset and Features 39 31 38 and 40 are selected in thesecond dataset The two trees are given in Figures 17 and 18respectively
F39le017gt017
F39 F40
le042
Class 3
gt042
Class 5
le045
Class 2
gt045
F38le080gt080
Class 1Class 4
Figure 18 Decision tree trained on the features selected withkernelized mutual information
We start from the root node to a leaf node along thebranch and then a piece of rule is extracted from the treeAs to the first tree we can get five decision rules
(1) if F2 gt 050 and F37 gt 049 then the decision is Class4
12 The Scientific World Journal
(2) if F2 gt 050 and F37 le 049 then the decision is Class1
(3) if 018 lt F2 le 050 and F3 gt 041 then the decision isClass 5
(4) if 018 lt F2 le 050 and F3 le 041 then the decision isClass 3
(5) if F2 le 018 then the decision is Class 2
As to the second decision tree we can also obtain somerules as
(1) if F39 gt 045 and F38 gt 080 then the decision is Class4
(2) if F39 gt 045 and F38 le 080 then the decision is Class1
(3) if 017 lt F39 le 045 then the decision is Class 2(4) if F39 le 017 and F40 gt 042 then the decision is Class
5(5) if F39 le 017 and F40 le 042 then the decision is Class
3
We can see the derived decision trees are rather simpleand each can extract five pieces of rules It is very easy fordomain experts to understand the rules and even revise therules As the classification task is a little simple the accuracyof each model is high to 97 As new samples and faults arerecorded by the systemmore andmore complex tasksmay bestored In that case the model may become more and morecomplex
6 Conclusions and Future Works
Automatic fault detection and diagnosis are highly desirablein some industries such as offshore oil well drilling plat-forms for such systems are self-monitoring without manon duty In this work we design an intelligent abnormalitydetection and fault recognition technique for the exhaustsystem of gas turbines based on information entropy whichis used in measuring the uniformity of exhaust temperaturesevaluating the significance of features in kernel spaces andselecting splitting nodes for constructing decision trees Themain contributions of the work are two parts First weintroduce the entropy function to measure the uniformity ofexhaust temperatures The measurement is easy to computeand understand Numerical experiments also show its effec-tiveness Second we extend Shannon entropy for evaluatingthe significance of attributes in kernelized feature spaces Wecompute the relevance between a kernel matrix induced witha set of attributes and the matrix computed with the decisionvariable Some numerical experiments are also presentedGood results are derived
The Scientific World Journal
Figure 15: Scatter plots in 2D spaces spanned by feature pairs selected by kernelized mutual information (five classes; feature values normalized to [0, 1]).
with his common knowledge. Thus, he expects the model to be understandable; otherwise he will not trust its outputs. In addition, if the model is understandable, a domain expert can adapt it according to his prior knowledge, which makes the model suitable for different diagnosis objects.
Decision tree algorithms, including CART [29], ID3 [17], and C4.5 [18], are techniques for training an understandable classification model, and the learned model can be transformed into a set of rules. All of these algorithms build a decision tree from training samples. They start from a root node and select one of the features, with cuts, to divide the samples into different branches according to their feature values. This procedure is conducted recursively until a branch is pure or a stopping criterion is satisfied. The key difference between the algorithms lies in the evaluation function used for selecting attributes or cuts: CART adopts the GINI and Twoing splitting rules, ID3 uses information gain, and C4.5 takes the information gain ratio. Moreover, unlike ID3, C4.5 can deal with numerical attributes. In real-world applications, C4.5 usually performs competently compared with popular algorithms such as SVM and Bayesian networks. In this work we introduce C4.5 to train classification models. The pseudocode of C4.5 is formulated as follows.

Decision tree algorithm C4.5
Input: a set of training samples U = {x_1, x_2, ..., x_m} with features F = {f_1, f_2, ..., f_n}; a stopping criterion τ.
Output: decision tree T.
(1) Check the sample set for the base cases.
(2) For each attribute f, compute the normalized information gain ratio from splitting on f.
(3) Let f_best be the attribute with the highest normalized information gain ratio.
(4) Create a decision node that splits on f_best.
(5) Recurse on the sublists obtained by splitting on f_best and add those nodes as children, until the stopping criterion τ is satisfied.
(6) Output T.
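The attribute-selection core of the pseudocode above (steps (2)–(3)) can be sketched in Python. This is a minimal illustration, not the authors' implementation: the helper names, the binary-cut search, and the toy feature names are assumptions.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(samples, labels, feature, cut):
    """Information gain ratio of the binary split 'feature <= cut'."""
    left = [y for x, y in zip(samples, labels) if x[feature] <= cut]
    right = [y for x, y in zip(samples, labels) if x[feature] > cut]
    if not left or not right:
        return 0.0
    n = len(labels)
    gain = (entropy(labels)
            - (len(left) / n) * entropy(left)
            - (len(right) / n) * entropy(right))
    # Intrinsic information of the split itself, used to normalize the gain.
    split_info = entropy(["L"] * len(left) + ["R"] * len(right))
    return gain / split_info if split_info > 0 else 0.0

def best_split(samples, labels, features):
    """Steps (2)-(3): pick the feature and cut with the highest gain ratio."""
    best_f, best_cut, best_g = None, None, -1.0
    for f in features:
        values = sorted({x[f] for x in samples})
        for lo, hi in zip(values, values[1:]):  # candidate cuts between adjacent values
            cut = (lo + hi) / 2
            g = gain_ratio(samples, labels, f, cut)
            if g > best_g:
                best_f, best_cut, best_g = f, cut, g
    return best_f, best_cut, best_g
```

The full C4.5 algorithm adds further refinements omitted here, such as requiring the chosen attribute's raw information gain to be at least average and pruning the grown tree.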
Figure 16: Scatter plots in 2D spaces spanned by feature pairs selected by Shannon mutual information (five classes; feature values normalized to [0, 1]).
Figure 17: Decision tree trained on the features selected with fuzzy rough sets. The root splits on F2 (cuts at 0.18 and 0.50); samples with F2 > 0.50 are further split on F37 (cut at 0.49), and those with 0.18 < F2 ≤ 0.50 on F3 (cut at 0.41), leading to leaves labeled Class 1 to Class 5.
We input the two datasets into C4.5 and build two decision trees. Features 5, 37, 2, and 3 are included in the first dataset, and Features 39, 31, 38, and 40 are selected in the second one. The two trees are given in Figures 17 and 18, respectively.
Figure 18: Decision tree trained on the features selected with kernelized mutual information. The root splits on F39 (cuts at 0.17 and 0.45); samples with F39 > 0.45 are further split on F38 (cut at 0.80), and those with F39 ≤ 0.17 on F40 (cut at 0.42), leading to leaves labeled Class 1 to Class 5.
A rule is extracted from the tree by walking from the root node along a branch to a leaf node. From the first tree we can get five decision rules:
(1) if F2 > 0.50 and F37 > 0.49, then the decision is Class 4;
(2) if F2 > 0.50 and F37 ≤ 0.49, then the decision is Class 1;
(3) if 0.18 < F2 ≤ 0.50 and F3 > 0.41, then the decision is Class 5;
(4) if 0.18 < F2 ≤ 0.50 and F3 ≤ 0.41, then the decision is Class 3;
(5) if F2 ≤ 0.18, then the decision is Class 2.

From the second decision tree we can also obtain five rules:

(1) if F39 > 0.45 and F38 > 0.80, then the decision is Class 4;
(2) if F39 > 0.45 and F38 ≤ 0.80, then the decision is Class 1;
(3) if 0.17 < F39 ≤ 0.45, then the decision is Class 2;
(4) if F39 ≤ 0.17 and F40 > 0.42, then the decision is Class 5;
(5) if F39 ≤ 0.17 and F40 ≤ 0.42, then the decision is Class 3.
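To show how directly such rules translate into diagnostic logic, the five rules of the second tree can be encoded as a small function. This is a hypothetical helper for illustration, not part of the deployed system; the thresholds are those read off the tree.

```python
def classify_exhaust_state(f39, f40, f38):
    """Apply the five rules extracted from the second decision tree.

    Feature values are assumed to be normalized to [0, 1]; the return
    value is the predicted class label (1-5).
    """
    if f39 > 0.45:
        return 4 if f38 > 0.80 else 1   # rules (1) and (2)
    if f39 > 0.17:
        return 2                        # rule (3)
    return 5 if f40 > 0.42 else 3       # rules (4) and (5)
```

A domain expert can audit or revise such a function line by line, which is exactly the advantage of rule-based models argued above.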
We can see that the derived decision trees are rather simple, and five rules can be extracted from each. It is very easy for domain experts to understand the rules and even to revise them. As the classification task is relatively simple, the accuracy of each model reaches 97%. As new samples and faults are recorded by the system, more and more complex tasks may be stored; in that case the models may also become more and more complex.
6. Conclusions and Future Works
Automatic fault detection and diagnosis are highly desirable in some industries, such as offshore oil well drilling platforms, where systems must monitor themselves without man on duty. In this work we design an intelligent abnormality detection and fault recognition technique for the exhaust system of gas turbines based on information entropy, which is used for measuring the uniformity of exhaust temperatures, evaluating the significance of features in kernel spaces, and selecting splitting nodes for constructing decision trees. The main contributions of the work are two. First, we introduce the entropy function to measure the uniformity of exhaust temperatures; the measure is easy to compute and understand, and numerical experiments show its effectiveness. Second, we extend Shannon entropy to evaluate the significance of attributes in kernelized feature spaces: we compute the relevance between a kernel matrix induced by a set of attributes and the matrix computed from the decision variable. Numerical experiments on this technique also yield good results.
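The first contribution, entropy as a uniformity measure, can be sketched as follows. This is a minimal illustration under an assumed normalization (readings divided by their sum); the paper's exact formulation is given in the earlier sections.

```python
import math

def temperature_uniformity(temps):
    """Normalized Shannon entropy of a set of exhaust temperature readings.

    Returns 1.0 when all thermocouples read the same value (a perfectly
    uniform gas path) and decreases as the distribution skews, e.g. when
    one combustor runs cold.  Assumes at least two positive readings.
    """
    total = sum(temps)
    probs = [t / total for t in temps]
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(len(temps))   # normalize by the maximum entropy log(n)

healthy = temperature_uniformity([450.0] * 8)        # identical readings
faulty = temperature_uniformity([450.0] * 7 + [200.0])  # one cold duct
```

A drop of this index below a threshold learned from healthy operation can then serve as the abnormality alarm.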
Although this work gives an effective framework for automatic fault detection and recognition, the proposed technique has not been tested on large-scale real tasks. We have developed a remote state monitoring and fault diagnosis system, and large-scale data are flooding into its center. In the future we will improve these techniques and develop a reliable diagnosis system.
Conflict of Interests
The authors declare that they have no conflict of interests regarding the publication of this paper.
Acknowledgment
This work is partially supported by the National Natural Science Foundation of China under Grants 61222210 and 61105054.
References
[1] C. Wang, L. Xu, and W. Peng, "Conceptual design of remote monitoring and fault diagnosis systems," Information Systems, vol. 32, no. 7, pp. 996–1004, 2007.
[2] W. Donat, K. Choi, W. An, S. Singh, and K. Pattipati, "Data visualization, data reduction and classifier fusion for intelligent fault diagnosis in gas turbine engines," Journal of Engineering for Gas Turbines and Power, vol. 130, no. 4, Article ID 041602, 2008.
[3] Y. G. Li and P. Nilkitsaranont, "Gas turbine performance prognostic for condition-based maintenance," Applied Energy, vol. 86, no. 10, pp. 2152–2161, 2009.
[4] H. Bassily, R. Lund, and J. Wagner, "Fault detection in multivariate signals with applications to gas turbines," IEEE Transactions on Signal Processing, vol. 57, no. 3, pp. 835–842, 2009.
[5] K. Young, D. Lee, V. Vitali, and Y. Ming, "A fault diagnosis method for industrial gas turbines using Bayesian data analysis," Journal of Engineering for Gas Turbines and Power, vol. 132, Article ID 041602, 2010.
[6] B. Yu, D. Liu, and T. Zhang, "Fault diagnosis for micro-gas turbine engine sensors via wavelet entropy," Sensors, vol. 11, no. 10, pp. 9928–9941, 2011.
[7] S. Wu, P. Wu, C. Wu, J. Ding, and C. Wang, "Bearing fault diagnosis based on multiscale permutation entropy and support vector machine," Entropy, vol. 14, no. 8, pp. 1343–1356, 2012.
[8] S. Wu, C. Wu, T. Wu, and C. Wang, "Multi-scale analysis based ball bearing defect diagnostics using Mahalanobis distance and support vector machine," Entropy, vol. 15, no. 2, pp. 416–433, 2013.
[9] H. A. Nozari, M. A. Shoorehdeli, S. Simani, and H. D. Banadaki, "Model-based robust fault detection and isolation of an industrial gas turbine prototype using soft computing techniques," Neurocomputing, vol. 91, pp. 29–47, 2012.
[10] S. Sarkar, X. Jin, and A. Ray, "Data-driven fault detection in aircraft engines with noisy sensor measurements," Journal of Engineering for Gas Turbines and Power, vol. 133, no. 8, Article ID 081602, 10 pages, 2011.
[11] C. E. Shannon, "A mathematical theory of communication," The Bell System Technical Journal, vol. 27, pp. 379–423, 1948.
[12] S. M. Pincus, "Approximate entropy as a measure of system complexity," Proceedings of the National Academy of Sciences of the United States of America, vol. 88, no. 6, pp. 2297–2301, 1991.
[13] L. I. Kuncheva and C. J. Whitaker, "Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy," Machine Learning, vol. 51, no. 2, pp. 181–207, 2003.
[14] I. Csiszar, "Axiomatic characterizations of information measures," Entropy, vol. 10, no. 3, pp. 261–273, 2008.
[15] M. Zanin, L. Zunino, O. A. Rosso, and D. Papo, "Permutation entropy and its main biomedical and econophysics applications: a review," Entropy, vol. 14, no. 8, pp. 1553–1577, 2012.
[16] C. Wang and H. Shen, "Information theory in scientific visualization," Entropy, vol. 13, pp. 254–273, 2011.
[17] J. R. Quinlan, "Induction of decision trees," Machine Learning, vol. 1, no. 1, pp. 81–106, 1986.
[18] J. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, 1993.
[19] D. N. Reshef, Y. A. Reshef, H. K. Finucane et al., "Detecting novel associations in large data sets," Science, vol. 334, no. 6062, pp. 1518–1524, 2011.
[20] C. J. C. Burges, "A tutorial on support vector machines for pattern recognition," Data Mining and Knowledge Discovery, vol. 2, no. 2, pp. 121–167, 1998.
[21] Q. Hu, D. Yu, W. Pedrycz, and D. Chen, "Kernelized fuzzy rough sets and their applications," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 11, pp. 1649–1667, 2011.
[22] Q. Hu, H. Zhao, Z. Xie, and D. Yu, "Consistency based attribute reduction," in Advances in Knowledge Discovery and Data Mining, Z.-H. Zhou, H. Li, and Q. Yang, Eds., vol. 4426 of Lecture Notes in Computer Science, pp. 96–107, Springer, Berlin, Germany, 2007.
[23] Q. Hu, D. Yu, and Z. Xie, "Information-preserving hybrid data reduction based on fuzzy-rough techniques," Pattern Recognition Letters, vol. 27, no. 5, pp. 414–423, 2006.
[24] R. Gilad-Bachrach, A. Navot, and N. Tishby, "Margin based feature selection—theory and algorithms," in Proceedings of the 21st International Conference on Machine Learning (ICML '04), pp. 337–344, July 2004.
[25] J. Shawe-Taylor and N. Cristianini, Kernel Methods for Pattern Analysis, Cambridge University Press, 2004.
[26] Q. Hu, Z. Xie, and D. Yu, "Hybrid attribute reduction based on a novel fuzzy-rough model and information granulation," Pattern Recognition, vol. 40, no. 12, pp. 3509–3521, 2007.
[27] R. Jensen and Q. Shen, "New approaches to fuzzy-rough feature selection," IEEE Transactions on Fuzzy Systems, vol. 17, no. 4, pp. 824–838, 2009.
[28] S. Zhao, E. C. C. Tsang, and D. Chen, "The model of fuzzy variable precision rough sets," IEEE Transactions on Fuzzy Systems, vol. 17, no. 2, pp. 451–467, 2009.
[29] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone, Classification and Regression Trees, Wadsworth & Brooks/Cole Advanced Books & Software, Monterey, Calif, USA, 1984.
The Scientific World Journal 11
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
0 05 1
0
05
1
1
2
3
4
5
1
2
3
4
5
1
2
3
4
5
1
2
3
4
5
Figure 16 Scatter in 2D space expended with feature pairs selected by Shannon mutual information
F2
le050gt050
F37 F2
le041
Class 2
gt018
Class 5
le049
Class 1
gt049
Class 4 F3
Class 3
gt041
le018
Figure 17 Decision tree trained on the features selected with fuzzyrough sets
We input the data sets into C45 and build the followingtwo decision trees Features 5 37 2 and 3 are included in thefirst dataset and Features 39 31 38 and 40 are selected in thesecond dataset The two trees are given in Figures 17 and 18respectively
F39le017gt017
F39 F40
le042
Class 3
gt042
Class 5
le045
Class 2
gt045
F38le080gt080
Class 1Class 4
Figure 18 Decision tree trained on the features selected withkernelized mutual information
We start from the root node to a leaf node along thebranch and then a piece of rule is extracted from the treeAs to the first tree we can get five decision rules
(1) if F2 gt 050 and F37 gt 049 then the decision is Class4
12 The Scientific World Journal
(2) if F2 gt 050 and F37 le 049 then the decision is Class1
(3) if 018 lt F2 le 050 and F3 gt 041 then the decision isClass 5
(4) if 018 lt F2 le 050 and F3 le 041 then the decision isClass 3
(5) if F2 le 018 then the decision is Class 2
As to the second decision tree we can also obtain somerules as
(1) if F39 gt 045 and F38 gt 080 then the decision is Class4
(2) if F39 gt 045 and F38 le 080 then the decision is Class1
(3) if 017 lt F39 le 045 then the decision is Class 2(4) if F39 le 017 and F40 gt 042 then the decision is Class
5(5) if F39 le 017 and F40 le 042 then the decision is Class
3
We can see the derived decision trees are rather simpleand each can extract five pieces of rules It is very easy fordomain experts to understand the rules and even revise therules As the classification task is a little simple the accuracyof each model is high to 97 As new samples and faults arerecorded by the systemmore andmore complex tasksmay bestored In that case the model may become more and morecomplex
6 Conclusions and Future Works
Automatic fault detection and diagnosis are highly desirablein some industries such as offshore oil well drilling plat-forms for such systems are self-monitoring without manon duty In this work we design an intelligent abnormalitydetection and fault recognition technique for the exhaustsystem of gas turbines based on information entropy whichis used in measuring the uniformity of exhaust temperaturesevaluating the significance of features in kernel spaces andselecting splitting nodes for constructing decision trees Themain contributions of the work are two parts First weintroduce the entropy function to measure the uniformity ofexhaust temperatures The measurement is easy to computeand understand Numerical experiments also show its effec-tiveness Second we extend Shannon entropy for evaluatingthe significance of attributes in kernelized feature spaces Wecompute the relevance between a kernel matrix induced witha set of attributes and the matrix computed with the decisionvariable Some numerical experiments are also presentedGood results are derived
Although this work gives an effective framework forautomatic fault detection and recognition the proposedtechnique is not tested on large-scale real tasks We havedeveloped a remote state monitoring and fault diagnosis sys-tem Large scale data are flooding into the center In thefuture we will improve these techniques and develop areliable diagnosis system
Conflict of Interests
The authors declare that they have no conflict of interestsregarding the publication of this paper
Acknowledgment
This work is partially supported by National Natural Founda-tion under Grants 61222210 and 61105054
References
[1] C Wang L Xu and W Peng ldquoConceptual design of remotemonitoring and fault diagnosis systemsrdquo Information Systemsvol 32 no 7 pp 996ndash1004 2007
[2] W Donat K ChoiW An S Singh and K Pattipati ldquoData visu-alization data reduction and classifier fusion for intelligent faultdiagnosis in gas turbine enginesrdquo Journal of Engineering for GasTurbines and Power vol 130 no 4 Article ID 041602 2008
[3] Y G Li and P Nilkitsaranont ldquoGas turbine performance prog-nostic for condition-based maintenancerdquo Applied Energy vol86 no 10 pp 2152ndash2161 2009
[4] H Bassily R Lund and JWagner ldquoFault detection inmultivari-ate signals with applications to gas turbinesrdquo IEEE Transactionson Signal Processing vol 57 no 3 pp 835ndash842 2009
[5] K Young D Lee V Vitali and Y Ming ldquoA fault diagnosismethod for industrial gas turbines using bayesian data analysisrdquoJournal of Engineering for Gas Turbines and Power vol 132Article ID 041602 2010
[6] B Yu D Liu and T Zhang ldquoFault diagnosis for micro-gasturbine engine sensors via wavelet entropyrdquo Sensors vol 11 no10 pp 9928ndash9941 2011
[7] S Wu P Wu C Wu J Ding and C Wang ldquoBearing faultdiagnosis based onmultiscale permutation entropy and supportvector machinerdquo Entropy vol 14 no 8 pp 1343ndash1356 2012
[8] S Wu C Wu T Wu and C Wang ldquoMulti-scale analysis basedball bearing defect diagnostics using Mahalanobis distance andsupport vector machinerdquo Entropy vol 15 no 2 pp 416ndash4332013
[9] H A Nozari M A Shoorehdeli S Simani andH D BanadakildquoModel-based robust fault detection and isolation of an indus-trial gas turbine prototype using soft computing techniquesrdquoNeurocomputing vol 91 pp 29ndash47 2012
[10] S Sarkar X Jin and A Ray ldquoData-driven fault detection inaircraft engines with noisy sensor measurementsrdquo Journal ofEngineering for Gas Turbines and Power vol 133 no 8 ArticleID 081602 10 pages 2011
[11] C E Shannon ldquoAmathematical theory of communicationrdquoTheBell System Technical Journal vol 27 pp 379ndash423 1948
[12] S M Pincus ldquoApproximate entropy as a measure of systemcomplexityrdquo Proceedings of the National Academy of Sciences ofthe United States of America vol 88 no 6 pp 2297ndash2301 1991
[13] L I Kuncheva and C J Whitaker ldquoMeasures of diversity inclassifier ensembles and their relationship with the ensembleaccuracyrdquoMachine Learning vol 51 no 2 pp 181ndash207 2003
[14] I Csiszar ldquoAxiomatic characterizations of information mea-suresrdquo Entropy vol 10 no 3 pp 261ndash273 2008
[15] M Zanin L Zunino O A Rosso and D Papo ldquoPermutationentropy and itsmain biomedical and econophysics applicationsa reviewrdquo Entropy vol 14 no 8 pp 1553ndash1577 2012
The Scientific World Journal 13
[16] C Wang and H Shen ldquoInformation theory in scientific visual-izationrdquo Entropy vol 13 pp 254ndash273 2011
[17] J R Quinlan ldquoInduction of decision treesrdquo Machine Learningvol 1 no 1 pp 81ndash106 1986
[18] J QuinlanC45 Programs forMachine Learning Morgan Kauf-mann 1993
[19] D N Reshef Y A Reshef H K Finucane et al ldquoDetectingnovel associations in large data setsrdquo Science vol 334 no 6062pp 1518ndash1524 2011
[20] C J C Burges ldquoA tutorial on support vector machines forpattern recognitionrdquo Data Mining and Knowledge Discoveryvol 2 no 2 pp 121ndash167 1998
[21] QHu D YuW Pedrycz andD Chen ldquoKernelized fuzzy roughsets and their applicationsrdquo IEEE Transactions on Knowledgeand Data Engineering vol 23 no 11 pp 1649ndash1667 2011
[22] Q Hu H Zhao Z Xie and D Yu ldquoConsistency based attributereductionrdquo in Advances in Knowledge Discovery and DataMining Z-H Zhou H Li and Q Yang Eds vol 4426 ofLecture Notes in Computer Science pp 96ndash107 Springer BerlinGermany 2007
[23] Q Hu D Yu and Z Xie ldquoInformation-preserving hybrid datareduction based on fuzzy-rough techniquesrdquo Pattern Recogni-tion Letters vol 27 no 5 pp 414ndash423 2006
[24] R Gilad-Bachrach A Navot and N Tishby ldquoMargin basedfeature selectionmdashtheory and algorithmsrdquo in Proceedings of the21th International Conference on Machine Learning (ICML rsquo04)pp 337ndash344 July 2004
[25] J Shawe-Taylor and N Cristianini Kernel Methods for PatternAnalysis Cambridge University Press 2004
[26] Q Hu Z Xie andD Yu ldquoHybrid attribute reduction based on anovel fuzzy-roughmodel and information granulationrdquo PatternRecognition vol 40 no 12 pp 3509ndash3521 2007
[27] R Jensen and Q Shen ldquoNew approaches to fuzzy-rough featureselectionrdquo IEEE Transactions on Fuzzy Systems vol 17 no 4 pp824ndash838 2009
[28] S Zhao E C C Tsang and D Chen ldquoThe model of fuzzyvariable precision rough setsrdquo IEEE Transactions on FuzzySystems vol 17 no 2 pp 451ndash467 2009
[29] L Breiman J H Friedman R A Olshen and C J StoneClassification and Regression Trees Wadsworth amp BrooksColeAdvanced Books amp Software Monterey Calif USA 1984
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
12 The Scientific World Journal
(2) if F2 gt 050 and F37 le 049 then the decision is Class1
(3) if 018 lt F2 le 050 and F3 gt 041 then the decision isClass 5
(4) if 018 lt F2 le 050 and F3 le 041 then the decision isClass 3
(5) if F2 le 018 then the decision is Class 2
As to the second decision tree we can also obtain somerules as
(1) if F39 gt 045 and F38 gt 080 then the decision is Class4
(2) if F39 gt 045 and F38 le 080 then the decision is Class1
(3) if 017 lt F39 le 045 then the decision is Class 2(4) if F39 le 017 and F40 gt 042 then the decision is Class
5(5) if F39 le 017 and F40 le 042 then the decision is Class
3
We can see the derived decision trees are rather simpleand each can extract five pieces of rules It is very easy fordomain experts to understand the rules and even revise therules As the classification task is a little simple the accuracyof each model is high to 97 As new samples and faults arerecorded by the systemmore andmore complex tasksmay bestored In that case the model may become more and morecomplex
6 Conclusions and Future Works
Automatic fault detection and diagnosis are highly desirablein some industries such as offshore oil well drilling plat-forms for such systems are self-monitoring without manon duty In this work we design an intelligent abnormalitydetection and fault recognition technique for the exhaustsystem of gas turbines based on information entropy whichis used in measuring the uniformity of exhaust temperaturesevaluating the significance of features in kernel spaces andselecting splitting nodes for constructing decision trees Themain contributions of the work are two parts First weintroduce the entropy function to measure the uniformity ofexhaust temperatures The measurement is easy to computeand understand Numerical experiments also show its effec-tiveness Second we extend Shannon entropy for evaluatingthe significance of attributes in kernelized feature spaces Wecompute the relevance between a kernel matrix induced witha set of attributes and the matrix computed with the decisionvariable Some numerical experiments are also presentedGood results are derived
Although this work gives an effective framework forautomatic fault detection and recognition the proposedtechnique is not tested on large-scale real tasks We havedeveloped a remote state monitoring and fault diagnosis sys-tem Large scale data are flooding into the center In thefuture we will improve these techniques and develop areliable diagnosis system
Conflict of Interests
The authors declare that they have no conflict of interestsregarding the publication of this paper
Acknowledgment
This work is partially supported by National Natural Founda-tion under Grants 61222210 and 61105054
References
[1] C. Wang, L. Xu, and W. Peng, "Conceptual design of remote monitoring and fault diagnosis systems," Information Systems, vol. 32, no. 7, pp. 996–1004, 2007.
[2] W. Donat, K. Choi, W. An, S. Singh, and K. Pattipati, "Data visualization, data reduction and classifier fusion for intelligent fault diagnosis in gas turbine engines," Journal of Engineering for Gas Turbines and Power, vol. 130, no. 4, Article ID 041602, 2008.
[3] Y. G. Li and P. Nilkitsaranont, "Gas turbine performance prognostic for condition-based maintenance," Applied Energy, vol. 86, no. 10, pp. 2152–2161, 2009.
[4] H. Bassily, R. Lund, and J. Wagner, "Fault detection in multivariate signals with applications to gas turbines," IEEE Transactions on Signal Processing, vol. 57, no. 3, pp. 835–842, 2009.
[5] K. Young, D. Lee, V. Vitali, and Y. Ming, "A fault diagnosis method for industrial gas turbines using Bayesian data analysis," Journal of Engineering for Gas Turbines and Power, vol. 132, Article ID 041602, 2010.
[6] B. Yu, D. Liu, and T. Zhang, "Fault diagnosis for micro-gas turbine engine sensors via wavelet entropy," Sensors, vol. 11, no. 10, pp. 9928–9941, 2011.
[7] S. Wu, P. Wu, C. Wu, J. Ding, and C. Wang, "Bearing fault diagnosis based on multiscale permutation entropy and support vector machine," Entropy, vol. 14, no. 8, pp. 1343–1356, 2012.
[8] S. Wu, C. Wu, T. Wu, and C. Wang, "Multi-scale analysis based ball bearing defect diagnostics using Mahalanobis distance and support vector machine," Entropy, vol. 15, no. 2, pp. 416–433, 2013.
[9] H. A. Nozari, M. A. Shoorehdeli, S. Simani, and H. D. Banadaki, "Model-based robust fault detection and isolation of an industrial gas turbine prototype using soft computing techniques," Neurocomputing, vol. 91, pp. 29–47, 2012.
[10] S. Sarkar, X. Jin, and A. Ray, "Data-driven fault detection in aircraft engines with noisy sensor measurements," Journal of Engineering for Gas Turbines and Power, vol. 133, no. 8, Article ID 081602, 10 pages, 2011.
[11] C. E. Shannon, "A mathematical theory of communication," The Bell System Technical Journal, vol. 27, pp. 379–423, 1948.
[12] S. M. Pincus, "Approximate entropy as a measure of system complexity," Proceedings of the National Academy of Sciences of the United States of America, vol. 88, no. 6, pp. 2297–2301, 1991.
[13] L. I. Kuncheva and C. J. Whitaker, "Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy," Machine Learning, vol. 51, no. 2, pp. 181–207, 2003.
[14] I. Csiszar, "Axiomatic characterizations of information measures," Entropy, vol. 10, no. 3, pp. 261–273, 2008.
[15] M. Zanin, L. Zunino, O. A. Rosso, and D. Papo, "Permutation entropy and its main biomedical and econophysics applications: a review," Entropy, vol. 14, no. 8, pp. 1553–1577, 2012.
The Scientific World Journal 13
[16] C. Wang and H. Shen, "Information theory in scientific visualization," Entropy, vol. 13, pp. 254–273, 2011.
[17] J. R. Quinlan, "Induction of decision trees," Machine Learning, vol. 1, no. 1, pp. 81–106, 1986.
[18] J. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, 1993.
[19] D. N. Reshef, Y. A. Reshef, H. K. Finucane et al., "Detecting novel associations in large data sets," Science, vol. 334, no. 6062, pp. 1518–1524, 2011.
[20] C. J. C. Burges, "A tutorial on support vector machines for pattern recognition," Data Mining and Knowledge Discovery, vol. 2, no. 2, pp. 121–167, 1998.
[21] Q. Hu, D. Yu, W. Pedrycz, and D. Chen, "Kernelized fuzzy rough sets and their applications," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 11, pp. 1649–1667, 2011.
[22] Q. Hu, H. Zhao, Z. Xie, and D. Yu, "Consistency based attribute reduction," in Advances in Knowledge Discovery and Data Mining, Z.-H. Zhou, H. Li, and Q. Yang, Eds., vol. 4426 of Lecture Notes in Computer Science, pp. 96–107, Springer, Berlin, Germany, 2007.
[23] Q. Hu, D. Yu, and Z. Xie, "Information-preserving hybrid data reduction based on fuzzy-rough techniques," Pattern Recognition Letters, vol. 27, no. 5, pp. 414–423, 2006.
[24] R. Gilad-Bachrach, A. Navot, and N. Tishby, "Margin based feature selection: theory and algorithms," in Proceedings of the 21st International Conference on Machine Learning (ICML '04), pp. 337–344, July 2004.
[25] J. Shawe-Taylor and N. Cristianini, Kernel Methods for Pattern Analysis, Cambridge University Press, 2004.
[26] Q. Hu, Z. Xie, and D. Yu, "Hybrid attribute reduction based on a novel fuzzy-rough model and information granulation," Pattern Recognition, vol. 40, no. 12, pp. 3509–3521, 2007.
[27] R. Jensen and Q. Shen, "New approaches to fuzzy-rough feature selection," IEEE Transactions on Fuzzy Systems, vol. 17, no. 4, pp. 824–838, 2009.
[28] S. Zhao, E. C. C. Tsang, and D. Chen, "The model of fuzzy variable precision rough sets," IEEE Transactions on Fuzzy Systems, vol. 17, no. 2, pp. 451–467, 2009.
[29] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone, Classification and Regression Trees, Wadsworth & Brooks/Cole Advanced Books & Software, Monterey, Calif, USA, 1984.