
IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS, VOL. 63, NO. 5, MAY 2016 3299

A Hybrid Feature Selection Scheme for Reducing Diagnostic Performance Deterioration Caused by Outliers in Data-Driven Diagnostics

Myeongsu Kang, Md. Rashedul Islam, Jaeyoung Kim, Jong-Myon Kim, Member, IEEE, and Michael Pecht, Fellow, IEEE

Abstract—In practice, outliers, defined as data points that are distant from the other agglomerated data points in the same class, can seriously degrade diagnostic performance. To reduce diagnostic performance deterioration caused by outliers in data-driven diagnostics, an outlier-insensitive hybrid feature selection (OIHFS) methodology is developed to assess feature subset quality. In addition, a new feature evaluation metric is created as the ratio of the intraclass compactness to the interclass separability estimated by understanding the relationship between data points and outliers. The efficacy of the developed methodology is verified with a fault diagnosis application by identifying defect-free and defective rolling element bearings under various conditions.

Index Terms—Data-driven diagnostics, outlier detection, outlier-insensitive hybrid feature selection (OIHFS), rolling element bearings.

NOMENCLATURE

Bd               Roller diameter.
Ci               Intraclass compactness of the ith class.
Coverall         Overall compactness estimated from per-class compactness.
cdpi             Centroid data point of the ith class.
D                A set of data points in a class.
Dcentroid        Centroid data point.
Di               ith data point in a class.

Manuscript received April 24, 2015; revised July 31, 2015, September 10, 2015, and December 13, 2015; accepted January 9, 2016. Date of publication February 11, 2016; date of current version April 8, 2016. This work was supported in part by the National Research Foundation of Korea funded by the Ministry of Education, Science, and Technology of Korea under Grant NRF-2013R1A2A2A05004566 and Grant NRF-2015K2A1A2070866, in part by the over 100 CALCE members of the CALCE Consortium, and in part by the National Natural Science Foundation of China under Grant 71420107023. (Corresponding author: Jong-Myon Kim.)

M. Kang was with the Department of Electrical Engineering, University of Ulsan, Ulsan 44610, Korea. He is now with the Center for Advanced Life Cycle Engineering, The University of Maryland, College Park, MD 20742 USA (e-mail: [email protected]).

Md. R. Islam and J. Kim are with the School of Electrical, Electronic, and Computer Engineering, University of Ulsan, Ulsan 44610, Korea (e-mail: [email protected]; [email protected]).

J.-M. Kim is with the Department of IT Convergence, University of Ulsan, Ulsan 44610, Korea (e-mail: [email protected]).

M. Pecht is with the Center for Advanced Life Cycle Engineering, The University of Maryland, College Park, MD 20742 USA (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TIE.2016.2527623

dpj,k            kth data point in class j.
Edistl           Euclidean distance between the centroid data point and the lth outlier.
Emetric          Feature evaluation metric.
Fshaft           Shaft speed in hertz.
Ledist, Redist   Two Euclidean distance values specified by the membership level.
Li               Cumulative distance to all the other data points, calculated using a norm metric for the Di.
Mdisti           Maximum Euclidean distance between the ith data point and its neighboring data points.
membershipi      Membership degree of the ith data point.
membershiplevel  Membership level to determine outlier candidates.
Nanaldata        Number of data points per bearing condition in the analysis dataset.
Nclasses         Number of bearing conditions.
NFN              Number of data points in class i that are not correctly classified as class i.
Nfeatures        Number of fault features (or signatures).
Niterations      Number of iterations for both the filter and wrapper methods.
Noutliers        Number of outliers in each class.
Nrollers         Number of rollers in a rolling element bearing.
NTP              Number of data points in class i that are correctly classified as class i.
Ntfreqbins       Total number of frequency bins.
Ntotaldata       Total number of data points used to test the k-nearest neighbor (k-NN) classifier.
Ntrdata          Total number of data points used to train the k-NN classifier.
Ntsamples        Total number of data samples in five-second acoustic emission data sampled at 250 kHz, x(n).
n                Number of data points (n = Nanaldata in this study).
outlieri,l       lth outlier in class i.
Pd               Pitch diameter.
RVorder          Order of random variation in the theoretical bearing characteristic frequencies.
S(f)             Magnitude response of the fast Fourier transform of x(n).

0278-0046 © 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.


Si               Interclass separability of the ith class.
Soverall         Overall separability estimated from per-class separability.
vDi              Two-dimensional (2-D) vector used to determine whether a data point Di is an outlier.
α                Contact angle.
ε                Constant value to control the membership level.
x̄, σ             Mean and standard deviation of x(n).

I. INTRODUCTION

DURING the past several decades, model-based fault detection and diagnosis (FDD) techniques have been extensively used to enhance the reliability and availability of industrial systems subject to faults [1]–[3]. However, industrial processes have become more complicated in recent years, and there is less tolerance for both performance deterioration and productivity decreases. Hence, conventional model-based schemes, which demand a deep knowledge of process models derived from first principles, are impractical for complicated industrial processes [4], [5]. Fortunately, the rapid development of data acquisition, data mining, and machine-learning techniques is facilitating the collection and storage of massive amounts of data (e.g., current, vibration, and acoustic emission), the extraction of useful inherent information, and the classification of fault types, respectively, in large-scale industries. Accordingly, data-driven methodologies are alternatives that can be used for efficient process monitoring and fault diagnosis [6]–[8].

Today’s FDD approaches use high-dimensional feature vectors to avoid the risk of missing information needed for reliable FDD. However, some of the fault features are redundant or irrelevant to the predictive models (i.e., supervised and unsupervised learning). These redundant or irrelevant fault features can be a primary cause of diagnostic performance degradation. To address this issue, discriminatory fault feature (or signature) selection has become an indispensable part of reliable FDD. Essentially, two procedures are performed in the feature selection process: a number of feature subsets (or subsets of fault signatures) are first formed (i.e., the configuration process of feature subsets) and then evaluated (i.e., the evaluation process of feature subset quality). Based on the evaluation process, feature selection schemes are categorized into filters or wrappers. Filter approaches employ an evaluation strategy that is independent of any classification scheme, while wrapper methods use accuracy estimates for specific classifiers during the assessment of feature subset quality [9]. Accordingly, wrapper methodologies theoretically offer better diagnostic performance for predefined classifiers than filter methods. However, filter approaches are computationally efficient since they avoid the accuracy estimation process for a certain classifier.

To achieve high computational efficiency and diagnostic performance concurrently, recent intelligent FDD approaches have adopted hybrid feature selection (HFS) schemes that appropriately exploit the advantages of the filter and wrapper methods [10], [11]. Liu et al. presented an HFS approach for the effective identification of various failures in a direct-drive wind turbine [10]. More specifically, the HFS method consists of a global geometric similarity scheme that yields promising feature subsets and a predefined classifier (e.g., support vector machine or general regression neural network) to predict diagnostic performance (or classification accuracies) with these feature subsets. In [11], Yang et al. proposed a method to improve diagnostic performance by introducing an HFS framework based on an unsupervised learning model. This method is effective for bearing fault diagnosis with fewer fault signatures that are closely related to single and multiple-combined bearing defects.

Along with the negative effects of high-dimensional feature vectors, current FDD approaches are also subject to diagnostic performance deterioration caused by outliers, defined as data points that are distant from the other agglomerated data points in the same class. To address this issue, an outlier-insensitive hybrid feature selection (OIHFS) methodology is developed in this study. This method mainly consists of sequential forward floating search (SFFS) [12], [13], precise assessment of the quality of feature subsets, and accuracy estimation of a classifier. In the OIHFS, the SFFS is first used to produce a series of feature subsets. Then, a discriminatory feature subset candidate that is insensitive to outliers is determined among them. To successfully discriminate the feature subset candidate, proper assessment of feature subset quality is a key issue. Accordingly, this study introduces a new feature subset evaluation metric, defined as the ratio of the intraclass compactness to the interclass separability. Both the compactness and the separability are estimated by understanding the relationship between the data points and outliers. A technique to accurately detect outliers based on the following attributes is further investigated in this study: 1) large distances from the agglomerated data points in the same class and 2) low membership degrees in the same class.

Once a discriminatory feature subset candidate is determined, it is further used to predict the classification accuracy of a k-nearest neighbors (k-NN) algorithm, where the k-NN algorithm is a nonparametric method of classification. Additionally, both the filter method for determining a discriminatory feature subset candidate and the wrapper method for estimating accuracy with that feature subset candidate are carried out multiple times via k-fold cross validation (k-cv) in the OIHFS methodology. This iterative process is effective for reducing the variability of the feature subsets. However, since this process results in several discriminatory feature subset candidates, it is necessary to determine the most discriminatory feature subset that will eventually be used for data-driven diagnostics. To deal with this issue, a decision rule based on both accuracy estimates and the frequency of feature subsets is employed. In this study, the efficacy of the OIHFS methodology is validated with a low-speed rolling element-bearing fault-diagnosis application.
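The decision rule mentioned above could be sketched as follows. The paper only states that both accuracy estimates and the frequency of feature subsets are used, so the subset representation, function name, and tie-breaking order (frequency first, then mean accuracy) are illustrative assumptions:

```python
# Hypothetical sketch of the decision rule: among the candidate feature
# subsets produced across iterations, prefer the one selected most often,
# breaking ties by mean estimated accuracy. Data below is illustrative.
from collections import defaultdict

def most_discriminatory_subset(candidates):
    """candidates: list of (feature_subset, accuracy) pairs."""
    stats = defaultdict(list)
    for subset, acc in candidates:
        stats[tuple(sorted(subset))].append(acc)
    # Rank by frequency first, then by mean accuracy.
    return max(stats.items(),
               key=lambda kv: (len(kv[1]), sum(kv[1]) / len(kv[1])))[0]

candidates = [((1, 3, 7), 0.95), ((1, 3, 7), 0.93), ((2, 5), 0.97)]
print(most_discriminatory_subset(candidates))  # (1, 3, 7)
```

Here the subset (1, 3, 7) wins because it appears twice, even though (2, 5) has the single highest accuracy estimate.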

This paper is organized as follows. In Section II, the data acquisition environment for performing bearing fault diagnosis is illustrated. In Section III, the OIHFS scheme is presented, and its efficacy is validated in Section IV. Conclusions and suggestions for future work are provided in Section V.


KANG et al.: HFS SCHEME FOR REDUCING DIAGNOSTIC PERFORMANCE DETERIORATION 3301

Fig. 1. Example of single and multiple-combined seeded bearing failures with crack length, width, and depth of 3, 0.35, and 0.3 mm, respectively. (a) BCI, (b) BCO, (c) BCR, (d) BCIO, (e) BCIR, (f) BCOR, and (g) BCIOR, where the number of rollers is 13, the contact angle is 0°, and the roller and pitch diameters are 9 and 46.5 mm, respectively.

Fig. 2. (a) Machinery fault simulator. (b) Data acquisition system.

II. ACOUSTIC EMISSION DATA ACQUISITION FOR ROLLING ELEMENT-BEARING FAULT DIAGNOSIS

As mentioned above, a bearing fault-diagnosis application is used to determine whether the OIHFS methodology is useful for data-driven FDD approaches. This study uses one defect-free and seven defective cylindrical roller bearings (FAG NJ206-E-TVP2). Specifically, the defective bearings have either a single crack or multiple-combined cracks in the following locations (see Fig. 1): the inner raceway (BCI), outer raceway (BCO), roller (BCR), inner and outer raceways (BCIO), inner raceway and roller (BCIR), outer raceway and roller (BCOR), and inner raceway, outer raceway, and roller (BCIOR).

Since, according to [14], the acoustic emission (AE) sensor is capable of capturing intrinsic information about defect-free bearings (DFBs) and defective bearings, this study records AE data sampled at 250 kHz. For this purpose, a PCI-2-based system [15] that is connected to a general-purpose wideband frequency AE sensor (WSα from Physical Acoustics Corporation) [16] is used, as shown in Fig. 2. More specifically, the AE sensor is attached at the top of the nondrive end-bearing housing, and its distance from the nondrive end cylindrical roller bearing is 21.48 mm. Likewise, this study uses various datasets involving 90 five-second AE data samples for each bearing condition (e.g., a defect-free bearing and seven defective bearings) for efficacy validation of the OIHFS method, as presented in Table I.

A. Load Condition and Defect Severity of Defective Bearings

In 2006, Al-Ghamd et al. demonstrated the effectiveness of using a root-mean-square (RMS) method to measure load conditions [17]. Accordingly, in this study, the RMS values are calculated for five-second AE data obtained from DFBs rotating at 500 RPM under no-load and load conditions. As illustrated in Fig. 3, the RMS value for a DFB with a slight load condition is 0.2714, which is approximately three times higher than the RMS value of 0.0838 for a DFB without the load.

In addition, Al-Ghamd et al. conducted experiments to evaluate bearing defect severity by exploiting the kurtosis and RMS values, and they concluded that RMS was suitable for measuring the degree of bearing defect severity [17]. Thus, this study employs the same method to show the defect severity of defective bearings under the slight load condition. As shown in Fig. 4, the RMS value increases with bearing defect severity. More specifically, the RMS values, on average, increase 1.81-fold, 1.97-fold, 2.56-fold, 5.39-fold, 2.53-fold, and 7.1-fold for the BCI, BCO, BCR, BCIO, BCIR, BCOR, and BCIOR conditions, respectively, as severity worsens.

III. OUTLIER-INSENSITIVE HYBRID FEATURE SELECTION

As illustrated in Fig. 5, a dataset (see Table I) is divided into two different subdatasets: an analysis dataset and an evaluation dataset. This is to guarantee the reliability of the performance evaluation results by isolating the analysis dataset from the evaluation dataset. In this study, 30 of the 90 five-second AE data samples for each bearing condition are used to configure the analysis dataset, while the remaining data are reserved as the evaluation dataset (i.e., the 60 of 90 five-second AE data samples for each bearing condition not involved in the analysis dataset). That is, the analysis dataset is used to determine the most discriminatory feature subset for bearing fault diagnosis, whereas the evaluation dataset is employed for efficacy verification of the OIHFS methodology.

A. Fault Signature Pool Configuration

In Fig. 5, the analysis dataset is used to configure a fault-signature pool and to determine the most discriminatory feature subset. According to [18], statistical parameters from the time and frequency domains are well corroborated with intelligent fault-diagnosis schemes. Thus, this study uses them as fault signatures for the identification of various single and multiple-combined bearing defects. Tables II and III define statistical parameters for the given five-second AE data, x(n). These parameters are calculated in the time and frequency domains, and include the root-mean-square (RMS, f1), square root of the amplitude (SRA, f2), kurtosis (f3), skewness (f4), peak-to-peak (PP, f5), crest factor (CF, f6), impulse factor (IF, f7), margin factor (MF, f8), shape factor (SF, f9), kurtosis factor (KF, f10), frequency center (FC, f11), RMS frequency (RMSF, f12), and root variance frequency (RVF, f13).
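As an illustration, a few of these time-domain signatures can be computed directly from the raw samples. The sketch below follows the standard textbook definitions of RMS, SRA, PP, and CF rather than Table II verbatim, and the function names are ours:

```python
# Minimal sketch of four time-domain fault signatures for a signal x(n),
# assuming standard definitions (not copied from Table II).
import math

def rms(x):
    # Root-mean-square (f1)
    return math.sqrt(sum(v * v for v in x) / len(x))

def sra(x):
    # Square root of the amplitude (f2)
    return (sum(math.sqrt(abs(v)) for v in x) / len(x)) ** 2

def peak_to_peak(x):
    # Peak-to-peak (f5)
    return max(x) - min(x)

def crest_factor(x):
    # Crest factor (f6): peak absolute amplitude over RMS
    return max(abs(v) for v in x) / rms(x)

x = [0.0, 1.0, -1.0, 2.0, -2.0, 1.0]
print(round(rms(x), 4), peak_to_peak(x))
```

In practice, each five-second AE data sample would yield one value per signature, producing one row of the fault-signature pool.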


TABLE I
DETAILED DESCRIPTION OF VARIOUS DATASETS USED TO EVALUATE THE EFFICACY OF THE OIHFS METHODOLOGY

Fig. 3. RMS values of five-second AE data obtained from DFBs rotating at a speed of 500 RPM under (a) no-load and (b) slight-load conditions.

Fig. 4. Defect severity of defective bearings considered in this study.

For bearing failures, there are characteristic (or defect) frequencies at which faulty symptoms must be observable. This has prompted us to use statistical values, computed around harmonics of these characteristic frequencies in an envelope power spectrum, as additional fault signatures. A detailed description of the method for obtaining an envelope power spectrum for the given AE data is provided in [19].

The aforementioned faulty symptoms can be revealed at one of the following four defect frequencies as the roller strikes a local failure in the bearing [20]:

BPFI = (Nrollers · Fshaft / 2) · (1 + (Bd/Pd) · cos α)
BPFO = (Nrollers · Fshaft / 2) · (1 − (Bd/Pd) · cos α)
BSF = (Pd · Fshaft / (2 · Bd)) · (1 − ((Bd/Pd) · cos α)²)
FTF = (Fshaft / 2) · (1 − (Bd/Pd) · cos α)    (1)

where BPFI is the ball pass frequency of the inner raceway, BPFO is the ball pass frequency of the outer raceway, BSF is the ball spin frequency, and FTF is the fundamental train frequency. In (1), these characteristic frequencies depend on the following parameters: the number of rollers (Nrollers), the shaft speed in hertz (Fshaft), the contact angle (α), the roller diameter (Bd), and the pitch diameter (Pd).
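For illustration, (1) can be evaluated with the bearing geometry stated in Fig. 1 (13 rollers, Bd = 9 mm, Pd = 46.5 mm, contact angle 0°); the 500-RPM shaft speed is taken from the load-condition experiments, and the function name is ours:

```python
# Sketch of the characteristic-frequency formulas in (1).
import math

def bearing_defect_freqs(n_rollers, f_shaft, bd, pd, alpha_deg=0.0):
    """Return (BPFI, BPFO, BSF, FTF) in Hz for the given geometry."""
    r = (bd / pd) * math.cos(math.radians(alpha_deg))
    bpfi = n_rollers * f_shaft / 2 * (1 + r)
    bpfo = n_rollers * f_shaft / 2 * (1 - r)
    bsf = pd * f_shaft / (2 * bd) * (1 - r * r)
    ftf = f_shaft / 2 * (1 - r)
    return bpfi, bpfo, bsf, ftf

# 500 RPM -> Fshaft = 500/60 Hz; geometry from Fig. 1
bpfi, bpfo, bsf, ftf = bearing_defect_freqs(13, 500 / 60, 9.0, 46.5)
print(round(bpfi, 2), round(bpfo, 2), round(bsf, 2), round(ftf, 2))
```

A useful sanity check is that BPFI + BPFO = Nrollers · Fshaft and BPFO = Nrollers · FTF, which follow directly from (1).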

In fact, narrow-band RMSF values (see Table III) are used as additional fault signatures, computed around each of BPFI, BPFO, and 2× BSF, up to the third harmonic of each of these defect frequencies in an envelope power spectrum obtained from a five-second AE data sample. Moreover, this study assumes that a bearing contains all of the possible bearing defects (i.e., a crack on the inner raceway, the outer raceway, and a roller). Accordingly, in total, nine RMSF values are further used as fault signatures: RMSFBPFI1 (f14), RMSFBPFI2 (f15), RMSFBPFI3 (f16), RMSFBPFO1 (f17), RMSFBPFO2 (f18), RMSFBPFO3 (f19), RMSF2×BSF1 (f20), RMSF2×BSF2 (f21), and RMSF2×BSF3 (f22).

In a bearing, the radial load greatly influences the force of the impact caused by rolling over a defect. The outer raceway is a stationary component of the bearing, and thus a defect of the outer raceway is subjected to the same force at each roll. On the other hand, a fault of the inner raceway is subjected to variable forces because it rotates near the shaft speed. Consequently, all the harmonics of the BPFI are amplitude-modulated by the RPM of the shaft (i.e., Fshaft). Similarly, 2× BSF, caused by a roller defect, is amplitude-modulated by the RPM of the cage (i.e., FTF). Theoretically, amplitude modulation by either inner-raceway or roller-related bearing defects produces sidebands that are spaced apart by the modulation frequency (i.e., Fshaft or FTF) and centered about the BPFI or 2× BSF. Furthermore, it is common to observe a random variation in the calculated bearing-defect frequencies on the order of 1%–2% [21]. Thus, in this study, narrow-band RMSF values are calculated in frequency ranges with a random variation of 2%. Namely, RMSFBPFIi, RMSFBPFOi, and RMSF2×BSFi are computed in the ranges from (1 − RVorder/2) · (BPFIi − 2 · Fshaft) to (1 + RVorder/2) · (BPFIi + 2 · Fshaft), from (1 − RVorder/2) · BPFOi to (1 + RVorder/2) · BPFOi, and from (1 − RVorder/2) · (2× BSFi − 2 · FTF) to (1 + RVorder/2) · (2× BSFi + 2 · FTF), respectively, where RVorder is the order of random variation of the theoretical


Fig. 5. Overall flow diagram of bearing fault diagnosis.

TABLE II
DEFINITION OF TIME-DOMAIN STATISTICAL PARAMETERS USED IN THIS STUDY

Ntsamples is the total number of data samples in five-second AE data sampled at 250 kHz, x(n). x̄ and σ are the mean and the standard deviation of x(n), respectively.

TABLE III
DEFINITION OF FREQUENCY-DOMAIN STATISTICAL PARAMETERS USED IN THIS STUDY

S(f) is the magnitude response of the fast Fourier transform of x(n). Ntfreqbins is the total number of frequency bins.

characteristic frequencies (RVorder = 2% in this study) and i is an index indicating the ith harmonic of BPFI, BPFO, and 2× BSF (i = 1, 2, and 3 in this study).
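A small helper can make the band arithmetic concrete. This is a sketch of the ranges stated above, with the sideband argument set to Fshaft for BPFI harmonics, FTF for 2× BSF harmonics, and 0 for BPFO harmonics; the function name and example values are ours:

```python
# Sketch of the narrow-band frequency range around a harmonic `center`,
# widened by RVorder (2% here) and optionally by a sideband spacing.
def narrowband_range(center, rv_order=0.02, sideband=0.0):
    lo = (1 - rv_order / 2) * (center - 2 * sideband)
    hi = (1 + rv_order / 2) * (center + 2 * sideband)
    return lo, hi

# BPFO-style band (no sidebands), around a hypothetical 100-Hz harmonic
print(narrowband_range(100.0))
# BPFI-style band with an assumed Fshaft of 500/60 Hz
print(narrowband_range(100.0, sideband=500 / 60))
```

The RMSF value for each signature is then computed over the envelope-power-spectrum bins that fall inside the returned [lo, hi] range.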

In summary, the dimensionality of the fault-signature pool used in the feature selection process is Nfeatures × Nanaldata × Nclasses, where Nfeatures is the number of fault signatures (Nfeatures = 22 in this study), Nanaldata is the number of data points per bearing condition in the analysis dataset (Nanaldata = 30 in this study), and Nclasses is the number of bearing conditions to be discriminated (Nclasses = 8 in this study).

B. OIHFS Scheme

As shown in Fig. 6, k-cv [22] is used to divide the Nfeatures × Nanaldata × Nclasses-dimensional fault-signature pool randomly into k Nfeatures × (Nanaldata/k) × Nclasses-dimensional fault-signature subpools. Accordingly, the filter method in OIHFS can yield k discriminatory feature subset candidates. Then, these feature subsets are used in the wrapper method for accuracy estimation of the k-NN. As briefly mentioned in Section I, both the filter and wrapper methods are performed Niterations times (Niterations = 20 in this study). This iterative process with cross validation reduces the variability of the most discriminatory fault-signature subset, which is used for reliable FDD. More details about the OIHFS approach are given below.

1) A Metric to Assess the Quality of Feature Subsets in OIHFS: SFFS is used to yield a series of feature subsets. To determine a promising feature subset candidate from these, a metric to precisely assess the quality of feature subsets is needed. Recently, Kang et al. presented an efficient multivariate feature evaluation criterion using average values of pair-wise Euclidean distances to measure the intraclass compactness and interclass separability [23]. Based on this criterion, the authors greatly improved diagnostic performance. In practice, the above-mentioned feature evaluation criterion is effective for determining a feature subset that minimizes the intraclass compactness and maximizes the interclass separability. However, diagnostic performance deterioration is likely, since the estimation of both the compactness and separability based on average values of pair-wise Euclidean distances does not take into account the impact of outliers.


Fig. 6. Overall process of the OIHFS methodology.

Fig. 7. Example of a 2-D fault-signature distribution, where the dotted circle (left) indicates the degree of intraclass compactness of a class, and the dotted line (right) indicates the pair-wise Euclidean distance between two data samples in different classes for estimating the interclass separability of a class.

The example in Fig. 7 demonstrates why it is necessary to consider outliers when estimating both the intraclass compactness and the interclass separability. In Fig. 7, three outliers are clearly observed in class 1. Although there is a high probability that these outliers will not be correctly discriminated into class 1, the ratio of the intraclass compactness to the interclass separability may still be low when average values of pair-wise Euclidean distances are used, which suggests that these features are effective for separating class 1 from the others. This is because most of the data points in class 1 are closely agglomerated and separate from data points belonging to other classes. Hence, this study investigates a way to measure both the compactness and the separability by considering outliers.

In the developed OIHFS methodology, the intraclass compactness of a class is defined as the maximum Euclidean distance between a centroid data point and outliers. Thus, a centroid data point and outliers must be detected in each class so that the intraclass compactness can be estimated. Let D = {D1, D2, D3, . . . , Dn} be a set of data points in a class, where n is the total number of data points (i.e., n = Nanaldata in this study). In addition, each data point in D corresponds to a vector involving fault signatures specified by SFFS. Fig. 8 depicts an example of data point configuration in a class.

Fig. 8. Example of a data point configuration used in the OIHFS scheme to yield the most discriminatory feature subset.

For every data point Di, the norm metric is used to compute the cumulative distance to all other data points, resulting in

Li = Σ_{j=1}^{n} ‖Di − Dj‖2,  i = 1, 2, . . . , n.    (2)

The centroid data point Dcentroid is the data point yielding the minimum accumulated distance. That is, Dcentroid = Di with i = argmin_i {Li}. In practice, data points can be categorized as outliers if they have both of the following two attributes: a large distance from the agglomerated data points and a low membership degree. Specifically, the membership degree of a data point can be interpreted as the degree to which the point is affinitive with the class to which it belongs.
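The centroid selection in (2) amounts to an argmin over cumulative pair-wise distances; a minimal sketch (variable and function names are ours):

```python
# Sketch of (2): the centroid is the data point whose cumulative
# Euclidean distance to all other points in the class is minimal.
import math

def centroid_index(points):
    def dist(a, b):
        return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))
    # L_i for each point (the zero self-distance does not affect the argmin)
    cum = [sum(dist(p, q) for q in points) for p in points]
    return min(range(len(points)), key=cum.__getitem__)

pts = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (4.0, 4.0)]
print(centroid_index(pts))  # 2: (1.0, 1.0) is the most central point
```

Note that the distant point (4.0, 4.0) inflates every Li but pulls the argmin toward the cluster, which is why this centroid is comparatively robust to a single outlier.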

To effectively detect outliers, a 2-D vector involving the aforementioned two attributes, vDi = {disti, membershipi}, is needed for each data point, i = 1, 2, . . . , n. As depicted in Fig. 9, the maximum Euclidean distance between the ith data point and its neighboring data points, disti, is used as one of the outlier detection criteria. The number of neighboring data points is set to 3 in this study. In addition, a membership degree for the ith data point, membershipi, is assigned by a probability density function (pdf) of the maximal Euclidean distance. This


KANG et al.: HFS SCHEME FOR REDUCING DIAGNOSTIC PERFORMANCE DETERIORATION 3305

Fig. 9. 2-D vectors used for outlier detection.

Fig. 10. Example showing the process used to compute the intraclass compactness of a class.

membership degree is used as another outlier detection criterion. In this study, the membership level for determining outlier candidates, membership_level, is defined as

membership_level = min {pdf(dist_i)} + [max {pdf(dist_i)} − min {pdf(dist_i)}] · ε,  i = 1, 2, ..., n  (3)

where ε, a constant between 0 and 1, controls the membership level; it is experimentally determined and set to 0.25 in this study. Once membership_level is decided, at most two Euclidean distance values specified by that membership level can be obtained, denoted as L_edist and R_edist in Fig. 9.

Because outliers are greatly separated from the agglomerated data points in the same class, this study ultimately identifies the ith data point as an outlier if it satisfies dist_i ≥ R_edist and membership_i ≤ membership_level, i = 1, 2, ..., n. As illustrated in Fig. 10, the intraclass compactness of class i, C_i, is computed as follows:

C_i = max {Edist_l},  l = 1, 2, ..., N_outliers  (4)

where Edist_l is the Euclidean distance between the centroid data point and the lth outlier. Likewise, N_outliers is the total number of outliers in each class. Based on the per-class compactness, the overall compactness C_overall is finally estimated as

C_overall = max {C_i},  i = 1, 2, ..., N_classes.  (5)
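The outlier-detection and compactness steps can be sketched as follows. This is a simplified sketch, not the authors' code: the pdf is approximated with a small Gaussian kernel density estimate, the condition dist_i ≥ R_edist is approximated by requiring dist_i to lie to the right of the pdf's mode, and all function names are hypothetical.

```python
import numpy as np

def kde_pdf(x, samples, bw=None):
    """Tiny Gaussian kernel density estimate (stand-in for the pdf in Eq. (3))."""
    samples = np.asarray(samples, dtype=float)
    if bw is None:
        bw = 1.06 * samples.std() * len(samples) ** (-0.2) or 1.0  # Silverman's rule
    z = (np.asarray(x, dtype=float)[:, None] - samples[None, :]) / bw
    return np.exp(-0.5 * z ** 2).sum(axis=1) / (len(samples) * bw * np.sqrt(2 * np.pi))

def detect_outliers(D, n_neighbors=3, eps=0.25):
    """Indices of outliers in one class per the two criteria above."""
    pd = np.linalg.norm(D[:, None, :] - D[None, :, :], axis=2)
    # dist_i: maximum distance among the n_neighbors nearest neighbors
    dist = np.sort(pd, axis=1)[:, 1:n_neighbors + 1][:, -1]
    membership = kde_pdf(dist, dist)                    # membership_i = pdf(dist_i)
    level = membership.min() + (membership.max() - membership.min()) * eps  # Eq. (3)
    mode = dist[np.argmax(membership)]    # crude stand-in for where R_edist lies
    return np.where((membership <= level) & (dist > mode))[0]

def intraclass_compactness(D, centroid, outlier_idx):
    """C_i per Eq. (4): max distance from the centroid to any outlier."""
    return max(np.linalg.norm(centroid - D[i]) for i in outlier_idx)
```

C_overall is then simply the maximum C_i over all classes, per Eq. (5).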

Fig. 11 pictorially illustrates the process of estimating the interclass separability of class i, S_i, which is calculated as

S_i = min {mindist_cdp-to-dp, mindist_outlier-to-dp}  (6)

Fig. 11. Example showing how to calculate the interclass separability of a class.

TABLE IV DIFFERENCE IN MEANING OF THE SYMBOL k WHEN USED FOR k-FOLD CROSS VALIDATION AND k-NEAREST NEIGHBORS

where mindist_cdp-to-dp = min_{i≠j} {||cdp_i − dp_{j,k}||_2} ∀ i, j = 1, 2, ..., N_classes, ∀ k = 1, 2, ..., N_analdata, and mindist_outlier-to-dp = min_{i≠j} {||outlier_{i,l} − dp_{j,k}||_2} ∀ i, j = 1, 2, ..., N_classes, ∀ k = 1, 2, ..., N_analdata, ∀ l = 1, 2, ..., N_outliers. Moreover, cdp_i is the centroid data point of the ith class, dp_{j,k} is the kth data point in class j, and outlier_{i,l} is the lth outlier in class i. For the overall separability, this study uses the minimum value among the per-class separability estimates, denoted as S_overall:

S_overall = min {S_i},  i = 1, 2, ..., N_classes.  (7)

Based on both C_overall and S_overall, the metric used to assess the quality of any feature subset given by SFFS, E_metric, is defined as follows:

E_metric = C_overall / S_overall.  (8)

In summary, discriminatory feature subset candidates determined through the use of well-defined intraclass compactness and interclass separability estimates are helpful for reducing the diagnostic performance degradation caused by outliers in data-driven diagnostics.
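Putting Eqs. (5)-(8) together, the separability and metric computations can be sketched as follows. This is a minimal sketch with hypothetical names; it assumes the centroid and outliers of each class have already been found.

```python
import numpy as np

def interclass_separability(classes, centroids, outliers):
    """S_i per Eq. (6): for class i, the minimum Euclidean distance from
    its centroid cdp_i or any of its outliers to a data point of another class.

    classes:   list of (n_i, d) arrays, one per class
    centroids: list of (d,) centroid data points
    outliers:  list of lists of (d,) outlier points (may be empty)
    """
    S = []
    for i in range(len(classes)):
        probes = [np.asarray(centroids[i])] + [np.asarray(o) for o in outliers[i]]
        S.append(min(np.linalg.norm(p - q)
                     for j, other in enumerate(classes) if j != i
                     for q in other
                     for p in probes))
    return np.array(S)

def evaluation_metric(C_per_class, S_per_class):
    """Eq. (8): E_metric = C_overall / S_overall, with C_overall = max C_i
    (Eq. (5)) and S_overall = min S_i (Eq. (7)).  Lower values indicate a
    more compact, better-separated feature subset."""
    return max(C_per_class) / min(S_per_class)
```

Since SFFS proposes many candidate subsets, E_metric is evaluated once per candidate and the subset with the smallest value is retained.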

2) Accuracy Estimation of the k-NN Classifier in OIHFS: In Fig. 6, two-fold cross validation generates two randomly partitioned fault-signature subpools, denoted as the N_features × (N_analdata/2) × N_classes-dimensional fault-signature subpool1 and the N_features × (N_analdata/2) × N_classes-dimensional fault-signature subpool2, respectively. For accuracy estimation of the k-NN classifier, the subpool used to yield a discriminatory feature subset candidate is reserved as the training dataset, while the other subpool is employed as the test dataset. This is repeated



Fig. 12. Predictive classification accuracy of the k-NN classifier using 40 discriminatory feature subset candidates.

TABLE V SUMMARY OF FEATURE SUBSETS DETERMINED BY THE OIHFS METHODOLOGY

until each subpool is reserved as either a training dataset or a test dataset at least once.

In OIHFS, the aforementioned accuracy estimation process is repeated N_iterations times. Hence, a decision rule is required so that the most discriminatory feature subset can be determined

TABLE VI SUMMARY OF DISCRIMINATORY FEATURE SUBSETS YIELDED BY THE FEATURE EVALUATION METRIC IN [23]

from N_iterations × k feature subset candidates. More specifically, the decision rule is based not only on the predictive classification accuracy (or diagnostic performance) of the k-NN classifier but also on the frequency of the feature subsets. In this decision rule, the predictive classification accuracy has the priority in the



TABLE VII AVERAGE CLASSIFICATION ACCURACIES AND SENSITIVITIES FOR IDENTIFYING DEFECT-FREE AND DEFECTIVE BEARINGS VIA 20 TIMES k-CV (UNIT: %)

determination of which feature subset to use in the performance evaluation process (see Fig. 5). This study further considers the frequency of feature subset candidates in the decision rule. This is because some feature subset candidates may yield the same high level of classification accuracy even though they are different from each other.

In this study, the predictive classification accuracy (CA) is defined as follows:

CA = (Σ_{i=1}^{N_classes} N_TP) / N_totaldata × 100 (%)  (9)

where N_totaldata is the total number of data points used to test the k-NN classifier and N_TP is the number of data points in class i that are correctly classified as class i, i = 1, 2, ..., N_classes.
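Eq. (9) amounts to overall accuracy over the test set; a minimal sketch (the function name is hypothetical):

```python
def classification_accuracy(y_true, y_pred):
    """Eq. (9): (sum of per-class true positives) / N_totaldata * 100.

    Summing the correctly classified points of every class is the same
    as counting all matches between the true and predicted labels.
    """
    n_tp = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return 100.0 * n_tp / len(y_true)
```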

C. Summary of k Values for Both k-Fold Cross Validation and k-Nearest Neighbors

Although the use of the same symbol k for both k-cv and k-NN may reduce readability, this symbol is extensively used in the literature to denote not only the number of randomly divided folds in k-cv but also the number of neighbors for measuring the degree of affinity with classes in k-NN. To enhance readability, Table IV clarifies the difference in meaning when the symbol "k" is used for k-cv and k-NN.
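For concreteness, the neighbor-counting role of k in k-NN can be sketched as a minimal majority-vote classifier (a generic sketch, not the authors' implementation):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, X_test, k=3):
    """Predict each test point's class as the majority label among its
    k nearest training points (Euclidean distance)."""
    preds = []
    for x in np.asarray(X_test, dtype=float):
        d = np.linalg.norm(np.asarray(X_train, dtype=float) - x, axis=1)
        nearest = np.argsort(d)[:k]        # indices of the k nearest neighbors
        votes = Counter(y_train[i] for i in nearest)
        preds.append(votes.most_common(1)[0][0])
    return preds
```

Here k counts neighbors per query point, whereas in k-cv it counts the folds into which the evaluation dataset is partitioned.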

Fig. 13. 2-D representation of discriminatory feature subsets yielded by the OIHFS approach for (a) dataset 1 and (b) dataset 10.

IV. EXPERIMENTAL RESULTS

A. Most Discriminatory Fault-Signature Subset Determined by the OIHFS Methodology

As mentioned in Section III-B, N_iterations × k discriminatory feature subset candidates are eventually created (N_iterations = 20 and k = 2 for k-cv). This study then examines the predictive classification accuracy of the k-NN classifier using these 40 discriminatory feature subset candidates. As depicted in Fig. 12, it is obvious that the classification performance largely depends on the aforementioned feature subsets. Accordingly, this study evaluates each of them by employing the decision rule described in Section III-B. Table V lists the most discriminatory feature subset for each dataset (see Table I), and each of these subsets is used to identify various bearing conditions in the performance evaluation process.



TABLE VIII AVERAGE EXECUTION TIMES FOR BOTH FAULT SIGNATURE CALCULATIONS AND ACCURACY ESTIMATIONS

B. Efficacy Verification of the OIHFS Methodology for a Bearing Fault-Diagnosis Application

The key difference between the OIHFS and other conventional HFS methodologies is the manner of assessing the quality of feature subsets. Hence, this section validates the effectiveness of the feature subset evaluation metric (under the OIHFS scheme) in comparison with one using average values of pair-wise Euclidean distances [23]. Table VI summarizes the most discriminatory feature subset for each dataset yielded by the feature subset evaluation metric in [23].

k-cv is also employed to estimate the generalized classification accuracy in the performance evaluation process. That is, an evaluation dataset (see Fig. 5) is randomly divided into k mutually exclusive folds, denoted as F_1, F_2, ..., F_k. At the ith iteration of k-cv, fold F_i is reserved as a training dataset for the k-NN classifier, while the remaining folds are used to test it, so the k-NN classifier is tested k times. In addition, this study iteratively performs the performance evaluation process to increase the reliability of the diagnostic results. Hence, the final diagnostic performance is defined as the average of N_iterations × k classification accuracies (ACA).
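The repeated evaluation loop can be sketched as follows. All names are hypothetical; `train_and_score` stands for any routine that trains the k-NN classifier on one fold and returns the accuracy on the remaining folds, matching the inverted split described above.

```python
import numpy as np

def average_classification_accuracy(X, y, train_and_score,
                                    k=2, n_iterations=20, seed=0):
    """ACA sketch: the average of n_iterations * k accuracies.

    At iteration i of k-cv, fold F_i trains the classifier and the
    remaining folds test it; the data are re-partitioned each round."""
    rng = np.random.default_rng(seed)
    accs = []
    for _ in range(n_iterations):
        idx = rng.permutation(len(y))        # random re-partition
        folds = np.array_split(idx, k)
        for i in range(k):
            train = folds[i]
            test = np.concatenate([f for j, f in enumerate(folds) if j != i])
            accs.append(train_and_score(X[train], y[train], X[test], y[test]))
    return float(np.mean(accs))
```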

Table VII presents the diagnostic performance for each dataset (see Table I) in terms of the ACA, which is helpful for understanding the overall diagnostic performance. In addition, sensitivity, a useful metric for evaluating the diagnostic performance for each bearing condition, is provided in Table VII and is defined as

Sensitivity = N_TP / (N_TP + N_FN) × 100 (%)  (10)

where N_FN is the number of data points in class i that are not correctly classified as class i.
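Eq. (10) is per-class recall; a minimal sketch (hypothetical function name):

```python
def sensitivity(y_true, y_pred, cls):
    """Eq. (10): N_TP / (N_TP + N_FN) * 100 for one class, i.e., the share
    of class `cls` points that were actually classified as `cls`."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    return 100.0 * tp / (tp + fn)
```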

As shown in Table VII, the OIHFS method improves the ACA by 0.44% and 10% relative to the ACAs obtained when all of the fault signatures are used and when the feature subsets specified by the method in [23] are used, respectively. Consequently, the OIHFS methodology can effectively mitigate the diagnostic performance degradation caused by outliers.

An interesting observation in Table VII is that the diagnostic performance decreases for bearings with small cracks at low rotational speeds. To analyze this phenomenon, this study employs a 2-D representation of the discriminatory feature subsets yielded by the OIHFS approach, as depicted in Fig. 13. For bearings with small cracks at low rotational speeds (e.g., dataset 1), fault signatures in the same class are not as closely agglomerated as those obtained from bearings with large cracks at high rotational speeds (e.g., dataset 10). In addition, fault signatures belonging to BCO and BCIO are not clearly separated, which eventually results in diagnostic performance degradation.

In this study, the OIHFS methodology efficiently reduces the dimensionality of the feature vector by eliminating irrelevant and redundant fault signatures for reliable FDD. This implies that it is possible to alleviate the computational burden of configuring a feature vector in real FDD applications. Moreover, low-dimensional feature vectors can contribute to reducing the time needed to compute the classification accuracy of a classifier (e.g., k-NN in this study). Table VIII shows the speedups obtained due to the use of the low-dimensional feature vector in the developed method; all the experiments were performed with MATLAB 2008a on an Intel Core i3-2120 CPU operating at 3.30 GHz. As shown in Table VIII, the average time required to execute a given task within the bearing fault-diagnosis application largely depends on the time required to calculate fault signatures. The average execution time for predicting classification accuracy is almost negligible. In summary, the developed method achieves 2.15- to 29.7-fold speedups by significantly reducing the execution times of fault-signature calculations.

V. CONCLUSION AND FUTURE WORK

The OIHFS scheme was developed to reduce diagnostic performance deterioration caused by outliers in data-driven diagnostics. Its key contribution is the assessment of the quality of feature subsets. The developed feature subset evaluation metric, defined as the ratio of the intraclass compactness to the interclass separability estimated by understanding the relationship between data points and outliers, is capable of determining the most discriminatory feature subset and can minimize the negative impacts of outliers. The experimental results indicated that feature subsets specified by the developed metric are more effective for alleviating diagnostic performance degradation



than feature subsets determined by the metric in [23]. The developed approach achieved diagnostic performance improvements of up to 30.0% over the conventional method in terms of ACA. In addition, the OIHFS method reduced the execution time of the bearing fault-diagnosis application by efficiently reducing the dimensionality of the originally produced feature vector.

This study presented a method of detecting outliers based on the following attributes: 1) large distances from the agglomerated data points in the same class and 2) low membership degrees. In this method, the membership level was experimentally set to a constant value so that outliers could be identified. The downside is that a fixed membership level may lead to the misdetection of outliers. Thus, future research will explore a technique in which the membership level is set adaptively so that outliers can be detected more accurately in real data-driven FDD applications.

REFERENCES

[1] J. Seshadrinath, B. Singh, and B. K. Panigrahi, "Vibration analysis based interturn fault diagnosis in induction motors," IEEE Trans. Ind. Informat., vol. 10, no. 1, pp. 340–350, Feb. 2014.

[2] Y. Gritli, L. Zarri, C. Rossi, F. Filippetti, G.-A. Capolino, and D. Casadei, "Advanced diagnosis of electrical faults in wound-rotor induction machines," IEEE Trans. Ind. Electron., vol. 60, no. 9, pp. 4012–4024, Sep. 2013.

[3] S. Huang, K. K. Tan, and T. H. Lee, "Fault diagnosis and fault-tolerant control in linear drives using the Kalman filter," IEEE Trans. Ind. Electron., vol. 59, no. 11, pp. 4285–4292, Nov. 2012.

[4] X. Dai and Z. Gao, "From model, signal to knowledge: A data-driven perspective of fault detection and diagnosis," IEEE Trans. Ind. Informat., vol. 9, no. 4, pp. 2226–2238, Nov. 2013.

[5] S. Yin, X. Li, H. Gao, and O. Kaynak, "Data-based techniques focused on modern industry: An overview," IEEE Trans. Ind. Electron., vol. 62, no. 1, pp. 657–667, Jan. 2015.

[6] S. Yin, X. Zhu, and O. Kaynak, "Improved PLS focused on key-performance-indicator-related fault diagnosis," IEEE Trans. Ind. Electron., vol. 62, no. 3, pp. 1651–1658, Mar. 2015.

[7] S. Yin and Z. Huang, "Performance monitoring for vehicle suspension system via fuzzy positivistic C-means clustering based on accelerometer measurements," IEEE/ASME Trans. Mechatronics, vol. 20, no. 5, pp. 2613–2620, Oct. 2015.

[8] A. Soualhi, G. Clerc, and H. Razik, "Detection and diagnosis of faults in induction motor using an improved artificial ant clustering technique," IEEE Trans. Ind. Electron., vol. 60, no. 9, pp. 4053–4062, Sep. 2013.

[9] B. Li, P.-L. Zhang, H. Tian, S.-S. Mi, D.-S. Liu, and G.-Q. Ren, "A new feature extraction and selection scheme for hybrid fault diagnosis of gearbox," Expert Syst. Appl., vol. 38, pp. 10000–10009, 2011.

[10] C. Liu, D. Jiang, and W. Yang, "Global geometric similarity scheme for feature selection in fault diagnosis," Expert Syst. Appl., vol. 41, pp. 3585–3595, 2014.

[11] Y. Yang, Y. Liao, G. Meng, and J. Lee, "A hybrid feature selection scheme for unsupervised learning and its application in bearing fault diagnosis," Expert Syst. Appl., vol. 38, pp. 11311–11320, 2011.

[12] T. W. Rauber, F. A. Boldt, and F. M. Varejao, "Heterogeneous feature models and feature selection applied to bearing fault diagnosis," IEEE Trans. Ind. Electron., vol. 62, no. 1, pp. 637–646, Jan. 2015.

[13] L. Lu, J. Yan, and C. W. de Silva, "Dominant feature selection for the fault diagnosis of rotary machines using modified genetic algorithm and empirical mode decomposition," J. Sound Vib., vol. 344, pp. 464–483, 2015.

[14] A.-B. Ming, W. Zhang, Z.-Y. Qin, and F.-L. Chu, "Dual-impulse response model for the acoustic emission produced by a spall and the size evaluation in rolling element bearings," IEEE Trans. Ind. Electron., vol. 62, no. 10, pp. 6606–6615, Oct. 2015.

[15] PCI-2 Based AE System User's Manual [Online]. Available: http://www.physicalacoustics.com/content/literature/multichannel_systems/PCI-2_Product_Bulletin.pdf

[16] WSα Sensor [Online]. Available: http://www.pacndt.com/downloads/Sensors/Alpha/WS_Alpha.pdf

[17] A. M. Al-Ghamd and D. Mba, "A comparative experimental study on the use of acoustic emission and vibration analysis for bearing defect identification and estimation of defect size," Mech. Syst. Signal Process., vol. 20, pp. 1537–1571, 2006.

[18] Z. Xia, S. Xia, L. Wan, and S. Cai, "Spectral regression based fault feature extraction for bearing accelerometer sensor signals," Sensors, vol. 12, pp. 13694–13719, 2012.

[19] M. Kang, J. Kim, and J.-M. Kim, "High-performance and energy-efficient fault diagnosis using effective envelope analysis and denoising on a general-purpose graphics processing unit," IEEE Trans. Power Electron., vol. 30, no. 5, pp. 2763–2776, May 2015.

[20] I. Bediaga, X. Mendizabal, A. Arnaiz, and J. Munoa, "Ball bearing damage detection using traditional signal processing algorithms," IEEE Instrum. Meas. Mag., vol. 16, no. 12, pp. 20–25, Apr. 2013.

[21] R. B. Randall and J. Antoni, "Rolling element bearing diagnostics—A tutorial," Mech. Syst. Signal Process., vol. 25, pp. 485–520, 2011.

[22] J. D. Rodriguez, A. Perez, and J. A. Lozano, "Sensitivity analysis of k-fold cross validation in prediction error estimation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 3, pp. 569–575, Mar. 2010.

[23] M. Kang, J. Kim, and J.-M. Kim, "Reliable fault diagnosis for incipient low-speed bearings using fault feature analysis based on a binary bat algorithm," Inf. Sci., vol. 294, pp. 423–438, 2015.

[24] A. Hajnayeb, A. Ghasemloonia, S. E. Khadem, and M. H. Moradi, "Application and comparison of an ANN-based feature selection method and the algorithm in gearbox fault diagnosis," Expert Syst. Appl., vol. 38, pp. 10205–10209, 2011.

Myeongsu Kang received the B.E. and M.S. degrees in computer engineering and information technology and the Ph.D. degree in electrical, electronic, and computer engineering from the University of Ulsan, Ulsan, Korea, in 2008, 2010, and 2015, respectively.

He is currently a Research Associate with the Center for Advanced Life Cycle Engineering, The University of Maryland, College Park, MD, USA. His research interests include data-driven prognostics and health management using signal processing, data mining, and machine learning techniques, and high-performance computing.

Md. Rashedul Islam received the B.S. degree in computer science and engineering from the University of Rajshahi, Rajshahi, Bangladesh, in 2006, and the M.S. degree in informatics from the Högskolan i Borås (University of Borås), Borås, Sweden, in 2011. He is currently working toward the Ph.D. degree in electrical, electronic, and computer engineering at the University of Ulsan, Ulsan, Korea.

He is an Assistant Professor (on study leave) with the Department of Computer Science and Engineering, University of Asia Pacific (UAP), Dhaka, Bangladesh. His research interests include signal processing, machine learning, data-driven diagnostics and prognostics, parallel processing, and GPS.

Jaeyoung Kim received the B.S. and M.S. degrees in electrical, electronic, and computer engineering from the University of Ulsan, Ulsan, Korea, in 2012 and 2015, respectively, where he is currently working toward the Ph.D. degree in electrical, electronic, and computer engineering.

His research interests include artificial intelligence, data-driven prognostics and health management, and high-performance computing.



Jong-Myon Kim (M'05) received the B.S. degree in electrical engineering from Myongji University, Yongin, Korea, in 1995, the M.S. degree in electrical and computer engineering from the University of Florida, Gainesville, FL, USA, in 2000, and the Ph.D. degree in electrical and computer engineering from the Georgia Institute of Technology, Atlanta, GA, USA, in 2005.

He is currently a Professor with the Department of IT Convergence and also a Vice President of the Foundation for Industry Cooperation, University of Ulsan, Ulsan, Korea. His research interests include multimedia-specific processor architecture, fault diagnosis and condition monitoring, parallel processing, and embedded systems.

Michael Pecht (M'83–SM'90–F'92) received the dual M.S. degree in electrical engineering and engineering mechanics and the Ph.D. degree in engineering mechanics from the University of Wisconsin–Madison, Madison, WI, USA, in 1978, 1979, and 1982, respectively.

He is the Founder of the Center for Advanced Life Cycle Engineering, The University of Maryland, College Park, MD, USA, which is funded by more than 150 of the world's leading electronics companies at more than U.S. $6 million/year. He is also the George E. Dieter Professor of Mechanical Engineering and a Professor of Applied Mathematics with The University of Maryland. He has authored/coauthored more than 20 books on electronic product development, use, and supply chain management and more than 500 technical articles.