A new feature extraction and selection scheme for hybrid fault diagnosis of gearbox
Bing Li a,b,⇑, Pei-lin Zhang a, Hao Tian a, Shuang-shan Mi b, Dong-sheng Liu b, Guo-quan Ren a
a First Department, Mechanical Engineering College, No. 97, Hepingxilu Road, Shi Jia-zhuang, He Bei province, PR China
b Fourth Department, Mechanical Engineering College, No. 97, Hepingxilu Road, Shi Jia-zhuang, He Bei province, PR China
Article info
Keywords:
Gearbox
Hybrid fault diagnosis
Feature extraction
Feature selection
S transform
Non-negative matrix factorization (NMF)
Mutual information
Non-dominated sorting genetic algorithms
II (NSGA-II)
Abstract
A novel feature extraction and selection scheme was proposed for hybrid fault diagnosis of gearbox based on the S transform, non-negative matrix factorization (NMF), mutual information and multi-objective evolutionary algorithms. Time–frequency distributions of vibration signals, acquired from a gearbox in different fault states, were obtained by the S transform. Non-negative matrix factorization (NMF) was then employed to extract features from the time–frequency representations. Furthermore, a two-stage feature selection approach combining filter and wrapper techniques, based on mutual information and the non-dominated sorting genetic algorithm II (NSGA-II), was presented to obtain a more compact feature subset for accurate classification of hybrid faults of the gearbox. Eight fault states, including gear defects, bearing defects and combinations of gear and bearing defects, were simulated on a single-stage gearbox to evaluate the proposed feature extraction and selection scheme. Four different classifiers were combined with the presented techniques for classification, and their performances with different feature subsets were compared. The experimental results reveal that the proposed feature extraction and selection scheme is an effective and efficient tool for hybrid fault diagnosis of gearbox.
© 2011 Elsevier Ltd. All rights reserved.
1. Introduction
Gearbox is one of the core components in rotating machinery
and has been widely employed in various industrial equipment.
Faults occurring in gearbox components such as gears and bearings
must be detected as early as possible to avoid fatal breakdowns
of machines and prevent loss of production and human casualties.
Vibration-based analysis is the most commonly used technique for monitoring the condition of gearboxes. By employing appropriate
data analysis algorithms, it is feasible to detect changes in vibration signals caused by faulty components, and to make decisions about the gearbox's health status (Al-Ghamd & Mba, 2006; Chen,
He, Chu, & Huang, 2003; Lin & Zuo, 2003; Saravanan, Cholairajan,
& Ramachandran, 2008; Wang & McFadden, 1993). Although often
the visual inspection of the frequency domain features of the mea-
sured signals is adequate to identify the faults, many techniques
available presently require a good deal of expertise to apply them
successfully. Simpler approaches are needed which allow relatively
unskilled operators to make reliable decisions without the need for
a diagnosis specialist to examine data and diagnose problems. Con-
sequently, there is a need for a reliable, fast and automated proce-
dure of diagnostics. Various intelligent techniques such as artificial
neural networks (ANN), support vector machines (SVM), fuzzy logic and evolutionary algorithms (EA) have been successfully applied to
automated detection and diagnosis of machine conditions (Firpi
& Vachtsevanos, 2008; Lei, He, Zi, & Hu, 2007; Lei & Zuo, 2009;
Samanta, 2004; Samanta, Al-Balushi, & Al-Araimi, 2003; Samanta
& Nataraj, 2009; Srinivasan, Cheu, Poh, & Ng, 2000; Wuxing, Tse,
Guicai, & Tielin, 2004). They have largely improved the reliability
and automation of fault diagnosis systems for gearbox. For intelligent fault diagnosis systems, feature extraction and feature selection can be regarded as the two most important steps.
Feature extraction is a mapping process from the measured sig-
nal space to the feature space. Representative features associated
with the conditions of machinery components should be extracted
by using appropriate signal processing and calculating approaches.
Over the past few years, various techniques including Fourier
transform (FT), envelope analysis, wavelet analysis, empirical
mode decomposition (EMD) and time–frequency distributions
were employed to process the vibration signals (Lin & Qu,
2000; Oehlmann, Brie, Tomczak, & Richard, 1997; Peng, Tse, &
Chu, 2005; Randall, Antoni, & Chobsaard, 2001; Wang & McFadden,
1993). Based on these processing techniques, statistical calculation methods, the autoregressive (AR) model, singular value decomposition (SVD), principal component analysis (PCA) and independent component analysis (ICA) have been adopted to extract representative features for machinery fault diagnosis (Junsheng, Dejie, & Yu,
2006; Lei, He, Zi, & Chen, 2008; Li, Shi, Liao, & Yang, 2003; Wang,
Luo, Qin, Leng, & Wang, 2008; Widodo, Yang, & Han, 2007). Even
though several techniques have been proposed in the literature
0957-4174/$ - see front matter © 2011 Elsevier Ltd. All rights reserved. doi:10.1016/j.eswa.2011.02.008
⇑ Corresponding author at: First Department, Mechanical Engineering College,
No. 97, Hepingxilu Road, Shi Jia-zhuang, He Bei province, PR China.
E-mail address: [email protected] (B. Li).
Expert Systems with Applications 38 (2011) 10000–10009
for feature extraction, it still remains a challenge in implementing
a diagnostic tool for real-world monitoring applications because of
the complexity of machinery structures and operating conditions.
This investigation implements the feature extraction scheme by
utilizing two newly developed techniques, the S transform and
non-negative matrix factorization (NMF). The S transform intro-
duced by Stockwell (Stockwell, Mansinha, & Lowe, 1996), which
combines the separate strengths of the STFT and wavelet trans-
forms, has provided an alternative approach to process the non-
stationary signals generated by mechanical systems. It employs
variable window length. The frequency-dependent window func-
tion produces higher frequency resolution at lower frequencies,
while at higher frequencies, sharper time localization can be
achieved. S transform has become a valuable tool for the analysis
of signals in many applications (Assous, Humeau, Tartas, Abraham,
& L’Huillier, 2006; Dash & Chilukuri, 2004; Dash, Panigrahi, & Pan-
da, 2003). Non-negative matrix factorization (NMF) is a new sub-
space decomposition technique which is proposed to extract key
features of data by operating an iterative matrix factorization
(Lee & Seung, 1999). Unlike other similar methods, NMF imposes a non-negativity constraint on the factorization, which leads to intuitive, parts-based representations of the data in its factors. Due to this property, NMF has been adopted in various
applications such as face expression and recognition (Pu, Yi, Zheng,
Zhou, & Ye, 2005), object detection (Liu & Zheng, 2004), image
compression and classification (Yuan & Oja, 2005), sounds classifi-
cation (Cho & Choi, 2005), etc. In this paper, a novel feature extrac-
tion scheme based on S transform and non-negative matrix
factorization (NMF) for hybrid fault diagnosis of gearbox is
presented.
Feature selection is another indispensable procedure for an intelligent fault diagnosis system, since the extracted features still contain noise and irrelevant or redundant information. Moreover, too many features can cause the curse of dimensionality, because the number of training samples must grow exponentially with the number of features in order to learn an accurate model. Therefore, a feature selection procedure is indeed needed before classification. Several studies have addressed
this issue, such as genetic algorithms (GAs) ( Jack & Nandi, 2002;
Jack, Nandi, & McCormick, 2000; Samanta, 2004), decision tree
(Sugumaran, Muralidharan, & Ramachandran, 2007) and distance
evaluation technique (Lei et al., 2008; Lei & Zuo, 2009).
Filter and wrapper methods are the two main categories of feature selection algorithms. Filter methods evaluate the goodness of a feature subset by using the intrinsic characteristics of the data. They are relatively cheap computationally since they do not involve the induction algorithm. However, they also run the risk of selecting feature subsets that may not match the chosen induction algorithm. Wrapper methods, on the contrary, directly use the classifiers to evaluate the feature subsets. They generally outperform filter methods in terms of prediction accuracy, but they are generally more computationally intensive (Zhu, Ong, & Dash,
2007). In summary, wrapper and filter methods can complement
each other, in that filter methods can search through the feature
space efficiently while the wrappers can provide good accuracy.
It is desirable to combine the filter and wrapper methods to
achieve high efficiency and accuracy simultaneously. In this work,
a two-stage feature selection technique combining filter and wrapper selection based on mutual information and the improved non-dominated sorting genetic algorithm NSGA-II was proposed. In the first stage, a candidate feature subset was chosen from the original feature set by the max-relevance and min-redundancy (mRMR) criterion (Peng, Long, & Ding, 2005). In the second stage, a classifier combined with NSGA-II was adopted to find a more compact feature subset from the candidate feature subset obtained by the filter method. In this stage, the feature selection problem is defined as a multi-objective problem dealing with two competing objectives. Consequently, an optimal feature subset that consists of a minimal number of features and produces the minimum classification error can be obtained for intelligent fault diagnosis. Fig. 1 displays the flowchart of the intelligent fault diagnosis system for gearbox in this investigation.
The rest of this paper is organized as follows. Section 2
describes the experiment setup and vibration dataset. The feature
extraction method based on S transform and NMF is detailed in
Section 3. The two stage feature selection scheme based on mutual
information and NSGA-II is outlined in Section 4. Section 5 presents
the experiment results and discussions. Conclusions are summa-
rized in Section 6.
2. Experimental setup and data acquisition
Fig. 2 displays the diagram of the experimental system used in
this work to evaluate the performance of the proposed approach.
The experimental system includes a single-stage gearbox, an AC motor and a magnetic brake. The AC motor is used to drive the gearbox
[Fig. 1 depicts the pipeline: vibration signals from the sensor-equipped gearbox are collected by the signal acquisition system (data acquisition); S transform time–frequency distributions are compressed by NMF into the original feature subset (feature extraction); filter methods based on mutual information yield the candidate feature subset and wrapper methods based on NSGA-II yield the final optimal feature subset (feature selection); classifier outputs give the gearbox condition diagnosis (classification).]
Fig. 1. Flowchart of the presented intelligent fault diagnosis system.
[Fig. 2 shows the gearbox driven by the motor and loaded at the output: gear A (30 teeth) meshes with gear B (50 teeth); bearings B1–B4 support the two shafts.]
Fig. 2. Structure of the experiment gearbox.
and the rotating speed is controlled by a speed controller, which allows the tested gearbox to operate at various speeds. The load
is provided by the magnetic brake connected to the output shaft
and the torque can be adjusted by a brake controller. There are
two shafts inside the gearbox, which are mounted to the gearbox
housing by four rolling element bearings. Gear A has 30 teeth
and gear B has 50 teeth.
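From the tooth counts above, the characteristic frequencies of the test rig follow directly. A quick worked example (assuming, as listed in Table 1, a speed of 1200 rpm, and that this is the speed of the input shaft carrying gear A — the paper does not state which shaft the speed refers to):

```python
# Characteristic frequencies of the test gearbox. Assumption (not stated
# explicitly in the paper): the 1200 rpm in Table 1 is the speed of the
# input shaft carrying gear A (30 teeth).
motor_rpm = 1200
teeth_a, teeth_b = 30, 50

f_input = motor_rpm / 60.0     # input shaft rotation frequency: 20 Hz
f_mesh = f_input * teeth_a     # gear mesh frequency: 20 * 30 = 600 Hz
f_output = f_mesh / teeth_b    # output shaft rotation frequency: 12 Hz
```

Under these assumptions the mesh frequency (600 Hz) sits comfortably below the Nyquist frequency of the 6400 Hz sampling rate used in the experiments.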
Local faults such as gear tooth wear, bearing inner race defects and bearing outer race defects are the fault modes most frequently studied in gearbox fault diagnosis. Previous studies mainly focused on gear faults or bearing faults independently. Little research has been done on detecting hybrid faults in which gear and bearing defects occur simultaneously. This work investigated this issue by simulating hybrid fault types of gears and bearings in a single-stage gearbox. Eight fault states including gear wear, bearing
inner race defect, bearing outer race defect and their combinations
were tested in the experiments.
As shown in Fig. 2, all the defects were set on gear A and bearing
B1. The detailed descriptions about the eight operating states are
summarized in Table 1.
Vibration signals were acquired by using acceleration sensors
mounted on the four bearing bases. The sampling frequency was 6400 Hz and each sample contained 4096 points. Twenty samples for every state were acquired in the tests; hence, 160 samples in total were collected for further investigation. Fig. 3 demonstrates the waveforms of the eight operating states in the time domain.
3. Feature extraction based on S transform and non-negative
matrix factorization (NMF)
In this section, the feature extraction scheme based on the two newly developed techniques, namely the S transform and non-negative matrix factorization (NMF), is presented.
Vibration signals acquired from a gearbox are non-stationary and complex, and contain rich information about the operating conditions. Thus joint time–frequency analysis is often adopted to describe the local information of these non-stationary signals completely. The S transform, which combines the separate strengths of the STFT and wavelet transforms, was employed to obtain high-resolution time–frequency representations of the vibration signals.
However, it is not reasonable to classify these time–frequency distributions directly because their dimensionality is too high to deal with. Thus the NMF technique was utilized to reduce the high-dimensional feature space while preserving most of the information in the original time–frequency representations.
3.1. S transform
The S transform, put forward by Stockwell in 1996, can be regarded as an extension of the ideas of the Gabor transform and the wavelet transform. The S transform of a signal x(t) is defined as:

S(\tau, f) = \int_{-\infty}^{+\infty} x(t)\, w(t - \tau)\, e^{-j 2\pi f t}\, dt \quad (1)

where

w(t) = \frac{1}{\sigma \sqrt{2\pi}}\, e^{-t^2 / 2\sigma^2} \quad (2)

and

\sigma = \frac{1}{|f|} \quad (3)

Combining Eqs. (1)–(3), the S transform can be written as:

S(\tau, f) = \int_{-\infty}^{+\infty} x(t)\, \frac{|f|}{\sqrt{2\pi}}\, e^{-(t - \tau)^2 f^2 / 2}\, e^{-j 2\pi f t}\, dt \quad (4)
Since the S transform is a representation of the local spectra, the Fourier (time-averaged) spectrum can be obtained directly by averaging the local spectra:

\int_{-\infty}^{+\infty} S(\tau, f)\, d\tau = X(f) \quad (5)

where X(f) is the Fourier transform of x(t). The inverse S transform is given by

x(t) = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} S(\tau, f)\, e^{j 2\pi f t}\, d\tau\, df \quad (6)
The main advantage of the S transform over the short-time Fourier
transform (STFT) is that the standard deviation r is actually a func-
tion of frequency f . Consequently, the window function is also a
function of time and frequency. As the width of the window is controlled by the frequency, the window is clearly wider in the time domain at lower frequencies and narrower at
higher frequencies. In other words, the window provides good
localization in the frequency domain for low frequencies while pro-
viding good localization in time domain for higher frequencies. It is
a very desirable characteristic for accurate representation of non-
stationary vibration signals in time–frequency domain.
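As an illustration of the discrete S transform (a minimal sketch under our own assumptions, not the authors' implementation), the transform can be computed voice by voice in the frequency domain: each frequency row is the inverse FFT of the signal spectrum shifted by n bins and weighted by a Gaussian whose width follows Eqs. (2) and (3):

```python
import numpy as np

def s_transform(x):
    """Discrete S transform of a real signal x of length N, computed
    frequency voice by frequency voice: row n is the inverse FFT of the
    spectrum shifted by n bins and weighted by a Gaussian window whose
    width scales with n, following Eqs. (2)-(4)."""
    N = len(x)
    X = np.fft.fft(x)
    half = N // 2
    S = np.zeros((half, N), dtype=complex)
    S[0, :] = np.mean(x)            # zero-frequency voice: signal mean
    m = np.arange(N)
    m[m > half] -= N                # symmetric frequency offsets
    for n in range(1, half):
        gauss = np.exp(-2.0 * np.pi**2 * m**2 / n**2)
        S[n, :] = np.fft.ifft(np.roll(X, -n) * gauss)
    return S

# demo: a pure tone at frequency bin 10 concentrates energy in row 10
sig = np.cos(2 * np.pi * 10 * np.arange(256) / 256)
S = s_transform(sig)
peak_row = int(np.argmax(np.abs(S).sum(axis=1)))
```

The shape of the result — frequency rows by time columns — is exactly the kind of time–frequency matrix that Section 5 later compresses with NMF.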
3.2. Non-negative matrix factorization (NMF)
The NMF algorithm is a technique that compresses a matrix into
a smaller number of basis functions and their encodings (Lee &
Seung, 1999). The factorization can be expressed as follows:

V_{n \times m} \approx W_{n \times r} H_{r \times m} \quad (7)

where V denotes an n × m matrix and m is the number of examples in the dataset; each column of V contains an n-dimensional observed data vector with non-negative values. This matrix is then approximately factorized into an n × r matrix W and an r × m matrix H. The rank r of the factorization is usually chosen such that (n + m)r < nm, so that compression or dimensionality reduction is achieved. This results in a compressed version of the original data matrix. In other words, each data vector in V is approximated by a linear combination of the columns of W, weighted by the components of H. Therefore, W can be regarded as the basis matrix and H as the coefficient matrix.
The key characteristic of NMF is the non-negativity constraints
imposed on the two factors, and the non-negativity constraints are
compatible with the intuitive notion of combining parts to form a
whole.
In order to compute the approximate factorization in Eq. (7), a cost function is needed to quantify the quality of the approximation. NMF uses the divergence measure as the objective function:

F = \sum_{i=1}^{n} \sum_{j=1}^{m} \left[ V_{ij} \log (WH)_{ij} - (WH)_{ij} \right] \quad (8)

which is subject to the non-negativity constraints described above.
Table 1
Description of the gearbox experiments.

Fault state  Fault description                                          Load (Nm)  Speed (rpm)
1            Normal                                                     100        1200
2            Gear wear
3            Bearing inner race
4            Bearing outer race
5            Gear wear and bearing inner race
6            Gear wear and bearing outer race
7            Bearing inner race and bearing outer race
8            Gear wear and bearing inner race and bearing outer race
In order to obtain W and H, multiplicative update rules are given in Lee and Seung (1999) as follows:

W_{ia} \leftarrow W_{ia} \sum_{j=1}^{m} \frac{V_{ij}}{(WH)_{ij}} H_{aj} \quad (9)

W_{ia} \leftarrow \frac{W_{ia}}{\sum_{i=1}^{n} W_{ia}} \quad (10)

H_{aj} \leftarrow H_{aj} \sum_{i=1}^{n} W_{ia} \frac{V_{ij}}{(WH)_{ij}} \quad (11)

In this way, the basis matrix W and the coefficient matrix H can be obtained by applying Eqs. (9)–(11) in an iterative procedure.
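The iterative procedure of Eqs. (9)–(11) can be sketched compactly as follows (an illustration, not the authors' code; the small epsilon guarding division by zero and the random initialization are our additions):

```python
import numpy as np

def nmf(V, r, n_iter=500, seed=0):
    """Multiplicative updates of Eqs. (9)-(11) (Lee & Seung, 1999):
    the non-negative n x m matrix V is factorized as V ~ W H, with W
    of size n x r (columns normalized by Eq. (10)) and H of size r x m."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, r)) + 0.1
    H = rng.random((r, m)) + 0.1
    eps = 1e-9                                # guard against division by zero
    for _ in range(n_iter):
        W *= (V / (W @ H + eps)) @ H.T        # Eq. (9)
        W /= W.sum(axis=0) + eps              # Eq. (10): normalize columns
        H *= W.T @ (V / (W @ H + eps))        # Eq. (11)
    return W, H

# demo: an exactly rank-2 non-negative matrix is recovered closely
rng = np.random.default_rng(1)
V = rng.random((6, 2)) @ rng.random((2, 8))
W, H = nmf(V, r=2)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

Because the updates only ever multiply by non-negative quantities, the factors stay non-negative throughout, which is what yields the parts-based representation discussed above.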
4. Two-stage feature selection based on mutual information and NSGA-II
As described in Section 3, for every signal f(n) we obtain a number of features via the S transform and NMF. Even though a dimension reduction has been performed by the NMF, direct manipulation of the whole set of feature components is not appropriate, because the feature space still has high dimensionality, and the existence of irrelevant and redundant components makes the classification unnecessarily difficult. Thus, a feature selection scheme combining a filter method and a wrapper method, based on mutual information and NSGA-II, is applied to identify a set of robust features that provides the most discrimination among the classes of gearbox vibration data. This significantly eases the design of the classifier and enhances the generalization capability of the system.
[Fig. 3 shows eight time-domain waveform panels (a)–(h), amplitude in volts (±2 V) over 0–0.6 s.]
Fig. 3. Vibration signals acquired from eight states of gearbox in the experiments: (a)–(h) corresponds to 1–8 states in Table 1.
4.1. Filter method using max-relevance and min-redundancy (mRMR)
based on mutual information
4.1.1. Mutual information
Mutual information is one of the most widely used measures to
define relevancy of variables (Peng et al., 2005). In this section, we
focus on feature selection method based on mutual information.
Given two random variables x and y, their mutual information can be defined in terms of their probability density functions p(x), p(y) and p(x, y):

I(x; y) = \iint p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)}\, dx\, dy \quad (12)
The estimation of the mutual information of two variables was
detailed in (Peng et al., 2005).
In supervised classification, one can view the classes as a vari-
able (that we will name C ) with L possible values (where L is the
number of classes of the system) and the feature component as an-
other variable (that we will name X ) with K possible values (where
K is the number of parameters of the system). So, one will be able
to compute the mutual information I ( xk, c ) between the classes c
and the feature xk (k = 1, 2, . . ., K ):
I(x_k; c) = \iint p(x_k, c) \log \frac{p(x_k, c)}{p(x_k)\, p(c)}\, dx_k\, dc \quad (13)
Then the informative variables with larger I ( xk, c ) can be identified.
A more compact feature subset can be obtained via selecting the d
best features based on Eq. (13) from the original feature set.
Eq. (13) provides us with a measure to evaluate the effective-
ness of the ‘‘global’’ feature that is simultaneously suitable to dif-
ferentiate all classes of signals. For a small number of classes,
this approach may be sufficient. The more signal classes, the more
ambiguous I ( xk, c ) becomes.
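For sampled data, the integral in Eq. (13) is replaced by a sum over estimated probabilities. A simple histogram-based estimate can illustrate the idea (a stand-in for the estimator detailed in Peng et al. (2005); the equal-width binning is our own assumption):

```python
import numpy as np

def mutual_information(x, c, bins=8):
    """Histogram estimate of I(x; c) in Eq. (13) between a continuous
    feature x and integer class labels c. Equal-width binning is used
    for simplicity; it is not the estimator of Peng et al. (2005)."""
    edges = np.histogram_bin_edges(x, bins=bins)
    xb = np.clip(np.digitize(x, edges[1:-1]), 0, bins - 1)
    joint = np.zeros((bins, int(c.max()) + 1))
    for i, j in zip(xb, c):
        joint[i, j] += 1.0
    p_xy = joint / joint.sum()                # joint distribution p(x, c)
    p_x = p_xy.sum(axis=1, keepdims=True)     # marginal p(x)
    p_c = p_xy.sum(axis=0, keepdims=True)     # marginal p(c)
    nz = p_xy > 0
    return float(np.sum(p_xy[nz] * np.log(p_xy[nz] / (p_x @ p_c)[nz])))

# demo: a feature that tracks the class carries far more information
# about it than independent noise does
rng = np.random.default_rng(0)
c = rng.integers(0, 2, size=1000)
informative = c + 0.05 * rng.standard_normal(1000)
noise = rng.standard_normal(1000)
mi_info = mutual_information(informative, c)
mi_noise = mutual_information(noise, c)
```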
4.1.2. Max-relevance and min-redundancy
Max-relevance means that the selected features x_i are required, individually, to have the largest mutual information I(x_i; c). It means that the m best individual features should be selected according to this criterion. It can be represented as:

\max D(S, c), \quad D = \frac{1}{|S|} \sum_{x_i \in S} I(x_i; c) \quad (14)

where |S| denotes the number of features contained in S.
However, it has been proved that simply combining the best individual features does not necessarily lead to good performance. In other words, ''the m best features are not the best m features'' (Kohavi & John, 1997; Peng et al., 2005). The most important problem with max-relevance is that it neglects the redundancy between features, which may degrade the classification performance.
So the min-redundancy criterion should be added to the selection of the optimal subsets. It can be represented as:

\min R(S), \quad R = \frac{1}{|S|^2} \sum_{x_i, x_j \in S} I(x_i; x_j) \quad (15)

The criterion combining the above two constraints is called ''minimal-redundancy–maximal-relevance'' (mRMR) (Peng et al., 2005). The operator \Phi(D, R) is defined to optimize D and R simultaneously:

\max \Phi(D, R), \quad \Phi = D - R \quad (16)
4.1.3. Candidate feature subset obtained based on max-relevance and
min-redundancy
In practice, greedy search methods can be used to find the near-optimal features defined by \Phi. Let F be the original feature set and S the selected subset. Suppose we already have S_{m-1}, i.e. we have selected m - 1 features. The next task is to select the mth feature from the set F - S_{m-1}. This is done according to the following criterion:

\max_{x_j \in F - S_{m-1}} \left[ I(x_j; c) - \frac{1}{m-1} \sum_{x_i \in S_{m-1}} I(x_j; x_i) \right] \quad (17)
The main steps can be represented as:

Step 1: Let F be the original feature set and S the selected subset. Initialize S as an empty subset, S ← {}.
Step 2: Calculate the relevance of each individual feature x_i with the target class c, denoted I(x_i; c).
Step 3: Find the feature x_k with the maximum relevance:

I(x_k; c) = \max_{x_i \in F} I(x_i; c)

Let F_1 ← F - {x_k}, S_1 ← S + {x_k}.
Step 4:

for m = 2 : N
    with x_j ∈ F_{m-1} and x_i ∈ S_{m-1}, find x_k according to the criterion:
    \max_{x_j \in F_{m-1}} \left[ I(x_j; c) - \frac{1}{m-1} \sum_{x_i \in S_{m-1}} I(x_j; x_i) \right]
    let F_m = F_{m-1} - {x_k}, S_m = S_{m-1} + {x_k}
end
In this way, N sequential feature subsets can be obtained that satisfy S_1 ⊂ S_2 ⊂ ⋯ ⊂ S_N. The next problem is how to choose an optimal set from this series, in other words, how to determine the number of features that the sub-optimal set should contain. Using cross-validation, we compare the performances of the N sequential feature subsets. The feature subset corresponding to the best performance is chosen as the candidate feature subset for further, more sophisticated selection using the wrapper.
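The greedy procedure of Steps 1–4 can be sketched as follows (a hypothetical helper for illustration; in practice the relevance and redundancy tables would be filled with mutual-information estimates computed from the training data):

```python
def mrmr_select(relevance, redundancy, n_select):
    """Greedy mRMR search of Steps 1-4 / Eq. (17). relevance[i] holds
    I(x_i; c) and redundancy[i][j] holds I(x_i; x_j)."""
    K = len(relevance)
    # Step 3: start from the single most relevant feature
    selected = [max(range(K), key=lambda i: relevance[i])]
    # Step 4: repeatedly add the feature maximizing relevance minus
    # mean redundancy with the already-selected set
    while len(selected) < n_select:
        remaining = (j for j in range(K) if j not in selected)
        best = max(remaining,
                   key=lambda j: relevance[j]
                   - sum(redundancy[j][i] for i in selected) / len(selected))
        selected.append(best)
    return selected

# demo: feature 1 is relevant but redundant with feature 0, so the
# second pick falls to the weaker but independent feature 2
relevance = [1.0, 0.9, 0.2]
redundancy = [[0.0, 0.9, 0.0],
              [0.9, 0.0, 0.0],
              [0.0, 0.0, 0.0]]
chosen = mrmr_select(relevance, redundancy, n_select=2)
```

The demo makes the ''m best features are not the best m features'' point concrete: ranking by relevance alone would have picked features 0 and 1.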
4.2. Wrapper method based on NSGA-II
Genetic algorithms have been used intensively for feature selection to solve this combinatorial problem and to provide efficient exploration of the solution space. GAs work with a set of candidate solutions called a population. Based on the Darwinian principle of 'survival of the fittest', GAs obtain the optimal solution after a series of iterative computations. GAs generate successive populations of alternative solutions, each represented by a chromosome, i.e. a solution to the problem, until acceptable results are obtained. Combining exploitation and exploration in their search, GAs can deal with large search spaces efficiently, and hence have less chance of settling on a locally optimal solution than other algorithms. GAs were specifically developed for this task, and GA-based solutions were more efficient than classical feature selection methods, as confirmed in (Oh, Lee, & Moon, 2004; Tan, Fu, Zhang, & Bourgeois, 2008; Yang & Honavar, 1998; Zhu & Guan, 2004). These studies mainly used single-objective GAs, which provide one optimal solution with the maximum classification performance.
The feature selection problem can be defined as a multi-objective problem dealing with two competing objectives. Consequently, an optimal feature set has to consist of a minimal number of features and has to produce the minimum classification error.
The non-dominated sorting genetic algorithm (NSGA) was suggested by Goldberg and implemented by Srinivas and Deb (1994). Although NSGA has proved effective for multi-objective optimization problems, it has drawbacks such as the high computational complexity of non-dominated sorting, lack of elitism, and the need to specify the sharing parameter \sigma_{share}. Aiming at such problems, Deb introduced NSGA-II (Deb, Pratap, Agarwal, & Meyarivan, 2002) as an improved method which overcame the original NSGA defects by alleviating the computational complexity, introducing an elitist-preserving mechanism and employing a crowded comparison operator. In this work, this new multi-objective optimization technique was employed to solve the feature selection problem with a wrapper. A more detailed description and implementation of NSGA-II can be found in (Deb et al., 2002).
For wrapper feature selection approach, there are several
factors for controlling the process of NSGA-II while searching the
sub-optimal feature subsets for classifiers. To apply NSGA-II to
feature selection, we focus on the following issues.
4.2.1. Fitness functions
Two competing objectives were defined as the fitness functions:
the first was minimization of the number of used features and the
second was minimization of the classification error. Four popular classifiers, namely the K-nearest neighbor classifier (KNNC) (Grother, Candela, & Blue, 1997), the nearest mean classifier (NMC) (Veenman & Tax, 2005), the linear discriminant classifier (LDC) (Du & Chang, 2000) and the least-squares support vector machine (LS-SVM) (Suykens & Vandewalle, 1999), were employed as induction algorithms to implement and evaluate the proposed feature selection approach.
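The two objectives can be sketched as follows; a from-scratch 1-nearest-neighbour classifier stands in here for the four induction algorithms above (an assumption for illustration only):

```python
import numpy as np

def fitness(chrom, X_train, y_train, X_test, y_test):
    """The two wrapper objectives for a binary chromosome: (number of
    selected features, classification error). A 1-nearest-neighbour
    classifier is used as a simple stand-in induction algorithm."""
    idx = np.flatnonzero(chrom)
    if idx.size == 0:
        return (0, 1.0)                       # empty subset: worst error
    Xtr, Xte = X_train[:, idx], X_test[:, idx]
    # squared Euclidean distances between every test and training sample
    d = ((Xte[:, None, :] - Xtr[None, :, :]) ** 2).sum(axis=2)
    pred = y_train[np.argmin(d, axis=1)]
    return (int(idx.size), float(np.mean(pred != y_test)))

# demo: selecting only the informative first column (second column is
# constant) classifies the toy data perfectly with a single feature
X_train = np.array([[0.0, 5.0], [1.0, 5.0]])
y_train = np.array([0, 1])
X_test = np.array([[0.1, 5.0], [0.9, 5.0]])
y_test = np.array([0, 1])
n_feat, err = fitness([1, 0], X_train, y_train, X_test, y_test)
```

NSGA-II then searches for chromosomes that are Pareto-optimal with respect to these two returned values.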
4.2.2. Encoding scheme
A binary coding system was used to represent the chromosome in this investigation. For a chromosome representing a feature subset, a bit with value '1' indicates that the corresponding feature is selected, and '0' indicates that it is not.
4.2.3. Genetic operators
The genetic operators consist of three basic operators, i.e., selection, crossover and mutation. Binary tournament selection, which can obtain better results than proportional and genitor selection, was adopted to select the individuals of the next generation. The crossover technique used was uniform crossover, which exchanges the genetic material of the two selected parents uniformly at several points. The mutation operator used in this work was the conventional mutation operator, operating on each bit separately and randomly changing its value.
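The three operators can be sketched for binary chromosomes as follows (a minimal illustration; NSGA-II's non-dominated sorting and crowding-distance machinery are omitted):

```python
import random

def binary_tournament(pop, fitness):
    """Pick two random individuals; the fitter survives
    (here, higher fitness is assumed better)."""
    a, b = random.sample(range(len(pop)), 2)
    return pop[a] if fitness[a] >= fitness[b] else pop[b]

def uniform_crossover(p1, p2):
    """Exchange genetic material of the two parents uniformly:
    each position independently comes from either parent."""
    mask = [random.random() < 0.5 for _ in p1]
    c1 = [x if m else y for x, y, m in zip(p1, p2, mask)]
    c2 = [y if m else x for x, y, m in zip(p1, p2, mask)]
    return c1, c2

def bitflip_mutation(chrom, rate):
    """Flip each bit independently with probability `rate`."""
    return [1 - b if random.random() < rate else b for b in chrom]

# demo: with complementary parents, the children stay complementary
random.seed(0)
p1, p2 = [1, 1, 1, 1], [0, 0, 0, 0]
c1, c2 = uniform_crossover(p1, p2)
```

A mutation rate of 1/N (as in Section 5) flips, on average, one bit per chromosome per generation.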
5. Results and discussion
5.1. Feature extraction
For all 160 samples described in Table 1, 160 time–frequency matrices were obtained by employing the S transform described in Section 3. Fig. 4 displays the time–frequency distributions of vibration signals from the eight states of the gearbox obtained by the S transform. The resolution of the time–frequency representations calculated by the S transform is very satisfactory, and the representations of the gearbox in different states are visibly different. Consequently, we can expect the S transform to be very useful for classifying the vibration signals.
However, it is impossible to use the original time–frequency matrices directly for classification because of their high dimension. In this work, the dimension of each time–frequency matrix is 1024 × 2048, so the feature vector would have 2,097,152 dimensions if every matrix were regarded as an input vector. It is not acceptable for any pattern recognition system to deal with such high-dimensional input vectors. Thus it is very desirable to reduce the data dimension to an acceptable scale while preserving as much of the information in the matrices as possible.
A new subspace decomposition technique, NMF, is employed to reduce the dimension of the time–frequency matrices. With the S transform, 160 time–frequency matrices can be obtained. All these matrices are standardized and normalized to fulfill the non-negativity constraints of NMF. Forty samples, five samples for each
state, were selected as training samples. All the matrices are trans-
formed to vectors firstly and a training matrix can be formed for
NMF. We first apply the NMF to the training samples to extract
non-negative basis vectors W and associated encoding variables
H . Feature components can be computed from these non-negative
basis vectors W for the other samples. The parameter r, which is the reduced dimension, was chosen to be 100 in this paper, and the iterative step size was set to 50. These parameters were chosen based on preliminary experiments. By computing the feature components with the extracted basis vectors, 100 parameters are obtained as features for every sample. The feature dimension was
reduced from 2,097,152 to 100 by the NMF technique.
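The paper computes feature components for the remaining samples from the learned basis W but does not spell out the projection step. One plausible sketch (our assumption, not the authors' method) holds W fixed and iterates only the H update of Eq. (11):

```python
import numpy as np

def nmf_encode(W, v, n_iter=500):
    """Encode a new non-negative sample v against a fixed, learned basis
    W (columns summing to one, per Eq. (10)) by iterating only the H
    update of Eq. (11). The paper does not spell out how test samples
    are projected; this is one plausible reading."""
    eps = 1e-9
    h = np.full(W.shape[1], 1.0 / W.shape[1])  # uniform initial encoding
    for _ in range(n_iter):
        h *= W.T @ (v / (W @ h + eps))
    return h

# demo: a sample built from the basis is reconstructed closely
rng = np.random.default_rng(0)
W = rng.random((20, 3))
W /= W.sum(axis=0)                             # column-normalized basis
v = W @ np.array([1.0, 2.0, 3.0])
h = nmf_encode(W, v)
rel_err = np.linalg.norm(W @ h - v) / np.linalg.norm(v)
```

With W fixed, minimizing the divergence over h alone is a convex problem, so this projection is well behaved.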
5.2. Feature selection
One hundred feature components can be obtained via the NMF
and the S transform. A feature selection procedure is needed to find the most informative feature components from the original one hundred features. According to the feature selection approach
based on the mutual information and NSGA-II as described in
Section 4, the most discriminative feature vector can be acquired.
We first partitioned the 160 collected samples into two parts: a training dataset and a testing dataset (10 samples of each state for training and the other 10 samples for testing). This random segmentation was repeated 20 times to obtain more robust evaluation results. The sequential feature subsets S_1 ⊂ S_2 ⊂ ⋯ ⊂ S_N were obtained from the training dataset by using the mRMR criterion and the greedy search described in Section 4. Then the performances
of these sequential feature subsets were evaluated by the testing
dataset. Four classifiers, as described in Section 4, were adopted
to assess the proposed scheme. Fig. 5 has given out the average
performances of four classifiers using the sequential feature sub-
sets over the 20 randomly segmented datasets.
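The mRMR greedy forward search that produces these nested subsets can be sketched as follows. Mutual information is estimated here with a simple 2-D histogram, and the score maximised at each step is relevance minus mean redundancy, I(f; c) − (1/|S|) Σ_{s∈S} I(f; s); the function names and the bin count are illustrative assumptions, not details from the paper:

```python
import numpy as np

def mutual_info(x, y, bins=8):
    """Histogram estimate of I(X; Y) in nats for two 1-D samples."""
    c_xy, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = c_xy / c_xy.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    nz = p_xy > 0
    return float((p_xy[nz] * np.log(p_xy[nz] / (p_x @ p_y)[nz])).sum())

def mrmr(X, y, n_select):
    """Greedy mRMR forward search. Returns feature indices in the order
    they were added, so the prefixes give the nested subsets S1, S2, ..."""
    n_feat = X.shape[1]
    relevance = np.array([mutual_info(X[:, j], y) for j in range(n_feat)])
    selected = [int(np.argmax(relevance))]       # most relevant feature first
    while len(selected) < n_select:
        best, best_score = None, -np.inf
        for j in range(n_feat):
            if j in selected:
                continue
            redundancy = np.mean([mutual_info(X[:, j], X[:, k]) for k in selected])
            score = relevance[j] - redundancy    # max-relevance, min-redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected
```

On the gearbox data, X would hold the 100 NMF feature components per sample and y the eight state labels.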
Clearly, the performance of the four classifiers did not improve
steadily as the number of features increased. This observation supports
the assumption that the original feature set contains many irrelevant
and redundant features. Another observation is that the performances of
the four classifiers followed different trends, and their optimal
feature subsets differed as well. The candidate feature subset of each
classifier was selected as the smallest subset achieving the best
performance. Accordingly, the candidate feature subsets for NMC, KNNC,
LDC and LS-SVM were chosen as S50, S45, S38 and S54, respectively.
These candidate feature subsets were used for further selection using
NSGA-II.
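As one concrete example of the classifiers involved, a nearest mean classifier (NMC) reduces to a few lines: each class is represented by the mean of its training feature vectors, and a test sample is assigned to the class with the closest mean. This is a generic textbook NMC sketch, not the authors' implementation:

```python
import numpy as np

class NearestMeanClassifier:
    """Minimal nearest mean classifier (NMC) using Euclidean distance."""

    def fit(self, X, y):
        # One mean vector per class, computed over that class's rows of X.
        self.classes_ = np.unique(y)
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        # Distance from every sample to every class mean; pick the nearest.
        d = np.linalg.norm(X[:, None, :] - self.means_[None, :, :], axis=2)
        return self.classes_[np.argmin(d, axis=1)]
```

KNNC, LDC and LS-SVM play the same role in the scheme: any induction algorithm that maps a feature subset to a test-set classification rate can be plugged into the wrapper.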
Based on the candidate feature subsets, more compact feature
subsets can be acquired by using wrappers with NSGA-II. The four
classifiers were also employed as induction algorithms combined
with the NSGA-II as wrappers for feature selection.
We implemented the NSGA-II method with the following parameters:

– population size: 100.
– generation: 200.
B. Li et al. / Expert Systems with Applications 38 (2011) 10000–10009 10005
– crossover rate: 0.9.
– mutation rate: 1/N (N is the size of the candidate feature
subset).
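At the core of NSGA-II is fast non-dominated sorting of the population into successive Pareto fronts; in this wrapper the two minimised objectives would be, e.g., classification error and feature subset size. The following is a generic sketch of that sorting step under those assumptions, not the authors' code:

```python
def dominates(a, b):
    """a dominates b when a is no worse in every objective and strictly
    better in at least one (both objectives are minimised here)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated_sort(points):
    """Fast non-dominated sorting: partition the population into
    successive Pareto fronts (front 0 is the non-dominated set)."""
    n = len(points)
    dominated_by = [[] for _ in range(n)]   # indices that solution i dominates
    dom_count = [0] * n                     # how many solutions dominate i
    fronts = [[]]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if dominates(points[i], points[j]):
                dominated_by[i].append(j)
            elif dominates(points[j], points[i]):
                dom_count[i] += 1
        if dom_count[i] == 0:
            fronts[0].append(i)
    k = 0
    while fronts[k]:
        nxt = []
        for i in fronts[k]:
            for j in dominated_by[i]:
                dom_count[j] -= 1
                if dom_count[j] == 0:   # only solutions in earlier fronts dominate j
                    nxt.append(j)
        fronts.append(nxt)
        k += 1
    return fronts[:-1]
```

For feature selection, each point would be the (error rate, subset size) pair of one binary chromosome over the candidate feature subset, which is why F2 and F3 return several Pareto-optimal subsets rather than a single one.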
We again randomly partitioned the dataset into training and testing
datasets 20 times to assess the performance of the presented wrapper
methods with the four classifiers.
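The repeated stratified partitioning used throughout (10 training and 10 testing samples per state, 20 repeats) can be sketched as follows; the function and variable names are illustrative:

```python
import random

def stratified_splits(labels, n_train_per_class=10, n_repeats=20, seed=0):
    """Yield (train_idx, test_idx) pairs: for each repeat, draw
    n_train_per_class samples of every state for training and keep the
    remaining samples of that state for testing."""
    rng = random.Random(seed)
    by_class = {}
    for idx, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(idx)
    for _ in range(n_repeats):
        train, test = [], []
        for idxs in by_class.values():
            shuffled = idxs[:]
            rng.shuffle(shuffled)
            train += shuffled[:n_train_per_class]
            test += shuffled[n_train_per_class:]
        yield train, test
```

Averaging a classifier's test accuracy over the 20 yielded splits gives the curves reported in Fig. 5 and the rates in Tables 2–5.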
Fig. 5. Performances of four classifiers (SVM, NMC, LDC, KNNC) with
sequential feature subsets obtained based on mRMR criterion. (Figure:
classification rate, 0.1–1, versus number of features, 0–100.)
Fig. 4. S transforms of vibration signals in Fig. 3: (a)–(h) correspond
to states 1–8 in Table 1. (Each panel: frequency [Hz], 0–3000, versus
time [s], 0–0.6.)
Tables 2–5 display the results of the proposed feature selection scheme
with the four classifiers. For comparison, the original feature set, the
candidate feature subset and the feature subset formed by applying the
wrapper directly to the original feature set were also evaluated with
the four classifiers. For convenience, we denote the original feature
set as F0, the filtered feature subset as F1, the wrapper feature subset
as F2 and the two-stage feature subset as F3. Note that F2 and F3 each
yield multiple (Pareto-optimal) solutions to the feature selection
problem.
For the NMC algorithm, the best performance, 98.89%, is achieved by the
feature subset F2 with 33 features. With the original feature set F0,
which contains all 100 features, the classification rate is only 80.83%.
The performance with F1 is not fully satisfactory but is still superior
to F0 while using only half as many features. Although the performances
of F3 do not surpass F2, its feature subsets are much smaller: with only
10 features, F3 achieves 96.67%.
For the KNNC classifier, the highest classification rate, 97.78%, was
reached by F3 with 8 features. The performances of the feature subsets
in F2 were also competitive, but their sizes are larger than those of
F3. As with the NMC classifier, the worst performance is obtained by F0,
and F1 performs better than F0 with fewer features.
With the LDC classifier, a 100% classification rate is achieved by F2
with 23 parameters. Among the feature subsets in F3, the best
performance is 98.89%, comparable to F2, with a subset size of only 12,
smaller than F2. The performance of the original feature set F0 is very
poor with the LDC classifier, at only 38.28%. F1 gives a performance of
89%, much better than F0 but worse than F2 and F3.
When LS-SVM is employed as the classifier, the highest classification
rate, 100%, is achieved by both F2 and F3. It can also be observed that
the dimension of the corresponding subset in F3 is much smaller than in
F2. The performance of F0 remains the worst of the four feature subsets.
5.3. Discussions
(1) It can be observed that, for all classifiers, the performance
obtained with the original feature set (F0) is the worst among the
four feature subsets, even though F0 is the largest. This confirms
our assumption that the original feature set contains many
irrelevant and redundant features, which degrade the performance
and increase the computational cost of the classifiers. A feature
selection procedure is therefore necessary before classification.
(2) Comparing F1 with F0, the mRMR method achieves a better
classification rate than the original feature set with fewer
features. However, compared with F2 and F3, its performance is
poorer and its feature subsets are larger. This is because the
filter method does not involve any induction algorithm, so its
classification accuracy falls short of the wrapper methods. The
advantage of mRMR lies in its much lower computational cost
compared with the wrapper methods.
(3) The performances of F3 are promising compared with the other
methods; the classification rate of LS-SVM reached 100% with F3.
The performances of F2 are also competitive, and in some cases F2
outperforms F3 in terms of classification accuracy. The main
disadvantages of applying the wrapper to the original feature set
are the large computational cost and the larger size of the
selected feature subsets.
Table 2
Performances of NMC with different feature subsets.

Feature subset                              Solution  Feature size  Performance (%)
Original (F0)                               –         100           80.83
mRMR (F1)                                   –         50            84.83
NSGA-II (F2) (Pareto-optimal solutions)     1         23            95.56
                                            2         28            97.78
                                            3         33            98.89
mRMR + NSGA-II (F3) (Pareto-optimal         1         5             92.22
solutions)                                  2         6             93.33
                                            3         8             94.45
                                            4         10            96.67

Table 3
Performances of KNNC with different feature subsets.

Feature subset                              Solution  Feature size  Performance (%)
Original (F0)                               –         100           78.50
mRMR (F1)                                   –         45            83.06
NSGA-II (F2) (Pareto-optimal solutions)     1         25            87.78
                                            2         26            88.89
                                            3         30            94.45
                                            4         31            96.67
mRMR + NSGA-II (F3) (Pareto-optimal         1         4             85.56
solutions)                                  2         5             92.22
                                            3         6             94.45
                                            4         8             97.78

Table 4
Performances of LDC with different feature subsets.

Feature subset                              Solution  Feature size  Performance (%)
Original (F0)                               –         100           38.28
mRMR (F1)                                   –         38            89.00
NSGA-II (F2) (Pareto-optimal solutions)     1         14            91.11
                                            2         15            95.56
                                            3         16            97.78
                                            4         19            98.89
                                            5         23            100
mRMR + NSGA-II (F3) (Pareto-optimal         1         5             93.34
solutions)                                  2         6             94.45
                                            3         7             96.67
                                            4         10            97.78
                                            5         12            98.89

Table 5
Performances of LS-SVM with different feature subsets.

Feature subset                              Solution  Feature size  Performance (%)
Original (F0)                               –         100           88.28
mRMR (F1)                                   –         54            93.61
NSGA-II (F2) (Pareto-optimal solutions)     1         28            88.89
                                            2         32            97.78
                                            3         33            98.89
                                            4         35            100
mRMR + NSGA-II (F3) (Pareto-optimal         1         6             88.89
solutions)                                  2         12            92.22
                                            3         16            100
(4) For different classifiers, the classification rates obtained with
the same feature selection technique differ from one another. This
confirms the assumption that no common optimal feature subset
exists for all classifiers. It also suggests that different
classifiers with different feature subsets can potentially offer
complementary information about the patterns to be classified, so
combining different classifiers is a desirable route to better
performance. Future work may explore the capacity of classifier
ensemble schemes for intelligent fault diagnosis.
6. Conclusions
This investigation has described a feature extraction and feature
selection scheme for hybrid fault diagnosis of gearbox based on the S
transform, non-negative matrix factorization, mutual information and
NSGA-II. For feature extraction, the S transform was first adopted to
obtain high-resolution time–frequency distributions of the vibration
signals. The non-negative matrix factorization technique was then
applied to extract informative features from these time–frequency
representations.

Next, a two-stage feature selection scheme combining the filter and
wrapper methods was outlined based on mutual information and NSGA-II.
The filter stage, implemented with the mutual-information-based mRMR
criterion, produces a candidate feature subset. Based on this candidate
subset, a wrapper combining the classifiers with the multi-objective
evolutionary algorithm NSGA-II was adopted to obtain a more compact
feature subset and higher classification accuracy.
Eight different fault states were simulated on a gearbox to evaluate the
effectiveness of the proposed intelligent fault diagnosis system. To
assess the generality of the proposed feature extraction and selection
methods, four different classifiers were employed in this
investigation. Moreover, several other feature selection schemes were
implemented and compared with the proposed approach. Experimental
results have shown that the proposed feature extraction and feature
selection scheme gives very promising performance with a very small
feature subset dimension.

This research demonstrates clearly that the presented intelligent fault
diagnosis system has great potential as an effective and efficient tool
for the fault diagnosis of gearbox and can easily be extended to other
rotating machinery.
Acknowledgments
This research is supported by the National Natural Science
Foundation of China (No. 50705097) and Natural Science Founda-
tion of Hubei Province (No. E2007001048).