A new feature extraction and selection scheme for hybrid fault diagnosis of gearbox
Bing Li a,b,⇑, Pei-lin Zhang a, Hao Tian a, Shuang-shan Mi b, Dong-sheng Liu b, Guo-quan Ren a
a First Department, Mechanical Engineering College, No. 97, Hepingxilu Road, Shi Jia-zhuang, He Bei province, PR China
b Fourth Department, Mechanical Engineering College, No. 97, Hepingxilu Road, Shi Jia-zhuang, He Bei province, PR China
Article info
Keywords:
Gearbox
Hybrid fault diagnosis
Feature extraction
Feature selection
S transform
Non-negative matrix factorization (NMF)
Mutual information
Non-dominated sorting genetic algorithms
II (NSGA-II)
Abstract
A novel feature extraction and selection scheme was proposed for hybrid fault diagnosis of gearbox based on the S transform, non-negative matrix factorization (NMF), mutual information and multi-objective evolutionary algorithms. Time–frequency distributions of vibration signals, acquired from a gearbox in different fault states, were obtained by the S transform. Non-negative matrix factorization (NMF) was then employed to extract features from the time–frequency representations. Furthermore, a two-stage feature selection approach combining filter and wrapper techniques, based on mutual information and the non-dominated sorting genetic algorithm II (NSGA-II), was presented to obtain a more compact feature subset for accurate classification of hybrid faults of the gearbox. Eight fault states, including gear defects, bearing defects and combinations of gear and bearing defects, were simulated on a single-stage gearbox to evaluate the proposed feature extraction and selection scheme. Four different classifiers were combined with the presented techniques for classification, and their performances with different feature subsets were compared. The experimental results reveal that the proposed feature extraction and selection scheme is an effective and efficient tool for hybrid fault diagnosis of gearbox.
© 2011 Elsevier Ltd. All rights reserved.
1. Introduction
Gearbox is one of the core components in rotating machinery
and has been widely employed in various industrial equipment.
Faults occurring in gearbox components such as gears and bearings
must be detected as early as possible to avoid fatal breakdowns
of machines and prevent loss of production and human casualties.
Vibration-based analysis is the most commonly used technique for monitoring the condition of gearboxes. By employing appropriate
data analysis algorithms, it is feasible to detect changes in vibration signals caused by faulty components, and to make decisions about the gearbox's health status (Al-Ghamd & Mba, 2006; Chen,
He, Chu, & Huang, 2003; Lin & Zuo, 2003; Saravanan, Cholairajan,
& Ramachandran, 2008; Wang & McFadden, 1993). Although often
the visual inspection of the frequency domain features of the mea-
sured signals is adequate to identify the faults, many techniques
available presently require a good deal of expertise to apply them
successfully. Simpler approaches are needed which allow relatively
unskilled operators to make reliable decisions without the need for
a diagnosis specialist to examine data and diagnose problems. Con-
sequently, there is a need for a reliable, fast and automated proce-
dure of diagnostics. Various intelligent techniques such as artificial
neural networks (ANN), support vector machines (SVM), fuzzy logic and evolutionary algorithms (EA) have been successfully applied to
automated detection and diagnosis of machine conditions (Firpi
& Vachtsevanos, 2008; Lei, He, Zi, & Hu, 2007; Lei & Zuo, 2009;
Samanta, 2004; Samanta, Al-Balushi, & Al-Araimi, 2003; Samanta
& Nataraj, 2009; Srinivasan, Cheu, Poh, & Ng, 2000; Wuxing, Tse,
Guicai, & Tielin, 2004). They have largely improved the reliability
and automation of fault diagnosis systems for gearbox. For intelligent fault diagnosis systems, feature extraction and feature selection can be regarded as the two most important steps.
Feature extraction is a mapping process from the measured sig-
nal space to the feature space. Representative features associated
with the conditions of machinery components should be extracted
by using appropriate signal processing and calculating approaches.
Over the past few years, various techniques including Fourier
transform (FT), envelope analysis, wavelet analysis, empirical
mode decomposition (EMD) and time–frequency distributions
were employed to process the vibration signals (Lin & Qu,
2000; Oehlmann, Brie, Tomczak, & Richard, 1997; Peng, Tse, &
Chu, 2005; Randall, Antoni, & Chobsaard, 2001; Wang & McFadden,
1993). Based on these processing techniques, statistical calculation methods, the autoregressive (AR) model, singular value decomposition (SVD), principal component analysis (PCA) and independent component analysis (ICA) have been adopted to extract representative features for machinery fault diagnosis (Junsheng, Dejie, & Yu,
2006; Lei, He, Zi, & Chen, 2008; Li, Shi, Liao, & Yang, 2003; Wang,
Luo, Qin, Leng, & Wang, 2008; Widodo, Yang, & Han, 2007). Even
though several techniques have been proposed in the literature
0957-4174/$ - see front matter © 2011 Elsevier Ltd. All rights reserved. doi:10.1016/j.eswa.2011.02.008
⇑ Corresponding author at: First Department, Mechanical Engineering College,
No. 97, Hepingxilu Road, Shi Jia-zhuang, He Bei province, PR China.
E-mail address: [email protected] (B. Li).
Expert Systems with Applications 38 (2011) 10000–10009
for feature extraction, it still remains a challenge in implementing
a diagnostic tool for real-world monitoring applications because of
the complexity of machinery structures and operating conditions.
This investigation implements the feature extraction scheme by
utilizing two newly developed techniques, the S transform and
non-negative matrix factorization (NMF). The S transform intro-
duced by Stockwell (Stockwell, Mansinha, & Lowe, 1996), which
combines the separate strengths of the STFT and wavelet trans-
forms, has provided an alternative approach to process the non-
stationary signals generated by mechanical systems. It employs
variable window length. The frequency-dependent window func-
tion produces higher frequency resolution at lower frequencies,
while at higher frequencies, sharper time localization can be
achieved. S transform has become a valuable tool for the analysis
of signals in many applications (Assous, Humeau, Tartas, Abraham,
& L’Huillier, 2006; Dash & Chilukuri, 2004; Dash, Panigrahi, & Pan-
da, 2003). Non-negative matrix factorization (NMF) is a new sub-
space decomposition technique which is proposed to extract key
features of data by operating an iterative matrix factorization
(Lee & Seung, 1999). Unlike other similar methods, NMF imposes a non-negativity constraint on the factorization, which leads to intuitive, parts-based representations of the data in its factors. Due to this property, NMF has been adopted in various
applications such as face expression and recognition (Pu, Yi, Zheng,
Zhou, & Ye, 2005), object detection (Liu & Zheng, 2004), image
compression and classification (Yuan & Oja, 2005), sounds classifi-
cation (Cho & Choi, 2005), etc. In this paper, a novel feature extrac-
tion scheme based on S transform and non-negative matrix
factorization (NMF) for hybrid fault diagnosis of gearbox is
presented.
Feature selection is another indispensable procedure for an intelligent fault diagnosis system, since the extracted features still contain noise and irrelevant or redundant information. Moreover, too many features can cause the curse of dimensionality, because the number of training samples must grow exponentially with the number of features in order to learn an accurate model. Therefore, a feature selection procedure is indeed needed before classification. Several studies have addressed
this issue, such as genetic algorithms (GAs) ( Jack & Nandi, 2002;
Jack, Nandi, & McCormick, 2000; Samanta, 2004), decision tree
(Sugumaran, Muralidharan, & Ramachandran, 2007) and distance
evaluation technique (Lei et al., 2008; Lei & Zuo, 2009).
Filter and wrapper methods are the two main categories of feature selection algorithms. Filter methods evaluate the goodness of a feature subset by using the intrinsic characteristics of the data. They are relatively cheap computationally since they do not involve the induction algorithm. However, they also run the risk of selecting feature subsets that may not match the chosen induction algorithm. Wrapper methods, on the contrary, directly use the classifiers to evaluate the feature subsets. They generally outperform filter methods in terms of prediction accuracy, but they are generally more computationally intensive (Zhu, Ong, & Dash,
2007). In summary, wrapper and filter methods can complement
each other, in that filter methods can search through the feature
space efficiently while the wrappers can provide good accuracy.
It is desirable to combine the filter and wrapper methods to
achieve high efficiency and accuracy simultaneously. In this work,
a two-stage feature selection technique combining filter and wrapper selection based on mutual information and the improved non-dominated sorting genetic algorithm NSGA-II was proposed. In the first stage, a candidate feature subset was chosen from the original feature set by the max-relevance and min-redundancy (mRMR) criterion (Peng, Long, & Ding, 2005). In the second stage, a classifier combined with NSGA-II was adopted to find a more compact feature subset from the candidate feature subset obtained by the filter method. In this stage, the feature selection problem is defined as a multi-objective problem dealing with two competing objectives. Consequently, an optimal feature subset that consists of a minimal number of features and produces the minimum classification error can be obtained for intelligent fault diagnosis. Fig. 1 displays the flowchart of the intelligent fault diagnosis system for gearbox in this investigation.
The rest of this paper is organized as follows. Section 2
describes the experiment setup and vibration dataset. The feature
extraction method based on S transform and NMF is detailed in
Section 3. The two stage feature selection scheme based on mutual
information and NSGA-II is outlined in Section 4. Section 5 presents
the experiment results and discussions. Conclusions are summa-
rized in Section 6.
2. Experimental setup and data acquisition
Fig. 2 displays the diagram of the experimental system used in
this work to evaluate the performance of the proposed approach.
The experimental system includes a single-stage gearbox, an AC motor and a magnetic brake. The AC motor is used to drive the gearbox
[Fig. 1 depicts the pipeline: vibration signals from the sensor-equipped gearbox are collected by the signal acquisition system (data acquisition); S transform time–frequency distributions are compressed by NMF into the original feature subset (feature extraction); filter methods based on mutual information yield the candidate feature subset and wrapper methods based on NSGA-II yield the final optimal feature subset (feature selection); classifier outputs give the gearbox condition diagnosis (classification).]
Fig. 1. Flowchart of the presented intelligent fault diagnosis system.
[Fig. 2 shows the gearbox driven by the motor and loaded at the output: gear A (30 teeth) meshes with gear B (50 teeth); bearings B1–B4 support the two shafts.]
Fig. 2. Structure of the experiment gearbox.
and the rotating speed is controlled by a speed controller, which allows the tested gearbox to operate at various speeds. The load
is provided by the magnetic brake connected to the output shaft
and the torque can be adjusted by a brake controller. There are
two shafts inside the gearbox, which are mounted to the gearbox
housing by four rolling element bearings. Gear A has 30 teeth
and gear B has 50 teeth.
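From the tooth counts above, the characteristic frequencies of the test rig follow directly. A quick worked example (assuming, as listed in Table 1, a speed of 1200 rpm, and that this is the speed of the input shaft carrying gear A — the paper does not state which shaft the speed refers to):

```python
# Characteristic frequencies of the test gearbox. Assumption (not stated
# explicitly in the paper): the 1200 rpm in Table 1 is the speed of the
# input shaft carrying gear A (30 teeth).
motor_rpm = 1200
teeth_a, teeth_b = 30, 50

f_input = motor_rpm / 60.0     # input shaft rotation frequency: 20 Hz
f_mesh = f_input * teeth_a     # gear mesh frequency: 20 * 30 = 600 Hz
f_output = f_mesh / teeth_b    # output shaft rotation frequency: 12 Hz
```

Under these assumptions the mesh frequency (600 Hz) sits comfortably below the Nyquist frequency of the 6400 Hz sampling rate used in the experiments.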
Local faults such as gear tooth wear, bearing inner race defects and bearing outer race defects are the fault modes most frequently studied in gearbox fault diagnosis. Previous studies mainly focused on gear faults or bearing faults independently. Little research has been done on detecting hybrid faults in which gear and bearing defects occur simultaneously. This work investigated this issue by simulating hybrid fault types of gears and bearings in a single-stage gearbox. Eight fault states including gear wear, bearing
inner race defect, bearing outer race defect and their combinations
were tested in the experiments.
As shown in Fig. 2, all the defects were set on gear A and bearing
B1. The detailed descriptions about the eight operating states are
summarized in Table 1.
Vibration signals were acquired by using acceleration sensors
mounted on the four bearing bases. The sampling frequency was 6400 Hz and each sample contained 4096 points. Twenty samples for every state were acquired in the tests; hence, 160 samples in total were collected for further investigation. Fig. 3 demonstrates the waveforms of the eight operating states in the time domain.
3. Feature extraction based on S transform and non-negative
matrix factorization (NMF)
In this section, the feature extraction scheme based on the two newly developed techniques, namely the S transform and non-negative matrix factorization (NMF), is presented.
Vibration signals acquired from a gearbox are non-stationary and complex, and contain rich information about the operating conditions. Thus joint time–frequency analysis is often adopted to describe the local information of these non-stationary signals completely. The S transform, which combines the separate strengths of the STFT and wavelet transforms, was employed to obtain high-resolution time–frequency representations of the vibration signals.
However, it is not reasonable to classify these time–frequency distributions directly because their dimensionality is too high to deal with. Thus the NMF technique was utilized to reduce the high-dimensional feature space while preserving most of the information in the original time–frequency representations.
3.1. S transform
The S transform, put forward by Stockwell in 1996, can be regarded as an extension of the ideas of the Gabor transform and the wavelet transform. The S transform of a signal x(t) is defined as:

S(\tau, f) = \int_{-\infty}^{+\infty} x(t)\, w(t - \tau)\, e^{-j 2\pi f t}\, dt \quad (1)

where

w(t) = \frac{1}{\sigma \sqrt{2\pi}}\, e^{-t^2 / 2\sigma^2} \quad (2)

and

\sigma = \frac{1}{|f|} \quad (3)

Combining Eqs. (1)–(3), the S transform can be written as:

S(\tau, f) = \int_{-\infty}^{+\infty} x(t)\, \frac{|f|}{\sqrt{2\pi}}\, e^{-(t - \tau)^2 f^2 / 2}\, e^{-j 2\pi f t}\, dt \quad (4)
Since the S transform is a representation of the local spectra, the Fourier (time-averaged) spectrum can be obtained directly by averaging the local spectra:

\int_{-\infty}^{+\infty} S(\tau, f)\, d\tau = X(f) \quad (5)

where X(f) is the Fourier transform of x(t). The inverse S transform is given by

x(t) = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} S(\tau, f)\, e^{j 2\pi f t}\, d\tau\, df \quad (6)
The main advantage of the S transform over the short-time Fourier
transform (STFT) is that the standard deviation r is actually a func-
tion of frequency f . Consequently, the window function is also a
function of time and frequency. As the width of the window is controlled by the frequency, the window is clearly wider in the time domain at lower frequencies and narrower at
higher frequencies. In other words, the window provides good
localization in the frequency domain for low frequencies while pro-
viding good localization in time domain for higher frequencies. It is
a very desirable characteristic for accurate representation of non-
stationary vibration signals in time–frequency domain.
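As an illustration of the discrete S transform (a minimal sketch under our own assumptions, not the authors' implementation), the transform can be computed voice by voice in the frequency domain: each frequency row is the inverse FFT of the signal spectrum shifted by n bins and weighted by a Gaussian whose width follows Eqs. (2) and (3):

```python
import numpy as np

def s_transform(x):
    """Discrete S transform of a real signal x of length N, computed
    frequency voice by frequency voice: row n is the inverse FFT of the
    spectrum shifted by n bins and weighted by a Gaussian window whose
    width scales with n, following Eqs. (2)-(4)."""
    N = len(x)
    X = np.fft.fft(x)
    half = N // 2
    S = np.zeros((half, N), dtype=complex)
    S[0, :] = np.mean(x)            # zero-frequency voice: signal mean
    m = np.arange(N)
    m[m > half] -= N                # symmetric frequency offsets
    for n in range(1, half):
        gauss = np.exp(-2.0 * np.pi**2 * m**2 / n**2)
        S[n, :] = np.fft.ifft(np.roll(X, -n) * gauss)
    return S

# demo: a pure tone at frequency bin 10 concentrates energy in row 10
sig = np.cos(2 * np.pi * 10 * np.arange(256) / 256)
S = s_transform(sig)
peak_row = int(np.argmax(np.abs(S).sum(axis=1)))
```

The shape of the result — frequency rows by time columns — is exactly the kind of time–frequency matrix that Section 5 later compresses with NMF.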
3.2. Non-negative matrix factorization (NMF)
The NMF algorithm is a technique that compresses a matrix into
a smaller number of basis functions and their encodings (Lee &
Seung, 1999). The factorization can be expressed as follows:

V_{n \times m} \approx W_{n \times r} H_{r \times m} \quad (7)

where V denotes an n × m matrix and m is the number of examples in the dataset; each column of V contains an n-dimensional observed data vector with non-negative values. This matrix is then approximately factorized into an n × r matrix W and an r × m matrix H. The rank r of the factorization is usually chosen such that (n + m)r < nm, so that compression or dimensionality reduction is achieved. This results in a compressed version of the original data matrix. In other words, each data vector in V is approximated by a linear combination of the columns of W, weighted by the components of H. Therefore, W can be regarded as the basis matrix and H as the coefficient matrix.
The key characteristic of NMF is the non-negativity constraints
imposed on the two factors, and the non-negativity constraints are
compatible with the intuitive notion of combining parts to form a
whole.
In order to compute the approximate factorization in Eq. (7), a cost function is needed to quantify the quality of the approximation. NMF uses the divergence measure as the objective function:

F = \sum_{i=1}^{n} \sum_{j=1}^{m} \left[ V_{ij} \log (WH)_{ij} - (WH)_{ij} \right] \quad (8)

which is subject to the non-negativity constraints described above.
Table 1
Description of the gearbox experiments.

Fault state  Fault description                                          Load (Nm)  Speed (rpm)
1            Normal                                                     100        1200
2            Gear wear
3            Bearing inner race
4            Bearing outer race
5            Gear wear and bearing inner race
6            Gear wear and bearing outer race
7            Bearing inner race and bearing outer race
8            Gear wear and bearing inner race and bearing outer race
In order to obtain W and H, multiplicative update rules are given in Lee and Seung (1999) as follows:

W_{ia} \leftarrow W_{ia} \sum_{j=1}^{m} \frac{V_{ij}}{(WH)_{ij}} H_{aj} \quad (9)

W_{ia} \leftarrow \frac{W_{ia}}{\sum_{i=1}^{n} W_{ia}} \quad (10)

H_{aj} \leftarrow H_{aj} \sum_{i=1}^{n} W_{ia} \frac{V_{ij}}{(WH)_{ij}} \quad (11)

In this way, the basis matrix W and the coefficient matrix H can be obtained by applying Eqs. (9)–(11) in an iterative procedure.
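The iterative procedure of Eqs. (9)–(11) can be sketched compactly as follows (an illustration, not the authors' code; the small epsilon guarding division by zero and the random initialization are our additions):

```python
import numpy as np

def nmf(V, r, n_iter=500, seed=0):
    """Multiplicative updates of Eqs. (9)-(11) (Lee & Seung, 1999):
    the non-negative n x m matrix V is factorized as V ~ W H, with W
    of size n x r (columns normalized by Eq. (10)) and H of size r x m."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, r)) + 0.1
    H = rng.random((r, m)) + 0.1
    eps = 1e-9                                # guard against division by zero
    for _ in range(n_iter):
        W *= (V / (W @ H + eps)) @ H.T        # Eq. (9)
        W /= W.sum(axis=0) + eps              # Eq. (10): normalize columns
        H *= W.T @ (V / (W @ H + eps))        # Eq. (11)
    return W, H

# demo: an exactly rank-2 non-negative matrix is recovered closely
rng = np.random.default_rng(1)
V = rng.random((6, 2)) @ rng.random((2, 8))
W, H = nmf(V, r=2)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

Because the updates only ever multiply by non-negative quantities, the factors stay non-negative throughout, which is what yields the parts-based representation discussed above.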
4. Two-stage feature selection based on mutual information and NSGA-II
As described in Section 3, for every signal f(n) we obtain a number of features via the S transform and NMF. Even though a dimension reduction has been performed by the NMF, direct manipulation of the whole set of feature components is not appropriate, because the feature space still has high dimensionality, and the existence of irrelevant and redundant components makes the classification unnecessarily difficult. Thus, a feature selection scheme combining a filter method and a wrapper method, based on mutual information and NSGA-II, is applied to identify a set of robust features that provides the most discrimination among the classes of gearbox vibration data. This significantly eases the design of the classifier and enhances the generalization capability of the system.
[Fig. 3 shows eight time-domain waveform panels (a)–(h), amplitude in volts (±2 V) over 0–0.6 s.]
Fig. 3. Vibration signals acquired from eight states of gearbox in the experiments: (a)–(h) corresponds to 1–8 states in Table 1.
4.1. Filter method using max-relevance and min-redundancy (mRMR)
based on mutual information
4.1.1. Mutual information
Mutual information is one of the most widely used measures to
define relevancy of variables (Peng et al., 2005). In this section, we
focus on feature selection method based on mutual information.
Given two random variables x and y, their mutual information can be defined in terms of their probability density functions p(x), p(y) and p(x, y):

I(x; y) = \iint p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)}\, dx\, dy \quad (12)
The estimation of the mutual information of two variables was
detailed in (Peng et al., 2005).
In supervised classification, one can view the classes as a vari-
able (that we will name C ) with L possible values (where L is the
number of classes of the system) and the feature component as an-
other variable (that we will name X ) with K possible values (where
K is the number of parameters of the system). So, one will be able
to compute the mutual information I ( xk, c ) between the classes c
and the feature xk (k = 1, 2, . . ., K ):
I(x_k; c) = \iint p(x_k, c) \log \frac{p(x_k, c)}{p(x_k)\, p(c)}\, dx_k\, dc \quad (13)
Then the informative variables with larger I ( xk, c ) can be identified.
A more compact feature subset can be obtained via selecting the d
best features based on Eq. (13) from the original feature set.
Eq. (13) provides us with a measure to evaluate the effective-
ness of the ‘‘global’’ feature that is simultaneously suitable to dif-
ferentiate all classes of signals. For a small number of classes,
this approach may be sufficient. The more signal classes, the more
ambiguous I ( xk, c ) becomes.
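For sampled data, the integral in Eq. (13) is replaced by a sum over estimated probabilities. A simple histogram-based estimate can illustrate the idea (a stand-in for the estimator detailed in Peng et al. (2005); the equal-width binning is our own assumption):

```python
import numpy as np

def mutual_information(x, c, bins=8):
    """Histogram estimate of I(x; c) in Eq. (13) between a continuous
    feature x and integer class labels c. Equal-width binning is used
    for simplicity; it is not the estimator of Peng et al. (2005)."""
    edges = np.histogram_bin_edges(x, bins=bins)
    xb = np.clip(np.digitize(x, edges[1:-1]), 0, bins - 1)
    joint = np.zeros((bins, int(c.max()) + 1))
    for i, j in zip(xb, c):
        joint[i, j] += 1.0
    p_xy = joint / joint.sum()                # joint distribution p(x, c)
    p_x = p_xy.sum(axis=1, keepdims=True)     # marginal p(x)
    p_c = p_xy.sum(axis=0, keepdims=True)     # marginal p(c)
    nz = p_xy > 0
    return float(np.sum(p_xy[nz] * np.log(p_xy[nz] / (p_x @ p_c)[nz])))

# demo: a feature that tracks the class carries far more information
# about it than independent noise does
rng = np.random.default_rng(0)
c = rng.integers(0, 2, size=1000)
informative = c + 0.05 * rng.standard_normal(1000)
noise = rng.standard_normal(1000)
mi_info = mutual_information(informative, c)
mi_noise = mutual_information(noise, c)
```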
4.1.2. Max-relevance and min-redundancy
Max-relevance means that the selected features x_i are required, individually, to have the largest mutual information I(x_i; c). It means that the m best individual features should be selected according to this criterion. It can be represented as:

\max D(S, c), \quad D = \frac{1}{|S|} \sum_{x_i \in S} I(x_i; c) \quad (14)

where |S| denotes the number of features contained in S.
However, it has been proved that simply combining the best individual features does not necessarily lead to good performance. In other words, ''the m best features are not the best m features'' (Kohavi & John, 1997; Peng et al., 2005). The most important problem with max-relevance is that it neglects the redundancy between features, which may degrade the classification performance.
So the min-redundancy criterion should be added to the selection of the optimal subsets. It can be represented as:

\min R(S), \quad R = \frac{1}{|S|^2} \sum_{x_i, x_j \in S} I(x_i; x_j) \quad (15)

The criterion combining the above two constraints is called ''minimal-redundancy–maximal-relevance'' (mRMR) (Peng et al., 2005). The operator \Phi(D, R) is defined to optimize D and R simultaneously:

\max \Phi(D, R), \quad \Phi = D - R \quad (16)
4.1.3. Candidate feature subset obtained based on max-relevance and
min-redundancy
In practice, greedy search methods can be used to find the near-optimal features defined by \Phi. Let F be the original feature set and S the selected subset. Suppose we already have S_{m-1}, i.e. we have selected m - 1 features. The next task is to select the mth feature from the set F - S_{m-1}. This is done according to the following criterion:

\max_{x_j \in F - S_{m-1}} \left[ I(x_j; c) - \frac{1}{m-1} \sum_{x_i \in S_{m-1}} I(x_j; x_i) \right] \quad (17)
The main steps can be represented as:

Step 1: Let F be the original feature set and S the selected subset. Initialize S as an empty subset, S ← {}.
Step 2: Calculate the relevance of each individual feature x_i with the target class c, denoted I(x_i; c).
Step 3: Find the feature x_k with the maximum relevance:

I(x_k; c) = \max_{x_i \in F} I(x_i; c)

Let F_1 ← F - {x_k}, S_1 ← S + {x_k}.
Step 4:

for m = 2 : N
    with x_j ∈ F_{m-1} and x_i ∈ S_{m-1}, find x_k according to the criterion:
    \max_{x_j \in F_{m-1}} \left[ I(x_j; c) - \frac{1}{m-1} \sum_{x_i \in S_{m-1}} I(x_j; x_i) \right]
    let F_m = F_{m-1} - {x_k}, S_m = S_{m-1} + {x_k}
end
In this way, N sequential feature subsets can be obtained that satisfy S_1 ⊂ S_2 ⊂ ⋯ ⊂ S_N. The next problem is how to choose an optimal set from this series, in other words, how to determine the number of features that the sub-optimal set should contain. Using cross-validation, we compare the performances of the N sequential feature subsets. The feature subset corresponding to the best performance is chosen as the candidate feature subset for further, more sophisticated selection using the wrapper.
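The greedy procedure of Steps 1–4 can be sketched as follows (a hypothetical helper for illustration; in practice the relevance and redundancy tables would be filled with mutual-information estimates computed from the training data):

```python
def mrmr_select(relevance, redundancy, n_select):
    """Greedy mRMR search of Steps 1-4 / Eq. (17). relevance[i] holds
    I(x_i; c) and redundancy[i][j] holds I(x_i; x_j)."""
    K = len(relevance)
    # Step 3: start from the single most relevant feature
    selected = [max(range(K), key=lambda i: relevance[i])]
    # Step 4: repeatedly add the feature maximizing relevance minus
    # mean redundancy with the already-selected set
    while len(selected) < n_select:
        remaining = (j for j in range(K) if j not in selected)
        best = max(remaining,
                   key=lambda j: relevance[j]
                   - sum(redundancy[j][i] for i in selected) / len(selected))
        selected.append(best)
    return selected

# demo: feature 1 is relevant but redundant with feature 0, so the
# second pick falls to the weaker but independent feature 2
relevance = [1.0, 0.9, 0.2]
redundancy = [[0.0, 0.9, 0.0],
              [0.9, 0.0, 0.0],
              [0.0, 0.0, 0.0]]
chosen = mrmr_select(relevance, redundancy, n_select=2)
```

The demo makes the ''m best features are not the best m features'' point concrete: ranking by relevance alone would have picked features 0 and 1.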
4.2. Wrapper method based on NSGA-II
Genetic algorithms have been used intensively for feature selection to solve this combinatorial problem and to provide efficient exploration of the solution space. GAs work with a set of candidate solutions called a population. Based on the Darwinian principle of 'survival of the fittest', GAs obtain the optimal solution after a series of iterative computations. GAs generate successive populations of alternative solutions, each represented by a chromosome, i.e. a solution to the problem, until acceptable results are obtained. Combining exploitation and exploration in their search, GAs can deal with large search spaces efficiently, and hence have less chance of settling on a locally optimal solution than other algorithms. GAs were specifically developed for this task, and GA-based solutions were more efficient than classical feature selection methods, as confirmed in (Oh, Lee, & Moon, 2004; Tan, Fu, Zhang, & Bourgeois, 2008; Yang & Honavar, 1998; Zhu & Guan, 2004). These studies mainly used single-objective GAs, which provide one optimal solution with the maximum classification performance.
The feature selection problem can be defined as a multi-objective problem dealing with two competing objectives. Consequently, an optimal feature set has to consist of a minimal number of features and has to produce the minimum classification error.
The non-dominated sorting genetic algorithm (NSGA) was suggested by Goldberg and implemented by Srinivas and Deb (1994). Although NSGA has proved effective for multi-objective optimization problems, it has drawbacks such as the high computational complexity of non-dominated sorting, lack of elitism, and the need to specify the sharing parameter \sigma_{share}. Aiming at such problems, Deb introduced NSGA-II (Deb, Pratap, Agarwal, & Meyarivan, 2002) as an improved method which overcame the original NSGA defects by alleviating the computational complexity, introducing an elitist-preserving mechanism and employing a crowded comparison operator. In this work, this new multi-objective optimization technique was employed to solve the feature selection problem with a wrapper. A more detailed description and implementation of NSGA-II can be found in (Deb et al., 2002).
For wrapper feature selection approach, there are several
factors for controlling the process of NSGA-II while searching the
sub-optimal feature subsets for classifiers. To apply NSGA-II to
feature selection, we focus on the following issues.
4.2.1. Fitness functions
Two competing objectives were defined as the fitness functions:
the first was minimization of the number of used features and the
second was minimization of the classification error. Four popular classifiers, namely the K-nearest neighbor classifier (KNNC) (Grother, Candela, & Blue, 1997), the nearest mean classifier (NMC) (Veenman & Tax, 2005), the linear discriminant classifier (LDC) (Du & Chang, 2000) and the least-squares support vector machine (LS-SVM) (Suykens & Vandewalle, 1999), were employed as induction algorithms to implement and evaluate the proposed feature selection approach.
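The two objectives can be sketched as follows; a from-scratch 1-nearest-neighbour classifier stands in here for the four induction algorithms above (an assumption for illustration only):

```python
import numpy as np

def fitness(chrom, X_train, y_train, X_test, y_test):
    """The two wrapper objectives for a binary chromosome: (number of
    selected features, classification error). A 1-nearest-neighbour
    classifier is used as a simple stand-in induction algorithm."""
    idx = np.flatnonzero(chrom)
    if idx.size == 0:
        return (0, 1.0)                       # empty subset: worst error
    Xtr, Xte = X_train[:, idx], X_test[:, idx]
    # squared Euclidean distances between every test and training sample
    d = ((Xte[:, None, :] - Xtr[None, :, :]) ** 2).sum(axis=2)
    pred = y_train[np.argmin(d, axis=1)]
    return (int(idx.size), float(np.mean(pred != y_test)))

# demo: selecting only the informative first column (second column is
# constant) classifies the toy data perfectly with a single feature
X_train = np.array([[0.0, 5.0], [1.0, 5.0]])
y_train = np.array([0, 1])
X_test = np.array([[0.1, 5.0], [0.9, 5.0]])
y_test = np.array([0, 1])
n_feat, err = fitness([1, 0], X_train, y_train, X_test, y_test)
```

NSGA-II then searches for chromosomes that are Pareto-optimal with respect to these two returned values.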
4.2.2. Encoding scheme
A binary coding system was used to represent the chromosome in this investigation. For a chromosome representing a feature subset, a bit with value '1' indicates that the corresponding feature is selected, and '0' indicates that it is not.
4.2.3. Genetic operators
The genetic operators consist of three basic operators, i.e., selection, crossover and mutation. Binary tournament selection, which can obtain better results than proportional and genitor selection, was adopted to select the individuals of the next generation. The crossover technique used was uniform crossover, which exchanges the genetic material of the two selected parents uniformly at several points. The mutation operator used in this work was the conventional mutation operator, operating on each bit separately and randomly changing its value.
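The three operators can be sketched for binary chromosomes as follows (a minimal illustration; NSGA-II's non-dominated sorting and crowding-distance machinery are omitted):

```python
import random

def binary_tournament(pop, fitness):
    """Pick two random individuals; the fitter survives
    (here, higher fitness is assumed better)."""
    a, b = random.sample(range(len(pop)), 2)
    return pop[a] if fitness[a] >= fitness[b] else pop[b]

def uniform_crossover(p1, p2):
    """Exchange genetic material of the two parents uniformly:
    each position independently comes from either parent."""
    mask = [random.random() < 0.5 for _ in p1]
    c1 = [x if m else y for x, y, m in zip(p1, p2, mask)]
    c2 = [y if m else x for x, y, m in zip(p1, p2, mask)]
    return c1, c2

def bitflip_mutation(chrom, rate):
    """Flip each bit independently with probability `rate`."""
    return [1 - b if random.random() < rate else b for b in chrom]

# demo: with complementary parents, the children stay complementary
random.seed(0)
p1, p2 = [1, 1, 1, 1], [0, 0, 0, 0]
c1, c2 = uniform_crossover(p1, p2)
```

A mutation rate of 1/N (as in Section 5) flips, on average, one bit per chromosome per generation.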
5. Results and discussion
5.1. Feature extraction
For all 160 samples described in Table 1, 160 time–frequency matrices were obtained by employing the S transform described in Section 3. Fig. 4 displays the time–frequency distributions of vibration signals from the eight states of the gearbox obtained by the S transform. The resolution of the time–frequency representations calculated by the S transform is very satisfactory, and the representations of the gearbox in different states are visibly different. Consequently, we can expect the S transform to be very useful for classifying the vibration signals.
However, it is impossible to use the original time–frequency matrices directly for classification because of their high dimension. In this work, the dimension of each time–frequency matrix is 1024 × 2048, so the feature vector would have 2,097,152 dimensions if every matrix were regarded as an input vector. It is not acceptable for any pattern recognition system to deal with such high-dimensional input vectors. Thus it is very desirable to reduce the data dimension to an acceptable scale while preserving as much of the information in the matrices as possible.
A new subspace decomposition technique, NMF, is employed to reduce the dimension of the time–frequency matrices. With the S transform, 160 time–frequency matrices can be obtained. All these matrices are standardized and normalized to fulfill the non-negativity constraints of NMF. Forty samples, five samples for each
state, were selected as training samples. All the matrices are trans-
formed to vectors firstly and a training matrix can be formed for
NMF. We first apply the NMF to the training samples to extract
non-negative basis vectors W and associated encoding variables
H . Feature components can be computed from these non-negative
basis vectors W for the other samples. The parameter r, which is the reduced dimension, was chosen to be 100 in this paper, and the iterative step size was set to 50. These parameters were chosen based on preliminary experiments. By computing the feature components with the extracted basis vectors, 100 parameters are obtained as features for every sample. The feature dimension was
reduced from 2,097,152 to 100 by the NMF technique.
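The paper computes feature components for the remaining samples from the learned basis W but does not spell out the projection step. One plausible sketch (our assumption, not the authors' method) holds W fixed and iterates only the H update of Eq. (11):

```python
import numpy as np

def nmf_encode(W, v, n_iter=500):
    """Encode a new non-negative sample v against a fixed, learned basis
    W (columns summing to one, per Eq. (10)) by iterating only the H
    update of Eq. (11). The paper does not spell out how test samples
    are projected; this is one plausible reading."""
    eps = 1e-9
    h = np.full(W.shape[1], 1.0 / W.shape[1])  # uniform initial encoding
    for _ in range(n_iter):
        h *= W.T @ (v / (W @ h + eps))
    return h

# demo: a sample built from the basis is reconstructed closely
rng = np.random.default_rng(0)
W = rng.random((20, 3))
W /= W.sum(axis=0)                             # column-normalized basis
v = W @ np.array([1.0, 2.0, 3.0])
h = nmf_encode(W, v)
rel_err = np.linalg.norm(W @ h - v) / np.linalg.norm(v)
```

With W fixed, minimizing the divergence over h alone is a convex problem, so this projection is well behaved.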
5.2. Feature selection
One hundred feature components can be obtained via the NMF
and the S transform. A feature selection procedure is needed to find the most informative feature components from the original one hundred features. According to the feature selection approach
based on the mutual information and NSGA-II as described in
Section 4, the most discriminative feature vector can be acquired.
We first partitioned the 160 collected samples into two parts: a training dataset and a testing dataset (10 samples of each state for training and the other 10 samples for testing). This random segmentation was repeated 20 times to obtain more robust evaluation results. The sequential feature subsets S_1 ⊂ S_2 ⊂ ⋯ ⊂ S_N were obtained from the training dataset by using the mRMR criterion and the greedy search described in Section 4. Then the performances
of these sequential feature subsets were evaluated by the testing
dataset. Four classifiers, as described in Section 4, were adopted
to assess the proposed scheme. Fig. 5 has given out the average
performances of four classifiers using the sequential feature sub-
sets over the 20 randomly segmented datasets.
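The mRMR greedy forward search that produces these nested subsets can be sketched as follows. Mutual information is estimated here with a simple 2-D histogram, and the score maximised at each step is relevance minus mean redundancy, I(f; c) − (1/|S|) Σ_{s∈S} I(f; s); the function names and the bin count are illustrative assumptions, not details from the paper:

```python
import numpy as np

def mutual_info(x, y, bins=8):
    """Histogram estimate of I(X; Y) in nats for two 1-D samples."""
    c_xy, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = c_xy / c_xy.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    nz = p_xy > 0
    return float((p_xy[nz] * np.log(p_xy[nz] / (p_x @ p_y)[nz])).sum())

def mrmr(X, y, n_select):
    """Greedy mRMR forward search. Returns feature indices in the order
    they were added, so the prefixes give the nested subsets S1, S2, ..."""
    n_feat = X.shape[1]
    relevance = np.array([mutual_info(X[:, j], y) for j in range(n_feat)])
    selected = [int(np.argmax(relevance))]       # most relevant feature first
    while len(selected) < n_select:
        best, best_score = None, -np.inf
        for j in range(n_feat):
            if j in selected:
                continue
            redundancy = np.mean([mutual_info(X[:, j], X[:, k]) for k in selected])
            score = relevance[j] - redundancy    # max-relevance, min-redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected
```

On the gearbox data, X would hold the 100 NMF feature components per sample and y the eight state labels.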
Clearly, the performance of the four classifiers did not improve
steadily as the number of features increased. This observation supports
the assumption that the original feature set contains many irrelevant
and redundant features. Another observation is that the performances of
the four classifiers followed different trends, and their optimal
feature subsets differed as well. The candidate feature subset of each
classifier was selected as the smallest subset achieving the best
performance. Accordingly, the candidate feature subsets for NMC, KNNC,
LDC and LS-SVM were chosen as S50, S45, S38 and S54, respectively.
These candidate feature subsets were used for further selection using
NSGA-II.
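As one concrete example of the classifiers involved, a nearest mean classifier (NMC) reduces to a few lines: each class is represented by the mean of its training feature vectors, and a test sample is assigned to the class with the closest mean. This is a generic textbook NMC sketch, not the authors' implementation:

```python
import numpy as np

class NearestMeanClassifier:
    """Minimal nearest mean classifier (NMC) using Euclidean distance."""

    def fit(self, X, y):
        # One mean vector per class, computed over that class's rows of X.
        self.classes_ = np.unique(y)
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        # Distance from every sample to every class mean; pick the nearest.
        d = np.linalg.norm(X[:, None, :] - self.means_[None, :, :], axis=2)
        return self.classes_[np.argmin(d, axis=1)]
```

KNNC, LDC and LS-SVM play the same role in the scheme: any induction algorithm that maps a feature subset to a test-set classification rate can be plugged into the wrapper.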
Based on the candidate feature subsets, more compact feature
subsets can be acquired by using wrappers with NSGA-II. The four
classifiers were also employed as induction algorithms combined
with the NSGA-II as wrappers for feature selection.
We implemented the NSGA-II method with the following parameters:

– population size: 100.
– generation: 200.
B. Li et al. / Expert Systems with Applications 38 (2011) 10000–10009 10005
– crossover rate: 0.9.
– mutation rate: 1/N (N is the size of the candidate feature
subset).
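At the core of NSGA-II is fast non-dominated sorting of the population into successive Pareto fronts; in this wrapper the two minimised objectives would be, e.g., classification error and feature subset size. The following is a generic sketch of that sorting step under those assumptions, not the authors' code:

```python
def dominates(a, b):
    """a dominates b when a is no worse in every objective and strictly
    better in at least one (both objectives are minimised here)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated_sort(points):
    """Fast non-dominated sorting: partition the population into
    successive Pareto fronts (front 0 is the non-dominated set)."""
    n = len(points)
    dominated_by = [[] for _ in range(n)]   # indices that solution i dominates
    dom_count = [0] * n                     # how many solutions dominate i
    fronts = [[]]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if dominates(points[i], points[j]):
                dominated_by[i].append(j)
            elif dominates(points[j], points[i]):
                dom_count[i] += 1
        if dom_count[i] == 0:
            fronts[0].append(i)
    k = 0
    while fronts[k]:
        nxt = []
        for i in fronts[k]:
            for j in dominated_by[i]:
                dom_count[j] -= 1
                if dom_count[j] == 0:   # only solutions in earlier fronts dominate j
                    nxt.append(j)
        fronts.append(nxt)
        k += 1
    return fronts[:-1]
```

For feature selection, each point would be the (error rate, subset size) pair of one binary chromosome over the candidate feature subset, which is why F2 and F3 return several Pareto-optimal subsets rather than a single one.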
We again randomly partitioned the dataset into training and testing
datasets 20 times to assess the performance of the presented wrapper
methods with the four classifiers.
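The repeated stratified partitioning used throughout (10 training and 10 testing samples per state, 20 repeats) can be sketched as follows; the function and variable names are illustrative:

```python
import random

def stratified_splits(labels, n_train_per_class=10, n_repeats=20, seed=0):
    """Yield (train_idx, test_idx) pairs: for each repeat, draw
    n_train_per_class samples of every state for training and keep the
    remaining samples of that state for testing."""
    rng = random.Random(seed)
    by_class = {}
    for idx, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(idx)
    for _ in range(n_repeats):
        train, test = [], []
        for idxs in by_class.values():
            shuffled = idxs[:]
            rng.shuffle(shuffled)
            train += shuffled[:n_train_per_class]
            test += shuffled[n_train_per_class:]
        yield train, test
```

Averaging a classifier's test accuracy over the 20 yielded splits gives the curves reported in Fig. 5 and the rates in Tables 2–5.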
Fig. 5. Performances of four classifiers (SVM, NMC, LDC, KNNC) with
sequential feature subsets obtained based on mRMR criterion. (Figure:
classification rate, 0.1–1, versus number of features, 0–100.)
Fig. 4. S transforms of vibration signals in Fig. 3: (a)–(h) correspond
to states 1–8 in Table 1. (Each panel: frequency [Hz], 0–3000, versus
time [s], 0–0.6.)
Tables 2–5 display the results of the proposed feature selection scheme
with the four classifiers. For comparison, the original feature set, the
candidate feature subset and the feature subset formed by applying the
wrapper directly to the original feature set were also evaluated with
the four classifiers. For convenience, we denote the original feature
set as F0, the filtered feature subset as F1, the wrapper feature subset
as F2 and the two-stage feature subset as F3. Note that F2 and F3 each
yield multiple (Pareto-optimal) solutions to the feature selection
problem.
For the NMC algorithm, the best performance, 98.89%, is achieved by the
feature subset F2 with 33 features. With the original feature set F0,
which contains all 100 features, the classification rate is only 80.83%.
The performance with F1 is not fully satisfactory but is still superior
to F0 while using only half as many features. Although the performances
of F3 do not surpass F2, its feature subsets are much smaller: with only
10 features, F3 achieves 96.67%.
For the KNNC classifier, the highest classification rate, 97.78%, was
reached by F3 with 8 features. The performances of the feature subsets
in F2 were also competitive, but their sizes are larger than those of
F3. As with the NMC classifier, the worst performance is obtained by F0,
and F1 performs better than F0 with fewer features.
With the LDC classifier, a 100% classification rate is achieved by F2
with 23 parameters. Among the feature subsets in F3, the best
performance is 98.89%, comparable to F2, with a subset size of only 12,
smaller than F2. The performance of the original feature set F0 is very
poor with the LDC classifier, at only 38.28%. F1 gives a performance of
89%, much better than F0 but worse than F2 and F3.
When LS-SVM is employed as the classifier, the highest classification
rate, 100%, is achieved by both F2 and F3. It can also be observed that
the dimension of the corresponding subset in F3 is much smaller than in
F2. The performance of F0 remains the worst of the four feature subsets.
5.3. Discussions
(1) It can be observed that, for all classifiers, the performance
obtained with the original feature set (F0) is the worst among the
four feature subsets, even though F0 is the largest. This confirms
our assumption that the original feature set contains many
irrelevant and redundant features, which degrade the performance
and increase the computational cost of the classifiers. A feature
selection procedure is therefore necessary before classification.
(2) Comparing F1 with F0, the mRMR method achieves a better
classification rate than the original feature set with fewer
features. However, compared with F2 and F3, its performance is
poorer and its feature subsets are larger. This is because the
filter method does not involve any induction algorithm, so its
classification accuracy falls short of the wrapper methods. The
advantage of mRMR lies in its much lower computational cost
compared with the wrapper methods.
(3) The performances of F3 are promising compared with the other
methods; the classification rate of LS-SVM reached 100% with F3.
The performances of F2 are also competitive, and in some cases F2
outperforms F3 in terms of classification accuracy. The main
disadvantages of applying the wrapper to the original feature set
are the large computational cost and the larger size of the
selected feature subsets.
Table 2
Performances of NMC with different feature subsets.

Feature subset                              Solution  Feature size  Performance (%)
Original (F0)                               –         100           80.83
mRMR (F1)                                   –         50            84.83
NSGA-II (F2) (Pareto-optimal solutions)     1         23            95.56
                                            2         28            97.78
                                            3         33            98.89
mRMR + NSGA-II (F3) (Pareto-optimal         1         5             92.22
solutions)                                  2         6             93.33
                                            3         8             94.45
                                            4         10            96.67

Table 3
Performances of KNNC with different feature subsets.

Feature subset                              Solution  Feature size  Performance (%)
Original (F0)                               –         100           78.50
mRMR (F1)                                   –         45            83.06
NSGA-II (F2) (Pareto-optimal solutions)     1         25            87.78
                                            2         26            88.89
                                            3         30            94.45
                                            4         31            96.67
mRMR + NSGA-II (F3) (Pareto-optimal         1         4             85.56
solutions)                                  2         5             92.22
                                            3         6             94.45
                                            4         8             97.78

Table 4
Performances of LDC with different feature subsets.

Feature subset                              Solution  Feature size  Performance (%)
Original (F0)                               –         100           38.28
mRMR (F1)                                   –         38            89.00
NSGA-II (F2) (Pareto-optimal solutions)     1         14            91.11
                                            2         15            95.56
                                            3         16            97.78
                                            4         19            98.89
                                            5         23            100
mRMR + NSGA-II (F3) (Pareto-optimal         1         5             93.34
solutions)                                  2         6             94.45
                                            3         7             96.67
                                            4         10            97.78
                                            5         12            98.89

Table 5
Performances of LS-SVM with different feature subsets.

Feature subset                              Solution  Feature size  Performance (%)
Original (F0)                               –         100           88.28
mRMR (F1)                                   –         54            93.61
NSGA-II (F2) (Pareto-optimal solutions)     1         28            88.89
                                            2         32            97.78
                                            3         33            98.89
                                            4         35            100
mRMR + NSGA-II (F3) (Pareto-optimal         1         6             88.89
solutions)                                  2         12            92.22
                                            3         16            100
(4) For different classifiers, the classification rates obtained with
the same feature selection technique differ from one another. This
confirms the assumption that no common optimal feature subset
exists for all classifiers. It also suggests that different
classifiers with different feature subsets can potentially offer
complementary information about the patterns to be classified, so
combining different classifiers is a desirable route to better
performance. Future work may explore the capacity of classifier
ensemble schemes for intelligent fault diagnosis.
6. Conclusions
This investigation has described a feature extraction and feature
selection scheme for hybrid fault diagnosis of gearbox based on the S
transform, non-negative matrix factorization, mutual information and
NSGA-II. For feature extraction, the S transform was first adopted to
obtain high-resolution time–frequency distributions of the vibration
signals. The non-negative matrix factorization technique was then
applied to extract informative features from these time–frequency
representations.

Next, a two-stage feature selection scheme combining the filter and
wrapper methods was outlined based on mutual information and NSGA-II.
The filter stage, implemented with the mutual-information-based mRMR
criterion, produces a candidate feature subset. Based on this candidate
subset, a wrapper combining the classifiers with the multi-objective
evolutionary algorithm NSGA-II was adopted to obtain a more compact
feature subset and higher classification accuracy.
Eight different fault states were simulated on a gearbox to evaluate the
effectiveness of the proposed intelligent fault diagnosis system. To
assess the generality of the proposed feature extraction and selection
methods, four different classifiers were employed in this
investigation. Moreover, several other feature selection schemes were
implemented and compared with the proposed approach. Experimental
results have shown that the proposed feature extraction and feature
selection scheme gives very promising performance with a very small
feature subset dimension.

This research demonstrates clearly that the presented intelligent fault
diagnosis system has great potential as an effective and efficient tool
for the fault diagnosis of gearbox and can easily be extended to other
rotating machinery.
Acknowledgments
This research is supported by the National Natural Science
Foundation of China (No. 50705097) and Natural Science Founda-
tion of Hubei Province (No. E2007001048).