1-s2.0-S0263224112002400-main

36
Accepted Manuscript Diagnosis of artificially created surface damage levels of planet gear teeth using ordinal ranking Xiaomin Zhao, Ming J. Zuo, Zhiliang Liu, Mohammad R. Hoseini PII: S0263-2241(12)00240-0 DOI: http://dx.doi.org/10.1016/j.measurement.2012.05.031 Reference: MEASUR 1955 To appear in: Measurement Received Date: 8 August 2011 Revised Date: 6 May 2012 Accepted Date: 30 May 2012 Please cite this article as: X. Zhao, M.J. Zuo, Z. Liu, M.R. Hoseini, Diagnosis of artificially created surface damage levels of planet gear teeth using ordinal ranking, Measurement (2012), doi: http://dx.doi.org/10.1016/j.measurement. 2012.05.031 This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Transcript of 1-s2.0-S0263224112002400-main

Page 1: 1-s2.0-S0263224112002400-main

Accepted Manuscript

Diagnosis of artificially created surface damage levels of planet gear teeth using

ordinal ranking

Xiaomin Zhao, Ming J. Zuo, Zhiliang Liu, Mohammad R. Hoseini

PII: S0263-2241(12)00240-0

DOI: http://dx.doi.org/10.1016/j.measurement.2012.05.031

Reference: MEASUR 1955

To appear in: Measurement

Received Date: 8 August 2011

Revised Date: 6 May 2012

Accepted Date: 30 May 2012

Please cite this article as: X. Zhao, M.J. Zuo, Z. Liu, M.R. Hoseini, Diagnosis of artificially created surface damage

levels of planet gear teeth using ordinal ranking, Measurement (2012), doi: http://dx.doi.org/10.1016/j.measurement.

2012.05.031

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers

we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and

review of the resulting proof before it is published in its final form. Please note that during the production process

errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Page 2: 1-s2.0-S0263224112002400-main

1

Diagnosis of artificially created surface damage levels of planet gear teeth using ordinal ranking

Xiaomin Zhao1, Ming J Zuo1*, Zhiliang Liu1,2 and Mohammad R. Hoseini1

1Department of Mechanical Engineering University of Alberta, Edmonton, Canada

[email protected]

2School of Automation Engineering University of Electronic Science and Technology

Chengdu, China

Effective diagnosis of damage levels is important for condition based preventive

maintenance of gearboxes. One special characteristic of damage levels is the inherent

ordinal information among different levels. Retaining the ordinal information is therefore

important for diagnosing damage levels. Classification, a machine learning technique, has

been widely adopted for automated diagnosis of gear faults. However, classification cannot

keep the ordinal information because the damage levels are treated as nominal variables.

This paper employs ordinal ranking, another machine learning technique, to preserve the

ordinal information in automated diagnosis of damage levels. As to ordinal ranking,

feature selection is important. However, most existing feature selection methods are for

classification, which are not suitable for ordinal ranking. This paper designs a feature

selection method for ordinal ranking based on correlation coefficients. A diagnosis

approach based on ordinal ranking and the proposed feature selection method is then

introduced. This method is tested on diagnosis of artificially created surface damage levels

of planet gear teeth in a planetary gearbox. Experimental results show the effectiveness of

the proposed diagnosis approach. The advantages of using ordinal ranking for diagnosing

gear damage levels are also demonstrated.

Keywords- ordinal ranking; damage level; planetary gearbox; correlation coefficient

Page 3: 1-s2.0-S0263224112002400-main

2

1. Introduction

Gearboxes are widely employed in automotive, aerospace, and various other industries to

provide speed and torque conversions. Effective diagnosis of gear faults is critical to the

reliable operation of a mechanical system. Majority of the reported investigations on gear

fault diagnosis focus on the detection of the presence of a fault [1] and identification of the

fault modes, such as pitting [1], crack [2] and tooth breakage [2]. The fault propagation is,

however, not studied as much.

Information on fault propagation reveals the severity of a fault, so it is very helpful in

scheduling preventive maintenance or other actions to avoid a failure. Feng et al. [3]

reported that the regularization dimension of gearbox vibration signals increased

monotonically with the gear fault severity. Ozturk et al. [4] stated that the scalogram,

especially its mean frequency variation, provided early indications of the presence and

progression of gear pitting. These methods find an indicator that monotonically varies with

fault levels. The fault level can then be estimated by checking the value of this indicator.

However, such methods need expertise from a diagnostician to apply them successfully and

cannot distinguish fault levels automatically. To alleviate this problem, researchers have

used classifiers to automatically classify different fault levels [5, 6]. For example, Lei et al.

[6] proposed a weighted KNN classification method for gear crack level identification. In

these methods, the fault level is regarded as a nominal variable, and the problem of

diagnosing fault levels is treated as a classification problem. As a result, the ordinal

information among fault levels is ignored [7]. For example, a moderate fault is worse than

(<) a slight fault but is better than (>) a severe fault; however, in classification, the severe,

Page 4: 1-s2.0-S0263224112002400-main

3

moderate and slight faults are parallel to each other and cannot be compared using “>” and

“<” operations. Having ordinal information is the main characteristic of fault levels, which

makes the diagnosis of fault levels more complicated than the diagnosis of the fault modes.

How to utilize and retain the ordinal information is therefore an important issue.

Planetary gearbox is a type of gearbox whose planet gears are mounted on a carrier which

itself may rotate relative to the sun gear. It has many advantages over the traditional (fixed-

shaft) gearbox, e.g. high power output, small volume, multiple kinematic combinations, etc.

[8]. Nevertheless, planetary gearboxes are structurally more complicated and possess

several unique behaviours that are not found in fixed-shaft gearboxes. For instance, gear

meshing frequencies of planetary gearboxes are often completely suppressed, and sidebands

are not as symmetric as those of fixed-shaft gearboxes [8]. Barszca and Randall [9] found

that the conventional analysis methods did not detect a tooth crack in a planet gear. They

then proposed a diagnosis method based on spectral kurtosis and achieved good results.

Bartelmus and Zimroz [10] found that a planetary gearbox with a fault was more

susceptible to load than a healthy gearbox. Based on this observation, they proposed a new

feature for monitoring a planetary gearbox in non-stationary operating conditions in [11, 12].

The objective of this paper is to develop an intelligent method for diagnosing damage levels

of planet gears in a planetary gearbox. In order to reserve the ordinal information in damage

levels, ordinal ranking will be employed. Ordinal ranking [13, 14] (also called ordinal

classification [15] and ordinal regression [16]) is a supervised machine learning technique

that uses an ordinal variable as its label. Details on ordinal ranking will be introduced in

Section 2. Feature selection is a necessary procedure for ordinal ranking, particularly

Page 5: 1-s2.0-S0263224112002400-main

4

because it can enhance accuracy and improve the efficiency of training [17]. Most existing

feature selection methods are actually developed for classification problems, which makes

them work less effectively in ordinal ranking problems. For example, Mukras et al. [18]

found that the standard feature selection method using information gain failed to identify

discriminatory features in an ordinal ranking problem. The reported work on feature

selection dedicated to ordinal ranking is very limited. Baccianella et al. [19] proposed

feature selection methods based on information gain for ranking in text-related applications.

Geng et al. [17] proposed a feature selection method based on mean average precision

(MAP) and Kendall‟s tau correlation coefficient for ranking in information retrieval. In this

paper, a feature selection method based on correlation coefficients will be designed.

The rest of the paper is organized as follows. Section 2 introduces the concept of ordinal

ranking and describes a reported ordinal ranking algorithm. Section 3 proposes a feature

selection method for ordinal ranking. Section 4 describes the planetary gearbox test rig for

experimental data collection. Section 5 explains the feature calculation and extraction for

the planetary gearbox. Section 6 presents an approach for diagnosis of damage levels based

on ordinal ranking. Section 7 tests the diagnosis approach with experimental data and

discusses the results. Finally, conclusion comes in Section 8.

2. Ordinal ranking

2.1. Review on ordinal ranking

Steven [20] divided the scales of measurement into four types: nominal, ordinal, interval,

and ratio. The nominal scale is for variables that have two or more categories but do not

have an intrinsic order, e.g. types of fruits. When only two categories are involved, the

Page 6: 1-s2.0-S0263224112002400-main

5

variable is called binary, e.g. gender. The ordinal scale is rank-ordered but does not have

metric information, e.g. grades of students (A+, A, A-, B+,…, F). The interval scale and the

ratio scale are for continuous variables which have metric information. In supervised

machine learning, a training set in the form of T=[X, y] is given, where X is the input set

(the feature set) and y is the output set (the label). According to the label‟s measurement

scale, supervised machine learning problems can be grouped into three categories: if the

label is a nominal variable, the problem is called classification; if the label is a continuous

variable, the problem is called regression; if the label is an ordinal variable, the problem is

called ordinal ranking [13]. In this paper, we call the labels of ordinal ranking, ranks.

Ordinal ranking is similar to classification in the sense that the rank y is a finite set.

Nevertheless, besides representing the nominal variables as classification labels, ranks of

ordinal ranking also carry ordinal information. That is, two ranks in y can be compared by

the “<” (better) or “>” (worse) operation. Ordinal ranking is also similar to regression, in

the sense that ordinal information is similarly contained in y. However, unlike the real-

valued regression labels; the discrete ranks do not carry metric information. That is, we can

say rank A is better than rank B”, but it is hard to say quantitatively how much better rank A

is.

A commonly used idea to conduct ordinal ranking is to transform the ranking problem to a

set of binary classification problems, or to add additional constraints to traditional

classification formations. Herbrich et al. [16] proposed a loss function between pairs of

ranks, and then applied the principle of structure risk minimization to solve the ordinal

ranking problem. However, because there are O (N2) pairwise comparisons out of N training

Page 7: 1-s2.0-S0263224112002400-main

6

samples, the computation complexity is high when N is large. Crammer and Singer [21]

generalized the online perception algorithm with multiple thresholds to predict r ranks. The

feature space was divided into r parallel equally-ranked regions, where each region stood

for one rank. With this approach, the loss function was calculated pointwisely and the

quadratic expansion problem was avoided. Following the same idea, Shashua and Levin [14]

generalized support vector machine (SVM) into ordinal ranking by finding r-1 thresholds

that divided the real line into r consecutive intervals for the r ranks. Chu and Keerthi [22]

improved the approach in [14] and proposed two new approaches by imposing the ordinal

inequality constraints on the thresholds explicitly in the first approach and implicitly in the

second one. Among the above methods, SVM-based methods have shown great promise in

ordinal ranking. The algorithm proposed in [22] (called support vector ordinal regression

(SVOR)) implicitly adding constraints to SVM is straightforward and easy to interpret, and

therefore will be adopted in this paper.

2.2.A reported ordinal ranking algorithm [22]

The concept of the ordinal ranking algorithm (SVOR) [22] is briefed as follows. First, the

original feature space (X) is mapped into the high dimensional feature space ( ( )) x . In this

feature space ( ( )) x , an optimal projection direction (w) and r-1 thresholds (b) which

define r-1 parallel discriminant hyperplanes for the r ranks correspondingly were found, as

illustrated in Fig. 1. The sample x satisfying 1 ( )i ib b w x is assigned rank i. The

ranking model is thus

1(rank) , ( )i iz i if b b w x . (1)

Page 8: 1-s2.0-S0263224112002400-main

7

b1 bib2... br-1

rank 1 rank 2

bi-1 bi+1 ...

rank i rank i+1 rank r

( ) ( )f wx x

Fig. 1 Schematic drawing of the ordinal ranking algorithm (SVOR)

For a threshold bj, the function values of samples from all the lower ranks should be less

than the lower margin bj-1. If sample k

ix violates this requirement, then

( ) ( 1)j k

ki i jb w x (2)

is taken as the error associated with the sample k

ix for bj, where k is the rank of k

ix and

k j . Similarly, the function values of samples for the upper rank should be greater than

the upper margin bj+1. Otherwise,

* ( 1) ( )j k

ki j ib w x (3)

is the error associated with the sample k

ix for bj, where k j . Considering the error terms

associated with all r-1 thresholds, the primal problem to find the optimal w and thresholds

b are defined as follows:

*

1*

, , ,1 1 1 1 1

* *

1min ( )

2

. . ( ) 1 , 0,

1, 2,..., ; 1, 2,...,

( ) 1 , 0,

1, 2,..., ; 1, 2,...,

k kj jr n nT j j

ki ki

j k i k i

k j j

i j ki ki

k

k j j

i j ki ki

k

C

s t b

for k j i n

b

for k j j r i n

w

w w

w

w

b

x

x

(4)

Page 9: 1-s2.0-S0263224112002400-main

8

where j runs over 1, 2, …, r-1 and nk is the number of samples with the rank k. By solving

this optimization problem, the optimal w and b will be found, and thus the ranking model

(Eq. (1)) will be built.

3. Feature selection method for ordinal ranking

In fault diagnosis, the number of features is usually large because many features can be

extracted to capture fault information utilizing signal processing technology. The large

number of features may reduce the performance of ordinal ranking [17]. Thus feature

selection is needed. Feature selection can be achieved by finding a feature subset that has

maximum relevance to the rank (label) and minimum redundancy among the features

themselves [17, 23, 24]. Measures are needed to evaluate the relevance between a feature

and the label (i.e. feature-label relevance) and redundancy among features (i.e. feature-

feature redundancy), respectively. However, most existing measures are proposed for

classification. The measures of feature-feature redundancy can work in an ordinal ranking

problem if the types of features the same as in a classification problem. Nevertheless, the

measures of feature-label relevance in a classification problem is not suitable for an ordinal

ranking problem [17, 18], because the label of ordinal ranking is an ordinal variable

whereas the label of classification is a nominal variable, as stated in Section 2.1. In this

paper, correlation coefficients are used for these measures, because they are conceptually

simple and practically effective.

Page 10: 1-s2.0-S0263224112002400-main

9

3.1. Correlation coefficients

Correlation coefficients evaluate the relevance between two variables. Depending upon the

types of variables, several correlation coefficients are defined, some of which are listed in

Table 1 [25, 26].

Table 1 List of correlation coefficients [25, 26]

Types of variables Nominal (binary) Ordinal Continuous Nominal (binary) Phi Rank-biserial Point-biserial Ordinal Rank-biserial Polychoric, Spearman rank,

Kendal rank Polyserial

Continuous Point-biserial Polyserial Pearson

In fault diagnosis of machinery, features are usually continuous variables (e.g. kurtosis) [6,

27] and/or nominal variables (e.g. the status of a valve (on/off)) [27]. In a classification

problem (the label is nominal), the feature-label relevance can be evaluated using the Point-

biserial correlation coefficient (if the feature is continuous) and the Phi correlation

coefficient (if the feature is nominal). The feature-feature redundancy can be evaluated

using the Point-biserial correlation coefficient (if one feature is continuous and the other is

nominal) and the Pearson correlation coefficient (if both features are continuous). The

Point-biserial and the Phi correlation coefficients are mathematically equal to the Pearson

correlation coefficient. That is why the Pearson correlation coefficient is said to be widely

used in classification problems [23, 28].

In an ordinal ranking problem, the feature-label relevance can be evaluated by the Polyserial

correlation coefficient (if the feature is continuous), the Rank-biserial correlation coefficient

(if the feature is nominal) and the Polychoric, Spearman rank or Kendal rank correlation

coefficient (if the feature is ordinal). The feature-feature redundancy can be evaluated using

Page 11: 1-s2.0-S0263224112002400-main

10

the Pearson correlation coefficient (if both features are continuous), the Point-biserial

correlation coefficient (if one feature is continuous and the other is nominal), and the Rank-

biserial correlation coefficient (if one feature is ordinal and the other is nominal).

In planetary gearbox experiments, the features (will be shown in Section 5) calculated are

continuous variables. So the Polyserial correlation coefficient and the Pearson correlation

coefficient are used as measures of feature-label relevance and feature-feature redundancy,

respectively in this paper. The mathematical explanations of the Pearson and the Polyserial

correlation coefficients are described in the following.

The Pearson correlation coefficient, xy , is defined in Eq.(5), where x and y are two

continuous variables.

1

2 2

1 1

( (cov( , )

( (

n

i i

ixy

n nx y

i i

i i

x y

x y

) )

) )

x yx y

x y

(5)

In some cases, the continuous variable (y) could only be measured in rough categories like

low, medium, high, etc, using an ordinal variable z (shown as Eq. (6)). Here z1, z2, …, zr

( 2)r are known increasing ranks (i.e. zi< zi+1) and b= 0 1( , ,..., )rb b b is a vector of known

thresholds with 0b and rb .

1 , 1,2,...,i j jz z if b y b j r (6)

The Polyserial correlation coefficient between observed data (x and z) is actually an

estimation of the Pearson correlation coefficient between x and y. Without loss of generality,

Page 12: 1-s2.0-S0263224112002400-main

11

we assume x ~ ( , )x xN , y ~ (0,1)N , and the joint distribution of x and y follows the

bivariate normal distribution,

22

2

22

2 )( )

1( , ) exp[

2(1 )2 1-

xxyx

xx

xyx xy

xy yxy

p x y

. (7)

In Eq. (7), xy is called the Polyserial correlation coefficient between x and z. The value of

xy can be obtained by maximum-likelihood estimator or two-step estimator [29]. For

details, refer to Ref. [29].

3.2. The proposed feature selection method for ordinal ranking

The value of correlation coefficient varies from -1 to 1. A correlation coefficient of 1 (-1)

means that the two variables are perfectly (perfectly inversely) correlated, and 0 means that

the two variables are not correlated. If the absolute value of the correlation coefficient

between the two variables is closer to 0, then the two variables are less correlated.

A feature with a larger absolute value of the Polyserial correlation coefficient has more

information on the rank, whereas a feature with a smaller absolute value contributes less

information on the rank. Similarly, two features with larger absolute value of the Pearson

correlation coefficient share more redundant information. The proposed feature selection

method follows the maximum relevance and minimum redundancy feature selection scheme.

It selects a feature subset, V, based on Eq. (8) where I( ix , z) is the relevance between

feature ix and rank z, ( , )i jI x x is the redundancy between two features ix and jx , and t1

and t2 are two thresholds. t1 is selected to avoid features with little or negative information

being included in V. t2 is selected to ensure that the redundancy between any two arbitrary

Page 13: 1-s2.0-S0263224112002400-main

12

features is below a certain level. The values of t1 and t2 are determined by specific

application problems.

1

2

max ( , ) ( , )

s.t. ( , )

( , ) , , ,

i

i

i

i j i j

D I

I t

I t i j

x

V z x z

x z

x x x x V

(8)

A sequential forward search strategy can be used to find the solution to Eq. (8). The

proposed feature selection method is described in Table 2. The raw feature set is denoted by

matrix 1 2[ , , ..., ]m n m X x x x (m is the total number of features and n is the total number of

samples). Vector z 1[ ]'i nz represents the ranks for the n samples.

Table 2 The proposed feature selection method for ordinal ranking

Input: T=1 2 ( 1)[ , , ..., , ]m n m x x x z // input data set

t1 , t2 (1 t1,t2 0) // thresholds Output: V // the selected feature subset

Step 1. Set V= . Step 2. Calculate the relevance vector, p 1[ ]j mp , whose element, pj, is the absolute

value of the Polyserial correlation coefficient between the jth feature (i.e. jx ) and the

rank vector (z). j=1, 2, …, m. Calculate the redundancy matrix [ ]ij m ms S , whose element, sij,, is the absolute

value of the Pearson correlation coefficient between the ith feature ( ix ) and the jth

features ( jx ). i, j=1, 2, …, m.

Step 3. Find the largest element in p, i.e. pr =max (p), then put the corresponding feature

rx into V (i.e. V=[V rx ]) and set pr=0.

Step 4. Find the features whose redundancy with feature rx are not smaller than t2,

i.e. 2 | hrh s t , then set their relevance values to zero (i.e. ph=0) so that these features won‟t be selected in future steps.

Step 5. Check elements in p. If pj t1 for all j=1, 2, …, m, then go to Step 6; otherwise, go to Step 3.

Step 6. Return V.

Page 14: 1-s2.0-S0263224112002400-main

13

Note that in Table 2, the Polyserial and the Pearson correlation coefficient are used for

evaluation of feature-label relevance and feature-feature redundancy respectively, because

the features involved in our experiments (will explained in Section 5) are continuous. In

other applications, proper type of correlation coefficient can be chosen according to the

types of features (Table 1) if the features are not continuous.

4. Experimental data collection

The planetary gearbox test rig [30] shown in Fig. 2 was utilized to collect vibration data for

developing a reliable fault diagnosis system. It includes a 20 HP drive motor, a bevel

gearbox, two planetary gearboxes, two speed-up gearboxes and a 40 HP load motor. The

load was applied through the drive motor. A torque sensor was installed at the output shaft

of the second stage planetary gearbox. The transmission ratio of each gearbox is listed in

Table 3. Fig. 3 shows the schematic diagram of the two planetary gearboxes. Four

accelerometers, namely LS1, HS1, LS2 and HS2, are located on the housing of the

planetary gearboxes. LS1 and LS2 are two identical low-sensitivity accelerometers. HS1

and HS2 are two identical high-sensitivity accelerometers.

Page 15: 1-s2.0-S0263224112002400-main

14

Table 3 Specification of the planetary gearbox test rig

LS1 LS2HS1 HS2

1st stage housing 2nd stage housing

2nd stage planet gear

2nd stage sun gear

1st stage planet gear

1st stage sun gear

1st stage carrier

1st stage ring gear 2nd stage ring gear

2nd stage carrier

Shaft #1 Shaft #2 Shaft #3

1st stage planet gear

1st stage sun gear

1st stage carrier

1st stage ring gear

2nd

stage planet gear

2nd

stage sun gear

2nd

stage carrier

2nd

stage ring gear

2nd

stage housing1st stage housing

Shaft #3Shaft #2Shaft #1

LS1 HS1 HS2LS2

Fig. 2 Schematic diagram of the two-stage planetary gearbox

We artificially created the gear tooth surface damage at several severity levels following

Refs. [4] and [31]. Circular holes with the diameter of 3 mm and the depth of 0.1 mm were

created on the teeth of one of the four planet gears in the second stage planetary gearbox.

Gearbox No. of Teeth Ratio Bevel Input gear 18 4.000

Output gear 72 The 1st stage planetary Sun gear 28 6.429

Three planet gears 62 Ring gear 152

The 2nd stage planetary Sun gear 19 5.263 Four planet gears 31 Ring gear 81

The 1st stage speed-up Input gear 72 0.133 Middle gear 1 32 Middle gear 2 80 Output gear 24

The 2nd stage speed-up Input gear 48 0.141 Middle gear 1 18 Middle gear 2 64 Output gear 24

Page 16: 1-s2.0-S0263224112002400-main

15

The number of holes and the number of teeth with holes were varied to mimic the slight,

moderate and severe damage,as below. Fig. 4 shows the four gears having different

damage levels. Details on damage creation are reported in [30]. For slight damage, 3 holes

on one tooth and 1 hole on each of the two neighbouring teeth were created. The damage

area accounted for 2.65%, 7.95%, and 2.65% of the surface of each of these three teeth,

respectively. For moderate damage, 10 holes on one tooth, 3 holes on each of the two

immediate neighbouring teeth, and 1 hole on each of the next neighbouring teeth on

symmetric sides were created. The damage areas of these five teeth were 2.65%, 7.95%,

26.5%, 7.95% and 2.65% of the tooth surface, respectively. For severe damage, 24 holes

on one tooth, 10 holes on each of the two immediate neighbouring teeth, and 3 holes on

each of the next neighbouring teeth on symmetric sides were created. The damage areas of

these five teeth were 7.95%, 26.5%, 63.6%, 26.5% and 7.95% of the tooth surface,

respectively.

For each gear, vibration data with the length of 10 minutes were collected from four

accelerometers with a sampling frequency 10 KHz at each combination of the following

conditions: four drive motor speed conditions (i.e. 300, 600, 900 and 1200 revolution per

minute, RPM) and two load conditions (i.e. low load (191.9 ~ 643.6 [N-m]) and high load

(812.9 ~ 1455.2 [N-m]). At the low load condition, the load motor was off, but there were

frictions in the two speedup gearboxes and the rotor in the load motor was also rotating.

According to the readings of the torque sensor, at this low load condition, the load that was

applied at the output shaft of the planetary gearboxes ranged from 191.9 [N-m] to 643.6

[N-m]. The high load condition was selected based on the gear materials and stress

calculation. We adjusted the loading applied by the load motor to reach an average of 1130

Page 17: 1-s2.0-S0263224112002400-main

16

[N-m] as displayed by the torque sensor, so that the system would run with a comfortable

safe margin. The actual readings of the torque sensor fluctuated from 812.9 [N-m] to

1455.2 [N-m]. The 10-minute data were evenly split into 20 samples, so 160 samples (i.e.

20 samples 2 (loads) 4 (speeds)) were collected for each gear. Totally 640 samples (i.e.

160 samples 4 (damage levels)) are available.

a) No damage b) Slight damage

c) Moderate damage d) Severe damage

Fig. 4 Planet gears with artificially created damage at different severity levels

5. Feature calculation and extraction for planetary gearboxes

The traditional techniques for vibration-based gear fault diagnosis are typically based on

statistical measurements of the collected vibration signals [32, 33]. Many statistical features

have been proposed and studied for fixed-shaft gearboxes; however, some are not suitable

for planetary gearboxes.

Page 18: 1-s2.0-S0263224112002400-main

17

In a fixed-shaft gearbox, damage to an individual gear tooth modulates with the vibration on

the housing at the shaft frequency. In the frequency domain, the damage appears in the form

of symmetric sidebands around the gear meshing frequency which is the dominant

component. In a planetary gearbox, the dominant frequency component usually does not

appear at the gear meshing frequency because the planet gears are usually not in phase. In

fact, the gear mesh frequencies are often completely suppressed, and sidebands are not

symmetric around the gear meshing frequency any more [33]. For description convenience,

,m nf is used to denote the frequency at ( ) crf m Z n f , where Zr is the number of ring gear

teeth, cf is the carrier frequency, m (m>0) and n are integers. In an ideal planetary gearbox,

only frequency components that appear at sidebands where r pm Z n kN (Np is the number

of planets) survive in a vibration signal. Keller and Grabill [33] referred to the surviving

sidebands with two different names: dominant sideband and apparent sideband. For each

group of sidebands with the same value of m, there is one dominant sideband (donated

bydm,nRMC ) which is the one closest to the mth harmonic of gear meshing frequency. Other

surviving sidebands in this group are called apparent sidebands (donated by ,m nRMC ). Let

sRMC denote the shaft frequency and its harmonics, +1dm,nRMC denote the first-order

sideband of dm,nRMC . The regular (reg), difference (d), residual (r) and envelope (e) signals

for a planetary gearbox are then defined in Eq. (9) [33]. In Eq. (9), x is a vibration signal in

the time-waveform, F-1 is the inverse Fourier transform, bp is the signal band-pass filtered

about the dominant sideband of the gear meshing frequency ( 1 d,nRMC ) and H(bp) is the

Hilbert transform of bp.

Page 19: 1-s2.0-S0263224112002400-main

18

1, , , 1

1, ,

[ ]

[ ]

= | [ ( )] |

d d

d

sm n m n m n

sm n m n

F RMC RMC RMC RMC

F RMC RMC RMC

iH

reg

d x reg

r x

e bp bp

(9)

With signals reg, d, r and e defined, features can now be calculated for planetary gearboxes.

Sixty-three features are extracted from each vibration signal: 18 features are from the time-

domain, 30 features are from the frequency-domain, and 15 features are specifically

designed for gear fault diagnosis. Table 4 lists the definitions of these features.

Table 4 List of feature names and definitions [33, 34]

# Feature Name Definition # Feature Name

Definition

Time-domain features F1 maximum _ max max( )x x F2 minimum _ min min( )x x

F3 average absolute 1

1_

N

i

i

x abs xN

F4 peak to peak _ _ max- _ minx p x x

F5 mean 1

1=

N

i

i

xN

x F6 RMS 2

1

1_

N

i

i

x rms xN

F7 delta RMS

1_ _ _j jx drms x rms x rms

where j is the current segment of time record and j-1 is the previous segment.

F8 variance 2 2

1

1_ ( )

N

i

i

x xN

x

F9 standard deviation

2_ _x x F10 skewness

3

1

3

1

_( _ )

N

i

i

xN

x skx

x

F11 kurtosis

4

1

4

1

_( _ )

N

i

i

xN

x kurx

x

F12 crest factor _ max

__

xx cf

x rms

F13 clearance factor

2

max | |_

( )x clf

x_rms

x F14 impulse factor

max(| |)_

_x if

x abs

x

F15 shape factor _

__

x rmsx sf

x abs F16

coefficient of variation

__

xx cv

x

F17 coefficient of skewness

3

1

3

1

_( _ )

N

i

i

xN

x csx

F18 coefficient of kurtosis

4

1

4

1

_( _ )

N

i

i

xN

x ckx

Frequency-domain features

Page 20: 1-s2.0-S0263224112002400-main

19

F19 mean frequency 1

1 K

k

k

mf XK

F20 frequency center

1

1

[ ]K

k k

k

K

k

k

f X

fc

X

F21 root mean square frequency

2

1

1

( )K

k k

k

K

k

k

f X

rmsf

X

F22

standard deviation frequency

2

1

1

K

k k

k

K

k

k

f fc X

stdf

X

F23-F35

amplitudes at characteristic frequencies of the 1st stage planetary gearbox

amplitudes at the frequencies:

1 11, 1( ) c

n rf Z n f

where n= -6, -5, …, 6

F36-F48

amplitudes at characteristic frequencies of the 2nd stage planetary gearbox

amplitudes at the frequencies:

2 21, 2( ) c

n rf Z n f

where n= -6, -5, …, 6

Features specifically designed for planetary gearboxes

F49 energy ratio ( )

( )

RMSer

RMS

d

r F50

energy operator

( )eo kurtosis y where 2

1 +1ii i iy x x x

F51 FM4 4 ( )FM kurtosis d F52 M6A

6

1

32

1

1

61

N

i

i

N

i

i

dN

M A

dN

d

d

F53 M8A

8

1

42

1

1

81

N

i

i

N

i

i

dN

M A

dN

d

d

F54 NA4

4

1

22

1 1

1

41 1

N

i

i

M N

ij j

j i

rN

NA

rM N

r

r

F55 NB4

4

1

22

1 1

1

41 1

N

i

i

M N

ij j

j i

eN

NB

eM N

e

e

F56 FM4*

'

4

1

22

'1 1

1

4*1 1

N

i

i

M N

ij j

j i

dN

FM

dM N

d

d

F57 M6A*

'

6

1

32

'1 1

1

6 *1 1

N

i

i

M N

ij j

j i

dN

M A

dM N

d

d

F58 M8A*

'

8

1

42

'1 1

1

8 *1 1

N

i

i

M N

ij j

j i

dN

M A

dM N

d

d

F59 NA4*

4

1

2' 2

1 1

1

4*1 1

'

N

i

i

M N

ij j

j i

rN

NA

rM N

r

r

F60 NB4*

4

1

2' 2

1 1

1

4*1 1

'

N

i

i

M N

ij j

j i

eN

NB

eM N

e

e

F61 FM0 1

max( ) min( )0

d

p

m,n

m

FM

RMC

x x

where p is the total number of harmonics considered

F62 sideband level factor

1 +1

_

d dm, n m, nRMC RMCslf

x

F63 sideband index 1 +1

2

d dm, n m, nRMC RMCsi

Note: X is the Fourier transform of x. N is the length of signal x. K is the length of signal X. M represents the total number of segments up to the present. M’ represents the total number of segment in which gearbox is “healthy”.

Page 21: 1-s2.0-S0263224112002400-main

20

6. Diagnosis of damage levels using ordinal ranking

Fig. 5 shows the flow chart of the proposed method for diagnosing gear tooth surface

damage levels using ordinal ranking. Firstly, sixty-three features described in Table 4 were

calculated for each of the four sensors. Secondly, features from all the sensors are combined,

making the total number of features be 252 (i.e. 634), and the whole data set be 640 (i.e.

160 samples/level 4 levels) by 252 (features). Thirdly, feature selection is conducted

using the feature selection method proposed in Section 3.2. Finally, the selected feature

subset is imported into the ordinal ranking algorithm (SVOR) described in Section 2.2 to

diagnose the damage levels, and output the diagnosis results.

For the convenience of description, we will use ranks „1‟, „2‟, „3‟, and „4‟ to denote the

baseline, slight damage, moderate damage and severe damage in subsequent sections. The

diagnosis results will be quantitatively evaluated using two metrics [22]: mean absolute

(MA) error (Eq. 10) and mean zero-one (MZ) error (Eq. (11)). MA error is affected by how

wrongly a sample is diagnosed. The further the diagnosed rank is from the true rank, the

larger the MA error is. If more ordinal information is preserved in the trained ranking

model, the MA error is more likely to be smaller. MZ error, commonly used in

classification problems, is affected only by whether a sample is wrongly diagnosed or not.

If each rank is more clearly separated from others, the MZ error is more likely to be

smaller. The smaller MA and MZ errors mean a better ranking model. In the two equations,

N is the total number of samples, 'iz is the diagnosed rank, and iz is the true rank.

Mean absolute error (MA error): 1

1| ' |

N

i i

i

z zN

(10)

Page 22: 1-s2.0-S0263224112002400-main

21

Mean zero-one error (MZ error): 1

1 '1,

0 '

Ni i

i i

i i i

if z zt where t

if z zN

(11)

Vibration data

Envelope signalDifference and

residual signals

Frequency

spectrum

18 time-domain

features

15 features specifically

designed for gearbox

damage diagnosis

30 frequency-

domain features

Ccombine features from all sensors

Select sensitive features using the

proposed method

Diagnose damage level using ordinal

ranking

Output diagnosis results

Fig. 5 The proposed diagnosis approach for damage levels

7. Results and discussion

Vibration data collected from the planetary gearbox test rig are used to demonstrate the

effectiveness of the proposed diagnosis approach. Among the total 252 features, features #1

~ # 63 are from sensor LS1 following the order in Table 4; features #64 ~ #126, #127 ~

#139, and #140~ #252 are from LS2, HS1, and HS2, respectively. The feature-label

relevance (i.e. the absolute value of the Polyserial correlation coefficient) between each

individual feature and ranks (damage levels) are shown in Fig. 6. It can be seen that

Page 23: 1-s2.0-S0263224112002400-main

22

different features have different relevance values, some of which are very small. A

threshold (t1) is employed to determine whether a feature has positive contribution to the

ranks. If t1 is large, only a few really important features will be kept; if t1 is small, most

features will be kept and some might be useless. In this paper, we choose t1=0.5 so that

more than half information contained in an individual feature is related to the ranks. The

largest value in Fig. 6 is 0.765 (feature #94), followed by 0.762 (feature #31), 0.762 (feature

#157), and 0.752 (feature #220). These features (#94, #31, #157 and #220) are the

amplitudes at sideband 11( 2) c

rZ f from sensors LS2, LS1, HS1 and HS2, respectively.

0 50 100 150 200 250 3000

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

#31 #94 #157 #220

Feature #

Po

lys

eri

al

co

rre

lati

on

co

eff

icie

nt

Fig. 6 Feature-label relevance between damage levels and each of the 252 features

The feature-feature redundancy (i.e. the absolute value of the Pearson correlation coefficient)

between feature #94 and each of the 252 features are shown in Fig. 7. It can be seen that

some features (e.g. #31, #157 and #220) are highly related to feature #94; this means that a

large amount of information in those features is also contained in feature #94. A threshold

(t2) is chosen to limit the redundancy among selected features. Features whose redundancy

Page 24: 1-s2.0-S0263224112002400-main

23

values with selected features are higher than t2 will be omitted. If t2 is large, only a few

features will be omitted and finally most feature will be selected; if t2 is small, most features

will be omitted and finally only a few features will be selected. By checking Fig. 7, we

choose t2=0.8 so that the highly related features (i.e. feature #31, #157 and #220) are

omitted and others can be further considered in next feature selection steps. After t1 and t2

being chosen, the proposed feature selection method (Section 3.2) is applied and 11 features

are finally selected (Table 5).

0 50 100 150 200 250 3000

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

#31 #94 #157 #220

Feature #

Pe

ars

on

co

rre

lati

on

co

eff

icie

nt

Fig. 7 Feature-feature redundancy between feature #94 and each of the 252 features

Table 5 Eleven features selected by the proposed feature selection method

List Feature # Physical meaning Sensor 1 94 Amplitude at 1

1( 2) crZ f LS2

2 10 Skewness LS1 3 93 Amplitude at 1

1( 1) crZ f LS2

4 11 Kurtosis LS1 5 172 Amplitude at 2

2( 4) crZ f HS1

6 124 FM0 LS2 7 4 Peak to peak LS1 8 89 Amplitude at 1

1( 3) crZ f LS2

9 192 Average absolute value HS2 10 22 Standard deviation frequency LS1 11 29 Amplitude at 1

1c

rZ f LS1

Page 25: 1-s2.0-S0263224112002400-main

24

To test the diagnostic ability of ordinal ranking, the whole data set is split into two subsets:

the training set and the test set. The training set is for training a ranking model. The test set

is for testing the diagnostic ability of the trained ranking model. In this paper, three separate

scenarios are considered for splitting the whole data set, as listed in Table 6. The algorithm

SVOR introduced in Section 2.2 is employed to train and test a ranking model. The 2nd

degree polynomial kernel was used as the kernel function.

Table 6 Distribution of the training set and the test set

Scenario Training Set Test Set # of samples ranks of samples #of samples ranks of samples

Scenario 1 320 „1‟, „2‟, „3‟,‟ 4‟ 320 „1‟, „2‟, „3‟,‟ 4‟ Scenario 2 480 „1‟, „3‟, „4‟ 160 „2‟ Scenario 3 480 „1‟, „2‟, „3‟ 160 „4‟

In Scenario 1, the whole data set was randomly split into two equal-sized subsets,

one for training and the other for testing. In both subsets, samples from ranks „1‟, „2‟, „3‟

and „4‟ are included. This scenario tests the performance of the ranking model when the

training set covers the whole rank range.

In Scenario 2, samples from only ranks „1‟, „3‟ and „4‟ are included in the training

set; and samples from only rank „2‟ are included in the test set. Practically, it might occur

that data of some damage levels are missed in the training set. This scenario represents the

case when data of slight damage level are not collected for training. This scenario tests the

interpolation ability of the ranking model.

In Scenario 3, samples from only ranks „1‟, „2‟ and „3‟ are included in the training

set; and samples from only rank „4‟ are included in the test set. Similar to Scenario 2, this

scenario examines the case when data of severe damage level are not collected for training.

It tests the extrapolation ability of the ranking model.

Page 26: 1-s2.0-S0263224112002400-main

25

7.1.Effect of feature selection

To check the performance of the proposed feature selection method, five feature subsets

are employed for analyzing the same data in each scenario: (1) all 252 features; (2) top 38

relevant features; (3) 11 features in Table 5 (the proposed method); (4) randomly selected

11 features; (5) 11 features selected by the Pearson correlation coefficient.

Details on how and why these feature subsets are obtained are explained as follows.

Feature subset (1) doesn‟t involve feature selection. Feature subset (2) follows the feature

selection scheme that uses top features without considering relationships among features

[6]. Under this scheme, 38 features having feature-label relevance values larger than 0.5

are obtained. Comparison of feature subsets (1) and (2) shows the influence of irrelevant

features. Feature subset (3) is generated by the proposed method. Comparison of feature

subsets (2) and (3) demonstrates the influence of redundant features. Feature subset (4)

chooses 11 features randomly. Comparison of feature subsets (3) and (4) further

emphasizes the importance of proper feature selection. Feature subset (5) is generated

using a feature-label evaluation measure that is employed in [23] (i.e. the Pearson

correlation coefficient, more precisely, the Point-biserial correlation coefficient). The

generation process for feature subset (5) is the same as the proposed method except that the

Pearson correlation coefficient is used in evaluation of the feature-label relevance (In the

proposed method, the Polyserial correlation coefficient is used). Comparison of feature

subsets (3) and (5) indicates the proper measure for feature-label relevance in ordinal

ranking problems.

Page 27: 1-s2.0-S0263224112002400-main

26

Each of the five feature subsets is imported into SVOR to diagnose damage levels in each

scenario. In Scenario 1, the training set and the test set are randomly generated. To reduce

the impact of randomness on the test results, 30 runs are conducted. The mean and the

standard deviation of the 30 test errors are provided in Table 7. Using all 252 features, the

mean values of MA error and MZ error are both 0.099. Using the 38 relevant features, the

mean values of MA error and MZ error are reduced to 0.078 and 0.077, respectively. This

shows that irrelevant features have adverse effects on the ranking model. Using the

proposed method, some redundant features are further deleted from the 38 features,

keeping only 11 features. As a result, the mean values of MA error and MZ error are

further reduced to 0.073 and 0.072, respectively. This shows that the redundant

information can reduce the performance of the ranking model, and thus needs to be

excluded. Using the randomly selected 11 features, the mean MA error and the mean MZ

error are 0.229 and 0.220 respectively, which are relatively high. The reason is that not

enough relevant information is adopted in these features and there might be redundant

information as well. Using the 11 features selected by the Pearson correlation coefficient,

the mean MA error and the mean MZ error are 0.083 and 0.082, respectively. Compared

with the results of the proposed method, it can be shown that the Pearson correlation

coefficient work less efficiently than the Polyserical correlation coefficient (the proposed

method). The reason is that the Pearson correlation coefficient cannot properly reflect the

relevance between a continuous feature and an ordinal rank. As a result, relevant features

are not correctly selected. In Scenario 1, the proposed method corresponds to the lowest

MA error and MZ error.

Page 28: 1-s2.0-S0263224112002400-main

27

Table 7 Results of Scenario 1 (320 training samples (ranks „1‟, „2‟, „3‟, „4‟) and 320 test samples (ranks „1‟,

„2‟, „3‟, „4‟))

Feature subset MA Error (mean standard deviation)

MZ Error (mean standard deviation)

(1) all 252 features 0.099 0.022 0.099 0.022 (2) top 38 features 0.078 0.016 0.077 0.016 (3) 11 features (the proposed method) 0.073 0.012 0.072 0.012 (4) randomly selected 11 features 0.229 0.025 0.220 0.024 (5) 11 features selected using the Pearson correlation coefficient

0.083 0.020 0.082 0.019

In Scenario 2, the training samples are from ranks „1‟, „3‟ and „4‟ only. The test samples

(rank „2‟) are predicted to be one of the three ranks (i.e. „1‟, „3‟, and „4‟). Because rank „2‟

is never predicted, the MZ error is always 1. We will compare MA errors only. In the

perfect case, the test samples are all diagnosed to be either rank „1‟ or „3‟, which are two

closest ranks to the true rank („2‟). In this case, the MA error is 1. In the worst case, the

test samples are all diagnosed to be rank „4‟, making a MA error of 2. The diagnosed

results are shown in Table 8. With all 252 features, 124 samples are diagnosed to be rank

„3‟ and the rest are rank „4‟, making a MA error of 1.225. Using the top 38 features, 149

samples are ranked „3‟ and the rest are ranked „4‟, resulting in a MA error of 1.069. It can

be seen that after deleting irrelevant features, the MA error is reduced meaning that the

interpolation ability of the ranking model is improved. With the proposed method, eight

samples are ranked as „4‟, and others are ranked as either „1‟ or „3‟, generating a MA error

of 1.050. This indicates that deleting redundant features improves the interpolation ability

of the ranking model. With randomly selected 11 features, 137 samples are ranked „4‟,

giving a high MA error (1.856). This is because randomly selected features contain

irrelevant and redundant information. Using 11 features selected by the Pearson correlation

coefficient, 89 samples are ranked as „4‟ and a MA error of 1.556 is generated, which

Page 29: 1-s2.0-S0263224112002400-main

28

means that the interpolation ability of this learned ranking model is poor. In Scenario 2,

features selected by the proposed method demonstrate the best interpolation ability.

Table 8 Results of Scenario 2 (380 training samples (ranks „1‟, „3‟, „4‟) and 160 test samples (rank „2‟))

Feature subset # of samples in predicted ranks

MA error

MZ error

„1‟ „3‟ „4‟ (1) all 252 features 0 124 36 1.225 1 (2) top 38 features 0 149 11 1.069 1 (3) 11 features (the proposed method) 21 131 8 1.050 1 (4) randomly selected 11 features 0 23 137 1.856 1 (5) 11 features selected using the Pearson correlation coefficient

0 71 89 1.556 1

In Scenario 3, the training samples are from ranks „1‟, „2‟ and „3‟ only. The test samples

(rank „4‟) are predicted to be one of the three ranks (i.e. „1‟, „2‟, and „3‟). Same as in

Scenario 2, the MZ error is always 1 because rank „4‟ is never predicted. We will compare

MA errors only. In the perfect case, the test samples are all diagnosed to be rank „3‟, which

is the closest rank to the true rank (i.e. „4‟). In this case, the MA error is 1. In the worst

case, the test samples are all diagnosed as rank „1‟, making an MA error of 3. Table 9

shows the detailed results. With all 252 features, around half of the test samples (76

samples) are ranked „2‟ and half are ranked „3‟, making an MA error of 1.475. Using the

top 38 features, 34 samples are put in rank „2‟ and others in rank „3‟, reducing the MA

error to 1.215. This demonstrates that irrelevant features should be excluded in order to

improve the extrapolation ability of the ranking model. The features selected by the

proposed method further reduce the MA error to 1.075 by eliminating the redundant

information. Randomly selected features put most samples into rank „2‟, giving an MA

error of 1.569. The 11 features selected using the Pearson correlation coefficient give a

MA error of 1.294, indicating a worse extrapolation ability of the ranking model than that

Page 30: 1-s2.0-S0263224112002400-main

29

of the proposed method (1.075). In this scenario, the proposed method generates the lowest

MA error, and thus produces a ranking model with the best exploration ability.

The comparisons in three scenarios prove the benefits of deleting irrelevant features and

redundant features. Moreover, comparisons between results of the proposed method and

results of features selected using the Pearson correlation coefficient show the effectiveness

of the Polyserical correlation coefficient in evaluating the feature-label relevance for

ordinal ranking problems. Using the Pearson (more precisely, Point-biserial) correlation

coefficient, the rank is regarded as a nominal variable. That is why the Pearson (Point-

biserial) correlation coefficient works well for classification problems not for ordinal

ranking problems. In all three scenarios, the proposed method gives the lowest error,

proving its effectiveness in selecting features for ordinal ranking.

Table 9 Results of Scenario 3 (480 training samples (ranks „1‟,‟2‟,‟3‟) and 160 test samples (rank ‟4‟)) Feature subset # of samples in predicted

ranks MA error

MZ error

„1‟ „2‟ „3‟ (1) all 252 features 0 76 84 1.475 1 (2) top 38 features 0 34 126 1.215 1 (3) 11 features (the proposed method) 0 12 148 1.075 1 (4) randomly selected 11 features 0 91 69 1.569 1 (5) 11 features selected using the Pearson correlation coefficient

0 47 113 1.294 1

7.2.Comparison of ordinal ranking and classification

For comparison purposes, the traditional diagnosis approach [5, 6] which uses a multi-class

classifier to diagnose the damage levels is also applied to each scenario. To avoid the

influence of the learning machine, support vector machine (SVM) is adopted as a classifier

since the ordinal ranking algorithm SVOR is based on SVM. The same kernel function (2nd

degree polynomial kernel) was used. The diagnosis procedure is the same as described in

Page 31: 1-s2.0-S0263224112002400-main

30

Section 4.2 except that the ordinal ranking algorithm is replaced by the classification

algorithm. Results are listed in Table 10.

Table 10 Comparison of the proposed approach and traditional approach

Diagnosis approach

Scenario 1 Scenario 2 Scenario 3 MA Error

MZ error

# of samples in predicted ranks

MA Error

MZ error

# of samples in predicted ranks

MA error

MZ error

„1‟ „3‟ „4‟ „1‟ „2‟ „3‟ proposed approach (ordinal ranking)

0.073

0.012

0.072

0.012

21 131 8 1.050 1 0 12 148 1.075 1

traditional approach (classification)

0.088

0.017

0.066

0.012

0 80 80 1.500 1 25 19 116 1.431 1

In Scenario 1, the MA error of ordinal ranking (0.073 0.012) is smaller than that of

classification (0.088 0.017), whereas the MZ error (0.072 0.0120) is larger than that of

classification (0.066 0.012). This is explained as follows. The MZ error treats wrongly

ranked samples equally and the value of MZ error isn‟t influenced by how well the ordinal

information is kept. The more separately each rank is classified, the more likely that the

MZ error is low. The aim of classification is to classify each rank as separately as possible;

therefore classification gives a lower MZ error. However, the MA error is influenced by

how well the ordinal information is kept. It penalizes the wrongly ranked samples

considering how far a sample is wrongly ranked from its true rank. The more ordinal

information is kept in the ranking model, the more likely that MA error becomes small.

Classification doesn‟t guarantee that the ordinal information is kept. Ordinal ranking, on

the other hand, aims to express the ordinal information by searching a monotonic trend in

the feature space, and therefore the ordinal information is largely preserved. That is why

ordinal ranking produces a smaller MA error than classification. The above arguments are

Page 32: 1-s2.0-S0263224112002400-main

31

also supported by results in Scenarios 2 and 3. In Scenario 2, the MA error is 1.500 for

classification and 1.050 for ordinal ranking. In Scenario 3, the MA error is 1.431 for

classification and 1.075 for ordinal ranking.

The above comparisons show that the ordinal ranking results in a lower MA error, and

classification generates a lower MZ error. For diagnosis of damage levels, a low MA error

is more important than a low MZ error. The reason is explained as follows. A low MA error

means that the diagnosed damage level of a new sample is close to its true level. A low MZ

error, however, cannot ensure a “closer” distance between the diagnosed damage level and

true level. In this sense, ordinal ranking is more suitable for diagnosis of damage levels than

classification. The advantage of ordinal ranking is more obvious when data of some damage

level are missing in the training process, as can be seen from Scenarios 2 and 3 in Table 10.

8. Conclusion

Diagnosis of damage levels is an important task in fault diagnosis of machinery. One key

characteristic of damage levels is the inherent ordinal information. Thus keeping the ordinal

information is important in the diagnosis process. This paper proposes to preserve the

ordinal information by using ordinal ranking techniques. Experimental results on diagnosis

of artificially created surface damage levels of planet gear teeth shows that ordinal ranking

has advantages over classification in terms of lower mean absolute error, better

interpolation ability and extrapolation ability.

A feature selection method is proposed based on correlation coefficients to improve the

diagnosis accuracy of ordinal ranking. The proposed method selects features that are

relevant to ranks, and meanwhile ensures that the redundant information is limited to a

Page 33: 1-s2.0-S0263224112002400-main

32

certain level. Experimental results show that the proposed feature selection method

efficiently reduces the diagnosis errors, and improve the interpolation and extrapolation

abilities of the ranking model.

Correlation coefficient reflects only linear relationship between two variables. Feature

selection methods that consider the nonlinearity when evaluating the feature-label relevance

and feature-feature redundancy will be studied in our future work. Furthermore, the

experimental data used in this paper are from lab experiments not from industry fields. The

effectiveness of the proposed method in real industry needs to be tested in future.

ACKNOWLEDGMENT

The research was supported by the National Sciences and Engineering Research Council of

Canada, Syncrude Canada Ltd., and China Scholarship Council. Critical comments and

constructive suggestions from reviewers and the editor are very much appreciated.

REFERENCES

[1] B. Samanta, Gear fault detection using artificial neural networks and support vector machines with genetic algorithms, Mechanical Systems and Signal Processing, 18 (2004), 625-644.

[2] N. Saravanan, S. Cholairajan, and K. I. Ramachandran, Vibration-based fault diagnosis of spur bevel gear box using fuzzy technique, Expert Systems with Applications, 36 (2009), 3119-3135.

[3] Z. P. Feng, M. J. Zuo, and F. L. Chu, Application of regularization dimension to gear damage assessment, Mechanical Systems and Signal Processing, 24 (2010), 1081-1098.

[4] H. Ozturk, I. Yesilyurt, and M. Sabuncu, Detection and advancement monitoring of distributed pitting failure in gears, Journal of Nondestructive Evaluation, 29 (2010), 63-73.

[5] Y. Lei, M. J. Zuo, Z. J. He, and Y. Y. Zi, A multidimensional hybrid intelligent method for gear fault diagnosis, Expert Systems with Applications, 37 (2010), 1419-1430.

[6] Y. Lei and M. J. Zuo, Gear crack level identification based on weighted K nearest neighbor classification algorithm, Mechanical Systems and Signal Processing, 23 (2009), 1535-1547.

[7] X. Zhao, M. J. Zuo, and Z. Liu, Diagnosis of pitting damage levels of planet gears based on ordinal ranking, in IEEE International Conference on Prognostics and Health management, Denver, U.S., 2011.

[8] M. Inalpolat and A. Kahraman, A theoretical and experimental investigation of modulation sidebands of planetary gear sets, Journal of Sound and Vibration, 323 (2009), 677-696.

[9] T. Barszcz and R. B. Randall, Application of spectral kurtosis for detection of a tooth crack in the planetary gear of a wind turbine, Mechanical Systems and Signal Processing, 23 (2009), 1352-1365.

[10] W. Bartelmus and R. Zimroz, Vibration condition monitoring of planetary gearbox under varying external load, Mechanical Systems and Signal Processing, 23 (2009), 246-257.

Page 34: 1-s2.0-S0263224112002400-main

33

[11] W. Bartelmus and R. Zimroz, A new feature for monitoring the condition of gearboxes in non-stationary operating conditions, Mechanical Systems and Signal Processing, 23 (2009), 1528-1534.

[12] W. Bartelmus, F. Chaari, R. Zimroz, and M. Haddar, Modelling of gearbox dynamics under time-varying nonstationary load for distributed fault detection and diagnosis, European Journal of Mechanics - A/Solids, 29 (2010), 637-646.

[13] H.-T. Lin, "From ordinal ranking to binary classification," Doctor of Philosophy, California Institute of Technology, Pasadena, Unites States, 2008.

[14] A. Shashua and A. Levin, Ranking with large margin principle: two approaches, in Proceedings of Advances in Neural Information Processing Systems, 2002, pp. 937-944.

[15] E. Frank and M. Hall, A simple approach to ordinal classification, in Machine Learning: ECML 2001, L. De Raedt and P. Flach, Eds., ed: Springer Berlin Heidelberg, 2001, pp. 145-156.

[16] R. Herbrich, T. Graepel, and K. Obermayer, Large margin rank boundaries for ordinal regression, in Advances in Large Margin Classifiers, ed: MIT Press, 2000, pp. 115-132.

[17] X. Geng, T. Y. Liu, T. Qin, and H. Li, Feature selection for ranking, in 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Amsterdam, 2007, pp. 407-414.

[18] Rahman Mukras, Nirmalie Wiratunga, Robert Lothian, Sutanu Chakraborti, and D. Harper, Information gain feature selection for ordinal text classification using probability redistribution, in In proceedings of the IJCAI07 workshop on texting mining and link analysis, Hyderabad IN, 2007.

[19] S. Baccianella, A. Esuli, and F. Sebastiani, Feature selection for ordinal regression, in Proceedings of the 2010 ACM Symposium on Applied Computing, New York, 2010, pp. 1748-1754.

[20] S. S. Stevens, On the theory of scales of measurement Science, 103 (1946), 677-680. [21] K. Crammer and Y. Singer, Pranking with ranking, Advances in Neural Information Processing

Systems 14 (2001), 641-647. [22] W. Chu and S. S. Keerthi, Support vector ordinal regression, Neural Computation, 19 (2007), 792-

815. [23] Y. Lei and L. Huan, Efficient feature selection via analysis of relevance and redundancy, The

Journal of Machine Learning Research, 5 (2004), 1205-1224. [24] H. C. Peng, Long, F., and Ding, C., Feature selection based on mutual information: criteria of max-

dependency, max-relevance, and min-redundancy, IEEE Transactions on Pattern Analysis and Machine Intelligence, 27 (2005), 1226-1238.

[25] R. E. Schumacker and R. G. Lomax, A beginner's guide to structural equation modeling, Second ed. New Jersey: Lawrence Erlbaum Associates, Inc., 2004.

[26] D. Muijs, Doing quantitative research in education with SPSS, Second ed. London: Sage Publications Ltd, 2010.

[27] X. Zhao, Q. Hu, Y. Lei, and M. J. Zuo, Vibration-based fault diagnosis of slurry pump impellers using neighbourhood rough set models, Proceedings of the Institution of Mechanical Engineers, Part C: Journal of Mechanical Engineering Science, 224 (2010), 995-1006.

[28] I. Guyon, Practical feature selection: from correlation to causality., in Mining Massive Data Sets for Security, ed: IOS Press, 2008.

[29] U. Olsson, F. Drasgow, and N. Dorans, The polyserial correlation coefficient, Psychometrika, 47 (1982), 337-347.

[30] Mohammad Hoseini, Yaguo Lei, Do Van Tuan, Tejas Patel, and M. J. Zuo, Experiment Design of Four Types of Experiments: Pitting Experiments, Run-To- Failure Experiments, Various Load and Speed Experiments, and Crack Experiments, University of Alberta, Edmonton, Canada, January 31, 2011.

[31] A. S. f. Metals, Friction, Lubrication, and Wear Technology Handbook, 10th ed. vol. 18: ASM International, 1992.

[32] P. D. Samuel and D. J. Pines, A review of vibration-based techniques for helicopter transmission diagnostics, Journal of Sound and Vibration, 282 (2005), 475-508.

[33] J.Keller and P.Grabill, Vibration monitoring of a UH-60A transmission planetary carrier fault, in The American Helicopter Society 59th Annual Forum, Phoenix, U.S., 2003.

Page 35: 1-s2.0-S0263224112002400-main

34

[34] A. S. Sait and Y. I. Sharaf-Eldeen, A review of gearbox condition monitoring based on vibration analysis techniques diagnostics and prognostics, in Rotating Machinery, Structural Health Monitoring, Shock and Vibration, Volume 5, 2011, pp. 307-324.

Page 36: 1-s2.0-S0263224112002400-main

The highlights of this paper are as follows:

A feature selection method is designed for ordinal ranking.

A diagnosis approach is proposed for diagnosing damage levels using ordinal ranking.

The proposed approach is applied to diagnosis of surface damage levels of planet gear teeth.

The effectiveness of the designed feature selection method is demonstrated.

The advantage of the proposed approach over the traditional approach is discussed.