Best Basis Intelligent Monitoring


Mechanical Systems and Signal Processing 19 (2005) 357–370

    Best basis-based intelligent machine fault diagnosis

    S. Zhang, J. Mathew, L. Ma, Y. Sun

    CRC for Integrated Engineering Asset Management, School of Mechanical, Manufacturing and Medical Engineering,

    Queensland University of Technology, Brisbane, QLD 4001, Australia

    Received 5 April 2004; received in revised form 21 May 2004; accepted 16 June 2004

    Abstract

The wavelet packet transform decomposes a signal into a set of bases for time–frequency analysis. This

    decomposition creates an opportunity for implementing distributed data mining where features are

extracted from different wavelet packet bases and serve as feature vectors for applications. This paper

    presents a novel approach for integrated machine fault diagnosis based on localised wavelet packet bases of

    vibration signals. The best basis is firstly determined according to its classification capability. Data mining

    is then applied to extract features and local decisions are drawn using Bayesian inference. A final conclusion

    is reached using a weighted average method in data fusion. A case study on rolling element bearing

    diagnosis shows that this approach can greatly improve the accuracy of diagnosis.

© 2004 Elsevier Ltd. All rights reserved.

    Keywords: Wavelet packet transform; Best basis; Fault diagnosis; Bayesian inference; Data mining/fusion

    1. Introduction

    Condition monitoring is an important part of the process of modern equipment maintenance.

Its implementation typically consists of data acquisition, feature extraction, condition identification and fault diagnosis [1]. Researchers in the field have tended to focus on two areas

    for their work. The extraction of features that represent the faults in some way is an identified area

    of work. The other is design and implementation of an automatic fault diagnosis procedure.

    ARTICLE IN PRESS

    www.elsevier.com/locate/jnlabr/ymssp

0022-460X/$ - see front matter © 2004 Elsevier Ltd. All rights reserved.

    doi:10.1016/j.ymssp.2004.06.001

    Corresponding author.

    E-mail address: [email protected] (S. Zhang).


    Various methods are available for feature extraction. For example, statistical methods are used

    to derive time-domain features, such as signal energy and kurtosis. The fast Fourier transform

(FFT) is a traditional tool to extract frequency-domain features. Joint time–frequency features

which can be generated by short-time Fourier transforms are increasingly used since the majority of real-world signals are essentially time varying. In the past two decades, the wavelet transform

(WT) and wavelet packet transform (WPT) [2,3], have been researched and applied in a variety of

ways [4]. More particularly, in machine fault diagnostics, WT and WPT have become preferred

techniques to the traditional FFT method in the analysis of transient signals [5–7].

    The features extracted from signals build a foundation for subsequent condition identification

    and fault diagnosis. On the other hand, different approaches have been developed to design

    condition classifiers, aimed at enhancing the accuracy of diagnosis and automating the diagnosis

    procedure. Linear discriminant analysis (LDA), quadratic discriminant analysis (QDA) and

    Bayesian inference are known statistical methods. Modern methods, such as neural networks,

fuzzy logic and expert systems, are preferred due to their intelligent properties. Some integrated approaches consider different signal features in a combined fashion to enhance the accuracy of

    diagnosis [8].

    In this work, the authors propose a novel approach to conduct integrated fault diagnosis based

    on the best bases of the WPT of vibration signals, using data mining and fusion. The best bases

    of WPT are firstly selected according to their classification capability. Features are then

extracted from each individual best basis and local decisions are made by classifiers such as Bayesian

    inference. A final conclusion is reached using the decision-fusion technique, where the

classification capabilities of the best bases serve as the decision weights. This proposed

    approach is similar to the distributed data mining approach (DDM), which generally starts from

    local data analysis and subsequently generates a global model [9]. However, the proposed

approach has not been previously reported in work related to wavelet packet-based fault diagnosis in the literature.

    This paper is arranged as follows. Section 2 presents the techniques used in this work, such as

    WPT, best basis selection, Bayesian inference, data mining and fusion. Section 3 describes the

    integrated procedure for fault classification by fusing local information from each best basis of

    wavelet packets. The proposed method is validated using signals from faulty rolling element

bearings in Section 4. In addition, back-propagation (BP) neural networks are compared as an alternative classifier. The conclusions are presented in Section 5.

    2. Brief introduction of the techniques

    2.1. WPT

    Both WT and WPT have continuous and discrete formats. The discrete format of WPT was

    adopted in this work because it is more popularly used in engineering applications. To illustrate

the underlying mathematical theory of WPT briefly, we denote $\{h_k\}_{k \in Z}$ and $\{g_k\}_{k \in Z}$ as the quadrature mirror filter banks. A signal can be decomposed at different scales on the basis

functions of the form $2^{-j/2}u_n(2^{-j}t - k)$, $j, k \in Z$, $n \in Z^+$, where $Z$ denotes the integers and $Z^+$


denotes the non-negative integers. These functions are iterated as

$$u_{2n}(t) = \sqrt{2}\sum_{k \in Z} h_k u_n(2t - k), \qquad (1)$$

$$u_{2n+1}(t) = \sqrt{2}\sum_{k \in Z} g_k u_n(2t - k), \qquad (2)$$

where $j$ is a scale parameter, $k$ is a time localisation parameter and $n$ is an oscillation parameter. Thus, $u_0(t)$ is a scale function which corresponds to a low-pass filter; the filtered signal is an approximation of the analysed signal. The function $u_1(t)$ is a wavelet function which corresponds to a high-pass filter; the filtered signal is a detail of the analysed signal.

The approximation and detail can be further sliced by dyadic decomposition using the dilated and translated scale functions and wavelet functions. Consequently, WPT generates a binary tree, with $2^j$ bases at decomposition level $j$. Each basis is indexed by a pair of integers $(j, k)$. The binary structure of the tree enables WPT to be used in various applications. For signal representation, for example, a signal can be reconstructed from wavelet packet coefficients

    confined in some specific frequency bands. For pattern recognition, features can be extracted from

    different wavelet packet bases. In addition, the distributed best bases create opportunities for

    feature extraction and combination, where data mining, a convergence of knowledge discovering

    techniques [10], can play an important role. Based on the features of each best basis, local

    decisions can be made by a classifier.
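The iterated filter-bank construction of Eqs. (1) and (2) can be sketched numerically. The following minimal illustration uses the Haar quadrature mirror filter pair rather than the longer Db20 filters used in the paper's case study; each tree node is split into an approximation and a detail by filtering and downsampling by two.

```python
import numpy as np

# Haar quadrature mirror filter pair {h_k} (low-pass) and {g_k} (high-pass).
# The paper's case study uses Db20; Haar keeps this sketch short.
H = np.array([1.0, 1.0]) / np.sqrt(2.0)
G = np.array([1.0, -1.0]) / np.sqrt(2.0)

def split(x):
    """One filter-bank step: filter with h and g, then downsample by two."""
    return np.convolve(x, H)[1::2], np.convolve(x, G)[1::2]

def wpt(x, levels):
    """Wavelet packet binary tree; tree[j][k] holds the data of basis (j, k)."""
    tree = {0: {0: np.asarray(x, dtype=float)}}
    for j in range(levels):
        tree[j + 1] = {}
        for k, node in tree[j].items():
            a, d = split(node)
            tree[j + 1][2 * k] = a        # u_{2n}: approximation branch, Eq. (1)
            tree[j + 1][2 * k + 1] = d    # u_{2n+1}: detail branch, Eq. (2)
    return tree

x = np.arange(8, dtype=float)
tree = wpt(x, 3)
print(len(tree[3]))  # 8 bases at level 3
```

Because the Haar pair is orthonormal, the total energy at each level equals the energy of the original signal, which is a quick sanity check on the decomposition.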

    2.2. Best basis selection

The binary tree of bases can also be considered as a 2-D time–frequency plane. The information

in the bases is redundant along two axes, i.e. information in the child bases overlaps with that

    in the parent basis. The best basis is preferably selected from the binary tree, so as to reduce the

data analysis effort without losing information. For signal representation, best bases are defined such

that they cover the complete horizontal axis while not overlapping along the vertical axis [11]. This

definition results in a complete tree and ensures no redundant information. The Shannon entropy-

based criterion [12] is well suited to the selection of the complete tree. When signals come from

different classes and a common best basis is required, a WPT-structured tree [13,14] is used for the

best basis selection. For pattern recognition, common best bases are selected such that they have the

best classification capability; they do not necessarily form a complete tree

[15,16]. In this work, the best bases were searched to guarantee class separation, since fault diagnosis is essentially about pattern recognition.

Suppose there are $c$ classes $\omega_i$, $i = 1, \ldots, c$, in a classification problem, and $s_i$ denotes the cluster centre of the $i$th class $\omega_i$; then the normalised distance between two classes $i$ and $j$ is

$$d_{i,j} = \frac{\|s_i - s_j\|}{\sum_{i=1}^{c-1}\sum_{j=i+1}^{c} \|s_i - s_j\|}. \qquad (3)$$

The minimal distance $\inf\{d_{i,j}\}$ is selected as the discriminant distance for best basis selection

$$d = \min\{d_{i,j}\}, \quad i = 1, \ldots, c-1, \; j = i+1, \ldots, c. \qquad (4)$$


Apparently, a larger $d$ indicates a better capability of classification. It is noted that the minimal

    distance, rather than other measures, such as mean distance, is adopted. This choice assists in the

    determination of a best basis in which the classes are relatively well separated.
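Eqs. (3) and (4) can be sketched as follows; the cluster centres are hypothetical values, not taken from the paper's data.

```python
import numpy as np
from itertools import combinations

def discriminant_distance(centres):
    """Eqs. (3) and (4): normalised pairwise distances between cluster centres
    and their minimum, used to score a basis's classification capability."""
    pairs = list(combinations(range(len(centres)), 2))
    raw = {(i, j): float(np.linalg.norm(centres[i] - centres[j])) for i, j in pairs}
    total = sum(raw.values())
    d_ij = {p: v / total for p, v in raw.items()}
    return min(d_ij.values()), d_ij

# Hypothetical cluster centres for c = 3 fault classes in a 2-D feature space.
centres = [np.array([0.0, 0.0]), np.array([3.0, 0.0]), np.array([0.0, 4.0])]
d, d_ij = discriminant_distance(centres)
print(round(d, 4))  # 0.25: the (0, 1) pair is the closest after normalisation
```

A basis would be selected when its $d$ is large, i.e. when even the two closest classes remain well separated.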

    2.3. Bayesian inference for classification

Bayesian inference is an application of Bayes' theorem and has been used as a fundamental

classifier for pattern recognition [17,18]. Bayesian inference works by assigning an unknown

pattern $x$ to the class which has the highest posterior probability. According to Bayes' theorem,

    the posterior probability is given by

$$P(\omega_i|x) = \frac{P(\omega_i)P(x|\omega_i)}{P(x)}, \qquad (5)$$

where $P(\omega_i)$ is the prior probability of class $\omega_i$ and $P(x|\omega_i)$ is the class-conditional probability which represents the probability distribution of $x$ in class $\omega_i$. The total probability $P(x)$ is given by

$$P(x) = \sum_{i=1}^{c} P(\omega_i)P(x|\omega_i). \qquad (6)$$

    To obtain the posterior probability, the prior probability and class-conditional probability

    must be known. The prior probability can be inferred from prior knowledge of the application,

    estimated from the data or assumed to be equal. The class-conditional probability can be

estimated from the data using either parametric or non-parametric methods. For simplicity, the

parametric multivariate normal distribution is commonly used as an approximation in probability density estimation in the case of multivariate features. If the underlying distribution does not follow a

normal probability distribution, non-parametric density estimation provides an alternative

approach. In this work, the signal energy and kurtosis were extracted from each best basis as

scalar features separately. Their distributions were estimated using both parametric and non-

parametric methods.
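As an illustration of Eqs. (5) and (6), here is a minimal sketch with univariate normal class-conditional densities; the per-class parameters are hypothetical, not estimates from the paper's data.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Univariate normal density, used here as the class-conditional P(x|w_i)."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def posteriors(x, class_params, priors):
    """Eqs. (5) and (6): P(w_i|x) = P(w_i) P(x|w_i) / sum_j P(w_j) P(x|w_j)."""
    joint = np.array([p * gaussian_pdf(x, mu, s)
                      for p, (mu, s) in zip(priors, class_params)])
    return joint / joint.sum()

# Hypothetical per-class (mean, std) of a scalar feature such as band energy.
class_params = [(0.0, 1.0), (4.0, 1.0), (8.0, 1.0)]
priors = [1.0 / 3, 1.0 / 3, 1.0 / 3]
post = posteriors(3.5, class_params, priors)
print(int(np.argmax(post)))  # the pattern is assigned to class 1
```

With equal priors and equal variances, the decision reduces to picking the class whose mean is nearest to the observed feature value.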

    2.4. Data fusion at the decision level

    Distributed data resources, such as the distributed sensors, require the integration of local

information to make a final decision. The data-fusion technique provides such a solution and has been successfully used in military and civilian applications [17]. Data fusion helps improve the

    identification accuracy in pattern classification and is typically performed at three levels, i.e. (1)

    sensor-level fusion, (2) feature-level fusion, and (3) decision-level fusion. More recently, the

decision-level fusion has been termed classifier fusion [19–21]. In the work reported in this paper,

    local decisions were drawn from each best basis of wavelet packets. The decision-level data fusion

    is therefore used for integration. Different methods are used for decision-level fusion, such as the

weighted average method, majority voting technique, Bayesian inference and the Dempster–Shafer

    method. The discriminant distance in Eq. (4) supplies a reasonable decision weight for each best

    basis. As a result, the weighted average method was adopted for decision fusion in this work.
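A minimal sketch of the weighted average method follows, assuming each best basis has already produced a local posterior vector and a discriminant-distance weight; the numbers are hypothetical.

```python
import numpy as np

def fuse(local_posteriors, weights):
    """Weighted-average fusion: fused P(w_i|x) = sum_j w_j * P_j(w_i|x)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                        # normalise the discriminant-distance weights
    return w @ np.asarray(local_posteriors, dtype=float)

# Hypothetical local posteriors from three best bases over three fault classes,
# with discriminant distances (Eq. (4)) used as the raw weights.
local = [[0.7, 0.2, 0.1],
         [0.5, 0.3, 0.2],
         [0.1, 0.6, 0.3]]
weights = [0.5, 0.3, 0.2]
fused = fuse(local, weights)
print(int(np.argmax(fused)))  # final decision: class 0
```

Note that the third basis disagrees with the first two; the weighting lets the more discriminative bases dominate the fused decision.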


    3. A procedure to implement integrated fault diagnosis

    A procedure using the above techniques to implement integrated fault diagnosis based on

    wavelet packets is illustrated in Fig. 1. It has the following steps.

    (1) Wavelet packet transformation of signals: The m signals from c signal classes collected for

    training and testing classifiers are decomposed by WPT. This step results in m wavelet packet

    trees.

    (2) Common best basis selection: The discriminant distance (Eq. (4)) is applied for the selection of

a set of n best bases from the m binary trees.

    (3) Feature extraction from best basis: The data in a best basis is essentially a time-domain signal

    confined in a specific frequency band. Features, such as signal energy or signal kurtosis, are

extracted to construct a feature vector $x$. A local feature set for a best basis from the $m$ signals is $X = \{x\}$.

(4) Decision making on local feature set: The class-conditional probability is firstly estimated from the feature set $X$. Given the prior probability, the posterior probability $P(\omega_i|x)$ is computed using Bayesian inference for the unknown signal with a feature vector $x$ in a best basis. The posterior

    probability represents which class x belongs to and is prepared for final decision making.

    (5) Data fusion for final decision making: The weighted average method is adopted for decision

    fusion, while Bayesian inference produces probabilities or confidence values corresponding to

each class in an individual best basis. For a specific class $\omega_i$, a fused probability is given by

$$P(\omega_i|x) = \sum_{j=1}^{n} w_{i,j} P(\omega_{i,j}|x), \qquad (7)$$

where $w_{i,j}$ is the normalised weight given by Eq. (4). A final decision that $x$ is assigned to class

$I$ is made by selecting the maximal averaged posterior probability,

$$P(\omega_I|x) = \max_i P(\omega_i|x), \quad i = 1, \ldots, c. \qquad (8)$$


[Fig. 1 block diagram: a signal is decomposed by WPT into best bases 1 to n; each best basis yields a feature set, each feature set feeds a Bayesian inference module, and the n local decisions are combined in a fusion centre.]

Fig. 1. A procedure for wavelet packets-based fault diagnosis by data mining/fusion.


Alternatively, a binary classification method can be used by voting for the highest posterior

probability (Eq. (9)). This majority voting technique takes the winner-take-all principle:

$$P(\omega_I|x) = \begin{cases} 1, & \max P(\omega_i|x), \; i = 1, \ldots, c, \\ 0, & \text{otherwise}. \end{cases} \qquad (9)$$

    This procedure facilitates an automatic integrated fault diagnosis approach, since WPT, best

basis selection and Bayesian inference can all be carried out computationally.
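The winner-take-all rule of Eq. (9) reduces the fused posteriors to a binary decision vector; a short sketch with a hypothetical fused posterior:

```python
import numpy as np

def winner_take_all(fused_posteriors):
    """Eq. (9): 1 for the class with the maximal fused posterior, 0 for the others."""
    out = np.zeros_like(fused_posteriors)
    out[np.argmax(fused_posteriors)] = 1.0
    return out

fused = np.array([0.52, 0.31, 0.17])  # hypothetical fused posteriors from Eq. (7)
print(winner_take_all(fused))  # [1. 0. 0.]
```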

    4. A case study

    Rolling element bearings are key components in mechanical systems. Their failures account for

    a large percentage of breakdowns in rotating machinery. Some of these breakdowns can be

catastrophic. Conducting diagnosis and prognosis on bearings is therefore fundamental to maintaining the integrity of mechanical systems.

    In this case study, experimental data of faulty ball bearings were used to test our methodology.

    For these bearings, a single defect was introduced by laser processing on the outer-race, inner-race

    and ball, respectively. The data were collected under different operation conditions, i.e. different

    speeds and loads, to ensure that broad conditions are covered for the benefit of the generalisation

    of classifiers.

    Seven hundred samples were acquired for each fault class. Among them, 600 samples were used

    for classifier training, while 100 samples were used for classifier testing. Since three types of faults

    were involved, there were a total of 2100 samples.

Following the procedure in Section 3, the signals were decomposed by WPT up to level 3 using

    Db20 wavelets. Fig. 2 illustrates a signal from the faulty outer-race and its WPT. The signal

    energy and kurtosis were adopted as features separately and formed the training and testing


    Fig. 2. WPT for an outer race signal.


datasets [22,23]. Tables 1 and 2 list the normalised discriminant distances for all nodes. The six

selected common best bases are illustrated in Figs. 3 and 4 for the energy and kurtosis features,

respectively. The corresponding decision weights measured by the discriminant distances are

shown in Tables 3 and 4. For each best basis, it was assumed that the prior probabilities for the three faults were equal,

i.e. $P(\omega_i) = 1/3$, $i = 1, 2, 3$, and the class-conditional probabilities were estimated from the training datasets. According to Bayesian inference (Eqs. (5) and (6)), the testing signals were classified to

    reach the local decisions, which were further fused (Eq. (7)) to produce a final decision using

    Eq. (8) or (9).
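The two features used throughout the case study can be computed from the data in a best basis as follows; this is a generic sketch, with kurtosis taken as the fourth standardised moment (the exact normalisation used in the paper is not stated).

```python
import numpy as np

def energy(x):
    """Signal energy of the data in one best basis."""
    x = np.asarray(x, dtype=float)
    return float(np.sum(x ** 2))

def kurtosis(x):
    """Fourth standardised moment (Pearson kurtosis; equals 3.0 for a Gaussian)."""
    x = np.asarray(x, dtype=float)
    m = x.mean()
    s2 = np.mean((x - m) ** 2)
    return float(np.mean((x - m) ** 4) / s2 ** 2)

x = np.array([1.0, -1.0, 1.0, -1.0])
print(energy(x), kurtosis(x))  # 4.0 1.0
```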


    Table 1

Discriminant distance (energy), listed by wavelet packet tree level

Level 0: 0.1865

Level 1: 0.3949 0.1024

Level 2: 0.2558 0.2758 0.1148 0.1021

Level 3: 0.4918 0.2043 0.2306 0.4812 0.0618 0.1430 0.0897 0.2886

    Table 2

Discriminant distance (kurtosis), listed by wavelet packet tree level

Level 0: 0.4898

Level 1: 0.2026 0.4791

Level 2: 0.3257 0.3362 0.1705 0.3982

Level 3: 0.0355 0.4888 0.4676 0.0236 0.1730 0.4503 0.3836 0.3728

[Fig. 3 tree diagram: selected nodes (1,0), (2,0), (2,1), (3,0), (3,3), (3,7).]

Fig. 3. Common best basis (energy).



    Table 3

    Decision weights (energy)

    Node (1,0) (2,0) (2,1) (3,0) (3,3) (3,7)

    Weight 0.1805 0.1169 0.1261 0.2248 0.2199 0.1319

    Table 4

    Decision weights (kurtosis)

    Node (0,0) (1,1) (2,3) (3,1) (3,2) (3,5)

    Weight 0.1766 0.1727 0.1436 0.1762 0.1686 0.1623

[Fig. 4 tree diagram: selected nodes (0,0), (1,1), (2,3), (3,1), (3,2), (3,5).]

Fig. 4. Common best basis (kurtosis).

Fig. 5. ASH estimated distribution (energy).


To obtain the class-conditional probabilities, the features were assumed to follow either a normal

distribution or an unknown distribution. For the normal distribution, the mean and variance are

estimated relatively easily. The averaged shifted histogram (ASH), a non-parametric estimation

technique, was used to estimate the unknown probability distribution [24].
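A compact sketch of the ASH estimator follows, using the usual construction (counts on narrow bins of width h/m combined with triangular weights, equivalent to averaging m shifted histograms of bin width h); the parameter values and data are illustrative, not those of the case study.

```python
import numpy as np

def ash(data, lo, hi, h=1.0, m=5):
    """Averaged shifted histogram: counts on narrow bins of width h/m are
    combined with triangular weights (1 - |i|/m)."""
    data = np.asarray(data, dtype=float)
    delta = h / m
    nbins = int(round((hi - lo) / delta))
    counts, edges = np.histogram(data, bins=nbins, range=(lo, hi))
    dens = np.zeros(nbins)
    for i in range(1 - m, m):
        shifted = np.roll(counts, i).astype(float)
        if i > 0:
            shifted[:i] = 0.0          # discard values wrapped around by roll
        elif i < 0:
            shifted[i:] = 0.0
        dens += (1.0 - abs(i) / m) * shifted
    dens /= len(data) * h              # normalise so the estimate integrates to ~1
    return edges[:-1] + delta / 2, dens

centres, dens = ash([4.9, 5.0, 5.1, 5.2], lo=0.0, hi=10.0)
print(round(float(dens.sum()) * 0.2, 6))  # integral of the estimate over the range
```

The triangular weighting smooths out the bin-origin sensitivity of a single histogram, which is why ASH is a convenient plug-in density estimate for Eq. (5).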


Fig. 6. Estimated normal distribution (energy).

Fig. 7. ASH estimated distribution (kurtosis).


    The misclassification rate was calculated using the winner-take-all principle.

$$r_i = 1 - \frac{1}{J}\sum_{j=1}^{J} b_{i,j}, \qquad (11)$$

where $b_{i,j}$ was 1 if the related result of Bayesian inference for class $i$ was maximum, and otherwise it

was 0.
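Eq. (11) can be sketched directly, with hypothetical posteriors for J = 4 test samples:

```python
import numpy as np

def misclassification_rate(posteriors, true_class):
    """Eq. (11): r_i = 1 - (1/J) * sum_j b_ij, where b_ij = 1 when the
    posterior of the true class i is maximal for test sample j."""
    P = np.asarray(posteriors, dtype=float)      # shape (J, number of classes)
    b = np.argmax(P, axis=1) == true_class
    return 1.0 - float(b.mean())

# Hypothetical posteriors for J = 4 test samples that all belong to class 0.
P = [[0.8, 0.1, 0.1],
     [0.6, 0.3, 0.1],
     [0.2, 0.7, 0.1],   # the one misclassified sample
     [0.5, 0.3, 0.2]]
print(misclassification_rate(P, 0))  # 0.25
```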

Table 5 shows that when either signal energy or signal kurtosis was employed as the feature, all 100

test signals in each class were correctly classified. However, employing energy as the feature

resulted in significantly higher decision confidences. It is concluded that signal energy is the better

feature choice. Another finding is that the classification results by ASH estimation are comparable with

those obtained by normal distribution estimation for each feature case. This finding suggests that

    both probability estimation methods work well for the case study.

For classification problems, a feature vector may alternatively be built such that its elements come

from different best bases. Instead of using the DDM approach, a final decision can be made directly from this feature vector. A matched classifier is required. However, if Bayesian

inference is used, the assumption of multivariate normality for the feature vectors is

always violated, resulting in unacceptable misclassifications. Non-parametric multivariate

probability density estimation is also difficult to implement in this case. As a comparison, a

6–nh–3 BP neural network [25,26] was designed. In the three-layer neural network, six input nodes corresponded to the features extracted from the six best bases, and three output nodes

corresponded to the three types of faults. The target outputs were [1, 0, 0], [0, 1, 0] and [0, 0, 1],

respectively. The number of hidden nodes, nh, was varied from 5 to 20 to reach an optimal design.

The signal features in each common best basis were concatenated into a normalised feature vector

which constructed the training and testing datasets. During network training, the cross-validation

technique [26] was used to prevent overfitting. Four-fifths of the training samples were used for


[Fig. 9: training and validation error (log scale) versus epoch, with the goal level marked.]

Fig. 9. Learning curve (energy).


    training and one-fifth was used for validation. The maximum iteration was 1000 and the target

    error was 0.00001. The training of the BP neural networks ceased when either the maximum

iteration or the target error was reached. Another criterion to stop training was cross-validation. The initial values of the weights and biases of the networks were randomly set. It was found

that nh = 15 generated the best results for energy feature vectors, and nh = 13 generated the best results for

kurtosis feature vectors. Figs. 9 and 10 present the learning curves. In our case study, cross-validation stopped the training of the BP neural networks with an acceptable total error lower than

    0.01. In addition, 230 and 108 epochs were required for the energy and kurtosis features,

    respectively. The classification results are also listed in Table 5. Similarly, the averaged confidence

    and misclassification rate were computed by Eqs. (10) and (11).
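The 6–nh–3 BP network can be sketched in outline as follows. The data here are synthetic 6-D clusters standing in for the bearing feature vectors, and the validation split and early-stopping logic described above are omitted for brevity; this is a minimal batch gradient-descent sketch, not the paper's trained network.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))

# Synthetic stand-in data: three well-separated 6-D clusters, one per fault class.
means = np.array([[0, 0, 0, 0, 0, 0],
                  [3, 3, 3, 0, 0, 0],
                  [0, 0, 0, 3, 3, 3]], dtype=float)
X = np.vstack([mu + 0.3 * rng.standard_normal((30, 6)) for mu in means])
T = np.repeat(np.eye(3), 30, axis=0)          # targets [1,0,0], [0,1,0], [0,0,1]

nh = 15                                        # hidden nodes (best for energy per the paper)
W1 = 0.1 * rng.standard_normal((6, nh)); b1 = np.zeros(nh)
W2 = 0.1 * rng.standard_normal((nh, 3)); b2 = np.zeros(3)
lr = 0.5

for epoch in range(2000):                      # batch gradient descent on MSE loss
    H = sigmoid(X @ W1 + b1)                   # hidden activations
    Y = sigmoid(H @ W2 + b2)                   # network outputs
    dY = (Y - T) * Y * (1 - Y)                 # output-layer delta
    dH = (dY @ W2.T) * H * (1 - H)             # back-propagated hidden delta
    W2 -= lr * H.T @ dY / len(X); b2 -= lr * dY.mean(axis=0)
    W1 -= lr * X.T @ dH / len(X); b1 -= lr * dH.mean(axis=0)

pred = np.argmax(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2), axis=1)
accuracy = float(np.mean(pred == np.argmax(T, axis=1)))
print(accuracy)
```

On such well-separated synthetic clusters the network fits the training set easily; the paper's harder comparison was against real concatenated bearing features, where the BP results deteriorated.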

From Table 5, it is clear that the BP networks relying on the concatenated feature vectors produced

deteriorated classification results. Misclassifications occurred in each fault class for both individual

features, and the signal kurtosis led to the poorer results. The comparison suggests that the

proposed integrated method significantly outperforms the adopted BP neural networks for

classification.

    5. Conclusion

    This work has presented an automatic fault classification technique based on the WPT and best

    basis selection. The novel approach performs integrated fault diagnosis based on vibration

    signals. The following conclusions are drawn:

(1) Signals can be classified (diagnosed) based on the best bases of WPT. For each best basis, its

capability to discriminate between classes served as the decision weight for final decision fusion.


[Fig. 10: training and validation error (log scale) versus epoch, with the goal level marked.]

Fig. 10. Learning curve (kurtosis).


    (2) Both signal energy and kurtosis can be used to classify the signals 100% correctly by the

    integrated method. Signal energy, however, resulted in higher decision confidences and is

    preferred.

(3) The probability estimation methods by ASH and normal estimation led to comparable results in the case study.

(4) BP neural networks employing concatenated feature vectors, with elements coming from

individual best bases, produced deteriorated classification results in terms of both misclassification rate

and decision confidence.

    References

[1] A. Davies, Handbook of Condition Monitoring: Techniques and Methodology, Chapman & Hall, UK, 1998.

[2] I. Daubechies, Ten lectures on wavelets, CBMS-NSF Regional Conference Series in Applied Mathematics, vol. 61,

    SIAM, Philadelphia, PA, 1992.

    [3] S. Mallat, A theory for multiresolution signal decomposition: the wavelet representation, IEEE Transactions on

Pattern Analysis and Machine Intelligence 11 (1989) 674–692.

    [4] B.K. Alsberg, A.M. Woodward, D.B. Kell, An introduction to wavelet transforms for chemometricians: a

time–frequency approach, Chemometrics and Intelligent Laboratory Systems 37 (1997) 215–239.

    [5] S.K. Goumas, M.E. Zervakis, G.S. Stavrakakis, Classification of washing machines vibration signals using discrete

    wavelet analysis for feature extraction, IEEE Transactions on Instrumentation and Measurement 51 (3) (2002)

497–508.

[6] G. Meltzer, Y.H. Ivanov, Fault detection in gear drives with non-stationary rotational speed, part II: the

time–frequency approach, Mechanical Systems and Signal Processing 17 (2) (2003) 273–283.

[7] N.G. Nikolaou, I.A. Antoniadis, Rolling element bearing fault diagnosis using wavelet packets, NDT&E International 35 (2002) 197–205.

    [8] S.L. Chen, Y.W. Jen, Data fusion neural network for tool condition monitoring in CNC milling machining,

International Journal of Machine Tools and Manufacture 40 (2000) 381–400.

    [9] D.E. Hershberger, H. Kargupta, Distributed multivariate regression using wavelet-based collective data mining,

Journal of Parallel and Distributed Computing 61 (2001) 372–400.

    [10] K. Mehmed, Data Mining: Concepts, Models, Methods and Algorithms, IEEE Press, Wiley, New York, 2002.

    [11] M. Cocchi, R. Seeber, A. Ulrici, WPTER: wavelet packet transform for efficient pattern recognition of signals,

Chemometrics and Intelligent Laboratory Systems 57 (2001) 97–119.

    [12] R.R. Coifman, M.V. Wickerhauser, Entropy-based algorithms for best basis selection, IEEE Transactions on

Information Theory 38 (2) (1992) 713–718.

    [13] B. Walczak, D.L. Massart, Wavelet packet transform applied to a set of signals: a new approach to the best-basis

selection, Chemometrics and Intelligent Laboratory Systems 38 (1997) 39–50.

    [14] N. Saito, R.R. Coifman, F.B. Geshwind, F. Warner, Discriminant feature extraction using empirical probability

density estimation and a local basis library, Pattern Recognition 35 (2002) 2841–2852.

    [15] Y. Wu, R. Du, Feature extraction and assessment using wavelet packets for monitoring of machining processes,

Mechanical Systems and Signal Processing 10 (1) (1996) 29–53.

    [16] S. Zhang, J. Mathew, L. Ma, Common best basis selection of wavelet packets for machine fault diagnosis,

Proceedings of the 10th Asia-Pacific Vibration Conference, 2003, pp. 835–840.

    [17] D.L. Hall, J. Llinas, Handbook of Multisensor Data Fusion, CRC Press, Boca Raton, FL, 2001.

    [18] B. Chen, P.K. Varshney, A Bayesian sampling approach to decision fusion using hierarchical model, IEEE

Transactions on Signal Processing 50 (8) (2002) 1809–1818.

    [19] J. Kittler, M. Hatef, R.P.W. Duin, J. Matas, On combining classifiers, IEEE Transactions on Pattern Analysis and

Machine Intelligence 20 (3) (1998) 226–239.


    [20] D.M.J. Tax, M.V. Breukelen, R.P.W. Duin, J. Kittler, Combining multiple classifiers by averaging or by

multiplying?, Pattern Recognition 33 (2000) 1475–1485.

[21] S. Prabhakar, A.K. Jain, Decision-level fusion in fingerprint verification, Pattern Recognition 35 (2002) 861–874.

[22] B. Samanta, K.R. Al-Balushi, Artificial neural network based fault diagnostics of rolling element bearings using time-domain features, Mechanical Systems and Signal Processing 17 (2) (2003) 317–328.

    [23] J. Shiroishi, Y. Li, S. Liang, T. Kurfess, S. Danyluk, Bearing condition diagnostics via vibration and acoustic

emission measurements, Mechanical Systems and Signal Processing 11 (5) (1997) 693–705.

    [24] W.L. Martinez, A.R. Martinez, Computational Statistics Handbook with MATLAB, Chapman & Hall/CRC,

    New York, 2002.

    [25] D. Rumelhart, G. Hinton, R. Williams, Learning representation by back-propagating errors, Nature 323 (1986)

533–536.

    [26] C.M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, Oxford, 1995.
