2009 IEEE/ASME International Conference on Advanced Intelligent Mechatronics (AIM), Suntec Convention and Exhibition Center, Singapore, July 14–17, 2009
Tool Wear Forecast Using Singular Value Decomposition for Dominant
Feature Identification
Chee Khiang Pang1,2, Jun-Hong Zhou3,4, Frank L. Lewis1, and Zhao-Wei Zhong4
Abstract— Identification and prediction of the lifetime of industrial cutting tools using minimal sensors is crucial to reduce production costs and down-time in engineering systems. In this paper, we provide a formal decision software tool to extract the dominant features enabling tool wear prediction. This decision tool is based on a formal mathematical approach that selects dominant features using the Singular Value Decomposition (SVD) of real-time measurements from the sensors of an industrial cutting tool. It is shown that the proposed method of dominant feature selection is optimal in the sense that it minimizes the least-squares estimation error. The identified dominant features are used with the Recursive Least Squares (RLS) algorithm to identify parameters in forecasting the time series of cutting tool wear on an industrial high speed milling machine.
I. INTRODUCTION
In an era of intensive competition, where asset usage and plant operating efficiency must be maximized, unexpected downtime due to machinery failure has become more costly than before. Therefore, predictive maintenance has been actively pursued in the manufacturing industry in recent years, where equipment outages are predicted and maintenance is carried out only when necessary. To ensure successful condition-based maintenance, it is necessary to detect, identify, and classify different kinds of failure modes in the manufacturing process.
Tool wear in industrial cutting machines is particularly
difficult to assess, as measurements of tool wear require that
the machine is stopped, and the tool extracted and visually
inspected. This results in machine down time and human user
intervention costs. On the other hand, failure to remove worn
tools can lead to their failure, with concomitant damage to
expensive parts. In recent years, much research work has been devoted to tool wear monitoring in the machining process using a set of reduced features [1][2][3]. In spite of these efforts, the accuracy of tool wear prediction still needs improvement.
In this paper, we provide a formal decision software tool
for selecting the dominant features that are most essential
in predicting time series of tool wear in industrial cutting
1 C. K. Pang and F. L. Lewis are with the Automation & Robotics Research Institute, The University of Texas at Arlington, Fort Worth, TX 76118, USA {ckpang,lewis}@uta.edu
2 C. K. Pang is with the Department of Electrical and Computer Engineering, National University of Singapore, Singapore 117576, Singapore
3 J.-H. Zhou is with A*STAR Singapore Institute of Manufacturing Technology, Singapore 638075, Singapore [email protected]
4 J.-H. Zhou and Z.-W. Zhong are with the School of Mechanical and Aerospace Engineering, Nanyang Technological University, Singapore 639798, Singapore [email protected]
machines. A rigorous mathematical framework for Dominant Feature Identification (DFI) is developed that provides an autonomous rule-base for data and sensor reduction.
We use Singular Value Decomposition (SVD) to decom-
pose the inner product matrix of collected data from the
sensors monitoring the wear of an industrial cutting tool.
The principal components affecting the machine wear are
optimized in a least squares sense in a certain reduced
space, and the dominant features are extracted using the K-
means clustering algorithm [4]. This DFI framework uses
formal mathematical analysis to select dominant features
based on the inner product matrix of the collected data, not
the correlation (outer product) matrix. It is proven that the
proposed method of dominant feature selection is optimal in
the sense that it minimizes the total least-squares estimation
error. The performance of DFI is evaluated based on the
accuracy of prediction of the actual tool wear. Comparisons
are made with another technique for feature selection in the
literature [5].
II. PRINCIPAL COMPONENT ANALYSIS (PCA)
In this section, the theoretical fundamentals of PCA using
SVD are reviewed in a rigorous manner, which is essential
for selecting dominant features in Section III.
The SVD of a linear transformation X ∈ R^{m×n} of rank n < m is

    X = U Σ V^T    (1)

with U ∈ R^{m×n} and V ∈ R^{n×n}, such that U^T U = V^T V = I_n, with I_n ∈ R^{n×n} being an identity matrix of dimension n. Σ ∈ R^{n×n} is a diagonal matrix whose elements are the corresponding singular values (principal gains) arranged in descending order, i.e., Σ = diag(σ_1, σ_2, ⋯, σ_n) with σ_1 ≥ σ_2 ≥ ⋯ ≥ σ_n > 0.
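As an illustrative sketch (assuming NumPy; the matrix here is random test data, not the paper's sensor measurements), the economy-size SVD in (1) and its stated properties can be checked directly:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 100, 16          # data space and feature space dimensions (n < m)
X = rng.standard_normal((m, n))

# Economy-size SVD: X = U @ diag(s) @ Vt, with U (m x n) and Vt (n x n)
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# U^T U = V^T V = I_n, and singular values are in descending order
assert np.allclose(U.T @ U, np.eye(n))
assert np.allclose(Vt @ Vt.T, np.eye(n))
assert np.all(np.diff(s) <= 0) and s[-1] > 0
assert np.allclose(U @ np.diag(s) @ Vt, X)
```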
A. Approximation of Linear Transformation X
X can be regarded as a transformation from feature space R^n into data space R^m. Note that X = Σ_{i=1}^n σ_i u_i v_i^T, where u_i are the column vectors of U and v_i^T are the row vectors of V^T, respectively. Partition the SVD of X according to

    X = [U_1  U_2] [Σ_1  0; 0  Σ_2] [V_1^T; V_2^T] = U_1 Σ_1 V_1^T + U_2 Σ_2 V_2^T    (2)

with q < m as the desired number of singular values to be retained in Σ_1 for data space of dimension m. As such, Σ_2 contains the n − q discarded singular values.
Obviously, U_1 ∈ R^{m×q}, U_2 ∈ R^{m×(n−q)}, Σ_1 ∈ R^{q×q}, Σ_2 ∈ R^{(n−q)×(n−q)}, V_1^T ∈ R^{q×n}, and V_2^T ∈ R^{(n−q)×n}.
Now the approximation X̄ to X is

    X̄ = U_1 Σ_1 V_1^T.    (3)

Then X̄ = Σ_{i=1}^q σ_i u_i v_i^T contains the columns u_i of U_1 and the rows v_i^T of V_1^T. The dominant singular values, i.e., the q retained singular values, and their associated columns of U are called principal components in PCA [6].
The error X̃ induced by the approximation X̄ of the linear transformation X is given by

    X̃ = X − X̄ = U_2 Σ_2 V_2^T = Σ_{i=q+1}^n σ_i u_i v_i^T.    (4)

The covariance matrix of the approximation error is

    P_X̃ = (X − X̄)(X − X̄)^T = U_2 Σ_2 V_2^T V_2 Σ_2 U_2^T = U_2 Σ_2^2 U_2^T.    (5)

The 2-norm of the approximation error is given by

    tr{P_X̃} = tr{U_2 Σ_2^2 U_2^T} = tr{Σ_2^2 U_2^T U_2} = tr{Σ_2^2} = Σ_{i=q+1}^n σ_i^2    (6)

where tr{·} denotes the trace operation. This is the sum of squares of the neglected singular values. It can be shown that this SVD approximation gives the Least-Square Error (LSE) of any approximation to X of rank q.
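The rank-q truncation of (3) and the error identity (6) can be sketched numerically (NumPy, random stand-in data; q = 3 is an arbitrary choice here):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, q = 100, 16, 3
X = rng.standard_normal((m, n))
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Rank-q approximation (3): keep only the q largest singular values
X_bar = U[:, :q] @ np.diag(s[:q]) @ Vt[:q, :]

# Squared Frobenius norm of the error equals the sum of squares of the
# neglected singular values, as in (6)
err = np.linalg.norm(X - X_bar, 'fro') ** 2
assert np.isclose(err, np.sum(s[q:] ** 2))
```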
B. Approximation in Range Space by Principal Components
Now, we identify R^n as the feature space and R^m as the range space of X, which we term the data space. Consider an arbitrary vector x ∈ R^n being mapped onto a vector z ∈ R^m by X according to z = Xx. As such, z in the singular value space (range space) R^m can also be represented according to the partitioned singular value matrix in (2) as

    z = Xx = U_1 Σ_1 V_1^T x + U_2 Σ_2 V_2^T x    (7)

with Σ_1 containing the retained q singular values of X.
An approximation to z is z̄, given in terms of the q retained singular values as

    z̄ = U_1 Σ_1 V_1^T x = X̄x    (8)

with X̄ being the approximation of X. Note that z̄ = Σ_{i=1}^q u_i (σ_i v_i^T x), which expresses z̄ as a linear combination of principal components u_i with coefficients (σ_i v_i^T x).
The approximation error is given by

    z̃ = z − z̄ = (X − X̄)x = U_2 Σ_2 V_2^T x    (9)

and the approximation error 2-norm is given by

    z̃^T z̃ = x^T V_2 Σ_2^2 V_2^T x
    ∴ ||z̃||^2 = tr{Σ_2^2 V_2^T x x^T V_2} ≤ σ_{q+1}^2 x^T V_2 V_2^T x = σ_{q+1}^2 ||x||^2 ≤ tr{Σ_2^2} ||x||^2.    (10)
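The error bounds of (10) can be checked on random stand-in data (NumPy; the small tolerance only guards against floating-point rounding):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, q = 100, 16, 3
X = rng.standard_normal((m, n))
U, s, Vt = np.linalg.svd(X, full_matrices=False)

x = rng.standard_normal(n)
z = X @ x
z_bar = U[:, :q] @ np.diag(s[:q]) @ Vt[:q, :] @ x   # approximation (8)
z_err = z - z_bar                                    # error (9)

# Both bounds of (10): sigma_{q+1}^2 ||x||^2 and tr{Sigma_2^2} ||x||^2
assert np.linalg.norm(z_err)**2 <= s[q]**2 * np.linalg.norm(x)**2 + 1e-9
assert np.linalg.norm(z_err)**2 <= np.sum(s[q:]**2) * np.linalg.norm(x)**2 + 1e-9
```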
III. DOMINANT FEATURE IDENTIFICATION (DFI)
In this section, the proposed Dominant Feature Identification (DFI) methodology of using SVD to identify the dominant features is detailed. Note that traditional PCA is performed with respect to the data space R^m, whereas the features reside in R^n.
A. Data Compression
Select

    Y = U_1^T X ∈ R^{q×n}    (11)

so that x ∈ R^n is mapped to y = Yx = U_1^T Xx ∈ R^q. Then vectors z = Xx ∈ R^m may be approximated in terms of vectors y ∈ R^q according to

    z̄ = U_1 y.    (12)

Combining with (2), we get an approximation of z ∈ R^m in terms of vectors in the reduced space R^q depicted by (11), viz.

    z̄ = U_1 y = U_1 U_1^T Xx = (U_1 U_1^T)(U_1 Σ_1 V_1^T + U_2 Σ_2 V_2^T)x = U_1 Σ_1 V_1^T x = X̄x    (13)

with X̄ depicted in (3).
It is well known in the literature that (11) and (13) provide the best approximation of data vectors in R^m in terms of reduced vectors y ∈ R^q. The approximation error z̃ is given by

    z̃ = (X − X̄)x = X̃x = U_2 Σ_2 V_2^T x    (14)

which is exactly as in (9). In fact, note that

    z̄^T z̃ = (U_1 Σ_1 V_1^T x)^T U_2 Σ_2 V_2^T x = x^T V_1 Σ_1 U_1^T U_2 Σ_2 V_2^T x = 0,    (15)

i.e., z̃ is orthogonal to z̄, which implies that (11) is the optimal choice of R^q for Least Squared Error (LSE) approximation of vectors in R^m by (12).
Note that approximation of z ∈ R^m using y ∈ R^q, the reduced space, is equivalent to approximation by principal components in (8).
B. Selection of Dominant Features
For any q > 0, the selection of the first q singular values of X yields a reduced space R^q generated by the linear transformation U_1^T : R^n → R^q as in (11). Moreover, y = Yx and

    z̄ = U_1 y    (16)

best approximates z = Xx in a least squares sense. This allows us to use reduced vectors y ∈ R^q to compute approximations to data vectors z ∈ R^m, instead of using the full feature vector x ∈ R^n.
Now we wish to approximate further by selecting the dominant features in R^n. That is, it is desired to select the most important basis vectors from R^n to approximate the data vectors z. To do so, it is instrumental to note the little-realized fact that

    Y = U_1^T X = U_1^T (U_1 Σ_1 V_1^T + U_2 Σ_2 V_2^T) = Σ_1 V_1^T.    (17)
The original basis vectors in feature space R^n are known as features, with the ith basis vector corresponding to the ith feature. The original basis vectors in R^n are denoted by {e_1, e_2, ⋯, e_n}, with e_i being the ith column of I_n, i.e., e_i is an n-vector consisting of zeros except for a one in the ith position.
In terms of these notions, the vector generated in R^q by the ith feature e_i ∈ R^n is given by

    Y e_i = U_1^T X e_i = Σ_1 V_1^T e_i.    (18)

Recall that the rows of V_1^T are denoted by row vectors v_i^T, i.e., the columns of V_1 are denoted by column vectors v_i. By contrast, denote now the columns of Σ_1 V_1^T as column vectors w_i ∈ R^q. Then

    Y e_i = Σ_1 V_1^T e_i = [w_1  w_2  ⋯  w_n] e_i = w_i.    (19)

Therefore, the ith feature in R^n maps into the reduced space R^q as the ith column of the matrix Σ_1 V_1^T. There are therefore n vectors w_i in R^q corresponding to the n basis axes e_i, i.e., features, in R^n.
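The identity (17) and the feature-to-column mapping (19) can be sketched as follows (NumPy, random stand-in data; the index 4 is an arbitrary example feature):

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, q = 100, 16, 3
X = rng.standard_normal((m, n))
U, s, Vt = np.linalg.svd(X, full_matrices=False)

Y = U[:, :q].T @ X                 # compressed data (11), a q x n matrix
W = np.diag(s[:q]) @ Vt[:q, :]     # columns w_i of Sigma_1 V_1^T

# By (17), Y = Sigma_1 V_1^T; by (19), feature e_i maps to column w_i
assert np.allclose(Y, W)
e_4 = np.eye(n)[:, 4]
assert np.allclose(Y @ e_4, W[:, 4])
```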
The above-mentioned notions of the proposed DFI algorithm are summarized in Fig. 1.

Fig. 1. Proposed DFI algorithm showing feature space R^n, compressed feature space R^q, and data (singular value) space R^m.
We now want to select the best features to retain so as to obtain the best approximation to z ∈ R^m. We call these dominant features. This corresponds to selecting which basis vectors e_i in R^n to retain, which is equivalent to selecting the best columns w_i of Σ_1 V_1^T ∈ R^{q×n}. This may be accomplished by several methods, including projections; we use clustering methods inspired by [5]. Then, z ∈ R^m will be approximated using the selected p dominant features within R^n.
Note that we will cluster the n columns w_i of Σ_1 V_1^T, as dictated by (19). This is in contrast to [5], who clustered the columns of U_1^T in (11). Clustering is the classification of n objects in a data set into p different subsets (clusters), usually by minimizing some norms or pre-defined performance indices. To select the dominant features in R^n, we cluster the n vectors w_i ∈ R^q into n ≥ p ≥ q clusters. For our application, the commonly used K-means algorithm is used [7]. The K-means algorithm iteratively minimizes the following positive semidefinite scalar error cost function

    J = Σ_{i=1}^p Σ_{w_j ∈ S_i} (w_j − c_i)^T (w_j − c_i)    (20)

where S_i is the ith cluster set, and c_i is its centroid (or center of “mass”) in the cluster space. J is in essence the expectation of the 2-norm (or Euclidean distance) between the objects in the cluster. For good approximations in R^m, one should select p > q, the number of retained singular values.
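A minimal K-means sketch over the columns w_i, minimizing the cost (20), might look like the following (NumPy only; the 3×16 matrix is random, standing in for the actual Σ_1 V_1^T, and this plain implementation is an assumption rather than the authors' code):

```python
import numpy as np

def kmeans(W, p, iters=100, seed=0):
    """Plain K-means on the columns of W (each column is one w_i),
    iteratively reducing the cost J of (20)."""
    rng = np.random.default_rng(seed)
    n = W.shape[1]
    centroids = W[:, rng.choice(n, p, replace=False)]
    for _ in range(iters):
        # Assign each column to its nearest centroid (Euclidean distance)
        d = np.linalg.norm(W[:, :, None] - centroids[:, None, :], axis=0)
        labels = np.argmin(d, axis=1)
        # Recompute centroids as cluster means (keep old one if cluster empty)
        new = np.column_stack([W[:, labels == i].mean(axis=1)
                               if np.any(labels == i) else centroids[:, i]
                               for i in range(p)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

rng = np.random.default_rng(4)
W = rng.standard_normal((3, 16))        # q = 3 rows, n = 16 columns w_i
labels, centroids = kmeans(W, p=4)
J = sum(np.sum((W[:, labels == i] - centroids[:, [i]]) ** 2)
        for i in range(4))              # cost (20) at convergence
assert labels.shape == (16,) and J >= 0
```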
C. Error Analysis
Here we determine the total error induced by retaining only q singular values and by clustering the vectors w_i ∈ R^q into p clusters. It is shown that the proposed method of DFI is optimal in the sense that it minimizes the total least-squares estimation error.
For each cluster, we shall select the vector w_i ∈ R^q closest to the cluster center c_i as representative of every other vector w_j ∈ R^q in that cluster. We call this representative vector for cluster i, w̄_i, the cluster leader. The p features e_i ∈ R^n corresponding to the p cluster leaders w̄_i ∈ R^q shall be selected as dominant features. This means that the clustering error is given by

    J = Σ_{i=1}^p Σ_{w_j ∈ S_i} (w_j − w̄_i)^T (w_j − w̄_i).    (21)
To summarize these notions, recall that

    Y = Σ_1 V_1^T = [w_1  w_2  ⋯  w_n]    (22)

and define

    Ȳ = [w̄_1  w̄_2  ⋯  w̄_n]    (23)

where w̄_j = w̄_i if w_j ∈ S_i, i.e., each vector w_j ∈ S_i is replaced by its cluster leader w̄_i. This means that only the corresponding features e_i ∈ R^n are needed for computation, since w̄_i = Σ_1 V_1^T e_i.
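Picking cluster leaders and forming Ȳ as in (23) can be sketched as follows (NumPy; the cluster assignment here is a deterministic stand-in, whereas in practice it comes from K-means):

```python
import numpy as np

rng = np.random.default_rng(5)
q, n, p = 3, 16, 4
W = rng.standard_normal((q, n))        # columns w_i of Sigma_1 V_1^T
labels = np.arange(n) % p              # stand-in assignment (K-means in practice)
centroids = np.column_stack([W[:, labels == i].mean(axis=1) for i in range(p)])

# Cluster leader: the actual column w_i nearest to each cluster's centroid
leaders = np.empty(p, dtype=int)
for i in range(p):
    members = np.flatnonzero(labels == i)
    dist = np.linalg.norm(W[:, members] - centroids[:, [i]], axis=0)
    leaders[i] = members[np.argmin(dist)]

# Y_bar of (23): every column w_j replaced by its cluster leader's column
Y_bar = W[:, leaders[labels]]
# The p leader indices identify the dominant features e_i
dominant_features = leaders
assert Y_bar.shape == W.shape
```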
Note that by (19)

    y = Σ_1 V_1^T x = Yx = Σ_{j=1}^n w_j x_j = Σ_{j=1}^n x_j [Σ_1 V_1^T e_j]    (24)

and define

    ȳ = Ȳx = Σ_{j=1}^n w̄_j x_j = Σ_{i=1}^p (Σ_{w_j ∈ S_i} x_j) w̄_i.    (25)

Then an estimate for z ∈ R^m taking into account both the q < n retained singular values and the p < n features is given by

    ẑ = U_1 ȳ.    (26)

Recall from (16) that z̄ = U_1 y, so the error induced by clustering is

    z̄ − ẑ = U_1(y − ȳ) = U_1(Y − Ȳ)x = U_1 Ỹ x.    (27)
Therefore, the error norm induced by clustering is

    ||z̄ − ẑ||^2 = (z̄ − ẑ)^T (z̄ − ẑ) = x^T Ỹ^T U_1^T U_1 Ỹ x = tr{Ỹ^T U_1^T U_1 Ỹ x x^T} ≤ J ||x||^2    (28)

since J = tr{Ỹ^T Ỹ}.
The total error induced by neglecting the n − q singular values in Σ_2 and by clustering is then

    z − ẑ = (z − z̄) + (z̄ − ẑ) = U_2 Σ_2 V_2^T x + U_1 Ỹ x.    (29)

Therefore, the total approximation error norm is

    ||z − ẑ||^2 ≤ (tr{Σ_2^2} + J) ||x||^2    (30)

whose first term depends on the neglected singular values, and whose second term is the clustering error, i.e., the neglected features.
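The total error bound (30) can be verified numerically (NumPy, random stand-in data; the clustering here is a simple deterministic stand-in, which is sufficient since the bound holds for any choice of Ȳ):

```python
import numpy as np

rng = np.random.default_rng(6)
m, n, q, p = 100, 16, 3, 4
X = rng.standard_normal((m, n))
U, s, Vt = np.linalg.svd(X, full_matrices=False)
U1 = U[:, :q]
Y = np.diag(s[:q]) @ Vt[:q, :]                  # (22)

labels = np.arange(n) % p                       # stand-in clustering
leaders = np.array([np.flatnonzero(labels == i)[0] for i in range(p)])
Y_bar = Y[:, leaders[labels]]                   # (23)

x = rng.standard_normal(n)
z = X @ x
z_hat = U1 @ (Y_bar @ x)                        # estimate (26)

J = np.sum((Y - Y_bar) ** 2)                    # clustering error, tr{Yt^T Yt}
bound = (np.sum(s[q:] ** 2) + J) * np.linalg.norm(x) ** 2
assert np.linalg.norm(z - z_hat) ** 2 <= bound + 1e-9   # bound (30)
```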
We claim that the procedure of first selecting q principal components and then selecting p dominant features yields the minimum overall approximation error in (29). For note that

    z̃^T (z̄ − ẑ) = (U_2 Σ_2 V_2^T x)^T U_1 Ỹ x = x^T V_2 Σ_2 U_2^T U_1 Ỹ x = 0,    (31)

i.e., the error in neglecting the n − q singular values and the clustering error are orthogonal. This means that there is no better way of selecting dominant features than the DFI methodology proposed herein.
D. Simplified Computations
Traditional PCA relies on computations using the correlation matrix XX^T = U Σ^2 U^T ∈ R^{m×m} [6]. This is computationally expensive since generally m ≫ n.
Defining the inner product matrix as X^T X, we get

    X^T X = V Σ U^T U Σ V^T = V Σ^2 V^T.    (32)

As X^T X ∈ R^{n×n} with n ≪ m, the computation of Σ_1 and V_1^T required to find Y in (17) is highly simplified using (32).
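The inner-product shortcut (32) amounts to an eigendecomposition of the small n×n matrix X^T X; a sketch (NumPy, random stand-in data) confirming it recovers the same singular values as the full SVD:

```python
import numpy as np

rng = np.random.default_rng(7)
m, n = 100, 16
X = rng.standard_normal((m, n))

# Eigendecomposition of the n x n inner product matrix X^T X = V Sigma^2 V^T
evals, V = np.linalg.eigh(X.T @ X)
order = np.argsort(evals)[::-1]                 # descending order
sigma = np.sqrt(np.maximum(evals[order], 0.0))  # singular values of X
V = V[:, order]

# Matches the singular values from the (more expensive) m x n SVD of X
_, s, _ = np.linalg.svd(X, full_matrices=False)
assert np.allclose(sigma, s)
```

Note that the columns of V from `eigh` may differ in sign from those of the SVD, which is immaterial for selecting dominant features.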
IV. TIME SERIES FORECASTING
We now wish to apply DFI to prediction of tool wear in
an industrial cutting tool. Tool wear can only be measured
by removing the tool and performing visual inspection and
measurement, which is tedious and time consuming and
results in down-time for the machine. We wish to predict tool
wear using signals that are easily monitored in real time.
In our application, sensors are used to measure cutting
force online in real time. Then, signal processing is used to
compute n features, such as mean force value, maximum
force level, standard deviation, third moment (skew), etc.
Data is taken over N time steps and stored, and consists
of the values of the n features at each time step, along with
the tool wear measured (by visual inspection) at each time
step.
Partition the collected data set over N time steps into two sets. The data through time m < N is used as a training set to compute the p dominant features and determine the unknown parameter vector θ. Then the remaining data from time m + 1 to time N is used as a validation set to verify the prediction accuracy. In our application, N = 20 × 10^3 samples, so a reasonable choice for m is m = 15 × 10^3 samples. The number of features computed is n = 16. These are described in Section V.
A. Determining the Dominant Features
Define

    X ≡ [f_1^T; f_2^T; ⋮; f_k^T] ∈ R^{m×n}    (33)

in terms of the collected data from the installed sensors through time m. Use the machinery in Sections II and III to select the p dominant features.
B. Prediction of Tool Wear
Define φ_k ∈ R^p as the vector containing the measured dominant features at time k ≤ m. Note that φ_k is a p-subvector of f_k. Then one desires to predict the tool wear d_k in terms of the p dominant features using

    d_k = φ_k^T θ.    (34)

To do so, one can estimate the parameter vector θ using the measured data with least-squares techniques. Note that only the p dominant features are used, i.e., θ ∈ R^p.
Define

    Φ_k θ ≡ [φ_1^T; φ_2^T; ⋮; φ_k^T] θ = [d_1; d_2; ⋮; d_k] ≡ D_k    (35)
in terms of the data collected. The estimation error through time k is

    E_k = D_k − Φ_k θ    (36)

where θ ∈ R^p is the vector of unknowns to be regressed for time series forecasting of tool wear.
A least squares estimate of θ which minimizes the error norm E_k^T E_k is given by the standard unique batch solution

    θ = (Φ_k^T Φ_k)^{−1} Φ_k^T D_k    (37)

if there is sufficient persistent excitation, i.e., Φ_k^T Φ_k is invertible. To compute θ using efficient on-line recursive means, one may use Recursive Least Squares (RLS) instead of (37).
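A minimal RLS sketch for estimating θ in the model (34) might look like the following (NumPy; the synthetic regressors and "true" θ are made up for illustration, not the paper's cutting-force data):

```python
import numpy as np

def rls_update(theta, P, phi, d, lam=1.0):
    """One Recursive Least Squares step for d_k = phi_k^T theta.
    theta: current estimate (p,), P: inverse-covariance-like matrix (p, p),
    phi: regressor of dominant features (p,), d: measured tool wear,
    lam: forgetting factor (1.0 = no forgetting)."""
    Pphi = P @ phi
    k = Pphi / (lam + phi @ Pphi)            # gain vector
    theta = theta + k * (d - phi @ theta)    # correct with the innovation
    P = (P - np.outer(k, Pphi)) / lam        # update P
    return theta, P

# Sketch on noiseless synthetic data: RLS recovers the true parameters
rng = np.random.default_rng(8)
p, N = 4, 200
theta_true = rng.standard_normal(p)
theta, P = np.zeros(p), 1e6 * np.eye(p)      # large P: uninformative prior
for _ in range(N):
    phi = rng.standard_normal(p)
    d = phi @ theta_true
    theta, P = rls_update(theta, P, phi, d)
assert np.allclose(theta, theta_true, atol=1e-4)
```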
V. INDUSTRIAL APPLICATION
A case study is carried out to verify the usability of the
method. An application related to high speed machining tool
condition is selected for the experiment.
A. Experimental Setup
In our experiment, we used a nose ball cutter in a milling
machine as the test bed. The cutting force signal is used to
establish usable models due to its high sensitivity to tool
wear, low noise, and good measurement accuracy [8]. The
cutting forces along the X, Y, and Z axes were captured
using a Kistler dynamometer in the form of charges, which
were converted to voltages and sampled by a PCI 1200 board
at 2000 Hz. The flank wear of each individual tooth of the
cutting tool was measured with an Olympus microscope. The
experimental set up is shown in Fig. 2.
Fig. 2. Experimental setup.
B. Selection of Dominant Features
Sixteen features from the methodologies mentioned above are summarized in Table I and form the scope of the feature subset selection. To avoid the loss in productivity that results from stopping the machine, the tool wear and part failures are estimated online without ceasing operation of the cutting tool.
TABLE I
FEATURES AND NOMENCLATURE
No  Feature                                        Notation
1   Residual error                                 re
2   First order differencing                       fod
3   Second order differencing                      sod
4   Maximum force level                            fm
5   Total amplitude of cutting force               fa
6   Combined incremental force changes             df
7   Amplitude ratio                                ra
8   Standard deviation of force components         fstd
    in tool breakage zone
9   Sum of the squares of residual errors          sre
10  Peak rate of cutting forces                    kpr
11  Total harmonic power                           thp
12  Average force                                  fca
13  Variable force                                 vf
14  Standard deviation                             std
15  Skew (3rd moment)                              skew
16  Kurtosis (4th moment)                          kts
TABLE II
PRINCIPAL COMPONENTS AND SINGULAR VALUES
No.  Singular values   No.  Singular values
1    31.90702           9   0.00001
2    1.082043          10   3.07857×10^−6
3    0.00342           11   1.13154×10^−6
4    0.00026           12   6.45546×10^−7
5    0.00011           13   4.62940×10^−7
6    0.00005           14   1.71882×10^−7
7    0.00005           15   4.60490×10^−9
8    0.00001           16   2.02272×10^−9
C. Calculation of Dominant Features
Based on the above, the following five steps are involved in the proposed DFI to obtain a feature subset:
1) Data acquisition. Collect data on the time interval [0, m] from the cutting tools’ sensors and compute n features using digital signal processing techniques. Pack the n time series, each of length m, as columns into matrix X ∈ R^{m×n}.
2) Initialization. Detrend the data in X by subtracting the mean (average across each dimension) of each of the dimensions, resulting in a zero-mean data set.
3) Choice of number of principal components and clusters. Perform SVD on the inner product matrix X^T X to obtain Σ_1 and V_1^T. Select the desired number q of principal components σ_i v_i and the number of clusters p.
4) Clustering. Use the K-means algorithm for clustering to find the centroids c_i of each cluster.
5) Subset determination. Select the vector w_i “nearest” to the centroid of each cluster as its cluster leader w̄_i, and the corresponding e_i as a dominant feature. Combine the p dominant features to form the reduced feature space.
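The five steps above can be sketched end-to-end as follows (NumPy only; the input matrix is random stand-in data, and this compact implementation is an illustrative assumption, not the authors' software):

```python
import numpy as np

def dfi(X, q, p, seed=0):
    """Sketch of the five DFI steps: detrend, SVD via the inner product
    matrix (32), K-means on the columns of Sigma_1 V_1^T, and selection
    of cluster leaders as dominant feature indices."""
    rng = np.random.default_rng(seed)
    X = X - X.mean(axis=0)                         # step 2: zero-mean per feature
    evals, V = np.linalg.eigh(X.T @ X)             # step 3: SVD via X^T X (32)
    order = np.argsort(evals)[::-1]
    sigma = np.sqrt(np.maximum(evals[order], 0.0))
    W = np.diag(sigma[:q]) @ V[:, order][:, :q].T  # columns w_i of Sigma_1 V_1^T
    # Step 4: K-means on the n columns of W
    n = W.shape[1]
    centroids = W[:, rng.choice(n, p, replace=False)]
    for _ in range(100):
        d = np.linalg.norm(W[:, :, None] - centroids[:, None, :], axis=0)
        labels = np.argmin(d, axis=1)
        new = np.column_stack([W[:, labels == i].mean(axis=1)
                               if np.any(labels == i) else centroids[:, i]
                               for i in range(p)])
        if np.allclose(new, centroids):
            break
        centroids = new
    # Step 5: cluster leaders -> dominant feature indices
    leaders = []
    for i in range(p):
        members = np.flatnonzero(labels == i)
        if members.size:
            dist = np.linalg.norm(W[:, members] - centroids[:, [i]], axis=0)
            leaders.append(int(members[np.argmin(dist)]))
    return sorted(set(leaders))

X = np.random.default_rng(9).standard_normal((500, 16))  # step 1 stand-in data
features = dfi(X, q=3, p=4)
assert all(0 <= f < 16 for f in features) and 1 <= len(features) <= 4
```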
VI. EXPERIMENTAL RESULTS
In the experiment, 52,800 time points of measured force
sensor data were captured under the following machine
settings: spindle speed 1000 rpm, feed rate 200 mm/min,
depth of cut 1 mm, and insert number 2. This yields the
baseline actual tool wear plot shown in Figs. 3–4, and the
resulting singular values are shown in Table II.
Using the RLS techniques of Section IV, a Multiple
Regression Model (MRM) (34) was identified to predict the
baseline measured tool wear using all sixteen of the original
features. Fig. 3 shows the actual measured tool wear and the
predicted tool wear using this MRM as functions of time.
A Mean Relative Error (MRE) of 8.8% is observed and
represents our best possible prediction of tool wear using
this set of sixteen features.
Next, we select q = 3 retained singular values and p = 4 dominant features; the result is shown in Fig. 4. The tool wear prediction MRE is 11.12%, which is close to the MRE of 8.8% obtained using all sixteen features. The four dominant features turn out to be {fa, fca, fstd, thp}.
Fig. 3. MRM using all sixteen features and the RLS algorithm.
Fig. 4. Examples of MRMs using three retained singular values, four dominant features, and the RLS algorithm.
Also, we compared our DFI method to the PFA method
of [5]. Fig. 4 shows the actual measured tool wear, and
its prediction using the best four features selected by DFI
and PFA. The MRE for DFI is 11.12%, while the MRE for PFA is 13.18%. Comparison results using three retained singular values are shown in Table III for the DFI methodology and in Table IV for the PFA method, respectively.
TABLE III
RESULTS OF DFI METHOD USING THREE RETAINED SINGULAR
VALUES
No. used  Dominant features selected                   MSE (mm^2)  MRE (%)
4         fa, fca, fstd, thp                           1.262       11.61
5         fa, fca, fm, skew, thp                       1.202       11.19
6         fa, fca, fstd, ra, sre, thp                  1.111       10.49
7         fa, fca, fstd, ra, skew, sod, thp            1.111       10.40
8         fa, fca, fstd, kts, ra, skew, sod, thp       0.946       8.86
9         fa, fca, fstd, kpr, kts, ra, skew, thp, vf   0.946       8.86
The features chosen by the proposed DFI methodology and by PFA are different. DFI gives a smaller MSE and MRE than PFA, i.e., better accuracy in tool wear prediction. When the number of dominant features chosen reaches eight or more, the improvements in MSE and MRE from using additional dominant features are insignificant. The MRE using eight dominant features is 8.86%, which is very close to the MRE
TABLE IV
RESULTS OF PFA METHOD USING THREE RETAINED SINGULAR
VALUES
No. used  Dominant features selected                     MSE (mm^2)  MRE (%)
4         fa, fca, ra, thp                               1.40        13.18
5         fa, fca, kts, ra, thp                          1.365       12.89
6         fa, kts, re, skew, std, thp                    1.133       10.53
7         fa, fca, fstd, kts, skew, thp, vf              1.130       10.46
8         fa, fca, fstd, kts, ra, re, skew, thp          0.949       8.88
9         fa, fca, fm, fstd, kts, ra, re, skew, thp      0.948       8.88
value of 8.80% obtained using all sixteen original features. Our experiments also showed that building MRMs with the eight dominant features selected by the proposed DFI method saves about 60% of the computational time needed to build the models with the original sixteen features.
VII. CONCLUSION
In this paper, the Dominant Feature Identification (DFI) methodology using the Singular Value Decomposition (SVD) of collected tool wear data is proposed for Recursive Least Squares (RLS) prediction of the time series of deterioration of an industrial cutting tool. DFI uses an SVD that operates on the inner product matrix, which is of lower dimension, and minimizes the Least Squares Error (LSE) induced when selecting the principal components and clustering to identify the dominant features. Our experimental results show Mean Squared Error (MSE) values from 0.946 mm^2 to 1.262 mm^2 and Mean Relative Error (MRE) values from 8.86% to 11.61% between the actual measured tool wear and that predicted by the Multiple Regression Models (MRMs) using RLS.
ACKNOWLEDGEMENTS
This work was supported by ARO grant W91NF-05-1-0314 and NSF grant ECCS-0801330, with the kind support of Alignment Tool (Singapore) Pte. Ltd.
REFERENCES
[1] J. Sun, G. S. Hong, M. Rahman, and Y. S. Wong, “Identification of Feature Set for Effective Tool Condition Monitoring by Acoustic Emission Sensing,” International Journal of Production Research, Vol. 42, No. 5, pp. 901–918, 2004.
[2] K. Z. Mao, “Identifying Critical Variables of Principal Components for Unsupervised Feature Selection,” IEEE Trans. Syst., Man, Cybern. B, Vol. 35, No. 2, pp. 339–344, April 2005.
[3] A. Malhi and R. X. Gao, “PCA-Based Feature Selection Scheme for Machine Defect Classification,” IEEE Trans. Instrum. Meas., Vol. 53, No. 6, pp. 1517–1525, December 2004.
[4] S. P. Lloyd, “Least Squares Quantization in PCM,” IEEE Trans. Inform. Theory (Special Issue on Quantization), Vol. 28, No. 2, pp. 129–137, March 1982.
[5] I. Cohen, Q. Tian, X. S. Zhou, and T. S. Huang, “Feature Selection Using Principal Feature Analysis,” in Proc. of the 15th International Conference on Image Processing, Rochester, NY, USA, September 22–25, 2002.
[6] I. T. Jolliffe, Principal Component Analysis, Springer-Verlag, New York, 1986.
[7] J. MacQueen, “Some Methods for Classification and Analysis of Observations,” in Proc. Fifth Berkeley Symp. on Math. Statist. and Prob., Vol. 1, pp. 281–297, Univ. of Calif. Press, 1967.
[8] Y. Altintas and I. Yellowley, “In-Process Detection of Tool Failure in Milling Using Cutting Force Models,” Journal of Engineering for Industry, Vol. 111, pp. 149–157, May 1989.