
Tool Wear Forecast Using Singular Value Decomposition for Dominant Feature Identification

Chee Khiang Pang^{1,2}, Jun-Hong Zhou^{3,4}, Frank L. Lewis^1, and Zhao-Wei Zhong^4

Abstract— Identification and prediction of the lifetime of industrial cutting tools using minimal sensors is crucial to reduce production costs and down-time in engineering systems. In this paper, we provide a formal decision software tool to extract the dominant features enabling tool wear prediction. This decision tool is based on a formal mathematical approach that selects dominant features using the Singular Value Decomposition (SVD) of real-time measurements from the sensors of an industrial cutting tool. It is shown that the proposed method of dominant feature selection is optimal in the sense that it minimizes the least-squares estimation error. The identified dominant features are used with the Recursive Least Squares (RLS) algorithm to identify parameters in forecasting the time series of cutting tool wear on an industrial high speed milling machine.

I. INTRODUCTION

In an era of intensive competition, where asset usage and plant operating efficiency must be maximized, unexpected downtime due to machinery failure has become more costly than before. Therefore, predictive maintenance has been actively pursued in the manufacturing industry in recent years, where equipment outages are predicted and maintenance is carried out only when necessary. To ensure successful condition-based maintenance, it is necessary to detect, identify, and classify different kinds of failure modes in the manufacturing process.

Tool wear in industrial cutting machines is particularly difficult to assess, as measuring tool wear requires that the machine be stopped and the tool extracted and visually inspected. This results in machine down-time and human intervention costs. On the other hand, failure to remove worn tools can lead to tool breakage, with concomitant damage to expensive parts. In recent years, much research has been devoted to tool wear monitoring in the machining process using a set of reduced features [1][2][3]. In spite of these efforts, the accuracy of tool wear prediction still needs improvement.

In this paper, we provide a formal decision software tool for selecting the dominant features that are most essential in predicting the time series of tool wear in industrial cutting machines. A rigorous mathematical framework for Dominant Feature Identification (DFI) is developed that provides an autonomous rule-base for data and sensor reduction. We use Singular Value Decomposition (SVD) to decompose the inner product matrix of data collected from the sensors monitoring the wear of an industrial cutting tool. The principal components affecting the machine wear are optimized in a least-squares sense in a certain reduced space, and the dominant features are extracted using the K-means clustering algorithm [4]. This DFI framework uses formal mathematical analysis to select dominant features based on the inner product matrix of the collected data, not the correlation (outer product) matrix. It is proven that the proposed method of dominant feature selection is optimal in the sense that it minimizes the total least-squares estimation error. The performance of DFI is evaluated based on the accuracy of prediction of the actual tool wear. Comparisons are made with another feature selection technique from the literature [5].

1 C. K. Pang and F. L. Lewis are with the Automation & Robotics Research Institute, The University of Texas at Arlington, Fort Worth, TX 76118, USA {ckpang,lewis}@uta.edu
2 C. K. Pang is with the Department of Electrical and Computer Engineering, National University of Singapore, Singapore 117576, Singapore
3 J.-H. Zhou is with the A*STAR Singapore Institute of Manufacturing Technology, Singapore 638075, Singapore [email protected]
4 J.-H. Zhou and Z.-W. Zhong are with the School of Mechanical and Aerospace Engineering, Nanyang Technological University, Singapore 639798, Singapore [email protected]

II. PRINCIPAL COMPONENT ANALYSIS (PCA)

In this section, the theoretical fundamentals of PCA using SVD are reviewed in a rigorous manner, which is essential for selecting dominant features in Section III.

The SVD of a linear transformation X ∈ R^{m×n} of rank n < m is

    X = U Σ V^T    (1)

with U ∈ R^{m×n} and V ∈ R^{n×n}, such that U^T U = V^T V = I_n, with I_n ∈ R^{n×n} being an identity matrix of dimension n. Σ ∈ R^{n×n} is a diagonal matrix whose elements are the corresponding singular values (principal gains) arranged in descending order, i.e., Σ = diag(σ_1, σ_2, ···, σ_n) with σ_1 ≥ σ_2 ≥ ··· ≥ σ_n > 0.
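As a concrete illustration (not part of the original paper's tooling), the following NumPy sketch with synthetic data verifies the properties of the economy-size SVD in (1); the dimensions m = 100 and n = 16 are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 100, 16                                  # data length m, feature count n (m > n)
X = rng.standard_normal((m, n))

# Economy-size SVD: U is m x n, s holds the n singular values, Vt is n x n.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

assert np.allclose(U.T @ U, np.eye(n))          # U^T U = I_n
assert np.allclose(Vt @ Vt.T, np.eye(n))        # V^T V = I_n
assert np.all(np.diff(s) <= 0) and s[-1] > 0    # descending, positive singular values
assert np.allclose(U @ np.diag(s) @ Vt, X)      # X = U Σ V^T, eq. (1)
```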

A. Approximation of Linear Transformation X

X can be regarded as a transformation from feature space R^n into data space R^m. Note that X = ∑_{i=1}^{n} σ_i u_i v_i^T, where u_i are the column vectors of U and v_i^T are the row vectors of V^T, respectively. Partition the SVD of X according to

    X = [U_1  U_2] [Σ_1  0; 0  Σ_2] [V_1^T; V_2^T] = U_1 Σ_1 V_1^T + U_2 Σ_2 V_2^T    (2)

with q < m as the desired number of singular values to be retained in Σ_1 for the data space of dimension m. As such, Σ_2 contains the n − q discarded singular values.


Obviously, U_1 ∈ R^{m×q}, U_2 ∈ R^{m×(n−q)}, Σ_1 ∈ R^{q×q}, Σ_2 ∈ R^{(n−q)×(n−q)}, V_1^T ∈ R^{q×n}, and V_2^T ∈ R^{(n−q)×n}.

Now the approximation X̄ to X is

    X̄ = U_1 Σ_1 V_1^T.    (3)

Then X̄ = ∑_{i=1}^{q} σ_i u_i v_i^T contains the columns u_i of U_1 and the rows v_i^T of V_1^T. The dominant singular values, i.e., the q retained singular values, and their associated columns of U are called principal components in PCA [6].

The error X̃ induced by the approximation X̄ of the linear transformation X is given by

    X̃ = X − X̄ = U_2 Σ_2 V_2^T = ∑_{i=q+1}^{n} σ_i u_i v_i^T.    (4)

The covariance matrix of the approximation error is

    P_X̃ = (X − X̄)(X − X̄)^T = U_2 Σ_2 V_2^T V_2 Σ_2 U_2^T = U_2 Σ_2² U_2^T.    (5)

The 2-norm of the approximation error is given by

    tr{P_X̃} = tr{U_2 Σ_2² U_2^T} = tr{Σ_2² U_2^T U_2} = tr{Σ_2²} = ∑_{i=q+1}^{n} σ_i²    (6)

where tr{·} denotes the trace operation. This is the sum of the squares of the neglected singular values. It can be shown that this SVD approximation gives the Least-Square Error (LSE) of any approximation to X of rank q.
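The following sketch (again with assumed synthetic data) checks that the squared error of the rank-q approximation in (3) equals the sum of the squared discarded singular values, as stated in (6):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, q = 100, 16, 3
X = rng.standard_normal((m, n))

U, s, Vt = np.linalg.svd(X, full_matrices=False)
X_bar = U[:, :q] @ np.diag(s[:q]) @ Vt[:q, :]   # X̄ = U_1 Σ_1 V_1^T, eq. (3)

# Squared Frobenius norm of the error equals the sum of the squared
# discarded singular values, eq. (6).
err = np.linalg.norm(X - X_bar, 'fro')**2
assert np.isclose(err, np.sum(s[q:]**2))
```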

B. Approximation in Range Space by Principal Components

Now, we identify R^n as the feature space and R^m as the range space of X, which we term the data space. Consider an arbitrary vector x ∈ R^n being mapped onto a vector z ∈ R^m by X according to z = Xx. As such, z in the singular value space (range space) of R^m can also be represented according to the partitioned singular value matrix in (2) as

    z = Xx = U_1 Σ_1 V_1^T x + U_2 Σ_2 V_2^T x    (7)

with Σ_1 containing the q retained singular values of X.

An approximation to z is z̄, given in terms of the q retained singular values as

    z̄ = U_1 Σ_1 V_1^T x = X̄x    (8)

with X̄ being the approximation of X. Note that z̄ = ∑_{i=1}^{q} u_i (σ_i v_i^T x), which expresses z̄ as a linear combination of the principal components u_i with coefficients (σ_i v_i^T x).

The approximation error is given by

    z̃ = z − z̄ = (X − X̄)x = U_2 Σ_2 V_2^T x    (9)

and the approximation error 2-norm is given by

    z̃^T z̃ = x^T V_2 Σ_2² V_2^T x
    ∴ ||z̃||² = tr{Σ_2² V_2^T x x^T V_2} ≤ σ²_{q+1} x^T V_2 V_2^T x = σ²_{q+1} ||x||² ≤ tr{Σ_2²} ||x||².    (10)
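A minimal numerical check of the bounds in (10), under the same synthetic-data assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, q = 100, 16, 3
X = rng.standard_normal((m, n))
U, s, Vt = np.linalg.svd(X, full_matrices=False)

x = rng.standard_normal(n)                          # arbitrary feature vector
z = X @ x                                           # z = X x
z_bar = U[:, :q] @ np.diag(s[:q]) @ Vt[:q, :] @ x   # z̄ = U_1 Σ_1 V_1^T x, eq. (8)

err2 = np.sum((z - z_bar)**2)                       # ||z̃||²
assert err2 <= s[q]**2 * (x @ x) + 1e-9             # ≤ σ²_{q+1} ||x||²
assert err2 <= np.sum(s[q:]**2) * (x @ x) + 1e-9    # ≤ tr{Σ_2²} ||x||²
```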

III. DOMINANT FEATURE IDENTIFICATION (DFI)

In this section, the proposed Dominant Feature Identification (DFI) methodology of using SVD to identify the dominant features is detailed. Note that traditional PCA is performed with respect to the data space R^m; the features, however, reside in R^n.

A. Data Compression

Select

    Y = U_1^T X ∈ R^{q×n}    (11)

so that x ∈ R^n is mapped to y = Yx = U_1^T X x ∈ R^q. Then vectors z = Xx ∈ R^m may be approximated in terms of vectors y ∈ R^q according to

    z̄ = U_1 y.    (12)

Combining with (2), we get an approximation of z ∈ R^m in terms of vectors in the reduced space R^q depicted by (11), viz.

    z̄ = U_1 y = U_1 U_1^T X x = (U_1 U_1^T)(U_1 Σ_1 V_1^T + U_2 Σ_2 V_2^T)x = U_1 Σ_1 V_1^T x = X̄x    (13)

with X̄ depicted in (3).

It is well known in the literature that (11) and (13) provide the best approximation of data vectors in R^m in terms of reduced vectors y ∈ R^q. The approximation error z̃ is given by

    z̃ = (X − X̄)x = X̃x = U_2 Σ_2 V_2^T x    (14)

which is exactly as in (9). In fact, note that

    z̄^T z̃ = (U_1 Σ_1 V_1^T x)^T U_2 Σ_2 V_2^T x = x^T V_1 Σ_1 U_1^T U_2 Σ_2 V_2^T x = 0,    (15)

i.e., z̃ is orthogonal to z̄, which implies that (11) is the optimal choice of R^q for Least Squared Error (LSE) approximation of vectors in R^m by (12).

Note that approximation of z ∈ R^m using y ∈ R^q, the reduced space, is equivalent to approximation by principal components in (8).
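The compression map (11) and the orthogonality property (15) can be checked numerically as follows (synthetic data assumed):

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, q = 100, 16, 3
X = rng.standard_normal((m, n))
U, s, Vt = np.linalg.svd(X, full_matrices=False)
U1 = U[:, :q]

Y = U1.T @ X                 # compression map Y = U_1^T X, eq. (11); q x n
x = rng.standard_normal(n)
y = Y @ x                    # reduced vector in R^q
z = X @ x
z_bar = U1 @ y               # reconstruction, eqs. (12)-(13)

z_tilde = z - z_bar
assert np.isclose(z_bar @ z_tilde, 0.0)   # orthogonality, eq. (15)
```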

B. Selection of Dominant Features

For any q > 0, the selection of the first q singular values of X yields a reduced space R^q generated by the linear transformation U_1^T: R^n → R^q as in (11). Moreover, y = Yx and

    z̄ = U_1 y    (16)

best approximates z = Xx in a least-squares sense. This allows us to use reduced vectors y ∈ R^q to compute approximations to data vectors z ∈ R^m, instead of using the full feature vector x ∈ R^n.

Now we wish to approximate further by selecting the dominant features in R^n. That is, it is desired to select the most important basis vectors from R^n to approximate the data vectors z. To do so, it is instrumental to note the little-realized fact that

    Y = U_1^T X = U_1^T (U_1 Σ_1 V_1^T + U_2 Σ_2 V_2^T) = Σ_1 V_1^T.    (17)

The original basis vectors in feature space R^n are known as features, with the ith basis vector corresponding to the ith feature. The original basis vectors are denoted in R^n by {e_1, e_2, ···, e_n}, with e_i being the ith column of I_n, i.e., e_i is an n-vector consisting of zeros except for a one in the ith position.

In terms of these notions, the vector generated in R^q by the ith feature e_i ∈ R^n is given by

    Y e_i = U_1^T X e_i = Σ_1 V_1^T e_i.    (18)

Recall that the rows of V_1^T are denoted by row vectors v_i^T, i.e., the columns of V_1 are denoted by column vectors v_i. By contrast, denote now the columns of Σ_1 V_1^T as column vectors w_i ∈ R^q. Then

    Y e_i = Σ_1 V_1^T e_i = [w_1 w_2 ··· w_n] e_i = w_i.    (19)

Therefore, the ith feature in R^n maps into the reduced space R^q as the ith column of the matrix Σ_1 V_1^T. There are therefore n vectors w_i in R^q corresponding to the n basis axes e_i, i.e., features, in R^n.

The above-mentioned notions of the proposed DFI algorithm are summarized in Fig. 1.

Fig. 1. Proposed DFI algorithm showing feature space R^n, compressed feature space R^q, and data (singular value) space R^m.

We now want to select the best features to retain so as to obtain the best approximation to z ∈ R^m. We call these dominant features. This corresponds to selecting which basis vectors e_i in R^n to retain, which is equivalent to selecting the best columns w_i of Σ_1 V_1^T ∈ R^{q×n}. This may be accomplished by several methods, including projections; we use clustering methods inspired by [5]. Then, z ∈ R^m will be approximated using the selected p dominant features within R^n.

Note that we will cluster the n columns w_i of Σ_1 V_1^T, as dictated by (19). This is in contrast to [5], who clustered the columns of U_1^T in (11). Clustering is the classification of the n objects in a data set into p different subsets (clusters), usually by minimizing some norm or pre-defined performance index. To select the dominant features in R^n, we cluster the n vectors w_i ∈ R^q into n ≥ p ≥ q clusters. For our application, the commonly used K-means algorithm is adopted [7]. The K-means algorithm iteratively minimizes the positive semidefinite scalar error cost function

    J = ∑_{i=1}^{p} ∑_{w_j ∈ S_i} (w_j − c_i)^T (w_j − c_i)    (20)

where S_i is the ith cluster set and c_i is its centroid (or center of "mass") in the cluster space. J is in essence the sum of the squared 2-norms (Euclidean distances) between the objects in each cluster and their centroid. For good approximations in R^m, one should select p > q, the number of retained singular values.
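A minimal K-means sketch on the columns w_i of Σ_1 V_1^T, implementing the cost (20) directly in NumPy rather than calling a library routine; the random initialization and fixed iteration count are assumptions, not the paper's settings:

```python
import numpy as np

def kmeans(W, p, iters=100, seed=0):
    """Minimal K-means on the columns of W (q x n); returns labels and centroids."""
    rng = np.random.default_rng(seed)
    n = W.shape[1]
    C = W[:, rng.choice(n, p, replace=False)]                 # initial centroids, q x p
    for _ in range(iters):
        d = ((W[:, :, None] - C[:, None, :])**2).sum(axis=0)  # n x p squared distances
        labels = d.argmin(axis=1)                             # assign to nearest centroid
        for i in range(p):
            if np.any(labels == i):
                C[:, i] = W[:, labels == i].mean(axis=1)      # recompute centroids
    return labels, C

rng = np.random.default_rng(4)
m, n, q, p = 100, 16, 3, 4
X = rng.standard_normal((m, n))
U, s, Vt = np.linalg.svd(X, full_matrices=False)

W = np.diag(s[:q]) @ Vt[:q, :]     # columns w_i = Σ_1 V_1^T e_i, eq. (19); q x n
labels, C = kmeans(W, p)
J = sum(np.sum((W[:, labels == i] - C[:, [i]])**2) for i in range(p))  # cost, eq. (20)
```

As with any K-means variant, the result depends on the initialization; in practice one would restart from several random seeds and keep the assignment with the smallest J.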

C. Error Analysis

Here we determine the total error induced by retaining only q singular values and by clustering the vectors w_i ∈ R^q into p clusters. It is shown that the proposed method of DFI is optimal in the sense that it minimizes the total least-squares estimation error.

For each cluster, we select the vector w_i ∈ R^q closest to the cluster center c_i as representative of every other vector w_j ∈ R^q in that cluster. We call this representative vector for cluster i, w̄_i, the cluster leader. The p features e_i ∈ R^n corresponding to the p cluster leaders w̄_i ∈ R^q are selected as the dominant features. This means that the clustering error is given by

    J̄ = ∑_{i=1}^{p} ∑_{w_j ∈ S_i} (w_j − w̄_i)^T (w_j − w̄_i).    (21)

To summarize these notions, recall that

    Y = Σ_1 V_1^T = [w_1 w_2 ··· w_n]    (22)

and define

    Ȳ = [w̄_1 w̄_2 ··· w̄_n]    (23)

where w̄_j = w̄_i if w_j ∈ S_i, i.e., each vector w_j ∈ S_i is replaced by its cluster leader w̄_i. This means that only the corresponding features e_i ∈ R^n are needed for computation, since w̄_i = Σ_1 V_1^T e_i.

Note that by (19)

    y = Σ_1 V_1^T x = Yx = ∑_{j=1}^{n} w_j x_j = ∑_{j=1}^{n} x_j [Σ_1 V_1^T e_j]    (24)

and define

    ȳ = Ȳx = ∑_{j=1}^{n} w̄_j x_j = ∑_{i=1}^{p} ( ∑_{w_j ∈ S_i} x_j ) w̄_i.    (25)

Then an estimate for z ∈ R^m taking into account both the q < n retained singular values and the p < n selected features is given by

    ẑ = U_1 ȳ.    (26)

Recall from (16) that z̄ = U_1 y, so the error induced by clustering is

    z̄ − ẑ = U_1 (y − ȳ) = U_1 (Y − Ȳ)x = U_1 Ỹx    (27)

with Ỹ ≡ Y − Ȳ.


Therefore, the error norm induced by clustering is

    ||z̄ − ẑ||² = (z̄ − ẑ)^T (z̄ − ẑ) = x^T Ỹ^T U_1^T U_1 Ỹx = tr{Ỹ^T Ỹ x x^T} ≤ J̄ ||x||²    (28)

since J̄ = tr{Ỹ^T Ỹ}.

The total error induced by neglecting the n − q singular values in Σ_2 and by clustering is then

    z − ẑ = (z − z̄) + (z̄ − ẑ) = U_2 Σ_2 V_2^T x + U_1 Ỹx.    (29)

Therefore, the total approximation error norm is

    ||z − ẑ||² ≤ (tr{Σ_2²} + J̄) ||x||²    (30)

whose first term depends on the neglected singular values, and whose second term is the clustering error, i.e., the neglected features.

We claim that the procedure of first selecting q principal components and then selecting p dominant features yields the minimum overall approximation error in (29). For note that

    z̃^T (z̄ − ẑ) = (U_2 Σ_2 V_2^T x)^T U_1 Ỹx = x^T V_2 Σ_2 U_2^T U_1 Ỹx = 0,    (31)

i.e., the error from neglecting the n − q singular values and the clustering error are orthogonal. This means that there is no better way of selecting dominant features than the DFI methodology proposed herein.
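The following sketch checks the cluster-leader construction, the orthogonality (31), and the bound (30) numerically; the deterministic toy cluster assignment is an assumption standing in for the K-means result above:

```python
import numpy as np

rng = np.random.default_rng(5)
m, n, q, p = 100, 16, 3, 4
X = rng.standard_normal((m, n))
U, s, Vt = np.linalg.svd(X, full_matrices=False)
U1 = U[:, :q]
W = np.diag(s[:q]) @ Vt[:q, :]                 # Y = Σ_1 V_1^T, eq. (22)

labels = np.arange(n) % p                      # toy assignment (stand-in for K-means)
W_bar = W.copy()
for i in range(p):
    members = np.flatnonzero(labels == i)
    c = W[:, members].mean(axis=1, keepdims=True)                  # centroid c_i
    leader = members[np.argmin(((W[:, members] - c)**2).sum(axis=0))]
    W_bar[:, members] = W[:, [leader]]         # Ȳ: members replaced by leader, eq. (23)

x = rng.standard_normal(n)
z = X @ x
z_bar = U1 @ (W @ x)                           # z̄, eq. (16)
z_hat = U1 @ (W_bar @ x)                       # ẑ = U_1 ȳ, eq. (26)

assert abs((z - z_bar) @ (z_bar - z_hat)) < 1e-9                   # orthogonality, eq. (31)
J_bar = np.sum((W - W_bar)**2)                 # J̄ = tr{Ỹ^T Ỹ}, eq. (21)
assert np.sum((z - z_hat)**2) <= (np.sum(s[q:]**2) + J_bar) * (x @ x) + 1e-9  # eq. (30)
```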

D. Simplified Computations

Traditional PCA relies on computations using the correlation matrix XX^T = U Σ² U^T ∈ R^{m×m} [6]. This is computationally expensive since generally m ≫ n.

Defining the inner product matrix as X^T X, we get

    X^T X = V Σ U^T U Σ V^T = V Σ² V^T.    (32)

As X^T X ∈ R^{n×n} with n ≪ m, the computation of Σ_1 and V_1^T required to find Y in (17) is greatly simplified using (32).
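A sketch of this simplification: Σ and V are recovered from the eigendecomposition of the small n × n matrix X^T X, without ever forming the m × m correlation matrix (synthetic data assumed):

```python
import numpy as np

rng = np.random.default_rng(6)
m, n, q = 20000, 16, 3
X = rng.standard_normal((m, n))

# Eigendecomposition of the n x n inner product matrix X^T X, eq. (32),
# instead of the m x m correlation matrix X X^T.
evals, V = np.linalg.eigh(X.T @ X)        # ascending eigenvalues, equal to σ²
order = np.argsort(evals)[::-1]           # re-sort descending
s = np.sqrt(evals[order])                 # singular values σ_i
V = V[:, order]

Y = np.diag(s[:q]) @ V[:, :q].T           # Y = Σ_1 V_1^T, eq. (17); U is never needed
```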

IV. TIME SERIES FORECASTING

We now wish to apply DFI to the prediction of tool wear in an industrial cutting tool. Tool wear can only be measured by removing the tool and performing visual inspection and measurement, which is tedious and time-consuming, and results in down-time for the machine. We wish to predict tool wear using signals that are easily monitored in real time.

In our application, sensors are used to measure the cutting force online in real time. Then, signal processing is used to compute n features, such as mean force value, maximum force level, standard deviation, third moment (skew), etc. Data is taken over N time steps and stored; it consists of the values of the n features at each time step, along with the tool wear measured (by visual inspection) at each time step.

Partition the collected data set over N time steps into two sets. The data through time m < N is used as a training set to compute the p dominant features and determine the unknown parameter vector θ. The remaining data from time m + 1 to time N is used as a validation set to verify the prediction accuracy. In our application, N = 20 × 10³ samples, so a reasonable choice for m is m = 15 × 10³ samples. The number of features computed is n = 16. These are described in Section V.

A. Determining the Dominant Features

Define

    X ≡ [f_1^T; f_2^T; ···; f_m^T] ∈ R^{m×n}    (33)

in terms of the data collected from the installed sensors through time m, where each row f_k^T holds the n feature values at time step k. Use the machinery in Sections II and III to select the p dominant features.

B. Prediction of Tool Wear

Define φ_k ∈ R^p as the vector containing the measured dominant features at time k ≤ m. Note that φ_k is a p-subvector of f_k. Then one desires to predict the tool wear d_k in terms of the p dominant features using

    d_k = φ_k^T θ.    (34)

To do so, one can estimate the parameter vector θ from the measured data with least-squares techniques. Note that only the p dominant features are used, i.e., θ ∈ R^p.

Define

    Φ_k θ ≡ [φ_1^T; φ_2^T; ···; φ_k^T] θ = [d_1; d_2; ···; d_k] ≡ D_k    (35)

in terms of the data collected. The estimation error through time k is

    E_k = D_k − Φ_k θ    (36)

where θ ∈ R^p is the vector of unknowns to be regressed for the time series forecast of tool wear.

A least-squares estimate of θ which minimizes the error norm E_k^T E_k is given by the standard unique batch solution

    θ = (Φ_k^T Φ_k)^{−1} Φ_k^T D_k    (37)

if there is sufficient persistent excitation, i.e., Φ_k^T Φ_k is invertible. To compute θ by efficient online recursive means, one may use Recursive Least Squares (RLS) instead of (37).
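A minimal RLS sketch of the update behind (34)-(37); the function rls, the forgetting factor, the initial covariance, and the synthetic regression data are our assumptions, not the paper's implementation:

```python
import numpy as np

def rls(Phi, D, lam=1.0, delta=1e3):
    """Recursive least squares: theta minimizing ||D - Phi theta||^2, one row at a time.
    lam is a forgetting factor (1.0 = ordinary RLS); delta sets the initial covariance."""
    k, p = Phi.shape
    theta = np.zeros(p)
    P = delta * np.eye(p)                          # inverse-correlation estimate
    for phi, d in zip(Phi, D):
        K = P @ phi / (lam + phi @ P @ phi)        # gain vector
        theta = theta + K * (d - phi @ theta)      # correct with prediction error
        P = (P - np.outer(K, phi @ P)) / lam       # covariance update
    return theta

# Hypothetical example: p = 4 dominant features, k = 500 samples.
rng = np.random.default_rng(7)
Phi = rng.standard_normal((500, 4))
theta_true = np.array([0.5, -1.2, 0.3, 2.0])
D = Phi @ theta_true + 0.01 * rng.standard_normal(500)

theta = rls(Phi, D)
assert np.allclose(theta, np.linalg.lstsq(Phi, D, rcond=None)[0], atol=1e-2)
```

With lam = 1 and a large delta, the recursion converges to the batch solution (37) while processing one sample per step, which is what makes online forecasting practical.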

V. INDUSTRIAL APPLICATION

A case study is carried out to verify the usability of the method. An application related to high-speed machining tool condition is selected for the experiment.


A. Experimental Setup

In our experiment, we used a nose ball cutter in a milling machine as the test bed. The cutting force signal is used to establish usable models due to its high sensitivity to tool wear, low noise, and good measurement accuracy [8]. The cutting forces along the X, Y, and Z axes were captured using a Kistler dynamometer in the form of charges, which were converted to voltages and sampled by a PCI 1200 board at 2000 Hz. The flank wear of each individual tooth of the cutting tool was measured with an Olympus microscope. The experimental setup is shown in Fig. 2.

Fig. 2. Experimental setup.

B. Selection of Dominant Features

The sixteen features from the methodologies mentioned above are summarized in Table I and form the scope of the feature subset selection. To avoid the inefficiency that results in lost productivity, the tool wear and part failures are estimated online without ceasing operation of the cutting tool.

TABLE I
FEATURES AND NOMENCLATURE

No  Feature                                                        Notation
1   Residual error                                                 re
2   First order differencing                                       fod
3   Second order differencing                                      sod
4   Maximum force level                                            fm
5   Total amplitude of cutting force                               fa
6   Combined incremental force changes                             df
7   Amplitude ratio                                                ra
8   Standard deviation of force components in tool breakage zone   fstd
9   Sum of the squares of residual errors                          sre
10  Peak rate of cutting forces                                    kpr
11  Total harmonic power                                           thp
12  Average force                                                  fca
13  Variable force                                                 vf
14  Standard deviation                                             std
15  Skew (3rd moment)                                              skew
16  Kurtosis (4th moment)                                          kts

TABLE II
PRINCIPAL COMPONENTS AND SINGULAR VALUES

No.  Singular value    No.  Singular value
1    31.90702           9   0.00001
2    1.082043          10   3.07857 × 10^-6
3    0.00342           11   1.13154 × 10^-6
4    0.00026           12   6.45546 × 10^-7
5    0.00011           13   4.62940 × 10^-7
6    0.00005           14   1.71882 × 10^-7
7    0.00005           15   4.60490 × 10^-9
8    0.00001           16   2.02272 × 10^-9

C. Calculation of Dominant Features

Based on the above, the following five steps are involved in the proposed DFI to obtain a feature subset (an end-to-end sketch follows the list):

1) Data acquisition. Collect data on the time interval [0, m] from the cutting tools' sensors and compute n features using digital signal processing techniques. Pack the n time series, each of length m, as columns into the matrix X ∈ R^{m×n}.

2) Initialization. Detrend the data in X by subtracting the mean (average across each dimension) of each of the dimensions, which normalizes the data set to zero mean.

3) Choose the number of principal components and clusters. Perform SVD on the inner product matrix X^T X to obtain Σ_1 and V_1^T. Select the desired number q of principal components in σ_i v_i and the number of clusters p.

4) Clustering. Use the K-means algorithm to find the centroids c_i of each cluster.

5) Subset determination. Select the vector w_i "nearest" to the centroid of each cluster as its cluster leader w̄_i, and the corresponding e_i as a dominant feature. Combine the p dominant features to form the reduced feature space.
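An end-to-end sketch of the five steps above; the function name dfi, its defaults, and the synthetic stand-in for the cutting-force features are assumptions:

```python
import numpy as np

def dfi(X, q, p, iters=100, seed=0):
    """Sketch of the five DFI steps; returns the indices of the dominant features.
    X is the m x n matrix of feature time series; q singular values are retained."""
    rng = np.random.default_rng(seed)
    X = X - X.mean(axis=0)                       # step 2: detrend to zero mean
    evals, V = np.linalg.eigh(X.T @ X)           # step 3: SVD via X^T X, eq. (32)
    order = np.argsort(evals)[::-1]
    s, V = np.sqrt(np.abs(evals[order])), V[:, order]
    W = np.diag(s[:q]) @ V[:, :q].T              # columns w_i = Σ_1 V_1^T e_i, eq. (19)

    C = W[:, rng.choice(X.shape[1], p, replace=False)]        # step 4: K-means
    for _ in range(iters):
        labels = ((W[:, :, None] - C[:, None, :])**2).sum(axis=0).argmin(axis=1)
        for i in range(p):
            if np.any(labels == i):
                C[:, i] = W[:, labels == i].mean(axis=1)

    leaders = []                                 # step 5: cluster leaders -> dominant features
    for i in range(p):
        members = np.flatnonzero(labels == i)
        if members.size:
            leaders.append(int(members[np.argmin(((W[:, members] - C[:, [i]])**2).sum(axis=0))]))
    return leaders

# Hypothetical usage with synthetic data standing in for the cutting-force features:
X = np.random.default_rng(8).standard_normal((15000, 16))
print(dfi(X, q=3, p=4))                          # indices of (up to) four dominant features
```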

VI. EXPERIMENTAL RESULTS

In the experiment, 52,800 time points of measured force sensor data were captured under the following machine settings: spindle speed 1000 rpm, feed rate 200 mm/min, depth of cut 1 mm, and insert number 2. This yields the baseline actual tool wear plot shown in Figs. 3-4, and the resulting singular values are shown in Table II.

Using the RLS techniques of Section IV, a Multiple Regression Model (MRM) (34) was identified to predict the baseline measured tool wear using all sixteen of the original features. Fig. 3 shows the actual measured tool wear and the predicted tool wear using this MRM as functions of time. A Mean Relative Error (MRE) of 8.8% is observed, and this represents our best possible prediction of tool wear using this set of sixteen features.

Next, we select q = 3 retained singular values and p = 4 dominant features; the result is shown in Fig. 4. The tool wear prediction MRE is 11.12%, which is close to the MRE of 8.8% obtained using all sixteen features. The four dominant features turn out to be {fa, fca, fstd, thp}.


Fig. 3. MRM using sixteen dominant features and the RLS algorithm.

Fig. 4. Examples of MRMs using three retained singular values, four dominant features, and the RLS algorithm.

We also compared our DFI method to the PFA method of [5]. Fig. 4 shows the actual measured tool wear and its prediction using the best four features selected by DFI and by PFA. The MRE for DFI is 11.12%, while the MRE for PFA is 13.18%. Comparison results of DFI and PFA using three retained singular values are shown in Table III for the DFI methodology and in Table IV for the PFA method, respectively.

TABLE III
RESULTS OF DFI METHOD USING THREE RETAINED SINGULAR VALUES

No. used  Dominant features selected                  MSE (mm²)  MRE (%)
4         fa, fca, fstd, thp                          1.262      11.61
5         fa, fca, fm, skew, thp                      1.202      11.19
6         fa, fca, fstd, ra, sre, thp                 1.111      10.49
7         fa, fca, fstd, ra, skew, sod, thp           1.111      10.40
8         fa, fca, fstd, kts, ra, skew, sod, thp      0.946      8.86
9         fa, fca, fstd, kpr, kts, ra, skew, thp, vf  0.946      8.86

TABLE IV
RESULTS OF PFA METHOD USING THREE RETAINED SINGULAR VALUES

No. used  Dominant features selected                 MSE (mm²)  MRE (%)
4         fa, fca, ra, thp                           1.40       13.18
5         fa, fca, kts, ra, thp                      1.365      12.89
6         fa, kts, re, skew, std, thp                1.133      10.53
7         fa, fca, fstd, kts, skew, thp, vf          1.130      10.46
8         fa, fca, fstd, kts, ra, re, skew, thp      0.949      8.88
9         fa, fca, fm, fstd, kts, ra, re, skew, thp  0.948      8.88

The features chosen by the proposed DFI methodology and by PFA are different. DFI gives a smaller MSE and MRE than PFA, i.e., it provides better accuracy in tool wear prediction. When the number of dominant features chosen reaches eight or more, the improvement in MSE and MRE from using an increasing number of dominant features is insignificant. The MRE using eight dominant features is 8.86%, which is very close to the MRE value of 8.80% from using all sixteen original features. Our experiments also showed that using the eight dominant features selected by the proposed DFI method to build MRMs saves about 60% of the computational time compared to building the models with the original sixteen features.

VII. CONCLUSION

In this paper, the Dominant Feature Identification (DFI) methodology using the Singular Value Decomposition (SVD) of collected tool wear data is proposed for Recursive Least Squares (RLS) prediction of the time series of deterioration of an industrial cutting tool. DFI uses the SVD, which operates on the inner product matrix at a lower dimension, and reduces the Least Squares Error (LSE) induced when selecting the principal components and clustering to identify the dominant features. Our experimental results show Mean Squared Error (MSE) values from 0.946 mm² to 1.262 mm² and Mean Relative Error (MRE) values from 8.86% to 11.61% between the actual measured tool wear and that predicted from the Multiple Regression Models (MRMs) using RLS.

ACKNOWLEDGEMENTS

This work was supported by ARO grant W91NF-05-1-0314 and NSF grant ECCS-0801330. This work was also kindly supported by Alignment Tool (Singapore) Pte. Ltd.

REFERENCES

[1] J. Sun, G. S. Hong, M. Rahman, and Y. S. Wong, "Identification of Feature Set for Effective Tool Condition Monitoring by Acoustic Emission Sensing," International Journal of Production Research, Vol. 42, No. 5, pp. 901-918, 2004.
[2] K. Z. Mao, "Identifying Critical Variables of Principal Components for Unsupervised Feature Selection," IEEE Trans. Syst., Man, Cybern. B, Vol. 35, No. 2, pp. 339-344, April 2005.
[3] A. Malhi and R. X. Gao, "PCA-Based Feature Selection Scheme for Machine Defect Classification," IEEE Trans. Instrum. Meas., Vol. 53, No. 6, pp. 1517-1525, December 2004.
[4] S. P. Lloyd, "Least Squares Quantization in PCM," IEEE Trans. Inform. Theory (Special Issue on Quantization), Vol. 28, No. 2, pp. 129-137, March 1982.
[5] I. Cohen, Q. Tian, X. S. Zhou, and T. S. Huang, "Feature Selection Using Principal Feature Analysis," in Proc. of the 15th International Conference on Information Processing, Rochester, NY, USA, September 22-25, 2002.
[6] I. T. Jolliffe, Principal Component Analysis, Springer-Verlag, New York, 1986.
[7] J. MacQueen, "Some Methods for Classification and Analysis of Observations," in Proc. Fifth Berkeley Symp. on Math. Statist. and Prob., Vol. 1, pp. 281-297, Univ. of Calif. Press, 1967.
[8] Y. Altintas and I. Yellowley, "In-Process Detection of Tool Failure in Milling Using Cutting Force Models," Journal of Engineering for Industry, Vol. 111, pp. 149-157, May 1989.
