IIE Transactions (2007) 39, 691–701
Copyright © “IIE”
ISSN: 0740-817X print / 1545-8830 online
DOI: 10.1080/07408170600899573

Identifying and visualizing nonlinear variation patterns in multivariate manufacturing data

DANIEL W. APLEY1,∗ and FENG ZHANG2

1 Department of Industrial Engineering and Management Science, Northwestern University, Evanston, IL 60208-3119, USA. Email: [email protected]
2 Fairchild Semiconductor, 82 Running Hill Road, South Portland, ME 04106, USA. Email: [email protected]

Received March 2004 and accepted October 2005

In modern manufacturing processes, it is not uncommon to have hundreds, or even thousands, of different process variables that are measured and recorded in databases. The large quantities of multivariate measurement data usually contain valuable information on the major sources of variation that contribute to the final product or process variability. The focus of this work is on variation sources that result in complex, nonlinear variation patterns in the data. We propose a model for representing nonlinear variation patterns and a method for blindly identifying the patterns, based on a sample of measurement data, with no prior knowledge of the nature of the patterns. The identification method is based on principal curve estimation, in conjunction with a data preprocessing step that makes it suitable for high dimensional data. We also discuss an approach for interactively visualizing the nature of the identified variation patterns, to aid in identifying and eliminating the major root causes of manufacturing variation.

Keywords: Principal curves, principal component analysis, blind identification, variation patterns, statistical process control

1. Introduction

Recent advances in in-process measurement and data collection technologies in manufacturing settings are allowing hundreds, and possibly even thousands, of different process variables to be measured, often for 100% of the parts being produced. Companies invest in measurement technology and generate large quality-related databases with the expectation that this will somehow aid in efforts to systematically identify and eliminate the major root causes of manufacturing variation. Effectively transforming the large sets of measurement data into extracted knowledge that is useful for variation reduction efforts, however, is a challenging data mining problem.

We investigate this problem following the same paradigm considered in Apley and Lee (2003), in which one views each major variation source as causing a distinct variation pattern in the data, spreading across any or all of the different variables that are measured. Each variation pattern represents the interrelated manner in which the different variables vary from part to part, due to the effects of a particular variation source. The data mining objective becomes one of unsupervised learning, in which we seek to discover the nature of any variation pattern that happens to be present in the data. We refer to this as blindly identifying the variation patterns, in the sense that we are not attempting to recognize the presence of premodeled or pretrained patterns. Rather, we seek to identify the nature of the patterns based only on a sample of data, with no prior training or modeling required. After blindly identifying the nature of a variation pattern, the results can be graphically illustrated in order to facilitate root cause identification, which we illustrate with examples in later sections.

∗Corresponding author


A number of techniques have recently been developed for blindly identifying the nature of variation patterns in large sets of manufacturing measurement data (e.g., Apley and Shi (2001), Apley and Lee (2003), and Lee and Apley (2004)). All of these approaches assume, however, that the variation patterns can be represented using linear models. Although linear models are reasonably versatile, there are many situations in which nonlinear models are required to represent variation patterns, such as in the following example from autobody assembly. Figure 1 illustrates schematically the rear liftgate opening of a sport utility vehicle and shows the locations at which six cross-car (left/right) dimensional measurements on the left and right bodysides are taken (denoted x1 through x6). The measurements are obtained automatically via in-process laser measurement, so that every autobody is measured (for a more detailed description of the assembly process and measurement technology, refer to Ceglarek and Shi (1996) or Apley and Shi (1998; 2001)).



Fig. 1. Measurement layout on the liftgate opening of an autobody. The fore direction is pointing into the page.

Although almost 200 different dimensional features were measured for each autobody, for simplicity we illustrate with only the six measurements shown in Fig. 1. The nonlinear variation pattern described in the following paragraphs primarily affected only the liftgate region of the autobody.

The assembly process is relatively complex and involves hundreds of different assembly stations and thousands of different tooling elements. When a tooling element breaks, wears, malfunctions, or is simply not designed properly, this often results in a distinct variation pattern in the dimensional measurement data. Figure 2 illustrates this with scatter plots of pairs of the six variables over a sample of 100 measured autobodies. The measurements are deviations from nominal, in units of millimeters. A positive measurement represents deviation to the right. Although the relationship between x2 and x3 appears linear, the scatter plots for x2/x5 and x3/x4 clearly illustrate that the variation pattern is nonlinear. Moreover, this nonlinear pattern appears to be approximately piecewise linear with only two segments (i.e., pieces).

Fig. 2. Scatter plots of data from 100 measured autobodies exhibiting a nonlinear variation pattern.


One potential root cause for the variation pattern is a fixturing problem when locating the right bodyside in the framing station (a major assembly station in which the left and right bodysides are joined to the underbody and a set of upper cross-members). When the right bodyside deviates by only a small amount to the right, it has no effect on the left bodyside. When the right bodyside deviates by a larger amount to the right, however, it begins to interfere with the upper cross-member. The upper cross-member then interferes with the left bodyside by also pulling the left bodyside towards the right.

In situations like that depicted in Fig. 2, linear models are inadequate to represent the nonlinear relationship between the different variables. One potential method for treating nonlinear variation patterns is based on the notion of a principal curve (Hastie and Stuetzle, 1989), which is a nonlinear generalization of Principal Components Analysis (PCA). Broadly speaking, a principal curve is defined as a one-dimensional curve that passes through the middle of the distribution of higher dimensional data. Principal curve estimation is relatively robust and has been applied to a variety of nonlinear data analysis problems (Tibshirani, 1996; Chang and Ghosh, 2001; Delicado, 2001). However, most applications of principal curves have been for relatively low dimensional data, especially for two-dimensional image processing (e.g., Banfield and Raftery (1992), Kegl et al. (2000), and Chang and Ghosh (2001)). For the high dimensional data often encountered in manufacturing, principal curve estimation becomes inefficient. Because of this, we use a common data preprocessing step that involves linear PCA to first reduce the dimensionality of the problem. This not only reduces the computational complexity, but also improves the estimation accuracy by filtering out a substantial portion of the noise (i.e., random variations that are not due to any systematic pattern). One advantage of the principal curve approach is that it lends itself well to visualizing the blindly identified variation patterns. Effective visualization of a variation pattern is crucial for identifying the root cause of the variation. In a later section, we illustrate the visualization approach with an example in which the data are extremely high dimensional, representing point cloud data from laser-scanned stamped panels.

The remainder of the paper is organized as follows. Section 2 discusses the model we use to represent nonlinear manufacturing variation patterns. Section 3 provides some background on linear PCA and nonlinear principal curves. Although PCA is a common first step for reducing dimensionality in data mining applications, including principal curve estimation (e.g., Delicado and Huerta (2003)), the amount of information that is lost ranges from nothing at all to perhaps a non-negligible amount. In Section 4, we argue that for our application virtually no information is lost. In Section 5, we describe the algorithm for principal curve estimation with the PCA preprocessing step. Examples illustrating the approach are provided in Sections 5 and 6. Section 7 includes a Monte Carlo analysis demonstrating the performance improvement when one uses linear PCA to first reduce the dimensionality of the problem.

2. Representing linear and nonlinear variation patterns

In most of the previous work on identifying variation patterns in manufacturing data (e.g., Apley and Shi (2001) and Apley and Lee (2003)), the following linear model was used to represent the pattern. Let X = [X1, X2, . . . , Xd]T be a d × 1 random vector that represents a set of d measured characteristics from the product or process. Let {xi : i = 1, 2, . . . , N} be a sample of N observations of X. In autobody assembly, for example, X would represent the vector of all measured dimensional characteristics across a given autobody, and N would denote the number of autobodies in the sample. If there is only a single variation source present in the manufacturing process over a particular sample, the linear model takes the form:

X = ct + W, (1)

where c is a d × 1 constant vector and t is a scalar random variable that is scaled (without loss of generality) to have unit variance. The noise W = [W1, W2, . . . , Wd]T is a d × 1 zero-mean random vector that is independent of t.

The interpretation of the model is that the effects of the variation source on the measurements are represented by the elements of c. The vector c therefore indicates the spatial nature of the variation pattern, in terms of the resulting interrelationships between the different measured variables. The random vector W represents the aggregated effects of measurement noise and any unsystematic variation that is not attributed to the major variation source. Unless otherwise noted, it is assumed throughout that the covariance matrix of W is Σw = σw²I, a scalar multiple of the identity matrix. Apley and Lee (2003) discussed approaches for dealing with more general noise covariance structure, which also apply to the present case of nonlinear patterns.

Although linear models of this form are relatively common for representing manufacturing variation (Barton and Gonzalez-Barreto, 1996; Ceglarek and Shi, 1996; Jin and Shi, 1999; Ding et al., 2002; Zhou et al., 2003), they cannot describe more complex nonlinear phenomena. In analogy with the linear model (1), we propose the following model for representing nonlinear variation patterns:

X = f(t) + W, (2)

where t and W are as in the linear model, and f(·) is some general vector-valued nonlinear function that represents the spatial nature of the nonlinear variation pattern. For the variation pattern shown in Fig. 2, f(·) would be (at least approximately) a piecewise linear function with two pieces.
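To make the model concrete, the following Python/NumPy sketch simulates data from Equation (2) with a two-piece piecewise linear f(·) in d = 6 dimensions, qualitatively mimicking the liftgate pattern of Fig. 2. The segment directions c1 and c2, the breakpoint, the noise level, and the sample size are illustrative assumptions, not values from the study.

    import numpy as np

    rng = np.random.default_rng(0)
    d, N, sigma_w = 6, 100, 0.1

    # Hypothetical segment directions for a two-piece f(.); values are illustrative.
    c1 = np.array([0.25, 0.55, 0.65, 0.00, 0.00, 0.00])  # right bodyside moves alone
    c2 = np.array([0.25, 0.55, 0.65, 0.55, 0.45, 0.25])  # cross-member drags the left side

    def f(t):
        # Continuous piecewise linear curve with a breakpoint at t = 1.
        t = np.atleast_1d(t)
        return (np.outer(np.minimum(t, 1.0), c1)
                + np.outer(np.maximum(t - 1.0, 0.0), c2))

    t = rng.uniform(0.0, 2.0, N)                      # latent variation source
    X = f(t) + sigma_w * rng.standard_normal((N, d))  # model (2): X = f(t) + W

Scatter plots of, say, the second column of X against the fifth would then show a kinked, Fig. 2-like pattern.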

In the linear situation, the objective is to blindly estimate c, given a sample of data. Apley and Shi (2001) and Apley and Lee (2003) provide examples of this and discuss how graphical illustrations of the estimated c can aid in identifying the root cause of the variation. In the nonlinear case, the objective is to blindly estimate the entire function f(·), a suitable illustration of which can aid in root cause identification. The remainder of this paper focuses on how to estimate f(·).

3. PCA and principal curves

Let µx and Σx denote the mean vector and covariance matrix, respectively, of X. For linear variation patterns, PCA can be used directly to provide an estimate of c. PCA involves the analysis of the eigenvectors and eigenvalues of Σx (Johnson and Wichern, 2002). Denote the eigenvalues and corresponding eigenvectors by {λj : j = 1, 2, . . . , d} and {zj : j = 1, 2, . . . , d}, respectively, where the eigenvalues are arranged in descending order. It is well known that when a single linear variation pattern is present, the dominant eigenvector z1 is a scaled version of c (Ceglarek and Shi, 1996; Apley and Shi, 2001). Thus, the dominant eigenvector of the sample covariance matrix can be used as an estimate of c.

One related fundamental property of PCA is that the straight line µx + tz1 (with t viewed as a scalar parameter that determines the distance along the line) provides the best one-dimensional linear approximation to the distribution of X, in the sense of minimizing the mean square (orthogonal) distance between X and the approximating line. Principal curves are a natural nonlinear generalization of this concept. Because f(t) in Equation (2) satisfies the definition of a principal curve stated below, we also use it to denote a principal curve. According to the definition of Hastie and Stuetzle (1989) (hereafter referred to as HS), a one-dimensional curve f(t) = [f1(t), f2(t), . . . , fd(t)]T in d-dimensional space is a principal curve of (the distribution of) X if:

1. f(t) does not intersect itself;
2. f(t) has finite length inside any bounded subset of ℝ^d; and
3. f(t) = E(X | tf(X) = t), where E(·|·) denotes a conditional expectation, and the projection index tf(x) is defined as

tf(x) = sup{t : ‖x − f(t)‖ = inf_u ‖x − f(u)‖}. (3)

HS showed that no infinitesimally small smooth perturbation to a principal curve will decrease E‖X − f(tf(X))‖². In this sense principal curves generalize the minimum mean square distance property of the linear PCA approximation to the distribution of X. In the hypothetical case that the distribution of X is known, the HS algorithm for constructing principal curves iterates over the following steps:

Step 0. Initialize f(0)(t) = µx + tz1, and set j = 1.
Step 1. (Expectation.) Define f(j)(t) = E[X | tf(j−1)(X) = t] for all t.
Step 2. (Projection.) For each x ∈ ℝ^d, set tf(j)(x) = max{t : ‖x − f(j)(t)‖ = min_u ‖x − f(j)(u)‖}.
Step 3. Compute D²(f(j)) = E‖X − f(j)(tf(j)(X))‖². If |D²(f(j)) − D²(f(j−1))| < threshold, then stop. Otherwise, let j = j + 1 and go to Step 1.

In practice, the theoretical distribution of X is unknown and the HS algorithm must be implemented on a sample {xi : i = 1, 2, . . . , N} of data. The actual HS algorithm starts with f̂(0)(t) = µ̂x + t ẑ1 and the expectations in Steps 1 and 3 are replaced by some form of locally weighted sample averages. Throughout, the overscore symbol “ˆ” will be used to denote an estimate of a quantity.
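To fix ideas, here is a minimal HS-style sketch of the sample version. It is not the authors' implementation (they used the R software noted in Section 5), and it substitutes a crude running-mean smoother for HS's locally weighted averages; the curve is represented as a polyline through the sample, and the projection step projects each point onto the polyline's segments. The function names, smoother span, and tolerance are illustrative assumptions.

    import numpy as np

    def project_to_polyline(x, verts):
        # Return (arc length of the closest point, squared distance) from x to the polyline.
        best_t, best_d2, s = 0.0, np.inf, 0.0
        for a, b in zip(verts[:-1], verts[1:]):
            seg = b - a
            L2 = seg @ seg
            if L2 == 0.0:
                continue
            u = np.clip((x - a) @ seg / L2, 0.0, 1.0)
            diff = x - (a + u * seg)
            d2 = diff @ diff
            if d2 < best_d2:
                best_d2, best_t = d2, s + u * np.sqrt(L2)
            s += np.sqrt(L2)
        return best_t, best_d2

    def running_mean(y, k):
        # Crude local-average smoother standing in for the conditional expectation.
        out = np.empty(len(y))
        for i in range(len(y)):
            lo, hi = max(0, i - k), min(len(y), i + k + 1)
            out[i] = y[lo:hi].mean()
        return out

    def principal_curve(X, n_iter=25, tol=1e-5, span=0.1):
        # Step 0: initialize with the first principal component line.
        mu = X.mean(axis=0)
        z1 = np.linalg.svd(X - mu, full_matrices=False)[2][0]
        f = mu + np.outer(np.sort((X - mu) @ z1), z1)   # polyline vertices, shape (N, d)
        k, d_old = max(2, int(span * len(X))), np.inf
        for _ in range(n_iter):
            # Projection step: arc-length index and squared distance for each point.
            proj = np.array([project_to_polyline(x, f) for x in X])
            t, d = proj[:, 0], proj[:, 1].mean()
            order = np.argsort(t)
            # Expectation step: smooth each coordinate against the projection index.
            f = np.column_stack([running_mean(X[order, j], k)
                                 for j in range(X.shape[1])])
            if abs(d_old - d) < tol:   # analogue of the Step 3 stopping rule
                break
            d_old = d
        return f, t

For example, principal_curve(X) applied to the simulated data of Section 2 returns the fitted polyline vertices and the projection index of each observation.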

4. Principal curves in lower dimensional linear varieties

If we are to effectively use principal curve concepts in the identification of nonlinear variation patterns in the high dimensional data typically encountered in manufacturing, we must first reduce the dimensionality of the problem using linear PCA as a preprocessing step, as we will describe in the following section. In the present section, we show that no information is lost in the dimensionality reduction step if f(t) lies in a linear variety (a translated linear subspace) of dimension r < d. We also argue that there are many practical manufacturing situations in which nonlinear variation patterns lie in lower dimensional subspaces, in particular when f(t) can be closely approximated as a piecewise linear curve.

Let M denote some r-dimensional subspace of ℝ^d, and let a0 + M denote the r-dimensional linear variety that results from translating M by some constant vector a0. Suppose that f(t) lies in a0 + M for all t and that the noise covariance is Σw = σw²I. Suppose also that no other linear variety in which f(t) lies has a dimension smaller than r. We show in the Appendix that in this case,

λ1 ≥ λ2 ≥ . . . ≥ λr > σw² = λr+1 = λr+2 = . . . = λd, (4)

and

a0 + M = µx + span{z1, z2, . . . , zr}. (5)

The significance is that when f(t) lies in an r-dimensional linear variety, PCA can be used to identify the linear variety. The linear variety is given by the span of the first r eigenvectors, translated by µx, where r is equal to the number of dominant eigenvalues. Although Equation (4) applies to the theoretical covariance matrix Σx, in practice PCA is conducted on a sample covariance matrix. In this case, it may not be clear what value should be chosen for r. There are, however, a number of statistical methods that have been proposed for estimating the number of dominant eigenvalues in this situation. The most common methods use either likelihood-based or information-based criteria, and involve the ratio of the arithmetic and geometric averages of the small eigenvalues. The reader is referred to Apley and Shi (2001), and the references therein, for a detailed discussion on the various methods for estimating the number of dominant eigenvalues, including their relative performances in typical manufacturing scenarios.
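To illustrate the flavor of such criteria, the sketch below picks the smallest r for which the trailing sample eigenvalues look equal, judged by their arithmetic-to-geometric mean ratio (which is always ≥ 1, with equality if and only if the eigenvalues are all equal). The fixed tolerance is an assumption made purely for illustration; the likelihood- and information-based methods cited above embed this same ratio in a formal criterion rather than thresholding it directly.

    import numpy as np

    def estimate_num_dominant(eigvals, tol=1.05):
        # eigvals: sample eigenvalues in any order, assumed positive (requires N > d).
        lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]
        for r in range(len(lam) - 1):
            tail = lam[r:]
            am_gm = tail.mean() / np.exp(np.log(tail).mean())  # >= 1; = 1 iff all equal
            if am_gm < tol:   # trailing eigenvalues look equal: noise floor reached
                return r
        return len(lam) - 1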

Although the algorithm described in Section 5 fits a principal curve within the r-dimensional space of dominant PCA scores, it is conceptually helpful to think of fitting a principal curve within the original d-dimensional space, but using a “PCA-filtered” version of the data to reduce the effects of the noise. Consider the d × r matrix Z = [z1 z2 . . . zr] and the d × d matrix Pz = ZZT, which is the projection operator onto M. Define the filtered version Xz = Pz(X − µx) + µx of the random vector X. Then


Xz = Pz(X − µx) + µx = Pz(f(t) − µx + W) + µx = f(t) − µx + PzW + µx = f(t) + Wz, (6)

where the third equality follows from the fact that f(t) − µx ∈ M, and Wz = PzW is the filtered version of the noise, obtained by projecting W onto the r-dimensional subspace M.

Equation (6) implies that by filtering the data using the results of PCA, the principal curve component f(t) of the model (2) is unchanged, whereas the noise component W is reduced by an amount that depends on d and r. To quantify the extent to which the noise is reduced, consider the total variance E[WTW] as a measure of the noise level. Before filtering, the total noise variance is E[WTW] = trace E[WWT] = trace(σw²I) = dσw². After filtering, the total variance of the filtered noise is E[WzTWz] = trace E[WzWzT] = trace(PzΣwPz) = σw² trace(PzPz) = rσw². Thus, the ratio of total noise variance before and after filtering is d/r. When f(t) lies in a linear variety of dimension much smaller than d, the reduction in noise variance will be substantial. The simulation results presented in Section 7 demonstrate that this improves the principal curve estimation accuracy.

In what type of situations might we expect f(t) to lie in a lower dimensional linear variety? The autobody example introduced in Section 1 is one situation, because the principal curve f(t) appears to be piecewise linear with two linear segments. More generally, suppose f(t) is piecewise linear with p linear segments, which is illustrated in Fig. 3 for the case that p = 4 and d = 3. Although we illustrate this with p > d, the primary utility of the approach will be for situations in which p ≪ d. The following arguments show that a piecewise linear f(t) lies in a linear variety whose dimension is equal to the number of linearly independent pieces.

For j = 1, 2, . . . , p, define Ωj = {t : f(t) lies on the jth segment} and cj = ∂f(t)/∂t|t∈Ωj, which is proportional to the direction of the jth segment. Without loss of generality, assume that t is scaled so that ∂f(t)/∂t|t∈Ωj is constant over each Ωj. For each fixed t, f(t) can be expressed as a linear combination of {cj : j = 1, 2, . . . , p} added to f(t0), where f(t0) is an arbitrary point on f(t). In other words, f(t) can be expressed as

Fig. 3. A piecewise linear principal curve.


f(t) = f(t0) + Cv(t), (7)

where C = [c1 c2 . . . cp], and v(t) = [v1(t) v2(t) . . . vp(t)]T for some set of p random variables {vj(t) : j = 1, 2, . . . , p} that are functions of t alone. For example, suppose t0 ∈ Ω1, and {tj : j = 1, 2, . . . , p − 1} are defined such that f(tj) is the intersection between the jth and the (j + 1)st segments. Then for each t, v(t) = [t1 − t0, t2 − t1, . . . , tk−1 − tk−2, t − tk−1, 0, . . . , 0]T, where k is such that t ∈ Ωk.

From Equation (7), it is clear that a piecewise linear curve f(t) lies in the r-dimensional linear variety f(t0) + span{c1, c2, . . . , cp}, where f(t0) is an arbitrary point on f(t) and r = rank(C) ≤ p. Hence, Equations (4)–(6) are applicable to nonlinear variation patterns that can be represented as piecewise linear curves. This has important practical implications with high dimensional manufacturing data, because any nonlinear principal curve can be approximated arbitrarily closely by a piecewise linear one. If only a small (relative to d) number of linear pieces are needed to adequately approximate a principal curve, the preceding results imply that r will be small relative to d and the level of noise reduction achieved by filtering the data will be substantial.
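The eigenvalue structure of Equation (4) can likewise be verified numerically for a piecewise linear curve. The sketch below reuses the two-segment construction of the Section 2 sketch (again with illustrative values) and shows exactly r = 2 sample eigenvalues above the σw² noise floor:

    import numpy as np

    rng = np.random.default_rng(2)
    d, N, sigma_w = 6, 100_000, 0.1

    # Two linearly independent segment directions (illustrative values).
    c1 = np.array([0.25, 0.55, 0.65, 0.00, 0.00, 0.00])
    c2 = np.array([0.25, 0.55, 0.65, 0.55, 0.45, 0.25])

    t = rng.uniform(0.0, 2.0, N)
    F = np.outer(np.minimum(t, 1.0), c1) + np.outer(np.maximum(t - 1.0, 0.0), c2)
    X = F + sigma_w * rng.standard_normal((N, d))

    lam = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]
    print(lam)            # first two eigenvalues well above the noise floor,
    print(sigma_w**2)     # remaining four approximately equal to sigma_w^2 = 0.01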

When f(t) lies entirely in an r-dimensional linear variety, the results of PCA provide an estimate of the linear variety and its dimension. On the other hand, suppose that f(t) does not lie entirely in any low-dimensional linear variety, but that it can be reasonably approximated by a piecewise linear curve. If the approximation is close, then f(t) will have only a small component that falls outside the linear variety in which the piecewise linear approximation lies, which may be of much lower dimension than d. In this case, the results of PCA will yield an estimate of a linear variety in which f(t) approximately lies. The methods of estimating the number of dominant eigenvalues (i.e., the dimension of the linear variety) discussed in Apley and Shi (2001) are attractive in that the size of an eigenvalue is measured relative to σw². If the size of all components of f(t) that lie outside a particular linear variety is small relative to σw², it would typically be the case that these components could be neglected.

5. A PCA-filtered principal curve algorithm

The HS algorithm for estimating a principal curve based on a sample of data {xi : i = 1, 2, . . . , N} seeks an estimate f̂(·) that minimizes ∑_{i=1}^{N} ‖xi − f‖² under certain smoothness constraints. Here, ‖xi − f‖² = inf_t ‖xi − f(t)‖². After conducting PCA on the sample covariance matrix, a set {xz,i : i = 1, 2, . . . , N} of filtered observations could be generated in analogy with Equation (6) via xz,i = Pz(xi − µx) + µx.


Here, Pz would be formed from the first r eigenvectors {zj : j = 1, 2, . . . , r} of the sample covariance matrix, and µx is taken to be the sample average of the observations. One might consider directly applying the principal curve estimation algorithm to the filtered observations by minimizing ∑_{i=1}^{N} ‖xz,i − f‖². As was discussed in Section 4 and will be demonstrated in Section 7, this reduces the effects of the noise and improves the accuracy of the principal curve estimation.

Aside from the PCA step, the computational expense of this approach would be the same as if the nonlinear variation pattern identification algorithm were applied to the original, unfiltered observations. If r is much less than d, however, considerable savings in computational expense can be achieved as follows. Note that since each xz,i lies in the linear variety µx + M, where M = span{z1, z2, . . . , zr}, minimizing ∑_{i=1}^{N} ‖xz,i − f‖² must always result in an f̂(·) that also lies in µx + M. Consequently, there is no loss of generality in restricting our search for f(t) to this r-dimensional linear variety. This can be accomplished by working with the r-dimensional vectors of PCA scores:

yi = ZT(xi − µx), i = 1, 2, . . . , N, (8)

where Z = [z1 z2 . . . zr]. Note that yi consists of the coefficients of xz,i − µx using {zj : j = 1, 2, . . . , r} as a basis for the r-dimensional subspace M. Similarly define h(t) = ZT(f(t) − µx) to be the coefficients of f(t) − µx in M. Because we are restricting f(t) − µx to lie in M, it follows that f(t) − µx = Zh(t), or

f(t) = Zh(t) + µx. (9)

Using Equations (8) and (9) and the definition of xz,i, for any t we have

xz,i − f(t) = ZZT(xi − µx) + µx − [Zh(t) + µx] = Z[yi − h(t)],

so that

‖xz,i − f(t)‖² = ‖Z[yi − h(t)]‖² = [yi − h(t)]T ZTZ [yi − h(t)] = [yi − h(t)]T[yi − h(t)] = ‖yi − h(t)‖².

Therefore, choosing an f̂(·) to minimize ∑_{i=1}^{N} ‖xz,i − f‖² is equivalent to choosing an ĥ(·) to minimize ∑_{i=1}^{N} ‖yi − h‖² and then recovering f̂(·) from ĥ(·) via Equation (9). The advantage of working with the r-dimensional vectors {yi : i = 1, 2, . . . , N}, as opposed to the d-dimensional vectors {xz,i : i = 1, 2, . . . , N}, is that computational expense and convergence speed are substantially improved.

The PCA-filtered principal curve algorithm is summarized as follows:

Step 1. (Linear variety identification.) Conduct PCA on the sample data and estimate the dimension of the linear variety as the number r of dominant eigenvalues.
Step 2. (PCA filtering.) Form Z from the first r sample eigenvectors and generate the PCA score vectors yi = ZT(xi − µx).
Step 3. (Principal curve estimation.) Use a standard principal curve estimation algorithm (e.g., the HS algorithm) to find the curve ĥ(·) in r-dimensional space that minimizes ∑_{i=1}^{N} ‖yi − h‖². We have used the publicly available software R, which can be obtained from http://www.r-project.org.
Step 4. (Principal curve recovery.) Given ĥ(·), recover the d-dimensional principal curve f̂(·) using Equation (9).

We illustrate the variation pattern identification algorithm with the same autobody assembly example introduced in Section 1. An example with much higher dimensional data is considered in the following section. After applying PCA to a sample of N = 100 autobodies, it was concluded that there were two dominant eigenvalues. Hence, the principal curve lies approximately in a two-dimensional subspace, which is consistent with the piecewise linear appearance of the variation pattern in Fig. 2. Figure 4(a) is a plot of the eigenvalues. The two corresponding eigenvectors are z1 = [0.25 0.53 0.65 0.36 0.29 0.14]T and z2 = [−0.21 −0.28 −0.34 0.62 0.56 0.26]T. Figure 4(b) is a scatter plot of the two PCA scores y1 and y2 for the 100 autobodies, along with the estimated principal curve ĥ(·) in the two-dimensional subspace.

In order to visualize the pattern, we should transform the lower dimensional principal curve ĥ(·) back to the principal curve f̂(·) in the original six-dimensional data space via Equation (9). The six elements of f̂(·) (corresponding to the six variables x1, . . . , x6) are plotted versus t in Fig. 5. Although the scaling of t is somewhat arbitrary in general, the scaling in Fig. 5 was such that t coincides with the arc length along the principal curve. Compared with the scatter plots in Fig. 2, the plots in Fig. 5 allow a clearer, less noisy visualization of the nature of the nonlinear pattern.

The ultimate purpose of blindly identifying and visualizing the variation pattern is, of course, to gain insight into the root cause of the variation. By inspection of Fig. 5, certain characteristics of the variation pattern become evident: x1–x3, which are all located on the right bodyside, are roughly linearly related. As t varies across its range of values, x3 varies the most, followed by x2, and x1 varies the least. The features x4–x6, which are located on the left bodyside, are not affected by the variation pattern until t approaches its midrange value (i.e., when the movements of x1–x3 approach their midrange values). After this point (t > 1.5, roughly) the motions of all six features are approximately linearly related, with the features located higher on the bodysides (x3 and x4) varying to a larger extent. The variation pattern therefore represents a matchboxing of the liftgate opening, in which the left side of the opening begins to matchbox with the right side only after the right side moves through roughly half its total motion. Based on this, a process engineer might speculate that the variation pattern is due to a fixturing problem during the locating of the right bodyside in the framing station (a major assembly station in which the left and right bodysides are joined to the underbody and a set of upper cross-members). When the right bodyside deviates to the right by a large enough amount, it begins to interfere with the upper cross-member. The upper cross-member then interferes with the left bodyside, pulling the left bodyside along with it to the right and causing the matchboxing pattern.


Fig. 4. (a) Eigenvalues and (b) scatter plot of the two dominant PCA scores for the autobody example. Panel (b) also shows the estimated principal curve ĥ(·) in the two-dimensional subspace of dominant PCA scores.


Fig. 5. Plots of the elements of f̂(·) versus t, illustrating the characteristics of the nonlinear variation pattern.


In this example, we have described how one might interpret the nature of the variation pattern based on inspecting a plot of the elements of f̂(t) versus t, such as that shown in Fig. 5. In applications where the measurement data represent dimensional measurements distributed over the surface of a part, visualizing the nature of the variation pattern can be greatly facilitated via the use of interactive graphical animations of the pattern. We describe such an example in the following section.

6. Visualizing high dimensional variation patterns

The purpose of identifying variation patterns is to serve as an aid in identifying and eliminating major root causes of manufacturing variation. In order for a process operator or engineer to effectively interpret the pattern identification results, graphical visualization techniques are critical, especially with high dimensional data. In this section, we illustrate a method for visualizing variation patterns in high dimensional data with an example from sheet metal flanging. In sheet metal panel assembly, the edges of the panels are often formed into a flange in order to add stiffness to the panel or create a mating surface. In shrink flanging, the panel is formed into a concave shape at the same time the flange is formed, which often causes wrinkling in the flange (see Fig. 6). Wrinkling not only mars the appearance of the part, but may also interfere with subsequent sealing and welding operations.


Fig. 6. Illustration of a panel in shrink flanging exhibiting wrinkles on the flange surface.

Wang et al. (2001) and the references therein describe analytical and numerical (finite element) models of flange wrinkling. As boundary conditions (e.g., binding force, lubrication, and slippage) change from part to part, the characteristics of the wrinkles change. In the simulation study of this section, the variation pattern represents the position of the wrinkles varying along the length of the flange due to variations in the boundary conditions. The measurement vector X represents the height of the wrinkles (normal to the surface) over an array of locations distributed across the panel. This could represent point cloud data from a laser scan of the panel, for example. For analysis purposes, we have used d = 1080 points distributed uniformly over the panel. The variation pattern was simulated over a sample of N = 70 parts, with random noise added, and the results of PCA indicated that there were r = 2 dominant eigenvalues. Similar to Fig. 4(a and b), Fig. 7(a and b) shows the first 11 eigenvalues and a scatter plot of the PCA scores corresponding to the two dominant eigenvalues, as well as the fitted principal curve ĥ(·) in two-dimensional space. The remaining 1069 eigenvalues (not shown in Fig. 7(a)) monotonically decreased to zero. The strong nonlinear (circular) pattern results from what can be viewed as phase shifting as the wrinkles change position along the flange. Although principal curves often have distinct beginning and end points, the underlying (circular) principal curve for this example does not. The reason for this is that the pattern repeats itself as the phase of the wrinkles shifts 360°. Hence, the principal curve is closed for this example, and one might think of joining the beginning and end points of the curve shown in Fig. 7(b).

Fig. 7. (a) The first 11 eigenvalues and (b) the scatter plot of the dominant PCA scores for the simulated shrink flanging example. Panel (b) also shows the estimated principal curve ĥ(·) in the two-dimensional subspace of dominant PCA scores.


After transforming ĥ(·) back to the principal curve f̂(·) in the original 1080-dimensional space, the nature of the variation pattern could be visualized by graphically illustrating how f̂(·) varies as t is varied. Figure 8 shows a MATLAB® graphical user interface that was developed for this purpose. The plot of f̂(·) shown in Fig. 8 changes dynamically as the user moves the slide bar control back and forth. The position of the slide bar represents the value of t for which f̂(·) is currently plotted. Figure 9 shows five frames (for five different values spanning the range of t) produced from Fig. 8. Only the flange part of the panel is shown. Although the interactive animation would allow one to more clearly visualize the nature of the variation pattern, one can still discern from the five static frames that the pattern represents the wrinkles changing position from part to part.

7. Monte Carlo performance comparison

In this section, Monte Carlo simulation is used to compare the performances of the standard HS algorithm and the PCA-filtered HS algorithm. In all simulations, the data were generated via the model (2). The noise component of X was generated from a multivariate Gaussian distribution with covariance matrix σw²I. The principal curve component of X was generated as a uniformly distributed point along the specified principal curve f(t). Denoting the principal curve estimates from the HS algorithm and the PCA-filtered HS algorithm by f̂HS(·) and f̂HS-PCA(·), respectively, consider the mean square distances:

dHS = ∫ ‖f̂HS − f(t)‖² g(t) dt, (10)

and

dHS-PCA = ∫ ‖f̂HS-PCA − f(t)‖² g(t) dt, (11)

as measures of closeness between the estimated and true principal curves. Here, g(t) denotes the probability density of t, which was uniform in all cases. The variable t was scaled so that ‖∂f(t)/∂t‖ was constant along the entire length of the curve. In each of the Monte Carlo simulations for the examples discussed below, dHS and dHS-PCA were averaged over 10 000 Monte Carlo replicates to compare the accuracy of the two principal curve estimation methods.


Fig. 8. Graphical user interface for interactively visualizing the nature of a nonlinear variation pattern that represents wrinkling in a shrink flanged panel.


Fig. 9. Illustration of the variation pattern in the flange for five different frames from the interactive visualization interface shown in Fig. 8. The slide bar control in the left panel shows the values of t for the five frames.

Fig. 10. Illustration of data along a quadratic principal curve with d = 3, r = 2 and σw = 0.1.


A quadratic principal curve lying in a two-dimensional linear variety of ℝ^d was used for all simulations. The principal curve was constructed by first generating the curve x2 = x1² for x1 ∈ [−1, 1], which is illustrated in Fig. 10 for d = 3. The curve was then multiplied by a Householder mirror reflection matrix in d-dimensional space to transform it without changing its shape or size, so that it still fell in a linear variety of dimension r = 2.

Table 1 compares the performance of the HS algorithm and the PCA-filtered HS algorithm for various d and σw, with a sample size of N = 200. Note that the value of σw in Fig. 10 corresponds to the midrange value of 0.1 in Table 1. The extent to which PCA filtering improves the accuracy (as measured by dHS/dHS-PCA) varied between 1.48 and 6.96, with larger relative improvement for larger d (higher dimensional data) and/or larger σw (larger noise-to-signal ratio). Simulations were also run for N = 100 and N = 400. The individual performance measures dHS and dHS-PCA were roughly inversely proportional to N, and their ratio was virtually the same for all values of N.


Table 1. Performance comparison of the HS and PCA-filtered HS algorithms for quadratic principal curves (N = 200)

d     σw     dHS      dHS-PCA   dHS/dHS-PCA
8     0.05   0.0031   0.0021    1.48
8     0.1    0.021    0.0089    2.36
8     0.2    0.43     0.16      2.69
16    0.05   0.0034   0.0019    1.79
16    0.1    0.056    0.021     2.67
16    0.2    0.63     0.19      3.32
64    0.05   0.0065   0.0025    2.60
64    0.1    0.24     0.059     4.07
64    0.2    3.56     0.68      5.24
100   0.05   0.022    0.0052    4.23
100   0.1    0.56     0.098     5.71
100   0.2    5.64     0.81      6.96


8. Conclusions

This paper develops a methodology for representing and blindly identifying nonlinear variation patterns in high dimensional manufacturing measurement data. The nonlinear variation pattern model is an extension of previous linear models that provides more accurate representation of complex nonlinear phenomena. To overcome computational and accuracy problems caused by high dimensionality, thereby making the approach more suitable for large-scale manufacturing databases, we used a common PCA-based dimensionality reduction step prior to principal curve estimation. We showed that the dimensionality reduction step sacrifices very little information with the type of variation patterns that one would typically encounter in manufacturing. The Monte Carlo simulation results demonstrate that the PCA filtering can also substantially improve the accuracy of the variation pattern estimation.

The paper also describes the use of graphical visualization methods for aiding in interpreting the blindly identified variation patterns. This allows potentially valuable information to be extracted from large volumes of manufacturing measurement data. Together, the pattern identification and visualization techniques provide an effective tool for diagnosing major root causes of manufacturing variation.

As a final comment, we point out that there are many nonlinear pattern identification and other data mining methods (e.g., those discussed in Hastie et al. (2002)) that one might have considered for this problem. The body of candidate methods shrinks considerably when one restricts attention to only those that fall into the category of unsupervised learning, as is required in our application. Furthermore, we desired an approach that could identify the nature of the variation pattern as blindly and generically as possible, without assuming any particular structural form for the functional relationship between the different variables (e.g., radial basis functions, sigmoidal functions associated with neural networks, wavelet basis functions, etc.). With these requirements, principal curve estimation (or the self-organizing maps of Kohonen (1990), which may be viewed as a discrete version of principal curves (Cherkassky and Mulier, 1998)) is perhaps the most suitable. Another attractive aspect of the principal curve approach is that it provides a convenient means for visualizing the variation pattern (refer to Figs. 8 and 9) following its blind identification, thereby helping process engineers to discover the nature and root cause of the variation.

References

Apley, D.W. and Lee, H.Y. (2003) Identifying spatial variation patterns in multivariate manufacturing processes: a blind separation approach. Technometrics, 45, 220–235.

Apley, D.W. and Shi, J. (1998) Diagnosis of multiple fixture faults in panel assembly. ASME Journal of Manufacturing Science and Engineering, 120, 793–801.

Apley, D.W. and Shi, J. (2001) A factor analysis method for diagnosing variability in multivariate manufacturing processes. Technometrics, 43, 1–12.

Banfield, J.D. and Raftery, A.E. (1992) Ice floe identification in satellite images using mathematical morphology and clustering about principal curves. Journal of the American Statistical Association, 87, 7–16.

Barton, R.R. and Gonzalez-Barreto, D.R. (1996) Process-oriented basis representation for multivariate process diagnostics. Quality Engineering, 9, 107–118.

Ceglarek, D. and Shi, J. (1996) Fixture failure diagnosis for the autobody assembly using pattern recognition. ASME Journal of Engineering for Industry, 118, 55–66.

Chang, K. and Ghosh, J. (2001) A unified model for probabilistic principal surfaces. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23, 22–41.

Cherkassky, V. and Mulier, F. (1998) Learning From Data: Concepts, Theory, and Methods, Wiley, New York, NY.

Delicado, P. (2001) Another look at principal curves and surfaces. Journal of Multivariate Analysis, 77, 84–116.

Delicado, P. and Huerta, M. (2003) Principal curves of oriented points: theoretical and computational improvements. Computational Statistics, 18, 293–315.

Ding, Y., Ceglarek, D. and Shi, J. (2002) Fault diagnosis of multistage manufacturing processes by using state space approach. ASME Journal of Manufacturing Science and Engineering, 124, 313–322.

Hastie, T. and Stuetzle, W. (1989) Principal curves. Journal of the American Statistical Association, 84, 502–516.

Hastie, T., Tibshirani, R. and Friedman, J. (2002) The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer, New York, NY.

Jin, J. and Shi, J. (1999) State space modeling of sheet metal assembly for dimensional control. ASME Journal of Manufacturing Science and Engineering, 121, 756–762.

Johnson, R.A. and Wichern, D.W. (2002) Applied Multivariate Statistical Analysis, 5th edn., Prentice Hall, Upper Saddle River, NJ.

Kegl, B., Krzyzak, A., Linder, T. and Zeger, K. (2000) Learning and design of principal curves. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22, 281–297.

Kohonen, T. (1990) The self-organizing map. Proceedings of the IEEE, 78, 1464–1479.


Lee, H.Y. and Apley, D.W. (2004) Diagnosing manufacturing variation using second-order and fourth-order statistics. International Journal of Flexible Manufacturing Systems, 16, 45–64.

Tibshirani, R. (1996) Principal Curves and Neural Networks, Cambridge University Press, Cambridge, UK.

Wang, X., Cao, J. and Li, M. (2001) Wrinkling analysis in shrink flanging. ASME Journal of Manufacturing Science and Engineering, 124, 426–432.

Zhou, S., Ding, Y., Chen, Y. and Shi, J. (2003) Diagnosability study of multistage manufacturing processes based on linear mixed-effects models. Technometrics, 45, 312–325.

Appendix

Proofs of Equations (4) and (5). First note that because f(t) ∈ a0 + M for all t, it must be the case that its mean µf also lies in a0 + M. Also, because W is zero-mean, µx = µf. Hence, µx ∈ a0 + M, so that the linear variety a0 + M is the same as µx + M. By the independence of W and t:

Σx = E[(X − µx)(X − µx)T] = E[(f(t) − µx + W)(f(t) − µx + W)T] = E[(f(t) − µx)(f(t) − µx)T] + σw²I.

Now consider any unit-norm vector z ∈ M⊥ (the orthogonal complement of M). Because f(t) − µx lies in M and z lies in the orthogonal complement of M, (f(t) − µx)Tz = 0 for all t. Therefore, Σxz = E[(f(t) − µx)(f(t) − µx)T]z + σw²Iz = σw²z. Hence, z is an eigenvector of Σx with eigenvalue σw². Because M⊥ is a (d − r)-dimensional subspace, there exists an orthonormal set of d − r such eigenvectors, which we denote {zr+1, zr+2, . . . , zd}. Because the remaining eigenvectors {z1, z2, . . . , zr} are orthogonal to {zr+1, zr+2, . . . , zd}, they must all lie in the r-dimensional subspace M. Consequently, M = span{z1, z2, . . . , zr}, which completes the proof of Equation (5).

Because zj ∈ M for 1 ≤ j ≤ r, its corresponding eigenvalue is

λj = zjT(λjzj) = zjT(Σxzj)
   = zjT{E[(f(t) − µx)(f(t) − µx)T] + σw²I}zj
   = E[zjT(f(t) − µx)(f(t) − µx)Tzj] + σw²
   = E[(zjT(f(t) − µx))²] + σw² > σw².

That E[(zjT(f(t) − µx))²] is strictly greater than zero follows from the condition that no other linear variety in which f(t) lies has dimension smaller than r. Indeed, if E[(zjT(f(t) − µx))²] = 0 for some 1 ≤ j ≤ r, this would imply that f(t) lies in the (r − 1)-dimensional linear variety µx + span{z1, z2, . . . , zj−1, zj+1, . . . , zr}.

Biographies

Daniel W. Apley received BS and MS degrees in Mechanical Engineering, an MS degree in Electrical Engineering, and a PhD degree in Mechanical Engineering in 1990, 1992, 1995, and 1997, respectively, all from the University of Michigan. He was an Assistant Professor of Industrial Engineering at Texas A&M University from 1998 to 2003. Since September 2003, he has been an Associate Professor of Industrial Engineering and Management Sciences and the Morris E. Fine Professor of Manufacturing at Northwestern University. His research interests include manufacturing variation reduction, with particular emphasis on processes in which advanced measurement, data collection, and automatic control technologies are prevalent. Much of his work involves the use of engineering models, knowledge and information within a statistical framework for analyzing large sets of process and product measurement data. He was an AT&T Bell Labs PhD Fellow in Manufacturing Science from 1993–1997, and he received the NSF Career Award from 2001–2006 for his research and teaching. He currently serves on the Editorial Board of the Journal of Quality Technology and as an Associate Editor of Technometrics. He is a member of IEEE, ASME, IIE, SME, and INFORMS.

Feng Zhang received B.S. and M.S. degrees in Control Theory and Engineering from Tsinghua University, China, in 1996 and 1999, respectively, and a Ph.D. degree in Industrial and System Engineering from Texas A&M University, College Station, TX, in 2004. He is currently an Operations Research Analyst at Fairchild Semiconductor, South Portland, ME. His research encompasses many aspects of data mining and statistical quality control, which involves the integration of pattern recognition and visualization with multivariate data analysis for identifying systematic causes of manufacturing variation. He is a member of INFORMS, IEEE, and DSI.