
IEEE TRANSACTIONS ON SEMICONDUCTOR MANUFACTURING, VOL. 23, NO. 2, MAY 2010

Introducing a Unified PCA Algorithm for Model Size Reduction

    Richard P. Good, Daniel Kost, and Gregory A. Cherry

Abstract: Principal component analysis (PCA) is a technique commonly used for fault detection and classification (FDC) in highly automated manufacturing. Because PCA model building and adaptation rely on eigenvalue decomposition of parameter covariance matrices, the computational effort scales cubically with the number of input variables. As PCA-based FDC applications monitor systems with more variables, or trace data with faster sampling rates, the size of the PCA problems can grow faster than the FDC system infrastructure will allow. This paper introduces an algorithm that greatly reduces the overall size of the PCA problem by breaking the analysis of a large number of variables into multiple analyses of smaller uncorrelated blocks of variables. Summary statistics from these subanalyses are then combined into results that are comparable to what is generated from the complete PCA of all variables together.

Index Terms: Combined index, computation time, fault detection, large scale systems, multivariate statistical process control (MSPC), principal component analysis (PCA), recursive PCA.

    I. INTRODUCTION

MULTIVARIATE fault detection and classification (FDC) has been widely applied in semiconductor manufacturing to quickly identify when a process is behaving abnormally. These abnormalities often result from faulty measurements, misprocessed wafers, process drifts or trends, tool aging, and tool failures. Studies have been reported on the analysis of optical emission spectroscopy data [1], site-level metrology data [2], trace-level process equipment data [2]–[6], and end-of-line electrical test data [7]–[9].

Several trends in the semiconductor manufacturing industry have made the use of FDC increasingly critical. First, the growing size of wafers increases the cost of missing a process fault. A missed fault during the processing of a 300 mm wafer puts roughly twice as many die in jeopardy compared to a 200 mm wafer. This will become even more critical as the industry shifts to 450 mm wafers or reduces the size of each die. Second, process sampling is fast becoming a more common practice [10]–[12]. As fewer wafers are measured, it becomes more critical to capture disturbances at the tool level. By waiting until product wafers are measured, it is possible to have misprocessed hundreds of wafers. In addition, if FDC is used on metrology tools, then waiting for univariate trends to signal a process drift puts too many wafers at risk.

Manuscript received May 31, 2009; revised November 30, 2009. Current version published May 05, 2010. This work was supported by the Advanced Process Control Group, GLOBALFOUNDRIES, Austin, TX and Dresden, Germany.

R. P. Good and G. A. Cherry are with GLOBALFOUNDRIES, Austin, TX, USA (e-mail: [email protected]; [email protected]).

    D. Kost is with GLOBALFOUNDRIES, Dresden, Germany (e-mail: [email protected]).

    Digital Object Identifier 10.1109/TSM.2010.2041263

Multivariate FDC enables quicker identification of process faults by better utilizing the available metrology data. The third trend is the ever-increasing cost of manufacturing equipment. The most expensive process equipment currently costs upwards of $40 million [13]. To ensure a return on such a large investment, maximizing equipment uptime is essential. Relying on multivariate FDC to identify abnormal processing allows for less frequent preventative maintenance and a lower likelihood of catastrophic failure, which results in greater equipment utilization.

Principal component analysis (PCA) is a technique commonly used to perform FDC in semiconductor manufacturing. The PCA algorithm is well-suited for semiconductor process data because of its robustness to collinearity. Furthermore, PCA uses the existing correlation structure to identify process faults and reduce false alarms. A model is first built to characterize the correlation of the data. New measurements are then compared to the model, and, if a new measurement is significantly different from the historical data, the measurement is classified as a fault.

As we will see in Section II, PCA loadings are calculated from the correlation matrix, the size of which scales quadratically with the number of variables, $m$. When the requirement exists to adjust to changes in correlation for new measurements, the overall size of the stored model scales as $O(m^2)$, and the computation effort for loading updates also grows rapidly with $m$, even when using the rank-one modification approach [14]. As PCA-based FDC applications monitor systems with more and more variables or on trace data with faster sampling rates, the FDC system infrastructure is faced with a growing storage and computational burden that can be difficult to overcome. This paper introduces an algorithm for breaking the PCA problem into multiple smaller problems, which greatly reduces the size of the models and the computation time for model generation and adaptation.

The idea of dividing a large PCA problem into two or more smaller ones has been investigated in the past. Wold et al. introduced the method of consensus PCA (CPCA), in which the complete set of process variables is split into blocks [15]. Upper-level modeling is performed to capture the relationships between blocks to generate super scores, while the relationships between variables of the same block are captured at the lower level. Hierarchical PCA (HPCA) has also been presented, which differs from CPCA only with respect to normalization [16], but it has been shown to have problems with convergence [17]. Subsequent research was performed that proved the equivalence between CPCA and regular PCA, such that block scores and loadings can be derived from a single PCA model constructed from all variables [17], [18].

0894-6507/$26.00 © 2010 IEEE


A commonality between the multiblock methods of CPCA and HPCA is that the estimation of both levels of model parameters must be performed simultaneously, which is useful for capturing correlation that exists from one block to another. However, for applications in which block-to-block correlation is minimal, little is gained by modeling the crosstalk across blocks. In such cases, it is more efficient to deploy models independently for each block of variables, which can be done without compromising fault detection accuracy. The independent modeling approach has seen significant use for batch process monitoring, in which trace data can be split using either engineering knowledge [19], [20] or phase identification algorithms [21], [22]. Yet one thing that is missing in these approaches is the ability to combine the results from the individual models to provide monitoring statistics at the overall process level. The method presented in this paper allows for the efficient use of smaller independent models while providing a mechanism for rolling up results to higher levels on demand.

In the rest of the paper, Section II provides a review of the recursive PCA algorithm, and the unified PCA algorithm (UPCA) is introduced in Section III. In Section IV, we demonstrate the UPCA algorithm on electrical test data, and we follow with some concluding remarks in Section V.

    II. RECURSIVE PCA

In this section, we briefly review the recursive PCA algorithm as well as three commonly used statistics for quantifying the severity of a fault.

    A. PCA

With PCA, a zero mean and unit variance data matrix $\mathbf{X} \in \mathbb{R}^{n \times m}$ of $n$ samples (rows) and $m$ variables (columns) is first transformed into a new orthogonal basis that is aligned along the directions of largest variation. This defines the eigenvalue problem using the sample covariance matrix according to

$$\mathbf{R} = \frac{1}{n-1}\,\mathbf{X}^{\mathrm{T}}\mathbf{X} \qquad (1)$$

$$\mathbf{R}\mathbf{V} = \mathbf{V}\boldsymbol{\Lambda} \qquad (2)$$

where the columns of $\mathbf{V} \in \mathbb{R}^{m \times m}$ are the eigenvectors (loadings) of the covariance matrix and $\boldsymbol{\Lambda}$ is the diagonal matrix of the corresponding eigenvalues. The scores $\mathbf{T} \in \mathbb{R}^{n \times m}$, which are the orthogonal projections of $\mathbf{X}$ in the new basis $\mathbf{V}$, can be obtained by

$$\mathbf{T} = \mathbf{X}\mathbf{V}. \qquad (3)$$

Data compression and dimension reduction are obtained by decomposing $\mathbf{X}$ into

$$\mathbf{X} = \hat{\mathbf{X}} + \tilde{\mathbf{X}} \qquad (4)$$

where $\hat{\mathbf{X}}$ and $\tilde{\mathbf{X}}$ are the modeled and residual components, respectively. Furthermore, $\hat{\mathbf{X}}$ and $\tilde{\mathbf{X}}$ can be written as

$$\hat{\mathbf{X}} = \mathbf{T}\mathbf{P}^{\mathrm{T}} \qquad (5)$$

$$\tilde{\mathbf{X}} = \tilde{\mathbf{T}}\tilde{\mathbf{P}}^{\mathrm{T}} \qquad (6)$$

where $\mathbf{P}$ contains the first $l$ eigenvectors (corresponding to the largest eigenvalues) of the correlation matrix, $\mathbf{R}$, and $\tilde{\mathbf{P}}$ contains the last $m - l$ eigenvectors of $\mathbf{R}$. The matrices $\mathbf{P} \in \mathbb{R}^{m \times l}$ and $\mathbf{T} = \mathbf{X}\mathbf{P} \in \mathbb{R}^{n \times l}$ are the loading and score matrices, respectively.

Fig. 1. Decomposition of the example data matrix, $\mathbf{X}$, into its modeled ($\hat{\mathbf{X}}$) and unmodeled ($\tilde{\mathbf{X}}$) components.

Likewise, the matrices $\tilde{\mathbf{P}} \in \mathbb{R}^{m \times (m-l)}$ and $\tilde{\mathbf{T}} = \mathbf{X}\tilde{\mathbf{P}}$ are the residual loading and score matrices. Because the columns of both $\mathbf{P}$ and $\tilde{\mathbf{P}}$ are orthogonal

$$\hat{\mathbf{X}}^{\mathrm{T}}\tilde{\mathbf{X}} = \mathbf{0} \qquad (7)$$

$$\mathbf{T}^{\mathrm{T}}\tilde{\mathbf{T}} = \mathbf{0}. \qquad (8)$$

It is worth noting that the choice of the appropriate number of principal components is critical when building a PCA model. Please refer to [23] and [24] for detailed discussions of criteria and methods for choosing the appropriate number of principal components. The main computational effort goes to the solution of the eigenvalue problem and the selection of the appropriate number of principal components, which increases with the number of variables monitored.

An example of PCA decomposition is provided in Fig. 1, where two correlated variables, $x_1$ and $x_2$, are plotted against each other. After scaling each of them to zero mean and unit variance, they are assembled into the data matrix, $\mathbf{X} = [\mathbf{x}_1 \;\; \mathbf{x}_2]$, prior to the application of PCA. The eigenvector corresponding to the largest eigenvalue provides the first principal direction and is shown in Fig. 1 as a dashed line. We have chosen to use only this one principal component, while the second principal component is relegated to the residual space. The modeled and residual components can now be calculated for every observation in Fig. 1. The modeled component is the distance from the origin along the principal eigenvector. Likewise, the residual component is the distance from the origin along the residual eigenvector.
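To make the decomposition of (1) through (8) concrete, the steps can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the authors' implementation; the function name `build_pca_model` and the choice to pass the number of principal components `l` directly (instead of selecting it by cross validation as in [23], [24]) are our own.

```python
import numpy as np

def build_pca_model(X_raw, l):
    """Scale to zero mean / unit variance, eigendecompose the correlation
    matrix, and split the basis into principal and residual parts."""
    b = X_raw.mean(axis=0)                  # vector of means
    sigma = X_raw.std(axis=0, ddof=1)       # standard deviations
    X = (X_raw - b) / sigma                 # scaled data matrix
    R = X.T @ X / (X.shape[0] - 1)          # correlation matrix, eq. (1)
    lam, V = np.linalg.eigh(R)              # eigendecomposition, eq. (2)
    lam, V = lam[::-1], V[:, ::-1]          # sort by decreasing eigenvalue
    P, P_res = V[:, :l], V[:, l:]           # principal / residual loadings
    T = X @ P                               # scores, eq. (3)
    return b, sigma, P, P_res, lam[:l], lam[l:], T
```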

    B. Recursive PCA

Two main aspects lead to the introduction of recursive PCA. First, the number of samples used for building models is not always sufficient for a representative estimation of the correlation structure between variables. When that is the case, it may be useful to deploy an immature model to provide immediate monitoring as soon as possible and then adapt that model as more data become available. Second, due to the time-varying behavior of some processes (such as equipment aging, sensor and process drifts, etc.), newer data are often more representative of the normal behavior of a process than older data [14].


In these cases, it is appropriate to adapt PCA models because these normal drifts and trends may otherwise be inaccurately identified as process faults. The PCA model can be adapted to a new unscaled data sample, $\mathbf{x}_{\mathrm{new}}$, by updating the estimate of the correlation matrix and scaling parameters according to

$$\mathbf{R} \leftarrow \lambda\mathbf{R} + (1-\lambda)\,\mathbf{x}\mathbf{x}^{\mathrm{T}} \qquad (9)$$

where

$$\mathbf{x} = \boldsymbol{\Sigma}^{-1}\left(\mathbf{x}_{\mathrm{new}} - \mathbf{b}\right) \qquad (10)$$

is the new scaled measurement, $\mathbf{b}$ is the vector of means, $\boldsymbol{\Sigma}$ is a diagonal matrix containing the standard deviations of the variables, and $\lambda \in [0, 1]$ is a tuning parameter that governs the rate at which the correlation matrix is updated [2]. The closer $\lambda$ is to zero, the more quickly the model will adapt. Conversely, when $\lambda$ is close to unity, the model will adapt gradually. Subsequent to the correlation matrix update, the loadings would be generated by performing a singular value decomposition on the result.
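A minimal sketch of the update in (9) and (10) follows. It is illustrative rather than the authors' implementation: for brevity it holds the mean vector and standard deviations fixed and refreshes the loadings with a full eigendecomposition, whereas recursive PCA [14] also updates the scaling parameters and can use the cheaper rank-one modification.

```python
import numpy as np

def update_correlation(R, x_new, b, sigma, lam=0.95):
    """One recursive update of the correlation matrix, eqs. (9)-(10)."""
    x = (x_new - b) / sigma                          # scaled sample, eq. (10)
    R_new = lam * R + (1.0 - lam) * np.outer(x, x)   # forgetting-factor blend, eq. (9)
    eigvals, eigvecs = np.linalg.eigh(R_new)         # refresh the loadings
    return R_new, eigvals[::-1], eigvecs[:, ::-1]    # eigenvalues in decreasing order
```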

    C. PCA Performance Indices

When a PCA model is applied to a new observation, $\mathbf{x}$, the two performance statistics that are normally considered are the squared prediction error (SPE) and Hotelling's $T^2$. The SPE indicates how much an observation deviates from the model and is defined as

$$\mathrm{SPE} = \tilde{\mathbf{x}}^{\mathrm{T}}\tilde{\mathbf{x}} = \mathbf{x}^{\mathrm{T}}\tilde{\mathbf{P}}\tilde{\mathbf{P}}^{\mathrm{T}}\mathbf{x}. \qquad (11)$$

Alternatively, Hotelling's $T^2$ indicates how much an observation deviates inside the model, and is calculated by

$$T^2 = \mathbf{x}^{\mathrm{T}}\mathbf{P}\boldsymbol{\Lambda}_l^{-1}\mathbf{P}^{\mathrm{T}}\mathbf{x} \qquad (12)$$

where $\boldsymbol{\Lambda}_l$ is a diagonal matrix containing the $l$ principal eigenvalues used in the PCA model. A process is considered normal if, as referenced in [25] and [4], respectively, both the SPE and $T^2$ statistics satisfy

$$\mathrm{SPE} \le \delta^2 = \frac{\theta_2}{\theta_1}\,\chi^2_{\alpha}\!\left(\frac{\theta_1^2}{\theta_2}\right) \qquad (13)$$

and

$$T^2 \le \tau^2 = \chi^2_{\alpha}(l) \qquad (14)$$

where $\chi^2_{\alpha}(\cdot)$ is the inverse of the chi-squared distribution function with the indicated degrees of freedom and a confidence interval of $\alpha$, and $\theta_1 = \sum_{i=l+1}^{m}\lambda_i$ and $\theta_2 = \sum_{i=l+1}^{m}\lambda_i^2$ are the sums of the residual eigenvalues and of their squares, respectively. The SPE and $T^2$ limits are shown for the example data set in Fig. 2. Here we see that $T^2$ would identify deviations in the modeled direction only, while the SPE would identify deviations in the residual direction only. Taken collectively, the $T^2$ and SPE limits would form a box around the raw data.

Fig. 2. $T^2$ and SPE limits for the example data set.

As an alternative to looking at the SPE and $T^2$ separately, Yue and Qin [4] introduce a combined metric, $\varphi$, which is a sum of the SPE and $T^2$ metrics weighted against their control limits

$$\varphi = \frac{\mathrm{SPE}}{\delta^2} + \frac{T^2}{\tau^2} = \mathbf{x}^{\mathrm{T}}\boldsymbol{\Phi}\mathbf{x} \qquad (15)$$

where

$$\boldsymbol{\Phi} = \frac{\tilde{\mathbf{P}}\tilde{\mathbf{P}}^{\mathrm{T}}}{\delta^2} + \frac{\mathbf{P}\boldsymbol{\Lambda}_l^{-1}\mathbf{P}^{\mathrm{T}}}{\tau^2}. \qquad (16)$$

Furthermore, Yue and Qin show that $\varphi$ is approximately proportional to the $\chi^2$ distribution

$$\varphi \sim g^{\varphi}\chi^2(h^{\varphi}) \qquad (17)$$

where

$$g^{\varphi} = \frac{l/\tau^4 + \theta_2/\delta^4}{l/\tau^2 + \theta_1/\delta^2} \qquad (18)$$

and

$$h^{\varphi} = \frac{\left(l/\tau^2 + \theta_1/\delta^2\right)^2}{l/\tau^4 + \theta_2/\delta^4}. \qquad (19)$$

The $\varphi$ limit is shown for the example data set in Fig. 3.
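The three indices and their limits can be sketched as follows. The code is illustrative; it assumes loadings and eigenvalues from a model built as in Section II-A, with `lam_pc` and `lam_res` holding the principal and residual eigenvalues, and it uses `scipy.stats.chi2.ppf` for the inverse chi-squared function.

```python
import numpy as np
from scipy.stats import chi2

def pca_indices(x, P, P_res, lam_pc, lam_res, alpha=0.99):
    """Return SPE, T^2, and phi for one scaled sample x, plus their limits."""
    t = P.T @ x                                   # scores in the model space
    r = P_res.T @ x                               # projection onto residual space
    spe = float(r @ r)                            # eq. (11)
    t2 = float(t @ (t / lam_pc))                  # eq. (12)
    l = len(lam_pc)
    theta1, theta2 = lam_res.sum(), (lam_res**2).sum()
    delta2 = (theta2 / theta1) * chi2.ppf(alpha, theta1**2 / theta2)  # eq. (13)
    tau2 = chi2.ppf(alpha, l)                     # eq. (14)
    phi = spe / delta2 + t2 / tau2                # eq. (15)
    g = (l / tau2**2 + theta2 / delta2**2) / (l / tau2 + theta1 / delta2)    # eq. (18)
    h = (l / tau2 + theta1 / delta2)**2 / (l / tau2**2 + theta2 / delta2**2) # eq. (19)
    zeta2 = g * chi2.ppf(alpha, h)                # limit on phi
    return spe, t2, phi, delta2, tau2, zeta2
```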

    D. Recursive PCA Model Sizes

Although PCA is in essence a data compression algorithm, when using recursive PCA one must keep a record of the entire correlation matrix. This implies that the PCA model size increases quadratically with the number of variables. Although this is not a concern with small models, several trends in semiconductor manufacturing are creating a situation in which PCA model sizes are becoming too large for FDC system infrastructures. First, process equipment data are unfolded to allow FDC to be applied on the trace-level sensor trajectories [26]. Because data are unfolded in the time direction, an increase in data collection frequency directly increases the number of variables in the PCA model. Second, we are seeing an increase in the number of variables being monitored. From a microprocessor manufacturing perspective, the industry has seen the introduction of multiple cores and additional levels of cache memory on the devices. This has dramatically increased the number of electrical parameters to monitor. In just the past couple of years, the number of parameters has increased approximately ten-fold, thus increasing the PCA model size by a factor of 100. Finally, we have seen an increase in the number of processes being monitored by PCA models. Although this does not affect the size of the models, the increase in the number of applications compounded with the increase in model sizes can drive the FDC system infrastructure to its limits.


Fig. 3. $\varphi$ limits for the example data set.


    III. A UNIFIED PCA ALGORITHM

When investigating sample covariance matrices, we often see that data are segregated into nearly uncorrelated blocks. Consider, for example, the correlation matrix of some electrical test data in Fig. 4. Here we see several blocks of variables with very little correlation between the blocks. Let us assume for the moment that the correlation between the blocks is negligible, such that the correlation matrix, $\mathbf{R}$, can be written in block-diagonal form

$$\mathbf{R} = \begin{bmatrix} \mathbf{R}_1 & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \mathbf{R}_2 & \cdots & \mathbf{0} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{0} & \mathbf{0} & \cdots & \mathbf{R}_B \end{bmatrix}. \qquad (20)$$

It is widely known that the eigenvalues of this block-diagonal matrix are the union of the eigenvalues of the individual blocks, $\mathbf{R}_b$ for $b = 1, \ldots, B$. Furthermore, the corresponding eigenvectors of $\mathbf{R}$ are the individual eigenvectors of the $\mathbf{R}_b$ filled with zeros at the positions of the other blocks (see, for example, [27], [28]). It follows that, if we abandon the practice of ordering the eigenvectors by the magnitude of the eigenvalues, the loadings and scores matrices, $\mathbf{P}$ and $\mathbf{T}$, can be rewritten as

$$\mathbf{P} = \begin{bmatrix} \mathbf{P}_1 & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \mathbf{P}_2 & \cdots & \mathbf{0} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{0} & \mathbf{0} & \cdots & \mathbf{P}_B \end{bmatrix} \qquad (21)$$

$$\mathbf{T} = \begin{bmatrix} \mathbf{T}_1 & \mathbf{T}_2 & \cdots & \mathbf{T}_B \end{bmatrix}. \qquad (22)$$

    Fig. 4. Correlation coefficients and blocking for the electrical test data.

Therefore, if the data are composed of uncorrelated blocks, and if we assign the appropriate number of principal components to each block, then it is possible to reduce the greater PCA model to a series of smaller PCA models. We term this method unified PCA (UPCA). One can think of two special cases. On one extreme, only one block is used, such that UPCA is identical to PCA. In this case, there is no performance loss and no model size reduction. On the other extreme, by assigning each variable to its own block, the number of nonzero elements in the covariance matrix is reduced from $m^2$ to $m$. Assuming that only nonzero values are stored in the model, the model size would be at a minimum, and the FDC approach would be reduced to a $\chi^2$ test on $m$ independent variables. In this sense, the proposed method can be considered a generalization of the PCA and $\chi^2$ approaches.
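The eigenvalue property underlying this reduction is easy to verify numerically. The following sketch (ours, with arbitrary block sizes) checks that the eigenvalues of a block-diagonal correlation matrix are the union of the blocks' eigenvalues:

```python
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(0)
# Two uncorrelated blocks of 3 and 4 variables (illustrative sizes).
blocks = [np.corrcoef(rng.standard_normal((200, k)), rowvar=False)
          for k in (3, 4)]
R = block_diag(*blocks)                     # block-diagonal form, eq. (20)

union = np.sort(np.concatenate([np.linalg.eigvalsh(Rb) for Rb in blocks]))
assert np.allclose(np.sort(np.linalg.eigvalsh(R)), union)
```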

Calculating the overall fault detection indices and their limits is straightforward, since we need only to operate on, at most, five scalar values per block ($\mathrm{SPE}_b$, $T^2_b$, $\theta_{1,b}$, $\theta_{2,b}$, and $l_b$). Since $\mathbf{R}$ is now block diagonal, SPE and $T^2$ can be calculated by summing the contributions from the individual blocks

$$\mathrm{SPE} = \sum_{b=1}^{B}\mathrm{SPE}_b \qquad (23)$$

$$T^2 = \sum_{b=1}^{B}T^2_b. \qquad (24)$$

Next, when calculating the SPE limits, we see that we are operating on two scalar values: $\theta_1$ and $\theta_2$. Again, using the block-diagonal form of $\mathbf{R}$, $\theta_1$ and $\theta_2$ can be calculated by summing over the individual blocks

$$\theta_1 = \sum_{b=1}^{B}\theta_{1,b} \qquad (25)$$

$$\theta_2 = \sum_{b=1}^{B}\theta_{2,b}. \qquad (26)$$

The SPE limit, $\delta^2$, follows from (13)

$$\delta^2 = \frac{\theta_2}{\theta_1}\,\chi^2_{\alpha}\!\left(\frac{\theta_1^2}{\theta_2}\right) \qquad (27)$$


TABLE I. UPCA ROLL-UP OF SUMMARY STATISTICS FROM SMALLER PCA BLOCKS TO FORM A LARGER PCA MODEL

and the $T^2$ limit, $\tau^2$, can be calculated from the total number of principal components used across the blocks, $l = \sum_{b=1}^{B} l_b$, by

$$\tau^2 = \chi^2_{\alpha}(l). \qquad (28)$$

If we wish to use the combined index $\varphi$, then SPE, $T^2$, $\theta_1$, $\theta_2$, and $l$ can be used directly with (15) through (19), such that the $\varphi$ fault limit is

$$\zeta^2 = g^{\varphi}\chi^2_{\alpha}(h^{\varphi}). \qquad (29)$$

An example of the UPCA algorithm is shown in Table I. Here we see the summary statistics for a UPCA model with six uncorrelated blocks. Each block has its own PCA model, and the summary statistics ($\theta_1$, $\theta_2$, and $l$) are shown for each model. Also shown in Table I are the performance metrics (SPE and $T^2$) of an observation for each of the six blocks. We operate on these five summary statistics to calculate the SPE limits $\delta^2$, the $T^2$ limits $\tau^2$, and the combined metric $\varphi$ and its limit $\zeta^2$ for each of the blocks. Then, to combine the blocks into a single PCA model (again, assuming correlation between blocks does not exist or can be neglected), the same operations are applied to the sums of the five summary statistics.
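A sketch of this roll-up is given below. It is illustrative, not the authors' code: each block (or each model being combined) is represented by a plain dictionary of its five scalars, and the limits are computed exactly as in (27) through (29).

```python
from scipy.stats import chi2

def upca_rollup(blocks, alpha=0.99):
    """Combine per-block scalars {'spe','t2','theta1','theta2','l'} into
    overall indices and limits, eqs. (23)-(29)."""
    spe = sum(b['spe'] for b in blocks)            # eq. (23)
    t2 = sum(b['t2'] for b in blocks)              # eq. (24)
    theta1 = sum(b['theta1'] for b in blocks)      # eq. (25)
    theta2 = sum(b['theta2'] for b in blocks)      # eq. (26)
    l = sum(b['l'] for b in blocks)
    delta2 = (theta2 / theta1) * chi2.ppf(alpha, theta1**2 / theta2)  # eq. (27)
    tau2 = chi2.ppf(alpha, l)                      # eq. (28)
    phi = spe / delta2 + t2 / tau2                 # eq. (15)
    g = (l / tau2**2 + theta2 / delta2**2) / (l / tau2 + theta1 / delta2)
    h = (l / tau2 + theta1 / delta2)**2 / (l / tau2**2 + theta2 / delta2**2)
    zeta2 = g * chi2.ppf(alpha, h)                 # eq. (29)
    return {'spe': spe, 't2': t2, 'phi': phi,
            'delta2': delta2, 'tau2': tau2, 'zeta2': zeta2}
```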

    A. Fault Diagnosis With UPCA

If a fault is observed, it is important to quickly determine the root cause of the fault so that engineering intervention can return the tool to its normal operating state. Specifically, we are interested in rapidly identifying the variable(s) that contribute to the fault [2]. The UPCA algorithm lends itself well to such a drill-down approach. In the example in Table I, we see that the combined statistic $\varphi$ is greater than its limit, indicating that a fault has likely occurred. After establishing that a fault has been observed, the next step would be to look at the contributions of the individual blocks. In this case, we see that Block 6 has the largest contribution. At this point, the engineer would look at the contributions of the individual variables within the block. The contribution to the SPE from a single variable, $x_i$, is

$$\mathrm{SPE}_i = \tilde{x}_i^2 = \left(\tilde{\mathbf{p}}_i\tilde{\mathbf{P}}^{\mathrm{T}}\mathbf{x}\right)^2. \qquad (30)$$

The contribution to the $T^2$ statistic according to Qin et al. [18] is

$$T^2_i = x_i\,\mathbf{p}_i\boldsymbol{\Lambda}_l^{-1}\mathbf{P}^{\mathrm{T}}\mathbf{x} \qquad (31)$$

and the contribution to the combined index is

$$\varphi_i = x_i\,\boldsymbol{\phi}_i\,\mathbf{x} \qquad (32)$$

where

$$\boldsymbol{\phi}_i = \frac{\tilde{\mathbf{p}}_i\tilde{\mathbf{P}}^{\mathrm{T}}}{\delta^2} + \frac{\mathbf{p}_i\boldsymbol{\Lambda}_l^{-1}\mathbf{P}^{\mathrm{T}}}{\tau^2} \qquad (33)$$

is row $i$ of $\boldsymbol{\Phi}$ from (16). In the above equations, $\mathbf{p}_i$ and $\tilde{\mathbf{p}}_i$ are used to denote row $i$ of $\mathbf{P}$ and $\tilde{\mathbf{P}}$, respectively. Limits for the variable contributions could be determined using (27) through (29), by considering each block to be a single variable.
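The contribution calculations can be sketched as follows, reusing the notation above. This is our illustration of the complete-decomposition contributions; each returned vector sums to its corresponding overall index.

```python
import numpy as np

def contributions(x, P, P_res, lam_pc, delta2, tau2):
    """Per-variable contributions to SPE, T^2, and phi, eqs. (30)-(33)."""
    x_res = P_res @ (P_res.T @ x)              # residual part of x
    c_spe = x_res**2                           # eq. (30); sums to SPE
    D = P @ np.diag(1.0 / lam_pc) @ P.T
    c_t2 = x * (D @ x)                         # eq. (31); sums to T^2
    Phi = P_res @ P_res.T / delta2 + D / tau2  # eq. (16)/(33)
    c_phi = x * (Phi @ x)                      # eq. (32); sums to phi
    return c_spe, c_t2, c_phi
```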

    B. Process Characterization With UPCA

We discussed in the previous section how it is often desirable to drill down to find the root cause of a process fault. However, often the opposite is also true, namely, that we wish to roll up process data to characterize larger groups of data. For example, we may wish to group multiple wafers into lots, multiple lots into products, multiple chambers into tools, or multiple tools into tool groups. In this section, we discuss using the UPCA algorithm to generate performance indices for groups of process runs.

One of the fundamental assumptions of the PCA algorithm is that there is no autocorrelation between process runs [29]. If this assumption holds, then one can apply the UPCA algorithm to combine recursive PCA models (or UPCA models) from multiple process runs. One should contrast this with the prospect of unfolding data in the wafer direction to build a lot-level PCA model. In that case, the PCA model for a 25-wafer lot would be 625 times as large. Because it may be unrealistic to unfold data and build a CPCA or HPCA model on an entire lot's worth of variables, it is reasonable with UPCA to combine the results of PCA as applied to single wafers to generate performance metrics for a lot. Lots can then be combined to characterize a product.

Using such an approach, a process engineer would first start by looking at a trend chart of lot-level SPE, $T^2$, or $\varphi$. If a lot's performance index exceeds its limit, the engineer would drill down to the wafer level. The wafer-level contributions would then be used to identify the faulty wafer. This pattern is continued to the PCA block and then the parameter level until the root cause is identified.
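As a usage sketch with made-up numbers, the `upca_rollup` function sketched in Section III composes directly across levels: each wafer contributes the same five scalars that a block does, so a 25-wafer lot rolls up in the same way.

```python
# Illustrative only: all values below are fabricated for the example.
wafer_stats = [{'spe': 3.1, 't2': 8.4, 'theta1': 12.0, 'theta2': 5.0, 'l': 10}
               for _ in range(25)]
lot_summary = upca_rollup(wafer_stats)
if lot_summary['phi'] > lot_summary['zeta2']:
    print('Lot exceeds the combined-index limit; drill down to wafers.')
```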

    C. Some Practical Considerations With UPCA

The use of UPCA has advantages and disadvantages. The obvious disadvantage is that the assumption of interblock orthogonality is only an approximation. Correlation between the blocks will cause the $T^2$ (and therefore $\varphi$) limits to be smaller than what we would expect from a PCA model that captures the covariance. Likewise, if this correlation structure is ignored, disruptions in the correlation structure between blocks are not captured by the SPE index, so it is possible to miss process faults. More interblock correlation causes the UPCA approximation to be less accurate. It follows that an FDC engineer must consider a tradeoff between model size and model performance.


Fig. 5. Fault detection for PCA and UPCA after removing the correlation between blocks.


UPCA also has the advantage of using process knowledge to declare that blocks are orthogonal. Because the PCA models are built with a finite data set, the models often identify relationships that do not actually exist. UPCA allows process knowledge to disallow such false relationships.

Finally, UPCA models can be constructed from the PCA blocks in situ. If engineers observe that certain PCA blocks are consistently causing faults in the performance indices without a real impact to the product, it is possible to simply remove these blocks from UPCA without having to rebuild a PCA model. Furthermore, it is possible to rebuild a single PCA block without having to rebuild the entire model.

    IV. SEMICONDUCTOR CASE STUDY

In this section, we apply the UPCA algorithm to wafer electrical test (WET) data. In this case, process engineers have grouped 484 variables into fifteen blocks where process knowledge has led them to believe that there should be little or no interaction between the blocks. It is worth noting here that one can imagine a method to automatically group blocks directly from the correlation structure. Although this grouping may not be optimal in the sense of creating orthogonal blocks, we have chosen to use product engineering's blocks because they capture the most physical meaning when doing root cause analysis. The correlation matrix and the blocking for this case study are shown in Fig. 4. The case study looks at a total of 800 wafers, where the first 500 wafers are used for model building and the final 300 wafers are used to test the model.

    A. Comparing PCA and UPCA

We first consider the scenario in which there is truly no correlation between the blocks. To achieve this, we have artificially set the off-block-diagonal components of the correlation matrix to zero. The PCA and UPCA algorithms are applied to this correlation matrix, and the $\varphi$ performance index is plotted in Fig. 5.

Fig. 6. Fault detection for PCA and UPCA when correlation exists between blocks.

The results demonstrate that, in the absence of interblock correlation, PCA and UPCA are identical. However, by using UPCA, the model size has been reduced by 89.4%, and the model was built (including eigenvalue decomposition, cross validation, and missing value reconstruction) 26 times faster.

Returning now to the correlation matrix illustrated in Fig. 4, we see that a small amount of correlation exists between the blocks. As such, the UPCA algorithm is only an approximation. Fig. 6 compares the UPCA and PCA algorithms when correlation exists between the blocks. We see that, even when correlation exists between the blocks, UPCA is a good approximation of PCA. Both methods identify most of the same wafers as being faulty, including a large disturbance at Wafer 250. There are only a select few smaller disturbances that cross the control limit for UPCA and not PCA. These can be considered false alarms for UPCA, and they could be avoided through a slight adjustment to the confidence interval, $\alpha$. As before, the UPCA model size is reduced by 89.4% and the model was built 26 times faster.

As mentioned in Section III, one extreme of UPCA is reached when every parameter is assigned to its own block. In this case, UPCA assumes that there is no correlation between variables, and the approach is identical to simply applying a $\chi^2$ test to the data. The drawback of this approach is that the ignored correlation causes additional false alarms in the model ($T^2$) space and missed faults in the residual (SPE) space. This is illustrated in Fig. 7. The observations inside the PCA limits but outside the $\chi^2$ limits would be false alarms. Observations inside the $\chi^2$ limits but outside the PCA limits would be missed faults.

Returning now to the case study, we can compare the performance of PCA with the $\chi^2$ application of UPCA. This comparison is shown in Fig. 8. Here we see that, although the model size is reduced by 99.7%, by completely ignoring all correlation we greatly increase the number of false alarms and miss a fault at Wafer 180. Clearly such an approach is inappropriate for monitoring a process, but it illustrates that a key contribution of the UPCA algorithm is that an FDC engineer can make a tradeoff between model performance and model size.


Fig. 7. Comparison of the $\varphi$ limits for PCA and the $\chi^2$ limits for UPCA when correlation exists between two variables.

Fig. 8. Fault detection for PCA and UPCA when correlation exists between blocks, with UPCA results based on the $\chi^2$ test.

    Fig. 9. Fault diagnosis using contributions from the fifteen PCA models.

Fig. 10. Fault diagnosis using contributions from the individual variables in Block 10.

TABLE II. UPCA ROLL-UP OF SUMMARY STATISTICS FROM SEVERAL LOTS TO GET OVERALL STATISTICS FOR THE PRODUCT

    B. Root Cause Analysis

Both PCA and UPCA show a sizeable fault at Wafer 250. We start to identify the root cause of the fault by looking at the contributions of the 15 individual PCA models. This is shown in Fig. 9, where each of the contributions is scaled by its limit. We see a clear signal at Block 10, where the block has exceeded its limit by a factor of nine. After identifying this block, we can now drill deeper into Block 10 to see the contributions from the individual variables. This is shown in Fig. 10. Here we see that ten individual variables in the block contribute to the fault. These data are then used by engineers to determine the severity of the fault and whether the wafer should be scrapped now to prevent further unnecessary processing.


    C. Lot-Level Summaries

The data used to train and test the model were processed in 25-wafer lots. We can now use the UPCA algorithm to combine the individual 25 wafers' results (25 × 484 variables) into lots to mimic the results as if a single PCA model with 12 100 variables were applied to all of the wafers of the lot simultaneously. Returning to our case study, the 300 test wafers are divided into 12 lots. The necessary summary statistics are shown in Table II, which shows that two lots are flagged as being over the limit. Drilling down into Lot 10, we would see the contributions for each of the wafers in the lot (including Wafer 250, with a known fault). The investigation would continue all the way to the parameter level.

Also shown in Table II is a summary for the twelve lots. This value could be used to summarize the health of an entire product for the time period in question. In this case, we see that even though we have a handful of faulty wafers, the product as a whole has not exceeded the fault criterion.

    V. CONCLUSION

In this paper, we introduce a new algorithm termed unified PCA that is used to combine multiple PCA models into a larger model. We show that if the variables in the individual model blocks are uncorrelated, then UPCA provides performance identical to PCA but with dramatically smaller model sizes. In practice, correlation between the blocks exists, and therefore UPCA is only an approximation of PCA. As such, a process engineer can tune UPCA to strike a balance between model size and fault detection accuracy.

Although PCA has the ability to drill down for fast root cause identification, UPCA introduces the capability of rolling up summary statistics. If PCA (or UPCA) models are created at the wafer level, then UPCA can be used to combine the wafers into lots. Likewise, lots can be combined to monitor entire products.

Finally, several additional advantages of the algorithm are discussed, including the ability to create and modify models in situ and the ability to use process knowledge to disallow insignificant correlation between unrelated variables.

As a final note, UPCA could greatly benefit from a method to automatically group variables. The authors have investigated a handful of methods but, to date, have found no satisfactory approach. As such, this remains an open topic and warrants further investigation.

    ACKNOWLEDGMENT

The authors gratefully acknowledge D. Kadosh, K. Chamness, and B. Harris for implementing the Test Parameter Analysis application in GLOBALFOUNDRIES Fab 1 and Spansion's Fab 25.

    REFERENCES

[1] H. Yue, S. Qin, R. Markle, C. Nauert, and M. Gatto, "Fault detection of plasma etchers using optical emission spectra," IEEE Trans. Semicond. Manuf., vol. 13, no. 3, pp. 374–385, 2000.
[2] G. Cherry and S. Qin, "Multiblock principal component analysis based on a combined index for semiconductor fault detection and diagnosis," IEEE Trans. Semicond. Manuf., vol. 19, no. 2, pp. 159–172, 2006.
[3] B. Wise, N. Gallagher, S. Butler, J. D. D. White, and G. Barna, "A comparison of principal component analysis, multiway principal component analysis, trilinear decomposition and parallel factor analysis for fault detection in a semiconductor etch process," J. Chemometr., vol. 13, pp. 379–396, 1999.
[4] H. Yue and S. Qin, "Reconstruction based fault detection using a combined index," Ind. Eng. Chem. Res., vol. 40, no. 20, pp. 4403–4414, 2001.
[5] H. Yue and M. Tomoyasu, "Weighted principal component analysis and its applications to improve FDC performance," in Proc. 43rd IEEE Conf. Decision Contr., Atlantis, Paradise Island, Bahamas, 2004, pp. 4262–4267.
[6] Q. He and J. Wang, "Fault detection using the k-nearest neighbor rule for semiconductor manufacturing processes," IEEE Trans. Semicond. Manuf., vol. 20, no. 4, pp. 345–354, 2007.
[7] L. Yan, "A PCA-based PCM data analyzing method for diagnosing process failures," IEEE Trans. Semicond. Manuf., vol. 19, no. 4, pp. 404–410, 2006.
[8] K. Skinner, D. Montgomery, G. Runger, J. Fowler, D. McCarville, T. Rhoads, and J. Stanley, "Multivariate statistical methods for modeling and analysis of wafer probe test data," IEEE Trans. Semicond. Manuf., vol. 15, no. 4, pp. 523–530, 2002.
[9] K. Chamness, "Multivariate Fault Detection and Visualization in the Semiconductor Industry," Ph.D. dissertation, Univ. of Texas, Austin, 2006.
[10] A. Holfeld, R. Barlovic, and R. Good, "A fab-wide APC sampling application," IEEE Trans. Semicond. Manuf., vol. 20, no. 4, pp. 393–399, 2007.
[11] R. Good and M. Purdy, "An MILP approach to wafer sampling and selection," IEEE Trans. Semicond. Manuf., vol. 20, no. 4, pp. 400–407, 2007.
[12] M. Purdy, K. Lensing, and C. Nicksic, "Method for efficiently handling metrology queues," in Proc. Int. Symp. Semicond. Manuf., San Jose, CA, 2005, pp. 71–74.
[13] M. LaPedus, "Lithography vendors prep for the next round: Rival immersion scanners roll for 45-nm node," EE Times, Jul. [Online]. Available: http://www.eetimes.com/showArticle.jhtml?articleID=190300855
[14] W. Li, H. Yue, S. Valle, and J. Qin, "Recursive PCA for adaptive process monitoring," J. Proc. Cont., vol. 10, pp. 471–486, 2000.
[15] S. Wold, S. Hellberg, T. Lundstedt, M. Sjöström, and H. Wold, in Proc. Symp. PLS Model Building: Theory and Applications, 1987.
[16] S. Wold, N. Kettaneh, and K. Tjessem, "Hierarchical multiblock PLS and PC models for easier model interpretation and as an alternative to variable selection," J. Chemometr., vol. 10, pp. 463–482, 1996.
[17] J. A. Westerhuis, T. Kourti, and J. MacGregor, "Analysis of multiblock and hierarchical PCA and PLS models," J. Chemometr., vol. 12, pp. 301–321, 1998.
[18] S. Qin, S. Valle, and M. Piovoso, "On unifying multi-block analysis with applications to decentralized process monitoring," J. Chemometr., vol. 15, pp. 715–742, 2001.
[19] K. Kosanovich, K. Dahl, and M. Piovoso, "Improved process understanding using multiway principal component analysis," Ind. Eng. Chem. Res., vol. 35, pp. 138–146, 1996.
[20] C. Undey and A. Cinar, "Statistical monitoring of multistage, multiphase batch processes," IEEE Control Syst. Mag., vol. 22, no. 5, pp. 40–52, 2002.
[21] N. Lu, F. Gao, and F. Wang, "Sub-PCA modeling and on-line monitoring strategy for batch processes," AIChE J., vol. 50, no. 1, pp. 255–259, 2004.
[22] J. Camacho and J. Picó, "Multi-phase principal component analysis for batch processes modelling," Chemom. Intell. Lab. Syst., vol. 81, pp. 127–136, 2006.
[23] S. Wold, "Cross validatory estimation of the number of components in factor and principal component analysis," Technometrics, vol. 20, no. 4, pp. 397–406, Nov. 1978.
[24] S. Valle, W. Li, and S. Qin, "Selection of the number of principal components: A new criterion with comparison to existing methods," Ind. Eng. Chem. Res., vol. 38, no. 11, pp. 4389–4401, 1999.
[25] J. Jackson and G. Mudholkar, "Control procedures for residuals associated with principal component analysis," Technometrics, vol. 21, no. 3, pp. 341–349, Aug. 1979.
[26] S. Wold, P. Geladi, K. Esbensen, and J. Öhman, "Multi-way principal components and PLS analysis," J. Chemometr., vol. 1, pp. 41–56, 1987.
[27] U. von Luxburg, "A tutorial on spectral clustering," Statist. Comput., vol. 17, no. 4, pp. 395–416, 2007.
[28] L. Hogben, Linear Algebra, ser. Discrete Mathematics and Its Applications. London, U.K.: Chapman and Hall, 2006.
[29] S. Wold, K. Esbensen, and P. Geladi, "Principal component analysis," Chemom. Intell. Lab. Syst., vol. 2, pp. 37–52, 1987.


Richard P. Good received the B.S. degree in chemical engineering from the Georgia Institute of Technology, Atlanta, in 2000, and the M.S. and Ph.D. degrees in chemical engineering from the University of Texas at Austin in 2002 and 2004, respectively.

He is currently a Member of the Technical Staff in the Advanced Process Control group, GLOBALFOUNDRIES, Austin. His research interests include multivariate run-to-run control and fault detection, wafer sampling and selection, supervisory electrical parameter control, and yield prediction.

Daniel Kost received the diploma and Ph.D. degrees in physics from T.U. Dresden, Germany, in 2003 and 2007, respectively.

He is currently a Software and Application Engineer in the Advanced Process Control group, GLOBALFOUNDRIES, Dresden, Germany. Prior to joining GLOBALFOUNDRIES, he performed internships with the James R. McDonald Laboratory and Kansas State University. He also worked as a scientist with the Dresden-Rossendorf Research Center, Dresden, in the fields of ion-solid interaction, plasma physics, and highly charged ion physics. His research interests include multivariate fault detection, yield prediction, yield-loss classification, and multivariate fault detection on wafer electrical test data, in addition to physics-related topics.

Gregory A. Cherry received the B.S. degree in chemical engineering from the University of Maryland, College Park, in 2000, and the M.S. and Ph.D. degrees in chemical engineering from the University of Texas, Austin, in 2002 and 2006, respectively.

He is currently a Member of the Technical Staff with GLOBALFOUNDRIES, Austin, TX. His research interests include fault detection and diagnosis as applied to semiconductor processes. Prior to joining GLOBALFOUNDRIES, he performed internships with Degussa-Hüls Corporation, Mobile, AL, and the Army Research Laboratory, Aberdeen Proving Ground, MD.