
An Empirical Comparison of Alternative Methods for Principal Component Extraction

Raymond Hubbard, Drake University

Stuart J. Allen, The Pennsylvania State University, Behrend College

A major problem confronting users of principal component analysis is the determination of how many components to extract from an empirical correlation matrix. Using 30 such matrices obtained from marketing and psychology sources, the authors provide a comparative assessment of the extraction capabilities exhibited by five principal component decision rules. These are the Kaiser-Guttman, scree, Bartlett, Horn, and random intercepts procedures. Application of these rules produces highly discrepant results. The random intercepts and Bartlett formulations yield unacceptable component solutions by grossly under- and overfactoring respectively. The Kaiser-Guttman and scree rules performed equivalently, yet revealed tendencies to overfactor. In comparison, Horn's test acquitted itself with distinction, and warrants greater attention from applied researchers.

Introduction

Although the researcher employs a variety of factor analytic procedures, the approach featuring principal component analysis dominates the literature both in marketing [1] and the behavioral sciences [31, 39, 40, 41, 42, 44]. (Please see Kendall and Buckland [30] for a definition of principal components.) However, the issue of how many components (factors) to extract from an empirical correlation matrix has long beleaguered scholars [7]. As Stewart [36, p. 58] cogently reminds us: "Perhaps no problem has generated more controversy and misunderstanding than the number-of-factors problem." Efforts to resolve this enigma have resulted in the development of several different rules for determining the point at which principal component extraction should terminate. These stopping rules, all involving a consideration of eigenvalues, include the Kaiser-Guttman rule, Cattell's scree test, Bartlett's test, and Horn's parallel analysis. Finally, it is instructive to note

Send correspondence to Raymond Hubbard, College of Business Administration, Drake University, Des Moines, Iowa 50311

Journal of Business Research 15, 173-190 (1987). © 1987 Elsevier Science Publishing Co., Inc., 52 Vanderbilt Ave., New York, NY 10017



that confirmatory factor analysis has not solved the number-of-components problem (see Cliff [12] in this regard).

Virtually no research evidence is available at the empirical level concerning the comparative efficacy of any of these alternative methods. Given their different underlying rationales, one would anticipate some variation among the rules as to the suggested number of principal components to retain. This being the case, a series of questions arises, for example: Does application of these tests to a given set of data produce approximately equivalent or corroborative findings? If incongruous results are obtained, are they amenable to reconciliation? Should the use of some of these stopping rules be completely abandoned? On what basis does a researcher select a particular stopping rule? How is this selection justified? Is any one decision rule demonstrably superior to the others? In view of such questions, we pursue the following method in the present paper. First, we undertake a brief review of certain attributes of the Kaiser-Guttman, scree, Bartlett, and Horn tests. Second, we summarize a new procedure for establishing significant component retention, the random intercepts test. Third, we present the results of previous studies that examined the merits of various stopping rules using simulated data sets. Fourth, we offer an extensive comparative assessment of the performance characteristics of the Kaiser-Guttman, scree, Bartlett, Horn, and random intercepts tests, based on empirical data garnered from the marketing and psychology literature. Finally, we make strong recommendations about the direction for further research in this problematic area.

A Succinct Review of the Kaiser-Guttman, Scree, Bartlett, Horn, and Random Intercepts Tests

In ascending order of desirability, Kaiser [28], whose original classification scheme has been subsequently embellished by Hakstian and Muller [21], suggested that the extraction of principal components be predicated on 1) statistical, 2) algebraic, 3) psychometric, and 4) psychological meaningfulness/importance/interpretability criteria. Statistical approaches to the number-of-components problem typically involve the construction of significance tests for assessing the distinguishability of the eigenvalues (latent roots) of a sample-based correlation matrix [21]. Algebraic methods, on the other hand, assume that one is working with a population correlation matrix, the object being to find the smallest number of components necessary to sufficiently reproduce that original matrix. More formally, the number of components to extract is equivalent to estimating the minimum rank of the correlation matrix [18].

The notion of reliability is at the core of the psychometric approach. Whereas classical statistical inference assumes that one is sampling from a population of individuals, psychometric inference involves sampling from a population of variables (usually human performance tests). The reliability issue concerns the adequacy of the psychometric sampling and led to the development of the Kuder-Richardson reliability test [34]. This test has connections with the Kaiser-Guttman rule, as is noted below. Finally, Cattell and Vogelmann [11] observe that psychometric approaches treat the sample as if it were the population. The psychological (marketing, etc.) importance category refers to the ability to satisfactorily interpret or make sense of the principal components generated in any given study.


The Kaiser-Guttman (KG) rule, which is primarily psychometric in nature, is easily the criterion most widely used for determining principal component removal in marketing and the behavioral sciences [1, 22, 31, 40, 43]. So overwhelming is this domination that the rule is generally included as the default option in most statistical packages. Originating with Guttman's [20] seminal work on the lower bounds of correlation matrices, and later augmented by Kaiser's [29] demonstration that for an eigenvalue to have positive Kuder-Richardson reliability it should be of a magnitude greater than one, the KG rule specifies the retention of all principal components of a correlation matrix having eigenvalues equal to or exceeding unity. For this reason it is commonly referred to as the eigenvalue-one or (latent) roots criterion.

In addition to its psychometric appeal, the KG rule also possesses a mathematical rationale that is straightforward. With p and λ denoting the number of variables and the eigenvalues respectively, it should be noted that if λ_i is the eigenvalue of the ith principal component, it can be shown that λ_i/p is the proportion of the variation in the standardized data that can be accounted for by that component. Even if the components were generated by random groupings of the p variables, each component on average would be expected to "explain" 1/p of the variation in the data. It would therefore have an eigenvalue of one. Thus, when a component has an eigenvalue equal to unity, it is considered to be contributing its "average" share toward the explanation of the variation in the observed scores.
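As a concrete illustration (not part of the original article), the rule can be applied directly to the eigenvalues of a correlation matrix. The sketch below, in Python with NumPy, assumes R is a correlation matrix supplied by the analyst; the function name and the small example matrix are ours.

```python
import numpy as np

def kaiser_guttman(R):
    """Kaiser-Guttman rule: retain components of the correlation matrix R
    whose eigenvalues equal or exceed unity, and report the proportion of
    total variation those components account for."""
    eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]   # eigenvalues, largest first
    retained = int(np.sum(eigvals >= 1.0))           # eigenvalue-one criterion
    explained = eigvals[:retained].sum() / len(eigvals)
    return retained, explained

# Illustrative 3-variable correlation matrix (hypothetical values).
R = np.array([[1.0, 0.6, 0.3],
              [0.6, 1.0, 0.2],
              [0.3, 0.2, 1.0]])
print(kaiser_guttman(R))   # roughly (1, 0.61) for this small example
```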

The scree test, developed by Cattell [8], may be classified as belonging to the psychometric/importance categories referenced above [21, 28]. This test requires the successive plotting of eigenvalues after they have been ordered from large to small. A visual inspection of the resultant graph is then made in an effort to detect a convincing elbow or break in the curve. The number of components to be retained is decided by observing the number of eigenvalues lying above the elbow.

The rationale for the scree test rests on the assumption that components accounting for substantial proportions of the variation in a data set (i.e., those with large eigenvalues) represent major or meaningful dimensions. As such they ought to be clearly distinguishable from components of small variation since the latter are understood to reflect nothing more than different combinations of measurement and sampling error. In short, a convincing break in the plot of eigenvalues should be present, just as the slope of a mountain is readily identifiable from that of the scree or rubble found at its base [24].
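A minimal plotting sketch (not from the paper; matplotlib is assumed to be available) that produces the ordered eigenvalue plot whose elbow the analyst then inspects:

```python
import numpy as np
import matplotlib.pyplot as plt

def scree_plot(R):
    """Plot the eigenvalues of correlation matrix R from largest to smallest
    so that an elbow, if present, can be located by visual inspection."""
    eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]
    plt.plot(range(1, len(eigvals) + 1), eigvals, marker="o")
    plt.axhline(1.0, linestyle="--")            # eigenvalue-one line for reference
    plt.xlabel("Eigenvalue number")
    plt.ylabel("Magnitude of eigenvalue")
    plt.title("Scree plot")
    plt.show()
```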

Bartlett [4, 5] addresses the number-of-components problem from a statistical significance perspective. He first devised the sphericity test, which is chiefly concerned with deciding whether an empirical correlation matrix warrants a principal component analysis in the first place. Alternatively stated, the sphericity test statistically ascertains the extent to which an empirical correlation matrix differs from an identity matrix. If this difference is not significant, then there is no rational basis to support a principal component analysis [37]. The sphericity test, distributed approximately as chi-square for large samples, is computed as follows:

$$\chi^2 = -\left[(n - 1) - \frac{2p + 5}{6}\right] \ln|R|,$$

where p refers to the number of variables in the empirical correlation matrix R, n denotes sample size, ln|R| is the natural logarithm of the determinant of the


correlation matrix, and [0.5(p² - p)] indicates the degrees of freedom. If the computed chi-square exceeds the appropriate tabled value, the correlation matrix is suitable for a principal component analysis [15].
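A hedged sketch of the computation just described (Python; SciPy is assumed for the chi-square tail probability, and the function name is ours, not Bartlett's):

```python
import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(R, n):
    """Bartlett's sphericity test for a p x p sample correlation matrix R
    based on n observations; returns the chi-square statistic, its degrees
    of freedom, and the upper-tail p value."""
    p = R.shape[0]
    chi_sq = -((n - 1) - (2 * p + 5) / 6.0) * np.log(np.linalg.det(R))
    df = 0.5 * (p ** 2 - p)
    return chi_sq, df, chi2.sf(chi_sq, df)
```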

Bartlett also formulated a significance test to be applied to those residual matrices produced by successive principal component removal. The test concerns the equality of those eigenvalues remaining after their r larger predecessors have been extracted. That is, one tests the hypothesis λ_{r+1} = λ_{r+2} = . . . = λ_p. The point at which this hypothesis can no longer be rejected determines the number of significant principal components. The formula for calculating Bartlett's residual test is given by:

$$\chi^2 = -\left[(n - 1) - \frac{2p + 5}{6} - \frac{2r}{3}\right] \ln W_{p-r},$$

where

$$W_{p-r} = \frac{|R|}{\lambda_1 \lambda_2 \cdots \lambda_r \left(\dfrac{p - \lambda_1 - \lambda_2 - \cdots - \lambda_r}{p - r}\right)^{p-r}},$$

p and n are as previously defined, |R| is the determinant of the correlation matrix, λ_i is the magnitude of an eigenvalue, r indicates the number of extracted eigenvalues, and W_{p-r} designates the value of the residuals. In this instance the degrees of freedom are provided by the expression [0.5(p - r - 1)(p - r + 2)].
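Following the formulas above, the residual test can be sketched as below (an illustrative Python rendering, not the authors' code); it increases r until the equality hypothesis for the remaining eigenvalues can no longer be rejected.

```python
import numpy as np
from scipy.stats import chi2

def bartlett_residual_count(R, n, alpha=0.05):
    """Number of principal components judged significant by Bartlett's
    residual test applied to the correlation matrix R (sample size n)."""
    p = R.shape[0]
    eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]
    det_R = np.linalg.det(R)
    for r in range(p - 1):
        # W_{p-r}: |R| divided by the first r eigenvalues and the (p - r)th
        # power of the mean of the remaining eigenvalues.
        mean_rest = (p - eigvals[:r].sum()) / (p - r)
        W = det_R / (np.prod(eigvals[:r]) * mean_rest ** (p - r))
        chi_sq = -((n - 1) - (2 * p + 5) / 6.0 - 2.0 * r / 3.0) * np.log(W)
        df = 0.5 * (p - r - 1) * (p - r + 2)
        if chi2.sf(chi_sq, df) > alpha:        # equality hypothesis not rejected
            return r
    return p - 1
```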

Dissatisfied with the inability of the KG rule to reflect sampling error, Horn [23] formulated a procedure called parallel analysis, the rationale for which is explained below. Consider a set of p variates with very large samples of size n drawn independently from a normally distributed population of random numbers, N(0,1). If these p variables are then correlated, the resultant p x p matrix will approximate an identity matrix. Therefore, the expected values of the p eigenvalues associated with this matrix will all be close to unity.

However, Horn continues, as long as n is less than infinite the successive plotting of these eigenvalues (ordered from large to small) will not result in the anticipated horizontal line at λ = 1.0 but instead will bear the shape of a curve, the so-called reference curve, whose departure from the horizontal line is a function of sampling error (see Figure 1). The greater the sample size, the smaller the deviation of the reference curve from the horizontal line at λ = 1.0 and vice versa. Horn contended that since the shape of the reference curve reflects the influence of sampling error, the correct number of components to extract in any given empirical study should be limited to those whose eigenvalues exceed their counterparts indicated by the reference curve. This occurs at the point of intersection between the graph of the plotted eigenvalues derived from the empirical study and the corresponding graph depicting the reference curve. As Figure 1 illustrates, Horn's test will therefore have a tendency to elicit fewer components than those suggested by the KG criterion.

The reference curve itself is established by randomly generating a large number of correlation matrices (K), all of the same order (p) and sample size (n) as the empirical matrix under scrutiny. Conventional wisdom suggests that K should be on the order of between 30 and 50 replications per empirical correlation matrix [27, 45]. This accounts for the expression, "parallel analysis." A principal component

analysis is then conducted on each of these matrices, thereby yielding K sets of 1, 2, 3, . . . , p eigenvalues. The average values of the first, second, and so on through p eigenvalues are then calculated over the K matrices and are plotted as the reference curve on the same graph as the empirical data. As mentioned above, the number of components to retain is indicated by the point at which the eigenvalue plot of the empirical data intersects that of the reference curve. Green [19] also supplies a lucid account of Horn's parallel analysis.

Figure 1. Hypothetical illustration of Horn's test. [Figure: magnitude of eigenvalue (y-axis, 0 to 5.0) plotted against eigenvalue number, 1 to p (x-axis); the empirical (observed/sample) eigenvalue curve intersects the reference curve, which reflects sampling error, with the Horn cutoff falling below the KG cutoff.]
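For concreteness, a compact sketch of the procedure (Python; the replication count K = 50 and the seed are illustrative choices consistent with the 30 to 50 replications mentioned above, not values fixed by Horn):

```python
import numpy as np

def parallel_analysis(R, n, K=50, seed=0):
    """Horn's parallel analysis: retain leading components of the empirical
    correlation matrix R whose eigenvalues exceed the corresponding mean
    eigenvalues of K random-data correlation matrices of the same p and n."""
    rng = np.random.default_rng(seed)
    p = R.shape[0]
    empirical = np.sort(np.linalg.eigvalsh(R))[::-1]

    random_eigs = np.empty((K, p))
    for k in range(K):
        X = rng.standard_normal((n, p))           # n draws on p N(0,1) variates
        R_rand = np.corrcoef(X, rowvar=False)     # random-data correlation matrix
        random_eigs[k] = np.sort(np.linalg.eigvalsh(R_rand))[::-1]
    reference = random_eigs.mean(axis=0)          # the reference curve

    retained = 0
    for emp, ref in zip(empirical, reference):    # stop at the first crossing
        if emp > ref:
            retained += 1
        else:
            break
    return retained
```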

Finally, the recently proposed random intercepts (RI) test deserves commentary. Attributable to Pandit [25, 35], the RI test introduces a probabilistic criterion for directly evaluating the significance of eigenvalues. By drawing upon the random intercepts or random partitions model of probability theory, this test determines whether empirically obtained principal components are significantly larger than those generated randomly on a line segment. Borrowing heavily from Hubbard and Pandit [25] and Pandit [35], we offer the following brief discussion of the RI methodology.


I p-A’1 Figure 2. Hypothetical illustration of the RI test.

Consider a line of length p to represent the sum of the eigenvalues associated with a correlation matrix of rank p (see Figure 2). Here p - 1 random intercepts made on this line result in segments of length λ_i that denote the eigenvalues of random components ($p = \sum_{i=1}^{p} \lambda_i$). Assume the leftmost segment, or eigenvalue, to be of magnitude λ_1. Then λ_1 will be considered to be significant if its value equals or exceeds that of the smallest significant eigenvalue, λ*.

The means of obtaining λ* is as follows. Because the intercepts are made at random, the probability that one of them falls to the left of λ_1 is given by λ_1/p; the probability that an intercept falls to the right of λ_1 is (p - λ_1)/p; and the probability that all intercepts fall to the right of λ_1 is:

$$P(\lambda_1, p) = \left(\frac{p - \lambda_1}{p}\right)^{p-1} \qquad (4)$$

Note, in addition, that Eq. (4) also expresses the probability that the leftmost eigenvalue is at least of length λ_1.

For λ_1 to be regarded as significant, it should equal or exceed the value λ*, where:

$$\left(\frac{p - \lambda^*}{p}\right)^{p-1} = \alpha, \qquad (5)$$

and α designates alternative significance levels. Solving Eq. (5) for λ* we obtain:

$$\lambda^* = p\left(1 - \alpha^{1/(p-1)}\right) \qquad (6)$$

Thus, for various levels of α and p we can compute the required values for λ*. If λ_1 ≥ λ*, the eigenvalue is taken to be significant. In other words, one compares


the sizes of the eigenvalues produced in empirical studies against those of the appropriate λ*'s, retaining that number of components whose eigenvalues are greater than or equal to those λ*'s.
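A brief sketch of the RI criterion embodied in Eqs. (4)-(6) (Python; one common reading is shown, in which every empirical eigenvalue is compared against the single cutoff λ*, and the function name is ours):

```python
import numpy as np

def random_intercepts(R, alpha=0.05):
    """Random intercepts (RI) test: compute lambda* = p(1 - alpha**(1/(p-1)))
    from Eq. (6) and count eigenvalues of R that equal or exceed it."""
    p = R.shape[0]
    lam_star = p * (1.0 - alpha ** (1.0 / (p - 1)))
    eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]
    return int(np.sum(eigvals >= lam_star)), lam_star
```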

Principal Component Extraction with Simulated Data Sets

A number of studies have used simulated data sets in order to evaluate the respective merits of various component decision rules. Prior to discussing these studies, it is worthwhile to allude to some of the advantages of using simulated data. When using simulated data, the researcher is afforded comfortable latitude with respect to the design of an experiment. Thus one can exercise varying levels of control over such features as the sample size and its method of generation, the number of variables that are assumed to load significantly on each factor, the size and range of communalities, the number of components that are to be programmed into the correlation matrix, and so on. These items, in turn, can be easily manipulated, and the experiments replicated, until stable or dependable results are achieved. Such activities are largely beyond the purview of the empirical researcher.

No simulated research results are available concerning the behavior of the RI procedure, and only limited evidence exists regarding the adequacy of Bartlett's test. Nevertheless, after analyzing a comprehensive data set of some 480 sample correlation matrices generated from different combinations of two variable sizes (p = 36, 72), four sample sizes (n = 72, 144, 180, 360), and three known component structures (PC = 3, 6, 9), Zwick and Velicer [45] found Bartlett's test on average to be accurate only about 30% of the time. Moreover they found that the method overestimates the "true" number of components, particularly with increases in sample size.

Monte Carlo evidence attesting to the capabilities of the KG and scree rules is relatively more plentiful. Based on the investigation of 15 population correlation matrices, with variable sizes ranging from eight to 40, and preprogrammed (or "true" number of) factors varying between two and 20, Cattell and Vogelmann [11] concluded that the KG rule underfactors with small variable sets, but tends to overfactor with large ones. The scree test was judged to be superior. Linn's [32] Monte Carlo results were based on seven preprogrammed factors, variable sizes equal to 20 or 40, and sample sizes of either 100 or 500. Out of 16 decision conditions, the KG rule estimated the known number of factors correctly on six occasions, underestimated this value on five counts, and overestimated it on the remaining five. But whereas the degree of underestimation was minor, overestimations were of a gross character, indicating on average the extraction of 66% more factors than the true number. Dismayed at this finding, Linn [32] felt obliged to recommend against the perpetuation of the KG rule.

Linn's suggestion was endorsed resoundingly by Zwick and Velicer [44, p. 267] who observed that, ". . . there is no evidence supporting the continued use of the K1 (KG) rule" (parentheses ours). Their pronouncement is not without justification. Based on extensive simulated data which incorporated three levels of a known component structure (PC = 3, 6, 12), three variable levels (p = 36, 72, 144), and three sample sizes (n = 75, 150, 450), these authors demonstrated that over 48


decision conditions the KG rule could recover the true number of components only 33% of the time. Every one of the remaining conditions resulted in a KG overestimate. On average, over all 48 conditions, the KG rule extracted approximately 2.7 times more components than the true number.

Horn’s test, on the other hand, fares well when judged against its rivals. For example, in Horn’s [23] original paper the KG criterion severely overfactored relative to the parallel analysis method. Using an empirical correlation matrix consisting of 65 variables, a sample size of 297, and the construction of one random data correlation matrix of the same dimensions for comparative purposes, Horn observed that the KG rule extracted 16 principal components, the parallel analysis technique retaining only nine.

Substantial documentation supporting the ability of Horn's test to both accurately and consistently recover a known number of factors is provided by Humphreys and Montanelli [27]. Their study employed different numbers of known factors (3, 7), correlation matrices of divergent variable (20, 40) and sample (100, 500) sizes, and between 40 and 50 random data matrix replications per comparison. In almost always specifying the true number of factors to retain, Horn's test also clearly outperformed the method of maximum likelihood estimation. As Humphreys and Montanelli [27, p. 204] comment: "An investigator who consistently extracts and rotates more factors than indicated by parallel analysis . . . is almost certainly capitalizing on chance."

Perhaps the most comprehensive simulation study examining the comparative performances of alternative component-stopping rules is that of Zwick and Velicer [45], described earlier in this section. These authors revealed the scree test to be accurate in indicating the true number of components about 57% of the time. Unfortunately, when the scree method was in error, fully 90% of the errors involved overestimations. The KG rule fared extremely poorly. It was discovered to be accurate in only 22% of the decision conditions. Furthermore, it always overestimated and never underestimated the true number of components. By way of contrast, Horn's test was found to be accurate approximately 92% of the time. As Zwick and Velicer [45, p. 32] remark, Horn's test ". . . was typically the most accurate method at each level of complexity examined." Additional evidence [13, 26] corroborates their statement.

After reviewing these Monte Carlo studies it seems reasonable to assert that: 1) Bartlett's test overfactors correlation matrices, 2) application of the KG criterion results in overfactoring, often of a gross nature, 3) there is some indication that the scree rule also displays a tendency to overfactor, albeit to a lesser extent than the KG rule, and 4) Horn's parallel analysis appears to be consistently accurate in identifying the true number of components to elicit.

The generalization of findings based on simulated data sets to real ones is obviously an important issue [45]. Simulated data, by its nature designed specifically to meet as closely as permissible the assumptions of the particular factor analytic model adopted, is of a comparatively pristine quality rarely, if ever, encountered by the market researcher working in the field. In short, empirical data usually does not display the well-behaved characteristics typical of its simulated counterpart [24, 38]. Or as Gorsuch [17, p. 364] maintains: "Seldom does an empirical set of data clearly fall within the components model." Yet real data remains, no matter how


much we may prefer otherwise, the stuff with which marketing practitioners must necessarily deal.

Of course relative degrees of "noise" can be introduced into simulated data sets as some have shown [1, 2, 32, 38]. And while the data sets in these studies could have been noisier yet, research efforts in this direction must be commended and deserve further encouragement. Notwithstanding such comments, it is evident that information sorely needs to be gathered regarding the performance characteristics of competing decision rules when applied to real data sets. This issue is addressed below.

An Empirical Comparison of the Five Rules

In order to compare the extraction capabilities exhibited by the KG, scree, Bartlett, Horn, and RI rules, the authors subjected some 30 empirical correlation matrices from the marketing (16) and psychology (14) literature to a complete principal component analysis. The data sets themselves are presented alphabetically in the Appendix and possess variable (p) and sample (n) sizes typical of those encountered by the market researcher.

The results of the application of the five stopping rules to these data are reported in Table 1. This table reveals that great dissimilarities exist in the extraction capacities of the five methods. One could compare the mean number of components removed by each of the different approaches across the 30 correlation matrices. However, since the absolute values of the means themselves are a function of the number of variables included in the correlation matrices serving as input, it seems preferable to employ a measure that is independent of variable size. For this reason the PC/p ratio was used, that is, the ratio of significant principal components removed (PC) to the total number of variables (p) in the original correlation matrix.

The average PC/p ratio for the KG rule was 0.283. This is consistent with Kaiser's [28, p. 45] observation that the PC/p ratio ordinarily ". . . runs from a sixth, say, to a third, of the number of variables." Corresponding figures for the remaining procedures were as follows: scree (0.247), Horn's test (0.208), Bartlett's test applied at the 0.01 level (0.734) and at the 0.05 level (0.794), the RI test applied at the 0.01 level (0.034) and at the 0.05 level (0.089). Moreover, on average the KG rule "explained" some 64.8% of total data variation as opposed to a figure of 61% for the scree test. This difference was not statistically significant (t = 1.86, p = 0.069). Nor was the difference between the average variation accounted for by Horn's (56.7%) procedure and the scree test (t = 1.63, p = 0.109). However there was a highly significant difference between these means for the KG and Horn methods (t = 3.54, p = 0.001). Bartlett's test at the 0.01 level explained an average of some 90% of total variation as compared with 92.7% at the 0.05 level. These figures are obviously significantly larger than those registered by the other four rules. Finally, the RI test at the 0.01 level on average accounted for 19.7% of the data variation, a figure increasing to 37.6% at the 0.05 level. Every other test reported significantly higher means than did the RI procedure. In terms of component extraction performances, then, the empirical findings reported here generally agree with those obtained using simulated data sets. That is, Bartlett's test retains the most components, followed in descending order by the KG, scree, Horn, and RI criteria.


Table 1. Number of Principal Components Extracted by Alternative Rules

Data Set   p     n      KG          Scree       Bartlett 0.01   Bartlett 0.05   Horn        RI 0.01     RI 0.05
 1         10    200    4 (66.8)    3 (56.8)     9 (97.2)        9 (97.2)       3 (56.8)    0           1 (31.6)
 2         10    117    3 (63.6)    3 (63.6)     6 (84.5)        9 (97.6)       3 (63.6)    0           1 (35.2)
 3         10    100    3 (60.8)    2 (50.7)     9 (98.3)        9 (98.3)       2 (50.7)    0           1 (35.7)
 4         10    100    3 (59.5)    3 (59.5)     6 (83.8)        7 (89.4)       3 (59.5)    0           1 (29.2)
 5         10     64    3 (62.8)    2 (50.5)     5 (79.4)        7 (91.1)       2 (50.5)    0           1 (32.9)
 6         10     53    3 (66.3)    3 (66.3)     5 (82.4)        7 (92.5)       3 (66.3)    0           1 (36.7)
 7         10    123    4 (72.8)    4 (72.8)     6 (85.8)        9 (97.8)       2 (51.5)    0           1 (35.0)
 8         10     38    4 (69.1)    4 (69.1)     1 (28.4)        1 (28.4)       2 (45.9)    0           1 (28.4)
 9         17    213    5 (71.5)    5 (71.5)    15 (97.9)       15 (97.9)       3 (58.6)    1 (37.3)    1 (37.3)
10         17    212    5 (73.0)    6 (77.6)    16 (99.6)       16 (99.6)       4 (66.4)    1 (37.3)    1 (37.3)
11          7    329    2 (64.5)    2 (64.5)     6 (95.1)        6 (95.1)       1 (50.1)    0           1 (50.1)
12          9    421    1 (57.7)    1 (57.7)     8 (97.8)        8 (97.8)       1 (57.7)    1 (57.7)    1 (57.7)
13         13    170    4 (64.4)    3 (56.7)    12 (99.6)       12 (99.6)       3 (56.7)    0           1 (25.3)
14          5     80    2 (61.3)    2 (61.3)     2 (61.3)        2 (61.3)       2 (61.3)    0           0
15         26    197    4 (63.3)    3 (59.1)    23 (99.0)       23 (99.0)       2 (53.7)    1 (42.3)    2 (53.7)
16         34    283    8 (56.7)    6 (50.6)    33 (99.6)       33 (99.6)       3 (39.6)    1 (27.6)    1 (27.6)
17         24    145    5 (60.2)    4 (55.9)    10 (77.5)       11 (80.2)       4 (55.9)    1 (33.9)    1 (33.9)
18          8    305    2 (80.5)    2 (80.5)     7 (98.8)        7 (98.8)       2 (80.5)    1 (58.4)    1 (58.4)
19          8    211    2 (67.6)    2 (67.6)     7 (97.5)        7 (97.5)       2 (67.6)    1 (51.9)    1 (51.9)
20          5     --a   1 (54.8)    1 (54.8)    --              --              --          0           1 (54.8)
21         19    120    5 (56.9)    3 (45.3)    18 (99.5)       18 (99.5)       2 (38.2)    1 (23.0)    2 (38.2)
22         14     94    1 (76.0)    1 (76.0)     5 (91.0)        6 (92.7)       1 (76.0)    1 (76.0)    1 (76.0)
23         12     --a   3 (55.5)    2 (47.2)    --              --              --          1 (36.0)    1 (36.0)
24         10    228    3 (65.5)    3 (65.5)     9 (98.0)        9 (98.0)       3 (65.5)    0           1 (33.0)
25         25    154    6 (66.2)    6 (66.2)    13 (85.5)       20 (95.9)       3 (51.5)    1 (33.2)    1 (33.2)
26         10    216    2 (63.5)    2 (63.5)     8 (96.2)        8 (96.2)       1 (51.9)    1 (51.9)    1 (51.9)
27         19    138    7 (64.9)    5 (53.9)    15 (93.3)       18 (98.9)       2 (33.6)    1 (25.2)    1 (25.2)
28          9   1442    4 (70.7)    2 (47.2)     8 (99.8)        8 (99.8)       3 (59.4)    0           1 (31.9)
29         30    980   10 (60.9)    7 (50.7)    25 (93.7)       26 (95.1)       7 (50.7)    0           1 (13.8)
30          9    850    3 (67.1)    3 (67.1)     8 (97.4)        8 (97.4)       3 (67.1)    0           1 (34.8)

a Absence of sample size precludes computation of Bartlett's and Horn's tests (data sets 20 and 23). Values in parentheses refer to the proportion of the variation in the data accounted for by the number of components extracted.

Given such results, which of the five methods of component extraction should the applied researcher favor? An answer to this question is supplied in the following critical assessment of the competing procedures. The RI model, although conceptually attractive, severely underfactors correlation matrices. In fact the method was distinctly unimpressive in its ability to recoup a meaningful number of components in even a single study. At the 0.01 level, for example, no components were extricated in 16 of the 30 data sets, whereas at the 0.05 level only two matrices yielded more than one component. The dearth of components extracted by the RI procedure is a fatal deficiency that currently negates its usefulness among marketing practitioners. Until such time as substantial improvements in the RI methodology are forthcoming, including, since it is a statistical test, the explicit incorporation of sample size, this approach cannot be adopted.

In stark contrast, as the staggeringly high PC/p ratios indicate, Bartlett's test results in the pronounced overfactoring of correlation matrices. For example, Bartlett's method at the 0.01 level produced an average PC/p ratio some 2.6 times


greater than that of the ubiquitous KG rule. This figure inflates to a value of 2.8 when the test is applied at the 0.05 level. Furthermore, this procedure generates an average ratio 3.0 to 3.2 times larger than Cattell’s scree test, and 3.5 to 3.8 times greater than Horn’s parallel analysis. Recall also the explained variation figures cited above.

It would be a mistake, however, to infer that Bartlett's test overfactors sample correlation matrices only in relation to the four alternative approaches investigated in this paper. The method overfactors in an absolute sense as well. To corroborate this statement it should be noted that Bartlett's test is theoretically capable of retaining as statistically significant all but p - 1 of the original variables in a sample correlation matrix of rank p. We designate this condition a saturated solution and observe that such outcomes make a mockery of the notion of parsimony. Nevertheless, at the 0.01 level saturated solutions were in evidence in 13 (46.4%) of the 28 data sets for which Bartlett's test could be calculated. Saturated solutions at the 0.05 level involved 16 (57.1%) of the data sets. Gorsuch [17] has previously demonstrated the degree of absolute overfactoring produced by Bartlett's test in the context of the common factor model. Our own analyses clearly reveal that this test grossly exaggerates principal component retention also.

Again it should be emphasized that since Bartlett's test is of a statistical nature, it indicates only the maximum number of components that could be extracted from any given sample correlation matrix and may convey little information about the "correct" number. Bartlett [4] himself acknowledges that the test is susceptible to the charge that as sample size increases, certain components are likely to be judged significant even though they account for only trivial proportions of the variance in the data. In fact, the likelihood of some of the retained eigenvalues being no more than Type I errors cannot be overlooked [21].

Finally, it is not uncommon for Bartlett's rule to accept as significant eigenvalues in the 0.200 to 0.300 range and even lower. Indeed, across our data sets, the medians for the smallest significant eigenvalues were only 0.379 and 0.315 at the 0.01 and 0.05 levels respectively. Thus Bartlett's procedure can lead to fractionated components that are difficult, if not impossible, to interpret and replicate. For the reasons cited in this paper, the sensible course of action would be to jettison Bartlett's test as a means of eliciting components.

Having eliminated the RI and Bartlett tests as meaningful indicators of the number of significant components, which of the three tests remaining should the researcher adopt? This issue can be resolved by analyzing the weaknesses accompanying each method.

Despite the tremendous popularity of the KG rule, it nevertheless possesses a number of related limitations. To begin with, although obviously no inherent fault of the rule per se, it is widely misunderstood and misapplied. As Hakstian and Muller [21, pp. 470-71] remark:

The application of such rules as the Kaiser-Guttman, then, with procedural implications, strictly speaking, for only the component model, is seen as theoretically inappropriate when a common-factor or image analysis is being performed.

Irrespective of such admonishments, the rule is routinely invoked in a variety of factor analytic strategies for which it was not intended.

How has this state of affairs arisen? Much of the misuse of the KG rule originates


from the fact that it is often the only method for component retention with which applied researchers are acquainted. Being frequently unfamiliar with other decision rules, many researchers display an almost blind adherence to the KG criterion. Such nescient behavior is reinforced by the elevated status accorded the rule as the computerized default option in leading statistical packages. This state of affairs leaves Cattell and Vogelmann [11, pp. 322-23] outraged:

The practice of building the KG into factoring programs because of its extreme cheapness, and proceeding with no break for the investigator to apply other checks for number of factors (leaving the novice even unaware of the basis of the decision) is quite indefensible.

Their point is well taken. Knowledge regarding alternative decision rules should be disseminated more effectively.

Additionally, because of its psychometric orientation, a major drawback affiliated with the KG rule is the assumption that a population correlation matrix is being factored [26, 36]. In short, no sampling errors are presumed to exist in the data being analyzed. Unfortunately, the vast majority of marketing studies are based on sample data and, thus, repeatedly violate the above presumption. The introduction of sampling errors in the correlation matrix will be expected to inflate the size of the eigenvalues produced in empirical investigations, and yet the KG rule is powerless to detect this. The upshot is that the KG criterion is likely to accept a number of components whose so-called significance is in fact attributable purely to sampling error. Ideally such components should be discarded.

A related criticism, and unquestionably the most telling one, is the accumulation of Monte Carlo evidence indicating the KG rule's distinct propensity to retain an excessive number of components. We reported numerous findings supporting this accusation earlier in the paper [11, 23, 32, 44, 45]. Finally, a number of other studies also express reservations regarding the appropriateness of the KG rule [6, 10, 16, 31, 43].

In contradistinction to the weaknesses surrounding the KG criterion, the primary disadvantage of the scree test focuses on the degree of subjective interpretation involved in deciding how many components to remove. Differences in these subjective interpretations are likely to be exacerbated whenever there appears to be more than one elbow or break in the plot, when more than one suitable line can be drawn through the scree, or when the slope of the plot is so gradual that it is virtually impossible to detect even a single break [44]. The incidence of the first two conditions in particular was noticeable in our data sets. And to this list of hindrances we must add the varying amount of experience with the scree test that each judge commands. While Cattell and Vogelmann [11] reported high levels of agreement among judges, Crawford and Koopman [14] found the opposite. Acknowledging such difficulties, it comes as no surprise to discover in the literature that users of the test often fail to arrive at a unanimous consensus regarding how many components to extract.

Horn’s test, on the other hand, boasts a number of compelling advantages over the other decision rules. In the first place it is conceptually more satisfying than the KG criterion by virtue of its ability to estimate the degree to which sampling error inflates the sizes of empirically obtained eigenvalues. The KG rule simply cannot accommodate this crucially important feature, and will be expected to retain


a number of components solely on the basis of capitalization on chance fluctuations in a particular sample of data. Therefore, when employing the KG rule, eigenvalues exceeding, yet in the neighborhood of, unity should be viewed with circumspection. As a suggestive illustration consider the implications of excluding as insignificant those eigenvalues in the 1.00 to 1.10 range exhibited in the current 30 data sets. When this marginal adjustment is realized some 19 eigenvalues from a total of 112 (approximately 17%) retained by the KG rule are eliminated. Horn’s parallel analysis would reject such components.

A second advantage of Horn’s test follows directly from the preceding obser- vation; namely, it will extract fewer principal components from a sample correlation matrix than the KG rule will. In fact, on the basis of the data sets utilized in the present study, a significant difference in the average PC/p ratios of the KG (0.283) and Horn (0.208) rules was discernible (t = 3.22, p = 0.002). Given the volume of previously cited material documenting a strong tendency for the KG rule to overfactor, there is considerable advantage in employing a procedure that is more conservative in nature.

Third, Horn's test may be considered as offering a more objective alternative to Cattell's scree test. Methods that eliminate the need for guesswork should be embraced. In addition, there is some evidence based on simulated data that the scree test tends to overfactor correlation matrices [27]. While the degree of overfactoring associated with the scree test is not as pronounced as that engendered by the KG rule, it is felt by some [45, p. 30] to be sufficiently deleterious that it ". . . can no longer be recommended as the method of choice for determining the number of components in PCA [principal component analysis]." This buttresses the argument in favor of adopting a more parsimonious test such as Horn's. Examination of the average PC/p ratios in our empirical data indicated that Horn's (0.208) test typically extracts fewer components than does the scree (0.247) test (t = 1.69, p = 0.096).

Fourth, components retained by the method of parallel analysis generally are interpretable. Indeed, Horn's method typically outperforms the other rules in this regard. While the burden of conceptual interpretation rests ultimately with the researcher's ability to design and execute a sound study [9], certain "technical" suggestions concerning the notion of interpretability can be offered. In their Monte Carlo study, for example, Zwick and Velicer [45] postulated that major or interpretable components should have an eigenvalue of one or more (to ensure positive Kuder-Richardson reliability) and a minimum of three component loadings of ± 0.50 or greater. At this juncture it should be emphasized that with empirical data sets, the latter requirement is a rather stringent one. It is common practice for applied researchers to accept as significant component loadings of ± 0.40 or even ± 0.30 [18]. Abiding by these criteria as a reasonable measure of interpretability, we find that components extracted by the KG and Horn rules satisfy the eigenvalue-one requirement, as do all but one of the components retained by the scree test. Furthermore, over the 28 data sets for which direct comparisons between the three tests could be made, 86.1% of the components removed by Horn's method were revealed to be interpretable when the loading constraint was ± 0.50. Corresponding figures for the KG and scree rules were 66.7% and 78.3% respectively. When the loading requirement was ± 0.40, 93.1% of the Horn components were interpretable, compared with 83.3% for the KG rule and 87.0% for the scree test. All three


tests produced virtually identical percentages of interpretable components when the loading constraint was ± 0.30. For Horn's test this value was 95.8%, for the KG rule it was 95.4%, and for the scree test it was 96.7%. Judging by these results, it is clear that parallel analysis compares very favorably with its closest rivals when the issue at hand is the interpretability of components. It is equally clear that the sheer number of components retained by Bartlett's test guarantees that many of them will be rendered incomprehensible. And although all components extracted by the RI procedure are interpretable, this method unfortunately discards large numbers of meaningful components.
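As an illustration of the screen just described (our sketch, not the authors' code), a component can be counted as interpretable when its eigenvalue is at least one and at least three of its loadings meet the chosen absolute cutoff; `loadings` is assumed to be the p x k matrix of component loadings.

```python
import numpy as np

def interpretable_count(eigvals, loadings, cutoff=0.50, min_loadings=3):
    """Count components whose eigenvalue is >= 1 and that have at least
    `min_loadings` loadings with absolute value >= `cutoff` (the
    Zwick-Velicer screen discussed in the text)."""
    count = 0
    for j in range(loadings.shape[1]):            # one column per component
        strong = np.sum(np.abs(loadings[:, j]) >= cutoff)
        if eigvals[j] >= 1.0 and strong >= min_loadings:
            count += 1
    return count
```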

Fifth, Horn's approach is consigned by Hakstian and Muller [21] to the apex (psychologically meaningful/importance category) of Kaiser's [28] taxonomy. Sixth, Monte Carlo simulations with the procedure demonstrate its distinguished ability in both principal component [13, 45] and common factor [26] analysis. Seventh, with respect to common factor analysis, Horn's method was found to be more accurate at recovering a prespecified number of factors than was the maximum likelihood method [27].

The preceding account reveals Horn's approach to be superior in important respects to competing component decision rules. This begs the obvious question: Why has this test been virtually ignored by researchers? Two responses suggest themselves. First, as noted earlier, many applied researchers may be unfamiliar with alternative stopping rules and hence over rely on the KG criterion. Second, some researchers may be reluctant to adopt a method whose implementation requires considerable time and effort, most of which is consumed in the construction of the pivotal reference curve. The present study, for example, necessitated the simulation from an N(0,1) population of some 1,400 correlation matrices. That is, K = 50 replications were made for each p and n configuration found in the 28 empirical data sets amenable to the application of Horn's test. However, in a recent development Allen and Hubbard [3] have devised a method allowing for the direct implementation of Horn's test, thus obviating the need to generate large numbers of random data correlation matrices. Their approach, which extends the work of Montanelli and Humphreys [33] from the context of the common factor model to that of principal component analysis, involves the construction of regression equations to predict the logarithms of eigenvalues of random data correlation matrices. This development should serve to make Horn's test more accessible to the applied researcher.

For all of the reasons discussed above, we recommend that Horn’s test be regarded as the preferred method for determining principal component extraction. Certainly it warrants greater attention from marketers than it has thus far received. We further recommend that additional studies, using both empirical and simulated data sets, be undertaken comparing the performance characteristics of various component decision rules. Suffice it to say that results based on both kinds of data should be welcomed and viewed as complementary.

Conclusions

Using 30 empirical correlation matrices obtained from the marketing and psychology literature, the current study provided a comparative assessment of the


efficacy of five principal component extraction rules. These were the KG, scree, Bartlett, Horn, and RI procedures.

Application of these rules to the sample data produced astonishingly divergent results and generally supported those findings based on simulated data. The RI and Bartlett formulations, for example, precluded the possibility of acceptable component solutions. The RI criterion was found to severely underfactor correlation matrices, whereas Bartlett's test, both relatively and absolutely, revealed the complete opposite. No justification exists for the continued use of these methods. The KG and scree rules performed in an approximately equivalent fashion. Unfortunately, there is substantial evidence suggesting an overfactoring bias in these methods, particularly in the case of the former. Certainly the routine, default use of the KG rule can no longer be defended. Applied researchers need to be more judicious in their selection of component retention methods. It was demonstrated that Horn's test possesses a number of solid advantages over competing decision rules and should be regarded as the preferred method for eliciting principal components. One hopes that this rule will begin to receive the exposure and use it justly deserves. Finally, we urge additional comparative studies using empirical and simulated data sets.

References

1. Acito, Franklin and Anderson, Ronald D. A Monte Carlo Comparison of Factor Analytic Methods. Journal of Marketing Research 17: 228-36 (May 1980).

2. Acito, Franklin, Anderson, Ronald D., and Engledow, Jack L. A Simulation Study of Methods for Hypothesis Testing in Factor Analysis. Journal of Consumer Research 7: 141-50 (September 1980).

3. Allen, Stuart J. and Hubbard, Raymond. Regression Equations for the Latent Roots of Random Data Correlation Matrices with Unities on the Diagonal. Multivariate Behavioral Research 21: 393-98 (July 1986).

4. Bartlett, Maurice S. Tests of Significance in Factor Analysis. British Journal of Psychology, Statistical Section 3: 77-85 (June 1950).

5. Bartlett, Maurice S. A Further Note on Tests of Significance in Factor Analysis. British Journal of Psychology, Statistical Section 4: 1-2 (March 1951).

6. Browne, Michael W. A Comparison of Factor Analytic Techniques. Psychometrika 33: 267-334 (September 1968).

7. Cattell, Raymond B. Extracting the Correct Number of Factors in Factor Analysis. Educational and Psychological Measurement 18: 791-837 (Winter 1958).

8. Cattell, Raymond B. The Scree Test for the Number of Factors. Multivariate Behavioral Research 1: 245-76 (April 1966).

9. Cattell, Raymond B. The Scientific Use of Factor Analysis in Behavioral and Life Sciences. New York: Plenum Press (1978).

10. Cattell, Raymond B. and Jaspers, J. A General Plasmode for Factor Analytic Exercises and Research. Multivariate Behavioral Research Monographs 3: 1-212 (1977).

11. Cattell, Raymond B. and Vogelmann, S. A Comprehensive Trial of the Scree and KG Criteria for Determining the Number of Factors. Multivariate Behavioral Research 12: 289-325 (July 1977).

12. Cliff, Norman. Some Cautions Concerning the Application of Causal Modeling Methods. Multivariate Behavioral Research 18: 115-26 (January 1983).


13. Crawford, C. B. and Koopman, Penny. A Note on Horn's Test for the Number of Factors in Factor Analysis. Multivariate Behavioral Research 8: 117-25 (January 1973).

14. Crawford, C. B. and Koopman, Penny. Note: Inter-rater Reliability of Scree Test and Mean Square Ratio Test of Number of Factors. Perceptual and Motor Skills 49: 223-26 (August 1979).

15. Dziuban, Charles D. and Shirkey, Edwin C. When Is a Correlation Matrix Appropriate for Factor Analysis? Psychological Bulletin 81: 358-61 (June 1974).

16. Everett, J. E. Factor Comparability as a Means of Determining the Number of Factors and Their Rotation. Multivariate Behavioral Research 18: 197-218 (April 1983).

17. Gorsuch, Richard L. Using Bartlett's Significance Test to Determine the Number of Factors to Extract. Educational and Psychological Measurement 33: 361-64 (Summer 1973).

18. Gorsuch, Richard L. Factor Analysis. Philadelphia: W. B. Saunders Company (1974).

19. Green, Paul F. Analyzing Multivariate Data. Hinsdale, Ill.: Dryden Press (1978).

20. Guttman, Louis. Some Necessary Conditions for Common Factor Analysis. Psychometrika 19: 149-61 (June 1954).

21. Hakstian, A. Ralph and Muller, Victor J. Some Notes on the Number of Factors Problem. Multivariate Behavioral Research 8: 461-75 (October 1973).

22. Hakstian, A. Ralph, Todd, Rogers W., and Cattell, Raymond B. The Behavior of Number-of-Factors Rules with Simulated Data. Multivariate Behavioral Research 17: 193-219 (April 1982).

23. Horn, John L. A Rationale and Test for the Number of Factors in Factor Analysis. Psychometrika 30: 179-85 (June 1965).

24. Horn, John L. and Engstrom, Robert. Cattell's Scree Test in Relation to Bartlett's Chi-Square Test and Other Observations on the Number-of-Factors Problem. Multivariate Behavioral Research 14: 283-300 (July 1979).

25. Hubbard, Raymond and Pandit, Vinay. Determining Significant Principal Components: A Probability Test for Eigenvalues. Developments in Marketing Science 7: 455-59 (1984).

26. Humphreys, Lloyd G. and Ilgen, Daniel R. Note on a Criterion for the Number of Common Factors. Educational and Psychological Measurement 29: 571-78 (Winter 1969).

27. Humphreys, Lloyd G. and Montanelli, Richard G., Jr. An Investigation of the Parallel Analysis Criterion for Determining the Number of Common Factors. Multivariate Behavioral Research 10: 193-205 (April 1975).

28. Kaiser, Henry F. The Application of Electronic Computers to Factor Analysis. Educational and Psychological Measurement 20: 141-51 (Spring 1960).

29. Kaiser, Henry F. A Note on Guttman's Lower Bounds for the Number of Common Factors. British Journal of Statistical Psychology 14: 1-2 (1961).

30. Kendall, Maurice G. and Buckland, William R. A Dictionary of Statistical Terms, 2nd edition. New York: Hafner Publishing Co. (1960).

31. Lee, Howard B. and Comrey, Andrew L. Distortions in a Commonly Used Factor Analytic Procedure. Multivariate Behavioral Research 14: 301-21 (July 1979).

32. Linn, Robert L. A Monte Carlo Approach to the Number-of-Factors Problem. Psychometrika 33: 37-71 (March 1968).

33. Montanelli, Richard G., Jr. and Humphreys, Lloyd G. Latent Roots of Random Data Correlation Matrices with Squared Multiple Correlations on the Diagonal: A Monte Carlo Study. Psychometrika 41: 341-48 (September 1976).

34. Mulaik, Stanley A. The Foundations of Factor Analysis. New York: McGraw-Hill (1972).

35. Pandit, Vinay. Data Breeding: Estimation of Micro-Data in an Incomplete Data-Base. Unpublished Ph.D. Dissertation, Graduate School of Business Administration, Columbia University, 1978.

36. Stewart, David W. The Application and Misapplication of Factor Analysis in Marketing Research. Journal of Marketing Research 18: 51-62 (February 1981).


37. Tobias, Sigmund and Carlson, James E. Brief Report: Bartlett's Test of Sphericity and Chance Findings in Factor Analysis. Multivariate Behavioral Research 4: 375-77 (July 1969).

38. Tucker, Ledyard R., Koopman, Raymond F., and Linn, Robert L. Evaluation of Factor Analytic Research Procedures by Means of Simulated Correlation Matrices. Psychometrika 34: 421-59 (December 1969).

39. Velicer, Wayne F. A Comparison of the Stability of Factor Analysis, Principal Component Analysis, and Rescaled Image Analysis. Educational and Psychological Measurement 34: 563-72 (Autumn 1974).

40. Velicer, Wayne F. Determining the Number of Components from the Matrix of Partial Correlations. Psychometrika 41: 321-27 (September 1976).

41. Velicer, Wayne F. An Empirical Comparison of the Similarity of Principal Component, Image, and Factor Patterns. Multivariate Behavioral Research 12: 3-22 (January 1977).

42. Velicer, Wayne F., Peacock, Andrew C., and Jackson, Douglas N. A Comparison of Component and Factor Patterns: A Monte Carlo Approach. Multivariate Behavioral Research 17: 371-88 (July 1982).

43. Yeomans, Keith A. and Golder, Paul A. The Guttman-Kaiser Criterion as a Predictor of the Number of Common Factors. Statistician 31: 221-29 (1982).

44. Zwick, William R. and Velicer, Wayne F. Factors Influencing Four Rules for Determining the Number of Components to Retain. Multivariate Behavioral Research 17: 253-69 (April 1982).

45. Zwick, William R. and Velicer, Wayne F. A Comparison of Five Rules for Determining the Number of Components in Data Sets. Unpublished Manuscript (1985).


Appendix: Sources of the Thirty Empirical Correlation Matrices

1-6. Authors' data concerning salient attributes of supermarket "images."

7-8. Bagozzi, Richard P. Salesforce Performance and Satisfaction as a Function of Individual Difference, Interpersonal, and Situational Factors. Journal of Marketing Research 15: 517-31 (November 1978).

9-10. Bechtoldt, Harold P. An Empirical Study of the Factor Analysis Stability Hypothesis. Psychometrika 26: 405-32 (December 1961).

11. Chapman, Robert L. The MacQuarrie Test for Mechanical Ability. Psychometrika 13: 175-79 (September 1948).

12. Davis, Fredrick B. Fundamental Factors of Comprehension in Reading. Psychometrika 9: 185-97 (September 1944).

13. Denton, J. C. and Taylor, Calvin W. A Factor Analysis of Mental Abilities and Personality Traits. Psychometrika 20: 75-81 (March 1955).

14. Didow, Nicholas M., Jr. and Franke, George R. Measurement Issues in Time Series Research: Reliability and Validity Assessment in Modeling the Macroeconomic Effects of Advertising. Journal of Marketing Research 21: 12-19 (February 1984).

15. Fleishman, Edwin A. and Hempel, Walter E., Jr. Changes in Factor Structure of a Complex Psychomotor Test as a Function of Practice. Psychometrika 19: 239-52 (September 1954).

Note: The numbered data sets in this appendix correspond to those listed in Table 1. Nine of the psychology studies [9, 10, 11, 12, 13, 15, 16, 25, and 27] were suggested by Hakstian and Muller's [21] paper, while three of the published correlation matrices from the marketing literature [7, 8, and 26] were not factor analyzed by the original authors.


16. Green, Russell F., Guilford, J. P., Christensen, Paul R., and Comrey, Andrew L. A Factor-Analytic Study of Reasoning Abilities. Psychometrika 18: 135-60 (June 1953).

17. Harman, Harry H. Modern Factor Analysis. Chicago: University of Chicago Press (1976), p. 124.

18. Harman, Harry H. and Jones, Wayne H. Factor Analysis by Minimizing Residuals (MINRES). Psychometrika 31: 351-68 (September 1966).

19-20. Joreskog, Karl G. Testing a Simple Structure Hypothesis in Factor Analysis. Psychometrika 31: 165-78 (June 1966).

21. Karlin, J. E. Music Ability. Psychometrika 6: 61-65 (February 1941).

22. Mukherjee, Bishwa N. A Factor Analysis of Some Qualitative Attributes of Coffee, in David A. Aaker, ed., Multivariate Analysis in Marketing: Theory and Application. Belmont, California: Wadsworth (1971), pp. 245-51.

23-24. Murphy, Joseph R. Responses to Parts of TV Commercials. Journal of Advertising Research 11: 34-38 (April 1971).

25. Pemberton, Carol. The Closure Factors Related to Other Cognitive Factors. Psychometrika 17: 267-88 (September 1952).

26. Perreault, William D., Jr. and Russ, Fredrick A. Physical Distribution Service in Industrial Purchase Decisions. Journal of Marketing 40: 3-10 (April 1976).

27. Rimoldi, H. J. A. Study of Some Factors Related to Intelligence. Psychometrika 13: 27-46 (March 1948).

28. Stoetzel, Jean. A Factor Analysis of the Liquor Preferences of French Consumers. Journal of Advertising Research 1: 7-11 (December 1960).

29-30. Wells, William D. and Sheth, Jagdish N. Factor Analysis, in Robert Ferber, ed., Handbook of Marketing Research. New York: McGraw-Hill (1974), pp. 2-458-71.