The Probable Consequences of Violating the Normality Assumption in Parametric Statistical Analysis

Source: Area, Vol. 10, No. 5 (1978), pp. 393-398. Published by the Royal Geographical Society (with the Institute of British Geographers). Stable URL: http://www.jstor.org/stable/20001404


Raymond Hubbard, Department of Economics, University of Nebraska-Lincoln

Summary. Confronted with non-normally distributed data, many geographers prefer to adopt nonparametric methods when analyzing the results of their research. The present paper argues that, provided the departures from normality are not severe, conventional parametric statistical models may still be frequently utilized.

A number of articles appearing in recent editions of this journal have been specifically concerned with the need to transform raw geographical data into a form approximating the normal distribution prior to statistical manipulations (Clark, 1973; Pringle, 1976; Roff, 1977). The rationale underlying this approach is, of course, predicated on the need to satisfy the assumptions of standard parametric statistical models. One of these assumptions requires that scores be normally distributed, and it has been argued that in the absence of such conditions distribution-free or nonparametric statistical procedures should be adopted in preference to parametric methods. However, it is also recognized that current nonparametric tests typically do not possess the power, versatility, and extensions to multivariate situations which characterize their parametric counterparts (Labovitz, 1970), so that the tendency to embrace nonparametric techniques is frequently undesirable and often unwarranted (Nunnally, 1967). Generally, therefore, geographers prefer to employ parametric statistics even when their data do not fulfil the necessary conditions (Pringle, 1976). In view of this preference it becomes a matter of vital importance to ascertain the extent to which the familiar parametric models are 'robust', at least with respect to the assumption of normality.
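The power argument is easy to illustrate numerically. The following sketch (not from the paper; a minimal simulation with assumed settings of fifteen observations per group, a true shift of 0.8 standard deviations, and a five per cent significance level) compares the parametric two-sample t test with the nonparametric Mann-Whitney U test when the data really are normal; the t test detects the true difference somewhat more often.

```python
# Minimal power comparison: parametric t test vs nonparametric
# Mann-Whitney U test on genuinely normal data (illustrative settings).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, shift, alpha, trials = 15, 0.8, 0.05, 5000  # assumed, not from the paper

t_rejections = mw_rejections = 0
for _ in range(trials):
    a = rng.normal(0.0, 1.0, n)      # control group
    b = rng.normal(shift, 1.0, n)    # treatment group; a real shift exists
    if stats.ttest_ind(a, b).pvalue < alpha:
        t_rejections += 1
    if stats.mannwhitneyu(a, b, alternative="two-sided").pvalue < alpha:
        mw_rejections += 1

print(f"t test power:         {t_rejections / trials:.3f}")
print(f"Mann-Whitney U power: {mw_rejections / trials:.3f}")
```

The gap is modest under normality, but it widens in multivariate settings where no comparably versatile nonparametric procedure exists, which is the thrust of Labovitz's point.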

It is the purpose of this note to summarize some of the likely consequences of violating the assumption of normality when utilizing parametric statistical methods, and to extend and amplify some of the views on this subject expressed in previous contributions to Area. The need for a paper of this nature is revealed by the fact that commonly employed textbooks dealing with quantitative aspects of geographical analysis (for example, Cole and King, 1968; Gregory, 1963; Hammond and McCullagh, 1974; King, 1969; Yeates, 1968) do not devote much attention to this important issue.

Testing for departures from normality

The techniques generally employed to determine whether a distribution approximates normality are varied. One can, for example, work with graphical methods such as probit, rankit, and fractile diagrams (Bliss, 1967), or with a chi-square goodness-of-fit test as outlined in many standard statistics texts (Hays, 1963). Alternatively, as Snedecor and Cochran (1967) demonstrate, the investigator may choose to ascertain the degree to which the normality assumption is violated by utilizing the information contained in the higher moments about the sample mean, such as skewness and kurtosis. It should be noted that these methods by no means exhaust those available for determining the normality of a distribution. In many applied contexts, however, researchers tend to emphasize the role of chi-square and log-likelihood approaches. Yet Bliss (1967, p. 140) points out that these are '. . . indicative but hardly a critical criterion, although [they are] sometimes the only convenient test for agreement with the normal distribution'.
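By way of illustration, the moment-based approach can be sketched in a few lines of Python (a hedged example, not part of the original paper; scipy's normaltest is one modern omnibus test built on sample skewness and kurtosis, in the spirit described above):

```python
# Sketch of moment-based normality checks: sample skewness and excess
# kurtosis, plus an omnibus test combining them (D'Agostino-Pearson as
# implemented in scipy; the paper itself predates this implementation).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.lognormal(mean=0.0, sigma=0.5, size=200)  # a deliberately skewed sample

print("skewness:", stats.skew(x))             # approximately 0 for normal data
print("excess kurtosis:", stats.kurtosis(x))  # approximately 0 for normal data

stat, p = stats.normaltest(x)                 # omnibus skew/kurtosis test
print(f"omnibus statistic = {stat:.2f}, p = {p:.4f}")

# A chi-square goodness-of-fit test against a fitted normal is also possible
# (bin the data, compare observed with expected counts), and
# stats.probplot(x, dist="norm") produces the rankit-style diagnostic plot.
```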

Comments of this nature illustrate the dilemma faced by the researcher. In many situations it will be possible for the individual to employ an appropriate transformation such that the resultant distribution conforms to the normal curve (Haggett et al., 1977). Yet in other instances this state of affairs may be only partially realized at best. In short, the absence of unequivocally acceptable tests of, and corrections for, deviations from normality allows the scholar an element of discretion. Guided partly by convention, partly by his own experience and judgment, and partly by the character of the particular problem at hand, the researcher himself must decide whether the distortion of normality is 'significant'. Thus, a major element adding to the controversy surrounding the 'normality issue' undoubtedly emanates from the lack of undisputed research guidelines (Clark, 1973). The remainder of this paper consequently focuses attention upon the probable outcomes of violating the normality assumption when employing certain common parametric models.

Violating the normality assumption in regression and correlation

It is important to understand that in the classical linear regression model the assumption of normality applies only to the conditional distributions of the endogenous or dependent variable, and to the stochastic disturbance term. The model may be formally stated as follows:

$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_k X_{ik} + U_i$$

where, for each observation on the $X$s (exogenous or independent variables), the disturbance term $U_i$ is a (conditionally) normally distributed random variable. Other properties of the disturbance term (such as non-autoregression and homoscedasticity) are adequately discussed by Poole and O'Farrell (1971). Because the error term is assumed to be normally distributed, it follows directly that observations on the dependent variable must also be normally distributed, since these are themselves linear combinations of the errors. But the $X$s are not necessarily normally distributed variates, and it is as well to dispense with this somewhat common fallacy at the outset. Indeed, it is postulated in the above classical model that the $X$s are not even random variables, but instead are regarded as fixed or non-stochastic elements.¹ This implies that, while the $X$s may obviously attain different numerical values, they are nevertheless considered as constants when calculating their mathematical expectation.

The statistician or econometrician typically rationalizes the normality assumption by positing that the stochastic disturbance term incorporates a large number of individually unimportant random effects, all of which influence the behaviour of the endogenous variable $Y_i$ in only a minor fashion. The researcher then appeals to the implications of the Central Limit Theorem to justify the claim that such a process would induce a normal distribution in the error term. In addition, it is worth emphasizing that the Central Limit Theorem only applies in those instances where the random effects are mutually independent (Murphy, 1973). To the individual working with small sample sizes, however, this rationalization may be of little consolation. Nevertheless, certain research has revealed that even for sample sizes of between ten and twenty observations, distributions may still approximate the normal curve (Koutsoyiannis, 1973). Finally, if the researcher suspects that the relationships the model purports to represent have been distorted by mis-specification, he should examine the distribution of the residuals, that is, the surrogate values for the unobservable $U_i$s. This may help in suggesting possible transformations of the dependent variable, or the need to include additional variables while deleting others.
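The Central Limit Theorem rationalization is easy to demonstrate numerically. In the sketch below (an illustration only, with an arbitrary choice of forty uniform component effects per observation), each disturbance is built as a sum of small, mutually independent, individually non-normal shocks, and the resulting error distribution is close to normal:

```python
# Each disturbance U_i is modelled as the sum of many small, mutually
# independent, individually non-normal effects; by the Central Limit
# Theorem their sum is approximately normally distributed.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_obs, n_effects = 1000, 40  # arbitrary illustrative choices

# Forty independent uniform shocks per observation, each centred on zero.
shocks = rng.uniform(-1.0, 1.0, size=(n_obs, n_effects))
u = shocks.sum(axis=1)

print("skewness:", round(stats.skew(u), 3))             # close to 0
print("excess kurtosis:", round(stats.kurtosis(u), 3))  # close to 0
print("normality p-value:", round(stats.normaltest(u).pvalue, 3))
```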

Suppose, for the sake of expository purposes, that the normality assumption has indeed been violated. What, one may inquire, are the potential repercussions likely to entail? Fortunately, it transpires that the relaxation of the normality assumption does only minimal damage to the properties of the ordinary least squares (OLS) estimators of $\beta_0, \beta_1, \ldots, \beta_k$. As Maddala (1977) observes, the small sample properties of the OLS estimators still retain their BLUE (best linear unbiased estimator) characteristics, for these are independent of the form of the probability distribution evidenced by the $U_i$s. That is, they are unbiased and continue to possess the minimum variance among the class of linear unbiased estimators. However, following Kmenta (1971) and Zeckhauser and Thompson (1970), it should be noted that without specification of the distributional form of the $U_i$s these estimators are no longer efficient, owing to the fact that the Cramér-Rao lower bound of their variances cannot be determined. Furthermore, they are no longer maximum likelihood estimators, since legitimate employment of the likelihood function rests critically upon the normality assumption.

Considering now the large sample (asymptotic) properties of these estimators, it can be demonstrated that they are both consistent and asymptotically unbiased, for when many observations are employed one can invoke the Central Limit Theorem to illustrate that the sampling distributions of $\hat{\beta}_0$ and the $\hat{\beta}_k$s approach normality as the sample size tends toward infinity.² Consequently, the asymptotic properties of the least squares estimates are equivalent to those of maximum likelihood estimates; that is, they display the same mean and variance. It is therefore apparent that even when the assumption of normality is violated, the least squares estimates preserve most of their desirable properties. While it is worth reiterating that the assumption of normality is unnecessary for obtaining estimates of the coefficients, it is a requirement when conducting significance tests and establishing confidence intervals (Maddala, 1977). Yet even under these circumstances, provided that the disturbance term does not depart drastically from the normal curve, the usual t and F tests of statistical inference may be safely utilized to yield reasonably accurate approximations (Kmenta, 1971).
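These claims are straightforward to probe by simulation. The sketch below (an illustration under assumed settings of thirty observations and centred exponential errors, not a result reported in the paper) shows the slope estimator remaining unbiased and the nominal 95 per cent t interval covering the true slope at close to its advertised rate despite the markedly skewed disturbances:

```python
# Monte Carlo check: OLS with skewed (centred exponential) errors.
# The slope estimator should remain approximately unbiased, and the usual
# t-based 95% confidence interval should cover the true slope roughly 95%
# of the time, illustrating the robustness discussed in the text.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, trials, beta0, beta1 = 30, 4000, 2.0, 0.5  # assumed settings
x = np.linspace(0.0, 10.0, n)                 # fixed (non-stochastic) regressor

estimates, covered = [], 0
for _ in range(trials):
    u = rng.exponential(scale=1.0, size=n) - 1.0  # skewed, mean-zero errors
    y = beta0 + beta1 * x + u
    b1, b0 = np.polyfit(x, y, deg=1)              # OLS fit
    resid = y - (b0 + b1 * x)
    s2 = resid @ resid / (n - 2)                  # estimated error variance
    se_b1 = np.sqrt(s2 / np.sum((x - x.mean()) ** 2))
    tcrit = stats.t.ppf(0.975, df=n - 2)
    if abs(b1 - beta1) <= tcrit * se_b1:
        covered += 1
    estimates.append(b1)

print(f"mean slope estimate: {np.mean(estimates):.4f} (true value {beta1})")
print(f"95% interval coverage: {covered / trials:.3f}")
```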

Lest it be imagined that econometricians exercise a monopoly on research concerning violations of the normality premise, it should be mentioned that a considerable amount of related research has also been undertaken by, among others, sociologists and psychologists. With respect to the point-biserial correlation, for example, Labovitz (1967) employed simulation methods to demonstrate the robustness of this coefficient to markedly-skewed distributional forms. In subsequent research the same author (Labovitz, 1970) showed that even when numbers were randomly and non-randomly assigned to rank-ordered data (subject to an order-preserving monotonic transformation) to produce a number of logarithmic, exponential, and higher-order curves, the Pearson product-moment correlation coefficient could still be utilized as a measure of association superior to ordinal statistics. Again, other studies recommending the use of the Pearson r when the normality assumption is not dramatically violated are readily available (Borgatta, 1968; Nefzger and Drasgow, 1957; Nunnally, 1967).
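A simulation in the spirit of Labovitz's experiments (a sketch with assumed transformations and sample size, not a reproduction of his study) illustrates why: applying an order-preserving monotonic transformation to rank data leaves the Pearson r close to the corresponding rank correlation.

```python
# In the spirit of Labovitz (1970): assign numbers to rank-ordered data via
# order-preserving monotonic transformations and compare the Pearson
# product-moment correlation with Spearman's rank correlation.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n = 100
x = rng.normal(size=n)
y = 0.7 * x + rng.normal(scale=0.7, size=n)  # a correlated pair of variables

ranks_x = stats.rankdata(x)
ranks_y = stats.rankdata(y)
rho, _ = stats.spearmanr(x, y)               # the ordinal benchmark

for name, f in [("linear", lambda r: r),
                ("logarithmic", np.log),
                ("exponential", lambda r: np.exp(r / n)),
                ("cubic", lambda r: r ** 3)]:
    r_pearson, _ = stats.pearsonr(f(ranks_x), f(ranks_y))
    print(f"{name:12s} Pearson r = {r_pearson:.3f}  (Spearman rho = {rho:.3f})")
```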

In geography also, there is some evidence attesting to the robustness of the normality assumption in empirical work. Thus, for example, in the analysis of a large array of socio-economic and demographic variables, Moser and Scott (1961) discovered that a log transformation of raw data did not materially affect the interpretation of a principal components solution. Similarly, Roff's (1977) investigation of 44 variables from the British 1971 census (in which approximately one-half were non-normally distributed) concluded that transformations did not significantly alter the correlation matrix or principal components result. Again, Pringle's (1976) analysis of 64 census variables for County Durham demonstrated that, while transformations are not universally appropriate, in that they do not always produce a normal distribution in the data set, they typically attenuate the incidence of non-normality to the extent that nonparametric statistical models should not be adopted unnecessarily.

It would be negligent, however, to leave the reader with the impression that one can dismiss the applicability of nonparametric methods, for almost all of the studies just mentioned have been contested to some degree. Mayer (1970), for example, has criticized Labovitz's research by arguing that the latter should have made more explicit the particular distributional form (normal, t, F) to which variables were assigned.³ Similarly, the evidence provided by Nefzger and Drasgow (1957) substantiating the notion that normality is a needless assumption when computing the Pearson r has not gone unchallenged.⁴ Binder (1959), in particular, has noted that much of the confusion which exists among social scientists when employing correlation methods can be attributed to a failure to fully understand the assumptions and implications of the different mathematical models which constitute the basis of these approaches; namely, the bivariate normal distribution, the linear regression, and the randomization models. Experimental studies exist which demonstrate that the Pearson r is not a particularly robust statistic (Kowalski, 1972; Norris and Hjelm, 1961), and the use of alternative correlation coefficients should perhaps be encouraged (Carroll, 1961; Tinkler, 1972). Finally, within a geographical context, the findings of Clark (1973) that even small departures from normality significantly affect principal components structures and scores are contrary to those of both Moser and Scott (1961) and Roff (1977).

Additional comments concerning inferential statistics

It has already been noted that provided the departures from normality are not excessive, the formulation of confidence intervals and significance tests for regression and correlation coefficients will not on average be unduly affected in an adverse fashion. As may be anticipated, further evidence can be easily adduced indicating similar findings with respect to the generalized use of inferential statistics. This is welcome information to the researcher who wishes to extend the findings for his sample to the relevant underlying population.

In both the F and t tests it is assumed that the error term is distributed normally.⁵ Nevertheless, available evidence would seem to indicate that these tests are virtually immune to violations concerning this premise (Boneau, 1960). Even when observations have been drawn from logarithmic, logistic, exponential, double exponential, J-shaped, and rectangular populations, the F and t tests can usually still be applied with confidence. Empirical research supporting this viewpoint is plentiful with respect to both two-tailed t tests (Baker et al., 1966; Boneau, 1960; Rider, 1929) and F tests (Cochran, 1947; Lindquist, 1953; Pearson, 1931). Confronted with information of this nature, Gaito (1959, p. 116) has commented that 'the mathematical and empirical data indicate that tests of homogeneity of means by analysis of variance (and two-tailed t tests) are relatively insensitive to both deviations from normality and from homogeneity of variance'.
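A small Monte Carlo experiment (a sketch with assumed sample sizes and distributions, not one of the cited studies) makes the point concrete: when both samples are drawn from the same non-normal population, so that the null hypothesis of equal means is true, the two-tailed t test rejects at close to the nominal five per cent rate.

```python
# Type I error of the two-tailed two-sample t test when both samples come
# from the same non-normal population (the null hypothesis is true).
# Under robustness, the rejection rate should stay near the nominal 0.05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n, alpha, trials = 20, 0.05, 5000  # assumed settings

populations = {
    "exponential": lambda: rng.exponential(1.0, n),
    "rectangular": lambda: rng.uniform(0.0, 1.0, n),
    "logistic":    lambda: rng.logistic(0.0, 1.0, n),
}

for name, draw in populations.items():
    rejections = sum(stats.ttest_ind(draw(), draw()).pvalue < alpha
                     for _ in range(trials))
    print(f"{name:12s} empirical type I error: {rejections / trials:.3f}")
```

Note that this probes only the two-tailed test with equal sample sizes; the caveats below concern precisely the cases (one-tailed tests, grossly disparate sample sizes, tests on variances) where such robustness breaks down.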

At this point a number of caveats are appropriate. First, counter-examples may be cited indicating that tests of the homogeneity of several variances are indeed sensitive to departures from normality (Haggett et al., 1977; Kendall and Stuart, 1967). Secondly, as Anderson (1961) points out, the lack of normality will almost certainly make its presence felt if one-tailed t tests are employed and experiments involve grossly disparate sample sizes. Yet even under these circumstances the researcher may be able to exercise a degree of control over the sample sizes utilized in a study. Thirdly, in the event that deviations from normality are marked, the individual can either apply suitable transformations to his variables and/or select a lower probability level for significance testing, for example, 0.025 instead of 0.05, 0.005 instead of 0.01, and so on (Gaito, 1959).

Conclusions

The aim of this paper has been to indicate that even in those instances where the researcher's data fail to satisfy the normality assumption commonly demanded of parametric statistical models, the alternative course of action should not necessarily be the immediate adoption of some nonparametric technique. This should not be construed to imply that the latter are dispensable or redundant, for several of the studies cited in this paper clearly suggest that the dust has not yet completely settled on the 'normality issue'. Nevertheless, it is maintained that because of the remarkable robustness displayed by parametric procedures, their continued use in the face of reasonable violations of assumptions will generally not result in severely erroneous inferences. As usual, however, the individual should be judicious in his approach, and endeavour to satisfy the normality requirement in so far as this is possible.

Notes

1. When the $X$s are assumed to be stochastic (referred to by Poole and O'Farrell (1971) as the random X model), the critical factor is whether or not they are independent of the error term. When they are not independent, ordinary least squares estimates are biased.

2. It is instructive to note that in many practical research situations the benefits of asymptotic properties accrue with relatively small sample sizes, for example, 100 observations or less.

3. For further communications concerning Labovitz's findings see Soc. Forces (June 1968) and Am. Sociol. Rev. (June 1971).

4. See, for example, the three comments in Am. Psychol. (Sept. 1958) on the Nefzger and Drasgow article.

5. A further assumption requires that the variance associated with the error term of different treatment populations be homoscedastic. But, in so far as non-normality and heterogeneity of variance tend to covary (Bartlett, 1947), it is convenient to treat them simultaneously throughout the remainder of this paper.

References

Anderson, N. H. (1961) 'Scales and statistics: parametric and nonparametric', Psychol. Bull. 58, 305-16
Baker, B. O., Hardyck, C. D. and Petrinovich, L. F. (1966) 'Weak measurement versus strong statistics: an empirical critique of S. S. Stevens' proscriptions on statistics', Educ. Psychol. Measur. 26, 291-309
Bartlett, M. S. (1947) 'The use of transformations', Biometrics 3, 39-52
Binder, A. (1959) 'Considerations of the place of assumptions in correlational analysis', Am. Psychol. 14, 504-10
Bliss, C. I. (1967) Statistics in biology, vol. 1 (New York)
Boneau, C. A. (1960) 'The effects of violations of assumptions underlying the t test', Psychol. Bull. 57, 49-64
Borgatta, E. F. (1968) 'My student, the purist: a lament', Sociol. Q. 9, 29-34
Carroll, J. B. (1961) 'The nature of the data, or how to choose a correlation coefficient', Psychometrika 26, 347-72
Clark, D. (1973) 'Normality, transformation and the principal components solution', Area 5, 110-13
Cochran, W. G. (1947) 'Some consequences when the assumptions for the analysis of variance are not satisfied', Biometrics 3, 22-38
Cole, J. P. and King, C. A. M. (1968) Quantitative geography (New York)
Gaito, J. (1959) 'Nonparametric methods in psychological research', Psychol. Rep. 5, 115-25
Gregory, S. (1963) Statistical methods and the geographer (London)
Haggett, P., Cliff, A. D. and Frey, A. (1977) Locational methods (London)
Hammond, R. and McCullagh, P. S. (1974) Quantitative techniques in geography (Oxford)
Hays, W. L. (1963) Statistics for psychologists (New York)
Kendall, M. G. and Stuart, A. (1967) The advanced theory of statistics, vol. 3 (London)
King, L. J. (1969) Statistical analysis in geography (Englewood Cliffs, N.J.)
Kmenta, J. (1971) Elements of econometrics (New York)
Koutsoyiannis, A. (1973) Theory of econometrics (New York)
Kowalski, C. J. (1972) 'On the effects of non-normality on the distribution of the sample product-moment correlation coefficient', Appl. Statist. 21, 1-12
Labovitz, S. (1967) 'Some observations on measurement and statistics', Soc. Forces 46, 151-60
Labovitz, S. (1970) 'The assignment of numbers to rank order categories', Am. Sociol. Rev. 35, 515-24
Lindquist, E. F. (1953) Design and analysis of experiments in psychology and education (New York)
Maddala, G. S. (1977) Econometrics (New York)
Mayer, L. S. (1970) 'Comment on "the assignment of numbers to rank order categories"', Am. Sociol. Rev. 35, 916-17
Moser, C. A. and Scott, W. (1961) British towns (Edinburgh)
Murphy, J. L. (1973) Introductory econometrics (Homewood, Ill.)
Nefzger, M. D. and Drasgow, J. (1957) 'The needless assumption of normality in Pearson's r', Am. Psychol. 12, 623-5
Norris, R. C. and Hjelm, H. F. (1961) 'Non-normality and product moment correlation', J. Exp. Educ. 29, 261-70
Nunnally, J. C. (1967) Psychometric theory (New York)
Pearson, E. S. (1931) 'The analysis of variance in cases of non-normal variation', Biometrika 23, 114-33
Poole, M. A. and O'Farrell, P. N. (1971) 'The assumptions of the linear regression model', Trans. Inst. Br. Geogr. 52, 145-58
Pringle, D. (1976) 'Normality, transformations, and grid square data', Area 8, 42-5
Rider, P. R. (1929) 'On the distribution of the ratio of mean to standard deviation in small samples from non-normal populations', Biometrika 21, 124-43
Roff, A. (1977) 'The importance of being normal', Area 9, 195-8
Snedecor, G. W. and Cochran, W. G. (1967) Statistical methods (Ames, Iowa)
Tinkler, K. J. (1972) 'The physical interpretation of eigenfunctions of dichotomous matrices', Trans. Inst. Br. Geogr. 55, 17-46
Yeates, M. H. (1968) An introduction to quantitative analysis in economic geography (New York)
Zeckhauser, R. and Thompson, M. (1970) 'Linear regression with non-normal error terms', Rev. Econ. Stat. 52, 280-6
