© Emilio Venezian 2009 Page 1
What does Fama (1965) establish?
by
Emilio Venezian
© Emilio Venezian 2009, 2011
This is a work in progress and is sent to you for your information and comment. Please do not
cite, quote, or otherwise distribute the information without prior written consent from me.
Invitations to present and discuss the material are, of course, welcome, as are criticism and
discussion of what I express.
What does Fama (1965) establish?
by
Emilio Venezian
Abstract
This paper provides a review of Eugene Fama's well-known paper of 1965 on the structure of
stock market returns. It departs from tradition by challenging the major conclusions reached
using the data that are contained in the original paper. I find that most of his conclusions are not
well grounded. A major source of problems is that Fama's methodology relies on the assumption
that recorded daily prices are continuous and fails to take into account the biases in the statistics
that arise from rounded data. A second source is the failure to check carefully for indications
that the price-generating model may not be stationary over the period studied. Moreover,
Fama's conclusion that the characteristic exponent of the distribution of increments is less than
two rests on the assumption that the sample increments have zero first-order autocorrelation,
while the justification for concluding that the autocorrelation is zero rests on the assumption that
the distribution of price increments is not Gaussian; that leads to a circular argument rather than
to a conclusion. The data show that first-order autocorrelations were not zero even if the biases
of rounding are insignificant. The problems are aggravated by the fact that the three methods
Fama uses to estimate the characteristic parameter lead to values that are basically uncorrelated.
1. Introduction
The paper “The Behavior of Stock-Market Prices” by Eugene Fama is sometimes credited as one
of the foundations of the “efficient markets hypothesis” that has dominated financial thinking for
half a century. It has been cited several hundred times. The website of the Journal of Finance
lists some 311 published papers as citing the paper, and the annual citation rate has, on average,
been increasing. One hundred and sixty-eight of the citations are in publications dated after
1999. Some books cite the paper with approbation and quote the conclusions without
discussion.1 That seems to be unjustified, for reasons I will discuss in this paper.
The paper consists of two parts. The first part deals with the issue of whether the daily
increments in the logarithm of prices (that I will refer to as “increments”) may be viewed as
samples from a normal distribution or are better characterized as samples from a stable Pareto
distribution of characteristic exponent less than 2. The second part deals with the issue of
whether the increments are serially independent. The author reports a great deal of analysis on
these issues and concludes that a characteristic exponent less than two and serial independence
appear to be justified on the basis of the analysis.
In the first part of the paper, Fama suggests some methods of estimating the characteristic
exponent of series of random numbers, based on the properties of stable Pareto distributions, and
derives from these two basic estimation equations which are non-linear transformations of the
equations expressing the basic properties. He implements these estimation procedures ignoring
the fact that non-linear transformations entail biases in the results which, if the empirical data
had matched the theoretical models, would have resulted in estimates that are biased downward.
The paper gives no information on the variability of the estimators, so the finding that the
estimates of the characteristic exponent are less than 2 means little: we do not know whether the
difference is large or small with respect to the expected bias and we do not know how it
compares to the standard deviation of the estimators.
Moreover, the theoretical models deal with increments inferred from prices that are recorded as
continuous variables, whereas the data used to obtain the estimates are based on prices that were
recorded to the nearest $0.125, the tick size in effect during the period in which the data were
obtained. Any kind of rounding creates problems with estimators. In the literature we find
hundreds of papers on this problem, sometimes categorized as the “grouped data” or “errors in
variables” problem. Given this complication, it is not possible to be dogmatic about the net
direction of all these effects, so the first part of the paper cannot be dismissed so easily.2
1 See, for example, Campbell, Lo, and MacKinlay, 1997.
2 In general, rounding would make a normal distribution appear leptokurtic because it makes zero increments
much more probable; this would provide an intuitive basis for concluding that it would make
the downward bias more pronounced. The question, however, is how far will the “fatness” of the tails prevail
and what effect will that have. As far as I know, the effects of rounding when the underlying distribution is
stable Pareto with exponent less than 2 have not been discussed adequately.
In the second part of the paper, Fama estimates the serial autocorrelation coefficients of the
series and treats them as though the underlying increments came from continuously recorded
prices; unfortunately, rounded prices introduce biases. He also introduces a theoretical problem: he uses
sampling theory for correlation coefficient estimators based on sampling from normally
distributed variates. That seems inconsistent with his general conclusion that the increments are
from stable Pareto distributions of characteristic exponent less than 2. Fama also does extensive
testing of the independence of increments based on runs tests; the rounding of prices makes the
runs tests theoretically inapplicable.
The main purpose of this paper is to discuss these and other related issues in some detail. The
material will be presented in sections, as follows. Section 2 discusses the relation between the
underlying results from the theory of stable Pareto distributions, as presented by Fama, and the
estimators that he used, presenting the results of Fama and some of the other issues that need to
be considered in assessing their significance. Section 3 discusses how the rounding of prices
affects the estimates of the increments and creates problems in the evaluation of the results.
Section 4 suggests an alternate way of approaching the issue of the distribution of increments
and uses simulations to provide some insights into the results. Section
5 turns the attention to the issues related to the assessment of independence. Section 6 deals with
issues that interlink the two major parts of Fama's paper. Section 7 provides some tentative
conclusions.
2. On Fama's Estimators of the Characteristic Exponent
In all, Fama describes three estimators of the characteristic exponent, α. One relies on graphical
analysis and Fama himself describes it as subjective; in fact, he limits himself to publishing
ranges for the estimates of the values of α obtained from this method. The other two methods are
somewhat less subjective. He labels them the range analysis and sequential variance estimators.
The idea for range analysis is derived from the observation that for all stable Pareto variables the
interquantile range of sums of variables is related to the number of terms in the sum by:

R(n) = n^(1/α) R(1)   (1)

where R(n) is the interquantile range of the set of sums of n non-overlapping terms of the series.
This relation holds for any interquantile range.
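For the Gaussian case (α = 2), Equation 1 implies R(n)/R(1) = √n, which is easy to check by simulation. The following sketch is my own illustration with assumed parameters (it is not Fama's procedure or data):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 0.015, size=1_000_000)   # assumed Gaussian increments

def iq_range(v, f=0.75):
    # interquantile range between quantiles 1 - f and f
    lo, hi = np.quantile(v, [1.0 - f, f])
    return hi - lo

r1 = iq_range(x)
for n in (4, 9, 16):
    sums = x[: (len(x) // n) * n].reshape(-1, n).sum(axis=1)
    # for alpha = 2 the ratio R(n)/R(1) should be close to n ** 0.5
    print(n, iq_range(sums) / r1, n ** 0.5)
```

With a sample this large the simulated ratios track √n closely; with samples of the size Fama had, the ratios fluctuate appreciably, which is the point taken up below.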
In order to solve this for the variable of interest, α, he solves Equation 1 to express the estimator
as:
α̂ = log(n) / log( R(n)/R(1) )   (2)
This estimator is implemented by Fama by finding the values for quantile levels f of 0.75, 0.83, 0.90,
0.95, and 0.98 and for values of n of 4, 9, and 16. No rationale is given for the particular choices.
The sequential variance estimate is based on the observation that the quantiles of the distribution
of the variance of samples of n₁ and n₂ terms from a stable Pareto distribution of characteristic
exponent α are related by the expression

Median[ s²(n₂)/s²(n₁) ] = (n₂/n₁)^((2/α) − 1)   (3)

where

s²(n) = (1/n) Σᵢ₌₁ⁿ ( xᵢ − (1/n) Σⱼ₌₁ⁿ xⱼ )².
From Equation 3 he obtains an estimator that can be expressed as:

α̂ = 2 log(n₂/n₁) / ( log( s²(n₂)/s²(n₁) ) + log(n₂/n₁) )   (4)
The estimator is implemented by Fama by obtaining the variance of the first 200, 300, … 800
increments in his series as values of s²(n₁), and that of the first 300, 400, and then increasing by
100 until the maximum length of each series is reached, as values of s²(n₂). This choice of
implementation uses a single estimate of the ratio s²(n₂)/s²(n₁) as an estimator of the median.
Moreover, it uses overlapping periods, so that the numerator and denominator are certainly not
independent, and ensures that increments that appear late in the sequences are never represented
in s²(n₁).
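A minimal sketch of the sequential variance estimator of Equation 4, run on simulated Gaussian increments (the parameters and variable names are my own assumptions; the overlapping first-n windows follow Fama's implementation). For Gaussian data the estimate should land in the neighborhood of 2:

```python
import numpy as np

def seq_var_alpha(u, n1, n2):
    # Equation 4: alpha estimated from the variances of the first n1 and
    # first n2 increments (overlapping windows, as in Fama's implementation)
    s1 = np.var(u[:n1])        # maximum-likelihood variance, divisor n1
    s2 = np.var(u[:n2])
    return 2.0 * np.log(n2 / n1) / (np.log(s2 / s1) + np.log(n2 / n1))

rng = np.random.default_rng(1)
u = rng.normal(0.0, 0.015, size=800)    # assumed Gaussian increments
print(seq_var_alpha(u, 200, 800))       # near 2 for Gaussian data
```

Note that a single draw gives a single noisy value; the sampling variability of this estimator is exactly the information the original paper does not report.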
Both estimators are of the form:

α̂ = f(y)   (5)

where y is the quantity measured empirically. It is not at all clear that the expected value of the
estimator is equal to the value of α. In fact f is a convex function of y, so by the inequality of
Jensen, 1906, we will have:

E[α̂] = E[f(y)] ≤ f(E[y]) = α   (6)
Thus the range analysis estimator is clearly biased downward from the true value. This implies
that if the increments were indeed normally distributed the average of estimators should be less
than 2.
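The size of the Jensen effect for the range-analysis transform can be gauged by simulation. The sketch below is my own illustration under assumed parameters (Gaussian increments with σ = 0.015, series of 1,152 observations, n = 4, f = 0.75); it compares the mean of the transformed estimates, E[f(y)], with the transform applied to the mean ratio, f(E[y]). The gap between the two is the bias induced by the non-linear form; its net direction in the empirical setting depends also on rounding and truncation, which this sketch does not include.

```python
import numpy as np

rng = np.random.default_rng(2)

def range_ratio(n_sum=4, n_obs=1152, f=0.75):
    # one simulated value of y = R(n)/R(1) for Gaussian increments
    u = rng.normal(0.0, 0.015, size=n_obs)
    q = lambda v: np.quantile(v, f) - np.quantile(v, 1.0 - f)
    return q(u.reshape(-1, n_sum).sum(axis=1)) / q(u)

y = np.array([range_ratio() for _ in range(2000)])
alpha_hat = np.log(4) / np.log(y)          # the nonlinear estimator f(y)
# compare the mean of the transformed values with the transform of the
# mean ratio: the difference between the two is the Jensen effect at issue
print(alpha_hat.mean(), np.log(4) / np.log(y.mean()))
```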
In the case of the sequential variance estimator the case cannot be made as strongly, because the
relationship involves the median and not the expected value of the random variable . It is,
accordingly, possible that:
E[α̂] = E[ f( s²(n₂)/s²(n₁) ) ] ≤ f( Median[ s²(n₂)/s²(n₁) ] ) = α   (7)

though the fact that it might be so needs to be proven rather than assumed.3
Fama finds that his estimates4 of the characteristic exponent are frequently less than 2 (in 21 out
of 30 stocks in his database for the range analysis and 23 out of 30 for the sequential variance
estimation). From the comments made above it follows that downward departures could be due
merely to the non-linear form of the estimators. Hence the finding does not establish that the
characteristic exponent is not exactly 2.
3. The Effects of the Rounding of Prices on Fama's Arguments
The basic data on which Fama relies are daily closing prices of the 30 Dow Jones Industrial
stocks on the New York Stock Exchange. He tells us that the data are adjusted for stock splits.
Those data would lead to unbiased and correct computation of the daily increments in stock prices
if the NYSE prices had been recorded as continuous variables, as Fama assumes in his
equations. In fact, however, at the time the data were generated the recording was based on ticks
of 12.5 cents. A theoretical closing price of $33.100 would have been recorded as either $33 or
$33 1/8. Two potentially serious complications arise from this rounding.
The first is that the usual adjustment for splits and stock dividends is not perfect. With
continuous prices a simple multiplier will achieve comparability of the increments before and
after the change in the number of shares. With rounded prices a change in the number of shares
would require a corresponding change in the tick size for the distribution of increments to remain
unaffected. In particular, it is possible that large splits may lead to non-stationarity of the
increments estimated from the reported prices. That problem cannot be fixed so easily.
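A toy numerical illustration (hypothetical prices, not drawn from Fama's data) of why the usual multiplier adjustment is imperfect: after a 3-for-1 split the $0.125 tick on the post-split price corresponds to a $0.375 grid on the split-adjusted series, so the granularity of the adjusted prices changes at the split date.

```python
# Hypothetical numbers for illustration; not drawn from Fama's data.
tick = 0.125

def round_to_tick(p):
    # record a "continuous" price at the nearest exchange tick
    return round(p / tick) * tick

true_price = 60.20                    # assumed pre-split continuous price
pre = round_to_tick(true_price)       # recorded pre-split price: 60.25
post = round_to_tick(true_price / 3)  # after a 3-for-1 split, tick unchanged
adjusted = post * 3                   # the usual multiplier adjustment: 60.375
print(pre, adjusted)
```

The same underlying price is recorded as 60.25 before the split but as 60.375 after adjustment, because the post-split series lives on a coarser adjusted grid.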
3 It has to be kept in mind that we are not dealing with the unbiased estimator of the variance, so that the
expected value of the ratio is not exactly one, though it will be very close to one if the numbers of points in the
samples is of the order of hundreds rather than tens. 4 It should be made clear that Fama summarizes his results. For the range analysis estimator he gives the average
of the 15 values obtained for the parameter sets he selected. For the sequential variance estimator he gives the
median of the values he calculated, these numbered from 49 to 84 depending on the length of the series. No
rationale is given for these choices of summary statistics.
The second is that even in the absence of effects such as splits and stock dividends the increments
are measured with error, and that error distorts the underlying distribution. If the rounding is to
the nearest integer tick, the recorded price may differ from the price in Fama's model by as much
as 6.25 cents. Moreover, as the price of a security changes as a result of the randomness of the
increments, the distribution of the calculated increments must change even if the underlying
increments are stationary as specified in the model.
Analyses based on the assumption that the underlying distribution of price changes is normal with
independent differences lead to the conclusion that the rounded prices will yield unbiased
estimates of the mean of the distribution of increments, an overestimate of the variance of that
distribution, an elevated excess kurtosis, and a negative autocovariance of the reported price
changes at a lag of one period (Gottlieb and Kalay, 1985; Marsh and Rosenfield, 1986; Ball,
1988; Cho and Frees, 1988; Harris, 1990; Venezian, 2011).
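These effects can be reproduced in a small simulation. The sketch below is my own set-up with assumed parameters (a $10 starting price, σ = 0.015, series of 1,728 days, averaged over 200 replications); it compares the variance and lag-1 autocorrelation of increments computed from tick-rounded prices against those computed from the underlying continuous prices, and shows the variance inflation and negative lag-1 correlation cited above.

```python
import numpy as np

rng = np.random.default_rng(3)
TICK, P0, SIGMA, N, REPS = 0.125, 10.0, 0.015, 1728, 200

def one_series():
    u = rng.normal(0.0, SIGMA, size=N)            # true log-increments
    p = P0 * np.exp(np.cumsum(u))                 # continuous prices
    q = np.maximum(np.round(p / TICK), 1) * TICK  # prices rounded to the tick
    v = np.diff(np.log(q))                        # reported increments
    var_ratio = np.var(v) / np.var(np.diff(np.log(p)))
    rho1 = np.corrcoef(v[:-1], v[1:])[0, 1]       # lag-1 autocorrelation
    return var_ratio, rho1

ratios, rhos = np.array([one_series() for _ in range(REPS)]).T
print(ratios.mean(), rhos.mean())  # variance inflated, lag-1 correlation negative
```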
This has some potentially important consequences. One is that the graphs and tables that display
the frequency of departures from the mean as multiples of the standard deviation are understating
the case, because the estimates of the standard deviations are likely to be too high. Assuming that
the stocks traded mostly at prices above $40 and that the standard deviations of daily returns
were of the order of 0.015, this effect should be negligible.5 Another, and potentially more
serious one, is that if the true increments were indeed stable Pareto, then the reported increments
would not be stable Pareto, so tests based on the properties of that family of distributions may
not perform well with the empirical data. Tests based on kurtosis will also give incorrect results
(Venezian, 2011). Yet another implication is that the price does matter; thus stocks with the same
underlying increments and the same tick size will experience different distributions of measured
increments if the paths of price levels are different. Finally, though the conditions for ergodicity
are satisfied, at least for increments whose distributions have finite variance, the time required to
assure that the time average approaches the ensemble average may be very long indeed. This
suggests that a new approach is needed to investigate the matter.
5 Appendix 1 provides evidence that the standard deviations of daily returns are of this order of magnitude.
Correcting the results for rounding would make the standard deviation of the underlying process somewhat
lower. The price of $40 takes into account my best guess of the magnitude of the distortions.
4. A Possible Alternative Approach
One possible line of attack is to retrench to the mindset of classical statistical tests. In principle
we can postulate as a null hypothesis that the increments are indeed normally and independently
distributed and find the distribution of any statistic we wish based on that null hypothesis. Then
we can ask if the empirical measures are sufficiently different from the derived distribution as to
warrant rejection of the null hypothesis.
That is a great deal easier to say than to do. Analytical treatment appears to be prohibitively
difficult in the case of rounding, especially for estimators such as those used by Fama, and
numerical treatment has to cope with many necessary parameters, such as the initial price, the
mean and standard deviation (or more generally, location and scale parameters) of the
distribution and the numbers of points available and selected.6 But these are not impossible tasks.
In practice, it is of interest to determine, for given parameters generally in the range of those
used by Fama, what the distribution of the individual measures used by Fama might be if the
underlying increments are independent and normally distributed, and whether that distribution
would be consistent in some sense with the data. This might provide some indication of whether
the hypothesis of normality can be rejected or not on the basis of results such as those presented
by Fama. It is also of interest whether the summary statistics of Fama (the averages of the range
analysis estimators and the medians of the sequential variance estimators) might be consistent
with the null hypothesis.
To address these issues, simulated increments were drawn from a normal distribution, and daily
prices were calculated from these increments and rounded to the nearest $0.125. From these
rounded prices the reported daily increments were calculated, and then the methods that Fama
used were implemented. I used two sample sizes, with 1,152 and 1,728 observations. These cover
most of the range of sizes which Fama encountered (1,118 to 1,693) and are evenly divisible by
the summing intervals he used (4, 9, and 16 days). I used initial stock prices of $10, almost
certainly below the range encountered by Fama, and $20 and $50, which should be in the right
range. At initial prices above $50, the effect of rounding is small. In all the simulations I used
stationary parameters for the underlying distribution of increments, with an expected value of
zero and a standard deviation of 0.015, which are at the center of the range of those encountered
in Fama's material, as shown in Appendix 1.
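The set-up just described can be sketched as follows (my own condensed implementation; only the range-analysis branch is shown, with the average over the 15 (f, n) combinations as the summary statistic, as in Fama's reporting):

```python
import numpy as np

rng = np.random.default_rng(4)
TICK = 0.125

def simulated_alpha(p0=50.0, n_obs=1728, sigma=0.015):
    # one replication of the set-up above: Gaussian increments, prices
    # rounded to the tick, range analysis averaged over the 15 (f, n) pairs
    u = rng.normal(0.0, sigma, size=n_obs)
    p = np.maximum(np.round(p0 * np.exp(np.cumsum(u)) / TICK), 1) * TICK
    v = np.diff(np.log(p))                   # reported increments
    estimates = []
    for f in (0.75, 0.83, 0.90, 0.95, 0.98):
        for n in (4, 9, 16):
            m = (len(v) // n) * n
            sums = v[:m].reshape(-1, n).sum(axis=1)
            q = lambda w: np.quantile(w, f) - np.quantile(w, 1.0 - f)
            estimates.append(np.log(n) / np.log(q(sums) / q(v)))
    return float(np.mean(estimates))

print(simulated_alpha())
```

Repeating this 500 times, as described below, traces out the sampling distribution of the summary estimate under the null hypothesis.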
Figure 1 shows the results of range analysis when the initial stock price is $50. In this and
subsequent figures it should be kept in mind that a maximum absolute difference in the
cumulative distribution functions of a sample of 30 stocks and the simulated curve of 500 points
of 0.36 (36 percent) is the critical level of the Kolmogorov-Smirnov test to reject, at the 0.1
percent level, the hypothesis that two distributions are the same.7
6 These are a minimum set. Other relevant variables include, among others, descriptions of whether the ranges are
independent, completely overlapping, or partially overlapping.
The difference between the
mean and the median is of small import. It is clear, on the other hand, that the correlation
coefficient has a major systematic effect on the estimates, as Fama discussed, with positive
correlation giving a median estimate of 1.89 at a correlation of +0.1, 2.06 at a correlation of zero,
and 2.27 at a correlation of −0.10. This issue will be discussed more fully in Section 6.
The results of Fama are not consistent with the hypothesis that the increments come from an
underlying stable normal distribution unless the correlation coefficient is close to -0.1. Only two
of the thirty stocks had correlation coefficients in that range, whereas 7 had coefficients close to
+0.1. Truncation and Jensen inequality effects, which are taken into account by the simulation,
do not have an important effect on that conclusion but their impact is plain to see if the median
values of the distribution of α are read from the graph. Their effect is to strengthen the
conclusion; shifting the curve to a median of 2.0 would reduce the maximum difference from the
empirical distribution to less than 0.35.
Figure 1: Results from range analysis estimation
The corresponding results for initial prices of $20 and $10 are not sufficiently different to
warrant separate discussion. The length of the series has effects so small that the thickness of a
line conceals them.
The results for sequential variance estimates show minimal effects from correlation, so that
variable can be dismissed from our discussion. Results for a $50 stock are shown in Figure 2.
7 In contrast, a difference of 0.126 (12.6 percent) indicates that the difference between two simulated curves is
significantly different at that level.
The results of these simulations strengthen the conclusions of Fama in the sense that they concur
that the distribution underlying the empirical data is not likely to be a normal distribution.
Figure 2: Results from sequential variance estimation
In this case, the differences between long and short series and between mean and median are
somewhat more pronounced. Nonetheless, the distribution of the empirical data is not consistent
with the hypothesis of a stationary normal distribution for the underlying increments.
There is a fine but important difference between the conclusions drawn here and those of Fama.
His conclusion is that the distribution is a stable Pareto distribution. My conclusion goes no
further than saying that the data from this method are not consistent with the null hypothesis of
an underlying stationary normal distribution, and it would go no further even if I were convinced
that the expected value of the population autocorrelation coefficients is exactly zero and the
process was stationary.
5. Assessment of the Independence of Sequential Increments
Fama relies heavily on runs tests in the assessment of serial independence. The runs tests of
Mood are based on the null hypothesis that there are a number of distinct outcomes, each of
which has a probability of occurring that is constant over the period of observation. Ideally the
distinct outcomes can be specified objectively a priori, as in the case of heads, tails, or edge on a
series of flips of a coin or the numbers 1 through 12 on the toss of a dodecahedral die.8
If the increments are postulated to come from a specific distribution, and the prices are recorded
as continuous variables, these tests are applicable. The probability that the increment will be
between a and b will be the same over time, provided the distribution of increments is stationary.
Hence for sufficiently long series of observations the runs tests should reject the null hypothesis
if the increments are not independent or if the distribution is not stationary.
If the prices are recorded on a rounded basis and the increments have to be inferred from the
prices, then the tests should fail for sufficiently long series if the distribution has non-zero
variance or expected value. This result can be appreciated readily because the probability of a
calculated increment of exactly zero, when prices are recorded to the nearest 12.5 cents, is going
to be different when price is $10 than when it is $15. In fact, under those conditions we cannot
even be sure that stocks whose increments have non-stationary means and variances will fail the
test more often than stocks whose increments have stationary distributions. For example, if the
standard deviation declines as price increases, so that their product remains nearly constant, then
the probability of exactly zero reported change in price will remain approximately stationary.
Thus all the runs tests conducted by Fama amount to telling us that either the series were too short
to detect changes or that the mean and variance of the underlying increments changed over time
(in response to price changes or as a result of economic events) in a pattern that led to non-
rejection.
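The dependence of the zero-increment probability on the price level is easy to quantify with a normal approximation. The sketch below is a simplification (it assumes σ = 0.015 and ignores where the current price sits within the tick grid), but it shows the order of the effect for $10 versus $15 stocks:

```python
from math import erf, sqrt

def p_zero_tick(price, sigma=0.015, tick=0.125):
    # P(|dP| < tick/2) for a one-day dollar change dP ~ N(0, (sigma*price)^2):
    # the chance that the day's move rounds to a reported change of zero
    z = (tick / 2.0) / (sigma * price)
    return erf(z / sqrt(2.0))

print(p_zero_tick(10.0), p_zero_tick(15.0))
```

Under these assumptions roughly a third of the daily changes of a $10 stock round to zero, against about a fifth for a $15 stock, so the outcome probabilities on which the runs test relies are not constant as the price moves.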
The other approach that Fama takes is through the estimation of autocorrelation coefficients at
various lags. Even if we are willing to forget the fact that prices are rounded, and other problems,
the results are somewhat dismaying for someone who would like to believe that the increments
are independent. At lag 1, nine of the 30 stocks have correlation coefficients that are significant
at the 1 percent level by the Fisher z-transform test. These are shown in Table 1. The results are
peculiar in that the correlation coefficients seem to cluster at plus and minus 0.1. Note that the
8 If the categories are specified a posteriori we have to consider the danger of using the standard test because the
“outcomes” can be selected, subconsciously or intentionally, to achieve preconceived results.
numbers given here are somewhat different from those given by Fama.9 His criterion was to
discuss correlation coefficients greater than twice their standard deviation; the z transform gives
a better approximation to normality than the correlation coefficient and a one percent criterion is
easy to implement.
Table 1: Stocks with significant correlation coefficients at lag 1

Stock                   ρ        p-level
Alcoa                   0.118    2.21 · 10⁻⁵
American Can           −0.087    117.71 · 10⁻⁵
American Tobacco        0.111    3.34 · 10⁻⁵
Goodyear               −0.123    1.28 · 10⁻⁵
International Nickel    0.096    34.83 · 10⁻⁵
Procter & Gamble        0.099    8.02 · 10⁻⁵
Sears                   0.097    31.68 · 10⁻⁵
Texaco                  0.094    67.42 · 10⁻⁵
Union Carbide           0.107    16.75 · 10⁻⁵
Assuming that the test is valid, having 9 of 30 stocks with correlation coefficients that are not
likely to have arisen by chance under the null hypothesis cannot be anything but discouraging.
Moreover, the levels of significance are well below one percent. Fama dismisses this finding
with the statement:
“All the sample serial correlation coefficients in Table 10 are quite small in
absolute value. The largest is only 0.123. Although 11 of the coefficients for lag
τ = 1 are more than twice their computed standard errors, this is not regarded as
important in this case. The standard errors are computed according to
equation (12); and, as we saw earlier, this formula underestimates the true
variability of the coefficient when the underlying variable is stable-Paretian with
characteristic exponent α < 2. In addition, for our large
samples the standard error of the serial correlation coefficient is very small. In
most cases a coefficient as small as 0.06 is more than twice its standard error.
'Dependence' of such small order of magnitude is, from a practical point of view,
probably unimportant for both the statistician and the investor.” (page 70)
Fama's Equation (11) is:

r(τ) = Σ_{t=1}^{N−τ} (u_t − ū)(u_{t+τ} − ū) / Σ_{t=1}^{N} (u_t − ū)²   (11)
9 Fama marks 11 of the 30 stocks as having correlation coefficients more than 2 standard deviations away from
zero; as noted in the body, I use the Fisher z-transform test which, under my null hypothesis of normality, is
better when the underlying correlation is not zero.
and Equation (12) is:

σ(r) = 1/√N   (12)

where N is the sample size.
After displaying this equation Fama states:

“Previous sections have suggested, however, that the distribution of u is stable
Paretian with characteristic exponent α less than 2. Thus the assumption of finite
variance is probably not valid, and as a result equation (12) is not a precise
measure of the standard error of r even for extremely large samples. Moreover,
since the variance of u comes into the denominator of the expression for r, it
would seem questionable whether serial correlation analysis is an adequate tool
for examining our data.” (page 69)
He then summarizes the results of an exercise in which he randomized the order of the first
differences for each stock, estimated the sample autocorrelation coefficient at a lag of 1 day for the
first 5, first 10, … of the randomized differences, and compared how often the results crossed the
boundary of zero plus or minus two standard deviations estimated from Equation (12). His
summary is:
“Although the results must be judged subjectively, the sample serial correlation
coefficients appear to break through their control limits only slightly more often
than would be the case if the underlying distribution of the first differences had
finite variance. From the standpoint of consistency the most important feature of
the sample coefficients is that for every stock the serial correlation is very close to
the true value, zero, for samples with more than, say, three hundred observations.
In addition, the sample coefficient stays close to zero thereafter.” (page 70)
I believe that this exercise suggests something quite different from what Fama read into it. It
certainly does not stand as a clear demonstration that “this formula underestimates the true
variability of the coefficient when the underlying variable is stable-Paretian with characteristic
exponent α < 2.”10
The paper does not report how many randomizations were carried out for
each stock; if only one was performed for each stock, we have no guarantee that the “true value”
of the serial autocorrelation coefficient for that one randomization was actually zero. That part of
the conclusion might be credible if several dozen randomizations had been performed.
The fact that the control limits are broken “only slightly more often than would be the case if the
underlying distribution of the first differences had finite variance” is puzzling. In the first place,
10 The stem “underestimate” occurs only twice in the paper; once in the passage quoted and once on page 39, in an
entirely different context. In particular, the possibility that the values of the stable Pareto parameter estimated by
Fama might underestimate the true parameter is not discussed in the 1965 paper.
even in the case of independent, normally distributed variables the distribution of the sample
correlation coefficient is not normally distributed but is related to a t-distribution (Snedecor and
Cochran, 1967), so that the control limits are not independent of the number of points. One of the
usual tests for significant departure of the sample coefficient r from zero relies on this by comparing
the value of |r|√(N − 2)/√(1 − r²) to the t-distribution with N − 2 degrees of freedom. The better
alternative is to use the normalized Fisher z-transform,

√(N − 3) · (1/2) ln( (1 + r)/(1 − r) )

which, under the null hypothesis of no correlation, is approximately normally distributed with zero
mean and unit variance even for samples of size 50 or so. This formulation, too, is strictly correct
only for normally distributed variables, though some books suggest that it may be useful with
other underlying distributions.11
The normalized Fisher transform is particularly useful when the null hypothesis involves
correlation coefficients different from zero or if we are interested in the fiducial range of a
measured coefficient that is not exactly zero, because the transform remains close to normal even
for large correlation coefficients, though the expected value is no longer zero.
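Both test statistics described above are straightforward to compute. The sketch below implements them; the sample sizes shown are assumed for illustration, since the exact per-stock sample sizes are not reproduced here:

```python
import math

def t_stat(r, n):
    # |r| * sqrt(n - 2) / sqrt(1 - r^2), referred to the t-distribution
    # with n - 2 degrees of freedom
    return abs(r) * math.sqrt((n - 2) / (1.0 - r * r))

def fisher_z(r, n):
    # sqrt(n - 3) * (1/2) ln((1 + r)/(1 - r)); approximately N(0, 1)
    # under the null hypothesis of zero correlation
    return math.sqrt(n - 3) * 0.5 * math.log((1 + r) / (1 - r))

# a coefficient of 0.118 (the largest positive value in Table 1) at an
# assumed sample size of 1,200 observations
print(t_stat(0.118, 1200), fisher_z(0.118, 1200))
# a coefficient of 0.06 at the same assumed sample size is indeed more
# than twice its standard error, consistent with Fama's remark
print(fisher_z(0.06, 1200))
```

At samples of this size the two statistics are nearly identical for small r; the Fisher transform earns its keep when the null value or the measured coefficient is well away from zero.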
The fact that Fama uses control limits that do not take into account correctly the influence of
sample size makes the statement about deviations somewhat more ambiguous than it appears,
since no specific mention is made of how often the deviations beyond the control limits occurred
or whether they occurred only at very large sample sizes (when the true control limits are
virtually invariant as the sample size changes) or at smaller sample sizes (where the difference
between the t- and normal distributions might contribute to the problem).
Taken as a whole, however, the exercise has some interesting consequences.
On one hand we have the finding that when increments are randomized the hypothesis of zero
autocorrelation is rejected about as often as it would be under the hypothesis that increments are
independent and normally distributed. On the other hand we have the finding that if we take the
increments in their natural order the results reject the hypothesis of zero autocorrelation well
beyond the one percent level for 9 out of 30 stocks. Put together, these findings are not consistent
with the notion that it is the distribution of increment sizes that creates problems with the tests. If
the distribution were the problem it would have been a problem no matter what the ordering of
the increments.12
Thus the exercise underlines that the finding of so many significant correlation
coefficients is, at least from the statistical point of view, a very unlikely event even if we do not
take into account the conclusion that rounded prices lead to estimated correlation coefficients
that are likely to be lower than the true coefficient, as shown by Harris (1990) and Venezian (2011).
The effect of the shape of the distribution of increment sizes on the distribution of correlation
coefficients is potentially important and deserves more than a passing mention. This, once again,
11 For example, Sachs, 1982, states: “This r to z transformation requires that x and y have bivariate normal distribution in the population. The larger the sample size, the less stringent is this assumption.” However, later in the same paragraph he states: “One uses this transformation only for samples with n > 10 from a bivariate normal distribution.” (page 427)
12 This conclusion would be even stronger if we knew that multiple randomizations had been conducted.
can be explored by some relatively simple simulations. I thought it would be instructive to use
the family of t-distributions as a starting point, because when the parameter of that family is
set to 1 we have the Cauchy distribution, which is the member of the stable Pareto family with
characteristic exponent α = 1, and when the parameter goes to infinity we have the normal
distribution, which is the member of the stable Pareto family with characteristic exponent α = 2.
Between these extremes we have a family that is leptokurtic, though for most practical purposes
the distribution can be considered normal when the parameter is over 100.
A simple way of displaying the results is to determine the ratio of how often the normalized
Fisher z transform (NFT) would reject the hypothesis of zero correlation at the two tails to how
often the hypothesis should be rejected. For example, if 10,000 series of 1,501 points are
simulated and each tail is tested at the one percent level, we would expect to have 100 points
rejected at each tail. If, in fact, the simulated series were to give us 200 in the lower tail and 30 in the
upper tail, the ratios would be 2.0 and 0.3, respectively. The results of such a simulation are
given in Figure 3. Note that the symbols for the individual points are color-filled if the number of
points was significantly different from the expected number by a chi-squared test and open if the results
were not significant. Because many tests were performed, I used the 0.005 level of significance
as a threshold. The results are highly supportive of Fama's contention that tests based on the
normal distribution would reject the null hypothesis of zero correlation too often. In fact, it is not
necessary to have variates with infinite variance to have a substantial distortion; even the t-
distribution with parameter 3, which has a finite variance, results in about twice as many
rejections as would be appropriate. By the time the parameter reaches 5, however, the test
performs approximately as we would want.
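A simulation of this kind can be sketched as follows. This is an illustration of the approach, not a reproduction of the paper's runs; the default number of series is kept small for speed, and the function name and defaults are mine:

```python
import numpy as np
from scipy.stats import norm

def tail_rejection_ratios(df, n_series=2000, n_points=1501, p=0.01, seed=0):
    """Ratio of observed to expected one-tailed NFT rejections of H0: rho = 0.

    Lag-1 correlations are computed from i.i.d. increments drawn from a
    t-distribution with `df` degrees of freedom (df = 1 is the Cauchy case;
    large df approaches the normal).  The defaults are smaller than the
    10,000 series discussed in the text, purely to keep the sketch fast.
    """
    rng = np.random.default_rng(seed)
    z_crit = norm.ppf(1.0 - p)                    # one-sided normal critical value
    lo = hi = 0
    for _ in range(n_series):
        x = rng.standard_t(df, size=n_points)
        r = np.corrcoef(x[:-1], x[1:])[0, 1]      # sample lag-1 autocorrelation
        z = np.arctanh(r) * np.sqrt((n_points - 1) - 3)  # NFT on n-1 pairs
        lo += int(z < -z_crit)
        hi += int(z > z_crit)
    expected = n_series * p
    return lo / expected, hi / expected
```

Ratios near 1.0 in both tails indicate that the normal-theory test performs as intended for that value of the t parameter.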
Figure 3, however, displays the results that would be obtained with continuous recording of
prices, which was presumably what Fama had in mind in making his argument. The prices Fama
used to determine increments and their correlation coefficients were, however, recorded only to
the nearest $0.125. The performance under those conditions would be a better gauge for making
decisions. The results of such simulations will, of course, depend on the assumed initial price of
the stock.
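The effect of tick rounding on recorded returns can be sketched along these lines (an illustrative construction, not Fama's procedure; the handling of a rounded price of zero mirrors the survivorship issue discussed below):

```python
import numpy as np

def rounded_tick_returns(log_increments, p0=50.0, tick=0.125):
    """Returns computed from a price path rounded to the nearest tick.

    `log_increments` are continuous log-price changes; the continuous path
    is re-rounded to `tick` (here $0.125, the tick of Fama's data) before
    log returns are recomputed.  If a rounded price reaches zero the log
    return is undefined, so the replicate is dropped (returns None).
    """
    path = p0 * np.exp(np.concatenate(([0.0], np.cumsum(log_increments))))
    rounded = np.round(path / tick) * tick
    if np.any(rounded <= 0.0):
        return None                     # 'infinite return'; replicate discarded
    return np.diff(np.log(rounded))
```

Feeding such rounded returns, rather than the continuous increments, into the correlation and NFT machinery reproduces the recording conditions of the original data.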
Figure 4 shows the results for simulations with an initial price of $50 and a tick size of $0.125.
One clear similarity with Figure 3 is that as the probability level tends to zero the ratio goes well
above one. Thus for sufficiently stringent tests, the argument remains valid.
There are, however, also some striking differences between Figure 3 and Figure 4. One is that
Figure 3 appears to be symmetric, with the two tails behaving in much the same way. Figure 4,
on the other hand, suggests a lack of symmetry. This should be expected from the known biasing
effect of rounding that was discussed above. The other striking difference is that Figure 4 shows
the ratio going well below one, and with numbers large enough to be statistically significant. For
some of the parameter values used in the simulations the ratios are low enough at probability
levels of 0.01 and 0.02 to invalidate the argument put forth by Fama.
Figure 3 (Panels A and B): Performance of the NFT with correlations from t-distributed variates
Figure 4 (Panels A and B): Performance of the NFT with correlations from t-distributed variates
with rounding of prices and an initial stock price of $50
Figure 5 shows that even with initial prices of $200 the issues persist. Ratios below one exist at
probability levels below 0.10 and in some cases, including cases with infinite variance, do not
return above one until probability levels well below the conventional 0.01.
An important feature of the simulations that is not apparent in the figures is that with continuous
prices an infinite return is not possible in theory.13 With rounding to the nearest
$0.125 and distributions of very large or infinite variance, however, infinite returns should be
expected, since a rounded price can reach zero. Replicates that met this problem in the 1,501 “days” of
simulated prices did not yield correlation coefficients. Thus we have a form of “survivorship bias,”
but one that would also be met in practice. With an initial stock price of $50 the problem was
encountered 4,195 times with a t parameter of 1; 2,366 times with a parameter of 1.25; 849 times
with a parameter of 1.50; 229 times with a parameter of 1.75; and 43 times with a parameter of
2.00; the problem did not occur with the parameters of 3.00 and 5.00, which have finite variance.
With an initial stock price of $200 the corresponding numbers for the parameters with infinite
variance were 3,922; 1,948; 642; 152; and 25; again, it never occurred with the two parameters of
finite variance.
Thus the rounding of prices may cause serious problems, and it raises doubts about the validity of
Fama's arguments. That does not imply that his arguments must be wrong, since the stable
Pareto family is not the t-distribution family. But the example does provide evidence that
rounding can have serious implications that need to be considered carefully in appraising the
relation between empirical evidence and theory.
This exercise suggests that with fat-tailed distributions that are not stable Pareto distributions the
use of standard normal criteria for evaluating correlation coefficients is dangerous. It also raises
doubts about Fama's view that the standard deviation derived from Equation (12)
“underestimates the true value of the variability when the underlying variable is stable-Paretian
with characteristic exponent α < 2.” It certainly leads to too many rejections of the null
hypothesis at extremely low values of probability, but to too few rejections at levels close to the
ones frequently used in research.
The results of the study showed 2 significant negative coefficients and 7 significant positive
coefficients. Even if the negative tail were overrepresented 10-fold it would not affect the
conclusion that there is an excess of “significant” correlations unless the ratio at the upper tail is
also significantly higher than 1. Hence, while I agree with Fama that the measured
autocorrelation at a lag of one day is small, I could not dismiss it so readily as statistically
insignificant even if I were to assume that the distribution of daily returns has infinite variance.
13 In practice it is possible that this would happen in a simulation because even with double precision calculations “continuous pricing” is subject to some rounding. In some 100,000 simulations this never occurred.
Figure 5 (Panels A and B): Performance of the NFT with correlations from t-distributed variates
with rounding of prices and an initial stock price of $200
6. Interactions and Other Issues.
In this section I discuss a number of points that have not been noted above. I begin by pointing
out that if we take Fama‟s three methods of estimating the characteristic coefficient at face value,
they show no consistency. Then I note that there is internal evidence that the two more
quantitative methods may not be providing unbiased answers, these are discussed under separate
headings. Finally I deal with the issue of stationarity since this is essential for the soundness of
the estimation procedures and in my view was not tested adequately.
Cross-validation of the methods of estimation
Since the three methods are intended to measure the same characteristic, one might expect to see
some measure of concordance in a published academic paper. Unfortunately, none is provided.
The results of exploring concordance are not encouraging. Fama presents a summary of his
estimates in his Table 9, with the comment “Even a casual glance at Table 9 is sufficient to show
that the estimates of α produced by the three different procedures are consistently less than 2.”
The table, here repeated as Table 2, has other features that should be just as easy to appreciate.
One noteworthy aspect is that in only 8 of the 30 stocks does the estimate from range analysis
fall within the range provided by the graphical method and only in 3 of 30 cases does the
estimate from the sequential variance method fall within that range. In fact in only one case
(General Foods) do the range analysis and sequential variance method estimates both fall within
the range Fama found through his graphical method.
A second observation is that the results from the two methods that Fama classifies as “less
subjective” do not seem to have much in common. Correlation coefficients of the characteristic
values obtained by the two methods are small and make it unlikely that they are measuring the
same property. The Pearson correlation coefficient between the estimates of the range analysis
and sequential variance methods is only 0.058, which is not significantly different from zero if
we use standard deviations that assume the sampling is from a bivariate normal distribution. The
Spearman correlation coefficient is 0.066. Tables of significance for this coefficient with so few
points and ties are not available, but this assures us that the lack of correlation does not arise from
a few outliers. Kendall's tau, which measures ordering with no effect of distance between ranks
and is therefore distribution-free, is 0.064 with a standard deviation of 0.123. According to
Kendall, 1970, its distribution approaches normality very quickly; hence the coefficient does not
appear significantly different from zero.
The level of correlation implies that the variance in the estimates is much greater than the
variance in the supposed characteristic exponents among the 30 stocks. Thus there is clear
internal evidence that the estimators are not very effective and do not give consistent results.
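Checks of this kind are straightforward to reproduce from the two “less subjective” columns of Table 2. A minimal sketch, using standard library routines rather than the computations actually used above:

```python
from scipy.stats import pearsonr, spearmanr, kendalltau

def concordance(range_est, seqvar_est):
    """Pearson, Spearman, and Kendall measures of agreement between the
    range-analysis and sequential-variance estimates of the exponent."""
    r, _ = pearsonr(range_est, seqvar_est)      # linear association
    rho, _ = spearmanr(range_est, seqvar_est)   # rank correlation
    tau, _ = kendalltau(range_est, seqvar_est)  # ordering agreement
    return r, rho, tau
```

If the two columns of Table 2 measured the same underlying exponent, all three coefficients would be expected to be strongly positive rather than near zero.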
Table 2. Fama's Estimates of Characteristic Exponent
Stock  Graphical Low  Graphical High  Range Analysis  Sequential Variance
Allied Chemical 1.99 2.00 1.94 1.40
Alcoa 1.95 1.99 1.80 2.05
American Can 1.85 1.90 2.10 1.71
A.T.&T. 1.50 1.80 1.77 1.07
American Tobacco 1.85 1.90 1.88 1.24
Anaconda 1.95 1.99 2.03 2.55
Bethlehem Steel 1.90 1.95 1.89 1.85
Chrysler 1.90 1.95 1.95 1.36
Du Pont 1.90 1.95 1.88 1.65
Eastman Kodak 1.90 1.95 1.92 1.76
General Electric 1.80 1.90 1.95 1.57
General Foods 1.85 1.90 1.87 1.86
General Motors 1.95 1.99 2.05 1.44
Goodyear 1.80 1.95 2.06 1.39
International Harvester 1.85 1.90 2.06 2.22
International Nickel 1.90 1.95 1.77 2.80
International Paper 1.90 1.95 1.87 1.95
Johns Manville 1.85 1.90 2.08 1.75
Owens Illinois 1.85 1.90 1.95 2.06
Procter & Gamble 1.80 1.90 1.84 1.70
Sears 1.85 1.90 1.75 1.66
Standard Oil (Calif.) 1.95 1.99 2.08 2.41
Standard Oil (N.J.) 1.90 1.95 2.02 2.09
Swift 1.85 1.90 1.99 1.87
Texaco 1.90 1.95 1.85 1.76
Union Carbide 1.80 1.90 1.75 1.56
United Aircraft 1.80 1.90 1.93 1.13
U.S. Steel 1.95 1.99 1.96 1.78
Westinghouse 1.95 1.99 2.10 1.35
Woolworth 1.80 1.99 1.93 1.02
Internal evidence of problems in the range analysis estimates
In discussing range analysis estimation, Fama points out that if there is sample autocorrelation in
the daily returns the method will give biased estimates.
Range analysis has one important drawback, however. If successive price changes in the
sample are not independent, this procedure will produce "biased" estimates of α. If there
is positive serial dependence in the first differences, we should expect that the
interfractile range of the distribution of sums will be more than n^(1/α) times the fractile
range of the distribution of the individual summands. On the other hand, if there is
negative serial dependence in the first differences, we should expect that the interfractile
range of the distribution of sums will be less than n^(1/α) times that of the individual
summands. Since the range of the sums comes into the denominator of (7), these biases
will work in the opposite direction in the estimation of the characteristic exponent α.
Positive dependence will produce downward biased estimates of α, while the estimates
will be upward biased in the case of negative dependence.29

We shall see in Section V, however, that there is, in fact, no evidence of important
dependence in successive price changes, at least for the sampling period covered by our data.
Thus it is probably safe to say that dependence will not have important effects on any
estimates of α produced by the range analysis technique. (pages 64 and 65)
The footnote states:

29 It must be emphasized that the "bias" depends on the serial dependence shown by the
sample and not the true dependence in the population. For example, if there is positive
dependence in the sample, the interfractile range of the sample sums will usually be
more than n^(1/α) times the interfractile range of the individual summands, even if there
is no serial dependence in the population. In this case the nature of the sample
dependence allows us to pinpoint the direction of the sampling error of the estimate of
α. On the other hand, when the sample dependence is indicative of true dependence in
the population, the error in the estimate of α is a genuine bias rather than just sampling
error. This distinction, however, is irrelevant for present purposes.
He thus dismisses the problem. In the second section, however, he finds that 11 of the 30 stocks
have first order correlation coefficients that are more than two times their standard deviations
away from zero. It might have been prudent, at that point, to enquire whether the “quite small”
correlation coefficients may be causing substantial biases in the estimation. That is not too
difficult to do. The correlation coefficient between the range analysis estimate of α and the first
order correlation coefficient for the sample is −0.665 if all 30 stocks are considered. The value of
Kendall's τ is −0.558, significant at the 2×10⁻⁶ level, and involves no assumptions regarding
distributions.
The internal data provide other ways to examine the effect of sample autocorrelation on the estimate
produced by range analysis, involving the correlation between these quantities. The Pearson
correlation coefficient between them is −0.778 and the Kendall tau is −0.622, significant at the
5×10⁻⁷ level. If we are willing to assume that the sample correlation coefficients are
approximately normally distributed about the value of zero, the value assumed by Fama, we can
use OLS regression to determine both the slope and the intercept. The analysis in Panel A of
Table 3 shows the results when all 30 stocks are included. The points are shown in Panel A of
Figure 6 along with the regression line. A.T.&T. appears to be an outlier. Since this company
was closer to a public utility than an industrial stock, the analysis was repeated excluding it. The
results are shown in Panel B of the table and the figure. The fit is much better. Both panels show
that the coefficient of the sample autocorrelation is significantly negative at all conventional
levels, even with a two-tailed test. This is consistent with Fama's argument.
We can now focus on the intercept, the estimate of the stable Pareto characteristic exponent that
would be obtained if the sample autocorrelation were zero. This is clearly different from zero,
but the relevant question is whether it is different from 2. The intercept would not be deemed
significantly different from 2 at the 5 percent level with a two-tailed test whether we include
A.T.&T. or exclude it. Using a one-tailed test, that is, starting with the hypothesis that the
coefficient should be less than two, it attains significance at the 3 percent level if we consider
the first regression appropriate, but would still not attain significance at the 5 percent level if
A.T.&T. is excluded. Thus it is possible that the range analysis information is governed mostly
by the biases induced by the non-zero sample autocorrelations. Hence the estimates are
not solid evidence for Fama's conclusion.14
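The regression and the intercept test against 2 can be sketched as follows. The data shown are synthetic placeholders; the actual inputs would be the 30 pairs of estimates and sample autocorrelations underlying Table 3:

```python
import numpy as np

def intercept_test(alpha_hat, rho1, null_intercept=2.0):
    """OLS of range-analysis alpha estimates on sample lag-1 autocorrelations,
    with a t-statistic for H0: intercept equals `null_intercept`.

    A minimal sketch of the analysis reported in Table 3; not Fama's own
    procedure, and the variable names are mine.
    """
    x = np.asarray(rho1, dtype=float)
    y = np.asarray(alpha_hat, dtype=float)
    X = np.column_stack([np.ones_like(x), x])          # intercept + slope design
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    n, k = X.shape
    resid = y - X @ beta
    s2 = resid @ resid / (n - k)                       # residual variance
    cov = s2 * np.linalg.inv(X.T @ X)                  # OLS covariance matrix
    t_intercept = (beta[0] - null_intercept) / np.sqrt(cov[0, 0])
    return beta, t_intercept
```

A t-statistic near zero for the intercept, as in Table 3, means the data cannot distinguish the autocorrelation-free exponent from 2.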
Table 3. Regressions of the Characteristic Exponents Estimated by Range Analysis on the
First-Lag Sample Autocorrelation Coefficient.

Panel A: Regression Including A.T.&T., Adjusted r-squared = 0.423
Variable                Coefficient  Standard Error  t Stat    p-value
Intercept               1.967        0.016459        119.4929  1.81E-39
Sample Autocorrelation  -1.249       0.264901        -4.71629  6.02E-05

Panel B: Regression Excluding A.T.&T., Adjusted r-squared = 0.590
Variable                Coefficient  Standard Error  t Stat    p-value
Intercept               1.980        0.013988        141.5755  2.64E-40
Sample Autocorrelation  -1.432       0.222818        -6.42513  6.96E-07
14 Note that Fama's statements relate to autocorrelation in the sample, not in the population. Any concerns about errors in the independent variable are irrelevant. If we were concerned with them we would have to admit that under those conditions the intercept understates the true intercept if the slope is negative.
Figure 6 (Panels A and B): Relation between Range Analysis Estimates of the Characteristic
Exponent and the Sample Correlation Coefficient at Lag 1
Internal evidence of problems in the sequential variance estimates
In the case of sequential analysis there is also a telltale sign. Fama stresses the fact that the
estimates should be independent of the numbers of observations chosen for the two periods. In
his Table 7 he provides, as an example, the results of sequential variance estimation for
American Tobacco and points out their variability and sensitivity to the end point of the longer
series.
The problems in estimating α by the sequential variance procedure are illustrated in
Table 7 which shows all the different estimates for American Tobacco. The estimates are
quite erratic. They range from 0.46 to 18.54. Reading across any line in the table makes it
clear that the estimates are highly sensitive to the ending point (n1) of the interval of
estimation. Reading down any column, one sees that they are also extremely sensitive to
the starting point (n0).

By way of contrast, Table 8 shows the different estimates of α for American Tobacco that
were produced by the range analysis procedure. Unlike the sequential-variance estimates,
the estimates in Table 8 are relatively stable. They range from 1.67 to 2.06. Moreover, the
results for American Tobacco are quite representative. For each stock the estimates
produced by the sequential variance procedure show much greater dispersion than do the
estimates produced by range analysis. It seems safe to conclude, therefore, that range
analysis is a much more precise estimation procedure than sequential-variance analysis.
Sensitivity to the longer series could be an indication that stationarity is violated. This
property is assumed throughout Fama's analysis, but is not adequately tested. Regressions of the
estimates on the numbers of increments in the first and the second period as independent
variables yield the results shown in Table 4. A regression based on all 56 values shown in the table
leads to the conclusion that the length of the initial interval is not significant (p = 0.292) but the
longer interval is significant (p = 0.0051). The estimate corresponding to lengths of 200 and 300
is based on small numbers and a very large overlap; it leads to the highest estimate of the
characteristic exponent, 18.54. The next highest value is 2.64, when the lengths are 200 and 400.
If the first point is excluded from the regression, the two coefficients are virtually equal and both
are highly significant (at the 5.5×10⁻⁴ and 3.4×10⁻⁶ levels, respectively), and together they have
an adjusted r-squared of 0.517. The fact that both coefficients are different from zero suggests
that the instability is not merely due to the fact that the market experienced unusually high
turbulence toward the end of the period in Fama's data, as shown in Figure 7, and suggests that
other problems may exist.
Table 4. Regressions of the Characteristic Exponents of American Tobacco Estimated from
Sequential Variances on the Number of Increments Included in the Estimates.

Panel 1: Regression on 56 points, Adjusted r-squared = 0.176
Variable                     Coefficient  Standard Error  t Stat    p-value
Intercept                    5.493921     1.082452        5.075442  5.09E-06
Increments in first period   -0.00168     0.001581        -1.06342  0.292412
Increments in second period  -0.00337     0.001154        -2.91969  0.005135

Panel 2: Regression on 55 points, Adjusted r-squared = 0.520
Variable                     Coefficient  Standard Error  t Stat    p-value
Intercept                    2.656713     0.181376        14.64758  4.29E-20
Increments in first period   -0.00092     0.000249        -3.6865   0.000543
Increments in second period  -0.00098     0.000189        -5.2021   3.38E-06
Figure 7. Variation in the Rate of Return on the Dow Index over the Relevant Time Interval
The Issue of Stationarity
The issue of stationarity is important because if the parameters of the distribution of daily
increments changes over time then tests that cover the whole period can give misleading results.
Fama concludes that this is not a problem. My conclusion based on the published data is that the
one stock for which he chose to presented addition details shows evidence of non-stationarity.
Fama attempts to justify the assumption that the return process is stationary by examining the
distribution of daily returns in two segments of the total period. His analysis is limited to five
stocks that “seemed to show changes in trend that persisted for rather long periods of time during
the period covered by this study. 'Trends' were 'identified' simply by examining a graph of the
stock's price during the sampling period. The procedure, though widely practiced, is of course
completely arbitrary.” (page 58) In line with this subjectivity, he presents no comparisons of
either the mean or standard deviation of the returns in the periods, perhaps because the choice of
segments was based on nothing more than changes in the trend of prices. Such tests might,
nonetheless, have cast some light on the issue. For example, if the criterion for segmentation was
a change in the average level of returns, a test of whether the variances of returns in the two
segments were the same might have been useful.
In view of Fama's position that the increments come from a stable Pareto distribution, this
approach seems unsuitable. The relevant stationarity would be that of all four basic parameters of
the family. Looking at the shape of the distribution in two periods selected primarily, if not
exclusively, because they differ in the location parameter (the fourth parameter) might be
misleading.
Of the five stocks examined he reports some details for only one, A.T.&T. I have already argued
that it was not really an industrial enterprise at the time, but it is the only stock for which data are
provided. The data provided for that stock are fragmentary and appear to have a number of
inconsistencies. These are discussed in Appendix 2.
Fama does show, in his Figure 4, graphs of the distribution of daily returns in the two segments
and in the aggregate period. He remarks on page 58:
“As was typical of all the stocks the graphs are extremely similar. The same type
of elongated S appears in all three. Thus it seems that the behavior of the
distribution in the tails is independent of the mean. This is not really a very
unusual result. A change in the mean, if it is to persist, must be rather small. In
particular the shift is small relative to the largest values of a random variable from
a long-tailed distribution.”
While I might agree that the graphs of the two segments of A.T.&T. are “similar” to the unaided
eye and that they are “S” shaped, I see the similarity as very limited. Even a casual examination
of the axes suggests that additional investigation is advisable. The range of variation in the first
period is from −0.04 to +0.07 units, whereas that for the second period is from −0.05 units to
+0.08; this might look “extremely similar” to the eye. The first period, with the smaller range of
variation, had approximately six times as many observations as the second one. It is
unlikely that even stable Pareto distributions are such that the expected value of the range is
larger for a sample of about 220 points than it is for a sample of 1,200 points.15
A rough estimate is that if the standard deviation is finite then in the second period it was
roughly 50% higher than in the first period. This would imply an F-ratio of over 2 with
about 200 and 1,000 degrees of freedom, enough to attain significance at all the usually quoted
levels. Of course the use of the F-test is valid only for normal distributions, so appealing to it is
useful only to the open-minded. By itself the difference in range and the difference in slope
could be the result of changes in the characteristic exponent, which, as Fama points out, is a
measure of how fat the extreme tails are, or of the third parameter, which is a measure of scale.
The fact that periods of different mean were found suggests a shift in the location parameter.
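If one nonetheless wanted to attach a number to this back-of-the-envelope comparison, the tail probability of the implied variance ratio is easy to obtain. This is purely illustrative, since the F-test assumes normality and the degrees of freedom are the approximate sample sizes from the text:

```python
from scipy.stats import f

# A second-period standard deviation roughly 50% higher than the first
# implies a variance ratio of about 1.5**2 = 2.25.  Degrees of freedom are
# the approximate sample sizes of the two segments (about 200 and 1,000).
variance_ratio = 1.5 ** 2
p_value = f.sf(variance_ratio, dfn=200, dfd=1000)  # upper-tail probability
```

With these degrees of freedom the tail probability is far below any conventional significance level, consistent with the rough argument above.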
We can also gauge non-stationarity by making copies of the graphs and comparing them by
superposition as shown in Figure 8. The figure suggests a difference in the slope of the body of
the curve and differences in the tails.
Figure 8: Comparison of the distribution of returns of A.T. & T. in the two periods
15 See Appendix 2 for a further discussion of issues related to the data on A.T.&T.
The cumulative distribution functions for the two periods appear to differ. The Kolmogorov-
Smirnov test, which is distribution-free, depends on the maximum absolute value of the
difference between two empirical distributions. A more formal comparison can, accordingly, be
made if we have information on the cumulative frequency distributions. That information is not
provided in the paper, but we can get close to it by scaling the graphs carefully. Fortunately,
modern technology makes it possible to do that. The graphs in the PDF version of the paper can
be enlarged and the distances between points can be measured electronically. One limitation is
that the thickness of lines increases with magnification, making the positioning of the cursor a
matter of judgment to some extent.
In view of the apparent differences between the two periods, I decided to use a more formal
test. If the maximum absolute difference is large enough to exceed the critical values of the KS
two-sample test, then the curves may be deemed to be far from “extremely similar.” In that event,
the hypothesis of a stationary distribution may be rejected.
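The critical values referred to here follow from the asymptotic two-sample KS formula. A sketch, using the approximate sample sizes given above (about 1,200 and 220 observations):

```python
import math

def ks_two_sample_critical(n1, n2, alpha):
    """Asymptotic critical value of the two-sample Kolmogorov-Smirnov
    statistic: the largest absolute difference between two empirical CDFs
    consistent with the null hypothesis at significance level alpha."""
    c_alpha = math.sqrt(-math.log(alpha / 2.0) / 2.0)
    return c_alpha * math.sqrt((n1 + n2) / (n1 * n2))

# With roughly 1,200 and 220 observations the 0.1 percent critical value
# is about 0.14, so a maximum difference near 18 percent would exceed it.
```

This is the standard large-sample approximation; exact tables would differ slightly, but not enough to matter at differences of the size discussed below.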
Figure 9 shows the un-replicated and unadjusted16 results for the absolute value of the differences
together with three critical levels for the KS test. Based on these un-replicated and unadjusted
results one might reject the hypothesis that the samples come from the same distribution at the
one per thousand level. The rebuttal, of course, would be that this is an unfair comparison
because the two periods were selected on the basis of differences in the average return.
That is a fair remark, though it must be realized that adjusting for the mean would reduce the
critical values of the differences when using KS-like tests. Adjustment for the mean is somewhat
problematic. Rounding of prices to the nearest tick makes the probability of increments of
exactly zero quite substantial, so that the curves have a discontinuity at zero return. If we are just
scaling at a fixed ordinate we need merely adopt a convention to scale at either end of the
discontinuity or at the middle of it, and adhere to that convention. However, if we want to adjust
two curves to the same mean return, problems arise. In the process of adjustment some ordinates
will shift from the left hand side of a discontinuity to the right hand side, or vice versa. Because
of these limitations and difficulties it is helpful to replicate measurements in order to assess the
extent to which these shifts may affect the results. As a by-product we obtain information about
the accuracy and reproducibility of the measurements.
16 In all cases, what I refer to as an un-replicated value is actually the average of two values obtained with scaling that started at two different positions. These were intended primarily to detect digit transpositions and failures to enter minus signs, so I do not consider them as replicates. The averaging, however, would reduce the error of the value reported. The results in this figure are not adjusted to reflect the difference in means.
Figure 9. Absolute difference in the cumulative distribution function of daily increments for A.T.
& T. with no adjustment for differences in the mean return in the two periods
The portion of the paper cited above indicates that the criterion for segmentation was related to
the location parameter, not to the scale parameter, and therefore suggests we should expect
differences in the means of the two periods. The paper gives 0.00107 as the mean for the first period and
−0.00061 for the second period.17 Hence I rescaled the results for the first period in two ways:
one to adjust the mean to zero and another to adjust the mean to that of the second period.
Analogously, I rescaled the results for the second period in two ways: one to adjust the mean to
zero and another to adjust the mean to that of the first period. These were done by computing the
new abscissas and then measuring the original graphs at these new abscissa values. From these
results we can compute three sets of differences: one referred to a common mean equal to that of
the first period, one referred to a common mean equal to that of the second period, and one
referred to a common mean of zero. The first two methods combine one of the initial scalings
with an independent rescaling. The third involves two rescalings.

The results are shown in Figure 10, which also shows the average of the three and the original
unadjusted difference. The points are plotted at the original abscissa. Perhaps the most
17 See Appendix 2 for a discussion of these data.
interesting feature is that the maximum difference has increased, rather than decreased, as the
result of the adjustments. The instability near the origin is also notable; this is mostly a result of
the fact that the gaps at zero (amounting to 8.0 percent in the first period and 5.6 percent in the
second), which were coincident in the original scaling, are no longer coincident when one or
both of the origins are shifted. The result is that large differences, of the order of 4 percent, will
arise in this area just from the adjustment across the gap.
That is certainly not a problem at ordinates in the neighborhood of −0.005. In this area the
maximum difference is about 18 percent, well above the critical point for rejection at the 0.1
percent level even if we do not take into account the fact that the mean was shifted. Moreover,
the range extends from −0.003 to −0.008, so replication errors are not large enough to lower the
result below that critical point.
Figure 10: Absolute difference in the cumulative distribution function of daily increments for
A.T. & T. after adjusting for differences in the mean return in the two periods
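The comparison behind Figures 9 and 10 can be sketched numerically. The sketch below is illustrative only: it uses synthetic normal samples with the period means quoted in the paper (0.00107 and −0.00061) in place of Fama's actual A.T.&T. data, and an empirical-CDF comparison in place of measurements read off the published graphs.

```python
import numpy as np

def max_cdf_diff(a, b):
    """Maximum absolute difference between the empirical CDFs of a and b
    (the two-sample Kolmogorov-Smirnov statistic)."""
    grid = np.sort(np.concatenate([a, b]))
    fa = np.searchsorted(np.sort(a), grid, side="right") / len(a)
    fb = np.searchsorted(np.sort(b), grid, side="right") / len(b)
    return np.abs(fa - fb).max()

rng = np.random.default_rng(0)
# Synthetic daily returns with the period means reported in the paper;
# the common standard deviation of 0.010 is an assumption for illustration.
first = rng.normal(0.00107, 0.010, 1017)
second = rng.normal(-0.00061, 0.010, 197)

d_raw = max_cdf_diff(first, second)            # original scaling
d_zero = max_cdf_diff(first - first.mean(),    # both periods rescaled to
                      second - second.mean())  # a common mean of zero
print(d_raw, d_zero)
```

Comparing d_raw with d_zero separates the part of the maximum difference that is due only to the difference in means from the part due to the shape of the two distributions.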
The curves can also be compared to assess symmetry by superposing each period on itself after
turning one of the copies upside down and matching the gap at zero return. The results, shown in
Figure 11, suggest little or no skewness in the first period but visible skewness in the second
period. This would imply that the skewness parameter of the stable Pareto distribution may have
changed from the first period to the second one.
Panel A: First Period Panel B: Second Period
Figure 11: Assessment of the skewness of the distribution of
A.T. & T. daily returns in the two periods.
The conclusion is that, at least for A.T.&T., the characterization of “extreme similarity” and the
conclusion that the assumption of stationarity is justified do not stand up to closer scrutiny. Some
caution is advisable in going further, because the data for A.T.&T. presented in the paper are
not completely consistent, as discussed in Appendix 2.
One other hint of non-stationarity can be found in the data of American Tobacco presented
earlier. As Fama recognized, there appear to be trends in the estimates of the characteristic
exponent:
“The problems in estimating α by the sequential variance procedure are illustrated
in Table 7 which shows all the different estimates for American Tobacco. The
estimates are quite erratic. They range from 0.46 to 18.54. Reading across any
line in the table makes it clear that the estimates are highly sensitive to the ending
point (nl) of the interval of estimation. Reading down any column, one sees that
they are also extremely sensitive to the starting point (n0). By way of contrast,
Table 8 shows the different estimates of α for American Tobacco that were
produced by the range analysis procedure. Unlike the sequential-variance
estimates, the estimates in Table 8 are relatively stable.”
The statement suggests that similar behavior was found in the estimation of exponents for other
stocks. This may be important because range analysis estimates use the whole series (or as much
of it as is consistent with forming sums of 4, 9, and 16 terms) whereas sequential variance
estimates as implemented by Fama use between 200 and 800 observations from the early part of
the series for the shorter period and points that include up to the last observation for the longer
period. Thus
any non-stationarity that enters in the latter part of the period would be masked in range analysis
and amplified in sequential variance analysis.
Given only the data provided in the published paper I must conclude that stationarity is
questionable. That implies that none of the estimates of the characteristic exponent provided in
the paper can be trusted as an unbiased assessment of what they purport to measure.
Fama’s choice of implementation methods
The analysis that Fama engaged in deals with time series, and he often mentions this point. The
theory underlying his methods, however, rests on the characteristic function of stable Pareto
distributions. Accordingly, his argument relates to independent samples from such
distributions. If a theory of sequences of stable Pareto variables had been available, he would
have used that theory to assess autocorrelation. Instead, as already discussed, he randomized the
order of the increments to investigate autocorrelation. When it is convenient, however, he
stresses the notion of sequences. In his Appendix, for example, he starts (pages 104 and 105)
with the statement:18
“This discussion provides us with a way to analyze the distribution of the sample
variance of the stable Paretian variable u. For values of α less than 2, the
population variance of the random variable u is infinite. The sample variance of n
independent realizations of u is

S² = (1/n) Σ yᵢ²,  i = 1, …, n    (A20)

This can be multiplied by n/n^(2/α) with the result

n^(1 − 2/α) S² = n^(−2/α) Σ yᵢ²    (A21)

Now we know that the distribution of n^(−2/α) Σ yᵢ² is stable Paretian and
independent of n. In particular, the median (or any other fractile) of this
distribution has the same value for all n. This is not true, however, for the
distribution of S². The median or any other fractile of the distribution of S²
will grow in proportion to n^((2/α) − 1).”

18 I quote his passage in full. The switch from ut to yi is in the original.
Then he goes on to illustrate this and relate it to the task at hand by an example:

“For example, if ut is an independent, stable Paretian variable generated in time
series, then the .f fractile of the distribution of the cumulative sample variance of
ut at time t1, as a function of the .f fractile of the distribution of the sample
variance at time t0, is given by

S²_f(t1) = S²_f(t0) (n1/n0)^((2 − α)/α)    (A22)

where n1 is the number of observations in the sample at time t1, n0 is the number
at t0, and S²_f(t1) and S²_f(t0) are the .f fractiles of the distributions of the
cumulative sample variances.”
It is clear from his context that the first part applies generally, but he then goes on to enshrine the
idea of sequences by calling this the “sequential variance approach”.
The terminology and notation used may give rise to confusion. The symbol S² is used to denote
a single estimate of the variance from a sample of size n. The symbol S²_f is used to denote a
quantile (f) of the distribution of S² and perforce depends on the sample size. In what follows I
need a more explicit notation. In an attempt to avoid confusion I will use S²_j(n) to denote the
estimate of the variance from the j-th sample of size n, and Q_q[S²(n)] to denote the estimate of
the q quantile obtained from a sample of k values of S²_j(n). With this notation, the fundamental
relation derived in the appendix may be written as:

S²_q(n1) = S²_q(n0) (n1/n0)^((2 − α)/α)

The corresponding relation for α would be:

α = 2 log(n1/n0) / [ log(n1/n0) + log( S²_q(n1)/S²_q(n0) ) ]
Fama uses the results from single samples, S²_1(n1) and S²_1(n0), as the estimators of S²_q(n1)
and S²_q(n0). That leads to the estimator used by Fama:

α = 2 log(n1/n0) / [ log(n1/n0) + log( S²_1(n1)/S²_1(n0) ) ]
Fortunately the value of q does not enter the relation, and S²_1(n) is an estimator of the median,
so the calculations can be performed with no further complication. It seems likely that if
independent samples are used in the two estimates the result will have high dispersion. This may
be one reason for the choice of overlapping periods used in the numerical work. It might be
better, however, to use an alternate estimator:
α = 2 log(n1/n0) / [ log(n1/n0) + log( Q_q[S²(n1)] / Q_q[S²(n0)] ) ]
This would require drawing samples of size n1 and determining quantiles of the estimated
variance, then repeating the procedure for samples of size n0. We could choose the samples to
be contiguous segments or we could select them at random, with or without replacement. With a series
of 1600 observations we could, for example, use 160 samples of size 10, 80 samples of size 20,
or 16 samples of size 100. The choice of parameters would depend on the relationship between
the size of the sample and the variance of the estimators. The variance of the median of 16
samples of 10 may well be lower than that of a single sample of 160.
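A minimal sketch of this alternate estimator follows. The details here are assumptions: the function name is mine, the heavy-tailed input is a Student t with 1.5 degrees of freedom standing in for a stable Pareto variable (its tail index is 1.5, so its variance is infinite), and the samples are drawn by random permutation rather than as contiguous segments.

```python
import numpy as np

def alpha_from_quantiles(x, n0, n1, q=0.5, seed=0):
    """Estimate the characteristic exponent from the scaling of a quantile
    of the sample variance: Q_q[S^2(n)] should grow like n^((2-alpha)/alpha)."""
    rng = np.random.default_rng(seed)

    def var_quantile(n):
        k = len(x) // n                      # as many disjoint samples as fit
        perm = rng.permutation(x)[: k * n].reshape(k, n)
        return np.quantile(perm.var(axis=1), q)

    r = np.log(var_quantile(n1) / var_quantile(n0)) / np.log(n1 / n0)
    return 2.0 / (1.0 + r)                   # invert (2 - alpha)/alpha = r

# Heavy-tailed stand-in with tail index 1.5 (infinite variance).
rng = np.random.default_rng(1)
x = rng.standard_t(1.5, size=64000)
print(alpha_from_quantiles(x, n0=100, n1=1600))
```

In this sketch the estimate comes out below 2, consistent with the infinite-variance input; with finite samples it is noisy, which is why the choice of sample sizes and number of samples matters.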
This alternative has other advantages. One is that we could use not just the median, but other
quantiles as well. Thus we could check to see if the estimates based on the first quartile, the
median, and the third quartile are consistent. A second one is that every part of the overall
sample would be included in the estimates, in contrast to Fama's choice, which includes only the
early part of the period in all the estimates with the smaller sample size. A third one is that, at the
cost of reducing the reliability, we can develop measures of stationarity.
These comments also apply to range analysis. That method also relies on sums, and the
requirement is that the elements of the sums be independent stable Pareto variables, not that they
be sequential. Assuming that the increments are stable Pareto, this can be used to assess whether
they are stable over the sample period, since under stationarity the results obtained from sums of
sequential samples of increments should yield the same distribution as those obtained from sums
of randomly selected increments with the same sample size.
7. Summary
In sum, the Fama paper claims a great deal but seems to establish very little on a sound basis.
The claim that estimates of the characteristic exponent determined by three different methods all
lead to values that are predominantly less than two proves nothing in the absence of information
about the potential biases and the variability of the estimates. The fact that the values obtained by
the three methods show no significant positive association raises serious questions about the
contention that the estimates are meaningful. The fact that the observed increments are not
normally distributed could almost certainly be established by using a Lilliefors test (Sachs,
1982).19 That test is sufficiently sensitive that the discrete spike at zero increment caused by
truncation of prices often leads to significance. I have shown that using the Fama methodology it
is possible to conclude, from data based on rounded prices, that the underlying increments
(which could be measured only with continuous recording of prices) are not likely to be normally
distributed. That conclusion, however, relies on the assumption that the parameters are stable.
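The mechanism can be illustrated with a small Monte Carlo version of a Lilliefors-type test. Everything below is an assumption for illustration: the price series is synthetic (a lognormal walk rounded to eighths of a dollar) and the critical values are simulated rather than taken from published tables, so this is a sketch of the argument, not a replication of Fama's data.

```python
import numpy as np
from math import erf, sqrt

def lilliefors_stat(x):
    """Kolmogorov-Smirnov distance between the empirical CDF of x and a
    normal CDF whose mean and standard deviation are estimated from x."""
    x = np.sort(x)
    n = len(x)
    z = (x - x.mean()) / x.std(ddof=1)
    cdf = np.array([0.5 * (1 + erf(v / sqrt(2))) for v in z])
    hi = np.arange(1, n + 1) / n
    lo = np.arange(0, n) / n
    return max((hi - cdf).max(), (cdf - lo).max())

def lilliefors_pvalue(x, n_sim=300, seed=0):
    """Monte Carlo p-value: the null distribution of the statistic is
    simulated from normal samples of the same size."""
    rng = np.random.default_rng(seed)
    d_obs = lilliefors_stat(x)
    d_null = [lilliefors_stat(rng.standard_normal(len(x))) for _ in range(n_sim)]
    return float(np.mean(np.array(d_null) >= d_obs))

# Log-price increments from prices rounded to eighths of a dollar:
# the rounding produces a discrete spike at exactly zero.
rng = np.random.default_rng(1)
price = 40.0 * np.exp(np.cumsum(rng.normal(0.0, 0.01, 1200)))
rounded = np.round(price * 8) / 8
incr = np.diff(np.log(rounded))
print("share of zero increments:", (incr == 0).mean())
print("Monte Carlo Lilliefors p-value:", lilliefors_pvalue(incr))
```

Even though the underlying (unrounded) increments here are exactly normal, the spike at zero created by rounding is enough to make the test reject normality of the recorded increments.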
The analysis of correlation coefficients leads Fama to the conclusion that 11 of the 30 stocks
have correlation coefficients that are more than twice their standard deviation. He dismisses this
on the grounds that the coefficients are small. This dismissal is used to argue that the
characteristic exponent estimates by range analysis are valid even though the author argued that
range analysis will give biased estimates if first order correlation is present in the sample. In
particular, no analysis was performed by Fama to investigate whether the sample correlation
coefficients are related to the values of the characteristic exponent obtained from range analysis.
I have shown that the relation between the two is very strong. Thus the correlation may be
significant at least in the sense that it affects the bias in the estimation procedure.
The “conclusion” that the distributions are stable Pareto with characteristic exponent less than 2
is then used to argue that the two-standard deviations test of the correlation coefficients is
inappropriate because the variance is infinite. We have a circular argument: if the correlations
are not significant the characteristic exponent “is less than 2” and if the characteristic exponent is
less than two the correlations are irrelevant. But if the characteristic exponent were two the
correlations would be measured correctly (at least in the absence of rounding of prices) and
would imply a downward bias in the estimate of the characteristic exponent.
The simulations presented in this paper show that, at least in the case of t-distributed variables of
infinite variance, the threshold of 2 standard deviations is wrong, but because it leads to too few
rejections, not because it leads to too many as hypothesized by Fama. Moreover, they indicate
that even series of 4,000 observations still suffer from this bias. This does not prove that the
same results would be obtained with stable Pareto distributions, but it raises doubts about the
19 It is appropriate to point out that Fama could not have used Lilliefors tests since they were not available in
1965.
validity of Fama's arguments and raises questions about the claim that the number of
observations is large enough.
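The simulation design sketched below is my own stand-in for the one described in the text: iid Student-t series with 2 degrees of freedom (so the marginal variance is infinite), with the lag-1 sample autocorrelation checked against the conventional ±2/√n control limits.

```python
import numpy as np

def rejection_rate(n_obs=1500, n_series=400, df=2.0, seed=0):
    """Fraction of iid Student-t series whose lag-1 sample autocorrelation
    falls outside the conventional +/- 2/sqrt(n) control limits.
    With df=2 the marginal distribution has infinite variance."""
    rng = np.random.default_rng(seed)
    limit = 2.0 / np.sqrt(n_obs)
    hits = 0
    for _ in range(n_series):
        x = rng.standard_t(df, size=n_obs)
        d = x - x.mean()
        r1 = np.dot(d[:-1], d[1:]) / np.dot(d, d)
        hits += abs(r1) > limit
    return hits / n_series

print(rejection_rate())
```

For heavy-tailed iid data the extreme observations inflate the denominator of the sample autocorrelation more than the numerator, so the rejection rate tends to fall below, not above, the nominal 5 percent level, in line with the direction reported in the text.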
The dismissal of autocorrelation on financial grounds is also of concern. While such grounds are
certainly relevant to some arguments, some standard is needed for reaching that decision and its
relevance should be established. Even for financial decisions, the standard should be based not
on whether it is possible to make money on the basis of one item of information at the time, but
whether it is possible to make money considering all the relevant information. No such test is
proposed or conducted. And no argument is made that the ability to make money is necessary for
a valid statistical decision.
Fama remarked that autocorrelation coefficients of randomized differences in the logarithms of
prices “appear to break through their control limits only slightly more often than would be the case if the
underlying distribution of the first differences had finite variance.” On the other hand, the fact
that the autocorrelation coefficients for the ordered differences are significant at the one percent
level for 9 of the 30 stocks suggests that the hypothesis of serial independence is not consistent
with the data. This correlation cannot be dismissed on the grounds that the underlying variables
have infinite variance because that would affect the randomized differences just as much as the
ordered differences. The use of the one percent level was purely conventional. In fact, all the
differences that were significant at the one percent level were also significant at the 0.12 percent
level. Thus there is substantial separation between these and the other 21 stocks. Moreover, the
simulations of the randomness of sample values of correlation coefficients when the underlying
variable has a t-distribution with infinite variance suggest that the analysis based on the normal
distribution undervalues, rather than overvalues the extent of autocorrelation at the 1 and 5
percent levels.
The simulations suggest that we may reject the hypothesis of a stationary normal distribution as
the underlying phenomenon. The issues of rounding of prices and of stationarity are, however,
difficult to address. For A.T. & T., the example given to support the statement that the
distributions of returns in two different periods are “extremely similar,” turns out to give
information that appears to reject stationarity. Thus the rejection must be viewed as tentative
rather than conclusive.
The simulations presented in this paper to illustrate the behavior of sample estimates of
correlation coefficients when the underlying variate has infinite variance suggest another
potentially useful approach to the study of non-normality in the increments of the logarithm of
price in securities markets. Those simulations showed that with prices rounded to $0.125, the
probability that the price will reach zero within a span of 1,500 days is substantial and increases
as the fatness of the tails increases. It seems clear that with the current practice of quoting prices
to the nearest cent this effect will be much smaller. Organized exchanges, however, have rules
that call for delisting of stocks when the price level declines to levels of about $1.00. Hence the
frequency of delisting of stocks could be used to provide insights into the behavior of the
extreme tails.
Finally, the methods as implemented by Fama do not appear to do justice to the underlying
framework. The conclusions could change with better implementation.
Appendix 1:
Estimation of the average and standard deviation of daily returns
from Fama's Table 4
In Table 4 of his paper Fama gives the highest and lowest daily return for each of the stocks,
which I will denote as r_max(i) and r_min(i), respectively, and the corresponding standardized
variables, which I will denote as z_max(i) and z_min(i), respectively.
The standardized variable is given by:

z(i) = ( r(i) − m(i) ) / s(i)

where m(i) and s(i) are the sample mean and standard deviation of the daily return for stock i.
Hence the sample standard deviation can be computed from the relation:

s(i) = ( r_max(i) − r_min(i) ) / ( z_max(i) − z_min(i) )

and the sample average can be obtained from:

m(i) = ( r_min(i) z_max(i) − r_max(i) z_min(i) ) / ( z_max(i) − z_min(i) )
Thus the data in Fama's Table 4 allow us to recover these useful estimates. They are shown
in Table A1.1.20
Given the original data, the largest possible error in the standard deviation is
about 1 unit in the fifth decimal (ranging from 0.000005 to 0.000016); that in the mean is
between 1 and 2 in the fourth decimal (ranging from 0.000156 to 0.000175). The number of
digits given in the table attempts to reflect this fact. Errors of that magnitude would occur if the
rounding in the basic numbers given in the original table all attained the maximum with signs
that lead the result in the same direction.
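The recovery can be checked on a single row; the sketch below is plain arithmetic on the A.T.&T. numbers from Fama's Table 4 as reproduced in Table A1.1.

```python
# Recovering the A.T.&T. mean and standard deviation from the extreme
# daily returns and their standardized values.
r_min, r_max = -0.1038, 0.0989
z_min, z_max = -10.342, 9.724

s = (r_max - r_min) / (z_max - z_min)
m = (r_min * z_max - r_max * z_min) / (z_max - z_min)
print(round(s, 5), round(m, 4))   # agrees with the Table A1.1 row: s = 0.01010, m = 0.0007
```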
20 The computed values of mean and variance for the 30 stocks in the Dow Jones Industrial Index have a
Pearson correlation coefficient of -0.490, a Spearman correlation of -0.483, and a Kendall tau of -0.333, all
significantly different from zero at the 0.1 percent level. This is, presumably, simply a result of the
overall period from which the data are derived and the various sample periods used for the individual
stocks. Alternatively, it could be attributed to the way in which stocks for inclusion in the DJI are
determined. In either event, it raises some concern about how representative the sample might be.
Table A1.1
Estimated mean and standard deviation of the daily returns to the Dow stocks
Stock  Smallest return  Largest return  Smallest standardized  Largest standardized  s(i)  m(i)
Allied Chemical -0.0718 0.0838 -5.012 5.820 0.01436 0.0002
Alcoa -0.0531 0.0619 -3.381 3.945 0.01570 -0.0001
American Can -0.0623 0.0675 -5.446 5.853 0.01149 0.0003
A.T.&T. -0.1038 0.0989 -10.342 9.724 0.01010 0.0007
American Tobacco -0.0800 0.0724 -6.678 5.949 0.01207 0.0006
Anaconda -0.0573 0.0600 -3.851 4.015 0.01492 0.0001
Bethlehem Steel -0.0725 0.0620 -5.571 4.748 0.01303 0.0001
Chrysler -0.0805 0.1009 -4.660 5.853 0.01725 -0.0001
Dupont -0.0599 0.0515 -5.843 4.950 0.01032 0.0004
Eastman Kodak -0.0443 0.0779 -3.399 5.832 0.01324 0.0007
General Electric -0.0647 0.0565 -5.135 4.456 0.01263 0.0002
General Foods -0.0468 0.0625 -3.937 5.065 0.01214 0.0010
General Motors -0.0975 0.0829 -7.761 6.547 0.01261 0.0004
Goodyear -0.0946 0.1744 -5.919 10.879 0.01601 0.0002
International Harvester -0.0870 0.0687 -6.299 4.880 0.01393 0.0007
International Nickel -0.0592 0.0567 -4.917 4.628 0.01214 0.0005
International Paper -0.0507 0.0533 -4.219 4.454 0.01199 -0.0001
Johns Manville -0.0687 0.1194 -4.386 7.575 0.01572 0.0003
Owens Illinois -0.0637 0.0606 -5.195 4.881 0.01234 0.0004
Procter & Gamble -0.0635 0.0656 -5.504 5.559 0.01167 0.0007
Sears -0.1073 0.0606 -9.338 5.148 0.01159 0.0010
Standard Oil CA -0.0633 0.0674 -4.793 5.056 0.01327 0.0003
Standard Oil NJ -0.1032 0.1007 -9.275 9.013 0.01115 0.0002
Swift & Co. -0.0675 0.0628 -4.761 4.418 0.01420 0.0001
Texaco -0.0593 0.0545 -4.650 4.193 0.01287 0.0005
Union Carbide -0.0456 0.0394 -4.396 3.783 0.01039 0.0001
United Aircraft -0.1523 0.0849 -8.878 4.939 0.01717 0.0001
US Steel -0.0539 0.0555 -3.968 4.091 0.01357 0.0000
Westinghouse -0.0804 0.0863 -5.415 5.808 0.01485 0.0000
Woolworth -0.0674 0.0896 -5.890 7.743 0.01152 0.0004
The computed values of mean and variance for the 30 stocks in the Dow Jones Industrial
Index have a correlation coefficient of -0.49, significantly different from zero at the 0.2
percent level. This could be a result of the overall period from which the data are derived
and the various sample periods used for the individual stocks, or of the way in which stocks
for inclusion in the DJI are determined. In either event, it raises some concern about how
representative the sample might be of stocks generally.
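The sign and rough size of this correlation can be recomputed from the s(i) and m(i) columns of Table A1.1. The coefficient computed from the rounded table values need not match the quoted -0.49 exactly, since that was presumably computed from unrounded values (and from the variance rather than the standard deviation).

```python
import numpy as np

# s(i) and m(i) as read from Table A1.1 (30 Dow stocks, same order).
s = np.array([0.01436, 0.01570, 0.01149, 0.01010, 0.01207, 0.01492, 0.01303,
              0.01725, 0.01032, 0.01324, 0.01263, 0.01214, 0.01261, 0.01601,
              0.01393, 0.01214, 0.01199, 0.01572, 0.01234, 0.01167, 0.01159,
              0.01327, 0.01115, 0.01420, 0.01287, 0.01039, 0.01717, 0.01357,
              0.01485, 0.01152])
m = np.array([0.0002, -0.0001, 0.0003, 0.0007, 0.0006, 0.0001, 0.0001,
              -0.0001, 0.0004, 0.0007, 0.0002, 0.0010, 0.0004, 0.0002,
              0.0007, 0.0005, -0.0001, 0.0003, 0.0004, 0.0007, 0.0010,
              0.0003, 0.0002, 0.0001, 0.0005, 0.0001, 0.0001, 0.0000,
              0.0000, 0.0004])

r = np.corrcoef(s, m)[0, 1]
print(round(r, 3))   # negative, in the neighborhood of the quoted -0.49
```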
A third, and more disturbing, possibility is that the correlation is a manifestation of
skewness. For distributions of finite variance with third central moment μ3, the expected
value of the covariance between sample values of the mean and standard deviation based on
n points is given approximately by:21

E[(m − μ)(s − σ)] ≈ μ3/(2σn)

This obviously goes to zero as the sample size increases, but so do the variances of the first
and second moments. As a result:

corr(m, s) ≈ [μ3/(2σn)] / sqrt[(σ²/n)(σ²/2n)] = μ3/(√2 σ³)

so the expected correlation does not vanish as the sample size grows.
Thus it would be possible to find a significant negative correlation if most of the stocks had
negative skewness. I believe this hypothesis can be disposed of by observing that negative
skewness would imply that the largest standardized deviations in the negative direction
should have, on average, larger absolute values than those in the positive direction; in the
data from the same table the difference in absolute values is 0.033 (with the positive
deviation being the larger), with a standard deviation of 1.75.
21 See, for example, Cramer, 1974.
Appendix 2:
The A.T.&T. data and related problems
The data on returns to A.T.&T. stock are the only basis for examining more closely Fama's
contention that the series are stationary. This is important because if the parameters of the
distribution of daily increments change over time then tests that cover the whole period can
give misleading results. My conclusion based on the data is that, at least for this stock, there is
substantial evidence of non-stationarity. It is important, however, to point out that I encountered
a number of problems with the data on A.T.&T. given in Fama's paper, and these may affect the
conclusion.
One minor issue is the number of observations. A second is that the average daily return over
the whole period given on page 58 is not consistent with the data given in Table 4, page 51.
Finally, there is also reason to doubt the accuracy of Figure 4 of Fama's paper. The purpose of
this appendix is to point these out, since they might affect the analysis.
In his Table 3, Fama gives the number of observations on A.T.&T. increments as 1,219. On page
58 he gives the earliest date for A.T.&T. as 11/25/1957. He also gives the last date for which
data are available as 9/26/1962. Data on the Dow Jones Industrial Index obtained from the
internet have 1,218 closing prices between the two dates given. The maximum number of
increments would therefore be 1,217. The discrepancy could easily be a typographical error.
Of the five stocks examined he reports some details for only one, A.T.&T. Even for that one the
details are so fragmentary that it is difficult to determine anything with certainty. On page 58
Fama gives the information that for the period between 11/25/1957 and 12/11/1961 the average
daily return to A.T.&T. was 0.00107, between 12/11/1961 and 9/24/1962 it was −0.00061, and
for the whole period it was 0.000652. This suggests that the segments might have been selected
for having different average rates of return but, since the standard deviations of the returns in the
two periods are not given it is not possible to establish that these two averages are significantly
different from each other.
Based on the number of DJI closing values, the first period would have included 1,017
increments and the second one 197. Using these period lengths, the weighted average rate of
return over the whole span would have been 0.000797, not very close to the reported number.
For the three averages to be consistent it would be necessary to have more than 300 trading days
in the second period. The period, however, is less than a year, so it cannot have more than 261
business days. Hence we have an inconsistency.
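The arithmetic behind this inconsistency is easy to verify; the only inputs are the three means reported on page 58 and the DJI-based period lengths derived above.

```python
m1, m2 = 0.00107, -0.00061     # period means reported on page 58
overall = 0.000652             # whole-period mean reported on page 58
n1, n2 = 1017, 197             # increments implied by the DJI closing dates

# Weighted average implied by the period means and lengths.
weighted = (n1 * m1 + n2 * m2) / (n1 + n2)
print(round(weighted, 6))      # about 0.000797, not the reported 0.000652

# Second-period length needed to make the three averages consistent.
n2_implied = n1 * (m1 - overall) / (overall - m2)
print(round(n2_implied))       # more than 300 days, impossible in under a year
```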
The computed weighted average from the data on the segments is within the estimated error of
the average inferred from the data in Fama's Table 4 and presented in Appendix 1; this suggests
that a typographical error may be involved in the value for the whole period.22
Without a consistent set
of numbers it becomes hopeless to try to estimate from the numbers what the standard deviation
might have been in the two segments. If the error is in the overall average it would have no effect
on the computation of the adjusted maximum differences between the cumulative distributions
functions of the first and second periods. If, however, the error is in the average return of one or
both of the shorter periods it is possible that the inconsistency contributes to making the two
periods appear more different than they are.
I cannot vouch for the accuracy of the figures. As an example, it can readily be seen that the third
panel of Figure 4 in Fama's paper shows two distinct points at returns of approximately −0.0226
and −0.0244 and cumulative probabilities, respectively, of about 2.52 and 2.59 percent. That
implies a sample size of over 1,400, but since the period involved was less than a year the
sample size simply could not have been more than 300. Similarly, a point at a return of
approximately −0.050 appears in both the A.T.&T. panel of Fama‟s Figure 2 and the top panel of
his Figure 4, but does not appear in either of the two bottom panels, in which the lowest return is
at −0.037 for the first period and −0.044 for the second. These two are the second and third
lowest returns in the top panel of Figure 4 and the corresponding panel of Figure 2. This problem
is not so apparent because the scales used for the abscissas are not the same, but as I worked on
the issue of stationarity it quickly came to notice. The effect of these two problems on
Figures 9 and 10 in the text cannot be of major import. The first problem can affect the scaled
differences by no more than 1/1,400. Even if the point at −0.050 was omitted in the calculation
of the frequencies, it can affect the scaled differences by at most 1/200. Thus the combined effect
can be no more than six tenths of one percent, compared to a gap of some 8 percent between the
observed maximum difference of 20 percent and the critical difference of no more than 12
percent.
22 The estimate obtained from Table 5 also requires more days in the second period, but error analysis indicates
that a value as high as 0.00085 would be consistent with the data. A value that high would require fewer days
in the second period than the actual number of trading days (118).
References
Ball, C., 1988, “Estimation Bias Induced by Discrete Security Prices,” Journal of Finance,
vol. 43, pp. 841-865.
Campbell, J.Y., A.W. Lo, and A.C. MacKinlay, 1997, The Econometrics of Financial Markets,
Princeton University Press, Princeton, NJ.
Cho, D. and E. Frees, 1988, “Estimating the Volatility of Discrete Stock Prices,” Journal of
Finance, vol. 43, pp. 451-466.
Cramer, H., 1974, Mathematical Methods of Statistics, Princeton University Press, Princeton,
NJ.
Fama, E.F., 1965, “The Behavior of Stock Market Prices,” Journal of Business, vol. 38,
pp. 34-105.
Gottlieb, G. and A. Kalay, 1985, “Implications of the Discreteness of Observed Stock Prices,”
Journal of Finance, vol. 40, pp. 135-154.
Harris, L., 1990, “Estimation of Stock Variances and Serial Covariances from Discrete
Observations,” Journal of Financial and Quantitative Analysis, vol. 25, pp. 291-306.
Kendall, M.G., 1970, Rank Correlation Methods, Hafner Press, New York.
Sachs, L., 1982, Applied Statistics: A Handbook of Techniques, Springer-Verlag, New York,
NY.
Snedecor, G.W. and W.G. Cochran, 1967, Statistical Methods, The Iowa State University
Press, Ames, IA.
Venezian, E., 2011, “Effects of Rounded Prices on the Estimation of the Parameters of the
Pricing Process,” paper presented at the 19th Annual Conference on Pacific Basin Finance,
Economics, Accounting, and Management, Taipei, Taiwan.