© Emilio Venezian 2009 Page 1
What does Fama (1965) establish?
by
Emilio Venezian
© Emilio Venezian 2009, 2011
This is a work in progress and is sent to you for your information and comment. Please do not
cite, quote, or otherwise distribute the information without prior written consent from me.
Invitations to present and discuss the material are, of course, welcome, as are criticism and
discussion of what I express.
What does Fama (1965) establish?
by
Emilio Venezian
Abstract
This paper provides a review of Eugene Fama's well-known paper of 1965 on the structure of
stock market returns. It departs from tradition by challenging the major conclusions reached
using the data that are contained in the original paper. I find that most of his conclusions are not
well grounded. A major source of problems is that Fama's methodology relies on the assumption
that recorded daily prices are continuous and fails to take into account the biases in the statistics
that arise from rounded data. A second source is the failure to check carefully for indications
that the price-generating model may not be stationary over the period studied. Moreover,
Fama's conclusion that the characteristic exponent of the distribution of increments is less than
two rests on the assumption that the sample increments have zero first-order autocorrelation,
while the justification for concluding that the autocorrelation is zero rests on the assumption that
the distribution of price increments is not Gaussian; that leads to a circular argument rather than
to a conclusion. The data show that first-order autocorrelations were not zero even if the biases
of rounding are insignificant. The problems are aggravated by the fact that the three methods
Fama uses to estimate the characteristic parameter lead to values that are basically uncorrelated.
1. Introduction
The paper “The Behavior of Stock-Market Prices” by Eugene Fama is sometimes credited as one
of the foundations of the “efficient markets hypothesis” that has dominated financial thinking for
half a century. It has been cited several hundred times. The website of the Journal of Finance
lists some 311 published papers as citing the paper, and the annual citation rate has, on average,
been increasing. One hundred and sixty-eight of the citations are in publications dated after
1999. Some books cite the paper with approbation and quote the conclusions without
discussion.1 That seems to be unjustified, for reasons I will discuss in this paper.
The paper consists of two parts. The first part deals with the issue of whether the daily
increments in the logarithm of prices (that I will refer to as “increments”) may be viewed as
samples from a normal distribution or are better characterized as samples from a stable Pareto
distribution of characteristic exponent less than 2. The second part deals with the issue of
whether the increments are serially independent. The author reports a great deal of analysis on
these issues and concludes that a characteristic exponent less than two and serial independence
appear to be justified on the basis of the analysis.
In the first part of the paper, Fama suggests some methods of estimating the characteristic
exponent of series of random numbers, based on the properties of stable Pareto distributions, and
derives from these two basic estimation equations which are non-linear transformations of the
equations expressing the basic properties. He implements these estimation procedures ignoring
the fact that non-linear transformations entail biases in the results which, if the empirical data
had matched the theoretical models, would have resulted in estimates that are biased downward.
The paper gives no information on the variability of the estimators, so the finding that the
estimates of the characteristic exponent are less than 2 means little: we do not know whether the
difference is large or small with respect to the expected bias and we do not know how it
compares to the standard deviation of the estimators.
Moreover, the theoretical models deal with increments inferred from prices that are recorded as
continuous variables, whereas the data used to obtain the estimates are based on prices that were
recorded to the nearest $0.125, the tick size in effect during the period in which the data were
obtained. Any kind of rounding creates problems with estimators. In the literature we find
hundreds of papers on this problem, sometimes categorized as the “grouped data” or “errors in
variables” problem. Given this complication, it is not possible to be dogmatic about the net
direction of all these effects, so the first part of the paper cannot be dismissed so easily.2
1 See, for example, Campbell, Lo, and MacKinlay, 1997.
2 In general, rounding would make a normal distribution appear leptokurtic because it makes zero increments
much more probable; this would provide an intuitive basis for concluding that it would make
the downward bias more pronounced. The question, however, is how far will the “fatness” of the tails prevail
and what effect will that have. As far as I know, the effects of rounding when the underlying distribution is
stable Pareto with exponent less than 2 have not been discussed adequately.
In the second part of the paper, Fama estimates the serial autocorrelation coefficients of the
series and treats them as though the underlying increments came from continuously recorded
prices; unfortunately, rounded prices introduce biases. He also introduces a theoretical problem: he uses
sampling theory for correlation coefficient estimators based on sampling from normally
distributed variates. That seems inconsistent with his general conclusion that the increments are
from stable Pareto distributions of characteristic exponent less than 2. Fama also does extensive
testing of the independence of increments based on runs tests; the rounding of prices makes the
runs tests theoretically inapplicable.
The main purpose of this paper is to discuss these and other related issues in some detail. The
material will be presented in sections, as follows. Section 2 discusses the relation between the
underlying results from the theory of stable Pareto distributions, as presented by Fama, and the
estimators that he used, presenting the results of Fama and some of the other issues that need to
be considered in assessing their significance. Section 3 discusses how the rounding of prices
affects the estimates of the increments and creates problems in the evaluation of the results.
Section 4 suggests an alternate way of approaching the issue of the distribution of increments
and uses simulations to provide some insights into the results. Section
5 turns the attention to the issues related to the assessment of independence. Section 6 deals with
issues that interlink the two major parts of Fama's paper. Section 7 provides some tentative
conclusions.
2. On Fama's Estimators of the Characteristic Exponent
In all, Fama describes three estimators of the characteristic exponent, α. One relies on graphical
analysis and Fama himself describes it as subjective; in fact, he limits himself to publishing
ranges for the estimates of the values of α obtained from this method. The other two methods are
somewhat less subjective. He labels them the range analysis and sequential variance estimators.
The idea for range analysis is derived from the observation that for all stable Pareto variables the
interquantile range of sums of variables is related to the number of terms in the sum by:

R(n) = n^(1/α) R(1)   (1)

where R(n) is the interquantile range of the set of sums of n non-overlapping terms of the series.
This relation holds for any interquantile range.
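For the Gaussian case (α = 2), Equation 1 implies R(n)/R(1) = √n, which is easy to check by simulation. The following sketch is my own illustration with assumed parameters (it is not Fama's procedure or data):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 0.015, size=1_000_000)   # assumed Gaussian increments

def iq_range(v, f=0.75):
    # interquantile range between quantiles 1 - f and f
    lo, hi = np.quantile(v, [1.0 - f, f])
    return hi - lo

r1 = iq_range(x)
for n in (4, 9, 16):
    sums = x[: (len(x) // n) * n].reshape(-1, n).sum(axis=1)
    # for alpha = 2 the ratio R(n)/R(1) should be close to n ** 0.5
    print(n, iq_range(sums) / r1, n ** 0.5)
```

With a sample this large the simulated ratios track √n closely; with samples of the size Fama had, the ratios fluctuate appreciably, which is the point taken up below.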
In order to solve this for the variable of interest, α, he solves Equation 1 to express the estimator
as:
α̂ = log(n) / log( R(n)/R(1) )   (2)
This estimator is implemented by Fama by finding the values for quantile levels f of 0.75, 0.83, 0.90,
0.95, and 0.98 and for values of n of 4, 9, and 16. No rationale is given for the particular choices.
The sequential variance estimate is based on the observation that the quantiles of the distribution
of the variance of samples of n₁ and n₂ terms from a stable Pareto distribution of characteristic
exponent α are related by the expression

Median[ s²(n₂)/s²(n₁) ] = (n₂/n₁)^((2/α) − 1)   (3)

where

s²(n) = (1/n) Σᵢ₌₁ⁿ ( xᵢ − (1/n) Σⱼ₌₁ⁿ xⱼ )².
From Equation 3 he obtains an estimator that can be expressed as:

α̂ = 2 log(n₂/n₁) / ( log( s²(n₂)/s²(n₁) ) + log(n₂/n₁) )   (4)
The estimator is implemented by Fama by obtaining the variance of the first 200, 300, … 800
increments in his series as values of s²(n₁), and that of the first 300, 400, and then increasing by
100 until the maximum length of each series is reached, as values of s²(n₂). This choice of
implementation uses a single estimate of the ratio s²(n₂)/s²(n₁) as an estimator of the median.
Moreover, it uses overlapping periods, so that the numerator and denominator are certainly not
independent, and ensures that increments that appear late in the sequences are never represented
in s²(n₁).
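A minimal sketch of the sequential variance estimator of Equation 4, run on simulated Gaussian increments (the parameters and variable names are my own assumptions; the overlapping first-n windows follow Fama's implementation). For Gaussian data the estimate should land in the neighborhood of 2:

```python
import numpy as np

def seq_var_alpha(u, n1, n2):
    # Equation 4: alpha estimated from the variances of the first n1 and
    # first n2 increments (overlapping windows, as in Fama's implementation)
    s1 = np.var(u[:n1])        # maximum-likelihood variance, divisor n1
    s2 = np.var(u[:n2])
    return 2.0 * np.log(n2 / n1) / (np.log(s2 / s1) + np.log(n2 / n1))

rng = np.random.default_rng(1)
u = rng.normal(0.0, 0.015, size=800)    # assumed Gaussian increments
print(seq_var_alpha(u, 200, 800))       # near 2 for Gaussian data
```

Note that a single draw gives a single noisy value; the sampling variability of this estimator is exactly the information the original paper does not report.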
Both estimators are of the form:

α̂ = f(y)   (5)

where y is the quantity measured empirically. It is not at all clear that the expected value of the
estimator is equal to the value of α. In fact f is a convex function of y, so by the inequality of
Jensen, 1906, we will have:

E[α̂] = E[f(y)] ≤ f(E[y]) = α   (6)
Thus the range analysis estimator is clearly biased downward from the true value. This implies
that if the increments were indeed normally distributed the average of estimators should be less
than 2.
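The size of the Jensen effect for the range-analysis transform can be gauged by simulation. The sketch below is my own illustration under assumed parameters (Gaussian increments with σ = 0.015, series of 1,152 observations, n = 4, f = 0.75); it compares the mean of the transformed estimates, E[f(y)], with the transform applied to the mean ratio, f(E[y]). The gap between the two is the bias induced by the non-linear form; its net direction in the empirical setting depends also on rounding and truncation, which this sketch does not include.

```python
import numpy as np

rng = np.random.default_rng(2)

def range_ratio(n_sum=4, n_obs=1152, f=0.75):
    # one simulated value of y = R(n)/R(1) for Gaussian increments
    u = rng.normal(0.0, 0.015, size=n_obs)
    q = lambda v: np.quantile(v, f) - np.quantile(v, 1.0 - f)
    return q(u.reshape(-1, n_sum).sum(axis=1)) / q(u)

y = np.array([range_ratio() for _ in range(2000)])
alpha_hat = np.log(4) / np.log(y)          # the nonlinear estimator f(y)
# compare the mean of the transformed values with the transform of the
# mean ratio: the difference between the two is the Jensen effect at issue
print(alpha_hat.mean(), np.log(4) / np.log(y.mean()))
```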
In the case of the sequential variance estimator the case cannot be made as strongly, because the
relationship involves the median and not the expected value of the random variable . It is,
accordingly, possible that:
E[α̂] = E[ f( s²(n₂)/s²(n₁) ) ] ≤ f( Median[ s²(n₂)/s²(n₁) ] ) = α   (7)

though the fact that it might be so needs to be proven rather than assumed.3
Fama finds that his estimates4 of the characteristic exponent are frequently less than 2 (in 21 out
of 30 stocks in his database for the range analysis and 23 out of 30 for the sequential variance
estimation). From the comments made above it follows that downward departures could be due
merely to the non-linear form of the estimators. Hence the finding does not establish that the
characteristic exponent is not exactly 2.
3. The Effects of the Rounding of Prices on Fama's Arguments
The basic data on which Fama relies are daily closing prices of the 30 Dow Jones Industrial
stocks on the New York Stock Exchange. He tells us that the data are adjusted for stock splits.
Those data would lead to unbiased and correct computation of the daily increments in stock prices
if the NYSE prices had been recorded as continuous variables, as Fama assumes in his
equations. In fact, however, at the time the data were generated the recording was based on ticks
of 12.5 cents. A theoretical closing price of $33.100 would have been recorded as either $33 or
$33 1/8. Two potentially serious complications arise from this rounding.
The first is that the usual adjustment for splits and stock dividends is not perfect. With
continuous prices a simple multiplier will achieve comparability of the increments before and
after the change in the number of shares. With rounded prices a change in the number of shares
would require a corresponding change in the tick size for the distribution of increments to remain
unaffected. In particular, it is possible that large splits may lead to non-stationarity of the
increments estimated from the reported prices. That problem cannot be fixed so easily.
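A toy numerical illustration (hypothetical prices, not drawn from Fama's data) of why the usual multiplier adjustment is imperfect: after a 3-for-1 split the $0.125 tick on the post-split price corresponds to a $0.375 grid on the split-adjusted series, so the granularity of the adjusted prices changes at the split date.

```python
# Hypothetical numbers for illustration; not drawn from Fama's data.
tick = 0.125

def round_to_tick(p):
    # record a "continuous" price at the nearest exchange tick
    return round(p / tick) * tick

true_price = 60.20                    # assumed pre-split continuous price
pre = round_to_tick(true_price)       # recorded pre-split price: 60.25
post = round_to_tick(true_price / 3)  # after a 3-for-1 split, tick unchanged
adjusted = post * 3                   # the usual multiplier adjustment: 60.375
print(pre, adjusted)
```

The same underlying price is recorded as 60.25 before the split but as 60.375 after adjustment, because the post-split series lives on a coarser adjusted grid.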
3 It has to be kept in mind that we are not dealing with the unbiased estimator of the variance, so that the
expected value of the ratio is not exactly one, though it will be very close to one if the numbers of points in the
samples is of the order of hundreds rather than tens. 4 It should be made clear that Fama summarizes his results. For the range analysis estimator he gives the average
of the 15 values obtained for the parameter sets he selected. For the sequential variance estimator he gives the
median of the values he calculated, these numbered from 49 to 84 depending on the length of the series. No
rationale is given for these choices of summary statistics.
The second is that even in the absence of effects such as splits and stock dividends the increments
are measured with error, and that error distorts the underlying distribution. If the rounding is to
the nearest integer tick, the recorded price may differ from the price in Fama's model by as much
as 6.25 cents. Moreover, as the price of a security changes as a result of the randomness of the
increments, the distribution of the calculated increments must change even if the underlying
increments are stationary as specified in the model.
Analyses based on the assumption that the underlying distribution of price changes is normal with
independent differences lead to the conclusion that the rounded prices will yield unbiased
estimates of the mean of the distribution of increments, an overestimate of the variance of that
distribution, an elevated excess kurtosis, and a negative autocovariance of the reported price
changes at a lag of one period (Gottlieb and Kalay, 1985; Marsh and Rosenfield, 1986; Ball,
1988; Cho and Frees, 1988; Harris, 1990; Venezian, 2011).
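These effects can be reproduced in a small simulation. The sketch below is my own set-up with assumed parameters (a $10 starting price, σ = 0.015, series of 1,728 days, averaged over 200 replications); it compares the variance and lag-1 autocorrelation of increments computed from tick-rounded prices against those computed from the underlying continuous prices, and shows the variance inflation and negative lag-1 correlation cited above.

```python
import numpy as np

rng = np.random.default_rng(3)
TICK, P0, SIGMA, N, REPS = 0.125, 10.0, 0.015, 1728, 200

def one_series():
    u = rng.normal(0.0, SIGMA, size=N)            # true log-increments
    p = P0 * np.exp(np.cumsum(u))                 # continuous prices
    q = np.maximum(np.round(p / TICK), 1) * TICK  # prices rounded to the tick
    v = np.diff(np.log(q))                        # reported increments
    var_ratio = np.var(v) / np.var(np.diff(np.log(p)))
    rho1 = np.corrcoef(v[:-1], v[1:])[0, 1]       # lag-1 autocorrelation
    return var_ratio, rho1

ratios, rhos = np.array([one_series() for _ in range(REPS)]).T
print(ratios.mean(), rhos.mean())  # variance inflated, lag-1 correlation negative
```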
This has some potentially important consequences. One is that the graphs and tables that display
the frequency of departures from the mean as multiples of the standard deviation are understating
the case, because the estimates of the standard deviations are likely to be too high. Assuming that
the stocks traded mostly at prices above $40 and that the standard deviations of daily returns
were of the order of 0.015, this effect should be negligible.5 Another, and potentially more
serious one, is that if the true increments were indeed stable Pareto, then the reported increments
would not be stable Pareto, so tests based on the properties of that family of distributions may
not perform well with the empirical data. Tests based on kurtosis will also give incorrect results
(Venezian, 2011). Yet another implication is that the price does matter; thus stocks with the same
underlying increments and the same tick size will experience different distributions of measured
increments if the paths of price levels are different. Finally, though the conditions for ergodicity
are satisfied, at least for increments whose distributions have finite variance, the time required to
assure that the time average approaches the ensemble average may be very long indeed. This
suggests that a new approach is needed to investigate the matter.
5 Appendix 1 provides evidence that the standard deviations of daily returns are of this order of magnitude.
Correcting the results for rounding would make the standard deviation of the underlying process somewhat
lower. The price of $40 takes into account my best guess of the magnitude of the distortions.
4. A Possible Alternative Approach
One possible line of attack is to retrench to the mindset of classical statistical tests. In principle
we can postulate as a null hypothesis that the increments are indeed normally and independently
distributed and find the distribution of any statistic we wish based on that null hypothesis. Then
we can ask if the empirical measures are sufficiently different from the derived distribution as to
warrant rejection of the null hypothesis.
That is a great deal easier to say than to do. Analytical treatment appears to be prohibitively
difficult in the case of rounding, especially for estimators such as those used by Fama, and
numerical treatment has to cope with many necessary parameters, such as the initial price, the
mean and standard deviation (or more generally, location and scale parameters) of the
distribution and the numbers of points available and selected.6 But these are not impossible tasks.
In practice, it is of interest to determine, for given parameters generally in the range of those
used by Fama, what the distribution of the individual measures used by Fama might be if the
underlying increments are independent and normally distributed, and whether that distribution
would be consistent in some sense with the data. This might provide some indication of whether
the hypothesis of normality can be rejected or not on the basis of results such as those presented
by Fama. It is also of interest whether the summary statistics of Fama (the averages of the range
analysis estimators and the medians of the sequential variance estimators) might be consistent
with the null hypothesis.
To address these issues, simulated increments were drawn from a normal distribution, and daily
prices were calculated from these increments and rounded to the nearest $0.125. From these
rounded prices the reported daily increments were calculated, and then the methods that Fama
used were implemented. I used two sample sizes, with 1,152 and 1,728 observations. These cover
most of the range of sizes which Fama encountered (1,118 to 1,693) and are evenly divisible by
the summing intervals he used (4, 9, and 16 days). I used initial stock prices of $10, almost
certainly below the range encountered by Fama, and $20 and $50, which should be in the right
range. At initial prices above $50, the effect of rounding is small. In all the simulations I used
stationary parameters for the underlying distribution of increments, with an expected value of
zero and a standard deviation of 0.015, which are at the center of the range of those encountered
in Fama's material, as shown in Appendix 1.
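The set-up just described can be sketched as follows (my own condensed implementation; only the range-analysis branch is shown, with the average over the 15 (f, n) combinations as the summary statistic, as in Fama's reporting):

```python
import numpy as np

rng = np.random.default_rng(4)
TICK = 0.125

def simulated_alpha(p0=50.0, n_obs=1728, sigma=0.015):
    # one replication of the set-up above: Gaussian increments, prices
    # rounded to the tick, range analysis averaged over the 15 (f, n) pairs
    u = rng.normal(0.0, sigma, size=n_obs)
    p = np.maximum(np.round(p0 * np.exp(np.cumsum(u)) / TICK), 1) * TICK
    v = np.diff(np.log(p))                   # reported increments
    estimates = []
    for f in (0.75, 0.83, 0.90, 0.95, 0.98):
        for n in (4, 9, 16):
            m = (len(v) // n) * n
            sums = v[:m].reshape(-1, n).sum(axis=1)
            q = lambda w: np.quantile(w, f) - np.quantile(w, 1.0 - f)
            estimates.append(np.log(n) / np.log(q(sums) / q(v)))
    return float(np.mean(estimates))

print(simulated_alpha())
```

Repeating this 500 times, as described below, traces out the sampling distribution of the summary estimate under the null hypothesis.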
Figure 1 shows the results of range analysis when the initial stock price is $50. In this and
subsequent figures it should be kept in mind that a maximum absolute difference in the
cumulative distribution functions of a sample of 30 stocks and the simulated curve of 500 points
of 0.36 (36 percent) is the critical level of the Kolmogorov-Smirnov test to reject, at the 0.1
percent level, the hypothesis that two distributions are the same.7
6 These are a minimum set. Other relevant variables include, among others, descriptions of whether the ranges are
independent, completely overlapping, or partially overlapping.
The difference between the
mean and the median is of small import. It is clear, on the other hand, that the correlation
coefficient has a major systematic effect on the estimates, as Fama discussed, with positive
correlation giving a median estimate of 1.89 at a correlation of +0.1, 2.06 at a correlation of zero,
and 2.27 at a correlation of −0.10. This issue will be discussed more fully in Section 6.
The results of Fama are not consistent with the hypothesis that the increments come from an
underlying stable normal distribution unless the correlation coefficient is close to -0.1. Only two
of the thirty stocks had correlation coefficients in that range, whereas 7 had coefficients close to
+0.1. Truncation and Jensen inequality effects, which are taken into account by the simulation,
do not have an important effect on that conclusion but their impact is plain to see if the median
values of the distribution of α are read from the graph. Their effect is to strengthen the
conclusion; shifting the curve to a median of 2.0 would reduce the maximum difference from the
empirical distribution to less than 0.35.
Figure 1: Results from range analysis estimation
The corresponding results for initial prices of $20 and $10 are not sufficiently different to
warrant separate discussion. The length of the series has effects so small that the thickness of a
line conceals them.
The results for sequential variance estimates show minimal effects from correlation, so that
variable can be dismissed from our discussion. Results for a $50 stock are shown in Figure 2.
7 In contrast, a difference of 0.126 (12.6 percent) indicates that the difference between two simulated curves is
significantly different at that level.
The results of these simulations strengthen the conclusions of Fama in the sense that they concur
that the distribution underlying the empirical data is not likely to be a normal distribution.
Figure 2: Results from sequential variance estimation
In this case, the differences between long and short series and between mean and median are
somewhat more pronounced. Nonetheless, the distribution of the empirical data is not consistent
with the hypothesis of a stationary normal distribution for the underlying increments.
There is a fine but important difference between the conclusions drawn here and those of Fama.
His conclusion is that the distribution is a stable Pareto distribution. My conclusion goes no
further than saying that the data from this method are not consistent with the null hypothesis of
an underlying stationary normal distribution, and it would go no further even if I were convinced
that the expected value of the population autocorrelation coefficients is exactly zero and the
process was stationary.
5. Assessment of the Independence of Sequential Increments
Fama relies heavily on runs tests in the assessment of serial independence. The runs tests of
Mood are based on the null hypothesis that there are a number of distinct outcomes, each of
which has a probability of occurring that is constant over the period of observation. Ideally the
distinct outcomes can be specified objectively a priori, as in the case of heads, tails, or edge on a
series of flips of a coin or the numbers 1 through 12 on the toss of a dodecahedral die.8
If the increments are postulated to come from a specific distribution, and the prices are recorded
as continuous variables, these tests are applicable. The probability that the increment will be
between a and b will be the same over time, provided the distribution of increments is stationary.
Hence for sufficiently long series of observations the runs tests should reject the null hypothesis
if the increments are not independent or if the distribution is not stationary.
If the prices are recorded on a rounded basis and the increments have to be inferred from the
prices, then the tests should fail for sufficiently long series if the distribution has non-zero
variance or expected value. This result can be appreciated readily because the probability of a
calculated increment of exactly zero, when prices are recorded to the nearest 12.5 cents, is going
to be different when price is $10 than when it is $15. In fact, under those conditions we cannot
even be sure that stocks whose increments have non-stationary means and variances will fail the
test more often than stocks whose increments have stationary distributions. For example, if the
standard deviation declines as price increases, so that their product remains nearly constant, then
the probability of exactly zero reported change in price will remain approximately stationary.
Thus all the runs tests conducted by Fama amount to telling us that either the series were too short
to detect changes or that the mean and variance of the underlying increments changed over time
(in response to price changes or as a result of economic events) in a pattern that led to non-
rejection.
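The dependence of the zero-increment probability on the price level is easy to quantify with a normal approximation. The sketch below is a simplification (it assumes σ = 0.015 and ignores where the current price sits within the tick grid), but it shows the order of the effect for $10 versus $15 stocks:

```python
from math import erf, sqrt

def p_zero_tick(price, sigma=0.015, tick=0.125):
    # P(|dP| < tick/2) for a one-day dollar change dP ~ N(0, (sigma*price)^2):
    # the chance that the day's move rounds to a reported change of zero
    z = (tick / 2.0) / (sigma * price)
    return erf(z / sqrt(2.0))

print(p_zero_tick(10.0), p_zero_tick(15.0))
```

Under these assumptions roughly a third of the daily changes of a $10 stock round to zero, against about a fifth for a $15 stock, so the outcome probabilities on which the runs test relies are not constant as the price moves.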
The other approach that Fama takes is through the estimation of autocorrelation coefficients at
various lags. Even if we are willing to forget the fact that prices are rounded, and other problems,
the results are somewhat dismaying for someone who would like to believe that the increments
are independent. At lag 1, nine of the 30 stocks have correlation coefficients that are significant
at the 1 percent level by the Fisher z-transform test. These are shown in Table 1. The results are
peculiar in that the correlation coefficients seem to cluster at plus and minus 0.1. Note that the
8 If the categories are specified a posteriori we have to consider the danger of using the standard test because the
“outcomes” can be selected, subconsciously or intentionally, to achieve preconceived results.
numbers given here are somewhat different from those given by Fama.9 His criterion was to
discuss correlation coefficients greater than twice their standard deviation; the z transform gives
a better approximation to normality than the correlation coefficient and a one percent criterion is
easy to implement.
Table 1: Stocks with significant correlation coefficients at lag 1

Stock                   ρ        p-level
Alcoa                   0.118    2.21 · 10⁻⁵
American Can           −0.087    117.71 · 10⁻⁵
American Tobacco        0.111    3.34 · 10⁻⁵
Goodyear               −0.123    1.28 · 10⁻⁵
International Nickel    0.096    34.83 · 10⁻⁵
Procter & Gamble        0.099    8.02 · 10⁻⁵
Sears                   0.097    31.68 · 10⁻⁵
Texaco                  0.094    67.42 · 10⁻⁵
Union Carbide           0.107    16.75 · 10⁻⁵
Assuming that the test is valid, having 9 of 30 stocks with correlation coefficients that are not
likely to have arisen by chance under the null hypothesis cannot be anything but discouraging.
Moreover, the levels of significance are well below one percent. Fama dismisses this finding
with the statement:
“All the sample serial correlation coefficients in Table 10 are quite small in
absolute value. The largest is only 0.123. Although 11 of the coefficients for lag
τ = 1 are more than twice their computed standard errors, this is not regarded as
important in this case. The standard errors are computed according to
equation (12); and, as we saw earlier, this formula underestimates the true
variability of the coefficient when the underlying variable is stable-Paretian with
characteristic exponent α < 2. In addition, for our large
samples the standard error of the serial correlation coefficient is very small. In
most cases a coefficient as small as 0.06 is more than twice its standard error.
'Dependence' of such small order of magnitude is, from a practical point of view,
probably unimportant for both the statistician and the investor.” (page 70)
Fama's Equation (11) is:

r(τ) = Σ_{t=1}^{N−τ} (u_t − ū)(u_{t+τ} − ū) / Σ_{t=1}^{N} (u_t − ū)²   (11)
9 Fama marks 11 of the 30 stocks as having correlation coefficients more than 2 standard deviations away from
zero; as noted in the body, I use the Fisher z-transform test which, under my null hypothesis of normality, is
better when the underlying correlation is not zero.
and Equation (12) is:

σ(r) = 1/√N   (12)

where N is the sample size.
After displaying this equation Fama states:

“Previous sections have suggested, however, that the distribution of u is stable
Paretian with characteristic exponent α less than 2. Thus the assumption of finite
variance is probably not valid, and as a result equation (12) is not a precise
measure of the standard error of r even for extremely large samples. Moreover,
since the variance of u comes into the denominator of the expression for r, it
would seem questionable whether serial correlation analysis is an adequate tool
for examining our data.” (page 69)
He then summarizes the results of an exercise in which he randomized the order of the first
differences for each stock, estimated the sample autocorrelation coefficient at a lag of 1 day for the
first 5, first 10, … of the randomized differences, and compared how often the results crossed the
boundary of zero plus or minus two standard deviations estimated from Equation (12). His
summary is:
“Although the results must be judged subjectively, the sample serial correlation
coefficients appear to break through their control limits only slightly more often
than would be the case if the underlying distribution of the first differences had
finite variance. From the standpoint of consistency the most important feature of
the sample coefficients is that for every stock the serial correlation is very close to
the true value, zero, for samples with more than, say, three hundred observations.
In addition, the sample coefficient stays close to zero thereafter.” (page 70)
I believe that this exercise suggests something quite different from what Fama read into it. It
certainly does not stand as a clear demonstration that “this formula underestimates the true
variability of the coefficient when the underlying variable is stable-Paretian with characteristic
exponent α < 2.”10
The paper does not report how many randomizations were carried out for
each stock; if only one was performed for each stock, we have no guarantee that the “true value”
of the serial autocorrelation coefficient for that one randomization was actually zero. That part of
the conclusion might be credible if several dozen randomizations had been performed.
The fact that the control limits are broken “only slightly more often than would be the case if the
underlying distribution of the first differences had finite variance” is puzzling. In the first place,
10 The stem “underestimate” occurs only twice in the paper; once in the passage quoted and once on page 39, in an
entirely different context. In particular, the possibility that the values of the stable Pareto parameter estimated by
Fama might underestimate the true parameter is not discussed in the 1965 paper.
even in the case of independent, normally distributed variables the distribution of the sample
correlation coefficient is not normally distributed but is related to a t-distribution (Snedecor and
Cochran, 1967), so that the control limits are not independent of the number of points. One of the
usual tests for significant departure of the sample coefficient r from zero relies on this by comparing
the value of |r|√(N − 2)/√(1 − r²) to the t-distribution with N − 2 degrees of freedom. The better
alternative is to use the normalized Fisher z-transform,

√(N − 3) · (1/2) ln( (1 + r)/(1 − r) )

which, under the null hypothesis of no correlation, is approximately normally distributed with zero
mean and unit variance even for samples of size 50 or so. This formulation, too, is strictly correct
only for normally distributed variables, though some books suggest that it may be useful with
other underlying distributions.11
The normalized Fisher transform is particularly useful when the null hypothesis involves
correlation coefficients different from zero or if we are interested in the fiducial range of a
measured coefficient that is not exactly zero, because the transform remains close to normal even
for large correlation coefficients, though the expected value is no longer zero.
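Both test statistics described above are straightforward to compute. The sketch below implements them; the sample sizes shown are assumed for illustration, since the exact per-stock sample sizes are not reproduced here:

```python
import math

def t_stat(r, n):
    # |r| * sqrt(n - 2) / sqrt(1 - r^2), referred to the t-distribution
    # with n - 2 degrees of freedom
    return abs(r) * math.sqrt((n - 2) / (1.0 - r * r))

def fisher_z(r, n):
    # sqrt(n - 3) * (1/2) ln((1 + r)/(1 - r)); approximately N(0, 1)
    # under the null hypothesis of zero correlation
    return math.sqrt(n - 3) * 0.5 * math.log((1 + r) / (1 - r))

# a coefficient of 0.118 (the largest positive value in Table 1) at an
# assumed sample size of 1,200 observations
print(t_stat(0.118, 1200), fisher_z(0.118, 1200))
# a coefficient of 0.06 at the same assumed sample size is indeed more
# than twice its standard error, consistent with Fama's remark
print(fisher_z(0.06, 1200))
```

At samples of this size the two statistics are nearly identical for small r; the Fisher transform earns its keep when the null value or the measured coefficient is well away from zero.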
The fact that Fama uses control limits that do not take into account correctly the influence of
sample size makes the statement about deviations somewhat more ambiguous than it appears,
since no specific mention is made of how often the deviations beyond the control limits occurred
or whether they occurred only at very large sample sizes (when the true control limits are
virtually invariant as the sample size changes) or at smaller sample sizes (where the difference
between the t- and normal distributions might contribute to the problem).
Taken as a whole, however, the exercise has some interesting consequences.
On one hand we have the finding that when increments are randomized the hypothesis of zero
autocorrelation is rejected about as often as it would be under the hypothesis that increments are
independent and normally distributed. On the other hand we have the finding that if we take the
increments in their natural order the results reject the hypothesis of zero autocorrelation well
beyond the one percent level for 9 out of 30 stocks. Put together, these findings are not consistent
with the notion that it is the distribution of increment sizes that creates problems with the tests. If
the distribution were the problem it would have been a problem no matter what the ordering of
the increments.12
Thus the exercise underlines that the finding of so many significant correlation
coefficients is, at least from the statistical point of view, a very unlikely event even if we do not
take into account the conclusion that rounded prices lead to estimated correlation coefficients
that are likely to be lower than the true coefficient, as shown by Harris (1990) and Venezian (2011).
The effect of the shape of the distribution of increment sizes on the distribution of correlation
coefficients is potentially important and deserves more than a passing mention. This, once again,
11 For example, Sachs, 1982, states: “This r to z transformation requires that x and y have bivariate normal distribution in the population. The larger the sample size, the less stringent is this assumption.” However, later in the same paragraph he states: “One uses this transformation only for samples with n > 10 from a bivariate normal distribution.” (page 427)
12 This conclusion would be even stronger if we knew that multiple randomizations had been conducted.
can be explored by some relatively simple simulations. I thought it would be instructive to use
the family of t-distributions as a starting point, because when the parameter of that family is
set to 1 we have the Cauchy distribution, which is the member of the stable Pareto family with
characteristic exponent α = 1, and when the parameter goes to infinity we have the normal
distribution, which is the member of the stable Pareto family with characteristic exponent α = 2.
Between these extremes we have a family that is leptokurtic, though for most practical purposes
the distribution can be considered normal when the parameter is over 100.
A simple way of displaying the results is to determine the ratio of how often the normalized
Fisher z transform (NFT) would reject the hypothesis of zero correlation at the two tails to how
often the hypothesis should be rejected. For example, if 10,000 series of 1,501 points are
simulated and each tail is tested at the one percent level, we would expect to have 100 points
rejected at each tail. If, in fact, the simulated series were to give us 200 in the lower tail and 30 in the
upper tail, the ratios would be 2.0 and 0.3, respectively. The results of such a simulation are
given in Figure 3. Note that the symbols for the individual points are color-filled if the number of
points was significantly different from the expected number by a chi-squared test and open if the results
were not significant. Because many tests were performed, I used the 0.005 level of significance
as a threshold. The results are highly supportive of Fama's contention that tests based on the
normal distribution would reject the null hypothesis of zero correlation too often. In fact, it is not
necessary to have variates with infinite variance to have a substantial distortion; even the t-
distribution with parameter 3, which has a finite variance, results in about twice as many
rejections as would be appropriate. By the time the parameter reaches 5, however, the test
performs approximately as we would want.
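A simulation of this kind can be sketched as follows. This is an illustration of the approach, not a reproduction of the paper's runs; the default number of series is kept small for speed, and the function name and defaults are mine:

```python
import numpy as np
from scipy.stats import norm

def tail_rejection_ratios(df, n_series=2000, n_points=1501, p=0.01, seed=0):
    """Ratio of observed to expected one-tailed NFT rejections of H0: rho = 0.

    Lag-1 correlations are computed from i.i.d. increments drawn from a
    t-distribution with `df` degrees of freedom (df = 1 is the Cauchy case;
    large df approaches the normal).  The defaults are smaller than the
    10,000 series discussed in the text, purely to keep the sketch fast.
    """
    rng = np.random.default_rng(seed)
    z_crit = norm.ppf(1.0 - p)                    # one-sided normal critical value
    lo = hi = 0
    for _ in range(n_series):
        x = rng.standard_t(df, size=n_points)
        r = np.corrcoef(x[:-1], x[1:])[0, 1]      # sample lag-1 autocorrelation
        z = np.arctanh(r) * np.sqrt((n_points - 1) - 3)  # NFT on n-1 pairs
        lo += int(z < -z_crit)
        hi += int(z > z_crit)
    expected = n_series * p
    return lo / expected, hi / expected
```

Ratios near 1.0 in both tails indicate that the normal-theory test performs as intended for that value of the t parameter.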
Figure 3, however, displays the results that would be obtained with continuous recording of
prices, which was presumably what Fama had in mind in making his argument. The prices Fama
used to determine increments and their correlation coefficients were, however, recorded only to
the nearest $0.125. The performance under those conditions would be a better gauge for making
decisions. The results of such simulations will, of course, depend on the assumed initial price of
the stock.
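The effect of tick rounding on recorded returns can be sketched along these lines (an illustrative construction, not Fama's procedure; the handling of a rounded price of zero mirrors the survivorship issue discussed below):

```python
import numpy as np

def rounded_tick_returns(log_increments, p0=50.0, tick=0.125):
    """Returns computed from a price path rounded to the nearest tick.

    `log_increments` are continuous log-price changes; the continuous path
    is re-rounded to `tick` (here $0.125, the tick of Fama's data) before
    log returns are recomputed.  If a rounded price reaches zero the log
    return is undefined, so the replicate is dropped (returns None).
    """
    path = p0 * np.exp(np.concatenate(([0.0], np.cumsum(log_increments))))
    rounded = np.round(path / tick) * tick
    if np.any(rounded <= 0.0):
        return None                     # 'infinite return'; replicate discarded
    return np.diff(np.log(rounded))
```

Feeding such rounded returns, rather than the continuous increments, into the correlation and NFT machinery reproduces the recording conditions of the original data.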
Figure 4 shows the results for simulations with an initial price of $50 and a tick size of $0.125.
One clear similarity with Figure 3 is that as the probability level tends to zero the ratio goes well
above one. Thus for sufficiently stringent tests, the argument remains valid.
There are, however, also some striking differences between Figure 3 and Figure 4. One is that
Figure 3 appears to be symmetric, with the two tails behaving in much the same way. Figure 4,
on the other hand, suggests a lack of symmetry. This should be expected from the known biasing
effect of rounding that was discussed above. The other striking difference is that Figure 4 shows
the ratio going well below one, and with numbers large enough to be statistically significant. For
some of the parameter values used in the simulations the ratios are low enough at probability
levels of 0.01 and 0.02 to invalidate the argument put forth by Fama.
Figure 3 (Panels A and B): Performance of the NFT with correlations from t-distributed variates
Figure 4 (Panels A and B): Performance of the NFT with correlations from t-distributed variates
with rounding of prices and an initial stock price of $50
Figure 5 shows that even with initial prices of $200 the issues persist. Ratios below one exist at
probability levels below 0.10 and in some cases, including cases with infinite variance, do not
return above one until probability levels well below the conventional 0.01.
An important feature of the simulations that is not apparent in the figures is that with continuous
prices an infinite return is not possible in theory.13 With rounding to the nearest
$0.125 and distributions of very large or infinite variance, however, infinite returns should be
expected, since a rounded price can reach zero. Replicates that met this problem in the 1,501 “days” of
simulated prices did not yield correlation coefficients. Thus we have a form of “survivorship bias,”
but one that would also be met in practice. With an initial stock price of $50 the problem was
encountered 4,195 times with a t parameter of 1; 2,366 times with a parameter of 1.25; 849 times
with a parameter of 1.50; 229 times with a parameter of 1.75; and 43 times with a parameter of
2.00; the problem did not occur with the parameters of 3.00 and 5.00, which have finite variance.
With an initial stock price of $200 the corresponding numbers for the parameters with infinite
variance were 3,922; 1,948; 642; 152; and 25; again, it never occurred with the two parameters of
finite variance.
Thus the rounding of prices may cause serious problems, and it raises doubts about the validity of
Fama's arguments. That does not imply that his arguments must be wrong, since the stable
Pareto family is not the t-distribution family. But the example does provide evidence that
rounding can have serious implications that need to be considered carefully in appraising the
relation between empirical evidence and theory.
This exercise suggests that with fat-tailed distributions that are not stable Pareto distributions the
use of standard normal criteria for evaluating correlation coefficients is dangerous. It also raises
doubts about Fama's view that the standard deviation derived from Equation (12)
“underestimates the true value of the variability when the underlying variable is stable-Paretian
with characteristic exponent α < 2.” It certainly leads to too many rejections of the null
hypothesis at extremely low values of probability, but to too few rejections at levels close to the
ones frequently used in research.
The results of the study showed 2 significant negative coefficients and 7 significant positive
coefficients. Even if the negative tail were overrepresented 10-fold it would not affect the
conclusion that there is an excess of “significant” correlations unless the ratio at the upper tail is
also significantly higher than 1. Hence, while I agree with Fama that the measured
autocorrelation at a lag of one day is small, I could not dismiss it so readily as statistically
insignificant even if I were to assume that the distribution of daily returns has infinite variance.
13 In practice it is possible that this would happen in a simulation because even with double precision calculations “continuous pricing” is subject to some rounding. In some 100,000 simulations this never occurred.
Figure 5 (Panels A and B): Performance of the NFT with correlations from t-distributed variates
with rounding of prices and an initial stock price of $200
6. Interactions and Other Issues.
In this section I discuss a number of points that have not been noted above. I begin by pointing
out that if we take Fama‟s three methods of estimating the characteristic coefficient at face value,
they show no consistency. Then I note that there is internal evidence that the two more
quantitative methods may not be providing unbiased answers, these are discussed under separate
headings. Finally I deal with the issue of stationarity since this is essential for the soundness of
the estimation procedures and in my view was not tested adequately.
Cross-validation of the methods of estimation
Since the three methods are intended to measure the same characteristic, one might expect to see
some measure of concordance in a published academic paper. Unfortunately, none is provided.
The results of exploring concordance are not encouraging. Fama presents a summary of his
estimates in his Table 9, with the comment “Even a casual glance at Table 9 is sufficient to show
that the estimates of α produced by the three different procedures are consistently less than 2.”
The table, here repeated as Table 2, has other features that should be just as easy to appreciate.
One noteworthy aspect is that in only 8 of the 30 stocks does the estimate from range analysis
fall within the range provided by the graphical method and only in 3 of 30 cases does the
estimate from the sequential variance method fall within that range. In fact in only one case
(General Foods) do the range analysis and sequential variance method estimates both fall within
the range Fama found through his graphical method.
A second observation is that the results from the two methods that Fama classifies as “less
subjective” do not seem to have much in common. Correlation coefficients of the characteristic
values obtained by the two methods are small and make it unlikely that they are measuring the
same property. The Pearson correlation coefficient between the estimates of the range analysis
and sequential variance methods is only 0.058, which is not significantly different from zero if
we use standard deviations that assume the sampling is from a bivariate normal distribution. The
Spearman correlation coefficient is 0.066. Tables of significance for this coefficient with so few
points and ties are not available, but this assures us that the lack of correlation does not arise from
a few outliers. Kendall's tau, which measures ordering with no effect of distance between ranks
and is therefore distribution-free, is 0.064 with a standard deviation of 0.123. According to
Kendall, 1970, its distribution approaches normality very quickly; hence the coefficient does not
appear significantly different from zero.
The level of correlation implies that the variance in the estimates is much greater than the
variance in the supposed characteristic exponents among the 30 stocks. Thus there is clear
internal evidence that the estimators are not very effective and do not give consistent results.
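Checks of this kind are straightforward to reproduce from the two “less subjective” columns of Table 2. A minimal sketch, using standard library routines rather than the computations actually used above:

```python
from scipy.stats import pearsonr, spearmanr, kendalltau

def concordance(range_est, seqvar_est):
    """Pearson, Spearman, and Kendall measures of agreement between the
    range-analysis and sequential-variance estimates of the exponent."""
    r, _ = pearsonr(range_est, seqvar_est)      # linear association
    rho, _ = spearmanr(range_est, seqvar_est)   # rank correlation
    tau, _ = kendalltau(range_est, seqvar_est)  # ordering agreement
    return r, rho, tau
```

If the two columns of Table 2 measured the same underlying exponent, all three coefficients would be expected to be strongly positive rather than near zero.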
Table 2. Fama's Estimates of Characteristic Exponent
Stock  Graphical Low  Graphical High  Range Analysis  Sequential Variance
Allied Chemical 1.99 2.00 1.94 1.40
Alcoa 1.95 1.99 1.80 2.05
American Can 1.85 1.90 2.10 1.71
A.T.&T. 1.50 1.80 1.77 1.07
American Tobacco 1.85 1.90 1.88 1.24
Anaconda 1.95 1.99 2.03 2.55
Bethlehem Steel 1.90 1.95 1.89 1.85
Chrysler 1.90 1.95 1.95 1.36
Du Pont 1.90 1.95 1.88 1.65
Eastman Kodak 1.90 1.95 1.92 1.76
General Electric 1.80 1.90 1.95 1.57
General Foods 1.85 1.90 1.87 1.86
General Motors 1.95 1.99 2.05 1.44
Goodyear 1.80 1.95 2.06 1.39
International Harvester 1.85 1.90 2.06 2.22
International Nickel 1.90 1.95 1.77 2.80
International Paper 1.90 1.95 1.87 1.95
Johns Manville 1.85 1.90 2.08 1.75
Owens Illinois 1.85 1.90 1.95 2.06
Procter & Gamble 1.80 1.90 1.84 1.70
Sears 1.85 1.90 1.75 1.66
Standard Oil (Calif.) 1.95 1.99 2.08 2.41
Standard Oil (N.J.) 1.90 1.95 2.02 2.09
Swift 1.85 1.90 1.99 1.87
Texaco 1.90 1.95 1.85 1.76
Union Carbide 1.80 1.90 1.75 1.56
United Aircraft 1.80 1.90 1.93 1.13
U.S. Steel 1.95 1.99 1.96 1.78
Westinghouse 1.95 1.99 2.10 1.35
Woolworth 1.80 1.99 1.93 1.02
Internal evidence of problems in the range analysis estimates
In discussing range analysis estimation, Fama points out that if there is sample autocorrelation in
the daily returns the method will give biased estimates.
Range analysis has one important drawback, however. If successive price changes in the
sample are not independent, this procedure will produce "biased" estimates of α. If there
is positive serial dependence in the first differences, we should expect that the
interfractile range of the distribution of sums will be more than n^(1/α) times the fractile
range of the distribution of the individual summands. On the other hand, if there is
negative serial dependence in the first differences, we should expect that the interfractile
range of the distribution of sums will be less than n^(1/α) times that of the individual
summands. Since the range of the sums comes into the denominator of (7), these biases
will work in the opposite direction in the estimation of the characteristic exponent α.
Positive dependence will produce downward biased estimates of α, while the estimates
will be upward biased in the case of negative dependence.29

We shall see in Section V, however, that there is, in fact, no evidence of important
dependence in successive price changes, at least for the sampling period covered by our data.
Thus it is probably safe to say that dependence will not have important effects on any
estimates of α produced by the range analysis technique. (pages 64 and 65)
The footnote states:

29 It must be emphasized that the "bias" depends on the serial dependence shown by the
sample and not the true dependence in the population. For example, if there is positive
dependence in the sample, the interfractile range of the sample sums will usually be
more than n^(1/α) times the interfractile range of the individual summands, even if there
is no serial dependence in the population. In this case the nature of the sample
dependence allows us to pinpoint the direction of the sampling error of the estimate of
α. On the other hand, when the sample dependence is indicative of true dependence in
the population, the error in the estimate of α is a genuine bias rather than just sampling
error. This distinction, however, is irrelevant for present purposes.
He thus dismisses the problem. In the second section, however, he finds that 11 of the 30 stocks
have first order correlation coefficients that are more than two times their standard deviations
away from zero. It might have been prudent, at that point, to enquire whether the “quite small”
correlation coefficients may be causing substantial biases in the estimation. That is not too
difficult to do. The correlation coefficient between the range analysis estimate of α and the first
order correlation coefficient for the sample is −0.665 if all 30 stocks are considered. The value of
Kendall's τ is −0.558, significant at the 2×10⁻⁶ level, and involves no assumptions regarding
distributions.
The internal data provide other ways to examine the effect of sample autocorrelation on the estimate
produced by range analysis, involving the correlation between these quantities. The Pearson
correlation coefficient between them is −0.778 and the Kendall tau is −0.622, significant at the
5×10⁻⁷ level. If we are willing to assume that the sample correlation coefficients are
approximately normally distributed about the value of zero, the value assumed by Fama, we can
use OLS regression to determine both the slope and the intercept. The analysis in Panel A of
Table 3 shows the results when all 30 stocks are included. The points are shown in Panel A of
Figure 6 along with the regression line. A.T.&T. appears to be an outlier. Since this company
was closer to a public utility than an industrial stock, the analysis was repeated excluding it. The
results are shown in Panel B of the table and the figure. The fit is much better. Both panels show
that the coefficient of the sample autocorrelation is significantly negative at all conventional
levels, even with a two-tailed test. This is consistent with Fama's argument.
We can now focus on the intercept, the estimate of the stable Pareto characteristic exponent that
would be obtained if the sample autocorrelation were zero. This is clearly different from zero,
but the relevant question is whether it is different from 2. The intercept would not be deemed
significantly different from 2 at the 5 percent level with a two-tailed test whether we include
A.T.&T. or exclude it. Using a one-tailed test, that is, starting with the hypothesis that the
coefficient should be less than two, it attains significance at the 3 percent level if we consider
the first regression appropriate, but would still not attain significance at the 5 percent level if
A.T.&T. is excluded. Thus it is possible that the range analysis information is governed mostly
by the biases induced by the non-zero sample autocorrelations. Hence the estimates are
not solid evidence for Fama's conclusion.14
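The regression and the intercept test against 2 can be sketched as follows. The data shown are synthetic placeholders; the actual inputs would be the 30 pairs of estimates and sample autocorrelations underlying Table 3:

```python
import numpy as np

def intercept_test(alpha_hat, rho1, null_intercept=2.0):
    """OLS of range-analysis alpha estimates on sample lag-1 autocorrelations,
    with a t-statistic for H0: intercept equals `null_intercept`.

    A minimal sketch of the analysis reported in Table 3; not Fama's own
    procedure, and the variable names are mine.
    """
    x = np.asarray(rho1, dtype=float)
    y = np.asarray(alpha_hat, dtype=float)
    X = np.column_stack([np.ones_like(x), x])          # intercept + slope design
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    n, k = X.shape
    resid = y - X @ beta
    s2 = resid @ resid / (n - k)                       # residual variance
    cov = s2 * np.linalg.inv(X.T @ X)                  # OLS covariance matrix
    t_intercept = (beta[0] - null_intercept) / np.sqrt(cov[0, 0])
    return beta, t_intercept
```

A t-statistic near zero for the intercept, as in Table 3, means the data cannot distinguish the autocorrelation-free exponent from 2.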
Table 3. Regressions of the Characteristic Exponents Estimated by Range Analysis on the
First-Lag Sample Autocorrelation Coefficient.

Panel A: Regression Including A.T.&T., Adjusted r-squared = 0.423
Variable                Coefficient  Standard Error  t Stat    p-value
Intercept               1.967        0.016459        119.4929  1.81E-39
Sample Autocorrelation  -1.249       0.264901        -4.71629  6.02E-05

Panel B: Regression Excluding A.T.&T., Adjusted r-squared = 0.590
Variable                Coefficient  Standard Error  t Stat    p-value
Intercept               1.980        0.013988        141.5755  2.64E-40
Sample Autocorrelation  -1.432       0.222818        -6.42513  6.96E-07
14 Note that Fama's statements relate to autocorrelation in the sample, not in the population. Any concerns about errors in the independent variable are irrelevant. If we were concerned with them we would have to admit that under those conditions the intercept understates the true intercept if the slope is negative.
Figure 6 (Panels A and B): Relation between Range Analysis Estimates of the Characteristic
Exponent and the Sample Correlation Coefficient at Lag 1
Internal evidence of problems in the sequential variance estimates
In the case of sequential analysis there is also a telltale sign. Fama stresses the fact that the
estimates should be independent of the numbers of observations chosen for the two periods. In
his Table 7 he provides, as an example, the results of sequential variance estimation for
American Tobacco and points out their variability and sensitivity to the end point of the longer
series.
The problems in estimating α by the sequential variance procedure are illustrated in
Table 7 which shows all the different estimates for American Tobacco. The estimates are
quite erratic. They range from 0.46 to 18.54. Reading across any line in the table makes it
clear that the estimates are highly sensitive to the ending point (n1) of the interval of
estimation. Reading down any column, one sees that they are also extremely sensitive to
the starting point (n0).

By way of contrast, Table 8 shows the different estimates of α for American Tobacco that
were produced by the range analysis procedure. Unlike the sequential-variance estimates,
the estimates in Table 8 are relatively stable. They range from 1.67 to 2.06. Moreover, the
results for American Tobacco are quite representative. For each stock the estimates
produced by the sequential variance procedure show much greater dispersion than do the
estimates produced by range analysis. It seems safe to conclude, therefore, that range
analysis is a much more precise estimation procedure than sequential-variance analysis.
Sensitivity to the longer series could be an indication that stationarity is violated. This
property is assumed throughout Fama's analysis, but is not adequately tested. Regressions of the
estimates on the numbers of increments in the first and the second period as independent
variables yield the results shown in Table 4. A regression based on all 56 values shown in the table
leads to the conclusion that the length of the initial interval is not significant (p = 0.292) but the
longer interval is significant (p = 0.0051). The estimate corresponding to lengths of 200 and 300
is based on small numbers and a very large overlap; it leads to the highest estimate of the
characteristic exponent, 18.54. The next highest value is 2.64, when the lengths are 200 and 400.
If the first point is excluded from the regression, the two coefficients are virtually equal and both
are highly significant (at the 5.5×10⁻⁴ and 3.4×10⁻⁶ levels, respectively), and together they have
an adjusted r-squared of 0.517. The fact that both coefficients are different from zero suggests
that the instability is not merely due to the fact that the market experienced unusually high
turbulence toward the end of the period in Fama's data, as shown in Figure 7, and suggests that
other problems may exist.
Table 4. Regressions of the Characteristic Exponents of American Tobacco Estimated from
Sequential Variances on the Number of Increments Included in the Estimates.

Panel 1: Regression on 56 points, Adjusted r-squared = 0.176
Variable                     Coefficient  Standard Error  t Stat    p-value
Intercept                    5.493921     1.082452        5.075442  5.09E-06
Increments in first period   -0.00168     0.001581        -1.06342  0.292412
Increments in second period  -0.00337     0.001154        -2.91969  0.005135

Panel 2: Regression on 55 points, Adjusted r-squared = 0.520
Variable                     Coefficient  Standard Error  t Stat    p-value
Intercept                    2.656713     0.181376        14.64758  4.29E-20
Increments in first period   -0.00092     0.000249        -3.6865   0.000543
Increments in second period  -0.00098     0.000189        -5.2021   3.38E-06
Figure 7. Variation in the Rate of Return on the Dow Index over the Relevant Time Interval
The Issue of Stationarity
The issue of stationarity is important because if the parameters of the distribution of daily
increments changes over time then tests that cover the whole period can give misleading results.
Fama concludes that this is not a problem. My conclusion based on the published data is that the
one stock for which he chose to presented addition details shows evidence of non-stationarity.
Fama attempts to justify the assumption that the return process is stationary by examining the
distribution of daily returns in two segments of the total period. His analysis is limited to five
stocks that “seemed to show changes in trend that persisted for rather long periods of time during
the period covered by this study. 'Trends' were 'identified' simply by examining a graph of the
stock's price during the sampling period. The procedure, though widely practiced, is of course
completely arbitrary.” (page 58) In line with this subjectivity, he presents no comparisons of
either the mean or standard deviation of the returns in the periods, perhaps because the choice of
segments was based on nothing more than changes in the trend of prices. Such tests might,
nonetheless, have cast some light on the issue. For example, if the criterion for segmentation was
a change in the average level of returns, a test of whether the variances of returns in the two
segments were the same might have been useful.
In view of Fama's position that the increments come from a stable Pareto distribution, this
approach seems unsuitable. The relevant stationarity would be that of all four basic parameters of
the family. Looking at the shape of the distribution in two periods selected primarily, if not
exclusively, because they differ in the location parameter (the fourth parameter) might be
misleading.
Of the five stocks examined he reports some details for only one, A.T.&T. I have already argued
that it was not really an industrial enterprise at the time, but it is the only stock for which data are
provided. The data provided for that stock are fragmentary and appear to have a number of
inconsistencies. These are discussed in Appendix 2.
Fama does show, in his Figure 4, graphs of the distribution of daily returns in the two segments
and in the aggregate period. He remarks on page 58:
“As was typical of all the stocks the graphs are extremely similar. The same type
of elongated S appears in all three. Thus it seems that the behavior of the
distribution in the tails is independent of the mean. This is not really a very
unusual result. A change in the mean, if it is to persist, must be rather small. In
particular the shift is small relative to the largest values of a random variable from
a long-tailed distribution.”
While I might agree that the graphs of the two segments of A.T.&T. are “similar” to the unaided
eye and that they are “S” shaped, I see the similarity as very limited. Even a casual examination
of the axes suggests that additional investigation is advisable. The range of variation in the first
period is from −0.04 to +0.07 units, whereas that for the second period is from −0.05 units to
+0.08; this might look “extremely similar” to the eye. The first period, with the smaller range of
variation, had approximately six times as many observations as the second one. It is
unlikely that even stable Pareto distributions are such that the expected value of the range is
larger for a sample of about 220 points than it is for a sample of 1,200 points.15
A rough estimate is that if the standard deviation is finite then in the second period it was
roughly 50% higher than in the first period. This would imply an F-ratio of over 2 with
about 200 and 1,000 degrees of freedom, enough to attain significance at all the usually quoted
levels. Of course the use of the F-test is valid only for normal distributions, so appealing to it is
useful only to the open-minded. By itself the difference in range and the difference in slope
could be the result of changes in the characteristic exponent, which, as Fama points out, is a
measure of how fat the extreme tails are, or of the third parameter, which is a measure of scale.
The fact that periods of different mean were found suggests a shift in the location parameter.
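If one nonetheless wanted to attach a number to this back-of-the-envelope comparison, the tail probability of the implied variance ratio is easy to obtain. This is purely illustrative, since the F-test assumes normality and the degrees of freedom are the approximate sample sizes from the text:

```python
from scipy.stats import f

# A second-period standard deviation roughly 50% higher than the first
# implies a variance ratio of about 1.5**2 = 2.25.  Degrees of freedom are
# the approximate sample sizes of the two segments (about 200 and 1,000).
variance_ratio = 1.5 ** 2
p_value = f.sf(variance_ratio, dfn=200, dfd=1000)  # upper-tail probability
```

With these degrees of freedom the tail probability is far below any conventional significance level, consistent with the rough argument above.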
We can also gauge non-stationarity by making copies of the graphs and comparing them by
superposition as shown in Figure 8. The figure suggests a difference in the slope of the body of
the curve and differences in the tails.
Figure 8: Comparison of the distribution of returns of A.T. & T. in the two periods
15 See Appendix 2 for a further discussion of issues related to the data on A.T.&T.
The cumulative distribution functions for the two periods appear to differ. The Kolmogorov-
Smirnov test, which is distribution-free, depends on the maximum absolute value of the
difference between two empirical distributions. A more formal comparison can, accordingly, be
made if we have information on the cumulative frequency distributions. That information is not
provided in the paper, but we can get close to it by scaling the graphs carefully. Fortunately,
modern technology makes it possible to do that. The graphs in the PDF version of the paper can
be enlarged and the distances between points can be measured electronically. One limitation is
that the thickness of lines increases with magnification, making the positioning of the cursor a
matter of judgment to some extent.
In view of the apparent differences between the two periods, I decided to use a more formal
test. If the maximum absolute difference is large enough to exceed the critical values of the KS
two-sample test, then the curves may be deemed to be far from “extremely similar.” In that event,
the hypothesis of a stationary distribution may be rejected.
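The critical values referred to here follow from the asymptotic two-sample KS formula. A sketch, using the approximate sample sizes given above (about 1,200 and 220 observations):

```python
import math

def ks_two_sample_critical(n1, n2, alpha):
    """Asymptotic critical value of the two-sample Kolmogorov-Smirnov
    statistic: the largest absolute difference between two empirical CDFs
    consistent with the null hypothesis at significance level alpha."""
    c_alpha = math.sqrt(-math.log(alpha / 2.0) / 2.0)
    return c_alpha * math.sqrt((n1 + n2) / (n1 * n2))

# With roughly 1,200 and 220 observations the 0.1 percent critical value
# is about 0.14, so a maximum difference near 18 percent would exceed it.
```

This is the standard large-sample approximation; exact tables would differ slightly, but not enough to matter at differences of the size discussed below.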
Figure 9 shows the un-replicated and unadjusted16 results for the absolute value of the differences
together with three critical levels for the KS test. Based on these un-replicated and unadjusted
results one might reject the hypothesis that the samples come from the same distribution at the
one per thousand level. The rebuttal, of course, would be that this is an unfair comparison
because the two periods were selected on the basis of differences in the average return.
That is a fair remark, though it must be realized that adjusting for the mean would reduce the
critical values of the differences when using KS-like tests. Adjustment for the mean is somewhat
problematic. Rounding of prices to the nearest tick makes the probability of increments of
exactly zero quite substantial, so that the curves have a discontinuity at zero return. If we are just
scaling at a fixed ordinate we need merely adopt a convention to scale at either end of the
discontinuity or at the middle of it, and adhere to that convention. However, if we want to adjust
two curves to the same mean return, problems arise. In the process of adjustment some ordinates
will shift from the left hand side of a discontinuity to the right hand side, or vice versa. Because
of these limitations and difficulties it is helpful to replicate measurements in order to assess the
extent to which these shifts may affect the results. As a by-product we obtain information about
the accuracy and reproducibility of the measurements.
16 In all cases, what I refer to as an un-replicated value is actually the average of two values obtained with scaling that started at two different positions. These were intended primarily to detect digit transpositions and failures to enter minus signs, so I do not consider them as replicates. The averaging, however, would reduce the error of the value reported. The results in this figure are not adjusted to reflect the difference in means.
Figure 9. Absolute difference in the cumulative distribution function of daily increments for A.T.
& T. with no adjustment for differences in the mean return in the two periods
The portion of the paper cited above indicates that the criterion for segmentation was related to
the location parameter, not to the scale parameter, and therefore suggests we should expect
differences in the means of the two periods. The paper gives 0.00107 as the mean for the first period and
−0.00061 for the second period.17 Hence I rescaled the results for the first period in two ways:
one to adjust the mean to zero and another to adjust the mean to that of the second period.
Analogously, I rescaled the results for the second period in two ways: one to adjust the mean to
zero and another to adjust the mean to that of the first period. These were done by computing the
new abscissas and then measuring the original graphs at these new abscissa values. From these
results we can compute three sets of differences: one referred to a common mean equal to that of
the first period, one referred to a common mean equal to that of the second period, and one
referred to a common mean of zero. The first two methods combine one of the initial scalings
with an independent rescaling. The third involves two rescalings.

The results are shown in Figure 10, which also shows the average of the three and the original
unadjusted difference. The points are plotted at the original abscissa. Perhaps the most
17 See Appendix 2 for a discussion of these data.
interesting feature is that the maximum difference has increased, rather than decreased, as the
result of the adjustments. The instability near the origin is also notable; this is mostly a result of
the fact that the gaps at zero (amounting to 8.0 percent in the first period and 5.6 percent in the
second), which were coincident in the original scaling, are no longer coincident when one or
both of the origins are shifted. The result is that large differences, of the order of 4 percent, will
arise in this area just from the adjustment across the gap.
That is certainly not a problem at ordinates in the neighborhood of −0.005. In this area the
maximum difference is about 18 percent, well above the critical point for rejection at the 0.1
percent level even if we do not take into account the fact that the mean was shifted. Moreover,
the range extends from −0.003 to −0.008, so replication errors are not large enough to lower the
result below that critical point.
Figure 10: Absolute difference in the cumulative distribution function of daily increments for
A.T. & T. after adjusting for differences in the mean return in the two periods
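The comparison behind Figures 9 and 10 can be sketched numerically. The sketch below is illustrative only: it uses synthetic normal samples with the period means quoted in the paper (0.00107 and −0.00061) in place of Fama's actual A.T.&T. data, and an empirical-CDF comparison in place of measurements read off the published graphs.

```python
import numpy as np

def max_cdf_diff(a, b):
    """Maximum absolute difference between the empirical CDFs of a and b
    (the two-sample Kolmogorov-Smirnov statistic)."""
    grid = np.sort(np.concatenate([a, b]))
    fa = np.searchsorted(np.sort(a), grid, side="right") / len(a)
    fb = np.searchsorted(np.sort(b), grid, side="right") / len(b)
    return np.abs(fa - fb).max()

rng = np.random.default_rng(0)
# Synthetic daily returns with the period means reported in the paper;
# the common standard deviation of 0.010 is an assumption for illustration.
first = rng.normal(0.00107, 0.010, 1017)
second = rng.normal(-0.00061, 0.010, 197)

d_raw = max_cdf_diff(first, second)            # original scaling
d_zero = max_cdf_diff(first - first.mean(),    # both periods rescaled to
                      second - second.mean())  # a common mean of zero
print(d_raw, d_zero)
```

Comparing d_raw with d_zero separates the part of the maximum difference that is due only to the difference in means from the part due to the shape of the two distributions.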
The curves can also be compared to assess symmetry by superposing each period on itself after
turning one of the copies upside down and matching the gap at zero return. The results, shown in
Figure 11, suggest little or no skewness in the first period but visible skewness in the second
period. This would imply that the skewness parameter of the stable Pareto distribution may have
changed from the first period to the second one.
Panel A: First Period Panel B: Second Period
Figure 11: Assessment of the skewness of the distribution of
A.T. & T. daily returns in the two periods.
The conclusion is that, at least for A.T.&T., the characterization of “extreme similarity” and the
conclusion that the assumption of stationarity is justified do not stand up to closer scrutiny. Some
caution is advisable in going further, because the data for A.T.&T. presented in the paper are
not completely consistent, as discussed in Appendix 2.
One other hint of non-stationarity can be found in the data of American Tobacco presented
earlier. As Fama recognized, there appear to be trends in the estimates of the characteristic
exponent:
“The problems in estimating α by the sequential variance procedure are illustrated
in Table 7 which shows all the different estimates for American Tobacco. The
estimates are quite erratic. They range from 0.46 to 18.54. Reading across any
line in the table makes it clear that the estimates are highly sensitive to the ending
point (nl) of the interval of estimation. Reading down any column, one sees that
they are also extremely sensitive to the starting point (n0). By way of contrast,
Table 8 shows the different estimates of α for American Tobacco that were
produced by the range analysis procedure. Unlike the sequential-variance
estimates, the estimates in Table 8 are relatively stable.”
The statement suggests that similar behavior was found in the estimation of exponents for other
stocks. This may be important because range analysis estimates use the whole series (or as much
of it as is consistent with forming sums of 4, 9, and 16 terms) whereas sequential variance
estimates as implemented by Fama use between 200 and 800 observations from the early part of
the series for the shorter period and points that include up to the last observation for the longer
period. Thus
any non-stationarity that enters in the latter part of the period would be masked in range analysis
and amplified in sequential variance analysis.
Given only the data provided in the published paper I must conclude that stationarity is
questionable. That implies that none of the estimates of the characteristic exponent provided in
the paper can be trusted as an unbiased assessment of what they purport to measure.
Fama’s choice of implementation methods
The analysis that Fama engaged in deals with time series, and he often mentions this point. The
theory underlying his methods, however, rests on the characteristic function of stable Pareto
distributions. Accordingly, his argument relates to independent samples from such
distributions. If a theory of sequences of stable Pareto variables had been available, he would
have used that theory to assess autocorrelation. Instead, as already discussed, he randomized the
order of the increments to investigate autocorrelation. When it is convenient, however, he
stresses the notion of sequences. In his Appendix, for example, he starts (pages 104 and 105)
with the statement:18
“This discussion provides us with a way to analyze the distribution of the sample
variance of the stable Paretian variable u. For values of α less than 2, the
population variance of the random variable u is infinite. The sample variance of n
independent realizations of u is

S² = (1/n) Σ yᵢ²,  i = 1, …, n    (A20)

This can be multiplied by n/n^(2/α) with the result

n^(1 − 2/α) S² = n^(−2/α) Σ yᵢ²    (A21)

Now we know that the distribution of n^(−2/α) Σ yᵢ² is stable Paretian and
independent of n. In particular, the median (or any other fractile) of this
distribution has the same value for all n. This is not true, however, for the
distribution of S². The median or any other fractile of the distribution of S²
will grow in proportion to n^((2/α) − 1).”

18 I quote his passage in full. The switch from ut to yi is in the original.
Then he goes on to illustrate this and relate it to the task at hand by an example:

“For example, if ut is an independent, stable Paretian variable generated in time
series, then the .f fractile of the distribution of the cumulative sample variance of
ut at time t1, as a function of the .f fractile of the distribution of the sample
variance at time t0, is given by

S²_f(t1) = S²_f(t0) (n1/n0)^((2 − α)/α)    (A22)

where n1 is the number of observations in the sample at time t1, n0 is the number
at t0, and S²_f(t1) and S²_f(t0) are the .f fractiles of the distributions of the
cumulative sample variances.”
It is clear from his context that the first part applies generally, but he then goes on to enshrine the
idea of sequences by calling this the “sequential variance approach”.
The terminology and notation used may give rise to confusion. The symbol S² is used to denote
a single estimate of the variance from a sample of size n. The symbol S²_f is used to denote a
quantile (f) of the distribution of S² and perforce depends on the sample size. In what follows I
need a more explicit notation. In an attempt to avoid confusion I will use S²_j(n) to denote the
estimate of the variance from the j-th sample of size n, and Q_q[S²(n)] to denote the estimate of
the q quantile obtained from a sample of k values of S²_j(n). With this notation, the fundamental
relation derived in the appendix may be written as:

S²_q(n1) = S²_q(n0) (n1/n0)^((2 − α)/α)

The corresponding relation for α would be:

α = 2 log(n1/n0) / [ log(n1/n0) + log( S²_q(n1)/S²_q(n0) ) ]
Fama uses the results from single samples, S²_1(n1) and S²_1(n0), as the estimators of S²_q(n1)
and S²_q(n0). That leads to the estimator used by Fama:

α = 2 log(n1/n0) / [ log(n1/n0) + log( S²_1(n1)/S²_1(n0) ) ]
Fortunately the value of q does not enter the relation, and S²_1(n) is an estimator of the median,
so the calculations can be performed with no further complication. It seems likely that if
independent samples are used in the two estimates the result will have high dispersion. This may
be one reason for the choice of overlapping periods used in the numerical work. It might be
better, however, to use an alternate estimator:
α = 2 log(n1/n0) / [ log(n1/n0) + log( Q_q[S²(n1)] / Q_q[S²(n0)] ) ]
This would require drawing samples of size n1 and determining quantiles of the estimated
variance, then repeating the procedure for samples of size n0. We could choose the samples to
be contiguous segments or we could select them at random, with or without replacement. With a series
of 1600 observations we could, for example, use 160 samples of size 10, 80 samples of size 20,
or 16 samples of size 100. The choice of parameters would depend on the relationship between
the size of the sample and the variance of the estimators. The variance of the median of 16
samples of 10 may well be lower than that of a single sample of 160.
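A minimal sketch of this alternate estimator follows. The details here are assumptions: the function name is mine, the heavy-tailed input is a Student t with 1.5 degrees of freedom standing in for a stable Pareto variable (its tail index is 1.5, so its variance is infinite), and the samples are drawn by random permutation rather than as contiguous segments.

```python
import numpy as np

def alpha_from_quantiles(x, n0, n1, q=0.5, seed=0):
    """Estimate the characteristic exponent from the scaling of a quantile
    of the sample variance: Q_q[S^2(n)] should grow like n^((2-alpha)/alpha)."""
    rng = np.random.default_rng(seed)

    def var_quantile(n):
        k = len(x) // n                      # as many disjoint samples as fit
        perm = rng.permutation(x)[: k * n].reshape(k, n)
        return np.quantile(perm.var(axis=1), q)

    r = np.log(var_quantile(n1) / var_quantile(n0)) / np.log(n1 / n0)
    return 2.0 / (1.0 + r)                   # invert (2 - alpha)/alpha = r

# Heavy-tailed stand-in with tail index 1.5 (infinite variance).
rng = np.random.default_rng(1)
x = rng.standard_t(1.5, size=64000)
print(alpha_from_quantiles(x, n0=100, n1=1600))
```

In this sketch the estimate comes out below 2, consistent with the infinite-variance input; with finite samples it is noisy, which is why the choice of sample sizes and number of samples matters.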
This alternative has other advantages. One is that we could use not just the median, but other
quantiles as well. Thus we could check to see if the estimates based on the first quartile, the
median, and the third quartile are consistent. A second one is that every part of the overall
sample would be included in the estimates, in contrast to Fama's choice, which includes only the
early part of the period in all the estimates with the smaller sample size. A third one is that, at the
cost of reducing the reliability, we can develop measures of stationarity.
These comments also apply to range analysis. That method also relies on sums, and the
requirement is that the elements of the sums be independent stable Pareto variables, not that they
be sequential. Assuming that the increments are stable Pareto, this can be used to assess whether
they are stable over the sample period, since under stationarity the results obtained from sums of
sequential samples of increments should yield the same distribution as those obtained from sums
of randomly selected increments with the same sample size.
7. Summary
In sum, the Fama paper claims a great deal but seems to establish very little on a sound basis.
The claim that estimates of the characteristic exponent determined by three different methods all
lead to values that are predominantly less than two proves nothing in the absence of information
about the potential biases and the variability of the estimates. The fact that the values obtained by
the three methods show no significant positive association raises serious questions about the
contention that the estimates are meaningful. The fact that the observed increments are not
normally distributed could almost certainly be established by using a Lilliefors test (Sachs,
1982).19 That test is sufficiently sensitive that the discrete spike at zero increment caused by
truncation of prices often leads to significance. I have shown that using the Fama methodology it
is possible to conclude, from data based on rounded prices, that the underlying increments
(which could be measured only with continuous recording of prices) are not likely to be normally
distributed. That conclusion, however, relies on the assumption that the parameters are stable.
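The mechanism can be illustrated with a small Monte Carlo version of a Lilliefors-type test. Everything below is an assumption for illustration: the price series is synthetic (a lognormal walk rounded to eighths of a dollar) and the critical values are simulated rather than taken from published tables, so this is a sketch of the argument, not a replication of Fama's data.

```python
import numpy as np
from math import erf, sqrt

def lilliefors_stat(x):
    """Kolmogorov-Smirnov distance between the empirical CDF of x and a
    normal CDF whose mean and standard deviation are estimated from x."""
    x = np.sort(x)
    n = len(x)
    z = (x - x.mean()) / x.std(ddof=1)
    cdf = np.array([0.5 * (1 + erf(v / sqrt(2))) for v in z])
    hi = np.arange(1, n + 1) / n
    lo = np.arange(0, n) / n
    return max((hi - cdf).max(), (cdf - lo).max())

def lilliefors_pvalue(x, n_sim=300, seed=0):
    """Monte Carlo p-value: the null distribution of the statistic is
    simulated from normal samples of the same size."""
    rng = np.random.default_rng(seed)
    d_obs = lilliefors_stat(x)
    d_null = [lilliefors_stat(rng.standard_normal(len(x))) for _ in range(n_sim)]
    return float(np.mean(np.array(d_null) >= d_obs))

# Log-price increments from prices rounded to eighths of a dollar:
# the rounding produces a discrete spike at exactly zero.
rng = np.random.default_rng(1)
price = 40.0 * np.exp(np.cumsum(rng.normal(0.0, 0.01, 1200)))
rounded = np.round(price * 8) / 8
incr = np.diff(np.log(rounded))
print("share of zero increments:", (incr == 0).mean())
print("Monte Carlo Lilliefors p-value:", lilliefors_pvalue(incr))
```

Even though the underlying (unrounded) increments here are exactly normal, the spike at zero created by rounding is enough to make the test reject normality of the recorded increments.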
The analysis of correlation coefficients leads Fama to the conclusion that 11 of the 30 stocks
have correlation coefficients that are more than twice their standard deviation. He dismisses this
on the grounds that the coefficients are small. This dismissal is used to argue that the
characteristic exponent estimates by range analysis are valid even though the author argued that
range analysis will give biased estimates if first order correlation is present in the sample. In
particular, no analysis was performed by Fama to investigate whether the sample correlation
coefficients are related to the values of the characteristic exponent obtained from range analysis.
I have shown that the relation between the two is very strong. Thus the correlation may be
significant at least in the sense that it affects the bias in the estimation procedure.
The “conclusion” that the distributions are stable Pareto with characteristic exponent less than 2
is then used to argue that the two-standard deviations test of the correlation coefficients is
inappropriate because the variance is infinite. We have a circular argument: if the correlations
are not significant the characteristic exponent “is less than 2” and if the characteristic exponent is
less than two the correlations are irrelevant. But if the characteristic exponent were two the
correlations would be measured correctly (at least in the absence of rounding of prices) and
would imply a downward bias in the estimate of the characteristic exponent.
The simulations presented in this paper show that, at least in the case of t-distributed variables of
infinite variance, the threshold of 2 standard deviations is wrong, but because it leads to too few
rejections, not because it leads to too many as hypothesized by Fama. Moreover, they indicate
that even series of 4,000 observations still suffer from this bias. This does not prove that the
same results would be obtained with stable Pareto distributions, but it raises doubts about the
19 It is appropriate to point out that Fama could not have used Lilliefors tests since they were not available in
1965.
validity of Fama's arguments and raises questions about the claim that the number of
observations is large enough.
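The simulation design sketched below is my own stand-in for the one described in the text: iid Student-t series with 2 degrees of freedom (so the marginal variance is infinite), with the lag-1 sample autocorrelation checked against the conventional ±2/√n control limits.

```python
import numpy as np

def rejection_rate(n_obs=1500, n_series=400, df=2.0, seed=0):
    """Fraction of iid Student-t series whose lag-1 sample autocorrelation
    falls outside the conventional +/- 2/sqrt(n) control limits.
    With df=2 the marginal distribution has infinite variance."""
    rng = np.random.default_rng(seed)
    limit = 2.0 / np.sqrt(n_obs)
    hits = 0
    for _ in range(n_series):
        x = rng.standard_t(df, size=n_obs)
        d = x - x.mean()
        r1 = np.dot(d[:-1], d[1:]) / np.dot(d, d)
        hits += abs(r1) > limit
    return hits / n_series

print(rejection_rate())
```

For heavy-tailed iid data the extreme observations inflate the denominator of the sample autocorrelation more than the numerator, so the rejection rate tends to fall below, not above, the nominal 5 percent level, in line with the direction reported in the text.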
The dismissal of autocorrelation on financial grounds is also of concern. While such grounds are
certainly relevant to some arguments, some standard is needed for reaching that decision and its
relevance should be established. Even for financial decisions, the standard should be based not
on whether it is possible to make money on the basis of one item of information at the time, but
whether it is possible to make money considering all the relevant information. No such test is
proposed or conducted. And no argument is made that the ability to make money is necessary for
a valid statistical decision.
Fama remarked that autocorrelation coefficients of randomized differences in the logarithms of
prices “appear to break through their control limits only slightly more often than would be the case if the
underlying distribution of the first differences had finite variance.” On the other hand, the fact
that the autocorrelation coefficients for the ordered differences are significant at the one percent
level for 9 of the 30 stocks suggests that the hypothesis of serial independence is not consistent
with the data. This correlation cannot be dismissed on the grounds that the underlying variables
have infinite variance because that would affect the randomized differences just as much as the
ordered differences. The use of the one percent level was purely conventional. In fact, all the
differences that were significant at the one percent level were also significant at the 0.12 percent
level. Thus there is substantial separation between these and the other 21 stocks. Moreover, the
simulations of the randomness of sample values of correlation coefficients when the underlying
variable has a t-distribution with infinite variance suggest that the analysis based on the normal
distribution undervalues, rather than overvalues the extent of autocorrelation at the 1 and 5
percent levels.
The simulations suggest that we may reject the hypothesis of a stationary normal distribution as
the underlying phenomenon. The issues of rounding of prices and of stationarity are, however,
difficult to address. For A.T. & T., the example given to support the statement that the
distributions of returns in two different periods are “extremely similar,” turns out to give
information that appears to reject stationarity. Thus the rejection must be viewed as tentative
rather than conclusive.
The simulations presented in this paper to illustrate the behavior of sample estimates of
correlation coefficients when the underlying variate has infinite variance suggest another
potentially useful approach to the study of non-normality in the increments of the logarithm of
price in securities markets. Those simulations showed that with prices rounded to $0.125, the
probability that the price will reach zero within a span of 1,500 days is substantial and increases
as the fatness of the tails increases. It seems clear that with the current practice of quoting prices
to the nearest cent this effect will be much smaller. Organized exchanges, however, have rules
that call for delisting of stocks when the price level declines to levels of about $1.00. Hence the
frequency of delisting of stocks could be used to provide insights into the behavior of the
extreme tails.
Finally, the methods as implemented by Fama do not appear to do justice to the underlying
framework. The conclusions could change with better implementation.
Appendix 1:
Estimation of the average and standard deviation of daily returns
from Fama's Table 4
In Table 4 of his paper Fama gives the highest and lowest daily return for each of the stocks,
which I will denote as r_max(i) and r_min(i), respectively, and the corresponding standardized
variables, which I will denote as z_max(i) and z_min(i), respectively.
The standardized variable is given by:

z(i) = ( r(i) − m(i) ) / s(i)

where m(i) and s(i) are the sample mean and standard deviation of the daily return for stock i.
Hence the sample standard deviation can be computed from the relation:

s(i) = ( r_max(i) − r_min(i) ) / ( z_max(i) − z_min(i) )

and the sample average can be obtained from:

m(i) = ( r_min(i) z_max(i) − r_max(i) z_min(i) ) / ( z_max(i) − z_min(i) )
Thus the data in Fama's Table 4 allow us to recover these useful estimates. They are shown
in Table A1.1.20
Given the original data, the largest possible error in the standard deviation is
about 1 unit in the fifth decimal (ranging from 0.000005 to 0.000016); that in the mean is
between 1 and 2 in the fourth decimal (ranging from 0.000156 to 0.000175). The number of
digits given in the table attempts to reflect this fact. Errors of that magnitude would occur if the
rounding in the basic numbers given in the original table all attained the maximum with signs
that lead the result in the same direction.
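The recovery can be checked on a single row; the sketch below is plain arithmetic on the A.T.&T. numbers from Fama's Table 4 as reproduced in Table A1.1.

```python
# Recovering the A.T.&T. mean and standard deviation from the extreme
# daily returns and their standardized values.
r_min, r_max = -0.1038, 0.0989
z_min, z_max = -10.342, 9.724

s = (r_max - r_min) / (z_max - z_min)
m = (r_min * z_max - r_max * z_min) / (z_max - z_min)
print(round(s, 5), round(m, 4))   # agrees with the Table A1.1 row: s = 0.01010, m = 0.0007
```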
20 The computed values of mean and variance for the 30 stocks in the Dow Jones Industrial Index have a
Pearson correlation coefficient of -0.490, a Spearman correlation of -0.483, and a Kendall tau of -0.333, all
significantly different from zero at the 0.1 percent level. This is, presumably, simply a result of the
overall period from which the data are derived and the various sample periods used for the individual
stocks. Alternatively, it could be attributed to the way in which stocks for inclusion in the DJI are
determined. In either event, it raises some concern about how representative the sample might be.
Table A1.1
Estimated mean and standard deviation of the daily returns to the Dow stocks
Stock  Smallest return  Largest return  Smallest standardized  Largest standardized  s(i)  m(i)
Allied Chemical -0.0718 0.0838 -5.012 5.820 0.01436 0.0002
Alcoa -0.0531 0.0619 -3.381 3.945 0.01570 -0.0001
American Can -0.0623 0.0675 -5.446 5.853 0.01149 0.0003
A.T.&T. -0.1038 0.0989 -10.342 9.724 0.01010 0.0007
American Tobacco -0.0800 0.0724 -6.678 5.949 0.01207 0.0006
Anaconda -0.0573 0.0600 -3.851 4.015 0.01492 0.0001
Bethlehem Steel -0.0725 0.0620 -5.571 4.748 0.01303 0.0001
Chrysler -0.0805 0.1009 -4.660 5.853 0.01725 -0.0001
Dupont -0.0599 0.0515 -5.843 4.950 0.01032 0.0004
Eastman Kodak -0.0443 0.0779 -3.399 5.832 0.01324 0.0007
General Electric -0.0647 0.0565 -5.135 4.456 0.01263 0.0002
General Foods -0.0468 0.0625 -3.937 5.065 0.01214 0.0010
General Motors -0.0975 0.0829 -7.761 6.547 0.01261 0.0004
Goodyear -0.0946 0.1744 -5.919 10.879 0.01601 0.0002
International Harvester -0.0870 0.0687 -6.299 4.880 0.01393 0.0007
International Nickel -0.0592 0.0567 -4.917 4.628 0.01214 0.0005
International Paper -0.0507 0.0533 -4.219 4.454 0.01199 -0.0001
Johns Manville -0.0687 0.1194 -4.386 7.575 0.01572 0.0003
Owens Illinois -0.0637 0.0606 -5.195 4.881 0.01234 0.0004
Procter & Gamble -0.0635 0.0656 -5.504 5.559 0.01167 0.0007
Sears -0.1073 0.0606 -9.338 5.148 0.01159 0.0010
Standard Oil CA -0.0633 0.0674 -4.793 5.056 0.01327 0.0003
Standard Oil NJ -0.1032 0.1007 -9.275 9.013 0.01115 0.0002
Swift & Co. -0.0675 0.0628 -4.761 4.418 0.01420 0.0001
Texaco -0.0593 0.0545 -4.650 4.193 0.01287 0.0005
Union Carbide -0.0456 0.0394 -4.396 3.783 0.01039 0.0001
United Aircraft -0.1523 0.0849 -8.878 4.939 0.01717 0.0001
US Steel -0.0539 0.0555 -3.968 4.091 0.01357 0.0000
Westinghouse -0.0804 0.0863 -5.415 5.808 0.01485 0.0000
Woolworth -0.0674 0.0896 -5.890 7.743 0.01152 0.0004
The computed values of mean and variance for the 30 stocks in the Dow Jones Industrial
Index have a correlation coefficient of -0.49, significantly different from zero at the 0.2
percent level. This could be a result of the overall period from which the data are derived
and the various sample periods used for the individual stocks, or of the way in which stocks
for inclusion in the DJI are determined. In either event, it raises some concern about how
representative the sample might be of stocks generally.
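The sign and rough size of this correlation can be recomputed from the s(i) and m(i) columns of Table A1.1. The coefficient computed from the rounded table values need not match the quoted -0.49 exactly, since that was presumably computed from unrounded values (and from the variance rather than the standard deviation).

```python
import numpy as np

# s(i) and m(i) as read from Table A1.1 (30 Dow stocks, same order).
s = np.array([0.01436, 0.01570, 0.01149, 0.01010, 0.01207, 0.01492, 0.01303,
              0.01725, 0.01032, 0.01324, 0.01263, 0.01214, 0.01261, 0.01601,
              0.01393, 0.01214, 0.01199, 0.01572, 0.01234, 0.01167, 0.01159,
              0.01327, 0.01115, 0.01420, 0.01287, 0.01039, 0.01717, 0.01357,
              0.01485, 0.01152])
m = np.array([0.0002, -0.0001, 0.0003, 0.0007, 0.0006, 0.0001, 0.0001,
              -0.0001, 0.0004, 0.0007, 0.0002, 0.0010, 0.0004, 0.0002,
              0.0007, 0.0005, -0.0001, 0.0003, 0.0004, 0.0007, 0.0010,
              0.0003, 0.0002, 0.0001, 0.0005, 0.0001, 0.0001, 0.0000,
              0.0000, 0.0004])

r = np.corrcoef(s, m)[0, 1]
print(round(r, 3))   # negative, in the neighborhood of the quoted -0.49
```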
A third, and more disturbing, possibility is that the correlation is a manifestation of
skewness. For distributions of finite variance with third central moment μ3, the expected
value of the covariance between sample values of the mean and standard deviation based on
n points is given approximately by:21

E[(m − μ)(s − σ)] ≈ μ3/(2σn)

This obviously goes to zero as the sample size increases, but so do the variances of the first
and second moments. As a result:

corr(m, s) ≈ [μ3/(2σn)] / sqrt[(σ²/n)(σ²/2n)] = μ3/(√2 σ³)

so the expected correlation does not vanish as the sample size grows.
Thus it would be possible to find a significant negative correlation if most of the stocks had
negative skewness. I believe this hypothesis can be disposed of by observing that negative
skewness would imply that the largest standardized deviations in the negative direction
should have, on average, larger absolute values than those in the positive direction; in the
data from the same table the difference in absolute values is 0.033 (with the positive
deviation being the larger), with a standard deviation of 1.75.
21 See, for example, Cramer, 1974.
Appendix 2:
The A.T.&T. data and related problems
The data on returns to A.T.&T. stock are the only basis for examining more closely Fama's
contention that the series are stationary. This is important because if the parameters of the
distribution of daily increments change over time then tests that cover the whole period can
give misleading results. My conclusion based on the data is that, at least for this stock, there is
substantial evidence of non-stationarity. It is important, however, to point out that I encountered
a number of problems with the data on A.T.&T. given in Fama's paper, and these may affect the
conclusion.
One minor issue is the number of observations. A second is that the average daily return over
the whole period given on page 58 is not consistent with the data given in Table 4, page 51.
Finally, there is also reason to doubt the accuracy of Figure 4 of Fama's paper. The purpose of
this appendix is to point these out, since they might affect the analysis.
In his Table 3, Fama gives the number of observations on A.T.&T. increments as 1,219. On page
58 he gives the earliest date for A.T.&T. as 11/25/1957. He also gives the last date for which
data are available as 9/26/1962. Data on the Dow Jones Industrial Index obtained from the
internet have 1,218 closing prices between the two dates given. The maximum number of
increments would therefore be 1,217. The discrepancy could easily be a typographical error.
Of the five stocks examined he reports some details for only one, A.T.&T. Even for that one the
details are so fragmentary that it is difficult to determine anything with certainty. On page 58
Fama gives the information that for the period between 11/25/1957 and 12/11/1961 the average
daily return to A.T.&T. was 0.00107, between 12/11/1961 and 9/24/1962 it was −0.00061, and
for the whole period it was 0.000652. This suggests that the segments might have been selected
for having different average rates of return but, since the standard deviations of the returns in the
two periods are not given it is not possible to establish that these two averages are significantly
different from each other.
Based on the number of DJI closing values, the first period would have included 1,017
increments and the second one 197. Using these period lengths, the weighted average rate of
return over the whole span would have been 0.000797, not very close to the reported number.
For the three averages to be consistent it would be necessary to have more than 300 trading days
in the second period. The period, however, is less than a year, so it cannot have more than 261
business days. Hence we have an inconsistency.
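The arithmetic behind this inconsistency is easy to verify; the only inputs are the three means reported on page 58 and the DJI-based period lengths derived above.

```python
m1, m2 = 0.00107, -0.00061     # period means reported on page 58
overall = 0.000652             # whole-period mean reported on page 58
n1, n2 = 1017, 197             # increments implied by the DJI closing dates

# Weighted average implied by the period means and lengths.
weighted = (n1 * m1 + n2 * m2) / (n1 + n2)
print(round(weighted, 6))      # about 0.000797, not the reported 0.000652

# Second-period length needed to make the three averages consistent.
n2_implied = n1 * (m1 - overall) / (overall - m2)
print(round(n2_implied))       # more than 300 days, impossible in under a year
```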
The computed weighted average from the data on the segments is within the estimated error of
the average inferred from the data in Fama's Table 4 and presented in Appendix 1; this suggests
that a typographical error may be involved in the value for the whole period.22
Without a consistent set
of numbers it becomes hopeless to try to estimate from the numbers what the standard deviation
might have been in the two segments. If the error is in the overall average it would have no effect
on the computation of the adjusted maximum differences between the cumulative distributions
functions of the first and second periods. If, however, the error is in the average return of one or
both of the shorter periods it is possible that the inconsistency contributes to making the two
periods appear more different than they are.
I cannot vouch for the accuracy of the figures. As an example, it can readily be seen that the third
panel of Figure 4 in Fama's paper shows two distinct points at returns of approximately −0.0226
and −0.0244 and cumulative probabilities, respectively, of about 2.52 and 2.59 percent. That
implies a sample size of over 1,400, but since the period involved was less than a year the
sample size simply could not have been more than 300. Similarly, a point at a return of
approximately −0.050 appears in both the A.T.&T. panel of Fama‟s Figure 2 and the top panel of
his Figure 4, but does not appear in either of the two bottom panels, in which the lowest return is
at −0.037 for the first period and −0.044 for the second. These two are the second and third
lowest returns in the top panel of Figure 4 and the corresponding panel of Figure 2. This problem
is not so apparent because the scales used for the abscissas are not the same, but as I worked on
the issue of stationarity it quickly came to notice. The effect of these two problems on
Figures 9 and 10 in the text cannot be of major import. The first problem can affect the scaled
differences by no more than 1/1,400. Even if the point at −0.050 was omitted in the calculation
of the frequencies, it can affect the scaled differences by at most 1/200. Thus the combined effect
can be no more than six tenths of one percent, compared to a gap of some 8 percent between the
observed maximum difference of 20 percent and the critical difference of no more than 12
percent.
22 The estimate obtained from Table 5 also requires more days in the second period, but error analysis indicates
that a value as high as 0.00085 would be consistent with the data. A value that high would require fewer days
in the second period than the actual number of trading days (118).
References
Ball, C., 1988, “Estimation Bias Induced by Discrete Security Prices,” Journal of Finance,
vol. 43, pp. 841-865.
Campbell, J.Y., A.W. Lo, and A.C. MacKinlay, 1997, The Econometrics of Financial Markets,
Princeton University Press, Princeton, NJ.
Cho, D. and E. Frees, 1988, “Estimating the Volatility of Discrete Stock Prices,” Journal of
Finance, vol. 43, pp. 451-466.
Cramer, H., 1974, Mathematical Methods of Statistics, Princeton University Press, Princeton,
NJ.
Fama, E.F., 1965, “The Behavior of Stock Market Prices,” Journal of Business, vol. 38,
pp. 34-105.
Gottlieb, G. and A. Kalay, 1985, “Implications of the Discreteness of Observed Stock Prices,”
Journal of Finance, vol. 40, pp. 135-154.
Harris, L., 1990, “Estimation of Stock Variances and Serial Covariances from Discrete
Observations,” Journal of Financial and Quantitative Analysis, vol. 25, pp. 291-306.
Kendall, M.G., 1970, Rank Correlation Methods, Hafner Press, New York.
Sachs, L., 1982, Applied Statistics: A Handbook of Techniques, Springer-Verlag, New York,
NY.
Snedecor, G.W. and W.G. Cochran, 1967, Statistical Methods, The Iowa State University
Press, Ames, IA.
Venezian, E., 2011, “Effects of Rounded Prices on the Estimation of the Parameters of the
Pricing Process,” paper presented at the 19th Annual Conference on Pacific Basin Finance,
Economics, Accounting, and Management, Taipei, Taiwan.