
Cross-Sectional Regressions in Event Studies

Jim Musumeci

Department of Finance, MOR 107

Bentley University

Waltham, MA 02452

[email protected]

781.891.2235

Mark Peterson

Department of Finance, Rehn 134A

Gordon and Sharon Teel Professor of Finance

Southern Illinois University

Carbondale, IL 62901-4626

[email protected]

618.453.1426

Current Draft: September, 2015

The authors are grateful to Claude Cicchetti, Marcia Millon Cornett, Dhaval Dave, Otgo

Erhemjamts, Atul Gupta, Kartik Raman, Len Rosenthal, Richard Sansing, and Aimee Smith for

their helpful comments. The usual disclaimer applies.



Abstract

Christie [1987] demonstrated that when regressing abnormal returns on various firm

characteristics, the “correct deflator [for those firm characteristics] is the market value of

equity at the beginning of the period.” Despite this, many researchers deflate a variable of

interest by total assets, and then add leverage, a ratio of book equity to market equity (or its

reciprocal), or other independent variables. We show that such a method regularly produces

relationships that appear to be statistically significant, but which in fact are spurious and

attributable only to a mathematical artifact, not to any causal effects.


“You keep using that method. I do not think it

measures what you think it measures.”—with

apologies to Inigo Montoya and William Goldman.

I. Introduction

Most modern event studies test not only whether average abnormal return is equal to zero,

but also how abnormal returns are related to firm characteristics. For example, if the event

under consideration pertains to the mortgage crisis, it is natural to conjecture that the firm’s

response to the event is larger when it owns more mortgages. If V denotes the value of the

presumably affected assets, TA the value of total assets, ME the market value of the firm’s

equity, D the value of the firm’s debt, and ∆ denotes “change in,” and if everything else is held

constant, then the fundamental accounting identity Total Assets = Total Liabilities +

Stockholders’ Equity tells us to expect ∆V = ∆TA = ∆D + ∆ME. Later in the paper we consider the

implications of risky debt, but for now we assume the firm’s debt is riskless, in which case ∆D =

0 and so ∆V = ∆ME. It is often difficult to measure ∆V on a regular basis, and so we typically

assume it is proportional to V, or equivalently that ∆ME = kV for some presumably unknown

constant k. A typical null hypothesis would be that k = 0; the alternative might be k ≠ 0, k > 0, or

k < 0, depending on context.

A direct ordinary least squares (OLS) regression of ∆ME on V is inappropriate for at least a

couple of reasons. First, a heteroscedasticity problem likely exists, with larger firms having

larger error terms. While under this condition OLS will still produce unbiased estimates, these
estimates will no longer have minimum variance within the set of linear unbiased estimators, and tests of
statistical significance will be unreliable. Second, both ∆ME and V are generally related to

firm size, in which case we will find a significant relationship even if we mistakenly choose V to

be some irrelevant variable that also tends to increase with firm size, i.e., almost any balance

sheet or income statement item. To eliminate these problems, we typically scale ∆ME and V by

some variable related to size before we estimate our regression. Christie [1987] showed that
the dependent variable, abnormal return, is essentially a measure of $\Delta ME/ME$, and so consistency
implies we need to scale V by ME as well. Normalizing V instead by Total Assets spreads the
gain (or loss) from V equally across all the firm’s claimants, which is not what $\Delta ME/ME$ does. This
necessarily produces extraneous noise and weaker tests unless there exists a universal constant
c such that ME = c∙TA for each firm in the sample.
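The size problem described above can be illustrated with a small simulation. This is a minimal sketch on synthetic data, not the Compustat sample: the lognormal size distribution, the 5% average return, and the helper name `ols_t` are all assumptions made for illustration. The variable V is constructed to be unrelated to returns, yet the unscaled regression should find it highly significant because both sides grow with firm size.

```python
import numpy as np

def ols_t(y, *regressors):
    """OLS with an intercept; returns (coefficients, t-statistics)."""
    X = np.column_stack([np.ones_like(y), *regressors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    cov = (resid @ resid / dof) * np.linalg.inv(X.T @ X)
    return beta, beta / np.sqrt(np.diag(cov))

rng = np.random.default_rng(5)
n = 150
me = rng.lognormal(0.0, 1.0, n)        # hypothetical market equity (firm size)
ret = rng.normal(0.05, 0.02, n)        # proportional change in equity, unrelated to V
share = rng.uniform(0.1, 0.5, n)       # irrelevant item as a fraction of size

d_me = me * ret                        # dollar change in market equity
v = me * share                         # irrelevant balance-sheet item in dollars

_, t_raw = ols_t(d_me, v)              # unscaled: both variables driven by size
_, t_scaled = ols_t(d_me / me, v / me) # scaled by ME: the size effect is removed

print("t-statistic on V, unscaled:", round(float(t_raw[1]), 2))
print("t-statistic on V/ME, scaled:", round(float(t_scaled[1]), 2))
```

In this design the scaled regression relates two independent quantities, so its t-statistic should be unremarkable, while the unscaled one reflects nothing but common dependence on size.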

Christie formally demonstrated that use of normalizing variables other than the market

value of equity results in misspecification problems, but did not elaborate on the nature or
severity of the misspecification. We demonstrate that the commonly used variables $V/TA$, $D/TA$, and
$BE/ME$ (or $ME/BE$) will usually produce misleading statistical significance. Moreover, we show that

some seemingly reasonable (and some deliberately unreasonable) choices of variables produce

not only spuriously significant relations, but even contradictory ones.

II. Modern Practice in Cross-Sectional Event-Study Analysis

Despite Christie’s admonition, few modern event studies scale their independent variables

by market equity. To identify the false inferences to which alternative methods can lead, we

consider a hypothetical unexpected change in tax law that would lead to lower taxes associated

with greater cash and short-term receivables (Compustat item DATA1; henceforth “cash”). A

typical hypothesis might be that, ceteris paribus, ∆ME = ∆V = kV for some positive constant k,

where V in this case denotes cash. We select a random sample of 150 firms¹ from the 2012

Compustat database and initially assume this relation holds exactly for all the firms in the

sample. In practice, of course, error terms exist, but if an empirical method works poorly under

perfect conditions with no error terms, then it cannot be expected to perform well when noise

is present. Nevertheless, after the main points are established, we consider the more realistic

case in which ∆ME = ∆V does not hold exactly, and we find similar results.

To make the main point, we assume the firm’s gain in market value is 5% of the cash

balance, and that this entire increase in value accrues to equityholders. Thus ∆ME = .05V =

.05Cash, and so dividing both sides by ME gives us

$$\frac{\Delta ME}{ME} = \frac{.05V}{ME} = \frac{.05\,Cash}{ME}. \qquad [1]$$

Without loss of generality, we assume that the expected return of each firm conditioned on the

market return is zero, in which case the abnormal return, AR, exactly equals $.05\,Cash/ME$. For this
sample, parameter estimation for the cross-sectional regression Christie shows to be correct,
$AR = \alpha + \beta\,Cash/ME$, would produce $\hat{\alpha} = \alpha = 0$ and $\hat{\beta} = \beta = .05$ by construction.
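A minimal sketch of this deterministic case, using synthetic lognormal draws in place of the Compustat sample (the distribution parameters are hypothetical), confirms that the correctly scaled regression recovers an intercept of zero and a slope of .05 up to floating-point error:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 144                                            # sample size after the exclusions
cash = rng.lognormal(mean=3.0, sigma=1.0, size=n)  # hypothetical cash balances
me = rng.lognormal(mean=5.0, sigma=1.0, size=n)    # hypothetical market equity

ar = 0.05 * cash / me                              # equation [1] holds exactly

# Regress AR on a constant and Cash/ME.
X = np.column_stack([np.ones(n), cash / me])
(alpha_hat, beta_hat), *_ = np.linalg.lstsq(X, ar, rcond=None)

print("alpha_hat:", alpha_hat)
print("beta_hat:", beta_hat)
```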

A modern researcher unaware that the true state of nature is given by [1] might well

estimate a cross-sectional regression with commonly used variables, specifically,

$$AR = \alpha + \beta_1\frac{Cash}{TA} + \beta_2\frac{D}{TA} + \beta_3\frac{BE}{ME} + \varepsilon. \qquad [2]$$

1 We excluded financials (SIC codes 6000-6900), utilities (SIC codes 4900-4999), and firms with negative book
equity, leaving us with 144 firms. The latter exclusion was made because $BE/ME$ is a meaningless statistic when book
equity is less than zero.


Christie shows [2] is misspecified, yet regressions like this make frequent appearances in the

literature. One motivation to use them is that they appear to separate the event’s effects into

component parts. The question we address here is largely empirical in nature: do such

regressions really explain the components of abnormal return, and how should we interpret the

results?

We estimate [2]’s parameters for our sample and find [with t-values in brackets]

$$AR = -.009 + .039\,\frac{Cash}{TA} + .014\,\frac{D}{TA} + .010\,\frac{BE}{ME}. \qquad [3]$$
[-3.46] [8.14] [3.45] [5.59] Adj R² = .351

The regression appears to identify several different firm characteristics that are strongly related

to AR. Proponents of this technique would presumably infer from the significant t-values that

firms’ abnormal returns are increasing in each of $Cash/TA$, $D/TA$, and $BE/ME$.

If no firms in the sample had outstanding preferred stock, then $D/TA = 1 - BE/TA$. Because $D/TA$ and
$BE/TA$ would have identical variances and a correlation of -1 with each other, and have correlations
with other variables that are identical except for sign, substituting $BE/TA$ for $D/TA$ would change only
the value and t-statistic of the intercept $\hat{\alpha}$ and the signs (but not the magnitudes²) of $\hat{\beta}_2$ and,
most importantly, the t-statistic of $\hat{\beta}_2$. In general, however, some firms will have outstanding
preferred stock, and so $BE/TA$ and $D/TA$ will have a strong negative correlation, but not a perfect one.

This is sufficient for their tests of significance to be very similar:

$$AR = .004 + .043\,\frac{Cash}{TA} - .019\,\frac{BE}{TA} + .011\,\frac{BE}{ME}. \qquad [4]$$
[2.18] [9.02] [-4.83] [6.23] Adj R² = .396

Compared with the results of [3], [4]’s larger R2 and t-values for the coefficients seem to

suggest it is a more powerful and therefore presumably a better test.

We next check what happens if we alter the form of the model by substituting for each

independent variable its logarithm instead. Estimating the parameters of that regression gives

us

2 If $r_2$ is substituted for $r_1$ as an independent variable, a perfect correlation is necessary and sufficient to ensure that the magnitude of the slope’s t-statistic remains unchanged. In general, however, the coefficient itself may change (for example, changing the units of measure of any independent variable from dollars to thousands of dollars in any regression will result in identical t-values, but coefficients that are different by a factor of 1000). What guarantees in this case that the magnitude of the coefficient also remains unchanged is that the absolute value of the coefficient of $Debt/TA$ in $BE/TA = a + b\,Debt/TA$ is 1.


$$AR = .019 + .004\ln\left(\frac{Cash}{TA}\right) - .006\ln\left(\frac{BE}{TA}\right) + .007\ln\left(\frac{BE}{ME}\right). \qquad [5]$$
[12.92] [9.88] [-6.62] [7.60] Adj R² = .449

Compared with [4], the larger t-values and R2 of [5] would seem to suggest it is a more powerful

test, and therefore a better one.

Finally, to see if we can improve on this some more, we take the log of the abnormal

returns as well.³ This gives us the following result:

$$\ln(AR) = -2.996 + 1.00\ln\left(\frac{Cash}{TA}\right) - 1.00\ln\left(\frac{BE}{TA}\right) + 1.00\ln\left(\frac{BE}{ME}\right). \qquad [6]$$
[∞] [∞] [-∞] [∞] Adj R² = 1.0

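To see how such extreme statistics are possible, we exponentiate each side of [6], giving us

$$AR = \frac{\Delta ME}{ME} = e^{-2.996}\left(\frac{Cash}{TA}\right)\left(\frac{BE}{TA}\right)^{-1}\left(\frac{BE}{ME}\right), \text{ or } \frac{\Delta ME}{ME} = .05\left(\frac{Cash}{ME}\right), \qquad [7]$$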

which is precisely equation [1]. The regression results from [3] appeared to suggest Total

Assets and Book Equity were factors that contribute to an explanation of abnormal returns, but

they are absent from [7]. While it is not necessarily incorrect to essentially add and then

subtract the effects of Total Assets and Book Equity (as regression [6] basically does), Occam’s

Razor implies it is inappropriate (and somewhat misleading) to do so. The results of [3]—[6] are

summarized in Table 1.
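The perfect fit in [6] can be reproduced on any positive data, because the three log ratios sum identically to ln(Cash/ME). The sketch below is an illustration on synthetic lognormal draws with hypothetical parameters (so Book Equity is not constrained to be below Total Assets, as it would be in real data); only Cash and ME enter the construction of AR.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 144
# Hypothetical positive firm attributes; TA and BE play no role in AR.
cash, ta, be, me = (rng.lognormal(mean=m, sigma=1.0, size=n)
                    for m in (3.0, 6.0, 5.0, 5.5))
ar = 0.05 * cash / me                              # equation [1]

# Regression [6]: ln(AR) on ln(Cash/TA), ln(BE/TA), ln(BE/ME).
X = np.column_stack([np.ones(n),
                     np.log(cash / ta),
                     np.log(be / ta),
                     np.log(be / me)])
coef, *_ = np.linalg.lstsq(X, np.log(ar), rcond=None)
resid = np.log(ar) - X @ coef

print("coefficients:", np.round(coef, 3))          # intercept should be ln(.05)
print("largest absolute residual:", np.abs(resid).max())
```

The fitted coefficients should equal ln(.05), 1, -1, and 1, with residuals of zero up to rounding, even though TA and BE were drawn independently of AR.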

Assuming [7] (or, equivalently, [1]) holds, the coefficients in regression [2] will typically

show up as significant. The basic principles in play here are that if two candidates for an

independent variable have a perfect correlation, use of either will produce identical magnitudes

for that variable’s t-statistic and for the overall R², and when two variables have a large
correlation with each other, either will produce very similar t-statistics and R² values.⁴ In our

specific context, all that is required is that the proportion of the firm that is financed with

preferred stock is relatively constant across firms, and that the correlation between each of the

independent variables and its logarithm is fairly large. Even this second condition is not

3 This requires that all the abnormal returns be positive, but by assumption ∆ME = kV > 0, so this condition is met. Similarly, if ∆ME = kV < 0, we can apply the same analysis to the variables -∆ME = -kV > 0. The more realistic case in which AR may have different signs for different firms is addressed after we establish the basic intuition for this deterministic special case.

4 The inverse is not necessarily true, i.e., if two variables have a low correlation with each other, the t-statistics are not necessarily substantially different when one of these variables is substituted for the other. For example, if W1 and W2 are independent and identically distributed and $Y = W_1 + W_2 + \varepsilon$, then the regressions $Y = \hat{\alpha}_1 + \hat{\beta}_1 W_1 + \varepsilon$ and $Y = \hat{\alpha}_2 + \hat{\beta}_2 W_2 + \varepsilon$ will on average produce similar estimates and similar t-statistics for their respective parameters, despite the fact that W1 and W2 are uncorrelated.


necessary for the t-statistics to be misleading; because the correlation between a variable and

its logarithm is always positive, the stated t-statistics will always be distorted away from zero.

It is when this correlation is large, however, that the distortion will be particularly egregious.

Christie showed that a regression like [7] is the correct specification, and regression [6] is

mathematically identical to [7]. Working backwards from [6] to [3] simply substitutes for each

variable another variable that has a high magnitude of correlation with it. Thus the parameters

of equation [3] all show up as significant not because all the independent variables provide

explanatory power, but as a mathematical consequence of the true state of nature given by [7].

While the first and third independent variables, $V/TA$ and $BE/ME$, are at least related to the
correct variable, $V/ME$, the middle independent variable, $BE/TA$, is not.⁵ Any rejection of the null
hypothesis that its coefficient is equal to zero, then, is a form of Type I error. This component
of Type I error can be quite large and is in addition to whatever level is chosen for the
significance level α. To see in greater detail why an irrelevant independent variable can have a

statistically significant coefficient in a multiple regression, we first consider two effects of

multicollinearity.

III. Multicollinearity

The most commonly appreciated aspect of multicollinearity is that it can obscure a true

relationship between the dependent variable and one or more independent variables, leading

to less powerful tests. For example, consider a dependent variable Y, independent variables W1

and W2, and “building blocks” Zi, where the Zi are independent of each other. If

Y = Z1 + Z2

W1 = Z1 + Z3

and W2 = Z1 + Z4,

then simple regressions of Y on either W1 or W2 are likely to reveal significant coefficients

provided the sample is sufficiently large or the variances of Z2, Z3, and Z4 are sufficiently small

relative to that of Z1. A multiple regression of Y on W1 and W2 is problematic, though, because

while an F-test will reveal that at least one of W1 or W2 is related to Y, it is difficult to ascertain

which of W1 or W2 is the culprit.⁶

5 If $BE/TA$ happens to be correlated with $V/ME$, it may appear to be significant in a simple regression. However, by construction, its significance would disappear in a multivariable regression with both variables.

6 In the belief that orthogonalization solves this multicollinearity problem, some researchers first orthogonalize one of the independent variables, say W1, by regressing it on another, W2, with which it is highly correlated, and


A less-appreciated consequence of multicollinearity is that it may give the appearance of a

significant relationship, even when none exists. For example, consider

Y = Z1 + Z2

W1 = Z1 + Z3

and W2 = Z3.

Now a simple regression of Y on W1 will typically produce a significant coefficient as before, but

Y is independent of W2 and the results of that simple regression will reveal this. However, W1

and W2 have a positive correlation because of the common component Z3, and in a multivariate

regression featuring W1 and W2 as independent variables, W2 will have the effect of “cleaning

up” the noise induced by the presence of Z3 as a component of W1.⁷ Specifically, we will find
that for the parameters of the sample regression line

$$Y_i = \hat{\alpha} + \hat{\beta}_1 W_{1,i} + \hat{\beta}_2 W_{2,i} + e_i$$

$\hat{\beta}_1 \to 1$ and $\hat{\beta}_2 \to -1$ as sample size increases. However, W2 is not in any way related to Y, and

its coefficient appears to be significant not because of any direct relationship with Y, but

because it counteracts the Z3 noise term in W1.
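The “cleaning up” effect is easy to reproduce by simulation. The following is a minimal sketch assuming standard normal building blocks and W2 = Z3; the sample size, seed, and helper name `ols_t` are our own illustrative choices.

```python
import numpy as np

def ols_t(y, *regressors):
    """OLS with an intercept; returns (coefficients, t-statistics)."""
    X = np.column_stack([np.ones_like(y), *regressors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    cov = (resid @ resid / dof) * np.linalg.inv(X.T @ X)
    return beta, beta / np.sqrt(np.diag(cov))

rng = np.random.default_rng(2)
n = 5000
z1, z2, z3 = rng.standard_normal((3, n))   # independent building blocks

y = z1 + z2
w1 = z1 + z3
w2 = z3                                    # shares only noise with W1, nothing with Y

coef_simple, t_simple = ols_t(y, w2)       # simple regression of Y on W2
coef_multi, t_multi = ols_t(y, w1, w2)     # multiple regression on W1 and W2

print("simple regression, slope on W2:", round(float(coef_simple[1]), 3))
print("multiple regression, slopes on W1 and W2:", np.round(coef_multi[1:], 3))
print("multiple regression, t-statistic on W2:", round(float(t_multi[2]), 1))
```

The simple regression should show essentially no relation between Y and W2, while in the multiple regression the slope on W2 should be close to -1 with a very large t-statistic, because W1 - W2 = Z1.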

The havoc that can be created by the introduction of superfluous independent variables is

reminiscent of one version of Griliches’ Law: “Any cross-sectional regression with more than

five variables produces garbage.”⁸ It is for this reason that estimating a series of simple
regressions as a complement to any multiple regression is desirable. When a multiple
regression suggests an independent variable is significant, it is important to know whether the
significance is due to a direct effect on the dependent variable, or to a reduction of noise in one
or more other independent variables; simple regressions are not a panacea,⁹ but they can give

us some indication of this. Additionally, it is a good idea to have a solid model a priori that is

then replace W1 in a multiple regression with the residuals e1 from the regression $W_1 = \hat{\alpha} + \hat{\beta}W_2 + e_1$. Mitchell [1991], however, shows that this does not solve the multicollinearity problem at all. The coefficient and t-statistic for e1 in the ensuing multiple regression are identical to what they would have been for W1, while those for W2 are the same as they would have been in a simple regression without W1 or e1.

7 We are hardly the first to make this observation that superfluous independent variables may spuriously appear to be significant if they have this “cleaning up” effect, or even to use that expression. Griliches and Wallace [1965] note the same possibility in their footnote 7.

8 While this is the version attributed to Griliches on p. 28 of McCloskey’s The Writing of Economics, the only version we have been able to track down is “any time series regression containing more than four independent variables results in garbage” in Griliches’ comments on p. 335 of Intriligator and Kendrick [1974].

9 For example, if Y = Z1 + Z2 + Z3, W1 = Z1, and W2 = Z2, and if the variance of Z1 is large relative to those of the other Zi, then W2 does have some explanatory power (through Z2) that might be revealed in a multiple regression, but might not be apparent in a simple regression. The reason is that, if the variance of Z1 is sufficiently large, then $Y = \alpha + \beta W_2 + \varepsilon$ will have a great deal of noise (due to the presence of Z1 in the error term), while $Y = \alpha + \beta_1 W_1 + \beta_2 W_2 + \varepsilon$ provides a more powerful test (because Z1 will be removed from the error term).


used as a basis for including an independent variable, rather than to add variables because they

might produce better results.

IV. A Reconsideration of Cross-Sectional Regressions

There is nothing special about the choice of Total Assets and Book Equity as scaling

variables in Section II. We can obtain similar results for most strictly positive firm

attributes. In this section we examine portfolios with a variety of possible Compustat values in

addition to Book Equity and Total Assets. For now, we continue to assume that abnormal

returns are deterministic, specifically, $AR = .05\,Cash/ME$. For each such pair of Compustat values,

which we designate X1 and X2, we estimate parameters of the sequence of regressions

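$$AR = \alpha + \beta\,\frac{X_1}{X_2} + \varepsilon \qquad [8]$$

$$AR = \alpha + \beta\ln\left(\frac{X_1}{X_2}\right) + \varepsilon \qquad [8a]$$

$$AR = \alpha + \beta_1\frac{Cash}{X_1} + \beta_2\frac{X_1}{X_2} + \beta_3\frac{X_2}{ME} + \varepsilon \qquad [9]$$

$$AR = \alpha + \beta_1\frac{Cash}{X_1} + \beta_2\ln\left(\frac{X_1}{X_2}\right) + \beta_3\frac{X_2}{ME} + \varepsilon \qquad [10]$$

$$AR = \alpha + \beta_1\ln\left(\frac{Cash}{X_1}\right) + \beta_2\ln\left(\frac{X_1}{X_2}\right) + \beta_3\ln\left(\frac{X_2}{ME}\right) + \varepsilon \qquad [11]$$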

In an effort to find ratios $X_1/X_2$ that would be unrelated to our portfolios’ dependent variable,
$.05\,Cash/ME$, we formed ratios with numerators and denominators taken from thirteen Compustat
database variables from 1995 to 2014.¹⁰ This gave us 156 ratios, and for our entire dataset we
found the correlation between $.05\,Cash/ME$ and each ratio $X_1/X_2$. We selected the eight ratios that had
the lowest correlations within the entire database with the expectation that they would also
have the lowest correlations within the portfolios of 150 firm-years, and so in a simple
regression might be expected to reject a null hypothesis of zero slope for $X_1/X_2$ (or its log) at a

10 The variables (with Compustat Data Item numbers) are Total Receivables (DATA2), Total Inventories (DATA3), Total Current Assets (DATA4), Total Current Liabilities (DATA5), Total Assets (DATA6), Gross PP&E (DATA7), Net Sales (DATA12), Operating Income Before Depreciation (DATA13), Depreciation and Amortization (DATA14), Interest Expense (DATA15), PP&E Capital Expenditures (DATA30), Cost of Goods Sold (DATA41), Accounts Payable (DATA70). In all cases we trimmed from any regression any observation that had a negative component of a ratio.


frequency equal to the significance level.¹¹ Thus we would expect any statistical significance of
$X_1/X_2$ in a multiple regression to be solely attributable to the “cleaning up” effect described in

Section III.
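The sequence [8] through [11] can be sketched as a small Monte Carlo experiment. What follows is an illustration on synthetic data, not a replication of Table 2: Cash, X1, X2, and ME are drawn as independent lognormals with hypothetical parameters, the 5% critical value is approximated by 1.98, and the names `ols_t` and `middle_t_stats` are our own. By construction X1 and X2 carry no information about AR.

```python
import numpy as np

def ols_t(y, *regressors):
    """OLS with an intercept; returns (coefficients, t-statistics)."""
    X = np.column_stack([np.ones_like(y), *regressors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    cov = (resid @ resid / dof) * np.linalg.inv(X.T @ X)
    return beta, beta / np.sqrt(np.diag(cov))

def middle_t_stats(rng, n=150, sigma=0.5):
    """t-statistics on X1/X2 (or its log) in regressions [8], [9], [10], [11]."""
    cash, x1, x2, me = rng.lognormal(0.0, sigma, size=(4, n))
    ar = 0.05 * cash / me                  # deterministic abnormal return
    ratio = x1 / x2                        # irrelevant by construction
    t8 = ols_t(ar, ratio)[1][1]
    t9 = ols_t(ar, cash / x1, ratio, x2 / me)[1][2]
    t10 = ols_t(ar, cash / x1, np.log(ratio), x2 / me)[1][2]
    t11 = ols_t(ar, np.log(cash / x1), np.log(ratio), np.log(x2 / me))[1][2]
    return t8, t9, t10, t11

rng = np.random.default_rng(3)
t_stats = np.array([middle_t_stats(rng) for _ in range(500)])

CRIT = 1.98  # approximate two-sided 5% critical value for about 146 degrees of freedom
positive_rejection_rates = (t_stats > CRIT).mean(axis=0)

for label, rate in zip(("[8]", "[9]", "[10]", "[11]"), positive_rejection_rates):
    print(label, "positive rejection rate:", round(float(rate), 3))
```

The exact rates depend on the assumed distributions, but the simple regression [8] should reject only rarely, while the all-log specification [11] should reject almost always, since ln(Cash/ME) is exactly the sum of its three log regressors.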

Table 2 shows the results of 1000 portfolios of 150 firm-years each. We analyzed each

portfolio using actual ratios described above (the eight having the lowest correlation with

Cash/ME, plus X1 = BE, X2 = TA and X1 = TA, X2 = BE¹²). Each panel shows the progression for

one ratio from simple regression to a multivariate regression featuring logs of all three

independent variables.

For example, Panel A of Table 2 features X1 = Operating Income before Depreciation =

OIBDP and X2 = Capital Expenditures = CapEx, and is fairly typical. In the simple regression of

the first row, we reject the null hypothesis that the coefficient of $X_1/X_2$ is zero 11.2% of the time.
This is substantially greater than the 5% we were expecting based on the fact that we chose this
ratio because of its low correlation with $Cash/ME$ for the full sample from which the portfolios were
drawn. It is not clear why this is so much greater than 5%, but it may be due to the fact that
while such tests are asymptotically well-specified, they may not be well-specified in samples of
only 150. Whether we use $OIBDP/CapEx$ or $\ln(OIBDP/CapEx)$ does not make a large difference in overall
rejection rates of H0: β = 0, except that the overall rejection rate of 11.2% for $OIBDP/CapEx$ featured 11.1%
with a positive and significant t-statistic, and only .1% with a negative and significant t-statistic,
while the overall rejection rate of 11.9% for $\ln(OIBDP/CapEx)$ was more symmetrically distributed, with
a 5.8% frequency of positive rejections and a 6.1% frequency of negative rejections. Because
the distortion in the coefficient of $X_1/X_2$ (in this case, $OIBDP/CapEx$) or its log described in Sections II and
III is positive, we focus mainly on the rejections of H0 due to positive and significant t-statistics
when we consider the multiple regressions. In the first multiple regression [9] featuring $Cash/OIBDP$,
$OIBDP/CapEx$, and $CapEx/ME$ as independent variables, we find the positive and significant rejection rate for
the coefficient of $OIBDP/CapEx$ has more than doubled (to 23.7% from 11.1%) when compared with
that of the simple regression [8]. When we proceed to an identical regression [10] except $OIBDP/CapEx$
is replaced by $\ln(OIBDP/CapEx)$, this rejection rate again more than doubles, from 23.7% to 61.0%.

Finally, when we take the log of each of the three independent variables [11], this rejection rate

increases from 61.0% to 99.3%, or nearly all the time. Thus, in a fashion analogous to the

11 Alternatively, we can think of these as the eight ratios with the largest p-values. The smallest p-value of this set of eight was depreciation/total inventory, with a p-value of .8259 and a correlation of only -0.00111.

12 Note that in Section II, we always used TA in the denominator to conform to common usage. Here, to be consistent with the other eight pairs of ratios, we let it appear in numerator or denominator, depending on whether it is X1 or X2.


progression from regression [4] to [5] in section II, and consistent with the framework of

Section III, we find that a variable that was selected because it had no apparent effect on the

dependent variable appears to be significant in a multiple regression (with logs) almost all the

time. As before, this is not because the variable is conveying information on its own, but rather

because it is “cleaning up” the error created by weak choices for the other two independent

variables.

Panel B features the same two variables, but in the opposite order—X1 = Capital

Expenditures and X2 = Operating Income before Depreciation—and contains another surprising

result. All five rows feature rejection rates that are fairly similar to those of Panel A. Why is

this surprising? Because we would expect any dependent variable that is increasing in $OIBDP/CapEx$
(or its log) to be decreasing in its reciprocal, $CapEx/OIBDP$ (or its log). However, that is not what the
penultimate multivariate regressions of Panels A and B show. When using $\ln(OIBDP/CapEx)$ as the
middle independent variable in Panel A, 61% of the time we find a positive and significant
t-statistic, but for the same portfolios we find in Panel B that the dependent variable is also
increasing in $\ln(CapEx/OIBDP)$ and has a positive and significant t-statistic 52.7% of the time. The last
regressions of Panels A and B, featuring logs of all three ratios used as independent variables,
are even more damning. Here we have a 99.3% rejection rate suggesting the coefficient of
$\ln(OIBDP/CapEx)$ is positive, but also a 99.2% rejection rate suggesting the coefficient of $\ln(CapEx/OIBDP)$ is
positive. $\ln(OIBDP/CapEx) = -\ln(CapEx/OIBDP)$, so clearly if abnormal returns are increasing in one of these
variables, they must be decreasing in the other. What purports to be information regarding
those two variables is really indicative of the fact that we made poor choices of independent
variables in $Cash/OIBDP$ and $CapEx/ME$, and also in $Cash/CapEx$ and $OIBDP/ME$. It also emphasizes that coefficients

in multiple regressions have meaning only in context; multicollinearity creates a type of

entanglement which implies the results do not have any generalizable interpretation, but

instead have significance only in the context of the specific forms of the other variables.
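The contradiction between Panels A and B can be reproduced on a single synthetic sample. In this minimal sketch (independent lognormal draws with hypothetical parameters; `ols_t` is our own helper), regression [11] is run twice on the same data, once with the pair (X1, X2) and once with the pair reversed, so that the two middle regressors are exact negatives of each other.

```python
import numpy as np

def ols_t(y, *regressors):
    """OLS with an intercept; returns (coefficients, t-statistics)."""
    X = np.column_stack([np.ones_like(y), *regressors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    cov = (resid @ resid / dof) * np.linalg.inv(X.T @ X)
    return beta, beta / np.sqrt(np.diag(cov))

rng = np.random.default_rng(4)
n = 150
cash, x1, x2, me = rng.lognormal(0.0, 0.5, size=(4, n))
ar = 0.05 * cash / me                      # X1 and X2 do not enter AR

# Regression [11] with the pair in its original order.
_, t_a = ols_t(ar, np.log(cash / x1), np.log(x1 / x2), np.log(x2 / me))
# Regression [11] with the pair reversed; the middle regressor is -ln(X1/X2).
_, t_b = ols_t(ar, np.log(cash / x2), np.log(x2 / x1), np.log(x1 / me))

print("t-statistic on ln(X1/X2):", round(float(t_a[2]), 2))
print("t-statistic on ln(X2/X1):", round(float(t_b[2]), 2))
```

Both t-statistics should be positive and large, which would be impossible if either coefficient described a property of the ratio itself rather than of the other two regressors in the specification.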

Panels C—H in Table 2 are fairly similar, and are summarized in Table 3. Panels I (X1 = Book

Equity, X2 = Total Assets) and J (X1 = Total Assets, X2 = Book Equity) are a bit different and merit

extra discussion. First of all we note that these are similar to Panels A and B in that together
the multiple regressions [9]-[11] suggest the dependent variable is increasing in $\text{Book Equity}/\text{Total Assets}$
and also in $\text{Total Assets}/\text{Book Equity}$. As discussed in the last paragraph, this is implausible. Second, the
22.9% rejection rate for the coefficient of $\text{Book Equity}/\text{Total Assets}$ in the simple regression [8], for example,
suggests that indeed there might be a weak relationship between $\text{Cash}/\text{Market Equity}$ and $\text{Book Equity}/\text{Total Assets}$.
The fact that this rejection rate increases to 94.3% in the multivariate regression [9] might seem


to suggest [9] is a more powerful test. Unfortunately, when the significance levels are

substantially misaligned, as Panels A—H show them to be, it is impossible to draw any

meaningful inference when the null is rejected. For example, suppose a colleague presents a

new test that he alleges is substantially more powerful than the generally accepted method.

The only problem, he acknowledges, is that the test is misspecified: at a significance level of
α = 5%, it actually rejects true null hypotheses at a rate of about 19%. Unbeknownst to you, your

colleague’s method is simply to calculate the t-statistic the standard way and multiply it by 1.5

before comparing it with the critical value. This test is obviously more powerful (compared

with the standard test, it will reject more frequently when the null is false), but the problem is

that the misspecification (rejecting too frequently when the null is true) makes the test results

unreliable. So it is with the multiple regressions involving Book Equity and Total Assets in

Panels I and J. Because the results of Panels A—H demonstrate substantial misspecification, we

cannot draw any meaningful inference from any results of Panels I and J except that the

multivariate regressions reject much more frequently than the simple regressions as the

discussion in Sections II and III demonstrates they would.
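The 19% figure in the colleague example follows directly from the normal distribution. As a minimal sketch (assuming the t-statistic is approximately standard normal under the null, which is reasonable for samples of 150), the true size of a test that multiplies the t-statistic by 1.5 is the probability that a standard normal exceeds the usual critical value divided by 1.5:

```python
from statistics import NormalDist

z = NormalDist()                       # standard normal approximation to the t-distribution
alpha = 0.05
crit = z.inv_cdf(1 - alpha / 2)        # usual two-sided critical value, about 1.96
inflation = 1.5                        # factor applied to the t-statistic

# 1.5*|t| exceeds crit exactly when |t| exceeds crit / 1.5.
true_size = 2 * (1 - z.cdf(crit / inflation))

print("nominal size:", alpha)
print("actual rejection rate under a true null:", round(true_size, 3))
```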

Panel A of Table 3 summarizes the frequency with which the coefficient of $X_1/X_2$ [or $\ln(X_1/X_2)$] is

found to have a significant positive t-statistic for the sequence of regressions [8]—[11].

Generally we find that the rejection rate is increasing as we move from a simple regression [8]

to a multiple regression which uses the logs of each of the three independent variables [11].

These changes in rejection rates are reported in Panel B of Table 3. For example, for all ten
ratios considered, the average rejection rate for simple regressions (Panel A’s 10.1%) increases
by 18.0 percentage points to 28.1% when we consider a multivariate regression with all three (unlogged)
variables. This rejection rate increases by another 18.9 percentage points (to 47.0%) when we use the log of $X_1/X_2$ (but not of
the other two variables), and increases by an additional 51.7 percentage points (to an average rejection rate of
98.7%) when we take logs of each of the three independent variables.

Not all pairs of X1 and X2, however, produce increases at the same rates. For example, Table

3’s Panel B shows that when we move from the simple, unlogged regression [8] to the

multivariate regression [9] (with no logs of any of the three independent variables), the

rejection rate for X1 = Book Equity, X2 = Total Assets shows the biggest increase in positive

rejections at 71.4% (from 22.9% to 94.3%). In contrast, X1 = Accounts Payable, X2 = Total

Receivables features an increase in rejection rates of only 4.2%. What accounts for this

difference? Basically, when any independent variable is replaced with another independent

variable with which it has a perfect correlation, the rejection rates for the slope coefficient will


be identical.¹³ Continuity implies that using $X_1/X_2$ or $\ln(X_1/X_2)$ will produce very similar results when
their correlation is large. We have already seen that the rejection rates when we use logs of all
three independent variables are consistently near 100%, and average almost 50% when we take
the log only of $X_1/X_2$. Because the correlation between $BE/TA$ and its log is quite large (0.892), almost
all the bump in rejection rates of H0: β2 = 0 in moving from [8] to [10] occurs in the first stage,
from [8] to [9]. The same is not true of $\text{Accounts Payable}/\text{Total Receivables}$, which has a correlation with its log of
only 0.121. For this variable, the total increase in rejection rates of 29.6% from [8] to [10] does
not come mostly in the move from [8] to [9], but rather in the move from [9] to [10], with its
increase of 25.4%. Consistent with this, we find for the ten ratios the correlation between
changes in positive rejection rates on the one hand and the correlation between $X_1/X_2$ and its log
on the other is 0.768. Similarly, if the correlation between $X_1/X_2$ and its log is a driving factor here,
we should also expect to find that variables with a low correlation get a bigger bump in
rejection rates as they move from [9] to [10], and indeed we find the correlation between
changes in rejection rates and the correlation between $X_1/X_2$ and its log is negative (-0.392).

The other results in Table 3 are also disturbing. We selected the (X1, X2) pairs based on their

low correlations with the dependent variable, and so any method suggesting they are

significant—as do the multivariate regressions [9]-[11]—is seriously flawed. However, if the

method is flawed, then it can (and does) produce erroneous results even when it is used with

seemingly meaningful variables such as the firm’s total assets or book equity. Nevertheless,

this method is commonly seen.

How does multicollinearity cause these results? For the relationship in question, for

example, Christie showed that the correct independent variable is $V/ME$, and when we instead
used $V/X_1$ in regression [9] we not only left out the relevant term ME, but also introduced some
noise in the form of X1. When we added the third variable, $X_2/ME$, we restored the relevant term
ME, but we introduced more noise in the form of X2. Finally, when we added the term $X_1/X_2$ we
mitigated both noise terms at once. Thus the term $X_1/X_2$ serves the same role in regressions [9]-[11]
that W2 serves in the second example of Section III. It does not in any way directly explain
the dependent variable, but it does reduce the noise that was introduced by the erroneous
inclusion of X1 and X2 in the first and third independent variables of [9]-[11].

13 Two variables W1 and W2 will have a perfect correlation if and only if W1,i = a + bW2,i for every observation i, and so substituting one for the other will change the slope coefficient itself by a factor of b and will result in identical t-statistics for the slopes. The substitution will leave the intercept unchanged if and only if a = 0.


Panels C and D focus on rejections of H0: β = 0 due to any significant t-statistic, whether positive or negative. The results are quite similar to those of Panels A and B.

Tables 2 and 3 raise an interesting issue about the correct choice between W and ln(W) as

an independent variable in regressions. Casual observation suggests many finance researchers

believe that the reason for using ln(W) is that it will make a skewed distribution of W more

symmetric. It is true that the distribution of ln(W) is likely to be more symmetric than that of W

if W is skewed, but symmetry of the independent variable’s distribution is not one of the OLS

assumptions; the only distributional assumptions made of the independent variable are that it

has a positive sample variance and a population variance that is finite (e.g., Kmenta (1997), p.

208). Apart from this, the distributional assumptions required for OLS all pertain to the

disturbance term 𝜀, not to the distribution of the independent variable itself (e.g., see Kmenta

or Kennedy (2008), p. 41). Of course, it is possible that the distributions of the independent

variable and the error term are related, but this is not necessarily so. For example, consider an

institution that always has a substantial long position in options. Its return relative for the

options division will be heavily skewed, as will its total return relative if options constitute most

of its trading. However, there is no reason to suspect the disturbance term in a regression of

total return relative on the option division’s return relative would be anything but normally

distributed, and taking the log of the independent variable because it is skewed would be

inappropriate.

Econometrics texts are not silent on the issue of log transformations; however, the main

application has nothing to do with the distribution of an independent variable, but with the

functional form. If it is additive, then OLS is used; logs are generally recommended only if the

functional form is multiplicative, e.g., a Cobb-Douglas production function. Because our

dependent variable is a multiplicative function of the independent variables, it should be no

surprise that the t-statistics improve (in our case, misleadingly so) as we move from [9] to [10]

to [11].

As a final example before proceeding, consider the DuPont formula, ROE = Return on Equity

= Profit Margin*Total Asset Turnover*Equity Multiplier. As it is, the formula is a tautology, but

suppose there is some measurement error so that it is only an approximation. If we wanted to

see the relative contributions of the right-hand side ratios on Return on Equity for a sample

with positive net incomes, it would be inappropriate to estimate parameters of the regression

ROE = α + β1*Profit Margin + β2*Total Asset Turnover + β3*Equity Multiplier + ε

because the relationship is multiplicative, not additive. Instead, it would be more accurate to estimate the parameters of

ln(ROE) = α + β1*ln(Profit Margin) + β2*ln(Total Asset Turnover) + β3*ln(Equity Multiplier) + ε.


The choice of logs has nothing to do with skewness of Profit Margin, Total Asset Turnover, or the Equity Multiplier. It is driven by the correct choice of a model, not by the distribution of the sample observations. Similarly, even if the context were different, the effect of leverage (more accurately, of 1 - leverage, or BE/TA) on ROE is known to be multiplicative, so if ROE is the dependent variable, ln(BE/TA) is a better choice for an independent variable in a multivariate regression than is simply BE/TA.
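To make the functional-form point concrete, the short derivation below (a sketch in LaTeX notation, assuming all three ratios are positive) shows what the log specification recovers in the DuPont example. The abbreviations PM, TAT, and EM and the multiplicative error term u are our own illustrative shorthand for the ratios and the measurement error mentioned above.

```latex
\begin{align*}
\mathrm{ROE} &= \mathrm{PM}\times\mathrm{TAT}\times\mathrm{EM}\times e^{u} \\
\ln(\mathrm{ROE}) &= \ln(\mathrm{PM}) + \ln(\mathrm{TAT}) + \ln(\mathrm{EM}) + u \\
\Rightarrow\quad \alpha &= 0, \qquad \beta_1 = \beta_2 = \beta_3 = 1 .
\end{align*}
```

Under this assumption the log regression is linear in its parameters with unit slopes, whereas no choice of coefficients makes the additive regression hold exactly.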

Tables 2 and 3 consider a deterministic abnormal return, specifically, AR = ΔME/ME = .05Cash/ME. In practice, of course, abnormal returns are noisy. Consequently, we extend the noiseless equation [1] to a more practical example in which an error term is present, specifically,

AR = ΔME/ME = .05Cash/ME + δ. [12]

Now the left side of [12] can be negative, and we can no longer take the log of both sides as

we did to get from equation [2] to equation [7]. Nevertheless, the same intuition we developed

in the deterministic case will apply here. For example, consider any regression (simple or

multiple) and what happens if for every observation a constant is added to the dependent

variable. This will affect only the intercept, and will leave the slope(s) unchanged.¹⁴ With that

in mind, we can replace the dependent variable with its return relative, or one plus the

abnormal return. Because the problems we have identified pertain only to slopes, and because

the slopes remain unchanged, the difficulties when all abnormal returns are positive will persist

when some are negative as well.
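The invariance just described is easy to verify numerically. The sketch below, which uses made-up Cash/ME ratios and abnormal returns purely for illustration, regresses first the abnormal return and then the return relative (one plus the abnormal return) on the same variable; the helper ols is ours.

```python
import math

def ols(x, y):
    """Simple OLS of y on x; returns (intercept, slope, t-statistic of slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    return intercept, slope, slope / se_slope

# Hypothetical data: some abnormal returns are negative.
cash_to_me = [0.05, 0.10, 0.20, 0.35, 0.50, 0.80]
ar = [-0.020, 0.012, 0.004, 0.031, 0.015, 0.048]
return_relative = [1.0 + r for r in ar]  # add the constant 1 to every observation

a0, b0, t0 = ols(cash_to_me, ar)
a1, b1, t1 = ols(cash_to_me, return_relative)
print("slopes:", b0, b1)
print("t-statistics:", t0, t1)
print("intercept shift:", a1 - a0)
```

The slope and its t-statistic are the same in both regressions, and only the intercept moves, by exactly the constant that was added.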

The extent of the misspecification problem in the presence of a stochastic error term is an empirical issue, and we address it through simulations. To avoid unnecessary complications, we simulate δ as normally distributed with a mean of zero and a standard deviation of 2.5%.¹⁵ We find results that are not as strong as those in Table 2, but which nevertheless exemplify the main problem, namely, that variables not involving V or Market Equity may spuriously appear to be significant. The correct regression AR = α̂ + β̂(Cash/ME) + e is very powerful. We found the average t-statistic for β̂ to be 7.39, and we rejected H0: β = 0 in all but one of the 1000 simulated portfolios. Table 4 reports estimates of the parameters of each of regressions [8]-[11], and Table 5 shows the differences in the rejection rates of H0: β = 0 (simple) and H0: β2 = 0 (multivariate).

14 The same is true if we add any constant(s) to one or more independent variables, but we do not use this fact here.

15 We chose these values because they are approximately the average parameters estimated by Brown and Warner [1985]. A main alternative would be to simply let 𝜀 be the stock’s market-model residual that day, but this would necessarily lead to heteroscedasticity and require the use of weighted least squares or generalized least squares, as pointed out by Karafiath et al. [1991]. To avoid this additional complication, we opted for a homoscedastic error term, which allows us to use ordinary least squares.
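The stochastic setup of equation [12] can be sketched as follows. This is a minimal illustration and not the authors' simulation: the portfolio size, the number of portfolios, the uniform distribution of Cash/ME, and the seed are all assumptions of ours, since the sampling design is described elsewhere in the paper. Only the 0.05 coefficient and the 2.5% standard deviation of the error term come from the text.

```python
import math
import random

def ols(x, y):
    """Simple OLS of y on x; returns (intercept, slope, t-statistic of slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return intercept, slope, slope / math.sqrt(sse / (n - 2) / sxx)

random.seed(2015)
n_firms = 200        # assumed portfolio size
n_portfolios = 200   # assumed; fewer than the paper's 1000 to keep the sketch fast

rejections, t_total = 0, 0.0
for _ in range(n_portfolios):
    # Assumed distribution of Cash/ME; the paper draws ratios from actual firm data.
    cash_to_me = [random.uniform(0.0, 1.0) for _ in range(n_firms)]
    # Equation [12]: AR = .05 Cash/ME + delta, with delta ~ N(0, 0.025^2).
    ar = [0.05 * c + random.gauss(0.0, 0.025) for c in cash_to_me]
    _, _, t_stat = ols(cash_to_me, ar)
    t_total += t_stat
    rejections += abs(t_stat) > 1.96

rejection_rate = rejections / n_portfolios
average_t = t_total / n_portfolios
print(rejection_rate, average_t)
```

Because the regressor distribution here is invented, the average t-statistic will not reproduce the 7.39 reported above; the point is only that the correctly specified regression rejects H0: β = 0 in nearly every simulated portfolio.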

The results are fairly similar to those of Tables 2 and 3, but not as dramatic. Still, when ln(X1/X2) is used, Table 5’s Panel A shows the average rejection rate of H0: β2 = 0 in favor of a positive β2 is 17.6%, substantially larger than the 3.2% rejection rate for the simple regression [8a] featuring ln(X1/X2) as the only independent variable. Even worse, when all three ratios are logged, the rejection rate of H0: β2 = 0 in favor of a positive β2 is 64.6%. The results of Table 5’s Panels C and D are quite similar; the middle independent variable [X1/X2 or ln(X1/X2)] appears to be significant substantially more often than the 5% significance level or even regression [8a]’s 7.7%, again suggesting that its role in the multivariate regression is primarily “cleaning up” the noise created by using as normalizing variables something other than what Christie shows to be the correct normalizer, market equity.

V. Other Issues

Because it is well-established that leverage is usually associated with an increase in the dispersion of returns, the reason it provides no marginal explanatory power for ΔME/ME above and beyond what is provided by V/ME merits a bit of extra explanation. Basically, the familiar result that additional debt increases equityholders’ risk is based on the assumption that such debt is used to increase the firm’s assets to scale. However, if the increase in debt is not associated with a proportional increase in risky assets, it is important to use V/ME rather than V/TA.

As a thought experiment, we consider two cases for a firm that starts as 100% equity and doubles its size by borrowing; in the first case, the firm increases its assets to scale, while in the second it invests the proceeds from the debt in a riskless asset. In the first case, the firm’s equity is indeed more risky. In the second case, however, the issuance of more debt and the purchase of riskless assets form a perfect hedge; the firm’s equityholders are in no more risky a position than before. Use of V/ME as the independent variable captures this; in the first case, equity is more risky because V has doubled, while in the second case, the risk to equityholders has not changed because V has not changed. Use of D/TA (or, equivalently, BE/TA), however, is misleading because it suggests equityholders’ risk has increased in both cases due to the increase in debt. Thus V/ME works correctly in both cases, while D/TA erroneously implies equityholders always bear more risk, even if V remains unchanged. In either case, leverage adds no explanatory power for equityholders’ returns above and beyond that which was provided by V/ME.
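The thought experiment can be put in numbers. The sketch below assumes a firm with 100 of risky assets and 100 of equity before borrowing, riskless debt, and a hypothetical 10% shock k to the value of the risky assets; all of these figures, and the function name equity_return, are ours and serve only to illustrate the argument.

```python
def equity_return(k, risky_assets, market_equity):
    """Equityholders' return when risky assets V change by k*V and debt is riskless."""
    return k * risky_assets / market_equity

k = 0.10  # hypothetical shock to the value of the risky assets

cases = {
    "before borrowing": {"V": 100.0, "ME": 100.0, "D": 0.0, "TA": 100.0},
    "case 1 (risky assets scaled up)": {"V": 200.0, "ME": 100.0, "D": 100.0, "TA": 200.0},
    "case 2 (proceeds held riskless)": {"V": 100.0, "ME": 100.0, "D": 100.0, "TA": 200.0},
}

results = {}
for name, c in cases.items():
    results[name] = {
        "V/ME": c["V"] / c["ME"],
        "D/TA": c["D"] / c["TA"],
        "equity return": equity_return(k, c["V"], c["ME"]),
    }
    print(name, results[name])
```

Both borrowing cases show the same D/TA of 0.5, yet only the first doubles the equity return; V/ME distinguishes the two cases (2 versus 1) while D/TA cannot.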

The analysis above (and indeed throughout the paper) has been based on the assumption of riskless debt, but we now briefly consider the impact of risky debt. If debt is risky, then ΔV = kV = ΔME + ΔD, or equivalently, ΔME/ME = kV/ME − ΔD/ME. Now if we estimate the parameters of the regression

AR = α + β1(V/ME) + β2(D/ME) + ε, [13]

we would expect β1 = k to have a sign opposite that of β2. If k > 0, for example, and the firm’s debt is risky, then debtholders share some part of the gain ΔV. The more debt there is to share that gain, the smaller the gain that accrues to the equityholders, and thus E(β2) < 0 (see footnote 16). This is superficially the opposite of the result found in [3], where D/TA had a positive coefficient, but [3] assumed riskless debt and used the erroneous variables V/TA and D/TA, while here we are explicitly allowing risky debt and using the correct variables V/ME and D/ME. The result here is that, if debt is risky, then it mitigates gains and losses to shareholders. This is similar to a consideration of risky debt in the Modigliani and Miller Capital Structure Proposition; risky debt itself does not create extra risk, but rather shares (and thus mitigates) risk that would otherwise accrue to equityholders. More importantly here, the terms of [13] are normalized by market equity for the same reasons as discussed in Christie [1987] and in the introduction.

While not explicitly stated, we have assumed so far that V is an entry from a market value balance sheet. As a consequence, it could also be some capitalized value from an income statement, and such variables as sales, cost of goods sold, or net income are also candidates for V. In practice, however, market value balance sheets are generally academic fictions. Only book values are typically available, and we simply make the assumption that they are closely related to analogous market-value balance sheet entries. Provided any discrepancies between relevant book-value entries and market-value entries are proportional across firms, this changes the slope coefficients themselves but does not alter the t-statistics in any way. For example, suppose the true dependence of abnormal returns on the variable of interest is c(V_MarketValue/ME) for some constant c, but all we can measure is V_BookValue/ME. If for all firms book values are overstated (or understated) by the same proportion d, so that V_BookValue = d·V_MarketValue, then the coefficient we measure will be equal to c/d instead of c, but the t-values will be identical.
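A small numerical check of the c/d result follows. The values of c and d, the market-value ratios, and the noise terms are hypothetical and chosen only for illustration; the helper ols is ours.

```python
import math

def ols(x, y):
    """Simple OLS of y on x; returns (intercept, slope, t-statistic of slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return intercept, slope, slope / math.sqrt(sse / (n - 2) / sxx)

c, d = 0.05, 1.25  # hypothetical true coefficient and book-value overstatement

v_market_to_me = [0.2, 0.5, 0.9, 1.4, 2.0, 2.7]         # hypothetical V(market)/ME
noise = [0.004, -0.006, 0.003, 0.005, -0.007, 0.001]    # hypothetical disturbances
ar = [c * v + e for v, e in zip(v_market_to_me, noise)]

# Book values overstate market values by the same proportion d for every firm.
v_book_to_me = [d * v for v in v_market_to_me]

_, b_market, t_market = ols(v_market_to_me, ar)
_, b_book, t_book = ols(v_book_to_me, ar)
print(b_market, b_book, b_market / d)
print(t_market, t_book)
```

The estimated slope shrinks by the factor d while the t-statistic is unaffected, which is why a uniform proportional discrepancy between book and market values is harmless for inference.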

16 A result of 𝛽2 = 0 would indicate the debt is riskless, at least as far as changes in V are concerned.


VI. Extensions

Apart from use of Book Equity and Total Assets as X1 and X2, Tables 2-5 may appear to be contrived, and indeed to some extent they are. However, the problems they identify can occur even in more common settings. Specifically, it is not strictly necessary for the regression to include X1 and X2 in multiple places. Suppose, for example, there existed another balance sheet item, A, whose value was a constant proportion of, say, total assets, for every firm, or Ai = cTAi. In this case, substitution of Ai for TAi (or vice versa) as a component of any variable would result in beta coefficients that were different by a factor of c, but which would have identical t-statistics. If there is a slight perturbation so that Ai = cTAi + εi (where εi has a small variance relative to the dispersion of TAi), then substitution of Ai for TAi (or vice versa) in any ratio will produce very similar t-statistics. Since Ai = cTAi + εi is essentially a regression of Ai on TAi that is constrained to go through the origin, one measure of how slight the perturbations εi really are is the R² of this constrained regression. While the R² of a regression with an intercept is uniquely defined, there are no fewer than eight¹⁷ common ways of measuring the R² of a regression constrained to go through the origin. We choose the one SAS uses, 1 − Σ(yi − ŷi)²/Σyi², and dub it the constrained R² for the remainder of this paper.¹⁸ We note in passing that when it comes to the entire ratio that serves as an independent variable, a very large linear correlation between the ratio and its substitute is sufficient to produce similar t-statistics for the slope coefficients. However, when we are looking at either the numerator or denominator of an independent variable, a high linear correlation is insufficient to produce comparable results; instead, the constrained correlation must be very large to ensure similar t-statistics. The reason is that a high linear correlation between X1 and X2 implies that there exist a and b such that X2 ≅ a + bX1, but if a ≠ 0 and if X2 appears in a ratio in lieu of X1, there will not be a cancelling out effect because the constant term a will not cancel out. However, if the constrained correlation is large, there will be a cancelling out effect because X2 ≅ bX1 is ensured.¹⁹
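The constrained R² described above can be computed directly from its definition. The sketch below fits a regression through the origin and applies the formula 1 − Σ(yi − ŷi)²/Σyi²; the Total Assets values, the proportionality constant, and the size of the perturbation are simulated assumptions of ours and do not come from the paper's data.

```python
import random

def constrained_r2(x, y):
    """R-squared of a through-the-origin regression of y on x: 1 - SSE / sum(y^2)."""
    slope = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    sse = sum((yi - slope * xi) ** 2 for xi, yi in zip(x, y))
    return 1.0 - sse / sum(yi * yi for yi in y)

random.seed(0)
c = 0.3  # hypothetical constant of proportionality between A and Total Assets
total_assets = [random.uniform(50.0, 500.0) for _ in range(500)]  # simulated

# Exact proportionality: A_i = c * TA_i.
a_exact = [c * ta for ta in total_assets]
# Slight perturbation: A_i = c * TA_i + eps_i, eps_i small relative to TA_i.
a_perturbed = [c * ta + random.gauss(0.0, 5.0) for ta in total_assets]

r2_exact = constrained_r2(total_assets, a_exact)
r2_perturbed = constrained_r2(total_assets, a_perturbed)
print(r2_exact, r2_perturbed)
```

With exact proportionality the measure equals one up to floating-point error, and it falls only slightly below one when the perturbations are small relative to the dispersion of Total Assets.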

17 E.g., see Kvalseth (1985).

18 In mathematics, Y = βX is termed a linear transformation and Y = α + βX an affine transformation, and thus it seems natural to call their measures of R² linear and affine, respectively. However, the use of the expression “linear correlation” to describe the affine transformation is already widespread, so we resort to this alternative.

19 For example, suppose X ~ U[5, 10]. The correlation between X and the simple translation 6.02*10²³ + X is 1, and yet, because 6.02*10²³ dwarfs X, 1/(6.02*10²³ + X) will have approximately the same value for any values of X that are orders of magnitude smaller, and so will not produce the desired “cancelling out” effect with any X that may appear in a numerator. A high constrained R², however, will necessarily produce a cancelling out effect. A similar result would occur if X in a numerator is replaced with 6.02*10²³ + X.


For the full set of observations, one of the largest constrained correlations was that

between Operating Income Before Depreciation and Total Assets, at 0.9741. This suggests that

we should obtain similar results when the value for Total Assets is substituted for one of the

values of Operating Income Before Depreciation in Panel A of Tables 2 and 4. We made this

substitution and report the results in Table 6.

As Table 6 suggests, the problem of misleading significance is somewhat reduced but nevertheless persists when Total Assets replace Operating Income Before Depreciation in either the first or third variable (because the middle variables, OIBDP/CapEx or CapEx/OIBDP, are the ones producing misleading significant slope coefficients, we left them intact). Panels A and B show the results for the deterministic AR = ΔME/ME = .05Cash/ME, while C and D show them for the stochastic AR = ΔME/ME = .05Cash/ME + δ. In the first three rows of Panel A (with no substitutions), for example, the frequency of positive and significant rejections of the hypothesis that the coefficient of the middle term (OIBDP/CapEx or its logarithm) equals zero rises from 23.7% to 61.0% to 99.3% as we proceed from taking no logs, to taking the log of the middle independent variable only, to taking logs of all three independent variables, i.e., regressions [9]-[11]. When we substitute Total Assets for OIBDP in the first independent variable’s denominator in the last three rows, these rejection rates are considerably smaller, proceeding from 11.3% to 22.2% to 73.8%. Still, all are larger than the 5.8% rejection rate we actually found in Table 2, Panel A’s simple regression [8a], AR = α + β ln(OIBDP/CapEx) + ε. A similar result is found in Panel B of Table 6, which uses CapEx/OIBDP (or its logarithm) as the middle variable. The original frequencies of positive and significant t-values for [9]-[11] of 24.1%, 52.7%, and 99.2% drop to 14.2%, 33.7%, and 84.9% when Total Assets are substituted for Operating Income Before Depreciation in the numerator of the third independent variable. Again, while the rejection rates are somewhat lower after the substitution is made, all are substantially larger than the 6.1% we found for the actual simple regression [8a] in Panel B of Table 2. When the abnormal return AR is not deterministic and instead includes an error term, Panel C of Table 6 shows similar results: for OIBDP/CapEx (or its logarithm) the original positive and significant rejection rates of 9.5%, 20.4%, and 50.3% become 5.2%, 9.9%, and 24.0% after the substitution is made. All exceed the 3.4% rejection rate for Table 4, Panel A’s simple regression [8a], the last two by substantial amounts. Panel D of Table 6 shows similar results.

In all cases, Table 6 suggests that the problem of distorted t-statistics, which stems from the fact that an irrelevant independent variable may “clean up” poor measures of other (relevant) independent variables, can remain even when the irrelevant component appears in only one ratio. Provided they have a high constrained correlation, one variable may be substituted for another in either a numerator or a denominator and produce deceptive significance levels. This can be a substantial problem for regressions with a large number of independent variables, as any set of three or more independent variables can combine in such a way that one (or more) of them appears to be significant even though its only role is cleaning up noise created by poor choices of other independent variables.²⁰
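The number of candidate subsets grows quickly with the number of regressors. As a quick check of the count of 968 given in footnote 20 for ten independent variables, the sketch below computes it two ways: by subtracting the empty set, the ten singletons, and the 45 pairs from all 2¹⁰ subsets, and by summing binomial coefficients directly. The variable names are ours.

```python
from math import comb

n_vars = 10

# All subsets, minus the empty set and singletons (1 + 10 = 11), minus the pairs.
by_subtraction = 2 ** n_vars - (n_vars + 1) - comb(n_vars, 2)

# Direct count of subsets containing three or more variables.
by_enumeration = sum(comb(n_vars, size) for size in range(3, n_vars + 1))

print(by_subtraction, by_enumeration)
```

Both approaches give 968, so even a moderately sized regression offers many combinations of three or more variables in which the cleaning-up effect could arise.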

VII. Conclusions

Christie [1987] showed that market equity [ME] is the correct scaling variable for any cross-

sectional regression of abnormal returns on firm characteristics, but many researchers scale by

such variables as Total Assets or Book Equity instead. We demonstrate that the apparent

significance of the resulting variables can be a mathematical artifact that is unrelated to their

true significance. Not only can true (and sometimes clearly so) null hypotheses frequently be

rejected, but depending on the initial setup, they can be rejected in favor of contradictory

alternatives. Moreover, because the effect can occur whenever two variables have a high

constrained R2, detection of false inferences is quite difficult. One (imperfect) step towards

confirming coefficients in a multivariate regression are truly what we purport them to be is to

test them in simple regressions as well. These results also highlight the importance of having a

specific model and using it to determine the exact variables and their appropriate form rather

than let these choices be made by the data. Finally, we can conclude it is inappropriate to add

independent variables based on the assumptions that they might matter, and if they don’t they

will not cause any damage. They may well cause damage by creating (or perhaps resolving)

needless noise that makes them and other irrelevant variables appear to be significant.

Estimates of multivariate regressions have meaning only in the context of all the independent

variables selected, and adding, deleting, or changing variables can completely alter the meaning

of other variables’ coefficients.

Moreover, while our framework is based on event-study cross-sectional regressions, this

condition was only invoked because in this setting Christie has identified the correct functional

form. The general principle could be shown to apply to other cross-sectional regressions as

well, but a proof would require that we knew the correct functional form. While such a correct

functional form may exist, we will rarely know what it is; nevertheless, its existence implies that

estimates of other multiple regressions will be subject to the same problem we have identified

for event-study cross-sectional regressions. Thus this paper provides evidence in support of

Griliches’ Law: “Any cross-sectional regression with more than five variables produces

garbage.” The more independent variables that appear in a regression, the greater the chance

that some subset of them will combine to make an irrelevant variable appear to be significant

because it is “cleaning up” the noise created by an incorrect choice of other variables.

20 For example, a regression with 10 independent variables will have 2¹⁰ − 11 − C(10, 2) = 968 subsets of three or more independent variables, where C(10, 2) = 45 is the number of pairs, any of which can be plagued by this problem.


References

Brown, S., and J. Warner, 1985, Using daily stock returns: The case of event studies, Journal of Financial Economics, 14(1): 3-31.

Christie, A. A., 1987, On Cross-Sectional Analysis in Accounting Research, Journal of Accounting and Economics, 9(3): 231-258.

Goldman, W., 1973, The Princess Bride, Harcourt Brace Jovanovich (San Diego).

Griliches, Z., and N. Wallace, 1965, The Determinants of Investment Reinvestigated, International Economic Review, 6(3): 311-329.

Kendrick, D., and M. Intriligator, 1974, Frontiers of quantitative economics: Papers invited for presentation at the Econometric Society Winter Meetings, New York, 1969 [and] Toronto, 1972. Vol. 2. North-Holland Pub. Co.

Karafiath, I., R. Mynatt, and K. Smith, 1991, The Brazilian default announcement and the contagion effect hypothesis, Journal of Banking and Finance, 15(3): 699-716.

Kennedy, P., 2008, A Guide to Econometrics, 6th Edition, Blackwell Publishing.

Kmenta, J., 1997, Elements of Econometrics, Second Edition, The University of Michigan Press.

Kvalseth, T., 1985, Cautionary Note about R², The American Statistician, 39(4): 279-285.

McCloskey, D., 1987, The Writing of Economics, MacMillan Publishing Co., New York.

Mitchell, D., 1991, Invariance of Results Under a Common Orthogonalization, Journal of Economics and Business, 43: 193-196.


Table 1: Summary of successive regression forms from equations (3)-(6) and related multivariate regressions.

Each cell reports the coefficient with its t-statistic in parentheses.

Multivariate regression                                               Constant          Cash/TA or        D/TA, BE/TA,      BE/ME or
                                                                                        ln(Cash/TA)       or ln(BE/TA)      ln(BE/ME)
[3] .05Cash/ME = α + β1(Cash/TA) + β2(D/TA) + β3(BE/ME)               -.0090 (-3.46)    .0386 (8.14)      .0143 (3.45)      .0098 (5.59)
[4] .05Cash/ME = α + β1(Cash/TA) + β2(BE/TA) + β3(BE/ME)              .0040 (2.18)      .0431 (9.02)      -.0186 (-4.83)    .0107 (6.23)
[5] .05Cash/ME = α + β1ln(Cash/TA) + β2ln(BE/TA) + β3ln(BE/ME)        .0185 (12.92)     .0040 (9.88)      -.0061 (-6.62)    .0068 (7.60)
[6] ln(.05Cash/ME) = α + β1ln(Cash/TA) + β2ln(BE/TA) + β3ln(BE/ME)    -2.9957 (−∞)      1.000 (∞)         -1.000 (−∞)       1.000 (∞)

In the multivariate regressions, t-values for the three variables’ coefficients improve as we move from using Debt/TA [equation (3)] to BE/TA [equation (4)], and then improve more when we take natural logarithms of the three independent variables [equation (5)], and finally become infinite when we take the logarithm of the dependent variable as well [equation (6)]. The multivariate t-statistics also generally improve [except when we take logarithms to move from (4) to (5)], but not as dramatically.
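The infinite t-statistics in row [6] of Table 1 arise because that regression is an exact identity: Cash/ME = (Cash/TA) × (TA/BE) × (BE/ME). The sketch below checks this for a single hypothetical firm (the balance-sheet figures are made up) and confirms that the reported constant is ln(.05).

```python
import math

# Hypothetical firm: cash, total assets, book equity, market equity.
cash, ta, be, me = 12.0, 150.0, 60.0, 90.0

# Left-hand side of regression [6].
lhs = math.log(0.05 * cash / me)

# Right-hand side with the Table 1 coefficients: constant ln(.05), then 1, -1, 1.
constant = math.log(0.05)
rhs = (constant
       + 1.0 * math.log(cash / ta)
       - 1.0 * math.log(be / ta)
       + 1.0 * math.log(be / me))

print(round(constant, 4))   # the constant reported for regression [6]
print(lhs - rhs)            # zero up to floating-point error
```

Since the residual is zero for every firm, the standard errors are zero and the t-statistics are unbounded, which is what the ∞ entries in the table record.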


Table 2: Simulated Portfolios for various choices of X1 and X2 in sequence of regressions from .05(Cash)/ME = α + β(X1/X2) to .05(Cash)/ME = α + β1 ln(V/X1) + β2 ln(X1/X2) + β3 ln(X2/ME).

Panel A: X1 = Operating Income Before Depreciation, X2 = Capital Expenditures

Independent variables                 Cash/X1 (or its log)      X1/X2 (or its log)        X2/ME (or its log)
X1/X2 only [8]                                                  (0.50) [0.112] {0.516}
ln(X1/X2) only [8a]                                             (-0.10) [0.119] {0.437}
All three (unlogged) [9]              (4.18) [0.829] {0.053}    (1.36) [0.237] {0.367}    (4.15) [0.604] {0.176}
All three (log of X1/X2 only) [10]    (4.76) [0.882] {0.028}    (2.48) [0.610] {0.138}    (4.88) [0.704] {0.100}
All three logged [11]                 (10.87) [1.000] {0.000}   (6.16) [0.993] {0.001}    (7.78) [1.000] {0.000}

Each cell reports (average t) [total rejection rate] {average p-value}.


Table 2, Panel B: X1 = Capital Expenditures, X2 = Operating Income Before Depreciation

Independent variables                 Cash/X1 (or its log)      X1/X2 (or its log)        X2/ME (or its log)
X1/X2 only [8]                                                  (0.51) [0.104] {0.500}
ln(X1/X2) only [8a]                                             (0.10) [0.119] {0.437}
All three (unlogged) [9]              (4.48) [0.885] {0.039}    (1.31) [0.242] {0.362}    (5.83) [0.754] {0.086}
All three (log of X1/X2 only) [10]    (4.86) [0.898] {0.031}    (2.03) [0.531] {0.186}    (5.96) [0.775] {0.082}
All three logged [11]                 (10.87) [1.000] {0.000}   (7.09) [0.992] {0.002}    (7.78) [1.000] {0.000}

Each cell reports (average t) [total rejection rate] {average p-value}.


Table 2, Panel C: X1 = Accounts Payable, X2 = Total Receivables

Independent variables                 Cash/X1 (or its log)      X1/X2 (or its log)        X2/ME (or its log)
X1/X2 only [8]                                                  (0.32) [0.082] {0.575}
ln(X1/X2) only [8a]                                             (0.28) [0.099] {0.453}
All three (unlogged) [9]              (3.79) [0.799] {0.059}    (0.77) [0.127] {0.519}    (4.62) [0.722] {0.105}
All three (log of X1/X2 only) [10]    (4.08) [0.839] {0.044}    (1.63) [0.377] {0.243}    (4.77) [0.748] {0.091}
All three logged [11]                 (11.14) [1.000] {0.000}   (7.48) [1.000] {0.000}    (9.30) [1.000] {0.000}

Each cell reports (average t) [total rejection rate] {average p-value}.


Table 2, Panel D: X1 = Depreciation and Amortization, X2 = Total Inventory

Independent variables                 Cash/X1 (or its log)      X1/X2 (or its log)        X2/ME (or its log)
X1/X2 only [8]                                                  (0.07) [0.061] {0.542}
ln(X1/X2) only [8a]                                             (0.33) [0.100] {0.460}
All three (unlogged) [9]              (3.76) [0.807] {0.053}    (0.63) [0.115] {0.531}    (3.49) [0.488] {0.217}
All three (log of X1/X2 only) [10]    (4.09) [0.873] {0.033}    (2.00) [0.482] {0.172}    (4.09) [0.626] {0.129}
All three logged [11]                 (10.40) [1.000] {0.000}   (8.09) [1.000] {0.000}    (8.73) [1.000] {0.000}

Each cell reports (average t) [total rejection rate] {average p-value}.


Table 2, Panel E: X1 = Inventory, X2 = Cost of Goods Sold

Independent variables                 Cash/X1 (or its log)      X1/X2 (or its log)        X2/ME (or its log)
X1/X2 only [8]                                                  (-0.30) [0.077] {0.475}
ln(X1/X2) only [8a]                                             (-0.30) [0.099] {0.452}
All three (unlogged) [9]              (2.47) [0.550] {0.154}    (0.75) [0.142] {0.490}    (4.43) [0.619] {0.158}
All three (log of X1/X2 only) [10]    (2.58) [0.596] {0.138}    (0.97) [0.200] {0.360}    (4.48) [0.630] {0.151}
All three logged [11]                 (10.44) [1.000] {0.000}   (6.51) [0.994] {0.001}    (8.96) [1.000] {0.000}

Each cell reports (average t) [total rejection rate] {average p-value}.


Table 2, Panel F: X1 = Interest Expense, X2 = Cost of Goods Sold

Independent variables                 Cash/X1 (or its log)      X1/X2 (or its log)        X2/ME (or its log)
X1/X2 only [8]                                                  (0.40) [0.119] {0.484}
ln(X1/X2) only [8a]                                             (-0.14) [0.096] {0.449}
All three (unlogged) [9]              (1.48) [0.271] {0.313}    (1.01) [0.205] {0.443}    (4.62) [0.639] {0.141}
All three (log of X1/X2 only) [10]    (1.53) [0.285] {0.283}    (0.620) [0.166] {0.398}   (4.57) [0.626] {0.148}
All three logged [11]                 (10.14) [1.000] {0.000}   (8.16) [1.000] {0.000}    (8.97) [1.000] {0.000}

Each cell reports (average t) [total rejection rate] |negative and significant rejection rate| {average p-value}.


Table 2, Panel G: X1 = Capital Expenditures, X2 = Accounts Payable

Independent variables                 Cash/X1 (or its log)      X1/X2 (or its log)        X2/ME (or its log)
X1/X2 only [8]                                                  (-0.27) [0.067] {0.470}
ln(X1/X2) only [8a]                                             (-0.90) [0.178] {0.366}
All three (unlogged) [9]              (3.30) [0.707] {0.102}    (0.46) [0.093] {0.525}    (4.27) [0.621] {0.159}
All three (log of X1/X2 only) [10]    (3.44) [0.737] {0.082}    (1.06) [0.259] {0.345}    (4.43) [0.644] {0.142}
All three logged [11]                 (11.36) [1.000] {0.000}   (7.55) [0.999] {0.000}    (9.19) [1.000] {0.000}

Each cell reports (average t) [total rejection rate] {average p-value}.


Table 2, Panel H: X1 = Capital Expenditures, X2 = Interest Expense

Independent variables                 Cash/X1 (or its log)      X1/X2 (or its log)        X2/ME (or its log)
X1/X2 only [8]                                                  (0.19) [0.042] {0.657}
ln(X1/X2) only [8a]                                             (-0.50) [0.107] {0.432}
All three (unlogged) [9]              (3.45) [0.728] {0.090}    (0.60) [0.079] {0.576}    (5.58) [0.724] {0.107}
All three (log of X1/X2 only) [10]    (3.69) [0.789] {0.066}    (1.43) [0.302] {0.282}    (5.85) [0.769] {0.081}
All three logged [11]                 (10.41) [1.000] {0.000}   (7.19) [0.999] {0.000}    (8.53) [1.000] {0.000}

Each cell reports (average t) [total rejection rate] {average p-value}.


Table 2, Panel I: X1 = Book Equity, X2 = Total Assets

Independent variables                 Cash/X1 (or its log)      X1/X2 (or its log)        X2/ME (or its log)
X1/X2 only [8]                                                  (0.94) [0.253] {0.326}
ln(X1/X2) only [8a]                                             (-0.03) [0.228] {0.349}
All three (unlogged) [9]              (6.38) [0.920] {0.024}    (4.26) [0.943] {0.015}    (7.74) [0.988] {0.003}
All three (log of X1/X2 only) [10]    (7.17) [0.973] {0.007}    (4.54) [0.946] {0.014}    (7.99) [0.990] {0.002}
All three logged [11]                 (10.88) [1.000] {0.000}   (4.53) [0.909] {0.029}    (9.41) [1.000] {0.000}

Each cell reports (average t) [total rejection rate] {average p-value}.


Table 2, Panel J: X1 = Total Assets, X2 = Book Equity

Independent variables                 Cash/X1 (or its log)      X1/X2 (or its log)        X2/ME (or its log)
X1/X2 only [8]                                                  (0.85) [0.188] {0.442}
ln(X1/X2) only [8a]                                             (0.03) [0.228] {0.349}
All three (unlogged) [9]              (9.54) [1.000] {0.000}    (3.30) [0.643] {0.112}    (9.27) [0.999] {0.000}
All three (log of X1/X2 only) [10]    (9.95) [1.000] {0.000}    (3.87) [0.872] {0.034}    (9.44) [0.999] {0.000}
All three logged [11]                 (10.88) [1.000] {0.000}   (5.72) [0.988] {0.002}    (9.41) [1.000] {0.000}

Each cell reports (average t) [total rejection rate] {average p-value}.

Table 3: Summary of Rejection Rates of the Coefficient of X1/X2, Deterministic ARs

Columns A through J use the following (X1, X2) pairs:
A: X1 = Oper. Inc. before Deprec., X2 = Capital Expend.
B: X1 = Capital Expend., X2 = Oper. Inc. before Deprec.
C: X1 = A/P, X2 = Total Receivables
D: X1 = Deprec., X2 = Total Inventory
E: X1 = Total Inventory, X2 = Cost of Goods Sold
F: X1 = Interest Expense, X2 = Cost of Goods Sold
G: X1 = Capital Expend., X2 = A/P
H: X1 = Capital Expend., X2 = Interest Expense
I: X1 = Book Equity, X2 = Total Assets
J: X1 = Total Assets, X2 = Book Equity

Panel A: Only positive rejections (t-statistic > 1.96)

Regression                              A       B       C       D       E       F       G       H       I       J       Average
Simple, not logged [8]                  0.111   0.103   0.081   0.061   0.045   0.116   0.044   0.042   0.229   0.179   0.101
Simple, logged [8a]                     0.058   0.061   0.079   0.075   0.018   0.034   0.009   0.013   0.097   0.131   0.058
Multivariate, none logged [9]           0.237   0.241   0.123   0.115   0.136   0.205   0.092   0.079   0.943   0.643   0.281
Multivariate, only X1/X2 logged [10]    0.61    0.527   0.377   0.482   0.194   0.142   0.249   0.301   0.945   0.872   0.470
Multivariate, all logged [11]           0.993   0.992   1       1       0.994   1       0.999   0.999   0.906   0.988   0.987


Table 3, Panel B: Increases in rejection rates: Only positive rejections (t-statistic > 1.96)

Column  X1                         X2
A       Oper. Inc. before Deprec.  Capital Expend.
B       Capital Expend.            Oper. Inc. before Deprec.
C       A/P                        Total Receivables
D       Deprec.                    Total Inventory
E       Total Inventory            Cost of Goods Sold
F       Interest Expense           Cost of Goods Sold
G       Capital Expend.            A/P
H       Capital Expend.            Interest Expense
I       Book Equity                Total Assets
J       Total Assets               Book Equity

Comparison                                                       A       B       C       D       E       F       G       H       I       J       Average
Multivariate, none logged minus Simple, not logged               0.126   0.138   0.042   0.054   0.091   0.089   0.048   0.037   0.714   0.464   0.180
Multivariate, only X1/X2 logged minus Multivariate, none logged  0.373   0.286   0.254   0.367   0.058  -0.063   0.157   0.222   0.002   0.229   0.189
Multivariate, all logged minus Multivariate, only X1/X2 logged   0.383   0.465   0.623   0.518   0.800   0.858   0.750   0.698  -0.039   0.116   0.517
Correlation between X1/X2 and ln(X1/X2)                          0.065   0.071   0.121   0.253   0.173   0.069   0.169   0.162   0.892   0.110
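The increases in Panel B are differences between successive rows of Panel A. As a quick arithmetic check, the following Python sketch recomputes them from the Panel A rejection rates. The names `panel_a` and `increase` are illustrative and are not part of the paper.

```python
# Panel A rejection rates for columns A..J, copied from Table 3, Panel A.
panel_a = {
    "[8]":  [0.111, 0.103, 0.081, 0.061, 0.045, 0.116, 0.044, 0.042, 0.229, 0.179],
    "[9]":  [0.237, 0.241, 0.123, 0.115, 0.136, 0.205, 0.092, 0.079, 0.943, 0.643],
    "[10]": [0.610, 0.527, 0.377, 0.482, 0.194, 0.142, 0.249, 0.301, 0.945, 0.872],
    "[11]": [0.993, 0.992, 1.000, 1.000, 0.994, 1.000, 0.999, 0.999, 0.906, 0.988],
}


def increase(later, earlier):
    """Return the column-by-column differences (rounded to 3 places) and their mean."""
    diffs = [round(a - b, 3) for a, b in zip(panel_a[later], panel_a[earlier])]
    return diffs, sum(diffs) / len(diffs)


if __name__ == "__main__":
    # Each pair corresponds to one row of Panel B.
    for later, earlier in [("[9]", "[8]"), ("[10]", "[9]"), ("[11]", "[10]")]:
        diffs, mean = increase(later, earlier)
        print(f"{later} minus {earlier}: {diffs}  average {mean:.3f}")
```

The largest single step for most pairs is the move from [10] to [11], that is, from logging only X1/X2 to logging all three ratios.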


Table 3, Panel C: All rejections (|t-statistic| > 1.96)

Column  X1                         X2
A       Oper. Inc. before Deprec.  Capital Expend.
B       Capital Expend.            Oper. Inc. before Deprec.
C       A/P                        Total Receivables
D       Deprec.                    Total Inventory
E       Total Inventory            Cost of Goods Sold
F       Interest Expense           Cost of Goods Sold
G       Capital Expend.            A/P
H       Capital Expend.            Interest Expense
I       Book Equity                Total Assets
J       Total Assets               Book Equity

Specification                         A       B       C       D       E       F       G       H       I       J       Average
Simple, not logged [8]                0.112   0.104   0.082   0.061   0.077   0.119   0.067   0.042   0.253   0.188   0.111
Simple, logged [8a]                   0.119   0.119   0.099   0.100   0.099   0.096   0.178   0.107   0.228   0.228   0.137
Multivariate, none logged [9]         0.237   0.242   0.127   0.115   0.142   0.205   0.093   0.079   0.943   0.643   0.283
Multivariate, only X1/X2 logged [10]  0.610   0.531   0.377   0.482   0.200   0.166   0.259   0.302   0.946   0.872   0.475
Multivariate, all logged [11]         0.993   0.992   1.000   1.000   0.994   1.000   0.999   0.999   0.909   0.988   0.987


Table 3, Panel D: Increases in rejection rates (all |t-statistic| > 1.96)

Column  X1                         X2
A       Oper. Inc. before Deprec.  Capital Expend.
B       Capital Expend.            Oper. Inc. before Deprec.
C       A/P                        Total Receivables
D       Deprec.                    Total Inventory
E       Total Inventory            Cost of Goods Sold
F       Interest Expense           Cost of Goods Sold
G       Capital Expend.            A/P
H       Capital Expend.            Interest Expense
I       Book Equity                Total Assets
J       Total Assets               Book Equity

Comparison                                                       A       B       C       D       E       F       G       H       I       J       Average
Multivariate, none logged minus Simple, not logged               0.125   0.138   0.045   0.054   0.065   0.086   0.026   0.037   0.690   0.455   0.172
Multivariate, only X1/X2 logged minus Multivariate, none logged  0.373   0.289   0.250   0.367   0.058  -0.039   0.166   0.223   0.003   0.229   0.192
Multivariate, all logged minus Multivariate, only X1/X2 logged   0.383   0.461   0.623   0.518   0.794   0.834   0.740   0.697  -0.037   0.116   0.513
Correlation between X1/X2 and ln(X1/X2)                          0.065   0.071   0.121   0.253   0.173   0.069   0.169   0.162   0.892   0.110
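The pattern in Table 3 can be illustrated with a small Monte Carlo. The Python sketch below is not the authors' simulation. It assumes independent lognormal draws for V, X1, X2, and ME, and it assumes that the abnormal return equals V/ME plus a small noise term. The function names, sample sizes, and the volatility parameter `sigma` are hypothetical choices made only for illustration. The sketch compares the positive rejection rate (t > 1.96) on X1/X2 in a simple regression with the rate on ln(X1/X2) when all three logged ratios are included.

```python
import numpy as np


def ols_t_stats(y, regressors):
    """Classical OLS t-statistics, intercept first, from a least-squares fit."""
    X = np.column_stack([np.ones(len(y)), regressors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]
    cov = (resid @ resid / dof) * np.linalg.inv(X.T @ X)
    return beta / np.sqrt(np.diag(cov))


def positive_rejection_rates(n_firms=200, n_sims=500, sigma=0.5, seed=0):
    """Share of simulations with t > 1.96 on X1/X2 (simple) and on ln(X1/X2) (all logged)."""
    rng = np.random.default_rng(seed)
    simple_hits = 0
    logged_hits = 0
    for _ in range(n_sims):
        # ASSUMPTION: four mutually independent lognormal firm characteristics.
        V, X1, X2, ME = np.exp(sigma * rng.standard_normal((4, n_firms)))
        # ASSUMPTION: abnormal return is V/ME plus small noise (illustrative only).
        ar = V / ME + 0.01 * rng.standard_normal(n_firms)

        # Simple regression of the abnormal return on X1/X2 alone.
        t_simple = ols_t_stats(ar, X1 / X2)[1]

        # Multivariate regression on all three logged ratios; index 2 is ln(X1/X2).
        logs = np.column_stack([np.log(V / X1), np.log(X1 / X2), np.log(X2 / ME)])
        t_logged = ols_t_stats(ar, logs)[2]

        simple_hits += int(t_simple > 1.96)
        logged_hits += int(t_logged > 1.96)
    return simple_hits / n_sims, logged_hits / n_sims


if __name__ == "__main__":
    simple_rate, logged_rate = positive_rejection_rates()
    print(f"simple X1/X2: {simple_rate:.3f}   all logged, ln(X1/X2): {logged_rate:.3f}")
```

Because ln(V/X1) + ln(X1/X2) + ln(X2/ME) = ln(V/ME), the three logged regressors jointly determine the dependent variable under these assumptions, so the coefficient on ln(X1/X2) tends to be significant even though X1 and X2 are unrelated to the abnormal return by construction.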

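Table 4: Simulated Portfolios for various choices of X1 and X2 in the sequence of regressions from

$\frac{[\ldots]}{ME} + \delta = \alpha + \beta \frac{X_1}{X_2} + \varepsilon$

to

$\frac{[\ldots]}{ME} + \delta = \alpha + \beta_1 \ln\left(\frac{V}{X_1}\right) + \beta_2 \ln\left(\frac{X_1}{X_2}\right) + \beta_3 \ln\left(\frac{X_2}{ME}\right) + \varepsilon.$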

Panel A: X1 = Operating Income Before Depreciation, X2 = Capital Expenditures

Specification                         Cash/X1 (or its log)       X1/X2 (or its log)         X2/ME (or its log)
X1/X2 only [8]                        -                          (0.21) [0.076] {0.501}     -
ln(X1/X2) only [8a]                   -                          (-0.02) [0.070] {0.488}    -
All three (unlogged) [9]              (1.47) [0.326] {0.269}     (0.54) [0.102] {0.453}     (1.72) [0.314] {0.323}
All three (log of X1/X2 only) [10]    (1.67) [0.398] {0.234}     (0.99) [0.209] {0.360}     (1.96) [0.352] {0.286}
All three logged [11]                 (3.30) [0.870] {0.028}     (2.02) [0.503] {0.186}     (2.56) [0.619] {0.121}

Cell entries: (average t) [rejection rate] {average p-value}


Table 4, Panel B: X1 = Capital Expenditures, X2 = Operating Income Before Depreciation

Specification                         Cash/X1 (or its log)       X1/X2 (or its log)         X2/ME (or its log)
X1/X2 only [8]                        -                          (0.19) [0.074] {0.494}     -
ln(X1/X2) only [8a]                   -                          (0.02) [0.070] {0.488}     -
All three (unlogged) [9]              (1.58) [0.322] {0.242}     (0.45) [0.088] {0.452}     (2.33) [0.419] {0.254}
All three (log of X1/X2 only) [10]    (1.68) [0.382] {0.224}     (0.68) [0.132] {0.413}     (2.35) [0.423] {0.249}
All three logged [11]                 (3.30) [0.870] {0.029}     (2.15) [0.548] {0.139}     (2.56) [0.619] {0.121}

Cell entries: (average t) [rejection rate] {average p-value}


Table 4, Panel C: X1 = Accounts Payable, X2 = Total Receivables

Specification                         Cash/X1 (or its log)       X1/X2 (or its log)         X2/ME (or its log)
X1/X2 only [8]                        -                          (0.13) [0.063] {0.484}     -
ln(X1/X2) only [8a]                   -                          (0.15) [0.081] {0.480}     -
All three (unlogged) [9]              (1.73) [0.392] {0.239}     (0.34) [0.090] {0.468}     (2.32) [0.403] {0.271}
All three (log of X1/X2 only) [10]    (1.86) [0.436] {0.212}     (0.78) [0.152] {0.411}     (2.39) [0.408] {0.259}
All three logged [11]                 (4.29) [0.975] {0.006}     (2.92) [0.769] {0.057}     (3.67) [0.896] {0.027}

Cell entries: (average t) [rejection rate] {average p-value}


Table 4, Panel D: X1 = Depreciation and Amortization, X2 = Total Inventory

Specification                         Cash/X1 (or its log)       X1/X2 (or its log)         X2/ME (or its log)
X1/X2 only [8]                        -                          (0.05) [0.046] {0.534}     -
ln(X1/X2) only [8a]                   -                          (0.15) [0.063] {0.486}     -
All three (unlogged) [9]              (1.54) [0.338] {0.254}     (0.31) [0.054] {0.501}     (1.70) [0.272] {0.338}
All three (log of X1/X2 only) [10]    (1.67) [0.379] {0.229}     (0.91) [0.175] {0.386}     (1.93) [0.330] {0.304}
All three logged [11]                 (3.66) [0.924] {0.017}     (2.97) [0.787] {0.064}     (3.26) [0.818] {0.049}

Cell entries: (average t) [rejection rate] {average p-value}


Table 4, Panel E: X1 = Inventory, X2 = Cost of Goods Sold

Specification                         Cash/X1 (or its log)       X1/X2 (or its log)         X2/ME (or its log)
X1/X2 only [8]                        -                          (-0.19) [0.070] {0.491}    -
ln(X1/X2) only [8a]                   -                          (-0.20) [0.062] {0.481}    -
All three (unlogged) [9]              (1.08) [0.211] {0.364}     (0.29) [0.080] {0.482}     (2.16) [0.351] {0.293}
All three (log of X1/X2 only) [10]    (1.11) [0.224] {0.358}     (0.39) [0.074] {0.471}     (2.18) [0.353] {0.286}
All three logged [11]                 (3.70) [0.931] {0.016}     (2.27) [0.605] {0.121}     (3.28) [0.838] {0.047}

Cell entries: (average t) [rejection rate] {average p-value}


Table 4, Panel F: X1 = Interest Expense, X2 = Cost of Goods Sold

Specification                         Cash/X1 (or its log)       X1/X2 (or its log)         X2/ME (or its log)
X1/X2 only [8]                        -                          (0.23) [0.077] {0.477}     -
ln(X1/X2) only [8a]                   -                          (-0.01) [0.064] {0.477}    -
All three (unlogged) [9]              (0.64) [0.099] {0.438}     (0.49) [0.111] {0.455}     (2.32) [0.386] {0.278}
All three (log of X1/X2 only) [10]    (0.68) [0.115] {0.425}     (0.32) [0.099] {0.452}     (2.30) [0.379] {0.281}
All three logged [11]                 (3.83) [0.944] {0.016}     (3.14) [0.824] {0.054}     (3.48) [0.859] {0.038}

Cell entries: (average t) [rejection rate] |negative and significant rejection rate| {average p-value}


Table 4, Panel G: X1 = Capital Expenditures, X2 = Accounts Payable

Specification                         Cash/X1 (or its log)       X1/X2 (or its log)         X2/ME (or its log)
X1/X2 only [8]                        -                          (-0.17) [0.056] {0.492}    -
ln(X1/X2) only [8a]                   -                          (-0.46) [0.093] {0.449}    -
All three (unlogged) [9]              (1.54) [0.365] {0.269}     (0.20) [0.069] {0.500}     (2.108) [0.379] {0.276}
All three (log of X1/X2 only) [10]    (1.61) [0.389] {0.251}     (0.53) [0.101] {0.438}     (2.25) [0.379] {0.269}
All three logged [11]                 (4.39) [0.979] {0.005}     (2.97) [0.783] {0.054}     (3.68) [0.897] {0.027}

Cell entries: (average t) [rejection rate] {average p-value}


Table 4, Panel H: X1 = Capital Expenditures, X2 = Interest Expense

Specification                         Cash/X1 (or its log)       X1/X2 (or its log)         X2/ME (or its log)
X1/X2 only [8]                        -                          (0.04) [0.035] {0.522}     -
ln(X1/X2) only [8a]                   -                          (-0.30) [0.071] {0.471}    -
All three (unlogged) [9]              (1.38) [0.311] {0.291}     (0.21) [0.048] {0.500}     (2.54) [0.432] {0.249}
All three (log of X1/X2 only) [10]    (1.47) [0.326] {0.269}     (0.59) [0.098] {0.444}     (2.62) [0.451] {0.229}
All three logged [11]                 (3.86) [0.947] {0.013}     (2.74) [0.737] {0.071}     (3.30) [0.830] {0.045}

Cell entries: (average t) [rejection rate] {average p-value}


Table 4, Panel I: X1 = Book Equity, X2 = Total Assets


𝑪𝒂𝒔𝒉

𝑿𝟏

(or its log)
