Empirical Finance
Universität Saarbrücken
Winter Term 2015
Univ. Prof. Dr. Dieter Hess
University of Cologne
Corporate Finance Seminar
Organizational Issues
Information/Course Material:
Password:
Further Information:
Announcements in the lecture and tutorial
Learning objectives
Theoretical knowledge about econometrics
Empirical applications in practice and research:
company valuation
asset pricing
corporate finance
Critical analysis of empirical studies
Usage of the statistical tool Stata
Handling data sets and conducting your own empirical work
Preparation for seminar and master theses
Literature:
Verbeek, M. (2008): "A Guide to Modern Econometrics", 3rd edition, Wiley.
Kohler, U., Kreuter, F. (2009): "Data Analysis Using Stata", 2nd edition, Stata Press.
Universität zu Köln
Seminar für ABWL und
Unternehmensfinanzen
Prof. Dr. Dieter Hess
Corporate Finance IV
Empirical Finance
1. The OLS estimator
2. Application: Multiples
3. Significance of estimated regression coefficients
4. Variable selection
5. The data set
6. Application: CAPM
7. Goodness of fit and significance of regression
8. Typical problems
9. Advanced techniques
1. The OLS estimator
1.1 Basic idea of OLS estimation
1.2 The univariate OLS estimator
1.3 The multivariate OLS estimator
1.4 Key Assumption
1.5 Correlation vs. causality
Literature: Verbeek, Chapter 2.1, 2.2, 2.3
CF IV – 4-8Dieter Hess
1.1 Basic Idea of OLS Estimation
Suppose we have a sample of N=10 observations of a firm’s stock price yi
and its earnings per share (EPS) xi.
Economic model: Higher EPS implies a higher stock price
Econometric question: Are xi and yi related?
i  Stock price (yi)  EPS (xi)
1 60 5
2 30 2
3 25 1
4 55 4
5 43 4.5
6 39 2
7 62 4
8 44 3
9 46 3.5
10 52 4
[Figure: scatter plot of stock price (vertical axis) against earnings per share (horizontal axis) for the 10 observations]
The economic model describes the theoretical relation R between xi and yi:
$$ y = R(x) $$
which means that stock prices are somehow related to a firm's EPS.
Most economic relations cannot be measured without error because of unobservable factors.
Thus, an econometric model looks for the best approximation to the true relation R:
$$ y \approx f(x) $$
The classical OLS model always assumes a linear relation between x and y:
$$ y_i = \alpha + \beta x_i $$
where α and β are (constant) parameters.
Econometric question (OLS):
What is the best linear approximation to the true relation?
[Figure: scatter plot of stock price against earnings per share]
Idea: The best linear approximation is achieved if the sum of squared differences (= residuals = approximation errors) is minimal.
In the case where we have just one regressor:
squared residuals = squared vertical distances between yi and its fitted value.
All fitted values lie on a straight line, the regression line.
[Figure: scatter plot of stock price against earnings per share with the fitted regression line; the residual u is the vertical distance between an observation y and the line]
Classical Ordinary Least Squares (OLS) Model:
$$ y_i = \alpha + \beta x_i + \varepsilon_i, \qquad i = 1, \dots, N \qquad (1) $$
with
y_i   dependent / endogenous variable (observation i); observable
x_i   independent / exogenous variable (observation i); observable
α     constant; not observable → to be estimated
β     slope coefficient; not observable → to be estimated by OLS
ε_i   error term (residual of observation i); not observable
Question: How can we estimate α and β?
1.2 The univariate OLS estimator
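The OLS estimator is obtained by minimizing the sum of squared residuals:
$$ S_{\hat u \hat u} = \sum_{i=1}^{N} \hat u_i^2 \to \min! \qquad (2.1) $$
We know that the residuals $\hat u_i$ measure the vertical distance between observation $y_i$ and the regression line R:
$$ \hat u_i = y_i - \hat y_i $$
Since $\hat y_i$ is derived from estimation model (2) we obtain
$$ \hat u_i = y_i - \hat\alpha - \hat\beta x_i $$
and therefore
$$ S_{\hat u \hat u} = \sum_{i=1}^{N} (y_i - \hat\alpha - \hat\beta x_i)^2 \to \min! \qquad (2.2) $$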
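The first-order conditions are obtained by setting the partial derivatives of (2.2) to zero:
$$ \frac{\partial S_{\hat u \hat u}}{\partial \hat\alpha} = \sum_{i=1}^{N} 2 (y_i - \hat\alpha - \hat\beta x_i)(-1) \overset{!}{=} 0 \qquad (2.2.1) $$
$$ \frac{\partial S_{\hat u \hat u}}{\partial \hat\beta} = \sum_{i=1}^{N} 2 (y_i - \hat\alpha - \hat\beta x_i)(-x_i) \overset{!}{=} 0 \qquad (2.2.2) $$
which can be rewritten as
$$ \sum_{i=1}^{N} y_i = N \hat\alpha + \hat\beta \sum_{i=1}^{N} x_i \qquad (2.2.3) $$
$$ \sum_{i=1}^{N} y_i x_i = \hat\alpha \sum_{i=1}^{N} x_i + \hat\beta \sum_{i=1}^{N} x_i^2 \qquad (2.2.4) $$
These equations are the so-called normal equations of the OLS estimation.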
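Dividing (2.2.3) by N and substituting $\bar y = \frac{1}{N}\sum_{i=1}^{N} y_i$ and $\bar x = \frac{1}{N}\sum_{i=1}^{N} x_i$ leads to
$$ \bar y = \hat\alpha + \hat\beta \bar x $$
Solving the above equation for $\hat\alpha$ and inserting into (2.2.4) gives
$$ \sum_{i=1}^{N} y_i x_i = (\bar y - \hat\beta \bar x) \sum_{i=1}^{N} x_i + \hat\beta \sum_{i=1}^{N} x_i^2 $$
$$ \sum_{i=1}^{N} y_i x_i = N \bar x \bar y - \hat\beta N \bar x^2 + \hat\beta \sum_{i=1}^{N} x_i^2 $$
and therefore
$$ \hat\beta = \frac{\sum_{i=1}^{N} y_i x_i - N \bar x \bar y}{\sum_{i=1}^{N} x_i^2 - N \bar x^2} = \frac{\frac{1}{N}\sum_{i=1}^{N} y_i x_i - \bar x \bar y}{\frac{1}{N}\sum_{i=1}^{N} x_i^2 - \bar x^2} = \frac{Cov_{xy}}{Var_x} \qquad (2.3) $$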
Similarly, for $\hat\alpha$ we get:
$$ \hat\alpha = \frac{1}{N}\sum_{i=1}^{N} y_i - \hat\beta \frac{1}{N}\sum_{i=1}^{N} x_i $$
or
$$ \hat\alpha = \bar y - \hat\beta \bar x \qquad (2.4) $$
Thus, the intercept is determined such that the average approximation error (residual) is equal to zero.
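The statement that the average residual is zero follows directly from (2.4). A short derivation, added here as a sketch in the notation of the preceding slides:

```latex
\frac{1}{N}\sum_{i=1}^{N}\hat u_i
  = \frac{1}{N}\sum_{i=1}^{N}\bigl(y_i - \hat\alpha - \hat\beta x_i\bigr)
  = \bar y - \hat\alpha - \hat\beta \bar x
  = \bar y - (\bar y - \hat\beta \bar x) - \hat\beta \bar x
  = 0
```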
Example: Stock price on EPS
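$$ \hat\beta = \frac{\sum_{i=1}^{N} y_i x_i - N \bar x \bar y}{\sum_{i=1}^{N} x_i^2 - N \bar x^2} = \frac{1{,}625.5 - 10 \cdot 3.3 \cdot 45.6}{123.5 - 10 \cdot 10.89} = \frac{120.7}{14.6} = 8.2671 $$
$$ \hat\alpha = \bar y - \hat\beta \bar x = 45.6 - 8.2671 \cdot 3.3 = 18.3186 $$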
i y x x² xy
1 60 5 25 300
2 30 2 4 60
3 25 1 1 25
4 55 4 16 220
5 43 4.5 20.25 193.5
6 39 2 4 78
7 62 4 16 248
8 44 3 9 132
9 46 3.5 12.25 161
10 52 4 16 208
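The calculation above can be reproduced in a few lines. The following is a minimal Python sketch (the course itself uses Stata); the function name `ols_univariate` is an illustrative choice, and the formulas are those of equations (2.3) and (2.4).

```python
# Data from the example table: stock price y and earnings per share x.
y = [60, 30, 25, 55, 43, 39, 62, 44, 46, 52]
x = [5, 2, 1, 4, 4.5, 2, 4, 3, 3.5, 4]


def ols_univariate(x, y):
    """Return (alpha_hat, beta_hat) using equations (2.3) and (2.4)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_xx = sum(xi * xi for xi in x)
    beta_hat = (sum_xy - n * x_bar * y_bar) / (sum_xx - n * x_bar ** 2)
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat


alpha_hat, beta_hat = ols_univariate(x, y)
print(f"alpha_hat = {alpha_hat:.4f}, beta_hat = {beta_hat:.4f}")
```

The slope agrees with the slide value of 8.2671. The intercept computed from the unrounded slope differs from 18.3186 in the fourth decimal, because the slide inserts the rounded slope.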
Interpretation of the slope parameter (β)
For testing theories, we are most interested in the sign of β
(not so much in its value) and whether β is significant.
E.g., a significantly negative β estimate would be in sharp contrast
to all standard valuation theories
For practical applications, the value (and significance) of β is also
interesting.
In the example, the estimated β suggests that a company could
raise its stock price by 8.27 € if it can manage to increase EPS by 1 €.
Now we can analyze the effect of a change in EPS on the stock price.
Assume that the first firm (i=1) manages to raise its EPS from 5 to 6.5.
The new estimated stock price is higher than before because of the positive slope coefficient:
$$ \hat y = \hat\alpha + \hat\beta x = 18.3186 + 8.2671 \cdot 6.5 = 72.0548 $$
Note that this is just an approximation based on the given data.
Interpretation of the intercept parameter (α)
The intercept parameter is the predicted value of y given x=0.
In some cases it makes little sense to set x=0 though.
In the example:
What is the value of a company with zero earnings?
Anything wrong with this question?
Whether or not we want to interpret it, the intercept should always be included in a regression!
Otherwise we risk obtaining biased slope coefficients.
Excluding the intercept means that we impose a restriction, i.e., α = 0.
[Figure: two panels contrasting a regression line with an intercept, $y_i = \alpha + \beta x_i$, and a line forced through the origin, $y_i = \beta x_i$]
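To see how strongly the restriction α = 0 can change the slope, the stock price example can be re-estimated without an intercept. This is a minimal Python sketch, not part of the original slides; it uses the fact that minimizing the sum of squared residuals of y_i = β x_i gives the slope Σ x_i y_i / Σ x_i². The variable names are illustrative.

```python
# Stock price (y) and EPS (x) from the example in section 1.1.
y = [60, 30, 25, 55, 43, 39, 62, 44, 46, 52]
x = [5, 2, 1, 4, 4.5, 2, 4, 3, 3.5, 4]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_xx = sum(xi * xi for xi in x)

# Slope with an intercept, equation (2.3).
beta_with_intercept = (sum_xy - n * x_bar * y_bar) / (sum_xx - n * x_bar ** 2)

# Slope when the line is forced through the origin (alpha = 0 imposed).
beta_through_origin = sum_xy / sum_xx

print(f"with intercept:     {beta_with_intercept:.4f}")
print(f"through the origin: {beta_through_origin:.4f}")
```

In this data set the restricted slope is considerably larger than the unrestricted one, because the line has to absorb the positive intercept.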
1.3 The multivariate OLS estimator
Univariate regression model
From a univariate regression we can only draw ceteris paribus conclusions under a strong assumption:
stock price = α + β · EPS + ε
o The estimate shows how earnings per share affect the stock price only if all other factors affecting the stock price (which were omitted) are uncorrelated with EPS.
o In most cases such an assumption will be unrealistic.
Multivariate regression model
Multivariate regression analysis allows us to explicitly control for many other factors that simultaneously affect the dependent variable:
stock price = β0 + β1 · EPS + β2 · expected growth rate + … + ε
→ This is important both for testing economic theories and for evaluating policy effects when we must rely on non-experimental data.
The multivariate regression model…
… is more flexible since more general functional forms can be incorporated.
… can explain more variation of y.
… can infer causality in cases where the univariate model would be misleading.
… is the most widely used vehicle for empirical analysis in economics and other
social sciences.
The general multivariate linear regression model is written as
$$ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon \qquad (1.3) $$
with β0 intercept (i.e. “α“ in the univariate model)
βj, j=1,…,k change in y with respect to xj, holding the other factors fixed
Since there are k independent variables and one intercept, equation (1.3) contains k+1 (unknown) population parameters.
Interpretation
In the general case,
$$ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon $$
the estimates of βi have a partial effect (or ceteris paribus) interpretation:
$$ \Delta \hat y = \hat\beta_i \, \Delta x_i $$
i.e., how much y will change if we observe a change in xi while all other xj (j ≠ i) remain unchanged.
Nevertheless, we can also evaluate the overall impact of simultaneous changes in all or several of the x1, x2, …, xk:
$$ \Delta \hat y = \hat\beta_1 \Delta x_1 + \hat\beta_2 \Delta x_2 + \dots + \hat\beta_k \Delta x_k $$
1.4 Key Assumption
Key Assumption: Zero conditional mean assumption
The key assumption for the general model is
E [ ε | x1, x2, … xk ] = 0
i.e. the error term is unrelated to (any combination of) the explanatory variables; in particular, it is uncorrelated with them.
In a model with two independent variables, this simplifies to E [ ε | x1, x2 ] = 0.
I.e., for any values of x1 and x2 in the population, the expected value of the error term is zero.
Since the error term accounts for factors left out (i.e. unobserved data),
this assumption means that the missing factors are not related to
the factors included in the model (i.e. x1 and x2).
This assumption is crucial to obtain unbiased OLS estimators!
However, since we can never observe the true error terms (only the estimated residuals), it cannot be tested directly.
1.5 Correlation vs. causality
The standard regression models assume a linear relationship between the exogenous (xi) and endogenous (yi) variables:
$$ y_i = \alpha + \beta x_i $$
Questions:
Does this linear relationship approximate the true economic model?
Could the relation between x and y be random even though we can estimate plausible values for α and β?
Does a change in x really cause a change in y?
Correlation:
A measure of the strength and direction of the relationship between two variables x and y, i.e., of how they vary together.
Correlation does not account for the source of the variation.
Causality:
An action is said to cause an outcome if the outcome is the direct result or
consequence of that action.
A change in x leads to a specific, measurable change in y.
An interesting (non-financial) example:
“The researchers predict that one type of health problem will increase with
rising intelligence. Asthma and other allergies are thought by many experts to
be rising […]. Some studies already suggest a (positive) correlation between a
country’s allergy levels and its average IQ.”
(The Economist, June 2010)
Is education/intelligence/IQ the cause for asthma and other allergies?
No, education is not the cause of allergies.
People in developed countries tend to have more education than people in less developed countries.
People in developed countries are more often exposed to chemicals and artificial products and, hence, are more likely to develop an allergy.
How about IQ and intelligence leading to allergies?
If education causes intelligence, then the causal link is the same as above.
When the error term contains factors that affect y and are also correlated with x, the result can be a spurious correlation:
We find a relationship between y and x that is actually due to other unobserved factors that affect y and also happen to be correlated with x.
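The effect of an omitted factor that is correlated with x can be illustrated with simulated data. The following Python sketch is not part of the original slides; all numbers (sample size, true coefficients 2.0 and 3.0, the 0.5 link between the two factors, the seed) are hypothetical choices for illustration.

```python
import random

random.seed(42)  # fixed seed so that the sketch is reproducible

N = 5000
x1 = [random.gauss(0, 1) for _ in range(N)]
# x2 is an unobserved factor that is correlated with x1.
x2 = [0.5 * a + random.gauss(0, 1) for a in x1]
eps = [random.gauss(0, 1) for _ in range(N)]
# True model: y depends on x1 (effect 2.0) and on x2 (effect 3.0).
y = [1.0 + 2.0 * a + 3.0 * b + e for a, b, e in zip(x1, x2, eps)]


def slope(x, y):
    """Univariate OLS slope: sample covariance over sample variance."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    return s_xy / s_xx


# Regressing y on x1 alone pushes x2 into the error term.
beta_short = slope(x1, y)
print(f"true effect of x1: 2.0, estimated without x2: {beta_short:.3f}")
```

Because x2 is left in the error term and moves with x1, the univariate slope picks up part of the effect of x2 (here roughly 2.0 + 3.0 · 0.5) instead of the true effect of x1.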
Another interesting (non-financial) example:
Researchers found a correlation between a country’s average IQ and its
disease burden.
Is “disease burden” the cause in this study?
Or, more provocative, are ill people less intelligent?
Yes, disease burden is the cause:
“There is direct evidence that infections and parasites affect cognition.
… Intestinal worms and Malaria are bad for the brain.
… Diarrhoea strikes children hard. ... It prevents the absorption of food
at a time when the brain is growing and developing rapidly.”
(The Economist, June 2010)
Nevertheless, the provocative phrase is wrong. IQ is only affected if one
catches the really bad diseases (at a really bad time).
Finding a correlation is not sufficient.
A regression only shows the correlation between x and y; it gives no indication of causality.
One has to use economic theory to justify and interpret the documented relationship.
Example: “Stock price and EPS”:
The dividend discount model shows that the stock price is an increasing
function of earnings.
Wrap up
The basic idea of OLS estimation is to find the
best linear approximation
for the true relation between the explained and the explanatory variables. This
is achieved by minimizing the sum of squared residuals.
There are univariate and multivariate OLS models.
A multivariate OLS model is more flexible and is the most widely used vehicle
for empirical analysis in economics as well as other social sciences.
Under the Gauss-Markov assumptions, the OLS estimator is the
best linear unbiased estimator (BLUE)
for the true β.
β reflects the correlation between x and y but not necessarily the causality.
2. Application: Multiples
2.1 Theoretical background
2.2 Estimation of multiples
Literature: Damodaran (2001), chapter 24
2.1 Theoretical Background
Multiple valuation = valuation relative to comparable companies
Technique used frequently in enterprise valuation (M&A)
Target company is valued based on the current pricing of companies with similar characteristics (comparative company approach)
Example: Price earnings ratio (P/E ratio)
According to the P/E ratio, the value (or price) of a company is just a multiple of the earnings it generates:
$$ \text{P/E ratio} = \frac{\text{stock market price (per share)}}{\text{net earnings (per share)}} $$
or
$$ \text{stock market price} = \text{P/E ratio} \times \text{net earnings} $$
Relation of P/E Ratio to traditional approaches
Standard DDM:
$$ S_0 = \sum_{t=1}^{N} \frac{D_t}{(1+k)^t} = \sum_{t=1}^{N} \frac{d \cdot NE_t}{(1+k)^t} $$
S_t   Market price per share
NE_t  Net earnings per share
D_t   Dividend per share
d     payout ratio
k     discount rate
Assuming a constant payout ratio, d = Dt / NEt for all t, and a constant growth rate of earnings (and hence of dividends), Dt+1 = Dt (1+g), we get a sum which, for N → ∞ and k > g, converges to a simple expression:
$$ S_0 = \sum_{t=1}^{N} \frac{d \cdot NE_0 \, (1+g)^t}{(1+k)^t} \;\xrightarrow{N \to \infty}\; d \cdot NE_0 \cdot \frac{1+g}{k-g} $$
Rearranging
$$ S_0 = d \cdot NE_0 \cdot \frac{1+g}{k-g} $$
yields
$$ \frac{S_0}{NE_0} = d \cdot \frac{1+g}{k-g} = \text{P/E ratio} $$
→ P/E ratio depends on
the expected growth rate of earnings (g)
the company risk (i.e. the risk-adjusted discount rate k)
the expected payout ratio (d)
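The dependence of the P/E ratio on d, g and k can be explored numerically. This is a minimal Python sketch; the function names and the input values (payout 50%, growth 4%, discount rate 9%) are hypothetical and only serve as an illustration of the constant-growth formula above.

```python
def pe_ratio(d, g, k):
    """P/E ratio implied by the constant-growth DDM: d * (1 + g) / (k - g)."""
    if k <= g:
        raise ValueError("the discount rate k must exceed the growth rate g")
    return d * (1 + g) / (k - g)


def pe_ratio_finite(d, g, k, horizon):
    """The same ratio computed from the discounted sum over t = 1..horizon."""
    return sum(d * ((1 + g) / (1 + k)) ** t for t in range(1, horizon + 1))


closed_form = pe_ratio(0.5, 0.04, 0.09)
long_sum = pe_ratio_finite(0.5, 0.04, 0.09, 2000)
print(f"closed form: {closed_form:.4f}, sum over 2000 periods: {long_sum:.4f}")

# Higher growth or a lower discount rate raises the multiple.
print(f"g = 5%: {pe_ratio(0.5, 0.05, 0.09):.4f}")
print(f"k = 8%: {pe_ratio(0.5, 0.04, 0.08):.4f}")
```

With a long horizon the discounted sum approaches the closed-form value, which is the step used on the slide.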
2.2 Estimation of Multiples
First step: Data collection
We want to value the American XYZ brewery employing a P/E ratio approach.
Information on comparable companies in the beverage industry can be obtained from several data providers, e.g.
Thomson Reuters Datastream
Bloomberg
Compustat (Standard & Poor’s)
Center for Research in Security Prices (CRSP)
Yahoo Finance, Onvista, and others
Second step: Selection of a sample of comparable firms
Our economic model (i.e. DDM) suggests that the P/E ratio is determined by
the (expected) growth rate, payout ratio and earnings risk:
P/E Ratio = f ( growth, payout, risk )
Therefore, comparable firms are the ones with a similar growth potential, a
similar payout ratio and a similar earnings risk.
Finding firms with identical characteristics is virtually impossible
Often it is even hard to find firms that are “similar” with respect
to all three characteristics
Therefore, we have to identify ways of controlling for differences across
firms on these variables.
Third step: Regression analysis
Traditional multiple analysis selects all (or some) firms from the same industry
(peer group) and computes the average multiple of these firms.
However, taking a simple average is equivalent to running a (degenerate) regression of the multiples on a constant term, i.e.
yi = α0 + εi
where yi = P/E ratio of company i
Remember that $\hat\alpha$ is defined as:
$$ \hat\alpha = \frac{1}{N}\sum_{i=1}^{N} y_i - \hat\beta \frac{1}{N}\sum_{i=1}^{N} x_i = \bar y - \hat\beta \bar x $$
Hence, the intercept equals the average of the dependent variable if we run a regression on a constant term only (there is no regressor, so the $\hat\beta \bar x$ term drops out):
$$ \hat\alpha = \bar y $$
Company                    P/E Ratio
Coca-Cola Bottling         29.18
Molson Inc. Ltd. 'A'       43.65
Anheuser-Busch             24.31
Corby Distilleries Ltd.    16.24
Chalone Wine Group Ltd.    21.76
Andres Wines Ltd. 'A'      8.96
Average:                   24.0167
Regression output:
            coefficient   std. error
constant    24.0167       4.8459
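The equivalence between the peer-group average and a regression on a constant can be checked directly. The following Python sketch (variable names are illustrative) computes the estimated constant, its standard error and the resulting t-statistic from the six P/E ratios in the table.

```python
import math

# P/E ratios of the six peer-group companies from the table.
pe = [29.18, 43.65, 24.31, 16.24, 21.76, 8.96]
n = len(pe)

# OLS on a constant only: the estimated constant is the sample mean.
alpha_hat = sum(pe) / n

# Residual variance with N - (k + 1) = N - 1 degrees of freedom (k = 0).
ssr = sum((v - alpha_hat) ** 2 for v in pe)
sigma_hat = math.sqrt(ssr / (n - 1))

# Standard error of the constant and t-statistic for H0: alpha = 0.
se_alpha = sigma_hat / math.sqrt(n)
t_alpha = alpha_hat / se_alpha

print(f"constant = {alpha_hat:.4f}, std. error = {se_alpha:.4f}, t = {t_alpha:.4f}")
```

In this sketch the standard error of the constant comes out at roughly 4.8459, so the t-statistic (coefficient divided by standard error) is roughly 4.96.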
Such a simple regression (y on a constant) neglects any differences in value-relevant characteristics (e.g., growth, risk) across peer group firms.
In contrast, a regular regression can easily account for differences in the value drivers across firms.
Procedure:
Run a multivariate regression of multiples (y) against variables which presumably influence multiples (e.g. x1 = growth rate, x2 = risk, …):
yi = α0 + β1 xi,1 + β2 xi,2 + … + εi
Use the estimated coefficients from this regression to compute predicted multiples (i.e. $\hat y_i$).
Assume we want to value a company with the following value characteristics:
estimated earnings growth rate: 8%
suggested payout ratio: 15%
estimated risk (= annualized standard deviation of stock returns): 23.5%
Earnings in 2006: 3.14 million
Furthermore assume we want to ignore differences in the payout ratio, but
account for differences with respect to risk and expected growth.
Example: Regression Analysis
Economic hypothesis: PER is primarily influenced by growth and risk
Regression equation:
$$ PER_i = \beta_0 + \beta_1 \, growth_i + \beta_2 \, risk_i + \varepsilon_i $$
Company                    PER      growth    risk
Coca-Cola Bottling         29.18    9.50%     20.58%
Molson Inc. Ltd. 'A'       43.65    15.50%    21.88%
Anheuser-Busch             24.31    11.00%    22.92%
Corby Distilleries Ltd.    16.24    7.50%     23.66%
Chalone Wine Group Ltd.    21.76    14.00%    24.08%
Andres Wines Ltd. 'A'      8.96     3.50%     24.70%
Average:                   24.02    10%       23%
"PER" = price earnings ratio; "growth" = analysts' forecast of company growth rate; "risk" = annualized standard deviation of stock market returns
Download the data set "multiples.dta".
You may open the data set with a simple double click from Windows Explorer.
Alternatively, open Stata and load the data set by typing in the Command window:
cd "d:\...\my_data_directory"
use "multiples.dta"
Make sure you are in the Command window (bottom right) when typing commands.
[Screenshot: Stata main window after running use "multiples.dta", showing the available variables (after reading the data)]
The syntax for running a regression is:
regress [dependent variable] [list of independent variables]
We want to regress the variable “PER” on the variables “growth” and “risk”.
In Stata syntax:
regress PER growth risk
From the regression results we would calculate the following "adjusted" multiple:
$$ \widehat{PER}_i = \hat\beta_0 + \hat\beta_1 \, growth_i + \hat\beta_2 \, risk_i = 96.0088 + 1.6908 \cdot growth_i - 3.8825 \cdot risk_i $$
$$ \widehat{PER}_i = 96.0088 + 1.6908 \cdot 8 - 3.8825 \cdot 23.5 = 18.2965 $$
and thus a firm value of
$$ \widehat{PER}_i \cdot Earnings = 18.2965 \cdot 3{,}140{,}000 = 57{,}451{,}010 $$
In contrast, a traditional P/E ratio (based on the plain average, i.e. neglecting differences in value drivers) would result in a firm value of
$$ \overline{PER} \cdot Earnings = 24.02 \cdot 3{,}140{,}000 = 75{,}422{,}800 $$
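The regression behind these numbers can be reproduced from the peer-group table without Stata. The following is a minimal Python sketch, assuming growth and risk are entered in percent (as in the calculation above); it solves the normal equations for two regressors in deviations from the means. All variable names are illustrative.

```python
# Peer-group data from the table (growth and risk in percent).
per = [29.18, 43.65, 24.31, 16.24, 21.76, 8.96]
growth = [9.5, 15.5, 11.0, 7.5, 14.0, 3.5]
risk = [20.58, 21.88, 22.92, 23.66, 24.08, 24.70]


def mean(values):
    return sum(values) / len(values)


def cross(a, b):
    """Sum of cross products of deviations from the means."""
    a_bar, b_bar = mean(a), mean(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))


# Normal equations for two regressors, written in deviations from the means.
s_gg, s_rr, s_gr = cross(growth, growth), cross(risk, risk), cross(growth, risk)
s_gy, s_ry = cross(growth, per), cross(risk, per)
det = s_gg * s_rr - s_gr ** 2

b_growth = (s_gy * s_rr - s_gr * s_ry) / det
b_risk = (s_gg * s_ry - s_gr * s_gy) / det
b_const = mean(per) - b_growth * mean(growth) - b_risk * mean(risk)

# Target company: growth 8%, risk 23.5%, earnings 3.14 million.
per_hat = b_const + b_growth * 8.0 + b_risk * 23.5
firm_value = per_hat * 3_140_000

print(f"const = {b_const:.4f}, growth = {b_growth:.4f}, risk = {b_risk:.4f}")
print(f"predicted PER = {per_hat:.4f}, firm value = {firm_value:,.0f}")
```

The coefficients match the slide values up to rounding. The predicted multiple and firm value differ slightly from the slide because the slide inserts coefficients rounded to four decimals.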
Summary:
In its simplest form (yi = β0 + εi) the regression approach is identical to computing an average as in traditional multiple analysis
Advantages of regular regression (yi = β0 + β1 x1 + … + εi):
Regression allows us to explicitly quantify the impact of additional factors (i.e. differences in the characteristics of the “peer group”)
It allows us to test for the significance of the impact of such differences
It allows us to use a larger sample (i.e. by including companies from other industries while controlling for differences across industries)
→ allows examination of over- and under-valuations across the whole market
3. Significance of estimated regression coefficients
Testing hypotheses regarding a single parameter: t-test
Running a regression, we get some parameter estimates.
Now, an important question is how reliable the values of the estimated parameters are.
E.g., for the influence factors of P/E ratios we obtained the estimates
            coefficients
intercept   96.0088045
growth      1.69075039
risk        -3.88251778
But can we really say that the influence of earnings growth is positive?
Or could the true parameter value of growth just be zero?
→ Test: How large is the probability of making a mistake when we say that growth positively influences P/E ratios?
A simulation example (simulation1_OLS.do)
Draws 100 obs. for X1, X2 and ε and calculates Y = α + β1 X1 + β2 X2 + ε
Then the parameters are estimated using the simulated data
Results for a “small” residual variance (relative to the variance of X1 and X2):
true values: alpha = 0.1, beta1 = 1.5, beta2 = -0.5, sigma(eps) = 0.1
run   est. alpha   est. beta1   est. beta2
# 1   0.1035       1.4990       -0.5174
# 2   0.0941       1.4839       -0.4879
# 3   0.1197       1.5095       -0.4792
# 4   0.0945       1.5031       -0.5079
# 5   0.1200       1.5046       -0.4994
Results for a “large” residual variance:
true values: alpha = 0.1, beta1 = 1.5, beta2 = -0.5, sigma(eps) = 10.0
run   est. alpha   est. beta1   est. beta2
# 1   -1.5320      1.3342       -0.3292
# 2   -1.3733      2.3848       1.1092
# 3   1.8602       2.1849       -1.2090
# 4   0.6060       1.3722       -0.1153
# 5   0.5676       0.3786       -0.5342
A second simulation example (simulation2_OLS.do)
Simulation2_OLS.do repeats the previous simulation 1,000 times and produces a histogram of the resulting parameter estimates
[Figure: two histograms (density) of the estimated beta1. Left panel: “small” residual variance (0.1), estimates between about 1.46 and 1.54. Right panel: “large” residual variance (10.0), estimates between about -2 and 6]
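The do-files themselves are not reproduced in this transcript. The following Python sketch is a simplified analogue, not the original Stata code: it uses a single standard-normal regressor instead of two, 200 repetitions instead of 1,000, and a hypothetical function name `simulate_beta`. It only illustrates how the spread of the slope estimates grows with the residual standard deviation.

```python
import random
import statistics


def simulate_beta(sigma_eps, n_obs=100, n_runs=200, seed=1):
    """Repeat: draw data from y = 0.1 + 1.5 x + eps, estimate the slope by OLS."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_runs):
        x = [rng.gauss(0, 1) for _ in range(n_obs)]
        y = [0.1 + 1.5 * xi + rng.gauss(0, sigma_eps) for xi in x]
        x_bar = sum(x) / n_obs
        y_bar = sum(y) / n_obs
        s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
        s_xx = sum((a - x_bar) ** 2 for a in x)
        estimates.append(s_xy / s_xx)
    return estimates


small = simulate_beta(sigma_eps=0.1)
large = simulate_beta(sigma_eps=10.0)

for label, est in (("small", small), ("large", large)):
    print(f"{label} residual variance: mean = {statistics.mean(est):.4f}, "
          f"std. dev. = {statistics.stdev(est):.4f}")
```

In both cases the estimates scatter around the true value 1.5, but with the large residual standard deviation the scatter is far wider, which is the pattern the histograms show.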
Idea of hypothesis testing
Formally, we start with a given set of hypotheses about β, e.g.
$$ \underbrace{H_0: \beta_j = \beta_j^0}_{\text{"Null hypothesis"}} \quad \text{against} \quad \underbrace{H_1: \beta_j \neq \beta_j^0}_{\text{"Alternative"}} $$
Routinely, Stata tests
$$ H_0: \beta_j = 0 \quad \text{against} \quad H_1: \beta_j \neq 0 $$
To test such a hypothesis, we need to know the distribution of the estimator $\hat\beta_j$ given that the null hypothesis is valid. From this we can construct a test statistic.
Knowing the distribution, we can say how likely it is to observe a given value of a test statistic (corresponding to the estimated parameter value).
Under the full ideal conditions (i.e., the Gauss-Markov assumptions (A1) to (A4) and with normally distributed error terms (A5)) we know that the OLS estimator (i.e., the vector of parameter estimates) is distributed as
$$ \hat\beta \sim N\left(\beta, \; \sigma^2 (X'X)^{-1}\right) \qquad (3.1) $$
Thus, each element in the vector $\hat\beta$ is normally distributed:
$$ \hat\beta_j \sim N\left(\beta_j, \; \sigma^2 c_{jj}\right) $$
where cjj is the (j, j) element in (X´X)-1.
Typically σ is unknown, but it can be estimated from the (estimated) residuals:
$$ \hat\sigma^2 = \frac{1}{N-(k+1)} \sum_{i=1}^{N} \hat u_i^2 $$
Based on the estimate of σ we can obtain the t-statistic:
$$ t_j = \frac{\hat\beta_j - \beta_j^0}{se(\hat\beta_j)} = \frac{\hat\beta_j - \beta_j^0}{\hat\sigma \sqrt{c_{jj}}} $$
This test statistic can be computed from
- the estimate $\hat\beta_j$ and
- its standard error $se(\hat\beta_j) = \hat\sigma \sqrt{c_{jj}}$.
The t-statistic follows a Student’s t-distribution with N-(k+1) degrees of freedom
(→ Appendices t-statistic and Distribution Theory)
If a regression has k regressors and an intercept we obtain k+1 individual t-statistics, i.e., one for every slope coefficient plus one for the intercept.
Example
Consider the OLS estimation:
$$ \text{stock price} = \alpha + \beta \cdot EPS + \varepsilon $$
The null hypothesis H0: β = 0 states that EPS do not have the assumed effect on the stock price.
In contrast, if we can reject the null hypothesis H0: β = 0, we conclude in favor of the alternative hypothesis, H1: β ≠ 0.
Then EPS are related to stock prices.
Significance Level
The usual testing strategy is to reject H0 if the test statistic tj becomes so large (in absolute value) that it is very unlikely given that H0 is true.
Mathematically speaking, H0 is rejected if the probability of observing such a large absolute value |tj| is quite small, i.e., below some threshold α (e.g., 5%).
This threshold probability α is called the significance level.
Critical values (two-sided test)
Formally, we define a critical probability α (= significance level) and derive a critical value of the t-statistic, t_{N-(k+1); α/2}, using
Prob { |tj| > t_{N-(k+1); α/2} } = α
Then, if H0 holds,
H0 will be correctly accepted (i.e., not rejected) with probability (1 – α).
H0 will be erroneously rejected with probability α.
[Figure: density f(t) of the t-distribution; acceptance region with probability 1-α between -t_{N-(k+1); α/2} and t_{N-(k+1); α/2}; rejection regions with probability α/2 in each tail]
Critical value (one-sided test)
For a right-sided test, the critical value is determined from
Prob { tj > t_{N-(k+1); α} } = α
It is important to note that a one-sided test makes it "easier" to reject the null hypothesis because it requires a smaller critical value to reject, i.e., t_{N-(k+1); α} < t_{N-(k+1); α/2}.
[Figure: density f(t) of the t-distribution; acceptance region with probability 1-α to the left of t_{N-(k+1); α}; rejection region with probability α in the right tail]
p-values
Basic question: Which significance level should be used?
Different researchers prefer different significance levels, depending on the particular application.
There is no "correct" significance level.
Typically, for large N we require smaller α.
Rather than testing at different significance levels, one can ask:
What is the smallest significance level at which H0 would be rejected?
This is the p-value.
In other words, the p-value is the probability of observing a t-statistic at least as extreme as the one obtained if the null hypothesis is true.
Example
If the p-value = 0.6, then we expect to observe a value of the t-statistic at least as extreme as ours in 60% of all random samples if H0 is true.
In other words, it is not unusual to observe such a t-statistic (given H0 is true).
In contrast, if the p-value is 0.01, there is only a 1% chance of observing such a t-statistic (given H0 is true).
In other words, it is very unlikely to observe such a t-statistic if H0 is true, which is strong evidence against H0.
In general, the smaller the p-value, the stronger the evidence against H0.
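Stata reports p-values directly. To make the definition concrete, the following Python sketch computes a two-sided p-value by numerically integrating the density of the t-distribution with Simpson's rule. The function names and the number of integration steps are illustrative assumptions, not part of the course material.

```python
import math


def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def two_sided_p_value(t_stat, df, steps=20000):
    """Prob{|T| > |t_stat|}, using Simpson's rule on [0, |t_stat|]."""
    b = abs(t_stat)
    if b == 0:
        return 1.0
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = 2 * total * h / 3  # Prob{-b < T < b}, by symmetry
    return 1 - central


# With 3 degrees of freedom, |t| = 3.182 is the two-sided 5% critical value.
print(f"{two_sided_p_value(3.182446, 3):.4f}")
# With 30 degrees of freedom the same 5% level is reached at |t| = 2.042.
print(f"{two_sided_p_value(2.042272, 30):.4f}")
```

The two calls show that the same significance level corresponds to very different critical values when the number of degrees of freedom N-(k+1) is small.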
When a hypothesis
$$ H_0: \beta_j = \beta_j^0 \quad \text{vs.} \quad H_1: \beta_j \neq \beta_j^0 $$
is statistically tested, two types of errors can be made:
1. H0 is rejected, although it is true (i.e. H0 is erroneously rejected) → Type I error
2. H0 is not rejected, although H1 is true (i.e. H0 is erroneously accepted) → Type II error
Decision \ Reality             βj = βj0 (H0 true)    βj ≠ βj0 (H1 true)
βj = βj0 (H0 not rejected)     correct               Type II error
βj ≠ βj0 (H0 rejected)         Type I error          correct
Example
Recall the regression output from the multiples application:
$$ PER_i = \beta_0 + \beta_1 \, growth_i + \beta_2 \, risk_i + \varepsilon_i $$
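The regression output itself is not reproduced in this transcript. As a hedged substitute, the following Python sketch recomputes the slope coefficients, their standard errors and t-statistics from the six peer-group observations of section 2.2 (growth and risk in percent). It uses the closed-form inverse of the 2×2 matrix of centered cross products for the slope entries of (X'X)^-1; all variable names are illustrative.

```python
import math

per = [29.18, 43.65, 24.31, 16.24, 21.76, 8.96]
growth = [9.5, 15.5, 11.0, 7.5, 14.0, 3.5]
risk = [20.58, 21.88, 22.92, 23.66, 24.08, 24.70]
n, k = len(per), 2


def mean(values):
    return sum(values) / len(values)


def cross(a, b):
    """Sum of cross products of deviations from the means."""
    a_bar, b_bar = mean(a), mean(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))


s_gg, s_rr, s_gr = cross(growth, growth), cross(risk, risk), cross(growth, risk)
s_gy, s_ry = cross(growth, per), cross(risk, per)
det = s_gg * s_rr - s_gr ** 2

# OLS coefficients.
b_growth = (s_gy * s_rr - s_gr * s_ry) / det
b_risk = (s_gg * s_ry - s_gr * s_gy) / det
b_const = mean(per) - b_growth * mean(growth) - b_risk * mean(risk)

# Residual variance with N - (k + 1) degrees of freedom.
residuals = [y - (b_const + b_growth * g + b_risk * r)
             for y, g, r in zip(per, growth, risk)]
sigma2_hat = sum(u * u for u in residuals) / (n - (k + 1))

# Slope entries c_jj of (X'X)^-1 from the centered cross products.
c_growth = s_rr / det
c_risk = s_gg / det

se_growth = math.sqrt(sigma2_hat * c_growth)
se_risk = math.sqrt(sigma2_hat * c_risk)
t_growth = b_growth / se_growth  # H0: beta_1 = 0
t_risk = b_risk / se_risk        # H0: beta_2 = 0

print(f"growth: coef = {b_growth:.4f}, se = {se_growth:.4f}, t = {t_growth:.4f}")
print(f"risk:   coef = {b_risk:.4f}, se = {se_risk:.4f}, t = {t_risk:.4f}")
```

With only six observations and three estimated parameters, the t-statistics have N-(k+1) = 3 degrees of freedom, so the critical values are much larger than in large samples.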
3. Chapter Summary
A hypothesis test uses statistics to decide whether the data are compatible with a hypothesis.
If the observed values are unlikely under the assumption of H0, this indicates that H0 does not hold.
In regression analysis, one obtains for every coefficient, including the intercept, a t-statistic
$$ t_j = \frac{\hat\beta_j - \beta_j^0}{\hat\sigma \sqrt{c_{jj}}} , $$
which is t-distributed with N-(k+1) degrees of freedom.
These t-statistics, or equivalently the corresponding p-values, are used in two-sided or one-sided hypothesis tests.
The probability of a Type I error (H0 is erroneously rejected) is determined by choosing the significance level α of the test. The probability of a Type II error (H0 is not rejected although it is not true) depends on the true parameters.
Statistical Appendix
Appendix: t-statistic
From (3.1) it follows that we can standardize $\hat\beta_j$, i.e.,
$$ z = \frac{\hat\beta_j - \beta_j^0}{\sigma \sqrt{c_{jj}}} $$
Then, the resulting variable z has a standard normal distribution, i.e. z ~ N(0,1).
However, to obtain the above expression we have to assume that σ is known, which is typically not the case in practice.
We therefore have to substitute σ (resp. σ2) by an appropriate estimate, i.e.,
$$ \hat\sigma^2 = \frac{1}{N-(k+1)} \sum_{i=1}^{N} \hat u_i^2 $$
But then the above ratio z changes and no longer follows exactly a standard normal distribution.
It can be shown that the estimate $\hat\sigma^2$ is independent of $\hat\beta$ and that, suitably scaled, it has a
Chi-squared distribution with N-(k+1) degrees of freedom:
$\frac{(N-(k+1)) \, \hat\sigma^2}{\sigma^2} \sim \chi^2_{N-(k+1)}$
Therefore, the t-statistic (i.e., the z-statistic after substituting σ with an estimate)
$t_j = \frac{\hat\beta_j - \beta_j^0}{\hat\sigma \sqrt{c_{jj}}}$
is the ratio of a standard normal variable and the square root of an independent
Chi-squared variable divided by its degrees of freedom.
This ratio has a t-distribution (Student’s t-distribution) with N-(k+1) degrees of
freedom.
(→ Appendix: Distribution Theory)
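To make the construction of the t-statistic concrete, the following sketch computes the coefficient estimates, the variance estimate and the t-statistics by hand. It is written in Python with NumPy purely for illustration (the course itself works in Stata). The data are made up, the function name `ols_t_stats` is hypothetical, and the sketch assumes that c_jj denotes the j-th diagonal element of (X'X)^(-1).

```python
import numpy as np

def ols_t_stats(X, y, beta_null=0.0):
    """Return OLS coefficients, sigma_hat^2 and t-statistics for H0: beta_j = beta_null."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape                          # p = k + 1 columns (intercept included)
    xtx_inv = np.linalg.inv(X.T @ X)        # its diagonal holds the c_jj
    beta_hat = xtx_inv @ X.T @ y
    resid = y - X @ beta_hat
    sigma2_hat = resid @ resid / (n - p)    # divide by N - (k + 1)
    se = np.sqrt(sigma2_hat * np.diag(xtx_inv))
    t = (beta_hat - beta_null) / se
    return beta_hat, sigma2_hat, t

# Hypothetical data: stock price explained by EPS (values are made up)
eps = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
price = np.array([12.0, 19.0, 33.0, 38.0, 52.0, 58.0])
X = np.column_stack([np.ones_like(eps), eps])

beta_hat, sigma2_hat, t = ols_t_stats(X, price)
print("coefficients:", beta_hat)
print("sigma_hat^2:", sigma2_hat)
print("t-statistics:", t)
```

Each t-statistic would then be compared with the critical value of the t-distribution with N-(k+1) degrees of freedom (here 6 - 2 = 4).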
Example
Consider the OLS estimation:
stock price = α + β · EPS + ε
The null hypothesis H0: β = 0 states that EPS do not have the assumed
effect on the stock price.
In contrast, if the null hypothesis H0: β = 0 does not hold, we can say that
the alternative hypothesis, H1: β ≠ 0, holds. Then EPS are related to stock
prices.
Appendix: Distribution Theory
A normally distributed variable follows a standard normal distribution if µ = 0
and σ = 1. The density function of such a variable is given by
$\phi(z) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2} z^2\right)$
If z1, …, zJ is a set of independent normal variables with mean µ and
variance σ², it follows that the sum of the squared standardized variables
$\xi = \sum_{j=1}^{J} \frac{(z_j - \mu)^2}{\sigma^2}$
follows a Chi-squared distribution with
J degrees of freedom.
[Figure: densities of the Chi-squared distribution with 5, 10 and 30 degrees of freedom]
If z has a standard normal distribution, z ~ N(0,1),
and $\xi \sim \chi^2_J$,
and z and ξ are independent,
the ratio $t = \frac{z}{\sqrt{\xi / J}}$ has a t-distribution with J degrees of freedom.
If J approaches infinity, the t-distribution approaches the standard normal distribution.
[Figure: densities of t(1), t(4) and t(∞) = N(0,1)]
4. Variable selection
4.1 Dummy variables
4.2 More or less variables?
4.3 Proxies for unknown variables
Literature: Wooldridge, chapter 7
4.1 Dummy Variables
Dummy variables provide a powerful concept for empirical work.
On the one hand, they allow us to incorporate qualitative factors, e.g.
industry of a firm (manufacturing, retail, etc.)
member of an index (S&P500, DJ, …),
...
On the other hand, they allow us to account for important differences between
observations, e.g.,
state of the economy (expansion vs. recession)
asset characteristics (fixed vs. current; long-term vs. short-term, …)
Since the “ordering” of such qualitative variables conveys no useful
information, they are typically coded in binary form (e.g., 1 = firm belongs
to manufacturing industry, 0 = does not belong to manufacturing industry;
or: 1 = economy is in a recession, 0 = expansion).
How to create dummy variables?
In defining a dummy variable, we must decide which event is assigned the
value one and which is assigned the value zero.
For example, we distinguish firms by whether or not they belong to the S&P 500 index.
For every observation in our data set we have to compute a value for the
following dummy variable:
$index_{i,t} = \begin{cases} 1, & \text{if firm } i \text{ belongs to the S\&P 500 at date } t \\ 0, & \text{else} \end{cases}$
Example
Assume we have two datasets
(1) EPS_data.dta (containing the variables: Year FirmIdent EPS)
(2) INDEX_data.dta (containing the variables: Year FirmIdent IndexName)
For every observation in our EPS data set we want to compute a dummy
variable called “index” indicating whether a firm in a given year is a member of
the S&P500:
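* load the EPS data (master data set)
use EPS_data.dta, clear
* add IndexName by firm-year; assumes one row per Year-FirmIdent in both files
merge 1:1 Year FirmIdent using INDEX_data.dta, keep(master match)
* dummy: 1 = S&P500 member in that year, 0 = otherwise
gen index = 0
replace index = 1 if IndexName == "S&P500"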
How do we incorporate dummy variables into regression models?
We can use them just like any other variable
For example, in the simplest case of only a single dummy explanatory
variable, we just add the dummy variable as an additional independent
variable:
stock price = β0 + β1 · index + β2 · EPS + ε   (4.4)
How to interpret the results?
stock price = β0 + β1 · index + β2 · EPS + ε
In model (4.4), only two observed factors affect the stock price:
earnings per share (EPS) and index membership (index).
with index = 1 when the firm belongs to the S&P500
index = 0 when the firm is not listed in the S&P500
Then β1 captures the difference in expected stock prices between stocks belonging
to the index and those not belonging to it, given the same earnings per share:
$\beta_1 = E[\text{stock price} \mid index = 1, EPS] - E[\text{stock price} \mid index = 0, EPS]$
The results can be depicted graphically as an intercept shift between two
regression lines (with the same slope β2).
[Figure: stock price plotted against EPS; two parallel lines with slope β2: stock = β0 + β2 EPS with intercept β0 (non-members) and stock = β0 + β1 index + β2 EPS with intercept β0 + β1 (members)]
On a large sample for the year 1990 (stock_eps_1990.dta) we obtain:
[Regression output not reproduced here]
From this regression we can conclude that stock prices tend to be higher
for S&P 500 Index firms (at the same level of earnings per share).
In our example, we have chosen firms not belonging to the S&P500 to be the
base group or benchmark group (by assigning the value 0 to those firms).
The base group is defined as the group against which comparisons are made.
We could run the same regression slightly differently by dropping the overall
intercept in the model and including a dummy variable for each group:
stock price = β0 · non index + β1 · index + β2 · EPS + ε
with
index = 1 if the firm belongs to the S&P500, 0 else
non index = 1 if the firm does not belong to the S&P500, 0 else
The resulting regression lines are identical, but the coefficients will be different
and have to be interpreted differently:
β0 = intercept for non-members (as before)
β1 = intercept for members (before β0 + β1)
[Figure: stock price plotted against EPS; two parallel lines with slope β2: stock = β0 non index + β2 EPS with intercept β0 and stock = β1 index + β2 EPS with intercept β1]
(1) stock price = β0 + β1 · index + β2 · EPS + ε
(2) stock price = β0 · non index + β1 · index + β2 · EPS + ε
Why should we use one or the other specification?
Specification (2) is preferable if you are interested in the values of the
intercepts of each group. But you can also find this out with specification (1).
Specification (1) is preferable if you are interested in whether the intercepts
differ significantly between the groups. Then you just need to look at the p-value of β1.
From specification (2) this cannot be seen without additional testing.
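That the two specifications describe the same regression lines can be checked numerically. The sketch below is in Python with NumPy (for illustration only, the course uses Stata) and uses made-up data. It estimates both specifications and prints the coefficients: the intercept for members in specification (2) should equal β0 + β1 from specification (1), and the EPS slope should be the same in both.

```python
import numpy as np

# Made-up sample: index membership (0/1), EPS and stock prices
index = np.array([1, 0, 0, 1, 0, 1, 0, 1], dtype=float)
eps = np.array([5.0, 2.0, 1.0, 4.0, 3.0, 2.5, 1.5, 3.5])
price = np.array([95.0, 30.0, 18.0, 80.0, 44.0, 60.0, 25.0, 71.0])
non_index = 1.0 - index

# Specification (1): overall intercept + index dummy + EPS
X1 = np.column_stack([np.ones_like(eps), index, eps])
b1, *_ = np.linalg.lstsq(X1, price, rcond=None)

# Specification (2): no overall intercept, one dummy per group + EPS
X2 = np.column_stack([non_index, index, eps])
b2, *_ = np.linalg.lstsq(X2, price, rcond=None)

print("spec (1): beta0, beta1, beta2 =", b1)
print("spec (2): beta0, beta1, beta2 =", b2)
print("intercept for members from spec (1):", b1[0] + b1[1])
```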
Can we also estimate the following model?
(3) stock price = β0 + β1 · non index + β2 · index + β3 · EPS + ε
In this case, the matrix X would look like:
i    constant (β0)    non index (β1)    index (β2)    EPS (β3)
1    1                0                 1             5
2    1                1                 0             2
3    1                1                 0             1
4    1                0                 1             4
…    …                …                 …             …
→ exact multicollinearity: constant = non index + index
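The linear dependence can be verified directly on the four rows of X shown on this slide (constant, non index, index, EPS). The following Python/NumPy sketch (illustrative only, not part of the course's Stata material) checks the rank of X and shows that dropping one of the two dummies restores full column rank.

```python
import numpy as np

# Columns: constant, non index, index, EPS (the four rows shown on the slide)
X = np.array([
    [1, 0, 1, 5],
    [1, 1, 0, 2],
    [1, 1, 0, 1],
    [1, 0, 1, 4],
], dtype=float)

rank = np.linalg.matrix_rank(X)
constant_is_sum = np.array_equal(X[:, 0], X[:, 1] + X[:, 2])
print("columns:", X.shape[1], "rank:", rank, "constant = non index + index:", constant_is_sum)

# Dropping the "non index" column removes the linear dependence
X_ok = np.delete(X, 1, axis=1)
rank_ok = np.linalg.matrix_rank(X_ok)
print("columns:", X_ok.shape[1], "rank:", rank_ok)
```

Because the rank of X is smaller than its number of columns, X'X cannot be inverted and the OLS estimator is not defined for model (3).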
Dummy Variables for multiple categories
What happens if a variable has more than 2 categories,
for example, g industry groups?
Then the regression model needs to produce different intercepts for
these g categories, by
(1) including g -1 dummy variables along with an overall intercept, or
(2) including g dummy variables without an overall intercept.
Again in case (1), the intercept for the base group is the overall intercept in
the model and the g - 1 dummy variable coefficients represent the estimated
differences between a particular group and the base group.
As in the previous two-category example, including g dummy variables along
with an intercept results in exact multicollinearity.
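Variant (1), g - 1 dummies plus an overall intercept, can be sketched as follows. The example is plain Python for illustration; the helper name `make_dummies` and the industry labels are hypothetical.

```python
def make_dummies(categories, base):
    """Build g-1 dummy columns (0/1), leaving out the base group."""
    groups = sorted(set(categories))
    if base not in groups:
        raise ValueError("base group must occur in the data")
    return {
        g: [1 if c == g else 0 for c in categories]
        for g in groups if g != base
    }

# Hypothetical industry labels for six firms (g = 3 groups)
industry = ["retail", "manufacturing", "banking", "retail", "banking", "manufacturing"]
dummies = make_dummies(industry, base="manufacturing")
for name, column in dummies.items():
    print(name, column)
```

With "manufacturing" as the base group, the coefficients on the "banking" and "retail" dummies would measure the intercept difference of each group relative to manufacturing firms.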
Interaction terms
A further powerful application of dummy variables is to interact them with
explanatory variables to allow for a difference in slopes.
We construct a new variable by multiplying the dummy with another
independent variable:
interact_i = index_i · EPS_i
For example, we want to test whether an increase in earnings per share
affects S&P500 firms more than non S&P500 firms.
In addition, we could still allow for a constant stock price differential between the
two groups:
stock price = β0 + β1 · index + β2 · EPS + β3 · interact + ε
With
stock price = β0 + β1 · index + β2 · EPS + β3 · interact + ε
we can allow for
(1) different intercepts across the 2 groups
β1: measures the difference in intercepts between index and non index firms
(2) different slopes across the 2 groups
β3: measures the difference in the strength of the impact of EPS
E.g., if β1 > 0 and β3 > 0 we get:
[Figure: stock price plotted against EPS; the line for S&P500 firms has a higher intercept and a steeper slope than the line for non S&P500 firms]
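Writing out the conditional expectation of the model stock price = β0 + β1 · index + β2 · EPS + β3 · interact + ε for each group shows why β1 shifts the intercept and β3 shifts the slope. This only restates the model with interact = index · EPS, assuming E[ε | index, EPS] = 0:

```latex
\begin{aligned}
E[\text{stock price} \mid index = 0, EPS] &= \beta_0 + \beta_2 \, EPS \\
E[\text{stock price} \mid index = 1, EPS] &= (\beta_0 + \beta_1) + (\beta_2 + \beta_3) \, EPS
\end{aligned}
```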
4.2 More or less variables?
Literature: Wooldridge, chapter 3
Selecting Regressors
To find potentially relevant variables, economic theory should be used.
For example, when specifying expected returns we will use finance theory.
(CAPM, Fama French, … )
Although it is sometimes suggested to select variables on the basis of
statistical arguments (i.e., tests), such arguments never provide certainty.
Remember, there is always a possibility of a type I or II error.
Selecting regressors solely based on statistical test sequences
is sometimes referred to as data mining.
Selection Biases
What happens …
(1) when a relevant variable is excluded from the model?
(2) when an irrelevant variable is included in the model?
(3) when the included variables are (highly) correlated?
(1) Omitting relevant variables
Consider the following two models
$y_i = \alpha + \beta x_i + \gamma z_i + \varepsilon_i$   (4.2)
and
$y_i = a + b x_i + u_i$   (4.3)
Both models can be interpreted as describing the conditional expectations
of yi given xi, zi (and maybe some additional variables).
We say model (4.3) is nested in (4.2) and implicitly assumes that zi is
irrelevant (γ = 0).
It can be shown that
$\hat b = \hat\beta + \hat\gamma \, \hat\delta$
where $\hat\delta$ is the slope coefficient from an auxiliary regression of z on x.
From this relationship, it follows that $\hat b$ and $\hat\beta$ are equal if one of
the following conditions is fulfilled:
1.) The partial effect of z on y is zero in the sample. That is, $\hat\gamma = 0$.
2.) x and z are uncorrelated in the sample. That is, $\hat\delta = 0$.
The second term, i.e., γδ, is the omitted variable bias,
i.e., the bias in the OLS estimator b due to estimating an incomplete model.
Omitted variable bias
There will be no omitted variable bias (i.e., the estimator b is unbiased)
in two cases:
(1) If γ = 0, which implies that the omitted variable (zi) has no influence
on y and thus the two models are identical.
(2) If Cov(x,z) = 0 or if E[x∙z] = E[x]∙E[z]
This implies that x and z are uncorrelated.
In this case, x and z are said to be orthogonal.
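The in-sample relationship between the short and the long regression (slope of the short regression = slope of the long regression + coefficient of the omitted variable times the slope of the auxiliary regression of z on x) can be checked on simulated data. The sketch below uses Python with NumPy for illustration; the data-generating values and the helper name `ols` are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
z = 0.8 * x + rng.normal(size=n)            # z is correlated with x
y = 1.0 + 2.0 * x + 1.5 * z + rng.normal(size=n)

def ols(columns, target):
    """OLS with an intercept; returns [intercept, slopes...]."""
    X = np.column_stack([np.ones(len(target))] + list(columns))
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coef

_, beta_hat, gamma_hat = ols([x, z], y)     # long regression, model (4.2)
_, b_hat = ols([x], y)                      # short regression, model (4.3)
_, delta_hat = ols([x], z)                  # auxiliary regression of z on x

print("b_hat:                     ", b_hat)
print("beta_hat + gamma_hat*delta:", beta_hat + gamma_hat * delta_hat)
print("omitted variable term:     ", gamma_hat * delta_hat)
```

The first two printed numbers coincide up to rounding, because the decomposition is an algebraic property of OLS and does not depend on the particular sample.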
Example:
Recall the two model specifications from section 4.1:
(1) stock price = β0 + β1 · index + β2 · EPS + β3 · interact + ε
    estimates: β1 = 15.09, β2 = 15.46, β3 = 5.20
and
(2) stock price = b0 + b2 · EPS + b3 · interact + ε
    estimates: b2 = 15.45, b3 = 13.67
Assume (1) is the true model, but we have estimated (2). Then the omitted
variable bias is:
OVB = 13.67 - 5.20 = 8.47
Omitting “index” leads to an overestimate of the effect of “interact”.
(2) Including irrelevant variables
Consider again
$y_i = \alpha + \beta x_i + \gamma z_i + \varepsilon_i$   (4.2)
and
$y_i = a + b x_i + u_i$   (4.3)
But now assume that we estimate (4.2) while in fact model (4.3) is
appropriate, i.e., we include an irrelevant variable (zi).
In this case, the estimator for β is unbiased (since γ is zero).
But usually the estimated β will have a higher variance (and thus may
be insignificant although the xi variables are relevant ones).
(3) Inclusion of highly correlated variables (Multicollinearity)
In general you can include variables in your model that are correlated.
In our stock price EPS example we may want to include both a S&P500
dummy and the size variable.
However, if the correlation between two variables is too high, this may
lead to problems.
→ Technically, the matrix X'X is close to being non-invertible (singular)
→ It may be hard for the model to identify the individual impact
of one variable
In the extreme case, one explanatory variable is an exact linear
combination of one or more other explanatory variables (including the
intercept). Then the estimation procedure will break down.
→ This is referred to as exact multicollinearity
Consider the following example:
$y = \beta_1 x_1 + \beta_2 x_2 + \varepsilon$
Assume that all variables are demeaned, i.e.: $\bar y = \bar x_1 = \bar x_2 = 0$
Moreover, assume that the sample variances of x1 and x2 are equal to 1,
while the sample covariance is r12.
Then, the variance of the OLS estimator can be written as
$Var(\hat\beta) = \sigma^2 \left( N \begin{pmatrix} 1 & r_{12} \\ r_{12} & 1 \end{pmatrix} \right)^{-1} = \frac{\sigma^2}{N \, (1 - r_{12}^2)} \begin{pmatrix} 1 & -r_{12} \\ -r_{12} & 1 \end{pmatrix}$
→ the variances of $\hat\beta_1$ and $\hat\beta_2$ increase for larger |r12|,
and therefore, the t-statistics will be lower.
→ if r12 is positive, $\hat\beta_1$ and $\hat\beta_2$ will be negatively correlated.
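In the two-regressor setup above (demeaned regressors with unit sample variance and sample covariance r12), the variance of each slope estimator is σ²/(N(1 - r12²)), so the factor 1/(1 - r12²) measures how much the correlation inflates the variance relative to uncorrelated regressors. A small Python sketch tabulates this factor; the function name `variance_factor` is hypothetical.

```python
def variance_factor(r12):
    """Factor 1 / (1 - r12^2) that scales Var(beta_hat_j) relative to r12 = 0."""
    if not -1.0 < r12 < 1.0:
        raise ValueError("r12 must lie strictly between -1 and 1")
    return 1.0 / (1.0 - r12 ** 2)

for r12 in (0.0, 0.5, 0.9, 0.99):
    print(f"r12 = {r12:4.2f}  variance factor = {variance_factor(r12):7.2f}")
```

The factor grows without bound as |r12| approaches 1, which corresponds to the limiting case of exact multicollinearity.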
Example: the coefficient becomes smaller and insignificant when the variable RevT,t is included.
Source: Gilbert (2010), Information Aggregation Around Macroeconomic Announcements: Revisions Matter
Trade off
→ Including as many variables as possible in a model is not a good strategy
since it may produce insignificant estimates for the relevant variables.
→ Including too few variables is not a good strategy either
since it may produce biased estimates.
4.3 Proxies for unknown variables
Literature: Wooldridge, Chapter 6 and 9
Many economic concepts are difficult to measure. Therefore, we need
proxies (= approximations) in order to work with them.
Example 1: “Risk” as a regressor
Risk is an abstract concept that needs a proper definition.
If a relevant variable is omitted, there could be an omitted variable bias.
Therefore, one needs proxies for variables that cannot be measured.
Idea: Include a measurable variable to capture the effect of risk.
Suppose we have
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$
where x2 is the unknown variable that measures risk.
Introduce a proxy x3 that correlates with x2. This is captured in the regression
$x_2 = \delta_0 + \delta_1 x_3 + \nu$
where ν is an error term. If δ1 is zero then x3 is not a suitable proxy. Else, use
x3 in the regression as if it were the unknown variable x2.
Common proxies for risk are the CAPM beta, the historic standard deviation, or
Value at Risk.
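With y = β0 + β1 x1 + β2 x2 + ε and the proxy relation x2 = δ0 + δ1 x3 + ν, substituting the second equation into the first shows what is actually estimated when x3 replaces x2. This is a short derivation from the two equations only:

```latex
\begin{aligned}
y &= \beta_0 + \beta_1 x_1 + \beta_2 (\delta_0 + \delta_1 x_3 + \nu) + \varepsilon \\
  &= (\beta_0 + \beta_2 \delta_0) + \beta_1 x_1 + \beta_2 \delta_1 \, x_3 + (\varepsilon + \beta_2 \nu)
\end{aligned}
```

The coefficient on x3 therefore measures β2 δ1 rather than β2 itself, and the error term of the estimated model additionally contains β2 ν.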
Example 2: “Macroeconomic conditions”
There is no perfect definition for the state of the economy (recession or
expansion). Still, the state is an important determinant in many models.
The most common proxies are
Macro indicators (CFNAI, NBER, XRIC)
Gross domestic product
Term or default spread
Industrial production
Different proxies might lead to different results. If a certain definition is used,
one should do a robustness check with another proxy to rule out that the effect
is driven by the choice of proxy.
Estimated coefficients are robust when using three alternative recession
measures (all three only ex-post observable): NBER, CFNAI, XRIC.
Source: Bestelmeyer/Hess (2011): Stock Price Responses to Unemployment News: State Dependence and the Effect of Cyclicality
5. The Data Set
5.1 Data quality
5.2 Summary measures
5.3 Identifying outliers
5.4 Plotting variables
Literature: Olsen (2003): “Data Quality”, Chapter 2.1, 2.3
5.1 Data quality
Financial data is available from multiple sources:
Bloomberg, Thomson Reuters, Yahoo Finance
Center for Research in Security Prices (CRSP)
Standard & Poor's Compustat (Compustat)
Statistisches Bundesamt (Germany), Federal Reserve (USA)
Surveys, controlled experiments, own data collection, etc.
The outcome of an empirical analysis often depends on data quality.
Data quality has several dimensions, in particular:
1. Accuracy: Is the given information precise? Does it contain errors?
Are the measurements correct?
2. Relevance: Does the data describe the given problem?
Does it represent the relevant population?
3. Understanding: Are all variables understood well?
4. Reliability: Is the source of the data reliable?
Was the data collected correctly?
5.2 Summary measures
Literature: Albright/Winston/Zappe (2003): “Data Analysis”, Chapter 2
Kohler/Kreuter (2006), “Datenanalyse mit Stata”, Chapter 7
Frequency Tables
A frequency table lists the number of observations of some variable that fall in
various categories. In Stata:
tabulate [varname]
In our example of the S&P500 indicator (stock_eps_1990.dta) we see
- 91.71% of the data describes non S&P500 stocks
- all 500 S&P stocks are included (good to know!)
Summarize
While tables are helpful for analyzing discrete variables (e.g. dummies),
continuous variables have too many values to be presented in a table. In this
case other descriptive statistics may be helpful, e.g. “summarize” or
“codebook”, which provide important statistics such as
min
max
range = max – min
mean: $\bar x = \frac{1}{N} \sum_{i=1}^{N} x_i$
median: $x_{(N+1)/2}$ (middle value of the ordered sample)
standard deviation: $\hat s = \sqrt{\frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar x)^2}$
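As a cross-check of these definitions, the statistics can be computed by hand on a small made-up sample. The sketch uses Python's standard library for illustration (it is not a replacement for the Stata commands of the course); the function name `summarize` and the EPS values are hypothetical.

```python
import statistics

def summarize(values):
    """Return min, max, range, mean, median and sample standard deviation."""
    return {
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),   # divides by N - 1
    }

# Hypothetical EPS values; the last one is a deliberately extreme observation
eps = [1.0, 2.0, 2.0, 3.0, 4.0, 30.0]
for name, value in summarize(eps).items():
    print(f"{name:>6}: {value:.3f}")
```

In this sample the mean lies far above the median, which is a first numerical hint that an extreme observation is present.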
Summary statistics
In Stata, summary statistics are obtained by
summarize [varname]
Adding the option “, d” (short for detail) additionally reports percentiles.
5.3 Identifying outliers
Literature: Albright/Winston/Zappe (2003): “Data Analysis”, Chapter 2
Kohler/Kreuter (2006), “Datenanalyse mit Stata”, Chapter 7
Outliers
Outliers are observations that differ strongly from the other observations.
From a practical perspective, outlying observations can occur for two reasons
errors in the data (typos, wrong format, …)
correct but strongly heterogeneous data (e.g. small sample but some
members are quite “different” in relevant aspects)
Dropping or keeping outliers may change estimation results substantially.
But the decision to keep or drop such observations is difficult
and statistical properties of the resulting estimators are complicated.
Detecting outliers:
Max and min give a hint on the existence of outliers.
Graphical inspection gives more hints on the existence of outliers.
In small samples, outliers may lead to misleading results, i.e. more or less
strongly biased coefficient estimates which may be insignificant.
In large samples their marginal influence on the estimates is smaller and the
application of regression analysis is less problematic.
Example:
When comparing PE-ratios of German automobile manufacturers in November
2008, Volkswagen AG is an outlier due to its unusually high stock price
(maximum was €1005.01 on 28.10.2008).
In a regression of the PE-ratio on payout, risk and growth, the coefficients
would be strongly biased.
But it’s not a typo!
So, should we keep or drop VW?
5.4 Plotting variables
Literature: Albright/Winston/Zappe (2003): “Data Analysis”, Chapter 2
Kohler/Kreuter (2006), “Datenanalyse mit Stata”, Chapter 7
Graphical inspection tools
A “look at the data” can also be very helpful, for example with histograms or
scatter plots.
Histograms
A histogram is a bar chart of the frequencies of the categories and comes in handy
when dealing with continuous variables. The size of the categories (or
intervals) has to be selected by hand.
It approximates the empirical density function and gives an idea of the range,
distribution, mean and variance of the data.
In Stata a histogram is obtained by
hist [varname], bin(n)
n chooses the number of categories.
In Stata conditions (if) can be added to almost every command. For example,
we can reduce the sample to S&P500 firms in the following way:
hist eps if SP500ind ==1, bin(50)
Example
[Figure: histogram of earnings per share in 1990 for our S&P500 firms]
The mean is above zero, meaning that the average S&P500 firm has positive
earnings. Most of the data is spread around the mean. The observations on
the far left and right indicate outliers (-3, 6, 9).
Scatter Plots
A useful way to picture the relationship between two variables is to plot a point
for each observation, where the coordinates of the points represent the values
of the two variables. By examining the points of this plot, called a scatter plot, we
can usually see whether there is any relationship between the two variables,
and if so, what type of relationship it is.
[Figure: two scatter plots; one shows a positive linear relation, the other shows no relation]
Example
We examine the relation between stock prices and EPS for S&P500 firms but
exclude firms with EPS higher than 5 to obtain a better plot. The Stata
command is
sc [var1] [var2]
By adding
if eps <5 & SP500ind ==1
to the command we can exclude those observations with an eps higher than 5.
We find a positive linear relation between the variables.
5. Chapter Summary
Data quality has several dimensions, in particular:
accuracy, relevance, understanding, reliability
Outliers are observations that strongly differ from the population.
Their existence influences regression estimates.
Always start with a visual inspection of your data
The distribution can be inspected with tables and histograms.
The relation between two variables can be displayed using a scatter plot.
Summary measures give numeric information about the variables.
6. Application: CAPM
6.1 Theoretical background
6.2 CAPM tests & extensions
6.3 Empirical exercise
Literature: Copeland/Weston/Shastri (2005), chapter 6.
Ross/Westerfield/Jaffe (2005), chapter 9 and 10.
6.1 Theoretical Background
Security market line (SML)
The SML gives the expected return of individual (possibly inefficient) assets
as a function of their relevant risk (= β risk):
$E[r_i] = r_f + \beta_i \, (E[r_M] - r_f)$
with
$\beta_i = \frac{cov(r_i, r_M)}{Var(r_M)} = \rho_{i,M} \, \frac{\sigma_i}{\sigma_M}$
β measures systematic risk of an asset,
i.e. the risk that cannot be eliminated
through diversification.
This non-diversifiable risk (β risk)
determines the expected return on assets.
In equilibrium all assets yield expected excess returns
proportional to their non-diversifiable risk,
i.e., expected returns must be on the SML.
[Figure: security market line with E[ri] on the vertical axis and βi on the horizontal axis; the line starts at rf and passes through the market portfolio M at βM = 1 and E[rM]]
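The two SML ingredients, beta as cov(r_i, r_M)/Var(r_M) and the expected return r_f + β_i (E[r_M] - r_f), can be illustrated with a few lines of plain Python. All return figures are made up, and the function names `beta` and `sml_expected_return` are hypothetical.

```python
def beta(asset_returns, market_returns):
    """beta_i = cov(r_i, r_M) / Var(r_M), using sample moments."""
    n = len(market_returns)
    mean_i = sum(asset_returns) / n
    mean_m = sum(market_returns) / n
    cov = sum((ri - mean_i) * (rm - mean_m)
              for ri, rm in zip(asset_returns, market_returns)) / (n - 1)
    var_m = sum((rm - mean_m) ** 2 for rm in market_returns) / (n - 1)
    return cov / var_m

def sml_expected_return(beta_i, rf, expected_market_return):
    """E[r_i] = r_f + beta_i * (E[r_M] - r_f)."""
    return rf + beta_i * (expected_market_return - rf)

# Made-up returns: the asset moves exactly 1.5 times as much as the market
market = [0.02, -0.01, 0.03, 0.00, 0.01]
asset = [1.5 * r for r in market]

b = beta(asset, market)
print("beta:", b)
print("expected return on the SML:", sml_expected_return(b, rf=0.02, expected_market_return=0.08))
```

An asset with β = 1 lies on the SML at the expected market return, and an asset with β = 0 earns the risk-free rate.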
Testing the CAPM?
The CAPM (in particular, the SML) predicts that expected returns on individual
assets are driven by their expected relevant risk.
In empirical research, we can observe only ex-post realized returns of different
assets over a number of periods.
However, the SML is an ex-ante equality in terms of unobserved expectations:
$E[r_i] - r_f = (E[r_M] - r_f) \, \beta_i$
This implies that the cross-section of expected returns is explained by the
expected betas.
For a valid test of the CAPM we would need expectations data, i.e., investors’
return expectations for different stocks but also their expected betas (as beta is
presumably non-constant over time):
$y_i = E[r_i] - r_f$ and $x_i = E[\beta_i]$
6.2 CAPM tests & extensions
Literature: Haugen (2001), chapter 9
Fama/MacBeth (1973)
First study which provides an (approximate) test of the CAPM.
Main Problem: no historical expectations data available.
For example, we simply don’t know what market participants expected
in 1965 about the return and risk of IBM over the next year.
But what could you do if you need to estimate an asset’s β
for the year ahead?
For example, the betas you can obtain from Bloomberg, Reuters, …
are simply estimated from the return data of the last (n) year(s).
E.g., IBM’s β for 2012 can be estimated using returns during 2011:
$\hat\beta_{2012} = \frac{cov(\text{Returns of IBM during 2011}, \ \text{Market returns during 2011})}{Var(\text{Market returns during 2011})}$
Can you get a better prediction? Not very likely!
So why not estimate β for 1965 from market returns during 1964?
Fama/MacBeth use a “rolling window” technique to obtain proxies for
investors’ β expectations given the available information at a point in
time:
[Figure: timeline from t=0 to t=10. The first 5 years of data (t=0 to t=5) are used to estimate $\hat\beta_{i,1-5}$, which serves as the forecast $E_5[\beta_{i,6-10}] = \hat\beta_{i,1-5}$. The next 5 years of data (t=5 to t=10) are used to estimate $\hat\beta_{i,6-10}$, which serves as the forecast $E_{10}[\beta_{i,11-15}] = \hat\beta_{i,6-10}$.]
In addition, to reduce noise in their β predictions, Fama/MacBeth estimate
the β’s of portfolios (instead of individual assets).
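The rolling-window idea, estimating beta on one window and using it as the forecast for the following window, can be sketched in plain Python. The window length, the data and the function names `estimate_beta` and `rolling_beta_forecasts` are assumptions for illustration and do not reproduce the exact period lengths of the original study.

```python
def estimate_beta(asset, market):
    """Sample beta: cov(asset, market) / var(market)."""
    n = len(market)
    ma, mm = sum(asset) / n, sum(market) / n
    cov = sum((a - ma) * (m - mm) for a, m in zip(asset, market))
    var = sum((m - mm) ** 2 for m in market)
    return cov / var

def rolling_beta_forecasts(asset, market, window):
    """Use each non-overlapping window to forecast beta for the next window.

    Returns a list of (forecast_start, beta_hat) pairs, where beta_hat is
    estimated on observations [forecast_start - window, forecast_start).
    """
    forecasts = []
    for start in range(0, len(market) - window + 1, window):
        end = start + window
        forecasts.append((end, estimate_beta(asset[start:end], market[start:end])))
    return forecasts

# Made-up data: beta is 0.5 in the first window and 2.0 in the second
market = [0.01, -0.02, 0.03, 0.00, 0.02, -0.01, 0.04, 0.01]
asset = [0.5 * r for r in market[:4]] + [2.0 * r for r in market[4:]]

for forecast_start, beta_hat in rolling_beta_forecasts(asset, market, window=4):
    print(f"beta forecast for the window starting at t={forecast_start}: {beta_hat:.2f}")
```

Only information available up to the start of the forecast window enters each estimate, which is what makes the estimated beta usable as a proxy for the expectation formed at that date.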
Fama/MacBeth then regress
actually observed portfolio excess
returns over the next 5 years, i.e., $y_{i,t} = r_{i,t} - r_{f,t}$,
on predicted betas from the past 5 years, i.e., $x_{i,t} = \hat\beta_{i,t}$:
$r_{i,t} - r_{f,t} = a_0 + a_1 \hat\beta_{i,t} + \varepsilon_{i,t}$
[Figure: scatter plot of the excess return of portfolio i at time t (vertical axis) against the predicted beta of portfolio i at time t (horizontal axis)]
To determine whether the security market line exhibits any evidence of
nonlinearity, Fama/MacBeth add an additional term to their equation:
$r_{i,t} - r_{f,t} = a_0 + a_1 \hat\beta_{i,t} + a_2 \hat\beta_{i,t}^2 + \varepsilon_{i,t}$
According to the CAPM
→ The intercept a0 should be equal to or greater than zero: because the dependent
variable is an excess return, this corresponds to a zero-beta return that is equal to
or greater than the risk-free rate in the bond market.
→ The security market line should be linear, so the mean value for
the coefficient a2 should not be significantly different from zero.
→ The coefficient a1 should be equal to the average market risk premium,
i.e., E[rM] – rf, which is equal to the slope of the SML.
The CAPM also predicts that beta is the only determinant of expected security
returns.
Residual variance (variance not explained by beta) is supposedly unimportant
because it can be diversified away.
Fama/MacBeth test this prediction by including a residual variance term in the
relationship (RVi,t = remaining part of variance of portfolio i not explained by
beta):
$r_{i,t} - r_{f,t} = a_0 + a_1 \hat\beta_{i,t} + a_2 \hat\beta_{i,t}^2 + a_3 RV_{i,t} + \varepsilon_{i,t}$
According to the CAPM, this residual variance should not affect
the expected rate of return of a portfolio!
→ a3 should not be significantly different from zero.
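The estimation step can be sketched as a two-pass procedure: one cross-sectional regression of excess returns on predicted betas per period, followed by averaging the period-by-period coefficients and testing whether the averages differ from zero. The Python/NumPy code below is an illustrative sketch on simulated data under these assumptions; the function name `fama_macbeth` and all numbers are hypothetical, and only the simplest specification (intercept and beta) is shown.

```python
import numpy as np

def fama_macbeth(excess_returns, predicted_betas):
    """Cross-sectional OLS per period, then average the coefficients.

    excess_returns, predicted_betas: arrays of shape (T periods, n portfolios).
    Returns the mean coefficients (a0, a1) and their t-statistics.
    """
    coefs = []
    for y_t, b_t in zip(excess_returns, predicted_betas):
        X = np.column_stack([np.ones_like(b_t), b_t])
        a_t, *_ = np.linalg.lstsq(X, y_t, rcond=None)
        coefs.append(a_t)
    coefs = np.array(coefs)                           # shape (T, 2)
    mean = coefs.mean(axis=0)
    se = coefs.std(axis=0, ddof=1) / np.sqrt(len(coefs))
    return mean, mean / se

# Simulated data (an assumption, not the original sample)
rng = np.random.default_rng(1)
T, n = 120, 20
betas = np.tile(np.linspace(0.5, 1.5, n), (T, 1))
premium = 0.008 + 0.02 * rng.normal(size=(T, 1))      # per-period market premium
returns = betas * premium + 0.01 * rng.normal(size=(T, n))

mean, t = fama_macbeth(returns, betas)
print("mean a0, a1:", mean)
print("t-statistics:", t)
```

Additional regressors such as the squared beta or the residual variance would enter as further columns of X in each cross-sectional regression.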
The central results of the Fama/MacBeth test are as follows:
(1) $r_{i,t} - r_{f,t} = a_0 + a_1 \hat\beta_{i,t} + \varepsilon_{i,t}$
(2) $r_{i,t} - r_{f,t} = a_0 + a_1 \hat\beta_{i,t} + a_2 \hat\beta_{i,t}^2 + \varepsilon_{i,t}$
(3) $r_{i,t} - r_{f,t} = a_0 + a_1 \hat\beta_{i,t} + a_2 \hat\beta_{i,t}^2 + a_3 RV_{i,t} + \varepsilon_{i,t}$
Model   a0        a1        a2        a3
(1)     0.0061*   0.0085*
(2)     0.0049*   0.0105*   -0.008
(3)     0.0020    0.0114*   -0.0026   0.0516
* indicates significance at the 10% level
Interpretation:
Overall, the Fama/MacBeth results are consistent with the predictions
of the CAPM:
Portfolios with greater than average beta factors tend to produce
greater than average rates of return in subsequent periods.
There is little or no evidence of nonlinearity in the relationship between beta and
return.
The residual variance of the stocks in the portfolio does not help to forecast
future returns.
More recent tests of the CAPM
Fama/French (1992 JF, 1993 JFE) extend the Fama/MacBeth analysis
by including two additional “ad-hoc” factors:
Size (Market Equity Value = Stock Price ∙ Shares outstanding)
Small firms tend to have lower (and more volatile) profits.
Stock portfolios sorted on size tend to have different risk premia (even after
controlling for their different betas). Hence “size” is interpreted as a risk factor
on its own, i.e., not associated with market risk.
BE/ME (Book Equity Value / Market Equity Value)
Firms with high BE/ME (low stock prices relative to book values)
tend to have persistently low earnings on assets.
Similarly, stock portfolios sorted on BE/ME have different risk premia (after
controlling for beta and other effects) and thus “BE/ME” is also interpreted as
a separate risk factor.
Fama/French (1993 JFE) use a “time-series approach”, in addition to the
Fama/MacBeth approach.
This time-series approach is very interesting as it is widely used to measure
“risk-adjusted” returns (in academia as well as in practice).
To approximate the risk premium associated with a particular risk factor, we need to
construct a “mimicking portfolio”, e.g., for the Fama/French factors:
SMB = Return on a portfolio of “small” stocks
– Return on a portfolio of “large” stocks
HML = Return on a portfolio with “high” BE/ME
– Return on a portfolio with “low“ BE/ME
SMB is the return on a portfolio which requires zero investment (buy small stocks,
sell short large stocks) but is particularly strongly exposed to the “size risk factor”.
Similarly, HML is the return on a zero-investment portfolio being strongly exposed to
“cheapness” (long high BE/ME stocks, short low BE/ME stocks).
Excursus: Construction of Mimicking Portfolios
(1) Sort assets according to a certain characteristic
(e.g., according to their “size”, i.e., price x shares outstanding)
(2) Form portfolios
(e.g. quintile portfolios: Q1 = smallest 20% firms, …, Q5: largest 20%)
(3) Compute the returns on these portfolios
(e.g., rQ1, rQ2 , rQ3 , rQ4 , rQ5)
(4) Construct a portfolio that requires zero investment
(e.g., buy portfolio Q5 and raise the necessary funds by selling short Q1)
(5) Compute the return on this zero-investment portfolio
(e.g., rzero-investm. portf. = rQ5 – rQ1)
Due to its interpretation (return earned for bearing a particular risk)
the mimicking portfolio approach is widely used in empirical finance.
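Steps (1) to (5) can be sketched in a few lines of plain Python. The firms, their market capitalisations and returns are made up, portfolios are equal-weighted for simplicity (an assumption, the excursus does not specify the weighting), and the function name `quintile_portfolios` is hypothetical.

```python
def quintile_portfolios(characteristic, returns):
    """Sort assets by a characteristic and return equal-weighted quintile returns.

    characteristic, returns: dicts keyed by asset name. Q1 holds the smallest
    20% of assets, Q5 the largest 20% (assumes the number of assets is a
    multiple of five to keep the sketch short).
    """
    ranked = sorted(characteristic, key=characteristic.get)
    size = len(ranked) // 5
    portfolio_returns = []
    for q in range(5):
        members = ranked[q * size:(q + 1) * size]
        portfolio_returns.append(sum(returns[a] for a in members) / len(members))
    return portfolio_returns

# Hypothetical firms: market capitalisation and one-period return
market_cap = {f"firm{i}": float(i) for i in range(1, 11)}
ret = {f"firm{i}": 0.01 * i for i in range(1, 11)}

r_q = quintile_portfolios(market_cap, ret)
zero_investment_return = r_q[4] - r_q[0]      # long Q5, short Q1, as in step (5)
print("quintile returns:", r_q)
print("zero-investment portfolio return:", zero_investment_return)
```

Reversing the sign (long Q1, short Q5) would give a small-minus-big return in the spirit of the SMB factor.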
Fama/French (1992 JF, 1993 JFE) find that
SMB (or size) and
HML (or BE/ME)
help to explain the cross-sectional and time-series
variation in excess returns.
Carhart (1997, JF) contributes a fourth factor (based on the Jegadeesh/Titman
(1993) one-year momentum anomaly):
MOM (i.e., momentum)
portfolio with a high exposure to stocks which out-performed the
market during the last year (i.e., winners), but which requires zero
investment (i.e. long winners, short losers)
With the additional factors introduced above we obtain the four-factor (time
series) model:
$r_{i,t} - r_{f,t} = \alpha + \beta_1 (r_{M,t} - r_{f,t}) + \beta_2 SMB_t + \beta_3 HML_t + \beta_4 MOM_t + \varepsilon_{i,t}$
Interpretation:
β1 exposure to market risk >1 → overweight in market risk (CAPM beta)
β2 exposure to size risk >0 → overweight in small stocks
β3 exposure to cheapness >0 → overweight in high BE/ME stocks
β4 exposure to momentum >0 → overweight in momentum stocks
(i.e., with very high returns last year)
α a portfolio’s “alpha” >0 → excess return after controlling for
risk premiums for 4 factor exposures
Summary
The insight provided by the CAPM is a major step in understanding how
securities are priced in the marketplace.
Empirical evidence on the “pure” CAPM is mixed.
But empirical results have to be interpreted with care (in particular, because
the studies cannot use “expected” returns, but only “realized” returns).
Empirical evidence on the “extended” CAPM (4-factor version) is much
better, but the existence of additional factors is evidence against the CAPM.
Nevertheless, the CAPM and its 4-factor extension are widely applied in the
securities industry, e.g., to measure risk-adjusted returns of portfolios (or the
part of a fund manager's “performance” that is not due to taking higher risks).
6.3 Empirical exercise
(1) Open the file with the monthly size-sorted portfolio returns and the 4 risk factors:
cd {your data directory name}
use monthly_portfolio_returns.dta, clear
(2) Explore the data set and its variables
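A few standard Stata commands that could be used for this step are sketched below. The sketch assumes that monthly_portfolio_returns.dta from step (1) is already in memory; which commands were shown on the original slide is not known.

```stata
* assumes the data set from step (1) is in memory
describe              // variable names, storage types and labels
summarize             // N, mean, std. dev., min and max of every variable
codebook, compact     // one-line overview per variable
list in 1/5           // first five observations
```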
(3) Compute excess returns
gen exret_vw = vwretd - rf
gen exret_sp = sprtrn - rf
gen exret_MC1 = pfret_MC1 - rf
gen exret_MC2 = pfret_MC2 - rf
...
(4) Have a look at your data, e.g.
tabstat exret*, statistics(n mean p50 min max)
(5) Regress your excess returns on the 4 factors (beta, smb, hml, mom)
A. Excess return on a value weighted market portfolio:
reg exret_vw beta smb hml mom
B. Excess return on S&P-500 index:
reg exret_sp beta smb hml mom
Interpretation of the result?
C. Compare the regression results for the size-sorted portfolios:
eststo _reg1: reg exret_MC1 beta smb hml mom
eststo _reg2: reg exret_MC2 beta smb hml mom
eststo _reg3: reg exret_MC3 beta smb hml mom
eststo _reg4: reg exret_MC4 beta smb hml mom
eststo _reg5: reg exret_MC5 beta smb hml mom
esttab _reg*, r2
Note: If the command eststo is not working, you need to install the additional
package “estout” with the following command: ssc install estout
The package estout is a useful tool for displaying and comparing regression
results.
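To connect the estimates with the interpretation of the four-factor model (α as abnormal return, β1 relative to 1), one could add a few Wald tests after each regression. The following is a sketch that assumes the data from step (1), the excess returns from step (3) and the factor variable names used above; it is not part of the original exercise.

```stata
* assumes the data from step (1) and the excess returns from step (3)
regress exret_MC1 beta smb hml mom
test _cons = 0        // H0: alpha = 0 (no abnormal return)
test beta = 1         // H0: exposure to market risk equals 1
test smb hml mom      // H0: the three additional factor loadings are jointly zero
```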
Interpretation of the results?
For comparison: estimation results for Fama/French 5 industry portfolios
7. Goodness-of-fit
7.1 The determination coefficient
7.2 AIC and BIC
Literature: Verbeek (2004), chapter 2.4
Wooldridge (2000), chapter 2
Basic Question
How well do the explanatory variables X explain the dependent variable y?
To answer this question, we divide the total variation of y
$SST = \sum_{i=1}^{N} (y_i - \bar{y})^2$ (Total sum of squares)
into two parts:
$SSE = \sum_{i=1}^{N} (\hat{y}_i - \bar{y})^2$ (Explained sum of squares)
$SSR = \sum_{i=1}^{N} (y_i - \hat{y}_i)^2$ (Residual sum of squares)
so that SST = SSE + SSR (for a model with an intercept).
The R² then tells us which fraction of the total variation of y is explained by x:
$R^2 = \frac{SSE}{SST} = \frac{\sum_{i=1}^{N} (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2} = \frac{\widehat{Var}(\hat{y})}{\widehat{Var}(y)}$
Properties:
$R^2 = \hat{\rho}^2_{y,\hat{y}}$, i.e., R² = square of the correlation between y and the fitted values (the regression line)
0 ≤ R² ≤ 1 if the model has an intercept
Adding variables never decreases the R² (and usually increases it)
Adjusted R²
Since the R² increases with the number of regressors, it is not a good criterion
to discriminate between regression models of different size.
A modified measure is needed to correct for the inclusion of (too) many
explanatory variables:
$adj.\ R^2 = 1 - \frac{\frac{1}{N-k-1} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\frac{1}{N-1} \sum_{i=1}^{N} (y_i - \bar{y})^2} = 1 - \frac{N-1}{N-k-1} \cdot \frac{SSR}{SST}$
As long as the model includes at least one regressor (and R² < 1) it holds that adj. R² < R².
The factor (N-1)/(N-k-1) > 1 can be viewed as a penalty for adding regressors.
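The two formulas can be checked against Stata's own output with the results that regress stores. The sketch below uses Stata's shipped example data set auto (an assumption for illustration; it is not the course data). Note that Stata labels the explained sum of squares "Model SS".

```stata
* Illustration with Stata's shipped example data (not the course data)
sysuse auto, clear
regress price mpg weight

scalar N   = e(N)       // number of observations
scalar k   = e(df_m)    // number of regressors (without the constant)
scalar SSE = e(mss)     // explained ("model") sum of squares
scalar SSR = e(rss)     // residual sum of squares
scalar SST = SSE + SSR  // total sum of squares

display "R2      by hand: " SSE/SST "   Stata: " e(r2)
display "adj. R2 by hand: " 1 - (N-1)/(N-k-1)*SSR/SST "   Stata: " e(r2_a)
```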
Example
Recall application 1 where we have estimated multiples for corporate
valuation.
For the univariate model
$\text{P/E ratio} = \beta_0 + \beta_1 \cdot \text{(expected) growth} + \varepsilon$
the following regression output is obtained:
If we include the risk variable (STDV) in our equation, i.e.
$\text{P/E ratio} = \beta_0 + \beta_1 \cdot \text{(expected) growth} + \beta_2 \cdot \text{STDV} + \varepsilon$
→ The inclusion of risk leads to an increase in R² as well as an increase
in the adj. R².
However, if we include, instead of risk, a random variable (“random”) that
should not be systematically related to the P/E ratio, i.e.
$\text{P/E ratio} = \beta_0 + \beta_1 \cdot \text{(expected) growth} + \beta_2 \cdot \text{random} + \varepsilon$
→ although the random variable is unrelated to y (and thus cannot have any
explanatory power for y), R² increases,
→ in contrast, the adj. R² decreases compared to the univariate model.
7.2 AIC and BIC
Literature: Verbeek (2004), chapter 3
Wooldridge (2000), chapter 3
Information Criteria
Information criteria are similar to the adjusted R², as they provide a tradeoff
between goodness-of-fit and the number of parameters used (k+1).
Akaike's Information Criterion:
$AIC = \log\left( \frac{1}{N} \sum_{i=1}^{N} \hat{u}_i^2 \right) + \frac{2(k+1)}{N}$
Schwarz's Bayesian Information Criterion:
$BIC = \log\left( \frac{1}{N} \sum_{i=1}^{N} \hat{u}_i^2 \right) + \frac{k+1}{N} \log N$
Both information criteria have to be minimized over choices of k+1.
The penalty for additional regressors is larger in the BIC whenever log N > 2 (i.e., N ≥ 8). In that case this
criterion is more “conservative” (tends to favor more parsimonious models).
In Stata the AIC and BIC are called via the post-estimation command:
estat ic
Reconsider the linear relation between the P/E ratio and the growth rate and its
regression output. The use of “estat ic” gives us the following output:
R² = 67.82%
Adj. R² = 59.77%
AIC = 42.82
BIC = 42.40
If we add the variable risk, the goodness-of-fit measures will change.
In case of a “meaningful” additional explanatory variable we should observe
- a decrease of the AIC and BIC and
- an increase of the adjusted R²:
R² = 88.65% (+)
Adj. R² = 81.08% (+)
AIC = 38.57 (-)
BIC = 37.94 (-)
In contrast, when adding a “meaningless” variable (i.e., one with no additional
explanatory power) we should observe
- an increase of the AIC and BIC and
- a decrease of the adjusted R².
For example, after adding a random variable
the AIC and BIC increase:
R² = 68.27% (+)
Adj. R² = 47.12% (-)
AIC = 44.73 (+)
BIC = 44.11 (+)
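The comparison of a “meaningful” and a “meaningless” additional regressor can be replayed on any data set. The sketch below uses Stata's shipped example data auto and a simulated noise variable; the choice of data, variables and seed is an assumption for illustration. Stata reports likelihood-based versions of the criteria (AIC = -2 lnL + 2p, BIC = -2 lnL + p ln N), which differ in scale from the formulas of this section but rank linear models estimated on the same N observations in the same way.

```stata
* Illustration with Stata's shipped example data (not the course data)
sysuse auto, clear
set seed 2015
generate random = rnormal()          // pure noise, unrelated to price

quietly regress price mpg
estimates store base
quietly regress price mpg weight
estimates store meaningful
quietly regress price mpg random
estimates store noise

* R2 and adjusted R2 of the three models
foreach m in base meaningful noise {
    quietly estimates restore `m'
    display "`m': R2 = " %6.4f e(r2) "   adj. R2 = " %6.4f e(r2_a)
}

* AIC and BIC of the three models side by side
estimates stats base meaningful noise
```

Since the noise variable is drawn at random, it can improve a criterion by chance in a particular draw; the pattern described in the text is what one should expect on average.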
Summary
The quality of the linear approximation of an OLS model is evaluated with
goodness-of-fit measures.
There is a trade-off between goodness-of-fit and the simplicity of the model.
If you include many explanatory variables, you should always calculate a
measure that corrects for the inclusion of additional variables (such as BIC or
AIC).
8. Typical problems
8.1 Heteroskedasticity
8.2 Autocorrelation
8.3 Generalized Least Squares
Literature: Verbeek (2004), chapter 4
Wooldridge (2000), chapter 8
Introduction
The homoskedasticity assumption states that the variance of the unobservable
error term ε, conditional on the explanatory variables, is constant.
It fails whenever the variance of the unobservable error changes across different
segments of the population.
→ The variance of the error terms is then no longer independent of x
→ The usual estimate of the covariance matrix of the OLS estimator is incorrect
In this case
OLS is still unbiased and the goodness-of-fit measures remain valid.
But the OLS estimator may be relatively inefficient.
Example
The variation of food expenditures increases with income (Engel curve)
[Figure: food expenditures plotted against income]
Consequences for the OLS estimator
Consider the following simple OLS model in matrix notation: $y = X\beta + \varepsilon$
The second and third Gauss-Markov assumptions can be summarized as
$Var(\varepsilon | X) = Var(\varepsilon) = \sigma^2 I$ (8.2)
Heteroskedasticity implies that (8.2) no longer holds,
i.e., error terms do not have identical variances. Instead
$Var(\varepsilon | X) = \sigma^2 \Psi$ (8.3)
where $\Psi$ is a positive definite matrix that may depend upon X.
Standard t- and F-tests will no longer be valid and inferences will be
misleading because they rely on (8.2).
Three ways to solve the problem
1. Reconsider your model and try to evaluate whether it might be
misspecified.
2. Use the OLS estimator
but use an alternative procedure to estimate standard errors,
i.e., standard errors which are “robust” against heteroskedasticity.
3. Use an alternative estimator, for example, the (F)GLS estimator.
Adjusted standard errors
White (1980) derived an alternative estimator of the variance of the OLS
parameters that remains valid under heteroskedasticity. The variance
$Var(\hat{\beta} | X) = (X'X)^{-1} X' \, Diag(\sigma_i^2) \, X (X'X)^{-1}$
is estimated with
$\widehat{Var}(\hat{\beta} | X) = (X'X)^{-1} \left( \sum_{i=1}^{N} \hat{u}_i^2 x_i x_i' \right) (X'X)^{-1}$
Standard errors computed on the basis of this variance-covariance estimate
are referred to as heteroskedasticity-consistent standard errors or White
standard errors.
In Stata they are obtained by simply adding the option “robust”:
reg Y X, robust
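A side-by-side comparison makes clear that only the standard errors change, not the coefficients. The sketch below uses Stata's shipped example data auto (an assumption for illustration). The option vce(robust) is the current spelling of robust; Stata additionally applies a small-sample factor N/(N-k) to the White estimator.

```stata
* Illustration with Stata's shipped example data (not the course data)
sysuse auto, clear
quietly regress price weight
estimates store ols                      // conventional standard errors
quietly regress price weight, vce(robust)
estimates store white                    // heteroskedasticity-consistent standard errors
estimates table ols white, b(%9.3f) se(%9.3f)
```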
Testing for Heteroskedasticity
Always produce a graph of your data!
This may already tell you enough (like in the previous example)!
Basically, testing for heteroskedasticity means that we try to evaluate whether
the variance is identical across residuals or not.
But what is meant by “not identical”?
Should we compare different points in time?
Should we compare different groups (e.g. men vs. women)?
…
There are several tests for heteroskedasticity available. The main difference is,
what “type of difference” they test for.
Examples:
The Goldfeld-Quandt test tests for groupwise heteroskedasticity, assuming
that
$Var(\varepsilon_i) = \sigma_A^2$ or $Var(\varepsilon_i) = \sigma_B^2$ if i belongs to group A or B, respectively.
The null hypothesis states that the variance is identical for both groups:
H0: $\sigma_A^2 = \sigma_B^2$.
If we have to reject the null, our data are heteroskedastic.
The ARCH-LM test looks at differences across time; its null hypothesis is that the
variance is constant over time.
Again, rejecting the null implies that our data are heteroskedastic.
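Some of the checks that Stata offers after regress are sketched below. Note that these are not the two tests named above: estat hettest is the Breusch-Pagan / Cook-Weisberg test, estat imtest, white is White's general test, and sdtest is a simple variance-ratio test on the residuals of two groups, which is only similar in spirit to Goldfeld-Quandt. The data set (Stata's shipped auto data) and the grouping variable are assumptions for illustration.

```stata
* Illustration with Stata's shipped example data (not the course data)
sysuse auto, clear
regress price weight
rvfplot, yline(0)           // graph first: residuals versus fitted values
estat hettest               // Breusch-Pagan / Cook-Weisberg test
estat imtest, white         // White's general test
predict uhat, residuals
sdtest uhat, by(foreign)    // equal residual variance in the two groups?
```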
8.2 Autocorrelation
Assume we have a time series data set, i.e., one with a time dimension t.
E.g., we observe the leverage of one firm over several years.
Autocorrelation is present when the error terms of different periods, εt and εs (t ≠ s), are correlated.
Again, assumption (8.2) no longer holds.
The consequences of autocorrelation are similar to those of heteroskedasticity:
OLS remains unbiased, but parameter estimates become inefficient and the
usual standard errors are incorrect.
[Figure: time series plot of y against time t]
First order autocorrelation
The most popular form of autocorrelation is the first-order autoregressive
process. In this case, the error term in
$y_t = x_t' \beta + \varepsilon_t$
is assumed to depend upon its predecessor
$\varepsilon_t = \rho \varepsilon_{t-1} + \upsilon_t$
where $\upsilon_t$ is an error term with zero expectation and constant variance that
exhibits no serial correlation and $\rho$ is the first-order autocorrelation coefficient.
Testing for First Order Autocorrelation
To test for autocorrelation of first order, we can simply regress the OLS
residual on its first lag:
reg y x
predict eps, residual
reg eps L.eps
This auxiliary regression produces an estimate for the first-order
autocorrelation coefficient along with a corresponding t-test.
Alternatively, use a “Durbin Watson” test:
estat dwatson
It can be shown that $DW \approx 2 - 2\hat{\rho}$ or, equivalently, $\hat{\rho} \approx 1 - DW/2$.
Hence, a DW test statistic that deviates substantially from 2 indicates that $\rho$ is
different from zero, i.e., that first-order autocorrelation is present.
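The relation between the DW statistic and the first-order autocorrelation coefficient can be tried out on simulated data. In the sketch below the sample size, the coefficients and the true ρ = 0.5 are assumptions chosen for illustration.

```stata
* Simulated AR(1) errors (all numbers are assumptions for illustration)
clear
set seed 42
set obs 200
generate t = _n
tsset t
generate x   = rnormal()
generate eps = rnormal()
replace eps = 0.5*eps[_n-1] + rnormal() in 2/l   // eps_t = 0.5*eps_(t-1) + v_t
generate y = 1 + 2*x + eps

regress y x
estat dwatson
scalar dw = r(dw)                    // keep the DW statistic

* auxiliary regression of the OLS residual on its first lag
predict uhat, residuals
regress uhat L.uhat, noconstant
display "rho_hat = " _b[L.uhat] "   1 - DW/2 = " 1 - dw/2
```

The two numbers in the last line should be close to each other, which is the approximation stated above.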
What to do when autocorrelation is found?
In many cases the finding of autocorrelation is an indication that the model is
misspecified.
Typically, three (interrelated) types of misspecification may lead to a finding of
autocorrelation in the OLS residuals: dynamic misspecification, omitted
variables and functional form misspecification.
→ Do not change your estimator (e.g. from OLS to GLS),
but change your model.
If changing the model does not help (or is not possible):
→ Again, stay with OLS but use different inference,
i.e., use “robust” t-tests, …
Robust inference: Newey-West standard errors
As in the case of heteroskedasticity (slide 8-13), one can apply OLS
in spite of autocorrelation but has to adjust its standard errors:
Use “heteroskedasticity and autocorrelation consistent” (HAC) standard
errors, i.e., Newey-West standard errors.
In Stata you can request Newey-West standard errors using the command
“newey”, which requires the maximum lag order in the option lag(#).
Instead of reg Y X
use newey Y X, lag(#)
This produces OLS parameter estimates, but all inference (t-statistics, …) is
based on the HAC estimator.
Example
Consider the following dataset containing variables on the volume of annual
textile consumption (consvoltext), real income (income) and the relative price of
textiles (price). The Stata data file “autocorr_textile.dta” is available online.
Year          Volume of textile   Income   Price
(base=1925)   consumption
1923          99.2                96.7     101
1924          99                  98.1     100.1
1925          100                 100      100
1926          111.6               104.9    90.6
1927          122.2               104.9    86.5
1928          117.6               109.5    89.7
1929          121.1               110.8    90.6
1930          136                 112.3    82.8
1931          154.2               109.3    70.1
1932          153.6               105.3    65.4
1933          158.5               101.7    61.3
1934          140.6               95.4     62.5
1935          136.2               96.4     63.6
1936          168                 97.6     52.6
1937          154.3               102.4    59.7
1938          149                 101.6    59.5
1939          165.5               103.8    61.3
We need to tell Stata that the dataset has a time-series structure and which
variable contains the time index. This is done by:
tsset [name of time variable]
Afterwards, we can run regressions in the usual way.
Assume we first want to estimate an equation with price as the explanatory
variable:
Then a post-estimation command can be used to obtain the DW-statistic:
estat dwatson
The DW statistic of 1.2 indicates that the error terms are autocorrelated
(with $\hat{\rho} \approx 1 - 1.2/2 = 0.4$)
→ The first suggested solution was to change the model:
Economic theory suggests that income is an important variable
in a demand equation.
→ The second solution was to use Newey-West standard errors
First, try what happens if we run a second OLS regression
that includes both price and income as explanatory variables.
Now, DW is quite close to 2, indicating that there is no longer any sign of autocorrelation.
Second, assume we do not have income data.
Therefore we use HAC standard errors
newey [dependent variable] [list of independent variables], lag(#)
Note that the coefficients remained unchanged, but we obtain much smaller
(HAC) standard errors.
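The whole textile example can be replayed without the course file by typing in the table from the example. The sketch below does this; the variable name year for the time index and the choice lag(1) are assumptions, since the original slides show neither.

```stata
* Textile example, data typed in from the table of this section
clear
input year consvoltext income price
1923  99.2  96.7 101.0
1924  99.0  98.1 100.1
1925 100.0 100.0 100.0
1926 111.6 104.9  90.6
1927 122.2 104.9  86.5
1928 117.6 109.5  89.7
1929 121.1 110.8  90.6
1930 136.0 112.3  82.8
1931 154.2 109.3  70.1
1932 153.6 105.3  65.4
1933 158.5 101.7  61.3
1934 140.6  95.4  62.5
1935 136.2  96.4  63.6
1936 168.0  97.6  52.6
1937 154.3 102.4  59.7
1938 149.0 101.6  59.5
1939 165.5 103.8  61.3
end
tsset year

* (a) price as the only explanatory variable, then the DW statistic
regress consvoltext price
estat dwatson

* (b) first solution: change the model by adding income
regress consvoltext price income
estat dwatson

* (c) second solution: keep the model from (a), use Newey-West standard errors
newey consvoltext price, lag(1)
```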
8.3 Generalized Least Squares
Another way to deal with the incorrect standard errors caused by autocorrelation or
heteroskedasticity is to derive an alternative estimator.
The idea is that if we know that our error terms have a certain structure, i.e.,
$Var(\varepsilon | X) = \sigma^2 \Psi$ with $\Psi \neq I$,
so that the OLS estimator has the variance
$Var(\hat{\beta} | X) = \sigma^2 (X'X)^{-1} X' \Psi X (X'X)^{-1}$,
we can transform the model and derive an alternative estimator:
$\hat{\beta}_{GLS} = (X' \Psi^{-1} X)^{-1} X' \Psi^{-1} y$
This is the Generalized Least Squares (GLS) estimator.
If we do not know $\Psi$ (typically we have to estimate it) we can apply the
“Feasible” GLS (FGLS) estimator:
$\hat{\beta}^{*}_{FGLS} = (X' \hat{\Psi}^{-1} X)^{-1} X' \hat{\Psi}^{-1} y$
The FGLS estimator is asymptotically unbiased, but not in finite samples!
Hence, there is a tradeoff between unbiasedness of the OLS estimator and
the higher efficiency of the FGLS estimator:
→The OLS estimator is unbiased but inefficient (as it does not have the
smallest variance).
→The FGLS estimator is biased in small samples but efficient
(however, it is only efficient if the form of heteroskedasticity is correctly
specified).
→So, if you have a good idea about the form of heteroskedasticity, FGLS
may provide a more efficient estimator.
If not, use OLS but apply robust inference!
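The trade-off can be illustrated with simulated data in which the form of heteroskedasticity is known by construction. In the sketch below the error standard deviation is proportional to x, so GLS amounts to weighting each observation by 1/x². All numbers are assumptions for illustration; with real data the weights would have to be estimated, which turns GLS into FGLS.

```stata
* Simulated data with known heteroskedasticity (assumption: sd of error = x)
clear
set seed 7
set obs 500
generate x = 1 + 9*runiform()
generate y = 1 + 2*x + x*rnormal()

regress y x                      // OLS, conventional standard errors (not valid here)
regress y x, vce(robust)         // OLS with robust inference
regress y x [aweight = 1/x^2]    // GLS: weights proportional to 1/Var(error)
```

All three regressions estimate the same coefficients; the weighted regression should on average deliver the smallest sampling variance because it uses the correct variance structure.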
9. Advanced techniques
9.1 Overview
9.2 Panel estimators
9.3 Unbalanced panels
Literature: Verbeek (2004), chapter 10.1
Wooldridge (2000), chapter 13.1
9.1 Overview
Data set dimension
Typically, we have one of the two types of data structures:
Cross-sectional data (e.g., earnings of N firms for 1 year)
Time-series data (e.g., earnings of 1 firm over T years)
A combination of both, i.e., repeated cross-sections, is advantageous:
This allows for a richer analysis and for more realistic models
since it is useful to know that certain observations come from a particular
individual, e.g., from a “small” firm.
But we can no longer assume that all observations are independent
as some come from the same individual
With repeated cross-sections we distinguish:
Pooled cross-sections (ignore the specific data structure)
Panel data (exploit information from repeated observations)
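Pooled cross-sections vs. panel: the same earnings data in two layouts.
One row per firm and year:
year  firm  earn
1985  1     17
1990  1     -2
1995  1     12
1985  2     4
1990  2     7
1995  2     3
1985  3     52
1990  3     48
1995  3     65
One row per year, one column per firm:
year  earn1  earn2  earn3
1985  17     4      52
1990  -2     7      48
1995  12     3      65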
Pooled cross-sectional regressions
The simplest pooled cross-sectional regression is just a regular OLS estimation:
yi,t = β0 + xi,t β1 + εi,t
It assumes that the parameters (β0 and β1) are identical for all individuals.
But it ignores that there may be dependencies between the observations of a
given individual (e.g., rich people consume more at all points in time; small firms
have lower debt ratios at all points in time; …).
Therefore, the standard OLS assumption that the residuals εi,t are independently
and identically distributed (i.i.d.) may be violated.
But then inference is wrong, i.e., estimated standard errors are misleading.
Moreover, estimated parameters can be biased.
Example (biased OLS results):
We are interested in analyzing the influence of overall economic risk on
companies’ debt ratios.
Theory suggests:
With higher earnings risk firms should select lower debt levels.
Young (and small) firms have difficulties to raise debt and therefore
tend to have lower debt ratios.
Knowing that small firms tend to have lower debt ratios, we expect that their
residuals will be negative (at most points in time). In contrast, large firms
will presumably have positive residuals.
→ Assumption of i.i.d. residuals and (A.1 : E(ε) = 0) is violated.
Assume we observe the following data:
dr = debt ratio of a firm in a year
gdpvola = overall economic volatility
year  firm  dr    gdpvola
1985  1     0.36  19
1990  1     0.35  20
1995  1     0.5   15
2000  1     0.48  12
1990  2     0.15  20
1995  2     0.2   15
2000  2     0.23  12
1995  3     0.08  15
2000  3     0.12  12
Obviously this is an (unbalanced) panel ("." = not observed):
year  dr1   dr2   dr3   average_dr  gdpvola
1985  0.36  .     .     0.36        19
1990  0.35  0.15  .     0.25        20
1995  0.5   0.2   0.08  0.26        15
2000  0.48  0.23  0.12  0.175       12
Note that for all firms the debt ratio
tends to increase as gdpvola goes down.
However, firms 2 and 3 are “young”
(being established after 1985 and
1990, resp.).
Therefore they have much lower
debt ratios on average.
We estimate the following models:
Pooled cross-sectional
regression:
reg dr gdpvola
Panel (random effects)
xtreg dr gdpvola, re
Panel (fixed effects):
xtreg dr gdpvola, fe
→ OLS produces biased results (as it ignores dependencies).
Panel estimators (fixed and random effects) produce much better results.
For comparison we also estimate a pooled cross-sectional model
but with firm-specific dummies (_c1 = constant for firm 1, …)
reg dr _c1 _c2 _c3 gdpvola, noconst
Note that this produces the same result as the fixed effects estimator
xtreg dr gdpvola, fe
→ Basically the fixed effects panel estimator solves the problem by including
individual-specific constants that account for the individual effects.
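The four regressions of this example can be reproduced by typing in the nine observations from the table. The sketch below adds two steps that the slides do not show and that are therefore assumptions: declaring the panel structure with xtset (xtreg needs to know the panel variable) and creating the firm dummies _c1 to _c3 with tabulate.

```stata
* Debt-ratio example, data typed in from the table of this section
clear
input year firm dr gdpvola
1985 1 0.36 19
1990 1 0.35 20
1995 1 0.50 15
2000 1 0.48 12
1990 2 0.15 20
1995 2 0.20 15
2000 2 0.23 12
1995 3 0.08 15
2000 3 0.12 12
end
xtset firm year, delta(5)            // panel variable firm, observations every 5 years

regress dr gdpvola                   // pooled cross-sectional regression
xtreg dr gdpvola, re                 // random effects
xtreg dr gdpvola, fe                 // fixed effects

* fixed effects "by hand": one constant per firm, no common constant
tabulate firm, generate(_c)          // creates the dummies _c1, _c2, _c3
regress dr _c1 _c2 _c3 gdpvola, noconstant
```

The coefficient on gdpvola in the last regression should coincide with the one from xtreg, fe.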
9.2 Panel estimators
In contrast to a simple pooled cross-sectional regression,
panel estimators try to capture dependencies of individuals:
yi,t = β0 + xi,t β1 + εi,t with εi,t = αi + υi,t
Panel estimators try to split εi,t into an individual-specific part αi
and a part υi,t that is independent across individuals, i.e., i.i.d.
We say αi accounts for “unobserved heterogeneity” among individuals,
i.e., for differences across individuals for which we cannot control due to a
lack of appropriate data.
For example, we might simply not have data about size, age, … of the
firms in our sample.
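Some useful transformations:
If
(1) $y_{i,t} = \beta_0 + x_{i,t} \beta_1 + \alpha_i + \upsilon_{i,t}$
holds, then also
(2) $\bar{y}_i = \beta_0 + \bar{x}_i \beta_1 + \alpha_i + \bar{\upsilon}_i$
must hold with $\bar{y}_i = \sum_{t=1}^{T} y_{i,t} / T$, $\bar{x}_i = \sum_{t=1}^{T} x_{i,t} / T$ and $\bar{\upsilon}_i = \sum_{t=1}^{T} \upsilon_{i,t} / T$.
Subtracting (2) from (1) yields:
(3) $y_{i,t} - \bar{y}_i = (x_{i,t} - \bar{x}_i) \beta_1 + (\upsilon_{i,t} - \bar{\upsilon}_i)$
These three equations provide the basis for estimating β1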
Between estimator
The between estimator uses OLS to estimate (2): $\bar{y}_i = \beta_0 + \bar{x}_i \beta_1 + \alpha_i + \bar{\upsilon}_i$
Hence, it explains why $\bar{y}_i$ differs from $\bar{y}_j$,
i.e., it explains differences between individuals.
xtreg y x, be
Within (= fixed effects) estimator
The within estimator uses OLS to estimate (3): $y_{i,t} - \bar{y}_i = (x_{i,t} - \bar{x}_i) \beta_1 + (\upsilon_{i,t} - \bar{\upsilon}_i)$
Hence, it explains why $y_{i,t}$ differs from $\bar{y}_i$,
i.e., it explains differences within individuals.
xtreg y x, fe
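Both estimators can be reproduced by applying the transformations (2) and (3) by hand and running plain OLS. The sketch below does this on a simulated balanced panel; the sample size, the coefficients and the variable names are assumptions for illustration.

```stata
* Simulated balanced panel (all names and numbers are assumptions)
clear
set seed 11
set obs 50                           // 50 individuals
generate id = _n
generate alpha = rnormal()           // individual effect
expand 4                             // 4 periods per individual
bysort id: generate t = _n
generate x = rnormal() + 0.5*alpha
generate y = 1 + 2*x + alpha + rnormal()
xtset id t

* within: subtract individual means, equation (3), then OLS without constant
bysort id: egen ybar = mean(y)
bysort id: egen xbar = mean(x)
generate y_within = y - ybar
generate x_within = x - xbar
regress y_within x_within, noconstant
xtreg y x, fe                        // same slope estimate

* between: OLS on the individual means, equation (2)
preserve
collapse (mean) y x, by(id)
regress y x
restore
xtreg y x, be                        // same slope estimate
```

The hand-made within regression reproduces the slope of xtreg, fe, but its standard error differs slightly because it does not correct the degrees of freedom for the estimated individual means.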
There are two estimators that combine the between and within dimension:
The regular OLS estimator:
reg y x
The regular (pooled cross-section) OLS estimator directly estimates (1)
and thus combines the within and between dimension, but not efficiently.
It requires that $E(x_{i,t} \alpha_i) = 0$, i.e., that $x_{i,t}$ and $\alpha_i$ are
contemporaneously uncorrelated.
The random effects estimator:
xtreg y x, re
The random effects estimator uses FGLS to directly estimate (1),
and thus combines the within and between dimension efficiently.
It is equivalent to a weighted average of the between and within
estimators.
Assumptions
The fixed effects (FE) model assumes that the individual-specific terms are
(non-random) constants αi which can be estimated:
yi,t = αi + xi,t β1 + υi,t
Hence, the FE model requires few assumptions. It only assumes that the αi are
fixed unknown constants which can be estimated.
In particular, it does not require that the αi and the xi,t are uncorrelated
Note that the FE model is equivalent to a regular OLS estimation including
a separate dummy variable Di for each individual i:
yi,t = α1 D1+ α2 D2 +… + αN DN + xi,t β1 + υi,t
Assumptions
The random effects (RE) model tries to disentangle the residuals
yi,t = β0 + xi,t β1 + εi,t with εi,t = αi + υi,t
αi captures the individual effects, but they are assumed to be stochastic.
Most importantly, it is assumed that E[εi,t ∙xi,t ] = 0. This implies that the
unobservable characteristics captured by αi are uncorrelated with the
observable regressor(s) in xi,t.
In many applications this may be too restrictive. For example, …
… in a wage-model the unobservable “general ability” of persons might be
correlated with their observable “school degree”.
… in a debt-ratio model unobservable “management skills” might be
correlated with the observed “firm's access to capital markets”.
Choosing between RE and FE
The fixed effects model concentrates on differences “within” individuals,
not “between” them.
It explains to which extent $y_{i,t}$ differs from $\bar{y}_i$, not why $\bar{y}_i$ differs from $\bar{y}_j$.
The FGLS estimator for random effects is an efficient combination of the
“within” and “between” estimator.
If the assumption of the RE model holds (i.e., αi and xi,t are uncorrelated),
the RE model will produce more efficient estimates than FE since the FE
model assigns 100% weight to the within dimension and ignores the
between dimension of the data.
However, if the uncorrelatedness assumption is too restrictive (i.e., violated in the data), the RE
estimator may produce biased estimates.
This risk is lower for FE as it does not require uncorrelated αi and xi,t.
In practice, one often estimates both the RE and FE model.
If there are no substantial differences between the results, then
correlation of αi and xi,t is no issue.
Hausman (1978) suggests a very general test which can be used to
determine whether the FE or RE model is more appropriate.
1. Estimate the model which is consistent under both the null and the
alternative hypothesis (FE) and store the results
xtreg y x, fe
estimates store consistent_mod
2. Estimate the model which is efficient under the null hypothesis (RE)
and store the results
xtreg y x, re
estimates store efficient_mod
3. Compare the estimates
hausman consistent_mod efficient_mod
If the Hausman test yields a low p-value (i.e., it rejects the null hypothesis
that the differences between the FE and RE coefficient estimates are not
systematic), then we should prefer the fixed effects model.
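The three steps can be tried out on a simulated panel in which the individual effect is correlated with the regressor by construction, so that the RE assumption is violated. Sample size, coefficients and variable names in the sketch below are assumptions for illustration.

```stata
* Simulated panel with correlated individual effects (assumptions throughout)
clear
set seed 99
set obs 100                          // 100 individuals
generate id = _n
generate alpha = rnormal()           // individual effect
expand 5                             // 5 periods per individual
bysort id: generate t = _n
generate x = 0.8*alpha + rnormal()   // x is correlated with alpha
generate y = 1 + 2*x + alpha + rnormal()
xtset id t

xtreg y x, fe
estimates store consistent_mod
xtreg y x, re
estimates store efficient_mod
hausman consistent_mod efficient_mod
```

With this design one would expect the RE slope to be pulled away from the true value of 2 and the Hausman test to favor fixed effects; setting the 0.8 to zero removes the correlation and should typically reverse the conclusion.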
9.3 Unbalanced panels
Literature: Verbeek (2004), chapter 10.8
Wooldridge (2000), chapter 13.1
Some panel data sets have missing years for at least some cross-sectional
units in the sample.
Such a data set is called an unbalanced panel, for example ("." = not observed):
year  dr1   dr2   dr3   average_dr  gdpvola
1985  0.36  .     .     0.36        19
1990  0.35  0.15  .     0.25        20
1995  0.5   0.2   0.08  0.26        15
2000  0.48  0.23  0.12  0.175       12
The mechanics of FE estimation remain the same.
If Ti is the number of time periods for cross-sectional unit i, we simply use
these Ti observations for calculating the individual-specific means (and thus αi).
Units that are observed in only a single time period are excluded (they carry no within variation).
Nevertheless, it is important to determine why the panel is unbalanced.
If the reason we have missing data for some i is not correlated with the
idiosyncratic errors υi,t, the unbalanced panel causes no problems.
However, this is often not the case. For example, …
… if we collect data on companies, some of them may be lost in
subsequent years because they have gone out of business or have
merged with other companies. Or newly established firms enter the
sample.
… a hedge fund’s performance influences its likelihood to survive.
If the reason an individual leaves is correlated with the idiosyncratic error,
i.e., an unobserved factor that varies over time, there is a sample
selection problem.
Sample Selection Problems may cause biases, e.g., survivorship bias!
In this case we need more advanced estimators.
Good luck on the exam !