
Empirical Finance

Universität Saarbrücken

Winter Term 2015

Univ. Prof. Dr. Dieter Hess

University of Cologne

Corporate Finance Seminar


Organizational Issues

Information/Course Material:

Password:

Further Information:

Announcements in the lecture and tutorial

Learning objectives

Theoretical knowledge about econometrics

Empirical applications in practice and research:

company valuation

asset pricing

corporate finance

Critical analysis of empirical studies

Use of the statistical software Stata

Handling data sets and conducting one's own empirical work

Preparation for seminar and master's theses

Literature:

Verbeek, M. (2008): "A Guide to Modern Econometrics", 3rd edition, Wiley.

Kohler, U., Kreuter, F. (2009): "Data Analysis Using Stata", 2nd edition, Stata Press.


Universität zu Köln

Seminar für ABWL und Unternehmensfinanzen

Prof. Dr. Dieter Hess

Corporate Finance IV

Empirical Finance

1. The OLS estimator

2. Application: Multiples

3. Significance of estimated regression coefficients

4. Variable selection

5. The data set

6. Application: CAPM

7. Goodness of fit and significance of regression

8. Typical problems

9. Advanced techniques




1. The OLS estimator

1.1 Basic idea of OLS estimation

1.2 The univariate OLS estimator

1.3 The multivariate OLS estimator

1.4 Key Assumption

1.5 Correlation vs. causality

Literature: Verbeek, Chapter 2.1, 2.2, 2.3


1.1 Basic Idea of OLS Estimation

Suppose we have a sample of N = 10 observations of firms' stock prices y_i and their earnings per share (EPS) x_i.

Economic model: Higher EPS imply higher stock prices

Econometric question: Are xi and yi related?

i     Stock price   EPS
1     60            5
2     30            2
3     25            1
4     55            4
5     43            4.5
6     39            2
7     62            4
8     44            3
9     46            3.5
10    52            4

[Figure: scatter plot of stock price (vertical axis, 0 to 70) against earnings per share (horizontal axis, 0 to 6)]


The economic model describes the theoretical relation R between x_i and y_i:

$$ y = R(x) $$

which means that stock prices are somehow related to a firm's EPS.

Most economic relations cannot be measured without errors due to unobservable factors.

Thus, an econometric model looks for the best approximation f to the true relation R:

$$ y = f(x) \approx R(x) $$


The classical OLS model assumes a linear relation between x and y:

$$ y_i = \alpha + \beta x_i $$

where α and β are (constant) parameters.

Econometric question (OLS):

What is the best linear approximation to the true relation?

[Figure: scatter plot of stock price against earnings per share]


[Figure: scatter plot of stock price against earnings per share with the regression line; the residuals û are the vertical distances between the observations y and the line]

Idea: The best linear approximation is achieved if the sum of squared differences between observed and fitted values (= residuals = approximation errors) is minimal.

In the case where we have just one regressor:

squared residuals = squared vertical distances between y_i and its fitted value ŷ_i.

All fitted values lie on a straight line, the regression line.


Classical Ordinary Least Squares (OLS) Model:

$$ y_i = \alpha + \beta x_i + \varepsilon_i, \qquad i = 1, \dots, N \qquad (1) $$

with

y_i: dependent / endogenous variable (observation i), observable

x_i: independent / exogenous variable (observation i), observable

α: constant, not observable → to be estimated

β: slope coefficient, not observable → to be estimated by OLS

ε_i: error term of observation i, not observable

Question: How can we estimate α and β?








1.2 The Univariate OLS Estimator




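The OLS estimator is obtained by minimizing the sum of squared residuals:

$$ S_{\hat u \hat u} = \sum_{i=1}^{N} \hat u_i^2 \;\rightarrow\; \min! \qquad (2.1) $$

We know that the residuals $\hat u_i$ measure the vertical distance between the observation $y_i$ and the regression line $\hat y_i$:

$$ \hat u_i = y_i - \hat y_i $$

Since $\hat y_i$ is derived from the estimation model (2), we obtain

$$ \hat y_i = \hat\alpha + \hat\beta x_i $$

and therefore

$$ S_{\hat u \hat u} = \sum_{i=1}^{N} \left( y_i - \hat\alpha - \hat\beta x_i \right)^2 \;\rightarrow\; \min! \qquad (2.2) $$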


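The first-order conditions are obtained by

$$ \frac{\partial S_{\hat u \hat u}}{\partial \hat\alpha} = \sum_{i=1}^{N} 2 \left( y_i - \hat\alpha - \hat\beta x_i \right)(-1) \overset{!}{=} 0 \qquad (2.2.1) $$

$$ \frac{\partial S_{\hat u \hat u}}{\partial \hat\beta} = \sum_{i=1}^{N} 2 \left( y_i - \hat\alpha - \hat\beta x_i \right)(-x_i) \overset{!}{=} 0 \qquad (2.2.2) $$

which can be rewritten as

$$ \sum_{i=1}^{N} y_i = N \hat\alpha + \hat\beta \sum_{i=1}^{N} x_i \qquad (2.2.3) $$

$$ \sum_{i=1}^{N} y_i x_i = \hat\alpha \sum_{i=1}^{N} x_i + \hat\beta \sum_{i=1}^{N} x_i^2 \qquad (2.2.4) $$

These equations are the so-called normal equations of the OLS estimation.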



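Dividing (2.2.3) by N and substituting $\bar y = \frac{1}{N}\sum_{i=1}^N y_i$ and $\bar x = \frac{1}{N}\sum_{i=1}^N x_i$ leads to

$$ \bar y = \hat\alpha + \hat\beta \bar x $$

Solving the above equation for $\hat\alpha$ and inserting it into (2.2.4) gives

$$ \sum_{i=1}^{N} y_i x_i = (\bar y - \hat\beta \bar x) \sum_{i=1}^{N} x_i + \hat\beta \sum_{i=1}^{N} x_i^2 = N \bar x \bar y - \hat\beta N \bar x^2 + \hat\beta \sum_{i=1}^{N} x_i^2 $$

and therefore

$$ \hat\beta = \frac{\sum_{i=1}^{N} y_i x_i - N \bar x \bar y}{\sum_{i=1}^{N} x_i^2 - N \bar x^2} = \frac{\frac{1}{N}\sum_{i=1}^{N} y_i x_i - \bar x \bar y}{\frac{1}{N}\sum_{i=1}^{N} x_i^2 - \bar x^2} = \frac{\mathrm{Cov}(x, y)}{\mathrm{Var}(x)} \qquad (2.3) $$

Similarly, for $\hat\alpha$ we get:

$$ \hat\alpha = \frac{1}{N}\sum_{i=1}^{N} y_i - \hat\beta \, \frac{1}{N}\sum_{i=1}^{N} x_i $$

or

$$ \hat\alpha = \bar y - \hat\beta \bar x \qquad (2.4) $$

Thus, the intercept is determined such that the average approximation error (residual) is equal to zero.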


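Example: Stock price on EPS

$$ \hat\beta = \frac{\sum_{i=1}^{N} y_i x_i - N \bar x \bar y}{\sum_{i=1}^{N} x_i^2 - N \bar x^2} = \frac{1{,}625.5 - 10 \cdot 3.3 \cdot 45.6}{123.5 - 10 \cdot 10.89} = \frac{120.7}{14.6} = 8.2671 $$

$$ \hat\alpha = \bar y - \hat\beta \bar x = 45.6 - 8.2671 \cdot 3.3 = 18.3186 $$

i     y     x     x²       xy
1     60    5     25       300
2     30    2     4        60
3     25    1     1        25
4     55    4     16       220
5     43    4.5   20.25    193.5
6     39    2     4        78
7     62    4     16       248
8     44    3     9        132
9     46    3.5   12.25    161
10    52    4     16       208
Σ     456   33    123.5    1,625.5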
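The closed-form estimators (2.3) and (2.4) can be checked numerically. The following minimal Python sketch (standard library only; the function name ols_univariate is ours) recomputes β̂ and α̂ from the ten observations in the table, verifies that the residuals satisfy the normal equations, and evaluates the fitted price at EPS = 6.5. At full precision α̂ = 18.3185; the value 18.3186 results from plugging in the rounded β̂ = 8.2671.

```python
# Univariate OLS for the stock price / EPS example via (2.3) and (2.4).
stock = [60, 30, 25, 55, 43, 39, 62, 44, 46, 52]  # y_i: stock price
eps = [5, 2, 1, 4, 4.5, 2, 4, 3, 3.5, 4]          # x_i: earnings per share


def ols_univariate(x, y):
    """Return (alpha_hat, beta_hat) from the closed-form OLS solution."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar  # 120.7 here
    sxx = sum(xi * xi for xi in x) - n * x_bar ** 2                  # 14.6 here
    beta_hat = sxy / sxx                    # (2.3)
    alpha_hat = y_bar - beta_hat * x_bar    # (2.4)
    return alpha_hat, beta_hat


alpha_hat, beta_hat = ols_univariate(eps, stock)
residuals = [yi - alpha_hat - beta_hat * xi for xi, yi in zip(eps, stock)]
prediction = alpha_hat + beta_hat * 6.5  # firm 1 raises EPS from 5 to 6.5

print(f"alpha_hat = {alpha_hat:.4f}, beta_hat = {beta_hat:.4f}")
# Normal equations: residuals sum to zero and are orthogonal to x.
print(f"sum of residuals:     {sum(residuals):.1e}")
print(f"sum of x * residuals: {sum(x * u for x, u in zip(eps, residuals)):.1e}")
print(f"fitted price at EPS = 6.5: {prediction:.4f}")
```

Both residual sums are zero up to floating-point rounding, which is exactly what (2.2.1) and (2.2.2) require.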



Interpretation of the slope parameter (β)

For testing theories, we are most interested in the sign of β (not so much in its value) and in whether β is significant.

E.g., a significantly negative β estimate would be in sharp contrast to all standard valuation theories.

For practical applications, the value (and significance) of β is also interesting.

In the example, the estimated β suggests that a company could raise its stock price by 8.27 € if it managed to increase its EPS by 1 €.



Now we can analyze the effect of a change in EPS on the stock price.

Assume that the first firm (i = 1) manages to raise its EPS from 5 to 6.5.

The new estimated stock price is higher than before due to the positive coefficient:

$$ \hat y = \hat\alpha + \hat\beta x = 18.3186 + 8.2671 \cdot 6.5 = 72.0548 $$

Note that this is just an approximation based on the given data.



Interpretation of the intercept parameter (α)

The intercept parameter is the predicted value of y given x=0.

In some cases it makes little sense to set x=0 though.

In the example:

What is the value of a company with zero earnings?

Anything wrong with this question?



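Whether or not we want to interpret it, the intercept should always be included in a regression!

Otherwise we risk obtaining biased slope coefficients.

Excluding the intercept means that we impose the restriction α = 0, i.e., the regression line is forced through the origin:

$$ y_i = \alpha + \beta x_i + \varepsilon_i \quad \text{vs.} \quad y_i = \beta x_i + \varepsilon_i \;\; (\alpha = 0) $$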
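To see what imposing α = 0 does in the stock price example, the Python sketch below (standard library only) estimates the slope once with an intercept and once through the origin. In the latter case, minimizing Σ(y_i − βx_i)² gives β̂ = Σx_i y_i / Σx_i². With these data the restricted slope is about 13.16 instead of 8.27, because the slope now also has to absorb the general level of stock prices that the intercept would otherwise capture.

```python
# Compare OLS slopes with and without an intercept on the stock price / EPS data.
stock = [60, 30, 25, 55, 43, 39, 62, 44, 46, 52]
eps = [5, 2, 1, 4, 4.5, 2, 4, 3, 3.5, 4]
n = len(eps)
x_bar, y_bar = sum(eps) / n, sum(stock) / n

# Unrestricted model y = alpha + beta * x: slope = Cov(x, y) / Var(x)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(eps, stock))
sxx = sum((x - x_bar) ** 2 for x in eps)
beta_with_intercept = sxy / sxx

# Restricted model y = beta * x (alpha = 0): slope = sum(x*y) / sum(x^2)
beta_through_origin = (sum(x * y for x, y in zip(eps, stock))
                       / sum(x * x for x in eps))

print(f"slope with intercept:    {beta_with_intercept:.4f}")
print(f"slope without intercept: {beta_through_origin:.4f}")
```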








1.3 The Multivariate OLS Estimator



Univariate regression model

From a univariate regression we can only draw ceteris paribus conclusions under a strong assumption:

stock price = α + β · EPS + ε

o How do earnings per share affect the stock price, given that all other factors affecting the stock price (which were omitted) are uncorrelated with EPS?

o In most cases such an assumption will be unrealistic.

Multivariate regression model

The multivariate regression analysis allows us to explicitly control for many other factors that simultaneously affect the dependent variable:

stock price = β0 + β1 · EPS + β2 · expected growth rate + … + ε

→ This is important both for testing economic theories and for evaluating policy effects when we must rely on non-experimental data.



The multivariate regression model…

… is more flexible since more general functional forms can be incorporated.

… can explain more variation of y.

… can infer causality in cases where the univariate model would be misleading.

… is the most widely used vehicle for empirical analysis in economics and other

social sciences.



The general multivariate linear regression model is written as

$$ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon \qquad (1.3) $$

with β0: intercept (i.e., "α" in the univariate model)

β_j, j = 1, …, k: change in y with respect to x_j, holding the other factors fixed

Since there are k independent variables and one intercept, equation (1.3) contains k+1 (unknown) population parameters.


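Interpretation

In the general case, the estimates β̂_j have a partial-effect (or ceteris paribus) interpretation, i.e., they tell us how much y changes if x_j changes while all other x_l (l ≠ j) remain unchanged:

$$ \Delta \hat y = \hat\beta_j \, \Delta x_j $$

Nevertheless, we can also evaluate the overall impact of simultaneous changes in several or all of x1, x2, …, xk. Starting from

$$ \hat y = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2 + \dots + \hat\beta_k x_k $$

we obtain

$$ \Delta \hat y = \hat\beta_1 \Delta x_1 + \hat\beta_2 \Delta x_2 + \dots + \hat\beta_k \Delta x_k $$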








1.4 Key Assumption



Key Assumption: Zero conditional mean assumption

The key assumption for the general model is

E [ ε | x1, x2, … xk ] = 0

i.e., the expected value of the error term does not depend on the explanatory variables; in particular, the error term is uncorrelated with (any function of) the explanatory variables.

In a model with two independent variables, this simplifies to E [ ε | x1, x2 ] = 0.

I.e., for any (values of) x1 and x2 in the population, the expected value of the error term is zero.



Since the error term accounts for factors left out (i.e., unobserved data), this assumption means that the missing factors are not related to the factors included in the model (i.e., x1 and x2).

This assumption is crucial to obtain unbiased OLS estimators!

However, since we can never observe the true error terms (only the estimated residuals), it cannot be directly tested. Moreover, the OLS residuals are uncorrelated with the included regressors by construction (normal equations), so they cannot reveal a violation.








1.5 Correlation vs. Causality



The standard regression models assume a linear relationship between the exogenous (x_i) and the endogenous (y_i) variable:

$$ y_i = \alpha + \beta x_i + \varepsilon_i $$

Questions:

Does this linear relationship approximate the true economic model?

Could the relation between x and y be random even though we can estimate plausible values for α and β?

Does a change in x really cause a change in y?


Correlation:

A measure of the degree and direction of the (linear) relationship between two variables x and y, i.e., of how they vary together.

Correlation does not account for the source of the variation.

Causality:

An action is said to cause an outcome if the outcome is the direct result or

consequence of that action.

A change in x leads to a specific, measurable change in y.



An interesting (non-financial) example:

“The researchers predict that one type of health problem will increase with

rising intelligence. Asthma and other allergies are thought by many experts to

be rising […]. Some studies already suggest a (positive) correlation between a

country’s allergy levels and its average IQ.”

(The Economist, June 2010)

Is education/intelligence/IQ the cause for asthma and other allergies?



No, education is not the cause of allergies.

People in developed countries tend to be better educated than people in developing countries.

People in developed countries are also more often exposed to chemicals and artificial products and, hence, are more likely to develop an allergy.

How about IQ and intelligence leading to allergies?

If education causes intelligence, then the causal link is the same as above.

When the error term contains factors affecting y that are also correlated with x, the result can be a spurious correlation:

We find a relationship between y and x that is actually due to other unobserved factors that affect y and also happen to be correlated with x.
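Spurious correlation can be illustrated with a small simulation. The data-generating process below is purely hypothetical: an unobserved factor z drives y, and x is correlated with z but has no effect on y at all. Regressing y on x alone yields a slope of about 1. Once z is also included, the coefficient on x drops to about 0 and the coefficient on z is about 2. The sketch uses Python's standard library only, and the seed and sample size are arbitrary choices.

```python
import random

random.seed(42)
N = 10_000

# Hypothetical process: z is unobserved; x is correlated with z; y depends only on z.
z = [random.gauss(0, 1) for _ in range(N)]
x = [zi + random.gauss(0, 1) for zi in z]
y = [2 * zi + random.gauss(0, 1) for zi in z]


def cov(a, b):
    """Sample covariance of two equally long lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)


# Short regression: y on x only (z is omitted and ends up in the error term).
beta_short = cov(x, y) / cov(x, x)

# Long regression: y on x and z, solved from the centered normal equations.
sxx, szz, sxz = cov(x, x), cov(z, z), cov(x, z)
sxy, szy = cov(x, y), cov(z, y)
det = sxx * szz - sxz ** 2
beta_x_long = (szz * sxy - sxz * szy) / det
beta_z_long = (sxx * szy - sxz * sxy) / det

print(f"y on x only:  beta_x = {beta_short:.3f}")
print(f"y on x and z: beta_x = {beta_x_long:.3f}, beta_z = {beta_z_long:.3f}")
```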



Another interesting (non-financial) example:

Researchers found a correlation between a country’s average IQ and its

disease burden.

Is “disease burden” the cause in this study?

Or, more provocative, are ill people less intelligent?



Yes, disease burden is the cause:

“There is direct evidence that infections and parasites affect cognition.

… Intestinal worms and Malaria are bad for the brain.

… Diarrhoea strikes children hard. ... It prevents the absorption of food

at a time when the brain is growing and developing rapidly.”

(The Economist, June 2010)

Nevertheless, the provocative phrase is wrong. IQ is only affected if one

catches the really bad diseases (at a really bad time).



Finding a correlation is not sufficient.

A regression only shows the correlation between x and y; it gives no indication of causality.

One has to use economic theory to justify and interpret the documented relationship.

Example: “Stock price and EPS”:

The dividend discount model shows that the stock price is an increasing

function of earnings.



Wrap up

The basic idea of OLS estimation is to find the

best linear approximation

for the true relation between the explained and the explanatory variables. This

is achieved by minimizing the sum of squared residuals.

There are univariate and multivariate OLS models.

A multivariate OLS model is more flexible and is the most widely used vehicle

for empirical analysis in economics as well as other social sciences.

Under the Gauss-Markov assumptions, the OLS estimator β̂ is the

best linear unbiased estimator (BLUE)

for the true β.

β reflects the correlation between x and y but not necessarily the causality.











2. Application: Multiples

2.1 Theoretical background

2.2 Estimation of multiples

Literature: Damodaran (2001), chapter 24


2.1 Theoretical Background

Multiple valuation = valuation relative to comparable companies

Technique used frequently in enterprise valuation (M&A)

Target company is valued based on the current pricing of companies with similar characteristics (comparable companies approach)

Example: Price earnings ratio (P/E ratio)

According to the P/E ratio, the value (or price) of a company is just a multiple of the earnings it generates:

$$ \text{P/E ratio} = \frac{\text{stock market price (per share)}}{\text{net earnings (per share)}} $$

or

$$ \text{stock market price} = \text{P/E ratio} \times \text{net earnings} $$



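Relation of P/E Ratio to traditional approaches

Standard DDM:

Assuming a constant payout ratio d = D_t / NE_t for all t and a constant growth rate of earnings, NE_{t+1} = NE_t (1+g) (and hence D_{t+1} = D_t (1+g)), we get

$$ S_0 = \sum_{t=1}^{N} \frac{D_t}{(1+k)^t} = \sum_{t=1}^{N} \frac{d \cdot NE_t}{(1+k)^t} $$

S_t: market price per share

NE_t: net earnings per share

D_t: dividend per share

d: payout ratio

k: discount rate

$$ S_0 = d \cdot NE_0 \sum_{t=1}^{N} \frac{(1+g)^t}{(1+k)^t} \;\xrightarrow{N \to \infty}\; d \cdot NE_0 \, \frac{1+g}{k-g} \qquad (k > g) $$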



Rearranging

$$ S_0 = d \cdot NE_0 \, \frac{1+g}{k-g} $$

yields

$$ \frac{S_0}{NE_0} = \text{P/E ratio} = d \, \frac{1+g}{k-g} $$

→ P/E ratio depends on

the expected growth rate of earnings (g)

the company risk (i.e., the risk-adjusted discount rate k)

the expected payout ratio (d)
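The closed-form P/E ratio can be evaluated numerically. The Python sketch below computes d(1+g)/(k−g) for hypothetical inputs (40% payout, 5% growth, 10% discount rate) and compares it with the finite DDM sum over a long horizon, which converges to the same value when k > g. The function names are ours.

```python
def pe_ratio_growth(d, g, k):
    """P/E ratio d * (1 + g) / (k - g) implied by the constant-growth DDM."""
    if k <= g:
        raise ValueError("the closed form requires k > g")
    return d * (1 + g) / (k - g)


def pe_ratio_finite(d, g, k, n_years):
    """P/E ratio from the finite DDM sum d * sum_{t=1}^{N} ((1 + g) / (1 + k))^t."""
    q = (1 + g) / (1 + k)
    return d * sum(q ** t for t in range(1, n_years + 1))


# Hypothetical inputs: payout ratio 40%, growth 5%, discount rate 10%
print(pe_ratio_growth(0.40, 0.05, 0.10))
print(pe_ratio_finite(0.40, 0.05, 0.10, 500))
# Higher expected growth raises the P/E ratio; a higher discount rate (risk) lowers it
print(pe_ratio_growth(0.40, 0.06, 0.10), pe_ratio_growth(0.40, 0.05, 0.11))
```

The three printed comparisons reproduce the qualitative statements above: the P/E ratio increases in g and d and decreases in k.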






2.2 Estimation of multiples



First step: Data collection

We want to value the American XYZ brewery employing a P/E Ratio approach.

Information on comparable companies in the beverage industry can be obtained from several data providers, e.g.

Thomson Reuters Datastream

Bloomberg

Compustat (Standard & Poor's)

Center for Research in Security Prices (CRSP)

Yahoo Finance, Onvista, and others


Second step: Selection of a sample of comparable firms

Our economic model (i.e. DDM) suggests that the P/E ratio is determined by

the (expected) growth rate, payout ratio and earnings risk:

P/E Ratio = f ( growth, payout, risk )

Therefore, comparable firms are the ones with a similar growth potential, a

similar payout ratio and a similar earnings risk.

Finding firms with identical characteristics is virtually impossible

Often it is even hard to find firms that are “similar” with respect

to all three characteristics

Therefore, we have to identify ways of controlling for differences across

firms on these variables.



Third step: Regression analysis

Traditional multiple analysis selects all (or some) firms from the same industry

(peer group) and computes the average multiple of these firms.

However, a simple average is equivalent to running a (degenerate) regression of the multiples on a constant term only, i.e.

yi = α0 + εi

where yi = P/E ratio of company i




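Remember that α̂ is defined as:

$$ \hat\alpha = \frac{1}{N}\sum_{i=1}^{N} y_i - \hat\beta \, \frac{1}{N}\sum_{i=1}^{N} x_i = \bar y - \hat\beta \bar x $$

Hence, the intercept equals the average of the dependent variable if we run a regression on a constant term only (no x, i.e., β̂ = 0):

$$ \hat\alpha = \bar y $$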

Company                      P/E ratio
Coca-Cola Bottling           29.18
Molson Inc. Ltd. 'A'         43.65
Anheuser-Busch               24.31
Corby Distilleries Ltd.      16.24
Chalone Wine Group Ltd.      21.76
Andres Wines Ltd. 'A'         8.96
Average:                     24.0167

Regression output:

            coefficient   std. error   t-stat
constant    24.0167       4.8459       4.956
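The equivalence between the average multiple and a regression on a constant can be checked with the six P/E ratios above. In the Python sketch below, the OLS constant equals the sample mean. Its standard error is the sample standard deviation (N − 1 degrees of freedom, since only one parameter is estimated) divided by √N. For these data this gives 24.0167, a standard error of 4.8459 and a t-statistic of about 4.96.

```python
import math
import statistics

pe_ratios = {
    "Coca-Cola Bottling": 29.18,
    "Molson Inc. Ltd. 'A'": 43.65,
    "Anheuser-Busch": 24.31,
    "Corby Distilleries Ltd.": 16.24,
    "Chalone Wine Group Ltd.": 21.76,
    "Andres Wines Ltd. 'A'": 8.96,
}
y = list(pe_ratios.values())
n = len(y)

# Regression of y on a constant only: the OLS constant is the sample mean.
alpha_hat = statistics.mean(y)
# Standard error of the constant: residual std. dev. (N - 1 df) / sqrt(N)
se_alpha = statistics.stdev(y) / math.sqrt(n)
t_stat = alpha_hat / se_alpha

print(f"constant = {alpha_hat:.4f}, std. error = {se_alpha:.4f}, t = {t_stat:.3f}")
```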


Such a simple regression (y on a constant) neglects any differences in value-relevant characteristics (e.g., growth, risk) across peer-group firms.

In contrast, a regular regression can easily account for differences in the value drivers across firms.

Procedure:

Run a multivariate regression of multiples (y) on variables which presumably influence multiples (e.g., x1 = growth rate, x2 = risk, …):

$$ y_i = \alpha_0 + \alpha_1 x_{i,1} + \alpha_2 x_{i,2} + \dots + \varepsilon_i $$

Use the estimated coefficients from this regression to compute predicted multiples (i.e., ŷ_i).



Assume we want to value a company with the following value characteristics:

estimated earnings growth rate: 8%

suggested payout ratio: 15%

estimated risk (= annualized standard deviation of stock returns): 23.5%

Earnings in 2006: 3.14 million

Furthermore, assume we want to ignore differences in the payout ratio, but account for differences with respect to risk and expected growth.


Example: Regression Analysis

Economic hypothesis: PER is primarily influenced by growth and risk

Regression equation:

$$ PER_i = \beta_0 + \beta_1 \, growth_i + \beta_2 \, risk_i + \varepsilon_i $$

Company                    PER      growth    risk
Coca-Cola Bottling         29.18     9.50%    20.58%
Molson Inc. Ltd. 'A'       43.65    15.50%    21.88%
Anheuser-Busch             24.31    11.00%    22.92%
Corby Distilleries Ltd.    16.24     7.50%    23.66%
Chalone Wine Group Ltd.    21.76    14.00%    24.08%
Andres Wines Ltd. 'A'       8.96     3.50%    24.70%
Average:                   24.02    10%       23%

PER: price earnings ratio; growth: analysts' forecast of the company growth rate; risk: annualized standard deviation of stock market returns



Download the data set "multiples.dta".

You may open the data set with a simple double click in Windows Explorer.

Alternatively, open Stata and load the data set by typing in the Command window:

cd "d:\...\my_data_directory"
use "multiples.dta"

After reading the data, the available variables are listed in the Variables window. Make sure you are in the Command window (bottom right) when typing commands.



The syntax for running a regression is:

regress [dependent variable] [list of independent variables]

We want to regress the variable “PER” on the variables “growth” and “risk”.

In Stata syntax:

regress PER growth risk
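Outside of Stata, the same estimates can be reproduced from the normal equations. The following Python sketch (standard library only; the function name ols_two_regressors is ours) solves the centered normal equations for a constant plus two regressors, with growth and risk entered in percent. It then prices the target firm. With full-precision coefficients the adjusted multiple is about 18.2956; plugging in the rounded coefficients 96.0088, 1.6908 and −3.8825 gives 18.2965.

```python
# Peer-group data (growth and risk in percent)
per = [29.18, 43.65, 24.31, 16.24, 21.76, 8.96]
growth = [9.5, 15.5, 11.0, 7.5, 14.0, 3.5]
risk = [20.58, 21.88, 22.92, 23.66, 24.08, 24.70]


def ols_two_regressors(y, x1, x2):
    """OLS of y on a constant, x1 and x2 via the centered normal equations."""
    n = len(y)
    m_y, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    d_y = [v - m_y for v in y]
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, d_y))
    s2y = sum(b * c for b, c in zip(d2, d_y))
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = m_y - b1 * m1 - b2 * m2  # the intercept makes the mean residual zero
    return b0, b1, b2


b0, b1, b2 = ols_two_regressors(per, growth, risk)
print(f"constant = {b0:.4f}, growth = {b1:.4f}, risk = {b2:.4f}")

# Adjusted multiple and firm value for the target (growth 8%, risk 23.5%)
per_target = b0 + b1 * 8 + b2 * 23.5
earnings = 3_140_000
print(f"predicted P/E = {per_target:.4f}, firm value = {per_target * earnings:,.0f}")
```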



From the regression results we would calculate the following "adjusted" multiple:

$$ \widehat{PER}_i = \hat\beta_0 + \hat\beta_1 \, growth_i + \hat\beta_2 \, risk_i = 96.0088 + 1.6908 \cdot 8 - 3.8825 \cdot 23.5 = 18.2965 $$

and thus a firm value of

$$ \widehat{PER}_i \cdot Earnings_i = 18.2965 \cdot 3{,}140{,}000 = 57{,}451{,}010 $$

In contrast, a traditional P/E ratio (based on the plain average, i.e., neglecting differences in value drivers) would result in a firm value of

$$ \overline{PER} \cdot Earnings_i = 24.02 \cdot 3{,}140{,}000 = 75{,}422{,}800 $$


Summary :

In its simplest form (yi = β0 + εi) the regression approach is identical to

computing an average as in traditional multiple analysis

Advantages of regular regression (yi = β0 + β1 x1 + … + εi) :

Regression allows us to explicitly quantify the impact of additional factors (i.e., differences in the characteristics of the "peer group")

Regression allows us to test for the significance of the impact of such differences

Regression allows us to use a larger sample (e.g., by including companies from other industries while controlling for differences across industries)

→ allows examination of over- and under-valuations across whole market







3. Significance of estimated regression coefficients



Testing hypotheses regarding a single parameter: t-test

Running a regression, we get some parameter estimates.

Now, an important question is: how reliable are the estimated parameter values?

E.g., for the influence factors of P/E ratios we obtained the estimates

            coefficients
intercept   96.0088045
growth       1.69075039
risk        -3.88251778

But can we really say that the influence of earnings growth is positive?

Or could the true parameter value of growth just be zero?

→ Test: How large is the probability of making a mistake when we say that growth positively influences P/E ratios?



A simulation example (simulation1_OLS.do)

Draws 100 obs. for X1, X2 and ε and calculates Y = α + β1 X1 + β2 X2 + ε

Then the parameters are estimated using the simulated data

Results for a "small" residual variance (relative to the variance of X1 and X2):

true values: alpha = 0.1, beta1 = 1.5, beta2 = -0.5, sigma(eps) = 0.1
run    est. alpha   beta1     beta2
# 1    0.1035       1.4990    -0.5174
# 2    0.0941       1.4839    -0.4879
# 3    0.1197       1.5095    -0.4792
# 4    0.0945       1.5031    -0.5079
# 5    0.1200       1.5046    -0.4994

Results for a "large" residual variance:

true values: alpha = 0.1, beta1 = 1.5, beta2 = -0.5, sigma(eps) = 10.0
run    est. alpha   beta1     beta2
# 1    -1.5320      1.3342    -0.3292
# 2    -1.3733      2.3848     1.1092
# 3     1.8602      2.1849    -1.2090
# 4     0.6060      1.3722    -0.1153
# 5     0.5676      0.3786    -0.5342
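The logic of simulation1_OLS.do can be mimicked in Python. The sketch below is not the original do-file. It assumes X1, X2 ~ N(0, 1) and normal errors, because the slides do not state the distributions used. It draws 100 observations and re-estimates the parameters five times each for σ(ε) = 0.1 and σ(ε) = 10. The individual numbers will differ from the tables because the random draws differ. The pattern should be the same: tight estimates for a small residual variance, scattered estimates for a large one.

```python
import random


def simulate_ols(n, alpha, beta1, beta2, sigma_eps, rng):
    """Simulate Y = alpha + beta1*X1 + beta2*X2 + eps and return OLS estimates.

    Assumption: X1, X2 ~ N(0, 1) and eps ~ N(0, sigma_eps^2), all independent.
    """
    x1 = [rng.gauss(0, 1) for _ in range(n)]
    x2 = [rng.gauss(0, 1) for _ in range(n)]
    y = [alpha + beta1 * a + beta2 * b + rng.gauss(0, sigma_eps)
         for a, b in zip(x1, x2)]
    # OLS via the centered normal equations (constant + two regressors)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    a_hat = my - b1 * m1 - b2 * m2
    return a_hat, b1, b2


rng = random.Random(1)
for sigma in (0.1, 10.0):
    print(f"true alpha = 0.1, beta1 = 1.5, beta2 = -0.5, sigma(eps) = {sigma}")
    for run in range(1, 6):
        a_hat, b1_hat, b2_hat = simulate_ols(100, 0.1, 1.5, -0.5, sigma, rng)
        print(f"  # {run}  est. alpha = {a_hat:.4f}  "
              f"beta1 = {b1_hat:.4f}  beta2 = {b2_hat:.4f}")
```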



A second simulation example (simulation2_OLS.do)

Simulation2_OLS.do repeats the previous simulation 1,000 times and produces a histogram of the resulting parameter estimates

[Figure: density histograms of the 1,000 estimates of beta1. Left panel: "small" residual variance (σ(ε) = 0.1), horizontal axis from 1.46 to 1.54. Right panel: "large" residual variance (σ(ε) = 10.0), horizontal axis from -2 to 6.]
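A Python counterpart to simulation2_OLS.do repeats the estimation 1,000 times and summarizes the sampling distribution of β̂1 instead of plotting it. It is again a sketch with assumed N(0, 1) regressors, not the original do-file. The standard deviation of the estimates should be roughly 100 times larger for σ(ε) = 10 than for σ(ε) = 0.1, while both distributions are centered near the true value 1.5.

```python
import random
import statistics


def beta1_estimate(n, sigma_eps, rng, alpha=0.1, beta1=1.5, beta2=-0.5):
    """One simulated sample; return the OLS estimate of beta1.

    Assumption: X1, X2 ~ N(0, 1), eps ~ N(0, sigma_eps^2), all independent.
    """
    x1 = [rng.gauss(0, 1) for _ in range(n)]
    x2 = [rng.gauss(0, 1) for _ in range(n)]
    y = [alpha + beta1 * a + beta2 * b + rng.gauss(0, sigma_eps)
         for a, b in zip(x1, x2)]
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    return (s22 * s1y - s12 * s2y) / (s11 * s22 - s12 ** 2)


rng = random.Random(2015)
draws = {sigma: [beta1_estimate(100, sigma, rng) for _ in range(1000)]
         for sigma in (0.1, 10.0)}

for sigma, est in draws.items():
    print(f"sigma(eps) = {sigma:>4}: mean = {statistics.mean(est):.4f}, "
          f"std. dev. = {statistics.stdev(est):.4f}, "
          f"range = [{min(est):.2f}, {max(est):.2f}]")
```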


Idea of hypothesis testing

Formally, we start with a given set of hypotheses about β, e.g.

$$ H_0: \beta_j = \beta_j^0 \;\; \text{("null hypothesis")} \quad \text{against} \quad H_1: \beta_j \neq \beta_j^0 \;\; \text{("alternative")} $$

Routinely, Stata tests

$$ H_0: \beta_j = 0 \quad \text{against} \quad H_1: \beta_j \neq 0 $$

To test such a hypothesis, we need to know the distribution of β̂ given that the null hypothesis is valid. From this we can construct a test statistic.

Knowing the distribution, we can say how likely it is to observe a given value of the test statistic (corresponding to the estimated parameter value).


Under the full ideal conditions (i.e., the Gauss-Markov assumptions (A1) to (A4) and normally distributed error terms (A5)), we know that the OLS estimator (i.e., the vector of parameter estimates) is distributed as

$$ \hat\beta \sim N\!\left(\beta, \; \sigma^2 (X'X)^{-1}\right) \qquad (3.1) $$

Thus, each element of the vector β̂ is normally distributed:

$$ \hat\beta_j \sim N\!\left(\beta_j, \; \sigma^2 c_{jj}\right) $$

where c_jj is the (j, j) element of (X'X)^{-1}.

Typically σ is unknown, but it can be estimated from the (estimated) residuals:

$$ \hat\sigma^2 = \frac{1}{N-(k+1)} \sum_{i=1}^{N} \hat u_i^2 $$


Based on the estimate of σ we can obtain the t-statistic:

$$ t_j = \frac{\hat\beta_j - \beta_j^0}{\hat\sigma \sqrt{c_{jj}}} = \frac{\hat\beta_j - \beta_j^0}{se(\hat\beta_j)} $$

This test statistic can be computed from

β̂_j: the estimate, and

se(β̂_j) = σ̂ √c_jj: its standard error.

Under H0, the t-statistic follows a Student's t-distribution with N-(k+1) degrees of freedom (→ Appendices t-statistic and Distribution Theory).

If a regression has k regressors and an intercept, we obtain k+1 individual t-statistics, i.e., one for every regressor plus one for the intercept.


Example

Consider the OLS estimation:

$$ \text{stock price} = \alpha + \beta \cdot EPS + \varepsilon $$

The null hypothesis H0: β = 0 states that EPS do not have the assumed effect on the stock price.

In contrast, if we can reject the null hypothesis H0: β = 0, we can say that the alternative hypothesis, H1: β ≠ 0, holds.

Then EPS are related to stock prices.
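For the stock price/EPS example with N = 10 and one regressor, the t-statistic for H0: β = 0 can be computed by hand. The Python sketch below estimates σ̂² from the residuals. It uses the fact that, in a univariate regression, c_jj for the slope equals 1/Σ(x_i − x̄)². It then compares |t| with the tabulated two-sided 5% critical value t_{8;0.025} ≈ 2.306, entered as a constant because the standard library has no t quantile function. The result is a standard error of about 1.68 and t ≈ 4.93, so H0 would be rejected at the 5% level.

```python
import math

stock = [60, 30, 25, 55, 43, 39, 62, 44, 46, 52]
eps = [5, 2, 1, 4, 4.5, 2, 4, 3, 3.5, 4]
n, k = len(eps), 1  # N observations, k regressors (plus intercept)

x_bar, y_bar = sum(eps) / n, sum(stock) / n
sxx = sum((x - x_bar) ** 2 for x in eps)
beta_hat = sum((x - x_bar) * (y - y_bar) for x, y in zip(eps, stock)) / sxx
alpha_hat = y_bar - beta_hat * x_bar

# sigma_hat^2 = sum of squared residuals / (N - (k + 1))
residuals = [y - alpha_hat - beta_hat * x for x, y in zip(eps, stock)]
sigma2_hat = sum(u * u for u in residuals) / (n - (k + 1))

# Slope in a univariate regression: c_jj = 1 / sum((x - x_bar)^2)
se_beta = math.sqrt(sigma2_hat / sxx)
t_beta = (beta_hat - 0.0) / se_beta  # H0: beta = 0

# Two-sided 5% critical value of the t-distribution with 8 df (from tables)
t_crit = 2.306

print(f"beta_hat = {beta_hat:.4f}, se = {se_beta:.4f}, t = {t_beta:.3f}")
print("reject H0: beta = 0" if abs(t_beta) > t_crit else "do not reject H0")
```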


Significance Level

The usual testing strategy is to reject H0 if the test statistic |t_j| becomes so large that it is very unlikely to occur if H0 is true.

Mathematically speaking, H0 is rejected if the probability of observing such a large absolute value |t_j| is quite small, i.e., below some threshold α (e.g., 5%).

This threshold probability α is called the significance level.




Critical values (two-sided test)

Formally, we define a critical probability α (= significance level) and derive a critical value of the t-statistic t_{N-(k+1);α/2} using

$$ \text{Prob}\left\{ |t_j| > t_{N-(k+1);\alpha/2} \right\} = \alpha $$

Then, if H0 holds,

H0 will be correctly accepted with probability (1 – α).

H0 will be erroneously rejected with probability α.

[Figure: density f(t) of the t-distribution; acceptance region with probability 1-α between -t_{N-(k+1);α/2} and t_{N-(k+1);α/2}, rejection regions with probability α/2 in each tail]


Critical value (one-sided test)

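For a right-sided test, the critical value is determined from

$$ \text{Prob}\left\{ t_j > t_{N-(k+1);\alpha} \right\} = \alpha $$

It is important to note that a one-sided test makes it "easier" to reject the null hypothesis because it requires a smaller critical value to reject, i.e., t_{N-(k+1);α} < t_{N-(k+1);α/2}.

[Figure: density f(t) of the t-distribution; acceptance region with probability 1-α to the left of t_{N-(k+1);α}, rejection region with probability α in the right tail]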


p-values

Basic question: Which significance level should be tested?

Different researchers prefer different significance levels,

depending on the particular application.

There is no "correct" significance level.

Typically, for large N we require smaller α.

Rather than testing at different significance levels, one can ask:

What is the smallest significance level at which H0 would be rejected?

This is the p-value.

In other words, the p-value is the probability of observing a t-statistic at least as extreme as the one actually obtained, if the null hypothesis is true.



Example

If the p-value = 0.6, then we expect to observe a value of the t-statistic as extreme as ours in 60% of all random samples if H0 is true.

In other words, it is not unusual to observe such a t-statistic (given H0 is true).

In contrast, if the p-value were 0.01, there would be only a 1% chance of observing such a t-statistic (given H0 is true).

In other words, it is very unlikely to observe such a t-statistic if H0 is true, and therefore the data speak against H0.

In general, the smaller the p-value, the stronger the evidence against H0.
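The p-value definition can be made concrete with a few lines of Python (standard library only). The sketch uses a large-sample normal approximation, i.e., it treats t_j as N(0, 1), which is reasonable when N − (k+1) is large. For small samples the exact Student's t p-value, which Stata's regress reports in the P>|t| column, is somewhat larger. A t-statistic of about 0.52 yields a two-sided p-value of about 0.6, and 1.96 yields about 0.05.

```python
import math


def p_two_sided_normal(t_stat):
    """P(|Z| >= |t|) for Z ~ N(0, 1): large-sample approximation of the p-value."""
    return math.erfc(abs(t_stat) / math.sqrt(2))


def p_right_sided_normal(t_stat):
    """P(Z >= t) for Z ~ N(0, 1): right-sided p-value, large-sample approximation."""
    return 0.5 * math.erfc(t_stat / math.sqrt(2))


for t in (0.52, 1.645, 1.96, 2.58):
    print(f"t = {t:5.3f}: two-sided p = {p_two_sided_normal(t):.4f}, "
          f"right-sided p = {p_right_sided_normal(t):.4f}")
```

For a positive t the right-sided p-value is half the two-sided one. This is another way of seeing why a one-sided test rejects more easily.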




When a hypothesis

$$ H_0: \beta_j = \beta_j^0 \quad \text{vs.} \quad H_1: \beta_j \neq \beta_j^0 $$

is statistically tested, two types of errors can be made:

1. H0 is rejected, although it is true (i.e., H0 is erroneously rejected) → Type I error.

2. H0 is not rejected, although H1 is true (i.e., H0 is erroneously accepted) → Type II error.

                          Reality
Decision                  β_j = β_j^0      β_j ≠ β_j^0
H0 not rejected           correct          Type II error
H0 rejected               Type I error     correct


Example

Recall the regression output from the multiples application:

$$ PER_i = \beta_0 + \beta_1 \, growth_i + \beta_2 \, risk_i + \varepsilon_i $$


3. Chapter Summary

A hypothesis test uses statistics to decide whether a hypothesis should be rejected.

If the observed values are unlikely under the assumption of H0, this indicates that H0 does not hold.

In regression analysis, one obtains for every regressor, including the intercept, a t-statistic

$$ t_j = \frac{\hat\beta_j - \beta_j^0}{\hat\sigma \sqrt{c_{jj}}}, $$

which (under H0) is t-distributed with N-(k+1) degrees of freedom.

These t-statistics, or equivalently the corresponding p-values, are used in two-sided or one-sided hypothesis tests.

The probability of a Type I error (H0 is erroneously rejected) is determined by choosing the significance level α of the test. The probability of a Type II error (H0 is not rejected although it is false) depends on the true parameters.





Statistical Appendix

Appendix: t-statistic

From (3.1) it follows that we can standardize β̂_j, i.e.,

$$ z = \frac{\hat\beta_j - \beta_j^0}{\sigma \sqrt{c_{jj}}} $$

Then, under H0, the resulting variable z has a standard normal distribution, i.e., z ~ N(0,1).

However, to obtain the above expression we have to assume that σ is known. But in practice, this is typically not the case.

Since σ is typically unknown, we have to substitute σ (resp. σ²) by an appropriate estimate, i.e.,

$$ \hat\sigma^2 = \frac{1}{N-(k+1)} \sum_{i=1}^{N} \hat u_i^2 $$

But then the above ratio z changes and no longer follows exactly a standard normal distribution.

It can be shown that the estimate σ̂² is independent of β̂ and that, appropriately scaled, it has a Chi-squared distribution with N-(k+1) degrees of freedom:

$$ \frac{(N-(k+1))\,\hat\sigma^2}{\sigma^2} \sim \chi^2_{N-(k+1)} $$

Therefore, the t-statistic (i.e., the z-statistic after substituting σ with its estimate)

$$ t_j = \frac{\hat\beta_j - \beta_j^0}{\hat\sigma \sqrt{c_{jj}}} $$

is the ratio of a standard normal variable and the square root of an independent Chi-squared variable divided by its degrees of freedom.

This ratio has a t-distribution (Student's t-distribution) with N-(k+1) degrees of freedom.

(→ Appendix: Distribution Theory)


Example

Consider the OLS estimation:

$$ \text{stock price} = \alpha + \beta \cdot EPS + \varepsilon $$

The null hypothesis H0: β = 0 states that EPS do not have the assumed effect on the stock price.

In contrast, if the null hypothesis H0: β = 0 does not hold, we can say that the alternative hypothesis, H1: β ≠ 0, holds. Then EPS are related to stock prices.


Appendix: Distribution Theory

A normally distributed variable follows a standard normal distribution if µ = 0 and σ = 1. The density function of such a variable is given by

$$ \varphi(z) = \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{z^2}{2}\right) $$

If z_1, …, z_J is a set of independent normal variables with mean µ and variance σ², it follows that the sum of the squared standardized variables

$$ \sum_{j=1}^{J} \left( \frac{z_j - \mu}{\sigma} \right)^2 \sim \chi^2(J) $$

follows a Chi-squared distribution with J degrees of freedom.

[Figure: densities of the χ²(5), χ²(10) and χ²(30) distributions]


If z has a standard normal distribution, z ~ N(0,1),

and χ²_J is a Chi-squared variable with J degrees of freedom,

and z and χ²_J are independent,

then the ratio

$$ t = \frac{z}{\sqrt{\chi^2_J / J}} $$

has a t-distribution with J degrees of freedom.

If J approaches infinity, the t-distribution approaches the normal distribution.

[Figure: densities of t(1), t(4) and t(∞) = N(0,1)]
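The two constructions in this appendix can be checked by simulation. The Python sketch below builds χ²(J) variables as sums of J squared standard normals, and t(J) variables as z/√(χ²_J/J) with an independent z. The seed, J = 10 and the number of draws are arbitrary choices. The sample mean of the χ² draws should be close to J. The sample variance of the t draws should be close to J/(J−2) = 1.25, which is larger than the variance 1 of the standard normal because of the t-distribution's heavier tails.

```python
import math
import random
import statistics

rng = random.Random(7)
J = 10            # degrees of freedom
n_draws = 50_000

chi2_draws, t_draws = [], []
for _ in range(n_draws):
    # chi-squared(J): sum of J squared independent standard normal variables
    chi2 = sum(rng.gauss(0, 1) ** 2 for _ in range(J))
    # t(J): an independent standard normal divided by sqrt(chi2 / J)
    z = rng.gauss(0, 1)
    chi2_draws.append(chi2)
    t_draws.append(z / math.sqrt(chi2 / J))

print(f"mean of chi2({J}) draws:  {statistics.mean(chi2_draws):.3f} (theory: {J})")
print(f"variance of t({J}) draws: {statistics.variance(t_draws):.3f} "
      f"(theory: {J / (J - 2):.3f}; standard normal: 1)")
```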











4. Variable selection

4.1 Dummy variables

4.2 More or less variables?

4.3 Proxies for unknown variables

Literature: Wooldridge, chapter 7


4.1 Dummy Variables

Dummy variables provide a powerful concept for empirical work.

On the one hand, they allow us to incorporate qualitative factors, e.g.

industry of a firm (manufacturing, retail, etc.)

membership in an index (S&P500, DJ, …),

...

Since the "ordering" of such qualitative variables conveys no useful information, they are typically coded in binary form (e.g., 1 = firm belongs to the manufacturing industry, 0 = does not belong to the manufacturing industry; or: 1 = economy is in a recession, 0 = expansion).

On the other hand, they allow us to account for important differences between observations, e.g.,

state of the economy (expansion vs. recession)

asset characteristics (fixed vs. current; long-term vs. short-term, …)

How to create dummy variables?

In defining a dummy variable, we must decide which event is assigned the

value one and which is assigned the value zero.

For example, we distinguish firms whether they belong to the S&P 500 index

or not.

For every observation in our data set we have to compute a value for the following dummy variable:

$$ index_{i,t} = \begin{cases} 1, & \text{if firm } i \text{ belongs to the S\&P 500 index at date } t \\ 0, & \text{else} \end{cases} $$



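Example

Assume we have two datasets

(1) EPS_data.dta (containing the variables: Year FirmIdent EPS)

(2) INDEX_data.dta (containing the variables: Year FirmIdent IndexName)

For every observation in our EPS data set we want to compute a dummy variable called "index" indicating whether a firm in a given year is a member of the S&P500:

use EPS_data.dta
* old merge syntax (both files sorted by Year FirmIdent); in Stata 11+ e.g. merge 1:1 Year FirmIdent using INDEX_data.dta, if each firm-year occurs at most once per file
merge Year FirmIdent using INDEX_data.dta
* index = 1 for S&P500 members, 0 otherwise (including firms without an index entry)
gen index = 0
replace index = 1 if IndexName == "S&P500"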


How do we incorporate dummy variables into regression models?

We can use them just like any other variable.

For example, in the simplest case of only a single dummy explanatory variable, we just add the dummy variable as an additional independent variable:

$$ \text{stock price} = \beta_0 + \beta_1 \cdot index + \beta_2 \cdot EPS + \varepsilon \qquad (4.4) $$


How to interpret the results?

In model (4.4), only two observed factors affect the stock price: earnings per share (EPS) and index membership (index),

with index = 1 when the firm belongs to the S&P500 and index = 0 when the firm is not listed in the S&P500.

Then β1 captures the difference in stock prices between stocks belonging to the index and those not belonging to it, given the same earnings per share:

$$ \beta_1 = E[\,\text{stock price} \mid index = 1, EPS\,] - E[\,\text{stock price} \mid index = 0, EPS\,] $$


The results can be depicted graphically as an intercept shift between two regression lines (with the same slope β2):

[Figure: stock price against EPS; two parallel lines with slope β2: stock = β0 + β2 EPS for non-members (intercept β0) and stock = β0 + β1 index + β2 EPS for members (intercept β0 + β1)]


On a large sample for the year 1990 (stock_eps_1990.dta) we obtain:

From this regression we can conclude that stock prices tend to be higher

for S&P 500 Index firms (at the same level of earnings per share).



In our example, we have chosen firms not belonging to the S&P500 to be the base group or benchmark group (by assigning the value 0 to those firms).

The base group is defined as the group against which comparisons are made.

We could run the same regression slightly differently by dropping the overall intercept in the model and including a dummy variable for each group:

stock price = β0 · non index + β1 · index + β2 · EPS + ε

with

$$ index = \begin{cases} 1, & \text{if firm belongs to the S\&P500} \\ 0, & \text{else} \end{cases} \qquad non\;index = \begin{cases} 1, & \text{if firm does not belong to the S\&P 500} \\ 0, & \text{else} \end{cases} $$


The resulting regression lines are identical, but the coefficients will be different and have to be interpreted differently:

β0 = intercept for non-members (as before)

β1 = intercept for members (before: β0 + β1)
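The equivalence of the two specifications can be verified numerically. In this Python/NumPy sketch (illustration only; the simulated data and names are hypothetical) both designs span the same column space, because constant = non index + index, so the fitted values coincide while the intercept coefficients are reparametrized.

```python
import numpy as np

# Hypothetical simulated data: intercept 20 for non-members, 28 for members
rng = np.random.default_rng(1)
n = 150
eps = rng.uniform(0, 6, n)
index = (rng.uniform(size=n) < 0.3).astype(float)
non_index = 1.0 - index
price = 20 + 8 * index + 7 * eps + rng.normal(0, 1, n)

# Specification (1): overall intercept + index dummy
X1 = np.column_stack([np.ones(n), index, eps])
c1, *_ = np.linalg.lstsq(X1, price, rcond=None)

# Specification (2): no overall intercept, one dummy per group
X2 = np.column_stack([non_index, index, eps])
c2, *_ = np.linalg.lstsq(X2, price, rcond=None)

# Identical fitted regression lines, different coefficients
print("same fitted values:", np.allclose(X1 @ c1, X2 @ c2))
print("spec (1): non-members", round(c1[0], 3), "members", round(c1[0] + c1[1], 3))
print("spec (2): non-members", round(c2[0], 3), "members", round(c2[1], 3))
```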

[Figure: stock price plotted against EPS. Two parallel lines with slope β2: stock = β0 · non index + β2 · EPS for non-members (intercept β0) and stock = β1 · index + β2 · EPS for members (intercept β1).]



(1) stock price = β0 + β1 · index + β2 · EPS + ε

(2) stock price = β0 · non index + β1 · index + β2 · EPS + ε

Why should we use one or the other specification?

Specification (2) is preferable if you are interested in the values of the intercepts of each group. But you can also find these out with specification (1).

Specification (1) is preferable if you are interested in whether there is a statistically significant difference in the intercepts. Then you just need to look at the p-value of β1.

From specification (2) this cannot be seen without additional testing.


Can we also estimate the following model?

(3) stock price = β0 + β1 · non index + β2 · index + β3 · EPS + ε

In this case, the matrix X would look like:

i    constant (β0)    non index (β1)    index (β2)    EPS (β3)
1    1                0                 1             5
2    1                1                 0             2
3    1                1                 0             1
4    1                0                 1             4
…    …                …                 …             …

→ exact multicollinearity: constant = non index + index
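The rank deficiency can be made visible directly. This Python/NumPy sketch (illustration only) uses the first four rows of the matrix X from the table above: the constant column is the sum of the two dummy columns, so X has only three linearly independent columns and X′X cannot be inverted.

```python
import numpy as np

# First four rows of the design matrix for specification (3):
# columns = constant, non index, index, EPS
X = np.array([
    [1, 0, 1, 5],
    [1, 1, 0, 2],
    [1, 1, 0, 1],
    [1, 0, 1, 4],
], dtype=float)

# The constant column equals non index + index ...
print("constant == non index + index:", np.array_equal(X[:, 0], X[:, 1] + X[:, 2]))
# ... so only 3 of the 4 columns are linearly independent and X'X is singular
print("rank of X:", np.linalg.matrix_rank(X))
print("rank of X'X:", np.linalg.matrix_rank(X.T @ X))
```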


Dummy Variables for multiple categories

What happens if a variable has more than 2 categories, for example, g industry groups?

Then the regression model needs to produce different intercepts for these g categories, by

(1) including g − 1 dummy variables along with an overall intercept, or

(2) including g dummy variables without an overall intercept.

Again, in case (1), the intercept for the base group is the overall intercept in the model, and the g − 1 dummy variable coefficients represent the estimated differences between a particular group and the base group.

As in the previous two-category example, including g dummy variables along with an intercept results in exact multicollinearity.



Interaction terms

A further powerful application of dummy variables is to interact them with explanatory variables to allow for a difference in slopes.

We construct a new variable by multiplying the dummy with another independent variable:

interact_i = index_i · EPS_i

For example, we want to test whether an increase in earnings per share affects S&P500 firms more than non-S&P500 firms.

In addition, we could still allow for a constant (intercept) differential between the two groups:

stock price = β0 + β1 · index + β2 · EPS + β3 · interact + ε


With this specification we can allow for

(1) different intercepts across the 2 groups

β1: measures the difference in intercepts between index and non-index firms

(2) different slopes across the 2 groups

β3: measures the difference in the strength of the impact of EPS (the EPS slope is β2 for non-index firms and β2 + β3 for index firms)
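A small simulation shows how the interaction term separates the two slopes. The Python/NumPy sketch below is illustrative only; the data and the true values (intercept gap 5, slope gap 3) are hypothetical. It builds interact = index · EPS and recovers group-specific intercepts (β0, β0 + β1) and slopes (β2, β2 + β3).

```python
import numpy as np

# Hypothetical simulated data: S&P500 members have a higher intercept (+5)
# and a steeper EPS slope (+3) than non-members
rng = np.random.default_rng(2)
n = 300
eps = rng.uniform(0, 6, n)
index = (rng.uniform(size=n) < 0.3).astype(float)
interact = index * eps                       # interact_i = index_i * EPS_i
price = 20 + 5 * index + 7 * eps + 3 * interact + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), index, eps, interact])
(b0, b1, b2, b3), *_ = np.linalg.lstsq(X, price, rcond=None)

print(f"intercept non-members: {b0:.2f}, members: {b0 + b1:.2f}")
print(f"EPS slope non-members: {b2:.2f}, members: {b2 + b3:.2f}")
```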

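E.g., if β1 > 0 and β3 > 0 we get:

[Figure: stock price plotted against EPS. The line for S&P500 firms (intercept β0 + β1, slope β2 + β3) starts above and rises more steeply than the line for non-S&P500 firms (intercept β0, slope β2).]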








Literature: Wooldridge, chapter 3


Selecting Regressors

To find potentially relevant variables, economic theory should be used.

For example, when specifying expected returns we will use finance theory (CAPM, Fama/French, …).

Although it is sometimes suggested to select variables on the basis of statistical arguments (i.e., tests), these never provide certainty.

Remember, there is always the possibility of a type I or type II error.

Selecting regressors solely based on statistical test sequences is sometimes referred to as data mining.



Selection Biases

What happens …

(1) when a relevant variable is excluded from the model?

(2) when an irrelevant variable is included in the model?

(3) when the included variables are (highly) correlated?



(1) Omitting relevant variables

Consider the following two models

yi = α + β xi + γ zi + εi     (4.2)

and

yi = a + b xi + ui     (4.3)

Both models can be interpreted as describing the conditional expectations of yi given xi, zi (and maybe some additional variables).

We say model (4.3) is nested in (4.2) and implicitly assumes that zi is irrelevant (γ = 0).



It can be shown that

b = β̂ + γ̂ δ̂,

where β̂ and γ̂ are the OLS estimates of model (4.2) and δ̂ is the slope coefficient from an auxiliary regression of z on x.

From this relationship, it follows that b and β̂ are equal if one of the following conditions is fulfilled:

1.) The partial effect of z on y is zero in the sample. That is, γ̂ = 0.

2.) x and z are uncorrelated in the sample. That is, δ̂ = 0.

The second term, γ̂δ̂ (γδ in population terms), is the omitted variable bias, i.e., the bias in the OLS estimator b due to estimating an incomplete model.
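The decomposition b = β̂ + γ̂δ̂ is an exact in-sample identity when all three regressions include an intercept. The Python/NumPy sketch below (illustration only; simulated data with hypothetical coefficients) runs the full model (4.2), the short model (4.3) and the auxiliary regression of z on x, and compares both sides.

```python
import numpy as np

def ols(y, *regressors):
    """OLS coefficients of y on a constant and the given regressors."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Hypothetical simulated data: z is relevant (gamma = 3) and correlated with x
rng = np.random.default_rng(3)
n = 500
x = rng.normal(size=n)
z = 0.5 * x + rng.normal(size=n)
y = 1.0 + 2.0 * x + 3.0 * z + rng.normal(size=n)

_, beta_hat, gamma_hat = ols(y, x, z)   # full model (4.2)
_, b = ols(y, x)                        # short model (4.3)
_, delta_hat = ols(z, x)                # auxiliary regression of z on x

print(f"b = {b:.4f}, beta_hat + gamma_hat * delta_hat = {beta_hat + gamma_hat * delta_hat:.4f}")
print(f"omitted variable bias in this sample: {gamma_hat * delta_hat:.4f}")
```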


Omitted variable bias

There will be no omitted variable bias (i.e., the estimator b is unbiased)

in two cases:

(1) If γ = 0, which implies that the omitted variable (zi) has no influence

on y and thus the two models are identical.

(2) If Cov(x, z) = 0, or equivalently E[x · z] = E[x] · E[z].

This implies that x and z are uncorrelated.

In this case, x and z are said to be orthogonal.


Example:

Recall the two model specifications from section 4.1:

(1) stock price = b0 + b1 · index + b2 · EPS + b3 · interact + ε
                       (15.09)      (15.46)     (5.20)

and

(2) stock price = b0 + b2 · EPS + b3 · interact + ε
                       (15.45)     (13.67)

Assume (1) is the true model, but we have estimated (2). Then the omitted variable bias in the coefficient of “interact” is:

OVB = 13.67 − 5.20 = 8.47

Omitting “index” overestimates the effect of “interact”.


(2) Including irrelevant variables

Consider again

yi = α + β xi + γ zi + εi     (4.2)

and

yi = a + b xi + ui     (4.3)

But now assume that we estimate (4.2) while in fact model (4.3) is appropriate, i.e., we include an irrelevant variable (zi).

In this case, the estimator for β is unbiased (since γ is zero).

But usually the estimated β will have a higher variance (and thus may be insignificant although xi is a relevant variable).


(3) Inclusion of highly correlated variables (Multicollinearity)

In general you can include variables in your model that are correlated.

In our stock price/EPS example we may want to include both an S&P500 dummy and the size variable.

However, if the correlation between two variables is too high, this may lead to problems.

→ Technically, the matrix X′X is close to being not invertible (singular)

→ It may be hard for the model to identify the individual impact of one variable

In the extreme case, one explanatory variable is an exact linear combination of one or more other explanatory variables (including the intercept). Then X′X cannot be inverted and the estimation procedure will break down.

→ This is referred to as exact multicollinearity



Consider the following example:

Assume that all variables are demeaned, i.e., ȳ = x̄1 = x̄2 = 0, and

yi = β1 x1i + β2 x2i + εi

Moreover, assume that the sample variances of x1 and x2 are equal to 1, while the sample covariance is r12.

Then, the variance of the OLS estimator can be written as

V{β̂} = (σ²/N) · [1 r12; r12 1]⁻¹ = σ² / (N (1 − r12²)) · [1 −r12; −r12 1]

→ the variances of β̂1 and β̂2 increase for larger |r12|, and therefore, the t-statistics will be lower.

→ if r12 is positive, β̂1 and β̂2 will be negatively correlated.
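The formula can be evaluated for a few correlation levels. This Python/NumPy sketch (illustration only; σ² = 1 and N = 100 are arbitrary assumptions) inverts X′X = N · [1 r12; r12 1] numerically, compares Var(β̂1) with the closed form σ²/(N(1 − r12²)), and reports the implied correlation of β̂1 and β̂2, which equals −r12.

```python
import numpy as np

def ols_cov_two_regressors(sigma2, n, r12):
    """sigma^2 (X'X)^{-1} for demeaned x1, x2 with unit sample variances and
    sample covariance r12, so that X'X = N [[1, r12], [r12, 1]]."""
    XtX = n * np.array([[1.0, r12], [r12, 1.0]])
    return sigma2 * np.linalg.inv(XtX)

sigma2, n = 1.0, 100
for r12 in (0.0, 0.5, 0.9, 0.99):
    V = ols_cov_two_regressors(sigma2, n, r12)
    closed_form = sigma2 / (n * (1 - r12 ** 2))
    corr_b1_b2 = V[0, 1] / np.sqrt(V[0, 0] * V[1, 1])
    print(f"r12 = {r12:4.2f}: Var(b1) = {V[0, 0]:.4f} "
          f"(closed form {closed_form:.4f}), corr(b1, b2) = {corr_b1_b2:+.2f}")
```

Moving from r12 = 0 to r12 = 0.99 multiplies the variance of each estimator by 1/(1 − 0.99²), roughly 50.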



Coefficient becomes smaller and insignificant when variable RevT,t is included.

Source: Gilbert (2010), Information Aggregation Around Macroeconomic Announcements: Revisions Matter


Trade-off

→ Including as many variables as possible in a model is not a good strategy, since it may produce insignificant estimates for the relevant variables.

→ Including too few variables is not a good strategy either, since it may produce biased estimates.









Literature: Wooldridge, Chapter 6 and 9



Many economic concepts are difficult to measure. Therefore, we need proxies (= approximations) in order to work with them.

Example 1: “Risk” as a regressor

Risk is an abstract concept that needs a proper definition.

If a relevant variable is omitted, there could be an omitted variable bias.

Therefore, one needs proxies for variables that cannot be measured directly.

Idea: Include a measurable variable to capture the effect of risk.



Suppose we have

y = β0 + β1 x1 + β2 x2 + ε

where x2 is the unknown variable that measures risk.

Introduce a proxy x3 that correlates with x2. This is captured in the regression

x2 = δ0 + δ1 x3 + ν

where ν is an error term. If δ1 is zero, then x3 is not a suitable proxy. Otherwise, use x3 in the regression as if it were the unknown variable x2.

Common proxies for risk are the CAPM beta, the historic standard deviation, or Value at Risk.



Example 2: “Macroeconomic conditions”

There is no perfect definition for the state of the economy (recession or expansion). Still, the state is an important determinant in many models.

The most common proxies are

Macro indicators (CFNAI, NBER, XRIC)

Gross domestic product

Term or default spread

Industrial production

Different proxies might lead to different results. If a certain definition is used, one should do a robustness check with another proxy to rule out that the effect is driven by the choice of proxy.



Source: Bestelmeyer/Hess (2011): Stock Price Responses to Unemployment News: State Dependence and the Effect of Cyclicality

Estimated coefficients are robust when using three alternative recession measures (all three only observable ex post): NBER, CFNAI, XRIC.











5. The Data Set

5.1 Data quality

5.2 Summary measures

5.3 Identifying outliers

5.4 Plotting variables

Literature: Olsen (2003): “Data Quality”, Chapter 2.1, 2.3


5.1 Data quality

Financial data is available from multiple sources:

Bloomberg, Thomson Reuters, Yahoo Finance

Center for Research in Security Prices (CRSP)

Standard & Poor's Compustat (Compustat)

Statistisches Bundesamt (Germany), Federal Reserve (USA)

Surveys, controlled experiments, own data collection, etc.



The outcome of an empirical analysis often depends on data quality.

Data quality has several dimensions, among them:

1. Accuracy: Is the given information precise? Does it contain errors? Are the measurements correct?

2. Relevance: Does the data describe the given problem? Does it represent the relevant population?

3. Understanding: Are all variables understood well?

4. Reliability: Is the source of the data reliable? Was the data collected correctly?









Literature: Albright/Winston/Zappe (2003): “Data Analysis”, Chapter 2

Kohler/Kreuter (2006), “Datenanalyse mit Stata”, Chapter 7



Frequency Tables

A frequency table lists the number of observations of some variable that fall in

various categories. In Stata:

tabulate [varname]

In our example of the S&P500 indicator (stock_eps_1990.dta) we see

- 91.71% of the data describes non S&P500 stocks

- all 500 S&P stocks are included (good to know!)



Summarize

While tables are helpful for analyzing discrete variables (e.g., dummies), continuous variables have too many values to be presented in a table. In this case other descriptive statistics may be helpful, e.g., “summarize” or “codebook”, which provide important statistics such as

min

max

range = max − min

mean x̄ = (1/N) · Σi=1..N xi

median (the middle value of the sorted observations, i.e., x((N+1)/2) for odd N)

standard deviation σ̂ = √( 1/(N − 1) · Σi=1..N (xi − x̄)² )
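As a cross-check of these definitions, the following Python sketch (standard library only; an illustration, not Stata's implementation) computes the same measures for the EPS values of the ten firms from the introductory example in section 1.1. Like Stata's “summarize”, statistics.stdev divides by N − 1.

```python
import statistics

# EPS of the N = 10 firms from the introductory example in section 1.1
eps = [5, 2, 1, 4, 4.5, 2, 4, 3, 3.5, 4]

summary = {
    "N": len(eps),
    "min": min(eps),
    "max": max(eps),
    "range": max(eps) - min(eps),
    "mean": statistics.mean(eps),
    "median": statistics.median(eps),   # average of the two middle values for even N
    "sd": statistics.stdev(eps),        # divides by N - 1
}
for name, value in summary.items():
    print(f"{name:>6}: {value}")
```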



Summary statistics

In Stata, summary statistics are obtained by

summarize [varname]

Adding the option “, d” (short for “, detail”) gives information about percentiles.













Outliers

Outliers are observations that differ strongly from the other observations.

From a practical perspective, outlying observations can occur for two reasons:

errors in the data (typos, wrong format, …)

correct but strongly heterogeneous data (e.g., a small sample in which some members are quite “different” in relevant aspects)

Dropping or keeping outliers may change estimation results substantially.

But the decision to keep or drop such observations is difficult, and the statistical properties of the resulting estimators are complicated.

Detecting outliers:

Max and min give a hint on the existence of outliers.

Graphical inspection gives more hints on the existence of outliers.



Outliers

In small samples, outliers might lead to misleading results, i.e., more or less strongly biased coefficient estimates which may be insignificant.

In large samples, their marginal influence on the estimates is smaller and the application of regression analysis is less problematic.



Example:

When comparing PE-ratios of German automobile manufacturers in November 2008, Volkswagen AG is an outlier due to its unusually high stock price (maximum was €1005.01 on 28.10.2008).

In a regression of PE-ratio on payout, risk and growth, the coefficients

would be strongly biased.

But it’s not a typo!

So, should we keep or drop VW?













Graphical inspection tools

A “look at the data” can also be very helpful, for example with histograms or scatter plots.

Histograms

A histogram is a bar chart of the frequencies of the categories and comes in handy when dealing with continuous variables. The size of the categories (or intervals) has to be selected by hand.

It approximates the empirical density function and gives an idea of the range, distribution, mean and variance of the data.



In Stata a histogram is obtained by

hist [varname], bin(n)

where n chooses the number of categories (bins).

In Stata conditions (if) can be added to almost every command. For example,

we can reduce the sample to S&P500 firms in the following way:

hist eps if SP500ind ==1, bin(50)



Example

On the right we have the histogram of earnings per share in 1990 for our S&P500 firms.

The mean is above zero, meaning that the average S&P500 firm has positive earnings. Most of the data is spread around the mean. The observations on the far left and right indicate outliers (−3, 6, 9).



Scatter Plots

A useful way to picture the relationship between two variables is to plot a point for each observation, where the coordinates of the point represent the values of the two variables. By examining the points of this plot, called a scatter plot, we can usually see whether there is any relationship between the two variables, and if so, what type of relationship it is.

[Figure: two example scatter plots, panels “positive linear relation” and “no relation”]



Example

We examine the relation between stock prices and EPS for S&P500 firms but exclude firms with an EPS of 5 or higher to obtain a better plot. The Stata command is

sc [yvar] [xvar]

By adding

if eps < 5 & SP500ind == 1

to the command, we restrict the plot to S&P500 firms and exclude those observations with an EPS of 5 or higher.

We find a positive linear relation between the variables.


5. Chapter Summary

Data quality has several dimensions, among them

accuracy, relevance, understanding, reliability

Outliers are observations that strongly differ from the other observations.

Their existence influences regression estimates.

Always start with a visual inspection of your data

The distribution can be inspected with tables and histograms.

The relation between two variables can be displayed using a scatter plot.

Summary measures give numeric information about the variables.











6. Application: CAPM


6.2 CAPM tests & extensions

6.3 Empirical exercise

Literature: Copeland/Weston/Shastri (2005), chapter 6.

Ross/Westerfield/Jaffe (2005), chapter 9 and 10.


Security market line (SML)

The SML describes the expected return of (inefficient) individual assets as a function of their relevant risk (= β risk).

β measures the systematic risk of an asset, i.e., the risk that cannot be eliminated through diversification.

This non-diversifiable risk (β risk) determines the expected return on assets.

In equilibrium all assets yield expected excess returns proportional to their non-diversifiable risk, i.e., expected returns must lie on the SML:

E[ri] = rf + βi · (E[rM] − rf),   with βi = cov(ri, rM) / Var(rM)

[Figure: SML in the (βi, E[ri]) plane, starting at rf for β = 0 and passing through the market portfolio M at βM = 1, E[rM]]
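Both formulas translate directly into code. In this plain-Python sketch (illustration only; the return series and the values rf = 2% and E[rM] = 8% are hypothetical) beta is the sample covariance with the market divided by the market's sample variance, and the SML maps beta into an expected return.

```python
def beta(asset_returns, market_returns):
    """beta_i = cov(r_i, r_M) / Var(r_M), using sample moments."""
    n = len(market_returns)
    mean_a = sum(asset_returns) / n
    mean_m = sum(market_returns) / n
    cov = sum((a - mean_a) * (m - mean_m)
              for a, m in zip(asset_returns, market_returns)) / (n - 1)
    var_m = sum((m - mean_m) ** 2 for m in market_returns) / (n - 1)
    return cov / var_m

def sml_expected_return(beta_i, rf, expected_market_return):
    """Security market line: E[r_i] = r_f + beta_i * (E[r_M] - r_f)."""
    return rf + beta_i * (expected_market_return - rf)

# Hypothetical monthly returns (illustration only)
market = [0.02, -0.01, 0.03, 0.01, -0.02, 0.04]
asset = [0.03, -0.02, 0.05, 0.01, -0.04, 0.06]

b = beta(asset, market)
print(f"beta = {b:.3f}")
print(f"E[r_i] on the SML (rf = 2%, E[r_M] = 8%): {sml_expected_return(b, 0.02, 0.08):.4f}")
```

By construction the market itself has beta(market, market) = 1 and therefore lies at E[rM] on the SML.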


Testing the CAPM?

The CAPM (in particular, the SML) predicts that expected returns on individual assets are driven by their expected relevant risk.

In empirical research, we can observe only ex-post realized returns of different assets over a number of periods.

However, the SML is an ex-ante equality in terms of unobserved expectations:

E[ri] − rf = E[βi] · (E[rM] − rf)

This implies that the cross-section of expected returns is explained by the expected betas, i.e., a cross-sectional relation between yi = E[ri] − rf and xi = E[βi].

For a valid test of the CAPM we would need expectations data, i.e., investors’ return expectations for different stocks but also their expected betas (as beta is presumably non-constant over time).








Literature: Haugen (2001), chapter 9


Fama/MacBeth (1973)

One of the first studies to provide an (approximate) test of the CAPM.

Main problem: no historical expectations data available.

For example, we simply don’t know what market participants expected in 1965 about the return and risk of IBM over the next year.

But what could you do if you need to estimate an asset’s β for the year ahead?



For example, the betas you can obtain from Bloomberg, Reuters, … are simply estimated from the return data of the last (n) year(s).

E.g., IBM’s β for 2012 can be estimated using returns during 2011:

β̂IBM,2012 = cov(returns of IBM during 2011, market returns during 2011) / Var(market returns during 2011)

Can you get a better prediction? Not very likely!

So why not estimate β for 1965 from returns during 1964?


Fama/MacBeth use a “rolling window” technique to obtain proxies for investors’ β expectations given the information available at a point in time:

[Figure: time line with t = 0, 5, 10. The first 5 years of data (t = 0 to t = 5) are used to estimate β̂i,1–5, which serves as the expectation Ê5[βi,6–10] = β̂i,1–5; the next 5 years of data (t = 5 to t = 10) are used to estimate β̂i,6–10, which serves as Ê10[βi,11–15] = β̂i,6–10.]

In addition, to reduce noise in their β predictions, Fama/MacBeth estimate the β’s of portfolios (instead of individual assets).
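The rolling-window idea can be sketched in a few lines of Python/NumPy. This is a simplified illustration, not Fama/MacBeth's exact procedure (which also uses separate portfolio-formation periods); the helper window_betas, the 60-month window and the simulated returns with a true beta of 1.2 are assumptions. Each window's estimate serves as the β prediction for the following window.

```python
import numpy as np

def window_betas(asset, market, window):
    """OLS beta of `asset` on `market` in consecutive, non-overlapping windows
    of length `window` (e.g. 60 months = 5 years)."""
    betas = []
    for start in range(0, len(market) - window + 1, window):
        m = market[start:start + window]
        a = asset[start:start + window]
        betas.append(np.cov(a, m, ddof=1)[0, 1] / np.var(m, ddof=1))
    return np.array(betas)

# Hypothetical data: 15 years of monthly returns with a true beta of 1.2
rng = np.random.default_rng(4)
market = rng.normal(0.008, 0.045, 180)
asset = 0.002 + 1.2 * market + rng.normal(0, 0.03, 180)

betas = window_betas(asset, market, window=60)
# The beta estimated over years 1-5 is the prediction for years 6-10, etc.
for k in range(len(betas) - 1):
    print(f"window {k + 1}: beta_hat = {betas[k]:.2f} -> predicted beta for window {k + 2}")
```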


Fama/MacBeth then regress actually observed portfolio excess returns over the next 5 years on predicted betas from the past 5 years:

ri,t − rf,t = a0 + a1 β̂i,t + ηi,t,   i.e., yi,t = ri,t − rf,t and xi,t = β̂i,t

[Figure: scatter plot of the excess return of portfolio i at time t against the predicted beta of portfolio i at time t]
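The second pass is usually run period by period: one cross-sectional regression of excess returns on predicted betas per period t, followed by averaging the coefficients over time. The Python/NumPy sketch below illustrates this averaging step under simplifying assumptions (hypothetical helper fama_macbeth, betas held fixed, simulated data with a market premium of 0.6% per month); the t-statistic is the time-series mean divided by its standard error.

```python
import numpy as np

def fama_macbeth(excess_returns, betas):
    """excess_returns, betas: arrays of shape (T, n_portfolios).
    One cross-sectional OLS per period t of r_it - r_ft on beta_hat_it;
    returns the time-series means of (a0, a1) and their t-statistics."""
    T = excess_returns.shape[0]
    coefs = []
    for t in range(T):
        X = np.column_stack([np.ones(betas.shape[1]), betas[t]])
        c, *_ = np.linalg.lstsq(X, excess_returns[t], rcond=None)
        coefs.append(c)
    coefs = np.array(coefs)                          # shape (T, 2)
    mean = coefs.mean(axis=0)
    t_stat = mean / (coefs.std(axis=0, ddof=1) / np.sqrt(T))
    return mean, t_stat

# Hypothetical data: 20 portfolios, 60 months
rng = np.random.default_rng(5)
T, n = 60, 20
betas = np.tile(np.linspace(0.5, 1.5, n), (T, 1))
premium = 0.006 + rng.normal(0, 0.04, (T, 1))       # realized market excess return
excess = betas * premium + rng.normal(0, 0.02, (T, n))

(a0, a1), (t0, t1) = fama_macbeth(excess, betas)
print(f"a0 = {a0:.4f} (t = {t0:.2f}), a1 = {a1:.4f} (t = {t1:.2f})")
```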


To determine whether the security market line exhibits any evidence of nonlinearity, Fama/MacBeth add an additional term to their equation:

ri,t − rf,t = a0 + a1 β̂i,t + a2 β̂²i,t + ηi,t

According to the CAPM

→ The intercept a0 of this excess-return regression should be zero (in Black’s zero-beta version it may be positive), i.e., in terms of raw returns the intercept should be equal to or greater than the risk-free rate in the bond market.

→ The security market line should be linear, so the mean value of the coefficient a2 should not be significantly different from zero.

→ The slope coefficient a1 should be equal to the average market risk premium, i.e., E[rM] − rf, which is equal to the slope of the SML.


The CAPM also predicts that beta is the only determinant of expected security returns.

Residual variance (variance not explained by beta) is supposedly unimportant because it can be diversified away.

Fama/MacBeth test this prediction by including a residual variance term in the relationship (RVi,t = remaining part of the variance of portfolio i not explained by beta):

ri,t − rf,t = a0 + a1 β̂i,t + a2 β̂²i,t + a3 RVi,t + ηi,t

According to the CAPM, this residual variance should not affect the expected rate of return of a portfolio!

→ a3 should not be significantly different from zero.


The central results of the Fama/MacBeth test are as follows:

ri,t − rf,t = a0 + a1 β̂i,t + ηi,t:   a0 = 0.0061*, a1 = 0.0085*

ri,t − rf,t = a0 + a1 β̂i,t + a2 β̂²i,t + ηi,t:   a0 = 0.0049*, a1 = 0.0105*, a2 = −0.008

ri,t − rf,t = a0 + a1 β̂i,t + a2 β̂²i,t + a3 RVi,t + ηi,t:   a0 = 0.0020, a1 = 0.0114*, a2 = −0.0026, a3 = 0.0516

* indicates significance at the 10% level


Interpretation:

Overall, the Fama/MacBeth results are consistent with the predictions of the CAPM:

Portfolios with greater than average beta factors tend to produce greater than average rates of return in subsequent periods.

Little or no evidence of nonlinearity in the relationship between beta and return.

The residual variance of the stocks in the portfolio does not forecast future returns.



More recent tests of the CAPM

Fama/French (1992 JF, 1993 JFE) extend the Fama/MacBeth analysis by including two additional “ad-hoc” factors:

Size (Market Equity Value = Stock Price ∙ Shares outstanding)

Small firms tend to have lower (and more volatile) profits.

Stock portfolios sorted on size tend to have different risk premia (even after controlling for their different betas). Hence “size” is interpreted as a risk factor on its own, i.e., one not associated with market risk.

BE/ME (Book Equity Value / Market Equity Value)

Firms with high BE/ME (low stock prices relative to book values) tend to have persistently low earnings on assets.

Similarly, stock portfolios sorted on BE/ME have different risk premia (after controlling for beta and other effects), and thus “BE/ME” is also interpreted as a separate risk factor.



Fama/French (1993 JFE) use a “time-series approach” in addition to the Fama/MacBeth approach.

This time-series approach is very interesting, as it is widely used to measure “risk-adjusted” returns (in academia as well as in practice).

To approximate the risk premia associated with particular risk factors, we need to construct “mimicking portfolios”, e.g., for the Fama/French factors:

SMB = Return on a portfolio of “small” stocks − Return on a portfolio of “large” stocks

HML = Return on a portfolio with “high” BE/ME − Return on a portfolio with “low” BE/ME

SMB is the return on a portfolio which requires zero investment (buy small stocks, sell short large stocks) but is particularly strongly exposed to the “size risk factor”.

Similarly, HML is the return on a zero-investment portfolio that is strongly exposed to “cheapness” (long high BE/ME stocks, short low BE/ME stocks).




Excursus: Construction of Mimicking Portfolios

(1) Sort assets according to a certain characteristic

(e.g., according to their “size”, i.e., price x shares outstanding)

(2) Form portfolios

(e.g. quintile portfolios: Q1 = smallest 20% firms, …, Q5: largest 20%)

(3) Compute the returns on these portfolios

(e.g., rQ1, rQ2 , rQ3 , rQ4 , rQ5)

(4) Construct a portfolio that requires zero investment

(e.g., buy portfolio Q5 and raise the necessary funds by selling short Q1)

(5) Compute the return on this zero-investment portfolio

(e.g., rzero-investm. portf. = rQ5 – rQ1)

Due to its interpretation (return earned for bearing a particular risk)

the mimicking portfolio approach is widely used in empirical finance.
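Steps (1) to (5) can be sketched as follows in Python/NumPy. This is a simplified illustration with equally weighted quintiles and a hypothetical helper zero_investment_return; the published Fama/French factors are built differently (value-weighted 2×3 sorts), and for SMB one would take small minus big, i.e., the negative of rQ5 − rQ1 when sorting on size.

```python
import numpy as np

def zero_investment_return(characteristic, next_returns, n_groups=5):
    """Sort assets on a characteristic, form n_groups equally weighted
    portfolios (Q1 = lowest, ..., Q5 = highest) and return r_Q5 - r_Q1."""
    order = np.argsort(characteristic)                         # (1) sort
    groups = np.array_split(order, n_groups)                   # (2) form portfolios
    group_returns = [next_returns[g].mean() for g in groups]   # (3) portfolio returns
    spread = group_returns[-1] - group_returns[0]              # (4)+(5) long Q5, short Q1
    return spread, group_returns

# Hypothetical cross-section of 100 stocks: size (market equity) and next-period returns
rng = np.random.default_rng(6)
size = rng.lognormal(mean=6, sigma=1.5, size=100)
returns = rng.normal(0.01, 0.05, size=100)

spread, q = zero_investment_return(size, returns)
print("quintile returns:", np.round(q, 4))
print(f"r_Q5 - r_Q1 = {spread:.4f}")
```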


Fama/French (1992 JF, 1993 JFE) find that

SMB (or size) and

HML (or BE/ME)

are strongly related to cross-sectional and time-series variation in excess returns.

Carhart (1997, JF) contributes a fourth factor (based on the Jegadeesh/Titman (1993) one-year momentum anomaly):

MOM (i.e., momentum)

portfolio with a high exposure to stocks which outperformed the market during the last year (i.e., winners), but which requires zero investment (i.e., long winners, short losers)



With the additional factors introduced above we obtain the four-factor (time-series) model:

ri,t − rf,t = αi + β1 (rM,t − rf,t) + β2 SMBt + β3 HMLt + β4 MOMt + εi,t

Interpretation:

β1 exposure to market risk >1 → overweight in market risk (CAPM beta)

β2 exposure to size risk >0 → overweight in small stocks

β3 exposure to cheapness >0 → overweight in high BE/ME stocks

β4 exposure to momentum >0 → overweight in momentum stocks (i.e., with very high returns last year)

α a portfolio’s “alpha” >0 → excess return after controlling for the risk premiums of the 4 factor exposures
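The four-factor model is a single time-series OLS regression per portfolio, which is what “reg exret_MC1 beta smb hml mom” does in the exercise below. The Python/NumPy sketch mirrors it on simulated data (illustration only; factor distributions, the 120-month sample and the true exposures are hypothetical) and recovers alpha and the four betas.

```python
import numpy as np

# Hypothetical monthly factor data (120 months) and a portfolio with known exposures
rng = np.random.default_rng(7)
T = 120
mkt_rf = rng.normal(0.006, 0.045, T)     # r_M - r_f
smb = rng.normal(0.002, 0.03, T)
hml = rng.normal(0.003, 0.03, T)
mom = rng.normal(0.007, 0.04, T)
true = {"alpha": 0.001, "b_mkt": 1.1, "b_smb": 0.5, "b_hml": -0.2, "b_mom": 0.1}
excess = (true["alpha"] + true["b_mkt"] * mkt_rf + true["b_smb"] * smb
          + true["b_hml"] * hml + true["b_mom"] * mom + rng.normal(0, 0.01, T))

# Time-series regression of the portfolio excess return on the four factors
X = np.column_stack([np.ones(T), mkt_rf, smb, hml, mom])
coef, *_ = np.linalg.lstsq(X, excess, rcond=None)
for name, c in zip(["alpha", "b_mkt", "b_smb", "b_hml", "b_mom"], coef):
    print(f"{name:>6}: {c:+.3f}")
```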


Summary

The insight provided by the CAPM is a major step in understanding how

securities are priced in the market place.

Empirical evidence on the “pure” CAPM is “mixed”.

But empirical results have to be interpreted with care (in particular, due to the

fact that the studies use not “expected” returns, but only “realized” returns)

Empirical evidence on the “extended” CAPM (4-Factor version) is much

better, but the existence of additional factors is evidence against the CAPM.

Nevertheless, the CAPM (and the 4-Factor version) is widely applied in the

securities industry, e.g., to measure risk-adjusted returns of portfolios (or the

“performance” of a fund manager which is not due to taking higher risks).




(1) Open the file with monthly size-sorted portfolio returns and the 4 risk factors:

cd {your data directory name}

use monthly_portfolio_returns.dta, clear



(2) explore the data set and its variables



(3) Compute excess returns

gen exret_vw = vwretd - rf

gen exret_sp = sprtrn - rf

gen exret_MC1 = pfret_MC1 - rf

gen exret_MC2 = pfret_MC2 - rf

...

(4) Have a look at your data, e.g.

tabstat exret* , stat( N mean p50 min max)



(5) Regress your excess returns on the 4 factors (beta, smb, hml, mom)

A. Excess return on a value weighted market portfolio:

reg exret_vw beta smb hml mom



B. Excess return on S&P-500 index:

reg exret_sp beta smb hml mom

Interpretation of the result?



C. Compare the regression results for the size-sorted portfolios:

eststo _reg1: reg exret_MC1 beta smb hml mom





esttab _reg*, r2

Note: If the command eststo does not work, you need to install the additional user-written package “estout” with the following command: ssc install estout

The package estout is a useful tool for displaying and comparing regression

results.



Interpretation of the results?



For comparison: estimation results for Fama/French 5 industry portfolios











7. Goodness-of-fit

7.1 The determination coefficient

7.2 AIC and BIC

Literature: Verbeek (2004), chapter 2.4

Wooldridge (2000), chapter 2


Basic Question

How well do the explanatory variables X explain the dependent variable y?

To answer this question, we divide the total variation of y

SST = Σi=1..N (yi − ȳ)²   (Total sum of squares)

into two parts:

SSE = Σi=1..N (ŷi − ȳ)²   (Explained sum of squares)

SSR = Σi=1..N (yi − ŷi)²   (Residual sum of squares)

If the model contains an intercept, SST = SSE + SSR.


The R² then tells us which fraction of the total variation of y is explained by x:

R² = SSE / SST = Σi (ŷi − ȳ)² / Σi (yi − ȳ)² = 1 − SSR / SST   (the last equality requires an intercept)

Properties:

R² = corr²(y, ŷ), i.e., R² = square of the correlation of y and the regression line (fitted values)

0 ≤ R² ≤ 1 if the model has an intercept

Adding variables never decreases the R² (it usually increases)


Adjusted R2

Since the R² increases with the number of regressors, it is not a good criterion to discriminate between regression models of different size.

A modified measure is needed that corrects for the inclusion of (too) many explanatory variables:

adj. R² = 1 − [ 1/(N − k − 1) · Σi ûi² ] / [ 1/(N − 1) · Σi (yi − ȳ)² ] = 1 − (N − 1)/(N − k − 1) · SSR/SST

where k is the number of regressors (excluding the intercept).

As long as the model includes at least one regressor (and R² < 1), it holds that adj. R² < R².

(N − 1)/(N − k − 1) > 1 can be viewed as a penalty for adding additional regressors.
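Both measures follow directly from SSR and SST. The Python/NumPy sketch below (illustration only; the helper r2_and_adjusted is hypothetical) fits the stock price/EPS regression of the introductory example in section 1.1, checks that R² equals the squared correlation of y and ŷ, and applies the penalty formula to the P/E example of this section. For that example it assumes N = 6 observations, the sample size implied by the reported R² and adj. R² values.

```python
import numpy as np

def r2_and_adjusted(y, y_hat, k):
    """R^2 = 1 - SSR/SST and adj. R^2 = 1 - (N-1)/(N-k-1) * SSR/SST,
    with k = number of regressors excluding the intercept."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    n = len(y)
    sst = np.sum((y - y.mean()) ** 2)
    ssr = np.sum((y - y_hat) ** 2)
    return 1 - ssr / sst, 1 - (n - 1) / (n - k - 1) * ssr / sst

# Stock price and EPS of the 10 firms from the introductory example (section 1.1)
eps = np.array([5, 2, 1, 4, 4.5, 2, 4, 3, 3.5, 4])
price = np.array([60, 30, 25, 55, 43, 39, 62, 44, 46, 52], dtype=float)
X = np.column_stack([np.ones(len(eps)), eps])
coef, *_ = np.linalg.lstsq(X, price, rcond=None)
fitted = X @ coef

r2, adj_r2 = r2_and_adjusted(price, fitted, k=1)
print(f"R^2 = {r2:.4f}, adj. R^2 = {adj_r2:.4f}")
# With an intercept, R^2 equals the squared correlation of y and y_hat
print(np.isclose(r2, np.corrcoef(price, fitted)[0, 1] ** 2))

# P/E example (assumed N = 6, k = 1): close to the reported adj. R^2 of 59.77%
print(f"{1 - (6 - 1) / (6 - 1 - 1) * (1 - 0.6782):.4f}")
```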



Example

Recall application 1 where we have estimated multiples for corporate

valuation.

For the univariate model

P/E Ratio = β0 + β1 · (expected) growth + ε

the following regression output is obtained:

If we include the risk variable in our equation, i.e.

P/E Ratio = β0 + β1 · (expected) growth + β2 · STDV + ε

→ The inclusion of risk leads to an increase in R² as well as an increase in the adj. R².

However, if we include a random variable (“random”) instead of risk that should not be systematically related to the P/E ratio, i.e.

P/E Ratio = β0 + β1 · (expected) growth + β2 · random + ε

→ although the random variable is unrelated to y (and thus cannot have any explanatory power for y), R² increases,

→ in contrast, the adj. R² decreases compared to the univariate model.






7.2 AIC and BIC

Literature: Verbeek (2004), chapter 3



Information Criteria

Information criteria are similar to the adjusted R², as they provide a trade-off between goodness-of-fit and the number of parameters used (k+1).

Akaike’s Information Criterion:

AIC = log( (1/N) Σi=1..N ûi² ) + 2(k + 1)/N

Schwarz’s Bayesian Information Criterion:

BIC = log( (1/N) Σi=1..N ûi² ) + (k + 1)/N · log N

Both information criteria have to be minimized over choices of k+1.

The penalty per additional parameter is log N / N in BIC versus 2/N in AIC, so BIC penalizes additional regressors more heavily whenever N > e² ≈ 7.4. In such samples this criterion is more “conservative” (tends to favor more parsimonious models).

In Stata the AIC and BIC are called via the post-estimation command:

estat ic
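The values reported by “estat ic” are, to our understanding, on a different scale than the formulas above: Stata reports −2 lnL + 2K and −2 lnL + K · ln N with K = k + 1 and the Gaussian OLS log-likelihood. The Python/NumPy sketch below (illustration only; the residuals and helper names are hypothetical) computes both versions and shows that they differ only by a factor N and a constant, so for a given sample they rank models identically. With N = 6, log N < 2, which is why BIC comes out below AIC here.

```python
import numpy as np

def info_criteria(residuals, k):
    """AIC and BIC as defined above (k regressors plus an intercept, K = k + 1)."""
    u = np.asarray(residuals, float)
    n, K = len(u), k + 1
    log_s2 = np.log(np.sum(u ** 2) / n)
    return log_s2 + 2 * K / n, log_s2 + K * np.log(n) / n

def likelihood_based_ic(residuals, k):
    """-2 lnL + 2K and -2 lnL + K ln N using the Gaussian OLS log-likelihood."""
    u = np.asarray(residuals, float)
    n, K = len(u), k + 1
    loglik = -n / 2 * (np.log(2 * np.pi) + np.log(np.sum(u ** 2) / n) + 1)
    return -2 * loglik + 2 * K, -2 * loglik + K * np.log(n)

u = np.array([1.2, -0.7, 0.3, -1.1, 0.4, -0.1])    # hypothetical residuals, N = 6
n = len(u)
aic, bic = info_criteria(u, k=1)
aic_l, bic_l = likelihood_based_ic(u, k=1)

# Both scales are linked by: likelihood-based = N * (formula above) + N * (1 + log 2*pi)
const = n * (1 + np.log(2 * np.pi))
print(np.isclose(aic_l, n * aic + const), np.isclose(bic_l, n * bic + const))
print(f"AIC = {aic:.4f}, BIC = {bic:.4f}")
```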


Reconsider the linear relation between the PE-ratio and the growth rate and its regression output. Using “estat ic” gives us the following output:

R² = 67.82%

Adj. R² = 59.77%

AIC = 42.82

BIC = 42.40


If we add the variable risk, the goodness-of-fit measures will change.

In case of a “meaningful” additional explanatory variable we should observe

- a decrease of the AIC and BIC and

- an increase of the adjusted R²:

R² = 88.65% (+)

Adj. R² = 81.08% (+)

AIC = 38.57 (−)

BIC = 37.94 (−)


In contrast, when adding a “meaningless” variable (i.e., one having no additional explanatory power) we should observe

- an increase of the AIC and BIC and

- a decrease of the adjusted R².

For example, when adding a random variable, the AIC and BIC increase:

R² = 68.27% (+)

Adj. R² = 47.12% (−)

AIC = 44.73 (+)

BIC = 44.11 (+)


Summary

The quality of the linear approximation of an OLS model is evaluated with

goodness-of-fit measures.

There is a trade-off between goodness-of-fit and the simplicity of the model.

If you include many explanatory variables, you should always calculate a

measure that corrects for the inclusion of additional variables (such as BIC or

AIC).












8. Typical problems

8.1 Heteroskedasticity

8.2 Autocorrelation

8.3 Generalized Least Squares




Introduction

The homoskedasticity assumption states that the variance of the unobservable error term ε, conditional on the explanatory variables, is constant.

It fails whenever the variance of the unobservable error term changes across different segments of the population.

→ The error terms are then no longer independent of x (their variance depends on x)

→ The usual covariance matrix of the OLS estimator is incorrect

In this case

OLS is still unbiased and the goodness-of-fit measures hold.

But the OLS estimator may be relatively inefficient.



Example

Variation of food expenditures increases with higher income (Engel curve)

[Figure: food expenditures plotted against income; the spread of food expenditures widens as income rises]


Consequences for the OLS estimator

Consider the following simple OLS model in matrix notation: y = Xβ + ε

The second and third Gauss-Markov assumptions can be summarized as

Var(ε | X) = Var(ε) = σ² I     (8.2)

Heteroskedasticity implies that (8.2) no longer holds, i.e., error terms do not have identical variances. Instead

Var(ε | X) = σ² Ψ     (8.3)

where Ψ is a positive definite matrix that may depend upon X.

Standard t- and F-tests will no longer be valid and inferences will be misleading because they rely on (8.2).


Three ways to solve the problem

1. Reconsider your model and try to evaluate whether it might be misspecified.

2. Use the OLS estimator, but use an alternative procedure to estimate standard errors, i.e., standard errors which are “robust” against heteroskedasticity.

3. Use an alternative estimator, for example, the (F)GLS estimator.



Adjusted standard errors

White (1980) derived an alternative estimator of the variance of the OLS parameters that holds under heteroskedasticity:

V̂ar(b) = (X′X)⁻¹ X′ Diag(ûi²) X (X′X)⁻¹ = (X′X)⁻¹ ( Σi=1..N ûi² xi xi′ ) (X′X)⁻¹

with ûi the OLS residual of observation i and xi′ the i-th row of X.

Standard errors computed on the basis of this variance-covariance estimate are referred to as heteroskedasticity-consistent standard errors or White standard errors.

In Stata they are obtained by simply adding the option “robust”:

reg y x, robust
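The sandwich formula can be computed in a few lines. This Python/NumPy sketch (illustration only; the Engel-curve-like data and the helper ols_with_white_se are hypothetical) returns conventional and White (HC0) standard errors side by side. Note that, as far as we know, Stata's “robust” option additionally multiplies this estimate by the small-sample factor N/(N − K).

```python
import numpy as np

def ols_with_white_se(y, X):
    """OLS coefficients with conventional and White (HC0) standard errors.
    X must already contain a column of ones for the intercept."""
    n, K = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    u = y - X @ b
    # Conventional: s^2 (X'X)^{-1} with s^2 = SSR / (N - K)
    V_ols = (u @ u / (n - K)) * XtX_inv
    # White: (X'X)^{-1} (sum_i u_i^2 x_i x_i') (X'X)^{-1}
    meat = (X * u[:, None] ** 2).T @ X
    V_white = XtX_inv @ meat @ XtX_inv
    return b, np.sqrt(np.diag(V_ols)), np.sqrt(np.diag(V_white))

# Hypothetical Engel-curve-like data: error spread grows with income
rng = np.random.default_rng(8)
income = rng.uniform(1, 10, 400)
food = 2 + 0.5 * income + rng.normal(0, 0.2 * income)
X = np.column_stack([np.ones_like(income), income])

b, se_ols, se_white = ols_with_white_se(food, X)
print("b:", b.round(3), " OLS s.e.:", se_ols.round(4), " White s.e.:", se_white.round(4))
```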


Testing for Heteroskedasticity

Always produce a graph of your data!

This may already tell you enough (like in the previous example)!

Basically, testing for heteroskedasticity means that we try to evaluate whether

the variance is identical across residuals or not.

But what is meant by “not identical”?

Should we compare different points in time?

Should we compare different groups (e.g. men vs. women)?

…

Several tests for heteroskedasticity are available. They differ mainly in the "type of difference" they test for.



Examples:

The Goldfeld-Quandt test tests for groupwise heteroskedasticity, assuming that

$$\operatorname{Var}(\varepsilon_i) = \sigma_A^2 \quad \text{or} \quad \operatorname{Var}(\varepsilon_i) = \sigma_B^2 \quad \text{if } i \text{ belongs to group A or B, respectively.}$$

The null hypothesis states that the variance is identical for both groups:

$$H_0: \sigma_A^2 = \sigma_B^2$$

If we have to reject the null, our data are heteroskedastic.

The ARCH-LM test looks at differences across time; under the null hypothesis, the (conditional) error variance is constant over time.

Again, rejecting the null implies that our data are heteroskedastic.
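The Goldfeld-Quandt statistic can be sketched as the ratio of the residual variances from separate OLS regressions in the two groups. This is a minimal pure-Python illustration with hypothetical data and illustrative function names (`ols_resid_var`, `goldfeld_quandt`); the groups are taken as given, and the standard library has no F distribution, so the statistic has to be compared with a critical value of F(n_B − k, n_A − k) from a table.

```python
def ols_resid_var(x, y, k=2):
    """Residual variance s^2 = SSR / (n - k) and its degrees of freedom from simple OLS."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    b0 = my - b1 * mx
    ssr = sum((b - b0 - b1 * a) ** 2 for a, b in zip(x, y))
    return ssr / (n - k), n - k


def goldfeld_quandt(group_a, group_b):
    """Goldfeld-Quandt statistic F = s_B^2 / s_A^2 with (n_B - k, n_A - k) d.f.

    Each group is an (x, y) pair of lists; group B is the one suspected
    of having the larger error variance.
    """
    s2_a, df_a = ols_resid_var(*group_a)
    s2_b, df_b = ols_resid_var(*group_b)
    return s2_b / s2_a, (df_b, df_a)


# Hypothetical data: low-income group A (small scatter), high-income group B (large scatter)
group_a = ([1, 2, 3, 4, 5, 6], [2.1, 2.9, 4.2, 4.8, 6.1, 6.9])
group_b = ([7, 8, 9, 10, 11, 12], [9.0, 7.5, 11.8, 9.2, 13.9, 11.0])

f_stat, (df_num, df_den) = goldfeld_quandt(group_a, group_b)
print(f"F = {f_stat:.2f} with ({df_num}, {df_den}) degrees of freedom")
```

A value far above the critical value leads to rejecting H₀: σ_A² = σ_B².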






8.2 Autocorrelation





Assume we have a time series data set, i.e., one with a time dimension t.

E.g., we observe leverage of one firm over several years.

Autocorrelation is present when the error terms εt and εs (t ≠ s) are not independent.

Again, assumption (8.2) no longer holds.

The consequences of autocorrelation are similar to those of heteroskedasticity:

OLS remains unbiased, but the parameter estimates become inefficient and the usual standard errors are incorrect.


[Figure: y plotted against time t]


First order autocorrelation

The most popular form of autocorrelation is the first-order autoregressive process. In this case, the error term in

$$y_t = x_t'\beta + \varepsilon_t$$

is assumed to depend upon its predecessor:

$$\varepsilon_t = \rho\, \varepsilon_{t-1} + \upsilon_t$$

where υt is an error term with zero expectation and constant variance that exhibits no serial correlation, and ρ is the first-order autocorrelation coefficient.
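A quick simulation helps to see what the AR(1) error process looks like. This pure-Python sketch (function names and the seed are illustrative choices) generates εt = ρ εt−1 + υt with Gaussian υt and checks that the sample first-order autocorrelation of a long series is close to the chosen ρ.

```python
import random


def simulate_ar1(rho, n, sigma=1.0, seed=0):
    """Simulate eps_t = rho * eps_{t-1} + v_t with v_t i.i.d. N(0, sigma^2), |rho| < 1."""
    rng = random.Random(seed)
    # Draw the first value from the stationary distribution, Var(eps) = sigma^2 / (1 - rho^2)
    eps = [rng.gauss(0.0, sigma / (1.0 - rho * rho) ** 0.5)]
    for _ in range(n - 1):
        eps.append(rho * eps[-1] + rng.gauss(0.0, sigma))
    return eps


def first_order_autocorr(e):
    """Sample first-order autocorrelation: sum_t d_t d_{t-1} / sum_t d_t^2, with d = e - mean(e)."""
    m = sum(e) / len(e)
    d = [v - m for v in e]
    num = sum(d[t] * d[t - 1] for t in range(1, len(d)))
    return num / sum(v * v for v in d)


eps = simulate_ar1(rho=0.7, n=20000, seed=1)
print(first_order_autocorr(eps))  # should be close to the true rho = 0.7
```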


Testing for First Order Autocorrelation

To test for autocorrelation of first order, we can simply regress the OLS

residual on its first lag:

reg y x

predict eps, residual

reg eps L.eps

This auxiliary regression produces an estimate for the first-order

autocorrelation coefficient along with a corresponding t-test.

Alternatively, use a Durbin-Watson test:

estat dwatson

It can be shown that DW ≈ 2 − 2ρ̂, or equivalently ρ̂ ≈ 1 − DW/2.

Hence, a DW test statistic substantially different from 2 indicates that ρ is different from zero, i.e., that first-order autocorrelation is present.
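The following pure-Python sketch mirrors the three Stata steps (regression, residuals, auxiliary regression of the residual on its first lag) and the Durbin-Watson statistic on a hypothetical simulated series with AR(1) errors (ρ = 0.4). It illustrates the approximation DW ≈ 2 − 2ρ̂; the data-generating values and function names are assumptions made for the illustration.

```python
import random


def ols_simple(x, y):
    """Intercept and slope of the OLS regression y = a + b*x + u."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b


def durbin_watson(e):
    """DW = sum_{t>=2} (e_t - e_{t-1})^2 / sum_t e_t^2."""
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    return num / sum(v * v for v in e)


# Hypothetical series: y_t = 1 + 2 x_t + eps_t with AR(1) errors, rho = 0.4
rng = random.Random(7)
T = 5000
x = [rng.uniform(0.0, 10.0) for _ in range(T)]
eps = [rng.gauss(0.0, 1.0)]
for _ in range(T - 1):
    eps.append(0.4 * eps[-1] + rng.gauss(0.0, 1.0))
y = [1.0 + 2.0 * xt + et for xt, et in zip(x, eps)]

a, b = ols_simple(x, y)                        # reg y x
e = [yt - a - b * xt for xt, yt in zip(x, y)]  # predict eps, residual
_, rho_hat = ols_simple(e[:-1], e[1:])         # reg eps L.eps (slope = rho_hat)
dw = durbin_watson(e)                          # estat dwatson
print(f"rho_hat = {rho_hat:.3f}, DW = {dw:.3f}, 1 - DW/2 = {1 - dw / 2:.3f}")
```

In long samples the two routes agree closely; in short samples the end terms of the DW sum make the approximation rougher.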



What to do when autocorrelation is found?

In many cases, finding autocorrelation is an indication that the model is misspecified.

Typically, three (interrelated) types of misspecification may lead to a finding of

autocorrelation in the OLS residuals: dynamic misspecification, omitted

variables and functional form misspecification:

→ Do not change your estimator (e.g. from OLS to GLS),

but change your model.

If changing the model does not help (or is not possible):

→ Again, stay with OLS but use a different inference,

i.e., use “robust” t-tests, …



Robust inference: Newey-West standard errors

Like in the case of heteroskedasticity (White standard errors), one can apply OLS in spite of autocorrelation but has to adjust its standard errors:

Use "heteroskedasticity and autocorrelation consistent" (HAC) standard errors, i.e., Newey-West standard errors.

In Stata you can request Newey-West standard errors with the command "newey", which requires the data to be tsset and the maximum lag order to be specified with the option lag(#):

Instead of reg Y X

use newey Y X, lag(#)

This produces OLS parameter estimates, but all inference (t-statistics, …) is

based on the HAC estimator.
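A minimal pure-Python sketch of the Newey-West covariance for a simple regression, using Bartlett weights wₗ = 1 − l/(L + 1). For illustration it regresses the textile consumption index on the price index from the example below (levels, price only). Function names are illustrative, and no small-sample degrees-of-freedom correction is applied, so the numbers may differ slightly from what a given software package reports and need not match the specification used on the slides.

```python
def matmul2(p, q):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def newey_west(x, y, lags):
    """OLS of y = b0 + b1*x + u with a Newey-West (Bartlett kernel) covariance matrix.

    S = sum_t u_t^2 z_t z_t'
        + sum_{l=1}^{L} w_l sum_{t>l} u_t u_{t-l} (z_t z_{t-l}' + z_{t-l} z_t'),
    where z_t = (1, x_t)' and w_l = 1 - l / (L + 1); V = (X'X)^{-1} S (X'X)^{-1}.
    No small-sample degrees-of-freedom correction is applied.
    """
    n = len(x)
    sx, sxx = sum(x), sum(v * v for v in x)
    sy, sxy = sum(y), sum(a * b for a, b in zip(x, y))
    det = n * sxx - sx * sx
    b0 = (sxx * sy - sx * sxy) / det
    b1 = (n * sxy - sx * sy) / det
    u = [yt - b0 - b1 * xt for xt, yt in zip(x, y)]
    z = [(1.0, xt) for xt in x]
    s = [[0.0, 0.0], [0.0, 0.0]]
    for t in range(n):  # lag-0 (heteroskedasticity) part, as in White's estimator
        for i in range(2):
            for j in range(2):
                s[i][j] += u[t] * u[t] * z[t][i] * z[t][j]
    for l in range(1, lags + 1):  # autocovariance terms with Bartlett weights
        w = 1.0 - l / (lags + 1.0)
        for t in range(l, n):
            for i in range(2):
                for j in range(2):
                    s[i][j] += w * u[t] * u[t - l] * (
                        z[t][i] * z[t - l][j] + z[t - l][i] * z[t][j])
    xtx_inv = [[sxx / det, -sx / det], [-sx / det, n / det]]
    return (b0, b1), u, matmul2(matmul2(xtx_inv, s), xtx_inv)


# Textile data from the example below (index numbers, 1925 = 100), 1923-1939
cons = [99.2, 99.0, 100.0, 111.6, 122.2, 117.6, 121.1, 136.0, 154.2,
        153.6, 158.5, 140.6, 136.2, 168.0, 154.3, 149.0, 165.5]
price = [101.0, 100.1, 100.0, 90.6, 86.5, 89.7, 90.6, 82.8, 70.1,
         65.4, 61.3, 62.5, 63.6, 52.6, 59.7, 59.5, 61.3]

(b0, b1), u, v = newey_west(price, cons, lags=2)
print(f"slope on price: {b1:.3f}, Newey-West SE (2 lags): {v[1][1] ** 0.5:.3f}")
```

With lags=0 the estimator collapses to White's heteroskedasticity-consistent covariance, which shows how HAC standard errors extend the robust ones.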



Example

Consider the following dataset containing variables on the volume of annual

textile consumption (consvoltext), real income (income) and relative price of

textiles (price). The Stata data file “autocorr_textile.dta” is available online.


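All series are index numbers with base year 1925 (= 100).

Year | Volume of textile consumption | Income | Price
-----|-------------------------------|--------|------
1923 | 99.2 | 96.7 | 101.0
1924 | 99.0 | 98.1 | 100.1
1925 | 100.0 | 100.0 | 100.0
1926 | 111.6 | 104.9 | 90.6
1927 | 122.2 | 104.9 | 86.5
1928 | 117.6 | 109.5 | 89.7
1929 | 121.1 | 110.8 | 90.6
1930 | 136.0 | 112.3 | 82.8
1931 | 154.2 | 109.3 | 70.1
1932 | 153.6 | 105.3 | 65.4
1933 | 158.5 | 101.7 | 61.3
1934 | 140.6 | 95.4 | 62.5
1935 | 136.2 | 96.4 | 63.6
1936 | 168.0 | 97.6 | 52.6
1937 | 154.3 | 102.4 | 59.7
1938 | 149.0 | 101.6 | 59.5
1939 | 165.5 | 103.8 | 61.3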


We need to tell Stata that the dataset has a time-series structure and which

variable contains the time index. This is done by:

tsset [name of time variable]

Afterwards, we can run regressions in the usual way.



Assume we first want to estimate an equation with price as the explanatory

variable:

Then a post-estimation command can be used to obtain the DW-statistic:

estat dwatson



The DW statistic of 1.2 indicates that the error terms are autocorrelated (with ρ̂ ≈ 1 − 1.2/2 = 0.4).

→ The first suggested solution was to change the model:

Economic theory suggests that income is an important variable

in a demand equation.

→ The second solution was to use Newey-West standard errors



First, try what happens if we run a second OLS regression

that includes both price and income as explanatory variables.

Now DW is quite close to 2, indicating that there is no longer any evidence of autocorrelation.



Second, assume we do not have income data.

Therefore, we use HAC standard errors:

newey [dependent variable] [list of independent variables], lag(#)

Note that the coefficients remain unchanged, but we obtain much smaller (HAC) standard errors.






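Another way to account for biased standard errors due to autocorrelation or heteroskedasticity is to derive an alternative estimator.

The idea is: if we know that our error terms have a certain structure, i.e.,

$$\operatorname{Var}(\varepsilon \mid X) = \sigma^2 \Psi \quad \text{with } \Psi \neq I,$$

so that the variance of the OLS estimator is

$$\operatorname{Var}(\hat\beta \mid X) = \sigma^2 (X'X)^{-1} X' \Psi X (X'X)^{-1},$$

we can transform the model and derive an alternative estimator:

$$\hat\beta^{*} = (X'\Psi^{-1}X)^{-1} X'\Psi^{-1} y$$

This is the Generalized Least Squares (GLS) estimator.

If we do not know Ψ (typically we have to estimate it), we can apply the "Feasible" GLS (FGLS) estimator:

$$\hat\beta^{*}_{FGLS} = (X'\hat\Psi^{-1}X)^{-1} X'\hat\Psi^{-1} y$$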
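For heteroskedasticity with a diagonal Ψ = diag(ψ₁, …, ψ_N), GLS is weighted least squares, and the "transform the model" idea means dividing every observation (including the constant) by √ψᵢ and running OLS. This pure-Python sketch computes the GLS formula directly and via the transformed regression, with hypothetical data and the assumption ψᵢ = xᵢ (error variance proportional to x). In FGLS the ψᵢ would be replaced by estimates. Function names are illustrative.

```python
def solve2(a, b):
    """Solve the 2x2 linear system a @ beta = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - b[0] * a[1][0]) / det]


def gls_direct(x, y, psi):
    """beta* = (X' Psi^-1 X)^-1 X' Psi^-1 y for X = [1, x] and diagonal Psi = diag(psi)."""
    w = [1.0 / p for p in psi]
    swx = sum(wi * xi for wi, xi in zip(w, x))
    a = [[sum(w), swx], [swx, sum(wi * xi * xi for wi, xi in zip(w, x))]]
    b = [sum(wi * yi for wi, yi in zip(w, y)),
         sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))]
    return solve2(a, b)


def gls_transformed(x, y, psi):
    """Same estimator as plain OLS (no added constant) on data divided by sqrt(psi_i)."""
    s = [p ** 0.5 for p in psi]
    c = [1.0 / si for si in s]  # transformed constant column
    xs = [xi / si for xi, si in zip(x, s)]
    ys = [yi / si for yi, si in zip(y, s)]
    scx = sum(ci * xi for ci, xi in zip(c, xs))
    a = [[sum(ci * ci for ci in c), scx], [scx, sum(xi * xi for xi in xs)]]
    b = [sum(ci * yi for ci, yi in zip(c, ys)), sum(xi * yi for xi, yi in zip(xs, ys))]
    return solve2(a, b)


# Hypothetical data; assumption: Var(eps_i) = sigma^2 * x_i, i.e. psi_i = x_i
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.8, 5.1, 6.4, 9.9, 10.2, 14.5, 13.1, 18.6]
psi = x[:]

print("GLS (direct):     ", gls_direct(x, y, psi))
print("GLS (transformed):", gls_transformed(x, y, psi))
print("OLS (psi = 1):    ", gls_direct(x, y, [1.0] * len(x)))
```

Setting all ψᵢ = 1 reproduces OLS, which makes clear that OLS is the special case Ψ = I.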


The FGLS estimator is consistent (asymptotically unbiased), but it is generally biased in finite samples!

Hence, there is a tradeoff between the unbiasedness of the OLS estimator and the higher efficiency of the FGLS estimator:

→ The OLS estimator is unbiased but inefficient (as it does not have the smallest variance).

→ The FGLS estimator is biased in small samples but (asymptotically) efficient (however, it is only efficient if the form of heteroskedasticity is correctly specified).

→ So, if you have a good idea about the form of heteroskedasticity, FGLS may provide a more efficient estimator.

If not, use OLS but apply robust inference!












9. Advanced techniques

9.1 Overview

9.2 Panel estimators

9.3 Unbalanced panels


Wooldridge (2000), chapter 13.1


9.1 Overview

Data set dimension

Typically, we have one of two types of data structures:

Cross-sectional data (e.g., earnings of N firms for 1 year)

Time-series data (e.g., earnings of 1 firm over T years)

A combination of both, i.e., repeated cross-sections, is advantageous:

This allows for a richer analysis and for more realistic models, since it is useful to know that certain observations come from a particular individual, e.g., from a "small" firm.

But we can no longer assume that all observations are independent, as some come from the same individual.



With repeated cross-sections we distinguish:

Pooled cross-sections (ignore the specific data structure)

Panel data (exploit information from repeated observations)

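Pooled cross-sections:

Year | firm | earn
-----|------|-----
1985 | 1 | 17
1990 | 1 | -2
1995 | 1 | 12
1985 | 2 | 4
1990 | 2 | 7
1995 | 2 | 3
1985 | 3 | 52
1990 | 3 | 48
1995 | 3 | 65

Panel:

year | earn1 | earn2 | earn3
-----|-------|-------|------
1985 | 17 | 4 | 52
1990 | -2 | 7 | 48
1995 | 12 | 3 | 65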



Pooled cross-sectional regressions

The simplest pooled cross-sectional regression is just a regular OLS estimation:

yi,t = β0 + xi,t β1 + εi,t

It assumes that the parameters (β0 and β1) are identical for all individuals.

But it ignores that there may be dependencies between the observations of a

given individual (e.g., rich people consume more at all points in time; small firms

have lower debt ratios at all points in time; …).

Therefore, the standard OLS assumption that the error terms εi,t are independently and identically distributed (i.i.d.) may be violated.

But then inference is wrong, i.e., estimated standard errors are misleading.

Moreover, estimated parameters can be biased.



Example (biased OLS results):

We are interested in analyzing the influence of overall economic risk on

companies’ debt ratios.

Theory suggests:

With higher earnings risk firms should select lower debt levels.

Young (and small) firms have difficulty raising debt and therefore tend to have lower debt ratios.

Knowing that small firms tend to have lower debt ratios, we expect that their

residuals will be negative (at most points in time). In contrast, large firms

will presumably have positive residuals.

→ The assumptions of i.i.d. residuals and (A.1: E(ε) = 0) are violated.



Assume we observe the following data:

dr = debt ratio of a firm in a year

gdpvola = overall economic volatility

year | firm | dr | gdpvola
-----|------|----|--------
1985 | 1 | 0.36 | 19
1990 | 1 | 0.35 | 20
1995 | 1 | 0.50 | 15
2000 | 1 | 0.48 | 12
1990 | 2 | 0.15 | 20
1995 | 2 | 0.20 | 15
2000 | 2 | 0.23 | 12
1995 | 3 | 0.08 | 15
2000 | 3 | 0.12 | 12

Obviously, this is an (unbalanced) panel:

year | dr1 | dr2 | dr3 | average_dr | gdpvola
-----|-----|-----|-----|------------|--------
1985 | 0.36 | | | 0.36 | 19
1990 | 0.35 | 0.15 | | 0.25 | 20
1995 | 0.50 | 0.20 | 0.08 | 0.26 | 15
2000 | 0.48 | 0.23 | 0.12 | 0.277 | 12

Note that for all firms the debt ratio tends to increase as gdpvola goes down.

However, firms 2 and 3 are "young" (established after 1985 and 1990, respectively).

Therefore, they have much lower debt ratios on average.



We estimate the following models (the xtreg commands require the panel structure to be declared first, e.g., with xtset firm year):

Pooled cross-sectional

regression:

reg dr gdpvola

Panel (random effects)

xtreg dr gdpvola, re

Panel (fixed effects):

xtreg dr gdpvola, fe

→ OLS produces biased results (as it ignores dependencies).

Panel estimators (fixed and random effects) produce much better results.
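The contrast between the pooled and the fixed effects slope can be reproduced by hand. This pure-Python sketch uses the nine observations from the debt-ratio table; the function names `slope` and `within_slope` are illustrative. The two numbers should correspond to the gdpvola coefficients of `reg dr gdpvola` and `xtreg dr gdpvola, fe`.

```python
# Debt-ratio example: (firm, year, dr, gdpvola), as in the table above
data = [
    (1, 1985, 0.36, 19), (1, 1990, 0.35, 20), (1, 1995, 0.50, 15), (1, 2000, 0.48, 12),
    (2, 1990, 0.15, 20), (2, 1995, 0.20, 15), (2, 2000, 0.23, 12),
    (3, 1995, 0.08, 15), (3, 2000, 0.12, 12),
]


def slope(xs, ys):
    """OLS slope of ys on xs (with an intercept)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    return num / sum((a - mx) ** 2 for a in xs)


def within_slope(rows):
    """Fixed-effects (within) slope: OLS on firm-demeaned dr and gdpvola."""
    by_firm = {}
    for firm, _, dr, vola in rows:
        by_firm.setdefault(firm, []).append((dr, vola))
    num = den = 0.0
    for obs in by_firm.values():
        dr_bar = sum(d for d, _ in obs) / len(obs)
        vola_bar = sum(v for _, v in obs) / len(obs)
        num += sum((v - vola_bar) * (d - dr_bar) for d, v in obs)
        den += sum((v - vola_bar) ** 2 for _, v in obs)
    return num / den


b_pooled = slope([r[3] for r in data], [r[2] for r in data])  # cf. reg dr gdpvola
b_fe = within_slope(data)                                      # cf. xtreg dr gdpvola, fe
print(f"pooled OLS slope: {b_pooled:.5f}")  # 0.00086
print(f"within (FE) slope: {b_fe:.5f}")     # -0.01499
```

The pooled slope is essentially zero (and even slightly positive), while the within slope is clearly negative, in line with the theory that higher volatility goes with lower debt ratios.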



For comparison we also estimate a pooled cross-sectional model

but with firm-specific dummies (_c1 = constant for firm 1, …)

reg dr _c1 _c2 _c3 gdpvola, noconst

Note that this produces the same result as the fixed effects estimator

xtreg dr gdpvola, fe

→ Basically, the fixed effects panel estimator solves the problem by including individual-specific constants to account for the individual effects.
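The equivalence between the dummy-variable regression and the fixed effects estimator can be checked numerically. This pure-Python sketch builds the regressors `_c1`, `_c2`, `_c3` and gdpvola for the debt-ratio data and solves the OLS normal equations without a common constant (the hypothetical helper `solve` is a small Gaussian elimination). The slope should equal the within slope, and each firm constant should equal ȳᵢ − β̂ x̄ᵢ.

```python
# Debt-ratio example: (firm, year, dr, gdpvola)
data = [
    (1, 1985, 0.36, 19), (1, 1990, 0.35, 20), (1, 1995, 0.50, 15), (1, 2000, 0.48, 12),
    (2, 1990, 0.15, 20), (2, 1995, 0.20, 15), (2, 2000, 0.23, 12),
    (3, 1995, 0.08, 15), (3, 2000, 0.12, 12),
]


def solve(a, b):
    """Solve a small dense system a @ z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z


# Regressors _c1, _c2, _c3 (firm dummies) and gdpvola, no common constant
X = [[1.0 if firm == j else 0.0 for j in (1, 2, 3)] + [float(vola)]
     for firm, _, _, vola in data]
Y = [dr for _, _, dr, _ in data]
k = len(X[0])
xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
xty = [sum(row[i] * yv for row, yv in zip(X, Y)) for i in range(k)]
alpha1, alpha2, alpha3, beta = solve(xtx, xty)  # cf. reg dr _c1 _c2 _c3 gdpvola, noconst
print(f"firm constants: {alpha1:.4f} {alpha2:.4f} {alpha3:.4f}, slope: {beta:.5f}")
```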





9.2 Panel estimators







In contrast to a simple pooled cross-sectional regression,

panel estimators try to capture dependencies of individuals:

yi,t = β0 + xi,t β1 + εi,t with εi,t = αi + υi,t

Panel estimators try to split εi,t into an individual-specific part αi and an idiosyncratic part υi,t that is i.i.d. across individuals and time.

We say αi accounts for "unobserved heterogeneity" among individuals, i.e., for differences across individuals for which we cannot control due to a lack of appropriate data.

For example, we might simply not have data about size, age, … of the firms in our sample.



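Some useful transformations:

If

$$y_{i,t} = \beta_0 + x_{i,t}\beta_1 + \alpha_i + \upsilon_{i,t} \qquad (1)$$

holds, then also

$$\bar y_i = \beta_0 + \bar x_i \beta_1 + \alpha_i + \bar\upsilon_i \qquad (2)$$

must hold with $\bar y_i = \frac{1}{T}\sum_{t=1}^{T} y_{i,t}$, $\bar x_i = \frac{1}{T}\sum_{t=1}^{T} x_{i,t}$ and $\bar\upsilon_i = \frac{1}{T}\sum_{t=1}^{T} \upsilon_{i,t}$.

Subtracting (2) from (1) yields:

$$y_{i,t} - \bar y_i = (x_{i,t} - \bar x_i)\beta_1 + \upsilon_{i,t} - \bar\upsilon_i \qquad (3)$$

These three equations provide the basis for estimating β1.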



Between estimator

The between estimator uses OLS to estimate (2):

$$\bar y_i = \beta_0 + \bar x_i \beta_1 + \alpha_i + \bar\upsilon_i$$

Hence, it explains why ȳi differs from ȳj, i.e., it explains differences between individuals.

xtreg y x, be

Within (= fixed effects) estimator

The within estimator uses OLS to estimate (3):

$$y_{i,t} - \bar y_i = (x_{i,t} - \bar x_i)\beta_1 + \upsilon_{i,t} - \bar\upsilon_i$$

Hence, it explains why yi,t differs from ȳi, i.e., it explains differences within individuals.

xtreg y x, fe
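Applied to the debt-ratio data, the between estimator reduces to a regression on just three firm means. This pure-Python sketch (illustrative function names; unweighted OLS on the means, not weighted by Tᵢ) shows that the between slope comes out positive. This is because the young firms happen to have both low debt ratios and low average volatility (they exist only in the later, calmer years), which is exactly the firm-level difference the within estimator removes.

```python
# Debt-ratio example: (firm, year, dr, gdpvola)
data = [
    (1, 1985, 0.36, 19), (1, 1990, 0.35, 20), (1, 1995, 0.50, 15), (1, 2000, 0.48, 12),
    (2, 1990, 0.15, 20), (2, 1995, 0.20, 15), (2, 2000, 0.23, 12),
    (3, 1995, 0.08, 15), (3, 2000, 0.12, 12),
]


def slope(xs, ys):
    """OLS slope of ys on xs (with an intercept)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    return num / sum((a - mx) ** 2 for a in xs)


def firm_means(rows):
    """Per-firm means (mean gdpvola, mean dr)."""
    groups = {}
    for firm, _, dr, vola in rows:
        groups.setdefault(firm, []).append((float(vola), dr))
    return [(sum(v for v, _ in obs) / len(obs), sum(d for _, d in obs) / len(obs))
            for obs in groups.values()]


means = firm_means(data)
b_between = slope([m[0] for m in means], [m[1] for m in means])  # cf. xtreg dr gdpvola, be
print(f"between slope: {b_between:.4f}")
```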



There are two estimators that combine the between and within dimensions:

The regular OLS estimator

The regular (pooled cross-section) OLS estimator directly estimates (1) and thus combines the within and between dimensions, but not efficiently.

It requires that E[xi,t (αi + υi,t)] = 0, i.e., that xi,t and αi + υi,t are contemporaneously uncorrelated.

reg y x

The random effects estimator

The random effects estimator uses FGLS to directly estimate (1), and thus combines the within and between dimensions efficiently.

It is equivalent to a weighted average of the between and within estimators.

xtreg y x, re
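One way to see how the random effects estimator sits between pooled OLS and fixed effects is the quasi-demeaning form of GLS: subtract θᵢ times the firm mean, with θᵢ = 1 − √(σ²_υ / (σ²_υ + Tᵢ σ²_α)), and run OLS. The sketch below treats the variance components as known; the values used are hypothetical, whereas in practice they are estimated from the data, which is what makes the estimator FGLS. With σ²_α = 0 it reproduces pooled OLS, and as σ²_α grows it approaches the within (FE) slope. The function name `re_slope` is illustrative.

```python
# Debt-ratio example: (firm, year, dr, gdpvola)
data = [
    (1, 1985, 0.36, 19), (1, 1990, 0.35, 20), (1, 1995, 0.50, 15), (1, 2000, 0.48, 12),
    (2, 1990, 0.15, 20), (2, 1995, 0.20, 15), (2, 2000, 0.23, 12),
    (3, 1995, 0.08, 15), (3, 2000, 0.12, 12),
]


def re_slope(rows, sigma2_alpha, sigma2_v):
    """Random-effects (GLS) slope of dr on gdpvola with KNOWN variance components.

    Quasi-demeaning: theta_i = 1 - sqrt(sigma2_v / (sigma2_v + T_i * sigma2_alpha));
    then OLS of dr_it - theta_i * mean_i(dr) on (1 - theta_i) and
    gdpvola_it - theta_i * mean_i(gdpvola).
    """
    groups = {}
    for firm, _, dr, vola in rows:
        groups.setdefault(firm, []).append((float(vola), dr))
    a = [[0.0, 0.0], [0.0, 0.0]]  # Z'Z
    b = [0.0, 0.0]                # Z'y
    for obs in groups.values():
        t_i = len(obs)
        theta = 1.0 - (sigma2_v / (sigma2_v + t_i * sigma2_alpha)) ** 0.5
        x_bar = sum(x for x, _ in obs) / t_i
        y_bar = sum(y for _, y in obs) / t_i
        for x, y in obs:
            z = (1.0 - theta, x - theta * x_bar)
            y_star = y - theta * y_bar
            for i in range(2):
                b[i] += z[i] * y_star
                for j in range(2):
                    a[i][j] += z[i] * z[j]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return (a[0][0] * b[1] - a[1][0] * b[0]) / det


# Hypothetical variance components, for illustration only
for s2_alpha in (0.0, 0.01, 1e6):
    print(s2_alpha, re_slope(data, sigma2_alpha=s2_alpha, sigma2_v=0.001))
```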



Assumptions

The fixed effects (FE) model assumes that the individual-specific terms are

(non-random) constants αi which can be estimated:

yi,t = αi + xi,t β1 + υi,t

Hence, the FE model requires few assumptions. It only assumes that the αi are fixed unknown constants that can be estimated.

In particular, it does not require that αi and xi,t are uncorrelated.

Note that the FE model is equivalent to a regular OLS estimation including

a separate dummy variable Di for each individual i:

yi,t = α1 D1+ α2 D2 +… + αN DN + xi,t β1 + υi,t



Assumptions

The random effects (RE) model tries to disentangle the residuals

yi,t = β0 + xi,t β1 + εi,t with εi,t = αi + υi,t

αi captures the individual effects, but they are assumed to be stochastic.

Most importantly, it is assumed that E[εi,t ∙xi,t ] = 0. This implies that the

unobservable characteristics captured by αi are uncorrelated with the

observable regressor(s) in xi,t.

In many applications this may be too restrictive. For example, …

… in a wage-model the unobservable “general ability” of persons might be

correlated with their observable “school degree”.

… in a debt-ratio model unobservable “management skills” might be

correlated with the observed "firm's access to capital markets".



Choosing between RE and FE

The fixed effects model concentrates on differences “within” individuals,

not “between” them.

It explains to what extent yi,t differs from ȳi, not why ȳi differs from ȳj.

The FGLS estimator for random effects is an efficient combination of the "within" and "between" estimators.

If the assumption of the RE model is satisfied (i.e., αi and xi,t are uncorrelated), the RE model will produce more efficient estimates than FE, since the FE model assigns 100% weight to the within dimension and ignores the between dimension of the data.

However, if the uncorrelatedness assumption is too restrictive, the RE estimator may produce biased estimates.

This risk is lower for FE, as it does not require uncorrelated αi and xi,t.




In practice, one often estimates both the RE and the FE model.

If there are no substantial differences between the results, then correlation of αi and xi,t is unlikely to be an issue.

Hausman (1978) suggests a very general test which can be used to determine whether the FE or the RE model is more appropriate:

1. Estimate the model that is consistent under both the null and the alternative hypothesis (FE) and store the results.

2. Estimate the model that is efficient under the null hypothesis (RE) and store the results.

3. Compare the estimates.

xtreg y x, fe

estimates store consistent_mod

xtreg y x, re

estimates store efficient_mod

hausman consistent_mod efficient_mod
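For a single slope coefficient, the Hausman statistic should reduce to (b_FE − b_RE)² / (Var(b_FE) − Var(b_RE)), which is approximately χ²(1) under the null (5% critical value about 3.841). This pure-Python sketch uses made-up estimates and standard errors purely for illustration; the function name is hypothetical.

```python
def hausman_scalar(b_fe, se_fe, b_re, se_re):
    """Hausman statistic for a single coefficient: (b_FE - b_RE)^2 / (Var_FE - Var_RE).

    Under H0 it is approximately chi-squared with 1 degree of freedom.
    """
    var_diff = se_fe ** 2 - se_re ** 2
    if var_diff <= 0:
        raise ValueError("Var_FE - Var_RE must be positive for this formula")
    return (b_fe - b_re) ** 2 / var_diff


# Hypothetical estimates, for illustration only
h = hausman_scalar(b_fe=-0.015, se_fe=0.003, b_re=-0.010, se_re=0.002)
print(h, "reject H0 at 5%" if h > 3.841 else "do not reject H0 at 5%")
```

If the RE variance is not smaller than the FE variance (which can happen in finite samples), the simple formula breaks down, hence the explicit check.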




If the Hausman test yields a low p-value (i.e., it rejects the null hypothesis that the differences between the FE and RE coefficients are not systematic), then we should prefer the fixed effects model.





9.3 Unbalanced Panels

Some panel data sets have missing years for at least some cross-sectional units in the sample. Such a data set is called an unbalanced panel.

The mechanics of FE estimation remain the same.

If Ti is the number of time periods for cross-sectional unit i, we simply use these Ti observations to calculate the individual-specific means (and hence the αi).

Units that are observed in only a single time period drop out: their time-demeaned values are all zero, so they do not contribute to the FE estimate.

Nevertheless, it is important to determine why the panel is unbalanced.

year | dr1 | dr2 | dr3 | average_dr | gdpvola
-----|-----|-----|-----|------------|--------
1985 | 0.36 | | | 0.36 | 19
1990 | 0.35 | 0.15 | | 0.25 | 20
1995 | 0.50 | 0.20 | 0.08 | 0.26 | 15
2000 | 0.48 | 0.23 | 0.12 | 0.277 | 12
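The claim that FE estimation works the same way in an unbalanced panel, and that singleton units drop out, can be checked directly. This pure-Python sketch demeans by firm using each firm's own Tᵢ and adds a hypothetical firm 4 that is observed in one year only; its demeaned values are zero, so the within slope does not change. The extra firm and the function name are assumptions made for the illustration.

```python
# Debt-ratio example: (firm, year, dr, gdpvola)
data = [
    (1, 1985, 0.36, 19), (1, 1990, 0.35, 20), (1, 1995, 0.50, 15), (1, 2000, 0.48, 12),
    (2, 1990, 0.15, 20), (2, 1995, 0.20, 15), (2, 2000, 0.23, 12),
    (3, 1995, 0.08, 15), (3, 2000, 0.12, 12),
]


def within_slope(rows):
    """FE (within) slope; each firm is demeaned with its own number of periods T_i."""
    by_firm = {}
    for firm, _, dr, vola in rows:
        by_firm.setdefault(firm, []).append((dr, vola))
    num = den = 0.0
    for obs in by_firm.values():
        t_i = len(obs)
        dr_bar = sum(d for d, _ in obs) / t_i
        vola_bar = sum(v for _, v in obs) / t_i
        num += sum((v - vola_bar) * (d - dr_bar) for d, v in obs)
        den += sum((v - vola_bar) ** 2 for _, v in obs)
    return num / den


# Hypothetical extra firm 4, observed in a single year only
data_plus = data + [(4, 2000, 0.30, 12)]

counts = {}
for firm, *_ in data_plus:
    counts[firm] = counts.get(firm, 0) + 1
print("T_i per firm:", counts)
print("within slope without / with firm 4:", within_slope(data), within_slope(data_plus))
```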


If the reason we have missing data for some i is not correlated with the idiosyncratic errors υi,t, the unbalanced panel causes no problems.

However, this is often not the case. For example, …

… if we collect data on companies, some of them may be lost in

subsequent years because they have gone out of business or have

merged with other companies. Or newly established firms enter the

sample.

… a hedge fund’s performance influences its likelihood to survive.

If the reason an individual leaves is correlated with the idiosyncratic error,

i.e., an unobserved factor that varies over time, there is a sample

selection problem.

Sample selection problems may cause biases, e.g., survivorship bias!

In this case we need more advanced estimators.








Good luck on the exam!
