
Panel Data: Linear Models

Laura Magazzini

University of Verona

dse.univr.it/magazzini


Introduction

Outline

What is Panel Data?

Motivation: the omitted variable problem

An example: Production function

Model specification

Estimation



What is panel (or longitudinal) data?

It is a time series of cross-sections, where the same unit is observed over a number of periods

. Units can be individuals, firms, households, industries, markets, regions, countries, ...

Micro- vs. macro-panels: different techniques are required for estimation

. Bank of Italy, European panel: large N & small T

. OECD: large N & small/medium/large T

We work on micro-panels (large N & small T)

. Random sampling over the cross-sectional dimension

Micro- & macro-panels: one of the most active bodies of literature in econometrics



Basic model and notation

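We will consider the linear model

$$y_{it} = x_{it}'\beta + v_{it}$$

with $i = 1, \dots, N$ (sample units), $t = 1, \dots, T$ (time periods)

For each sample unit, we have the following T equations:

$$\begin{aligned}
y_{i1} &= x_{i1}'\beta + v_{i1} \\
y_{i2} &= x_{i2}'\beta + v_{i2} \\
&\;\;\vdots \\
y_{iT} &= x_{iT}'\beta + v_{iT}
\end{aligned}$$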



Advantages of panel data

Greater flexibility in the study of dynamics than CS or TS

(ex.1) Repeated CS: at two points in time you observe that 50% of women are working. Does each woman have a 50% chance of working in any given period? Or is the same one-half of the women working in all time periods? [Ben-Porath (1973)]

(ex.2) Production function: economies of scale (ES) versus technical change (TC). CS only provides information about ES. TS muddles the two effects.

Greater precision in estimation (greater number of observations due to pooling)

Heterogeneity across units: it is possible to disentangle different sources of variance of the units of interest (permanent versus transitory factors)

Can solve the omitted variables bias (fixed effects)

. Consistent estimates can be obtained in the presence of omitted variables, if the omitted variables vary across sample units but are constant over time, e.g. preferences, individual ability, propensity to patent, ...



Example 1: Production function

Max output given the value of the inputs

Consider the case of agricultural production:

$$Q = \varphi(L, V)$$

o Q: Output
o L: Input that varies over time (labor)
o V: Input that remains constant over time (soil quality)

. You can also think of a firm production function where V represents managerial capability

Typically, V is known to the farmer/manager, but unknown to the econometrician



Example 1: Econometric specification

Let us consider a Cobb-Douglas production function:

$$\varphi(L, V) = A L^{\alpha} V^{\beta}$$

Taking logs (and adding an error term, summarizing all inputs outside the farmer's control, e.g. rainfall):

$$q = a + \alpha l + \beta v + u$$

Parameter of interest: $\alpha$, i.e. the (%) increase in Q driven by a 1 percent increase in L, holding V constant



Example 1: Data availability

q = a + αl + βv + u

Ideal world

You measure Q, L, and V on a sample of N farmers

If standard hypotheses hold, the relationship can be estimated by OLS

Real world

V is not observable: you measure only Q and L on a sample of N farmers

q = a + αl + (βv + u) = a + αl + ε

Omitted variable bias?



Example 1: Estimation by OLS?

E [q|l ] = a + αl + (βE [v |l ] + E [u|l ]) = a + αl + E [ε|l ]

OLS regression of q on l allows the identification of the parameter of interest $\alpha$ if and only if $E[\varepsilon|l] = 0$

We assume $E[u|l] = 0$, therefore we need the omitted variable v

(1) not to affect q once l is controlled for, i.e. $\beta = 0$, or
(2) to be uncorrelated with l: $E[v|l] = 0$

We do not believe (1): soil quality affects harvest (managerial capabilities affect firm output)

What does economic theory tell us about hypothesis (2)?



Example 1: Relationship between L and V

According to economic theory, a farmer/firm chooses the L that maximizes the expected profit

Let $p_l$ be the cost of a unit of L, and $p$ the price of the output Q

$$\pi = A L^{\alpha} V^{\beta} p - L p_l$$

Taking first derivatives and solving the first-order condition, the optimal L depends on V

As a consequence, L is correlated with V: firms choose the optimal L on the basis of characteristics that are unobservable for the researcher but known to the farmer/firm!

$\mathrm{cov}(v, l) \neq 0 \Rightarrow E[v|l] \neq 0$ and, therefore, $E[\varepsilon|l] \neq 0$: OLS is inconsistent
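
A worked version of the first-order condition, as a sketch: it assumes $0 < \alpha < 1$ (so that profit is concave in L) and ignores the error term u. Neither assumption is stated on the slide.

```latex
\frac{\partial \pi}{\partial L} = \alpha A L^{\alpha-1} V^{\beta} p - p_l = 0
\quad\Longrightarrow\quad
L^{*} = \left( \frac{\alpha A V^{\beta} p}{p_l} \right)^{\frac{1}{1-\alpha}}

% in logs, with a = \log A, l = \log L, v = \log V:
l^{*} = \frac{1}{1-\alpha}\left[ \log\alpha + a + \beta v + \log p - \log p_l \right]
```

With $\beta > 0$, the optimal $l^{*}$ is increasing in $v$, which is where the correlation between l and v comes from.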



Example 1: The panel solution (1)

The omitted variable bias is linked to the problem of endogeneity

Instrumental variables can be applied for estimation (need to search for “external” instruments)

What if...?

. The soil quality/managerial ability V is constant over time

. Q and L are observed for (at least) T = 2 time periods




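When t = 1: $q_{i1} = a + \alpha l_{i1} + \beta v_{i1} + u_{i1}$

When t = 2: $q_{i2} = a + \alpha l_{i2} + \beta v_{i2} + u_{i2}$

Taking the difference (we assume V constant over time $\Rightarrow v_{i1} = v_{i2}$):

$$\underbrace{q_{i2} - q_{i1}}_{\Delta q_i} = \alpha(\underbrace{l_{i2} - l_{i1}}_{\Delta l_i}) + \underbrace{u_{i2} - u_{i1}}_{\Delta u_i}$$

The equation $\Delta q_i = \alpha \Delta l_i + \Delta u_i$ does not depend on the unobserved variable v

If $\Delta u_i$ satisfies the classical assumptions, the regression of $\Delta q_i$ on $\Delta l_i$ provides an estimate of the parameter of interest $\alpha$.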
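
A small simulation can illustrate the argument. This is only a sketch under made-up assumptions: the parameter values, the normal distributions and the function name `simulate_and_estimate` are hypothetical and not part of the slides. Log-labor is generated so that it is positively correlated with the time-constant v.

```python
import numpy as np

def simulate_and_estimate(n_units=5000, alpha=0.6, beta=0.3, seed=0):
    """Compare cross-section OLS with the first-difference estimator (T = 2)."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=n_units)                      # time-constant, unobserved input
    l = 0.8 * v[:, None] + rng.normal(size=(n_units, 2))   # labor, correlated with v
    u = rng.normal(scale=0.5, size=(n_units, 2))      # idiosyncratic shocks
    q = 1.0 + alpha * l + beta * v[:, None] + u       # q = a + alpha*l + beta*v + u

    # Cross-section OLS of q on l (with a constant) at t = 1: v is omitted
    X = np.column_stack([np.ones(n_units), l[:, 0]])
    alpha_ols = np.linalg.lstsq(X, q[:, 0], rcond=None)[0][1]

    # First differences: v drops out, regress dq on dl (no constant)
    dq = q[:, 1] - q[:, 0]
    dl = l[:, 1] - l[:, 0]
    alpha_fd = (dl @ dq) / (dl @ dl)
    return alpha_ols, alpha_fd

alpha_ols, alpha_fd = simulate_and_estimate()
print(f"OLS: {alpha_ols:.3f}   FD: {alpha_fd:.3f}   (true alpha = 0.6)")
```

In this design the OLS estimate is pulled upward by the omitted v, while the first-difference estimate stays close to the true α.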




Advantages: repeated observations over time on the same unit allow the use of estimation methods that are robust to the presence of omitted variables in the model, if these variables are constant over time.

Any transformation of the initial model that eliminates the unobservable variable v is a good starting point

The linearity and additivity of the model are necessary in this context.



Example 2: Return to schooling

Aim: study the variation in income associated with a change in the years of schooling

The model of interest is:

$$w_i = \alpha + \rho s_i + a_i + \varepsilon_i$$

where $w_i$ indicates income, $s_i$ the number of years of schooling, and $a_i$ individual ability ($i = 1, \dots, N$).

Likely, individual ability affects income ($\mathrm{cov}(w, a) > 0$) and is correlated with the years of schooling ($\mathrm{cov}(s, a) > 0$)

Unfortunately, ai is typically unobservable!



Example 2: Identification and estimation

Let us suppose we observe (w, s) for the same unit at two points in time

Typically, $s_i$ does not vary over time, i.e. we look at the relationship between w and s when choices about s have already been made

At time 1: $w_{i1} = \alpha + \rho s_{i1} + a_i + \varepsilon_{i1}$

At time 2: $w_{i2} = \alpha + \rho s_{i2} + a_i + \varepsilon_{i2}$

Taking differences (since $s_{i1} = s_{i2}$):

$$w_{i2} - w_{i1} = \varepsilon_{i2} - \varepsilon_{i1}$$

The availability of repeated observations does not improve the identification of $\rho$: differencing removes $s_i$ together with $a_i$


The Omitted Variables Problem

Motivation: The omitted variables problem

Panel data can be used to obtain consistent estimators in the presence of omitted variables

Let $y$ and $x = (x_1, \dots, x_K)$ be observable random variables

Let $c$ be an unobservable random variable

We are interested in the partial effect of the observable explanatory variables $x_j$ in the population regression function: $E[y|x_1, \dots, x_K, c]$

Assuming a linear model: $E[y|x_1, \dots, x_K, c] = \beta_0 + x'\beta + c$, i.e.

$$y = \beta_0 + x'\beta + c + \varepsilon$$

- Interest lies in the (K × 1) vector $\beta$
- $c$ is called unobserved effect



What if $\mathrm{cov}(x, c) \neq 0$?

$$y = \beta_0 + x'\beta + c + \varepsilon$$

1 Find a proxy for $c$ and estimate $\beta$ using OLS

2 Find an “external” instrument for $x$ and apply 2SLS

3 If we can observe the same units at different points in time (i.e. we can collect a panel data set), we can get consistent estimates of $\beta$ as long as we can assume $c$ to be constant over time

. Accomplished by transforming the original data (“internal” instruments)



The panel solution to omitted variable bias (T = 2)

Assume we can observe (y , x) at two different points in time:

t = 1: (y1, x1) & t = 2: (y2, x2)

The population regression function is:

$$E[y_t|x_t, c] = \beta_0 + x_t'\beta + c \quad\text{or}\quad y_t = \beta_0 + x_t'\beta + c + u_t$$

where by definition $E[u_t|x_t, c] = 0$ (t = 1, 2).

What about $E[c|x_t]$?

. If $E[x_t' c] = 0$, we can apply OLS

. If $E[x_t' c] \neq 0$, pooled OLS is biased and inconsistent

But we can take first differences and eliminate $c$:

$$\underbrace{y_2 - y_1}_{\Delta y} = (\underbrace{x_2 - x_1}_{\Delta x})'\beta + \underbrace{u_2 - u_1}_{\Delta u}$$



Can we apply OLS for estimation? (T = 2)

∆y = ∆x′β + ∆u

Exogeneity: $E[\Delta x'\Delta u] = 0$

. $E(x_2'u_2) + E(x_1'u_1) - E(x_1'u_2) - E(x_2'u_1) = 0$

. Stronger than $E(x_t'u_t) = 0$ (t = 1, 2)

. Strict exogeneity: $\mathrm{cov}(x_t, u_s) = 0$ for all t and s

. No restrictions on the correlation between $x_t$ and $c$

Rank condition: $\mathrm{rank}\, E(\Delta x'\Delta x) = K$

. If $x_t$ contains a variable that is constant across time for every member of the population, then $\Delta x$ contains an entry that is identically zero, and the rank condition fails


Linear Model Notation

The basic linear panel data model (1)

For a randomly drawn cross-section unit i, we assume ($i = 1, \dots, N$, $t = 1, \dots, T$):

$$y_{it} = x_{it}'\beta + c_i + u_{it}$$

. $c_i$: individual effect or individual heterogeneity

. $u_{it}$: idiosyncratic errors/disturbances

Assume $c_i$ uncorrelated with $u_{it}$

Assume $u_{it}$ homoskedastic and serially uncorrelated

We consider a “balanced panel”: each cross-section unit i is observed T times (total of N × T observations)



The basic linear panel data model (2)

In compact form we can write:

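$$y_i = x_i\beta + c_i\iota_T + u_i$$

where the vectors have dimension T × 1 (and $x_i$ stacks the T rows $x_{it}'$, so it has dimension T × K)

$y_i = (y_{i1}, \dots, y_{iT})'$, $x_i = (x_{i1}, \dots, x_{iT})'$

$u_i = (u_{i1}, \dots, u_{iT})'$, $\iota_T = (1, \dots, 1)'$

Different estimators are available on the basis of the underlying assumptions on the correlation structure of $c_i$

Asymptotics rely on $N \to \infty$, for fixed T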


Linear Model OLS estimation

When pooled OLS?

$$y_{it} = x_{it}'\beta + c_i + u_{it} = x_{it}'\beta + v_{it}$$

$v_{it}$: composite error, sum of the unobserved effect and the idiosyncratic error

OLS is consistent if $E[x_{it}'v_{it}] = 0$:

. $E[x_{it}'u_{it}] = 0$

. $E[x_{it}'c_i] = 0$, $t = 1, 2, \dots, T$

Robust standard errors: the presence of $c_i$ induces correlation over time for the same individual

OLS is not efficient
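
The point about robust standard errors can be sketched in code. The function below computes pooled OLS with the usual cluster-robust (by unit) sandwich formula, without small-sample corrections. The function name, the array layout and the simulated design are assumptions made for this illustration only.

```python
import numpy as np

def pooled_ols_cluster(y, X):
    """Pooled OLS with a constant. y: (N, T); X: (N, T, K).

    Returns coefficients, classical s.e. and s.e. clustered by unit i.
    """
    N, T, K = X.shape
    Z = np.concatenate([np.ones((N, T, 1)), X], axis=2)    # add the constant
    Zf, yf = Z.reshape(N * T, K + 1), y.reshape(-1)
    ZtZ_inv = np.linalg.inv(Zf.T @ Zf)
    b = ZtZ_inv @ Zf.T @ yf
    v = (yf - Zf @ b).reshape(N, T)                         # composite residuals
    s2 = (v ** 2).sum() / (N * T - K - 1)
    se_classical = np.sqrt(np.diag(s2 * ZtZ_inv))
    scores = np.einsum("itk,it->ik", Z, v)                  # Z_i' v_i for each unit
    meat = scores.T @ scores                                # sum_i Z_i' v_i v_i' Z_i
    se_cluster = np.sqrt(np.diag(ZtZ_inv @ meat @ ZtZ_inv))
    return b, se_classical, se_cluster

# Hypothetical design: c_i uncorrelated with x_it, x_it persistent within unit
rng = np.random.default_rng(4)
N, T = 500, 5
X = rng.normal(size=(N, 1, 1)) + rng.normal(size=(N, T, 1))
c = rng.normal(size=(N, 1))
y = 1.0 + 2.0 * X[:, :, 0] + c + rng.normal(size=(N, T))
b, se_classical, se_cluster = pooled_ols_cluster(y, X)
print(b, se_classical, se_cluster)
```

In this design the clustered standard error of the slope exceeds the classical one, because $c_i$ makes $v_{it}$ correlated over time within each unit.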


Linear Model Random effect estimation

Random effects structure

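$$y_{it} = x_{it}'\beta + c_i + u_{it} = x_{it}'\beta + v_{it}$$

$u_{it}$ homoskedastic and serially uncorrelated: $E[u_i u_i'|x_i, c_i] = \sigma_u^2 I_T$

$c_i$ homoskedastic: $E[c_i^2|x_i] = \sigma_c^2$

As a result, the error structure has the following form:

$$\Omega_i = E[v_i v_i'] =
\begin{pmatrix}
\sigma_c^2 + \sigma_u^2 & \sigma_c^2 & \cdots & \sigma_c^2 \\
\sigma_c^2 & \sigma_c^2 + \sigma_u^2 & \cdots & \sigma_c^2 \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_c^2 & \cdots & \cdots & \sigma_c^2 + \sigma_u^2
\end{pmatrix}_{(T \times T)}
= \sigma_c^2 \iota_T \iota_T' + \sigma_u^2 I_T$$

$$E[vv'] = I_N \otimes \Omega_i = \Omega$$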



GLS estimation (unfeasible)

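$$\hat\beta_{RE(GLS)} = \left(\sum_{i=1}^{N} X_i'\Omega_i^{-1}X_i\right)^{-1}\left(\sum_{i=1}^{N} X_i'\Omega_i^{-1}y_i\right)$$

The estimator can be obtained by applying OLS to the regression of $\Omega^{-1/2}y$ on $\Omega^{-1/2}X$

$$\Omega^{-1/2} = [I_N \otimes \Omega_i]^{-1/2} = I_N \otimes \Omega_i^{-1/2}$$

$$\Omega_i^{-1/2} = \frac{1}{\sigma_u}\left[I_T - \frac{\theta}{T}\iota_T\iota_T'\right] \quad\text{with}\quad \theta = 1 - \frac{\sigma_u}{\sqrt{\sigma_u^2 + T\sigma_c^2}}$$

The GLS estimator can be obtained by the OLS regression of $(y_{it} - \theta\bar{y}_i)$ on $(x_{it} - \theta\bar{x}_i)$

. If $\sigma_c^2 = 0$, $\theta = 0$: RE = OLS (no unobserved heterogeneity; Breusch-Pagan LM statistic)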
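
The closed form of $\Omega_i^{-1/2}$ can be checked numerically. The sketch below uses arbitrary illustrative values for $T$, $\sigma_c^2$ and $\sigma_u^2$ (these values and the function name are assumptions) and verifies that $\Omega_i^{-1/2}\,\Omega_i\,\Omega_i^{-1/2} = I_T$.

```python
import numpy as np

def omega_inv_sqrt(T, sigma_c2, sigma_u2):
    """Closed form (1/sigma_u) * [I_T - (theta/T) iota iota'] and the implied theta."""
    sigma_u = np.sqrt(sigma_u2)
    theta = 1.0 - sigma_u / np.sqrt(sigma_u2 + T * sigma_c2)
    iota = np.ones((T, 1))
    P = (np.eye(T) - (theta / T) * (iota @ iota.T)) / sigma_u
    return P, theta

T, sigma_c2, sigma_u2 = 4, 2.0, 1.5                  # illustrative values only
omega_i = sigma_c2 * np.ones((T, T)) + sigma_u2 * np.eye(T)
P, theta = omega_inv_sqrt(T, sigma_c2, sigma_u2)

# P * Omega_i * P should be the identity matrix
print(np.allclose(P @ omega_i @ P, np.eye(T)))

# Applying P to a T-vector gives the quasi-demeaned data, scaled by 1/sigma_u
y_i = np.array([1.0, 2.0, 4.0, 5.0])
print(np.allclose(P @ y_i, (y_i - theta * y_i.mean()) / np.sqrt(sigma_u2)))
```

The second check shows why GLS reduces to OLS on $(y_{it} - \theta\bar{y}_i)$ and $(x_{it} - \theta\bar{x}_i)$: the scale factor $1/\sigma_u$ is common to both sides and does not affect the OLS coefficients.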



GLS estimation (feasible)

In order to implement the RE procedure, we need estimates of $\sigma_c^2$ and $\sigma_u^2$

$$\hat\beta_{RE(FGLS)} = \left(\sum_{i=1}^{N} X_i'\hat\Omega^{-1}X_i\right)^{-1}\left(\sum_{i=1}^{N} X_i'\hat\Omega^{-1}y_i\right)$$

To get $\hat\Omega$ (i.e. $\hat\sigma_c^2$ and $\hat\sigma_u^2$), Wooldridge suggests:

. $\sigma_c^2 + \sigma_u^2$ from pooled OLS residuals

. As $\sigma_c^2 = E[v_{it}v_{is}]$ for $t \neq s$, autocorrelation in OLS residuals can be exploited to obtain an estimate of $\sigma_c^2$

. $\sigma_u^2$ can be recovered by taking the difference $(\sigma_c^2 + \sigma_u^2) - \sigma_c^2$

. Alternative procedure described in Greene (Maddala and Mount, 1973)

. In small samples you can have $\hat\sigma_c^2 < 0$!
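
The steps above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name, the array layout, the degrees-of-freedom corrections and the truncation of a negative $\hat\sigma_c^2$ at zero are choices made here, and the simulated data are hypothetical.

```python
import numpy as np

def random_effects_fgls(y, X):
    """Feasible GLS random effects. y: (N, T); X: (N, T, K), no constant in X.

    Returns (coefficients with the intercept first, theta_hat).
    """
    N, T, K = X.shape
    Z = np.column_stack([np.ones(N * T), X.reshape(N * T, K)])
    b_ols = np.linalg.lstsq(Z, y.reshape(-1), rcond=None)[0]
    v = (y.reshape(-1) - Z @ b_ols).reshape(N, T)            # pooled OLS residuals

    sigma_v2 = (v ** 2).sum() / (N * T - K - 1)              # sigma_c^2 + sigma_u^2
    # sum over i and t < s of v_it * v_is, from (sum_t v)^2 = sum_t v^2 + 2 * cross
    cross = ((v.sum(axis=1) ** 2 - (v ** 2).sum(axis=1)) / 2.0).sum()
    sigma_c2 = cross / (N * T * (T - 1) / 2.0 - K - 1)
    sigma_c2 = max(sigma_c2, 0.0)                            # can be negative in small samples
    sigma_u2 = sigma_v2 - sigma_c2

    theta = 1.0 - np.sqrt(sigma_u2 / (sigma_u2 + T * sigma_c2))
    y_q = (y - theta * y.mean(axis=1, keepdims=True)).reshape(-1)
    X_q = (X - theta * X.mean(axis=1, keepdims=True)).reshape(N * T, K)
    Z_q = np.column_stack([np.full(N * T, 1.0 - theta), X_q])  # quasi-demeaned constant
    b_re = np.linalg.lstsq(Z_q, y_q, rcond=None)[0]
    return b_re, theta

# Hypothetical data: c_i uncorrelated with x_it, sigma_c^2 = sigma_u^2 = 1
rng = np.random.default_rng(5)
N, T = 2000, 4
X = rng.normal(size=(N, T, 1))
y = 1.0 + 2.0 * X[:, :, 0] + rng.normal(size=(N, 1)) + rng.normal(size=(N, T))
b_re, theta_hat = random_effects_fgls(y, X)
print(b_re, theta_hat)
```

With $\sigma_c^2 = \sigma_u^2 = 1$ and $T = 4$, the population value is $\theta = 1 - 1/\sqrt{5} \approx 0.55$, so the estimated $\hat\theta$ should be in that neighbourhood.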



Random effect estimation

$y_{it} = x_{it}'\beta + c_i + u_{it}$

Obtained from the OLS regression of $(y_{it} - \theta\bar{y}_i)$ on $(x_{it} - \theta\bar{x}_i)$ (in the more general case: OLS regression of $\Omega^{-1/2}y$ on $\Omega^{-1/2}X$)

Assumptions (stronger than OLS):

(1) Strict exogeneity: $E[x_{is}'u_{it}] = 0$ for each $s, t = 1, \dots, T$

(2) Orthogonality between $c_i$ and each $x_{it}$: $E[c_i|x_i] = E[c_i] = 0$

(3) Rank condition: $\mathrm{rank}\, E[X_i'\Omega^{-1}X_i] = K$, where $\Omega = E[v_i v_i']$

Why REE? It exploits the serial correlation of the error term in a GLS framework: efficient



The strict exogeneity assumption


$$E[y_{it}|x_{i1}, x_{i2}, \dots, x_{iT}, c_i] = E[y_{it}|x_{it}, c_i] = x_{it}'\beta + c_i$$

Once $x_{it}$ and $c_i$ are controlled for, $x_{is}$ has no partial effect on $y_{it}$ for $s \neq t$

$x_{it}$, $t = 1, \dots, T$, are strictly exogenous conditional on the unobserved effect $c_i$

The strict exogeneity assumption can be stated in terms of the idiosyncratic error term: $E[u_{it}|x_{i1}, x_{i2}, \dots, x_{iT}, c_i] = 0$

This implies that the explanatory variables in each time period are uncorrelated with the idiosyncratic error in each time period: $E[x_{is}'u_{it}] = 0$ for each $s, t = 1, \dots, T$

. Stronger than zero contemporaneous correlation: $E[x_{it}'u_{it}] = 0$


Linear Model Fixed effect estimation

Fixed effect framework

We maintain the strict exogeneity assumption: $E[u_{it}|x_i, c_i] = 0$

Allow $c_i$ to be arbitrarily correlated with $x_i$

FE is more robust than RE

. We can consistently estimate partial effects in the presence of time-constant omitted variables, which can be related to the observables $x_i$

. BUT we cannot include time-constant factors in $x_i$ (e.g. gender, race in the analysis of individuals; foundation year for firms; ...)

To get estimates we transform the equation to remove $c_i$ and apply OLS

. Dummy variable regression

. Within transformation

. First difference



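Dummy variable regression
Least Squares Dummy Variables (LSDV)

$$y_i = x_i\beta + c_i\iota_T + u_i$$

Collecting the terms over the N units gives:

$$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix}
= \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{pmatrix}\beta
+ \begin{pmatrix}
\iota_T & 0 & \cdots & 0 \\
0 & \iota_T & \cdots & 0 \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & \iota_T
\end{pmatrix}
\begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_N \end{pmatrix}
+ \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_N \end{pmatrix}$$

Or, letting $d_i$ be a dummy variable indicating unit i

$$y = [X \; d_1 \; d_2 \; \dots \; d_N]\begin{pmatrix} \beta \\ c \end{pmatrix} + u = X\beta + Dc + u$$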

Classical regression model with K + N parameters

What if N is thousands?



Dummy variable regression
Discussion

The parameter of interest is $\beta$

$c_i$: nuisance parameters that only increase the computational complexity of estimation

Incidental parameter problem: increasing N also increases the number of $c_i$ to be estimated

Solution: use the within group (WG) transformation

. Numerically, LSDV and the WG transformation lead to the same estimate for $\beta$ (a result of partitioned regression, just algebra)

. Estimate of $\beta$ “easier” to compute with WG (an important issue some years ago...)
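
The numerical equivalence of LSDV and WG can be checked on simulated data. The sketch below uses a small hypothetical balanced panel in which $x_{it}$ is correlated with $c_i$. All names and parameter values are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, K = 30, 4, 2
c = rng.normal(size=N)
X = rng.normal(size=(N, T, K)) + c[:, None, None]     # x_it correlated with c_i
beta = np.array([1.0, -0.5])
y = X @ beta + c[:, None] + rng.normal(size=(N, T))

# LSDV: regress y on X and N unit dummies (K + N parameters, no common constant)
D = np.kron(np.eye(N), np.ones((T, 1)))               # (N*T, N) dummy matrix
Z = np.column_stack([X.reshape(N * T, K), D])
coef = np.linalg.lstsq(Z, y.reshape(-1), rcond=None)[0]
beta_lsdv, c_hat = coef[:K], coef[K:]

# WG: regress deviations from unit means (only K parameters)
Xw = (X - X.mean(axis=1, keepdims=True)).reshape(N * T, K)
yw = (y - y.mean(axis=1, keepdims=True)).reshape(-1)
beta_wg = np.linalg.lstsq(Xw, yw, rcond=None)[0]

print(np.allclose(beta_lsdv, beta_wg))
# The dummy coefficients equal ybar_i - xbar_i' beta_hat
print(np.allclose(c_hat, y.mean(axis=1) - X.mean(axis=1) @ beta_wg))
```

The LSDV design matrix has N extra columns, which is what makes the within transformation attractive when N is in the thousands.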



Within group (WG) transformation

We transform the model in order to remove the term $c_i$

For individual i at time t: $y_{it} = x_{it}'\beta + c_i + u_{it}$

For individual i, the average over the T periods is: $\bar{y}_i = \bar{x}_i'\beta + c_i + \bar{u}_i$

Therefore by taking deviations from group means, we get:

$$y_{it} - \bar{y}_i = (x_{it} - \bar{x}_i)'\beta + (u_{it} - \bar{u}_i)$$

Under the assumption of strict exogeneity, we can apply OLS to the transformed data to get a consistent estimate of $\beta$

Estimates of $c_i$ can be computed as $\hat{c}_i = \bar{y}_i - \bar{x}_i'\hat\beta$ (unbiased; not consistent for fixed T and $N \to \infty$)

The F test can be applied for the joint significance of the $c_i$



Fixed effect estimation

$y_{it} = x_{it}'\beta + c_i + u_{it}$

WG: OLS regression of $y_{it} - \bar{y}_i$ on $x_{it} - \bar{x}_i$ (removes $c_i$)

Assumptions:

(1) Strict exogeneity: $E[x_{is}'u_{it}] = 0$ for each $s, t = 1, \dots, T$

(2) Rank condition: $\mathrm{rank}\left(\sum_{t=1}^{T} E[\ddot{x}_{it}'\ddot{x}_{it}]\right) = \mathrm{rank}\, E[\ddot{X}_i'\ddot{X}_i] = K$, where $\ddot{x}_{it} = x_{it} - \bar{x}_i$

No assumption about the correlation of $c_i$ and each $x_{it}$: consistent even if $E[c_i|x_i] \neq 0$

. More robust than RE, but the effect of time-invariant variables cannot be identified

Efficient if $u_{it}$ is homoskedastic and uncorrelated over time



First difference (FD)

Another way to remove the term $c_i$ from the equation is to take first differences:

$$y_{it} - y_{it-1} = (x_{it} - x_{it-1})'\beta + (u_{it} - u_{it-1})$$

OLS can be applied for estimation if $\Delta x_{it}$ is uncorrelated with $\Delta u_{it}$ (satisfied under strict exogeneity)

However it is not efficient, due to the correlation introduced among the error terms $\Delta u_{it}$ and $\Delta u_{it-1}$ (if $u_{it}$ is uncorrelated over time)

For example, for T = 3

$$\Delta y_{i2} = \Delta x_{i2}'\beta + (u_{i2} - u_{i1})$$

$$\Delta y_{i3} = \Delta x_{i3}'\beta + (u_{i3} - u_{i2})$$

GLS estimation could be employed to solve the problem: you get the “within-group” estimator
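
The correlation between the differenced errors in the T = 3 example can be written out explicitly. A short derivation, assuming $u_{it}$ homoskedastic with variance $\sigma_u^2$ and serially uncorrelated, as in the basic model:

```latex
E[\Delta u_{i2}^2] = E[(u_{i2} - u_{i1})^2] = 2\sigma_u^2, \qquad
E[\Delta u_{i3}\,\Delta u_{i2}] = E[(u_{i3} - u_{i2})(u_{i2} - u_{i1})] = -E[u_{i2}^2] = -\sigma_u^2

E[\Delta u_i \Delta u_i'] = \sigma_u^2
\begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix},
\qquad \Delta u_i = (\Delta u_{i2}, \Delta u_{i3})'
```

Adjacent differenced errors therefore have correlation $-1/2$, so OLS on the differenced equation does not use the optimal weighting.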



First difference estimation


FD: OLS regression of $\Delta y_{it}$ on $\Delta x_{it}$ (removes $c_i$)

Assumptions:

(1) $E[\Delta x_{it}'\Delta u_{it}] = 0$, that is $E[x_{is}'u_{it}] = 0$ for each $t = 1, \dots, T$; $s = t - 1, t, t + 1$

. satisfied under strict exogeneity

(2) Rank condition: $\mathrm{rank}\, E[\Delta X_i'\Delta X_i] = K$

No assumption about the correlation of $c_i$ and each $x_{it}$: consistent even if $E[c_i|x_i] \neq 0$

. More robust than RE, but the effect of time-invariant variables cannot be identified



Non-spherical $u_{it}$

What if $\Omega_i \neq \sigma_c^2\iota_T\iota_T' + \sigma_u^2 I_T$?

. That is, $u_{it}$ heteroskedastic and/or correlated over time

If $E(c_i|x_i) \neq 0$, then the FE estimator is still consistent (under strict exogeneity); it is no longer efficient

. Robust formulas should be employed for the computation of the standard errors!

. $\hat\beta_{FD}$ is efficient if $u_{it}$ is a random walk ($\Delta u_{it}$ serially uncorrelated)

If $E(c_i|x_i) = 0$ (the orthogonality condition holds), then the RE estimator remains consistent (under strict exogeneity); it is no longer efficient

. A more general estimator of $\Omega_i$ can be obtained as:

$$\hat\Omega_i = N^{-1}\sum_{i=1}^{N}\hat{v}_i\hat{v}_i'$$

with $\hat{v}_i$ pooled OLS residuals (efficient in the more general case)

. Assume alternative specifications: parametric assumptions about the correlation structure in $u_{it}$, e.g. AR(1), and perform GLS estimation


Linear Model Which one to choose?

WG vs. FD
Which one to choose?

WG: OLS regression of $(y_{it} - \bar{y}_i)$ on $(x_{it} - \bar{x}_i)$

FD: OLS regression of $\Delta y_{it}$ on $\Delta x_{it}$

Both WG and FD produce unbiased and consistent estimates of the parameter of interest $\beta$, as $c_i$ is removed from the regression

. The estimate of $\beta$ is not affected by the correlation (if any) between $c_i$ and $x_i$

. Generally, if the two estimators are different, this can be interpreted as evidence against the assumption of strict exogeneity

When T = 2, $\hat\beta_{WG} = \hat\beta_{FD}$

If T ≥ 3 and $u$ is homoskedastic and serially uncorrelated, $\hat\beta_{WG}$ is to be preferred because it is efficient

If $u$ is not homoskedastic and serially uncorrelated, the choice depends on the assumptions about $u_{it}$:

. If $u_{it}$ is a random walk, then $\Delta u_{it}$ is serially uncorrelated: $\hat\beta_{FD}$ is efficient

. In the more general set-up, use FD or WG with robust s.e.!
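
The equality of the two estimators at T = 2 can be verified numerically. A sketch on hypothetical simulated data follows. The function names and the data-generating process are assumptions, and neither regression includes a constant or time dummies.

```python
import numpy as np

def within_estimator(y, X):
    """OLS on deviations from unit means. y: (N, T); X: (N, T, K)."""
    N, T, K = X.shape
    Xw = (X - X.mean(axis=1, keepdims=True)).reshape(N * T, K)
    yw = (y - y.mean(axis=1, keepdims=True)).reshape(-1)
    return np.linalg.lstsq(Xw, yw, rcond=None)[0]

def first_difference_estimator(y, X):
    """OLS on first differences over time. y: (N, T); X: (N, T, K)."""
    N, T, K = X.shape
    dX = np.diff(X, axis=1).reshape(N * (T - 1), K)
    dy = np.diff(y, axis=1).reshape(-1)
    return np.linalg.lstsq(dX, dy, rcond=None)[0]

def simulate_panel(N, T, seed):
    rng = np.random.default_rng(seed)
    c = rng.normal(size=N)
    X = rng.normal(size=(N, T, 2)) + c[:, None, None]   # x_it correlated with c_i
    y = X @ np.array([1.0, -0.5]) + c[:, None] + rng.normal(size=(N, T))
    return y, X

y2, X2 = simulate_panel(N=100, T=2, seed=6)
y3, X3 = simulate_panel(N=100, T=3, seed=6)
same_T2 = np.allclose(within_estimator(y2, X2), first_difference_estimator(y2, X2))
same_T3 = np.allclose(within_estimator(y3, X3), first_difference_estimator(y3, X3))
print(same_T2, same_T3)
```

With T = 2 the demeaned data are $\pm\Delta x_i/2$ and $\pm\Delta y_i/2$, so the two sets of normal equations coincide. With T = 3 the estimates generally differ in finite samples.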



FE vs. RE (1)
Which one to choose?

Traditional approach: $c_i$ treated either as a parameter to be estimated or as a random disturbance

. “Philosophical” issue

. Wrongheaded in microeconometric applications

Modern terminology: fixed effects estimation vs. random effects estimation

. The difference is in the assumptions about $E[c_i|x_i]$

. FE allows consistent estimation of $\beta$ even in cases where $c_i$ is correlated with $x_i$

. RE requires $c_i$ to be uncorrelated with $x_i$



FE vs. RE (2)
Which one to choose?

FE: OLS regression of $(y_{it} - \bar{y}_i)$ on $(x_{it} - \bar{x}_i)$

. Only “within” variation is considered

RE: OLS regression of $(y_{it} - \theta\bar{y}_i)$ on $(x_{it} - \theta\bar{x}_i)$

. Both “within” and “between” variation are employed for estimation

. It is possible to show that $\hat\beta_{RE} = \Lambda\hat\beta_B + (I_K - \Lambda)\hat\beta_{FE}$, with $\hat\beta_B$ obtained from the OLS regression of $\bar{y}_i$ on $\bar{x}_i$

. $\theta = 1 - \frac{\sigma_u}{\sqrt{\sigma_u^2 + T\sigma_c^2}}$: if $T \to \infty$, $\theta \to 1$ and RE = FE (you need a different framework!)

Key: correlation between $c_i$ and $x_{it}$

. If $E[c_i|x_{it}] = E[c_i]$ (= 0): RE is consistent and efficient, FE consistent

. If $E[c_i|x_{it}] \neq E[c_i]$: FE consistent, but RE is not



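FE vs. RE
The Hausman test

Both FE and RE assume strict exogeneity

If $E[c_i|x] = E[c_i]$ (= 0)

. Both $\hat\beta_{FE}$ and $\hat\beta_{RE}$ are consistent for $\beta$: $\hat\beta_{FE} - \hat\beta_{RE} \approx 0$

. $\hat\beta_{RE}$ is efficient: $\mathrm{Var}(\hat\beta_{FE})$ is “greater than” $\mathrm{Var}(\hat\beta_{RE})$

If $E[c_i|x] \neq E[c_i]$

. $\hat\beta_{FE}$ is consistent, but $\hat\beta_{RE}$ is biased: $\hat\beta_{FE} - \hat\beta_{RE} \neq 0$

We can apply the Hausman test

$$(\hat\beta_{FE} - \hat\beta_{RE})'\left(\mathrm{Var}(\hat\beta_{FE}) - \mathrm{Var}(\hat\beta_{RE})\right)^{-1}(\hat\beta_{FE} - \hat\beta_{RE}) \sim \chi^2_K$$

Remark: Two maintained hypotheses (not tested!): (i) strict exogeneity; (ii) random effect structure of the covariance (under the null, RE has to be efficient: valid under spherical $u_{it}$)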
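
The quadratic form of the test is simple to compute once the two sets of estimates and their variance matrices are available. A minimal sketch: the function name and the numbers in the example are hypothetical, and the p-value step is left out (it could be obtained, for instance, from a chi-square survival function such as `scipy.stats.chi2.sf`).

```python
import numpy as np

def hausman_statistic(b_fe, b_re, V_fe, V_re):
    """Hausman statistic (b_fe - b_re)' [V_fe - V_re]^{-1} (b_fe - b_re).

    Returns the statistic and its degrees of freedom K. The coefficient
    vectors must refer to the same K time-varying regressors.
    """
    d = np.asarray(b_fe, dtype=float) - np.asarray(b_re, dtype=float)
    V_diff = np.asarray(V_fe, dtype=float) - np.asarray(V_re, dtype=float)
    stat = float(d @ np.linalg.solve(V_diff, d))
    return stat, d.size

# Made-up numbers with K = 1: difference 0.2, variance difference 0.04
stat, dof = hausman_statistic([1.0], [0.8], [[0.05]], [[0.01]])
print(stat, dof)   # statistic is 0.2**2 / 0.04, i.e. about 1, with 1 degree of freedom
```

In finite samples the estimated difference of the variance matrices need not be positive definite. This is one practical reason to consider the regression-based version of the test.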



Between FE and RE: Correlated random effects
(Mundlak, 1978; Chamberlain, 1982, 1984)

RE assumes no correlation between $c_i$ and $x_{it}$

Richer models can be specified that relax this assumption

Mundlak (1978): $c_i = \bar{x}_i'\pi + w_i$ with $w_i$ i.i.d.

. GLS estimation of the regression of $y_{it}$ on $x_{it}$ and $\bar{x}_i$ produces the fixed effect estimator

Chamberlain (1982, 1984): $c_i = x_{i1}'\pi_1 + \dots + x_{iT}'\pi_T + w_i$

. Estimation of the extended model by the minimum distance method produces the fixed effect estimator

In nonlinear models, fixed effect models are not always estimable and richer RE models provide an alternative approach



FE vs. RE
A robust version of the Hausman test

Starting from the Mundlak (1978) definition (linear projection): $c_i = \bar{x}_i'\pi + w_i$ with $w_i$ i.i.d., we can write:

$$y_{it} = x_{it}'\beta + c_i + u_{it} = x_{it}'\beta + \bar{x}_i'\pi + (w_i + u_{it})$$

GLS estimation produces: $\hat\beta_{GLS} = \hat\beta_{FE}$ and $\hat\pi_{GLS} = \hat\beta_{BET} - \hat\beta_{FE}$ ($\hat\beta_{BET}$: OLS estimate in the regression of $\bar{y}_i$ on $\bar{x}_i$)

The Hausman test can be carried out by testing $H_0$: $\pi = 0$ in the extended regression

Robust version of the Hausman test: use a robust Wald statistic in the context of pooled OLS (strict exogeneity is still needed, but we can relax the efficiency of RE under the null)
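
The algebra behind the extended regression can be checked numerically. The sketch below estimates it by pooled OLS on a hypothetical balanced panel (an assumption of this illustration; the slide states the result for GLS) and compares the coefficients with the within and between estimates. All names and parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T, K = 200, 5, 2
c = rng.normal(size=N)
X = rng.normal(size=(N, T, K)) + 0.7 * c[:, None, None]   # x_it correlated with c_i
y = X @ np.array([1.5, -1.0]) + c[:, None] + rng.normal(size=(N, T))

# Extended (Mundlak) regression: y_it on a constant, x_it and the unit means xbar_i
Xbar = np.repeat(X.mean(axis=1), T, axis=0)               # each unit mean repeated T times
Z = np.column_stack([np.ones(N * T), X.reshape(N * T, K), Xbar])
coef = np.linalg.lstsq(Z, y.reshape(-1), rcond=None)[0]
beta_mundlak, pi_hat = coef[1:K + 1], coef[K + 1:]

# Within (FE) estimator
Xw = (X - X.mean(axis=1, keepdims=True)).reshape(N * T, K)
yw = (y - y.mean(axis=1, keepdims=True)).reshape(-1)
beta_fe = np.linalg.lstsq(Xw, yw, rcond=None)[0]

# Between estimator: regression of ybar_i on a constant and xbar_i
Zb = np.column_stack([np.ones(N), X.mean(axis=1)])
beta_b = np.linalg.lstsq(Zb, y.mean(axis=1), rcond=None)[0][1:]

print(np.allclose(beta_mundlak, beta_fe))
print(np.allclose(pi_hat, beta_b - beta_fe))
```

A test of $\pi = 0$ would then use a cluster-robust variance matrix for `coef`. That step is not shown here.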


Goodness of fit

The R2 with panel data

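$R^2$ as the square of the correlation coefficient between observed and fitted values

Total variability can be decomposed into within and between variability:

$$\frac{1}{NT}\sum_{i,t}(y_{it} - \bar{y})^2 = \frac{1}{NT}\sum_{i,t}(y_{it} - \bar{y}_i)^2 + \frac{1}{NT}\sum_{i,t}(\bar{y}_i - \bar{y})^2$$

STATA provides three “$R^2$” statistics:

. $R^2_{within} = \mathrm{corr}^2((x_{it} - \bar{x}_i)'\hat\beta_{FE},\; y_{it} - \bar{y}_i)$

. $R^2_{between} = \mathrm{corr}^2(\bar{x}_i'\hat\beta_B,\; \bar{y}_i)$

. $R^2_{overall} = \mathrm{corr}^2(x_{it}'\hat\beta_{OLS},\; y_{it})$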
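
The within/between decomposition of total variability can be verified on any balanced panel. A short sketch with hypothetical simulated data (all names and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
N, T = 50, 6
y = rng.normal(size=(N, 1)) + rng.normal(size=(N, T))   # unit component + noise

ybar_i = y.mean(axis=1, keepdims=True)    # unit means, shape (N, 1)
ybar = y.mean()                           # overall mean

total = ((y - ybar) ** 2).mean()          # (1/NT) sum_{i,t} (y_it - ybar)^2
within = ((y - ybar_i) ** 2).mean()       # (1/NT) sum_{i,t} (y_it - ybar_i)^2
between = ((ybar_i - ybar) ** 2).mean()   # equals (1/NT) sum_{i,t} (ybar_i - ybar)^2

print(np.isclose(total, within + between))
```

The cross term vanishes because deviations from the unit mean sum to zero within each unit.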


Discussion

Discussion
Source of the examples: Wooldridge

Two questions:

. Is the unobserved effect $c_i$ uncorrelated with $x_{it}$ for all t?

. Is the strict exogeneity assumption (conditional on $c_i$) reasonable?

Examples:

(a) Program evaluation

$$\log(wage_{it}) = \theta_t + z_{it}'\gamma + \delta_1 prog_{it} + c_i + u_{it}$$

(b) Distributed Lag Model (Hausman, Hall, Griliches, 1984)

$$patents_{it} = \theta_t + z_{it}'\gamma + \delta_0 RD_{it} + \delta_1 RD_{it-1} + \dots + \delta_5 RD_{it-5} + c_i + u_{it}$$

(c) Lagged Dependent Variable

$$\log(wage_{it}) = \beta_1\log(wage_{it-1}) + c_i + u_{it}$$



Main References

Baltagi BH (2001): Econometric Analysis of Panel Data, John Wiley & Sons Ltd.

Chamberlain G (1984): Panel Data, in Griliches and Intriligator (eds.), Handbook of Econometrics, Vol. 2, Elsevier Science, Amsterdam

Greene WH (2003): Econometric Analysis, Prentice Hall, ch. 13

Hsiao C (2003): Analysis of Panel Data, Cambridge University Press

Mundlak Y (1978): “On the Pooling of Time Series and Cross Section Data”, Econometrica, 46(1), 69-85

Verbeek M (2006): A Guide to Modern Econometrics, ch. 10

Wooldridge JM (2002): Econometric Analysis of Cross Section and Panel Data, MIT Press: Cambridge, ch. 10