Lecture 6 LBS Slides
Econometrics Part I: Lecture 6
6 November 2009
J. James Reade
Admin
• Lectures:
– Today: Simultaneous equations modelling and VARs.
– Next week: Limited Dependent Variable Modelling.
– Final lecture: Treatment effects and recap.
• Lecture notes:
– Handouts for each lecture: Slides.
– Please point out typos!
• Classes: This week, next week.
• Exam: Two weeks on Tuesday.
– Prep: Assignments best guide for exam.
• Happy to chat via email or after lecture.
Today: Multiple-Equation Modelling
• Many reasons to estimate more than a single-equation model.
• Panel: Time series observed over multiple units.
– View as system of time series?
• We may want to consider several models jointly:
– Helpful if we know the disturbances are likely to be correlated.
– E.g. CAPM residuals (excess returns) correlated across different firms.
• May want to consider determination of several variables jointly:
– Endogeneity/simultaneity.
Today: Textbook Coverage
• Greene covers all material:
– SURE: Chapter 14.
– Simultaneous equations: Chapter 15.
– VARs, etc: Chapter 19.
• Another textbook gives good intro to simultaneous equations material:
– Gujarati: Basic Econometrics, ed. 4, Chs 18–21.
• Gujarati somewhat more shaky on time-series grounds though.
• Stock and Watson does not cover simultaneous equations.
The Seemingly Unrelated Regression Model
• Seemingly very similar to panel data.
• Set of data for number of units, say firms.
– May have cross-equation dependencies, e.g. firms in a similar industry.
– May be more efficient to exploit the error structure in estimation.
• E.g. Consumer demand for N goods. Could estimate N demands.
– But constraints hold across all consumers: Budget constraints etc.
• These kinds of models motivate the Seemingly Unrelated Regression approach.
– Proposed by Zellner (1962).
The Seemingly Unrelated Regression Model
• Take bivariate case: E.g. two goods, N observations for each:
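\[ y_1 = X_1\beta_1 + u_1, \quad u_1 \sim (0, \sigma_1^2), \tag{1} \]
\[ y_2 = X_2\beta_2 + u_2, \quad u_2 \sim (0, \sigma_2^2). \tag{2} \]
• Write as a system:
\[ \underbrace{\begin{pmatrix} y_1 \\ y_2 \end{pmatrix}}_{2N\times 1} = \underbrace{\begin{pmatrix} X_1 & 0 \\ 0 & X_2 \end{pmatrix}}_{2N\times(K_1+K_2)} \underbrace{\begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix}}_{(K_1+K_2)\times 1} + \underbrace{\begin{pmatrix} u_1 \\ u_2 \end{pmatrix}}_{2N\times 1}. \tag{3} \]
• Stacked form (with $\sigma_{11} = \sigma_1^2$, $\sigma_{22} = \sigma_2^2$):
\[ y = X\beta + u, \quad E(u \mid X) = 0, \quad \mathrm{Cov}(u \mid X) = \Omega = \begin{pmatrix} \sigma_{11}I_N & \sigma_{12}I_N \\ \sigma_{21}I_N & \sigma_{22}I_N \end{pmatrix}. \tag{4} \]
• Big regression problem: Use OLS?
\[ \hat\beta_{OLS} = (X'X)^{-1}X'y. \tag{5} \]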
The Seemingly Unrelated Regression Model
• OLS estimator has properties:
\[ E(\hat\beta_{OLS} \mid X) = \beta, \quad \mathrm{Cov}(\hat\beta_{OLS} \mid X) = (X'X)^{-1}X'\Omega X(X'X)^{-1}. \tag{6} \]
– Latter follows from assumption that $\mathrm{Cov}(u \mid X) = \Omega$.
• This is just a big OLS estimation: OLS on each equation separately.
• But is Ω diagonal?
– In many cases unlikely: CAPM example, demand expenditures.
– If Ω non-diagonal then OLS inefficient.
– Can gain over OLS by using GLS and exploiting cross-equation correlations.
• Using GLS is called Seemingly Unrelated Regression (SURE) analysis.
The Seemingly Unrelated Regression Model
• GLS estimator:
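\[ \hat\beta_{SURE} = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y. \tag{7} \]
• We've assumed Ω non-diagonal:
\[ \Omega = \begin{pmatrix} \sigma_{11}I_N & \sigma_{12}I_N \\ \sigma_{21}I_N & \sigma_{22}I_N \end{pmatrix}. \tag{8} \]
• So $\sigma_{12}$ and $\sigma_{21}$ non-zero.
• SURE estimation weights this information, in effect using information from equation 1 in $\hat\beta_{2,SURE}$.
– Problematic if one equation in system misspecified, as effects are transmitted across all equations.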
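A minimal sketch of feasible SURE on simulated data, assuming two equations with one regressor each and an illustrative error correlation of 0.8 (all parameter values and variable names here are hypothetical, not from the lecture). Ω is replaced by an estimate built from OLS residuals.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 500                                       # observations per equation

# Two equations with different regressors and correlated disturbances.
X1 = np.column_stack([np.ones(N), rng.normal(size=N)])
X2 = np.column_stack([np.ones(N), rng.normal(size=N)])
Sigma = np.array([[1.0, 0.8], [0.8, 1.0]])    # assumed cross-equation covariance
U = rng.multivariate_normal([0.0, 0.0], Sigma, size=N)
beta1, beta2 = np.array([1.0, 2.0]), np.array([-1.0, 0.5])
y1 = X1 @ beta1 + U[:, 0]
y2 = X2 @ beta2 + U[:, 1]

# Stack as in (3): block-diagonal X, stacked y.
X = np.block([[X1, np.zeros_like(X2)], [np.zeros_like(X1), X2]])
y = np.concatenate([y1, y2])

# Step 1: OLS on the stacked system (identical to equation-by-equation OLS).
b_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Step 2: estimate the 2x2 error covariance from the OLS residuals.
resid = (y - X @ b_ols).reshape(2, N)
S_hat = resid @ resid.T / N

# Step 3: GLS as in (7), with Omega^{-1} = S_hat^{-1} kron I_N.
Omega_inv = np.kron(np.linalg.inv(S_hat), np.eye(N))
b_sure = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ y)
print("OLS :", b_ols.round(3))
print("SURE:", b_sure.round(3))
```

Forming the full 2N × 2N matrix is only sensible for small N; it is done here to mirror (7) literally.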
Alternative Systems of Equations
• Panel data from last week seemingly a generalisation of SURE.
– Although it would appear SURE accounts for cross-section dependence.¹
• May be interested not in the same data series over many different units of observation.
• What if the variables for a given unit are endogenously determined?
– In macroeconomics, it’s hard to see how data are anything but endogenous.
– Demand and supply also.
• Much of today: Simultaneous equation models.
– Extending into VAR modelling.
¹ Don’t quote me on this.
A Necessity for Simultaneous Equations
• Endogeneity pervasive in econometric modelling.
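– Regression of $y_i$ on $x_i$: $y_i = \beta x_i + \varepsilon_i$. (9)
– But $x_i$ also depends on $y_i$: $x_i = \theta y_i + e_i$. (10)
– Estimator:
\[ \hat\beta = \frac{\sum_{i=1}^N x_i y_i}{\sum_{i=1}^N x_i^2} = \beta + \frac{\sum_{i=1}^N x_i\varepsilon_i}{\sum_{i=1}^N x_i^2}. \tag{11} \]
– But $E(x_i\varepsilon_i) = E((\theta y_i + e_i)\varepsilon_i) = E((\theta(\beta x_i + \varepsilon_i) + e_i)\varepsilon_i) = \theta\beta E(x_i\varepsilon_i) + \theta E(\varepsilon_i^2)$, so with $E(e_i\varepsilon_i) = 0$: $E(x_i\varepsilon_i) = \theta E(\varepsilon_i^2)/(1 - \theta\beta) \neq 0$.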
• Endogeneity or simultaneity manifested in correlation between errors and variables.
– Already considered many strategies for this ailment.
• Today we consider estimating each possible equation.
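A small simulation sketch of the bias in (11), assuming unit-variance independent errors and illustrative values β = θ = 0.5 (hypothetical choices). Solving (9) and (10) jointly gives $x_i = (\theta\varepsilon_i + e_i)/(1 - \theta\beta)$, which is how the data are generated.

```python
import numpy as np

rng = np.random.default_rng(1)
n, beta, theta = 200_000, 0.5, 0.5            # illustrative values

eps = rng.normal(size=n)                      # error in the y equation
e = rng.normal(size=n)                        # error in the x equation

# Joint solution of y = beta*x + eps and x = theta*y + e.
x = (theta * eps + e) / (1 - theta * beta)
y = beta * x + eps

# OLS without intercept, as in (11).
b_ols = (x @ y) / (x @ x)

# Probability limit implied by E(x*eps) and E(x^2) with unit variances.
plim = beta + theta * (1 - theta * beta) / (theta**2 + 1)
print(f"true beta = {beta}, OLS = {b_ols:.3f}, plim = {plim:.3f}")
```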
Multiple-Equation Modelling: Supply and Demand
• E.g. Market for PhDs: Demand, supply, equilibrium wages (w) and employment (e).
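– Market-clearing w, e caused by demand and supply.
– But demand and supply affected by what w and e are.
– Joint determination of equilibrium quantities.
• Then quantity demanded of PhDs $q^d_t$ is: $q^d_t = \beta_{11} + \beta_{12}w_t + \varepsilon_{1t}$. (12)
• But the quantity supplied $q^s_t$ is: $q^s_t = \beta_{21} + \beta_{22}w_t + \varepsilon_{2t}$. (13)
• OLS estimation of either $\beta_{12}$ or $\beta_{22}$ problematic: Setting $q^d_t = q^s_t$ gives:
\[ w_t = \frac{\beta_{11} - \beta_{21}}{\beta_{22} - \beta_{12}} + \frac{\varepsilon_{1t} - \varepsilon_{2t}}{\beta_{22} - \beta_{12}}, \quad E(w_t\varepsilon_{1t}) \neq 0, \quad E(w_t\varepsilon_{2t}) \neq 0. \tag{14} \]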
Multiple-Equation Modelling: Macroeconomics
• Macroeconomic, or general-equilibrium, modelling:
– Interested in joint determination of many variables.
– E.g. Interest rates, output gap, inflation.
– Macro models specify equations for each of these variables.
• All such systems have two types of variable:
– Endogenous Variables: Determined within the model/system.
– Exogenous Variables: Determined outside the model/system.
∗ Also known as predetermined variables.
Multiple-Equation Modelling
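• Stylised model: M endogenous variables $Y_{1,t}, \dots, Y_{M,t}$, K exogenous variables $X_{1,t}, \dots, X_{K,t}$.
• Implies M equations, one for each endogenous variable:
\[
\begin{aligned}
Y_{1,t} &= \beta_{12}Y_{2,t} + \beta_{13}Y_{3,t} + \dots + \beta_{1M}Y_{M,t} + \gamma_{11}X_{1,t} + \dots + \gamma_{1K}X_{K,t} + \varepsilon_{1,t},\\
Y_{2,t} &= \beta_{21}Y_{1,t} + \beta_{23}Y_{3,t} + \dots + \beta_{2M}Y_{M,t} + \gamma_{21}X_{1,t} + \dots + \gamma_{2K}X_{K,t} + \varepsilon_{2,t},\\
&\;\;\vdots\\
Y_{M,t} &= \beta_{M1}Y_{1,t} + \beta_{M2}Y_{2,t} + \dots + \beta_{M,M-1}Y_{M-1,t} + \gamma_{M1}X_{1,t} + \dots + \gamma_{MK}X_{K,t} + \varepsilon_{M,t}.
\end{aligned}
\]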
• This is the structural form of an economic, or econometric model.
– Endogenous variables in terms of other endogenous and exogenous variables.
• Some or many coefficients may be restricted to zero by theory a priori.
Structural and Reduced Form Modelling
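• Could write structural form instead as: $BY_t = GX_t + \varepsilon_t$. (15)
• Where:
\[ B = \begin{pmatrix} 1 & -\beta_{12} & -\beta_{13} & \dots & -\beta_{1M} \\ -\beta_{21} & 1 & -\beta_{23} & \dots & -\beta_{2M} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ -\beta_{M1} & -\beta_{M2} & -\beta_{M3} & \dots & 1 \end{pmatrix}, \quad G = \begin{pmatrix} \gamma_{11} & \dots & \gamma_{1K} \\ \vdots & \ddots & \vdots \\ \gamma_{M1} & \dots & \gamma_{MK} \end{pmatrix}, \]
\[ \varepsilon_t = \begin{pmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \\ \vdots \\ \varepsilon_{Mt} \end{pmatrix}, \quad Y_t = \begin{pmatrix} Y_{1t} \\ Y_{2t} \\ \vdots \\ Y_{Mt} \end{pmatrix}, \quad X_t = \begin{pmatrix} X_{1t} \\ X_{2t} \\ \vdots \\ X_{Kt} \end{pmatrix}. \]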
• Clearly modelling structural form has endogeneity problems.
– Solve for endogenous variables in terms of exogenous variables. . .
The Reduced Form
• Reduced form: Represent endogenous variables in terms of exogenous variables.
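\[ Y_t = B^{-1}GX_t + B^{-1}\varepsilon_t. \tag{16} \]
• E.g. PhDs job market:
\[ q^d_t = \beta_{11} + \beta_{12}w_t + \varepsilon_{1t}, \tag{17} \]
\[ q^s_t = \beta_{21} + \beta_{22}w_t + \varepsilon_{2t}. \tag{18} \]
• Reduced form here comes from equilibrium: $q^s_t = q^d_t$ so:
\[ w_t = \frac{\beta_{11} - \beta_{21}}{\beta_{22} - \beta_{12}} + \frac{\varepsilon_{1t} - \varepsilon_{2t}}{\beta_{22} - \beta_{12}} = \Pi_0 + v_t. \tag{19} \]
• Where:
\[ \Pi_0 = \frac{\beta_{11} - \beta_{21}}{\beta_{22} - \beta_{12}}, \quad v_t = \frac{\varepsilon_{1t} - \varepsilon_{2t}}{\beta_{22} - \beta_{12}}. \tag{20} \]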
Identification Difficulties
• Can now find equilibrium quantity from either original equation:
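\[ q_t = \beta_{11} + \beta_{12}\left(\frac{\beta_{11} - \beta_{21}}{\beta_{22} - \beta_{12}} + \frac{\varepsilon_{1t} - \varepsilon_{2t}}{\beta_{22} - \beta_{12}}\right) + \varepsilon_{1t} \tag{21} \]
\[ \phantom{q_t} = \frac{\beta_{11}\beta_{22} - \beta_{12}\beta_{21}}{\beta_{22} - \beta_{12}} + \frac{\beta_{22}\varepsilon_{1t} - \beta_{12}\varepsilon_{2t}}{\beta_{22} - \beta_{12}} = \Pi_1 + e_t. \tag{22} \]
• Where:
\[ \Pi_1 = \frac{\beta_{11}\beta_{22} - \beta_{12}\beta_{21}}{\beta_{22} - \beta_{12}}, \quad e_t = \frac{\beta_{22}\varepsilon_{1t} - \beta_{12}\varepsilon_{2t}}{\beta_{22} - \beta_{12}}. \tag{23} \]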
• These are reduced-form equations for wage wt and quantity employed qt.
• Can estimate Π0, Π1: No endogeneity problem remains.
• But Π0, Π1 are non-linear functions of β11, β12, β21, β22, the structural parameters.
– Cannot uncover structural parameters from reduced form estimation: Unidentified.
Identification
• Identification achieved if unique values of structural parameters can be found.
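• Main interest in PhD model is in the structural parameters $\beta_{11}, \beta_{12}, \beta_{21}, \beta_{22}$:
\[ q^d_t = \beta_{11} + \beta_{12}w_t + \varepsilon_{1t}, \tag{24} \]
\[ q^s_t = \beta_{21} + \beta_{22}w_t + \varepsilon_{2t}. \tag{25} \]
• But OLS estimation biased and inconsistent due to $E(w_t\varepsilon_{it}) \neq 0$, $i = 1, 2$.
• Reduced-form parameters $\Pi_0$, $\Pi_1$ are consistently estimated by OLS.
• But cannot recover $\beta_{11}, \beta_{12}, \beta_{21}, \beta_{22}$ from the two numbers $\Pi_0$, $\Pi_1$.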
• Our parameters of interest are unidentified, or underidentified.
Identification Conceptually
• Recall from microeconomic theory: Market price satisfies demand and supply.
• Observed $(q_t, w_t)$ are a set of equilibrium quantity-wage pairs:
– Intersections of different demand and supply curves.
[Figure: scatter of observed equilibrium points, axes q and w; in the second build of the slide a demand curve D and a supply curve S are drawn through each point.]
Identification Difficulties
• Identification: Many observationally equivalent representations.
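• Could multiply demand equation by λ, supply equation by 1 − λ:
\[ \lambda q^d_t = \lambda\beta_{11} + \lambda\beta_{12}w_t + \lambda\varepsilon_{1t}, \tag{26} \]
\[ (1-\lambda)q^s_t = (1-\lambda)\beta_{21} + (1-\lambda)\beta_{22}w_t + (1-\lambda)\varepsilon_{2t}. \tag{27} \]
• Adding the two equations together, with $q^d_t = q^s_t = q_t$ in equilibrium, yields:
\[ q_t = \gamma_1 + \gamma_2 w_t + \eta_t, \tag{28} \]
– Where: $\gamma_1 = \lambda\beta_{11} + (1-\lambda)\beta_{21}$, $\gamma_2 = \lambda\beta_{12} + (1-\lambda)\beta_{22}$, $\eta_t = \lambda\varepsilon_{1t} + (1-\lambda)\varepsilon_{2t}$.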
• New equation (28) indistinguishable from demand or supply equations.
– Cannot tell which is which from data alone.
Seeking Identification
• Method for identification: Additional information.
• E.g. Include extra ‘shift’ variable in demand equation.
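– Then can conceptually move that variable to shift demand curve.
– Trace out supply curve.
[Figure: axes q and w; a single supply curve S1 crossed by shifted demand curves D1 to D5, whose intersections trace out the supply curve.]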
Seeking Identification
• For PhDs, add extra variable to demand equation to find supply equation.
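– Add student enrolments $s_t$ to demand equation. Need lecturers to lecture.
\[ q^d_t = \beta_{11} + \beta_{12}w_t + \gamma_{11}s_t + \varepsilon_{1t}, \tag{29} \]
\[ q^s_t = \beta_{21} + \beta_{22}w_t + \varepsilon_{2t}. \tag{30} \]
• Then reduced form is:
\[ w_t = \Pi_0 + \Pi_1 s_t + v_t, \quad q_t = \Pi_2 + \Pi_3 s_t + u_t. \tag{31} \]
• Where:
\[ \Pi_0 = \frac{\beta_{21} - \beta_{11}}{\beta_{12} - \beta_{22}}, \quad \Pi_1 = \frac{\gamma_{11}}{\beta_{22} - \beta_{12}}, \quad v_t = \frac{\varepsilon_{2t} - \varepsilon_{1t}}{\beta_{12} - \beta_{22}}, \]
\[ \Pi_2 = \frac{\beta_{12}\beta_{21} - \beta_{11}\beta_{22}}{\beta_{12} - \beta_{22}}, \quad \Pi_3 = \frac{\gamma_{11}\beta_{22}}{\beta_{22} - \beta_{12}}, \quad u_t = \frac{\beta_{12}\varepsilon_{2t} - \beta_{22}\varepsilon_{1t}}{\beta_{12} - \beta_{22}}. \]
• Five unknowns, four equations. But:
\[ \beta_{21} = \Pi_2 - \beta_{22}\Pi_0, \quad \beta_{22} = \frac{\Pi_3}{\Pi_1}. \tag{32} \]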
– Hence supply equation identified.
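A simulation sketch of recovering the supply parameters from reduced-form OLS estimates via (32), assuming illustrative structural values (β11 = 10, β12 = −1, γ11 = 1, β21 = 2, β22 = 1, all hypothetical) and independent unit-variance errors. OLS on the supply equation is shown for comparison.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 50_000
b11, b12, g11 = 10.0, -1.0, 1.0               # demand (illustrative values)
b21, b22 = 2.0, 1.0                           # supply (illustrative values)

s = rng.normal(size=T)                        # student enrolments (exogenous)
e1, e2 = rng.normal(size=T), rng.normal(size=T)

# Market clearing: solve demand = supply for the wage, then read off quantity.
w = (b21 - b11 - g11 * s + e2 - e1) / (b12 - b22)
q = b21 + b22 * w + e2

# Reduced forms (31): regress w and q on a constant and s.
Z = np.column_stack([np.ones(T), s])
pi_w = np.linalg.lstsq(Z, w, rcond=None)[0]   # (Pi0, Pi1)
pi_q = np.linalg.lstsq(Z, q, rcond=None)[0]   # (Pi2, Pi3)

# Structural supply parameters via (32).
b22_ils = pi_q[1] / pi_w[1]
b21_ils = pi_q[0] - b22_ils * pi_w[0]

# Naive OLS of q on w for comparison (suffers from simultaneity bias).
W = np.column_stack([np.ones(T), w])
b22_ols = np.linalg.lstsq(W, q, rcond=None)[0][1]
print(f"beta22: true {b22}, ILS {b22_ils:.3f}, OLS {b22_ols:.3f}")
print(f"beta21: true {b21}, ILS {b21_ils:.3f}")
```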
Checking Identification
• As earlier, we can multiply the equations by λ and (1 − λ) and add them:
\[ q_t = \gamma_0 + \gamma_1 w_t + \gamma_2 s_t + \varepsilon_t. \tag{33} \]
– $\gamma_k$ and $\varepsilon_t$ defined accordingly.
• Hence: This new representation indistinguishable from demand equation:
– Demand unidentified: Can take any linear combination of equations.
• But supply identified: Can distinguish supply from (33) by st term.
• We identify supply equation by adding term to demand equation:
– Extra demand term allows us to trace out supply curve.
– Regression means can hold all else fixed, vary $s_t$.
– Shifts demand curve while supply fixed:
∗ What we get must be supply curve.
Seeking Identification
• Can repeat ‘trick’ to identify demand: Add term to supply equation.
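– Add consultancy wage rate $c_t$ to supply equation. Outside option.
– Add student enrolments $s_t$ to demand equation. Need lecturers to lecture.
\[ q^d_t = \beta_{11} + \beta_{12}w_t + \gamma_{11}s_t + \varepsilon_{1t}, \tag{34} \]
\[ q^s_t = \beta_{21} + \beta_{22}w_t + \gamma_{21}c_t + \varepsilon_{2t}. \tag{35} \]
• Will yield reduced forms:
\[ w_t = \Pi_0 + \Pi_1 s_t + \Pi_2 c_t + v_t, \tag{36} \]
\[ q_t = \Pi_3 + \Pi_4 s_t + \Pi_5 c_t + u_t. \tag{37} \]
• 6 unknowns: $(\beta_{11}, \beta_{12}, \beta_{21}, \beta_{22}, \gamma_{11}, \gamma_{21})$, 6 equations: $(\Pi_0, \Pi_1, \Pi_2, \Pi_3, \Pi_4, \Pi_5)$.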
– All parameters identified, both equations identified.
Overidentification
• Other factors influence demand and supply. E.g. for PhDs, previous period wage.
\[ q^d_t = \beta_{11} + \beta_{12}w_t + \gamma_{11}s_t + \varepsilon_{1t}, \tag{38} \]
\[ q^s_t = \beta_{21} + \beta_{22}w_t + \gamma_{21}c_t + \gamma_{22}w_{t-1} + \varepsilon_{2t}. \tag{39} \]
• Usual problem of simultaneity bias means we look for reduced form:
\[ w_t = \Pi_0 + \Pi_1 s_t + \Pi_2 c_t + \Pi_3 w_{t-1} + v_t, \tag{40} \]
\[ q_t = \Pi_4 + \Pi_5 s_t + \Pi_6 c_t + \Pi_7 w_{t-1} + u_t. \tag{41} \]
• Got eight equations (Πs) for only seven structural parameters. E.g. the demand slope has two expressions, one from each variable excluded from demand:
\[ \beta_{12} = \Pi_6/\Pi_2, \quad \beta_{12} = \Pi_7/\Pi_3. \tag{42} \]
• Model overidentified: Too much information.
– We exclude two variables ($c_t$, $w_{t-1}$) from the demand function for just one endogenous regressor.
• Multiple expressions for a parameter such as $\beta_{12}$ may give different answers in a finite sample.
– Ambiguity transmitted to other parameters recovered using $\beta_{12}$ (here $\beta_{11}$ and $\gamma_{11}$).
• But too much information is not necessarily a bad thing:
– Estimation methods exist to handle extra information.
Identification More Formally
• Identification very difficult to grasp and to find.
• Rank and order conditions exist to check for identification:
– Facilitate finding identification via automation in computer packages.
• Recap and extension of notation:
– M: Number of endogenous variables in model/system.
– m: Number of endogenous variables in equation of model.
– K: Number of exogenous, predetermined variables in model/system.
– k: Number of exogenous, predetermined variables in equation of model/system.
The Order Condition
• Order condition is necessary but not sufficient for identification.
• Can be stated in two ways:
1. In model of M simultaneous equations, equation identified if:
• It excludes at least M − 1 variables (endogenous or otherwise).
– If fewer than M − 1 excluded, unidentified.
– If exactly M − 1 excluded, just identified.
– If more than M − 1 excluded, overidentified.
2. In M-equation system, equation identified if:
• Number of exogenous variables excluded is at least the number of endogenous variables in equation minus 1:
K − k ≥ m − 1. (44)
The Order Condition: Examples
• Simple demand example:
\[ q^d_t = \beta_{11} + \beta_{12}w_t + \varepsilon_{1t}, \tag{45} \]
\[ q^s_t = \beta_{21} + \beta_{22}w_t + \varepsilon_{2t}. \tag{46} \]
– Two endogenous variables, M = 2, each equation excludes zero variables.
∗ Unidentified.
• Add student enrolment:
\[ q^d_t = \beta_{11} + \beta_{12}w_t + \gamma_{11}s_t + \varepsilon_{1t}, \tag{47} \]
\[ q^s_t = \beta_{21} + \beta_{22}w_t + \varepsilon_{2t}. \tag{48} \]
– Two endogenous variables, M = 2, K = 1.
– Demand equation: Excludes zero variables hence unidentified.
– Supply equation: Excludes one variable ($s_t$) hence identified.
The Order Condition: Examples
• Add student enrolment, consultancy wages and lagged wages:
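\[ q^d_t = \beta_{11} + \beta_{12}w_t + \gamma_{11}s_t + \varepsilon_{1t}, \tag{49} \]
\[ q^s_t = \beta_{21} + \beta_{22}w_t + \gamma_{21}c_t + \gamma_{22}w_{t-1} + \varepsilon_{2t}. \tag{50} \]
– Two endogenous variables, M = 2, three exogenous, K = 3.
– Demand equation: Excludes two variables ($c_t$, $w_{t-1}$) hence overidentified.
– Supply equation: Excludes one variable ($s_t$) hence just identified.
• Matches the reduced-form algebra: two expressions for the demand slope $\beta_{12}$, one for the supply slope $\beta_{22}$.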
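The counting rule K − k ≥ m − 1 is easy to mechanise. A minimal sketch, with a hypothetical helper `order_condition` (constants are ignored, as in the examples above), applied to the three PhD-market systems:

```python
def order_condition(K, k, m):
    """Classify one equation using the order condition K - k >= m - 1.

    K: exogenous variables in the system, k: exogenous variables in the
    equation, m: endogenous variables in the equation. Necessary only:
    the rank condition must still be checked.
    """
    excluded, needed = K - k, m - 1
    if excluded < needed:
        return "unidentified"
    if excluded == needed:
        return "just identified"
    return "overidentified"


# (45)-(46): no exogenous variables anywhere.
print("simple demand :", order_condition(K=0, k=0, m=2))
# (47)-(48): s_t in demand only.
print("demand with s :", order_condition(K=1, k=1, m=2))
print("supply        :", order_condition(K=1, k=0, m=2))
# (49)-(50): s_t in demand; c_t and w_{t-1} in supply.
print("demand (49)   :", order_condition(K=3, k=1, m=2))
print("supply (50)   :", order_condition(K=3, k=2, m=2))
```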
The Order Condition
• Order condition probably most commonly used identification strategy.
• Use theory to argue why particular variables excluded.
• E.g. Rainfall affects supply and not demand for wheat.
• Rainfall is stark example: Not always so in economics.
– E.g. Ricardian equivalence: Debt has no real effect?!
• Implications of wrong identification strategy not innocuous:
– But very hard to test!
The Rank Condition
• Order condition necessary but not sufficient for identification.
• Even if satisfied equation may not be identified.
• E.g. If $s_t$ insignificant in demand equation, $\gamma_{11} = 0$, supply unidentified.
• Identification also violated if exogenous variables excluded not independent:
– If linear combination exists, mapping from βs and γs to Πs non-unique.
Rank Condition:
• An equation in an M-equation system is identified iff at least one non-zero determinant of order (M − 1) × (M − 1) can be constructed using the coefficients (endogenous or exogenous) of the variables excluded from that equation but included in the other equations.
– Necessary and sufficient condition for identification.
Rank Condition: An Example
• System of 4 endogenous Y variables and 3 exogenous X variables:
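\[ Y_{1t} - \beta_{10} - \beta_{12}Y_{2t} - \beta_{13}Y_{3t} - \gamma_{11}X_{1t} = u_{1t}, \tag{51} \]
\[ Y_{2t} - \beta_{20} - \beta_{23}Y_{3t} - \gamma_{21}X_{1t} - \gamma_{22}X_{2t} = u_{2t}, \tag{52} \]
\[ Y_{3t} - \beta_{30} - \beta_{31}Y_{1t} - \gamma_{31}X_{1t} - \gamma_{32}X_{2t} = u_{3t}, \tag{53} \]
\[ Y_{4t} - \beta_{40} - \beta_{41}Y_{1t} - \beta_{42}Y_{2t} - \gamma_{43}X_{3t} = u_{4t}. \tag{54} \]
• Identified? By the order condition:
Eq. No.   K − k   m − 1   Identified?
(51)      2       2       Exactly
(52)      1       1       Exactly
(53)      1       1       Exactly
(54)      2       2       Exactly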
Rank Condition: An Example
• To help with rank condition write equations in table:
Coefficients of the variables:
Eq. No.   1       Y1      Y2      Y3      Y4   X1      X2      X3
(51)      −β10    1       −β12    −β13    0    −γ11    0       0
(52)      −β20    0       1       −β23    0    −γ21    −γ22    0
(53)      −β30    −β31    0       1       0    −γ31    −γ32    0
(54)      −β40    −β41    −β42    0       1    0       0       −γ43
• To check equation (51): Form matrix of coefficients on $Y_4$, $X_2$, $X_3$ in the other three equations:
\[ A = \begin{pmatrix} 0 & -\gamma_{22} & 0 \\ 0 & -\gamma_{32} & 0 \\ 1 & 0 & -\gamma_{43} \end{pmatrix}, \quad \det A = 0. \tag{55} \]
• Hence A not full rank: Rows/columns not linearly independent.
– Relationships exist between variables hence unidentified.
– Cannot tell (52) and (53) apart hence can’t tell (51) from either.
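The rank check for (51) can be sketched numerically, assuming arbitrary non-zero placeholder values for the unknown coefficients (the numbers below are hypothetical and only the zero pattern matters; an unlucky choice of values could understate the rank). The helper `rank_condition` is illustrative and can be applied to any row of the table.

```python
import numpy as np

# Columns: Y1, Y2, Y3, Y4, X1, X2, X3 (constant omitted: in every equation).
# Non-zero entries are arbitrary placeholders for the unknown coefficients.
C = np.array([
    [ 1.0, -0.5, -0.3, 0.0, -0.7,  0.0,  0.0],   # (51)
    [ 0.0,  1.0, -0.4, 0.0, -0.2, -0.6,  0.0],   # (52)
    [-0.8,  0.0,  1.0, 0.0, -0.9, -0.1,  0.0],   # (53)
    [-0.3, -0.2,  0.0, 1.0,  0.0,  0.0, -0.5],   # (54)
])


def rank_condition(C, eq):
    """Return (rank of A, rank condition satisfied?) for equation index eq."""
    M = C.shape[0]
    excluded = np.flatnonzero(C[eq] == 0)         # variables absent from eq
    others = [i for i in range(M) if i != eq]     # the remaining equations
    A = C[np.ix_(others, excluded)]
    r = int(np.linalg.matrix_rank(A))
    return r, r == M - 1


for eq, label in enumerate(["(51)", "(52)", "(53)", "(54)"]):
    print(label, rank_condition(C, eq))
```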
More on Identification
• Order condition necessary but not sufficient, rank condition necessary and sufficient.
• Rank tells us whether identified or not, order whether exact- or over-identification.
• Four cases:
1. If K − k > m − 1 and rank(A) = M − 1, equation overidentified.
2. If K − k = m − 1 and rank(A) = M − 1, equation exactly identified.
3. If K − k ≥ m − 1 and rank(A) < M − 1, equation underidentified.
4. If K − k < m − 1, equation unidentified.
• Rank condition can get difficult with large dimension systems:
– Often just order condition used if software cannot calculate.
Testing for Simultaneity
• If we have no simultaneity problem then OLS consistent.
– If we do have simultaneity then need 2SLS/IV (to come).
• Thus testing for simultaneity helpful. Hausman (1978) provided a test.
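• Consider model:
\[ Q^d_t = \alpha_0 + \alpha_1 P_t + \alpha_2 X_t + \varepsilon_{1t}, \tag{56} \]
\[ Q^s_t = \beta_0 + \beta_1 P_t + \varepsilon_{2t}. \tag{57} \]
• Reduced form:
\[ P_t = \Pi_0 + \Pi_1 X_t + v_t, \tag{58} \]
\[ Q_t = \Pi_2 + \Pi_3 X_t + e_t. \tag{59} \]
• OLS on (58) gives fitted values $\hat P_t$ and residuals $\hat v_t$, with $P_t = \hat P_t + \hat v_t$.
• Sub back into supply: $Q_t = \beta_0 + \beta_1\hat P_t + \beta_1\hat v_t + \varepsilon_{2t}$.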
Testing for Simultaneity
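• Test equation: $Q_t = \beta_0 + \beta_1\hat P_t + \beta_1\hat v_t + \varepsilon_{2t} = \beta_0 + \beta_1\hat P_t + \theta\hat v_t + \varepsilon_{2t}$.
• If simultaneity then $v_t$ correlated with $\varepsilon_{2t}$:
– $\hat v_t$ is variation remaining in $P_t$ after controlling for exogenous variables.
– We have split $P_t$ into an exogenous and a potentially simultaneous component via instrumenting.
• If simultaneity then $\hat v_t$ significant if run OLS on test equation. Since:
\[ \hat\theta = \frac{\sum_{t=1}^T Q_t\hat v_t}{\sum_{t=1}^T \hat v_t^2} = \frac{\sum_{t=1}^T Q_t(P_t - \hat P_t)}{\sum_{t=1}^T \hat v_t^2}. \tag{60} \]
• Hence Hausman test for simultaneity is a t-test on the coefficient of $\hat v_t$:
– If test rejects, conclude simultaneity.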
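A simulation sketch of a regression-based simultaneity test. Note the swap: this uses the closely related Durbin-Wu-Hausman variant in which the actual $P_t$ (rather than $\hat P_t$) enters alongside the first-stage residual, because in that form the residual's coefficient is zero when $P_t$ is exogenous. Parameter values and the helper `ols` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
T = 2000
a0, a1, a2 = 10.0, -1.0, 1.0                  # demand (illustrative values)
b0, b1 = 2.0, 1.0                             # supply (illustrative values)

X = rng.normal(size=T)
e1, e2 = rng.normal(size=T), rng.normal(size=T)
P = (a0 - b0 + a2 * X + e1 - e2) / (b1 - a1)  # market-clearing price
Q = b0 + b1 * P + e2


def ols(y, Z):
    """OLS coefficients, conventional standard errors and residuals."""
    coef = np.linalg.lstsq(Z, y, rcond=None)[0]
    resid = y - Z @ coef
    sigma2 = resid @ resid / (len(y) - Z.shape[1])
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(Z.T @ Z)))
    return coef, se, resid


const = np.ones(T)
# First stage: reduced form (58), keep the residuals v_hat.
_, _, v_hat = ols(P, np.column_stack([const, X]))
# Test regression: supply equation augmented with v_hat.
coef, se, _ = ols(Q, np.column_stack([const, P, v_hat]))
t_v = coef[2] / se[2]
print(f"coefficient on P: {coef[1]:.3f}, t-stat on v_hat: {t_v:.1f}")
```

A large |t| on `v_hat` points to simultaneity; in this design it is large by construction because price and quantity are jointly determined.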
Estimation
• Estimating system: $BY_t = GX_t + \varepsilon_t$ (61)
– M endogenous variables in M × 1 vector $Y_t$, M × M coefficient matrix B.
– K exogenous variables in K × 1 vector $X_t$, M × K coefficient matrix G.
• Recall reduced form:
\[ Y_t = B^{-1}GX_t + B^{-1}\varepsilon_t = \Pi X_t + u_t. \tag{62} \]
• OLS estimation of Π consistent.
– But Π rarely of interest: These are reduced-form parameters.
– B, G more of interest but not necessarily identified.
Estimation
• Two estimation methods:
1. Limited Information Methods:
• Estimate each equation of system separately.
• Take into account restrictions on that equation, not others.
2. Full Information Methods:
• Estimate all equations jointly, or simultaneously.
• Impose all restrictions on all equations (required for identification).
Other Special Cases
• If the model is recursive, also known as triangular or causal, then:
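\[ B = \begin{pmatrix} 1 & 0 & 0 & \dots & 0 \\ -\beta_{21} & 1 & 0 & \dots & 0 \\ -\beta_{31} & -\beta_{32} & 1 & \dots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ -\beta_{M1} & -\beta_{M2} & -\beta_{M3} & \dots & 1 \end{pmatrix}. \tag{63} \]
• I.e. $Y_1$ causes $Y_2$ causes $Y_3$ etc. hence no endogeneity problems.
• Also require that $\mathrm{Cov}(\varepsilon_t) = \Omega$ is diagonal, i.e. $\mathrm{Cov}(\varepsilon_{it}, \varepsilon_{jt}) = 0$ for all $j \neq i$.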
• Here OLS is consistent.
Other Special Cases
• Vector autoregression: E.g. VAR(2):
\[ X_t = \Pi_1 X_{t-1} + \Pi_2 X_{t-2} + \varepsilon_t. \tag{64} \]
• Here:
\[ X_t = \begin{pmatrix} X_{1t} \\ X_{2t} \\ \vdots \\ X_{pt} \end{pmatrix}. \tag{65} \]
• Model has no contemporaneous values of $X_1, \dots, X_p$ on RHS hence OLS consistent.
• Usually economic theory imposes structure on contemporaneous variables:
\[ BX_t = \Pi_1 X_{t-1} + \Pi_2 X_{t-2} + \varepsilon_t \tag{66} \]
– Then again got issue of endogeneity.
– More on VARs later. . .
Limited Information Methods
• In just-identified case, we can use indirect least squares (ILS). Model:
\[ q^d_t = \alpha_0 + \alpha_1 p_t + \alpha_2 s_t + \varepsilon_{1t}, \tag{67} \]
\[ q^s_t = \beta_0 + \beta_1 p_t + \beta_2 c_t + \varepsilon_{2t}. \tag{68} \]
• Proceed in three steps:
1. Find reduced-form equations:
– Endogenous variables in terms of only exogenous variables.
\[ p_t = \Pi_0 + \Pi_1 s_t + \Pi_2 c_t + u_t, \]
\[ q_t = \Pi_3 + \Pi_4 s_t + \Pi_5 c_t + v_t. \]
2. Estimate reduced-form equations by OLS:
– Yields consistent estimates of reduced-form parameters as no endogeneity.
– Produces $\hat\Pi_0, \dots, \hat\Pi_5$.
3. Obtain estimates of structural coefficients by one-to-one correspondence:
– Requires just-identification to get $\hat\alpha_0, \dots, \hat\beta_2$.
Limited Information Methods
• ILS breaks down if equation overidentified:
– More than one possibility for each parameter, standard errors dubious.
• Say model is:
\[ q^d_t = \alpha_0 + \alpha_1 p_t + \alpha_2 s_t + \alpha_3 p_{t-1} + \varepsilon_{1t}, \]
\[ q^s_t = \beta_0 + \beta_1 p_t + \beta_2 c_t + \varepsilon_{2t}. \]
• Problem remains that $p_t$ endogenous hence $E(p_t\varepsilon_{1t}) \neq 0$.
• Require method to isolate component of $p_t$ correlated with $\varepsilon_{1t}$.
• Can use instrumental variable estimation:
– Exogenous variables $s_t$, $p_{t-1}$, $c_t$ satisfy one instrumenting condition.
∗ Namely uncorrelatedness with the error term.
– Expect relevance condition to hold; otherwise unidentified.
– Method: Two-stage least squares.
Two-Stage Least Squares
1. Regress endogenous variable $p_t$ on all exogenous variables in system:
\[ p_t = \pi_0 + \pi_1 s_t + \pi_2 p_{t-1} + \pi_3 c_t + u_t. \tag{69} \]
• Yields estimates $\hat\pi_0, \dots, \hat\pi_3$ to use to get fitted values $\hat p_t$.
• $\hat p_t$ is variation in $p_t$ explained by exogenous variables.
– Hence uncorrelated with error component.
• $\hat u_t = p_t - \hat p_t$ is remaining component correlated with error.
2. Run original system equation using $\hat p_t$ in place of $p_t$:
\[ q^d_t = \alpha_0 + \alpha_1\hat p_t + \alpha_2 s_t + \alpha_3 p_{t-1} + \varepsilon_{1t}. \tag{70} \]
• Resulting estimate $\hat\alpha_1$ consistent provided exogenous variables valid instruments.
• Isolate and remove component of $p_t$ correlated with error term by instrumenting.
• Can estimate even if model overidentified as here.
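The two stages can be sketched on simulated data, assuming illustrative parameter values for the demand and supply system (α0 = 10, α1 = −1, α2 = 1, α3 = 0.5, β0 = 2, β1 = 1, β2 = −1, all hypothetical) and independent unit-variance errors.

```python
import numpy as np

rng = np.random.default_rng(4)
T = 20_000
a0, a1, a2, a3 = 10.0, -1.0, 1.0, 0.5         # demand (illustrative values)
b0, b1, b2 = 2.0, 1.0, -1.0                   # supply (illustrative values)

s, c = rng.normal(size=T), rng.normal(size=T)
e1, e2 = rng.normal(size=T), rng.normal(size=T)

# Simulate the market-clearing price recursively (p_{t-1} enters demand).
p, p_lag, prev = np.zeros(T), np.zeros(T), 0.0
for t in range(T):
    p_lag[t] = prev
    p[t] = (a0 - b0 + a2 * s[t] + a3 * prev - b2 * c[t] + e1[t] - e2[t]) / (b1 - a1)
    prev = p[t]
q = b0 + b1 * p + b2 * c + e2

const = np.ones(T)
# Stage 1: regress p_t on all exogenous variables, as in (69).
Z = np.column_stack([const, s, p_lag, c])
p_hat = Z @ np.linalg.lstsq(Z, p, rcond=None)[0]

# Stage 2: demand equation with p_hat in place of p_t, as in (70).
X_hat = np.column_stack([const, p_hat, s, p_lag])
a_2sls = np.linalg.lstsq(X_hat, q, rcond=None)[0]

# Residuals for standard errors use the actual p_t, not p_hat.
X = np.column_stack([const, p, s, p_lag])
resid = q - X @ a_2sls

# Naive OLS on the demand equation for comparison.
a_ols = np.linalg.lstsq(X, q, rcond=None)[0]
print(f"alpha1: true {a1}, 2SLS {a_2sls[1]:.3f}, OLS {a_ols[1]:.3f}")
```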
More on 2SLS
• 2SLS estimator: $\hat\beta_{2SLS} = (Z_i'X_i)^{-1}(Z_i'y_i) = \beta + (Z_i'X_i)^{-1}(Z_i'\varepsilon_i)$. (71)
– Where $Z_i$ holds the instruments (exogenous variables), $X_i$ the regressors (including the endogenous ones) and $y_i$ the dependent variable.
• As before, need $E(Z_i'X_i) \neq 0$ and $E(Z_i'\varepsilon_i) = 0$ for consistency:
– I.e. Need exogenous variables to be exogenous and to identify system.
• First stage provides test of relevance of instruments.
• Need to ensure standard errors correctly calculated on second stage:
– Use $p_t$, not $\hat p_t$, to calculate residuals hence standard errors etc.
Full Information Methods
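• Write as stacked system: $y = Z\delta + \varepsilon$, $\varepsilon \sim (0, \Sigma)$. (72)
• Or:
\[ \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_M \end{pmatrix} = \begin{pmatrix} Z_1 & 0 & \dots & 0 \\ 0 & Z_2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & Z_M \end{pmatrix} \begin{pmatrix} \delta_1 \\ \delta_2 \\ \vdots \\ \delta_M \end{pmatrix} + \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_M \end{pmatrix}. \tag{73} \]
• Here: $Z_m = (Y_m, X_m)$, hence both endogenous and exogenous variables.
• OLS estimator:
\[ \hat\delta_{OLS} = (Z'Z)^{-1}Z'y. \tag{74} \]
– OLS on system equivalent to equation-by-equation OLS hence inconsistent.
∗ Also, by the SUR argument, inefficient: Does not exploit information in Σ.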
Full Information: 3SLS
• Method to solve inconsistency:
– Instrumental Variables estimation.
• Method to solve inefficiency:
– Feasible GLS estimation.
• Hence find estimator in three stages:
– Three-stage least squares:
Full Information: 3SLS
1. Instrumenting equation estimation:
• Use all exogenous variables X as instruments.
• Yielding $\hat\Pi$ to create fitted values $\hat Y_i = X\hat\Pi_i$, where i = 2, . . . , M for equation 1, etc.
– Gives matrix $\hat Z_i$ formed of $\hat Y_i$ and $X_i$.
2. Instrumental variables estimation:
• Use $\hat Z_i$ in place of $Z_i$.
• Yields 2SLS estimators $\hat\delta_{2SLS}$:
\[ \hat\delta_{2SLS} = (\hat Z'\hat Z)^{-1}(\hat Z'y) = (W'Z)^{-1}W'y, \quad W = \hat Z. \tag{75} \]
• Estimator consistent only if $E(W'\varepsilon) = 0$, $E(W'Z) \neq 0$.
• Also get variance-covariance matrix estimator $\hat\Sigma$:
\[ \hat\sigma_{ij} = T^{-1}(y_i - Z_i\hat\delta_i)'(y_j - Z_j\hat\delta_j). \tag{76} \]
Full Information: 3SLS
3. Feasible GLS estimation:
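• Use $\hat Z$ and $\hat\Sigma$ from 2SLS estimates.
\[ \hat\delta_{IV,GLS} = \hat\delta_{3SLS} = (\hat Z'\hat\Sigma^{-1}\hat Z)^{-1}\hat Z'\hat\Sigma^{-1}y = (W'\hat\Sigma^{-1}Z)^{-1}W'\hat\Sigma^{-1}y. \tag{77} \]
– In the stacked system $\hat\Sigma^{-1}$ is shorthand for $\hat\Sigma^{-1} \otimes I_T$.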
• 3SLS consistent provided instruments are valid.
• Asymptotic efficiency amongst system IV estimators.
Full-Information Maximum Likelihood (FIML)
• Likelihood framework can be applied to system.
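• Begin with reduced-form: $Y = X\Pi + V$. (78)
– Each row of V assumed multivariate Normal: $v_i \mid X \sim N(0, \Omega)$.
• Log-likelihood function:
\[ \ln L = -\frac{T}{2}\left[M\ln(2\pi) + \ln|\Omega| + \mathrm{tr}\left(\Omega^{-1}W\right)\right]. \tag{79} \]
• Where: $W_{ij} = T^{-1}(y_i - X\pi_i)'(y_j - X\pi_j)$. (80)
– Here, $y_i$ and $\pi_i$ are the ith columns of Y and Π; π is not the number.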
• Maximise likelihood subject to all restrictions placed on system: B matrix.
Full-Information Maximum Likelihood (FIML)
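• Reduced form $Y = X\Pi + V$ found from structural form:
\[ Y\Gamma = XB + U, \quad U \sim (0, \Sigma), \]
\[ \Rightarrow\; Y = XB\Gamma^{-1} + U\Gamma^{-1}, \quad U\Gamma^{-1} \sim (0, \Gamma^{-1\prime}\Sigma\Gamma^{-1}). \]
• Interested in structural form not reduced form so use substitutions:
\[ \Pi = B\Gamma^{-1}, \quad \Omega = \Gamma^{-1\prime}\Sigma\Gamma^{-1}, \quad \Omega^{-1} = \Gamma\Sigma^{-1}\Gamma'. \tag{81} \]
• Hence:
\[ \ln L = -\frac{T}{2}\left[M\ln(2\pi) + \ln\left|\Gamma^{-1\prime}\Sigma\Gamma^{-1}\right| + T^{-1}\,\mathrm{tr}\left(\Gamma\Sigma^{-1}\Gamma'(Y - XB\Gamma^{-1})'(Y - XB\Gamma^{-1})\right)\right]. \]
• Simplified, using $\ln|\Gamma^{-1\prime}\Sigma\Gamma^{-1}| = \ln|\Sigma| - 2\ln|\Gamma|$ and $(Y - XB\Gamma^{-1})\Gamma = Y\Gamma - XB$:
\[ \ln L = -\frac{T}{2}\left[M\ln(2\pi) - 2\ln|\Gamma| + \ln|\Sigma| + \mathrm{tr}\left(\Sigma^{-1}S\right)\right]. \]
• Where: $s_{ij} = T^{-1}(Y\Gamma_i - XB_i)'(Y\Gamma_j - XB_j)$, with $\Gamma_i$, $B_i$ the ith columns of Γ, B. (82)
• Maximise likelihood to yield $\hat\Gamma$, $\hat B$ matrices.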
Full-Information Methods
• FIML:
– Coherent estimation framework.
– Testing feasible via LR, LM, Wald tests.
– But: Normality assumption may not be valid.
– But: Numerical optimisation.
• 3SLS:
– Vastly easier to compute: No numerical methods.
– If Normal errors assumed, 3SLS and FIML same asymptotic properties.
∗ 3SLS thus much more popular in usage.
– Small samples: Because many parameters to estimate, 3SLS and FIML may diverge.
Simultaneous Equation Methods Condensed
• Often need to estimate more than one equation:
– Similar regression for different firms, goods, etc.
– Equations for all endogenous variables in system.
• Estimation:
– Instrumental variables to counter endogeneity.
∗ Exogenous variables are instruments.
– Full- or limited-information (IV) methods:
∗ Full can be computationally cumbersome.
∗ Limited information methods inefficient.
Break time?
Adding Lagged Variables: Vector Autoregressions
• Simultaneous equations models often used in time-series context:
• Huge macroeconomic models of 1960s and 1970s:
– Klein-Goldberger model of US economy: 20 equations.
– Brookings-Social Science Research Council model: 150 equations.
• Models need not be time series but generally are.
• What if many lags included? Stability?
• Alternative simultaneous equations model:
– The Vector Autoregression (VAR, not VaR).
– Only endogenous variables, but no contemporaneous terms.
– Often used for ‘theory-free’ estimation or forecasting.
Vector Autoregressive Models
• Already noted second-order VAR: Two lags:
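\[ X_t = \Pi_0 + \Pi_1 X_{t-1} + \Pi_2 X_{t-2} + \varepsilon_t. \]
\[ \begin{pmatrix} X_{1,t} \\ \vdots \\ X_{p,t} \end{pmatrix} = \begin{pmatrix} \pi_{01} \\ \vdots \\ \pi_{0p} \end{pmatrix} + \begin{pmatrix} \pi^1_{11} & \dots & \pi^1_{1p} \\ \vdots & \ddots & \vdots \\ \pi^1_{p1} & \dots & \pi^1_{pp} \end{pmatrix} \begin{pmatrix} X_{1,t-1} \\ \vdots \\ X_{p,t-1} \end{pmatrix} + \begin{pmatrix} \pi^2_{11} & \dots & \pi^2_{1p} \\ \vdots & \ddots & \vdots \\ \pi^2_{p1} & \dots & \pi^2_{pp} \end{pmatrix} \begin{pmatrix} X_{1,t-2} \\ \vdots \\ X_{p,t-2} \end{pmatrix} + \begin{pmatrix} \varepsilon_{1,t} \\ \vdots \\ \varepsilon_{p,t} \end{pmatrix}. \]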
• All variables X1, X2, . . . , Xp determined within system. Endogenous.
• But: No variables dated t enter each equation:
– Hence can apply time-series methods to estimate.
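Because only lags appear on the right-hand side, each equation of the VAR can be estimated by OLS. A minimal sketch for a bivariate VAR(2), assuming illustrative (stationary) coefficient matrices; all numbers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(5)
T, p = 5000, 2
Pi0 = np.array([0.1, -0.2])
Pi1 = np.array([[0.5, 0.1], [0.0, 0.4]])      # illustrative coefficients
Pi2 = np.array([[-0.2, 0.0], [0.1, 0.1]])

# Simulate X_t = Pi0 + Pi1 X_{t-1} + Pi2 X_{t-2} + eps_t.
X = np.zeros((T, p))
for t in range(2, T):
    X[t] = Pi0 + Pi1 @ X[t - 1] + Pi2 @ X[t - 2] + rng.normal(size=p)

# Regressors: constant, X_{t-1}, X_{t-2}; one column of Y per equation.
Z = np.column_stack([np.ones(T - 2), X[1:-1], X[:-2]])
Y = X[2:]
coef = np.linalg.lstsq(Z, Y, rcond=None)[0]   # shape (1 + 2p, p)

# Rearrange so that row j holds the coefficients of equation j.
Pi0_hat = coef[0]
Pi1_hat = coef[1:1 + p].T
Pi2_hat = coef[1 + p:].T
print("Pi1_hat:\n", Pi1_hat.round(2))
print("Pi2_hat:\n", Pi2_hat.round(2))
```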
The Vector Autoregressive Model
• Object of interest: set of variables over time:
• $X_t$ p-dimensional data vector at time t:
\[ X_t = \begin{pmatrix} X_{1,t} \\ X_{2,t} \\ \vdots \\ X_{p,t} \end{pmatrix}. \tag{83} \]
• p variables relating to particular problem of interest:
– $X_t$ is snapshot at point in time, e.g. t = 2008:3:
\[ X_t = \begin{pmatrix} (tax/Y)_t \\ (G/Y)_t \\ (D/Y)_t \\ \pi_t \\ Y^{gap}_t \end{pmatrix} = \begin{pmatrix} 0.180 \\ 0.218 \\ 0.695 \\ 0.053 \\ -0.022 \end{pmatrix}. \tag{84} \]
• X is p × T data matrix: (85)
            1966:1   1966:2   1966:3   1966:4   . . .   2008:3
(tax/Y)     0.172    0.176    0.176    0.176    . . .   0.180
(G/Y)       0.165    0.172    0.174    0.177    . . .   0.218
(D/Y)       0.416    0.405    0.409    0.408    . . .   0.695
π           0.023    0.028    0.033    0.037    . . .   0.053
Y gap       0.058    0.050    0.046    0.042    . . .   −0.022
The Vector Autoregressive Model
• The p variables $X_{1,t}, \dots, X_{p,t}$ may be:
– Same variable, different countries/regions/firms/people: e.g. exchange rates.
– Different variables, same country/region/firm/person: e.g. consumption, income.
• Can view as reduced-form modelling:
– Without assuming exogeneity.
• Can test exogeneity assumptions easily.
– Variable $X_{p,t}$ exogenous if determined outside system.
– Hence if all coefficients on $X_{s,t-k}$ insignificant, $s \neq p$, $k > 0$, $X_{p,t}$ exogenous.
– Can then omit equation in $X_{p,t}$ from system.
∗ Similar to Granger causality test: More later. . .
What are VARs Used For?
• Forecasting:
– All RHS variables are lagged: Can forecast tomorrow given today’s data.
• Theory-free modelling:
– Sims: No need for ‘incredible restrictions’ implied by theory.
• Theory-full modelling:
– VARs correspond well to reduced form of many macroeconomic theory models.
– Theory interested in effect of impulse to variables through time:
∗ E.g. Effect of government spending on GDP.
What are VARs Used For?
• VARs give extremely rich characterisation of dynamics in data.
– Autoregressive and distributed lag components.
– Can cope with unit roots as in AR(k) model: Cointegration.
– Deterministic terms can be incorporated (with care).
– No conditioning or exogeneity assumed.
• Many theories in economics and more widely postulate steady states.
– Yet data are non-stationary and endogenous.
– Concept of cointegration means can estimate steady state.
– VAR model allows endogeneity and testing of exogeneity.
Cointegration
• If Yt, Xt both I(d) then in general any linear combination will also be I(d).
• If there exists linear combination et = Yt − βXt s.t. et ∼ I(d− b), b > 0 then:
– Yt, Xt are cointegrated of order (d, b).
• In reality rarely find anything other than cointegration of order (1, 1): ‘Cointegration’.
• Cointegration very powerful concept in time-series econometrics.
– Recall spurious regression? Brazilian rainfall causes UK GDP?
– Cointegration allows estimation of levels relationships in data.
∗ Even if data non-stationary.
• Same spuriousness possible in VAR, simultaneous equation modelling.
Going back to our ADL Model
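• ADL model: $Y_t = \delta + \alpha_1 Y_{t-1} + \beta_0 X_t + \beta_1 X_{t-1} + \varepsilon_t$. (86)
• Error/equilibrium-correction mechanism (ECM) with long-run solution nested is:
\[ \Delta Y_t = \beta_0\Delta X_t + (\alpha_1 - 1)\left[Y_{t-1} - \frac{\delta}{1-\alpha_1} - \frac{\beta_0+\beta_1}{1-\alpha_1}X_{t-1}\right] + \varepsilon_t. \tag{87} \]
• Define $ecm_t$ as deviation from long-run solution:
\[ ecm_t = Y_t - \frac{\delta}{1-\alpha_1} - \frac{\beta_0+\beta_1}{1-\alpha_1}X_t. \tag{88} \]
• Subtract $\left[\delta/(1-\alpha_1) + (\beta_0+\beta_1)X_t/(1-\alpha_1)\right]$ from both sides of ADL:
\[ ecm_t = \delta - \frac{\delta}{1-\alpha_1} + \alpha_1 Y_{t-1} + \left(\beta_0 - \frac{\beta_0+\beta_1}{1-\alpha_1}\right)X_t + \beta_1 X_{t-1} + \varepsilon_t. \tag{89} \]
• Collect terms, adding and subtracting a term in $X_{t-1}$ to create $ecm_{t-1}$:
\[
\begin{aligned}
ecm_t &= \alpha_1\left[Y_{t-1} - \frac{\delta}{1-\alpha_1} - \frac{\beta_0+\beta_1}{1-\alpha_1}X_{t-1}\right] + \left[\frac{\alpha_1(\beta_0+\beta_1)}{1-\alpha_1} + \beta_1\right](X_{t-1} - X_t) + \varepsilon_t\\
&= \alpha_1 ecm_{t-1} + \left(\frac{\beta_1 + \alpha_1\beta_0}{1-\alpha_1}\right)(X_{t-1} - X_t) + \varepsilon_t\\
&= \alpha_1 ecm_{t-1} + \xi_t.
\end{aligned}
\]
– Where: $\xi_t = \varepsilon_t - \left(\frac{\beta_1 + \alpha_1\beta_0}{1-\alpha_1}\right)\nu_t$.
– $\nu_t$ is error term on random walk for $X_t$: $\nu_t = X_t - X_{t-1}$.
• Hence: $ecm_t = \alpha_1 ecm_{t-1} + \xi_t$, $\xi_t \sim I(0)$. (90)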
What was that for?
• We just found that: $ecm_t = \alpha_1 ecm_{t-1} + \xi_t$, $\xi_t \sim I(0)$. (91)
• Thus provided $|\alpha_1| < 1$, then $ecm_t \sim I(0)$:
– Recall: $ecm_t = Y_t - \frac{\delta}{1-\alpha_1} - \frac{\beta_0+\beta_1}{1-\alpha_1}X_t$.
– $ecm_t$ is deviation from long-run solution for $Y_t$ and $X_t$, the steady state.
– Steady-state relationships can exist in context of non-stationary model.
∗ Cointegration: $X_t \sim I(1)$, $Y_t \sim I(1)$, but $ecm_t = f(Y_t, X_t) \sim I(0)$.
• Fundamental concept in econometrics:
– Even though data non-stationary can still estimate static steady-state conditions.
– Appeals to common sense and to economic theory: Allows estimation.
\[ \Delta Y_t = \beta_0\Delta X_t + (\alpha_1 - 1)\left[Y_{t-1} - \kappa_0 - \kappa_1 X_{t-1}\right] + \varepsilon_t. \tag{92} \]
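The ADL-to-ECM algebra can be checked numerically. A sketch, assuming illustrative parameter values (δ = 1, α1 = 0.7, β0 = 0.4, β1 = 0.2, all hypothetical) and a random-walk $X_t$: data are generated from the ADL, and both the ECM form and the AR(1) recursion for $ecm_t$ should then hold exactly, up to rounding.

```python
import numpy as np

rng = np.random.default_rng(6)
delta, a1, b0, b1 = 1.0, 0.7, 0.4, 0.2        # illustrative ADL parameters
T = 200

X = np.cumsum(rng.normal(size=T))             # random walk regressor
eps = rng.normal(size=T)
Y = np.zeros(T)
for t in range(1, T):
    Y[t] = delta + a1 * Y[t - 1] + b0 * X[t] + b1 * X[t - 1] + eps[t]

# Long-run coefficients kappa0, kappa1 of the steady-state relation.
k0 = delta / (1 - a1)
k1 = (b0 + b1) / (1 - a1)

# ECM form: right-hand side of the error-correction equation.
dX = X[1:] - X[:-1]
dY_ecm = b0 * dX + (a1 - 1) * (Y[:-1] - k0 - k1 * X[:-1]) + eps[1:]
dY = Y[1:] - Y[:-1]

# AR(1) recursion for the equilibrium error.
ecm = Y - k0 - k1 * X
xi = eps[1:] - (b1 + a1 * b0) / (1 - a1) * dX
print("ECM matches ADL:", np.allclose(dY, dY_ecm))
print("ecm recursion holds:", np.allclose(ecm[1:], a1 * ecm[:-1] + xi))
```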
All Good, but What About Estimation?
• Cointegrating relationship is static: $Y_t = \beta X_t + ecm_t$, $ecm_t \sim N(0, \sigma^2_{ecm})$.
– Thus no problems with estimating: no residual autocorrelation.
– Despite dynamic nature of relationship, can estimate static equation.
• Furthermore: it’s ‘super consistent’: estimate converges to true value very fast.²
• Suggests a procedure: Estimate $ecm_t$, insert that into ECM and estimate.
– Engle-Granger 2-step procedure.
– Intuitive and encompasses test for cointegration: is $ecm_t \sim I(0)$?
² It converges at rate $T$ as opposed to $\sqrt{T}$ for ‘normal’ estimator.
That Cointegration Test
• Form of ADL and ECM suggest cointegration test of ADF form:
\[ \Delta ecm_t = \phi\, ecm_{t-1} + \sum_{i=1}^{p-1}\phi_i\Delta ecm_{t-i} + \mu + \delta t + \omega_t, \quad \omega_t \sim N(0, \sigma^2_\omega). \tag{93} \]
• Include trend and constant either in (93) or in $ecm_t$ equation.
– Since $ecm_t$ in (93) is just the residual from the $ecm_t$ equation.
• Cointegration test: $H_0: \phi = 0$ (no cointegration). (94)
• However, cannot use Dickey-Fuller distribution.
– Must use MacKinnon (1991) critical values: See Table 4.1, Harris and Sollis.
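A simulation sketch of the two Engle-Granger steps, assuming a cointegrated pair with slope 2 and an AR(1) equilibrium error with coefficient 0.5 (hypothetical values). The second step is the simplest version of the test regression: no lagged differences, constant or trend.

```python
import numpy as np

rng = np.random.default_rng(7)
T = 1000

X = np.cumsum(rng.normal(size=T))             # I(1) regressor
ecm = np.zeros(T)
for t in range(1, T):
    ecm[t] = 0.5 * ecm[t - 1] + rng.normal()  # stationary equilibrium error
Y = 2.0 * X + ecm                             # cointegrated with X

# Step 1: static cointegrating regression of Y on a constant and X.
Z = np.column_stack([np.ones(T), X])
coef = np.linalg.lstsq(Z, Y, rcond=None)[0]
e_hat = Y - Z @ coef

# Step 2: regress the differenced residual on its lagged level.
de, e_lag = np.diff(e_hat), e_hat[:-1]
phi = (e_lag @ de) / (e_lag @ e_lag)
resid = de - phi * e_lag
se = np.sqrt(resid @ resid / (len(de) - 1) / (e_lag @ e_lag))
t_phi = phi / se
print(f"slope estimate: {coef[1]:.3f}, phi: {phi:.3f}, t-stat: {t_phi:.1f}")
```

The t-statistic must be compared with MacKinnon-type critical values, not standard t or Dickey-Fuller tables, so no p-value is computed here.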
Sounds Too Good to be True?
• Unfortunately, it is.
• Banerjee et al (1993): $\hat\beta$ biased in small samples.
• $\hat\beta$ has very complicated distribution: cannot draw standard inference.
• The test of cointegration has low power:
– Rejects null (no cointegration) too infrequently when null false.
– Thus we conclude in favour of cointegration too little.
• Endogeneity issues: often in macro systems feedback from $Y_t$ to $X_t$.
• More than one steady-state relationship?
– Fiscal and monetary policy: why not two cointegrating relationships?
– Cannot estimate more than one here.
Solutions?
• Numerous other estimation strategies suggested for single-equation framework.
• But: all suffer from endogeneity problem and can’t estimate more than one relationship.
• Solution: Simultaneous equations, or vector autoregressive model (VAR).
– Johansen (1996) proposed VAR approach.
– Workhorse of cointegration analysis.
A first order autoregressive model
• We can build up to the VAR(k) in several steps. . .
• First order autoregressive (AR(1)) process: xt = ρxt−1 + µ+ εt. (95)
• Model solution: Recursive substitution: xt = ρtx0 +t∑i=1
ρi (µ+ εi) . (96)
– Moving average representation.
• Cases:
1. Stationarity: $z_1, \dots, z_t$ (strongly) stationary if: $(z_1, \dots, z_t) \overset{D}{=} (z_{1+s}, \dots, z_{t+s}) \ \forall s.$
– Weak, or covariance, stationary if: $E(z_t) = \mu$, $Cov(z_t, z_{t-s}) = \gamma(s)$, $\forall t$.
– With $\varepsilon_t \sim N(0, \Omega)$, weak ⇒ strong.
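The recursive-substitution solution of the AR(1) can be checked numerically. The sketch below iterates $x_t = \rho x_{t-1} + \mu + \varepsilon_t$ forward from $x_0$ and compares the result with the closed form $\rho^t x_0 + \sum_{i=0}^{t-1} \rho^i (\mu + \varepsilon_{t-i})$; the parameter values are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
rho, mu, x0, t = 0.6, 1.0, 5.0, 50     # illustrative values only
eps = rng.normal(size=t + 1)           # eps[i] is the shock at date i; eps[0] is unused

# Iterate the AR(1) forward from x0.
x = x0
for i in range(1, t + 1):
    x = rho * x + mu + eps[i]

# Closed form obtained by recursive substitution.
closed = rho**t * x0 + sum(rho**i * (mu + eps[t - i]) for i in range(t))

print(x, closed)
```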
Stationary Case
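• If $|\rho| < 1$, characterise model as:
$E(x_t \mid x_0) = \rho^t x_0 + \sum_{i=0}^{t-1} \rho^i \mu \longrightarrow \frac{\mu}{1-\rho}$
$Var(x_t \mid x_0) = E\left(\sum_{i=0}^{t-1} \rho^{2i} \varepsilon_{t-i}^2\right) = \sum_{i=0}^{t-1} \rho^{2i} \sigma^2 \longrightarrow \frac{\sigma^2}{1-\rho^2}$
$Cov(x_t, x_{t-k} \mid x_0) = E\left(\rho^k \sum_{i=0}^{t-k-1} \rho^{2i} \varepsilon_{t-k-i}^2\right) = \rho^k \sum_{i=0}^{t-k-1} \rho^{2i} \sigma^2 \longrightarrow \frac{\rho^k \sigma^2}{1-\rho^2}.$
• Process stationary asymptotically but not in small samples: for finite $t$ the moments still depend on $t$ and $x_0$.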
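The convergence of the conditional moments can be seen by evaluating the finite-$t$ sums directly. A minimal sketch, with assumed values for $\rho$, $\mu$, $\sigma^2$ and $x_0$; the function name `cond_moments` is hypothetical.

```python
import numpy as np

rho, mu, sigma2, x0 = 0.6, 1.0, 1.0, 5.0    # illustrative values only

def cond_moments(t):
    """Mean and variance of x_t given x_0 for the AR(1) with |rho| < 1."""
    powers = rho ** np.arange(t)            # rho^0, ..., rho^(t-1)
    mean = rho**t * x0 + mu * powers.sum()
    var = sigma2 * (powers**2).sum()
    return mean, var

for t in (1, 5, 20, 100):
    print(t, cond_moments(t))
print("limits:", mu / (1 - rho), sigma2 / (1 - rho**2))
```

For small $t$ the mean is still pulled towards $x_0$ and the variance is below its limit, which is the sense in which the process is only asymptotically stationary.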
Case 2: Unit-root
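• If $\rho = 1$: $x_t = x_0 + \sum_{i=1}^{t} (\mu + \varepsilon_i),$ (97)
• So:
$E(x_t \mid x_0) = x_0 + t\mu \longrightarrow \infty$
$Var(x_t \mid x_0) = E\left(\sum_{i=1}^{t} \varepsilon_i^2\right) = \sum_{i=1}^{t} \sigma^2 = t\sigma^2 \longrightarrow \infty.$
• Mean, variance functions of t: non-stationary.
• Unit root case corresponds to many economic data series.
3. Explosive case: $\rho > 1$ not considered: infinity and beyond. . .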
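A small Monte Carlo illustrates the unit-root case: across many simulated random walks with drift, the cross-sectional mean at date $t$ is close to $x_0 + t\mu$ and the variance close to $t\sigma^2$. The number of paths, the horizon and the parameter values are assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma, x0 = 0.1, 1.0, 0.0          # illustrative values only
n_paths, t_max = 20_000, 100

# Each row is one path x_1, ..., x_tmax of x_t = x_{t-1} + mu + eps_t.
shocks = mu + sigma * rng.normal(size=(n_paths, t_max))
paths = x0 + np.cumsum(shocks, axis=1)

for t in (10, 50, 100):
    col = paths[:, t - 1]
    print(f"t={t:3d}  mean={col.mean():7.3f} (theory {x0 + t * mu:.1f})  "
          f"var={col.var():8.3f} (theory {t * sigma**2:.1f})")
```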
Moving-average Representations
• MA representation: easy to characterise the data process under consideration.
– Principle same for more complicated models.
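• Also recall lag operator $L$ s.t. $L^k x_t = x_{t-k}$. If $|\rho| < 1$:
$x_t - \rho x_{t-1} = \varepsilon_t$ (98)
$(1 - \rho L)\, x_t = \varepsilon_t$ (99)
$x_t = (1 - \rho L)^{-1} \varepsilon_t = \varepsilon_t + \rho \varepsilon_{t-1} + \rho^2 \varepsilon_{t-2} + \dots,$ (100)
• Alternative derivation that assumes stationarity.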
Impulse response analysis
• MA representation also facilitates impulse response analysis.
– If economy shocked (impulsed) now, where will it be in h periods?
– Formally written: $x_{t+h} = \rho^h x_t + \sum_{i=0}^{h-1} \rho^i (\mu + \varepsilon_{t+h-i}).$ (101)
– Taking expectations: $E(x_{t+h} \mid x_t) = \rho^h x_t + \sum_{i=0}^{h-1} \rho^i \mu.$ (102)
– Impulse response defined as: $IR(h) = \frac{\partial E(x_{t+h} \mid x_t)}{\partial x_t} = \rho^h.$ (103)
– If stationary, $|\rho| < 1$, then $IR(h) \to 0$: impulse dies away.
– If unit root, $\rho = 1$, then $IR(h) = 1 \ \forall h$: shock cumulates, never dies away.
– If explosive, $|\rho| > 1$, $|IR(h)| \to \infty$.
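The three cases can be tabulated directly from $IR(h) = \rho^h$. The values of $\rho$ below (0.6, 1 and 1.03) are chosen for illustration; the helper name `impulse_response` is hypothetical.

```python
import numpy as np

def impulse_response(rho, horizons):
    """IR(h) = rho**h for the AR(1) x_t = rho * x_{t-1} + mu + eps_t."""
    return rho ** np.asarray(horizons, dtype=float)

h = np.arange(0, 101)
results = {}
for label, rho in [("stationary", 0.6), ("unit root", 1.0), ("explosive", 1.03)]:
    ir = impulse_response(rho, h)
    results[label] = ir
    print(f"{label:>10}: IR(1)={ir[1]:.3f}  IR(10)={ir[10]:.3f}  IR(100)={ir[100]:.3f}")
```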
Three Impulse Responses
[Figure: three impulse response panels, horizons 0 to 100. Stationary process $x_t = 0.6x_{t-1} + \varepsilon_t$; random walk process $x_t = x_{t-1} + \varepsilon_t$; explosive process $x_t = 1.03x_{t-1} + \varepsilon_t$.]
Some Caution on Impulse Responses. . .
• IR analysis phenomenally popular in empirical studies.
• Impulse to residual of statistical model ≠ economic shock.
– Even if model is identified.
• ‘Retail Energy Prices and Consumer Expenditures’ by Paul Edelstein and Lutz Kilian:
– IR but no formal checks on model: Confidence in output?
• Can impose restrictions to identify structure so it accords with theory:
– But in VARs, IRs heavily dependent on particular restrictions.
– Identification restrictions generally not testable.
– Causality very difficult to achieve in macroeconomics.
∗ (Identification is on contemporaneous terms in VAR)
• Impulse response analysis intuitively great but fraught with difficulties.
The AR(2) Model
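• Model: $x_t = \pi_1 x_{t-1} + \pi_2 x_{t-2} + \varepsilon_t.$ (104)
• Lag operator: $(1 - \pi_1 L - \pi_2 L^2)\, x_t = \varepsilon_t \Rightarrow \Pi(L) x_t = \varepsilon_t$ (105)
– Characteristic (lag) polynomial defined as:
$\Pi(z) = 1 - \pi_1 z - \pi_2 z^2 = (1 - \rho_1 z)(1 - \rho_2 z),$ (106)
$\frac{1}{\Pi(z)} = \frac{1}{(1 - \rho_1 z)(1 - \rho_2 z)} = \sum_{i=0}^{\infty} \rho_1^i z^i \sum_{j=0}^{\infty} \rho_2^j z^j = \sum_{n=0}^{\infty} c_n z^n,$ (107)
– $c_n \to 0$ as $n \to \infty$ if $|\rho_1| < 1$, $|\rho_2| < 1$, so MA(∞) exists (allowing for a constant $\mu$ in (104)): $x_t = \sum_{n=0}^{\infty} c_n (\mu + \varepsilon_{t-n}).$
• Taking expectations: $E(x_t \mid x_0) = \sum_{n=0}^{\infty} c_n \mu = \frac{\mu}{(1 - \rho_1)(1 - \rho_2)} = \frac{\mu}{1 - \pi_1 - \pi_2}.$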
Impulse Response Analysis Again
• Want to know impact at t+ h of impulse at t:
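$x_{t+h} = \sum_{n=0}^{\infty} c_n (\mu + \varepsilon_{t+h-n})$ (108)
$= c_0(\mu + \varepsilon_{t+h}) + c_1(\mu + \varepsilon_{t+h-1}) + \dots + c_{h-1}(\mu + \varepsilon_{t+1}) + c_h(\mu + \varepsilon_t) + c_{h+1}(\mu + \varepsilon_{t-1}) + \dots$ (109)
$= \dots + c_h (x_t - \pi_1 x_{t-1} - \pi_2 x_{t-2}) + \dots$ (110)
– Only residual $\varepsilon_t$ matters: rest set to zero.
• Hence: $IR(h) = \frac{\partial}{\partial x_t} E(x_{t+h} \mid x_0, \dots, x_t) = c_h \longrightarrow 0.$ (111)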
– Same implications as before for unit root, explosive cases.
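The MA coefficients $c_n$ of the AR(2), which are also its impulse responses, can be computed in two ways: as the convolution of the two geometric series in $\rho_1$ and $\rho_2$, or from the recursion $c_n = \pi_1 c_{n-1} + \pi_2 c_{n-2}$ with $c_0 = 1$, $c_1 = \pi_1$ (a standard identity obtained by matching powers of $z$ in $\Pi(z) \sum_n c_n z^n = 1$; it is not stated on the slides). The parameter values are assumptions chosen so that both inverse roots lie inside the unit circle.

```python
import numpy as np

pi1, pi2, mu = 0.5, 0.3, 1.0           # illustrative values only

# Inverse roots rho_1, rho_2 solve rho^2 - pi1*rho - pi2 = 0,
# so that 1 - pi1*z - pi2*z^2 = (1 - rho_1 z)(1 - rho_2 z).
rho1, rho2 = np.roots([1.0, -pi1, -pi2])

N = 200
# c_n as the convolution of the two geometric series.
c_conv = np.convolve(rho1 ** np.arange(N), rho2 ** np.arange(N))[:N]

# The same coefficients from the recursion c_n = pi1*c_{n-1} + pi2*c_{n-2}.
c = np.zeros(N)
c[0], c[1] = 1.0, pi1
for n in range(2, N):
    c[n] = pi1 * c[n - 1] + pi2 * c[n - 2]

print("inverse root moduli:", abs(rho1), abs(rho2))
print("convolution matches recursion:", np.allclose(c, c_conv.real))
print("IR(h) for h = 0..5:", np.round(c[:6], 4))
print("mean:", mu * c.sum(), "vs", mu / (1 - pi1 - pi2))
```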
Bi-variate VAR(2) model with deterministic terms
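• Model: $X_t = \Pi_1 X_{t-1} + \Pi_2 X_{t-2} + \varepsilon_t$ (112)
$\begin{pmatrix} X_{1,t} \\ X_{2,t} \end{pmatrix} = \begin{pmatrix} \pi_{1,11} & \pi_{1,12} \\ \pi_{1,21} & \pi_{1,22} \end{pmatrix} \begin{pmatrix} X_{1,t-1} \\ X_{2,t-1} \end{pmatrix} + \begin{pmatrix} \pi_{2,11} & \pi_{2,12} \\ \pi_{2,21} & \pi_{2,22} \end{pmatrix} \begin{pmatrix} X_{1,t-2} \\ X_{2,t-2} \end{pmatrix} + \begin{pmatrix} \varepsilon_{1,t} \\ \varepsilon_{2,t} \end{pmatrix}.$ (113)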
• Characteristic polynomial defined as:
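$\Pi(z) = I_2 - \Pi_1 z - \Pi_2 z^2 = \begin{pmatrix} 1 - \pi_{1,11} z - \pi_{2,11} z^2 & -\pi_{1,12} z - \pi_{2,12} z^2 \\ -\pi_{1,21} z - \pi_{2,21} z^2 & 1 - \pi_{1,22} z - \pi_{2,22} z^2 \end{pmatrix}.$ (114)
– $z$ scalar, $\pi_{k,ij}$ is $ij$th element of $\Pi_k$.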
• As in univariate system, use Π(z) to characterise properties of model.
• Multivariate equivalent to solving for roots is to solve det(Π(z)) = 0:
det(Π(z)) = (1− ρ1z)(1− ρ2z)(1− ρ3z)(1− ρ4z) = 0, (115)
– ρi functions of Π1, Π2.
• Linear algebra:
$\Pi(z)^{-1} = \frac{\mathrm{adj}(\Pi(z))}{\det(\Pi(z))},$ (116)
– $\mathrm{adj}(\Pi(z))$ adjoint/adjugate matrix: each element at most order 2 as matrix 2 × 2.
– So convergence of $\Pi(z)^{-1}$ depends on $\det(\Pi(z))$.
• We already have det(Π(z)) so:
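$\Pi(z)^{-1} = \frac{\mathrm{adj}(\Pi(z))}{\det(\Pi(z))} = \frac{P(z)}{(1 - \rho_1 z)(1 - \rho_2 z)(1 - \rho_3 z)(1 - \rho_4 z)}$
$= P(z) \sum_{i=0}^{\infty} \rho_1^i z^i \sum_{j=0}^{\infty} \rho_2^j z^j \sum_{k=0}^{\infty} \rho_3^k z^k \sum_{m=0}^{\infty} \rho_4^m z^m$
$= \sum_{n=0}^{\infty} P^*_n z^n,$
– $P(z)$ second order function of $z$ incorporated into $P^*_n$.
– $P^*_n$ exponentially convergent if $|\rho_i| < 1$.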
• If |ρi| < 1 MA(∞) representation:
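$X_t = \sum_{i=0}^{\infty} P^*_i (\Phi D_{t-i} + \varepsilon_{t-i}) = \Pi(L)^{-1} (\Phi D_t + \varepsilon_t),$
• $\Pi^{-1}(z) = \sum_{i=0}^{\infty} P^*_i z^i.$
• Hence $E(X_t) = \sum_{i=0}^{\infty} P^*_i \Phi D_{t-i}$, $Var(X_t) = \sum_{i=0}^{\infty} P^*_i \Omega P^{*\prime}_i.$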
• Xt not stationary as Dt depends on t, but Xt − E(Xt) is stationary.
The companion form of a vector autoregressive model
• Carrying on with VAR(2), useful expression is companion form:
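$\begin{pmatrix} X_t \\ X_{t-1} \end{pmatrix} = \begin{pmatrix} \Pi_1 & \Pi_2 \\ I_2 & 0 \end{pmatrix} \begin{pmatrix} X_{t-1} \\ X_{t-2} \end{pmatrix} + \begin{pmatrix} \Phi D_t + \varepsilon_t \\ 0 \end{pmatrix}$ (117)
$\mathbf{X}_t = \Xi \mathbf{X}_{t-1} + v_t,$ (118)
– $\Xi$, $\mathbf{X}_t = (X_t', X_{t-1}')'$, $v_t$ suitably defined.
• $\Xi$ is companion matrix:
– VAR(p) reduced to VAR(1) representation.
– Useful for characterising model via MA representation.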
• Roots of companion matrix = roots of system, found by solving eigenvalue problem:
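$\det\left( \begin{pmatrix} \Pi_1 & \Pi_2 \\ I_2 & 0 \end{pmatrix} - \rho \begin{pmatrix} I_2 & 0 \\ 0 & I_2 \end{pmatrix} \right) = 0.$ (119)
• Equivalently:
$\begin{pmatrix} \Pi_1 & \Pi_2 \\ I_2 & 0 \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = \rho \begin{pmatrix} I_2 & 0 \\ 0 & I_2 \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \end{pmatrix},$ (120)
• Implying:
$\Pi_1 v_1 + \Pi_2 v_2 = \rho v_1, \quad v_1 = \rho v_2, \quad \Rightarrow \quad \Pi_1 v_1 + \Pi_2 \rho^{-1} v_1 = \rho v_1.$ (121)
• $\det(A - \rho I) = 0 \iff Av = \rho I v$, then (121) ⇒ $\det(\rho I_2 - \Pi_1 - \Pi_2 \rho^{-1}) = 0$, or:
$\rho^{-1} \Pi_1 v_1 + \Pi_2 \rho^{-2} v_1 = v_1 \iff \det(I_2 - \Pi_1 \rho^{-1} - \Pi_2 \rho^{-2}) = 0.$ (122)
• If roots of characteristic polynomial ($z = \rho^{-1}$) lie outside the unit circle, then roots of companion matrix ($\rho$) lie inside the unit circle and the system is stationary.
• Intuition as in AR(1): stationarity conditions enable MA(∞) representation, allow characterisation of model.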
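The link between the companion matrix and the characteristic polynomial can be verified numerically. In this sketch the coefficient matrices `Pi1` and `Pi2` are hypothetical values for a bivariate VAR(2); the code builds $\Xi$, computes its eigenvalues $\rho$, and checks that each $z = \rho^{-1}$ solves $\det(I_2 - \Pi_1 z - \Pi_2 z^2) = 0$.

```python
import numpy as np

# Hypothetical coefficient matrices for a bivariate VAR(2).
Pi1 = np.array([[0.5, 0.1],
                [0.2, 0.3]])
Pi2 = np.array([[0.2, 0.0],
                [0.1, 0.1]])
p = Pi1.shape[0]

# Companion matrix Xi = [[Pi1, Pi2], [I, 0]].
Xi = np.block([[Pi1, Pi2],
               [np.eye(p), np.zeros((p, p))]])
rho = np.linalg.eigvals(Xi)
stationary = bool(np.all(np.abs(rho) < 1))
print("moduli of companion eigenvalues:", np.sort(np.abs(rho)))
print("stationary:", stationary)

# Each eigenvalue rho gives a root z = 1/rho of det(I - Pi1 z - Pi2 z^2) = 0.
dets = [abs(np.linalg.det(np.eye(p) - Pi1 * (1 / r) - Pi2 * (1 / r) ** 2)) for r in rho]
print("|det(Pi(1/rho))| for each eigenvalue:", dets)
```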
The unrestricted vector autoregressive model
• The unrestricted VAR model with two lags is:
Xt = Π1Xt−1 + Π2Xt−2 + ΦDt + εt
• Define:
$B = \begin{pmatrix} \Pi_1' \\ \Pi_2' \\ \Phi' \end{pmatrix}, \quad W_t = \begin{pmatrix} X_{t-1} \\ X_{t-2} \\ D_t \end{pmatrix},$ (123)
• Can simplify:
Xt = B′Wt + εt. (124)
The Assumptions of the VAR Model
• The VAR(p) depends on a number of assumptions:
1. $(X_t \mid X_{t-1}, X_{t-2}, \dots, X_{t-p})$ mutually independent.
2. $(X_t \mid X_{t-1}, X_{t-2}, \dots, X_{t-p}) \sim N(\Pi_1 X_{t-1} + \dots + \Pi_p X_{t-p} + \Phi D_t, \Sigma)$.
– Conditional Normality.
3. Parameter space exists.
• Vital that assumptions hold.
• Likelihood framework gives powerful tool for economic analysis.
– But ‘price’ is distributional assumption.
Maximum likelihood estimation of the unrestricted VAR
• First define likelihood function:
– Joint density of Xt given parameter set θ.
• Autoregressive structure requires sequential factorisation: no independence assumption.
$f(X_T, X_{T-1}, \dots, X_1, X_0) = \prod_{t=k}^{T} f(X_t \mid X_{t-1}, \dots, X_{t-k}).$ (125)
• Likelihood defined as:
$L(\theta; X_t) = \prod_{t=k}^{T} f(X_t \mid X_{t-1}, \dots, X_{t-k}; \theta).$ (126)
• Maximum likelihood estimator of $\theta$ given data $X_t$ defined as:
$\hat\theta = \arg\max_\theta L(\theta; X_t),$ (127)
• Value of $\theta$ that, given assumed distribution, maximises likelihood function.
– Measure of plausibility: how plausible is particular parameter value?
• Logarithms often used to make likelihood function tractable:
$\hat\theta = \arg\max_\theta \log L(\theta; X_t) = \arg\max_\theta \ell(\theta; X_t).$ (128)
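• For VAR, Normality assumption implies:
$\ell(\theta; X_t) = -\frac{Tp}{2} \ln(2\pi) - \frac{T}{2} \ln|\Omega| - \frac{1}{2} \sum_{t=1}^{T} (X_t - B'W_t)' \Omega^{-1} (X_t - B'W_t).$
• Likelihood maximisation implies, for $B'$:
$\min_B \sum_{t=1}^{T} (X_t - B'W_t)' \Omega^{-1} (X_t - B'W_t)$ (129)
$0 = \sum_{t=1}^{T} (X_t - \hat B'W_t) W_t'$ (130)
$\hat B' = \sum_{t=1}^{T} X_t W_t' \left( \sum_{t=1}^{T} W_t W_t' \right)^{-1} = M_{XW} M_{WW}^{-1}.$ (131)
– ML Estimators: $\hat B' = (\hat\Pi_1, \hat\Pi_2, \hat\Phi)$.
• Product moment matrices are generically defined as $M_{XW} = T^{-1} \sum_{t=1}^{T} X_t W_t'$.
• Furthermore: $\hat\varepsilon_t = X_t - \hat B' W_t$ (132)
$\hat\Omega = T^{-1} \sum_{t=1}^{T} \hat\varepsilon_t \hat\varepsilon_t' = M_{XX} - M_{XW} M_{WW}^{-1} M_{WX},$ (133)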
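The product-moment formulae translate directly into a few lines of linear algebra. The sketch below simulates a bivariate VAR(2) with a constant as the only deterministic term (the coefficient values, $\Omega = I$ and the sample size are assumptions), then forms $\hat B' = M_{XW} M_{WW}^{-1}$ and $\hat\Omega$, with each product moment scaled by $T^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(3)
Pi1 = np.array([[0.5, 0.1], [0.2, 0.3]])   # hypothetical true parameters
Pi2 = np.array([[0.2, 0.0], [0.1, 0.1]])
Phi = np.array([[1.0], [0.5]])             # coefficient on D_t = 1 (a constant)
p, T = 2, 5000

# Simulate X_t = Pi1 X_{t-1} + Pi2 X_{t-2} + Phi D_t + eps_t with Omega = I.
X = np.zeros((T + 2, p))
for t in range(2, T + 2):
    X[t] = Pi1 @ X[t - 1] + Pi2 @ X[t - 2] + Phi[:, 0] + rng.normal(size=p)

Y = X[2:]                                            # rows are X_t'
W = np.column_stack([X[1:-1], X[:-2], np.ones(T)])   # rows are W_t' = (X_{t-1}', X_{t-2}', D_t)

# Product moment matrices, scaled by 1/T.
M_XW = Y.T @ W / T
M_WW = W.T @ W / T
M_XX = Y.T @ Y / T

B_hat_T = M_XW @ np.linalg.inv(M_WW)                 # B-hat' = (Pi1-hat, Pi2-hat, Phi-hat)
resid = Y - W @ B_hat_T.T
Omega_hat = resid.T @ resid / T

print("Pi1-hat:\n", np.round(B_hat_T[:, :2], 3))
print("Pi2-hat:\n", np.round(B_hat_T[:, 2:4], 3))
print("Omega-hat:\n", np.round(Omega_hat, 3))
```

Because every equation has the same regressors $W_t$, this is equivalent to running OLS equation by equation.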
Maximised Likelihood
• Use estimators in likelihood:
$L_{max} = L(\hat B, \hat\Omega) = (2\pi)^{-Tp/2} |\hat\Omega|^{-T/2} \exp\left( -\frac{1}{2} \mathrm{tr}\left[ \hat\Omega^{-1} \sum_{t=1}^{T} \hat\varepsilon_t \hat\varepsilon_t' \right] \right)$ (134)
$L_{max}^{-2/T} = (2\pi e)^p |\hat\Omega|.$ (135)
• Very powerful result for testing:
– Regardless of model estimated with Normal distribution, get this result.
– Can impose restrictions, estimate, get $\hat\Omega_R$ and $\hat B_R$ and get:
$L_R^{-2/T} = (2\pi e)^p |\hat\Omega_R|.$ (136)
– Likelihood ratio test has easy form: ratio of residual variances.
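The step from the maximised likelihood to the determinant form can be spelled out as follows. This is a worked sketch assuming the Normal density with its usual negative exponents and $\hat\Omega = T^{-1} \sum_t \hat\varepsilon_t \hat\varepsilon_t'$; the final line is the likelihood ratio in terms of restricted and unrestricted residual covariances.

```latex
\begin{align*}
\sum_{t=1}^{T} \hat\varepsilon_t \hat\varepsilon_t' = T\hat\Omega
  \;&\Rightarrow\;
  \operatorname{tr}\Big[\hat\Omega^{-1} \sum_{t=1}^{T} \hat\varepsilon_t \hat\varepsilon_t'\Big]
  = \operatorname{tr}(T I_p) = Tp, \\
L_{\max} &= (2\pi)^{-Tp/2}\, |\hat\Omega|^{-T/2}\, e^{-Tp/2}, \\
L_{\max}^{-2/T} &= (2\pi)^{p}\, |\hat\Omega|\, e^{p} = (2\pi e)^{p}\, |\hat\Omega|, \\
-2 \ln \frac{L_R}{L_{\max}} &= T \ln \frac{|\hat\Omega_R|}{|\hat\Omega|}.
\end{align*}
```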
Testing with the Maximum Likelihood Framework
• Likelihood framework allows easy testing: likelihood ratio test.
• Test the hypothesis: H0 : θ = θ0, (137)
• Using test statistic:
$LR = -2\left( \log L(\theta_0; X_t) - \log L(\hat\theta; X_t) \right) \sim \chi^2_{\dim \theta}.$ (138)
• Test assesses plausibility of restrictions.
– If restrictions move likelihood too far from its maximum at $\hat\theta$, reject restrictions.
• Restrictions on $B'$ formed by constructing matrices R or H (see Lecture 3).
• Using H form, restrictions imposed by $B' = \psi H'$:
$X_t = B'Z_t + \varepsilon_t = \psi H' Z_t + \varepsilon_t.$ (139)
• Estimating gives restricted estimators, denoted by checks:
$\check\psi = M_{XZ} H (H' M_{ZZ} H)^{-1}$ (140)
$\check\Omega = M_{XX} - M_{XZ} H (H' M_{ZZ} H)^{-1} H' M_{ZX}$ (141)
• Likelihood ratio test: $-2 \ln(LR) = T \ln\left( |\check\Omega| / |\hat\Omega| \right) \to \chi^2_r.$ (142)
– Test statistic simple and intuitive.
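As an illustration of the determinant-ratio form of the test, the sketch below tests $H_0: \Pi_2 = 0$ (one lag against two) in a simulated bivariate VAR(2). The data-generating values, the sample size and the helper `omega_hat` are assumptions; the 5% critical value 9.488 is the $\chi^2$ value for $r = 4$ restrictions.

```python
import numpy as np

def omega_hat(Y, W):
    """Residual covariance from regressing the columns of Y on W (ML under Normality)."""
    B, *_ = np.linalg.lstsq(W, Y, rcond=None)
    resid = Y - W @ B
    return resid.T @ resid / len(Y)

rng = np.random.default_rng(4)
Pi1 = np.array([[0.5, 0.1], [0.2, 0.3]])   # hypothetical true parameters
Pi2 = np.array([[0.2, 0.0], [0.1, 0.1]])   # non-zero, so H0 is false here
p, T = 2, 5000
X = np.zeros((T + 2, p))
for t in range(2, T + 2):
    X[t] = Pi1 @ X[t - 1] + Pi2 @ X[t - 2] + rng.normal(size=p)

Y = X[2:]
W_u = np.column_stack([X[1:-1], X[:-2]])   # unrestricted: two lags
W_r = X[1:-1]                              # restricted: H0 imposes Pi2 = 0

# LR statistic: T * ln(|Omega_restricted| / |Omega_unrestricted|).
lr = T * (np.log(np.linalg.det(omega_hat(Y, W_r)))
          - np.log(np.linalg.det(omega_hat(Y, W_u))))
r = p * p                                  # number of restrictions
crit_5pct = 9.488                          # chi-square(4) critical value at 5%

print(f"LR = {lr:.2f}, r = {r}, reject H0 at 5%: {lr > crit_5pct}")
```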
The VAR Likelihood Framework
• VAR is simultaneous equations autoregressive model.
• Allows rich characterisation of dynamics of data.
• Equivalent to reduced form of economic theory models.
• Likelihood estimation consistent as no endogeneity.
– Also efficient as Ω estimated.
• Provided VAR well specified, powerful tool for exploring data:
– Forecasting.
– Impulse response analysis.
– Investigating steady-state relationships: Cointegration.
– Issues of causality and exogeneity.
Checking the VAR
• VAR provides much information on modelled data.
• Johansen (2004): “which statistical model describes the data?”
– Statistical models rely on assumptions, properties proved based on these.
– Must test assumptions hold.
– Check on unrestricted VAR before proceeding to cointegration analysis.
∗ Choice of number of cointegrating vectors vital.
∗ Akin to deciding whether data I(1) or I(0).
∗ Choice affected by model misspecification.
The Assumptions of the VAR model
• VAR model assumes:
1. Linear conditional mean explained by past observations and deterministic terms:
– Testing: Un-modelled systematic variation in residuals:
∗ Informal: plots of residuals.
∗ Formal: test for autocorrelated errors, heteroskedasticity and ARCH.
– Remedy by:
∗ Choice of lag length.
∗ Choice of information set: composition of $X_t$.
∗ Incorporate outliers.
∗ Data transformations: Non-linearity.
∗ Structural breaks: Non-constant parameters, deterministic terms.
Assumptions (continued. . . )
2. Time-invariant conditional variance:
– Heteroskedasticity and ARCH effects:
∗ Informal: plots of residuals.
∗ Formal: White test, ARCH test.
– Remedy:
∗ Add potentially causal regressors?
∗ Regime shifts in the variance: deterministic terms.
3. Independent Normal errors, mean zero, variance Ω:
– Informal testing: histogram of residuals.
– Formal testing: Autocorrelation test on residuals.
4. Parameter space:
– All model outcomes plausible?
– Remedy: data transformation, e.g. logs for % change.
Cointegration in the VAR
• Data generally non-stationary: assume Xt ∼ I(1).
• As with AR(1), reformulate: Error correction form of VAR:
∆Xt = ΠXt−1 + Γ1∆Xt−1 + · · ·+ Γk−1∆Xt−k+1 + ΦDt + εt. (143)
– $\Pi = \left( \sum_{i=1}^{k} \Pi_i \right) - I$, $\Gamma_j = -\sum_{i=j+1}^{k} \Pi_i$.
• ∆Xt ∼ I(0), εt ∼ I(0), but Xt ∼ I(1) still. (143) unbalanced.
• Solution: Π reduced rank. Then ∃ p× r matrices α, β s.t. Π = αβ′:
∆Xt = αβ′Xt−1 + Γ1∆Xt−1 + · · ·+ Γk−1∆Xt−k+1 + ΦDt + εt. (144)
– β′Xt−1 ∼ I(0): I(0) combinations of I(1) variables: cointegrating vectors.
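The mapping between the levels VAR and its error correction form can be checked with a small numerical example. The values of `alpha`, `beta` and `Gamma1` are hypothetical; for $k = 2$ the sketch uses $\Pi = \Pi_1 + \Pi_2 - I$ and $\Gamma_1 = -\Pi_2$, then confirms that $\Pi$ has reduced rank $r = 1$ and that the companion matrix has an eigenvalue at one.

```python
import numpy as np

# Hypothetical error correction parameters (p = 2, r = 1, k = 2).
alpha = np.array([[-0.2], [0.1]])            # adjustment coefficients
beta = np.array([[1.0], [-1.0]])             # cointegrating vector
Gamma1 = np.array([[0.3, 0.0], [0.1, 0.2]])  # short-run dynamics
I2 = np.eye(2)

# Levels VAR(2) implied by dX_t = alpha beta' X_{t-1} + Gamma1 dX_{t-1} + eps_t.
Pi1 = I2 + alpha @ beta.T + Gamma1
Pi2 = -Gamma1

# Map back from the levels VAR to the error correction form.
Pi = Pi1 + Pi2 - I2
Gamma1_back = -Pi2
rank_Pi = np.linalg.matrix_rank(Pi, tol=1e-10)

# The companion matrix of the levels VAR has a unit root.
Xi = np.block([[Pi1, Pi2], [I2, np.zeros((2, 2))]])
eig = np.linalg.eigvals(Xi)

print("Pi =\n", np.round(Pi, 3))
print("rank(Pi) =", rank_Pi)
print("companion eigenvalue moduli:", np.round(np.sort(np.abs(eig)), 3))
```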
A Bivariate Example
• Example: r = 1, p = 2, β 2× 1 so
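$\alpha \beta' X_{t-1} = \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} \begin{pmatrix} \beta_1 & \beta_2 \end{pmatrix} \begin{pmatrix} X_{1,t-1} \\ X_{2,t-1} \end{pmatrix} = \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} (\beta_1 X_{1,t-1} + \beta_2 X_{2,t-1}).$
• $\beta' X_t$: stationary linear combination of I(1) variables.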
• α1, α2: speed of adjustment of variables in Xt to disequilibrium.
– αi = 0 implies Xi,t weakly exogenous.
• X1,t, X2,t: consumption and income, home and foreign interest rate. . .
• Very powerful framework for analysis of steady-state relationships.
– Can check if more than one variable adjusts to steady-state.
– No empirical examples today: If interested I can provide more slides.
Granger Causality
• Causality central to economics and other fields:
– Does money cause GDP, or GDP cause money?
– Does advertising cause sales? Or sales cause advertising?
• The VAR framework allows us to answer these questions.
• A variable $X_t$ is Granger non-causal for $Y_t$ if:
$E(Y_t \mid Y_{t-1}, X_{t-1}, \dots) = E(Y_t \mid Y_{t-1}, \dots).$ (145)
– I.e. Previous values of $X_t$ do not provide information on $Y_t$.
– Same as strong exogeneity.
• If (145) does not hold, implication is $X_t$ Granger causal for $Y_t$.
– But need also that $Y_t$ Granger non-causal for $X_t$.
– Rule out feedback, establish causality.
Granger Causality
• Can easily test Granger causality in VAR model. E.g. bivariate VAR(2):
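$X_t = \Pi_1 X_{t-1} + \Pi_2 X_{t-2} + \varepsilon_t$ (146)
$\begin{pmatrix} X_{1,t} \\ X_{2,t} \end{pmatrix} = \begin{pmatrix} \pi_{1,11} & \pi_{1,12} \\ \pi_{1,21} & \pi_{1,22} \end{pmatrix} \begin{pmatrix} X_{1,t-1} \\ X_{2,t-1} \end{pmatrix} + \begin{pmatrix} \pi_{2,11} & \pi_{2,12} \\ \pi_{2,21} & \pi_{2,22} \end{pmatrix} \begin{pmatrix} X_{1,t-2} \\ X_{2,t-2} \end{pmatrix} + \begin{pmatrix} \varepsilon_{1,t} \\ \varepsilon_{2,t} \end{pmatrix}.$ (147)
• If $\pi_{1,21} = \pi_{2,21} = 0$ then lags of $X_{1,t}$ Granger non-causal for $X_{2,t}$.
• If $\pi_{1,12} \neq 0$, $\pi_{2,12} \neq 0$ then $X_{2,t}$ Granger causal for $X_{1,t}$.
– Hence, with feedback ruled out, $X_{2,t}$ Granger causal for $X_{1,t}$.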
• Powerful test of causality often used in literature.
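The zero restrictions can be tested equation by equation with a likelihood ratio statistic. The sketch below is a single-equation simplification rather than a full system test: it simulates a bivariate VAR(2) in which $X_2$ feeds into $X_1$ but not the reverse (all parameter values and the helper names `rss` and `lr_non_causality` are assumptions) and compares each equation with and without the lags of the other variable. Under the null the statistic is referred to a $\chi^2$ distribution with 2 degrees of freedom (5% critical value 5.991).

```python
import numpy as np

def rss(y, Z):
    """Residual sum of squares from an OLS regression of y on Z."""
    b, *_ = np.linalg.lstsq(Z, y, rcond=None)
    e = y - Z @ b
    return e @ e

rng = np.random.default_rng(5)
# Hypothetical DGP: pi_{k,12} != 0 (X2 enters the X1 equation), pi_{k,21} = 0 (no feedback).
Pi1 = np.array([[0.5, 0.3], [0.0, 0.4]])
Pi2 = np.array([[0.1, 0.2], [0.0, 0.2]])
T = 5000
X = np.zeros((T + 2, 2))
for t in range(2, T + 2):
    X[t] = Pi1 @ X[t - 1] + Pi2 @ X[t - 2] + rng.normal(size=2)

Y, L1, L2 = X[2:], X[1:-1], X[:-2]
full = np.column_stack([L1, L2])             # both lags of both variables

def lr_non_causality(target, source):
    """LR statistic for H0: lags of `source` do not enter the `target` equation."""
    keep = [j for j in range(2) if j != source]
    restricted = np.column_stack([L1[:, keep], L2[:, keep]])
    return T * np.log(rss(Y[:, target], restricted) / rss(Y[:, target], full))

lr_2_to_1 = lr_non_causality(target=0, source=1)   # H0: X2 non-causal for X1 (false here)
lr_1_to_2 = lr_non_causality(target=1, source=0)   # H0: X1 non-causal for X2 (true here)
crit_5pct = 5.991                                  # chi-square(2) critical value at 5%

print(f"X2 -> X1: LR = {lr_2_to_1:.2f}, reject non-causality: {lr_2_to_1 > crit_5pct}")
print(f"X1 -> X2: LR = {lr_1_to_2:.2f}, reject non-causality: {lr_1_to_2 > crit_5pct}")
```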
Granger Causality
• But general case: tri-variate VAR(2):
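$\begin{pmatrix} X_{1,t} \\ X_{2,t} \\ X_{3,t} \end{pmatrix} = \begin{pmatrix} \pi_{1,11} & \pi_{1,12} & \pi_{1,13} \\ \pi_{1,21} & \pi_{1,22} & \pi_{1,23} \\ \pi_{1,31} & \pi_{1,32} & \pi_{1,33} \end{pmatrix} \begin{pmatrix} X_{1,t-1} \\ X_{2,t-1} \\ X_{3,t-1} \end{pmatrix} + \begin{pmatrix} \pi_{2,11} & \pi_{2,12} & \pi_{2,13} \\ \pi_{2,21} & \pi_{2,22} & \pi_{2,23} \\ \pi_{2,31} & \pi_{2,32} & \pi_{2,33} \end{pmatrix} \begin{pmatrix} X_{1,t-2} \\ X_{2,t-2} \\ X_{3,t-2} \end{pmatrix} + \begin{pmatrix} \varepsilon_{1,t} \\ \varepsilon_{2,t} \\ \varepsilon_{3,t} \end{pmatrix}.$ (148)
• If $\pi_{1,21} = \pi_{2,21} = 0$ then lags of $X_{1,t}$ Granger non-causal for $X_{2,t}$.
• If $\pi_{1,12} \neq 0$, $\pi_{2,12} \neq 0$ then $X_{2,t}$ Granger causal for $X_{1,t}$.
– But what about causality from $X_{1,t-2}$ to $X_{3,t-1}$ to $X_{2,t}$?
– Need also $\pi_{2,31} = \pi_{1,23} = 0$.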
Granger Causality
• Granger causality extensively used in empirical work.
– Powerful and intuitive test of causality.
– Reliant on VAR framework.
– Could be run as series of single equation estimations.
∗ Though inefficient.
• But it is severely limited:
– With many lags, complicated structure of zero restrictions required.
– Test is conditional on information set included:
∗ Hence unmodelled lags or variables may provide causality.
∗ Hence Granger non-causality conclusion may be invalidated.
Advertising: Copenhagen Cointegration Summer School
• Learn from the Masters!
• Three week course in August each year:
– August 3–23 2009: register your interest!
– http://www.econ.ku.dk/summerschool/
• Mornings: Cointegration theory from Juselius, Johansen, Rahbek and Nielsen.
• Afternoons: Computer labs to work on your own dataset.
• Hugely useful course:
– Submit paper at end of course for feedback.
– Potential PhD chapter.
Concluding
• In-depth look at multiple-equation modelling.
• Seemingly Unrelated Regression:
– E.g. Demand systems.
– Exploit information between regressions.
• Simultaneous-equation modelling:
– E.g. Demand and supply systems.
– Endogeneity problem.
– IV estimation: Instruments are exogenous variables.
• VAR modelling:
– Extending the time series dimension.
– Forecasting, theory-free/full modelling, impulse responses, Granger causality.
• Next week: Limited-Dependent-Variable Modelling.