Transcript of lecture slides on diagnostic testing in applied econometrics (serial correlation, heteroscedasticity, collinearity)

  • “Econometrics may not have the everlasting charm of Holmesian characters and adventures, or even a famous resident of Baker Street, but there is much in his methodological approach to the solving of criminal cases that is of relevance to applied econometric modeling. Holmesian detection may be interpreted as accommodating the relationship between data and theory, modeling procedures, deductions and inferences, analysis of biases, testing of theories, re-evaluation and reformulation of theories, and finally reaching a solution to the problem at hand. With this in mind, can applied econometricians learn anything from the master of detection?”

    Michael McAleer, “Sherlock Holmes and the Search for Truth: A Diagnostic Tale”, Journal of Economic Surveys 8(4) (1994): 317–370.

  • Serial Correlation (Autocorrelation)

    Heteroscedasticity

    Collinearity Diagnostics

    Influence Diagnostics

    Structural Change

  • Definition

    Tests:

    Durbin-Watson Test

    Nonparametric Runs Test

    Durbin h Test

    Lagrange Multiplier (LM) Test

    Box-Pierce Q Statistic

    Ljung-Box Q* Statistic (small-sample modification of the Box-Pierce Q Statistic)

    Generalized Least Squares

  • Disturbance terms are not independent:

    Y_i = β0 + β1 X_1i + β2 X_2i + … + βk X_ki + ε_i,  i = 1, 2, …, n

    E(ε_i ε_j) ≠ 0 for i ≠ j

    The correlation between ε_t and ε_{t−k} is called an autocorrelation of order k.

  • Autocorrelation or serial correlation refers to the lack of independence of the error (or disturbance) terms; the two terms refer to the same phenomenon. Simply put, a systematic pattern exists in the residuals of the econometric model. Ideally, the residuals, which represent a composite of all factors not embedded in the model, should exhibit no pattern. That is to say, the residuals should follow a white-noise (or random) pattern.

  • With the use of time-series data in econometric applications, serial correlation is “public enemy number one”. Systematic patterns in the error terms commonly arise due to the (inadvertent) omission of explanatory variables in econometric models. These variables may come from disciplines other than economics, finance, or business (for example, psychology and sociology). Or these variables may represent factors that simply are difficult to quantify, such as the tastes and preferences of consumers or technological innovation on the part of producers.

  • – Bishop (1981)

    – Errors “contaminated” with autocorrelation or serial correlation

    – Potential of discovering “spurious” relationships due to problems with autocorrelated errors (Granger and Newbold, 1974)

    – Difficulties with structural analysis and forecasting

    – If the error structure is autoregressive, then OLS estimates of the regression parameters are: (1) unbiased, (2) consistent, but (3) inefficient in small and in large samples.

  • – The estimates of the standard errors of the coefficients in any econometric model are biased downward if the residuals are positively autocorrelated. They are biased upward if the residuals are negatively autocorrelated.

    – Therefore, the calculated t-statistic is biased upward or downward in the opposite direction of the bias in the estimated standard error of that coefficient.

    – Granger and Newbold (1974) further suggest that the econometric results can be defined as “nonsense” if R² > DW(d).

  • Positive autocorrelation of the errors generally tends to make the estimate of the error variance too small, so confidence intervals are too narrow and null hypotheses are rejected with a higher probability than the stated significance level. Negative autocorrelation of the errors generally tends to make the estimate of the error variance too large, so confidence intervals are too wide; as well, the power of significance tests is reduced. With either positive or negative autocorrelation, least-squares parameter estimates usually are not as efficient as generalized least-squares parameter estimates.

  • – Ordinary regression analysis is based on several statistical assumptions. One key assumption is that the errors are independent of each other. However, with time-series data, the ordinary regression residuals usually are correlated over time.

    – Violation of the independent-errors assumption has three important consequences for ordinary regression.

    – First, statistical tests of the significance of the parameters and the confidence limits for the predicted values are not correct.

    – Second, the estimates of the regression coefficients are not as efficient as they would be if the autocorrelation were taken into account.

    – Third, since the ordinary regression residuals are not independent, they contain information that can be used to improve the prediction of future values.

  • – The AUTOREG procedure solves this problem by augmenting the regression model with an autoregressive model for the random error, thereby accounting for the systematic pattern of the errors. Instead of the usual regression model, the following autoregressive error model is used:

    y_t = x_t′β + ε_t

    ε_t = −φ1 ε_{t−1} − φ2 ε_{t−2} − … − φm ε_{t−m} + v_t

    v_t ~ IN(0, σ²)

    – The notation v_t ~ IN(0, σ²) indicates that each v_t is normally and independently distributed with mean 0 and variance σ².

  • By simultaneously estimating the regression coefficients β and the autoregressive error-model parameters φ_i, the AUTOREG procedure corrects the regression estimates for autocorrelation. Thus, this kind of regression analysis is often called autoregressive error correction or serial correlation correction. This technique is also called the use of generalized least squares (GLS).

  • The AUTOREG procedure can produce two kinds of predicted values and corresponding residuals and confidence limits. The first kind of predicted value is obtained from only the structural part of the model; this predicted value is an estimate of the unconditional mean of the dependent variable at time t. The second kind of predicted value includes both the structural part of the model and the predicted value of the autoregressive error process. Both the structural part and the autoregressive error process of the model (termed the “full” model) are used to forecast future values.

  • The Durbin-Watson Test

    H0: ρ = 0
    H1: ρ ≠ 0

    DW = d = Σ_{t=2}^{n} (ê_t − ê_{t−1})² / Σ_{t=1}^{n} ê_t²,  0 ≤ d ≤ 4

    DW = d ≈ 2(1 − ρ̂)  (approximation good only for large samples)

    If ρ = 0, then d = 2; if ρ = 1, then d = 0; if ρ = −1, then d = 4.
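As a concrete illustration (not part of the original slides), the d statistic can be computed directly from a residual series:

```python
def durbin_watson(resid):
    """DW d = sum_{t=2..n} (e_t - e_{t-1})^2 / sum_{t=1..n} e_t^2."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(e ** 2 for e in resid)
    return num / den

# Residuals with no pattern give d near 2; positive autocorrelation
# pushes d toward 0, negative autocorrelation pushes it toward 4.
print(durbin_watson([1.0, 1.0, 1.0, -1.0, -1.0, -1.0]))  # smooth run: below 2
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))             # alternating: above 2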

  • – d_L and d_U depend on α, k, and n.

    – The DW test is invalid for models that contain no intercept and for models that contain lagged dependent variables.

  • – The sampling distribution of d depends on the values of the exogenous variables, and hence Durbin and Watson derived upper (d_U) and lower (d_L) limits for the significance levels for d.

    – Tables of the distribution are found in most econometrics textbooks.

    – The Durbin-Watson test is perhaps the most used procedure in econometric applications.

  • Appendix G, Statistical Table, cont.

  • Although the DW test is the most commonly used test for serial correlation, there are limitations:

    (1) It tests for only first-order serial correlation.

    (2) The test may be inconclusive.

    (3) The test cannot be applied in models with lagged dependent variables.

    (4) The test cannot be applied in models without intercepts.

  • There are other tables for the DW test that have been prepared to take care of special situations. Some of these are:

    (1) R.W. Farebrother (1980) provides tables for regression models with no intercept term.

    (2) Savin and White (1977) present tables for the DW test for samples with 6 to 200 observations and for as many as 20 regressors.

  • (3) Wallis (1972) gives tables for regression models with quarterly data. Here one would like to test for fourth-order rather than first-order autocorrelation. In this case, the DW statistic is:

    d_4 = Σ_{t=5}^{n} (û_t − û_{t−4})² / Σ_{t=1}^{n} û_t²

    Wallis provides 5% critical values d_L and d_U for two situations: one where the k regressors include an intercept (but not a full set of seasonal dummy variables) and another where the regressors include four quarterly seasonal dummy variables. In each case the critical values are for testing H0: ρ = 0 against H1: ρ > 0. For the hypothesis H1: ρ < 0, Wallis suggests that the appropriate critical values are (4 − d_U) and (4 − d_L). King and Giles (1978) give further significance points for these tests.

  • (4) King (1981) gives the 5% points for d_L and d_U for quarterly time-series data with trend and/or seasonal dummy variables. These tables are for testing first-order autocorrelation.

    (5) King (1983) gives tables for the DW test for monthly data. In the case of monthly data, we may wish to test for twelfth-order autocorrelation.

  • – More general than the DW test. Interest in H0: ρ = 0.

    – Test of an AR(1) process in the error terms.

    N⁺ = number of positive residuals
    N⁻ = number of negative residuals
    N = number of observations
    N_r = number of runs

    – Test statistic:

    E(N_r) = 2N⁺N⁻ / N

    VAR(N_r) = [2N⁺N⁻ (2N⁺N⁻ − N)] / [N² (N − 1)]

    Z = (N_r − E(N_r)) / √VAR(N_r) ≈ N(0, 1)

    – Reject H0 (non-autocorrelation) if the test statistic is too large in absolute value.


  • In this example, sample evidence exists to suggest the presence of positive serial correlation, the more common form of pattern in the residuals in regard to the use of economic or financial data.


  • In the Greene problem for gasoline, DW = 0.786 and ρ̂ = 0.601.

    – Use of the nonparametric runs test: N = 36, N⁺ = 19, N⁻ = 17, N_r = 11

    E(N_r) = 2N⁺N⁻ / N = 17.94

    VAR(N_r) = [2N⁺N⁻ (2N⁺N⁻ − N)] / [N² (N − 1)] = 8.69

    Z = (N_r − E(N_r)) / √VAR(N_r) = (11 − 17.94) / √8.69 = −6.94 / 2.95 = −2.35

    Z_crit = 1.96 at α = .05; reject H0: ρ = 0 at α = .05.
  • – Analysts must recognize that a “good” Durbin-Watson statistic is insufficient evidence upon which to conclude that the error structure is “contamination free” in terms of autocorrelation. The Durbin-Watson test is applicable only to first-order autocorrelation.

    – There is little reason to suppose that the correct model for the residuals is AR(1); a mixed autoregressive moving-average (ARMA) structure is much more likely to be correct, especially with quarterly, monthly, and weekly frequencies of time-series data. Modeling of the residuals can be employed following the methodology of Box and Jenkins (1976).

    – Owing to the higher frequencies of time-series data used in applied econometrics in recent years, the pattern of the error structure generally is more complex than the common AR(1) pattern.

  • Durbin h Test: a large-sample test for autocorrelation when lagged dependent variables are present.

    ρ̂ ≈ 1 − (1/2)d,  where d is the DW statistic

    h = ρ̂ √( n / (1 − n·V(β̂)) ) ~ N(0, 1)

    where β̂ is the coefficient associated with Y_{t−1}.

    The test breaks down if n·V(β̂) ≥ 1.

    If the Durbin h test breaks down, compute the OLS residuals û_t. Then regress û_t on û_{t−1}, y_{t−1}, and the set of exogenous variables. The test for ρ = 0 is carried out by testing the significance of the coefficient of û_{t−1}.

  • OLS estimates (model with a lagged dependent variable).

  • – In the Greene problem for gasoline demand:

    ρ̂ = 1 − (1.639/2) = 0.1805

    n = 35,  V(β̂) = (0.12456)² = 0.0155

    h = 1.5788
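A sketch of the computation, using the numbers from the slide's Greene example:

```python
import math

def durbin_h(d, n, var_beta_lag):
    """Durbin h: rho_hat = 1 - d/2, h = rho_hat * sqrt(n / (1 - n*V(beta_lag))).
    The test breaks down when n*V(beta_lag) >= 1."""
    rho_hat = 1.0 - d / 2.0
    nv = n * var_beta_lag
    if nv >= 1.0:
        raise ValueError("Durbin h breaks down: n*V(beta_lag) >= 1")
    return rho_hat * math.sqrt(n / (1.0 - nv))

# Greene gasoline demand: d = 1.639, n = 35, V(beta_lag) = 0.12456^2
h = durbin_h(1.639, 35, 0.12456 ** 2)
print(round(h, 2))  # 1.58; below 1.96, so do not reject rho = 0 at the 5% level
```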

  • LM: Lagrange Multiplier Test

    y_t = β0 + β1 X_1t + … + βk X_kt + u_t,  t = 1, 2, …, n

    u_t = ρ1 u_{t−1} + ρ2 u_{t−2} + … + ρp u_{t−p} + e_t,  e_t ~ IN(0, σ²)

    H0: ρ1 = ρ2 = … = ρp = 0

    The X’s may or may not include lagged dependent variables.

    First: estimate by OLS and obtain the least-squares residuals û_t.

    Second: estimate

    û_t = γ0 + γ1 X_1t + … + γk X_kt + Σ_{i=1}^{p} ρ_i û_{t−i} + v_t

    Third: test whether the coefficients of the û_{t−i} are all zero. Use the conventional F-statistic.
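The three steps above can be sketched on simulated data (the values ρ = 0.7 and the coefficients 1.0 and 2.0 are illustrative assumptions, and numpy is assumed available):

```python
import numpy as np

# Simulate a regression with AR(1) disturbances.
rng = np.random.default_rng(42)
n = 400
x = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.7 * u[t - 1] + rng.normal()       # AR(1) errors, rho = 0.7
y = 1.0 + 2.0 * x + u

# First: estimate by OLS and retrieve the residuals u_hat.
X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
uhat = y - X @ b

# Second: regress u_hat_t on the X's and u_hat_{t-1} (here p = 1).
Z = np.column_stack([np.ones(n - 1), x[1:], uhat[:-1]])
g, *_ = np.linalg.lstsq(Z, uhat[1:], rcond=None)

# Third: test whether the coefficient on u_hat_{t-1} is zero (t or F test).
print(g[2])  # estimate of rho_1, close to the true 0.7 here
```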

  • Check the serial correlation pattern of the residuals; we need to be sure that there is no serial correlation (we desire white noise). Box and Pierce (1970) suggest looking not just at the first-order autocorrelation but at autocorrelations of all orders of the residuals.

    Calculate Q = N Σ_{k=1}^{m} r_k², where r_k is the autocorrelation of lag k and N is the number of observations in the series.

    If the model fitted is appropriate, Q ~ χ²_{m−p}, where p is the number of estimated parameters.

    Ljung and Box (1978) suggest a modification of the Q-statistic for moderate sample sizes:

    Q* = N(N + 2) Σ_{k=1}^{m} r_k² / (N − k)
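A direct implementation of the Ljung-Box Q* statistic (an illustration, not from the slides):

```python
def ljung_box_q(resid, m):
    """Q* = N(N+2) * sum_{k=1..m} r_k^2 / (N-k); Box-Pierce Q = N * sum r_k^2."""
    n = len(resid)
    mean = sum(resid) / n
    e = [v - mean for v in resid]
    denom = sum(v * v for v in e)
    q = 0.0
    for k in range(1, m + 1):
        r_k = sum(e[t] * e[t - k] for t in range(k, n)) / denom
        q += r_k ** 2 / (n - k)
    return n * (n + 2) * q

# A strongly alternating residual series has large autocorrelations, so Q*
# is large and the white-noise hypothesis would be rejected.
print(ljung_box_q([1.0, -1.0, 1.0, -1.0], 2))  # -> 7.5
```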

  • – We use the correlations and partial correlations of the residuals over time. The idea is to determine the appropriate pattern in the error structure from the autocorrelation and partial autocorrelation functions associated with the residuals.

    – Autocorrelation functions tell us about moving-average (MA) patterns.

    – Partial autocorrelation functions tell us about autoregressive (AR) patterns.

    – Anticipate ARMA error structures, particularly higher-order AR patterns in the residuals of econometric models.


  • The test can be used for different specifications of the error process. For example, for u_t = ρ4 u_{t−4} + e_t:

    estimate û_t = γ0 + γ1 X_1t + … + γk X_kt + ρ4 û_{t−4} + v_t

    and test H0: ρ4 = 0.


  • – With time-series data, in most cases serial correlation problems will surface.

    – Analysts must examine the error structure carefully. Minimally:

    – Graph the residuals over time

    – Consider the significance of the Durbin-Watson statistic

    – Consider higher-order autocorrelation structure via PROC ARIMA

    – Consider the Godfrey LM Test

    – Consider the Box-Pierce or Ljung-Box Tests (Q-Statistics)

    – Re-estimate econometric models with AR(p) error structures

  • – The regression model is specified as y_i = x_i β + ∈_i, where the ∈_i’s are identically and independently distributed: E(∈) = 0 and E(∈′∈) = σ² I. If the ∈_i’s are not independent or their variances are not constant, the parameter estimates are unbiased, but the estimate of the covariance matrix is inconsistent.

    – One of the key assumptions of regression is that the variance of the errors is constant across observations. If the errors have constant variance, the errors are called homoscedastic. Standard estimation methods are inefficient when the errors are heteroscedastic, that is, have non-constant variance. As well, this issue leads to problems in tests of hypotheses.

    – Null hypothesis: the disturbance terms are homoscedastic, H0: σ_i² = σ² for all i.

    – See McCulloch (1985).

  • Example:

    Y_i = β0 + β1 X_1i + β2 X_2i + … + βk X_ki + ε_i,  i = 1, 2, …, n

    E(ε_i²) = σ_i²

    Y_i = β0 + β1 X_i + ε_i

    where Y_i represents savings, X_i represents income, and n represents the number of observations.

  • – Assumption of homoscedasticity: Var(∈_i) = σ² for all i.

    – Since E(∈_i) = 0 by assumption, we may write the homoscedasticity condition as E(∈_i²) = σ².

    – The variance of the error term or disturbance term is constant for all observations.

    – This issue is problematic with microeconomic data, or more generally cross-sectional data; take the case of income and expenditure of households, for example.

    – The assumption of homoscedasticity is not very plausible on a priori grounds; we expect less variation in consumption (or saving) for low-income households and more variation in consumption (or saving) for high-income households.

    – Hence the need to consider heteroscedastic disturbance terms in applied econometrics.

  • Diagram of estimated squared residuals against explanatory variables: ê_i² plotted against x_j, j = 1, 2, …, k (panels (a) through (e)).

  • 1. OLS parameter estimates are unbiased and consistent. But they are not efficient.

    2. The estimated variances of the parameters of the model are biased estimators.

  • – When the disturbance term is heteroscedastic, OLS parameter estimates are unbiased and consistent, but they are NOT BLUE.

    – The estimated variances of the OLS parameters are in general biased. Hence the conventionally calculated confidence intervals and tests of significance are invalid.

    – Use weighted least squares to overcome these consequences of heteroscedasticity: w_i = weight = 1/σ_i², i = 1, 2, …, n, which penalizes (rewards) observations with relatively high (low) variance.

    – Alternatively, use maximum likelihood (ML) estimation.

  • – Transform the model:

    Y_i = β0 + β1 X_1i + β2 X_2i + … + βk X_ki + ∈_i,  i = 1, 2, …, N,  with E(∈_i²) = σ_i²

    – Let w_i = 1/σ_i. So,

    w_i Y_i = β0 w_i + β1 w_i X_1i + β2 w_i X_2i + … + βk w_i X_ki + w_i ∈_i

    that is,

    Y_i/σ_i = β0 (1/σ_i) + β1 (X_1i/σ_i) + β2 (X_2i/σ_i) + … + βk (X_ki/σ_i) + ∈_i/σ_i

    or  Y_i* = β0* + β1 X_1i* + β2 X_2i* + … + βk X_ki* + ∈_i*

    – The transformed errors are homoscedastic: Var(∈_i*) = Var(∈_i/σ_i) = (1/σ_i²) Var(∈_i) = 1.
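The transformation can be sketched on simulated data (the pattern σ_i = 0.5·X_i and the coefficients 2 and 3 are illustrative assumptions, and numpy is assumed available):

```python
import numpy as np

# Weighting each observation by w_i = 1/sigma_i equalizes the error
# variance, so OLS on the transformed data is WLS.
rng = np.random.default_rng(7)
n = 500
x = rng.uniform(1.0, 10.0, n)
sigma = 0.5 * x                                  # heteroscedastic: sd grows with x
y = 2.0 + 3.0 * x + rng.normal(0.0, 1.0, n) * sigma

w = 1.0 / sigma                                  # w_i = 1/sigma_i
X = np.column_stack([np.ones(n), x])             # [1, X_i]
beta_wls, *_ = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)
print(beta_wls)  # close to the true (2, 3)
```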

  • – With SAS (or EViews), one can use the WEIGHT statement together with PROC MODEL to correct for heteroscedasticity.

    – The WEIGHT statement follows the FIT statement.

  • – Model specification: Saving_i = a + b·INCOME_i + ∈_i

    491 households, 1989 data; b = marginal propensity to save.

    Assumption: σ_i² = f(INCOME_i), with σ_i² = INCOME_i².

    Let w_i = 1/σ_i = 1/INCOME_i.

  • OLS estimates. Auxiliary regression with the squared residuals as the dependent variable: σ̂_i² = intercept + .02910·incat_i²

  • The auxiliary regression suggests the presence of heteroscedasticity. Use of weighted least squares with

    weight_i = 1/σ_i = 1 / [intercept + .02910·incat_i²]^{1/2}

  • WLS estimates:

    MPS = .27697 with WLS

    MPS = .35954 with OLS

  • – When the variance of the errors of a classical linear model Y = Xβ + ∈ is not constant across observations (heteroscedastic), so that σ_i² ≠ σ_j² for some i, j, the OLS estimator

    β̂_OLS = (X′X)^{-1} X′Y

    is unbiased but it is inefficient. Models that take into account the changing variance can make more efficient use of the data. When the variances σ_i² are known, generalized least squares (GLS) can be used, and the estimator

    β̂_GLS = (X′Ω^{-1}X)^{-1} X′Ω^{-1}Y

    where

    Ω = diag(σ1², σ2², …, σn²)

    is unbiased and efficient. However, GLS is unavailable when the variances σ_i² are unknown. n refers to the number of observations.

  • A feasible version replaces the unknown variances with the squared OLS residuals, where ∈̂_i = Y_i − X_i′β̂_OLS:

    Ω̂ = diag(e1², e2², …, en²)

    – Note that

    Ω̂^{-1} = diag(1/e1², 1/e2², …, 1/en²)

  • – Assumptions concerning σ_i²:  σ_i² = f(z_1i, z_2i, …, z_pi)

    – In the case of the micro consumption (or saving) function, the variance of the disturbance terms often is assumed to be positively associated with the level of household income.

    – To operationalize such an assumption, we need to specify not only z_1i, z_2i, …, z_pi, but also the functional form of the association.

    – Common forms of association represent both multiplicative heteroscedasticity and additive heteroscedasticity.

  • Multiplicative heteroscedasticity:

    σ_i² = σ² z_1i^{δ1} z_2i^{δ2} … z_pi^{δp}

    ln σ_i² = ln σ² + δ1 ln z_1i + δ2 ln z_2i + … + δp ln z_pi

    – Note that this specification applies only when all z’s are positive.

    H0: δ1 = δ2 = … = δp = 0

    Under H0 we have homoscedastic disturbances; if we reject H0, then the evidence suggests heteroscedastic disturbance terms.

    – Appropriate weight: w_i = 1 / (z_1i^{δ1/2} z_2i^{δ2/2} … z_pi^{δp/2})

  • Additive heteroscedasticity:

    σ_i² = σ² (a0 + a1 z_1i + a2 z_2i + … + ap z_pi)

    H0: a1 = a2 = … = ap = 0

    Under H0 we have homoscedastic disturbances; if we reject H0, then the evidence suggests heteroscedastic disturbance terms.

    – Appropriate weight: w_i = 1 / (a0 + a1 z_1i + a2 z_2i + … + ap z_pi)^{1/2}

  • Practically speaking, apart from the functional form of the heteroscedasticity, how do analysts select the z variables?

    – Usually it is not likely that analysts would know of variables related to the variance of the disturbance term that have not already been included in the econometric specification. Thus, the usual choices of the z variables are likely to be the explanatory variables (the X variables).

    – Prime candidates for the z variables are the non-discrete explanatory variables X.

  • (1) Park-Glejser

    (2) Breusch-Pagan-Godfrey

    (3) Harvey

    (4) White

  • Park-Glejser Test

    Y_i = β0 + β1 X_1i + β2 X_2i + … + βk X_ki + u_i

    H0: σ_i² = σ² X_1i^{δ1} X_2i^{δ2} … X_ki^{δk} e^{u_i}

    ln σ_i² = ln σ² + δ1 ln X_1i + δ2 ln X_2i + … + δk ln X_ki + u_i

    – Replace ln σ_i² with ln e_i².

    – Perform an F test on the δ’s.

    – If F is non-significant, then the disturbance terms are homoscedastic.

    – If F is significant, then the disturbance terms are heteroscedastic.

    – Correction: weight_i = 1 / (Π_{j=1}^{k} X_ji^{δj})^{1/2}

  • OLS estimates.


  • Auxiliary regression:

    ln σ_i² = ln σ² + δ1 ln PCAID_i + δ2 ln PCINC_i + u_i

    The F test indicates the presence of heteroscedasticity.

  • WLS with

    weight_i = 1 / (PCAID_i^{1.90627/2} · PCINC_i^{4.36463/2})

  • Test H0: coefficients of nez, mwz, and wez are jointly equal to 0.

    Test H0: coefficient of nez = coefficient of mwz.

    Test H0: coefficient of nez = coefficient of wez.

    Test H0: coefficient of mwz = coefficient of wez.

  • Use of PROC MODEL to handle the heteroscedasticity problem (use of the WEIGHT statement). Same WLS estimates as before.

  • Breusch-Pagan-Godfrey Test (Breusch and Pagan, 1979; Godfrey, 1978)

    H0: σ_i² = γ0 + γ1 X_1i + γ2 X_2i + … + γk X_ki

    – Regress e_i² = γ0 + γ1 X_1i + γ2 X_2i + … + γk X_ki + u_i

    – Perform an F test on the γ’s.

    – Correction: WEIGHT_i = 1 / (γ0 + Σ_{j=1}^{k} γj X_ji)^{1/2}
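The auxiliary-regression mechanics can be sketched on simulated data (a variance growing with x is an illustrative assumption, and numpy is assumed available):

```python
import numpy as np

# BPG sketch: regress squared OLS residuals on the regressors and F-test
# the slope; a large F rejects homoscedasticity.
rng = np.random.default_rng(3)
n = 300
x = rng.uniform(1.0, 10.0, n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, n) * x   # error sd proportional to x

X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
e2 = (y - X @ b) ** 2                             # squared OLS residuals

g, *_ = np.linalg.lstsq(X, e2, rcond=None)        # e2 = g0 + g1*x + u
ss_res = np.sum((e2 - X @ g) ** 2)
ss_tot = np.sum((e2 - e2.mean()) ** 2)
F = ((ss_tot - ss_res) / 1.0) / (ss_res / (n - 2))
print(F > 3.9)  # exceeds the 5% critical value: heteroscedasticity detected
```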

  • OLS estimates.

  • Auxiliary regression:

    e_i² = γ0 + γ1 pcaid_i + γ2 pcinc_i + u_i

    F test associated with the auxiliary regression: H0: coefficient of pcaid = coefficient of pcinc = 0.

  • WLS estimates with

    weight_i = 1 / [−173931.4 + 5.92871·pcaid_i + 3.13583·pcinc_i]^{1/2}

  • Test H0: coefficients of nez, mwz, and wez are jointly equal to 0.

    Test H0: coefficient of nez = coefficient of mwz.

    Test H0: coefficient of nez = coefficient of wez.

    Test H0: coefficient of mwz = coefficient of wez.

  • Use of PROC MODEL with the WEIGHT statement.

  • Harvey Test

    – To operationalize this hypothesis, formulate the following regression:

    H0: σ_i² = exp[a0 + a1 X_1i + a2 X_2i + … + aK X_Ki + u_i]

    ln e_i² = a0 + a1 X_1i + a2 X_2i + … + aK X_Ki + u_i

    – Perform an F test on the a’s.

    – Correction: WEIGHT_i = 1 / [exp(a0 + Σ_{j=1}^{K} aj X_ji)]^{1/2}

  • OLS estimates.

  • Auxiliary regression:

    ln e_i² = a0 + a1 pcaid_i + a2 pcinc_i + u_i

    The test of H0: a1 = a2 = 0 indicates the presence of heteroscedasticity.

  • WLS estimates with

    weight_i = 1 / [exp(1.96476 + .00824·pcaid_i + .0009203·pcinc_i)]^{1/2}

  • Test H0: coefficients of nez, mwz, and wez are jointly equal to 0.

    Test H0: coefficient of nez = coefficient of mwz.

    Test H0: coefficient of nez = coefficient of wez.

    Test H0: coefficient of mwz = coefficient of wez.

  • WLS estimates via PROC MODEL.

  • White Test: tests for influence by any of the regressors, squares of regressors, and cross-products of regressors on e_i².

    – Step 1: Use OLS and obtain the OLS residuals.

    – Step 2: Square the residuals (form e_i²).

    – Step 3: Form the squares of the right-hand-side variables (k terms) and the cross-products of the regressors (k(k−1)/2 terms).

    – Step 4: Regress e_i² against the original regressors and the terms in Step 3:

    e_i² = α0 + α1 X_1i + … + αk X_ki + α_{k+1} X_1i² + … + α_{2k} X_ki² + α_{2k+1} X_1i X_2i + … + α_{k(k+3)/2} X_{(k−1)i} X_ki

    – Correction:

    WEIGHT_i = 1 / [α0 + α1 X_1i + … + αk X_ki + α_{k+1} X_1i² + … + α_{k(k+3)/2} X_{(k−1)i} X_ki]^{1/2}

  • OLS estimates.

  • Auxiliary regression: the F test indicates the presence of heteroscedasticity.

  • WLS estimates with

    weight_i = 1 / [−29146628 + 3.88919·pcaid_i + 51.24914·pcinc_i + 0.25458·pcaid_i² − 0.00364·pcinc_i² − .08819·pcaid_i·pcinc_i]^{1/2}

  • Test H0: coefficients of nez, mwz, and wez are jointly equal to 0.

    Test H0: coefficient of nez = coefficient of mwz.

    Test H0: coefficient of nez = coefficient of wez.

    Test H0: coefficient of mwz = coefficient of wez.

  • WLS estimates with the WEIGHT statement from PROC MODEL.

  • Test                     p-value of F-statistic

    Park-Glejser             0.0406
    Breusch-Pagan-Godfrey    0.0008
    Harvey                   0.0311
    White                    0.0018

    – All tests indicate the presence of heteroscedasticity.

  • Correction for Heteroscedasticity (WLS): Summary of 1970 State Data Problem with Heteroscedasticity

    Variable    OLS           Park-Glejser   Breusch-Pagan-Godfrey   Harvey        White
    Intercept   -665.51586    -303.89856     -429.75385              -291.33079    -583.81742
                (103.70268)   (124.50782)    (115.99855)             (123.71054)   (106.63934)
    PCAID       2.55446       1.69045        2.05626                 1.64611       2.26077
                (0.18707)     (0.27151)      (0.25728)               (0.27646)     (0.25838)
    PCINC       0.22216       0.16693       0.18451                  0.16576       0.21301
                (0.02495)     (0.02536)      (0.02511)               (0.02482)     (0.02300)
    NE          28.32663      67.66182       51.97487                69.01092      51.30945
                (41.50517)    (32.06849)     (35.43834)              (31.49605)    (33.40153)
    MW          61.90315      79.60480       78.65195                77.16368      76.51217
                (37.66921)    (27.08596)     (29.50126)              (27.03204)    (30.10805)
    WE          33.69220      78.98334       69.98310                79.60937      66.40836
                (38.88263)    (31.65617)     (35.81770)              (30.56592)    (31.60469)

    – Standard errors in parentheses.

  • Correction for Heteroscedasticity (WLS): Summary of 1970 State Data Problem with Heteroscedasticity, cont.

    Statistic                          OLS      Park-Glejser   Breusch-Pagan-Godfrey   Harvey   White
    F-Test on Region Dummy Variables   0.95     3.75           2.73                    3.71     2.72
    P-value of F-test                  0.4235   0.0176         0.0549                  0.0183   0.0560
    R²                                 0.8951   0.7463         0.7972                  0.7506   0.8606
    Adjusted R²                        0.8832   0.7175         0.7741                  0.7223   0.8444

    – The R² and adjusted R² statistics come from the PROC MODEL procedure.

  • – The heteroscedastic regression model:

    y_i = x_i′β + ∈_i,  ∈_i ~ N(0, σ_i²)

    σ_i² = σ² h_i,  h_i = l(z_i′η)

    – The heteroscedastic regression model is estimated using the following log-likelihood function:

    ℓ = −(N/2) ln(2π) − (1/2) Σ_{i=1}^{N} ln(σ_i²) − (1/2) Σ_{i=1}^{N} (e_i²/σ_i²)

    where e_i = y_i − x_i′β.

    – Use non-linear estimation procedures to maximize the likelihood function. The parameters are σ², η, and β. Typically, z_i is a subset of the x_i explanatory variables.

  • (1) Retrieve the OLS residuals.

    (2) Square the OLS residuals.

    (3) Graph the squared OLS residuals vs. the non-discrete explanatory variables.

    (4) Apply the Park-Glejser, BPG, Harvey, and/or White tests.

    (5) If any of the F-tests from (4) are statistically significant, then heteroscedasticity is present.

    (6) To alleviate the heteroscedasticity problem, use WLS or ML estimation.

    (7) Report WLS (GLS) or ML estimates, standard errors, p-values, etc.

    (8) Retrieve the appropriate goodness-of-fit statistics.

  • Nature of Problem

    Consequences

    Introduction

    Belsley, Kuh, Welsch Diagnostics

    Variance inflation factors

    Condition indices

    Variance-decomposition proportions

    Circumvention of Problem: Ridge Regression

  • Multiple regression of Y on X and Z. OLS estimators: use the blue area to estimate β_X and the green area to estimate β_Z; the information in the red area is discarded.

  • Near-linear dependency among the regressor variables:

    Σ_{j=0}^{k} a_j X_j ≈ 0

    Departure from orthogonality of the columns of X. As X′X approaches singularity, the elements of (X′X)^{-1} “explode”.

  • Orthogonal Variables vs. Non-orthogonal Variables

    Case 1: $X^TX = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$, $(X^TX)^{-1} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$

    Case 2: $X^TX = \begin{bmatrix} 1 & .9 \\ .9 & 1 \end{bmatrix}$, $(X^TX)^{-1} = \begin{bmatrix} 5.26 & -4.74 \\ -4.74 & 5.26 \end{bmatrix}$

    Case 3: $X^TX = \begin{bmatrix} 1 & .99 \\ .99 & 1 \end{bmatrix}$, $(X^TX)^{-1} = \begin{bmatrix} 50.25 & -49.75 \\ -49.75 & 50.25 \end{bmatrix}$

    Key Points:

    (1) Sampling variances of estimated OLS coefficients increase sharply

    (2) Greater sampling covariances for the OLS coefficients
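The explosion of $(X^TX)^{-1}$ across the three cases can be verified directly (a small numerical check, not part of the original slides):

```python
import numpy as np

def xtx_inverse(r):
    """Inverse of a 2x2 correlation-form X'X with off-diagonal element r."""
    return np.linalg.inv(np.array([[1.0, r], [r, 1.0]]))

# The diagonal of (X'X)^{-1} is the variance multiplier for each coefficient;
# it grows from 1 (orthogonal) to about 50 as the correlation rises to .99.
for r in (0.0, 0.9, 0.99):
    print(r, np.round(xtx_inverse(r), 2))
```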

  • � Deals with specific characteristics of the data matrix X: a data problem, not a statistical problem

    � Speak in terms of the severity of collinearity rather than of its existence or nonexistence

    � Effects on structural integrity of econometric models

    � Opposite of collinear: orthogonal

  • � Constitutes a threat to the proper specification and effective estimation of a structural relationship

    � Covariances among parameter estimates are often large and of the wrong sign

    � Larger variances (standard errors) of regression coefficients, $VAR(\hat\beta) = \sigma^2(X^TX)^{-1}$; these consequences are indistinguishable from those of inadequate variability in the regressors

    � Difficulties in interpretation

    � Confidence regions for parameters are wide

    � Increase in Type II error (accept $H_0$ when $H_0$ is false)

    � Decrease in power of tests

  • Multicollinearity refers to the presence of highly intercorrelated exogenous variables in regression models. It is not surprising that it is considered “one of the most ubiquitous, significant, and difficult problems in applied econometrics…often referred to by modelers as the familiar curse.” Collinearity diagnostics measure how much regressors are related to other regressors and how these relationships affect the stability and variance of the regression estimates.

  • Signs of multicollinearity in a regression analysis include:

    (1) Large standard errors on the regression coefficients, so that estimates of the true model parameters become unstable and low t-values prevail.

    (2) The parameter estimates vary considerably from sample to sample.

    (3) Often there will be drastic changes in the regression estimates after only minor data revision.

    (4) Conflicting conclusions will be reached from the usual tests of significance (such as the wrong sign for a parameter).

  • (5) Extreme correlations between pairs of variables.

    (6) Omitting a variable from the equation results in smaller regression standard errors.

    (7) A good fit not providing good forecasts.

  • (1) produce a set of condition indices that signal the presence of one or more near dependencies among the variables. (Linear dependency, an extreme form of multicollinearity, occurs when there is an exact linear relationship among the variables).

    (2) uncover those variables that are involved in particular near dependencies and to assess the degree to which the estimated regression coefficients are being degraded by the presence of the near dependencies.

    In practice, if one exogenous variable has a high squared multiple correlation (R-squared) with the other independent variables, it is extremely unlikely that the exogenous variable in question contributes significantly to the prediction equation. When the R-squared is too high, the variables are, in essence, redundant.

  • The variance inflation factor ($VIF_i$) for variable i is defined as follows:

    $VIF_i = \frac{1}{1 - R_i^2}$

    As the squared multiple correlation of the exogenous variable with the other exogenous variables approaches unity, the corresponding VIF becomes infinite. If exogenous variables are orthogonal to each other (no correlation), the variance inflation factor is 1.0. $VIF_i$ thus provides us with a measure of how many times larger the variance of the ith regression coefficient will be for multicollinear data than for orthogonal data (where each VIF is 1.0). If the VIFs are not too much larger than 1.0, multicollinearity is not a problem. An advantage of knowing the VIF for each variable is that it gives the user a tangible idea of how much of the variances of the estimated coefficients are degraded by the multicollinearity.
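The definition above translates directly into code; a minimal sketch in which the helper function and the synthetic variables are invented for illustration:

```python
import numpy as np

def vif(X):
    """VIF_i = 1/(1 - R_i^2), where R_i^2 regresses column i on the others."""
    X = np.asarray(X, dtype=float)
    out = []
    for i in range(X.shape[1]):
        xi, others = X[:, i], np.delete(X, i, axis=1)
        others = np.column_stack([np.ones(len(xi)), others])  # intercept
        fit = others @ np.linalg.lstsq(others, xi, rcond=None)[0]
        r2 = 1 - np.sum((xi - fit)**2) / np.sum((xi - xi.mean())**2)
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(2)
x1 = rng.normal(size=200)
x2 = x1 + 0.1 * rng.normal(size=200)   # nearly collinear with x1: VIF far above 10
x3 = rng.normal(size=200)              # unrelated: VIF close to 1
print(vif(np.column_stack([x1, x2, x3])))
```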

  • Small determinant -- some (or many) of the eigenvalues are small:

    $|X^TX| = \lambda_1 \lambda_2 \cdots \lambda_p$, where $\lambda_1, \ldots, \lambda_p$ are the eigenvalues of $X^TX$

    Belsley, Kuh, Welsch diagnostic tools:

    Condition number of the X matrix: $k(X) = \mu_{MAX}/\mu_{MIN} = (\lambda_{MAX}/\lambda_{MIN})^{1/2}$

    Condition index: $\eta_s = \mu_{MAX}/\mu_s, \quad s = 1, \ldots, p$

    where $\mu_s$ is the sth singular value.

  • $\eta_s = \mu_{MAX}/\mu_s, \quad s = 1, \ldots, p$, is the sth condition index of the $n \times p$ data matrix X.

    Key Point: There are as many near dependencies among the columns of a data matrix X as there are high condition indices.

    Weak dependencies are associated with condition indices around 5 or 10.

    Moderate to strong relations are associated with condition indices > 30.
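Condition indices are straightforward to compute from the singular values of the column-scaled X (scaling columns to unit length follows the Belsley-Kuh-Welsch convention; the data here are synthetic):

```python
import numpy as np

def condition_indices(X):
    """eta_s = mu_max / mu_s for the singular values of the column-scaled X."""
    X = np.asarray(X, dtype=float)
    X = X / np.linalg.norm(X, axis=0)      # BKW: scale columns to unit length
    mu = np.linalg.svd(X, compute_uv=False)
    return mu.max() / mu

rng = np.random.default_rng(3)
a = rng.normal(size=300)
b = rng.normal(size=300)
# Fourth column is nearly a + b, creating one strong near dependency
X = np.column_stack([np.ones(300), a, b, a + b + 0.01 * rng.normal(size=300)])
print(condition_indices(X))   # one index far above 30 flags the near dependency
```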

  • Diagnostic Procedure

    Variance-decomposition proportions:

    Singular Value   VAR($\hat\beta_0$)   VAR($\hat\beta_1$)   ...   VAR($\hat\beta_p$)
    $\mu_0$          $\Pi_{00}$           $\Pi_{01}$           ...   $\Pi_{0p}$
    $\mu_1$          $\Pi_{10}$           $\Pi_{11}$           ...   $\Pi_{1p}$
    :                :                    :                          :
    $\mu_p$          $\Pi_{p0}$           $\Pi_{p1}$           ...   $\Pi_{pp}$
                     1                    1                          1

    (Each column of variance-decomposition proportions sums to 1.)

    Source: Belsley, Kuh, Welsch. Regression Diagnostics: Identifying Influential Data and Sources of Collinearity, (1980), John Wiley & Sons.

    A near dependency is signaled when (1) the condition index $\eta_s \geq 30$ and (2) two or more variance-decomposition proportions $\Pi_{sj} > .5$.
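The Π table can be computed from the SVD of the scaled X. This sketch follows the BKW definition $\Pi_{sj} = (v_{js}^2/\mu_s^2) / \sum_t (v_{jt}^2/\mu_t^2)$; the near dependency in the data is invented for illustration:

```python
import numpy as np

def var_decomp_proportions(X):
    """Pi[s, j]: share of VAR(beta_j) associated with singular value mu_s (BKW)."""
    X = np.asarray(X, dtype=float)
    X = X / np.linalg.norm(X, axis=0)          # unit column lengths
    _, mu, vt = np.linalg.svd(X, full_matrices=False)
    phi = (vt.T / mu)**2                       # phi[j, s] = v_js^2 / mu_s^2
    return (phi / phi.sum(axis=1, keepdims=True)).T   # each column sums to 1

rng = np.random.default_rng(4)
a, b = rng.normal(size=(2, 300))
X = np.column_stack([a, b, a + b + 0.01 * rng.normal(size=300)])
Pi = var_decomp_proportions(X)
# The row for the smallest singular value shows Pi > .5 for every variable
# involved in the near dependency (here, all three).
```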

  • Note that variables 1, 2, 5, and 6 are highly correlated and the VIFs for all variables (except variable 3) are greater than 10, with one of them being greater than 1,000.

    Examination of the condition index column reveals a dominating dependency situation with high numbers for several indices.

  • Hoerl and Kennard (1970)

    Minimize $L = (Y - X\beta)^T(Y - X\beta) - k(\beta^T\beta - d)$

    Set $dL/d\beta = 0 \Rightarrow \hat\beta_R = (X^TX + kI)^{-1}X^TY$

    $E[\hat\beta_R] = (X^TX + kI)^{-1}X^TX\,\beta$

    $VAR[\hat\beta_R] = \sigma^2(X^TX + kI)^{-1}X^TX(X^TX + kI)^{-1}$

    $\hat\beta_R = (X^TX + kI)^{-1}X^TX\,\hat\beta_{OLS}$

    There exists a number k such that

    $MSE[\hat\beta_R] \leq MSE[\hat\beta_{OLS}]$, where $MSE = \text{variance} + (\text{bias})^2$.
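The ridge estimator above is a one-line computation; a minimal sketch on invented collinear data (the data-generating process is not from the slides):

```python
import numpy as np

def ridge(X, y, k):
    """Hoerl-Kennard ridge estimator: beta_R = (X'X + kI)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

rng = np.random.default_rng(5)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)       # severe collinearity
X = np.column_stack([x1, x2])
y = X @ np.array([1.0, 1.0]) + rng.normal(size=n)

b_ols = ridge(X, y, 0.0)                  # k = 0 reduces to OLS
b_ridge = ridge(X, y, 1.0)                # k > 0 shrinks the coefficient vector
```

For any k > 0 the ridge coefficients are shrunk relative to OLS, trading a small bias for a reduction in variance, which is the source of the MSE inequality above.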
