
    Econometric Modelling

    David F. Hendry

    Nuffield College, Oxford University.

    July 18, 2000

    Abstract

The theory of reduction explains the origins of empirical models by delineating all the steps involved in mapping from the actual data generation process (DGP) in the economy (far too complicated and high dimensional ever to be completely modelled) to an empirical model thereof. Each reduction step involves a potential loss of information from: aggregating, marginalizing, conditioning, approximating, and truncating, leading to a local DGP which is the actual generating process in the space of variables under analysis. Tests of losses from many of the reduction steps are feasible. Models that show no losses are deemed congruent; those that explain rival models are called encompassing. The main reductions correspond to well-established econometric concepts (causality, exogeneity, invariance, innovations, etc.) which are the null hypotheses of the mis-specification tests, so the theory has considerable excess content.

General-to-specific (Gets) modelling seeks to mimic reduction by commencing from a general congruent specification that is simplified to a minimal representation consistent with the desired criteria and the data evidence (essentially represented by the local DGP). However, in small data samples, model selection is difficult. We reconsider model selection from a computer-automation perspective, focusing on general-to-specific reductions, embodied in PcGets, an Ox package for implementing this modelling strategy for linear, dynamic regression models. We present an econometric theory that explains the remarkable properties of PcGets. Starting from a general congruent model, standard testing procedures eliminate statistically-insignificant variables, with diagnostic tests checking the validity of reductions, ensuring a congruent final selection. Path searches in PcGets terminate when no variable meets the pre-set criteria, or any diagnostic test becomes significant. Non-rejected models are tested by encompassing: if several are acceptable, the reduction recommences from their union; if that re-appears, the search is terminated using the Schwarz criterion.

Since model selection with diagnostic testing has eluded theoretical analysis, we study modelling strategies by simulation. The Monte Carlo experiments show that PcGets recovers the DGP specification from a general model with size and power close to commencing from the DGP itself, so model selection can be relatively non-distortionary even when the mechanism is unknown. Empirical illustrations for consumers' expenditure and money demand will be shown live.

Next, we discuss sample-selection effects on forecast failure, with a Monte Carlo study of their impact. This leads to a discussion of the role of selection when testing theories, and the problems inherent in conventional approaches. Finally, we show that selecting policy-analysis models by forecast accuracy is not generally appropriate. We anticipate that Gets will perform well in selecting models for policy.

Financial support from the UK Economic and Social Research Council under grants Modelling Non-stationary Economic Time Series (R000237500) and Forecasting and Policy in the Evolving Macro-economy (L138251009) is gratefully acknowledged. The research is based on joint work with Hans-Martin Krolzig of Oxford University.


Contents

1 Introduction
2 Theory of reduction
   2.1 Empirical models
   2.2 DGP
   2.3 Data transformations and aggregation
   2.4 Parameters of interest
   2.5 Data partition
   2.6 Marginalization
   2.7 Sequential factorization
      2.7.1 Sequential factorization of W_1^T
      2.7.2 Marginalizing with respect to V_1^T
   2.8 Mapping to I(0)
   2.9 Conditional factorization
   2.10 Constancy
   2.11 Lag truncation
   2.12 Functional form
   2.13 The derived model
   2.14 Dominance
   2.15 Econometric concepts as measures of no information loss
   2.16 Implicit model design
   2.17 Explicit model design
   2.18 A taxonomy of evaluation information
3 General-to-specific modelling
   3.1 Pre-search reductions
   3.2 Additional paths
   3.3 Encompassing
   3.4 Information criteria
   3.5 Sub-sample reliability
   3.6 Significant mis-specification tests
4 The econometrics of model selection
   4.1 Search costs
   4.2 Selection probabilities
   4.3 Deletion probabilities
   4.4 Path selection probabilities
   4.5 Improved inference procedures
5 PcGets
   5.1 The multi-path reduction process of PcGets
   5.2 Settings in PcGets
   5.3 Limits to PcGets
      5.3.1 Collinearity
   5.4 Integrated variables
6 Some Monte Carlo results
   6.1 Aim of the Monte Carlo
   6.2 Design of the Monte Carlo
   6.3 Evaluation of the Monte Carlo
   6.4 Diagnostic tests
   6.5 Size and power of variable selection
   6.6 Test size analysis
7 Empirical Illustrations
   7.1 DHSY
   7.2 UK Money Demand
8 Model selection in forecasting, testing, and policy analysis
   8.1 Model selection for forecasting
      8.1.1 Sources of forecast errors
      8.1.2 Sample selection experiments
   8.2 Model selection for theory testing
   8.3 Model selection for policy analysis
      8.3.1 Congruent modelling
9 Conclusions
10 Appendix: encompassing
References


    1 Introduction

    The economy is a complicated, dynamic, non-linear, simultaneous, high-dimensional, and evolving en-

    tity; social systems alter over time; laws change; and technological innovations occur. Time-series

    data samples are short, highly aggregated, heterogeneous, non-stationary, time-dependent and inter-

dependent. Economic magnitudes are inaccurately measured, subject to revision, and important variables are not observable. Economic theories are highly abstract and simplified, rest on suspect aggregation assumptions, change over time, and rival, conflicting explanations often co-exist. In the face of this

    welter of problems, econometric modelling of economic time series seeks to discover sustainable and

    interpretable relationships between observed economic variables.

    However, the situation is not as bleak as it may seem, provided some general scientific notions are

    understood. The first key is that knowledge accumulation is progressive: one does not need to know all

    the answers at the start (otherwise, no science could have advanced). Although the best empirical model

    at any point will later be supplanted, it can provide a springboard for further discovery. Thus, model

    selection problems (e.g., data mining) are not a serious concern: this is established below, by the actual

    behaviour of model-selection algorithms.

    The second key is that determining inconsistencies between the implications of any conjectured

    model and the observed data is easy. Indeed, the ease of rejection worries some economists about eco-

    nometric models, yet is a powerful advantage. Conversely, constructive progress is difficult, because we

do not know what we don't know, so cannot know how to find out. The dichotomy between construction

    and destruction is an old one in the philosophy of science: critically evaluating empirical evidence is a

    destructive use of econometrics, but can establish a legitimate basis for models.

    To understand modelling, one must begin by assuming a probability structure and conjecturing

the data generation process. However, the relevant probability basis is unclear, since the economic

    mechanism is unknown. Consequently, one must proceed iteratively: conjecture the process, develop

    the associated probability theory, use that for modelling, and revise the starting point when the results do

    not match consistently. This can be seen in the gradual progress from stationarity assumptions, through

    integrated-cointegrated systems, to general non-stationary, mixing processes: further developments will

undoubtedly occur, leading to a more useful probability basis for empirical modelling. These notes first review the theory of reduction in section 2 to explain the origins of empirical models, then discuss some methodological issues that concern many economists.

    Despite the controversy surrounding econometric methodology, the LSE approach (see Hendry,

    1993, for an overview) has emerged as a leading approach to empirical modelling. One of its main

tenets is the concept of general-to-specific modelling (Gets): starting from a gen-

    eral dynamic statistical model, which captures the essential characteristics of the underlying data set,

    standard testing procedures are used to reduce its complexity by eliminating statistically-insignificant

    variables, checking the validity of the reductions at every stage to ensure the congruence of the selected

    model. Section 3 discusses Gets, and relates it to the empirical analogue of reduction.

Recently, econometric model selection has been automated in a program called PcGets, which is an Ox package (see Doornik, 1999, and Hendry and Krolzig, 1999a) designed for Gets modelling, currently

    focusing on reduction approaches for linear, dynamic, regression models. The development ofPcGets

    has been stimulated by Hoover and Perez (1999), who sought to evaluate the performance of Gets. To

    implement a general-to-specific approach in a computer algorithm, all decisions must be mechan-

    ized. In doing so, Hoover and Perez made some important advances in practical modelling, and our

    approach builds on these by introducing further improvements. Given an initial general model, many

    reduction paths could be considered, and different selection strategies adopted for each path. Some of


    these searches may lead to different terminal specifications, between which a choice must be made.

    Consequently, the reduction process is inherently iterative. Should multiple congruent contenders even-

tuate after a reduction round, encompassing can be used to test between them, with only the surviving (usually non-nested) specifications retained. If multiple models still remain after this 'testimation' pro-

    cess, a new general model is formed from their union, and the simplification process re-applied. Should

    that union repeat, a final selection is made using information criteria, otherwise a unique congruent and

    encompassing reduction has been located.

Automating Gets throws further light on several methodological issues, and prompts some new ideas, which are discussed in section 4. While the joint issue of variable selection and diagnostic testing

    using multiple criteria has eluded most attempts at theoretical analysis, computer automation of the

    model-selection process allows us to evaluate econometric model-selection strategies by simulation.

    Section 6 presents the results of some Monte Carlo experiments to investigate if the model-selection

    process works well or fails badly; their implications for the calibration of PcGets are also analyzed.

    The empirical illustrations presented in section 7 demonstrate the usefulness of PcGets for applied

    econometric research.

    Section 8 then investigates model selection in forecasting, testing, and policy analysis and shows the

    drawbacks of some widely-used approaches.

    2 Theory of reduction

First we define the notion of an empirical model, then explain the origins of such models by the

    theory of reduction.

    2.1 Empirical models

    In an experiment, the output is caused by the inputs and can be treated as if it were a mechanism:

    y_t = f(z_t) + ε_t    (1)
    [output]   [input]   [perturbation]

where y_t is the observed outcome of the experiment when z_t is the experimental input, f(·) is the mapping from input to output, and ε_t is a small, random perturbation which varies between experiments conducted at the same values of z. Given the same inputs {z_t}, repeating the experiment generates essentially the same outputs.

    In an econometric model, however:

    y_t = g(z_t) + ε_t    (2)
    [observed]   [explanation]   [remainder]

y_t can always be decomposed into two components, namely g(z_t) (the part explained) and ε_t (unexplained). Such a partition is feasible even when y_t does not depend on g(z_t). In econometrics:

    ε_t = y_t − g(z_t).    (3)

Thus, models can be designed by selection of z_t. Design criteria must be analyzed, and lead to the notion of a congruent model: one that matches the data evidence on the measured attributes. Successive

    congruent models should be able to explain previous ones, which is the concept of encompassing, and

    thereby progress can be achieved.
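As a quick illustration of the point that the decomposition in (2) and (3) is always available, the following sketch (my own illustration, not part of the text; Python with NumPy assumed) fits a least-squares g(z_t) to a y_t generated with no dependence on z_t:

```python
# Illustrative sketch: any observed y_t can be split into a fitted part g(z_t)
# and a remainder, even when y_t was generated with no dependence on z_t.
import numpy as np

rng = np.random.default_rng(0)
T = 100
z = rng.normal(size=T)
y = rng.normal(size=T)                     # y is unrelated to z by construction

X = np.column_stack([np.ones(T), z])       # g(z_t) = a + b*z_t chosen by OLS
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
explained = X @ coef
remainder = y - explained                  # eq. (3): e_t = y_t - g(z_t)

print(np.allclose(y, explained + remainder))   # True: the partition always exists
print(round(np.var(explained) / np.var(y), 3)) # small, but never exactly zero
```

The design question is therefore which z_t to select so that the remainder has the properties required of a congruent model's error, which is what the reduction sequence below formalizes.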


    2.2 DGP

Let {u_t} denote a stochastic process where u_t is a vector of n random variables. Consider the sample U_1^T = (u_1 ... u_T), where U_1^{t-1} = (u_1 ... u_{t-1}). Denote the initial conditions by U_0 = (... u_{-r} ... u_{-1} u_0), and let U_{t-1} = (U_0 : U_1^{t-1}). The density function of U_1^T conditional on U_0 is given by D_U(U_1^T | U_0, ξ), where D_U(·) is represented parametrically by a k-dimensional vector of parameters ξ = (ξ_1 ... ξ_k) with parameter space Ξ ⊆ R^k. All elements of ξ need not be the same at each t, and some of the {ξ_i} may reflect transient effects or regime shifts. The data generation process (DGP) of {u_t} is written as:

    D_U(U_1^T | U_0, ξ)  with  ξ ∈ Ξ ⊆ R^k.    (4)

The complete sample {u_t, t = 1, ..., T} is generated from D_U(·) by a population parameter value ξ_p. The sample joint data density D_U(U_1^T | U_0, ξ) is called the Haavelmo distribution (see e.g., Spanos, 1989).

The complete set of random variables relevant to the economy under investigation over t = 1, ..., T is denoted {u_t*}, where * denotes a perfectly measured variable, and U_1^{*T} = (u_1*, ..., u_T*), defined on the probability space (Ω, F, P). The DGP induces U_1^T = (u_1, ..., u_T), but U_1^T is unmanageably large. Operational models are defined by a sequence of data reductions, organized into eleven stages.

    2.3 Data transformations and aggregation

One-one mapping of U_1^T to a new data set W_1^T: U_1^T ↔ W_1^T. The DGP of U_1^T, and so of W_1^T, is characterized by the joint density:

    D_U(U_1^T | U_0, ξ_1^T) = D_W(W_1^T | W_0, φ_1^T)    (5)

where ξ_1^T ∈ Ξ and φ_1^T ∈ Φ, making the parameter change explicit. The transformation from U to W affects the parameter space, so Ξ is transformed into Φ.

    2.4 Parameters of interest

μ ∈ M. Identifiable, and invariant to an interesting class of interventions.

    2.5 Data partition

Partition W_1^T into the two sets:

    W_1^T = (X_1^T : V_1^T)    (6)

where the X_1^T matrix is T × n. Everything about μ must be learnt from analyzing X_1^T alone, so that V_1^T is not essential to inference about μ.

    2.6 Marginalization

    D_W(W_1^T | W_0, φ_1^T) = D_{V|X}(V_1^T | X_1^T, W_0, λ_{a,1}^T) · D_X(X_1^T | W_0, λ_{b,1}^T).    (7)

Eliminate V_1^T by discarding the conditional density D_{V|X}(V_1^T | X_1^T, W_0, λ_{a,1}^T) in (7), while retaining the marginal density D_X(X_1^T | W_0, λ_{b,1}^T). μ must be a function of λ_{b,1}^T alone, given by μ = f(λ_{b,1}^T). A cut is required, so that (λ_{a,1}^T : λ_{b,1}^T) ∈ Λ_a × Λ_b.


    2.7 Sequential factorization

To create the innovation process, sequentially factorize X_1^T as:

    D_X(X_1^T | W_0, λ_{b,1}^T) = ∏_{t=1}^{T} D_x(x_t | X_1^{t-1}, W_0, λ_{b,t}).    (8)

Mean innovation error process: ε_t = x_t − E[x_t | X_1^{t-1}].

2.7.1 Sequential factorization of W_1^T

Alternatively:

    D_W(W_1^T | W_0, φ_1^T) = ∏_{t=1}^{T} D_w(w_t | W_{t-1}, φ_t).    (9)

The RHS innovation process is ν_t = w_t − E[w_t | W_1^{t-1}].

2.7.2 Marginalizing with respect to V_1^T

    D_w(w_t | W_{t-1}, φ_t) = D_{v|x}(v_t | x_t, W_{t-1}, φ_{a,t}) · D_x(x_t | V_1^{t-1}, X_1^{t-1}, W_0, φ_{b,t}),    (10)

as W_{t-1} = (V_1^{t-1} : X_1^{t-1} : W_0). μ must be obtained from {φ_{b,t}} alone. Marginalize with respect to V_1^{t-1}:

    D_x(x_t | V_1^{t-1}, X_1^{t-1}, W_0, φ_{b,t}) = D_x(x_t | X_1^{t-1}, W_0, λ_{b,t}).    (11)

No loss of information if and only if λ_{b,t} = φ_{b,t} ∀t, so the conditional, sequential distribution of {x_t} does not depend on V_1^{t-1} (Granger non-causality).

    2.8 Mapping to I(0)

    Needed to ensure conventional inference is valid, though many inferences will be valid even if this

    reduction is not enforced. Cointegration would need to be treated in a separate set of lectures.

    2.9 Conditional factorization

Factorize the density of x_t into sets of n_1 and n_2 variables where n_1 + n_2 = n:

    x_t = (y_t′ : z_t′)′,    (12)

where the y_t are endogenous and the z_t are non-modelled.

    D_x(x_t | X_1^{t-1}, W_0, λ_{b,t}) = D_{y|z}(y_t | z_t, X_1^{t-1}, W_0, θ_{a,t}) · D_z(z_t | X_1^{t-1}, W_0, θ_{b,t})    (13)

z_t is weakly exogenous for μ if (i) μ = f(θ_{a,t}) alone; and (ii) (θ_{a,t} : θ_{b,t}) ∈ Θ_a × Θ_b.

    2.10 Constancy

Complete parameter constancy is:

    θ_{a,t} = θ_a  ∀t,    (14)

where θ_a ∈ Θ_a, so that μ is a function of θ_a: μ = f(θ_a). Then:

    ∏_{t=1}^{T} D_{y|z}(y_t | z_t, X_1^{t-1}, W_0, θ_a)    (15)

with θ_a ∈ Θ_a.


    2.11 Lag truncation

Fix the extent of the history of X_1^{t-1} in (15) at s earlier periods:

    D_{y|z}(y_t | z_t, X_1^{t-1}, W_0, θ_a) = D_{y|z}(y_t | z_t, X_{t-s}^{t-1}, W_0, θ_a).    (16)

    2.12 Functional form

Map y_t into y_t† = h(y_t) and z_t into z_t† = g(z_t), and denote the resulting data by X†. Assume that y_t† and z_t† simultaneously make D_{y|z}(·) approximately normal and homoscedastic, denoted N_{n1}[η_t, Σ]:

    D_{y|z}(y_t | z_t, X_{t-s}^{t-1}, W_0, θ_a) = D_{y†|z†}(y_t† | z_t†, X_{t-s}^{†t-1}, W_0, θ_a)    (17)

    2.13 The derived model

    A(L)h(y_t) = B(L)g(z_t) + ε_t    (18)

where ε_t ∼app N_{n1}[0, Σ], and A(L) and B(L) are polynomial matrices (i.e., matrices whose elements are polynomials) of order s in the lag operator L. ε_t is a derived, and not an autonomous, process defined by:

    ε_t = A(L)h(y_t) − B(L)g(z_t).    (19)

The reduction to the generic econometric equation involves all the stages of aggregation, marginalization, conditioning etc., transforming the parameters from ξ, which determines the stochastic features of the data, to the coefficients of the empirical model.

    2.14 Dominance

Consider two distinct scalar empirical models denoted M1 and M2 with mean-innovation processes (MIPs) {ε_t} and {ν_t} relative to their own information sets, where ε_t and ν_t have constant, finite variances σ_ε² and σ_ν² respectively. Then M1 variance dominates M2 if σ_ε² < σ_ν², denoted by M1 ≻ M2. Variance dominance is transitive, since if M1 ≻ M2 and M2 ≻ M3 then M1 ≻ M3, and anti-symmetric, since if M1 ≻ M2 then it cannot be true that M2 ≻ M1. A model without a MIP error can be variance dominated by a model with a MIP on a common data set. The DGP cannot be variance dominated in the population by any models thereof (see e.g. Theil, 1971, p. 543). Let U_{t-1} denote the universe of information for the DGP and let X_{t-1} be the subset, with associated innovation sequences {ε_{u,t}} and {ε_{x,t}}. Then as {X_{t-1}} ⊆ {U_{t-1}}, E[ε_{u,t} | X_{t-1}] = 0, whereas E[ε_{x,t} | U_{t-1}] need not be zero. A model with an innovation error cannot be variance dominated by a model which uses only a subset of the same information.

If ε_t = x_t − E[x_t | X_{t-1}], then σ_ε² is no larger than the variance of any other empirical model error defined by e_t = x_t − G[x_t | X_{t-1}], whatever the choice of G[·]. The conditional expectation is the minimum mean-square error predictor. These implications favour general rather than simple empirical models, given any choice of information set, and suggest modelling the conditional expectation. A model which nests all contending explanations as special cases must variance dominate in its class. Let model M_j be characterized by parameter vector θ_j with k_j elements; then, as in Hendry and Richard (1982):

    M1 is parsimoniously undominated in the class {M_i} if ∀i, k_1 ≤ k_i and no M_i ≻ M1.

Model selection procedures (such as AIC or the Schwarz criterion: see Judge, Griffiths, Hill, Lütkepohl and Lee, 1985) seek parsimoniously undominated models, but do not check for congruence.
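A tiny numerical illustration of variance dominance (my own, not from the text; linear projections stand in for conditional expectations, and the data-generating coefficients are made up):

```python
# The innovation variance from a larger information set cannot exceed that
# obtained from a subset of the same information.
import numpy as np

rng = np.random.default_rng(5)
T = 5000
x1 = rng.normal(size=T)
x2 = rng.normal(size=T)
y = x1 + 0.5 * x2 + rng.normal(size=T)

def resid_var(Z, y):
    b, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return (y - Z @ b).var()

small_info = resid_var(np.column_stack([np.ones(T), x1]), y)        # subset
large_info = resid_var(np.column_stack([np.ones(T), x1, x2]), y)    # full set
print(round(small_info, 2), round(large_info, 2))   # the second is smaller
```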


    2.15 Econometric concepts as measures of no information loss

    [1] Aggregation entails no loss of information on marginalizing with respect to disaggregates when the

retained information comprises a set of sufficient statistics for the parameters of interest μ.

    [2] Transformations per se do not entail any associated reduction but directly introduce the concept of

    parameters of interest, and indirectly the notions that parameters should be invariant and identifiable.

    [3] Data partition is a preliminary although the decision about which variables to include and which to

    omit is perhaps the most fundamental determinant of the success or otherwise of empirical modelling.

[4] Marginalizing with respect to v_t is without loss providing the remaining data are sufficient for μ, whereas marginalizing without loss with respect to V_1^{t-1} entails both Granger non-causality for x_t and

    a cut in the parameters.

    [5] Sequential factorization involves no loss if the derived error process is an innovation relative to the

    history of the random variables, and via the notion of common factors, reveals that autoregressive errors

    are a restriction and not a generalization.

    [6] Integrated data systems can be reduced to I(0) by suitable combinations of cointegration and differ-

    encing, allowing conventional inference procedures to be applied to more parsimonious relationships.

    [7] Conditional factorization reductions, which eliminate marginal processes, lead to no loss of in-

    formation relative to the joint analysis when the conditioning variables are weakly exogenous for the

parameters of interest.
[8] Parameter constancy implicitly relates to invariance, as constancy across interventions which affect

    the marginal processes.

    [9] Lag truncation involves no loss if the error process remains an innovation despite excluding some of

    the past of relevant variables.

    [10] Functional form approximations need involve no reduction (logs of log-normally distributed vari-

    ables): e.g. when the two densities in (17) are equal.

    [11] The derived model, as a reduction of the DGP, is nested within that DGP and its properties are

    explained by the reduction process: knowledge of the DGP entails knowledge of all reductions thereof.

    When knowledge of one model entails knowledge of another, the first is said to encompass the second.

    2.16 Implicit model design

This corresponds to the symptomatology approach in econometrics, testing for problems (autocorrela-

    tion, heteroscedasticity, omitted variables, multicollinearity, non-constant parameters etc.), and correct-

    ing these.

    2.17 Explicit model design

    Mimic reduction theory in practical research to minimize the losses due to the reductions selected: leads

    to Gets modelling.

    2.18 A taxonomy of evaluation information

Partition the data X_1^T used in modelling into the three information sets:

[a] past data;

[b] present data;

[c] future data:

    X_1^T = (X_1^{t-1} : x_t : X_{t+1}^T).    (20)


    [d] theory information, which often is the source of parameters of interest, and is a creative stimulus in

    economics;

    [e] measurement information, including price index theory, constructed identities such as consumption

    equals income minus savings, data accuracy and so on; and:

    [f] data of rival models, which could be analyzed into past, present and future in turn.

    The six main criteria which result for selecting an empirical model are:

    [a] homoscedastic innovation errors;

[b] weakly exogenous conditioning variables for the parameters of interest;
[c] constant, invariant parameters of interest;

    [d] theory consistent, identifiable structures;

    [e] data admissible formulations on accurate observations; and

    [f] encompass rival models.

    Models which satisfy the first five information sets are said to be congruent: an encompassing

    congruent model satisfies all six criteria.

    3 General-to-specific modelling

The practical embodiment of reduction is general-to-specific (Gets) modelling. The DGP is replaced by the concept of the local DGP (LDGP), namely the joint distribution of the subset of variables under

    analysis. Then a general unrestricted model (GUM) is formulated to provide a congruent approxim-

    ation to the LDGP, given the theoretical and previous empirical background. The empirical analysis

    commences from this general specification, after testing for mis-specifications, and if none are appar-

    ent, is simplified to a parsimonious, congruent representation, each simplification step being checked

    by diagnostic testing. Simplification can be done in many ways: and although the goodness of a model

    is intrinsic to it, and not a property of the selection route, poor routes seem unlikely to deliver useful

    models. Even so, some economists worry about the impact of selection rules on the properties of the

resulting models, and insist on the use of a priori specifications: but these need knowledge of the answer before we start, so deny empirical modelling any useful role; and in practice, it has rarely contributed.

    Few studies have investigated how well general-to-specific modelling does. However, Hoover and

    Perez (1999) offer important evidence in a major Monte Carlo, reconsidering the Lovell (1983) experi-

ments. They place 20 macro variables in a databank; generate one (y) as a function of 0 to 5 others; regress

    y on all 20 plus all lags thereof, then let their algorithm simplify that GUM till it finds a congruent

(encompassing) irreducible result. They check up to 10 different paths, testing for mis-specification, collect the results from each, then select one choice from the remainder. By following many paths,

    the algorithm is protected against chance false routes, and delivers an undominated congruent model.

    Nevertheless, Hendry and Krolzig (1999b) improve on their algorithm in several important respects and

    this section now describes these.

    3.1 Pre-search reductions

    First, groups of variables are tested in the order of their absolute t-values, commencing with a block

    where all the p-values exceed 0.9, and continuing down towards the pre-assigned selection criterion,

    when deletion must become inadmissible. A less-stringent significance level is used at this step, usually

    10%, since the insignificant variables are deleted permanently. If no test is significant, the F-test on all

    variables in the GUM has been calculated, establishing that there is nothing to model.
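A minimal sketch of this kind of block deletion test follows (my own simplified construction in Python, not the PcGets implementation; the design, sample size and 10% level are illustrative assumptions):

```python
# Pre-search block deletion: order regressors by |t| in the GUM, then F-test
# the block with the smallest |t|-values against zero at a loose 10% level.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
T, k, block_size = 100, 10, 5
X = rng.normal(size=(T, k))
y = 0.5 * X[:, 0] + rng.normal(size=T)           # only the first regressor matters

XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
resid = y - X @ beta
s2 = resid @ resid / (T - k)
t_vals = beta / np.sqrt(s2 * np.diag(XtX_inv))

block = np.argsort(np.abs(t_vals))[:block_size]  # least-significant block
keep = np.setdiff1d(np.arange(k), block)
resid_r = y - X[:, keep] @ np.linalg.lstsq(X[:, keep], y, rcond=None)[0]
F = ((resid_r @ resid_r - resid @ resid) / block_size) / s2
p_value = stats.f.sf(F, block_size, T - k)
print(round(p_value, 3))   # typically well above 0.10, so the block is deleted
```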


    3.2 Additional paths

Blocks of variables constitute feasible search paths, in addition to individual coefficients, like the block

    F-tests in the preceding sub-section but along search paths. All paths that also commence with an

    insignificant t-deletion are explored.

    3.3 Encompassing

Encompassing tests select between the candidate congruent models at the end of path searches. Each contender is tested against their union, dropping those which are dominated by, and do not dominate, another contender. If a unique model results, select that; otherwise, if some are rejected, form the union

    of the remaining models, and repeat this round till no encompassing reductions result. That union

    then constitutes a new starting point, and the complete path-search algorithm repeats till the union is

    unchanged between successive rounds.

    3.4 Information criteria

    When a union coincides with the original GUM, or with a previous union, so no further feasible reduc-

    tions can be found, PcGets selects a model by an information criterion. The preferred final-selection

    rule presently is the Schwarz criterion, or BIC, defined as:

    SC = −2 log L/T + p log(T)/T,

    where L is the maximized likelihood, p is the number of parameters and T is the sample size. For

    T = 140 and p = 40, minimum SC corresponds approximately to the marginal regressor satisfying

|t| ≥ 1.9.
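As a check on that threshold (my own arithmetic, assuming a Gaussian linear regression so that −2 log L/T equals log(RSS/T) up to a constant), SC keeps a marginal regressor when log(1 + t²/(T − p)) exceeds log(T)/T:

```python
# Solve for the |t|-value at which the Schwarz criterion is indifferent to a
# marginal regressor, for T = 140 observations and p = 40 parameters.
import math

T, p = 140, 40
t_threshold = math.sqrt((T - p) * (T ** (1.0 / T) - 1.0))
print(round(t_threshold, 2))   # ~1.90, matching the value quoted above
```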

    3.5 Sub-sample reliability

For that finally-selected model, sub-sample reliability is evaluated by the Hoover-Perez overlapping split-sample test. PcGets concludes that some variables are definitely excluded; some definitely included; and some have an uncertain role, varying from a reliability of 25% (included in the final model, but insignificant overall and in both sub-samples), through to 75% (significant overall and in one

    sub-sample, or in both sub-samples).

    3.6 Significant mis-specification tests

    If the initial mis-specification tests are significant at the pre-specified level, we raise the required signi-

    ficance level, terminating search paths only when that higher level is violated. Empirical investigators

    would re-specify the GUM on rejection.

    To see why Gets does well, we develop the analytics for several of its procedures.

    4 The econometrics of model selection

    The key issue for any model-selection procedure is the cost of search, since there are always bound to

    be mistakes in statistical inference: specifically, how bad is it to search across many alternatives? The

    conventional statistical analysis of repeated testing provides a pessimistic background: every test has a

    non-zero null rejection frequency (or size, if independent of nuisance parameters), and so type I errors


    accumulate. Setting a small size for every test can induce low power to detect the influences that really

    matter.

    Critics of general-to-specific methods have pointed to a number of potential difficulties, including

    the problems of lack of identification, measurement without theory, data mining, pre-test biases,

    ignoring selection effects, repeated testing, and the potential path dependence of any selection:

    see inter alia, Faust and Whiteman (1997), Koopmans (1947), Lovell (1983), Judge and Bock (1978),

    Leamer (1978), Hendry, Leamer and Poirier (1990), and Pagan (1987). The following discussion draws

on Hendry (2000a).
Koopmans' critique followed up the earlier attack by Keynes (1939, 1940) on Tinbergen (1940a,

    1940b), and set the scene for doubting all econometric analyses that failed to commence from pre-

specified models. Lovell's study of trying to select a small relation (zero to five regressors) hidden in

    a large database (40 variables) found a low success rate, thereby suggesting that search procedures had

    high costs, and supporting an adverse view of data-based model selection. The third criticism concerned

    applying significance tests to select variables, arguing that the resulting estimator was biased in general

    by being a weighted average of zero (when the variable was excluded) and an unbiased coefficient (on

    inclusion). The fourth concerned biases in reported coefficient standard errors from treating the selected

    model as if there was no uncertainty in the choice. The next argued that the probability of retaining

    variables that should not enter a relationship would be high because a multitude of tests on irrelevant

    variables must deliver some significant outcomes. The sixth suggested that how a model was selected

    affected its credibility: at its extreme, we find the claim in Leamer (1983) that the mapping is the

    message, emphasizing the selection process over the properties of the final choice. In the face of this

    barrage of criticism, many economists came to doubt the value of empirical evidence, even to the extent

    of referring to it as a scientific illusion (Summers, 1991).

    The upshot of these attacks on empirical research was that almost all econometric studies had to

    commence from pre-specified models (or pretend they did). Summers (1991) failed to notice that this

    was the source of his claimed scientific illusion: econometric evidence had become theory dependent,

    with little value added, and a strong propensity to be discarded when fashions in theory changed. Much

    empirical evidence only depends on low-level theories which are part of the background knowledge base

(not subject to scrutiny in the current analysis), so a data-based approach to studying the economy is

    feasible. Since theory dependence has at least as many drawbacks as sample dependence, data modelling

    procedures are essential: see Hendry (1995a). Indeed, all of these criticisms are refutable, as we now

    show.

    First, identification has three attributes, as discussed in Hendry (1997), namely uniqueness, sat-

    isfying the required interpretation, and correspondence to the desired entity. A non-unique result is

    clearly not identified, so the first attribute is necessary, but insufficient, since uniqueness can be achieved

    by arbitrary restrictions (criticized by Sims, 1980, inter alia). There can exist a unique combination of

    several relationships which is incorrectly interpreted as one of those equations: e.g., a reduced form

    that has a positive price effect, wrongly interpreted as a supply relation. Finally, a unique, interpretable

model of (say) a money-demand relation may in fact correspond to a Central Bank's supply schedule, and this too is sometimes called a failure to identify the demand relation. Because economies are

    highly interdependent, simultaneity was long believed to be a serious problem, but higher frequencies of

    observation have attenuated this problem. Anyway, simultaneity is not invariant under linear transform-

ations (although linear systems are), so can be avoided by eschewing contemporaneous regressors until

    weak exogeneity is established. Conditioning ensures a unique outcome, although it cannot guarantee

    that the resulting model corresponds to the underlying reality.

    Next, Keynes appears to have believed that statistical work in economics is impossible without


    knowledge of everything in advance. But if partial explanations are devoid of use, and empirically we

    could discover nothing not already known, then no science could have progressed. That is clearly refuted

by the historical record. The fallacy in Keynes's argument is that since theoretical models are incomplete

    and incorrect, an econometrics that is forced to use such theories as the only permissible starting point

    for data analysis can contribute little useful knowledge, except perhaps rejecting the theories. When

    invariant features of reality exist, progressive research can discover them in part without prior knowledge

    of the whole: see Hendry (1995b). A similar analysis applies to the attack in Koopmans on the study

by Burns and Mitchell: he relies on the (unstated) assumption that only one sort of economic theory is applicable, that it is correct, and that it is immutable (see Hendry and Morgan, 1995).

    Data mining is revealed when conflicting evidence exists or when rival models cannot be encom-

passed; and if they can, then an undominated model results despite the inappropriate procedure. Thus,

    stringent critical evaluation renders the data mining criticism otiose. Gilbert (1986) suggests separat-

    ing output into two groups: the first contains only redundant results (those parsimoniously encompassed

    by the finally-selected model), and the second contains all other findings. If the second group is not null,

    then there has been data mining. On such a characterization, Gets cannot involve data mining, despite

    depending heavily on data basing.

Even when the LDGP is known a priori from economic theory, if an investigator did not know that the resulting model was in fact true, and so sought to test conventional null hypotheses on its coefficients, then

    inferential mistakes will occur in general. These will vary as a function of the characteristics of the

    LDGP, and of the particular data sample drawn, but for many parameter values, the selected model will

    differ from the LDGP, and hence have biased coefficients. This is the pre-test problem, and is quite

    distinct from the costs of searching across a general set of specifications for a congruent representation

    of the LDGP.

    If a wide variety of models would be reported when applying any given selection procedure to

    different samples from a common DGP, then the results using a single sample apparently understate

    the true uncertainty. Coefficient standard errors only reflect sampling variation conditional on a fixed

    specification, with no additional terms from changes in that specification (see e.g., Chatfield, 1995).

    Thus, reported empirical estimates must be judged conditional on the resulting equation being a good

    approximation to the LDGP. Undominated (i.e., encompassing) congruent models have a strong claim

    to provide such an approximation, and conditional on that, their reported uncertainty is a good measure

    of the uncertainty inherent in such a specification for the relevant LDGP.

The theory of repeated testing is easily understood: the probability p_α that none of n independent tests rejects at significance level 100α% is:

    p_α = (1 − α)^n.

When 40 tests of correct null hypotheses are conducted at α = 0.05, p_0.05 ≈ 0.13, whereas p_0.005 ≈ 0.82. However, it is difficult to obtain spurious t-test values much in excess of three despite repeated

    testing: as Sargan (1981) pointed out, the t-distribution is thin tailed, so even the 0.5% critical value is

    less than three for 50 degrees of freedom. Unfortunately, stringent criteria for avoiding rejections when

    the null is true lower the power of rejection when it is false. The logic of repeated testing is accurate

    as a description of the statistical properties of mis-specification testing: conducting four independent

    diagnostic tests at 5% will lead to about 19% false rejections. Nevertheless, even in that context, there

    are possible solutions such as using a single combined test which can substantially lower the size

    without too great a power loss (see e.g., Godfrey and Veale, 1999). It is less clear that the analysis is

a valid characterization of selection procedures in general: when no more than one path is searched, there is

    no error correction for wrong reductions. In fact, the serious practical difficulty is not one of avoiding


    spuriously significant regressors because of repeated testing when many hypotheses are tested, it is

    retaining all the variables that genuinely matter.
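The arithmetic behind the repeated-testing figures above is quickly verified (my own check, assuming independent tests):

```python
# Probability that none of n independent tests at level alpha rejects, and the
# chance of at least one false rejection from four diagnostics at 5%.
n = 40
print(round(0.95 ** n, 2))          # ~0.13 at alpha = 0.05
print(round(0.995 ** n, 2))         # ~0.82 at alpha = 0.005
print(round(1 - 0.95 ** 4, 2))      # ~0.19 for four diagnostic tests at 5%
```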

    Path dependence is when the results obtained in a modelling exercise depend on the simplification

    sequence adopted. Since the quality of a model is intrinsic to it, and progressive research induces

    a sequence of mutually-encompassing congruent models, proponents of Gets consider that the path

    adopted is unlikely to matter. As Hendry and Mizon (1990) expressed the matter: the model is the

    message. Nevertheless, it must be true that some simplifications lead to poorer representations than

others. One aspect of the value-added of the approach discussed below is that it ensures a unique outcome, so the path does not matter.

    We conclude that each of these criticisms of Gets can be refuted. Indeed, White (1990) showed that

    with sufficiently-rigorous testing, the selected model will converge to the DGP. Thus, any overfitting

    and mis-specification problems are primarily finite sample. Moreover, Mayo (1981) emphasized the

    importance of diagnostic test information being effectively independent of the sufficient statistics from

    which parameter estimates are derived. Hoover and Perez (1999) show how much better Gets is than any

    method Lovell considered, suggesting that modelling per se need not be bad. Indeed, overall, the size

    of their selection procedure is close to that expected, and the power is reasonable. Moreover, re-running

    their experiments using our version (PcGets) delivered substantively better outcomes (see Hendry and

    Krolzig, 1999b). Thus, the case against model selection is far from proved.

    4.1 Search costs

Let p_i^dgp denote the probability of retaining the i-th variable out of k when commencing from the DGP specification and applying the relevant selection test at the same significance level as the search procedure. Then 1 − p_i^dgp is the expected cost of inference. For irrelevant variables, p_i^dgp ≃ 0, so the whole cost for those is attributed to search. Let p_i^gum denote the probability of retaining the i-th variable when commencing from the GUM, and applying the same selection test and significance level. Then, the search costs are p_i^dgp − p_i^gum. False rejection frequencies of the null can be lowered by increasing

    the required significance levels of selection tests, but only at the cost of also reducing power. However,

it is feasible to lower the former and raise the latter simultaneously by an improved search algorithm, subject to the bound of attaining the same performance as knowing the DGP from the outset.

    To keep search costs low, any model-selection process must satisfy a number of requirements. First,

    it must start from a congruent statistical model to ensure that selection inferences are reliable: con-

    sequently, it must test for model mis-specification initially, and such tests must be well calibrated (nom-

    inal size close to actual). Secondly, it must avoid getting stuck in search paths that initially inadvertently

    delete relevant variables, thereby retaining many other variables as proxies: consequently, it must search

    many paths. Thirdly, it must check that eliminating variables does not induce diagnostic tests to become

    significant during searches: consequently, model mis-specification tests must be computed at every

    stage. Fourthly, it must ensure that any candidate model parsimoniously encompasses the GUM, so no

    loss of information has occurred. Fifthly, it must have a high probability of retaining relevant variables:

    consequently, a loose significance level and powerful selection tests are required. Sixthly, it must have

    a low probability of retaining variables that are actually irrelevant: consequently, this clashes with the

    fifth objective in part, but requires an alternative use of the available information. Finally, it must have

    powerful procedures to select between the candidate models, and any models derived from them, to end

    with a good model choice, namely one for which:

    L = Σ_{i=1}^{k} |p_i^dgp − p_i^gum|


    is close to zero.

    4.2 Selection probabilities

    When searching a large database for that DGP, an investigator could well retain the relevant regressors

    much less often than when the correct specification is known, in addition to retaining irrelevant variables

    in the finally-selected model. We first examine the problem of retaining significant variables commen-

    cing from the DGP, then turn to any additional power losses resulting from search.

For a regression coefficient β_i, hypothesis tests of the null H0: β_i = 0 will reject with a probability dependent on the non-centrality parameter of the test. We consider the slightly more general setting where t-tests are used to check an hypothesis, denoted t(n, ψ) for n degrees of freedom, where ψ is the non-centrality parameter, equal to zero under the null. For a critical value c_α, P(|t| ≥ c_α | H0) = α, where H0 implies ψ = 0. The following table records some approximate power calculations when one coefficient null hypothesis is tested and when four are tested, in each case precisely once.

    t-test powers
    ψ     n     α     P(|t| ≥ c_α)    P(|t| ≥ c_α)^4
    1    100   0.05       0.16            0.001
    2     50   0.05       0.50            0.063
    2    100   0.01       0.26            0.005
    3     50   0.01       0.64            0.168
    4     50   0.05       0.98            0.902
    4     50   0.01       0.91            0.686
    6     50   0.01       1.00            0.997

Thus, there is little hope of retaining variables with ψ = 1, and only a 50:50 chance of retaining a single variable with a theoretical |t| of 2 when the critical value is also 2, falling to 30:70 for a critical value of 2.6. When ψ = 3, the power of detection is sharply higher, but still leads to more than 35% mis-classifications. Finally, when ψ = 4, one such variable will almost always be retained.

However, the final column shows that the probability of retaining all four relevant variables with the given non-centrality is essentially negligible even when they are independent, except in the last few cases. Mixed cases (with different values of ψ) can be calculated by multiplying the probabilities in the fourth column (e.g., for ψ = 2, 3, 4, 6 the joint P(·) ≈ 0.15 at α = 0.01). Such combined probabilities are highly non-linear in ψ, since one is almost certain to retain all four when ψ = 6, even

    at a 1% significance level. The important conclusion is that, despite knowing the DGP, low signal-

    noise variables will rarely be retained using t-tests when there is any need to test the null; and if there

    are many relevant variables, all of them are unlikely to be retained even when they have quite large

    non-centralities.
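The table above can be reproduced, to rounding, from the non-central t-distribution; the following sketch is my own check using SciPy, with the columns matching (ψ, n, α, single-test power, power to the fourth):

```python
# Power of a two-sided t-test with n degrees of freedom, non-centrality psi and
# level alpha; the final column raises it to the fourth power, as in the table.
from scipy import stats

def power(psi, n, alpha):
    c = stats.t.ppf(1 - alpha / 2, n)
    d = stats.nct(n, psi)
    return d.sf(c) + d.cdf(-c)

for psi, n, alpha in [(1, 100, 0.05), (2, 50, 0.05), (2, 100, 0.01),
                      (3, 50, 0.01), (4, 50, 0.05), (4, 50, 0.01), (6, 50, 0.01)]:
    p1 = power(psi, n, alpha)
    print(psi, n, alpha, round(p1, 2), round(p1 ** 4, 3))
```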

    4.3 Deletion probabilities

    The most extreme case where low deletion probabilities might entail high search costs is when many

    variables are included but none actually matters. PcGets systematically checks the reducibility of the

GUM by testing simplifications up to the empty model. A one-off F-test F_G of the GUM against the null model using critical value c_γ would have size P(F_G ≥ c_γ) = γ under the null if it were the only test implemented. Consequently, path searches would only commence 100γ% of the time, and some of these

    could also terminate at the null model. Let there be k regressors in the GUM, of which n are retained


    when t-test selection is used should the null model be rejected. In general, when there are no relevant

variables, the probability of retaining no variables using t-tests with critical value c_α is:

    P(|t_i| < c_α, ∀i = 1, ..., k) = (1 − α)^k.    (21)

Combining (21) with the F_G-test, the null model will be selected with approximate probability:

    p_G = (1 − γ) + γ(1 − α)^k,    (22)

where the second term approximates the probability of F_G rejecting yet no regressors being retained (conditioning on F_G ≥ c_γ cannot decrease the probability of at least one rejection). Since γ is set at quite a high value, such as 0.20, whereas α = 0.05 is more usual, F_G ≥ c_0.20 can occur without any |t_i| ≥ c_0.05. Evaluating (22) for γ = 0.20, α = 0.05 and k = 20 yields p_G ≈ 0.87; whereas the re-run of the Hoover-Perez experiments with k = 40 reported by Hendry and Krolzig (1999b) using α = 0.01 yielded 97.2% in the Monte Carlo as against a theory prediction from (22) of 99%. Alternatively, when γ = 0.1 and α = 0.01, (22) has an upper bound of 96.7%, falling to 91.3% for α = 0.05. Thus, it is relatively easy

    to obtain a high probability of locating the null model, even when 40 irrelevant variables are included,

    using relatively tight significance levels, or a reasonable probability for looser significance levels.
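Equation (22) is easy to evaluate directly; the sketch below (my own check) reproduces the figures quoted in this sub-section:

```python
# p_G: approximate probability of selecting the null model when no regressor
# matters, combining the one-off F-test (size gamma) with k t-tests (size alpha).
def p_null(gamma, alpha, k):
    return (1 - gamma) + gamma * (1 - alpha) ** k

print(round(p_null(0.20, 0.05, 20), 2))   # ~0.87
print(round(p_null(0.10, 0.01, 40), 3))   # ~0.967
print(round(p_null(0.10, 0.05, 40), 3))   # ~0.913
```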

    4.4 Path selection probabilities

    We now calculate how many spurious regressors will be retained in path searches. The probability

distribution of one or more null coefficients being significant in pure t-test selection at significance level α is given by the k + 1 terms of the binomial expansion of:

    (α + (1 − α))^k.

The following table illustrates by enumeration for k = 3:

    event                                       probability       number retained
    P(|t_i| < c_α, i = 1, ..., 3)               (1 − α)^3                0
    P(|t_i| ≥ c_α, |t_j| < c_α, j ≠ i)          3α(1 − α)^2              1
    P(|t_i| < c_α, |t_j| ≥ c_α, j ≠ i)          3α^2(1 − α)              2
    P(|t_i| ≥ c_α, i = 1, ..., 3)               α^3                      3

Thus, for k = 3, the average number of variables retained is:

    n̄ = 3α^3 + 2 × 3α^2(1 − α) + 3α(1 − α)^2 = 3α = kα.

The result n̄ = kα is general. When α = 0.05 and k = 40, n̄ equals 2, falling to 0.4 for α = 0.01: so

    even if only t-tests are used, few spurious variables will be retained.

Combining the probability of a non-null model with the number of variables selected when the GUM F-test rejects:

    p = γα,

(where p is the probability any given variable will be retained), which does not depend on k. For γ = 0.1, α = 0.01, we have p = 0.001. Even for γ = 0.25 and α = 0.05, p = 0.0125 before search paths and diagnostic testing are included in the algorithm. The actual behaviour of PcGets is much more complicated than this, but can deliver a small overall size. Following the event F_G ≥ c_γ when γ = 0.1 (so the null is incorrectly rejected 10% of the time), and approximating by 0.5 variables retained when


    that occurs, then the average non-deletion probability (i.e., the probability any given variable will be

retained) is p_r = γn̄/k = 0.125%, as against the reported value of 0.19% found by Hendry and Krolzig

    (1999b). These are very small retention rates of spuriously-significant variables.

    Thus, in contrast to the relatively high costs of inference discussed in the previous section, those

    of search arising from retaining additional irrelevant variables are almost negligible. For a reasonable

    GUM with (say) 40 variables where 25 are irrelevant, even without the pre-selection and multiple path

    searches of PcGets, and using just t-tests at 5%, roughly one spuriously significant variable will be

    retained by chance. Against that, from the previous section, there is at most a 50% chance of retainingeach of the variables that have non-centralites around 2, and little chance of keeping them all: the

    difficult problem is retention of relevance, not elimination of irrelevance. The only two solutions are

    better inference procedures, or looser critical values; we will consider them both.
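A small Monte Carlo sketch (my own illustrative setup, not one of the experiments reported later) confirms the n̄ = kα claim for pure t-test selection on irrelevant regressors:

```python
# With k irrelevant orthogonal regressors, pure t-test selection at level alpha
# retains about k*alpha variables on average.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
T, k, alpha, reps = 100, 40, 0.05, 500
crit = stats.t.ppf(1 - alpha / 2, T - k)
retained = []
for _ in range(reps):
    X = rng.normal(size=(T, k))
    y = rng.normal(size=T)                     # no regressor is relevant
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    s2 = resid @ resid / (T - k)
    t_vals = beta / np.sqrt(s2 * np.diag(XtX_inv))
    retained.append(int(np.sum(np.abs(t_vals) >= crit)))
print(np.mean(retained))                       # close to k*alpha = 2
```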

    4.5 Improved inference procedures

An inference procedure involves a sequence of steps. As a simple example, consider a procedure comprising two F-tests: the first is conducted at the γ = 50% level, the second at α = 5%. The variables to be tested are first ordered by their t-values in the GUM, such that t_1^2 ≤ t_2^2 ≤ ... ≤ t_k^2, and the first F-test adds in variables from the smallest observed t-values till a rejection would occur, with either F_1 > c_γ or an individual |t| > c̄ (say). All those variables except the last are then deleted from the model, and a second F-test conducted of the joint null that the remaining variables are all irrelevant. If that rejects, so F_2 > c_α, all the remaining variables are retained; otherwise, all are eliminated. We will now analyze the probability properties of this 2-step test when all k regressors are orthogonal for a regression model estimated from T observations.

    Once m variables are included in the first step, non-rejection requires that (a) the diagnostics are

insignificant; (b) m − 1 variables did not induce rejection; (c) |t_m| < c̄; and (d):

    F_1(m, T − k) ≃ (1/m) Σ_{i=1}^{m} t_i^2 ≤ c_γ.    (23)

Clearly, any t_i^2 ≤ 1 reduces the mean F_1 statistic, and since P(|t_i| < 1) ≈ 0.68, when k = 40 approximately 28 variables fall in that group; and P(|t_i| ≥ 1.65) ≈ 0.1, so only 4 variables should chance to have a larger |t_i| value on average. In the conventional setting where α = 0.05 with P(|t_i| < 2) ≈ 0.95, only 2 variables will chance to have larger t-values, whereas slightly more than half will have t_i^2 < 0.5 or smaller. Since P(F_1(20, 100) < 1 | H0) ≈ 0.53, a first step with γ = 0.5 should eliminate all variables with t_i^2 ≤ 1, and some larger t-values as well (hence the need to check that |t_m| < c̄); below we explain why collinearity between variables that matter and those that do not should not jeopardize this step.

    A crude approximation to the likely value of (23) under H0 is to treat all t-values within blocks as

having a value equal to the mid-point. We use the five ranges t_i^2 < 0.5, 1, 1.65^2, 4, and greater than 4, using the expected numbers falling in each of the first four blocks, which yields:

    F_1(38, 100) ≈ (1/38)[0.25 × 20 + 0.75 × 8 + 1.33^2 × 8 + 1.82^2 × 2] = 31.8/38 ≈ 0.84,

noting P(F_1(38, 100) ≥ 0.84 | H0) ≈ 0.72 (setting all t's equal to the upper bound of each block yields an illustrative upper bound of about 1.3 for F_1). Thus, surprisingly large values of γ, such as 0.75, can be selected for this step yet have a high probability of eliminating almost all the irrelevant variables. Indeed, using γ = 0.75 entails c_γ ≈ 0.75 when m = 20, since:

    P(F_1(20, 100) ≥ 0.75 | H0) ≈ 0.75,


or c_γ ≈ 0.8 for m = 30.

When the second F-test is invoked for a null model, it will falsely reject more than α% of the
time, since all small t-values have already been eliminated, but the resulting model will still be tiny
in comparison to the GUM. Conversely, this procedure has a much higher probability of retaining a
block of relevant variables. For example, commencing with 40 regressors of which m = 35 (say) were
correctly eliminated, should the 5 remaining variables all have expected t-values of two (the really
difficult case in section 4.2) then:

    E[F2(5, 100)] ≈ (1/5) Σ_{i=36}^{40} E[t²_i] ≈ 4.    (24)

When α = 0.05, c_α ≈ 2.3 and:

    P( F2(5, 100) ≥ 2.3 | E[F2] = 4 ) > 0.99

(using a non-central χ² approximation to F2), thereby almost always retaining all five relevant
variables. This is obviously a dramatic improvement over the near-zero probability of retaining all five

    variables using t-tests on the DGP in section 4.2. Practical usage of PcGets suggests its operational

    characteristics are well described by this analysis.
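To illustrate the analysis, the two-step procedure can also be simulated. The sketch below is a simplified stand-in rather than the PcGets algorithm itself: it assumes T = 100 and k = 40 orthogonal regressors of which the first five have population t-values of about 2, uses γ = 0.75 and α = 0.05, and omits the diagnostic checks; the unconditional retention frequencies it produces will therefore differ from the conditional calculations above.

    # Monte Carlo sketch of the two-step F-test procedure: step 1 deletes the block
    # of smallest squared t-values at a loose level gamma (stopping early if an
    # individual |t| exceeds c_alpha), step 2 keeps or drops the survivors as a block.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    T, k, reps = 100, 40, 2000
    gamma, alpha = 0.75, 0.05
    beta = np.zeros(k)
    beta[:5] = 2 / np.sqrt(T)                     # population |t| of about 2

    kept_all_five, irrelevant_kept = 0, 0
    for _ in range(reps):
        X = rng.standard_normal((T, k))
        y = X @ beta + rng.standard_normal(T)
        b, rss = np.linalg.lstsq(X, y, rcond=None)[:2]
        dof = T - k
        s2 = rss[0] / dof
        se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
        t2 = (b / se) ** 2
        c_ind = stats.t.ppf(1 - alpha / 2, dof)   # individual |t| safeguard
        order = np.argsort(t2)                    # smallest squared t-values first
        m = 0                                     # number of variables deleted in step 1
        while m < k:
            c_gamma = stats.f.ppf(1 - gamma, m + 1, dof)
            if t2[order[:m + 1]].mean() > c_gamma or t2[order[m]] > c_ind ** 2:
                break
            m += 1
        survivors = order[m:]
        keep = False
        if survivors.size:                        # step 2: block F-test at level alpha
            c_alpha = stats.f.ppf(1 - alpha, survivors.size, dof)
            keep = t2[survivors].mean() > c_alpha
        kept = set(survivors.tolist()) if keep else set()
        kept_all_five += {0, 1, 2, 3, 4} <= kept
        irrelevant_kept += len(kept - {0, 1, 2, 3, 4})

    print("P(retain all five relevant variables):", kept_all_five / reps)
    print("average irrelevant variables retained:", irrelevant_kept / reps)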

    5 PcGets

    PcGets attempts to meet all of the criteria in section 4. First, it always starts from a congruent general lin-

    ear, dynamic statistical model, using a battery of mis-specification tests to ensure congruence. Secondly,

    it is recommended that the GUM have near orthogonal, non-integrated regressors so test outcomes are

relatively orthogonal. Then PcGets conducts pre-selection tests at a loose significance level (25% or
10%, say) to remove variables that are highly irrelevant, and simplifies the model to be searched
accordingly by eliminating those variables. It then explores multiple selection paths, each of which
begins by eliminating one or more statistically-insignificant variables, with diagnostic tests checking
the validity of all reductions, thereby ensuring a congruent final model. Path searches continue till no
further reductions are feasible, or a diagnostic test rejects. All the viable terminal selections resulting from

    these search paths are stored. If there is more than one terminal model, parsimonious encompassing

    tests are conducted of each against their union to eliminate models that are dominated and do not dom-

inate any others. If a unique outcome does not result, the search procedure is repeated from the

    new union. Finally, if mutually-encompassing contenders remain, information criteria are used to select

    between these terminal reductions. Additionally, sub-sample significance is used to assess the reliability

    of the resulting model choice. For further details, see e.g., Hendry and Krolzig (1999b).

    There is little research on how to design model-search algorithms in econometrics. The search

    procedure must have a high probability of retaining variables that do matter in the LDGP, and eliminating

    those that do not. To achieve that goal, PcGets uses encompassing tests between alternative reductions.

    Balancing the objectives of small size and high power still involves a trade-off, but one that is dependent

    on the algorithm: the upper bound is probably determined by the famous lemma in Neyman and Pearson

    (1928). Nevertheless, to tilt the size-power balance favourably, sub-sample information is also exploited,

building on the further development in Hoover and Perez (1999) of investigating split samples for significance

    (as against constancy). Since non-central t-values diverge with increasing sample size, whereas central

t-values fluctuate around zero, the latter have a low probability of exceeding any given critical value in
two sub-samples, even when those samples overlap. Thus, adventitiously-significant variables may be

    revealed by their insignificance in one or both of the sub-samples.
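The effect is easy to quantify. The sketch below assumes, purely for illustration, two non-overlapping half-samples of a T = 100 sample, a 5% two-sided criterion, and a full-sample population |t| of 4 for the relevant variable (so roughly 4/√2 per half-sample).

    # Sketch: probability of being significant in both half-samples for a central
    # t-statistic versus a non-central one (independent halves assumed here).
    from scipy import stats

    T, alpha = 100, 0.05
    T_half = T // 2
    c = stats.t.ppf(1 - alpha / 2, T_half)        # half-sample critical value

    def p_significant(noncentrality):
        # two-sided rejection probability for a (non-)central t with T_half dof
        return (1 - stats.nct.cdf(c, T_half, noncentrality)
                + stats.nct.cdf(-c, T_half, noncentrality))

    psi_half = 4.0 / 2 ** 0.5                     # population |t| shrinks with sqrt(sample size)
    print("irrelevant variable:", p_significant(0.0) ** 2)      # about 0.0025
    print("relevant variable  :", p_significant(psi_half) ** 2)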


    PcGets embodies some further developments. First, PcGets undertakes pre-search simplification

    F-tests to exclude variables from the general unrestricted model (GUM), after which the GUM is re-

    formulated. Since variables found to be irrelevant on such tests are excluded from later analyses, this

    step uses a loose significance level (such as 10%). Next, many possible paths from that GUM are in-

vestigated: reduction paths considered include both multiple and single deletions, so t and/or F

    test statistics are used as simplification criteria. The third development concerns the encompassing step:

    all distinct contending valid reductions are collected, and encompassing is used to test between these

(usually non-nested) specifications. Models which survive encompassing are retained; all encompassed equations are rejected. If multiple models survive this testimation process, their union forms a new

    general model, and selection path searches recommence. Such a process repeats till a unique contender

    emerges, or the previous union is reproduced, then stops. Fourthly, the diagnostic tests require careful

    choice to ensure they characterize the salient attributes of congruency, are correctly sized, and do not

    overly restrict reductions. A further improvement concerns model choice when mutually-encompassing

    distinct models survive the encompassing step. A minimum standard error rule, as used by Hoover and

Perez (1999), will probably over-select as it corresponds to retaining all variables with |t| > 1. Instead, we employ information criteria which penalize the likelihood function for the number of parameters. Fi-

    nally, sub-sample information is used to accord a reliability score to variables, which investigators may

    use to guide their model choice. In Monte Carlo experiments, a progressive research strategy (PRS)

    can be formulated in which decisions on the final model choice are based on the outcomes of such

reliability measures.

    5.1 The multi-path reduction process of PcGets

    The starting point for Gets model-selection is the general unrestricted model, so the key issues concern

    its specification and congruence. The larger the initial regressor set, the more likely adventitious effects

    will be retained; but the smaller the GUM, the more likely key variables will be omitted. Further, the

    less orthogonality between variables, the more confusion the algorithm faces, leading to a proliferation

of mutually-encompassing models, where final choices may only differ marginally (e.g., lag 2 versus 1).1

Finally, the initial specification must be congruent, with no mis-specification tests failed at the outset. Empirically, the GUM would be revised if such tests rejected, and little is known about the consequences

    of doing so (although PcGets will enable such studies in the near future). In Monte Carlo experiments,

    the program automatically changes the significance levels of such tests.

    The reduction path relies on a classical, sequential-testing approach. The number of paths is in-

    creased to try all single-variable deletions, as well as various block deletions from the GUM. Different

critical values can be set for multiple and single selection tests, and for diagnostic tests. Denote by δ
the significance level for the mis-specification tests (diagnostics) and by α the significance level for the
selection t-tests (we ignore F tests for the moment). The corresponding p-values of these are denoted
p̂_δ and p̂_α, respectively. During the specification search, the current specification is simplified only if
no diagnostic test rejects its null. This corresponds to a likelihood-based model evaluation, where the
likelihood function of model M is given by the density:

    L_M(θ_M) = f_M(Y; θ_M)   if   min p̂_δ(Y; θ_M) ≥ δ.

A relevant variable should remain significant in both sub-samples of size T1, whereas the opposite holds
for the jth if there is not too much sample overlap. Consequently, a progressive research strategy (shown
as PRS below) can gradually eliminate

    adventitiously-significant variables. Hoover and Perez (1999) found that by adopting a progressive

search procedure (as in Stage III), the number of spurious regressors can be lowered (inducing a lower

    overall size), without losing much power. Details of the resulting algorithm are shown in Table 2.

    5.2 Settings in PcGets

The testimation process of PcGets depends on the choice of:

  • the n diagnostic checks in the test battery;
  • the parameters of these diagnostic tests;
  • the significance levels of the n diagnostics;
  • pre-search F-test simplification;
  • the significance levels of such tests;
  • the simplification tests (t and/or F);
  • the significance levels of the simplification tests;
  • the significance levels of the encompassing tests;
  • the sub-sample split;
  • the significance levels of the sub-sample tests;
  • the weights accorded to measure reliability.

    The choice of mis-specification alternatives determines the number and form of the diagnostic tests.

    Their individual significance levels in turn determine the overall significance level of the test battery.

Since significant diagnostic-test values terminate search paths, they act as constraints on moving away from the GUM.

3 The information criteria are defined as follows:

    AIC = -2 log L/T + 2n/T,
    SC  = -2 log L/T + n log(T)/T,
    HQ  = -2 log L/T + 2n log(log(T))/T,

where L is the maximized likelihood, n is the number of parameters and T is the sample size: see Akaike (1985), Schwarz
(1978), and Hannan and Quinn (1979).
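These formulae translate directly into code; a minimal sketch (the log-likelihood, parameter count, and sample size in the example call are placeholders):

    # Information criteria from footnote 3: each penalizes -2 log L / T by a term
    # in the number of parameters n and the sample size T.
    import math

    def info_criteria(loglik, n, T):
        base = -2.0 * loglik / T
        return {"AIC": base + 2.0 * n / T,
                "SC":  base + n * math.log(T) / T,
                "HQ":  base + 2.0 * n * math.log(math.log(T)) / T}

    print(info_criteria(loglik=-140.2, n=6, T=100))   # illustrative values only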


Table 2   Additions to the basic PcGets algorithm.

Stage 0

(1) Pre-simplification and testing of the GUM
    (a) If a diagnostic test fails for the GUM, the significance level of that test is adjusted, or the
        test is excluded from the test battery during simplifications of the GUM;
    (b) if all variables are significant, the GUM is the final model, and the algorithm stops;
    (c) otherwise, F-tests of sets of individually-insignificant variables are conducted:
        (i) if one or more diagnostic tests fails, that F-test reduction is cancelled, and the algorithm
            returns to the previous step;
        (ii) if all diagnostic tests are passed, the blocks of variables that are insignificant are re-
            moved and a simpler GUM specified;
        (iii) if all diagnostic tests are passed, and all blocks of variables are insignificant, the null
            model is the final model.

Stage III

(1) Post-selection sub-sample evaluation
    (a) Test the significance of every variable in the final model from Stage II in two overlapping
        sub-samples (e.g., the first and last r%):
        (i) if a variable is significant overall and in both sub-samples, accord it 100% reliable;
        (ii) if a variable is significant overall and in only one sub-sample, accord it 75% reliable;
        (iii) if a variable is significant overall but in neither sub-sample, accord it 50% reliable;
        (iv) if a variable is insignificant overall but significant in both sub-samples, accord it 50% reliable;
        (v) if a variable is insignificant overall and significant in only one sub-sample, accord it 25% reliable;
        (vi) if a variable is insignificant overall and significant in neither sub-sample, accord it 0% reliable.
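The Stage III scoring rule amounts to a small lookup over the three significance outcomes; a minimal sketch of the rule as stated in Table 2:

    # Reliability score from Table 2, Stage III: full-sample significance plus the
    # number of sub-samples (0, 1 or 2) in which the variable is significant.
    def reliability(sig_full: bool, sig_sub1: bool, sig_sub2: bool) -> float:
        n_sub = int(sig_sub1) + int(sig_sub2)
        if sig_full:
            return {2: 1.00, 1: 0.75, 0: 0.50}[n_sub]
        return {2: 0.50, 1: 0.25, 0: 0.00}[n_sub]

    print(reliability(True, True, False))   # significant overall and in one sub-sample: 0.75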

Thus, if a search is to progress towards an appropriate simplification, such tests must
be well focused and have the correct size. The pre-search tests were analyzed above, as were the path
searches. The choices of critical values for pre-selection, selection and encompassing tests are important
for the success of PcGets: the tighter the size, the fewer the spurious inclusions of irrelevant variables,
but the more the false exclusions of relevant variables. In the final analysis, the calibration of PcGets
depends on the characteristics valued by the user: if PcGets is employed as a first pre-selection step in
a user's research agenda, the optimal values of α, γ, and δ may be higher than when the focus is on
controlling the overall size of the selection process. The non-expert user settings reflect this.

    In section 6, we will use simulation techniques to investigate the calibration of PcGets for the op-

    erational characteristics of the diagnostic tests, the selection probabilities of DGP variables, and the

    deletion probabilities of non-DGP variables. However, little research has been undertaken to date to

    optimize any of the choices, or to investigate the impact on model selection of their interactions.

    5.3 Limits to PcGets

    Davidson and Hendry (1981, p.257) mentioned four main problems in the general-to-specific method-

    ology: (i) the chosen general model can be inadequate, comprising a very special case of the DGP;

    (ii) data limitations may preclude specifying the desired relation; (iii) the non-existence of an optimal

    sequence for simplification leaves open the choice of reduction path; and (iv) potentially-large type-II


    error probabilities of the individual tests may be needed to avoid a high type-I error of the overall se-

    quence. By adopting the multiple path development of Hoover and Perez (1999), and implementing a

range of important improvements, PcGets overcomes many of the problems associated with points (iii) and

    (iv). However, the empirical success of PcGets must depend crucially on the creativity of the researcher

in specifying the general model and the feasibility of estimating it from the available data, aspects bey-

    ond the capabilities of the program, other than the diagnostic tests serving their usual role of revealing

    model mis-specification.

There is a central role for economic theory in the modelling process: in prior specification, prior simplification, and suggesting admissible data transforms. The first of these relates to the inclusion of

    potentially-relevant variables, the second to the exclusion of irrelevant effects, and the third to the ap-

    propriate formulations in which the influences to be included are entered, such as log or ratio transforms

    etc., differences and cointegration vectors, and any likely linear transformations that might enhance

    orthogonality between regressors. The LSE approach argued for a close link of theory and model,

    and explicitly opposed running regressions on every variable on the database as in Lovell (1983) (see

    e.g., Hendry and Ericsson, 1991a). PcGets currently focuses on general-to-simple reductions for linear,

    dynamic, regression models, and economic theory often provides little evidence for specifying the lag

    lengths in empirical macro-models. Even when the theoretical model is dynamic, the lags are usually

    chosen either for analytical convenience (e.g., first-order differential equation systems), or to allow for

    certain desirable features (as in the choice of a linear second-order single-equation model to replicate

    cycles). Therefore, we adopt the approach of starting with an unrestricted rational-lag model with a

    maximal lag length set according to available evidence (e.g., as 4 or 5 for quarterly time series, to allow

    for seasonal dynamics). Prior analysis remains essential for appropriate parameterizations; functional

    forms; choice of variables; lag lengths; and indicator variables (including seasonals, special events, etc.).

    Orthogonalization helps notably in selecting a unique representation; as does validly reducing the initial

GUM. The present performance of PcGets on previously-studied empirical problems is impressive, even
when the GUM is specified in highly inter-correlated, and probably non-stationary, levels. Hopefully,
PcGets' support in automating the reduction process will enable researchers to concentrate their efforts

    on designing the GUM: that could again significantly improve the empirical success of the algorithm.

    5.3.1 Collinearity

    Perfect collinearity denotes an exact linear dependence between variables; perfect orthogonality denotes

    no linear dependencies. However, any state in between these is both harder to define and to measure

    as it depends on which version of a model is inspected. Most econometric models contain subsets of

    variables that are invariant to linear transformations, whereas measures of collinearity are not invariant:

    if two standardized variables x and z are nearly perfectly correlated, each can act as a close proxy for

the other, yet x + z and x - z are almost uncorrelated. Moreover, observed correlation matrices are not reliable indicators of potential problems in determining if either or both variables should enter a model:

    the source of their correlation matters. For example, inter-variable correlations of 0.9999 easily arise

    in systems with unit roots and drift, but there is no difficulty determining the relevance of variables.

Conversely, in the simple bivariate normal:

    (x_t, z_t)′ ∼ IN_2[ (0, 0)′, Ω ]   with   Ω = (1, ρ; ρ, 1),    (25)

where we are interested in the DGP:

    y_t = x_t + z_t + ε_t    (26)


(for a well-behaved ε_t, say), when ρ = 0.9999 there would be almost no hope of determining which

    variables mattered in (26), even if the DGP formulation were known. In economic time series, however,

    the former case is common, whereas (25) is almost irrelevant (although it might occur when trying to

    let estimation determine which of several possible measures of a variable is best). Transforming the

    variables to a near orthogonal representation before modelling would substantially resolve this prob-

    lem, but otherwise, eliminating one of the two variables seems inevitable. Of course, which is dropped

    depends on the vagaries of sampling, and that might be thought to induce considerable unmeasured

    uncertainty, as the chosen model oscillates between retaining xt or zt. However, either variable indi-vidually is a near-perfect proxy for the dependence of yt on xt + zt, and so long as the entire system

    remains constant, selecting either, or the appropriate sum, does not actually increase the uncertainty

    greatly. That remains true even when one of the variables is irrelevant, although then the multiple-path

    search is highly likely to select the correct equation. And if the system is not constant, the collinearity

    will be broken.

Nevertheless, the outcome of a Monte Carlo model-selection study of (26) given (25) when ρ =

    0.9999 might suggest that model uncertainty was large and coefficient estimates badly biased

    simply because different variables were retained in different replications. The appropriate metric is to

    see how well xt + zt is captured. In some cases, models are estimated to facilitate economic policy,

and in such a collinear setting, changing only one variable will not have the anticipated outcome,

    although it will end the collinearity and so allow precise estimates of the separate effects. Transforming

    the variables to a near orthogonal representation before modelling is assumed to have occurred in the

    remainder of the chapter. By having a high probability of selecting the LDGP in such an orthogonal

    setting, the reported uncertainties (such as estimated coefficient standard errors) in PcGets are not much

    distorted by selection effects.
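The argument can be checked by simulation. The sketch below mimics (25)-(26) with ρ = 0.9999 and, as a deliberately crude stand-in for model selection, keeps whichever single regressor fits best: the variable retained flips across replications, yet the fitted values track x_t + z_t almost perfectly.

    # Monte Carlo sketch of the collinearity example: with rho near one, either x or
    # z alone is a near-perfect proxy for x + z, so which one is "selected" hardly
    # affects how well the combination is captured.
    import numpy as np

    rng = np.random.default_rng(1)
    T, rho, reps = 100, 0.9999, 500
    chol = np.linalg.cholesky(np.array([[1.0, rho], [rho, 1.0]]))

    picked_x, fit_corr = 0, []
    for _ in range(reps):
        x, z = (rng.standard_normal((T, 2)) @ chol.T).T
        y = x + z + rng.standard_normal(T)        # DGP (26)
        bx = (x @ y) / (x @ x)                    # fit on x alone
        bz = (z @ y) / (z @ z)                    # fit on z alone
        rss_x = np.sum((y - bx * x) ** 2)
        rss_z = np.sum((y - bz * z) ** 2)
        yhat = bx * x if rss_x < rss_z else bz * z
        picked_x += rss_x < rss_z
        fit_corr.append(np.corrcoef(yhat, x + z)[0, 1])

    print("share of replications retaining x :", picked_x / reps)    # near one half
    print("mean correlation of fit with x + z:", np.mean(fit_corr))  # near one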

    5.4 Integrated variables

    To date, PcGets conducts all inferences as I(0). Most selection tests will in fact be valid even when the

    data are I(1), given the results in, say, Sims, Stock and Watson (1990). Only t- or F-tests for an effect

    that corresponds to a unit root require non-standard critical values. The empirical examples on I(1) dataprovided below do not reveal problems, but in principle it would be useful to implement cointegration

    tests and appropriate transformations after stage 0, and prior to stage I reductions.

    Similarly, Wooldridge (1999) shows that diagnostic tests on the GUM (and presumably simplifica-

    tions thereof) remain valid even for integrated time series.

    6 Some Monte Carlo results

    6.1 Aim of the Monte Carlo

    Although the sequential nature of PcGets and its combination of variable-selection and diagnostic test-

    ing has eluded most attempts at theoretical analysis, the properties of the PcGets model-selection process

    can be evaluated in Monte Carlo (MC) experiments. In the MC considered here, we aim to measure the

    size and power of the PcGets model-selection process, namely the probability of inclusion in the

    final model of variables that do not (do) enter the DGP.

    First, the properties of the diagnostic tests under the potential influence of nuisance regressors are

    investigated. Based on these results, a decision can be made as to which diagnostics to include in the test

battery. Then the size and power of PcGets are compared to the empirical and theoretical properties


of a classical t-test. Finally, we analyze how the success and failure of PcGets are affected by the

    choice of: (i) the significance levels of the diagnostic tests; and (ii) the significance levels of the

    specification tests.

    6.2 Design of the Monte Carlo

    The Monte Carlo simulation study of Hoover and Perez (1999) considered the Lovell database, which

    embodies many dozens of relations between variables as in real economies, and is of the scale and com-

    plexity that can occur in macro-econometrics: the rerun of those experiments using PcGets is discussed

in Hendry and Krolzig (1999b). In this paper, we consider a simpler experiment which, however, allows

    an analytical assessment of the simulation findings. The Monte Carlo reported here uses only stages I

    and II in table 1: Hendry and Krolzig (1999b) show the additional improvements that can result from

    adding stages 0 and III to the study in Hoover and Perez (1999).

    The DGP is a Gaussian regression model, where the strongly-exogenous variables are Gaussian

    white-noise processes:

    y_t = Σ_{k=1}^{5} β_{k,0} x_{k,t} + ε_t,    ε_t ∼ IN[0, 1],    (27)

    x_t = v_t,    v_t ∼ IN_10[0, I_10]    for t = 1, . . . , T,

where β_{1,0} = 2/√T, β_{2,0} = 3/√T, β_{3,0} = 4/√T, β_{4,0} = 6/√T, β_{5,0} = 8/√T.

    The GUM is an ADL(1, 1) model which includes as non-DGP variables the lagged endogenous

variable y_{t-1}, the strongly-exogenous variables x_{6,t}, . . . , x_{10,t} and the first lags of all regressors:

    y_t = β_{0,1} y_{t-1} + Σ_{k=1}^{10} Σ_{i=0}^{1} β_{k,i} x_{k,t-i} + β_{0,0} + u_t,    u_t ∼ IN[0, σ²].    (28)

    The sample size T is 100 or 1000 and the number of replications M is 1000.

The orthogonality of the regressors allows an easier analysis. Recall that the t-test of the null β_k = 0
versus the alternative β_k ≠ 0 is given by:

    t_k = β̂_k / σ̂(β̂_k) = β̂_k / √( σ̂² (X′X)⁻¹_kk ).

The population value of the t-statistic is:

    t*_k = β_k / σ(β̂_k) = β_k / √( T⁻¹ σ² Q⁻¹_kk ),

where the moment matrix Q = lim_{T→∞} T⁻¹ X′X is assumed to exist. Since the regressors are
orthogonal, we have that β̂_k = x_k′y / (x_k′x_k) and σ²(β̂_k) = σ²/(T σ²_k), so:

    t*_k = β_k / σ(β̂_k) = √T β_k σ_k / σ.

Thus the non-zero population t-values are 2, 3, 4, 6, 8. In (28), 17 of 22 regressors are nuisance.
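For concreteness, the data for this design can be generated as follows; the sketch covers only the simulation of (27) and the construction of the GUM regressors in (28), not the selection algorithm itself.

    # Sketch of the Monte Carlo design: DGP (27) with population t-values 2, 3, 4,
    # 6, 8, and the GUM (28) with 22 regressors of which 17 are nuisance.
    import numpy as np

    def simulate_dgp(T, rng):
        betas = np.array([2.0, 3.0, 4.0, 6.0, 8.0]) / np.sqrt(T)   # beta_{k,0}
        x = rng.standard_normal((T, 10))                           # x_t = v_t ~ IN_10[0, I_10]
        y = x[:, :5] @ betas + rng.standard_normal(T)              # epsilon_t ~ IN[0, 1]
        return y, x

    def build_gum(y, x):
        # Regressors of (28): y_{t-1}, x_{k,t} and x_{k,t-1} for k = 1..10, intercept.
        T = len(y)
        X = np.column_stack([y[:-1], x[1:], x[:-1], np.ones(T - 1)])
        return y[1:], X

    rng = np.random.default_rng(42)
    y, x = simulate_dgp(T=100, rng=rng)
    y_gum, X_gum = build_gum(y, x)
    print(X_gum.shape)   # (99, 22)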


    6.3 Evaluation of the Monte Carlo

    The evaluation of Monte Carlo experiments always involves measurement problems: see Hendry (1984).

    A serious problem here is that, with some positive probability, the GUM and the truth will get re-

    jected ab initio on diagnostic tests. Tests are constructed to have non-zero nominal size under their null,

so sometimes the truth will be rejected: and the more often, the more tests are used. Three possible

    strategies suggest themselves: one rejects that data sample, and randomly re-draws; one changes the

    rejection level of the offending test; or one specifies a more general GUM which is congruent. We

    consider these alternatives in turn.

Hoover and Perez (1999) use a criterion of two significant test rejections to discard a sample and re-
draw, which probably slightly favours the performance of Gets. In our Monte Carlo with PcGets, the

    problem is solved by endogenously adjusting the significance levels of tests that reject the GUM (e.g.,

    1% to 0.1%). Such a solution is feasible in a Monte Carlo, but metaphysical in reality, as one could

    never know that a sample from an economy was unrepresentative, since time series are not repeatable.

    Thus, an investigator could never know that the DGP was simpler empirically than the data suggest

    (although such a finding might gradually emerge in a PRS), and so would probably generalize the initial

    GUM. We do not adopt that solution here, partly because of the difficulties inherent in the constructive

use of diagnostic-test rejections, and partly because it is moot whether the PcGets algorithm fails by
overfitting on such aberrant samples when, in a non-replicable world, one would conclude that such

    such alternatives, would also lead to rejection in this setting, unless the investigator knew the truth, and

    knew that she knew it, so no tests were needed. While more research is needed on cases where the DGP

    would be rejected against the GUM, here we allow PcGets to adjust significance levels endogenously.

    Another major decision concerns the basis of comparison: the truth seems to be a natural choice,

    and both Lovell (1983) and Hoover and Perez (1999) measure how often the search finds the DGP

    exactly or nearly. Nevertheless, we believe that finding the DGP exactly is not a good choice of

    comparator, because it implicitly entails a basis where the truth is known, and one is certain that it is

    the truth. Rather, to isolate the costs of selection per se, we seek to match probabilities with the same

    procedures applied to testing the DGP. In each replication, the correct DGP equation is fitted, and thesame selection criteria applied: we then compare the retention rates for DGP variables from PcGets

    with those that occur when no search is needed, namely when inference is conducted once for each

    DGP variable, and additional (non-DGP) variables are never retained.
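Operationally, the benchmark is obtained by fitting the DGP equation itself in every replication and applying the same retention rule to its coefficients; a minimal sketch using simple two-sided t-tests at level α (a simplification of the full PcGets selection criteria):

    # Benchmark comparator: no search, the DGP equation is estimated directly and
    # each DGP variable is retained when |t| exceeds the two-sided critical value.
    import numpy as np
    from scipy import stats

    def dgp_retention_rates(T=100, reps=1000, alpha=0.05, seed=0):
        rng = np.random.default_rng(seed)
        betas = np.array([2.0, 3.0, 4.0, 6.0, 8.0]) / np.sqrt(T)
        crit = stats.t.ppf(1 - alpha / 2, T - 5)
        kept = np.zeros(5)
        for _ in range(reps):
            X = rng.standard_normal((T, 5))
            y = X @ betas + rng.standard_normal(T)
            b, rss = np.linalg.lstsq(X, y, rcond=None)[:2]
            se = np.sqrt(rss[0] / (T - 5) * np.diag(np.linalg.inv(X.T @ X)))
            kept += np.abs(b / se) > crit
        return kept / reps            # retention rate for each DGP variable

    print(dgp_retention_rates())      # roughly 0.5, 0.85, 0.98, 1.0, 1.0 at 5%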

    6.4 Diagnostic tests

    PcGets records the rejection frequencies of both specification and mis-specification tests for the DGP,

    the initial GUM, and the various simplifications thereof based on the selection rules. Figure 1 displays

quantile-quantile (QQ) plots of the empirical distributions of seven potential mis-specification tests for

    the estimated correct specification, the general model, and the finally-selected model. Some strong

deviations from the theoretical distributions (diagonal) are evident: the portmanteau statistic (see Box and Pierce, 1970) rejects serial independence of the errors too often in the correct specification, never in

    the general, and too rarely in the final model. The hetero-x test (see White, 1980) was faced with degrees

    of freedom problems for the GUM, but anyway does not look good for the true and final model either.

    Since this incorrect finite-sample size of the diagnostic tests induces an excessively-early termination of

    any search path, resulting in an increased overall size for variable selection, we decided to exclude the

    portmanteau and the hetero-x diagnostics from the test battery of statistics. Thus, the following results

    use the five remaining diagnostic tests in table 3.
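The size evidence summarized in Figure 1 can be reproduced in outline by simulating a null model, collecting a diagnostic's p-values, and comparing their empirical quantiles with the uniform distribution a correctly-sized test would deliver. A minimal sketch for a normality diagnostic, using the Jarque-Bera statistic as a stand-in for the test actually used in PcGets:

    # Sketch: empirical p-value quantiles of a diagnostic under the null, to be set
    # against the 45-degree line of a QQ plot versus the uniform distribution.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(7)
    T, reps = 100, 1000
    pvals = []
    for _ in range(reps):
        X = np.column_stack([np.ones(T), rng.standard_normal((T, 3))])
        y = X @ np.array([1.0, 0.5, -0.3, 0.2]) + rng.standard_normal(T)
        resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
        pvals.append(stats.jarque_bera(resid).pvalue)

    probs = [0.05, 0.25, 0.5, 0.75, 0.95]
    print(np.round(np.quantile(pvals, probs), 2))   # roughly equal to probs if correctly sized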


[Figure 1 panels: QQ plots of the Chow1, Chow2, portmanteau, normality, AR, hetero and hetero-X diagnostics, plus the distribution of the number of failed tests, for the correct model, the general model, and the final (specific) model.]

Figure 1   Selecting diagnostics: QQ plots for M = 1000 and T = 100.

[Further QQ plots of the diagnostics for the correct model with M = 1000 and T = 1000: Chow1, Chow2, ...]