7/29/2019 Gospodinov Ng
    Minimum Distance Estimation of Possibly

    Non-Invertible Moving Average Models

Nikolay Gospodinov* Serena Ng†

    November 5, 2012

    Abstract

This paper proposes classical and simulation-based minimum distance estimation of moving average (MA) models with non-Gaussian errors. Information in higher order cumulants allows identification of the parameters without imposing invertibility. By removing the invertibility restriction, the presence of a moving average unit root no longer presents a boundary problem that gives rise to non-standard asymptotics. As a result, the minimum distance estimator of the MA(1) model has classical root-T asymptotic normal properties when the moving average root is inside, outside, and on the unit circle. For more general models when the dependence of the cumulants on the model parameters is analytically intractable, we propose a simulation estimator based on auxiliary regressions with parameters that are informative about the higher order cumulants. The method uses an error simulator with a flexible functional form that accommodates a large class of distributions with non-Gaussian features. The simulation estimator is also approximately normally distributed without imposing the a priori assumption of invertibility.

JEL Classification: C13, C15, C22

Keywords: Minimum distance; Non-invertibility; Indirect inference; Identification; Non-Gaussian errors; Generalized lambda distribution

*Concordia University and CIREQ, 1455 de Maisonneuve Blvd. West, Montreal, QC H3G 1M8, Canada. Email: [email protected]

†Columbia University, 420 W. 118 St. MC 3308, New York, NY 10027. Email: [email protected]

We would like to thank Prosper Dovonon, Anders Bredahl Kock, Ivana Komunjer and the participants at the CESG meeting at Queen's University for useful comments and suggestions. The first author gratefully acknowledges financial support from FQRSC, IFM2 and SSHRC. The second author acknowledges financial support from the National Science Foundation (SES-0962431).


    1 Introduction

    Moving average (MA) models can parsimoniously characterize the dynamic behavior of many time

    series processes. The challenges in estimating MA models are two-fold. First, invertible and

    non-invertible moving average processes are observationally equivalent up to the second moments.

    Second, invertibility puts an upper bound of one on all roots of the moving average polynomial,

    rendering estimators with non-normal asymptotic distributions when some roots are on or near the

    unit circle. Existing estimators treat invertible and non-invertible processes separately, requiring the

    researcher to take a stand on the parameter space of interest. While estimators are super-consistent

    under the null hypothesis of a moving average unit root, their distributions are not asymptotically

pivotal. To our knowledge, no estimator of the MA model exists that achieves identification without

    imposing invertibility and yet enables classical inference over the whole parameter space.

    Both invertible and non-invertible representations can be consistent with economic theory. For

example, if the logarithm of the asset price is the sum of a random walk component and a stationary component, the first difference (or asset returns) is generally invertible, but non-invertibility can

    arise if the variance of the stationary component is large. While non-invertible models are not ruled

    out by theory, invertibility is often the mainstream assumption in empirical work. One reason is that

    non-invertible models are not useful for forecasting because future values of the endogenous variable

are not observable. The more practical reason is that the assumption provides the identification

    restrictions without which maximum likelihood and covariance structure-based estimation of MA

    models would not be possible when the data are normally distributed.1 Obviously, falsely assuming

invertibility will yield an inferior fit of the data. It can also lead to spurious estimates of the impulse

coefficients which are often the objects of interest. Hansen and Sargent (1991), Lippi and Reichlin

(1993), Fernández-Villaverde et al. (2007), among others, emphasize the need to verify invertibility

because it affects how we interpret what can be recovered from the data. Indeed, it is necessary in

many science and engineering applications to admit parameter values in the non-invertible range.2

A key finding in these studies is that higher order cumulants are necessary for identification when

    the non-invertible models are to be entertained.

    This paper considers minimum distance estimation of MA models without imposing invertibility

a priori. We first show using the MA(1) model that use of higher cumulants per se is not sufficient

1 Invertibility can also help to identify structural models. For example, Komunjer and Ng (2011) use invertibility to narrow the class of equivalent DSGE models.

2 For example, in seismology, an accurate model of the seismic source wavelet, in the form of a moving average filter, is necessary to recover the earth's reflectivity sequence. The fact that seismic data typically exhibit non-Gaussian features suggests the need for a wavelet (moving average polynomial) which is non-invertible. Similarly, in communication analysis, an accurate modeling of the communication channel by a possibly non-invertible moving average process is required to back out the underlying message from the observed distorted message.


for the Jacobian matrix to be full rank everywhere in the parameter space. Exploiting the fact that the mapping between the structural parameters and the cumulants can be explicitly derived for the MA(1) case, we show that the cumulants can over- but not exactly identify the MA(1) model if a unit root

    and parameters consistent with non-invertibility are admissible. However, two second order along

with three third order cumulants can be used to construct a classical minimum distance estimator that is root-T consistent and uniformly asymptotically normal.

    Extension of the classical minimum distance estimator to more general moving average models

    is not possible when the relation between the model parameters and the higher order cumulants is

    not analytically tractable. Thus, we also propose a simulation based minimum distance estimator

    with errors drawn from the generalized lambda distribution. It is an alternative to the semi-

    parametric density considered in Gallant and Tauchen (1996) for simulating non-Gaussian errors.

The estimator uses multiple auxiliary regressions and has the flavor of indirect inference estimation

proposed by Gourieroux et al. (1993) as well as the simulated method of moments of Duffie and Singleton (1993). The proposed estimator also has classical asymptotic properties regardless of

    whether the MA roots are inside, outside, or on the unit circle.

    The main arguments of the analysis are presented using the MA(1) model but extensions to more

general models are also discussed. Section 2 proceeds to highlight two identification problems in the

    context of minimum distance estimation. Section 3 discusses the properties of the classical minimum

    distance estimator based only on information about the covariance structure of the process. It also

motivates the need of using higher order cumulants in estimation and explains how identification

    can be achieved. Section 4 develops a simulation minimum distance estimator for more general

moving average models. An empirical application to commodity prices is provided in Section 5. Section 6 concludes.

2 Two Identification Problems

    Consider the autoregressive and moving average (ARMA) process of order (p, q):

\[ \phi(L) y_t = \theta(L) e_t, \]

where $e_t \sim iid(0, \sigma^2)$, $L$ is the lag operator such that $L^p y_t = y_{t-p}$, and $\phi(L) = 1 - \phi_1 L - \ldots - \phi_p L^p$ has no common roots with $\theta(L) = 1 + \theta_1 L + \ldots + \theta_q L^q$. The autoregressive polynomial $\phi(z)$ is said to be causal if $\phi(z) \neq 0$ for all $|z| \le 1$ on the complex plane, and the moving average polynomial is said to be invertible if $\theta(z) \neq 0$ for all $|z| \le 1$ (Brockwell and Davis (1991)). If $y_t$ is a causal function of $e_t$, then there exist constants $h_j$ with $\sum_{j=0}^{\infty} |h_j| < \infty$ such that $y_t = \sum_{j=0}^{\infty} h_j e_{t-j}$ for $t = 0, \pm 1, \ldots$ We say that $y_t$ has minimum phase if the zeros of $\phi(z)$ and $\theta(z)$ are all greater than


    one in absolute value.3 Few economic time series exhibit explosive behavior. If we narrow the focus

    to causal and stable processes, invertible processes also have minimum phase.

If a process $y_t$ is invertible in $e_t$, then there exist constants $\pi_j$ with $\sum_{j=0}^{\infty} |\pi_j| < \infty$ such that $e_t = \sum_{j=0}^{\infty} \pi_j y_{t-j} = \pi(L) y_t$. For ARMA(p, q) models, invertibility requires that the inverse of $\theta(L)$ has a convergent series expansion in positive powers of the lag operator $L$. For the MA(1) model

\[ y_t = e_t + \theta e_{t-1}, \tag{1} \]

with $e_t \sim iid(0, \sigma^2)$, the invertibility condition is satisfied if $|\theta| < 1$ since $\theta(L)^{-1} = \sum_{s=0}^{\infty} (-\theta)^s L^s$ is a polynomial in positive powers of $L$. This is no longer true when $|\theta|$ in (1) exceeds one. It is, however, misleading to classify invertible and non-invertible processes according to the value of $\theta$ alone. Consider the MA(1) process $y_t$ represented by

\[ y_t = \theta e_t + e_{t-1}. \tag{2} \]

Even if $\theta$ in (2) is less than one, $y_t$ is still non-invertible because the implied $\pi(L) = \sum_{s=0}^{\infty} (-\theta)^s L^{-(s+1)}$ is a polynomial in negative powers of $L$.

    Invertible and non-invertible processes have distinctive features with implications for forecasting.

In the invertible case, the span of $e_t$ and its history coincides with that of $y_t$, which is observed by the econometrician. The one-step ahead forecast errors are $e_{t|t-1} = y_t - y_{t|t-1} = e_t$. In the non-invertible case, the econometrician does not observe future values of $y_t$ and his information set is strictly inferior to that of the economic agent. As discussed in Ramsey and Montenegro (1992), the one-step ahead forecast errors when $y_t$ is generated by the non-invertible model (2) are

\[ e_{t|t-1} = y_t - \theta(e_{t-1} + \theta e_{t-2}) + \theta^2 (e_{t-2} + \theta e_{t-3}) + \ldots \neq e_t. \]

These differences are important in the subsequent analysis.

Identification and estimation of models with a moving-average component are difficult because of two problems that are best understood by focusing on the MA(1) case. The first identification problem concerns $\theta$ at or near unity. When the MA parameter is near the unit circle, the Gaussian

    maximum likelihood estimator (MLE) takes values exactly on the boundary of the invertibility

region with positive probability (the so-called pile-up problem) in finite samples. This point

    probability mass at unity arises from the symmetry of the likelihood function around one and the

small sample deficiency to identify all the critical points of the likelihood function in the vicinity of

    the non-invertibility boundary; see Sargan and Bhargava (1983), Anderson and Takemura (1986),

    Davis and Dunsmuir (1996), Gospodinov (2002), Davis and Song (2011).

3 A non-stationary process is not mean reverting, with the property that the shocks have permanent effects on the series. This has generated a large literature on testing the unit root hypothesis against the alternative of stationarity.


The second identification problem arises because covariance stationary processes are completely characterized by the first and second moments of the observables, and an MA(1) model with parameters $(\theta, \sigma^2)'$ has the same autocovariance structure as a model parameterized by $(1/\theta, \theta^2 \sigma^2)'$. In consequence, the Gaussian likelihood for an MA(1) model with $L(\theta, \sigma^2)$ is the same as one with $L(1/\theta, \theta^2 \sigma^2)$. The observational equivalence of second moments also implies that the projection coefficients in $\pi(L)$ are the same regardless of whether $\theta$ is less than or greater than one. Thus, $\theta$ cannot be recovered from the projection coefficients without additional assumptions.
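This second-moment equivalence can be checked directly. The sketch below (our own illustration, not from the paper) evaluates the MA(1) population autocovariances $\gamma(0) = (1 + \theta^2)\sigma^2$ and $\gamma(1) = \theta\sigma^2$ at the two parameterizations:

```python
# MA(1) population autocovariances: gamma(0) = (1 + theta^2) sigma^2,
# gamma(1) = theta sigma^2. The pairs (theta, sigma^2) and
# (1/theta, theta^2 sigma^2) imply identical second moments.

def ma1_autocov(theta, sigma2):
    return (1.0 + theta**2) * sigma2, theta * sigma2

theta, sigma2 = 0.5, 1.0
gamma_a = ma1_autocov(theta, sigma2)                   # invertible root
gamma_b = ma1_autocov(1.0 / theta, theta**2 * sigma2)  # non-invertible root
# gamma_a == gamma_b == (1.25, 0.5)
```

Any estimator built only on these two moments therefore cannot distinguish the two roots.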

This observational equivalence problem can be further elicited from a frequency domain perspective. If we take as a starting point $y_t = h(L) e_t = \sum_{j=-\infty}^{\infty} h_j e_{t-j}$, the frequency response function of the filter is

\[ H(\omega) = \sum_j h_j \exp(-i \omega j) = |H(\omega)| \exp(i \phi(\omega)), \]

where $|H(\omega)|$ is the amplitude and $\phi(\omega)$ is the phase response of the filter. For ARMA models, $h(z) = \theta(z)/\phi(z) = \sum_j h_j z^j$. The amplitude response is usually constant for given $\omega$ and tends towards zero outside the interval $[0, \pi]$. If $e_t$ is Gaussian and $h(L)$ is invertible, second order statistics will correctly identify the amplitude and the phase of the wavelet. But for given $a > 0$, the phase $\phi_0$ is indistinguishable from $\phi(\omega) = \phi_0 + a\omega$ for any $\omega \in [0, \pi]$. Recovering $e_t$ from the second order spectrum

\[ S_{2,y}(z) = \sigma^2 |H(z)|^2 \]

is problematic because $S_{2,y}(z)$ is proportional to the amplitude $|H(z)|^2$ with no information about the phase. The second order spectrum is thus said to be phase-blind. As explained in Lii and Rosenblatt (1982), one can flip the roots of $\theta(z)$ and $\phi(z)$ without affecting the modulus of the transfer function. With real distinct roots, there are $2^{p+q}$ ways of specifying the roots without changing the probability structure of $y_t$.

    3 Classical Minimum Distance Estimation of MA(1) Model

The consequences of the two identification problems for estimation and inference are easily seen from the perspective of the classical minimum distance estimator that exploits only the covariance structure of the process. Let $\psi \in \Psi$ be a $K \times 1$ parameter vector of interest with a true value $\psi_0$, where the parameter space $\Psi$ is a subset of the $K$-dimensional Euclidean space $\mathbb{R}^K$. Consider estimating the MA(1) model indirectly via an auxiliary model with an $L \times 1$ ($L \ge K$) vector of parameters $\pi \in \mathbb{R}^L$ that are functions of $\psi$, $\{\pi \mid \pi = \pi(\psi), \psi \in \Psi\}$, with a pseudo-true value $\pi_0 \equiv \pi(\psi_0)$.


Given data $y \equiv (y_1, \ldots, y_T)'$, a consistent estimator $\hat{\pi}_T$, and its asymptotic variance estimate $\hat{\Omega}$, $\psi$ is chosen to minimize the difference between $\hat{\pi}_T$ and $\pi(\psi)$. The classical minimum distance (CMD) estimator of $\psi$ using the optimal weighting matrix is defined as

\[ \hat{\psi}_T = \arg\min_{\psi} J_{CMD}(\hat{\pi}_T, \pi(\psi); \hat{\Omega}) = \arg\min_{\psi} (\hat{\pi}_T - \pi(\psi))' \hat{\Omega}^{-1} (\hat{\pi}_T - \pi(\psi)). \tag{3} \]

If $\pi$ is of the same dimension as $\psi$ and $\pi(\psi)$ is of known form and invertible, then $\hat{\psi}_T = \pi^{-1}(\hat{\pi}_T)$. In general, the dimension of $\pi$ exceeds that of $\psi$. The auxiliary model need not nest the true model, but identification hinges on a well-behaved mapping from the space of $\psi$ to the parameter space of the auxiliary model.

Definition 1 Let $\pi(\cdot): \Psi \to \Pi(\Psi)$ be a mapping from $\psi$ to $\pi$ and $G(\psi) = \partial \pi(\psi)/\partial \psi'$ with $G_0 \equiv G(\psi_0)$. Then, $\psi_0$ is globally identified if $\pi(\psi)$ is injective and is locally identified if the matrix of partial derivatives $G_0$ has full column rank.

Note that the requirement of full column rank of the derivative matrix $G(\psi_0)$ is a sufficient condition for the mapping $\pi(\psi_0)$ to be locally injective. Hence, $\psi_0$ is locally identified if $\pi(\psi_0)$ is locally injective, and Definition 1 provides a sufficient condition for local identification. From Definition 1, $\psi_1$ and $\psi_2$ are observationally equivalent if $\pi(\psi_1) = \pi(\psi_2)$.

Lemma 1 Suppose that the following conditions hold: (C.i) $\{y_t, -\infty < t < \infty\}$ is a strictly stationary and ergodic process; (C.ii) $\hat{\pi}_T \xrightarrow{p} \pi_0$; (C.iii) the set $\Psi \equiv \{\psi \mid \pi = \pi(\psi) \in \mathbb{R}^L\}$ is a compact subset of $\mathbb{R}^K$ with $L \ge K$; (C.iv) $\pi(\psi)$ is twice continuously differentiable in $\psi$; (C.v) the matrix of partial derivatives $G(\psi)$ has rank equal to $K$ on $\Psi$; (C.vi) the mapping $\pi_0 = \pi(\psi_0)$ is unique. Then, $\hat{\psi} \xrightarrow{p} \psi_0$. If, in addition, (AN.i) $\psi_0$ is in the interior of $\Psi$; (AN.ii) $\sqrt{T}(\hat{\pi}_T - \pi_0) \xrightarrow{d} N(0, \Omega_0)$, where $\Omega_0$ is a symmetric positive definite matrix and $\hat{\Omega} \xrightarrow{p} \Omega_0$, then $\sqrt{T}(\hat{\psi}_T - \psi_0) \xrightarrow{d} N(0, (G_0' \Omega_0^{-1} G_0)^{-1})$.

Lemma 1, adapted from Ruud (2000), provides conditions for consistency and asymptotic normality of the classical minimum distance estimator. Except for (C.v), these are more or less standard conditions for extremum estimators to be consistent and asymptotically normally distributed; see, for example, Newey and McFadden (1994). Condition (C.v) is typically stated as a requirement for asymptotic normality but is not a necessary condition for global identification; see Hall (2005, p. 69).4 However, in Rothenberg (1971), full rank of the derivative matrix is used in stating

4 Sargan (1983) and, more recently, Dovonon and Renault (2011) show that identification is possible even when the moment conditions have a degenerate derivative matrix at the true values of the parameters.


sufficient conditions for global identification.5 While this condition may be too strong in general, the condition is necessary for identification of the MA(1) model if the case $|\theta_0| = 1$ is to be allowed for, as we now discuss.

The MA(1) model has $\psi = (\theta, \sigma^2)'$, $\psi_0 = (\theta_0, \sigma_0^2)'$, and $\Psi = [-\theta_H, \theta_H] \times [\sigma_L^2, \sigma_H^2]$; see Lii and Rosenblatt (1982, Lemma 1), Giannakis and Swami (1990), Mendel (1991). This necessarily requires that $e_t$ has non-Gaussian features.

The use of higher order cumulants per se does not, however, automatically guarantee identification of the MA(1) model. Let $e_t = \sigma \varepsilon_t$, where $\varepsilon_t \sim iid(0, 1)$ with $E(\varepsilon_t^l) = \tau_l$ ($l \ge 3$), and assume that $\tau_3 \neq 0$. Since the third cumulant of a mean-zero, stationary process $y_t$ is defined as $cum_{3,y}(u, v) = E(y_t y_{t+u} y_{t+v}) = cum_{3,e} \sum_{i=0}^{q} h_i h_{i+u} h_{i+v}$, the non-zero third order cumulants for the MA(1) process are given by8

\[
\begin{aligned}
cum_{3,y}(0, 0) &= E(y_t^3) = (1 + \theta^3) \sigma^3 \tau_3, \\
cum_{3,y}(0, -1) &= E(y_t^2 y_{t-1}) = \theta^2 \sigma^3 \tau_3, \\
cum_{3,y}(-1, 0) &= E(y_t^2 y_{t-1}) = \theta^2 \sigma^3 \tau_3, \\
cum_{3,y}(-1, -1) &= E(y_t y_{t-1}^2) = \theta \sigma^3 \tau_3.
\end{aligned}
\]
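These expressions can be verified by simulation. The sketch below (our own illustration; the centered exponential error is our choice of skewed distribution, for which $\sigma = 1$ and $\tau_3 = 2$) compares sample third moments of a simulated MA(1) to the population formulas:

```python
import random

def third_cumulants(theta, sigma, tau3):
    # Population formulas for the non-zero third order cumulants of an MA(1).
    s3 = tau3 * sigma**3
    return ((1 + theta**3) * s3,  # E[y_t^3]         = cum_{3,y}(0,0)
            theta**2 * s3,        # E[y_t^2 y_{t-1}] = cum_{3,y}(0,-1)
            theta * s3)           # E[y_t y_{t-1}^2] = cum_{3,y}(-1,-1)

rng = random.Random(42)
theta, T = 0.5, 200_000
e = [rng.expovariate(1.0) - 1.0 for _ in range(T + 1)]  # mean 0, var 1, tau3 = 2
y = [e[t] + theta * e[t - 1] for t in range(1, T + 1)]

sample = (sum(v**3 for v in y) / T,
          sum(y[t]**2 * y[t - 1] for t in range(1, T)) / (T - 1),
          sum(y[t] * y[t - 1]**2 for t in range(1, T)) / (T - 1))
exact = third_cumulants(theta, 1.0, 2.0)   # (2.25, 0.5, 1.0)
```

The sample moments settle close to the population values as T grows.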

Lemma 2 Let $\pi = (cum_{2,y}(1), cum_{2,y}(0), cum_{3,y}(u, v))'$, where $cum_{3,y}(u, v)$ is one of the third order cumulants of $y_t$. Then, (i) $\pi$ cannot globally identify $\psi$ for any $\psi_0 = (\theta_0, \sigma_0^2, \tau_{3,0})' \in \Psi$, and (ii) $\pi$ cannot locally identify $\psi$ when $|\theta_0| = 1$ for any $\sigma^2$ and $\tau_3$.

7 All-pass models are non-causal and/or non-invertible ARMA models in which the roots of the autoregressive polynomial are reciprocals of the roots of the moving average polynomial, and vice versa.

8 This follows from direct evaluation of $E(y_t y_{t+u} y_{t+v})$ with $y_t = e_t + \theta e_{t-1}$.


Lemma 2 considers the case when $\pi$ and $\psi$ are of the same dimension. Part (i) of the lemma implies that there always exist $\psi_1 \in \Psi$ and $\psi_2 \in \Psi$ such that $\psi_1$ and $\psi_2$ are observationally equivalent in the sense that they generate the same $\pi$. For example, $\psi_1 = (\theta, \sigma^2, \tau_3)'$ and $\psi_2 = (1/\theta, \theta^2 \sigma^2, \theta \tau_3)'$ both imply the same $\pi = (E(y_t y_{t-1}), E(y_t^2), E(y_t^2 y_{t-1}))'$. It is easy to verify that the mapping from $\psi$ to $\pi$ also fails to be injective when $cum_{3,y}(0, -1)$ is replaced by $cum_{3,y}(0, 0)$ or $cum_{3,y}(-1, -1)$. This result arises because observational equivalence of second moments precludes $cum_{2,y}(1)$ and $cum_{2,y}(0)$ from exactly identifying $\theta$ and $\sigma^2$, and a single higher order cumulant might, but cannot be guaranteed to, identify both $\tau_3$ and the parameters of the MA(1) model. Part (ii) of Lemma 2 follows from the fact that the determinant of the derivative matrix is zero at $|\theta_0| = 1$. Local identification of $\psi$ when $|\theta_0| = 1$ will always require replacing $cum_{2,y}(1)$ or $cum_{2,y}(0)$ with another third order cumulant to avoid degeneracy in the derivative matrix.
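Part (ii) can be checked by direct computation. For the just-identified choice $\pi = (E(y_t y_{t-1}), E(y_t^2), E(y_t^2 y_{t-1}))'$, the mapping is $\pi(\psi) = (\theta\sigma^2, (1+\theta^2)\sigma^2, \theta^2\sigma^3\tau_3)'$ and the determinant of its Jacobian works out to $\theta^2 \sigma^5 (1 - \theta^2)$, which vanishes at $\theta = \pm 1$. The sketch below (our own illustration) evaluates the determinant numerically:

```python
# Jacobian of psi = (theta, sigma2, tau3) -> pi = (theta*sigma2,
# (1 + theta^2)*sigma2, theta^2 * sigma^3 * tau3). Its determinant equals
# theta^2 * sigma^5 * (1 - theta^2), so the matrix is singular at theta = +/-1.

def det3(m):
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e*i - f*h) - b * (d*i - f*g) + c * (d*h - e*g)

def jacobian(theta, s2, tau3):
    s = s2**0.5
    return [[s2,                    theta,                0.0],
            [2 * theta * s2,        1 + theta**2,         0.0],
            [2 * theta * s**3 * tau3, 1.5 * theta**2 * s * tau3, theta**2 * s**3]]

d_inside = det3(jacobian(0.5, 1.0, 0.85))  # nonzero: locally identified
d_unit = det3(jacobian(1.0, 1.0, 0.85))    # zero: local identification fails
```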

Lemma 3 Let $\pi \subset \{cum_{2,y}(1), cum_{2,y}(0), cum_{3,y}(0, 0), cum_{3,y}(0, -1), cum_{3,y}(-1, -1)\}$ and $\pi(\psi)$ be a function that maps $\psi$ to $\pi$. Then, there exists at least one value in the parameter space $\Psi$ for $\psi$ at which $\mathrm{rank}(G(\psi)) < 3$ if $\dim(\pi) = 3$.

The three third order cumulants together with the two second order cumulants $E(y_t y_{t-1})$ and $E(y_t^2)$ create ten distinct combinations of a three-dimensional vector of auxiliary parameters to be considered for exact identification. Direct calculations show that if $\pi$ consists of $E(y_t y_{t-1})$, $E(y_t^3)$ and $E(y_t y_{t-1}^2)$, or of $E(y_t^2)$, $E(y_t^3)$ and $E(y_t y_{t-1}^2)$, the derivative matrix is singular at $\theta = (1/2)^{1/3}$. If $\pi$ consists of $E(y_t y_{t-1})$, $E(y_t^2 y_{t-1})$ and $E(y_t y_{t-1}^2)$, or of $E(y_t^2)$, $E(y_t^2 y_{t-1})$ and $E(y_t y_{t-1}^2)$, the matrix of derivatives is singular at $\theta = 0$. If $\pi$ contains both $E(y_t y_{t-1})$ and $E(y_t^2)$, the determinant of the derivative matrix is zero at $\theta = 1$ and $\theta = -1$. Finally, if $\pi$ consists of $E(y_t y_{t-1})$, $E(y_t^2 y_{t-1})$ and $E(y_t^3)$, or of $E(y_t^2)$, $E(y_t^2 y_{t-1})$ and $E(y_t^3)$, the determinant of the derivative matrix is zero at $\theta = 0$ and $\theta = 2^{1/3}$. For all ten combinations of $\pi$ considered, there always exist values of $\psi$ for which the derivative matrix is singular. The implication is that exact identification of all $\psi \in \Psi$ from three-dimensional subsets of $\pi$ is not possible.

We now propose an over-identified CMD estimator $\hat{\psi}_{CMD}$ based on a vector of cumulants

\[ \pi_{CMD} = \big( cum_{2,y}(1), \; cum_{2,y}(0), \; cum_{3,y}(0, -1), \; cum_{3,y}(0, 0), \; cum_{3,y}(-1, -1) \big)' = \big( E(y_t y_{t-1}), \; E(y_t^2), \; E(y_t^2 y_{t-1}), \; E(y_t^3), \; E(y_t y_{t-1}^2) \big)' \tag{6} \]

and a mapping function

\[ \pi_{CMD}(\psi) = \big( \theta \sigma^2, \; (1 + \theta^2) \sigma^2, \; \theta^2 \sigma^3 \tau_3, \; (1 + \theta^3) \sigma^3 \tau_3, \; \theta \sigma^3 \tau_3 \big)'. \]

From a method of moments perspective, $\pi_{CMD}$ contains information about the covariance structure, the unconditional skewness of the process, and the time-varying second moments of the observables. The latter is useful for identification because Ramsey and Montenegro (1992) show that the residuals from an autoregressive approximation exhibit ARCH-type structure if the underlying MA process is non-invertible and the true errors are asymmetric.

The derivative matrix of $\pi_{CMD}(\psi)$ with respect to $\psi$ is

\[ G_{CMD}(\psi) = \begin{pmatrix} \sigma^2 & \theta & 0 \\ 2\theta\sigma^2 & 1 + \theta^2 & 0 \\ 2\theta\sigma^3\tau_3 & \tfrac{3}{2}\theta^2\sigma\tau_3 & \theta^2\sigma^3 \\ 3\theta^2\sigma^3\tau_3 & \tfrac{3}{2}(1 + \theta^3)\sigma\tau_3 & (1 + \theta^3)\sigma^3 \\ \sigma^3\tau_3 & \tfrac{3}{2}\theta\sigma\tau_3 & \theta\sigma^3 \end{pmatrix}. \tag{7} \]

Notably, due to the addition of the three higher order cumulants, the derivative matrix has full column rank everywhere in $\Psi$, even at $|\theta_0| = 1$, and conditions (C.v) and (AN.i) are thus satisfied. The rank condition is necessary for $\psi_0$ to be a unique solution to the system of non-linear equations characterized by

\[ G_{CMD}(\psi)' W (\pi_{CMD} - \pi_{CMD}(\psi)) = 0_{3 \times 1}. \]

Provided that $\tau_3 \neq 0$, the uniqueness condition (C.vi) holds. The full rank condition is also necessary for the estimator to be asymptotically normal. As a result, this CMD estimator is root-T consistent and asymptotically normal.

Proposition 1 Consider the MA(1) model (1) with $e_t = \sigma \varepsilon_t$, $\varepsilon_t \sim iid(0, 1)$ and $E(\varepsilon_t^3) = \tau_3$. Assume that $\tau_3 \neq 0$ and $E|\varepsilon_t|^6 < \infty$. Let $\psi = (\theta, \sigma^2, \tau_3)'$ and $\hat{\psi}_{CMD}$ be the minimum distance estimator based on (6), with $\Omega_{CMD} = \mathrm{Avar}(\hat{\pi}_{CMD})$. Then,

\[ \sqrt{T}(\hat{\psi}_{CMD} - \psi_0) \xrightarrow{d} N\big( 0, \; \big( G_{CMD}(\psi_0)' \Omega_{CMD}^{-1} G_{CMD}(\psi_0) \big)^{-1} \big). \]
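To see how the over-identified moment vector pins down a non-invertible root, the sketch below (our own illustration; a crude grid search with an identity weighting matrix stands in for the optimally weighted minimization in (3)) recovers $\theta = 1.5$, rather than its reciprocal, from noiseless population moments:

```python
# pi_CMD(psi) for psi = (theta, sigma2, tau3); ordering follows (6).
def pi_cmd(theta, sigma2, tau3):
    s3 = tau3 * sigma2**1.5
    return [theta * sigma2, (1 + theta**2) * sigma2,
            theta**2 * s3, (1 + theta**3) * s3, theta * s3]

def j_cmd(pi_hat, psi):
    # quadratic distance with identity weighting (a simplification of (3))
    return sum((a - b)**2 for a, b in zip(pi_hat, pi_cmd(*psi)))

pi_hat = pi_cmd(1.5, 1.0, 0.85)          # population moments at theta = 1.5
grid = [i / 100 for i in range(1, 301)]  # theta in (0, 3]
theta_hat = min(grid, key=lambda th: j_cmd(pi_hat, (th, 1.0, 0.85)))
# theta_hat == 1.5: the non-invertible root, not its reciprocal
```

With the third order cumulants included, the objective is zero only at the true root.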

While analytical results for moving average processes of higher order are difficult to obtain, our conjecture is that the use of a single higher order cumulant remains necessary but not sufficient for identification. Two more third order cumulants must be used in conjunction with $E(y_t y_{t-1})$ and $E(y_t^2)$ to overidentify $\psi$. Provided that $\tau_4 \neq 3$, fourth order cumulants such as $E(y_t^4) = [(1 + \theta^4)\tau_4 + 6\theta^2]\sigma^4$ and $E(y_t^2 y_{t-1}^2) = (1 + \theta^2 + \theta^2 \tau_4 + \theta^4)\sigma^4$ can also be incorporated in conjunction with other cumulants of order three or higher.

    3.3 Finite-Sample Properties of the CMD Estimator

To illustrate the finite-sample properties of the CMD estimators, data with T = 1000 observations are generated from an MA(1) model $y_t = e_t + \theta e_{t-1}$ and $e_t = \sigma \varepsilon_t$, where $\varepsilon_t$ is $iid(0, 1)$ and follows


a generalized lambda distribution (GLD), which will be further discussed in Section 4.1. For now, it suffices to note that GLD distributions can be characterized by a skewness parameter $\tau_3$ and a kurtosis parameter $\tau_4$. The true values of the parameters are $\theta = 0.5, 0.7, 1, 1.5$ and $2$, $\sigma = 1$, $\tau_3 = 0, 0.35, 0.6$ and $0.85$, and $\tau_4 = 3$. Lack of identification of $\theta$ arises when $\tau_3 = 0$, and weak to intermediate identification occurs when $\tau_3 = 0.35$, $0.6$ and $0.85$.

Table 1 presents the average estimates and the standard deviations of three CMD estimators of

$\theta$ and $\tau_3$ over 5000 Monte Carlo replications. While $\tau_3$ is typically not a parameter of direct interest, information for this parameter would indicate how useful the third cumulants are in identifying and estimating the parameters of the model. The first estimator is the CMD estimator using the sample analog of (6) as auxiliary parameters. As argued above, the use of higher order cumulants does not necessarily guarantee identification. For this reason, we also consider a just-identified classical minimum distance estimator that uses only the sample analog of

\[ \pi_U = \big( E(y_t y_{t-1}), \; E(y_t^2), \; E(y_t^2 y_{t-1}) \big)' \subset \pi_{CMD} \]

as auxiliary parameters. For the sake of comparison, we consider an infeasible minimum distance estimator which is based on $\hat{\pi}_U$ but assumes that $\sigma^2$ is known and estimates only $(\theta, \tau_3)'$. As discussed earlier, fixing $\sigma^2$ solves the identification problem. Without imposing invertibility, $|\theta_0| = 1$ is not on the boundary of the parameter space for $\theta$. The infeasible estimator is asymptotically normally distributed uniformly over the whole parameter space for $\theta$ and over all error distributions. The problem is that $\sigma^2$ is, in general, unknown. We demonstrate, however, that our proposed CMD estimator has properties similar to this infeasible estimator.

The results in Table 1 suggest that regardless of the degree of non-Gaussianity, the infeasible estimator produces estimates of $\theta$ that are very precise and essentially unbiased. Hence, fixing $\sigma$ solves both identification problems without the need of non-Gaussianity, although prior knowledge of $\sigma$ is rarely available in practice. As seen in Table 1, the feasible (just-identified) version of this estimator, based on $\hat{\pi}_U$, does not achieve identification of the structural parameters for any value of $\tau_3$. This estimator is also characterized by a large pile-up probability at unity due to a violation of condition (C.v). In contrast, over-identifying the model with the auxiliary moments $E(y_t^3)$ and $E(y_t y_{t-1}^2)$ gives rise to the CMD estimator, which achieves identification as the degree of skewness increases. For the CMD estimator, the skewness parameter appears to be very well identified and estimated over all specifications of the error distribution. In fact, the CMD estimates of $\tau_3$ appear to be much more precise than those of the infeasible estimator, which can be attributed to the usefulness of the additional third order cumulants used in the CMD estimator. However, the estimation of $\theta$ depends on the strength of identification. While for $\tau_3 = 0.35$ the identification is weak and the estimates of $\theta$ are somewhat biased, for higher values of the skewness parameter the CMD


estimates of $\theta$ are practically unbiased. When $\tau_3 = 0.85$, the CMD estimator correctly identifies (with probability one) whether the true value of $\theta$ is in the invertible or the non-invertible region.

Figures 1, 2 and 3 plot the density functions of the standardized CMD estimator of $\theta$, $\sigma$ and $\tau_3$, respectively, for the MA(1) model considered in this section with $\theta = 1.5$ and T = 1000. While the lack of identification for zero or low values of the skewness parameter induces non-normality (bimodality and fat tails) in the distribution of the estimator, the densities of the standardized CMD estimators of $\theta$, $\sigma$ and $\tau_3$ appear to be very close to the standard normal density for $\tau_3 = 0.85$.

    4 Semi-Parametric Simulated Minimum Distance Estimation

While the CMD estimator for the MA(1) model has appealing asymptotic and finite-sample properties, analytical expressions for the mapping from general ARMA(p, q) models to the cumulants are not tractable. For this reason, we develop a simulation-based estimator for ARMA(p, q) models. The simulation estimator is similar in spirit to the CMD but can accommodate autoregressive dynamics, kurtosis and other features of the errors. The difference with CMD is that it uses simulations to approximate and invert $\pi(\psi)$.

More precisely, let $y^s(\psi) = (y_1^s, \ldots, y_T^s)'$ be data simulated for a candidate value of $\psi$. This usually requires drawing errors from a known distribution, and the parameters of this distribution are ancillary for $\psi$. Let

\[ \hat{\pi}_T = \arg\min_{\pi} Q_T(\pi; y) \]

and

\[ \tilde{\pi}_T^s(\psi) = \arg\min_{\pi} Q_T(\pi; y^s(\psi)) \]

be the auxiliary parameters estimated from actual and simulated data, respectively, where $Q_T(\pi)$ denotes the objective function of the auxiliary model. Define $\tilde{\pi}_{T,S}(\psi)$ to be the average of the estimates $\tilde{\pi}_T^s(\psi)$ over $S$ draws, each using simulated data of length $T$, i.e.,9

\[ \tilde{\pi}_{T,S}(\psi) = \frac{1}{S} \sum_{s=1}^{S} \tilde{\pi}_T^s(\psi). \]

A simulation-based minimum distance (SMD) estimator can now be defined as

\[ \hat{\psi}_{T,S} \equiv \arg\min_{\psi} J_{SMD}(\hat{\pi}_T, \tilde{\pi}_{T,S}(\psi); \hat{\Omega}) = \arg\min_{\psi} (\hat{\pi}_T - \tilde{\pi}_{T,S}(\psi))' \hat{\Omega}^{-1} (\hat{\pi}_T - \tilde{\pi}_{T,S}(\psi)), \tag{8} \]

9 Alternatively, one could use one draw of simulated data of length $T \cdot S$, $y^S(\psi) = (y_1^S, \ldots, y_{TS}^S)'$, and define $\tilde{\pi}_{T,S}(\psi)$ as $\tilde{\pi}_{T,S}(\psi) = \arg\min_{\pi} Q_T(\pi; y^S(\psi))$.


where $\hat{\Omega}_T$ is a consistent estimate of the asymptotic variance of $\hat{\pi}_T$. The efficient method of moments (EMM) estimator of Gallant and Tauchen (1996) and the indirect inference estimator (IIE) of Gourieroux et al. (1993) consider a pseudo-maximum likelihood estimator of $\pi$, in which case $Q_T(\pi)$ is the log-likelihood. Identification requires that the mapping $\pi(\psi)$ be injective in the sense of Definition 1. In other words, the auxiliary model must contain features of the data generated under $\psi$. Simulations merely provide an approximation to $\pi^{-1}(\hat{\pi}_T)$.10 Thus, the $\pi(\psi)$ in simulation-based minimum distance estimation is sample size dependent (Phillips (2012)). Gourieroux et al. (1993) refer to the estimator as the method of indirect inference and to $\pi(\psi)$ as the binding function.

Simulation estimation of the MA(1) model was considered in Gourieroux et al. (1993), Michaelides and Ng (2000), Ghysels et al. (2003) and Czellar and Zivot (2008), among others, but only for the invertible case. All of these studies use an autoregression as the auxiliary model. For $\theta = 0.5$ and assuming that $\sigma^2$ is known, Gourieroux et al. (1993) find that the IIE compares favorably to the exact MLE in terms of bias and root-mean squared error. Michaelides and Ng (2000) and Ghysels et al. (2003) also evaluate the properties of simulation-based estimators with $\sigma^2$ assumed known. Czellar and Zivot (2008) report that the IIE is relatively less biased but exhibits some instability, and the tests based on it suffer from size distortions when $\theta_0$ is close to unity. The favorable properties of the IIE when $\theta$ is in the invertible range can be traced to the fact that simulation estimation has a bias-correction property that is absent from classical minimum distance estimation. Intuitively, if the auxiliary parameter estimates $\hat{\pi}_T$ obtained from the data are downward biased, so will be the estimates $\tilde{\pi}_T^s$ estimated from the data simulated for a given $\psi$. Then, $\psi$ can be calibrated to bias correct the CMD, akin to the bootstrap.

This bias-correction property provided by simulation estimation has an additional but unexploited role when non-invertible models are allowed. As shown in the previous section, identification without imposing invertibility relies on information in higher order moments, which tend to exhibit finite-sample biases. The next section considers a simulation based estimator that achieves identification without imposing invertibility and enables classical inference even at $\theta_0 = 1$.

    4.1 SMD Estimator Based on GLD Errors

As the key to identification is errors with non-Gaussian properties, we need to be able to simulate non-Gaussian errors in a flexible fashion so that $y_t$ has the desired distributional properties. There is evidently a large class of distributions with third and fourth moments consistent with a non-Gaussian process that one can specify. As assuming a particular parametric error distribution could compromise the robustness of the estimates, we simulate errors from the generalized lambda

10 In the terminology of Gallant and Tauchen (1996), the true density of the data must be smoothly embedded within the scores of the auxiliary model.

distribution P(λ1, λ2, λ3, λ4) considered in Ramberg and Schmeiser (1975). This distribution has two appealing features. First, it can accommodate a wide range of values for the skewness and excess kurtosis parameters, and it includes as special cases the normal, log-normal, exponential, t, beta, gamma and Weibull distributions. The second advantage is that it is easy to simulate from. The percentile function is given by

P⁻¹(U) = λ1 + [U^λ3 − (1 − U)^λ4]/λ2,   (9)

where U is a uniform random variable on [0, 1], λ1 is a location parameter, λ2 is a scale parameter, and λ3 and λ4 are shape parameters. To simulate ε_t, a U is drawn from the uniform distribution and (9) is evaluated for given values of (λ1, λ2, λ3, λ4). As shown in Ramberg and Schmeiser (1975), the shape parameters (λ3, λ4) are explicitly related to the coefficients of skewness and kurtosis (α3 and α4) of ε_t. Furthermore, the shape parameters (λ3, λ4) and the location/scale parameters (λ1, λ2) can be sequentially evaluated. Since ε_t has mean zero and variance one, the parameters (λ1, λ2) are determined by (λ3, λ4), so that ε_t is effectively characterized by λ3 and λ4.
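The mean-zero, unit-variance normalization can be made operational as follows. This is an illustrative sketch of our own (the function names are not from the paper): the first two moments of U^λ3 − (1 − U)^λ4 pin down (λ1, λ2), after which draws are obtained by evaluating the percentile function (9) at uniform variates.

```python
import math, random

def gld_lambdas(l3, l4):
    """Given shape parameters (lambda3, lambda4), return (lambda1, lambda2)
    so that percentile-function draws have mean zero and unit variance."""
    beta = lambda a, b: math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    A = 1.0 / (1 + l3) - 1.0 / (1 + l4)            # E[U^l3 - (1-U)^l4]
    B = (1.0 / (1 + 2 * l3) + 1.0 / (1 + 2 * l4)
         - 2 * beta(1 + l3, 1 + l4))               # E[(U^l3 - (1-U)^l4)^2]
    l2 = math.sqrt(B - A * A)                      # scale for unit variance
    l1 = -A / l2                                   # location for zero mean
    return l1, l2

def gld_draw(l3, l4, rng=random):
    """One draw from the normalized GLD via the percentile function (9)."""
    l1, l2 = gld_lambdas(l3, l4)
    u = rng.random()
    return l1 + (u ** l3 - (1 - u) ** l4) / l2
```

With λ3 = λ4 ≈ 0.1349 the resulting distribution is close to a standard normal; asymmetric choices of (λ3, λ4) deliver the skewed errors used in the experiments.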

We jointly estimate the structural parameters θ and σ² with the nuisance parameters of the non-Gaussian distribution which are necessary for identification of θ and σ². The structural parameter vector is expanded to contain parameters of the error process.11 Define the augmented parameter vector of interest by

θ_SMD = (θ, σ², λ3, λ4)′.

Let the vector of auxiliary parameters be defined from the following regression models:

y_t = φ1 y_{t−1} + … + φp y_{t−p} + c1 y²_{t−1} + v_{1t},   (10a)
y²_t = c0 + c2 y_{t−1} + c3 y²_{t−1} + v_{2t}.   (10b)

Model (10a) captures the dynamics of y_t; the slope parameters of model (10b) reflect information in the higher-order, time-varying cumulants of the process, while the intercept c0 is related to the second unconditional moment of y_t. To capture information in the skewness and kurtosis of the errors, we augment the auxiliary parameter vector with the third and fourth moments of the OLS residuals from regression (10a), i.e., α3 = E(v³_{1t}) and α4 = E(v⁴_{1t}). As a result, the auxiliary parameter vector for the SMD estimator is

ψ_SMD = (φ1, …, φp, c0, c1, c2, c3, α3, α4)′.   (11)

11 It would seem tempting to estimate λ3 and λ4 separately from (θ, σ²)′, for example using the sample skewness and kurtosis of the residuals of a long autoregression. But as discussed in Ramsey and Montenegro (1992), the OLS residuals do not converge in the limit to the true errors when θ(L) is non-invertible, rendering their sample higher moments asymptotically biased as well.
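For concreteness, the auxiliary vector ψ_SMD in (11) can be computed by two OLS regressions plus two residual moments. The helper below is our own sketch (not the authors' code) and assumes numpy is available:

```python
import numpy as np

def aux_params(y, p=4):
    """psi_SMD = (phi_1..phi_p, c0, c1, c2, c3, alpha3, alpha4)' from the
    auxiliary regressions (10a)-(10b) and the residuals of (10a)."""
    y = np.asarray(y, float)
    t = np.arange(p, len(y))
    # (10a): y_t on p lags of y and one lag of y^2
    X1 = np.column_stack([y[t - j] for j in range(1, p + 1)] + [y[t - 1] ** 2])
    b1, *_ = np.linalg.lstsq(X1, y[t], rcond=None)
    v1 = y[t] - X1 @ b1
    # (10b): y_t^2 on a constant, y_{t-1} and y_{t-1}^2
    X2 = np.column_stack([np.ones(len(t)), y[t - 1], y[t - 1] ** 2])
    b2, *_ = np.linalg.lstsq(X2, y[t] ** 2, rcond=None)
    phi, c1 = b1[:p], b1[p]
    c0, c2, c3 = b2
    a3, a4 = np.mean(v1 ** 3), np.mean(v1 ** 4)   # third and fourth moments
    return np.concatenate([phi, [c0, c1, c2, c3, a3, a4]])
```

For Gaussian iid data the estimated c1, c2 and c3 should all be near zero, which is exactly the lack-of-identification case discussed next.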


The auxiliary regressions (10a) and (10b) allow us to perform simple tests for identification. By Lemma 2, two or more cumulants of third order are necessary for identification of θ if |θ0| = 1 is an admissible value in the parameter space. For example, individual t tests of H0: c1 = 0, H0: c2 = 0 and H0: c3 = 0 can shed light on whether the third and fourth order cumulants can identify the structural parameters of the MA(1) model. If the individual null hypotheses are rejected, a joint test can be performed before using the classical or simulation-based estimation of θ.

The proposed simulated minimum distance estimator θ̂_SMD of θ_SMD is obtained as in (8) for a given consistent estimator ψ̂_SMD of the auxiliary parameter vector ψ_SMD. The estimator is semi-parametric because we use a possibly misspecified error distribution to simulate data from the structural model. To establish the consistency and asymptotic normality of the SMD estimator θ̂_SMD, we need some additional notation and regularity conditions. Let P denote the class of generalized lambda distributions and let all limits be taken with respect to P as T → ∞.

Proposition 2 Let ψ_SMD be defined as in (11). Suppose that, in addition to the assumptions in Lemma 1, sup_{θ∈Θ} ‖ψ̃_SMD(θ) − ψ_SMD(θ)‖ →p 0 and √T(ψ̃_SMD(θ0) − ψ_SMD(θ0)) →d N(0, Ω_SMD), where Ω_SMD = Avar(ψ̃_SMD). Then θ̂_SMD →p θ0 and

√T(θ̂_SMD − θ0) →d N(0, (1 + 1/S)[G_SMD(θ0)′ Ω_SMD⁻¹ G_SMD(θ0)]⁻¹) ≡ N(0, Avar(θ̂_SMD)).

Consistency follows from the identifiability of θ, and the moment conditions that exploit information in higher order cumulants play a crucial role. In our procedure, α3 and α4 are defined in terms of λ3 and λ4, so that the estimates of α3 and α4 are implied by the generalized lambda distribution instead of by the sample estimates of skewness and kurtosis. Even though λ3 and λ4 are not parameters of direct interest, they are crucial for identification of θ and σ².

A key feature of Proposition 2 is that it holds when θ is less than, greater than, or equal to one. In a Gaussian likelihood setting, when invertibility is assumed for the purpose of identification, there is a boundary for the support of |θ| at the unit circle. Thus, likelihood-based estimation has non-standard properties when the true value of θ is on or near the boundary of one. In our setup, this boundary constraint is lifted because identification is achieved through higher moments instead of by imposing invertibility. As a consequence, the SMD estimator θ̂_SMD has classical properties provided that λ3 and λ4 enable identification.

Consistent estimation of the asymptotic variance of θ̂_SMD can proceed by substituting a consistent estimator of Ω_SMD and evaluating the Jacobian G_SMD(θ̂_{T,S}) numerically. The computed standard errors can then be used for testing hypotheses and constructing confidence intervals. Alternatively, inference on the MA parameter of interest, θ, can be conducted by constructing


confidence intervals based on test inversion without an explicit computation of the variance matrix Avar(θ̂_SMD). For a sequence of null hypotheses H0: θ = θ_i for θ_i ∈ Θ, consider a generic distance metric (DM) statistic

DM_SMD = [TS/(S + 1)] [J_SMD(ψ̂_T, ψ̃_{S,T}(θ̃); Ω̂) − J_SMD(ψ̂_T, ψ̃_{S,T}(θ̂); Ω̂)],

where θ̂ is the unrestricted estimate and θ̃ is the restricted estimate under the null. Let α be the significance level of the test and q_{1−α} denote the (1 − α)-th quantile of the chi-square distribution with one degree of freedom. Then, the 100(1 − α)% confidence interval for θ is given by the set of values satisfying DM_SMD ≤ q_{1−α}, i.e., C_{1−α}(θ) = {θ ∈ Θ : DM_SMD ≤ q_{1−α}}. The endpoints of the confidence interval are obtained as

θ_L = inf{θ ∈ Θ : Pr(DM_SMD ≤ q_{1−α} | H0) ≥ 1 − α},
θ_U = sup{θ ∈ Θ : Pr(DM_SMD ≤ q_{1−α} | H0) ≥ 1 − α}.
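Schematically, the test-inversion interval collects the grid points at which the DM statistic does not exceed the χ²(1) critical value (2.706 at the 10% level). The helper below is a generic sketch of ours: `dm` stands in for the full SMD distance-metric statistic, which we do not reproduce here.

```python
def invert_dm(dm, grid, crit=2.706):
    """Collect all theta on the grid whose DM statistic is below the
    chi-square(1) critical value; return the interval endpoints."""
    accepted = [th for th in grid if dm(th) <= crit]
    return (min(accepted), max(accepted)) if accepted else None

# Stand-in DM statistic: quadratic in theta around a point estimate of 1.2
dm = lambda th: ((th - 1.2) / 0.1) ** 2
grid = [i / 100 for i in range(50, 201)]   # theta in [0.5, 2.0]
lo, hi = invert_dm(dm, grid)
```

With this quadratic stand-in the interval is roughly 1.2 ± 0.1√2.706, i.e. about [1.04, 1.36] on a 0.01 grid. Note that the grid deliberately spans both the invertible and non-invertible regions, which is what makes the interval informative about invertibility.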

This approach is very convenient since it also provides information on the invertibility of the process. We implement the DM test with ψ = ψ_SMD defined in (11) and Ω being the corresponding asymptotic variance of ψ̂_SMD.

4.2 Monte Carlo Simulations for the SMD Estimator

    This section uses simulations to assess the properties of the proposed SMD estimator. Section 4.2.1

    evaluates the point estimates of an MA(1) model and Section 4.2.2 studies the estimated impulse

    response functions of an ARMA(1, 1) model.

    4.2.1 Parameter Estimation in MA(1) Model

We first study the finite-sample behavior of the proposed SMD estimator in invertible and non-invertible MA(1) models with data generated from

y_t = e_t + θ e_{t−1},   e_t = σ ε_t,

where ε_t ∼ iid(0, 1) is drawn from a GLD with zero excess kurtosis and a skewness parameter of 0.85.12 In all simulation designs, σ = 1 and θ takes the values 0.5, 0.7, 1, 1.5, and 2.13 The sample sizes are T = 1000 and 2000 and the number of Monte Carlo replications is 1000. We also investigate the properties of the SMD estimator for smaller sample sizes (T = 500) and other asymmetric (chi-squared and exponential) distributions.

12 Results for a larger range of values of the skewness parameter for the GLD are not reported to conserve space but are available from the authors upon request.

13 The results are invariant to the choice of σ.


The proposed SMD estimator is implemented as follows. We use an error simulator based on the generalized lambda error distribution. For the auxiliary model (10a), we use p = 4 for the lag order of the AR polynomial. It appears that larger values of S (the number of simulated sample paths of length T) tend to smooth the objective function, which improves the identification of the MA parameter. As a result, we set S = 20, although S > 20 seems to offer even further improvement, especially for small T, but at the cost of increased computational time. In addition to the estimate of θ, the SMD also delivers estimates of σ, λ3 and λ4. From the estimates of λ3 and λ4, we construct estimates of α3 and α4 as (see Ramberg and Schmeiser (1975))

α3 = (C − 3AB + 2A³)/λ2³,
α4 = (D − 4AC + 6A²B − 3A⁴)/λ2⁴,

where A = 1/(1 + λ3) − 1/(1 + λ4), B = 1/(1 + 2λ3) + 1/(1 + 2λ4) − 2 Beta(1 + λ3, 1 + λ4), λ2 = √(B − A²), C = 1/(1 + 3λ3) − 3 Beta(1 + 2λ3, 1 + λ4) + 3 Beta(1 + λ3, 1 + 2λ4) − 1/(1 + 3λ4), D = 1/(1 + 4λ3) − 4 Beta(1 + 3λ3, 1 + λ4) + 6 Beta(1 + 2λ3, 1 + 2λ4) − 4 Beta(1 + λ3, 1 + 3λ4) + 1/(1 + 4λ4), and Beta(·, ·) denotes the beta function.

As is true of all non-linear estimation problems, the numerical optimization problem must

take into account the possibility of local minima. Once non-invertibility is allowed, we need to additionally allow for the possibility of multiple equilibria. Thus, the estimation always considers two sets of initial values. Specifically, we draw two starting values for θ, one from a uniform distribution on (0, 1) and one from a uniform distribution on (1, 2), with the starting value for σ set equal to √(σ̂²_y/(1 + θ²)) for each of the starting values for θ. The starting values for the shape parameters of the GLD, λ3 and λ4, are set equal to those of the standard normal distribution (with α3 = 0 and α4 = 3). In this respect, the starting values of θ, σ, λ3 and λ4 contain little prior knowledge of the true parameters.

Figure 4 illustrates how identifiability depends on skewness by plotting the log of the objective function for the SMD estimator, averaged over 1000 Monte Carlo replications of the MA(1) model, for different values of θ and σ. The true values of θ and σ are 0.7 and 1, respectively, and the errors are generated from a GLD with zero excess kurtosis and four values of the skewness parameter: 0, 0.35, 0.6 and 0.85.14 The first case (skewness = 0) corresponds to lack of identification and there are two pronounced local minima, at θ and 1/θ. As the skewness of the error distribution increases, the second local optimum at 1/θ flattens out and almost completely disappears when the error distribution is highly asymmetric.

14 In evaluating the objective function, the values of the lambda parameters in the generalized lambda distribution are set equal to their true values.


Table 2 reports the mean and median estimates of θ, the average asymptotic standard error of the SMD estimator of θ, and the standard deviation of the estimates for which identification is achieved. In addition, Table 2 presents the empirical probability that the SMD estimate of θ is greater than one, which provides information on how often the identification of the true parameter fails. The last column of Table 2 reports the rejection rate of the DM test of H0: θ = θ0 at the 10% significance level. The main findings can be summarized as follows. The SMD estimator of θ appears to be median unbiased for all values of θ, even for small T. While there is a positive probability that the SMD estimator will converge to 1/θ instead of θ (especially when θ is in the non-invertible region), this probability is fairly small and it disappears completely for T = 2000. Interestingly, in terms of precision, the SMD estimator appears to be more efficient even than the infeasible estimator in Table 1 for values of θ in the invertible region (see also Gorodnichenko et al. (2012) for a similar result in the context of autoregressive models). The asymptotic variance expression in Proposition 2 tends to provide a very good approximation of the finite-sample variation of the SMD estimates. Finally, the rejection rates of the hypothesis tests based on the SMD estimator are very close to the nominal level, which suggests that the asymptotic normality provides a good approximation of the distribution of the SMD estimator over the whole parameter space.

Several remarks regarding the efficiency properties of the SMD estimator are in order. First, the SMD estimator tends to exhibit substantially smaller variability than the CMD estimator in Table 1 (case α3 = 0.85). These efficiency gains are expected since the instrumental model based on the AR approximation encompasses the dependence structure of the MA(1) model as the lag order p increases to infinity. What is somewhat surprising is the magnitude of the efficiency gains. Second, it is instructive to compare the sampling variability of the SMD estimator to that of the ML estimator, which provides the efficiency bound for any estimator in the invertibility region. Recall that the variance of the Gaussian ML estimator is (1 − θ²)/T which, due to the invertibility restriction, shrinks to zero as the MA parameter approaches one. In contrast, our proposed SMD estimator does not impose invertibility and its variance does not exhibit this type of behavior. For this reason, a fair comparison between the SMD and ML estimators would involve values of θ that are far away from the invertibility boundary, such as θ = 0.5. The sample dispersion measures for the SMD estimator of θ0 = 0.5 in Table 2 are apparently very close to, and even lower than, the asymptotic standard error of the MLE, which is 0.0274 and 0.0194 for T = 1000 and T = 2000, respectively. We should note that similar results are reported by Gourieroux et al. (1993) for the simulation-based (indirect inference) estimator of the invertible MA(1) model.

To gain some understanding about the source of the excellent properties of the SMD estimator of θ, Table 3 reports the mean and median SMD estimates of the nuisance parameters σ, λ3 and


λ4, along with their Monte Carlo standard deviations. The estimate of σ is practically unbiased and very precise. Importantly, the skewness parameter, albeit slightly downward biased, is very precisely estimated (its standard deviation is smaller than the standard deviation of the CMD estimator in Table 1). This points to the possibility that the excellent identification and estimation properties of the SMD estimator of θ are likely due to its built-in bias correction and the improved efficiency in the estimation of the higher order moments of the error process.

Finally, Table 4 presents results for the SMD estimator of θ and σ for a smaller sample size (T = 500) and two other asymmetric error distributions: a chi-squared distribution with 6 degrees of freedom (with skewness and excess kurtosis parameters of 1.15 and 2, respectively) and an exponential distribution with a scale parameter of one (with skewness and excess kurtosis parameters of 2 and 6, respectively). The errors are recentered and rescaled to have a mean of zero and variance one. Note that the simulator for the SMD estimator is still based on the GLD family and, hence, is misspecified. The results in Table 4 are in line with the previous results for larger sample sizes and GLD errors. The SMD estimates of θ and σ appear to be almost unbiased and exhibit small variability. With the smaller sample size, the probability that the SMD estimate of θ is not identified increases up to 3.8% in some cases but, overall, the finite-sample properties of our proposed estimator remain quite attractive.

    4.2.2 Impulse Response Function Estimation of All-Pass ARMA(1, 1) Model

One of the main advantages of SMD is its flexibility to accommodate more general models and dependence structures. To illustrate this, we consider the all-pass ARMA(1, 1) model

y_t − φ y_{t−1} = e_t − (1/φ) e_{t−1}  for |φ| < 1,   (12)

where e_t is a standard exponential random variable with a scale parameter equal to one, recentered and rescaled to have mean zero and variance 1. As discussed in Davis (2010), this process possesses some interesting properties. First, the process in (12) is uncorrelated but it exhibits higher order dependence (conditional heteroskedasticity). Furthermore, while the process y_t is causal, it has a non-invertible MA component. If one imposes invertibility on the MA component (or replaces the MA parameter 1/φ by φ and the unit variance of the error term by (1/φ)²), the process has cancelling roots in the AR and MA polynomials and reduces to an iid random sequence. Therefore, using estimators that impose invertibility would result in a flat impulse response function, while the true impulse response for horizon j ≥ 1 is given by

∂y_t/∂e_{t−j} = φ^{j−1}(φ − 1/φ).
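For reference, the true IRF in the last display is immediate to tabulate. The helper below is our own; it can be cross-checked against the ARMA(1,1) ψ-weight recursion ψ_0 = 1, ψ_1 = φ + θ with θ = −1/φ, and ψ_j = φψ_{j−1} for j ≥ 2.

```python
def allpass_irf(phi, horizons):
    """True impulse responses d y_t / d e_{t-j} of the all-pass model (12):
    1 at j = 0 and phi**(j-1) * (phi - 1/phi) for j >= 1."""
    return [1.0 if j == 0 else phi ** (j - 1) * (phi - 1.0 / phi)
            for j in horizons]
```

For φ = 0.5 the responses at horizons 0, 1, 2 are 1, −1.5, −0.75: far from the flat IRF implied by imposing invertibility.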


We investigate the SMD and Gaussian quasi-ML estimates of the impulse response functions (IRFs) for φ = 0.5 and −0.5 (T = 500). The SMD estimator uses the same auxiliary model as in the previous section. The median IRF estimates obtained from 1,000 Monte Carlo replications are plotted in Figure 5 and Figure 6, respectively. The SMD-based IRF estimates are median unbiased and closely trace the shape of the true impulse response. In sharp contrast, the Gaussian quasi-MLE fails to identify the AR and MA parameters and produces a flat IRF around zero.

    5 Empirical Application: Commodity Prices

Non-invertibility can be consistent with economic theory. For example, suppose y_t = E_t Σ_{s=0}^∞ δ^s x_{t+s} is the present value of x_t = e_t + ϑ e_{t−1}. The solution y_t = (1 + δϑ)e_t + ϑ e_{t−1} = h(L)e_t implies that the root of h(z) is −(1 + δϑ)/ϑ, which can be on or inside the unit circle even if |ϑ| < 1. If there is no discounting and δ = 1, y_t has a moving average unit root when ϑ = −0.5, and h(L) is non-invertible in the past whenever ϑ < −0.5.15
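A quick numerical check of this example (our own helper; ϑ and δ as defined above):

```python
def ma_root(theta, delta=1.0):
    """Root of h(z) = (1 + delta*theta) + theta*z, i.e. z* = -(1 + delta*theta)/theta.
    |z*| <= 1 means h(L) is non-invertible (a unit root when |z*| = 1)."""
    return -(1.0 + delta * theta) / theta
```

With δ = 1 the root sits exactly on the unit circle at ϑ = −0.5 and moves inside it for ϑ < −0.5, matching the statement in the text.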

Present value models are used to analyze variables with a forward-looking component, including commodity prices, whose dynamics have implications for monetary policy and asset allocation. It is a stylized fact that commodity price changes are almost uncorrelated (or very weakly autocorrelated) over time and exhibit conditional heteroskedasticity. These two characteristics are also properties of the all-pass models considered in the previous section, and it is interesting to see if commodity price changes are driven by a non-invertible MA component. To see that this is also theoretically plausible, we revisit the present value model of commodity price determination by Pindyck (1993). Let s_t and f_t denote the spot and futures commodity price for delivery at time t + 1, and cy_t be the (net of insurance and storage costs) marginal convenience yield over the period. The no-arbitrage condition implies that

E_t(cy_t) = (1 + i)s_t − f_t,   (13)

where i is the risk-free rate. Let E_t(s_{t+1}) = f_t + rp_t, where rp_t is a time-varying risk premium, and assume that rp_t = (ρ − i)s_t, where ρ denotes a risk-adjusted discount rate for the commodity. Substituting f_t = E_t(s_{t+1}) − (ρ − i)s_t into (13) yields

E_t(s_{t+1}) = (1 + ρ)s_t − E_t(cy_t).   (14)

The stationary (no-bubble) solution to the expectational difference equation (14) is given by

s_t = (1 + ρ)^{−1} Σ_{i=0}^∞ (1 + ρ)^{−i} E_t(cy_{t+i}).

15 If the moving average polynomial ϑ(L) is of infinite order, as would be the case for causal autoregressive processes, it is still possible for the roots of h(L) = [Lϑ(L) − δϑ(δ)]/(L − δ) to be inside the unit disk.


The presence of an invertible MA component in the convenience yield would induce a (possibly) non-invertible MA component in the dynamics of the observable commodity prices. Given the possible nonstationarity in commodity prices, we estimate an ARMA(1, 1) model of commodity (log) price changes

Δs_t = φ Δs_{t−1} + e_t + θ e_{t−1}

using the Gaussian MLE and the proposed SMD estimator.

The data for the empirical analysis consist of commodity prices of the nearest futures contract from the Commodity Research Bureau and cover the period March 1983 - July 2008. The ARMA(1, 1) model is estimated at monthly frequency by taking the last daily price in the month as the corresponding monthly observation. We use 22 commodity prices from 6 commodity groups: energy (crude oil, heating oil), grains and oilseeds (soybean oil, corn, oats, soybeans, wheat, canola), metals (platinum, copper, gold, silver, palladium), industrials (cotton, lumber), livestock and meats (cattle feeder, cattle live, pork bellies, hogs lean) and foodstuffs (cocoa, sugar, coffee).

Table 5 presents the estimation results. Practically all of the commodity price changes exhibit

some form of non-Gaussianity, which is necessary for identifying possible non-invertible MA components. The Gaussian ML tends to produce estimates for φ and θ of similar magnitude and opposite sign, suggesting the presence of cancelling roots and a lack of identifiability. However, this lack of identification could be an artifact of imposing invertibility on the MA root, as argued in the previous section. Indeed, when this restriction is relaxed within the SMD procedure, most of the commodity price changes (except for gold and live cattle) appear to be driven by a non-invertible MA component. Another interesting observation is that the estimated AR and MA parameters are of similar magnitude and sign across the different commodities, which seems to suggest that the parameters are well identified within the SMD procedure. This is not the case for the Gaussian MLE, where the parameter estimates span a wide range of values, which possibly arises from the non-identifiability of the parameters. Overall, there is strong evidence in support of non-invertibility in commodity price changes, which has potentially important implications for impulse response analysis and forecasting.16

    6 Conclusions

    This paper proposes classical and simulation-based minimum distance estimation of possibly non-

    invertible MA models with non-Gaussian errors. The classical minimum distance estimator is

16 Non-invertibility is expected to arise in other variables, such as stock prices, that are believed to be determined by the present value model. In unreported results for the period February 1952 - August 2012, fitting an ARMA(1, 1) to monthly returns on the S&P 500 index (with sample skewness of -0.68 and sample kurtosis of 5.44) produced SMD (ML) estimates of 0.684 (-0.726) and -1.394 (0.791) for the AR and MA parameters, respectively.


developed and analyzed for the MA(1) model with asymmetric errors. The identification of the structural parameters is achieved by exploiting the non-Gaussianity of the process through third order cumulants. This type of identification also removes the boundary problem at the unit circle, which gives rise to the pile-up probability and non-standard asymptotics of the Gaussian maximum likelihood estimator. As a consequence, the proposed classical minimum distance estimator is root-T consistent and asymptotically normal over the whole parameter range, provided that the non-Gaussianity in the data is sufficiently large to ensure identification.

To accommodate more general models with analytically intractable binding functions, we develop a simulation estimator based on auxiliary regressions that incorporate information from the higher order cumulants of the data. The efficiency of the estimator is controlled by the ability of the auxiliary model to approximate the true data generating process. Our proposed simulated minimum distance estimator is semi-parametric in the sense that it uses a possibly misspecified error simulator with a flexible functional form that approximates a large class of distributions with non-Gaussian features. Particular attention is paid to the accurate estimation of the shape parameters of the error distribution, which play a critical role in identifying the structural parameters.


    References

Anderson, T. and Takemura, A. 1986, Why Do Noninvertible Estimated Moving Averages Occur?, Journal of Time Series Analysis 7(4), 235-254.

Andrews, B., Davis, R. and Breidt, F. 2006, Maximum Likelihood Estimation of All-Pass Time Series Models, Journal of Multivariate Analysis 97, 1638-1659.

Andrews, B., Davis, R. and Breidt, F. 2007, Rank-Based Estimation of All-Pass Time Series Models, Annals of Statistics 35, 844-869.

Brockwell, P. and Davis, R. 1991, Time Series: Theory and Methods, 2nd edn, Springer-Verlag, New York.

Czellar, V. and Zivot, E. 2008, Improved Small Sample Inference for Efficient Method of Moments and Indirect Inference Estimators, University of Washington.

Davis, R. 2010, All-Pass Processes with Applications to Finance, Plenary Talk at the 7th International Iranian Workshop on Stochastic Processes.

Davis, R. and Dunsmuir, W. 1996, Maximum Likelihood Estimation for MA(1) Processes with a Root on the Unit Circle, Econometric Theory 12, 1-20.

Davis, R. and Song, L. 2011, Unit Roots in Moving Averages Beyond First Order, Annals of Statistics 39(6), 3062-3091.

Dovonon, P. and Renault, E. 2011, Testing for Common GARCH Factors, MPRA Paper 40244.

Duffie, D. and Singleton, K. 1993, Simulated Moments Estimation of Markov Models of Asset Prices, Econometrica 61, 929-952.

Fernández-Villaverde, J., Rubio-Ramírez, J., Sargent, T. and Watson, M. 2007, ABCs (and Ds) of Understanding VARs, American Economic Review 97(3), 1021-1026.

Gallant, R. and Tauchen, G. 1996, Which Moments to Match, Econometric Theory 12, 657-681.

Ghysels, E., Khalaf, L. and Vodounou, C. 2003, Simulation Based Inference in Moving Average Models, Annales d'Économie et de Statistique 69, 85-99.

Giannakis, G. and Swami, A. 1990, On Estimating Noncausal Nonminimum Phase ARMA Models of Non-Gaussian Processes, IEEE Transactions on Acoustics, Speech, and Signal Processing 38, 478-495.

Gorodnichenko, Y., Mikusheva, A. and Ng, S. 2012, Estimators for Persistent and Possibly Non-Stationary Data with Classical Properties, Econometric Theory 28, 1003-1036.

Gospodinov, N. 2002, Bootstrap Based Inference in Models with a Nearly Noninvertible Moving Average Component, Journal of Business and Economic Statistics 20, 254-268.

Gourieroux, C., Monfort, A. and Renault, E. 1993, Indirect Inference, Journal of Applied Econometrics 8, S85-S118.

Hall, A. 2005, Generalized Method of Moments, Advanced Texts in Econometrics, Oxford University Press, Oxford.


Hansen, L. and Sargent, T. 1991, Two Difficulties in Interpreting Vector Autoregressions, in L. P. Hansen and T. J. Sargent (eds), Rational Expectations Econometrics, Westview, London, pp. 77-119.

Harris, D. 1999, GMM Estimation of Time Series Models, in L. Mátyás (ed.), Generalized Method of Moments Estimation, Themes in Modern Econometrics, Cambridge University Press, Cambridge, U.K., pp. 149-169.

Huang, J. and Pawitan, Y. 2000, Quasi-Likelihood Estimation of Non-Invertible Moving Average Processes, Scandinavian Journal of Statistics 27, 689-702.

Komunjer, I. 2012, Global Identification in Nonlinear Models with Moment Restrictions, Econometric Theory, forthcoming.

Komunjer, I. and Ng, S. 2011, Dynamic Identification of Dynamic Stochastic General Equilibrium Models, Econometrica 79(6), 1995-2032.

Lii, K. and Rosenblatt, M. 1982, Deconvolution and Estimation of Transfer Function Phase and Coefficients for Non-Gaussian Linear Processes, Annals of Statistics 10, 1195-1208.

Lii, K. and Rosenblatt, M. 1992, An Approximate Maximum Likelihood Estimation of Non-Gaussian Non-Minimum Phase Moving Average Processes, Journal of Multivariate Analysis 43, 272-299.

Lippi, M. and Reichlin, L. 1993, The Dynamic Effects of Aggregate Demand and Supply Disturbances: Comment, American Economic Review 83, 644-652.

Meitz, M. and Saikkonen, P. 2011, Maximum Likelihood Estimation of a Non-Invertible ARMA Model with Autoregressive Conditional Heteroskedasticity, mimeo, University of Helsinki.

Mendel, J. 1991, Tutorial on Higher Order Statistics in Signal Processing and System Theory: Theoretical Results and Some Applications, Proceedings of the IEEE 79(3), 278-305.

Michaelides, A. and Ng, S. 2000, Estimating the Rational Expectations Model of Speculative Storage: A Monte Carlo Comparison of Three Simulation Estimators, Journal of Econometrics 96(2), 231-266.

Newey, W. and McFadden, D. 1994, Large Sample Estimation and Hypothesis Testing, Handbook of Econometrics, Vol. 4, Chapter 36, North Holland.

Phillips, P. 2012, Folklore Theorems, Implicit Maps, and Indirect Inference, Econometrica 80(1), 425-454.

Pindyck, R. 1993, The Present Value Model of Rational Commodity Pricing, Journal of Political Economy 103, 511-530.

Ramberg, J. and Schmeiser, B. 1975, An Approximate Method for Generating Asymmetric Random Variables, Communications of the ACM 17(2), 78-82.

Ramsey, J. and Montenegro, A. 1992, Identification and Estimation of Non-invertible Non-Gaussian MA(q) Processes, Journal of Econometrics 54, 301-320.

Rothenberg, T. 1971, Identification in Parametric Models, Econometrica 39(3), 577-591.

Ruud, P. 2000, An Introduction to Classical Econometric Theory, Oxford University Press, New York.


Sargan, D. and Bhargava, A. 1983, Maximum Likelihood Estimation of Regression Models with First Order Moving Average Errors When the Root Lies on the Unit Circle, Econometrica 51, 799-820.

Sargan, J. D. 1983, Identification and Lack of Identification, Econometrica 51(6), 1605-1633.

Tugnait, J. 1986, Identification of Non-Minimum Phase Linear Stochastic Systems, Automatica 22, 457-464.


    Table 1: CMD estimates from MA(1) model with possibly asymmetric errors

                 CMD estimator                  just-identified estimator      infeasible estimator
            θ̂             γ̂3              θ̂             γ̂3              θ̂             γ̂3
    θ0   mean   std.   mean   std.     mean   std.   mean   std.     mean   std.   mean   std.

    γ3 = 0
    0.5  1.459  0.754   0.001  0.126   1.623  0.675  -0.020  0.360   0.501  0.048  -0.019  0.516
    0.7  1.139  0.394  -0.003  0.168   1.271  0.342  -0.012  0.287   0.697  0.056  -0.014  0.374
    1.0  1.035  0.223  -0.006  0.189   1.082  0.200  -0.011  0.294   0.995  0.052  -0.010  0.296
    1.5  1.146  0.443  -0.006  0.159   1.342  0.365  -0.015  0.293   1.496  0.047  -0.009  0.260
    2.0  1.417  0.764  -0.002  0.126   1.753  0.594  -0.025  0.335   1.996  0.051  -0.008  0.257

    γ3 = 0.35
    0.5  0.648  0.446   0.327  0.141   1.643  0.666   0.203  0.351   0.501  0.048   0.323  0.500
    0.7  0.801  0.266   0.342  0.178   1.290  0.329   0.249  0.282   0.696  0.056   0.329  0.363
    1.0  1.026  0.204   0.331  0.178   1.097  0.197   0.330  0.287   0.994  0.052   0.329  0.288
    1.5  1.432  0.305   0.343  0.167   1.383  0.334   0.363  0.284   1.496  0.047   0.331  0.253
    2.0  1.925  0.429   0.336  0.130   1.793  0.557   0.393  0.345   1.996  0.052   0.331  0.250

    γ3 = 0.6
    0.5  0.506  0.108   0.593  0.102   1.614  0.677   0.355  0.357   0.501  0.048   0.561  0.471
    0.7  0.702  0.105   0.602  0.144   1.305  0.320   0.431  0.281   0.697  0.056   0.569  0.342
    1.0  1.010  0.153   0.572  0.174   1.104  0.194   0.570  0.280   0.994  0.052   0.568  0.273
    1.5  1.514  0.201   0.601  0.139   1.369  0.345   0.635  0.298   1.496  0.047   0.571  0.238
    2.0  2.019  0.234   0.594  0.099   1.741  0.604   0.702  0.408   1.996  0.052   0.572  0.236

    γ3 = 0.85
    0.5  0.499  0.055   0.825  0.078   1.619  0.674   0.493  0.360   0.500  0.048   0.790  0.433
    0.7  0.692  0.083   0.826  0.133   1.315  0.313   0.603  0.281   0.696  0.056   0.796  0.314
    1.0  1.001  0.095   0.804  0.161   1.105  0.194   0.795  0.273   0.994  0.052   0.793  0.251
    1.5  1.527  0.183   0.828  0.127   1.337  0.367   0.901  0.320   1.495  0.047   0.797  0.219
    2.0  2.020  0.216   0.824  0.083   1.648  0.667   1.026  0.509   1.995  0.052   0.798  0.217

    Notes: The table reports the mean and the standard deviation (std.) of the CMD estimates of θ and γ3 from the MA(1) model y_t = e_t + θ e_{t-1}, e_t = σ ε_t and ε_t ~ iid(0, 1), generated from a generalized lambda distribution with a skewness parameter γ3 and zero excess kurtosis. The sample size is T = 1000, the number of Monte Carlo replications is 5000 and σ = 1. The CMD estimator is the over-identified classical minimum distance estimator of (θ, σ, γ3)′ with a vector of auxiliary parameters (E(y_t y_{t-1}), E(y_t^2), E(y_t^2 y_{t-1}), E(y_t^3), E(y_t y_{t-1}^2))′; the just-identified estimator is the classical minimum distance estimator of (θ, σ, γ3)′ with auxiliary parameters (E(y_t y_{t-1}), E(y_t^2), E(y_t^2 y_{t-1}))′; and the infeasible estimator is the classical minimum distance estimator of (θ, γ3)′ with σ = 1 assumed known and auxiliary parameters (E(y_t y_{t-1}), E(y_t^2), E(y_t^2 y_{t-1}))′.
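    The auxiliary parameters listed in the notes have simple model-implied counterparts under y_t = e_t + θ e_{t-1} with e_t = σ ε_t and third moment E(ε_t^3) = κ3. A minimal sketch of the moment calculations is given below; it is not the authors' code, and standardized chi-squared(6) errors stand in for the generalized lambda draws purely as an assumed skewed iid(0, 1) error. The model-implied expressions follow from direct expansion of y_t = e_t + θ e_{t-1} and are not quoted from the paper.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def simulate_ma1(theta, sigma, T, rng):
        # e_t = sigma * eps_t with eps_t skewed, mean 0, variance 1
        # (standardized chi-squared(6) stands in for the generalized lambda draws)
        eps = (rng.chisquare(6, T + 1) - 6.0) / np.sqrt(12.0)
        e = sigma * eps
        return e[1:] + theta * e[:-1]          # y_t = e_t + theta * e_{t-1}

    def auxiliary_moments(y):
        # (E[y_t y_{t-1}], E[y_t^2], E[y_t^2 y_{t-1}], E[y_t^3], E[y_t y_{t-1}^2])
        yt, ylag = y[1:], y[:-1]
        return np.array([np.mean(yt * ylag), np.mean(yt**2), np.mean(yt**2 * ylag),
                         np.mean(yt**3), np.mean(yt * ylag**2)])

    theta, sigma = 0.5, 1.0
    kappa3 = np.sqrt(8.0 / 6.0)                # skewness of chi-squared(6)
    y = simulate_ma1(theta, sigma, 500_000, rng)

    m_hat = auxiliary_moments(y)
    # model-implied values of the same five moments
    m_model = np.array([theta * sigma**2,
                        (1 + theta**2) * sigma**2,
                        theta**2 * sigma**3 * kappa3,
                        (1 + theta**3) * sigma**3 * kappa3,
                        theta * sigma**3 * kappa3])
    print(np.round(m_hat, 3))
    print(np.round(m_model, 3))
    ```

    Note that the third-order moments carry θ in powers that differ from those of the autocovariances, which is why they help separate θ from 1/θ.
    
    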


    Table 2: SMD estimates of θ from MA(1) model with asymmetric errors

    true θ0   mean    median   Pr(θ̂_SMD > 1)   s.e.    std.    DM test

    T = 1000
    0.5       0.500   0.500    0.000           0.026   0.027   0.112
    0.7       0.701   0.700    0.000           0.030   0.031   0.112
    1.0       0.969   0.978    0.381           0.074   0.074   0.104
    1.5       1.499   1.499    0.997           0.067   0.070   0.112
    2.0       2.007   2.004    0.997           0.113   0.119   0.114

    T = 2000
    0.5       0.500   0.500    0.000           0.019   0.019   0.111
    0.7       0.701   0.700    0.000           0.022   0.022   0.101
    1.0       0.982   0.990    0.433           0.056   0.058   0.101
    1.5       1.500   1.499    1.000           0.048   0.050   0.117
    2.0       2.004   2.002    1.000           0.080   0.083   0.108

    Notes: The table reports some summary statistics of the simulated minimum distance (SMD) estimates of θ from the MA(1) model y_t = e_t + θ e_{t-1}, e_t = σ ε_t and ε_t ~ iid(0, 1) generated from a generalized lambda distribution with a skewness parameter γ3 = 0.85 and zero excess kurtosis. The sample size is T = 1000 and 2000, the number of Monte Carlo replications is 1000 and σ = 1. Pr(θ̂_SMD > 1) signifies the probability (over Monte Carlo replications) that θ̂_SMD > 1; s.e. is the average standard error computed from consistent estimates of the relevant asymptotic variance expressions and std. denotes the Monte Carlo standard deviation of θ̂_SMD. The last column of the table reports the rejection rates of the DM test of H0: θ = θ0 at the 10% significance level.
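    The mechanics behind an SMD estimator of this kind can be illustrated with a stylized sketch: fixed simulation draws are reused for every candidate parameter value, and (θ, σ) are chosen to match second- and third-order auxiliary statistics of the data. This is only an illustration under assumed simplifications (coarse grid search, standardized chi-squared errors in place of the generalized lambda draws), not the authors' implementation.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)

    def skewed_eps(n, g):
        # standardized chi-squared(4): mean 0, variance 1, skewness sqrt(2)
        return (g.chisquare(4, n) - 4.0) / np.sqrt(8.0)

    def ma1(theta, sigma, eps):
        e = sigma * eps
        return e[1:] + theta * e[:-1]

    def aux(y):
        # second- and third-order auxiliary statistics
        yt, ylag = y[1:], y[:-1]
        return np.array([np.mean(yt * ylag), np.mean(yt**2), np.mean(yt**2 * ylag)])

    # "observed" data from a non-invertible MA(1) with skewed errors
    theta0, sigma0 = 1.5, 1.0
    m_obs = aux(ma1(theta0, sigma0, skewed_eps(2000, rng)))

    eps_sim = skewed_eps(30_000, rng)      # fixed draws, reused for every candidate

    best, theta_hat, sigma_hat = np.inf, None, None
    for th in np.arange(0.2, 3.01, 0.05):
        for sg in np.arange(0.5, 2.01, 0.05):
            d = np.sum((m_obs - aux(ma1(th, sg, eps_sim)))**2)
            if d < best:
                best, theta_hat, sigma_hat = d, th, sg
    print(theta_hat, sigma_hat)
    ```

    The second moments alone cannot distinguish (θ, σ) from (1/θ, θσ); it is the third-order statistic E(y_t^2 y_{t-1}) that pushes the minimizer toward the non-invertible root θ > 1 when that is the truth.
    
    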


    Table 3: SMD estimates of σ, γ3 and γ4 from MA(1) model with asymmetric errors

                     σ̂_SMD                  γ̂3,SMD                 γ̂4,SMD
    true θ0   mean   median  std.     mean   median  std.     mean   median  std.

    T = 1000
    0.5       0.994  0.994   0.022    0.817  0.816   0.062    2.966  2.937   0.242
    0.7       0.994  0.994   0.022    0.817  0.815   0.068    2.956  2.942   0.263
    1.0       1.000  0.997   0.040    0.803  0.805   0.100    2.982  2.961   0.298
    1.5       1.004  0.997   0.049    0.817  0.817   0.100    2.993  2.933   0.503
    2.0       1.008  0.998   0.062    0.816  0.816   0.086    2.993  2.910   0.389

    T = 2000
    0.5       0.997  0.997   0.015    0.824  0.824   0.044    2.988  2.978   0.153
    0.7       0.997  0.997   0.015    0.824  0.822   0.049    2.982  2.972   0.189
    1.0       1.000  0.999   0.031    0.816  0.818   0.074    2.988  2.966   0.244
    1.5       1.001  1.001   0.034    0.823  0.820   0.069    2.975  2.936   0.359
    2.0       1.008  0.999   0.043    0.822  0.823   0.059    2.977  2.911   0.281

    Notes: The table reports some summary statistics of the simulated minimum distance (SMD) estimates of σ, γ3 and γ4 from the MA(1) model y_t = e_t + θ e_{t-1}, e_t = σ ε_t and ε_t ~ iid(0, 1) generated from a generalized lambda distribution with a skewness parameter γ3 = 0.85 and zero excess kurtosis (γ4 = 3). The sample size is T = 1000 and 2000, the number of Monte Carlo replications is 1000 and σ = 1. std. denotes the Monte Carlo standard deviation of the corresponding estimate.


    Table 4: SMD estimates of θ and σ from MA(1) model with chi-squared and exponential errors

    error                          θ̂_SMD                                    σ̂_SMD
    distr.    true θ0   mean   median  Pr(θ̂_SMD > 1)  std.     mean   median  std.

    χ²(6)     0.5       0.548  0.504   0.027          0.046    0.955  0.968   0.045
    χ²(6)     0.7       0.716  0.706   0.022          0.055    0.967  0.969   0.045
    χ²(6)     1.0       0.979  0.982   0.425          0.098    0.983  0.985   0.064
    χ²(6)     1.5       1.458  1.478   0.962          0.105    1.009  0.996   0.077
    χ²(6)     2.0       1.967  1.982   0.978          0.177    1.006  0.987   0.096
    exp(1)    0.5       0.558  0.500   0.038          0.047    0.935  0.953   0.054
    exp(1)    0.7       0.705  0.700   0.004          0.051    0.960  0.960   0.053
    exp(1)    1.0       0.966  0.978   0.377          0.083    0.987  0.985   0.064
    exp(1)    1.5       1.493  1.500   0.977          0.114    0.988  0.982   0.083
    exp(1)    2.0       2.015  2.021   0.987          0.188    0.981  0.973   0.105

    Notes: The table reports some summary statistics of the simulated minimum distance (SMD) estimates of θ and σ from the MA(1) model y_t = e_t + θ e_{t-1}, e_t = σ ε_t, where ε_t is either an iid chi-squared random variable with 6 degrees of freedom (χ²(6)) or an exponential random variable with a scale parameter equal to one (exp(1)). The errors ε_t are recentered and rescaled to have mean zero and variance 1. The sample size is T = 500, the number of Monte Carlo replications is 1000 and σ = 1. std. denotes the Monte Carlo standard deviation of the corresponding estimate.


    Table 5: SMD and Gaussian ML estimates of ARMA(1, 1) model for commodity prices

                     sample moments          Gaussian ML                      SMD
    commodity        skewness  kurtosis  φ̂              θ̂               φ̂              θ̂

    crude oil        0.019     5.259     0.403 (0.251)   0.538 (0.241)   0.617 (0.075)   1.575 (0.218)
    heating oil      0.207     7.394     0.653 (0.239)   0.742 (0.217)   0.501 (0.096)   1.531 (0.076)
    soybean oil      0.018     5.440     0.893 (0.106)   0.814 (0.124)   0.635 (0.065)   1.595 (0.108)
    corn             0.346     6.884     0.634 (0.562)   0.583 (0.588)   0.628 (0.060)   1.627 (0.170)
    oats             1.086     9.538     0.176 (0.512)   0.288 (0.497)   0.696 (0.057)   1.274 (0.081)
    soybeans         0.724     7.259     0.881 (0.095)   0.797 (0.116)   0.515 (0.108)   1.886 (0.255)
    wheat            0.020     3.565     0.154 (0.693)   0.078 (0.697)   0.924 (0.412)   1.044 (0.456)
    canola           0.887     11.255    0.408 (0.405)   0.500 (0.390)   0.402 (0.091)   1.691 (0.067)
    platinum         0.246     4.963     0.046 (0.606)   0.146 (0.591)   0.638 (0.043)   1.428 (0.066)
    copper           0.495     5.400     0.314 (1.524)   0.341 (1.498)   0.691 (0.084)   1.347 (0.126)
    gold             0.347     3.598     0.267 (0.614)   0.348 (0.586)   0.234 (0.241)   4.426 (4.804)
    silver           0.014     3.979     0.171 (0.314)   0.335 (0.295)   0.098 (0.346)   0.266 (0.330)
    palladium        0.229     5.189     0.290 (1.272)   0.253 (1.278)   0.680 (0.049)   1.443 (0.154)
    cotton           2.040     18.857    0.903 (0.023)   1.000 (0.016)   0.883 (0.078)   1.048 (0.101)
    lumber           0.139     3.498     0.449 (0.317)   0.329 (0.332)   0.712 (0.149)   1.166 (0.181)
    cattle, feeder   0.498     5.912     0.046 (2.333)   0.020 (2.337)   0.623 (0.052)   1.386 (0.064)
    cattle, live     0.462     5.079     0.698 (0.084)   0.891 (0.053)   0.670 (0.596)   0.955 (0.987)
    pork bellies     0.503     5.198     0.841 (0.045)   0.961 (0.025)   0.390 (0.078)   1.673 (0.094)
    hogs, lean       0.396     5.462     0.822 (0.046)   0.973 (0.029)   0.579 (0.065)   1.290 (0.090)
    cocoa            0.325     3.832     0.230 (0.281)   0.054 (0.287)   0.016 (0.433)   0.180 (0.410)
    sugar            1.127     6.920     0.933 (0.021)   1.000 (0.014)   0.363 (0.119)   2.506 (0.658)
    coffee           0.374     4.685     0.338 (0.657)   0.263 (0.664)   0.669 (0.060)   1.375 (0.132)

    Notes: The table reports the SMD and Gaussian quasi-ML estimates and standard errors (in parentheses) for the ARMA(1, 1) model Δ4 s_t = φ Δ4 s_{t-1} + e_t + θ e_{t-1}, where e_t ~ iid(0, σ²). The first two columns report the sample skewness and kurtosis of Δ4 s_t.
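    The ARMA(1, 1) specification in the notes can be simulated to see how a non-invertible MA component combined with skewed errors produces series with pronounced sample skewness and excess kurtosis, as observed for most commodities above. A small sketch follows; the parameter values are illustrative only, not estimates from the table.

    ```python
    import numpy as np

    rng = np.random.default_rng(7)

    def simulate_arma11(phi, theta, e):
        # y_t = phi * y_{t-1} + e_t + theta * e_{t-1}
        y = np.empty(len(e) - 1)
        prev = 0.0
        for t in range(1, len(e)):
            prev = phi * prev + e[t] + theta * e[t - 1]
            y[t - 1] = prev
        return y

    def sample_skew_kurt(x):
        z = (x - x.mean()) / x.std()
        return np.mean(z**3), np.mean(z**4)

    # recentered standard exponential errors: mean 0, variance 1, skewness 2
    e = rng.exponential(1.0, 50_001) - 1.0
    y = simulate_arma11(0.4, 1.5, e)       # illustrative phi and non-invertible theta
    sk, ku = sample_skew_kurt(y)
    print(round(sk, 2), round(ku, 2))
    ```

    The skewed errors propagate through the MA(∞) weights, so the simulated series inherits positive skewness and kurtosis above 3, in line with the sample moments reported in the first two columns.
    
    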


    [Figure: four density plots; panels labeled skewness = 0, 0.35, 0.6, 0.85]

    Figure 1: Density functions of the standardized CMD estimator (t-statistic) of θ based on data (T = 1000) generated from an MA(1) model y_t = e_t + θ e_{t-1} with θ = 1.5 and e_t ~ iid(0, 1). The errors are drawn from a generalized lambda distribution with zero excess kurtosis and a skewness parameter equal to 0, 0.35, 0.6 and 0.85.


    [Figure: four density plots; panels labeled skewness = 0, 0.35, 0.6, 0.85]

    Figure 2: Density functions of the standardized CMD estimator (t-statistic) of σ based on data (T = 1000) generated from an MA(1) model y_t = e_t + θ e_{t-1} with θ = 1.5 and e_t ~ iid(0, 1). The errors are drawn from a generalized lambda distribution with zero excess kurtosis and a skewness parameter equal to 0, 0.35, 0.6 and 0.85.


    [Figure: four density plots; panels labeled skewness = 0, 0.35, 0.6, 0.85]

    Figure 3: Density functions of the standardized CMD estimator (t-statistic) of the skewness parameter (γ3) based on data (T = 1000) generated from an MA(1) model y_t = e_t + θ e_{t-1} with θ = 1.5 and e_t ~ iid(0, 1). The errors are drawn from a generalized lambda distribution with zero excess kurtosis and a skewness parameter equal to 0, 0.35, 0.6 and 0.85.


    [Figure: four surface plots of the log objective function over (θ, σ); panels labeled skewness = 0, 0.35, 0.6, 0.85]

    Figure 4: Logarithm of the objective function of the SMD estimator of θ and σ based on data (T = 1000) generated from an MA(1) model y_t = e_t + θ e_{t-1} with θ = 0.7 and e_t ~ iid(0, 1). The errors are drawn from a generalized lambda distribution with zero excess kurtosis and a skewness parameter equal to 0, 0.35, 0.6 and 0.85.


    [Figure: line plot over horizons 1–10; y-axis: impulse response function; legend: true IRF, SMD-based IRF median estimate, ML-based IRF median estimate]

    Figure 5: SMD and Gaussian quasi-ML median estimates of the impulse response function from the ARMA(1, 1) model (1 + 0.5L)y_t = (1 + 2L)e_t, where e_t is a standard exponential random variable (scale parameter equal to one) recentered and rescaled to have mean zero and variance 1. The sample size of the simulated series is T = 500.


    [Figure: line plot over horizons 1–10; y-axis: impulse response function; legend: true IRF, SMD-based IRF median estimate, ML-based IRF median estimate]

    Figure 6: SMD and Gaussian quasi-ML median estimates of the impulse response function from the ARMA(1, 1) model (1 - 0.5L)y_t = (1 - 2L)e_t, where e_t is a standard exponential random variable (scale parameter equal to one) recentered and rescaled to have mean zero and variance 1. The sample size of the simulated series is T = 500.
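    The true IRF lines in Figures 5 and 6 follow from the MA(∞) representation of the ARMA(1, 1) model (1 - φL)y_t = (1 + θL)e_t: ψ_0 = 1, ψ_1 = φ + θ, and ψ_j = φ ψ_{j-1} for j ≥ 2. A short sketch computing the weights for both figures' models:

    ```python
    import numpy as np

    def arma11_irf(phi, theta, horizons=10):
        # MA(infinity) weights of (1 - phi*L) y_t = (1 + theta*L) e_t:
        # psi_0 = 1, psi_1 = phi + theta, psi_j = phi * psi_{j-1} for j >= 2
        psi = [1.0, phi + theta]
        while len(psi) < horizons:
            psi.append(phi * psi[-1])
        return np.array(psi)

    # Figure 5 model: (1 + 0.5L) y_t = (1 + 2L) e_t  ->  phi = -0.5, theta = 2
    print(arma11_irf(-0.5, 2.0))    # 1, 1.5, -0.75, 0.375, ... (alternating decay)
    # Figure 6 model: (1 - 0.5L) y_t = (1 - 2L) e_t  ->  phi = 0.5, theta = -2
    print(arma11_irf(0.5, -2.0))    # 1, -1.5, -0.75, -0.375, ... (negative decay)
    ```

    The alternating response in Figure 5 and the uniformly negative response in Figure 6 both come directly from the geometric recursion in the AR coefficient φ.
    
    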