Dynamic Time Series Models and Heavy-Tailed … Time Series Models and Heavy-Tailed Distributions...

Dynamic Time Series Models and Heavy-TailedDistributions

NBP Workshop on Forecasting, Warsaw, September2015

Andrew Harvey ([email protected])

Faculty of Economics, University of Cambridge

September 2015

(Faculty of Economics, University of Cambridge)Dynamic Time Series Models and Heavy-Tailed Distributions NBP Workshop on Forecasting, Warsaw, September 2015 Andrew Harvey ([email protected])September 2015 1 / 78

Introduction to dynamic conditional score (DCS) models

1. A unified and comprehensive theory for a class of nonlinear time seriesmodels in which the conditional distribution of an observation may beheavy-tailed and the location and/or scale changes over time.2. Dynamics are driven by the score of the conditional distribution.3. For EGARCH, analytic expressions may be derived for (unconditional)moments, autocorrelations and moments of multi-step forecasts.4. An asymptotic distributional theory for ML estimators can be obtained,sometimes with analytic expressions for the asymptotic covariance matrix.5. Extensions to multivariate time series. Correlation or association maychange over time. Time-varying copulas.***Harvey, A.C. Dynamic models for volatility and heavy tails. CUP. 2013http://www.econ.cam.ac.uk/DCSCreal et al (2011, JBES, 2013, JAE).


Introduction to dynamic conditional score (DCS) models

A guiding principle is signal extraction. When combined with basic ideasof maximum likelihood estimation, the signal extraction approach leads tomodels which, in contrast to many in the literature, are relatively simple intheir form and yield analytic expressions for their principal features.For estimating location, DCS models are closely related to the unobservedcomponents (UC) models described in Harvey (1989).Such models can be handled using state space methods and they are easilyaccessible using the STAMP package of Koopman et al (2008).For estimating scale, the models are close to stochastic volatility (SV)models, where the variance is treated as an unobserved component.


Unobserved component models

A simple Gaussian signal plus noise model is

yt = µt + εt , εt ∼ NID(0, σ2ε

), t = 1, ...,T

µt+1 = φµt + ηt , ηt ∼ NID(0, σ2η),where the irregular and level disturbances, εt and ηt , are mutuallyindependent. The AR parameter is φ, while the signal-noise ratio,q = σ2η/σ2ε , plays the key role in determining how observations should beweighted for prediction and signal extraction.The reduced form (RF) is an ARMA(1,1) process

yt = φyt−1 + ξt − θξt−1, ξt ∼ NID(0, σ2

),

but with restrictions on θ. For example, when φ = 1, 0 ≤ θ ≤ 1. Theforecasts from the UC model and RF are the same.


Unobserved component models

The UC model is effectively in state space form (SSF) and, as such, it maybe handled by the Kalman filter (KF). The parameters φ and q can beestimated by ML, with the likelihood function constructed from theone-step ahead prediction errors.The KF can be expressed as a single equation. Writing this equationtogether with an equation for the one-step ahead prediction error, vt , givesthe innovations form (IF) of the KF:

yt = µt |t−1 + vtµt+1|t = φµt |t−1 + ktvt

The Kalman gain, kt , depends on φ and q.In the steady-state, kt is constant. Setting it equal to κ and re-arranginggives the ARMA(1,1) model with ξt = vt and φ− κ = θ.


Outliers

Suppose noise is from a heavy tailed distribution, such as Student’s t.Outliers.The RF is still an ARMA(1,1), but allowing the ξ ′ts to have a heavy-taileddistribution does not deal with the problem as a large observation becomesincorporated into the level and takes time to work through the system.An ARMA models with a heavy-tailed distribution is designed to handleinnovations outliers, as opposed to additive outliers. See the robustnessliterature.But a model-based approach is not only simpler than the usual robustmethods, but is also more amenable to diagnostic checking andgeneralization.


Unobserved component models for non-Gaussian noise

Simulation methods, such as MCMC, provide the basis for a direct attackon models that are nonlinear and/or non-Gaussian. The aim is to extendthe Kalman filtering and smoothing algorithms that have proved soeffective in handling linear Gaussian models. Considerable progress hasbeen made in recent years; see Durbin and Koopman (2012).But simulation-based estimation can be time-consuming and subject to adegree of uncertainty.Also the statistical properties of the estimators are not easy to establish.


Observation driven model based on the score

The DCS approach begins by writing down the distribution of the t − thobservation, conditional on past observations. Time-varying parametersare then updated by a suitably defined filter. Such a model is observationdriven, as opposed to a UC model which is parameter driven. In a linearGaussian UC model, the KF is driven by the one step-ahead predictionerror, vt . The DCS filter replaces vt in the KF equation by a variable, ut ,that is proportional to the score of the conditional distribution.The innovations form becomes

yt = µt |t−1 + vt , t = 1, ...,T

µt+1|t = φµt |t−1 + κut

where κ is an unknown parameter.


Why the score ?

Suppose that at time t − 1 we have θt−1 the ML estimate of a parameter,θ. Then the score is zero, that is

∂ ln Lt−1(θ)∂θ

=t−1∑j=1

∂ ln `j (θ)∂θ

= 0 at θ = θt−1. (1)

where `j (θ; yj ) = f (yj ; θ). When a new observation becomes available, asingle iteration of the method of scoring gives

θt = θt−1 +1

It (θt−1)

∂ ln Lt (θ)∂θ

= θt−1 +1

It (θt−1)

∂ ln `t (θ)∂θ

where It (θt−1) = t.I (θt−1) is the information matrix for t observationsand the last line follows because of (1).


Why the score ?

For a Gaussian distribution a single update goes straight to the MLestimate at time t (recursive least squares).

RemarkWe may be able to choose a link function so that the information quantitydoes not depend on θ.

As t → ∞, It (θt−1)→ ∞ so the recursion becomes closed to newinformation. If it is thought that θ changes over time, the filter needs tobe opened up. This may be done by replacing 1/t by a constant, whichmay be denoted as κ.


Why the score ?

Thus

θt = θt−1 + κ1

I (θt−1)

∂ ln `t (θ)∂θ

With no information about how θ might evolve, the above equation mightbe converted to the predictive form by letting θt+1|t = θt so

θt+1|t = θt |t−1 + κ1

I (θt |t−1)∂ ln `t (θ)

∂θ

For a Gaussian distribution in which θ is the mean and the variance isknown to be σ2, the recursion is an EWMA because

1I (θt |t−1)

∂ ln `t (θ)∂θ

= yt − θt |t−1


Why the score ?

If there is reason to think that the parameter tends to revert to anunderlying level, ω, the updating scheme might become

θt+1|t = ω(1− φ) + φθt |t−1 + κ1

I (θt |t−1)∂ ln `t (θ)

∂θ

where |φ| < 1. This filter corresponds to a first-order autoregressiveprocess.More generally we might introduce lags so as to smooth out the changes orallow for periodic effects. This leads to the formulation of a QARMA filter.The score can also be motivated by a conditional mode argument based onsmoothed estimates, θt |T . See Durbin and Koopman (2012, p 252-3) andHarvey (2013, p 87-9)


Why the score ?

The attraction of treating the score-driven filter as a model in its ownright is that it becomes possible to derive its properties and to find theasymptotic distribution of the ML estimator.

Seen in this way, the justification for the class of DCS models is not thatthey approximate corresponding UC models, but rather that theirstatistical properties are both comprehensive and straightforward.

An immediate practical advantage comes from the response of the score toan outlier.


Dynamic location model

yt = ω+ µt |t−1 + vt = ω+ µt |t−1 + exp(λ)εt ,

µt+1|t = φµt |t−1 + κut ,

where εt is serially independent, standard t-variate and

ut =

(1+

(yt − µt |t−1)2

νe2λ

)−1vt ,

where vt = yt − µt |t−1 is the prediction error and ϕ = exp(λ) is the(time-invariant) scale.ut → 0 as |y | → ∞. In the robustness literature this is called aredescending M-estimator. It is a gentle form of trimming.


5 4 3 2 1 1 2 3 4 5

5

4

3

2

1

1

2

3

4

5

y

u

Figure: Impact of ut for tν (with a scale of one) for ν = 3 (thick), ν = 10 (thin)and ν = ∞ (dashed).


Seasonally adjusted monthly data on U.S. Average Weekly Hours ofProduction and Nonsupervisory Employees: Manufacturing (AWHMAN)from February 1992 to May 2010. DCS-t model

yt = µt |t−1 + exp(λ)εt

µt+1|t = µt |t−1 + κut

κ = 1.246 λ = −3.625 ν = 6.35


0 5 10 15 20 25 30 35 40 45 50 55 60

40.0

40.5

41.0

41.5Awhman mut

0 5 10 15 20 25 30 35 40 45 50 55 60

40.0

40.5

41.0

41.5

Awhman FilGauss

Figure: DCS and Gaussian ( bottom panel) local level models fitted to USaverage weekly hours of production.


Scale: DCS Volatility Models

For a DCS model, replace ut in the conditional variance equation

σ2t+1|t = γ+ φσ2t |t−1 + ασ2t |t−1ut ,

by another MD

ut =(ν+ 1)y2t

(ν− 2)σ2t |t−1 + y2t− 1, −1 ≤ ut ≤ ν, ν > 2.

which is proportional to the score of the conditional variance.


5 4 3 2 1 1 2 3 4 5

1

1

2

3

4

5

6

7

8

y

u

Figure: Impact of ut for tν with ν = 3 (thick), ν = 6 (medium dashed) ν = 10(thin) and ν = ∞ (dashed).


Exponential DCS Volatility Models: Beta-t-EGARCH

yt = εt exp(λt pt−1), t = 1, ....,T ,

where the serially independent, zero mean variable εt has a tν−distributionwith degrees of freedom, ν > 0, and the dynamic equation for the log ofscale is

λt+1pt = δ+ φλt pt−1 + κut .

The conditional score is

ut =(ν+ 1)y2t

ν exp(2λt |t−1) + y2t− 1, −1 ≤ ut ≤ ν, ν > 0

NB The variance is equal to the square of the scale, that is(ν− 2)σ2t |t−1/ν for ν > 2.


4020 4030 4040 4050 4060 4070 4080 4090 4100 4110 4120

10

20

30

40

50

60

70

ApplesdBetatEGARCH

sdGARCH


Beta-t-EGARCH

The variable ut may be expressed as

ut = (ν+ 1)bt − 1,

where

bt =y2t /ν exp(2λt pt−1)

1+ y2t /ν exp(2λt pt−1), 0 ≤ bt ≤ 1, 0 < ν < ∞,

=ε2t /ν

1+ ε2t /ν

is distributed as Beta(1/2, ν/2).The u′ts are IID.


Beta-t-EGARCH: moments

The score is bounded.The existence of unconditional moments of the observations, yt , dependsonly on the existence of moments of the conditional distribution, that isthe distribution of εt .

The moments of the scale always exist and hence the volatility processdoes not affect the existence of unconditional moments.Analytic expressions for the unconditional moments can be derived for|yt |c , c ≥ 0.Can also find expresions for autocorrelations of |yt |c .


Gamma-GED-EGARCH

General Error distribution (GED) leads to Gamma-GED-EGARCH model.We have

ln f (φ, κ, υ) = −(1+ υ−1

)ln 2− ln Γ(1+ υ−1)− λt pt−1

−12|yt exp(−λt pt−1)|υ ,

where υ is a shape parameter.The score

ut = (υ/2) |yt/ exp(λt pt−1)|υ − 1,is gamma distributed.


Location/scale models for positive variables: duration,realized volatility and range

Engle (2002) introduced a class of multiplicative error models (MEMs) formodeling non-negative variables, such as duration, realized volatility andrange.The conditional mean, µt pt−1, and hence the conditional scale, is aGARCH-type process. Thus

yt = εtµt pt−1, 0 ≤ yt < ∞, t = 1, ....,T ,

where εt has a distribution with mean one and, in the first-order model,

µt pt−1 = βµt−1|t−2 + αyt−1.


Positive variables: duration, realized volatility and range

An exponential link function, µt pt−1 = exp(λt pt−1), not only ensures thatµt pt−1 is positive, but also allows the asymptotic distribution to be derived.The model can be written

yt = εt exp(λt pt−1)

with dynamicsλt pt−1 = δ+ φλt−1pt−2 + κut−1.


Generalized gamma and beta distributions for positivevariables with changing location/scale

The statistical theory of DCS models for positive variables is simplified bythe fact that for the gamma and Weibull distributions the score and itsderivatives are dependent on a gamma variate, while for the Burr,log-logistic and F-distributions the dependence is on a beta variate.Gamma and Weibull distributions are special cases of the generalizedgamma (GG) distribution.Burr and log-logistic distributions are special cases of the generalizedbeta of the second kind (GB2) distribution.GB2 has fat tails except in a limiting case when it goes to GG.


GB2

The PDF of a GB2 is

f (x) =ν(x/α)νξ−1

αB(ξ, ς)[(x/α)ν + 1

]ξ+ς, α, ν, ξ, ς > 0,

where α is the scale parameter, ν, ξ and ς are shape parameters andB(ξ, ς) is the beta function; see Kleiber and Kotz (2003, ch6).The GB2 distribution contains many important distributions as specialcases, including the Burr (ξ = 1) and log-logistic (ξ = 1, ς = 1).GB2 distributions are fat tailed for finite ξ and ς with upper and lower tailindices of η = ςν and η = ξν respectively.The absolute value of a tf variate is GB2(ϕ, 2, 1/2, f /2) with tail index isη = η = f .Score is bounded.


Stochastic location and stochastic scale

The Student t model for time-varying location may be combined withBeta-t-EGARCH. In other words yt | Yt−1 has a tν distribution with meanµt |t−1 and scale exp(λt pt−1), that is

yt = µt |t−1 + exp(λt pt−1)εt

where λt |t−1 depends on

ut =(ν+ 1)(yt − µt |t−1)

2

ν exp(2λt pt−1) + (yt − µt |t−1)2 − 1

Estimation by ML is straightforward.


Stochastic location and stochastic scale: inflation

Seasonally adjusted rate of inflation (CPI) in the United States from1947(1) to 2007(2)The rate of inflation is often taken to follow a random walk plus noise.Thus for the DCS-t model

µt+1|t = µt |t−1 + κ†ut ,

where the dagger serves to differentiate κ† from the κ parameter in thedynamic scale equation. Fitting a Gaussian model using the STAMPpackage gave an estimate of 0.579 for κ†.The plot of the filtered level, µt+1|t , shows it to be sensitive to extremevalues and the ACF of the absolute values of the residuals provides strongevidence of serial correlation in variance.


1950 1955 1960 1965 1970 1975 1980 1985 1990 1995 2000 2005

0.025

0.000

0.025

0.050

0.075

0.100

0.125

0.150Inflation (Fil)Level

Figure: Filtered estimates of level from a Gaussian random walk plus noise fittedto US inflation.


Estimating a model in which filtered location is a random walk and scaleevolves as a first-order Beta-t-EGARCH process gives the following MLestimates (with standard errors in parentheses): for location,κ† = 0.699(0.097), and for scale, δ = −0.370(0.214), φ = 0.912(0.051)and κ = 0. 118(0.041), with ν = 11.71(4.58).The filtered estimates respond less to extreme values than those from thehomoscedastic Gaussian model.


1950 1955 1960 1965 1970 1975 1980 1985 1990 1995 2000 2005

0.025

0.000

0.025

0.050

0.075

0.100

0.125

0.150Level Inflation

Figure: Estimated level of US inflation from a random walk plus noise model withthe noise modeled as Beta-t-GARCH.


A unified theory for volatility: EGARCH and location-scale.

1. Generalized-t (McDonald and Newey, Econometrica, 1987) obtainedfrom GB2. (Absolute value of gen-t is GB2.)

Student-t and GED are special cases.2. Information matrix - includes special cases of GED where tails are light.Hence can test against fat tails.

3. Similar treatment for GB2 - GG has light tails.4. ARCH-M5. Gen-t extends to handle skewness and/or asymmetry.Harvey and Lange (2015). Volatility Modeling with a Generalizedt-distribution. Cambridge Working Papers in Economics (CWPE)1517.6. Multivariate


Generalized Student-t distribution

The additional flexibility of the generalized-t enables it to capture a varietyof shapes at the peak of the distribution as well as in the tails. Thisflexibility goes a long way towards meeting the objection that parametricmodels are too restrictive and hence vulnerable to misspecification. Indeedone of the concerns of McDonald and Newey was to meet these objectionsand they argued that the flexibility of the generalized-t model made it‘partially adaptive’. McDonald and Newey highlighted the robustness ofthe generalized-t assumption for models of location ( or more generallystatic regression).



As with the t distribution, the score or influence function of thegeneralized-t is redescending (except in the limiting GED case) and so isresistant to outliers. Outliers are particulary prevalent in financial timeseries yet estimation by QML, that is based on a criterion functionappropriate for a Gaussian distribution, is widespread in financialeconometrics.The unsuitability of QML is further enhanced when modeling volatilitybecause a dynamic equation for variance that is driven by squaredobservations is itself open to distortion by outliers.



Our preferred parameterization sets one of the shape parameters equal tothe tail index and so

f (y) =1ϕK (η, υ)

(1+

1η

∣∣∣∣y − µ

ϕ

∣∣∣∣υ)−(η+1)/υ

, −∞ < y < ∞,

where ϕ is a scale parameter, υ and η are positive shape parameters and

K (η, υ) =υ

2η1/υ

1B(η/υ, 1/υ)

with B(., .) denoting the beta function.The range of η is 0 < η ≤ ∞, so if we define η = 1/η, then 0 ≤ η ≤ 1 for1 ≤ η ≤ ∞. The range of υ is 0 < υ < ∞ but in practice υ < 1 is unlikely.The distribution has fat tails for finite η and so moments only exist up to,but not including, η.



Setting υ = 2 gives tη, a t-distribution with with η degrees of freedom,whereas GED(υ) is obtained when η → ∞. The Laplace or doubleexponential distribution is GED(1), whereas GED(2) is the normal. Theform of the GED is

f (y) =υ1−1/υ

2ϕΓ (1/υ)exp

[−1

υ

∣∣∣∣y − µ

ϕ

∣∣∣∣υ] .The standard deviation is υ1/υ(Γ(3/υ)/Γ(1/υ))1/2ϕ.


6 4 2 0 2 4 60

0.1

0.2

0.3

0.4

0.5



The CDF is a function of the CDF of a beta distribution (an incompletebeta function) for xt = 1− bt , which is beta(η/υ, 1/υ).The corresponding quantile function is readily available. This is relevantfor Value at Risk (VaR) and Expected Shortfall, as discussed in Zhu andGalbraith (2011) and Zhu (2012).


Testing against fat tails

The GED has light tails, whereas when η is finite the tails are fat. Thuswithin the generalized-t framework a test of the null hypothesis η = ∞, orequivalently η = 0, against the alternative of η < ∞, that is η > 0, is atest against fat tails..A LR test of η = 0 is straightforward but because the alternative, η > 0,is one-sided, the asymptotic distribution of the LR test statistic is amixture of a χ21 and a degenerate distribution with its mass at the origin.Hence the 5% critical value is the usual 10% one, that is 2.71.


Dynamic Scale

The Beta-Gen-t-EGARCH model has ϕ = exp(λt pt−1) and the dynamicequation

λt+1pt = ω(1− φ) + φλt pt−1 + κut .

is driven by the score

∂ ln ft∂λt pt−1

= ut = (η + 1) bt − 1,

where

bt =(|yt − µ| e−λt pt−1)υ/η

(|yt − µ| e−λt pt−1)υ/η + 1

is again distributed as beta(1/υ, η/υ) at the true parameter values.Setting υ = 2 gives Beta-t-EGARCH.


2 1 1 2 3 4 5 6 7 8 9 10

2

1

1

2

3

4

5

6

eps

u

Figure: Score functions for η = 5 and υ = 2 (solid) and υ = 1 (solid thin).Dashed line shows υ = 1 times ratio of its SD to that of the υ = 2 variable


Dynamic Scale: Asymptotics

The score and its derivative,

∂ut∂λt pt−1

= −υ(η + 1)bt (1− bt )

depend on the beta(1/υ, η/υ) variable, bt .A proof of consistency and asymptotic normality follows from Harvey(2013, p 37) because the above functions of the score and its derivativedepend only on bounded variables of the form bht (1− bt )k , h, k = 1, 2, ...


Asymmetric impact curve (leverage)

Returns may have a different effect on volatility depending on whetherthey are positive or negative:

λt+1|t = ω (1− φ) + φ λt |t−1 + κ ut + κ∗u∗t

where u∗t = sgn (µ− yt )(ut + 1) and κ∗ is a parameter.The effect of the extra term is to add or subtract, depending onsgn(yt − µ), a fraction of the impact curve plus one.


5 4 3 2 1 1 2 3 4 5

1

1

2

3

4

5

6

y

u

Figure: Impact of u for t9. Purple is κ = κ∗


5 4 3 2 1 1 2 3 4 5

3

2

1

1

2

3

4

5

6

y

u

Figure: Red has κ = 0.


Return N(s=2.28)

20.0 17.5 15.0 12.5 10.0 7.5 5.0 2.5 0.0 2.5 5.0 7.5 10.0 12.5 15.0

0.05

0.10

0.15

0.20

0.25

Density

Return N(s=2.28)


Silver

υ =1.34 η =0.075υ = 2 η = 0.220 η goes from 13.31 to 4.54.The SE of η is reduced from0.043 to 0.023.All three models fit well according to the Q(20) statistics. HoweverBeta-t-EGARCH is worst on the AIC and BIC. The hypothesis that υ = 2is rejected by Wald and LR tests.The LR statistic of the null hypothesis that η = 0 is 3.60 which is less than3.84, the 5% significance value for a χ21 distribution. However, the correct5% significance value is only 2.70 because of the one-sided alternative.Thus there may be a small gain from using Gen-t rather than GED.


Extensions: Two components

Instead of capturing long memory by a fractionally integrated process, twocomponents may be used. Thus

λt |t−1 = ω+ λ1,t |t−1 + λ2,t |t−1,

λi ,t+1|t = φi λi ,t |t−1 + κi ut , i = 1, 2,

where φ1 > φ2 if λ1,t pt−1 denotes the long-run component.


1971 1981 1991 2001 2011

1

2

3

4

5

6

7

8

Figure: Long and short run volatilities for NASDAQ


Leverage and Two Components

λt |t−1 = ω+ λ1,t |t−1 + λ2,t |t−1,

λi ,t+1|t = φi λi ,t |t−1 + κi ut + κ∗i sgn(−εt ) (ut + 1), i = 1, 2,

It is often found that the leverage effect is confined to the short-termcomponent. In this case, the evolution of the long-run component will beless susceptible to the influence of strongly negative returns and so may bemore suitable for capturing the ARCH-M effect.


Extensions: ARCH in mean

A DCS EGARCH-M model may be set up as

yt = µ+ α exp(λt pt−1) + εt exp(λt pt−1), t = 1, ...,T ,

where α is a parameter. It it straightforward to extend the theory for theconditional t-distribution, used in Harvey and Lange (2015), togeneralized-t.


The ARCH-M effect could be simply captured by total volatility. However,because volatility has two components, it is possible to investigate theimpact of both long-run and short-run movements. Thus

yt = µ+ α1 exp(ω+λ1,t pt−1)+ α2[exp(λ2,t pt−1)− 1]+ εt exp(λt pt−1), t = 1, ...,T ,

When volatility is at its equilibrium level, the risk premium isµ+ α1 exp(ω).


Asymmetry and skewness

Skewness can be introduced into the generalized Student-t distribution bymeans of the Fernandez and Steel (1998) method, as used by Harvey andSucarrat (2014) for the Student-t distribution.An alternative formulation of the skew-t of Fernandez and Steel isproposed by Zhu and Galbraith (2010). They make a further extension toasymmetry by allowing different degrees of freedom above and below µ.Zhu and Zinde-Walsh (2009) set up an asymmetric skew GED in a similarway.Extending the generalized-t distribution to handle skewness andasymmetry is, in principle, straightforward: we now have υ1 and υ2 as wellas η1 and η2.


Asymmetry and skewness: silver

κ∗ κ φ ω µ η1 η2 υ LogLEstimate 0.004 0.040 0.990 0.499 0.098 0.136 0.000 1.349 -4628.4

SE 0.004 0.007 0.002 0.104 0.034 0.052 0.048 0.144

Left tail index η1 = 7.35. Light right tail η2 = ∞.Asymmetric score ( not leverage)


κ∗ κ φ ω µ α η1 η2 υ LogL0.045 0.026 0.988 -0.535 0.040 0.041 1.497 -3361.50.005 0.005 0.001 0.102 0.014 0.024 0.114

0.049 0.036 0.986 -0.376 0.023 0.121 0.004 1.617 -3347.50.005 0.005 0.001 0.100 0.013 0.039 0.044 0.149

0.050 0.030 0.986 -0.233 0.015 0.573 0.031 1.478 -3342.00.005 0.005 0.001 0.115 0.014 0.011 0.029 0.122

0.047 0.032 0.987 -0.323 0.021 0.542 0.068 0.004 1.561 -3341.50.004 0.005 0.001 0.116 0.014 0.023 0.032 0.056 0.153

Table 3 SP500 daily excess returns from 2 Jan 2004 to 31 Dec 2013(T = 2, 517).


6 4 2 0 2 4 60

0 .1

0 .2

0 .3

0 .4

0 .5

Figure: Standardized residuals from a skewed asymmetric generalized t EGARCHmodel fitted to daily SP500 returns


Multivariate Volatility

A multivariate normal distribution for an N × 1 vector y is parameterizedin terms of an N × 1 mean vector, µ, and an N ×N covariance matrix.The most common multivariate t-distribution has PDF

f (yt ; µ,Ω, ν) =Γ(ν+N)

2(πν)N/2Γ(ν/2) |Ω|1/2 w (ν+N )/2t

,

where wt = 1+ (1/ν)(yt − µ)′Ω−1(yt − µ) and Ω is the scale matrix. Itmay be decomposed as Ω= DRD where R is a correlation matrix and D isa diagonal matrix with the i − th diagonal element equal to a scaleparameter, ϕi .In the dynamic case

Ωt pt−1 = Dt pt−1Rt pt−1Dt pt−1

where the i − th diagonal element is ϕi = exp(λi ,t pt−1). See Creal et al(2011) and Harvey (2013, ch 7)



Consider a bivariate with zero means

Rt pt−1 =

[1 ρt |t−1

ρt |t−1 1

]

where ρt |t−1 denotes the (changing) correlation, based on information attime t − 1.How should we drive the dynamics of the filter for changing correlation,ρt |t−1, and with what link function ?A simple moment approach would use

y1texp(λ1)

y2texp(λ2)

= x1tx2t ,

to drive the covariance, but the effect of x1 = x2 = 1 is the same asx1 = 0.5 and x2 = 2.


1.0 0.8 0.6 0.4 0.2 0.2 0.4 0.6 0.8 1.0

60

40

20

20

40

60

r

u

Plot of standardized score, u, against correlation, r , for x1 = x2 (dash) andx1 = 4, x2 = 1.


LHong_Kong LKorea

1984 1986 1988 1990 1992 1994 1996 1998 2000 2002 2004 2006 2008 2010 2012

5

6

7

8

9

10LHong_Kong LKorea

Figure: Logarithms of Hong Kong and Korea stock markets indices from2/1/1984 to 27/11/2007.


1984 1988 1992 1996 2000 20040.2

0.1

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

Figure: Time Varying Correlation for Hong Kong and Korean Stock MarketIndices - 2/1/1984 to 27/11/2007.


LM test against time-varying correlation

Harvey and Thiele (2015) show the effectiveness of the score-based testagainst time-varying correlation.Much better than standard moment-based test.


Time-varying parameters in regression

Time-varying parameters are usually modeled in regression by letting thecoeffi cient of an explanatory variable change:

yt = βtxt + εt , t = 1, ...,T ,

where βt follows a stochastic process, such as an AR(1) or random walk.If the explanatory variable is stochastic then it needs to be independent ofβt and εt in all time periods. Not only is this assumption restrictive, but ifxt is a linear process, then yt will, in general, be nonlinear. Furthermorethere are concerns about the path followed by the dependent variablewhen the explanatory variable is integrated; see the discussion in Harvey(1989, p. 409-10).



A bivariate model provides a better starting point. Suppose that y1t andy2t are jointly normal with zero means and constant variances, but acorrelation that changes over time. The regression equation then becomes

y1t | y2t = ρt (σ1/σ2)y2t + εt , t = 1, ...,T ,

where εt is distributed independently of y2t with variance σ21 − ρ2t σ22.

Similarly for the DCS model

y1t | y2t = ρt |t−1(σ1/σ2)y2t + εt , t = 1, ...,T ,

The time-varying parameter is βt |t−1 = ρt |t−1(σ1/σ2) and if ρt |t−1 isconstrained to lie in the interval (−1, 1), perhaps by letting it evolve as in(??), the range of βt |t−1 is −σ1/σ2 to σ1/σ2.



When the means are non-zero,

y1t | y2t = (µy − ρt |t−1µx ) + ρt |t−1(σ1/σ2)y2t + εt ,

so the movements in the constant term also depend on those of ρt |t−1.There are similar implications when y1t and y2t depend on exogenousexplanatory variables.By formulating the model in terms of changing correlation, the marginaldistributions are unaffected. If both y1t and y2t are linear univariateprocesses, they remain so. Furthermore, the bivariate process for y1t andy2t is strictly stationary if γt |t−1, and hence ρt |t−1, is stationary.


Time-varying parameters in autoregression

Autoregressive models with time-varying parameters have also beenemployed in econometrics. A recent example can be found in Cogley et al(2010). The prototypical case is

yt = µ+ φt (yt−1 − µ) + εt , t = 1, ...,T ,

φt = ξφt−1 + ηt , |ξ| < 1,where εt and ηt have independent normal distributions and µ and ξ areparameters. Although the model is nonlinear, it is conditionally Gaussianand the likelihood function can be obtained from the Kalman filter.However constraints need to be imposed on φt to prevent the modelbecoming explosive and there is the question of what kind of process ytfollows. These diffi culties can be avoided by setting y1t = yt andy2t = yt−1 and assuming that they have a bivariate Gaussian distributionwith covariance matrix

Cov(y1t , y2t ) = exp(λ)[1 ρtρt 1

].



Then, since σ1 = σ2 = exp(λ),

yt = µ+ φt (yt−1 − µ) + εt , t = 1, ...,T ,

with φt = ρt .In the DCS model

yt = µ+ ρt |t−1(yt−1 − µ) + εt , t = 1, ...,T ,

and if ρt |t−1 evolves through γt |t−1 the constraints on the autoregressivecoeffi cient are automatically satisfied because of the arctantransformation. The variance of yt is still given by exp(2λ) and theprocess remains strictly stationary.


Dynamic copulas: estimating changing association

The conditional score for the Clayton copula with parameter θt |t−1 is

uθt = − ln(τ1tτ2t ) + (1+ θt |t−1)−1 + θ−2 ln(τ

−θt |t−11t + τ

−θt |t−12t − 1)

+

(1+ 2θt |t−1

θt |t−1

)(τ−θt |t−11t ln τ1t + τ

−θt |t−12t ln τ2t )

τ−θt |t−11t + τ

−θt |t−12t − 1

,

where τit = F (yit ), i = 1, 2.The first term involves the product τ1tτ2t , and so is a little like theproduct x1tx2t . In the Gaussian model the score modifies the impact ofx1tx2t by taking account of how the product was formed and the currentparameter value. The same is true here.


Dynamic copulas: estimating changing association

Figure shows the response of the score when τ2 varies, but τ1 is fixed.Two points are worth noting.1) As expected, the response is asymmetric in the sense that the behaviourwhen τ1 fixed at 0.9 is not a mirror image of the behaviour for τ1 fixed at0.1.2) When τ1 = 0.1, the score is only positive for vaues of τ2 close to 0.1,the effect being more pronounced when θ = 5, as opposed to θ = 1. Thisbehaviour is entirely consistent with the conditional density shown earlier :if τ2 is not close to 0.1, it suggests that θt |t−1 is too big and the role ofthe negative score in the dynamic equation is to make θt+1|t smaller.


0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0

3

2

1

0

1

tau

u

Figure: Response of u for fixed τ1


Conclusion

An EGARCH model driven by the conditional score can be set up with ageneralized t distribution which can be extended to accomodate skewnessand/or asymmetry. Properties such as unconditional moments may bederived and the asymptotic distribution of the maximum likelihoodestimators worked out.For a finite tail index, the impact response curve (influence function) isbounded, so mitigating the effect of outliers. It was shown that theinformation matrix remains positive definite as the tail index goes toinfinity - and the generalized t goes to a GED - by working with theinverse tail index. Empirical evidence shows the practical value of thegeneralized t.


The full asymmetric model is very flexible but will often simplify.Parameter restrictions may be tested by likelihood ratio, Wald or Lagrangemultiplier tests. Given the generality of the full model, Lagrange multipliertests may be particularly useful in practice. The form of the score in thecases examined indicates their plausibility.The dynamics can include leverage effects and more than one component.Again the theory goes through. ARCH in mean effects may also beincorporated in the general model. For positive variables the GB2distribution may be adapted in a similar way to the generalized-t so as toinclude generalized gamma as a special case. The overall theory parallelsthat of the generalized-t thereby providing a fully integrated approach tomodeling volatility.


The multivariate generalized t distribution can be used to model dynamicvolatilities and changing correlations. The decomposition into volatilityand correlations offers considerable advantages.DCS approach also provides a way of modeling dynamic copulas


Dynamic Time Series Models and Heavy-Tailed … Time Series Models and Heavy-Tailed Distributions...

Documents

Transcript of Dynamic Time Series Models and Heavy-Tailed … Time Series Models and Heavy-Tailed Distributions...