Paris2012 session1

Forecasting in State Space: theory and practice Siem Jan Koopman http://personal.vu.nl/s.j.koopman Department of Econometrics VU University Amsterdam Tinbergen Institute 2012

description

State Space Model by Pr. Koopman


Page 1: Paris2012 session1

Forecasting in State Space: theory and practice

Siem Jan Koopman

http://personal.vu.nl/s.j.koopman

Department of Econometrics, VU University Amsterdam

Tinbergen Institute, 2012


Program

Lectures :

• Introduction to UC models

• State space methods

• Forecasting time series with different components

• Practice of Forecasting with Illustrations

Exercises and assignments will be part of the course.


Time Series

A time series is a set of observations $y_t$, each one recorded at a specific time $t$.

The observations are ordered over time. We assume that we have $n$ observations, $t = 1, \ldots, n$.

Examples of time series are:

• Number of cars sold each year

• Gross Domestic Product of a country

• Stock prices during one day

• Number of firm defaults

Our purpose is to identify and to model the serial or “dynamic” correlation structure in the time series.

Time series analysis may be relevant for economic policy, financial decision making and forecasting.


Example: Nile data

[Figure: Nile data, 1870 to 1970; vertical axis from 500 to 1400.]

Example: GDP growth, quarter by quarter

[Figure: quarterly GDP growth, 2000 to 2012; vertical axis from −4 to 5.]

Example: winner boat races Cambridge/Oxford

[Figure: winner of the Cambridge/Oxford boat races, 1840 to 2000; vertical axis from 0.1 to 1.0.]

Example: US monthly unemployment


Sources for time series data

Data sources:

• US economics: http://research.stlouisfed.org/fred2/

• DK book data: http://www.ssfpack.com/dkbook.html

• Financial data: Datastream, Yahoo Finance


White noise processes

The simplest example of a stationary process is a white noise (WN) process, which we usually denote as $\varepsilon_t$.

A white noise process is a sequence of uncorrelated random variables, each with zero mean and constant variance $\sigma^2_\varepsilon$:

$$\varepsilon_t \sim WN(0, \sigma^2_\varepsilon).$$

The autocovariance function is equal to zero for lags $h > 0$:

$$\gamma_Y(h) = \begin{cases} \sigma^2_\varepsilon & \text{if } h = 0, \\ 0 & \text{if } h \neq 0. \end{cases}$$
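The autocovariance function above can be checked on simulated data. The sketch below is only an illustration: it assumes Gaussian white noise (the WN definition itself only requires uncorrelatedness), and the helper name `sample_autocovariance`, the seed and the value of `sigma_eps` are choices made here, not taken from the slides.

```python
import numpy as np

def sample_autocovariance(x, h):
    """Sample autocovariance of x at lag h >= 0 (divisor n)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    d = x - x.mean()
    return float(np.sum(d[h:] * d[:n - h]) / n)

# Assumption: Gaussian draws stand in for a generic white noise process.
rng = np.random.default_rng(0)
sigma_eps = 2.0
eps = rng.normal(0.0, sigma_eps, size=100_000)

# Lag 0 should be close to sigma_eps**2, the other lags close to zero.
gamma = [sample_autocovariance(eps, h) for h in range(4)]
print(gamma)
```

The sample values differ from the theoretical ones only by sampling error, which shrinks as the number of observations grows.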


White noise realisations

[Figure: white noise realisation, 500 observations.]

White noise ACF and SACF

[Figure: four panels. Theoretical ACF of white noise with maximum lag 50 and maximum lag 500; sample ACF for n = 50 and n = 500.]

Random Walk processes

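If $\varepsilon_1, \varepsilon_2, \ldots$ come from a white noise process with variance $\sigma^2$, then the process $\{Y_t\}$ with

$$Y_t = \varepsilon_1 + \varepsilon_2 + \ldots + \varepsilon_t \quad \text{for } t = 1, 2, \ldots$$

is called a random walk.

A recursive way to define a random walk is:

$$Y_t = Y_{t-1} + \varepsilon_t \quad \text{for } t = 2, 3, \ldots, \qquad Y_1 = \varepsilon_1.$$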
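A minimal sketch showing that the two definitions give the same path. Gaussian disturbances with unit variance, the seed and the variable names are assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
eps = rng.normal(0.0, 1.0, size=500)  # assumed Gaussian white noise

# Definition 1: Y_t is the partial sum eps_1 + ... + eps_t.
y_sum = np.cumsum(eps)

# Definition 2: Y_1 = eps_1 and Y_t = Y_{t-1} + eps_t for t >= 2.
y_rec = np.empty_like(eps)
y_rec[0] = eps[0]
for t in range(1, eps.size):
    y_rec[t] = y_rec[t - 1] + eps[t]

same = bool(np.allclose(y_sum, y_rec))
print(same)
```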


Random Walk properties I

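A random walk is not stationary, because the variance of $Y_t$ is time-varying:

$$E(Y_t) = E(\varepsilon_1 + \ldots + \varepsilon_t) = 0,$$

$$\mathrm{Var}(Y_t) = E(Y_t^2) = E[(\varepsilon_1 + \ldots + \varepsilon_t)^2] = t\sigma^2.$$

The autocovariance function is equal to:

$$\gamma(t, t-h) = E(Y_t Y_{t-h}) = E\Big[\Big(\sum_{j=1}^{t-h} \varepsilon_j + \sum_{j=t-h+1}^{t} \varepsilon_j\Big)\Big(\sum_{j=1}^{t-h} \varepsilon_j\Big)\Big] = (t-h)\sigma^2.$$

This means that the variance and the autocovariances go to infinity as $t \to \infty$.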
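The two moments can be approximated by averaging over many simulated paths. This is a rough Monte Carlo sketch under assumed settings (Gaussian disturbances, `sigma = 1.5`, `t = 40`, `h = 10`, 40,000 replications); none of these values come from the slides.

```python
import numpy as np

rng = np.random.default_rng(2)
sigma, n, reps = 1.5, 50, 40_000

# Each row is one random walk path Y_1, ..., Y_n.
paths = np.cumsum(rng.normal(0.0, sigma, size=(reps, n)), axis=1)

t, h = 40, 10
# Column t-1 holds Y_t because arrays are 0-indexed; E(Y_t) = 0, so
# second moments can be averaged without centring.
var_hat = float(np.mean(paths[:, t - 1] ** 2))
cov_hat = float(np.mean(paths[:, t - 1] * paths[:, t - h - 1]))

print(var_hat, t * sigma ** 2)        # estimate vs t * sigma^2
print(cov_hat, (t - h) * sigma ** 2)  # estimate vs (t - h) * sigma^2
```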


Random Walk properties II

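The autocorrelation of $Y_t$ and $Y_{t-h}$ is

$$\rho(t, t-h) = \frac{\gamma(t, t-h)}{\sqrt{\mathrm{Var}(Y_t)\,\mathrm{Var}(Y_{t-h})}} = \frac{(t-h)\sigma^2}{\sqrt{(t\sigma^2)((t-h)\sigma^2)}} = \frac{\sqrt{t-h}}{\sqrt{t}}.$$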
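A small numerical illustration of this formula. The function name `rw_autocorrelation` and the chosen values of $t$ and $h$ are hypothetical; the point is only that, for a fixed lag, the autocorrelation moves towards one as $t$ grows.

```python
import math

def rw_autocorrelation(t, h):
    """Autocorrelation of Y_t and Y_{t-h} for a random walk, 0 <= h < t."""
    if not 0 <= h < t:
        raise ValueError("need 0 <= h < t")
    return math.sqrt(t - h) / math.sqrt(t)

# Fixed lag h = 10, increasing t.
rhos = [rw_autocorrelation(t, 10) for t in (20, 100, 1000)]
print(rhos)
```

Note that $\sigma^2$ cancels, so the autocorrelation does not depend on the disturbance variance.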


RW realisation

[Figure: random walk realisation, 500 observations.]

RW sample ACF

[Figure: sample ACF of the random walk, 50 lags.]

Seatbelt Law

[Figure: Seatbelt Law series, years 70 to 85; vertical axis from 7.0 to 7.9.]

Classical Decomposition

A basic model for representing a time series is the additive model

$$y_t = \mu_t + \gamma_t + \varepsilon_t, \qquad t = 1, \ldots, n,$$

also known as the Classical Decomposition.

$y_t$ = observation,

$\mu_t$ = slowly changing component (trend),

$\gamma_t$ = periodic component (seasonal),

$\varepsilon_t$ = irregular component (disturbance).

In a Structural Time Series Model (STSM) or Unobserved Components Model (UCM), the RHS components are modelled explicitly as stochastic processes.


Nile data

[Figure: Nile data, 1870 to 1970; vertical axis from 500 to 1400.]

Local Level Model

• Components can be deterministic functions of time (e.g. polynomials), or stochastic processes;

• Deterministic example: $y_t = \mu + \varepsilon_t$ with $\varepsilon_t \sim NID(0, \sigma^2_\varepsilon)$.

• Stochastic example: the Random Walk plus Noise, or Local Level model:

$$y_t = \mu_t + \varepsilon_t, \qquad \varepsilon_t \sim NID(0, \sigma^2_\varepsilon),$$

$$\mu_{t+1} = \mu_t + \eta_t, \qquad \eta_t \sim NID(0, \sigma^2_\eta),$$

• The disturbances $\varepsilon_t$, $\eta_s$ are independent for all $s, t$;

• The model is incomplete without a specification for $\mu_1$ (note the non-stationarity):

$$\mu_1 \sim N(a, P)$$
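The Local Level model is easy to simulate directly from its two equations. The sketch below is illustrative only: the function name `simulate_local_level`, the default values `a = 0`, `P = 1` and the seed are assumptions, not part of the course material.

```python
import numpy as np

def simulate_local_level(n, sigma2_eps, sigma2_eta, a=0.0, P=1.0, seed=0):
    """Draw (y, mu) of length n from the local level model."""
    rng = np.random.default_rng(seed)
    mu = np.empty(n)
    mu[0] = rng.normal(a, np.sqrt(P))                 # mu_1 ~ N(a, P)
    eta = rng.normal(0.0, np.sqrt(sigma2_eta), size=n - 1)
    mu[1:] = mu[0] + np.cumsum(eta)                   # mu_{t+1} = mu_t + eta_t
    eps = rng.normal(0.0, np.sqrt(sigma2_eps), size=n)
    return mu + eps, mu                               # y_t = mu_t + eps_t

y, mu = simulate_local_level(100, sigma2_eps=0.1, sigma2_eta=1.0)
```

Setting `sigma2_eta=0.0` gives a constant level, and `sigma2_eps=0.0` gives a pure random walk for `y`.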


Local Level Model

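$$y_t = \mu_t + \varepsilon_t, \qquad \varepsilon_t \sim NID(0, \sigma^2_\varepsilon),$$

$$\mu_{t+1} = \mu_t + \eta_t, \qquad \eta_t \sim NID(0, \sigma^2_\eta),$$

$$\mu_1 \sim N(a, P)$$

• The level $\mu_t$ and the irregular $\varepsilon_t$ are unobserved;

• Parameters: $a$, $P$, $\sigma^2_\varepsilon$, $\sigma^2_\eta$;

• Trivial special cases:

  • $\sigma^2_\eta = 0 \implies y_t \sim NID(\mu_1, \sigma^2_\varepsilon)$ (WN with constant level);

  • $\sigma^2_\varepsilon = 0 \implies y_{t+1} = y_t + \eta_t$ (pure RW);

• Local Level is a model representation for EWMA forecasting.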
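To make the EWMA remark concrete, here is a hedged sketch of the EWMA forecast recursion. The function names, the initialisation at the first observation and the mapping from the ratio $q = \sigma^2_\eta / \sigma^2_\varepsilon$ to the EWMA weight are assumptions of this illustration: the mapping is the standard steady-state result for the local level model and is not derived on these slides.

```python
import numpy as np

def ewma_forecasts(y, lam, a1=None):
    """One-step-ahead EWMA forecasts: a[t+1] = a[t] + lam * (y[t] - a[t])."""
    y = np.asarray(y, dtype=float)
    a = np.empty(y.size + 1)
    a[0] = y[0] if a1 is None else a1  # assumed start value
    for t in range(y.size):
        a[t + 1] = a[t] + lam * (y[t] - a[t])
    return a  # a[t] forecasts y[t]; a[-1] forecasts the next observation

def steady_state_weight(q):
    """Assumed steady-state EWMA weight implied by q = sigma2_eta / sigma2_eps."""
    return (np.sqrt(q * q + 4.0 * q) - q) / 2.0

lam = steady_state_weight(1.0)
forecasts = ewma_forecasts([1.0, 2.0, 3.0], lam)
```

A large $q$ pushes the weight towards one (the forecast follows the last observation), while a small $q$ pushes it towards zero (the forecast barely moves).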


Simulated LL Data

[Figure: simulated local level data, 100 observations, $\sigma^2_\varepsilon = 0.1$, $\sigma^2_\eta = 1$; series $y$ and $\mu$.]

Simulated LL Data

[Figure: simulated local level data, 100 observations, $\sigma^2_\varepsilon = 1$, $\sigma^2_\eta = 1$.]

Simulated LL Data

[Figure: simulated local level data, 100 observations, $\sigma^2_\varepsilon = 1$, $\sigma^2_\eta = 0.1$.]

Simulated LL Data

[Figure: three panels of simulated local level data, 100 observations each, series $y$ and $\mu$: $\sigma^2_\varepsilon = 0.1$, $\sigma^2_\eta = 1$; $\sigma^2_\varepsilon = 1$, $\sigma^2_\eta = 1$; $\sigma^2_\varepsilon = 1$, $\sigma^2_\eta = 0.1$.]

Properties of the LL model

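$$y_t = \mu_t + \varepsilon_t, \qquad \varepsilon_t \sim NID(0, \sigma^2_\varepsilon),$$

$$\mu_{t+1} = \mu_t + \eta_t, \qquad \eta_t \sim NID(0, \sigma^2_\eta),$$

• First difference is stationary:

$$\Delta y_t = \Delta\mu_t + \Delta\varepsilon_t = \eta_{t-1} + \varepsilon_t - \varepsilon_{t-1}.$$

• Dynamic properties of $\Delta y_t$:

$$E(\Delta y_t) = 0,$$

$$\gamma_0 = E(\Delta y_t \Delta y_t) = \sigma^2_\eta + 2\sigma^2_\varepsilon,$$

$$\gamma_1 = E(\Delta y_t \Delta y_{t-1}) = -\sigma^2_\varepsilon,$$

$$\gamma_\tau = E(\Delta y_t \Delta y_{t-\tau}) = 0 \quad \text{for } \tau \geq 2.$$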
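These autocovariances can be compared with sample values from a long simulated series. The settings below (`sigma2_eps = 1`, `sigma2_eta = 0.5`, 200,000 observations, the seed, the helper `autocov`) are assumptions made for the illustration; the initial level is irrelevant here because it drops out of the first difference.

```python
import numpy as np

rng = np.random.default_rng(3)
n, sigma2_eps, sigma2_eta = 200_000, 1.0, 0.5

# Local level data: random walk level plus irregular.
mu = np.cumsum(rng.normal(0.0, np.sqrt(sigma2_eta), size=n))
y = mu + rng.normal(0.0, np.sqrt(sigma2_eps), size=n)
dy = np.diff(y)  # first differences

def autocov(x, lag):
    """Sample autocovariance at the given lag (mean of lagged products)."""
    d = x - x.mean()
    return float(np.mean(d[lag:] * d[:d.size - lag]))

gamma_hat = [autocov(dy, k) for k in range(3)]
theory = [sigma2_eta + 2 * sigma2_eps, -sigma2_eps, 0.0]
print(gamma_hat, theory)
```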


Properties of the LL model

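• The ACF of $\Delta y_t$ is

$$\rho_1 = \frac{-\sigma^2_\varepsilon}{\sigma^2_\eta + 2\sigma^2_\varepsilon} = -\frac{1}{q + 2}, \qquad q = \sigma^2_\eta / \sigma^2_\varepsilon,$$

$$\rho_\tau = 0, \qquad \tau \geq 2.$$

• $q$ is called the signal-noise ratio;

• The model for $\Delta y_t$ is MA(1) with restricted parameters such that

$$-1/2 \leq \rho_1 \leq 0,$$

i.e., $y_t$ is ARIMA(0,1,1);

• Write $\Delta y_t = \xi_t + \theta \xi_{t-1}$, $\xi_t \sim NID(0, \sigma^2)$, to solve for $\theta$:

$$\theta = \frac{1}{2}\left(\sqrt{q^2 + 4q} - 2 - q\right).$$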
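The expression for $\theta$ can be verified numerically: an MA(1) process has first-order autocorrelation $\theta / (1 + \theta^2)$, which should reproduce $-1/(q+2)$. The function names and the grid of $q$ values in this sketch are hypothetical.

```python
import math

def theta_from_q(q):
    """MA(1) coefficient of the reduced form for signal-noise ratio q >= 0."""
    if q < 0:
        raise ValueError("q must be non-negative")
    return 0.5 * (math.sqrt(q * q + 4.0 * q) - 2.0 - q)

def ma1_rho1(theta):
    """First-order autocorrelation of an MA(1) process with coefficient theta."""
    return theta / (1.0 + theta * theta)

# Rows: (q, theta, rho_1 implied by theta, rho_1 from the LL model).
checks = [(q, theta_from_q(q), ma1_rho1(theta_from_q(q)), -1.0 / (q + 2.0))
          for q in (0.0, 0.1, 1.0, 10.0)]
for row in checks:
    print(row)
```

The resulting $\theta$ always lies between $-1$ and $0$, which matches the restriction $-1/2 \leq \rho_1 \leq 0$.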


Local Level Model

• The model parameters are estimated by Maximum Likelihood;

• Advantages of model based approach: assumptions can be tested, parameters are estimated rather than “calibrated”;

• Estimated model can be used for signal extraction;

• The estimated level $\mu_t$ is obtained as a locally weighted average;

• The distribution of weights can be compared with Kernel functions in nonparametric regressions;

• Within the model, our methods yield MMSE forecasts.


Signal Extraction and Weights for the Nile Data

[Figure: three pairs of panels for the Nile data. In each pair: data and estimated level, 1880 to 1960, and the corresponding weights over positions −20 to 20.]

Local Linear Trend Model

The LLT model extends the LL model with a slope:

$$y_t = \mu_t + \varepsilon_t, \qquad \varepsilon_t \sim NID(0, \sigma^2_\varepsilon),$$

$$\mu_{t+1} = \beta_t + \mu_t + \eta_t, \qquad \eta_t \sim NID(0, \sigma^2_\eta),$$

$$\beta_{t+1} = \beta_t + \xi_t, \qquad \xi_t \sim NID(0, \sigma^2_\xi).$$

• All disturbances are independent at all lags and leads;

• Initial distributions for $\beta_1$, $\mu_1$ need to be specified;

• If $\sigma^2_\xi = 0$ the trend is a random walk with constant drift $\beta_1$;

(For $\beta_1 = 0$ the model reduces to a LL model.)

• If additionally $\sigma^2_\eta = 0$ the trend is a straight line with slope $\beta_1$ and intercept $\mu_1$;

• If $\sigma^2_\xi > 0$ but $\sigma^2_\eta = 0$, the trend is a smooth curve, or an Integrated Random Walk;
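The special cases listed above can be explored by simulation. In this sketch the function name `simulate_llt` and its arguments are hypothetical, and the initial level and slope are passed in as fixed numbers rather than drawn from initial distributions, which is a simplification made here.

```python
import numpy as np

def simulate_llt(n, s2_eps, s2_eta, s2_xi, mu1=0.0, beta1=0.0, seed=0):
    """Draw (y, mu, beta) of length n from the local linear trend model."""
    rng = np.random.default_rng(seed)
    mu, beta = np.empty(n), np.empty(n)
    mu[0], beta[0] = mu1, beta1          # assumed fixed initial values
    eta = rng.normal(0.0, np.sqrt(s2_eta), size=n)
    xi = rng.normal(0.0, np.sqrt(s2_xi), size=n)
    for t in range(n - 1):
        mu[t + 1] = beta[t] + mu[t] + eta[t]
        beta[t + 1] = beta[t] + xi[t]
    y = mu + rng.normal(0.0, np.sqrt(s2_eps), size=n)
    return y, mu, beta

# Both trend variances zero: the trend is a straight line.
_, mu_line, beta_line = simulate_llt(50, 1.0, 0.0, 0.0, mu1=2.0, beta1=0.5)

# Slope variance positive, level variance zero: integrated random walk.
_, mu_irw, beta_irw = simulate_llt(100, 1.0, 0.0, 0.01)
```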


Trend and Slope in LLT Model

[Figure: two panels, 100 observations. Trend $\mu$ and slope $\beta$ in the LLT model.]

Trend and Slope in Integrated Random Walk Model

[Figure: two panels, 100 observations. Trend $\mu$ and slope $\beta$ in the Integrated Random Walk model.]

Local Linear Trend Model

• Reduced form of LLT is ARIMA(0,2,2);

• LLT provides a model for Holt-Winters forecasting;

• Smooth LLT provides a model for spline-fitting;

• Smoother trends: higher order Random Walks

$$\Delta^d \mu_t = \eta_t$$


Seasonal Effects

We have seen specifications for $\mu_t$ in the basic model

$$y_t = \mu_t + \gamma_t + \varepsilon_t.$$

Now we will consider the seasonal term $\gamma_t$. Let $s$ denote the number of ‘seasons’ in the data:

• s = 12 for monthly data,

• s = 4 for quarterly data,

• s = 7 for daily data when modelling a weekly pattern.


Dummy Seasonal

The simplest way to model seasonal effects is by using dummy variables. The effect summed over the seasons should equal zero:

$$\gamma_{t+1} = -\sum_{j=1}^{s-1} \gamma_{t+1-j}.$$

To allow the pattern to change over time, we introduce a new disturbance term:

$$\gamma_{t+1} = -\sum_{j=1}^{s-1} \gamma_{t+1-j} + \omega_t, \qquad \omega_t \sim NID(0, \sigma^2_\omega).$$

The expectation of the sum of the seasonal effects is zero.
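A short simulation sketch of the dummy seasonal recursion. The function name `simulate_dummy_seasonal`, the way the first $s-1$ effects are supplied through `init`, and the example values are assumptions of this illustration.

```python
import numpy as np

def simulate_dummy_seasonal(n, s, sigma2_omega, init, seed=0):
    """Simulate gamma_1..gamma_n; init holds the first s-1 seasonal effects."""
    init = np.asarray(init, dtype=float)
    if init.size != s - 1 or n < s:
        raise ValueError("need len(init) == s - 1 and n >= s")
    rng = np.random.default_rng(seed)
    omega = rng.normal(0.0, np.sqrt(sigma2_omega), size=n)
    gamma = np.empty(n)
    gamma[:s - 1] = init
    for u in range(s - 1, n):
        # gamma_{t+1} = -(previous s-1 effects) + omega_t, with u = t + 1
        gamma[u] = -gamma[u - (s - 1):u].sum() + omega[u - 1]
    return gamma, omega

# Deterministic case: a fixed quarterly pattern that repeats every s = 4 periods.
g_det, _ = simulate_dummy_seasonal(24, 4, 0.0, [1.0, -2.0, 0.5])

# Stochastic case: any s consecutive effects sum to the latest disturbance.
g_sto, om = simulate_dummy_seasonal(40, 4, 0.3, [1.0, -2.0, 0.5])
```

In the stochastic case the sum over a full year is no longer exactly zero, but it equals a zero-mean disturbance, which is what the statement about the expectation refers to.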


Trigonometric Seasonal

Defining $\gamma_{jt}$ as the effect of season $j$ at time $t$, an alternative specification for the seasonal pattern is

$$\gamma_t = \sum_{j=1}^{[s/2]} \gamma_{jt},$$

$$\gamma_{j,t+1} = \gamma_{jt} \cos\lambda_j + \gamma^*_{jt} \sin\lambda_j + \omega_{jt},$$

$$\gamma^*_{j,t+1} = -\gamma_{jt} \sin\lambda_j + \gamma^*_{jt} \cos\lambda_j + \omega^*_{jt},$$

$$\omega_{jt}, \omega^*_{jt} \sim NID(0, \sigma^2_\omega), \qquad \lambda_j = 2\pi j / s.$$

• Without the disturbance, the trigonometric specification is identical to the deterministic dummy specification.

• The autocorrelation in the trigonometric specification lasts through more lags: changes occur in a smoother way.
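The trigonometric recursion can be sketched as follows. The function name `simulate_trig_seasonal`, the initial values `init` and `init_star` and the quarterly example are assumptions made for the illustration. With the disturbance variance set to zero, the simulated pattern repeats every $s$ periods and sums to zero over a full cycle, as a deterministic dummy seasonal does.

```python
import numpy as np

def simulate_trig_seasonal(n, s, sigma2_omega, init, init_star, seed=0):
    """Simulate gamma_t as the sum of [s/2] stochastic trigonometric terms."""
    rng = np.random.default_rng(seed)
    J = s // 2
    lam = 2.0 * np.pi * np.arange(1, J + 1) / s   # lambda_j = 2*pi*j/s
    g = np.asarray(init, dtype=float).copy()       # gamma_{j,1}
    g_star = np.asarray(init_star, dtype=float).copy()
    if g.shape != (J,) or g_star.shape != (J,):
        raise ValueError("init and init_star must each have length s // 2")
    sd = np.sqrt(sigma2_omega)
    gamma = np.empty(n)
    for t in range(n):
        gamma[t] = g.sum()
        om = rng.normal(0.0, sd, size=J)
        om_star = rng.normal(0.0, sd, size=J)
        # Update both states together from the time-t values.
        g, g_star = (g * np.cos(lam) + g_star * np.sin(lam) + om,
                     -g * np.sin(lam) + g_star * np.cos(lam) + om_star)
    return gamma

# Deterministic quarterly example (s = 4, so two trigonometric terms).
gam = simulate_trig_seasonal(24, 4, 0.0, init=[1.0, 0.5], init_star=[0.3, 0.0])
```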


Unobserved Component Models

• Different specifications for the trend and the seasonal can be freely combined.

• Other components of interest, like cycles, explanatory variables, intervention effects and outliers, are easily added.

• UC models are Multiple Source of Errors models. The reduced form is a Single Source of Errors model.

• We model non-stationarity directly.

• Components have an explicit interpretation: the model is not just a forecasting device.


Seatbelt Law

[Figure: Seatbelt Law series, years 70 to 85; vertical axis from 7.0 to 7.9.]

Seatbelt Law: decomposition

[Figure: three panels, years 70 to 85. drivers with Level+Reg; drivers−Seasonal; drivers−Irregular.]

Seatbelt Law: forecasting

[Figure: Seatbelt Law series with forecasts, years 70 to 85; vertical axis from 7.0 to 7.9.]

Textbooks

• A.C. Harvey (1989). Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge University Press

• G. Kitagawa & W. Gersch (1996). Smoothness Priors Analysis of Time Series. Springer-Verlag

• J. Harrison & M. West (1997). Bayesian Forecasting and Dynamic Models. Springer-Verlag

• J. Durbin & S.J. Koopman (2001). Time Series Analysis by State Space Methods. Oxford University Press

• J.J.F. Commandeur & S.J. Koopman (2007). An Introduction to State Space Time Series Analysis. Oxford University Press


Exercises

1. Consider LL model (see slides, see DK chapter 2).

  • Reduced form is ARIMA(0,1,1) process. Derive the relationship between signal-to-noise ratio $q$ of LL model and the $\theta$ coefficient of the ARIMA model;

  • Derive the reduced form in the case $\eta_t = \sqrt{q}\,\varepsilon_t$ and notice the difference in the general case.

  • Give the elements of the mean vector and variance matrix of $y = (y_1, \ldots, y_n)'$ when $y_t$ is generated by a LL model for $t = 1, \ldots, n$.

2. Consider LLT model (see slides, see DK section 3.2.1).

  • Show that the reduced form is an ARIMA(0,2,2) process;

  • Discuss the initial values for level and slope of LLT;

  • Relate the LLT model forecasts with the Holt-Winters method of forecasting. Comment.
