Introduction to Statistical Analysis of Time Series


Transcript of Introduction to Statistical Analysis of Time Series

Page 1: Introduction to Statistical Analysis of Time Series

Outline

• Modeling objectives in time series
• General features of ecological/environmental time series
• Components of a time series
• Frequency domain analysis: the spectrum
• Estimating and removing seasonal components
• Other cyclical components
• Putting it all together

Introduction to Statistical Analysis of Time Series
Richard A. Davis
Department of Statistics

Page 2: Introduction to Statistical Analysis of Time Series

Time Series: A collection of observations x_t, each one being recorded at time t. (Time could be discrete, t = 1, 2, 3, …, or continuous, t > 0.)

Objectives of Time Series Analysis

• Data compression
  - provide a compact description of the data.
• Explanatory
  - seasonal factors
  - relationships with other variables (temperature, humidity, pollution, etc.)
• Signal processing
  - extracting a signal in the presence of noise
• Prediction
  - use the model to predict future values of the time series

Page 3: Introduction to Statistical Analysis of Time Series

General features of ecological/environmental time series

Examples.

1. Mauna Loa (CO2, Oct '58 - Sept '90)

[Figure: CO2 against year, 1960-1990; vertical axis from 320 to 350.]

Features

• increasing trend (linear, quadratic?)
• seasonal (monthly) effect.

Page 4: Introduction to Statistical Analysis of Time Series

2. Ave-max monthly temp (vegetation = tundra, 1895-1993)

[Figure: temp against month index, 0-1200; vertical axis ticks from -5 to 20.]

Features

• seasonal (monthly) effect
• more variability in Jan than in July

Go to ITSM Demo

Page 5: Introduction to Statistical Analysis of Time Series

[Figure: temp against year index, 0-100; vertical axis ticks from -5 to 20.]

July: mean = 21.95, var = .6305
Jan: mean = -.486, var = 2.637

[Figure: temp against year index, 0-100, with fitted line; vertical axis ticks from 15 to 20.]

Sept: mean = 17.25, var = 1.466
Line: 16.83 + .00845 t

Page 6: Introduction to Statistical Analysis of Time Series

Components of a time series

Classical decomposition

X_t = m_t + s_t + Y_t

• m_t = trend component (slowly changing in time)
• s_t = seasonal component (known period d = 24 (hourly), d = 12 (monthly))
• Y_t = random noise component (might contain irregular cyclical components of unknown frequency + other stuff).

Go to ITSM Demo

Page 7: Introduction to Statistical Analysis of Time Series

Estimation of the components.

X_t = m_t + s_t + Y_t

Trend m_t

• filtering. E.g., for monthly data use

$$\hat{m}_t = (0.5\,x_{t-6} + x_{t-5} + \cdots + x_{t+5} + 0.5\,x_{t+6})/12$$

• polynomial fitting

$$\hat{m}_t = a_0 + a_1 t + \cdots + a_k t^k$$
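The two trend estimators on this slide can be sketched in Python with NumPy. This is a minimal illustration, not the ITSM implementation: the function name `centered_ma_trend` and the synthetic series are assumptions made for the example.

```python
import numpy as np

def centered_ma_trend(x, d=12):
    """Centered moving average for an even period d.

    Uses weights 0.5, 1, ..., 1, 0.5 (divided by d), as in the monthly
    filter on the slide. The first and last d/2 values are left as NaN.
    """
    x = np.asarray(x, dtype=float)
    q = d // 2
    weights = np.r_[0.5, np.ones(d - 1), 0.5] / d
    trend = np.full(x.shape, np.nan)
    # 'valid' mode returns len(x) - d values, centered at indices q .. n-q-1
    trend[q:len(x) - q] = np.convolve(x, weights, mode="valid")
    return trend

# Hypothetical monthly series: linear trend plus a period-12 cycle
t = np.arange(1, 121)
x = 0.1 * t + 3.0 * np.cos(2 * np.pi * t / 12)

m_hat = centered_ma_trend(x, d=12)

# Polynomial alternative: least-squares coefficients a_0, a_1 (degree 1 here)
a = np.polynomial.polynomial.polyfit(t, x, deg=1)
m_poly = np.polynomial.polynomial.polyval(t, a)
```

Because the weights are symmetric and span one full period, the filter passes a linear trend unchanged and removes a fixed period-12 component, which is why it is a natural first step before estimating the seasonal part.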

Page 8: Introduction to Statistical Analysis of Time Series

Estimation of the components (cont).

X_t = m_t + s_t + Y_t

Seasonal s_t

• Use seasonal (monthly) averages after detrending (standardize so that s_t sums to 0 across the year).

$$\hat{s}_t = (x_t + x_{t+12} + x_{t+24} + \cdots)/N, \quad N = \text{number of years}$$

• harmonic components fit to the time series using least squares.

$$\hat{s}_t = A\cos(2\pi t/12) + B\sin(2\pi t/12)$$

Irregular component Y_t

$$\hat{Y}_t = X_t - \hat{m}_t - \hat{s}_t$$
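Both seasonal estimators can be sketched as follows, assuming the input has already been detrended. The function names `seasonal_means` and `harmonic_fit` and the test series are hypothetical choices for this illustration.

```python
import numpy as np

def seasonal_means(detrended, d=12):
    """Average the detrended values at each position in the cycle,
    then center the d averages so they sum to zero over one period."""
    detrended = np.asarray(detrended, dtype=float)
    pos = np.arange(len(detrended)) % d
    s = np.array([np.nanmean(detrended[pos == k]) for k in range(d)])
    s -= s.mean()
    return s[pos]  # seasonal estimate aligned with the series

def harmonic_fit(detrended, d=12):
    """Least-squares fit of A cos(2 pi t / d) + B sin(2 pi t / d), t = 1..n."""
    detrended = np.asarray(detrended, dtype=float)
    t = np.arange(1, len(detrended) + 1)
    design = np.column_stack([np.cos(2 * np.pi * t / d),
                              np.sin(2 * np.pi * t / d)])
    ok = ~np.isnan(detrended)  # skip values lost at the ends by filtering
    (A, B), *_ = np.linalg.lstsq(design[ok], detrended[ok], rcond=None)
    return A, B, design @ np.array([A, B])

# Hypothetical detrended series with a pure period-12 cycle
t = np.arange(1, 121)
detrended = 5 * np.cos(2 * np.pi * t / 12) + 3 * np.sin(2 * np.pi * t / 12)

s_hat = seasonal_means(detrended)
A, B, s_harm = harmonic_fit(detrended)
y_hat = detrended - s_hat  # irregular component after removing the seasonal part
```

The seasonal-means approach makes no assumption about the shape of the cycle, while the harmonic fit uses only two parameters and so is smoother when the cycle is close to sinusoidal.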

Page 9: Introduction to Statistical Analysis of Time Series

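The spectrum and frequency domain analysis

Toy example. (n = 6)

$$\mathbf{c}_0 = \big(\cos(2\pi\,0/6), \ldots, \cos(2\pi\,0\cdot 6/6)\big)'/\sqrt{6}$$

$$\mathbf{c}_1 = \big(\cos(2\pi/6), \ldots, \cos(2\pi\,6/6)\big)'/\sqrt{3}$$

$$\mathbf{s}_1 = \big(\sin(2\pi/6), \ldots, \sin(2\pi\,6/6)\big)'/\sqrt{3}$$

$$\mathbf{c}_2 = \big(\cos(2\pi\,2/6), \ldots, \cos(2\pi\,2\cdot 6/6)\big)'/\sqrt{3}$$

$$\mathbf{s}_2 = \big(\sin(2\pi\,2/6), \ldots, \sin(2\pi\,2\cdot 6/6)\big)'/\sqrt{3}$$

$$\mathbf{c}_3 = \big(\cos(2\pi\,3/6), \ldots, \cos(2\pi\,3\cdot 6/6)\big)'/\sqrt{6}$$

[Figure: panels showing the vectors c0, c1, s1, c2, s2, c3 and the series x.]

x = (4.24, 3.26, -3.14, -3.24, 0.739, 3.04)' = 2c0 + 5(c1 + s1) - 1.5(c2 + s2) + .5c3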
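The toy example can be checked numerically. The sketch below builds the six vectors with time index t = 1, …, 6, confirms they are orthonormal, and rebuilds x from the stated coefficients; the helper names `c` and `s` are chosen here only to mirror the slide's notation.

```python
import numpy as np

n = 6
t = np.arange(1, n + 1)

def c(j):
    # c_0 and c_3 are scaled by sqrt(6), the others by sqrt(3)
    scale = np.sqrt(6) if j in (0, 3) else np.sqrt(3)
    return np.cos(2 * np.pi * j * t / n) / scale

def s(j):
    return np.sin(2 * np.pi * j * t / n) / np.sqrt(3)

# Columns ordered as c0, c1, s1, c2, s2, c3
basis = np.column_stack([c(0), c(1), s(1), c(2), s(2), c(3)])
coef = np.array([2.0, 5.0, 5.0, -1.5, -1.5, 0.5])

x = basis @ coef           # 2c0 + 5(c1 + s1) - 1.5(c2 + s2) + 0.5c3
gram = basis.T @ basis     # identity matrix if the basis is orthonormal
recovered = basis.T @ x    # inner products (x, c_j), (x, s_j)

print(np.round(x, 3))
print(np.round(recovered, 3))
```

Because the basis is orthonormal, each coefficient is recovered by a single inner product with the corresponding basis vector, with no need to solve a linear system.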

Page 10: Introduction to Statistical Analysis of Time Series

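Fact: Any vector of 6 numbers, x = (x_1, . . . , x_6)', can be written as a linear combination of the vectors c0, c1, c2, s1, s2, c3.

More generally, any time series x = (x_1, . . . , x_n)' of length n (assume n is odd) can be written as a linear combination of the orthonormal basis vectors c0, c1, c2, …, c[n/2], s1, s2, …, s[n/2]. That is,

$$\mathbf{x} = a_0\mathbf{c}_0 + a_1\mathbf{c}_1 + b_1\mathbf{s}_1 + \cdots + a_m\mathbf{c}_m + b_m\mathbf{s}_m, \quad m = [n/2],$$

where, with $\omega_j = 2\pi j/n$,

$$\mathbf{c}_0 = \frac{1}{n^{1/2}}\begin{pmatrix}1\\1\\\vdots\\1\end{pmatrix}, \quad
\mathbf{c}_j = \left(\frac{2}{n}\right)^{1/2}\begin{pmatrix}\cos(\omega_j)\\\cos(2\omega_j)\\\vdots\\\cos(n\omega_j)\end{pmatrix}, \quad
\mathbf{s}_j = \left(\frac{2}{n}\right)^{1/2}\begin{pmatrix}\sin(\omega_j)\\\sin(2\omega_j)\\\vdots\\\sin(n\omega_j)\end{pmatrix}$$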

Page 11: Introduction to Statistical Analysis of Time Series

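Properties:

1. The set of coefficients {a0, a1, b1, … } is called the discrete Fourier transform.

$$\mathbf{x} = a_0\mathbf{c}_0 + a_1\mathbf{c}_1 + b_1\mathbf{s}_1 + \cdots + a_m\mathbf{c}_m + b_m\mathbf{s}_m, \quad m = [n/2]$$

$$a_0 = (\mathbf{x}, \mathbf{c}_0) = \frac{1}{n^{1/2}}\sum_{t=1}^{n} x_t$$

$$a_j = (\mathbf{x}, \mathbf{c}_j) = \frac{2^{1/2}}{n^{1/2}}\sum_{t=1}^{n} x_t\cos(\omega_j t)$$

$$b_j = (\mathbf{x}, \mathbf{s}_j) = \frac{2^{1/2}}{n^{1/2}}\sum_{t=1}^{n} x_t\sin(\omega_j t)$$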

Page 12: Introduction to Statistical Analysis of Time Series

2. Sum of squares.

$$\sum_{t=1}^{n} x_t^2 = a_0^2 + \sum_{j=1}^{m}\left(a_j^2 + b_j^2\right)$$

3. ANOVA (analysis of variance table)

Source          DF    Sum of Squares     Periodogram
ω0              1     a0^2               I(ω0)
ω1 = 2π/n       2     a1^2 + b1^2        2 I(ω1)
…               …     …                  …
ωm = 2πm/n      2     am^2 + bm^2        2 I(ωm)
Total           n     Σ_t x_t^2
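The coefficient formulas and the sum-of-squares identity can be verified numerically for an odd-length series. In the sketch below the function name `fourier_coefficients` and the random test series are assumptions; the comparison with `np.fft.fft` relies on the fact that the squared modulus of the FFT is unaffected by indexing time from 0 rather than 1.

```python
import numpy as np

def fourier_coefficients(x):
    """Return a_0 and arrays a_j, b_j (j = 1..m) for odd n, with t = 1..n."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if n % 2 == 0:
        raise ValueError("this sketch assumes n is odd")
    m = n // 2
    t = np.arange(1, n + 1)
    omega = 2 * np.pi * np.arange(1, m + 1) / n
    a0 = x.sum() / np.sqrt(n)
    a = np.sqrt(2 / n) * (np.cos(np.outer(omega, t)) @ x)
    b = np.sqrt(2 / n) * (np.sin(np.outer(omega, t)) @ x)
    return a0, a, b

rng = np.random.default_rng(0)
x = rng.normal(size=11)  # arbitrary series of odd length

a0, a, b = fourier_coefficients(x)

total_ss = np.sum(x ** 2)
decomposed_ss = a0 ** 2 + np.sum(a ** 2 + b ** 2)

# Periodogram ordinates I(omega_j) = (a_j^2 + b_j^2) / 2, j = 1..m
periodogram = (a ** 2 + b ** 2) / 2
fft_version = np.abs(np.fft.fft(x))[1:len(x) // 2 + 1] ** 2 / len(x)
```

The two sums of squares agree, and the periodogram computed from the coefficients matches the squared FFT modulus divided by n, so in practice the FFT is the convenient way to fill in the ANOVA table.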

Page 13: Introduction to Statistical Analysis of Time Series

Applied to toy example

Source                     DF    Sum of Squares
ω0 = 0 (mean term)         1     a0^2 = 4.0
ω1 = 2π/6 (period 6)       2     a1^2 + b1^2 = 50.0
ω2 = 2π2/6 (period 3)      2     a2^2 + b2^2 = 4.5
ω3 = 2π3/6 (period 2)      1     a3^2 = 0.25
Total                      6     Σ_t x_t^2 = 58.75

Test that period 6 is significant

H0: X_t = µ + ε_t , {ε_t} ~ independent noise

H1: X_t = µ + A cos(2πt/6) + B sin(2πt/6) + ε_t

Test Statistic: (n-3) I(ω1) / (Σ_t x_t^2 - I(0) - 2 I(ω1)) ~ F(2, n-3)

(6-3)(50/2)/(58.75 - 4 - 50) = 15.79 ⇒ p-value ≈ .026 under F(2, 3)
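The arithmetic of the test can be reproduced directly from the ANOVA entries. The sketch assumes the stated F(2, n-3) reference distribution and evaluates the upper tail with the closed form P(F > f) = (1 + 2f/d)^(-d/2), which holds for an F variable with 2 numerator and d denominator degrees of freedom; the variable names are chosen for this illustration.

```python
n = 6
total_ss = 58.75   # sum of x_t^2 from the toy example
I0 = 4.0           # a0^2
I1 = 50.0 / 2      # I(omega_1) = (a1^2 + b1^2) / 2

# F statistic: (n - 3) I(omega_1) / (residual sum of squares)
residual_ss = total_ss - I0 - 2 * I1
f_stat = (n - 3) * I1 / residual_ss

# Upper tail of F(2, d) in closed form (valid for numerator df = 2 only)
d = n - 3
p_value = (1 + 2 * f_stat / d) ** (-d / 2)

print(f"F = {f_stat:.2f}, p-value = {p_value:.4f}")
```

With only three denominator degrees of freedom the tail of the reference distribution is heavy, so even a large F statistic gives a moderate p-value.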

Page 14: Introduction to Statistical Analysis of Time Series

The spectrum and frequency domain analysis

Ex. Sinusoid with period 12.

$$x_t = 5\cos(2\pi t/12) + 3\sin(2\pi t/12), \quad t = 1, 2, \ldots, 120.$$

ITSM DEMO

Ex. Sinusoid with periods 4 and 12.

Ex. Mauna Loa
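For readers without ITSM, the period-12 sinusoid example can be reproduced with NumPy's FFT. This is a sketch under the convention I(ω_j) = |Σ_t x_t e^{-iω_j t}|² / n at the Fourier frequencies ω_j = 2πj/n; the variable names are illustrative.

```python
import numpy as np

n = 120
t = np.arange(1, n + 1)
x = 5 * np.cos(2 * np.pi * t / 12) + 3 * np.sin(2 * np.pi * t / 12)

dft = np.fft.rfft(x)                          # frequencies j = 0 .. n/2
periodogram = np.abs(dft) ** 2 / n            # I(omega_j)
freqs = 2 * np.pi * np.arange(len(dft)) / n   # omega_j in radians

# Locate the largest ordinate, ignoring frequency 0 (the mean)
j_peak = int(np.argmax(periodogram[1:]) + 1)
period = n / j_peak

print(f"peak at j = {j_peak}, omega = {freqs[j_peak]:.4f}, period = {period}")
```

Since 120 is a multiple of 12, the sinusoid falls exactly on a Fourier frequency and all of its power sits in a single ordinate; a period that does not divide n would leak into neighbouring frequencies.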

Page 15: Introduction to Statistical Analysis of Time Series

Differencing at lag 12

Sometimes, a seasonal component with period 12 in the time series can be removed by differencing at lag 12. That is, the differenced series is

$$y_t = x_t - x_{t-12}$$

Now suppose x_t is the sinusoid with period 12 + noise:

$$x_t = 5\cos(2\pi t/12) + 3\sin(2\pi t/12) + \varepsilon_t, \quad t = 1, 2, \ldots, 120.$$

Then

$$y_t = x_t - x_{t-12} = \varepsilon_t - \varepsilon_{t-12},$$

which has correlation at lag 12.
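A short simulation illustrates the cancellation. The noise here is assumed to be standard normal and the seed is arbitrary; for independent noise with common variance, the lag-12 correlation of ε_t - ε_{t-12} is -1/2 in theory, since the covariance at lag 12 is -σ² and the variance is 2σ².

```python
import numpy as np

rng = np.random.default_rng(1)
n = 120
t = np.arange(1, n + 1)
eps = rng.normal(size=n)  # assumed Gaussian noise for the illustration
x = 5 * np.cos(2 * np.pi * t / 12) + 3 * np.sin(2 * np.pi * t / 12) + eps

# Lag-12 difference: y_t = x_t - x_{t-12}, defined for t = 13..n
y = x[12:] - x[:-12]
noise_diff = eps[12:] - eps[:-12]

def sample_acf(z, lag):
    """Sample autocorrelation of z at a single positive lag."""
    z = z - z.mean()
    return np.sum(z[lag:] * z[:-lag]) / np.sum(z * z)

r12 = sample_acf(y, 12)
print(f"max |y - (eps_t - eps_t-12)| = {np.max(np.abs(y - noise_diff)):.2e}")
print(f"sample lag-12 autocorrelation of y = {r12:.3f}")
```

The sinusoid vanishes exactly, but the price of differencing is a noise series that is no longer white, which matters when a model is later fitted to the differenced data.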

Page 16: Introduction to Statistical Analysis of Time Series

Other cyclical components; searching for hidden cycles

Ex. Sunspots.

• period ~ 2π/.62684 = 10.02 years
• Fisher's test ⇒ significance

What model should we use?

ITSM DEMO
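One way to carry out Fisher's test outside ITSM is sketched below. It assumes the classical form of the test: g is the largest periodogram ordinate divided by the sum of the ordinates at the q = [(n-1)/2] nonzero Fourier frequencies, and under Gaussian white noise P(g > x) = Σ_{j=1}^{⌊1/x⌋} (-1)^{j-1} C(q, j) (1 - jx)^{q-1}. Whether ITSM uses exactly this normalization is not stated in the slides, and the function name and simulated data are hypothetical.

```python
import math
import numpy as np

def fisher_g_test(x):
    """Fisher's g statistic, its exact null p-value, and the peak frequency.

    The alternating sum can lose accuracy for very long series or very
    small g; this sketch is meant for clearly dominant peaks.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    q = (n - 1) // 2
    # Periodogram at omega_j = 2 pi j / n, j = 1..q (mean removed first)
    ordinates = np.abs(np.fft.fft(x - x.mean())[1:q + 1]) ** 2 / n
    g = ordinates.max() / ordinates.sum()
    p = sum((-1) ** (j - 1) * math.comb(q, j) * (1 - j * g) ** (q - 1)
            for j in range(1, int(1 / g) + 1))
    p = min(max(p, 0.0), 1.0)  # guard against rounding outside [0, 1]
    j_peak = int(np.argmax(ordinates)) + 1
    return g, p, 2 * np.pi * j_peak / n

# Simulated stand-in for a series with a hidden cycle of period 10
rng = np.random.default_rng(0)
t = np.arange(1, 101)
x = 2 * np.cos(2 * np.pi * t / 10) + rng.normal(size=100)

g, p, omega = fisher_g_test(x)
print(f"g = {g:.3f}, p-value = {p:.3g}, period = {2 * np.pi / omega:.2f}")
```

A small p-value says only that the largest peak is too big for white noise; it does not by itself say that a fixed sinusoid is the right model, which is the question the slide raises next.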

Page 17: Introduction to Statistical Analysis of Time Series

Noise. The time series {X_t} is white or independent noise if the sequence of random variables is independent and identically distributed.

[Figure: x_t against time, 0-120; vertical axis ticks from -2 to 4.]

Battery of tests for checking whiteness. In ITSM, choose statistics => residual analysis => Tests of Randomness

Page 18: Introduction to Statistical Analysis of Time Series

Residuals from Mauna Loa data.

[Figure: four lag scatter plots of the residuals, x_{t+h} against x_t for h = 1, 2, 3, 25; both axes run from -1.0 to 1.5.]

Lag 1: Cor(X_t, X_{t+1}) = .824

t    r_t    r_{t+1}
1    -.19   -.14
2    -.14   -.25
3    -.25   -.13
4    -.13   -.32

Lag 2: Cor(X_t, X_{t+2}) = .736

t    r_t    r_{t+2}
1    -.19   -.25
2    -.14   -.13
3    -.25   -.32
4    -.13   -.02

Lag 3: Cor(X_t, X_{t+3}) = .654

t    r_t    r_{t+3}
1    -.19   -.13
2    -.14   -.32
3    -.25   -.13
4    -.13   -.02

Lag 25: Cor(X_t, X_{t+25}) = .074

t    r_t    r_{t+25}
1    -.19   .13
2    -.14   .04
3    -.25   .20
4    -.13   .47

Page 19: Introduction to Statistical Analysis of Time Series

Autocorrelation function (ACF):

[Figure: ACF against lag, 0-40, for the Mauna Loa residuals; vertical axis from 0.0 to 1.0.]

[Figure: ACF against lag, 0-40, for white noise; vertical axis from -0.2 to 1.0.]

Conf Bds: ± 1.96/sqrt(n)
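The sample ACF and the ±1.96/sqrt(n) bounds can be computed in a few lines. The sketch uses the usual estimator with divisor n at every lag; the function name `sample_acf`, the simulated noise, and the seed are assumptions for the illustration.

```python
import numpy as np

def sample_acf(x, max_lag=40):
    """Sample autocorrelations at lags 0..max_lag (divisor n at every lag)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    z = x - x.mean()
    gamma0 = np.sum(z * z) / n
    return np.array([np.sum(z[h:] * z[:n - h]) / n / gamma0
                     for h in range(max_lag + 1)])

rng = np.random.default_rng(2)
noise = rng.normal(size=200)   # simulated white noise

rho = sample_acf(noise, max_lag=40)
bound = 1.96 / np.sqrt(len(noise))

# For white noise, roughly 5% of lags are expected to fall outside the bounds
outside = int(np.sum(np.abs(rho[1:]) > bound))
print(f"bounds = +/-{bound:.3f}, lags outside = {outside} of 40")
```

A handful of small excursions is compatible with white noise; a slowly decaying run of large values, like the one for the Mauna Loa residuals, is not.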

Page 20: Introduction to Statistical Analysis of Time Series

Putting it all together

Example: Mauna Loa

[Figure: four panels against year, 1960-1990: CO2, trend, seasonal, and irregular part.]

Page 21: Introduction to Statistical Analysis of Time Series

Strategies for modeling the irregular part {Y_t}.

• Fit an autoregressive process
• Fit a moving average process
• Fit an ARMA (autoregressive-moving average) process

In ITSM, choose the best fitting AR or ARMA using the menu option

Model => Estimation => Preliminary => AR estimation

or

Model => Estimation => Autofit
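As an illustration of fitting an autoregressive process, the sketch below implements Yule-Walker estimation, one standard preliminary estimator for AR(p) models. It is not a reproduction of what the ITSM menu options compute; the function name `yule_walker` and the simulated AR(1) series are assumptions.

```python
import numpy as np

def yule_walker(y, p):
    """Yule-Walker estimates of AR(p) coefficients and noise variance."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    z = y - y.mean()
    # Sample autocovariances gamma(0), ..., gamma(p), divisor n
    gamma = np.array([np.sum(z[h:] * z[:n - h]) / n for h in range(p + 1)])
    # Toeplitz matrix with entries gamma(|i - j|)
    lags = np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
    phi = np.linalg.solve(gamma[lags], gamma[1:])
    sigma2 = gamma[0] - phi @ gamma[1:]
    return phi, sigma2

# Simulated AR(1) series with coefficient 0.7 and unit noise variance
rng = np.random.default_rng(3)
n = 5000
e = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.7 * y[t - 1] + e[t]

phi_hat, sigma2_hat = yule_walker(y, p=1)
print(f"phi_hat = {phi_hat[0]:.3f}, sigma2_hat = {sigma2_hat:.3f}")
```

In practice the order p is not known, and candidate orders are compared with a criterion such as AICC before the preliminary estimates are refined by maximum likelihood.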

Page 22: Introduction to Statistical Analysis of Time Series

How well does the model fit the data?

1. Inspection of residuals.

   Are they compatible with white (independent) noise?
   • no discernible trend
   • no seasonal component
   • variability does not change in time
   • no correlation in residuals or squares of residuals

   Are they normally distributed?

2. How well does the model predict?
   • values within the series (in-sample forecasting)
   • future values

3. How well do the simulated values from the model capture the characteristics in the observed data?

ITSM DEMO with Mauna Loa

Page 23: Introduction to Statistical Analysis of Time Series

Model refinement and Simulation

• Residual analysis can often lead to model refinement
• Do simulated realizations reflect the key features present in the original data?

Two examples

• Sunspots
• NEE (Net ecosystem exchange).

Limitations of existing models

• Seasonal components are fixed from year to year.
• Stationary through the seasons
• Add intervention components (forest fires, volcanic eruptions, etc.)

Page 24: Introduction to Statistical Analysis of Time Series

Other directions

• Structural model formulation for trend and seasonal components
  • Local level model

    m_t = m_{t-1} + noise_t

  • Seasonal component with noise

    s_t = - s_{t-1} - s_{t-2} - . . . - s_{t-11} + noise_t

  • X_t = m_t + s_t + Y_t + ε_t
  • Easy to add intervention terms in the above formulation.

• Periodic models (allows more flexibility in modeling transitions from one season to the next).
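The structural formulation is easy to simulate, which helps show how it relaxes the fixed-seasonal assumption. In the sketch below the noise scales, the starting seasonal values, and the choice Y_t = 0 are all assumptions made to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(4)
n, d = 240, 12
level_noise = rng.normal(scale=0.1, size=n)
seas_noise = rng.normal(scale=0.05, size=n)
obs_noise = rng.normal(scale=0.3, size=n)   # epsilon_t

# Local level: m_t = m_{t-1} + noise_t, started from m_0 = 0
m = np.cumsum(level_noise)

# Seasonal with noise: s_t = -(s_{t-1} + ... + s_{t-11}) + noise_t
s = np.zeros(n)
s[:d - 1] = 2 * np.cos(2 * np.pi * np.arange(1, d) / d)  # assumed start values
for t in range(d - 1, n):
    s[t] = -s[t - (d - 1):t].sum() + seas_noise[t]

# Observation equation with Y_t set to 0 for simplicity
x = m + s + obs_noise

# Any 12 consecutive seasonal values sum to the current seasonal noise term
rolling_sum = np.convolve(s, np.ones(d), mode="valid")
```

Setting the seasonal noise scale to 0 recovers a seasonal pattern that repeats exactly from year to year, so the classical decomposition is a special case of this formulation.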