Module 3
Descriptive Time Series Statistics
and Introduction to Time Series Models
Class notes for Statistics 451: Applied Time Series
Iowa State University
Copyright 2015 W. Q. Meeker.
November 11, 2015
3 - 1
Module 3
Segment 1
Populations, Processes, and
Descriptive Time Series Statistics
3 - 2
Populations
• A population is a collection of identifiable units (or a spec-
ified characteristic of these units).
• A frame is a listing of the units in the population.
• We take a sample from the frame and use the resulting data
to make inferences about the population.
• Simple random sampling with replacement from a popu-
lation implies independent and identically distributed (iid)
observations.
• Standard statistical methods use the iid assumption for ob-
servations or residuals (in regression).
3 - 3
Processes
• A process is a system that transforms inputs into outputs,
operating over time.
[Diagram: inputs → process → outputs]
• A process generates a sequence of observations (data) over
time.
• We can use a realization from the process to make infer-
ences about the process.
• The iid model is rarely appropriate for processes (observa-
tions close together in time are typically correlated and the
process often changes with time).
3 - 4
Process, Realization, Model,
Inference, and Applications

[Diagram: The Process (Stochastic Process Model) generates data, giving a
Realization z1, z2, . . . , zn. A Statistical Model for the Process yields
Estimates for Model Parameters and Inferences, which support Description,
Forecasts, and Control.]
3 - 5
Stationary Stochastic Processes
Zt is a stochastic process. Some properties of Zt include the
mean µt, the variance σ²t, and the autocorrelation ρZt,Zt+k. In
general, these can change over time.
• Strongly stationary (also strictly or completely stationary):
F(zt1, . . . , ztn) = F(zt1+k, . . . , ztn+k)
Difficult to check.
• Weakly stationary (also second-order or covariance stationary)
requires only that µ = µt and σ² = σ²t be constant and that
ρk = ρZt,Zt+k depend only on the lag k.
Easy to check with sample statistics.
Generally, “stationary” is understood to mean weakly stationary.
3 - 6
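Weak stationarity can be checked informally with sample statistics, as noted above. Below is a minimal Python sketch (not part of the course's S-Plus/RTSERIES code; the seed and parameter values are illustrative assumptions): simulate an AR(1) realization and compare the sample mean and variance across the two halves.

```python
# Illustrative sketch, not from the notes: check (weak) stationarity by
# comparing sample statistics across two halves of a simulated realization.
import random
import statistics

random.seed(451)  # arbitrary seed, chosen for reproducibility

# Simulate an AR(1) process z_t = 0.5 z_{t-1} + a_t (weakly stationary).
n = 4000
z = [0.0]
for _ in range(n - 1):
    z.append(0.5 * z[-1] + random.gauss(0, 1))

first, second = z[: n // 2], z[n // 2:]
# Both halves should show similar means (near 0) and variances
# (near 1 / (1 - 0.5**2) = 1.33) if the process is weakly stationary.
print(statistics.mean(first), statistics.mean(second))
print(statistics.variance(first), statistics.variance(second))
```

A trending series or one with changing variance would fail this informal check; the formal identification tools (ACF, PACF, range-mean plot) appear later in the module.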
Change in Business Inventories 1955-1969
Stationary?
[Time series plot: annual change in business inventories, 1955–1969,
in billions of dollars.]
3 - 7
Estimation of Stationary Process Parameters
Some Notation
Parameter            Process Notation       Estimate
Mean of Z            µZ = E(Z)              µ̂Z = z̄ = (1/n)∑(t=1 to n) zt
Variance of Z        γ0 = σ²Z = Var(Z)      γ̂0 = (1/n)∑(t=1 to n) (zt − z̄)²
Standard deviation   σZ                     σ̂Z = √γ̂0
Autocovariance       γk                     γ̂k = (1/n)∑(t=1 to n−k) (zt − z̄)(zt+k − z̄)
Autocorrelation      ρk = γk/γ0             ρ̂k = γ̂k/γ̂0
Variance of z̄        σ²z̄ = Var(z̄)           S²z̄ = (γ̂0/n)[· · · ]
3 - 8
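The estimators in the table can be written directly in code. A plain-Python sketch (the course uses S-Plus/R; the function names here are my own, not from RTSERIES):

```python
# Sample autocovariance and autocorrelation, following the table above.
# (Plain-Python sketch; names are not from the course's RTSERIES library.)

def acvf(z, k):
    """gamma_hat_k = (1/n) * sum_{t=1}^{n-k} (z_t - zbar)(z_{t+k} - zbar)."""
    n = len(z)
    zbar = sum(z) / n
    return sum((z[t] - zbar) * (z[t + k] - zbar) for t in range(n - k)) / n

def acf(z, k):
    """rho_hat_k = gamma_hat_k / gamma_hat_0."""
    return acvf(z, k) / acvf(z, 0)

z = [101, 82, 66, 35, 31]   # first five Wolfer sunspot numbers
print(acvf(z, 0))           # gamma_hat_0 = 724.4 (sample variance, divisor n)
print(round(acf(z, 1), 3))  # rho_hat_1 = 0.439
```

Note the divisor n (not n − k) in γ̂k, matching the table; this guarantees a positive-definite autocovariance sequence.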
Change in Business Inventories 1955-1969
with z̄ and z̄ ± 3σ̂Z

[Time series plot: annual change in business inventories, 1955–1969, in
billions of dollars, with horizontal lines drawn at z̄ and z̄ ± 3σ̂Z.]
3 - 9
Variance of Z̄ with Correlated Data
The variance of z̄ with correlated data,

σ²z̄ = Var(z̄) = (γ0/n) ∑(k = −(n−1) to n−1) (1 − |k|/n) ρk,

is more complicated than with iid data, where

σ²z̄ = Var(z̄) = γ0/n = σ²/n.

Thus if one incorrectly uses the iid formula when the autocorrelations
are positive, the variance is underestimated, potentially leading to
false conclusions.
3 - 10
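The size of the effect is easy to compute. A sketch (not from the notes) evaluating the correlated-data formula for a hypothetical process with AR(1)-type autocorrelations ρk = 0.8^|k|:

```python
# Sketch: how badly the iid formula gamma_0/n understates Var(zbar)
# for a hypothetical process with rho_k = 0.8**|k| (illustrative values).

def var_zbar(gamma0, rho, n):
    """Var(zbar) = (gamma0/n) * sum_{k=-(n-1)}^{n-1} (1 - |k|/n) * rho_|k|."""
    total = sum((1 - abs(k) / n) * rho(abs(k)) for k in range(-(n - 1), n))
    return gamma0 / n * total

n, gamma0 = 100, 1.0
correct = var_zbar(gamma0, lambda k: 0.8 ** k, n)
naive = gamma0 / n
print(correct / naive)  # about 8.6: the iid formula understates Var(zbar)
```

With this much positive autocorrelation, the true variance of z̄ is nearly nine times what the iid formula claims, so nominal confidence intervals for the mean would be far too narrow.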
Module 3
Segment 2
Correlation and Autocorrelation
3 - 11
Correlation [From “Statistics 101”]
Consider random data y and x (e.g., sales and advertising):

x: x1, x2, . . . , xn
y: y1, y2, . . . , yn

ρx,y denotes the “population” correlation between all values
of x and y in the population.
To estimate ρx,y, we use the sample correlation

ρ̂x,y = ∑(i=1 to n) (xi − x̄)(yi − ȳ) / √[∑(i=1 to n) (xi − x̄)² ∑(i=1 to n) (yi − ȳ)²],   (Stat 101)

with −1 ≤ ρ̂x,y ≤ 1.
3 - 12
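A direct plain-Python rendering of the Stat 101 formula (a sketch; the function name is my own):

```python
# Sample correlation, exactly as in the "Stat 101" formula above.
from math import sqrt

def corr(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

print(corr([1, 2, 3, 4], [2, 4, 6, 8]))   # exact linear relation: 1.0
print(corr([1, 2, 3, 4], [8, 6, 4, 2]))   # exact inverse relation: -1.0
```

The two bounds −1 and 1 are attained exactly when y is a perfect linear function of x, as the two toy inputs show.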
Sample Autocorrelation
Consider the random time series realization z1, z2, . . . , zn
Lagged variables:

t      zt     zt+1   zt+2   zt+3   zt+4
1      z1     z2     z3     z4     z5
2      z2     z3     z4     z5     z6
3      z3     z4     z5     z6     z7
...    ...    ...    ...    ...    ...
n−2    zn−2   zn−1   zn     –      –
n−1    zn−1   zn     –      –      –
n      zn     –      –      –      –

Assuming weak (covariance) stationarity, let ρk denote the
process correlation between observations separated by k time
periods. To compute the “order k” sample autocorrelation
(i.e., the correlation between zt and zt+k):

ρ̂k = γ̂k/γ̂0 = ∑(t=1 to n−k) (zt − z̄)(zt+k − z̄) / ∑(t=1 to n) (zt − z̄)²,   k = 0, 1, 2, . . .

Note: −1 ≤ ρ̂k ≤ 1
3 - 13
Sample Autocorrelation (alternative formula)
Consider the random time series realization z1, z2, . . . , zn
Lagged variables:

t      zt     zt−1   zt−2   zt−3   zt−4
1      z1     –      –      –      –
2      z2     z1     –      –      –
3      z3     z2     z1     –      –
4      z4     z3     z2     z1     –
...    ...    ...    ...    ...    ...
n      zn     zn−1   zn−2   zn−3   zn−4

Assuming weak (covariance) stationarity, let ρk denote the
process correlation between observations separated by k time
periods. To compute the “order k” sample autocorrelation
(i.e., the correlation between zt and zt−k):

ρ̂k = γ̂k/γ̂0 = ∑(t=k+1 to n) (zt − z̄)(zt−k − z̄) / ∑(t=1 to n) (zt − z̄)²,   k = 0, 1, 2, . . .

Note: −1 ≤ ρ̂k ≤ 1
3 - 14
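The two versions of the formula (zt paired with zt+k, or zt paired with zt−k) sum exactly the same cross-products, so they give identical values of ρ̂k. A quick plain-Python check (a sketch; the names are my own):

```python
# The two sample-ACF formulas (slides 3-13 and 3-14) are equivalent:
# they sum the same cross-products, just indexed differently.

def acf_forward(z, k):
    n = len(z); zbar = sum(z) / n
    num = sum((z[t] - zbar) * (z[t + k] - zbar) for t in range(n - k))
    den = sum((zt - zbar) ** 2 for zt in z)
    return num / den

def acf_backward(z, k):
    n = len(z); zbar = sum(z) / n
    num = sum((z[t] - zbar) * (z[t - k] - zbar) for t in range(k, n))
    den = sum((zt - zbar) ** 2 for zt in z)
    return num / den

z = [101, 82, 66, 35, 31, 7, 20, 92, 154, 125]
print(acf_forward(z, 2) == acf_backward(z, 2))  # True: same pairs, same sum
```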
Wolfer Sunspot Numbers 1770-1869
[Time series plot: annual Wolfer sunspot numbers, 1770–1869, ranging
from 0 to about 150.]
3 - 15
Lagged Sunspot Data
Lagged variables:

t      Spot   Spot1  Spot2  Spot3  Spot4
1      101    –      –      –      –
2      82     101    –      –      –
3      66     82     101    –      –
4      35     66     82     101    –
5      31     35     66     82     101
...    ...    ...    ...    ...    ...
99     37     7      16     30     47
100    74     37     7      16     30
101    –      74     37     7      16
102    –      –      74     37     7
103    –      –      –      74     37
104    –      –      –      –      74
3 - 16
Wolfer Sunspot Numbers
Correlation Between Observations Separated by One
Time Period
[Scatter plot of spot.ts[-1] versus spot.ts[-100], i.e., each observation
plotted against the one preceding it.
Correlation = 0.81708;  Autocorrelation = 0.806244]
3 - 17
Wolfer Sunspot Numbers
Correlation Between Observations Separated by k
Time Periods [show.acf(spot.ts)]
[Panel of scatter plots of the series versus itself at lags 1 through 13,
together with the sample ACF of the series (lags 0 to 20, ACF values
from −0.2 to 1.0).]
3 - 18
Wolfer Sunspot Numbers Sample ACF Function
[Plot: sample ACF of spot.ts for lags 0 to 20, with ACF values ranging
from about −0.2 to 1.0.]
3 - 19
Module 3
Segment 3
Autoregression, Autoregressive Modeling,
and Introduction to Partial Autocorrelation
3 - 20
Wolfer Sunspot Numbers 1770-1869
[Time series plot: annual Wolfer sunspot numbers, 1770–1869.]
3 - 21
Autoregressive (AR) Models
• AR(0): zt = µ + at (White noise or “Trivial” model)
• AR(1): zt = θ0 + φ1 zt−1 + at
• AR(2): zt = θ0 + φ1 zt−1 + φ2 zt−2 + at
• AR(3): zt = θ0 + φ1 zt−1 + φ2 zt−2 + φ3 zt−3 + at
• AR(p): zt = θ0 + φ1 zt−1 + φ2 zt−2 + · · · + φp zt−p + at

where at ∼ nid(0, σ²a)
3 - 22
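An AR(p) process is easy to simulate directly from its defining equation. A sketch (not from the notes) for an AR(2) with parameter values chosen to give sunspot-like pseudo-cyclic behavior; all numbers here are illustrative assumptions, not estimates from the data:

```python
# Illustrative sketch: simulate z_t = theta0 + phi1 z_{t-1} + phi2 z_{t-2} + a_t
# with a_t ~ nid(0, sigma_a^2). All parameter values are assumptions.
import random

random.seed(1)  # arbitrary seed, for reproducibility
theta0, phi1, phi2, sigma_a = 15.0, 1.3, -0.7, 15.0

z = [37.5, 37.5]   # start at the process mean theta0/(1 - phi1 - phi2) = 37.5
for _ in range(200):
    a = random.gauss(0, sigma_a)
    z.append(theta0 + phi1 * z[-1] + phi2 * z[-2] + a)

print(sum(z) / len(z))  # sample mean should be near the process mean 37.5
```

With φ1 = 1.3 and φ2 = −0.7 the stationarity conditions hold and the realization oscillates quasi-periodically, which is why AR(2) and higher-order AR models are natural candidates for the sunspot series.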
Data Analysis Strategy
[Flowchart of the model-building cycle:
1. Tentative Identification — time series plot, range-mean plot, ACF and PACF.
2. Estimation — least squares or maximum likelihood.
3. Diagnostic Checking — residual analysis and forecasts.
Model ok? If no, return to identification; if yes, use the model for
forecasting, explanation, and control.]
3 - 23
AR(1) Model for the Wolfer Sunspot Data
[Four-panel diagnostic display:
• Data and 1-Step Ahead Predictions versus Year Number (values 0–150)
• Sunspot Residuals, AR(1) Model, versus Year Number
• Sample ACF of residuals(spot.ar1.fit), lags 0–15
• Normal Q-Q plot of residuals(spot.ar1.fit)]
3 - 24
AR(2) Model for the Wolfer Sunspot Data
[Four-panel diagnostic display:
• Data and 1-Step Ahead Predictions versus Year Number (values 0–150)
• Sunspot Residuals, AR(2) Model, versus Year Number
• Sample ACF of residuals(spot.ar2.fit), lags 0–15
• Normal Q-Q plot of residuals(spot.ar2.fit)]
3 - 25
AR(3) Model for the Wolfer Sunspot Data
[Four-panel diagnostic display:
• Data and 1-Step Ahead Predictions versus Year Number (values 0–150)
• Sunspot Residuals, AR(3) Model, versus Year Number
• Sample ACF of residuals(spot.ar3.fit), lags 0–15
• Normal Q-Q plot of the residuals]
3 - 26
AR(4) Model for the Wolfer Sunspot Data
[Four-panel diagnostic display:
• Data and 1-Step Ahead Predictions versus Year Number (values 0–150)
• Sunspot Residuals, AR(4) Model, versus Year Number
• Sample ACF of residuals(spot.ar4.fit), lags 0–15
• Normal Q-Q plot of the residuals]
3 - 27
Sample Partial Autocorrelation Function
The sample PACF φ̂kk can be computed from the sample ACF
ρ̂1, ρ̂2, . . . using the recursive formula from Durbin (1960):

φ̂1,1 = ρ̂1

φ̂kk = [ρ̂k − ∑(j=1 to k−1) φ̂k−1,j ρ̂k−j] / [1 − ∑(j=1 to k−1) φ̂k−1,j ρ̂j],   k = 2, 3, . . .

where

φ̂kj = φ̂k−1,j − φ̂kk φ̂k−1,k−j   (k = 3, 4, . . . ; j = 1, 2, . . . , k − 1)

Thus the PACF contains no new information, just a different
way of looking at the information in the ACF.
This sample PACF formula is based on the solution of the
Yule-Walker equations (covered in Module 5).
3 - 28
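The recursion is short to implement. A plain-Python sketch (the names are my own, not from RTSERIES); with the sunspot values ρ̂1 = 0.8062, ρ̂2 = 0.4281 from slide 3-34 it reproduces φ̂22 ≈ −0.6341 from slide 3-29:

```python
# Durbin (1960) recursion: sample PACF computed from the sample ACF.
# (Plain-Python sketch; names are my own, not from the course's RTSERIES.)

def pacf_from_acf(rho):
    """rho = [rho_1, rho_2, ...]; returns [phi_11, phi_22, ...]."""
    phi = {(1, 1): rho[0]}
    pacf = [rho[0]]                 # phi_11 = rho_1
    for k in range(2, len(rho) + 1):
        num = rho[k - 1] - sum(phi[(k - 1, j)] * rho[k - 1 - j] for j in range(1, k))
        den = 1.0 - sum(phi[(k - 1, j)] * rho[j - 1] for j in range(1, k))
        phi[(k, k)] = num / den
        for j in range(1, k):       # update phi_kj for the next step
            phi[(k, j)] = phi[(k - 1, j)] - phi[(k, k)] * phi[(k - 1, k - j)]
        pacf.append(phi[(k, k)])
    return pacf

# For an AR(1) process, rho_k = phi^k and the PACF cuts off after lag 1.
print(pacf_from_acf([0.8, 0.64, 0.512]))              # [0.8, ~0, ~0]

# Sunspot check against the notes (ACF from slide 3-34, PACF from 3-29):
print(pacf_from_acf([0.806243956, 0.428105325])[1])   # ~ -0.6341
```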
Summary of Sunspot Autoregressions
         Order   Regression Output              PACF
Model    p       R²      S       φ̂p      tφ̂p     φ̂pp
AR(1)    1       .6676   21.53   .810    13.96   .8062
AR(2)    2       .8336   15.32   −.711   −9.81   −.6341
AR(3)    3       .8407   15.12   .208    2.04    .0805
AR(4)    4       .8463   15.01   −.147   −1.41   −.0611

φ̂p is from ordinary least squares (OLS).
φ̂pp is from the Durbin formula.
3 - 29
Sample Partial Autocorrelation
The “true partial autocorrelation function,” denoted by φkk,
for k = 1,2, . . . is the process correlation between obser-
vations separated by k time periods (i.e., between zt and
zt+k) with the effect of the intermediate zt+1, . . . , zt+k−1
removed.
We can estimate φkk, k = 1, 2, . . . with φ̂kk, k = 1, 2, . . .:

• φ̂1,1 = φ̂1 from the AR(1) model
• φ̂2,2 = φ̂2 from the AR(2) model
• φ̂kk = φ̂k from the AR(k) model

The Durbin formula gives somewhat different answers because the
two estimators have different bases (ordinary least squares versus
the solution of the Yule-Walker equations).
3 - 30
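The AR(k)-regression route to φ̂kk can be sketched for k = 1: regress zt on zt−1 by ordinary least squares and take the slope. (On the sunspot data this route gives the .810 on slide 3-29, versus .8062 from the Durbin formula; the course does these fits in S-Plus/R. The toy series below is purely illustrative.)

```python
# Sketch: phi_hat_11 as the OLS slope from regressing z_t on z_{t-1}.
# (Plain Python; function name and toy data are my own.)

def ols_slope(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx

z = [1, 2, 1, 2, 1, 2]             # toy series that alternates perfectly
phi11 = ols_slope(z[:-1], z[1:])   # regress z_t on z_{t-1}
print(phi11)                       # close to -1: strong negative lag-1 dependence
```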
Module 3
Segment 4
Introduction to ARMA Models
and Time Series Modeling
3 - 31
Autoregressive-Moving Average (ARMA) Models
(a.k.a. Box-Jenkins Models)
• AR(0): zt = µ + at (White noise or “Trivial” model)
• AR(1): zt = θ0 + φ1 zt−1 + at
• AR(2): zt = θ0 + φ1 zt−1 + φ2 zt−2 + at
• AR(p): zt = θ0 + φ1 zt−1 + · · · + φp zt−p + at
• MA(1): zt = θ0 − θ1 at−1 + at
• MA(2): zt = θ0 − θ1 at−1 − θ2 at−2 + at
• MA(q): zt = θ0 − θ1 at−1 − · · · − θq at−q + at
• ARMA(1,1): zt = θ0 + φ1 zt−1 − θ1 at−1 + at
• ARMA(p, q): zt = θ0 + φ1 zt−1 + · · · + φp zt−p − θ1 at−1 − · · · − θq at−q + at

where at ∼ nid(0, σ²a)
3 - 32
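An ARMA realization can be generated directly from its defining equation. A sketch for ARMA(1,1) with illustrative (assumed) parameter values, not estimates from any data in the notes:

```python
# Illustrative sketch: simulate z_t = theta0 + phi1 z_{t-1} - theta1 a_{t-1} + a_t
# with a_t ~ nid(0, sigma_a^2). All parameter values are assumptions.
import random

random.seed(7)  # arbitrary seed, for reproducibility
theta0, phi1, theta1, sigma_a = 5.0, 0.6, 0.4, 1.0

a_prev = random.gauss(0, sigma_a)
zt = theta0 / (1 - phi1)           # start at the process mean, 12.5
series = []
for _ in range(500):
    a = random.gauss(0, sigma_a)
    zt = theta0 + phi1 * zt - theta1 * a_prev + a
    series.append(zt)
    a_prev = a

print(sum(series) / len(series))   # near the process mean theta0/(1-phi1) = 12.5
```

Note the sign convention above: the MA coefficients θ1, . . . , θq enter with minus signs, matching the slide.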
Graphical Output from RTSERIES Function
iden(spot.tsd)
[Four-panel identification display for Wolfer Sunspots (w = Number of
Spots):
• Range-Mean Plot (subgroup ranges versus subgroup means)
• Time series plot of w, 1780–1860
• Sample ACF, lags 0–30
• Sample PACF, lags 0–30]
3 - 33
Tabular Output from RTSERIES Function
iden(spot.tsd)
Identification Output for Wolfer Sunspots
w= Number of Spots
[1] "Standard deviation of the working series= 37.36504"
ACF
Lag ACF se t-ratio
1 1 0.806243956 0.1000000 8.06243956
2 2 0.428105325 0.1516594 2.82280693
3 3 0.069611110 0.1632975 0.42628402
.
.
.
Partial ACF
Lag Partial ACF se t-ratio
1 1 0.806243896 0.1 8.06243896
2 2 -0.634121358 0.1 -6.34121358
3 - 34
Standard Errors for Sample Autocorrelations and
Sample Partial Autocorrelations
• Sample ACF standard error:

Sρ̂k = √[(1 + 2ρ̂²1 + · · · + 2ρ̂²k−1)/n]

Also can compute the “t-like” statistics t = ρ̂k/Sρ̂k.

• Sample PACF standard error:

Sφ̂kk = 1/√n

Use these to compute “t-like” statistics t = φ̂kk/Sφ̂kk and to set
decision bounds.

• In long realizations from a stationary process, if the true
parameter is really 0, then the distribution of a “t-like”
statistic can be approximated by N(0,1).

• Values of ρ̂k and φ̂kk may be judged to be different from
zero if the “t-like” statistics are outside specified limits (±2
is often suggested; one might use ±1.5 as a “warning”).
3 - 35
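The ACF standard-error formula can be checked against the tabular output on slide 3-34. A plain-Python sketch (the function name is my own, not from RTSERIES):

```python
# Sample-ACF standard error, as on slide 3-35, checked against the
# RTSERIES output on slide 3-34. (Plain-Python sketch; name is my own.)
from math import sqrt

def acf_se(rho, k, n):
    """S_rho_k = sqrt((1 + 2*rho_1^2 + ... + 2*rho_{k-1}^2) / n)."""
    return sqrt((1 + 2 * sum(r * r for r in rho[: k - 1])) / n)

# Sunspot sample ACF values from slide 3-34, with n = 100:
rho = [0.806243956, 0.428105325, 0.069611110]
print(acf_se(rho, 1, 100))            # 1/sqrt(100) = 0.1, matching the slide
print(acf_se(rho, 2, 100))            # ~0.15166, matching the slide
print(rho[1] / acf_se(rho, 2, 100))   # "t-like" statistic ~2.823, matching
```

Note how the standard error grows with k as earlier ρ̂ values accumulate in the sum, which is why the bounds on an ACF plot widen at higher lags.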
Drawing Conclusions from Sample ACF’s and PACF’s
                        PACF dies down    PACF cuts off after lag p
ACF dies down           ARMA              AR(p)
ACF cuts off after      MA(q)             Impossible
 lag q
Technical details, background, and numerous examples will
be presented in Modules 4 and 5.
3 - 36
Comments on Drawing Conclusions from Sample
ACF’s and PACF’s
The “t-like” statistics should only be used as guidelines in
model building because:
• An ARMA model is only an approximation to some true
process
• Sampling distributions are complicated; the “t-like” statis-
tics do not really follow a standard normal distribution (only
approximately, in large samples, with a stationary process)
• Correlations exist among the ρ̂k values for different k (e.g.,
ρ̂1 and ρ̂2 may be correlated).
• Problems relating to simultaneous inference (looking at many
different statistics simultaneously, confidence/significance
levels have little meaning).
3 - 37