Introduction to Time Series Data and Analysis
Transcript of Introduction to Time Series Data and Analysis
Simon Taylor, Department of Mathematics and Statistics
20 April 2016
Contents
What is Time Series Data?
Analysis Tools: Trace Plot, Auto-Correlation Function, Spectrum
Time Series Models: Moving Average, Auto-Regressive
Further Topics
What is Time Series Data?
A time series is a set of observations made sequentially through time.
Examples:
• Changes in execution time, RAM or bandwidth usage.
• Number of times a piece of software has run in consecutive periods of time.
• Financial, geophysical, marketing, demographic data, etc.
The objectives in time series analysis are:
• Description – How does the data vary over time?
• Explanation – What causes the observed variation?
• Prediction – What are the future values of the series?
• Control – Aim to improve control over the process.
Common Questions
Q: How important is preserving data order?
A: Very! Changing data order breaks the dependence between measurements.
Q: How frequently do I need to take measurements?
A: It depends:
• Too sparse: risk missing the dependence structure.
• Too frequent: swamped with noise.
Figure: Sampling Frequency (the same signal sampled at 1 Hz, 10 Hz and 100 Hz).
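The "too sparse" risk can be made concrete with a small aliasing sketch. The frequencies and sampling rates below are illustrative choices of mine, not taken from the figure:

```python
import numpy as np

def sample(freq_hz, rate_hz, duration_s=1.0):
    """Sample a sine wave of the given frequency at the given rate."""
    t = np.arange(0.0, duration_s, 1.0 / rate_hz)
    return np.sin(2 * np.pi * freq_hz * t)

# A 6 Hz sine sampled at 100 Hz is captured faithfully, but sampled at
# only 8 Hz (below the 12 Hz Nyquist rate) it aliases: the samples are
# indistinguishable from those of a -2 Hz sine (6 - 8 = -2).
coarse = sample(6, 8)
alias = sample(-2, 8)
print(np.allclose(coarse, alias))  # → True
```

Sampling too sparsely does not merely lose detail: the recorded series can look exactly like a different, slower process.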
Why is time series important in benchmarking?
Q: Can I use simple summary statistics?
A: You can, but they only describe overall properties.
Q: Can’t I just interpolate between data points?
A: Signals are often subject to uncontrollable random noise. Error from interpolation may be large if noise is large.
Figure: Three time series, X, Y and Z, each with x̄ = 0 and s² = 1.
Analysis Tools – Trace Plot
A trace plot is a graph of the measurements against time.
Easy to visually identify key features:
• Trends – Long-term trend in the mean level.
• Seasonality – Regular peaks & falls in the measurements.
• Outliers – Unusual measurements that are inconsistent with the rest of the data.
• Discontinuities – Abrupt change to the underlying process.
Analysis Tools – Auto-correlation function
Correlation measures the linear dependence between two data sets. Auto-correlation measures the correlation between all data pairs at lag k apart:

rk = [ Σ_{t=1}^{T−k} (xt − x̄)(xt+k − x̄) ] / [ (T − 1) s² ],

where x̄ and s² are the sample mean and variance.
Figure: Lag 5 ACF calculation.
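As a sketch, the formula above can be computed directly with NumPy. The function name `acf` is my own, not from the slides:

```python
import numpy as np

def acf(x, k):
    """Sample autocorrelation at lag k:
    r_k = sum_{t=1}^{T-k} (x_t - xbar)(x_{t+k} - xbar) / ((T - 1) s^2)."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    xbar = x.mean()
    s2 = x.var(ddof=1)  # sample variance with divisor T - 1
    num = np.sum((x[:T - k] - xbar) * (x[k:] - xbar))
    return num / ((T - 1) * s2)

# A strongly trending series has a high lag-1 autocorrelation:
print(round(acf(np.arange(10.0), 1), 3))  # → 0.7
```

The lag-0 value is always 1, and for independent noise the rk decay to roughly zero for all k > 0.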
Analysis Tools – Spectrum
The spectrum describes how the power in a time series varies across frequencies:

I(ω) = (1/(πT)) | Σ_{t=1}^{T} xt e^{i2πtω} |²,  for ω ∈ (0, 1/2].

It identifies prominent seasonal and cyclic variation.
Figure: Fourier decomposition (components at frequencies 1/4 to 1/8) and spectrum of time series Xt.
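A minimal sketch of the periodogram formula above, evaluated at a single frequency; the function name `periodogram` is my own:

```python
import numpy as np

def periodogram(x, omega):
    """I(omega) = (1 / (pi * T)) * |sum_{t=1}^{T} x_t e^{i 2 pi t omega}|^2."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    t = np.arange(1, T + 1)
    return np.abs(np.sum(x * np.exp(1j * 2 * np.pi * t * omega))) ** 2 / (np.pi * T)

# A cosine with period 8 concentrates its power at frequency omega = 1/8,
# so the ordinate there dwarfs the one at an unrelated frequency:
T = 64
x = np.cos(2 * np.pi * np.arange(1, T + 1) / 8)
print(periodogram(x, 1 / 8) > 100 * periodogram(x, 1 / 4))  # → True
```

In practice the periodogram is evaluated on the grid of Fourier frequencies j/T and usually computed via the FFT rather than this direct sum.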
Time series models
Let X1:T = {X1, . . . , XT} denote a sequence of T measurements.

A time series is stationary if the joint distributions of any subset X1:t and its lag-shifted copy X1+k:t+k are the same, for every lag k.

A time series is weakly stationary if the first two moments are constant over time:

E[Xt] = µ and Cov(Xt, Xt+k) = γ(k).
Gaussian White Noise Process, GWNP
The time series {Zt} follows a Gaussian white noise process if:
Zt ∼ N (0, σ2), t = 1, . . . ,T
Figure: Gaussian white noise process.
MA(q) process
Moving Average Process of Order q, MA(q)
The process {Xt} is a moving average process of order q if:
Xt = β0Zt + β1Zt−1 + · · ·+ βqZt−q
where {Zt} is a GWNP and β0, . . . , βq are constants (β0 = 1).
Expectation: E[Xt ] = 0.
Auto-covariance:
Cov(Xt, Xt+k) = σ² Σ_{i=0}^{q−|k|} βi βi+|k|,  for |k| = 0, . . . , q; and 0 otherwise.
Figure: Left: MA(1), β1 = 0.9. Right: MA(2), (β1, β2) = (−0.4,0.9).
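A quick simulation can check the MA(1) case of the auto-covariance formula: with q = 1 and β1 = 0.9 (as in the left panel), the lag-1 autocorrelation should be β1/(1 + β1²) ≈ 0.497. This is a sketch with parameters chosen to match the figure:

```python
import numpy as np

# Simulate X_t = Z_t + beta1 * Z_{t-1}, where Z_t is Gaussian white noise.
rng = np.random.default_rng(0)
beta1, T = 0.9, 100_000
z = rng.normal(size=T + 1)
x = z[1:] + beta1 * z[:-1]

# Sample lag-1 autocorrelation vs. the theoretical value beta1 / (1 + beta1^2):
r1 = np.corrcoef(x[:-1], x[1:])[0, 1]
print(round(r1, 2), round(beta1 / (1 + beta1 ** 2), 2))
```

Note also that the formula gives exactly zero beyond lag q: an MA(q) process has a "cut-off" in its ACF, which is how the order is identified in practice.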
AR(p) process
Autoregressive Process of Order p, AR(p)
The process {Yt} is an autoregressive process of order p if:
Yt = α1Yt−1 + · · ·+ αpYt−p + Zt
where {Zt} is a GWNP and α1, . . . , αp are constants.
Expectation: E[Yt] = 0.

Auto-covariance for AR(1):

Cov(Yt, Yt+k) = σ² α1^|k| / (1 − α1²), provided |α1| < 1.
Figure: Left: AR(1), α1 = 0.9. Right: AR(2), (α1, α2) = (0.8,−0.64).
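The AR(1) auto-covariance at k = 0 gives the stationary variance σ²/(1 − α1²), which a simulation can verify. A sketch, using α1 = 0.9 as in the left panel:

```python
import numpy as np

# Simulate Y_t = alpha1 * Y_{t-1} + Z_t, where Z_t is Gaussian white noise.
rng = np.random.default_rng(0)
alpha1, T = 0.9, 200_000
z = rng.normal(size=T)
y = np.empty(T)
y[0] = 0.0
for t in range(1, T):
    y[t] = alpha1 * y[t - 1] + z[t]

# Discard a burn-in so the chain forgets its Y_0 = 0 start, then compare
# the sample variance with sigma^2 / (1 - alpha1^2) = 1 / 0.19 ≈ 5.26:
print(round(y[1000:].var(), 1), round(1 / (1 - alpha1 ** 2), 1))
```

The |α1| < 1 condition matters: at α1 = 1 the recursion becomes a random walk and the stationary variance no longer exists.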
Non-stationary process
Figure: Examples of non-stationary processes: a random walk, a non-stationary AR(2), an AR(1) with a mean change, and a concatenated Haar MA.
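The random walk in the first panel is the classic non-stationary example: Xt = Xt−1 + Zt has Var(Xt) = tσ², so the constant-second-moment condition for weak stationarity fails. A sketch that checks this empirically:

```python
import numpy as np

# 2000 independent random walks of length 1000, each a cumulative sum
# of Gaussian white noise (sigma = 1).
rng = np.random.default_rng(0)
walks = rng.normal(size=(2000, 1000)).cumsum(axis=1)

# Variance across walks at t = 100 and t = 1000 grows roughly like t:
print(round(walks[:, 99].var()), round(walks[:, 999].var()))
```

Differencing recovers stationarity here: the increments Xt − Xt−1 are just the white noise Zt, which is why differencing is the standard first remedy for this kind of non-stationarity.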
On-going Research in Time Series
Figure: Keyword cloud from the Journal of Time Series Analysis, 2002–2015. Red: models; navy: properties; grey: inference & methods.
Further Reading
• Box, G. E. P., Jenkins, G. M. and Reinsel, G. C. (2008) Time Series Analysis: Forecasting and Control. 4th ed., John Wiley & Sons.
• Chatfield, C. (2004) The Analysis of Time Series: An Introduction. 6th ed., CRC Press.
• Signal Processing Toolbox, MATLAB (http://uk.mathworks.com/products/signal/)
• Statsmodels, Python (http://statsmodels.sourceforge.net/)