Time Series Forecasting
0Mu Sigma Confidential
Chicago, IL
Bangalore, India
www.mu-sigma.com
Proprietary Information
"This document and its attachments are confidential. Any unauthorized copying, disclosure or distribution of the material is strictly forbidden"
Do The Math
Time Series Forecasting
April 2019
TOPICS TO BE COVERED
Time Series Modeling
Time Series Basics
Basic Time Series Modeling
Exponential Smoothing
ARIMA
Dynamic Time Series Model
Croston's Model
Model Selection
SOME KEYWORDS
Cycle: A pattern of increases or decreases that recurs, but not at a fixed interval of time. The duration of one repetition is typically long, for example around 2 years
TIME SERIES DECOMPOSITION
Decomposition breaks the time series $y_t$ down into a seasonal component $S_t$, a trend-cycle component $T_t$ and a remainder component $R_t$
Multiplicative Decomposition
$y_t = S_t \times T_t \times R_t$
Additive Decomposition
$y_t = S_t + T_t + R_t$
Seasonally Adjusted Time Series: The time series left after removing the seasonal component. The seasonally adjusted series contains the trend-cycle and remainder components
TIME SERIES DECOMPOSITION
Trend Cycle Estimation - Moving Average Smoothing
$T_t = \frac{1}{m} \sum_{j=-k}^{k} y_{t+j}$, where $m = 2k + 1$
Example - Odd point MA
Year   Sales (GWh)   5-MA
1989   2354.34       -
1990   2379.71       -
1991   2318.52       2381.53
1992   2468.99       2424.56
1993   2386.09       2463.76
1994   2569.47       -
1995   2575.72       -
No estimate for the initial and end values
Example - Even point MA
Year   Sales (GWh)   4-MA      2x4-MA
1989   2354.34       -         -
1990   2379.71       2380.39   -
1991   2318.52       2388.33   2384.36
1992   2468.99       2435.77   2412.05
1993   2386.09       2500.07   2467.92
1994   2569.47       2510.43   2505.25
1995   2575.72       2572.60   2541.51
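Odd point moving average is symmetric
$\hat{y}_t = \frac{1}{5}(y_{t-2} + y_{t-1} + y_t + y_{t+1} + y_{t+2})$
Even point moving average is non-symmetric
$\hat{y}_t = \frac{1}{4}(y_{t-2} + y_{t-1} + y_t + y_{t+1})$
m x Even point moving average is symmetric (shown here for the 2x4-MA)
$\hat{y}_t = \frac{1}{2}\left[\frac{1}{4}(y_{t-2} + y_{t-1} + y_t + y_{t+1}) + \frac{1}{4}(y_{t-1} + y_t + y_{t+1} + y_{t+2})\right]$
$\hat{y}_t = \frac{1}{8}y_{t-2} + \frac{1}{4}y_{t-1} + \frac{1}{4}y_t + \frac{1}{4}y_{t+1} + \frac{1}{8}y_{t+2}$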
Techniques typically used for time series decomposition: X11 Decomposition, SEATS Decomposition, STL Decomposition
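The moving averages on this slide can be checked with a few lines of Python. This is a minimal sketch, not a library routine: the helper name `centered_ma` is made up for illustration, and the 2x4-MA is written as a single weighted 5-term average with weights 1/8, 1/4, 1/4, 1/4, 1/8. Only the seven sales values shown in the tables are used, so rows that would need observations outside 1989-1995 are left empty.

```python
# Electricity sales (GWh), 1989-1995, copied from the example tables.
sales = [2354.34, 2379.71, 2318.52, 2468.99, 2386.09, 2569.47, 2575.72]

def centered_ma(y, weights):
    """Weighted centered moving average (hypothetical helper).

    `weights` must have odd length; positions where the window does not
    fit inside the data are returned as None.
    """
    k = len(weights) // 2  # half-width of the window
    out = [None] * len(y)
    for t in range(k, len(y) - k):
        out[t] = sum(w * y[t + j - k] for j, w in enumerate(weights))
    return out

ma5 = centered_ma(sales, [1 / 5] * 5)                        # 5-MA
ma2x4 = centered_ma(sales, [1 / 8, 1 / 4, 1 / 4, 1 / 4, 1 / 8])  # 2x4-MA

for year, m5, m24 in zip(range(1989, 1996), ma5, ma2x4):
    print(year, m5 and round(m5, 2), m24 and round(m24, 2))
```

For 1991 to 1993 the printed values agree with the 5-MA and 2x4-MA columns of the tables.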
BASIC FORECASTING METHOD
Average Method
Future values are the same as the average of the historical values
$\hat{y}_{T+h} = \frac{y_1 + y_2 + \dots + y_{T-1} + y_T}{T}$
Naïve Method
Future values are the same as the last observation - also called the random walk forecast
$\hat{y}_{T+h} = y_T$
Seasonal Naïve Method
Future values are the same as the last observed value from the same season - e.g. a future Feb value will be the same as last year's Feb value
$\hat{y}_{T+h} = y_{T+h-m(k+1)}$, where $m$ is the seasonal period and $k$ is the integer part of $(h-1)/m$
Drift Method
Future values are the last observation plus the average change seen over time in the historical data, extrapolated over the forecast horizon $h$
$\hat{y}_{T+h} = y_T + h \cdot \frac{y_T - y_1}{T - 1}$
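The four basic methods are simple enough to write out directly. The sketch below is illustrative only: the function names and the toy quarterly series are assumptions, with `y` the history, `h` the horizon and `m` the seasonal period.

```python
def average_forecast(y, h):
    """Average method: every future value is the historical mean."""
    return [sum(y) / len(y)] * h

def naive_forecast(y, h):
    """Naive method: every future value is the last observation."""
    return [y[-1]] * h

def seasonal_naive_forecast(y, h, m):
    """Seasonal naive: reuse the last observed value of the same season.

    Implements y_{T+i-m(k+1)} with k = floor((i-1)/m), shifted to
    0-based list indexing.
    """
    T = len(y)
    return [y[T + i - m * ((i - 1) // m + 1) - 1] for i in range(1, h + 1)]

def drift_forecast(y, h):
    """Drift: last observation plus i times the average historical change."""
    slope = (y[-1] - y[0]) / (len(y) - 1)
    return [y[-1] + i * slope for i in range(1, h + 1)]

# Toy quarterly series (made-up numbers), two full years.
quarterly = [10, 20, 30, 40, 12, 22, 32, 42]
print(average_forecast(quarterly, 2))
print(naive_forecast(quarterly, 2))
print(seasonal_naive_forecast(quarterly, 5, m=4))
print(drift_forecast([2, 4, 6, 8], 2))
```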
BASIC FORECASTING METHOD
FORECASTING USING TIME SERIES DECOMPOSITION
Forecast Seasonally Adjusted Component
Any non-seasonal forecasting method, such as non-seasonal ARIMA or random walk with drift, can be used to forecast the seasonally adjusted component
Forecast Seasonal Component
Seasonal naïve is typically used to forecast the seasonal component, assuming the seasonal component doesn't change much over time
EXPONENTIAL SMOOTHING
Forecasts produced using exponential smoothing methods are weighted averages of past observations, with the weights decaying exponentially as the observations get older. In other words, the most recent observation has the highest weight and older observations have progressively lower weights
Simple Exponential Smoothing (SES)
▪ Used when the time series has no trend or seasonality, i.e. values hover around the mean
▪ Flat forecast, i.e. all forecast values take the same value
$\hat{y}_{T+1} = \alpha y_T + (1 - \alpha)\hat{y}_T$
$\hat{y}_{T+1} = \hat{y}_T + \alpha(y_T - \hat{y}_T)$
0 < α < 1 is the smoothing parameter
• The forecast for the new period is equal to the last smoothed (i.e. fitted) value plus an adjustment for the error between the last actual and fitted values
• The parameter is estimated by minimizing the sum of squared errors (SSE)
Let's see an example
Month    Sales (Billions)   Fitted (Alpha = 0.6)
Jan-17   97.60              -
Feb-17   95.10              97.60
Mar-17   90.30              96.10
Apr-17   92.50              92.62
May-17   94.60              92.55
Jun-17   91.00              93.78
Jul-17   90.20              92.11
Aug-17   93.00              90.96
Sep-17   93.80              92.19
Oct-17   97.00              93.15
Nov-17   99.00              95.46
Dec-17   90.00              97.58
Jan-18   90.10              93.03
Feb-18   94.20              91.27
Mar-18   95.00              93.03
Apr-18   96.70              94.21
May-18   98.00              95.70
Jun-18   99.00              97.08
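The fitted column of the example can be reproduced with the SES recursion. This is a minimal sketch assuming the first forecast is initialised with the first observation, which is what the table appears to do; the function name `ses_one_step` is made up for illustration.

```python
# Monthly sales (billions), Jan-17 to Jun-18, copied from the example table.
sales = [97.60, 95.10, 90.30, 92.50, 94.60, 91.00, 90.20, 93.00, 93.80,
         97.00, 99.00, 90.00, 90.10, 94.20, 95.00, 96.70, 98.00, 99.00]

def ses_one_step(y, alpha):
    """One-step-ahead SES forecasts.

    Element t is the forecast for y[t]; the final extra element is the
    forecast for the first period after the data ends.
    Assumption: the first forecast equals the first observation.
    """
    forecasts = [None, y[0]]  # nothing to forecast the first point with
    for t in range(1, len(y)):
        prev = forecasts[t]
        # new forecast = previous forecast + alpha * last forecast error
        forecasts.append(prev + alpha * (y[t] - prev))
    return forecasts

fitted = ses_one_step(sales, alpha=0.6)
print([None if f is None else round(f, 2) for f in fitted])
```

The last element of `fitted` is the 1-step-ahead forecast for the month after Jun-18.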
[Figure: SIMPLE EXPONENTIAL SMOOTHING - Actual vs. Predicted sales by month, Jan-17 to Jun-18 (y-axis from 84 to 100), with the 1-Step-Ahead Forecast marked]
Component form
▪ Forecast Equation
$\hat{y}_{T+h} = \ell_T$
▪ Smoothing Equation
$\ell_T = \alpha y_T + (1 - \alpha)\ell_{T-1}$
EXPONENTIAL SMOOTHING
HOLT'S METHOD
Holt's Linear Trend Method
▪ Used when the time series has a trend
▪ Two parameters are estimated: one for level and another for trend
Component form
▪ Forecast Equation
$\hat{y}_{T+h} = \ell_T + h b_T$
▪ Level Equation
$\ell_T = \alpha y_T + (1 - \alpha)(\ell_{T-1} + b_{T-1})$
▪ Trend Equation
$b_T = \beta(\ell_T - \ell_{T-1}) + (1 - \beta) b_{T-1}$
0 < α < 1 is the smoothing parameter for level
0 < β < 1 is the smoothing parameter for trend
Issues with model
▪ Forecasts keep increasing or decreasing indefinitely
▪ Method tends to over-forecast at longer forecast horizons
EXPONENTIAL SMOOTHING
HOLT'S METHOD
Holt's Damped Trend Method
▪ Another parameter is introduced to dampen the trend over time
▪ Three parameters are estimated: one for level, a second for trend and a third for damping the trend
Component form
▪ Forecast Equation
$\hat{y}_{T+h} = \ell_T + (\phi + \phi^2 + \dots + \phi^h) b_T$
▪ Level Equation
$\ell_T = \alpha y_T + (1 - \alpha)(\ell_{T-1} + \phi b_{T-1})$
▪ Trend Equation
$b_T = \beta(\ell_T - \ell_{T-1}) + (1 - \beta)\phi b_{T-1}$
0 < α < 1 is the smoothing parameter for level
0 < β < 1 is the smoothing parameter for trend
0 < φ < 1 is the damping parameter
▪ A damping parameter equal to 1 gives Holt's linear trend method
▪ Typically the value of the damping parameter is kept between 0.8 and 0.98
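The level and trend recursions can be sketched in a few lines. The function below is illustrative only: the name `holt_damped`, the initial values (level = first observation, trend = first difference) and the parameter values in the example are assumptions, not taken from the slides. Setting `phi=1.0` reduces it to Holt's linear trend method.

```python
def holt_damped(y, alpha, beta, phi=1.0, h=1):
    """Holt's (damped) trend method, component form.

    Assumed initialisation: level = y[0], trend = y[1] - y[0].
    Returns the forecasts for horizons 1..h.
    """
    level, trend = y[0], y[1] - y[0]
    for obs in y[1:]:
        prev_level = level
        level = alpha * obs + (1 - alpha) * (prev_level + phi * trend)
        trend = beta * (level - prev_level) + (1 - beta) * phi * trend
    forecasts, damp_sum = [], 0.0
    for i in range(1, h + 1):
        damp_sum += phi ** i  # phi + phi^2 + ... + phi^i
        forecasts.append(level + damp_sum * trend)
    return forecasts

series = [1, 2, 3, 4, 5]  # toy data with a perfectly linear trend
linear = holt_damped(series, alpha=0.8, beta=0.2, phi=1.0, h=3)
damped = holt_damped(series, alpha=0.8, beta=0.2, phi=0.9, h=3)
print(linear)  # keeps growing by the full trend each step
print(damped)  # increments shrink as the horizon grows
```

On this toy series the undamped forecasts continue the straight line, while the damped forecasts flatten out, which is the behaviour the damping parameter is meant to produce.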
EXPONENTIAL SMOOTHING
HOLT-WINTERS' METHOD
Holt-Winters' Method
▪ Used to handle seasonality along with trend in the time series
▪ Three parameters are estimated: one for level, a second for trend and a third for seasonality
Component form (Additive)
▪ Forecast Equation
$\hat{y}_{T+h} = \ell_T + h b_T + s_{T+h-m(k+1)}$
▪ Level Equation
$\ell_T = \alpha(y_T - s_{T-m}) + (1 - \alpha)(\ell_{T-1} + b_{T-1})$
▪ Trend Equation
$b_T = \beta(\ell_T - \ell_{T-1}) + (1 - \beta) b_{T-1}$
▪ Season Equation
$s_T = \gamma(y_T - \ell_{T-1} - b_{T-1}) + (1 - \gamma) s_{T-m}$
0 < α < 1 is the smoothing parameter for level
0 < β < 1 is the smoothing parameter for trend
0 < γ < 1 is the smoothing parameter for season
If seasonality is multiplicative, the multiplicative form of HW should be used
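A compact sketch of the additive recursions follows. The initialisation is an assumption chosen for simplicity (level = mean of the first season, trend = average per-period change between the first two seasons, seasonal terms = first season minus its mean); the function name and the toy data are also made up, and at least two full seasons of data are required.

```python
def holt_winters_additive(y, m, alpha, beta, gamma, h):
    """Additive Holt-Winters, component form (illustrative sketch).

    Assumed initial values: level = mean of first season,
    trend = (mean of 2nd season - mean of 1st season) / m,
    seasonal terms = first season minus its mean.
    """
    level = sum(y[:m]) / m
    trend = (sum(y[m:2 * m]) - sum(y[:m])) / (m * m)
    season = [y[i] - level for i in range(m)]  # s_1 .. s_m
    for t in range(m, len(y)):
        prev_level, prev_trend = level, trend
        s_old = season[t - m]  # seasonal term from one season ago
        level = alpha * (y[t] - s_old) + (1 - alpha) * (prev_level + prev_trend)
        trend = beta * (level - prev_level) + (1 - beta) * prev_trend
        season.append(gamma * (y[t] - prev_level - prev_trend)
                      + (1 - gamma) * s_old)
    T = len(y)
    # s_{T+i-m(k+1)} with k = floor((i-1)/m), shifted to 0-based indexing
    return [level + i * trend + season[T + i - m * ((i - 1) // m + 1) - 1]
            for i in range(1, h + 1)]

# Toy quarterly data: a pure seasonal pattern with no trend.
toy = [10, 20, 30, 40] * 3
hw_forecast = holt_winters_additive(toy, m=4, alpha=0.5, beta=0.3,
                                    gamma=0.2, h=4)
print(hw_forecast)
```

On the toy series the forecasts simply repeat the seasonal pattern, since there is no trend and no noise to smooth.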
EXPONENTIAL SMOOTHING
HOLT-WINTERS' METHOD
Damping is also possible for both the additive and multiplicative HW methods
EXPONENTIAL SMOOTHING
INNOVATIONS - EXPONENTIAL SMOOTHING MODEL
ETS (Error, Trend, Seasonal) Model
▪ These provide prediction intervals along with point forecasts
▪ Errors can be additive or multiplicative
The forecast is equal to the previous period's forecast plus an adjustment for the error associated with it
$\hat{y}_{T+1} = \hat{y}_T + \alpha(y_T - \hat{y}_T)$
$\hat{y}_{T+1} = \hat{y}_T + \alpha e_T$, where $e_T = y_T - \hat{y}_T$
Error - Additive
Trend \ Seasonality   N     A     M
N                     NN    NA    NM
A                     AN    AA    AM
Ad                    AdN   AdA   AdM
Error - Multiplicative
Trend \ Seasonality   N     A     M
N                     NN    NA    NM
A                     AN    AA    AM
Ad                    AdN   AdA   AdM
(N = none, A = additive, M = multiplicative, Ad = additive damped)
▪ AIC, AICc and BIC values can be used for model selection
▪ ETS in R (forecast package) estimates the model parameters rather than producing the forecast directly
▪ The estimated parameters can then be used to forecast using the forecast() function
ARIMA MODEL
ACF AND PACF PLOT
Autocorrelation
▪ It is a measure of the linear relationship between the current and lagged observations
▪ The graphical representation of the autocorrelation coefficients is called the ACF or correlogram
Partial Autocorrelation
▪ It is a measure of the correlation between the current and lagged observations after excluding the influence of the intermediate observations
▪ The graphical representation of the partial autocorrelation coefficients is called the PACF
For a series with trend only, the ACF will gradually move towards zero
For a series with trend and seasonality, peaks will be observed at the seasonal lags - more like waves
For a series with trend only, the PACF will suddenly drop to zero
For a series with trend and seasonality, the PACF will show peaks at the seasonal lags
[Figures: example ACF plot and example PACF plot]
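The autocorrelation coefficients plotted in an ACF can be computed by hand. The sketch below uses the usual sample estimator (lagged cross-products of deviations from the mean, divided by the total sum of squared deviations); the function name `acf` and the toy series are chosen for illustration.

```python
def acf(y, max_lag):
    """Sample autocorrelation coefficients r_0 .. r_max_lag."""
    n = len(y)
    mean = sum(y) / n
    denom = sum((v - mean) ** 2 for v in y)
    return [
        sum((y[t] - mean) * (y[t - k] - mean) for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]

# Toy trending series: the coefficients fall away as the lag grows.
r = acf([1, 2, 3, 4, 5], max_lag=2)
print([round(v, 2) for v in r])
```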
ARIMA MODEL
MAKING TIME SERIES STATIONARY
Stationary Time Series
▪ A stationary time series doesn't have any predictable pattern in the long term
▪ Note that cycles are aperiodic; hence, a time series with cycles but no trend or seasonality is considered stationary
Differencing
▪ Differencing helps stabilize the mean of a time series by removing changes in its level
▪ This in turn removes (or reduces) trend and seasonality in the data
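Differencing is a one-line operation. A minimal sketch, with a made-up helper name and a toy quarterly series that has both a trend and a seasonal pattern: a seasonal difference (lag 4) removes the seasonal pattern, and a further first difference removes what is left of the trend.

```python
def difference(y, lag=1):
    """Return y_t - y_{t-lag}; lag=1 is first-order, lag=m is seasonal."""
    return [y[t] - y[t - lag] for t in range(lag, len(y))]

# Toy quarterly series: seasonal pattern 10/20/30/40 rising by 4 each year.
quarterly = [10, 20, 30, 40, 14, 24, 34, 44, 18, 28, 38, 48]

seasonal_diff = difference(quarterly, lag=4)  # seasonal differencing
both_diff = difference(seasonal_diff)         # then first-order differencing
second_order = difference(difference([1, 3, 6, 10]))  # second-order example

print(seasonal_diff)
print(both_diff)
print(second_order)
```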
[Figures: Non-Stationary to Stationary. Panels: Non-Stationary Time Series; ACF of the original series (non-stationarity can be validated from the ACF plot); Time Series post First Order Differencing; ACF of First Order Differenced Time Series]
▪ A Ljung-Box test can be run to check whether the original or transformed series still shows autocorrelation (null hypothesis: no autocorrelation, i.e. the series behaves like white noise); a unit root test such as KPSS tests stationarity directly (null hypothesis: the series is stationary)
▪ Second-order differencing and seasonal differencing can also be done
▪ Differencing can also be done after a log transformation
ARIMA MODEL
AUTO REGRESSIVE INTEGRATED MOVING AVERAGE
Auto Regressive Model
Future values are forecast using a linear combination of past values of the series
$y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + \varepsilon_t$
The above equation is referred to as an AR(p) model
Moving Average Model
Future values are forecast using a linear combination of past forecast errors
$y_t = c + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \dots + \theta_q \varepsilon_{t-q} + \varepsilon_t$
The above equation is referred to as an MA(q) model
ARIMA Integration
Opposite of differencing
A stationary AR(p) model can be expressed as an MA(∞) model; being able to express an MA(q) model as an AR(∞) model is called invertibility. Along with stationarity, invertibility is an important property of an ARIMA model
Values of p and q can be identified using ACF and PACF plots
For an ARIMA(p,d,0) model, ACF and PACF plots of the differenced data would show:
▪ ACF exponentially decaying or sinusoidal
▪ Significant spike at lag p in the PACF, but none beyond lag p
For an ARIMA(0,d,q) model, ACF and PACF plots of the differenced data would show:
▪ PACF exponentially decaying or sinusoidal
▪ Significant spike at lag q in the ACF, but none beyond lag q
If both p and q are positive, then the plots don't help in finding the values of p and q
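To make the AR(p) equation concrete, the sketch below iterates it forward with already-known coefficients. It is only an illustration: the function name and the coefficient values are made up, future error terms are set to their expected value of zero, and estimating the coefficients from data is left to a statistics package.

```python
def ar_forecast(history, c, phi, h):
    """Iterate y_t = c + phi_1*y_{t-1} + ... + phi_p*y_{t-p} for h steps.

    `phi` lists the coefficients phi_1..phi_p; `history` must contain at
    least p observations. Future errors are replaced by zero.
    """
    values = list(history)
    for _ in range(h):
        nxt = c + sum(p * values[-i] for i, p in enumerate(phi, start=1))
        values.append(nxt)
    return values[len(history):]

# AR(1) with c = 1 and phi_1 = 0.5: forecasts decay towards c / (1 - phi_1) = 2.
ar1 = ar_forecast([4.0], c=1.0, phi=[0.5], h=3)
# AR(2) example with made-up coefficients.
ar2 = ar_forecast([1.0, 2.0], c=0.0, phi=[0.5, 0.25], h=1)
print(ar1, ar2)
```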
ARIMA MODEL
NON-SEASONAL ARIMA MODEL
Looks like ARIMA(3,0,0) model
AIC = 340.3 (Better Model) AIC = 342.3
ARIMA MODEL
Seasonal ARIMA Model: ARIMA (p, d, q) (P, D, Q)m
(p, d, q): Non Seasonal Part of the Model
(P, D, Q)m: Seasonal Part of the Model
(p, d, q) are estimated using ACF and PACF plots as discussed previously
(P, D, Q) are estimated by observing peaks at the seasonal lags in the ACF and PACF plots. For example, an ARIMA(0,0,0)(0,0,1)12 model will show:
• A spike at lag 12 of the ACF plot but no other significant peaks
• Exponential decay in the seasonal lags of the PACF plot
[Figures: (1) Data with trend and seasonality; (2) Differencing at the seasonal lag: non-stationary after seasonal differencing; (3) Differencing the seasonally differenced data: stationary after differencing the seasonal lag difference]
▪ With a peak at lag 1 in the ACF plot, a non-seasonal MA(1) component can be assumed, whereas the peak at lag 4 suggests a seasonal MA(1) component
▪ Therefore, the model could be ARIMA(0,1,1)(0,1,1)4
▪ Taking the PACF plot as reference, ARIMA(1,1,0)(1,1,0)4 can also be created
ARIMA MODEL
SEASONAL ARIMA MODEL
[Figures: (1) Data with trend and multiplicative seasonality; (2) Log transformation to make seasonality consistent; (3) Differencing logged data at seasonal lag of 12]
▪ The PACF plot has peaks at seasonal lags 12 and 24; hence, a seasonal AR(1) model can be created
▪ The PACF plot also has peaks at lags 1, 2 and 3; hence, a non-seasonal AR(3) model can be created
▪ The ACF plot does not suggest any additional terms
▪ Therefore, an ARIMA(3,0,0)(1,1,0)12 model is created
ARIMA MODEL
SEASONAL ARIMA MODEL
Seasonal ARIMA based on observations made from the ACF and PACF plots: AIC = -464.48
Seasonal ARIMA using the auto.arima function in R: AIC = -486.99
Observations to be made
1) Time series residuals look like white noise; 2) No autocorrelation in residuals as per the ACF plot; 3) Residuals follow a bell curve
ARIMAX MODEL
ARIMAX / DYNAMIC TIME SERIES MODEL
ARIMAX is the extended version of the ARIMA model which also includes external regressors in the model
[Figures: Daily electricity demand and max temperature time series; Quadratic relationship b/w Demand and temperature; Assessing the model fit; Forecast for next 14 weeks]
Future values of the independent variables are required to get the forecast; these can also be modeled separately
CROSTON MODEL
Application
▪ Extensively used to model intermittent demand time series (series with multiple zero values)
Croston Decomposition
▪ Demand Series (q): series of the non-zero values
▪ Inter-Arrival Time Series (a): series of the number of periods between non-zero values
Forecasting Model
▪ Simple exponential smoothing (SES) is applied to both q and a
$\hat{y}_{T+h} = \frac{q_{j+1}}{a_{j+1}}$, where $q_{j+1}$ and $a_{j+1}$ are the SES forecasts of the two series
j is the time of the last observed positive value
Croston Decomposition into Demand and Inter-arrival time series
Year  Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
1     0    2    0    1    0    11   0    0    0    0    2    0
2     6    3    0    0    0    0    0    7    0    0    0    0
3     0    0    0    3    1    0    0    1    0    1    0    0

i     1    2    3    4    5    6    7    8    9    10   11
q     2    1    11   2    6    3    7    3    1    1    1
a     2    2    2    5    2    1    6    8    1    3    2
Croston model
Prediction intervals are not generated in the Croston method
Doesn't give a point forecast for an individual period; the forecast value is the average forecast per period
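The decomposition and the ratio forecast can be sketched as follows. This is an illustrative sketch: the function names are made up, the smoothing parameter `alpha=0.1` is an arbitrary choice, and each SES level is initialised with the first value of its series (an assumption, since the slides do not state the initialisation).

```python
# Monthly demand for three years, read row by row from the example table.
demand = [0, 2, 0, 1, 0, 11, 0, 0, 0, 0, 2, 0,
          6, 3, 0, 0, 0, 0, 0, 7, 0, 0, 0, 0,
          0, 0, 0, 3, 1, 0, 0, 1, 0, 1, 0, 0]

def croston_decompose(y):
    """Split y into non-zero demands q and inter-arrival times a."""
    q, a, last = [], [], 0
    for period, value in enumerate(y, start=1):
        if value != 0:
            q.append(value)
            a.append(period - last)  # periods since the previous demand
            last = period
    return q, a

def ses_level(series, alpha):
    """Final SES level, i.e. the forecast of the next value.

    Assumption: the level is initialised with the first value.
    """
    level = series[0]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level

def croston_forecast(y, alpha=0.1):
    """Average demand per period: SES forecast of q over SES forecast of a."""
    q, a = croston_decompose(y)
    return ses_level(q, alpha) / ses_level(a, alpha)

q, a = croston_decompose(demand)
print(q)
print(a)
print(croston_forecast(demand))
```

The printed `q` and `a` lists match the rows of the decomposition table.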
MODEL SELECTION
MODEL FORECAST ACCURACY ASSESSMENT
Forecast error is NOT the same as residuals. Forecast errors are calculated on the test dataset whereas residuals are calculated on the training dataset
$e_{T+h} = y_{T+h} - \hat{y}_{T+h}$
Typically used accuracy metrics
Root Mean Squared Error (RMSE)
$\text{RMSE} = \sqrt{\text{mean}(e_t^2)}$
▪ More difficult to calculate and interpret
▪ A forecast method minimizing RMSE will lead to forecasts of the mean
Mean Absolute Error (MAE)
$\text{MAE} = \text{mean}(|e_t|)$
▪ Easy to calculate and interpret
▪ A forecast method minimizing MAE will lead to forecasts of the median
RMSE and MAE can't be used to compare multiple time series with different units, as these metrics are scale-dependent (they are expressed in the units of the data)
Mean Absolute Percentage Error (MAPE)
$\text{MAPE} = \text{mean}(|p_t|)$, where $p_t = 100 e_t / y_t$
▪ Suffers from being infinite or undefined for $y_t = 0$, and extreme for very small values of $y_t$
Mean Absolute Scaled Error (MASE)
$\text{MASE} = \frac{\text{MAE}}{\text{Training MAE}}$, where Training MAE is the MAE of the naïve forecast on the training dataset
▪ Scaled alternative to MAPE
▪ A value less than 1 means the forecast is better than the naïve forecast, and vice versa
MAPE and MASE can be used when multiple time series are available with different units
At times different accuracy metrics may show different models to be best. In such a scenario, select the model based on business context
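The four metrics can be written out in a few lines. This is a minimal sketch with made-up numbers; `mase` here scales by the in-sample MAE of the one-step naïve forecast on the training data (with `m=1`; a seasonal naïve scaling would use the seasonal period for `m`).

```python
import math

def rmse(actual, forecast):
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast))
                     / len(actual))

def mae(actual, forecast):
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def mape(actual, forecast):
    # Undefined when any actual value is zero.
    return sum(abs(100 * (a - f) / a) for a, f in zip(actual, forecast)) \
        / len(actual)

def mase(actual, forecast, train, m=1):
    # Scale: MAE of the (seasonal) naive forecast on the training data.
    scale = sum(abs(train[t] - train[t - m]) for t in range(m, len(train))) \
        / (len(train) - m)
    return mae(actual, forecast) / scale

train = [8, 10, 9, 12]   # made-up training data
actual = [10, 20, 30]    # made-up test data
forecast = [12, 18, 33]  # made-up forecasts for the test period

print(rmse(actual, forecast), mae(actual, forecast),
      mape(actual, forecast), mase(actual, forecast, train))
```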
MODEL SELECTION
MODEL FORECAST ACCURACY ASSESSMENT
Traditional Test and Train Sampling
▪ Segregate data into train and test samples
▪ Typically the split between train and test is taken as 80-20% or 75-25% of observations respectively
▪ Model is trained on the full training dataset and forecast accuracy is measured on the test dataset
Train Sample (In-Sample) followed by Test Sample (Held-out Sample)
Time Series Cross Validation
▪ Multiple test sets are created with one observation each
▪ All the observations before the test observation form the training set
▪ Model is trained on each training set
▪ Model forecast accuracy is measured by averaging the error over the different test sets
▪ The accuracy values generated are more robust and therefore better suited to being communicated
A good way to choose the best forecasting model is to find the model with the smallest RMSE computed using time series cross-validation
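A rolling-origin version of this procedure can be sketched as below. The function name, the `min_train` parameter (smallest training set size, not specified on the slide) and the toy series are assumptions for illustration; any function that maps a training list to a one-step forecast can be passed in as `forecaster`.

```python
import math

def ts_cv_rmse(y, forecaster, min_train=3):
    """Time series cross-validation with one-observation test sets.

    For each test point y[i], the forecaster sees only y[:i].
    `min_train` is an assumed minimum training size.
    """
    squared_errors = []
    for i in range(min_train, len(y)):
        train, test_value = y[:i], y[i]
        squared_errors.append((test_value - forecaster(train)) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def naive(train):
    return train[-1]               # last observation

def average(train):
    return sum(train) / len(train)  # historical mean

series = [1, 2, 3, 4, 5, 6]  # toy trending series
naive_rmse = ts_cv_rmse(series, naive)
average_rmse = ts_cv_rmse(series, average)
print(naive_rmse, average_rmse)
```

On this trending toy series the naïve method gets the smaller cross-validated RMSE, so it would be preferred over the average method under the selection rule above.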