Forecasting Time Series With R - Dataiku
8/18/2019 Forecasting Time Series With R - Dataiku
HOWTO
Forecasting time series with R
July 22, 2015
Forecasting Time Series Data With R And Data Science Studio
Do you day-trade stocks? Monitor humidity in the Amazon rainforest? Predict
weekly orange production in the Florida Keys? If so, you're using time series!
A time series is a sequence of measurements of the same variable taken at
regular intervals. Time series occur everywhere in data science. R has several
great packages that are built specifically to handle time series data.
This tutorial walks through a time series analysis in R using Data Science Studio.
I'm going to show you how to explore time series data, choose an appropriate
modeling method and deploy the model in DSS. Let's get started!
Preparing The Data
I'm using a dataset with the monthly totals for international airline passengers
(https://datamarket.com/data/set/22u3/international-airline-passengers-monthly-totals-in-thousands-jan-49-dec-60) provided by DataMarket
(https://datamarket.com/). When I upload the data into DSS, it automatically
recognizes the Month column as a date that needs parsing. Pretty cool.
Forecasting time series with R - Dataiku https://www.dataiku.com/learn/guide/code/r/time_series.html
A simple preparation step can convert this date to the standard format. See our
documentation on dates (https://doc.dataiku.com/dss/latest/preparation/dates.html) for more information on this step.
©Dataiku 2012-2016 - Legal Notice (/legal-notice.html)(/)
Great! Now our data is cleaned and ready for analysis.
Plotting
First, I'm going to create a chart to get a feel for the data. To do this, I click on
international_airline_passengers_cleaned and then the Analyse icon.
Then I'm going to click on charts at the top, and drag Month_parsed into the field
for the x-axis and International airline passengers into the y-axis. After a bit of
tweaking, we have the line chart shown below.
We see two really interesting patterns. First, there's a general upward trend in
the number of passengers. Second, there is a yearly cycle with the lowest
number of passengers occurring around the new year and the highest number of
passengers during the late summer. Let's see if we can use these patterns to
forecast the number of passengers after 1960.
Interactive Analysis With R
To start a notebook, I go back to the flow, click on the
international_airline_passengers_cleaned dataset, click on the R icon and then
click "Notebook Interactive visualisation and analysis of your data".
DSS will then open an R notebook with some basic starter code already filled
in.
Sweet. Now that we have an R notebook, I'm going to stop those screen shots
and just show the code. You can type the following code into the
notebook for interactive analysis.
First, I'm going to load the R libraries that we need for this analysis. The dataiku
library lets us read and write datasets to DSS. The forecast library has the
functions we need for training models to predict time series. The dplyr package
has functions for manipulating data.frames.
library(dataiku)
library(forecast)
library(dplyr)
Then, I'm going to load the data into R from DSS
.+ 9: ;1?@(A
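A minimal sketch of this step, assuming the prepared dataset ends up in a data.frame with one numeric passenger column. To keep it runnable outside DSS, it uses R's built-in `AirPassengers` series (the same 1949-1960 monthly airline totals) as a stand-in. The names `df` and `passengers` are my own; inside DSS the data.frame would instead come from the dataiku package's dataset reader. Only `ts_passengers` is the name used in the rest of this tutorial.

```r
# Stand-in for the DSS dataset (assumption): the built-in AirPassengers
# series, flattened into a plain data.frame with one numeric column.
df = data.frame(passengers = as.numeric(AirPassengers))

# Build a monthly time series that starts in January 1949.
# frequency = 12 tells R that there are 12 observations per year.
ts_passengers = ts(df$passengers, start = c(1949, 1), frequency = 12)

# Quick visual check of the trend and the yearly cycle.
plot(ts_passengers)
```

Setting `frequency = 12` matters: the forecasting functions use it to look for a yearly seasonal pattern.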
Excellent. We have our time series. It's time to start modeling!
Choosing a forecasting model
I'm going to try three different forecasting methods and deploy the best to DSS.
In general, it's good practice to try several different modeling methods and go
with whichever provides the best performance.
Model 1: Exponential Smoothing (ETS)
The ets() function in the forecast package fits exponential smoothing state space
(ETS) models. This function automatically optimizes the choice of model and
necessary parameters. All you have to do is provide it with a time series.
Let's use it and then make a forecast for the next 24 months.
m_ets = ets(ts_passengers)
f_ets = forecast(m_ets, h=24) # forecast 24 months into the future
plot(f_ets)
Looking good! The forecast is shown in blue with the grey area representing a
95% prediction interval. Just by looking, we see that the forecast roughly
matches the historical pattern of the data.
Model 2: ARIMA
The auto.arima() function provides another modeling method. More info on the
ARIMA model can be found here (https://en.wikipedia.org/wiki/Autoregressive_integrated_moving_average). The auto.arima() function
automatically searches for the best model and optimizes the parameters. Using
auto.arima() is almost always better than calling the …
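The ARIMA fit follows the same three-step pattern as the other two models. Here is a minimal sketch, assuming the model object is called `m_aa` (my guess at the name; `f_aa` is the forecast object used later in this tutorial) and using the built-in `AirPassengers` series as a stand-in for the DSS data:

```r
library(forecast)

# Stand-in for the tutorial's series (assumption): built-in airline data.
ts_passengers = AirPassengers

# Let auto.arima() search over candidate ARIMA orders and fit the best one.
m_aa = auto.arima(ts_passengers)

# Forecast 24 months into the future and plot the result.
f_aa = forecast(m_aa, h = 24)
plot(f_aa)
```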
Great! These prediction intervals seem a bit smaller than those for the ETS
model. Maybe this is because of a better fit to the data, but let's train a third
model before doing a model comparison.
Model 3: TBATS
The last model I'm going to train is a TBATS model. This model is designed for
use when there are multiple cyclic patterns (e.g. daily, weekly and yearly
patterns) in a single time series. Maybe it will be able to detect complicated
patterns in our time series.
m_tbats = tbats(ts_passengers)
f_tbats = forecast(m_tbats, h=24)
plot(f_tbats)
Now we have three models that all seem to give reasonable predictions. Let's
compare them to see which is performing the best.
Model comparison
I'm going to use AIC (https://en.wikipedia.org/wiki/Akaike_information_criterion)
to compare the different models. AIC is a common method for determining how well
a model fits the data, while penalizing more complex models. The model with the
smallest AIC is the best fitting model.
barplot(c(ETS=m_ets$aic,
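A complete version of this comparison might look like the sketch below. It assumes the ARIMA model object is named `m_aa` and relies on how the forecast package stores the criterion: lowercase `aic` on ets and auto.arima fits, uppercase `AIC` on tbats fits. The built-in `AirPassengers` series stands in for the DSS data, and the `aic` vector name is mine.

```r
library(forecast)

# Stand-in for the tutorial's series (assumption): built-in airline data.
ts_passengers = AirPassengers

# Fit the three candidate models.
m_ets   = ets(ts_passengers)
m_aa    = auto.arima(ts_passengers)
m_tbats = tbats(ts_passengers)

# Collect the AIC of each model; note the uppercase field on the tbats fit.
aic = c(ETS = m_ets$aic, ARIMA = m_aa$aic, TBATS = m_tbats$AIC)

# One bar per model: the shortest bar is the preferred model.
barplot(aic, ylab = "AIC")

# Name of the model with the smallest AIC.
names(which.min(aic))
```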
We see that the ARIMA model performs the best. So, let's go ahead and turn our
interactive R code into an R recipe that can be built into our DSS workflow.
But before we can do this, we have to turn the output of forecast() into a
data.frame, so that we can store it in DSS.
First, I'm going to find the last date for which we have a measurement.
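# time() gives each observation's timestamp as a fractional year
# (same result as zoo's index(), without needing zoo attached)
last_date = time(ts_passengers)[length(ts_passengers)]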
Then, I'm going to create a data.frame with the prediction for each month. I'm
also going to include the lower and upper bounds of the predictions, and the
date. Since we're representing dates by the year, each month is 1/12 of a year.
forecast_df = data.frame(passengers_predicted=f_aa$mean,
                         passengers_lower=f_aa$lower[,2],
                         passengers_upper=f_aa$upper[,2],
                         date=last_date + seq(1/12, 2, by=1/12))
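The `[,2]` index picks the second column of the lower and upper bound matrices. Here is a short sketch of how to check which interval level that column holds and what the fractional-year dates look like. It assumes the forecast was made with the package's default levels and uses the built-in `AirPassengers` series as a stand-in; `forecast_dates` is a name of my own.

```r
library(forecast)

# Stand-in for the tutorial's series (assumption): built-in airline data.
ts_passengers = AirPassengers
f_aa = forecast(auto.arima(ts_passengers), h = 24)

# Each column of $lower and $upper corresponds to one interval level.
f_aa$level            # levels used for the intervals
colnames(f_aa$lower)  # column labels, in the same order as the levels

# Forecast timestamps as fractional years, one step of 1/12 per month.
forecast_dates = as.numeric(time(f_aa$mean))
head(forecast_dates, 3)
```

With the default levels, the second column is the wider 95% interval, which is why the code indexes `[,2]`.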
Finally, we split the date column into separate columns for year and month.
forecast_df = forecast_df %>%
  mutate(year=floor(date)) %>%
  mutate(month=round(((date %% 1) * 12) + 1))
All together the code is
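# time() gives each observation's timestamp as a fractional year
last_date = time(ts_passengers)[length(ts_passengers)]
forecast_df = data.frame(passengers_predicted=f_aa$mean,
                         passengers_lower=f_aa$lower[,2],
                         passengers_upper=f_aa$upper[,2],
                         date=last_date + seq(1/12, 2, by=1/12))
forecast_df = forecast_df %>%
  mutate(year=floor(date)) %>%
  # round() guards against floating-point noise in the fractional month
  mutate(month=round(((date %% 1) * 12) + 1))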
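Fractional years are floating-point numbers, so a sum such as 1960 + 11/12 + 1/12 can land a hair below 1961, and floor() would then report the wrong year. Here is a minimal sketch of one way to sidestep that by counting months as integers. The names `last_index`, `k` and `dates_df` are mine, not part of the tutorial, and the last observation is hard-coded as December 1960.

```r
# Last observed date as a fractional year: December 1960.
last_date = 1960 + 11/12
h = 24  # forecast horizon in months

# Convert the fractional year to a whole number of months since year 0.
# round() absorbs any floating-point noise in the fraction.
last_index = round(last_date * 12)

# Month counters for the h forecast steps that follow the last observation.
k = last_index + seq_len(h)

# Integer division gives the year, the remainder gives the month (1-12).
dates_df = data.frame(year  = k %/% 12,
                      month = k %% 12 + 1)
head(dates_df, 3)
```

The resulting `year` and `month` columns can be attached to the forecast data.frame in place of the ones derived from the fractional date.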
Awesome! Now we have everything we need to deploy the model onto DSS: the
code to create the forecast for the next 24 months and the code to convert the
result into a data.frame.
Deploying The Model In DSS
To deploy our model, we need to create a new R recipe. To do this, click on
international_airline_passengers_cleaned, then click on the R icon on the right, then
click on "Recipe Create new datasets using R code".
I'm going to create a new managed dataset, forecast, for the output of my
recipe.
That's it! Now we can click on run at the bottom of the page and return to the
DSS flow where we see our newly created forecast dataset.
Clicking on the forecast dataset lets us look at our new predictions stored as a
DSS dataset.
If you thought this was helpful, check out our other tutorials
(http://learn.dataiku.com/).