Forecasting Time Series With R - Dataiku

  • 8/18/2019 Forecasting Time Series With R - Dataiku

    HOWTO

    Forecasting time series with R

    July 22, 2015

    Forecasting Time Series Data With R And Data Science Studio

    Do you day-trade stocks? Monitor humidity in the Amazon rainforest? Predict
    weekly orange production in the Florida Keys? If so, you're using time series!

    A time series is a sequence of measurements of the same variable taken at
    regular intervals. Time series occur everywhere in data science, and R has
    several great packages that are built specifically to handle them.

    This tutorial walks through a time series analysis in R using Data Science Studio.

    I'm going to show you how to explore time series data, choose an appropriate

    modeling method and deploy the model in DSS. Let's get started!

    Preparing The Data

    I'm using a dataset with the monthly totals for international airline passengers
    (https://datamarket.com/data/set/22u3/international-airline-passengers-monthly-totals-in-thousands-jan-49-dec-60)
    provided by DataMarket (https://datamarket.com/). When I upload the data into DSS,
    it automatically recognizes the Month column as a date that needs parsing. Pretty cool.

    Forecasting time series with R - Dataiku https://www.dataiku.com/learn/guide/code/r/time_series.html

    A simple preparation step can convert this date to the standard format. See our

    documentation on dates (https://doc.dataiku.com/dss/latest/preparation/dates.html) for more information on this step.
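For readers who want to see the same idea outside DSS, here is a minimal sketch of the date parsing in base R. It assumes the raw Month values look like "1949-01" (year and month only); the vector `month_raw` and its contents are hypothetical stand-ins, not taken from the dataset itself.

```r
# Hypothetical raw values, assumed to be in "YYYY-MM" form.
month_raw <- c("1949-01", "1949-02", "1960-12")

# as.Date() needs a day component, so append the first day of each month.
month_parsed <- as.Date(paste0(month_raw, "-01"), format = "%Y-%m-%d")

# Render the parsed dates as ISO 8601 strings.
month_iso <- format(month_parsed, "%Y-%m-%d")
print(month_iso)
```

Pinning every observation to the first day of the month is just one convention; any fixed day works as long as it is applied consistently.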

    ©Dataiku 2012-2016 - Legal Notice (/legal-notice.html)(/)

    Great! Now our data is cleaned and ready for analysis.

    Plotting

    First, I'm going to create a chart to get a feel for the data. To do this, I click on
    international_airline_passengers_cleaned and then the Analyse icon.

    Then I'm going to click on Charts at the top, and drag Month_parsed into the field
    for the x-axis and International airline passengers into the y-axis. After a bit of
    tweaking, we have the line chart shown below.

    We see two really interesting patterns. First, there's a general upward trend in

    the number of passengers. Second, there is a yearly cycle with the lowest

    number of passengers occurring around the new year and the highest number of
    passengers during late summer. Let's see if we can use these patterns to
    forecast the number of passengers after 1960.
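The trend and the yearly cycle can also be separated numerically. The sketch below uses `decompose()` from base R on the built-in `AirPassengers` series, which appears to be the same classic airline dataset (monthly totals, 1949 to 1960). Using it as a stand-in for the DSS dataset, and choosing a multiplicative decomposition, are assumptions made for illustration.

```r
# AirPassengers ships with base R (datasets package): monthly totals, 1949-1960.
passengers <- AirPassengers

# Multiplicative decomposition, since the seasonal swings grow with the level.
parts <- decompose(passengers, type = "multiplicative")

# One seasonal factor per calendar month (above 1 means busier than the trend).
seasonal_factors <- setNames(round(parts$figure, 2), month.abb)
print(seasonal_factors)

# Compare the average trend level at the start and at the end of the record.
# The moving-average trend is NA at both edges, so drop those values first.
trend <- as.numeric(na.omit(parts$trend))
print(c(first_year = mean(head(trend, 12)), last_year = mean(tail(trend, 12))))

# Four panels: observed, trend, seasonal, random.
plot(parts)
```

The seasonal factors give one number per month, which makes it easy to check which months sit above or below the trend.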

    Interactive Analysis With R

    To start a notebook, I go back to the flow, click on the
    international_airline_passengers_cleaned dataset, click on the R icon and then
    click "Notebook Interactive visualisation and analysis of your data".

    DSS will then open an R notebook with some basic starter code already filled
    in.

    Sweet. Now that we have an R notebook, I'm going to stop with the screenshots
    and just show the code. You can type the following code into the IPython
    notebook for interactive analysis.

    First, I'm going to load the R libraries that we need for this analysis. The dataiku
    library lets us read and write datasets to DSS. The forecast library has the
    functions we need for training models to predict time series. The dplyr package
    has functions for manipulating data.frames.

    library(dataiku)
    library(forecast)
    library(dplyr)

    Then, I'm going to load the data into R from DSS

    .+ 9: ;1?@(A
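Once the data is in R, the passenger counts need to become a time series object before any model can be fit. Here is a minimal sketch of that step, assuming monthly data starting in January 1949. The data frame `ds` is rebuilt from the built-in `AirPassengers` series as a stand-in for the DSS dataset, and the column name `passengers` is hypothetical.

```r
# Stand-in for the data frame loaded from DSS; the column name is hypothetical.
ds <- data.frame(passengers = as.numeric(AirPassengers))

# Monthly data starting in January 1949: frequency = 12 observations per year.
ts_passengers <- ts(ds$passengers, start = c(1949, 1), frequency = 12)

print(start(ts_passengers))
print(end(ts_passengers))
plot(ts_passengers)
```

Setting `frequency = 12` is what tells the modeling functions that the seasonal cycle repeats every twelve observations.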

    Excellent. We have our time series. It's time to start modeling!

    Choosing a forecasting model

    I'm going to try three different forecasting methods and deploy the best to DSS.
    In general, it's good practice to try several different modeling methods and go
    with whichever provides the best performance.

    Model 1: Exponential Smoothing (ETS)

    The ets() function in the forecast package fits exponential smoothing state space
    (ETS) models. This function automatically optimizes the choice of model and
    necessary parameters. All you have to do is provide it with a time series.

    Let's use it and then make a forecast for the next 24 months.

    m_ets = ets(ts_passengers)
    f_ets = forecast(m_ets, h=24) # forecast 24 months into the future
    plot(f_ets)

    Looking good! The forecast is shown in blue with the grey area representing a
    95% prediction interval. Just by looking, we see that the forecast roughly
    matches the historical pattern of the data.
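The numbers behind the plot can be read straight from the forecast object. The sketch below assumes the forecast package's default interval levels and uses the built-in `AirPassengers` series as a stand-in for the tutorial's time series; the data frame name `intervals` is hypothetical.

```r
library(forecast)

ts_passengers <- AirPassengers   # stand-in for the tutorial's series
f_ets <- forecast(ets(ts_passengers), h = 24)

# Interval levels stored in the forecast object, one per column of lower/upper.
print(f_ets$level)

# Point forecast plus the bounds from the second interval column.
intervals <- data.frame(
  mean  = as.numeric(f_ets$mean),
  lower = as.numeric(f_ets$lower[, 2]),
  upper = as.numeric(f_ets$upper[, 2])
)
print(head(intervals))
```

Checking `f_ets$level` before indexing is a cheap way to confirm which column holds the interval you want.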

    Model 2: ARIMA

    The auto.arima() function provides another modeling method. More info on the
    ARIMA model can be found here
    (https://en.wikipedia.org/wiki/Autoregressive_integrated_moving_average).
    The auto.arima() function automatically searches for the best model and
    optimizes the parameters. Using auto.arima() is almost always better than
    calling the underlying ARIMA fitting function directly.
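A minimal sketch of the ARIMA fit and forecast this section describes, following the same pattern as the other two models. The forecast name `f_aa` matches the one used later in the tutorial; the model name `m_aa` and the use of the built-in `AirPassengers` series as input are assumptions.

```r
library(forecast)

ts_passengers <- AirPassengers   # stand-in for the tutorial's series

# Search over candidate ARIMA orders and keep the best-scoring model.
m_aa <- auto.arima(ts_passengers)

# Forecast 24 months into the future.
f_aa <- forecast(m_aa, h = 24)

print(m_aa)
plot(f_aa)
```

Printing the fitted model shows which orders the search selected, which is worth a glance before trusting the forecast.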

    Great! These prediction intervals seem a bit narrower than those for the ETS
    model. Maybe this is because of a better fit to the data, but let's train a third
    model before doing a model comparison.

    Model 3: TBATS

    The last model I'm going to train is a TBATS model. This model is designed for

    use when there are multiple cyclic patterns (e.g. daily, weekly and yearly

    patterns) in a single time series. Maybe it will be able to detect complicated

    patterns in our time series.

    m_tbats = tbats(ts_passengers)
    f_tbats = forecast(m_tbats, h=24)
    plot(f_tbats)

    Now we have three models that all seem to give reasonable predictions. Let's
    compare them to see which is performing the best.

    Model comparison

    I'm going to use AIC (https://en.wikipedia.org/wiki/Akaike_information_criterion)
    to compare the different models. AIC is a common method for determining how well
    a model fits the data, while penalizing more complex models. The model with the
    smallest AIC is the best-fitting model.

    barplot(c(ETS=m_ets$aic,
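AIC values computed by different model families are not always directly comparable, so an out-of-sample check can be a useful complement. The sketch below is one way to do that, not part of the original workflow: it holds out the last two years, refits each model on the rest, and compares test-set RMSE. The split point and the use of the built-in `AirPassengers` series are assumptions.

```r
library(forecast)

ts_passengers <- AirPassengers   # stand-in for the tutorial's series

# Train on 1949-1958 and hold out 1959-1960 for evaluation.
train <- window(ts_passengers, end = c(1958, 12))
test  <- window(ts_passengers, start = c(1959, 1))
h <- length(test)

fits <- list(
  ETS   = ets(train),
  ARIMA = auto.arima(train),
  TBATS = tbats(train)
)

# Out-of-sample RMSE for each model on the held-out two years.
rmse <- sapply(fits, function(m) {
  accuracy(forecast(m, h = h), test)["Test set", "RMSE"]
})
print(round(rmse, 1))
```

A lower test-set RMSE means the model's forecasts were closer to the held-out observations; the ranking may or may not agree with the AIC ranking.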

    In the AIC comparison, the ARIMA model performs best. So, let's go ahead and turn
    our interactive R code into an R recipe that can be built into our DSS workflow.

    But before we can do this, we have to turn the output of forecast() into a
    data.frame, so that we can store it in DSS.

    First, I'm going to find the last date for which we have a measurement.

    last_date = index(ts_passengers)[length(ts_passengers)]

    Then, I'm going to create a data.frame with the prediction for each month. I'm

    also going to include the lower and upper bounds of the predictions, and the

    date. Since we're representing dates by the year, each month is 1/12 of a year.

    forecast_df = data.frame(passengers_predicted=f_aa$mean,
      passengers_lower=f_aa$lower[,2],
      passengers_upper=f_aa$upper[,2],
      date=last_date + seq(1/12, 2, by=1/12))

    Finally, we split the date column into separate columns for year and month.

    forecast_df = forecast_df %>%
      mutate(year=floor(date)) %>%
      mutate(month=round(((date %% 1) * 12) + 1))

    All together, the code is:

    last_date = index(ts_passengers)[length(ts_passengers)]
    forecast_df = data.frame(passengers_predicted=f_aa$mean,
      passengers_lower=f_aa$lower[,2],
      passengers_upper=f_aa$upper[,2],
      date=last_date + seq(1/12, 2, by=1/12))
    forecast_df = forecast_df %>%
      mutate(year=floor(date)) %>%
      mutate(month=round(((date %% 1) * 12) + 1))
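Splitting a fractional year with `floor()` depends on sums such as 1960 + 11/12 + 1/12 landing exactly on a whole year in floating point. Below is a minimal sketch of a more defensive variant in base R, assuming the dates sit on a monthly grid: it rounds to a whole number of months first and only then derives the year and month. The names `month_index` and `parts` are hypothetical.

```r
# Last observed date as a fractional year: December 1960 on a monthly grid.
last_date <- 1960 + 11 / 12
date <- last_date + seq(1 / 12, 2, by = 1 / 12)

# Snap each date to a whole number of months before splitting it, so that a
# value just below a year boundary cannot be floored into the wrong year.
month_index <- round(date * 12)

parts <- data.frame(
  year  = as.integer(month_index %/% 12),
  month = as.integer(month_index %% 12 + 1)
)
print(head(parts, 3))
```

Integer division and remainder on the rounded month count give the year and the calendar month without any further rounding.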

    Awesome! Now we have everything we need to deploy the model onto DSS: the

    code to create the forecast for the next 24 months and the code to convert the

    result into a data.frame.

    Deploying The Model In DSS

    To deploy our model, we need to create a new R recipe. To do this, click on
    international_airline_passengers_cleaned, then click on the R icon on the right, then
    click on "Recipe Create new datasets using R code".

    I'm going to create a new managed dataset, forecast, for the output of my

    recipe.
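The body of such a recipe can be sketched as a single function that fits the model and returns the forecast as a data.frame. This is an illustrative sketch rather than the recipe used here: the helper name `build_forecast_df` is hypothetical, the built-in `AirPassengers` series stands in for the input dataset, and the year and month columns are derived from the forecast's own time index.

```r
library(forecast)

# Hypothetical helper: fit ARIMA and return the forecast as a data.frame.
build_forecast_df <- function(ts_passengers, h = 24) {
  f_aa <- forecast(auto.arima(ts_passengers), h = h)
  # Whole number of months since year 0, taken from the forecast's time index.
  month_index <- round(as.numeric(time(f_aa$mean)) * 12)
  data.frame(
    passengers_predicted = as.numeric(f_aa$mean),
    passengers_lower     = as.numeric(f_aa$lower[, 2]),
    passengers_upper     = as.numeric(f_aa$upper[, 2]),
    year                 = as.integer(month_index %/% 12),
    month                = as.integer(month_index %% 12 + 1)
  )
}

# In a DSS recipe the input would come from dkuReadDataset() and the result
# would be saved with dkuWriteDataset(); AirPassengers is used here instead.
forecast_df <- build_forecast_df(AirPassengers)
print(head(forecast_df))
```

Keeping the modeling logic in one function makes it easy to test in a notebook before pasting it into the recipe editor.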

    That's it! Now we can click on run at the bottom of the page and return to the

    DSS flow where we see our newly created forecast dataset.

    Clicking on the forecast dataset lets us look at our new predictions stored as a

    DSS dataset.

    If you thought this was helpful, check out our other tutorials

    (http://learn.dataiku.com/).
