Forecasting Multiple Time Series Using the baselineforecast R Package

Forecasting (Revenue for S&P 500 Companies) Using the baselineforecast Package, by Konstantin Golyaev, Microsoft Azure Machine Learning. useR! 2016, Stanford, CA, 6/30/2016.

Transcript of Forecasting Multiple Time Series Using the baselineforecast R Package

Page 1: Forecasting Multiple Time Series Using the baselineforecast R Package

Forecasting (Revenue for S&P 500 Companies) Using the baselineforecast Package

by Konstantin Golyaev, Microsoft Azure Machine Learning

Konstantin Golyaev, useR! 2016, Stanford, CA, 6/30/2016

Page 2: Forecasting Multiple Time Series Using the baselineforecast R Package

Motivation

• “Prediction is very difficult, especially about the future”

• © Niels Bohr (allegedly)

• We want to:

• Forecast multiple time series at different horizons

• Leverage useful external information, when available

• Employ state-of-the-art methods

Note: won’t show any results due to five-minute time constraint


Page 3: Forecasting Multiple Time Series Using the baselineforecast R Package

Two Ways to Forecast

1. Time-series methods (ARIMA, ETS, STL, etc.)

• Great for modeling trend and seasonality

2. Regression-based methods (elastic net, random forest, boosted regression trees, etc.)

• Derive power from external information (features)
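The contrast between the two families can be sketched in base R. This is only an illustration on made-up data: the series, the feature `x`, the ARIMA order, and the assumed future values of `x` are all assumptions, not anything from the talk.

```r
set.seed(42)
n <- 48
x <- rnorm(n)                                    # made-up external feature
y <- ts(50 + 0.8 * seq_len(n) + 3 * x + rnorm(n), frequency = 12)

# 1. Time-series method: ARIMA uses only the history of y
arima_fit <- arima(y, order = c(1, 1, 0))        # order chosen arbitrarily
arima_fc <- predict(arima_fit, n.ahead = 3)$pred

# 2. Regression-based method: linear model on a trend and the external feature
dat <- data.frame(y = as.numeric(y), t = seq_len(n), x = x)
lm_fit <- lm(y ~ t + x, data = dat)
future <- data.frame(t = n + 1:3, x = c(0.2, -0.1, 0.5))  # assumed future x values
lm_fc <- predict(lm_fit, newdata = future)

data.frame(step = 1:3, arima = as.numeric(arima_fc), regression = as.numeric(lm_fc))
```

The ARIMA forecast needs nothing beyond the series itself, while the regression forecast needs values of the feature for the forecast period.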

Can we get the best of both worlds?


Page 4: Forecasting Multiple Time Series Using the baselineforecast R Package


Page 5: Forecasting Multiple Time Series Using the baselineforecast R Package

Illustration

• Take a small window of the series

$y_1, y_2, y_3, y_4, y_5, y_6, y_7, y_8, \dots$


Page 6: Forecasting Multiple Time Series Using the baselineforecast R Package

Illustration

• Take a small window of the series

• Fit a model to it, make forecasts a few steps ahead

$y_1, y_2, y_3, y_4, y_5, y_6, y_7, y_8, \dots$

$f_{7|6}, f_{8|6}, \dots$


Page 7: Forecasting Multiple Time Series Using the baselineforecast R Package

Illustration

• Take a small window of the series

• Fit a model to it, make forecasts a few steps ahead

• Move the window forward

$y_1, y_2, y_3, y_4, y_5, y_6, y_7, y_8, \dots$


Page 8: Forecasting Multiple Time Series Using the baselineforecast R Package

Illustration

• Take a small window of the series

• Fit a model to it, make forecasts a few steps ahead

• Move the window forward

• Repeat the process

$y_1, y_2, y_3, y_4, y_5, y_6, y_7, y_8, \dots$

$f_{8|7}, f_{9|7}, \dots$


Page 9: Forecasting Multiple Time Series Using the baselineforecast R Package

Illustration

• Take a small window of the series

• Fit a model to it, make forecasts a few steps ahead

• Move the window forward

• Repeat the process

• Continue until out of data, combine results when done

Combined (actual, forecast) pairs: $(y_7, f_{7|6}),\ (y_8, f_{8|6}),\ (y_8, f_{8|7}),\ (y_9, f_{9|7}),\ \dots$
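The steps above can be sketched in base R. This is a minimal sketch, not the package's code: the function name `rolling_forecasts`, the linear-trend placeholder model, and the toy series are assumptions made for illustration. A window of 6 and a horizon of 2 mirror the $f_{7|6}, f_{8|6}$ example.

```r
# Rolling-origin forecasts with a fixed-width window (hypothetical helper)
rolling_forecasts <- function(y, window = 6, horizon = 2) {
  n <- length(y)
  out <- list()
  for (origin in seq(window, n - horizon)) {
    idx <- (origin - window + 1):origin            # observations inside the window
    window_data <- data.frame(t = idx, y = y[idx])
    fit <- lm(y ~ t, data = window_data)           # placeholder model: linear trend
    target <- origin + seq_len(horizon)            # time points being forecast
    out[[length(out) + 1]] <- data.frame(
      origin = origin,
      target = target,
      actual = y[target],
      forecast = as.numeric(predict(fit, newdata = data.frame(t = target)))
    )
  }
  do.call(rbind, out)                              # combine results when done
}

y <- c(10, 12, 13, 15, 16, 18, 19, 21, 22, 24)     # made-up series
res <- rolling_forecasts(y, window = 6, horizon = 2)
res
```

Each row pairs an actual value with the forecast made for it from a given origin, so the first four rows correspond to $(y_7, f_{7|6})$, $(y_8, f_{8|6})$, $(y_8, f_{8|7})$, and $(y_9, f_{9|7})$.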


Page 10: Forecasting Multiple Time Series Using the baselineforecast R Package

What Else Can We Do?


Page 11: Forecasting Multiple Time Series Using the baselineforecast R Package

Date-Based Features

Examples:

• Year

• Quarter

• Month

• Week

• Holidays

• Etc…
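Features of this kind can be derived from a date column with base R alone. The sketch below is an assumption about how one might do it, not the package's implementation; the dates and the holiday list are made up.

```r
dates <- as.Date(c("2016-01-01", "2016-04-15", "2016-07-04", "2016-12-25"))
holidays <- as.Date(c("2016-01-01", "2016-07-04", "2016-12-25"))  # hypothetical holiday list

month_num <- as.integer(format(dates, "%m"))
features <- data.frame(
  date = dates,
  year = as.integer(format(dates, "%Y")),
  quarter = (month_num - 1L) %/% 3L + 1L,       # months 1-3 -> 1, 4-6 -> 2, ...
  month = month_num,
  week = as.integer(format(dates, "%V")),       # ISO 8601 week number
  is_holiday = dates %in% holidays
)
features
```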


Page 12: Forecasting Multiple Time Series Using the baselineforecast R Package

Lags or Other Functions of $y_t$

• Standard lag functions in R assume a regularly spaced index, so lags come out wrong when the series has gaps (e.g. missing months/days)

• So we implemented our own lag computation
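The problem can be seen on a daily series with one missing day. The helper `lag_by_index` below is a hypothetical sketch of an index-aware lag, not the function from the package; the data are made up.

```r
# Daily series with a gap: 2016-01-03 is missing
dates <- as.Date(c("2016-01-01", "2016-01-02", "2016-01-04", "2016-01-05"))
y <- c(10, 11, 13, 14)

# Position-based lag: shifts rows and ignores the calendar
naive_lag <- c(NA, head(y, -1))

# Index-aware lag (hypothetical helper): value observed exactly k days earlier
lag_by_index <- function(index, values, k = 1) {
  idx <- as.numeric(index)
  values[match(idx - k, idx)]                   # NA when that day is absent
}
gap_aware_lag <- lag_by_index(dates, y, k = 1)

data.frame(date = dates, y = y, naive_lag = naive_lag, gap_aware_lag = gap_aware_lag)
```

For 2016-01-04 the position-based lag returns the value from 2016-01-02, which is two days earlier, while the index-aware version returns `NA` because 2016-01-03 was never observed.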


Page 13: Forecasting Multiple Time Series Using the baselineforecast R Package

External Series as Features

• This is very much problem-specific

• What we used in various projects:

• Macroeconomic data from Federal Reserve Economic Data (FRED)

• Web search trends from Bing/Google/etc.

• Tweets scored for sentiment

• External business drivers such as promotions
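Whatever the source, an external series has to be aligned with the target by date before it can serve as a feature. A minimal base-R sketch with made-up numbers (the column names and values are assumptions):

```r
# Made-up monthly target series
revenue <- data.frame(
  date = as.Date(c("2016-01-01", "2016-02-01", "2016-03-01", "2016-04-01")),
  y = c(100, 104, 103, 110)
)
# Made-up external indicator with no value reported for March
indicator <- data.frame(
  date = as.Date(c("2016-01-01", "2016-02-01", "2016-04-01")),
  x = c(1.2, 1.3, 1.5)
)

# Left join keeps every target row; dates without external data get NA
dataset <- merge(revenue, indicator, by = "date", all.x = TRUE)
dataset
```

How the resulting `NA` values are handled (imputation, dropping rows, or a learner that tolerates them) is a modeling choice.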


Page 14: Forecasting Multiple Time Series Using the baselineforecast R Package

Implementation

• All code is combined into the baselineforecast R package

• Function ConstructDataset() takes the series $y_t$ and external data $X_t$, returns a data frame with the target and features

• Function FitModel() interfaces with the caret package to train any regression learning algorithm and perform time series cross-validation
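The signatures of ConstructDataset() and FitModel() are not shown in the slides, so the sketch below does not call them. It only illustrates the kind of time series cross-validation that caret itself offers through `trainControl(method = "timeslice")`; the data, the window sizes, and the choice of `"lm"` as the learner are assumptions.

```r
library(caret)

set.seed(1)
n <- 60
x <- rnorm(n)                                   # made-up external feature
dat <- data.frame(
  y = 10 + 0.5 * seq_len(n) + 2 * x + rnorm(n), # made-up target
  t = seq_len(n),
  x = x
)

# Rolling-origin resampling: train on 36 points, test on the next 4, slide by 1
ctrl <- trainControl(
  method = "timeslice",
  initialWindow = 36,
  horizon = 4,
  fixedWindow = TRUE
)

# Any caret regression method could replace "lm" here
fit <- train(y ~ t + x, data = dat, method = "lm", trControl = ctrl)
fit$results
```

With 60 observations, a window of 36, and a horizon of 4, caret evaluates the model on 21 consecutive slices and reports the error averaged over them.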


Page 15: Forecasting Multiple Time Series Using the baselineforecast R Package

Future Work

• Exploratory Data Analysis

• Computing Prediction Intervals

• Decide on the license/distribution model

Have questions?

Ping me at [email protected]
