Time Series Forecasting using Neural Nets (GRNNs)

Paper review: Toward Automatic Time-Series Forecasting Using Neural Networks (GRNNs)

Page 1: Time Series Forecasting using Neural Nets (GRNNs)

Toward Automatic Time-Series Forecasting Using Neural Networks, by Weizhong Yan

Presenter: Sean Golliher


Page 2:

Relationship to Research

Currently analyzing the performance of NEAT for Time Series Forecasting (TSF)

Paper summarizes common approaches to, and issues with, using ANNs for TSF


Page 3:

Claims of the Paper

Develops an automatic TSF model using a Generalized Regression Neural Network (GRNN)

Shows promising results by winning the NN3 time-series competition against 60 different models


Page 4:

General Problems with ANN

Most approaches are ad hoc, meaning they do some type of preprocessing of the data

Typically try different ANN architectures to see which one performs better

Nelson et al.: ANN inconsistency on TSF is the result of different preprocessing strategies

Balkin et al.: ANNs require a larger number of samples to be trained; real-world examples (financial, etc.) offer only short training samples.


Page 5:

RBF

RBF can be viewed as a local linear regression model

Apply a Gaussian kernel to the input data. All inputs go to nodes of the form:

G(x) = exp(−‖x − c‖² / σ²)    (1)

Find the center points by assigning a center c to each point in the data set (measuring each input's distance to the center point).

This is equivalent to doing a local regression (σ affects the smoothing of the approximation).

The output layer (the weights) is trained using least-squares regression

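The kernel in eq. (1) is short to sketch directly; here is a minimal illustration in Python/NumPy, assuming a squared Euclidean distance in the exponent and toy centers of my own choosing:

```python
import numpy as np

def rbf_activations(x, centers, sigma):
    """Gaussian RBF node outputs G(x) = exp(-||x - c||^2 / sigma^2), per eq. (1)."""
    d2 = np.sum((centers - x) ** 2, axis=1)  # squared distance to every center c
    return np.exp(-d2 / sigma ** 2)

# An input sitting exactly on a center activates that node fully:
centers = np.array([[0.0], [1.0]])
acts = rbf_activations(np.array([0.0]), centers, sigma=1.0)
# acts[0] == 1.0 (distance 0), acts[1] == exp(-1)
```

Note how σ controls the smoothing: a larger σ widens each bump, so more distant centers contribute to the local fit.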

Page 6:

Generalized Definition for Regression

Computation of the most probable value of Y for each value of X, based on a finite number of possibly noisy measurements of X

The conditional mean of y given X (the regression of y on X) is given by:

E[y | X] = ( ∫_{−∞}^{∞} y f(X, y) dy ) / ( ∫_{−∞}^{∞} f(X, y) dy )    (2)

Since we don’t typically know the density function f(X, y), it can be estimated using a Parzen window density estimator.


Page 7:

Generalized Definition for Regression

The generalized definition yields the following regression function:

Ŷ(X) = ( ∑_{i=1}^{n} Y_i exp(−D_i² / 2σ²) ) / ( ∑_{i=1}^{n} exp(−D_i² / 2σ²) )    (3)

where D_i² = (X − X_i)ᵀ(X − X_i)

In the case of the GRNN, X is the input data and the X_i are the centers.

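Equation (3) is just a kernel-weighted average of the training targets, which makes the GRNN very direct to implement. A minimal sketch with made-up toy data:

```python
import numpy as np

def grnn_predict(X_train, y_train, x, sigma):
    """GRNN output per eq. (3): Gaussian-kernel-weighted average of targets."""
    d2 = np.sum((X_train - x) ** 2, axis=1)   # D_i^2 = (x - X_i)^T (x - X_i)
    w = np.exp(-d2 / (2.0 * sigma ** 2))      # kernel weight of each center
    return np.sum(w * y_train) / np.sum(w)

# A query halfway between two training points weights them equally:
X = np.array([[0.0], [2.0]])
y = np.array([1.0, 3.0])
pred = grnn_predict(X, y, np.array([1.0]), sigma=1.0)  # -> 2.0
```

Because the weights are all positive and normalized, the prediction is always a convex combination of observed targets, which is one reason training amounts to little more than storing the data and picking σ.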

Page 8:

GRNN

G(x, x_i) are the standard radial basis functions

w_i are given by the generalized regression equation

The spread factor dictates the performance


Page 9:

Claimed Benefits of GRNN

Easy to train

Can accurately approximate functions from sparse and noisy data

Note: a recent paper, Ahmed et al., claims the GRNN is inferior to the MLP for TSF


Page 10:

Methodology Requirements

Minimal human intervention

Computationally efficient for a large number of series

Good forecasting over range of data sets


Page 11:

Preprocessing: Outliers

Real-world time series have outliers

Outliers identified by

|x_i| ≥ 4 max(|m_a|, |m_b|)    (4)

where m_a = median(x_{i−3}, x_{i−2}, x_{i−1}) and m_b = median(x_{i+1}, x_{i+2}, x_{i+3})

If x_i is an outlier, the value is replaced with the average value of the two points before and after x_i

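The outlier rule of eq. (4) can be sketched as below. The slide's "average of two points before and after" is ambiguous; this sketch assumes it means averaging the immediate neighbour on each side:

```python
import numpy as np

def replace_outliers(x):
    """Flag x_i as an outlier when |x_i| >= 4*max(|m_a|, |m_b|), per eq. (4),
    where m_a / m_b are medians of the three points before / after x_i.
    Replacement: average of the immediate neighbour on each side (assumed)."""
    x = np.asarray(x, dtype=float).copy()
    for i in range(3, len(x) - 3):
        ma = np.median(x[i - 3:i])        # median(x_{i-3}, x_{i-2}, x_{i-1})
        mb = np.median(x[i + 1:i + 4])    # median(x_{i+1}, x_{i+2}, x_{i+3})
        if abs(x[i]) >= 4 * max(abs(ma), abs(mb)):
            x[i] = 0.5 * (x[i - 1] + x[i + 1])
    return x

series = np.array([1, 1, 1, 1, 100, 1, 1, 1, 1])
cleaned = replace_outliers(series)   # the spike at index 4 becomes 1.0
```

Using medians of the surrounding windows (rather than means) keeps a single spike from masking itself, since one outlier cannot move a three-point median.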

Page 12:

Preprocessing: Trends

Real-world time series have trends, which could be due to seasonality or other factors.

Common approaches are curve fitting, filtering, and differencing.

Identifying trends is difficult to do algorithmically

Proposes detrending scheme:

Split the series into segments (12 for monthly data, 4 for quarterly). The mean of the historical observations within each segment is subtracted from every historical observation in that segment.


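The proposed detrending scheme can be sketched as follows; reading "split into 12" as segments of 12 consecutive observations for monthly data (4 for quarterly) is an assumption on my part:

```python
import numpy as np

def detrend_by_segments(x, seg_len):
    """Subtract each segment's mean from the observations in that segment
    (seg_len = 12 for monthly data, 4 for quarterly), per the slide."""
    x = np.asarray(x, dtype=float)
    out = x.copy()
    for start in range(0, len(x), seg_len):
        seg = slice(start, start + seg_len)
        out[seg] -= x[seg].mean()   # remove the segment's level
    return out

detrended = detrend_by_segments([1.0, 2.0, 3.0, 4.0], seg_len=2)
# segments [1, 2] and [3, 4] each become zero-mean: [-0.5, 0.5, -0.5, 0.5]
```

Subtracting per-segment means removes the slow level drift between seasons, which is exactly the kind of trend a piecewise-constant baseline can capture.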

Page 13:

Preprocessing: Seasonality

Identifying seasonality is typically a manual process

The author used a simple approach and defined short series as n < 60 and long series as n ≥ 60

Uses autocorrelation coefficients at one and two seasonal lags to decide if the series is seasonal

Uses a standard method for subtracting out seasonality from the series data

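The seasonal-lag check can be sketched with a plain sample autocorrelation. The 0.3 cutoff below is a hypothetical threshold of mine; the slide does not give the paper's actual decision rule:

```python
import numpy as np

def autocorr(x, lag):
    """Sample autocorrelation coefficient at the given lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.sum(x[lag:] * x[:-lag]) / np.sum(x * x))

def looks_seasonal(x, period, threshold=0.3):
    """Seasonal if autocorrelation is high at one and two seasonal lags
    (threshold is an assumed placeholder, not the paper's value)."""
    return autocorr(x, period) > threshold and autocorr(x, 2 * period) > threshold

t = np.arange(72)
monthly = np.sin(2 * np.pi * t / 12)   # a clean 12-period cycle
# looks_seasonal(monthly, 12) -> True
```

Checking two seasonal lags instead of one guards against a single spurious correlation peak being mistaken for true seasonality.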

Page 14:

ANN Modeling

Aspects of ANN modeling

Spread factor: typically found empirically, since no good analytic approach has been found. Some guidance was given by Haykin: σ = d_max / √(2n), where d_max is the maximum distance between the training points.

Proposes that the spread factor be set to the d50, d75, or d95 percentiles of the nearest distance of each training sample to the rest of the points.

Uses three GRNNs that all take the same input and are combined to give the final output.

The choice of combining three GRNNs is based on previous success in the literature

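The proposed spread candidates can be computed from nearest-neighbour distances. A minimal sketch using brute-force pairwise distances (fine for small sample counts; `spread_candidates` is a name of my own invention):

```python
import numpy as np

def spread_candidates(X, percentiles=(50, 75, 95)):
    """d50/d75/d95 percentiles of each sample's nearest distance to the rest."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))    # pairwise Euclidean distances
    np.fill_diagonal(d, np.inf)              # ignore each sample's self-distance
    nearest = d.min(axis=1)                  # nearest-neighbour distance per sample
    return [float(np.percentile(nearest, p)) for p in percentiles]

d50, d75, d95 = spread_candidates(np.array([[0.0], [1.0], [3.0]]))
# nearest distances are [1, 1, 2], so d50 == 1.0
```

The three percentiles give the three ensemble members progressively wider kernels, which is one plausible way to read the slide's "three GRNNs combined" design.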

Page 15:

ANN Modeling Cont’d

Input selection is considered one of the most important aspects of TSF

Two general approaches: filter and wrapper

Filtering selects features based on the data itself (independent of the learning algorithm)

Wrapper approaches use the learning algorithm itself; wrappers typically perform better.

The author uses contiguous lags and limits them to one full season (12 lags) for monthly data.

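Contiguous-lag input selection amounts to a sliding window over the series; a minimal sketch (the helper name is mine, with 12 lags as the monthly default):

```python
import numpy as np

def make_lagged_inputs(series, n_lags=12):
    """Each row holds the n_lags most recent values; the target is the next
    observation (contiguous lags up to one full season for monthly data)."""
    s = np.asarray(series, dtype=float)
    X = np.array([s[i:i + n_lags] for i in range(len(s) - n_lags)])
    y = s[n_lags:]
    return X, y

X, y = make_lagged_inputs([1, 2, 3, 4, 5], n_lags=2)
# X is [[1, 2], [2, 3], [3, 4]] and y is [3, 4, 5]
```

Fixing the window at one season sidesteps a wrapper-style lag search entirely, which is consistent with the paper's goal of minimal human intervention.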

Page 16:

Experimental Results

Uses the NN3 time-series competition dataset, which is composed of Dataset A and Dataset B

Dataset A is 111 monthly time series drawn from empirical business time series

Dataset B is a small subset of Dataset A, consisting of 11 time series

Error is measured using sMAPE (symmetric mean absolute percentage error)

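sMAPE is quick to compute; the sketch below uses the common symmetric definition (mean of |F − A| over the average of |A| and |F|, in percent), which is an assumption about the exact variant NN3 used:

```python
import numpy as np

def smape(actual, forecast):
    """Symmetric mean absolute percentage error, in percent (common variant)."""
    a = np.asarray(actual, dtype=float)
    f = np.asarray(forecast, dtype=float)
    return float(100.0 * np.mean(np.abs(f - a) / ((np.abs(a) + np.abs(f)) / 2.0)))

# A perfect forecast scores 0; a 10-unit overshoot on a value of 100:
err = smape([100.0], [110.0])   # 100 * 10 / 105, about 9.52
```

Unlike plain MAPE, this form penalizes over- and under-forecasts symmetrically, which is why forecasting competitions tend to prefer it.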

Page 17:

Experimental Results

B indicates a statistical model and C indicates a computational intelligence model


Page 18:

Ablation Studies

SP: Spread, MSA: Multiple Step Ahead


Page 19:

Discussion

Are TSF competitions just a demonstration of the no-free-lunch theorem? Why is the theorem not mentioned?

Did he prove his approach was “better”, or did this approach just outperform on a particular contest?

Why doesn’t the training of the GRNN factor out outliers and seasonality on its own? Isn’t that what training is for?

Why did he choose a GRNN? Previous papers said they performpoorly.

What kind of bias does the detrending scheme introduce?

The paper was “rule of thumb” oriented. Is there a way to make an automatic approach more rigorous?
