Forecasting with Artificial Neural Networks
EVIC 2005 Tutorial, Santiago de Chile, 15 December 2005. Slides on www.neural-forecasting.com. Sven F. Crone, Centre for Forecasting, Department of Management Science, Lancaster University Management School. Email: s.crone@neural-forecasting.com
EVIC05 Sven F. Crone - www.bis-lab.com
Lancaster University Management School?
What you can expect from this session: the simple backpropagation algorithm [Rumelhart et al. 1982]

$E_p = C(t_{pj}, o_{pj})$, with $o_{pj} = f_j(net_{pj})$

$\Delta_p w_{ji} \propto -\frac{\partial C(t_{pj}, o_{pj})}{\partial w_{ji}}$, where $\frac{\partial C(t_{pj}, o_{pj})}{\partial w_{ji}} = \frac{\partial C(t_{pj}, o_{pj})}{\partial net_{pj}} \cdot \frac{\partial net_{pj}}{\partial w_{ji}}$

Define $\delta_{pj} = -\frac{\partial C(t_{pj}, o_{pj})}{\partial net_{pj}} = -\frac{\partial C(t_{pj}, o_{pj})}{\partial o_{pj}} \cdot \frac{\partial o_{pj}}{\partial net_{pj}}$, with $\frac{\partial o_{pj}}{\partial net_{pj}} = f_j'(net_{pj})$.

For a hidden unit, the error arrives through all units k it feeds into, $-\frac{\partial C(t_{pj}, o_{pj})}{\partial o_{pj}} = \sum_k \delta_{pk} w_{kj}$, so that

$\delta_{pj} = \begin{cases} \frac{\partial C(t_{pj}, o_{pj})}{\partial o_{pj}}\, f_j'(net_{pj}) & \text{if unit } j \text{ is in the output layer} \\ f_j'(net_{pj}) \sum_k \delta_{pk} w_{kj} & \text{if unit } j \text{ is in a hidden layer} \end{cases}$

How-to on Neural Network Forecasting with limited maths! CD Start-Up Kit for Neural Net Forecasting: 20+ software simulators, datasets, literature & FAQ. Slides, data & additional info on www.neural-forecasting.com
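The delta rule above fits in a few lines of code. The following is a minimal sketch, assuming a squared-error cost C = ½(t − o)² and logistic activations (the derivation on the slide is cost- and activation-agnostic); the toy XOR data, network size and learning rate are illustrative choices, not from the tutorial:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# 2 inputs -> 3 hidden units -> 1 output; cost C = 0.5 * (t - o)^2
W1 = rng.normal(scale=0.5, size=(3, 2))
W2 = rng.normal(scale=0.5, size=(1, 3))

def train_step(x, t, lr=0.5):
    global W1, W2
    # forward pass
    o1 = sigmoid(W1 @ x)                   # hidden activations o_pj
    o2 = sigmoid(W2 @ o1)                  # output activation
    # output-layer delta: dC/do * f'(net), with f'(net) = o*(1-o) for the logistic
    d2 = (o2 - t) * o2 * (1.0 - o2)
    # hidden-layer delta: f'(net) * sum_k delta_k * w_kj (backpropagated error)
    d1 = o1 * (1.0 - o1) * (W2.T @ d2)
    # gradient-descent weight updates
    W2 -= lr * np.outer(d2, o1)
    W1 -= lr * np.outer(d1, x)
    return 0.5 * float(np.sum((t - o2) ** 2))

X = [np.array([0., 0.]), np.array([0., 1.]), np.array([1., 0.]), np.array([1., 1.])]
T = [np.array([0.]), np.array([1.]), np.array([1.]), np.array([0.])]

losses = [sum(train_step(x, t) for x, t in zip(X, T)) for _ in range(2000)]
```

Here d2 and d1 carry the opposite sign of the slide's δ, so the update subtracts the gradient directly.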
Agenda
Forecasting with Artificial Neural Networks
1. Forecasting?
2. Neural Networks?
3. Forecasting with Neural Networks
4. How to write a good Neural Network forecasting paper!
Agenda
Forecasting with Artificial Neural Networks
1. Forecasting?
   1. Forecasting as predictive Regression
   2. Time series prediction vs. causal prediction
   3. Why NN for Forecasting?
2. Neural Networks?
3. Forecasting with Neural Networks
4. How to write a good Neural Network forecasting paper!
Forecasting or Prediction? Data Mining: the application of data analysis & discovery algorithms that extract patterns out of the data.

TASKS (distinguished by the timing of the dependent variable) and typical ALGORITHMS:

Descriptive Data Mining
- Summarisation, Clustering & Visualisation: K-means clustering, neural networks, feature selection, principal component analysis, class entropy
- Association Analysis: association rules, link analysis
- Sequence Discovery: temporal association rules

Predictive Data Mining
- Classification: decision trees, logistic regression, neural networks, discriminant analysis
- Regression: linear regression, nonlinear regression, neural networks (MLP, RBFN, GRNN)
- Time Series Analysis: exponential smoothing, (S)ARIMA(X), neural networks
Forecasting or Classification? The scales of the dependent and independent variables determine the method:
- Metric dependent variable: Regression / Time Series Analysis (related: principal component analysis, analysis of variance)
- Nominal dependent variable: Classification (related: contingency analysis, clustering)
- Downscaling from metric to ordinal to nominal scale is always possible, but loses information.

Supervised learning: each case has inputs and a target. Unsupervised learning: inputs only, no target.

Simplification: Regression answers "How much?", Classification answers "Will the event occur?"

FORECASTING = PREDICTIVE modelling (the dependent variable lies in the future)
FORECASTING = REGRESSION modelling (the dependent variable is of metric scale)
Forecasting or Classification? What the experts say
International Conference on Data Mining: you are welcome to contribute! www.dmin-2006.com
Agenda
Forecasting with Artificial Neural Networks
1. Forecasting?
   1. Forecasting as predictive Regression
   2. Time series prediction vs. causal prediction
   3. SARIMA-Modelling
   4. Why NN for Forecasting?
2. Neural Networks?
3. Forecasting with Neural Networks
4. How to write a good Neural Network forecasting paper!
Forecasting Models: time series analysis vs. causal modelling

Time series prediction (univariate): assumes that the data generating process creating the patterns can be explained from previous observations of the dependent variable alone.

Causal prediction (multivariate): the data generating process can be explained by the interaction of causal (cause-and-effect) independent variables.
Classification of Forecasting Methods

Objective Forecasting Methods
- Time Series Methods: naive methods, averages, moving averages; exponential smoothing (simple ES, linear ES, seasonal ES, dampened trend ES); simple regression, autoregression, ARIMA; neural networks
- Causal Methods: linear regression, multiple regression, dynamic regression, vector autoregression, intervention models, neural networks

Subjective Forecasting Methods
- Prophecy, educated guessing
- Sales force composite, analogies, Delphi, PERT, survey techniques

Demand planning practice = objective methods + subjective correction.

Neural networks ARE time series methods & causal methods, and CAN be used as averages, exponential smoothing & regression.
Time Series Definition

Definition: a time series is a series of timely ordered, comparable observations y_t recorded in equidistant time intervals.

Notation: Y_t represents the t-th period observation, t = 1, 2, ..., n.
Concept of Time Series: an observed measurement is made up of a systematic part and a random part.

Approach: unfortunately we can observe neither of them directly! Forecasting methods try to isolate the systematic part; forecasts are based on the systematic part; the random part determines the distribution shape.

Assumption: data observed over time is comparable:
- the time periods are of identical lengths (check!)
- the units they are measured in remain unchanged (check!)
- the definitions of what is being measured remain unchanged (check!)
- they are correctly measured (check!)

Data errors arise from sampling, from bias in the instruments or the responses, and from transcription.
Objective Forecasting Methods: methods of time series analysis / forecasting are a class of objective methods based on the analysis of past observations of the dependent variable alone.

Assumption: there exists a cause-effect relationship that keeps repeating itself with the yearly calendar; the cause-effect relationship may be treated as a BLACK BOX. The TIME-STABILITY HYPOTHESIS ASSUMES NO CHANGE: the causal relationship remains intact indefinitely into the future, so the time series can be explained & predicted solely from previous observations of the series. Time series methods consider only past patterns of the same variable; future events (with no occurrence in the past) are explicitly NOT considered! External EVENTS relevant to the forecast must be corrected MANUALLY.
Simple Time Series Patterns

- Diagram 1.1 Trend: long-term growth or decline occurring within a series; the long-term movement in the series.
- Diagram 1.2 Seasonal: more or less regular movements within a year; regular fluctuation within a year (or shorter period) superimposed on trend and cycle.
- Diagram 1.3 Cycle: alternating upswings and downswings of varied length and intensity; regular fluctuation superimposed on the trend (the period may be random).
- Diagram 1.4 Irregular: random movements and those which reflect unusual events.
Regular Components of Time Series: a time series consists of superimposed components / patterns:
- Signal: level L, trend T, seasonality S
- Noise: irregular error 'e'

Sales = LEVEL + TREND + SEASONALITY + RANDOM ERROR
Irregular Components of Time Series: structural changes in the systematic data

- PULSE: a one-time occurrence on top of the systematic stationary / trended / seasonal development (example chart: a STATIONARY time series with a PULSE).
- LEVEL SHIFT: one-time / multiple shifts on top of the systematic stationary / trended / seasonal development (example chart: a STATIONARY time series with a level shift).
- STRUCTURAL BREAKS: trend changes (slope, direction); seasonal pattern changes & shifts.
Components of Time Series: a time series is decomposed into its components: level, trend, seasonality and random error.
Time Series Patterns

REGULAR time series patterns:
- STATIONARY time series: Y_t = f(E_t), the series is influenced by level & random fluctuations.
- SEASONAL time series: Y_t = f(S_t, E_t), influenced by level, season and random fluctuations.
- TRENDED time series: Y_t = f(T_t, E_t), influenced by a trend away from the level and random fluctuations.

IRREGULAR time series patterns:
- FLUCTUATING time series: the series fluctuates very strongly around the level (mean deviation > ca. 50% around the mean).
- INTERMITTENT time series: the number of periods with zero sales is high (ca. 30%-40%).

Combination of the individual components: Y_t = f(S_t, T_t, E_t) + PULSES! + LEVEL SHIFTS! + STRUCTURAL BREAKS!
Components of complex Time Series

Y_t, the sales or observation of the time series at point t, consists of a combination of:
- S_t: the base level plus the seasonal component,
- T_t: the trend component,
- E_t: the irregular or random error component.

Different possibilities exist to combine the components:
- Additive model: Y_t = L + S_t + T_t + E_t
- Multiplicative model: Y_t = L * S_t * T_t * E_t
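The additive and multiplicative combinations can be illustrated by generating a series from its components; the concrete numbers below (level 100, period-12 seasonality, linear trend, Gaussian error) are arbitrary illustrative choices, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(1)

t = np.arange(48)                              # four years of monthly periods
level = 100.0                                  # L
season = 10.0 * np.sin(2 * np.pi * t / 12)     # S_t: period-12 seasonality
trend = 0.8 * t                                # T_t: linear trend
error = rng.normal(0.0, 2.0, size=t.size)      # E_t: random error

# Additive combination: Y_t = L + S_t + T_t + E_t
y_add = level + season + trend + error

# Multiplicative combination (components expressed as factors around 1)
y_mul = level * (1 + season / 100) * (1 + trend / 100) * (1 + error / 100)
```

In the multiplicative series the seasonal swings grow with the level, which is the visual cue for choosing it over the additive form.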
Classification of Time Series Patterns [Pegels 1969 / Gardner]: a 3x3 scheme crossing the seasonal effect (none / additive / multiplicative) with the trend effect (none / additive / multiplicative).
Agenda: Forecasting with Artificial Neural Networks

1. Forecasting?
   1. Forecasting as predictive Regression
   2. Time series prediction vs. causal prediction
   3. SARIMA-Modelling
      1. SARIMA Differencing
      2. SARIMA Autoregressive Terms
      3. SARIMA Moving Average Terms
      4. SARIMA Seasonal Terms
   4. Why NN for Forecasting?
2. Neural Networks?
3. Forecasting with Neural Networks
4. How to write a good Neural Network forecasting paper!
Introduction to ARIMA Modelling

Seasonal Autoregressive Integrated Moving Average processes: SARIMA. Popularised by George Box & Gwilym Jenkins in the 1970s (the names are often used synonymously); the models are widely studied. Box & Jenkins put together the theoretical underpinning required to understand & use ARIMA, and defined a general notation for dealing with ARIMA models.

They claim that most time series can be parsimoniously represented by the ARIMA class of models. ARIMA(p, d, q) models attempt to describe the systematic pattern of a time series by 3 parameters:
- p: number of autoregressive terms (AR terms) in the time series
- d: number of differences to achieve stationarity of the time series
- q: number of moving average terms (MA terms) in the time series

$\phi_p(B)\,(1-B)^d Z_t = \mu + \theta_q(B)\, e_t$
The Box-Jenkins Methodology for ARIMA models

Model Identification
- Data preparation: transform the time series for stationarity; difference the time series for stationarity.
- Model selection: examine the ACF & PACF; identify potential models (p,q)(P,Q).

Model Estimation & Testing
- Model estimation: estimate the parameters in the potential models; select the best model using a suitable criterion.
- Model diagnostics / testing: check the ACF / PACF of the residuals for white noise; run a portmanteau test on the residuals; re-identify if needed.

Model Application
- Use the selected model to forecast.
ARIMA Modelling: ARIMA(p,d,q) models
- Autoregressive terms AR(p), with p = order of the autoregressive part
- Order of integration, with d = degree of first differencing / integration involved
- Moving average terms MA(q), with q = order of the moving average of the error
- SARIMA(p,d,q)(P,D,Q)_s, with (P,D,Q) the corresponding process for the seasonal lags of period s

Objective: identify the appropriate ARIMA model for the time series, i.e. identify the AR term, the I term and the MA term.

Identification through the autocorrelation function (ACF) and the partial autocorrelation function (PACF).
ARIMA Models: identification of the d term

The parameter d determines the order of integration. ARIMA models assume stationarity of the time series: stationarity in the mean and stationarity of the variance (homoscedasticity).

Recap: let the mean of the time series at t be $\mu_t = E(Y_t)$, the variance $\gamma_{t,t} = \mathrm{var}(Y_t)$, and the covariances $\gamma_{t,t-\tau} = \mathrm{cov}(Y_t, Y_{t-\tau})$.

Definition: a time series is stationary if its mean level $\mu_t$ is constant for all t and its variance and covariances $\gamma_{t,t-\tau}$ are constant for all t. In other words: all properties of the distribution (mean, variance, skewness, kurtosis etc.) of a random sample of the time series are independent of the absolute time t of drawing the sample, i.e. identity of mean & variance across time.
ARIMA Models: stationarity and parameter d

Is the time series stationary? Compare sample A and sample B drawn from different parts of the series: stationarity requires $\mu(A) = \mu(B)$, $\mathrm{var}(A) = \mathrm{var}(B)$ etc. For this trended time series $\mu(B) > \mu(A)$: a non-stationary time series.
ARIMA Models: differencing for stationarity

Differencing the time series. E.g.: the time series Y_t = {2, 4, 6, 8, 10} exhibits a linear trend. First-order differencing between observation Y_t and its predecessor Y_t-1 derives a transformed time series: 4-2=2, 6-4=2, 8-6=2, 10-8=2. The new time series {2, 2, 2, 2} is stationary through first differencing, d=1: an ARIMA(0,1,0) model. If needed, take second-order differences: d=2.
ARIMA Models: differencing for stationarity / integration

First difference: $Z_t = Y_t - Y_{t-1}$. Transforms (logarithms etc.) may also be applied, where Z_t is a transform of the variable of interest Y_t chosen to make the differenced series stationary.

Tests for stationarity: Dickey-Fuller test, serial correlation test, runs test.
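The slide's worked example can be reproduced directly; `numpy.diff` computes exactly the first differences Z_t = Y_t - Y_{t-1}:

```python
import numpy as np

y = np.array([2.0, 4.0, 6.0, 8.0, 10.0])   # linear trend, as on the slide

z = np.diff(y)         # first differences: [2. 2. 2. 2.] -> constant, stationary (d=1)
z2 = np.diff(y, n=2)   # second differences (d=2): here [0. 0. 0.]
```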
Agenda: Forecasting with Artificial Neural Networks. 1. Forecasting? (1. Forecasting as predictive Regression; 2. Time series prediction vs. causal prediction; 3. SARIMA-Modelling: Differencing, Autoregressive Terms, Moving Average Terms, Seasonal Terms; 4. Why NN for Forecasting?) 2. Neural Networks? 3. Forecasting with Neural Networks. 4. How to write a good Neural Network forecasting paper!
ARIMA Models: autoregressive terms

Description of the autocorrelation structure via the autoregressive (AR) term: if a dependency exists between lagged observations Y_t and Y_t-1, we can describe Y_t from the realisation of its predecessors:

$Y_t = c + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \dots + \phi_p Y_{t-p} + e_t$

with Y_t the observation in time t, $\phi_i$ the weights of the AR relationship, the lagged observations as (independent) regressors, and e_t the random component (white noise).

The equations include only lagged realisations of the forecast variable: an ARIMA(p,0,0) model = AR(p) model. Problems: independence of the residuals is often violated (heteroscedasticity); determining the number of past values is problematic. Tests for autoregression are the portmanteau tests: the Box-Pierce test and the Ljung-Box test.
ARIMA Models: parameter p of the autocorrelation

A stationary time series can be analysed for its autocorrelation structure. The autocorrelation coefficient for lag k,

$r_k = \frac{\sum_{t=k+1}^{n} (Y_t - \bar{Y})(Y_{t-k} - \bar{Y})}{\sum_{t=1}^{n} (Y_t - \bar{Y})^2}$,

denotes the correlation between lagged observations of distance k, e.g. between x_t and x_t-1 in a scatter plot of x_t against x_t-1. Graphical interpretation: uncorrelated data has low autocorrelations and shows no correlation pattern.
ARIMA Models: parameter p

E.g. for the time series Y_t = 7, 8, 7, 6, 5, 4, 5, 6, 4, form the pairs (Y_t, Y_t-k) at lags 1, 2 and 3 and compute the autocorrelation coefficients: r_1 = .62, r_2 = .32, r_3 = .15.

The autocorrelations r_k gathered at lags 1, 2, ... make up the autocorrelation function (ACF).
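The sample ACF can be computed directly from the r_k formula above; this small helper (an illustrative sketch, not from the slides) returns the coefficients up to a chosen maximum lag:

```python
import numpy as np

def acf(y, max_lag):
    """Sample autocorrelation function, following the r_k formula above."""
    y = np.asarray(y, dtype=float)
    dev = y - y.mean()                     # (Y_t - Ybar)
    denom = np.sum(dev ** 2)               # sum of squared deviations
    return [np.sum(dev[k:] * dev[:len(y) - k]) / denom
            for k in range(max_lag + 1)]

r = acf([7, 8, 7, 6, 5, 4, 5, 6, 4], max_lag=3)
# r[0] is always 1 by construction; r[1], r[2], r[3] are the lag-1..3 coefficients
```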
ARIMA Models: identification of AR terms?

Compare the sample ACFs (with confidence limits) of two series over lags 1-16: random independent observations show no significant autocorrelations, while an AR(1) process shows a decaying pattern of significant autocorrelations.
ARIMA Models: partial autocorrelations

Partial autocorrelations measure the degree of association between Y_t and Y_t-k when the effects of the other time lags 1, 2, 3, ..., k-1 are removed. A significant AC between Y_t and Y_t-1 together with a significant AC between Y_t-1 and Y_t-2 induces correlation between Y_t and Y_t-2! (The 1st AC = the 1st PAC.)

When fitting an AR(p) model to the time series, the last coefficient $\phi_p$ of Y_t-p measures the excess correlation at lag p which is not accounted for by an AR(p-1) model; $\phi_p$ is called the p-th order partial autocorrelation, i.e.

$\phi_p = \mathrm{corr}(Y_t, Y_{t-p} \mid Y_{t-1}, Y_{t-2}, \dots, Y_{t-p+1})$,

estimated as the last coefficient of $Y_t = \phi_0 + \phi_{p1} Y_{t-1} + \phi_{p2} Y_{t-2} + \dots + \phi_{pp} Y_{t-p} + \varepsilon_t$. The partial autocorrelation coefficient thus measures the true correlation at lag p.
ARIMA Modelling: AR model patterns

ACF: a pattern in the ACF (decaying over lags 1-16). PACF: 1st lag significant.

AR(1) model: $Y_t = c + \phi_1 Y_{t-1} + e_t$ = ARIMA(1,0,0)
ARIMA Modelling: AR model patterns

ACF: a pattern in the ACF. PACF: 1st & 2nd lag significant.

AR(2) model: $Y_t = c + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + e_t$ = ARIMA(2,0,0)
ARIMA Modelling: AR model patterns

ACF: a pattern in the ACF, here a dampened sine wave. PACF: 1st & 2nd lag significant.

AR(2) model: $Y_t = c + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + e_t$ = ARIMA(2,0,0)
ARIMA Modelling: AR models

Autoregressive model of order one, ARIMA(1,0,0) = AR(1), e.g.

$Y_t = c + \phi_1 Y_{t-1} + e_t = 1.1 + 0.8\, Y_{t-1} + e_t$

Higher-order AR models: $Y_t = c + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \dots + \phi_p Y_{t-p} + e_t$

Stationarity conditions:
for p = 1: $-1 < \phi_1 < 1$
for p = 2: $-1 < \phi_2 < 1$, $\phi_2 + \phi_1 < 1$, $\phi_2 - \phi_1 < 1$
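The example AR(1) process with c = 1.1 and phi = 0.8 can be simulated in a few lines (the sample size and seed are arbitrary illustrative choices); for a stationary AR(1) the process mean is c / (1 - phi) = 5.5:

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_ar1(c=1.1, phi=0.8, n=500):
    """Simulate Y_t = c + phi * Y_{t-1} + e_t with standard-normal white noise e_t."""
    y = np.empty(n)
    y[0] = c / (1 - phi)               # start at the process mean
    for t in range(1, n):
        y[t] = c + phi * y[t - 1] + rng.normal()
    return y

y = simulate_ar1()
# the sample mean fluctuates around c / (1 - phi) = 5.5
```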
Agenda: Forecasting with Artificial Neural Networks. 1. Forecasting? (1. Forecasting as predictive Regression; 2. Time series prediction vs. causal prediction; 3. SARIMA-Modelling: Differencing, Autoregressive Terms, Moving Average Terms, Seasonal Terms; 4. Why NN for Forecasting?) 2. Neural Networks? 3. Forecasting with Neural Networks. 4. How to write a good Neural Network forecasting paper!
ARIMA Modelling: moving average processes

Description of the moving average structure: AR models may not approximate the data generator underlying the observations perfectly, leaving residuals e_t, e_t-1, e_t-2, ..., e_t-q. The observation Y_t may depend on the realisation of previous errors e, so we regress against past errors as explanatory variables:

$Y_t = c + e_t - \theta_1 e_{t-1} - \theta_2 e_{t-2} - \dots - \theta_q e_{t-q}$

An ARIMA(0,0,q) model = MA(q) model. Invertibility conditions:
for q = 1: $-1 < \theta_1 < 1$
for q = 2: $-1 < \theta_2 < 1$, $\theta_2 + \theta_1 < 1$, $\theta_2 - \theta_1 < 1$
ARIMA Modelling: MA model patterns

ACF: 1st lag significant. PACF: a pattern in the PACF.

MA(1) model: $Y_t = c + e_t - \theta_1 e_{t-1}$ = ARIMA(0,0,1)
ARIMA Modelling: MA model patterns

ACF: 1st, 2nd & 3rd lag significant. PACF: a pattern in the PACF.

MA(3) model: $Y_t = c + e_t - \theta_1 e_{t-1} - \theta_2 e_{t-2} - \theta_3 e_{t-3}$ = ARIMA(0,0,3)
ARIMA Modelling: MA models

Moving average model of order one, ARIMA(0,0,1) = MA(1), e.g.

$Y_t = c + e_t - \theta_1 e_{t-1} = 10 + e_t + 0.2\, e_{t-1}$
ARIMA Modelling: mixture ARMA models

Complicated series may be modelled by combining AR & MA terms. ARMA(1,1) model = ARIMA(1,0,1) model:

$Y_t = c + \phi_1 Y_{t-1} + e_t - \theta_1 e_{t-1}$

Higher-order ARMA(p,q) models:

$Y_t = c + \phi_1 Y_{t-1} + \dots + \phi_p Y_{t-p} + e_t - \theta_1 e_{t-1} - \dots - \theta_q e_{t-q}$
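An ARMA(1,1) process can be simulated by combining the two recursions above; the parameter values phi = 0.7 and theta = 0.4 below are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(7)

def simulate_arma11(c=0.0, phi=0.7, theta=0.4, n=1000):
    """Simulate Y_t = c + phi*Y_{t-1} + e_t - theta*e_{t-1}."""
    e = rng.normal(size=n)
    y = np.zeros(n)
    y[0] = c + e[0]
    for t in range(1, n):
        y[t] = c + phi * y[t - 1] + e[t] - theta * e[t - 1]
    return y

y = simulate_arma11()
lag1 = np.corrcoef(y[1:], y[:-1])[0, 1]   # positive here, since the AR term dominates
```

In practice the parameters would be estimated rather than fixed; a library such as statsmodels provides ARIMA estimation following the Box-Jenkins steps.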
ARIMA Modelling: ARMA model patterns

ACF: 1st lag significant. PACF: 1st lag significant.

AR(1) and MA(1) model = ARMA(1,1) = ARIMA(1,0,1)
Agenda: Forecasting with Artificial Neural Networks. 1. Forecasting? (1. Forecasting as predictive Regression; 2. Time series prediction vs. causal prediction; 3. SARIMA-Modelling: Differencing, Autoregressive Terms, Moving Average Terms, Seasonal Terms, SARIMAX Seasonal ARIMA with Interventions; 4. Why NN for Forecasting?) 2. Neural Networks? 3. Forecasting with Neural Networks. 4. How to write a good Neural Network forecasting paper!
Seasonality in ARIMA Models

Identifying seasonal data: spikes in the ACF / PACF at the seasonal lags, e.g. t-12 & t-13 for yearly seasonality in monthly data, t-4 & t-5 for quarterly data.

Differences:
- Simple: $\Delta Y_t = (1-B) Y_t$
- Seasonal: $\Delta_s Y_t = (1-B^s) Y_t$, with s = seasonality, e.g. 4, 12

The data may require seasonal differencing to remove the seasonality. To identify the model, specify the seasonal parameters (P,D,Q): the seasonal autoregressive order P, the seasonal difference D and the seasonal moving average order Q, giving a seasonal ARIMA(p,d,q)(P,D,Q) model.
Seasonality in ARIMA Models: the ACF and PACF plots (with upper and lower confidence limits) show spikes at the seasonal lags; seasonal spikes every 12 lags indicate monthly data.
Seasonality in ARIMA Models: extension of the backshift-operator notation

$\Delta_s Y_t = Y_t - Y_{t-s} = Y_t - B^s Y_t = (1-B^s) Y_t$

A seasonal difference followed by a first difference: $(1-B)(1-B^s) Y_t$.

$(1 - \phi_1 B)(1 - \Phi_1 B^4)(1-B)(1-B^4)\, Y_t = c + (1 - \theta_1 B)(1 - \Theta_1 B^4)\, e_t$

with a non-seasonal AR(1), a seasonal AR(1), a non-seasonal difference, a seasonal difference, a non-seasonal MA(1) and a seasonal MA(1): a seasonal ARIMA(1,1,1)(1,1,1)_4 model.
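The differencing operators above are easy to apply by hand. A sketch on a synthetic quarterly series (trend plus an exact period-4 seasonal pattern; the concrete series is an illustrative construction, not from the slides):

```python
import numpy as np

t = np.arange(48, dtype=float)
y = 0.5 * t + 10.0 * np.sin(np.pi * t / 2)   # linear trend + exact period-4 seasonality

z_seasonal = y[4:] - y[:-4]                  # (1 - B^4) Y_t : removes the seasonal pattern
z_both = z_seasonal[1:] - z_seasonal[:-1]    # (1 - B)(1 - B^4) Y_t : also removes the trend
```

On this noise-free series the seasonal difference is the constant 2.0 (four quarters of trend), and the combined difference is zero: the series is stationary after d = 1, D = 1.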
Agenda: Forecasting with Artificial Neural Networks. 1. Forecasting? (1. Forecasting as predictive Regression; 2. Time series prediction vs. causal prediction; 3. SARIMA-Modelling: Differencing, Autoregressive Terms, Moving Average Terms, Seasonal Terms, SARIMAX Seasonal ARIMA with Interventions; 4. Why NN for Forecasting?) 2. Neural Networks? 3. Forecasting with Neural Networks. 4. How to write a good Neural Network forecasting paper!
Forecasting Models: time series analysis vs. causal modelling

Time series prediction (univariate): assumes that the data generating process creating the patterns can be explained from previous observations of the dependent variable alone.

Causal prediction (multivariate): the data generating process can be explained by the interaction of causal (cause-and-effect) independent variables.
Causal Prediction: ARX(p) models and general dynamic regression models.
Agenda: Forecasting with Artificial Neural Networks

1. Forecasting?
   1. Forecasting as predictive Regression
   2. Time series prediction vs. causal prediction
   3. SARIMA-Modelling
   4. Why NN for Forecasting?
2. Neural Networks?
3. Forecasting with Neural Networks
4. How to write a good Neural Network forecasting paper!
Why forecast with NN? Pattern or noise?

- Airline passenger data: seasonal, trended; the real model is disputed: multiplicative seasonality, or additive seasonality with level shifts?
- Fresh-products supermarket sales: seasonal, events, heteroscedastic noise; the real model is unknown.
Why forecast with NN? Pattern or noise?

- Random noise: iid (normally distributed: mean 0, std.dev. 1)
- BL(p,q) bilinear autoregressive model: $y_t = 0.7\, y_{t-1}\, \varepsilon_{t-2} + \varepsilon_t$
Why forecast with NN? Pattern or noise?

- TAR(p) threshold autoregressive model:
$y_t = 0.9\, y_{t-1} + \varepsilon_t$ for $y_{t-1} \le 1$; $y_t = -0.3\, y_{t-1} - \varepsilon_t$ for $y_{t-1} > 1$
- Random walk: $y_t = y_{t-1} + \varepsilon_t$
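Such "pattern or noise?" series are easy to generate and hard to tell apart by eye. A sketch of the two generators above (the signs in the extracted TAR equation are ambiguous, so this follows the reading with the regime switch on the last value; sample sizes and seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

def simulate_tar(n=500):
    """Threshold AR: the dynamics switch depending on the previous value."""
    y = np.zeros(n)
    for t in range(1, n):
        e = rng.normal()
        if y[t - 1] <= 1:
            y[t] = 0.9 * y[t - 1] + e
        else:
            y[t] = -0.3 * y[t - 1] - e
    return y

def simulate_random_walk(n=500):
    # y_t = y_{t-1} + e_t, i.e. the cumulative sum of white noise
    return np.cumsum(rng.normal(size=n))

tar, walk = simulate_tar(), simulate_random_walk()
```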
Motivation for NN in forecasting: nonlinearity!

The true data generating process is unknown & hard to identify, and many interdependencies in business are nonlinear. NN can approximate any LINEAR and NONLINEAR function to any desired degree of accuracy: they can learn linear time series patterns, learn nonlinear time series patterns, and extrapolate linear & nonlinear patterns = generalisation!

NN are nonparametric: they don't assume a particular noise process, e.g. Gaussian.

NN model (learn) linear and nonlinear processes directly from the data, approximating the underlying data generating process.

NN are a flexible forecasting paradigm.
Motivation for NN in forecasting: modelling flexibility

Unknown data processes require building many candidate models!

Flexibility in the input variables, through flexible coding:
- binary scale [0;1], [-1;1]
- nominal / ordinal scale (0, 1, 2, ..., 10), binary coded [0001, 0010, ...]
- metric scale (0.235; 7.35; 12440.0; ...)

Flexibility in the output variables:
- binary prediction of a single class membership
- nominal / ordinal prediction of multiple class memberships
- metric regression (point predictions) OR probability of class membership!

Any number of input variables, any number of output variables: one SINGLE network architecture serves MANY applications.
Applications of Neural Nets in diverse research fields: 2500+ journal publications on NN & forecasting alone! (Citation analysis by year, title = (neural AND net* AND (forecast* OR predict* OR time-ser* ...)), evaluated for sales-forecasting-related point predictions; publications grew steadily 1987-2003, linear trend fit R = 0.90362.)

- Neurophysiology: simulate & explain the brain
- Informatics: e-mail & URL filtering, virus scanning (Symantec Norton Antivirus), speech recognition & optical character recognition
- Engineering: control applications in plants, automatic target recognition (DARPA), explosive detection at airports, mineral identification (NASA Mars Explorer), starting & landing of jumbo jets (NASA)
- Meteorology / weather: rainfall prediction, El Nino effects
- Corporate business: credit card fraud detection, simulating forecasting methods

Number of publications by business forecasting domain: general business, marketing, finance, production, product sales, electrical load. Business forecasting domains include electrical load / demand, financial forecasting (currency / exchange rates, stock forecasting etc.) and sales forecasting.

Not all NN recommendations are useful for your DOMAIN!
21
IBF Benchmark - Forecasting Methods Used
[Figure: applied forecasting methods (all industries), share of respondents per method - averages, autoregressive methods, decomposition, exponential smoothing, trend extrapolation, econometric models, neural networks, regression, analogies, Delphi, PERT, surveys; reported shares range from 6% to 69%]
- Time series methods (objective): 61%
- Causal methods (objective): 23%
- Judgemental methods (subjective): 2x%
Survey at 5 IBF conferences in 2001; 240 forecasters, 13 industries.
NN are applied in corporate Demand Planning / S&OP processes! [Warning: limited sample size]
Agenda
Forecasting with Artificial Neural Networks
1. Forecasting?
2. Neural Networks?
   1. What are NN? Definition & Online Preview
   2. Motivation & brief history of Neural Networks
   3. From biological to artificial Neural Network Structures
   4. Network Training
3. Forecasting with Neural Networks
4. How to write a good Neural Network forecasting paper!
What are Artificial Neural Networks?
Artificial Neural Networks (NN):
- "a machine that is designed to model the way in which the brain performs a particular task ...; the network is implemented or ... simulated in software on a digital computer." [Haykin98]
- a class of statistical methods for information processing consisting of a large number of simple processing units (neurons), which exchange information about their activation via directed connections. [Zell97]
Input: time series observations, causal variables, image data (pixels/bits), fingerprints, chemical readouts.
Processing: black box (network of interconnected nodes).
Output: time series predictions, dependent variables, class memberships, class probabilities, principal components.
What are Neural Networks in Forecasting?
Artificial Neural Networks (NN) - a flexible forecasting paradigm:
- a class of statistical methods for time-series and causal forecasting
- highly flexible - processes arbitrary input-to-output relationships
- properties: nonlinear, nonparametric (assumed), error robust (not outlier robust!)
- data-driven modelling - learning directly from data
Input: nominal, interval or ratio scale (not ordinal) - time series observations, lagged variables, dummy variables, causal / explanatory variables.
Processing: black box (network of interconnected nodes).
Output: ratio scale regression - single period ahead or multiple periods ahead.
DEMO: Preview of Neural Network Forecasting
Simulation of NN for Business Forecasting - Airline Passenger Data Experiment:
- 3-layered NN (12-8-1): 12 input units, 8 hidden units, 1 output unit
- 12 input lags t, t-1, ..., t-11 (past 12 observations) - time series prediction
- t+1 forecast - single step ahead forecast
- benchmark time series [Brown / Box & Jenkins]: 132 observations, 11 years of monthly data
Demonstration: Preview of Neural Network Forecasting
NeuraLab Predict! - a look inside neural forecasting:
- errors on training / validation / test dataset
- time series versus neural network forecast, updated after each learning step
- in-sample observations & forecasts vs. out-of-sample (training / validation / test)
- PQ-diagram, absolute forecasting errors, time series actual values vs. NN forecasted values
Agenda
Forecasting with Artificial Neural Networks
1. Forecasting?
2. Neural Networks?
   1. What are NN? Definition & Online Preview
   2. Motivation & brief history of Neural Networks
   3. From biological to artificial Neural Network Structures
   4. Network Training
3. Forecasting with Neural Networks
4. How to write a good Neural Network forecasting paper!
Motivation for using NN - BIOLOGY!
Human & other nervous systems (animals, insects, e.g. bats):
- capable of various complex functions: perception, motor control, pattern recognition, classification, prediction etc.
- speed: e.g. detect & recognise a changed face in a crowd in 100-200 ms
- efficiency etc.
Brains are the most efficient & complex computers known to date:
                        Human brain                            Computer (PC)
Processing speed        10^-3 ms (0.25 MHz)                    10^-9 ms (2500 MHz PC)
Neurons/transistors     10 billion neurons, 10^3 billion conn. 50 million (PC chip)
Weight                  1500 grams                             kilograms to tons!
Energy consumption      10^-16 Joule                           10^-6 Joule
Computation: vision     100 steps                              billions of steps
Comparison: human = 10,000,000,000 neurons; ant = 20,000 neurons.
Brief History of Neural Networks
Developed in interdisciplinary research (McCulloch/Pitts 1943), motivated by the functions of natural neural networks - both neurobiological and application-oriented motivation. [Smith & Gupta, 2000]
Timeline:
- Turing 1936; McCulloch/Pitts 1943; Hebb 1949
- GE 1954: 1st computer payroll system; Minsky 1954: builds 1st neurocomputer
- Dartmouth Project 1956; Rosenblatt 1959
- Minsky/Papert 1969; INTEL 1971: 1st microprocessor; Kohonen 1972; Werbos 1974
- IBM 1981: introduces the PC; Rumelhart/Hinton/Williams 1986
- 1st IJCNN 1987; Neuralware founded 1987; 1st journals 1988; White 1988: 1st paper on forecasting
- SAS 1997: Enterprise Miner; IBM 1998: $70bn BI initiative
From neuroscience via applications in engineering (pattern recognition & control) to applications in business forecasting.
Research field of soft computing & artificial intelligence: neuroscience, mathematics, physics, statistics, information science, engineering, business management - different VOCABULARY: statistics versus neurophysiology!!!
Agenda
Forecasting with Artificial Neural Networks
1. Forecasting?
2. Neural Networks?
   1. What are NN? Definition & Online Preview
   2. Motivation & brief history of Neural Networks
   3. From biological to artificial Neural Network Structures
   4. Network Training
3. Forecasting with Neural Networks
4. How to write a good Neural Network forecasting paper!
Motivation & Implementation of Neural Networks
From biological neural networks to artificial neural networks.
Each node i collects the outputs o_j of its predecessors over weighted connections w_ij:
net_i = Σ_j w_ij o_j,   a_i = f(net_i),   o_i = a_i
e.g. with a hyperbolic tangent activation and bias θ_i:
o_i = tanh( Σ_j w_ji o_j − θ_i )
Mathematics as abstract representation of reality - used in software simulators, hardware, engineering etc., e.g. this MATLAB fragment inspecting a trained network before export:

    neural_net = eval(net_name);
    [num_rows, ins] = size(neural_net.iw{1});
    [outs, num_cols] = size(neural_net.lw{neural_net.numLayers, neural_net.numLayers-1});
    if strcmp(neural_net.adaptFcn, '')
        net_type = 'RBF';
    else
        net_type = 'MLP';
    end
    fid = fopen(path, 'w');
Information Processing in biological Neurons
Modelling of biological functions in neurons:
- 10-100 billion neurons with ~10,000 connections each in the brain
- input (sensory), processing (internal) & output (motoric) neurons
CONCEPT of information processing in neurons.
Alternative Notations - Information Processing in Neurons / Nodes
The biological representation, the graphical notation and the mathematical representation all describe the same node u_i:
- inputs in_1, ..., in_j arrive over weighted connections w_i,1, ..., w_i,j (θ_i = bias / threshold)
- input function: net_i = Σ_j w_ij in_j
- activation function: a_i = f(net_i − θ_i)
- output: out_i
Mathematical representation of a binary threshold node:
y_i = 1 if Σ_j w_ji x_j − θ_i ≥ 0
y_i = 0 if Σ_j w_ji x_j − θ_i < 0
alternative: y_i = tanh( Σ_j w_ji x_j − θ_i )
Information Processing in artificial Nodes
CONCEPT of information processing in neurons - unidirectional:
- input function (summation of incoming signals): net_i = Σ_j w_ij in_j
- activation function (nonlinear): binary step function {0;1}, or sigmoid functions (logistic, hyperbolic tangent etc.): a_i = f(net_i)
- output function (linear / identity, SoftMax ...): o_i = a_i
For a binary threshold node u_i:
out_i = 1 if Σ_j w_ji o_j − θ_i ≥ 0
out_i = 0 if Σ_j w_ji o_j − θ_i < 0
Input Functions
Binary Activation Functions
Binary activation calculated from the input:
a_j = f_act(net_j, θ_j),   e.g.   a_j = f_act(net_j − θ_j)
Information Processing: Node Threshold Logic
Node function - BINARY THRESHOLD LOGIC:
1. weight each input by its connection strength
2. sum the weighted inputs
3. subtract the bias (threshold) term
4. calculate the node output through the BINARY transfer function
RERUN with next input.
Worked example with inputs o_1 = 2.2, o_2 = 4, o_3 = 1, weights w_1,i = 0.71, w_2,i = -1.84, w_3,i = 9.01 and bias θ = 8.0:
2.2 · 0.71 + 4.0 · (-1.84) + 1.0 · 9.01 = 3.212
3.212 − 8.0 = −4.788
−4.788 < 0, so o_j = 0.00
net_i = Σ_j w_ij o_j,   a_i = f(net_i)
o_i = 1 if Σ_j w_ji o_j − θ_i ≥ 0, else o_i = 0
Continuous Activation Functions
Activation calculated from the input:
a_j = f_act(net_j, θ_j),   e.g.   a_j = f_act(net_j − θ_j)
e.g. the hyperbolic tangent or the logistic function.
Information Processing: Node Threshold Logic (continuous)
Node function - sigmoid THRESHOLD LOGIC with tanh activation function:
1. weight each input by its connection strength
2. sum the weighted inputs
3. subtract the bias (threshold) term
4. calculate the node output through the CONTINUOUS transfer function
RERUN with next input.
Same example with inputs 2.2, 4, 1, weights 0.71, -1.84, 9.01 and bias 8.0:
2.2 · 0.71 + 4.0 · (-1.84) + 1.0 · 9.01 = 3.212
3.212 − 8.0 = −4.788
tanh(−4.788) ≈ −0.9999, so o_j ≈ −0.9999
net_i = Σ_j w_ij o_j,   a_i = f(net_i),   o_i = tanh( Σ_j w_ji o_j − θ_i )
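The two worked examples above can be reproduced in a few lines. This is a minimal Python sketch; the inputs, weights and bias are taken from the slide, while the function names are chosen for illustration:

```python
import math

def node_output(inputs, weights, bias, activation):
    """Weighted sum of the inputs minus the bias (threshold), passed through an activation."""
    net = sum(w * x for w, x in zip(weights, inputs)) - bias
    return activation(net)

def binary_step(net):
    """Binary threshold transfer function."""
    return 1.0 if net >= 0.0 else 0.0

inputs, weights, bias = [2.2, 4.0, 1.0], [0.71, -1.84, 9.01], 8.0
binary_out = node_output(inputs, weights, bias, binary_step)  # 0.0, as on the slide
tanh_out = node_output(inputs, weights, bias, math.tanh)      # roughly -0.9999
```

Swapping only the activation function turns the binary node into the continuous tanh node, which is exactly the point of the two slides.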
A new Notation - GRAPHICS!
Simple linear regression as an equation:
y = β_0 + β_1 x_1 + β_2 x_2 + ... + β_n x_n + ε
Simple linear regression as a directed graph: the inputs 1, x_1, x_2, ..., x_n feed one output node over connections β_0, β_1, ..., β_n - unidirectional information processing.
Why Graphical Notation?
A simple neural network equation without recurrent feedbacks:
y_k = tanh( Σ_j w_kj tanh( Σ_i w_ji tanh( Σ_h w_ih x_h − θ_i ) − θ_j ) − θ_k ) → min!
with the hyperbolic tangent written out as
tanh( Σ_i x_i w_ij − θ_j ) = ( e^(Σ_i x_i w_ij − θ_j) − e^−(Σ_i x_i w_ij − θ_j) ) / ( e^(Σ_i x_i w_ij − θ_j) + e^−(Σ_i x_i w_ij − θ_j) )
The directed graph is a simplification for complex models!
Combination of Nodes
Simple processing per node - the combination of simple nodes creates complex behaviour.
Every hidden and output node j computes the same expression on its incoming signals o_i:
o_j = tanh( Σ_{i=1..N} o_i w_ij − θ_j ) = ( e^(Σ_i o_i w_ij − θ_j) − e^−(Σ_i o_i w_ij − θ_j) ) / ( e^(Σ_i o_i w_ij − θ_j) + e^−(Σ_i o_i w_ij − θ_j) )
Architecture of Multilayer Perceptrons
Architecture of a Multilayer Perceptron - the classic form of feedforward neural network!
- neurons u_n (units / nodes) ordered in layers: input layer, hidden layer(s), output layer
- unidirectional connections with trainable weights w_n,n
- vector of input signals x_i (input), vector of output signals o_j (output)
Combination of neurons = neural network:
o_k = tanh( Σ_j w_kj tanh( Σ_i w_ji tanh( Σ_h w_ih o_h − θ_i ) − θ_j ) − θ_k ) → min!
Dictionary for Neural Network Terminology
Due to its neuro-biological origins, NN use specific terminology:
- Input nodes = independent / lagged variables
- Output node(s) = dependent variable(s)
- Training = parameterisation
- Weights = parameters
Don't be confused: ASK!
Agenda
Forecasting with Artificial Neural Networks
1. Forecasting?
2. Neural Networks?
   1. What are NN? Definition & Online Preview
   2. Motivation & brief history of Neural Networks
   3. From biological to artificial Neural Network Structures
   4. Network Training
3. Forecasting with Neural Networks
4. How to write a good Neural Network forecasting paper!
Hebbian Learning
HEBB introduced the idea of learning by adapting the weights:
Δw_ij = η o_i a_j
Delta learning rule of Widrow-Hoff:
Δw_ij = η o_i (t_j − a_j) = η o_i (t_j − o_j) = η o_i δ_j
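The Widrow-Hoff delta rule above can be sketched directly. This is an illustrative Python fragment (function name and learning rate are my choices, not from the slides); applied repeatedly to one pattern, the error t_j − o_j shrinks towards zero:

```python
def delta_rule_step(weights, inputs, target, eta=0.1):
    """One delta-rule update for a single linear node: w_ij += eta * o_i * (t_j - o_j)."""
    output = sum(w * x for w, x in zip(weights, inputs))
    error = target - output
    new_weights = [w + eta * x * error for w, x in zip(weights, inputs)]
    return new_weights, error

w = [0.0, 0.0]
for _ in range(50):
    w, err = delta_rule_step(w, [1.0, 2.0], 1.0)
```

After a few dozen updates the node output matches the target almost exactly for this single pattern.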
Neural Network Training with Back-Propagation
Training = LEARNING FROM EXAMPLES:
1. Initialise the connections with randomised weights (symmetry breaking)
2. Show the first input pattern (independent variables)
3. Forward propagation of the input values through to the output layer
4. Calculate the error between the NN output o and the actual value t (using the error / objective function), E = o − t
5. Backward propagation of the errors, from the output layer for each weight back to the input layer
RERUN with next input pattern.
(Input vector x → weight matrix W → output vector o, compared against the teaching output t.)
Neural Network Training
Simple back-propagation algorithm [Rumelhart et al. 1986]
Error for pattern p: E_p = C(t_pj, o_pj), with o_pj = f_j(net_pj).
Gradient descent on each weight:
Δ_p w_ji ∝ − ∂C(t_pj, o_pj) / ∂w_ji
∂C/∂w_ji = ( ∂C/∂net_pj ) ( ∂net_pj/∂w_ji )
Define the error signal δ_pj = − ∂C/∂net_pj. Since net_pj = Σ_i w_ji o_pi, we have ∂net_pj/∂w_ji = o_pi, so the weight update is Δ_p w_ji = η δ_pj o_pi.
For an output unit j:
δ_pj = − ( ∂C/∂o_pj ) ( ∂o_pj/∂net_pj ) = − ( ∂C/∂o_pj ) f_j'(net_pj)
For a hidden unit j, the chain rule runs over all successor units k:
− ∂C/∂o_pj = − Σ_k ( ∂C/∂net_pk ) ( ∂net_pk/∂o_pj ) = Σ_k δ_pk w_kj
so δ_pj = f_j'(net_pj) Σ_k δ_pk w_kj.
Combined:
δ_pj = − ( ∂C/∂o_pj ) f_j'(net_pj)       if unit j is in the output layer
δ_pj = f_j'(net_pj) Σ_k δ_pk w_kj        if unit j is in a hidden layer
Special case: squared error and logistic activation f(net_j) = 1 / (1 + e^−net_j), whose derivative is f'(net_j) = o_j (1 − o_j). Then Δw_ij = η o_i δ_j with
δ_j = o_j (1 − o_j) (t_j − o_j)          for output nodes j
δ_j = o_j (1 − o_j) Σ_k ( δ_k w_jk )     for hidden nodes j
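The logistic special case of the derivation can be sketched as one training step for a one-hidden-layer MLP. This is an illustrative Python implementation of the delta formulas above, not the tutorial's simulator; bias terms are omitted to keep the sketch short, and all names are my own:

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def backprop_step(x, t, W_h, W_o, eta=0.5):
    """One pattern update for a 1-hidden-layer logistic MLP (sketch, no bias terms).
    W_h: hidden weights (one row per hidden node), W_o: output weights."""
    h = [logistic(sum(w * xi for w, xi in zip(row, x))) for row in W_h]
    o = [logistic(sum(w * hj for w, hj in zip(row, h))) for row in W_o]
    # output layer: delta_j = o_j (1 - o_j) (t_j - o_j)
    d_o = [oj * (1 - oj) * (tj - oj) for oj, tj in zip(o, t)]
    # hidden layer: delta_j = h_j (1 - h_j) * sum_k delta_k w_kj
    d_h = [h[j] * (1 - h[j]) * sum(d_o[k] * W_o[k][j] for k in range(len(W_o)))
           for j in range(len(h))]
    # weight updates: w_ij += eta * delta_j * o_i
    W_o = [[w + eta * dk * hj for w, hj in zip(row, h)] for row, dk in zip(W_o, d_o)]
    W_h = [[w + eta * dj * xi for w, xi in zip(row, x)] for row, dj in zip(W_h, d_h)]
    err = sum((tj - oj) ** 2 for tj, oj in zip(t, o))
    return W_h, W_o, err

W_h, W_o = [[0.1, 0.2], [0.3, -0.1]], [[0.2, -0.4]]
errs = []
for _ in range(1000):
    W_h, W_o, e = backprop_step([1.0, 0.5], [0.8], W_h, W_o)
    errs.append(e)
```

Repeating the step on a single pattern drives the squared error towards zero, which is the behaviour the derivation predicts.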
Neural Network Training = Error Minimisation
Minimise the error by changing ONE weight w_j:
[Figure: error surface E(w_j) over w_j - different random starting points lead to different local minima; only one of them is the GLOBAL minimum]
Error Backpropagation = Gradient Descent in 3D+
Local search on a multi-dimensional error surface - the task of finding the deepest valley in the mountains:
- local search with fixed step size, following the steepest descent
- local optimum = any valley; global optimum = deepest valley with the lowest error
- varies with the error surface
[Figure: 3D error surfaces with multiple local minima]
Demo: Neural Network Forecasting revisited!
Simulation of NN for Business Forecasting - Airline Passenger Data Experiment:
- 3-layered NN (12-8-1): 12 input units, 8 hidden units, 1 output unit
- 12 input lags t, t-1, ..., t-11 (past 12 observations) - time series prediction
- t+1 forecast - single step ahead forecast
- benchmark time series [Brown / Box & Jenkins]: 132 observations, 11 years of monthly data
Agenda
Forecasting with Artificial Neural Networks
1. Forecasting?
2. Neural Networks?
3. Forecasting with Neural Networks
   1. NN models for Time Series & Dynamic Causal Prediction
   2. NN experiments
   3. Process of NN modelling
4. How to write a good Neural Network forecasting paper!
Time Series Prediction with Artificial Neural Networks
ANN are universal approximators. [Hornik/Stinchcombe/White etc.]
Forecasts as an application of (nonlinear) function approximation; various architectures for prediction (time series, causal, combined ...):
y_{t+h} = f(x_t) + ε_{t+h}
where y_{t+h} = forecast for t+h, f(·) = linear / nonlinear function, x_t = vector of observations in t, ε_{t+h} = independent error term in t+h.
- single neuron / node → nonlinear AR(p)
- feedforward NN (MLP etc.) → hierarchy of nonlinear AR(p)
- recurrent NN (Elman, Jordan) → nonlinear ARMA(p,q)
y_{t+1} = f( y_t, y_{t-1}, y_{t-2}, ..., y_{t-n+1} ) - nonlinear autoregressive AR(p) model
Neural Network Training on Time Series
Sliding window approach of presenting the data:
- Input: present a new data pattern (window of past observations) to the neural network
- Forward pass: calculate the neural network output from the input values
- Compare the neural network forecast against the actual value
- Backpropagation: change the weights to reduce the output forecast error
- New data input: slide the window forward to show the next pattern
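The sliding window step can be sketched as a small helper that turns a series into (input window, target) training pairs. This is an illustrative Python fragment (function name is my choice); the sample values happen to be the opening observations of the airline passenger benchmark:

```python
def sliding_windows(series, p):
    """Split a series into (input window, one-step-ahead target) training pairs."""
    return [(series[i:i + p], series[i + p]) for i in range(len(series) - p)]

patterns = sliding_windows([112, 118, 132, 129, 121, 135], p=3)
# first pattern: inputs [112, 118, 132] -> target 129
```

Each pattern is one forward pass / backpropagation step; sliding the window by one observation yields the next pattern.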
Neural Network Architectures for Linear Autoregression
Interpretation: the weights represent autoregressive terms - same problems / shortcomings as standard AR models!
Extensions: multiple output nodes = simultaneous autoregression models; nonlinearity through a different activation function in the output node.
y_{t+1} = f( y_t, y_{t-1}, y_{t-2}, ..., y_{t-n+1} )
y_{t+1} = y_t w_{t,j} + y_{t-1} w_{t-1,j} + y_{t-2} w_{t-2,j} + ... + y_{t-n+1} w_{t-n+1,j} − θ_j - linear autoregressive AR(p) model
Neural Network Architecture for Nonlinear Autoregression
y_{t+1} = tanh( Σ_{i=t-n+1..t} y_i w_ij − θ_j )
y_{t+1} = f( y_t, y_{t-1}, y_{t-2}, ..., y_{t-n+1} ) - nonlinear autoregressive AR(p) model
Extensions: additional layers with nonlinear nodes; linear activation function in the output layer.
Neural Network Architectures for Nonlinear Autoregression
Interpretation: an autoregressive AR(p) modelling approach WITHOUT the moving-average error terms of a nonlinear ARIMA - similar problems / shortcomings as standard AR models!
Extensions: multiple output nodes = simultaneous autoregression models.
y_{t+1} = tanh( Σ_k w_kj tanh( Σ_i w_ki tanh( Σ_h w_ih y_{t-h} − θ_i ) − θ_k ) − θ_j )
y_{t+1} = f( y_t, y_{t-1}, y_{t-2}, ..., y_{t-n+1} ) - nonlinear autoregressive AR(p) model
Neural Network Architectures for Multiple Step Ahead Nonlinear Autoregression
Interpretation: as the single-step autoregressive AR(p) model, but with several output nodes forecasting successive horizons:
y_{t+1}, y_{t+2}, ..., y_{t+h} = f( y_t, y_{t-1}, y_{t-2}, ..., y_{t-n+1} ) - nonlinear autoregressive AR(p) model
Neural Network Architectures for Forecasting - Nonlinear Autoregression Intervention Model
Interpretation: as the single autoregressive AR(p) model, with an additional event term as input to explain external events.
Extensions: multiple output nodes = simultaneous multiple regression.
y_{t+1}, y_{t+2}, ..., y_{t+h} = f( y_t, y_{t-1}, y_{t-2}, ..., y_{t-n+1}, I_t ) - nonlinear autoregressive ARX(p) model (I_t = event / intervention input)
Neural Network Architecture for Linear Regression
Causal inputs (e.g. max temperature, rainfall, sunshine hours) predict demand:
y = f( x_1, x_2, x_3, ..., x_n )
y = x_1 w_{1j} + x_2 w_{2j} + x_3 w_{3j} + ... + x_n w_{nj} − θ_j - linear regression model
Neural Network Architectures for Non-Linear Regression (Logistic Regression)
Causal inputs (e.g. max temperature, rainfall, sunshine hours) predict demand through a logistic output node:
y = f( x_1, x_2, x_3, ..., x_n )
y = logistic( Σ_i x_i w_ij − θ_j ) - nonlinear multiple (logistic) regression model
Neural Network Architectures for Non-linear Regression
Interpretation: similar to linear multiple regression modelling.
- without nonlinearity in the output node: a weighted expert regime on nonlinear regressions
- with nonlinearity in the output layer: ???
y = f( x_1, x_2, x_3, ..., x_n ) - nonlinear regression model
Classification of Forecasting Methods
Forecasting Methods:
- Objective forecasting methods
  - Time series methods: averages, moving averages, naive methods, exponential smoothing (simple ES, linear ES, seasonal ES, dampened trend ES), simple regression, autoregression, ARIMA, neural networks
  - Causal methods: linear regression, multiple regression, dynamic regression, vector autoregression, intervention models, neural networks
- Subjective forecasting methods: prophecy, educated guessing, sales force composite, analogies, Delphi, PERT, survey techniques
Demand planning practice: objective methods + subjective correction.
Neural networks ARE time series methods & causal methods, and CAN be used as averages & ES / regression.
Different Model Classes of Neural Networks
Since the 1960s a variety of NN were developed for different tasks: classification, optimisation, forecasting.
Different CLASSES of neural networks exist for forecasting alone - the focus here is only on the original Multilayer Perceptrons!
Problem!
MLP is the most common NN architecture used. MLPs with a sliding window can ONLY capture nonlinear seasonal autoregressive processes nSAR(p,P). BUT:
- they can model MA(q) processes through an extended AR(p) window!
- they can model SARMAX processes through recurrent NN.
Agenda
Forecasting with Artificial Neural Networks
1. Forecasting?
2. Neural Networks?
3. Forecasting with Neural Networks
   1. NN models for Time Series & Dynamic Causal Prediction
   2. NN experiments
   3. Process of NN modelling
4. How to write a good Neural Network forecasting paper!
Time Series Prediction with Artificial Neural Networks
Which time series patterns can ANNs learn & extrapolate? [Pegels69/Gardner85]
??? - Simulation of neural network prediction of artificial time series.
Time Series Demonstration - Artificial Time Series
Simulation of NN in Business Forecasting with NeuroPredictor.
Experiment: prediction of artificial time series (Gaussian noise):
- stationary time series
- seasonal time series
- linear trend time series
- trend with additive seasonality time series
Time Series Prediction with Artificial Neural Networks
Which time series patterns can ANNs learn & extrapolate? [Pegels69/Gardner85]
Neural networks can forecast ALL major time series patterns:
- NO time-series-dependent preprocessing / integration necessary
- NO time-series-dependent MODEL SELECTION required!!!
- SINGLE MODEL APPROACH FEASIBLE!
Time Series Demonstration A - Lynx Trappings
Simulation of NN in Business Forecasting.
Experiment: lynx trappings at the McKenzie River:
- 3-layered NN (12-8-1): 12 input units, 8 hidden units, 1 output unit
- different lag structures: t, t-1, ..., t-11 (past 12 observations)
- t+1 forecast - single step ahead forecast
- benchmark time series [Andrews / Herzberg]: 114 observations; periodicity? 8 years?
Time Series Demonstration B - Event Model
Simulation of NN in Business Forecasting.
Experiment: mouthwash sales:
- 3-layered NN (12-8-1): 12 input units, 8 hidden units, 1 output unit
- 12 input lags t, t-1, ..., t-11 (past 12 observations) - time series prediction
- t+1 forecast - single step ahead forecast
Spurious autocorrelations from marketing events: advertisements with a small lift, price reductions with a high lift.
Time Series Demonstration C - Supermarket Sales
Simulation of NN in Business Forecasting.
Experiment: supermarket sales of fresh products with weather data:
- 4-layered NN (7-4-4-1): 7 input units, two hidden layers of 4 units, 1 output unit
- different lag structures: t, t-1, ..., t-7 (past 8 observations)
- t+4 forecast - a single forecast four periods ahead
Agenda
Forecasting with Artificial Neural Networks
1. Forecasting?
2. Neural Networks?
3. Forecasting with Neural Networks
   1. NN models for Time Series & Dynamic Causal Prediction
   2. NN experiments
   3. Process of NN modelling
      1. Preprocessing
      2. Modelling NN Architecture
      3. Training
      4. Evaluation
4. How to write a good Neural Network forecasting paper!
Decisions in Neural Network Modelling
NN modelling process - manual decisions require expert knowledge:
- Data pre-processing: transformation, scaling, normalising to [0;1] or [-1;1]
- Modelling of NN architecture: number of INPUT nodes, number of HIDDEN nodes, number of HIDDEN LAYERS, number of OUTPUT nodes, information processing in nodes (activation functions), interconnection of nodes
- Training: initialisation of weights (how often?), training method (backprop, higher order ...), training parameters, evaluation of the best model (early stopping)
- Application of the neural network model & evaluation: evaluation criteria & selected dataset
Modeling Degrees of Freedom
A variety of parameters must be pre-determined for ANN forecasting (the original slide writes this as a nested tuple; summarised here):
- D = dataset: data selection DSE, data sampling DSA
- P = preprocessing: correction C, normalization N, scaling S
- A = architecture: number of input nodes NI, number of hidden nodes NS, number of hidden layers NL, number of output nodes NO, connectivity / weight matrix K, activation strategy T
- U = signal processing: input function FI, activation function FA, output function FO
- L = learning algorithm: choice of algorithm G, learning parameters per phase & layer P_{T,L}, initialisation procedure IP, number of initialisations IN, stopping method & parameters B, objective function O
Interactions & interdependencies exist between these parameter choices!
Heuristics to Reduce Design Complexity
Number of hidden nodes in MLPs (as a function of the number of input nodes n):
- 2n+1 [Lippmann87; Hecht-Nielsen90; Zhang/Patuwo/Hu98]
- 2n [Wong91]; n [Tang/Fishwick93]; n/2 [Kang91]
- 0.75n [Bailey90]; 1.5n to 3n [Kaastra/Boyd96]
Activation function and preprocessing:
- logistic in hidden & output layers [Tang/Fishwick93; Lattermacher/Fuller95; Sharda/Patil92]
- hyperbolic tangent in hidden & output layers [Zhang/Hutchinson93; DeGroot/Wurtz91]
- linear output nodes [Lapedes/Farber87; Weigend89-91; Wong90]
... with interdependencies!
- no research on the relative performance of all alternatives
- no empirical results to support the preference of a single heuristic
- ADDITIONAL SELECTION PROBLEM of choosing a HEURISTIC
- INCREASED COMPLEXITY through interactions of heuristics
- AVOID the selection problem through EXHAUSTIVE ENUMERATION
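The hidden-node heuristics above can be enumerated mechanically for a given number of input nodes. This is an illustrative Python sketch (rule labels and rounding are my choices); as the slide stresses, these are only starting points, and candidates should still be compared empirically:

```python
def hidden_node_candidates(n):
    """Candidate hidden-layer sizes for n input nodes, one per published heuristic."""
    return {
        "2n+1": 2 * n + 1,          # Lippmann87; Hecht-Nielsen90; Zhang/Patuwo/Hu98
        "2n": 2 * n,                # Wong91
        "n": n,                     # Tang/Fishwick93
        "n/2": max(1, n // 2),      # Kang91
        "0.75n": max(1, round(0.75 * n)),  # Bailey90
        "1.5n": round(1.5 * n),     # Kaastra/Boyd96 (lower bound)
        "3n": 3 * n,                # Kaastra/Boyd96 (upper bound)
    }

candidates = hidden_node_candidates(12)  # e.g. for the (12-8-1) airline network
```

For n = 12 inputs the rules already disagree by a factor of six (6 to 36 hidden nodes), which illustrates the selection problem the slide warns about.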
Tips & Tricks in Data Sampling
Dos and Don'ts:
- Random order sampling? Yes!
- Sampling with replacement? Depends - try!
- Data splitting: ESSENTIAL!!! Training & validation data for identification, parameterisation & selection; test data for ex ante evaluation (ideally multiple windows / origins!)
Simulation experiments.
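The essential training / validation / test split can be sketched as a sequential partition that keeps the test block at the end, so the evaluation stays ex ante. This is an illustrative Python fragment with my own function name and default proportions:

```python
def split_series(series, train=0.6, valid=0.2):
    """Sequential train/validation/test split; the test block is the last segment."""
    n = len(series)
    i, j = int(n * train), int(n * (train + valid))
    return series[:i], series[i:j], series[j:]

train_set, valid_set, test_set = split_series(list(range(10)))
```

For time series the split must be sequential rather than random, otherwise future observations leak into training.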
Data Preprocessing
Data transformation - modification of the data to enhance accuracy & speed:
- verification, correction & editing (data entry errors etc.)
- coding of variables, scaling of variables
- selection of independent variables (PCA)
- outlier removal, missing value imputation
Data coding: binary coding of external events - n and n-1 coding have no significant impact; n-coding appears to be more robust (despite issues of multicollinearity).
Data Preprocessing - Variable Scaling
Scaling of variables: features on incomparable scales (e.g. income in the tens of thousands vs. hair length below 1) are mapped to a common interval such as [0;1].
Linear interval scaling:
y = I_lower + ( I_upper − I_lower ) · ( x − min(x) ) / ( max(x) − min(x) )
Interval features, e.g. turnover [28.12; 70; 32; 25.05; 10.17], are scaled linearly to a target interval, e.g. [-1;1].
e.g. x = 72, max(x) = 119.95, min(x) = 0, target [-1;1]:
y = −1 + (1 − (−1)) · (72 − 0) / (119.95 − 0) = −1 + 144/119.95 ≈ 0.2005
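The linear interval scaling formula and the worked example translate directly into code. This is an illustrative Python fragment (function name and defaults are my choices):

```python
def interval_scale(x, x_min, x_max, lower=-1.0, upper=1.0):
    """Linear interval scaling of x from [x_min, x_max] to [lower, upper]."""
    return lower + (upper - lower) * (x - x_min) / (x_max - x_min)

y = interval_scale(72, 0, 119.95)  # roughly 0.2005, as in the worked example
```

The same function also covers [0;1] scaling by passing lower=0.0 and upper=1.0.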
Data Preprocessing - Variable Scaling (Standardisation)
Standardisation / normalisation:
y = ( x − μ ) / σ
Attention: interaction of the scaling interval with the activation function - logistic [0;1], TanH [-1;1].
Data Preprocessing - Outliers
Outliers - extreme values, coding errors, data errors - have the potential to bias the analysis.
Impact on scaled variables: with linear interval scaling (no normalisation / standardisation), a single outlier (e.g. 253 in a series of values around 0-10) compresses all regular values into a narrow corner of the target interval [-1;+1].
Actions:
- eliminate outliers (delete records)
- replace / impute values as missing values
- binning of the variable = rescaling
- normalisation of the variables = scaling
Data Preprocessing - Skewed Distributions
Asymmetry of observations:
- transform the data (functional transformation of values): linearisation or normalisation
- rescale (DOWNSCALE) the data to allow better analysis by binning (grouping of data into groups) → ordinal scale!
Data Preprocessing - Data Encoding
Downscaling & coding of variables:
- metric variables: create bins/buckets of ordinal variables (= BINNING) - buckets of equally spaced intervals, or bins/quantiles with equal frequencies
- ordinal variables of n values: rescale to n or n-1 nominal binary variables
- nominal variables of n values, e.g. {Business Press, Sports & Fun, Woman}: rescale to n or n-1 binary variables
  1-of-N coding (3 new bit variables): 1 0 0 = Business Press; 0 1 0 = Sports & Fun; 0 0 1 = Woman
  1-of-N-1 coding (2 new bit variables): 1 0 = Business Press; 0 1 = Sports & Fun; 0 0 = Woman
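The 1-of-N and 1-of-N-1 codings above can be sketched in a few lines. This is an illustrative Python fragment (function name is my choice); in the N-1 variant the last category becomes the all-zeros reference class, matching the slide's example:

```python
def one_of_n(value, categories, drop_last=False):
    """1-of-N binary coding; with drop_last, 1-of-N-1 (last category = all zeros)."""
    bits = [1 if value == c else 0 for c in categories]
    return bits[:-1] if drop_last else bits

cats = ["Business Press", "Sports & Fun", "Woman"]
one_of_n("Sports & Fun", cats)           # [0, 1, 0]
one_of_n("Woman", cats, drop_last=True)  # [0, 0]  (reference class)
```

The N-1 variant avoids the perfect multicollinearity of the full coding at the price of an implicit reference class.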
Data Preprocessing - Impute Missing Values
Missing values - a missing feature value for an instance; some methods interpret it as 0! Others create a special class for missing values.
Solutions:
- missing value on an interval scale → set the mean, median, etc.
- missing value on a nominal scale → most prominent value of the feature
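Mean imputation for an interval-scale feature, the first solution above, can be sketched as follows (illustrative Python, my own function name; missing values represented as None):

```python
def impute_mean(values):
    """Replace None entries with the mean of the observed (non-missing) values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

imputed = impute_mean([1.0, None, 3.0])  # [1.0, 2.0, 3.0]
```

For a nominal feature the analogous step would substitute the modal (most frequent) category instead of the mean.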
Tips & Tricks in Data Pre-Processing
Dos and Don'ts:
- De-seasonalisation? NO! (maybe you can try!)
- De-trending / integration? NO / depends / preprocessing!
- Normalisation? Not necessarily - correct outliers!
- Scaling intervals [0;1] or [-1;1]? Both OK!
- Apply headroom in scaling? YES!
- Interaction between scaling & preprocessing? Limited.
Simulation experiments.
Outlier Correction in Neural Network Forecasts?
Outlier correction? YES! Neural networks are often characterised as fault tolerant and robust, showing graceful degradation regarding errors - but does fault tolerance equal outlier resistance in time series prediction?
Simulation experiments.
Architecture decisions:
- Number of OUTPUT nodes: given by the problem domain!
- Number of HIDDEN LAYERS: 1 or 2 - depends on the information processing in the nodes
- Number of HIDDEN nodes: also depends on the nonlinearity & continuity of the time series - trial & error, sorry!
- Information processing in nodes (activation functions): Sig-Id, Sig-Sig (bounded & additional nonlinear layer), TanH-Id, TanH-TanH (bounded & additional nonlinear layer)
- Interconnection of nodes: ???
Tips & Tricks in Architecture Modelling
Dos and Don'ts:
- Number of input nodes? DEPENDS! Use the linear ACF/PACF to start!
- Number of hidden nodes? DEPENDS! Evaluate each time (few)
- Number of output nodes? DEPENDS on the application!
- Fully or sparsely connected networks? ???
- Shortcut connections? ???
- Activation functions - logistic or hyperbolic tangent? TanH!!!
- Activation function in the output layer? TanH or identity!
Simulation experiments.
Agenda
Forecasting with Artificial Neural Networks
1. Forecasting?
2. Neural Networks?
3. Forecasting with Neural Networks
   1. NN models for Time Series & Dynamic Causal Prediction
   2. NN experiments
   3. Process of NN modelling
      1. Preprocessing
      2. Modelling NN Architecture
      3. Training
      4. Evaluation & Selection
4. How to write a good Neural Network forecasting paper!
Tips & Tricks in Network Training
Dos and Don'ts:
- Initialisations? A MUST! Minimum 5-10 times!!!
[Figure: error variation (MAE, lowest vs. highest error) by number of initialisations, from 1 to 1600]
Simulation experiments.
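The multiple-initialisation advice above can be sketched as a small driver that trains k networks from different random starting weights and keeps the one with the lowest validation error. This is an illustrative Python fragment; the train_fn interface is hypothetical, standing in for whatever training routine is actually used:

```python
import random

def best_of_k_inits(train_fn, k=10, seed=0):
    """Train k models from different random initialisations, keep the lowest error.
    train_fn(rng) is a hypothetical interface returning (model, validation_error)."""
    rng = random.Random(seed)
    results = [train_fn(random.Random(rng.random())) for _ in range(k)]
    return min(results, key=lambda r: r[1])

def fake_train(rng):
    """Stand-in trainer: pretends the validation error is a random draw."""
    return ("model", rng.random())

model, err = best_of_k_inits(fake_train, k=5, seed=1)
```

Because each initialisation lands in a different valley of the error surface, the best-of-k error can only improve (never worsen) as k grows, which is what the figure on the slide shows.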
Tips & Tricks in Network Training & Selection
Dos and Don'ts:
- Initialisations? A MUST! Minimum 5-10 times!!!
- Selection of the training algorithm? Backprop OK, DBD OK - not higher-order methods!
- Parameterisation of the training algorithm? DEPENDS on the dataset!
- Use of early stopping? YES - careful with the stopping criteria!
- Suitable backpropagation training parameters (to start with): learning rate 0.5 (always