Transcript of "Time Series Forecasting With Feed-Forward Neural Networks: Guidelines And Limitations"

Eric Plummer, Computer Science Department, University of Wyoming

Page 1: Time Series Forecasting With Feed-Forward Neural Networks: Guidelines And Limitations

Eric Plummer
Computer Science Department
University of Wyoming
April 8, 2023

Page 2: Topics

• Thesis Goals
• Time Series Forecasting
• Neural Networks
• K-Nearest-Neighbor
• Test-Bed Application
• Empirical Evaluation
• Data Preprocessing
• Contributions
• Future Work
• Conclusion
• Demonstration

Page 3: Thesis Goals

• Compare neural networks and k-nearest-neighbor for time series forecasting

• Analyze the response of various configurations to data series with specific characteristics

• Identify when neural networks and k-nearest-neighbor are inadequate

• Evaluate the effectiveness of data preprocessing

Page 4: Time Series Forecasting – Description

• What is it?
  – Given an existing data series, observe or model the data series to make accurate forecasts
• Example data series:
  – Financial (e.g., stocks, rates)
  – Physically observed (e.g., weather, sunspots)
  – Mathematical (e.g., Fibonacci sequence)

Page 5: Time Series Forecasting – Difficulties

• Why is it difficult?
  – Limited quantity of data
    • Observed data series sometimes too short to partition
  – Noise
    • Erroneous data points
    • Obscuring component
      – Moving average preprocessing
  – Nonstationarity
    • Fundamentals change over time
    • Nonstationary mean: "ascending" data series
      – First-difference preprocessing
  – Forecasting method selection
    • Statistics
    • Artificial intelligence

Page 6: Time Series Forecasting – Importance

• Why is it important?
  – Preventing undesirable events by forecasting the event, identifying the circumstances preceding the event, and taking corrective action so the event can be avoided (e.g., inflationary economic period)
  – Forecasting undesirable, yet unavoidable, events to preemptively lessen their impact (e.g., solar maximum with sunspots)
  – Profiting from forecasting (e.g., financial markets)

Page 7: Neural Networks – Background

• Loosely based on the human brain's neuron structure
• Timeline:

– 1940’s – McCulloch and Pitts – proposed neuron models in the form of binary threshold devices and stochastic algorithms

– 1950’s & 1960’s – Rosenblatt – class of learning machines called perceptrons

– Late 1960's – Minsky and Papert – discouraging analysis of perceptrons (limited to linearly separable classes)

– 1980’s – Rumelhart, Hinton, and Williams – generalized delta rule for learning by back-propagation for training multilayer perceptrons

– Present – many new training algorithms and architectures, but nothing “revolutionary”

Page 8: Neural Networks – Architecture

• A feed-forward neural network can have any number of:
  – Layers
  – Units per layer
  – Network inputs
  – Network outputs
• [Figure: example network with hidden layers (A, B) and an output layer (C)]

Page 9: Neural Networks – Units

• A unit has:
  – Connections
  – Weights
  – Bias
  – Activation function
• Weights and bias are randomly initialized before training
• A unit's input is the sum of the products of each connection value and its associated weight, plus the bias
• The input is then fed into the unit's activation function
• The unit's output is the output of the activation function:
  – Hidden layers: sigmoid
  – Output layer: linear
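
This computation can be sketched in a few lines of Python (a hedged illustration assuming NumPy; the connection values, weights, and bias below are invented for the example and do not come from the thesis):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def unit_output(values, weights, bias, activation):
    """Weighted sum of connection values plus bias, fed through the activation."""
    return activation(np.dot(weights, values) + bias)

# Hidden unit (sigmoid) and output unit (linear), with illustrative numbers
values = np.array([0.5, -1.2, 3.0])    # outputs of the previous layer
weights = np.array([0.1, 0.4, -0.2])   # randomly initialized in practice
bias = 0.05

hidden = unit_output(values, weights, bias, sigmoid)
output = unit_output(values, weights, bias, lambda x: x)  # linear
print(hidden, output)
```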

Page 10: Neural Networks – Training

• Partition data series into:
  – Training set
  – Validation set (optional)
  – Test set (optional)
• Typically, the training procedure is:
  – Perform backpropagation training with the training set
  – After n epochs, compute total squared error on the training set and validation set
  – If the validation error consistently rises while the training error falls, stop training
• Overfitting: training set learned too well
• Generalization: given inputs not in the training and validation sets, able to accurately forecast

Page 11: Neural Networks – Training

• Backpropagation training:
  – First, examples in the form of <input, output> pairs are extracted from the data series
  – Then, the network is trained with backpropagation on the examples:
    1. Present an example's input vector to the network inputs and run the network sequentially forward
    2. Propagate the error sequentially backward from the output layer
    3. For every connection, change the weight modifying that connection in proportion to the error
  – When all three steps have been performed for all examples, one epoch has occurred
  – Goal is to converge to a near-optimal solution based on the total squared error (a sketch of one epoch follows below)
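
A minimal sketch of one such epoch for a single-hidden-layer network with sigmoid hidden units and a linear output, following the three steps above and the delta-rule formulas on page 38. This is an illustrative NumPy translation, not FORECASTER's C++ implementation; the per-example (online) update and the demo shapes are assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_epoch(X, y, W1, b1, W2, b2, lr):
    """One backpropagation epoch over all <input, output> examples.
    Sigmoid hidden layer, linear output, squared-error loss."""
    sse = 0.0
    for x, d in zip(X, y):
        # 1. Forward pass
        h = sigmoid(W1 @ x + b1)                 # hidden outputs
        o = W2 @ h + b2                          # linear output
        # 2. Backward pass: output delta, then hidden deltas
        delta_o = d - o                          # h'(x) = 1 for linear units
        delta_h = h * (1 - h) * (W2.T @ delta_o)
        # 3. Weight changes proportional to the error
        W2 += lr * np.outer(delta_o, h);  b2 += lr * delta_o
        W1 += lr * np.outer(delta_h, x);  b1 += lr * delta_h
        sse += 0.5 * float(delta_o @ delta_o)    # total squared error term
    return sse

# Tiny demo with made-up shapes: 3 inputs, 2 hidden units, 1 output
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 3)), np.zeros(2)
W2, b2 = rng.normal(size=(1, 2)), np.zeros(1)
X, y = rng.normal(size=(10, 3)), rng.normal(size=(10, 1))
for _ in range(100):
    sse = train_epoch(X, y, W1, b1, W2, b2, lr=0.05)
print(sse)
```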

Page 12: Neural Networks – Training

[Figure: backpropagation training cycle]

Page 13: Neural Networks – Forecasting

• Forecasting method depends on examples

• Examples depend on step-ahead size

• If step-ahead size is one: iterative forecasting
• If step-ahead size is greater than one: direct forecasting

Page 14: Neural Networks – Forecasting

[Figure: iterative forecasting – can continue this indefinitely]

Page 15: Neural Networks – Forecasting

[Figure: directly forecasting n steps – this is the only forecast]
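
The two schemes can be sketched as follows. Here `model` and `model_n` are hypothetical callables standing in for a trained one-step-ahead network (assumed to return a scalar) and a network trained with an n-step-ahead output:

```python
import numpy as np

def iterative_forecast(model, history, window, steps):
    """One-step-ahead network applied repeatedly: each forecast is
    appended to the window and fed back as a network input."""
    window_vals = list(history[-window:])
    out = []
    for _ in range(steps):
        nxt = float(model(np.array(window_vals)))  # forecast one step
        out.append(nxt)
        window_vals = window_vals[1:] + [nxt]      # slide the window
    return out

def direct_forecast(model_n, history, window):
    """A network trained to output n steps forecasts them all at once;
    this is the only forecast."""
    return model_n(np.array(history[-window:]))

# Demo with a toy "model" that just predicts the mean of its window
mean_model = lambda w: w.mean()
print(iterative_forecast(mean_model, [1, 2, 3, 4, 5], window=3, steps=4))
```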

Page 16: K-Nearest-Neighbor – Forecasting

• No model to train
• Simple linear search
• Compare reference to candidates
• Select the k candidates with lowest error
• Forecast is the average of the k next values (see the sketch below)
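
A sketch of this search, assuming squared error as the comparison measure (the slide says only "lowest error", so that choice is an assumption):

```python
import numpy as np

def knn_forecast(series, k, window):
    """Compare the last `window` values (the reference) against every
    earlier window (the candidates); average the values that follow
    the k closest candidates."""
    series = np.asarray(series, dtype=float)
    reference = series[-window:]
    errors = []
    # Each candidate window must leave one "next value" inside the series
    for start in range(len(series) - window):
        candidate = series[start:start + window]
        errors.append((np.sum((candidate - reference) ** 2), start))
    errors.sort()                                   # lowest error first
    next_vals = [series[start + window] for _, start in errors[:k]]
    return float(np.mean(next_vals))

# Example: k = 2, window size 3 on a short sawtooth-like series
print(knn_forecast([0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1], k=2, window=3))
```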

Page 17: Test-Bed Application – FORECASTER

• Written in Visual C++ with MFC
• Object-oriented
• Multithreaded
• Wizard-based
• Easily modified
• Implements feed-forward neural networks & k-nearest-neighbor
• Used for time series forecasting
• Eventually will be upgraded for classification problems

Page 18: Empirical Evaluation – Data Series

[Figure: the five evaluation data series plotted as value vs. data point (0–210): "Original", "Original with Less Noisy", "Original with More Noisy", and "Original with Ascending", plus "Sunspots 1784–1983" plotted as count vs. year]

Page 19: Empirical Evaluation – Neural Network Architectures

• Number of network inputs based on data series

• Need to make unambiguous examples

• For "sawtooths":
  – 24 inputs are necessary
  – Test networks with 25 & 35 inputs
  – Test networks with 1 hidden layer with 2, 10, & 20 hidden layer units
  – One output layer unit
• For sunspots:
  – 30 inputs
  – 1 hidden layer with 30 units
• For real-world data series, selection may be trial-and-error!
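
How the network-input count turns the series into training examples might be sketched like this (an illustration; the `make_examples` helper and the period-24 sawtooth stand-in are assumptions, not the thesis's code or data):

```python
import numpy as np

def make_examples(series, n_inputs, step_ahead=1):
    """Extract <input, output> pairs with a sliding window: each example's
    input is `n_inputs` consecutive values, and its output is the value
    `step_ahead` points later."""
    X, y = [], []
    for i in range(len(series) - n_inputs - step_ahead + 1):
        X.append(series[i:i + n_inputs])
        y.append(series[i + n_inputs + step_ahead - 1])
    return np.array(X), np.array(y)

# A 25-input window over a hypothetical period-24 sawtooth makes every
# example's output unambiguous given its inputs
sawtooth = [t % 24 for t in range(216)]
X, y = make_examples(sawtooth, n_inputs=25)
print(X.shape, y.shape)   # (191, 25) (191,)
```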

Page 20: Empirical Evaluation – Neural Network Training

• Heuristic method (see the sketch below):
  – Start with an aggressive learning rate
  – Gradually lower the learning rate as validation error increases
  – Stop training when the learning rate cannot be lowered anymore
• Simple method:
  – Use a conservative learning rate
  – Training stops when:
    • The number of training epochs equals the epochs limit, or
    • The training error is less than or equal to the error limit
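
A sketch of the heuristic in Python. The halving factor, check interval, learning-rate floor, epoch cap, and the two callables are invented for illustration, since the slide does not give the exact schedule:

```python
def heuristic_train(train_epoch_fn, validation_error_fn, lr=0.5,
                    lr_floor=1e-4, decay=0.5, check_every=10,
                    max_epochs=10_000):
    """Start with an aggressive learning rate, lower it whenever
    validation error rises, and stop once it cannot be lowered further.
    `train_epoch_fn(lr)` runs one training epoch; `validation_error_fn()`
    returns the current validation-set error (both supplied by caller)."""
    best_val = float("inf")
    epoch = 0
    while lr >= lr_floor and epoch < max_epochs:
        for _ in range(check_every):
            train_epoch_fn(lr)
            epoch += 1
        val = validation_error_fn()
        if val >= best_val:     # validation error increased
            lr *= decay         # lower the learning rate
        best_val = min(best_val, val)
    return epoch
```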

Page 21: Empirical Evaluation – Neural Network Forecasting

• Metric to compare forecasts: coefficient of determination
  – Value may be in (−∞, 1]
  – Want a value between 0 and 1, where 0 is forecasting the mean of the data series and 1 is forecasting the actual value
  – Must have actual values to compare with forecasted values
• For networks trained on the original, less noisy, and more noisy data series, the forecast will be compared to the original series
• For networks trained on the ascending data series, the forecast will be compared to the continuation of the ascending series
• For networks trained on the sunspots data series, the forecast will be compared to the test set
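
The metric itself (see the formula on page 39) is straightforward to compute; a small sketch with invented numbers:

```python
import numpy as np

def coefficient_of_determination(actual, forecast):
    """r^2 = 1 - sum((forecast - actual)^2) / sum((actual - mean)^2).
    1 = perfect forecast, 0 = no better than the series mean, < 0 = worse."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    ss_res = np.sum((forecast - actual) ** 2)
    ss_mean = np.sum((actual - actual.mean()) ** 2)
    return 1.0 - ss_res / ss_mean

print(coefficient_of_determination([1, 2, 3, 4], [1.1, 2.0, 2.9, 4.2]))
```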

Page 22: Empirical Evaluation – K-Nearest-Neighbor

• Choosing window size analogous to choosing number of neural network inputs

• For sawtooth data series:
  – k = 2
  – Test window sizes of 20, 24, and 30
• For sunspots data series:
  – k = 3
  – Window size of 10

• Compare forecasts via coefficient of determination

Page 23: Empirical Evaluation – Candidate Selection

• Neural networks:
  – For each training method, data series, and architecture, 3 candidates were trained
  – Also, the average of the 3 candidates' forecasts was taken: forecasting by committee
  – The best forecast was selected based on the coefficient of determination
• K-nearest-neighbor:
  – For each data series, k, and window size, only one search was performed (only one needed)

Page 24: Empirical Evaluation – Original Data Series

[Figure: forecasts on the original data series, value vs. data point (216–285) – simple NN, heuristic NN, and smaller NN panels (networks 35,2 / 35,10 / 35,20 and 25,10 / 25,20) and a k-nearest-neighbor panel (configurations 2,20 / 2,24 / 2,30)]

Page 25: Empirical Evaluation – Less Noisy Data Series

[Figure: forecasts on the less noisy data series, value vs. data point (216–285) – simple NN and heuristic NN panels (networks 35,2 / 35,10 / 35,20) and a k-nearest-neighbor panel (configurations 2,20 / 2,24 / 2,30)]

Page 26: Empirical Evaluation – More Noisy Data Series

[Figure: forecasts on the more noisy data series, value vs. data point (216–285) – simple NN and heuristic NN panels (networks 35,10 / 35,20) and a k-nearest-neighbor panel (configurations 2,20 / 2,24 / 2,30)]

Page 27: Empirical Evaluation – Ascending Data Series

[Figure: forecasts on the ascending data series, value vs. data point (216–285) – simple NN panel (networks 35,10 / 35,20) and heuristic NN panel (networks 35,2 / 35,10 / 35,20)]

Page 28: Empirical Evaluation – Longer Forecast

[Figure: heuristic NN longer forecasts, value vs. data point (216–356) – nets trained on the less noisy series (35,2 / 35,10 / 35,20) and on the more noisy series (35,10 / 35,20)]

Page 29: Empirical Evaluation – Sunspots Data Series

[Figure: "Sunspots 1950–1983", count vs. year – test set compared with the 30,30 neural net and the 3,10 k-nearest-neighbor forecasts (simple NN & k-N-N)]

Page 30: Empirical Evaluation – Discussion

• Heuristic training method observations:

– Networks train longer (more epochs) on smoother data series like the original and ascending data series

– The total squared error and unscaled error are higher for noisy data series

– Neither the number of epochs nor the errors appear to correlate well with the coefficient of determination

– In most cases, the committee forecast is worse than the best candidate's forecast

• When actual values are unavailable, choosing the best candidate is difficult!

Page 31: Empirical Evaluation – Discussion

• Simple training method observations:

– The total squared error and unscaled error are higher for noisy data series with the exception of the 35:10:1 network trained on the more noisy data series

– The errors do not appear to correlate well with the coefficient of determination

– In most cases, the committee forecast is worse than the best candidate's forecast

– There are four networks whose coefficient of determination is negative, compared with two for the heuristic training method

[Figure: two "Coefficient of Determination Comparison" bar charts – coefficient of determination (−0.5 to 1) of the 35,2 / 35,10 / 35,20 networks on the Original, Less Noisy, More Noisy, and Ascending data series]

Page 32: Empirical Evaluation – Discussion

• General observations:
  – One training method did not appear to be clearly better
  – Increasingly noisy data series increasingly degraded the forecasting performance
  – Nonstationarity in the mean degraded the performance
  – Too few hidden units (e.g., 35:2:1) forecasted well on simpler data series, but failed for more complex ones
  – Excessive numbers of hidden units (e.g., 35:20:1) did not hurt performance
  – Twenty-five network inputs was not sufficient
  – K-nearest-neighbor was consistently better than the neural networks
  – Feed-forward neural networks are extremely sensitive to architecture and parameter choices, and making such choices is currently more art than science, more trial-and-error than absolute, more practice than theory!

Page 33: Data Preprocessing

• First-difference (see the sketch below):
  – For the ascending data series, a neural network trained on the first-difference can forecast near perfectly
  – In that case, it is better to train and forecast on the first-difference
  – FORECASTER reconstitutes the forecast from its first-difference
• Moving average:
  – For noisy data series, a moving average would eliminate much of the noise
  – But it would also smooth out peaks and valleys
  – The series may then be easier to learn and forecast
  – But in some series, the "noise" may be important data (e.g., utility load forecasting)
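
Both transforms, sketched with NumPy. This is an illustration of the ideas only; FORECASTER's own reconstitution logic is not shown in the slides, and the demo series is invented:

```python
import numpy as np

def first_difference(series):
    """d[t] = x[t+1] - x[t]: removes a nonstationary (ascending) mean."""
    x = np.asarray(series, dtype=float)
    return x[1:] - x[:-1]

def reconstitute(last_value, diff_forecast):
    """Rebuild a forecast on the original scale from a first-difference forecast."""
    return last_value + np.cumsum(diff_forecast)

def moving_average(series, window):
    """Averages out much of the noise, but also smooths peaks and valleys."""
    x = np.asarray(series, dtype=float)
    return np.convolve(x, np.ones(window) / window, mode="valid")

# Demo: difference an ascending series, forecast its (constant) step
# size, and reconstitute the forecast on the original scale
ascending = np.arange(10, dtype=float) * 2.0
d = first_difference(ascending)                      # all 2.0
print(reconstitute(ascending[-1], [d.mean()] * 3))   # [20. 22. 24.]
```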

Page 34: Contributions

• Filled a void in the feed-forward neural network time series forecasting literature: showed how networks respond to various data series characteristics in a controlled environment

• Showed that k-nearest-neighbor is a better forecasting method for the data series used in this research

• Reaffirmed that neural networks are very sensitive to architecture, parameter, and learning method changes

• Presented some insight into neural network architecture selection: selecting number of network inputs based on data series

• Presented a neural network training heuristic that produced good results

Page 35: Future Work

• Upgrade FORECASTER to work with classification problems

• Add more complex network types, including wavelet networks for time series forecasting

• Investigate k-nearest-neighbor further
• Add other forecasting methods (e.g., decision trees for classification)

Page 36: Conclusion

• Presented:
  – Time series forecasting

– Neural networks

– K-nearest-neighbor

– Empirical evaluation

• Learned a lot about the implementation details of the forecasting techniques

• Learned a lot about MFC programming

Page 37: Demonstration

Various files can be found at: http://w3.uwyo.edu/~eplummer

Page 38: Unit Output, Error, and Weight Change Formulas

Hidden unit output (sigmoid activation):

$$O_{c,\mathrm{Hidden}} = h\!\left(\sum_{p=1}^{P} w_{c,p}\, i_p + b_c\right), \qquad h(x) = \frac{1}{1 + e^{-x}}$$

Output unit output (linear activation):

$$O_{c,\mathrm{Output}} = h\!\left(\sum_{p=1}^{P} w_{c,p}\, i_p + b_c\right), \qquad h(x) = x$$

Output unit error:

$$\delta_{c,\mathrm{Output}} = h'(x)\,(D_c - O_c)$$

Hidden unit error:

$$\delta_{c,\mathrm{Hidden}} = h'(x) \sum_{n=1}^{N} \delta_n\, w_{n,c}$$

Weight change:

$$\Delta w_{c,p} = \eta\, \delta_c\, O_p$$

Page 39: Forecast Error Formulas

Total squared error:

$$E_C = \frac{1}{2} \sum_{c=1}^{C} (D_c - O_c)^2$$

Unscaled total squared error:

$$UE_C = \frac{1}{2} \sum_{c=1}^{C} (UD_c - UO_c)^2$$

Coefficient of determination:

$$r^2 = 1 - \frac{\sum_{i=1}^{n} (\hat{x}_i - x_i)^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

Interpretation:

• $r^2 = 1$ if $\hat{x}_i = x_i$ for all $i$ (perfect forecast)
• $0 < r^2 < 1$ if $\hat{x}$ is a better forecast than $\bar{x}$
• $r^2 = 0$ if generally $\hat{x} = \bar{x}$ (forecasting the mean)
• $r^2 < 0$ if $\hat{x}$ is a worse forecast than $\bar{x}$

Page 40: Related Work

• Drossu and Obradovic (1996): hybrid stochastic and neural network approach to time series forecasting

• Zhang and Thearling (1994): parallel implementations of neural networks and memory-based reasoning

• Geva (1998): multiscale fast wavelet transform and an array of feed-forward neural networks

• Lawrence, Tsoi, and Giles (1996): encodes the series with a self-organizing map and uses recurrent neural networks

• Kingdon (1997): automated intelligent system for financial forecasting that uses neural networks and genetic algorithms