Progress in Forecasting by Neural Networks


P. CAIRE, G. HATABIAN, C. MULLER

Electricité de France
Direction des Etudes et Recherches
1, avenue du Général de Gaulle
92141 Clamart cedex, France

ABSTRACT
Forecasting electricity consumption is a basic concern for a company like Electricité de France, the French electricity supply company. Our purpose in this document is to study forecasting by neural networks (NN). The great advantage of NN forecasting is not just its performance, which is as good as that of traditional models (ARIMA models), but also the possibility of including exogenous variables, the results obtained when forecasting more than one step ahead, and the possibility of changing the minimization criterion according to economic conditions.

1 INTRODUCTION
Accurate short term forecasting of consumption is very important for the efficient management of electrical power production.

For this reason our company, the French electricity supply board, needs to study every new model which may offer an improvement in forecasting quality. Neural networks seem to be a potential alternative to traditional methods.

First the data are described. Secondly a comparison is made between the traditional and NN methods. The traditional method is known as the Box and Jenkins method; it is described and its best model is shown. Different NN approaches are then tried and the resulting models presented. Next, the results of the various models are compared. Finally, we discuss some of the NN properties which are useful for forecasting, such as the use of new minimization criteria which can include economic characteristics, and the good results obtained with NN models when forecasting more than one step ahead.

2 DATA
For our first study of forecasting by NN, we have chosen a relatively easy time series with the aim of identifying the difficulties of the NN method. The data is the daily electricity consumption for the whole of France from 1986 to 1990 inclusive. Besides being easy to forecast, this series had not yet been modelled by any traditional method. The traditional and NN models were therefore built at the same time, which makes the comparison between the two methods that much easier.

Two other series are known: firstly the temperatures, more precisely the maxima and minima for the six cities representing the temperatures in France, and secondly the nebulosities (cloud cover), again the maxima and minima for the same six cities.

3 THE MODELS
Both approaches and the various models are presented in this section.

0-7803-0559-0/92 $3.00 © 1992 IEEE



3.1 The Traditional Approach
The forecasting methods used until now have been based on the following system. Observations are first corrected for the weather effect, and individual values are smoothed out. Then a forecast at normal temperature is made using an ARIMA model. Finally, further corrections are made according to the weather forecast and the characteristics of the day for which the forecast is being made. The corrections for the weather effect are made using a sophisticated model which introduces a non-linear relationship between consumption and temperature.

The forecasting of the corrected observations is made using the following ARIMA model:

The greatest difficulty in this method is not so much finding a good ARIMA model and its parameters, although this does require a good knowledge of the Box and Jenkins method, as making the preliminary corrections, which are essential and require a very good knowledge of the effects of the weather on electricity consumption. In contrast, the computing time required is low.

3.2 The Neural Approach

The neural model used here is a multi-layer perceptron; the learning is done with a back-propagation algorithm, and the transfer function is a sigmoid. Contrary to the traditional approach (see section 3.1), the neural network forecasts are made directly from the observations without any corrections. Exogenous variables, such as temperature and nebulosity, are introduced directly as network inputs. The output is always a single neuron which gives the forecast consumption one step ahead.

NNs built on different ideas are studied. Firstly, most of the variables which are correlated with the forecast consumption are introduced as input; this NN is called the maximum model. Due to the size of such a network and the computation time needed for the learning, two additional models are prepared. The first additional model follows the opposite idea: for the same level of results as the maximum model, only the most important variables are retained as input, i.e. those most closely related to the consumption forecast. This second NN is the minimum model. We then try to reduce the number of connections of the maximum model by using another minimization criterion (Weigend et al. 90). This last model is called the reduced maximum model.

The maximum model

Input: The number of input neurons is 134. They are divided as follows: the last 35 consumption values (the 35 previous days); 7 boolean neurons, one for each day of the week; 72 temperature values, i.e. the maxima and the minima for the six French cities for the last 5 days and the forecast day; 12 nebulosity values, i.e. the maxima and minima for the six cities for the forecast day; and the last 7 errors made by the network.

Architecture: The maximum model has only one hidden layer with 3 neurons.
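As a rough illustration of the input layout described above, the following Python sketch tallies the listed input groups and builds the 7 boolean day-of-week inputs. The group names and the convention 0 = Monday are hypothetical labels chosen for the example, not taken from the paper. Note that the groups as listed add up to 133, one fewer than the stated 134; the text does not say what the remaining input is.

```python
import numpy as np

# Input groups of the maximum model as listed in the text.
# The dictionary keys are hypothetical labels, not names used by the authors.
MAX_MODEL_GROUPS = {
    "consumption_last_35_days": 35,
    "day_of_week_booleans": 7,
    "temperature_max_min": 6 * 2 * 6,  # 6 cities x (max, min) x (5 past days + forecast day)
    "nebulosity_max_min": 6 * 2,       # 6 cities x (max, min), forecast day only
    "last_network_errors": 7,
}


def total_inputs(groups):
    """Sum the sizes of all input groups."""
    return sum(groups.values())


def day_of_week_booleans(weekday):
    """Return the 7 boolean inputs with a single 1 at the given weekday.

    Assumption: weekday 0 is Monday and 6 is Sunday.
    """
    if not 0 <= weekday <= 6:
        raise ValueError("weekday must be in 0..6")
    onehot = np.zeros(7)
    onehot[weekday] = 1.0
    return onehot


print(total_inputs(MAX_MODEL_GROUPS))  # 133
print(day_of_week_booleans(3))         # Thursday under the assumed convention
```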


The minimum model


Input: This network has 18 input neurons: the last 5 consumption observations; the last 5 temperature averages; the average temperature forecast; and the 7 boolean neurons, one for each day of the week.


Architecture: The minimum model and the maximum model both have one hidden layer with 3 neurons.
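A network of this shape can be sketched in a few lines of Python. This is only a minimal forward pass under stated assumptions: the output neuron is also given a sigmoid transfer function (so consumption would have to be scaled to the interval 0 to 1), the weights are initialised at random, and the function names and the example input values are invented for illustration. The learning step by back-propagation is not shown.

```python
import numpy as np


def sigmoid(z):
    """Sigmoid transfer function."""
    return 1.0 / (1.0 + np.exp(-z))


def init_mlp(n_inputs, n_hidden=3, seed=0):
    """Random initial weights for a one-hidden-layer perceptron with one output."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.1, size=(n_hidden, n_inputs)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, size=(1, n_hidden)),
        "b2": np.zeros(1),
    }


def forward(params, x):
    """Forecast for one input vector (assumed scaled to comparable ranges)."""
    hidden = sigmoid(params["W1"] @ x + params["b1"])
    output = sigmoid(params["W2"] @ hidden + params["b2"])
    return float(output[0])


# Minimum model layout: 5 consumptions + 5 temperature averages
# + 1 temperature forecast + 7 day-of-week booleans = 18 inputs.
# All numeric values below are invented for the example.
x = np.concatenate([
    np.full(5, 0.6),                # last 5 consumption values, scaled
    np.full(5, 0.4),                # last 5 temperature averages, scaled
    [0.45],                         # average temperature forecast, scaled
    [0, 0, 0, 1, 0, 0, 0],          # day-of-week booleans
])
params = init_mlp(n_inputs=x.size)
print(forward(params, x))
```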

The reduced maximum model

Input: The reduced maximum model has the same input as the maximum model.

Architecture: The purpose is to reduce the number of connections in comparison with the maximum model. An algorithm based on weight elimination is used, which reduces the number of connections by 30%. The purpose of the new criterion E1 is to drive to zero the connections that carry no information.

where E0 is the quadratic error.

In addition to the performance of this network, one of its advantages is the possibility of explaining some connections and hidden neuron rules, as opposed to the maximum network, which is difficult to explain after convergence. For instance, the interpretation of the reduced maximum model's weights between the 7 day-of-week input neurons and the first hidden neuron shows that Thursday and Friday together are diametrically opposed to Saturday and Sunday, and that Monday, Tuesday and Wednesday are 'average' (figure 1). This result is stable, as we obtain approximately the same results with different initial weights.
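The expression of E1 is not recoverable from the text. As a hedged illustration only, the weight-elimination criterion of Weigend et al. is usually written as E1 = E0 + lambda * sum_i (w_i^2 / w0^2) / (1 + w_i^2 / w0^2), where lambda and w0 are tuning constants. The Python sketch below assumes this form, takes E0 as a sum of squared errors, and uses arbitrary values for lambda and w0; none of these choices is confirmed by the paper.

```python
import numpy as np


def weight_elimination_penalty(weights, w0=1.0):
    """Sum over weights of (w/w0)^2 / (1 + (w/w0)^2).

    Each term is close to 0 for small weights and close to 1 for large ones,
    so the penalty roughly counts the connections that are still in use.
    """
    r = (np.asarray(weights, dtype=float) / w0) ** 2
    return float(np.sum(r / (1.0 + r)))


def penalized_error(errors, weights, lam=0.01, w0=1.0):
    """E1 = E0 + lam * penalty, with E0 taken as the sum of squared errors.

    lam and w0 are illustrative values, not values from the paper.
    """
    e0 = float(np.sum(np.asarray(errors, dtype=float) ** 2))
    return e0 + lam * weight_elimination_penalty(weights, w0)


def penalty_gradient(weights, w0=1.0):
    """Derivative of the penalty with respect to each weight."""
    w = np.asarray(weights, dtype=float)
    return (2.0 * w / w0**2) / (1.0 + (w / w0) ** 2) ** 2


print(weight_elimination_penalty([0.0, 1.0, 100.0]))
print(penalized_error([1.0, 2.0], [0.0, 1.0], lam=0.1))
```

Under this assumed form, the gradient of the penalty is strongest for weights of moderate size relative to w0 and vanishes for very large ones, which is why small, uninformative connections are pushed towards zero while large ones are left mostly untouched.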

Figure 1: The week effect


In comparison with the traditional method, less information about the process is required for the NN. We do not need to know exactly how the weather acts on consumption. Nevertheless, previously known information makes the search for good networks easier. For the NN approach, the computation time for the learning phase is much greater than for the traditional approach, especially for networks with many connections.

3.3 The Results

Table 1: Criteria C1 and C2 for the four models

                           1986-1989           1990
Model                      C1       C2         C1       C2
ARIMA model                2.1%     2.75%      1.80%    2.64%
maximum model              1.75%    2.51%      1.75%    2.68%
minimum model              1.79%    2.67%      1.84%    3.08%
reduced maximum model      1.57%    2.38%      1.72%    2.60%

Table 1 shows firstly that the minimum model does not generalize well, secondly that both the maximum and the reduced maximum models are better than the ARIMA model, and finally that the reduced maximum model is the best one. However, the difference between all these results is fairly small.

These two criteria are too global to give a good idea of the forecasting qualities of the various models. Therefore we must study the distribution of the error, which shows that the forecasts are unbiased both for the ARIMA model, as we already knew, and for the NN models. The difference between the distributions is in the tails. The absolute error maxima of the NN models are much higher than those of the ARIMA model (236 GWh as against 145 GWh). On the other hand, the number of absolute errors over 80 GWh is lower for the NN models than for the ARIMA model (22 against 36). We also look at the days that the models forecast badly. These are the same for the 3 NN models and most of them are public holidays. On the other hand, the badly forecast days of the ARIMA model are different from those of the NN. Most of them are not public holidays, and the errors are very difficult to explain. We expect that a new NN model which includes public holidays will improve the results.
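The three quantities used in this comparison (bias, largest absolute error, number of absolute errors above 80 GWh) are straightforward to compute from a forecast series. The Python sketch below shows one way to do it; the function name and the four data points are invented for illustration and are not taken from the study.

```python
import numpy as np


def error_summary(actual, forecast, threshold=80.0):
    """Summarise forecast errors, all in the unit of the series (GWh here).

    mean_error       : close to 0 when the forecasts are unbiased
    max_abs_error    : the extreme of the tail of the error distribution
    n_over_threshold : number of days whose absolute error exceeds threshold
    """
    err = np.asarray(forecast, dtype=float) - np.asarray(actual, dtype=float)
    abs_err = np.abs(err)
    return {
        "mean_error": float(err.mean()),
        "max_abs_error": float(abs_err.max()),
        "n_over_threshold": int((abs_err > threshold).sum()),
    }


# Invented daily values in GWh, for illustration only.
actual = [1000.0, 1100.0, 1050.0, 980.0]
forecast = [1010.0, 1190.0, 1045.0, 880.0]
print(error_summary(actual, forecast))
# {'mean_error': -1.25, 'max_abs_error': 100.0, 'n_over_threshold': 2}
```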

4 DISCUSSION
The NN results are hardly any better than those of the ARIMA model. The advantages of the NN lie both in the results and in the properties of the approach. In Section 3 we underlined the fact that we need less information about the process and that exogenous variables are introduced directly in the input layer. We now have two other advantages to present: firstly the good results with NN for forecasting more than one step ahead, and secondly the possibility of introducing economic characteristics in the minimization criterion.

4.1 p Step Ahead Forecasting
The different estimated models are optimized for one-day-ahead forecasting, but they can also be used to forecast more distant days. We study the forecasts of the 4 models from d+1 to d+20; the evolution of their quality is shown in figure 3.
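The text does not say how a one-step-ahead model is used for horizons beyond d+1. One common way, assumed here purely for illustration, is the recursive strategy: each forecast is fed back as if it were the latest observation. The Python sketch below ignores exogenous inputs such as temperature, and the function and parameter names are hypothetical.

```python
from typing import Callable, List, Sequence


def recursive_forecast(one_step: Callable[[Sequence[float]], float],
                       history: Sequence[float],
                       horizon: int,
                       n_lags: int) -> List[float]:
    """Forecast d+1 .. d+horizon with a model trained for one step ahead.

    one_step takes the last n_lags values (oldest first) and returns the
    forecast for the next day. Each forecast is appended to the window and
    the oldest value is dropped before the next step.
    """
    if n_lags < 1:
        raise ValueError("n_lags must be at least 1")
    if len(history) < n_lags:
        raise ValueError("history is shorter than n_lags")
    window = list(history[-n_lags:])
    forecasts = []
    for _ in range(horizon):
        y_hat = one_step(window)
        forecasts.append(y_hat)
        window = window[1:] + [y_hat]
    return forecasts


# Toy one-step model: tomorrow = today + 1 (illustration only).
print(recursive_forecast(lambda w: w[-1] + 1.0, [1.0, 2.0, 3.0], horizon=3, n_lags=2))
# [4.0, 5.0, 6.0]
```

With this strategy the errors made at early horizons propagate to later ones, which is one reason why the error standard deviation generally grows with the horizon.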


Figure 3: p step ahead forecasting. Evolution of the error standard deviation for the 4 models.

The ARIMA model has a pseudo-periodicity of 7 with a sharp increase at d+14, which the NN models do not have. The minimum model, regardless of the horizon studied, is not so good, although for d+20 the result is close to that of the ARIMA model. Up to d+14 the ARIMA, the maximum and the reduced maximum models are more or less equivalent. From then on the NN is always better. The two maximum models evolve in parallel with each other, but the reduced maximum is always better. In conclusion, in the short term the performances are equivalent, but for forecasting more than 14 steps ahead the NN models are much more reliable.

4.2 The Minimization Criterion

Most of the time the NN weights are determined as the minimum of the average quadratic error on the learning set (see the results in Section 3.2). But for the electricity forecasting problem, the real cost function is not y = x^2. For instance, we want to reduce the number of the highest errors, so we use a new minimization criterion, y = x^k, with k even and greater than 2.
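The effect of raising the exponent can be seen on a small numerical example. The Python sketch below is a minimal illustration, assuming the criterion is the average of x^k over the learning set; the function names and the four error values are invented. It shows how much of the total cost comes from the single largest error for k = 2 and for k = 8.

```python
import numpy as np


def power_loss(errors, k=2):
    """Average of errors**k; k must be even so that the cost is symmetric."""
    if k < 2 or k % 2 != 0:
        raise ValueError("k must be an even integer >= 2")
    e = np.asarray(errors, dtype=float)
    return float(np.mean(e ** k))


def power_loss_gradient(errors, k=2):
    """Derivative of power_loss with respect to each individual error."""
    if k < 2 or k % 2 != 0:
        raise ValueError("k must be an even integer >= 2")
    e = np.asarray(errors, dtype=float)
    return k * e ** (k - 1) / e.size


def largest_error_share(errors, k=2):
    """Fraction of the total cost contributed by the largest absolute error."""
    e = np.abs(np.asarray(errors, dtype=float))
    costs = e ** k
    return float(costs.max() / costs.sum())


errors = [1.0, 1.0, 1.0, 3.0]          # invented values, one "difficult" day
print(largest_error_share(errors, 2))  # 0.75
print(largest_error_share(errors, 8))  # about 0.9995
```

With k = 8 almost all of the cost, and therefore almost all of the gradient, comes from the worst day, so the learning concentrates on reducing the largest errors at the expense of the ordinary ones.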

Figure 4: Absolute errors bar chart for k = 2 and k = 8.


As figure 4 shows, the maxima of the absolute errors are much lower for the minimization function y = x^8 than for y = x^2, but the absolute error average increases too quickly as k increases.

Two conclusions can be drawn on this point. Firstly, we can consider a model with two networks, the first one for the "standard" days (k = 2) and the second for days which are difficult to forecast (k > 2). The problem will then be to classify the days into "standard" and "difficult". Secondly, we can clearly determine the real cost function and use it as the minimization criterion.

5 CONCLUSION
This study shows the advantage of forecasting using NN. The first result, which is absolutely essential to the validation of the NN approach, is that the forecasting quality of the NN is equivalent to that of the traditional approach. Secondly, several properties of the NN make them especially advantageous, such as the ease of including exogenous variables, the good results with p step ahead forecasting, and the introduction of economic properties in the minimization criterion. These characteristics do not exist in the traditional approach.

REFERENCES
[de Groot & Würtz 90] C. de Groot and D. Würtz, "Analysis of Univariate Time Series with Connectionist Nets: A Case Study of Two Classical Examples", Report of Neural Networks for Statistical and Economic Data, Dublin, December 1990.
[Park et al. 91] D.C. Park, M.A. El-Sharkawi, R.J. Marks, L.E. Atlas, M.J. Damborg, "Electric Load Forecasting Using an Artificial Neural Network", IEEE Transactions on Power Systems, Vol. 6, No. 2, May 1991.
[Canu et al. 90] S. Canu, R. Sobral, R. Lengellé, "Formal Neural Networks as an Adaptative Model for Water Demand", INNC Paris 1990.
[Varfis & Versino 90] A. Varfis, C. Versino, "Univariate Economic Time Series Forecasting by Connectionist Methods", INNC Paris 1990.
[Weigend et al. 90] A.S. Weigend, B.A. Huberman, D.E. Rumelhart, "Predicting the Future: a Connectionist Approach", International Journal of Neural Systems, Vol. 1, No. 3 (1990).
