
18th IEEE International Conference on Tools with Artificial Intelligence (ICTAI'06), Arlington, VA, USA, 13 November 2006

Using Correlation to Improve Boosting Technique: An Application for Time Series Forecasting

Luzia Vidal de Souza ([email protected])
PhD student, Numerical Methods in Engineering, Federal University of Parana, Department of Design, PO 19981, Curitiba, Brazil

Aurora T. Ramirez Pozo ([email protected])
Associate Professor, Computer Science Department, Federal University of Parana, PO 19981, Curitiba, Brazil

Anselmo Chaves Neto ([email protected])
Associate Professor, Statistics Department, Federal University of Parana, PO 19981, Curitiba, Brazil

Abstract

Time series forecasting has been widely used to support decision making; in this context, a highly accurate prediction is essential to ensure the quality of the decisions. Ensembles of machines currently receive a lot of attention; they combine predictions from different forecasting methods as a procedure to improve the accuracy. This paper explores Genetic Programming (GP) and the Boosting technique to obtain an ensemble of regressors and proposes a new formula for the final hypothesis. This new formula is based on the correlation coefficient instead of the geometric median used by the boosting algorithm. To validate this method, experiments were performed, and the mean squared error (MSE) was used to compare the accuracy of the proposed method against the results obtained by GP, GP using the Boosting technique, and the traditional statistical methodology (ARMA). The results show advantages in the use of the proposed approach.

1. Introduction

An essential element for many management decisions is an accurate forecast. There are several methods and techniques to forecast time series, including traditional forecasting techniques with theoretical foundations in statistics. These methods present some obstacles and complexities to overcome; the most important one is the difficulty of selecting the model that can provide the best adjustment for a specific dataset. Usually, many attempts have to be made until the best model can be obtained. Considering this scenario, different machine learning techniques have recently been applied to this problem, such as Artificial Neural Networks (ANN) and Evolutionary Computation (EC), in particular Genetic Programming (GP), which is considered a promising approach to forecast noisy complex series [1].

The formal description of GP was introduced by Koza [2], who was the first to present GP as an evolutionary algorithm that searches for symbolic regressions. There are many applications of GP in forecasting. Recently, Chen et al. [3] applied GP to predict chaotic time series, using the Taiwan and US stock market series; their results were compared to the Random Walk method. Kaboudan [4] introduced a new probability measure for time series predictability and, to evaluate its performance, applied it to eight Dow Jones stock series. This measure reduced the model search space and produced more reliable forecasting models. Iba and Sasaki [5] applied GP to predict price data in the Japanese stock market. The results were compared with neural network results and showed the superiority of the GP-based method. This paper extends previous works and presents the results of experiments that explore GP associated with the Boosting algorithm. The Boosting technique has been successfully used to improve the performance of known methods from the machine learning field [7]. It consists of obtaining several hypotheses from different distributions of the training set and merging them to get a better final hypothesis. In this paper we propose a new formula for the final hypothesis. This method, BCIGP (Boosting using the Correlation Coefficient to Improve Genetic Programming), will be described in detail later. The BCIGP algorithm has been applied to forecast some real time series and the results were compared with traditional prediction methods. The results show advantages in the use of the correlation coefficients. This work is organized as follows: Section 2 presents the main aspects of the Boosting theory and the new final hypothesis formula; Section 3 describes the experiments performed; and Section 4 analyses the results.

2. Ensemble of Predictors

Proceedings of the 18th IEEE International Conference on Tools with Artificial Intelligence (ICTAI'06), 0-7695-2728-0/06 $20.00 © 2006 IEEE


Recent studies in Machine Learning show that it is possible to combine several predictors into a single predictor to greatly improve the forecasting accuracy. The first motivation to use an ensemble of predictors was given by Schapire [7]. He showed that the class of concepts learned by strong algorithms can be learned by a combination of several weak algorithms, with the same precision and with less computational effort. The most widely used algorithm for combining predictors is the Boosting algorithm.

Boosting is a way of combining many weak classifiers to produce a powerful "committee". Boosting works by sequentially applying a classification algorithm to reweighted versions of the training data, and taking a weighted majority vote of the ensemble of classifiers thus produced. Each time, the weights are computed according to the error (or loss) of the learning algorithm on each example. Initially, all the weights are equal, but on each round the weights of the misclassified examples are increased, so that the weak learner is forced to focus on these examples in the training set.

This way, the learning algorithm is manipulated to look closer at examples with bad prediction functions. For many classification algorithms, this simple strategy results in dramatic improvements in their performance. Some applications of the boosting algorithm to regression problems can be found in the literature. Freund and Schapire [7] proposed a new boosting algorithm to work with regression problems, denominated AdaBoost.R. Drucker [6] applied the boosting algorithm to regression problems using an ad hoc modification of AdaBoost.R and obtained good results. The GPBoost technique proposed by Paris [8] is a boosting algorithm that uses Genetic Programming as the weak learner. It is shown in Figure 1.

First of all, the weight distribution D1 is initialized in Step 1, and the boosting iterations start (Step 2) by calling the GP algorithm each time. After the GP's complete execution, the best individual ft in the run is chosen, and the weight distribution Dt is updated according to the loss function for each example. Different forms can be used to calculate the loss function, such as the exponential one shown in equation (1). This loss function is also used to calculate the confidence of ft, equation (2). In each iteration of the boosting algorithm, the GP is executed with a fitness function that considers the weight of each example. The fitness function has been defined as the weighted sum of absolute errors; however, it can be defined according to the current problem. When the Boosting algorithm finishes, Step 3, the output is computed as a combination of the different generated hypotheses. This can be done in different forms. To calculate the final hypothesis F, the T functions ft are combined. The expression for the output used by Paris [8] is the geometric median weighted by the confidence coefficients: the predicted values are sorted and the weighted median (see Step 3 of Figure 1) is taken as F(x), the final expression.

Figure 1. The GPBoost algorithm:

Given: $S = \{(x_1, y_1), \ldots, (x_m, y_m)\}$; $x_i \in X$, $y_i \in Y$

Step 1: Initialize $D_1(i) := \frac{1}{m}$ for all $(x_i, y_i) \in S$

Step 2: For $t = 1, \ldots, T$ do:

Run the GP over $D_t$ with the fitness function

$fit(f) = \sum_{i=1}^{m} |f(x_i) - y_i| \cdot D_t(i) \cdot m$

where $f$ is a function in the GP population and $f_t$ is the best-of-run. Compute the loss for each example:

$L_i = \dfrac{|f_t(x_i) - y_i|}{\max_{i=1,\ldots,m} |f_t(x_i) - y_i|}$   (1)

Compute the average loss $\bar{L} = \sum_{i=1}^{m} L_i D_t(i)$ and let

$\beta_t = \dfrac{\bar{L}}{1 - \bar{L}}$   (2)

be the confidence given to $f_t$. Update the distribution:

$D_{t+1}(i) := \dfrac{D_t(i)\,\beta_t^{\,1 - L_i}}{Z_t}$, with $Z_t$ a normalization factor

End for.

Step 3: Output $F$, the weighted (geometric) median of the functions $f_t$:

$F(x) = \min\Big\{\, y \in \mathbb{R} : \sum_{t:\, f_t(x) \le y} \log\frac{1}{\beta_t} \;\ge\; \frac{1}{2} \sum_{t=1}^{T} \log\frac{1}{\beta_t} \,\Big\}$   (3)

2.1 Boosting using Correlation Coefficient

The traditional form of obtaining the output function of a Boosting algorithm is to use some kind of weighted combination of the outputs of the different boosting iterations. The weights are always based on the loss function (or the confidence) of the different functions $f_t$, as in the standard median, geometric median, arithmetic median and arithmetic RMSE-based median. Paris [8] reported no significant difference between these forms of output. However, the loss function is only one of the possible sources of information that can be used to obtain these weights. An interesting approach is to use the correlation between the training set and the obtained $f_t$ functions. The correlation coefficient is a metric which measures the degree of relation between



two variables. The correlation between two samples X and Y is calculated using equation (4).

$\rho(x, y) = \dfrac{\sum_{i=1}^{m} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{m} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{m} (y_i - \bar{y})^2}}$   (4)

where $\bar{x}$ and $\bar{y}$ are the means of the samples X and Y, respectively.

Therefore, to use the correlation coefficient in the Boosting algorithm, Step 3 is replaced by the new Step 3 (see equation (5)). Another modification in the algorithm was made to the calculation of the loss function: after performing many experiments, we concluded that using the exponential loss (see equation (6)) improves the results. To validate this change, experiments comparing the different methods were carried out.

New Step 3: Build up the final output hypothesis. Calculate the correlation coefficient between the given training set's values and each GPBoost run:

$\rho_t\big(f_t(x_i), y(x_i)\big) = \dfrac{\sum_{i=1}^{m} \big(f_t(x_i) - \overline{f_t(x)}\big)\big(y(x_i) - \overline{y(x)}\big)}{\sqrt{\sum_{i=1}^{m} \big(f_t(x_i) - \overline{f_t(x)}\big)^2}\,\sqrt{\sum_{i=1}^{m} \big(y(x_i) - \overline{y(x)}\big)^2}}$

where $y$ is the training set and $f_t$ is the GPBoost run. Calculate the final hypothesis, $F(x)$, using $\rho_t\big(f_t(x_i), y(x_i)\big)$:

$F(x_i) = \dfrac{\sum_{t=1}^{T} \rho_t\big(f_t(x_i), y(x_i)\big) \cdot f_t(x_i)}{\sum_{t=1}^{T} \rho_t\big(f_t(x_i), y(x_i)\big)}$, for $i = 1, \ldots, m-1$

$F(x_m) = \dfrac{\sum_{t=1}^{T} f_t(x_m)}{T}$, for $i = m$   (5)

$L_t(x_i) = 1 - \exp\left(-\dfrac{|f_t(x_i) - y(x_i)|}{\max_{i=1,\ldots,m} |f_t(x_i) - y(x_i)|}\right)$   (6)

3. Experiments

In this Section, we compare the GP's performance with GP using traditional Boosting and with the new Boosting algorithm (BCIGP). These results are also compared with the traditional statistical methodology (ARMA) [9]. We also present the main steps followed to configure the GP algorithm, GPBoost. The data sets used in this paper are from Morettin [10] and were obtained from the website http://www.ime.usp.br/~pam/ST.html, along with three financial time series from the website www.economatica.com. Each data set was divided into two sets: training and testing. The training set contains 90% of the data series and the remaining 10% is used as the test set.

To apply the GP, GPBoost and BCIGP algorithms, we chose the tool Lil-GP 1.0 [11], which is free, easily configurable software implemented according to Koza's GP [2]. For each problem to be solved by the tool, it is necessary to provide configuration files, standard GP parameters, the functions and variables to be used for discovering the models, input and output files (training set), and the fitness function evaluation. The parameters used to configure the GP tool are: the terminal set T = {θ, Zt-1, Zt-2, Zt-3, Zt-4}, where Zt-1, Zt-2, Zt-3, Zt-4 are used to estimate Z at time t and θ is a random constant; the function set F = {+, -, *, /, log, cos, sin, exp, root}; population size 1000; initialization method Grow; selection method Best; initial depth 2; maximum depth and maximum nodes 30; crossover rate 0.8; reproduction and mutation rates 0.1. The fitness function was defined as the weighted root mean square error (RMSE). The RMSE is widely used to measure the accuracy of forecasting methods. In this experiment, individuals with RMSE equal to or near 0 are best.

A Boosting algorithm with the different output hypotheses was implemented in the C language. The experiment uses ten Boosting iterations. Furthermore, for each dataset, ten models of each algorithm were obtained using a different random initial seed for each training set. After that, each generated model is used to forecast the values in the test set.

4. Results

To evaluate the performance of the different methods, we used the average mean square error (MSE) over the 10 initial seeds, computed out of sample. For the ARMA process, we have only one prediction. These results are summarized in Table 1.

Table 1. Average MSE on the test set for each series and method.

Series      GP          GPBoost     ARMA        BCIGP
Atmosph.    38.0820     36.7214     38.9877     7.5926
Beverages   245.6594    231.7631    308.2524    62.5961
Consum.     236.1543    140.5209    137.7677    60.8696
Fortaleza   423083.19   400212.4    445690.3    209855.37
ICV         554.1146    573.5342    661026.9    42614.54
IPI         130.9921    124.4582    624.9684    17.0574
Lavras      13788.93    8249.27     5335.99     4623.51
Sunspots    320.8523    318.7629    902.7028    329.1093
Djiad       0.000054    0.000053    0.053000    0.000052
Ibovespa    0.000260    0.000259    0.268000    0.000237
Nasdaq      0.000063    0.000063    0.063000    0.000062

Besides comparing the MSE on the test set for all methods used in this experiment, we also applied a paired two-sided t-test to estimate BCIGP's performance against the other methods. The results of these tests (expressed in standard deviations) are shown in Table 2. The performance was evaluated for each pair of algorithms in which one of the algorithms is BCIGP. If the absolute difference between two methods is greater than 2 standard deviations, the conclusion is that there is a significant difference between the methods evaluated and that the winning algorithm is better than the other at a 95% confidence level. In column two of Table 2 we present the comparative results of the traditional GP and BCIGP, where we can observe that only for two series is our approach not better than GP.

Table 2. Paired t-test results: absolute differences (ad) between BCIGP and each of the other methods, expressed in standard deviations.

Series      ad(GP-BCIGP)   ad(GPBoost-BCIGP)   ad(ARMA-BCIGP)
Atmosph.    47.5638        108.5077            149.284607
Beverages   23.1582        30.5943             92.6548
Consum.     3.1692         41.0439             78.4014
Fortaleza   8.0958         22.1414             42.6753
ICV         -2.9932        -2.9918             44.0088
IPI         46.8669        96.2223             987.0798
Lavras      11.9149        12.3149             3.0383
Sunspots    -0.3129        -0.3772             24.6674
Djiad       1.1881         3.4936              1.7837
Ibovespa    20.1914        20.2502             30.2937
Nasdaq      4.2226         7.4065              4.23316

In column three we have the comparative results of GPBoost and BCIGP. We notice that only in 2 series is our approach not better than GPBoost. Finally, in column four, we show the comparative results between BCIGP and the ARMA algorithm. We observe that in all the series our approach is better than the ARMA process at a 95% confidence level, with the exception of the Djiad series, for which the difference is not greater than two standard deviations, but is close to two.

In conclusion, the absolute differences between BCIGP and the other methods are almost always greater than 2 standard deviations, which means that there is a significant difference between BCIGP and the traditional GP, GP using Boosting, and ARMA methods for these series. The BCIGP performance is better than that of the other methods.

5. References

[1] M. A. Kaboudan, "Forecasting with computer evolved model specifications: a genetic programming application", Computers & Operations Research, vol. 30, pp. 1661-1681, Sept. 2003.

[2] J. Koza, "Genetic Programming: On the Programming of Computers by Means of Natural Selection", Cambridge, MA: The MIT Press, 1992.

[3] C. H. Chen, "Neural networks for financial market prediction", in Proceedings of the IEEE International Conference on Neural Networks, 1994, pp. 1199-1202.

[4] M. A. Kaboudan, "Measure of time series predictability using genetic programming applied to stock returns", Journal of Forecasting, vol. 18, pp. 345-357, 1998.

[5] H. Iba, T. Sasaki, "Using genetic programming to predict financial data", in Proceedings of the 1999 Congress on Evolutionary Computation, July 6-9, Washington, DC. Piscataway, NJ: IEEE, 1999, pp. 244-251.

[6] H. Drucker, "Improving regression using boosting techniques", in Proceedings of the International Conference on Machine Learning (ICML'97), Orlando, 1997.

[7] Y. Freund, R. Schapire, "Experiments with a new boosting algorithm", in Machine Learning: Proceedings of the Thirteenth International Conference, 1996, pp. 148-156.

[8] G. Paris, "Applying Boosting Techniques to Genetic Programming", Lecture Notes in Computer Science, vol. 2310, London, pp. 267-280, 2001.

[9] G. E. P. Box, G. M. Jenkins, "Time Series Analysis: Forecasting and Control", rev. ed. (1976), San Francisco: Holden-Day, 1970.

[10] P. Morettin, C. M. Toloi, "Análise de Séries Temporais", São Paulo: Edgard Blücher, 2004.

[11] D. Zongker, D. Punch, "Lil-gp 1.0 User's Manual", Michigan State University, July 1995.
