Robustly Forecasting the Bucharest Stock


  • 8/4/2019 Robustly Forecasting the Bucharest Stock


    Professor Vasile GEORGESCU, PhD

    Department of Mathematical Economics

    Faculty of Economics and Business Administration

    University of Craiova

    Email: [email protected]

    ROBUSTLY FORECASTING THE BUCHAREST STOCK

    EXCHANGE BET INDEX THROUGH A NOVEL

    COMPUTATIONAL INTELLIGENCE APPROACH

Abstract. In this paper two computational intelligence approaches are contrasted: the common approach, based on one-value-ahead neural network forecasting methods, and a novel approach, based on a mix of computational intelligence techniques (noise filtering with wavelets, fuzzy clustering, neural mapping of fuzzy transitions between cluster prototypes and robust prediction) for one-subsequence-ahead forecasting of stock market indices. The first approach serves to demonstrate that emerging markets are deeply affected by global influences such as external shocks or signals, and that, at least with neural network models, the inclusion of exogenous variables from well-established global markets significantly improves the forecasting performance of the emerging market model. However, one-value-ahead forecasting of price levels is not as useful as the shape of middle-term up and down movements, due to their inherent short-term randomness. The second approach proposes a novel one-subsequence-ahead forecasting framework that allows the prediction of stock index movements in a more robust way, focusing on predicting one price subsequence rather than one price level at a time.

Keywords: Computational intelligence, Subsequence time series fuzzy clustering, Neural mapping, One-subsequence-ahead forecasting of time series.

JEL Classification: C22, C45, C51, C53, C63, G17

    1. INTRODUCTION

The vision of looking at computational economics from the perspective of Computational Intelligence (CI) arises essentially from acknowledging the legacy of Herbert Simon to economics and thus primarily tries to face the challenge of modeling intelligent behaviors. As opposed to the human-neutral dynamics in physics, economic dynamics are deeply and inherently induced by either plenty or at least bounded human rationality. Basically, the idea behind CI is to model the


intelligence observed in natural behavior (neural sciences, linguistic behavior, biology, adaptive ecologic systems, immune systems, and so on). CI is particularly suitable for modeling and forecasting complex nonlinear and time-varying financial processes, where many difficult problems must be addressed to achieve tractability and robustness: the lack of an a priori specification of the model's structure, high noise levels, non-stationarities induced by structural changes over time, fluctuations and shocks, nonlinear effects of either underlying dynamics or complex human behavior, and so on. Exploiting the potential of computational intelligence techniques is at the core of this paper, which particularly focuses on the prediction of the future change in a stock market index based on information available at the time of the prediction.

The mathematical characterization of stock market movements has been a subject of intense interest. In principle, stock trading can be profitable if the direction of price movement can be predicted consistently. However, the prediction of financial markets is a very complex task, because financial time series are inherently noisy, non-stationary, and deterministically chaotic (i.e., short-term random but long-term deterministic). Within traditional financial economics, most believe that not only financial crises, but also daily price movements, are simply unpredictable. This conviction is based upon Eugene Fama's efficient-market hypothesis (EMH) and the related random-walk hypothesis, which state respectively that markets contain all information about possible future movements and that the movement of financial prices is random and practically unpredictable. As a consequence, investors' reactions should be random and should follow a normal distribution pattern, so that the net effect on market prices cannot be reliably exploited to make an abnormal profit, especially when considering transaction costs. Benoît Mandelbrot first observed that stock price variations follow complex dynamics where periods characterized by near random-walk movements are occasionally disrupted by large movements (i.e., crashes). Such turbulent events are much more common than would be predicted by a normal distribution. Although the conventional assumption has been that stock markets behave according to a random Gaussian distribution, statistical evidence proves this assumption incorrect. On the contrary, it suggests that stock market prices follow an inverse cubic power law. An empirical confirmation of such an assumption is provided by many financial indices, including the Bucharest Stock Exchange BET Index (see Figure 3). This led to the conclusion that the nature of market movements is generally much better explained using nonlinear dynamics and concepts of chaos theory.

In the last few decades, both the theoretical advances in behavioral finance and the empirical analyses have consistently found problems with the efficient-market hypothesis. It has become controversial because substantial inefficiencies were observed in the market (e.g., stocks with low price to earnings, cash-flow or book value outperform other stocks), leading investors to purchase overpriced growth stocks rather than value stocks. Speculative bubbles (anomalies in markets driven by buyers operating on irrational exuberance) are yet another contradiction of EMH. Despite the erratic fluctuations in stock prices in the short term, non-random walk and serial correlation evidence shows that the true value will in the long run be reflected in the stock price. This means that the problem of stock market predictability is difficult in its very nature, but not completely intractable.

As a response to real-world complexity, more and more sophisticated techniques have been proposed in an attempt to increase the predictability of financial instruments, including a wide range of computational intelligence techniques. Among them, feedforward and recurrent neural networks (NNs) gained increasing popularity. They, however, did not bear outstanding prediction accuracy (just slightly outperforming the benchmark random-walk accuracy rate of 50%), partly because of the tremendous noise and non-stationary characteristics of stock market data. Little evidence of predictability is commonly shown when out-of-sample forecasts are considered. On the other hand, the presence of short-term randomness suggests that larger profits can be consistently generated if long-term movements in the stock price are accurately predicted rather than short-term movements. Unfortunately, most of the proposed models focused on the accurate forecasting of the levels (i.e., values) of the underlying stock index (e.g., the next day's closing price forecast). Actually, the absolute value of a stock price is usually not as interesting as the shape of up and down movements (direction of change).

As an alternative to the one-value-ahead forecasting framework, this paper proposes a novel one-subsequence-ahead forecasting approach, which focuses on the predictability of the direction of stock index movement. It is based upon computational intelligence techniques and consists of four stages. We start with the preprocessing stage, which consists of normalizing and de-noising the time series by wavelet decomposition. A non-overlapping subsequence time series clustering procedure with a sliding window and a lower bound of the Dynamic Time Warping distance are addressed when applying the Fuzzy C-Means algorithm. Afterwards, the subsequence time series fuzzy transition function is learned by neural mapping, which consists of deriving, for each subsequence time series, the degrees to which it belongs to the c cluster prototypes, when the pc membership degrees of the previous p subsequences are presented as inputs to the neural network. Finally, this fuzzy transition function is applied to forecasting the one-subsequence-ahead time series, as a weighted mean of the c cluster prototypes to which it belongs, and the BET index data are used for testing.
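The final step above, reconstructing the next subsequence as a membership-weighted mean of the c cluster prototypes, can be sketched as follows. The prototype shapes and membership degrees below are illustrative placeholders, not values from the paper.

```python
# Sketch of the one-subsequence-ahead reconstruction step: the forecast
# subsequence is a weighted mean of the c cluster prototypes, with weights
# given by the membership degrees predicted by the neural mapping.

def forecast_subsequence(prototypes, memberships):
    """prototypes  -- list of c prototype subsequences (each of width w)
    memberships -- predicted degrees u_1..u_c for the next subsequence
                   (non-negative; normalized inside the function)"""
    w = len(prototypes[0])
    total = sum(memberships)
    return [
        sum(u * p[k] for u, p in zip(memberships, prototypes)) / total
        for k in range(w)
    ]

# Three hypothetical prototypes of width 4 ("up", "flat", "down" shapes)
protos = [[0.0, 0.1, 0.2, 0.3],
          [0.0, 0.0, 0.0, 0.0],
          [0.0, -0.1, -0.2, -0.3]]
u_next = [0.7, 0.2, 0.1]   # memberships produced by the neural map
print(forecast_subsequence(protos, u_next))   # mostly "up" shaped
```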

In what follows, we will contrast the two computational intelligence based approaches.

    2. ONE-VALUE-AHEAD FORECASTING OF BUCHAREST

    STOCK EXCHANGE BET INDEX, BASED ON NEURAL

    NETWORK AR/ARX MODELS

    2.1. Nonlinear Neural Network ARX (NNARX) Architecture

The most widespread feedforward NN that has been proved to be a universal function approximator (Hornik, Stinchcombe, & White, 1989) is the


multilayer perceptron (MLP), with hidden units having sigmoidal transfer functions. The class of MLP networks considered here is furthermore confined to those having one hidden layer. Hyperbolic tangent activation functions are usually preferred for hidden nodes and linear activation functions for output nodes. This architecture allows the MLP to approximate any computable function on a compact set arbitrarily closely by

$$ y_i = F_i\left( \sum_{j=1}^{q} v_{ij}\, \varphi_j\left( \sum_{l=1}^{m} w_{jl} z_l + w_{j0} \right) + v_{i0} \right), \qquad i = 1, \dots, n \qquad (1) $$

where $\varphi_j$ is a sigmoidal function, $F_i$ is a linear function, $q$ is the number of hidden units, $v_{ij}$ and $w_{jl}$ are weights, and $v_{i0}$ and $w_{j0}$ are biases (thresholds).

MLPs offer a straightforward extension to the classical way of modeling

time series. Namely, they can use a specific mechanism to deal with temporal information (layer delay without feedback or time window) and can thus extend the linear autoregressive model with exogenous variables (ARX) to the nonlinear ARX form:

$$ y_t = F_{MLP}\left( y_{t-1}, \dots, y_{t-n_a},\; X_{t-n_k}, \dots, X_{t-n_k-n_b+1} \right) + \varepsilon_t \qquad (2) $$

where $F_{MLP}$ is a non-linear function, $n_a$ is the number of past outputs, $n_b$ is the number of past inputs and $n_k$ is the time delay.
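Eqs. (1) and (2) can be combined in a short sketch: build the lagged NNARX regressor and feed it through a one-hidden-layer MLP with tanh hidden units and a linear output. The weights below are placeholders; in practice they are learned by backpropagation.

```python
import math

# Sketch of Eqs. (1)-(2): an NNARX regressor
# [y_{t-1},...,y_{t-na}, X_{t-nk},...,X_{t-nk-nb+1}]
# fed through a one-hidden-layer MLP (tanh hidden units, linear output).

def nnarx_regressor(y, x, t, na, nb, nk):
    """Lagged input vector for predicting y[t] (0-based indexing)."""
    return [y[t - i] for i in range(1, na + 1)] + \
           [x[t - nk - i] for i in range(nb)]

def mlp(z, W, w0, v, v0):
    """Eq. (1) with a single linear output and q tanh hidden units."""
    hidden = [math.tanh(sum(Wj[l] * z[l] for l in range(len(z))) + w0j)
              for Wj, w0j in zip(W, w0)]
    return sum(vj * hj for vj, hj in zip(v, hidden)) + v0

y = [0.10, 0.12, 0.11, 0.13, 0.14, 0.15]   # endogenous series (BET-like)
x = [1.00, 1.02, 1.01, 1.03, 1.05, 1.04]   # exogenous series (DJ-like)
z = nnarx_regressor(y, x, t=5, na=2, nb=2, nk=1)
print(z)   # [y[4], y[3], x[4], x[3]]

# q = 2 hidden units, placeholder weights
W  = [[0.1, 0.1, 0.1, 0.1], [0.2, -0.1, 0.0, 0.1]]
w0 = [0.0, 0.1]
v, v0 = [0.5, -0.3], 0.05
print(mlp(z, W, w0, v, v0))   # one-value-ahead prediction
```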

Nonlinear neural network ARX (NNARX) models are potentially more powerful than linear ones in that they can model more complex underlying characteristics of time series and theoretically do not have to assume stationarity.

Feedforward networks are well suited only for NNARX models, which allow a series-parallel architecture that has a predictor without feedback. In a

NNARX model, $y_t$ is a function of its lagged values $y_{t-j}$ and the lagged values of some exogenous variables. In principle, the output of the NNARX network can be considered as an estimate $\hat{y}_t$ of the output $y_t$ of some nonlinear dynamic system, and thus it should be fed back in the next stage to the input of the feedforward neural network. However, because the true previous outputs $y_{t-j}$ are available at time $t$ during the training of the network, a series-parallel architecture can be created, in which the true outputs $y_{t-j}$ are used instead of feeding back the estimated outputs $\hat{y}_{t-j}$, as shown in Figure 1. This has two advantages. The first is that the input to the feedforward network is more accurate. The second is that the resulting network has a purely feedforward architecture, and static backpropagation can be used for training.

For other types of models used in time series processing that involve predictors with feedback, one can resort to recurrent networks, where future network inputs will depend on present and past network outputs.


Figure 1. A purely feedforward architecture of the NNARX($n_a$, $n_b$, $n_k$) neural net

    2.2. NNARX based One-Value-Ahead Forecasting of BET Index

One of the major issues in neural network forecasting is how much data are necessary for neural networks to capture the dynamic nature of the underlying process in a time series. There are two facets to this issue:

(i) How many lagged observations should be used as inputs to the neural network (or, equivalently, how many input nodes should the neural network have)? Each actual historical value depends upon a number of preceding values (endogenous and exogenous lagged observations).

(ii) How many historical values should be used in training the neural network? Although a larger sample size, in the form of a longer time series, is usually recommended in model development, empirical results suggest that longer time series do not always yield models that provide the best forecasting performance. Using a smaller sample of time series, or data close in time to the out-of-sample period, can sometimes produce more accurate neural networks.

Actually, we focus our effort on forecasting the Bucharest Stock Exchange BET index. Some of its characteristics, such as the synchronization with well-established indices from global markets (Dow Jones, FTSE100-London, Nikkei-Tokyo), provide us with further guidance for choosing the sample size. The BET index has been relatively recently introduced, and its first period of about three years is characterized by a relatively flat evolution and a lack of synchronization with major indices. After that, it starts to synchronize well with the global market (Figure 2).

On the other hand, a commonly employed neural network design heuristic is to capture the dynamics of a stock market index through a time series model, which represents the movement of an endogenous stochastic variable only in terms of its lagged values. The nonlinear neural network autoregression (NNAR) model is a typical example. As an alternative, a NNARX model may be considered, where one or more exogenous variables are also included. Their role is to capture


extraneous influences. Indeed, the ever more global economy causes interaction effects among the various economies around the world. From this perspective, market indices are classified as either local and emerging or global and mature. Large global markets will have dynamically changing effects on an emerging market. Results indicate that, as global information is introduced, the forecasting performance of the neural network models for the emerging market index improves. Thus, especially with emerging markets, neural network models must incorporate global economic information into the input vector variable set to achieve optimal performance when forecasting financial time series. All indices in Figure 2 reveal non-stationary patterns. This is confirmed by the empirical distribution of the BET index, which is asymmetric and exhibits a power-law pattern rather than a Gaussian pattern (see Figure 3).

Figure 2. Parallel evolution of BET, Dow Jones and FTSE100 Indices. Choosing training and validation data samples

Figure 3. The empirical distribution of the BET index, compared with the standard normal distribution


We have tested a large set of model specification structures and their related neural network architectures in order to choose and validate the best settings for our neural-network approach to BET index forecasting. We started with

the simplest NNAR($n_a$) model, without exogenous variables, and then we successively introduced one or two exogenous variables (Dow Jones and FTSE100, respectively), thus passing from a purely time series model (NNAR) to a dynamic system representation, i.e. NNARX($n_a$, $n_b$, $n_k$).

It is worth mentioning that the well-known order specification tests, i.e. AIC (Akaike Information Criterion), BIC (Bayesian Information Criterion) and MDL (Minimum Description Length criterion) respectively, may produce inconsistent results when used for the selection of the appropriate orders $n_a$, $n_b$ and $n_k$ of a nonlinear model such as NNARX($n_a$, $n_b$, $n_k$). However, their indicative guess for the case of linear models can be the starting point in a specification procedure based on numeric experiments. For example, the orders can be chosen such that the Akaike Information Criterion (AIC)

$$ AIC = \log\left( V \left( 1 + \frac{2d}{N} \right) \right) $$

is minimized, where $V$ is the loss function, i.e.,

$$ V = \det\left( \frac{1}{N} \sum_{t=1}^{N} \varepsilon(t, \theta_N)\, \varepsilon(t, \theta_N)^{T} \right), $$

$d$ is the length of the parameter vector $\theta$, and $N$ is the number of data points used for the estimation. The Akaike Information Criterion suggests the following linear AR/ARX models:

$$ AR(1): \quad y_{t+1} = a_1 y_t + \varepsilon_{t+1} $$

$$ ARX(1, 4, 3): \quad y_{t+1} = a_1 y_t + b_1 X^1_{t-3} + b_2 X^1_{t-4} + b_3 X^1_{t-5} + b_4 X^1_{t-6} + \varepsilon_{t+1} $$

$$ ARX(1, [3, 2], [1, 4]): \quad y_{t+1} = a_1 y_t + b_{11} X^1_{t-1} + b_{12} X^1_{t-2} + b_{13} X^1_{t-3} + b_{21} X^2_{t-4} + b_{22} X^2_{t-5} + \varepsilon_{t+1} $$

where $y$ stands for BET, $X^1$ stands for DJ and $X^2$ stands for FTSE100.
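The AIC criterion above can be sketched for the scalar (single-output) case, where the determinant reduces to the mean squared residual. The residuals below are illustrative, not estimation results from the paper.

```python
import math

# Sketch of the order-selection criterion AIC = log( V * (1 + 2d/N) )
# for the scalar case, where V is the mean squared residual.

def aic(residuals, d):
    """residuals -- prediction errors of a fitted model
    d         -- number of estimated parameters"""
    N = len(residuals)
    V = sum(e * e for e in residuals) / N   # loss function (scalar det)
    return math.log(V * (1 + 2 * d / N))

# Illustrative residuals; richer models (larger d) are penalized.
residuals = [0.01, -0.02, 0.015, -0.005, 0.02, -0.01]
for d in (1, 3, 5):
    print(d, aic(residuals, d))
```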

For the nonlinear models NNAR($n_a$) and NNARX($n_a$, $n_b$, $n_k$) we chose to determine the orders by numerical experiments, trying orders in the range $[1, 5]$.

Once the model structure was specified and the neural network architecture was selected, the neural network BET index forecasting model was trained over the training data and then applied against the validation data. We adopted the standard approach for training and testing, that is, to evaluate a model by testing its performance on a validation set consisting of out-of-sample data (Figure 2).

Financial time series forecasting neural networks may be evaluated in many different ways. However, it is commonly admitted that the forecasting performance must result from a tradeoff between statistical accuracy measurement and trading strategy profitability. The problem cannot be reduced to how statistically significant the relationship between error measures (used to train a forecasting NN) is; the trading profitability (based on that forecast) is just as important. In other words, the goal is to improve both the predictability and the profitability. Commonly, the neural network forecasting models are evaluated


through the MSE (Mean Square Error). But statistical accuracy is not always a good warranty for profitability. It is to be noticed that even very small errors that are made in the wrong direction of change may cause significant negative returns, resulting in a capital loss for an investor following the recommendation of the neural network model. Hence, predicting the correct direction of change to an existing market index value is the primary criterion for financial forecasting models. Instead of measuring the mean standard error of a forecast, a better method for measuring the performance of neural networks is to analyze the proportion of predictions that correctly indicate the direction of change to the current market value. Therefore, all of the reported results in this paper will be based on both the MSE and the accuracy, or percentage of correct predictions with regard to direction of change.
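The two evaluation measures just described can be sketched as follows. The series below are illustrative, not BET data.

```python
# Sketch of the two performance measures used in this section: MSE and the
# percentage of forecasts whose direction of change (up/down relative to
# the previous actual value) matches the actual direction of change.

def mse(actual, forecast):
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)

def directional_accuracy(actual, forecast):
    """Fraction of steps t where sign(forecast[t] - actual[t-1])
    matches sign(actual[t] - actual[t-1])."""
    hits = sum(
        1 for t in range(1, len(actual))
        if (forecast[t] - actual[t - 1]) * (actual[t] - actual[t - 1]) > 0
    )
    return hits / (len(actual) - 1)

actual   = [10.0, 10.2, 10.1, 10.4, 10.3]
forecast = [10.0, 10.1, 10.3, 10.5, 10.2]
print(mse(actual, forecast))
print(directional_accuracy(actual, forecast))   # 0.75: 3 of 4 moves correct
```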

The neural network results will be compared to a standard benchmark, the random walk forecast. The random walk hypothesis assumes that, since tomorrow cannot be predicted, the best guess we can make is that tomorrow's price will be the same as today's price. If neural networks outperform the random walk, then it can be concluded that there is a nonlinear function or process inherent in the data tested. The implication is that a short-term (typically, one-step-ahead) forecast can be successfully generated.

We first assumed the hypothesis of an insulated stock market and

constructed a neural network forecasting model that shows only the endogenous dynamics of the BET index, relating its current value only to its lagged historical values. This leads to a purely time series based NNAR (nonlinear neural network AR) model. For each order $n_a$, with $n_a$ from 1 to 5, we trained the NNAR($n_a$) architecture and estimated the model. The forecasting model has been evaluated with both the MSE (Mean Square Error) and the trading performance (percentage of correct predictions with regard to direction of change). The table below shows the results.

             na=1      na=2      na=3      na=4      na=5
Performance  0.496855  0.507886  0.493671  0.517460  0.490446
MSE          0.000085  0.000087  0.000088  0.000088  0.000089

Although the statistical accuracy measured by MSE was optimal for the NNAR(1) model, the maximum trading performance has been obtained for the NNAR(4) model. The neural network architecture associated with this model is depicted in Figure 4.

The forecasting performance of this model is about 51.75%, which is close to the random walk performance. Based upon such a modest performance, the hypothesis of possible market inefficiencies appears to be rejected. The model supports rather a random walk interpretation of the BET market index and thus does not enable a forecasting advantage through non-random-walk or predictable behavior.


    Figure 4. The MLP architecture of the NNAR(4) model

However, including additional global knowledge, by introducing into the BET index equation some other well-established market indices such as DJ and/or FTSE100, proved to ameliorate the overall forecasting performance of the BET index neural network forecasting model. Thus, modeling the BET index as a NNARX-type nonlinear dynamic system with at least one exogenous variable appears to achieve a statistically significant improvement over the random walk benchmark dynamic behavior and to provide exploitable market inefficiency.

The DJ index was the first exogenous variable we introduced in the model. Now we have to choose between a large number of possible combinations, allowing $n_a$, $n_b$ and $n_k$ to vary in the range $[1, 5]$. The results are partially shown (only for $n_k = 1$ and $n_k = 2$) in the following tables.

For nk=1:

Performance  nb=1      nb=2      nb=3      nb=4      nb=5
na=1         0.484277  0.552050  0.566456  0.542857  0.554140
na=2         0.476341  0.548896  0.522152  0.526984  0.525478
na=3         0.481013  0.528481  0.506329  0.517460  0.522293
na=4         0.473016  0.517460  0.504762  0.526984  0.535032
na=5         0.461783  0.519108  0.506369  0.509554  0.528662

For nk=2:

Performance  nb=1      nb=2      nb=3      nb=4      nb=5
na=1         0.533123  0.547468  0.536508  0.525478  0.495208
na=2         0.517350  0.528481  0.507937  0.531847  0.523962
na=3         0.531646  0.522152  0.495238  0.500000  0.488818
na=4         0.495238  0.520635  0.533333  0.500000  0.463259
na=5         0.490446  0.484076  0.535032  0.519108  0.507987

The selected model is NNARX(1, 3, 1), with a forecasting performance that increases from 51.75% to 56.65%. The neural network architecture associated with this model is depicted in Figure 5.


    Figure 5. The MLP architecture of the NNARX(1, 3, 1) model

Finally, both the DJ index and the FTSE100 index are included as exogenous variables in the model. The structure specification consists of choosing orders and time delays for which the best forecasting performance is reached in the class of NNARX($n_a$, [$n_{b1}$, $n_{b2}$], [$n_{k1}$, $n_{k2}$]) models. The results are partially given below.

For nb2=4, nk1=1, nk2=4:

Performance  nb1=1     nb1=2     nb1=3     nb1=4     nb1=5
na=1         0.426282  0.483974  0.522436  0.583333  0.557692
na=2         0.442308  0.544872  0.564103  0.512821  0.551282
na=3         0.522436  0.544872  0.580769  0.512821  0.554487
na=4         0.458333  0.528846  0.608974  0.544872  0.535256
na=5         0.487179  0.525641  0.563590  0.548077  0.541667

The selected model is NNARX(4, [3, 4], [1, 4]), which increases the forecasting performance to over 60%. Note that the exogenous variable FTSE100 has a time delay of $n_{k2} = 4$. The neural network architecture associated with this model and its learning capability are displayed in Figures 6 and 7.

Figure 6. The MLP architecture of the NNARX(4, [3, 4], [1, 4]) model: a series-parallel architecture with input nodes BET(t-1)...BET(t-4), DJ(t-1)...DJ(t-3) and FTSE100(t-4)...FTSE100(t-7), one hidden layer, and output BEThat(t)


    Figure 7. Learning capability of the selected neural network architecture

The plot comparing predictions to actual measurements for the validation dataset (out-of-sample predictions) confirms by visual inspection that forecasts are reasonably accurate, but are sometimes out of phase, which means over-anticipation (Figure 8).

    Figure 8. Comparing out-of-sample predictions to actual measurements

    The prediction errors for the validation dataset are depicted in Figure 9.

    Figure 9. Out-of-sample prediction errors


Just as expected, the in-sample forecasts (i.e., forecasts derived from the training dataset) are much more accurate than the out-of-sample ones (Figure 10). However, in-sample data cannot serve for performance validation because they usually produce overfitted results.

    Figure 10. Comparing in-sample (i.e. based on the training dataset)

    predictions to actual measurements

In conclusion, the highest forecasting performance was achieved by the NNARX(4, [3, 4], [1, 4]) model, which utilizes input values from both the DJ and the FTSE100 indices (recall that the performance measures the percentage of neural network forecasts that are in the same direction, up or down, as the actual index for the forecast period). With this model, a significant improvement over the random walk benchmark dynamic behavior has been obtained, and exploitable market inefficiency has been proved for the Bucharest Stock Exchange BET index, provided that exogenous variables with global impact are involved in the neural network forecasting model.

    3. A NOVEL COMPUTATIONAL INTELLIGENCE BASED

    FORECASTING FRAMEWORK

    3.1. Time Series Preprocessing

This stage consists of de-noising data by wavelet decomposition and some other transformations that rely heavily on the selection of a distance measure for clustering.

The Discrete Wavelet Transform (DWT, [10]) uses scaled and shifted versions of a mother wavelet function, usually with compact support, to form either an orthonormal basis (Haar wavelet, Daubechies) or a bi-orthonormal basis (Symlets, Coiflets). Wavelets allow cutting up data into different frequency components (called approximations and details), and then studying each component with a resolution matched to its scale. They can help de-noise inherently noisy data such as financial time series through wavelet shrinkage and


thresholding methods, developed by David Donoho ([3]). The idea is to set to zero all wavelet coefficients corresponding to details in the data set that are less than a particular threshold. These coefficients are used in an inverse wavelet transformation to reconstruct the data set. An important advantage is that the de-noising is carried out without smoothing out the sharp structures, and thus can help to increase both the clustering accuracy and the predictive performance.

Care has to be taken in choosing suitable transformations such that the time series distance measure chosen in the clustering stage is meaningful to the application. Normalization of data is common practice when using Fuzzy C-Means, which means applying scaling and vertical translation to the time series as a whole. Moreover, as we already mentioned, the absolute value of a stock price is not as interesting as the shape of up and down movements. Thus, for allowing stock price comparisons subsequence by subsequence, a local translation is also necessary, in such a way as to have each subsequence starting from zero. A subset of 2048 daily closing BET index values drawn from Bucharest Stock Exchange Market data, as well as the normalized and de-noised data, are shown in Figure 11, where a level 5 decomposition with Sym8 wavelets and a fixed form soft thresholding were used.

Figure 11. A normalized and de-noised data subset, drawn from daily closing BET index (top panel: original time series; bottom panel: normalized and de-noised time series)
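The shrink-the-details idea behind wavelet de-noising can be sketched as follows. The paper uses a level-5 decomposition with Sym8 wavelets; as a self-contained stand-in, this sketch uses a single-level Haar transform with soft thresholding, which illustrates the same principle on illustrative data.

```python
# Sketch of wavelet shrinkage de-noising: decompose into approximation and
# detail coefficients, soft-threshold the details, then invert the
# transform. Single-level Haar stand-in (even-length input assumed); the
# paper uses level-5 Sym8 instead.

def haar_denoise(series, threshold):
    approx = [(a + b) / 2 for a, b in zip(series[::2], series[1::2])]
    detail = [(a - b) / 2 for a, b in zip(series[::2], series[1::2])]
    # soft thresholding: shrink detail coefficients toward zero
    shrunk = [max(abs(d) - threshold, 0.0) * (1 if d >= 0 else -1)
              for d in detail]
    out = []
    for a, d in zip(approx, shrunk):   # inverse Haar transform
        out.extend([a + d, a - d])
    return out

noisy = [1.0, 1.4, 1.2, 1.1, 2.0, 1.9, 2.2, 1.8]
print(haar_denoise(noisy, threshold=0.15))   # small jitters flattened
```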

    3.2. Subsequence Time Series Fuzzy Clustering

The idea in subsequence time series (STS) clustering is as follows. Just a single long time series is given at the start of the clustering process, from which we extract short series with a sliding window. The resulting set of subsequences is


then clustered, such that each time series is allowed to belong to each cluster to a certain degree, because of the fuzzy nature of the fuzzy c-means algorithm we use. The window width and the time delay between consecutive windows are two key choices. The window width depends on the application; it could be some larger time unit (e.g., 10 days for time series sampled as daily closing BET stock index, in our application). Overlapping or non-overlapping windows can be used. If the delay is equal to the window width, the problem is essentially converted to non-overlapping subsequence time series clustering. We will follow this approach, being motivated by Keogh's criticism presented in [8], where using overlapping windows has been shown to produce meaningless results, due to a surprising anomaly: cluster centers obtained using STS clustering closely resemble sine waves, irrespective of the nature of the original time series itself, this being caused by the superposition of slightly shifted subsequences. Using larger time delays for placing the windows does not really solve the problem as long as there is some overlap. Also, the less overlap, the more problematic the choice of the offsets becomes.
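The windowing step just described, non-overlapping windows combined with the local translation to zero from the preprocessing stage, can be sketched as follows on an illustrative series.

```python
# Sketch of non-overlapping subsequence extraction (delay == window width),
# with each window translated so it starts at zero, so that clustering
# compares shapes rather than price levels.

def subsequences(series, w):
    """Non-overlapping windows of width w, each shifted to start at 0;
    a trailing remainder shorter than w is discarded."""
    return [
        [v - series[start] for v in series[start:start + w]]
        for start in range(0, len(series) - w + 1, w)
    ]

series = [3.0, 3.2, 3.1, 3.5, 3.4, 3.3, 3.6, 3.9, 3.8]
for s in subsequences(series, w=3):
    print(s)   # each subsequence begins at 0.0
```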

Since clustering relies strongly on a good choice of the dissimilarity measure, this leads to adopting an appropriate distance, depending on the very nature of the subsequence time series.

Let $S = y_m, \dots, y_{m+w-1}$ be a subsequence of length $w$ of the time series $Y = y_1, \dots, y_n$, where $1 \le m \le n - w + 1$. Subsequences will be represented as vectors in a $w$-dimensional vector space. For relatively short time series, shape-based distances, such as $L_p$ norms, are commonly used to compare their overall appearance. The Euclidean distance ($L_2$) is the most widely used shape-based distance. Other $L_p$ norms can be used as well, such as Manhattan ($L_1$) and Maximum ($L_\infty$), putting different emphasis on large deviations.

There are several pitfalls when using an $L_p$ distance on time series: it does not allow for different baselines in the time sequences; it is very sensitive to phase shifts in time; it does not allow for acceleration and deceleration along the time axis (time warping). Another problem with $L_p$ distances of time series arises when scaling and translation of the amplitudes or the time axis are considered, or when outliers and noisy regions are present.

A number of non-metric distance measures have been defined to overcome some of these problems. Small distortions of the time axis are commonly addressed with non-uniform time warping, more precisely with Dynamic Time Warping (DTW, [7]). The DTW distance is an extensively used technique in speech recognition and allows warping of the time axes (acceleration/deceleration of signals along the time dimension) in order to better align the shapes of the two time series. The two series can also be of different lengths. The optimal alignment is found by calculating the shortest warping path in the matrix of distances between all pairs of time points under several constraints (boundary conditions, continuity, monotonicity).


The warping path is also constrained in a global sense by limiting how far it may stray from the diagonal. The subset of the matrix that the warping path is allowed to visit is called the warping window. The two most common constraints in the literature are the Sakoe-Chiba band and the Itakura parallelogram. We can view a global or local constraint as constraining the indices of the warping path w_k = (i, j)_k, such that j − r ≤ i ≤ j + r, where r is a term defining the allowed range of warping for a given point in a sequence. In the case of the Sakoe-Chiba band (see Figure 12), r is independent of i; for the Itakura parallelogram, r is a function of i.

Figure 12. (a) Aligning two time sequences using DTW. (b) Optimal warping path with the Sakoe-Chiba band as global constraint.
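A compact dynamic-programming implementation of this scheme might look as follows (our own sketch, not the paper's code; the parameter `r` plays the role of the Sakoe-Chiba half-band width):

```python
import numpy as np

def dtw(q, c, r=None):
    """DTW distance via dynamic programming, O(n*m) time.
    An optional Sakoe-Chiba band of half-width r constrains
    the warping path to stay near the diagonal."""
    n, m = len(q), len(c)
    r = max(n, m) if r is None else r
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - r), min(m, i + r) + 1):
            cost = (q[i - 1] - c[j - 1]) ** 2
            # continuity + monotonicity: step from left, below, or diagonal
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(np.sqrt(D[n, m]))
```

Out-of-phase but similar shapes then match: for instance, `dtw([0, 1, 0, 0], [0, 0, 1, 0])` is 0, while the Euclidean distance between the same two vectors is sqrt(2).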

DTW is a much more robust distance measure for time series than L_2, allowing similar shapes to match even if they are out of phase in the time axis. Unfortunately, however, DTW is calculated using dynamic programming with time complexity O(n²). Recent approaches focus more on approximating the DTW distance by bounding it from below. For example, a novel, linear-time (i.e., with complexity reduced to O(n)) lower bound of the DTW distance was proposed in [9]. The intuition behind the approach is the construction of a special envelope around the query. It can be shown that the Euclidean distance between a potential match and the nearest orthogonal point on the envelope lower bounds the DTW distance. To index this representation, an approximate bounding envelope is created.

Let Q = {q_1, ..., q_n} and C = {c_1, ..., c_m} be two subsequences and w_k = (i, j)_k be the warping path, such that j − r ≤ i ≤ j + r, where r is a term defining the range of warping for a given point in a sequence. The term r can be used to define two new sequences, L and U, where L_i = min(q_{i−r} : q_{i+r}) and U_i = max(q_{i−r} : q_{i+r}), with L and U standing for Lower and Upper, respectively. An obvious but important property of L and U is the following: ∀i, L_i ≤ q_i ≤ U_i.


Given L and U, a lower bounding measure for DTW can now be defined (see Figure 13):

    LB_Keogh(Q, C) = sqrt( Σ_{i=1..n} d_i ),  where

        d_i = (c_i − U_i)²  if c_i > U_i,
        d_i = (c_i − L_i)²  if c_i < L_i,
        d_i = 0             otherwise.                               (3)

Figure 13. The lower bounding function LB_Keogh(Q, C). The original sequence Q is enclosed in the bounding envelope of U and L.
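Equation (3) translates almost directly into code. A sketch (the helper `lb_keogh` is our own naming; the envelope [L, U] is built around Q with warping range r, and C is then compared against it):

```python
import numpy as np

def lb_keogh(q, c, r):
    """LB_Keogh lower bound of DTW(Q, C): square root of the summed
    squared distances from each c_i to the envelope [L_i, U_i] of Q."""
    q, c = np.asarray(q, float), np.asarray(c, float)
    n = len(q)
    total = 0.0
    for i in range(n):
        win = q[max(0, i - r): min(n, i + r + 1)]
        U, L = win.max(), win.min()
        if c[i] > U:
            total += (c[i] - U) ** 2
        elif c[i] < L:
            total += (c[i] - L) ** 2
    return float(np.sqrt(total))
```

The naive sliding min/max above costs O(n·r); a strictly linear-time version would maintain the running extrema with a monotonic deque, but this keeps the sketch short.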

We are now going to generalize the fuzzy c-means algorithm to subsequence time series clustering. In this particular context, the entities to be clustered, denoted by x_k, and the cluster prototypes (centroids), denoted by v_i, are both set-defined objects, i.e., subsequence time series. The centroids are computed as weighted means, where the weights, denoted by u_ik, are the fuzzy membership degrees to which each subsequence belongs to a cluster. Both the DTW and LB_Keogh distances outperform L_2 and thus are better qualified for use with the fuzzy c-means algorithm. However, LB_Keogh's lower bound of the DTW distance has been preferred, due to its linear time complexity. Figure 14 plots the cluster centroids (prototypes) and the subsequence time series grouped around each centroid.
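A minimal sketch of such a generalized fuzzy c-means loop (our own illustration, not the paper's implementation; the `dist` argument is where an LB_Keogh or DTW distance would be plugged in, defaulting here to L_2 for brevity, with fuzzifier m = 2):

```python
import numpy as np

def fuzzy_cmeans(X, c, m=2.0, dist=None, n_iter=50, seed=0):
    """Fuzzy c-means over the rows of X (each row one subsequence).
    Centroids are fuzzy-weighted means; memberships follow the
    standard FCM update, with a pluggable distance function."""
    if dist is None:
        dist = lambda a, b: np.linalg.norm(a - b)   # L_2 placeholder
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    U = rng.random((c, N))
    U /= U.sum(axis=0)                              # columns sum to 1
    for _ in range(n_iter):
        W = U ** m
        V = W @ X / W.sum(axis=1, keepdims=True)    # weighted-mean centroids
        D = np.array([[max(dist(V[i], X[k]), 1e-12)
                       for k in range(N)] for i in range(c)])
        Dn = D ** (-2.0 / (m - 1.0))
        U = Dn / Dn.sum(axis=0)                     # membership update
    return U, V

# two well-separated pairs of 2-point "subsequences"
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
U, V = fuzzy_cmeans(X, c=2)
```

Each column of U gives one subsequence's membership degrees across the c clusters; these membership vectors are exactly the inputs used by the neural mapping stage described next.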

3.3. Estimation of the Fuzzy Transition Function between Clusters by

Neural Mapping

At this stage, a fuzzy transition function between clusters must be learned, which is a nonlinear vector function mapping a number of c-dimensional membership degree vectors STS(t_{j+1}), j = 1, ..., p, into a c-dimensional


Figure 15. Accurate neural mapping: actual and predicted membership degrees to which each of the 256 subsequence time series belongs to one of the 5 clusters (one pair of panels per cluster: actual vs. predicted membership degrees).

Figure 16. Prediction errors and their histogram for each of the 5 clusters.


Figure 17. One-subsequence-ahead forecasts of 15 out-of-sample subsequences (observed vs. predicted out-of-sample sequences).

The two forecasting approaches can now be easily contrasted. In contrast to the common neural network forecasting approach, which focuses on price level forecasts attempting to (more or less) outperform the random walk benchmark accuracy, our computational intelligence approach is intended to reliably exploit the shape of middle-term up and down price movements. The 15 out-of-sample subsequence forecasts shown in Figure 17 cover 15 × 10 = 150 transaction days and prove to be considerably robust in filtering out the short-term randomness and in predicting the right direction of change.

    6. CONCLUSION

Predicting price levels is an intriguing, challenging, and admittedly risky endeavor. Technical analysis uses trend-following strategies to forecast future price movements and to infer trading decision rules, based on the assertion that price changes have inertia. However, experimental work shows little evidence of predictability, with accuracy rates that only slightly exceed the random walk benchmark performance.

The first approach in this paper served essentially to test and reject the hypothesis of an insulated emerging stock market. We presumed that, among other emerging capital markets, the Bucharest Stock Exchange is tremendously affected by global interaction effects and extraneous influences from mature markets. In an attempt to validate this presumption, we compared the statistical accuracy and the trading performance of several neural network forecasting models for the BET index against each other and against the random walk benchmark performance. NNAR(na) models were used to capture only the endogenous dynamics of the BET index, and NNARX(na, nb, nk) models to additionally capture exogenous influences induced by global market indices (Dow Jones and/or FTSE100). To conclude, significantly more accurate results have been obtained through the inclusion of exogenous variables from well-established global markets.


However, because of the short-term randomness of stock market prices, next-day forecasts cannot be efficiently exploited for building consistent trading strategies. The second approach introduced a novel computational intelligence framework allowing one-subsequence-ahead instead of one-value-ahead forecasts. Experimental evidence with ten-day-length-subsequence-ahead forecasts for the BET index proved to be significantly more robust than one-day-ahead value forecasts in showing the right direction of change.

    REFERENCES

[1] Chen, S.-H., Jain, L., Tai, C.-C. (Eds.) (2006), Computational Economics: A Perspective from Computational Intelligence, Idea Group;

[2] Chen, S.-H., Wang, P.P., Kuo, T.-W. (Eds.) (2007), Computational Intelligence in Economics and Finance, Springer-Verlag;

[3] Donoho, D. (1993), Nonlinear Wavelet Methods for Recovery of Signals, Densities, and Spectra from Indirect and Noisy Data. In: Different Perspectives on Wavelets, Proceedings of Symposia in Applied Mathematics, Vol. 47, I. Daubechies (ed.), Amer. Math. Soc., Providence, R.I., pp. 173–205;

[4] Georgescu, V. (2009), Generalizations of Fuzzy C-Means Algorithm to Granular Feature Spaces, based on Underlying Fuzzy Metrics: Issues and Related Works. In: 13th IFSA World Congress and 6th Conference of EUSFLAT, pp. 1791–1796, Lisbon, Portugal;

[5] Georgescu, V. (2009), A Time Series Knowledge Mining Framework Exploiting the Synergy between Subsequence Clustering and Predictive Markovian Models. Fuzzy Economic Review, Vol. XIV, No. 1, pp. 41–66;

[6] Georgescu, V. (2005), Applied Econometrics: Time Series Analysis (A master course in English), Universitaria, Craiova;

[7] Keogh, E., Pazzani, M.J. (1999), Scaling up Dynamic Time Warping to Massive Datasets. In: Zytkow, J.M., Rauch, J. (eds), 3rd European Conference on Principles of Data Mining and Knowledge Discovery (PKDD'99), pp. 1–11, Springer;

[8] Keogh, E., Lin, J., Truppel, W. (2003), Clustering of Time Series Subsequences is Meaningless: Implications for Previous and Future Research. In: 3rd IEEE International Conference on Data Mining, pp. 115–122;

[9] Keogh, E., Ratanamahatana, C.A. (2005), Exact Indexing of Dynamic Time Warping. Knowledge and Information Systems, 7, pp. 358–386;

[10] Mallat, S.G., Peyré, G. (2009), A Wavelet Tour of Signal Processing: The Sparse Way. Academic Press, 3rd Edition.


Copyright of Economic Computation & Economic Cybernetics Studies & Research is the property of Economic Computation & Economic Cybernetics Studies & Research, and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use.