Robustly Forecasting the Bucharest Stock


  • 8/4/2019 Robustly Forecasting the Bucharest Stock


    Professor Vasile GEORGESCU, PhD

    Department of Mathematical Economics

    Faculty of Economics and Business Administration

    University of Craiova

    Email: [email protected]

    ROBUSTLY FORECASTING THE BUCHAREST STOCK

    EXCHANGE BET INDEX THROUGH A NOVEL

    COMPUTATIONAL INTELLIGENCE APPROACH

Abstract. In this paper two computational intelligence approaches are contrasted: the common approach, based on one-value-ahead neural network forecasting methods, and a novel approach, based on a mix of computational intelligence techniques (noise filtering with wavelets, fuzzy clustering, neural mapping of fuzzy transitions between cluster prototypes and robust prediction) for one-subsequence-ahead forecasting of stock market indices. The first approach serves to demonstrate that emerging markets are deeply affected by global influences such as external shocks or signals, and that, at least with neural network models, the inclusion of exogenous variables from well-established global markets significantly improves the forecasting performance of the emerging market model. However, one-value-ahead forecasting of price levels is not as useful as the shape of middle-term up and down movements, due to their inherent short-term randomness. The second approach proposes a novel one-subsequence-ahead forecasting framework that allows the prediction of stock index movements in a more robust way, focusing on predicting one price subsequence rather than one price level at a time.

Keywords: Computational intelligence, Subsequence time series fuzzy clustering, Neural mapping, One-subsequence-ahead forecasting of time series.

JEL Classification: C22, C45, C51, C53, C63, G17

    1. INTRODUCTION

The vision of looking at computational economics from the perspective of Computational Intelligence (CI) arises essentially from acknowledging the legacy of Herbert Simon to economics and thus primarily tries to face the challenge of modeling intelligent behaviors. As opposed to the human-neutral dynamics in physics, economic dynamics are deeply and inherently induced by either plenty or at least bounded human rationality. Basically, the idea behind CI is to model the


intelligence observed in natural behavior (neural sciences, linguistic behavior, biology, adaptive ecologic systems, immune systems, and so on). CI is particularly suitable for modeling and forecasting complex nonlinear and time-varying financial processes, where many difficult problems must be addressed to achieve tractability and robustness: the lack of an a priori specification of the model's structure, high noise levels, non-stationarities induced by structural changes over time, fluctuations and shocks, nonlinear effects of either underlying dynamics or complex human behavior, and so on. Exploiting the potential of computational intelligence techniques is at the core of this paper, which particularly focuses on the prediction of the future change in a stock market index based on information available at the time of the prediction.

The mathematical characterization of stock market movements has been a subject of intense interest. In principle, stock trading can be profitable if the direction of price movement can be predicted consistently. However, the prediction of financial markets is a very complex task, because financial time series are inherently noisy, non-stationary, and deterministically chaotic (i.e., short-term random but long-term deterministic). Within traditional financial economics, most believe that not only financial crises, but also daily price movements, are simply unpredictable. This conviction is based upon Eugene Fama's efficient-market hypothesis (EMH) and the related random-walk hypothesis, which state respectively that markets contain all information about possible future movements and that the movement of financial prices is random and practically unpredictable. As a consequence, investors' reactions should be random and should follow a normal distribution pattern, so that the net effect on market prices cannot be reliably exploited to make an abnormal profit, especially when considering transaction costs. Benoît Mandelbrot first observed that stock price variations follow complex dynamics where periods characterized by near random-walk movements are occasionally disrupted by large movements (i.e., crashes). Such turbulent events are much more common than would be predicted by a normal distribution. Although the conventional assumption has been that stock markets behave according to a random Gaussian distribution, statistical evidence proves this assumption incorrect. On the contrary, it suggests that stock market prices follow an inverse cubic power law. An empirical confirmation of such an assumption is provided by many financial indices, including the Bucharest Stock Exchange BET Index (see Figure 3). This led to the conclusion that the nature of market movements is generally much better explained using nonlinear dynamics and concepts of chaos theory.

In the last few decades, both the theoretical advances in behavioral finance and the empirical analyses have consistently found problems with the efficient-market hypothesis. It has become controversial because substantial inefficiencies were observed in the market (e.g., stocks with low price to earnings, cash-flow or book value outperform other stocks), leading investors to purchase overpriced growth stocks rather than value stocks. Speculative bubbles (anomalies in markets driven by buyers operating on irrational exuberance) are yet another contradiction of EMH. Despite the erratic fluctuations in stock prices in the short term, non-random walk and serial correlation evidence shows that the true value will in the long run be reflected in the stock price. This means that the problem of stock market predictability is difficult in its very nature, but not completely intractable.

As a response to real-world complexity, more and more sophisticated techniques have been proposed in an attempt to increase the predictability of financial instruments, including a wide range of computational intelligence techniques. Among them, feedforward and recurrent neural networks (NNs) gained increasing popularity. They, however, did not bear outstanding prediction accuracy (just slightly outperforming the benchmark random-walk accuracy rate of 50%), partly because of the tremendous noise and non-stationary characteristics of stock market data. Little evidence of predictability is commonly shown when out-of-sample forecasts are considered. On the other hand, the presence of short-term randomness suggests that larger profits can be consistently generated if long-term movements in the stock price are accurately predicted rather than short-term movements. Unfortunately, most of the proposed models focused on the accurate forecasting of the levels (i.e., values) of the underlying stock index (e.g., the next day's closing price forecast). Actually, the absolute value of a stock price is usually not as interesting as the shape of up and down movements (direction of change).

As an alternative to the one-value-ahead forecasting framework, this paper proposes a novel one-subsequence-ahead forecasting approach, which focuses on the predictability of the direction of stock index movement. It is based upon computational intelligence techniques and consists of four stages. We start with the preprocessing stage, which consists of normalizing and de-noising the time series by wavelet decomposition. A non-overlapping subsequence time series clustering procedure with a sliding window and a lower bound of the Dynamic Time Warping distance are addressed when applying the Fuzzy C-Means algorithm. Afterwards, the subsequence time series fuzzy transition function is learned by neural mapping, which consists of deriving, for each subsequence time series, the degrees to which it belongs to the c cluster prototypes, when the pc membership degrees of the previous p subsequences are presented as inputs to the neural network. Finally, this fuzzy transition function is applied to forecasting the one-subsequence-ahead time series, as a weighted mean of the c cluster prototypes to which it belongs, and the BET index data are used for testing.
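The final step above, reconstructing the next subsequence as a membership-weighted mean of the c cluster prototypes, can be sketched as follows. The prototype shapes and membership degrees below are illustrative placeholders, not values from the paper.

```python
# Sketch of the one-subsequence-ahead reconstruction step: the forecast
# subsequence is a weighted mean of the c cluster prototypes, with weights
# given by the membership degrees predicted by the neural mapping.

def forecast_subsequence(prototypes, memberships):
    """prototypes  -- list of c prototype subsequences (each of width w)
    memberships -- predicted degrees u_1..u_c for the next subsequence
                   (non-negative; normalized inside the function)"""
    w = len(prototypes[0])
    total = sum(memberships)
    return [
        sum(u * p[k] for u, p in zip(memberships, prototypes)) / total
        for k in range(w)
    ]

# Three hypothetical prototypes of width 4 ("up", "flat", "down" shapes)
protos = [[0.0, 0.1, 0.2, 0.3],
          [0.0, 0.0, 0.0, 0.0],
          [0.0, -0.1, -0.2, -0.3]]
u_next = [0.7, 0.2, 0.1]   # memberships produced by the neural map
print(forecast_subsequence(protos, u_next))   # mostly "up" shaped
```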

In what follows, we will contrast the two computational intelligence based approaches.

    2. ONE-VALUE-AHEAD FORECASTING OF BUCHAREST

    STOCK EXCHANGE BET INDEX, BASED ON NEURAL

    NETWORK AR/ARX MODELS

    2.1. Nonlinear Neural Network ARX (NNARX) Architecture

The most widespread feedforward NN that has been proved to be a universal function approximator (Hornik, Stinchcombe, & White, 1989) is the


multilayer perceptron (MLP), with hidden units having sigmoidal transfer functions. The class of MLP networks considered here is furthermore confined to those having one hidden layer. Hyperbolic tangent activation functions are usually preferred for hidden nodes and linear activation functions for output nodes. This architecture allows the MLP to approximate any computable function on a compact set arbitrarily closely by

$$ y_i = F_i\left( \sum_{j=1}^{q} v_{ij}\, \varphi_j\left( \sum_{l=1}^{m} w_{jl} z_l + w_{j0} \right) + v_{i0} \right), \qquad i = 1, \dots, n \qquad (1) $$

where $\varphi_j$ is a sigmoidal function, $F_i$ is a linear function, $q$ is the number of hidden units, $v_{ij}$ and $w_{jl}$ are weights, and $v_{i0}$ and $w_{j0}$ are biases (thresholds).

MLPs offer a straightforward extension to the classical way of modeling

time series. Namely, they can use a specific mechanism to deal with temporal information (layer delay without feedback or time window) and can thus extend the linear autoregressive model with exogenous variables (ARX) to the nonlinear ARX form:

$$ y_t = F_{MLP}\left( y_{t-1}, \dots, y_{t-n_a},\; X_{t-n_k}, \dots, X_{t-n_k-n_b+1} \right) + \varepsilon_t \qquad (2) $$

where $F_{MLP}$ is a non-linear function, $n_a$ is the number of past outputs, $n_b$ is the number of past inputs and $n_k$ is the time delay.
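Eqs. (1) and (2) can be combined in a short sketch: build the lagged NNARX regressor and feed it through a one-hidden-layer MLP with tanh hidden units and a linear output. The weights below are placeholders; in practice they are learned by backpropagation.

```python
import math

# Sketch of Eqs. (1)-(2): an NNARX regressor
# [y_{t-1},...,y_{t-na}, X_{t-nk},...,X_{t-nk-nb+1}]
# fed through a one-hidden-layer MLP (tanh hidden units, linear output).

def nnarx_regressor(y, x, t, na, nb, nk):
    """Lagged input vector for predicting y[t] (0-based indexing)."""
    return [y[t - i] for i in range(1, na + 1)] + \
           [x[t - nk - i] for i in range(nb)]

def mlp(z, W, w0, v, v0):
    """Eq. (1) with a single linear output and q tanh hidden units."""
    hidden = [math.tanh(sum(Wj[l] * z[l] for l in range(len(z))) + w0j)
              for Wj, w0j in zip(W, w0)]
    return sum(vj * hj for vj, hj in zip(v, hidden)) + v0

y = [0.10, 0.12, 0.11, 0.13, 0.14, 0.15]   # endogenous series (BET-like)
x = [1.00, 1.02, 1.01, 1.03, 1.05, 1.04]   # exogenous series (DJ-like)
z = nnarx_regressor(y, x, t=5, na=2, nb=2, nk=1)
print(z)   # [y[4], y[3], x[4], x[3]]

# q = 2 hidden units, placeholder weights
W  = [[0.1, 0.1, 0.1, 0.1], [0.2, -0.1, 0.0, 0.1]]
w0 = [0.0, 0.1]
v, v0 = [0.5, -0.3], 0.05
print(mlp(z, W, w0, v, v0))   # one-value-ahead prediction
```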

Nonlinear neural network ARX (NNARX) models are potentially more powerful than linear ones in that they can model more complex underlying characteristics of time series and theoretically do not have to assume stationarity.

Feedforward networks are well suited only for NNARX models, which allow a series-parallel architecture that has a predictor without feedback. In a

NNARX model, $y_t$ is a function of its lagged values $y_{t-j}$ and the lagged values of some exogenous variables. In principle, the output of the NNARX network can be considered as an estimate $\hat{y}_t$ of the output $y_t$ of some nonlinear dynamic system, and thus it should be fed back in the next stage to the input of the feedforward neural network. However, because the true previous outputs $y_{t-j}$ are available at time $t$ during the training of the network, a series-parallel architecture can be created, in which the true outputs $y_{t-j}$ are used instead of feeding back the estimated outputs $\hat{y}_{t-j}$, as shown in Figure 1. This has two advantages. The first is that the input to the feedforward network is more accurate. The second is that the resulting network has a purely feedforward architecture, and static backpropagation can be used for training.

For other types of models used in time series processing that involve predictors with feedback, one can resort to recurrent networks, where future network inputs will depend on present and past network outputs.


Figure 1. A purely feedforward architecture of the NNARX($n_a$, $n_b$, $n_k$) neural net

    2.2. NNARX based One-Value-Ahead Forecasting of BET Index

One of the major issues in neural network forecasting is how much data are necessary for neural networks to capture the dynamic nature of the underlying process in a time series. There are two facets to this issue:

(i) How many lagged observations should be used as inputs to the neural network (or, equivalently, how many input nodes should the neural network have)? Each actual historical value depends upon a number of preceding values (endogenous and exogenous lagged observations).

(ii) How many historical values should be used in training the neural network? Although a larger sample size, in the form of a longer time series, is usually recommended in model development, empirical results suggest that longer time series do not always yield models that provide the best forecasting performance. Using a smaller sample of time series, or data close in time to the out-of-sample period, can sometimes produce more accurate neural networks.

Actually, we focus our effort on forecasting the Bucharest Stock Exchange BET index. Some of its characteristics, such as the synchronization with well-established indices from global markets (Dow Jones, FTSE100-London, Nikkei-Tokyo), provide us with further guidance for choosing the sample size. The BET index has been relatively recently introduced, and its first period of about three years is characterized by a relatively flat evolution and a lack of synchronization with major indices. After that, it starts to synchronize well with the global market (Figure 2).

On the other hand, a commonly employed neural network design heuristic is to capture the dynamics of a stock market index through a time series model, which represents the movement of an endogenous stochastic variable only in terms of its lagged values. The nonlinear neural network autoregression (NNAR) model is a typical example. As an alternative, a NNARX model may be considered, where one or more exogenous variables are also included. Their role is to capture


extraneous influences. Indeed, the ever more global economy causes interaction effects among the various economies around the world. From this perspective, market indices are classified as either local and emerging or global and mature. Large global markets will have dynamically changing effects on an emerging market. Results indicate that, as global information is introduced, the forecasting performance of the neural network models for the emerging market index improves. Thus, especially with emerging markets, neural network models must incorporate global economic information into the input vector variable set to achieve optimal performance when forecasting financial time series. All indices in Figure 2 reveal non-stationary patterns. This is confirmed by the empirical distribution of the BET index, which is asymmetric and exhibits a power-law pattern rather than a Gaussian pattern (see Figure 3).

Figure 2. Parallel evolution of BET, Dow Jones and FTSE100 Indices. Choosing training and validation data samples

Figure 3. The empirical distribution of the BET index, compared with the standard normal distribution


We have tested a large set of model specification structures and their related neural network architectures in order to choose and validate the best settings for our neural-network approach to BET index forecasting. We started with

the simplest NNAR($n_a$) model, without exogenous variables, and then we successively introduced one or two exogenous variables (Dow Jones and FTSE100, respectively), thus passing from a purely time series model (NNAR) to a dynamic system representation, i.e. NNARX($n_a$, $n_b$, $n_k$).

It is worth mentioning that the well-known order specification tests, i.e. AIC (Akaike Information Criterion), BIC (Bayesian Information Criterion) and MDL (Minimum Description Length criterion) respectively, may produce inconsistent results when used for the selection of the appropriate orders $n_a$, $n_b$ and $n_k$ of a nonlinear model such as NNARX($n_a$, $n_b$, $n_k$). However, their indicative guess for the case of linear models can be the starting point in a specification procedure based on numeric experiments. For example, the orders can be chosen such that the Akaike Information Criterion (AIC)

$$ AIC = \log\left( V \left( 1 + \frac{2d}{N} \right) \right) $$

is minimized, where $V$ is the loss function, i.e.,

$$ V = \det\left( \frac{1}{N} \sum_{t=1}^{N} \varepsilon(t, \theta_N)\, \varepsilon(t, \theta_N)^{T} \right), $$

$d$ is the length of the parameter vector $\theta$, and $N$ is the number of data points used for the estimation. The Akaike Information Criterion suggests the following linear AR/ARX models:

$$ AR(1): \quad y_{t+1} = a_1 y_t + \varepsilon_{t+1} $$

$$ ARX(1, 4, 3): \quad y_{t+1} = a_1 y_t + b_1 X^1_{t-3} + b_2 X^1_{t-4} + b_3 X^1_{t-5} + b_4 X^1_{t-6} + \varepsilon_{t+1} $$

$$ ARX(1, [3, 2], [1, 4]): \quad y_{t+1} = a_1 y_t + b_{11} X^1_{t-1} + b_{12} X^1_{t-2} + b_{13} X^1_{t-3} + b_{21} X^2_{t-4} + b_{22} X^2_{t-5} + \varepsilon_{t+1} $$

where $y$ stands for BET, $X^1$ stands for DJ and $X^2$ stands for FTSE100.
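The AIC criterion above can be sketched for the scalar (single-output) case, where the determinant reduces to the mean squared residual. The residuals below are illustrative, not estimation results from the paper.

```python
import math

# Sketch of the order-selection criterion AIC = log( V * (1 + 2d/N) )
# for the scalar case, where V is the mean squared residual.

def aic(residuals, d):
    """residuals -- prediction errors of a fitted model
    d         -- number of estimated parameters"""
    N = len(residuals)
    V = sum(e * e for e in residuals) / N   # loss function (scalar det)
    return math.log(V * (1 + 2 * d / N))

# Illustrative residuals; richer models (larger d) are penalized.
residuals = [0.01, -0.02, 0.015, -0.005, 0.02, -0.01]
for d in (1, 3, 5):
    print(d, aic(residuals, d))
```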

For the nonlinear models NNAR($n_a$) and NNARX($n_a$, $n_b$, $n_k$) we chose to determine the orders by numerical experiments, trying orders in the range $[1, 5]$.

Once the model structure was specified and the neural network architecture was selected, the neural network BET index forecasting model was trained over the training data and then applied against the validation data. We adopted the standard approach for training and testing, that is, to evaluate a model by testing its performance on a validation set consisting of out-of-sample data (Figure 2).

Financial time series forecasting neural networks may be evaluated in many different ways. However, it is commonly admitted that the forecasting performance must result from a tradeoff between statistical accuracy measurement and trading strategy profitability. The problem cannot be reduced to how statistically significant the relationship between error measures (used to train a forecasting NN) is; the trading profitability (based on that forecast) is just as important. In other words, the goal is to improve both the predictability and the profitability. Commonly, the neural network forecasting models are evaluated


through the MSE (Mean Square Error). But statistical accuracy is not always a good warranty for profitability. It is to be noticed that even very small errors that are made in the wrong direction of change may cause significant negative returns, resulting in a capital loss for an investor following the recommendation of the neural network model. Hence, predicting the correct direction of change to an existing market index value is the primary criterion for financial forecasting models. Instead of measuring the mean standard error of a forecast, a better method for measuring the performance of neural networks is to analyze the proportion of predictions that correctly indicate the direction of change to the current market value. Therefore, all of the reported results in this paper will be based on both the MSE and the accuracy, or percentage of correct predictions with regard to direction of change.
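The two evaluation measures just described can be sketched as follows. The series below are illustrative, not BET data.

```python
# Sketch of the two performance measures used in this section: MSE and the
# percentage of forecasts whose direction of change (up/down relative to
# the previous actual value) matches the actual direction of change.

def mse(actual, forecast):
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)

def directional_accuracy(actual, forecast):
    """Fraction of steps t where sign(forecast[t] - actual[t-1])
    matches sign(actual[t] - actual[t-1])."""
    hits = sum(
        1 for t in range(1, len(actual))
        if (forecast[t] - actual[t - 1]) * (actual[t] - actual[t - 1]) > 0
    )
    return hits / (len(actual) - 1)

actual   = [10.0, 10.2, 10.1, 10.4, 10.3]
forecast = [10.0, 10.1, 10.3, 10.5, 10.2]
print(mse(actual, forecast))
print(directional_accuracy(actual, forecast))   # 0.75: 3 of 4 moves correct
```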

The neural network results will be compared to a standard benchmark, the random walk forecast. The random walk hypothesis assumes that, since tomorrow cannot be predicted, the best guess we can make is that tomorrow's price will be the same as today's price. If neural networks outperform the random walk, then it can be concluded that there is a nonlinear function or process inherent in the data tested. The implication is that a short-term (typically, one-step-ahead) forecast can be successfully generated.

We first assumed the hypothesis of an insulated stock market and

constructed a neural network forecasting model that shows only the endogenous dynamics of the BET index, relating its current value only to its lagged historical values. This leads to a purely time series based NNAR (nonlinear neural network AR) model. For each order $n_a$, with $n_a$ from 1 to 5, we trained the NNAR($n_a$) architecture and estimated the model. The forecasting model has been evaluated with both the MSE (Mean Square Error) and the trading performance (percentage of correct predictions with regard to direction of change). The table below shows the results.

             na=1      na=2      na=3      na=4      na=5
Performance  0.496855  0.507886  0.493671  0.517460  0.490446
MSE          0.000085  0.000087  0.000088  0.000088  0.000089

Although the statistical accuracy measured by MSE was optimal for the NNAR(1) model, the maximum trading performance has been obtained for the NNAR(4) model. The neural network architecture associated with this model is depicted in Figure 4.

The forecasting performance of this model is about 51.75%, which is close to the random walk performance. Based upon such a modest performance, the hypothesis of possible market inefficiencies appears to be rejected. The model supports rather a random walk interpretation of the BET market index and thus does not enable a forecasting advantage through non-random-walk or predictable behavior.


    Figure 4. The MLP architecture of the NNAR(4) model

However, including additional global knowledge, by introducing into the BET index equation some other well-established market indices such as DJ and/or FTSE100, proved to ameliorate the overall forecasting performance of the BET index neural network forecasting model. Thus, modeling the BET index as a NNARX-type nonlinear dynamic system with at least one exogenous variable appears to achieve a statistically significant improvement over the random walk benchmark dynamic behavior and to provide exploitable market inefficiency.

The DJ index was the first exogenous variable we introduced in the model. Now we have to choose between a large number of possible combinations, allowing $n_a$, $n_b$ and $n_k$ to vary in the range $[1, 5]$. The results are partially shown (only for $n_k = 1$ and $n_k = 2$) in the following tables.

For nk=1:

Performance  nb=1      nb=2      nb=3      nb=4      nb=5
na=1         0.484277  0.552050  0.566456  0.542857  0.554140
na=2         0.476341  0.548896  0.522152  0.526984  0.525478
na=3         0.481013  0.528481  0.506329  0.517460  0.522293
na=4         0.473016  0.517460  0.504762  0.526984  0.535032
na=5         0.461783  0.519108  0.506369  0.509554  0.528662

For nk=2:

Performance  nb=1      nb=2      nb=3      nb=4      nb=5
na=1         0.533123  0.547468  0.536508  0.525478  0.495208
na=2         0.517350  0.528481  0.507937  0.531847  0.523962
na=3         0.531646  0.522152  0.495238  0.500000  0.488818
na=4         0.495238  0.520635  0.533333  0.500000  0.463259
na=5         0.490446  0.484076  0.535032  0.519108  0.507987

The selected model is NNARX(1, 3, 1), with a forecasting performance that increases from 51.75% to 56.65%. The neural network architecture associated with this model is depicted in Figure 5.


    Figure 5. The MLP architecture of the NNARX(1, 3, 1) model

Finally, both the DJ index and the FTSE100 index are included as exogenous variables in the model. The structure specification consists of choosing orders and time delays for which the best forecasting performance is reached in the class of NNARX($n_a$, [$n_{b1}$, $n_{b2}$], [$n_{k1}$, $n_{k2}$]) models. The results are partially given below.

For nb2=4, nk1=1, nk2=4:

Performance  nb1=1     nb1=2     nb1=3     nb1=4     nb1=5
na=1         0.426282  0.483974  0.522436  0.583333  0.557692
na=2         0.442308  0.544872  0.564103  0.512821  0.551282
na=3         0.522436  0.544872  0.580769  0.512821  0.554487
na=4         0.458333  0.528846  0.608974  0.544872  0.535256
na=5         0.487179  0.525641  0.563590  0.548077  0.541667

The selected model is NNARX(4, [3, 4], [1, 4]), which increases the forecasting performance to over 60%. Note that the exogenous variable FTSE100 has a time delay of $n_{k2} = 4$. The neural network architecture associated with this model and its learning capability are displayed in Figures 6 and 7.

Figure 6. The MLP architecture of the NNARX(4, [3, 4], [1, 4]) model: a series-parallel architecture with input nodes BET(t-1)...BET(t-4), DJ(t-1)...DJ(t-3) and FTSE100(t-4)...FTSE100(t-7), one hidden layer, and output BEThat(t)


    Figure 7. Learning capability of the selected neural network architecture

The plot comparing predictions to actual measurements for the validation dataset (out-of-sample predictions) confirms by visual inspection that forecasts are reasonably accurate, but are sometimes out of phase, which means over-anticipation (Figure 8).

    Figure 8. Comparing out-of-sample predictions to actual measurements

    The prediction errors for the validation dataset are depicted in Figure 9.

    Figure 9. Out-of-sample prediction errors


Just as expected, the in-sample forecasts (i.e., forecasts derived from the training dataset) are much more accurate than the out-of-sample ones (Figure 10). However, in-sample data cannot serve for performance validation because they usually produce overfitted results.

    Figure 10. Comparing in-sample (i.e. based on the training dataset)

    predictions to actual measurements

In conclusion, the highest forecasting performance was achieved by the NNARX(4, [3, 4], [1, 4]) model, which utilizes input values from both the DJ and the FTSE100 indices (recall that the performance measures the percentage of neural network forecasts that are in the same direction, up or down, as the actual index for the forecast period). With this model, a significant improvement over the random walk benchmark dynamic behavior has been obtained, and exploitable market inefficiency has been proved for the Bucharest Stock Exchange BET index, provided that exogenous variables with global impact are involved in the neural network forecasting model.

    3. A NOVEL COMPUTATIONAL INTELLIGENCE BASED

    FORECASTING FRAMEWORK

    3.1. Time Series Preprocessing

This stage consists of de-noising data by wavelet decomposition and some other transformations that rely heavily on the selection of a distance measure for clustering.

The Discrete Wavelet Transform (DWT, [10]) uses scaled and shifted versions of a mother wavelet function, usually with compact support, to form either an orthonormal basis (Haar wavelet, Daubechies) or a bi-orthonormal basis (Symlets, Coiflets). Wavelets allow cutting up data into different frequency components (called approximations and details), and then studying each component with a resolution matched to its scale. They can help de-noise inherently noisy data such as financial time series through wavelet shrinkage and


thresholding methods, developed by David Donoho ([3]). The idea is to set to zero all wavelet coefficients corresponding to details in the data set that are less than a particular threshold. These coefficients are used in an inverse wavelet transformation to reconstruct the data set. An important advantage is that the de-noising is carried out without smoothing out the sharp structures, and thus can help to increase both the clustering accuracy and the predictive performance.

Care has to be taken in choosing suitable transformations such that the time series distance measure chosen in the clustering stage is meaningful to the application. Normalization of data is common practice when using Fuzzy C-Means, which means applying scaling and vertical translation to the time series as a whole. Moreover, as we already mentioned, the absolute value of a stock price is not as interesting as the shape of up and down movements. Thus, for allowing stock price comparisons subsequence by subsequence, a local translation is also necessary, in such a way as to have each subsequence starting from zero. A subset of 2048 daily closing BET index values drawn from Bucharest Stock Exchange Market data, as well as the normalized and de-noised data, are shown in Figure 11, where a level 5 decomposition with Sym8 wavelets and a fixed form soft thresholding were used.

Figure 11. A normalized and de-noised data subset, drawn from daily closing BET index (top panel: original time series; bottom panel: normalized and de-noised time series)
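The shrink-the-details idea behind wavelet de-noising can be sketched as follows. The paper uses a level-5 decomposition with Sym8 wavelets; as a self-contained stand-in, this sketch uses a single-level Haar transform with soft thresholding, which illustrates the same principle on illustrative data.

```python
# Sketch of wavelet shrinkage de-noising: decompose into approximation and
# detail coefficients, soft-threshold the details, then invert the
# transform. Single-level Haar stand-in (even-length input assumed); the
# paper uses level-5 Sym8 instead.

def haar_denoise(series, threshold):
    approx = [(a + b) / 2 for a, b in zip(series[::2], series[1::2])]
    detail = [(a - b) / 2 for a, b in zip(series[::2], series[1::2])]
    # soft thresholding: shrink detail coefficients toward zero
    shrunk = [max(abs(d) - threshold, 0.0) * (1 if d >= 0 else -1)
              for d in detail]
    out = []
    for a, d in zip(approx, shrunk):   # inverse Haar transform
        out.extend([a + d, a - d])
    return out

noisy = [1.0, 1.4, 1.2, 1.1, 2.0, 1.9, 2.2, 1.8]
print(haar_denoise(noisy, threshold=0.15))   # small jitters flattened
```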

    3.2. Subsequence Time Series Fuzzy Clustering

The idea in subsequence time series (STS) clustering is as follows. Just a single long time series is given at the start of the clustering process, from which we extract short series with a sliding window. The resulting set of subsequences is


then clustered, such that each time series is allowed to belong to each cluster to a certain degree, because of the fuzzy nature of the fuzzy c-means algorithm we use. The window width and the time delay between consecutive windows are two key choices. The window width depends on the application; it could be some larger time unit (e.g., 10 days for time series sampled as daily closing BET stock index, in our application). Overlapping or non-overlapping windows can be used. If the delay is equal to the window width, the problem is essentially converted to non-overlapping subsequence time series clustering. We will follow this approach, being motivated by Keogh's criticism presented in [8], where using overlapping windows has been shown to produce meaningless results, due to a surprising anomaly: cluster centers obtained using STS clustering closely resemble sine waves, irrespective of the nature of the original time series itself, this being caused by the superposition of slightly shifted subsequences. Using larger time delays for placing the windows does not really solve the problem as long as there is some overlap. Also, the less overlap, the more problematic the choice of the offsets becomes.
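The windowing step just described, non-overlapping windows combined with the local translation to zero from the preprocessing stage, can be sketched as follows on an illustrative series.

```python
# Sketch of non-overlapping subsequence extraction (delay == window width),
# with each window translated so it starts at zero, so that clustering
# compares shapes rather than price levels.

def subsequences(series, w):
    """Non-overlapping windows of width w, each shifted to start at 0;
    a trailing remainder shorter than w is discarded."""
    return [
        [v - series[start] for v in series[start:start + w]]
        for start in range(0, len(series) - w + 1, w)
    ]

series = [3.0, 3.2, 3.1, 3.5, 3.4, 3.3, 3.6, 3.9, 3.8]
for s in subsequences(series, w=3):
    print(s)   # each subsequence begins at 0.0
```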

Since clustering relies strongly on a good choice of the dissimilarity measure, this leads to adopting an appropriate distance, depending on the very nature of the subsequence time series.

Let $S = y_m, \dots, y_{m+w-1}$ be a subsequence of length $w$ of the time series $Y = y_1, \dots, y_n$, where $1 \le m \le n - w + 1$. Subsequences will be represented as vectors in a $w$-dimensional vector space. For relatively short time series, shape-based distances, such as $L_p$ norms, are commonly used to compare their overall appearance. The Euclidean distance ($L_2$) is the most widely used shape-based distance. Other $L_p$ norms can be used as well, such as Manhattan ($L_1$) and Maximum ($L_\infty$), putting different emphasis on large deviations.

There are several pitfalls when using an $L_p$ distance on time series: it does not allow for different baselines in the time sequences; it is very sensitive to phase shifts in time; it does not allow for acceleration and deceleration along the time axis (time warping). Another problem with $L_p$ distances of time series arises when scaling and translation of the amplitudes or the time axis are considered, or when outliers and noisy regions are present.

A number of non-metric distance measures have been defined to overcome some of these problems. Small distortions of the time axis are commonly addressed with non-uniform time warping, more precisely with Dynamic Time Warping (DTW, [7]). The DTW distance is an extensively used technique in speech recognition and allows warping of the time axes (acceleration/deceleration of signals along the time dimension) in order to better align the shapes of the two time series. The two series can also be of different lengths. The optimal alignment is found by calculating the shortest warping path in the matrix of distances between all pairs of time points under several constraints (boundary conditions, continuity, monotonicity).


The warping path is also constrained in a global sense by limiting how far it may stray from the diagonal. The subset of the matrix that the warping path is allowed to visit is called the warping window. The two most common constraints in the literature are the Sakoe-Chiba band and the Itakura parallelogram. We can view a global or local constraint as constraining the indices of the warping path w_k = (i, j)_k, such that j − r ≤ i ≤ j + r, where r is a term defining the allowed range of warping for a given point in a sequence. In the case of the Sakoe-Chiba band (see Figure 12), r is independent of i; for the Itakura parallelogram, r is a function of i.

Figure 12. (a) Aligning two time sequences using DTW. (b) Optimal warping path with the Sakoe-Chiba band as global constraint.
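A compact dynamic-programming implementation of this scheme might look as follows (our own sketch, not the paper's code; the parameter `r` plays the role of the Sakoe-Chiba half-band width):

```python
import numpy as np

def dtw(q, c, r=None):
    """DTW distance via dynamic programming, O(n*m) time.
    An optional Sakoe-Chiba band of half-width r constrains
    the warping path to stay near the diagonal."""
    n, m = len(q), len(c)
    r = max(n, m) if r is None else r
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - r), min(m, i + r) + 1):
            cost = (q[i - 1] - c[j - 1]) ** 2
            # continuity + monotonicity: step from left, below, or diagonal
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(np.sqrt(D[n, m]))
```

Out-of-phase but similar shapes then match: for instance, `dtw([0, 1, 0, 0], [0, 0, 1, 0])` is 0, while the Euclidean distance between the same two vectors is sqrt(2).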

DTW is a much more robust distance measure for time series than L_2, allowing similar shapes to match even if they are out of phase in the time axis. Unfortunately, however, DTW is calculated using dynamic programming with time complexity O(n²). Recent approaches focus more on approximating the DTW distance by bounding it from below. For example, a novel, linear-time (i.e., with complexity reduced to O(n)) lower bound of the DTW distance was proposed in [9]. The intuition behind the approach is the construction of a special envelope around the query. It can be shown that the Euclidean distance between a potential match and the nearest orthogonal point on the envelope lower bounds the DTW distance. To index this representation, an approximate bounding envelope is created.

Let Q = {q_1, ..., q_n} and C = {c_1, ..., c_m} be two subsequences and w_k = (i, j)_k be the warping path, such that j − r ≤ i ≤ j + r, where r is a term defining the range of warping for a given point in a sequence. The term r can be used to define two new sequences, L and U, where L_i = min(q_{i−r} : q_{i+r}) and U_i = max(q_{i−r} : q_{i+r}), with L and U standing for Lower and Upper, respectively. An obvious but important property of L and U is the following: ∀i, L_i ≤ q_i ≤ U_i.


Given L and U, a lower bounding measure for DTW can now be defined (see Figure 13):

    LB_Keogh(Q, C) = sqrt( Σ_{i=1..n} d_i ),  where

        d_i = (c_i − U_i)²  if c_i > U_i,
        d_i = (c_i − L_i)²  if c_i < L_i,
        d_i = 0             otherwise.                               (3)

Figure 13. The lower bounding function LB_Keogh(Q, C). The original sequence Q is enclosed in the bounding envelope of U and L.
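Equation (3) translates almost directly into code. A sketch (the helper `lb_keogh` is our own naming; the envelope [L, U] is built around Q with warping range r, and C is then compared against it):

```python
import numpy as np

def lb_keogh(q, c, r):
    """LB_Keogh lower bound of DTW(Q, C): square root of the summed
    squared distances from each c_i to the envelope [L_i, U_i] of Q."""
    q, c = np.asarray(q, float), np.asarray(c, float)
    n = len(q)
    total = 0.0
    for i in range(n):
        win = q[max(0, i - r): min(n, i + r + 1)]
        U, L = win.max(), win.min()
        if c[i] > U:
            total += (c[i] - U) ** 2
        elif c[i] < L:
            total += (c[i] - L) ** 2
    return float(np.sqrt(total))
```

The naive sliding min/max above costs O(n·r); a strictly linear-time version would maintain the running extrema with a monotonic deque, but this keeps the sketch short.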

We are now going to generalize the fuzzy c-means algorithm to subsequence time series clustering. In this particular context, the entities to be clustered, denoted by x_k, and the cluster prototypes (centroids), denoted by v_i, are both set-defined objects, i.e., subsequence time series. The centroids are computed as weighted means, where the weights, denoted by u_ik, are the fuzzy membership degrees to which each subsequence belongs to a cluster. Both the DTW and LB_Keogh distances outperform L_2 and thus are better qualified for use with the fuzzy c-means algorithm. However, LB_Keogh's lower bound of the DTW distance has been preferred, due to its linear time complexity. Figure 14 plots the cluster centroids (prototypes) and the subsequence time series grouped around each centroid.
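A minimal sketch of such a generalized fuzzy c-means loop (our own illustration, not the paper's implementation; the `dist` argument is where an LB_Keogh or DTW distance would be plugged in, defaulting here to L_2 for brevity, with fuzzifier m = 2):

```python
import numpy as np

def fuzzy_cmeans(X, c, m=2.0, dist=None, n_iter=50, seed=0):
    """Fuzzy c-means over the rows of X (each row one subsequence).
    Centroids are fuzzy-weighted means; memberships follow the
    standard FCM update, with a pluggable distance function."""
    if dist is None:
        dist = lambda a, b: np.linalg.norm(a - b)   # L_2 placeholder
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    U = rng.random((c, N))
    U /= U.sum(axis=0)                              # columns sum to 1
    for _ in range(n_iter):
        W = U ** m
        V = W @ X / W.sum(axis=1, keepdims=True)    # weighted-mean centroids
        D = np.array([[max(dist(V[i], X[k]), 1e-12)
                       for k in range(N)] for i in range(c)])
        Dn = D ** (-2.0 / (m - 1.0))
        U = Dn / Dn.sum(axis=0)                     # membership update
    return U, V

# two well-separated pairs of 2-point "subsequences"
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
U, V = fuzzy_cmeans(X, c=2)
```

Each column of U gives one subsequence's membership degrees across the c clusters; these membership vectors are exactly the inputs used by the neural mapping stage described next.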

3.3. Estimation of the Fuzzy Transition Function between Clusters by

Neural Mapping

At this stage, a fuzzy transition function between clusters must be learned, which is a nonlinear vector function mapping a number of c-dimensional membership degree vectors STS(t_{j+1}), j = 1, ..., p, into a c-dimensional


Figure 15. Accurate neural mapping: actual and predicted membership degrees to which each of the 256 subsequence time series belongs to one of the 5 clusters (one pair of panels per cluster: actual vs. predicted membership degrees).

Figure 16. Prediction errors and their histogram for each of the 5 clusters.


Figure 17. One-subsequence-ahead forecasts of 15 out-of-sample subsequences (observed vs. predicted out-of-sample sequences).

The two forecasting approaches can now be easily contrasted. In contrast to the common neural network forecasting approach, which focuses on price level forecasts attempting to (more or less) outperform the random walk benchmark accuracy, our computational intelligence approach is intended to reliably exploit the shape of middle-term up and down price movements. The 15 out-of-sample subsequence forecasts shown in Figure 17 cover 15 × 10 = 150 transaction days and prove to be considerably robust in filtering out the short-term randomness and in predicting the right direction of change.

    6. CONCLUSION

Predicting price levels is an intriguing, challenging, and admittedly risky endeavor. Technical analysis uses trend-following strategies to forecast future price movements and to infer trading decision rules, based on the assertion that price changes have inertia. However, experimental work shows little evidence of predictability, with accuracy rates that only slightly exceed the random walk benchmark performance.

The first approach in this paper served essentially to test and reject the hypothesis of an insulated emerging stock market. We presumed that, among other emerging capital markets, the Bucharest Stock Exchange is tremendously affected by global interaction effects and extraneous influences from mature markets. In an attempt to validate this presumption, we compared the statistical accuracy and the trading performance of several neural network forecasting models for the BET index against each other and against the random walk benchmark performance. NNAR(na) models were used to capture only the endogenous dynamics of the BET index, and NNARX(na, nb, nk) models to additionally capture exogenous influences induced by global market indices (Dow Jones and/or FTSE100). To conclude, significantly more accurate results have been obtained through the inclusion of exogenous variables from well-established global markets.


However, because of the short-term randomness of stock market prices, next-day forecasts cannot be efficiently exploited for building consistent trading strategies. The second approach introduced a novel computational intelligence framework allowing one-subsequence-ahead instead of one-value-ahead forecasts. Experimental evidence with ten-day-length-subsequence-ahead forecasts for the BET index proved to be significantly more robust than one-day-ahead value forecasts in showing the right direction of change.

    REFERENCES

[1] Chen, S.-H., Jain, L., Tai, C.-C. (Eds.) (2006), Computational Economics: A Perspective from Computational Intelligence, Idea Group;

[2] Chen, S.-H., Wang, P.P., Kuo, T.-W. (Eds.) (2007), Computational Intelligence in Economics and Finance, Springer-Verlag;

[3] Donoho, D. (1993), Nonlinear Wavelet Methods for Recovery of Signals, Densities, and Spectra from Indirect and Noisy Data. In: Different Perspectives on Wavelets, Proceedings of Symposia in Applied Mathematics, Vol. 47, I. Daubechies (ed.), Amer. Math. Soc., Providence, R.I., pp. 173–205;

[4] Georgescu, V. (2009), Generalizations of Fuzzy C-Means Algorithm to Granular Feature Spaces, based on Underlying Fuzzy Metrics: Issues and Related Works. In: 13th IFSA World Congress and 6th Conference of EUSFLAT, pp. 1791–1796, Lisbon, Portugal;

[5] Georgescu, V. (2009), A Time Series Knowledge Mining Framework Exploiting the Synergy between Subsequence Clustering and Predictive Markovian Models. Fuzzy Economic Review, Vol. XIV, No. 1, pp. 41–66;

[6] Georgescu, V. (2005), Applied Econometrics: Time Series Analysis (A master course in English), Universitaria, Craiova;

[7] Keogh, E., Pazzani, M.J. (1999), Scaling up Dynamic Time Warping to Massive Datasets. In: Zytkow, J.M., Rauch, J. (eds), 3rd European Conference on Principles of Data Mining and Knowledge Discovery (PKDD'99), pp. 1–11, Springer;

[8] Keogh, E., Lin, J., Truppel, W. (2003), Clustering of Time Series Subsequences is Meaningless: Implications for Previous and Future Research. In: 3rd IEEE International Conference on Data Mining, pp. 115–122;

[9] Keogh, E., Ratanamahatana, C.A. (2005), Exact Indexing of Dynamic Time Warping. Knowledge and Information Systems, 7, pp. 358–386;

[10] Mallat, S.G., Peyré, G. (2009), A Wavelet Tour of Signal Processing: The Sparse Way. Academic Press, 3rd Edition.


Copyright of Economic Computation & Economic Cybernetics Studies & Research is the property of Economic Computation & Economic Cybernetics Studies & Research, and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use.