Evolving intraday foreign exchange trading strategies ... · Genetic Programming; Finance; Foreign...

Evolving intraday foreign exchange trading strategiesutilizing multiple instruments price series

Simone CirilloAlmont Capital LLC12800 Hillcrest RoadDallas, Texas 75230

[email protected]

Stefan LloydAlmont Capital LLC12800 Hillcrest RoadDallas, Texas 75230

[email protected]

Peter NordinChalmers University of Technology

Maskingrand 2Gothenburg, [email protected]

ABSTRACTWe propose a Genetic Programming architecture for thegeneration of foreign exchange trading strategies. The sys-tem’s principal features are the evolution of free-form strate-gies which do not rely on any prior models and the uti-lization of price series from multiple instruments as inputdata. This latter feature constitutes an innovation with re-spect to previous works documented in literature. In thisarticle we utilize Open, High, Low, Close bar data at a 5minutes frequency for the AUD.USD, EUR.USD, GBP.USDand USD.JPY currency pairs. We will test the implementa-tion analyzing the in-sample and out-of-sample performanceof strategies for trading the USD.JPY obtained across mul-tiple algorithm runs. We will also evaluate the differencesbetween strategies selected according to two different crite-ria: one relies on the fitness obtained on the training set only,the second one makes use of an additional validation dataset.Strategy activity and trade accuracy are remarkably stablebetween in and out of sample results. From a profitability as-pect, the two criteria both result in strategies successful onout-of-sample data but exhibiting different characteristics.The overall best performing out-of-sample strategy achievesa yearly return of 19%.

Categories and Subject DescriptorsI.2.2 [Artificial Intelligence]: Automatic Programming—Program synthesis

KeywordsGenetic Programming; Finance; Foreign Exchange; Auto-mated Trading

1. INTRODUCTIONThe Foreign Exchange (FX) Market is the most liquid fi-nancial market in the world and its average daily turnoveris continuously growing. According to the Bank for Interna-tional Settlements, trading in Forex markets averaged $5.3trillion per day in April 2013 [4]. Due to its strong liquidity

Copyright c©Simone Cirillo, Stefan Lloyd, Peter Nordin.http://arxiv.org/licenses/nonexclusive-distrib/1.0/ Abstracting and quotingwith credit is permitted. No use for commercial purposes is allowed.

the FX market is often considered to be the market where itis most difficult to consistently make financial returns fromtrading.

So far there have been many academic investigations withthe aim of developing technical trading strategies for finan-cial markets and products using nonlinear computation tech-niques such as Artificial Neural Networks [3], Genetic Algo-rithms [1], and Genetic Programming [30]. Most often theasset class of choice for developing these models and strate-gies has been stocks but attempts have been made on theFX market as well.

Generally the obtained results are positive, exhibiting mod-erate profitability on unseen data. However, the applica-bility and reproducibility of these academic results on real-life implementations running on live markets are often ques-tioned by professional financial practitioners. In this arti-cle we describe an innovative implementation of a GeneticProgramming architecture for the evolution of forex tradingstrategies, with an emphasis on accurate problem modelingto facilitate successive real market application as well as onestablishing a reliable methodology to select, from trainingand validation performance, strategies capable of obtainingprofitable results on unseen, new data.

1.1 Genetic ProgrammingGenetic Programming (GP) is a metaheuristic optimiza-tion technique belonging to the class of evolutionary algo-rithms. Evolutionary Algorithms (EA) take inspiration fromthe process of biological evolution. At the start of the pro-cess a population of candidate solutions to the problem athand is randomly generated and it subsequently undergoesa process of evaluation, selection, reproduction, crossoverand mutation over a number of generations. As generationsprogress, progressively better solutions are found.

GP has been popularized by John Koza [18] and it has beensuccessfully applied to a variety of problems in many ap-plication domains including electronic circuit design, opticallens systems, robotics, game playing, bioinformatics, imageand signal processing, scheduling, etc. . . [19]. In GP, the can-didate solutions being optimized by means of evolution areprograms computing mathematical expressions; the measureutilized to evaluate how good they are at solving the definedproblem, driving the process, is called fitness function. Oneof the key strengths of GP is that there are no human apriori assumptions made about the model to begin with, as

arX

iv:1

411.

2153

v1 [

cs.N

E]

8 N

ov 2

014

http://arxiv.org/licenses/nonexclusive-distrib/1.0/

it is the model itself that is being evolved. Therefore theonly bias GP is subject to is the inductive bias: the set ofassumptions ”learned” on the training data.

This contrasts, for instance, with the approach taken byGenetic Algorithms (GA), another type of EA, where thestarting point is a model of predetermined structure andonly numerical values, usually representing coefficients orweights, undergo evolutionary optimization.

1.2 GP, Financial Markets,and Trading Strategies

Nowadays financial markets are dominated by algorithmictraders, accounting for over 73% of US equity volumes in2010 [24]. Most algorithmic traders rely on relatively fewpopular models, assumptions and technical indicators fortheir strategies.

According to Lo’s Adaptive Market Hypothesis (AMH) [20],in financial markets profit opportunities generally exist how-ever, due to popularization of models, assumptions and in-dicators used in technical trading, over time these opportu-nities gradually disappear while others emerge; a behavioralso noted by Gencay et al. [12].

In the case of the foreign exchange market, Neely and Weller‘sanalyses show that profitability of fixed technical rules sub-stantially fluctuates over time [29, 28], indicating that theforeign exchange market appears to comply, to some extent,to the AMH. Subscribing to this view, as models and algo-rithms go in and out of fashion, an arms-race scenario, ametagame, emerges. Strategies are successful if they beatthe market itself as well as the market effects caused byactions of other trading strategies, individuating emergentprofit opportunities as well as local inefficiencies.

Independently from the AMH, time series of foreign ex-change data have been characterized as chaotic, extremelynoisy, and non stationary [13]. These considerations lead tothe conclusion that, over sufficiently long horizons, a suitableadaptive model, or an adaptive system to construct modelsshould yield better performance than static models or fixedtrading rules.

Summarizing, there is a growing body of evidence indicat-ing that the ability to constantly and reliably find different,novel, models and logic for trading strategies is a decisiveissue faced by financial practitioners. GP, given its inherentcapability of evolving models without human assumptions,intuitively represents a promising methodology for doing so.

The first attempts to apply GP for the evolution of tradingstrategies for the foreign exchange market date back to thelate 1990s. In 1997 Neely, Weller and Dittmar attempted toevolve strategies for the interday trading of several currencypairs; they implemented a binary long/short trading modelusing overnight interest rates as the obtained returns [26].Their results show out-of-sample annualized excess returnsfrom 1% to 6%.

A later, more elaborate work from 2003 always by Neely andWeller considers intraday data with a 30 minutes frequency[27]. Their aim here is the investigation of the impact of

transaction costs on the final return performance. After per-forming experiments with varying commission rates as wellas without them, they conclude that their GP strategies areable to find predictable patterns in the data but they strug-gle to produce positive excess returns once transaction costsare factored in.

In 2001, Dempster and Jones developed a GP system evolv-ing indicator-based trading rules [10]. The indicators arecomputed for 15-minute intervals while the strategies tradeon a 1 minute frequency, an impressive computational ef-fort for the time. On the period Q1 1994 - Q4 1997 theyobtained 7% annualized returns. Their investigation showsconsistent profits for the first three quarters of the out-of-sample period while later performance is much more volatileand prone to losses, providing indirect support to the AMHand indicating that some sort of model re-training is requiredto guarantee stable performance in the long term.

Hryshko and Downs implemented a hybrid architecture us-ing GA and reinforcement learning to optimize indicator-based entry/exit rules. They obtained a 6% profit on theEUR.USD over a 3.5 months period [17].

In 2006 Brabazon and O’Neill proposed the evolution ofstrategies using a technique called grammatical evolution,a variant of GP based on formal grammars, reporting out-of-sample profits from 0.1% to 5% [8].

More recent years have seen the exponential increase in rawcomputing power of retail hardware, the widespread adop-tion of architectures capable of parallel computation, therefinement in software tools, and finally the appearance ofmany retail trading brokers as well as financial data ven-dors providing intraday or even real-time price quotes. Allof these factors laid the foundations for a renewed interestin the subject, making it feasible and practical to performevaluations of more elaborate models on larger amounts ofdata.

Hirabayashi et al. utilized GA to optimize the choice oftechnical indicators for the construction of buy/sell tradingrules [15]. On hourly data for a two year period with rollingwindow retraining every 3 months they obtain a combinedperformance for 2005-2008 of 17% on the USD.JPY, 80% forthe EUR.JPY and 38% for the AUD.JPY using leverage.However, the corresponding unlevered returns are only of2%, -2%, and 19% respectively.

Wilson and Banzhaf proposed an architecture based on Lin-ear Genetic Programming, a variant of GP. Initially theyapplied it to the stock market on interday [33] as well as in-traday data [34]. Later they focused on interday forex trad-ing [35]. Their methodology was tested on the CAD.USD,EUR.USD, GBP.USD and JPY.USD on a rolling-windowone-year out-of-sample period and they compared the per-formance obtained by three different fitness functions: oneentirely profit based, the other two explicitly attempting tominimize losses as well. Their reported annualized profitsrange, after accounting for transaction costs, from -9% to13%, depending on currency pair and fitness measure.

Godinho investigated profitability of a GA rule optimizeron currencies whose exchange rate is bounded, as opposedto free-floating [14]. He only reports positive out-of-samplereturns for the USD.SGD, one of the bounded currenciesanalyzed.

Mendes et al. evolved a set of entry and exit technicaltrading rules using GA [25]. Applied to EUR.USD andGBP.USD data for frequencies ranging from 1 minute to1 hour, the gross returns on the training set are solidly prof-itable. However, statistical performance on test, unseen,data is considerably worse and profitability is eroded in thepresence of transaction costs.

Loginov proposes a GP variant, called FXGP, based on theco-evolution of a decision tree representing the strategy logicas well as the technical indicators utilized by the tree [21];FXGP can also employ periodic or trigger condition basedretraining to improve profit consistency on out-of-sampledata. Such retraining appears to be fundamental to theapproach’s profitability: algorithm runs without any sort ofretraining are shown to have at most a 38% chance of anyprofitability at the end of a 3-years period.

A different way of framing the problem is not the directevolution of trading strategies, but rather the evolution ofmodels for price prediction; the predictive model is thencoupled with a fixed logical rule to produce trading signals.Evans et al. developed a price forecasting model utilizingGA to optimize the structure of an Artificial Neural Network(ANN) [11]; they report annualized returns of 23.3% withoutthe inclusion of transaction costs and with an out-of-sampledataset length of two months.

Vasilakis et al. developed GP models for the 1-day-aheadforecasting of the daily ECB fixing of the EUR.USD rate[31]. The trading performance is shown to outperform abuy-and-hold benchmark strategy, Moving Average Conver-gence Divergence (MACD) trading, and Evolutionary Artifi-cial Neural Network (EANN) models, a technique combiningGA and ANN.

Manahov and Hudson also attempted to forecast intradayforex rates using GP [22]. The peculiarity of their approachis that they rely on an entirely simulated marked where theGP individuals compete against each other. In their ex-periments they utilize 5-minutes data and they show out-of-sample profitability outperforming conventional autore-gressive models for the EUR.USD, USD.JPY, GBP.USD,AUD.USD, USD.CHF, and USD.CAD pairs with returnsranging from 5.90% to 3.76%. In a more recent study using1 minute data, their results don’t show significant improve-ment [23].

1.3 Aim and ScopeIn this paper we will describe an architecture to evolve strate-gies for intraday trading in the FX market using GeneticProgramming, with the final goal of employing them for livetrading. Our approach features innovative key aspects in-troduced with the aim of boosting the GP strategies’ per-formance, increase their flexibility, as well as narrowing thegap between the evolutionary optimization process, occur-

ring within the GP environment, and subsequent usage inonline, live trading.

For instance, in contrast to documented work, our imple-mentation does not rely on technical indicators, autoregres-sive inputs, or a pre-defined set of rules but instead it evolvescompletely free-form trading strategies considering pricequotes from different forex pairs as input values. Our ar-chitecture therefore operates on a significantly larger searchspace for both problem and solution domains.

Finally, strategies are evaluated within a 3rd party commer-cial trading software to verify the correctness of the obtainedresults and to perform further testing.

We will present results obtained by strategies produced byour system for trading the USD.JPY spot rate, with USDas the base currency for calculating performance; we willbenchmark them against a buy-and-hold strategy as wellas the Barclay’s BTOP FX index [7]. With the aim of es-tablishing a reliable methodology, we will also compare twodifferent criteria for the selection of strategies likely to per-form well on unseen data. Good a priori chances of successfulgeneralization is an issue of crucial importance for makingfeasible the consistent utilization of GP strategies in a live,production, trading setting.

In Section 2 we describe the implementation of the forex-trading Genetic Programming architecture utilized for theevaluations presented in this article. Section 3 outlines thedetails and the scope of the GP algorithm runs we per-formed. In Section 4 we present the obtained results whichwill be discussed and compared against previous work inSection 5. In Section 6 we will be making concluding re-marks as well as indicating possible future steps for this lineof research.

2. FOREX TRADING GP SYSTEMA Genetic Programming architecture is a complex systemencompassing many abstraction layers: from the proper GPalgorithm implementation, to what the evolved programs areand how do they operate, to finally what the fitness functionevaluating their performance is.

2.1 GP FrameworkThe core of our system is the HeuristicLab platform, anopen-source, extensible plugin-based optimization frameworkfor heuristic and evolutionary algorithms developed and main-tained by the Heuristic and Evolutionary Algorithms Labo-ratory of Upper Austria University of Applied Sciences [32].

HeuristicLab enforces by design the separation of logical ab-stractions and source code modules for the optimization al-gorithm from those concerning the problem to be solved,facilitating reuse of existing functionalities. Other advan-tages include access to the entire original codebase, allow-ing for the reimplementation or addition of any feature; thesupport for parallel execution, greatly reducing the compu-tation times for the algorithm runs; and finally the powerfulbuilt-in solution and population analysis capabilities.

HeuristicLab does support Genetic Programming on its own,specifically in the form of Tree GP: an implementation pop-

ularized by Koza [18]. Additional modules for significantlyimproving the algorithm run times on large dataset, largepopulation setups [9], as well as functionalities for simulat-ing the financial market and evaluating trading solutionswere instead developed by the authors.

2.2 GP StrategiesThe AMH seems to indicate that established technical trad-ing methods, as their popularity grows, tend to lose effec-tiveness. Our investigation therefore attempts to find prof-itable trading strategies departing from conventionally usedmodels.

Many of the articles mentioned in Section 1.2 employ thetechniques of GA and ANN. Such techniques optimize pre-determined models or have very limited model structure op-timization capabilities. The works of Neely and Weller [26,27], Wilson and Banzhaf [35], Vasilakis et al. [31], Logi-nov [21] and Manahov [22, 23], implementing proper GP,instead allow the optimization process to freely evolve themodel structure as well.

An aspect present in every documented previous implemen-tation is the reliance, or the usage, on common technicalanalysis indicators such as MACD, Momentum, etc. . . Theseindicators have their usefulness if interpreted and analyzedby a human trader. However, it is debatable whether an au-tomatically constructed trading strategy would, optimally,evolve towards using them as well since they represent aman-made, inherently biased, perspective. Additionally, giventhe widespread usage of such technical indicators, their avail-ability to the GP programs might provide a path of least re-sistance to local optima in the fitness landscape representingconventional, but potentially suboptimal, trading strategies.

Concluding, our architecture will evolve GP free-form strate-gies that do not make use of conventional technical indica-tors.

The function set for the experiments presented in this articletherefore only consists of arithmetic and trigonometric op-erators, as well as boolean operators and flow control state-ments to allow for the emergence of functioning if-then-elseconstructs. These latter enable the evolved individuals todevelop different behaviors in response to changing condi-tions in the underlying market.

Specifically the members of the function set are: Addition,Subtraction, Multiplication, Division, Sine, Cosine, Tangent,IfThenElse, GreaterThan, LessThan. The terminal set in-stead consists of the input variables, Variable, as well asrandomly-generated constants, Constant ; both types of ter-minal node allow for a multiplicative weight value.

Trees have both a depth limit and a length limit, so to reducethe phenomenon of bloat [2, 5]. The trees representing ourGP individuals are generated using a probabilistic method.Crossover is implemented using subtree-swapping. Selectionis tournament-based. Mutation can occur in the followingways: change of the type of non-terminal node, change ofthe value of the weight of a terminal node, removal of asub-tree, replacement of a sub-tree with a newly generatedone.

2.3 Strategy Workflow2.3.1 Input Data

In fast moving markets exhibiting high volatility, interdaystrategies obtain very limited informational exposure andtrading opportunities. Their potential profitability resultstherefore at a disadvantage in comparison to the growingnumber of faster market actors. For this reason our workwill be focusing on generating strategies having a successfuland active trading profile on intraday price variations. Therisk this choice entails consists in the increased noise intra-day data exhibit compared to interday series; meaningfulpatterns are harder to find and learn.

The data we start from consists of collected tick-level spotpricing data from Citigroup; the tick data is aggregated inthe form of 5-minute bars, each bar consisting of four values:Open, High, Low, and Close prices for the time interval itrepresents.

All the works mentioned in Section 1.2 only utilize pricequotes referring to the same instrument being traded asinput data. An additional innovation of our proposed ap-proach consists in that, while the strategies presented inthis work will only trade the USD.JPY, they will be giveninput bar data for the following instruments: EUR.USD,USD.JPY, GBP.USD, AUD.USD, accounting for a signifi-cant part of the total global daily liquidity.

The goal is enabling agents to find inter-security patternsand correlations which may result beneficial to their tradingperformance. In comparison to having single security priceor bar values this substantially increases the dimensionalityof the problem space: strategies have a total of 16 inputvalues per datapoint as opposed to the 4 they would haveif given inputs only for the traded instrument. However theimplicit, emergent, feature selection property of GP meansthat each strategy only utilizes an arbitrary subset of thetotal available inputs [6].

Considering the high dimensionality of the problem spaceand the many-to-many non-trivial correlations and relation-ships between the variables, we opted not to perform anypre-processing or normalization procedure on the data; barprice values are used as-is. The evolved strategies are there-fore exposed to the non-stationarity of the underlying mar-ket: strategies might lose predictive or action capabilitiesif the input values, over time, diverge too much from theranges they had in training dataset. On the other handa no-preprocessing policy prevents any loss of informationpresent in the data, leaving much more freedom and there-fore pattern learning potential to the evolutionary process.

2.3.2 Strategies’ OutputThe most common implementation involves mapping theGP program output to a buy, sell, or stay action or to abinary long/short position; alternatively, the evolved pro-grams only produce a next-step price prediction. Both ofthese output types are then coupled to a fixed model, notsubject to evolutionary optimization, aware of equity amountand current position responsible for issuing trading orders.The main limitation of both of these paradigms consists inthe fact that the strategy’s final performance is entirely de-pendant by the trading model in place.

In order to provide maximum freedom to the evolved strate-gies, we will be adopting an implementation similar to theone presented in [22, 23], where for each set of input valuesthe GP individual computes an output value interpreted asthe desired position exposure. Valid exposures range from+100 to -100, from a full long position to a full short posi-tion.

The main advantage this choice confers is the total decou-pling of the strategies from the amount of equity capitaltraded, as the output value represents an entirely relativequantity. Another emerging characteristic is the evolutionof strategies whose output can be seen as a datapoint-by-datapoint confidence value signal, as opposed to a trigger ofan imminent trend shift. Both traits are desirable for ap-plication in live trading, making it possible to later use thestrategy logic with different trading or portfolio managementmodels.

2.3.3 Trading ModelIn foreign exchange trading, as opposed to the stock marketfor instance, a full long position means that all the assets arein the quote currency while a full short position converselymeans they are all in the base currency. Since one currencygoes up the other in the pair goes down, forex trading in-volves appropriately moving assets from a currency to theother and viceversa. To capture this market dynamic there-fore our strategies are evolved in an environment where theonly safe position is 50% long and 50% short: where thegains made one side are exactly matched by the losses onthe other side.

This modelization successfully makes strategies able to tradeon both sides of the market. It also prompts them to action:the total value of a starting 50-50 position will not change nomatter the underlying market conditions. A 100-0 or 0-100starting state would instead be equivalent to a buy-and-holdstrategy and could therefore result in profits at the end ofthe evaluation period. This leads to rewarding of undesiredstrategy behavior; preliminary experiments confirmed theimplicit evolutionary advantage this confers to individualsexploiting the aspect, making it harder for the algorithm toreliably find strategies with an active trading profile.

Strategies are initialized with an arbitrary equity amountplaced in a 50-50 position. As explained in Section 2.3.2,once per datapoint the strategy computes its desired posi-tion exposure. If the strategy output differs from the posi-tion currently held by more than 10% a corresponding buyor sell order is issued; in accordance to Citigroup’s live forextrading platform, order sizes are always rounded to multiplesof 5,000 currency units. Issued orders are executed as mar-ket orders on the current close price for the traded security.This makes the strategies’ internal logic the only responsiblefor the performed trading actions.

As the results of Neely and Weller [27], Mendes et al. [25],and Godinho [14] show, transaction costs have a tremendousimpact on strategy profitability. However they are often ig-nored or their application is approximate; furthermore noprevious article mentions whether their application occursa priori, so that the evolutionary process implicitly adaptsthe strategies to their presence, or a posteriori, necessarily

resulting in strategies that will perform worse when theiractivity is accounted for commissions no matter their mag-nitude.

In this work transaction costs matching those applied byCitigroup on live trading are applied. In our case the trans-action cost amount to 15 USD per million dollars transacted,appropriately converted for non-USD based currency pairs.

Finally, execution details are stored to be able to computevarious statistics at the end of the evaluation. Our trad-ing simulation is also aware of the datapoint timestamps,allowing it to track trading days and compute performancestatistics on a daily basis.

2.3.4 Fitness EvaluationTraditionally, GP is used on problems that from a compu-tational perspective have been framed in terms of regressionor classification problems: in such cases the fitness func-tion is intuitively and aptly defined as an error metric, e.g.Mean Square Error, for the desired output values known be-forehand. However, in simulation-based problems such asfinancial trading it is pointless to define a desired outputvalue for every single datapoint: the consequence of an ac-tion performed at time t (entering a position), will not beknown until at least time t+1 (exiting the position). There-fore a proper fitness measure cannot be defined as an erroron a per-datapoint basis, but instead as a measure of per-formance obtained over the entire dataset.

A natural performance measure for trading strategies is prof-itability: how much profit the strategy makes over time.Another objective that some investigations attempt to pro-mote via explicitly factoring it in the fitness function is theminimization of incurred losses.

Wilson and Banzhaf in [35] compare a raw profits fitnesswith two others factoring in the maximum drawdown (MDD,the maximum cumulative loss since the start of the period):one subtracts the MDD from the final value of assets held,the other, more conservative, instead divides this final valueby the MDD. In most of their evaluations the most conser-vative fitness generated the highest profits. Loginov utilizesa fitness in pips, the minimal profit unit in forex trading,multiplied by a quantity dependant on the number of stay,buy, and sell actions to obtain desired trading activity lev-els [21]. Hryshko and Downs [16], Mendes et Al [25], andDempster and Jones [10] all utilize the Stirling ratio: theprofit divided by the MDD, or a variation of thereof.

We do agree on considering low drawdowns an importantcharacteristic for the application on live markets; however,given the many innovative aspects, our main concern withthis work is evaluating the viability of our approach from theperspective of pure profitability. Taking into account the rel-ative nature of the trading signals emitted by our strategies,not tied to any specific equity amount, we can define:

return =Final NAV

Initial NAV− 1 (1)

Where NAV is the Net Assets Value of the strategy, thetotal value of assets held. This expression, for values of

Initial NAV > 0, would be suitable for use as a fitness func-tion for GP individuals. However an earlier, more limitedversion of our systems had very strict constraints on the fit-ness function such as not allowing negative values as wellas applying a minimization policy to the scores, so lowerscores were considered better. Therefore we had to formu-late a function complying to those criteria, we later decidedto keep the function in use to be able to compare perfor-mance of the old GP engine with the current one. Thefitness function used in this work to evaluate strategies istherefore defined as:

fitness = e−return (2)

This function maps to a monotonically decreasing fitnesslandscape with no singular or negative values, a clear limitvalue for the worst case scenario where the strategy lost allof its money:

limFinal NAV→0

fitness = e

as well as clearly showing whether the strategy is profitable(fitness < 1) or not (fitness > 1). In order to promote strate-gies with an active trading profile rather than just takingadvantage of long term trends in the market we introduceda number of minimum trades, defined as position entry-exitpairs, the strategy has to perform. If the strategy tradesfewer times than the specified amount it is assigned a verypenalizing fitness score. Another case resulting in penal-ization is for strategies that at the end of their evaluationon the training dataset lost all, or more, of the equity theystarted with.

3. EXPERIMENTAL SETUP3.1 Population Size and Generation CountAs described in Section 2.3.1, the problem space our strate-gies evolve on has considerably larger dimensionality thanpreviously documented attempts and the underlying dataseries are non-stationary. In addition, financial trading isa simulation problem (as opposed to regression or classifi-cation problems), therefore not allowing for per-datapointoptimal output values univocally defining a priori the de-sired behavior. Finally, our main concern is finding individ-uals that would exhibit good performance on unseen dataso overtraining is a potential issue.

Such conditions imply that an exhaustive sampling, andlearning, of the search space would be unfeasible. For thesereasons we will then greatly favor exploration of the searchspace over exploitation, a conclusion which Dempster andJones [10] as well as Loginov [21] also explicitly reach. There-fore we will adopt a very high population, low generationsetup.

In contrast, previous investigations on the subject favoredlow population and high generation counts instead, favoringexploitation. To exemplify, early works such as [26] and [8]only have a population size of 500 and a generation count of50; an understandable choice considering the computationalresources available at the time. The largest documentedpopulation setting so far instead is 10,000 in [23].

Dataset Period Days Datapoints

Training 23 February 2012 - 23 December 2012 213 87,604Validation 24 December 2012 - 22 February 2013 40 17,612OoS 23 February 2013 - 25 February 2014 254 105,759

Table 1: Dataset Details

In this work we utilize a population size of 75,000 and ageneration count of only 15.

3.2 DatasetsThe algorithm runs and performance evaluations featuredin this work make use of three different datasets: Training,Validation, and Out-of-Sample. As mentioned in Section2.3.1 the datasets are constituted by 5-minute bars for theEUR.USD, USD.JPY, GBP.USD, and AUD.USD, from Sun-day at 17:00 to Friday at 17:00 Eastern Time.

The training set constitutes the dataset all individuals areevaluated on and the fitness score, as per Equation 2, ob-tained on it is the only one guiding the evolutionary algo-rithm. On preliminary investigations we found that utilizinga short time period for training increases the emergence ofovertrained strategies, failing to learn generalized patternsand behaviors. Such strategies typically focus on specificor unique profitable short-term trends in the training mar-ket but exhibit very little trading activity or, worse, disas-trous performance when run on different data. To counterthis phenomenon the training set we consider consists of tenmonths of data, for a total of 87,603 simulation datapoints.

When the evaluation of strategies on the validation set isrequired, it is performed only for the top-scoring 10% ofthe population. Given the large population sizes we em-ploy, this dramatically reduces execution times for algorithmruns while not negatively affecting the chance of findingenough profitable, successful strategies on the validation setas well. The validation set consists of the two months worthof data immediately successive to the training set, 17,612datapoints.

To test the generalization capabilities of the evolved agents,after the breeding process we evaluate the best strategieson a third dataset encompassing a time period successiveto their training and validation sets. This Out-of-Sample(OoS) set consists of one year of 5-minute bar data, or105,759 simulation datapoints, directly following the timeperiods considered for training and validation.

Table 1 resumes the datasets’ details.

A commonly used policy is rolling-window retraining [10,35, 21]: agents are initially evolved on the training set, thenapplied to a test set. Successively additional algorithm runsare performed by progressively shifting forward, usually par-tially overlapping with the preceding run, the training andtest sets. Finally, the partial out-of-sample results from eachtrain-test cycle are combined to obtain final performance.

Since financial time series are not stationary processes [13],this approach has been shown to boost profitability [21].However the focus of this work is the investigation of thelong-term predictive capabilities of the proposed architec-

1.0 0.8 0.6 0.4 0.2 0.0

ft

1.0

0.8

0.6

0.4

0.2

0.0

fv

1.2

1.0

0.8

0.6

0.4

0.2

0.0

Figure 1: Contour plot for the combined training and validationscoring function listed in Equation 3

ture and of its statistical reliability at finding profitablestrategies; therefore we opted not to employ rolling-windowretraining in the presented evaluations.

3.3 Strategy Selection CriteriaIn order to select strategies to evaluate on the Out-of-Sampleset we propose two alternative selection criteria. The first issimply the fitness score, defined in Equation 2, obtained onthe training set.

The second criterion is a combined value calculated fromthe fitness scores on both the training and validation sets ifthese are both profitable, with values < 1. The formula forthe combined score is:

s(ft, fv) = |ft − fv|+ 1−√

(1− ft)2 + (1− fv)2√2

(3)

ft and fv are the training and validation fitness scores, re-spectively. It is the sum of the function

d(ft, fv) = |ft − fv|

the absolute difference of the two fitness scores, with thefunction

r(ft, fv) = 1−√

(1− ft)2 + (1− fv)2√2

the circular, infinite cone with apex (1, 1), extending into thepositive half-space with respect to the Z axis, and scaled tohave roots on the circle passing for the origin of the ft, fvplane.

The function, illustrated in Figure 1, provides a landscapewith a valley that favors individuals with low difference be-tween their scores on the two sets; unless this difference isalready null, however, it is still possible to obtain bettercombined scoring values if the difference increases due toa significant improvement of only one out of the two sets’

score. To keep consistency with training-only scoring, thiscombined formula was also designed to minimize scores ofbetter performing strategies. Given the equal weighting be-tween the ft and fv, this combined scoring is particularlysuited to instances where the training and validation sethave equal length.

It is worth clarifying that this combined measure does notreplace the training-only score for GP fitness purposes, butit is only used for selecting the strategies to run on the Out-of-Sample set. In other words neither the fitness score ob-tained on the validation set nor the combined score haveany impact on the evolutionary process. It is always onlythe performance on the training set that drives the GP op-timization.

3.4 Out-of-Sample Evaluation and AnalyticsThe Out-of-Sample evaluation is not performed internally tothe HeuristicLab GP environment but within a commercialsoftware for quantitative trading where an execution flowanalogous to the one occurring within the breeding environ-ment is implemented. Custom-written modules for Heuristi-cLab are capable of parsing and saving the trees representingGP individuals into a linearized, plain-text, C# syntax formwhich the 3rd party trading software is capable of importingand running.

It is also possible to re-run strategies on their original train-ing and validation datasets. The comparison of the twoexecution flows in the different environments allows for anexternal, independant, verification of the correctness of thetrading, order execution, NAV, and position tracking modelsused in the breeding process.

Additionally, utilizing the quantitative trading external soft-ware makes it possible to obtain a vast number of perfor-mance statistics for the strategies, allowing a more thoroughanalysis. If computed within the HeuristicLab environmentthese additional statistics would noticeably increase the al-ready intensive algorithm running times. Finally, the ex-ternal software is capable of connecting to real-time marketdata providers and trading brokers. This means that, if de-sired, it is already technically and practically possible to runthe strategies evolved by our system on live, actual, markets.

4. RESULTSA total of five GP algorithm runs was performed; for eachrun and selection criterion we considered the top performing10 individuals. Table 2 summarizes the GP and tradingproblem setup we used to evolve the strategies. We willnow present the results obtained on the Training and onthe Out-of-Sample (OoS) dataset for the trading strategiesselected by the two criteria as well as analyze the length andstructure of their expression trees.

4.1 Training Dataset PerformanceFigure 2 shows the End-of-Day (EoD) relative NAV values,expressed as

CurrentNAV

InitialNAV∗ 100

on the training dataset for the training only selection cri-terion (Tr). The graph shows the single best performing

Population 75,000Generations 15Crossover Rate 90%Mutation Rate 15%Max Tree Depth 8Max Tree Length 60Elitism 1Fitness Function fitness = e−return

Minimum Trades 50Instrument Basket AUD.USD, EUR.USD

GBP.USD, USD.JPYTraded Instrument USD.JPYValidated Agents Top 10%Selected Agents Top 10

Table 2: Common GP Run Settings

strategy obtained across the 5 runs (Best Individual), theaverage of the single best performers of the 5 runs (Win-ners), the best average of the 10 strategies considered ina run (Best Run), and the average of all the 50 strategiesobtained across the 5 runs (Avg Run). For benchmarkingpurposes the graph includes the USD.JPY price curve, anal-ogous to a buy-and-hold strategy as well as the Barclay’sBTOP FX index. Figure 3 instead shows the EoD NAV val-ues obtained by the strategies individuated by the combinedselection (TrVa) according to the same groupings as Figure2.

Tables 3 and 4 display the final return (Return), the ratioof profitable days (Days > 0) and Pearson correlation coeffi-cient (ρ) with the daily returns of the USD.JPY. The valuesare reported for the two financial benchmarks (USD.JPY,BTOP FX), the average of all the 10 considered agents foreach run (Run 1-5), the single best performing agent (BestAgent), the average of the single top performers for the fiveruns (Winners), and the total 50 strategies average (AvgRun). Table 3 refers to the Tr criterion while Table 4 toTrVa.

Table 5 reports, always for the strategies’ Training dataset,various aggregate statistics about the performance and trad-ing activity of the 50 strategies for the two criteria. ThePerformance section shows the number of profitable runaverages and agents as well as those beating a buy-and-hold trading strategy and the global average daily return.The Trading Activity section displays the average numberof trades (Trades), winning trades (Winning) and long sidetrade (Long) ratios as well as the highest achieved winningand long trades ratios (Max Winning, Max Long).

Apr 2012

Jun 2012

Aug 2012

Oct 2012

Dec 2012

0.90

1.00

1.10

1.20

1.30

Rela

tive E

oD

NA

V

USD.JPY

BTOP FX

Avg Run

Best Run

Best Individual

Winners

Figure 2: Training set End-of-Day NAV, Tr criterion strategies

Apr 2012

Jun 2012

Aug 2012

Oct 2012

Dec 2012

0.90

1.00

1.10

1.20

1.30R

ela

tive E

oD

NA

V

USD.JPY

BTOP FX

Avg Run

Best Run

Best Individual

Winners

Figure 3: Training set End-of-Day NAV, TrVa criterion strate-gies

Return Days > 0 ρ

Benchmarks

USD.JPY 1.05 51.20% 1.00BTOP FX 1.01 51.50% 0.30

Runs

Run 1 1.27 68.08% 0.19Run 2 1.26 68.54% 0.05Run 3 1.26 67.61% -0.07Run 4 1.26 66.20% -0.05Run 5 1.23 60.56% -0.07

Best Agent 1.32 66.67% 0.25Winners 1.30 72.30% -0.10Avg Run 1.26 75.12% 0.02

Table 3: Daily returns statistics on the Training set, Tr strate-gies

Return Days > 0 ρ

Benchmarks

USD.JPY 1.05 51.20% 1.00BTOP FX 1.01 51.50% 0.30

Runs

Run 1 1.10 60.09% 0.82Run 2 1.11 53.99% 0.94Run 3 1.10 51.64% 0.95Run 4 1.11 53.99% 0.90Run 5 1.08 50.70% 0.95

Best Agent 1.16 56.81% 0.40Winners 1.14 58.69% 0.78Avg Run 1.10 51.17% 0.96

Table 4: Daily returns statistics on the Training set, TrVastrategies

4.2 Out-of-Sample Dataset PerformanceWe will now present the results obtained by the same strate-gies when run on the OoS dataset with the aim of assessinghow well they perform on unseen data. The reported quan-tities and conventions are analogous to those explained inSection 4.1 for the results obtained on the Training dataset.

Figures 4 and 5 show the EoD NAV curves, Tables 6 and7 feature the daily returns statistics for the two selectioncriteria and Table 8 contains the general performance andtrading activity metrics.

Apr 2013

Jun 2013

Aug 2013

Oct 2013

Dec 2013

Feb 20140.90

1.00

1.10

1.20

Rela

tive E

oD

NA

V

USD.JPY

BTOP FX

Avg Run

Best Run

Best Individual

Winners

Figure 4: Out-of-Sample set End-of-Day NAV, Tr criterionstrategies

4.3 Tree StructureTo investigate the difference in the logic of the trading strate-gies selected under the two criteria, we analyzed the actualsymbolic expression trees encoding the individuals. In Table9 we report the average tree length and number of variablenodes computed across the total 50 strategies per criterion.

Another interesting aspect to analyze is the relative frequen-cies of the input variables within the expression trees. Fig-ures 6 and 7 show, respectively, the per-run relative frequen-

Tr TrVa

Performance

Profitable Runs 5 5Profitable Individuals 50 50Runs > BH 50 50Individuals > BH 50 50Avg Daily Return 1.08× 10−3 4.48× 10−4

(168%) (528%)

Trading Activity

Number of Trades 2411 1040(32.72%) (79.87%)

Winning Trades 56.90% 49.56%(15.10%) (40.19%)

Long Trades 62.69% 77.26%(7.74%) (13.90%)

Max Winning 74.46% 90.77%Max Long 74.03% 89.71%

Table 5: Trading activity statistics on the Training set, Tr andTrVa strategies

Apr 2013

Jun 2013

Aug 2013

Oct 2013

Dec 2013

Feb 20140.90

1.00

1.10

1.20

Rela

tive E

oD

NA

V

USD.JPYBTOP FX

Avg RunBest Run

Best IndividualWinners

Figure 5: Out-of-Sample set End-of-Day NAV, TrVa criterionstrategies

Return Days > 0 ρ

Benchmarks

USD.JPY 1.11 50.61% 1.00BTOP FX 0.95 45.83% 0.01

Runs

Run 1 1.07 50.40% 0.17Run 2 1.02 51.61% 0.57Run 3 0.97 54.44% 0.13Run 4 0.96 51.21% -0.25Run 5 1.06 55.65% -0.10


Table 6: Daily returns statistics on the Out-of-Sample set, Trstrategies

Return Days > 0 ρ

Benchmarks

USD.JPY 1.11 50.61% 1.00BTOP FX 0.95 45.83% 0.01

Runs

Run 1 1.01 52.02% 0.93Run 2 1.05 50.81% 0.98Run 3 1.05 50.40% 0.98Run 4 1.04 50.40% 0.97Run 5 1.04 52.82% 0.95


Table 7: Daily returns statistics on the Out-of-Sample set, TrVastrategies

Run 1 Run 2 Run 3 Run 4 Run 50

10

20

30

40

50

Rela

tive F

requency

(%

)

USD.JPY AUD.USD EUR.USD GBP.USD

Figure 6: Trees for Tr strategies, currencies’ relative frequencies

cies of the currency and bar value (Open, High, Low, Close)of the variable nodes for the trees selected under the Tr cri-terion. Figures 8 and 9 instead display them for strategiesfrom the TrVa criterion.

5. DISCUSSIONIn this Section we will first analyze the strategies perfor-mance on the Training and Out-of-Sample datasets. Thenwe will characterize the common aspects across the twodatasets of the strategies selected with the Tr and TrVacriteria, with a focus on the Out-of-Sample set. Succes-sively we will examine differences in strategy structure andvariable selection occurring under the two criteria. Lastlywe will perform a results comparison with similar previousinvestigations.

5.1 Training DatasetPerformance on the Training dataset is overwhelmingly pos-itive, with all of the strategies for both selection criteriaachieving profitability and beating the two benchmarks: areturn of 5% for USD.JPY and 1% for BTOP FX. The bestperforming agent comes, unsurprisingly, from the Trainingonly criterion, obtaining a final return of 32%. The averagereturn is instead 26% for Tr strategies and 16% for TrVaones.

Tr TrVa

Performance

Profitable Runs 3 5Profitable Individuals 26 42Runs > BH 0 0Individuals > BH 4 1Avg Daily Return 7.67× 10−5 1.89× 10−4

(2538%) (2128%)

Trading Activity

Number of Trades 3247 1460(29.57%) (96.56%)

Winning Trades 56.97% 48.95%(21.33%) (38.43%)

Long Trades 57.49% 76.64%(17.70%) (17.23%)

Max Winning 83.45% 86.70%Max Long 75.13% 90.99%

Table 8: Trading activity statistics on the Out-of-Sample set, Trand TrVa strategies

Tr TrVa

Tree Length 34.06 31.68(2.96) (3.97)

Symbols Count 8.25 7.12(1.5) (0.8)

Table 9: Average and standard deviation values of the treelength and variable symbols count, Tr and TrVa strate-gies

Examining the daily return series, both USD.JPY and BTOPFX benchmarks exhibit behavior very close to a randomwalk, with 51.20% and 51.50% of profitable daily returnsrespectively. Evolved strategies, especially from Tr, insteadshow a solid day-to-day profit potential with a single strat-egy maximum of 66.20% (p = 1.31×10−6 under the null hy-pothesis the daily return series follows a 50/50 positive/negativedistribution); for TrVa this value is lower, 56.81% (p =0.027) but still very improbable under the null hypothesis.

A very interesting aspect emerging for both Tr and TrVais that the best daily profitability ratio is not given by in-dividual strategies, but by aggregates: the global averagefor Tr, 75.12%, and the average of the Run 1 strategies forTrVa, 60.09%; the average of the 5 runs’ top performers alsooutperforms the best single strategy for both criteria. Thissuggests that the evolved strategies present, to some extent,complementary trading logic. If day-to-day losses are to beminimized then, suitable strategies could be combined toform the basis for a trading portfolio.

Considering trading activity Tr strategies are on average 2.3times more active than the TrVa ones, but both have high,active, trading profiles: 2411 total trades for Tr or, con-


10

20

30

40

50

Rela

tive F

requency

(%

)

Open High Low Close

Figure 7: Trees for Tr strategies, OHLC values’ relative fre-quencies


10

20

30

40

50

Rela

tive F

requency

(%

)

USD.JPY AUD.USD EUR.USD GBP.USD

Figure 8: Trees for TrVa strategies, currencies’ relative frequen-cies

sidering the Training period length of 213 days, 11.3 tradesper day and 1040 trades, or 4.8 trades per day, for TrVastrategies.

Analyzing the trade accuracy, the ratio of profitable tradesperformed, we find that Tr achieves a value of 56.90% andTrVa of only 49.56%. However, the standard deviation ofthe accuracy values of TrVa strategies is much higher andsurprisingly the individual strategy with the highest tradeaccuracy, a staggering 90.77% (p = 4.98 × 10−205) was se-lected under this criterion while the most accurate Tr strat-egy ”only” has an accuracy of 74.46% (p = 1.02× 10−162).

Regarding market side exposure, both criteria selected strate-gies presenting a strong bias towards the Long side, howeverthe reasons for this can be explained by the dynamics of thetraded instrument price series which also present a Long sidetrend.

5.2 Out-of-Sample DatasetOn the Out-of-Sample dataset performance is still generallysolid, with more than half of the strategies and run averagesobtaining profitable returns. Understandably, in this casethe obtained returns are lower than on the Training dataset,


10

20

30

40

50

Rela

tive F

requency

(%

)

Open High Low Close

Figure 9: Trees for TrVa strategies, OHLC values’ relative fre-quencies

with the best strategy achieving a 19% return and averagesobtaining 2% and 5% for Tr and TrVa respectively.

On this dataset however, while the best performer was stillfound among the strategies selected under Tr, the total num-ber of agents achieving profits as well as the averaged resultsare better for the TrVa criterion. All of the profitable strate-gies for both criteria beat the BTOP FX benchmark whileonly 4 strategies for Tr and 1 for TrVa manage to beat thebuy-and-hold USD.JPY approach.

Here the daily return profitability is lower than on the Train-ing set as the average values are generally in the neigh-borhood of 50% for both criteria. Once again the betterperformers for this metric are strategy aggregates: 54.44%(p = 0.094) for the Tr Run 3 average, 52.82% (p = 0.20) forthe TrVa Run 5 average.

On the OoS dataset Tr strategies are 2.2 times more activethan TrVa ones. Trades per day are 12.8 for Tr and 5.7 forTrVa.

Trade accuracy average is 56.97% for Tr and 48.95% forTrVa. On this dataset as well the best trade accuracy,86.70%, (p = 2.6× 10−26) comes from a TrVa strategy.

The market side exposure again presents a preference for theLong side.

Examining trade activity, accuracy, and side exposure wefind that these results are very similar to the ones obtainedon the Training set for both criteria. This indicates thatwhile the obtained profits on unseen data are lower, otherimportant strategy performance indicators are preserved,providing the financial practitioner with a consistent basisfor which strategies to choose for live trading.

Figure 10 shows the 30-days moving averages for the dailyreturns of USD.JPY and the 50 strategies’ averages for Trand TrVa. We note that for both of the criteria this quan-tity, despite exhibiting oscillations correlated with the un-derlying instrument, does not show stable decreasing trendsduring the course of the Out-of-Sample dataset. This indi-cates that our overall approach based multi instrument in-

May 2013

Jul 2013

Sep 2013

Nov 2013

Jan 20140.002

0.000

0.002

0.004

Daily

Retu

rns

- M

A(3

0)

USD.JPY Tr TrVa

Figure 10: 30-Days moving averages for the Out-of-Sampledaily returns series of the USD.JPY, Tr average, andTrVa average

put data without using established technical indicators formodel building succeeds at learning patterns which maintainperformance levels for up to one year after being generatedwithout any sort of model retraining in place. Another con-clusion is that the usage of unprocessed price quotes, at leastfor the total considered timespan of two years, also does notlead to performance degradation and that our GP setup isable to cope with the nonstationarity of its input data.

5.3 Selection CriteriaPerformance Differences

Analyzing the strategies selected by the two criteria andtheir performance on Training and Out-of-Sample datasets itemerges that the two criteria select strategies exhibiting dif-ferent behaviors and that such characteristics are preservedwhen run on unseen data.

From the point of view of pure profits, the highest yieldingstrategies were always found via Tr, TrVa instead resultedin better averaged results.

Pearson correlation coefficients with the return series of theUSD.JPY instrument prices together with the average pro-portion of long-side trades as well as a simple qualitativeinspections of the charts in Figure 2, 3, 4, and 5 show thatthe TrVa criterion consistently selects agents much morecorrelated to the security they are trading.

Comparing the activity statistics, Tr reliably selects strate-gies trading more often while the TrVa criterion results instrategies on average performing less trades but individuallyspanning a larger range of activity levels.

Regarding trade accuracy, on average Tr produces betterstrategies but under TrVa is where the individual strategieshaving the highest accuracy were found.

On the Training Dataset, it’s Tr strategies having better av-erage daily returns while on the Out-of-Sample data not onlythis value for TrVa is more than double but also its stan-

dard deviation is slightly lower. Moreover, on OoS it’s TrVaobtaining a higher day-to-day profitability ratio. This fact,together with the higher number of non-profitable strategiesas well as the higher average trade accuracy exhibited underTr, leads us to conclude that when Tr strategies lose on theOut-of-Sample dataset, they lose more than their counter-parts selected with TrVa.

Drawing conclusions on the performance of strategies fromthe two criteria on the Out-Of-Sample dataset, both suc-ceeded in evolving strategies successfully maintaining thetrading activity and accuracy levels shown on the trainingdata. Considering that our signal-based approach requiresstrategies to alter their output value to trigger an execu-tion, this means the agents successfully generalized patternspresent in the input price series to a good level while notfixating on features specific to the training dataset with noor very little applicability on other data even when bredexclusively on the training set. Therefore we conclude ourapproach exhibits very low overtraining.

From the point of view of profitability, both criteria suc-cessfully produce strategies profitable on unseen data, how-ever chances under TrVa are better. Attempting to definethe general characteristics of the strategies selected from thetwo criteria we can conclude that the Tr criterion, statisti-cally, results in more volatile individuals, with higher trad-ing activity and profit potential but also with an increasedchance of substantial losses. The TrVa criterion instead re-sults in more modest, but far less risky, profits with minimalchance of loss and with very high correlation to the under-lying traded instrument.

Therefore both criteria are, in our own opinion, suitable forthe generation of strategies to be applied on actual, livemarkets. The amount of investment risk that can be sus-tained or desired would favor the application of one over theother. Moving from the single strategy level to the portfoliolevel, we have also seen that both criteria produce strate-gies with complementary logic that if successfully combinedwould lower risk even more.

Speculating on the nature of patterns individuated and learnedby the two criteria, we can posit that strategies selected bythe Tr criterion are able to better individuate and exploitprofitable long-term trends present in the data. By beingstable, these trends allow for more frequent trading and overtime they exhibit little correlation with the underlying in-strument. On the other hand, TrVa would seem to favorpatterns on a much shorter timescale, over time leading tolower risk and maintaining a very high correlation with theinstrument.

5.4 Strategy Structure and Variable SelectionFrom the values reported in Table 9 we conclude that thereis no significant difference in total tree length or number offeatured variable symbols in the strategies selected by thetwo criteria.

Analyzing the relative frequencies with which the differentinput currencies appear in the selected strategies, displayedin Figures 6 and 8, we notice that while the global averagefrequencies for all four currencies are very close to the ex-

pected value of 25%, the individual runs present differentvariable usage ”spectra”.

The traded instrument, USD.JPY, is the only one that forevery Tr and TrVa run deviates the least from the expectedvalue, exhibiting a relative standard deviation of 3% for Trruns and of 16% for TrVa runs. This indicates that ourapproach successfully and reliably recognizes the importanceof the traded instrument even with a fairly high-dimensionalinput space as the one we make use of.

Frequencies for other instruments instead vary more betweenruns, with relative standard deviations of 32-57% for Tr runsand 16-28% for TrVa. For some runs the selected strategiesmake more or less and equal use of all of them, in othersthe evolutionary process converges towards a pool focusedon only one or two other instruments besides the tradedone. This could be interpreted as the separate runs findingpatterns mainly involving a subset of the security basketwhile relatively disregarding other inputs.

The always lower relative standard deviations of TrVa ver-sus Tr indicate that the former criterion, due to its implicitfocus towards finding more general patterns, is less prone tosearch space exploitation, or breeding of super-specialists,and therefore maintains a more average variable usage dis-tribution across runs. This is consistent with the bettergeneralization capabilities on unseen data shown by TrVafrom a profitability perspective, as detailed in Section 5.3.

Regarding the distribution of the selected variables with re-spect to the Open, High, Low, and Close values, there ap-pear to be no significant patterns or differences among thetwo criteria.

5.5 Comparison with Previous WorkPerforming a comparison between works in this field is notstraightforward, given the many variable aspects involved inautomated trading strategies and systems. We will then hereexamine only the previous works in the literature with a sim-ilar setup to our own: utilizing GP or GA to evolve strategiesor models, trading on foreign exchange, utilizing unleveredreturns, and including transaction costs in the evaluationprocess.

The next-day return forecasting models of [31] obtains, af-ter transaction costs, an annualized return of 5.9% on theEUR.USD. Despite our strategies having a higher tradingactivity, and being therefore more exposed to transactioncosts, their best result is close to our average Out-Of-Sampleresult of 5% but considerably lower than our best one, 19%.

The hybrid GA - reinforcement learning technique proposedin [17] utilizes 5 minutes data as we do. They report profitsof 6% on the EUR.USD in 3.5 months of out-of-sample data,which are extrapolated to 20% annualized returns. That isslightly better than our overall best performer, however con-sidering that it has been shown that returns tend to degradeover time [10] it is unknown how well their generalized figurewould hold in practice.

The work described in [35] is the one utilizing the setupthe most similar to our own as they are the only other ones

evolving free-form agents; also similarly, their work and oursshow a very high best trade accuracy: 90% or higher. Re-garding profits obtained on the USD.JPY, their raw fitnessachieves 4.27% and their conservative one 13.44%. Theirconservative fitness is based on the Stirling ratio and it aimsat explicitly reducing the strategy losses while their raw fit-ness is the unmodified value of assets held and is the sameas our fitness. Therefore it can be said that our results arebetter than theirs as our non risk-mitigating fitness on av-erage performs slightly higher then theirs, but ours in thebest case obtains better returns than their risk-mitigatingone. However, final profits are strongly tied to the chosenevaluation period so they are not fully comparable betweendifferent works. Even disregarding the actual obtained profitfigures, we produced very similar results without data pre-processing or rolling window retraining; techniques whichreduce the complexity of the input space and allow modelretraining.

Another investigation with aspects similar to our own is theone from Manahov and Hudson [22], as they employed 5 min-utes data as well as a very high population approach. Theiraverage trade accuracy on out-of-sample data is close to theone we obtained, a few points in excess of 50%. However, weare not able to compare the obtained results because theyexpress and report their profits as excess returns against aUS Treasury Bill and their out-of-sample period is only 1month long.

Even comparing with other, more dissimilar approaches, ourresults appear better, more resilient, or higher: in contrastto [27] and [25] we do not struggle at all in the presenceof transaction costs while [10] reports an annualized returnof 7% on the GBP.USD but their performance substantiallydegrades after nine months into the out-of-sample period.

In general, we are able to state that against previous re-lated work we successfully managed to improve on severalweaknesses. Our approach, while still showing lower prof-its on unseen data as opposed to its training set, maintainsessentially analogous trade frequencies and accuracies be-tween the two datasets, indicating successful and generaliz-able learning of profitable patterns. Additionally, by factor-ing in transaction costs already from the breeding processour evolved strategies are fully and stably profitable, bothin and out of sample, in their presence despite their high ex-hibited trading activity: on average 4 to 13 trades per day.Trade accuracy ranges from statistically expected values tovery high ones, pointing to the conclusion that explicitly in-cluding risk mitigation/loss aversion measures in the fitnessfunction, something we do not have, is not strictly necessaryto evolve strategies with very little losses. Lastly, despitenot employing any sort of periodic or performance-triggeredmodel retraining and making use of raw price quotes, wewere able to successfully generate models which do not ex-hibit signs of diminishing returns on out-of-sample perfor-mance for up to one year. This contrasts with what [17]reports and establishes that neither the use of common tech-nical indicators nor data preprocessing are required for theemergence of successful, profitable strategies.

From the point of view of out-of-sample profits, the resultsof our approach outperform most documented attempts and

in the worst case they are on-par with the best results foundin literature.

6. CONCLUSIONSWe described, implemented and tested a genetic program-ming system for the evolution of currency trading strategiesin the foreign exchange market. The proposed system in-troduced several innovative aspects aimed at facilitating theapplication of the strategies to live, production environmentsas well as finding trading patterns breaking away from tra-ditional technical analysis models.

The principal innovation is constituted by having price datafrom multiple currency pairs in addition to the single onebeing traded as inputs to the system. This successfully re-sulted in the evolution of strategies featuring multi-currencytrading logic.

We examined performance of two different sets of solutions,obtained across several independent runs, over one year ofout-of sample, unseen data: one set selected according toperformance on a single training dataset, and the secondone making use of a combined performance metric involv-ing an additional validation set. For all the performed runsprofitable strategies were found, indicating the reliability ofour proposed architecture.

The two selection criteria both resulted in best performingindividuals maintaining an active trading profile (6 to 13trades per day), having a remarkable trade accuracy (83 to87% of performed trades are profitable), as well as achiev-ing final profits competitive with prior documented investi-gations (12 to 19%). Both criteria appear therefore suitablefor live application. However, the strategies selected underthe two criteria also exhibit some differences: the trainingonly selection policy appears to be more suited if looking forstrategies maximizing profits and, therefore, risk and havinga NAV profile lowly correlated with the price of the under-lying instrument. The combined selection criterion insteadis better at finding agents minimizing risks at the expense ofthe profitability potential as well as having high correlationwith the underlying security.

The system also appears able, over multiple runs, to evolvestrategies exhibiting complementary trading logic which canbe combined to minimize risk even further without impact-ing the profit potential.

6.1 Future DirectionsThe proposed system was tested on intraday foreign ex-change data with a 5 minutes frequency, however our tech-nique and setup can be utilized to trade other asset classes(stocks, ETFs, . . . ), or on other datapoint frequencies with-out further modification.

The fitness metric utilized in this work is based on the ob-tained final return, which is dependant on dataset length.The combined selection criterion ties together the fitnessscores obtained on the training and validation datasets. Asin the presented evaluations the training and test sets haddifferent lengths, despite still showing promising results, thecombined criterion’s efficacy at learning less risky patternstraded-off with its maximum profit potential. In order to

better assess its validity it would be therefore necessary toevaluate it in a scenario where training and test sets have thesame length. Alternatively, this issue could be addressed bydefining a different, dataset length invariant, fitness score.

Considering the variety of trading models utilizing autore-gressive components in both literature and industry practice,it is worth investigating the performance impact resultingfrom the introduction of past security prices to the pool ofvariables the Genetic Programming strategies have accessto.

7. REFERENCES[1] F. Allen and R. Karjalainen. Using genetic algorithms

to find technical trading rules. Journal of financialEconomics, 51(2):245–271, 1999.

[2] L. Altenberg. Emergent phenomena in geneticprogramming. In Evolutionary Programming -Proceedings of the Third Annual Conference, pages233–241. World Scientific Publishing, 1994.

[3] E. M. Azoff. Neural network time series forecasting offinancial markets. John Wiley & Sons, Inc., 1994.

[4] Bank for International Settlements. Triennial centralbank survey of foreign exchange turnover in april 2013- preliminary results released by the bis.http://www.bis.org/press/p130905.htm, September2013.

[5] W. Banzhaf, P. Nordin, R. E. Keller, and F. D.Francone. Genetic programming: an introduction,volume 1, chapter 7. Morgan Kaufmann SanFrancisco, 1998.

[6] W. Banzhaf, P. Nordin, R. E. Keller, and F. D.Francone. Genetic programming: an introduction,volume 1, chapter 5. Morgan Kaufmann SanFrancisco, 1998.

[7] BarclayHedge. Barclayhedge | barclay btop currencyindex. http://www.barclayhedge.com/research/indices/btopfx/index.html, June 2014.

[8] A. Brabazon and M. O’Neill. Biologically inspiredalgorithms for financial modelling. Springer, 2006.

[9] S. Cirillo and S. Lloyd. A scalable symbolic expressiontree interpreter for the heuristiclab optimizationframework. In Proceedings of the 2014 conferencecompanion on Genetic and evolutionary computationcompanion, pages 1141–1148, July 2014.

[10] M. Dempster and C. Jones. A real-time adaptivetrading system using genetic programming.Quantitative Finance, 1(4):397–413, 2001.

[11] C. Evans, K. Pappas, and F. Xhafa. Utilizing artificialneural networks and genetic algorithms to build analgo-trading model for intra-day foreign exchangespeculation. Mathematical and Computer Modelling,2013.

[12] R. Gencay, M. Dacorogna, R. Olsen, and O. Pictet.Foreign exchange trading models and marketbehavior. Journal of Economic Dynamics and Control,27(6):909–935, 2003.

[13] C. L. Giles, S. Lawrence, and A. C. Tsoi. Noisy timeseries prediction using recurrent neural networks andgrammatical inference. Machine learning,44(1-2):161–183, 2001.

http://www.bis.org/press/p130905.htm

http://www.barclayhedge.com/research/indices/btopfx/index.html

http://www.barclayhedge.com/research/indices/btopfx/index.html

[14] P. Godinho. Can abnormal returns be earned onbandwidth-bounded currencies? evidence from agenetic algorithm. Economic Issues, 17(1), 2012.

[15] A. Hirabayashi, C. Aranha, and H. Iba. Optimizationof the trading rule in foreign exchange using geneticalgorithm. In Proceedings of the 11th Annualconference on Genetic and evolutionary computation,pages 1529–1536. ACM, 2009.

[16] A. Hryshko and T. Downs. System for foreignexchange trading using genetic algorithms andreinforcement learning. International journal ofsystems science, 35(13-14):763–774, 2004.

[17] A. Hryshko and T. Downs. Development of machinelearning software for high frequency trading infinancial markets. Business Applications andComputational Intelligence, page 406, 2006.

[18] J. R. Koza. Genetic Programming: vol. 1, On theprogramming of computers by means of naturalselection, volume 1. MIT press, 1992.

[19] J. R. Koza. Human-competitive results produced bygenetic programming. Genetic Programming andEvolvable Machines, 11(3-4):251–284, 2010.

[20] A. W. Lo. The adaptive markets hypothesis. TheJournal of Portfolio Management, 30(5):15–29, 2004.

[21] A. Loginov. On the utility of evolving forex markettrading agents with criteria based retraining. 2013.

[22] V. Manahov and R. Hudson. New evidence oftechnical trading profitability. Economics Bulletin,33(4):2493–2503, 2013.

[23] V. Manahov, R. Hudson, and B. Gebka. Does highfrequency trading affect technical analysis and marketefficiency? and if so, how? Journal of InternationalFinancial Markets, Institutions and Money,28:131–157, 2014.

[24] D. McKenney. Literature review: Accelerating stocktrading rule creation using genetic programming on agpu device. 2010.

[25] L. Mendes, P. Godinho, and J. Dias. A forex tradingsystem based on a genetic algorithm. Journal ofHeuristics, 18(4):627–656, 2012.

[26] C. Neely, P. Weller, and R. Dittmar. Is technicalanalysis in the foreign exchange market profitable? Agenetic programming approach. Cambridge Univ Press,1997.

[27] C. J. Neely and P. A. Weller. Intraday technicaltrading in the foreign exchange market. Journal ofInternational Money and Finance, 22(2):223–237,2003.

[28] C. J. Neely and P. A. Weller. Lessons from theevolution of foreign exchange trading strategies.Journal of Banking & Finance, 37(10):3783–3798,2013.

[29] C. J. Neely, P. A. Weller, and J. M. Ulrich. Theadaptive markets hypothesis: evidence from theforeign exchange market. Journal of Financial andQuantitative Analysis, 44(02):467–488, 2009.

[30] J.-Y. Potvin, P. Soriano, and M. Vallee. Generatingtrading rules on the stock markets with geneticprogramming. Computers & Operations Research,31(7):1033–1047, 2004.

[31] G. A. Vasilakis, K. A. Theofilatos, E. F.Georgopoulos, A. Karathanasopoulos, and S. D.Likothanassis. A genetic programming approach foreur/usd exchange rate forecasting and trading.Computational Economics, pages 1–17, 2012.

[32] S. Wagner, G. Kronberger, A. Beham, M. Kommenda,A. Scheibenpflug, E. Pitzer, S. Vonolfen, M. Kofler,S. Winkler, V. Dorfer, et al. Architecture and designof the heuristiclab optimization environment. InAdvanced Methods and Applications in ComputationalIntelligence, pages 197–261. Springer, 2014.

[33] G. Wilson and W. Banzhaf. Prediction of interdaystock prices using developmental and linear geneticprogramming. In Applications of EvolutionaryComputing, pages 172–181. Springer, 2009.

[34] G. Wilson and W. Banzhaf. Interday and intradaystock trading using probabilistic adaptive mappingdevelopmental genetic programming and linear geneticprogramming. In Natural Computing in ComputationalFinance, pages 191–212. Springer, 2010.

[35] G. Wilson and W. Banzhaf. Interday foreign exchangetrading using linear genetic programming. InProceedings of the 12th annual conference on Geneticand evolutionary computation, pages 1139–1146.ACM, 2010.

Evolving intraday foreign exchange trading strategies ... · Genetic Programming; Finance; Foreign...

Documents

Transcript of Evolving intraday foreign exchange trading strategies ... · Genetic Programming; Finance; Foreign...