
Performance Evaluation of the Efficient Crashes Model Based on Synthetic Data

Master Thesis

Clint J. Kurinjirappalli

Chair of Entrepreneurial Risk

Swiss Federal Institute of Technology Zurich

Supervision

Jan-Christian Gerlach

Prof. Didier Sornette

April 3, 2019

Contents

1 Introduction

2 The Efficient Crashes Bubble Model
  2.1 Structure of the ECBM Trading Strategy
  2.2 Theoretical Model Description
  2.3 Optimal Investment Strategy Based on the Kelly Criterion
  2.4 From Theory to Application: Model Parameter Estimation
  2.5 Python Model Implementation

3 Experimental Setting
  3.1 Synthetic Data Generation
    3.1.1 Motivation and Theory for Complexity Levels
    3.1.2 Complexity Levels and Performance Expectations
    3.1.3 Synthetic Data Generation Procedure
  3.2 Running the ECBM on Synthetic Data
  3.3 Data Preparation

4 Results and Discussion
  4.1 Estimation Performance
    4.1.1 General Goal
    4.1.2 Initial Input d
    4.1.3 Initial Input dtd
    4.1.4 Initial Input dtn
    4.1.5 Summary Estimation Performance
  4.2 Trading Performance
    4.2.1 Evaluation Metrics
    4.2.2 Overall Overview
    4.2.3 Initial Input d
    4.2.4 Initial Input dtd
    4.2.5 Initial Input dtn
    4.2.6 Summary Trading Performance
  4.3 Relationship between Parameter Estimation and Trading Performance

5 Conclusion

6 Outlook

Appendices
  A Heatmaps
  B Lambda Boundaries (-1,2)

Acknowledgements

First and foremost I would like to thank Professor Didier Sornette for allowing me to write my master thesis at the Chair of Entrepreneurial Risk. The amount of time he took for me as a simple master student was staggering and was deeply appreciated. He enabled me to tackle this personal challenge at the end of my academic career by providing all the help and resources I could have possibly needed. I was able to learn a tremendous amount during my time at his Chair and will miss the discussions during the breakfast meetings.

I must express my profound gratitude to Jan-Christian Gerlach, without whose support it most probably would not have been possible for me to finish this thesis. His patience with my many questions was crucial to my understanding of the subject matter and he steered me in the right direction whenever he thought I needed it. I could not have wished for a better supervisor.

Further, I would also like to thank Jérôme Kreuser for his helpful inputs to my draft. At this point, my gratitude also goes out to the Euler cluster support team, particularly Urban Borstnik for his advice regarding parallelization.

Finally, I would like to thank my parents, siblings, friends and partner for their continued encouragement, not only throughout this thesis but over my years of study. Thank you for believing in me, it means the world to me.

Abstract

In this work, the trading strategy based on the Super-Exponential Rational Expectations Model with Efficient Crashes, going by the name Efficient Crashes Bubble Model (ECBM), was investigated more closely. With the help of synthetic data, the relationship between the model performance and its input variables is analyzed. This synthetic data was generated on the basis of the fundamental conditions under which the ECBM abstracts the market. We showed that the estimation performance and trading performance are not correlated. Consequently, investigating optimal investment methods other than Kelly is strongly advised.


Chapter 1

Introduction

For as long as people have traded in organized markets, systematic approaches to trading have been around. After all, it is difficult if not impossible to generate long-term profits without following a strategic approach. Trading strategies became even more prevalent with the growth of the markets themselves, the digitalization of stock exchanges and trading during the 1980s and the growth of inexpensive computing power. Algorithmic trading now accounts for the majority of total market volume and the rise of such automated trading is expected to continue [1].

The benefits of a successfully tested trading strategy are as obvious as the drawbacks of an improperly tested one: financial gain or loss. On a simple level, the success of a trading strategy can be measured in a binary way: either it is profitable or not. Additionally, other metrics such as volatility and maximum drawdown play an important role for investors with different risk preferences. A rigorous (back-)testing approach for the evaluation of trading performance helps to avoid the costly mistake of deploying an unfinished or faulty trading strategy.

In this master thesis, a trading strategy based on the Super-Exponential Rational Expectations Bubble Model with Efficient Crashes (abbreviated ECBM) is investigated. The model was proposed by Kreuser and Sornette [2] in 2018. In the corresponding paper, the authors demonstrate the trading performance of the model on several well-known past bubbles in different markets by means of a Matlab trading strategy simulation code. For instance, Figure 1.1 shows how the Efficient Portfolio weathers the bust of the bubble unharmed, showing great promise on historical bubbles, while other strategies did not fare quite as well. This shows the capability of the trading strategy to outperform the simple buy and hold strategy and other strategies on different bubbles. Furthermore, the potential of the trading strategy to circumvent phases of market disruption (i.e. crashes or rallies) and ideally even exploit them by shorting (or long trading) is demonstrated. The trading performance of the model depends on adjustable initial input variables, which can be optimized to arrive at better trading results. It is one of the main goals of this thesis to explore the space of these input variables and investigate their relation to trading and model parameter estimation performance. It must be noted that the performance shown in the paper was achieved by tuning the initial variables based on intuition, experience and a trial and error process. In other words, a rigorous


Figure 1.1: Figure from Kreuser and Sornette [2] comparing the performance of various trading strategies during the 1997 Asian financial crisis. The Optimal Efficient Portfolio corresponding to the strategy constructed based on the ECBM estimation code, depicted in black, performs better than the market and several other strategies.

or even automated procedure for obtaining the optimal input variables does not exist yet. In this work, we raise the question whether the positive performance of the model implementation shown in the paper may have only been a result of overfitting and in-sample optimization.

In-sample optimization of model variables (on the training set) increases the danger of overfitting, and hence of bad performance on newly arriving data (the test set), and is generally unreliable for evaluating a trading strategy's future performance. Ideally, a trade-off between good in-sample performance and robust out-of-sample performance must be found. In technical terms, we speak of finding a balance between the training and the prediction error.

Here, we further test the code used to construct the trading strategy. In particular, we focus on three aspects:

1. Model parameter estimation quality

2. Trading performance

3. The relation between 1. and 2.

Finally, we conclude whether it is possible to identify stable input variable ranges that reliably lead to outperformance of the buy and hold strategy. Naturally, looking at point 1., one might ask how it is possible to measure estimation performance when the true parameters underlying the real-world data are unknown. For this reason, we base our analysis entirely on synthetic data, which has the advantage that one can specify the true parameters in advance, such that later on they can be used as a reference to calculate the model estimation error.


Furthermore, if we were using real market data for this analysis, we would be confronted with the issue of model error; we know that real financial time series are not generated in the exact same manner as described by the ECBM. Theoretically of course, we assume that the proposed model is the best model capturing the underlying dynamics of the system that we are describing. However, every model, no matter how advanced, is just a finer or more abstract approximation of the real-world situation. Therefore, inevitably, we will encounter model error between our model and the true model perfectly describing the real-world system.

By using synthetic data, we can ensure that the model we are using to describe the data is exactly the same as the model that was used to generate the data. Therefore, the model error will be zero. Then we are left with other error sources and can focus on these. A synthetic data analysis thus frees us from model error and enables us to optimize our estimation procedures solely based on endogenously generated data, which in the end might improve the estimation on real-world data.

In summary, when calibrating a trading strategy on a real-world price time series with unknown underlying true parameters, two separate problems are entangled:

1. Model Error

2. Estimation Error

These two issues need to be untangled from one another to allow us to study them separately. Here, we focus on the issue of model calibration (i.e. point 2.), as we employ a synthetic data approach with known true parameter values.

As an additional benefit of using synthetic data, we can create as many datasets as required for our analysis. We can ultimately focus on the calibration procedures of the model and on finding the optimal initial input variables.

In future work, as this is not in the scope of this thesis, one could run the trading strategy with optimal initial inputs on real-world price series and gain invaluable insights. Should, for example, the calibrated model work well on synthetic data but not on real time series, one can conclude that the model may be misspecified. In an alternative scenario where the calibrated model performs badly with real time series as well as synthetic data, the conclusion cannot be that the model is bad, but only that the calibration is inadequate.

As the reader will see in Chapter 2, the ECBM consists of different, rather complex estimation processes, all of which ultimately influence the investment decision at a given point in time. The influence of these estimation processes is examined more closely by creating synthetic data of various complexity levels that consider different features of the ECBM. Ultimately, we create datasets exhibiting different behaviour, ideal for analyzing the underlying processes of the model.


Chapter 2

The Efficient Crashes Bubble Model

2.1 Structure of the ECBM Trading Strategy

From time to time, asset prices in financial markets deviate for a prolonged time from their fundamental value. We call this behaviour, or more specifically, the resulting mispricing component in the asset price, a bubble. Positive (negative) bubbles grow and eventually crash (rally) back to the fundamental value of the asset. Crashes (rallies) may appear as single hefty market disruptions smashing the price down (up), or as a series of minor drawdowns (drawups) which occur over a longer time period [3]. In both cases, the price is eventually corrected back into the proximity of the fundamental price.

Many financial bubble models have been proposed in the past, but without final conclusion, as the definition of bubbles is somewhat difficult, in the sense that it is seldom possible to figure out the true fundamental price of an asset. Therefore, measuring the exact size of the bubble component is often not possible, which makes it difficult to test the proposed models. Furthermore, there is a lot of dispute about how to actually define a bubble [4]. Moreover, bubbles and crashes often appear on multiple time scales, in different asset classes, with varying shapes of the price trajectory, and so on, which raises the question whether there even is “the one” model that can universally describe bubble behaviour.

Overall, when designing a model for financial bubbles, it may be sensible to leave a lot of freedom to the evolution of the price, due to the numerous issues described above. One such model, the model that we focus on in this work, is the Super-Exponential Rational Expectations Bubble Model with Efficient Crashes that was proposed by Kreuser and Sornette [2] in 2018. Alongside the theoretical model, a trading strategy based on the Kelly criterion is developed and, as mentioned above, the results of the strategy's performance are demonstrated.

In essence and as schematically depicted in Figure 2.1, the strategy proposed in the paper consists of two distinct parts. The first one is the theoretical presentation of the ECBM. The second part is concerned with the practical estimation of the proposed model's parameters and using these estimates to construct a two-component portfolio (of a risk-free asset and the analyzed risky asset) by optimally allocating one's wealth in the sense of the Kelly Criterion, as described in Chapter 2.3.


In the following sections, we first discuss the two distinct parts of the strategy by introducing the theoretical model with its associated conditions and explaining how the trading policy is determined. Then we turn our attention to the three central estimation procedures, represented by the blocks in Figure 2.1, before we finally walk through the Python code and compare it to the Matlab implementation of the paper.


Figure 2.1: This simple flowchart shows all the important pieces of the ECBM trading strategy implementation. The two parts forming the strategy are the implementation of the ECBM, supplying all estimates to the second part, the Kelly Optimization Process, which in turn results in an investment policy given by $\lambda$. The input variables d, dtd, dtn are tuning parameters and influence the outcome of various estimation processes. Their influence on the various building blocks of the code and the model parameter estimates is depicted here.


2.2 Theoretical Model Description

The ECBM differentiates between three types of price behaviour: (i) usual asset non-bubble price growth, (ii) positive (and negative) bubble dynamics and (iii) price corrections through crashes or rallies. Equation 2.1 is the central equation describing how the model abstracts market behaviour. The market is described as a simple stochastic price process.

$$p_{t+1} = p_t \exp(a_t + \sigma \varepsilon_t) \tag{2.1}$$

We assume that $\sigma$ is constant and the noise is Gaussian. This assumption could be relaxed in future work.

Equation 2.2 describes the above-mentioned cases where the asset price grows and a bubble develops or the crash happens. The corresponding probabilities are given next to them.

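$$a_t = \begin{cases} r_t & \text{with probability } 1 - \rho_t, \text{ with } 0 \le \rho_t < 1 \\ \kappa_i \ln(q_t) + r_D & \text{with probability } \rho_t \eta_i \end{cases} \tag{2.2}$$

$$\text{with} \quad \sum_i \eta_i = 1, \qquad K = \sum_i \eta_i \kappa_i$$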

During phases of normal price growth, the price follows a slow-moving, normal price process that represents the long-term evolution of the fundamental value of the asset with $N_t = \exp(r_N t)$ and $r_N$ as the long-term average return. In the limit of large time, both the normal and the real price process are assumed to grow at the same rate. However, during a bubble, the real price temporarily deviates from the normal price process, resulting in a mispricing of the asset, expressed by $\ln(q_t) = \ln(N_t) - \ln(p_t)$. Now, with continued growth of the bubble, investors perceive a higher risk of a crash correcting the inflated asset price towards the normal price.

According to the rational expectations condition, investors demand a larger return of the asset for increasing mispricing. This becomes evident as the expected return $r_t$ is influenced by the mispricing as well as the probability and intensity of a crash. At some point, the price randomly deviates from the normal price, leading to the nucleation of a bubble. This starts a feedback mechanism that fuels unsustainable bubble growth, due to the following dynamics:

$$r_t = r_D - \frac{\rho K \ln(q_t)}{1 - \rho}$$

It can be seen from the formula for $r_t$ that investors price risk (and therefore return proportionally) as a function of the (perceived) probability of crash or rally, the expected average corrective crash size and the mispricing. Intuitively, this agrees with our picture of market dynamics.

But as the probability and expected intensity of a crash grow, which are both quantities that are assumed to grow as functions of increasing mispricing, a feedback mechanism is in place that introduces higher returns. Therefore, higher mispricing induces higher returns, but higher returns, in turn, increase the mispricing. This behaviour results in an unsustainable feedback mechanism of spiralling prices that


gives rise to super-exponential price dynamics that are deemed evidence of bubble behaviour.

Finally, a market correction in the form of a sudden jump or a series of minor jumps, originating from the Poisson process component of the price process, is bound to occur. The core idea of the Efficient Crashes Model is that such crashes or rallies will always correct the bubble price back towards the proximity of the normal price process. In this sense, occurring crashes are termed efficient, as they always tend to rebalance the price of an asset proportionally to the bubble mispricing. Assuming that bubbles occur repetitively and are followed by efficient crashes, this implies that in the long run, the real price will oscillate around the normal price. This backs the assumption that in the limit, the normal and real price grow at the same rate.

Crash probability is a function of mispricing too: $\rho_t = \frac{1 - q_t^{a}}{1 + b}$, with $a$ and $b$ as values estimated by the model.

The ECBM uses various approaches for the estimation of the model parameters, which will be discussed more closely in Chapter 2.4.
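As a quick numerical check of the formula for $r_t$, the following minimal Python sketch evaluates it and verifies that the probability-weighted drift over the no-crash branch and the average crash branch equals $r_D$. Reading the rational expectations condition as $E[a_t] = r_D$ is our assumption here; the function names and the parameter values are purely illustrative and are not taken from the ECBM code.

```python
import math


def no_crash_drift(r_d: float, rho: float, k_bar: float, ln_q: float) -> float:
    """Drift in the no-crash case: r_t = r_D - rho * K * ln(q_t) / (1 - rho)."""
    if not 0.0 <= rho < 1.0:
        raise ValueError("rho must satisfy 0 <= rho < 1")
    return r_d - rho * k_bar * ln_q / (1.0 - rho)


def expected_drift(r_d: float, rho: float, k_bar: float, ln_q: float) -> float:
    """Probability-weighted drift over the no-crash and the average crash branch."""
    r_t = no_crash_drift(r_d, rho, k_bar, ln_q)
    return (1.0 - rho) * r_t + rho * (k_bar * ln_q + r_d)


# Hypothetical daily values: the price sits 20% above the normal price,
# so ln(q_t) = ln(N_t) - ln(p_t) is negative (positive bubble).
r_d, rho, k_bar = 0.0003, 0.01, 0.5
ln_q = math.log(1.0 / 1.2)

r_t = no_crash_drift(r_d, rho, k_bar, ln_q)
print(r_t > r_d)  # investors are compensated for crash risk
print(abs(expected_drift(r_d, rho, k_bar, ln_q) - r_d) < 1e-12)
```

For a positive bubble the no-crash drift exceeds $r_D$, which is the return premium that feeds the mispricing described above.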

2.3 Optimal Investment Strategy Based on the Kelly Criterion

Equation 2.3 reveals how wealth is accumulated from one time step to the next for our portfolio with two assets. The optimal investment problem is concerned with allocating a portfolio with the expectation of obtaining an “optimal” result in the future. Optimality in this scenario refers to maximizing the expected log of wealth growth as described by Equation 2.4. This is called the Kelly approach [5], which combines all estimated quantities into an investment decision expressed by the Kelly criterion $\lambda$ [6].

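$$W_{t+1} = \left[\lambda_t \exp(a_t + \sigma \varepsilon_t) + (1 - \lambda_t) \exp(r_f)\right] W_t \tag{2.3}$$

with:

$W_t$ = Wealth at Time $t$

$\lambda_t$ = Kelly Allocation

$a_t$ = Deterministic Drift

$r_f$ = Risk Free Rate

$\sigma \varepsilon_t$ = Gaussian Noise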
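The wealth recursion of Equation 2.3 can be sketched in a few lines of Python. This is only an illustration under assumed values: the function name `wealth_step`, the constant allocation of 0.5 and all numerical parameters are hypothetical and not part of the ECBM implementation.

```python
import math
import random


def wealth_step(wealth, lam, a_t, sigma, eps, r_f):
    """One step of Equation 2.3: fraction lam in the risky asset, 1 - lam risk-free."""
    risky_growth = math.exp(a_t + sigma * eps)
    safe_growth = math.exp(r_f)
    return (lam * risky_growth + (1.0 - lam) * safe_growth) * wealth


# Illustrative run over 250 steps with a constant allocation (assumed values).
rng = random.Random(0)
wealth = 1.0
for _ in range(250):
    eps = rng.gauss(0.0, 1.0)  # standard normal noise
    wealth = wealth_step(wealth, lam=0.5, a_t=0.0004, sigma=0.01, eps=eps, r_f=0.0001)
print(round(wealth, 4))
```

In the actual strategy the allocation is not constant but recomputed at every rebalancing time from the current parameter estimates.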

The goal is to optimally allocate the wealth at fixed¹ rebalancing times to a two-component portfolio consisting of a risky and a risk-free asset. Constructing a portfolio consisting of one risky asset and a risk-free asset growing at the fixed rate of $r_f$ provides clarity and allows for easy tests of the underlying methodology of our trading strategy.

¹ In the current state of development of the model theory as well as the implementation, we only allow for fixed rebalancing times (that are represented by the input d). It is however a future goal to make the rebalancing time more flexible such that the model becomes more reactive in response to market changes.

As mentioned above, the fraction of the total wealth that is invested into the risky asset is $\lambda$, while $1 - \lambda$ represents the fraction invested in the risk-free asset. $\lambda$ can move freely within a bounded range, for instance $[-1, 1]$. As soon as $\lambda$ becomes negative, we short the risky asset; if it is positive, a long position worth the fraction $\lambda$ of the current wealth is taken. In a scenario where $\lambda = -1$, the strategy shorts the risky asset, and for $\lambda = 1$, it goes fully long in it.

We wish to maximize the following expression to find the optimal $\lambda$ in the current time step.

$$\max_{\lambda_t} E_t\left[\ln\left(\frac{W_{t+1}}{W_t}\right)\right] \equiv L(\lambda_t^*) \tag{2.4}$$

Ultimately, in the theoretical framework of the ECBM, an approximation formula for the optimal choice of $\lambda$ is derived:

$$\lambda^* \approx \frac{r_D - r_f + \sigma^2/2}{\sigma^2 + (1 - \rho)(r - r_f)^2 + \rho\,(K \ln(q_t) + r_D - r_f)^2}$$

As one can see, several parameters ($r_D$, $r_N$, $\sigma$, $\rho$, $K$) are required to compute the value of $\lambda$. They need to be estimated from the price series at hand. This is the target of the estimation procedures described in the paper and hard-coded in the implementation.

As stated before, we are interested in evaluating how well our code can estimate these parameters; in other words, we aim to investigate the calibration quality. We assume that a setting such as synthetic data, where the true values are known, can yield conclusions about how well the code can estimate the parameters on real-world datasets as well. The following section goes into further detail on this.
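To make the approximation formula for $\lambda^*$ concrete, the sketch below evaluates it for given parameter values and clips the result to an allowed range. Two points are assumptions on our side: the symbol $r$ in the denominator is treated as the drift in the no-crash case, and the clipping to $[-1, 1]$ follows the example range mentioned above. The function name and the numbers are hypothetical.

```python
def kelly_fraction(r_d, r, r_f, sigma, rho, k_bar, ln_q, bounds=(-1.0, 1.0)):
    """Approximate optimal risky fraction lambda*, clipped to the allowed range."""
    numerator = r_d - r_f + 0.5 * sigma ** 2
    denominator = (
        sigma ** 2
        + (1.0 - rho) * (r - r_f) ** 2
        + rho * (k_bar * ln_q + r_d - r_f) ** 2
    )
    if denominator == 0.0:
        raise ZeroDivisionError("lambda* is undefined when the denominator vanishes")
    lower, upper = bounds
    return max(lower, min(upper, numerator / denominator))


# No crash risk, drift above the risk-free rate: the upper bound is reached.
print(kelly_fraction(r_d=0.0005, r=0.0005, r_f=0.0001, sigma=0.01,
                     rho=0.0, k_bar=0.0, ln_q=0.0))

# Flat, noise-free price with a positive risk-free rate: the lower bound is reached.
print(kelly_fraction(r_d=0.0, r=0.0, r_f=0.0001, sigma=0.0,
                     rho=0.0, k_bar=0.0, ln_q=0.0))
```

Without clipping, small daily volatilities quickly push the raw ratio far outside $[-1, 1]$, which is why the bounds matter in practice.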

2.4 From Theory to Application: Model Parameter Estimation

As seen, the theoretical model and the Kelly optimization leave us with a number of model parameters that must be estimated. Here, the interface between theory and practical application comes into play. In order to arrive at real, numerical parameter estimates given input price data, we employ several estimation procedures (that are ultimately written to code). In this section, these estimation procedures are briefly summarized and their interplay is highlighted.

Figure 2.2 shows the three main building blocks of the ECBM code, as well as their inputs and the resulting estimation outputs. It is clear from this figure that not all initial inputs d, dtd, dtn influence all procedures to the same degree.

The Exponential Price Process Estimation, for one, is performed by fitting an exponential model to the price series using Ordinary Least Squares for both $r_N$, representing the rate of return given by a long-term window dtn, and $r_D$, representing


Figure 2.2: The three central estimation processes of the ECBM trading model, their inputs and respective outputs. In contrast to Figure 2.1, this flowchart shows a modified parameter estimation for $\rho$ with the greyed-out block defunct.


the discount rate given by the short-term window dtd. Contrary to dtd, which only exerts influence on $r_D$, dtn affects $\sigma$, $\rho$, and $K$ through the mispricing as well as $r_N$.

The Jump Estimation is a statistical test performed for the estimation of return jumps in the price series. By filtering out jumps, the jump-free volatility of the Geometric Brownian Motion given by $\sigma$, as well as the distribution of jumps and the expected intensity of a crash or rally $K$, can be calculated.

The Super-Exponential Probability Estimation returns the probability $\rho$ of a jump by modelling it as an accelerating power-law-type probability function and estimating its parameters by Weighted Least Squares.

Detailed information on the functional principles of the estimation components is given in the paper.
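The Exponential Price Process Estimation can be illustrated with a small OLS fit of the log-price against time over a trailing window. The sketch below is a simplified stand-in and not the ECBM code: the function name `exp_growth_rate`, the noise-free example series and the two window lengths are assumptions chosen for illustration.

```python
import math


def exp_growth_rate(prices, window):
    """OLS slope of ln(price) against time over the last `window` observations."""
    if window < 2 or window > len(prices):
        raise ValueError("window must be between 2 and len(prices)")
    y = [math.log(p) for p in prices[-window:]]
    t = list(range(window))
    t_mean = sum(t) / window
    y_mean = sum(y) / window
    cov = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    var = sum((ti - t_mean) ** 2 for ti in t)
    return cov / var


# Hypothetical noise-free series growing at 0.1% per step.
prices = [100.0 * math.exp(0.001 * i) for i in range(500)]
r_n = exp_growth_rate(prices, window=400)  # long window, plays the role of dtn
r_d = exp_growth_rate(prices, window=60)   # short window, plays the role of dtd
print(round(r_n, 6), round(r_d, 6))
```

On a noise-free exponential both windows recover the same rate; on real or noisy synthetic data the short window reacts faster but with a larger estimation variance.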

2.5 Python Model Implementation

As stated before, we use several estimation procedures to obtain model parameter estimates. The estimation procedures themselves depend partly on the parameter estimates provided by other estimation procedures and furthermore on several adjustable input variables. Altogether, the flowchart depicted above has been transformed into Python code.

The main function, dubbed execute_strategy.py, receives two sets of inputs: the price series of the risky asset and a collection of configuration conditions for the strategy to follow. These configuration conditions include, among other things, the utilized leverage, the duration of the trading period and the initial inputs d, dtd, dtn, whose influence on the estimation accuracy and trading performance this work investigates. A closer description of these conditions follows in Chapter 3. Several ancillary functions are employed to arrive at an investment decision at every rebalancing time. A flowchart showing the detailed steps performed by the Python implementation can be seen in Figure 2.3. The main function outputs metrics describing the performance of the strategy and the resulting wealth evolution. Within the scope of this work, it also returns the parameters estimated by the strategy at every rebalancing time.

The results in the paper were created with a Matlab implementation. In the following, we discuss the changes between the preceding Matlab implementation of the trading strategy and the recreated Python implementation. The implementation of the ECBM trading strategy in Python used in this work is slightly different from the implementation in Matlab used in the paper by Kreuser and Sornette.

The three main differences:

1. No $K_{\min,P}$ and $K_{\min,N}$

Lower bounds for the average corrective jump size in positive and negative bubbles, $K_{\min,P}$ and $K_{\min,N}$, were set in the Matlab implementation. This was not in line with the proposed theory and has therefore been excluded.

2. No more weighted $r_D$


During the exponential price process estimation, an exponential price is fitted to the price series using OLS. For estimating the discount rate $r_D$, this procedure was formerly executed in two steps. Firstly, a fit in a short-term window of 63 days (the reason for choosing this specific value is unknown) was done in order to obtain the short-term rate $r_{D,s}$. Secondly, another exponential fit with the window size of the input dtd (which usually is longer than 63 days) was carried out in order to get the longer-window rate $r_{D,l}$. Lastly, the real discount rate $r_D$ was estimated as a weighted average of the two rates:

$$r_D = 0.15\, r_{D,s} + 0.85\, r_{D,l} \tag{2.5}$$

In the Python implementation, we discarded this approach. We find that the weighted average overcomplicates the procedure by introducing two new variables, the short-term window size of 63 days and the weight of 0.15 in the average. These values are by no means easy to optimize and, as additional parameters, introduce unnecessary danger of overfitting into the implementation. Furthermore, effects similar to the one achieved by the weighting process can potentially be obtained by simply choosing a smaller value for dtd.

3. No super-exponential ρ estimation

Due to reasons elaborated in Chapter 3.1, the super-exponential probability estimation as indicated in Figure 2.2 is not used. Instead, it is replaced with a non-parametric probability estimate for $\rho$ calculated during the Jump Estimation process.
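The second difference in the list above can be made tangible with a short sketch of the former weighting step from Equation 2.5. The function name and the two example rates are hypothetical; only the weights 0.15 and 0.85 come from the description of the Matlab implementation.

```python
def weighted_discount_rate(r_d_short, r_d_long, weight_short=0.15):
    """Equation 2.5: weighted average of the short- and long-window rates."""
    return weight_short * r_d_short + (1.0 - weight_short) * r_d_long


# Hypothetical daily rates from a 63-day window and from a longer dtd window.
r_d_short, r_d_long = 0.0012, 0.0004

r_d_matlab_style = weighted_discount_rate(r_d_short, r_d_long)
r_d_python_style = r_d_long  # single window of length dtd, no weighting
print(r_d_matlab_style, r_d_python_style)
```

The weighted value lies between the two window estimates, which is why a single, somewhat shorter window can plausibly mimic it without the two extra tuning parameters.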

Figures 2.4 and 2.5 show that the trading strategy performs the same regardless of whether the Matlab or the Python implementation is used, provided the same input variables are chosen. Further analysis should therefore not be affected by the choice of implementation.


Figure 2.3: This figure, provided by Dr. Vladimir Korenev from the ER Group, shows the workflow of the Python implementation of the ECBM trading strategy in every detail.


Figure 2.4: Figure from Kreuser and Sornette [2] depicting the strategy, implemented in Matlab and run on the S&P 500 crash of 2008. The Optimal Efficient Portfolio outperforms the market and all other strategies.

Figure 2.5: Recreation of the trading payoff shown in Figure 2.4 with the recreated Python code. This figure was provided by Jan-Christian Gerlach of the ER Chair.


Chapter 3

Experimental Setting

3.1 Synthetic Data Generation

3.1.1 Motivation and Theory for Complexity Levels

The overarching goal is to understand the ECBM in its full complexity. This means analyzing all defining features of the model separately and then jointly. One could ask: what is the simplest model arising from the basic iterative equation (given in 2.1) on which the ECBM is based? How can we build up the complexity of the term $a_t$ step-by-step in order to arrive at the full model specification? In our analysis, we lead up to the full model specification by layering one level of the model over the other like the layers of an onion. This gives us a better understanding of how the model is built up and what parts it consists of. Also, we are curious to see what synthetic data at the different stages will look like.

To generate synthetic data from the model, the iterative equation for the simple stochastic price process with a discrete Poisson process from the ECBM is used. We recall:

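$$p_{t+1} = p_t \exp(a_t + \sigma \varepsilon_t)$$

$$a_t = \begin{cases} r_t & \text{with probability } 1 - \rho_t, \text{ with } 0 \le \rho_t < 1 \\ \kappa_i \ln(q_t) + r_D & \text{with probability } \rho_t \eta_i \end{cases}$$

with $\ln(q_t) = \ln(N_t) - \ln(p_t)$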

$a_t$ hides the intricacy of the model, describing either the drift of the stochastic price process or the crash/rally component, as shown above in the model presentation. The occurrence of the two cases is controlled by the daily crash probability $\rho_t$. The errors $\varepsilon_t$ are assumed to be zero-mean, independent and identically normally distributed. As the ECBM assumes $\sigma$, the volatility of the random process, to be homoscedastic, it is set that way for all levels of complexity of the synthetic data created. Therefore, only by specifying $a_t$ and the parameters therein can one influence model complexity and create price data of different sophistication.
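The two-branch structure of $a_t$ can be sketched as a small sampler. This is a minimal illustration under stated assumptions, not the generation code used in this work: the function names, the two jump sizes with equal probabilities and all numerical values are hypothetical, and $r_t$ and $\rho_t$ are passed in as given numbers rather than computed from the mispricing.

```python
import math
import random


def sample_drift(rng, r_t, r_d, rho_t, kappas, etas, ln_q):
    """Draw a_t: r_t with probability 1 - rho_t, otherwise kappa_i * ln(q_t) + r_D,
    where the jump size kappa_i is picked with probability eta_i."""
    if not 0.0 <= rho_t < 1.0:
        raise ValueError("rho_t must satisfy 0 <= rho_t < 1")
    if abs(sum(etas) - 1.0) > 1e-9:
        raise ValueError("the probabilities eta_i must sum to 1")
    if rng.random() >= rho_t:
        return r_t  # no crash or rally in this step
    kappa = rng.choices(kappas, weights=etas, k=1)[0]
    return kappa * ln_q + r_d


def price_step(rng, p_t, a_t, sigma):
    """One iteration p_{t+1} = p_t * exp(a_t + sigma * eps_t), eps_t ~ N(0, 1)."""
    return p_t * math.exp(a_t + sigma * rng.gauss(0.0, 1.0))


# Hypothetical state: price 20% above the normal price.
rng = random.Random(42)
price, normal_price = 120.0, 100.0
ln_q = math.log(normal_price) - math.log(price)
a_t = sample_drift(rng, r_t=0.0012, r_d=0.0003, rho_t=0.01,
                   kappas=[0.3, 0.6], etas=[0.5, 0.5], ln_q=ln_q)
next_price = price_step(rng, price, a_t, sigma=0.01)
print(next_price > 0.0)
```

Because $\ln(q_t)$ is negative in a positive bubble, a positive $\kappa_i$ produces a downward jump whose size scales with the mispricing.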


By specifying $a_t$, different complexity levels are generated, as described in Chapter 3.1.2 below. Starting with the trivial model, complexity is built up step-by-step until the full complexity formulation based on the full ECBM is realized. For each model stage, a set of 100 different price series was generated for the ECBM to run on during a Monte Carlo simulation. In the following chapter, we present the different complexity levels. For each level, we specify the choice of $a_t$ and show the resulting instance of the main iterative base equation. Furthermore, expectations for the trading model on the datasets are formulated.

3.1.2 Complexity Levels and Performance Expectations

Trivial Model

Figure 3.1: Plot of the Trivial Model implementation with a constant price over the entire period.

• Drift: $a_t = 0$

• Volatility: $\sigma = 0$

• Crash Probability: $\rho_t = 0$

• Resulting Equation: $p_{t+1} = p_t = p_0$

In this trivial model formulation, all parameters are zero. Without a drift component and with zero uncertainty, the price stays constant. In practice, the generated price series is irrelevant; nonetheless, it was included to see how the ECBM handles this situation. We would expect it to be fully invested in the risk-free asset with $\lambda = -1$, as the risk-free rate is larger than zero.


3.1. Synthetic Data Generation

Deterministic Exponential Model

Figure 3.2: Plot of the Deterministic Exponential Model implementation. In this plot, the price increases at a constant rate.

• Drift: $a_t = r_D > 0$

• Volatility: $\sigma = 0$

• Crash Probability: $\rho_t = 0$

• Resulting Equation: $p_{t+1} = c\,p_t$ with $c = \exp(r_D)$

In the above equation, a constant positive drift is introduced, resulting in an exponential price process. There is no risk of a crash and no market noise. As the discount price rate $r_D$ satisfies $r_D > r_f$, the trading model is expected to invest all wealth in the risky asset, i.e. $\lambda = 1$.

Stationary Geometric Random Walk

Figure 3.3: Plot of the Geometric Brownian Motion Model implementation. The price series exhibits homoscedastic volatility in addition to a constant drift.


• Drift: $a_t = r_D > 0$

• Volatility: $\sigma > 0$

• Crash Probability: $\rho_t = 0$

• Resulting Equation: $p_{t+1} = p_t \exp(r_D + \sigma \varepsilon_t)$

Augmenting the Deterministic Exponential Model with uncertainty yields a stationary Geometric Brownian Motion with constant drift. The behaviour of the ECBM strategy on this model will be interesting to observe.
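The first three complexity levels differ only in whether the drift and the volatility are zero, so a single loop over the resulting equation can illustrate all of them. The following is a minimal sketch using only the Python standard library; the function name `simulate_geometric_random_walk`, the seed handling and the chosen series lengths are assumptions made for illustration and are not taken from the thesis code.

```python
import math
import random


def simulate_geometric_random_walk(p0, r_d, sigma, n_steps, seed=0):
    """Iterate p[t+1] = p[t] * exp(r_d + sigma * eps[t]) with eps[t] ~ N(0, 1)."""
    rng = random.Random(seed)
    prices = [p0]
    for _ in range(n_steps):
        eps = rng.gauss(0.0, 1.0)  # zero-mean, unit-variance Gaussian noise
        prices.append(prices[-1] * math.exp(r_d + sigma * eps))
    return prices


R_D_DAILY = math.log(1.1) / 252  # daily drift corresponding to 10% per year

# sigma > 0: Stationary Geometric Random Walk over 14 years of business days
noisy = simulate_geometric_random_walk(100.0, R_D_DAILY, 0.01, 14 * 252)
# sigma = 0: Deterministic Exponential Model, growing by 10% over 252 steps
deterministic = simulate_geometric_random_walk(100.0, R_D_DAILY, 0.0, 252)
# r_d = 0 and sigma = 0: Trivial Model, the price stays at p0
trivial = simulate_geometric_random_walk(100.0, 0.0, 0.0, 252)
```

Setting both parameters to zero reproduces the constant price of the Trivial Model, and setting only the volatility to zero reproduces the Deterministic Exponential Model.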

Inefficient Crashes Model

Figure 3.4: Plot of the Inefficient Crashes Model implementation. The direction of the jumps is decided by a random coin toss and can therefore be erratic, as can be seen in Figure 3.5.

• Drift:

\[
a_t =
\begin{cases}
r_D & \text{with probability } 1 - \rho_t, \quad 0 \le \rho_t < 1 \\
\kappa + r_D & \text{with probability } \rho_t
\end{cases}
\]

• Volatility: $\sigma > 0$

• Crash Probability: $\rho_t = \text{const}$

• Resulting Equation: $p_{t+1} = p_t \exp(a_t + \sigma \varepsilon_t)$


In this implementation, inefficient crashes are introduced, in contrast to efficient crashes as defined in the paper by Kreuser & Sornette [2]. They are termed inefficient because the size and direction of the discrete event are not yet related to the mispricing but to the absolute price. As in all implementations from here on, the occurrence of a jump is governed by the probability $\rho$. Because the jump is independent of the mispricing, however, each jump is a crash or a rally with probability 0.5 each, i.e. the direction is decided by an unbiased coin toss. Further, no distribution is assumed for the crash amplitude; instead, crashes have a fixed relative amplitude $\kappa$, and the return of the asset is fixed at $r_D$, resulting in an asset growth independent of crash risk. These properties often result in a nonsensical price series, as can be seen in Figure 3.5, because the direction of each jump is decided arbitrarily. Consequently, this model is not pursued further in our analysis.

Figure 3.5: Plot of the Inefficient Crashes Model implementation showing somewhat erratic behaviour due to the random choice of the jump direction. While all other model implementations depicted here make this decision dependent on the mispricing, the Inefficient Crashes Model decides with an unbiased coin toss. The resulting price series grows to much larger heights than for the other implementations. Consequently, the Inefficient Crashes Model is not investigated further in this work.

Efficient Crashes Model

• Drift:

\[
a_t =
\begin{cases}
r_D & \text{with probability } 1 - \rho_t, \quad 0 \le \rho_t < 1 \\
\kappa \ln(q_t) + r_D & \text{with probability } \rho_t
\end{cases}
\]

• Volatility: $\sigma > 0$

• Crash Probability: $\rho_t = \text{const}$

• Normal Price: $N_t = \exp(r_N t)$ with $r_N = r_D$


Figure 3.6: Plot of the Efficient Crashes Model implementation in black and the normal price in orange. When the asset price deviates from the normal price, a crash or rally decreases the gap until the cycle starts again.

• Mispricing: $\ln(q_t) = \ln(N_t) - \ln(p_t)$

• Resulting Equation: $p_{t+1} = p_t \exp(a_t + \sigma \varepsilon_t)$

A normal price process and, based on it, a mispricing describing the deviation of the price series from the fundamental asset value are introduced to the model. While the expected return of the price series still remains constant, the model now contains efficient crashes, for which the jump amplitude is linked to the mispricing. We expect the trading strategy to perform similarly on this data as on the full complexity model.

Rational Expectations Efficient Crashes Model

Figure 3.7: Plot of the Rational Expectations Model implementation. In addition to the jump magnitudes, the price development is related to the current mispricing in this implementation.

• Drift:

\[
a_t =
\begin{cases}
r_t & \text{with probability } 1 - \rho_t, \quad 0 \le \rho_t < 1 \\
\kappa \ln(q_t) + r_D & \text{with probability } \rho_t
\end{cases}
\]

• Volatility: $\sigma > 0$

• Crash Probability: $\rho_t = \text{const}$

• Normal Price: $N_t = \exp(r_N t)$ with $r_N = r_D$

• Mispricing: $\ln(q_t) = \ln(N_t) - \ln(p_t)$

• Expected Return: $r_t = r_D - \dfrac{\rho_t K \ln(q_t)}{1 - \rho_t}$

• Resulting Equation: $p_{t+1} = p_t \exp(a_t + \sigma \varepsilon_t)$

At this complexity level, we first introduce the rational expectations condition, under which the future price is influenced by the current mispricing. We expect the ECBM to perform reasonably well on this model, since there is now feedback between the mispricing and the development of the price series.
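As a consistency check on the definitions listed for this level, one can take the expectation of $a_t$ over its two cases. The short derivation below uses only the listed expressions for $a_t$ and $r_t$; it treats $q_t$ as known at time $t$ and assumes that the fixed jump amplitude $\kappa$ coincides with $K$, which is an assumption of this sketch.

```latex
\begin{align*}
\mathbb{E}[a_t]
  &= (1-\rho_t)\, r_t + \rho_t \bigl(\kappa \ln(q_t) + r_D\bigr) \\
  &= (1-\rho_t)\, r_D - \rho_t K \ln(q_t) + \rho_t \kappa \ln(q_t) + \rho_t r_D \\
  &= r_D \qquad \text{for } \kappa = K .
\end{align*}
```

Under these assumptions the mispricing terms cancel, so the expected drift per time step equals $r_D$ even though the drift in each of the two cases depends on the mispricing.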

Kappa Distribution Model

Figure 3.8: Plot of the Kappa Distribution Model implementation. Price development is determined by the mispricing, while the jump magnitude is additionally influenced by a jump distribution. We chose this distribution to be normal.

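• Drift:

\[
a_t =
\begin{cases}
r_t & \text{with probability } 1 - \rho_t, \quad 0 \le \rho_t < 1 \\
\kappa_i \ln(q_t) + r_D & \text{with probability } \rho_t \eta_i
\end{cases}
\qquad \text{with } \sum_i \eta_i = 1, \quad K = \sum_i \eta_i \kappa_i
\]

• Volatility: $\sigma > 0$

• Crash Probability: $\rho_t = \text{const}$

• Normal Price: $N_t = \exp(r_N t)$ with $r_N = r_D$

• Mispricing: $\ln(q_t) = \ln(N_t) - \ln(p_t)$

• Expected Return: $r_t = r_D - \dfrac{\rho_t K \ln(q_t)}{1 - \rho_t}$

• Resulting Equation: $p_{t+1} = p_t \exp(a_t + \sigma \varepsilon_t)$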

In real-world markets, we assume jumps to happen in two situations: first, as a correction towards the normal price, and second, as continued growth of the bubble due to momentum or euphoria of the markets [7]. Thus, the jump size distribution most probably shows a concentration somewhere on the positive side and a smaller one on the negative side, and could be approximated by some form of a mixed distribution model. Synthetic data was not created in that manner, however, as the ECBM is not constructed for such an estimation but only to find a single $K$.¹ Therefore, the jump size distribution was modelled by a normal distribution² with $\kappa \sim \mathcal{N}(K, \sigma_\kappa^2)$.

Super Exponential Probability Model

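• Drift:

\[
a_t =
\begin{cases}
r_t & \text{with probability } 1 - \rho_t, \quad 0 \le \rho_t < 1 \\
\kappa_i \ln(q_t) + r_D & \text{with probability } \rho_t \eta_i
\end{cases}
\qquad \text{with } \sum_i \eta_i = 1, \quad K = \sum_i \eta_i \kappa_i
\]

• Volatility: $\sigma > 0$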

¹ An idea to implement exclusion zones for values in a normal distribution to arrive at a similar distribution was discarded, as the estimation error for $K$ would become unclear and ultimately the goal is to analyze the estimation performance of the model.

² The variance of the $\kappa$-distribution, $\sigma_\kappa^2$, is not to be confused with the variance of the price process, $\sigma^2$.


• Crash Probability: $\rho_t = \dfrac{1 - q_t^{a}}{1 + b}$

• Normal Price: $N_t = \exp(r_N t)$ with $r_N = r_D$

• Mispricing: $\ln(q_t) = \ln(N_t) - \ln(p_t)$

• Expected Return: $r_t = r_D - \dfrac{\rho_t K \ln(q_t)}{1 - \rho_t}$

• Resulting Equation: $p_{t+1} = p_t \exp(a_t + \sigma \varepsilon_t)$

This is the model used in the paper by Kreuser and Sornette [2]. Simulating the model with the super-exponential probability, constructed from the formula in the paper as shown above, resulted in control issues. The following two cases were observed:

1. The crash probability stays at a negligible level at all times and the price series does not experience crashes. The expected return is $r_t \approx r_D$, resulting in a pure geometric random walk.

2. A strong, hardly controllable feedback cycle between the super-exponential probability and the mispricing sets in, with the crash probability quickly reaching its maximum value (which is controlled by $b$) and remaining there. The reason is an imbalance between the bubble growth and the impact of jumps: the probability of jumps is high, but their magnitude is insufficient to bring the asset price back to the normal price. This leads to the price series deviating permanently from the normal price and diverging to infinity.

Figure 3.9: Plot of the Super-Exponential Model implementation when the price series goes to infinity, as described in case 2. The figure is included to give a sense of the blowup of the generated price series.


3.1.3 Synthetic Data Generation Procedure

The experimental procedure is carried out in three steps. First, different datasets of synthetic data are generated in preparation for a Monte Carlo simulation based on different complexity levels of the model. In a second step, the Monte Carlo simulation is performed on ETH's Euler supercomputer cluster. In the third and last step, estimation and trading performance as well as the relationship between them are studied and, ideally, optimal values for the initial parameters are found.

Synthetic time series data of various versions of the model at different complexity levels, as described in Chapter 3.1, are generated. In the scope of this work, a Python function package was written for the purpose of generating synthetic data. It can generate any complexity level, over any time period, with any combination of features. Furthermore, to allow future work to vary $\sigma$ over time, a feature based on Generalized AutoRegressive Conditional Heteroscedasticity (GARCH) was implemented.

The individual actions of the synthetic data generation algorithm at every timestep are as follows:

1. Gaussian Noise is generated

2. Based on the stochastic price process, price is calculated

3. Mispricing is calculated

4. Normal price is calculated

5. Evaluate whether timestep contains a jump or not

6. In case of jump, draw from Kappa Distribution to get κi and calculate newprice

7. Append price to price series
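The per-timestep actions listed above can be sketched for the Kappa Distribution Model as follows. This is a minimal illustration using only the Python standard library and not the function package written for this work: the function name `generate_kappa_model_series` and its arguments are hypothetical, the default values follow the daily parameters described in this chapter, the normal price is assumed to be scaled so that it starts at the initial price, and the order of the operations within one step is a choice of this sketch.

```python
import math
import random


def generate_kappa_model_series(n_steps, p0=100.0, r_n=math.log(1.1) / 252,
                                r_d=math.log(1.1) / 252, sigma=0.01,
                                rho=0.01, k_mean=0.2, sigma_kappa=0.2, seed=0):
    """Sketch of the per-timestep loop for the Kappa Distribution Model.

    Assumption: the normal price is scaled to start at p0,
    i.e. N_t = p0 * exp(r_n * t).
    """
    rng = random.Random(seed)
    prices = [p0]
    jump_flags = []
    for t in range(n_steps):
        eps = rng.gauss(0.0, 1.0)                    # Gaussian noise
        normal_price = p0 * math.exp(r_n * t)        # normal price N_t
        log_q = math.log(normal_price) - math.log(prices[-1])  # mispricing ln(q_t)
        is_jump = rng.random() < rho                 # jump with probability rho
        if is_jump:
            kappa = rng.gauss(k_mean, sigma_kappa)   # kappa_i ~ N(K, sigma_kappa^2)
            a_t = kappa * log_q + r_d
        else:
            # rational expectations drift r_t
            a_t = r_d - rho * k_mean * log_q / (1.0 - rho)
        prices.append(prices[-1] * math.exp(a_t + sigma * eps))  # append new price
        jump_flags.append(is_jump)
    return prices, jump_flags


prices, jump_flags = generate_kappa_model_series(n_steps=14 * 252)
```

Because the noise and the jump decision are drawn from a seeded generator at every step, calling the function twice with the same seed returns the same series, which mirrors the idea of reusing random seeds across complexity levels.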

Care was taken to always generate the different complexity levels of synthetic data from the same set of input parameters. All of these input values can be looked up in Table 3.1.

The starting price was set to 100 so that the percentage growth can easily be identified in plots. A window of 14 years of synthetic data leaves enough lead time for every type of estimation we want to perform during the trading process, specifically the exponential price process estimation for $r_N$, as the maximum window for dtn is 10 years as defined by our grid in Chapter 3.2. From market observation, we set $\sigma = 0.01$, $K = 0.2$ and the probability of a jump $\rho = 0.01$. Finally, $r_N = r_D = 0.1$ per year was chosen, as we want the asset to move stochastically around its fundamental value in the long run.

Furthermore, the same random seeds were used when generating the price series, meaning that the underlying stochastic process and the jump times are the same for all complexity levels. The highest model complexity that could be created reliably, the Kappa Distribution Model, was analyzed in most detail. It is essential to understand the limits and problems associated with calibration, given that there are no model errors in the synthetic data.

Parameters for SDG      Values
start time              01.01.2007
end time                31.12.2020
$P_0$                   100
$r_N$ (daily)           log(1.1)/252
$r_D$ (daily)           log(1.1)/252
$\sigma$                0.01
$\rho$                  0.01
$K$                     0.2
$\sigma_\kappa$         0.2

Table 3.1: All input parameters for the synthetic data generation process, given on a daily scale. The long period for the synthetic price series was chosen to have enough data for the exponential price process estimation. All remaining parameters were set according to standard market behaviour.

Figure 3.10: A case where the synthetically generated price series grows exponentially. Running the ECBM trading strategy under such unrealistic market conditions would not yield meaningful insights about the strategy performance, because the lowest price during the crash is still much larger than the starting price. The model would return a stellar CAGR independent of its behaviour during the trading period.

When generating synthetic data, two cases, depicted in Figures 3.10 and 3.11, became evident. Some datasets show a tendency to "run off" and grow to very large heights, while others exhibit behaviour not seen in real markets, such as double dips. The issue with the former price series is that, even at the lowest point of the crash, the price was still much larger than at the beginning of the trading window. As a consequence, the model would return very high CAGRs no matter the trading policy it determined, skewing our analysis. Consequently, a preselection of the generated price series was made to filter out price series with wild price growth, while the issue of double crashes was left to be addressed in future work.

Figure 3.11: A case of multiple dips in a synthetically generated price series. We chose not to filter out these occurrences while generating different price series for the subsequent Monte Carlo analysis.

The criteria chosen were (i) a maximum value of the price series < 1000 and (ii) the highest value of the price series lying within the trading window. To arrive at 100 price series meeting these two criteria, 1049 price series were randomly generated, resulting in a yield of about 10.5%. The preselection was deemed necessary to accurately simulate a market environment in which the ECBM would be deployed.

This simulation was carried out on ETH's supercluster Euler [8]. For this purpose, the execution of the grid search was parallelized by splitting the grid to be investigated into smaller pieces, which were assigned to separate CPUs.

The analysis shall reveal relations between the model performance and the input parameters. Ideally, we will show that the model acts correctly on statistically ideal synthetic data, i.e. the parameter estimation error of the code is small, and that its outperformance is robust with respect to the choice of input parameters.
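The two preselection criteria can be expressed as a simple filter over candidate price series. The sketch below is an illustration under stated assumptions rather than the code used for this work: the helper names `passes_preselection` and `select_series` are hypothetical, and the trading window is assumed to be given as a pair of inclusive list indices.

```python
def passes_preselection(prices, window_start, window_end, max_price=1000.0):
    """Return True if a price series meets both preselection criteria.

    prices is a list of prices; window_start and window_end are the
    inclusive index bounds of the trading window (an assumption of this sketch).
    """
    peak_index = max(range(len(prices)), key=prices.__getitem__)
    below_cap = prices[peak_index] < max_price                  # criterion (i)
    peak_in_window = window_start <= peak_index <= window_end   # criterion (ii)
    return below_cap and peak_in_window


def select_series(candidates, n_required, window_start, window_end):
    """Collect series until n_required pass; also report how many were examined."""
    accepted = []
    examined = 0
    for series in candidates:
        examined += 1
        if passes_preselection(series, window_start, window_end):
            accepted.append(series)
            if len(accepted) == n_required:
                break
    return accepted, examined
```

The ratio of accepted to examined series returned by `select_series` corresponds to the yield of the preselection.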

3.2 Running the ECBM on Synthetic Data

For every complexity level described in Chapter 3.1, we generate $n_{MC} = 100$ price datasets for the Monte Carlo simulation at the fixed values of the model parameters listed in Table 3.1.

The parameters of the ECBM run over the generated synthetic data were always the same and can be seen in Table 3.2. Trading is performed in a window of one year towards the end of the generated price series with $-1 \le \lambda \le 1$. Allowing $|\lambda| \le 2$ was considered but ultimately discarded; the bounds were kept at $\pm 1$ to allow symmetric long and short positions, which we believe better isolates model performance. Nevertheless, the plots and analysis for $\lambda \in [-1, 2]$ are included in Appendix B, as those were the boundaries used in the paper by Kreuser and Sornette [2].

The ECBM Python implementation takes 5 initial input parameters, $\alpha$, $K$, d, dtd and dtn, of which $\alpha = 0.02$ and $K = 60$ were found to perform best in previous testing.

This leaves us with the following three inputs, a "triplet" of sorts, which in combination always form a "strategy":

• d: Portfolio rebalancing time [days]

• dtd: Short-term window for exponential price process estimation [days]

• dtn: Long-term window for normal price process estimation [days]

Parameters for ECBM        Values
start time                 01.04.2018
end time                   01.04.2019
lower lambda boundary      -1
upper lambda boundary      1
risk free rate             0.02
α                          0.02
K                          60
business days per year     252
business days per month    21

Table 3.2: Parameters of the trading model that were kept invariant over the whole analysis. A trading window of one year gives enough time for the strategy to perform. The boundaries for lambda were chosen symmetrically so as not to introduce a discrepancy between positive and negative returns. K is the spot volatility estimation window size of 60 days and α the significance level of 2% in the jump test.

A grid search is performed by running the ECBM code over the $n_{MC}$ simulated datasets, each with about 2300 grid points spanned by the triplets (d, dtd, dtn). Not every combination of the grid values shown below is examined: we ignore cases where dtn < dtd, as this goes against the basic idea of the ECBM of having a long-term and a short-term moving window. The individual grid points of this cubic search space are defined as follows:

• d = 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20

• dtd = 63, 126, 189, 252, 315, 378, 441, 504, 567, 630, 693, 756, 819, 882, 945, 1008

• dtn = 504, 756, 1008, 1260, 1512, 1764, 2016, 2268, 2520
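The grid defined above, including the exclusion of triplets with dtn < dtd, can be reproduced in a few lines. This is a sketch for illustration only; the constant names are hypothetical and the code is not part of the grid search implementation of this work.

```python
from itertools import product

D_VALUES = list(range(3, 21))                 # rebalancing time, steps of 1 day
DTD_VALUES = list(range(63, 1008 + 1, 63))    # short-term window, quarterly steps
DTN_VALUES = list(range(504, 2520 + 1, 252))  # long-term window, yearly steps

# Keep only triplets whose long-term window is at least as long as the short-term one.
grid = [(d, dtd, dtn)
        for d, dtd, dtn in product(D_VALUES, DTD_VALUES, DTN_VALUES)
        if dtn >= dtd]

print(len(grid))  # 2376
```

With 18 values for d and 132 admissible (dtd, dtn) pairs, the grid contains 2376 triplets, which is in line with the roughly 2300 grid points quoted in the text.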

Maximum grid values were chosen based on experience gathered in previous experimentation, and the space between them was filled linearly: for d in steps of one day, for dtd in steps of a quarter year (63 business days), and for dtn in yearly steps of 252 business days. For future testing, it could also be worthwhile to explore a logarithmically spaced grid, as this may reduce the number of grid points and thereby the computational effort, while still covering many different time scales that are practically relevant to investors. This was not pursued in this work, as the maxima were known in advance and the computational power necessary to analyze all grid points was not a limiting factor.

We recall our implementation of the ECBM trading strategy in Python, discussed in Chapter 2.5. The function execute_strategy.py takes one set of initial inputs consisting, among other things, of d, dtd and dtn. Together they form a strategy, which is then run over a price series, returning a file with metrics describing the performance of the strategy and the resulting wealth evolution. The grid search procedure feeds all unique triplets of the grid defined above into this Python code one after the other, resulting in about 2300 strategies run per price series. Each triplet in the grid thus represents one strategy from which we obtain a set of outputs. These outputs are saved jointly into tables indexed by the (d, dtd, dtn) triplet they correspond to and the price series on which they were generated. The grid search procedure yields a portfolio evolution, parameter estimates for every time step, and trading performance metrics for every unique combination of the input triplets. Aggregating these results over all grid points and all Monte Carlo iterations leaves us with about 2300 strategies per price series times 100 price series, i.e. about 230000 data points.

This plethora of data allows us to fulfil our primary objective of determining the relationship between the initial input parameters and (1) the resulting parameter estimation, (2) the trading performance and (3) the relationship between the two.

3.3 Data Preparation

Given a price series as well as the fixed and strategy inputs, the code returns a series of investment decisions $\{\lambda\}$ in steps of $d$ days over the simulated timeframe $[t_s, t_e]$. This self-financing trading strategy $\{\lambda\}$ results in an overall wealth evolution that is unique to the chosen strategy triplet (dtd, dtn, d). Recall that $\lambda$ represents the fraction of wealth (the sum of a risk-free bank account and the money invested in the risky asset) that is invested in the risky asset. As we only adjust our portfolio in regular time steps of $d$ days, we need to make sure that all model parameters are estimated on a d-day horizon, such that the investment decision is valid over the entire next period of $d$ days. If we simply estimated daily parameter values, or mixed up d-day and daily estimates, the corresponding $\lambda$ would not be valid over the entire next investment period. As a consequence, the quantities $r_N$, $r_D$, $\sigma$, $\rho_t$ and $K$ are all estimated on a d-day scale by the code. In the synthetic data generation, however, we specify daily values for these parameters, as our iterative equation samples in discrete steps of one day. To make the model parameter values estimated by the code on a d-day scale comparable to the true parameter values specified on a daily scale, we need to convert the estimates to a daily scale. Only then can we compute the estimation error. In the following, we describe how to rescale the parameters and provide formulae for each of them.

For the results of the price estimation, the conversion looks as follows. Clearly, if an asset grows by $r_{\text{daily}}$ per day, then over a period of $d$ days it will grow by $r_{\text{dday}} = d \cdot r_{\text{daily}}$. Therefore, we can simply convert the rates by:

\[
r_{N,\text{daily}} = \frac{1}{d}\, r_{N,\text{dday}}, \qquad r_{D,\text{daily}} = \frac{1}{d}\, r_{D,\text{dday}}
\]


Similarly, for the volatility, we find proportionality to the square root of $d$:

\[
\sigma_{\text{daily}} = \frac{1}{\sqrt{d}}\, \sigma_{\text{dday}}
\]

For $\rho$ and $K$, the transformation is not as straightforward. For instance, we cannot simply multiply the probability by $d$, as this could result in values for $\rho$ exceeding 1.

Figure 3.12: A simplified scenario of a price series where jumps are indicated in red and d-day intervals are delimited by larger vertical lines. Jumps do not occur on all days, and a d-day interval may contain different numbers of jumps.

At this point, we need to elaborate in more detail on the mechanics of the jump estimation procedure in order to clarify how $\rho_{\text{dday}}$ and $K_{\text{dday}}$ can be rescaled. For a better understanding of the description, it is important to pay specific attention to the scale, daily or d-daily, of the quantities we refer to.

At each d-day time $d_i$ in the analysis timeframe $[t_s, t_e]$, we look back a number of d-day intervals into the past. We do this in order to obtain estimates for each interval and then average these over all intervals in the lookback, such that the accuracy of the estimates is increased. For each single interval in the lookback horizon, we first determine its bubble state (the interval type) via the value of the mispricing $q_t$ at the beginning of the interval. An interval can be a positive bubble interval (PBI), a negative bubble interval (NBI) or a no-bubble interval (NOBI). Next, we sum up the detected daily jumps over the interval to obtain a d-day equivalent jump. A single d-day interval can contain as few as 0 and at most $d$ daily jumps, as visible in Figure 3.12. When an interval contains at least one jump, it is labeled as a jump interval, no matter whether 1 or up to $d$ daily jumps occur in it. Each interval is thus labeled as either a positive bubble jump interval (1) (PBJI), a negative bubble jump interval (-1) (NBJI), a no-bubble jump interval (0.5) (NOBJI) or, if it contains no jump, a no-jump interval (0) (NJI).

As stated, the jump analysis is performed over multiple past d-day intervals. For each interval, a label pos./neg./no-bubble-jump or no-jump is obtained. Then, $K_{\text{dday}}$ is estimated as the average of the detected equivalent d-day jump magnitudes, and $\rho_{\text{dday}}$ is the number of jump intervals of pos./neg. type divided by the number of analyzed intervals of that type. So, in each analysis step of $d$ days, ignoring no-bubble jumps, we actually obtain a positive and a negative estimate for each of $K_{\text{dday}}$ and $\rho_{\text{dday}}$:

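1. $K_{\text{dday},PB} = \dfrac{1}{N(\text{PBJI})} \sum_{j=1}^{N(\text{PBJI})} \kappa_{\text{pos},\text{dday},j}$

2. $K_{\text{dday},NB} = \dfrac{1}{N(\text{NBJI})} \sum_{j=1}^{N(\text{NBJI})} \kappa_{\text{neg},\text{dday},j}$

3. $\rho_{\text{dday},PB} = \dfrac{N(\text{PBJI})}{N(\text{PBI})}$

4. $\rho_{\text{dday},NB} = \dfrac{N(\text{NBJI})}{N(\text{NBI})}$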

where $N(\cdot)$ denotes an operator that counts the number of d-day intervals of the specified type, and $\kappa_{\text{pos},\text{dday},j}$ and $\kappa_{\text{neg},\text{dday},j}$ denote the d-day equivalent jump sizes of the $j$th analyzed interval in the lookback.

For the estimation of $\lambda_i$ at the analysis time $d_i$, we select the positive (PB) or negative (NB) bubble estimates depending on the bubble state (expressed by the mispricing $q_t$), as we proceed through the analysis timeframe $[t_s, t_e]$ in steps of $d$ days.

Based on the knowledge of how jumps are calculated, we can now infer the conversion formulae for the remaining quantities. As an example, Figure 3.12 shows a simplified scenario with two and three daily jumps occurring in two different d-day intervals. Both intervals would be labeled as jump intervals. We can ask: given the (constant) probability of a daily jump occurring in each time step, how does the interval size $d$ influence the probability that a d-day interval will be labeled as a jump interval? The probability of a d-day interval being a jump interval, $\rho_{\text{dday}} := \Pr(\text{d-day interval is a jump interval})$, is the same as the probability that the interval contains at least one jump, because in every such case it will be labeled as a jump interval. We can rewrite $\rho_{\text{dday}}$ as:

\[
\rho_{\text{dday}} = 1 - \Pr(\text{d-day interval is a no-jump interval})
\]

Now, the probability that the d-day interval is not a jump interval can be computed as the probability of flipping a $\rho_{\text{daily}}$-biased coin for every day in the interval and obtaining the event "no daily jump" (probability $1 - \rho_{\text{daily}}$) $d$ times. This translates into the following relation:

\[
\rho_{\text{dday}} = 1 - (1 - \rho_{\text{daily}})^{d}
\]

For instance, for $\rho_{\text{daily}} = 0.01$ (1% probability of a daily jump) and $d = 10$, we obtain $\rho_{\text{dday}} = 0.0956$. Intuitively, it makes sense that a larger $\rho_{\text{dday}}$ is obtained with growing $d$ or $\rho_{\text{daily}}$. Rearranged for the daily probability, this means:

\[
\rho_{\text{daily}} = 1 - (1 - \rho_{\text{dday}})^{1/d}
\]

Hence, we have found a conversion formula to apply to the d-day estimates of the probability that are returned by the code.
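The conversions for the rates, the volatility and the jump probability derived so far can be collected in a few small helper functions. The sketch below is for illustration; the function names are hypothetical and not taken from the ECBM code.

```python
import math


def rate_to_daily(r_dday, d):
    """Growth rates scale linearly with the horizon: r_daily = r_dday / d."""
    return r_dday / d


def sigma_to_daily(sigma_dday, d):
    """Volatility scales with the square root of the horizon."""
    return sigma_dday / math.sqrt(d)


def rho_to_dday(rho_daily, d):
    """Probability that a d-day interval contains at least one daily jump."""
    return 1.0 - (1.0 - rho_daily) ** d


def rho_to_daily(rho_dday, d):
    """Inverse of rho_to_dday: daily jump probability from the d-day estimate."""
    return 1.0 - (1.0 - rho_dday) ** (1.0 / d)


print(round(rho_to_dday(0.01, 10), 4))  # 0.0956
```

Applying `rho_to_daily` to the output of `rho_to_dday` recovers the daily probability, which is a quick way to check that the two formulas are consistent.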


When deriving a similar expression for the average daily corrective crash size $K_{\text{daily}}$, the fact that the jumps are summed up over the d-day interval again needs to be taken into account. In an interval with $k \le d$ jumps with respective jump magnitudes $\kappa_i$, the total jump size is:

\[
\kappa_{\text{tot}} = \sum_{i=1}^{k} \kappa_i
\]

So the expected jump size for $k$ jumps is:

\[
K_k = \mathbb{E}[\kappa_{\text{tot}}] = \sum_{i=1}^{k} \mathbb{E}[\kappa_i] = k K_{\text{daily}} \tag{3.1}
\]

Next, the conversion formula for the expected average corrective crash size is derived. In the synthetic data generation, we assume that a daily jump occurs with a daily probability of $\rho_{\text{daily}}$ and has an expected magnitude $K_{\text{daily}}$. If we regard the occurrence of daily jumps as the outcome of a $\rho_{\text{daily}}$-biased coin flip, the number of jumps $k$ occurring in a sequence of jumps and no-jumps of length $d$ is distributed according to a Binomial distribution (with $\rho = \rho_{\text{daily}}$):

\[
P_k := \Pr(k \text{ jumps in } d) = \binom{d}{k} \rho^{k} (1 - \rho)^{d-k}
\]

We regard a discrete probability space where the events $k = 0, 1, 2, \dots, d$ may occur. The sum of all probabilities must be equal to one. Since only intervals with at least one jump enter the estimate of $K_{\text{dday}}$, we need to normalize the probabilities resulting from the Binomial distribution by their sum over $k \ge 1$,

\[
P_{k,\text{tot}} = \sum_{k=1}^{d} P_k
\]

such that

\[
\pi_k := \frac{P_k}{P_{k,\text{tot}}} \tag{3.2}
\]

are the normalized probabilities of the regarded probability space and

\[
\sum_{k=1}^{d} \pi_k = 1
\]

is fulfilled. The quantity $\pi_k$ is then the probability that an interval with $k$ jumps occurs, given the daily probability of a jump. Putting together Equations 3.1 and 3.2, we have the expected corrective size of an interval with $k$ daily jumps as well as the probability of a $k$-jump d-day interval occurring. We can now take the discrete expectation over $k$ and find:

\[
K_{\text{dday}} := \mathbb{E}_k[K] = \sum_{k=1}^{d} \pi_k K_k = \sum_{k=1}^{d} \frac{P_k}{P_{k,\text{tot}}}\, k K_{\text{daily}} = f_d K_{\text{daily}} \tag{3.3}
\]


\[
K_{\text{daily}} = \frac{1}{f_d}\, K_{\text{dday}}
\]

As our specified $K_{\text{daily}}$ is constant, we can factor it out in Equation 3.3; the resulting pre-factor $f_d$ is a constant that depends only on $d$ and $\rho_{\text{daily}}$, as intuitively makes sense.

With conversion formulas for all relevant parameters $r_N$, $r_D$, $\sigma$, $\rho_t$ and $K$ derived, we can convert the d-day estimates returned by the grid search to the same scale as the true values and are now in a position to compare the estimates of the ECBM implementation to the true values used in the synthetic data generation.
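The pre-factor $f_d$ from Equation 3.3 can be evaluated numerically by summing the normalized Binomial probabilities. The sketch below is an illustration with hypothetical function names. As a cross-check it also evaluates the closed form $f_d = d\,\rho_{\text{daily}} / (1 - (1 - \rho_{\text{daily}})^d)$, which follows because the sum of $k P_k$ is the Binomial mean and $P_{k,\text{tot}}$ is the probability of at least one jump; this closed form is not stated in the text above.

```python
import math


def jump_interval_factor(d, rho_daily):
    """Pre-factor f_d in K_dday = f_d * K_daily.

    f_d is the expected number of daily jumps in a d-day interval,
    conditional on the interval containing at least one jump.
    """
    # Binomial probabilities P_k of exactly k daily jumps in d days, k = 1..d
    p_k = [math.comb(d, k) * rho_daily ** k * (1.0 - rho_daily) ** (d - k)
           for k in range(1, d + 1)]
    p_tot = sum(p_k)  # normalisation P_{k,tot}
    return sum(k * p for k, p in enumerate(p_k, start=1)) / p_tot


def k_to_daily(k_dday, d, rho_daily):
    """Convert a d-day corrective jump size estimate to the daily scale."""
    return k_dday / jump_interval_factor(d, rho_daily)


f_10 = jump_interval_factor(10, 0.01)
closed_form = 10 * 0.01 / (1.0 - (1.0 - 0.01) ** 10)
```

For $d = 1$ the factor equals one, since a one-day jump interval contains exactly one daily jump.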


Chapter 4

Results and Discussion

4.1 Estimation Performance

4.1.1 General Goal

We want to investigate the estimation performance in relation to the initial input variables and, ideally, find the optimal relationship between them. For this purpose, we analyze the highest model complexity at our disposal, the Kappa Distribution Model as described in Chapter 3.1. We recall the structure of our data. After generating a synthetic price series, we run the ECBM trading strategy on it for every triplet (d, dtd, dtn) in the grid defined in Chapter 3.2. This is carried out for all $n_{MC} = 100$ generated price series, resulting in about 230000 strategies in total. For every single strategy, the Python function returns time-varying estimates during our defined trading period. These estimates are averaged per strategy before we compare them to the set inputs described in Chapter 3.

The resulting data was aggregated over these 100 price series to examine all unique triplets and their influence on the estimates. Results are depicted using two types of plots showing the performance and accuracy of the estimates returned by the ECBM compared to the values set in the data generation process. Probability density functions (PDFs) of the estimates conditional on the initial input variables are used; in addition, the accuracy of the estimation is investigated with mean squared error (MSE) and mean absolute percentage error (MAPE) plots. These were calculated in the following way:

\[
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 \tag{4.1}
\]

\[
\mathrm{MAPE} = \frac{100}{n} \sum_{i=1}^{n} \left| \frac{Y_i - \hat{Y}_i}{Y_i} \right| \tag{4.2}
\]

where $\hat{Y}_i$ are the estimated values and $Y_i$ the "true" values set during the synthetic data generation process.

Every input variable and its influence on the estimated parameters is assessed separately in the following sections.
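The two error measures can be computed directly from lists of true and estimated values. The following sketch is for illustration; the function names and the example numbers are hypothetical and do not correspond to results of this work.

```python
def mean_squared_error(y_true, y_est):
    """MSE = (1/n) * sum((Y_i - Yhat_i)^2)."""
    n = len(y_true)
    return sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_est)) / n


def mean_absolute_percentage_error(y_true, y_est):
    """MAPE = (100/n) * sum(|(Y_i - Yhat_i) / Y_i|); needs non-zero true values."""
    n = len(y_true)
    return 100.0 / n * sum(abs((y - y_hat) / y) for y, y_hat in zip(y_true, y_est))


# Hypothetical example: a true value of 0.01 and three made-up estimates
true_values = [0.01, 0.01, 0.01]
estimates = [0.008, 0.012, 0.009]
mse = mean_squared_error(true_values, estimates)
mape = mean_absolute_percentage_error(true_values, estimates)
```

Because the MAPE divides by the true value, it is only defined when the true values are non-zero, which holds for the parameters set in the synthetic data generation.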


4.1.2 Initial Input d

Parameter $d$ is an initial input for the jump estimation as well as the super-exponential probability estimation. As can be seen in Figure 2.2, it is expected to affect the estimation of $\sigma$, $K$ and $\rho$.

Figure 4.1 shows no real change in the probability density function (PDF) of the $\sigma$ estimate over changing input $d$. The mode of the PDF is close to the true value, while the median and mean of the estimates are slightly farther away. A calculation of the volatility without filtering out jumps returns a significantly larger estimate than the true input. The immobile nature of the PDFs for changing input leads us to believe that most jumps can be detected even for small values of $d$. This might be grounds to reevaluate the choice of the set jump magnitude in future work, as it does not seem challenging for the model to pick up. This is underlined by the fact that the MAPE plot shows a small error for all input values.

Looking at the estimation results for $K$, Figure 4.2 shows little change over varying inputs $d$. The mode of the PDF is close to the set value, while the median and mean values do not seem to change over the whole range of analysis either. Again, the reason for this behaviour seems to lie in a jump detection that performs well even with small values of $d$ and, more importantly, does not improve for larger windows. A look at the error plots in Figure 4.5 confirms this lack of improvement across different $d$: the MAPE shows a large and oscillating relative error.

The estimation performance of $\rho$ appears to be almost independent of the initial input $d$. In the PDFs, mean, median and mode are quite close to each other, but the value set during the synthetic data generation is larger. It should be noted, however, that the scale of the x-axis is quite wide and makes the estimates appear even farther apart. The error is small in absolute terms, but the relative error of the estimation is quite large at 50%, as seen in Figure 4.6.

Overall, varying the initial input $d$ has very little influence on the quality of the estimates. Furthermore, the error of the resulting estimates is quite high, with the exception of $\sigma$.


Figure 4.1: Figure studies the behaviour of the Kappa Distribution model. PDF ofthe σ estimate for all initial inputs d. The green and red vertical lines indicate themedian respectivley the mean of the estimation, while the orange line indicates thestandard deviation of the price series without �ltering for jumps. We see virtuallyno change in estimation quality for σ for changing d. It should be noted that thebulk of estimations made by the model are much closer to the true value than thesimple estimation.

Figure 4.2: Behaviour of the Kappa Distribution model. PDF of the K estimate, calculated on a daily scale, for all initial inputs d, with the green and red vertical lines indicating the median and the mean of the estimation, respectively. The median and mean of the estimation are close to the true value for small d, although the mode is significantly smaller. For increasing d, the spread of the estimation decreases and mode, median and mean are close to each other but further away from the true value.

Figure 4.3: Behaviour of the Kappa Distribution model. PDF of the ρ estimate, calculated on a daily scale, for all initial inputs d, with the green and red vertical lines indicating the median and the mean of the estimation, respectively. The estimation of ρ shows very little variation with changing d. The reason might be that the super-exponential estimation is not performed.

Figure 4.4: Behaviour of the Kappa Distribution model. (a) MSE plot of σ (left) and (b) MAPE plot of σ (right). Note that in the MAPE plot the error is comparatively small and varies little with increasing d.

Figure 4.5: Behaviour of the Kappa Distribution model. (a) MSE plot of K (left) and (b) MAPE plot of K (right). The errors are small in absolute numbers but quite significant in relative terms. Although there is some variation with changing d, no trend can be determined conclusively.

Figure 4.6: Behaviour of the Kappa Distribution model. (a) MSE plot of ρ (left) and (b) MAPE plot of ρ (right). For the estimation of ρ, the errors first decrease and then increase again with higher d. Note, however, that the differences are quite small in relative terms.


4.1.3 Initial Input dtd

Initial input dtd is the short-term window used in the exponential price process estimation for the parameter rD, as indicated in Figure 2.2. The conditional PDF shows a large spread for small initial inputs dtd; this spread becomes smaller with larger windows. This is in line with the MSE plot, which shows a decreasing estimation error with higher dtd and suggests that the estimator is consistent.

The initial input has a significant effect on the estimation quality of rD, but one has to note the large relative error even in the most favourable case of very large dtd. This raises concerns about the overall effectiveness of the exponential price process estimation.
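
A rough back-of-the-envelope check illustrates why the relative error can stay large even for long windows. The sketch below assumes, for illustration only, that the growth rate is estimated as the mean of i.i.d. Gaussian daily log-returns, so that its standard error is σ divided by the square root of the window length; the estimator actually used in the ECBM may differ, and the daily parameter values are hypothetical.

```python
import math


def drift_relative_error(mu: float, sigma: float, window: int) -> float:
    """Standard error of the mean daily log-return over `window` days,
    expressed relative to the true drift mu (assumes i.i.d. returns)."""
    standard_error = sigma / math.sqrt(window)
    return standard_error / abs(mu)


# Hypothetical daily parameters, not the values used in the thesis.
mu_daily = 0.0004
sigma_daily = 0.01

for window in (21, 63, 252, 1260):
    rel = drift_relative_error(mu_daily, sigma_daily, window)
    print(f"window = {window:5d} days  relative standard error ~ {100 * rel:.0f} %")
```

Under these assumptions the error shrinks only with the square root of the window, so the relative error remains of the order of 100% even for a window of a full trading year. This is consistent with an estimator that improves with dtd yet stays imprecise in relative terms.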

Figure 4.7: Behaviour of the Kappa Distribution model. PDF of daily rD estimates for different dtd, with the green and red vertical lines indicating the median and the mean of the estimation, respectively. With larger dtd windows the spread of the rD estimation becomes smaller and the mean as well as the median move closer to the true value.

Figure 4.8: Behaviour of the Kappa Distribution model. (a) MSE plot of rD (left) and (b) MAPE plot of rD (right). The errors are quite small in absolute terms and decrease with higher initial input. It stands out, however, that the relative error is still around 100% in the best-case scenario.


4.1.4 Initial Input dtn

Parameter dtn is an initial input in the exponential price process estimation and is used to find an accurate value for rN. In this, we expect it to behave similarly to dtd in its estimation of rD. Additionally, rN influences the normal price development and consequently the mispricing, as can be seen in Figure 2.2. Those two quantities are expected to affect model behaviour significantly by exerting influence on the estimates of σ, K and ρ.

Figure 4.9 shows a lower spread, and mean and median values of the estimation approaching the true set value, with higher initial input. The error plots also show a decreasing estimation error with larger window size, but the relative error is quite large, even though it decreases as well. This suggests that the estimator of rN is consistent in dtn.

Estimation results for ρ do not change significantly for larger dtn windows, but the spread in the PDF plot (Figure 4.10) decreases and mean and median converge. The relative error is around 50% and does not improve with higher dtn as one might expect.

We can observe a significant improvement in the PDF plot depicting the estimation performance of σ over increasing dtn (Figure 4.11). Similarly, in Figure 4.15 one can clearly see a reduction in estimation error with larger dtn windows.

The estimation results for K seen in Figure 4.12 show little change with varying estimation window sizes. The mode of the estimation shifts towards the true, set value for higher dtn, but mean and median show little movement. The error of the estimation is quite high in relative as well as absolute terms, as can be seen in Figure 4.16, although it decreases with higher initial input dtn.

Figure 4.9: Behaviour of the Kappa Distribution model. The conditional PDF of the rN estimation shows a reduced spread as well as convergence of the median and mean values, in green and red, to the true value, depicted in blue, for larger dtn. The mode of the estimation, albeit close, is still not at the true value.

Figure 4.10: Behaviour of the Kappa Distribution model. PDF of the ρ estimate, calculated on a daily scale, for all initial inputs dtn, with the green and red vertical lines indicating the median and the mean of the estimation, respectively. While the spread of the estimation decreases with increasing dtn, the true value is still missed, although the small difference in absolute terms should be noted.

Figure 4.11: Behaviour of the Kappa Distribution model. PDF of the σ estimate for all initial inputs dtn. Median and mean of the estimation are indicated by green and red vertical lines, respectively, while the orange line indicates the simple standard deviation of the price series. Increasing dtn has a positive effect on the accuracy of the estimation: mean, median and mode approach the true value. Compared to this, the simple standard deviation of the price series is farther off, although the absolute difference is small.

Figure 4.12: Behaviour of the Kappa Distribution model. From the conditional PDF of K, calculated on a daily basis, hardly any improvement of the estimation performance with changing dtn can be observed. The spread shows some movement, as do mean and median, but the changes are small and no trend can be determined.

Figure 4.13: Behaviour of the Kappa Distribution model. (a) MSE plot of rN (left) and (b) MAPE plot of rN (right). The MSE plot shows a small error in absolute terms that decreases with higher dtn. The MAPE plot shows the relative error to be quite significant but steadily decreasing with higher initial input.

Figure 4.14: Behaviour of the Kappa Distribution model. (a) MSE plot of ρ (left) and (b) MAPE plot of ρ (right). The error in estimating ρ decreases considerably when the dtn window is increased to 1260 business days, but increases by ∼ 2% when the window is enlarged further.

Figure 4.15: Behaviour of the Kappa Distribution model. (a) MSE plot of σ (left) and (b) MAPE plot of σ (right). The MSE plot shows a steadily decreasing error with larger dtn windows. The MAPE plot shows that the estimation is fairly accurate, starting at ∼ 14% error and decreasing below 6% for large estimation windows.

Figure 4.16: Behaviour of the Kappa Distribution model. (a) MSE plot of K (left) and (b) MAPE plot of K (right). The errors when estimating K are quite large in absolute as well as relative terms.


4.1.5 Summary Estimation Performance

A summary of the estimation performance in relation to the input variables is given in Table 4.1. At first glance, it looks as if initial input d has only a weak overall influence on the estimated parameters. But dismissing its influence on σ, K and ρ altogether would be hasty. The reason that barely any difference can be seen in the conditional PDFs shown in Figures 4.1 - 4.3 with changing d is that the jump estimation already performs quite well at low d. We can conclude that the Jump Estimation Process, for which d is the initial input, detects jumps reliably even at small inputs. As a consequence, σ can be estimated quite precisely, a fact that becomes evident when looking at Table 4.1 and the associated error plots in Figures 4.4 and 4.15.
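
The effect of jump detection on the volatility estimate can be illustrated with a minimal sketch. The simulated return process and the threshold rule below (returns further than a fixed number of median absolute deviations from the median are treated as jumps) are assumptions made for illustration only; they are not the Jump Estimation Process of the ECBM, and all names and parameter values are hypothetical.

```python
import random
import statistics
from typing import List


def simulate_returns(n: int, sigma: float, jump_prob: float,
                     jump_size: float, seed: int) -> List[float]:
    """Gaussian daily returns with occasional negative jumps (hypothetical process)."""
    rng = random.Random(seed)
    returns = []
    for _ in range(n):
        r = rng.gauss(0.0, sigma)
        if rng.random() < jump_prob:
            r -= jump_size  # superimpose a crash-like jump
        returns.append(r)
    return returns


def filtered_volatility(returns: List[float], n_mads: float = 5.0) -> float:
    """Standard deviation after dropping returns far from the median.

    A return counts as a jump if it lies more than `n_mads` median absolute
    deviations from the median. This rule is an illustrative assumption.
    """
    med = statistics.median(returns)
    mad = statistics.median([abs(r - med) for r in returns])
    kept = [r for r in returns if abs(r - med) <= n_mads * mad]
    return statistics.pstdev(kept)


sigma_true = 0.01
returns = simulate_returns(n=5000, sigma=sigma_true, jump_prob=0.01,
                           jump_size=0.10, seed=42)
naive = statistics.pstdev(returns)
filtered = filtered_volatility(returns)

print(f"true sigma        : {sigma_true:.4f}")
print(f"naive estimate    : {naive:.4f}")
print(f"filtered estimate : {filtered:.4f}")
```

When the jumps are large relative to σ, as in this hypothetical setting, even a crude filter separates them from the diffusive returns, and the filtered estimate lands much closer to the true σ than the naive standard deviation. This mirrors the observation that the set jump magnitude may be too easy for the model to detect.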

Enlarging the initial input dtd improves the estimation of rD considerably. We can conclude that dtd is an important parameter that heavily influences the estimation of rD.

The same holds true for dtn and its influence on rN. Therefore, dtn can be labelled as an important parameter influencing the estimation of rN.

Overall, the initial inputs have limited influence on the accuracy of the estimations: one can deduce that the exponential price process estimation, with its inputs dtd and dtn, is the only process in which the parameters have a significant influence on the values to be estimated. Therefore, the effect of the Jump Process Estimation must be reevaluated.

      d                                  dtd                                dtn                                O(MAPE)
σ     weak influence, size indifferent   -                                  strong influence, higher better    10
K     weak influence, size indifferent   -                                  strong influence, higher better    10^2
ρ     weak influence, size indifferent   -                                  weak influence, higher better      10
rD    -                                  strong influence, higher better    -                                  10^2
rN    -                                  -                                  strong influence, higher better    10^2

Table 4.1: Overview of all initial inputs and their influence on the estimated values. Influence is categorized as weak if the corresponding PDF shows no changes with different initial inputs, and as strong if the PDF shows variation with different initial inputs. Size refers to the window size of the initial inputs and indicates which sizes are preferable. Additionally, the order of magnitude of the estimation errors is given. These errors are very large for all estimated values except ρ and σ.


4.2 Trading Performance

4.2.1 Evaluation Metrics

A central question when evaluating a trading strategy is which metric to use to assess performance. We want to avoid a case where the objectives of the metrics and of the procedure disagree. Thankfully, this makes it easy to find suitable measures. Due to the Kelly Criterion, the trading strategy based on the ECBM tries to maximize the expected log growth of wealth without considering other factors relevant to a portfolio. Therefore, it would not make sense to choose metrics such as the Calmar ratio that reward small drawdowns, as the objective of the Kelly procedure is not to minimize potential drawdowns.

Consequently, we decided that the main metric for judging the trading performance should be the compound annual growth rate (CAGR) of the portfolio. Equation 4.3 shows how CAGR is calculated.

However, the absolute growth of the portfolio is not the only factor we want to look at when evaluating trading performance. The question that needs to be answered is how much better the portfolio constructed by our strategy performed compared to simply holding the risky asset. For this purpose, we use rdiff as defined by Equation 4.4.

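\[ \mathrm{CAGR} = \left( \frac{p_{\mathrm{end}}}{p_{\mathrm{start}}} \right)^{1/n} - 1 \tag{4.3} \]

\[ r_{\mathrm{diff}} = r_{\mathrm{portfolio}} - r_{\mathrm{asset}} \tag{4.4} \]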
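
The two metrics can be computed as in the following minimal sketch. It assumes that n in Equation 4.3 is the length of the trading period in years and that both returns in Equation 4.4 are CAGR values over the same period; the function names and the example values are hypothetical.

```python
def cagr(p_start: float, p_end: float, n_years: float) -> float:
    """Compound annual growth rate, Equation 4.3 (n assumed to be in years)."""
    if p_start <= 0 or n_years <= 0:
        raise ValueError("p_start and n_years must be positive")
    return (p_end / p_start) ** (1.0 / n_years) - 1.0


def r_diff(r_portfolio: float, r_asset: float) -> float:
    """Excess return of the strategy over holding the risky asset, Equation 4.4."""
    return r_portfolio - r_asset


# Hypothetical example: over two years the portfolio grows from 100 to 121
# while the risky asset grows from 100 to 144.
r_p = cagr(100.0, 121.0, 2.0)
r_a = cagr(100.0, 144.0, 2.0)
print(f"CAGR portfolio: {r_p:.2%}")
print(f"CAGR asset    : {r_a:.2%}")
print(f"r_diff        : {r_diff(r_p, r_a):.2%}")
```

In this example the strategy has a positive CAGR but a negative rdiff, the same pattern the aggregated results show: growth in absolute terms, yet less than simply holding the asset.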

4.2.2 Overall Overview

Investigating the trading performance independently of the chosen input parameters gives us an understanding of the overall performance of the strategy. This step is crucial to determine whether good estimation translates to good performance in markets, which will be assessed more closely in a later section.

The unconditional PDF (Figure 4.17) shows the CAGR to be almost symmetrically distributed around zero. This becomes even clearer when looking at the CDF plot (Figure 4.18), which shows the 0.5 cumulative probability to be almost exactly at zero CAGR. In both plots, however, it becomes apparent that there are more outliers towards high CAGR than towards low CAGR.

Overall trading performance in relation to the chosen initial inputs can be seen in the aggregated heatmaps in Figures 4.19, 4.20 and 4.21. As an aggregated heatmap loses information and fails to show a differentiated picture of the trading strategy's performance over all the different price series, heatmaps for all generated price series can be found in Appendix A. By differentiating between CAGR and rdiff, a clear pattern emerges. When averaging over 100 price series, the mean CAGR of the strategy is positive. However, simply holding the risky asset during the trading period would have been more lucrative.
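
How a cell-wise average can hide the behaviour of individual price series is sketched below. The nested-list layout (one grid of CAGR values per price series, indexed by two initial inputs) and all numbers are hypothetical; the thesis's own aggregation code is not shown in this section.

```python
from typing import List

Grid = List[List[float]]  # rows: one initial input, columns: another


def aggregate(grids: List[Grid]) -> Grid:
    """Cell-wise mean over equally shaped grids, one grid per price series."""
    n_rows, n_cols = len(grids[0]), len(grids[0][0])
    return [
        [sum(g[i][j] for g in grids) / len(grids) for j in range(n_cols)]
        for i in range(n_rows)
    ]


def share_positive(grids: List[Grid], i: int, j: int) -> float:
    """Fraction of price series with a positive value at grid point (i, j)."""
    return sum(1 for g in grids if g[i][j] > 0) / len(grids)


# Three hypothetical price series on a 1 x 2 grid of input combinations.
cagr_grids = [
    [[0.30, 0.01]],    # one strongly positive outlier at the first grid point
    [[-0.05, 0.02]],
    [[-0.04, 0.00]],
]

mean_grid = aggregate(cagr_grids)
print(f"mean CAGR at (0, 0)       : {mean_grid[0][0]:.3f}")
print(f"share of positive series  : {share_positive(cagr_grids, 0, 0):.2f}")
```

Here a single outlier lifts the mean CAGR of the first grid point above zero although two of the three series lose money, which is why the per-series heatmaps in Appendix A complement the aggregated ones.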

Figure 4.17: PDF of the CAGR of all nMC = 100 price series of the model implementation Kappa Distribution. Mode, median and mean are very close to each other and to zero CAGR. Some outliers in performance exist towards higher returns, more so than towards lower CAGR.

Figure 4.18: CDF of the CAGR of all nMC = 100 price series of the model implementation Kappa Distribution. The plot is slightly asymmetrical.

Figure 4.19: Aggregated heatmaps of all nMC = 100 price series of the model level with Kappa Distribution. The grid points of d are depicted on the x-axis and those of dtd on the y-axis, with (a) CAGR on the left and (b) rdiff on the right. On the left we see that, on average, our strategy shows largely positive growth. Looking at the right plot, however, it becomes clear that our strategy's performance is lower than the market growth. Also, dtd seems to have slightly more influence than d, as the colors are more consistent horizontally in both plots.

Figure 4.20: Aggregated heatmaps of all nMC = 100 price series of the model level with Kappa Distribution. The grid points of d are depicted on the x-axis and those of dtn on the y-axis, with (a) CAGR on the left and (b) rdiff on the right. Our strategy shows, on average, positive returns for all combinations of initial inputs depicted, as one can see on the left side. But its performance is still lower than the development of the market price over the trading period. It is also difficult to recognize whether d or dtn has the larger influence on the trading performance.

Figure 4.21: Aggregated heatmaps of all nMC = 100 price series of the model level with Kappa Distribution. The axes show the grid points of dtd and dtn, with (a) CAGR on the left and (b) rdiff on the right. The region with dtd > dtn remains white, as those grid points were not analyzed. On average, our strategy shows positive returns for most grid points, as seen in the left plot. But these returns are lower than the market development during the same period, as can be seen in the right plot.

4.2.3 Initial Input d

As can be seen in Figures 4.22 and 4.23, there are barely any changes in trading performance over different initial inputs d. The median stays consistently below or exactly at zero CAGR for all d, and outliers towards higher CAGR are more prevalent than towards low returns.

Figure 4.22: PDF of CAGR conditional on initial input d. Neither the mean nor the median or mode shows large movement with varying d. A wide tail towards higher CAGR is present, while outliers towards lower CAGR are not equally prevalent.

Figure 4.23: Boxplot showing the change of CAGR of all nMC = 100 price series of the model level with Kappa Distribution conditional on d. The median is slightly below 0 for all d and the interquartile range does not show significant movement either. Outliers towards the maximum are very common; outliers towards the minimum are present but less prevalent. CAGR does not seem to be influenced much by d.


4.2.4 Initial Input dtd

Figure 4.24 depicts a small change in the trading performance over initial input dtd. For small dtd, the PDF is slightly more spread out and the mean and median lie very close to each other and to zero CAGR. This is no longer the case for dtd > 252, where the median slips below and the mean slightly above zero CAGR. Figure 4.25 shows that the reason for this lies in more outliers towards higher CAGR and fewer towards lower CAGR.

Figure 4.24: PDF of CAGR conditional on dtd. The function changes slightly with increasing initial input but stops doing so for dtd > 252. Mode and median are in all cases very close to each other and to zero CAGR.

Figure 4.25: Boxplot showing the change of CAGR of all nMC = 100 price series of the model level with Kappa Distribution conditional on dtd. Median returns are consistently below zero and notch heights are almost the same for all initial inputs. Positive outliers are more prevalent than negative ones.

4.2.5 Initial Input dtn

As one can observe in Figure 4.26, the PDF conditional on dtn shows no real change over different initial inputs. The same picture presents itself in Figure 4.27, where the median is noticeably below zero for all dtn. Initial input dtn seems to have no significant influence on returns.

Figure 4.26: PDF of CAGR conditional on dtn. Mean, median and mode do not show significant changes over varying initial inputs.

Figure 4.27: Boxplot showing the change of CAGR of all nMC = 100 price series of the model level with Kappa Distribution conditional on dtn. The notch height is almost exactly the same for all dtn. The number of outliers, positive or negative, is almost the same over all dtn.


4.2.6 Summary Trading Performance

This summary of the trading performance is based on Figures 4.17 - 4.27. For the PDF plots and boxplots conditional on the initial inputs d, dtd and dtn, we see very few changes in the trading performance with different input values. Table 4.2 shows this in a condensed fashion. It should be noted, however, that outliers with high returns are prevalent, and more so than outliers showing negative CAGR. This is further highlighted by the heatmaps in Section 4.2.2, which show positive returns for almost all triplets when aggregated over all 100 examined price series. This must be discounted, however, as none of the triplets consistently outperforms the market.

        d                                  dtd                                  dtn
CAGR    weak influence, size indifferent   strong influence, size indifferent   weak influence, size indifferent

Table 4.2: Impact of the initial inputs on trading performance. Whether the influence is categorized as weak or strong is determined by the change of the PDF plots with varying initial input. Size indicates the optimal range for the initial input.

4.3 Relationship between Parameter Estimation and Trading Performance

We relate the estimation performance, represented by the absolute error, directly to the CAGR. This allows us to visualize whether and how an accurate estimation translates to superior returns. In the optimal case, we would like to see increasing CAGR with decreasing absolute error and vice versa. Figures 4.28 - 4.32 show joint plots of the absolute estimation error and the corresponding CAGR, aggregated over all ∼ 230000 strategies. Density is indicated by darkening colors, and additionally the PDFs of the variables are displayed along the respective axes.

The main issue identified here is that the return is seemingly uncorrelated with the quality of the estimation. Hence, better estimations do not translate to higher CAGRs. This becomes apparent when looking at Figures 4.28 - 4.32, where most returns are distributed around zero independently of the corresponding estimation error. The reason for this could lie in the way the investment policy is determined, leading us to question the basis for finding λ.
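
The visual impression from the joint plots could be complemented by a correlation coefficient. The following is a minimal sketch using the Pearson correlation on hypothetical pairs of absolute estimation error and CAGR; the thesis itself reports no such coefficient, and the data below are invented purely to show the two limiting cases.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


abs_error = [0.01, 0.02, 0.03, 0.04]       # hypothetical estimation errors

cagr_band = [0.05, -0.05, -0.05, 0.05]     # returns scattered around zero
cagr_ideal = [0.04, 0.03, 0.02, 0.01]      # returns fall as the error grows

print(f"band-like returns : r = {pearson(abs_error, cagr_band):+.2f}")
print(f"ideal relationship: r = {pearson(abs_error, cagr_ideal):+.2f}")
```

A coefficient near zero corresponds to the band-like pattern seen in the joint plots, whereas the optimal case described above would yield a clearly negative value. A rank-based measure may be preferable in practice because of the heavy upper tail of the CAGR distribution.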

This is further supported by the findings in Figure 4.33, which shows a joint plot of the investment policy λ and the return of the risky asset at a given time during the trading period. It becomes clear that λ is often at its boundaries −1 or 1, independently of the return of the asset rAsset. This raises the question of whether the price development, or rather the estimates extracted from it, has any influence on the outcome of the Kelly Criterion.
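
Why a Kelly-type policy can sit at its bounds most of the time is illustrated by the following minimal sketch. It uses the textbook Kelly fraction for a plain geometric Brownian motion, λ = (μ − r) / σ², clipped to [−1, 1]; this is a simplifying assumption and not the ECBM policy of Section 2.3, which also involves the crash-related parameters. All parameter values are hypothetical daily quantities.

```python
def clipped_kelly(mu: float, r_free: float, sigma: float,
                  lower: float = -1.0, upper: float = 1.0) -> float:
    """Textbook Kelly fraction (mu - r_free) / sigma**2, clipped to [lower, upper]."""
    raw = (mu - r_free) / sigma ** 2
    return max(lower, min(upper, raw))


sigma_daily = 0.01    # hypothetical daily volatility
r_free_daily = 0.0001  # hypothetical daily risk-free rate

# Small changes in the estimated daily drift move the policy across its range.
for mu_estimate in (-0.0002, 0.00015, 0.0004):
    lam = clipped_kelly(mu_estimate, r_free_daily, sigma_daily)
    print(f"estimated drift {mu_estimate:+.5f}  ->  lambda = {lam:+.2f}")
```

Since σ² is of the order of 10⁻⁴ on a daily scale, the unclipped fraction exceeds one in magnitude as soon as the estimated excess drift is larger than about one basis point per day. Under this simplified model, estimation errors of the size reported in Section 4.1 would therefore push λ to −1 or 1 in most cases, regardless of the realized asset return.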

66

4.3. Relationship between Parameter Estimation and Trading Performance

Ultimately, the idea was to identify ranges of promising input triplets and run themon a new synthetic data set for the purpose of an out of sample validation. However,no such ranges could be identi�ed, making the procedure futile.


Figure 4.28: The absolute error becomes quite large, with a peak between 0 and 0.1 at a CAGR around 0. CAGR appears to be distributed in an almost band-like fashion, independent of the size of the estimation error.


Figure 4.29: The peak of the absolute estimation error lies between 0.006 and 0.008, with a CAGR slightly below 0. The highest and lowest CAGRs are reached independently of the estimation error.


Figure 4.30: The estimation error peaks very close to zero, at a CAGR slightly below zero. A slight tendency can be perceived: higher CAGRs are reached with smaller estimation errors, and performance decreases as the error grows.


Figure 4.31: One peak can be identified slightly below zero CAGR, at around 0.0002 absolute error for rD. While there are more outliers towards higher CAGR regions than towards lower ones, the PDF shows the mode of the trading returns to be negative.


Figure 4.32: Two peaks are discernible, both near zero CAGR and at small estimation errors. The mode of the CAGR PDF is below zero and the spread is quite even. No correlation between estimation error and strategy return can be identified.


Figure 4.33: The trading policy given by λ with respect to rAsset, representing the difference between portfolio return and the return of the risky asset. rAsset is approximately normally distributed around zero; seemingly independently of this, λ tends towards the boundaries −1 and 1. This can be observed well in the corresponding PDF.


Chapter 5

Conclusion

The purpose of this work was to determine the influence of the initial input variables d, dtd, dtn on the estimation quality of the ECBM, on the trading performance, and on the relationship between them. Our investigation revealed two main issues with the trading strategy based on the ECBM. The first is that the initial inputs seem to have only a small influence on the estimated parameters and the trading performance, resulting in overall unsatisfying estimation accuracy. The second lies in the determination of the trading policy from the estimates returned by the ECBM.

Indications of the first issue are found in the size of the errors in relative and absolute terms. They are large for the parameters rD, rN, K independent of the initial inputs, leading us to the conclusion that the estimation quality is simply not high. Further, one could argue that the estimate for σ is accurate thanks to the jump magnitude set in the synthetic data generation being quite large. Increasing window sizes to combat this lack of estimation performance is not necessarily a viable solution. In principle, the trading strategy based on the ECBM tries to capitalize on corrections of asset prices towards their fundamental values. So while larger windows d, dtd, dtn yield better estimation results for some parameters, this comes at the cost of the reactivity that the trading strategy has with smaller window sizes. Ultimately, the goal of this and every trading strategy is to outperform the market; to this end, it is the trading performance that needs to be optimized, not necessarily the estimation accuracy. We need to gauge the influence of these estimates on the trading policy given by the ECBM in order to assess the importance of an accurate estimation.
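The trade-off between estimation accuracy and reactivity can be illustrated with a simple rolling mean of returns. This is a sketch under stated assumptions, not the ECBM estimator: the volatility, the drift levels, the window lengths and all function names are hypothetical. Estimator noise is measured by Monte Carlo. Reaction lag is measured on a noise-free drift that switches sign at mid-sample.

```python
import numpy as np

def rolling_mean(x, window):
    """Trailing mean; entry i averages x[i : i + window]."""
    c = np.cumsum(np.insert(np.asarray(x, dtype=float), 0, 0.0))
    return (c[window:] - c[:-window]) / window

def reaction_lag(window, n=4000, level=0.001):
    """Steps after a drift switch (+level to -level at n // 2) until the
    noise-free rolling mean turns negative."""
    drift = np.where(np.arange(n) < n // 2, level, -level)
    clean = rolling_mean(drift, window)
    first = int(np.argmax(clean < 0.0))   # first window whose mean is negative
    return first + window - 1 - n // 2    # window start -> last time step in window

def estimator_noise(window, sigma=0.01, trials=2000, seed=2):
    """Monte Carlo standard deviation of a mean computed from `window` returns."""
    rng = np.random.default_rng(seed)
    return rng.normal(0.0, sigma, size=(trials, window)).mean(axis=1).std()

results = {w: (estimator_noise(w), reaction_lag(w)) for w in (50, 500)}
for w, (noise, lag) in results.items():
    print(f"window={w:4d}  estimator noise={noise:.5f}  reaction lag={lag} steps")
```

In this toy setting the noise shrinks roughly with the square root of the window length, while the lag grows roughly linearly with it (about half a window). A tenfold larger window therefore buys about a threefold reduction in noise at the price of a tenfold slower reaction to a regime change.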

The second issue is indicated by the fact that higher estimation precision does not translate into superior trading results. While a one-to-one correlation is not to be expected, the lack of even small dependencies between them raises suspicion. Jointplots of the absolute error of each estimate against the corresponding CAGR show no discernible relationship. Further, as the Kelly coefficient tends towards its set boundaries independent of the prevailing market conditions, we are led to believe that the Kelly allocation is not the most suitable process for determining the trading policy. Investigating optimal investment methods other than Kelly is strongly advised.


This work achieved:

1. Creation of a comprehensive Python function package to generate synthetic data based on the ECBM as proposed by Kreuser and Sornette [2].

2. An exhaustive evaluation of the influence of the initial inputs on estimation accuracy, trading performance, and the relationship between them.


Chapter 6

Outlook

Further work regarding the ECBM can take two directions: the model can be expanded to incorporate more features exhibited by real markets, or the analysis of the model in its current form can be deepened. In either case, the analysis can continue with synthetic data as long as the goal is to understand model behaviour. Using real market data is indispensable when the goal is to evaluate trading performance.

When continuing the analysis with synthetic data, we might work through the following list of tasks, ranked by priority:

1. Evaluate the code performance with all true parameters known by the model.

This allows us to investigate the determination of the Kelly Criterion independent of estimation performance. Ultimately, it will show the influence of the estimates on the trading performance, and we can judge what accuracy is needed for the estimations.

2. Implement the super-exponential probability model

Understanding why the synthetic data generation procedure does not work in the super-exponential probability case or, better still, implementing a working solution needs to be of highest priority. This helps to fully and conclusively analyze the model described by Kreuser and Sornette [2].

3. Further investigate the influence of the individual building blocks of the model on trading performance

Running the different generated price series on simplified versions of the ECBM will help to (i) see whether a simpler and faster code may perform as well as or better than the complete model code and (ii) isolate the performance of the individual estimation building blocks of the code from each other. We will be able to infer which parts of the estimation are critical to the model performance, which ones work and which do not.

4. Analyze the periods before and after the crash separately

This allows us to differentiate the behaviour of the model in the two periods.


5. Vary market parameters over time

To simulate realistic market behaviour, we need to introduce time-dependent features. A good starting point would be a heteroscedastic σ.

6. Investigate relationship between initial input variables

Is there a certain ratio between dtd and dtn that is favourable for trading results?
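For task 5 in the list above, one conceivable way to obtain a heteroscedastic σ is a GARCH(1,1) recursion for the variance of the log-returns. This is only a sketch of a standard technique and is not part of the ECBM or of the synthetic data package described in this work. The function name `simulate_garch_returns` and all parameter values are hypothetical.

```python
import numpy as np

def simulate_garch_returns(n, omega=1e-6, alpha=0.08, beta=0.90, mu=0.0, seed=0):
    """Simulate n log-returns with GARCH(1,1) variance:
    var[t+1] = omega + alpha * (ret[t] - mu)**2 + beta * var[t].
    Returns the log-returns and the time-varying sigma."""
    if alpha + beta >= 1.0:
        raise ValueError("alpha + beta must be < 1 for a finite long-run variance")
    rng = np.random.default_rng(seed)
    var = np.empty(n)
    ret = np.empty(n)
    var[0] = omega / (1.0 - alpha - beta)  # start at the long-run variance
    for t in range(n):
        ret[t] = mu + np.sqrt(var[t]) * rng.standard_normal()
        if t + 1 < n:
            var[t + 1] = omega + alpha * (ret[t] - mu) ** 2 + beta * var[t]
    return ret, np.sqrt(var)

returns, sigma_t = simulate_garch_returns(2000)
prices = 100.0 * np.exp(np.cumsum(returns))  # price path from log-returns
print(f"sigma ranges from {sigma_t.min():.4f} to {sigma_t.max():.4f}")
```

The resulting `sigma_t` clusters in time, as volatility tends to do in real markets. Such a series could replace a constant σ in the diffusion part of a synthetic price series, with jumps and crash dynamics added separately.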


Figure 6.1: A satirical take on technical analysis by the webcomic xkcd.com [9].


Bibliography

[1] Robert Pardo. The Evaluation and Optimization of Trading Strategies, Second Edition. Wiley, Hoboken, N.J, 2015.

[2] Jerome Kreuser and Didier Sornette. Super-Exponential RE bubble model with efficient crashes. European Journal of Finance, 25(4):338–368, 2019.

[3] Ulrich Homm and Jörg Breitung. Testing for speculative bubbles in stock markets: A comparison of alternative methods. Journal of Financial Econometrics, 10(1):198–231, 2012.

[4] Harold L. Vogel and Richard A. Werner. An analytical review of volatility metrics for bubbles and crashes. International Review of Financial Analysis, 38:15–28, 2015.

[5] J L Kelly. A New Interpretation of Information Rate reproduced with permission of AT&T. pages 917–926, 1956.

[6] Mark Davis and Sebastien Lleo. Fractional Kelly Strategies in Continuous Time: Recent Developments. Imperial.Ac.Uk, pages 1–41, 2011.

[7] Georges Harras and Didier Sornette. How to grow a bubble: A model of myopic adapting agents. Journal of Economic Behavior and Organization, 80(1):137–152, 2011.

[8] ETH. https://scicomp.ethz.ch/wiki/Getting_started_with_clusters.

[9] Randall Munroe. https://xkcd.com/2101/.


Appendices


Appendix A

Heatmaps

The following pages show all individual heatmaps of the generated synthetic data. An aggregated version was shown in Chapter 4.


Appendix B

Lambda Boundaries (-1,2)

This appendix contains all plots related to trading performance for λ = (−1, 2). A reason to analyze this configuration is the ability to short the risky asset as well as the risk-free asset.

To generate these plots, the same synthetic data set as in the main body of the work was used. The results do not differ significantly from the case with λ = (−1, 1).
