Estimating the Hurst Exponent - MOSAIC Groupmosaic.mpi-cbg.de/docs/Racine2011.pdfEstimating the...

Estimating the Hurst Exponent

Roman Racine

April 14, 2011

Abstract

The Hurst Exponent is a dimensionless estimator for the self-similarityof a time series. Initially defined by Harold Edwin Hurst to develop alaw for regularities of the Nile water level, it now has applications inmedicine and finance. Meaningful values are in the range [0, 1]. Differentmethods for estimating the Hurst Exponent have been evaluated: Theclassical “Rescaled Range” method developed by Harold Edwin Hurst. Inaddition to nowaday’s standard method, two wavelet-based methods havebeen evaluated and compared, one of which is proven the one with thebest convergence [4] developed by Gloter and Hoffmann. A core part ofthe project was to write software to implement and compare the differentalgorithms.

Contents

1 Problem description 21.1 Definition of the Hurst exponent . . . . . . . . . . . . . . . . . . 2

2 Small overview over wavelets 3

3 Wavelet estimator 5

4 Other estimators 54.1 The standard method . . . . . . . . . . . . . . . . . . . . . . . . 54.2 The optimal method . . . . . . . . . . . . . . . . . . . . . . . . . 6

5 Implementation 6

6 Measurements 8

7 Results 87.1 Bias . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87.2 Minimum input length . . . . . . . . . . . . . . . . . . . . . . . . 97.3 Comparison of RS, standard, D4 and D16 . . . . . . . . . . . . . 107.4 Errors depending on H . . . . . . . . . . . . . . . . . . . . . . . . 127.5 Noise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127.6 Runtime . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

8 Conclusions 17

1

MOSAIC Group, Prof. Ivo F. Sbalzarini, ETH Zurich Bachelor Thesis

Roman Racine

9 Higher dimensional cases 199.1 Standard algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 199.2 Rescaled range algorithm . . . . . . . . . . . . . . . . . . . . . . 199.3 Other algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . 199.4 Measurement results . . . . . . . . . . . . . . . . . . . . . . . . . 19

9.4.1 Synthetic data . . . . . . . . . . . . . . . . . . . . . . . . 199.4.2 Real data . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

10 Future work 2210.1 Multidimensional wavelet estimator . . . . . . . . . . . . . . . . . 2210.2 Wavelet packet algorithm . . . . . . . . . . . . . . . . . . . . . . 22

A Other applications 22A.1 Stock market . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22A.2 Picking on seismic signals . . . . . . . . . . . . . . . . . . . . . . 23A.3 Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

B Software manual 27B.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27B.2 Compiling and installing . . . . . . . . . . . . . . . . . . . . . . . 27B.3 Use . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

B.3.1 Binaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27B.3.2 Web service . . . . . . . . . . . . . . . . . . . . . . . . . . 28

B.4 Delivered files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

1 Problem description

The Hurst Exponent is a dimensionless estimator for the self-similarity of a timeseries. Initially defined by Harold Edwin Hurst to develop a law for regularitiesof the Nile water level, it now finds applications in medicine and finance. TheHurst Exponent H is related to the fractal dimension D by the relation D = 2−H. For white noise H = 0, for brownian noise H ≈ 0.5; H = 1 indicates directedmotion. Applied to financial data such as stock prices, the Hurst Exponent canbe interpreted as a measure for the trendiness: H < 0.5 high volatility, stockprice is anti trended, H = 0.5, stock price behaves like a brownian process, notrend, H > 0.5 stock price has a trend. Some people believe that an estimationof the Hurst Exponent may yield some valuable information on the long-termbehaviour of a particular stock.

A similar interpretation exists for virology applications: H < 0.5 meansthat a virus is locally confined, H ≈ 0.5 means the virus makes a brownian (andtherefore undirected) motion H > 0.5 indicates a directed motion. [9, 6]

1.1 Definition of the Hurst exponent

There are several ways to formally define the Hurst Exponent. The oldestdescription developed by Harold Hurst himself is as follows:

E

[R(n)

S(n)

]= CnH , n→∞ (1)

2


Roman Racine

The left hand side is also known as the expected value of the rescaled range(see [1]). R(n) is defined on a time series Xi, i = 1, 2, . . . , n as follows:

R(n) = max(Xi, i = 1, 2, . . . , n)−min(Xi, i = 1, 2, . . . , n) (2)

S(n) is the standard deviation and C an arbitrary constant.This immediately yields a procedure how to compute the Hurst exponent.

For some i, i = 1, 2, . . . , N5 (where N is the length of the time series) one com-putes

ai = E

[R(i)

S(i)

]For every size i, the right hand side is estimated by partitioning the time seriesinto chunks of size i. On every chunk, R(i) and S(i) for this particular chunk

are computed. The expectation value E

[R(i)S(i)

]for the whole time series is then

estimated by taking the average over all sub-results of all chunks. This thenleads to the equation

E[ai] = CiH

and thereforelog(E[ai]) = log(C) +Hlog(i) (3)

Using more than two different i, these equations are generally overdeterminedprovided there is enough input data and can be solved using a least squares fit.The slope of the fit will be the estimated value for H, the constant offset theestimated value for log(C) which is in this case unimportant. Unfortunately, thisprocedure generally shows poor convergence and bias, as will later be shown.

Using this method, Harold Hurst has calculated the Hurst exponent of theyearly Nile high water which has obviously been recorded for centuries. Inter-estingly, the estimated value was H = 0.77, meaning that this time series showssome memory, which makes sense, as for example large lakes on the upper Nilecould act as “memory” between different years.

The rest of this thesis focuses on wavelet-based estimators which are sup-posed to show better convergence as well as better computational behaviour. Aswell, a more modern method than rescaled-range is introduced as a benchmark.

2 Small overview over wavelets

Wavelets are a tool which is used in signal analysis and which are in somekind similar to the Fourier transform. The main difference is, that the Wavelettransform is localized in the time as well as the frequency domain, while theFourier transform is only localized in the frequency domain. Depending on thetransformation method chosen, there might also be advantages in computationaleffort: The wavelet transformation can be computed in O(n) as opposed toO(nlog(n)) for the Fourier transformation. The result of a Fourier analysisshows which frequencies have occurred in the signal, but contains no informationabout where in the signal they have occurred. The Wavelet transform gives morefreedom: Depending on the chosen Wavelet function, one can make a trade offbetween resolution in time space and resolution in frequency space. (see [2]).

3


Roman Racine

Figure 1: Three synthetic traces with different Hurst exponents

Throughout the rest of this thesis, only discrete wavelets will be used, asthe input signal is discrete as well. The wavelets used include the Daubechieswavelets named after Ingrid Daubechies as well as Coiflets which have beendeveloped by Ingrid Daubechies and Ronald Coifman. The use of other discretewavelets would be possible as well.

A wavelet generally consists of two functions, a low pass and a high passfilter. Depending on the design of the wavelet, these filters might have specialproperties, e.g. the two filters being orthogonal. (All treated wavelets in thisthesis are orthogonal.)

A wavelet decomposition consists of several subsequent applications ofthe wavelet on a time series. The most common way to do this is as follows:

1. Decompose the signal, this leads to a low-pass filtered and a high-passfiltered new signal.

2. Both signals are down sampled by two. The decomposed high-pass fil-tered signal is called an “octave” of the time series, as it contains theappropriately sampled signal for a certain octave.

3. Reiterate on the low-pass filtered signal.

For the chosen wavelets in this thesis there are some clever techniques whichallow to do the decomposition in O(n) time and O(n) space. However, this isnot the only decomposition which is widely used. Another one is the so-calledstationary wavelet transform which skips the down sampling described above.The advantage is that no information is lost due to down sampling, however, thedrawback is that this decomposition needs O(nlog(n)) time and space. (Strictly

4


Roman Racine

Figure 2: scheme of a wavelet decomposition, source: Wikipedia

speaking from an information-theoretical point of view, of course no informa-tion is lost after applying the standard discrete wavelet transformation as it isreversible. However, some of the information is not accessible by algorithmswhich are operating just on one octave at a time, which in some cases justifiesthe use of the stationary wavelet transformation.)

3 Wavelet estimator

The basic equation to derive the Wavelet estimator is as follows (from [4]):

E

[Qj+1

Qj

]= C2−2H (4)

Q is the “energy level” of a certain “octave” (i.e. the high-pass filtered outputof a certain stage of the wavelet decomposition):

Qj :=∑k

d2j,k (5)

where dj,k are the coefficients corresponding to the jth octave. C is an arbitraryunknown constant.

Knowing this, an estimator can be constructed by estimating Qj for differentj: Defining

aj :=Qj+1

Qj

E[aj ] := C2−2H (6)

log(E[aj ]) = −2Hlog(C) (7)

Doing this for several aj , this leads to an overdetermined equation systemand using a linear least squares fit, the Hurst exponent can be estimated as−0.5 of the slope. This procedure can be run for arbitrary discrete wavelets.

4 Other estimators

4.1 The standard method

What is called “the standard method” in this thesis is an algorithm based onthe following equations (see [9]) where xi, i = 1, 2, . . . , N is the time series. This

5


Roman Racine

method makes use of two different power laws. First, one makes use of the factthat for a time series x1, x2, . . . , xN with N elements, the right hand side of thefollowing equation follows a power law scaling with ∆n:

µν(∆n) :=1

N −∆n

N−∆n−1∑n=0

|xn+∆n − xn|ν (8)

For every ν, µν(∆n) follows a power law in the following sense:

E[µν(∆n)] = C1∆nαν (9)

αν can be estimated by computing µν(∆n) for different ∆n and doing alinear regression of log(µν(∆n)) against log(C1∆n).

After this, αν and ν are coupled through H:

E[αν ] = Hν (10)

By estimating αν for various ν, one can estimate H by using (10) for asecond linear regression.

I have chosen ν to be in the range i = 1, 2, . . . , 10 and ∆n to be in the rangej = 1, 2, 4, 8, . . . , N/5 where N is the number of elements in the time series.

This algorithm is in O(n2) and therefore considerably slower than all of theothers as the measurement results will confirm.

4.2 The optimal method

The optimal method is described in the paper [4]. The authors prove that thismethod will lead to the optimal rate of convergence under some assumptions.They specially treat noisy input data where the algorithm is shown to be opti-mal. Note, that the optimal method imposes some limitations on the choice ofwavelets. The wavelet needs to have open support in [0, S] (S some integer) andtwo vanishing moments, which rules out coiflets and the D2 wavelet. While thewavelet decomposition itself is in O(n), the optimal algorithm is in O(nlog(n))because it does not wavelet-decompose the original input data itself but pre-treats the input data in O(nlog(n)) time before decomposing it afterwards. Asthe results will show, this pre-treating pays off when the data is noisy. However,also note that having the optimal rate of convergence does not mean that thealgorithm gives the best results for any real-world sized inputs, this just meansthat for inputs which are large enough, the algorithm will eventually dominateall the others in precision.

5 Implementation

The code is written in C++. Besides the helper functions to read and printdata, the outline of the code is as follows:

The base classes are HurstEstimator and Wavelet. The different estima-tors for the Hurst exponent as well as the different types of wavelets are derivedclasses of these two base classes.

The class HurstEstimator basically contains some methods for input andoutput. It will read in data, call an estimator and output the data. This is justsome standard C++ code without any sophisticated parts.

6


Roman Racine

The class Wavelet contains basic wavelet methods described in the previoussections: The standard decomposition as well as the stationary decomposition,using what is called the “algorithme a trous”: Instead of down sampling thedata as in the standard wavelet decomposition, the wavelet coefficients are upsampled by inserting zeros. Using the stationary decomposition, no informa-tion is lost due to down sampling, on the other hand, more time and space isneeded O(nlog(n)) instead of O(n). (To state this again: From an information-theoretical point of view, the information is not lost due to down sampling, butit’s not accessible for an algorithm which just gets a single octave as an input.)

While the rescaled-range algorithm by Harold Hurst as well as the standardalgorithm can take inputs of arbitrary size, Wavelet algorithms expect the inputlength to be a power of two. Also, one needs to make a decision on what to doat the edges of the input: A wavelet filter needs to have input values for everyfilter coefficient, at the edge of the input, these values do not exist. Severalapproaches exist on how to overcome these problems. For this project, I havedecided to pad the input with the last known value. The justification for thischoice is, that doing this, a constant signal is introduced into the original signal.The constant part of the signal is discarded after the decomposition, so this wayshould minimize the distortion. However, the Matlab Wavelet Toolbox has aslightly different handling of edges and produces one element more after downsampling. This leads to a slightly different decomposition than using the codewritten for this thesis. These differences are due to different handling of theedges and are not fundamental to the implementation of the algorithm.

Besides the described code, a linear least squares solver has been imple-mented as well which is needed by all estimators. The mathematics behind itare as follows: All least squares problems in this thesis are of the kind:

aim+ c = bi (11)

where m is the value of interest. These equations are generally overdetermined,i.e. more than two of these equations exist. These equations are solved by alinear least squares fit by defining

A :=

a1 1...an 1

(12)

The equation system can then be written in matrix form:

Ax = r (13)

where x = [m c]t contains the values looked for.This is then solved by multiplying At on both sides.

AtAx = Atr

The matrix B := AtA is always positive definite and can therefore be decom-posed using the Cholesky decomposition where R is an upper triangle matrix:

Bx = RRtx = Atr (14)

7


Roman Racine

Defining b := Rtx, one first forward solves

Rb = Atr

After this, one backward solves

Rtx = b

This yields x. Knowing that the matrix B = AtA is always positive definitemakes it possible to use the Cholesky decomposition instead of Gauss elimi-nation and eliminates the need for handling the numerical stability which is acommon problem for Gauss elimination. See [5] p. 145 for a detailed description.

6 Measurements

The following algorithms have been compared:

1. Rescaled Range

2. Standard method

3. Optimal method with wavelets D4, D8, D16 and D64 (Described in [4]).

4. Wavelet method with wavelets D2, D4, D8, D16, D64, C6 and C12 (DN:Daubechies wavelet with N coefficients, CN: Coiflet with N coefficients)

5. Wavelet method using stationary wavelet decomposition with wavelets D2,D4, D16, D64 and C12

Artificial data has been created with Hurst exponents of H = 0.0, 0.1, . . . 1.0and with trace lengths of 28 to 220. Thirty-two test sets for each combination ofH and length have been generated, average and standard deviation have beencomputed. The relevant criteria were the accuracy of the estimator as well asthe standard deviation of the results.

The tests were run with the data as is as well as with various levels of noiseadded in a way which will be described in the next section.

7 Results

Tests have been run on simulated data for all combinations of input length26, 28, 212, 216, 220, H = 0.0, 0.1, . . . 1.0 and hurst estimators on thirty-two testsets for each combination to compute the standard error yielding more than1500 test results. Only a small subset of all the data is shown in the followinggraphics. The error bars displayed on all images mark the standard deviationwhich was computed over all thirty-two test sets.

7.1 Bias

Some initial testing with a “naive” implementations of “Rescaled Range” and“Standard” has shown some significant bias. After some discussion, two mainreasons for the bias were identified:

8


Roman Racine

Figure 3: Comparison of three different implementations of the Standard al-gorithm for H = 0.2 and different input sizes. Green: Naive implementation,Blue: Taking into account that window sizes should not be larger than N

5 , Red:

Additionally limiting any window size to a maximum of 216

5 . Red and blue areidentical up to 216.

• Taking into account window sizes which are too large compared to thelength of the time series. As a general rule of thumb, one should not usewindows which are larger than N

5 where N is the length of the time series.

• For very large time series (larger than 216), one should additionally limit

the upper size of any window to about 216

5 .

Figure 3 shows a comparison between the naive implementation, the limita-

tion to N5 and the second limitation to 216

5 for large inputs for a hurst estimatorof H = 0.2. From the data, it’s clear that these two steps will significantlyreduce bias of these two algorithms.

After optimizing the maximum window size for the two window-based al-gorithms, all of the measurements have been run with the optimized versions.Raw data for the non-optimized versions is delivered as add-on.

7.2 Minimum input length

Results clearly show that most of the tested algorithms are not suited for smallinputs (e.g. 26 data points). The wavelet-based algorithms need to fill theirfilter banks in order to produce some useful output. A large percentage of thedata worked on will consist of padding when running the algorithms on shortinputs. Also the rescaled range algorithm is very unstable on too short inputsand should therefore not be used. Generally, the following rules of thumb hold:

9


Roman Racine

Figure 4: Results for synthetic data with H = 0.2

Up to input sizes of 256, only the standard algorithm can be used. After this,wavelet-based algorithms using wavelets with few wavelet coefficients (up toD16) as well as rescaled range can be used as well. Wavelet-based algorithmsusing bigger wavelets such as D32 or D64 should only be used on very largedata sets as the requirement for padding grows with the number of waveletcoefficients in a wavelet.

Generally, larger input sets will also reduce the standard error of the es-timation, which is evident on all subsequently shown graphs in this section.Especially estimations on very short input data sets should be treated withcare, as the standard error is significant! All of the graphs in this section onlycontain information for input sets of the size 26 for the standard algorithm.All of the other algorithms either cannot produce any result because they needsome minimum input size, or have that little accuracy or a that large standarderror, that they are completely unusable. Based on the results of these testsI have set some minimum sizes for the input data length in the code of thedelivered software, so that users do not run unreasonable combinations of inputdata length and algorithms. Of course this can be easily changed in the code ifreally needed, but this should be done with care.

7.3 Comparison of RS, standard, D4 and D16

The first series of graphs shows a comparison of the Hurst estimators “RescaledRange”, “Standard”, “Wavelet” (using D4) and “Wavelet” (using D16). Resultsare shown for a synthetic traces with H chosen as 0.2, 0.5 and 0.8.

It’s obvious from the graphs that the wavelet-based methods start with ahuge standard error for small datasets compared to the classical algorithms but

10


Roman Racine



11


Roman Racine

Figure 7: Comparison of the optimal algorithm to others with H = 0.2

will eventually outperform all other algorithms for long inputs. By lookingat the results, the best way to work with estimators would be to choose theclassical algorithms for small inputs and the wavelet-based algorithms for largeinputs. This even gets clearer when one takes the optimal algorithm by Gloterand Hoffmann into account (called “opt” in the image labels) (see Figure 7).Although only a graph for H = 0.2 is shown, this holds for all other testedvalues as well.

7.4 Errors depending on H

Another observation is that the accuracy depend on H for the classical algo-rithms (“Rescaled Range”, “standard”). It tends to be better around H ≈ 0.5and gets worse at the extremes. Both algorithms seem to have a bias towards0.5 while the wavelet-based algorithms are bias-free. Figure 8 illustrates this.

7.5 Noise

In contrast to synthetic traces, most real-world data will contain noise, as mea-surements cannot be carried out with arbitrary precision. The authors of [4]claim their algorithm treats noise best. I have run all algorithms with the sametest sets with noise added to it. Noise has been modeled as random normaldistributed factor with average 1 and standard deviation l to each data point:

yi := xiR (15)

Where R = N (1, l). I have run noise tests for l ∈ {0.05, 0.1, 0.2}. On theone hand, it’s clear that multiplying random distributed noise will influence

12


Roman Racine

Figure 8: Plot of true H against estimated H with 216 data points as input

the Hurst exponent of a time series. However, it’s very interesting to see thatalready for l = 0.05, most of the algorithms show very poor performance (seeFigure 9). As can be seen from the figures, RS and Std are already unusablefor l = 0.05. The standard wavelet algorithm starts to give completely uselessresults with l = 0.1 (see Figure 10). Only the optimal algorithm can handlethis level of noise. However, keep in mind that the optimal algorithm needslarge input sets (size 216 or larger, see figure 7) to get to a reasonable level ofstandard error. If only small input sets are available, the only thing to do is toreduce noise in measurements as far as possible. Otherwise, there is no way toestimate the Hurst exponent from the data as there is no suitable algorithm forsmall data sets which can cope with a significant amount of noise.

After these results, I have run more tests only with the optimal algorithm forl = 0.4 and l = 0.6 to explore the limitations of this algorithm. Figure 11 showsthe results. The results show that the optimal algorithm even at higher noiselevel gives some useful results. However, the standard errors keep on growing, solarge input data sets well beyond 216 are needed to estimate the Hurst exponentwith a certain confidence.

The conclusion from the noise measurements is, that for short input sets(such as 64 data points), where the standard algorithm is the only one which isavailable, measurement accuracy is crucial, as even small amounts of noise willdistort the results. Algorithms which cope well with noise need very large inputsets to reach a certain precision.

13


Roman Racine

Figure 9: Plot of different algorithms on a data sets of size 216 with multiplica-tive noise normally distributed with N (1, 0.05)

Figure 10: Plot of different algorithms on a data sets of size 216 with multiplica-tive noise normally distributed with N (1, 0.1)

14


Roman Racine

Figure 11: Plot of the optimal algorithm (using D16) on a data sets of size 216

with multiplicative noise normally distributed with standard deviation l as inthe legend.

7.6 Runtime

Run time measurements have been made for inputs between 28 and 220. Themeasurements have been made on a machine with an Intel Core Duo QuadCPU and 4GB RAM. The complete test bench is written single-threaded. Thegiven values are pure CPU time in the maximum resolution available. Figure12 shows the run-time behaviour of the different algorithms. It’s clearly visiblethat the standard algorithm is in O(n2), the Optimal algorithm in O(nlog(n))while the others are in O(n).

Another metric is the run time needed to achieve a certain precision. Thisis based on the following consideration: The confidence interval for a series ofn measurements of a random variable X is computed as follows:

ci = tp,nσ√n

(16)

where for any X holds

P (x− ci ≤ X ≤ x+ ci) ≥ p (17)

where σ is the standard deviation of the random variable X and x its averageand tp,n the corresponding value of the t-distribution for a certain confidencelevel and a certain n. (See [8], chapters 6 and 7.)

This means that quadrupling the input will make the confidence intervalsshrink by a factor of two. Depending on the inherent precision of the algorithmin question (i.e. how much standard error it produces), more or less input data is

15


Roman Racine

Figure 12: Runtime comparison of the different algorithms.

Figure 13: Runtime comparison of the different algorithms without std algo-rithm for better overview.

16


Roman Racine

needed to meet a certain target. If the desired confidence interval is cides = 0.05and the measured confidence interval with n input data is c, then the numberndes of input data points needed to reach cides is

ndes = n

(c

cides)

)2

(18)

Table 1 first shows the raw values for the amount of input data needed toreach a confidence interval of at most 0.05 around the average value. Thesevalues are to interpreted like this: For a given algorithm, measuring as manytime series with the run length given on the top so that the total number of datapoints (number of time series times length of each run) equals the value in thetable will guarantee that in 95% of all cases doing this the resulting average willbe within ±0.05 around the average that a certain method estimates. However,this is just a guarantee for the precision (being within a certain range around anaverage), it does not guarantee the accuracy of the result (being within a certainrange around the true value). As has been shown in the previous sections, somealgorithms tend to have a bias. While by looking at the table and keeping inmind that some algorithms have super-linear run time, it might be temptingto cut long datasets into several short data sets, as these will run faster and acertain level of precision is reached within a shorter time. However, the increasedprecision is paid by lower accuracy, as it is clear from the previous subsections,that runs with small data sets will produce less accurate results. Depending onthe application, it might of course be worth paying.

The next step is to take into account the run time. This is done by simplyextrapolating the measured run times to the number of data points calculated intable 1. This way, one knows not only how many input data points one needs,but also how much time it will cost to process them using a certain algorithm.Depending on how long it takes to measure the data compared to process thedata (or how much it costs or some other metric), different trade-offs might bechosen. This information is displayed in table 2.

Generally, one can see, that for input sequences up to about 212, the standardalgorithm needs the fewest input to achieve the desired precision. For longerinputs, wavelet based algorithms start to be competitive. This gets even clearerwhen taking run time into account. As the wavelet based algorithms are inO(n) compared to the standard algorithm which is in O(n2), they need muchless time to achieve the desired precision. The optimal algorithm completelyloses this test at least for input sizes up to 220. However, its strengths are itsresilience against noise.

8 Conclusions

The following conclusions can be made after the measurements. There is nofree lunch. Except for the rescaled range algorithm developed by Harold Hursthimself which is dominated by other algorithms in all tested areas, all algorithmshave their strengths and weaknesses.

• The standard algorithm can be used for short inputs (such as 26). Itsthe only algorithm which delivers reasonable results for small input sets.However, it has quadratic run time and is therefore significantly slower

17


Roman Racine

Algo / Run length 26 28 212 216 220

RS - 14,710 23,850 39,560 694,430Std 1,459 2,095 3,994 40,605 91,119Wavelet (D4) - 6,730 8,520 66,200 276,510Wavelet (D16) - 31,920 7,910 27,450 128,490Wavelet (D64) - - 9,593 12,542 50,170Opt (D16) - 17,600 347,400 897,600 3,054,600

Table 1: Total number of measured data points needed to achieve a confidenceinterval of ±0.05 given a certain length of each individual run shown for differentalgorithms using synthetic traces with H = 0.5.

Algo / Run length 26 28 212 216 220

RS - 0.0103 0.0298 0.0698 1.204Std 0.0486 0.1039 0.3047 4.7592 10.8804Wavelet (D4) - 0.0066 0.0065 0.0284 0.1434Wavelet (D16) - 0.0249 0.0060 0.0484 0.2416Wavelet (D64) - - 0.0293 0.0885 0.3677Opt (D16) - 0.0192 1.0602 5.8637 62.1483

Table 2: Time needed to achieve a confidence interval of ±0.05 using synthetictraces with H = 0.5. All values are in seconds.

than the wavelet-based estimators on long inputs. But as a positive aspect,it needs least input data for small input sets up to 212 to achieve a certainprecision. On the other hand, its also very prone to noise in the data andhas some bias in contrast to the wavelet-based algorithms and the optimalalgorithm which are bias-free.

• The wavelet-based algorithms have a broad range where they performwell. Their good convergence means that they need less input data asother algorithms specially for large time series and their linear run timemakes them fast. For very long input sets such as 220 or larger, waveletswith more coefficients start to get interesting, as they need less input thansmaller wavelets. Also, the wavelet-based algorithms have some resilienceagainst noise. The drawback is that wavelet algorithms can only run ondata sets with a certain minimum length, e.g. 128 or even 256 as theyneed some input data to fill their filter banks. They are not suitable forshort input sets.

• The optimal algorithm shows weaker performance than the wavelet-based algorithms in most tests up to the maximum input size tested. Onthe other hand, its resilience against noise is superior to all other testedalgorithms. It’s the only algorithm which can be used in the tested noisemodel with multiplicative noise with a distribution of N (1, 0.1) or more.However, also the optimal algorithm is not suitable for small input sets,as it shares the same problems as the standard wavelet-based algorithms.

In general, different aspects must be taken in to account before choosingthe algorithm. For short inputs, only the standard algorithm is suitable. Asit is prone to noise, this algorithm must be used with care. For large input

18


Roman Racine

sets, wavelet-based algorithms will show better overall performance. For noisydata sets, depending on the level of noise, the optimal algorithm or the otherwavelet-algorithms can be used. In this case, one needs to use larger input datasets.

9 Higher dimensional cases

9.1 Standard algorithm

The standard method can be easily extended to two and more-dimensional ap-plications. For the two-dimensional case, the equation (8) can be adjusted likethis:

µν(∆n) :=1

N −∆n

N−∆n−1∑n=0

‖xn+∆n − xn‖ν2 (19)

Note that the absolute value has now changed to the L2 norm.

9.2 Rescaled range algorithm

Also the rescaled range method can be extended to a more-dimensional case.However, equation (2) needs to be reformulated like this:

R(n) = maxi,j(dist(xi,xj)) (20)

As this leads to a significant computational effort, I have chosen to do thefollowing instead for every time window starting at index l and of size n:

x =1

n

l+n∑i=l

xi (21)

R(n) = 2maxx(dist(x, x)) (22)

The standard deviation S(n) is computed as

S(n) :=1

N − 1

N∑i=1

‖x− x‖2

9.3 Other algorithms

The use of wavelet algorithms for multidimensional cases is not that easy, asmultidimensional wavelets would be needed for this. I have not followed thispath.

9.4 Measurement results

9.4.1 Synthetic data

First, these algorithms have been run against synthetic data sets to verifytheir correctness. The synthetic data sets have H ∈ {0.0, 0.5, 1.0} and l ∈{26, 28, 212, 216, 220}. One graph for each value of H is shown. As one can

19


Roman Racine

Figure 14: Results for synthetic data with H = 0.0. The error bars for Std areso small that they are not visible.

Test case estimated H Ci lower bound Ci upper boundAll3T6 2 0.112 0.0685 0.155All3T6 4 0.0723 0.0519 0.0928All3T6 0.1347 0.115 0.155AllCtrl −0.0205 −0.0293 −0.0117AllLatA 0.373 0.352 0.395AllMCD 0.0146 0.00679 0.0223AllMCDLatA 0.0718 0.0401 0.103

Table 3: Measurements for real data.

see, the rescaled range algorithm shows a much poorer convergence than inthe one dimensional case. I suppose that this is because of some design deci-sions made such as estimating he maximum distance between two points usingR(n) = 2maxx(dist(x, x)) which of course is not exactly the true value. How-ever, I did not further follow this path, as the rescaled range algorithm is knownto have poor performance from the one dimensional case and should generallynot be used.

9.4.2 Real data

The tests with real two-dimensional data have only been made with the standardalgorithm in 2D. The data consists of a number of different test cases with severalindividual runs each containing several hundred two dimensional data points.Table 3 shows test case, estimated Hurst exponent and the 95% confidenceinterval bounds.

20


Roman Racine



21


Roman Racine

10 Future work

10.1 Multidimensional wavelet estimator

Multidimensional Hurst estimators only been treated very limited. More workcould be done to adapt wavelet-based estimators for 2D or even higher dimen-sional applications. The implementation which has been made in this thesis isbased on the “standard algorithm“ and therefore is very slow for large inputsets.

10.2 Wavelet packet algorithm

As described in [6], the Hurst exponent can also be estimated using the waveletpacket algorithm. This algorithm is widely used in image compression and givesmore degrees of freedom than the classical wavelet decomposition. Instead ofonly decomposing the low-pass filtered output of one decomposition step, thewavelet function is applied to the high-pass filtered output as well. Afterwards,a best basis is built from all of the decomposed data.

A Other applications

A.1 Stock market

I have applied the Hurst estimators to historical closing prices of the blue chips ofthe Swiss stock market. Some measurement results (using a static decompositionwavelet estimator) are shown in the table below. The data used is from June2005 to May 2010.

ABB 0.44Actelion 0.47

Baer 0.43Credit Suisse 0.31

Holcim 0.40Lonza 0.35Nestle 0.48

Novartis 0.38Richemond 0.51

Roche 0.42Swiss Re 0.34Swisscom 0.42Swiss Life 0.49Syngenta 0.49Synthes 0.42

UBS 0.48Swatch 0.52Zurich 0.45

This does not look very promising, if the values were clearly over 0.5 in allcases, making money with stocks would be a lot easier. Probably, more informa-tion could be taken out of the data, if not only closing prices (i.e. prices at theend of the day) would be taken into account or if a time window had been cho-sen which does not contain huge turbulences on the financial markets. It would

22


Roman Racine

also be interesting to investigate whether the Hurst exponent has dropped afterthe introduction of electronic trading as well as after making real-time tradingavailable to the broad public through on-line trading. However, appropriatehistorical data is not freely available.

A.2 Picking on seismic signals

As I am working at the Swiss seismological service and we are currently evaluat-ing new software to locate earthquakes, I have tested a wavelet-based algorithmdescribed below against other algorithms we are evaluating. Given a measure-ment signal (basically a time series) as the input, the problem is to find asexactly as possible the onset time of an arriving P wave (primary wave, thefastest seismic wave and therefore the first to hit a measurement station). Thecomplicated part about this is to cope with background noise and to get the on-set time as exactly as possible. A usual problem for all of the algorithms is thatthey might miss the P wave and only declare the onset time when the larger butslower S wave (secondary wave) arrives, a problem which all algorithms usingthreshold values are prone to. Having some picks on the P and some on the Swave will distort the results. Various algorithms have been developed to pickearthquakes and to filter mispicks.

I have implemented a wavelet-based picker which basically does the following

1. Make a stationary wavelet transform. This gives several time series, eachcontaining one octave of the original signal.

2. For each octave, estimate the average and the standard deviation of theenergy (squared decomposed values) on the first n seconds. This is anestimate for the noise in the respective frequency band.

3. Run a sliding window over the rest of the data, declare a pick if the energyreaches at least m times the average plus l times the standard deviation.This might result in no pick as well if there is too much noise or too littlesignal.

4. Doing this on all octaves generally results in a number of picks. If thereare more than k picks, a linear least squares fit is run. As every filterdilutes the signal (because the wavelet is a windowed function), using theleast squares fit allows an estimate on what the pick would be on theundecomposed signal. For every decomposition step, an equation like thisis produced:

tim+ c = i (23)

where i is the number of decompositions already done (e.g. for i = 0 thiswould be the original signal), ti the picking time on the correspondingoctave, m and c unknowns. Solving the linear least squares problem andsetting i = 0 leads to the desired value.

Good values for the parameters k, l, m and n need to be found by priorknowledge and testing.

This procedure can be refined by using a wavelet with a high resolution inthe frequency domain (but a poor in the time domain) to do a first round ofpicking. This way, noise is more probably contained in just a few frequency

23


Roman Racine

bands, but picks are poor as the resolution in the time domain is poor. Afterthis initial round, a second round is done with a wavelet with a high resolutionin the time domain (but a poor in the frequency domain). Instead of picking allover again, only fine adjustments of the original picks are made within a tinywindow. This way, noise is not that much of a problem any more. The waveletsD64 and D16 have been used for this purpose.

The assessment of the wavelet picker was made against the Baer-Kradolferpicker from ETH (see [3]) and AR-AIC (see [10]) using a predefined set of variousevents which have been reviewed manually before. All of the algorithms havebeen run as a plugin to seiscomp (s. [7]). Testing is still ongoing. Some resultsshow that the wavelet picker generally makes less picks than other availablepickers and has far less false positives. However, it also misses more picks thanother algorithms. Parameters can be tuned for all available algorithms, anddepending on the testing metric (how much to punish false positives and howmuch to punish missed picks) results can be quite different.

A.3 Images

To illustrate the principle and the usefulness of a stationary wavelet transform,two typical seismic signals as well as their static wavelet decomposition areshown. The first one (BRANT) shows a typical signal of a remotely locatedstation which is distant to the earthquake. Most of the noise is low-periodic.Also note that little energy is in the high frequency part of the earthquake signal.This shows that the earthquake is at quite some distance, as the earth acts asa low-pass filter. The second shows a typical signal of a station located close tohouses and close to the earthquake (WILA). A lot of high periodic noise showsthe presence of civilization (cars, machines etc.). Also note that the energy isspread over the spectrum, this shows that the earthquake has happened close.If the distance between earthquake and station were bigger, less energy wouldbe in the high frequency part of the earthquake signal. The colors have beenadjusted to use the full possible range and do not correspond with specific exactvalues.

The method on how to graphically display the information is taken from [2].

24


Roman Racine

Figure 17: Seismic signal of station BRANT, vertical component, x axis in1/120s, y axis velocity in raw counts

Figure 18: Scalogram display of static wavelet decomposed data for stationBRANT, x axis: 1/12s, y axis: top highest octave, bottom lowest displayedoctave, red: lot of energy, blue little energy

25


Roman Racine

Figure 19: Seismic signal of station WILA, vertical component, x axis in 1/120s,y axis velocity in raw counts

Figure 20: Scalogram display of static wavelet decomposed data for stationWILA, x axis: 1/12s, y axis: top highest octave, bottom lowest displayed octave,red: lot of energy, blue little energy

26


Roman Racine

B Software manual

B.1 Overview

B.2 Compiling and installing

The core part of the software consists of C++ template classes. No otherlibraries than the standard C++ library is needed to compile them, so thisshould work on every common platform. The web front end is written in Perl andonly uses the libraries which are distributed together with the Perl compiler, noadditional software installations are required. It makes use of the CGI interface.The synthetic traces have been created by a Matlab routine delivered by IvoSbalzarini. The testing routines to drive the test benches partially make useof the Perl library Math::Random which is not part of the core distribution ofPerl but is available as a separate package on common Linux platforms and onhttp://www.cpan.org. However, this part is not necessary for using the software.

Compiling the software should be as simple as changing to the source di-rectory and typing ”make“ and ”make install“ which will copy the files to/usr/local/bin.

To install the web interface, copy web interface.pl to a web server directoryand make sure that CGI is enabled for the particular web directory. You mightneed to adjust the parameters in the first section of the script according to yourneeds (maximum file size, location of the hurst estimator binary). As executingthe web interface script with large volumes of data can impose a heavy load onthe system, you will probably want to limit access to the script to particularusers, e.g. by password protection or any other mechanism supported by theweb server.

B.3 Use

B.3.1 Binaries

After compiling, the following binaries should have been produced:

• hurst: This is the main binary. A help message is displayed when runwithout parameters.

• test 1d: This is a binary for testing the library against 1D input data.This binary is designed to be driven through testbench 1d.pl.

• test 2d: This is a binary for testing the library against 2D input data.This binary is designed to be driven through testbench 2d.pl.

• static wavelet transform: This binary takes a time series as input andcreates output which can then in turn be used as input for the Matlab”image“ command to produce visual output of a static wavelet transform.The chosen parameters are optimized to display seismic waveform signalsin a format which is typical at my working place. In case of other inputs,the routines will need to be fine-adjusted in order to produce an optimalvisual result.

27


Roman Racine

B.3.2 Web service

The web service should be pretty straightforward to use. Files can be uploadedin plain text format. The following input format is expected: In case of an 1Dalgorithm, the input format might be two values on one line or one single valueon a line. In case of a single value on a line, the whole input file is treated as asingle time series, in this case, the output will be the estimated hurst exponentfor all chosen algorithms.

In case of two values per line, the first value is assumed to be an indexand the second one the corresponding value. In case an index is lower thanits predecessor, the program assumes that a new data set has begun. In caseof multiple data sets, the output value will be the average value, the standarddeviation, the lower and upper 95% confidence interval.

In case a 2D algorithm has been chosen, in case of two values per line,the algorithm assumes a single input set, in case of three values per line, thefirst value is treated as an index. In all other aspects, the same information isdisplayed as in the 1D case. 1D and 2D algorithms cannot be mixed on thesame data set (this would make absolutely no sense anyway).

Blanks, tabulators, commas and semicolons or any combination of them areaccepted as separators. In case the input is invalid or too short for a certainhurst estimator, the result will be -1.

B.4 Delivered files

As a general remark, the standard method is called ”MSS“ in all of the deliveryfiles as well as in the source code.

• measurement output: The measurement output resides here. The filesresult 1d*.txt contain the one dimensional measurements, result 2d*.txtthe two dimensional measurements. For the one dimensional case, noisemeasurements have been made as described in section 7.5. In this case,the value at the end of the file name describes the level of noise introducedinto the data, e.g. result 1d 0.2.txt is the measurement output for mul-tiplicative noise distributed as N (1, 0.2). The output files are formattedas follows: For every measured combination of simulated H and inputlength, a section starts with H l. After this on every subsequent line ofthe section, a line is formatted like this: Name of the estimator, average,standard deviation, stddev

avg , lower bound of the 95% confidence interval,

upper bound of the 95% conficence interval, running time.

Example: RS: 0.12461 0.01705 0.13682 0.11871 0.13052 2.18125

This describes a measurement with the rescaled range algorithm with av-erage 0.12461, standard error 0.01705, fracstddevavg = 0.13682, lowerbound of the conficence interval 0.11871, upper bound 0.13052 and runtime 2.18125s.

• src: The source files as well as the perl scripts as described earlier inthis section reside here. The directories data 1d and data 2d contain thesynthetic traces used to test the algorithms as gzipped files. The contentof the files is coded like this: 0.2 10 27.data.gz means the file contains asynthetic trace with H = 0.2, containing 210 data points. 27 is the run

28


Roman Racine

number. data real contains real world data from biological applicationsin 2D. data seismic contains two sample files for static wavelet transform.The directories html and latex are generated by doxygen.

• report: Contains this thesis as well as the source files.

29


Roman Racine

References

[1] Wikipedia article on rescaled range.

[2] W. Bani. Wavelets, eine Einfuhrung fur Ingenieure (2. Auflage). Olden-bourg, Munchen, 2005.

[3] M. Bar and U. Kradolfer. An automatic phase picker for local and teleseis-mic events. Bulletin of the Seismological Society of America, 77(4):1437–1445, 1987.

[4] Arnaud Gloter and Marc Hoffmann. Estimation of the hurst parameterfrom discrete noisy data. Annals of Statistics, 35(5):1947–1974, 2007.

[5] Gene H. Golub and Charles F. Van Loan. Matrix computations (3rd ed.).Johns Hopkins University Press, Baltimore, MD, USA, 1996.

[6] C.L. Jones, G. T. Lonergan, and D. E. Mainwaring. Wavelet packet com-putation of the hurst exponent. Journal of Physics A: Mathematical andGeneral, 29, 1996.

[7] Geoforschungszentrum Potsdam. Seiscomp3, http://www.seiscomp3.org.

[8] John A. Rice. Mathematical Statistics and Data Analysis, Third Edition.Thomsoon Brook/Cole, 2006.

[9] Ivo F. Sbalzarini. Moments of displacement and their spectrum.

[10] R. Sleeman and T. van Eck. Robust automatic p-phase picking: An on-line implementation in the analysis of broadband seismogram recordings.Physics of the earth and planetary interiors, 113:265–275, 1999.

30


Roman Racine

Estimating the Hurst Exponent - MOSAIC Groupmosaic.mpi-cbg.de/docs/Racine2011.pdfEstimating the...

Documents

Transcript of Estimating the Hurst Exponent - MOSAIC Groupmosaic.mpi-cbg.de/docs/Racine2011.pdfEstimating the...