Duke's Fuqua School of Business

“Vector Quantization and entropy technique to predict high frequency data”

Independent study


Sponsor: Professor Campbell R. Harvey

Subject: high frequency data
Credits: 1

James Krieger

Author : James Krieger 21.05.2023 Page 1 sur 37


1 Introduction
2 Theory of information
  2.1 Model a non-discrete data stream
3 Use of historical data to predict the future
  3.1 Fixed path length with maximum entropy
  3.2 Multi path length or PPM
  3.3 Quantization
    3.3.1 VQ size
    3.3.2 Period used for the return
    3.3.3 Implementation algorithm
4 Data used
5 Redundancy
  5.1 Path length
  5.2 VQ size
  5.3 Return period
6 Forecast
  6.1 Algorithm
    6.1.1 Multi path length algorithm
    6.1.2 Fixed path length algorithm
    6.1.3 Return period
  6.2 Results
    6.2.1 Daily data
    6.2.2 5 minutes data
7 Additional study
8 Conclusion
9 References
10 Appendix
  10.1 LBG design
  10.2 Update of the LBG algorithm into a geometric world
  10.3 Redundancy results
  10.4 Return results


1 Introduction

Entropy is a concept widely used in data compression. As suggested in a draft paper by Professor Campbell R. Harvey, this technique seems reasonable for predicting future returns on high frequency data. Why? The basic assumption is that if we have some redundant information in historical data, in other words if we can compress the historical data, it means that some repeatability exists. This repeatability can be used to predict future returns.

In this memo, I will try to implement and validate this assumption based on a basic implementation of this technique on FX data, which is known to pass most autocorrelation tests.

2 Theory of information

The basic theory of information is simple, but its general implementation is complex. Consider the compression of a text book, paying special attention to the case where the text is English.

Let X(1), X(2), ..., X(n) represent the first n symbols. The entropy rate in the general case is given by:

H = - lim (n -> infinity) (1/n) * SUM P(x(1),...,x(n)) * log2 P(x(1),...,x(n))

where the sum is over all possible values of (x(1),...,x(n)). It is virtually impossible to calculate the entropy rate according to the above equation. Using a prediction method, Shannon was able to estimate that the entropy rate of 27-letter English text is 2.3 bits/character.

So the goal of an implementation is to have a simplified model of lower order which captures most of the entropy. For example, with a 3rd order model, the entropy rate of 27-letter English text is 2.77 bits/character. This is already substantially lower than the zero order entropy, which is 4.75 bits/character. So the 3rd order redundancy of English text is already 42% (1 - 2.77/4.75). It means we are able to guess pretty well which letter will follow two known letters.
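The redundancy used throughout this study is one minus the entropy rate divided by its maximum. As a minimal sketch (not the code used in this study; `entropy_rate` and `redundancy` are illustrative names), an order-k conditional entropy estimate from empirical frequencies could look like:

```python
import math
from collections import Counter

def entropy_rate(symbols, order):
    """Empirical order-`order` conditional entropy, in bits per symbol:
    H = -sum over (prefix, a) of p(prefix, a) * log2 p(a | prefix)."""
    joint = Counter()    # counts of (prefix, next symbol) pairs
    prefix = Counter()   # counts of prefixes alone
    for i in range(order, len(symbols)):
        ctx = tuple(symbols[i - order:i])
        joint[(ctx, symbols[i])] += 1
        prefix[ctx] += 1
    total = sum(joint.values())
    h = 0.0
    for (ctx, _), c in joint.items():
        h -= (c / total) * math.log2(c / prefix[ctx])
    return h

def redundancy(symbols, order, alphabet_size):
    """Redundancy = 1 - H / Hmax, with Hmax = log2(alphabet size)."""
    return 1.0 - entropy_rate(symbols, order) / math.log2(alphabet_size)
```

For a perfectly periodic stream such as "ababab...", the order-1 entropy is 0 and the redundancy is 100%, while the order-0 redundancy is 0%.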

2.1 Model a non-discrete data stream

To compress a stream of data which is not discrete, the typical approaches are:

- Quantization (such as VQ) followed by a lossless compression technique.
- A transformation technique (e.g. Fourier analysis) followed by a lossless compression technique.

Author : James Krieger 21.05.2023 Page 3 sur 37

Page 4: Introduction - Duke's Fuqua School of Businesscharvey/Teaching/Indepen…  · Web viewI. Introduction Vector quantization (VQ) is a lossy data compression method based on the principle

“Vector Quantization and entropy technique to predict high frequency data”

In this paper we will use the first approach (VQ).

3 Use of historical data to predict the future

We will use the frequency of historical paths to determine future returns. Here is a tree to illustrate the process:

[Figure: tree of quantized returns over periods 1 to 3. Each branch carries a frequency; for example, the prefix (x, x') is followed by x'' with frequency 0.01 and by y'' with frequency 0.1, while a return z occurs with frequency 0.1.]

Based on this tree, if we have a return x followed by a return x' (called the prefix x, x'), we have a 0.01 probability of a return x'', a 0.1 probability of a return y'', and so on. Based on these probabilities we will forecast future returns.
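The tree above is simply a table of conditional frequencies indexed by prefix. A minimal sketch of how such counts could be accumulated and turned into probabilities (illustrative names, not the implementation used here):

```python
from collections import Counter, defaultdict

def build_path_counts(symbols, path_len):
    """For every prefix (path) of length `path_len`, count how often each
    next symbol followed it in the historical stream."""
    counts = defaultdict(Counter)
    for i in range(path_len, len(symbols)):
        counts[tuple(symbols[i - path_len:i])][symbols[i]] += 1
    return counts

def next_symbol_probs(counts, prefix):
    """Conditional probabilities P(next symbol | prefix) from the counts."""
    c = counts[tuple(prefix)]
    total = sum(c.values())
    return {sym: n / total for sym, n in c.items()}
```

For the stream "xyxyxz", the prefix ("x",) is followed by "y" twice and "z" once, giving probabilities 2/3 and 1/3.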

3.1 Fixed path length with maximum entropy

To determine the best path length to use, we will compute the redundancy (one minus the normalized entropy) on historical data and select the path length with the highest redundancy. (High redundancy should let us forecast future returns with better success.)

Once the path length is determined, we will use it to compute the probability of the future return. But to improve the result, we will use the geometric mean of the returns of historical data sharing this particular path, instead of the probability of the future path.
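The geometric-mean forecast for a fixed path length could be sketched as follows, where `symbols[i]` is the VQ code of the gross return `gross_returns[i]` (1.01 means +1%); the names and the neutral fallback of 1.0 for unseen prefixes are illustrative choices, not the study's exact implementation:

```python
import math

def geometric_mean(values):
    """Geometric mean of gross returns (1.01 means +1%)."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def forecast_next_return(symbols, gross_returns, path_len):
    """Forecast the next gross return as the geometric mean of the historical
    gross returns whose preceding path matched the most recent
    `path_len` symbols."""
    prefix = tuple(symbols[-path_len:])
    followers = [gross_returns[i]
                 for i in range(path_len, len(symbols))
                 if tuple(symbols[i - path_len:i]) == prefix]
    return geometric_mean(followers) if followers else 1.0  # 1.0 = no edge
```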

3.2 Multi path length or PPM

PPM has set the performance standard in data compression research since its introduction in 1984. PPM's success stems from its ad hoc probability estimator, which dynamically

Author : James Krieger 21.05.2023 Page 4 sur 37

Page 5: Introduction - Duke's Fuqua School of Businesscharvey/Teaching/Indepen…  · Web viewI. Introduction Vector quantization (VQ) is a lossy data compression method based on the principle

“Vector Quantization and entropy technique to predict high frequency data”

blends distinct frequency distributions contained in a single model into a probability estimate for each input symbol.

In other words, PPM is able to blend the frequency distributions of different path lengths into one single probability. At the same time, the algorithm proposes a solution to the zero probability problem.

Let's define:

S(n) = prefix of length n (path of length n)
P(a|S(n)) = probability of having "a" given the prefix S(n) (conditional probability)
count(X) = number of times X appeared in the past
count(X, S(n)) = number of times X appeared in the past following a prefix S(n)
W(S(n)) = mixture weighting

We can write the recursive definition used to compute the probability (each recursion adds 1 to the length of the prefix):

P(a|S(n)) = W(S(n)) * count(a,S(n)) / count(S(n)) + (1 - W(S(n))) * P(a|S(n-1))

This equation can be applied recursively to increase the order. So from a probability coming from a path of length 1, we can estimate the blended probability of paths of length 1 and 2.

With the above equation, we are able to blend multi-length paths into one single probability. This probability will be used to compute the expected future return.

The critical part of this equation is W(). Different solutions are proposed in the literature (known as PPMD or PPMC) and are related to the zero probability problem.

The zero probability problem can be simplified as follows: given a prefix, which probability should be assumed for an event that has never happened in the past? (In the compression literature, this is referred to as the escape mechanism.)

For example, if you have a prefix composed of the letters "P,R,O,B,A,B,L" and, with this given prefix, you have seen the letter "Y" 3 times and no other letter, should you assume that in this context "Y" is 100% probable? Or should you assume a non-zero probability for another letter such as "E"?

In compression, the suggested solutions are:

W(S(n)) = count(S(n)) / (count(S(n)) + count(a)/DS)

and

P(a|S(n)) = W(S(n)) * (count(a,S(n)) - K) / count(S(n)) + (1 - W(S(n))) * P(a|S(n-1))

With the following values of DS and K:

Author : James Krieger 21.05.2023 Page 5 sur 37

Page 6: Introduction - Duke's Fuqua School of Businesscharvey/Teaching/Indepen…  · Web viewI. Introduction Vector quantization (VQ) is a lossy data compression method based on the principle

“Vector Quantization and entropy technique to predict high frequency data”

Algorithm   DS    K
PPM B        1   -1
PPM C        1    0
PPM D        2   -0.5

The parameter DS can be seen as the weight distribution between long paths versus short paths. If DS equals 1, a bigger weight is put on the short path; conversely, if DS is big, the weight is put on the longer path.

Based on experimental measures, we will determine which values of DS and K seem the most appropriate.
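A literal transcription of the two equations above could look like the sketch below. It is illustrative only: count(a) in the weight is read as the in-context count (an interpretation), the empty-prefix base case falls back to a uniform distribution over the alphabet (an assumption), and no renormalization is done, whereas real PPM implementations normalize and use escape counts.

```python
from collections import Counter, defaultdict

def build_counts(symbols, max_order):
    """count(S(n)) and count(a, S(n)) for every prefix up to max_order."""
    ctx = defaultdict(Counter)
    for n in range(max_order + 1):
        for i in range(n, len(symbols)):
            ctx[tuple(symbols[i - n:i])][symbols[i]] += 1
    return ctx

def ppm_prob(a, prefix, ctx, ds=2, k=-0.5, alphabet_size=2):
    """P(a|S(n)) = W*(count(a,S(n))-K)/count(S(n)) + (1-W)*P(a|S(n-1)),
    with W(S(n)) = count(S(n)) / (count(S(n)) + count(a,S(n))/DS)."""
    c = ctx[tuple(prefix)]
    total = sum(c.values())
    if len(prefix) == 0:
        shorter = 1.0 / alphabet_size          # uniform fallback (assumption)
    else:
        # recurse on the prefix with the oldest symbol dropped
        shorter = ppm_prob(a, tuple(prefix)[1:], ctx, ds, k, alphabet_size)
    if total == 0:
        return shorter                         # unseen prefix: defer entirely
    w = total / (total + c[a] / ds)
    return w * (c[a] - k) / total + (1 - w) * shorter
```

On the stream "ababab...", a symbol always seen after the prefix ("b",) receives a much higher blended probability than a symbol never seen there, while the escape term (K = -0.5, PPMD-style) keeps the unseen symbol's probability above zero.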

3.3 Quantization

To simplify our data stream, we are going to use a VQ quantization technique. The VQ is going to capture the returns of our FX data. But we need to decide:

- How precise the VQ needs to be (what size the VQ needs to be).
- What period we are going to use for the return.

3.3.1 VQ size

This choice is driven by the amount of data available. For example, with a VQ of size 4 (4 vectors), a sequence of 3 VQ symbols generates 4^3 = 64 possibilities. And if we would like to capture a path composed of 10 VQ symbols, the possibilities are around 1 million (4^10). But if you have only thousands of samples available, you will never be able to populate the tree enough to get statistically significant probabilities on these possible paths.

3.3.2 Period used for the return

The choice of the period used to compute the return is driven by:

- The frequency of data available. High frequency will reduce the period.
- The noise in the data. A longer period will reduce noise and improve the quality of the data.
- The volatility between the sample data. If the volatility between the samples is high, forecasting based on these samples will be very noisy.

Based on experimental measures, we will determine some possible periods to use in the computation of the return.

3.3.3 Implementation algorithm

We will use the LBG-VQ algorithm to determine the VQ. This algorithm is based on iteration; a description is given in Annex I.

Author : James Krieger 21.05.2023 Page 6 sur 37

Page 7: Introduction - Duke's Fuqua School of Businesscharvey/Teaching/Indepen…  · Web viewI. Introduction Vector quantization (VQ) is a lossy data compression method based on the principle

“Vector Quantization and entropy technique to predict high frequency data”

To take into consideration that we are working on returns and not on absolute gains, the algorithm needs to be transposed into a geometric world. A solution is described in Annex II.
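Working on log returns gives the geometric transposition almost for free: in log space, the centroid of a cell is the geometric mean of its gross returns. A 1-D LBG sketch under this assumption (codebook size a power of two; illustrative, not the exact algorithm of the annexes):

```python
import math

def lbg_codebook(returns, size, iters=20, eps=1e-4):
    """1-D LBG sketch on log gross returns. `size` should be a power of two
    (LBG grows the codebook by splitting every codeword)."""
    data = [math.log(r) for r in returns]      # geometric world: work in logs
    codebook = [sum(data) / len(data)]         # start from the global mean
    while len(codebook) < size:
        # split every codeword slightly apart, then run Lloyd iterations
        codebook = [c + d for c in codebook for d in (eps, -eps)]
        for _ in range(iters):
            cells = [[] for _ in codebook]
            for x in data:
                nearest = min(range(len(codebook)),
                              key=lambda m: (x - codebook[m]) ** 2)
                cells[nearest].append(x)
            codebook = [sum(cell) / len(cell) if cell else c
                        for cell, c in zip(cells, codebook)]
    # exp() maps each centroid back to a gross return; the centroid of a
    # cell in log space is the geometric mean of the returns in that cell
    return [math.exp(c) for c in codebook]
```

On a stream made of two distinct gross returns, a size-2 codebook recovers the two values.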

4 Data used

We will use the following data:

- Daily FX rates of the pairs usd_chf and gbp_usd (from Oanda) from 1990 until 2002.
- A 5 minutes data stream of eur_usd and usd_chf from January 1999 until February 2002.

To be able to test the forecasting and trading strategy we are going to use out-of-sample data:

- For the daily data, from 1997 until 2002.
- For the 5 minutes data, from August 2001 until February 2002.

5 Redundancy

In Appendix III, you will find the detailed redundancy values found.

With a VQ of size two, we have the following redundancy:

[Chart: Daily data, VQ size 2. Redundancy (0% to 7%) versus path length (0 to 16) for the 1 day, 2 days and 7 days returns.]

It is interesting to see that the maximum redundancy occurs around the same path length for each type of return. This suggests that the sample size is driving the shape of the above curve.

Author : James Krieger 21.05.2023 Page 7 sur 37

Page 8: Introduction - Duke's Fuqua School of Businesscharvey/Teaching/Indepen…  · Web viewI. Introduction Vector quantization (VQ) is a lossy data compression method based on the principle

“Vector Quantization and entropy technique to predict high frequency data”

With a VQ of size 4 and 8, we have the following redundancy:

[Chart: Daily data, VQ size 4. Redundancy (0% to 20%) versus path length (0 to 8) for the 1 day, 2 days and 7 days returns.]

[Chart: Daily data, VQ size 8. Redundancy (0% to 20%) versus path length (0 to 8) for the 1 day, 2 days and 7 days returns.]

Here again the redundancy curves for the 1 day, 2 days and 7 days returns have a very similar shape, and the redundancy decreases almost immediately as the path length increases.

Author : James Krieger 21.05.2023 Page 8 sur 37

Page 9: Introduction - Duke's Fuqua School of Businesscharvey/Teaching/Indepen…  · Web viewI. Introduction Vector quantization (VQ) is a lossy data compression method based on the principle

“Vector Quantization and entropy technique to predict high frequency data”

On the 5 minutes data, we find similar results:

[Chart: 5 min data, VQ size 2. Redundancy (0% to 6%) versus path length (0 to 20) for the 3 hours and 20 hours returns.]

[Chart: 5 min data, VQ size 4. Redundancy (0% to 20%) versus path length (0 to 8) for the 3 hours and 20 hours returns.]

[Chart: 5 min data, VQ size 8. Redundancy (0% to 20%) versus path length (0 to 8) for the 3 hours and 20 hours returns.]

If we compute the average number of samples available for a given VQ size and a particular path of fixed length, we find that the maximum redundancy needs more samples as the size of the VQ increases. (See Appendix III.)

If we compare the redundancy over time, we find:

[Chart: Daily data, 2 days return. Redundancy (0% to 16%) over time (Oct 1995 to Sep 2002), curves labelled 2 to 7.]

[Chart: Daily data, 7 days return, VQ size 4. Redundancy (0% to 12%) over time (Oct 1995 to Sep 2002), curves labelled 2 to 7.]

Author : James Krieger 21.05.2023 Page 9 sur 37

Page 10: Introduction - Duke's Fuqua School of Businesscharvey/Teaching/Indepen…  · Web viewI. Introduction Vector quantization (VQ) is a lossy data compression method based on the principle

“Vector Quantization and entropy technique to predict high frequency data”

[Chart: 5 minutes data, 3 hours return, VQ size 8. Redundancy (0% to 20%) over time (Jul 2001 to Nov 2001), curves labelled 2 to 7.]

[Chart: 5 minutes data, 20 hours return, VQ size 8. Redundancy (0% to 12%) over time (Jun 2001 to Apr 2002), curves labelled 2 to 7.]

(Discard the abrupt changes, which are due to a recalibration of the VQ at the beginning of every year.)

We can see that the redundancy is, not surprisingly, very stable. (This is normal, because adding just a couple of samples to the whole data sample should not affect the overall redundancy too much.)

But these charts are interesting because they tell a lot about how well a predictor model will do. Assuming that there is no rolling effect of redundancy and that patterns are constant over time, you can have 3 scenarios:

- The redundancy is decreasing. In this case, your prediction is likely to get worse.
- The redundancy is constant. In this case, your prediction should be good.
- The redundancy is increasing. In this case, your prediction is likely to get better.

If you look at the daily data, we can see an increase in redundancy until 1999. This is the date when the euro was introduced (the euro tied together the other currencies in Europe). So, not surprisingly, after 1999 the redundancy is flat, which suggests the introduction of new dynamics in the FX market. (The daily data are the FX pairs usd_chf and gbp_usd.)

5.1 Path length

Based on the above experiments, we can conclude that the path length which generates the biggest redundancy is determined by the number of samples available.

5.2 VQ size

As for the path length, the size of the VQ which generates the biggest redundancy is determined by the number of samples available. As a rule of thumb, you should have a sample size equal to (VQ size)^(path length). Also, the redundancy seems to increase with the size of the VQ but decreases for large VQs, due to the limited sample size.

5.3 Return period

In our experiment, the period has a significant impact on the redundancy. In general, the redundancy increases as the period shortens. There is one exception, on the 5 minutes data with a VQ of size 2, where the shorter period generated a smaller

Author : James Krieger 21.05.2023 Page 10 sur 37

Page 11: Introduction - Duke's Fuqua School of Businesscharvey/Teaching/Indepen…  · Web viewI. Introduction Vector quantization (VQ) is a lossy data compression method based on the principle

“Vector Quantization and entropy technique to predict high frequency data”

redundancy. This can be explained by the fact that with a VQ of size 2, you capture only directional changes, and the information becomes noisy if the return period is too small.

6 Forecast

6.1 Algorithm

For the forecast we are going to use the following algorithms:

6.1.1 Multi path length algorithm

6.1.1.1 PPM p

This is based on the PPM blending mechanism. Using the historical data, we find the corresponding probability for each vector. With these probabilities we compute a forecast by summing the vectors weighted by their corresponding probabilities. This forecast determines the amount we bet for the period. We evaluate this return for different values of DS and K.

6.1.1.2 PPM g

This is similar to the PPM p algorithm, but instead of using the probability, we use the average historical return for each path length and blend these returns using the PPM algorithm. This forecast determines the amount we bet for the period. We evaluate this return for different values of DS.

6.1.2 Fixed path length algorithm

For the following algorithms we use the path length that has the highest redundancy.

6.1.2.1 R sign

This is based on the historical direction of the return for identical paths. It is computed by counting the number of positive returns minus the number of negative returns, divided by the number of samples. R sign takes a value between -1 and 1. This value is used as the bet we take.

6.1.2.2 R g

This is the average of the historical returns for identical paths. This average is used as our bet.

6.1.2.3 R g * count

Similar to R g, but we multiply by the number of times we have seen the identical path. This weights our forecast by how often this path occurred in the past. The computed value is used as our bet.

Author : James Krieger 21.05.2023 Page 11 sur 37

Page 12: Introduction - Duke's Fuqua School of Businesscharvey/Teaching/Indepen…  · Web viewI. Introduction Vector quantization (VQ) is a lossy data compression method based on the principle

“Vector Quantization and entropy technique to predict high frequency data”

6.1.2.4 R g * count / stdev

Similar to the above, but this time we divide our average return by the standard deviation of the historical returns with the identical path.

6.1.2.5 R g * count * sign ifsame

Similar to the above, but this time we take a bet only if the average return and R sign point in the same direction.
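The fixed path length statistics of sections 6.1.2.1 to 6.1.2.5 can be sketched from the list of historical signed returns that followed one identical path. In this sketch the arithmetic mean stands in for the geometric mean used in the text, and all names are illustrative:

```python
import statistics

def fixed_path_bets(path_returns):
    """Bet sizes derived from the signed returns (+0.01 = +1%) that
    historically followed one identical path."""
    n = len(path_returns)
    pos = sum(1 for r in path_returns if r > 0)
    neg = sum(1 for r in path_returns if r < 0)
    r_sign = (pos - neg) / n                    # between -1 and 1
    r_g = sum(path_returns) / n                 # arithmetic mean stand-in
    r_g_count = r_g * n
    stdev = statistics.pstdev(path_returns)
    return {
        "r sign": r_sign,
        "r g": r_g,
        "r g*count": r_g_count,
        "r g*count/stdev": r_g_count / stdev if stdev else 0.0,
        # bet only if the mean return and r sign point the same way
        "r g*count*sign ifsame": r_g_count if (r_g > 0) == (r_sign > 0) else 0.0,
    }
```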

6.1.3 Return period

For the return period we use 1, 2 and 7 days returns for the daily data, and 20 hours and 3 hours returns for the 5 minutes data. For the daily data, we bet every day based on the last trading information. It means that for the 2 and 7 days returns, we assume rolling bets (multiple bets open at the same time, each waiting for its period to expire).

For the 5 minutes data, we take inactive periods into consideration. So instead of using exactly 3 hours or 20 hours as the return period, we use a number of samples. For example, for the 3 hours period we compute the return between 12*3 = 36 samples (the data has a 5 minutes sample rate). All inactivity periods are thus skipped (there is no sample if there is no activity). We take a new bet at the end of the return period (every 36 or 240 samples in this case).
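Counting samples instead of wall-clock time can be sketched as below, assuming `prices` holds one quote per 5 minutes of market activity (illustrative, not the study's code):

```python
def period_returns(prices, samples_per_period):
    """Returns over a fixed number of samples instead of wall-clock time,
    so inactive stretches (no quotes, hence no samples) are skipped.
    At a 5 minutes sample rate, 36 samples = 3 hours of active market."""
    return [prices[i] / prices[i - samples_per_period] - 1.0
            for i in range(samples_per_period, len(prices))]
```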

Author : James Krieger 21.05.2023 Page 12 sur 37

Page 13: Introduction - Duke's Fuqua School of Businesscharvey/Teaching/Indepen…  · Web viewI. Introduction Vector quantization (VQ) is a lossy data compression method based on the principle

“Vector Quantization and entropy technique to predict high frequency data”

6.2 Results

6.2.1 Daily data

For the daily data, we found the following monthly Sharpe ratios on the out-of-sample data, starting 1/1/1997 and ending 1/1/2002. The Sharpe ratio was calculated on a daily basis and adjusted to reflect a monthly value (assuming 20 trades during a month). For the 7 days return, we assume that we take a bet every day and hold it until the end of the period. (See appendix for details.)

Monthly Sharpe ratio, daily data, VQ size 4 (negative values in parentheses):

                         1 day return           2 days return          7 days return
Algorithm                gbp     chf     both   gbp     chf     both   gbp     chf     both
PPM g 2                  2.43    1.30    2.16   2.79   (3.41)  (0.37) (1.92)  (6.66)  (5.34)
PPM g 4                  3.00    1.54    2.63   3.14   (3.11)   0.02  (0.65)  (4.45)  (3.20)
PPM g 6                  3.30    1.68    2.88   3.35   (2.94)   0.25   0.06   (3.19)  (1.98)
PPM p k0_0:ds 2          2.96    1.70    2.65   3.58   (2.04)   0.91   3.25   (0.87)   1.39
PPM p k-0_5:ds 2         2.73    1.75    2.54   3.87   (1.74)   1.27   3.45   (0.60)   1.67
PPM p k-0_5:ds 4         2.58    2.22    2.72   3.86   (1.19)   1.60   3.40    1.14    2.65
PPM p k-0_5:ds 6         2.40    2.50    2.79   3.89   (0.95)   1.76   3.21    1.89    2.97
PPM p k-1_0:ds 2         2.49    1.77    2.41   4.10   (1.45)   1.58   3.62   (0.34)   1.91
r sign                   4.12    3.02    4.41   1.66   (2.38)  (0.44) (3.59)   2.01   (1.02)
r g                      3.71    1.68    3.19   2.76   (1.41)   0.82   2.15   (0.50)   1.06
r g*count                1.92    0.62    1.50   0.88   (2.51)  (0.96)  0.35    0.39    0.47
r g*count/stdev          1.66    0.78    1.45   0.58   (2.68)  (1.25) (0.58)   0.44   (0.09)
r g*count*sign ifsame    4.19    1.19    3.31   0.48   (1.25)  (0.47) (1.72)   1.30   (0.27)

(gbp = gbp_usd, chf = usd_chf, both = combined.)

Overall, the gbp_usd outperforms the usd_chf rate. This could mean that the usd_chf is a leading indicator for the gbp_usd FX rate.

The "PPM g" algorithm has disappointing results compared to "PPM p". This suggests that the proposed solution to the "zero statistics" problem improves the result. ("PPM g" does not assign a default probability to an event that has never occurred, but "PPM p" does.)

Author : James Krieger 21.05.2023 Page 13 sur 37

Page 14: Introduction - Duke's Fuqua School of Businesscharvey/Teaching/Indepen…  · Web viewI. Introduction Vector quantization (VQ) is a lossy data compression method based on the principle

“Vector Quantization and entropy technique to predict high frequency data”

The "r sign" algorithm performs very well for short periods (1 day return), which is surprising given the simplicity of this algorithm. It means that the direction of the return is much more important than the forecasted value of the return.


To better understand the risk taken, we should check the repartition of the returns. For the 1 day return, we have the following repartition:

[Chart: return repartition, gbp_usd, algorithm r sign. Percentage of total return (gain + loss) per range of standard deviations (0 to 1, 1 to 2, ..., 6 to 7, more), broken down by half-year periods from 1/1/1997 to 1/1/2002.]

[Chart: return repartition, usd_chf, algorithm r sign. Same axes and periods.]

[Chart: return repartition, gbp_usd, algorithm r g*count*sign ifsame. Same axes and periods.]

[Chart: return repartition, usd_chf, algorithm r g*count*sign ifsame. Same axes and periods.]

Author : James Krieger 21.05.2023 Page 15 sur 37

Page 16: Introduction - Duke's Fuqua School of Businesscharvey/Teaching/Indepen…  · Web viewI. Introduction Vector quantization (VQ) is a lossy data compression method based on the principle

“Vector Quantization and entropy technique to predict high frequency data”

For the 2 days return, we have the following repartition of the return:

[Chart: return repartition, gbp_usd, algorithm PPM p k-0_5:ds 6. Percentage of total return (gain + loss) per range of standard deviations (0 to 1, ..., more), broken down by half-year periods from 1/1/1997 to 1/1/2002.]

[Chart: return repartition, usd_chf, algorithm PPM p k-0_5:ds 6. Same axes and periods.]

For the 7 days return, we have the following repartition of the return:

[Chart: return repartition, gbp_usd, algorithm PPM p k-0_5:ds 6. Percentage of total return (gain & loss) per range of standard deviations (0 to 1, ..., more), broken down by half-year periods from 1/1/1997 to 1/1/2002.]

[Chart: return repartition, usd_chf, algorithm PPM p k-0_5:ds 6. Same axes and periods.]


On the above graphs we see that the return generated by the positive skewness is important. Also, we can see in the appendix that the skew value is always positive and relatively high for short period returns (1 day return).

Also, the algorithm "r g*count*sign ifsame" for the gbp_usd 1 day return gives very good results. Except for the range of returns from 1 to 2 standard deviations, which is negative, all the other ranges are positive with a very fat positive tail. Not surprisingly, the monthly Sharpe ratio for this case is 4.19, which is very high.


6.2.2 5 minutes data

For the 5-minute data, we found the following monthly Sharpe ratios. Due to the processing time required, only two scenarios were tested. The Sharpe ratio was calculated on a period basis (20 hours or 3 hours) and adjusted to reflect a monthly duration (assuming 20 trades per month for the 20-hour return and 100 trades for the 3-hour return). Details of the returns are available in appendix IV.

| | PPM g 12 | PPM g 4 | PPM g 8 | PPM p k0_0:ds 2 | PPM p k-0_5:ds 12 | PPM p k-0_5:ds 4 | PPM p k-0_5:ds 8 | r sign | r g | r g*count | r g*count/stdev | r g*count*sign ifsame |

20 hours, VQ 8:
| eur_usd | (4.13) | (2.33) | (3.44) | (10.46) | (8.54) | (10.68) | (9.54) | 4.81 | (4.96) | 2.56 | 2.80 | 2.64 |
| usd_chf | (4.68) | (2.65) | (4.00) | (11.86) | (9.62) | (11.92) | (10.61) | (0.56) | (2.91) | 0.51 | 0.39 | 0.24 |
| both | (4.56) | (2.56) | (3.83) | (11.40) | (9.26) | (11.53) | (10.28) | 2.18 | (4.19) | 1.59 | 1.65 | 1.48 |

3 hours, VQ 8:
| gbp_usd | (14.38) | (16.80) | (14.19) | (6.03) | 14.70 | 7.19 | 12.57 | 70.41 | 54.92 | (18.17) | (17.65) | (24.30) |
| usd_chf | 14.54 | 4.66 | 13.20 | 8.37 | 8.38 | 13.52 | 10.43 | 78.17 | 61.87 | 3.88 | 3.84 | (6.88) |
| both | 0.08 | (6.16) | (0.50) | 1.21 | 11.59 | 10.56 | 11.61 | 74.02 | 55.50 | (7.41) | (7.16) | (15.93) |
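The per-period-to-monthly adjustment described above can be sketched as follows. This is a minimal sketch, assuming i.i.d. trade returns (so the Sharpe ratio scales with the square root of the number of trades); the sample returns are hypothetical, while the trades-per-month counts (20 and 100) come from the text.

```python
import math

def monthly_sharpe(trade_returns, trades_per_month):
    """Per-trade Sharpe ratio (mean/stdev) scaled to a monthly horizon.

    Assumes successive trade returns are i.i.d., so the Sharpe ratio
    grows with the square root of the number of trades per month.
    """
    n = len(trade_returns)
    mean = sum(trade_returns) / n
    var = sum((r - mean) ** 2 for r in trade_returns) / (n - 1)
    return (mean / math.sqrt(var)) * math.sqrt(trades_per_month)

# Hypothetical 3-hour trade returns, 100 trades assumed per month.
print(monthly_sharpe([0.01, -0.005, 0.02, 0.004, -0.002], 100))
```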

The 20-hour horizon does not generate good returns. This is probably due to the data, which represent only 2.5 years of information; for a 20-hour forecast, this is probably not enough.

The 3-hour horizon generates an extraordinary return. If you look at the details in appendix IV, you will see that this return comes from the first two weeks of September 2001 (September 11!).

A closer look confirmed it:


[Figure: return repartition eur_usd, algorithm r sign. Bars show the % of total return (gain + loss) for each standard-deviation range (0 to 1 through 7 and more), by half-month period from 8/1/2001 to 11/2/2001.]

[Figure: return repartition usd_chf, algorithm r sign. Bars show the % of total return (gain + loss) for each standard-deviation range, by half-month period from 8/1/2001 to 11/2/2001.]

[Figure: return repartition eur_usd, algorithm PPM p k-1_0:ds 2. Bars show the % of total return (gain + loss) for each standard-deviation range, by half-month period from 8/1/2001 to 11/2/2001.]

[Figure: return repartition usd_chf, algorithm PPM p k-1_0:ds 2. Bars show the % of total return (gain + loss) for each standard-deviation range, by half-month period from 8/1/2001 to 11/2/2001.]

As you can see, most of the returns come from the first two weeks of September, and the returns are highly positively skewed. Without those two weeks, the returns are still positive but not highly skewed; most of the return falls within the first standard deviation.


7 Additional study

We could of course extend this study by trying different return periods and VQ sizes, especially for the 5-minute data, where only two periods were tested and both are probably too long (a 1-hour period would probably be much better). It is also necessary to check the impact of transaction costs on the returns.

More interesting studies could be done on the following issues:

1. Use more than 2 currencies. If we capture all the high-volume currencies, we can capture all the important flows, which I believe will show high redundancy.

2. Instead of using only currencies, we could add a combination of market indicators (S&P, CAC 40, etc.).

3. The “PPM g” algorithms are disappointing. But given the good results of the “r sign” algorithm, we could combine the PPM smoothing approach with “r sign” and compute multi-path-length directional returns.

4. The LBG VQ algorithm is known to be locally optimal (very good centroids) but not globally optimal (minimum error term). Other algorithms could be tested to check whether they improve the overall result.

5. We could mix multiple periods within a path. For example, we can imagine building a path from a 2-week return followed by a 1-week return, a 3-day return, and a 1-day return. This would allow us to capture longer-period patterns with the limited number of samples available.

8 Conclusion

The use of information theory to predict future returns seems promising. In this paper we showed that, with daily FX returns, we were able to generate returns with a high Sharpe ratio (above 4). In some cases, over a 5-year period, only 2 half-years generated a negative return.

We have also shown that the path length that generates high redundancy is closely related to the amount of data available. In our example, the longest path was 16 periods for a VQ of size 2. This could simplify the implementation, as a path of 16 periods is relatively small and easy to handle.

In our experiments, we unfortunately used long return periods for the 5-minute data. It is realistic to believe that a shorter period would generate much higher returns.

Overall the implementation was


9 References

Campbell R. Harvey, "Forecasting Foreign Exchange Market Returns via Entropy Based Coding: The Framework," with Arman Glodjo.

http://faculty.fuqua.duke.edu/~charvey/Research/Working_Papers/W13_Forecasting_foreign_exchange.pdf

David J.C. MacKay, “Information Theory, Inference, and Learning Algorithms”

http://www.inference.phy.cam.ac.uk/mackay/itprnn/book.html

Suzanne Bunton, “On-Line Stochastic Processes in Data Compression”

ftp://ftp.cs.washington.edu/tr/1997/03/UW-CSE-97-03-02.PS.Z

Other online resources:

About compression, including a PPM algorithm description: http://datacompression.info/index.shtml

Introduction to information theory and VQ (includes the LBG VQ): http://datacompression.info/index.shtml


10 Appendix

10.1 Appendix I: LBG design

From: Nam Phamdo
Department of Electrical and Computer Engineering
State University of New York
Stony Brook, NY
[email protected]

I. Introduction

Vector quantization (VQ) is a lossy data compression method based on the principle of block coding. It is a fixed-to-fixed length algorithm. In the early days, the design of a vector quantizer (VQ) was considered a challenging problem due to the need for multi-dimensional integration. In 1980, Linde, Buzo, and Gray (LBG) proposed a VQ design algorithm based on a training sequence. The use of a training sequence bypasses the need for multi-dimensional integration. A VQ designed with this algorithm is referred to in the literature as an LBG-VQ.

II. Preliminaries

A VQ is nothing more than an approximator. The idea is similar to that of ``rounding off'' (say, to the nearest integer). An example of a 1-dimensional VQ is shown below:

Here, every number less than -2 is approximated by -3. Every number between -2 and 0 is approximated by -1. Every number between 0 and 2 is approximated by +1. Every number greater than 2 is approximated by +3. Note that the approximate values are uniquely represented by 2 bits. This is a 1-dimensional, 2-bit VQ. It has a rate of 2 bits/dimension.
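The 1-dimensional, 2-bit quantizer just described can be written directly; a minimal sketch (the function name is mine, and the behaviour at the boundaries -2, 0, +2 is an arbitrary tie-breaking choice):

```python
def quantize_1d(x):
    """2-bit, 1-dimensional vector quantizer from the example above.

    Encoding regions are (-inf, -2), [-2, 0), [0, 2), [2, inf);
    the four codevectors -3, -1, +1, +3 fit in exactly 2 bits.
    """
    if x < -2:
        return -3
    if x < 0:
        return -1
    if x < 2:
        return 1
    return 3

# Every input is approximated by one of four values (2 bits/dimension).
print([quantize_1d(v) for v in (-2.5, -0.7, 1.9, 4.0)])  # [-3, -1, 1, 3]
```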

An example of a 2-dimensional VQ is shown below:


Here, every pair of numbers falling in a particular region is approximated by the red star associated with that region. Note that there are 16 regions and 16 red stars, each of which can be uniquely represented by 4 bits. Thus, this is a 2-dimensional, 4-bit VQ. Its rate is also 2 bits/dimension.

In the above two examples, the red stars are called codevectors and the regions defined by the blue borders are called encoding regions. The set of all codevectors is called the codebook and the set of all encoding regions is called the partition of the space.

III. Design Problem

The VQ design problem can be stated as follows. Given a vector source with known statistical properties, a distortion measure, and the number of codevectors, find a codebook (the set of all red stars) and a partition (the set of blue lines) which result in the smallest average distortion.

We assume that there is a training sequence consisting of M source vectors:

T = {x_1, x_2, ..., x_M}.

This training sequence can be obtained from some large database. For example, if the source is a speech signal, then the training sequence can be obtained by recording several long telephone conversations. M is assumed to be sufficiently large so that all the


statistical properties of the source are captured by the training sequence. We assume that the source vectors are k-dimensional, e.g.,

x_m = (x_m,1, x_m,2, ..., x_m,k),   m = 1, 2, ..., M.

Let N be the number of codevectors and let

C = {c_1, c_2, ..., c_N}

represent the codebook. Each codevector is k-dimensional, e.g.,

c_n = (c_n,1, c_n,2, ..., c_n,k),   n = 1, 2, ..., N.

Let S_n be the encoding region associated with codevector c_n and let

P = {S_1, S_2, ..., S_N}

denote the partition of the space. If the source vector x_m is in the encoding region S_n, then its approximation (denoted by Q(x_m)) is:

Q(x_m) = c_n   if x_m is in S_n.

Assuming a squared-error distortion measure, the average distortion is given by:

D_ave = (1 / (M k)) * SUM_{m=1..M} || x_m - Q(x_m) ||^2

where || e ||^2 = e_1^2 + e_2^2 + ... + e_k^2. The design problem can be succinctly stated as follows: Given T and N, find C and P such that D_ave is minimized.

IV. Optimality Criteria

If C and P are a solution to the above minimization problem, then they must satisfy the following two criteria.

Nearest Neighbor Condition:

S_n = { x : || x - c_n ||^2 <= || x - c_n' ||^2  for all n' = 1, 2, ..., N }

This condition says that the encoding region S_n should consist of all vectors that are closer to c_n than to any of the other codevectors. For those vectors lying on the boundary (blue lines), any tie-breaking procedure will do.

Centroid Condition:

c_n = ( SUM_{x_m in S_n} x_m ) / ( SUM_{x_m in S_n} 1 ),   n = 1, 2, ..., N


This condition says that the codevector c_n should be the average of all the training vectors that are in its encoding region S_n. In implementation, one should ensure that at least one training vector belongs to each encoding region (so that the denominator in the above equation is never 0).

 

V. LBG Design Algorithm

The LBG VQ design algorithm is an iterative algorithm which alternately solves the above two optimality criteria. The algorithm requires an initial codebook, obtained here by the splitting method. In this method, an initial codevector is set as the average of the entire training sequence. This codevector is then split into two. The iterative algorithm is run with these two vectors as the initial codebook. The final two codevectors are split into four, and the process is repeated until the desired number of codevectors is obtained. The algorithm is summarized below.

LBG Design Algorithm

1. Given T. Fix epsilon > 0 to be a ``small'' number.

2. Let N = 1 and

c_1* = (1/M) SUM_{m=1..M} x_m.

Calculate

D_ave* = (1 / (M k)) SUM_{m=1..M} || x_m - c_1* ||^2.

3. Splitting: For i = 1, 2, ..., N, set

c_i^(0) = (1 + epsilon) c_i*,
c_{N+i}^(0) = (1 - epsilon) c_i*.

Set N = 2N.

4. Iteration: Let D_ave^(0) = D_ave*. Set the iteration index i = 0.


i. For m = 1, 2, ..., M, find the minimum value of || x_m - c_n^(i) ||^2 over all n = 1, 2, ..., N. Let n* be the index which achieves the minimum. Set

Q(x_m) = c_{n*}^(i).

ii. For n = 1, 2, ..., N, update the codevector

c_n^(i+1) = ( SUM_{Q(x_m) = c_n^(i)} x_m ) / ( SUM_{Q(x_m) = c_n^(i)} 1 ).

iii. Set i = i + 1.

iv. Calculate

D_ave^(i) = (1 / (M k)) SUM_{m=1..M} || x_m - Q(x_m) ||^2.

v. If (D_ave^(i-1) - D_ave^(i)) / D_ave^(i-1) > epsilon, go back to Step (i).

vi. Set D_ave* = D_ave^(i). For n = 1, 2, ..., N, set

c_n* = c_n^(i)

as the final codevectors.

5. Repeat Steps 3 and 4 until the desired number of codevectors is obtained.
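The steps above can be sketched in NumPy as follows. This is a minimal sketch, not the original implementation; function and variable names are mine, and it assumes the target codebook size is a power of two (each splitting step doubles N).

```python
import numpy as np

def lbg(training, n_codevectors, eps=1e-3):
    """LBG codebook design by splitting, following the steps above.

    training: (M, k) array of source vectors.
    n_codevectors: desired codebook size (assumed a power of 2).
    """
    training = np.asarray(training, dtype=float)
    # Step 2: start from the average of the whole training sequence.
    codebook = training.mean(axis=0, keepdims=True)
    while len(codebook) < n_codevectors:
        # Step 3: split every codevector into (1+eps)c and (1-eps)c.
        codebook = np.vstack([(1 + eps) * codebook, (1 - eps) * codebook])
        prev_dist = np.inf
        while True:
            # Step (i): nearest-neighbour assignment.
            d2 = ((training[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
            nearest = d2.argmin(axis=1)
            # Steps (iv)-(v): average distortion of the current codebook
            # and relative-improvement stopping test.
            dist = d2[np.arange(len(training)), nearest].mean()
            if np.isfinite(prev_dist) and (prev_dist - dist) / prev_dist <= eps:
                break
            prev_dist = dist
            # Step (ii): centroid update (codevector kept if region empty).
            for n in range(len(codebook)):
                members = training[nearest == n]
                if len(members):
                    codebook[n] = members.mean(axis=0)
    return codebook
```

On two well-separated 1-D clusters, `lbg(data, 2)` converges to the two cluster means, which is the locally optimal behaviour the text describes.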

VI. Performance

The performance of a VQ is typically given in terms of the signal-to-distortion ratio (SDR):

SDR = 10 log10( sigma^2 / D_ave )   (in dB),


where sigma^2 is the variance of the source and D_ave is the average squared-error distortion. The higher the SDR, the better the performance. The following tables show the performance of the LBG-VQ for the memoryless Gaussian source and the first-order Gauss-Markov source with correlation coefficient 0.9. Comparisons are made with the optimal performance theoretically attainable, SDRopt, which is obtained by evaluating the rate-distortion function.

| Rate (bits/dimension) | SDR (in dB), dimension 1 through 8 | SDRopt |
| 1 | 4.4  4.4  4.5  4.7  4.8  4.8  4.9  5.0 | 6.0 |
| 2 | 9.3  9.6  9.9  10.2  10.3  ----  ----  ---- | 12.0 |
| 3 | 14.6  15.3  15.7  ----  ----  ----  ----  ---- | 18.1 |
| 4 | 20.2  21.1  ----  ----  ----  ----  ----  ---- | 24.1 |
| 5 | 26.0  27.0  ----  ----  ----  ----  ----  ---- | 30.1 |

Memoryless Gaussian Source

| Rate (bits/dimension) | SDR (in dB), dimension 1 through 8 | SDRopt |
| 1 | 4.4  7.8  9.4  10.2  10.7  11.0  11.4  11.6 | 13.2 |
| 2 | 9.3  13.6  15.0  15.8  16.2  ----  ----  ---- | 19.3 |
| 3 | 14.6  19.0  20.6  ----  ----  ----  ----  ---- | 25.3 |
| 4 | 20.2  24.8  ----  ----  ----  ----  ----  ---- | 31.3 |
| 5 | 26.0  30.7  ----  ----  ----  ----  ----  ---- | 37.3 |

First-Order Gauss-Markov Source with Correlation 0.9
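The SDR figures above follow directly from the definition; a small sketch (the function name and sample values are illustrative, not from the original):

```python
import math

def sdr_db(source, quantized):
    """SDR in dB: 10*log10(source variance / average squared-error distortion)."""
    n = len(source)
    mean = sum(source) / n
    variance = sum((x - mean) ** 2 for x in source) / n
    distortion = sum((x - q) ** 2 for x, q in zip(source, quantized)) / n
    return 10 * math.log10(variance / distortion)

# When the quantization error is 1/10 of the signal amplitude the
# error power is 1/100 of the signal power, i.e. an SDR of about 20 dB.
print(sdr_db([1, -1, 1, -1], [0.9, -0.9, 0.9, -0.9]))
```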

VII. References

1. A. Gersho and R. M. Gray, Vector Quantization and Signal Compression.
2. H. Abut, Vector Quantization.
3. R. M. Gray, ``Vector Quantization,'' IEEE ASSP Magazine, pp. 4-29, April 1984.
4. Y. Linde, A. Buzo, and R. M. Gray, ``An Algorithm for Vector Quantizer Design,'' IEEE Transactions on Communications, pp. 702-710, January 1980.


10.2 Appendix II: Update the LBG algorithm into a geometric world

To model financial information as the change in price, you need to work with the geometric mean instead of the arithmetic mean. Below I have updated the LBG VQ algorithm described by Nam Phamdo to take the use of the geometric mean into consideration.

The training sequence T = {x_1, ..., x_M} is unchanged in form, but each component of a vector is now a gross return (a price ratio) rather than a price change, and likewise for the codevectors c_n. The distortion measure

D_ave = (1 / (M k)) SUM_m || x_m - Q(x_m) ||^2

is now computed on logarithms:

D_ave = (1 / (M k)) SUM_m || ln x_m - ln Q(x_m) ||^2.

To have a distortion measure that we can compare in the geometric world we would take exp(D_ave), but this last step is not strictly necessary as our goal is to minimize the error. (Getting closest to 0 in log space is the same as getting closest to 1 after exponentiation.)

The nearest neighbor condition

S_n = { x : || x - c_n ||^2 <= || x - c_n' ||^2 for all n' }

is now:

S_n = { x : || ln x - ln c_n ||^2 <= || ln x - ln c_n' ||^2 for all n' }.

The centroid condition (the arithmetic mean of the training vectors in S_n) is now:

c_n = ( PRODUCT_{x_m in S_n} x_m )^(1 / |S_n|).

This is the geometric mean, and it can be expressed as:

c_n = exp( (1 / |S_n|) SUM_{x_m in S_n} ln x_m ).
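A minimal sketch of this geometric centroid (the component-wise geometric mean, computed as the exponential of the average log; the function name is mine):

```python
import math

def geometric_centroid(vectors):
    """Component-wise geometric mean of the (positive) training vectors
    in an encoding region: exp of the arithmetic mean of the logs."""
    n, k = len(vectors), len(vectors[0])
    return [math.exp(sum(math.log(v[j]) for v in vectors) / n)
            for j in range(k)]

# For gross returns 1.02 and 0.98 the centroid is sqrt(1.02 * 0.98),
# slightly below the arithmetic mean of 1.00.
print(geometric_centroid([[1.02], [0.98]]))
```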


10.3 Appendix III: Redundancy results

Daily data: redundancy by path length, for returns over 1 day, 2 days, and 7 days. The annotation in parentheses marks the path duration at the redundancy peak.

VQ size 2:
| path length | 1-day return | 2-day return | 7-day return |
| 2 | 5.8% | 2.0% | 0.0% |
| 3 | 5.9% | 2.0% | 0.1% |
| 4 | 6.0% | 2.0% | 0.1% |
| 5 | 6.1% | 2.1% | 0.2% |
| 6 | 6.2% | 2.2% | 0.5% |
| 7 | 6.4% | 2.3% | 0.8% |
| 8 | 6.6% (8 d.) | 2.6% | 1.3% |
| 9 | 6.5% | 3.0% | 1.9% |
| 10 | 6.0% | 3.2% (20 d.) | 2.5% |
| 11 | 4.8% | 2.8% | 2.5% (2.5 months) |
| 12 | 3.6% | 2.0% | 2.1% |
| 13 | 2.4% | 1.3% | 1.6% |
| 14 | 1.5% | 0.7% | 1.3% |

VQ size 4:
| path length | 1-day return | 2-day return | 7-day return |
| 2 | 17.1% | 12.4% | 10.3% |
| 3 | 17.3% (3 d.) | 12.6% (6 days) | 10.5% (3 weeks) |
| 4 | 16.7% | 12.4% | 10.3% |
| 5 | 13.3% | 10.2% | 8.5% |
| 6 | 9.1% | 6.7% | 5.7% |
| 7 | 5.5% | 3.7% | 3.2% |

VQ size 8:
| path length | 1-day return | 2-day return | 7-day return |
| 2 | 18.5% (2 d.) | 11.2% (4 days) | 6.7% (2 weeks) |
| 3 | 16.2% | 10.0% | 6.6% |
| 4 | 10.5% | 5.7% | 4.0% |
| 5 | 5.0% | 2.1% | 1.6% |
| 6 | 2.0% | 0.6% | 0.9% |
| 7 | 0.7% | 0.1% | 0.6% |


5-minute data: redundancy by path length, for returns over 3 hours and 20 hours.

VQ size 2:
| path length | 3-hour return | 20-hour return |
| 2 | 1.1% | 0.8% |
| ... | ... | ... |
| 10 | 1.4% | 2.5% |
| 11 | 1.5% | 3.3% |
| 12 | 1.7% | 4.2% |
| 13 | 1.9% | 5.0% |
| 14 | 2.2% | 5.4% |
| 15 | 2.5% | 5.5% (12.5 days) |
| 16 | 2.5% (2 days) | 5.5% |
| 17 | 2.3% | 5.4% |
| 18 | 2.0% | 5.3% |
| 19 | 1.7% | 5.1% |

VQ size 4:
| path length | 3-hour return | 20-hour return |
| 2 | 17.4% | 8.6% |
| 3 | 17.5% | 8.8% |
| 4 | 17.6% (12 hours) | 9.1% |
| 5 | 17.4% | 9.4% (4.2 days) |
| 6 | 16.0% | 9.0% |
| 7 | 13.3% | 7.9% |

VQ size 8:
| path length | 3-hour return | 20-hour return |
| 2 | 17.7% (6 hours) | 10.4% (40 hours) |
| 3 | 17.5% | 10.1% |
| 4 | 15.4% | 8.8% |
| 5 | 11.4% | 6.8% |
| 6 | 6.9% | 5.1% |
| 7 | 3.5% | 4.1% |


Based on the number of historical samples available, we can compute the average number of samples available for each possible path, assuming a perfectly random repartition.

In grey: the number of samples for the combination of path length and VQ size that generated the maximum redundancy. For the daily data we have 4,000 samples, and 17,000 for the 5-minute data.

Daily data (4000 samples):
| path length | VQ size 2 | VQ size 4 | VQ size 8 |
| 2 | 1000 | 250 | 63 |
| 3 | 500 | 63 | 8 |
| 4 | 250 | 16 | 1 |
| 5 | 125 | 4 | 0 |
| 6 | 63 | 1 | 0 |
| 7 | 31 | 0 | 0 |
| 8 | 16 | 0 | 0 |
| 9 | 8 | 0 | 0 |
| 10 | 4 | 0 | 0 |
| 11 | 2 | 0 | 0 |
| 12 | 1 | 0 | 0 |
| 13 | 0 | 0 | 0 |
| 14 | 0 | 0 | 0 |
| 15 | 0 | 0 | 0 |
| 16 | 0 | 0 | 0 |

5-minute data (17000 samples):
| path length | VQ size 2 | VQ size 4 | VQ size 8 |
| 2 | 4250 | 1063 | 266 |
| 3 | 2125 | 266 | 33 |
| 4 | 1063 | 66 | 4 |
| 5 | 531 | 17 | 1 |
| 6 | 266 | 4 | 0 |
| 7 | 133 | 1 | 0 |
| 8 | 66 | 0 | 0 |
| 9 | 33 | 0 | 0 |
| 10 | 17 | 0 | 0 |
| 11 | 8 | 0 | 0 |
| 12 | 4 | 0 | 0 |
| 13 | 2 | 0 | 0 |
| 14 | 1 | 0 | 0 |
| 15 | 1 | 0 | 0 |
| 16 | 0 | 0 | 0 |
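The table values above can be reproduced by dividing the sample count by the number of possible paths; a minimal sketch (the function name is mine, and the half-up rounding is an assumption chosen to match the tables):

```python
def avg_samples(n_samples, vq_size, path_length):
    """Average number of samples per possible path, assuming the
    vq_size**path_length paths are equally likely.

    Rounded half-up, matching the tables above.
    """
    return int(n_samples / vq_size ** path_length + 0.5)

# Daily data: 4000 samples, VQ size 2, path length 2.
print(avg_samples(4000, 2, 2))    # 1000
# 5-minute data: 17000 samples, VQ size 4, path length 2.
print(avg_samples(17000, 4, 2))   # 1063
```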


10.4 Appendix IV: Return results


| from | to | PPM g 2 | PPM g 4 | PPM g 6 | PPM p k0_0:ds 2 | PPM p k-0_5:ds 2 | PPM p k-0_5:ds 4 | PPM p k-0_5:ds 6 | PPM p k-1_0:ds 2 | r sign | r g | r g*count | r g*count/stdev | r g*count*sign ifsame |
| | | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
| stdev | | 9.88E-07 | 1.46E-06 | 1.76E-06 | 1.81E-06 | 1.92E-06 | 2.43E-06 | 2.79E-06 | 2.06E-06 | 0.000579 | 4.05E-06 | 0.000433 | 0.105969 | 7.16E-05 |

182.625 gbp_usd 1826
| 1/1/1997 | 7/2/1997 | 0.086 | 0.091 | 0.093 | 0.070 | 0.062 | 0.057 | 0.054 | 0.054 | 0.065 | 0.082 | 0.068 | 0.065 | 0.083 |
| 7/2/1997 | 1/1/1998 | 0.149 | 0.141 | 0.132 | 0.075 | 0.053 | 0.021 | 0.003 | 0.033 | 0.084 | 0.052 | 0.119 | 0.127 | 0.167 |
| 1/1/1998 | 7/2/1998 | 0.092 | 0.091 | 0.089 | 0.027 | 0.019 | 0.003 | (0.009) | 0.011 | 0.100 | 0.162 | 0.031 | 0.024 | 0.141 |
| 7/2/1998 | 1/1/1999 | 0.031 | 0.041 | 0.049 | 0.046 | 0.046 | 0.048 | 0.050 | 0.046 | 0.016 | 0.008 | 0.003 | 0.000 | (0.040) |
| 1/1/1999 | 7/3/1999 | (0.052) | (0.043) | (0.036) | 0.023 | 0.027 | 0.027 | 0.023 | 0.030 | (0.016) | 0.003 | (0.060) | (0.065) | (0.056) |
| 7/3/1999 | 1/1/2000 | 0.030 | 0.030 | 0.029 | 0.033 | 0.029 | 0.029 | 0.025 | 0.026 | (0.008) | 0.026 | 0.065 | 0.069 | 0.045 |
| 1/1/2000 | 7/2/2000 | (0.090) | (0.063) | (0.048) | (0.092) | (0.095) | (0.096) | (0.098) | (0.096) | 0.058 | 0.033 | (0.135) | (0.146) | (0.030) |
| 7/2/2000 | 1/1/2001 | 0.046 | 0.065 | 0.077 | 0.136 | 0.140 | 0.163 | 0.174 | 0.143 | 0.207 | 0.138 | 0.089 | 0.071 | 0.154 |
| 1/1/2001 | 7/2/2001 | 0.025 | 0.015 | 0.010 | 0.007 | 0.015 | 0.020 | 0.028 | 0.023 | (0.037) | (0.083) | 0.046 | 0.054 | (0.015) |
| 7/2/2001 | 1/1/2002 | (0.053) | (0.043) | (0.037) | (0.009) | (0.010) | (0.002) | 0.002 | (0.011) | (0.014) | (0.017) | (0.032) | (0.033) | 0.014 |
| 1/1/1997 | 1/1/1999 | 0.092 | 0.094 | 0.093 | 0.059 | 0.049 | 0.037 | 0.029 | 0.041 | 0.067 | 0.078 | 0.060 | 0.059 | 0.089 |
| 1/1/1997 | 1/1/2002 | 0.027 | 0.034 | 0.037 | 0.033 | 0.030 | 0.029 | 0.027 | 0.028 | 0.046 | 0.041 | 0.021 | 0.019 | 0.047 |
| skew | | 1.076 | 1.221 | 1.325 | 1.621 | 1.672 | 1.898 | 1.986 | 1.691 | 0.938 | 5.620 | 0.906 | 0.888 | 2.454 |
| stdev | | 1.34E-06 | 1.99E-06 | 2.43E-06 | 3.49E-06 | 3.72E-06 | 4.71E-06 | 5.42E-06 | 3.99E-06 | 0.000701 | 4.95E-06 | 0.000626 | 0.11438 | 7.35E-05 |

182.625 usd_chf 3652
| 1/1/1997 | 7/2/1997 | 0.042 | 0.045 | 0.045 | (0.003) | (0.009) | (0.012) | (0.014) | (0.014) | (0.058) | (0.010) | 0.060 | 0.064 | (0.023) |
| 7/2/1997 | 1/1/1998 | (0.058) | (0.069) | (0.076) | (0.084) | (0.090) | (0.100) | (0.103) | (0.095) | (0.012) | (0.020) | (0.034) | (0.037) | (0.019) |
| 1/1/1998 | 7/2/1998 | 0.028 | 0.033 | 0.035 | 0.040 | 0.034 | 0.030 | 0.026 | 0.028 | 0.011 | 0.033 | (0.061) | (0.064) | (0.011) |
| 7/2/1998 | 1/1/1999 | 0.132 | 0.142 | 0.149 | 0.111 | 0.121 | 0.124 | 0.124 | 0.129 | 0.155 | 0.036 | 0.046 | 0.051 | 0.104 |
| 1/1/1999 | 7/3/1999 | (0.038) | (0.028) | (0.021) | 0.018 | 0.015 | 0.030 | 0.037 | 0.012 | 0.053 | 0.032 | (0.045) | (0.052) | (0.026) |
| 7/3/1999 | 1/1/2000 | 0.013 | 0.015 | 0.017 | 0.037 | 0.044 | 0.059 | 0.069 | 0.050 | 0.030 | 0.063 | 0.026 | 0.024 | 0.031 |
| 1/1/2000 | 7/2/2000 | 0.096 | 0.101 | 0.103 | 0.110 | 0.105 | 0.101 | 0.097 | 0.100 | 0.080 | 0.046 | 0.054 | 0.068 | 0.036 |
| 7/2/2000 | 1/1/2001 | 0.081 | 0.088 | 0.091 | 0.096 | 0.098 | 0.109 | 0.114 | 0.098 | 0.084 | 0.111 | 0.115 | 0.108 | 0.110 |
| 1/1/2001 | 7/2/2001 | (0.056) | (0.064) | (0.068) | (0.062) | (0.050) | (0.034) | (0.022) | (0.039) | 0.001 | (0.056) | (0.000) | 0.017 | (0.003) |
| 7/2/2001 | 1/1/2002 | (0.096) | (0.093) | (0.090) | (0.079) | (0.077) | (0.064) | (0.052) | (0.075) | (0.013) | (0.052) | (0.094) | (0.095) | (0.072) |
| 1/1/1997 | 1/1/1998 | (0.009) | (0.013) | (0.016) | (0.044) | (0.050) | (0.057) | (0.059) | (0.055) | (0.035) | (0.016) | 0.012 | 0.013 | (0.021) |
| 1/1/1997 | 1/1/2002 | 0.015 | 0.017 | 0.019 | 0.019 | 0.020 | 0.025 | 0.028 | 0.020 | 0.034 | 0.019 | 0.007 | 0.009 | 0.013 |
| skew | | 0.702 | 0.719 | 0.792 | 0.448 | 0.798 | 1.162 | 1.410 | 1.082 | 1.204 | 1.341 | 1.033 | 1.068 | 0.871 |

Both currencies (equally weighted)
| 1/1/1997 | 1/1/2002 | 0.024 | 0.029 | 0.032 | 0.030 | 0.028 | 0.030 | 0.031 | 0.027 | 0.049 | 0.036 | 0.017 | 0.016 | 0.037 |
| skew | | 0.880 | 0.985 | 1.105 | 1.062 | 1.338 | 1.736 | 1.964 | 1.554 | 0.854 | 2.413 | 0.715 | 0.723 | 1.103 |
| monthly Sharpe ratio | | 2.162 | 2.625 | 2.885 | 2.651 | 2.536 | 2.723 | 2.790 | 2.414 | 4.410 | 3.194 | 1.500 | 1.447 | 3.315 |

Daily data, 1-day return, VQ of size 4. Daily Sharpe ratio (return/stdev over the overall period).


| from | to | PPM g 2 | PPM g 4 | PPM g 6 | PPM p k0_0:ds 2 | PPM p k-0_5:ds 2 | PPM p k-0_5:ds 4 | PPM p k-0_5:ds 6 | PPM p k-1_0:ds 2 | r sign | r g | r g*count | r g*count/stdev | r g*count*sign ifsame |
| | | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
| stdev | | 1.93E-06 | 2.72E-06 | 3.21E-06 | 3.78E-06 | 4.02E-06 | 5.28E-06 | 6.23E-06 | 4.3E-06 | 0.000894 | 7.06E-06 | 0.000736 | 0.121116 | 0.000141 |

182.625 gbp_usd 1824
| 1/1/1997 | 7/2/1997 | 0.007 | 0.005 | 0.002 | (0.058) | (0.059) | (0.084) | (0.097) | (0.060) | (0.017) | (0.028) | 0.006 | 0.007 | 0.003 |
| 7/2/1997 | 1/1/1998 | 0.096 | 0.096 | 0.094 | 0.030 | 0.025 | (0.001) | (0.010) | 0.020 | (0.022) | 0.029 | 0.056 | 0.046 | 0.022 |
| 1/1/1998 | 7/2/1998 | 0.106 | 0.107 | 0.109 | 0.092 | 0.089 | 0.078 | 0.070 | 0.085 | 0.067 | 0.102 | 0.016 | 0.011 | 0.002 |
| 7/2/1998 | 1/1/1999 | 0.048 | 0.057 | 0.063 | 0.086 | 0.096 | 0.119 | 0.132 | 0.103 | 0.076 | 0.120 | 0.092 | 0.090 | 0.122 |
| 1/1/1999 | 7/3/1999 | (0.028) | (0.034) | (0.037) | 0.017 | 0.022 | 0.030 | 0.033 | 0.026 | (0.056) | (0.043) | (0.079) | (0.078) | (0.086) |
| 7/3/1999 | 1/1/2000 | 0.195 | 0.198 | 0.197 | 0.171 | 0.162 | 0.136 | 0.120 | 0.153 | 0.089 | 0.159 | 0.183 | 0.174 | 0.120 |
| 1/1/2000 | 7/2/2000 | 0.021 | 0.029 | 0.035 | 0.056 | 0.071 | 0.091 | 0.103 | 0.084 | 0.021 | 0.038 | 0.007 | 0.009 | 0.031 |
| 7/2/2000 | 1/1/2001 | (0.062) | (0.058) | (0.054) | (0.043) | (0.031) | (0.016) | (0.004) | (0.020) | (0.057) | (0.065) | (0.139) | (0.142) | (0.156) |
| 1/1/2001 | 7/2/2001 | (0.077) | (0.068) | (0.060) | (0.021) | (0.016) | (0.008) | (0.005) | (0.012) | 0.099 | (0.009) | (0.048) | (0.056) | 0.023 |
| 7/2/2001 | 1/1/2002 | (0.023) | (0.015) | (0.010) | 0.019 | 0.021 | 0.034 | 0.040 | 0.023 | (0.038) | (0.031) | (0.022) | (0.020) | (0.045) |
| 1/1/1997 | 1/1/1999 | 0.068 | 0.070 | 0.071 | 0.044 | 0.044 | 0.035 | 0.030 | 0.044 | 0.029 | 0.061 | 0.048 | 0.044 | 0.040 |
| 1/1/1997 | 1/1/2002 | 0.031 | 0.035 | 0.037 | 0.040 | 0.043 | 0.043 | 0.044 | 0.046 | 0.019 | 0.031 | 0.010 | 0.007 | 0.005 |
| skew | | 0.578 | 0.565 | 0.574 | 0.521 | 0.620 | 0.725 | 0.823 | 0.716 | 0.580 | 0.575 | 0.441 | 0.503 | 0.432 |
| stdev | | 1.99E-06 | 2.93E-06 | 3.6E-06 | 5.53E-06 | 5.77E-06 | 7.43E-06 | 8.67E-06 | 6.05E-06 | 0.000968 | 8.74E-06 | 0.000964 | 0.11583 | 9.39E-05 |

182.625 usd_chf 3648
| 1/1/1997 | 7/2/1997 | (0.067) | (0.061) | (0.061) | (0.104) | (0.105) | (0.127) | (0.138) | (0.105) | (0.019) | (0.015) | (0.003) | (0.012) | (0.047) |
| 7/2/1997 | 1/1/1998 | (0.030) | (0.029) | (0.029) | (0.037) | (0.034) | (0.034) | (0.033) | (0.031) | (0.078) | (0.029) | (0.037) | (0.043) | (0.054) |
| 1/1/1998 | 7/2/1998 | 0.005 | (0.003) | (0.009) | 0.028 | 0.033 | 0.034 | 0.035 | 0.037 | (0.097) | (0.109) | (0.077) | (0.082) | (0.050) |
| 7/2/1998 | 1/1/1999 | 0.061 | 0.047 | 0.038 | 0.097 | 0.094 | 0.096 | 0.094 | 0.089 | (0.002) | 0.171 | 0.146 | 0.146 | 0.067 |
| 1/1/1999 | 7/3/1999 | (0.189) | (0.165) | (0.148) | (0.104) | (0.089) | (0.054) | (0.037) | (0.075) | 0.035 | (0.104) | (0.180) | (0.175) | (0.050) |
| 7/3/1999 | 1/1/2000 | 0.055 | 0.061 | 0.063 | 0.059 | 0.057 | 0.054 | 0.050 | 0.055 | 0.021 | 0.060 | 0.126 | 0.122 | 0.078 |
| 1/1/2000 | 7/2/2000 | (0.037) | (0.028) | (0.022) | 0.059 | 0.066 | 0.079 | 0.082 | 0.072 | (0.068) | (0.055) | (0.063) | (0.062) | (0.015) |
| 7/2/2000 | 1/1/2001 | (0.100) | (0.102) | (0.103) | (0.104) | (0.103) | (0.090) | (0.083) | (0.102) | (0.136) | (0.096) | (0.130) | (0.138) | (0.112) |
| 1/1/2001 | 7/2/2001 | (0.037) | (0.032) | (0.025) | (0.076) | (0.066) | (0.042) | (0.027) | (0.056) | 0.159 | (0.003) | (0.038) | (0.033) | 0.035 |
| 7/2/2001 | 1/1/2002 | (0.040) | (0.033) | (0.030) | (0.045) | (0.046) | (0.049) | (0.050) | (0.046) | (0.074) | 0.032 | (0.022) | (0.023) | 0.015 |
| 1/1/1997 | 1/1/1998 | (0.048) | (0.045) | (0.044) | (0.070) | (0.068) | (0.080) | (0.085) | (0.067) | (0.048) | (0.022) | (0.020) | (0.027) | (0.050) |
| 1/1/1997 | 1/1/2002 | (0.038) | (0.035) | (0.033) | (0.023) | (0.019) | (0.013) | (0.011) | (0.016) | (0.027) | (0.016) | (0.028) | (0.030) | (0.014) |
| skew | | 0.301 | 0.194 | 0.162 | 0.430 | 0.376 | 0.358 | 0.301 | 0.314 | 0.395 | (0.210) | 0.877 | 1.063 | 1.313 |

Both currencies (equally weighted)
| 1/1/1997 | 1/1/2002 | (0.004) | 0.000 | 0.003 | 0.010 | 0.014 | 0.018 | 0.020 | 0.018 | (0.005) | 0.009 | (0.011) | (0.014) | (0.005) |
| skew | | 0.609 | 0.488 | 0.429 | 0.487 | 0.500 | 0.598 | 0.652 | 0.506 | 0.033 | (0.124) | 1.028 | 1.239 | 0.619 |
| monthly Sharpe ratio | | (0.367) | 0.018 | 0.248 | 0.909 | 1.270 | 1.601 | 1.765 | 1.584 | (0.442) | 0.824 | (0.964) | (1.247) | (0.474) |

Daily data, 2-day return, VQ of size 4. Daily Sharpe ratio (return/stdev over the overall period).


| from | to | PPM g 2 | PPM g 4 | PPM g 6 | PPM p k0_0:ds 2 | PPM p k-0_5:ds 2 | PPM p k-0_5:ds 4 | PPM p k-0_5:ds 6 | PPM p k-1_0:ds 2 | r sign | r g | r g*count | r g*count/stdev | r g*count*sign ifsame |
| | | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
| stdev | | 6.03E-06 | 9.14E-06 | 1.13E-05 | 1.33E-05 | 1.38E-05 | 1.83E-05 | 2.15E-05 | 1.44E-05 | 0.002292 | 3.17E-05 | 0.002644 | 0.217239 | 0.000627 |

182.625 gbp_usd 1820
| 1/1/1997 | 7/2/1997 | (0.063) | (0.069) | (0.072) | (0.128) | (0.130) | (0.152) | (0.161) | (0.132) | (0.104) | (0.092) | (0.090) | (0.081) | (0.192) |
| 7/2/1997 | 1/1/1998 | 0.095 | 0.108 | 0.107 | 0.222 | 0.216 | 0.174 | 0.148 | 0.210 | (0.139) | (0.075) | 0.027 | 0.033 | 0.008 |
| 1/1/1998 | 7/2/1998 | 0.108 | 0.138 | 0.149 | 0.183 | 0.179 | 0.163 | 0.147 | 0.175 | 0.018 | 0.195 | 0.173 | 0.138 | 0.095 |
| 7/2/1998 | 1/1/1999 | 0.062 | 0.082 | 0.096 | 0.137 | 0.140 | 0.159 | 0.168 | 0.142 | 0.079 | 0.168 | 0.184 | 0.161 | 0.175 |
| 1/1/1999 | 7/3/1999 | (0.152) | (0.128) | (0.110) | (0.062) | (0.049) | (0.019) | (0.006) | (0.036) | (0.122) | (0.049) | (0.142) | (0.149) | (0.171) |
| 7/3/1999 | 1/1/2000 | (0.069) | (0.066) | (0.064) | 0.016 | 0.013 | 0.035 | 0.047 | 0.011 | 0.150 | 0.111 | 0.036 | 0.036 | 0.047 |
| 1/1/2000 | 7/2/2000 | 0.052 | 0.085 | 0.104 | 0.128 | 0.144 | 0.135 | 0.129 | 0.157 | (0.106) | 0.036 | 0.046 | 0.042 | 0.000 |
| 7/2/2000 | 1/1/2001 | (0.115) | (0.105) | (0.103) | (0.107) | (0.098) | (0.108) | (0.114) | (0.090) | (0.179) | (0.114) | (0.226) | (0.242) | (0.152) |
| 1/1/2001 | 7/2/2001 | (0.112) | (0.107) | (0.094) | 0.002 | 0.015 | 0.038 | 0.048 | 0.027 | (0.088) | (0.010) | (0.043) | (0.069) | (0.092) |
| 7/2/2001 | 1/1/2002 | (0.024) | (0.012) | (0.008) | (0.009) | (0.023) | (0.022) | (0.023) | (0.037) | 0.083 | 0.074 | 0.078 | 0.068 | 0.094 |
| 1/1/1997 | 1/1/1999 | 0.050 | 0.064 | 0.069 | 0.097 | 0.094 | 0.079 | 0.069 | 0.091 | (0.035) | 0.047 | 0.071 | 0.061 | 0.019 |
| 1/1/1997 | 1/1/2002 | (0.021) | (0.007) | 0.001 | 0.036 | 0.039 | 0.038 | 0.036 | 0.040 | (0.040) | 0.024 | 0.004 | (0.006) | (0.019) |
| skew | | 0.491 | 0.581 | 0.603 | 0.695 | 0.767 | 0.496 | 0.347 | 0.824 | (0.796) | (0.424) | (0.610) | (0.663) | (0.318) |
| stdev | | 6.54E-06 | 9.85E-06 | 1.24E-05 | 1.89E-05 | 1.95E-05 | 2.66E-05 | 3.18E-05 | 2.03E-05 | 0.002123 | 3.32E-05 | 0.003388 | 0.217726 | 0.00062 |

182.625 usd_chf 3640
| 1/1/1997 | 7/2/1997 | (0.380) | (0.305) | (0.262) | (0.258) | (0.251) | (0.223) | (0.209) | (0.244) | (0.145) | (0.227) | (0.110) | (0.114) | (0.059) |
| 7/2/1997 | 1/1/1998 | (0.044) | (0.023) | (0.011) | 0.067 | 0.076 | 0.092 | 0.103 | 0.084 | (0.070) | 0.000 | (0.007) | (0.002) | (0.011) |
| 1/1/1998 | 7/2/1998 | (0.053) | (0.036) | (0.030) | (0.026) | (0.016) | (0.006) | (0.006) | (0.007) | 0.023 | (0.006) | (0.058) | (0.053) | (0.005) |
| 7/2/1998 | 1/1/1999 | 0.040 | 0.030 | 0.033 | 0.155 | 0.143 | 0.160 | 0.175 | 0.131 | 0.118 | (0.089) | (0.022) | (0.020) | 0.005 |
| 1/1/1999 | 7/3/1999 | (0.152) | (0.084) | (0.049) | (0.126) | (0.116) | (0.073) | (0.054) | (0.107) | (0.049) | (0.078) | (0.082) | (0.091) | (0.081) |
| 7/3/1999 | 1/1/2000 | (0.030) | (0.010) | 0.001 | 0.097 | 0.101 | 0.116 | 0.121 | 0.103 | 0.128 | 0.043 | 0.043 | 0.050 | 0.059 |
| 1/1/2000 | 7/2/2000 | 0.085 | 0.133 | 0.162 | 0.236 | 0.247 | 0.302 | 0.318 | 0.256 | 0.282 | 0.368 | 0.348 | 0.362 | 0.297 |
| 7/2/2000 | 1/1/2001 | (0.068) | (0.047) | (0.038) | (0.038) | (0.036) | (0.029) | (0.027) | (0.034) | (0.029) | 0.054 | 0.038 | 0.035 | (0.010) |
| 1/1/2001 | 7/2/2001 | 0.062 | 0.032 | 0.018 | (0.056) | (0.050) | (0.043) | (0.041) | (0.044) | 0.018 | (0.046) | 0.038 | 0.034 | 0.023 |
| 7/2/2001 | 1/1/2002 | (0.210) | (0.191) | (0.183) | (0.147) | (0.161) | (0.168) | (0.170) | (0.173) | (0.059) | (0.075) | (0.146) | (0.153) | (0.078) |
| 1/1/1997 | 1/1/1998 | (0.213) | (0.165) | (0.138) | (0.096) | (0.088) | (0.066) | (0.054) | (0.081) | (0.108) | (0.114) | (0.060) | (0.059) | (0.036) |
| 1/1/1997 | 1/1/2002 | (0.074) | (0.050) | (0.036) | (0.010) | (0.007) | 0.013 | 0.021 | (0.004) | 0.022 | (0.006) | 0.004 | 0.005 | 0.015 |
| skew | | (0.184) | (0.140) | (0.130) | (0.070) | (0.109) | (0.040) | 0.039 | (0.139) | 0.281 | (0.019) | (0.030) | (0.059) | (0.399) |

Both currencies (equally weighted)
| 1/1/1997 | 1/1/2002 | (0.060) | (0.036) | (0.022) | 0.016 | 0.019 | 0.030 | 0.033 | 0.021 | (0.011) | 0.012 | 0.005 | (0.001) | (0.003) |
| skew | | 0.093 | 0.056 | 0.039 | 0.061 | 0.050 | (0.010) | (0.010) | 0.042 | (0.246) | (0.121) | (0.561) | (0.617) | (0.517) |
| monthly Sharpe ratio | | (5.344) | (3.204) | (1.976) | 1.388 | 1.666 | 2.647 | 2.967 | 1.913 | (1.023) | 1.057 | 0.473 | (0.088) | (0.273) |

Daily data, 7-day returns, VQ of size 4. Daily Sharpe ratio (return/stdev over the overall period).
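The summary rows in the tables above can be reproduced with a few lines of code. This is a minimal sketch under two assumptions stated by the captions and standard conventions, not a transcript of the study's own code: the Sharpe ratio is the mean period return divided by the standard deviation over the overall period, and skew is the usual third standardized moment. The helper name `summary_rows` and the sample return series are illustrative only.

```python
# Minimal sketch of the skew / stdev / Sharpe summary rows (assumptions:
# Sharpe = mean return / stdev over the whole period, as the caption states;
# skew = third standardized moment). Names and data are illustrative.
from statistics import mean, pstdev

def summary_rows(returns):
    """Compute the summary rows for one strategy column of the table."""
    mu = mean(returns)
    sd = pstdev(returns)  # population stdev over the overall period
    skew = mean(((r - mu) / sd) ** 3 for r in returns)  # third standardized moment
    return {"skew": skew, "stdev": sd, "sharpe": mu / sd}

# Illustrative six-period return series (not a column taken from the tables)
stats = summary_rows([0.095, 0.108, -0.139, 0.062, -0.152, 0.052])
```

Applying the same helper to every strategy column of a table yields one skew, stdev, and Sharpe row per table, matching the layout used throughout this appendix.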


from  to  (columns 1-2); strategy columns 3-16:
 3  PPM g 12
 4  PPM g 2
 5  PPM g 4
 6  PPM g 8
 7  PPM p k0_0:ds 2
 8  PPM p k-0_5:ds 12
 9  PPM p k-0_5:ds 4
10  PPM p k-0_5:ds 8
11  PPM p k-1_0:ds 2
12  r sign
13  r g
14  r g*count
15  r g*count/stdev
16  r g*count*sign ifsame

stdev 2.45E-06 2.05E-06 2.13E-06 2.29E-06 2.66E-06 4.63E-06 3.19E-06 4E-06 2.66E-06 0.000733 4.34E-06 0.026474 4.084671 0.004931

30.5 eur_usd 129
8/1/2001 9/1/2001 (0.398) (0.353) (0.381) (0.398) (0.469) (0.296) (0.419) (0.342) (0.465) (0.231) (0.260) (0.262) (0.258) (0.322)
9/1/2001 10/1/2001 (0.218) (0.184) (0.202) (0.217) (0.190) (0.041) (0.151) (0.084) (0.184) (0.166) (0.130) (0.107) (0.105) (0.123)
10/1/2001 11/1/2001 0.122 0.069 0.093 0.115 0.021 (0.050) (0.008) (0.038) 0.019 0.188 0.040 0.219 0.235 0.327
11/1/2001 12/1/2001 0.024 0.153 0.113 0.058 (0.112) (0.004) (0.083) (0.033) (0.110) 0.235 (0.145) (0.021) (0.012) 0.019
12/1/2001 1/1/2002 (0.048) (0.052) (0.052) (0.050) 0.009 (0.126) (0.031) (0.089) 0.007 0.046 0.074 0.023 0.014 (0.018)
1/1/2002 1/31/2002 0.247 0.209 0.233 0.249 0.083 (0.102) (0.005) (0.076) 0.077 0.279 0.227 0.382 0.362 0.313

8/1/2001 1/31/2002 (0.046) (0.016) (0.026) (0.038) (0.117) (0.095) (0.119) (0.107) (0.117) 0.054 (0.055) 0.029 0.031 0.030
skew (0.741) (0.170) (0.374) (0.603) (1.179) (0.557) (1.280) (0.886) (1.179) 0.191 (0.788) (1.149) (1.152) (1.841)
stdev 2.15E-06 1.65E-06 1.77E-06 1.97E-06 2.51E-06 4.78E-06 3.13E-06 4.07E-06 2.51E-06 0.000969 4.87E-06 0.021877 3.414314 0.00475

30.5 usd_chf 258
8/1/2001 9/1/2001 (0.385) (0.307) (0.345) (0.375) (0.455) (0.320) (0.420) (0.358) (0.453) (0.180) (0.187) (0.273) (0.278) (0.263)
9/1/2001 10/1/2001 (0.184) (0.177) (0.184) (0.188) (0.235) (0.068) (0.191) (0.115) (0.231) (0.242) 0.033 (0.151) (0.150) (0.170)
10/1/2001 11/1/2001 0.134 0.053 0.089 0.122 0.121 0.037 0.092 0.056 0.121 0.116 (0.022) 0.279 0.306 0.299
11/1/2001 12/1/2001 0.154 0.192 0.177 0.161 (0.127) 0.073 (0.057) 0.029 (0.123) 0.271 0.085 0.048 0.035 0.082
12/1/2001 1/1/2002 (0.116) (0.007) (0.048) (0.093) (0.032) (0.250) (0.115) (0.203) (0.038) (0.111) (0.057) 0.003 (0.002) (0.077)
1/1/2002 1/31/2002 0.051 0.097 0.087 0.068 (0.027) (0.161) (0.095) (0.145) (0.032) 0.104 0.003 0.157 0.137 0.140

8/1/2001 1/31/2002 (0.052) (0.016) (0.030) (0.045) (0.133) (0.108) (0.133) (0.119) (0.133) (0.006) (0.033) 0.006 0.004 0.003
skew (0.309) (0.103) (0.242) (0.311) (0.880) (0.271) (0.956) (0.591) (0.884) 0.848 0.858 (1.111) (1.016) (1.043)

Both currencies (equally weighted)
8/1/2001 1/31/2002 (0.051) (0.016) (0.029) (0.043) (0.127) (0.103) (0.129) (0.115) (0.127) 0.024 (0.047) 0.018 0.018 0.017
skew (0.789) (0.214) (0.437) (0.665) (1.112) (0.418) (1.186) (0.764) (1.113) 0.557 (0.446) (1.459) (1.385) (1.637)
monthly Sharpe ratio (4.556) (1.469) (2.559) (3.834) (11.398) (9.256) (11.535) (10.278) (11.394) 2.179 (4.193) 1.590 1.647 1.478

5-minute data, 20-hour returns, VQ of size 8. 20-hour Sharpe ratio (return/stdev over the overall period).


from  to  (columns 1-2); strategy columns 3-16:
 3  PPM g 12
 4  PPM g 2
 5  PPM g 4
 6  PPM g 8
 7  PPM p k0_0:ds 2
 8  PPM p k-0_5:ds 12
 9  PPM p k-0_5:ds 4
10  PPM p k-0_5:ds 8
11  PPM p k-1_0:ds 2
12  r sign
13  r g
14  r g*count
15  r g*count/stdev
16  r g*count*sign ifsame

stdev 2.14E-07 1.46E-07 1.66E-07 1.95E-07 2.81E-07 5.36E-07 3.52E-07 4.55E-07 2.79E-07 0.000305 1.48E-06 0.002171 0.876599 0.000194

15.5 eur_usd 535
8/1/2001 8/17/2001 (0.193) (0.202) (0.195) (0.191) (0.137) (0.046) (0.103) (0.069) (0.112) (0.078) (0.036) (0.277) (0.280) (0.313)
8/17/2001 9/1/2001 0.131 0.098 0.119 0.129 0.068 (0.024) 0.029 (0.008) 0.063 0.116 0.032 0.186 0.183 0.158
9/1/2001 9/17/2001 (0.013) (0.038) (0.025) (0.016) 0.050 0.108 0.083 0.101 0.084 0.330 0.348 0.073 0.074 0.078
9/17/2001 10/2/2001 0.031 0.036 0.038 0.035 (0.017) (0.021) (0.009) (0.014) (0.019) 0.098 0.021 0.045 0.052 0.050
10/2/2001 10/18/2001 0.077 0.075 0.076 0.077 0.088 0.099 0.105 0.105 0.098 0.046 0.018 0.046 0.044 0.036
10/18/2001 11/2/2001 (0.061) (0.051) (0.056) (0.060) (0.059) (0.012) (0.039) (0.021) (0.044) (0.030) (0.022) (0.143) (0.141) (0.106)

8/1/2001 11/2/2001 (0.014) (0.022) (0.017) (0.014) (0.006) 0.015 0.007 0.013 0.007 0.070 0.055 (0.018) (0.018) (0.024)
skew 1.959 1.302 1.719 1.942 1.853 0.765 1.594 1.087 1.768 11.909 19.887 (0.291) (0.361) (0.709)
stdev 2.03E-07 1.2E-07 1.46E-07 1.81E-07 2.67E-07 5.67E-07 3.46E-07 4.68E-07 2.67E-07 0.000291 1.4E-06 0.002037 0.831939 0.000184

15.5 usd_chf 1070
8/1/2001 8/17/2001 (0.108) (0.141) (0.120) (0.109) (0.090) (0.028) (0.064) (0.042) (0.070) (0.040) (0.030) (0.223) (0.231) (0.231)
8/17/2001 9/1/2001 0.119 0.099 0.115 0.119 0.062 (0.029) 0.023 (0.013) 0.054 0.107 0.028 0.144 0.144 0.124
9/1/2001 9/17/2001 0.005 (0.065) (0.028) (0.004) 0.014 0.075 0.044 0.063 0.055 0.302 0.378 0.100 0.101 0.091
9/17/2001 10/2/2001 0.113 0.090 0.114 0.118 0.048 (0.022) 0.023 (0.007) 0.047 0.152 0.048 0.124 0.124 0.042
10/2/2001 10/18/2001 0.126 0.094 0.107 0.119 0.137 0.109 0.146 0.128 0.145 0.062 0.022 0.080 0.080 0.091
10/18/2001 11/2/2001 (0.090) (0.079) (0.082) (0.086) (0.099) (0.053) (0.079) (0.061) (0.087) (0.044) (0.034) (0.152) (0.147) (0.095)

8/1/2001 11/2/2001 0.015 (0.012) 0.005 0.013 0.008 0.008 0.014 0.010 0.021 0.078 0.062 0.004 0.004 (0.007)
skew 1.666 0.940 1.383 1.600 0.782 (1.509) 0.079 (1.051) 0.757 9.323 18.807 (0.148) (0.103) (1.379)

Both currencies (equally weighted)
8/1/2001 11/2/2001 0.000 (0.017) (0.006) (0.000) 0.001 0.012 0.011 0.012 0.015 0.074 0.056 (0.007) (0.007) (0.016)
skew 2.092 1.283 1.769 2.032 1.528 (0.317) 0.991 0.111 1.441 11.151 19.482 (0.273) (0.330) (0.927)
monthly Sharpe ratio 0.082 (17.199) (6.161) (0.500) 1.206 11.588 10.556 11.611 14.575 74.016 55.502 (7.414) (7.162) (15.934)

5-minute data, 3-hour returns, VQ of size 8. 3-hour Sharpe ratio (return/stdev over the overall period).
