Statistical Arbitrage Ying Chen, Leonardo Bachega Yandong Guo, Xing Liu March, 2010.

Page 1

Statistical Arbitrage

Ying Chen, Leonardo Bachega, Yandong Guo, Xing Liu

March, 2010

Page 2

Outline

- Overview of the project
- Implementation issues
- Data adjustment mistakes
- Stock classification
- Future work

Page 3

Framework

Data pre-processing (Python scripts):
- Raw historical data from WRDS
- Adjusted stock price series + indices

Back-testing simulations (MATLAB scripts):
- Market model: PCA eigenportfolios or ETFs for industry sectors (252-day returns)
- Residual process model: residuals as increments of an AR process (60-day returns)
- Compute s-scores from the residual process model and current stock prices
- Signal trade orders

Market model: $R_i = \beta_i F + \tilde{R}_i$
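The "residuals as increments of an AR process" and "compute s-scores" steps can be sketched as follows. This is only an illustration under assumptions the slides do not spell out: the residuals are cumulated into an auxiliary process, an AR(1) model is fitted by least squares, and the s-score is the distance of the last value from the AR(1) equilibrium mean in units of the equilibrium standard deviation. The function name `s_score` and the rejection rule for the AR coefficient are hypothetical choices.

```python
import numpy as np

def s_score(residuals):
    """Sketch of an s-score from daily regression residuals (e.g. 60 values).

    Assumption: residuals are treated as increments of an AR(1) process.
    Returns None when the fit does not indicate mean reversion.
    """
    x = np.cumsum(np.asarray(residuals, dtype=float))  # auxiliary process X_k
    # Least-squares AR(1) fit: X_{k+1} = a + b * X_k + noise
    b, a = np.polyfit(x[:-1], x[1:], 1)
    if not 0.0 < b < 1.0:
        return None                                    # no mean reversion
    noise = x[1:] - (a + b * x[:-1])
    m = a / (1.0 - b)                                  # equilibrium mean
    sigma_eq = np.sqrt(np.var(noise) / (1.0 - b * b))  # equilibrium std dev
    if sigma_eq == 0.0:
        return None
    return (x[-1] - m) / sigma_eq
```

A large positive value would suggest the residual is above its equilibrium level and a large negative value that it is below; the thresholds used to signal trade orders are not given on this slide.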

Page 4

Implementation Issues

- Stock is delisted tomorrow
  - Criterion: check tomorrow's shares outstanding
  - In the portfolio: close the position
  - Not in the portfolio: do not consider it for trading, but still include it in the PCA calculation
- Today's price == 0 in the middle of the series
  - Exclude it from the PCA calculation and from trading
  - In the portfolio: keep it

Page 5

Implementation Issues (Cont’d)

- Market cap < 1B
  - If already in the portfolio: keep it and consider it for trading
  - Otherwise: exclude it from the PCA calculation and from trading
- Stocks picked to calculate the eigenportfolios
  - Today's price != 0
  - Previous 252 days have nonzero prices
  - Market cap > 1B, or already in the portfolio
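The selection rule for the eigenportfolio universe can be written as a small predicate. This is a minimal sketch: the function name, argument layout (prices oldest first, last entry is today) and the default values are assumptions for illustration, not the project's actual code.

```python
def eligible_for_pca(prices, market_cap, in_portfolio,
                     window=252, cap_floor=1e9):
    """Return True if a stock may enter the eigenportfolio calculation.

    prices: daily prices, oldest first; prices[-1] is today's price.
    Rules taken from the slide: today's price is nonzero, the previous
    `window` days have nonzero prices, and the market cap exceeds
    `cap_floor` unless the stock is already held.
    """
    if len(prices) < window + 1:
        return False                      # not enough history
    today = prices[-1]
    history = prices[-(window + 1):-1]    # the `window` days before today
    if today == 0:
        return False
    if any(p == 0 for p in history):
        return False
    return market_cap > cap_floor or in_portfolio
```

For example, a stock with 253 nonzero prices and a 0.5B market cap is rejected unless `in_portfolio` is true.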

Page 6

[Figure: Fund Value. x-axis: day (0 to 3500, from 12 Dec 1994 to 04 Nov 2008); y-axis: dollars (0 to 3.5 x 10^8).]

Page 7

Data Adjustment Mistakes

Dividend adjustment

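$P_t^{new} = P_t^{old} \cdot \frac{P_{t_0} - DIV}{P_{t_0}}$

where $t_0$ is the dividend date and $t$ ranges over the dates being adjusted.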

DATE      PRC    SHROUT  DIVAMT  Adjusted Price  Yahoo Adjusted
20081009  10.08  57428   .       0.98851         4.07
20081010  8.77   145887  .       0.33855         3.54
20081013  5.44   145887  5.23    0.21            5.44
20081014  5.45   145887  .       5.4501          5.45
20081015  5.14   145887  .       5.1401          5.14
20081016  5.34   145887  .       5.3401          5.34
20081017  5.33   145887  .       5.3301          5.33
20081020  6.14   145887  .       6.1401          6.14
20081021  5.96   145887  .       5.9601          5.96
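A short sketch shows how the dividend adjustment produces the "Adjusted Price" column: prices are scaled by (P_t0 - DIV) / P_t0, where t0 is the dividend date. The function name is hypothetical, and applying the factor to the dividend date itself is an assumption, chosen because it reproduces the 0.21 in the 20081013 row.

```python
def dividend_adjust(prices, t0, div):
    """Scale prices up to and including index t0 by (P_t0 - DIV) / P_t0.

    Assumption: the dividend date itself is adjusted too, which matches
    the 20081013 row of the table. Later prices are left unchanged.
    """
    factor = (prices[t0] - div) / prices[t0]
    return [p * factor if t <= t0 else p for t, p in enumerate(prices)]

# PRC on 20081010, 20081013 and 20081014; DIVAMT = 5.23 on 20081013
prc = [8.77, 5.44, 5.45]
adj = dividend_adjust(prc, t0=1, div=5.23)
```

With these inputs the first two adjusted values agree with the table (about 0.33855 and 0.21), far from Yahoo's 3.54 and 5.44. The 20081009 row is not reproduced by this factor alone; it coincides with a change in SHROUT.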

Page 8

Data Adjustment Plan

- Dividend adjustment
- Split detection and adjustment using CFACPR and CFACSHR

DATE      PRC     VOL    SHROUT  DIVAMT  FACPR  FACSHR  CFACPR  CFACSHR
20090807  0.26    26066  23346   .       .      .       0.05    0.05
20090810  -7.1    176    1167    0       -0.95  -0.95   1       1
20090811  5.975   1937   1167    .       .      .       1       1
20090812  6.3499  3406   1167    .       .      .       1       1
20090813  4.78    26123  1167    .       .      .       1       1
20090814  4.2999  27486  1167    .       .      .       1       1
20090817  4.05    1658   1167    .       .      .       1       1
20090818  4.3     6042   1167    .       .      .       1       1
20090819  4.06    10015  1167    .       .      .       1       1
20090820  3.7972  8805   1167    .       .      .       1       1
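One way the cumulative factors could be applied is sketched below, assuming the usual CRSP convention that an adjusted price is PRC divided by CFACPR and adjusted shares are SHROUT multiplied by CFACSHR. The function name is hypothetical. A negative PRC in CRSP marks a bid/ask average rather than a trade price, so the absolute value is taken.

```python
def split_adjust(prc, shrout, cfacpr, cfacshr):
    """Put price and shares outstanding on a split-adjusted basis.

    Assumption: adjusted price = |PRC| / CFACPR and
    adjusted shares = SHROUT * CFACSHR.
    """
    price = abs(prc) / cfacpr
    shares = shrout * cfacshr
    return price, shares

# Row 20090807 (before the 1-for-20 reverse split) and row 20090810 (after)
before = split_adjust(0.26, 23346, 0.05, 0.05)
after = split_adjust(-7.1, 1167, 1, 1)
```

After adjustment the 20090807 row becomes a price of about 5.2 with about 1167.3 shares, which is on the same scale as the post-split rows.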

Page 9

Stock Classification

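Using GICS (Global Industry Classification Standard) codes in CRSP

10 Sectors, 24 Industry Groups, 67 Industries and 147 Sub-Industries

Each stock carries an 8-digit code XXXXXXXX:
- Digits 1-2: Sector
- Digits 1-4: Industry Group
- Digits 1-6: Industry
- Digits 1-8: Sub-Industry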
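Because each level of the hierarchy is a prefix of the 8-digit code, grouping stocks by sector or industry reduces to slicing the code. The helper below is a minimal sketch; its name and the sample code string are illustrative only.

```python
def gics_levels(code):
    """Split an 8-digit GICS code into its four hierarchy levels."""
    code = str(code)
    if len(code) != 8 or not code.isdigit():
        raise ValueError("expected an 8-digit GICS code, got %r" % code)
    return {
        "sector": code[:2],
        "industry_group": code[:4],
        "industry": code[:6],
        "sub_industry": code,
    }

levels = gics_levels("45102010")  # sample code used only for illustration
```

Stocks that share the same `sector` value could then be matched to the same sector ETF.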

Page 10

Stock Classification (Cont’d)

Page 11

PCA Eigenportfolio Weights Normalization

- Basic principle: find the most important eigenvectors (15 in the paper) and normalize each of them by the standard deviations of the corresponding stock returns

Page 12

PCA algorithm by the author

Suppose X is an n×p matrix containing n samples and p features.

Original algorithm: compute the eigendecomposition of the correlation matrix (X with standardized columns, up to a constant factor):

$X^T X = Q D Q^{-1}$

The matrix Q consists of the eigenvectors of the correlation matrix.
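The eigendecomposition step can be sketched with NumPy as below. This is an illustration on random data, not the project's MATLAB code: the function name, the use of `numpy.linalg.eigh` and the sample standard deviation with `ddof=1` are all assumptions. The last two lines contrast the two weighting choices under discussion, the eigenvector divided by the per-stock standard deviation and the plain eigenvector.

```python
import numpy as np

def correlation_eig(returns):
    """Eigendecomposition of the correlation matrix of an (n, p) array.

    Rows are days (samples), columns are stocks (features).
    Returns eigenvalues in descending order, the matching eigenvectors
    as columns of Q, and the per-stock sample standard deviations.
    """
    sigma = returns.std(axis=0, ddof=1)
    z = (returns - returns.mean(axis=0)) / sigma   # standardized returns
    corr = z.T @ z / (returns.shape[0] - 1)        # (p, p) correlation matrix
    eigvals, eigvecs = np.linalg.eigh(corr)        # ascending order
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order], sigma

rng = np.random.default_rng(0)
demo = rng.normal(size=(252, 5))                   # synthetic returns
vals, Q, sigma = correlation_eig(demo)
weights_scaled = Q[:, 0] / sigma                   # eigenvector divided by sigma
weights_plain = Q[:, 0]                            # eigenvector only
```

Since the correlation matrix is symmetric, Q is orthogonal and Q^{-1} equals Q^T, which is why a symmetric eigensolver is used here.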

Page 13

PCA discussion?

Question: should the eigenvector be divided by sigma, the sample standard deviation?

Answer: no. (This differs from the paper.)

Page 14

PCA discussion

The meaning of the "risk factor" F
- F should represent the overall market performance.
- The behavior of F should act as the "market return".

What can PCA do?
- PCA is mathematically defined as an orthogonal linear transformation that transforms the data to a new coordinate system such that the greatest variance by any projection of the data comes to lie on the first coordinate (called the first principal component), the second greatest variance on the second coordinate, and so on.
- PCA is theoretically the optimal transform for given data in the least-squares sense.

Page 15

PCA discussion

Derivation: notation

F = EX
- F: m×n matrix, represents the eigenportfolios
- E: m×p matrix, the first m most important eigenvectors
- X: p×n matrix, contains the stock returns
- m: 15 in the paper
- n: the number of days (samples)
- p: the number of stocks
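The shapes in F = EX can be checked with a small NumPy sketch. The function name and the demo sizes (p = 20, n = 252, m = 3) are assumptions chosen to keep the example small; the slides use m = 15. The eigenvectors are taken from the correlation matrix of the rows of X and used as weights without dividing by the standard deviation.

```python
import numpy as np

def eigenportfolio_returns(X, m):
    """Compute E (m, p) and F = E X (m, n) from returns X of shape (p, n).

    Rows of X are stocks, columns are days. E holds the m eigenvectors
    of the correlation matrix with the largest eigenvalues, as rows.
    """
    corr = np.corrcoef(X)                    # rows are variables: (p, p)
    eigvals, eigvecs = np.linalg.eigh(corr)  # ascending eigenvalues
    top = np.argsort(eigvals)[::-1][:m]      # indices of the m largest
    E = eigvecs[:, top].T                    # (m, p)
    F = E @ X                                # (m, n) eigenportfolio returns
    return E, F

rng = np.random.default_rng(0)
X_demo = rng.normal(scale=0.01, size=(20, 252))  # synthetic returns
E, F = eigenportfolio_returns(X_demo, m=3)
```

Each row of F is the return series of one eigenportfolio, and the rows of E are orthonormal.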

Page 16

PCA discussion

Derivation

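- The i-th row of the eigenportfolio matrix: $F_i = E_i X$
- The variance of $F_i$ should be maximized under the constraint that $E_i E_i^T = 1$
- With $\Sigma$ the p×p covariance matrix of the stock returns and $\lambda$ a Lagrange multiplier, $E_i \Sigma E_i^T - \lambda (E_i E_i^T - 1)$ is to be maximized; setting the derivative with respect to $E_i$ to zero then gives $E_i (\Sigma - \lambda I_p) = 0$, so $E_i$ is an eigenvector of $\Sigma$

That is, the weights should be the eigenvectors themselves rather than the eigenvectors divided by the standard deviation. (The experimental results are the same without the division.)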
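The claim that a unit-norm weight vector maximizing the variance must be an eigenvector can be checked numerically. The sketch below uses synthetic data and my own notation (`Sigma` for the sample covariance of the returns, `E1` for its leading eigenvector); it compares the variance attained by `E1` with that of 1000 random unit vectors.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 500))              # p = 6 stocks, n = 500 days
X = X - X.mean(axis=1, keepdims=True)      # center each stock's returns
Sigma = X @ X.T / (X.shape[1] - 1)         # (p, p) sample covariance

eigvals, eigvecs = np.linalg.eigh(Sigma)   # ascending eigenvalues
E1 = eigvecs[:, -1]                        # unit-norm leading eigenvector
best = E1 @ Sigma @ E1                     # variance of the portfolio E1 X

# Variance attained by random unit-norm weight vectors
trials = rng.normal(size=(1000, 6))
trials /= np.linalg.norm(trials, axis=1, keepdims=True)
variances = np.einsum("ij,jk,ik->i", trials, Sigma, trials)
```

None of the random directions exceeds `best`, which equals the largest eigenvalue, consistent with the stationarity condition in the derivation.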

Page 17

Experiment result

Top 50 eigenvalues of the correlation matrix of market returns, computed on May 1, 2007 and estimated using a 1-year window and a universe of 1590 stocks.

[Figure: plot of the top 50 eigenvalues.]

Page 18

Value of the first eigenvector

[Figure: components of the first eigenvector. x-axis: 0 to 1600; y-axis: value of eigenvectors (-0.01 to 0.06).]

Page 19

Future work

- Data adjustment
- Experiment on ETFs
  - Compare ETFs with PCA
- Take into account
  - Transaction fees, interest, dividends
  - Volume

Page 20

THANK YOU