The Search for Repeatable Performance - Duke's …charvey/Teaching/663_2017/...The Search for...

The Search for Repeatable Performance. Campbell R. Harvey, Duke University, NBER and Man Group plc. February 20, 2017. International Finance.


Page 1: The Search for Repeatable Performance - Duke's …charvey/Teaching/663_2017/...The Search for Repeatable Performance ... Campbell R. Harvey, “The Scientific Outlook in Financial

The Search for Repeatable Performance

Campbell R. Harvey

Duke University, NBER and Man Group plc

February 20, 2017

International Finance

Page 2: [Image only]

Pages 3-7: [Images: comic panels. Source: https://xkcd.com/882/]

Page 8: Skip 17 panels of more negative tests, all p-values > 0.05

Pages 9-10: [Images: comic panels. Source: https://xkcd.com/882/]

Page 11: Examples in Financial Economics

Two sigma rule only appropriate for a single test
• As we do more tests, there is a chance we find something “significant” (by the two sigma rule) but it is a fluke.
• Here is a simple way to see the impact of multiple tests for a two sigma test:

# of tests      1    5    10   20   26   50   n
Prob of fluke   5%   23%  40%  64%  74%  92%  1 - 0.95^n

Annotation: XKCD Jelly Beans and Acne (20 tests)
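The fluke probabilities in the table can be reproduced with a few lines of Python. This is a minimal sketch that assumes the tests are independent and each has a 5% false-positive rate; the function name `prob_at_least_one_fluke` is made up for illustration.

```python
def prob_at_least_one_fluke(n_tests: int, alpha: float = 0.05) -> float:
    """Chance that at least one of n independent tests looks 'significant' by luck."""
    return 1.0 - (1.0 - alpha) ** n_tests


# Same test counts as the slide's table.
for n in (1, 5, 10, 20, 26, 50):
    print(f"{n:>3} tests: {prob_at_least_one_fluke(n):.0%}")
```

With correlated tests the numbers would differ, so the table is best read as a rough guide rather than an exact rate.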

Page 12: Examples in Financial Economics

The promotional email:
• You get an email at the end of each month from an investment manager with “judge my record” as a slogan
• The email recommends either a long or a short position in the S&P
• After receiving 10 correct recommendations in a row, you switch your investment account to the new manager

Page 13: Examples in Financial Economics

The promotional email
• Later, you find out (the hard way) the strategy
• The manager starts by sending out 100,000 emails: 50,000 saying long and 50,000 saying short
• The next month the manager sends only to those who got the correct prediction, so next month is 25,000 long and 25,000 short recommendations
• 97 people will get 10 correct in a row (100,000 x 0.5^10)
• No skill here. It is random.
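The arithmetic behind the 97 recipients can be checked directly. The sketch below is only an illustration; the function names are hypothetical, and it assumes the mailing list is halved (rounding down) every month.

```python
def expected_perfect_records(recipients: int, months: int) -> float:
    """Expected number of recipients who see `months` correct calls in a row."""
    return recipients * 0.5 ** months


def surviving_list_sizes(recipients: int, months: int) -> list[int]:
    """Mailing-list size after each month when only the 'correct' half is kept."""
    sizes = []
    for _ in range(months):
        recipients //= 2  # half the list received the wrong call and is dropped
        sizes.append(recipients)
    return sizes


print(expected_perfect_records(100_000, 10))   # 97.65625
print(surviving_list_sizes(100_000, 10)[-1])   # 97
```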

Page 14: Examples in Financial Economics

3.4 sigma strategy
• Profitable during the financial crisis
• Zero beta vs. market, value, size, and momentum
• Impressive performance recently

Source: Campbell R. Harvey, “The Scientific Outlook in Financial Economics”, Presidential Address, American Finance Association, 2017

Page 15: Examples in Financial Economics

Details
• Long tickers “S”
• Short tickers “U”

Page 16: Examples in Financial Economics

Two sigma rule only appropriate for a single test
• As we do more tests, there is a chance we find something “significant” (by the two sigma rule) but it is a fluke.
• Here is a simple way to see the impact of multiple tests for a two sigma test:

# of tests      1    5    10   20   26   50   n
Prob of fluke   5%   23%  40%  64%  74%  92%  1 - 0.95^n

Annotations: XKCD Jelly Beans and Acne (20 tests); alphabet, i.e., ticker symbols (26 tests)

Page 17: Examples in Financial Economics

Research
• One study looks at companies with meaningful ticker symbols, like Southwest’s LUV, and shows they outperform. [1]
• There is another study that argues that tickers that are easy to pronounce, like BAL vs. BDL, outperform in IPOs. [2]
• There is yet another study that suggests that tickers that are congruent with the company’s name outperform. [3]

[1] Head, Smith and Watson, 2009; [2] Alter and Oppenheimer, 2006; [3] Srinivasan and Umashankar

Page 18: Examples in Financial Economics

5 factors

Page 19: Examples in Financial Economics

15 factors

Page 20: Examples in Financial Economics

82 factors

Source: The Barra US Equity Model (USE4), MSCI (2014)

Page 21: Examples in Financial Economics

400 factors!

Source: https://www.capitaliq.com/home/who-we-help/investment-management/quantitative-investors.aspx

Page 22: Examples in Financial Economics

18,000 signals examined in Yan and Zheng (2015)

Page 23: What’s going on?

Forces causing mistakes
1. Failure to account for luck + evolutionary propensity not to account for luck
2. Failure in specifying and conducting scientific tests
3. Failure to take rare effects into account

Page 24: A framework to separate luck from skill

Four research initiatives:*
1. Explicitly adjust for multiple tests (“Backtesting”)
2. Bootstrap (“Lucky Factors”)
3. Noise reduction (“Rethinking Performance Evaluation”)
4. Controlling for rare effects (“Scientific Outlook in Financial Economics”)

*Bibliography on last page. All my research at: https://papers.ssrn.com/sol3/cf_dev/AbsByAuth.cfm?per_id=16198

Page 25: Luck

• Why are we so easily fooled by randomness?

Page 26: Terminology

Page 27: Terminology

I thought this manager was skilled but that was a mistake: False Positive

Page 28: Terminology

Page 29: Terminology

I didn’t invest in this manager but that was a mistake: False Negative

Type II linked to Type I.
• For example, if all patients are declared pregnant, there is no Type II error.

Page 30: Evolutionary Foundations

Rustling sound in the grass ...

Page 31: Evolutionary Foundations

Rustling sound in the grass ...

Type I error

Page 32: Evolutionary Foundations

Type II error

Page 33: Evolutionary Foundations

Type II error
In examples, cost of Type II error is large: potentially death.

Page 34: Evolutionary Foundations

• High Type I error (low Type II error) animals survive
• This preference is passed on to the next generation
• This is the case for an evolutionary predisposition for allowing high Type I errors

Page 35: Evolutionary Foundations

B.F. Skinner 1947

Pigeons put in cage. Food delivered at regular intervals; feeding time has nothing to do with behavior of birds.

Page 36: Evolutionary Foundations

Results
• Skinner found that birds associated their behavior with food delivery
• One bird would turn counter-clockwise
• Another bird would tilt its head back

Page 37: Evolutionary Foundations

Results
• A good example of overfitting: you think there is a pattern but there isn’t
• Skinner’s paper is called ‘Superstition’ in the Pigeon, JEP (1947)
• But this applies not just to pigeons or gazelles…

Page 38: Evolutionary Foundations

Klaus Conrad 1958

Coins the term Apophänie. This is where you see a pattern and make an incorrect inference. He associated this with psychosis and schizophrenia.

Pages 39-41: Evolutionary Foundations

Page 42: Evolutionary Foundations

• Apophany is a Type I error (i.e., false insight)
• Epiphany is the opposite (i.e., true insight)
– Apophany may be interpreted as overfitting

K. Conrad, 1958. Die beginnende Schizophrenie. Versuch einer Gestaltanalyse des Wahns

“....nothing is so alien to the human mind as the idea of randomness.” -- John Cohen

Page 43: Evolutionary Foundations

• Sagan (1995):
– As soon as the infant can see, it recognizes faces, and we now know that this skill is hardwired in our brains.

C. Sagan, 1995. The Demon-Haunted World

Page 44: Evolutionary Foundations

• Sagan (1995):
– Those infants who a million years ago were unable to recognize a face smiled back less, were less likely to win the hearts of their parents and less likely to prosper.

Page 45: What about Finance?

Performance of trading strategy is very impressive.
• SR=1
• Consistent
• Drawdowns acceptable

Source: Man-AHL Research

Page 46: What about Finance?

Source: Man-AHL Research

Page 47: What about Finance?

200 random time-series, mean=0; volatility=15%

Labeled series: Sharpe = 1, Sharpe = 2/3, Sharpe = 1/3

Source: Man-AHL Research
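The point of the 200 random series can be illustrated with a small simulation. This is a sketch under stated assumptions, not the Man-AHL experiment: the slide gives only the number of series, the zero mean, and the 15% volatility, so the 10-year monthly horizon, the normal distribution, and the seed are all choices made here for illustration.

```python
import math
import random


def annualized_sharpe(monthly_returns):
    """Annualized Sharpe ratio from monthly returns (risk-free rate taken as zero)."""
    n = len(monthly_returns)
    mean = sum(monthly_returns) / n
    var = sum((r - mean) ** 2 for r in monthly_returns) / (n - 1)
    return mean / math.sqrt(var) * math.sqrt(12)


def simulate_sharpes(n_series=200, n_years=10, annual_vol=0.15, seed=0):
    """Sharpe ratios of pure-noise strategies: true mean 0, given volatility."""
    rng = random.Random(seed)
    monthly_vol = annual_vol / math.sqrt(12)
    sharpes = []
    for _ in range(n_series):
        returns = [rng.gauss(0.0, monthly_vol) for _ in range(12 * n_years)]
        sharpes.append(annualized_sharpe(returns))
    return sorted(sharpes, reverse=True)


sharpes = simulate_sharpes()
print(f"best of {len(sharpes)} zero-skill series: Sharpe = {sharpes[0]:.2f}")
print(f"worst of {len(sharpes)} zero-skill series: Sharpe = {sharpes[-1]:.2f}")
```

Every series has zero expected return by construction, so any attractive Sharpe ratio at the top of the sorted list is luck alone.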

Page 48: Other Sciences?

Particle Physics
• Higgs boson proposed in 1964 (same year as Sharpe published the CAPM)
• First tests of the CAPM in 1972
• Nobel award in 1990.
• Longer road for Higgs:
– $5 billion to construct LHC.
– “Discovered” in 2012.
– Nobel 2013.

Page 49: Other Sciences?

Particle Physics
• Testing method very important
• Particle rare and decays quickly and the key is measuring the decay signature
• Frequency is 1 in 10 billion collisions and over a quadrillion collisions were conducted
• Problem is that the decay signature could also be caused by normal events from known processes

Page 50: Other Sciences?

Particle Physics
• The two groups involved in testing (CMS and ATLAS) decided on what appears to be a tough standard: t-statistic must exceed 5 (i.e., 5-sigma)

Page 51: Other Sciences?

Genome-Wide Association Studies
• Genetic association research plagued by multiple testing
• Researchers try to link certain diseases to certain genes
• More than 20,000 human genes
• In addition, there is a massive number of combinations of genes
• For the first 10 years of publication of association studies, 98% of published results have been found to be false
– John Ioannidis: “…There are millions of scientists, some of whom run millions of analyses in each study they conduct. To avoid false-positive results in genetics, the current goal for a p value should be less than 0.00000005.”*

*Approximately 5.3 sigma
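The link between a p-value threshold and a "sigma" level can be checked with the standard normal distribution. The sketch below uses Python's standard-library `statistics.NormalDist`; the helper names are hypothetical, and the one-sided convention is an assumption (the slides do not say which tail convention the 5.3 sigma figure uses).

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def sigma_from_p(p_value: float, two_sided: bool = False) -> float:
    """Number of standard deviations whose normal tail probability equals p_value."""
    tail = p_value / 2 if two_sided else p_value
    return -_STD_NORMAL.inv_cdf(tail)


def p_from_sigma(sigma: float, two_sided: bool = False) -> float:
    """Normal tail probability beyond `sigma` standard deviations."""
    tail = _STD_NORMAL.cdf(-sigma)
    return 2 * tail if two_sided else tail


print(f"p = 5e-8 (one-sided) is about {sigma_from_p(5e-8):.1f} sigma")
print(f"5 sigma (one-sided) is p of about {p_from_sigma(5.0):.1e}")
print(f"2 sigma (two-sided) is p of about {p_from_sigma(2.0, two_sided=True):.3f}")
```

Under the one-sided reading, the genetics threshold and the particle-physics 5-sigma rule land in the same neighborhood, both far stricter than the two sigma rule.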

Page 52: Other Sciences?

Genome-Wide Association Studies
• Recent paper in Nature claims two genetic linkages to Parkinson’s Disease
• Over 500,000 genetic sequences are tried
• By chance, thousands of sequences will appear to be linked to the disease
• Identified loci had t-statistics > 5.3

Page 53: 1. Multiple Tests

• Provide a new framework to do multiple tests in the presence of correlations among tests and publication bias (hidden tests)
• Provide guidelines for future research

Page 54: 1. Multiple Tests: Number of Factors and Publications

[Figure: Factors and Publications. Axes: Per year; Cumulative. Series: # of factors, # of papers, Cumulative # of factors.]

Page 55: 1. Multiple Tests: How Many Discoveries Are False?

• In multiple testing, how many discoveries are likely to be false?
• In single testing (significance level = 5%), 5% is the “error rate” (false discoveries)
• In multiple testing, the false discovery rate (FDR) is usually much larger than 5%

Page 56: 1. Multiple Tests: Bonferroni's Method

• Here is a simple adjustment called the Bonferroni adjustment
• For a single test, you are tolerant of 5% false discoveries
• Hence, a p-value of 5% or less means you declare a finding “true”
• Bonferroni simply multiplies the p-value by the number of tests

Page 57: 1. Multiple Tests: Bonferroni's Method

• Bonferroni simply multiplies the p-value by the number of tests
• In a single test, if you get a p-value of 0.05 or less you declare “significant”
• Returning to the Jelly Bean, suppose the green jelly bean test has a p-value of 0.04, which appears “significant”
• Bonferroni adjustment: 20 x 0.04 = 0.80, which is “not significant”, not even close!
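The jelly bean adjustment is simple enough to write out. In this minimal sketch the function names are illustrative, and the cap at 1.0 is a common convention rather than something stated on the slide.

```python
def bonferroni_adjust(p_value: float, n_tests: int) -> float:
    """Bonferroni-adjusted p-value: multiply by the number of tests, capped at 1."""
    return min(1.0, p_value * n_tests)


def is_significant(p_value: float, n_tests: int = 1, alpha: float = 0.05) -> bool:
    """Apply the usual cutoff to the adjusted p-value."""
    return bonferroni_adjust(p_value, n_tests) <= alpha


# Green jelly bean: p = 0.04, but it was one of 20 colors tested.
print(round(bonferroni_adjust(0.04, 20), 2))  # 0.8
print(is_significant(0.04))                   # True when treated as a single test
print(is_significant(0.04, n_tests=20))       # False after the adjustment
```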

Page 58: 1. Multiple Tests: Bonferroni's Method

• Stock market does better under Democratic presidents
• Difference “significant”: p-value = 0.03

Page 59: 1. Multiple Tests: Bonferroni's Method

• However, there are many possible choices for the test. Here are some:

[Table: nine rows, each of the form “President + House + Senate vs. President + House + Senate”; the formatting that distinguished the rows was lost in extraction.]

• Bonferroni adjustment eliminates “significant” difference

Source: Cocquemas and Whaley, 2016

Page 60: 1. Multiple Tests: Rewriting History

[Figure: t-ratio (axis 0.0 to 5.0) and cumulative # of factors (axis 0 to 800) by year, 1965 to 2025. Labeled factors: HML, MOM, MRT, EP, SMB, LIQ, DEF, IVOL, SRV, CVOL, DCG, LRV. Threshold lines: Bonferroni, Holm, BHY, t-ratio = 1.96 (5%). Annotation: 316 factors in 2012 if working papers are included.]

Page 61: 1. Multiple Tests: Discussion

However:
• Independence among test statistics is still not dealt with.
• The number of hidden tests seems too low.

Page 62: 1. Multiple Tests: A New Framework

[Figure labels: No skill, expected return = 0%; Skill, expected return = 6%]

Page 63: 1. Multiple Tests: Harvey, Liu and Zhu Approach

• Allows for correlation among strategy returns
• Allows for missing tests

Review of Financial Studies, 2016

Page 64: 1. Multiple Tests: Backtesting

• Due to data mining, a common practice in evaluating backtests of trading strategies is to discount Sharpe ratios by 50%
• The 50% haircut is only a rule of thumb; we develop an analytical way to determine the haircut

Page 65: 1. Multiple Tests: Backtesting

Method
• Suppose we observe a strategy with an attractive Sharpe Ratio.
• This Sharpe Ratio directly implies a p-value (which roughly tells you the probability that your strategy is a fluke)
• Suppose the p-value is 0.01, which looks pretty good.

Page 66: 1. Multiple Tests: Backtesting

Method
• However, suppose you tried 10 strategies and picked the best one
• The Bonferroni adjusted p-value is 10 x 0.01 = 0.10, which would not be deemed “significant”
• Reverse engineer the 0.10 back to the “haircut” Sharpe Ratio*

*Note: t-stat = SR x √T
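The three steps of the method (Sharpe ratio to p-value, Bonferroni adjustment, and back to a Sharpe ratio) can be sketched as follows. This is an illustration under stated assumptions, not the procedure from the Journal of Portfolio Management paper: it uses a two-sided normal approximation for the p-value, plain Bonferroni, an annualized Sharpe ratio with T measured in years, and hypothetical function names.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def p_value_from_sharpe(sharpe: float, years: float) -> float:
    """Two-sided p-value implied by t-stat = SR * sqrt(T) (normal approximation)."""
    t_stat = sharpe * math.sqrt(years)
    return 2.0 * _STD_NORMAL.cdf(-abs(t_stat))


def haircut_sharpe(sharpe: float, years: float, n_tests: int) -> float:
    """Sharpe ratio implied by the Bonferroni-adjusted p-value.

    Assumes a positive Sharpe ratio that is not so large that the p-value
    underflows to zero.
    """
    p_adjusted = min(1.0, n_tests * p_value_from_sharpe(sharpe, years))
    t_adjusted = max(0.0, -_STD_NORMAL.inv_cdf(p_adjusted / 2.0))
    return t_adjusted / math.sqrt(years)


# A 10-year backtest whose t-stat is about 2.58, i.e. a p-value of about 0.01.
years = 10
sharpe = 2.5758 / math.sqrt(years)
adjusted = haircut_sharpe(sharpe, years, n_tests=10)
print(f"p-value: {p_value_from_sharpe(sharpe, years):.3f}")
print(f"Sharpe {sharpe:.2f} -> haircut Sharpe {adjusted:.2f} "
      f"(haircut of {1 - adjusted / sharpe:.0%})")
```

Trying other inputs shows the percentage haircut varying with the starting Sharpe ratio and the number of tests instead of sitting at a fixed 50%.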

Page 67: 1. Multiple Tests: Backtesting

Results: Percentage Haircut is Non-Linear

Journal of Portfolio Management

Page 68: 2. Bootstrapping

Multiple testing approach has drawbacks
• Need to know the number of tests
• Need to know the correlation among the tests
• With similar sample sizes, this approach does not impact the ordering of performance

Page 69: 2. Bootstrapping: Lucky Factors

Suppose we have 100 possible fund returns and 500 observations.
• Step 1. Strip out the alpha from all fund returns (e.g. regress on benchmark and use residuals). This means alpha and t-stat exactly equal zero: we have enforced “no skill”.
• Step 2. Bootstrap rows of the data to produce a new sheet 500x100* (note some rows sampled more than once and some not sampled at all)

*500x101 with the benchmark included

Page 70: Insert animation here

Page 71: 2. Bootstrapping: Lucky Factors

• Step 3. Recalculate the alphas and t-stats on new data. Save the highest t-statistic from the 100 funds. Note, in the unbootstrapped data, every t-statistic is exactly zero.
• Step 4. Repeat steps 2 and 3 10,000 times.
• Step 5. Now that we have the empirical distribution of the max t-statistic under the null of no skill, compare to the max t-statistic in real data.

Page 72: 2. Bootstrapping: Lucky Factors

• Step 5a. If the max t-stat in the real data fails to exceed the threshold (95th percentile of the null distribution), stop (no fund has skill).
• Step 5b. If the max t-stat in the real data exceeds the threshold, declare the fund, say, F7, “true”

[Figure: Bootstrap distribution of the max t-stat; horizontal axis from -6 to 6; 95th percentile at t = 4.2]

Page 73: 2. Bootstrapping: Lucky Factors

• Step 6. Replace the F7 (no skill) with the actual F7 (positive alpha).
• Step 7. Note that 99 funds have zero alpha and one fund has positive alpha.

Page 74: 2. Bootstrapping: Lucky Factors

• Step 8. Repeat Steps 3-5 but now we are saving the “second to max” and comparing to the second highest t-ratio in the real data.
• Step 9. Continue until the max ordered t-statistic in the data fails to exceed the max ordered from the bootstrap.
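Steps 1 to 5 can be sketched in plain Python. This is a simplified illustration, not the “Lucky Factors” implementation: there is no benchmark, so "alpha" is just each fund's mean return and Step 1 is done by demeaning instead of regressing; the data are simulated; and the sizes (120 observations, 20 funds, 200 bootstrap draws) are scaled down from the slide's 500, 100, and 10,000 so the example runs quickly. All function names are hypothetical.

```python
import math
import random


def t_stat(xs):
    """t-statistic of the sample mean against zero."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)
    return mean / math.sqrt(var / n)


def max_t_null(rows, n_boot=200, seed=0):
    """Sorted bootstrap distribution of the max t-stat under 'no skill'.

    rows: list of observations, each a list with one return per fund.
    """
    rng = random.Random(seed)
    n_obs, n_funds = len(rows), len(rows[0])
    # Step 1 (simplified): remove each fund's mean so every alpha is exactly zero.
    means = [sum(row[j] for row in rows) / n_obs for j in range(n_funds)]
    null_rows = [[row[j] - means[j] for j in range(n_funds)] for row in rows]
    max_ts = []
    for _ in range(n_boot):  # Step 4: repeat Steps 2 and 3
        # Step 2: resample whole rows with replacement (keeps cross-correlation).
        sample = [null_rows[rng.randrange(n_obs)] for _ in range(n_obs)]
        # Step 3: recompute each fund's t-stat and save the highest one.
        max_ts.append(max(t_stat([r[j] for r in sample]) for j in range(n_funds)))
    return sorted(max_ts)


# Simulated "real" data: 120 months for 20 funds, none with any skill.
data_rng = random.Random(42)
rows = [[data_rng.gauss(0.0, 0.04) for _ in range(20)] for _ in range(120)]

null_dist = max_t_null(rows)
threshold = null_dist[int(0.95 * len(null_dist))]  # 95th percentile of the null
observed_max = max(t_stat([r[j] for r in rows]) for j in range(20))

# Step 5: compare the best fund in the data with the no-skill distribution.
print(f"95th percentile of max t-stat under the null: {threshold:.2f}")
print(f"max t-stat in the data: {observed_max:.2f}")
print("best fund declared skilled" if observed_max > threshold else "no fund has skill")
```

Steps 6 to 9 would restore the declared fund's actual returns and repeat the comparison using the second-highest t-statistic, and so on down the ordering.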

Page 75: 2. Bootstrapping: Lucky Factors

[Flowchart. Labels: Baseline model; Candidate factors; Yes: Augmented model; No: Terminate to arrive at the final model]

Page 76: 2. Bootstrapping: Lucky Factors

• Addresses data mining directly
• Allows for cross-correlation of the fund strategies because we are bootstrapping rows of data
• Allows for non-normality in the data (no distributional assumptions imposed; we are resampling the original data)
• Potentially allows for time-dependence in the data by changing to a block bootstrap.
• Answers the questions:
– How many funds out-perform?
– Which ones were just lucky?

Page 77: 2. Bootstrapping: Lucky Factors

Page 78: The Search for Repeatable Performance - Duke's …charvey/Teaching/663_2017/...The Search for Repeatable Performance ... Campbell R. Harvey, “The Scientific Outlook in Financial

3. Noise reduction: Rethinking

Issue • Past alphas do a poor job of predicting future alphas (e.g., top quartile managers are about as likely to be in top quartile next year as this year’s bottom quartile managers!)


3. Noise reduction: Rethinking

Issue
• This could be because all managers are unskilled, or it could be the result of a lot of noise in historical performance


3. Noise reduction: Rethinking

Goal
• Develop a metric that maximizes cross-sectional predictability of performance
• Useful for separating “skill” vs. “luck” and “smart” vs. “not-smart”


3. Noise reduction: Rethinking

Observed performance consists of four components:
• Alpha
• True factor premia
• Unmeasured risk (e.g., low vol strategy having negative convexity)
• Noise (good or bad luck)


3. Noise reduction: Rethinking

Intuition
• Current alpha is overfit. The regression maximizes the time-series R² for a particular fund.
• This time-series regression has nothing to do with cross-sectional predictability.
• All of the noise will be put in the alpha.
• No surprise that past alphas have no ability to forecast future alphas


3. Noise reduction: Rethinking

Our approach
• We follow the machine learning literature and “regularize” the problem by imposing a parametric distribution on the cross-section of alphas.
• Leads to lower time-series R² – but higher cross-sectional R²


3. Noise reduction: Rethinking

• t-stat = 3.9%/4.0% = 0.98 < 2.0
• alpha = 0 cannot be ruled out


3. Noise reduction: Rethinking

• Both t-stats < 2.0
• alpha = 0 cannot be rejected for either


3. Noise reduction: Rethinking

• t-stat < 2.0 for all funds
• alpha = 0 cannot be excluded for any fund
• However, the estimated alphas seem to cluster around a population mean of 4.0%. Should we declare all alphas as zero?

Estimated alphas cluster around 4.0%


3. Noise reduction: Rethinking

• Although no individual fund has a statistically significant alpha, the population mean seems to be well estimated at 4.0%.

• This might suggest grouping all funds into an index and estimating the alpha for the index. However, the index regression does not always work, as the next example shows. 


3. Noise reduction: Rethinking

• Again, no fund generates a significant alpha individually

• An index fund that groups all funds together would indicate an approximately zero alpha for the index

• Fund alphas cluster into two groups. The two group classification seems more informative than declaring all alphas zero


3. Noise reduction: Rethinking

We assume that fund alphas are drawn from an underlying distribution (regularization)

– In Example 1, the distribution is a point mass at 4.0%; in Example 2, the distribution is a discrete distribution that has a mass of 0.5 at ‐4.0% and 0.5 at 4.0%

– We search for the best fitting distribution that describes the cross‐section of fund alphas using a generalized mixture distribution


3. Noise reduction: Rethinking

We refine the alpha estimate of each individual fund by drawing information from this underlying distribution

– In Example 1, knowing that most alphas cluster around 4.0% would pull our estimate of an individual fund’s alpha towards 4.0% and away from zero.

– In Example 2, knowing that alphas cluster at ‐4.0% and 4.0% with equal probabilities would pull our estimate of a negative alpha towards ‐4.0% and a positive alpha towards 4.0%, and both away from zero.


3. Noise reduction: Rethinking


3. Noise reduction: Rethinking

Key idea:
– We assume that true alphas follow a parametric distribution. We back out this distribution from the observed returns and use it to aid the inference of each individual fund.

Main difficulty:
– We do not observe the true alphas. We only observe returns, which provide noisy information on true alphas.

Our approach:
– We treat true alphas as missing observations and adapt the Expectation-Maximization (EM) algorithm to uncover the true alphas.


3. Noise reduction: Rethinking

Iterative method:
– This method weights both the time-series information for a particular fund’s alpha and the cross-sectional information.

– This delivers a new estimate of alpha (the noise-reduced alpha), a distribution for that individual fund’s alpha, and a cross-sectional distribution. The shape of the distributions is general (we use a generalized mixture distribution).


3. Noise reduction: Rethinking

• Estimate fund-by-fund OLS alphas, betas, and standard errors.
• Call these alpha0, beta0, sigma0 (denote the square of sigma0 as var0).
• Assume a two-component population GMD and fit GMD0 to the OLS alphas, i.e., each fund’s alpha0.
• This implies one set of five parameters: MU01, MU02, SIGMA01, SIGMA02, P0 (mixing parameter). The first subscript denotes the iteration step.
• Also perturb these parameters to obtain 35 population GMDs as starting values (we want to minimize the chance of hitting a local optimum).
  – Note that population parameters are denoted in UPPER CASE and fund-specific parameters in lower case.


3. Noise reduction: Rethinking

• Given fund‐specific alpha0, beta0 and var0, and the population GMD0 also fit fund‐specific GMDs denoted as gmd0 (again, lower case for fund specific). 

• If the GMD is one component (i.e., a normal distribution), then the alpha for fund 1 also follows a one‐component GMD (i.e., a normal distribution). 

• The mean of gmd0 would be:

$$\text{mean}(\text{gmd}_0) = \frac{\text{alpha}_0/(\text{var}_0/T) + \text{MU}_0/\text{VAR}_0}{1/(\text{var}_0/T) + 1/\text{VAR}_0}$$

• Note that VAR0 is the variance of the population GMD (i.e., cross-sectional variance). Hence, if alpha0 is precisely estimated (high R² and low var0/T), a greater weight is placed on alpha0.

• This will be alpha1 for a candidate fund under a single-component GMD.
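
The single-component case can be sketched as a precision-weighted average. This is an illustrative sketch, not the author's code: the function name `shrink_alpha` and the example numbers are hypothetical, and it assumes var0 is the per-period residual variance and T the number of periods, so that var0/T is the sampling variance of the OLS alpha.

```python
def shrink_alpha(alpha0, var0, T, MU, VAR):
    """Precision-weighted mean and variance of a fund's alpha (one-component GMD).

    alpha0, var0: the fund's OLS alpha and residual variance (assumed per period)
    T:            number of periods
    MU, VAR:      mean and variance of the population (cross-sectional) distribution
    """
    prec_fund = 1.0 / (var0 / T)   # precision of the fund's own OLS alpha
    prec_pop = 1.0 / VAR           # precision of the population distribution
    mean = (alpha0 * prec_fund + MU * prec_pop) / (prec_fund + prec_pop)
    var = 1.0 / (prec_fund + prec_pop)
    return mean, var

# Hypothetical fund: OLS alpha of 10% with noisy returns, population centered at 4%
alpha1, var1 = shrink_alpha(alpha0=0.10, var0=0.04, T=10, MU=0.04, VAR=0.0004)
print(f"noise-reduced alpha = {alpha1:.4f}")
```

With these inputs the population is ten times more precise than the fund's own estimate, so the result lands much closer to 4% than to 10%.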


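3. Noise reduction: Rethinking

• If the GMD is two components, there are five parameters and, again, they will be a weighted average of the fund-specific alpha0 parameters and the GMD0.
• The parameters governing this fund-specific gmd will be conditional on the fund’s betas, standard error, and the GMD that governs the alpha population.

$$\text{mu}_{01} = \frac{\text{alpha}_0/(\text{var}_0/T) + \text{MU}_{01}/\text{VAR}_{01}}{1/(\text{var}_0/T) + 1/\text{VAR}_{01}}, \qquad \text{var}_{01} = \frac{1}{1/(\text{var}_0/T) + 1/\text{VAR}_{01}}$$

$$\text{mu}_{02} = \frac{\text{alpha}_0/(\text{var}_0/T) + \text{MU}_{02}/\text{VAR}_{02}}{1/(\text{var}_0/T) + 1/\text{VAR}_{02}}, \qquad \text{var}_{02} = \frac{1}{1/(\text{var}_0/T) + 1/\text{VAR}_{02}}$$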


3. Noise reduction: Rethinking

Details of method:
• There is also a fifth parameter of the gmd, p0 (the drawing probability from the gmd component).

• Its formula is a function of the GMD’s P0 and is provided on p. 49 of our paper. 

• The basic intuition is that we increase the drawing probability to the component that implies a mean that is closer to the mean of the population GMD. For example, we will make p0 larger if alpha0 is closer to MU01 than MU02. 
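
A hedged sketch of a fund-specific two-component gmd. The component means and variances use the precision-weighted form. The drawing probability uses the standard normal-mixture posterior weight (prior weight times the marginal likelihood of alpha0), which is an assumption made here for illustration and may differ from the formula in the paper. All names and numbers are hypothetical.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def fund_gmd(alpha0, var0, T, MU, VAR, P):
    """Fund-specific gmd from population parameters.

    MU, VAR: pairs (component 1, component 2); P: population weight on component 1.
    Returns ([(mu1, var1), (mu2, var2)], p) with p the weight on component 1.
    """
    se2 = var0 / T  # sampling variance of the OLS alpha
    comps = []
    for MU_l, VAR_l in zip(MU, VAR):
        var_l = 1.0 / (1.0 / se2 + 1.0 / VAR_l)
        mu_l = var_l * (alpha0 / se2 + MU_l / VAR_l)
        comps.append((mu_l, var_l))
    # Assumed update: weight each component by how well it explains alpha0
    w1 = P * normal_pdf(alpha0, MU[0], VAR[0] + se2)
    w2 = (1.0 - P) * normal_pdf(alpha0, MU[1], VAR[1] + se2)
    return comps, w1 / (w1 + w2)

# Population clustered at -4% and +4% with equal weights; fund has OLS alpha of 3%
comps, p = fund_gmd(alpha0=0.03, var0=0.04, T=10,
                    MU=(-0.04, 0.04), VAR=(1e-4, 1e-4), P=0.5)
gmd_mean = p * comps[0][0] + (1.0 - p) * comps[1][0]
print(f"weight on negative component = {p:.3f}, gmd mean = {gmd_mean:.4f}")
```

Because alpha0 is closer to the positive component's mean, that component receives the larger drawing probability, in line with the intuition stated above.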


3. Noise reduction: Rethinking

Details of method:
• For each fund's gmd, we calculate its mean. We estimate new regressions where we constrain the intercepts to be the calculated means. This will produce different estimates of the fund betas (beta1) and the standard errors (sigma1).


3. Noise reduction: Rethinking

Details of method:
• We fit a new GMD based on the cross-section of gmds. For each fund, we randomly draw m = 10,000 alphas from its gmd. Suppose we have n funds in the cross-section. We will have mn draws from the entire panel. We find the MLE of the GMD that best describes these mn alphas.

• Recalculate fund-specific gmds (gmd1) and draw alpha2
• Continue to iterate until there is negligible change in the parameters of the GMD.

• Repeat the entire process 35 times with different initial GMD0s to reduce the risk of converging to a local optimum.
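
The pooling step can be sketched as follows: draw alphas from every fund's gmd, pool them, and fit a two-component normal mixture to the pooled draws. The fit below uses a plain textbook EM loop as one way to approximate the MLE; this choice, the function names, and the simulated inputs are assumptions for illustration, and m is kept small so the sketch runs quickly.

```python
import math
import random

def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def draw_alphas(rng, comps, p, m):
    """Draw m alphas from one fund's gmd; p is the probability of component 1."""
    draws = []
    for _ in range(m):
        mu, var = comps[0] if rng.random() < p else comps[1]
        draws.append(rng.gauss(mu, math.sqrt(var)))
    return draws

def fit_gmd(data, MU, VAR, P, n_iter=50):
    """EM for a two-component normal mixture, started from (MU, VAR, P)."""
    MU, VAR = list(MU), list(VAR)
    for _ in range(n_iter):
        # E-step: probability that each draw came from component 1
        r = []
        for x in data:
            a = P * normal_pdf(x, MU[0], VAR[0])
            b = (1.0 - P) * normal_pdf(x, MU[1], VAR[1])
            r.append(a / (a + b))
        # M-step: weighted means, variances, and mixing weight
        n1 = sum(r)
        n2 = len(data) - n1
        MU[0] = sum(ri * x for ri, x in zip(r, data)) / n1
        MU[1] = sum((1.0 - ri) * x for ri, x in zip(r, data)) / n2
        VAR[0] = sum(ri * (x - MU[0]) ** 2 for ri, x in zip(r, data)) / n1
        VAR[1] = sum((1.0 - ri) * (x - MU[1]) ** 2 for ri, x in zip(r, data)) / n2
        P = n1 / len(data)
    return MU, VAR, P

rng = random.Random(7)
# Hypothetical fund-specific gmds: 10 funds lean negative, 10 lean positive
comps = [(-0.04, 1e-4), (0.04, 1e-4)]
fund_p = [0.9] * 10 + [0.1] * 10
pooled = []
for p in fund_p:
    pooled.extend(draw_alphas(rng, comps, p, m=200))  # the slides use m = 10,000

MU_hat, VAR_hat, P_hat = fit_gmd(pooled, MU=(-0.02, 0.02), VAR=(1e-3, 1e-3), P=0.5)
print(f"MU = ({MU_hat[0]:.3f}, {MU_hat[1]:.3f}), P = {P_hat:.2f}")
```

In the full procedure this fit would be followed by recomputing each fund's gmd and repeating until the population parameters stop changing.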


3. Noise reduction: Rethinking

• An exemplar outperforming fund


3. Noise reduction: Rethinking


3. Noise reduction: Rethinking

• In-sample: 1984-2001; Out-of-sample: 2002-2011

| In-sample bin | NRA forecast error (%) | OLS forecast error (%) | # of funds |
|---|---|---|---|
| (-∞, -2.0) | 3.29 | 6.61 | 64 |
| [-2.0, -1.5) | 3.09 | 3.70 | 75 |
| [-1.5, 0) | 2.75 | 2.92 | 565 |
| [0, 1.5) | 2.61 | 5.54 | 610 |
| [1.5, 2.0) | 2.38 | 10.47 | 87 |
| [2.0, +∞) | 2.77 | 12.02 | 87 |
| Overall | 2.71 | 5.17 | 1,488 |

*Mean absolute forecast errors.


4. P‐Hacking

• Choices are made in research that lead to a positive outcome


The Hurricane and Himmicane

Jung, Shavitt, Viswanathan and Hilbe, “Female hurricanes are deadlier than male hurricanes,” Proceedings of the National Academy of Sciences, 2014

Hypothesis: Sexism causes people to take hurricanes with female names less seriously.

http://www.pnas.org/content/111/24/8782

2015 Impact Factor = 9.4
Publishes 3,100 papers per year


The Hurricane and Himmicane

Jung, Shavitt, Viswanathan and Hilbe, 2014:

“a hurricane with a relatively masculine name … is estimated to cause 15.15 deaths, whereas a hurricane with a relatively feminine name … is estimated to cause 41.84 deaths. Our model suggests that changing a severe hurricane’s name from Charley … to Eloise could nearly triple its death toll.”

“a hazardous form of implicit sexism”

http://www.pnas.org/content/111/24/8782


The Hurricane and Himmicane

However, certain choices were made by the researchers:
• Why exclude named tropical storms? 18 Atlantic tropical storms caused 235 deaths compared to 22 hurricanes causing 614 deaths
• Why exclude storms that do not make landfall?
• Why only count offshore fatalities if the storm comes on-shore?
• Why exclude fatalities outside the US? In 1980, Hurricane Allen made landfall near Brownsville, TX on the border of Mexico. There were 269 deaths but Jung et al. count only 2.
• Why exclude fatalities from other countries?
• Why not test robustness of results using Pacific hurricane data?

http://dx.doi.org/10.1016/j.wace.2015.11.006


The Hurricane and Himmicane

Gary Smith, “Hurricane names: A bunch of hot air?” Weather and Climate Extremes, 2016, accuses authors of:
• Arbitrary exclusion of “outliers”
• Arbitrary construction of a masculinity-femininity index on a scale of 1-11. (Sandy is considered strongly feminine – more than Edith)
• Estimation of dozens of models and cherry picking the one that gives “significant” results.

http://dx.doi.org/10.1016/j.wace.2015.11.006

Impact Factor = 1.4


The Hurricane and Himmicane

Gary Smith (2016) accuses authors of:
• Dropping a key variable, “years elapsed since the occurrence of hurricanes”
• Misspecification of the basic model by including monetary damages as an “explanatory variable” (monetary damage cannot ‘cause’ fatalities)
• “Data dredging”: “If you torture the data long enough, it will confess.”

Smith finds no significant differences when the sample is expanded and the specification corrected.

http://dx.doi.org/10.1016/j.wace.2015.11.006

Impact Factor = 1.4


4. P‐Hacking

• Sample selection
• “Outlier” exclusion
• Data transformation (scaling)
• Variable selection and combination
• Statistical test choice
• In-sample/out-of-sample

Indeed, most academic and commercial research suffers from some form of p-hacking.


5. Rare Effects

• Bonferroni correction increases the threshold as a result of multiple tests

• If an effect is rare, a very high share of the significant results will be false positives (a high false discovery rate), even at a conventional Type I error rate
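
As a rough illustration of how a Bonferroni correction raises the bar, the sketch below converts a 5% two-sided significance level into a critical t-statistic for a given number of tests. It uses a normal approximation, and the function name is hypothetical.

```python
from statistics import NormalDist

def bonferroni_t_threshold(n_tests, alpha=0.05):
    """Two-sided critical value (normal approximation) after a Bonferroni correction."""
    adjusted_alpha = alpha / n_tests  # each test gets an equal share of alpha
    return NormalDist().inv_cdf(1.0 - adjusted_alpha / 2.0)

thresholds = {m: bonferroni_t_threshold(m) for m in (1, 10, 100, 1000)}
for m, cutoff in thresholds.items():
    print(f"{m:>5} tests: |t| must exceed {cutoff:.2f}")
```

The threshold rises with the number of tests, but the correction says nothing about how plausible each hypothesis was to begin with, which is the point of this section.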


The Power Pose

Hypothesis: Standing in a posture of confidence impacts testosterone and cortisol levels in the brain, leading to increased risk taking.

Evidence: Carney, Cuddy and Yap (2010) Psychological Science

Carney, Cuddy and Yap, 2010. Power Posing: Brief Nonverbal Displays Affect Neuroendocrine Levels and Risk Tolerance, Psychological Science 21(10), 1363-1368.


The Power Pose

Second most viewed TED talk in history


The Power Pose

New York Times Best Seller (reached #3)


The Power Pose

Simmons and Simonsohn, 2016, https://ssrn.com/abstract=2791272

The Power Pose

Simmons and Simonsohn, 2016, https://ssrn.com/abstract=2791272

Also see Gelman and Fung, 2016. http://www.slate.com/articles/health_and_science/science/2016/01/amy_cuddy_s_power_pose_research_is_the_latest_example_of_scientific_overreach.html


The Power Pose

Simmons and Simonsohn, 2016, https://ssrn.com/abstract=2791272

24 studies


The Power Pose

Dana Carney retracts

http://faculty.haas.berkeley.edu/dana_carney/pdf_My%20position%20on%20power%20poses.pdf


Rare Effects: 500 Shades of Gray

Experiment conducted at University of Virginia
• Hypothesis: Political extremists see only black and white – literally.
• Experiment: Show words in different shades of gray and then ask participants to try to match color on gradient.
• Afterwards, evaluate where participants' political beliefs place them on the spectrum and test the hypothesis that moderates are more accurate.

Nosek, Spies and Motyl (2012)


Rare Effects: 500 Shades of Gray

[Figure: the word “Hello” shown in a shade of gray, with the instruction “Drag slide to match the color of the word”.]


Rare Effects: 500 Shades of Gray

[Figure panels: Group 1: Moderates; Group 2: Extremists]


Rare Effects: 500 Shades of Gray

Dramatic results with large sample of 2,000 participants
• Moderates were able to see significantly more shades of gray
• P-value < 0.001, which is highly significant, implying less than a 0.1% chance of observing results this extreme under the null hypothesis of no effect


Rare Effects: 500 Shades of Gray

Researchers decided to replicate before submitting results for publication in a top journal
• Replication saw no significant difference
• P-value was 0.59 (not even close to significant)


Rare Effects: 500 Shades of Gray

Lesson: If the hypothesis is unlikely, then we need to be especially careful. There will be a lot of false positives using standard testing procedures. Ideally, we incorporate information in the testing procedure when we know the effect is rare.


Baker‐Miller Pink

A.G. Schauss
• “Tranquilizing Effect of Color Reduces Aggressive Behavior and Potential Violence”*
• “Room Color and Aggression in A Criminal Detention Holding Cell: A Test of the ‘Tranquilizing Pink’ Hypothesis”**
  – Named Baker-Miller pink after the two Naval correctional institute directors who sponsored the experiment.


*http://www.orthomolecular.org/library/jom/1979/pdf/1979‐v08n04‐p218.pdf

** http://orthomolecular.org/library/jom/1981/pdf/1981‐v10n03‐p174.pdf


Baker-Miller Pink

Last month Kendall Jenner, the reality television celebrity and half-sister of Kim Kardashian, announced to the world that she had painted her living-room wall the shade of pink used in some American police cells to calm rowdy detainees.

A friend had told her that staring at this hue, which is known variously as “drunk-tank” pink and Baker-Miller pink, was “scientifically proven” to suppress the appetite.

The Times, February 2, 2017


Baker‐Miller Pink

Gilliam and Unruh
• “The Effects of Baker-Miller Pink on Biological, Physical and Cognitive Behavior”*
  – Original effect debunked
  – Yet belief that the effect is true lives on …


* http://www.orthomolecular.org/library/jom/1988/pdf/1988‐v03n04‐p202.pdf


5. Rare Effects


• 1% of women aged 40-50 have breast cancer – a relatively rare event
• 90% chance of a true positive test from a mammogram
• 10% error rate from mammogram (false positives)

What is the chance that a woman has breast cancer given a positive test?


5. Rare Effects

• Survey of doctors found mean response 75%


5. Rare Effects

• Sample size = 1,000 and 10 true cases
• Test 90% accurate: 9 of the 10 true cases test positive
• 10% error rate: 99 of the 990 cases without cancer also test positive
• Given a positive test result, what is the probability of cancer? 9/(9 + 99) = 8.3%
• In other words, the probability of a false diagnosis is 99/(9 + 99) = 91.7%

Note, this is a simple application of Bayes' Rule


Rate of False Discoveries/Diagnoses

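$$\text{Expected fraction of false discoveries} = \frac{\text{error rate} \times (1 - \text{prior})}{\text{error rate} \times (1 - \text{prior}) + \text{power} \times \text{prior}}$$

The prior is the prior probability (1% for breast cancer); the error rate is the probability of a false positive; the power is the probability the test will reject the null when the alternative is true.
• Even with 100% power, if the prior is very small, then the expected fraction of false discoveries is very high (close to one).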
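
A short sketch of the false-discovery calculation as an application of Bayes' rule. The function and argument names are hypothetical; the inputs are a prior probability that the effect is real, the test's error rate (false positive rate), and its power.

```python
def false_discovery_rate(prior, error_rate, power):
    """Expected fraction of positive results that are false."""
    false_positives = error_rate * (1.0 - prior)  # no effect, but the test fires
    true_positives = power * prior                # real effect, and the test fires
    return false_positives / (false_positives + true_positives)

# Mammogram example: 1% prior, 10% error rate, 90% power
fdr_mammogram = false_discovery_rate(prior=0.01, error_rate=0.10, power=0.90)
print(f"P(no cancer | positive test) = {fdr_mammogram:.3f}")
print(f"P(cancer | positive test)    = {1.0 - fdr_mammogram:.3f}")

# A rare effect with perfect power and a 5% error rate (illustrative numbers)
fdr_rare = false_discovery_rate(prior=0.001, error_rate=0.05, power=1.0)
print(f"false discovery rate for a rare effect = {fdr_rare:.3f}")
```

The mammogram inputs give about 92% false diagnoses, far from the 75% chance of cancer that surveyed doctors reported on average.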


The Case for Using Priors

Three experiments:*
1. The musicologist
2. The tea drinker
3. The bar patron

*Based on correspondence from Leonard Savage in 1962.


The Case for Using Priors

Musicologist claims to be able to identify from unlabeled scores whether Haydn or Mozart is the composer

Simple experiment: 10 pairs of scores. Musicologist gets 10/10 correct


The Case for Using Priors

Tea drinker claims to be able to identify whether milk was in the tea cup before the tea was poured or added afterwards

Simple experiment: 10 pairs of tea cups. The tea drinker gets 10/10 correct


The Case for Using Priors

Bar patron claims that alcohol enables him to foresee the future

Simple experiment: Flip coin 10 times. The bar patron gets 10/10 correct.


The Case for Using Priors

All three experiments have the identical p-value:
0.5^10 = 0.000977 (or p-value < 0.001)

• This means there is less than a 1 in 1,000 chance of observing a result this extreme if the null hypothesis (no ability to choose correct answers) is true

• Though p‐values are identical, the results have different impacts on our beliefs


The Case for Using Priors

Three experiments:
1. The musicologist: We already know she is an expert. Indeed, it is not even clear that we need to do the experiment. Our beliefs are barely impacted.

2. The tea drinker: We might have been a bit skeptical of this long-time tea drinker. However, after these results, the plausibility of the claim is greatly strengthened and our beliefs shift.

3. The bar patron: The hypothesis is preposterous. A p-value of 0.001 or even lower would not change our beliefs.


The Bayesian Setup

Bayesian learning implies:
Posterior odds = Bayes Factor × Prior odds

• Where the Bayes Factor is the ratio of the data likelihood under the null to the data likelihood under the alternative.

• The Bayes Factor tells us how much we are moved away from the prior given the evidence. 

• A very small Bayes factor is supportive of the alternative.
• In practice, could be difficult to implement
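
To see how identical evidence moves different priors, the sketch below applies the odds form of Bayes' rule to the three experiments. Two things are assumptions made purely for illustration: the alternative is taken to be perfect ability (so a 10/10 result has likelihood 1 under it), and the prior odds for each claim are hypothetical numbers.

```python
def posterior_prob_null(bayes_factor, prior_odds_null):
    """Posterior probability of the null, from posterior odds = BF x prior odds."""
    posterior_odds = bayes_factor * prior_odds_null
    return posterior_odds / (1.0 + posterior_odds)

# Likelihood of 10/10 under the null (pure guessing) is 0.5**10.
# Assumption: under the alternative (perfect ability) the likelihood is 1.
bayes_factor = 0.5 ** 10 / 1.0

# Hypothetical prior odds (null : alternative), chosen only for illustration
prior_odds = {"musicologist": 1.0 / 9.0, "tea drinker": 4.0, "bar patron": 1e9}
posteriors = {name: posterior_prob_null(bayes_factor, odds)
              for name, odds in prior_odds.items()}
for name, prob in posteriors.items():
    print(f"{name}: P(null | data) = {prob:.6f}")
```

The same Bayes factor leaves the bar patron's claim almost certainly false while making the tea drinker's claim very plausible.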


A Simplified Approach

Minimum Bayes Factor
• The MBF is the lower bound among all Bayes factors.
• It is achieved when the prior distribution of alternative hypotheses has all of its density at the maximum likelihood estimate of the data.

• It is the Bayes factor that provides the strongest evidence against the null hypothesis.


A Simplified Approach

Minimum Bayes Factor
• It is very easy to calculate:

$$\text{MBF} = e^{-t^2/2}$$

• So, if t-stat = 2.0 (usually associated with p-value of 0.05), the MBF is 0.14.


A Bayesianized P‐value

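We can use the MBF to answer a key question:

$$\text{Bayesianized p-value} = \frac{\text{MBF} \times \text{prior odds}}{1 + \text{MBF} \times \text{prior odds}}$$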

• The Bayesianized p‐value tells us the probability the null is true given the data

• It answers the right question


A Bayesianized P‐value

Example: Suppose our null hypothesis is that variable Y is not predicted by X. We run a regression of Y on X with 300 observations and find a “significant” coefficient with a t-statistic of 2.6, which has a two-sided p-value of about 0.01.

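$$\text{MBF} = e^{-2.6^2/2} = 0.034$$

• Let’s assume prior odds are even, i.e., 1:1

$$\text{Bayesianized p-value} = \frac{0.034 \times 1}{1 + 0.034 \times 1} = 0.033$$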

• However, if you think there are modest odds against the effect being real, say 2:1, the probability that the null is true increases to 0.064. 
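
The numbers in this example can be reproduced with a few lines. The sketch assumes MBF = exp(-t²/2), which matches the quoted values (0.14 at t = 2.0 and 0.034 at t = 2.6), and takes prior odds as the odds of the null to the alternative. Function names are hypothetical.

```python
import math

def minimum_bayes_factor(t_stat):
    """MBF = exp(-t^2 / 2)."""
    return math.exp(-t_stat ** 2 / 2.0)

def bayesianized_p_value(mbf, prior_odds):
    """Probability the null is true given the data; prior_odds = null : alternative."""
    return mbf * prior_odds / (1.0 + mbf * prior_odds)

mbf = minimum_bayes_factor(2.6)
p_even = bayesianized_p_value(mbf, prior_odds=1.0)      # even odds, 1:1
p_skeptical = bayesianized_p_value(mbf, prior_odds=2.0)  # 2:1 against the effect
print(f"MBF = {mbf:.3f}")
print(f"Bayesianized p-value at 1:1 = {p_even:.3f}")
print(f"Bayesianized p-value at 2:1 = {p_skeptical:.3f}")
```

Raising the prior odds against the effect raises the probability that the null is true, even though the t-statistic is unchanged.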


A Bayesianized P‐value

There is another MBF that is not as generous to the alternative. It places the mass around the null, with a symmetric and declining density (the SD-MBF). It is also very easy to calculate:

$SD\text{-}MBF = -e \times p \times \ln(p)$

where e is the base of the natural logarithm and p is the usual p-value.

• This type of prior might be more appropriate in certain situations in finance
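
As a hedged sketch, the symmetric-and-declining bound is commonly written as -e × p × ln(p), which is the form assumed here; it only makes sense for p-values below 1/e. The function name `sd_mbf` and the comparison at p = 0.05 are illustrative choices, not taken from the slides.

```python
import math


def sd_mbf(p_value: float) -> float:
    """Symmetric-and-declining MBF, assumed to be -e * p * ln(p) for 0 < p < 1/e."""
    if not 0.0 < p_value < 1.0 / math.e:
        raise ValueError("the bound applies only for 0 < p < 1/e")
    return -math.e * p_value * math.log(p_value)


# Compare with the plain MBF at the same significance level (p = 0.05, Z = 1.96).
plain_mbf = math.exp(-1.96 ** 2 / 2.0)
print(f"SD-MBF={sd_mbf(0.05):.3f}, MBF={plain_mbf:.3f}")
```

The SD-MBF comes out larger than the plain MBF for the same p-value, which is what "not as generous to the alternative" means in practice.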


Examples in Practice

Prior category | Effect | Sample | Reported t-stat | Reported p-value | MBF | Prior odds-ratio (null/alternative) | Bayesianized p-value
A stretch | Clever tickers outperform (Head, Smith and Wilson, 2009) | 1984-2005 | 2.66 | 0.0079 | 0.0291 | 99/1 | 0.742
Perhaps | Size priced (Fama and French, 1992) | 1963-1990 | 2.58 | 0.0099 | 0.0359 | 4/1 | 0.125
Solid footing | Market beta priced (Fama and MacBeth, 1973) | 1935-1968 | 2.57 | 0.0100 | 0.0368 | 1/1 | 0.035
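
The MBF and Bayesianized p-value columns can be recomputed from the reported t-statistics and prior odds. This is a sketch under the same assumptions as before: MBF = exp(-Z²/2), and prior odds expressed as null over alternative. The variable and function names are illustrative.

```python
import math

# (effect, reported t-stat, prior odds null/alternative) taken from the table
rows = [
    ("Clever tickers outperform", 2.66, 99.0),
    ("Size priced", 2.58, 4.0),
    ("Market beta priced", 2.57, 1.0),
]


def mbf_and_bayesianized_p(t_stat: float, prior_odds: float) -> tuple[float, float]:
    """Return (MBF, Bayesianized p-value) for one row."""
    mbf = math.exp(-t_stat ** 2 / 2.0)
    posterior_odds = mbf * prior_odds
    return mbf, posterior_odds / (1.0 + posterior_odds)


results = {}
for effect, t_stat, prior_odds in rows:
    results[effect] = mbf_and_bayesianized_p(t_stat, prior_odds)
    mbf, bpv = results[effect]
    print(f"{effect}: MBF={mbf:.4f}, Bayesianized p-value={bpv:.3f}")
```

The three t-statistics are nearly identical, so almost all of the variation in the final column comes from the prior odds.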


Final perspectives

The combination of a propensity for Type I errors, incorrect testing methods, and lack of effort to reduce noise implies:

• Most published empirical research findings are likely false
• Most research conducted within companies is likely false
• Most managers are just “lucky”
• Most smart beta products are not “smart”
• No predictability in performance based on past performance


Final perspectives

• My research makes progress on the goal of identifying repeatable performance

• There are a host of other issues:
  • Factor loadings are also noisy
  • Ex-post factor loadings unfairly punish market timers
  • It is essential to look beyond the Sharpe Ratio and incorporate other information


Credits

Joint work with Yan Liu, Texas A&M University

Based on:
• “The Scientific Outlook in Financial Economics” https://ssrn.com/abstract=2893930 [Presidential Address]

and my joint work with Yan Liu:
• “… and the Cross-section of Expected Returns” http://ssrn.com/abstract=2249314 [Best paper in investment, WFA 2014]
• “Backtesting” http://ssrn.com/abstract=2345489 [Bernstein Fabozzi/Jacobs-Levy best paper, JPM 2016]
• “Evaluating Trading Strategies” http://ssrn.com/abstract=2474755 [Bernstein Fabozzi/Jacobs-Levy best paper, JPM 2015]
• “Lucky Factors” http://ssrn.com/abstract=2528780
• “Rethinking Performance Evaluation” http://ssrn.com/abstract=2691658


Appendix: Questionnaire

Hypothesis: Firms with small boards of directors outperform companies with large boards. Test for differences in mean returns (with controls) is significant with t=2.7 (p-value=0.01).

Consider the following questions: (True or False)
1. You have disproved the null hypothesis (no difference in mean performance).
2. You have found the probability of the null hypothesis being true.
3. You have proved your hypothesis that firms with small boards outperform firms with large boards.
4. You can deduce the probability of your hypothesis (small better than large) being true.
5. You know, when you reject the null hypothesis (of no difference), the probability that you are making a mistake.
6. You have a reliable finding in the sense that if, hypothetically, the experiment were repeated a great number of times, you would obtain a significant result on 99% of the occasions.


Appendix

Answer is “False” for each question.
• The p-value does not tell you whether the null hypothesis or the underlying experimental hypothesis is “true”. It is also incorrect to interpret the test as providing (1 - p-value) percent confidence that the effect being tested is true. Hence, both (1) and (3) are false.
• The p-value tells us the probability of observing an effect, D, or greater, given the null hypothesis, H0, is true, i.e. p(D|H0). It does not tell us p(H0|D); hence (2) is false.
• The p-value says nothing about the experimental hypothesis being true or false; hence (4) is false. Question (5) also refers to the probability of a hypothesis, which the p-value does not deal with. Hence, (5) is false.
• The complement of the p-value does not tell us the probability that a similar effect will hold up in the future unless we know the null is true, and we don’t. Hence (6) is false.


Appendix

P-value is P[D|H] not P[H|D]
• Where H is the null hypothesis and D is the observed data
• It is routine to look at a low p-value, like p=0.01, and conclude that there is only a 1% chance the null is true. That is incorrect.


Appendix

P-value is P[D|H] not P[H|D]
• To see the large gap consider the difference between:
  – P[Death|Hanging] = .99
  – P[Hanging|Death] = .01

It makes no sense to equate the two.
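
Bayes' rule shows where the gap comes from: P[H|D] = P[D|H] × P[H] / P[D], so the two conditional probabilities differ by the ratio of the base rates. The sketch below uses the slide's P[Death|Hanging] = .99 together with two hypothetical base rates (`p_hanging` and `p_death` are made-up numbers chosen only so that the result matches the slide's .01).

```python
def bayes_posterior(p_d_given_h: float, p_h: float, p_d: float) -> float:
    """Bayes' rule: P[H|D] = P[D|H] * P[H] / P[D]."""
    return p_d_given_h * p_h / p_d


p_death_given_hanging = 0.99  # from the slide
p_hanging = 0.0001            # hypothetical base rate, for illustration only
p_death = 0.0099              # hypothetical base rate, for illustration only

p_hanging_given_death = bayes_posterior(p_death_given_hanging, p_hanging, p_death)
print(f"P[Hanging|Death] = {p_hanging_given_death:.2f}")
```

The same logic applies to hypothesis tests: converting P[D|H0] into P[H0|D] requires a prior on the null, which the p-value alone does not supply.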


Appendix

In addition...
• The p-value is routinely used to choose among specifications, i.e. choose the one with the lowest p-value. Comparing p-values across specifications has no statistical meaning.
• A low p-value, while rejecting the null hypothesis, tells us very little about the ability of the hypothesis to explain the data. That is, you might observe a low p-value but the model has a low R².
• Low p-values could be a result of not controlling for multiple testing.
• Low p-values could be a result of selection and/or p-hacking.
• Low p-values could be a result of a misspecified test.
• P-values crucially depend on the amount of data. It has been well known since Berkson (1938, 1942) that with enough data, you can reject any null hypothesis.
• P-values do not tell us about the size of the economic effect.
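
Two of the points above can be illustrated numerically. This is a sketch under simplifying assumptions: a normal approximation for the p-value, independent tests for the multiple-testing calculation, and a hypothetical effect size of 0.01 with standard deviation 1. All function names are illustrative.

```python
import math


def two_sided_p_value(z: float) -> float:
    """Two-sided p-value under a standard normal approximation."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def t_stat_for_mean(effect: float, std_dev: float, n_obs: int) -> float:
    """t-statistic for a sample mean: effect / (std_dev / sqrt(n))."""
    return effect / (std_dev / math.sqrt(n_obs))


def prob_at_least_one_false_positive(alpha: float, n_tests: int) -> float:
    """Chance of one or more false rejections across independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


# The same tiny (hypothetical) effect becomes "significant" with enough data.
effect, std_dev = 0.01, 1.0
for n_obs in (100, 10_000, 1_000_000):
    z = t_stat_for_mean(effect, std_dev, n_obs)
    print(f"n={n_obs}: t={z:.1f}, p={two_sided_p_value(z):.3g}")

# Without a multiple-testing correction, 20 tests at the 5% level
# are more likely than not to produce at least one false positive.
print(f"{prob_at_least_one_false_positive(0.05, 20):.2f}")
```

The effect size never changes in the loop; only the sample size does, which is why a low p-value says nothing about economic magnitude.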
