Applied Statistics III

Applied Statistics, Vincent JEANNIN – ESGF 4IFM Q1 2012

Third Session, MSc 4th Year

Transcript of Applied Statistics III

Page 1: Applied Statistics III

Applied Statistics
Vincent JEANNIN – ESGF 4IFM Q1 2012

vinzjeannin@hotmail.com

Page 2: Applied Statistics III

Summary of the session (est. 4.5h)

• Reminders of last session
• Multiple regression
• Introduction to econometrics
• Estimations
• Games: beat the statistics

Page 3: Applied Statistics III

Reminders of last session

3 methods to calculate the VaR (Value at Risk)

• Historical
• Parametric
• Monte Carlo

Page 4: Applied Statistics III

Options: what to look at to calculate the VaR?

4 risk factors:
• Underlying price
• Interest rate
• Volatility
• Time

4 answers:
• Delta/Gamma approximation knowing the distribution of the underlying
• Rho approximation knowing the distribution of the underlying rate
• Vega approximation knowing the distribution of implied volatility
• Theta (time decay)

Yes, but… Do the underlying price, rate and volatility vary independently?

It might be a bit more complicated than expected…

Page 5: Applied Statistics III

Portfolio scale: what to look at to calculate the VaR?

Big question, is the VaR additive?

NO! Keywords for the future: covariance, correlation, diversification

Page 6: Applied Statistics III

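Parametric VaR on 2 assets?

Variance of a weighted sum (here Var is the variance, not the Value at Risk):

$\mathrm{Var}(aX + bY) = a^2\,\mathrm{Var}(X) + b^2\,\mathrm{Var}(Y) + 2ab\,\mathrm{Cov}(X, Y)$

$P(X \le -1.645\,\sigma + \mu) = 0.05$

$P(X \le -2.326\,\sigma + \mu) = 0.01$

Asset 1: Mean 0, SD 2.34%, Weight 50%

Asset 2: Mean 0, SD 1.50%, Weight 50%

Correlation: 0.59

What is the VaR (95%)?

2.83%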
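
The 2.83% figure can be checked from the variance formula of a weighted sum. A minimal R sketch, assuming zero means and normally distributed returns as on the slide; the variable names are illustrative only:

```r
# Inputs taken from the slide
sd1 <- 0.0234; sd2 <- 0.0150   # standard deviations of the two assets
w1 <- 0.5; w2 <- 0.5           # portfolio weights
rho <- 0.59                    # correlation between the two assets

# Var(aX + bY) = a^2 Var(X) + b^2 Var(Y) + 2ab Cov(X, Y)
cov12 <- rho * sd1 * sd2
port_var <- w1^2 * sd1^2 + w2^2 * sd2^2 + 2 * w1 * w2 * cov12
port_sd <- sqrt(port_var)

# 95% parametric VaR with a zero mean: about 1.645 standard deviations
var95 <- -qnorm(0.05) * port_sd
round(100 * var95, 2)  # 2.83 (in %)
```

For comparison, adding the two stand-alone VaRs would give 1.645 × (50% × 2.34% + 50% × 1.50%) = 3.16%: the VaR is not additive, the correlation below 1 brings diversification.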

Page 7: Applied Statistics III

Linear regression model

Minimize the sum of the squared vertical distances between the observations and the linear approximation

$y = f(x) = ax + b$

Residual $\varepsilon$

OLS: Ordinary Least Squares

Minimising residuals

$E = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} \left(y_i - (a x_i + b)\right)^2$

$a = \frac{\mathrm{Cov}_{xy}}{\sigma_x^2}$

$b = \bar{y} - a\,\bar{x}$
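
The closed-form slope and intercept can be compared with R's own linear model. A small sketch on made-up data (the data and variable names are assumptions for illustration, not course material):

```r
set.seed(1)
# Hypothetical data: a straight line plus noise
x <- 1:20
y <- 3 + 2 * x + rnorm(20)

# Closed-form OLS estimates: a = Cov(x, y) / Var(x), b = mean(y) - a * mean(x)
a <- cov(x, y) / var(x)
b <- mean(y) - a * mean(x)

# Same fit with R's built-in linear model
fit <- lm(y ~ x)
all.equal(unname(coef(fit)), c(b, a))  # TRUE
```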

Page 8: Applied Statistics III

$r = \frac{\mathrm{Cov}_{xy}}{\sigma_x \sigma_y}$, value between -1 and 1

$R^2 = \frac{\text{Regression dispersion}}{\text{Total dispersion}}$

Page 9: Applied Statistics III

Page 10: Applied Statistics III

Page 11: Applied Statistics III

A transformation of the data can happen before the OLS

What do you suggest?

Page 12: Applied Statistics III

Let’s create a new variable

$Y_{Diff} = \ln(Y)$

Magic!

Page 13: Applied Statistics III

New idea… No intercept

Only one parameter to estimate:
• Slope $a$

Minimising residuals

$E = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} (y_i - a x_i)^2$

When is E minimal?

When the partial derivative with respect to $a$ is 0

Page 14: Applied Statistics III

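$E = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} (y_i - a x_i)^2$

Quick high school reminder if necessary…

$(y_i - a x_i)^2 = y_i^2 - 2 a x_i y_i + a^2 x_i^2$

$\frac{\partial E}{\partial a} = \sum_{i=1}^{n} \left(-2 x_i y_i + 2 a x_i^2\right) = 0$

$\sum_{i=1}^{n} \left(x_i y_i - a x_i^2\right) = 0$

$a \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i$

$a = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2}$

$a = \frac{\overline{x_i y_i}}{\overline{x_i^2}}$

Any better?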
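
The no-intercept slope can be checked against a regression through the origin. A minimal sketch on made-up data (data and names are assumptions for illustration):

```r
set.seed(2)
# Hypothetical data generated without an intercept
x <- runif(30, 1, 10)
y <- 1.5 * x + rnorm(30)

# Slope from the derivation: a = sum(x * y) / sum(x^2)
a <- sum(x * y) / sum(x^2)

# Same model in R: "0 +" removes the intercept
fit0 <- lm(y ~ 0 + x)
all.equal(unname(coef(fit0)), a)  # TRUE
```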

Page 15: Applied Statistics III

Multiple regressions

$y = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_n X_n + \varepsilon$

More than one explanatory variable

Choosing factors can be difficult

Much tougher without software
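
With software, a multiple regression takes one line. A short sketch using the `mtcars` dataset shipped with base R; the choice of dataset and of explanatory variables is an assumption for illustration, not an example from the course:

```r
# Fuel consumption (mpg) explained by weight (wt) and horsepower (hp)
fit <- lm(mpg ~ wt + hp, data = mtcars)

coef(fit)                # b0 (intercept), b1 (wt), b2 (hp)
summary(fit)$r.squared   # share of the total dispersion explained
```

`summary(fit)` also reports a significance test per coefficient, which helps when choosing factors.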

Page 16: Applied Statistics III

Explanatory variables should not depend on each other

Financial methods such as APT (Arbitrage Pricing Theory) try to use pure and independent factors

Used a lot in economics

R-squared is very often poor

Page 17: Applied Statistics III

Ratio Investment / GDP, World Bank, developing countries

$R = 19.5 - 5.8\,Corruption + 6.3\,CorruptionPrediction + 2\,School - 1.1\,GDP - 2\,Distortion$

Let’s discuss…

• Corruption: current corruption
• CorruptionPrediction: future corruption
• School: level of education
• GDP: GDP
• Distortion: how badly policies are run

Page 18: Applied Statistics III

Opposite effect of corruption variables

Any logic with this?

The current level of corruption decreases investment

The future level of corruption increases investment

Investors learn how to live with corruption…

Page 19: Applied Statistics III

R-Squared is 0.24, very poor…

How to find the right model?

• General to specific: this starts off with a comprehensive model, including all the likely explanatory variables, then simplifies it.

• Specific to general: this begins with a simple model that is easy to understand, then explanatory variables are added to improve the model’s explanatory power.

Page 20: Applied Statistics III

Golden rules

Be logical

Have the best R-squared

Do not overcomplicate

Page 21: Applied Statistics III

Introduction to econometrics

3 steps

Identify

Fit

Forecast

What is a model?

$Obs = Model + \varepsilon$, with $\varepsilon$ being white noise

Page 22: Applied Statistics III

3 components

Trend

Seasonality

Residual
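
The three components can be separated with base R. A short sketch on the `AirPassengers` series shipped with R; the dataset is an assumption for illustration, and `decompose()` is only one simple (moving-average) way of splitting a series:

```r
# Monthly airline passengers, a classic series with trend and seasonality
dec <- decompose(AirPassengers)   # additive: series = trend + seasonal + random

names(dec)   # includes "trend", "seasonal" and "random" (the residual)
plot(dec)    # one panel per component
```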

Page 23: Applied Statistics III

Stationary series are easier to forecast… Transform the series to make it stationary!

A series is stationary if its mean and variance are stable over time

Which one is more likely to be stationary?

Page 24: Applied Statistics III

Properties of stationary series

$(Y_1, Y_2, Y_3, \dots, Y_n)$

$(Y_2, Y_3, Y_4, \dots, Y_{n+1})$

The series and the shifted series have the same distribution

Distribution not time dependent

Rare occurrence

Stationarity accepted if

$E(Y_t) = \mu$, constant over time

$\mathrm{Cov}(Y_t, Y_{t-n})$ depends only on $n$

Page 25: Applied Statistics III

About the residuals…

White noise!

Normality test

Have a first idea with

Skewness

Kurtosis

Proper tests: KS (normality), Durbin-Watson and Portmanteau (autocorrelation),…

Page 26: Applied Statistics III

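# Residuals of the regression model TReg fitted earlier in the course
eps <- resid(TReg)
# KS test against a normal with the residuals' own mean and SD
ks.test(eps, "pnorm", mean = mean(eps), sd = sd(eps))
# Show the four diagnostic plots of the fit on a 2x2 grid
layout(matrix(1:4, 2, 2))
plot(TReg)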

Page 27: Applied Statistics III

lag.plot(DATA$Val, 9, do.lines=FALSE)

Differencing the series looks worthwhile
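
Taking first differences is one line in R. A sketch on a simulated random walk, which stands in for `DATA$Val` (the course data is not available here, so the series is an assumption):

```r
set.seed(42)
# Hypothetical random walk standing in for DATA$Val
val <- cumsum(rnorm(200))

# First difference: val[t] - val[t-1]
d_val <- diff(val)

# Same lag plot as above, on the differenced series
lag.plot(d_val, 9, do.lines = FALSE)
```

If the differenced series shows no structure in the lag plot, it is a better candidate for a stationary model than the original one.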

Page 28: Applied Statistics III

Check ACF/PACF for autocorrelation
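
Both functions are available in base R as `acf()` and `pacf()`. A sketch on a simulated autoregressive series (the series and its coefficient 0.7 are assumptions for illustration):

```r
set.seed(7)
# Hypothetical AR(1) series with coefficient 0.7
x <- arima.sim(model = list(ar = 0.7), n = 500)

# plot = FALSE returns the values instead of drawing the correlogram
acf_x  <- acf(x, lag.max = 10, plot = FALSE)
pacf_x <- pacf(x, lag.max = 10, plot = FALSE)

round(acf_x$acf[1:4], 2)    # starts at lag 0, which is always 1
round(pacf_x$acf[1:3], 2)   # starts at lag 1
```

For a series of this kind the ACF typically decays slowly while the PACF drops after the first lag, which hints at the order of the model.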

Page 29: Applied Statistics III

Autoregressive model AR(n)

$X_t = c + \varphi_1 X_{t-1} + \varphi_2 X_{t-2} + \dots + \varphi_n X_{t-n} + \varepsilon_t$

$\varphi_1, \dots, \varphi_n$: parameters of the model

$\varepsilon_t$: white noise
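
Fitting and forecasting an AR model can be sketched with `arima()` from base R. The simulated AR(2) series and its coefficients below are assumptions for illustration:

```r
set.seed(123)
# Hypothetical AR(2) series: X_t = 0.6 X_{t-1} - 0.3 X_{t-2} + eps_t
x <- arima.sim(model = list(ar = c(0.6, -0.3)), n = 2000)

# Fit: order = c(p, d, q), here p = 2 lags, no differencing, no MA part
fit <- arima(x, order = c(2, 0, 0))
coef(fit)   # ar1 and ar2 should land near 0.6 and -0.3

# Forecast: next 5 values
predict(fit, n.ahead = 5)$pred
```

Note that the `intercept` reported by `arima()` is the mean of the series rather than the constant c of the equation.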

Page 30: Applied Statistics III

Estimations

Small sample: Binomial distribution

$f(x) = \frac{n!}{x!\,(n-x)!}\,p^x (1-p)^{n-x}$

Large sample: Normal distribution

$N\left(np, \sqrt{np(1-p)}\right)$

$n$ is the size of the sample, $x$ the number of individuals with the particular characteristic

$E(X) = np$

$V(X) = np(1-p)$

Page 31: Applied Statistics III

Estimate a proportion

$Y = \frac{X}{n}$

Binomial distribution

$E(Y) = p$

$V(Y) = \frac{p(1-p)}{n}$

Normal approximation

$Y \sim N\left(p, \sqrt{\frac{p(1-p)}{n}}\right)$

Standardisation possible

$Y^* = \frac{Y - p}{\sqrt{\frac{p(1-p)}{n}}}$

$Y^* \sim N(0, 1)$

Normal approximation works only if

$np \ge 5$ and $n(1-p) \ge 5$

Page 32: Applied Statistics III

Let’s look for p with a 95% confidence interval

$P(p_1 < p < p_2) = 0.95$

Easy to solve!

$P(\mu - 1.96\,\sigma \le X \le \mu + 1.96\,\sigma) = 0.95$

Page 33: Applied Statistics III

52 heads out of 100 tosses…

$Y \sim N(?, ?)$

$Y \sim N(0.52, 0.04996)$

95% confidence interval

$p_1 = 0.42$

$p_2 = 0.62$
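
The interval can be recomputed in a few lines. A minimal sketch using the normal approximation, with illustrative variable names:

```r
n <- 100   # tosses
x <- 52    # heads

p_hat <- x / n
se <- sqrt(p_hat * (1 - p_hat) / n)   # 0.04996

# 95% interval: p_hat +/- 1.96 standard errors
ci <- p_hat + c(-1, 1) * qnorm(0.975) * se
round(ci, 2)  # 0.42 0.62
```

Since 0.5 lies inside the interval, 52 heads out of 100 gives no reason to think the coin is biased.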

Page 34: Applied Statistics III

Mean estimation

Problem

The SD of the actual population is unknown

The sample mean, once standardised with the estimated SD, follows a Student’s t distribution

Similar to the normal distribution

Page 35: Applied Statistics III

Student’s properties

• It is symmetric about its mean
• It has a mean of zero
• It has a standard deviation and variance greater than 1 (for more than 2 degrees of freedom)
• There are actually many t distributions, one for each degree of freedom
• As the sample size increases, the t distribution approaches the normal distribution
• It is bell shaped
• The t-scores can be negative or positive, but the probabilities are always positive

Normal-like distribution for small samples, used to build a confidence interval

Page 36: Applied Statistics III

Student’s Statistic

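$S = \sqrt{\frac{n}{n-1}}\,\sigma$

$P\left(\bar{x} - \frac{S}{\sqrt{n}}\,t_{\alpha/2} < \mu < \bar{x} + \frac{S}{\sqrt{n}}\,t_{\alpha/2}\right) = 0.95$

Degrees of freedom: $n - 1$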

Page 37: Applied Statistics III

IPO premiums

IPO1: 12%
IPO2: 15%
IPO3: 13%
IPO4: 18%
IPO5: 20%
IPO6: 5%

Mean: $\bar{x} = 13.83\%$

SD: $\sigma = 4.81\%$

S: $S = 5.27\%$

DF: $DF = 5$

t: $t = 2.571$

$\mu_1 = 19.36\%$

$\mu_2 = 8.30\%$
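
These figures can be reproduced from the six premiums. A minimal sketch with illustrative variable names; `sd()` in R already divides by n - 1, so it returns S directly:

```r
premiums <- c(12, 15, 13, 18, 20, 5)   # IPO premiums in %

n      <- length(premiums)
x_bar  <- mean(premiums)               # 13.83
s      <- sd(premiums)                 # 5.27
t_crit <- qt(0.975, df = n - 1)        # 2.571

# 95% confidence interval for the mean premium
ci <- x_bar + c(-1, 1) * t_crit * s / sqrt(n)
round(ci, 2)  # 8.30 19.36
```

`t.test(premiums)$conf.int` returns the same interval in one call.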

Page 38: Applied Statistics III

Is a frequency difference significant?

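$Y_1 \sim N\left(p_1, \sqrt{\frac{p_1(1-p_1)}{n_1}}\right)$

$Y_2 \sim N\left(p_2, \sqrt{\frac{p_2(1-p_2)}{n_2}}\right)$

$Z = Y_1 - Y_2$

$E(Z) = E(Y_1) - E(Y_2)$

$V(Z) = V(Y_1) + V(Y_2)$ (assumption of independence)

$Z \sim N\left(p_1 - p_2, \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}\right)$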

Page 39: Applied Statistics III

Observations
• 100 friendly takeovers, 80 successes
• 60 hostile takeovers, 50 successes

Is the difference significant? 95% confidence

Friendly: $F_1 = 80\%$

Hostile: $F_2 = 83\%$

Global frequency

$\hat{p} = \frac{n_1 F_1 + n_2 F_2}{n_1 + n_2} = \frac{80 + 50}{100 + 60} = 81.25\%$

Page 40: Applied Statistics III

$t^* = \frac{F_1 - F_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$, where $\hat{p}$ is the global frequency

$t^* = -0.52298$

If $-1.96 < t^* < 1.96$, the frequencies are considered the same at the 95% confidence level

The frequencies are equal

Their difference is not significant

The actual difference is due to sampling fluctuation
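
The statistic can be recomputed from the takeover counts. A minimal sketch with illustrative variable names:

```r
n1 <- 100; x1 <- 80   # friendly takeovers, successes
n2 <- 60;  x2 <- 50   # hostile takeovers, successes

f1 <- x1 / n1
f2 <- x2 / n2
p_pool <- (x1 + x2) / (n1 + n2)   # global frequency, 81.25%

t_star <- (f1 - f2) / sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
round(t_star, 5)               # -0.52298

abs(t_star) < qnorm(0.975)     # TRUE: not significant at 95%
```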

Page 41: Applied Statistics III

Is a SD difference significant?

Fisher Snedecor distribution

$S_x^2$, $S_y^2$: sample variances

$\sigma_p^2$, $\sigma_q^2$: total variances (x is sampled from population p, y from population q)

$\frac{S_x^2 / \sigma_p^2}{S_y^2 / \sigma_q^2} \sim F(n_p - 1, n_q - 1)$

Page 42: Applied Statistics III

You want to test $\sigma_p^2 = \sigma_q^2$

$\frac{S_x^2}{S_y^2} \sim F(n_p - 1, n_q - 1)$

Page 43: Applied Statistics III

$\frac{S_x^2}{S_y^2} \sim F(5, 4)$

Page 44: Applied Statistics III

95% confidence level, F-table

The SDs are considered equal (at 95% confidence) if $\frac{S_x^2}{S_y^2} < 6.26$
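
The table value can be read with `qf()` instead of a printed F-table. A short sketch; the two sample variances are made-up numbers for illustration:

```r
# 95% quantile of the F distribution with 5 and 4 degrees of freedom
f_crit <- qf(0.95, df1 = 5, df2 = 4)
round(f_crit, 2)   # 6.26

# Hypothetical sample variances (sample sizes 6 and 5)
s2_x <- 0.0009
s2_y <- 0.0004
(s2_x / s2_y) < f_crit   # TRUE: the ratio 2.25 is below the threshold
```

With real samples, `var.test(x, y)` runs the same comparison and reports a p-value.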

Page 45: Applied Statistics III

Games: Beat the Statistics

Is Martingale safe?

Bet on 2:1, double when you lose…

Risk of ruin?

Page 46: Applied Statistics III

Bet on 2:1

Is this really 2:1?

$\frac{18}{37} = 0.4865$

Obvious how the casino makes money!

The probability that the casino wins is always greater than the probability that the player wins!

Page 47: Applied Statistics III

You’ll be right with a martingale… Eventually! But when?

The longest series on record (2011) is 26 reds in Las Vegas, Nevada

You were on black, hoping for the reversal, and you began with $2

At the 27th round you need

$2^{27} = \$134{,}217{,}728$

And don’t forget you have already lost

$2^1 + 2^2 + \dots + 2^{26} = \$134{,}217{,}726$

Casinos limit stakes

Your pocket may not be deep enough anyway!

And if you win at the 27th roll, you have made…

$2

Quite risky…
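
The arithmetic of the doubling sequence can be checked directly. A minimal sketch, assuming an even-money payout and using illustrative variable names:

```r
first_bet <- 2
n_losses  <- 26

# Stakes for rounds 1 to 27: 2, 4, 8, ...
stakes <- first_bet * 2^(0:n_losses)

next_stake  <- stakes[n_losses + 1]        # 2^27, the 27th bet
lost_so_far <- sum(stakes[1:n_losses])     # 2^1 + ... + 2^26
net_gain    <- next_stake - lost_so_far    # what a win on round 27 leaves you

format(c(next_stake, lost_so_far, net_gain), big.mark = ",")
```

Whatever the length of the losing streak, the win that ends it only recovers the first bet.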

Page 48: Applied Statistics III

“No one can possibly win at roulette unless he steals money from the table while the croupier isn’t looking.” – Albert Einstein

Page 49: Applied Statistics III

Binomial approach

$P(x) = C_n^x\,p^x (1 - p)^{n - x}$
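
In R the binomial probability is `dbinom()`. As an example, the probability of 8 losses in a row, assuming a single-zero wheel where an even-money bet wins with probability 18/37 (the choice of 8 rounds is only an illustration):

```r
p <- 18 / 37   # chance of winning one even-money bet
n <- 8         # number of bets

# P(x = 0 wins out of 8 bets), i.e. 8 losses in a row
p_streak <- dbinom(0, size = n, prob = p)

all.equal(p_streak, (19 / 37)^8)   # TRUE
round(p_streak, 4)                 # 0.0048
```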

Page 50: Applied Statistics III

$255 bankroll, $1 flat bet

$255 bankroll, $1 starting bet, martingale: double when you lose

Flat bet: ruin after 255 consecutive losses

Martingale: ruin after 8 consecutive losses

Comparison over 1,000,000 simulations, 100 rounds maximum
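
A comparison of this kind can be sketched as a small Monte Carlo simulation. This is not the course's own code: the function `play()`, the stopping rule (the player is bust when the next stake cannot be covered) and the reduced number of runs are assumptions made for illustration.

```r
set.seed(2012)
p_win <- 18 / 37   # even-money bet on a single-zero wheel

# Plays up to `rounds` spins; returns the final bankroll and whether the
# player went bust (could not cover the next stake)
play <- function(martingale, bankroll = 255, rounds = 100) {
  stake <- 1
  bust <- FALSE
  for (i in seq_len(rounds)) {
    if (stake > bankroll) { bust <- TRUE; break }
    if (runif(1) < p_win) {
      bankroll <- bankroll + stake
      stake <- 1                           # back to the starting bet
    } else {
      bankroll <- bankroll - stake
      if (martingale) stake <- 2 * stake   # double after a loss
    }
  }
  c(final = bankroll, bust = as.numeric(bust))
}

n_sim <- 10000   # assumption: fewer runs than 1,000,000, to keep it fast
flat <- replicate(n_sim, play(martingale = FALSE))
mart <- replicate(n_sim, play(martingale = TRUE))

# Average final bankroll and share of busted players for each strategy
rbind(flat = rowMeans(flat), martingale = rowMeans(mart))
```

With a $1 flat bet the player cannot lose more than $100 in 100 rounds, so only the martingale player can go bust within the session.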

Page 51: Applied Statistics III

Conclusion

Multiple Regression

Econometrics

Estimations

Statistics & Games