Regression of NFL Scores on Vegas Line – 2007 Regular Season.


Problem Description

• Oddsmakers place a point spread (differential) and an over/under (total) on all National Football League games

• Combining these two quantities, we can obtain a prediction for the final score of the game

• Let PA and PH be the oddsmakers' predicted scores for the away and home teams, respectively

• Spread with respect to the home team: PS = PA - PH (a negative spread means the home team is favored, i.e., it is "giving" points)

• Over/Under: OU = PA + PH

• Solving these two equations: PA = (OU + PS)/2 and PH = (OU - PS)/2
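The two-equation solve above is simple enough to express directly. A minimal Python sketch (the function name `predicted_scores` is illustrative, not from the slides), checked against the NO at IND game from week 1:

```python
def predicted_scores(spread: float, over_under: float) -> tuple[float, float]:
    """Return (PA, PH) given the home-team spread PS = PA - PH and the total OU = PA + PH."""
    pa = (over_under + spread) / 2  # implied away-team score
    ph = (over_under - spread) / 2  # implied home-team score
    return pa, ph


# NO at IND, week 1 of 2007: IND favored by 6.5, over/under 49.5
print(predicted_scores(-6.5, 49.5))  # (21.5, 28.0)
```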


Data/Model Description

• Point spreads, over/unders, and actual scores obtained for all n = 256 NFL games from the 2007 season

• Predicted scores obtained for each team in each game

• Regression fit for each team's actual score (n = 512 team-games) as a function of predicted score and a home-team indicator

• Residuals checked to see whether errors are independent within games for the two teams

• Tests conducted to determine:
  - whether the home-team effect is sufficiently accounted for by the oddsmakers
  - whether the oddsmakers are "unbiased" in their point predictions
  - whether the relation between actual and predicted scores is linear


Week 1 Data

Week  Away  Home  Open Spread (HT)  Open Over/Under  Expected Home  Expected Visitor  Observed Home  Observed Visitor
1     NO    IND   -6.5              49.5             28             21.5              41             10
1     ATL   MIN   -2.5              36               19.25          16.75             24             3
1     CAR   STL   -1                42.5             21.75          20.75             13             27
1     DEN   BUF   3.5               37               16.75          20.25             14             15
1     KC    HOU   -3                38               20.5           17.5              20             3
1     MIA   WAS   -2.5              35               18.75          16.25             16             13
1     NE    NYJ   7                 41               17             24                14             38
1     PHI   GB    2.5               43.5             20.5           23                16             13
1     PIT   CLE   4.5               37               16.25          20.75             7              34
1     TEN   JAX   -7                37.5             22.25          15.25             10             13
1     CHI   SD    -5.5              42.5             24             18.5              14             3
1     DET   OAK   -1.5              40               20.75          19.25             21             36
1     TB    SEA   -6                41               23.5           17.5              20             6
1     NYG   DAL   -5.5              44               24.75          19.25             45             35
1     BAL   CIN   -2.5              40.5             21.5           19                27             20
1     ARI   SF    -3.5              45               24.25          20.75             20             17

Note for the first game:
• Spread = PA - PH = -6.5 (IND was favored to beat NO by 6.5 points)
• Over/Under = PA + PH = 49.5 (predicted total score was 49.5 points)
• PA = (49.5 + (-6.5))/2 = 21.5 and PH = (49.5 - (-6.5))/2 = 28
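To fit the regression, each game row has to be split into two team-game rows, one for the away team (H = 0) and one for the home team (H = 1), which is how 256 games become 512 observations. A sketch of that reshaping, assuming the column layout of the week 1 table; the `Game`/`TeamGame` types and the `to_team_games` helper are illustrative names:

```python
from typing import NamedTuple


class Game(NamedTuple):
    away: str
    home: str
    spread: float      # PS = PA - PH, quoted with respect to the home team
    over_under: float  # OU = PA + PH
    home_score: int
    away_score: int


class TeamGame(NamedTuple):
    team: str
    y: int    # observed score Y
    p: float  # oddsmakers' predicted score P
    h: int    # home indicator H: 1 = home, 0 = away


def to_team_games(games: list[Game]) -> list[TeamGame]:
    """Turn each game into an away row and a home row."""
    rows = []
    for g in games:
        pa = (g.over_under + g.spread) / 2
        ph = (g.over_under - g.spread) / 2
        rows.append(TeamGame(g.away, g.away_score, pa, 0))
        rows.append(TeamGame(g.home, g.home_score, ph, 1))
    return rows


# Three of the week 1 games
week1 = [
    Game("NO", "IND", -6.5, 49.5, 41, 10),
    Game("ATL", "MIN", -2.5, 36, 24, 3),
    Game("NE", "NYJ", 7, 41, 14, 38),
]
for row in to_team_games(week1):
    print(row)
```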


Regression Model

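$$Y_i = \beta_0 + \beta_P P_i + \beta_H H_i + \beta_{PH} P_i H_i + \varepsilon_i, \qquad i = 1,\ldots,512$$

where:

$Y_i$ = observed score for the $i$-th team-game
$P_i$ = oddsmakers' predicted score for the $i$-th team-game
$H_i$ = 1 if the $i$-th team-game is a home game, 0 if away
$\varepsilon_i \sim NID(0, \sigma^2)$

Away teams ($H_i = 0$): $E\{Y_i\} = \beta_0 + \beta_P P_i$
Home teams ($H_i = 1$): $E\{Y_i\} = (\beta_0 + \beta_H) + (\beta_P + \beta_{PH}) P_i$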
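Each team-game contributes the row (1, P, H, P·H) to the design matrix X. A sketch of accumulating X'X and X'Y from team-game data (`design_row` and `normal_equations` are hypothetical names). Because H² = H, the H and P·H rows of X'X repeat the home-team sums in both their first and last pairs of columns, which is the pattern visible in the reported X'X:

```python
def design_row(p: float, h: int) -> list[float]:
    """Row of X for one team-game: intercept, P, H, and the P*H interaction."""
    return [1.0, p, float(h), p * h]


def normal_equations(ps, hs, ys):
    """Accumulate X'X (4x4) and X'Y (length 4) without storing X."""
    k = 4
    xtx = [[0.0] * k for _ in range(k)]
    xty = [0.0] * k
    for p, h, y in zip(ps, hs, ys):
        x = design_row(p, h)
        for a in range(k):
            xty[a] += x[a] * y
            for b in range(k):
                xtx[a][b] += x[a] * x[b]
    return xtx, xty


# Two week 1 games (NO at IND, ATL at MIN) as four team-games
ps = [21.5, 28.0, 16.75, 19.25]
hs = [0, 1, 0, 1]
ys = [10, 41, 3, 24]
xtx, xty = normal_equations(ps, hs, ys)
for row in xtx:
    print(row)
```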


Regression Results

X'X                                             X'Y
     512      10696        256      5664.5       11104
   10696   232904.8     5664.5    129445.4    241207.8
     256     5664.5        256      5664.5        5919
  5664.5   129445.4     5664.5    129445.4    134504.5

(X'X)^-1                                        Beta-hat
 0.08846  -0.00430  -0.08846   0.00430          -0.377
-0.00430   0.00022   0.00430  -0.00022           1.050
-0.08846   0.00430   0.21157  -0.00969           4.453
 0.00430  -0.00022  -0.00969   0.00046          -0.189

sse = 49615.22    mse = 97.66775    s = 9.8827

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.3942
R Square             0.1554
Adjusted R Square    0.1504
Standard Error       9.8827
Observations         512

ANOVA
             df    SS         MS        F       Significance F
Regression   3     9128.78    3042.93   31.16   0.0000
Residual     508   49615.22   97.67
Total        511   58744

                 Coefficients  Standard Error  t Stat   P-value  Lower 95%  Upper 95%
Intercept        -0.3767       2.9393          -0.1281  0.8981   -6.1513    5.3980
X1=Vegas         1.0497        0.1462          7.1792   0.0000   0.7624     1.3369
X2=Home          4.4533        4.5457          0.9797   0.3277   -4.4773    13.3840
X3=Vegas*Home    -0.1890       0.2125          -0.8893  0.3742   -0.6065    0.2285

$$\text{Away: } \hat Y_i = -0.38 + 1.05\,P_i$$

$$\text{Home: } \hat Y_i = (-0.38 + 4.45) + (1.05 - 0.19)\,P_i = 4.07 + 0.86\,P_i$$

$$s^2 = 97.67, \qquad s = 9.88$$
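The coefficient estimates and standard errors above follow from β̂ = (X'X)⁻¹X'Y and s{β̂_j} = sqrt(MSE · [(X'X)⁻¹]_jj). A dependency-free sketch that re-derives them from the reported X'X, X'Y and MSE; small differences come from the rounding of the printed matrices, and `solve`/`inverse` are hypothetical helpers rather than library functions:

```python
import math

XTX = [
    [512.0, 10696.0, 256.0, 5664.5],
    [10696.0, 232904.8, 5664.5, 129445.4],
    [256.0, 5664.5, 256.0, 5664.5],
    [5664.5, 129445.4, 5664.5, 129445.4],
]
XTY = [11104.0, 241207.8, 5919.0, 134504.5]
MSE = 97.66775  # s^2 = SSE / 508


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def inverse(a):
    """Invert a by solving for each column of the identity."""
    n = len(a)
    cols = [solve(a, [1.0 if i == j else 0.0 for i in range(n)]) for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]


beta = solve(XTX, XTY)
inv = inverse(XTX)
se = [math.sqrt(MSE * inv[i][i]) for i in range(4)]
for name, b, s in zip(["Intercept", "Vegas", "Home", "Vegas*Home"], beta, se):
    print(f"{name:<11} b = {b:8.4f}  se = {s:7.4f}  t = {b / s:8.4f}")
```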


Test of No Home Effect and “Unbiasedness”

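No Home Effect and "Unbiasedness":

$$H_0: \beta_0 = 0,\ \beta_P = 1,\ \beta_H = 0,\ \beta_{PH} = 0 \quad\Leftrightarrow\quad E\{Y_i\} = P_i \text{ for both home and away teams}$$

Matrix form: $H_0: K'\beta = m$ vs. $H_A: K'\beta \neq m$, where:

$$K' = \begin{bmatrix} 1&0&0&0 \\ 0&1&0&0 \\ 0&0&1&0 \\ 0&0&0&1 \end{bmatrix}, \qquad \beta = \begin{bmatrix} \beta_0 \\ \beta_P \\ \beta_H \\ \beta_{PH} \end{bmatrix}, \qquad m = \begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix}$$

Test Statistic:

$$Q = (K'\hat\beta - m)'\left[K'(X'X)^{-1}K\right]^{-1}(K'\hat\beta - m), \qquad F_{obs} = \frac{Q/\operatorname{rank}(K)}{s^2}$$

Under $H_0$: $F_{obs} \sim F_{k,\,n-p} = F_{4,\,508}$, where $k = \operatorname{rank}(K) = 4$.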


Results of Test of No Home Effects and Unbiasedness

K'          K'β̂ (= β̂)   m    K'β̂ - m
1 0 0 0     -0.37666      0    -0.37666
0 1 0 0     1.049672      1    0.049672
0 0 1 0     4.453325      0    4.453325
0 0 0 1     -0.18898      0    -0.18898

K'(X'X)^-1 K
 0.088456472   -0.00430187    -0.088456472    0.004302
-0.00430187     0.000218877    0.00430187    -0.00022
-0.088456472    0.00430187     0.211567097   -0.00969
 0.00430187    -0.000218877   -0.009689162    0.000462

[K'(X'X)^-1 K]^-1
     512      10696        256      5664.5
   10696   232904.8     5664.5    129445.4
     256     5664.5        256      5664.5
  5664.5   129445.4     5664.5    129445.4

Q(K) = 436.0329611    df(K) = 4    df(E) = 508

F_obs = 1.1161    F(0.05) = 2.3895    P-value = 0.3481

No evidence to conclude that E{Y} ≠ P.
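Since K' is the 4×4 identity here, [K'(X'X)⁻¹K]⁻¹ reduces to X'X itself, so Q is just a quadratic form in K'β̂ - m. A short sketch reproducing Q and F_obs from the numbers in the table above; the critical value is copied from the slide rather than computed:

```python
# X'X as reported; with K' = I, [K'(X'X)^{-1}K]^{-1} is X'X itself
XTX = [
    [512.0, 10696.0, 256.0, 5664.5],
    [10696.0, 232904.8, 5664.5, 129445.4],
    [256.0, 5664.5, 256.0, 5664.5],
    [5664.5, 129445.4, 5664.5, 129445.4],
]
BETA_HAT = [-0.37666, 1.049672, 4.453325, -0.18898]  # (b0, bP, bH, bPH)
M = [0.0, 1.0, 0.0, 0.0]  # H0: b0 = 0, bP = 1, bH = 0, bPH = 0
MSE = 97.66775            # s^2 with 508 df
RANK_K = 4
F_CRIT = 2.3895           # F(0.95; 4, 508), copied from the slide

d = [b - m for b, m in zip(BETA_HAT, M)]  # K'beta_hat - m
q = sum(d[i] * XTX[i][j] * d[j] for i in range(4) for j in range(4))
f_obs = (q / RANK_K) / MSE
print(f"Q = {q:.2f}, F_obs = {f_obs:.4f}, reject H0: {f_obs > F_CRIT}")
```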


Fit of Simple Regression of Actual on Predicted Score

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.3919
R Square             0.1536
Adjusted R Square    0.1519
Standard Error       9.8738
Observations         512

ANOVA
             df    SS         MS        F       Significance F
Regression   1     9023.01    9023.01   92.55   0.0000
Residual     510   49720.99   97.49
Total        511   58744

             Coefficients  Standard Error  t Stat   P-value  Lower 95%  Upper 95%
Intercept    1.2836        2.1653          0.5928   0.5536   -2.9705    5.5377
X1=Vegas     0.9767        0.1015          9.6204   0.0000   0.7772     1.1762

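Note: we clearly do not reject H0 that the intercept is 0 and the slope is 1 (the 95% confidence interval for the intercept, (-2.9705, 5.5377), contains 0, and the one for the slope, (0.7772, 1.1762), contains 1). We will nevertheless use this fitted model to obtain confidence intervals for the mean score and prediction intervals for individual game scores at various levels of predicted score.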
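The two claims in the note can also be checked with t statistics built from the coefficient table: t = (b0 - 0)/s{b0} for the intercept and t = (b1 - 1)/s{b1} for the slope. These are two individual tests, not the joint F-test used earlier; a sketch:

```python
# Estimates and standard errors from the simple-regression output
B0, SE_B0 = 1.2836, 2.1653
B1, SE_B1 = 0.9767, 0.1015
T_CRIT = 1.9646  # t(0.975; 510), as used elsewhere in these slides

t_intercept = (B0 - 0.0) / SE_B0  # H0: beta0 = 0
t_slope = (B1 - 1.0) / SE_B1      # H0: beta1 = 1

for name, t in [("intercept = 0", t_intercept), ("slope = 1", t_slope)]:
    verdict = "reject" if abs(t) > T_CRIT else "do not reject"
    print(f"H0: {name}: t = {t:.4f} -> {verdict}")
```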


Confidence Intervals and Prediction Intervals

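Point prediction of score when $P = P_0$:

$$\hat Y_0 = x_0'\hat\beta = b_0 + b_1 P_0, \qquad x_0 = \begin{bmatrix} 1 \\ P_0 \end{bmatrix}$$

$(1-\alpha)100\%$ confidence interval for the mean of all games when $P = P_0$:

$$\hat Y_0 \pm t_{\alpha/2,\,n-p}\sqrt{s^2\,x_0'(X'X)^{-1}x_0} \;=\; \hat Y_0 \pm t_{\alpha/2,\,n-2}\; s\sqrt{\frac{1}{n} + \frac{(P_0 - \bar P)^2}{\sum_{i=1}^{n}(P_i - \bar P)^2}}$$

$(1-\alpha)100\%$ prediction interval for a single game when $P = P_0$:

$$\hat Y_0 \pm t_{\alpha/2,\,n-p}\sqrt{s^2\left[1 + x_0'(X'X)^{-1}x_0\right]} \;=\; \hat Y_0 \pm t_{\alpha/2,\,n-2}\; s\sqrt{1 + \frac{1}{n} + \frac{(P_0 - \bar P)^2}{\sum_{i=1}^{n}(P_i - \bar P)^2}}$$

For this analysis: $n = 512$, $\bar P = 20.89$, $\sum_{i=1}^{n}(P_i - \bar P)^2 = 9458.625$, $\hat Y_0 = 1.2836 + 0.9767\,P_0$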
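Plugging the summary quantities into these formulas is mechanical. A sketch, assuming s = 9.8738 from the simple-regression output and t(0.975; 510) = 1.9646 as used elsewhere in the slides; for a single predictor, x0'(X'X)⁻¹x0 equals 1/n + (P0 - P̄)²/Σ(P_i - P̄)². The function name `intervals` is illustrative:

```python
import math

N = 512
P_BAR = 20.89
SXX = 9458.625   # sum of (P_i - P_bar)^2
B0, B1 = 1.2836, 0.9767
S = 9.8738       # standard error of the simple regression
T_CRIT = 1.9646  # t(0.975; 510)


def intervals(p0: float):
    """Point prediction, 95% CI for the mean score, and 95% PI for one game at P = p0."""
    y0 = B0 + B1 * p0
    h = 1 / N + (p0 - P_BAR) ** 2 / SXX  # x0'(X'X)^{-1} x0 for simple regression
    ci_half = T_CRIT * S * math.sqrt(h)
    pi_half = T_CRIT * S * math.sqrt(1 + h)
    return y0, (y0 - ci_half, y0 + ci_half), (y0 - pi_half, y0 + pi_half)


for p0 in (14.0, 20.89, 28.0):
    y0, ci, pred = intervals(p0)
    print(f"P0={p0:5.2f}  Yhat={y0:6.2f}  CI=({ci[0]:.2f}, {ci[1]:.2f})  PI=({pred[0]:.2f}, {pred[1]:.2f})")
```

The prediction interval is roughly ±19 points everywhere, while the confidence interval for the mean is under ±1 point near P̄, which is the practical gap between predicting an average and predicting one game.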


Residual Analysis

Are the residuals consistent with the model assumptions?

• Normally distributed
  - Histogram, normal probability plot, Shapiro-Wilk test
• Linear relation between actual and predicted scores
  - Plot of residuals versus fitted values, lack-of-fit F-test
• Constant error variance
  - Plot of residuals versus fitted values, regression of |residual| on fitted values
• Independence (e.g., within games and within teams over time)
  - Correlation between home/away residuals within games
  - Non-independent errors within teams (random team effects)
  - Autocorrelation among errors over time within teams


Normal Distribution of Residuals

Correlation between residuals and their corresponding normal scores = .9952
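The correlation between the ordered residuals and their normal scores can be computed with the standard library alone. The slides do not say which plotting positions were used for the normal scores; this sketch assumes Blom's (k - 0.375)/(n + 0.25), and the function names are illustrative:

```python
from statistics import NormalDist


def normal_scores(n: int) -> list[float]:
    """Expected-normal-order approximations using Blom plotting positions (an assumption)."""
    z = NormalDist()
    return [z.inv_cdf((k - 0.375) / (n + 0.25)) for k in range(1, n + 1)]


def normal_plot_correlation(residuals) -> float:
    """Pearson correlation between sorted residuals and their normal scores."""
    e = sorted(residuals)
    z = normal_scores(len(e))
    me, mz = sum(e) / len(e), sum(z) / len(z)
    sez = sum((a - me) * (b - mz) for a, b in zip(e, z))
    see = sum((a - me) ** 2 for a in e)
    szz = sum((b - mz) ** 2 for b in z)
    return sez / (see * szz) ** 0.5


# Perfectly "normal" data correlate 1 with their own normal scores
print(normal_plot_correlation(normal_scores(50)))
```

Values close to 1, such as the 0.9952 reported above, indicate that the residuals track a straight line on the normal probability plot.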


Linearity of Regression

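F-Test for Lack-of-Fit ($n_j$ observations at $c$ distinct levels of "X" = P):

$$H_0: E\{Y_i\} = \beta_0 + \beta_1 P_i \qquad H_A: E\{Y_i\} = \mu_j \neq \beta_0 + \beta_1 P_i$$

Lack-of-Fit:
$$SS(LF) = \sum_{j=1}^{c} n_j\left(\bar Y_j - \hat Y_j\right)^2, \qquad df_{LF} = c - 2$$

Pure Error:
$$SS(PE) = \sum_{j=1}^{c}\sum_{i=1}^{n_j}\left(Y_{ij} - \bar Y_j\right)^2, \qquad df_{PE} = n - c$$

Test Statistic:
$$F_{LOF} = \frac{SS(LF)/(c-2)}{SS(PE)/(n-c)} \sim F_{c-2,\,n-c} \text{ under } H_0$$

For this example: $n = 512$, $c = 81$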

              df    SS           MS         F        F(.05)   P-value
Lack of Fit   79    9039.0655    114.4186   1.2122   1.3097   0.1200
Pure Error    431   40681.9251   94.3896
Residual      510   49720.9905   97.4921

No evidence to reject the hypothesis of a linear relation between Actual and Predicted scores
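A sketch of the lack-of-fit decomposition for raw (P, Y) data, grouping observations by distinct P values. SS(LF) is obtained as SSE - SS(PE), which equals the Σ n_j(Ȳ_j - Ŷ_j)² form above; the function name is hypothetical, and it needs c > 2 distinct levels and n > c:

```python
from collections import defaultdict


def lack_of_fit(x, y):
    """Lack-of-fit F test for a simple linear regression of y on x.

    Returns (SS(LF), SS(PE), c, F_LOF).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    b0 = my - b1 * mx
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))

    levels = defaultdict(list)  # observed Y values at each distinct x level
    for xi, yi in zip(x, y):
        levels[xi].append(yi)
    c = len(levels)
    ss_pe = sum(sum((yi - sum(g) / len(g)) ** 2 for yi in g) for g in levels.values())
    ss_lf = sse - ss_pe  # equals sum_j n_j (Ybar_j - Yhat_j)^2
    f_lof = (ss_lf / (c - 2)) / (ss_pe / (n - c))
    return ss_lf, ss_pe, c, f_lof


# Group means 2, 4, 6 lie exactly on a line, so SS(LF) is 0 here
print(lack_of_fit([1, 1, 2, 2, 3, 3], [1, 3, 3, 5, 5, 7]))
```

With the slide's sums of squares, F_LOF = (9039.0655/79)/(40681.9251/431) ≈ 1.212, matching the table.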


Equal (Homogeneous) Variance - I

No overwhelming evidence of unequal variance based on graph


Equal (Homogeneous) Variance - II

Brown-Forsythe Test:

$H_0$: Equal variance among errors: $\sigma^2\{\varepsilon_i\} = \sigma^2$
$H_A$: Unequal variance among errors (increasing or decreasing in $X$)

1) Split the dataset into 2 groups based on levels of $X$, with sample sizes $n_1$, $n_2$
2) Compute the median residual in each group: $\tilde e_1$, $\tilde e_2$
3) Compute the absolute deviation from the group median for each residual: $d_{ij} = |e_{ij} - \tilde e_j|$, $i = 1,\ldots,n_j$, $j = 1, 2$
4) Compute the mean and variance for each group of $d_{ij}$: $\bar d_1, s_1^2$ and $\bar d_2, s_2^2$
5) Compute the pooled variance:
$$s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

Test Statistic:
$$t_{BF} = \frac{\bar d_1 - \bar d_2}{s\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \sim t_{n_1 + n_2 - 2} \text{ under } H_0$$

Group   X_Low   X_High   n(i)   med(e)    dbar(i)   s2(i)
1       11      20.5     257    -0.8875   7.5886    31.3643
2       20.75   37.75    255    -1.4802   8.3643    38.3184

s2 = 34.8277    t(BF) = -1.4870    t(.025) = 1.9646    P-value = 0.1376

No evidence to reject the null hypothesis of equal variance among errors
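The five steps translate directly into code. A sketch using only the `statistics` module; it assumes the group variances s²(i) use the n - 1 denominator, and the names `bf_t` and `brown_forsythe` are illustrative. `bf_t` reproduces the table above from the group summaries:

```python
import math
from statistics import mean, median, variance


def bf_t(n1, dbar1, s2_1, n2, dbar2, s2_2):
    """Pooled variance and t statistic from group summaries (steps 4 and 5)."""
    s2 = ((n1 - 1) * s2_1 + (n2 - 1) * s2_2) / (n1 + n2 - 2)
    t = (dbar1 - dbar2) / math.sqrt(s2 * (1 / n1 + 1 / n2))
    return s2, t


def brown_forsythe(x, e, cutoff):
    """Split residuals at x <= cutoff (step 1), then apply steps 2-5."""
    groups = [[r for xi, r in zip(x, e) if xi <= cutoff],
              [r for xi, r in zip(x, e) if xi > cutoff]]
    summaries = []
    for g in groups:
        med = median(g)                                   # step 2
        d = [abs(r - med) for r in g]                     # step 3
        summaries.append((len(g), mean(d), variance(d)))  # step 4
    (n1, d1, v1), (n2, d2, v2) = summaries
    return bf_t(n1, d1, v1, n2, d2, v2)


# Reproduce the slide's result from its group summaries
s2, t = bf_t(257, 7.5886, 31.3643, 255, 8.3643, 38.3184)
print(f"s2 = {s2:.4f}, t_BF = {t:.4f}")
```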


Equal (Homogeneous) Variance - III

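Breusch-Pagan (aka Cook-Weisberg) Test:

$H_0$: Equal variance among errors: $\sigma^2\{\varepsilon_i\} = \sigma^2$
$H_A$: Unequal variance among errors: $\sigma^2\{\varepsilon_i\} = \sigma^2\,h(\gamma_1 X_{i1} + \cdots + \gamma_p X_{ip})$

1) Let $SSE = \sum_{i=1}^{n} e_i^2$
2) Fit the regression of $e_i^2$ on $X_{i1}, \ldots, X_{ip}$ and obtain $SS(Reg^*)$

Test Statistic:
$$X^2_{BP} = \frac{SS(Reg^*)/2}{\left(\sum_{i=1}^{n} e_i^2 / n\right)^2} \sim \chi^2_p \text{ under } H_0$$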

ANOVA
             df    SS
Regression   1     93238.13
Residual     510   8461614
Total        511   8554852

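There is some evidence of unequal variance (P-value = 0.026), but keep in mind that the sample size is large (n = 512), so even a weak association can be statistically significant. See the plot for how weak the association is.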

[Figure: Regression of e^2 on X]

SS(Reg*)         93238.13
SSE              49720.99
SS(Reg*)/2       46619.07
SSE/512          97.11131
X2(BP)           4.943379
X2(.05, df=1)    3.841459
P-value          0.026191
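A sketch of the Breusch-Pagan statistic for a single predictor, including the χ²(1) P-value, which for one degree of freedom can be written with the complementary error function: P(χ²₁ > x) = erfc(√(x/2)). The helper names are illustrative; `bp_statistic` reproduces the slide's value from SS(Reg*), SSE and n:

```python
import math


def bp_statistic(ss_reg_star: float, sse: float, n: int) -> float:
    """X2_BP = (SS(Reg*) / 2) / (SSE / n)^2."""
    return (ss_reg_star / 2) / (sse / n) ** 2


def chi2_1df_pvalue(x2: float) -> float:
    """Upper-tail P-value of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(x2 / 2))


def breusch_pagan(x, e):
    """Regress e_i^2 on x_i, take SS(Reg*), and form the BP statistic."""
    n = len(e)
    e2 = [r * r for r in e]
    sse = sum(e2)
    mx, me2 = sum(x) / n, sse / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (v - me2) for xi, v in zip(x, e2)) / sxx
    ss_reg_star = slope ** 2 * sxx  # regression SS for a one-predictor fit
    x2 = bp_statistic(ss_reg_star, sse, n)
    return x2, chi2_1df_pvalue(x2)


x2 = bp_statistic(93238.13, 49720.99, 512)
print(f"X2_BP = {x2:.4f}, P-value = {chi2_1df_pvalue(x2):.4f}")
```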


Independence Between Home/Away Residuals Within Games

Test for Correlation Between Home and Away Team Residuals:

$H_0: \rho = 0$ (errors within games are independent) $\qquad H_A: \rho \neq 0$

Test Statistic:
$$t_r = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} \sim t_{n-2} \text{ under } H_0$$

r          0.0577
1 - r^2    0.9967
n          512
t_r        1.3052
t(.025)    1.9646
P-value    0.1924

No evidence of association between residuals within games.
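A sketch of the test: `pearson_r` would be applied to the paired home and away residuals of each game, and `corr_t` converts r to the t statistic. Both names are illustrative; the print call plugs in the slide's r and n:

```python
import math


def pearson_r(a, b) -> float:
    """Sample correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def corr_t(r: float, n: int) -> float:
    """t_r = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 df under H0: rho = 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


print(f"t_r = {corr_t(0.0577, 512):.4f}")
```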


Testing For Random Team Effects - I

No overwhelming evidence of team random effects


Testing for Random Team Effects - II

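$e_{ij}$ = residual for team $i$ in week $j$, $\quad i = 1,\ldots,g$, $\; j = 1,\ldots,n$, $\quad g = 32$, $n = 16$

$$e_{ij} = u_i + \varepsilon_{ij}, \qquad u_i \sim NID(0, \sigma_u^2), \qquad \varepsilon_{ij} \sim NID(0, \sigma^2)$$

$$COV(e_{ij}, e_{ij'}) = \sigma_u^2, \qquad j \neq j'$$

$H_0: \sigma_u^2 = 0$, i.e. $COV(e_{ij}, e_{ij'}) = 0$ (residuals are independent within teams)
$H_A: \sigma_u^2 > 0$, i.e. $COV(e_{ij}, e_{ij'}) > 0$ (residuals are not independent within teams)

Test based on 1-Way Random Effects ANOVA on Residuals:

$$SS(Teams) = n\sum_{i=1}^{g}\left(\bar e_{i\cdot} - \bar e_{\cdot\cdot}\right)^2, \qquad df_{Teams} = g - 1$$

$$SS(Error) = \sum_{i=1}^{g}\sum_{j=1}^{n}\left(e_{ij} - \bar e_{i\cdot}\right)^2, \qquad df_{Err} = N - g$$

Test Statistic:
$$F_{obs} = \frac{SS(Teams)/df_{Teams}}{SS(Error)/df_{Err}} \sim F_{df_{Teams},\,df_{Err}} \text{ under } H_0$$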

Source   SS         df    MS       F        F-Crit   P-value
Team     2997.58    31    96.70    0.9335   1.4752   0.5724
Error    49720.99   480   103.59

No evidence of random Team Effects
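A sketch of the one-way ANOVA on residuals grouped by team, following the sums of squares above; by construction, SS(Teams) + SS(Error) equals the total sum of squares of the residuals about their overall mean. The function name is hypothetical, and the toy groups stand in for 32 teams × 16 weeks:

```python
def one_way_anova(groups):
    """Return SS(Teams), SS(Error), and F for a list of residual groups."""
    values = [v for grp in groups for v in grp]
    n_total, g = len(values), len(groups)
    grand = sum(values) / n_total
    means = [sum(grp) / len(grp) for grp in groups]
    ss_teams = sum(len(grp) * (m - grand) ** 2 for grp, m in zip(groups, means))
    ss_error = sum((v - m) ** 2 for grp, m in zip(groups, means) for v in grp)
    df_teams, df_error = g - 1, n_total - g
    f_obs = (ss_teams / df_teams) / (ss_error / df_error)
    return ss_teams, ss_error, f_obs


print(one_way_anova([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]))  # (13.5, 4.0, 13.5)
```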


Durbin-Watson Test Within Teams over Weeks

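$$Y_t = \beta_0 + \beta_1 X_t + \varepsilon_t, \qquad \varepsilon_t = \rho\,\varepsilon_{t-1} + u_t, \qquad u_t \sim NID(0, \sigma^2), \quad |\rho| < 1$$

$H_0: \rho = 0$ (errors are uncorrelated over time) $\qquad H_A: \rho > 0$ (positively correlated)

1) Obtain residuals $e_t$ from the regression
2) Compute the Durbin-Watson statistic:
$$DW = \frac{\sum_{t=2}^{n}\left(e_t - e_{t-1}\right)^2}{\sum_{t=1}^{n} e_t^2}$$
3) If $DW < d_L(p, n)$, reject $H_0$; if $DW > d_U(p, n)$, conclude $H_0$; otherwise the test is inconclusive.

For NFL teams, we use $e_t = e_{it}$, the residual for team $i$ in week $t$, and compute DW separately for each team.

For $n = 16$ (weeks/team) and $p = 1$ (predictor): $d_L = 1.10$, $d_U = 1.37$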

Team  DW     Team  DW     Team  DW     Team  DW
1     1.87   9     0.79   17    1.36   25    1.62
2     0.97   10    2.22   18    2.04   26    2.33
3     1.95   11    2.51   19    1.72   27    1.67
4     2.74   12    2.04   20    1.58   28    2.21
5     2.42   13    2.21   21    1.55   29    2.20
6     1.21   14    2.16   22    3.05   30    2.61
7     1.78   15    1.83   23    1.54   31    2.84
8     1.60   16    2.61   24    2.87   32    2.04

Teams 2 and 9 have small DW values (below d_L = 1.10), indicating positive autocorrelation. Team 22 displays negative autocorrelation (DW = 3.05, above 4 - d_L = 2.90). Most teams show no evidence of autocorrelation.
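A sketch of the per-team computation: `durbin_watson` implements the statistic above for one team's residuals in week order, and `classify` applies the d_L/d_U rule, mirrored at 4 - d_L and 4 - d_U for negative autocorrelation as in the Team 22 remark (the slide's formal H_A is one-sided, ρ > 0). The bounds are the slide's values for n = 16, p = 1; the function names are illustrative:

```python
D_L, D_U = 1.10, 1.37  # bounds for n = 16, p = 1 (from the slide)


def durbin_watson(e) -> float:
    """DW = sum_{t=2}^n (e_t - e_{t-1})^2 / sum_{t=1}^n e_t^2 for residuals in time order."""
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    den = sum(r * r for r in e)
    return num / den


def classify(dw: float, d_l: float = D_L, d_u: float = D_U) -> str:
    """Apply the d_L / d_U decision rule, mirrored at 4 - d for negative autocorrelation."""
    if dw < d_l:
        return "positive autocorrelation"
    if dw > 4 - d_l:
        return "negative autocorrelation"
    if d_u < dw < 4 - d_u:
        return "no autocorrelation"
    return "inconclusive"


# A few DW values from the table: teams 2, 9, 22, 1, 6
for team, dw in [(2, 0.97), (9, 0.79), (22, 3.05), (1, 1.87), (6, 1.21)]:
    print(team, dw, classify(dw))
```

Under this rule a handful of teams (e.g., 6, 17, 24, 31) fall in the inconclusive bands rather than clearly showing no autocorrelation.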