Using Data Analysis to Predict Attendance for NHL Regular Season Games

Brian Macdonald (1), Michael Peterson (2), James Cifu (1)

(1) Florida Panthers Hockey Club, Sunrise, FL
(2) Tampa Bay Lightning Hockey Club, Tampa, FL

RITHAC 2017, 10/21/2017
Twitter: @bmacNHL, @FlaPanthers, @TBLightning

Attendance Model

Goal:
- Develop a model for predicting attendance for games using only information that is known before tickets go on sale.

This could help answer questions like:
- Which games should be in which tiers for variable pricing?
- What kinds of things could we request when the league is developing the schedule?
- Specific question: Do we prefer a good team on a Saturday and a bad team during the week, or a good team during the week and a bad team on Saturday?
- What do we want Thanksgiving week?

First, let’s plot some raw data.

Attendance* by game, from 2007-08 to 2016-17, for all 30 teams.

*Announced attendance, as published on nhl.com

Attendance Data and Model

Two observations
1. For many teams, attendance is relatively flat.
2. Winning matters. See BOS, CHI, LA, NSH, TB, WSH.

Model
1. Remove the teams that have flat attendance.
2. That leaves us with ANA, CAR, CBJ, COL, DAL, FLA, NJ, NSH, NYI, OTT, PHX, STL, and TB.
3. Remove a few games with unusual characteristics.
   - European games
   - Blizzards
4. Use several predictor variables (next slide).
5. Announced attendance is the outcome we’re trying to predict.
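
The data-preparation steps above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the record layout and the field names ("home", "unusual", "attendance") are assumptions, and the sample rows are made up.

```python
# Teams kept after dropping those with flat attendance (list from the slide).
KEPT_TEAMS = {"ANA", "CAR", "CBJ", "COL", "DAL", "FLA", "NJ",
              "NSH", "NYI", "OTT", "PHX", "STL", "TB"}

def filter_games(games):
    """Keep games hosted by a retained team and not flagged as unusual.

    `games` is a list of dicts; the keys "home", "unusual" and
    "attendance" are hypothetical names chosen for this sketch.
    """
    return [g for g in games
            if g["home"] in KEPT_TEAMS and not g.get("unusual")]

# Made-up rows, for illustration only.
games = [
    {"home": "FLA", "away": "BOS", "attendance": 15000, "unusual": None},
    {"home": "CHI", "away": "DET", "attendance": 21000, "unusual": None},
    {"home": "NJ", "away": "NYR", "attendance": 9000, "unusual": "blizzard"},
    {"home": "TB", "away": "FLA", "attendance": 18000, "unusual": None},
]

kept = filter_games(games)
# Announced attendance is the outcome the model tries to predict.
outcome = [g["attendance"] for g in kept]
```

Here the CHI home game is dropped because CHI is not in the retained list, and the NJ game is dropped because it is flagged as a blizzard game.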

Predictors

- home team, away team
- day of week, month
- holiday (Columbus Day, Thanksgiving week, etc., or none)
- season opener (Y or N)
- same division (Y or N)
- same conference (Y or N)
- points during previous year for home/away (lag)
- year-to-date points relative to average for home/away
- day and month interaction (Sundays different in fall?)
- home team and day interaction
- home team and month interaction (snowbird months good for us?)
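
One way to picture how these predictors enter a regression is as indicator (dummy) features, with one extra indicator per combination of levels for each interaction. The sketch below assumes a simple dict-per-game layout; all field names and the sample values are hypothetical, since the slides do not describe the data format or the software used.

```python
def encode_game(game):
    """Turn one scheduled game into a dict of regression features.

    Field names such as "home", "day" and "prev_points_home" are
    hypothetical; the slides do not specify a data layout.
    """
    return {
        # Categorical predictors: one indicator per observed level.
        f"home={game['home']}": 1,
        f"away={game['away']}": 1,
        f"day={game['day']}": 1,
        f"month={game['month']}": 1,
        f"holiday={game['holiday']}": 1,
        # Yes/no predictors become 0/1.
        "season_opener": int(game["season_opener"]),
        "same_division": int(game["same_division"]),
        "same_conference": int(game["same_conference"]),
        # Numeric predictors pass through unchanged.
        "prev_points_home": game["prev_points_home"],
        "prev_points_away": game["prev_points_away"],
        "ytd_points_rel_home": game["ytd_points_rel_home"],
        "ytd_points_rel_away": game["ytd_points_rel_away"],
        # Interaction terms: one indicator per combination of levels.
        f"day:month={game['day']}.{game['month']}": 1,
        f"home:day={game['home']}.{game['day']}": 1,
        f"home:month={game['home']}.{game['month']}": 1,
    }

# Made-up example game.
example = encode_game({
    "home": "FLA", "away": "T.B", "day": "Sat", "month": "Feb",
    "holiday": "none", "season_opener": False,
    "same_division": True, "same_conference": True,
    "prev_points_home": 81, "prev_points_away": 94,
    "ytd_points_rel_home": -3.0, "ytd_points_rel_away": 5.5,
})
```

The "day:month=Sat.Feb" feature is what lets the model give Saturdays in February their own adjustment on top of the separate Saturday and February effects.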

Interpretation of regression model results

- The impact that each of these variables has on attendance, independent of all other variables.
- For example, we find the effect of day, controlling for all of the other variables in our model.
- That’s an important point. Example: If teams schedule big opponents on the weekend, then the effect of a weekend game could be overstated if we just look at day and ignore opponent.
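
The weekend example can be made concrete with a toy calculation. The numbers below are invented purely for illustration (a true weekend effect of 1,000 and a big-opponent effect of 2,000), and comparing within opponent groups stands in for the regression's "controlling for" step; it is a sketch of the idea, not the model from the talk.

```python
from statistics import mean

# Hypothetical additive "truth" used only to generate toy data.
BASE, WEEKEND_EFFECT, BIG_OPPONENT_EFFECT = 14000, 1000, 2000

def attendance(weekend, big):
    return BASE + WEEKEND_EFFECT * weekend + BIG_OPPONENT_EFFECT * big

# Big opponents are scheduled mostly on weekends: 8 of 10 weekend games,
# but only 2 of 10 weekday games. Each tuple is (weekend, big_opponent).
games = [(1, 1)] * 8 + [(1, 0)] * 2 + [(0, 1)] * 2 + [(0, 0)] * 8
rows = [(w, b, attendance(w, b)) for w, b in games]

def avg(weekend, big=None):
    """Average attendance for a day type, optionally within one opponent group."""
    return mean(a for w, b, a in rows
                if w == weekend and (big is None or b == big))

# Naive comparison: weekend vs weekday, ignoring opponent.
naive = avg(1) - avg(0)

# Controlled comparison: weekend vs weekday within each opponent group,
# then averaged across the two groups.
controlled = mean(avg(1, b) - avg(0, b) for b in (0, 1))
```

In this toy data the naive gap is 2,200 while the controlled gap is the true 1,000; the extra 1,200 comes entirely from big opponents being concentrated on weekends.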

Example: day of week

[Figure: Effect of Day Of Week on Attendance. Bar chart; x-axis: Announced Attendance, y-axis: Day Of Week.]

Day   Effect on announced attendance
Sun     +240
Sat   +1,056
Fri     +786
Thu     −245
Wed     −582
Tue     −565
Mon     −690

1. Attendance on Saturday is expected to be 1,056 higher than average, “holding all other variables constant.”
2. The difference between Saturday and Monday is expected to be 1,746 (1,056 + 690).
3. Not surprising. Stuff we knew. But now we’ve quantified it.
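
The arithmetic in these notes can be checked with a few lines. The values are the ones shown on the chart, paired with the day labels in the order both appear; the function name is just an illustrative choice.

```python
# Day-of-week effects on announced attendance, as shown on the chart.
day_effect = {"Sun": 240, "Sat": 1056, "Fri": 786, "Thu": -245,
              "Wed": -582, "Tue": -565, "Mon": -690}

def expected_difference(day_a, day_b):
    """Expected attendance gap between two days, other variables held fixed."""
    return day_effect[day_a] - day_effect[day_b]

sat_vs_mon = expected_difference("Sat", "Mon")  # 1056 - (-690)
total = sum(day_effect.values())                # effects are relative to average
```

The seven effects sum to zero, which is consistent with each one being measured relative to the average day.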

Month

[Figure: Effect of Month on Attendance. Bar chart; x-axis: Announced Attendance, y-axis: Month.]

Month   Effect on announced attendance
Apr   +1,057
Mar     +852
Feb     +350
Jan      +54
Dec     −553
Nov     −803
Oct     −957

- Attendance increases over the course of the season.

Away Team

[Figure: Effect of Away on Attendance. Bar chart sorted by effect size; x-axis: Announced Attendance, y-axis: Away.]

Away   Effect on announced attendance
STL     −696
OTT     −669
NSH     −660
PHX     −623
CAR     −614
CBJ     −592
MIN     −564
DAL     −527
ATL     −376
CGY     −340
ANA     −171
EDM     −162
NYI     −138
WPG     −121
COL     −112
FLA     −107
T.B      −69
BUF      −52
L.A      −52
S.J      −46
VAN      +54
N.J      +88
WSH     +273
PHI     +307
TOR     +360
MTL     +583
CHI     +618
BOS     +821
PIT   +1,154
NYR   +1,162
DET   +1,270

Holidays

[Figure: Effect of Holiday on Attendance. Bar chart; x-axis: Announced Attendance (axis ticks at −2000, 0, 2000), y-axis: Holiday. Categories shown: Veterans Day, Valentine's Day, Thanksgiving week (Mon, Tue, Wed, Fri, Sat, Sun), Season Opener, Presidents Day, none, MLK Day, Halloween, Good Friday, Election Day, Easter, Dec 22, Dec 23, Dec 26 through Dec 31, Columbus Day.]

Day-month combinations

[Figure: Effect of Day Month on Attendance. Bar chart with one bar per day-of-week and month combination (Mon.Oct through Sun.Apr); x-axis: Announced Attendance, y-axis: Day Month. The effects shown range from −301 to +288.]

Opponent-day combinations, Other Notes

Opponent-day:
- Good team on Sat and bad team on Tue, or good team on Tue and bad team on Sat?
- No evidence that there is a difference between these two, in terms of attendance.
- Revenue, however, is another story.

Record:
- Record matters for both home team and away team.
- Last year’s record matters too.

Using prediction model to tier games
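
The slide does not say how predictions were turned into tiers. One simple possibility, sketched here under that caveat, is to rank games by predicted attendance and cut the ranking into equal-sized groups. The function name, the number of tiers, and the predictions are all assumptions made for illustration.

```python
def assign_tiers(predicted, n_tiers=4):
    """Rank games by predicted attendance and split them into equal-sized tiers.

    Tier 1 holds the games with the highest predictions. Equal-sized tiers
    are an assumption of this sketch, not something stated in the talk.
    """
    ranked = sorted(predicted, key=predicted.get, reverse=True)
    size = -(-len(ranked) // n_tiers)  # ceiling division
    return {game: i // size + 1 for i, game in enumerate(ranked)}

# Made-up predictions for eight hypothetical home games.
predicted = {"G1": 17500, "G2": 12100, "G3": 15300, "G4": 13800,
             "G5": 18900, "G6": 11200, "G7": 14600, "G8": 16400}
tiers = assign_tiers(predicted)
```

With these made-up numbers, G5 and G1 land in tier 1 and G2 and G6 in tier 4. A real pricing process would likely also weigh revenue, which the talk notes can tell a different story from attendance.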

Actual vs Budget for 16-17

[Figure: Actual vs Budget for 16−17. x-axis: Budget, y-axis: Actual; both axes have ticks at 5.0, 7.5, 10.0, and 12.5.]

Internal data

1. Model using public data (2007-08 to 2016-17)
2. Model using internal data (only 2014-15 to 2016-17, but can use ticket prices and revenue)
3. Average
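
The third item can be read as combining the two models' predictions for each game. The slide says only "Average", so the equal weighting below is an assumption, as are the function name and the sample numbers.

```python
def blended_prediction(public_pred, internal_pred):
    """Unweighted average of the two models' predictions for one game."""
    return (public_pred + internal_pred) / 2

# Made-up predictions for a single game.
blend = blended_prediction(14200, 15000)
```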

Predictions for 17-18

Total tickets and tickets 60 days out

Total tickets vs tickets 60 days out

Similar relationship for n days out
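
The plots for these slides are not reproduced in the transcript, so the shape of the relationship is not known here. If it is roughly linear, it could be summarized with an ordinary least-squares line, as in this sketch. The ticket counts are made up, and the linear form is an assumption.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Made-up ticket counts: sold 60 days out vs final total.
sold_60_days_out = [8000, 9000, 10000, 11000, 12000]
final_total = [11000, 12500, 14000, 15500, 17000]

intercept, slope = fit_line(sold_60_days_out, final_total)

# Projected final total for a game with 9,500 tickets sold 60 days out.
projected = intercept + slope * 9500
```

The same fit could be repeated with tickets sold n days out in place of the 60-day count.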

Fin.

@bmacNHL

Lead scores

[Figure: Renewal rate by region. Bar chart; x-axis: Renewal Rate, y-axis: Region.]

Region              Renewal rate
Broward             88% (n=768)
Palm Beach          89% (n=226)
Miami-Dade          90% (n=136)
Other FL counties   82% (n=27)
Northeast States    88% (n=26)
Canada              82% (n=11)

Show rate

[Figure: Renewal rate by show rate. Bar chart; x-axis: Renewal Rate, y-axis: Show Rate.]

Show rate   Renewal rate
0           17% (n=12)
0.1         43% (n=14)
0.2         78% (n=18)
0.3         71% (n=24)
0.4         70% (n=46)
0.5         85% (n=68)
0.6         88% (n=112)
0.7         87% (n=135)
0.8         90% (n=224)
0.9         92% (n=345)
1           95% (n=210)

Win% in games attended

[Figure: Renewal rate by win% in games attended. Bar chart; x-axis: Renewal Rate, y-axis: Wins Att P.]

Win% attended   Renewal rate
0               33% (n=24)
0.1             70% (n=30)
0.2             70% (n=66)
0.3             84% (n=144)
0.4             89% (n=341)
0.5             93% (n=592)
0.6             73% (n=11)

Average total goals in games attended

[Figure: Renewal rate by average total goals in games attended. Bar chart; x-axis: Renewal Rate, y-axis: Tot Goals Att P.]

Avg total goals   Renewal rate
0.8               58% (n=12)
1.6               71% (n=14)
2                 59% (n=17)
2.4               76% (n=45)
2.8               84% (n=31)
3.2               78% (n=72)
3.6               94% (n=53)
4                 84% (n=147)
4.4               94% (n=125)
4.8               94% (n=302)
5.2               91% (n=242)
5.6               93% (n=117)

Proportion of 1-goal games in games attended

[Figure: Renewal rate by proportion of 1-goal games in games attended. Bar chart; x-axis: Renewal Rate, y-axis: One Goal Att P.]

Proportion of 1-goal games   Renewal rate
0                            21% (n=14)
0.1                          79% (n=14)
0.2                          60% (n=20)
0.25                         77% (n=30)
0.3                          79% (n=43)
0.35                         88% (n=67)
0.4                          83% (n=72)
0.45                         86% (n=116)
0.5                          90% (n=179)
0.55                         92% (n=258)
0.6                          94% (n=372)