Using Data Analysis to Predict Attendance for NHL Regular Season Games
Brian Macdonald (1)
Michael Peterson (2)
James Cifu (1)
(1) Florida Panthers Hockey Club, Sunrise, FL
(2) Tampa Bay Lightning Hockey Club, Tampa, FL
RITHAC 2017, 10/21/2017
Twitter: @bmacNHL, @FlaPanthers, @TBLightning
Attendance Model
Goal:
- Develop a model for predicting attendance for games using only information that is known before tickets go on sale.
This could help answer questions like:
- Which games should be in which tiers for variable pricing?
- What kinds of things could we request when the league is developing the schedule?
- Specific question: Do we prefer a good team on a Saturday and a bad team during the week, or a good team during the week and a bad team on Saturday?
- What do we want during Thanksgiving week?
First, let’s plot some raw data.
Attendance* by game, from 2007-08 to 2016-17, for all 30 teams.
*Announced attendance, as published on nhl.com
Attendance Data and Model
Two observations
1. For many teams, attendance is relatively flat.
2. Winning matters. See BOS, CHI, LA, NSH, TB, WSH.
Model
1. Remove the teams that have flat attendance.
2. That leaves us with ANA, CAR, CBJ, COL, DAL, FLA, NJ, NSH, NYI, OTT, PHX, STL, and TB.
3. Remove a few games with unusual characteristics.
   - European games
   - Blizzards
4. Use several predictor variables (next slide).
5. Announced attendance is the outcome we’re trying to predict.
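The filtering steps above can be sketched in a few lines of Python. This is only an illustration: the record layout (`home`, `away`, `attendance`, `flags`), the flag names, and the sample rows are assumptions made up for the example, not the authors' data or code. Only the team list comes from the slide.

```python
# Teams kept after dropping the flat-attendance teams (list taken from the slide).
MODEL_TEAMS = {"ANA", "CAR", "CBJ", "COL", "DAL", "FLA", "NJ",
               "NSH", "NYI", "OTT", "PHX", "STL", "TB"}

# Hypothetical flag names marking games with unusual characteristics.
EXCLUDED_FLAGS = {"european_game", "blizzard"}


def select_training_games(games):
    """Keep home games of the modeled teams that carry no exclusion flag."""
    return [
        g for g in games
        if g["home"] in MODEL_TEAMS
        and not (set(g.get("flags", ())) & EXCLUDED_FLAGS)
    ]


# Made-up rows, for illustration only.
games = [
    {"home": "FLA", "away": "BOS", "attendance": 15000, "flags": []},
    {"home": "BOS", "away": "FLA", "attendance": 17565, "flags": []},
    {"home": "NJ", "away": "NYR", "attendance": 9000, "flags": ["blizzard"]},
    {"home": "TB", "away": "DET", "attendance": 19000, "flags": []},
]
training = select_training_games(games)
print([g["home"] for g in training])
```

Here the BOS home game is dropped because BOS is not in the modeled set, and the NJ game is dropped because of its blizzard flag.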
Predictors
- home team, away team
- day of week, month
- holiday (Columbus Day, Thanksgiving week, etc., or none)
- season opener (Y or N)
- same division (Y or N)
- same conference (Y or N)
- points during previous year for home/away (lag)
- year-to-date points relative to average for home/away
- day and month interaction (Sundays different in fall?)
- home team and day interaction
- home team and month interaction (snowbird months good for us?)
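As a rough sketch of how the schedule-based predictors in this list could be assembled for a single game, the Python below builds a feature dictionary. The function name, field names, and the partial division lookup are assumptions for illustration, and the game date is hypothetical. The points-based predictors are left out because they need standings data. Interactions are encoded as combined labels such as `Sat.Nov`.

```python
from datetime import date

# Partial division/conference lookup for illustration (2016-17 alignment).
DIVISION = {"FLA": "Atlantic", "TB": "Atlantic", "BOS": "Atlantic",
            "NJ": "Metropolitan", "ANA": "Pacific"}
CONFERENCE = {"Atlantic": "East", "Metropolitan": "East", "Pacific": "West"}


def game_features(home, away, game_date, holiday="none", season_opener=False):
    """Build categorical predictors for one scheduled game."""
    day = game_date.strftime("%a")    # abbreviated weekday name
    month = game_date.strftime("%b")  # abbreviated month name
    return {
        "home": home,
        "away": away,
        "day": day,
        "month": month,
        "holiday": holiday,
        "season_opener": season_opener,
        "same_division": DIVISION[home] == DIVISION[away],
        "same_conference": CONFERENCE[DIVISION[home]] == CONFERENCE[DIVISION[away]],
        # Interactions encoded as combined category labels.
        "day_month": f"{day}.{month}",
        "home_day": f"{home}.{day}",
        "home_month": f"{home}.{month}",
    }


# Hypothetical game date, chosen only to exercise the function.
features = game_features("FLA", "TB", date(2016, 11, 26))
print(features["day_month"], features["same_division"])
```

A regression package would typically expand these categorical values into indicator columns. That step is omitted here.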
Interpretation of regression model results
- The model estimates the impact that each of these variables has on attendance, independent of all other variables.
- For example, we find the effect of day, controlling for all of the other variables in our model.
- That’s an important point. Example: If teams schedule big opponents on the weekend, then the effect of a weekend game could be overstated if we just look at day and ignore opponent.
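The weekend example can be made concrete with a small synthetic data set. Everything in this sketch is assumed for illustration: the "true" effects (1,000 for a weekend game, 2,000 for a big opponent), the baseline of 15,000, and the game counts. It compares a naive weekend-versus-weekday gap with a gap computed within each opponent type, which stands in for "controlling for opponent" without fitting a full regression.

```python
from statistics import mean


def attendance(weekend, big):
    """Assumed truth: baseline plus separate weekend and big-opponent effects."""
    return 15000 + 1000 * weekend + 2000 * big


# Big opponents are scheduled mostly on weekends (counts are made up).
schedule = ([(1, 1)] * 8 + [(1, 0)] * 2 +   # weekend games
            [(0, 1)] * 2 + [(0, 0)] * 8)    # weekday games
games = [(w, b, attendance(w, b)) for w, b in schedule]


def avg(weekend, big=None):
    """Mean attendance for a day type, optionally within one opponent type."""
    return mean(a for w, b, a in games
                if w == weekend and (big is None or b == big))


# Naive comparison ignores opponent and mixes both effects.
naive = avg(1) - avg(0)
# Controlled comparison: weekend vs weekday within each opponent type.
controlled = mean(avg(1, b) - avg(0, b) for b in (0, 1))
print(naive, controlled)
```

With these made-up numbers the naive gap is 2,200, more than double the 1,000 weekend effect built into the data, while the within-opponent comparison recovers 1,000.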
Example: day of week
[Figure: Effect of Day Of Week on Attendance. Horizontal bar chart; x-axis: Announced Attendance, y-axis: Day Of Week. Bar labels: Sun 240, Sat 1056, Fri 786, Thu −245, Wed −582, Tue −565, Mon −690.]
1. Attendance on Saturday is expected to be 1,056 higher than average, “holding all other variables constant.”
2. The difference between Saturday and Monday is expected to be 1,746 (1,056 + 690).
3. Not surprising. Stuff we knew. But now we’ve quantified it.
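The arithmetic in point 2 generalizes to any pair of days: subtract one day's effect from the other's. The short sketch below assumes the chart's bar labels pair with the days in the order they appear on the slide. The Saturday and Monday values are confirmed by the text, the others are read off the chart. The function name is made up for the example.

```python
# Day-of-week effects on announced attendance, as labeled on the chart.
DAY_EFFECT = {"Mon": -690, "Tue": -565, "Wed": -582, "Thu": -245,
              "Fri": 786, "Sat": 1056, "Sun": 240}


def day_gap(day_a, day_b):
    """Expected attendance difference between two days, all else held constant."""
    return DAY_EFFECT[day_a] - DAY_EFFECT[day_b]


print(day_gap("Sat", "Mon"))  # 1056 - (-690) = 1746
```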
Month
[Figure: Effect of Month on Attendance. Horizontal bar chart; x-axis: Announced Attendance, y-axis: Month. Bar labels: Apr 1057, Mar 852, Feb 350, Jan 54, Dec −553, Nov −803, Oct −957.]
- Attendance increases over the course of the season
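A quick check of that claim against the chart's numbers, assuming the bar labels pair with the months in the order they appear on the slide. The variable names are made up for the example.

```python
# Month effects as labeled on the chart, in season order (October to April).
MONTH_EFFECT = [("Oct", -957), ("Nov", -803), ("Dec", -553), ("Jan", 54),
                ("Feb", 350), ("Mar", 852), ("Apr", 1057)]

effects = [e for _, e in MONTH_EFFECT]
# True when every month's effect exceeds the previous month's.
rises_every_month = all(a < b for a, b in zip(effects, effects[1:]))
# Gap between the last and first month of the regular season.
april_vs_october = effects[-1] - effects[0]
print(rises_every_month, april_vs_october)
```

Under that reading the effect rises in every month, and an April game is expected to draw about 2,014 more than an October game with everything else held constant.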
Away Team
[Figure: Effect of Away on Attendance. Horizontal bar chart; x-axis: Announced Attendance, y-axis: Away. Teams shown: STL, OTT, NSH, PHX, CAR, CBJ, MIN, DAL, ATL, CGY, ANA, EDM, NYI, WPG, COL, FLA, T.B, BUF, L.A, S.J, VAN, N.J, WSH, PHI, TOR, MTL, CHI, BOS, PIT, NYR, DET.]
Holidays
[Figure: Effect of Holiday on Attendance. Horizontal bar chart; x-axis: Announced Attendance, y-axis: Holiday. Categories: Veterans Day, Valentine's Day, Thanksgiving Sun, Thanksgiving Sat, Thanksgiving Fri, Thanksgiving Wed, Thanksgiving Tue, Thanksgiving Mon, Season Opener, Presidents Day, none, MLK Day, Halloween, Good Friday, Election Day, Easter, Dec 31, Dec 30, Dec 29, Dec 28, Dec 27, Dec 26, Dec 23, Dec 22, Columbus Day.]
Day-month combinations
[Figure: Effect of Day Month on Attendance. Horizontal bar chart; x-axis: Announced Attendance, y-axis: Day Month. One bar per day-month combination, Mon.Oct through Sun.Apr; bar labels range from −301 to 288.]
Opponent-day combinations, Other Notes
Opponent-day:
- Good team on Sat and bad team on Tue, or good team on Tue and bad team on Sat
- No evidence that there is a difference between these two, in terms of attendance.
- Revenue, however, is another story.
Record:
- Record matters for both home team and away team.
- Last year’s record matters too.
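One way to read the opponent-day finding: if opponent and day effects simply add, the two scheduling options give the same total attendance, and only an opponent-day interaction could separate them. The sketch below illustrates that reasoning. The opponent effects and the interaction bonus are made-up values, not results from the model. The day effects are the Saturday and Tuesday labels from the day-of-week chart.

```python
# Hypothetical opponent effects (not from the slides), for illustration only.
TEAM = {"good": 1000, "bad": -500}
# Day effects as labeled on the day-of-week chart.
DAY = {"Sat": 1056, "Tue": -565}


def total_effect(schedule, interaction=None):
    """Sum opponent, day, and optional opponent-day effects over a schedule."""
    interaction = interaction or {}
    return sum(TEAM[t] + DAY[d] + interaction.get((t, d), 0)
               for t, d in schedule)


option_a = [("good", "Sat"), ("bad", "Tue")]
option_b = [("good", "Tue"), ("bad", "Sat")]

# With no opponent-day interaction, both schedules total the same.
print(total_effect(option_a), total_effect(option_b))

# A made-up interaction term is what would separate them.
bonus = {("good", "Sat"): 300}
print(total_effect(option_a, bonus) - total_effect(option_b, bonus))
```

Finding no evidence of a difference is consistent with the interaction term being close to zero for attendance. Revenue depends on ticket prices as well, which this sketch does not model.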
Actual vs Budget for 16-17
[Figure: Actual vs Budget for 16−17. x-axis: Budget, y-axis: Actual; both axes run from 5.0 to 12.5.]
Internal data
1. Model using public data (2007-08 to 2016-17)
2. Model using internal data (only 2014-15 to 2016-17, but can use ticket prices and revenue)
3. Average
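A minimal sketch of step 3, assuming "Average" means a simple equal-weight mean of the two models' per-game predictions. The slide does not state the weighting, and the function name and prediction values are made up for the example.

```python
def average_forecast(public_pred, internal_pred):
    """Equal-weight average of the two models' attendance predictions."""
    return [(p + i) / 2 for p, i in zip(public_pred, internal_pred)]


# Made-up predictions for three games.
public_model = [14200, 16800, 12500]
internal_model = [13800, 17400, 12900]
print(average_forecast(public_model, internal_model))
```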
Lead scores
[Figure: horizontal bar chart; x-axis: Renewal Rate, y-axis: Region. Bar labels: Broward 88% (n=768), Palm Beach 89% (n=226), Miami-Dade 90% (n=136), Other FL counties 82% (n=27), Northeast States 88% (n=26), Canada 82% (n=11).]
Show rate
[Figure: horizontal bar chart; x-axis: Renewal Rate, y-axis: Show Rate. Bar labels by show-rate bin: 0: 17% (n=12), 0.1: 43% (n=14), 0.2: 78% (n=18), 0.3: 71% (n=24), 0.4: 70% (n=46), 0.5: 85% (n=68), 0.6: 88% (n=112), 0.7: 87% (n=135), 0.8: 90% (n=224), 0.9: 92% (n=345), 1: 95% (n=210).]
Win% in games attended
[Figure: horizontal bar chart; x-axis: Renewal Rate, y-axis: Wins Att P. Bar labels by win% bin: 0: 33% (n=24), 0.1: 70% (n=30), 0.2: 70% (n=66), 0.3: 84% (n=144), 0.4: 89% (n=341), 0.5: 93% (n=592), 0.6: 73% (n=11).]
Average total goals in games attended
[Figure: horizontal bar chart; x-axis: Renewal Rate, y-axis: Tot Goals Att P. Bar labels by average-total-goals bin: 0.8: 58% (n=12), 1.6: 71% (n=14), 2: 59% (n=17), 2.4: 76% (n=45), 2.8: 84% (n=31), 3.2: 78% (n=72), 3.6: 94% (n=53), 4: 84% (n=147), 4.4: 94% (n=125), 4.8: 94% (n=302), 5.2: 91% (n=242), 5.6: 93% (n=117).]