Post on 18-Jul-2015
Sabermetrics in Practice: Examining Fan Voting for MLB All-Stars over
Three Eras
Allison R. Levin, MA, JD
President, Social Network Advisors for Professional Sports
Allison.levin@gmail.com
Twitter: @arl1102
The Study
• The research seeks to understand which criteria fans valued most when selecting All-Stars and how those criteria have changed over time
• The author collected partial-year statistics for the All-Star year, as well as full-year statistics for the previous two years, for the top three vote-getters at each position for the 1994, 2004, and 2014 games.
The Data
• Three classifications of statistics were examined to explain the percentage difference in votes
• Since there are many potential explanatory variables relative to the number of observations, several determinations were necessary
– What is the best regression model to use?
– When does adding variables to the model stop providing meaningful value?
– In what order should variables be entered for consistency over time?
The Regression Model
• To select independent variables, the best one-variable model is compared with the best two-variable model, and so on.
• The best of all these models is the one that maximizes the adjusted R²
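The selection procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the author's actual code: the variable names (`home_runs`, `batting_avg`, `games_played`) and all numbers are synthetic and hypothetical, and the helper functions `solve`, `adjusted_r2`, and `best_subset` are invented here. The sketch fits an ordinary least squares model for every subset of predictors and keeps the subset with the highest adjusted R².

```python
from itertools import combinations


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def adjusted_r2(columns, y):
    """Fit OLS with an intercept on the given predictor columns; return adjusted R^2."""
    n, p = len(y), len(columns)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = p + 1
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - (ss_res / (n - p - 1)) / (ss_tot / (n - 1))


def best_subset(data, y, max_size):
    """Return (best adjusted R^2, variable names) over all subsets up to max_size."""
    best = (float("-inf"), ())
    for size in range(1, max_size + 1):
        for names in combinations(sorted(data), size):
            score = adjusted_r2([data[name] for name in names], y)
            if score > best[0]:
                best = (score, names)
    return best


# Synthetic, hypothetical data: vote share is built to track home runs closely.
data = {
    "home_runs": [10, 12, 15, 20, 22, 25, 30, 31],
    "batting_avg": [0.300, 0.250, 0.280, 0.260, 0.310, 0.270, 0.290, 0.255],
    "games_played": [80, 85, 78, 82, 88, 79, 84, 81],
}
y = [20.5, 23.5, 30.3, 39.7, 44.2, 49.8, 60.4, 61.6]

score, names = best_subset(data, y, max_size=3)
print(names, round(score, 4))
```

Because the search tries every subset, it is only practical for a small pool of candidate variables, which matches the setting of few observations and a limited set of statistics per era.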
Overfitting
• John von Neumann explained overfitting
– “With four parameters I can fit an elephant, and with five I can make him wiggle his trunk”
• The best-subset model was also used to help control for overfitting by estimating the best variables for each time period.
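For reference, the standard textbook definition of adjusted R² shows why it helps guard against overfitting. Here n is the number of observations and p the number of explanatory variables; these symbols are introduced for illustration and do not appear in the slides.

```latex
\bar{R}^2 = 1 - \left(1 - R^2\right)\frac{n - 1}{n - p - 1}
```

Plain R² never decreases when a variable is added, but the factor (n - 1)/(n - p - 1) grows with p, so the adjusted value rises only when the new variable improves the fit enough to offset the lost degree of freedom. With few observations that penalty is steep.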
What order
• Due to the relationship between variables and to avoid researcher bias, it was important to first determine the order in which to enter variables
• How does one become a baseball fan?
Traditional Statistics
• Most people don’t remember when they became a baseball fan
– Instead, we tend to have initial memories surrounding favorite players
Visibility
• The second set of variables entered was visibility
– 1994: 42% of Americans had pay television
  • Of those, only 8.1% paid for premium services
  • Approximately 32 games on local TV
  • ESPN showed 3 games a week
– 2004: 78% of Americans had cable television
  • Of those, 56.8% paid for a package that included sports programming
  • Approximately 85% of local market games
– 2014: 97% of Americans have some form of paid television
SABR
• Once diehard fans become interested in and knowledgeable about multiple teams and players, they seek out more information
• They have a group of players in mind and start thinking about how those players rate against each other
Hypothesis 1
When information about players was limited, fans tended to vote based on the visibility and popularity of the players
Partially supported
Hypothesis 2
When fans had access to multiple games and nearly unlimited information about players on the Web, fans tended to vote by comparing players
Not supported
Hypothesis 3
When Twitter is included, fans are influenced by team tweets
Partially supported; further research needed
Twitter Usage
• Team tweets from 2012-2014 asking for All-Star votes were examined for each team with a player in the top three
• Tweets and retweets were examined
– Tweets: call for action
– Retweets: action
• The adjusted R² for the 2014 model including retweets showed that retweets significantly increased the explanatory power of the regression