Simulating Sports: The Inputs and the Engines
Paul Bessire, General Manager, Co-Founder
PredictionMachine.com
September 29, 2010
Table of Contents
• Intro
• PredictionMachine.com & Simulation Overview
• Simulating Baseball
• Plate Appearance Decision Tree
• Examples (more in second presentation)
Introduction
• 2004 University of Cincinnati BBA, Finance and QA
• 2005 MSQA - Master’s Project (with Dr. Fry):
Measuring Individual and Team Effectiveness in the NBA Through Multivariate Regression
• 2004 – 2009 WhatIfSports.com/FOXSports.com, Director, Content and Quantitative Analysis
• 2010 Launched PredictionMachine.com in February
About PredictionMachine.com
• “We play the game 50,000 times before it’s actually played.”
• Built by Paul Bessire to focus on content after six years at WhatIfSports.com/FOXSports
• February 2010 - Launched with Super Bowl Prediction (Indianapolis 28 – New Orleans 27)
• “Predictalator” – Simulation engine plays entire NFL season 50,000 times in 8 seconds
• March Madness, NBA Playoffs, MLB Daily, College Football, NFL
• Customizable Predictalator – Any teams, Any where, Any line
• Fantasy Football Projections
• Live simulator built to analyze in-game winning probabilities and value in coaching decisions
Sports Simulation
• Play-by-play
– A “play” means something different for each sport
– Probabilities for every individual outcome
– Random number generation
– Pitch-by-pitch (or basketball/hockey pass-by-pass) not needed
– Account for every possible statistical interaction during a game
• Can be recreated quickly
– 50,000+ games/second
– All data tracked
– Every outcome is different
– Boxscores
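The play-by-play idea above can be sketched in a few lines: assign a probability to each outcome of a "play", draw a random number, and repeat many times while tracking the results. This is a minimal illustration only. The outcome names and probabilities in `PLAY_OUTCOMES` are hypothetical values, not figures from the presentation, and a real engine would derive them from the matchup.

```python
import random

# Hypothetical outcome probabilities for one "play" (illustrative values; they sum to 1).
PLAY_OUTCOMES = {
    "out": 0.68, "single": 0.15, "double": 0.05,
    "triple": 0.01, "home_run": 0.03, "walk": 0.08,
}

def simulate_play(rng):
    """Pick one outcome by comparing a random draw with cumulative probabilities."""
    draw = rng.random()
    cumulative = 0.0
    for outcome, prob in PLAY_OUTCOMES.items():
        cumulative += prob
        if draw < cumulative:
            return outcome
    return outcome  # fallback if floating-point error leaves the sum just under 1

def simulate_many(n_plays, seed=0):
    """Repeat the play n_plays times and track how often each outcome occurs."""
    rng = random.Random(seed)
    counts = dict.fromkeys(PLAY_OUTCOMES, 0)
    for _ in range(n_plays):
        counts[simulate_play(rng)] += 1
    return counts

counts = simulate_many(50_000)
for outcome, n in counts.items():
    print(f"{outcome:9s} {n / 50_000:.3f}")
```

Over 50,000 repetitions the observed frequencies settle close to the input probabilities, which is why the tracked results can be read as estimates.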
Significant Stats
Pitchers
• HBP/BF
• BB/(BF – HBP)
• OAV
• 1B/Hit Allowed
• 2B/Hit Allowed
• 3B/Hit Allowed
• HR/Hit Allowed
• K/Out
• GO/FO
• BF
• Pitches Thrown/BF
• Relative Range Factor
• Fielding Percentage
• Handedness
• Ballpark Effects
• League Averages
Hitters
• HBP/PA
• BB/(PA – HBP)
• AVG
• 1B/Hit
• 2B/Hit
• 3B/Hit
• HR/Hit
• K/Out
• GO/FO
• PA
• Relative Range Factor
• Fielding Percentage
• Catcher Arm Rating
• CS% (Runner)
• Speed Rating
• Handedness
• Ballpark Effects
• League Averages
Insignificant Stats
Pitchers
• Wins
• Losses
• Saves
• Holds
• Complete Games
• Shutouts
• ERA (kind of – 2B and 3B approx)
• Unearned Runs
• Games Started
• Pitch Types
• Performance in Counts
• Other Situational Stats
Hitters
• RBI
• IBB
• Runs (kind of – in Speed Formula)
• GIDP (kind of – in Speed Formula)
• SF (kind of – in PA, but also situational)
• SH (kind of – in PA, but also situational)
• SBA (kind of – attempts, but also setting)
• Performance in Counts
• Other Situational Stats
Ballpark Effects
Ballparks – Extremes (Min. 3 seasons)
Effect         Ballpark (High)      Factor   Ballpark (Low)             Factor
Hits           Coors Field          1.182    Petco Park                 .908
2B             Baker Bowl           1.291    Dodger Stadium             .795
3B             Palace of the Fans   1.868    Great American Ball Park   .523
HR_RF          Coors Field          1.374    Municipal Stadium          .636
HR_LF          Coors Field          1.385    Municipal Stadium          .634
Runs (unused)  Coors Field          1.380    Petco Park                 .830
PA Decision Tree - Normalization
Every step in the PA uses modified* log5 normalization (Bill James AVG example):
H/AB = ((AVG * OAV) / LgAVG) / ((AVG * OAV) / LgAVG + (1 - AVG) * (1 - OAV) / (1 - LgAVG))
where LgAVG = (PLgAVG + BLgAVG) / 2 (the pitcher's and batter's league averages)
2000 Pedro vs. 1923 Ruth Example:
H/AB = ((.393 * .167) / .2791) / ((.393 * .167) / .2791 + (1 - .393) * (1 - .167) / (1 - .2791))
where LgAVG = (.283 + .276) / 2, or .2791
Result = .2504
* Modified due to a flaw in the assumption above that the batter and pitcher carry equal (50/50) weights on each possible outcome of the PA event. Also accounts for handedness and ballpark.
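The unmodified log5 formula above translates directly into a short function. This is a sketch of the equal-weight version only. The function name `log5` and its parameter names are illustrative, and the inputs are the rounded values displayed on the slide. With those rounded inputs the function returns roughly .251, slightly above the slide's .2504, presumably because the slide was computed from unrounded inputs.

```python
def log5(batter_rate, pitcher_rate, league_rate):
    """Unmodified log5 matchup probability (equal batter and pitcher weights)."""
    matchup = batter_rate * pitcher_rate / league_rate
    complement = (1 - batter_rate) * (1 - pitcher_rate) / (1 - league_rate)
    return matchup / (matchup + complement)

# Rounded inputs as displayed: 1923 Ruth AVG, 2000 Pedro OAV, and the two league averages.
league_avg = (0.283 + 0.276) / 2
ruth_vs_pedro = log5(0.393, 0.167, league_avg)
print(f"Expected H/AB: {ruth_vs_pedro:.4f}")

# Sanity check: against a league-average pitcher, the batter keeps his own average.
print(f"League-average pitcher: {log5(0.300, 0.270, 0.270):.4f}")
```

The sanity check shows the property that makes log5 useful here: a league-average opponent leaves the other player's rate unchanged.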
PA Decision Tree – Step 1*
Plate Appearance
  – Unusual Event (IBB, WP, PB, SB, CS, SH, Hit and Run, Pickoff, Balk)
  – Normal PA
      – HBP (per PA or BFP)
      – Not HBP
          – BB (per PA or BFP – HBP)
          – At Bat…
* No ballpark or handedness adjustments made yet.
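A minimal sketch of the normal-PA branch of this step, assuming the HBP and BB probabilities have already been normalized for the matchup. The function name `step_one` is hypothetical, and the unusual-event branch (IBB, WP, SB and so on) is left out for brevity. The point is that the walk check only runs once a HBP has been ruled out, which is why its rate is expressed per (PA – HBP).

```python
import random

def step_one(hbp_per_pa, bb_per_non_hbp_pa, rng):
    """Resolve a normal plate appearance into 'HBP', 'BB', or 'AB'."""
    if rng.random() < hbp_per_pa:
        return "HBP"
    # Conditional on no HBP, so the rate matches BB / (PA - HBP).
    if rng.random() < bb_per_non_hbp_pa:
        return "BB"
    return "AB"  # continues to the at-bat step

rng = random.Random(42)
sample = [step_one(0.01, 0.09, rng) for _ in range(10)]
print(sample)
```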
PA Decision Tree – Step 2
At-Bat
  – Out
      – Strikeout (K/Out)
      – Normal (logic to determine direction and GO or FO)
          – Hit (Poor Play)
          – Error (Fielding Percentage)
          – Normal
  – Hit… (AVG vs. OAV)*
* Historical handedness adjustment and ballpark hits multiplier used.
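One way to read this step is sketched here under stated assumptions. The normalized hit probability is scaled by the handedness adjustment and the ballpark hits multiplier before the hit/out draw, and an out is then split into a strikeout, an error, or a normal out. The function name `step_two`, the direct multiplication of the adjustments, and the use of `error_rate` as one minus fielding percentage are all assumptions. The "Hit (Poor Play)" branch and the direction and GO/FO logic are omitted.

```python
import random

def step_two(base_hit_prob, handedness_adj, park_hits, k_per_out, error_rate, rng):
    """Resolve an at-bat into 'HIT', 'K', 'ERROR', or 'OUT_IN_PLAY'."""
    # Assumed: adjustments multiply the normalized AVG-vs-OAV probability, capped at 1.
    hit_prob = min(1.0, base_hit_prob * handedness_adj * park_hits)
    if rng.random() < hit_prob:
        return "HIT"  # continues to the hit-type step
    if rng.random() < k_per_out:
        return "K"
    # Ball in play that would normally be an out; the defense can still misplay it.
    if rng.random() < error_rate:
        return "ERROR"
    return "OUT_IN_PLAY"

rng = random.Random(7)
# Illustrative call: a .250 matchup, 1.045 handedness adjustment, Coors Field hits factor.
print(step_two(0.250, 1.045, 1.182, k_per_out=0.25, error_rate=0.02, rng=rng))
```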
PA Decision Tree – Step 3
Hit*
  – Normal – In Play
      – Out (Plus Play)
      – Normal Hit
          – 3B* (3B/Hit * multiplier for lost HR)
          – 2B* (2B/Hit * multiplier for lost HR)
          – 1B
  – HR* (HR/Hit)
* Ballpark multipliers used.
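The "multiplier for lost HR" is not spelled out in the slides. One plausible reading, used in this sketch, is that once home runs have been removed, the per-hit rates for triples and doubles are rescaled by 1 / (1 – HR/Hit) so that they apply to the remaining non-HR hits. The function name `hit_type` is hypothetical, and the "Out (Plus Play)" branch and ballpark multipliers are omitted.

```python
import random

def hit_type(hr_per_hit, triple_per_hit, double_per_hit, rng):
    """Split a hit into 'HR', '3B', '2B', or '1B'."""
    if rng.random() < hr_per_hit:
        return "HR"
    # Assumed "multiplier for lost HR": convert per-hit rates to per-non-HR-hit rates.
    lost_hr = 1.0 / (1.0 - hr_per_hit)
    p_triple = triple_per_hit * lost_hr
    p_double = double_per_hit * lost_hr
    draw = rng.random()
    if draw < p_triple:
        return "3B"
    if draw < p_triple + p_double:
        return "2B"
    return "1B"

rng = random.Random(1)
n = 100_000
results = [hit_type(0.12, 0.02, 0.20, rng) for _ in range(n)]
for kind in ("HR", "3B", "2B", "1B"):
    print(kind, round(results.count(kind) / n, 3))
```

With this rescaling, the simulated share of doubles and triples among all hits stays close to the original 2B/Hit and 3B/Hit inputs even though home runs are decided first.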
PA Decision Tree – Matchup Weights
Addresses previous 50/50 assumption using League-Adjusted Variance to form batter and pitcher weights for each step:
            HBP/PA   BB/(PA-HBP)   H/AB   K/Out   HR/Hit   2B/Hit   3B/Hit
Pitcher %   47.8     43.5          46.7   45.6    39.7     15.2     11.6
Hitter %    52.2     56.5          53.3   54.4    60.3     84.8     88.4
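The 1.066 and .934 multipliers in the weighted batting-average formula appear to be these percentages divided by 50, so that an even 50/50 split maps to 1.0 for both sides (53.3 / 50 = 1.066 and 46.7 / 50 = .934 for H/AB). The sketch below applies that same scaling to every step. The helper name `to_multipliers` is illustrative, and extending the scaling beyond H/AB is an assumption.

```python
# (pitcher %, hitter %) for each step, taken from the table above.
WEIGHTS = {
    "HBP/PA": (47.8, 52.2),
    "BB/(PA-HBP)": (43.5, 56.5),
    "H/AB": (46.7, 53.3),
    "K/Out": (45.6, 54.4),
    "HR/Hit": (39.7, 60.3),
    "2B/Hit": (15.2, 84.8),
    "3B/Hit": (11.6, 88.4),
}

def to_multipliers(pitcher_pct, hitter_pct):
    """Scale percentage shares so that a 50/50 split becomes 1.0 and 1.0."""
    return pitcher_pct / 50.0, hitter_pct / 50.0

for step, (pitcher_pct, hitter_pct) in WEIGHTS.items():
    pitcher_mult, hitter_mult = to_multipliers(pitcher_pct, hitter_pct)
    print(f"{step:12s} pitcher x{pitcher_mult:.3f}   hitter x{hitter_mult:.3f}")
```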
Matchup Weights: What does this mean?
• Batter always has more control (even with HBP and BB)
– Makes final decision (swing or not)
– Dictates strike zone
– Less consistent
• Doubles and Triples are (mostly) out of pitcher’s control (BABIP)
• Does not necessarily mean batting is more important
– 9 vs. 1
– Fewer pitcher outliers means elite pitchers are more valuable
PA Decision Tree - Normalization
Batting Average Example using Matchup Weights:
H/AB = ((1.066*AVG * .934*OAV) / LgAVG) / ((1.066*AVG * .934*OAV) / LgAVG + (1.066 - 1.066*AVG) * (.934 - .934*OAV) / (1 - LgAVG))
where LgAVG = (.934*PLgAVG + 1.066*BLgAVG) / 2
2000 Pedro vs. 1923 Ruth Example (with handedness):
H/AB = ((1.066*.393 * .934*.167) / .2795) / ((1.066*.393 * .934*.167) / .2795 + (1.066 - 1.066*.393) * (.934 - .934*.167) / (1 - .2795))
where LgAVG = (.934*.276 + 1.066*.283) / 2, or .2795
Result * Handedness = .2502 * 1.045
Final Result = .2614
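The weighted formula can be sketched as a function that takes the batter and pitcher multipliers as parameters. The function name `weighted_log5` and its argument names are illustrative. The inputs are the rounded values shown on the slide, so the output lands near, but not exactly on, the slide's .2502 and .2614.

```python
def weighted_log5(avg, oav, batter_lg, pitcher_lg, w_bat=1.066, w_pit=0.934):
    """Log5 hit probability with batter and pitcher matchup weights applied."""
    league = (w_pit * pitcher_lg + w_bat * batter_lg) / 2
    matchup = (w_bat * avg) * (w_pit * oav) / league
    complement = (w_bat - w_bat * avg) * (w_pit - w_pit * oav) / (1 - league)
    return matchup / (matchup + complement)

# 1923 Ruth (AVG .393, league .283) vs. 2000 Pedro (OAV .167, league .276).
base = weighted_log5(0.393, 0.167, batter_lg=0.283, pitcher_lg=0.276)
final = base * 1.045  # handedness adjustment from the example
print(f"Weighted H/AB:   {base:.4f}")
print(f"With handedness: {final:.4f}")
```

Setting both weights to 1.0 reduces the function to the unmodified log5 formula, which gives a quick way to compare the two versions on the same inputs.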