Predicting New Race Times From Race History
Matthew Millett, TJ Melanson
Locally Weighted Linear Regression
Introduction: “What if…?”
Many runners like to ask the question, “What if?”
• If you race a distance you’ve never raced before, how would you do?
• What can it tell you about your fitness?
Current methods are limited to intuition, Jack Daniels’ running formula, which requires serious VO2max testing, or Peter Riegel’s oversimplified formula.
All are either imprecise or fail to generalize well to different distances.
Figure 1: A VO2max test in progress. Tests like this give a good measure of fitness, but they are expensive and complicated. Why don’t we just use race history?
Data
Initial tests: Baseline & Oracle
Figure 4: Data sources consisted of online race results.
We’re implementing a Hidden Markov Model to model states of fitness. Emissions are normally distributed with respect to fitness, and currently all transition probabilities are uniform. We have a discrete set of fitness states. We train on our entire data set, using the Baum-Welch EM algorithm. Then we use the forward algorithm to predict how likely it is that your fitness is in a certain state, given the races you’ve run.
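The forward-algorithm step described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the poster's YAHMM code: the state grid, the emission standard deviation, the observations, and the function names `gaussian_pdf` and `forward_filter` are all assumptions made up for the example, and Baum-Welch training is not shown.

```python
import numpy as np

def gaussian_pdf(x, mean, std):
    """Normal density of x under N(mean, std^2); mean may be an array."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def forward_filter(observations, state_means, state_std, transition, initial):
    """Normalized forward algorithm.

    Returns P(fitness state at the last race | all observed races).
    """
    alpha = initial * gaussian_pdf(observations[0], state_means, state_std)
    alpha /= alpha.sum()
    for obs in observations[1:]:
        # Propagate through the transition matrix, then weight by the emission.
        alpha = (alpha @ transition) * gaussian_pdf(obs, state_means, state_std)
        alpha /= alpha.sum()
    return alpha

# Hypothetical setup: 41 discrete fitness states, each tied to a mean race time.
state_means = np.linspace(1400.0, 1800.0, 41)
n_states = len(state_means)
transition = np.full((n_states, n_states), 1.0 / n_states)  # uniform, as on the poster
initial = np.full(n_states, 1.0 / n_states)

observed_times = [1720.0, 1700.0]  # made-up race times
posterior = forward_filter(observed_times, state_means, 30.0, transition, initial)
most_likely_state = state_means[np.argmax(posterior)]
```

With uniform transition probabilities, each propagation step resets the belief to uniform, so the posterior depends only on the most recent race. Non-uniform transitions would be needed for older races to carry information.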
To establish a baseline for performance, we drew from the literature. Peter Riegel’s formula is still widely used today. We took a runner’s most recent race as (d1, t1) and extrapolated from it to find t2 at the target distance d2. Our oracle was the running club coach, Pattisue Plumer, a former Olympian. We tested on 16 runners: for each runner, we predicted a randomly chosen race and measured how far off the predicted time was.
Baseline Average Error: 11.71%
Oracle Average Error: 9.73%
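As a rough sketch of the baseline, Riegel's formula t2 = t1 (d2/d1)^1.06 and a percent-error measure fit in a few lines of Python. The function names and the example runner are hypothetical, and the poster does not state exactly how its average error was computed, so the absolute percent error used here is an assumption.

```python
def riegel_predict(t1, d1, d2, exponent=1.06):
    """Predict the time over distance d2 from a known time t1 over distance d1."""
    return t1 * (d2 / d1) ** exponent

def percent_error(predicted, actual):
    """Absolute error as a percentage of the actual time (assumed metric)."""
    return abs(predicted - actual) / actual * 100.0

# Hypothetical runner: most recent race is 5000 m in 20:00 (1200 s).
predicted_10k = riegel_predict(t1=1200.0, d1=5000.0, d2=10000.0)

# Compare against a made-up actual 10000 m time of 42:30 (2550 s).
error = percent_error(predicted_10k, actual=2550.0)
```

Any consistent units work for distance and time, since only the ratio d2/d1 enters the formula.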
Future Work
We are tuning the hyperparameters of the regression to see if we can find more reliable values. We are working out some bugs that keep us from fully implementing the HMM with 2 variables instead of 1. Lastly, we hope to eventually incorporate workout data into our model as emissions.
References
1. Riegel, Peter S. "Athletic Records and Human Endurance: A time-vs.-distance equation describing world-record performances may be used to compare the relative endurance capabilities of various groups of people." American Scientist (1981): 285-290.
2. BeautifulSoup: http://www.crummy.com/software/BeautifulSoup/
3. YAHMM: https://github.com/jmschrei/yahmm
Our Hidden Markov Model
We scraped our data from online results on tfrrs.org. Each athlete has a page with a full history of results. We randomly selected 47 athletes from track meets and 54 athletes from cross-country meets to train and test our models.
CS229 Final Project
$t = s\,d^{E}$
We adapted Riegel’s model to take in a set of races and parameterize fitness with values of (s, E), where t is the race time and d is the race distance. A lower s value means you’ve got speed, but a lower E means you’ve got strength.
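One way to recover (s, E) from a set of races is to note that the model is linear in log space: log t = log s + E log d. The sketch below fits that line by ordinary least squares. The poster does not say how the parameters were estimated, so this fitting method, the function names, and the synthetic data are all assumptions.

```python
import numpy as np

def fit_speed_endurance(distances_m, times_s):
    """Fit t = s * d**E by least squares on log t = log s + E * log d."""
    log_d = np.log(np.asarray(distances_m, dtype=float))
    log_t = np.log(np.asarray(times_s, dtype=float))
    E, log_s = np.polyfit(log_d, log_t, 1)  # slope, intercept
    return float(np.exp(log_s)), float(E)

def predict_time(s, E, distance_m):
    """Predicted race time for a distance under the fitted model."""
    return s * distance_m ** E

# Synthetic race history generated from s = 0.1, E = 1.06 (made-up values).
distances = np.array([1500.0, 3000.0, 5000.0, 10000.0])
times = 0.1 * distances ** 1.06

s_hat, E_hat = fit_speed_endurance(distances, times)
predicted_8k = predict_time(s_hat, E_hat, 8000.0)
```

At least two races at different distances are needed, since a single race cannot separate s from E.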
$t_2 = t_1 \left(\frac{d_2}{d_1}\right)^{1.06}$
Figure 2: Riegel’s Formula. Riegel’s formula was first published in Runner’s World magazine in 1977, modeling human performance over increasing distance.
Figure 3: Coach Pattisue, our oracle.
The closer a past race is in distance and date to the race you’re trying to predict, the better it should model that race. We used locally weighted linear regression, where $\theta = (X^{T} W X)^{-1} X^{T} W Y$.
We estimated error using leave-one-out cross-validation (LOOCV). While this model was fast, it was not descriptive and didn’t perform as well as we expected, so we moved to an HMM.
Features: Distance, Date
Output: race speed
XC RMSE: 1.12 m/s
Track RMSE: 0.82 m/s
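The locally weighted regression and its leave-one-out error estimate can be sketched as follows. The Gaussian weighting kernel, the bandwidth `tau`, the feature scaling, and the sample data are assumptions for illustration; the poster does not specify which kernel or hyperparameters were used.

```python
import numpy as np

def lwr_predict(X, y, x_query, tau=1.0):
    """Locally weighted linear regression prediction at x_query.

    Solves theta = (X^T W X)^{-1} X^T W y with Gaussian weights (assumed kernel).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_query = np.asarray(x_query, dtype=float)
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])   # add intercept column
    xq = np.concatenate([[1.0], x_query])
    sq_dist = np.sum((X - x_query) ** 2, axis=1)
    w = np.exp(-sq_dist / (2.0 * tau ** 2))         # closer races weigh more
    XtW = Xb.T * w                                  # same as Xb.T @ diag(w)
    theta = np.linalg.pinv(XtW @ Xb) @ (XtW @ y)
    return float(xq @ theta)

def loocv_rmse(X, y, tau=1.0):
    """Leave-one-out cross-validated RMSE of lwr_predict."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    errors = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        errors.append(lwr_predict(X[keep], y[keep], X[i], tau) - y[i])
    return float(np.sqrt(np.mean(np.square(errors))))

# Made-up races: columns are distance (km) and date (years since first race).
X = np.array([[1.5, 0.0], [3.0, 0.2], [5.0, 0.5],
              [8.0, 0.6], [10.0, 1.0], [5.0, 1.2]])
y = np.array([6.2, 5.9, 5.6, 5.3, 5.2, 5.7])        # race speed in m/s

speed = lwr_predict(X, y, x_query=[8.0, 1.1], tau=2.0)
rmse = loocv_rmse(X, y, tau=2.0)
```

Because distance and date sit on different scales, the features would normally be standardized before computing the kernel distances. The bandwidth `tau` is the kind of hyperparameter that LOOCV can be used to tune.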
Figure 5: One athlete’s data: Race distance vs. date vs. speed.
Figure 6: An example of our Markov Model calculation. Given the distribution of races that people have run, and your race, what will you run?
[Figure 6 panels: Initial Probability Distribution; Trained Distribution; Data; P(Race | 28:20 8K). Horizontal axis in each panel: 1400 to 1800.]