Predicting New Race Times From Race History · cs229.stanford.edu/proj2015/247_poster.pdf ·...

Predicting New Race Times From Race History

Matthew Millett, TJ Melanson

Locally Weighted Linear Regression

Introduction: “What if…?”

Many runners like to ask the question, “What if?”
•  If you race a distance you’ve never raced before, how would you do?
•  What can it tell you about your fitness?

Current methods are limited to intuition, Jack Daniels’ running formula (which requires serious VO2max testing), or Peter Riegel’s oversimplified formula.

All are either imprecise or fail to generalize well to different distances.

Figure 1: A VO2max test in progress. Current tests like this are expensive and complicated, but give a good measure of fitness. Why don’t we just use race history?

Data

Initial tests: Baseline & Oracle

Figure 4: Data sources consisted of online race results.

We’re implementing a Hidden Markov Model to model states of fitness. We have a discrete set of fitness states. Emissions are normally distributed with respect to fitness, and currently all transition probabilities are uniform. We train on our entire data set using the Baum-Welch EM algorithm. Then we use the forward algorithm to estimate how likely it is that your fitness is in a given state, given the races you’ve run.
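The forward step described above can be sketched as follows. This is a minimal illustration, not the poster's implementation: it assumes the parameters (state means, standard deviations, initial and transition probabilities) have already been trained, and every numeric value and function name here is hypothetical. Each step is normalized, so the result is the probability of each fitness state given the races seen so far.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and std at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def forward_filter(observations, means, stds, initial, transition):
    """Normalized forward algorithm for an HMM with Gaussian emissions.

    Returns P(state | all observations) for the state at the last observation.
    """
    if not observations:
        raise ValueError("need at least one observation")
    n = len(means)
    # Initial step: prior times emission likelihood, then normalize.
    alpha = [initial[i] * gaussian_pdf(observations[0], means[i], stds[i])
             for i in range(n)]
    total = sum(alpha)
    alpha = [a / total for a in alpha]
    for obs in observations[1:]:
        # Propagate through the transition matrix, then weight by emission.
        predicted = [sum(alpha[i] * transition[i][j] for i in range(n))
                     for j in range(n)]
        alpha = [predicted[j] * gaussian_pdf(obs, means[j], stds[j])
                 for j in range(n)]
        total = sum(alpha)
        alpha = [a / total for a in alpha]
    return alpha


# Hypothetical model: three fitness states, uniform transitions.
means = [1500.0, 1600.0, 1700.0]
stds = [30.0, 30.0, 30.0]
uniform = [1.0 / 3.0] * 3
posterior = forward_filter([1510.0, 1690.0], means, stds,
                           uniform, [uniform, uniform, uniform])
```

With uniform transitions, each new race fully overrides the earlier ones, so the posterior depends only on the most recent observation. Learned, non-uniform transitions would let older races carry information forward.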

To establish a baseline for performance, we drew from the literature. Peter Riegel’s formula is still widely used today. We took a runner’s most recent race as (d1, t1) and extrapolated from there to find t2. Our oracle was the running club coach, PattiSue Plumer, a former Olympian. We tested on 16 runners: for each runner, we predicted a randomly chosen race and measured how far off the predicted time was.

Baseline Average Error: 11.71%
Oracle Average Error: 9.73%
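The baseline extrapolation can be sketched in a few lines. This is an illustration only, assuming Riegel's published form t2 = t1 · (d2/d1)^1.06; the function names and the example race are hypothetical and are not taken from the data set.

```python
def riegel_predict(d1, t1, d2, exponent=1.06):
    """Extrapolate a time t2 for distance d2 from a known race (d1, t1)."""
    if d1 <= 0 or d2 <= 0 or t1 <= 0:
        raise ValueError("distances and time must be positive")
    return t1 * (d2 / d1) ** exponent


def percent_error(predicted, actual):
    """Absolute error as a percentage of the actual time."""
    return abs(predicted - actual) / actual * 100.0


# Hypothetical example: a 5000 m race run in 1000 s, extrapolated to 10000 m.
t2 = riegel_predict(5000.0, 1000.0, 10000.0)
```

Averaging `percent_error` over one held-out race per runner would give an error figure of the same kind as the ones reported above.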

Future Work

We are tuning the hyperparameters of the regression to see if we can find more reliable values. We are working out some bugs that keep us from fully implementing the HMM with 2 variables instead of 1. Lastly, we hope to eventually incorporate workout data into our model as emissions.

References

1.  Riegel, Peter S. "Athletic Records and Human Endurance: A time-vs.-distance equation describing world-record performances may be used to compare the relative endurance capabilities of various groups of people." American Scientist (1981): 285-290.

2.  BeautifulSoup: http://www.crummy.com/software/BeautifulSoup/

3.  YAHMM: https://github.com/jmschrei/yahmm

Our Hidden Markov Model

We scraped our data from online results on tfrrs.org. Each athlete has a page with a full history of results. We randomly selected 47 athletes from track meets and 54 athletes from cross-country meets to train and test our models.

CS229 Final Project

We adapted Riegel’s model to take in a set of races and parameterize fitness with values of (s, E), where $t = s\,d^{E}$ for race time t and distance d. A lower s value means you’ve got speed, but a lower E means you’ve got strength.
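One plausible way to obtain (s, E) from a set of races is ordinary least squares on the log form, log t = log s + E log d. The poster does not say how the parameters were fit, so the approach, the function name, and the synthetic races below are all assumptions made for illustration.

```python
import math


def fit_speed_endurance(races):
    """Fit t = s * d**E by least squares on log t = log s + E * log d.

    races: list of (distance, time) pairs covering at least two distances.
    Returns the pair (s, E).
    """
    xs = [math.log(d) for d, _ in races]
    ys = [math.log(t) for _, t in races]
    n = len(races)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct distances")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    E = sxy / sxx                        # slope in log-log space
    s = math.exp(y_mean - E * x_mean)    # intercept mapped back from log space
    return s, E


# Synthetic races generated from s = 0.1, E = 1.06 (illustrative values only).
races = [(d, 0.1 * d ** 1.06) for d in (1500.0, 5000.0, 10000.0)]
s, E = fit_speed_endurance(races)
```

Because the synthetic times follow the model exactly, the fit recovers the generating parameters. Real race histories would scatter around the fitted line.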

$t_2 = t_1 \left(\frac{d_2}{d_1}\right)^{1.06}$

Figure 2: Riegel’s Formula. Riegel’s formula was first published in Runner’s World magazine in 1977, modeling human performance over increasing distance.

Figure 3: Coach PattiSue, our oracle.

The closer the distance and the closer the date of a race, the better it is likely to model the race you’re trying to predict. We used locally weighted linear regression, where $\theta = (X^T W X)^{-1} X^T W Y$.
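A minimal sketch of this regression, using only the standard library, is shown below. The Gaussian weighting kernel and the bandwidths `tau_dist` and `tau_days` are assumptions, since the poster does not specify how W was built. Features are centered on the query and scaled by the bandwidths, which leaves the prediction unchanged but keeps the normal equations well conditioned; the prediction is then simply the intercept.

```python
import math


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; too few distinct races")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def lwlr_predict(races, query, tau_dist=3000.0, tau_days=180.0):
    """Predict speed at query = (distance, day).

    races: list of (distance, day, speed) triples.
    Solves (X^T W X) theta = X^T W Y with Gaussian weights around the query.
    """
    qd, qt = query
    rows, weights, ys = [], [], []
    for d, day, speed in races:
        u = (d - qd) / tau_dist      # scaled distance offset from the query
        v = (day - qt) / tau_days    # scaled date offset from the query
        rows.append([1.0, u, v])
        weights.append(math.exp(-0.5 * (u * u + v * v)))
        ys.append(speed)
    k = 3
    xtwx = [[sum(w * r[i] * r[j] for w, r in zip(weights, rows))
             for j in range(k)] for i in range(k)]
    xtwy = [sum(w * r[i] * y for w, r, y in zip(weights, rows, ys))
            for i in range(k)]
    theta = solve_linear_system(xtwx, xtwy)
    return theta[0]  # offsets are zero at the query, so only the intercept remains
```

A call such as `lwlr_predict(races, (8000.0, 200.0))` would return a predicted speed in the same units as the training speeds. At least three races that do not lie on a single line in (distance, date) are needed for the system to be solvable.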

We estimated error using leave-one-out cross-validation (LOOCV). While this model was fast, it was not descriptive and didn’t perform as well as we expected, so we moved to an HMM.
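The LOOCV error estimate can be sketched generically: hold out each race in turn, predict it from the remaining races, and take the root mean squared error. The `mean_predictor` below is a deliberately trivial stand-in so the example runs on its own; it is not the regression model, and the sample values are hypothetical.

```python
import math


def loocv_rmse(samples, predict):
    """Leave-one-out cross-validated RMSE.

    samples: list of (features, target) pairs.
    predict: callable (training_samples, features) -> predicted target.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    squared_errors = []
    for i, (features, target) in enumerate(samples):
        training = samples[:i] + samples[i + 1:]  # everything except sample i
        error = predict(training, features) - target
        squared_errors.append(error ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def mean_predictor(training, _features):
    """Stand-in model: predicts the mean target of the training samples."""
    return sum(target for _, target in training) / len(training)


# Hypothetical samples: ((distance in m, day), speed in m/s).
samples = [((1500.0, 0.0), 6.0), ((5000.0, 30.0), 5.0), ((10000.0, 60.0), 4.0)]
rmse = loocv_rmse(samples, mean_predictor)
```

Any model with the same `(training_samples, features)` call shape can be passed in place of `mean_predictor`.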

Features: Distance, Date
Output: race speed

XC RMSE: 1.12 m/s
Track RMSE: 0.82 m/s

Figure 5: One athlete’s data: Race distance vs. date vs. speed.

Figure 6: An example of our Markov Model calculation. Given the distribution of races that people have run, and your race, what will you run?

[Figure 6 panels: Initial Probability Distribution; Trained Distribution; Data; P(Race | 28:20 8K). Horizontal axes run from 1400 to 1800.]
