Tracking data from GPS-enabled devices in R with package ...ucakhfr/talks/ATI2015.pdf ·
Transcript of Tracking data from GPS-enabled devices in R with package ...ucakhfr/talks/ATI2015.pdf ·
Tracking data from GPS-enabled devices in R withpackage ’trackeR’
Hannah Frick, Ioannis Kosmidis
http://www.ucl.ac.uk/~ucakhfr/
Outline
Introduction
Package structure and capabilities
Training distribution profiles
Real data example
Summary & Outlook
Introduction
Aim: Access and analyse tracking data from GPS-enabled devices.
Options include:
Plenty of commercial software from manufacturers of devices(Catapult, Garmin, TomTom, Polar, ...) or apps (endomondo,Runtastic, Nike+, ...) for different audiences.
Open-source software such as Golden Cheetah which provides alot of options for analysis.
R system for statistical computing: open-source with a vast collection ofadd-on packages for a large variety of tasks.
However, relatively few packages with a focus on sports and/orGPS tracking.
Package cycleRtools provides specialised cycling analysis in R.
We aim to provide the infrastructure to handle training data fromrunning and cycling to leverage the capabilities of R.
Package trackeR
Computational infrastructure: R with add-on packages, mainly zoo forordered observations as well as ggplot2 and ggmap for visualisations.
Available from Github viaR> # install.packages("devtools")R> devtools::install_github("hfrick/trackeR")
Provides functionality for 4 main areas:
read data
process data
summarise & analyse training sessions
visualise training sessions
Read data
Input:tcx file
readTCX
data frame
trackeRdata
trackeRdataobject
Input:db3 file
readDB3
readContainer
Process data
All observations:
Basic cleaning, e.g., no negative speeds or distances.
Split observations into training sessions: Any observations furtherapart than a specified threshold are considered to belong todifferent training sessions. Default is 2 hours.
Per session:
Distances as recorded by the GPS device can be corrected forelevation if necessary.
Imputation of speeds: Speed 0 is imputed for time periods whenthe device is paused.
Imputation of speeds
Either take speed measurements as provided or calculate fromcumulative distances:
Speedj =Distancej+1 − Distancej
Timej+1 − Timej. (1)
Remove observations with negative or missing speeds.
Check time stamps of observations for gaps of more than lgap
seconds.
In these gaps, impute m observations with 0 speed, latest positionfor latitude, longitude and altitude, and missing values for anythingelse.
Add m similar observations at the start and end of the session,assuming the athlete does not immediately start or stop moving.
Update distances based on imputed speeds via (1).
Imputation of speeds
Tj Tj+1
Tj∗ + h Tj∗ + 2h ...
Sj
0 ...
Sj+1
Tj
Sj Sj+1
Tj+1 − Tj > lgap
lskip lskip
Tj+1
0 0
Tj∗ Tj∗+1
0
Example for reading and processing data
Load package and read tcx file:R> library("trackeR")R> filepath <- system.file("extdata", "2013-06-04-174137.TCX",+ package = "trackeR")R> df <- readTCX(file = filepath, timezone = "GMT")
Process data into trackeRdata object:
R> run <- trackeRdata(df,+ ## basic information+ units = NULL, cycling = FALSE,+ ## separate sessions+ sessionThreshold = 2,+ ## elevation correction+ correctDistances = FALSE, country = NULL, mask = TRUE,+ ## impute speeds+ fromDistances = TRUE, lgap = 30, lskip = 5, m = 11)
Example for reading and processing data
Read and process data from a single file in one step:
R> run <- readContainer(file = filepath, type = "tcx",+ units = NULL, cycling = FALSE, sessionThreshold = 2)
Read and process data from all files in a directory in one step:
R> runs <- readDirectory(directory = "~/path/to/directory/",+ aggregate = TRUE,+ speedunit = list(tcx = "m_per_s", db3 = "km_per_h"),+ distanceunit = list(tcx = "m", db3 = "km"),+ verbose = TRUE)
Analysis & Visualisation
Visualise sessions:Plot profiles for speed, pace, elevation level, heart rate, etc.Plot route taken during a session on various maps.
Summarise sessions.Common summary statistics such as total distance, duration, timemoving, averages of speed, pace, power, cadence and heart rate,etc.Time spent training in specific heart rate or speed zones.
Visualise summaries of sessions.
Visualise sessions
5: 2013−06−05 6: 2013−06−06
90
120
150
2
4
6
8
10
heart.rate [bpm]
pace [min_per_km
]
05:3
0
05:4
0
05:5
0
06:0
0
11:5
0
12:0
0
12:1
0
12:2
0
12:3
0
12:4
0
time
Visualise sessions
5: 2013−06−05 6: 2013−06−06
0
25
50
75
0
2
4
6
altitude [m]
speed [m_per_s]
05:3
0
05:4
0
05:5
0
06:0
0
11:5
0
12:0
0
12:1
0
12:2
0
12:3
0
12:4
0
time
Visualise sessions
0
2
4
6
05:3
0
05:4
0
05:5
0
06:0
0
time
spee
d [m
_per
_s]
Visualise sessions
0
2
4
6
05:3
0
05:4
0
05:5
0
06:0
0
time
spee
d [m
_per
_s]
Visualise sessions
41.35
41.36
41.37
41.38
41.39
2.14 2.15 2.16 2.17 2.18longitude
latit
ude
0
2
4
6
speed
Summary of session data
R> summary(runs, session = 1)
*** Session 1 ***
Session times: 2013-06-01 17:32:15 - 2013-06-01 18:37:56Distance: 14130.7 mDuration: 1.09 hoursMoving time: 1.07 hoursAverage speed: 3.59 m_per_sAverage speed moving: 3.66 m_per_sAverage pace (per 1 km): 4:38 min:secAverage pace moving (per 1 km): 4:33 min:secAverage cadence: 88.66 steps_per_minAverage cadence moving: 88.81 steps_per_minAverage power: NA WAverage power moving: NA WAverage heart rate: 141.11 bpmAverage heart rate moving: 141.12 bpmAverage heart rate resting: 135.3 bpmWork to rest ratio: 48.26
Aggregation of high-frequency data
Records made with high frequency, e.g., 1 or 5 Hz, generating afairly big amount of data.
Some summary needed to describe training sessions in acomparable way.
Scalar summaries: time spent above x% of maximum aerobicspeed, set of quantiles, etc.
However, data are potentially noisy and appropriate degree ofsmoothing often is not straightforward.
Training distribution profiles (Kosmidis & Passfield, 2015) extendthe idea of “time spent above”.
Training distribution profile
The distribution profile is defined as the curve {v ,Πu(v)|v ≥ 0} foreach session u lasting tu seconds, with
Πu(v) =
∫ tu
0I(vu(t) > v)dt
being a function of the variable v under consideration (e.g., speed orheart rate) and I(·) denoting the indicator function. This describes thetime spent training above a certain threshold.
An observed version of Πu(v) can be calculated as
Pu(v) =nu∑
j=2
(Tu,j − Tu,j−1)I(Vu,j > v)
which can be conveniently smoothed respecting the positivity andmonotonicity of Πu(v).
Speed profile
0.0
2.5
5.0
7.509
:00
09:1
5
09:3
0
09:4
5
10:0
0
time
spee
d [m
_per
_s]
Distribution profile (unsmoothed)
0
1000
2000
3000
0 4 8 12speed
time
spen
t abo
ve th
resh
old
Distribution profile (smoothed)
0
1000
2000
3000
0 4 8 12speed
time
spen
t abo
ve th
resh
old
Concentration profile
0
25
50
75
0 4 8 12speed
dtim
e
Real data example
27 training session by one runner in June 2013.
Data available fromhttp://www.ucl.ac.uk/~ucakhfr/data/runs_ATI.rda.
Brief summary of the data.
Calculation of distribution and concentration profiles.
Exploratory analysis of speed concentration profiles via functionalprincipal component analysis.
Real data example
Load and summarise data:R> library("trackeR")R> load("runs.rda")R> ## summarise sessionsR> runsSummary <- summary(runs)R> plot(runsSummary, group = c("total", "moving"),+ what = c("avgSpeed", "distance", "duration", "avgHeartRate"))
Calculate distribution and concentration profiles:R> dpRuns <- distributionProfile(runs)R> dpRunsS <- smoother(dpRuns)R> cpRuns <- concentrationProfile(dpRunsS)R> plot(cpRuns, multiple = TRUE, smooth = FALSE)
Real data example
● ●
● ●
●
●●
●● ●
● ●●
●●
●
●
●
●
●
●
●
●●
●
●
●
● ●
● ●
●
●●
●● ●
● ●●
●●
●
●
●
●
●
●
●
●●
●
●
●
●●
● ●
●●
●
● ● ●●
●
●
● ●
●
●● ●
●
●
●
● ●
● ●
●
● ●
● ●
●
●●
● ● ● ●● ●
● ●● ●
● ● ●
●●
● ●
● ●
●
●
● ●
●
●
●
●
●
●
●●
●
●
●
●
●
●
● ●
●●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
● ●
●
●
●
●
●
●● ●
●
●● ●
●
●
●
●
●
●●
●
●
●
●
●
●
● ●
●
●
●
●
●
●● ●
● ●● ●
●
●
●
●
120130140150160170
2
3
4
5
10000
20000
40
80
120
160
avgHeartR
ate [bpm]avgS
peed [m_per_s]
distance [m]
duration [min]
Jun 03 Jun 10 Jun 17 Jun 24 Jul 01Date
type
●
●
total
moving
Real data example
heart.rate [bpm] speed [m_per_s]
0
200
400
0 50 100 150 200 250 0 4 8 12
dtim
eSession3
Session4
Session5
Session6
Session7
Session8
Session9
Session10
Session11
Session12
Session13
Session14
Session15
Session16
Session17
Session18
Session19
Session20
Session21
Session22
Session23
Session24
Session25
Functional principal components analysis
Goal: explore the structure of variability in the data and describecharacteristic features of the profiles.
Tool: functional principal components analysis (PCA).
Idea: Find a weight function such that it captures most of thevariability. This is called the first principal component (PC) or firstharmonic.
The second PC is chosen to capture most of the remainingvariability, and so on.
Do the components capture interpretable concepts?
Real data example
1 2 3 4
Principal component
Sha
re o
f var
ianc
e ca
ptur
ed [%
]
010
2030
4050
60
Real data example
0 2 4 6 8 10 12
050
100
150
PCA function 1 (Percentage of variability 62.7 )
argvals
Har
mon
ic 1
++++++++++++++++++++++++++
++++++
+
+
+
+
+
+
++++
+
+
+
+
+
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++−−−−−−−−−−−−−−−−−−−−−−−−−−
−−−−−
−−−−
−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−
Real data example
0 2 4 6 8 10 12
020
4060
8010
012
0PCA function 2 (Percentage of variability 24.8 )
argvals
Har
mon
ic 2
++++++++++++++++++++++++
+++++++
+
+
+
+
+
+++
+
+
+
+
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++−−−−−−−−−−−−−−−−−−−−−−−−−
−−−−−
−−−−−
−
−
−
−
−
−
−−−
−
−
−
−
−
−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−
Real data example
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
−50
0
50
100
20 40 60 80 100durationMoving
spee
d_pc
1
Real data example
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
−50
0
50
3.5 4.0 4.5 5.0 5.5avgSpeedMoving
spee
d_pc
2
Summary & Outlook
Package trackeR provides basic infrastructure in R to read,process, summarise and analyse running and cycling data fromGPS-enabled devices.
Provides an implementation of training distribution profiles(Kosmidis & Passfield, 2015) as an aggregation of session data tofunctional objects for further analysis.Further work: Extend infrastructure
Reading capabilities for more formats, e.g., gpx and fit.More analytic tools for cycling data, e.g., W’, power distributionprofile.Suggestions welcome!
Further work: Functional data analysis for training distributionprofiles.
References
Golden Cheetahhttp://www.goldencheetah.org/
Mackie J (2015). cycleRtools: Tools for Cycling Data Analysis.R package version 1.0.4.https://github.com/jmackie4/cycleRtools
Kosmidis I, Passfield L (2015). “Linking the Performance of EnduranceRunners to Training and Physiological Effects via Multi-ResolutionElastic Net.” ArXiv e-print arXiv:1506.01388.