Tracking data from GPS-enabled devices in R with package ...ucakhfr/talks/ATI2015.pdf ·

Tracking data from GPS-enabled devices in R withpackage ’trackeR’

Hannah Frick, Ioannis Kosmidis

http://www.ucl.ac.uk/~ucakhfr/

http://www.ucl.ac.uk/~ucakhfr/

Outline

Introduction

Package structure and capabilities

Training distribution profiles

Real data example

Summary & Outlook

Introduction

Aim: Access and analyse tracking data from GPS-enabled devices.

Options include:

Plenty of commercial software from manufacturers of devices(Catapult, Garmin, TomTom, Polar, ...) or apps (endomondo,Runtastic, Nike+, ...) for different audiences.

Open-source software such as Golden Cheetah which provides alot of options for analysis.

R system for statistical computing: open-source with a vast collection ofadd-on packages for a large variety of tasks.

However, relatively few packages with a focus on sports and/orGPS tracking.

Package cycleRtools provides specialised cycling analysis in R.

We aim to provide the infrastructure to handle training data fromrunning and cycling to leverage the capabilities of R.

Package trackeR

Computational infrastructure: R with add-on packages, mainly zoo forordered observations as well as ggplot2 and ggmap for visualisations.

Available from Github viaR> # install.packages("devtools")R> devtools::install_github("hfrick/trackeR")

Provides functionality for 4 main areas:

read data

process data

summarise & analyse training sessions

visualise training sessions

Read data

Input:tcx file

readTCX

data frame

trackeRdata

trackeRdataobject

Input:db3 file

readDB3

readContainer

Process data

All observations:

Basic cleaning, e.g., no negative speeds or distances.

Split observations into training sessions: Any observations furtherapart than a specified threshold are considered to belong todifferent training sessions. Default is 2 hours.

Per session:

Distances as recorded by the GPS device can be corrected forelevation if necessary.

Imputation of speeds: Speed 0 is imputed for time periods whenthe device is paused.

Imputation of speeds

Either take speed measurements as provided or calculate fromcumulative distances:

Speedj =Distancej+1 − Distancej

Timej+1 − Timej. (1)

Remove observations with negative or missing speeds.

Check time stamps of observations for gaps of more than lgap

seconds.

In these gaps, impute m observations with 0 speed, latest positionfor latitude, longitude and altitude, and missing values for anythingelse.

Add m similar observations at the start and end of the session,assuming the athlete does not immediately start or stop moving.

Update distances based on imputed speeds via (1).

Imputation of speeds

Tj Tj+1

Tj∗ + h Tj∗ + 2h ...

Sj

0 ...

Sj+1

Tj

Sj Sj+1

Tj+1 − Tj > lgap

lskip lskip

Tj+1

0 0

Tj∗ Tj∗+1

0

Example for reading and processing data

Load package and read tcx file:R> library("trackeR")R> filepath <- system.file("extdata", "2013-06-04-174137.TCX",+ package = "trackeR")R> df <- readTCX(file = filepath, timezone = "GMT")

Process data into trackeRdata object:

R> run <- trackeRdata(df,+ ## basic information+ units = NULL, cycling = FALSE,+ ## separate sessions+ sessionThreshold = 2,+ ## elevation correction+ correctDistances = FALSE, country = NULL, mask = TRUE,+ ## impute speeds+ fromDistances = TRUE, lgap = 30, lskip = 5, m = 11)

Example for reading and processing data

Read and process data from a single file in one step:

R> run <- readContainer(file = filepath, type = "tcx",+ units = NULL, cycling = FALSE, sessionThreshold = 2)

Read and process data from all files in a directory in one step:

R> runs <- readDirectory(directory = "~/path/to/directory/",+ aggregate = TRUE,+ speedunit = list(tcx = "m_per_s", db3 = "km_per_h"),+ distanceunit = list(tcx = "m", db3 = "km"),+ verbose = TRUE)

Analysis & Visualisation

Visualise sessions:Plot profiles for speed, pace, elevation level, heart rate, etc.Plot route taken during a session on various maps.

Summarise sessions.Common summary statistics such as total distance, duration, timemoving, averages of speed, pace, power, cadence and heart rate,etc.Time spent training in specific heart rate or speed zones.

Visualise summaries of sessions.

Visualise sessions

5: 2013−06−05 6: 2013−06−06

90

120

150

2

4

6

8

10

heart.rate [bpm]

pace [min_per_km

]

05:3

0

05:4

0

05:5

0

06:0

0

11:5

0

12:0

0

12:1

0

12:2

0

12:3

0

12:4

0

time

Visualise sessions

5: 2013−06−05 6: 2013−06−06

0

25

50

75

0

2

4

6

altitude [m]

speed [m_per_s]

05:3

0

05:4

0

05:5

0

06:0

0

11:5

0

12:0

0

12:1

0

12:2

0

12:3

0

12:4

0

time

Visualise sessions

0

2

4

6

05:3

0

05:4

0

05:5

0

06:0

0

time

spee

d [m

_per

_s]

Visualise sessions

41.35

41.36

41.37

41.38

41.39

2.14 2.15 2.16 2.17 2.18longitude

latit

ude

0

2

4

6

speed

Summary of session data

R> summary(runs, session = 1)

*** Session 1 ***

Session times: 2013-06-01 17:32:15 - 2013-06-01 18:37:56Distance: 14130.7 mDuration: 1.09 hoursMoving time: 1.07 hoursAverage speed: 3.59 m_per_sAverage speed moving: 3.66 m_per_sAverage pace (per 1 km): 4:38 min:secAverage pace moving (per 1 km): 4:33 min:secAverage cadence: 88.66 steps_per_minAverage cadence moving: 88.81 steps_per_minAverage power: NA WAverage power moving: NA WAverage heart rate: 141.11 bpmAverage heart rate moving: 141.12 bpmAverage heart rate resting: 135.3 bpmWork to rest ratio: 48.26

Aggregation of high-frequency data

Records made with high frequency, e.g., 1 or 5 Hz, generating afairly big amount of data.

Some summary needed to describe training sessions in acomparable way.

Scalar summaries: time spent above x% of maximum aerobicspeed, set of quantiles, etc.

However, data are potentially noisy and appropriate degree ofsmoothing often is not straightforward.

Training distribution profiles (Kosmidis & Passfield, 2015) extendthe idea of “time spent above”.

Training distribution profile

The distribution profile is defined as the curve {v ,Πu(v)|v ≥ 0} foreach session u lasting tu seconds, with

Πu(v) =

∫ tu

0I(vu(t) > v)dt

being a function of the variable v under consideration (e.g., speed orheart rate) and I(·) denoting the indicator function. This describes thetime spent training above a certain threshold.

An observed version of Πu(v) can be calculated as

Pu(v) =nu∑

j=2

(Tu,j − Tu,j−1)I(Vu,j > v)

which can be conveniently smoothed respecting the positivity andmonotonicity of Πu(v).

Speed profile

0.0

2.5

5.0

7.509

:00

09:1

5

09:3

0

09:4

5

10:0

0

time

spee

d [m

_per

_s]

Distribution profile (unsmoothed)

0

1000

2000

3000

0 4 8 12speed

time

spen

t abo

ve th

resh

old

Distribution profile (smoothed)

0

1000

2000

3000

0 4 8 12speed

time

spen

t abo

ve th

resh

old

Concentration profile

0

25

50

75

0 4 8 12speed

dtim

e

Real data example

27 training session by one runner in June 2013.

Data available fromhttp://www.ucl.ac.uk/~ucakhfr/data/runs_ATI.rda.

Brief summary of the data.

Calculation of distribution and concentration profiles.

Exploratory analysis of speed concentration profiles via functionalprincipal component analysis.

http://www.ucl.ac.uk/~ucakhfr/data/runs_ATI.rda

Real data example

Load and summarise data:R> library("trackeR")R> load("runs.rda")R> ## summarise sessionsR> runsSummary <- summary(runs)R> plot(runsSummary, group = c("total", "moving"),+ what = c("avgSpeed", "distance", "duration", "avgHeartRate"))

Calculate distribution and concentration profiles:R> dpRuns <- distributionProfile(runs)R> dpRunsS <- smoother(dpRuns)R> cpRuns <- concentrationProfile(dpRunsS)R> plot(cpRuns, multiple = TRUE, smooth = FALSE)

Real data example

● ●

● ●

●

●●

●● ●

● ●●

●●

●

●

●

●

●

●

●

●●

●

●

●

● ●

● ●

●

●●

●● ●

● ●●

●●

●

●

●

●

●

●

●

●●

●

●

●

●●

● ●

●●

●

● ● ●●

●

●

● ●

●

●● ●

●

●

●

● ●

● ●

●

● ●

● ●

●

●●

● ● ● ●● ●

● ●● ●

● ● ●

●●

● ●

● ●

●

●

● ●

●

●

●

●

●

●

●●

●

●

●

●

●

●

● ●

●●

●●

●

●

●

●

●

●●

●

●

●

●

●

●

● ●

●

●

●

●

●

●● ●

●

●● ●

●

●

●

●

●

●●

●

●

●

●

●

●

● ●

●

●

●

●

●

●● ●

● ●● ●

●

●

●

●

120130140150160170

2

3

4

5

10000

20000

40

80

120

160

avgHeartR

ate [bpm]avgS

peed [m_per_s]

distance [m]

duration [min]

Jun 03 Jun 10 Jun 17 Jun 24 Jul 01Date

type

●

●

total

moving

Real data example

heart.rate [bpm] speed [m_per_s]

0

200

400

0 50 100 150 200 250 0 4 8 12

dtim

eSession3

Session4

Session5

Session6

Session7

Session8

Session9

Session10

Session11

Session12

Session13

Session14

Session15

Session16

Session17

Session18

Session19

Session20

Session21

Session22

Session23

Session24

Session25

Functional principal components analysis

Goal: explore the structure of variability in the data and describecharacteristic features of the profiles.

Tool: functional principal components analysis (PCA).

Idea: Find a weight function such that it captures most of thevariability. This is called the first principal component (PC) or firstharmonic.

The second PC is chosen to capture most of the remainingvariability, and so on.

Do the components capture interpretable concepts?

Real data example

1 2 3 4

Principal component

Sha

re o

f var

ianc

e ca

ptur

ed [%

]

010

2030

4050

60

Real data example

0 2 4 6 8 10 12

050

100

150

PCA function 1 (Percentage of variability 62.7 )

argvals

Har

mon

ic 1

++++++++++++++++++++++++++

++++++

+

+

+

+

+

+

++++

+

+

+

+

+

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++−−−−−−−−−−−−−−−−−−−−−−−−−−

−−−−−

−−−−

−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−

Real data example

0 2 4 6 8 10 12

020

4060

8010

012

0PCA function 2 (Percentage of variability 24.8 )

argvals

Har

mon

ic 2

++++++++++++++++++++++++

+++++++

+

+

+

+

+

+++

+

+

+

+

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++−−−−−−−−−−−−−−−−−−−−−−−−−

−−−−−

−−−−−

−

−

−

−

−

−

−−−

−

−

−

−

−

−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−

Real data example

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

−50

0

50

100

20 40 60 80 100durationMoving

spee

d_pc

1

Real data example

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

● ●

●

●

●

●

●

●

●

●

−50

0

50

3.5 4.0 4.5 5.0 5.5avgSpeedMoving

spee

d_pc

2

Summary & Outlook

Package trackeR provides basic infrastructure in R to read,process, summarise and analyse running and cycling data fromGPS-enabled devices.

Provides an implementation of training distribution profiles(Kosmidis & Passfield, 2015) as an aggregation of session data tofunctional objects for further analysis.Further work: Extend infrastructure

Reading capabilities for more formats, e.g., gpx and fit.More analytic tools for cycling data, e.g., W’, power distributionprofile.Suggestions welcome!

Further work: Functional data analysis for training distributionprofiles.

References

Golden Cheetahhttp://www.goldencheetah.org/

Mackie J (2015). cycleRtools: Tools for Cycling Data Analysis.R package version 1.0.4.https://github.com/jmackie4/cycleRtools

Kosmidis I, Passfield L (2015). “Linking the Performance of EnduranceRunners to Training and Physiological Effects via Multi-ResolutionElastic Net.” ArXiv e-print arXiv:1506.01388.

http://www.goldencheetah.org/

https://github.com/jmackie4/cycleRtools

Tracking data from GPS-enabled devices in R with package ...ucakhfr/talks/ATI2015.pdf ·

Documents

Transcript of Tracking data from GPS-enabled devices in R with package ...ucakhfr/talks/ATI2015.pdf ·