Introduction Method Simulation study Discussion Bibliography Miscellaneous Functional data analysis for activity profiles from wearable devices Ian McKeague Joint work with Hsin-wen Chang Institute of Statistical Science, Academia Sinica September 16, 2019


Functional data analysis for activity profiles from wearable devices

Ian McKeague

Joint work with Hsin-wen Chang
Institute of Statistical Science, Academia Sinica

September 16, 2019


Outline

• Motivation: inference for sensor data from wearable devices

• Activity profiles based on sensor data (no pre-alignment)

• Empirical likelihood based confidence bands and functional ANOVA testing for mean activity profiles

• Monotonic functional data: no need for smoothing

• Application: accelerometer data from NHANES


Wearable device data

• Inexpensive wearable sensors generate massive amounts of real-time data, with potentially exciting applications to physiological monitoring and health care delivery (mHealth).

• Inferential methods for comparing treatments based on wearable device (outcome) data not well developed.

• Serious challenges: unmeasured time-dependent confounders (e.g., circadian and dietary patterns), highly non-stationary, difficult to align across subjects, missing data, ...

• Connection to precision medicine: reinforcement learning for mHealth (Murphy et al., 2017): http://papers.nips.cc/paper/7179-action-centered-contextual-bandits.pdf

Tradeoff between exploration and exploitation. Assumes stationarity.


Example: blood pressure monitoring


Example: sweat monitoring (really!)

Noninvasive alternative to blood glucose monitoring (Nyein et al. 2019)


Example: real-time sweat measurements

Patches worn on the forehead, forearm, underarm, and back, and sweat parameters monitored simultaneously.


Example: physical activity monitoring

• National Health and Nutrition Examination Survey (NHANES)

• Accelerometer ‘counts’ recorded for 7 consecutive days in 1-minute epochs using an ActiGraph device

• Goal: to compare groups of subjects using their activity profiles

• Activity profile: the amount of time activity exceeds some level

• Typically in the physical activity literature, activity is classified using selected thresholds (e.g., as “sedentary”)


Example: gene therapy for mitochondrial disease

5,000 children are born with mitochondrial disease each year in the US.

Columbia RCT: 40 patients. Accelerometer: activPAL.


ActiGraph accelerometer readings (NHANES)

[Figure: two panels of ActiGraph intensity counts (vertical axis, ticks from 0 to 6000) plotted against time (horizontal axis, 0 to 10000; unit = 1 minute).]


Activity profiles as monotonic functional data

Sensor readings $X(t)$, $t \in [0, \tau]$, generate an activity profile:

$T_a = \mathrm{Leb}(\{t \in [0, \tau] : X(t) > a\}), \quad a \in \mathbb{R}.$

Sensor readings $X(t)$ over 25 minutes, with activity $T_a = 9$ minutes above level $a = 0.1$ and $T_a = 16$ minutes above level $a = -0.1$.

Note: Need to avoid pre-alignment of sensor data among subjects (needed in standard FDA approaches).
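The definition of the activity profile can be illustrated on discretely sampled readings. The sketch below is only an illustration: it assumes readings recorded in equal epochs, so that Lebesgue measure reduces to a count of epochs. The function name `activity_profile` and the data values are hypothetical. The last line shows why no alignment is needed: reordering the readings in time leaves the profile unchanged.

```python
def activity_profile(readings, levels, epoch=1.0):
    """Time (in units of `epoch`) that the readings spend strictly above each level."""
    return [epoch * sum(1 for x in readings if x > a) for a in levels]

# Hypothetical sensor readings in 1-minute epochs (illustrative values only).
readings = [0.0, 0.3, 0.2, -0.2, -0.05, 0.15, -0.3, 0.05]
levels = [-0.1, 0.0, 0.1]

profile = activity_profile(readings, levels)

# The profile depends only on the values taken, not on when they occur,
# so any time-reordering of the readings gives the same profile.
reordered = activity_profile(sorted(readings), levels)
print(profile, reordered == profile)
```

By construction the profile is non-increasing in the level a, which is the monotonicity the talk relies on to avoid smoothing.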


Raw and mean activity profiles from the NHANES data

Veterans aged 75-and-older (black, $n_1 = 160$), non-veterans aged 75-and-older (red, $n_2 = 279$), veterans aged 65–74 (blue, $n_3 = 139$), and non-veterans aged 65–74 (green, $n_4 = 348$)


Functional ANOVA for activity profiles

Goal: Compare $k$ mean activity profiles $\mu_j(a) = E\,T_{aj}$, $j = 1, \ldots, k$.

Functional ANOVA: tests $\mu_1(\cdot) = \ldots = \mu_k(\cdot)$ vs. omnibus alternative

• $T_{aj} = \mathrm{Leb}(\{t \in [0, \tau] : X_j(t) > a\})$, where $X_j = \{X_j(t), t \in [0, \tau]\}$ for sensor readings in the $j$th group, $\tau$ is total study time

• Observe $n_j$ iid copies $\{T_{aj1}, \ldots, T_{ajn_j}, a \in [\alpha_1, \alpha_2]\}$ of the activity profile $T_{aj}$. Weaker than iid observations of $X_j$. $[\alpha_1, \alpha_2]$ is the range of device readings of interest

• Approach based on a nonparametric likelihood ratio procedure: empirical likelihood (EL)


Empirical likelihood (EL)

• EL involves forming a ratio of two nonparametric likelihoods subject to constraints on the parameters of interest

• Two early papers: [Thomas and Grunkemeier, 1975], [Owen, 1988]

• Produces highly accurate confidence regions [Owen, 2001] and tests with optimal power [Kitamura et al., 2012]


Empirical likelihood

Observe $X_1, \ldots, X_n \sim$ iid $F$, $\mu = \mu(F)$ a parameter of interest.

NP likelihood ratio:

$R(\mu_0) = \frac{\sup\{L(F) : \mu(F) = \mu_0\}}{\sup\{L(F)\}}$

$L(F) = \prod_{i=1}^n p_i$ is the NP likelihood, $p_i$ = point mass (of $F$) at $X_i$.

Hypothesis tests:

Accept $\mu(F) = \mu_0$ when $R(\mu_0) \ge r_0$ for some threshold $r_0$.

Confidence regions: $\{\mu : R(\mu) \ge r_0\}$


EL for means

$\mu = E(X)$

$R(\mu) = \max\left\{\prod_{i=1}^n n p_i : \sum_{i=1}^n p_i X_i = \mu,\ p_i \ge 0,\ \sum_{i=1}^n p_i = 1\right\}$

Chi-squared calibration: Wilks type theorem for $-2 \log R(\mu_0)$.
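As a rough illustration of how the EL ratio for a mean can be computed, the sketch below uses the standard Lagrange-multiplier form of the maximising weights, $p_i = 1/\{n(1 + \lambda(X_i - \mu))\}$, and finds $\lambda$ by bisection. The function name `neg2_log_el_mean`, the sample, and the use of bisection are choices made for this example, not taken from the talk. For a scalar mean the chi-squared calibration uses one degree of freedom, whose 95% quantile is about 3.841.

```python
import math

def neg2_log_el_mean(xs, mu, iters=200):
    """Return -2 log R(mu) for the mean of xs (inf if mu is outside the open convex hull)."""
    z = [x - mu for x in xs]
    if min(z) >= 0.0 or max(z) <= 0.0:
        return math.inf  # no weights p_i > 0 can satisfy sum p_i X_i = mu
    # Maximising weights are p_i = 1 / (n (1 + lam * z_i)), where lam solves
    # g(lam) = sum z_i / (1 + lam * z_i) = 0; g is decreasing on (lo, hi).
    lo, hi = -1.0 / max(z), -1.0 / min(z)
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        if sum(zi / (1.0 + lam * zi) for zi in z) > 0.0:
            lo = lam
        else:
            hi = lam
    lam = 0.5 * (lo + hi)
    return 2.0 * sum(math.log(1.0 + lam * zi) for zi in z)

xs = [1.0, 2.0, 4.0, 7.0]                    # hypothetical sample with mean 3.5
stat_at_mean = neg2_log_el_mean(xs, 3.5)     # ratio equals 1 at the sample mean
stat_off_mean = neg2_log_el_mean(xs, 3.0)
CHI2_1_95 = 3.841                            # 95% quantile of chi-squared, 1 df
in_region = stat_off_mean <= CHI2_1_95       # is 3.0 inside the 95% EL region?
print(stat_at_mean, stat_off_mean, in_region)
```

Scanning `mu` over a grid and keeping the values with statistic below the chi-squared quantile gives the confidence region $\{\mu : R(\mu) \ge r_0\}$.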


EL for quantiles

Estimating equation:

$E(m(X, \mu)) = 0$, where for the $\alpha$-quantile

$m(X, \mu) = 1\{X \le \mu\} - \alpha.$

$R(\mu) = \max\left\{\prod_{i=1}^n n p_i : \sum_{i=1}^n p_i\, m(X_i, \mu) = 0,\ p_i \ge 0,\ \sum_{i=1}^n p_i = 1\right\}$

Chi-squared calibration:

Wilks theorem still applies: replace $X_i - \mu_0$ by $m(X_i, \mu_0)$.
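For the quantile estimating equation the maximisation can be done by hand, which makes a compact illustration. With $k = \#\{i : X_i \le \mu\}$, the constraint forces total weight $\alpha$ on the observations at or below $\mu$ and $1 - \alpha$ on the rest, and the product is largest when each block shares its weight equally. This gives $R(\mu) = (n\alpha/k)^k \, (n(1-\alpha)/(n-k))^{n-k}$ for $0 < k < n$. The sketch below codes that expression; the function name and data are hypothetical.

```python
import math

def neg2_log_el_quantile(xs, mu, alpha):
    """Return -2 log R(mu) for the alpha-quantile (0 < alpha < 1) of xs."""
    n = len(xs)
    k = sum(1 for x in xs if x <= mu)      # observations at or below mu
    if k == 0 or k == n:
        return math.inf                    # constraint cannot be met, so R(mu) = 0
    log_r = k * math.log(n * alpha / k) + (n - k) * math.log(n * (1.0 - alpha) / (n - k))
    return -2.0 * log_r

xs = [float(v) for v in range(1, 11)]            # hypothetical sample 1, ..., 10
at_median = neg2_log_el_quantile(xs, 5.0, 0.5)   # k = n * alpha, so R = 1
off_median = neg2_log_el_quantile(xs, 3.0, 0.5)
print(at_median, off_median)
```

The statistic is a step function of `mu`, changing only at the observed values.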


Activity profiles: relevant references

• Functional data literature
  • Wald-type ANOVA tests requiring curve registration [Gorecki and Smaga, 2018]
  • EL-based tests in a concurrent linear model [Wang et al., 2018], requiring curve registration and smoothing
  • Curve registration/alignment only useful on raw sensor data
  • Time warping alters the activity profiles!

• Physical activity literature
  • Only considers activity profiles at a few activity levels
  • e.g., the time spent in sedentary behavior could be represented by the accumulated amount of time below 100 counts/minute [Matthews et al., 2008]
  • The levels are typically chosen in an ad hoc fashion


Our contribution

EL-based functional ANOVA test for comparing groups of subjects based on their activity profile data, i.e., an omnibus test of

$H_0 : \mu_1(\cdot) = \ldots = \mu_k(\cdot)$

• greater efficiency using EL

• avoids issues in pre-aligning sensor data

• no smoothing needed (as activity profiles are monotonic)

• analyze entire activity profiles

• EL-based simultaneous confidence bands

• approach also applies to the quantiles of activity profiles


Approach applies to functions of bounded variation

Example: Area covered by Arctic sea ice (Nature, Sept 2019)

Example: Canadian temperature data (Ramsay & Silverman)


Canadian temperature data

Average daily temperature at 35 Canadian weather stations.


EL-based ANOVA test for activity profiles

• For an activity level $a$, construct the local EL ratio as

$R(a) = \frac{\sup\left\{\prod_{j=1}^k L(F_{aj}) : \mu_1(a) = \ldots = \mu_k(a)\right\}}{\sup\left\{\prod_{j=1}^k L(F_{aj})\right\}}$

where $L(F_{aj})$ is the NP likelihood based on observation of $T_{aj}$

• To test $H_0$ we propose the maximally selected EL statistic:

$K_n = \sup_{a \in [\alpha_1, \alpha_2]} \left[-2 \log R(a)\right].$
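One way to evaluate the maximally selected statistic numerically is sketched below. It is not necessarily how the authors compute it. It uses the fact that the numerator of $R(a)$ maximises over a common mean, so $-2 \log R(a)$ is the minimum over $\mu$ of the sum of the one-sample statistics $-2 \log R_j(\mu)$. Each term is convex in $\mu$, so a ternary search is used here. Assumptions for the example: profiles are evaluated on a finite grid of levels, `groups[j][i][l]` holds subject `i` of group `j` at level `l`, and all names and data are hypothetical.

```python
import math

def neg2_log_el_mean(xs, mu, iters=100):
    """One-sample -2 log R(mu) for a mean, via bisection on the Lagrange multiplier."""
    z = [x - mu for x in xs]
    if min(z) >= 0.0 or max(z) <= 0.0:
        return math.inf
    lo, hi = -1.0 / max(z), -1.0 / min(z)
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        if sum(zi / (1.0 + lam * zi) for zi in z) > 0.0:
            lo = lam
        else:
            hi = lam
    lam = 0.5 * (lo + hi)
    return 2.0 * sum(math.log(1.0 + lam * zi) for zi in z)

def local_stat(samples, iters=60):
    """-2 log R(a): minimise the summed one-sample statistics over a common mean."""
    lo = max(min(s) for s in samples)      # common mean must lie inside every hull
    hi = min(max(s) for s in samples)
    if lo >= hi:
        return math.inf
    def total(mu):
        return sum(neg2_log_el_mean(s, mu) for s in samples)
    for _ in range(iters):                 # ternary search on a convex function
        m1, m2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        if total(m1) < total(m2):
            hi = m2
        else:
            lo = m1
    return total(0.5 * (lo + hi))

def max_selected_el(groups):
    """K_n approximated by the maximum of -2 log R(a) over the grid of levels."""
    n_levels = len(groups[0][0])
    return max(local_stat([[subj[l] for subj in g] for g in groups])
               for l in range(n_levels))

# Hypothetical profiles at two activity levels (rows are subjects).
group_a = [[5.0, 2.0], [7.0, 3.0], [6.0, 1.0], [8.0, 4.0]]
group_c = [[v + 1.0 for v in subj] for subj in group_a]   # shifted copy of group_a
k_same = max_selected_el([group_a, group_a])
k_shift = max_selected_el([group_a, group_c])
print(k_same, k_shift)
```

Identical groups give a statistic near zero, and the shifted group gives a strictly larger value. Levels at which some group has zero sample variance return infinity here, which matches the positive-variance condition on $[\alpha_1, \alpha_2]$.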


Wilks type theorem for the EL-based ANOVA test

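Suppose $n_j/n \to \gamma_j > 0$ and $\inf_{a \in [\alpha_1, \alpha_2]} \mathrm{Var}(T_{aj}) > 0$, for each $j$.

Then, under $H_0$,

$K_n \xrightarrow{d} \sup_{a \in [\alpha_1, \alpha_2]} \sum_{j=1}^k w_j(a) \left[\frac{U_j(a)}{\sqrt{w_j(a)}} - U(a)\right]^2, \qquad U(a) = \sum_{j=1}^k \sqrt{w_j(a)}\, U_j(a),$

where the $U_j$ are independent zero-mean Gaussian processes, and the weights $w_j(a) \propto \gamma_j/\mathrm{Var}(T_{aj})$ are normalized to sum to 1 across the groups.

Proof: Bracketing-entropy CLT for stochastic processes with monotone sample paths furnishes $U_j$ as the limit of the process $\sqrt{n_j}\{\hat\mu_j(\cdot) - \mu_j(\cdot)\}/\sigma_j(\cdot)$.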


Nonparametric bootstrap calibration

• The limiting distribution can be bootstrapped by replacing $U_j(a)$ by its nonparametric bootstrap

$U_j^*(a) = \sqrt{n_j}\{\hat\mu_j^*(a) - \hat\mu_j(a)\}/\sigma_j(a)$

and replacing other unknowns by their estimates.

• $\hat\mu_j^*(a)$ is obtained by evaluating $\hat\mu_j(a)$ after resampling with replacement from $\{T_{aj1}, \ldots, T_{ajn_j}\}$, with each $T_{aji}$ regarded as a function of $a$

• Let $M_n^*$ denote the resulting bootstrap version of the limiting statistic

• Simulate $M_n^*$ by repeatedly resampling

• Compare the empirical quantiles of these bootstrapped values $M_n^*$ with our test statistic $K_n$
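The resampling steps above can be sketched as follows, under assumptions made only for this example: profiles are stored on a finite grid of levels as `groups[j][i][l]`, variances are estimated by the plug-in sample variance, $\gamma_j$ is replaced by $n_j/n$, and whole curves are resampled within each group. The function name and the simulated data are hypothetical.

```python
import math
import random

def bootstrap_sup_stats(groups, n_boot=200, seed=0):
    """Draw n_boot bootstrap values of the sup-statistic on the grid of levels."""
    rng = random.Random(seed)
    k, n_levels = len(groups), len(groups[0][0])
    n = sum(len(g) for g in groups)

    def mean_curve(g):
        return [sum(s[l] for s in g) / len(g) for l in range(n_levels)]

    mu = [mean_curve(g) for g in groups]
    sd = [[math.sqrt(sum((s[l] - m[l]) ** 2 for s in g) / len(g))
           for l in range(n_levels)] for g, m in zip(groups, mu)]
    # Weights proportional to (n_j / n) / variance, normalised to sum to 1 over groups.
    w = []
    for l in range(n_levels):
        raw = [(len(groups[j]) / n) / sd[j][l] ** 2 for j in range(k)]
        w.append([r / sum(raw) for r in raw])

    draws = []
    for _ in range(n_boot):
        u = []
        for j, g in enumerate(groups):
            star = mean_curve([rng.choice(g) for _subject in g])  # resample whole curves
            u.append([math.sqrt(len(g)) * (star[l] - mu[j][l]) / sd[j][l]
                      for l in range(n_levels)])
        sup = 0.0
        for l in range(n_levels):
            u_all = sum(math.sqrt(w[l][j]) * u[j][l] for j in range(k))
            val = sum(w[l][j] * (u[j][l] / math.sqrt(w[l][j]) - u_all) ** 2
                      for j in range(k))
            sup = max(sup, val)
        draws.append(sup)
    return draws

# Hypothetical monotone profiles on 4 levels for three groups of unequal size.
data_rng = random.Random(1)
groups = [[sorted((data_rng.uniform(0.0, 10.0) for _ in range(4)), reverse=True)
           for _ in range(n_j)] for n_j in (8, 10, 12)]
boots = bootstrap_sup_stats(groups, n_boot=200, seed=0)
crit = sorted(boots)[math.ceil(0.95 * len(boots)) - 1]   # empirical 95% quantile
print(crit)
```

With this sketch, the test would reject $H_0$ at the 5% level when the observed $K_n$ exceeds `crit`.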


Simulation study

• Compare our approach with tests from R package fdANOVA that apply to generic functional data:
  • Fmaxb: a maximally-selected F-statistic
  • GPF: an integrated F-statistic
  • TRP: random projections

• Striking differences in performance if groups are unbalanced

• Simulation model:
  • Generate $X_j(\cdot)$ as positive part of a scaled OU process; multiply the resulting $T_{aj}$ by an independent beta r.v.
  • $k = 3$, each group/scenario with distinct OU/beta parameters.
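A single draw from a simulation model of this type might look like the sketch below. The talk does not give the OU or beta parameters, so every numerical value here (`theta`, `sigma`, `scale`, the beta shape parameters, `tau`, `dt`) is a placeholder chosen for illustration, and the exact-discretisation scheme for the OU process is likewise an assumption of this example.

```python
import math
import random

def simulate_profile(levels, theta, sigma, scale, beta_a, beta_b,
                     tau=100.0, dt=0.1, rng=random):
    """One activity profile: positive part of a scaled OU path, times a beta variable."""
    decay = math.exp(-theta * dt)
    step_sd = sigma * math.sqrt((1.0 - decay ** 2) / (2.0 * theta))
    x = rng.gauss(0.0, sigma / math.sqrt(2.0 * theta))   # start in the stationary law
    path = []
    for _ in range(int(round(tau / dt))):
        path.append(scale * max(x, 0.0))                 # positive part, then scaling
        x = decay * x + step_sd * rng.gauss(0.0, 1.0)    # exact OU transition over dt
    shrink = rng.betavariate(beta_a, beta_b)             # independent beta multiplier
    return [shrink * dt * sum(1 for v in path if v > a) for a in levels]

levels = [0.0, 0.5, 1.0]
# Placeholder parameter values; not those used in the talk.
prof = simulate_profile(levels, theta=1.0, sigma=1.0, scale=1.0,
                        beta_a=5.0, beta_b=1.0, rng=random.Random(0))
print(prof)
```

Repeating the call $n_j$ times per group, with group-specific parameters, would give data in the form the test statistic needs.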


[Figure: one column of panels per scenario, A, B and C.]

Top row: $\mu_j(a) = E\,T_{aj}$. Bottom row: $\sigma_j^2(a) = \mathrm{Var}(T_{aj})$. Scenario A: identical $\mu_j(a)$. Scenario B: crossing $\mu_j(a)$. Scenario C: ordered $\mu_j(a)$.


Table: Empirical rejection rates (percentages) for functional ANOVA tests under various scenarios and sample sizes, based on 1000 Monte Carlo replications, 1000 bootstrap samples, and a nominal level of 5%.

scenario  (n1, n2, n3)     EL test  Fmaxb  GPF   TRP
A         (70, 100, 130)   5.4      3.8    3.6   2.3
A         (130, 100, 70)   5.8      8.1    9.5   4.0
B         (70, 100, 130)   76.3     50.2   30.4  67.1
B         (130, 100, 70)   75.6     70.0   60.0  61.7
C         (70, 100, 130)   81.3     60.9   55.2  61.6
C         (130, 100, 70)   77.6     74.5   69.5  57.4


NHANES data revisited

Sample means of raw accelerometer readings (over 4 consecutive days) comparing veterans aged 75-and-older (aqua) and veterans aged 65–74 (coral). Differences apparent even without curve alignment/smoothing.


Confidence bands of mean activity profiles

Right: EL (black), Wald-type (red), and MFD (blue) 95% simultaneous confidence bands for the mean activity profile (estimate in gray) of veterans aged 75-and-older, showing that the EL band is narrower than the Wald-type band and similar to the MFD band at most activity levels.

MFD (mean of functional data) band: uses local linear smoothing with cross-validated bandwidth selection [Degras, 2011, Degras, 2017].


Applying the various tests

Table: p-values from various functional ANOVA tests: veterans aged 75-and-older (group 1), non-veterans aged 75-and-older (group 2), veterans aged 65–74 (group 3), and non-veterans aged 65–74 (group 4).

Comparison     EL test   GPF       Fmaxb     TRP
all groups     < 0.001   < 0.001   < 0.001   < 0.001
group 1 vs 2   0.010     0.060     0.016     0.033
group 3 vs 4   0.345     0.416     0.365     0.579
group 1 vs 3   < 0.001   < 0.001   < 0.001   < 0.001
group 2 vs 4   < 0.001   < 0.001   < 0.001   < 0.001


Conclusion

• We have developed a new functional ANOVA test based on a maximally-selected local empirical likelihood statistic

• Approach applies generally to functional data with sample paths of bounded variation. Smoothing avoided.

• Simulation study shows that the new test is more accurate and more powerful than standard FDA approaches

• We applied the proposed method to wearable device data from NHANES and obtained more significant results than existing functional ANOVA tests

• Directions for future work: gaps in sensor observations, activity profiles regressed on high-dimensional predictors ...


Thank you!


Degras, D. A. (2011). Simultaneous confidence bands for nonparametric regression with functional data. Statistica Sinica, 21(4):1735–1765.

Degras, D. A. (2017). Simultaneous confidence bands for the mean of functional data. Wiley Interdisciplinary Reviews: Computational Statistics, 9(3):e1397.

Gorecki, T. and Smaga, L. (2018). fdANOVA: an R software package for analysis of variance for univariate and multivariate functional data. Computational Statistics. https://doi.org/10.1007/s00180-018-0842-7.

Kitamura, Y., Santos, A., and Shaikh, A. M. (2012). On the asymptotic optimality of empirical likelihood for testing moment restrictions. Econometrica, 80(1):413–423.

Matthews, C. E., Chen, K. Y., Freedson, P. S., Buchowski, M. S., Beech, B. M., Pate, R. R., and Troiano, R. P. (2008). Amount of time spent in sedentary behaviors in the United States, 2003–2004. American Journal of Epidemiology, 167(7):875–881.

Owen, A. B. (1988). Empirical likelihood ratio confidence intervals for a single functional. Biometrika, 75(2):237–249.

Owen, A. B. (2001). Empirical Likelihood. Chapman & Hall/CRC, Boca Raton.

Thomas, D. R. and Grunkemeier, G. L. (1975). Confidence interval estimation of survival probabilities for censored data. Journal of the American Statistical Association, 70:865–871.

Wang, H., Zhong, P.-S., Cui, Y., and Li, Y. (2018). Unified empirical likelihood ratio tests for functional concurrent linear models and the phase transition from sparse to dense functional data. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 80(2):343–364.


Specifying [α1, α2]

• In practice $\alpha_1$ and $\alpha_2$ may be specified by practitioners based on a range of accelerometer readings available in the particular context

• They could also be chosen in a data-driven fashion, say $\alpha_1 = \inf\{a : \mu(a) < 0.95\tau\}$ and $\alpha_2 = \sup\{a : \mu(a) > 0.05\tau\}$; this is what we use in our simulation studies and data analysis
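The data-driven rule can be approximated on a grid of activity levels, as in the minimal sketch below. It assumes the mean profile has already been estimated at each grid level, so the infimum and supremum are taken over grid points only. The function name and the numbers are hypothetical.

```python
def data_driven_range(levels, mean_profile, tau):
    """Grid version of alpha1 = inf{a: mu(a) < 0.95 tau}, alpha2 = sup{a: mu(a) > 0.05 tau}."""
    below = [a for a, m in zip(levels, mean_profile) if m < 0.95 * tau]
    above = [a for a, m in zip(levels, mean_profile) if m > 0.05 * tau]
    if not below or not above:
        raise ValueError("grid of levels does not cover the 5%-95% range of the profile")
    return min(below), max(above)

# Hypothetical estimated mean profile on a grid, with total study time tau = 100.
levels = [0, 1, 2, 3, 4, 5]
mean_profile = [98.0, 90.0, 60.0, 30.0, 8.0, 2.0]
alpha1, alpha2 = data_driven_range(levels, mean_profile, tau=100.0)
print(alpha1, alpha2)
```

A finer grid of levels brings the result closer to the exact infimum and supremum.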