
Tutorial III:

Joint Models for Multivariate Longitudinal Data

Joint Modeling and Beyond, Hasselt 2016

Geert Verbeke

Interuniversity Institute for Biostatistics and statistical Bioinformatics


http://perswww.kuleuven.be/geert verbeke

Contents

1 Examples

2 Approaches to Simultaneously Analyze Multiple Outcomes

3 Random-effects Models for High-dimensional Multivariate Longitudinal Data

4 Conclusions

5 Related Work


Chapter 1

Examples

. Example 1: Hearing Data

. Example 2: Fitness Data


1.1 Example 1: Hearing Data∗

∗Fieuws & Verbeke, Biometrics 2006

• Threshold sound pressure levels (dB), on both ears, at 11 frequencies: 125 → 8000 Hz

• Observations from 603 males, with up to 15 obs./subject.



• Research questions:

. Is the relation between hearing loss and age the same for all frequencies?

. How are subject-specific evolutions for the different frequencies related?


1.2 Example 2: Fitness Data∗

∗Fieuws, Verbeke, Boen, & Delecluse, Applied Statistics 2006

• Intervention study on 105 elderly participants

• Randomization:

. classical fitness: 3 weekly visits to gym

. distance coaching program with emphasis on incorporating physical activities in daily life

• Aim is to study the effect on psycho-cognitive functioning


• Psycho-cognitive functioning: 106 dichotomised items, 7 different questionnaires, each measuring a latent component of psycho-cognitive functioning:

1. Physical well-being (10)

2. Psychological well-being (14)

3. Self-esteem (10)

4. Physical self-perception (30)

5. Degree of opposition to physical activities (21)

6. Perceived self-efficacy towards physical activity (5)

7. Motivation for intervention program (16)


• Research questions:

. Is there an overall treatment effect?

. How are the various components of psycho-cognitive functioning associated?


Chapter 2

Approaches to Simultaneously Analyze Multiple Outcomes

. Introduction and notation

. Why joint modeling?

. Multivariate models

. Conditional models

. Shared-parameter models

. Random-effects models

. Methods based on dimension reduction


2.1 Introduction and Notation

• Let Y1 and Y2 be two outcomes measured on a number of subjects for which joint modeling is of scientific interest.

• We focus on settings where multiple measurements are available for both, potentially but not necessarily longitudinal:

. E.g., Hearing threshold at 125 Hz and hearing threshold at 500 Hz

. E.g., Physical well-being and Psychological well-being

• Same ideas can be applied if only one observation is available:

. E.g., Longitudinal outcome modeled jointly with time-to-event

. E.g., Longitudinal outcome modeled jointly with dropout indicator


• We will discuss various possible approaches to construct a joint density f (y1, y2) of (Y1, Y2)

• Extensions to more than 2 outcomes are (relatively) straightforward


2.2 Why Joint Modeling?

• Joint tests for fixed effects (e.g., common average trend)

• Interest in association structure (e.g., association of evolutions)

• Modelling changes in shape

• Improving classification results

• . . .


2.3 Multivariate Models

• General idea: Specify f (y1, y2) directly

• Advantages:

. Allows for direct inferences for marginal characteristics of Y1, Y2, and their associations

. Symmetric treatment of Y1 and Y2

• Disadvantages:

. Difficult with Y1 and Y2 of a different type

. Difficult for unbalanced data since association between Y1 and Y2 needs to be modeled directly

. Difficult to extend to higher dimensions


2.4 Conditional Models

• General idea: Factorize f (y1, y2) as

f (y1, y2) = f (y1|y2)f (y2) = f (y2|y1)f (y1)

• Advantage:

. Modeling tasks reduced to specifying models for each of the outcomes separately

• Disadvantages:

. With Y1 and Y2, specifying f (y1|y2) requires careful reflection about plausible associations between response Y1 and time-varying covariate Y2

. No direct marginal inferences.


. For example, based on f (y1, y2) = f (y1|y2)f (y2), E(Y1) requires

$$E(Y_1) = E[E(Y_1 \mid Y_2)] = \int \left[ \int y_1 \, f(y_1 \mid y_2) \, dy_1 \right] f(y_2) \, dy_2$$

. E(Y1) not necessarily of the same parametric form as E(Y1|Y2) (e.g., logistic)

. Effects on Y1 may be attenuated by conditioning on Y2.

. Compatible specification of f (y1|y2)f (y2) and f (y2|y1)f (y1) often requires direct specification of f (y1, y2)

. Higher dimensions: Many possible factorizations
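
The marginalization E(Y1) = E[E(Y1|Y2)] above can be checked numerically. The following is a minimal sketch, assuming a binary Y1 with a logistic conditional model and a standard normal Y2; the coefficients a and b and all function names are hypothetical and chosen only for illustration. It suggests that the marginal mean is pulled towards 0.5 relative to the conditional mean evaluated at E(Y2), so the marginal mean does not simply inherit the logistic form with the same coefficients.

```python
import math

def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))

def std_normal_pdf(z):
    """Density of N(0, 1), used here as the assumed f(y2)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def marginal_mean(a, b, lo=-10.0, hi=10.0, n=4001):
    """E(Y1) = integral of E(Y1 | Y2 = y2) f(y2) dy2, by the trapezoidal rule,
    with E(Y1 | Y2 = y2) = expit(a + b * y2) and Y2 ~ N(0, 1)."""
    h = (hi - lo) / (n - 1)
    total = 0.0
    for k in range(n):
        y2 = lo + k * h
        weight = 0.5 if k in (0, n - 1) else 1.0
        total += weight * expit(a + b * y2) * std_normal_pdf(y2)
    return total * h

a, b = 1.0, 2.0                     # hypothetical conditional-model coefficients
conditional_at_mean = expit(a)      # E(Y1 | Y2 = E(Y2)), with E(Y2) = 0
marginal = marginal_mean(a, b)      # E(Y1), averaged over the distribution of Y2
print(conditional_at_mean, marginal)
```
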


• Typical example

. Selection models for longitudinal data subject to informative dropout

. Marginal model f (y1) for the longitudinal outcome Y1

. Conditional model for dropout time Y2:

P [Y2 = t | Y1(t), Y1(t − 1), . . .]


2.5 Shared-parameter Models

• General idea: Latent variables shared by Y1 and Y2 imply associations

• Let b denote a vector of random effects, with density f (b) (often normal)

• Assume conditional independence: Y1⊥⊥Y2|b

• Joint density of (Y1, Y2) obtained from

$$f(y_1, y_2) = \int f(y_1, y_2 \mid b) \, f(b) \, db = \int f(y_1 \mid b) \, f(y_2 \mid b) \, f(b) \, db$$
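
The integral above can be evaluated numerically even when Y1 and Y2 are of different types. The following is a minimal sketch, assuming a single measurement of a continuous Y1 (normal given b) and of a binary Y2 (logistic given b), a scalar normal random effect, and a plain trapezoidal rule; all parameter values and function names are hypothetical. As a sanity check, summing the joint density over y2 = 0, 1 should recover the marginal normal density of Y1.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def joint_density(y1, y2, beta1, beta3, gamma, var_b, var_e, n=4001):
    """f(y1, y2) = integral of f(y1 | b) f(y2 | b) f(b) db (trapezoidal rule).

    Assumed conditional models:
      Y1 | b ~ N(beta1 + b, var_e)              (continuous outcome)
      P(Y2 = 1 | b) = expit(beta3 + gamma * b)  (binary outcome)
      b ~ N(0, var_b)
    """
    sd_b = math.sqrt(var_b)
    lo, hi = -10.0 * sd_b, 10.0 * sd_b
    h = (hi - lo) / (n - 1)
    total = 0.0
    for k in range(n):
        b = lo + k * h
        p = expit(beta3 + gamma * b)
        f_y2 = p if y2 == 1 else 1.0 - p
        weight = 0.5 if k in (0, n - 1) else 1.0
        total += weight * normal_pdf(y1, beta1 + b, var_e) * f_y2 * normal_pdf(b, 0.0, var_b)
    return total * h

# Hypothetical parameter values
pars = dict(beta1=0.5, beta3=-0.2, gamma=1.5, var_b=2.0, var_e=1.0)
f0 = joint_density(1.2, 0, **pars)
f1 = joint_density(1.2, 1, **pars)

# Marginally, Y1 ~ N(beta1, var_b + var_e), so f0 + f1 should match this density.
marginal_y1 = normal_pdf(1.2, pars["beta1"], pars["var_b"] + pars["var_e"])
print(f0 + f1, marginal_y1)
```
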


• Advantages:

. Y1 and Y2 can be of different type

. Parameters in joint model f (y1, y2) have the same interpretation as in the ‘univariate’ models f (y1) and f (y2)

. Extension to higher dimensions very straightforward

• Disadvantage:

. Very strong assumptions about the association between Y1 and Y2

. Assume random-intercepts models for Y1 and Y2:

Y1(t) = β1 + b + β2t + e1(t)

Y2(t) = β3 + γb + β4t + e2(t)

with $e_1(t) \sim N(0, \sigma_1^2)$, $e_2(t) \sim N(0, \sigma_2^2)$, and $b \sim N(0, \sigma_b^2)$


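. Implied association structure (s ≠ t):

$$\mathrm{Corr}\{Y_1(s), Y_1(t)\} = \frac{\sigma_b^2}{\sigma_b^2 + \sigma_1^2}$$

$$\mathrm{Corr}\{Y_2(s), Y_2(t)\} = \frac{\gamma^2 \sigma_b^2}{\gamma^2 \sigma_b^2 + \sigma_2^2}$$

$$\mathrm{Corr}\{Y_1(s), Y_2(t)\} = \frac{\gamma \sigma_b^2}{\sqrt{\sigma_b^2 + \sigma_1^2} \, \sqrt{\gamma^2 \sigma_b^2 + \sigma_2^2}} = \sqrt{\mathrm{Corr}\{Y_1(s), Y_1(t)\}} \, \sqrt{\mathrm{Corr}\{Y_2(s), Y_2(t)\}}$$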

. Association between Y1 and Y2 directly follows from association structures for Y1 and Y2

. In some cases this is problematic, especially in higher dimensions
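
The implied correlations of the shared random-intercept model can be evaluated directly from the variance components. The following is a small sketch with hypothetical parameter values; it illustrates that, once the two within-outcome correlations are fixed, the cross-correlation is fully determined. The product identity holds as written for γ > 0; for negative γ the cross-correlation takes the sign of γ.

```python
import math

def shared_intercept_correlations(gamma, var_b, var_e1, var_e2):
    """Implied correlations (s != t) for
       Y1(t) = beta1 + b + beta2*t + e1(t),
       Y2(t) = beta3 + gamma*b + beta4*t + e2(t)."""
    var_y1 = var_b + var_e1
    var_y2 = gamma ** 2 * var_b + var_e2
    corr_11 = var_b / var_y1
    corr_22 = gamma ** 2 * var_b / var_y2
    corr_12 = gamma * var_b / math.sqrt(var_y1 * var_y2)
    return corr_11, corr_22, corr_12

# Hypothetical variance components
c11, c22, c12 = shared_intercept_correlations(gamma=0.8, var_b=4.0, var_e1=1.0, var_e2=2.0)
print(c11, c22, c12, math.sqrt(c11) * math.sqrt(c22))
```
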


• Typical example:

. Longitudinal outcome Y1 and time-to-event Y2

. Assume a mixed model for Y1:

Y1(t) = (β1 + b1) + (β2 + b2)t + e1(t)

with b = (b1, b2)′ ∼ N (0, D)

. Assume proportional hazard model for Y2 with hazard depending on b:

$$\lim_{h \to 0} \frac{\Pr\{t \le T < t + h \mid T \ge t, b\}}{h} = \lambda_0(t) \exp\{\alpha g(b)\}$$

for some pre-specified function g(b).

. Hazard depends on some feature g(b) of the longitudinal trajectories.

. Relation between Y1 and Y2 addressed via inference for α
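
To make the role of α and g(b) concrete, here is a minimal sketch under assumptions not taken from the slides: a constant baseline hazard and g(b) = b2, the subject-specific slope. All names and values are hypothetical. With a constant baseline, the conditional survival function has the closed form exp{−λ0 t exp(α g(b))}.

```python
import math

def hazard(t, b, alpha, g, baseline):
    """lambda(t | b) = lambda0(t) * exp(alpha * g(b))."""
    return baseline(t) * math.exp(alpha * g(b))

def survival_constant_baseline(t, b, alpha, g, lam0):
    """S(t | b) when the baseline hazard is the constant lam0 (an assumption)."""
    return math.exp(-lam0 * t * math.exp(alpha * g(b)))

def slope(b):
    """Hypothetical choice g(b) = b2, the random slope."""
    return b[1]

alpha, lam0 = 0.5, 0.1                 # hypothetical values
flat, steep = (0.0, 0.0), (0.0, 1.0)   # two subjects' random effects (b1, b2)

h_flat = hazard(2.0, flat, alpha, slope, lambda t: lam0)
h_steep = hazard(2.0, steep, alpha, slope, lambda t: lam0)
s_flat = survival_constant_baseline(5.0, flat, alpha, slope, lam0)
s_steep = survival_constant_baseline(5.0, steep, alpha, slope, lam0)
print(h_steep / h_flat, s_flat, s_steep)
```

In this sketch the hazard ratio between the two subjects equals exp(α) because their values of g(b) differ by one unit, which is how α quantifies the relation between the longitudinal trajectory and the event time.
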


2.6 Random-effects Models

• General idea: Separate but correlated latent variables for Y1 and Y2

• Let b1 and b2 denote vectors of random effects, with joint density f (b1, b2) (often normal)

• Assume conditional independence: Y1⊥⊥Y2|(b1, b2)

• Joint density of (Y1, Y2) obtained from, with b = (b1, b2),

$$f(y_1, y_2) = \int f(y_1, y_2 \mid b) \, f(b) \, db = \int\!\!\int f(y_1 \mid b_1) \, f(y_2 \mid b_2) \, f(b_1, b_2) \, db_1 \, db_2$$


• Advantages:

. Same as with shared-parameter models

. Less strict assumptions about the association between Y1 and Y2

. Assume random-intercepts models for Y1 and Y2:

Y1(t) = β1 + b1 + β2t + e1(t)

Y2(t) = β3 + b2 + β4t + e2(t)

with $e_1(t) \sim N(0, \sigma_1^2)$, $e_2(t) \sim N(0, \sigma_2^2)$, and $(b_1, b_2) \sim N(0, D)$

. Similar expressions for Corr{Y1(s), Y1(t)} and Corr{Y2(s), Y2(t)} as before, but

$$\mathrm{Corr}\{Y_1(s), Y_2(t)\} = \mathrm{Corr}(b_1, b_2) \, \sqrt{\mathrm{Corr}\{Y_1(s), Y_1(t)\}} \, \sqrt{\mathrm{Corr}\{Y_2(s), Y_2(t)\}} \le \sqrt{\mathrm{Corr}\{Y_1(s), Y_1(t)\}} \, \sqrt{\mathrm{Corr}\{Y_2(s), Y_2(t)\}}$$

does not directly follow from the association structures for Y1 and Y2
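
The cross-correlation formula for correlated random intercepts can be checked by simulation. The following is a minimal sketch with hypothetical variance components; fixed effects are omitted because they do not affect correlations, and the random intercepts are drawn through a Cholesky-type construction. The empirical correlation between Y1(s) and Y2(t) should be close to the theoretical value, which stays below the bound reached by the shared-parameter model.

```python
import math
import random

def theoretical_cross_corr(d11, d22, rho, var_e1, var_e2):
    """Returns (Corr{Y1(s), Y2(t)}, upper bound) for correlated random intercepts."""
    icc1 = d11 / (d11 + var_e1)          # Corr{Y1(s), Y1(t)}
    icc2 = d22 / (d22 + var_e2)          # Corr{Y2(s), Y2(t)}
    bound = math.sqrt(icc1) * math.sqrt(icc2)
    return rho * bound, bound

def simulated_cross_corr(d11, d22, rho, var_e1, var_e2, n_subjects=20000, seed=1):
    """Empirical correlation between one measurement of Y1 and one of Y2."""
    rng = random.Random(seed)
    y1, y2 = [], []
    for _ in range(n_subjects):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        b1 = math.sqrt(d11) * z1
        b2 = math.sqrt(d22) * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        y1.append(b1 + rng.gauss(0.0, math.sqrt(var_e1)))   # Y1(s) minus its mean
        y2.append(b2 + rng.gauss(0.0, math.sqrt(var_e2)))   # Y2(t) minus its mean
    m1, m2 = sum(y1) / n_subjects, sum(y2) / n_subjects
    s12 = sum((u - m1) * (v - m2) for u, v in zip(y1, y2))
    s11 = sum((u - m1) ** 2 for u in y1)
    s22 = sum((v - m2) ** 2 for v in y2)
    return s12 / math.sqrt(s11 * s22)

# Hypothetical variance components: D = [[4, 1.5], [1.5, 2.25]], i.e. Corr(b1, b2) = 0.5
settings = dict(d11=4.0, d22=2.25, rho=0.5, var_e1=1.0, var_e2=1.0)
theory, bound = theoretical_cross_corr(**settings)
empirical = simulated_cross_corr(**settings)
print(theory, bound, empirical)
```
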


• Disadvantage:

. Dimensionality of b increases with the number of outcomes modeled, hence also the integration in

$$f(y_1, \ldots, y_k) = \int \cdots \int f(y_1 \mid b_1) \cdots f(y_k \mid b_k) \, f(b_1, \ldots, b_k) \, db_1 \cdots db_k$$

=⇒ Computational problems (addressed later)


2.7 Methods Based on Dimension Reduction

• General idea: Use a factor-analytic, or principal-component type, analysis to first reduce the dimensionality of the response vector.

• In a second stage, the principal factors are analyzed using any of the classical (longitudinal) models.

• Advantage:

. Standard techniques can be used to analyze the principal factors

• Disadvantages:

. Inferences about principal factor(s), not about original outcome variables.

. Very strong restrictions needed in cases of highly unbalanced longitudinal data:

∗ unequal numbers of measurements for different subjects

∗ observations taken at arbitrary time points.


Chapter 3

Random-effects Models for High-dimensional Multivariate Longitudinal Data

. A random-effects model

. A pairwise model fitting approach

. Applications


3.1 A Random-effects Model

• We now consider the setting of modeling multivariate longitudinal data

• Let Y1i(t), . . . , Ymi(t) be the m outcomes measured on subject i, at time point t

• Outcomes can be of different types:

. continuous

. binary

. counts

. . . .


• Our requirements:

. Inferences for original outcomes

. Direct marginal inferences

. Separate ‘univariate’ models are implied by ‘multivariate’ model

. Different types of outcomes possible

=⇒ Random-effects approach

. No restriction on dimensionality

=⇒ Computational problem!


• As an example, re-consider the hearing data:

. Linear mixed model for each outcome separately:

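$$Y_i(t) = (\beta_1 + \beta_2 \, \mathrm{Fage}_i + \beta_3 \, \mathrm{Fage}_i^2 + a_i) + (\beta_4 + \beta_5 \, \mathrm{Fage}_i + b_i) \, t + \beta_6 \, \mathrm{visit1}(t) + \varepsilon_i(t)$$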

. Joint model:

Y1i(t) = µ1(t) + a1i + b1it + ε1i(t)

Y2i(t) = µ2(t) + a2i + b2it + ε2i(t)

...

Y22i(t) = µ22(t) + a22i + b22it + ε22i(t)


. Distributional assumptions:

(a1i, a2i, . . . , a22i, b1i, b2i, . . . , b22i)′ ∼ N (0,D44×44)

(ε1i(t), ε2i(t), . . . , ε22i(t))′ ∼ N (0,Σ22×22) , for all t

. Full multivariate joint model:

∗ 44 × 44 covariance matrix for random effects

∗ 22 × 22 covariance matrix for error components

∗ 990 + 253 = 1243 covariance parameters

=⇒ Computational problems!


• As an example, re-consider the fitness data:

. Random-effects logistic regression for each outcome:

logit{P (Yij = 1)} = α + βDCi + bi

. Joint model:

logit{P (Yij1 = 1)} = α1 + β1DCi + bi1

...

logit{P (Yij7 = 1)} = α7 + β7DCi + bi7

. Distributional assumptions:

(bi1, bi2, . . . , bi7)′ ∼ N (0, D7×7)

. Full multivariate joint model:

∗ Only (?) 28 parameters in covariance matrix

∗ Numerical integration over 7-dim. random-effects distribution!

=⇒ Computational problems!
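
The parameter counts quoted for the two examples follow from the number of distinct elements in a symmetric covariance matrix, d(d + 1)/2. The sketch below reproduces them and, as a rough indication of the integration burden, counts the nodes of a tensor-product quadrature grid. The choice of 10 quadrature points per dimension is a hypothetical value, not taken from the slides, and the function names are illustrative.

```python
def n_covariance_parameters(dim):
    """Number of distinct elements in a dim x dim symmetric covariance matrix."""
    return dim * (dim + 1) // 2

def tensor_grid_size(points_per_dim, n_dims):
    """Number of nodes in a tensor-product quadrature grid."""
    return points_per_dim ** n_dims

# Hearing data: 22 outcomes, random intercept and slope each, plus 22 error components
hearing_random_effects = n_covariance_parameters(44)
hearing_errors = n_covariance_parameters(22)
hearing_total = hearing_random_effects + hearing_errors

# Fitness data: 7 questionnaires, one random intercept each
fitness_random_effects = n_covariance_parameters(7)
fitness_grid_full = tensor_grid_size(10, 7)   # full 7-dimensional model
fitness_grid_pair = tensor_grid_size(10, 2)   # one bivariate model

print(hearing_random_effects, hearing_errors, hearing_total)
print(fitness_random_effects, fitness_grid_full, fitness_grid_pair)
```
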


3.2 A Pairwise Model Fitting Approach

• General idea∗:

. Estimation of all parameters does not require fitting the full multivariate model

. It is sufficient to fit the implied model for all pairs, i.e., all ‘bivariate’ models

• Fit all bivariate models:

(Y1, Y2), (Y1, Y3), . . . , (Y1, Ym), (Y2, Y3), . . . , (Y2, Ym), . . . , (Ym−1, Ym)

• Straightforward using standard software (e.g., SAS)

• Equivalent to maximizing pseudo (log-)likelihood:

pℓ(Θ) = ℓ(Y1, Y2|Θ1,2) + ℓ(Y1, Y3|Θ1,3) + . . . + ℓ(Ym−1, Ym|Θm−1,m)

∗Fieuws & Verbeke, Biometrics 2006
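
The list of bivariate models to fit grows as m(m − 1)/2. The following short sketch enumerates the pairs for the two examples (22 hearing outcomes, 7 fitness questionnaires) and confirms that each outcome appears in m − 1 pairs; the helper name is hypothetical, and the actual model fitting for each pair is not shown.

```python
from itertools import combinations

def outcome_pairs(m):
    """All pairs (p, q) with 1 <= p < q <= m, one bivariate model per pair."""
    return list(combinations(range(1, m + 1), 2))

pairs_hearing = outcome_pairs(22)
pairs_fitness = outcome_pairs(7)

# Each outcome is part of m - 1 bivariate models.
models_with_outcome_1 = [pair for pair in pairs_hearing if 1 in pair]

print(len(pairs_hearing), len(pairs_fitness), len(models_with_outcome_1))
print(pairs_fitness[:3], pairs_fitness[-1])
```
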


• Θp,q is the vector of parameters in the bivariate model for (Yp, Yq)

• Let Θ be the vector obtained from stacking all Θp,q

• Asymptotic properties (from pseudo likelihood theory):

$$\sqrt{N} \, (\hat{\Theta} - \Theta) \sim MVN(0, J^{-1} K J^{-1})$$

J is built from second-order and K from first-order derivatives of pℓ.

• Some of the Θp,q contain the same parameters.

• Estimates for these parameters are obtained by averaging pair-specific estimates

• Inference is based on:

$$\sqrt{N} \, (A'\hat{\Theta} - A'\Theta) \sim MVN(0, A' J^{-1} K J^{-1} A)$$
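
The averaging step and its covariance can be sketched for a toy case with m = 3 outcomes and one parameter per outcome. All numbers below are hypothetical, and the matrix V simply stands in for an estimate of J⁻¹KJ⁻¹/N; it is not computed from any data. The sketch builds the averaging matrix A and evaluates A′Θ̂ and A′VA.

```python
# Stacked pair-specific estimates, in the order of the pairs (1,2), (1,3), (2,3)
labels = ["beta1", "beta2", "beta1", "beta3", "beta2", "beta3"]
theta_hat = [0.50, 1.10, 0.54, -0.20, 1.00, -0.30]   # hypothetical values
params = ["beta1", "beta2", "beta3"]

# A has one column per distinct parameter; each column averages that parameter's copies.
A = [[1.0 / labels.count(p) if lab == p else 0.0 for p in params] for lab in labels]

def averaged_estimates(A, theta):
    """Computes A' theta."""
    return [sum(A[i][j] * theta[i] for i in range(len(theta))) for j in range(len(A[0]))]

def averaged_covariance(A, V):
    """Computes A' V A."""
    n, k = len(A), len(A[0])
    return [[sum(A[i][r] * V[i][j] * A[j][c] for i in range(n) for j in range(n))
             for c in range(k)] for r in range(k)]

# Hypothetical covariance of theta_hat: variance 0.04 for each entry, covariance 0.03
# between two estimates of the same parameter (they come from overlapping data).
n = len(labels)
V = [[0.04 if i == j else (0.03 if labels[i] == labels[j] else 0.0) for j in range(n)]
     for i in range(n)]

estimates = averaged_estimates(A, theta_hat)
covariance = averaged_covariance(A, V)
print(estimates)
print([covariance[j][j] for j in range(len(params))])
```

In this toy setting the variance of each averaged estimate is 0.035 rather than 0.02, because the two pair-specific estimates of the same parameter are positively correlated; this correlation is what K accounts for.
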


• Properties of pairwise estimators:

. Consistent, high agreement with MLE (ICC > 0.95)

. Correct estimation of sampling variability, corrected for misspecified association structure

. Relative efficiency versus MLE:

∗ In general, minor loss of efficiency (RE > 0.9), unless with shared parameters

∗ RE independent of number of outcomes


3.3 Example 1: Hearing Data

• Example: Interaction between the linear time effect and age.

• Estimates and standard errors:

[Figure: estimates and standard errors. Reported tests: $\chi^2_{10} = 90.4$, p < 0.0001; $\chi^2_{10} = 110.9$, p < 0.0001]


• Association between underlying random effects: D44×44 of interest

• PCA on correlation matrix of random slopes, left side:


3.4 Example 2: Fitness Data

• Treatment effects (s.e.’s) from univariate and multivariate models:

                            Univariate models   Multivariate model
Physical well-being         −0.13 (0.37)        −0.12 (0.37)
Psychological well-being     1.22 (0.61)         1.00 (0.68)
Self-esteem                  0.43 (0.42)         0.49 (0.39)
Physical self-perception     0.58 (0.24)∗        0.52 (0.25)∗
Degree of opposition         0.06 (0.24)         0.07 (0.24)
Self-efficacy               −0.24 (0.33)        −0.22 (0.33)
Motivation                  −0.35 (0.16)∗       −0.34 (0.16)∗

∗p < 0.05


• No gain from multivariate analysis if interest is in inferences for each outcome separately.

• Wald test for overall treatment effect: $\chi^2_6 = 16.66$, p = 0.011

• Correlation matrix (with variances on the diagonal) of random intercepts:

Physical well-being:       2.55
Psychological well-being:  0.75  4.41
Self-esteem:               0.55  0.76  3.43
Physical self-perception:  0.66  0.46  0.53  1.83
Degree of opposition:      0.19  0.12  0.23  0.38  1.16
Self-efficacy:             0.29  0.24  0.25  0.36  0.23  1.33
Motivation:                0.42  0.31  0.28  0.40  0.47  0.30  0.33
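
The p-value reported for the Wald test of the overall treatment effect (16.66 on 6 degrees of freedom) can be reproduced without special software. The sketch below uses the closed-form upper tail of the chi-square distribution, which is valid for even degrees of freedom only; the function name is hypothetical.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper tail P(X > x) for X ~ chi-square with an even number of degrees of freedom:
    exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!"""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form used here requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k          # (x/2)^k / k!
        total += term
    return math.exp(-half) * total

p_value = chi2_sf_even_df(16.66, 6)
print(round(p_value, 3))
```
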


• Special association models:

. Original model:

logit{P (Yij1 = 1)} = α1 + β1DCi + bi1

· · ·

logit{P (Yij7 = 1)} = α7 + β7DCi + bi7

. Special case 1:

logit{P (Yij1 = 1)} = α1 + β1DCi + bi

logit{P (Yij2 = 1)} = α2 + β2DCi + γ2bi

· · ·

logit{P (Yij7 = 1)} = α7 + β7DCi + γ7bi

(∆dev. = 533.7)

. Special case 2:

logit{P (Yij1 = 1)} = α1 + β1DCi + bi

· · ·

logit{P (Yij7 = 1)} = α7 + β7DCi + bi

(∆dev. = 758.4)


Chapter 4

Conclusions

• Random-effects models provide flexible tools for joint models:

. Inferences for ‘univariate’ outcomes with classical ‘univariate’ models

. Direct marginal inferences, no conditioning required

. Different types of outcomes, different types of models

. High dimensions can be handled

• Pairwise approach allows fitting of high-dimensional models


Chapter 5

Related Work

. Fieuws S. and Verbeke G. (2004), ‘Joint modelling of multivariate longitudinal profiles: Pitfalls of the random-effects approach,’ Statistics in Medicine, 23, 3093-3104.

. Fieuws S. and Verbeke G. (2006), ‘Pairwise fitting of mixed models for the joint modelling of multivariate longitudinal profiles,’ Biometrics, 62, 424-431.

. Fieuws S., Verbeke G., Boen F., and Delecluse C. (2006), ‘High Dimensional Multivariate Mixed Models for Binary Questionnaire Data,’ Applied Statistics, 55, 449-460.

. Fieuws S., Verbeke G., and Molenberghs G. (2007), ‘Random-effects models for multivariate repeated measures,’ Statistical Methods in Medical Research, 16, 387-398.

. Fieuws S., Verbeke G., Maes B., and Vanrenterghem Y. (2008), ‘Predicting renal graft failure using multivariate longitudinal profiles,’ Biostatistics, 9, 419-431.


Thanks !

