Designing longitudinal studies - Harvard Catalyst

Donna Spiegelman
Professor of Epidemiologic Methods
Departments of Epidemiology and Biostatistics
[email protected]

Xavier Basagaña
Research Assistant Professor
Centre for Research in Environmental Epidemiology (CREAL), Spain
[email protected]

Topics covered in this talk

• Study design formulas based on tests that are valid and efficient for observational studies, for two reasonable alternative hypotheses.
• Comprehensive assessment of the effect of all parameters on power and sample size.
• Extension of results to a context where not all subjects enter the study at the same time.
• Extension of results to the case of time-varying covariates, and comparisons to the time-invariant covariates case.
• Use of a computer program to perform design computations, with an intuitive parameterization that is easy to use.

Design Problems to Solve

Here N is the total number of participants, r the number of post-baseline measurements, and π the power.

• Power (π) for fixed N, r
• N for fixed π, r
• r for fixed π, N
• Minimum detectable effect for fixed π, N, r
• Optimal (N_opt, r_opt) to maximize power for a fixed budget, or to minimize the total cost of the study for a fixed power

Notation and Preliminary Results

• We focus on two alternative hypotheses:

1. Constant Mean Difference (CMD).

$$E(Y_{ij} \mid X_i) = \mu_{0.0} + \mu_{1.0} + \dots + \mu_{r.0} + \beta_1 X_i, \qquad H_0: \beta_1 = 0$$

$$E(Y_{ij} \mid X_i) = \beta_0 + \beta_1 X_i + \beta_2' T_{ij}$$

$$E(Y_{ij} \mid X_i) = \beta_0 + \beta_1 X_i$$

[Figure: Y versus time for the unexposed and exposed groups and their difference. Panel A: CMD, V(t0) = 0. Panel C: CMD, V(t0) > 0.]

2. Linearly Divergent Differences (LDD)

$$E(Y_{ij} \mid X_i) = \mu_{0.0} + \mu_{1.0} + \dots + \mu_{r.0} + \gamma_1 X_i + \gamma_3 (X_i \times T_{ij}), \qquad H_0: \gamma_3 = 0$$

$$E(Y_{ij} \mid X_i) = \gamma_0 + \gamma_1 X_i + \gamma_3 (X_i \times T_{ij}) + \gamma_2 T_{ij}$$

In terms of exposure $k_i$ and observation times $t_{ij}$:

$$\begin{aligned}
E(Y_{ij} \mid X_{ij}) &= \eta_0' + \eta_1' t_{i0} + \eta_2' (t_{ij} - t_{i0}) + \eta_3' k_i + \eta_4' (k_i \times t_{i0}) + \eta_5' k_i \times (t_{ij} - t_{i0}) \\
&= \eta_0 + \eta_1 t_{i0} + \eta_2 t_{ij} + \eta_3 k_i + \eta_4 (k_i \times t_{i0}) + \eta_5 (k_i \times t_{ij}), \qquad \gamma_3 = \eta_5
\end{aligned}$$

Difference model:

$$E(Y_{i,j+1} - Y_{i,j}) = \lambda_0 + \lambda_1 k_i, \qquad \lambda_1 = s\,\gamma_3$$

[Figure: Y versus time for the unexposed and exposed groups and their difference. Panel B: LDD, V(t0) = 0. Panel D: LDD, V(t0) > 0.]

Clinical trials vs. Observational studies:

• Some test statistics are invalid (SLAIN under LDD) or less efficient (ANCOVA under CMD) in an observational context where the groups differ at baseline, $E(Y_{i0} \mid X_i = 0) \neq E(Y_{i0} \mid X_i = 1)$.

[Figure: two panels of Y versus time (0 to 5) for the control and treatment groups.]

Clinical trials vs. Observational studies:

• In RCTs, the time measure of interest is time from randomization, so everyone starts at the same time.
• In observational studies in epidemiology, age, for example, is the time variable of interest, and study participants do not start at the same age. Exposure may be correlated with time.
• RCTs: exposures are time-invariant. Observational studies: exposures can be either time-invariant or time-varying.
• RCTs: exposure (treatment) prevalence is 50% by design. Observational studies: exposures often have low or high prevalence (unbalanced designs).

Intuitive parameterization of the alternative hypothesis

1) $\mu_{00}$: the mean response at baseline (or at the mean initial time) in the unexposed group, where

$$\mu_{00} = E(Y_{i0} \mid X_i = 0), \qquad i = 1, \dots, N$$

2) $p_1$: the percent difference between exposed and unexposed groups at baseline (or at the mean initial time), where

$$p_1 = \frac{E(Y_{i0} \mid X_i = 1) - E(Y_{i0} \mid X_i = 0)}{E(Y_{i0} \mid X_i = 0)}$$

Intuitive parameterization of the alternative hypothesis (2)

3) $p_2$: the percent change from baseline (or from the mean initial time) to end of follow-up (or to the mean final time) in the unexposed group, where

$$p_2 = \frac{E(Y_{i\tau} \mid X_i = 0) - E(Y_{i0} \mid X_i = 0)}{E(Y_{i0} \mid X_i = 0)}$$

When $\tau$ is not fixed, $p_2$ is defined at time $s$ instead of at time $\tau$.

4) $p_3$: the percent difference between the change from baseline (or from the mean initial time) to end of follow-up (or mean final time) in the exposed group and the unexposed group, where

$$p_3 = \frac{E(Y_{i\tau} - Y_{i0} \mid X_i = 1) - E(Y_{i\tau} - Y_{i0} \mid X_i = 0)}{E(Y_{i\tau} - Y_{i0} \mid X_i = 0)}$$

When $p_2 = 0$, $p_3$ will be defined as the percent change from baseline (or from the mean initial time) to the end of follow-up (or to the mean final time) in the exposed group, i.e.

$$p_3 = \frac{E(Y_{i\tau} \mid X_i = 1) - E(Y_{i0} \mid X_i = 1)}{E(Y_{i0} \mid X_i = 1)}$$

Intuitive parameterization of the alternative hypothesis (3)

• Under CMD, $\beta_2 = p_1 \mu_{00}$

• Under LDD,

– If $p_2 = 0$, $\gamma_3 = \eta_3 = \dfrac{p_3 (1 + p_1)\, \mu_{00}}{\tau}$

– Else $\gamma_3 = \eta_3 = \dfrac{p_2\, p_3\, \mu_{00}}{\tau}$
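As a hedged illustration of this parameterization, the Python sketch below converts percent-scale inputs into absolute coefficients: the exposure coefficient under CMD is $p_1 \mu_{00}$, and the interaction coefficient under LDD is $p_2 p_3 \mu_{00}/\tau$, or $p_3(1+p_1)\mu_{00}/\tau$ when $p_2 = 0$. The function names are made up for this example and are not part of optitxs.r; the inputs are the values quoted in the talk's cognitive function example.

```python
def cmd_effect(p1, mu00):
    """Exposure coefficient under CMD: p1 * mu00."""
    return p1 * mu00


def ldd_effect(p1, p2, p3, mu00, tau):
    """Interaction coefficient gamma_3 under LDD, per unit of time."""
    if p2 == 0:
        # No change among the unexposed: p3 is the percent change in the exposed.
        return p3 * (1 + p1) * mu00 / tau
    return p2 * p3 * mu00 / tau


# Values quoted in the talk: mu00 = 32.7, p1 = 2.3%,
# p2 = -0.6% over tau = 2 years, p3 = 70%.
beta = cmd_effect(0.023, 32.7)
gamma3 = ldd_effect(0.023, -0.006, 0.7, 32.7, 2)
print(round(beta, 2), round(gamma3, 5))
```

The first value agrees with the "2.3% ⇔ 0.75 points" equivalence quoted for GRAD in the example.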

Notation & Preliminary Results

• We consider studies where the interval between visits (s) is fixed but the duration of the study (τ) is free (e.g. participants respond to questionnaires every two years).
– Increasing r involves increasing the duration of the study.
• We also consider studies where the duration of the study, τ, is fixed, but the interval between visits is free (e.g. the study is 5 years long).
– Increasing r involves increasing the frequency of the measurements, i.e. decreasing s.
• τ = s r.

Literature review

1. Cochran, W. G. (1977). Sampling techniques. New York, Wiley.
2. Dawson, J. D. (1998). "Sample size calculations based on slopes and other summary statistics." Biometrics 54(1): 323-30.
3. Diggle, P. (2002). Analysis of longitudinal data. Oxford; New York, Oxford University Press.
4. Fitzmaurice, G. M., N. M. Laird, et al. (2004). Applied longitudinal analysis. Hoboken, N.J., Wiley-Interscience.
5. Frison, L. and S. J. Pocock (1992). "Repeated measures in clinical trials: analysis using mean summary statistics and its implications for design." Stat Med 11(13): 1685-704; Frison, L. J. and S. J. Pocock (1997). "Linearly divergent treatment effects in clinical trials with repeated measures: efficient analysis using summary statistics." Stat Med 16(24): 2855-72.
6. Galbraith, S. and I. C. Marschner (2002). "Guidelines for the design of clinical trials with longitudinal outcomes." Control Clin Trials 23(3): 257-73.
7. Hedeker, D., R. D. Gibbons, and C. Waternaux (1999). "Sample size estimation for longitudinal designs with attrition: comparing time-related contrasts between two groups." Journal of Educational and Behavioral Statistics 24(1): 70-93.
8. Jung, S. H. and C. Ahn (2003). "Sample size estimation for GEE method for comparing slopes in repeated measurements data." Stat Med 22(8): 1305-15.
9. Kirby, A. J., N. Galai, and A. Munoz (1994). "Sample size estimation using repeated measurements on biomarkers as outcomes." Control Clin Trials 15(3): 165-72.
10. Liu, G. and K. Y. Liang (1997). "Sample size calculations for studies with correlated observations." Biometrics 53(3): 937-47.
11. Overall, J. E. (1996). "How many repeated measurements are useful?" J Clin Psychol 52(3): 243-52; Overall, J. E. and S. R. Doyle (1994). "Estimating sample sizes for repeated measurement designs." Control Clin Trials 15(2): 100-23.
12. Raudenbush, S. W. (1997). "Statistical analysis and optimal design for cluster randomized trials." Psychol Methods 2(2): 173-85; Raudenbush, S. W. and X.-F. Liu (2001). "Effects of study duration, frequency of observation, and sample size on power in studies of group differences in polynomial change." Psychol Methods 6(4): 387-401.
13. Rochon, J. (1998). "Application of GEE procedures for sample size calculations in repeated measures experiments." Stat Med 17(14): 1643-58.
14. Schlesselman, J. J. (1973). "Planning a longitudinal study. II. Frequency of measurement and study duration." J Chronic Dis 26(9): 561-70.
15. Schouten, H. J. (1999). "Planning group sizes in clinical trials with a continuous outcome and repeated measures." Stat Med 18(3): 255-64.
17. Snijders, T. A. B. and R. J. Bosker (1993). "Standard errors and sample sizes for two-level research." Journal of Educational Statistics 18(3): 237-59.
18. Tu, X. M., J. Kowalski, J. Zhang, K. G. Lynch, and P. Crits-Christoph (2004). "Power analyses for longitudinal trials and other clustered designs." Stat Med 23(18): 2799-815.
19. Yi, Q. and T. Panzarella (2002). "Estimating sample size for tests on trends across repeated measurements with missing data based on the interaction term in a mixed model." Control Clin Trials 23(5): 481-96.

General Theoretical Results

• Under CMD with CS response, ANCOVA is optimal (valid & efficient) in RCTs; it is inefficient in observational studies (Appendix 2).
• Under LDD with CS response, SLAIN is optimal in RCTs; it is invalid in observational studies (Appendix 2).
• With CS or RS response and V(t0) = 0, the 2-stage estimator = OLS = GLS (Appendix 3).

Notation & Preliminary Results

• Model

$$E[Y_i] = X_i B, \qquad Var[Y_i \mid X_i] = \Sigma_i, \qquad i = 1, \dots, n$$

• The generalized least squares (GLS) estimator of B is

$$\hat{B} = \left( \sum_{i=1}^{n} X_i' \Sigma_i^{-1} X_i \right)^{-1} \left( \sum_{i=1}^{n} X_i' \Sigma_i^{-1} Y_i \right), \qquad \hat{B} \sim N\!\left(B, \tfrac{1}{n}\Sigma_B\right), \qquad \Sigma_B = \left( E_X\!\left[ X_i' \Sigma_i^{-1} X_i \right] \right)^{-1}$$

• Power formula

$$\pi = 1 - \Phi\!\left[ z_{1-\alpha/2} - \frac{\sqrt{n}\, |c'B_{H_A}|}{\sqrt{c' \Sigma_B c}} \right] = \Phi\!\left[ \frac{\sqrt{n}\, |c'B_{H_A}|}{\sqrt{c' \Sigma_B c}} - z_{1-\alpha/2} \right]$$
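The power formula can be inverted to answer the other design questions (N for fixed power, minimum detectable effect). The sketch below is one way to do this in Python using only the standard library. It assumes the caller supplies the effect c'B under the alternative and the per-subject variance c'Σ_B c; the function names and the numeric inputs are hypothetical and chosen only to exercise the code.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def wald_power(n, effect, unit_variance, alpha=0.05):
    """Power of the two-sided Wald test, ignoring the negligible opposite tail.

    effect        : c'B under the alternative hypothesis
    unit_variance : c' Sigma_B c, the variance contribution of one subject
    """
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(sqrt(n) * abs(effect) / sqrt(unit_variance) - z_alpha)


def required_n(power, effect, unit_variance, alpha=0.05):
    """Smallest N whose power reaches the target (inverts wald_power)."""
    z_sum = STD_NORMAL.inv_cdf(1 - alpha / 2) + STD_NORMAL.inv_cdf(power)
    return ceil(unit_variance * z_sum ** 2 / effect ** 2)


def minimum_detectable_effect(n, power, unit_variance, alpha=0.05):
    """Smallest |c'B| detectable with the given N and power."""
    z_sum = STD_NORMAL.inv_cdf(1 - alpha / 2) + STD_NORMAL.inv_cdf(power)
    return z_sum * sqrt(unit_variance / n)


# Hypothetical inputs: effect of 0.5 with a per-subject variance of 4.
n = required_n(power=0.90, effect=0.5, unit_variance=4.0)
print(n, wald_power(n, 0.5, 4.0), minimum_detectable_effect(n, 0.90, 4.0))
```

Solving for r at fixed N and power has no closed form in general, because c'Σ_B c depends on r through Σ; a search over r using `wald_power` is the natural extension.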

Notation & Preliminary Results

• Let $v_{jj'}$ be the (j, j')th element of $\Sigma^{-1}$

• Under CMD and fixed $s$,

when $V(t_0) = 0$:

$$Var(\hat\beta_1) = \left[ p_e (1 - p_e) \sum_{j=0}^{r} \sum_{j'=0}^{r} v_{jj'} \right]^{-1}$$

when $V(t_0) > 0$:

$$Var(\hat\beta_1) = \frac{\left( \sum_{j=0}^{r} \sum_{j'=0}^{r} v_{jj'} \right)^{2} V(t_0) + s^2 \det(A)}{p_e (1 - p_e) \left( \sum_{j=0}^{r} \sum_{j'=0}^{r} v_{jj'} \right) \left[ s^2 \det(A) + \left( \sum_{j=0}^{r} \sum_{j'=0}^{r} v_{jj'} \right)^{2} V(t_0) \left( 1 - \rho_{e,t_0}^2 \right) \right]}$$

where

$$A = \begin{pmatrix} \sum_{j=0}^{r} \sum_{j'=0}^{r} v_{jj'} & \sum_{j=0}^{r} \sum_{j'=0}^{r} j\, v_{jj'} \\ \sum_{j=0}^{r} \sum_{j'=0}^{r} j\, v_{jj'} & \sum_{j=0}^{r} \sum_{j'=0}^{r} j j'\, v_{jj'} \end{pmatrix}$$

Presentation notes: $\Sigma_i = \Sigma$ for CS, DEX, RS and $V(t_0) = 0$, but not if $V(t_0) \neq 0$.
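A small numerical sketch of the CMD variance may help. It assumes compound symmetry, for which the inverse covariance has a closed form, and implements the two cases as Var = [p_e(1-p_e)S]^{-1} when V(t0) = 0 and Var = [S²V(t0) + s²det(A)] / {p_e(1-p_e) S [s²det(A) + S²V(t0)(1-ρ²_{e,t0})]} otherwise, with S the sum of all elements of Σ^{-1}. The function names and input values are hypothetical, not taken from optitxs.r.

```python
def cs_inverse(r, sigma2, rho):
    """Inverse of the (r+1)x(r+1) compound symmetry covariance matrix."""
    m = r + 1
    common = 1.0 / (sigma2 * (1 - rho))
    shrink = rho / (1 + (m - 1) * rho)
    return [[common * ((j == k) - shrink) for k in range(m)] for j in range(m)]


def var_beta_cmd(r, s, pe, sigma2, rho, v_t0=0.0, rho_e_t0=0.0):
    """Per-subject variance of the exposure coefficient under CMD with CS."""
    v = cs_inverse(r, sigma2, rho)
    idx = range(r + 1)
    s00 = sum(v[j][k] for j in idx for k in idx)
    if v_t0 == 0:
        return 1.0 / (pe * (1 - pe) * s00)
    s01 = sum(j * v[j][k] for j in idx for k in idx)
    s11 = sum(j * k * v[j][k] for j in idx for k in idx)
    det_a = s00 * s11 - s01 ** 2
    numerator = s00 ** 2 * v_t0 + s ** 2 * det_a
    denominator = pe * (1 - pe) * s00 * (
        s ** 2 * det_a + s00 ** 2 * v_t0 * (1 - rho_e_t0 ** 2))
    return numerator / denominator


# Hypothetical design: r = 1, s = 2, balanced exposure, sigma2 = 12, rho = 0.3.
base = var_beta_cmd(1, 2, 0.5, 12, 0.3)
same = var_beta_cmd(1, 2, 0.5, 12, 0.3, v_t0=4.0)
worse = var_beta_cmd(1, 2, 0.5, 12, 0.3, v_t0=4.0, rho_e_t0=0.5)
print(base, same, worse)
```

When exposure is uncorrelated with starting time, the V(t0) > 0 expression collapses to the V(t0) = 0 one; any correlation between exposure and starting time inflates the variance.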
Notation & Preliminary Results

• Let $v_{jj'}$ be the (j, j')th element of $\Sigma^{-1}$

• Under LDD and fixed $s$, and when the difference model is used,

when $V(t_0) = 0$:

$$Var(\hat\gamma_3) = \frac{\sum_{j=0}^{r} \sum_{j'=0}^{r} v_{jj'}}{p_e (1 - p_e)\, s^2 \det(A)}$$

when $V(t_0) > 0$ and $\Sigma_i = \Sigma \;\; \forall i$:

$$Var(\hat\gamma_3) = \frac{\sum_{j=0}^{r} \sum_{j'=0}^{r} v_{jj'}}{p_e (1 - p_e) \left[ \left( 1 - \rho_{e,t_0}^2 \right) V(t_0) \left( \sum_{j=0}^{r} \sum_{j'=0}^{r} v_{jj'} \right)^{2} + s^2 \det(A) \right]}$$
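To see what the V(t0) > 0 variance implies, it can help to specialize it to independent, homoscedastic errors, Σ = σ²I. This is a sketch under that simplifying assumption, which is not one of the covariance structures treated in the talk. It starts from $Var(\hat\gamma_3) = S / \{p_e(1-p_e)[(1-\rho_{e,t_0}^2)V(t_0)S^2 + s^2\det(A)]\}$ with $S = \sum_j \sum_{j'} v_{jj'}$.

```latex
\begin{aligned}
S &= \frac{r+1}{\sigma^2}, \qquad
\det(A) = \frac{(r+1)\sum_{j=0}^{r}(j-\bar{j})^2}{\sigma^4}, \qquad \bar{j} = \frac{r}{2}, \\
Var(\hat\gamma_3) &= \frac{\sigma^2}{p_e(1-p_e)\left[(1-\rho_{e,t_0}^2)\,(r+1)\,V(t_0) + s^2\sum_{j=0}^{r}(j-\bar{j})^2\right]},
\qquad \sum_{j=0}^{r}(j-\bar{j})^2 = \frac{r(r+1)(r+2)}{12}.
\end{aligned}
```

In this special case the bracket adds the between-subject spread in starting time, (r+1)V(t0), discounted by its correlation with exposure, to the within-subject spread in time.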

Correlation structures

• We consider three common correlation structures:

1. Compound symmetry (CS).

$$Var(Y_i \mid X_i) = \Sigma_{(r+1) \times (r+1)} = \sigma^2 \begin{pmatrix} 1 & \rho & \cdots & \rho \\ \rho & 1 & \cdots & \rho \\ \vdots & & \ddots & \vdots \\ \rho & \rho & \cdots & 1 \end{pmatrix}$$

Correlation structures

2. Damped Exponential (DEX) (Munoz et al., 1992)

$$\Sigma = \sigma^2 \begin{pmatrix} 1 & \rho^{s^\theta} & \rho^{(2s)^\theta} & \cdots & \rho^{(rs)^\theta} \\ \rho^{s^\theta} & 1 & \rho^{s^\theta} & \cdots & \rho^{((r-1)s)^\theta} \\ \vdots & & \ddots & & \vdots \\ \rho^{(rs)^\theta} & \cdots & \rho^{(2s)^\theta} & \rho^{s^\theta} & 1 \end{pmatrix}, \qquad \theta \in [0, 1]$$

Example correlation matrices (lower triangles) for five measurements with ρ = 0.8 and s = 1:

θ = 0: CS
1
0.8   1
0.8   0.8   1
0.8   0.8   0.8   1
0.8   0.8   0.8   0.8   1

θ = 0.3
1
0.8   1
0.76  0.8   1
0.73  0.76  0.8   1
0.71  0.73  0.76  0.8   1

θ = 1: AR(1)
1
0.8   1
0.64  0.8   1
0.51  0.64  0.8   1
0.41  0.51  0.64  0.8   1
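The three example matrices can be regenerated from the DEX definition, in which the correlation between measurements j and j' is ρ raised to the power (s|j - j'|)^θ. The following Python sketch is illustrative only; the function name is hypothetical.

```python
def dex_correlation(r, s, rho, theta):
    """(r+1)x(r+1) damped exponential correlation matrix for visits s apart."""
    return [[1.0 if j == k else rho ** ((s * abs(j - k)) ** theta)
             for k in range(r + 1)] for j in range(r + 1)]


# First column of each matrix for rho = 0.8, s = 1 and five measurements.
for theta in (0.0, 0.3, 1.0):
    first_column = [round(row[0], 2) for row in dex_correlation(4, 1, 0.8, theta)]
    print(theta, first_column)
```

With θ = 0 every off-diagonal entry equals ρ (CS), and with θ = 1 the entries decay geometrically with the lag (AR(1)).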

Correlation structures

3. Random intercepts and slopes (RS).

$$\Sigma_i = Z_i D Z_i' + (1 - \rho_{t_0})\, \sigma^2 I, \qquad Z_i = (1, t_i), \quad \dim(Z_i) = (r+1) \times 2, \qquad D = \begin{pmatrix} \sigma_{b_0}^2 & \rho_{b_0 b_1} \sigma_{b_0} \sigma_{b_1} \\ \rho_{b_0 b_1} \sigma_{b_0} \sigma_{b_1} & \sigma_{b_1}^2 \end{pmatrix}$$

• Reparameterizing:

– $\rho_{t_0} \in [0, 1]$ is the reliability coefficient at baseline

– $\rho_{b_1,\tau} \in [0, 1]$ is the slope reliability at the end of follow-up ($\rho_{b_1,\tau} = 0$ is CS; $\rho_{b_1,\tau} = 1$ means all variation in slopes is between subjects).

• With this correlation structure, the variance of the response changes with time, i.e. this correlation structure gives a heteroscedastic model.

• When $V(t_0) > 0$, $\Sigma_i \neq \Sigma$
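The heteroscedasticity of the RS structure is easy to see numerically. The sketch below builds Σ_i = Z_i D Z_i' + σ²_within I directly from the variance components rather than from the reliability parameters. The function name and all numeric values are hypothetical; they are chosen so that the between-subject share of the baseline variance is 0.25 with a baseline variance of 12.

```python
def rs_covariance(times, var_b0, var_b1, corr_b0b1, var_within):
    """Sigma_i = Z_i D Z_i' + var_within * I with Z_i = (1, t_i)."""
    cov_b0b1 = corr_b0b1 * (var_b0 * var_b1) ** 0.5
    sigma = []
    for j, tj in enumerate(times):
        row = []
        for k, tk in enumerate(times):
            # (1, tj) D (1, tk)' written out for the 2x2 matrix D
            between = var_b0 + cov_b0b1 * (tj + tk) + var_b1 * tj * tk
            row.append(between + (var_within if j == k else 0.0))
        sigma.append(row)
    return sigma


# Hypothetical components: baseline variance 3 + 9 = 12, measured at t = 0, 2, 4.
sigma = rs_covariance([0, 2, 4], var_b0=3.0, var_b1=0.05,
                      corr_b0b1=-0.14, var_within=9.0)
variances = [sigma[j][j] for j in range(3)]
print([round(x, 3) for x in variances])
```

Because the matrix depends on the actual times t_i, participants who start at different times have different Σ_i, which is why Σ_i ≠ Σ when V(t0) > 0.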

Example

• Goal is to investigate the effect of indicators of socioeconomic status and post-menopausal hormone use (PMH) on cognitive function (CMD) and cognitive decline (LDD)

• "Pilot study" by Lee S, Kawachi I, Berkman LF, Grodstein F. "Education, other socioeconomic indicators, and cognitive function." Am J Epidemiol 2003; 157: 712-720. Will denote as Grodstein.

• Design questions include
– the power of the published study to detect effects of specified magnitude,
– the number and timing of additional tests needed to obtain a study with the desired power to detect effects of specified magnitude,
– the optimal number of participants and measurements needed in a de novo study of these issues

Example

• At baseline and two years later, six cognitive tests were administered to 15,654 participants in the Nurses' Health Study

• Outcome: Telephone Interview for Cognitive Status (TICS)
– μ00 = 32.7 (4)
– Var(Y) = 16, Var(Y | X) ≈ 12; implies model R² = 0.25
– Decline of 1 point/10 years of age (γ2) ⇔ p2 = −0.3%/year

Example

• Exposure: Graduate school degree vs. not (GRAD)
– p_e = 6.2%, p1 = 2.3% ⇔ 0.75 points (= β1, γ1)
– Corr(GRAD, age at start of follow-up) = −0.01

• Exposure: Post-menopausal hormone use (CURRHORM)
– p_e = 26.7%, p1 = 0.7% ⇔ 0.02 points
– Corr(CURRHORM, age) = −0.06

• Time: age (years) is the best choice, not questionnaire cycle or calendar year of test
– The mean age was 74 and V(t0) ≈ 4.

Example

• The estimated covariance parameters were

Parameter          CS      RS
ρ or ρ_t0          0.27    0.26
ρ_{b1, r=2}                0.04
ρ_{b1, r=1}                0.01
ρ_{b0 b1}                  -0.14

• SAS code to fit the LDD model with CS covariance

    proc mixed;
      class id;
      model tics = grad age gradage / s;
      random id;
    run;

• SAS code to fit the LDD model with RS covariance

    proc mixed;
      class id;
      model tics = grad age gradage / s ddfm=bw;
      random intercept age / type=un subject=id;
    run;

Example

• To estimate θ and ρ_t0 under DEX response (r ≥ 2)
– Get RDEC package from Vincent Carey ([email protected])
– R code to fit DEX response

    library(RDEC)
    mod1 = rdec(tics ~ age + grad + gradage, data=dat, id=ID, S=age,
                omega.init = c(.5,.5), omega.low=c(.01,.01), omega.high=c(.95,.95))
    summary(mod1)

Program optitxs.r makes it all possible

http://www.hsph.harvard.edu/faculty/donna-spiegelman

http://www.hsph.harvard.edu/faculty/spiegelman/vita.html

http://www.hsph.harvard.edu/faculty/spiegelman/software.html

http://www.hsph.harvard.edu/faculty/spiegelman/optitxs.html

Illustration of use of software optitxs.r

• We'll calculate the power of Grodstein's published study to detect the observed 70% difference in rates of decline between those with more than high school vs. others over the original two-year period

• Recall that 6.2% of NHS participants had more than high school; there was a 0.3% decline in cognitive function per year (p2 = −0.3%/year)

> long.power()
Press <Esc> to quit

Constant mean difference (CMD) or Linearly divergent difference (LDD)? ldd
The alternative is LDD.

Enter the total sample size (N): 15000

Enter the number of post-baseline measures (r>0): 1

Enter the time between repeated measures (s): 2

Enter the exposure prevalence (pe) (0<=pe<=1): 0.062

Enter the variance of the time variable at baseline, V(t0) (enter 0 if all participants begin at the same time): 4

Enter the correlation between the time variable at baseline and exposure, rho[e,t0] (enter 0 if all participants begin at the same time): -0.01

Will you specify the alternative hypothesis on the absolute (beta coefficient) scale (1) or the relative (percent) scale (2)? 2
The alternative hypothesis will be specified on the relative (percent) change scale.

Enter mean response at baseline among unexposed (mu00): 32.7

Enter the percent change from baseline to end of follow-up among unexposed (p2) (e.g. enter 0.10 for a 10% change): -0.006

Enter the percent difference between the change from baseline to end of follow-up in the exposed group and the unexposed group (p3) (e.g. enter 0.10 for a 10% difference): 0.7

Which covariance matrix are you assuming: compound symmetry (1), damped exponential (2) or random slopes (3)? 2
You are assuming DEX covariance

Enter the residual variance of the response given the assumed model covariates (sigma2): 12

Enter the correlation between two measures of the same subject separated by one unit (rho): 0.3

Enter the damping coefficient (theta): 0.10

Power = 0.4206059
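The reported power can be cross-checked by hand. The Python sketch below is an independent reconstruction, not the source of optitxs.r. It assumes the LDD variance for V(t0) > 0 is S / {p_e(1-p_e)[(1-ρ²_{e,t0})V(t0)S² + s²det(A)]}, where S is the sum of the elements of Σ^{-1}, and it uses the one-sided form of the Wald power formula. The helper names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def invert(matrix):
    """Gauss-Jordan inverse of a small, well-conditioned square matrix."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for i in range(n):
            if i != col:
                factor = aug[i][col]
                aug[i] = [x - factor * y for x, y in zip(aug[i], aug[col])]
    return [row[n:] for row in aug]


def ldd_power(n, r, s, pe, v_t0, rho_e_t0, gamma3, sigma2, rho, theta, alpha=0.05):
    """Power to detect gamma3 under LDD with DEX covariance."""
    idx = range(r + 1)
    # DEX covariance: sigma2 * rho ** ((s * |j - j'|) ** theta)
    sigma = [[sigma2 * (1.0 if j == k else rho ** ((s * abs(j - k)) ** theta))
              for k in idx] for j in idx]
    v = invert(sigma)
    s00 = sum(v[j][k] for j in idx for k in idx)
    s01 = sum(j * v[j][k] for j in idx for k in idx)
    s11 = sum(j * k * v[j][k] for j in idx for k in idx)
    det_a = s00 * s11 - s01 ** 2
    variance = s00 / (pe * (1 - pe) * (
        (1 - rho_e_t0 ** 2) * v_t0 * s00 ** 2 + s ** 2 * det_a))
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(sqrt(n) * abs(gamma3) / sqrt(variance) - z_alpha)


tau = 2 * 1                           # s * r
gamma3 = -0.006 * 0.7 * 32.7 / tau    # p2 * p3 * mu00 / tau
power = ldd_power(15000, 1, 2, 0.062, 4, -0.01, gamma3, 12, 0.3, 0.10)
print(round(power, 4))
```

Under these assumptions the sketch agrees with the session output to about four decimal places, which suggests the reconstruction matches what the program computes for this configuration; other configurations have not been checked.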

Power of current study

• To detect the observed 70% difference in cognitive decline by GRAD
– CS (ρ = 0.30): 44%
– RS (ρ_{b1, r=2} = 0.04, ρ_{b0 b1} = −0.14): 35%
– DEX (ρ_t0 = 0.3, θ = 0.10): 42%

• To detect a hypothesized ±10% difference in cognitive decline by current hormone use
– CS & DEX: 7%
– RS: 6%

How many additional measurements are needed when tests are administered every 2 years, i.e. how many more years of follow-up are needed...

• To detect the observed 70% difference in cognitive decline by GRAD with 90% power?
– CS, DEX (θ = 0.10), RS: 3 post-baseline measurements, τ = 6 years
• one more 5 year grant cycle

• To detect a hypothesized ±20% difference in cognitive decline by current hormone use with 90% power?
– CS, DEX (θ = 0.10): 6 post-baseline measurements, τ = 12 years
• more than two 5 year grant cycles

N = 15,000 for these calculations

How many more measurements should be taken in four (one NIH grant cycle) and eight (two NIH grant cycles) years of follow-up...

• To detect the observed 70% difference in cognitive decline by GRAD with 90% power?

                    Duration of follow-up
                    4 years    8 years
CS                  8          1
DEX (θ = 0.10)      10         1
RS                  10         1

• To detect a hypothesized ±20% difference in cognitive decline by current hormone use with 90% power?

                    Duration of follow-up
                    4 years    8 years
CS                  >50        11
DEX (θ = 0.10)      >50        17
RS                  >50        13

Optimize (N,r) in a new study of cognitive decline

• Assume
– τ = 4 years of follow-up (1 NIH grant cycle); s fixed at 2 years
– cost of recruitment and baseline measurements is twice that of subsequent measurements

• GRAD (p3 = 70%):
– (N,r) = (26,795; 1) CS
– (N,r) = (26,930; 1) DEX (θ = 0.10)
– (N,r) = (28,945; 1) RS

• CURRHORM (p3 = 20%):
– (N,r) = (97,662; 1) CS
– (N,r) = (98,155; 1) DEX (θ = 0.10)
– (N,r) = (105,470; 1) RS

Summary of the features of existing programs

Features compared: CMD; LDD; CS; RS; DEX; V(t0) > 0; exposure and time correlated; optimal (N, r) for fixed cost and/or fixed power; power; N for fixed r; r for fixed N; minimum detectable effect.

Software   Reference                         URL
PINT       Snijders (1993, 2003)             http://stat.gamma.rug.nl/snijders/
RMASS2     Hedeker (1999a, 1999b)            http://tigger.uic.edu/~hedeker/works.html
GEESIZE    Rochon (1998), Ziegler (2004)     http://www.imbs.uni-luebeck.de/pub/Geesize/
OPTITXS    Basagana and Spiegelman (2007)    http://www.hsph.harvard.edu/faculty/spiegelman/optitxs.html

• PINT: optimal (N, r) for fixed cost only. It computes the standard errors, $Var(\hat\beta_2)$ and $Var(\hat\gamma_3)$. It only considers the B&W model, which reduces to the V(t0) = 0 case (appendix 1.3).
• OPTITXS: optimal (N, r) under both constraints.

[Feature-by-program × marks of the original table are not recoverable from the slide text.]

Designing Longitudinal Studies: General theoretical results

General theoretical results

CMD:
• Power increases as ρ → 0 and as θ → 1.
• Power increases as Var(t0) goes to 0.
• Power is maximum at pe = 0.5.

LDD:
• Power increases as ρ → 1, as θ → 0, as $\rho_{b_1,\tilde{r}} \to 0$, as V(t0) increases, and as the correlation between t0 and exposure goes to 0.
• Power is maximum at pe = 0.5.


Theoretical Results on minimum r for fixed N and π

• Under LDD, V(t0) = 0, and fixed τ, power when r = 2 is the same as power when r = 1 (Appendix 4)

• r is minimized at pe = 0.5 (Appendix 5)

• Power is limited below 100% as r → ∞ (Appendix 6)
– Under CMD
 • CS & RS response
 • CS & AR(1) response when τ is fixed
– Under LDD
 • AR(1) and fixed τ
 • RS, V(t0) = 0
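The power ceiling is easiest to see under CMD with CS response. The short derivation below is a sketch that assumes V(t0) = 0 and s fixed, and uses $Var(\hat\beta) = [p_e(1-p_e)\sum_j \sum_{j'} v_{jj'}]^{-1}$ together with the closed form for the sum of the elements of the inverse of an (r+1) × (r+1) CS matrix.

```latex
\begin{aligned}
\sum_{j=0}^{r}\sum_{j'=0}^{r} v_{jj'} &= \frac{r+1}{\sigma^2(1+r\rho)}
\;\Longrightarrow\;
Var(\hat\beta) = \frac{\sigma^2(1+r\rho)}{(r+1)\,p_e(1-p_e)}
\;\xrightarrow[r\to\infty]{}\; \frac{\sigma^2\rho}{p_e(1-p_e)}, \\
\lim_{r\to\infty}\pi &= \Phi\!\left[\frac{\sqrt{N}\,|\beta|\,\sqrt{p_e(1-p_e)}}{\sigma\sqrt{\rho}} - z_{1-\alpha/2}\right] < 1
\qquad (\rho > 0).
\end{aligned}
```

Adding measurements averages away only the within-subject part of the variance; the between-subject part, σ²ρ, remains however large r becomes.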

Theoretical Results on effect of ρ, ρ_t0, and ρ_b1 for minimum r for fixed N and π

• Under CMD with CS response and V(t0) = 0, r ↑ as ρ ↑ when

$$(z_\pi + z_{1-\alpha/2})^2 \sigma^2 > N p_e (1 - p_e) \beta_2^2$$

Else r ↓ as ρ ↑

• Under LDD with CS or RS response, V(t0) = 0, and s fixed, r ↓ as ρ ↑

• Under LDD with CS or RS response, V(t0) = 0, and s fixed, r ↑ as ρ_b1 ↑

Designing Longitudinal Studies to Optimize the Number of Subjects and Number of Repeated Measurements: Theoretical results

Literature review

Bloch, D. A. (1986). "Sample size requirements and the cost of a randomized clinical trial with repeated measurements." Stat Med 5(6): 663-7.

Cochran, W. G. (1977). Sampling techniques. 3rd edition. New York, Wiley.

Moerbeek, M., G. J. P. van Breukelen, and M. P. F. Berger (2000). "Design issues for experiments in multilevel populations." Journal of Educational and Behavioral Statistics 25(3): 271-84.

Raudenbush, S. W. (1997). "Statistical analysis and optimal design for cluster randomized trials." Psychol Methods 2(2): 173-85.

Snijders, T. A. B. and R. J. Bosker (1993). "Standard errors and sample sizes for two-level research." Journal of Educational Statistics 18(3): 237-59.

Winkens, B., H. J. A. Schouten, G. J. P. van Breukelen, and M. P. F. Berger (2006). "Optimal number of repeated measures and group sizes in clinical trials with linearly divergent treatment effects." Contemporary Clinical Trials 27: 57-69.

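Problem to solve

$$\kappa = \frac{\text{Cost of 1st measurement}}{\text{Cost of 2nd or following measurement}}$$

Minimize cost subject to a target power:

$$\left.\begin{array}{l} \min_{N,r}\; N c_1 + \dfrac{N r c_1}{\kappa} \\[4pt] \text{subject to } \Phi\!\left[\dfrac{\sqrt{N}\,\mathbf{c}'\mathbf{B}_{H_A}}{\sqrt{\mathbf{c}'\Sigma_{\mathbf{B}}\mathbf{c}}} - z_{1-\alpha/2}\right] \ge \text{Power}_{\text{target}} \end{array}\right\} \equiv \min_{r}\; (\kappa + r)\,\mathbf{c}'\Sigma_{\mathbf{B}}\mathbf{c}$$

Maximize power subject to a fixed budget:

$$\left.\begin{array}{l} \max_{N,r}\; \Phi\!\left[\dfrac{\sqrt{N}\,\mathbf{c}'\mathbf{B}_{H_A}}{\sqrt{\mathbf{c}'\Sigma_{\mathbf{B}}\mathbf{c}}} - z_{1-\alpha/2}\right] \\[4pt] \text{subject to } N c_1 + \dfrac{N r c_1}{\kappa} \le \text{Budget} \end{array}\right\} \equiv \min_{r}\; (\kappa + r)\,\mathbf{c}'\Sigma_{\mathbf{B}}\mathbf{c}$$

• ropt is the same for both problems: minimizing cost subject to fixed power, and maximizing power subject to fixed cost (Appendix 9)

• Nopt will be different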
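The reduction of both problems to a one-dimensional search over r can be sketched numerically. The block below is a minimal illustration, not the authors' program: it assumes the simplest setting (CMD, CS response, time-invariant exposure, $V(t_0) = 0$), for which the per-subject variance of the group-difference estimator with r+1 measures is taken to be $\sigma^2(1 + r\rho) / [(r+1)\,p_e(1-p_e)]$. The function names and the search bound `r_max` are hypothetical.

```python
def variance_cmd_cs(r, rho, sigma2=1.0, pe=0.5):
    """Per-subject (N = 1) variance of the group difference with r+1 measures.

    Assumed setting: CMD, CS response, time-invariant exposure, V(t0) = 0.
    """
    return sigma2 * (1 + r * rho) / ((r + 1) * pe * (1 - pe))


def optimal_r(kappa, rho, r_max=50):
    """Return the r in 0..r_max that minimises (kappa + r) * variance(r)."""
    return min(range(r_max + 1),
               key=lambda r: (kappa + r) * variance_cmd_cs(r, rho))


def participants_for_budget(budget, c1, kappa, r):
    """Largest N affordable when the first measure costs c1 and each of
    the r following measures costs c1 / kappa."""
    return int(budget // (c1 * (1 + r / kappa)))


if __name__ == "__main__":
    for kappa, rho in [(1, 0.5), (5, 0.1), (5, 0.8)]:
        print(f"kappa={kappa}, rho={rho}: r_opt={optimal_r(kappa, rho)}")
```

Under these assumptions the search takes no repeated measures when κ = 1 or when the correlation is high, and several when κ > 1 and the correlation is low.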

Methods

• Results based on a Wald test of the coefficient of interest, based on the GLS estimator.

• Two scenarios:

– Fixed frequency of measurements, s. Increasing r involves increasing the duration of follow‐up, τ.

– Fixed length of follow‐up, τ. Increasing r involves increasing the frequency of measurement, s.

Shape of the group difference

[Figure: four panels of Y against time for the unexposed group, the exposed group, and their difference. A: CMD, V(t0) = 0. B: LDD, V(t0) = 0. C: CMD, V(t0) > 0. D: LDD, V(t0) > 0.]

• CMD: the parameter of interest is the group difference.

• LDD: the parameter of interest is the exposure by time interaction term.

Covariance structures

1. Compound symmetry (CS).

$$\mathrm{Cov}(Y_{ij}, Y_{ij'}) = \sigma^2 \rho, \qquad j \ne j'$$

2. Damped exponential (DEX).

$$\mathrm{Cov}(Y_{ij}, Y_{ij'}) = \sigma^2 \rho^{|j-j'|^{\theta}}, \qquad j \ne j'$$

3. Random intercepts and slopes (RS).

$$\mathrm{Var}(\mathbf{Y}_i \mid \mathbf{X}_i) = \mathbf{Z}_i \mathbf{D} \mathbf{Z}_i' + (1 - \rho_{t_0})\,\sigma^2\,\mathbf{I}, \qquad \mathbf{Z}_i = (\mathbf{1}, \mathbf{t}_i)$$
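As a concrete illustration of the first two structures, the sketch below builds CS and DEX covariance matrices as nested lists. It assumes equally spaced measurements indexed 0..r and reads the DEX covariance as $\sigma^2 \rho^{|j-j'|^\theta}$; the function names are hypothetical. Setting θ = 0 recovers CS and θ = 1 gives AR(1).

```python
def cs_cov(n, sigma2, rho):
    """Compound symmetry: sigma2 on the diagonal, sigma2 * rho elsewhere."""
    return [[sigma2 if j == k else sigma2 * rho for k in range(n)]
            for j in range(n)]


def dex_cov(n, sigma2, rho, theta):
    """Damped exponential: off-diagonal entries sigma2 * rho ** (|j - k| ** theta).

    The diagonal is set explicitly so that theta = 0 does not alter it.
    """
    return [[sigma2 if j == k else sigma2 * rho ** (abs(j - k) ** theta)
             for k in range(n)]
            for j in range(n)]


if __name__ == "__main__":
    for row in dex_cov(4, 1.0, 0.8, 0.5):
        print([round(v, 3) for v in row])
```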

Theoretical results

Comprehensive results for longitudinal studies, under different scenarios:

• Shape of the group difference: CMD or LDD.

• Covariance structure: CS, DEX, RS.

• Whether all participants are observed at the same time points or not (e.g. age is the time variable of interest).

CMD

Under CS:

• If κ = 1, one would take no repeated measures and instead increase the number of participants.

• If κ > 1:

– If correlations are large: still no repeated measures, or just a small number of them.

– If correlations are small: taking some repeated measures and fewer participants is optimal.

If deviations from CS exist, it is advisable to take more repeated measures and fewer participants.

LDD, same set of time points

• Fixed s: If the follow‐up period is not fixed, choose the maximum length of follow‐up possible with ropt=1 (except when RS is assumed, where in most scenarios ropt=2 to 6).

• Fixed τ: If the follow-up period is fixed, one would take more than one repeated measure only when κ > 5. When there are departures from CS, values of κ around 10 or 20 are needed to justify taking 3 or 4 measures.

LDD: Effect of having different time points

Results depend additionally on $V(t_0)$ and the correlation between exposure and time.

Increasing $V(t_0)$ can either increase or decrease ropt, depending on the case. Few patterns appear.

[Figure: ropt against V(t0) for LDD, CS, ρ = 0.2, ρe,t = 0.2, with one line each for κ = 2, 5, 10, 20.]

LDD

– If the follow‐up period is not fixed, choose the maximum length of follow‐up (τ) possible (except when RS is assumed).

– If the follow‐up period is fixed, one would take more than one repeated measure only when the subsequent measures are more than five times cheaper. When there are departures from CS, values of κ around 10 or 20 are needed to justify taking 3 or 4 measures.

LDD

• The optimal (N,r) and the resulting power strongly depend on the correlation structure. Combinations that are optimal for one correlation may be bad for another.

• We recommend performing a sensitivity analysis.

• All the decisions are based on power considerations alone. There might be other reasons to take repeated measures.


Part 2.  Designing longitudinal studies with a time‐varying exposure

Introduction

• RCTs:

– the exposure is time-invariant, or

– exposure varies in a manner that is controlled by design.

• Observational studies:

– the investigator does not control how exposure varies within subjects over time

– a large number of exposure patterns are observed, with large differences in the number of exposed periods per participant and changes in the cross-sectional prevalence of exposure over time.


Introduction

• Motivating example (Medina-Ramon et al. Eur Resp J 2006):

• Study of domestic cleaners, followed for 15 consecutive days.

• Every day, cleaners provided measurements of lung function and reported on cleaning product use (e.g. used bleach yes/no).

• Exposed days per person (bleach):

– Mean: 10

– Range: 1-15

Methods

• We present formulas for study design calculations that address these issues for studies with a continuous outcome and a binary, time-varying exposure.

• We cover studies where the interest is the effect of a time-varying exposure on either:

– the mean levels of the response (main effect of exposure) (CMD), or

– the rate of change of the response over time (exposure by time interaction) (LDD)

• We assume that participants are observed at r+1 equidistant time points.

Literature Review

• Jones, B. and Kenward, M. G. (1989). “Design and analysis of cross‐over trials”. Chapman and Hall, London; New York.

• Julious, S. A. (2004). “Sample sizes for clinical trials with normal data”. Stat Med 23:1921‐86.

• Moerbeek, M., van Breukelen, G. J. P., Berger, M. P. F. (2001). “Optimal experimental designs for multilevel models with covariates”. Communications in Statistics ‐ Theory and Methods 30:2683‐97.

Introduction: Patterns ‐‐ CMD

[Figure: two rows of paired panels plotting Y against time (0 to 4). Left panels: "Envelope" trajectories for exposed and unexposed. Right panels: possible pattern for one subject, with exposure sequence E=0, E=1, E=1, E=0, E=0.]

Notation: Models ‐‐ CMD

• Basic model:

$$\mathbb{E}(Y_{ij} \mid E_{ij}) = \beta_0 + \beta_1 E_{ij} + \beta_2 t_{ij}$$

• One can assume a different intercept for every participant and fit the model by conditional likelihood.

– This model estimates the within‐subject effect of exposure.

– Generalization of the paired t‐test, where every participant serves as his/her own control.

– Equivalent to fitting the model on differences (Appendix B):

$$\mathbb{E}(Y_{i,j+1} - Y_{ij} \mid E_{i,j+1}, E_{ij}) = \beta_2^W + \beta_1^W (E_{i,j+1} - E_{ij})$$

Notation: Intraclass correlation of exposure, ρe

With a CS exposure covariance, this is a measure of:

• the common correlation of exposure for all pairs

• the within-subject variability of exposure

• the imbalance in the number of exposed periods per person, Ei•

When ρe = 1:

• The exposure is time-invariant. There is no within-subject variation of exposure.

• There is maximum imbalance in Ei•: Ei• = 0 or Ei• = r+1.

When ρe = -1/r:

• Maximum within-subject variation of exposure.

• Minimum imbalance in Ei•: everyone is exposed for the same number of periods (“designed study”).

Example: ρe=0.35 for exposure to bleach in the cleaners study (ρx=0.36)
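The two limiting cases can be checked numerically. The sketch below is a simple moment-based estimate, not necessarily the estimator used for the cleaners study: under a CS exposure covariance with constant prevalence, $\mathrm{Var}(E_{i\bullet}) = (r+1)\,p_e(1-p_e)(1 + r\rho_e)$, which can be solved for $\rho_e$. The function name and the toy data are hypothetical.

```python
from statistics import pvariance


def exposure_icc(exposure):
    """Moment-based intraclass correlation of a binary exposure.

    exposure: one 0/1 list per subject, each of length r + 1 (r >= 1).
    Assumes CS exposure covariance and a constant prevalence 0 < pe < 1.
    """
    n = len(exposure[0])                       # r + 1 periods per subject
    r = n - 1
    totals = [sum(row) for row in exposure]    # exposed periods per subject
    pe = sum(totals) / (n * len(exposure))     # overall prevalence
    return (pvariance(totals) / (n * pe * (1 - pe)) - 1) / r


if __name__ == "__main__":
    time_invariant = [[0, 0, 0], [1, 1, 1], [0, 0, 0], [1, 1, 1]]
    designed = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    print(exposure_icc(time_invariant))  # upper limit, rho_e = 1
    print(exposure_icc(designed))        # lower limit, rho_e = -1/r
```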

Notation: Within‐subject exposure correlations

• Correlation between all consecutive pairs of exposure measurements within‐subject:

$$\begin{pmatrix} 1 & \rho_{e_0,e_1} & & \\ \rho_{e_0,e_1} & 1 & \ddots & \\ & \ddots & \ddots & \rho_{e_{r-1},e_r} \\ & & \rho_{e_{r-1},e_r} & 1 \end{pmatrix}$$

• First order autocorrelation of exposure, $\rho_{e1}$

Results: CS Response, $V(t_0) = 0$

Model: $\mathbb{E}(Y_{ij} \mid \mathbf{X}_i) = \mathbb{E}(Y_{ij} \mid E_{ij}) = \beta_0 + \beta_1 E_{ij}$

$$\mathrm{Var}(\hat\beta_1) = \frac{\sigma^2 (1-\rho)(1+r\rho)}{p_e (1-p_e)\,(r+1)\left[1 - \rho + r\rho\,(1-\rho_e)\right]}$$

Model: $\mathbb{E}(Y_{i,j+1} - Y_{ij} \mid E_{i,j+1}, E_{ij}) = \beta_1^W (E_{i,j+1} - E_{ij})$

$$\mathrm{Var}(\hat\beta_1^W) = \frac{\sigma^2 (1-\rho)}{p_e (1-p_e)\, r\, (1-\rho_e)}$$

Results: Required Input Parameters

To compute $\mathrm{Var}(\hat\beta_1)$ we need:

• If CS response: $p_e$ or $p_{ej}\ \forall j$, and $\rho_e$ (Appendix C)

• If AR(1) response: $p_{ej}\ \forall j$ and $\rho_{e1}$ (Appendix E)

• If DEX response, 0 < θ < 1:

– If CS exposure: $p_{ej}\ \forall j$ and $\rho_e$

– If exposure not CS: $p_{ej}\ \forall j$ and $\rho_{e_j,e_{j'}}\ \forall j,j'$

Assumes within‐subjects model or $V(t_0) = 0$

Results: Required input parameters for DEX Response and Exposure not CS

• Need $\rho_{e_j,e_{j'}}\ \forall j,j'$

• Since $\rho_e$ can be viewed as an average of all the exposure correlations, one can expect that assuming $\rho_{e_j,e_{j'}} = \rho_e\ \forall j,j'$ would produce reasonable estimates of $\mathrm{Var}(\hat\beta_1)$, even if the actual covariance matrix of exposure does not follow a CS structure.

• Evaluated in 10,000 arbitrary exposure correlation matrices and prevalence vectors

Presentation notes: All four models; θ = 0.2, 0.5, 0.8, 1; ρ = 0.8, 0.5, 0.2; r = 2, 5, 10; all possible prevalence vectors.

Results: Accuracy of approximations

• Large underestimations when:

– $\rho_{e1}$ is much larger than $\rho_e$.

– Close to AR(1) covariance of the response.

• In those cases, conservative (large) values of ρe are recommended.

Results: Efficiency

Under CS, probably DEX, and, more generally, when $\nu_{jj'} < 0\ \forall j \ne j'$ (Appendix F):

• $\mathrm{Var}(\hat\beta_1)$ is maximum when ρe = 1 (time-invariant exposure).

• $\mathrm{Var}(\hat\beta_1)$ is minimum when ρe = -1/r (maximum within-subject variation of exposure).

Having within-subject variation in exposure improves efficiency.

Efficiency with CS response

$$\mathrm{SSR} = \frac{N_{\rho_e = 1}}{N_{\rho_e}} = \frac{N_{\text{time-invariant}}}{N_{\text{time-varying}}}$$

With CS response, formulas for time-invariant exposure will always over-estimate the required sample size, especially with large r and large ρ.

[Figure: SSR against ρe (0 to 1) for response ~ CS, ρ = 0.8, with one line each for r = 1, 5, 10, 20.]
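A closed form for the SSR under CS can be sketched as follows. It assumes the CS-response variance $\sigma^2(1-\rho)(1+r\rho) / \{p_e(1-p_e)(r+1)[1-\rho+r\rho(1-\rho_e)]\}$ with a CS exposure process and constant prevalence; taking the ratio of this variance at $\rho_e = 1$ to its value at a general $\rho_e$ gives $\mathrm{SSR} = 1 + r\rho(1-\rho_e)/(1-\rho)$. The function name is hypothetical.

```python
def ssr_cs(r, rho, rho_e):
    """Sample size ratio N(time-invariant) / N(time-varying).

    Assumes CS response, CS exposure, constant prevalence and V(t0) = 0.
    """
    return 1 + r * rho * (1 - rho_e) / (1 - rho)


if __name__ == "__main__":
    for r in (1, 5, 10, 20):
        print(r, [round(ssr_cs(r, 0.8, rho_e), 1) for rho_e in (0.0, 0.5, 1.0)])
```

The ratio equals 1 only when the exposure is time-invariant and grows with both r and ρ, in line with the statement that time-invariant formulas over-estimate the required sample size.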

Efficiency with DEX response

With DEX response, formulas for time-invariant exposure can over-estimate or under-estimate the required sample size, especially with large r and large ρ.

[Figure: SSR as a function of θ, assuming CS for the exposure process, for r=5 and $p_e = 0.2$. Lines indicate: (solid) ρ=.2, (dashed) ρ=.5, (dotted) ρ=.8.]

Example 1: Respiratory effects of exposure to cleaning products

• Exposure to bleach: r+1=15, ρe=0.35,

• Required sample size to detect a difference of 10 L·min⁻¹ with 80% power (~3% difference):

– Time-invariant: N = 1387

– Time-varying: N = 24

– SSR = 58

Examples 2-3. These examples are based on Medina-Ramon et al.’s study of respiratory function in relation to exposure to cleaning products/tasks, where peak expiratory flow is the response and use of air fresheners is the exposure.

$$\mathbb{E}(Y_{ij} \mid E_{ij}) = \beta_0 + \beta_1 E_{ij} + \beta_2 t_{ij}$$

is the assumed model, with $V(t_0) = 0$.

More information can be found in the user’s manual at http://www.hsph.harvard.edu/faculty/spiegelman/optitxs.html


Example 2: Sample size calculation

What is the sample size (N) needed to detect a 10 L/min decrease in PEF with 14 post-baseline repeated measures (r=14) assuming CS response?

> long.N.tv()
Enter the number of post-baseline measures (r): 14
Enter the desired power (0<Pi<1): .9
Enter the time between repeated measures (s): 1
Do you want to base the calculations on a model with a main effect of exposure (1) or a model that separates the between- and within-subjects effects of exposure (2)? 1
Will you specify the alternative hypothesis on the absolute (beta coefficient) scale (1) or the relative (percent) scale (2)? 1
The alternative hypothesis will be specified on the absolute (beta coefficient) change scale.
Enter the difference between exposed and unexposed periods (beta1): 10
Which residual covariance matrix of the response are you assuming: compound symmetry (1) or damped exponential (2) ? 1
You are assuming CS covariance of the response
Enter the residual variance of the response given the assumed model covariates (sigma2): 4686
Enter the correlation between two measures of the same subject (rho): .88
Enter the mean prevalence of exposure (mean.pe): .17
Enter the intraclass correlation of exposure (-1/14 <= rho_e <= 1): .59
Sample size = 72
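The session's answer can be cross-checked by hand. The sketch below is not the program's source; it assumes a two-sided Wald test at α = 0.05, a CS exposure process with constant prevalence, and the CS-response variance for the main-effect model, $\sigma^2(1-\rho)(1+r\rho) / \{p_e(1-p_e)(r+1)[1-\rho+r\rho(1-\rho_e)]\}$ per subject. The function names are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def var_beta1_cs(r, sigma2, rho, pe, rho_e):
    """Per-subject (N = 1) variance of the exposure main effect under
    CS response and CS exposure (assumed form, see lead-in)."""
    return (sigma2 * (1 - rho) * (1 + r * rho)
            / (pe * (1 - pe) * (r + 1) * (1 - rho + r * rho * (1 - rho_e))))


def sample_size(beta1, power, var1, alpha=0.05):
    """Smallest N such that the two-sided Wald test reaches the target power."""
    z = NormalDist().inv_cdf
    return ceil((z(1 - alpha / 2) + z(power)) ** 2 * var1 / beta1 ** 2)


if __name__ == "__main__":
    v = var_beta1_cs(r=14, sigma2=4686, rho=0.88, pe=0.17, rho_e=0.59)
    print(sample_size(beta1=10, power=0.9, var1=v))
```

With the inputs entered in the session, this calculation agrees with the reported sample size of 72.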

Example 3: Power calculation

This example is based on Medina-Ramon et al.’s study of the respiratory effects of exposure to cleaning products, with peak expiratory flow as the response and use of air fresheners as the exposure.

$$\mathbb{E}(Y_{ij} \mid E_{ij}) = \beta_0 + \beta_1 E_{ij} + \beta_2 t_{ij}, \qquad t = 0, 1, \ldots, 14, \qquad V(t_0) = 0$$

What is the power to detect a 10 L/min decrease in PEF in relation to a day of exposure to air fresheners, in a study of 31 participants and 14 post-baseline repeated measures, assuming CS response?

> long.power.tv()
Enter the total sample size (N): 31
Enter the number of post-baseline measures (r): 14
Enter the time between repeated measures (s): 1
Do you want to base the calculations on a model with a main effect of exposure (1) or a model that separates the between- and within-subjects effects of exposure (2)? 2
Will you specify the alternative hypothesis on the absolute (beta coefficient) scale (1) or the relative (percent) scale (2)? 1
The alternative hypothesis will be specified on the absolute (beta coefficient) change scale.
Enter the difference between exposed and unexposed periods (beta1): 10
Which residual covariance matrix of the response are you assuming: compound symmetry (1) or damped exponential (2) ? 1
You are assuming CS covariance of the response
Enter the residual variance of the response given the assumed model covariates (sigma2): 4686
Enter the correlation between two measures of the same subject (rho): .88
Enter the mean prevalence of exposure (mean.pe): .37
Enter the intraclass correlation of exposure (-1/14 <= rho_e <= 1): .13
Power = 0.9770487
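The reported power can also be approximated by hand. The sketch below is not the program's source; it assumes a two-sided Wald test at α = 0.05 (ignoring the negligible opposite tail) and the within-subject variance under CS response and CS exposure, $\sigma^2(1-\rho) / [p_e(1-p_e)\,r\,(1-\rho_e)]$ per subject. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def var_beta1_within_cs(r, sigma2, rho, pe, rho_e):
    """Per-subject (N = 1) variance of the within-subject exposure effect
    under CS response and CS exposure (assumed form, see lead-in)."""
    return sigma2 * (1 - rho) / (pe * (1 - pe) * r * (1 - rho_e))


def wald_power(n, beta1, var1, alpha=0.05):
    """Approximate power of the two-sided Wald test with n participants."""
    nd = NormalDist()
    return nd.cdf(sqrt(n) * abs(beta1) / sqrt(var1) - nd.inv_cdf(1 - alpha / 2))


if __name__ == "__main__":
    v = var_beta1_within_cs(r=14, sigma2=4686, rho=0.88, pe=0.37, rho_e=0.13)
    print(wald_power(n=31, beta1=10, var1=v))
```

With the inputs entered in the session, the result is about 0.977, in agreement with the program's output.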

Time‐varying exposure: LDD

• When generalizing the LDD setting to the case of a time‐varying exposure, we distinguish between two cases:

– The effect of exposure on the response is cumulative:

$$\mathbb{E}(Y_{ij} \mid \mathbf{X}_i) = \gamma_0 + \gamma_1 t_{ij} + \gamma_2 E^*_{ij}, \qquad E^*_{ij} = \sum_{k \le j} E_{ik}$$

– The effect of exposure on the response is acute:

$$\mathbb{E}(Y_{ij} \mid \mathbf{X}_i) = \gamma_0 + \gamma_1 t_{ij} + \gamma_2 E_{ij} + \gamma_3 (E_{ij} \times t_{ij})$$
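The difference between the two cases is easiest to see in the covariates each model uses. The sketch below assumes the cumulative exposure is the running sum $E^*_{ij} = \sum_{k \le j} E_{ik}$ and that time is measured in units of the measurement interval (t = 0, 1, ..., r); the function names are hypothetical.

```python
from itertools import accumulate


def cumulative_exposure(e):
    """Running number of exposed periods up to and including period j."""
    return list(accumulate(e))


def acute_terms(e):
    """(t, E, E * t) covariates of the acute-effect model for one subject."""
    return [(t, x, x * t) for t, x in enumerate(e)]


if __name__ == "__main__":
    subject = [0, 1, 1, 0, 1]
    print(cumulative_exposure(subject))
    print(acute_terms(subject))
```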


Literature Review – LDD with time-varying covariate

Time‐varying exposure: LDD

Cumulative exposure trajectories

[Figure: paired panels plotting Y against time (0 to 4). Left panel: "Envelope" trajectories for exposed and unexposed. Right panel: possible pattern for one subject, with exposure sequence E=0, E=1, E=1, E=0, E=1.]

LDD: Cumulative Exposure Effect Model

Model: $\mathbb{E}(Y_{ij} \mid \mathbf{X}_i) = \gamma_0 + \gamma_1 t_{ij} + \gamma_e E^*_{ij}$, with $E^*_{ij} = \sum_{k \le j} E_{ik}$

To exactly compute $\mathrm{Var}(\hat\gamma_e)$, we need:

• $p_{ej}\ \forall j$

• $\rho_{e_j,e_{j'}}\ \forall j,j'$, or equivalently, $\mathbb{E}[E_j E_{j'}]\ \forall j,j'$

• $\mathbb{E}[t_0]$ and $V(t_0)$

• $\mathbb{E}[E_j t_0]\ \forall j$, or equivalently, $\rho_{e_j,t_0}\ \forall j$

• $\mathbb{E}(E^*_{-1})$ and $V(E^*_{-1})$

• $\mathbb{E}[E^*_{-1} E_j]\ \forall j$, or equivalently, $\rho_{E^*_{-1},e_j}\ \forall j$

• $\mathbb{E}[E^*_{-1} t_0]$, or equivalently, $\rho_{E^*_{-1},t_0}$

Presentation notes: The last 5 are needed only if $V(t_0) > 0$ or $V(E^*_{-1}) > 0$.

LDD: Cumulative Exposure Effect Model

Model: $\mathbb{E}(Y_{i,j+1} - Y_{i,j} \mid \mathbf{X}_i) = \gamma_t^W + \gamma_e^W (E^*_{i,j+1} - E^*_{i,j}) = \gamma_t^W + \gamma_e^W E_{i,j+1}$

• To exactly compute $\mathrm{Var}(\hat\gamma_e)$, we need:

– $p_{ej}\ \forall j$

– $\rho_{e_j,e_{j'}}\ \forall j,j'$, or equivalently, $\mathbb{E}[E_j E_{j'}]\ \forall j,j'$

• Less efficient, fewer input parameters needed, more valid (between-subjects confounding eliminated)

Results: Cumulative Exposure Effect Model

Efficiency

Once the exposure prevalence is fixed, if $w_{jj'} \ge 0\ \forall j \ne j'$ then:

• $\mathrm{Var}(\hat\gamma_e)$ is minimum when ρe = 1 (time-invariant exposure).

• $\mathrm{Var}(\hat\gamma_e)$ is maximum when ρe = -1/r (maximum within-subject variation of exposure).

Having within-subject variation in exposure produces a loss in efficiency.

Results: Cumulative Exposure Effect Model

Efficiency

If both the response and the exposure process have CS covariance then

$$\mathrm{Var}(\hat\gamma_e) = \frac{12\,\sigma^2 (1-\rho)}{p_e (1-p_e)\, s^2\, r\,(r+2)\left[2 + \rho_e (r-1)\right]}$$

$$\mathrm{SSR} = \frac{N_{\rho_e = 1}}{N_{\rho_e}} = \frac{N_{\text{time-invariant}}}{N_{\text{time-varying}}}$$

[Figure: SSR against ρe (0 to 1) for CS, ρ = 0.8, with one line each for r = 2, 5, 10.]
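For the cumulative model the SSR falls below 1, which can be sketched directly. The block below assumes the CS formula is read as $\mathrm{Var}(\hat\gamma_e) = 12\sigma^2(1-\rho) / \{p_e(1-p_e)\,s^2\,r(r+2)[2+\rho_e(r-1)]\}$ per subject, so that the ratio of the variance at $\rho_e = 1$ to the variance at a general $\rho_e$ is $[2+\rho_e(r-1)]/(r+1)$. The function names are hypothetical.

```python
def var_gamma_e_cs(r, s, sigma2, rho, pe, rho_e):
    """Per-subject (N = 1) variance of the cumulative exposure effect under
    CS response and CS exposure (assumed reading of the formula above)."""
    return (12 * sigma2 * (1 - rho)
            / (pe * (1 - pe) * s ** 2 * r * (r + 2) * (2 + rho_e * (r - 1))))


def ssr_cumulative(r, rho_e):
    """Sample size ratio N(time-invariant) / N(time-varying)."""
    return (2 + rho_e * (r - 1)) / (r + 1)


if __name__ == "__main__":
    for r in (2, 5, 10):
        print(r, [round(ssr_cumulative(r, rho_e), 2) for rho_e in (0.0, 0.5, 1.0)])
```

Under this reading, formulas for time-invariant exposure under-estimate the required sample size whenever ρe < 1 and r > 1, the opposite of the CMD case.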

Results: LDD: Cumulative Exposure Effect Model

• DEX response: The required N increases as θ increases

• RS response: The required N increases as $\rho_{b_1}$ increases

• CS, DEX response: The required N increases as ρe decreases

LDD: Acute exposure trajectories

Acute, transient, exposure effect

[Figure: two rows of paired panels plotting Y against time (0 to 4). Left panels: "Envelope" trajectories for exposed and unexposed. Right panels: possible pattern for one subject, with exposure sequences E=0, E=0, E=1, E=1, E=0 (first row) and E=1, E=1, E=1, E=0, E=0 (second row).]

Results: Acute Exposure Effect Model

Model: $\mathbb{E}(Y_{ij} \mid \mathbf{X}_i) = \gamma_0 + \gamma_1 t_{ij} + \gamma_2 E_{ij} + \gamma_3 (E_{ij} \times t_{ij})$

To exactly compute $\mathrm{Var}(\hat\gamma_3)$, the following quantities need to be provided:

• $p_{ej}\ \forall j$

• $\rho_{e_j,e_{j'}}\ \forall j,j'$, or equivalently, $\mathbb{E}[E_j E_{j'}]\ \forall j,j'$

• $V(t_0)$

• $\mathbb{E}[E_j t_0]\ \forall j$, or equivalently, $\rho_{e_j,t_0}\ \forall j$

• $\mathbb{E}[E_j E_{j'} t_0]\ \forall j,j'$

• $\mathbb{E}[E_j E_{j'} t_0^2]\ \forall j,j'$

One possibility is to do the calculations for $V(t_0) = 0$, in which case only $p_{ej}\ \forall j$ and $\rho_{e_j,e_{j'}}\ \forall j,j'$ are required. This will provide conservative study designs.

Results: Acute Exposure Effect Model

Efficiency

$$\mathrm{SSR} = \frac{N_{\rho_e = 1}}{N_{\rho_e}} = \frac{N_{\text{time-invariant}}}{N_{\text{time-varying}}}$$

[Figure: SSR against ρe (0 to 1) in two panels, ρ = 0.8, θ = 0 and ρ = 0.8, θ = 0.5, with one line each for r = 1, 2, 5, 10.]

Results: Accuracy of approximations

SSR in 10,000 arbitrary correlation matrices of exposure.

• Need $p_{ej}\ \forall j$ and $\rho_{e_j,e_{j'}}\ \forall j,j'$, or, equivalently, $\mathbb{E}[E_j E_{j'}]\ \forall j,j'$.

• Assume CS exposure and provide ρe.

$$\mathrm{SSR} = \frac{N_{CS}}{N_{true}}$$

[Figure: distribution of SSR (vertical axis, 0.7 to 1.4) for Model 2.5 (cumulative) and Model 2.6 (acute), each under five response covariances: CS (ρ = 0.8); AR(1) (ρ = 0.8); AR(1) (ρ = 0.2); RS with ρb1 = 0.8 (ρ = 0.8); RS with ρb1 = 0.2 (ρ = 0.5).]

Results: Accuracy of approximations

• Low SSR when:

– Cumulative: small, negative correlations for pairs of time points close in time, and large, positive correlations for pairs of time points far apart.

– Acute: high correlations for pairs of time points that are both at either the beginning or the end of the study, with the remaining correlations negative.

Example. Medina-Ramon’s study of the respiratory effects of exposure to cleaning products

Here, we compute the required sample size for a study modeled on the original one (31 participants and 14 post-baseline measures) to detect a 5 L/min/day decrease in PEF associated with the use of air-freshener sprays with 90% power, assuming a DEX covariance structure of the response.

We assume the rates of change vary by exposure and a cumulative exposure effect, using the model

$$\mathbb{E}(Y_{i,j+1} - Y_{i,j} \mid \mathbf{X}_i) = \gamma_t^W + \gamma_e^W (E^*_{i,j+1} - E^*_{i,j}) = \gamma_t^W + \gamma_e^W E_{i,j+1}$$

which is equivalent to the model

$$\mathbb{E}(Y_{ij} \mid \mathbf{X}_i) = \gamma_0 + \gamma_1 t_{ij} + \gamma_e E^*_{ij}$$

when there is no between-subjects confounding.

> long.N()
* By just pressing <Enter> after each question, the default value, shown between square brackets, will be entered.
* Press <Esc> to quit
Enter the number of post-baseline measures (r) [1]: 14
Enter the desired power (0<Pi<1) [0.8]: .9
Enter the time between repeated measures (s) [1]: 1
Is the exposure time-invariant (1) or time-varying (2) [1]? 2
Do you assume that the exposure prevalence is constant over time (1), that it changes linearly with time (2), or you want to enter the prevalence at each time point (3) [1]? 2
Enter the exposure prevalence at time 0 (0<pe0<1) [0.5]: .35
Enter the exposure prevalence at time 14 (0<pe14<1) [0.5]: .45
Enter the intraclass correlation of exposure (-0.071<rho.e<0.808) [0.5]: .13
Constant mean difference (1) or Linearly divergent difference (2) [1]: 2
Which model are you basing your calculations on:
(1) Cumulative exposure effect model. No separation of between- and within-subject effects
(2) Cumulative exposure effect model. Within-subject contrast only
(3) Acute exposure effect model. No separation of between- and within-subject effects
(4) Acute exposure effect model. Within-subject contrast only
Model [1]: 2

Will you specify the alternative hypothesis on the absolute (beta coefficient) scale (1) or the relative (percent) scale (2) [1]? 1
Enter the interaction coefficient (gamma3) [0.1]: 5
Which covariance matrix are you assuming: compound symmetry (1), damped exponential (2) or random slopes (3) [1]? 2
Enter the residual variance of the response given the assumed model covariates (sigma2) [1]: 4570
Enter the correlation between two measures of the same subject separated by one time unit (0<rho<1) [0.8]: .88
Enter the damping coefficient (theta) [0.5]: .12
Sample size = 28
Do you want to continue using the program (y/n) [y]? n

Future work

• Include dropout

– For sample size calculations, simply inflate the sample size by a factor of 1/(1-f).

– However, dropout can alter the relationship between N and r.

• Binary outcomes

• Continuous exposures

• Optimal (N,r) for time-varying exposures

• Other ideas?
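The dropout adjustment is a one-line calculation. The sketch below assumes that f denotes the expected fraction of participants lost to follow-up, which the slide does not define; the function name is hypothetical.

```python
from math import ceil


def inflate_for_dropout(n, f):
    """Inflate a computed sample size n by 1 / (1 - f), rounding up.

    f is taken to be the expected dropout fraction, 0 <= f < 1 (assumption).
    """
    if not 0 <= f < 1:
        raise ValueError("f must satisfy 0 <= f < 1")
    return ceil(n / (1 - f))


if __name__ == "__main__":
    print(inflate_for_dropout(72, 0.15))
```

As the slide notes, this simple inflation does not account for the way dropout can change the trade-off between N and r.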

For further reading

Basagaña X, Spiegelman D. “The design of observational longitudinal studies with a time-invariant exposure”. Submitted for publication.

Basagaña X, Spiegelman D. “Power and sample size calculations for longitudinal studies estimating a main effect of a time-varying exposure”. Submitted for publication.

Basagaña X, Spiegelman D. “Power and sample size calculations for longitudinal studies comparing rates of change with a time-varying exposure”. Submitted for publication.

All three can be found at http://www.hsph.harvard.edu/faculty/spiegelman/optitxs.html

The bottom line

To accurately design an expensive, long-term longitudinal study, it is best to first conduct a pilot study with r=2, with sufficient sample size to accurately estimate all necessary parameters given the assumed model. Then, design the second phase of the study.

When time and/or funding do not permit a pilot study, sensitivity analysis, including plausible worst case scenarios, is suggested.

Our program will be helpful:

http://www.hsph.harvard.edu/faculty/spiegelman/optitxs.html


Thanks for your attention