A copula-based Simulation Method for Clustered Multi-State Survival Data

F. Rotolo•*, C. Legrand*, I. Van Keilegom*, M. Chiogna•
• Dipartimento di Scienze Statistiche, Università degli Studi di Padova
* Institut de Statistique, Biostatistique et Sciences Actuarielles, Université Catholique de Louvain
September 23, 2011

description

Generating survival data with a clustered and multi-state structure is useful to study Multi-State Models, Competing Risks Models and Frailty Models. The simulation of such data is not straightforward, as one needs to introduce dependence between the times of different transitions while controlling the probability of each competing event, the median sojourn time in each state, the effect of covariates and the type and magnitude of heterogeneity.

Here we propose a simulation procedure based on Clayton copulas for the joint distribution of the times of each block of competing events. It allows one to specify the marginal distributions of the time variables, while their dependence is induced by the copula. Furthermore, even though dependence is obtained between all the time variables, only some joint distributions have to be handled. The simulation parameters are chosen by numerical minimization of a criterion function based on the ratio of target and observed values of median times and of probabilities of competing events.

The proposed method further allows one to simulate discrete and continuous covariates and to specify their effect on each transition in a proportional hazards way. A frailty term can also be added in order to provide clustering. No particular restriction is needed on the covariate distributions, the frailty distribution, or the number and sizes of clusters.

An example is provided, simulating data mimicking those from an Italian multi-center study on head and neck cancer. The multi-state structure of these data arises from the interest in studying both time to local relapses and to distant metastases before death. We show that our proposed method reaches very good convergence to the target values.

Transcript of A copula-based Simulation Method for Clustered Multi-State Survival Data

Page 1: A copula-based Simulation Method for Clustered Multi-State Survival Data

A copula-based simulation method for clustered multi-state survival data

F. Rotolo•*, C. Legrand*, I. Van Keilegom*, M. Chiogna•

• Dipartimento di Scienze Statistiche, Università degli Studi di Padova

* Institut de Statistique, Biostatistique et Sciences Actuarielles, Université Catholique de Louvain

September 23, 2011

Page 2: A copula-based Simulation Method for Clustered Multi-State Survival Data

Clustered Multi-State Survival Data F. Rotolo

Survival Data

Time from an origin event until an event of interest.
Example: from birth to death, from the beginning of therapy until remission, etc.

[Figure: time axis from 0 to 5 with an event observed at T = 5]

Censoring: some event times cannot be observed, the only available information being a lower bound.
Example: migration, change of therapy, loss to follow-up, etc.

[Figure: time axis from 0 to 5 with an observation censored at 3.25, so that T > 3.25]

A copula-based simulation method for clustered multi-state survival data 2/ 22

Page 4: A copula-based Simulation Method for Clustered Multi-State Survival Data

Modeling Survival Data

Because of this peculiarity, instead of modeling the density $f(t)$ of $T$, the hazard is considered

$$h(t) = \lim_{\Delta t \searrow 0} \frac{P[t \le T < t + \Delta t \mid T \ge t]}{\Delta t} = \frac{f(t)}{S(t)} = -\frac{d}{dt} \log S(t),$$

with $S(t) = \int_t^\infty f(u)\,du = P[T > t]$.

Note: $S(t) = \exp\{-\int_0^t h(u)\,du\}$.

The basic regression model for the hazard is the Proportional Hazards (PH) Model (Cox, 1972)

$$h(t \mid X) = h_0(t) \exp\{\beta' X\}.$$
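The relation between hazard and survival can be checked numerically. The sketch below is only an illustration: it assumes a Weibull hazard with made-up parameter values (`lam`, `rho`, `t` and `hazard_ratio` are not taken from the slides), integrates the hazard with a simple midpoint rule, and compares the result with the closed-form survival. It also shows that multiplying the hazard by a constant factor, as in the PH model, raises the survival function to that power.

```python
import math

def weibull_hazard(t, lam, rho):
    """Hazard of a Weibull time with S(t) = exp(-lam * t**rho)."""
    return lam * rho * t ** (rho - 1.0)

def survival_from_hazard(hazard, t, n_steps=20_000):
    """Approximate S(t) = exp(-int_0^t h(u) du) with the midpoint rule."""
    step = t / n_steps
    cumulative = step * sum(hazard((k + 0.5) * step) for k in range(n_steps))
    return math.exp(-cumulative)

lam, rho, t = 0.05, 1.5, 6.0      # illustrative values, not from the slides
hazard_ratio = math.exp(0.7)      # exp(beta' X) for a hypothetical covariate profile

# Survival obtained by integrating the hazard versus the closed form.
s_numeric = survival_from_hazard(lambda u: weibull_hazard(u, lam, rho), t)
s_closed = math.exp(-lam * t ** rho)

# Under proportional hazards the survival function is raised to the hazard ratio.
s_ph = survival_from_hazard(lambda u: hazard_ratio * weibull_hazard(u, lam, rho), t)

print(s_numeric, s_closed)
print(s_ph, s_closed ** hazard_ratio)
```

The two numbers printed on each line should agree up to the integration error.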

Page 6: A copula-based Simulation Method for Clustered Multi-State Survival Data

Survival Models

Extensions of the Cox model have been developed.

Frailty Models (FMs) account for overdispersion or clustering by means of random effects

$$h(t \mid X_{ij}) = h_0(t)\, Z_i\, e^{\beta' X_{ij}},$$

similar to a GLMM

$$\log[h(t \mid X_{ij})] = \log[h_0(t)] + W_i + \beta' X_{ij},$$

with $Z_i = e^{W_i}$ (Duchateau & Janssen, 2008; Wienke, 2010).

Multi-State Models (MSMs) consider several events and their interactions (Putter et al., 2007; de Wreede et al., 2010).

[Diagram: states NED, LR, DM and De, with transition times T1 (NED → LR), T2 (NED → DM), T3 (NED → De), T4 (LR → De) and T5 (DM → De)]

Possible integration? → Simulation studies

Page 10: A copula-based Simulation Method for Clustered Multi-State Survival Data

Simulation of data

A simulation method should be able to generate

- the dependence of times of competing events
- the dependence of times of subsequent events
- the dependence between clustered observations
- the censoring due to the occurrence of competing events
- the censoring due to the end of the study or loss to follow-up
- the event-specific covariate effects

[Diagram: states NED, LR, DM and De]

Page 16: A copula-based Simulation Method for Clustered Multi-State Survival Data

Simulation Algorithm F. Rotolo

Outline

Clustered Multi-State Survival Data

Simulation Algorithm

Clustering

Choice of Parameters

Example

Page 17: A copula-based Simulation Method for Clustered Multi-State Survival Data

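Copula Model

[Diagram: first transitions from NED, with times T1 (to LR), T2 (to DM) and T3 (to De)]

- Marginal survival functions freely chosen: $S_1(t)$, $S_2(t)$ and $S_3(t)$

- Joint survival function by Clayton copula, with $t = (t_1, t_2, t_3)$

$$S_{123}(t) = \left( \sum_{i=1}^{3} S_i(t_i)^{-\theta} - 2 \right)^{-1/\theta}$$

- Conditional survivals from the joint

$$S_{2|1}(t_2 \mid t_1) = \left[ 1 + \left( \frac{S_1(t_1)}{S_2(t_2)} \right)^{\theta} - S_1(t_1)^{\theta} \right]^{-1/\theta - 1}$$

$$S_{3|12}(t_3 \mid t_1, t_2) = \left( 1 + \frac{S_3(t_3)^{-\theta} - 1}{S_1(t_1)^{-\theta} + S_2(t_2)^{-\theta} - 1} \right)^{-1/\theta - 2}$$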
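As a worked step, the first conditional survival can be recovered from the bivariate margin of the Clayton model (obtained by setting $S_3 = 1$ in the joint survival). The shorthand $u = S_1(t_1)$, $v = S_2(t_2)$ is introduced here only for readability and does not appear on the slides.

```latex
% Shorthand (introduced here): u = S_1(t_1), v = S_2(t_2)
\begin{align*}
S_{12}(t_1, t_2) &= \left( u^{-\theta} + v^{-\theta} - 1 \right)^{-1/\theta}, \\
S_{2|1}(t_2 \mid t_1)
  &= \frac{\partial S_{12}(t_1, t_2) / \partial t_1}{\partial S_1(t_1) / \partial t_1}
   = u^{-\theta - 1} \left( u^{-\theta} + v^{-\theta} - 1 \right)^{-1/\theta - 1} \\
  &= \left[ u^{\theta} \left( u^{-\theta} + v^{-\theta} - 1 \right) \right]^{-1/\theta - 1}
   = \left[ 1 + \left( \frac{u}{v} \right)^{\theta} - u^{\theta} \right]^{-1/\theta - 1}.
\end{align*}
```

The third line uses $u^{-\theta - 1} = (u^{\theta})^{-1/\theta - 1}$. The same argument with a second-order mixed derivative gives $S_{3|12}$.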

Page 21: A copula-based Simulation Method for Clustered Multi-State Survival Data

Algorithm

Data from the copula model (Kpanzou, 2007) are simulated as follows:

1. $T_1 = S_1^{-1}(U_1)$

2. $T_2 \mid t_1 = S_{2|1}^{-1}(U_2 \mid t_1) = S_2^{-1}\left( \left\{ \left[ U_2^{-\theta/(1+\theta)} - 1 \right] S_1(t_1)^{-\theta} + 1 \right\}^{-1/\theta} \right)$

3. $T_3 \mid t_1, t_2 = S_{3|12}^{-1}(U_3 \mid t_1, t_2) = S_3^{-1}\left( \left\{ \left[ U_3^{-\theta/(1+2\theta)} - 1 \right] \left[ S_1(t_1)^{-\theta} + S_2(t_2)^{-\theta} - 1 \right] + 1 \right\}^{-1/\theta} \right)$

C. $T_C = F_C^{-1}(U_C)$

T. $\min(T_C, T_1, T_2, T_3)$

with $U_1, U_2, U_3, U_C$ i.i.d. $U(0, 1)$.
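The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes Weibull margins of the form $S(t) = \exp(-\lambda t^{\rho})$ and exponential censoring, and all function names, the seed and the value of `theta` are assumptions made for the example. The scale and shape values are those reported later in the transcript for the first transitions.

```python
import math
import random

def weibull_inv_surv(s, lam, rho):
    """Inverse of S(t) = exp(-lam * t**rho): the time at which survival equals s."""
    return (-math.log(s) / lam) ** (1.0 / rho)

def cond_surv_2(s2, s1, theta):
    """S_{2|1} written in terms of the marginal survival values s1 and s2."""
    return (1.0 + (s1 / s2) ** theta - s1 ** theta) ** (-1.0 / theta - 1.0)

def inv_cond_2(u2, s1, theta):
    """Step 2: the value of S_2(t2) that solves S_{2|1}(t2 | t1) = u2."""
    return ((u2 ** (-theta / (1.0 + theta)) - 1.0) * s1 ** (-theta) + 1.0) ** (-1.0 / theta)

def inv_cond_3(u3, s1, s2, theta):
    """Step 3: the value of S_3(t3) that solves S_{3|12}(t3 | t1, t2) = u3."""
    a = s1 ** (-theta) + s2 ** (-theta) - 1.0
    return ((u3 ** (-theta / (1.0 + 2.0 * theta)) - 1.0) * a + 1.0) ** (-1.0 / theta)

def simulate_subject(margins, lam_c, theta, rng):
    """One subject: latent times T1, T2, T3, censoring time TC and the observed minimum."""
    u1, u2, u3, uc = (1.0 - rng.random() for _ in range(4))  # uniforms on (0, 1]
    s1 = u1                                   # step 1 on the survival scale
    s2 = inv_cond_2(u2, s1, theta)            # step 2
    s3 = inv_cond_3(u3, s1, s2, theta)        # step 3
    times = {f"T{i + 1}": weibull_inv_surv(s, lam, rho)
             for i, (s, (lam, rho)) in enumerate(zip((s1, s2, s3), margins))}
    # Step C with exponential censoring; uc and 1 - uc have the same distribution.
    times["TC"] = -math.log(uc) / lam_c
    label = min(times, key=times.get)         # step T: first event or censoring
    return times, label, times[label]

margins = [(0.276, 0.851), (0.019, 1.076), (0.013, 0.569)]  # (lambda_i, rho_i), i = 1, 2, 3
lam_c = 0.031
theta = 1.0                                   # assumed value, not given in the transcript
rng = random.Random(2011)
sample = [simulate_subject(margins, lam_c, theta, rng) for _ in range(5)]
for times, label, observed in sample:
    print(label, round(observed, 2))
```

Working on the survival-probability scale keeps the copula steps independent of the chosen margins; only `weibull_inv_surv` would change for another marginal family.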

Page 26: A copula-based Simulation Method for Clustered Multi-State Survival Data

Second transitions

For patients with a transition into state LR or DM, an analogous copula model is used for the second transition to state De.

[Diagram: paths NED → LR → De (times T1, T4) and NED → DM → De (times T2, T5)]

The following conditional survivals can be obtained

- $S_{4|1}(t_4 \mid t_1) = \left[ 1 + \left( \frac{S_1(t_1)}{S_4(t_4)} \right)^{\theta} - S_1(t_1)^{\theta} \right]^{-1/\theta - 1}$

- $S_{5|2}(t_5 \mid t_2) = \left[ 1 + \left( \frac{S_2(t_2)}{S_5(t_5)} \right)^{\theta} - S_2(t_2)^{\theta} \right]^{-1/\theta - 1}$

and the same algorithm is used to simulate second transition times, conditionally on the first transition ones.

Page 28: A copula-based Simulation Method for Clustered Multi-State Survival Data

Clustering

The algorithm allows one to freely specify the marginal survivals $S_i(t)$. How can we insert clustering?

In a PH way

$$h_i(t \mid Z) = Z\, h_{0i}(t),$$

with $h_{0i}(t)$ the baseline hazard for transition $i$.

Since $S_{0i}(t) = \exp\{-\int_0^t h_{0i}(u)\,du\}$, then

$$S_i(t \mid Z) = \exp\left\{ -Z \int_0^t h_{0i}(u)\,du \right\} = [S_{0i}(t)]^Z.$$

The copula model can be used for the conditional survivals $\{S_i(t \mid Z)\}_{i \in \{1,2,3,4,5\}}$ and the same algorithm can be used, conditionally on $Z$.
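A minimal sketch of how a shared frailty could enter the sampling step, assuming a hypothetical Weibull baseline and a gamma frailty with mean 1. The baseline parameters, the gamma shape and scale, the cluster layout and the helper names are all assumptions made for illustration. Since $S_i(t \mid Z) = [S_{0i}(t)]^Z$, solving $S_i(t \mid Z) = u$ amounts to evaluating the baseline inverse at $u^{1/Z}$.

```python
import math
import random

def baseline_surv(t, lam=0.05, rho=1.2):
    """Hypothetical Weibull baseline S_0(t) = exp(-lam * t**rho)."""
    return math.exp(-lam * t ** rho)

def baseline_inv_surv(s, lam=0.05, rho=1.2):
    """Inverse of the baseline survival function."""
    return (-math.log(s) / lam) ** (1.0 / rho)

def conditional_inv_surv(u, z):
    """Solve [S_0(t)]**z = u for t, i.e. t = S_0^{-1}(u**(1/z))."""
    return baseline_inv_surv(u ** (1.0 / z))

rng = random.Random(1)
n_clusters, cluster_size = 4, 3               # hypothetical design
data = []
for cluster in range(n_clusters):
    # One frailty per cluster, shared by its members (shape 2, scale 0.5: mean 1).
    z = rng.gammavariate(2.0, 0.5)
    for _ in range(cluster_size):
        u = 1.0 - rng.random()                # uniform on (0, 1]
        data.append((cluster, z, conditional_inv_surv(u, z)))

for cluster, z, t in data:
    print(cluster, round(z, 3), round(t, 2))
```

Subjects in the same cluster share `z`, which is what induces the within-cluster dependence.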

Page 32: A copula-based Simulation Method for Clustered Multi-State Survival Data

Clustering and covariates

The effect of covariates $X$ can be inserted in an analogous way. The marginals are then

$$S_i(t \mid X, Z) = S_{0i}(t)^{Z e^{\beta_i' X}}$$

and simulation via the copula model is done conditionally on $(X, Z)$.

Page 33: A copula-based Simulation Method for Clustered Multi-State Survival Data

The Clayton–Weibull model

Although the model is quite general, we consider in the following a particular case:

- $T_i \sim \mathrm{Wei}(\lambda_i, \rho_i)$, $i \in \{1, 2, 3, 4, 5\}$
- $T_C \sim \mathrm{Wei}(\lambda_C, 1) \sim \mathrm{Exp}(\lambda_C)$
- 72 months (6 years) of administrative censoring

This model

1. gives simple forms of conditional distributions
2. implies that $S_{i|X,Z}(t \mid x, z) = \exp\{-\lambda_i z e^{\beta_i^T x} t^{\rho_i}\}$, i.e. $T_i \mid X, Z \sim \mathrm{Wei}(\lambda_i z e^{\beta_i^T x}, \rho_i)$ is still a Weibull r.v.
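Because the conditional margin stays Weibull, a time given $(x, z)$ can be drawn by inverse transform with the rescaled scale parameter. The sketch below assumes the parametrization $S(t) = \exp(-\text{scale} \cdot t^{\rho})$ used above; the numerical values of `lam`, `rho`, `beta`, `x` and `z` are hypothetical, and only the 72-month administrative censoring comes from the slide.

```python
import math
import random

ADMIN_CENSORING = 72.0  # months of administrative censoring, as on the slide

def conditional_scale(lam, z, beta, x):
    """Scale lam * z * exp(beta' x) of the conditional Weibull distribution."""
    return lam * z * math.exp(sum(b * v for b, v in zip(beta, x)))

def sample_time(scale, rho, rng):
    """Inverse transform for S(t) = exp(-scale * t**rho)."""
    u = 1.0 - rng.random()                    # uniform on (0, 1]
    return (-math.log(u) / scale) ** (1.0 / rho)

def median_time(scale, rho):
    """Time at which the conditional survival equals 0.5."""
    return (math.log(2.0) / scale) ** (1.0 / rho)

rng = random.Random(42)
lam, rho = 0.02, 1.1                          # hypothetical baseline scale and shape
beta, x, z = (0.5, -0.3), (1.0, 2.0), 1.4     # hypothetical coefficients, covariates, frailty

scale = conditional_scale(lam, z, beta, x)
t = sample_time(scale, rho, rng)
observed, event = min(t, ADMIN_CENSORING), t <= ADMIN_CENSORING

print(round(median_time(scale, rho), 2), round(observed, 2), event)
```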

Page 35: A copula-based Simulation Method for Clustered Multi-State Survival Data

Choice of Parameters F. Rotolo

Outline

Clustered Multi-State Survival Data

Simulation Algorithm

Clustering

Choice of Parameters

Example

Page 36: A copula-based Simulation Method for Clustered Multi-State Survival Data

Choice of parameters

When simulating a dataset, one should be able to choose parameters in order to obtain particular target values for

- $p_i$: probabilities of LR, DM, De and censoring from NED
- $m_i$: medians of uncensored LR, DM and De times from NED
- $p_i$: probabilities of De and censoring from LR and from DM
- $m_i$: medians of uncensored De times from LR and from DM

[Diagram: states NED, LR, DM and De with transition times T1 to T5]

It is not possible to analytically express these quantities as functions of the parameters.
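Since these quantities have no closed form, they have to be measured on simulated data. A minimal sketch of that measurement step follows, assuming each simulated subject is summarized as an (outcome label, observed time) pair. The function name, the label `"C"` for censoring and the toy observations are all hypothetical.

```python
from collections import Counter, defaultdict
from statistics import median

def observed_summaries(observations, censoring_label="C"):
    """Empirical outcome proportions and medians of the uncensored times."""
    counts = Counter(label for label, _ in observations)
    n = len(observations)
    p = {label: count / n for label, count in counts.items()}
    times = defaultdict(list)
    for label, t in observations:
        if label != censoring_label:          # medians only for observed events
            times[label].append(t)
    m = {label: median(ts) for label, ts in times.items()}
    return p, m

# Toy first-transition outcomes from NED (made-up numbers).
toy = [("LR", 4.0), ("LR", 8.0), ("DM", 10.0), ("De", 3.0),
       ("C", 72.0), ("C", 20.0), ("LR", 6.0), ("C", 72.0)]
p_obs, m_obs = observed_summaries(toy)
print(p_obs)
print(m_obs)
```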

Page 42: A copula-based Simulation Method for Clustered Multi-State Survival Data

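Criterion function

In order to find appropriate parameters for given target values $\{p_i, m_i\}$, we want to minimize the criterion function

$$\Upsilon(\Pi) = \sum_{i \in \{1,2,3,4,5\}} \left\{ \left[ \log \frac{p_i}{p_i(\Pi)} \right]^2 + \left[ \log \frac{m_i}{m_i(\Pi)} \right]^2 \right\} = \Upsilon_{123}(\Pi_{123}) + \Upsilon_4(\Pi_4) + \Upsilon_5(\Pi_5) \ge 0$$

with

$$\Pi = \{\lambda_i\}_{i \in \{1,2,3,C,4,C4,5,C5\}} \cup \{\rho_i\}_{i \in \{1,2,3,4,5\}} \in \mathbb{R}_+^{13}$$

and $\{p_i(\Pi), m_i(\Pi)\}$ the observed values in a simulated dataset with parameters $\Pi$.

$$\Pi = \Pi_{123} \cup \Pi_4 \cup \Pi_5 \in \mathbb{R}_+^{7} \times \mathbb{R}_+^{3} \times \mathbb{R}_+^{3}$$

Further reduction of problem dimension...

$$\Pi = \Pi_{123} \cup \Pi_4 \cup \Pi_5 \in \mathbb{R}_+^{4+3} \times \mathbb{R}_+^{2+1} \times \mathbb{R}_+^{2+1}$$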
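The criterion itself is a sum of squared log-ratios, which can be sketched as below. The function signature and the dictionaries of targets and observed values are hypothetical; in the method described here the observed values would come from a dataset simulated with parameters $\Pi$.

```python
import math

def criterion(target_p, target_m, observed_p, observed_m):
    """Sum of squared log-ratios of target to observed probabilities and medians."""
    total = 0.0
    for key, target in target_p.items():
        total += math.log(target / observed_p[key]) ** 2
    for key, target in target_m.items():
        total += math.log(target / observed_m[key]) ** 2
    return total

# Hypothetical targets for two transitions.
target_p = {1: 0.30, 2: 0.10}
target_m = {1: 6.0, 2: 10.0}

perfect = criterion(target_p, target_m, target_p, target_m)

# Observed medians twice too large or twice too small are penalized equally.
too_long = criterion(target_p, target_m, target_p, {k: 2.0 * v for k, v in target_m.items()})
too_short = criterion(target_p, target_m, target_p, {k: v / 2.0 for k, v in target_m.items()})

print(perfect, too_long, too_short)
```

Working with log-ratios makes the penalty depend on relative rather than absolute discrepancies, so probabilities and medians measured on different scales contribute comparably.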

Page 43: A copula-based Simulation Method for Clustered Multi-State Survival Data

Minimization of criterion function

In order to further reduce the dimension of the problem, each of the parameter sets $\Pi_K$, $K \in \{\{123\}, \{4\}, \{5\}\}$, is split into the scale parameters $\{\lambda_i\}$ and the shape parameters $\{\rho_i\}$. The optimization of the criterion function $\Upsilon_K(\Pi_K)$ is iterated on each subset.

Example: algorithm for $K = \{123\}$

- Set $J = 1$, $\lambda^{(0)} = \{\lambda_i^{(0)}\}_{i \in \{C,1,2,3\}} = \{1, 1, 1, 1\}$, $\rho^{(0)} = \{\rho_i^{(0)}\}_{i \in \{1,2,3\}} = \{1, 1, 1\}$
- Repeat until $J = maxit$ or $\Upsilon_{123}(\lambda^{(J-1)}, \rho^{(J-1)}) < th$
    - Obtain $\lambda^{(J)}$ by minimizing $\Upsilon_{123}(\lambda, \rho^{(J-1)})$ over $\lambda$
    - Obtain $\rho^{(J)}$ by minimizing $\Upsilon_{123}(\lambda^{(J)}, \rho)$ over $\rho$
    - Set $J = J + 1$

where maxit and th are arbitrary termination parameters.
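The alternating scheme can be illustrated on a deliberately simplified problem. In the sketch below the simulated summaries are replaced by a toy closed-form stand-in (a single Weibull time, with the probability of an event before a hypothetical time `TAU` and the unconditional median), and each block is minimized with a golden-section search on the log scale. None of these choices come from the slides, which do not say which optimizer is used; only the alternation between scale and shape, the starting values of 1, and the `maxit` and `th` stopping rules follow the description above.

```python
import math

TAU = 12.0  # hypothetical administrative censoring time

def summaries(lam, rho):
    """Toy stand-in for simulated summaries: P(T <= TAU) and the median."""
    p = -math.expm1(-lam * TAU ** rho)
    m = (math.log(2.0) / lam) ** (1.0 / rho)
    return p, m

def criterion(lam, rho, target):
    p, m = summaries(lam, rho)
    return math.log(target[0] / p) ** 2 + math.log(target[1] / m) ** 2

def golden_section(f, lo, hi, iters=60):
    """Derivative-free minimization of a one-dimensional function on [lo, hi]."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    for _ in range(iters):
        c, d = b - ratio * (b - a), a + ratio * (b - a)
        if f(c) < f(d):
            b = d
        else:
            a = c
    return (a + b) / 2.0

def alternate(target, maxit=10, th=1e-6):
    lam, rho = 1.0, 1.0                       # starting values equal to 1
    history = [criterion(lam, rho, target)]
    for _ in range(maxit):
        if history[-1] < th:
            break
        # Scale step: minimize over lambda with rho fixed (search on log scale).
        cand = math.exp(golden_section(lambda v: criterion(math.exp(v), rho, target), -12.0, 5.0))
        if criterion(cand, rho, target) < criterion(lam, rho, target):
            lam = cand
        # Shape step: minimize over rho with the updated lambda fixed.
        cand = math.exp(golden_section(lambda v: criterion(lam, math.exp(v), target), -3.0, 2.0))
        if criterion(lam, cand, target) < criterion(lam, rho, target):
            rho = cand
        history.append(criterion(lam, rho, target))
    return lam, rho, history

target = (0.70, 7.5)                          # hypothetical target probability and median
lam_hat, rho_hat, history = alternate(target)
print(lam_hat, rho_hat, history[-1])
```

A candidate is accepted only when it lowers the criterion, so the recorded values never increase from one iteration to the next.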

Page 45: A copula-based Simulation Method for Clustered Multi-State Survival Data

Example F. Rotolo

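An example

A dataset of size 44 is available from a multi-center study on head and neck cancer.

[Diagram: the NED, LR, DM, De multi-state structure annotated with patient counts 22, 0, 8, 14, 4, 15, 3, 4, 7 (Tot: 44); the assignment of counts to states and transitions is not recoverable from the transcript]

Target values $\{p_i\}$ and $\{m_i\}$

Frailty term

- 40 hospitals
- random sizes
- $Z \sim \mathrm{Gam}(1, 0.5)$

Covariates

- $\mathrm{Age} \sim N(60, 7)$ with

$$\beta_{i,\mathrm{Age}} = \begin{cases} \log(0.8)/10 & i = 1 \\ \log(0.9)/10 & i = 2 \\ \log(1.2)/10 & i = 3, 4, 5 \end{cases}$$

- $\mathrm{Treat} \sim \mathrm{Bin}(0.5)$ with

$$\beta_{i,\mathrm{Treat}} = \begin{cases} \log(1/3) & i = 1 \\ 0 & i = 2 \\ \log(1.2) & i = 3, 4, 5 \end{cases}$$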
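Under the proportional hazards specification these coefficients read as log hazard ratios. As a worked example for transition 1, assuming Age is measured in years and Treat is a 0/1 indicator (an interpretation that the slide does not state explicitly):

```latex
\begin{align*}
\exp\{10\,\beta_{1,\mathrm{Age}}\} &= \exp\{\log(0.8)\} = 0.8, \\
\exp\{\beta_{1,\mathrm{Treat}}\} &= \exp\{\log(1/3)\} = 1/3 .
\end{align*}
```

That is, ten additional years of age multiply the hazard of transition 1 by 0.8, and treatment divides it by three; for transition 2 the treatment coefficient is 0, so the hazard is unchanged.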

Page 48: A copula-based Simulation Method for Clustered Multi-State Survival Data

Results

First transitions. The algorithm is run with datasets of size 104, maxit = 10 and th = 0.1. The execution time was 11 h 57 min.

Parameters, NED → {LR, DM, De}

λ1     λ2     λ3     λC     ρ1     ρ2     ρ3
0.276  0.019  0.013  0.031  0.851  1.076  0.569

Target and simulated values, NED → {LR, DM, De}

           p_i                       m_i
           LR    DM    De    C       LR    DM     De
Target     0.34  0.09  0.07  0.50    6.00  10.00  3.00
Simulated  0.33  0.12  0.09  0.46    5.41   9.33  2.29

Υ123(Π123) = 0.24
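As a rough check, the criterion can be recomputed from the rounded table entries, taking the sum over the three event types only (the censoring column is left out, in line with the sum over transitions in the definition of the criterion):

```latex
\begin{align*}
\Upsilon_{123} &\approx
  \left[\log\tfrac{0.34}{0.33}\right]^2 + \left[\log\tfrac{0.09}{0.12}\right]^2 + \left[\log\tfrac{0.07}{0.09}\right]^2
  + \left[\log\tfrac{6.00}{5.41}\right]^2 + \left[\log\tfrac{10.00}{9.33}\right]^2 + \left[\log\tfrac{3.00}{2.29}\right]^2 \\
 &\approx 0.0009 + 0.0828 + 0.0632 + 0.0107 + 0.0048 + 0.0729 \approx 0.235,
\end{align*}
```

which agrees with the reported value of 0.24 up to the rounding of the table entries.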

Page 50: A copula-based Simulation Method for Clustered Multi-State Survival Data

Example F. Rotolo

Second transitions. Conditionally on the first-transition data, the algorithm is run for the second transitions from LR and DM with maxit = 6 and th = 0.05. Execution took 4 h 31 min and 3 h 57 min, respectively.

LR → De

    λ4     λC4    ρ4
    0.029  0.099  1.078

DM → De

    λ5     λC5    ρ5
    0.192  0.039  1.000

               LR → De               DM → De
               p_i         m_i       p_i         m_i
               De    C     De        De    C     De
    Target     0.53  0.47  3.25      0.95  0.05  0.50
    Simulated  0.50  0.50  3.32      0.97  0.03  0.54

Υ4(Π4) = 0.0043, Υ5(Π5) = 0.0064
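To illustrate how a second-transition time and its censoring time might be drawn from parameters such as λ4, ρ4 and λC4, here is a minimal inverse-transform sketch. It rests on assumptions that should be checked against the model definition: a Weibull baseline hazard of the form h0(t) = λ ρ t^(ρ−1), an exponential censoring time with rate λC, and censoring independent of the event time. The function names are hypothetical, and the hazard multiplier is left at 1, so covariates and frailty are ignored and the output is not expected to reproduce the table.

```python
import math
import random


def weibull_ph_time(rng, lam, rho, multiplier=1.0):
    """Invert S(t) = exp(-lam * multiplier * t**rho) at a uniform draw."""
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
    return (-math.log(u) / (lam * multiplier)) ** (1.0 / rho)


def simulate_lr_to_de(n, lam=0.029, rho=1.078, lam_c=0.099, seed=1):
    """Return (observed time, death indicator) pairs for the LR -> De transition."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        t = weibull_ph_time(rng, lam, rho)    # time to death
        c = weibull_ph_time(rng, lam_c, 1.0)  # rho = 1 gives an exponential
        rows.append((min(t, c), t <= c))
    return rows


rows = simulate_lr_to_de(10_000)
p_death = sum(died for _, died in rows) / len(rows)
print(f"proportion of observed deaths without covariates or frailty: {p_death:.2f}")
```

Passing a subject-specific `multiplier`, that is the frailty times the exponential of the linear predictor, would turn this into a proportional hazards draw for one subject.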


Conclusion

The proposed simulation procedure for clustered multi-state data makes it possible to:

- [MSMs] generate dependence between times of the same subject (between both competing and subsequent event times)

- [FMs] generate dependence between times of clustered subjects (with an arbitrary number and size of groups and a free choice of frailty distribution)

- [PH] insert covariates via proportional hazards

- [parMod] choose the marginal distributions of the time variables

- automatically find appropriate parameters, given arbitrary target values for the probabilities of censoring and of competing events and for the medians of uncensored times

- generate censoring, both random and administrative


References

Cox, D. R. (1972). Regression models and life-tables. Journal of the Royal Statistical Society. Series B (Methodological) 34, 187–220.

de Wreede, L. C., Fiocco, M. & Putter, H. (2010). The mstate package for estimation and prediction in non- and semi-parametric multi-state and competing risks models. Comput Methods Programs Biomed 99, 261–74.

Duchateau, L. & Janssen, P. (2008). The frailty model. Springer.

Kpanzou, T. A. (2007). Copulas in statistics. African Institute for Mathematical Sciences (AIMS).

Putter, H., Fiocco, M. & Geskus, R. B. (2007). Tutorial in biostatistics: competing risks and multi-state models. Stat Med 26, 2389–430.

Wienke, A. (2010). Frailty Models in Survival Analysis. Chapman & Hall/CRC biostatistics series. Taylor and Francis.


F. Rotolo [[email protected]]

PhD Student at University of Padova and Visiting PhD Student at UCL

under the supervision of

prof. C. Legrand, UCL
prof. I. Van Keilegom, UCL
prof. M. Chiogna, UniPd