A Spatial Scan Statistic for Survival Data


Page 1: A Spatial Scan Statistic for Survival Data

A Spatial Scan Statistic for Survival Data

Lan Huang, Dep Statistics, Univ Connecticut

Martin Kulldorff, Harvard Medical School

David Gregorio, Dep Community Medicine, Univ Connecticut

Page 2: A Spatial Scan Statistic for Survival Data

Motivation and Background

What is the geographical distribution of prostate cancer survival in Connecticut?

Are there geographical clusters with exceptionally short or long survival?

Page 3: A Spatial Scan Statistic for Survival Data

Survival Data

For each person:

• Time of diagnosis

• Whether dead or censored

• Time until death/censoring

• Residential geographical coordinates

• Age

• etc.

Page 4: A Spatial Scan Statistic for Survival Data

Motivation and Background

• Spatial scan-statistics with Bernoulli and Poisson models are designed for count data.

• Length of survival is continuous data.

• Survival data is often censored.

Page 5: A Spatial Scan Statistic for Survival Data

Solution

Spatial Scan Statistic using an Exponential Probability Model

Page 6: A Spatial Scan Statistic for Survival Data

Methodology

• Exponential-model-based spatial scan statistic

H0: θin = θout

Ha: θin ≠ θout

Exponential likelihood → spatial scan statistic → distribution (permutation test) → statistical inference (hypothesis test) → detect a significant cluster
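The pipeline on this slide can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a constant-hazard (exponential) model in which the hazard is estimated as deaths divided by total follow-up time, circular zones centred on each individual, and a permutation step that shuffles the (time, status) pairs over the fixed locations. The function names, the 50% maximum zone size, and the number of permutations are illustrative choices.

```python
import numpy as np

def exp_llr(r_in, t_in, r_out, t_out):
    """Log-likelihood ratio for H0: same hazard inside and outside the zone.

    r_* are numbers of deaths, t_* are total follow-up times (censored
    individuals contribute time but no death). With hazard estimate r / t,
    the maximised exponential log-likelihood is r * log(r / t) - r.
    """
    def term(r, t):
        return r * np.log(r / t) if r > 0 else 0.0
    return term(r_in, t_in) + term(r_out, t_out) - term(r_in + r_out, t_in + t_out)

def scan_max_llr(xy, time, dead, max_frac=0.5):
    """Largest LLR over circles centred on each point (assumed zone shape)."""
    n = len(time)
    r_all, t_all = dead.sum(), time.sum()
    max_size = max(1, int(max_frac * n))
    best_llr, best_members = 0.0, None
    for i in range(n):
        # Grow the circle around point i one nearest neighbour at a time.
        order = np.argsort(np.linalg.norm(xy - xy[i], axis=1))[:max_size]
        r_cum, t_cum = np.cumsum(dead[order]), np.cumsum(time[order])
        for k in range(max_size):
            llr = exp_llr(r_cum[k], t_cum[k], r_all - r_cum[k], t_all - t_cum[k])
            if llr > best_llr:
                best_llr, best_members = llr, order[:k + 1]
    return best_llr, best_members

def permutation_p_value(xy, time, dead, n_perm=99, seed=0):
    """Monte Carlo p-value: shuffle (time, dead) pairs over the locations."""
    rng = np.random.default_rng(seed)
    observed, members = scan_max_llr(xy, time, dead)
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(time))
        simulated, _members = scan_max_llr(xy, time[perm], dead[perm])
        exceed += int(simulated >= observed)
    return observed, members, (exceed + 1) / (n_perm + 1)
```

Because the p-value comes from ranking the observed maximum against the permuted maxima, it does not rely on the survival times actually being exponential. This sketch makes no distinction between short-survival and long-survival clusters; telling them apart would require comparing the estimated hazards inside and outside the zone.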

Page 7: A Spatial Scan Statistic for Survival Data

Methods Evaluation

• Location of 610 Connecticut prostate cancer patients diagnosed in 1984.

• 47 patients in southwest Connecticut constitute a cluster with shorter survival (cluster radius: 8.65 km)

• Each of the 610 patients was assigned a random survival or censoring time, using different distributions inside and outside the cluster.
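As an illustration of this kind of simulation design, the sketch below assigns exponential survival times with different means inside and outside a 47-patient cluster and optionally applies a fixed censoring time. The specific means (3 and 10), the censoring time (12), and the reading of the parameter as a mean survival time are assumptions made for the example, not values confirmed by the slides.

```python
import numpy as np

def simulate_survival(in_cluster, mean_in, mean_out, censor_time=None, seed=0):
    """Draw exponential survival times; optionally censor at a fixed time.

    Returns (time, dead), where dead is 1 for an observed death and 0 for
    a censored observation.
    """
    rng = np.random.default_rng(seed)
    # Per-individual mean: one value inside the cluster, another outside.
    mean = np.where(in_cluster, mean_in, mean_out)
    survival = rng.exponential(mean)
    if censor_time is None:
        return survival, np.ones(len(survival), dtype=int)
    dead = (survival <= censor_time).astype(int)
    return np.minimum(survival, censor_time), dead

# 610 patients, the first 47 flagged as belonging to the cluster.
in_cluster = np.arange(610) < 47
time, dead = simulate_survival(in_cluster, mean_in=3.0, mean_out=10.0,
                               censor_time=12.0)
```

Swapping `rng.exponential` for `rng.gamma` or `rng.lognormal` would give the other two distribution families named in the evaluation; random censoring could be sketched by drawing a separate censoring time per individual instead of a fixed one.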

Page 8: A Spatial Scan Statistic for Survival Data

Model Evaluation

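[Diagram: simulation design. Survival-time distributions: exponential, gamma, log-normal. Parameters θin and θout (values shown: 1, 3, 5, 7, 9, 10), with their difference θdiff = 1, 3, 5, 7, 9. Censoring: none, random, or fixed. 610 individuals: 47 inside the cluster, 563 outside.]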

Page 9: A Spatial Scan Statistic for Survival Data

[Figure: number of individuals inside the true cluster that were successfully detected (P-value < 0.05), for the simulated datasets without censoring. Horizontal axis: θdiff (1, 3, 5, 7, 9). Vertical axis: 0 to 50. Series: exponential, gamma, log-normal.]

Page 10: A Spatial Scan Statistic for Survival Data

[Figure: number of individuals inside the true cluster that were successfully detected (P-value < 0.05), for censored datasets with a fixed censoring time. Horizontal axis: θdiff (1, 3, 5, 7, 9). Vertical axis: 0 to 50. Series: exponential, gamma, log-normal.]

Page 11: A Spatial Scan Statistic for Survival Data

[Figure: number of individuals inside the true cluster that were successfully detected (P-value < 0.05), for censored datasets with random censoring times. Horizontal axis: θdiff (1, 3, 5, 7, 9). Vertical axis: 0 to 50. Series: exponential, gamma, log-normal.]

Page 12: A Spatial Scan Statistic for Survival Data

Model Evaluation

• The exponential model is robust: the exponential-based scan statistic rejects the null hypothesis with a low p-value whenever the difference between the distributions is moderate or large, regardless of the survival distribution and the censoring mechanism.

Page 13: A Spatial Scan Statistic for Survival Data

Application to Prostate Cancer Data

• Between 1984 and 1995, the Connecticut Tumor Registry recorded 22,612 incident cases of invasive prostate cancer among the population at risk (roughly 1.2 million males aged 20 and over in 1990).

• 19,061 records were available after data cleaning.

• Follow-up ran through December 2000.

• 10,308 patients had died and 8,753 were censored.

Page 14: A Spatial Scan Statistic for Survival Data

Significant clusters using exponential model

Page 15: A Spatial Scan Statistic for Survival Data

Application to Prostate Cancer Data

Cluster          #deaths in cluster   #individuals in cluster   RR      LLR     P
Short survival
  1              646                  938                       1.45    41.88   0.001
  2              2154                 3706                      1.13    19.06   0.001
  3              33                   36                        3.26    16.13   0.003
Long survival
  4              661                  1445                      0.75    31.83   0.001
  5              200                  529                       0.65    22.24   0.001
  6              37                   114                       12.11   12.11   0.015

Page 16: A Spatial Scan Statistic for Survival Data

Covariate Adjustment

• Younger patients may live longer

• Geographical variation in histology or stage
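The slides do not spell out how the adjustment is carried out. One possible approach that is consistent with a constant-hazard assumption is to rescale each patient's follow-up time so that every covariate group (for example, each age group) ends up with the same estimated hazard as the whole population, and then run the scan on the rescaled times. The sketch below shows that idea only; the function name and the rescaling rule are assumptions, not the authors' documented method.

```python
import numpy as np

def adjust_times(time, dead, group):
    """Rescale follow-up times so each group has the overall estimated hazard.

    Hazards are estimated as deaths / total follow-up time (exponential
    model). Assumes every group contains at least one death.
    """
    time = np.asarray(time, dtype=float)
    dead = np.asarray(dead)
    group = np.asarray(group)
    overall_hazard = dead.sum() / time.sum()
    adjusted = time.copy()
    for g in np.unique(group):
        mask = group == g
        group_hazard = dead[mask].sum() / time[mask].sum()
        # A high-hazard group has its times stretched, a low-hazard group
        # has them shrunk, which removes the group effect on average.
        adjusted[mask] *= group_hazard / overall_hazard
    return adjusted

# Tiny illustrative dataset with two hypothetical groups.
example = adjust_times(time=[2.0, 4.0, 6.0, 8.0],
                       dead=[1, 1, 1, 0],
                       group=["a", "a", "b", "b"])
```

Under this rule the total follow-up time is unchanged, so the overall hazard estimate is the same before and after adjustment.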

Page 17: A Spatial Scan Statistic for Survival Data

Significant clusters after age-adjustment

Page 18: A Spatial Scan Statistic for Survival Data

Discussion

• The exponential model works well for censored and non-censored survival data from different distributions, but it probably does not work well for every kind of continuous variable, such as data that are approximately normally distributed.

• Because the test procedure is permutation-based, the statistical inference is valid even when the survival times are not exponentially distributed.

Page 19: A Spatial Scan Statistic for Survival Data

Discussion

• The covariate adjustment method here is based on the exponential model, which assumes a constant hazard. It could be extended to a non-constant hazard with several levels, or to a hazard that is a function of survival time under different kinds of models.

• It could be extended to a space-time scan statistic when time series data are available.

• It could also be extended to create a scan-statistic with elliptical or other cluster shapes.

• Unfortunately, no statistical software is available.