Inference from ecological models: air pollution and stroke using data from Sheffield, England.
Ravi Maheswaran, Guangquan Li, Jane Law, Robert Haining, Marta Blangiardo, Sylvia Richardson, Nicky Best
Outline:
1. Background to the Sheffield study and results presented at Geomed 2005.
2. From the Poisson to the Binomial model.
3. Results.
4. Conclusions.
1. Nitrogen oxides (NOx) and stroke mortality in Sheffield, England (Geomed 2005).
• Strokes account for 8%-12% of UK deaths
• Some evidence of a link between air pollution and stroke:
• studies of severe air pollution episodes (e.g. 1952 London smog);
• analysis of daily time series (e.g. Kan et al (2003): Shanghai);
• cohort studies (e.g. Nafstad et al (2004): Norwegian males).
Because the absolute number of stroke deaths is small, statistical power is limited even in large cohort studies, particularly for a risk factor whose effect may be modest.
Small-area ecological studies may help by:
- providing another way of looking at the relationship;
- allowing the analysis of very large populations at a much lower cost than a cohort study;
- using small areas, which are likely to be more homogeneous than large areas in terms of population characteristics, thus reducing the risk of ecological bias.
Data
Stroke mortality data:
• ICD-9 codes 430-438;
• 1994-8: c. 3,000 stroke deaths in a population of c. 200,000 aged over 45;
• aggregated by Enumeration District (ED; c. 150 households), age (5-year bands from 45 to 85+) and sex;
• average of 2.89 deaths per ED (min expected: 0.1; max expected: 10.9).
Population data:
(i) 1991 Census data on demography and deprivation (Townsend index):
• recorded at the Enumeration District level (n = 1030).
(ii) Sheffield Health and Illness Prevalence survey (2000):
• random sample stratified by ward;
• >10,000 respondents, of whom >9,500 gave complete age, sex and smoking information;
• average of 2.43 smokers per ED (min expected: 0.19; max expected: 19.24).
[Figure: average annual mean NOx levels, 1994-9 (excluding 1998), in µg/m³; modelled values (y-axis) against monitored values (x-axis). Fitted line: Modelled = -80.43 + 3.66 × Monitored; R² = 0.73.]
Areal interpolation (from grid to ED): point in polygon – weighted PostPoint

PostPoint ID   Domestic properties   NOx for grid   Dom × Poll
1              13                    16.95          220.38
2              3                     18.29          54.88
3              33                    16.72          551.63
4              31                    16.97          526.19
5              19                    16.97          322.51
6              3                     17.40          52.20
7              33                    17.02          561.80
8              20                    17.72          354.44
9              7                     18.72          131.04
Sum            162                                  2775.06
Average                              17.42
Weighted average = 17.13 (= 2775.06 / 162)

NOx data transferred to the enumeration district framework using the weighted PostPoint method of areal interpolation.
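The table arithmetic can be reproduced with a short Python sketch. It assumes each PostPoint inside an ED takes the NOx value of the grid cell it falls in and is weighted by its number of domestic properties. The NOx values are the rounded figures shown in the table, so individual products differ slightly from the Dom × Poll column.

```python
# Rows from the areal-interpolation table: (PostPoint ID, domestic properties, NOx for grid)
postpoints = [
    (1, 13, 16.95),
    (2, 3, 18.29),
    (3, 33, 16.72),
    (4, 31, 16.97),
    (5, 19, 16.97),
    (6, 3, 17.40),
    (7, 33, 17.02),
    (8, 20, 17.72),
    (9, 7, 18.72),
]


def unweighted_average(rows):
    """Simple mean of the grid NOx values over the PostPoints in the ED."""
    return sum(poll for _, _, poll in rows) / len(rows)


def weighted_postpoint_average(rows):
    """Mean NOx weighted by the number of domestic properties at each PostPoint."""
    total_dom = sum(dom for _, dom, _ in rows)
    total_dom_poll = sum(dom * poll for _, dom, poll in rows)
    return total_dom_poll / total_dom


print(f"Unweighted average: {unweighted_average(postpoints):.2f}")         # 17.42
print(f"Weighted average:   {weighted_postpoint_average(postpoints):.2f}")  # 17.13
```

The weighted value sits below the simple mean because the PostPoints with many properties (IDs 3, 4 and 7) lie in cells with lower NOx.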
Poisson Model
$y_i$ = number of stroke deaths in area i.

$$y_i \sim \mathrm{Poisson}(\mu_i), \qquad \mu_i = r_i E_i$$

$r_i$ = underlying true area-specific relative risk in area i.
$E_i$ = expected number of deaths in area i, standardised for age, sex and socio-economic deprivation:

$$E_i = \sum_{m=1}^{k} \theta_m \, n_{i,m}$$

$\theta_m$ = age-sex-deprivation-specific mortality rate for population subgroup m; $n_{i,m}$ = size of population subgroup m in area i.

Generalized linear model:

$$\log(r_i) = \alpha + \beta x_i + \delta z_i^{\mathrm{ave}}$$

$x_i$ = NOx level in area i.
$z_i^{\mathrm{ave}}$ = smoking prevalence ratio in area i (spatial moving average using the observed and expected counts).
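The expected counts and the log-linear model for the relative risk can be illustrated with a minimal Python sketch. The subgroup rates, population sizes and parameter values are invented for illustration; the study itself used age, sex and deprivation strata. `x` is treated here as a 0/1 indicator for a single NOx category. The study instead compared four category indicators against category 1.

```python
import math

# Hypothetical reference mortality rates theta_m for k = 3 illustrative
# age-sex-deprivation subgroups (deaths per person over the study period).
theta = [0.002, 0.010, 0.040]

# Hypothetical subgroup sizes n_{i,m} for two areas.
populations = {
    "ED_A": [120, 80, 20],
    "ED_B": [60, 100, 50],
}


def expected_deaths(rates, sizes):
    """Indirectly standardised expected count: E_i = sum_m theta_m * n_{i,m}."""
    return sum(rate * n for rate, n in zip(rates, sizes))


def poisson_mean(expected, alpha, beta, x, delta, z):
    """mu_i = r_i * E_i, with log(r_i) = alpha + beta * x_i + delta * z_i."""
    log_r = alpha + beta * x + delta * z
    return math.exp(log_r) * expected


def poisson_loglik(y, mu):
    """log Pr(Y = y) for Y ~ Poisson(mu)."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)


for area, sizes in populations.items():
    e_i = expected_deaths(theta, sizes)
    # x = 1 marks a hypothetical high-NOx area with an assumed rate ratio of 1.5
    mu_i = poisson_mean(e_i, alpha=0.0, beta=math.log(1.5), x=1, delta=0.0, z=0.0)
    print(area, round(e_i, 2), round(mu_i, 2))
```

Because $E_i$ enters multiplicatively, $\log E_i$ acts as an offset. Only the covariates in $\log(r_i)$ are estimated.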
Poisson regression controlling for age, sex, deprivation and smoking prevalence.

Parameter            Rel. Risk (95% CI), WinBUGS   Rel. Risk (95% CI), SAS
NOx category 5       1.48 (1.31-1.67)              1.48 (1.23-1.77)
NOx category 4       1.26 (1.12-1.42)              1.26 (1.06-1.51)
NOx category 3       1.10 (0.98-1.24)              1.10 (0.92-1.32)
NOx category 2       1.13 (1.00-1.26)              1.12 (0.94-1.34)
NOx category 1       1 (reference)                 1 (reference)
Smoking (z^ave)      0.93 (0.84-1.02)              0.93 (0.80-1.08)
Model fit            DIC = 4871.57                 Deviance/df = 2.3
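The two columns share point estimates but differ in interval width. The slides do not say why. One plausible explanation, which we assume here, is a quasi-Poisson correction in the SAS fit: standard errors multiplied by √(deviance/df) to allow for overdispersion. The sketch below applies that correction to the WinBUGS interval for NOx category 5, treating it as a symmetric Wald interval on the log scale.

```python
import math

Z_975 = 1.959964  # standard normal 97.5% quantile


def scale_ci(rr, lower, upper, dispersion):
    """Widen a Wald CI for a relative risk by sqrt(dispersion) on the log scale.

    Assumes the interval was log(rr) +/- 1.96 * SE, so SE is recovered from its width.
    """
    se = (math.log(upper) - math.log(lower)) / (2 * Z_975)
    se_scaled = se * math.sqrt(dispersion)
    centre = math.log(rr)
    return (math.exp(centre - Z_975 * se_scaled),
            math.exp(centre + Z_975 * se_scaled))


lo, hi = scale_ci(1.48, 1.31, 1.67, dispersion=2.3)
print(f"{lo:.2f}-{hi:.2f}")  # 1.23-1.78 (reported SAS interval: 1.23-1.77)
```

The close agreement is consistent with this assumption. The small discrepancy is what rounding of the published limits would produce.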
Bayesian hierarchical spatial model:
Fitted to allow for overdispersion due to:
- small-area population heterogeneity;
- missing covariates (that may be spatially autocorrelated).
To allow for the uncertainty associated with the smoking data (small counts; missing values), an errors-in-variables model was used for $z_i$.

$$\log(r_i) = \alpha + \beta x_i + \delta z_i^{\mathrm{est}} + e_i$$

$e_i$ = unexplained area-specific log relative risk in area i after adjusting for $x$ and $z^{\mathrm{est}}$, with $e_i = v_i + s_i$.
$v_i$ = unstructured random effects (zero-mean normal prior); $s_i$ = spatially structured random effects (zero-mean intrinsic conditional autoregressive prior).

$$z_i^{\mathrm{est}} = \log(\mathrm{smoke}.r_i) = \mathrm{smoke}.\alpha + \mathrm{smoke}.v_i + \mathrm{smoke}.s_i$$
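A small Python sketch can show what an intrinsic CAR prior on $s_i$ implies. It assumes the common pairwise-difference form with adjacency weights of 1 and a precision `tau`; the five-area chain of neighbours is hypothetical. Under this prior each $s_i$, given the others, is normal around the average of its neighbours, with variance $1/(\tau n_i)$. The joint density is unchanged by adding a constant to every $s_i$, which is why a sum-to-zero constraint or equivalent is needed.

```python
# Hypothetical adjacency for five areas in a line (0-1-2-3-4); real EDs would
# use shared-boundary neighbours.
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}


def icar_conditional(i, s, nbrs, tau):
    """Mean and variance of s_i given all other s_j under an intrinsic CAR prior:
    the neighbour average, and 1 / (tau * number of neighbours)."""
    js = nbrs[i]
    mean = sum(s[j] for j in js) / len(js)
    var = 1.0 / (tau * len(js))
    return mean, var


def icar_log_density_kernel(s, nbrs, tau):
    """Unnormalised log density: -tau/2 * sum over neighbour pairs of (s_i - s_j)^2.
    Each pair is counted once (i < j)."""
    ss = sum((s[i] - s[j]) ** 2 for i, js in nbrs.items() for j in js if i < j)
    return -0.5 * tau * ss


s = [0.2, -0.1, 0.0, 0.3, -0.4]
print(icar_conditional(2, s, neighbours, tau=4.0))
print(icar_log_density_kernel(s, neighbours, tau=4.0))
```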
Priors:
- flat priors used for the intercept and regression coefficients;
- gamma(0.5, 0.0005) priors used for the precision parameters of the random-effect terms.
Spatial fraction (SF) = Var(s_i) / [Var(s_i) + Var(v_i)]: the ratio of the estimated marginal variance of the spatial random effect to the sum of the estimated marginal variances of the spatial and unstructured random effects.
SF → 1 implies spatial heterogeneity dominates; SF → 0 implies unstructured heterogeneity dominates.
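The slides do not say how the marginal variances were estimated. One common approach, assumed here, works from the MCMC output. At each iteration it takes the empirical variance of the $s_i$ and of the $v_i$ across areas, forms the ratio, and then averages over iterations. The draws below are hypothetical.

```python
import statistics


def spatial_fraction(s_draws, v_draws):
    """Posterior mean of SF = Var(s) / [Var(s) + Var(v)].

    s_draws, v_draws: one list of area-level effects (s_i or v_i) per MCMC iteration.
    At each iteration, Var(.) is the empirical variance across areas.
    """
    fractions = []
    for s, v in zip(s_draws, v_draws):
        var_s = statistics.variance(s)
        var_v = statistics.variance(v)
        fractions.append(var_s / (var_s + var_v))
    return statistics.mean(fractions)


# Two hypothetical MCMC iterations for four areas.
s_draws = [[0.30, -0.30, 0.60, -0.60], [0.10, -0.10, 0.20, -0.20]]
v_draws = [[0.10, -0.10, 0.20, -0.20], [0.10, -0.10, 0.20, -0.20]]
print(round(spatial_fraction(s_draws, v_draws), 3))
```

Here the first iteration gives SF = 0.9 (spatial variance nine times the unstructured one) and the second gives 0.5, so the posterior mean is 0.7.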
Poisson regression with spatial random effects, controlling for age, sex, deprivation and smoking prevalence

Parameter            Rel. Risk (95% CI), WinBUGS
NOx category 5       1.27 (1.03-1.54)
NOx category 4       1.16 (0.95-1.40)
NOx category 3       1.04 (0.85-1.25)
NOx category 2       1.07 (0.89-1.29)
NOx category 1       1 (reference)
Smoking (z^est)      1.05 (0.79-1.40)
Spatial fraction     0.006 (stroke model); 0.99 (smoking model)
DIC                  3927.77
Conclusions:
Evidence of an association between NOx and stroke mortality:
1. a threshold level for an effect;
2. the effect size diminishes after including random effects to allow for overdispersion and missing variables;
3. spatially smoothing NOx to allow for local journeys did not make a difference to the size of the effect;
4. we were unable to allow for long- and short-term population movements;
5. no association with smoking prevalence (effect of definition? small sample sizes in some EDs?).
2. Fitting a Binomial Model
- Stroke is not contagious, so outcomes for individuals are independent Bernoulli random variables, and at the area level they aggregate to Binomial random variables.
- Because stroke is relatively rare, the Poisson assumption should give similar results, but it is only an approximation.
- We also have data on the proportion of the population exposed to different levels of NOx in each ED, which were not used previously.
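The rare-event argument can be checked numerically with a Python sketch that compares Bin(n, p) with Poisson(np). The values n = 200 and p = 0.015 are hypothetical. They are chosen to resemble an ED in this study: c. 200,000 people over 45 in 1,030 EDs gives roughly 200 per ED. The expected count np = 3 is close to the observed average of 2.89 deaths per ED. `math.comb` requires Python 3.8 or later.

```python
import math


def binom_pmf(k, n, p):
    """Pr(Y = k) for Y ~ Bin(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, mu):
    """Pr(Y = k) for Y ~ Poisson(mu)."""
    return math.exp(-mu) * mu**k / math.factorial(k)


n, p = 200, 0.015  # hypothetical ED size and death probability over the period
for k in range(7):
    print(k, round(binom_pmf(k, n, p), 4), round(poisson_pmf(k, n * p), 4))
```

At this level of rarity the two probability mass functions differ only in the third decimal place, which is why the earlier Poisson results are expected to be similar but not identical.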
Ecological analysis

               Not exposed                      Exposed                          Margins
Death          Unknown (but of interest)        Unknown (but of interest)        Observed (and used in the previous analysis)
Not death      Unknown (but of interest)        Unknown (but of interest)        Observed (and used in the previous analysis)
Totals         Observed (not previously used)   Observed (not previously used)
Dichotomised individual-level model
$x_{i,j}$ is 0 if individual j in area i is not exposed, or 1 if individual j in area i is exposed.
$y_{i,j}$ is 1 if individual j in area i died of stroke, and 0 otherwise.

$$y_{i,j} \sim \mathrm{Bernoulli}(q_{i,x_{i,j}}), \qquad \mathrm{logit}(q_{i,x_{i,j}}) = \alpha_{x_{i,j}} + \gamma z_i + v_i$$

$q_{i,0} = \Pr(y_{i,j} = 1 \mid x_{i,j} = 0)$: stroke risk in the not-exposed group in area i.
$q_{i,1} = \Pr(y_{i,j} = 1 \mid x_{i,j} = 1)$: stroke risk in the exposed group in area i.
$z_i$ denotes other area-level covariates (e.g. deprivation).
$v_i \sim N(0, \sigma^2)$: an unstructured random effect to account for unmeasured covariates.

Depending on the exposure status of the individual:
If $x_{i,j} = 0$ (the person is in the not-exposed group): $\mathrm{logit}(q_{i,0}) = \alpha_0 + \gamma z_i + v_i$
If $x_{i,j} = 1$ (the person is in the exposed group): $\mathrm{logit}(q_{i,1}) = \alpha_1 + \gamma z_i + v_i$

This can be extended to a categorical exposure variable with more than two levels. Various extensions of the model, such as incorporating continuous exposure, can be found in Jackson et al. (2006).
Jackson, C. H., Best, N. G. and Richardson, S. (2006). Improving ecological inference using individual-level data. Statistics in Medicine, 25(12), 2136-2159.
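The individual-level model $\mathrm{logit}(q_{i,k}) = \alpha_k + \gamma z_i + v_i$ (k = 0 not exposed, k = 1 exposed) can be evaluated directly in a short Python sketch. All parameter values are hypothetical. The sketch also illustrates a general property of logistic models. The odds ratio between exposed and not-exposed individuals in the same area is exactly $\exp(\alpha_1 - \alpha_0)$, whatever $z_i$ and $v_i$ are. When both risks are small, it is close to the risk ratio $q_{i,1}/q_{i,0}$.

```python
import math


def expit(eta):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-eta))


def logit(q):
    return math.log(q / (1.0 - q))


def group_risks(alpha0, alpha1, gamma, z, v):
    """Return (q_{i,0}, q_{i,1}) where logit(q_{i,k}) = alpha_k + gamma * z_i + v_i."""
    area_part = gamma * z + v
    return expit(alpha0 + area_part), expit(alpha1 + area_part)


# Hypothetical parameters: baseline risk about 1.5% and an exposure odds ratio of 1.2.
alpha0 = logit(0.015)
alpha1 = alpha0 + math.log(1.2)
q0, q1 = group_risks(alpha0, alpha1, gamma=0.1, z=0.5, v=-0.05)

odds_ratio = (q1 / (1.0 - q1)) / (q0 / (1.0 - q0))  # equals exp(alpha1 - alpha0)
risk_ratio = q1 / q0                                # close to it because q0, q1 are small
print(f"q0={q0:.4f} q1={q1:.4f} OR={odds_ratio:.3f} RR={risk_ratio:.3f}")
```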
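An area-level model incorporating the distribution of within-area exposure

$$y_i \sim \mathrm{Bin}(n_i, p_i), \qquad p_i = (1 - \phi_i)\, q_{i,0} + \phi_i \, q_{i,1}$$

where
$y_i$ = number of stroke deaths and $n_i$ = number of people at risk in area i;
$\phi_i$ = proportion of the population in area i in the exposed category;
$p_i$ = probability of stroke death in area i, regardless of exposure;
$q_{i,0} = \mathrm{expit}(\alpha_0 + \gamma z_i + v_i)$ and $q_{i,1} = \mathrm{expit}(\alpha_1 + \gamma z_i + v_i)$.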
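A minimal Python sketch of the area-level likelihood follows, assuming $p_i = (1-\phi_i)\,\mathrm{expit}(\alpha_0 + \gamma z_i + v_i) + \phi_i\,\mathrm{expit}(\alpha_1 + \gamma z_i + v_i)$ and $y_i \sim \mathrm{Bin}(n_i, p_i)$. The dictionary keys and the two example areas are hypothetical. In a full analysis, $\alpha_0$, $\alpha_1$, $\gamma$ and the $v_i$ would receive priors and be estimated, for example by MCMC. The sketch only evaluates the log-likelihood for given values.

```python
import math


def expit(eta):
    return 1.0 / (1.0 + math.exp(-eta))


def area_death_probability(phi, alpha0, alpha1, gamma, z, v):
    """p_i = (1 - phi_i) * q_{i,0} + phi_i * q_{i,1}."""
    q0 = expit(alpha0 + gamma * z + v)
    q1 = expit(alpha1 + gamma * z + v)
    return (1.0 - phi) * q0 + phi * q1


def binomial_loglik(y, n, p):
    """log Pr(y | n, p) for y ~ Bin(n, p)."""
    return (math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
            + y * math.log(p) + (n - y) * math.log1p(-p))


def total_loglik(areas, alpha0, alpha1, gamma):
    """Sum of area log-likelihoods; each area is a dict with y, n, phi, z and v."""
    return sum(
        binomial_loglik(a["y"], a["n"],
                        area_death_probability(a["phi"], alpha0, alpha1, gamma, a["z"], a["v"]))
        for a in areas)


# Two hypothetical areas.
areas = [
    {"y": 3, "n": 200, "phi": 0.4, "z": 0.2, "v": 0.0},
    {"y": 1, "n": 150, "phi": 0.1, "z": -0.5, "v": 0.0},
]
print(total_loglik(areas, alpha0=-4.2, alpha1=-4.0, gamma=0.3))
```

The within-area exposure proportion $\phi_i$ enters as a weight on the two group risks, not as a covariate on the logit scale.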
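Remark: note that applying a Binomial model with the proportion of exposed individuals as a covariate,

$$y_i \sim \mathrm{Bin}(n_i, p_i), \qquad \mathrm{logit}(p_i) = \alpha + \beta \phi_i + \gamma z_i + v_i,$$

is not the same as the model derived from the individual-level model, because in general

$$\mathrm{expit}(\alpha + \beta \phi_i + \gamma z_i + v_i) \neq (1 - \phi_i)\, \mathrm{expit}(\alpha_0 + \gamma z_i + v_i) + \phi_i \, \mathrm{expit}(\alpha_1 + \gamma z_i + v_i).$$

The left-hand side is the ecological model; the right-hand side is derived from the individual-level model. The discrepancy between them is ecological bias.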
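A small numerical check of the remark in Python follows. For illustration, the covariate and random-effect terms are set to zero, and the naive ecological model is matched to the individual-level model at $\phi_i = 0$ and $\phi_i = 1$ by taking $\alpha = \alpha_0$ and $\beta = \alpha_1 - \alpha_0$. This is our choice; other choices also fail. The two still disagree at intermediate $\phi_i$, because the mixture is linear in $\phi_i$ while $\mathrm{expit}(\alpha + \beta\phi_i)$ is not. For risks below 50% expit is convex, so the naive curve lies below the mixture.

```python
import math


def expit(eta):
    return 1.0 / (1.0 + math.exp(-eta))


def logit(q):
    return math.log(q / (1.0 - q))


# Hypothetical group log-odds: baseline risk 1.5% and an exposure odds ratio of 2.
alpha0 = logit(0.015)
alpha1 = alpha0 + math.log(2.0)


def mixture_p(phi):
    """Area risk implied by the individual-level model (linear in phi)."""
    return (1.0 - phi) * expit(alpha0) + phi * expit(alpha1)


def naive_p(phi, alpha=alpha0, beta=alpha1 - alpha0):
    """Area risk from logit(p) = alpha + beta * phi, matched at phi = 0 and phi = 1."""
    return expit(alpha + beta * phi)


for phi in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"phi={phi:.2f}  individual-level={mixture_p(phi):.5f}  naive={naive_p(phi):.5f}")
```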
3. Results
Binomial regression controlling for age, sex (18 strata) and deprivation, and incorporating the within-area distribution of exposure.

Parameter         Rel. Risk (95% CI),                 Rel. Risk (95% CI),
                  without unstructured R.E.           with unstructured R.E.
NOx category 5    1.34 (1.14-1.52)                    1.07 (0.88-1.29)
NOx category 4    1.16 (1.03-1.30)                    1.05 (0.86-1.25)
NOx category 3    1.10 (0.99-1.22)                    0.92 (0.75-1.10)
NOx category 2    1.00 (0.87-1.13)                    0.87 (0.73-1.04)
NOx category 1    1 (reference)                       1 (reference)
DIC               4953.02                             3936.66
pD                8                                   480
A dichotomised-exposure Binomial regression model controlling for age, sex (4 strata; 18 strata) and deprivation, and incorporating data on the within-area distribution of exposure.

Parameter       Rel. Risk (95% CI), 4 strata   Rel. Risk (95% CI), 18 strata
Exposed         1.20 (1.05-1.34)               1.14 (1.00-1.30)
Non-exposed     1 (reference)                  1 (reference)

• The exposed category comprises NOx categories 4 and 5 in the previous slide;
• the non-exposed category comprises categories 1, 2 and 3.
4. Conclusions
1. Incorporating information on within-area exposure reduced the estimated relative risks compared with the earlier set of results.
2. Lower risks in categories 2 and 3 in the binomial model with 5 exposure categories may indicate that some confounding effects have not been accounted for in the current model; in the absence of additional information, these effects could be "averaged out" by combining some exposure categories.
3. Fitting a reduced model with two exposure categories does indicate a significant effect in the exposed group after adjusting for age, sex and deprivation.
4. Increasing the number of age-sex strata from 4 to 18 in the dichotomous-exposure model reduced the estimated relative risk to 1.14 (95% CI: 1.00-1.30), but there is still evidence of a significant effect.
Differences between the current approach and the earlier modelling:
– The Poisson model is prone to ecological bias since, for exposure, only aggregated information was used.
– Here we attempt to reduce the bias by using data on the within-area distribution of exposure, i.e., the proportions of people in the exposed and non-exposed groups.
– Deprivation was absorbed into the expected number of cases in the earlier work; here it has been included as a covariate. We could adjust for deprivation in the baseline risks.
– There was no adjustment for smoking prevalence since it was not significant in the earlier modelling. Lung cancer mortality could be used as a proxy for smoking instead.