School of Mathematical Sciences
Queensland University of Technology
Bayesian mixture modelling for characterising
environmental exposures and outcomes
Darren Eastwood Wraith
BCom(Econ), Post Grad Dipl Health Econ & Eval, BMath
A thesis submitted for the degree of Doctor of Philosophy in the Faculty of
Science, Queensland University of Technology according to QUT requirements.
Principal Supervisor: Prof. Kerrie Mengersen
Associate Supervisors: Assoc. Prof. Shilu Tong; Dr Clair Alston
2008
Abstract
Environmental exposure and outcomes assessment is a great challenge to scientists.
Increasingly detailed data are becoming available to understand the nature and
complexity of the relationships involved. The methodology of mixture models
provides a means to understand, quantify and describe features and relationships
within complex data sets. In this thesis, we focus on a number of applied problems
to characterise complex environmental exposures and outcomes, including:
assessing the interaction between environmental exposures as risk factors for health
outcomes; identifying differing environmental outcomes across a region; and
establishing patterns in the size and concentration of aerosol particles over time.
Mixture model approaches to address these problems are developed and examined for
their suitability in these contexts.
List of publications and manuscripts arising from this
thesis
This thesis comprises the following publications, which have been accepted or
submitted for publication in international refereed journals:
Chapter 3: Wraith D. & Mengersen K. Assessing the combined effect of asbestos exposure
and smoking on lung cancer: A Bayesian approach. Statistics in Medicine, 28
February 2007, 1150-1169
Chapter 4: Wraith D. & Mengersen K. A Bayesian approach to assess interaction
between known risk factors: the risk of lung cancer from exposure to asbestos
and smoking. Statistical Methods in Medical Research. (Published online 14
August 2007)
Chapter 5: Wraith D., Mengersen K., Low Choy S., Tong S. Spatial and Temporal
Modelling of Ross River virus in Queensland. In Zerger, A. and Argent, R.M. (eds)
MODSIM 2005 International Congress on Modelling and Simulation. Modelling
and Simulation Society of Australia and New Zealand, December 2005
Chapter 6: Wraith D., Alston C., Mengersen K., & Hussein T. Bayesian mixture model
estimation of aerosol particle size distributions. Environmetrics (Submitted:
November 2007)
Chapter 7: Wraith D., Alston C., Mengersen K., & Hussein T. Bayesian estimation of
mixtures over time with application to aerosol particle size distributions.
Statistical Modelling (Submitted: November 2007)
Contents
1 Introduction
1.1 Primary Research Aim
1.2 Motivation
1.3 Research Plan
1.4 Scope of thesis
1.5 Outline of thesis
2 Literature Review
2.1 Introduction
2.2 Meta-analysis methods
2.2.1 Applications to environmental exposures and outcomes
2.3 Mixture models
2.3.1 Relevant applications
2.4 Conclusion
3 Assessing the combined effect of asbestos exposure and smoking on lung cancer: A Bayesian approach
3.1 Summary
3.2 Introduction
3.3 Assessing interaction between asbestos and smoking
3.3.1 Synergy Index (S)
3.3.2 Multiplicativity Index (V)
3.3.3 The relationship between exposure to asbestos and smoking
3.4 Methods
3.4.1 Studies
3.4.2 Methods to assess interaction
3.5 Results
3.5.1 Sensitivity of the Results
3.6 Discussion
4 A Bayesian approach to assess interaction between known risk factors: the risk of lung cancer from exposure to asbestos and smoking
4.1 Summary
4.2 Introduction
4.3 Overview of studies
4.4 Methods to assess interaction
4.5 Results
4.6 Discussion
5 Spatial and temporal modelling of Ross River virus in Queensland
5.1 Introduction
5.2 Method
5.2.1 Data
5.2.2 Mixture models
5.3 Results
5.4 Discussion
6 Bayesian mixture model estimation of aerosol particle size distributions
6.1 Introduction
6.2 Particle size distribution data
6.3 Methods
6.3.1 Mixture model at a single time point
6.3.2 Accounting for truncated data
6.4 Results
6.4.1 Simulated data: single time point
6.4.2 Case study: single time point
6.4.3 Results for mixture model estimation over multiple time points
6.5 Discussion
7 Bayesian estimation of mixtures over time with application to aerosol particle size distributions
7.1 Introduction
7.2 Particle size distribution data
7.3 Methods
7.3.1 Mixture representation
7.3.2 Choice of temporal prior
7.4 Results
7.4.1 Simulated Data
7.4.2 Case study
7.5 Discussion
8 Bayesian hierarchical modelling for a time series of mixtures
8.1 Introduction
8.2 Particle size distribution data
8.3 Mixture models
8.3.1 Hierarchical time series approach for mixture models
8.4 Results
8.4.1 Simulated Data
8.4.2 Case study
8.5 Discussion
9 Conclusions and further work
9.1 Conclusions
9.2 Future Work
A Appendices
A.1 Calculations for the variance of S and V (Ch. 3)
A.2 Reversible Jump Markov Chain Monte Carlo (RJMCMC) (Ch. 6)
A.3 Penalised Prior (Ch. 6)
A.4 Details of MH Gibbs sampler for hierarchical model (Ch. 8)
List of Figures
2.1 Representative scatter plots of altitude (km) versus ozone partial pressure (micro-millibar) with fitted mixture regression curves. (a) 7 February 1990; (b) 9 February 1990; (c) 12 February 1990; (d) 14 February 1990
3.1 Box plots of V from Study 2
3.2 Box plots of V from Study 5
3.3 Density plot of V and S from multivariate analysis for Study 2
3.4 Density plot of V and S from multivariate analysis for Study 5
4.1 Boxplots of β12 (log scale) by study (horizontal axis and study numbers ordered left to right) and overall (over-dispersed model)
4.2 Starplots by study (1-18) and Overall. S is the Synergy Index, V the Multiplicativity Index, PM the probability of a multiplicative relation, and gamma is the power transformation estimate from Rlg (gamma=0 (additive), gamma=1 (multiplicative))
5.1 Queensland climate zones - Bureau of Meteorology
5.2 Time plot of weekly cases - Zone 15
5.3 Time plot of weekly cases - Zone 5
5.4 Histograms of data (log(y+1)) for all Zones (as numbered)
5.5 Plot of fitted mixture model for Zone 15 showing three components against the data over time (log values). Overall fitted density is shown in Black, and components in Red. Blue lines indicate the estimates of µ for the three components.
5.6 Plot of fitted mixture model for Zone 15 against a histogram of the data. Overall fitted density is shown in Black, and components in Red.
5.7 Plot of fitted mixture model for Zone 5 showing three components against the data over time (log values). Overall fitted density is shown in Black, and components in Red. Blue lines indicate the estimates of µ for the three components.
5.8 Plot of fitted mixture model for Zone 5 against a histogram of the data (density scale). Overall fitted density is shown in Black, and components in Red.
6.1 Histogram of data sampled from Hyytiala, Finland for a single time period
6.2 An illustration of a new particle formation event at a Boreal Forest site located in Southern Finland. (a) The temporal variation of the particle number size distribution and (b) selected particle number size distributions showing the different stages of the newly formed particle mode from its early stage. Note that this new particle formation occurred on a regional scale over the southern part of Finland.
6.3 Kernel density estimator of simulated data (black) with fitted results from normal (dark green) and truncated normal (blue) approaches. Simulated data based on parameters: k = 4; µ = (1.40, 2.30, 3.70, 5.10); σ = 0.30; λ = (0.10, 0.10, 0.60, 0.20)
6.4 Histograms of data sampled from Hyytiala, Finland with estimated overall fit and components for non-truncated Normal (left, k=4) and truncated normal (right, k=3) overlaid
6.5 Histograms of data sampled from Hyytiala, Finland with estimated overall fit and components from RJMCMC (left) and LSM (right) overlaid
6.6 Histograms of data sampled from Hyytiala, Finland with estimated overall fit and components from RJMCMC (left) and LSM (right) overlaid
6.7 Plot of posterior mean values for µjt obtained from the first stage using the RJMCMC algorithm for one day (Hyytiala measurement station). The size of the circles indicates the weight (λjt) corresponding to µjt
6.8 Plot of parameters (µ and λ) over one day (Hyytiala, Finland) for the independent approach. Stage 2 of the analysis for the evolution of parameters. Measurements taken every 10 minutes. Colours indicate the components to which parameter estimates belong (the parameter estimates for the first component are Black, parameters for the second component are Red, for the third component they are Green, etc.)
6.9 Plot of parameters (µ and λ) over one day (Hyytiala, Finland) for the informed prior approach. Stage 2 of the analysis for the evolution of the parameters. Measurements taken every 10 minutes. Colours indicate the components to which parameter estimates belong (the parameter estimates for the first component are Black, parameters for the second component are Red, for the third component they are Green, etc.)
7.1 Estimated overall fit and components from RJMCMC for one time period. Concentration of particles (dN/dlog(Dp)[cm3]) by particle diameter (log(Dp(nm)))
7.2 An illustration of a new particle formation event at a Boreal Forest site located in Southern Finland. (a) The temporal variation of the particle number size distribution and (b) selected particle number size distributions showing the different stages of the newly formed particle mode from its early stage. Note that this new particle formation occurred on a regional scale over the southern part of Finland.
7.3 Plot of estimated parameters (µ (top panels), σ (middle panels) and λ (bottom panels)) for approaches using simulated data (D1): Simulated data (Black); Independent (Red); Informed Prior (Green); Penalised Prior (Blue)
7.4 Plot of estimated parameters (µ (top panels), σ (middle panels) and λ (bottom panels)) for approaches using simulated data (D2): Simulated data (Black); Independent (Red); Informed Prior (Green); Penalised Prior (Blue)
7.5 Plot of estimated parameters (µ (top panels), σ (middle panels) and λ (bottom panels)) for approaches using simulated data (D3): Simulated data (Black); Independent (Red)
7.6 Plot of estimated parameters (µ (top panels), σ (middle panels) and λ (bottom panels)) for Informed Prior approach using simulated data (D3): Simulated data (Black); Theta=0.1 (Green); Theta=0.8 (Blue); Theta=1.3 (Purple)
7.7 Plot of estimated parameters (µ (top panels), σ (middle panels) and λ (bottom panels)) for Penalised Prior approach using simulated data (D3): Simulated data (Black); φ=0.04 (Brown); φ=0.08 (Light Blue); φ=0.12 (Dark Green)
7.8 Plot of estimated parameters (µ (top panels), σ (middle panels) and λ (bottom panels)) for Informed Prior approach using simulated data (D3): Simulated data (Black); Smoothing on µ (Orange); Smoothing on σ (Dark Green)
7.9 Plot of estimated parameters (µ (top panels), σ (middle panels) and λ (bottom panels)) for approaches using simulated data (D3): Simulated data (Black); Independent (Red); Smoothing on µ and λ (Green)
7.10 Plot of posterior mean estimates for µj from RJMCMC algorithm for one day (Hyytiala). Stage 1 of analysis for temporal evolution of parameters. Larger circles indicate greater weight for that component
7.11 Plot of estimated parameters (µ (top panel), λ (bottom panel)) under an independent prior over time. Stage 2 of analysis for temporal evolution of parameters.
7.12 Plot of estimated parameters (µ (top panel) and λ (bottom panel)) under an informed prior over time. Stage 2 of analysis for temporal evolution of parameters. Informed prior specified for λ in all components and µ3
8.1 Histogram of data sampled from Hyytiala, Finland for a single time period
8.2 An illustration of a new particle formation event at a Boreal Forest site located in Southern Finland. (a) The temporal variation of the particle number size distribution and (b) selected particle number size distributions showing the different stages of the newly formed particle mode from its early stage. Note that this new particle formation occurred on a regional scale over the southern part of Finland.
8.3 Plot of estimated parameters over time for simulated dataset D1: µ (top panels), σ (middle panels), λ (bottom panels). Actual data (Black), Independent (Red), Informed Prior (Green)
8.4 Plot of estimated parameters over time for simulated dataset D1. µ (top panels), σ (middle panels), λ (bottom panels). Actual data (Black), Hierarchical approach for µ (Dark Green), φ (Blue)
8.5 Plot of estimated parameters over time for simulated dataset D1. µ (top panels), σ (middle panels), λ (bottom panels). Actual data (Black), Hierarchical approach for λ (Dark Green), γ (Blue)
8.6 Plot of estimated parameters over time for simulated dataset D2. µ (top panels), σ (middle panels), λ (bottom panels). Actual data (Black), Independent (Red), Informed Prior (Green)
8.7 Plot of estimated parameters over time for simulated dataset D2. µ (top panels), σ (middle panels), λ (bottom panels). Actual data (Black), Hierarchical approach for µ (Dark Green), φ (Blue)
8.8 Plot of estimated parameters over time for simulated dataset D2. µ (top panels), σ (middle panels), λ (bottom panels). Actual data (Black), Hierarchical approach for λ (Dark Green), γ (Blue)
8.9 Plot of posterior mean estimates for µj from RJMCMC algorithm for one day (Hyytiala). Stage 1 of analysis for temporal evolution of parameters. Larger circles indicate greater weight for that component
8.10 Plot of estimated parameters over time for actual data. Independent approach. Posterior mean estimates for µ (top panel), and λ (bottom panel).
8.11 Plot of estimated parameters over time for actual data. Hierarchical approach for λ. Posterior estimates for µ (top panel), and γ (bottom panel).
List of Tables
3.1 Details of Studies Used for Statistical Analysis
3.2 Reported Results of Studies
3.3 Results (Univariate): Test of Synergy (S) and Multiplicativity (V)
3.4 Results for Multivariate RR Analysis
3.5 Combined Results for S and V
3.6 Sensitivity of estimates from the Variance/Covariance Matrix for the Multivariate Analysis
3.7 Results for S and V by Factor
3.8 Sensitivity of the Posterior Estimates (Univariate): Test of S and V
3.9 Sensitivity of the Posterior Estimates (Multivariate Analysis)
4.1 Details of studies for statistical review
4.2 Results of logistic and Poisson regression models
4.3 Relative Risk Estimates, Observed and Posterior
4.4 Results of relative risk models and mixture model
4.5 Results of Synergy Index (S) and Multiplicativity Index (V)
4.6 Sensitivity Analysis
5.1 Summary results - all zones
5.2 Results for Zones 5 and 15
5.3 Results for all Zones
6.1 Estimated parameter values from Bayesian mixture model analysis using RJMCMC algorithm with simulated data. Based on 200,000 iterations with a burn-in of 100,000. CI = Credible Interval
Statement of Original Authorship
The work contained in this thesis has not been previously submitted for a degree or
diploma at any other higher educational institution. To the best of my knowledge
and belief, the thesis contains no material previously published or written by another
person except where due reference is made.
Signed:
Date:
Acknowledgements
There are many people to thank, and I will not be able to remember everyone who
helped in some way; for that I apologise in advance. I would firstly like to thank
my Principal Supervisor, Kerrie Mengersen, for her amazing support and guidance
over a very long period of time, for which I am very grateful. To Clair Alston,
who was a huge help for Chapters 6 and 7, especially in the early stages of this
work when the computational task, including the programming, seemed onerous: I
thank you very much. I would also like to thank everyone in the School of
Mathematical Sciences at QUT for making the last few years an enjoyable and
friendly experience. To my roommates and friends in O415, thank you for all the
necessary and fun distractions, including the Friday night ritual! I would also
like to thank Andrew Torre, whose inspiration and belief started my interest in
mathematics seemingly a long time ago, and whose support over the years I have
greatly appreciated. Last, but not least, I would like to thank my family and
friends for all their support. Although some weren't actually sure what I was
doing, they were nevertheless supportive anyway, and in those terms perhaps the
best support I could ask for. I would also like to especially thank my partner,
Shonah, for her amazing support, particularly in the final stages, and whose
level of support continues to amaze me every day. Thank you!
Chapter 1
Introduction
1.1 Primary Research Aim
The primary aim of this study is to develop mixture model approaches to characterise
complex environmental exposures. The methodology of mixture models provides
a means to understand, quantify and describe features and relationships within
environmental exposure data.
1.2 Motivation
The motivation for this thesis arose from a research project examining the nature
of the relationship between exposure to both asbestos and smoking as risk factors
for lung cancer. From a preliminary review of the literature, we found there was
well-documented evidence indicating that both long-term exposure to asbestos and
active smoking are independent risk factors for lung cancer. However, the nature of
the relationship between, or interaction of, the two risk factors was less clear and was
often the subject of much debate in the epidemiological literature. The question of
primary interest from a statistical perspective was whether the risk from exposure
to both asbestos and smoking is an additive, multiplicative or other relation of the
risk from exposure to each factor alone.
We found from reviewing the studies which aimed to quantify the relationship
between exposure to asbestos and smoking that there was much variability in the
results. We also found much discrepancy between the outcomes of two major reviews
published at a similar time, the first by Lee (2001) and the second by Liddell (2001),
which led to an interesting interchange between the two authors in discussing the
outcomes of their reviews (Lee, 2002; Liddell, 2001). Lee (2002) found little evidence
to reject a multiplicative relation; however, Liddell (2002) highlighted differences in
the results of the case-control versus cohort studies, finding evidence against a simple
multiplicative hypothesis. It was clear from reading these two reviews that, apart
from some of the differences in the individual studies (e.g. study design, exposure
levels, etc.) to which these reviews refer, an alternative method or perspective
to characterise the relationship was needed. In particular, we were interested in
whether we could state, with some probability, that the relationship between the
two exposures could be assigned to one functional form or another (additive or
multiplicative). From a statistical perspective, a mixture model approach seemed
ideally suited to the analysis of this problem and we started to look at how we
could apply this approach in a meta-analysis context. After much investigation, we
found that we could characterise the relationship between the two risk factors in
this way, and that the results could be used as an aid to understand and quantify
the uncertainty in establishing a relationship. With the success of this approach,
the idea to use this to characterise other environmental exposures was born.
Under an ARC Discovery Grant, we started to look at a mixture model approach
to characterise the risk of Ross River virus (RRv) in Queensland (QLD). In
particular, interest was in how the risk of RRv varied over time and across the
spatial region of QLD. A recent paper by Gatton et al. (2004), examining the risk
of RRv across QLD, found much variability in the results across the region, and
found it useful in the analysis to separately identify outbreak periods. With much
variability in the results over time and space, we started to look at applying a
mixture model approach to classify the data over time into different periods, which
would build on the approach adopted in Gatton et al. (2004). In particular, we were
interested in whether we could classify cases of RRv over time into more than two
periods (outbreak or no outbreak) and how the number of periods varied across the
spatial region of QLD.
Under the ARC Discovery Grant, we were also collaborating with the aerosol
physics group at QUT on a project looking at the concentration of particles of
different sizes over time, which had been collected in Brisbane over a 5-year period.
The size of the particles ranged from 16nm to 600nm. In the aerosol physics
literature, one of the standard approaches to analysing data of this form was to classify
the data into particular size groups of interest (e.g. 16-30nm, 31-90nm, 90nm+) and
analyse these groups separately. As the size of the particles can reveal their source,
and because particles are governed by formation and transformation processes, they
tend to form well-distinguishable modal features. An alternative approach in the
aerosol literature for analysing this type of data is to identify the modes of the data
using a classification technique and analyse the modes separately. However, this
classification approach was more difficult to apply and, because of the basic approaches
employed (e.g. least squares regression), was very rarely used without making some
subjective decisions (see Hussein et al., 2005). Armed with our experience in using
mixture models, we were interested in whether we could use a mixture model to
classify the data by the modal features of the data, and also in how this mixture
representation would change over time. As the time intervals between measurements
are often quite small (e.g. 5 minutes to an hour), and the data over time are thus often
highly dependent, we were also interested in any improvements to estimation and
inference by including information about the mixture representation at neighbouring
time points.
For the first part of the analysis using the data from Brisbane, we grouped the
data into size bins according to size ranges of interest and analysed the concentration
of particles for the different size bins separately over time (Mejia et al., 2007). To
apply a mixture model approach to investigate the modal structure of the data, we
chose a more comprehensive dataset from Hyytiala, Finland, which provided a more
detailed assessment of the modal structure and an almost complete dataset of
observations.
1.3 Research Plan
To address the primary aim of this thesis, as stated in Section 1.1, this thesis focusses
on the following problems to characterise complex environmental exposures and
outcomes:
• assessing the interaction between environmental exposures as risk factors for
health outcomes
• identifying differing environmental outcomes across a region
• establishing patterns in the size and concentration of aerosol particles over
time
In order to address these problems the following mixture model approaches are
developed and examined:
• a mixture model approach to assess interaction between risk factors in a meta-
analysis framework
• a mixture model approach to classify cases of a disease over time into a number
of groups based on time periods with differing risk levels
• estimation of mixture models over multiple time points
1.4 Scope of thesis
Each of the selected problems in characterising complex environmental exposures
and outcomes could be approached in a number of ways. In this thesis, we confine
our attention to the development and application of mixture model approaches to
address these problems.
A mixture model approach to assess interaction or relationships between risk
factors in a meta-analysis context is outlined. In this analysis, we consider whether
the relationship between the two exposures could be assigned to one functional form
or another (additive or multiplicative). Alternative relationships and mechanisms
underlying disease causation are not investigated in detail.
A mixture model approach to characterise the risk of RRv over time and spatial
regions by identifying groups in the data is outlined. Explanatory variables
associated with RRv and the correlation structure of the data are not investigated.
Stochastic or mechanistic models to describe the transmission of the disease are also
not examined.
Mixture model approaches to estimate parameters at both single and multiple
time points for aerosol particle size distribution (PSD) data are outlined. Alternative
approaches to analysing these data, such as grouping of size bins into categories and
separate analyses of size variables, are not provided as a comparison in the analysis.
The dynamic processes describing the evolution of the particles are not investigated.
1.5 Outline of thesis
The remaining chapters of this thesis are organised as follows:
Chapter 2 presents a review of meta-analysis and mixture model approaches in
the literature to characterise environmental exposures and outcomes. Most of the
relevant literature is discussed in the chapters, and here we present an overview of
the main approaches used in this thesis.
In Chapter 3, we examine the relationship between two risk factors for lung
cancer, exposure to asbestos and smoking, using a multivariate meta-analysis. In
particular, from a statistical perspective, we are interested in whether the risk from
exposure to both asbestos and smoking is an additive, multiplicative or other relation
of the risk from exposure to each risk factor alone. In this analysis, we consider the
evidence for either relation using separate tests.
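To make the two candidate relations concrete, a short sketch in LaTeX follows. The
notation is an assumption made here for illustration and is not taken from the
thesis: RR_A, RR_S and RR_AS denote the relative risk of lung cancer for exposure
to asbestos only, smoking only, and both, each relative to exposure to neither.
The forms shown for the Synergy Index (S) and Multiplicativity Index (V) are those
commonly used in the epidemiological literature; the definitions given in Chapter 3
take precedence.

```latex
% Additive relation: the excess relative risks add.
RR_{AS} - 1 = (RR_A - 1) + (RR_S - 1)

% Multiplicative relation: the relative risks multiply.
RR_{AS} = RR_A \times RR_S

% Indices (assumed standard forms): S = 1 under additivity, V = 1 under multiplicativity.
S = \frac{RR_{AS} - 1}{RR_A + RR_S - 2},
\qquad
V = \frac{RR_{AS}}{RR_A \times RR_S}
```

As a purely hypothetical example, with RR_A = 2 and RR_S = 10 an additive relation
gives RR_AS = 11, whereas a multiplicative relation gives RR_AS = 20.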
Chapter 4 extends the meta-analysis approach in Chapter 3, and examines a
mixture model approach to assess the strength of evidence for either relation. In this
approach, we move away from separate tests for either an additive or multiplicative
relation and allow the data to choose between both models. By allowing both
relations to be considered at the same time, this type of inference may be more
informative than considering each relation separately.
In Chapter 5, we examine a mixture model approach to characterise the risk of
Ross River virus (RRv) in Queensland from 1984 to 2001. The approach builds on
that adopted by Gatton et al. (2004), and considers that the weekly cases
of RRv could be attributed to more than two hypothesised periods (outbreak or no
outbreak period). It also extends the analysis to compare the number of periods
across non-homogeneous spatial regions of Queensland.
In Chapters 6, 7 and 8 we examine approaches to estimate a mixture model
at both single and multiple time points for aerosol particle size distribution (PSD)
data. In Chapter 6, for estimation of a mixture model at a single time point, we
use Reversible Jump MCMC to estimate mixture model parameters, including the
number of components, which is assumed to be unknown. We compare the results
of this approach to a commonly used estimation method in the aerosol physics
literature. As PSD data is often measured over time at small time intervals, we also
examine the use of an informative prior for estimation of the mixture parameters
which takes into account the correlated nature of the parameters.
In Chapter 7, we examine in some detail the issue of using informative priors
for estimation of mixtures at multiple time points. In this analysis, the use of two
different informative priors, and an independent prior are compared using simulated
and actual data. Informative priors may provide useful information with
which to better identify component parameters at each time point and, as an aid
to inference, to establish more clearly the patterns in the
parameters over time.
In Chapter 8 we address some of the issues raised in Chapter 7, and explore a
hierarchical approach to estimation of mixture parameters over time in which an
informative prior is placed at two different levels. Simulated and actual data are used
to assess the performance of the approach.
The approaches examined in Chapters 6 to 8 extend a previous mixture model
approach for estimation of more than a single time point in a different setting (Alston
et al., 2005), to include all parameters and allow for a generalised correlation struc-
ture to be imposed. We also extend the two stage approach to estimation adopted
in Lee and Berger (2003), by allowing for correlation information to be used at the
same time as parameters are estimated. Approaches in the literature to estimate
a mixture model over a spatial region can also potentially be adapted for use in
a time series setting (Green and Richardson, 2002; Fernandez and Green, 2002),
however the influence or choice of informative priors in a time series framework and
the implications in different data environments has largely not been examined. In
Chapters 6, 7 and 8 we examine the use of informative priors for estimation of pa-
rameters over time, and extend the approaches in Green and Richardson (2002) and
Fernandez and Green (2002) to a time series setting and allow for all parameters to
be correlated over time.
An overview and discussion of the methodology are provided in Chapter 9. Pos-
sible extensions to the research presented in this thesis are indicated.
Chapter 2
Literature Review
2.1 Introduction
As much of the discussion of the literature is contained in each chapter, in this
chapter we provide an overview of the main approaches used in this thesis.
2.2 Meta-analysis methods
The use of Bayesian methods for meta-analysis has recently been reviewed (Sutton
and Higgins, 2007; Sutton and Abrams, 2001; Ashby, 2006; Spiegelhalter et al.,
2004). In this section, we briefly review the Bayesian approach to meta-analysis and
outline some of the main applications to environmental exposures and outcomes.
The use of meta-analysis methods to synthesise evidence regarding environmental
exposures and outcomes has been investigated and applied by a number of authors.
A number of epidemiological applications have concerned: environmental tobacco
smoke and cancer (Tweedie et al., 1996; Wolpert and Mengersen, 2004; Salanti et al.,
2006; Nam et al., 2003); air pollution and mortality or morbidity (Dominici et al.,
2000, 2004; Chen et al., 2006); and health effects from low-level exposure to lead or
exposure to nitrogen oxide (Hasselblad, 1995).
In most of the above applications, the method of meta-analysis has largely been
used to provide an overall assessment of the existence or size of an exposure-outcome
relationship from evidence provided by a number of individual studies where an over-
all picture remains largely obscure (Tweedie et al., 1996). If results from individual
studies are fairly consistent and clearcut it can also be used simply to increase sta-
tistical power, and provide greater confidence around an individual effect.
The earliest Bayesian approach to meta-analysis starts with the landmark papers
by Dumouchel and Harris (1983) and Dempster et al. (1983). Dumouchel and Harris
(1983), inspired by the hierarchical prior distributions of Lindley and Smith (1972),
introduced the idea of constructing hierarchical Bayesian models to synthesise infor-
mation from five types of environmental studies of the effect on human and animal
subjects of exposure to nine related environmental agents. Since then broad guides
to the use of a Bayesian hierarchical model to synthesise evidence include Carlin
(1992) and Spiegelhalter et al. (2004).
In a meta-analysis approach, interest is often in an overall or true underlying
measure, say $\mu$, about which we would like to make inference. To outline the
Bayesian hierarchical approach to meta-analysis, consider the simple formulation
$$Y = \theta + e$$
$$\theta = X\mu + \varepsilon$$
in which $Y = (Y_1, \ldots, Y_k)$ are the observed log relative risks for each study,
$\theta = (\theta_1, \ldots, \theta_k)$ are the corresponding true log relative risks,
$e = (e_1, \ldots, e_k)$ and $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_k)$ are random errors,
$X$ is a $k \times p$ design matrix, and $\mu$ is a $p \times 1$ vector of parameters of interest.
We take $Y$ to be normally distributed such that $Y \sim N(\theta, \Sigma)$, and
assume that $\theta \sim N(X\mu, \tau^2 I)$, where $e_i \sim N(0, \sigma_i^2)$ and
$\varepsilon_i \sim N(0, \tau^2)$ are mutually independent. A frequentist approach is to
consider $\mu$, $\sigma^2$ and $\tau^2$ as fixed parameters, and estimation of $\tau^2$ is most
commonly achieved through the approximation of DerSimonian and Laird (1996).
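As an illustration of the frequentist route only, the moment estimator usually attributed to DerSimonian and Laird can be sketched in Python as follows. The sketch assumes a single overall mean (X is a column of ones) and known within-study variances; the function name `dersimonian_laird` and its arguments are ours and do not come from the papers cited.

```python
import numpy as np

def dersimonian_laird(y, v):
    """Random-effects pooled estimate using the DerSimonian-Laird moment estimator.

    y : observed log relative risks, one per study
    v : within-study variances (assumed known)
    Returns (mu_hat, tau2_hat, se_mu).
    """
    y = np.asarray(y, dtype=float)
    v = np.asarray(v, dtype=float)
    k = len(y)
    w = 1.0 / v                                    # fixed-effect weights
    mu_fixed = np.sum(w * y) / np.sum(w)           # fixed-effect pooled mean
    q = np.sum(w * (y - mu_fixed) ** 2)            # Cochran's Q statistic
    denom = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (k - 1)) / denom)         # moment estimate, truncated at zero
    w_star = 1.0 / (v + tau2)                      # random-effects weights
    mu_hat = np.sum(w_star * y) / np.sum(w_star)
    se_mu = np.sqrt(1.0 / np.sum(w_star))
    return mu_hat, tau2, se_mu
```

When the studies agree closely, Q falls below k - 1 and the estimate of the between-study variance is truncated to zero, in which case the pooled estimate reduces to the fixed-effect one.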
For a Bayesian approach, Dumouchel (1990) and Carlin (1992) make the following
distributional assumptions,
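$$Y \mid \theta, \sigma \sim N(\theta, \sigma^2 C)$$
$$\sigma^{-2} \sim \chi^2_{df_\sigma} / df_\sigma$$
and
$$\theta \mid \mu, \tau \sim N(X\mu, \tau^2 V)$$
$$\mu \mid \tau \sim N(0, D \to \infty)$$
$$\tau^{-2} \sim \chi^2_{df_\tau} / df_\tau$$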
where C and V are k×k observed and prior variance-covariance matrices respectively,
and the degrees of freedom dfσ and dfτ indicate how well C and V, respectively, are
known. If we assume the studies to be independent, we can take C, which describes
within-study variability, to be a diagonal matrix with corresponding diagonal entries
the variances of the individual observations Yi. Similarly, if we assume little inter-
study variability, the matrix V, which describes interstudy variability, can be taken
to be a k × k identity matrix. An overall measure of the mean log relative risk for
all studies combined is provided by µ. The notation D → ∞ indicates that the
elements of D are very large and tending to infinity.
While Dumouchel and Harris (1983) outline approximations to the analytical
posterior distributions for the above distributional assumptions, one of the advances
since then has been the use of Markov Chain Monte Carlo (MCMC) methods which
avoid the need for such approximations, and the above can be implemented using
Gibbs sampling.
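A minimal sketch of such a Gibbs sampler is given below, under simplifying assumptions that are ours rather than those of Dumouchel (1990) or Carlin (1992): a single overall mean (X is a column of ones), V equal to the identity, known within-study variances, a flat prior on µ, and an inverse gamma prior on τ² with hypothetical hyperparameters a and b. The function name `gibbs_meta` is also hypothetical.

```python
import numpy as np

def gibbs_meta(y, s2, n_iter=6000, burn=1000, a=1.0, b=0.1, seed=0):
    """Gibbs sampler for y_i ~ N(theta_i, s2_i), theta_i ~ N(mu, tau2).

    Assumed priors (illustrative only): flat prior on mu,
    tau2 ~ Inverse-Gamma(a, b). Returns retained draws, columns (mu, tau2).
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    s2 = np.asarray(s2, dtype=float)
    k = len(y)
    mu, tau2 = y.mean(), 1.0                       # starting values
    draws = np.empty((n_iter - burn, 2))
    for it in range(n_iter):
        # theta_i | rest: precision-weighted compromise between y_i and mu
        prec = 1.0 / s2 + 1.0 / tau2
        theta = rng.normal((y / s2 + mu / tau2) / prec, np.sqrt(1.0 / prec))
        # mu | rest: normal centred on the mean of the study effects
        mu = rng.normal(theta.mean(), np.sqrt(tau2 / k))
        # tau2 | rest: inverse gamma, drawn as the reciprocal of a gamma variate
        rate = b + 0.5 * np.sum((theta - mu) ** 2)
        tau2 = 1.0 / rng.gamma(a + k / 2.0, 1.0 / rate)
        if it >= burn:
            draws[it - burn] = (mu, tau2)
    return draws
```

Each step draws from a full conditional distribution in closed form, which is what makes the Gibbs sampler convenient for this hierarchical structure. Rerunning the sampler with different values of a and b gives a simple check of the sensitivity of the posterior estimates to the prior.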
Both Carlin (1992) and Dumouchel (1990) suggest that it is desirable to assess
the sensitivity of the results to prior information, in particular the dependence of
the posterior estimates of µ and θ on the specifications of dfσ and dfτ.
There are limitations associated with combining studies in the form of a meta-
analysis. The main limitations are common to both a frequentist and Bayesian
approach, and include confounding effects and biases within studies or biases by pub-
lication. Meta-analysis is designed to enable a combination of results from studies
which are comparable in outcome and exposure. In conducting a meta-analysis, we
may try to combine studies with different designs, or of different quality, which may
produce a consistent bias either upwards or downwards for an overall assessment.
Proposals to account for biases within studies include: restricting the studies included
in the meta-analysis to only those of the best quality; down-weighting studies based
on a quantitative assessment of quality (Tritchler, 1999); and adjustments to study
outcomes using either covariates in a weighted regression (Thompson and Sharp,
1999) and/or prior information (Wolpert and Mengersen, 2004).
Publication bias is concerned with the potential for only statistically significant
or ‘positive’ results to be published and thereby biasing the selection of studies
to be included in a meta-analysis. The issue of publication bias and the impact
on the validity of findings has recently been reviewed (Sutton et al., 2000; Ashby,
2006). Various tests have been proposed to test for publication bias (Begg and
Mazumdar, 1994; Egger et al., 1997). Approaches to address the issue, amongst
others, include: the use of selection models incorporating various weight functions
(Silliman, 1997); the use of simulated pseudo-data (Bowden et al., 2006); and a
non-parametric ‘trim and fill’ funnel plot approach (Duval and Tweedie, 2000).
2.2.1 Applications to environmental exposures and outcomes
In this section, we outline some of the main applications of a Bayesian meta-analysis
to characterise environmental exposures and outcomes.
Dominici et al. (2000) applied a hierarchical regression model to analyse the effect
of urban air pollution on daily mortality using data for the 20 largest US cities. The
data consisted of publicly available listings of individual deaths by day and location,
and hourly measurements of pollutants and weather variables for a seven year period
(1987-1994). In a two stage analysis, the main interest was to establish an association
between PM10 (particulate matter less than 10µm in aerodynamic diameter) and
daily mortality, after controlling for possible confounders, by combining information
from across the cities. Interest was also in the extent to which some of the daily
mortality can be explained by variation in ozone (O3) levels, which may confound
the association between PM10 and mortality. In the first stage, a log-linear regression is used
(using maximum likelihood) to estimate a pollution relative rate for each city, while
controlling for the city-specific longer term time trends and weather effects. For the
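second stage, for the estimates of the log-relative rates associated with PM10 and O3
for each city, $\hat{\beta}^c = (\hat{\beta}^c_{\mathrm{PM}_{10}}, \hat{\beta}^c_{\mathrm{O}_3})$, the following hierarchical model was considered,
$$\hat{\beta}^c \sim N_2(\beta^c, V^c),$$
$$\beta^c_{\mathrm{PM}_{10}} = z^{c\prime}_{\mathrm{PM}_{10}} \alpha_{\mathrm{PM}_{10}} + \varepsilon^c_{\mathrm{PM}_{10}},$$
$$\beta^c_{\mathrm{O}_3} = z^{c\prime}_{\mathrm{O}_3} \alpha_{\mathrm{O}_3} + \varepsilon^c_{\mathrm{O}_3},$$
$$\varepsilon^c \sim N(0, \Sigma)$$
where $z^c_{\mathrm{PM}_{10}}$ and $z^c_{\mathrm{O}_3}$ are vectors of city-specific covariates, $\alpha_{\mathrm{PM}_{10}}$ and $\alpha_{\mathrm{O}_3}$ are the overall
estimates of the log-relative rates, and $\varepsilon^c = (\varepsilon^c_{\mathrm{PM}_{10}}, \varepsilon^c_{\mathrm{O}_3})$. Maximum likelihood
estimates from the first stage are used for $\hat{\beta}^c$ and $V^c$. Priors for α and Σ are specified
to be weakly uninformative.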
For the second stage analysis, the assumption that the relative rates of mortality for
PM10 ($\beta^c_{\mathrm{PM}_{10}}$), adjusted by levels of O3, are independent across cities was
compared to the possibility of there being geographic correlation. For the spatial
analysis, cities were clustered into three regions (North-East, South-East, and West-
Coast). The authors found the results under all of these models to be similar, with
the spatial analyses slightly attenuating the effects.
In the second stage, Gibbs sampling was used to estimate parameters. Given
the large size of the database and the main interest being in a combined estimate of
the association between PM10 and mortality, a single combined model using MCMC
was perceived to be too computationally demanding in light of any improvement in
estimation to be made.
A similar hierarchical model to the above was also used in Dominici et al. (2002)
and Dominici et al. (2004). In Dominici et al. (2002), the above analysis was extended
to 88 of the largest US cities, with interest mainly focussed on the effects of
PM10. In Dominici et al. (2004), a hierarchical bivariate model was used to characterise
the relationship between PM10 and both mortality and hospital admissions
for cardiovascular diseases for 10 metropolitan areas in the US.
Tweedie et al. (1996) and Wolpert and Mengersen (2004) examine the combined
evidence from 29 studies of the association between environmental tobacco smoke
exposure and lung cancer in adults who have never smoked.
Wolpert and Mengersen (2004) applied an adjusted likelihood approach to syn-
thesise the disparate information from the 29 studies. In their analysis, the assump-
tion of exchangeability between the studies was considered to be untenable. Variabil-
ity between the studies centered around three main quality issues: misclassification
of ever-smokers as never-smokers; misclassification of disease; and misclassification
of exposure.
In their approach, the investigator begins by specifying in detail the target conditions
of interest, for example the subject population, treatment or
exposure details, etc. Whilst each individual study offers direct evidence about the
parameters that govern that particular study, the idea is then to construct an ad-
justed likelihood function that describes the indirect evidence offered by each study
about the questions of interest to the investigator under the specified target con-
ditions. Studies conducted under conditions quite similar to the target conditions
lend stronger evidence.
If the relationship between the indirect and direct evidence about θ from the
studies is known (with parameter αi), then
$$L_i^{Adj}(\theta) = L_i(\phi_i(\theta, \alpha_i)) \tag{2.1}$$
where the function φ is used to adjust the parameter θ towards its value under ideal or target
conditions (θ0). Information about both the functional form for φ and the parameter αi
may be gained from expert opinion or from evidence in the literature.
2.3 Mixture models
There is a large literature on mixture models, with applications in disease map-
ping (Green and Richardson, 2002), earthquake analysis (Walshaw, 2000), finance
(Watanabe, 2000) and industrial quality control (Kvam and Miller, 2002) to name
only a few. Seminal monographs include Titterington et al. (1985) and McLachlan
and Peel (2000a). Diebolt and Robert (1994) and Marin et al. (2005) provide an
overview from the Bayesian perspective.
Given data y which are independent and identically distributed (i.i.d.), the density
of the data given by a finite mixture model can be represented by
$$p(y \mid \theta) = \sum_{j=1}^{k} \lambda_j f(y \mid \theta_j) \tag{2.2}$$
where k is the number of components in the mixture, λj represents the probability of
membership of the jth component ($\sum_{j=1}^{k} \lambda_j = 1$), and f(y|θj) is the density function
of component j, which has parameters θj. We can also represent the density in terms
of a continuous mixture but we do not discuss this representation here.
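As a small illustration of the finite mixture density, the following Python sketch evaluates it for normal component densities. The choice of normal components and the function names `normal_pdf` and `mixture_density` are assumptions made for the example; the finite mixture density itself allows any component density f.

```python
import numpy as np

def normal_pdf(y, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    return np.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def mixture_density(y, weights, means, sds):
    """Evaluate sum_j lambda_j f(y | theta_j) with normal components."""
    y = np.atleast_1d(np.asarray(y, dtype=float))
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    sds = np.asarray(sds, dtype=float)
    if not np.isclose(weights.sum(), 1.0):
        raise ValueError("mixture weights must sum to one")
    # rows index observations, columns index components
    comp = normal_pdf(y[:, None], means[None, :], sds[None, :])
    return comp @ weights
```

For example, `mixture_density([0.0, 1.5], [0.3, 0.7], [0.0, 3.0], [1.0, 0.5])` returns the density of a two-component mixture at the two points supplied.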
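From Equation (2.2), the posterior distribution of the unknown parameters is
given by
$$\begin{aligned}
p(\theta, \lambda \mid y) &\propto p(y \mid \theta, \lambda)\, p(\theta, \lambda) \\
&\propto \prod_{i=1}^{N} \left[ \sum_{j=1}^{k} \lambda_j f(y_i \mid \theta_j) \right] p(\theta, \lambda)
\end{aligned} \tag{2.3}$$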
For even relatively moderate sample sizes, analytical methods to evaluate the sum
of $k^N$ terms from Equation (2.3) can become too computationally intensive to contemplate
(Robert and Casella, 2004). As component membership of the data (y) is
unknown, a computationally convenient method of estimation for mixture models
is to use a hidden allocation process and introduce a latent indicator variable zij,
which is used along the lines of a missing variable approach to allocate observations
yi to each component.
Markov Chain Monte Carlo (MCMC) methods represent the most common ap-
proach to estimation of finite mixture models in the Bayesian literature where the
choice of sampler varies widely. The most common sampler used is the Gibbs Sam-
pling algorithm (Diebolt and Robert, 1994), which uses the full conditionals for each
model parameter to simulate from the joint posterior distribution. This is partic-
ularly useful for mixtures as the joint posterior distribution of the parameters is
difficult to simulate from while the full conditionals are often available.
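The latent allocation idea and the resulting Gibbs sampler can be sketched as follows for a mixture of normal components. To keep the example short we assume a known common standard deviation `sigma`, a normal prior N(m0, s0²) on each component mean and a symmetric Dirichlet prior on the weights; these choices, and the function name `gibbs_mixture`, are ours and are not taken from Diebolt and Robert (1994).

```python
import numpy as np

def gibbs_mixture(y, k=2, sigma=1.0, n_iter=2000, burn=500,
                  m0=0.0, s0=10.0, alpha=1.0, seed=0):
    """Gibbs sampler for a k-component normal mixture with known common sd."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    n = len(y)
    mu = np.quantile(y, (np.arange(k) + 0.5) / k)    # spread-out starting means
    lam = np.full(k, 1.0 / k)
    mu_draws = np.empty((n_iter - burn, k))
    lam_draws = np.empty((n_iter - burn, k))
    for it in range(n_iter):
        # 1. allocations z_i | rest, with P(z_i = j) proportional to lam_j f(y_i | mu_j)
        logp = np.log(lam) - 0.5 * ((y[:, None] - mu[None, :]) / sigma) ** 2
        p = np.exp(logp - logp.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        z = (p.cumsum(axis=1) < rng.random(n)[:, None]).sum(axis=1)
        z = np.minimum(z, k - 1)                     # guard against rounding in cumsum
        # 2. weights | z: Dirichlet with the allocation counts added to the prior
        counts = np.bincount(z, minlength=k)
        lam = rng.dirichlet(alpha + counts)
        # 3. means | z: conjugate normal update within each component
        sums = np.bincount(z, weights=y, minlength=k)
        prec = counts / sigma ** 2 + 1.0 / s0 ** 2
        mu = rng.normal((sums / sigma ** 2 + m0 / s0 ** 2) / prec,
                        np.sqrt(1.0 / prec))
        if it >= burn:
            mu_draws[it - burn] = mu
            lam_draws[it - burn] = lam
    return mu_draws, lam_draws
```

Conditional on the allocations, the model factorises into k separate normal problems plus a multinomial problem for the weights, which is why each full conditional is available in closed form.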
Two alternative approaches to dealing with an unknown number of components
are direct estimation in the sampler and model comparison. The first can
involve a Markov chain moving in spaces of different dimensions, e.g. the Reversible
Jump MCMC of Green (1995) and Richardson and Green (1997), while alternative
samplers that move between models are proposed by Stephens (2000a) and Phillips
and Smith (1996). The second, model comparison, involves fitting the mixture model
with different values for k and then using a model choice criterion to choose between
the competing models. For mixture models and missing data problems, various
model choice criteria have been proposed but they are not without their problems
(Celeux et al., 2003). Commonly used criteria are the Bayesian Information Criterion
(BIC) (Kass and Raftery, 1995), the Deviance Information Criterion (DIC) (Celeux
et al., 2003) and Bayes factors (as in Fruhwirth-Schnatter and Kaufmann (2004),
Ishwaran et al. (2001) and Raftery (1996)).
As discussed in a number of papers (see, for example, Marin et al., 2005 and
Casella et al., 2004), several difficulties can arise when constructing a sampler
for a mixture model. The main difficulties include label switching, exploration of
the parameter space, and computational expense. Label switching can occur due to
the invariance of the likelihood to k! permutations of the labels, which during an
MCMC run can cause the allocation vector to switch between components. As a
result the posterior distribution can have k! modes. To overcome this problem, a
common solution proposed by Diebolt and Robert (1994) is to impose identifiability
constraints on the parameters (e.g. µ1, . . . , µk), but as discussed by Celeux et al.
(2000) and Stephens (2000b) these constraints can lead to truncation of the posterior
distribution. As alternatives to imposing an identifiability constraint, Celeux et al.
(2000) propose a loss minimisation approach, while Stephens (2000b) proposes to
use clustering techniques. Casella et al. (2004) suggest a method based on an ap-
propriate partition of the space of augmented variables. Fruhwirth-Schnatter (2001)
proposes a random permutation scheme, and Geweke (2007) proposes a permutation-
augmented simulator, a deterministic modification of the usual MCMC sampler. A
comprehensive discussion of this issue is found in Jasra et al. (2005).
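For illustration, the simplest of these remedies, an ordering constraint on the component means, can be applied to stored MCMC output as in the sketch below. The function name `relabel_by_ordered_means` is hypothetical, and the sketch covers only the identifiability constraint approach; it does not implement the loss-based, clustering or permutation schemes cited above.

```python
import numpy as np

def relabel_by_ordered_means(mu_draws, *other_draws):
    """Relabel each MCMC draw so that mu_1 < ... < mu_k.

    mu_draws    : array of shape (n_draws, k) of component means
    other_draws : further (n_draws, k) arrays (e.g. weights) to be permuted
                  with the same labels
    Returns a tuple of relabelled arrays, means first.
    """
    mu_draws = np.asarray(mu_draws, dtype=float)
    order = np.argsort(mu_draws, axis=1)             # permutation for each draw
    out = [np.take_along_axis(mu_draws, order, axis=1)]
    for d in other_draws:
        out.append(np.take_along_axis(np.asarray(d), order, axis=1))
    return tuple(out)
```

When components overlap substantially, an ordering of this kind can separate draws that belong to the same mode, which is one way of seeing the distortion of the posterior distribution noted by Celeux et al. (2000) and Stephens (2000b).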
Another difficulty in constructing a sampler for a mixture model is to ensure
a full exploration of the parameter space. This is an issue in general for a sam-
pler in most settings but can be exacerbated in the case of a mixture due to the
expected multimodality of the posterior distribution. A common criticism of the
Gibbs sampler is that it may not always visit the k! symmetric modes of the posterior
distribution easily. Alternatives to the standard Gibbs sampler include tempering
MCMC, as suggested by Celeux et al. (2000), and the addition of a Metropolis-Hastings
step, as suggested in Cappe et al. (2002).
Alternative representations of the standard mixture model include Hidden Markov
Models (HMMs) and Dirichlet Process mixture models (DPMs). HMMs represent
a generalisation of the mixture model in situations where the observations are not
independent or a latent Markov model is assumed to underlie the data observed.
An HMM consists of two processes: a hidden process, or a sequence of states that
evolves in a Markov manner, and an observed process that is dependent on this hidden
process. The HMM assumes a Markov dependence in time between the latent
variables of Equation (2.2),
$$p(z_t \mid z_{\setminus t}) = p(z_t \mid z_{t-1}, z_{t+1}) \tag{2.4}$$
For a number of states, the process is governed by a transition matrix which spec-
ifies the probability of moving from one state to another. We can also model the
dependency on the observations (autoregressive HMM),
$$p(y_t \mid \ldots) = p(y_t \mid y_{t-1}, y_{t+1}, z_t, \theta_y) \tag{2.5}$$
Two main methods to sample from the states include Gibbs sampling, and a recursive
scheme using a forwards/backwards filter (Scott, 2002). Inference is enhanced by
identification of the underlying states and also by obtaining probability estimates of
moving from one state to another. For these reasons, the approach has found many
applications in a wide range of areas (Hamilton, 1994; Scott et al., 2004).
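As a sketch of the forward part of such a recursive scheme, the following Python function computes the filtered state probabilities and the log-likelihood of an HMM. Normal emission densities, the argument names and the function name `forward_filter` are assumptions made for the example and are not taken from Scott (2002).

```python
import numpy as np

def forward_filter(y, trans, init, means, sds):
    """Forward recursion for an HMM with normal emissions.

    trans[i, j] : probability of moving from state i to state j
    init[j]     : probability that the first state is j
    Returns (filtered probabilities p(z_t | y_1..y_t), log-likelihood).
    """
    y = np.asarray(y, dtype=float)
    trans = np.asarray(trans, dtype=float)
    init = np.asarray(init, dtype=float)
    means = np.asarray(means, dtype=float)
    sds = np.asarray(sds, dtype=float)
    # emission density of each observation under each state
    dens = (np.exp(-0.5 * ((y[:, None] - means[None, :]) / sds[None, :]) ** 2)
            / (sds[None, :] * np.sqrt(2.0 * np.pi)))
    filt = np.empty((len(y), len(init)))
    loglik = 0.0
    pred = init                                      # p(z_1)
    for t in range(len(y)):
        joint = pred * dens[t]                       # p(z_t, y_t | y_1..y_{t-1})
        c = joint.sum()                              # p(y_t | y_1..y_{t-1})
        filt[t] = joint / c
        loglik += np.log(c)
        pred = filt[t] @ trans                       # one-step-ahead state probabilities
    return filt, loglik
```

If every row of the transition matrix equals the initial distribution, the states are independent over time and the log-likelihood reduces to that of the standard mixture model, which gives a convenient check of the recursion.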
Dirichlet Process Mixture models (DPMs) provide a different approach to the
interpretation and estimation of the mixtures discussed so far. Here interest is in a
non-parametric approach, in which a mixture model is used to provide a type of
basis function for a density, and for which k may be very large. In the standard
mixture model approach, a commonly used prior for the allocation of the weight
λ (λ = (λ1, . . . , λk)) is to assume a Dirichlet distribution (representing a sort of
stick breaking allocation of λ into k bits). In the limit as k → ∞ the Dirichlet
distribution becomes a Dirichlet Process. The flexibility of the approach has seen
a rapidly expanding literature and a number of applications in recent years (Griffin
and Steel, 2004; Do et al., 2005).
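The stick breaking idea can be illustrated with the standard truncated construction of Dirichlet Process weights, in which successive fractions of the remaining stick are drawn from a Beta(1, α) distribution. The concentration parameter `alpha`, the truncation level `k` and the function name `stick_breaking_weights` are assumptions introduced for this sketch.

```python
import numpy as np

def stick_breaking_weights(alpha, k, rng):
    """Draw k mixture weights by truncated stick breaking.

    v_j ~ Beta(1, alpha) is the fraction of the remaining stick given to
    component j; the last component takes whatever is left so that the
    weights sum to one.
    """
    v = rng.beta(1.0, alpha, size=k)
    v[-1] = 1.0                                      # truncation at k components
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - v[:-1])])
    return v * remaining
```

Small values of `alpha` concentrate most of the weight on the first few components, while larger values spread it over many, so that k can be taken large without every component being used.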
2.3.1 Relevant applications
In this section, we discuss some of the applications of mixture model approaches
related to characterising environmental exposures and outcomes. We also discuss
mixture model approaches which have been used in other applications but which
provide a background to estimation of mixtures over multiple time points developed
in Chapters 6 to 8 of this thesis.
In a disease modelling context, there have been a number of applications of mixture
model approaches (e.g. Knorr-Held and Rasser, 1999; Denison and Holmes,
2001; Green and Richardson, 2002; Fernandez and Green, 2002; Gangnon and Clayton,
2000). In this setting, interest is largely focussed on partitioning or clustering
relative risk estimates observed at particular spatial sites or area units into a number
of groups, either for non-parametric estimation of the underlying risk surface
(Knorr-Held and Rasser, 1999; Denison and Holmes, 2001; Green and Richardson,
2002; Fernandez and Green, 2002) or where the location and composition of the
clusters is of primary interest (Gangnon and Clayton, 2000). Applications considered
in these papers include: the risk of leukemia (Gangnon and Clayton, 2000; Denison
and Holmes, 2001) and larynx cancer (Green and Richardson, 2002; Fernandez and
Green, 2002). For the purposes of this review, we outline the approach considered
by Green and Richardson (2002) which provides a good example and a basis for the
other approaches considered.
Green and Richardson (2002) explore a finite mixture model to allocate or par-
tition relative risk estimates for a particular disease (say θ) identified by area units
(i). A common approach in a spatial setting is to consider that the estimates for the
spatial units (θi) are related by a continuous Markov random field (Besag, 1974).
Here we take,
$$Y_i \mid Y_j = \theta_j,\ j \neq i \sim N\left(\sum_{j=1}^{n} W_{ij}\theta_j,\ \sigma_\theta^2 D_{ii}\right) \tag{2.6}$$
where W is a specified matrix of weights ($W_{ii} = 0$, $W_{ij} = -Q_{ij}/Q_{ii}$ and $D_{ii} = Q_{ii}^{-1}$),
$\sigma_\theta^2$ is the overall variance of $Y_i$, and Q is a precision matrix. A set of spatial weights
can be specified for (2.6) which define a set of ‘neighbours’. To define ‘neighbours’,
a number of authors have taken areas i and j to be neighbours if they share a
common boundary. The set of conditional distributions given by (2.6) defines a
Markov random field (MRF) model (discussed by Besag (1974)).
An alternative approach adopted by Green and Richardson (2002) is to consider
that this continuous random field of θi consists of allocations or partitions, where
now $\theta_i = \theta_{z_i}$ and $z_i$ is an unobserved allocation variable taking values in (1, . . . , k),
where k is the number of components. In this approach the spatial variability of
$z_i$ is assumed to follow a Potts model, with the number of states (k) and the strength
of interaction (ψ) unknown and to be estimated. In the Potts model,
$$p(z \mid \psi, k) = e^{\psi U(z) - \delta_k(\psi)}.$$
In contrast to standard mixture models, this formulation does not make
use of explicit weights on components. The degree of spatial dependency is controlled
by favouring probabilistically those allocation patterns where like-labelled locations
are neighbours ($U(z) = \sum_{i \sim i'} I[z_i = z_{i'}]$). Interest is in estimating the number of
components k which defines the number of levels of the relative risk surface. The
approach is estimated using Reversible Jump MCMC (RJMCMC) (which allows for
an increase or decrease in the number of components at any particular time in the
estimation process).
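To illustrate how the Potts model rewards like-labelled neighbours, the sketch below computes U(z) from a list of neighbouring pairs and returns the unnormalised log probability ψU(z). The normalising term δk(ψ) is deliberately omitted, and the function names and the edge-list representation of the neighbourhood structure are assumptions made for the example.

```python
import numpy as np

def potts_u(z, edges):
    """Number of neighbouring pairs (i, i') that share the same label."""
    z = np.asarray(z)
    edges = np.asarray(edges)
    return int(np.sum(z[edges[:, 0]] == z[edges[:, 1]]))

def potts_log_kernel(z, edges, psi):
    """Unnormalised log probability psi * U(z); delta_k(psi) is not computed."""
    return psi * potts_u(z, edges)
```

For four areas arranged in a line, with `edges = [(0, 1), (1, 2), (2, 3)]`, the allocation `[0, 0, 1, 1]` has two like-labelled neighbouring pairs whereas `[0, 1, 0, 1]` has none, so any positive ψ favours the former.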
Fernandez and Green (2002) extend the work of Green and Richardson (2002)
and explore a Markov random field similar to a CAR model for zi, but allow the
weights in the mixture to vary from one location to another.
In an image analysis context, Alston et al. (2005) use a Bayesian Gaussian
mixture model in conjunction with a hidden Markov random field to estimate
the proportion of tissue types present in an individual CAT scan image of sheep.
While CAT scan images provide a measure of the denseness of tissue, interest is in
estimating the proportion of tissue type (fat and muscle) present in the scan, the
denseness of tissues of interest such as fat and muscle, and a characterisation of the
distribution of these tissues.
In their model, the grey scale data y = (y1, . . . , yn) from CAT scan images are
represented with an approximate density of
$$g(y) \approx \hat{g}(y) = \sum_{j=1}^{k} w_j \phi_j(y \mid \mu_j, \sigma_j) \tag{2.7}$$
where n is the number of pixels in the CAT scan. The latent variable indicator zi
(here representing an allocation variable for yi, again taking values in (1, . . . , k)) is
assumed to be drawn from a hidden Markov random field (MRF) involving the first
order neighbouring pixels (pixels which share an edge are deemed to be neighbours)
in the CAT scan. The joint distribution (otherwise known as a
Potts model) is represented by
$$p(z \mid \beta) = C(\beta)^{-1} \exp\left(\beta \sum z_i z_{\delta_i}\right) \tag{2.8}$$
where C(β) is a normalising constant, δi indicates a neighbour of pixel i and $z_i z_{\delta_i} = 1$
if $z_i = z_{\delta_i}$, otherwise $z_i z_{\delta_i} = 0$. The parameter β estimates the level of spatial
homogeneity in component membership between neighbouring pixels in the image.
Previous scans are represented as a perturbation on the current Gaussian component
estimates in both time and space,
$$\mu_j = \rho_t \mu_{jt} + \rho_s \mu_{js} + \varepsilon_j \tag{2.9}$$
$$\sigma_j^2 = \theta_j(\rho_t \sigma_{jt}^2 + \rho_s \sigma_{js}^2) \tag{2.10}$$
where s and t represent previous space and time estimates respectively, and the perturbation
parameters are θj, εj and (ρt, ρs), which are assumed to follow a random walk.
Estimation of the number of components uses birth and death processes, and the BIC
criterion.
A natural extension of the approaches developed above in a spatial setting is to
consider these approaches for a time series setting. In either setting, we can use
prior information about the dependency that exists between neighbouring observa-
tions (in space or time) to improve estimation. However, the influence or choice of
similar informative priors in a time series framework and the implications in different
data environments has largely not been examined. We discuss these influences and
implications in Chapters 6 to 8.
Lee and Berger (2003) propose a mixture model to analyse ozone measurements
at different altitudes. In a two stage approach they model the spatial component
(altitude) as a four component mixture of normal distributions, and a state-space
representation to describe the parameters over time. Figure 2.1 shows scatterplots
of altitude and ozone measurements with fitted mixture curves for selected time
points (from Lee and Berger (2003)).
In the first stage of the analysis, for each time point, the mixture model consid-
ered is
$$y = \sum_{j=1}^{4} w_j f(h \mid \mu_j, \tau_j) + \varepsilon_h \tag{2.11}$$
where: the εh are independent mean zero Gaussian errors; y are ozone measurements;
h is altitude; and the weight wj can be interpreted as the amount of ozone contained
in the jth component of the mixture, and the sum of the weights, $\sum_{j=1}^{4} w_j$, is the
total column ozone amount. Parameters are estimated using MCMC.
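The interpretation of the weights can be checked numerically with a short sketch of the mean curve in Equation (2.11). We assume here that f is a normal density and that τj is its standard deviation; the function name `ozone_profile` and the parameter values in the usage example are hypothetical and are not estimates from Lee and Berger (2003).

```python
import numpy as np

def ozone_profile(h, w, mu, tau):
    """Mean ozone curve sum_j w_j f(h | mu_j, tau_j) with normal f.

    h   : altitudes at which to evaluate the curve
    w   : component weights (amount of ozone in each component)
    mu  : component locations (altitude)
    tau : component spreads, treated here as standard deviations
    """
    h = np.atleast_1d(np.asarray(h, dtype=float))[:, None]
    w = np.asarray(w, dtype=float)
    mu = np.asarray(mu, dtype=float)
    tau = np.asarray(tau, dtype=float)
    dens = np.exp(-0.5 * ((h - mu) / tau) ** 2) / (tau * np.sqrt(2.0 * np.pi))
    return dens @ w
```

Because each component density integrates to one, integrating the curve over altitude returns the sum of the weights. For instance, with hypothetical weights `[20, 60, 150, 70]`, a Riemann sum of `ozone_profile` over a fine altitude grid is close to 300.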
Due to the size and complexity of the dataset (and future goals of the analysis)
a single stage analysis in which parameters are time dependent was considered to be
infeasible. In the second stage of the analysis, the posterior modes of the parameters
from the first stage are explicitly modelled over time using Bayesian state-space
modelling (West and Harrison, 1997).
Figure 2.1: Representative scatter plots of altitude (km) versus ozone partial pressure (micro-millibar) with fitted mixture regression curves. (a) 7 February 1990; (b) 9 February 1990; (c) 12 February 1990; (d) 14 February 1990.
2.4 Conclusion
In this chapter, we have provided an overview of the main approaches used in this
thesis. Further discussion of the relevant approaches in the literature is contained
in each chapter. The meta-analysis approaches discussed form a basis for the approaches
developed in Chapters 3 and 4. Similarly, the mixture model approaches
discussed form a basis for the approaches developed in Chapters 4 to 8.
Chapter 3
Assessing the combined effect of asbestos exposure
and smoking on lung cancer: A Bayesian approach
3.1 Summary
In this Chapter, we review the literature on the combined association between lung
cancer and two environmental exposures, asbestos exposure and smoking, and ex-
plore a Bayesian approach to assess evidence of interaction between the exposures.
The meta-analysis combines separate indices of additive and multiplicative relation-
ships and multivariate relative risk estimates. By making inferences on posterior
probabilities we can explore both the form and strength of interaction. This anal-
ysis may be more informative than providing evidence to support one relation over
another on the basis of statistical significance. Overall, we find evidence for a more
than additive and less than multiplicative relation.
3.2 Introduction
There is well documented evidence indicating that both long term exposure to as-
bestos and active smoking are independent risk factors for lung cancer. The statisti-
cal form of their combined effect is less clear. The question of interest here is whether
the risk from exposure to both asbestos and smoking is an additive, multiplicative
or other relation of the risk from exposure to each factor alone.
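To make the two candidate relations concrete, the sketch below uses the usual epidemiological definitions, stated here as assumptions rather than as the notation of this chapter. With relative risks measured against those exposed to neither factor, an additive relation implies a joint relative risk of RR_A + RR_S - 1 and a multiplicative relation implies RR_A × RR_S. The indices S and V, the function names and the numerical values are hypothetical illustrations.

```python
def expected_joint_rr(rr_a, rr_s):
    """Joint relative risk implied by each relation.

    rr_a : relative risk for asbestos exposure alone
    rr_s : relative risk for smoking alone
    """
    return {"additive": rr_a + rr_s - 1.0,
            "multiplicative": rr_a * rr_s}

def interaction_indices(rr_a, rr_s, rr_as):
    """Indices comparing the observed joint relative risk with each relation.

    s = 1 corresponds to an exactly additive relation,
    v = 1 to an exactly multiplicative relation.
    """
    s = (rr_as - 1.0) / ((rr_a - 1.0) + (rr_s - 1.0))
    v = rr_as / (rr_a * rr_s)
    return s, v
```

With hypothetical values RR_A = 2 and RR_S = 10, the additive and multiplicative relations imply joint relative risks of 11 and 20 respectively. An observed joint relative risk of 15 would then give S = 1.4 and V = 0.75, that is, a relation that is more than additive and less than multiplicative.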
Methodologies for assessing the relationship between risk factors have been the
subject of much research (Saracci and Boffetta, 1994; Liddell, 2001; Lee, 2001).
As well as interpreting interaction in the context of classical relative risk models
(Roy and Esteve, 1998) recent studies have explored the viability of non-parametric
models (van der Linde and Osius, 2001).
Evidence for a multiplicative association between exposure to asbestos and active
smoking and the outcome of lung cancer was indicated by an early study of US
workers (Selikoff et al., 1968). Subsequent studies and reviews of the literature with
an objective to assess the form of the relationship further have indicated mixed
results, ranging from mixed evidence for either an additive or multiplicative relation
to strong evidence for a supramultiplicative relation (Saracci and Boffetta, 1994;
Erren et al., 1999; Vainio and Boffetta, 1994; Steenland and Thun, 1986).
The importance of understanding the combined effect of asbestos exposure and
smoking can be placed in both a public health and legal context. From a public
health perspective, evidence for a multiplicative relation between asbestos exposure
and smoking has led to recommendations for asbestos-exposed smokers to quit
smoking, since cases of lung cancer induced by both exposures would be prevented,
along with those induced by smoking alone (Waage et al., 1997). In a legal context, a
greater understanding of the combined effect has been required in the attribution of
damages in cases where there is a history of exposure to both asbestos and smoking
(Guidotti, 2002).
The objective of this investigation is to review the evidence for the combined
effect of smoking and asbestos, the relationship of which is frequently debated in
epidemiology, and to propose a Bayesian approach for combining this information.
The strength of the Bayesian approach, in this context, is twofold. First, through
the hierarchical structure of likelihoods and priors, informed opinion about variance
structures and relationships between studies and outcomes can be integrated with
the observed data. The second is the ability to make useful probability statements
on the basis of all information, rather than simple significance statements based on
specific hypothesis tests.
3.3 Assessing interaction between asbestos and
smoking
While a conceptual basis for assessing interaction between two risk factors is well
known (Greenland and Rothman, 1998), in general, tests for interaction and the
interpretation of results are less well understood (UNSCEAR, 1982). Incorrect ap-
proaches to assess interaction appear frequently in the literature (Hallqvist et al.,
1996). Further, as many studies are underpowered to assess interaction, assess-
ments of strength of interaction, rather than statistical significance may be impor-
tant (Saracci and Boffetta, 1994).
A conceptual basis for understanding interaction between risk factors is found in
Rothman’s (Rothman, 1976) component sufficient-cause paradigm of disease causa-
tion. Under this paradigm, synergistic or positive interaction occurs if two exposures
are component causes in the same sufficient cause. In the context of this study, a
case for synergistic interaction can be made if some persons develop lung cancer only
under exposure to both asbestos and smoking.
A synergistic interaction effect, in the biological sense, is tested by departure
from additivity of absolute effects. That is, the relative excess risk among those with
combined exposure should exceed the sum of the relative excess risks for each of the
component causes, referenced to those not exposed to both causes. This description
is analogous to the basis for the Synergy Index (S) introduced by Rothman (1974)
and outlined in the next section.
Hallqvist et al. (1996) describe some of the problems with approaches which have
been used in the literature. For example, an inappropriate approach is to compare a
higher cumulative incidence of a joint exposure to that observed for either risk factor
separately and infer that one risk factor is exacerbating the effect of the other, since
the relationship of each risk factor to a joint exposure may be less than additive.
Another common approach to assessing interaction is to include a product term in a
logistic or log-linear regression. As both types of regressions assume a multiplicative
form, including an interaction term assesses departure from a simple multiplicative
model but provides no information in support of an additive relation.
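The last point can be illustrated with a minimal sketch. The relative risks below are hypothetical values chosen only for illustration (they are not taken from any study in this chapter), and the function names are ours. The sketch builds a joint relative risk that is exactly additive and evaluates the product term that a log-linear model would estimate for it.

```python
import math


def additive_joint_rr(rr_a: float, rr_s: float) -> float:
    """Joint relative risk when the excess relative risks add exactly."""
    return rr_a + rr_s - 1.0


def log_product_term(rr_a: float, rr_s: float, rr_as: float) -> float:
    """Product (interaction) term on the log scale:
    log(RR_AS) - log(RR_A) - log(RR_S). It is zero under exact multiplicativity."""
    return math.log(rr_as) - math.log(rr_a) - math.log(rr_s)


# Hypothetical relative risks, for illustration only.
rr_a, rr_s = 3.0, 5.0
rr_as = additive_joint_rr(rr_a, rr_s)       # exactly additive joint effect
beta = log_product_term(rr_a, rr_s, rr_as)  # negative: less than multiplicative

print(f"additive RR_AS = {rr_as:.1f}, log-scale product term = {beta:.3f}")
```

A non-zero product term here only signals departure from multiplicativity. It does not, on its own, identify the relation as additive, which is why a separate index is used for additivity.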
3.3.1 Synergy Index (S)
A common test for an additive relation used in the epidemiological literature is the
Synergy Index (S). The theoretical basis for S has been well described (Rothman,
1974, 1976). Here we present the methodology as outlined by Rothman (1976).
Suppose that there are two independently acting causal agents, in this case say
A for asbestos and S for smoking, and underlying (background) causes denoted
collectively as C, also independent of A and S. Then
\[ P_T = P_A + P_S + P_C - P_A P_S - P_A P_C - P_S P_C + P_A P_S P_C \tag{3.1} \]
where $P$ denotes the probability that a disease develops alone (with appropriate
subscripts), and the subscript $T$ denotes the total probability (where A, S and C
are present). The combined or joint effect of A and S on the probability of disease
(risk) is given by $P_T - P_C$, i.e. $P_A + P_S - P_A P_S - P_A P_C - P_S P_C + P_A P_S P_C$ (under
independence).
Using risk notation, define $R_{AS} = P_T$, $R_{00} = P_C$, $R_A = P_A + P_C - P_A P_C$ and
$R_S = P_S + P_C - P_S P_C$. Assuming that $P_A$ and $P_S$ are small (the implications of
this are discussed by Wildner and Markuzzi (1997)), this can be simplified to
\[ P_T - P_C \cong P_A + P_S \tag{3.2} \]
which in risk notation becomes
\[ R_{AS} - R_{00} \cong R_A + R_S - 2R_{00} \tag{3.3} \]
Equation (3.3) can be expressed in relative risk terms (by dividing each term by
$R_{00}$), which can then be defined as a Synergy Index (S),
\[ S = \frac{RR_{AS} - 1}{RR_S + RR_A - 2} = \frac{ERR_{AS}}{ERR_A + ERR_S} \tag{3.4} \]
where ERR is the excess relative risk. Thus, positive interaction or synergy is
observed if the relative risk attributable to combined exposure exceeds the sum of the
risks attributable to each exposure separately. Alternatively, S can be interpreted as
the excess risk from exposure (to both exposures) when there is interaction relative
to the excess risk from exposure (to both exposures) without interaction. Under the
additive hypothesis S = 1, whereas for a more than additive model S > 1 and a
less than additive model is reflected by S < 1. On the basis of S, estimates can be
obtained of the attributable proportion of risk due to interaction, API = (S − 1)/S
(Walker, 1981). The API expresses the proportion of lung cancer risk for those
exposed to both factors (including background risk) that can be attributed to the
combined (as distinct from the separate) effects of the two factors. The calculation
for the standard error of S is described in the Appendix.
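As a worked illustration of equation (3.4), the following sketch computes S from the three relative risks reported for Study 1 in Table 3.2. The function names are ours. The form API = (S − 1)/S used in the second function is an assumption: it is adopted here because it reproduces the reported API of 0.41 (0.08, 0.63) from the overall estimate S = 1.70 (1.09, 2.67) in Table 3.3.

```python
def synergy_index(rr_s: float, rr_a: float, rr_as: float) -> float:
    """Synergy Index, equation (3.4): S = (RR_AS - 1) / (RR_S + RR_A - 2)."""
    denom = rr_s + rr_a - 2.0
    if denom <= 0:
        # With no combined excess risk from the single exposures,
        # S has no useful interpretation.
        raise ValueError("S is not interpretable unless RR_S + RR_A > 2")
    return (rr_as - 1.0) / denom


def attributable_proportion(s: float) -> float:
    """Attributable proportion due to interaction.
    Assumed form (S - 1) / S, chosen because it matches the values in Table 3.3."""
    return (s - 1.0) / s


# Study 1 (deKlerk), relative risks from Table 3.2.
s_study1 = synergy_index(rr_s=3.44, rr_a=2.24, rr_as=9.57)
print(round(s_study1, 2))  # 2.33, the observed S for Study 1 in Table 3.3

# Overall posterior estimate of S and its interval limits from Table 3.3.
api = [round(attributable_proportion(s), 2) for s in (1.70, 1.09, 2.67)]
print(api)  # [0.41, 0.08, 0.63]
```

Under exact additivity the function returns S = 1 and the assumed API is 0, consistent with the interpretation given above.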
3.3.2 Multiplicativity Index (V)
A common test for a multiplicative relation is to include an interaction term in
a logistic or log-linear model (Gustavsson et al., 2002). Alternatively, in a recent
review of the literature, Lee (2001) defines and uses a ‘Test of Multiplicativity’. Since
this is not strictly a test we use this here in the equivalent sense of a Multiplicativity
Index.
Following Lee (2001), for a multiplicative relation to hold, the product of risks
for R00 and RAS should equal that for RA and RS:
\[ R_{AS} R_{00} = R_A R_S \tag{3.5} \]
or in relative risk terms (by dividing by $R_{00}$)
\[ RR_{AS} = RR_S \, RR_A \tag{3.6} \]
The Multiplicativity Index (V) is simply then,
\[ V = \frac{RR_{AS}}{RR_S \, RR_A} \tag{3.7} \]
Under the multiplicative hypothesis V = 1, whereas for a more than multiplicative
model (e.g. an exponential relation) V > 1, and for a less than multiplicative
model V < 1. The calculation for the standard error of V is described in the Ap-
pendix.
Note that there is no specific value of S that corresponds to a multiplicative
model. Similarly, there is no specific value of V that corresponds to an additive
model. Therefore, neither index confirms one model and rejects the other, but an
investigation of both indices together provides an assessment of the degree of support
for additive or multiplicative relationships.
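The following sketch illustrates equation (3.7) and the point that no single value of S corresponds to a multiplicative model. The Study 14 relative risks are taken from Table 3.2; the two exactly multiplicative scenarios are hypothetical values chosen only for illustration, and the function names are ours.

```python
def multiplicativity_index(rr_s: float, rr_a: float, rr_as: float) -> float:
    """Multiplicativity Index, equation (3.7): V = RR_AS / (RR_S * RR_A)."""
    return rr_as / (rr_s * rr_a)


def synergy_index(rr_s: float, rr_a: float, rr_as: float) -> float:
    """Synergy Index, equation (3.4): S = (RR_AS - 1) / (RR_S + RR_A - 2)."""
    return (rr_as - 1.0) / (rr_s + rr_a - 2.0)


# Study 14 (Selikoff), relative risks from Table 3.2.
v_study14 = multiplicativity_index(rr_s=8.67, rr_a=25.00, rr_as=40.63)
print(round(v_study14, 2))  # 0.19, the observed V for Study 14 in Table 3.3

# Hypothetical, exactly multiplicative scenarios: V = 1 in both, but S differs.
scenarios = [(2.0, 2.0), (5.0, 3.0)]
s_values = []
for rr_s, rr_a in scenarios:
    rr_as = rr_s * rr_a  # exact multiplicativity
    v = multiplicativity_index(rr_s, rr_a, rr_as)
    s = synergy_index(rr_s, rr_a, rr_as)
    s_values.append(s)
    print(f"RR_S={rr_s}, RR_A={rr_a}: V={v:.2f}, S={s:.2f}")
```

Because S under exact multiplicativity depends on the sizes of the individual relative risks, S alone cannot confirm or reject a multiplicative model.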
3.3.3 The relationship between exposure to asbestos and
smoking
In the earliest reported assessment of the interaction between exposure to asbestos
and smoking on lung cancer, Doll found some evidence for a multiplicative hypoth-
esis, although it was “far from convincing” (Doll, 1971). Subsequent reviews by
Saracci (1977, 1987); Saracci and Boffetta (1994) and Erren et al. (1999) indicated
evidence in support of the multiplicative hypothesis, while evidence from Berry et al.
(1985) was inconclusive. Consistent with the evidence from a number of studies, two
recent reviews of the literature by Lee (2001) and Liddell (2001) arrived at slightly
different conclusions as to the form of the combined effect. Lee (2002) found lit-
tle evidence to reject a multiplicative relation: “The asbestos relative risk may be
somewhat lower in smokers than non-smokers, but the available data do not clearly
reject the simple multiplicative relation. More complex models of joint action might
indeed fit the data better, but in view of the general problems with the data, it
seems doubtful whether more detailed statistical analysis would shed any greater in-
sight.” (p.496). However, Liddell (2002) highlighted differences in the results of the
case-control versus cohort studies, finding evidence against a simple multiplicative
hypothesis. “Therefore, the multiplicative hypothesis is not generally satisfactory.
Nor, of course, is the additive hypothesis, although it does fit some data sets very
well. Evidently, interaction takes several forms.” (p.495).
Both authors agreed that the form of the combined effect is more than a simple
additive relation, but the strength and nature of the more complex association was
not unanimously determined.
3.4 Methods
3.4.1 Studies
In our assessment of the interaction between exposure to asbestos and smoking we
restrict our attention to the set of studies included in two recent reviews of the
literature by Lee (2001) and Liddell (2001), as it is here that the debate over the
relationship between asbestos exposure and smoking crystallised. The inclusion of
these studies also allows a comparison of results from the approaches explored in
this chapter. A full search of the MEDLINE reference database (1966 - May 2004)
was performed to confirm the information on the studies included in the two reviews
and to assess the influence of results from studies published since then. Details and
results of studies published after 1998 are provided in the discussion section. Details
of studies up to and including the reviews by Lee and Liddell (1966 - 1998) are
shown in Table 3.1.
A summary of the relevant results from studies which provided enough informa-
tion to estimate relative risk of lung cancer for each exposure category is given in
Table 3.2. Differences in results between studies can be partly explained by vari-
ability in exposure levels for both asbestos and smoking. For example, some of
the studies included ex-smokers and light smokers in the non-smoking group. The
variability in exposure levels is discussed further in Section 3.6, and addressed in the
sensitivity analysis (Section 3.5.1). Our review found much variability both in the
use of formal statistical methods to assess the combined effect, and the conclusions
reached. The statistical methods to assess interaction ranged from visually comparing
relative risk estimates for exposure groups to more formal significance testing.
There was no consistent conclusion in favour of either an additive or multiplicative
relation.
Table 3.1: Details of Studies Used for Statistical Analysis
Author | Location | Study Type and Population | Period Followed | Study Ref.*
Selikoff & Hammond (1975) | Cohort 1: New York and Newark NJ; Cohort 2: USA and Canada | Cohort. Asbestos insulation workers | Cohort 1: 1963-1973; Cohort 2: 1967-72 | 13
Martischnig et al. (1977) | Gateshead, England | Hospital CC in shipbuilding area | 1972-73 | 2
Blot et al. (1978) | Georgia, USA | Hospital CC in shipbuilding area | 1970-76 | 7
Hammond et al. (1979) | USA and Canada | Cohort. Asbestos insulation workers | 1967-76 | 15
Blot et al. (1980) | Virginia, USA | Hospital CC in shipbuilding area | 1972-76 | 8
Selikoff et al. (1980) | New Jersey, USA | Cohort. Amosite asbestos factory workers | 1961-77 | 14
Blot et al. (1982) | Florida, USA | Hospital based CC in shipbuilding area | 1970-75 | 9
Liddell et al. (1984) | Quebec, Canada | Cohort. Chrysotile miners and millers | 1967-75 | 17
Pastorino et al. (1984) | Lombardy, Italy | CC in industrial areas | 1976-79 | 3,4
Berry et al. (1985) | East London, England | Cohort. Asbestos factory workers | 1960-70, 1971-80 | 16,18
Kjuus et al. (1986) | Telemark and Vestfold, Norway | Hospital CC in industrial and shipbuilding areas | 1979-83 | 6
de Klerk et al. (1991) | Wittenoom, Australia | Nested CC in crocidolite miners and millers | 1979-86 | 1
Bovenzi et al. (1993) | Trieste, Italy | Decedent CC in industrial and shipbuilding area | 1979-81, 1985-86 | 5
McDonald et al. (1993) | Quebec, Canada | Cohort. Chrysotile miners and millers | 1950-92 | 10
Zhu & Wang (1993) | 8 factories, China | Cohort. Chrysotile asbestos products workers | 1972-86 | 11
Meurman et al. (1994) | North Savo, Finland | Anthophyllite miners | 1953-91 | 12
Note: *Study numbering for the purposes of this statistical review (as used by Lee (2002), except
Studies 14 - 19 are referenced here as Studies 13 - 18), and in Table 3.2. For study references, see Lee (2001).
Table 3.2: Reported Results of Studies
Observed relative risk estimates are shown with 95% CI in parentheses.
Study | Author | RRS (95% CI) | RRA (95% CI) | RRAS (95% CI) | Covariance of relative risk estimates
1 | deKlerk | 3.44 (0.74, 16.01) | 2.24 (0.41, 12.28) | 9.57 (2.25, 40.65) | 1.65
2 | Martischnig | 1.78 (0.75, 4.20) | 1.08 (0.19, 6.05) | 5.57 (2.04, 15.18) | 1.17
3 | Pastorino, no PAH | 5.47 (0.40, 74.20) | 2.82 (0.04, 188.22) | 9.86 (0.69, 140.09) | 5.50
4 | Pastorino, PAH | 6.93 (0.30, 159.08) | 2.21 (0.02, 206.42) | 15.50 (0.63, 380.37) | 11.72
5 | Bovenzi | 10.13 (1.13, 91.06) | 1.83 (0.10, 33.82) | 15.89 (1.77, 142.80) | 3.45
6 | Kjuus | 5.41 (2.09, 13.99) | 2.41 (0.46, 12.50) | 19.86 (5.57, 70.78) | 1.21
7 | Blot, Georgia | 4.71 (2.27, 9.77) | 1.28 (0.27, 6.01) | 7.58 (3.31, 17.35) | 1.13
8 | Blot, Virginia | 3.09 (1.43, 6.70) | 1.88 (0.64, 5.50) | 4.87 (2.04, 11.58) | 1.14
9 | Blot, Florida | 6.01 (1.45, 24.92) | 1.80 (0.14, 22.85) | 7.79 (1.77, 34.18) | 1.66
10 | McDonald | 4.46 (2.34, 8.48) | 1.65 (0.70, 3.88) | 4.51 (2.38, 8.57) | 1.11
11 | Zhu | 1.83 (0.58, 5.76) | 3.78 (1.25, 11.37) | 11.06 (3.87, 31.62) | 1.32
12 | Meurman | 6.27 (0.82, 48.25) | 0.83 (0.05, 13.22) | 6.16 (0.85, 44.78) | 2.72
13 | Selikoff & Hammond | 7.13 (4.20, 12.11) | 8.47 (1.92, 37.25) | 73.73 (40.47, 134.33) | 1.07
14 | Selikoff | 8.67 (5.11, 14.71) | 25.00 (9.00, 69.41) | 40.63 (22.30, 74.01) | 1.07
15 | Hammond | 10.85 (6.39, 18.41) | 5.17 (2.17, 12.32) | 53.24 (31.11, 91.12) | 1.06
16 | Berry, 1971-80 M+F | 7.13 (4.20, 12.11) | 7.27 (2.39, 22.09) | 17.25 (9.75, 30.52) | 1.06
17 | Liddell | 4.94 (0.14, 172.43) | 2.98 (0.07, 127.77) | 8.21 (0.24, 279.28) | 24.62
18 | Berry, 1960-70 F | 7.13 (4.20, 12.11) | 5.00 (0.66, 38.02) | 52.56 (25.06, 110.25) | 1.06
Lee (2001, 2002) and Liddell (2001, 2002) described their criteria for excluding
studies, with many studies excluded for insufficient reporting of exposure levels,
and absences of lung cancer cases in the non-smoking group. The set of studies
for which there was some agreement on their inclusion (See Lee (2002), p.495, Ta-
ble 3.1, Studies 1-12,14-19) are identified and numbered in the right hand column
of Table 3.1.
3.4.2 Methods to assess interaction
Bayesian Meta-Analysis of V and S
We are interested in an overall estimate of the combined effect of asbestos and
smoking using estimates from each study. The main advantage of pooling is that it ‘borrows
strength’ across studies, giving greater precision for the estimates of the
quantities of interest, in this case S and V. For each study we estimate the value of
S and V using equations 3.4 and 3.7 respectively, and their associated variances as
outlined above. The information given by Study 11 was insufficient to calculate an
appropriate estimate for the variance of S, and hence we excluded the estimate from
this study in estimating the overall measure. The influence of this exclusion on the
overall results is presented separately in Section 3.5.
Consider first a hierarchical model for S. We suppose that we have k studies, and
that
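$Y_i$ = observed $\log(s_i)$
$\theta_i$ = true $\log(s_i)$ for study $i$, $i = 1, \dots, k$
where $s_i$ denotes the synergy index for the $i$th study. Following DuMouchel and
Harris (1983) for the univariate analysis, we make the following distributional
assumptions
\[ Y \mid \theta, \sigma \sim N(\theta, \sigma^2 C) \]
\[ \sigma^{-2} \sim \chi^2(df_\sigma)/df_\sigma \]
and
\[ \theta \mid \mu, \tau \sim N(X\mu, \tau^2 V) \]
\[ \mu \mid \tau \sim N(0, D \to \infty) \]
\[ \tau^{-2} \sim \chi^2(df_\tau)/df_\tau \]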
where C and V are k×k observed and prior variance-covariance matrices respectively,
and the degrees of freedom dfσ and dfτ indicate how well C and V, respectively, are
known. Again following DuMouchel, we assume the studies are independent and
take V to be the k × k identity matrix, and we take C to be a diagonal matrix
with the corresponding diagonal entries the variances of the individual observations
Yi. For a general discussion on these assumptions see Tweedie et al. (1996). X is
a vector of 1’s and µ is the mean log synergy index for all studies combined. The
notation D → ∞ indicates that the elements of D are very large and tending to
infinity.
The hierarchical model for V is the same as that for S, with obvious changes of
notation.
We initially conservatively assume that dfσ = 79, to reflect the average number
of jointly exposed cases, and dfτ = 10 to acknowledge there is little information
about between-study behaviour. In Section 3.5.1 we test the sensitivity of these
assumptions.
The models were run using a Gibbs sampling algorithm in the software package
WinBUGS (Spiegelhalter et al., 2002). For each analysis, estimates were based on
30,000 iterations, after a burn in of 20,000 cycles. Convergence was assessed by
examining Monte Carlo error estimates and Gelman-Rubin statistics (Brooks and
Gelman, 1998a).
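To indicate how the univariate model for S can be fitted, the following is a minimal Gibbs sampling sketch in Python. It is not the WinBUGS code used for the thesis, and several choices are assumptions made only for illustration: the function name `gibbs_log_index` is ours; the variances in C are back-calculated from the observed 95% confidence intervals in Table 3.3 (the thesis uses the formulae in the Appendix); the limit D → ∞ is represented by a flat prior on µ; and far fewer iterations are run than the 20,000 burn-in plus 30,000 retained cycles reported above. The results should therefore not be expected to reproduce Table 3.3 exactly.

```python
import numpy as np

# Observed S with 95% CI limits from Table 3.3 (Study 11 excluded, as in the text).
S_OBS = np.array([2.33, 5.30, 1.41, 2.03, 1.49, 3.24, 1.65, 1.30, 1.17,
                  0.86, 1.01, 5.35, 1.31, 3.73, 1.31, 1.22, 5.09])
S_LO = np.array([0.90, 1.23, 0.64, 0.86, 1.14, 1.36, 1.07, 0.75, 0.73,
                 0.56, 0.45, 0.63, 0.62, 1.71, 0.22, 0.84, 0.28])
S_HI = np.array([6.06, 22.80, 3.12, 4.79, 1.96, 7.72, 2.55, 2.24, 1.87,
                 1.31, 2.25, 45.16, 2.78, 8.11, 7.68, 1.77, 91.59])

y = np.log(S_OBS)
# Assumption: variance of log(S) recovered from the width of the 95% CI.
c = ((np.log(S_HI) - np.log(S_LO)) / (2 * 1.96)) ** 2


def gibbs_log_index(y, c, df_sigma=79, df_tau=10,
                    n_iter=6000, burn_in=1000, seed=1):
    """Gibbs sampler for Y_i ~ N(theta_i, sigma^2 c_i), theta_i ~ N(mu, tau^2),
    with chi-square(df)/df priors on both precisions and a flat prior on mu."""
    rng = np.random.default_rng(seed)
    k = len(y)
    theta, mu = y.copy(), y.mean()
    prec_sigma, prec_tau = 1.0, 1.0
    mu_draws = np.empty(n_iter - burn_in)
    for it in range(n_iter):
        # Study-specific means: precision-weighted mix of data and overall mean.
        prec_lik = prec_sigma / c
        post_prec = prec_lik + prec_tau
        post_mean = (prec_lik * y + prec_tau * mu) / post_prec
        theta = rng.normal(post_mean, 1.0 / np.sqrt(post_prec))
        # Overall mean under a flat prior.
        mu = rng.normal(theta.mean(), 1.0 / np.sqrt(k * prec_tau))
        # Precisions: chi-square(df)/df is Gamma(shape=df/2, rate=df/2);
        # NumPy's gamma takes a scale argument, so scale = 1/rate.
        rate_sigma = (df_sigma + np.sum((y - theta) ** 2 / c)) / 2.0
        prec_sigma = rng.gamma((df_sigma + k) / 2.0, 1.0 / rate_sigma)
        rate_tau = (df_tau + np.sum((theta - mu) ** 2)) / 2.0
        prec_tau = rng.gamma((df_tau + k) / 2.0, 1.0 / rate_tau)
        if it >= burn_in:
            mu_draws[it - burn_in] = mu
    return mu_draws


mu_draws = gibbs_log_index(y, c)
s_draws = np.exp(mu_draws)  # overall synergy index on the original scale
print("posterior mean of S:", s_draws.mean())
print("95% credible interval:", np.percentile(s_draws, [2.5, 97.5]))
print("P(S > 1):", np.mean(s_draws > 1))
```

The same function can be applied to V by replacing the inputs with the observed log(V) values and their variances. In practice several chains from dispersed starting values would be run so that Gelman-Rubin statistics can be computed.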
Bayesian Multivariate Analysis of Relative Risks
Here $Y_i$ is a vector $(\log(RR_{AS}), \log(RR_A), \log(RR_S))$ observed for each study $i$, $i = 1, \dots, k$.
The multivariate normal distribution is denoted as MVN.
For the multivariate analysis, we make the following distributional assumptions,
\[ Y_i \sim MVN(\theta_i, C_i) \]
\[ \theta_i \sim MVN(\mu, \Sigma) \]
\[ C_i^{-1} \sim \text{Wishart}(R, 3) \]
and
\[ \mu \sim MVN(0, D) \]
\[ \Sigma^{-1} \sim \text{Wishart}(V, 3) \]
where $\theta_i$ and $\mu$ are the study-specific and overall posterior estimates, respectively.
$C_i^{-1}$ and $\Sigma^{-1}$ are precision matrices, and R and V are scale matrices for the prior
variance-covariance matrices. R consists of the observed variance-covariance matrix
for yi, and V is taken to be a diagonal matrix suggesting a priori independence be-
tween the risk factors and little a priori information about the size of the variances.
D is a variance-covariance matrix of diagonal elements approaching ∞. The covari-
ances of observed relative risk estimates for case-control studies were estimated via
logistic regression, and for cohort studies from a Poisson model (Breslow and Day,
1987).
3.5 Results
Table 3.3 provides the observed study-specific estimates of S and V and the corre-
sponding posterior estimates based on the univariate Bayesian model described in
Section 3.4.2. Most of the observed study-specific estimates for S are greater than 1,
indicating a more than additive relationship. The study-specific posterior estimates
for S show evidence of shrinkage towards the overall mean. Overall, by ‘borrowing
strength’ across studies the posterior mean of S is 1.70 with a 95% credible interval
(CI) of (1.09, 2.67) indicating (overall) strong evidence in favour of a more than
additive relationship. Inclusion of Study 11 by assuming a relatively small variance
does not change the overall result greatly (1.74 (1.13, 2.70)).
Table 3.3: Results (Univariate): Test of Synergy (S) and Multiplicativity (V)
Study | Author | S: Observed Estimates* | S: Posterior Estimates** | V: Observed Estimates* | V: Posterior Estimates**
1 | deKlerk | 2.33 (0.90, 6.06) | 2.08 (0.83, 5.25) | 1.25 (0.19, 8.15) | 1.00 (0.29, 3.51)
2 | Martischnig | 5.30 (1.23, 22.80) | 2.77 (0.86, 9.29) | 2.89 (0.87, 9.61) | 1.87 (0.70, 5.04)
3 | Pastorino, no PAH | 1.41 (0.64, 3.12) | 1.48 (0.66, 3.26) | 0.64 (0.10, 4.08) | 0.75 (0.22, 2.58)
4 | Pastorino, PAH | 2.03 (0.86, 4.79) | 1.92 (0.83, 4.50) | 1.01 (0.13, 7.89) | 0.91 (0.25, 3.40)
5 | Bovenzi | 1.49 (1.14, 1.96) | 1.50 (1.08, 2.09) | 0.86 (0.31, 2.39) | 0.85 (0.36, 2.01)
6 | Kjuus | 3.24 (1.36, 7.72) | 2.63 (1.11, 6.26) | 1.52 (0.39, 5.93) | 1.20 (0.42, 3.42)
7 | Blot, Georgia | 1.65 (1.07, 2.55) | 1.65 (1.01, 2.69) | 1.26 (0.54, 2.93) | 1.15 (0.55, 2.44)
8 | Blot, Virginia | 1.30 (0.75, 2.24) | 1.35 (0.74, 2.45) | 0.84 (0.39, 1.81) | 0.84 (0.42, 1.69)
9 | Blot, Florida | 1.17 (0.73, 1.87) | 1.23 (0.72, 2.07) | 0.72 (0.22, 2.36) | 0.76 (0.29, 2.00)
10 | McDonald | 0.86 (0.56, 1.31) | 0.90 (0.59, 1.38) | 0.61 (0.25, 1.49) | 0.66 (0.31, 1.44)
11 | Zhu | - | - | 1.60 (0.43, 5.93) | 1.25 (0.45, 3.46)
12 | Meurman | 1.01 (0.45, 2.25) | 1.13 (0.56, 2.33) | 1.19 (0.07, 20.33) | 0.93 (0.22, 4.03)
13 | Selikoff & Hammond | 5.35 (0.63, 45.16) | 2.50 (0.70, 9.18) | 1.22 (0.21, 6.96) | 1.01 (0.30, 3.37)
14 | Selikoff | 1.31 (0.62, 2.78) | 1.38 (0.70, 2.70) | 0.19 (0.06, 0.56) | 0.29 (0.12, 0.71)
15 | Hammond | 3.73 (1.71, 8.11) | 3.15 (1.55, 6.39) | 0.95 (0.44, 2.06) | 0.93 (0.47, 1.84)
16 | Berry, 1971-80 M+F | 1.31 (0.22, 7.68) | 1.52 (0.47, 4.99) | 0.33 (0.10, 1.07) | 0.46 (0.18, 1.18)
17 | Liddell | 1.22 (0.84, 1.77) | 1.25 (0.82, 1.91) | 0.56 (0.20, 1.56) | 0.64 (0.27, 1.51)
18 | Berry, 1960-70 F | 5.09 (0.28, 91.59) | 2.16 (0.54, 8.91) | 1.47 (0.09, 24.74) | 0.98 (0.23, 4.21)
Overall | | - | 1.70 (1.09, 2.67) | - | 0.86 (0.52, 1.41)
Overall (incl. Study 11) | | - | 1.74 (1.13, 2.70) | - | -
Attributable Proportion Due to Interaction (API) | | - | 0.41 (0.08, 0.63) | - | -
Note: Estimates quoted are the mean estimate, followed in parentheses by the * 95% Confidence Interval or ** 95% Credible Interval.
The estimated value of API, given the overall posterior estimate for S of 1.70, was
0.41 (0.08,0.63). This suggests that for smokers also exposed to asbestos, approxi-
mately 40% of lung cancer cases can be attributed to the synergistic behaviour of
the two carcinogens, as distinct from their separate effects. Note that this ‘attri-
bution’ is descriptive only and, without other analyses, indicates association rather
than cause.
The observed study-specific estimates for V in Table 3.3 vary from 0.19 for Study
14 to 2.89 for Study 2. On the basis of the observed 95% confidence intervals, all
of the studies except Study 14 show evidence consistent with a multiplicative
relationship. Combining the studies, the overall posterior estimate for V is 0.86
(0.52,1.41). This is consistent with Lee’s result of 0.83 (0.63, 1.08) (Lee, 2002). Both
the observed and posterior estimates for V indicate conformity with the hypothesis
of a multiplicative relationship.
Table 3.4 provides the results of the multivariate analysis. The study-specific
posterior estimates of the relative risk of exposure to smoking alone (RRS) range
from 4.07 to 8.13. We find an overall posterior estimate for RRS of 5.51 (3.78,7.89).
For the relative risk of exposure to asbestos alone (RRA), the posterior estimates
range from 1.77 to 6.92. Overall, the posterior estimate for RRA is 3.13 (1.80,5.41).
For the combined exposure of asbestos and smoking, the posterior estimates range
considerably from 5.50 to 50.86. Overall, the posterior estimate for RRAS is 13.69
(8.20,22.76). On the basis of the relative risk estimates, the multivariate estimates
for S and V are 1.94 and 0.83 respectively. For each study, we also calculate the
probability that either S is greater than 1 (indicating more than additive) or V is
less than 1 (indicating less than multiplicative). Overall, the multivariate analysis
indicates overwhelming support for a value of S greater than 1 (P(S>1) ≈ 1), and
a very high probability that V is less than 1 (P(V<1)=0.79). Considering the two
tests together, there is very strong evidence the relationship is more than additive
but less than multiplicative. Table 3.5 provides the overall results for S and V under
the univariate and multivariate models and probability estimates that S and V are
greater than selected thresholds.
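The probability statements above are obtained by evaluating S and V at each posterior draw of the relative risks and counting the proportion of draws that satisfy the condition. A minimal sketch of this calculation follows. The draws here are a hypothetical stand-in for MCMC output: they are simulated from a multivariate normal centred on the overall posterior estimates in Table 3.4, with a covariance matrix that is assumed purely for illustration, so the printed probabilities are not the thesis results.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for MCMC output: draws of
# (log RR_S, log RR_A, log RR_AS), centred on the overall estimates in Table 3.4.
mean = np.log([5.51, 3.13, 13.69])
cov = np.array([[0.035, 0.010, 0.020],
                [0.010, 0.080, 0.030],
                [0.020, 0.030, 0.070]])  # assumed values, illustration only
draws = rng.multivariate_normal(mean, cov, size=20000)
rr_s, rr_a, rr_as = np.exp(draws).T

# Evaluate both indices draw by draw (equations 3.4 and 3.7).
s = (rr_as - 1.0) / (rr_s + rr_a - 2.0)
v = rr_as / (rr_s * rr_a)

# Posterior probabilities are proportions of draws meeting each condition.
p_s_gt_1 = np.mean(s > 1)
p_v_lt_1 = np.mean(v < 1)
p_both = np.mean((s > 1) & (v < 1))  # more than additive and less than multiplicative

print(f"P(S>1) = {p_s_gt_1:.2f}, P(V<1) = {p_v_lt_1:.2f}, P(both) = {p_both:.2f}")
```

Working draw by draw also gives the joint probability that the relation is both more than additive and less than multiplicative, which cannot be recovered from the two marginal probabilities alone.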
Table 3.4: Results for Multivariate RR Analysis
Study No. | Author | RRS* | RRA* | RRAS* | S* | V* | P(S>1) | P(V<1)
1 | deKlerk | 4.73 (2.16, 10.74) | 2.88 (1.11, 7.81) | 12.30 (5.19, 31.47) | 2.16 (0.98, 4.23) | 1.04 (0.31, 2.51) | 0.97 | 0.58
2 | Martischnig | 3.43 (1.54, 8.79) | 2.29 (0.72, 8.05) | 9.56 (3.61, 31.66) | 3.05 (0.75, 7.51) | 1.51 (0.31, 4.19) | 0.94 | 0.37
3 | Pastorino, no PAH | 5.57 (2.34, 13.24) | 2.78 (0.82, 9.16) | 10.75 (3.82, 30.30) | 1.62 (0.69, 3.44) | 0.85 (0.20, 2.46) | 0.87 | 0.73
4 | Pastorino, PAH | 5.89 (2.38, 14.61) | 2.87 (0.82, 9.87) | 13.45 (4.47, 40.69) | 1.94 (0.77, 4.12) | 0.98 (0.23, 2.82) | 0.93 | 0.65
5 | Bovenzi | 6.80 (2.70, 16.56) | 2.25 (0.69, 7.20) | 11.00 (4.16, 27.72) | 1.50 (0.80, 2.45) | 0.86 (0.22, 2.37) | 0.91 | 0.72
6 | Kjuus | 5.51 (2.90, 10.50) | 3.03 (1.16, 8.35) | 17.37 (6.47, 43.08) | 2.67 (0.98, 5.34) | 1.21 (0.33, 2.99) | 0.97 | 0.45
7 | Blot, Georgia | 5.20 (2.87, 9.87) | 2.02 (0.76, 5.91) | 8.80 (4.28, 20.88) | 1.56 (0.76, 2.83) | 0.96 (0.28, 2.28) | 0.90 | 0.63
8 | Blot, Virginia | 4.07 (2.16, 8.93) | 2.29 (0.96, 6.23) | 6.99 (3.16, 21.20) | 1.52 (0.62, 3.27) | 0.85 (0.26, 2.05) | 0.81 | 0.74
9 | Blot, Florida | 5.94 (2.71, 12.53) | 2.25 (0.74, 6.84) | 8.39 (3.56, 21.18) | 1.25 (0.62, 2.37) | 0.74 (0.20, 1.99) | 0.71 | 0.80
10 | McDonald | 5.08 (2.81, 10.25) | 1.92 (0.87, 4.88) | 5.50 (2.93, 13.72) | 0.94 (0.56, 1.70) | 0.62 (0.23, 1.36) | 0.25 | 0.92
11 | Zhu | 3.74 (1.50, 10.00) | 4.70 (1.61, 12.94) | 15.93 (5.94, 44.48) | 2.42 (1.10, 4.50) | 1.04 (0.30, 2.45) | 0.98 | 0.56
12 | Meurman | 6.27 (2.47, 15.17) | 1.77 (0.49, 6.36) | 7.22 (2.67, 20.53) | 1.03 (0.51, 2.28) | 0.80 (0.19, 2.33) | 0.47 | 0.77
13 | Selikoff & Hammond | 6.25 (2.97, 11.15) | 6.53 (1.82, 20.45) | 50.86 (12.28, 104.27) | 4.91 (1.55, 9.24) | 1.49 (0.36, 3.97) | 0.99 | 0.33
14 | Selikoff | 6.39 (2.80, 12.76) | 6.92 (1.65, 26.60) | 26.71 (7.80, 64.91) | 2.43 (0.93, 5.40) | 0.78 (0.16, 2.46) | 0.97 | 0.76
15 | Hammond | 8.13 (3.51, 14.44) | 4.62 (1.67, 11.48) | 36.16 (10.94, 72.31) | 3.38 (1.44, 5.54) | 1.09 (0.34, 2.59) | 0.99 | 0.53
16 | Berry, 1971-80 M+F | 6.37 (3.46, 10.68) | 4.56 (1.61, 10.99) | 15.83 (7.85, 29.22) | 1.71 (0.94, 3.10) | 0.63 (0.22, 1.64) | 0.96 | 0.88
17 | Liddell | 5.42 (2.15, 13.41) | 2.93 (0.95, 8.80) | 9.62 (3.40, 27.36) | 1.45 (0.71, 2.90) | 0.72 (0.20, 1.90) | 0.84 | 0.82
18 | Berry, 1960-70 F | 6.46 (3.42, 10.80) | 5.10 (1.59, 15.99) | 37.08 (10.90, 76.17) | 3.98 (1.34, 7.52) | 1.34 (0.33, 3.51) | 0.99 | 0.41
Overall | | 5.51 (3.78, 7.89) | 3.13 (1.80, 5.41) | 13.69 (8.20, 22.76) | 1.94 (1.29, 2.84) | 0.83 (0.46, 1.40) | 1.00 | 0.79
Note: * Posterior estimates quoted are the mean estimate, followed in parentheses by the 95% Credible Interval.
A comparison of the overall results for S and V from the univariate and multivariate
analyses in Table 3.5 does not reveal a large difference in either the point
estimates or the confidence range. The results for V from Studies 2 and 5 (the
largest study) can be compared in Figures 3.1 and 3.2. For Study 2 the relative
risk and covariance estimates are low compared to higher estimates for Study 5.
These figures again reveal relatively minor differences in the point estimates, but
the multivariate analysis supports a wider credible interval.
Table 3.5: Combined Results for S and V
Analysis | S: Overall (95% CI) | P(S>1) | P(S>1.5) | P(S>2) | V: Overall (95% CI) | P(V<0.5) | P(V<1) | P(V<1.5)
Bayesian Univariate | 1.70 (1.09, 2.67) | 0.99 | 0.70 | 0.23 | 0.86 (0.52, 1.41) | 0.02 | 0.74 | 0.98
Bayesian Multivariate | 1.94 (1.29, 2.84) | 1.00 | 1.00 | 1.00 | 0.83 (0.46, 1.40) | 0.05 | 0.79 | 0.98
Figures 3.3 and 3.4 show bivariate density plots for S and V based on the results
of the multivariate analysis from Studies 2 and 5 respectively. Lower relative risk
and covariance estimates for Study 2 result in a bivariate density plot which is
more evenly spread compared to Study 5. Table 3.6 provides a test of the variance-
covariance matrix for the multivariate analysis. Higher covariance estimates for the
relative risk estimates appear to result in a tightening of the 95% credible intervals
for S and V.
3.5.1 Sensitivity of the Results
A meta-analysis provides an opportunity to investigate subgroups of studies. Table 3.7
provides the results for S and V by type of study, classification for non-smoker,
use of external reference, type of asbestos, classification used for no asbestos exposure,
and study size. There appears to be strong evidence for a less than multiplicative
relationship among the following subsets: prospective studies (P(V<1)=0.86);
studies which classify only those who never smoked as non-smokers (P(V<1)=0.86);
studies based on exposure to crocidolite or amosite (P(V<1)=0.85); and studies
with number of cases less than 150 (P(V<1)=0.78). The difference between types
of asbestos is based on only two studies (1 and 14) for crocidolite and amosite,
and strongly influenced by a low estimate for V from study 14 (Observed V=0.19
(0.06,0.56)).
Figure 3.1: Box plots of V from Study 2
Figure 3.2: Box plots of V from Study 5
Figure 3.3: Density plot of V and S from multivariate analysis for Study 2
Figure 3.4: Density plot of V and S from multivariate analysis for Study 5
Table 3.6: Sensitivity of estimates from the Variance/Covariance Matrix for the Multivariate Analysis
Analysis | S (95% CI) | V (95% CI)
Main | 1.94 (1.29, 2.84) | 0.83 (0.46, 1.40)
Low variance (0.20), Low covariance (0.15) | 2.07 (1.38, 2.99) | 0.97 (0.60, 1.50)
High variance (1.00), High covariance (0.90) | 2.10 (1.36, 3.16) | 1.00 (0.53, 1.74)
High variance (1.00), Low covariance (0.15) | 2.08 (1.07, 3.67) | 0.99 (0.45, 1.92)
The sensitivity of the Bayesian univariate estimates to the distributional as-
sumptions in the model is provided in Table 3.8. Plausible ranges for the degrees
of freedom for σ and τ , a tighter precision estimate for µ and a robust analysis
excluding the smallest and largest studies were tested. The overall results appear to
be robust to these alternative assumptions.
The sensitivity of the Bayesian multivariate estimates is summarised in Table 3.9.
Results by study type, a tighter precision for µ and exclusion of the smallest and
largest study are shown. As previously indicated by the univariate results, there
appears to be more evidence for a multiplicative relation for case-control studies
(P(V<1)=0.37) compared to cohort studies (P(V<1)=0.90), although support for
a simple multiplicative relation for cohort studies (V=0.64 (0.23,1.42)) cannot be
ruled out.
Table 3.7: Results for S and V by Factor
Analysis | S: Posterior Estimate (95% CI) | P(S>1) | No. Studies | V: Posterior Estimate (95% CI) | P(V<1) | No. Studies
Main | 1.70 (1.09, 2.67) | 0.99 | 17 | 0.86 (0.52, 1.41) | 0.74 | 18
1. Study Type
Prospective | 1.6 (0.75, 3.55) | 0.89 | 8 | 0.66 (0.30, 1.46) | 0.86 | 9
Case-control | 1.82 (0.96, 3.50) | 0.97 | 9 | 1.10 (0.53, 2.33) | 0.40 | 9
2. Classification for Non-smoker
Never smoked | 1.58 (0.79, 3.26) | 0.90 | 9 | 0.68 (0.33, 1.39) | 0.86 | 10
Light smoker | 1.88 (0.93, 3.88) | 0.96 | 8 | 1.14 (0.51, 2.56) | 0.37 | 8
3. By Group (External Reference)
Data compared to external ref. | 2.48 (0.81, 7.74) | 0.95 | 5 | 0.55 (0.18, 1.69) | 0.86 | 5
Otherwise | 1.54 (0.91, 2.61) | 0.95 | 12 | 1.01 (0.56, 1.85) | 0.49 | 13
4. Type of Asbestos
Any Type | 2.03 (1.14, 3.68) | 0.99 | 12 | 0.98 (0.53, 1.83) | 0.53 | 12
Crocidolite or Amosite | 1.69 (0.33, 8.56) | 0.75 | 2 | 0.40 (0.06, 2.56) | 0.85 | 2
Chrysotile | 1.01 (0.22, 4.54) | 0.51 | 2 | 0.79 (0.20, 3.10) | 0.64 | 3
5. Classification for No Asbestos Exposure
No Exposure | 1.97 (0.66, 6.12) | 0.89 | 4 | 1.25 (0.42, 3.75) | 0.34 | 5
Low Exposure | 1.53 (0.68, 3.51) | 0.85 | 7 | 0.76 (0.30, 1.95) | 0.72 | 7
Population Exp. | 1.81 (0.77, 4.22) | 0.92 | 6 | 0.73 (0.29, 1.78) | 0.76 | 6
6. Study Size
No. Cases < 150 | 1.76 (0.80, 3.97) | 0.92 | 8 | 0.71 (0.30, 1.75) | 0.78 | 9
No. Cases > 150 | 1.69 (0.90, 3.26) | 0.95 | 9 | 0.97 (0.49, 1.93) | 0.54 | 9
Table 3.8: Sensitivity of the Posterior Estimates (Univariate): Test of S and V
Analysis | S* | V* | P(S>1) | P(V<1)
Main | 1.70 (1.09, 2.67) | 0.86 (0.52, 1.41) | 0.99 | 0.74
1. df(σ) = 14 | 1.71 (1.09, 2.70) | 0.86 (0.53, 1.41) | 0.99 | 0.74
1. df(σ) = 268 | 1.70 (1.09, 2.67) | 0.85 (0.52, 1.42) | 0.99 | 0.74
2. df(τ) = 20 | 1.72 (1.07, 2.79) | 0.86 (0.51, 1.44) | 0.99 | 0.73
2. df(τ) = 2 | 1.63 (1.15, 2.36) | 0.84 (0.55, 1.31) | 1.00 | 0.79
3. Precision(µ) = 1/15 | 1.69 (1.09, 2.66) | 0.86 (0.52, 1.41) | 0.99 | 0.74
4. Robust (excl. smallest and largest studies) | 1.69 (1.04, 2.75) | 0.84 (0.49, 1.43) | 0.98 | 0.74
Note: * Posterior estimates quoted are the mean estimate, followed in parentheses by the 95% Credible Interval.
Table 3.9: Sensitivity of the Posterior Estimates (Multivariate Analysis)

Analysis                      RRS             RRA             RRAS             S              V              P(S > 1)  P(V < 1)
Main                          5.51            3.13            13.69            1.94           0.83           1.00      0.79
                              (3.78, 7.89)    (1.80, 5.41)    (8.20, 22.76)    (1.29, 2.84)   (0.46, 1.40)
SA1 CC                        4.22            1.72            8.39             1.99           1.28           0.97      0.37
                              (2.39, 7.53)    (0.85, 3.55)    (4.75, 15.18)    (0.99, 3.79)   (0.48, 2.79)
    PP                        7.14            5.63            23.08            2.13           0.64           0.98      0.90
                              (4.06, 12.39)   (2.51, 12.85)   (10.33, 48.28)   (1.01, 3.98)   (0.23, 1.42)
SA2 Restrictive prior         5.22            2.90            12.54            1.92           0.86           1.00      0.75
                              (3.57, 7.45)    (1.67, 4.92)    (7.48, 20.59)    (1.26, 2.80)   (0.48, 1.45)
SA3 Robust (excl. smallest    5.13            3.05            12.33            1.87           0.83           1.00      0.79
    and largest studies)      (3.39, 7.67)    (1.71, 5.54)    (7.24, 21.16)    (1.20, 2.78)   (0.43, 1.44)

Note: Estimates quoted are the mean estimate, below which is the 95% Credible Interval.
3.6 Discussion
We reviewed the literature on the combined effect of exposure to asbestos and smok-
ing on lung cancer, and explored a Bayesian approach to assess evidence of interac-
tion. A Bayesian approach using estimates of S and V indicates that the relation is
closer to multiplicative than additive, a result consistent with recent reviews of the
literature.
The results highlight two issues. First, estimates from the univariate and multi-
variate analysis were similar but with wider credible intervals on the latter. Although
we have more information about our parameters, the effect of incorporating covari-
ance information is to increase the number of parameters of interest, and we only
indirectly estimate these parameters. The wider credible interval of each estimate
may thus be a more accurate reflection of the uncertainty we have in these estimates.
Second, while there was support from a few of the studies for a multiplicative re-
lation, the same studies also supported an additive relation. It is thus important
to both allow for this uncertainty in the modelling and directly assess competing
hypotheses of interest by analysing the evidence jointly.
Several explanations have been postulated for the biological mechanisms under-
lying evidence for a multiplicative relation. One is that cancer may be a multistage
process, with the two carcinogens acting at different stages (Peto et al., 1996). It is
postulated that early stage carcinogenesis by asbestos supplies a population of initi-
ated cells that the powerful late-stage actions of tobacco carcinogens then promote
to overt cancer (Reif and Heeren, 1999). In contrast, when two carcinogens affect
the same stage of carcinogenesis, then the relative risks are additive, and there is no
interaction (Brown and C, 1989). Another explanation is that smoking may impair
clearance of asbestos particles from the lung (Cohen et al., 1979).
The main emphasis of our review is on studies included in two recent reviews
of the literature (Lee, 2001; Liddell, 2001). A search of the MEDLINE reference
database (1998 - May 2004) and cited references for more recent studies revealed a
number of papers with information relating to occupational exposure to asbestos,
smoking habits, and the association of these factors with lung cancer risks. However,
only one study provided quantitative information about relative risks for each expo-
sure category (Gustavsson et al., 2002). The results from one study were based on a
cohort previously included (Liddell and Armstrong, 2002). Some studies provided in-
sufficient information on smoking habits (Rafnsson and Sulem, 2003; Ulvestad et al.,
2002; Goldberg, 1999; Stayner et al., 1997), while other studies were underpowered
to assess evidence of a joint effect (Rosamilia et al., 1999) or were genetically based
(Schabath et al., 2002). The study by Goldberg (1999) found that “the probabil-
ity that a cancer is due to asbestos is the same among smokers and non-smokers”,
implying a multiplicative relation was found.
The study by Gustavsson et al. (2002) investigated the association between low-
dose exposure to asbestos and lung cancer, and in the analysis of the combined
effect of asbestos and smoking, found evidence indicating a less than multiplicative
yet slightly more than additive effect. Relative risk estimates (with 95% confidence
intervals) are reported as RRS=21.8 (14.4, 32.8), RRA=4.2 (1.6, 11.1), RRAS=28.6
(19.9, 48.3). Departure from multiplicativity was investigated in the study by includ-
ing an interaction term in a logistic regression (β12=0.31 (0.11,0.86)), and departure
from additivity was evaluated using the Synergy Index (1.15 (0.77, 1.72)).
There are, of course, limitations associated with combining studies in the form of
a meta-analysis. Meta-analysis is designed to enable a combination of results from
studies which are comparable in outcome and exposure. Here we have combined
studies with variability in, but not limited to: definitions of non-smokers; exposure
times to asbestos; and exposure to different types and size of asbestos particles. In
the first case, approximately half of the studies (ten) defined a non-smoker as ‘never
smoked’, with the rest combining non-smokers and ‘light smokers’. Although, as Lee
points out, it could still be possible to observe a multiplicative relation regardless
of the smoking definition, the magnitude of the effect may be somewhat diminished
(Lee, 2001). Across studies there is also a difference in the duration and level of
exposure to asbestos. A recent study by Gustavsson et al. (2002) examining the
risk of low-dose exposure to asbestos, found evidence for a multiplicative relation
with a magnitude of interaction lower than that previously reported for higher doses.
The studies also differ in the type of asbestos and the size of asbestos particles to
which subjects are likely to be exposed. Hodgson and Darnton (2000) found
the risk differential between chrysotile and crocidolite or amosite for lung cancer to
be between 1:10 and 1:50. Further, there is evidence
that the size of asbestos particles is important. Landrigan (1998) found evidence
that the risk of lung cancer in the mining and milling industry is 10 to 50 fold lower
than in industries that process and use asbestos, such as textile manufacture and
insulation. In industries that use and process asbestos, bundles of fibres are broken
up into shorter, thinner fibres that are readily inhaled and retained in the alveoli.
There are also limitations to assessing interaction. First, in the case of studies
on exposure to asbestos and smoking, the small number of lung cancer cases for
non-smokers greatly increases the uncertainty of establishing any relation unless
study populations are very large or specifically targeted. For example, in study
9, we found support for a multiplicative relation using our test of multiplicativity
(V=0.72 (0.22,2.36)). A hypothetical increase from 5 to 10 in the number of lung
cancer cases occurring in the asbestos exposed population would be all that is needed
to obtain a result which is not supportive of a multiplicative relation. Greenland
and Rothman (1998) suggest that even with large data sets we may not have enough
information to establish relations among variables while controlling for confounding.
Second, consistent with a multi-stage model of carcinogenesis, the form of interaction
observed may be influenced by the length of follow-up time in studies (Archer, 1988).
An assessment of interaction is also a function of dosage levels for each risk factor,
both in the nature of the functional form assumed for dose-response relationships for
each factor and the dosage levels at which they combine. In the case of continuous
covariates, care must be taken to consider the appropriate dose-response relationship
for each factor individually before an assessment of the combined effect. Here we
have used categorical covariates (exposed versus not exposed) on the dosage levels
for each factor, and the dose-response relationship is difficult to explicitly model.
Our main interest is then the extent to which the risk factors combine at this binary
level. Although the definitions of those exposed and not exposed are subject to
cutoff points we should still be able to see evidence for a multiplicative or additive
relation provided the definitions are consistent across studies. However, a limitation
of such a binary classification is that the power to test interactions is essentially
determined by the size of the smallest category, so few lung cancer cases for non-
smokers suggests that an analysis based on a binary classification is likely to be
weaker than one based on continuous data.
Chapter 4
A Bayesian approach to assess interaction between
known risk factors: the risk of lung cancer from
exposure to asbestos and smoking
4.1 Summary
In Chapter 3, we primarily focussed on separate tests for an additive or multiplicative
relation. In this chapter, we extend these approaches by exploring the strength
of evidence for either relation using approaches which allow the data to choose
between both models. We then compare the different approaches. As this chapter is
designed to be read independently of Chapter 3, the first three sections (Introduction,
Overview of studies, and Methods to assess interaction) are largely repeated from
Chapter 3.
4.2 Introduction
The assessment of relationships between risk factors has been the subject of much
research (Saracci and Boffetta, 1994; Liddell, 2001; Lee, 2001). Recent studies have
provided an interpretation in the context of classical relative risk models (Roy and
Esteve, 1998), proposed an alternative measure of effect (Berry and Liddell, 2004)
or assessed non-parametric alternatives (van der Linde and Osius, 2001).
Tests for interaction are commonly based on linear additive or multiplicative
relations. Evidence for a multiplicative relation is indicated if the risk attributed to
combined exposure exceeds the risk attributable to each factor alone. In this sense,
we are talking about positive interaction or synergy between risk factors, and not
antagonism. Alternatively, if the risk attributed to combined exposure equals the
sum of the risks attributable to each factor alone the relation is considered to be
additive.
Evidence for a multiplicative relation between exposure to asbestos and smoking
and the incidence of lung cancer was indicated by an early study of US workers
(Selikoff et al., 1968). Subsequent studies and reviews of the literature with an
objective to assess the nature of the relationship further have indicated mixed results,
ranging from mixed evidence for both an additive and multiplicative relation to
strong evidence for a supramultiplicative relation (Saracci and Boffetta, 1994; Erren
et al., 1999; Vainio and Boffetta, 1994; Steenland and Thun, 1986).
The importance of understanding the nature of the combined effect of asbestos
exposure and smoking can be placed in a public health and legal context. From
a public health perspective, evidence for a multiplicative relation between asbestos
exposure and smoking has led to recommendations for asbestos-exposed persons
who currently smoke to stop, since cases of lung cancer induced by both exposures
would be prevented, along with those induced by smoking alone (Waage et al., 1997).
In a legal context, a greater understanding of the nature of the combined effect has
been required in the attribution of damages in cases where there is a history of
exposure to both asbestos and smoking (Guidotti, 2002).
We illustrate a Bayesian approach to assessing interaction using evidence on
the risk of lung cancer of exposure to asbestos and smoking. The strength of this
approach, in this context, is twofold. First, through the hierarchical structure of
likelihoods and priors, informed opinion about variance structures and relationships
between studies and outcomes can be integrated with the observed data. Second, it
provides the ability to make useful probability statements on the basis of all information.
In particular, we will draw on these strengths, and explore approaches which allow a
single inference to be made about the strength of evidence for one relation compared
to another.
The chapter proceeds as follows. Section 2 provides an overview of the case study
and corresponding available literature. Section 3 outlines the proposed Bayesian ap-
proaches to the assessment of interaction. The results of the case study are presented
in Section 4. General conclusions and discussion follow in Section 5.
4.3 Overview of studies
In this chapter we focus on assessing information about the relationship between
asbestos exposure and smoking from studies included in two recent reviews of the
literature by Lee (2001) and Liddell (2001) (as discussed above), as it is here that the
debate over the relationship crystallised. The inclusion of these studies also allows
a comparison of reported meta-analysis results with those obtained in this chapter.
A full search of the MEDLINE reference database (1966 - May 2004) was used to
confirm the information of the studies included in the two reviews and to assess
the influence of results from studies published since then. The details and results
of studies published after Lee and Liddell are provided in the Discussion. Details
of studies up to and including the reviews by Lee and Liddell (1966 - 1998) are
provided in Wraith and Mengersen (2007).
In the earliest reported assessment of the interaction between exposure to as-
bestos and smoking on lung cancer, Doll found some evidence for a multiplicative
hypothesis, although it was “far from convincing” (Doll, 1971). Subsequent reviews
by Saracci (1977, 1987); Saracci and Boffetta (1994) and Erren et al. (1999) in-
dicated evidence in support of the multiplicative hypothesis, while evidence from
Berry et al. (1985) was inconclusive. Consistent with the evidence from a number of
studies, two recent reviews of the literature by Lee (2001) and Liddell (2001) arrived
at slightly different conclusions as to the form of the combined effect. Lee (2002)
found little evidence to reject a multiplicative relation: “The asbestos relative risk
may be somewhat lower in smokers than non-smokers, but the available data do
not clearly reject the simple multiplicative relation. More complex models of joint
action might indeed fit the data better, but in view of the general problems with
the data, it seems doubtful whether more detailed statistical analysis would shed
any greater insight.” (p.496). However, Liddell (2002) highlighted differences in the
results of the case-control versus cohort studies, finding evidence against a simple
multiplicative hypothesis: “Therefore, the multiplicative hypothesis is not generally
satisfactory. Nor, of course, is the additive hypothesis, although it does fit some
data sets very well. Evidently, interaction takes several forms.” (p.495).
Both authors agreed that the form of the combined effect is more than a simple
additive relation, but the strength and nature of the more complex association was
not unanimously determined.
For the statistical assessment in the current chapter it was decided to include
studies for which there was some agreement on their inclusion between Lee (2002)
and Liddell (2002). Studies were mainly excluded for insufficient reporting of exposure
levels or the absence of lung cancer cases in the non-smoking group. A further
discussion on the inclusion of these studies can be found in Lee (2002) and Lid-
dell (2002). Details of the studies included for the following statistical analysis are
provided in Table 4.1.
4.4 Methods to assess interaction
A conceptual basis for understanding interaction between risk factors can be found
in Rothman’s (Rothman, 1976) component sufficient-cause paradigm of disease cau-
sation. Under this framework, synergistic or positive interaction occurs if two ex-
posures are component causes in the same sufficient cause. In the context of this
study, a case for synergistic interaction can be made if lung cancer occurs only with
exposure to both asbestos and smoking, as opposed to exposure to only one of these
two factors.
Table 4.1: Details of studies for statistical review

Study Ref.* | Author | Location | Study Type and Population | Period Followed
1 | de Klerk et al. (1991) | Wittenoom, Australia | Nested CC in crocidolite miners and millers | 1979-86
2 | Martischnig et al. (1977) | Gateshead, England | Hospital CC in shipbuilding area | 1972-73
3,4 | Pastorino et al. (1984) | Lombardy, Italy | CC in industrial areas. 3: No PAH, 4: PAH | 1976-79
5 | Bovenzi et al. (1993) | Trieste, Italy | Decedent CC in industrial and shipbuilding area | 1979-81, 1985-86
6 | Kjuus et al. (1986) | Telemark and Vestfold, Norway | Hospital CC in industrial and shipbuilding areas | 1979-83
7 | Blot et al. (1978) | Georgia, USA | Hospital CC in shipbuilding area | 1970-76
8 | Blot et al. (1980) | Virginia, USA | Hospital CC in shipbuilding area | 1972-76
9 | Blot et al. (1982) | Florida, USA | Hospital based CC in shipbuilding area | 1970-75
10 | McDonald et al. (1980) | Quebec, Canada | Cohort. Workers at Thetford mines and Asbestos, Que. | 1891-1920 to 1975
11 | Zhu & Wang (1993) | 8 factories, China | Cohort. Chrysotile asbestos products workers | 1972-86
12 | Meurman et al. (1994) | North Savo, Finland | Anthophyllite miners | 1953-91
13 | Selikoff & Hammond (1975) | New York and Newark NJ, USA and Canada | Two cohorts. Asbestos insulation workers | 1963-1973, 1967-72
14 | Selikoff et al. (1980) | New Jersey, USA | Cohort. Amosite asbestos factory workers | 1961-77
15 | Hammond et al. (1979) | USA and Canada | Cohort. Asbestos insulation workers | 1967-76
16,18 | Berry et al. (1985) | East London, England | Population based case-referent asbestos factory workers. 16: 1971-80 M+F, 18: 1960-70 F | 1960-70, 1971-80
17 | Liddell et al. (1984) | Quebec, Canada | Cohort. Chrysotile miners and millers | 1967-75

Note: * Study numbering for the purposes of this statistical review (as used by Lee (2002), except Studies
14-19 are referenced here as Studies 13-18). For study references, see Lee (2001).

A synergistic interaction effect, in the biological sense, is tested by departure from
additivity of absolute effects, i.e. the relative excess risk among those with combined
exposure should exceed the sum of the relative excess risks for each of the component
causes, referenced to those unexposed to both causes. A common test for an additive
relation used in the epidemiological literature is the Synergy Index ($S$), introduced by
Rothman (1974) as $S = (RR_{AS} - 1)/(RR_A + RR_S - 2)$, where $RR_S$, $RR_A$
and $RR_{AS}$ are the relative risks from exposure to smoking, asbestos, and smoking
and asbestos combined, respectively, each referenced against no exposure ($RR_{00}$). Under
the additive hypothesis $S = 1$; for a more than additive model $S > 1$, and for a less than
additive model $S < 1$.
A test for synergy defined by Lee (2001) is based on the rationale that,
for a multiplicative model to hold, the product of $RR_{AS}$ and $RR_{00}$ should equal that
of $RR_S$ and $RR_A$. Hence the test statistic is $V = (RR_{AS} RR_{00})/(RR_S RR_A)$. Under
the multiplicative hypothesis $V = 1$; for a more than multiplicative model $V > 1$,
and for a less than multiplicative model $V < 1$.
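
As a minimal illustration of the two statistics defined above, the sketch below computes S and V from point estimates of the three relative risks. The function names are illustrative only. The inputs are the relative risks reported by Gustavsson et al. (2002), as quoted in Section 3.6, and RR00 is taken to be 1 because all relative risks are referenced against no exposure. The calculation ignores the uncertainty in the estimates, which is what the Bayesian analysis is designed to capture.

```python
def synergy_index(rr_s, rr_a, rr_as):
    """Synergy Index S: ratio of the combined excess risk to the sum of the
    separate excess risks (S = 1 under an additive relation)."""
    return (rr_as - 1.0) / (rr_a + rr_s - 2.0)


def multiplicativity_index(rr_s, rr_a, rr_as, rr_00=1.0):
    """Test of multiplicativity V (V = 1 under a multiplicative relation)."""
    return (rr_as * rr_00) / (rr_s * rr_a)


# Point estimates quoted in the text for Gustavsson et al. (2002).
rr_s, rr_a, rr_as = 21.8, 4.2, 28.6

s_value = synergy_index(rr_s, rr_a, rr_as)
v_value = multiplicativity_index(rr_s, rr_a, rr_as)
print(f"S = {s_value:.2f}, V = {v_value:.2f}")  # S = 1.15, V = 0.31
```

These point values agree with the Synergy Index (1.15) reported by that study and, if the reported interaction term of 0.31 is read on the exponentiated scale, with its measure of departure from multiplicativity.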
In a previous chapter, we explored a Bayesian approach to the problem using the
Synergy Index (S) and Test of Multiplicativity (V) independently, and a bivariate
analysis using both these measures (Wraith and Mengersen, 2007). As outlined
above, S and V are essentially two separate tests, either testing for departure from
an additive or multiplicative relation, respectively. In the present chapter we explore
more synthesised approaches to assessing interaction, again in a Bayesian framework.
Our motivation in moving away from tests based on S and V is to explore approaches
which allow a single inference to be made about the strength of evidence for either
relation.
This section is arranged as follows. First, we adopt the common approach of
including an interaction term in either a logistic or Poisson hierarchical regression
model, for case-control or cohort studies, respectively. Next, we consider relative risk
models developed by Guerrero and Johnson (1982) and Lubin and Gaffey (1988).
Finally, we explore a mixture model where we hypothesise that the relative risk of
lung cancer from combined exposure to asbestos and smoking is drawn either from
an additive or multiplicative relation of exposure to asbestos and smoking alone.
Meta-analysis of logistic and Poisson regression models
A common method to test for a multiplicative model is to include an interaction term
in either a logistic model for case-control data or Poisson model for cohort data. For
the studies in this review we only had access to the summary information provided
by the study results. While some of the case-control studies were matched, the study
information provided precluded an analysis using conditional logistic regression.
For case-control studies, let $Y_{ij}$ denote the number of cases observed in each
exposure category $j$ for study $i$. For the $i$th study, $\pi_{ij}$ is the probability of observing
outcome $j$ and $n_{ij}$ is the number of cases and controls for outcome $j$. The variables
$X_A$ and $X_S$ are binary covariates indicating exposure to asbestos or smoking,
respectively, $\beta_{ik}$ is a vector of coefficients where $k = (0, A, S, AS)$ with 0 representing
no exposure, and $\sigma_i^2$ is a study-specific variance parameter. The following hierarchical
logistic model is assumed,
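$$Y_{ij} \sim \mathrm{Bin}(n_{ij}, \pi_{ij})$$
$$\log\left(\frac{\pi_{ij}}{1 - \pi_{ij}}\right) = \beta_{i0} + \beta_{iA} X_A + \beta_{iS} X_S + \beta_{iAS} X_A X_S + \varepsilon_{ij} \tag{4.1}$$
with the following priors,
$$\beta_{ik} \sim N(\theta_k, 1)$$
$$\varepsilon_{ij} \sim N(0, \sigma_i^2)$$
$$\theta_k \sim N(0, 0.1)$$
$$\sigma_i^{-2} \sim \mathrm{Gamma}(0.1, 0.1)$$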
Equation (4.1) specifies the full model including the interaction term ($X_A X_S$)
and includes allowance for over-dispersion ($\varepsilon_{ij}$). Our main interest is in $\theta_k$,
representing the effects of the covariates smoking, asbestos exposure and their interaction,
over all the studies. Using (4.1) the exponentiated form of $\beta_{AS}$ can be expressed
as $RR_{AS}/(RR_A \times RR_S)$, and hence measures the additional risk arising from the
combined exposure.
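
The identity in the last sentence can be checked numerically. The sketch below uses hypothetical coefficient values (they are not estimates from this analysis) and, for simplicity, drops the study index and the over-dispersion term. In a logistic model the exponentiated coefficients are odds ratios, which approximate relative risks when the outcome is rare.

```python
import math

# Hypothetical log-odds coefficients for a single study (illustrative values only).
beta_0, beta_a, beta_s, beta_as = -2.4, 0.5, 1.5, 0.1


def odds(x_a, x_s):
    """Odds of being a case for exposure indicators x_a (asbestos) and x_s (smoking),
    following the linear predictor of model (4.1) without the over-dispersion term."""
    return math.exp(beta_0 + beta_a * x_a + beta_s * x_s + beta_as * x_a * x_s)


# Odds ratios relative to the unexposed category.
or_a = odds(1, 0) / odds(0, 0)
or_s = odds(0, 1) / odds(0, 0)
or_as = odds(1, 1) / odds(0, 0)

# The ratio of the joint effect to the product of the separate effects
# recovers the exponentiated interaction coefficient.
ratio = or_as / (or_a * or_s)
print(ratio, math.exp(beta_as))
```

A ratio of 1 (an interaction coefficient of 0) corresponds to an exactly multiplicative relation on this scale.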
For the cohort studies we assume the following hierarchical Poisson model,
$$Y_{ij} \sim \mathrm{Poisson}(\mu_{ij})$$
$$\log(\mu_{ij}) = \log(E_{ij}) + \beta_{i0} + \beta_{iA} X_A + \beta_{iS} X_S + \beta_{iAS} X_A X_S + \varepsilon_{ij} \tag{4.2}$$
where $\beta_{ik}$ and $\varepsilon_{ij}$ are as in (4.1), and $Y_{ij}$ and $E_{ij}$ are the observed and expected number
of cases by exposure type, respectively.
Another common test for interaction is to assess the homogeneity of relative risks
across strata (in this case exposure to asbestos and smoking) using a Breslow-Day
test (Breslow and Day, 1987). Homogeneity of relative risks across strata implies a
multiplicative joint effect of two exposure categories, and hence the test is assessing
evidence for a multiplicative model. We do not pursue this test here.
Relative Risk Models
A multivariate meta-analysis of the observed relative risks for each study was first
undertaken to utilise information from both within and across studies. The output
of this analysis was then used as input for three relative risk models.
Here $Y_i$ is a vector $(\log(RR_{AS}), \log(RR_A), \log(RR_S))$ observed for each study $i$,
$i = 1, \ldots, k$. The multivariate normal distribution is denoted as MVN.
For the multivariate analysis, we make the following distributional assumptions,
$$Y_i \sim MVN(\theta_i, C_i)$$
$$\theta_i \sim MVN(\mu, \Sigma)$$
$$C_i^{-1} \sim \mathrm{Wishart}(R, 3)$$
and
$$\mu \sim MVN(0, D)$$
$$\Sigma^{-1} \sim \mathrm{Wishart}(V, 3)$$
where $\theta_i$ and $\mu$ are the study-specific and overall posterior estimates, respectively.
$C_i^{-1}$ and $\Sigma^{-1}$ are precision matrices, and $R$ and $V$ are scale matrices for the prior
variance-covariance matrices. $R$ consists of the observed variance-covariance matrix
for $Y_i$, and $V$ is taken to be a diagonal matrix suggesting a priori independence
between the risk factors and little a priori information about the size of the variances.
The prior distribution for µ is taken to be very uninformative, with the diagonal
variance-covariance matrix D comprising elements tending to infinity. The covari-
ances of observed relative risk estimates for case-control studies were estimated via
logistic regression, and for cohort studies from a Poisson model (Breslow and Day,
1987).
This multivariate model was run using the software package WinBUGS (Spiegel-
halter et al., 2002). For each analysis, estimates were based on 30,000 iterations,
after a burn-in of 20,000 iterations. Convergence was assessed by examining Monte
Carlo error estimates and the Gelman-Rubin statistic (Brooks and Gelman, 1998a). The
results of this analysis are provided in Table 4.3. A sample of five hundred (500)
posterior values of θi and µ was then used as input into the relative risk models
described below. A sample of 500 was chosen as a compromise between adequate
representation of the (well-behaved) parameter and computation time.
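
For readers unfamiliar with the convergence check mentioned above, the following sketch computes the classic Gelman-Rubin potential scale reduction factor for a scalar parameter from several chains of equal length. It is an illustration written in Python, not the WinBUGS implementation, which reports a variant of the statistic (Brooks and Gelman, 1998a). The function name and the simulated chains are assumptions made for the example.

```python
import random
import statistics


def gelman_rubin(chains):
    """Classic potential scale reduction factor for equal-length chains of one parameter."""
    n = len(chains[0])
    chain_means = [statistics.fmean(chain) for chain in chains]
    # W: average of the within-chain sample variances.
    within = statistics.fmean(statistics.variance(chain) for chain in chains)
    # B: n times the sample variance of the chain means.
    between = n * statistics.variance(chain_means)
    # Pooled estimate of the posterior variance.
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5


random.seed(1)
# Three simulated chains sampling the same distribution (well mixed).
mixed = [[random.gauss(0.0, 1.0) for _ in range(2000)] for _ in range(3)]
# Three simulated chains stuck around different values (not converged).
stuck = [[random.gauss(shift, 1.0) for _ in range(2000)] for shift in (0.0, 2.0, 4.0)]

r_mixed = gelman_rubin(mixed)
r_stuck = gelman_rubin(stuck)
print(f"well mixed: {r_mixed:.3f}, stuck: {r_stuck:.3f}")
```

Values close to 1 are consistent with the chains having reached a common distribution, while values well above 1 indicate that between-chain variability is still large relative to within-chain variability.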
Several relative risk models have been proposed to assess interaction between
risk factors (Lubin and Gaffey, 1988). A recent review by Roy and Esteve (1998)
highlighted three main relative risk models used in cancer epidemiology, including
those proposed by Thomas (1981), Breslow and Storer (1985), and Guerrero and
Johnson (1982).
We first explored a Box-Cox type transformation applied to the relative risks
developed by Guerrero and Johnson (1982). The attractiveness of this model is that
it is invariant within the family of linear transformations of scale of the covariates,
and as a direct application of the Box-Cox method appears clearly to be searching
for the best link function within a given family (Roy and Esteve, 1998).
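Guerrero and Johnson (1982) assumed that a power transformation of the odds
ratio satisfied a linear model,
$$\left(\frac{\pi_{ij}}{1 - \pi_{ij}}\right)^{(\gamma)} = \beta_0 + \beta_A X_A + \beta_S X_S \tag{4.3}$$
and
$$\left(\frac{\pi_{ij}}{1 - \pi_{ij}}\right)^{(\gamma)} =
\begin{cases}
\dfrac{\left(\frac{\pi_{ij}}{1 - \pi_{ij}}\right)^{\gamma} - 1}{\gamma} & (\gamma \neq 0) \\[2ex]
\log\left(\dfrac{\pi_{ij}}{1 - \pi_{ij}}\right) & (\gamma = 0)
\end{cases} \tag{4.4}$$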
Although Equation (4.3) includes the logistic model as a special case, we can also
estimate the parameters $\beta$ in a Poisson regression model (see Equation (4.2)).
Expressing (4.4) in relative risk terms using the sample posterior estimates (study
level) gives
$$\theta_{iAS}^{\gamma_i} = \theta_{iA}^{\gamma_i} + \theta_{iS}^{\gamma_i} - 1 \tag{4.5}$$
From equation (4.3), a multiplicative model is indicated by γ=0, whereas an ad-
ditive model is preferred if γ=1. We use equation (4.5) and estimate γ using Markov
Chain Monte Carlo (MCMC). As outlined earlier, our approach is to estimate γ us-
ing a sample of the posterior estimates from the multivariate analysis, both at the
study level (θi) and overall (µ).
For equation (4.3) our prior for γ is γi ∼ N(0.5, 1), where γi ∈ (−0.2, 3). This re-
striction on the prior distribution for γ was imposed to allow only feasible estimates,
and did not adversely affect the results.
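
To give a feel for the size of γ implied by equation (4.5), the sketch below solves it for a single set of point estimates by bisection, whereas the analysis described above estimates γ by MCMC across posterior samples. Because (4.5) holds trivially at γ = 0, the code works with the Box-Cox form in (4.4), whose limit at γ = 0 is the multiplicative relation on the log scale. The function names are illustrative, the search interval is the prior range (−0.2, 3), and the inputs are the overall posterior estimates from the multivariate analysis (3.13, 5.51 and 13.69), used purely as an example.

```python
import math


def box_cox(x, gamma):
    """Box-Cox transform of x, with the log limit at gamma = 0."""
    if abs(gamma) < 1e-12:
        return math.log(x)
    return (x ** gamma - 1.0) / gamma


def gj_residual(gamma, theta_a, theta_s, theta_as):
    """Zero when theta_as^gamma = theta_a^gamma + theta_s^gamma - 1 (gamma != 0),
    or when log(theta_as) = log(theta_a) + log(theta_s) (gamma = 0)."""
    return (box_cox(theta_as, gamma)
            - box_cox(theta_a, gamma)
            - box_cox(theta_s, gamma))


def solve_gamma(theta_a, theta_s, theta_as, lo=-0.2, hi=3.0, tol=1e-10):
    """Bisection for a root of gj_residual on (lo, hi)."""
    f_lo = gj_residual(lo, theta_a, theta_s, theta_as)
    f_hi = gj_residual(hi, theta_a, theta_s, theta_as)
    if f_lo * f_hi > 0:
        raise ValueError("no sign change on the search interval")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = gj_residual(mid, theta_a, theta_s, theta_as)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


gamma_hat = solve_gamma(theta_a=3.13, theta_s=5.51, theta_as=13.69)
print(f"gamma = {gamma_hat:.3f}, 1 - gamma = {1.0 - gamma_hat:.3f}")
```

For these inputs the root lies between 0.13 and 0.16, that is, much closer to the multiplicative value (γ = 0) than to the additive value (γ = 1).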
As an alternative to equation (4.5), a general class of relative risk model proposed
by Lubin and Gaffey (1988) is
$$\theta_{iAS} = (\theta_{iA}\theta_{iS})^{\gamma_i}(\theta_{iA} + \theta_{iS} - 1)^{1 - \gamma_i} \tag{4.6}$$
where γ = 1 or γ = 0 imply a multiplicative or additive relation, respectively.
This representation more easily allows different functional forms for dose-response
relationships in the presence of continuous covariates (Lubin and Gaffey, 1988). In
the present analysis the covariates are categorical and there are only two outcomes
(exposed or not exposed), so we do not exploit this flexibility but rather use this
model as a viable functional form in which to express an additive or multiplicative
relation and to compare results. The prior for γ is γi ∼ N(0, 1).
An alternative to equation (4.6) was also considered. Here,
$$\theta_{iAS} = \theta_{iA} + \theta_{iS} - 1 + \gamma_i\theta_{iA}\theta_{iS} \tag{4.7}$$
where γ assesses the degree of departure from additivity (γ = 0 implying an additive
relation, γ > 0 more than additive).
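
For fixed point values of the three relative risks, equations (4.6) and (4.7) can each be solved for γ in closed form, which gives a quick sense of the scale of the parameter in each model. The sketch below is illustrative only: the function names are assumptions, the inputs are the overall posterior estimates from the multivariate analysis (3.13, 5.51 and 13.69), and the calculation ignores the prior on γ and the posterior uncertainty in the relative risks.

```python
import math


def lubin_gaffey_gamma(theta_a, theta_s, theta_as):
    """Solve (4.6) for gamma: 1 is multiplicative, 0 is additive.
    Undefined when the additive and multiplicative predictions coincide."""
    log_add = math.log(theta_a + theta_s - 1.0)
    log_mult = math.log(theta_a * theta_s)
    return (math.log(theta_as) - log_add) / (log_mult - log_add)


def additive_departure_gamma(theta_a, theta_s, theta_as):
    """Solve (4.7) for gamma: 0 is additive, positive values are more than additive."""
    return (theta_as - (theta_a + theta_s - 1.0)) / (theta_a * theta_s)


theta_a, theta_s, theta_as = 3.13, 5.51, 13.69
gamma_lg = lubin_gaffey_gamma(theta_a, theta_s, theta_as)
gamma_ra = additive_departure_gamma(theta_a, theta_s, theta_as)
print(f"(4.6): gamma = {gamma_lg:.2f}; (4.7): gamma = {gamma_ra:.2f}")
```

For these inputs the value from (4.6) is about 0.72 and the value from (4.7) is about 0.35, so both sit between the additive and multiplicative reference points.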
The three relative risk models, equations (4.5), (4.6) and (4.7), were run using the
software package WinBUGS (Spiegelhalter et al., 2002) using three chains each with
6,500 iterations, after a burn-in of 3,000 iterations. This run length was sufficient
to satisfy convergence diagnostics including Monte Carlo error estimates and the
Gelman-Rubin statistic (Brooks and Gelman, 1998a). Although these three relative
risk models are conceptually straightforward, the estimate for γ from both equations
(4.5) and (4.6) is difficult to interpret for two reasons. First, without information
about the distribution of γ, it is difficult to identify a threshold (e.g. 0.5) for model
preference. We discuss this issue later in relation to the results of the mixture model.
Second, the interpretation of γ other than a value of 0 (multiplicative) or 1 (additive)
is difficult as effect parameters from equation (4.5) are measured as differences in
the value of $(\theta_{AS}^{\gamma} - 1)/\gamma$. In light of these issues, the results using equations (4.5)
and (4.6) are best viewed as indicative only of the strength of evidence for either
relation.
Mixture Model
An alternative to the relative risk models described above is a mixture model. The
use of mixture distributions comprising a finite or infinite number of components,
possibly of different distributional types, to describe different features of data has
attracted a great deal of recent research interest (Marin et al., 2005; McLachlan and
Peel, 2000b).
Here we specify that values of θiAS are drawn from a Gaussian distribution with
a mean corresponding to either an additive (θiA +θiS−1) or a multiplicative relation
(θiAθiS). This choice is identified in the variable µ below, which can be drawn from
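either $\mu_{1j}$ or $\mu_{2j}$ for an additive or multiplicative relation, respectively. Formally, for
each study $j$,
$$\log(\theta_{ASj}) \sim N(\mu_{(1,2)j}, \tau_{(T_j)})$$
$$\mu_{1j} = \log(\theta_{Aj} + \theta_{Sj} - 1)$$
$$\mu_{2j} = \log(\theta_{Aj}\theta_{Sj})$$
$$T_j \sim \mathrm{Binomial}(P_{1,2})$$
and
$$P_{1,2} \sim \mathrm{Dirichlet}(\alpha)$$
$$\tau_1 \sim \mathrm{Gamma}(0.01, 0.01)$$
$$\tau_2 \sim \mathrm{Gamma}(0.01, 0.01)$$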
where $\mu_{(1,2)j}$ and $\tau_{(T_j)}$ are mean and precision ($1/\sigma^2$) variables for the $j$th
observation, respectively. The parameter $\alpha$ reflects any prior information we may have in
the way in which $\theta_{ASj}$ is allocated between the two components (additive or
multiplicative relation), and for this analysis we have made it uninformative ($\alpha = (1, 1)$).
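
The allocation step of the mixture model can be illustrated for a single observation. Given the two component means, the component weights and the precisions, the conditional probability that log(θAS) belongs to the additive component follows from Bayes' rule. In the sketch below the weights and the common precision are fixed, hypothetical values, whereas in the model above they are unknown and are sampled together with the allocation variable. The function names are also illustrative.

```python
import math


def normal_pdf(x, mean, precision):
    """Density of a normal distribution parameterised by its precision (1 / variance)."""
    return math.sqrt(precision / (2.0 * math.pi)) * math.exp(-0.5 * precision * (x - mean) ** 2)


def prob_additive(theta_a, theta_s, theta_as, weights=(0.5, 0.5), precisions=(16.0, 16.0)):
    """Conditional probability that log(theta_as) comes from the additive component.
    The default weights and precisions are hypothetical values chosen for illustration."""
    y = math.log(theta_as)
    mu_add = math.log(theta_a + theta_s - 1.0)   # additive component mean
    mu_mult = math.log(theta_a * theta_s)        # multiplicative component mean
    w_add = weights[0] * normal_pdf(y, mu_add, precisions[0])
    w_mult = weights[1] * normal_pdf(y, mu_mult, precisions[1])
    return w_add / (w_add + w_mult)


# Example using the overall posterior estimates from the multivariate analysis.
p_add = prob_additive(theta_a=3.13, theta_s=5.51, theta_as=13.69)
print(f"P(additive) = {p_add:.3f}, P(multiplicative) = {1.0 - p_add:.3f}")
```

With these assumed settings the multiplicative component receives most of the weight for the overall estimates, but the result is sensitive to the precisions, which is one reason they are given priors and estimated in the full model.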
4.5 Results
The meta-analyses using the logistic and Poisson regression models (equations (4.1)
and (4.2)) were based on studies for which all information to calculate relative risk
estimates was provided by the study, thus excluding five cohort studies which relied
on external rates (studies 13, 14, 15, 16 and 18). Table 4.2 provides the results of
these meta-analyses. The reported coefficients (βik and θk) have been exponentiated
in order to interpret their meaning on the relative risk scale.
The results in Table 4.2 confirm mixed evidence in support of the simple
multiplicative model, overall and by study type, with the 95% credible interval for the
relative risk from exposure to asbestos (exp(θA)) excluding 1 for a model without
allowance for over-dispersion (1.04, 3.22) and including 1 for the model with allowance
for over-dispersion (0.86, 3.12). Eleven out of the 13 studies had 95% credible
intervals for exp(βA) excluding 1, thus supporting a simple multiplicative model. For
both models (over-dispersed or otherwise) exposure to smoking appears to dominate
the simple multiplicative model with credible intervals at all levels excluding 1 for
studies overall and by type of study.

Table 4.2: Results of logistic and Poisson regression models

Without over-dispersion parameter
Study           θ0              θ1              θ2               θ̂12
Multiplicative model
Overall         0.09            1.84            4.53             na
                (0.05, 0.16)    (1.04, 3.22)    (2.56, 8.01)
Case Control    0.14            1.92            4.62             na
                (0.07, 0.28)    (0.98, 3.74)    (2.33, 9.10)
Cohort          0.03            1.67            4.24             na
                (0.01, 0.09)    (0.62, 4.45)    (1.50, 12.11)
Multiplicative model with interaction term
Overall         0.09            1.76            4.36             1.11
                (0.05, 0.16)    (0.95, 3.26)    (2.45, 7.83)     (0.58, 2.11)
Case Control    0.15            1.65            4.39             1.24
                (0.07, 0.29)    (0.78, 3.55)    (2.20, 8.76)     (0.57, 2.70)
Cohort          0.03            2.19            4.77             0.79
                (0.01, 0.09)    (0.69, 6.69)    (1.57, 14.56)    (0.25, 2.55)

With over-dispersion parameter
Study           θ0              θ1              θ2               θ̂12
Multiplicative model
Overall         0.16            1.64            4.69             na
                (0.08, 0.30)    (0.86, 3.12)    (2.49, 8.95)
Case Control    0.15            1.83            4.75             na
                (0.07, 0.33)    (0.86, 3.90)    (2.24, 10.06)
Cohort          0.20            1.18            4.14             na
                (0.05, 0.74)    (0.33, 4.27)    (1.13, 15.33)
Multiplicative model with interaction term
Overall         0.16            1.65            4.63             1.00
                (0.08, 0.31)    (0.79, 3.42)    (2.30, 9.36)     (0.43, 2.28)
Case Control    0.16            1.67            4.45             1.16
                (0.07, 0.35)    (0.71, 3.97)    (1.99, 9.94)     (0.46, 2.89)
Cohort          0.17            1.59            4.85             0.67
                (0.04, 0.70)    (0.32, 6.94)    (1.07, 21.71)    (0.13, 3.79)

Note: Estimates quoted are the mean estimate, below which is the 95% Credible Interval.

Overall, the coefficient for the interaction term (exp(θAS)) in the multiplicative
model has a 95% credible interval in the over-dispersed model of (0.43, 2.28).
Interestingly, the posterior mean of exp(θAS) is greater than 1 for case-control studies
and less than 1 for cohort studies, which is in agreement with other evidence that
there is greater support for the multiplicative hypothesis among case-control studies
than cohort studies (Liddell, 2001). Figure 4.1 shows the estimate for θAS in the
over-dispersed model in relation to the study estimates (βAS).
Table 4.3 provides the results of the multivariate analysis. The study-specific
posterior estimates of the relative risk of exposure to smoking alone (θS) range
Figure 4.1: Boxplots of β12 (log scale) by study (horizontal axis, with study numbers ordered left to right) and overall (over-dispersed model)
from 4.07 to 8.13. We find an overall posterior estimate for θS of 5.51 (3.78,7.89).
For the relative risk of exposure to asbestos alone (θA), the posterior estimates
range from 1.77 to 6.92. Overall, the posterior estimate for θA is 3.13 (1.80,5.41).
For the combined exposure of asbestos and smoking, the posterior estimates range
considerably from 5.50 to 50.86. Overall, the posterior estimate for θAS is 13.69
(8.20,22.76). As outlined earlier, we used the posterior estimates from this analysis
as input for the relative risk models.
Table 4.4 provides the results of the relative risk models and mixture model anal-
ysis. Rgj refers to the relative risk model by Guerrero & Johnson (Equation (4.5)),
Rlg to the Lubin & Gaffey model (Equation (4.6)), and Ra to the model in Equation (4.7), which
assesses the degree of departure from an additive model. For ease of comparison
with the other results, γ∗ for Rgj is expressed as 1 − γ. Overall, we find the
posterior mean estimate of γ∗ is 0.84 (0.82, 0.86) for Rgj, 0.69 (0.66, 0.71) for Rlg,
and 0.35 (0.33, 0.36) for Ra. These results fail to provide unequivocal support for
either a multiplicative model (implied by γ = 1) or an additive model (γ = 0). The
study-specific estimates (γi) appear to be generally higher for Rgj compared to Rlg.
For the mixture model, the overall posterior probability of an additive model is
0.06 (0.00, 0.12). This result is based on the overall relative risk estimates found
in Table 2. However, the study-specific estimates range from a probability of 0 to
0.98. For more than half the studies (11) the posterior probability of an additive
relation is greater than 0.5, and the lower bounds of the 95% credible intervals from
nine studies are greater than 0.5. Thus the study-specific results provide much more
variable support for a clear conclusion of additivity or multiplicativity.
The results from the relative risk and mixture models can be compared with
the analyses of S and V from the same data (Table 4.5). The estimates of S and
V, overall, are 1.94 (1.29, 2.84) and 0.83 (0.46,1.40), respectively, indicating a more
than additive and less than multiplicative relationship. Comparison of these results
with the relative risk and mixture models (Table 4.4) reveals a fairly close inverse
relationship between estimates of V or γ and P (additive). Figure 4.2 shows a star
plot, by study and overall, representing the results from Rlg, S, V and the mixture
model. Studies with strong evidence for a multiplicative relationship have relatively
Table 4.3: Relative Risk Estimates, Observed and Posterior
         Observed RR Estimates*                             Posterior MV Estimates**
Study    RRS              RRA              RRAS             θS               θA               θAS
1        3.44             2.24             9.57             4.73             2.88             12.30
         (0.74, 16.01)    (0.41, 12.28)    (2.25, 40.65)    (2.16, 10.74)    (1.11, 7.81)     (5.19, 31.47)
2        1.78             1.08             5.57             3.43             2.29             9.56
         (0.75, 4.20)     (0.19, 6.05)     (2.04, 15.18)    (1.54, 8.79)     (0.72, 8.05)     (3.61, 31.66)
3        5.47             2.82             9.86             5.57             2.78             10.75
         (0.40, 74.20)    (0.04, 188.22)   (0.69, 140.09)   (2.34, 13.24)    (0.82, 9.16)     (3.82, 30.30)
4        6.93             2.21             15.50            5.89             2.87             13.45
         (0.30, 159.08)   (0.02, 206.42)   (0.63, 380.37)   (2.38, 14.61)    (0.82, 9.87)     (4.47, 40.69)
5        10.13            1.83             15.89            6.80             2.25             11.00
         (1.13, 91.06)    (0.10, 33.82)    (1.77, 142.80)   (2.70, 16.56)    (0.69, 7.20)     (4.16, 27.72)
6        5.41             2.41             19.86            5.51             3.03             17.37
         (2.09, 13.99)    (0.46, 12.50)    (5.57, 70.78)    (2.90, 10.50)    (1.16, 8.35)     (6.47, 43.08)
7        4.71             1.28             7.58             5.20             2.02             8.80
         (2.27, 9.77)     (0.27, 6.01)     (3.31, 17.35)    (2.87, 9.87)     (0.76, 5.91)     (4.28, 20.88)
8        3.09             1.88             4.87             4.07             2.29             6.99
         (1.43, 6.70)     (0.64, 5.50)     (2.04, 11.58)    (2.16, 8.93)     (0.96, 6.23)     (3.16, 21.20)
9        6.01             1.80             7.79             5.94             2.25             8.39
         (1.45, 24.92)    (0.14, 22.85)    (1.77, 34.18)    (2.71, 12.53)    (0.74, 6.84)     (3.56, 21.18)
10       4.46             1.65             4.51             5.08             1.92             5.50
         (2.34, 8.48)     (0.70, 3.88)     (2.38, 8.57)     (2.81, 10.25)    (0.87, 4.88)     (2.93, 13.72)
11       1.83             3.78             11.06            3.74             4.70             15.93
         (0.58, 5.76)     (1.25, 11.37)    (3.87, 31.62)    (1.50, 10.00)    (1.61, 12.94)    (5.94, 44.48)
12       6.27             0.83             6.16             6.27             1.77             7.22
         (0.82, 48.25)    (0.05, 13.22)    (0.85, 44.78)    (2.47, 15.17)    (0.49, 6.36)     (2.67, 20.53)
13       7.13             8.47             73.73            6.25             6.53             50.86
         (4.20, 12.11)    (1.92, 37.25)    (40.47, 134.33)  (2.97, 11.15)    (1.82, 20.45)    (12.28, 104.27)
14       8.67             25.00            40.63            6.39             6.92             26.71
         (5.11, 14.71)    (9.00, 69.41)    (22.30, 74.01)   (2.80, 12.76)    (1.65, 26.60)    (7.80, 64.91)
15       10.85            5.17             53.24            8.13             4.62             36.16
         (6.39, 18.41)    (2.17, 12.32)    (31.11, 91.12)   (3.51, 14.44)    (1.67, 11.48)    (10.94, 72.31)
16       7.13             7.27             17.25            6.37             4.56             15.83
         (4.20, 12.11)    (2.39, 22.09)    (9.75, 30.52)    (3.46, 10.68)    (1.61, 10.99)    (7.85, 29.22)
17       4.94             2.98             8.21             5.42             2.93             9.62
         (0.14, 172.43)   (0.07, 127.27)   (0.24, 279.28)   (2.15, 13.41)    (0.95, 8.80)     (3.40, 27.36)
18       7.13             5.00             52.56            6.46             5.10             37.08
         (4.20, 12.11)    (0.66, 38.02)    (25.06, 110.25)  (3.42, 10.80)    (1.59, 15.99)    (10.90, 76.17)
Overall                                                     5.51             3.13             13.69
                                                            (3.78, 7.89)     (1.80, 5.41)     (8.20, 22.76)
Note: Estimates quoted are the mean estimate, below which is the * 95 per cent confidence interval or ** 95 per cent credible interval.
Table 4.4: Results of relative risk models and mixture model
           Relative risk models                                 Mixture Model
Study No.  Rgj              Rlg              Ra
           γ∗               γ                γ                  P(Additive)
1          0.83             0.67             0.38               0.24
           (0.80, 0.86)     (0.62, 0.72)     (0.35, 0.41)       (0.13, 0.36)
2          0.89             0.79             0.57               0.12
           (0.84, 0.94)     (0.70, 0.89)     (0.52, 0.62)       (0.03, 0.23)
3          0.54             0.30             0.16               0.85
           (0.49, 0.59)     (0.26, 0.35)     (0.13, 0.18)       (0.76, 0.94)
4          0.71             0.48             0.27               0.53
           (0.67, 0.75)     (0.43, 0.53)     (0.24, 0.30)       (0.42, 0.64)
5          0.43             0.20             0.11               0.98
           (0.37, 0.48)     (0.16, 0.24)     (0.10, 0.13)       (0.93, 1.00)
6          0.93             0.85             0.52               0.08
           (0.90, 0.95)     (0.80, 0.91)     (0.48, 0.56)       (0.04, 0.14)
7          0.54             0.31             0.17               0.71
           (0.48, 0.59)     (0.26, 0.36)     (0.15, 0.19)       (0.60, 0.81)
8          0.54             0.33             0.16               0.75
           (0.48, 0.60)     (0.27, 0.38)     (0.13, 0.18)       (0.65, 0.84)
9          0.25             0.11             0.06               0.93
           (0.18, 0.32)     (0.07, 0.14)     (0.04, 0.07)       (0.87, 0.98)
10         -0.50            -0.14            -0.04              0.98
           (-0.68, -0.34)   (-0.18, -0.11)   (-0.05, -0.03)     (0.96, 0.99)
11         0.85             0.70             0.40               0.12
           (0.83, 0.87)     (0.66, 0.74)     (0.38, 0.43)       (0.06, 0.19)
12         -0.14            -0.05            -0.01              0.98
           (-0.28, -0.01)   (-0.09, -0.01)   (-0.02, 0.01)      (0.95, 0.99)
13         1.02             1.10             0.94               0.00
           (1.00, 1.03)     (1.05, 1.14)     (0.88, 0.10)       (0.00, 0.01)
14         0.75             0.50             0.26               0.56
           (0.73, 0.78)     (0.46, 0.53)     (0.24, 0.29)       (0.47, 0.66)
15         0.95             0.88             0.62               0.04
           (0.93, 0.96)     (0.84, 0.92)     (0.58, 0.65)       (0.00, 0.09)
16         0.59             0.34             0.16               0.80
           (0.56, 0.62)     (0.31, 0.37)     (0.15, 0.18)       (0.70, 0.89)
17         0.46             0.23             0.11               0.89
           (0.41, 0.51)     (0.19, 0.27)     (0.09, 0.12)       (0.82, 0.94)
18         0.98             0.99             0.73               0.00
           (0.97, 1.00)     (0.94, 1.03)     (0.69, 0.78)       (0.00, 0.01)
Overall    0.84             0.69             0.35               0.06
           (0.82, 0.86)     (0.66, 0.71)     (0.33, 0.36)       (0.00, 0.12)
Note: Estimates quoted are the mean estimate, below which is the 95% Credible Interval.
bigger stars, indicating large values for S, V, γ and P(M).
Figure 4.2: Star plots by study (1-18) and Overall. S is the Synergy Index, V the Multiplicativity Index, PM the probability of a multiplicative relation, and gamma is the power transformation estimate from Rlg (gamma=0 (additive), gamma=1 (multiplicative))
Sensitivity Analysis
Table 4.6 provides the results of the sensitivity analysis for both the relative risk
and mixture models. Estimates of γ from the relative risk models are significantly
lower for cohort studies than case-control studies, indicating less evidence of a sim-
ple multiplicative relationship for cohort studies. The mixture model analysis also
appears to show a clearer difference between type of study. Given a choice of ei-
ther an additive or multiplicative model, the probability of an additive model for
Table 4.5: Results of Synergy Index (S) and Multiplicativity Index (V)
Study No.  S              V              P(S > 1)  P(V < 1)
1          2.16           1.04           0.97      0.58
           (0.98, 4.23)   (0.31, 2.51)
2          3.05           1.51           0.94      0.37
           (0.75, 7.51)   (0.31, 4.19)
3          1.62           0.85           0.87      0.73
           (0.69, 3.44)   (0.20, 2.46)
4          1.94           0.98           0.93      0.65
           (0.77, 4.12)   (0.23, 2.82)
5          1.50           0.86           0.91      0.72
           (0.80, 2.45)   (0.22, 2.37)
6          2.67           1.21           0.97      0.45
           (0.98, 5.34)   (0.33, 2.99)
7          1.56           0.96           0.90      0.63
           (0.76, 2.83)   (0.28, 2.28)
8          1.52           0.85           0.81      0.74
           (0.62, 3.27)   (0.26, 2.05)
9          1.25           0.74           0.71      0.80
           (0.62, 2.37)   (0.20, 1.99)
10         0.94           0.62           0.25      0.92
           (0.56, 1.70)   (0.23, 1.36)
11         2.42           1.04           0.98      0.56
           (1.10, 4.50)   (0.30, 2.45)
12         1.03           0.80           0.47      0.77
           (0.51, 2.28)   (0.19, 2.33)
13         4.91           1.49           0.99      0.33
           (1.55, 9.24)   (0.36, 3.97)
14         2.43           0.78           0.97      0.76
           (0.93, 5.40)   (0.16, 2.46)
15         3.38           1.09           0.99      0.53
           (1.44, 5.54)   (0.34, 2.59)
16         1.71           0.63           0.96      0.88
           (0.94, 3.10)   (0.22, 1.64)
17         1.45           0.72           0.84      0.82
           (0.71, 2.90)   (0.20, 1.90)
18         3.98           1.34           0.99      0.41
           (1.34, 7.52)   (0.33, 3.51)
Overall    1.94           0.83           1.00      0.79
           (1.29, 2.84)   (0.46, 1.40)
Note: Estimates quoted are the mean estimate, below which is the 95% Credible Interval.
case-control studies is 0.02 (0.00, 0.10), and for cohort studies is 0.52 (0.42, 0.62).
This difference in results by type of study is in agreement with previous evidence
(Liddell, 2002) and analysis using S and V.
Table 4.6: Sensitivity Analysis
                            Relative risk models                           Mixture Model
Sensitivity analysis        Rgj            Rlg            Ra
                            γ∗             γ              γ                P(Additive)
Type of Study   CC          0.88           0.79           0.41             0.02
                            (0.84, 0.92)   (0.71, 0.87)   (0.38, 0.44)     (0.00, 0.10)
                PP          0.74           0.50           0.25             0.52
                            (0.72, 0.76)   (0.47, 0.52)   (0.23, 0.27)     (0.42, 0.62)
Note: Estimates quoted are the mean estimate, below which is the 95% Credible Interval.
4.6 Discussion
We reviewed the literature for quantitative information on the combined effect of
exposure to asbestos and smoking on lung cancer, and explored a Bayesian approach
to assess evidence of interaction. The overall conclusion is that the relation is more
than additive and less than multiplicative, a result consistent with recent reviews of
the literature.
While a conceptual basis for assessing interaction (i.e. evidence for one relation
against another) is well known (Greenland and Rothman, 1998), in general, tests
for interaction and the interpretation of results remain topics of some debate (UNSCEAR,
1982). Incorrect approaches to test for interaction appear frequently in the
literature (Hallqvist et al., 1996). Much of the difficulty with interpretation of results
is found in the ambiguity between the meaning of interaction from a statistical and
biological perspective. A further difficulty is that, as many studies are underpowered
to assess interaction, assessments of strength of interaction rather than statistical
significance may be important (Saracci and Boffetta, 1994).
Hallqvist et al. (1996) describe some of the problems with approaches which have
been used in the literature. For example, an inappropriate approach is to compare a
higher cumulative incidence of a joint exposure to that observed for either risk factor
separately and infer that one risk factor is exacerbating the effect of the other, since
the relationship of each risk factor to a joint exposure may be less than additive.
Another common approach to assessing interaction is to include a product term in a
logistic or log-linear regression. As both types of regressions assume a multiplicative
form, including an interaction term assesses departure from a simple multiplicative
model but provides no information in support of an additive relation.
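The point about the product term can be made explicit with a short derivation. This is a minimal sketch under assumed notation: the binary exposure indicators x_A and x_S and the coefficient names are illustrative and are not taken from the studies reviewed.

```latex
\begin{align*}
\operatorname{logit} P(D = 1 \mid x_A, x_S)
  &= \beta_0 + \beta_A x_A + \beta_S x_S + \beta_{AS}\, x_A x_S, \\
\mathrm{OR}_{AS}
  &= \exp(\beta_A + \beta_S + \beta_{AS})
   = \mathrm{OR}_A \, \mathrm{OR}_S \, \exp(\beta_{AS}).
\end{align*}
```

Under this parameterisation, β_AS = 0 corresponds exactly to a multiplicative relation and exp(β_AS) measures departure from it. An additive relation, OR_AS = OR_A + OR_S − 1, implies exp(β_AS) = (OR_A + OR_S − 1)/(OR_A OR_S), a value that changes with the main effects, so no single value of β_AS identifies additivity.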
As outlined in the introduction, our aim in this chapter was to allow joint infer-
ences to be made about the strength of evidence for an additive or multiplicative
relation. Although the power transformation estimate from the relative risk model
provides an appropriate link function there is uncertainty about the interpretation
of evidence for one relation over another. The mixture model, on the other hand,
directly provides information about the preference of one relation over another in
the form of a probabilistic statement. This approach could easily be extended to
include alternative relations.
As outlined in Section 4.4 for each model, the prior distributions chosen for
our parameters of interest were, in the absence of available information, relatively
uninformative. Where information is available alternative assumptions can be made,
but we did not explore them here. Informative priors could be used in cases where
studies are not independent or where expert opinion is available about parameter
values.
The meta-analysis considered here included studies reported in two recent reviews
of the literature by Lee (2001) and Liddell (2001). A search of the MEDLINE
reference database (1998 - May 2006) and cited references for more recent studies
revealed a number of papers with information relating to occupational exposure to
asbestos, smoking habits, and the association of these factors with lung cancer risks.
However, only one study provided quantitative information about relative risks for
each exposure category (Gustavsson et al., 2002). The results from one other study
were based on a cohort previously included (Liddell and Armstrong, 2002), and
for another study little detailed information was provided on the joint exposure to
asbestos and smoking (?). Some studies provided insufficient information on smoking
habits (Rafnsson and Sulem, 2003; Ulvestad et al., 2002; Goldberg, 1999; Stayner
et al., 1997; ?), while other studies were underpowered to assess evidence of a joint
effect (Rosamilia et al., 1999) or were genetically based (Schabath et al., 2002). A
study by Goldberg (1999) reported that “the probability that a cancer is due to
asbestos is the same among smokers and non-smokers”, implying a multiplicative
relation was found, but insufficient quantitative information was provided to allow
its incorporation into the meta-analysis considered here.
The case-control study by Gustavsson et al. (2002) investigated the association
between low-dose exposure to asbestos and lung cancer, and in the analysis of the
combined effect of asbestos and smoking, found more evidence for an additive
relation than for a multiplicative one. Relative risk estimates (with 95% confidence
intervals) were reportedly RRS=21.8 (14.4, 32.8), RRA=4.2 (1.6, 11.1), RRAS=28.6
(19.9, 48.3). Departure from multiplicativity was investigated in the study by includ-
ing an interaction term in a logistic regression (βAS=0.31 (0.11,0.86)), and departure
from additivity was evaluated using the Synergy Index (1.15 (0.77, 1.72)).
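As a rough check on these figures, the two indices can be computed directly from the quoted point estimates. The sketch below assumes the usual definitions, S = (RR_AS − 1)/((RR_A − 1) + (RR_S − 1)) and V = RR_AS/(RR_A × RR_S); the function names are illustrative only.

```python
def synergy_index(rr_a, rr_s, rr_as):
    """Synergy Index S: joint excess risk over the sum of the single-exposure excess risks."""
    return (rr_as - 1.0) / ((rr_a - 1.0) + (rr_s - 1.0))


def multiplicativity_index(rr_a, rr_s, rr_as):
    """Multiplicativity Index V: joint relative risk over the product of the single-exposure risks."""
    return rr_as / (rr_a * rr_s)


# Point estimates quoted above for Gustavsson et al. (2002)
rr_s, rr_a, rr_as = 21.8, 4.2, 28.6
s = synergy_index(rr_a, rr_s, rr_as)
v = multiplicativity_index(rr_a, rr_s, rr_as)
print(f"S = {s:.2f}, V = {v:.2f}")  # prints "S = 1.15, V = 0.31"
```

Both values agree, to two decimal places, with the Synergy Index and the interaction estimate reported for that study. S only slightly above 1 together with V well below 1 is the pattern expected when the data sit closer to an additive than to a multiplicative relation.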
There are limitations associated with combining studies in the form of a meta-
analysis. Meta-analysis is designed to facilitate combination of results from studies
which are comparable in outcome and exposure. The inclusion of studies is inevitably
a choice that directly impacts on the content and applicability of the results. Here
we have combined studies with variability in, but not limited to, type of study,
country of study, size of study, definitions of non-smokers, exposure times to asbestos
and exposure to different types and size of asbestos particles. For the first factor
(type of study), we found much greater evidence for a multiplicative relation using
information from case-control than cohort studies, which is consistent with the S and
V analyses and was also noted by Liddell (2001).
There are also limitations to assessing interaction. First, for studies of exposure
to asbestos and smoking, the small number of lung cancer cases for non-smokers
greatly increases the uncertainty of any estimated association. Second, Greenland
and Rothman (1998) suggest that even large data sets may not provide enough
information to establish relationships among variables while controlling confounding.
In the present case, we have relied on summary statistics from studies, and derived
unadjusted relative risk estimates, which is far from the ideal construction of a model
with complete control of covariates such as the extent and duration of smoking,
exposure to other lung carcinogens, etc. Third, and consistent with a multi-stage
model of carcinogenesis, the form of interaction observed may be influenced by the
length of follow-up time in studies (Archer, 1988).
An assessment of interaction is also a function of dosage levels for each risk factor,
both in the nature of the functional form assumed for dose-response relationships for
each factor and the dosage levels at which they combine. In the case of continuous
covariates, care must be taken to consider the appropriate dose-response relationship
for each factor individually before assessing the combined effect. Here we
have used categorical covariates (exposed versus not exposed) for the dosage levels
of each factor, and the dose-response relationship is difficult to model explicitly.
Our main interest is then the extent to which the risk factors combine at this binary
level. Although the definitions of those exposed and not exposed are subject to
cutoff points we should still be able to see evidence for a multiplicative or additive
relation provided the definitions are consistent across studies. However, a limitation
of such a binary classification is that the power to test interactions is essentially
determined by the size of the smallest category, so the small number of lung cancer cases
among non-smokers suggests that an analysis based on a binary classification is likely to be
weaker than one based on continuous data.
Chapter 5
Spatial and temporal modelling of Ross River virus in
Queensland
In this chapter, we examine a mixture model approach to characterise the risk of
Ross River virus (RRv) in Queensland from 1984 to 2001. At the time of analysis
(2005), these were the most recent data available for all of Queensland. The mixture model
approach builds on the approach adopted by Gatton et al. (2004), and considers
that the weekly cases of RRv could be attributed to more than two hypothesised
periods (outbreak or no outbreak period), and also extends the analysis to compare
the number of periods across non-homogenous spatial regions of Queensland.
5.1 Introduction
Ross River virus (RRv), also known as Epidemic Polyarthritis, is a debilitating
disease and is the most prevalent vector-borne disease in Australia (Lin et al., 2002).
It was first identified in 1958 from mosquitoes collected at Ross River, Townsville, by
the Queensland Institute of Medical Research and since then has become common
in Queensland. The virus can survive and replicate in humans and other vertebrate
hosts, and is transmitted by a variety of mosquito vectors (Russell and Dwyer,
2000). The disease in humans is nonfatal and infections can be either asymptomatic
or symptomatic, with symptoms including polyarthritis, rash, fever, myalgia, and
lethargy (Harley et al., 2001).
There has been much recent research into the spatial and temporal nature of
Ross River virus in Queensland (Gatton et al., 2004; Kelly-Hope et al., 2004; Tong
and Hu, 2002). A recent paper by Gatton et al. (2004) focussed on the spatial
and temporal nature of outbreak periods, where outbreak periods are defined by
comparison against long term incidence rates specific to that area. The spatial and
temporal nature of outbreak periods is of public health importance as increased
understanding will lead to more targeted public health interventions (Tong, 2004).
In this chapter, we use a Bayesian mixture model to characterise outbreaks in
weekly cases of Ross River virus in Queensland from 1984 to 2001. RRv notification
data was obtained from the Communicable Diseases Section of Queensland Health.
An exploratory analysis revealed an association between climate variables and cases
of RRv, so we aggregated the data to fifteen homogenous climate zones representing
Queensland.
The mixture model allows us to separate the RRv data over time into a number
of states or components, where the number of components is unknown a priori. This
is an extension of previous work on RRv which has focussed on only two components
or states, a non-outbreak state (background) and an outbreak state, with the latter
state associated with a higher mean value of cases than the former. Evidence for
three components may indicate an additional state to these two, and we could call
this a ‘hyper-outbreak’ state. It is less clear how to interpret data best fitted by a
model with four or more components. The method also provides, for each week in the
data set, the probability of belonging to each component (state), and thereby avoids
possibly subjective decision rules.
The choice between competing models with different numbers of components invariably
involves a selection criterion that takes into account both measures of
fit and complexity. In this chapter we use methodology developed in Celeux et al.
(2003) and choose between competing models based on Deviance Information Cri-
terion (DIC) estimates. The parameters for the different models were estimated
by Markov Chain Monte Carlo (MCMC) using the software package WinBUGS
(Spiegelhalter et al., 2002).
We focussed the analysis on two different climate zones which appeared to display
different temporal behaviour, and found much variability in the results, with a higher
number of components preferred for data from the zone which appeared to show a
more distinctive pattern.
We then fitted a mixture model to each of the remaining zones and compared
the variability in the number of components and associated parameter estimates.
5.2 Method
5.2.1 Data
Ross River virus disease notification data from 1984 to 2001 was obtained from the
Communicable Diseases section of Queensland Health. A notification was reported
if serologic testing indicated a four-fold change in antibody titer between paired
acute and convalescent sera, or if IgM and IgG antibody levels against RR virus
were consistent with acute infection. Each complete notification included place
of residence (location and street/road), date of onset, age and sex of the patient.
Place of residence was further geocoded by the Queensland Department of Local
Government and Planning into Statistical Local Areas (SLA) and later grouped to
Local Government Areas (LGA).
An exploratory analysis of the data at the residence level and recent research
indicate a strong relationship between the incidence of RRv and climate-related
variables such as rainfall, temperature, humidity, Southern Oscillation Index,
and sea levels (McFallan, 2001; Tong and Hu, 2002; Kelly-Hope et al., 2004). On
this basis, we decided to aggregate the data to 15 climate zones as identified by the
Australian Bureau of Meteorology (See Figure 5.1).
A summary of the data for the fifteen climate zones is provided in Table 5.1.
As an example of the variability of the data over time between the different
zones, Figures 5.2 & 5.3 show the weekly number of RRv cases over time for Zones
15 and 5 respectively. The data from Zone 15 appears to show a more distinctive
Figure 5.1: Queensland climate zones - Bureau of Meteorology
Table 5.1: Summary results - all zones
Zone   Min    Q1     Median  Mean    Q3      Max
1      0.00   0.00   0.00    0.08    0.00    2.00
2      0.00   0.00   0.00    0.14    0.00    5.00
3      0.00   0.00   0.00    0.33    0.00    7.00
4      0.00   0.00   2.00    4.17    5.00    42.00
5      0.00   0.00   2.00    5.10    5.00    59.00
6      0.00   0.00   1.00    2.07    2.00    37.00
7      0.00   0.00   1.00    3.21    3.00    48.00
8      0.00   0.00   0.00    0.85    1.00    14.00
9      0.00   0.00   0.00    0.29    0.00    9.00
10     0.00   0.00   0.00    0.42    0.00    20.00
11     0.00   0.00   0.00    0.07    0.00    3.00
12     0.00   0.00   0.00    0.73    1.00    15.00
13     0.00   0.00   1.00    2.07    2.00    59.00
14     0.00   0.00   1.00    2.33    2.00    44.00
15     0.00   2.00   5.00    14.42   11.00   307.00
outbreak pattern than from Zone 5, and thus it is of interest to see how the results
of applying a mixture model to these zones separately may differ.
5.2.2 Mixture models
The use of mixture distributions comprising a finite or infinite number of compo-
nents, possibly of different distributional types, to describe different features of data
has attracted a great deal of recent research interest (Marin et al., 2005; McLachlan
and Peel, 2000a).
The mixture model can be formulated as,
\[
p(y \mid \theta) = \sum_{j=1}^{k} w_j f(y \mid \theta_j), \qquad \sum_{j=1}^{k} w_j = 1, \quad k > 1 \tag{5.1}
\]
Figure 5.2: Time plot of weekly cases - Zone 15 (horizontal axis: time, 1985 to 2000; vertical axis: cases)
where k is the number of components, wj is the probability of being allocated to
component j, and where the allocation of each observation yi to one of the components
is represented by a latent variable zi (zi ∈ N (discrete case))
\[
p(z_i = j) = w_j, \qquad Z \sim \mathrm{Multinomial}(1, p_1, \dots, p_k) \quad \text{(with } p_j = w_j\text{)} \tag{5.2}
\]
Although the data are correlated and a mixture model assumes that the data (y)
are i.i.d., we adopt this simplifying assumption for now and consider its
implications in the discussion.
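The formulation in Equations (5.1) and (5.2) can be sketched in a few lines of code. The example below is illustrative only: it assumes normal component densities for f, and the weights, means and standard deviations are hypothetical values rather than estimates from this chapter.

```python
import math
import random


def normal_pdf(y, mean, sd):
    """Density of a Normal(mean, sd^2) component, standing in for f(y | theta_j)."""
    return math.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def mixture_pdf(y, weights, means, sds):
    """Equation (5.1): weighted sum of the component densities."""
    return sum(w * normal_pdf(y, m, s) for w, m, s in zip(weights, means, sds))


def sample_mixture(n, weights, means, sds, rng):
    """Equation (5.2): allocate z_i with p(z_i = j) = w_j, then draw y_i from component z_i."""
    draws = []
    for _ in range(n):
        z = rng.choices(range(len(weights)), weights=weights)[0]
        draws.append((z, rng.gauss(means[z], sds[z])))
    return draws


# Hypothetical two-component example (values are illustrative only)
weights, means, sds = [0.7, 0.3], [0.0, 3.0], [1.0, 0.5]
rng = random.Random(1)
draws = sample_mixture(5000, weights, means, sds, rng)
share_second = sum(z == 1 for z, _ in draws) / len(draws)
print(f"density at y = 0: {mixture_pdf(0.0, weights, means, sds):.3f}")
print(f"share allocated to component 2: {share_second:.2f}")
```

Sampling through the latent allocation z makes the role of the weights concrete: over many draws, the share of observations allocated to each component approaches the corresponding wj.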
The choice between competing models with different numbers of components invariably
involves a selection criterion that takes into account both measures of fit and
Figure 5.3: Time plot of weekly cases - Zone 5 (horizontal axis: time, 1985 to 2000; vertical axis: cases)
complexity. For example, a mixture model with a large number of components may
fit the data well, but suffer from a lack of interpretability of the parameters. In this
chapter we use methodology developed in Celeux et al. (2003) and choose between
competing models based on Deviance Information Criterion (DIC) estimates. As the
aim of the analysis is to interpret the components as identifying separate groupings
in the data with different levels, a further criterion for model choice in this context
is also to ensure the means of the components are well separated.
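To illustrate how fit and complexity enter the criterion, the sketch below computes the standard DIC of Spiegelhalter et al. (2002), DIC = D̄ + pD with pD = D̄ − D(θ̄), for a deliberately simple model: i.i.d. normal data with known standard deviation. This is an assumption-laden toy, not the mixture-specific versions discussed by Celeux et al. (2003), and the posterior draws are simulated directly rather than produced by MCMC.

```python
import math
import random
import statistics


def deviance(y, mu, sigma):
    """D(mu) = -2 log-likelihood for i.i.d. Normal(mu, sigma^2) data with sigma known."""
    n = len(y)
    return n * math.log(2.0 * math.pi * sigma ** 2) + sum((v - mu) ** 2 for v in y) / sigma ** 2


def dic(y, mu_draws, sigma):
    """Return (DIC, pD) from posterior draws of mu."""
    d_bar = statistics.fmean(deviance(y, m, sigma) for m in mu_draws)  # posterior mean deviance
    d_hat = deviance(y, statistics.fmean(mu_draws), sigma)             # deviance at the posterior mean
    p_d = d_bar - d_hat                                                # effective number of parameters
    return d_bar + p_d, p_d


rng = random.Random(0)
sigma, n = 1.0, 200
y = [rng.gauss(2.0, sigma) for _ in range(n)]
# Stand-in for MCMC output: draws from the flat-prior posterior Normal(ybar, sigma^2 / n)
y_bar = statistics.fmean(y)
mu_draws = [rng.gauss(y_bar, sigma / math.sqrt(n)) for _ in range(2000)]
dic_value, p_d = dic(y, mu_draws, sigma)
print(f"DIC = {dic_value:.1f}, pD = {p_d:.2f}")
```

With a single free parameter, pD should come out close to 1, which is the sense in which it counts the effective number of parameters. For mixture models the choice of plug-in estimate is less straightforward, which is part of the motivation for the alternative definitions in Celeux et al. (2003).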
Application to RRv data
We first fitted a Poisson distribution to the RRv data for a number of the zones,
and found that due to the large number of zeros, this distribution does not offer a
good fit (see Figure 5.4). There is a range of methods available to handle a large
number of zeros in count data (see, for example, Dalrymple et al. (2003)).
We chose to let log(yt+1) follow a truncated normal distribution.
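A back-of-envelope check using Table 5.1 shows the scale of the zero-inflation problem. For Zone 5 the mean weekly count is 5.10 while the first quartile is 0, so roughly a quarter or more of the weeks recorded no cases. The sketch below compares this with the zero probability implied by a Poisson distribution with the same mean; the variable names are illustrative.

```python
import math


def poisson_zero_prob(mean):
    """P(Y = 0) for a Poisson distribution with the given mean."""
    return math.exp(-mean)


zone5_mean = 5.10            # mean weekly cases for Zone 5 (Table 5.1)
observed_zero_lower = 0.25   # Q1 = 0, so about a quarter or more of weeks had no cases
p0 = poisson_zero_prob(zone5_mean)
print(f"Poisson P(Y = 0) = {p0:.4f} versus an observed share of about {observed_zero_lower:.2f} or more")
```

A Poisson model with this mean expects zero-case weeks well under 1% of the time, far below what the quartiles indicate.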
For RRv data in each zone we specified,
\[
\log(y_t + 1) \sim \mathrm{TN}(\mu_{Z_t}, \tau), \qquad Z_t \sim \mathrm{Multinomial}(P_{1,\dots,k})
\]
and priors
\[
\mu_{Z_t} \sim N(1, 0.01), \qquad P_{1,\dots,k} \sim \mathrm{Dirichlet}(\alpha), \qquad \tau \sim \mathrm{Gamma}(2, 3)
\]
where yt are the observed cases, τ is a precision parameter and µ is restricted to
µ1 < µ2 < . . . < µk to prevent label switching.
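Given values for the component means, precision and weights, the probability that a particular week belongs to each component follows from Bayes' rule. The sketch below is a minimal illustration under stated assumptions: the truncation point (zero, since log(y + 1) cannot be negative) is assumed rather than taken from the text, and all parameter values are hypothetical. In a full analysis these probabilities would be averaged over the MCMC draws rather than evaluated at a single set of values.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def truncated_normal_pdf(x, mu, tau, lower=0.0):
    """Normal(mu, 1/tau) density truncated below at `lower` (the bound is an assumption)."""
    if x < lower:
        return 0.0
    sd = 1.0 / math.sqrt(tau)
    density = math.exp(-0.5 * tau * (x - mu) ** 2) / (sd * math.sqrt(2.0 * math.pi))
    return density / (1.0 - normal_cdf((lower - mu) / sd))


def allocation_probs(cases, mus, tau, weights):
    """P(Z_t = j | y_t) for one week, given component means, precision and weights."""
    if any(a >= b for a, b in zip(mus, mus[1:])):
        raise ValueError("component means must satisfy mu_1 < mu_2 < ... < mu_k")
    x = math.log(cases + 1)
    terms = [w * truncated_normal_pdf(x, m, tau) for w, m in zip(weights, mus)]
    total = sum(terms)
    return [t / total for t in terms]


# Hypothetical parameter values on the log(y + 1) scale, chosen only for illustration
mus, tau, weights = [0.5, 2.0, 4.0], 2.0, [0.3, 0.5, 0.2]
for cases in (0, 10, 300):
    probs = allocation_probs(cases, mus, tau, weights)
    print(cases, [round(p, 3) for p in probs])
```

The ordering check mirrors the constraint on µ: without it, the labels of the components could be exchanged without changing the likelihood.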
The parameters for the different models were estimated by Markov Chain Monte
Carlo (MCMC) using the software package WinBUGS (Spiegelhalter et al., 2002).
Estimates are based on runs of 10,000 iterations, or until evidence of convergence.
Convergence was assessed by examining Monte Carlo error estimates and Gelman-Rubin
statistics (Brooks and Gelman, 1998b).
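For readers unfamiliar with the diagnostic, the sketch below computes the basic potential scale reduction factor from several chains of a single parameter. It is an illustration only: it is not necessarily the exact form of the statistic reported by WinBUGS, and the chains are simulated rather than taken from the models fitted here.

```python
import math
import random
import statistics


def gelman_rubin(chains):
    """Basic potential scale reduction factor for equal-length chains of one parameter."""
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    within = statistics.fmean(statistics.variance(c) for c in chains)  # W: mean within-chain variance
    between = n * statistics.variance(chain_means)                     # B: between-chain variance
    pooled = (n - 1) / n * within + between / n                        # pooled variance estimate
    return math.sqrt(pooled / within)


rng = random.Random(42)
# Three chains sampling the same distribution, and three chains stuck at different levels
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(3)]
stuck = [[rng.gauss(offset, 1.0) for _ in range(1000)] for offset in (0.0, 2.0, 4.0)]
print(f"well-mixed chains: R = {gelman_rubin(mixed):.3f}")
print(f"separated chains:  R = {gelman_rubin(stuck):.3f}")
```

Values close to 1 indicate that the between-chain and within-chain variability agree; values well above 1 indicate that the chains have not yet converged to a common distribution.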
5.3 Results
The results for Zones 5 and 15 are provided in Table 5.2. The estimates for µ have
been exponentiated for ease of interpretability with the original data.
Our model choice criteria are the DIC estimates (lowest being preferable), the
effective number of parameters (pd), and the separation of the means. For Zone
15 (Table 5.2), the estimates for four components and above show weak signs of
convergence in the MCMC runs (Brooks and Gelman, 1998b). This is also an indi-
cation that we could be overfitting. On the basis of this, the three component model
appears to be preferable as there is a reduction in the DIC estimate from the two
Figure 5.4: Histograms of data (log(y+1)) for all Zones (as numbered; one panel per zone, 1 to 15)
Table 5.2: Results for Zones 5 and 15
Zone   No. Components   µ       σ2     λ      pD     DIC
5      1                0.03    3.68   1.00   1.11   2134
       2                0.03    2.03   0.84   3.05   2093
                        17.41   0.97   0.16
       3                -       -      -      -      -
15     1                2.65    3.87   1.00   2.11   2745
       2                0.46    7.20   0.72   3.59   2711
                        6.39    0.86   0.28
       3                0.03    1.08   0.23   5.06   2701
                        6.12    1.08   0.66
                        61.55   1.08   0.11
       4                -       -      -      -      -
Note: - indicates non-convergence and is evidence of overfitting. pD is the effective number of parameters being used in the model from DIC calculations.
component model (2,711 (k = 2) to 2,701 (k = 3)), without a large increase in the
number of effective parameters (3.59 (k = 2) to 5.06 (k = 3)).
Figure 5.5 illustrates the fitted mixture model with three components for Zone
15 against a time series of the data. In this figure, we can see the three levels of
the time series corresponding to the means of the three components. Figure 5.6
shows a comparison of the fitted mixture model against a histogram of the data. In
comparison, the results from Zone 5 indicate that two components can be fitted to
the data over time.
Figure 5.7 illustrates the fitted mixture model with two components for Zone 5
against a time series of the data, and Figure 5.8 shows a comparison of the fitted
mixture model against a histogram of the data.
The results for all zones are shown in Table 5.3. For Zones 8, 10 and 12 the results
suggest only one component or group in the data over time; the results for Zones 4
to 7, 13 and 14 suggest two components, and the results for Zone 15 suggest three
components. For the other zones, the data were too disparate to apply a mixture
Figure 5.5: Plot of fitted mixture model for Zone 15 showing three components against the data over time (log values; horizontal axis: time, 1985 to 2000; vertical axis: log(casesz15+1)). Overall fitted density is shown in Black, and components in Red. Blue lines indicate the estimates of µ for the three components.
Figure 5.6: Plot of fitted mixture model for Zone 15 against a histogram of the data. Overall fitted density is shown in Black, and components in Red.
Figure 5.7: Plot of fitted mixture model for Zone 5 showing two components against the data over time (log values; horizontal axis: time, 1985 to 2000; vertical axis: log(casesz5+1)). Overall fitted density is shown in Black, and components in Red. Blue lines indicate the estimates of µ for the two components.
Figure 5.8: Plot of fitted mixture model for Zone 5 against a histogram of the data (density scale; horizontal axis: log(casesz5+1)). Overall fitted density is shown in Black, and components in Red.
model and draw any substantive conclusions (See Figure 5.4).
Table 5.3: Results for all Zones
Zone   No. Components†   µ       σ2       λ      pD
1      *                 *       *        *      *
2      *                 *       *        *      *
3      *                 *       *        *      *
4      2                 0.01    1.06     0.57   2.85
                         6.36    1.06     0.43
5      2                 0.03    2.03     0.84   3.05
                         17.41   0.97     0.16
6      2                 0.00    0.14     0.37   3.55
                         2.02    1.10     0.63
7      2                 0.02    1.82     0.86   2.63
                         8.85    1.21     0.14
8      1                 0.00    1.02     1.00   1.86
9      *                 *       *        *      *
10     1                 0.00    0.63     1      4.78
11     *                 *       *        *      *
12     1                 0.00    0.92     1      2.12
13     2                 0.00    0.15     0.36   4.54
                         1.75    1.17     0.64
14     2                 0.00    0.15     0.34   3.94
                         1.89    1.26     0.66
15     3                 0.03    1.0765   0.23   5.06
                         6.12    1.08     0.66
                         61.55   1.08     0.11
Note: † indicates the number of components best addressing the model choice criteria. * indicates the data range is too disparate to evaluate a mixture model. pD is the effective number of parameters being used in the model from DIC calculations.
The results in Table 5.3 suggest a spatial pattern to the data, with two
components identified for zones located on the coast (Zones 1,4-7,14,15) compared
to only one component for zones located inland. This spatial pattern for RRv is
supportive of previous evidence associating higher incidences of RRv with coastal
regions (Tong, 2004).
5.4 Discussion
We explored a Bayesian mixture model to analyse cases of RRv occurring in 15
climate zones throughout Queensland. We examined two of the zones in detail
and found a higher number of components preferred for data from the zone which
appeared to show a more distinctive pattern (Zone 15). A comparison across all the
zones suggests a higher number of components is identifiable from the data for zones
located along the coast of QLD.
There may be a number of explanations as to why we observe a number of
components or groups in the data over time. Further analysis of Zone 15 suggests
that if we take into account a possible change point in the data around 1991/92 due
to a change in notification practice, the number of components reduces from three to
two. We may also observe two or more components if there has been a substantive
increase in the magnitude of outbreaks over time.
We may also observe differences between the zones in terms of the mean (µ) and
weight (λ) associated with components. The means of the components
indicate the change in the level of the data between the components, and the
weights indicate the proportion of time spent in each component. Even for
zones with the same number of components the disparity between these parameter
estimates may be quite large.
Although we have used the simplifying assumption that the data are i.i.d., this is
not without implications. The most likely implication is that the standard errors
around our estimates are biased, and are likely to be understated. For this reason
no inference for the variance was made. This is in some sense a price to pay for the
approach we have adopted. Allowance for the correlated nature of the data is likely
to lead us away from the main aim of the analysis. The primary aim of the analysis
is to classify the data into groups based on changes to the level of the data, rather
than explain the correlation structure of the data.
Explaining the correlation structure of the data is also likely to be difficult. There
does not appear to be a consistent correlation structure to the data, either due to
changes in the magnitude of the seasonal cycles or from changes in the correlation
of the data from one period to another. In this case, allowing for the correlation
structure of the data is likely to disguise any changes in the levels that we may
otherwise observe. Alternative approaches, such as a Hidden Markov Model or
a Dirichlet Process mixture, have similar inferential difficulties, and their computational
issues are substantially more involved.
The analysis could be extended in a number of ways. Other distributional forms
could be assumed, to take account of the large number of zeros in the data, and
investigated to assess the difference in the results. Further analysis is also required
to compare the timing of the components across the zones, and we could further
reduce the number of zones into a grouping based on the timing and number of
components observed.
Chapter 6
Bayesian mixture model estimation of aerosol particle
size distributions
In Chapters 6, 7 and 8 we examine approaches to estimate a mixture model at
both single and multiple time points for aerosol particle size distribution (PSD)
data. In this chapter, for estimation of a mixture model at a single time point, we
use Reversible Jump MCMC to estimate mixture model parameters including the
number of components which is assumed to be unknown. We compare the results
of this approach to a commonly used estimation method in the aerosol physics
literature. As PSD data is often measured over time at small time intervals, we also
examine the use of an informative prior for estimation of the mixture parameters
which takes into account the correlated nature of the parameters.
6.1 Introduction
There has been recent interest in the estimation of particle size distributions of
aerosol particulate data (Makela et al., 2000; Birmili et al., 2001; Xu et al., 2002;
Whitby et al., 2002; Lu and Bowman, 2004; Hussein et al., 2005). In these pa-
pers, the interest in estimation is largely directed at better understanding aerosol
dynamic processes (i.e., coagulation, nucleation, condensation, and deposition) that
govern aerosol formation, as growth and evolution depend on the number, size,
and composition of particles. In the atmosphere, these aerosol characteristics de-
termine the influence of particles on health, climate, cloud formation, and visibility
(Seinfeld and Pandis, 1998). To examine the effects of these impacts, accurate and
computationally efficient estimates of the size and composition of the distribution
are required.
A number of different mathematical representations of size distributions exist,
including discrete, spline, sectional, modal, or monodisperse (Whitby and McMurry,
1997). Two of the most common approaches for representing size distributions are
sectional and modal methods. First introduced by Whitby (1978), a modal represen-
tation treats the aerosol size distribution as a set of individual, typically lognormal,
distributions or modes. Estimation of the modal representation commonly uses an
iterative least squares method (LSM) subject to certain conditions, such as main-
taining a minimum distance between mean estimates of two adjacent components
(for example, see Hussein et al. (2005)).
An alternative modal representation of the size distribution is a finite mixture
model. Mixture models have been the subject of much recent research (Diebolt
and Robert, 1994; McLachlan and Peel, 2000a; Marin et al., 2005; Richardson and
Green, 1997). The Bayesian paradigm for mixture modelling allows for probability
statements to be made directly about the unknown parameters and (perhaps) an
unknown number of components, prior knowledge and expert opinion to be included
in the analysis, and hierarchical descriptions of both local-scale and global features
of the model.
In this chapter, we analyse a sample of aerosol particulate data using a Bayesian
mixture model, and assess the performance of the method using actual and simu-
lated data. We then outline an approach for describing the evolution of the aerosol
particles over time, using an informative prior on a sample of data collected over
one day.
In Section 6.2, we briefly describe particle size distributions, and provide an
illustration with actual data. In Section 6.3 we outline the methodology of mixture
models, a Gibbs sampling algorithm to estimate the mixture, and a variation to
account for the truncation of the data. In Section 6.4 we present the results of applying
the Bayesian mixture model to some simulated and actual datasets and compare the
results to those obtained by LSM.
6.2 Particle size distribution data
One of the most important physical properties of aerosol particles is their size, and
the concentration of particles in terms of their size is referred to as the particle
size distribution. Figure 6.1 shows an example of particle size distribution data
for one measurement or time period. Because aerosol particles are often charged,
their size can be determined from their electrical mobility (McMurry, 2000). A
common instrument that utilizes this principle is the Differential Mobility Particle
Sizer (DMPS). The DMPS includes three main parts: (1) an aerosol particle charger
that produces a steady-state charge distribution for the aerosol particle sample (e.g.
Wiedensohler, 1988; Adachi et al., 1985; Hussin et al., 1983), (2) a differential mobility
analyzer (DMA) that separates aerosol particles according to their electrical mobility
(e.g. Hewitt, 1957; Knutson and Whitby, 1975), and (3) a particle counter to count
the number concentrations of the separated aerosol particles after the DMA.
Based on their formation processes, aerosol particles are either primary or sec-
ondary. Primary aerosol particles are directly emitted into the atmosphere or formed
in the atmosphere by condensation or coagulation without chemical reactions. On
the other hand, secondary aerosol particles are formed in the atmosphere by gas-to-
particle conversion processes. Growth of aerosol particles occurs through coagulation
and condensation of hot vapors (e.g. Kulmala et al., 2004). However, the rate of
coagulation depends on the already existing particle number concentration whereas
the rate of condensation depends on the surface area of aerosol particles. Therefore,
particles do not normally grow above 1 µm because the condensation and
coagulation rates decrease as the particle size increases.
Figure 6.1: Histogram of data sampled from Hyytiala, Finland for a single time period. [Axes: x, particle diameter (log(Dp (nm))); y, density.]
In this study we present, as an example, the aerosol particle evolution before,
during, and after a new particle formation event at a Boreal Forest in Southern
Finland (Figure 6.2). This dataset was selected as it provides a wide ranging representation
of modes for particle size distributions (Dal Masso et al., 2005). Because
aerosol particles are governed by formation and transformation processes, they tend
to form well distinguishable modal features. For example, during background conditions
in the Boreal Forest the particle number size distribution of fine aerosols
(diameter < 2.5 µm) is bi-modal: an Aitken mode (below 0.1 µm) and an accumulation
mode (over 0.1 µm). During a new particle formation event a new particle
mode, which is commonly known as the nucleation mode, is formed in the atmosphere
with geometric mean diameter below 0.025 µm. However, in the urban atmosphere,
aerosol particles are more dynamic because of the different types and properties of
sources of aerosol particles and may show more than 3 lognormal modes. Typically
the number concentrations of aerosol particles in the urban background can be as
high as $5 \times 10^{4}$ cm$^{-3}$, and very close to a major road they often exceed $10^{5}$ cm$^{-3}$.
In general, aerosol particles have direct and indirect impacts on the Earth’s
climate. Investigating the modal structure of aerosol particles provides a better un-
derstanding about their dynamic behavior in addition to their effect on the climate.
New particle formation events in the background atmosphere can be one of the best
case studies aiming to understand the dynamical changes that take place in aerosol
particles from the very early stage of their size until they grow and further their
participation in cloud processes. It has been recently observed that new particle
formation can take place anywhere in the globe, further increasing the importance
of understanding the processes involved.
6.3 Methods
In this section, we outline an independent approach to estimating a mixture model
at a single time point using RJMCMC, a two stage approach to estimation of a
mixture over multiple time points, and an approach to estimate a mixture of normal
distributions where there is truncation present in the data.
Figure 6.2: An illustration of a new particle formation event at a Boreal Forest site located in Southern Finland. (a) The temporal variation of the particle number size distribution and (b) selected particle number size distributions showing the different stages of the newly formed particle mode from its early stage. Note that this new particle formation occurred on a regional scale over the southern part of Finland.
6.3.1 Mixture model at a single time point
The density of data (y) given by a finite mixture model can be represented by
$$
p(y \mid \theta) = \sum_{j=1}^{k} \lambda_j f(y \mid \theta_j) \qquad (6.1)
$$
where k is the number of components in the mixture, λj represents the probability of
membership of the jth component ($\sum_{j=1}^{k} \lambda_j = 1$), and f(y|θj) is the density function
of component j, which has parameters θj.
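As an illustration of Equation (6.1), the following minimal sketch evaluates a finite mixture density with normal components, which is the form used later in this chapter for log particle diameters. The function names and the numerical values are illustrative assumptions and are not estimates from the thesis.

```python
import numpy as np


def normal_pdf(y, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at y."""
    return np.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))


def mixture_density(y, weights, means, sds):
    """Evaluate p(y | theta) = sum_j lambda_j f(y | theta_j) for normal components."""
    y = np.atleast_1d(np.asarray(y, dtype=float))
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    sds = np.asarray(sds, dtype=float)
    if not np.isclose(weights.sum(), 1.0):
        raise ValueError("component weights must sum to 1")
    # Array of shape (n, k): one column of component densities per component.
    comp = normal_pdf(y[:, None], means[None, :], sds[None, :])
    # Weighted sum across components gives the mixture density at each y.
    return comp @ weights


# Illustrative (hypothetical) two-component mixture on a log-diameter scale.
grid = np.linspace(-5.0, 12.0, 20001)
dens = mixture_density(grid, [0.3, 0.7], [2.0, 4.0], [0.4, 0.6])
# Trapezoidal rule: a valid mixture density should integrate to about one.
area = float(np.sum(0.5 * (dens[1:] + dens[:-1]) * np.diff(grid)))
```

Checking that the density integrates to approximately one is a quick way to confirm that the weights and component densities have been combined correctly.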
As component membership of the data is unknown, the usual hierarchical frame-
work for the mixture model involves introducing the latent indicator variable z. In
this model, zi represents the unobserved component membership from which ob-
servation yi is drawn, and is treated as another parameter to be estimated in the
modelling procedure.
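To make the role of the latent indicator concrete, the sketch below draws each zi from its full conditional distribution, in which the probability of component j is proportional to λj f(yi|θj). This is the standard allocation step of a Gibbs sampler for normal mixtures; the function name and the example values are assumptions made for illustration.

```python
import numpy as np


def sample_allocations(y, weights, means, sds, rng):
    """Draw z_i with P(z_i = j) proportional to lambda_j * N(y_i | mu_j, sigma_j^2)."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(weights, dtype=float)
    m = np.asarray(means, dtype=float)
    s = np.asarray(sds, dtype=float)
    # Log of lambda_j * normal density, dropping the constant shared by all j.
    log_p = (np.log(w)[None, :] - np.log(s)[None, :]
             - 0.5 * ((y[:, None] - m[None, :]) / s[None, :]) ** 2)
    # Subtract the row maximum before exponentiating for numerical stability.
    log_p -= log_p.max(axis=1, keepdims=True)
    probs = np.exp(log_p)
    probs /= probs.sum(axis=1, keepdims=True)
    # Inverse-CDF draw: count how many cumulative probabilities fall below u.
    u = rng.random(y.shape[0])
    z = (probs.cumsum(axis=1) < u[:, None]).sum(axis=1)
    # Guard against rounding in the cumulative sum.
    z = np.minimum(z, w.size - 1)
    return z, probs


rng = np.random.default_rng(0)
y_example = np.array([1.0, 1.1, 5.0, 5.2])  # hypothetical log diameters
z, probs = sample_allocations(y_example, [0.5, 0.5], [1.0, 5.0], [0.3, 0.3], rng)
```

With components as well separated as in this example, the allocation probabilities are effectively zero or one, so the draws simply recover the nearer component.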
In this chapter, we transform the aerosol particle size distribution data using a
natural logarithm prior to fitting the mixture model (Whitby and McMurry, 1997).
In this case, the data (yi) are the natural log of particle diameters (nm), which are
assumed to be normally distributed, and the parameters (θj) to be estimated for each
component are therefore the mean (µj) and variance ($\sigma_j^2$), along with the component
weight (λj). The number of normal components, k, is also assumed to be unknown.
Priors were:
$$
\begin{aligned}
p(\mu_j) &\sim N(\xi, \kappa^{-1}) \\
p(\sigma_j^{-2}) &\sim \mathrm{Gamma}(\delta, \beta) \\
p(\beta) &\sim \mathrm{Gamma}(g, h) \\
p(\lambda) &\sim \mathrm{Dirichlet}(\alpha_1, \alpha_2, \ldots, \alpha_k) \\
p(k) &\sim \mathrm{Uniform}(k_{\min}, k_{\max})
\end{aligned}
$$
where ξ, κ, δ, α, η, g, h, kmin and kmax are fixed hyperparameters.
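The hierarchical prior above can be explored by simulating from it. The sketch below draws one set of parameters from the priors, assuming that both Gamma distributions are parameterised by shape and rate and that a common α is used for every Dirichlet parameter; neither assumption is stated in the text. The function name and the numerical values are illustrative only.

```python
import numpy as np


def draw_from_priors(rng, xi, kappa, delta, g, h, alpha, k_min, k_max):
    """Draw (k, mu, sigma^2, lambda) once from the hierarchical prior."""
    # k is uniform on the integers k_min, ..., k_max.
    k = int(rng.integers(k_min, k_max + 1))
    # beta ~ Gamma(g, h), taking h as a rate (assumption), so scale = 1 / h.
    beta = rng.gamma(shape=g, scale=1.0 / h)
    # sigma_j^{-2} ~ Gamma(delta, beta), again with beta as a rate (assumption).
    precision = rng.gamma(shape=delta, scale=1.0 / beta, size=k)
    # mu_j ~ N(xi, 1 / kappa), so the prior standard deviation is kappa^{-1/2}.
    mu = rng.normal(loc=xi, scale=kappa ** -0.5, size=k)
    # lambda ~ Dirichlet(alpha, ..., alpha), a symmetric choice (assumption).
    lam = rng.dirichlet(np.full(k, alpha))
    return {"k": k, "mu": mu, "sigma2": 1.0 / precision, "lambda": lam}


rng = np.random.default_rng(1)
R = 5.0  # hypothetical range of the log-transformed data
draw = draw_from_priors(rng, xi=3.5, kappa=1.0 / R**2, delta=1.0,
                        g=0.2, h=10.0 / R**2, alpha=2.0, k_min=1, k_max=10)
```

Repeating such draws many times gives a prior predictive view of the number, location and spread of components implied by a given set of hyperparameters.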
For estimation of the mixture model, we implemented Richardson & Green’s
(1997) Reversible Jump Markov Chain Monte Carlo (RJMCMC) algorithm (for details
see Appendix I). This approach is “fully” Bayesian in the sense that a posterior
distribution for the unknown number of components (k) in the mixture model is estimated,
rather than using a comparative measure, such as the Bayesian Information
Criterion, to assess the fit of mixture models of different dimensions. In Section 6.4,
we compare the results of applying the RJMCMC algorithm to results obtained
using the LSM for measurements at particular time periods.
Mixture model estimation over multiple time periods
To examine aerosol dynamic processes, measurements of aerosol particle size dis-
tribution data are often taken regularly over time with the measurement intervals
typically ranging from 5 minutes to an hour. For the smaller measurement inter-
vals the data and associated parameter estimates are likely to be highly correlated
across time. Most estimation of the particle size distribution data does not take into
account the likely correlation between parameters other than in most cases using
the previous parameter estimates as starting values in an optimisation routine for
the current period. Allowing for correlation between the parameter estimates over
time is likely to lead to improvements in estimation, inference and efficiency.
One approach to this problem is to extend the RJMCMC mixture model de-
scribed in Section 6.3.1 to allow for evolution of parameters θjt over time periods t,
t = 1, . . . , T with kt, the number of mixture components at time t, also unknown
and possibly unequal. This single modelling approach requires reversible moves not
only within time periods but also across them. In our experience this was computa-
tionally very costly and required substantial pre-processing to ensure good mixing,
labelling and convergence. Moreover, further post-processing was required to obtain
adequate summary statistics and between component mapping.
As an alternative, we adopted a two-stage approach to estimation of a mixture
model over multiple time points. In the first stage, for each time period t, we
implemented the RJMCMC algorithm of Section 6.3.1 and estimated $k_t$. We then
calculated $k' = \max_{t=1,\ldots,T}(k_t)$. In the second stage, we fixed $k_t = k'$ and estimated
$\theta_{jt} = (\mu_j, \sigma_j, \lambda_j)$; $j = 1, \ldots, k'$; $t = 1, \ldots, T$. As we do not observe all of the $k'$
components in every time period, we allowed component weights to be “effectively”
zero ($\inf(\lambda_t) = 0.001$) if required.
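The bookkeeping between the two stages can be sketched as follows, assuming that kt is summarised by its posterior mode (the value sampled most often at time t) and that weights falling below the floor of 0.001 are raised to it and then renormalised. Both choices, and the function names, are assumptions made for illustration rather than details given in the text.

```python
import numpy as np


def stage_one_summary(k_draws_by_period):
    """Return the posterior mode of k_t for each period and k' = max_t k_t."""
    k_hat = []
    for draws in k_draws_by_period:
        values, counts = np.unique(np.asarray(draws), return_counts=True)
        # Posterior mode: the value of k sampled most often in this period.
        k_hat.append(int(values[np.argmax(counts)]))
    return k_hat, max(k_hat)


def floor_weights(lam, floor=0.001):
    """Raise weights below `floor` to `floor`, then rescale so they sum to one."""
    lam = np.maximum(np.asarray(lam, dtype=float), floor)
    return lam / lam.sum()


# Hypothetical RJMCMC draws of k for three time periods.
k_draws = [np.array([3, 3, 4]), np.array([5, 5, 4, 5]), np.array([2, 2, 2])]
k_hat, k_prime = stage_one_summary(k_draws)
lam_t = floor_weights([0.5, 0.5, 0.0])
```

Fixing the number of components at k' in every period keeps the labelling of components comparable over time, at the cost of carrying near-empty components in periods where fewer modes are present.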
In the second stage of this algorithm, we considered two sets of priors. The
first was the set of independent priors:
$$
p(\lambda) \sim \mathrm{Dirichlet}(\alpha_1, \ldots, \alpha_k); \quad
p(\mu_j \mid \sigma_j^2) \sim N\left(\xi_j, \frac{\sigma_j^2}{n_j}\right); \quad
p(\sigma_j^2) \sim IG\left(\frac{v_j}{2}, \frac{s_j^2}{2}\right),
$$
where αj, ξj, nj, vj and sj are fixed hyperparameters. The second
allowed for temporal correlation. In the case of a Gaussian mixture model, we have
three parameters (µ, σ and λ) for which we could utilise information from previous
time periods. In this chapter, we adopt one such informative prior for λ as it was
the parameter of most interest in the aerosol study. We note that alternative priors
for λ could be defined and that informative priors can be constructed for µ and σ
instead or on multiple parameters.
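Under the independent priors, the pair (µj, σj²) has a conjugate normal-inverse-gamma form, so a Gibbs update for one component is available in closed form once the observations allocated to it are known. The sketch below uses the standard conjugate result; it is not taken from the thesis, and the function names and example values are assumptions.

```python
import numpy as np


def posterior_params(y_j, xi_j, n_j, v_j, s2_j):
    """Parameters of the conditional posterior for one component.

    Prior: mu | sigma2 ~ N(xi_j, sigma2 / n_j), sigma2 ~ IG(v_j / 2, s2_j / 2).
    Returns (mean, n_post, shape, rate) such that
    sigma2 | y ~ IG(shape, rate) and mu | sigma2, y ~ N(mean, sigma2 / n_post).
    """
    y_j = np.asarray(y_j, dtype=float)
    m = y_j.size
    if m == 0:
        # No observations allocated: the posterior equals the prior.
        return xi_j, n_j, v_j / 2.0, s2_j / 2.0
    ybar = float(y_j.mean())
    ss = float(np.sum((y_j - ybar) ** 2))
    n_post = n_j + m
    mean = (n_j * xi_j + m * ybar) / n_post
    shape = (v_j + m) / 2.0
    rate = (s2_j + ss + (n_j * m / n_post) * (ybar - xi_j) ** 2) / 2.0
    return mean, n_post, shape, rate


def update_component(y_j, xi_j, n_j, v_j, s2_j, rng):
    """One Gibbs draw of (mu_j, sigma_j^2) given the data allocated to component j."""
    mean, n_post, shape, rate = posterior_params(y_j, xi_j, n_j, v_j, s2_j)
    # If X ~ Gamma(shape, rate) then 1 / X ~ IG(shape, rate).
    sigma2 = 1.0 / rng.gamma(shape=shape, scale=1.0 / rate)
    mu = rng.normal(loc=mean, scale=np.sqrt(sigma2 / n_post))
    return mu, sigma2


mean, n_post, shape, rate = posterior_params([1.0, 2.0, 3.0], 0.0, 1.0, 2.0, 2.0)
mu_draw, sigma2_draw = update_component([1.0, 2.0, 3.0], 0.0, 1.0, 2.0, 2.0,
                                        np.random.default_rng(2))
```

The posterior mean of µj is a weighted average of the prior mean ξj and the sample mean, with nj acting as a prior sample size, which is why the size of nj relative to the number of particles matters.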
Gustafson and Walker (2003) proposed a prior for λ that downweights large
changes in probabilities in successive periods. For time periods t = 2, . . . , T, define
$$
p(\lambda_t) \sim \mathrm{Dirichlet}(1, \ldots, 1)\, \exp\left( -\frac{\sum_{t=2}^{T} \sum_{j=1}^{J} (\lambda_{jt} - \lambda_{j,t-1})^{2}}{\phi} \right) \qquad (6.2)
$$
where smoothing increases as φ → 0.
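The effect of φ in Equation (6.2) can be seen by evaluating the log of the prior up to an additive constant. Because the Dirichlet(1, . . . , 1) term is constant on the simplex, only the squared-difference penalty remains. The function name and the example weights below are illustrative assumptions.

```python
import numpy as np


def log_smoothing_prior(lam, phi):
    """Log of the prior in Equation (6.2), up to an additive constant.

    lam is an array of shape (T, J) whose rows are the weight vectors lambda_t.
    """
    lam = np.asarray(lam, dtype=float)
    if phi <= 0:
        raise ValueError("phi must be positive")
    # Outside the simplex the Dirichlet density is zero.
    if np.any(lam < 0) or not np.allclose(lam.sum(axis=1), 1.0):
        return -np.inf
    # Rows of diffs are lambda_t - lambda_{t-1} for t = 2, ..., T.
    diffs = np.diff(lam, axis=0)
    return -float(np.sum(diffs ** 2)) / phi


steady = np.array([[0.8, 0.2], [0.8, 0.2]])  # weights unchanged between periods
jumpy = np.array([[0.8, 0.2], [0.2, 0.8]])   # weights swap between periods
lp_steady = log_smoothing_prior(steady, phi=0.05)
lp_jumpy = log_smoothing_prior(jumpy, phi=0.05)
lp_jumpy_loose = log_smoothing_prior(jumpy, phi=5.0)
```

Unchanged weights incur no penalty, whereas the same jump in weights is penalised far more heavily for a small φ than for a large one, which is the sense in which smoothing increases as φ → 0.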
A potential advantage of using information about estimates over the whole time
period (t = 1 . . . T ) is the additional information this may provide to guide parameter
estimates in the current period. This may be an advantage at times where large
changes in the parameter estimates are occurring for single time periods.
To sample from the posterior distribution of λ, Gustafson and Walker (2003) propose
a rejection sampling algorithm in which the candidate distribution is
$\mathrm{Beta}(m_{jt}^{(r)} + 1, m_{kt}^{(r)} + 1)$, where $m_{jt}^{(r)}$ is the number of observations allocated to component j or k
(where $j \neq k$) for time period t at iteration r. A limitation of this rejection sampling
scheme is that it becomes problematic for large sample sizes and we discuss this issue
later in relation to the results.
6.3.2 Accounting for truncated data
In this section we outline the use of a second latent variable to estimate a mixture
of normal distributions where there is truncation present in the data.
A feature of some of the data used to estimate particle size distributions is a
definitive lower and upper bound for the particle size. For example, particle con-
centrations may be measured with a range of particle size from 3nm to 650nm,
depending on the measurement device used. Preliminary investigation of some sam-
pled data from Hyytiala (Finland), revealed the possibility of there being truncation
of the data on the lower and upper bounds. Figure 6.1 shows a sample of particle
size distribution data which clearly illustrates truncation on the lower bound of the
data.
Measurement of aerosol particles is commonly observed in the form of a number
of distinct particle size ranges, or channels, the size and number of the channels
being governed by the type and setup of the measurement instrument (Hussein
et al., 2004). For example, in the sampled data from Hyytiala (See Section 6.4.2),
we observed 32 distinct size partitions (bins) covering the range from 3nm to 650nm.
For estimation of truncated normal distributions using Gibbs sampling, we took a
missing data latent variable approach and introduced a new variable ($\tilde{y} = (y, y^*)$)
to consist of the original data (y) in the largest and smallest size bins ($y_U$ and $y_L$)
and the assumed missing data ($y^*$) consisting of size measurements smaller than the
lower bound, and larger than the upper bound of the original data.
To extend the boundary of the range of the data to be included in ($\tilde{y}$), we created
an additional number of bins for sizes less than ($y_L$), and greater than ($y_U$). For
example, for the sampled data used for fitting in the next section, we created four
additional bins to the left of the original lower bound, and three to the right of the
original upper bound. The space between size bins was evenly spread in proportion
to the size between the original bins.
We then estimated the parameters of the mixture model using the original data
and ($\tilde{y}$). At the end of each iteration $y^*$ was reallocated using the current parameter
values.
An advantage of this latent variable approach is that the algorithm described in
Section 6.3.1 can be readily applied to estimate $y^*$ and the mixture parameters based
on $\tilde{y}$. The approach is generalisable to other missing data assumptions we might
have about the original data, although in this chapter we confine our approach to
the issue of truncation. For the purposes of this chapter we now refer to the analysis
allowing for truncated data as ‘truncated Normal’.
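The construction of the additional size bins described in this section can be sketched as follows, assuming that the original bin centres are evenly spaced on the natural log scale and that the extra bins continue with the same spacing. The text does not give the exact bin layout, so this spacing, the function name and the example grid are assumptions.

```python
import numpy as np


def extend_bins(log_centres, n_left, n_right):
    """Add bins beyond both ends of a grid of log-diameter bin centres."""
    log_centres = np.asarray(log_centres, dtype=float)
    # Use the average spacing of the original bins for the new bins.
    step = float(np.mean(np.diff(log_centres)))
    left = log_centres[0] - step * np.arange(n_left, 0, -1)
    right = log_centres[-1] + step * np.arange(1, n_right + 1)
    return np.concatenate([left, log_centres, right])


# Hypothetical grid: 32 bins covering 3 nm to 650 nm, evenly spaced in log(nm).
original = np.linspace(np.log(3.0), np.log(650.0), 32)
# Four extra bins below the lower bound and three above the upper bound.
extended = extend_bins(original, n_left=4, n_right=3)
```

The missing data y* would then be allocated to the new bins at each iteration using the current parameter values, while the counts in the original bins remain as observed.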
6.4 Results
In this section, we present the results of applying the RJMCMC algorithm outlined above
to a simulated and an actual dataset. For both datasets analysed we used a uniform
prior for k over the range k = 1, . . . , 10 and the following weakly informative hyperparameter
values: κ = 1/R²; α = 2; g = 0.2; h = 10/R² and δ = 1, where R equals the
range of the data. Hyperparameter values for α and g encourage similar values for
σj without being informative about their absolute size, and the value for κ reflects
a weak prior belief in ξ. Results were based on 200,000 iterations with a burn-in
period of 100,000.
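Because κ and h are tied to the range of the data, the hyperparameters can be computed directly from a sample. The sketch below does this for a hypothetical set of log diameters; the function name and the dictionary keys are assumptions, and no value is set for ξ because the text does not state one.

```python
import numpy as np


def weak_hyperparameters(y):
    """Data-dependent hyperparameters based on the range R of the data."""
    y = np.asarray(y, dtype=float)
    R = float(y.max() - y.min())
    return {
        "R": R,
        "kappa": 1.0 / R**2,  # prior precision of mu_j, so prior sd equals R
        "alpha": 2.0,
        "g": 0.2,
        "h": 10.0 / R**2,
        "delta": 1.0,
        "k_min": 1,
        "k_max": 10,
    }


hyper = weak_hyperparameters([1.0, 2.5, 3.7, 6.0])  # hypothetical log diameters
prior_sd_mu = hyper["kappa"] ** -0.5
```

Setting κ = 1/R² gives the component means a prior standard deviation equal to the range of the data, which is one way of seeing why the prior on µj is only weakly informative.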
6.4.1 Simulated data: single time point
In this section, we use simulated data to validate estimates from the RJMCMC
algorithm outlined above. The data comprised four components truncated on the
lower bound, with characteristics representative of the aerosol data described in
Section 6.3.2. Figure 6.3 shows the kernel density estimator of simulated data with
fitted results from normal and truncated normal approaches. The corresponding
posterior estimates of the component parameters and 95% credible intervals are
given in Table 6.1.
Figure 6.3: Kernel density estimator of simulated data (black) with fitted results from normal (dark green) and truncated normal (blue) approaches. Simulated data based on parameters: k = 4; µ = (1.40, 2.30, 3.70, 5.10); σ = 0.30; λ = (0.10, 0.10, 0.60, 0.20). [Axes: x, diameter (natural log scale); y, density.]
Due to the clear truncation of the data in Figure 6.3, we would expect the
truncated Normal distributions to fit the data better in the area of truncation
than a model that ignores this truncation. For both models the posterior estimate
for the number of components (k) was highest for four components (truncated:
P(k = 4) = 0.54, non-truncated: P(k = 4) = 0.93). The point estimates for the
parameters from the truncated Normal distribution for almost all parameters are
much closer to the true values than for estimates from the non-truncated version
(See Table 6.1). Ignoring truncation appears to result in less weight assigned to
the first component, with a mean value lower than the true value, and estimates
for standard deviation (σ1) and weight (λ1) smaller than the true values. For the
first component, the true value for the weight of the component (λ1) is 0.10, and
the mean estimates for the assumed non-truncated and truncated distributions are
0.0671 and 0.1017, respectively. In the non-truncated model, estimates for the sec-
ond and third components appear then to compensate for smaller estimates from the
first component, with standard deviation and weights for these components larger
than the true values. For the second component, the true value for σ2 is 0.30, and for
λ2 is 0.10. The mean estimates for the non-truncated distribution for σ2 and λ2 are
0.40 and 0.13, respectively. The results thus suggest that accounting for truncation
may not only result in a better fit for the associated component but also better fits
for neighbouring components.
Table 6.1: Estimated parameter values from Bayesian mixture model analysis using RJMCMC algorithm with simulated data. Based on 200,000 iterations with a burn-in of 100,000. CI = Credible Interval
                                      Posterior Estimates (Mean (95% CI))
Component  Parameter  True value  Normal              Truncated Normal
1          µ1         1.40        1.32 (1.26, 1.38)   1.44 (1.30, 1.88)
           σ1         0.30        0.18 (0.13, 0.21)   0.34 (0.22, 0.63)
           λ1         0.10        0.07 (0.05, 0.09)   0.10 (0.05, 0.20)
2          µ2         2.30        2.14 (2.03, 2.24)   2.33 (2.09, 3.69)
           σ2         0.30        0.40 (0.32, 0.46)   0.35 (0.21, 0.62)
           λ2         0.10        0.13 (0.10, 0.15)   0.12 (0.05, 0.44)
3          µ3         3.70        3.73 (3.68, 3.76)   3.75 (3.68, 3.94)
           σ3         0.50        0.53 (0.48, 0.58)   0.52 (0.45, 0.58)
           λ3         0.60        0.62 (0.58, 0.66)   0.60 (0.18, 0.66)
4          µ4         5.10        5.14 (5.03, 5.23)   5.13 (5.00, 5.23)
           σ4         0.50        0.44 (0.39, 0.50)   0.45 (0.40, 0.51)
           λ4         0.20        0.18 (0.15, 0.22)   0.19 (0.15, 0.23)
6.4.2 Case study: single time point
Figure 6.1 shows a plot of actual data, taken from measurements at Hyytiala, a
boreal forest site in Southern Finland (SMEAR II) (Vesala et al., 1998). This dataset
was selected as it provides a wide ranging representation of modes for particle size
distributions (Dal Masso et al., 2005). Figure 6.4 shows the results of fitting a normal
and truncated normal distribution to a single time period from this dataset.
Figure 6.4: Histograms of data sampled from Hyytiala, Finland with estimated overall fit and components for non-truncated Normal (left, k=4) and truncated normal (right, k=3) overlaid. [Axes: x, particle diameter (log(Dp (nm))); y, density.]
The non-truncated mixture model appears to fit two components with small
variance, whereas the truncated normal mixture model fits one component with
no apparent loss in fit. In practice, a result like this may suggest that there are
two sources for the smaller sized particles instead of one or alternatively a different
underlying aerosol process.
We can compare these with an iterative least squares method (LSM) commonly
used for estimation of the modal method in the aerosol literature. Different research
groups have developed their own algorithms and most involve some degree of user
input for the number of components (Makela et al., 2000; Birmili et al., 2001; Whitby
et al., 1991). For this reason, we chose the fully automated algorithm outlined by
Hussein et al. (2005), which compares favourably to Makela et al. (2000), and to
previous versions (Hussein et al., 2004). The aim here is not to comment on this
algorithm or on LSM as a methodology but rather to offer a brief comparison of
results in the context of this case study.
Figure 6.5 shows the results of fitting using LSM and our algorithm for a sample
of data from Hyytiala. The solid line is the predicted fit, and the dotted lines display
the components. The LSM appears to underfit both the small and large sizes of the
particle size distribution. Our algorithm identified two more modes to describe these
extremes of the particle size distribution, and the resulting five component model
appears to provide an improved fit.
Figure 6.6 again shows the results of fitting using the two approaches for a sec-
ond sample of data from Hyytiala. The fit of the LSM appears to ignore either a
second component (mean of 3.6nm) or skewness of the main component. From pre-
liminary investigation, this component gradually emerges around this time period.
Our algorithm provides a better fit for this second component and hence overall.
The main difference between the results of the two algorithms in the examples
that we have seen appears to be that our approach is performing a much more
thorough search of the parameter space including the number of components. By
using uninformative priors, our model choice criterion using RJMCMC is also largely
driven by the data, and is thus avoiding the use of any subjective influences on
model choice.
Figure 6.5: Histograms of data sampled from Hyytiala, Finland with estimated overall fit and components from RJMCMC (left) and LSM (right) overlaid. [Axes: x, particle diameter (log(Dp (nm))); y, density.]
Figure 6.6: Histograms of data sampled from Hyytiala, Finland with estimated overall fit and components from RJMCMC (left) and LSM (right) overlaid. [Axes: x, particle diameter (log(Dp (nm))); y, density.]
6.4.3 Results for mixture model estimation over multiple time points
In this section, we apply our two stage algorithm to data collected over one day from
Hyytiala, measured at 10 minute intervals (T=144). Figure 6.7 shows the results
of the first stage of the algorithm, with a plot of the posterior mean estimates for
µjt at each time point t, with the size of the circles indicating the corresponding
weight λjt. The average of the number of components estimated with the highest
probability over the day was four, and the largest number of components was five.
Figure 6.7: Plot of posterior mean values for µjt obtained from the first stage using the RJMCMC algorithm for one day (Hyytiala measurement station). The size of the circles indicates the weight (λjt) corresponding to µjt. [Axes: x, time (00:00 to 23:50); y, posterior mean estimates for µ.]
Based on these results, for the second stage of the analysis we set the number of
components to be five ($k' = 5$) and hyperparameters to be: ξ = (1.1, 2.0, 3.0, 4.0, 5.0);
$s_{tj}^2 = 2.0225$; $v_{tj} = 0.092025$; and $n_j = 200$. The hyperparameter values for ξ were
chosen to adequately represent the parameter space, and hyperparameter values for
$s_{tj}^2$ and $v_{tj}$ were chosen to indicate an uninformative prior and allow σjt to range
from 0 to 1.5. The value of nj was chosen to be large enough to prevent label
switching on the means, and small enough to have negligible influence on the posterior
estimates. Over the time period of interest, the average concentration of particles
was approximately 100,000 and ranged from 2,000 to 120,000. In light of this, the
value for nj is relatively small.
Figure 6.8 shows a plot of the posterior mean estimates for the parameters (µ and
λ) for each component over the course of the day, obtained from the second stage of
analysis using independent priors. The figure indicates a nucleation event occurring
around 08:00. Such events are a common feature of the data from this measurement
station (Sogacheva et al., 2005). Characteristic of such an event is a large increase in
the number of smaller sized particles (Nucleation < 20nm), which typically grow in
size over the next few hours to either the Aitken (25-90nm) or accumulation modes
(100+nm). From the parameter estimates (Figure 6.8), we see that most of the
weight for the mixture prior to 08:00 is in the third component (bottom panel, µ ≈ 20nm,
λ = 0.8); however, from 08:00 to 12:00, we see a large increase in the weight
for the first component (µ ≈ 4nm, 08:00 to 10:00), followed by a large increase in
the weight for the second component (µ ≈ 10nm, 10:00 to 12:00). Components 4
and 5 appear and disappear during the course of the day (bottom panel). This
may have a physical interpretation as the presence of an actual component source
or alternatively it may represent convenient modelling of the skewness of a single
component through multiple Gaussian distributions. In either case, the evolving
nature of these components is clearly depicted through this model and motivates
scientific interest.
Figure 6.8: Plot of parameters (µ and λ) over one day (Hyytiala, Finland) for the independent approach. Stage 2 of the analysis for the evolution of parameters. Measurements taken every 10 minutes. Colours indicate the components to which parameter estimates belong (the parameter estimates for the first component are Black, parameters for the second component are Red, for the third component they are Green, etc.). [Panels: mu (top), lambda (bottom); x axis, time (00:00 to 23:50).]
Figure 6.9 shows a plot of the posterior mean estimates for the parameters (µ
and λ) for each component over the course of the day, obtained from the second
stage of analysis using an informative prior. From Figure 6.9, we can see that the
parameter estimates from the informed prior approach appear largely to follow the
parameter estimates from the independent approach with some degree of smoothing
on the weights. Here φ was set equal to 0.05. In analysis not shown, smaller values
of φ suggest a smoother pattern to the weights over time. Alternatively φ could
be treated as unknown and estimated, although care is required in this case. The
results under the informed prior suggest that at times we may be able to better infer
patterns in the data or in some cases remove some anomalies.
As indicated in Section 6.3.1, a limitation of the rejection sampling algorithm
for the informed prior approach as outlined by Gustafson and Walker (2003) is the
specification of the candidate distribution $\mathrm{Beta}(m_{jt}^{(r)} + 1, m_{kt}^{(r)} + 1)$ for large sample
sizes. We found that for our large sample size of particles and with the volatility in
weights for some periods of time the acceptance rate of proposed parameter values
was exceedingly low (< 5%). This appeared to be particularly the case for estimation
during the period between 11:00 and 13:00, where the sample size increased markedly
(>75,000 particles) and with much variability. This is clearly due to a very narrowly
defined distribution if both $m_{jt}^{(r)}$ and $m_{kt}^{(r)}$ are large, and if neighbouring estimates
of λjt (λj,t−1 and λj,t+1) are, under the independent approach, some distance away.
Further research to investigate alternative forms of the candidate distribution would
be beneficial to improve computation time.
6.5 Discussion
We used a Bayesian mixture model to estimate particle size distributions for a sam-
ple of real and simulated datasets. We also proposed a modification to the standard
Gibbs sampler to handle the case of truncated data on both the lower and upper
bound. The results from using the algorithm were promising, and the method pro-
vides considerable flexibility both in estimation and inference.
Figure 6.9: Plot of parameters (µ and λ) over one day (Hyytiala, Finland) for the informed prior approach. Stage 2 of the analysis for the evolution of the parameters. Measurements taken every 10 minutes. Colours indicate the components to which parameter estimates belong (the parameter estimates for the first component are black, those for the second component are red, those for the third component are green, etc.).
By estimating the parameters of a mixture model using the RJMCMC algorithm,
we can make probabilistic statements about all unknown parameters, including the
number of components (k). In the case of the number of components, this avoids
the need to use comparative measures for fixed values of k, but also places the
assessment of model fit on a probabilistic basis which can be used for inference. For
example, we can say with some probability whether two or three modes exist at a
particular point in time. Of further interest may be the concentration of particles for
each mode. To examine this, we can estimate the probability that the concentration
of particles is above or below certain thresholds of interest.
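The probabilistic statements described above can be read directly off the sampler output. The following is a minimal sketch, assuming the RJMCMC draws of k and of a mode's concentration are available as plain arrays; the function names and the numbers are hypothetical and are not results from this thesis.

```python
import numpy as np

def prob_k(k_draws, k):
    """Posterior probability that the number of components equals k."""
    return float(np.mean(np.asarray(k_draws) == k))

def prob_above(draws, threshold):
    """Posterior probability that a sampled quantity exceeds a threshold."""
    return float(np.mean(np.asarray(draws) > threshold))

# Hypothetical draws, for illustration only (not results from the thesis).
k_draws = [2, 3, 3, 3, 2, 3, 4, 3]
conc_draws = [850.0, 1200.0, 990.0, 1430.0]

p_three_modes = prob_k(k_draws, 3)           # share of draws with k = 3
p_high_conc = prob_above(conc_draws, 1000.0) # share of draws above 1000
```

Each probability is simply the proportion of retained draws that satisfy the event, so its accuracy depends on the chain having converged and mixed adequately.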
In a Bayesian approach, prior knowledge or expert opinion and hierarchical
descriptions of both local-scale and global features can be included in the model.
Although we have generally used weakly informative priors for parameter estimates,
more informative priors can be used in situations where this information is available.
In the case of frequent measurements of aerosol particle size distribution over time,
including prior information may assist in estimation and inference. For estimation,
the identifiability of mixtures is an important issue (Marin et al., 2005) and we may
be able to better identify individual components in time periods where there appears
to be some degree of overlap. By obtaining smoother parameter estimates over time
we may be able to more clearly establish patterns or identify anomalies from the
data.
While the Bayesian method for mixture models offers both flexibility in estima-
tion and inference, the interpretability of the parameters estimated is an important
question. For example, particle size distribution fits with five components may not
readily have a physical interpretation. From preliminary investigation of various size
distribution data, we generally found that extra components were needed to account
for the skewness of some size distribution data. Depending on their location in the
size distribution, neighbouring components may need to be combined. The Gaussian
representation of mixture densities makes this relatively straightforward. Interpreta-
tion of results is also aided by including prior or expert opinion, which may have the
effect of restricting parameter estimates to known domains. Alternatively, further
investigation of the particle source for these components may be needed.
Chapter 7
Bayesian estimation of mixtures over time with
application to aerosol particle size distributions
In this chapter, we examine in some detail the issue of using informative priors for
estimation of mixtures at multiple time points. Two different informative priors
and an independent prior are compared using simulated and actual data.
Informative priors may help to better identify component parameters at each
time point and, as an aid for inference, to more clearly establish patterns in
the parameters over time. As this chapter is designed to be read independently of
Chapter 6, Section 7.2 describing particle size distribution (PSD) data and the first
part of Section 7.3 outlining mixture models are largely repeated from Chapter 6.
7.1 Introduction
Interest in the estimation of aerosol particle size distributions is largely directed at
better understanding aerosol dynamic processes (i.e., coagulation, nucleation,
condensation, and deposition) that govern aerosol formation, as growth and evolution
depend on the number, size, and composition of particles (Makela et al., 2000,
Birmili et al., 2001, Xu et al., 2002, Whitby et al., 2002, Lu and Bowman, 2004, Hussein
et al., 2005). In the atmosphere, these aerosol characteristics determine the influence
of particles on health, climate, cloud formation, and visibility (Seinfeld and Pandis,
1998). One of the most common approaches for representing size distributions is
by treating the aerosol size distribution at any time point as a set of individual
typically log-normal distributions or modes. In this formulation, the estimation of
particle size distributions at each time point is largely analogous to a mixture
model problem in the statistical setting, for which there is now a growing literature
(Marin et al., 2005).
While interest is in the representation of the particle size distribution as a mix-
ture at each time point, it is also of interest to describe how this distribution evolves
over time. A feature of the measurements of particle size distributions is that,
to better understand aerosol dynamic processes, they are often collected at regular
points in time, and often at quite small time intervals (e.g., every 10 minutes). In
this setting, parameters of the mixture model at each time point are likely to be cor-
related with neighbouring time points and useful information about the parameters
may be gained by incorporating this information in estimation.
The standard setting in which mixture models have been applied has largely been
for independent random samples (Marin et al. (2005)), but an emerging literature
is developing for situations in which the data are spatially and/or temporally struc-
tured (Fernandez and Green (2002); Green and Richardson (2002)). While some of
the methods developed for mixture models in the spatial setting can be adapted for
use in a time series setting, the influence or choice of informative priors in a time
series framework and the implications in different data environments has largely not
been examined.
While our motivation is in exploring methodology to be used for estimation
of particle size distributions over time, a more general framework can include any
situation in which a mixture representation exists at a point in time, but for which
further time series data are available. For example, in a disease mapping context
interest may be in both the mixture representation of the spatial surface and also
in any temporal changes to the mixture. In an image analysis context, interest may
be in changes to the composition of the image associated with an intervention or
response (e.g., environmental modelling, neurological examinations, etc.).
In this chapter, we briefly explore two different informative priors for estimation
of mixtures where the data are highly correlated, and all parameters in the mixture
are allowed to vary. Different datasets, with features similar to actual particle size
distribution data, are used to highlight the influence of using informative priors and
identify situations where placing informative priors may not be beneficial.
An outline of the chapter follows. In Section 7.2, we briefly describe particle
size distributions, and provide an illustration with actual data. In Section 7.3, we
outline the standard mixture model setup for a single time point and then outline
two approaches to estimation of a mixture model where we have more than one
time point. Section 7.4 presents results on the performance of the approaches on
several simulated datasets and actual data, and we conclude in Section 7.5 with
some discussion and possibilities for further work.
7.2 Particle size distribution data
One of the most important physical properties of aerosol particles is their size, and
the concentration of particles as a function of their size is referred to as the particle size
distribution. Figure 7.1 shows an example of particle size distribution data for one
measurement or time period. Because aerosol particles are often charged, their size
can be determined from their electrical mobility (McMurry, 2000).
In this study we present, as an example, the aerosol particle evolution before,
during, and after a new particle formation event at a boreal forest site in Southern
Finland (Figure 7.2). This dataset was selected as it provides a wide ranging
representation of modes for particle size distributions (Dal Masso et al., 2005). Because
aerosol particles are governed by formation and transformation processes, they tend
to form well distinguishable modal features. For example, during background
conditions in the boreal forest the particle number size distribution of fine aerosols
(diameter < 2.5 µm) is bimodal: an Aitken mode (below 0.1 µm) and an
accumulation mode (over 0.1 µm). During a new particle formation event a new particle
mode, commonly known as the nucleation mode, is formed in the atmosphere
with geometric mean diameter below 0.025 µm. However, in the urban atmosphere,
aerosol particles are more dynamic because of the different types and properties of
sources of aerosol particles and may show more than 3 lognormal modes. Typically
the number concentrations of aerosol particles in the urban background can be as
high as $5 \times 10^4\ \mathrm{cm}^{-3}$, and very close to a major road they often exceed $10^5\ \mathrm{cm}^{-3}$.
Figure 7.1: Estimated overall fit and components from RJMCMC for one time period. Concentration of particles ($dN/d\log(D_p)$ [$\mathrm{cm}^{-3}$]) by particle diameter ($\log(D_p)$, with $D_p$ in nm). Legend: predicted fit; components.
7.3 Methods
In this section, we briefly describe a mixture model, outline a two stage approach to
estimation of parameters over time, and describe three types of priors for temporal
evolution of the parameters.
7.3.1 Mixture representation
The density of data (y) at a given time period is represented by a finite mixture
model

$$p(y \mid \theta) = \sum_{j=1}^{k} \lambda_j f(y \mid \theta_j) \qquad (7.1)$$

where k is the number of components in the mixture, $\lambda_j$ represents the probability
of membership of the jth component ($\sum_{j=1}^{k} \lambda_j = 1$), and $f(y \mid \theta_j)$ is the density
function of component j which has parameters $\theta_j$.
As component membership of the data is unknown, a computationally convenient
method of estimation for mixture models is to use a hidden allocation process and
introduce a latent indicator variable zi, which is used along the lines of a missing
variable approach to allocate observations yi to each component.
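The mixture density in Equation 7.1 and the allocation step can be sketched as follows. This is a minimal illustration that assumes untruncated Gaussian components on the log-diameter scale; the function names are hypothetical and are not taken from the thesis code.

```python
import numpy as np

def normal_pdf(y, mu, sigma):
    """Gaussian density, broadcast over observations and components."""
    return np.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def component_terms(y, lam, mu, sigma):
    """Matrix of lambda_j * f(y_i | theta_j), one row per observation."""
    y = np.atleast_1d(np.asarray(y, dtype=float))[:, None]
    lam, mu, sigma = (np.asarray(a, dtype=float) for a in (lam, mu, sigma))
    return lam * normal_pdf(y, mu, sigma)

def mixture_density(y, lam, mu, sigma):
    """Equation 7.1: sum over components of lambda_j * f(y | theta_j)."""
    return component_terms(y, lam, mu, sigma).sum(axis=1)

def allocation_probs(y, lam, mu, sigma):
    """P(z_i = j | y_i, parameters) for each observation and component."""
    terms = component_terms(y, lam, mu, sigma)
    return terms / terms.sum(axis=1, keepdims=True)

def sample_allocations(y, lam, mu, sigma, rng):
    """Draw one latent indicator z_i per observation by inverting the CDF."""
    probs = allocation_probs(y, lam, mu, sigma)
    u = rng.random(probs.shape[0])[:, None]
    z = (probs.cumsum(axis=1) < u).sum(axis=1)
    return np.minimum(z, probs.shape[1] - 1)  # guard against rounding error
```

Given the indicators z, the observations allocated to each component can be treated as a simple sample from that component, which is what makes the missing-variable formulation computationally convenient.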
Figure 7.2: An illustration of a new particle formation event at a boreal forest site located in Southern Finland. (a) The temporal variation of the particle number size distribution and (b) selected particle number size distributions showing the different stages of the newly formed particle mode from its early stage. Note that this new particle formation occurred on a regional scale over the southern part of Finland.
In this chapter, we adopt the common assumption of fitting log-normal
distributions to aerosol particle size distribution data (Whitby and McMurry, 1997). As
PSD data are often measured with a definite lower and upper bound for the size of
the particles, we introduce a slight modification and assume that the data follow a
truncated normal distribution. As is commonly assumed, we take the data (y) to
be the log of particle diameters (nm), and the parameters to be estimated ($\theta_j$) for
each component are the mean (µ), variance ($\sigma^2$) and weight (λ). The number of
components k was also considered to be unknown. Priors for the first stage of the
analysis were:

$$
\begin{aligned}
p(\mu_j) &\sim N(\xi, \kappa^{-1}) \\
p(\sigma_j^{-2}) &\sim \mathrm{Gamma}(\delta, \beta) \\
p(\beta) &\sim \mathrm{Gamma}(g, h) \\
p(\lambda) &\sim \mathrm{Dirichlet}(\alpha_1, \alpha_2, \dots, \alpha_k) \\
p(k) &\sim \mathrm{Uniform}(k_{\min}, k_{\max})
\end{aligned}
$$

where ξ, κ, δ, α, η, g, h, $k_{\min}$ and $k_{\max}$ are fixed hyperparameters.
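The truncated normal assumption made above for the component densities can be illustrated with a short sketch. The density is the usual Gaussian density rescaled by the probability mass between the bounds; the bounds used in any call are hypothetical log-diameter limits, not the instrument limits used in the thesis.

```python
from math import erf, exp, pi, sqrt

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def truncated_normal_pdf(y, mu, sigma, lower, upper):
    """Density of N(mu, sigma^2) restricted to the interval [lower, upper]."""
    if y < lower or y > upper:
        return 0.0
    z = (y - mu) / sigma
    # Probability mass of the untruncated normal inside the bounds.
    mass = normal_cdf((upper - mu) / sigma) - normal_cdf((lower - mu) / sigma)
    return exp(-0.5 * z * z) / (sigma * sqrt(2.0 * pi) * mass)
```

The rescaling matters most for components whose mean lies close to a bound, since a larger share of the untruncated density then falls outside the measured range.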
In the first stage of the temporal analysis, for each time period we implemented
Richardson & Green’s (1997) RJMCMC algorithm. Although this algorithm is easily
applied at a single time point, the use of RJMCMC for mixture models with temporal data
requires significant pre-processing with respect to mixing coverage and convergence,
as well as post-processing to provide adequate summary statistics and between-time
component mapping. As an alternative, we considered a two-stage approach. In
the first stage, the number of components was estimated at each time point using
RJMCMC. In the second stage, we fixed the number of components (k) to the
maximum observed at any time period and independently estimated the parameters
of the mixture model (µ, σ, and λ) for each time period using a Gibbs sampler
algorithm. As we do not observe all of the components in every time period, we
allow component weights to be effectively zero (inf(λt) = 0.001) if required.
The Gibbs sampler is iterated until the Markov Chains for the parameters have
converged to stationary posterior distributions. For the second stage, priors were
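
$$
\begin{aligned}
p(\lambda) &\sim \mathrm{Dirichlet}(\alpha_1, \dots, \alpha_k) \\
p(\mu_j \mid \sigma_j^2) &\sim N\!\left(\xi_j, \frac{\sigma_j^2}{n_j}\right) \qquad (7.2) \\
p(\sigma_j^2) &\sim \mathrm{IG}\!\left(\frac{v_j}{2}, \frac{s_j^2}{2}\right)
\end{aligned}
$$

where $\alpha_j$, $\xi_j$, $n_j$, $v_j$ and $s_j$ are fixed hyperparameters.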
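Under the priors in Equation 7.2, the updates within each Gibbs iteration take the standard conjugate form. The sketch below assumes untruncated Gaussian components, so it omits the adjustment for truncated data that the modified sampler requires; the function names are hypothetical.

```python
import numpy as np

def update_component(y_j, xi, n, v, s2, rng):
    """Joint draw of (mu_j, sigma_j^2) given the data allocated to component j.

    Prior: mu | sigma^2 ~ N(xi, sigma^2 / n), sigma^2 ~ IG(v / 2, s2 / 2).
    Assumes untruncated normal data (a simplification).
    """
    y_j = np.asarray(y_j, dtype=float)
    m = y_j.size
    ybar = y_j.mean() if m > 0 else xi
    ss = float(((y_j - ybar) ** 2).sum())
    n_post = n + m
    xi_post = (n * xi + m * ybar) / n_post
    shape = 0.5 * (v + m)
    scale = 0.5 * (s2 + ss + (n * m / n_post) * (ybar - xi) ** 2)
    sigma2 = scale / rng.gamma(shape)  # inverse-gamma draw via a gamma draw
    mu = rng.normal(xi_post, np.sqrt(sigma2 / n_post))
    return mu, sigma2

def update_weights(counts, alpha, rng):
    """Draw lambda from Dirichlet(alpha + allocation counts)."""
    return rng.dirichlet(np.asarray(alpha, float) + np.asarray(counts, float))
```

Here $n_j$ acts as a prior sample size for the mean: larger values pull the posterior mean of $\mu_j$ more strongly towards $\xi_j$, which is the mechanism the informed prior later exploits.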
For the independent prior case, we use uninformative priors for µ, σ and λ. Priors
for the dependent prior are discussed below.
7.3.2 Choice of temporal prior
In the second stage, three priors were considered for linking parameter values (µ,σ,λ)
over time. The first of these was the independent prior, in which the correlated
nature of the data was ignored completely and parameters were independently
estimated at each time point. The second and third were termed the ‘informed prior’
and ‘penalised prior’, as described below.
Informed Prior
In this approach we use the information provided from previous and future time
periods as prior information for the current period. For the main results we focus on
a simple case where posterior estimates from the previous period are used as prior
information for the current period. We do this to illustrate the influence of a simple
prior specification on the posterior estimates of parameters (θ).
In the case of a mixture model using Gaussian distributions, we have three pa-
rameters (µ, σ and λ) for which we could utilise available prior information to aid in
estimation. Preliminary investigation indicates that all three parameters are likely to
show strong evidence of autocorrelation, so here we examine the effect of smoothing
on each of these parameters.
For p(λ), we allow $\alpha_j$ in Equation 7.2 to vary and reflect prior information about
$\lambda_{t-1,j}$. Thus, we set $\alpha_j = \theta_j m_{t-1,j}$, where $m_{t-1,j}$ is the mean of the number of
observations allocated to component j in the previous time period, and $\theta_j$ is fixed
at some value. An alternative is to impose a distribution on θ, say $\theta_j \sim U(0, 1)$ (or
$N(1, 0.5)$), but we do not present the results for this approach in this chapter.
For the specification of prior information for µ and σ, we set $\xi_{jt} = \mu_{j,t-1}$,
$v_j = n_j/\sigma^2_{j,t-1}$ and $s_j = n_j$, and increase the value of $n_j$ from the value set for
the independent case to reflect the degree of dependency for these parameters on
the previous period.
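The hyperparameter settings for the informed prior can be collected in a small helper. This is a sketch only: it assumes that posterior summaries from the previous time period are available as arrays, and the function and argument names are hypothetical.

```python
import numpy as np

def informed_hyperparameters(mu_prev, sigma2_prev, m_prev, theta, n):
    """Hyperparameters for period t built from period t-1 summaries.

    mu_prev, sigma2_prev : posterior means of mu_j and sigma_j^2 at t-1
    m_prev               : mean number of observations allocated to each component at t-1
    theta                : scaling of the Dirichlet prior (scalar or per component)
    n                    : prior sample size n_j controlling the smoothing on mu and sigma
    """
    mu_prev = np.asarray(mu_prev, dtype=float)
    sigma2_prev = np.asarray(sigma2_prev, dtype=float)
    alpha = theta * np.asarray(m_prev, dtype=float)  # alpha_j = theta_j * m_{t-1,j}
    xi = mu_prev.copy()                              # xi_{jt} = mu_{j,t-1}
    v = n / sigma2_prev                              # v_j = n_j / sigma^2_{j,t-1}
    s = np.full_like(mu_prev, n, dtype=float)        # s_j = n_j
    return alpha, xi, v, s
```

Larger values of theta or n give the previous period more influence, so both act as smoothing controls.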
Penalised Prior
An alternative to using the informed prior described in Section 7.3.2 is to use a re-
parameterisation of the prior to reflect the degree of dependency between parameters.
Gustafson and Walker (2003) proposed a prior for λ that downweights large changes
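in probabilities in successive periods. Thus, at time period t (t = 2, ..., T), $p(\lambda_t \mid z_t)$
is defined as

$$p(\lambda_t \mid z_t) \sim \mathrm{Dirichlet}(1, \dots, 1)\, \exp\left(-\frac{\sum_{t=2}^{T} \sum_{j=1}^{J} (\lambda_{t,j} - \lambda_{t-1,j})^2}{\phi}\right) \qquad (7.3)$$
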
where smaller values of φ imply greater smoothing. The above formulation can
naturally be extended to dependence on more than a single time period.
A potential advantage of using information about estimates both forwards and
backwards in time is the additional information this may provide to guide parameter
estimates in the current period. This may be an advantage when large changes in
the parameter estimates occur in isolated single time periods.
For comparative purposes, in Section 7.4.1 we compare the results of using a similar
formulation for λ in the informed prior approach.
To sample from the posterior distribution of λ, we use a rejection sampling
approach proposed by Gustafson and Walker (2003).
Prior distributions p(µ) and p(σ) are set as for the independent approach (Equa-
tion 7.2).
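To show how a rejection step for the penalised prior can work, the sketch below uses a simplified variant rather than the algorithm of Gustafson and Walker (2003): the whole weight vector is proposed from the Dirichlet full conditional that would apply under the independent prior, and the proposal is accepted with probability equal to the penalty term in Equation 7.3. All function and argument names are hypothetical.

```python
import numpy as np

def sample_penalised_weights(counts, lam_prev, phi, rng, lam_next=None, max_tries=100000):
    """Draw lambda_t by rejection sampling under the penalised prior.

    Candidate: Dirichlet(counts + 1), the full conditional under Dirichlet(1, ..., 1).
    Accept with probability exp(-penalty / phi), which is at most 1, so the
    accepted draws follow the penalised full conditional.
    Returns the accepted weights and the number of proposals used.
    """
    counts = np.asarray(counts, dtype=float)
    lam_prev = np.asarray(lam_prev, dtype=float)
    for tries in range(1, max_tries + 1):
        cand = rng.dirichlet(counts + 1.0)
        penalty = np.sum((cand - lam_prev) ** 2)
        if lam_next is not None:
            # Term linking the current period to the following one.
            penalty += np.sum((np.asarray(lam_next, dtype=float) - cand) ** 2)
        if rng.random() < np.exp(-penalty / phi):
            return cand, tries
    raise RuntimeError("no candidate accepted; acceptance rate is too low")
```

When the allocation counts are large the candidate is tightly concentrated, and if the neighbouring weights lie some distance away the acceptance probability becomes very small, so many proposals may be needed before one is accepted.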
7.4 Results
In this section we present and assess the results using simulated data and then
present the results of applying the approaches to particle size distribution data from
Hyytiala, Finland. We use the simulated data to test the impact of the different
prior representations and the degree of smoothing. We first use an informative and
penalised prior only on the weights (λ), and then assess the influence of using an
informative prior on µ and σ.
7.4.1 Simulated Data
Data Setup
We simulated datasets indicative of the type of behaviour of aerosol particle size
distribution data observed at Hyytiala, a boreal forest site in Southern Finland
(SMEAR II) (Vesala et al., 1998). A particular feature of these particle size
distribution data is growth in both the mean and the weight of a component.
We simulated data from three different cases. In the first case (D1), we simulated
data with parameters which can be characterised as having medium correlation
across time. In this case the mixture is well identified and interest is in whether the
results from the informed and penalised approaches are largely the same as for the
independent approach.
In practice it is quite common to observe sudden large changes in the number
of particles measured which may persist for a number of time periods. This is more
often observed when there are relatively few particles for a particular size group,
and more so for the smaller sized particles. Thus, for the second data set (D2) we
simulate data for the first component where the weight at smaller values is quite
volatile. For this dataset the mixture is also well identified.
For the third data set (D3), we simulated data which are highly correlated across
time, a feature of particle size distribution data observed in practice for most time
periods where measurements are commonly taken at small time intervals. This
dataset was also simulated with parameter estimates where at times the mixture is
not well identified. Of interest in this setting is to see the effect of using either the
informed prior or penalised prior approach.
For the results to follow, except as specified otherwise, for the independent,
informed prior and penalised prior approaches, we set the hyperparameters to be:
$\xi = (1.5, 3.5, 5.0)$; $s^2_{t,j} = 10$; $v_{t,j} = 10/0.6^2$; and $n_j = 2$.
Smoothing on λ
Figure 7.3 shows the results of the informed prior, penalised prior and independent
approaches compared to the actual dataset D1. As expected for this well defined
dataset, the results for the informed and penalised prior approaches appear to largely
follow the results obtained from the independent approach.
Figure 7.3: Plot of estimated parameters (µ (top panels), σ (middle panels) and λ (bottom panels)) for approaches using simulated data (D1), with one column of panels for each of Components 1 to 3: Simulated data (Black); Independent (Red); Informed Prior (Green); Penalised Prior (Blue)
Figure 7.4 shows the results of analysis using dataset D2, using the informed
prior approach with θ = 0.5 and penalised prior approach with φ = 0.08 and the
independent prior approach. The values for θ and φ were chosen to allow prior infor-
mation to be influential, but not overwhelm the posterior estimates. Interestingly
it is evident that smoothing on the weights results in compensatory measures by
both µ and σ. The compensatory measures appear to be more pronounced when
λ is volatile over time. In this case, the prior is imposing larger adjustments away
from the data at each time point. We see this most clearly in the results for the
first two components, but not for the third component. A possible explanation for
this observation is that for the first component, µ is able to adjust to a higher value
which is supportive of a greater value for λ, and in some sense borrow support from
the second component. However, for the third component, µ is not able to increase
or decrease in support of a lower value for λ by borrowing support from a nearby
component.
Figure 7.4: Plot of estimated parameters (µ (top panels), σ (middle panels) and λ (bottom panels)) for approaches using simulated data (D2), with one column of panels for each of Components 1 to 3: Simulated data (Black); Independent (Red); Informed Prior (Green); Penalised Prior (Blue)
As shown in Figure 7.5 (black line), for the third data set (D3) we simulated data
for the first component with a mean value increasing from 1.5 to 3.0, and weight
increasing from 0.1 to 0.6 and then decreasing to 0.3, over time. Often a consequence
of the growth in the first component is a decline in size and weight for the larger
sized particles, and this is reflected in the weight for the second component following
an opposite pattern to the first component. For the third component, the weight
increases from 0.1 to 0.3 over time. The parameters µ and λ are simulated with
some noise around the parameter values, and the sample size is 1000.
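A dataset with the qualitative features described for D3 could be generated along the following lines. This is a sketch under stated assumptions: only the trajectories of the first component's mean and of the weights, and the sample size of 1000, come from the description above. The number of time points, the means of components 2 and 3, the standard deviations, the timing of the peak in the first weight and the noise level are all hypothetical, and noise is added to µ only.

```python
import numpy as np

def simulate_d3_like(n_times=100, n_obs=1000, seed=0):
    """Simulate a three-component normal mixture at each of n_times periods."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, n_times)
    # Component 1 weight rises from 0.1 to 0.6 and falls to 0.3 (peak time assumed).
    w1 = np.interp(t, [0.0, 0.6, 1.0], [0.1, 0.6, 0.3])
    # Component 3 weight increases from 0.1 to 0.3; component 2 takes the remainder.
    w3 = 0.1 + 0.2 * t
    lam = np.column_stack([w1, 1.0 - w1 - w3, w3])
    # Component 1 mean grows from 1.5 to 3.0; the other means are assumed constant.
    mu = np.column_stack([1.5 + 1.5 * t, np.full(n_times, 3.5), np.full(n_times, 5.0)])
    mu[:, 0] += rng.normal(0.0, 0.05, n_times)  # assumed noise level on mu
    sigma = np.array([0.5, 0.5, 0.5])           # assumed standard deviations
    y = np.empty((n_times, n_obs))
    for i in range(n_times):
        z = rng.choice(3, size=n_obs, p=lam[i])  # latent component labels
        y[i] = rng.normal(mu[i, z], sigma[z])
    return y, lam, mu
```

Because the first component's mean approaches that of the second as time progresses, the two components overlap in later periods, which mimics the weak identification that makes this case a useful test of the temporal priors.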
Figure 7.5: Plot of estimated parameters (µ (top panels), σ (middle panels) and λ (bottom panels)) for approaches using simulated data (D3), with one column of panels for each of Components 1 to 3: Simulated data (Black); Independent (Red)
Figure 7.5 also shows the results of using the independent approach. We see that
at times the parameter estimates for the independent approach deviate away from
the actual data.
Figures 7.6 and 7.7 show the results for the informed prior and penalised prior
compared to the actual data, respectively. In Figure 7.6, the results show the effect
of varying the degree of smoothing on λ for the informed prior using θ=(0.1,0.5,1.3).
For the results of the penalised prior, we vary the degree of smoothing on λ using
φ=(0.04,0.08,0.12).
Figure 7.6: Plot of estimated parameters (µ (top panels), σ (middle panels) and λ (bottom panels)) for Informed Prior approach using simulated data (D3), with one column of panels for each of Components 1 to 3: Simulated data (Black); Theta=0.1 (Green); Theta=0.8 (Blue); Theta=1.3 (Purple)
In Figure 7.6, we can see that the parameter estimates for λ for all three values
of θ appear to closely follow the actual data, with the closest estimates to the actual
data being for θ = 0.5 and 1.3. As we are only using an informed prior on the
weights, the parameter estimates for µ and σ appear to be quite variable over time
compared to the actual data. However, the variability appears to be slightly less
for these variables than for the independent approach (Figure 7.5) and closer to the
actual data over time. Of interest is the closeness of the parameter estimates of µ
and σ for components 1 and 2, which more clearly follow the true growth occurring in
component 1 and the stability over time for component 2 compared to that observed
for the independent approach.
Figure 7.7: Plot of estimated parameters (µ (top panels), σ (middle panels) and λ (bottom panels)) for Penalised Prior approach using simulated data (D3), with one column of panels for each of Components 1 to 3: Simulated data (Black); φ=0.04 (Brown); φ=0.08 (Light Blue); φ=0.12 (Dark Green)
In Figure 7.7, the parameter estimates for the penalised prior approach appear
to deviate slightly from the actual data for components 1 and 2. For the third
component, the parameter estimates for the penalised prior approach follow the
actual data with some noise. Overall, the results from the penalised prior approach
are similar to the independent approach but with less variability over time.
Smoothing on µ and σ
We turn now to an assessment of the impact of using an informative prior for µ
or σ over time. We present results for the highly correlated data set, since this is
the most sensitive of the simulated data as discussed above. Here we set $n_j = 25$,
$\xi_{jt} = \mu_{j,t-1}$, $v_j = 200/\sigma^2_{j,t-1}$ and $s_j = 200$.
In Figure 7.8, the parameter estimates for the informative prior for µ appear to
more closely follow the actual data than using an informative prior for σ. Although
the parameter estimates for both approaches appear to be further away from the
actual data than using an informative prior for λ, they do appear to be closer than
under the independent approach.
Figure 7.9 shows the results of using an informative prior on both µ and λ. In
this example, the results are similar to using an informative prior only on λ. Thus
depending on the objectives of the analysis, using an informative prior on both
parameters may not be needed.
In general, from the results of smoothing on µ, σ and λ it appears that large
adjustments to one parameter (e.g., from volatility in some time periods) are not
supported unless compensatory measures can be taken by the other parameters.
In analyses not shown here, we compared the results from using a three period
centred moving average on the weight for the informed prior to the results of using the
penalised prior to assess whether the form specified by the penalised prior had a
different influence on the results. We found the results from using both approaches
to be largely the same.
Figure 7.8: Plot of estimated parameters (µ (top panels), σ (middle panels) and λ (bottom panels)) for Informed Prior approach using simulated data (D3), with one column of panels for each of Components 1 to 3: Simulated data (Black); Smoothing on µ (Orange); Smoothing on σ (Dark Green)
Figure 7.9: Plot of estimated parameters (µ (top panels), σ (middle panels) and λ (bottom panels)) for approaches using simulated data (D3), with one column of panels for each of Components 1 to 3: Simulated data (Black); Independent (Red); Smoothing on µ and λ (Green)
7.4.2 Case study
The data set studied here was taken from a measurement site at Hyytiala, Finland; a
plot of the measurements for the day selected is shown in Figure 7.2. This particular
day was selected as it shows a new particle formation event occurring, whereby a
new mode of aerosol particles appears with a significant influx of particles (as high as
$10^6\ \mathrm{cm}^{-3}$) with a geometric mean diameter below 10 nm, growing later into the Aitken
(25-90 nm) or accumulation (100+ nm) modes. In terms of a temporal mixture model
setting, we will be able to assess the performance of the three prior specifications
outlined previously as new components are introduced and both a growth in the
mean and weight for those components are observed.
As outlined in Section 7.3, the first stage of our approach is to apply RJMCMC
to each time period. These results are then used to guide the choice of the number
of components and initial parameter estimates for the second stage analysis, in
which temporally correlated priors are used to model the evolution of the mixture
parameters over time. Figure 7.10 shows the results of the first stage of the algorithm,
with a plot of the posterior mean estimates for µjt at each time point t, with the
size of the circles indicating the corresponding weight λjt. The average number
of components estimated with the highest probability over the day was four; the
minimum number of components was one, and the maximum number of components
was five.
For the second stage, we fixed the number of components to be five. For the
independent approach, we set the hyperparameters to be: $\xi = (1.5, 2.2, 3.0, 3.8, 4.2, 5.1)$;
$s^2_{t,j} = 2.0225$; $v_{t,j} = 0.092025$; and $n_j = 200$. Figure 7.11 shows the results of
estimation using the independent approach.
From previous results of applying the three prior specifications to simulated data
(Section 7.4.1), we generally found closer parameter estimates to the actual data
over time using an informed prior on µ or λ. For data that are quite noisy (D2), we
also observed that the informative and penalised prior specifications can cause large
adjustments to other parameters. Thus, caution must be exercised when applying
the approaches to data of this type.
Figure 7.10: Plot of posterior mean estimates for µj (vertical axis) against time (horizontal axis) from RJMCMC algorithm for one day (Hyytiala). Stage 1 of analysis for temporal evolution of parameters. Larger circles indicate greater weight for that component
Figure 7.11: Plot of estimated parameters (µ (top panel), λ (bottom panel)) under an independent prior over time. Stage 2 of analysis for temporal evolution of parameters.
Figure 7.12 shows the results of estimation using the informed prior with smooth-
ing on all of the weights and only the mean for component 3. Comparing the results
to the independent approach in Figure 7.11, the parameter estimates for the informed
prior appear to show smoothly growing estimates for µ over time for components 1
and 2, and smoother parameter estimates for λ.
Figure 7.12: Plot of estimated parameters (µ (top panel) and λ (bottom panel)) under an informed prior over time. Stage 2 of analysis for temporal evolution of parameters. Informed prior specified for λ in all components and µ3
In analyses not shown here, we also considered the effect of using an informative
prior on other parameters and found the results to vary to a small degree. Our
choice in using an informative prior on µ3 and λ was guided partly by interest in
λ and by prior belief that µ for the larger sized particles, forming in this case the
background concentration of particles, is highly correlated over time. We were
also guided in our choice of parameters by the variability of the parameter estimates
from the first stage of the analysis.
7.5 Discussion
In this chapter, we explored the problem of estimating Bayesian mixture models
at multiple time points. Under different situations, approaches that employ infor-
mation about neighbouring time points compared favourably to results based on
an independent approach. By including additional temporal information about pa-
rameters for correlated time periods we may be able to better identify individual
components at each time point. As an aid for inference, we may also be able to ob-
tain smoother parameter estimates over time and from this be able to more clearly
establish patterns or identify anomalies from the data.
The results highlight a number of observations about mixture representations at
multiple time points. First, analysis of the evolution of parameters of a mixture over
multiple time points highlights the large degree of dependency that exists between
component parameters. Changes to a parameter in one component may flow on to
the parameter in a nearby component. Depending on the context of the study, we
can anticipate this dependency to be more readily apparent for the weight parameter
but we found similar dependencies to exist for other parameters. The second is the
need to be mindful that the same parameter in one component may have a different
correlation structure over time to the same parameter in another component. In
the context of particle size distribution data, we often observed greater volatility
in estimates for the smaller particles compared to the larger sized particles and
so at times the correlation structure of the parameters between these respective
components appeared to be quite different.
A possible effect of using informative priors in this context is to impose a prior
not supported by the data or to impose a temporal correlation structure where
such a structure does not exist, and thereby cause unnecessary adjustments to other
parameters. We observed this most clearly in the results from the simulated data
where at times the data was quite noisy. For this dataset, using an informative
prior for a parameter which supported large adjustments away from the actual data,
resulted in large compensatory adjustments being made by other parameters not only
within the same component, but also to parameters in neighbouring components.
The easy solution may be to use an appropriate correlation structure for components
but of course this may not always be known a priori.
A further result of the dependency that can exist between parameters of com-
ponents and within component parameters is that the inclusion of correlation in-
formation to aid in the identifiability of the mixture, may not be required for all
parameters or alternatively all components. In the context of a mixture with a small
number of components, we may only need to provide more information about one
parameter for an influential component in order to separate out the influence of com-
peting components. This result will also be invaluable if the correlation structure
for one parameter or parameters for one component are more readily known. In the
context of a mixture of Gaussians, we generally found that an informative prior was
only needed on µ or λ or possibly both. This result could well be context specific and
influenced by any reliance on the means for defining (in terms of size) and ordering
of components. The choice of parameter on which to place more information may also
be guided by whether it is a parameter of interest for inference as demonstrated in
analysis of the case study where most interest was in the behaviour of λ over time.
In this case, and in general, one must be careful in the analysis of just one parameter
as it can largely be a conditional analysis in view of the behaviour of other possible
cross-correlated parameters within the same component and between components.
While many of the above difficulties may seem to be avoided if smoothing ap-
proaches are applied retrospectively on parameter estimates from an independent
mixture model, this type of analysis may largely ignore the true mapping of compo-
nents or path of parameters over time. From the results of the simulated data, the
large degree of dependency that we observe between the parameters of a mixture
over time suggests that including temporal information to better identify one of the
parameters at a single time point can flow on to affect other parameters. This could
change inference about both the mixture representation at a point in time, and also
the behaviour of mixture parameters over time.
In general, one of the potential difficulties in using an informative prior approach
to smooth parameter estimates over time is the variable degree of influence the prior
may have in the posterior. If the primary objective is to obtain smoothed parameter
estimates over time, larger sample sizes and noisiness of the data at times may
warrant increasingly restrictive priors. In such cases where the objective might be
to downplay the influence of the data, a number of alternative approaches to increase
the influence of prior information can be used (Ibrahim et al., 2003). In all cases, it is
valuable to undertake a sensitivity analysis in order to assess the effect of the prior.
Such an analysis should include the independent prior as a baseline comparison.
Alternative approaches which are less sensitive to the form in which prior infor-
mation is given in the model, and/or include covariate information could also be
used to aid in estimation.
For estimation of aerosol particle size distributions, the dynamics of the aerosol
process and the complexity of the influences on particle concentration and size,
demand the use of approaches which utilise as much information from the data as
possible. To this end, the inclusion of temporal information may be helpful.
Chapter 8
Bayesian hierarchical modelling for a time series of
mixtures
In this chapter, we address some of the issues raised in Chapter 7, and explore a
hierarchical approach to estimation of mixture parameters over time in which an
informative prior is placed at two different levels. Simulated and actual data are used
to assess the performance of the approach. As this chapter is designed to be read
independently of the previous chapters, Section 8.2 describing PSD data and the first
part of Section 8.3 outlining mixture models are largely repeated from Chapter 7.
8.1 Introduction
In Chapter 7, we explored the problem of estimating a Bayesian mixture model at
multiple time points using an informative or penalised prior which carried information
about the correlation of parameters for neighbouring time points. The analysis,
in general, highlighted a number of observations about mixture representations at
multiple time points. First, analysis of the evolution of parameters of a mixture over
multiple time points highlighted the large degree of dependency that exists between
component parameters. For example, in the case of a mixture of Gaussians, large
changes to the weight parameter over time for one component were reflected not only
in adjustments to the weights of other components but also in the mean parameter
of the associated component and neighbouring components.
The second is the observation that a parameter in one component may have a dif-
ferent correlation structure over time to the same parameter in another component.
In the context of PSD data, we often observe greater volatility in concentration lev-
els for the smaller sized particles compared to the larger sized particles and we can
expect this to be reflected in the corresponding parameter estimates for a mixture
model over time.
In light of the above observations, a possible effect of using informative priors
in this context is to impose a prior not supported by the data or to impose a
temporal correlation structure where such a structure does not exist, and thereby
cause unnecessary adjustments to other parameters.
In this chapter, we explore the problem of estimating a Bayesian mixture model
at multiple time points using a hierarchical model for the parameters in which
an informative prior is placed at two different levels. The aim of exploring this
approach is to address some of the issues raised in the previous paper and develop
an alternative approach which is less sensitive to the form in which prior information
is given in the model.
An outline of the chapter follows. In Section 8.2, we briefly describe particle
size distributions, and provide an illustration with actual data. In Section 8.3, we
outline the standard mixture model setup for a single time point and a two stage
approach for estimation of a mixture model at multiple time points. For estimation
of a mixture model at multiple time points we introduce a hierarchical approach
to estimation. Section 8.4 presents results on the performance of the approach on
several simulated datasets and actual data, and we conclude in Section 8.5 with
some discussion and possibilities for further work.
8.2 Particle size distribution data
One of the most important physical properties of aerosol particles is their size and
the concentration of particles in terms of their size is referred to as the particle size
distribution. Figure 8.1 shows an example of particle size distribution data for one
measurement or time period. Because aerosol particles are often charged, their size
can be determined from their electrical mobility (McMurry, 2000).
Figure 8.1: Histogram of data sampled from Hyytiala, Finland for a single time period. Horizontal axis: particle diameter, log(Dp (nm)); vertical axis: density.
In this study we present, as an example, the aerosol particle evolution, before,
during, and after a new particle formation event at a Boreal Forest in Southern
Finland (Figure 8.2). This dataset was selected as it provides a wide ranging repre-
sentation of modes for particle size distributions (Dal Masso et al., 2005). Because
aerosol particles are governed by formation and transformation processes, they tend
to form well distinguished modal features. For example, during background con-
ditions in the Boreal Forest the particle number size distribution of fine aerosols
(diameter < 2.5 µm) is bimodal: an Aitken mode (below 0.1 µm) and an accumulation
mode (over 0.1 µm). During a new particle formation event a new particle
mode, commonly known as the nucleation mode, is formed in the atmosphere
with geometric mean diameter below 0.025 µm. However, in the urban atmosphere,
aerosol particles are more dynamic because of the different types and properties of
sources of aerosol particles and may show more modes. Typically the number
concentrations of aerosol particles in the urban background can be as high as
$5 \times 10^{4}\,\mathrm{cm}^{-3}$, and very close to a major road they often exceed $10^{5}\,\mathrm{cm}^{-3}$.
8.3 Mixture models
In this section, we briefly describe a mixture model, outline the independent and
informed prior representations, and outline a hierarchical approach to estimation of
parameters.
The density of data (y) at a given time period is represented by a finite mixture
model
$$p(y \mid \theta) = \sum_{j=1}^{k} \lambda_j f(y \mid \theta_j) \qquad (8.1)$$
where k is the number of components in the mixture, λj represents the probability
of membership to the jth component ($\sum_{j=1}^{k} \lambda_j = 1$), and f(y|θj) is the density
function of component j which has parameters θj.
As component membership of the data is unknown, a computationally convenient
method of estimation for mixture models is to use a hidden allocation process and
introduce a latent indicator variable zij, which is used along the lines of a missing
variable approach to allocate observations yi to each component.
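The allocation step can be sketched as follows. This is a minimal illustration only, assuming Gaussian components and known current parameter values; the function names (`normal_pdf`, `allocation_probs`, `sample_allocation`) and all numerical values are hypothetical and are not taken from the analysis in this thesis.

```python
import math
import random


def normal_pdf(y, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at y."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def allocation_probs(y, weights, mus, sigmas):
    """P(z_i = j | y_i, theta), proportional to lambda_j * f(y_i | theta_j)."""
    unnorm = [w * normal_pdf(y, m, s) for w, m, s in zip(weights, mus, sigmas)]
    total = sum(unnorm)  # equals the mixture density p(y | theta) of (8.1)
    return [u / total for u in unnorm]


def sample_allocation(y, weights, mus, sigmas, rng):
    """Draw one component label from the allocation probabilities."""
    probs = allocation_probs(y, weights, mus, sigmas)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Assumed parameter values on the log-diameter scale (illustration only).
weights = [0.2, 0.3, 0.5]
mus = [1.5, 3.5, 5.0]
sigmas = [0.55, 0.55, 0.55]

rng = random.Random(1)
probs_near_first = allocation_probs(1.6, weights, mus, sigmas)
label = sample_allocation(1.6, weights, mus, sigmas, rng)
```

An observation close to the first component mean is allocated to that component with probability close to one, even though the component has the smallest weight.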
In this chapter, we adopt the common assumption of fitting log-normal distri-
butions to aerosol particle size distribution data (Whitby and McMurry, 1997). As
PSD data are often measured with a definite lower and upper bound for the size of
the particles we introduce a slight modification and assume that the (log) data follow
a truncated normal distribution. Thus, we take the data (y) to be the log of particle
diameters (nm), and the parameters to be estimated (θj) for each component are
the mean (µ), variance (σ²) and weight (λ). The number of components k is also
considered to be unknown.
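As an illustration of the truncation, the density of a single component can be sketched as below. The measurement bounds of 3 nm and 500 nm are assumed values chosen for the example, not the limits of the instrument used in this study, and the function names are hypothetical.

```python
import math


def std_normal_cdf(x):
    """Standard normal CDF written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def truncated_normal_pdf(y, mu, sigma, lower, upper):
    """Density of N(mu, sigma^2) restricted to [lower, upper] and renormalised."""
    if y < lower or y > upper:
        return 0.0
    z = (y - mu) / sigma
    phi = math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
    # Probability mass of the untruncated normal inside the bounds.
    mass = std_normal_cdf((upper - mu) / sigma) - std_normal_cdf((lower - mu) / sigma)
    return phi / mass


# Assumed bounds of 3 nm and 500 nm, expressed on the log scale.
lower, upper = math.log(3.0), math.log(500.0)
inside = truncated_normal_pdf(1.5, 1.5, 0.55, lower, upper)
outside = truncated_normal_pdf(0.5, 1.5, 0.55, lower, upper)
```

For a component whose mean lies near the lower bound, the renormalisation noticeably raises the density inside the bounds relative to the untruncated normal.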
Figure 8.2: An illustration of a new particle formation event at a Boreal Forest site located in Southern Finland. (a) The temporal variation of the particle number size distribution and (b) selected particle number size distributions showing the different stages of the newly formed particle mode from its early stage. Note that this new particle formation occurred on a regional scale over the southern part of Finland.
For the independent, informative prior and hierarchical approaches (except where
stated otherwise), priors were:
$$\begin{aligned}
p(\mu_j) &\sim N(\xi, \kappa^{-1}) \\
p(\sigma_j^{-2}) &\sim \mathrm{Gamma}(\delta, \beta) \\
p(\beta) &\sim \mathrm{Gamma}(g, h) \\
p(\lambda) &\sim \mathrm{Dirichlet}(\alpha_1, \alpha_2, \ldots, \alpha_k) \\
p(k) &\sim \mathrm{Uniform}(k_{\min}, k_{\max})
\end{aligned}$$
where ξ, κ, δ, α, η, g, h, kmin and kmax are fixed hyperparameters.
In the first stage of the temporal analysis, for each time period we implemented
Richardson & Green’s (1997) RJMCMC algorithm to estimate both θt and kt (t =
1, . . . , T ). Although this algorithm is easily fit at a single time point, the use of
RJMCMC for mixture models with temporal data, where both θ and k may vary at
each time point, requires significant pre-processing with respect to mixing coverage
and convergence, as well as post-processing to provide adequate summary statistics
and between time component mapping. As an alternative, we consider a two-stage
approach. In the first stage, the number of components is estimated at each time
point using RJMCMC. In the second stage, we fix the number of components (k)
to the maximum observed at any time period and then independently estimate the
parameters θj (j = 1, . . . , k) for each time period using a Gibbs sampler algorithm.
As we do not observe all of the components in every time period, we allow component
weights to be ‘effectively zero’ (inf(λt)=0.001) if required. The Gibbs sampler is
iterated until the Markov Chains for the parameters have converged to stationary
posterior distributions.
In the second stage, for estimation of parameters of a mixture at multiple time
points, independent estimation of θ at each time period does not allow for any
information about θ to be shared over time. An alternative to this independent
approach is to use an informative prior where information provided from previous
and future time periods is used as prior information for the current period. In a
previous paper, we explored the use of this type of approach in some detail. Here, the
prior was imposed directly on elements of θ and strong sensitivity of the posterior
estimates to the specification of these priors was observed. In this paper, we
explore an alternative representation as described in Section 8.3.1 and present the
results from the previous approach as a comparison to the results from the new
hierarchical approach.
We focus, in particular, on a simple case where posterior estimates from the
previous period are used as prior information in the current period. As the weight
parameter in a mixture is often of interest in analysing PSD data, we present the
results from using an informative prior for this parameter. Thus specification of
prior information for λ can be achieved by allowing δt,j in the Dirichlet prior at time
t to depend on λt−1,j.
For the results to follow for the informative prior approach, we specify that
δt,j = θj mt−1,j, where mt−1,j is the mean number of observations allocated to component j
in the previous time period. The parameter θj reflects how strongly the information
from the previous time period is used as prior information for the current period. In
this chapter, we choose to fix θ = 0.5; alternatively we could estimate this parameter
but we do not pursue this approach in this paper.
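The effect of this specification on the weights can be sketched as follows. The sketch assumes the standard conjugate update of a Dirichlet prior with allocation counts; the counts and function names are hypothetical, and `strength` plays the role of the fixed value 0.5 above.

```python
def informed_dirichlet_params(prev_mean_counts, strength):
    """Prior parameters delta_{t,j} = strength * m_{t-1,j}."""
    return [strength * m for m in prev_mean_counts]


def posterior_dirichlet_params(prior_params, current_counts):
    """Conjugate update: Dirichlet prior plus allocation counts at time t."""
    return [a + n for a, n in zip(prior_params, current_counts)]


def dirichlet_mean(params):
    """Mean of a Dirichlet distribution with the given parameters."""
    total = sum(params)
    return [a / total for a in params]


# Assumed allocations (illustration only), with 1000 observations per period.
prev_mean_counts = [200.0, 300.0, 500.0]  # mean allocations at time t-1
current_counts = [350, 250, 400]          # allocations at time t

delta = informed_dirichlet_params(prev_mean_counts, strength=0.5)
post = posterior_dirichlet_params(delta, current_counts)
post_mean = dirichlet_mean(post)
```

With these values the prior acts like 500 pseudo-observations, so the posterior mean weight of the first component (0.30) lies between its share in the previous period (0.20) and its share in the current allocations (0.35).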
8.3.1 Hierarchical time series approach for mixture models
In this section, we outline a hierarchical approach for the estimation of parameters
of a mixture model for multiple time points.
Smoothing on µ
The hierarchical approach for µ is specified as,
$$\mu_{jt} \sim N(\phi_{jt}, V_1), \qquad \phi_{jt} \sim N(\phi_{j,t-1}, V_2) \qquad (8.2)$$
where V1 and V2 are fixed scalars, reflecting the variability of µjt and φjt re-
spectively. In this hierarchical formulation, the parameter µ is used to estimate the
mixture distribution at the level of the data, and φ represents the underlying corre-
lation of µ over time (assuming an AR(1) process). In this setting, we can interpret
the ratio V2/V1 as reflecting the amount of information we have about the underlying
behaviour (signal) of µ in comparison to estimates at the level of the data (noise).
For the first time period (t=1), we set φjt = µjt. For estimation of µ and φ we
use a Gibbs sampling scheme. For details see the Appendix.
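To make the two levels concrete, the sketch below simulates the hierarchy for one component and computes the Gaussian full conditional of φ at one time point given µ and the neighbouring values of φ. It assumes V1 and V2 are variances and starts the random walk from an assumed value `phi0`; it is an illustration of the structure in (8.2) and may differ in detail from the sampler given in the Appendix.

```python
import math
import random


def simulate_hierarchy(T, phi0, V1, V2, rng):
    """Simulate phi_t ~ N(phi_{t-1}, V2) and mu_t ~ N(phi_t, V1)."""
    phi, mu = [], []
    prev = phi0
    for _ in range(T):
        prev = rng.gauss(prev, math.sqrt(V2))      # underlying signal
        phi.append(prev)
        mu.append(rng.gauss(prev, math.sqrt(V1)))  # noisier data-level mean
    return phi, mu


def phi_conditional(mu_t, phi_prev, phi_next, V1, V2):
    """Mean and variance of phi_t given mu_t and its temporal neighbours."""
    precision = 1.0 / V1 + 1.0 / V2
    weighted = mu_t / V1 + phi_prev / V2
    if phi_next is not None:  # interior time point: a second neighbour
        precision += 1.0 / V2
        weighted += phi_next / V2
    return weighted / precision, 1.0 / precision


rng = random.Random(42)
phi, mu = simulate_hierarchy(T=100, phi0=1.5, V1=0.36, V2=0.04, rng=rng)
mean_t, var_t = phi_conditional(mu_t=2.0, phi_prev=1.5, phi_next=1.6, V1=0.36, V2=0.04)
```

When V2 is small relative to V1, the conditional mean of φ stays close to its neighbours and is only weakly pulled towards µ, which is the smoothing behaviour described above.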
Smoothing on λ
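For independent data, a convenient prior for λ is the Dirichlet distribution, which can
be constructed from independent Gamma variables. If
$$W_j \sim \mathrm{Gamma}(\text{shape} = \alpha_j, \text{scale} = 1) \quad \text{(independently)},$$
then
$$V = \sum_{j=1}^{k} W_j \sim \mathrm{Gamma}\Big(\text{shape} = \sum_{j=1}^{k} \alpha_j, \text{scale} = 1\Big), \qquad
(\lambda_1, \ldots, \lambda_k) = (W_1/V, \ldots, W_k/V) \sim \mathrm{Dir}(\alpha_1, \ldots, \alpha_k) \qquad (8.3)$$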
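The Gamma construction in (8.3) can be checked numerically with a short sketch. The values of α and the function name are assumptions made for the example.

```python
import random


def dirichlet_from_gammas(alphas, rng):
    """Draw (lambda_1, ..., lambda_k) by normalising independent Gamma draws."""
    # random.gammavariate(alpha, beta) uses beta as the scale parameter.
    w = [rng.gammavariate(a, 1.0) for a in alphas]
    v = sum(w)
    return [wj / v for wj in w]


rng = random.Random(7)
alphas = [2.0, 3.0, 5.0]  # assumed values, for illustration only
draws = [dirichlet_from_gammas(alphas, rng) for _ in range(20000)]

# Monte Carlo estimate of E(lambda_j), to compare with alpha_j / sum(alpha).
empirical_mean = [sum(d[j] for d in draws) / len(draws) for j in range(len(alphas))]
theoretical_mean = [a / sum(alphas) for a in alphas]
```

Each draw sums to one by construction, and the Monte Carlo means should lie close to the Dirichlet means.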
However, it is difficult to work with a Dirichlet in a time series or hierarchical
approach, mainly due to the inflexibility of the Gamma distribution. An alternative
formulation of the Dirichlet in terms of the Beta distribution does not appear to
provide greater flexibility.
Another alternative is to use a Logistic-Normal prior for λ, LN(λ; Xt, Σd), where
$$W_t \sim MVN(X_t, \Sigma_d), \qquad
\lambda_{j,t} = \frac{\exp(W_{j,t})}{\sum_{l=1}^{k} \exp(W_{l,t})} \qquad (8.4)$$
Using this functional form, the parameterisation of λ in terms of a multivariate
normal distribution allows for a suitably flexible form in which to explore a hierar-
chical structure for this parameter. Such flexibility, in comparison to the Dirichlet
distribution, has recently been investigated in a hierarchical approach for pooling of
estimates across different sampling units (Hoff, 2003).
In a hierarchical setting, and similar to the model used for µ, we can further say
that
$$X_t \sim MVN(X_{t-1}, \Sigma_s), \qquad
\gamma_{j,t} = \frac{\exp(X_{j,t})}{\sum_{l=1}^{k} \exp(X_{l,t})} \qquad (8.5)$$
where Σd and Σs reflect the variability of Wt and Xt respectively. In this
hierarchical formulation, the parameter λ is used to estimate the mixture model at the
level of the data, and γ represents the underlying correlation of λ over time
(assuming an AR(1) process). For the results to follow, the diagonal entries of Σd and Σs
are fixed to reflect the noisiness of the data and the degree of smoothing respectively,
and off-diagonal entries are set to zero. Alternatively, we could estimate Σd and Σs
but we do not pursue this approach in this paper.
For estimation of λ and γ we use a Gibbs sampling scheme with a Metropolis-Hastings
step. For details see the Appendix. For identifiability, both Wt and Xt are
(k − 1)-dimensional, and $\lambda_k = 1 - \sum_{j=1}^{k-1} \lambda_j$ (with the same identification used for γ).
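The two-level structure for the weights can be sketched by simulation. The sketch assumes diagonal Σd and Σs with a common variance on each diagonal, and it implements the identification by fixing the value for the last component at zero before applying the logistic transform; that convention, the variance values and the function names are assumptions made for the example.

```python
import math
import random


def weights_from_reduced(values):
    """Map a (k-1)-vector to k weights, with the k-th value fixed at 0."""
    full = list(values) + [0.0]
    top = max(full)  # subtract the maximum for numerical stability
    expd = [math.exp(v - top) for v in full]
    total = sum(expd)
    return [e / total for e in expd]


def simulate_weight_hierarchy(T, x0, var_d, var_s, rng):
    """Simulate X_t ~ N(X_{t-1}, var_s I), W_t ~ N(X_t, var_d I) and map both to weights."""
    gammas, lambdas = [], []
    x = list(x0)
    for _ in range(T):
        x = [rng.gauss(xj, math.sqrt(var_s)) for xj in x]  # smooth level
        w = [rng.gauss(xj, math.sqrt(var_d)) for xj in x]  # data level
        gammas.append(weights_from_reduced(x))
        lambdas.append(weights_from_reduced(w))
    return gammas, lambdas


# Starting point chosen so that the implied weights are (0.2, 0.3, 0.5).
x0 = [math.log(0.2 / 0.5), math.log(0.3 / 0.5)]
start_weights = weights_from_reduced(x0)

rng = random.Random(11)
gammas, lambdas = simulate_weight_hierarchy(T=100, x0=x0, var_d=0.2, var_s=0.015, rng=rng)
```

Every simulated weight vector is positive and sums to one, and the path of γ moves more slowly than the path of λ because its innovation variance is smaller.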
8.4 Results
In this section we present and assess the results using simulated data and then
present the results of applying the approach to particle size distribution data from
Hyytiala, Finland.
8.4.1 Simulated Data
Data Setup
We simulated data which is indicative of the type of behaviour of aerosol particle
size distribution data observed at Hyytiala, a boreal forest site in Southern Finland
(SMEAR II; Vesala et al., 1998). A particular feature of this particle size distribution
data is growth in both the mean and the weight of a component. Two datasets
are simulated. The first provides an illustration of a particular feature of PSD data
for some time periods and the second is representative of most time periods.
In practice it is quite common to observe sudden large changes in the number
of particles measured which may persist for a number of time periods. This is more
often observed when the number of particles for a particular size group is low, and
more so for the smaller sized particles. For the first dataset (D1) we simulate data
for the first component where the weight at smaller values is quite volatile. For this
dataset the mixture is well identified.
For the second dataset (D2), we simulated data which is highly correlated across
time, a feature of particle size distribution data observed in practice for most time
periods where measurements are commonly taken at small time intervals. This
dataset was simulated with parameter estimates where the mixture is not well iden-
tified during the second half of the time period.
Results from simulated dataset D1
As shown in Figure 8.3 (black line), for the first simulated dataset (D1) we simulated
data for the first component with a mean value increasing slowly over time from 1.5,
and weight increasing from 0.2 to 0.5. For the first half of the time period, the
weight for the first component was simulated with a large degree of noise to reflect
the observed volatility of smaller sized particles in practice at relatively low weights.
The parameter µ was simulated with some noise around the parameter values, σ was
kept constant at 0.55, and the sample size was 1000.
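A dataset with these broad features could be generated along the following lines. This is a rough sketch and not the code used to produce D1: the growth rate of the first mean, the means of the second and third components, the size of the noise and the split of the remaining weight are all assumed, and truncation of the components is ignored.

```python
import random


def simulate_period(n, weights, mus, sigma, rng):
    """Draw n log-diameters from a Gaussian mixture with a common sigma."""
    labels = rng.choices(range(len(weights)), weights=weights, k=n)
    return [rng.gauss(mus[j], sigma) for j in labels]


def simulate_d1_like(T, n, rng):
    """Time series of mixtures in which component 1 grows in mean and weight."""
    series = []
    for t in range(T):
        frac = t / (T - 1)
        mu1 = 1.5 + 0.5 * frac   # assumed slow growth in the mean from 1.5
        lam1 = 0.2 + 0.3 * frac  # weight rising from 0.2 to 0.5
        if t < T // 2:           # noisier weight in the first half
            lam1 = min(max(lam1 + rng.gauss(0.0, 0.05), 0.05), 0.6)
        rest = 1.0 - lam1
        weights = [lam1, 0.4 * rest, 0.6 * rest]  # assumed split of the remainder
        series.append(simulate_period(n, weights, [mu1, 3.5, 5.0], 0.55, rng))
    return series


rng = random.Random(2008)
series = simulate_d1_like(T=100, n=1000, rng=rng)
```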
Also shown in Figure 8.3 are the results from the independent (red) and informed
prior approaches (green). For the results from the informed prior, we can see that
the effect of using an informative prior on the weights (λ) over time results in
compensatory measures by both µ and σ. We can see this most clearly in the results
for the first component where we see large adjustments to µ1 in compensation for
a smoother estimate of λ1 over time, which is clearly in contrast to the actual data
(black) and results from the independent approach (red line). Of interest is that we
don’t see large compensatory measures for the parameters of the third component,
where for the first half of the time period the actual behaviour of the weight (λ3) is
highly variable. The difference appears to be that in the first component, the mean is
able to adjust to a higher value which is supportive of a greater weight, and in some
sense borrow support from the second component. For the third component, the
mean is not able to increase or decrease in support of a lower weight by borrowing
support from a nearby component. Both the independent and informed prior
approaches overestimate the weight in the second component.
Figure 8.3: Plot of estimated parameters over time for simulated dataset D1, shown separately for Components 1, 2 and 3: µ (top panels), σ (middle panels), λ (bottom panels). Actual data (Black), Independent (Red), Informed Prior (Green).
Figure 8.4 provides the results of the hierarchical model for µ (green and blue)
and the actual data (black). For these results, V1 and V2 are fixed at 0.36 and 0.04
respectively. From Figure 8.4, we can see that the estimates for φ are a smooth
version of the much noisier estimates for µ.
Figure 8.4: Plot of estimated parameters over time for simulated dataset D1, shown separately for Components 1, 2 and 3: µ (top panels), σ (middle panels), λ (bottom panels). Actual data (Black), Hierarchical approach for µ (Dark Green), φ (Blue).
Figure 8.5 provides the results of the hierarchical model for λ (green and blue)
and the actual data (black). For these results, V1 and V2 are fixed at 0.2 and
0.015 respectively. From Figure 8.5 the estimates for γ roughly follow the more
variable estimates for λ over time. In contrast to the results from the informed
prior (Figure 8.3), we don’t see any large compensatory adjustments being made
to parameters by using a hierarchically based informative prior for γ. Apart from
estimates for γ, the variability of the other parameter estimates is comparable to
the results from the independent approach.
Figure 8.5: Plot of estimated parameters over time for simulated dataset D1, shown separately for Components 1, 2 and 3: µ (top panels), σ (middle panels), λ (bottom panels). Actual data (Black), Hierarchical approach for λ (Dark Green), γ (Blue).
Results from simulated dataset D2
As shown in Figure 8.6 (black line), for the second simulated dataset (D2) we sim-
ulated data for the first component with a mean value increasing from 1.5 to 3.0,
and weight increasing from 0.1 to 0.6 and then decreasing to 0.3, over time. Often a
consequence of the growth in the first component is a decline in size and weight for
the larger sized particles, and this is reflected in the weight for the second component
following an opposite pattern to the first component. For the third component, the
weight increases from 0.1 to 0.3 over time. The parameters µ and λ are simulated
with some noise around the parameter values, and the sample size is 1000.
Figure 8.6: Plot of estimated parameters over time for simulated dataset D2, shown separately for Components 1, 2 and 3: µ (top panels), σ (middle panels), λ (bottom panels). Actual data (Black), Independent (Red), Informed Prior (Green).
Also shown in Figure 8.6 are the results from the independent (red) and informed
prior approaches (green). For the results of the informed prior, the parameter esti-
mates appear to closely follow the actual data in comparison with the independent
approach. Of interest is the closeness of the parameter estimates of µ and σ for com-
ponents 1 and 2 over the second half of the time period, which more clearly follow
the true growth occurring in component 1 and the stability over time for component
2.
Figure 8.7 provides the results of the hierarchical model for µ (green and blue)
and the actual data (black). For these results, V1 and V2 are fixed at 0.04 and
0.0025 respectively. The parameter estimates for the mean of the first and second
components appear to be lower than the actual data for the last quarter of the time
period, which is similar to the results of the independent approach (Figure 8.6).
Figure 8.7: Plot of estimated parameters over time for simulated dataset D2, shown separately for Components 1, 2 and 3: µ (top panels), σ (middle panels), λ (bottom panels). Actual data (Black), Hierarchical approach for µ (Dark Green), φ (Blue).
Figure 8.8 provides the results of the hierarchical model for λ (green and blue)
and the actual data (black). For these results, V1 and V2 are fixed at 0.025 and
0.0075 respectively. The parameter estimates for the hierarchical approach appear
to more closely follow the actual data than for the independent approach.
Figure 8.8: Plot of estimated parameters over time for simulated dataset D2, shown separately for Components 1, 2 and 3: µ (top panels), σ (middle panels), λ (bottom panels). Actual data (Black), Hierarchical approach for λ (Dark Green), γ (Blue).
8.4.2 Case study
The data set studied here was taken from a measurement site at Hyytiala, Finland
and a plot of the measurements for the day selected is shown in Figure 8.2. This
particular day was selected as it shows a new particle formation event occurring,
whereby a new mode of aerosol particles appears with a significant influx of particles
(as high as $10^{6}\,\mathrm{cm}^{-3}$) with a geometric mean diameter below 10 nm, growing later into
the Aitken (25-90 nm) or accumulation (100+ nm) modes. In terms of a mixture
model setting, we will be able to assess the performance of the three approaches
outlined previously as new components are introduced and both a growth in the
mean and weight for those components are observed.
As outlined in Section 8.3, the first stage of our approach is to apply RJMCMC
to each time period. These results are then used to guide the choice of the number
of components and initial parameter estimates for the second stage analysis, in
which temporally correlated priors are used to model the evolution of the mixture
parameters over time. Figure 8.9 shows the results of the first stage of the algorithm,
with a plot of the posterior mean estimates for µjt at each time point t, with the
size of the circles indicating the corresponding weight λjt. The average number
of components estimated with the highest probability over the day was four; the
minimum number of components was one, and the maximum number of components
was five.
For the second stage, we fixed the number of components to be five. Figure 8.10
shows the results of estimation using the independent approach.
Figure 8.11 shows the results of estimation using the hierarchical model for the
weights (λ). For these results, V1 and V2 are 0.05 and 0.015 respectively. Compared
to the results from the independent approach (Figure 8.10), we see a noticeable
reduction in the noise surrounding λ and a clearer picture emerging of the pattern
of λ over the course of the day.
Alternatively, we could have used a hierarchical approach for µ alone, or for both
µ and λ. In results not shown, we found similar results for both approaches.
Figure 8.9: Plot of posterior mean estimates for µj from RJMCMC algorithm for one day (Hyytiala). Stage 1 of analysis for temporal evolution of parameters. Larger circles indicate greater weight for that component. Horizontal axis: time; vertical axis: posterior mean estimates for µ.
Figure 8.10: Plot of estimated parameters over time (00:00 to 23:50) for actual data. Independent approach. Posterior mean estimates for µ (top panel), and λ (bottom panel).
Figure 8.11: Plot of estimated parameters over time (00:00 to 23:50) for actual data. Hierarchical approach for λ. Posterior estimates for µ (top panel), and γ (bottom panel).
8.5 Discussion
In this chapter, we explored the problem of estimating Bayesian mixture models at
multiple time points. In this setting, parameters of the mixture model at each time
point are likely to be correlated with neighbouring time points and useful information
about the parameters may be gained by incorporating this information in estimation.
We found that using a hierarchical approach to the estimation of parameters, where
an informative prior is placed at two different levels, offers considerable flexibility in
estimation for a mixture model setting.
Compared to placing an informative prior at a single level, a hierarchical ap-
proach allows for a separation of the underlying pattern of the parameter over time
(signal) from some of the noise surrounding the parameter at each time point. The
advantage of this is twofold. First, where interest lies in the underlying
pattern of the parameters over time, we may be able to more clearly establish pat-
terns or identify anomalies from the data. Second, in light of the large degree of
dependency that exists between parameters of a mixture both within and between
components, we can impose an informative prior which may be less sensitive to
changes in the correlation structure of the data, and thereby reduce the influence of
adjustments to neighbouring parameters.
In the hierarchical approach outlined, the influence of the informative prior at
the two levels was specified by parameters V1 (low level) and V2 (high level), and
the values assigned to these parameters are critical in carrying information about
the correlation structure of the parameter of interest. In this chapter, we decided
to choose parameter values based on prior belief in the correlation structure of the
data; alternatively these parameters could be estimated. To this effect, a number
of approaches are available for estimation (West and Harrison (1997); Fahrmeir
et al. (2004)). However, in order to estimate V1 and V2, we still face a choice as to
the degree of penalisation or smoothing of the parameter in light of the apparent
variability in the data. This is a common issue in temporal and spatial modelling
in general.
Although we have focussed on developing a hierarchical approach for parameters
µ and λ, we could equally apply the same approach to estimation of σ. One
option is a half-t prior distribution, which has previously been used
in similar hierarchical settings (Gelman (2006)). Faced with a number of parameters
to consider, the choice of parameter or parameters may depend on the objectives of
the analysis, the data context and the information available. As most interest for
PSD data is in the size and composition of particles over time, we found it useful to
concentrate on µ and λ over time; in other contexts this will change.
Although we have applied the hierarchical approach to PSD data, the approach
is generalisable to other contexts in which a mixture representation exists at multiple
time points. For example, in a disease mapping context interest may be in both the
mixture representation of the spatial surface and also in any temporal changes to
the mixture.
The hierarchical approach considered here can be readily generalised to include
covariates. Moreover, through the flexibility of assuming a logistic normal distribution
on the weights we can better explore and estimate transitory movements between
components.
There are several limitations to the hierarchical approach considered. First, the
hierarchical approach relies on estimation of parameters under a fixed number of
components. In this chapter, we sought to fix the number of components based on
a first stage analysis in which we used results from the RJ approach as a guide to
the maximum number of components and for establishing hyperparameter values.
In some situations, where reliable prior information is available this first stage may
not be necessary. However, an alternative is to use a single approach and jointly
allow estimation of the parameters and the number of components (e.g. RJMCMC).
This single modelling approach requires reversible moves not only within time pe-
riods but also across them. In our experience this was computationally very costly
and required substantial pre-processing to ensure good mixing, labelling and conver-
gence. Moreover, further post-processing was required to obtain adequate summary
statistics and between component mapping.
A further limitation of the approach outlined is that it is computationally expensive.
Most of this expense is incurred in the first stage of the analysis. For
estimation of PSD data over one day using 144 time points, the running time of
the RJ approach with 200,000 iterations was about 27 hours. In comparison, the
second stage approach using 50,000 iterations took about an hour. Such computational
expense quickly becomes burdensome if analysis is required for several days
or indeed several weeks. Of course, the first stage may not be required for
subsequent days, which would considerably reduce the computational time involved.
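To make the scale of this cost concrete, the short sketch below converts the running times quoted above into seconds per iteration and extrapolates to several days. The function names and the assumption that cost scales linearly with the number of days are illustrative only and are not taken from the analysis itself.

```python
def seconds_per_iteration(hours, iterations):
    """Average wall-clock seconds per MCMC iteration."""
    return hours * 3600.0 / iterations


def total_hours(days, stage1_hours=27.0, stage2_hours=1.0, rerun_stage1=True):
    """Rough total running time, assuming cost scales linearly with days.

    If rerun_stage1 is False, the first (RJ) stage is run once only and its
    results are reused for the remaining days (a hypothetical scenario).
    """
    stage1_runs = days if rerun_stage1 else 1
    return stage1_runs * stage1_hours + days * stage2_hours


stage1_rate = seconds_per_iteration(27, 200_000)  # RJ first stage
stage2_rate = seconds_per_iteration(1, 50_000)    # hierarchical second stage
print(stage1_rate, stage2_rate)
print(total_hours(7), total_hours(7, rerun_stage1=False))
```

Under these assumptions, reusing the first stage across a week removes most of the running time, which is the point made in the closing sentence of the paragraph.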
Chapter 9
Conclusions and further work
This chapter provides a brief overview of the thesis and some suggestions for further
work.
9.1 Conclusions
The primary aim of this thesis was to develop mixture model approaches to char-
acterise complex environmental exposures and outcomes. To address this primary
aim, we focussed on a number of applied problems in characterising complex en-
vironmental exposures and outcomes, including: assessing the interaction between
environmental exposures as risk factors for health outcomes; identifying differing
environmental outcomes across a region; and establishing patterns in the size and
concentration of aerosol particles over time. In this section, we discuss the four main
methodological contributions to address these problems and associated applied con-
tributions which have been made.
First, we explored the use of a mixture model in a meta-analysis setting to provide
for a joint assessment of the evidence for a number of hypothesised relationships in
the data. In Chapter 3, we examined the use of a multivariate meta-analysis to
describe the relationship between exposure to asbestos and smoking on the risk of
lung cancer. In particular, from a statistical perspective, interest was in whether
the risk from exposure to both asbestos and smoking is an additive, multiplicative
or other relation of the risk from exposure to each factor alone. In this analysis, we
considered the evidence for either relation using separate tests.
In Chapter 4, we extended the analysis in Chapter 3 and explored a mixture
model approach to assess the strength of evidence for either relation. In this ap-
proach, we moved away from separate tests for either an additive or multiplicative
relation and allowed the data to choose between both models. The approach allowed
both relations to be considered at the same time, and an advantage for inference is
that we can say with some probability whether the data belongs to one relation or
another. This type of inference may be more informative than information provided
from significance tests on each relation separately.
Second, we developed a simple mixture model approach to classify cases of a
disease over time into a number of groups. In Chapter 5, we examined a mixture
model approach to characterise the risk of Ross River virus (RRv) in Queensland.
This approach built on the approach adopted by Gatton et al. (2004) and considered
that the weekly cases of RRv could be attributed to more than two hypothesised
periods (an outbreak period or no outbreak period), and also extended the analysis
to compare the number and timing of the periods across the spatial region of QLD.
In this approach, we may be able to better identify outbreak periods when they
occur and also provide a more detailed characterisation of the data, which can be
used as a basis for association of explanatory variables.
Third, we developed and examined an informative prior approach for estimation
of mixture model parameters for multiple time points. A mixture model approach
to estimate aerosol particle size distribution (PSD) data over time was introduced
in Chapters 6, 7 and 8. In Chapter 6, we compared the results of using a Bayesian
mixture model approach to estimating PSD data with a commonly used estima-
tion method in the aerosol physics literature. In using a Bayesian mixture model
approach we were able to improve upon previous approaches by providing a better
exploration of the parameter space, and also allow the data to better choose between
alternative representations without the use of subjective decisions. As PSD data is
often measured over time at small time intervals, we also examined the use of an
informative prior for estimation of the mixture parameters which takes into account
the correlated nature of the parameters.
In Chapter 7, we examined in some detail the issue of using informative priors for
estimation of mixtures at multiple time points. In this analysis, the use of two dif-
ferent informative priors, and an independent prior were compared using simulated
and actual data. In general, we found that approaches that employ information
about neighbouring time points compared favourably to results based on an inde-
pendent approach. We found that by using informative priors about parameters for
correlated time periods we may be able to better identify individual components at
each time point. As an aid for inference, we may also be able to obtain smoother pa-
rameter estimates over time and from this be able to more clearly establish patterns
or identify anomalies from the data.
Analysis of the evolution of parameters of a mixture over multiple time points
also highlighted the large degree of dependency that exists between component pa-
rameters. A possible effect of using informative priors in this context is to impose a
prior not supported by the data or to impose a temporal correlation structure where
such a structure does not exist, and thereby cause unnecessary adjustments to other
parameters.
Fourth, we introduced a hierarchical approach to estimate mixture model pa-
rameters for multiple time points. In this approach (Chapter 8), we addressed some
of the issues associated with using an informative prior at a single level found in
Chapter 7, and allowed an informative prior to be placed at two different levels.
Compared to placing an informative prior at a single level, a hierarchical approach
allows for a separation of the underlying pattern of the parameter over time (signal)
from some of the noise surrounding the parameter at each time point. In this case,
we may be able to more clearly establish patterns or identify anomalies in the data.
We can also impose an informative prior which is less sensitive to changes in the
correlation structure of the data, and thereby reduce the influence of adjustments
to neighbouring parameters.
In summary, we have demonstrated that a mixture model approach can be used
to better understand and describe features/relationships within environmental ex-
posure data. The approach is not without significant computational and estimation
issues, and thus considerable care must be taken in using the approach for inference.
These issues, however, are likely to be outweighed by the additional information this
approach can provide to understand complex environmental exposure and outcome
data.
9.2 Future Work
The mixture models and analysis in this thesis could be extended in a number of
ways.
A mixture model approach to provide an assessment of interaction or relation-
ship between risk factors in a meta-analysis context could be extended to include
alternative relations or be used to assign preference to more than two relations. The
number of relations and the nature of the hypothesised relations would depend
on the context of the study.
We could extend the mixture model to characterise the risk of RRv over time
to formally include a spatial dimension, where mixture model parameters for each
zone are able to borrow strength from parameter estimates of neighbouring zones.
This is similar to the approach adopted in Fernandez and Green (2002) for a single
time point, in which the weight parameter is spatially related by neighbouring sites.
Further analysis of the RRv data would be needed, however, to investigate which
parameters may be spatially related, including the timing of components.
For estimation of mixture models over time (Chapters 6, 7 and 8), a number of
extensions are possible. First, improvements to parameter estimation may be gained
by reducing the influence of the truncated nature of the size data (i.e. the effect of
binning on the size of the particles). In estimation, we could take into account
the ordering of the size bins. In this case, we recognize that observations within
neighbouring size bins are more likely to be allocated to the same component. A
natural approach would be to then use a spatial prior on the allocation variable (z)
(similar to Alston et al. (2005)), and depending on the strength of prior information,
this could reduce the number of components covering only a small number of size
bins.
To reduce the influence of the truncated nature of the size data, we could also
expand the number of size bins used in estimation. In this approach, a number of
extra size bins are created between the original size bins and handled in estimation
as missing data. This is likely to lead to a smoother mixture representation
of the data. The trade-off is that computational time increases by a factor of the
number of additional size bins created, and this would need to be evaluated against
potential improvements in estimation.
Within the MCMC framework, block updating rather than sequential updating
could be used in the hierarchical approach to minimise the effect of correlation
between parameters leading to improved convergence and mixing. This is likely to be
of most benefit for dependencies which are apparent between µ and φ (Equation 8.2)
or λ and γ (Equation 8.5).
Any improvements to computational time are worthy of investigation. Analyses
requiring several weeks or months will require significant computational demands.
Population Monte Carlo (Celeux et al. (2003)), or perfect sampling (Casella et al.
(2004)) could be investigated and developed to allow for estimation of a mixture
over multiple time points.
The hierarchical approach to estimation could also be extended to include a hi-
erarchical structure for the variance (σ2), and alternative correlation structures. For
the variance, a flexible prior such as a truncated t-distribution could be investigated
(Gelman (2006)). The correlation structure could also be extended to include covari-
ates, which could provide further information to aid in identification of components
at each time point.
Further analysis could also be undertaken to associate the components (modal
structure) of the mixture model with health outcomes. Evidence on the association
of air pollution particles with a number of respiratory related diseases is growing
(Osunsanya et al. (2001); Chen et al. (2006)). Such a detailed characterisation of the
data would enable a more representative association to be obtained with either the
source of the particles or a range of particles of a particular size and concentration.
Appendix A
Appendices
A.1 Calculations for the variance of S and V (Ch.3)
Variance of S
We calculated the variance of S based on Rothman (1976). A large sample interval
estimator for S based on a log-Gaussian sampling distribution would be
\[
S_L = \exp\bigl(\ln(S) - Z_{1-\alpha/2}\,\mathrm{SE}(\ln(S))\bigr), \qquad
S_U = \exp\bigl(\ln(S) + Z_{1-\alpha/2}\,\mathrm{SE}(\ln(S))\bigr) \tag{A-1}
\]
The evaluation of SE(ln(S)) depends upon the type of study. For case-control
studies,
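\[
\mathrm{SE}(\ln(S)) = \left[
\frac{\widehat{\mathrm{var}}(\widehat{RR}_{AS})}{(\widehat{RR}_{AS}-1)^2}
+ \frac{\widehat{\mathrm{var}}(\widehat{RR}_{S}) + \widehat{\mathrm{var}}(\widehat{RR}_{A}) + 2\,\widehat{\mathrm{cov}}(\widehat{RR}_{S},\widehat{RR}_{A})}{(\widehat{RR}_{S}+\widehat{RR}_{A}-2)^2}
- \frac{2\,\widehat{\mathrm{cov}}(\widehat{RR}_{AS},\widehat{RR}_{S}+\widehat{RR}_{A})}{(\widehat{RR}_{AS}-1)(\widehat{RR}_{S}+\widehat{RR}_{A}-2)}
\right]^{1/2}, \tag{A-2}
\]
where
\[
\widehat{\mathrm{var}}(\widehat{RR}_{ij}) = \widehat{RR}_{ij}^{\,2}\left(\frac{1}{a_{ij}} + \frac{1}{c_{ij}} + \frac{1}{b} + \frac{1}{d}\right) \tag{A-3}
\]
\[
\widehat{\mathrm{cov}}(\widehat{RR}_{S},\widehat{RR}_{A}) = \widehat{RR}_{S}\,\widehat{RR}_{A}\left(\frac{1}{b} + \frac{1}{d}\right) \tag{A-4}
\]
\[
\widehat{\mathrm{cov}}(\widehat{RR}_{AS},\widehat{RR}_{S}+\widehat{RR}_{A}) = \widehat{RR}_{AS}\,(\widehat{RR}_{S}+\widehat{RR}_{A})\left(\frac{1}{b} + \frac{1}{d}\right), \tag{A-5}
\]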
and b and d denote the frequencies of cases and controls in the low-risk category for
both risk indicators, and aij and cij denote the frequencies of cases and controls in
(non-referent) risk category i, j.
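The case-control calculation in (A-1) to (A-5) can be sketched in a few lines of Python. This is a minimal illustration only. It assumes that S is the synergy index (RRAS − 1)/(RRS + RRA − 2), which matches the denominators in (A-2), and that each relative risk is estimated by the corresponding odds ratio. The function name, argument names and the counts in the example are hypothetical.

```python
import math


def synergy_index_ci(a_as, c_as, a_s, c_s, a_a, c_a, b, d, z=1.96):
    """Large-sample interval for S from case-control counts, following (A-1)-(A-5).

    a_* / c_* are cases / controls in each non-referent category,
    b / d are cases / controls in the referent (low-risk) category.
    """
    ref = 1.0 / b + 1.0 / d  # term shared by all estimates via the referent cells
    # Assumption of this sketch: relative risks are estimated by odds ratios.
    rr_as = (a_as * d) / (c_as * b)
    rr_s = (a_s * d) / (c_s * b)
    rr_a = (a_a * d) / (c_a * b)
    num, den = rr_as - 1.0, rr_s + rr_a - 2.0
    s = num / den  # assumed definition of S

    def var_rr(rr, a, c):  # (A-3)
        return rr ** 2 * (1.0 / a + 1.0 / c + ref)

    cov_s_a = rr_s * rr_a * ref                 # (A-4)
    cov_as_sum = rr_as * (rr_s + rr_a) * ref    # (A-5)
    var_ln_s = (var_rr(rr_as, a_as, c_as) / num ** 2
                + (var_rr(rr_s, a_s, c_s) + var_rr(rr_a, a_a, c_a)
                   + 2.0 * cov_s_a) / den ** 2
                - 2.0 * cov_as_sum / (num * den))  # (A-2) before the square root
    se = math.sqrt(var_ln_s)
    return s, s * math.exp(-z * se), s * math.exp(z * se)  # (A-1)


# Hypothetical counts, for illustration only.
s, lower, upper = synergy_index_ci(40, 10, 30, 30, 20, 20, 10, 40)
print(s, lower, upper)
```

Because the interval is built on the log scale, it is asymmetric around S and its lower limit stays positive whenever S is positive.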
For cohort studies (with small effects), using first order Taylor series approximations,
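\[
\mathrm{SE}(\ln(S)) = \left[
\frac{\widehat{\mathrm{var}}(\widehat{R}_{AS}) + \widehat{\mathrm{var}}(R_{00})}{(\widehat{R}_{AS}-R_{00})^2}
+ \frac{\widehat{\mathrm{var}}(R_{S}) + \widehat{\mathrm{var}}(R_{A}) + 4\,\widehat{\mathrm{var}}(R_{00})}{(R_{S}+R_{A}-2R_{00})^2}
- \frac{4\,\widehat{\mathrm{var}}(R_{00})}{(\widehat{R}_{AS}-R_{00})(R_{S}+R_{A}-2R_{00})}
\right]^{1/2} \tag{A-6}
\]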
where ˆvar(Rij) can be taken as Rij/Mij with Mij denoting the total number of
observations in the joint risk indicator category i, j.
Variance of V
For case-control studies, V can also be expressed as RRAS compared to RRS (X2),
divided by RRA (X1), that is, V = X2/X1.
var(log(X1)) = 1/a + 1/b + 1/c + 1/d
var(log(X2)) = 1/e + 1/f + 1/g + 1/h
var(log(V )) = var(log(X1)) + var(log(X2))
(A-7)
where a to h denote the frequency of cases and controls for each risk category, and
X1 and X2 are assumed to be independent.
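As a small worked illustration of (A-7), the sketch below sums the reciprocal cell counts and forms a log-scale interval for V. The interval step mirrors (A-1) and is an assumption of this sketch rather than something stated for V in the text. The function name and the counts are hypothetical.

```python
import math


def log_v_interval(v, cells_x1, cells_x2, z=1.96):
    """Variance of log(V) as in (A-7), with a log-scale interval for V.

    cells_x1 holds the counts (a, b, c, d) and cells_x2 holds (e, f, g, h).
    X1 and X2 are treated as independent, as stated in the text.
    """
    var_log_x1 = sum(1.0 / n for n in cells_x1)
    var_log_x2 = sum(1.0 / n for n in cells_x2)
    var_log_v = var_log_x1 + var_log_x2
    se = math.sqrt(var_log_v)
    # Assumed interval form, by analogy with (A-1).
    return var_log_v, v * math.exp(-z * se), v * math.exp(z * se)


# Hypothetical counts, for illustration only.
print(log_v_interval(1.2, (10, 20, 30, 40), (10, 20, 30, 40)))
```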
For cohort studies with background risk not externally referenced,
var(log(V )) = 1/a + 1/b + 1/e + 1/f (A-8)
For cohort studies with background risk externally referenced, we used the variance
for the ratio of two standardised ratios found in Gardner and Altman (1989).
A.2 Reversible Jump Markov Chain Monte Carlo
(RJMCMC) (Ch.6)
In this section, we outline details of the RJMCMC algorithm used in this chapter.
An important feature of the algorithm is the variable dimension of the state spaces.
The change of dimension for the mixture model using Reversible Jump Markov Chain
Monte Carlo (RJMCMC) is achieved by either splitting an existing component into
two separate components (increasing the dimension of the model by one component)
or merging two existing components into a single component, commonly known as
the split/merge step of the algorithm.
To split a component, a vector of continuous random variables (u), which are
independent of the current model, are drawn and applied in an invertible determin-
istic function to propose a new model. The proposal is designed to be deterministic
in order that the reverse of the split move, the corresponding merge move, can be
obtained through the inverse transformation of the function.
The other dimension changing moves proposed in RJMCMC are the addition of
a new component or the removal of an empty component which is currently in the
model. These proposals are referred to as births and deaths, respectively.
The Normal mixture model is computed iteratively as follows:
1. Given (λ,µ,σ), update allocation vector z,
2. Given (k, µ,σ), update estimates of the weights λ,
3. Given (k, λ), update Normal component parameters µj and σ2j , j ∈ {1, · · · , k}
4. Update hyperparameters as required,
5. Propose a split or merge for the components in the current model, and accept
with the probability given in (A-14).
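In this scheme, steps 1-4 do not involve changes in dimension and are updated
using standard Gibbs moves outlined below, with the following conjugate priors:
\begin{align}
p(\mu_j) &\sim N(\xi, \kappa^{-1}) \tag{A-9}\\
p(\sigma_j^{-2}) &\sim \mathrm{Gamma}(\delta, \beta) \tag{A-10}\\
p(\beta) &\sim \mathrm{Gamma}(g, h) \tag{A-11}\\
p(\lambda) &\sim \mathrm{Dirichlet}(\alpha_1, \alpha_2, \cdots, \alpha_k) \tag{A-12}\\
p(k) &\sim \mathrm{Uniform}(k_{min}, k_{max}) \tag{A-13}
\end{align}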
where ξ, κ, δ, α, η, g, h, kmin and kmax are fixed hyperparameters. Note, in this
case all µj follow a universal prior.
We can construct these prior distributions to be weakly informative and use their
conjugacy to obtain proper posterior distributions for the unknown mixture model
parameters.
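Steps 1 to 3 for a fixed number of components can be sketched as a single Gibbs sweep. This is a minimal illustration under stated assumptions, not the code used in the thesis: the full conditionals are the standard conjugate results for the priors (A-9), (A-10) and (A-12), a common Dirichlet parameter alpha is used for all components, Gamma(δ, β) is read as shape and rate, and the hyperparameter update (step 4) and the split/merge move (step 5) are omitted. All function and argument names are hypothetical.

```python
import math
import random


def normal_pdf(y, mu, sigma2):
    return math.exp(-0.5 * (y - mu) ** 2 / sigma2) / math.sqrt(2 * math.pi * sigma2)


def gibbs_sweep(y, lam, mu, sigma2, xi, kappa, delta, beta, alpha, rng):
    """One sweep of steps 1-3 for a k-component Normal mixture (k fixed)."""
    k = len(lam)
    # Step 1: allocate y_i to component j with probability
    # proportional to lam_j * N(y_i | mu_j, sigma2_j).
    z = []
    for yi in y:
        w = [lam[j] * normal_pdf(yi, mu[j], sigma2[j]) for j in range(k)]
        z.append(rng.choices(range(k), weights=w)[0])
    n = [z.count(j) for j in range(k)]
    # Step 2: weights from Dirichlet(alpha + n), drawn as normalised Gammas.
    g = [rng.gammavariate(alpha + n[j], 1.0) for j in range(k)]
    total = sum(g)
    lam = [gj / total for gj in g]
    # Step 3: component means, then precisions, from their full conditionals.
    new_mu, new_sigma2 = [], []
    for j in range(k):
        yj = [yi for yi, zi in zip(y, z) if zi == j]
        prec = n[j] / sigma2[j] + kappa
        mean = (sum(yj) / sigma2[j] + kappa * xi) / prec
        m = rng.gauss(mean, math.sqrt(1.0 / prec))
        rate = beta + 0.5 * sum((yi - m) ** 2 for yi in yj)
        tau = rng.gammavariate(delta + 0.5 * n[j], 1.0 / rate)  # precision draw
        new_mu.append(m)
        new_sigma2.append(1.0 / tau)
    return z, lam, new_mu, new_sigma2


# Illustrative run on simulated data (all values hypothetical).
rng = random.Random(1)
y = [rng.gauss(0, 1) for _ in range(50)] + [rng.gauss(8, 1) for _ in range(50)]
state = gibbs_sweep(y, [0.5, 0.5], [0.0, 8.0], [1.0, 1.0],
                    xi=4.0, kappa=0.01, delta=2.0, beta=1.0, alpha=1.0, rng=rng)
print(state[1], state[2], state[3])
```

In practice the sweep would be repeated many times, with the reversible jump step interleaved to change k.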
Step 5 requires the reversible jump mechanism of the algorithm. The choice
between a split or merge move is made with equal probability, with the only exception
being at the extremes of the allowable range for k (if k = kmin, the probability of
proposing a split move is 1, and if k = kmax, the probability of a split move is 0).
To propose a split move, Richardson and Green (1997) generate a 3 dimensional
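random vector u using beta distributions,
\[
u_1 \sim \mathrm{beta}(2, 2), \quad u_2 \sim \mathrm{beta}(2, 2), \quad u_3 \sim \mathrm{beta}(1, 1),
\]
and randomly choose one of the current k components to be split. For example, we
will assume component j is chosen for a split move. The proposed transformation
of variables $(\theta^{(n)}, u_{\theta_n}) = T_{m\to n}(\theta^{(m)}, u_{\theta_m})$ is
\[
\begin{aligned}
\lambda_{j1} &= u_1\lambda_j, & \lambda_{j2} &= (1-u_1)\lambda_j,\\
\mu_{j1} &= \mu_j - u_2\sigma_j\sqrt{\frac{\lambda_{j2}}{\lambda_{j1}}}, & \mu_{j2} &= \mu_j + u_2\sigma_j\sqrt{\frac{\lambda_{j1}}{\lambda_{j2}}},\\
\sigma^2_{j1} &= u_3(1-u_2^2)\,\sigma_j^2\,\frac{\lambda_j}{\lambda_{j1}}, & \sigma^2_{j2} &= (1-u_3)(1-u_2^2)\,\sigma_j^2\,\frac{\lambda_j}{\lambda_{j2}},
\end{aligned}
\]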
where dim(n) > dim(m). The allocation vector zi, where zij = 1, is redrawn so that
the data which is currently allocated to component j is now reallocated to either
component j1 or j2.
The split proposal is the reverse of the merge proposal for components j1 and
j2. To propose the merger of 2 components, the parameters of the mixture model
for these components are reassigned by matching the 0th, 1st and 2nd moments for
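the distribution:
\[
\begin{aligned}
\lambda_j &= \lambda_{j1} + \lambda_{j2},\\
\lambda_j\mu_j &= \lambda_{j1}\mu_{j1} + \lambda_{j2}\mu_{j2},\\
\lambda_j\left(\mu_j^2 + \sigma_j^2\right) &= \lambda_{j1}\left(\mu_{j1}^2 + \sigma_{j1}^2\right) + \lambda_{j2}\left(\mu_{j2}^2 + \sigma_{j2}^2\right).
\end{aligned}
\]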
The allocation vector zi, where zij1 = 1 or zij2 = 1 is amalgamated so that the
allocation becomes zij = 1.
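The split and merge maps above are inverses of one another, which can be checked numerically. The sketch below is illustrative only: it implements the two transformations for a single component, leaves out the reallocation of z and the acceptance step, and uses hypothetical function names and input values.

```python
import math


def split(lam, mu, sigma2, u1, u2, u3):
    """Split one component (lam, mu, sigma2) into two using (u1, u2, u3)."""
    lam1, lam2 = u1 * lam, (1.0 - u1) * lam
    sd = math.sqrt(sigma2)
    mu1 = mu - u2 * sd * math.sqrt(lam2 / lam1)
    mu2 = mu + u2 * sd * math.sqrt(lam1 / lam2)
    s1 = u3 * (1.0 - u2 ** 2) * sigma2 * lam / lam1
    s2 = (1.0 - u3) * (1.0 - u2 ** 2) * sigma2 * lam / lam2
    return (lam1, mu1, s1), (lam2, mu2, s2)


def merge(c1, c2):
    """Merge two components by matching the 0th, 1st and 2nd moments."""
    (lam1, mu1, s1), (lam2, mu2, s2) = c1, c2
    lam = lam1 + lam2
    mu = (lam1 * mu1 + lam2 * mu2) / lam
    sigma2 = (lam1 * (mu1 ** 2 + s1) + lam2 * (mu2 ** 2 + s2)) / lam - mu ** 2
    return lam, mu, sigma2


# Splitting and then merging recovers the original component (up to rounding).
c1, c2 = split(0.4, 2.0, 1.5, u1=0.3, u2=0.5, u3=0.6)
print(merge(c1, c2))
```

This round trip is what makes the pair of moves reversible: the merge is fully determined by the two components, and the split is determined by the component together with u.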
In the case of a split move, the probability of acceptance for the move from model
Mm to Mn is
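\[
\min\left(
\frac{\pi(n, \theta^{(n)})}{\pi(m, \theta^{(m)})}\,
\frac{\pi_{nm}}{\pi_{mn}}\,
\frac{g(u_{\theta_n})}{g(u_{\theta_m})}\,
\left|\frac{\partial T_{m\to n}(\theta^{(m)}, u_{\theta_m})}{\partial(\theta^{(m)}, u_{\theta_m})}\right|,\ 1
\right) \tag{A-14}
\]
involving the Jacobian of the transform $T_{m\to n}$, the probability $\pi_{mn}$ of choosing a
jump to $M_n$ while in $M_m$, and g, the density of u. The acceptance probability for
the merge move uses the inverse of this ratio.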
A.3 Penalised Prior (Ch.6)
In this section we outline the rejection sampling algorithm for λ proposed by Gustafson
and Walker (2003) for the penalised prior approach.
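Prior
\[
p(\lambda) \propto \mathrm{Dirichlet}(1, \ldots, 1)\,\exp\left(\sum_{t=2}^{T} \frac{\|\lambda_{t,j} - \lambda_{t-1,j}\|^2}{\phi}\right) \tag{A-15}
\]
Posterior
\[
p(\lambda \mid \phi, m) \propto \prod_{j=1}^{k}\left\{\prod_{t=1}^{T} f(\lambda_{jt} \mid m_{jt}+1)\, I(\lambda_{jt})\right\}
\exp\left(\sum_{t=2}^{T} \frac{\|\lambda_{t,j} - \lambda_{t-1,j}\|^2}{\phi}\right) \tag{A-16}
\]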
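Gustafson and Walker (2003) suggest sampling $\lambda_{jt}/s$ from a Beta$(m_{jt}+1, m_{kt}+1)$ distribution
and accepting when $U \le g_1(\lambda_{jt})/g_2(\lambda_{jt})$, with $U \sim U(0, 1)$, where
\[
\begin{aligned}
g_1(\lambda_{jt}) = {} & \lambda_{jt}^{m_{jt}}\,(s - \lambda_{jt})^{m_{kt}}\, I(\lambda_t)\\
& \times \exp\bigl[-\phi^{-2}\bigl\{(\lambda_{t,j} - \lambda_{t-1,j})^2 + (\lambda_{t,j} - (s - \lambda_{t-1,k}))^2\\
& \qquad + (\lambda_{t,j} - \lambda_{t+1,j})^2 + (\lambda_{t,j} - (s - \lambda_{t+1,k}))^2\bigr\}\bigr]
\end{aligned} \tag{A-17}
\]
and
\[
\begin{aligned}
g_2(\lambda_{jt}) = {} & \lambda_{jt}^{m_{jt}}\,(s - \lambda_{jt})^{m_{kt}}\, I(\lambda_t)\\
& \times \exp\bigl[-\phi^{-2}\bigl\{(\lambda^* - \lambda_{t-1,j})^2 + (\lambda^* - (s - \lambda_{t-1,k}))^2\\
& \qquad + (\lambda^* - \lambda_{t+1,j})^2 + (\lambda^* - (s - \lambda_{t+1,k}))^2\bigr\}\bigr]
\end{aligned} \tag{A-18}
\]
where
\[
\lambda^* = \max\Bigl\{0,\ \min\Bigl\{\tfrac{1}{4}\bigl(\lambda_{t-1,j} + s - \lambda_{t-1,k} + \lambda_{t+1,j} + s - \lambda_{t+1,k}\bigr),\ s\Bigr\}\Bigr\}, \tag{A-19}
\]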
s = λjt + λkt and g1(λjt) ≤ g2(λjt). I(λt) is an indicator function equal to 1 when
λt ∈ R and 0 otherwise.
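The rejection step can be sketched as follows. This is an illustrative reading of (A-17) to (A-19) rather than the code used in the thesis. It assumes an interior time point with neighbours on both sides, uses the component-k neighbours in both (s − ·) terms (which is what (A-19) implies), and uses the fact that the leading factors of g1 and g2 cancel in the ratio. Function names, argument names and the example values are hypothetical.

```python
import math
import random


def penalty(lam, s, prev_j, prev_k, next_j, next_k):
    """Sum of squared differences appearing in the exponent of g1 and g2."""
    return ((lam - prev_j) ** 2 + (lam - (s - prev_k)) ** 2
            + (lam - next_j) ** 2 + (lam - (s - next_k)) ** 2)


def sample_lambda_jt(m_j, m_k, s, prev_j, prev_k, next_j, next_k, phi, rng,
                     max_tries=100_000):
    """Draw lambda_jt by rejection sampling, given s = lambda_jt + lambda_kt."""
    # (A-19): the value in [0, s] that minimises the penalty.
    lam_star = max(0.0, min(0.25 * (prev_j + s - prev_k + next_j + s - next_k), s))
    q_star = penalty(lam_star, s, prev_j, prev_k, next_j, next_k)
    for _ in range(max_tries):
        lam = s * rng.betavariate(m_j + 1, m_k + 1)  # proposal for lambda_jt
        q = penalty(lam, s, prev_j, prev_k, next_j, next_k)
        ratio = math.exp(-(q - q_star) / phi ** 2)   # g1 / g2, at most 1
        if rng.random() <= ratio:
            return lam
    raise RuntimeError("no proposal accepted; phi may be too small")


# Hypothetical allocation counts and neighbouring weights.
rng = random.Random(0)
print(sample_lambda_jt(30, 20, 0.8, 0.5, 0.3, 0.45, 0.35, phi=0.1, rng=rng))
```

Smaller values of phi make the penalty sharper, so proposals far from the neighbouring weights are rejected more often.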
A.4 Details of MH Gibbs sampler for hierarchical
model (Ch. 8)
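HM for µ
Update z, β, λ, σ as in Independent Model.
After updating λ and before updating σ,
\[
\phi_{jt} \mid \cdot \sim N\left(\frac{V_1\phi_{jt-1} + V_2\mu_{jt}}{V_1 + V_2},\ \frac{1}{V_1^{-1} + V_2^{-1}}\right)
\]
\[
\mu_{jt} \mid \cdot \sim N\left(\frac{\phi_{jt} + m_j\,\bar{y}_j\,V_1\sigma_{jt}^{-2}}{V_1 m_j\sigma_{jt}^{-2} + 1},\ \frac{V_1}{V_1 m_j\sigma_{jt}^{-2} + 1}\right)
\]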
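The two conditional draws for the hierarchical model for µ can be written as small helper functions. This is a minimal sketch that transcribes the two distributions above as they are stated. It assumes that yj denotes the mean of the observations currently allocated to component j and that mj is their number. The function and argument names are hypothetical.

```python
import math
import random


def phi_conditional(phi_prev, mu_t, v1, v2):
    """Mean and variance of the conditional for phi_jt, as stated above."""
    mean = (v1 * phi_prev + v2 * mu_t) / (v1 + v2)
    var = 1.0 / (1.0 / v1 + 1.0 / v2)
    return mean, var


def mu_conditional(phi_t, m_j, ybar_j, sigma2_jt, v1):
    """Mean and variance of the conditional for mu_jt, as stated above.

    ybar_j is taken to be the mean of the m_j observations allocated to
    component j (an assumption of this sketch).
    """
    prec = 1.0 / sigma2_jt
    denom = v1 * m_j * prec + 1.0
    return (phi_t + m_j * ybar_j * v1 * prec) / denom, v1 / denom


def draw(mean, var, rng):
    return rng.gauss(mean, math.sqrt(var))


rng = random.Random(0)
phi = draw(*phi_conditional(1.0, 3.0, 0.5, 0.5), rng)
mu = draw(*mu_conditional(phi, 10, 2.0, 1.0, 1.0), rng)
print(phi, mu)
```

With no observations allocated to a component (mj = 0), the conditional for µjt reduces to N(φjt, V1), so the estimate is driven entirely by the higher level.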
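HM for λ
Update zi, then update γt.
Sample from the conditional
\[
X_t \sim MVN\left(\bigl(\Sigma_d^{-1} + \Sigma_s^{-1}\bigr)^{-1}\bigl(\Sigma_d^{-1}W_t + \Sigma_s^{-1}X_{t-1}\bigr),\ \bigl(\Sigma_d^{-1} + \Sigma_s^{-1}\bigr)^{-1}\right)
\]
and set
\[
\gamma_{jt} = \frac{\exp(X_{j,t})}{\sum_{l=1}^{k}\exp(X_{l,t})},
\]
where $X_{kt} = 0$.
Update λt.
Sample from
\[
W_{i+1,t} \sim MVN(W_{it}, \sigma_p^2 I)
\]
with density $q(W_{i+1,t} \mid W_{it})$, where
\[
\lambda_{i+1,t} = \frac{\exp(W_{i+1,t})}{\sum_{j=1}^{k}\exp(W_{i+1,t})},
\]
and accept this proposal with probability min(α, 1), where
\[
\alpha = \frac{\pi(\lambda_{i+1,t})\,q(W_{i+1,t} \mid W_{it})}{\pi(\lambda_{it})\,q(W_{it} \mid W_{i+1,t})}.
\]
Let $W_{k1} = 0$ and, for T = 2, $W_{j2} = \log(m_{j,t-1}/m_{k,t-1})$, where $m_{j,t-1}$ is the mean number
of observations allocated to component j in the previous time period (under the
independent approach). The target density is
\[
\pi(\lambda_{it}) = LN(\lambda_{it}; W_t, \Sigma_d) \times \prod_{j=1}^{k}\lambda_{ijt}^{m_{jt}},
\]
and $\sigma_p^2$ is the variance of the proposal. The proposal is
accepted if u < α, where u ∼ U(0, 1); otherwise $\lambda_{i+1,t} = \lambda_{it}$.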
Update µ, β and σ as in Independent Model.
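The logistic transformation used for the weights, with the last component fixed at zero as the reference, can be illustrated as follows. The sketch shows only the transformation and the initialisation from mean allocation counts. The Metropolis-Hastings accept/reject step is not implemented, and the function names and counts are hypothetical.

```python
import math


def weights_from_logits(x):
    """Map k-1 free values, plus a reference value fixed at 0, to k weights."""
    full = list(x) + [0.0]          # X_k = 0 for the reference component
    top = max(full)                 # subtract the maximum for numerical stability
    e = [math.exp(v - top) for v in full]
    total = sum(e)
    return [v / total for v in e]


def initial_logits(mean_counts):
    """Starting values log(m_j / m_k), with the last component as reference."""
    ref = mean_counts[-1]
    return [math.log(m / ref) for m in mean_counts[:-1]]


# Hypothetical mean allocation counts from a previous time period.
w0 = initial_logits([20.0, 30.0, 50.0])
print(weights_from_logits(w0))
```

Starting the chain at log(mj/mk) therefore corresponds to starting the weights at the allocation proportions from the previous time period.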
Bibliography
Alston, C. L., K. Mengersen, and C. P. Robert (2005). Bayesian mixture models in a
longitudinal setting for analysing sheep CAT scan images. Journal of Agriculture.
Archer, V. (1988). Lung cancer risks of underground miners. The Yale Journal of
Biology and Medicine 61, 183–193.
Ashby, D. (2006). Bayesian statistics in medicine: A 25 year review. Statistics in
Medicine 25, 3589–3631.
Begg, C. B. and M. Mazumdar (1994). Operating characteristics of a rank correlation
test for publication bias. Biometrics 50, 1088–1101.
Berry, D. A. and F. D. K. Liddell (2004). The interaction of asbestos and smoking
in lung cancer: A modified measure of effect. Annals of Occup. Hygiene 48 (5),
459–462.
Berry, G., M. L. Newhouse, and P. Antonis (1985). Combined effect of asbestos
and smoking on mortality from lung cancer and mesothelioma in factory workers.
British Journal of Industrial Medicine 42, 12–18.
Besag, J. (1974). Spatial interaction and the statistical analysis of lattice systems.
Journal of the Royal Statistical Society: Series B 36, 192–236.
Birmili, W., A. Wiedensohler, J. Heintzenberg, and K. Lehmann (2001). Atmo-
spheric particle number size distribution in central Europe: Statistical relations
to air masses and meteorology. J. Geophys. Res. 106, 32005–32018.
Bowden, J., J. R. Thompson, and P. Burton (2006). Using pseudo-data to correct
for publication bias in meta-analysis. Statistics in Medicine 25, 3798–3813.
Breslow, N. E. and N. E. Day (1987). Statistical methods in cancer research. Vol 2:
The design and analysis of cohort studies. IARC.
Breslow, N. E. and B. E. Storer (1985). General relative risk functions for case-
control studies. American Journal of Epidemiology 122, 149–162.
Brooks, S. P. and A. Gelman (1998a). Alternative methods for monitoring conver-
gence of iterative simulations. Journal of Computational and Graphical Statis-
tics 7, 434–455.
Brooks, S. P. and A. Gelman (1998b). Alternative methods for monitoring conver-
gence of iterative simulations. Journal of Computational and Graphical Statis-
tics 7, 434–455.
Brown, C. C. and K. C. Chu (1989). Additive and multiplicative models and multistage
carcinogenesis theory. Risk Analysis 9, 99–105.
Cappe, O., C. P. Robert, and T. Ryden (2002). Reversible jump MCMC converging
to birth-and-death MCMC and more general continuous time samplers. Journal
of the Royal Statistical Society, Series B 65 (3), 679–700.
Carlin, J. B. (1992). Meta-analysis for 2×2 tables: a Bayesian approach. Statistics
in Medicine 11, 141–158.
Casella, G., C. P. Robert, and M. T. Wells (2004). Mixture models, latent variables
and partitioned importance sampling. Statistical Methodology 1, 1–18.
Celeux, G., F. Forbes, C. P. Robert, and M. Titterington (2003, June). Deviance
Information Criteria for missing data models. Technical report, Institut National
De Recherche en Informatique et en Automatique.
Celeux, G., M. Hurn, and C. P. Robert (2000). Computational and inferential dif-
ficulties with mixture posterior distributions. Journal of the American Statistical
Association 95 (451), 957–970.
Celeux, G., J. M. Marin, and C. P. Robert (2003). Iterated importance sampling in
missing data problems. Technical report, Universite Paris Dauphine.
Chen, L., K. Mengersen, and S. Tong (2006). Spatiotemporal relationship between
particle air pollution and respiratory emergency hospital admissions in Brisbane,
Australia. Science of the Total Environment 373 (1), 57–67.
Cohen, D., S. F. Arai, and J. D. Brain (1979). Smoking impairs long-term dust
clearance from the lung. Science 204, 514–516.
Dal Masso, M., M. Kulmala, and I. Riipinen (2005). Formation and growth of fresh
atmospheric aerosols: eight years of aerosol size distribution data from SMEAR II,
Hyytiälä, Finland. Boreal Environment Research 10, 323–336.
Dalrymple, M. L., I. L. Hudson, and R. P. K. Ford (2003). Finite mixture,
zero-inflated Poisson and hurdle models with application to SIDS. Computational
Statistics & Data Analysis 41, 491–504.
Dempster, A. P., M. R. Selwyn, and B. J. Weeks (1983). Combining historical and
randomized controls for assessing trends in proportions. Journal of the American
Statistical Association 78, 221–227.
Denison, D. G. and C. C. Holmes (2001). Bayesian partitioning for estimating disease
risk. Biometrics 57, 143–149.
DerSimonian, R. and N. Laird (1996). Meta-analysis in clinical trials. Controlled
clinical trials 7, 177–188.
Diebolt, J. and C. P. Robert (1994). Estimation of finite mixture distributions
through Bayesian sampling. Journal of the Royal Statistical Society, Series B 56,
363–375.
Do, K. A., P. Muller, and F. Tang (2005). A Bayesian mixture model for differential
gene expression. Journal of the Royal Statistical Society C 54 (3).
Doll, R. (1971). The age distribution of cancer: implications for models of carcino-
genesis. JRSS(A) 134, 133–166.
Dominici, F., M. Daniels, S. L. Zeger, and J. Samet (2002). Air pollution and
mortality: Estimating regional and national dose-response relationships. Journal
of the American Statistical Association 97 (457), 100–111.
Dominici, F., J. Samet, and S. L. Zeger (2000). Combining evidence on air pollution
and daily mortality from the largest 20 U.S cities: a hierarchical modelling strategy
(with discussion). Journal of the Royal Statistical Society, Series A 163, 263–302.
Dominici, F., A. Zanobetti, S. L. Zeger, J. Schwartz, and J. M. Samet (2004).
Hierarchical bivariate time series models: a combined analysis of the effects of
particulate matter on morbidity and mortality. Biostatistics 5, 341–360.
Dumouchel, W. (1990). Bayesian meta-analysis. In D. A. Berry (Ed.), Statistical
methodology in the Pharmaceutical Sciences, pp. 509–529. New York: Dekker.
Dumouchel, W. and J. E. Harris (1983). Bayes methods for combining the results
of cancer studies in humans and other species (with discussion). Journal of the
American Statistical Association 78, 293–315.
Duval, S. J. and R. L. Tweedie (2000). A non-parametric “trim and fill” method
of accounting for publication bias in meta-analysis. Journal of the American
Statistical Association 95, 89–98.
Egger, M., G. D. Smith, M. Schneider, and C. Minder (1997). Bias in meta-analysis
detected by a simple, graphical test. British Medical Journal 315, 629–634.
Erren, T., M. Jacobsen, and C. Piekarski (1999). Synergy between asbestos and
smoking on lung cancer risks. Epidemiology 10 (4), 405–411.
Fahrmeir, L., T. Kneib, and S. Lang (2004). Penalized structured additive regression
for space-time data: A Bayesian perspective. Statistica Sinica 14 (3), 731–761.
Fernandez, C. and P. J. Green (2002). Modelling spatially correlated data via mix-
tures: A Bayesian approach.
Fruhwirth-Schnatter, S. (2001). Markov chain Monte Carlo estimation of classi-
cal and dynamic switching mixture models. Journal of the American Statistical
Association 96, 194–209.
Fruhwirth-Schnatter, S. and S. Kaufmann (2004). Model-based clustering of multiple
time series. Report, Johannes Kepler Universitat Linz.
Gangnon, R. E. and M. K. Clayton (2000). Bayesian detection and modeling of
spatial disease clustering. Biometrics 56, 922–935.
Gardner, M. J. and D. G. Altman (1989). Statistics with confidence. London: BMJ.
Gatton, M. L., L. A. Kelly-Hope, B. H. Kay, and P. A. Ryan (2004). Spatial-
temporal analysis of Ross River virus disease patterns in Queensland, Australia.
American Journal of Tropical Hygiene and Medicine 71 (5), 629–635.
Gelman, A. (2006). Prior distributions for variance parameters in hierarchical mod-
els. Bayesian analysis 1 (3), 515–533.
Geweke, J. (2007). Interpretation and inference in mixture models: Simple MCMC
works. Computational Statistics & Data Analysis 51, 3529–3550.
Goldberg, M. (1999). Asbestos and cancer risk: the exposure-effect relationship for
populations with occupational exposure. Revue Des Maladies Respiratoires 16,
1278–1285.
Green, P. and S. Richardson (2002). Hidden Markov models and disease mapping.
Journal of the American Statistical Association 97 (460), 1055–1070.
Green, P. J. (1995). Reversible jump Markov chain Monte Carlo computation and
Bayesian model determination. Biometrika 82, 711–732.
Greenland, S. and K. J. Rothman (1998). Modern methods in epidemiology, Volume
2nd edition. Philadelphia: Lippincott-Raven.
Griffin, J. and M. Steel (2004). Semiparametric Bayesian inference for stochastic
frontier models. Journal of Econometrics 123 (1), 121–152.
Guerrero, V. M. and R. A. Johnson (1982). Use of the Box-Cox transformation with
binary response models. Biometrika 69 (2), 309–14.
Guidotti, T. (2002). Apportionment in asbestos-related disease for purposes of com-
pensation. Industrial Health 40, 295–311.
Gustafson, P. and L. J. Walker (2003). An extension of the Dirichlet prior for the
analysis of longitudinal multinomial data. Journal of Applied Statistics 30 (3),
293–310.
Gustavsson, P., F. Nyberg, G. Pershagen, et al. (2002). Low-dose exposure to
asbestos and lung cancer: dose-response relations and interaction with smoking in
a population-based case-referent study in Stockholm, Sweden. American Journal
of Epidemiology 155 (11), 1016–1022.
Hallqvist, J., A. Ahlbom, F. Diderichsen, and et al (1996). How to evaluate interac-
tion between causes: a review of practices in cardiovascular epidemiology. Journal
of Internal Medicine 239, 377–382.
Hamilton, J. D. (1994). Time series analysis. New Jersey: Princeton University
Press.
Harley, D., A. Sleigh, and S. Ritchie (2001). Ross River virus transmission, infection,
and disease: a cross-disciplinary review. Clin Microbiol Rev 14, 909–932.
Hasselblad, V. (1995). Meta-analysis of environmental health data. The Science of
the Total Environment 160/161, 545–558.
Hodgson, J. and A. Darnton (2000). The quantitative risks of mesothelioma and lung
cancer in relation to asbestos exposure. Annals of Occupational Hygiene 44 (8),
565–601.
Hoff, P. D. (2003). Nonparametric modelling of hierarchically exchangeable data.
Technical report, Dept. Statistics, University of Washington.
Hussein, T., M. Dal Maso, and T. Petäjä (2005). Evaluation of an automatic algorithm
for fitting the particle number size distributions. Boreal Environment Research 10,
337–355.
Hussein, T., A. Puustinen, P. P. Aalto, J. M. Mäkelä, K. Hämeri, and M. Kulmala
(2004). Urban aerosol number size distributions. Atmos. Chem. Phys. 4, 391–411.
Ibrahim, J. G., M. H. Chen, and D. Sinha (2003). On optimality properties of the
power prior. Journal of the American Statistical Association 98, 204–213.
Ishwaran, H., L. F. James, and J. Sun (2001). Bayesian model selection in finite
mixtures by marginal density decompositions. Journal of the American Statistical
Association 96, 1316–1322.
Jasra, A., C. C. Holmes, and D. A. Stephens (2005). Markov Chain Monte Carlo
methods and the label switching problem in Bayesian mixture modelling.
Statistical Science 20 (1), 50–67.
Kass, R. and A. E. Raftery (1995). Bayes factors. Journal of the American Statistical
Association 90, 773–795.
Kelly-Hope, L. A., D. M. Purdie, and B. H. Kay (2004). Ross River virus disease
in Australia, 1886-1998, with analysis of risk factors associated with outbreaks. J
Med Entomology 41, 133–150.
Knorr-Held, L. and G. Rasser (1999). Bayesian detection of clusters and disconti-
nuities in disease maps. Biometrics 56 (13), 13–21.
Kvam, P. and J. Miller (2002). Discrete predictive analysis in probabilistic safety
assessment. Journal of Quality Technology 34 (1), 106–117.
Landrigan, P. (1998). Editorial: Asbestos - Still a carcinogen. N Engl J Med 338,
1618–1619.
Lee, J. and J. O. Berger (2003). Space-time modeling of vertical ozone profiles.
Environmetrics 14 (6), 617–639.
Lee, P. (2001). Relation between exposure to asbestos and smoking jointly and the
risk of lung cancer. Occup Environ Med 58, 145–153.
Lee, P. (2002). Author’s Reply: Joint action of smoking and asbestos exposure on
lung cancer. Occup Environ Med 59, 495–496.
Liddell, F. (2001). The interaction of asbestos and smoking in lung cancer. Ann
Occup Hyg 45 (5), 341–356.
Liddell, F. (2002). Letter: Joint action of smoking and asbestos exposure on lung
cancer. Occup Environ Med 59, 494–495.
Liddell, F. D. K. and B. G. Armstrong (2002). The combination of effects on lung
cancer of cigarette smoking and exposure in Quebec chrysotile miners and millers.
Ann Occup Hyg 46 (1), 5–13.
Lin, M., P. Roche, J. Spencer, A. Milton, P. Wright, and D. Witteveen (2002).
Australia’s notifiable disease status, 2000. Annual report of the National Notifiable
Diseases Surveillance System. Commun Dis Intell 26, 118–175.
Lindley, D. V. and A. F. M. Smith (1972). Bayes estimates for the linear model
(with Discussion). Journal of the Royal Statistical Society, Series B 34, 1–41.
Lu, J. and F. N. Bowman (2004). Conversion of multicomponent aerosol size dis-
tributions from sectional to modal representations. Aerosol Science and Technol-
ogy 38, 391–399.
Lubin, J. H. and W. Gaffey (1988). Relative risk models for assessing the joint
effects of multiple factors. American Journal of Industrial Medicine 13, 131–147.
Makela, J. M., I. K. Koponen, P. Aalto, and M. Kulmala (2000). One-year data of
sub-micron size modes of tropospheric background aerosol in southern Finland. J.
Aerosol Sci. 31, 595–611.
Marin, J. M., K. Mengersen, and C. P. Robert (2005). Bayesian modelling and
inference on mixtures of distributions. In D. Dey and C. R. Rao (Eds.), Handbook
of Statistics, Volume 25. Elsevier-Sciences.
McFallan, S. (2001). Climatic change and its impact on Ross River virus. Masters
thesis, School of Mathematical Sciences, Queensland University of Technology.
McLachlan, G. and D. Peel (2000a). Finite Mixture Models. New York: John Wiley
and Sons Ltd.
McLachlan, G. J. and D. Peel (2000b). Finite Mixture Models. New York: John
Wiley & Sons.
Mejia, J. F., D. Wraith, K. Mengersen, and L. Morawska (2007). Trends in size
classified particle number concentration in subtropical Brisbane, Australia, based
on a 5 year study. Atmospheric Environment 41 (5), 1064–1079.
Nam, I., K. Mengersen, and G. Garthwaite (2003). Multivariate meta-analysis.
Statistics in Medicine 22, 2309–2333.
Osunsanya, T., G. Prescott, and A. Seaton (2001). Acute respiratory effects of
ultrafine particles: mass or number? Occup. Environ. Med. 58, 154–159.
Peto, R., A. D. Lopez, J. Boreham, M. Thun, J. C. Heath, and R. Doll (1996).
Mortality from smoking worldwide. British Medical Bulletin 52, 12–21.
Phillips, D. B. and A. F. M. Smith (1996). Bayesian model comparison via jump
diffusions. In W. R. Gilks, S. Richardson, and D. J. Spiegelhalter (Eds.), Markov
chain Monte Carlo in Practice, pp. 215–40. Boca Raton: Chapman and Hall.
Rafnsson, V. and P. Sulem (2003). Cancer incidence among marine engineers, a
population based study (Iceland). Cancer Causes & Control 14 (1), 29–35.
Raftery, A. E. (1996). Hypothesis testing and model selection. In W. R. Gilks,
S. Richardson, and D. J. Spiegelhalter (Eds.), Markov chain Monte Carlo in
Practice, pp. 163–188. Boca Raton: Chapman and Hall.
Reif, A. and T. Heeren (1999). Consensus on synergism between cigarette smoke
and other environmental carcinogens in the causation of lung cancer. Advances in
Cancer Research 76, 161–186.
Richardson, S. and P. J. Green (1997). On Bayesian analysis of mixtures with an
unknown number of components (with discussion). Journal of the Royal Statistical
Society, Series B 59, 731–792.
Robert, C. P. and G. Casella (2004). Monte Carlo Statistical Methods. New York:
Springer-Verlag.
Rosamilia, K., O. Wong, and G. K. Raabe (1999). A case-control study of
lung cancer among refinery workers. Journal of Occupational & Environmental
Medicine 41 (12), 1091–1103.
Rothman, K. (1974). Synergy and antagonism in cause-effect relationships. American
Journal of Epidemiology 99, 385–388.
Rothman, K. (1976). The estimation of synergy or antagonism. American Journal
of Epidemiology 103, 506–511.
Roy, P. and J. Esteve (1998). Using relative risk models for estimating synergy
between two risk factors. Statistics in Medicine 17, 1357–1373.
Russell, R. C. and D. E. Dwyer (2000). Arboviruses associated with human disease
in Australia. Microbes Infect 2, 1693–1704.
Salanti, G., J. P. Higgins, and I. R. White (2006). Bayesian synthesis of epidemio-
logical evidence with different combinations of exposure groups: application to a
gene-gene-environmental interaction. Statistics in Medicine 25, 4147–4163.
Saracci, R. (1977). Asbestos and lung cancer: an analysis of the epidemiological
evidence on the asbestos-smoking interaction. Int J Cancer 20, 323–331.
Saracci, R. (1987). The interactions of tobacco smoking and other agents in cancer
etiology. Epidemiologic Reviews 9, 175–193.
Saracci, R. and P. Boffetta (1994). Interactions of tobacco smoking with other causes
of lung cancer. In J. M. Samet (Ed.), Epidemiology of lung cancer: lung biology
in health and disease.
Schabath, M. B., M. R. Spitz, G. L. Delclos, G. B. Gunn, L. W. Whitehead,
and X. Wu (2002). Association between asbestos exposure, cigarette smoking,
myeloperoxidase (MPO) genotypes, and lung cancer risk. American Journal of
Industrial Medicine 42, 29–37.
Scott, S. L. (2002). Bayesian methods for hidden Markov models: Recursive com-
puting in the 21st Century. Journal of the American Statistical Association 97,
337–351.
Scott, S. L., G. M. James, and C. A. Sugar (2004). Hidden Markov models for
longitudinal comparisons.
Seinfeld, J. and S. N. Pandis (1998). Atmospheric chemistry and physics: from air
pollution to climate change. United States of America: John Wiley and Sons.
Selikoff, I., E. Hammond, and J. Churg (1968). Asbestos exposure, smoking and
neoplasia. JAMA 204, 104–110.
Silliman, N. P. (1997). Hierarchical selection models with applications in meta-
analysis. Journal of the American Statistical Association 92, 926–936.
Sogacheva, L., M. Dal Maso, V. Kerminen, and M. Kulmala (2005). Probability
of nucleation events and aerosol particle concentration in different air masses
arriving at Hyytiälä, southern Finland, based on back trajectories analysis. Boreal
Environment Research 10, 479–491.
Spiegelhalter, D. J., K. R. Abrams, and J. P. Myles (2004). Bayesian approaches to
clinical trials and health-care evaluation. Statistics in practice. Chichester: Wiley.
Spiegelhalter, D. J., A. Thomas, and N. G. Best (2002). WinBUGS version 1.4 user
manual. Research report. Cambridge: Medical Research Council Biostatistics Unit.
Stayner, L. T., R. Smith, J. Bailer, et al. (1997). Exposure-response analysis
of risk of respiratory disease associated with occupational exposure to chrysotile
asbestos. Occupational and Environmental Medicine 54, 646–652.
Steenland, K. and M. Thun (1986). Interaction between tobacco smoking and oc-
cupational exposures in the causation of lung cancer. J Occup Med 28, 110–118.
Stephens, M. (2000a). Bayesian analysis of mixture models with an unknown number
of components - an alternative to reversible jump methods. Annals of Statistics 28,
40–74.
Stephens, M. (2000b). Dealing with label switching in mixture models. Journal of
the Royal Statistical Society, Series B 62, 795–809.
Sutton, A. J. and K. R. Abrams (2001). Bayesian methods in meta-analysis and
evidence synthesis. Statistical Methods in Medical Research 10, 277–303.
Sutton, A. J. and J. P. T. Higgins (2007). Recent developments in meta-analysis.
Statistics in Medicine (Online 18 April 2007).
Sutton, A. J., F. Song, S. Gilbody, and K. R. Abrams (2000). Modelling publication
bias in meta-analysis: a review. Statistical Methods in Medical Research 9, 421–445.
Thomas, D. C. (1981). General relative-risk models for survival time and matched
case-control analysis. Biometrics 37, 673–686.
Thompson, S. G. and S. J. Sharp (1999). Explaining heterogeneity in meta-analysis:
A comparison of methods. Statistics in Medicine 18, 2693–2708.
Titterington, D. M., A. F. M. Smith, and U. E. Makov (1985). Statistical Analysis
of Finite Mixture Distributions. Chichester: Wiley.
Tong, S. (2004). Ross River virus disease in Australia: epidemiology, socioecology
and public health response. Internal Medicine Journal 34, 58–60.
Tong, S. and W. Hu (2002). Different responses of Ross River virus to climate
variability between coastline and inland cities in Queensland, Australia. Occup
Environ Med 59, 739–744.
Tritchler, D. (1999). Modelling study quality in meta-analysis. Statistics in
Medicine 18, 2135–2145.
Tweedie, R. L., B. Biggerstaff, D. Scott, and K. Mengersen (1996). Bayesian
meta-analysis with application to studies of ETS and lung cancer. Lung Can-
cer 14 (Suppl 1), S171–S194.
Ulvestad, B., K. Kjaerheim, J. I. Martinsen, et al. (2002). Cancer incidence
among workers in the asbestos-cement producing industry in Norway. Scandina-
vian Journal of Work Environment and Health 28 (6), 411–417.
UNSCEAR (1982). Ionizing radiation: Sources and biological effects. Technical
report, United Nations Scientific Committee on the Effects of Atomic Radiation.
Vainio, H. and P. Boffetta (1994). Mechanisms of the combined effect of asbestos
and smoking in the etiology of lung cancer. Scandinavian Journal of Work Envi-
ronment & Health 20, 235–242.
van der Linde, A. and G. Osius (2001). Estimation of non-parametric multivariate
risk functions in matched case-control studies with application to the assessment
of interactions of risk factors in the study of cancer. Statistics in Medicine 20,
1639–1662.
Vesala, T., J. Haataja, P. Aalto, et al. (1998). Long-term field measurements of
atmospheric-surface interactions in boreal forest ecology, micrometeorology, aerosol
physics, and atmospheric chemistry. Trends in Heat, Mass and Momentum Trans-
fer 4, 17–35.
Waage, H., L. Vatten, and E. Opedal (1997). Smoking intervention in subjects at
risk of asbestos-related lung cancer. Am J Ind Med 31, 705–12.
Walker, A. (1981). Proportion of disease attributable to the combined effect of two
factors. Int J Epidemiol 10, 81–85.
Walshaw, D. (2000). Modelling extreme wind speeds in regions prone to hurricanes.
Applied Statistics 49 (1), 51–62.
Watanabe, T. (2000). A Bayesian analysis of dynamic bivariate mixture models: Can
they explain the behaviour of returns and trading volume? Journal of Business
and Economic Statistics 18 (2), 199–210.
West, M. and P. J. Harrison (1997). Bayesian Forecasting and Dynamic Models (2nd
ed.). New York: Springer-Verlag.
Whitby, E. (1978). The physical characteristics of sulfur aerosols. Atmos. Envi-
ron. 12, 135–159.
Whitby, E. and P. H. McMurry (1997). Modal aerosol dynamics modeling. Aerosol
Sci. Technol. 27, 673–688.
Whitby, E., P. H. McMurry, and U. Shanker (1991). Modal aerosol dynamics mod-
elling. Technical report, U.S. Environment Protection Agency, Atmospheric Re-
search and Exposure Assessment Laboratory.
Whitby, E., F. Stratmann, and M. Wilck (2002). Merging and remapping modes in
modal aerosol dynamics models: a ‘dynamic mode manager’. Aerosol Science 33,
623–645.
Wildner, M. and A. Markuzzi (1997). Interaction and model selection. Journal of
Internal Medicine 241 (6), 535–536.
Wolpert, R. L. and K. Mengersen (2004). Adjusted likelihood for synthesising empiri-
cal evidence from studies that differ in quality and design: effects of environmental
tobacco smoke. Statistical Science 19, 450–471.
Wraith, D. and K. Mengersen (2007). Assessing the combined effect of asbestos expo-
sure and smoking on lung cancer: A Bayesian approach. Statistics in Medicine (28
Feb.), 1150–1169.
Xu, Z., M. Gautam, and S. Mehta (2002). Cumulative frequency fit for particle size
distribution. Applied Occupational and Environmental Hygiene 17 (8), 538–42.