Bayesian mixture modelling for characterising...

School of Mathematical Sciences

Queensland University of Technology

Bayesian mixture modelling for characterising

environmental exposures and outcomes

Darren Eastwood Wraith

BCom(Econ), Post Grad Dipl Health Econ & Eval, BMath

A thesis submitted for the degree of Doctor of Philosophy in the Faculty of

Science, Queensland University of Technology according to QUT requirements.

Principal Supervisor: Prof. Kerrie Mengersen

Associate Supervisors: Assoc. Prof. Shilu Tong; Dr Clair Alston

2008

Abstract

Environmental exposure and outcomes assessment is a great challenge to scientists.

Increasingly detailed data are becoming available to understand the

nature and complexity of the relationships involved. The methodology of mixture

models provides a means to understand, quantify and describe features and relationships

within complex data sets. In this thesis, we focus on a number of applied

problems to characterise complex environmental exposures and outcomes, including:

assessing the interaction between environmental exposures as risk factors for health

outcomes; identifying differing environmental outcomes across a region; and estab-

lishing patterns in the size and concentration of aerosol particles over time. Mixture

model approaches to address these problems are developed and examined for their

suitability in these contexts.

List of publications and manuscripts arising from this

thesis

This thesis comprises the following publications, which have been accepted or

submitted for publication in international refereed journals:

Chapter 3: Wraith D. & Mengersen K. Assessing the combined effect of asbestos exposure

and smoking on lung cancer: A Bayesian approach. Statistics in Medicine, 28

February 2007, 1150-1169

Chapter 4: Wraith D. & Mengersen K. A Bayesian approach to assess interaction

between known risk factors: the risk of lung cancer from exposure to asbestos

and smoking. Statistical Methods in Medical Research. (Published online 14

August 2007)

Chapter 5: Wraith D., Mengersen K., Low Choy S., Tong S. Spatial and Temporal

Modelling of Ross River virus in Queensland. In Zerger, A. and Argent, R.M. (eds)

MODSIM 2005 International Congress on Modelling and Simulation.

Modelling and Simulation Society of Australia and New Zealand, December 2005

Chapter 6: Wraith D., Alston C., Mengersen K., & Hussein T. Bayesian mixture model

estimation of aerosol particle size distributions. Environmetrics (Submitted:

November 2007)

Chapter 7: Wraith D., Alston C., Mengersen K., & Hussein T. Bayesian estimation of

mixtures over time with application to aerosol particle size distributions.

Statistical Modelling (Submitted: November 2007)

Contents

1 Introduction 1

1.1 Primary Research Aim . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.2 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.3 Research Plan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

1.4 Scope of thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

1.5 Outline of thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

2 Literature Review 8

2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

2.2 Meta-analysis methods . . . . . . . . . . . . . . . . . . . . . . . . . . 8

2.2.1 Applications to environmental exposures and outcomes . . . . 11

2.3 Mixture models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

2.3.1 Relevant applications . . . . . . . . . . . . . . . . . . . . . . . 17

2.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

3 Assessing the combined effect of asbestos exposure and smoking on

lung cancer: A Bayesian approach 23

3.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

3.2 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

3.3 Assessing interaction between asbestos and smoking . . . . . . . . . . 25

3.3.1 Synergy Index (S) . . . . . . . . . . . . . . . . . . . . . . . . . 26

3.3.2 Multiplicativity Index (V) . . . . . . . . . . . . . . . . . . . . 27

3.3.3 The relationship between exposure to asbestos and smoking . 28

3.4 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

3.4.1 Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

3.4.2 Methods to assess interaction . . . . . . . . . . . . . . . . . . 32

3.5 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

3.5.1 Sensitivity of the Results . . . . . . . . . . . . . . . . . . . . . 38

3.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

4 A Bayesian approach to assess interaction between known risk fac-

tors: the risk of lung cancer from exposure to asbestos and smoking 50

4.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

4.2 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

4.3 Overview of studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52

4.4 Methods to assess interaction . . . . . . . . . . . . . . . . . . . . . . 53

4.5 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

4.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

5 Spatial and temporal modelling of Ross River virus in Queensland 73

5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

5.2 Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75

5.2.1 Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75

5.2.2 Mixture models . . . . . . . . . . . . . . . . . . . . . . . . . . 77

5.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80

5.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88

6 Bayesian mixture model estimation of aerosol particle size distri-

butions 90

6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90

6.2 Particle size distribution data . . . . . . . . . . . . . . . . . . . . . . 92

6.3 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94

6.3.1 Mixture model at a single time point . . . . . . . . . . . . . . 96

6.3.2 Accounting for truncated data . . . . . . . . . . . . . . . . . . 98

6.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100

6.4.1 Simulated data: single time point . . . . . . . . . . . . . . . . 100

6.4.2 Case study: single time point . . . . . . . . . . . . . . . . . . 103

6.4.3 Results for mixture model estimation over multiple time points 107

6.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

7 Bayesian estimation of mixtures over time with application to aerosol

particle size distributions 113

7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113


7.3 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117

7.3.1 Mixture representation . . . . . . . . . . . . . . . . . . . . . . 117

7.3.2 Choice of temporal prior . . . . . . . . . . . . . . . . . . . . . 120

7.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121

7.4.1 Simulated Data . . . . . . . . . . . . . . . . . . . . . . . . . . 122

7.4.2 Case study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131

7.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135

8 Bayesian hierarchical modelling for a time series of mixtures 138

8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138


8.3 Mixture models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141

8.3.1 Hierarchical time series approach for mixture models . . . . . 144

8.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146

8.4.1 Simulated Data . . . . . . . . . . . . . . . . . . . . . . . . . . 146

8.4.2 Case study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154

8.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158

9 Conclusions and further work 161

9.1 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161

9.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164

A Appendices 166

A.1 Calculations for the variance of S and V (Ch.3) . . . . . . . . . . . . 166

A.2 Reversible Jump Markov Chain Monte Carlo (RJMCMC) (Ch.6) . . . 168

A.3 Penalised Prior (Ch.6) . . . . . . . . . . . . . . . . . . . . . . . . . . 171

A.4 Details of MH Gibbs sampler for hierarchical model (Ch. 8) . . . . . 173

List of Figures

2.1 Representative scatter plots of altitude (km) versus ozone partial

pressure (micro-millibar) with fitted mixture regression curves. (a)

7 February 1990; (b) 9 February 1990; (c) 12 February 1990; (d) 14

February 1990 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

3.1 Box plots of V from Study 2 . . . . . . . . . . . . . . . . . . . . . . . 39

3.2 Box plots of V from Study 5 . . . . . . . . . . . . . . . . . . . . . . . 40

3.3 Density plot of V and S from multivariate analysis for Study 2 . . . . 41

3.4 Density plot of V and S from multivariate analysis for Study 5 . . . . 42

4.1 Boxplots of β12 (log scale) by study (horizontal axis and study numbers

ordered left to right) and overall (over-dispersed model) . . . . . . . . 63

4.2 Starplots by study (1-18) and Overall. S is the Synergy Index, V the

Multiplicativity Index, PM the probability of a multiplicative relation,

and gamma is the power transformation estimate from Rlg (gamma=0

(additive), gamma=1 (multiplicative)) . . . . . . . . . . . . . . . . . 67

5.1 Queensland climate zones - Bureau of Meteorology . . . . . . . . . . 76

5.2 Time plot of weekly cases - Zone 15 . . . . . . . . . . . . . . . . . . . 78

5.3 Time plot of weekly cases - Zone 5 . . . . . . . . . . . . . . . . . . . 79

5.4 Histograms of data (log(y+1)) for all Zones (as numbered) . . . . . . . . 81

5.5 Plot of fitted mixture model for Zone 15 showing three components

against the data over time (log values). Overall fitted density is shown

in Black, and components in Red. Blue lines indicate the estimates

of µ for the three components. . . . . . . . . . . . . . . . . . . . . . . 83

5.6 Plot of fitted mixture model for Zone 15 against a histogram of the

data. Overall fitted density is shown in Black, and components in Red. 84

5.7 Plot of fitted mixture model for Zone 5 showing three components

against the data over time (log values). Overall fitted density is shown

in Black, and components in Red. Blue lines indicate the estimates

of µ for the three components. . . . . . . . . . . . . . . . . . . . . . . 85

5.8 Plot of fitted mixture model for Zone 5 against a histogram of the

data (density scale). Overall fitted density is shown in Black, and

components in Red. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86

6.1 Histogram of data sampled from Hyytiala, Finland for a single time period 93

6.2 An illustration of a new particle formation event at a Boreal Forest site

located in Southern Finland. (a) The temporal variation of the particle

number size distribution and (b) selected particle number size distributions

showing the different stages of the newly formed particle mode from its

early stage. Note that this new particle formation occurred on a regional

scale over the southern part of Finland. . . . . . . . . . . . . . . . . . . 95

6.3 Kernel density estimator of simulated data (black) with fitted results from

normal (dark green) and truncated normal (blue) approaches. Simulated

data based on parameters: k = 4;µ = (1.40, 2.30, 3.70, 5.10);σ = 0.30; λ =

(0.10, 0.10, 0.60, 0.20) . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

6.4 Histograms of data sampled from Hyytiala, Finland with estimated overall

fit and components for non-truncated Normal (left, k=4) and truncated

normal (right, k=3) overlaid . . . . . . . . . . . . . . . . . . . . . . . . 103


fit and components from RJMCMC (left) and LSM (right) overlaid . . . . 105


fit and components from RJMCMC (left) and LSM (right) overlaid . . . . 106

6.7 Plot of posterior mean values for µjt obtained from the first stage using

the RJMCMC algorithm for one day (Hyytiala measurement station). The

size of the circles indicating the weight (λjt) corresponding to µjt . . . . . 107

6.8 Plot of parameters (µ and λ) over one day (Hyytiala, Finland) for the inde-

pendent approach. Stage 2 of the analysis for the evolution of parameters.

Measurements taken every 10 minutes. Colours indicate the components

to which parameter estimates belong (The parameter estimates for the first

component are Black, parameters for the second component are Red, for

the third component they are Green, etc.) . . . . . . . . . . . . . . . . . 109

6.9 Plot of parameters (µ and λ) over one day (Hyytiala, Finland) for the

informed prior approach. Stage 2 of the analysis for the evolution of the

parameters. Measurements taken every 10 minutes. Colours indicate the

components to which parameter estimates belong (The parameter esti-

mates for the first component are Black, parameters for the second com-

ponent are Red, for the third component they are Green, etc.) . . . . . . 111

7.1 Estimated overall fit and components from RJMCMC for one time

period. Concentration of particles (dN/dlog(Dp)[cm3]) by particle

diameter (log(Dp(nm))) . . . . . . . . . . . . . . . . . . . . . . . . . 116







7.3 Plot of estimated parameters (µ (top panels), σ (middle panels) and λ

(bottom panels)) for approaches using simulated data (D1): Simulated data

(Black); Independent (Red); Informed Prior (Green); Penalised Prior (Blue) 123



(Black); Independent (Red); Informed Prior (Green); Penalised Prior (Blue) 124



(Black); Independent (Red) . . . . . . . . . . . . . . . . . . . . . . . . 125


(bottom panels) for Informed Prior approach using simulated data (D3):

Simulated data (Black); Theta=0.1 (Green); Theta=0.8 (Blue); Theta=1.3

(Purple) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126


(bottom panels) for Penalised Prior approach using simulated data (D3):

Simulated data (Black); φ=0.04 (Brown); φ=0.08 (Light Blue); φ=0.12

(Dark Green) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127


(bottom panels) for Informed Prior approach using simulated data (D3):

Simulated data (Black); Smoothing on µ (Orange); Smoothing on σ (Dark

Green) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129



(Black); Independent (Red); Smoothing on µ and λ (Green) . . . . . . . . 130

7.10 Plot of posterior mean estimates for µj from RJMCMC algorithm for one

day (Hyytiala). Stage 1 of analysis for temporal evolution of parameters.

Larger circles indicate greater weight for that component . . . . . . . . . 132

7.11 Plot of estimated parameters (µ (top panel), λ (bottom panel)) under an

independent prior over time. Stage 2 of analysis for temporal evolution of

parameters. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133

7.12 Plot of estimated parameters (µ (top panel) and λ (bottom panel)) under

an informed prior over time. Stage 2 of analysis for temporal evolution of

parameters. Informed prior specified for λ in all components and µ3 . . . 134

8.1 Histogram of data sampled from Hyytiala, Finland for a single time period 140







8.3 Plot of estimated parameters over time for simulated dataset D1: µ (top

panels), σ (middle panels), λ (bottom panels). Actual data (Black), Inde-

pendent (Red), Informed Prior (Green) . . . . . . . . . . . . . . . . . . 148

8.4 Plot of estimated parameters over time for simulated dataset D1. µ (top

panels), σ (middle panels), λ (bottom panels). Actual data (Black), Hierarchical

approach for µ (Dark Green), φ (Blue) . . . . . . . . . . . . . . . 149

8.5 Plot of estimated parameters over time for simulated dataset D1. µ (top


achical approach for λ (Dark Green), γ (Blue) . . . . . . . . . . . . . . . 150

8.6 Plot of estimated parameters over time for simulated dataset D2.

µ (top panels), σ (middle panels), λ (bottom panels). Actual data

(Black), Independent (Red), Informed Prior (Green) . . . . . . . . . . 151

8.7 Plot of estimated parameters over time for simulated dataset D2.

µ (top panels), σ (middle panels), λ (bottom panels). Actual data

(Black), Hierarchical approach for µ (Dark Green), φ (Blue) . . . . . . 152

8.8 Plot of estimated parameters over time for simulated dataset D2. µ (top


achical approach for λ (Dark Green), γ (Blue) . . . . . . . . . . . . . . . 153

8.9 Plot of posterior mean estimates for µj from RJMCMC algorithm for one

day (Hyytiala). Stage 1 of analysis for temporal evolution of parameters.

Larger circles indicate greater weight for that component . . . . . . . . . 155

8.10 Plot of estimated parameters over time for actual data. Independent ap-

proach. Posterior mean estimates for µ (top panel), and λ (bottom panel). 156

8.11 Plot of estimated parameters over time for actual data. Hierarchical ap-

proach for λ. Posterior estimates for µ (top panel), and γ (bottom panel). 157

List of Tables

3.1 Details of Studies Used for Statistical Analysis . . . . . . . . . . . . . 30

3.2 Reported Results of Studies . . . . . . . . . . . . . . . . . . . . . . . 31

3.3 Results (Univariate): Test of Synergy (S) and Multiplicativity (V) . . 35

3.4 Results for Multivariate RR Analysis . . . . . . . . . . . . . . . . . . 37

3.5 Combined Results for S and V . . . . . . . . . . . . . . . . . . . . . . 38

3.6 Sensitivity of estimates from the Variance/Covariance Matrix for the

Multivariate Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

3.7 Results for S and V by Factor . . . . . . . . . . . . . . . . . . . . . . 44

3.8 Sensitivity of the Posterior Estimates (Univariate): Test of S and V . 45

3.9 Sensitivity of the Posterior Estimates (Multivariate Analysis) . . . . . 45

4.1 Details of studies for statistical review . . . . . . . . . . . . . . . . . 54

4.2 Results of logistic and Poisson regression models . . . . . . . . . . . . 62

4.3 Relative Risk Estimates, Observed and Posterior . . . . . . . . . . . . 65

4.4 Results of relative risk models and mixture model . . . . . . . . . . . 66

4.5 Results of Synergy Index (S) and Multiplicativity Index (V) . . . . . 68

4.6 Sensitivity Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

5.1 Summary results - all zones . . . . . . . . . . . . . . . . . . . . . . . 77

5.2 Results for Zones 5 and 15 . . . . . . . . . . . . . . . . . . . . . . . . 82

5.3 Results for all Zones . . . . . . . . . . . . . . . . . . . . . . . . . . . 87

6.1 Estimated parameter values from Bayesian mixture model analysis

using RJMCMC algorithm with simulated data. Based on 200,000

iterations with a burnin of 100,000. CI = Credible Interval . . . . . . 102

Statement of Original Authorship

The work contained in this thesis has not been previously submitted for a degree or

diploma at any other higher educational institution. To the best of my knowledge

and belief, the thesis contains no material previously published or written by another

person except where due reference is made.

Signed:

Date:

Acknowledgements

There are many people to thank really and I will not be able to remember everyone

who helped in some way and for that I apologise in advance. I would firstly like to

thank my Principal supervisor Kerrie Mengersen for her amazing support and guid-

ance over a very long period of time and for that I am very grateful. To Clair Alston

who was a huge help for Chapters 6 and 7, especially in the early stages of this work

when the computational task, including the programming, seemed onerous. I thank

you very much. I would also like to thank everyone in the School of Mathematical

Sciences at QUT for making the last few years an enjoyable and friendly experience.

To my roommates and friends in O415, thank you for all the necessary and fun dis-

tractions, including the Friday night ritual! I would also like to thank Andrew Torre

whose inspiration and belief started my interest in mathematics seemingly a long

time ago, and whose support over the years I have greatly appreciated. Lastly, but

not least, I would like to thank my family and friends for all their support; although

some weren’t actually sure what I was doing ... they were nevertheless supportive anyway,

and in those terms perhaps the best support I could ask for. I would also like to

especially thank my partner, Shonah, for her amazing support, particularly in the

final stages, and whose level of support continues to amaze me every day. Thank you!

Chapter 1

Introduction

1.1 Primary Research Aim

The primary aim of this study is to develop mixture model approaches to characterise

complex environmental exposures. The methodology of mixture models provides

a means to understand, quantify and describe features and relationships within

environmental exposure data.

1.2 Motivation

The motivation for this thesis arose from a research project examining the nature

of the relationship between exposure to both asbestos and smoking as risk factors

for lung cancer. From a preliminary review of the literature, we found there was

well documented evidence indicating that both long term exposure to asbestos and

active smoking are independent risk factors for lung cancer. However, the nature of

the relationship between, or interaction of, the two risk factors was less clear and was

often the subject of much debate in the epidemiological literature. The question of

primary interest from a statistical perspective was whether the risk from exposure

to both asbestos and smoking is an additive, multiplicative or other relation of the

risk from exposure to each factor alone.
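
To make the distinction concrete, the two candidate relations can be written in terms of relative risks (RR). The following is a minimal sketch, assuming the standard epidemiological forms: under an additive relation the joint relative risk is RR_A + RR_S - 1, and under a multiplicative relation it is RR_A x RR_S. The function names and the numerical relative risks are hypothetical illustrations only; the Synergy Index (S) and Multiplicativity Index (V) used in this thesis are defined in Chapter 3 and may differ in detail from the forms assumed here.

```python
def synergy_index(rr_a, rr_s, rr_both):
    """Synergy index S (assumed standard form).

    Excess joint risk divided by the sum of the excess single-exposure
    risks; S = 1 when the relation is exactly additive.
    """
    return (rr_both - 1.0) / ((rr_a - 1.0) + (rr_s - 1.0))


def multiplicativity_index(rr_a, rr_s, rr_both):
    """Multiplicativity index V (assumed standard form).

    Joint relative risk divided by the product of the single-exposure
    relative risks; V = 1 when the relation is exactly multiplicative.
    """
    return rr_both / (rr_a * rr_s)


# Hypothetical relative risks, chosen only for illustration.
rr_asbestos = 2.0
rr_smoking = 10.0

# Joint relative risk implied by each candidate relation.
rr_joint_additive = rr_asbestos + rr_smoking - 1.0
rr_joint_multiplicative = rr_asbestos * rr_smoking

for label, rr_joint in [("additive", rr_joint_additive),
                        ("multiplicative", rr_joint_multiplicative)]:
    s = synergy_index(rr_asbestos, rr_smoking, rr_joint)
    v = multiplicativity_index(rr_asbestos, rr_smoking, rr_joint)
    print(f"{label}: joint RR = {rr_joint:.1f}, S = {s:.2f}, V = {v:.2f}")
```

In this illustration an additive joint risk gives S = 1 with V below 1, while a multiplicative joint risk gives V = 1 with S above 1, which is why the two indices are examined together.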

We found from reviewing the studies which aimed to quantify the relationship

between exposure to asbestos and smoking, that there was much variability in the

results. We also found much discrepancy between the outcomes of two major reviews

published at a similar time, the first by Lee (2001) and the second by Liddell (2001),

which led to an interesting exchange between the two authors in discussing the

outcomes of their reviews (Lee, 2002; Liddell, 2001). Lee (2002) found little evidence

to reject a multiplicative relation; however, Liddell (2002) highlighted differences in

the results of the case-control versus cohort studies, finding evidence against a simple

multiplicative hypothesis. It was clear from reading these two reviews that, apart

from some of the differences in the individual studies (e.g. study design, exposure

levels, etc.) to which these reviews refer, an alternative method or perspective

to characterise the relationship was needed. In particular, we were interested in

whether we could say with some probability whether the relationship between the

two exposures could be assigned to one functional form or another (additive or

multiplicative). From a statistical perspective, a mixture model approach seemed

ideally suited to the analysis of this problem and we started to look at how we

could apply this approach in a meta-analysis context. After much investigation, we

found that we could characterise the relationship between the two risk factors in

this way, and that the results could be used as an aid to understand and quantify

the uncertainty in establishing a relationship. With the success of this approach,

the idea to use this to characterise other environmental exposures was born.

Under an ARC Discovery Grant, we started to look at a mixture model approach

to characterise the risk of Ross River virus (RRv) in Queensland (QLD). In partic-

ular, interest was in how the risk of RRv varied over time and across the spatial

region of QLD. A recent paper by Gatton et al. (2004), examining the risk of RRv

across QLD, found much variability in the results across the region, and found it

useful in the analysis to separately identify outbreak periods. With much variability

in the results over time and space, we started to look at applying a mixture model

approach to classify the data over time into different periods, which would build

on the approach adopted in Gatton et al. (2004). In particular, we were interested

in whether we could classify cases of RRv over time into more than two periods

(outbreak or no outbreak) and how the number of periods varied across the spatial

region of QLD.

Under the ARC Discovery Grant, we were also collaborating with the aerosol

physics group at QUT on a project looking at the concentration of particles of different

sizes over time, with data collected in Brisbane over a 5-year period.

The size of the particles ranged from 16 nm to 600 nm. In the aerosol physics literature,

one of the standard approaches to analysing data of this form was to classify

the data into particular size groups of interest (e.g. 16-30 nm, 31-90 nm, 90+ nm) and

analyse these groups separately. As the size of the particles can reveal their source,

and because particles are governed by formation and transformation processes, they

tend to form well-distinguishable modal features. An alternative approach in the

aerosol literature for analysing this type of data is to identify the modes of the data

using a classification technique and analyse the modes separately. However, this

classification approach was more difficult to do and, because of the basic approaches

employed (e.g. least squares regression), was very rarely used without making some

subjective decisions (see Hussein et al., 2005). Armed with our experience in using

mixture models, we were interested in whether we could use a mixture model to

classify the data by the modal features of the data, and also in how this mixture

representation would change over time. As the time interval between measurements

is often quite small (e.g. 5 minutes to an hour), and the data over time are thus often

highly dependent, we were also interested in any improvements to estimation and

inference by including information about the mixture representation at neighbouring

time points.

For the first part of the analysis using the data from Brisbane, we grouped the

data into size bins according to size ranges of interest and analysed the concentration

of particles for the different size bins separately over time (Mejia et al., 2007). To

apply a mixture model approach to investigate the modal structure of the data we

chose a more comprehensive dataset from Hyytiala, Finland, which provided a more

detailed assessment of the modal structure and an almost complete dataset of

observations.

1.3 Research Plan

To address the primary aim of this thesis, as stated in Section 1.1, this thesis focusses

on the following problems to characterise complex environmental exposures and

outcomes:

• assessing the interaction between environmental exposures as risk factors for

health outcomes

• identifying differing environmental outcomes across a region

• establishing patterns in the size and concentration of aerosol particles over

time

In order to address these problems the following mixture model approaches are

developed and examined:

• a mixture model approach to assess interaction between risk factors in a meta-

analysis framework

• a mixture model approach to classify cases of a disease over time into a number

of groups based on time periods with differing risk levels

• estimation of mixture models over multiple time points

1.4 Scope of thesis

Each of the selected problems in characterising complex environmental exposures

and outcomes could be approached in a number of ways. In this thesis, we confine

our attention to the development and application of mixture model approaches to

address these problems.

A mixture model approach to assess interaction or relationships between risk

factors in a meta-analysis context is outlined. In this analysis, we consider whether

the relationship between the two exposures could be assigned to one functional form

or another (additive or multiplicative). Alternative relationships and mechanisms

underlying disease causation are not investigated in detail.

A mixture model approach to characterise the risk of RRv over time and spa-

tial regions by identifying groups in the data is outlined. Explanatory variables

associated with RRv and the correlation structure of the data are not investigated.

Stochastic or mechanistic models to describe the transmission of the disease are also

not examined.

Mixture model approaches to estimate parameters at both single and multiple

time points for aerosol particle size distribution (PSD) data are outlined. Alternative

approaches to analysing this data, such as grouping of size bins into categories and

separate analyses of size variables, are not provided as a comparison in the analysis.

The dynamic processes describing the evolution of the particles are not investigated.

1.5 Outline of thesis

The remaining chapters of this thesis are organised as follows:

Chapter 2 presents a review of meta-analysis and mixture model approaches in

the literature to characterise environmental exposures and outcomes. Most of the

relevant literature is discussed in the chapters, and here we present an overview of

the main approaches used in this thesis.

In Chapter 3, we examine the relationship between two risk factors for lung

cancer, exposure to asbestos and smoking, using a multivariate meta-analysis. In

particular, from a statistical perspective, we are interested in whether the risk from

exposure to both asbestos and smoking is an additive, multiplicative or other relation

of the risk from exposure to each risk factor alone. In this analysis, we consider the

evidence for either relation using separate tests.

Chapter 4 extends the meta-analysis approach in Chapter 3, and examines a

mixture model approach to assess the strength of evidence for either relation. In this

approach, we move away from separate tests for either an additive or multiplicative

relation and allow the data to choose between the two models. By allowing both

relations to be considered at the same time, this type of inference may be more

informative than considering each relation separately.

In Chapter 5, we examine a mixture model approach to characterise the risk of

Ross River virus (RRv) in Queensland from 1984 to 2001. The approach builds on

the approach adopted by Gatton et al. (2004), and considers that the weekly cases

of RRv could be attributed to more than two hypothesised periods (outbreak or no

outbreak period), and also extends the analysis to compare the number of periods

across non-homogeneous spatial regions of Queensland.

In Chapters 6, 7 and 8 we examine approaches to estimate a mixture model

at both single and multiple time points for aerosol particle size distribution (PSD)

data. In Chapter 6, for estimation of a mixture model at a single time point, we

use Reversible Jump MCMC to estimate mixture model parameters including the

number of components, which is assumed to be unknown. We compare the results

of this approach to a commonly used estimation method in the aerosol physics

literature. As PSD data are often measured over time at small time intervals, we also

examine the use of an informative prior for estimation of the mixture parameters

which takes into account the correlated nature of the parameters.
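
As an illustration of the kind of model estimated in Chapter 6, the sketch below evaluates and simulates from a finite normal mixture with density f(y) = sum_j λ_j N(y | µ_j, σ²). It uses the simulated-data parameter values quoted in the caption of Figure 6.3 (k = 4; µ = (1.40, 2.30, 3.70, 5.10); σ = 0.30; λ = (0.10, 0.10, 0.60, 0.20)). Treating σ as a standard deviation common to all components is an assumption, the function names are hypothetical, and the number of components is held fixed here; this is not the Reversible Jump MCMC algorithm, which additionally treats k as unknown.

```python
import math
import random

# Parameter values quoted in the caption of Figure 6.3 (simulated data).
WEIGHTS = (0.10, 0.10, 0.60, 0.20)   # lambda_j, component weights (sum to 1)
MEANS = (1.40, 2.30, 3.70, 5.10)     # mu_j, component means
SD = 0.30                            # sigma, assumed common to all components


def normal_pdf(y, mean, sd):
    """Density of a normal distribution with the given mean and sd at y."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_density(y, weights, means, sd):
    """Weighted sum of component densities: sum_j lambda_j N(y | mu_j, sd^2)."""
    return sum(w * normal_pdf(y, m, sd) for w, m in zip(weights, means))


def sample_mixture(n, weights, means, sd, seed=None):
    """Draw n values: pick a component by its weight, then draw from it."""
    rng = random.Random(seed)
    labels = rng.choices(range(len(weights)), weights=weights, k=n)
    return [rng.gauss(means[j], sd) for j in labels]


if __name__ == "__main__":
    draws = sample_mixture(5000, WEIGHTS, MEANS, SD, seed=1)
    print("sample mean:", sum(draws) / len(draws))
    print("density at the dominant mode:",
          mixture_density(3.70, WEIGHTS, MEANS, SD))
```

Data simulated in this way give a setting in which the true component parameters are known, so estimates from a fitting procedure can be compared against them.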

In Chapter 7, we examine in some detail the issue of using informative priors

for estimation of mixtures at multiple time points. In this analysis, the use of two

different informative priors, and an independent prior are compared using simulated

and actual data. The use of informative priors may provide useful information with

which to better identify component parameters at each time point and, as an aid

for inference, information with which to more clearly establish patterns in the

parameters over time.

In Chapter 8 we address some of the issues raised in Chapter 7, and explore a

hierarchical approach to estimation of mixture parameters over time in which an

informative prior is placed at two different levels. Simulated and actual data are used

to assess the performance of the approach.

The approaches examined in Chapters 6 to 8 extend a previous mixture model

approach for estimation of more than a single time point in a different setting (Alston

et al., 2005), to include all parameters and allow for a generalised correlation struc-

ture to be imposed. We also extend the two stage approach to estimation adopted

in Lee and Berger (2003), by allowing for correlation information to be used at the

same time as parameters are estimated. Approaches in the literature to estimate

a mixture model over a spatial region can also potentially be adapted for use in

a time series setting (Green and Richardson, 2002; Fernandez and Green, 2002);

however, the influence or choice of informative priors in a time series framework and

the implications in different data environments has largely not been examined. In

Chapters 6, 7 and 8 we examine the use of informative priors for estimation of pa-

rameters over time, and extend the approaches in Green and Richardson (2002) and

Fernandez and Green (2002) to a time series setting and allow for all parameters to

be correlated over time.

An overview and discussion of the methodology are provided in Chapter 9. Pos-

sible extensions to the research presented in this thesis are indicated.

Chapter 2

Literature Review

2.1 Introduction

As much of the discussion of the literature is contained in each chapter, in this

chapter we provide an overview of the main approaches used in this thesis.

2.2 Meta-analysis methods

The use of Bayesian methods for meta-analysis has recently been reviewed (Sutton

and Higgins, 2007; Sutton and Abrams, 2001; Ashby, 2006; Spiegelhalter et al.,

2004). In this section, we briefly review the Bayesian approach to meta-analysis and

outline some of the main applications to environmental exposures and outcomes.

The use of meta-analysis methods to synthesise evidence regarding environmental

exposures and outcomes has been investigated and applied by a number of authors.

A number of epidemiological applications have concerned: environmental tobacco

smoke and cancer (Tweedie et al., 1996; Wolpert and Mengersen, 2004; Salanti et al.,

2006; Nam et al., 2003); air pollution and mortality or morbidity (Dominici et al.,

2000, 2004; Chen et al., 2006); and health effects from low-level exposure to lead or

exposure to nitrogen oxide (Hasselblad, 1995).

In most of the above applications, the method of meta-analysis has largely been

used to provide an overall assessment of the existence or size of an exposure-outcome

relationship from evidence provided by a number of individual studies where an over-

all picture remains largely obscure (Tweedie et al., 1996). If results from individual

studies are fairly consistent and clear-cut, it can also be used simply to increase

statistical power, and provide greater confidence around an individual effect.

The earliest Bayesian approach to meta-analysis starts with the landmark papers

by Dumouchel and Harris (1983) and Dempster et al. (1983). Dumouchel and Harris

(1983), inspired by the hierarchical prior distributions of Lindley and Smith (1972),

introduced the idea of constructing hierarchical Bayesian models to synthesise infor-

mation from five types of environmental studies of the effect on human and animal

subjects of exposure to nine related environmental agents. Since then broad guides

to the use of a Bayesian hierarchical model to synthesise evidence include Carlin

(1992) and Spiegelhalter et al. (2004).

In a meta-analysis approach, interest is often in an overall or true underlying

measure, say µ, about which we would like to make inference. To outline the

Bayesian hierarchical approach to meta-analysis, consider the simple formulation

Y = θ + e

θ = Xµ + ε

in which Y = (Y1, . . . , Yk) are the observed log relative risks for each study, θ =

(θ1, . . . , θk) are the corresponding true log relative risks, e = (e1, . . . , ek) and ε =

(ε1, . . . , εk) are random errors, X is a k × p design matrix, and µ is a p × 1 vector of

parameters of interest. We take Y to be normally distributed such that Y ∼ N(θ, Σ),

and assume that $\theta \sim N(X\mu, \tau^2 I)$, so that the errors $e_i \sim N(0, \sigma_i^2)$ and $\varepsilon_i \sim N(0, \tau^2)$ are mutually

independent. A frequentist approach is to consider µ, $\sigma^2$ and $\tau^2$ as fixed parameters,

and estimation of $\tau^2$ is most commonly achieved through an approximation by

DerSimonian and Laird (1996).
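
The moment-based estimate of the between-study variance referred to above can be sketched as follows. This is a minimal illustration only, assuming the intercept-only case (a single overall µ) and within-study variances treated as known; the function name and the toy inputs in the usage note are hypothetical and are not taken from any of the studies cited.

```python
def dersimonian_laird(y, v):
    """Moment estimate of the between-study variance tau^2 and the pooled mean.

    y: study estimates (e.g. log relative risks); v: within-study variances,
    treated as known. Assumes an intercept-only model (a single overall mu)
    and at least two studies.
    """
    k = len(y)
    w = [1.0 / vi for vi in v]                       # fixed-effect weights
    sw = sum(w)
    mu_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (yi - mu_fixed) ** 2 for wi, yi in zip(w, y))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)               # truncated at zero
    w_star = [1.0 / (vi + tau2) for vi in v]         # random-effects weights
    mu = sum(wi * yi for wi, yi in zip(w_star, y)) / sum(w_star)
    se = (1.0 / sum(w_star)) ** 0.5
    return mu, se, tau2
```

For example, `dersimonian_laird([0.0, 4.0], [1.0, 1.0])` gives a pooled mean of 2 with an estimated between-study variance of 7, while studies that agree exactly give an estimate of zero.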

For a Bayesian approach, Dumouchel (1990) and Carlin (1992) make the following

distributional assumptions,

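\[
\begin{aligned}
Y \mid \theta, \sigma &\sim N(\theta, \sigma^2 C) \\
\sigma^{-2} &\sim \chi^2_{df_\sigma} / df_\sigma
\end{aligned}
\]

and

\[
\begin{aligned}
\theta \mid \mu, \tau &\sim N(X\mu, \tau^2 V) \\
\mu \mid \tau &\sim N(0, D \to \infty) \\
\tau^{-2} &\sim \chi^2_{df_\tau} / df_\tau
\end{aligned}
\]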

where C and V are k×k observed and prior variance-covariance matrices respectively,

and the degrees of freedom dfσ and dfτ indicate how well C and V, respectively, are

known. If we assume the studies to be independent, we can take C, which describes

within-study variability, to be a diagonal matrix with corresponding diagonal entries

the variances of the individual observations Yi. Similarly, if we assume little inter-

study variability, the matrix V, which describes interstudy variability, can be taken

to be a k × k identity matrix. An overall measure of the mean log relative risk for

all studies combined is provided by µ. The notation D → ∞ indicates that the

elements of D are very large and tending to infinity.

While Dumouchel and Harris (1983) outline approximations to the analytical

posterior distributions for the above distributional assumptions, one of the advances

since then has been the use of Markov Chain Monte Carlo (MCMC) methods which

avoid the need for such approximations, and the above can be implemented using

Gibbs sampling.
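
To illustrate how Gibbs sampling proceeds for a model of this kind, the sketch below uses a deliberately simplified version of the hierarchy: an intercept-only mean, known within-study variances, a flat prior on µ and an inverse-gamma prior on τ². These prior choices, the function name and the default values are assumptions made for the example and are not the specification of Dumouchel (1990) or Carlin (1992).

```python
import math
import random

def gibbs_meta(y, s2, n_iter=5000, a=2.0, b=0.1, seed=1):
    """Gibbs sampler for y_i ~ N(theta_i, s2_i), theta_i ~ N(mu, tau2).

    Illustrative priors (assumptions, not those of the text): a flat prior
    on mu and an inverse-gamma(a, b) prior on tau2.
    """
    rng = random.Random(seed)
    k = len(y)
    mu, tau2 = sum(y) / k, 1.0
    mu_draws, tau2_draws = [], []
    for _ in range(n_iter):
        # theta_i | rest: precision-weighted compromise between y_i and mu
        theta = []
        for yi, vi in zip(y, s2):
            prec = 1.0 / vi + 1.0 / tau2
            mean = (yi / vi + mu / tau2) / prec
            theta.append(rng.gauss(mean, math.sqrt(1.0 / prec)))
        # mu | rest ~ N(mean(theta), tau2 / k) under the flat prior
        mu = rng.gauss(sum(theta) / k, math.sqrt(tau2 / k))
        # tau2 | rest ~ inverse-gamma(a + k/2, b + sum((theta - mu)^2) / 2),
        # drawn as the reciprocal of a gamma variate (second argument is a scale)
        ss = sum((t - mu) ** 2 for t in theta)
        tau2 = 1.0 / rng.gammavariate(a + k / 2.0, 1.0 / (b + ss / 2.0))
        mu_draws.append(mu)
        tau2_draws.append(tau2)
    return mu_draws, tau2_draws
```

Each step draws from a standard distribution, which is what makes the Gibbs sampler convenient here. Rerunning the sampler with different values of `a` and `b` is one simple way to examine sensitivity to the prior on the between-study variance.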

Both Carlin (1992) and Dumouchel (1990) suggest that it is desirable to assess

the sensitivity of the results to prior information, in particular the dependence of

the posterior estimates of µ and θ on the specifications of dfσ and dfτ .

There are limitations associated with combining studies in the form of a meta-

analysis. The main limitations are common to both a frequentist and Bayesian

approach, and include confounding effects and biases within studies or biases by pub-

lication. Meta-analysis is designed to enable a combination of results from studies

which are comparable in outcome and exposure. In conducting a meta-analysis, we

may try to combine studies with different designs, or of different quality, which may

produce a consistent bias either upwards or downwards for an overall assessment.

Proposals to account for biases within studies include: restricting the studies included

in the meta-analysis to only those of the best quality; down-weighting studies based

on a quantitative assessment of quality (Tritchler, 1999); and adjustments to study

outcomes using either covariates in a weighted regression (Thompson and Sharp,

1999) and/or prior information (Wolpert and Mengersen, 2004).

Publication bias is concerned with the potential for only statistically significant

or ‘positive’ results to be published and thereby biasing the selection of studies

to be included in a meta-analysis. The issue of publication bias and the impact

on the validity of findings has recently been reviewed (Sutton et al., 2000; Ashby,

2006). Various tests have been proposed to test for publication bias (Begg and

Mazumdar, 1994; Egger et al., 1997). Approaches to address the issue, amongst

others, include: the use of selection models incorporating various weight functions

(Silliman, 1997); the use of simulated pseudo-data (Bowden et al., 2006); and a

non-parametric ‘trim and fill’ funnel plot approach (Duval and Tweedie, 2000).

2.2.1 Applications to environmental exposures and outcomes

In this section, we outline some of the main applications of a Bayesian meta-analysis

to characterise environmental exposures and outcomes.

Dominici et al. (2000) applied a hierarchical regression model to analyse the effect

of urban air pollution on daily mortality using data for the 20 largest US cities. The

data consisted of publicly available listings of individual deaths by day and location,

and hourly measurements of pollutants and weather variables for a seven year period

(1987-1994). In a two stage analysis, the main interest was to establish an association

between PM10 (particulate matter less than 10µm in aerodynamic diameter) and

daily mortality, after controlling for possible confounders, by combining information

from across the cities. Interest is also in the extent to which some of the daily

mortality can be explained by variation in ozone levels (O3), which may confound

the association between PM10 and mortality. In the first stage, a log-linear regression is used

(using maximum likelihood) to estimate a pollution relative rate for each city, while

controlling for the city-specific longer term time trends and weather effects. For the

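second stage, for the estimates of the log-relative rates associated with PM10 and O3

for each city, $\hat{\beta}^c = (\hat{\beta}^c_{PM_{10}}, \hat{\beta}^c_{O_3})$, the following hierarchical model was considered,

\[
\begin{aligned}
\hat{\beta}^c \mid \beta^c &\sim N_2(\beta^c, V^c), \\
\beta^c_{PM_{10}} &= z^{c\prime}_{PM_{10}} \alpha_{PM_{10}} + \varepsilon^c_{PM_{10}}, \\
\beta^c_{O_3} &= z^{c\prime}_{O_3} \alpha_{O_3} + \varepsilon^c_{O_3}, \\
\varepsilon^c &\sim N(0, \Sigma)
\end{aligned}
\]

where $z^c_{PM_{10}}$ and $z^c_{O_3}$ are vectors of city-specific covariates, $\alpha_{PM_{10}}$ and $\alpha_{O_3}$ are the overall

estimates of the log-relative rates, and $\varepsilon^c = (\varepsilon^c_{PM_{10}}, \varepsilon^c_{O_3})$. Maximum likelihood

estimates from the first stage are used for $\hat{\beta}^c$ and $V^c$. Priors for α and Σ are specified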

to be weakly uninformative.

For the second stage analysis, the assumption that the relative rates of mortality for

PM10 ($\beta^c_{PM_{10}}$) are independent across cities, and adjusted by levels of O3, was

compared to the possibility of there being geographic correlation. For the spatial

analysis, cities were clustered into three regions (North-East, South-East, and West-

Coast). The authors found the results under all of these models to be similar, with

the spatial analyses slightly attenuating the effects.

In the second stage, Gibbs sampling was used to estimate parameters. Given

the large size of the database and the main interest being in a combined estimate of

the association between PM10 and mortality, a single combined model using MCMC

was perceived to be too computationally demanding in light of any improvement in

estimation to be made.

A similar hierarchical model to the above was also used in Dominici et al. (2002)

and Dominici et al. (2004). In Dominici et al. (2002), the above analysis was

extended to 88 of the largest US cities, with interest mainly focussed on the effects of

PM10. In Dominici et al. (2004), a hierarchical bivariate model was used to

characterise the relationship between PM10 and both mortality and hospital admissions

for cardiovascular diseases for 10 metropolitan areas in the US.

Tweedie et al. (1996) and Wolpert and Mengersen (2004) examine the combined

evidence from 29 studies of the association between environmental tobacco smoke

exposure and lung cancer in adults who have never smoked.

Wolpert and Mengersen (2004) applied an adjusted likelihood approach to syn-

thesise the disparate information from the 29 studies. In their analysis, the assump-

tion of exchangeability between the studies was considered to be untenable. Variabil-

ity between the studies centered around three main quality issues: misclassification

of ever-smokers as never-smokers; misclassification of disease; and misclassification

of exposure.

In their approach, the investigator begins by specifying in detail the target conditions

of interest, for example the subject population, treatment or

exposure details, etc. Whilst each individual study offers direct evidence about the

parameters that govern that particular study, the idea is then to construct an ad-

justed likelihood function that describes the indirect evidence offered by each study

about the questions of interest to the investigator under the specified target con-

ditions. Studies conducted under conditions quite similar to the target conditions

lend stronger evidence.

If the relationship between the indirect and direct evidence about θ from the

studies is known (with parameter αi), then

\[ L^{Adj}_i(\theta) = L_i(\phi_i(\theta, \alpha_i)) \tag{2.1} \]

where the function φ is used to adjust the parameter θ towards θ under ideal or target

conditions (θ0). Information about both the functional form for φ and parameter αi

may be gained from expert opinion or by evidence in the literature.

2.3 Mixture models

There is a large literature on mixture models, with applications in disease map-

ping (Green and Richardson, 2002), earthquake analysis (Walshaw, 2000), finance

(Watanabe, 2000) and industrial quality control (Kvam and Miller, 2002) to name

only a few. Seminal monographs include Titterington et al. (1985) and McLachlan

and Peel (2000a). Diebolt and Robert (1994) and Marin et al. (2005) provide an

overview from the Bayesian perspective.

Given data y that are independent and identically distributed (i.i.d.), the density of the data under a

finite mixture model can be represented by

\[ p(y \mid \theta) = \sum_{j=1}^{k} \lambda_j f(y \mid \theta_j) \tag{2.2} \]

where k is the number of components in the mixture, λj represents the probability of

membership to the jth component ($\sum_{j=1}^{k} \lambda_j = 1$), and f(y|θj) is the density function

of component j, which has parameters θj. We can also represent the density in terms

of a continuous mixture but we do not discuss this representation here.

From Equation (2.2), the posterior distribution of the unknown parameters is

given by

\[
\begin{aligned}
p(\theta, \lambda \mid y) &\propto p(y \mid \theta, \lambda)\, p(\theta, \lambda) \\
&\propto \prod_{i=1}^{N} \left[ \sum_{j=1}^{k} \lambda_j f(y_i \mid \theta_j) \right] p(\theta, \lambda)
\end{aligned}
\tag{2.3}
\]

For even relatively moderate sample sizes, analytical methods to evaluate the sum

of $k^N$ terms from Equation (2.3) can become too computationally intensive to contemplate

(Robert and Casella, 2004). As component membership of the data (y) is

unknown, a computationally convenient method of estimation for mixture models

is to use a hidden allocation process and introduce a latent indicator variable zij,

which is used along the lines of a missing variable approach to allocate observations

yi to each component.

Markov Chain Monte Carlo (MCMC) methods represent the most common approach

to estimation of finite mixture models in the Bayesian literature, where the

choice of sampler varies widely. The most common sampler used is the Gibbs Sam-

pling algorithm (Diebolt and Robert, 1994), which uses the full conditionals for each

model parameter to simulate from the joint posterior distribution. This is partic-

ularly useful for mixtures as the joint posterior distribution of the parameters is

difficult to simulate from while the full conditionals are often available.
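
The role of the latent allocations and the full conditionals can be seen in a small sketch. The example below assumes a normal mixture with a known common standard deviation, a Dirichlet(1, . . . , 1) prior on the weights and a N(0, prior_sd²) prior on each mean; these choices, and all function and argument names, are illustrative assumptions rather than the samplers used later in this thesis.

```python
import math
import random

def gibbs_normal_mixture(y, k, n_iter=300, sigma=1.0, prior_sd=10.0, seed=0):
    """Gibbs sampler for a k-component normal mixture with known common sigma.

    Illustrative priors (assumptions): Dirichlet(1, ..., 1) on the weights and
    N(0, prior_sd^2) on each component mean.
    """
    rng = random.Random(seed)
    lam = [1.0 / k] * k
    mu = rng.sample(list(y), k)          # start the means at k observations
    draws = []
    for _ in range(n_iter):
        counts, sums = [0] * k, [0.0] * k
        # 1. allocate each y_i: P(z_i = j) proportional to lam_j N(y_i | mu_j, sigma^2)
        for yi in y:
            logp = [math.log(lam[j]) - 0.5 * ((yi - mu[j]) / sigma) ** 2
                    for j in range(k)]
            top = max(logp)              # subtract the maximum to avoid underflow
            probs = [math.exp(lp - top) for lp in logp]
            j = rng.choices(range(k), weights=probs)[0]
            counts[j] += 1
            sums[j] += yi
        # 2. weights | z ~ Dirichlet(1 + counts), via normalised gamma draws
        g = [rng.gammavariate(1.0 + counts[j], 1.0) for j in range(k)]
        total = sum(g)
        lam = [gj / total for gj in g]
        # 3. means | z, y: conjugate normal update for each component
        for j in range(k):
            prec = counts[j] / sigma ** 2 + 1.0 / prior_sd ** 2
            mean = (sums[j] / sigma ** 2) / prec
            mu[j] = rng.gauss(mean, 1.0 / math.sqrt(prec))
        draws.append((lam[:], mu[:]))
    return draws
```

Conditional on the allocations, the weights and means are updated exactly as they would be for k separate samples, which is why the augmented sampler avoids evaluating the full mixture posterior directly.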

Two alternative approaches to dealing with an unknown number of components

are direct estimation in the sampler and model comparison. The first can

involve a Markov chain moving between spaces of different dimensions, as in the reversible

jump MCMC of Green (1995) and Richardson and Green (1997), while alternative

samplers that move between models are proposed by Stephens (2000a) and Phillips

and Smith (1996). The second, model comparison, involves fitting the mixture model

with different values for k and then using a model choice criterion to choose between

the competing models. For mixture models and missing data problems, various

model choice criteria have been proposed but they are not without their problems

(Celeux et al., 2003). Commonly used criteria are the Bayesian Information Criterion

(BIC) (Kass and Raftery, 1995), the Deviance Information Criterion (DIC) (Celeux

et al., 2003) and Bayes factors (as in Fruhwirth-Schnatter and Kaufmann (2004),

Ishwaran et al. (2001) and Raftery (1996)).

As discussed in a number of papers (see, for example, Marin et al., 2005 and

Casella et al., 2004), a number of difficulties can arise when constructing a sampler

for a mixture model. The main difficulties include label switching, exploration of

the parameter space, and computational expense. Label switching can occur due to

the invariance of the likelihood to k! permutations of the labels, which during an

MCMC run can cause the allocation vector to switch between components. As a

result the posterior distribution can have k! modes. To overcome this problem, a

common solution proposed by Diebolt and Robert (1994) is to impose identifiability

constraints on the parameters (e.g. µ1 < . . . < µk), but as discussed by Celeux et al.

(2000) and Stephens (2000b) these constraints can lead to truncation of the posterior

distribution. As alternatives to imposing an identifiability constraint, Celeux et al.

(2000) proposes a loss minimisation approach, while Stephens (2000b) proposes to

use clustering techniques. Casella et al. (2004) suggest a method based on an ap-

propriate partition of the space of augmented variables. Fruhwirth-Schnatter (2001)

proposes a random permutation scheme, and Geweke (2007) proposes a permutation-

augmented simulator, a deterministic modification of the usual MCMC sampler. A

comprehensive discussion of this issue is found in Jasra et al. (2005).
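
The invariance that gives rise to label switching, and the simplest remedy of an ordering constraint on the means, can be checked numerically. The sketch below assumes a normal mixture with a known common standard deviation; the data, parameter values and function names are hypothetical and chosen only for illustration.

```python
import math
from itertools import permutations

def mixture_loglik(y, weights, means, sigma=1.0):
    """Log-likelihood of a normal mixture with common known sigma."""
    total = 0.0
    for yi in y:
        dens = sum(w * math.exp(-0.5 * ((yi - m) / sigma) ** 2)
                   / (sigma * math.sqrt(2.0 * math.pi))
                   for w, m in zip(weights, means))
        total += math.log(dens)
    return total

def relabel_by_mean(weights, means):
    """Impose the ordering mu_1 < ... < mu_k on a single MCMC draw."""
    order = sorted(range(len(means)), key=lambda j: means[j])
    return [weights[j] for j in order], [means[j] for j in order]

# Hypothetical data and parameter values.
y = [-2.1, -1.7, 0.3, 1.9, 2.4]
weights, means = [0.5, 0.3, 0.2], [2.0, -2.0, 0.0]

# The likelihood is the same under all k! = 6 relabellings of the components.
logliks = [mixture_loglik(y, [weights[j] for j in p], [means[j] for j in p])
           for p in permutations(range(3))]
```

All six entries of `logliks` agree up to rounding error, so the data alone cannot distinguish the labellings. Applying `relabel_by_mean` to each draw of a sampler is the post-processing form of the ordering constraint; as noted above, such constraints are not always a satisfactory solution.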

Another difficulty in constructing a sampler for a mixture model is to ensure

a full exploration of the parameter space. This is an issue in general for a sam-

pler in most settings but can be exacerbated in the case of a mixture due to the

expected multimodality of the posterior distribution. A common criticism of the

Gibbs sampler is that it may not always visit the k! symmetric modes of the posterior

distribution easily. Alternatives to the standard Gibbs sampler include tempered

MCMC, as suggested by Celeux et al. (2000), and adding a Metropolis-Hastings step,

as suggested in Cappe et al. (2002).

Alternative representations of the standard mixture model include Hidden Markov

Models (HMMs) and Dirichlet Process mixture models (DPMs). HMMs represent

a generalisation of the mixture model in situations where the observations are not

independent or a latent Markov model is assumed to underlie the data observed.

An HMM consists of two processes: a hidden process or a sequence of states that

evolves in a Markov manner and an observed process that is dependent on this hid-

den process. The HMM assumes a Markov dependence in time between the latent

variable of Equation (2.2)

\[ p(z_t \mid z_{\setminus t}) = p(z_t \mid z_{t-1}, z_{t+1}) \tag{2.4} \]

For a number of states, the process is governed by a transition matrix which spec-

ifies the probability of moving from one state to another. We can also model the

dependency on the observations (autoregressive HMM),

\[ p(y_t \mid \ldots) = p(y_t \mid y_{t-1}, y_{t+1}, z_t, \theta_y) \tag{2.5} \]

Two main methods to sample from the states include Gibbs sampling, and a recursive

scheme using a forwards/backwards filter (Scott, 2002). Inference is enhanced by

identification of the underlying states and also of obtaining probability estimates of

moving from one state to another. For these reasons, the approach has found many

applications in a wide range of areas (Hamilton, 1994; Scott et al., 2004).
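
As an illustration of the recursive scheme, the sketch below implements only the forward pass for an HMM with a finite number of states and discrete emissions, returning the log-likelihood of the observed sequence. The discrete-emission assumption, the function name and the argument layout are choices made for this example; the backward (sampling or smoothing) pass is not shown.

```python
import math

def forward_loglik(obs, init, trans, emit):
    """Log-likelihood of a discrete-emission HMM via the scaled forward pass.

    init[s]: P(z_1 = s); trans[r][s]: P(z_t = s | z_{t-1} = r);
    emit[s][o]: P(y_t = o | z_t = s).
    """
    n = len(init)
    alpha = [init[s] * emit[s][obs[0]] for s in range(n)]
    loglik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # propagate one step through the transition matrix, then weight
            # by the emission probability of the new observation
            alpha = [emit[s][obs[t]] * sum(alpha[r] * trans[r][s] for r in range(n))
                     for s in range(n)]
        scale = sum(alpha)                   # P(y_t | y_1, ..., y_{t-1})
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]   # filtered state probabilities
    return loglik
```

The recursion costs a number of operations proportional to the sequence length times the square of the number of states, whereas summing over every possible state sequence grows exponentially with the sequence length.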

Dirichlet Process Mixture models (DPMs) provide a different approach to the

interpretation and estimation of mixtures discussed so far. Here interest is in a

non-parametric approach, in which a mixture model is used to provide a type of

basis function for a density, and for which k may be very large. In the standard

mixture model approach, a commonly used prior for the allocation of the weight

λ (λ = (λ1, . . . , λk)) is to assume a Dirichlet distribution (representing a sort of

stick breaking allocation of λ into k bits). In the limit as k → ∞ the Dirichlet

distribution becomes a Dirichlet Process. The flexibility of the approach has seen

a rapidly expanding literature and a number of applications in recent years (Griffin

and Steel, 2004; Do et al., 2005).
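
The stick-breaking idea can be made concrete with a short sketch. It assumes the usual stick-breaking construction of Dirichlet process weights with concentration parameter alpha, truncated at a finite number of pieces; the truncation level, the function name and the defaults are assumptions for illustration.

```python
import random

def stick_breaking_weights(alpha, k_max, seed=0):
    """Truncated stick-breaking weights for a Dirichlet process with
    concentration alpha; the last piece takes whatever stick remains."""
    rng = random.Random(seed)
    weights, remaining = [], 1.0
    for _ in range(k_max - 1):
        v = rng.betavariate(1.0, alpha)   # fraction broken off the remaining stick
        weights.append(v * remaining)
        remaining *= 1.0 - v
    weights.append(remaining)
    return weights
```

Small values of `alpha` tend to put most of the mass on the first few pieces, while larger values spread it over many components, which is how a very large k can be accommodated while only a few components carry appreciable weight.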

2.3.1 Relevant applications

In this section, we discuss some of the applications of mixture model approaches

related to characterising environmental exposures and outcomes. We also discuss

mixture model approaches which have been used in other applications but which

provide a background to estimation of mixtures over multiple time points developed

in Chapters 6 to 8 of this thesis.

In a disease modelling context, there have been a number of applications of mixture

model approaches (e.g. Knorr-Held and Rasser, 1999; Denison and Holmes,

2001; Green and Richardson, 2002; Fernandez and Green, 2002; Gangnon and Clayton,

2000). In this setting, interest is largely focussed on partitioning or clustering

relative risk estimates observed at particular spatial sites or area units into a number

of groups, either for non-parametric estimation of the underlying risk surface

(Knorr-Held and Rasser, 1999; Denison and Holmes, 2001; Green and Richardson,

2002; Fernandez and Green, 2002) or where the location and composition of the

clusters is of primary interest (Gangnon and Clayton, 2000). Applications considered

in these papers include: the risk of leukemia (Gangnon and Clayton, 2000; Denison

and Holmes, 2001) and larynx cancer (Green and Richardson, 2002; Fernandez and

Green, 2002). For the purposes of this review, we outline the approach considered

by Green and Richardson (2002) which provides a good example and a basis for the

other approaches considered.

Green and Richardson (2002) explore a finite mixture model to allocate or par-

tition relative risk estimates for a particular disease (say θ) identified by area units

(i). A common approach in a spatial setting is to consider that the estimates for the

spatial units (θi) are related by a continuous Markov random field (Besag, 1974).

Here we take,

\[ Y_i \mid Y_j = \theta_j,\ j \neq i \;\sim\; N\!\left( \sum_{j=1}^{n} W_{ij} \theta_j,\; \sigma_\theta^2 D_{ii} \right) \tag{2.6} \]

where W is a specified matrix of weights ($W_{ii} = 0$, $W_{ij} = -Q_{ij}/Q_{ii}$ and $D_{ii} = Q_{ii}^{-1}$),

$\sigma_\theta^2$ is the overall variance of $Y_i$, and Q is a precision matrix. A set of spatial weights

can be specified for (2.6) which define a set of ‘neighbours’. To define ‘neighbours’,

a number of authors have taken areas i and j to be neighbours if they share a

common boundary. The set of conditional distributions given by (2.6) defines a

Markov random field (MRF) model (discussed by Besag (1974)).

An alternative approach adopted by Green and Richardson (2002) is to consider

that this continuous random field of θi consists of allocations or partitions, where

now $\theta_i = \theta_{z_i}$ and $z_i$ is an unobserved allocation variable taking values in (1, . . . , k),

with k the number of components. In this approach the spatial variability of

$z_i$ is assumed to follow a Potts model, with the number of states (k) and the strength

of interaction (ψ) unknown and to be estimated. In the Potts model,

$p(z \mid \psi, k) = e^{\psi U(z) - \delta_k(\psi)}$. In contrast to standard mixture models, this formulation does not make

use of explicit weights on components. The degree of spatial dependency is controlled

by favouring probabilistically those allocation patterns where like-labelled locations

are neighbours ($U(z) = \sum_{i \sim i'} I[z_i = z_{i'}]$). Interest is in estimating the number of

components k which defines the number of levels of the relative risk surface. The

approach is estimated using Reversible Jump MCMC (RJMCMC) (which allows for

an increase or decrease in the number of components at any particular time in the

estimation process).
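
The quantity U(z) that drives the Potts model is simply a count of like-labelled neighbouring pairs. The sketch below computes it for allocations laid out on a rectangular grid, with neighbours defined as cells sharing an edge; the regular grid is an assumption made for simplicity (area units in a disease map are irregular), and the function names are hypothetical.

```python
def potts_u(z):
    """U(z): number of neighbouring pairs on a rectangular grid sharing a label.

    z is a list of rows; neighbours are cells sharing an edge (an assumption:
    the area units in the text need not lie on a regular grid).
    """
    rows, cols = len(z), len(z[0])
    u = 0
    for r in range(rows):
        for c in range(cols):
            # count each horizontal and vertical pair exactly once
            if c + 1 < cols and z[r][c] == z[r][c + 1]:
                u += 1
            if r + 1 < rows and z[r][c] == z[r + 1][c]:
                u += 1
    return u

def potts_log_kernel(z, psi):
    """Unnormalised log-probability psi * U(z); delta_k(psi) is omitted."""
    return psi * potts_u(z)
```

A uniform 2 × 2 allocation has U(z) = 4 while a checkerboard has U(z) = 0, so for ψ > 0 the smooth pattern is favoured by a factor of exp(4ψ). The normalising term δk(ψ) is omitted here; it is not needed when comparing allocations at a fixed ψ and k, but it does matter when ψ itself is estimated.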

Fernandez and Green (2002) extend the work of Green and Richardson (2002)

and explore a Markov random field similar to a CAR model for zi, but allow the

weights in the mixture to vary from one location to another.

In an image analysis context, Alston et al. (2005) use a Bayesian Gaussian

mixture model in conjunction with a hidden Markov random field to estimate

the proportion of tissue types present in an individual CAT scan image of sheep.

While CAT scan images provide a measure of the denseness of tissue, interest is in

estimating the proportion of tissue type (fat and muscle) present in the scan, the

denseness of tissues of interest such as fat and muscle, and a characterisation of the

distribution of these tissues.

In their model, the grey scale data y = (y1, . . . , yn) from CAT scan images are

represented with an approximate density of

\[ g(y) \approx \hat{g}(y) = \sum_{j=1}^{k} w_j \phi_j(y \mid \mu_j, \sigma_j) \tag{2.7} \]

where n is the number of pixels in the CAT scan. The latent variable indicator zi

(here representing an allocation variable for yi, again taking values in (1, . . . , k)) is

assumed to be drawn from a hidden Markov random field (MRF) involving the first

order neighbouring pixels (pixels which share an edge are deemed to be neighbours)

in the CAT scan. The joint distribution is represented by (otherwise known as a

Potts model)

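\[ p(z \mid \beta) = C(\beta)^{-1} \exp\left( \beta \sum z_i z_{\delta_i} \right) \tag{2.8} \]

where C(β) is a normalising constant, $\delta_i$ indicates a neighbour of pixel i and $z_i z_{\delta_i} = 1$

if $z_i = z_{\delta_i}$, otherwise $z_i z_{\delta_i} = 0$. The parameter β estimates the level of spatial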

homogeneity in component membership between neighbouring pixels in the image.

Previous scans are represented as a perturbation on the current Gaussian component

estimates in both time and space,

\[ \mu_j = \rho_t \mu_{jt} + \rho_s \mu_{js} + \varepsilon_j \tag{2.9} \]

\[ \sigma_j^2 = \theta_j (\rho_t \sigma_{jt}^2 + \rho_s \sigma_{js}^2) \tag{2.10} \]

where s and t represent previous space and time estimates respectively, and the perturbation

parameters are θj, εj and (ρt, ρs), which are assumed to follow a random walk.

Estimation of the number of components uses birth and death processes, and BIC

criteria.

A natural extension of the approaches developed above in a spatial setting is to

consider these approaches for a time series setting. In either setting, we can use

prior information about the dependency that exists between neighbouring observa-

tions (in space or time) to improve estimation. However, the influence or choice of

similar informative priors in a time series framework and the implications in different

data environments has largely not been examined. We discuss these influences and

implications in Chapters 6 to 8.

Lee and Berger (2003) propose a mixture model to analyse ozone measurements

at different altitudes. In a two stage approach they model the spatial component

(altitude) as a four component mixture of normal distributions, and a state-space

representation to describe the parameters over time. Figure 2.1 shows scatterplots

of altitude and ozone measurements with fitted mixture curves for selected time

points (from Lee and Berger (2003)).

In the first stage of the analysis, for each time point, the mixture model consid-

ered is

\[ y = \sum_{j=1}^{4} w_j f(h \mid \mu_j, \tau_j) + \varepsilon_h \tag{2.11} \]

where: the εh are independent mean zero Gaussian errors; y are ozone measurements;

h is altitude; and the weight wj can be interpreted as the amount of ozone contained

in the jth component of the mixture, and the sum of the weights, $\sum_{j=1}^{4} w_j$, is the

total column ozone amount. Parameters are estimated using MCMC.
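
The interpretation of the weights as ozone amounts follows because each component density integrates to one over altitude, so the area under the fitted curve equals the sum of the weights. The sketch below checks this numerically. It assumes normal component densities with τj treated as a standard deviation, and the parameter values are hypothetical; neither is taken from Lee and Berger (2003).

```python
import math

def ozone_curve(h, weights, means, sds):
    """Mean ozone at altitude h: sum_j w_j N(h | mu_j, sd_j^2).
    Treating tau_j as a standard deviation is an assumption of this sketch."""
    return sum(w * math.exp(-0.5 * ((h - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
               for w, m, s in zip(weights, means, sds))

# Hypothetical parameter values, chosen only for illustration.
weights = [40.0, 120.0, 90.0, 30.0]
means = [5.0, 18.0, 24.0, 32.0]
sds = [2.0, 3.0, 2.5, 3.0]

# Riemann-sum integral of the curve over a wide altitude range (-20 to 70 km),
# wide enough to contain essentially all of the mass of every component
step, n_steps = 0.01, 9000
column = sum(ozone_curve(i * step - 20.0, weights, means, sds) * step
             for i in range(n_steps))
```

Here `column` agrees with `sum(weights)` to within numerical integration error, which is the sense in which the sum of the weights represents the total column amount.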

Due to the size and complexity of the dataset (and future goals of the analysis)

a single stage analysis in which parameters are time dependent was considered to be

infeasible. In the second stage of the analysis, the posterior modes of the parameters

from the first stage are explicitly modelled over time using Bayesian state-space

modelling (West and Harrison, 1997).

Figure 2.1: Representative scatter plots of altitude (km) versus ozone partial pressure (micro-millibar) with fitted mixture regression curves. (a) 7 February 1990; (b) 9 February 1990; (c) 12 February 1990; (d) 14 February 1990

2.4 Conclusion

In this chapter, we have provided an overview of the main approaches used in this

thesis. Further discussion of the relevant approaches in the literature is contained

in each chapter. The meta-analysis approaches discussed form a basis for the ap-

proaches developed in Chapters 3 and 4. Similarly, the mixture model approaches

discussed form a basis for the approaches developed in Chapters 4 to 8.

Chapter 3

Assessing the combined effect of asbestos exposure

and smoking on lung cancer: A Bayesian approach

3.1 Summary

In this Chapter, we review the literature on the combined association between lung

cancer and two environmental exposures, asbestos exposure and smoking, and ex-

plore a Bayesian approach to assess evidence of interaction between the exposures.

The meta-analysis combines separate indices of additive and multiplicative relation-

ships and multivariate relative risk estimates. By making inferences on posterior

probabilities we can explore both the form and strength of interaction. This anal-

ysis may be more informative than providing evidence to support one relation over

another on the basis of statistical significance. Overall, we find evidence for a more

than additive and less than multiplicative relation.

3.2 Introduction

There is well documented evidence indicating that both long term exposure to as-

bestos and active smoking are independent risk factors for lung cancer. The statisti-

cal form of their combined effect is less clear. The question of interest here is whether

the risk from exposure to both asbestos and smoking is an additive, multiplicative

or other relation of the risk from exposure to each factor alone.

Methodologies for assessing the relationship between risk factors have been the

subject of much research (Saracci and Boffetta, 1994; Liddell, 2001; Lee, 2001).

As well as interpreting interaction in the context of classical relative risk models

(Roy and Esteve, 1998) recent studies have explored the viability of non-parametric

models (van der Linde and Osius, 2001).

Evidence for a multiplicative association between exposure to asbestos and active

smoking and the outcome of lung cancer was indicated by an early study of US

workers (Selikoff et al., 1968). Subsequent studies and reviews of the literature with

an objective to assess the form of the relationship further have indicated mixed

results, ranging from mixed evidence for either an additive or multiplicative relation

to strong evidence for a supramultiplicative relation (Saracci and Boffetta, 1994;

Erren et al., 1999; Vainio and Boffetta, 1994; Steenland and Thun, 1986).

The importance of understanding the combined effect of asbestos exposure and

smoking can be placed in both a public health and legal context. From a public

health perspective, evidence for a multiplicative relation between asbestos exposure

and smoking has led to recommendations for asbestos-exposed smokers to quit

smoking, since cases of lung cancer induced by both exposures would be prevented,

along with those induced by smoking alone (Waage et al., 1997). In a legal context, a

greater understanding of the combined effect has been required in the attribution of

damages in cases where there is a history of exposure to both asbestos and smoking

(Guidotti, 2002).

The objective of this investigation is to review the evidence for the combined

effect of smoking and asbestos, the relationship of which is frequently debated in

epidemiology, and to propose a Bayesian approach for combining this information.

The strength of the Bayesian approach, in this context, is twofold. First, through

the hierarchical structure of likelihoods and priors, informed opinion about variance

structures and relationships between studies and outcomes can be integrated with

the observed data. The second is the ability to make useful probability statements

on the basis of all information, rather than simple significance statements based on

specific hypothesis tests.

3.3 Assessing interaction between asbestos and

smoking

While a conceptual basis for assessing interaction between two risk factors is well

known (Greenland and Rothman, 1998), in general, tests for interaction and the

interpretation of results are less well understood (UNSCEAR, 1982). Incorrect ap-

proaches to assess interaction appear frequently in the literature (Hallqvist et al.,

1996). Further, as many studies are underpowered to assess interaction, assessments of the strength of interaction, rather than statistical significance, may be important (Saracci and Boffetta, 1994).

A conceptual basis for understanding interaction between risk factors is found in

Rothman’s (Rothman, 1976) component sufficient-cause paradigm of disease causa-

tion. Under this paradigm, synergistic or positive interaction occurs if two exposures

are component causes in the same sufficient cause. In the context of this study, a

case for synergistic interaction can be made if some persons develop lung cancer only

under exposure to both asbestos and smoking.

A synergistic interaction effect, in the biological sense, is tested by departure

from additivity of absolute effects. That is, the relative excess risk among those with

combined exposure should exceed the sum of the relative excess risks for each of the

component causes, referenced to those exposed to neither cause. This description

is analogous to the basis for the Synergy Index (S) introduced by Rothman (1974)

and outlined in the next section.

Hallqvist et al. (1996) describe some of the problems with approaches which have

been used in the literature. For example, an inappropriate approach is to compare a

higher cumulative incidence of a joint exposure to that observed for either risk factor

separately and infer that one risk factor is exacerbating the effect of the other, since

the relationship of each risk factor to a joint exposure may be less than additive.

Another common approach to assessing interaction is to include a product term in a

logistic or log-linear regression. As both types of regressions assume a multiplicative

form, including an interaction term assesses departure from a simple multiplicative

model but provides no information in support of an additive relation.

3.3.1 Synergy Index (S)

A common test for an additive relation used in the epidemiological literature is the

Synergy Index (S). The theoretical basis for S has been well described (Rothman,

1974, 1976). Here we present the methodology as outlined by Rothman (1976).

Suppose that there are two independently acting causal agents, in this case say

A for asbestos and S for smoking, and underlying (background) causes denoted

collectively as C, also independent of A and S. Then

$$P_T = P_A + P_S + P_C - P_A P_S - P_A P_C - P_S P_C + P_A P_S P_C \tag{3.1}$$

where $P$ denotes the probability that a disease develops alone (with appropriate subscripts), and the subscript $T$ denotes the total probability (where A, S and C are present). The combined or joint effect of A and S on the probability of disease (risk) is given by $P_T - P_C$, i.e. $P_A + P_S - P_A P_S - P_A P_C - P_S P_C + P_A P_S P_C$ (under independence).

Using risk notation, define $R_{AS} = P_T$, $R_{00} = P_C$, $R_A = P_A + P_C - P_A P_C$ and $R_S = P_S + P_C - P_S P_C$. Assuming that $P_A$ and $P_S$ are small (the implications of this are discussed by Wildner and Markuzzi (1997)), this can be simplified to

$$P_T - P_C \cong P_A + P_S \tag{3.2}$$

which in risk notation becomes

$$R_{AS} - R_{00} \cong R_A + R_S - 2R_{00} \tag{3.3}$$
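The step from (3.1) to (3.3) can be made explicit. The following worked derivation is a sketch that uses only the definitions given above; the factorised form is a rearrangement of (3.1) under the stated independence assumption, not an additional result taken from the cited sources.

```latex
\begin{align*}
P_T - P_C &= (1 - P_C)\,(P_A + P_S - P_A P_S), \\
R_A - R_{00} &= (1 - P_C)\,P_A, \qquad R_S - R_{00} = (1 - P_C)\,P_S, \\
R_{AS} - R_{00} &= (R_A - R_{00}) + (R_S - R_{00}) - (1 - P_C)\,P_A P_S .
\end{align*}
```

Dropping the product term $P_A P_S$, which is negligible when $P_A$ and $P_S$ are small, leaves the approximation in (3.3).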

Equation (3.3) can be expressed in relative risk terms (by dividing each term by $R_{00}$), which can then be defined as a Synergy Index (S),

$$S = \frac{RR_{AS} - 1}{RR_S + RR_A - 2} = \frac{ERR_{AS}}{ERR_A + ERR_S} \tag{3.4}$$

where ERR is the excess relative risk. Thus, positive interaction or synergy is

observed if the relative risk attributable to combined exposure exceeds the sum of the

risks attributable to each exposure separately. Alternatively, S can be interpreted as

the excess risk from exposure (to both exposures) when there is interaction relative

to the excess risk from exposure (to both exposures) without interaction. Under the

additive hypothesis S = 1, whereas for a more than additive model S > 1 and a

less than additive model is reflected by S < 1. On the basis of S, estimates can be

obtained of the attributable proportion of risk due to interaction, API = (S − 1)/S

(Walker, 1981). The API expresses the proportion of lung cancer risk for those

exposed to both factors (including background risk) that can be attributed to the

combined (as distinct from the separate) effects of the two factors. The calculation

for the standard error of S is described in the Appendix.
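As a quick check on equation (3.4), the following Python sketch computes S from the three relative risks. The function name `synergy_index` is illustrative and does not come from the source. The inputs are the rounded relative risks reported for Studies 1 and 15 in Table 3.2, so the outputs agree with the observed values of S in Table 3.3 only up to rounding.

```python
def synergy_index(rr_s, rr_a, rr_as):
    """Synergy Index S = (RR_AS - 1) / (RR_S + RR_A - 2), as in equation (3.4).

    rr_s  : relative risk for smoking alone
    rr_a  : relative risk for asbestos alone
    rr_as : relative risk for combined exposure
    """
    denominator = rr_s + rr_a - 2.0
    if denominator <= 0:
        # S is only interpretable when the separate excess risks sum to a positive value.
        raise ValueError("RR_S + RR_A must exceed 2 for S to be defined")
    return (rr_as - 1.0) / denominator


# Study 1 (de Klerk) and Study 15 (Hammond), relative risks from Table 3.2.
s_study1 = synergy_index(rr_s=3.44, rr_a=2.24, rr_as=9.57)
s_study15 = synergy_index(rr_s=10.85, rr_a=5.17, rr_as=53.24)
print(round(s_study1, 2), round(s_study15, 2))  # 2.33 3.73
```

Both values exceed 1, which under the additive hypothesis (S = 1) points to a more than additive relation for these two studies taken on their own.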

3.3.2 Multiplicativity Index (V)

A common test for a multiplicative relation is to include an interaction term in

a logistic or log-linear model (Gustavsson et al., 2002). Alternatively, in a recent

review of the literature, Lee (2001) defines and uses a ‘Test of Multiplicativity’. Since

this is not strictly a test we use this here in the equivalent sense of a Multiplicativity

Index.

Following Lee (2001), for a multiplicative relation to hold, the product of risks for $R_{00}$ and $R_{AS}$ should equal that for $R_A$ and $R_S$:

$$R_{AS} R_{00} = R_A R_S \tag{3.5}$$

or in relative risk terms (by dividing both sides by $R_{00}^2$)

$$RR_{AS} = RR_S \, RR_A \tag{3.6}$$

The Multiplicativity Index (V) is then simply

$$V = \frac{RR_{AS}}{RR_S \, RR_A} \tag{3.7}$$

Under the multiplicative hypothesis V = 1, whereas for a more than multiplicative model (e.g. an exponential relation) V > 1, and for a less than multiplicative model V < 1. The calculation for the standard error of V is described in the Appendix.

Note that there is no specific value of S that corresponds to a multiplicative

model. Similarly, there is no specific value of V that corresponds to an additive

model. Therefore, neither index confirms one model and rejects the other, but an

investigation of both indices together provides an assessment of the degree of support

for additive or multiplicative relationships.
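The point that neither index pins down the other can be seen numerically. The sketch below is illustrative only: the function names are assumptions, the first two calls use the rounded relative risks for Studies 14 and 15 from Table 3.2, and the remaining inputs are hypothetical relative risks constructed to be exactly multiplicative or exactly additive.

```python
def synergy_index(rr_s, rr_a, rr_as):
    """S = (RR_AS - 1) / (RR_S + RR_A - 2), equation (3.4)."""
    return (rr_as - 1.0) / (rr_s + rr_a - 2.0)


def multiplicativity_index(rr_s, rr_a, rr_as):
    """V = RR_AS / (RR_S * RR_A), equation (3.7)."""
    return rr_as / (rr_s * rr_a)


# Observed V for Study 14 (Selikoff) and Study 15 (Hammond), inputs from Table 3.2.
v_study14 = multiplicativity_index(8.67, 25.00, 40.63)
v_study15 = multiplicativity_index(10.85, 5.17, 53.24)
print(round(v_study14, 2), round(v_study15, 2))  # 0.19 0.95

# Hypothetical, exactly multiplicative risks (RR_AS = RR_S * RR_A):
# V equals 1 in both cases, yet S differs.
for rr_s, rr_a in [(5.0, 3.0), (2.0, 2.0)]:
    rr_as = rr_s * rr_a
    print(multiplicativity_index(rr_s, rr_a, rr_as), synergy_index(rr_s, rr_a, rr_as))

# Hypothetical, exactly additive risks (RR_AS = RR_S + RR_A - 1):
# S equals 1 in both cases, yet V differs.
for rr_s, rr_a in [(5.0, 3.0), (2.0, 2.0)]:
    rr_as = rr_s + rr_a - 1.0
    print(synergy_index(rr_s, rr_a, rr_as), multiplicativity_index(rr_s, rr_a, rr_as))
```

In the multiplicative cases S takes the values 14/6 and 3/2, and in the additive cases V takes the values 7/15 and 3/4, so a fixed value of one index is compatible with a range of values of the other.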

3.3.3 The relationship between exposure to asbestos and

smoking

In the earliest reported assessment of the interaction between exposure to asbestos

and smoking on lung cancer, Doll found some evidence for a multiplicative hypoth-

esis, although it was “far from convincing” (Doll, 1971). Subsequent reviews by

Saracci (1977, 1987); Saracci and Boffetta (1994) and Erren et al. (1999) indicated

evidence in support of the multiplicative hypothesis, while evidence from Berry et al.

(1985) was inconclusive. Consistent with the evidence from a number of studies, two

recent reviews of the literature by Lee (2001) and Liddell (2001) arrived at slightly

different conclusions as to the form of the combined effect. Lee (2002) found lit-

tle evidence to reject a multiplicative relation: “The asbestos relative risk may be

somewhat lower in smokers than non-smokers, but the available data do not clearly

reject the simple multiplicative relation. More complex models of joint action might

indeed fit the data better, but in view of the general problems with the data, it

seems doubtful whether more detailed statistical analysis would shed any greater in-

sight.” (p.496). However, Liddell (2002) highlighted differences in the results of the

case-control versus cohort studies, finding evidence against a simple multiplicative

hypothesis. “Therefore, the multiplicative hypothesis is not generally satisfactory.

Nor, of course, is the additive hypothesis, although it does fit some data sets very

well. Evidently, interaction takes several forms.” (p.495).

Both authors agreed that the form of the combined effect is more than a simple

additive relation, but the strength and nature of the more complex association was

not unanimously determined.

3.4 Methods

3.4.1 Studies

In our assessment of the interaction between exposure to asbestos and smoking we

restrict our attention to the set of studies included in two recent reviews of the

literature by Lee (2001) and Liddell (2001), as it is here that the debate over the

relationship between asbestos exposure and smoking crystallised. The inclusion of

these studies also allows a comparison of results from the approaches explored in

this chapter. A full search of the MEDLINE reference database (1966 - May 2004)

was performed to confirm the information on the studies included in the two reviews

and to assess the influence of results from studies published since then. Details and

results of studies published after 1998 are provided in the discussion section. Details

of studies up to and including the reviews by Lee and Liddell (1966 - 1998) are

shown in Table 3.1.

A summary of the relevant results from studies which provided enough informa-

tion to estimate relative risk of lung cancer for each exposure category is given in

Table 3.2. Differences in results between studies can be partly explained by vari-

ability in exposure levels for both asbestos and smoking. For example, some of

the studies included ex-smokers and light smokers in the non-smoking group. The

variability in exposure levels is discussed further in Section 3.6, and addressed in the

sensitivity analysis (Section 3.5.1). Our review found much variability both in the

use of formal statistical methods to assess the combined effect, and the conclusions

reached. The statistical methods to assess interaction ranged from visually comparing relative risk estimates for exposure groups to more formal significance testing. There was no consistent conclusion in favour of either an additive or multiplicative relation.

Table 3.1: Details of Studies Used for Statistical Analysis

| Author | Location | Study Type and Population | Period Followed | Study Ref.* |
|---|---|---|---|---|
| Selikoff & Hammond (1975) | Cohort 1: New York and Newark NJ; Cohort 2: USA and Canada | Cohort. Asbestos insulation workers | Cohort 1: 1963-1973; Cohort 2: 1967-72 | 13 |
| Martischnig et al. (1977) | Gateshead, England | Hospital CC in shipbuilding area | 1972-73 | 2 |
| Blot et al. (1978) | Georgia, USA | Hospital CC in shipbuilding area | 1970-76 | 7 |
| Hammond et al. (1979) | USA and Canada | Cohort. Asbestos insulation workers | 1967-76 | 15 |
| Blot et al. (1980) | Virginia, USA | Hospital CC in shipbuilding area | 1972-76 | 8 |
| Selikoff et al. (1980) | New Jersey, USA | Cohort. Amosite asbestos factory workers | 1961-77 | 14 |
| Blot et al. (1982) | Florida, USA | Hospital based CC in shipbuilding area | 1970-75 | 9 |
| Liddell et al. (1984) | Quebec, Canada | Cohort. Chrysotile miners and millers | 1967-75 | 17 |
| Pastorino et al. (1984) | Lombardy, Italy | CC in industrial areas | 1976-79 | 3, 4 |
| Berry et al. (1985) | East London, England | Cohort. Asbestos factory workers | 1960-70, 1971-80 | 16, 18 |
| Kjuus et al. (1986) | Telemark and Vestfold, Norway | Hospital CC in industrial and shipbuilding areas | 1979-83 | 6 |
| de Klerk et al. (1991) | Wittenoom, Australia | Nested CC in crocidolite miners and millers | 1979-86 | 1 |
| Bovenzi et al. (1993) | Trieste, Italy | Decedent CC in industrial and shipbuilding area | 1979-81, 1985-86 | 5 |
| McDonald et al. (1993) | Quebec, Canada | Cohort. Chrysotile miners and millers | 1950-92 | 10 |
| Zhu & Wang (1993) | 8 factories, China | Cohort. Chrysotile asbestos products workers | 1972-86 | 11 |
| Meurman et al. (1994) | North Savo, Finland | Anthophyllite miners | 1953-91 | 12 |

Note: *Study numbering for the purposes of this statistical review (as used by Lee (2002), except Studies 14 - 19 are referenced here as Studies 13 - 18), and in Table 3.2. For study references, see Lee (2001).

Table 3.2: Reported Results of Studies

Observed relative risk estimates are shown with their 95% CI in parentheses.

| Study | Author | RRS (95% CI) | RRA (95% CI) | RRAS (95% CI) | Covariance of relative risk estimates |
|---|---|---|---|---|---|
| 1 | deKlerk | 3.44 (0.74, 16.01) | 2.24 (0.41, 12.28) | 9.57 (2.25, 40.65) | 1.65 |
| 2 | Martischnig | 1.78 (0.75, 4.20) | 1.08 (0.19, 6.05) | 5.57 (2.04, 15.18) | 1.17 |
| 3 | Pastorino, no PAH | 5.47 (0.40, 74.20) | 2.82 (0.04, 188.22) | 9.86 (0.69, 140.09) | 5.50 |
| 4 | Pastorino, PAH | 6.93 (0.30, 159.08) | 2.21 (0.02, 206.42) | 15.50 (0.63, 380.37) | 11.72 |
| 5 | Bovenzi | 10.13 (1.13, 91.06) | 1.83 (0.10, 33.82) | 15.89 (1.77, 142.80) | 3.45 |
| 6 | Kjuus | 5.41 (2.09, 13.99) | 2.41 (0.46, 12.50) | 19.86 (5.57, 70.78) | 1.21 |
| 7 | Blot, Georgia | 4.71 (2.27, 9.77) | 1.28 (0.27, 6.01) | 7.58 (3.31, 17.35) | 1.13 |
| 8 | Blot, Virginia | 3.09 (1.43, 6.70) | 1.88 (0.64, 5.50) | 4.87 (2.04, 11.58) | 1.14 |
| 9 | Blot, Florida | 6.01 (1.45, 24.92) | 1.80 (0.14, 22.85) | 7.79 (1.77, 34.18) | 1.66 |
| 10 | McDonald | 4.46 (2.34, 8.48) | 1.65 (0.70, 3.88) | 4.51 (2.38, 8.57) | 1.11 |
| 11 | Zhu | 1.83 (0.58, 5.76) | 3.78 (1.25, 11.37) | 11.06 (3.87, 31.62) | 1.32 |
| 12 | Meurman | 6.27 (0.82, 48.25) | 0.83 (0.05, 13.22) | 6.16 (0.85, 44.78) | 2.72 |
| 13 | Selikoff & Hammond | 7.13 (4.20, 12.11) | 8.47 (1.92, 37.25) | 73.73 (40.47, 134.33) | 1.07 |
| 14 | Selikoff | 8.67 (5.11, 14.71) | 25.00 (9.00, 69.41) | 40.63 (22.30, 74.01) | 1.07 |
| 15 | Hammond | 10.85 (6.39, 18.41) | 5.17 (2.17, 12.32) | 53.24 (31.11, 91.12) | 1.06 |
| 16 | Berry, 1971-80 M+F | 7.13 (4.20, 12.11) | 7.27 (2.39, 22.09) | 17.25 (9.75, 30.52) | 1.06 |
| 17 | Liddell | 4.94 (0.14, 172.43) | 2.98 (0.07, 127.77) | 8.21 (0.24, 279.28) | 24.62 |
| 18 | Berry, 1960-70 F | 7.13 (4.20, 12.11) | 5.00 (0.66, 38.02) | 52.56 (25.06, 110.25) | 1.06 |


Lee (2001, 2002) and Liddell (2001, 2002) described their criteria for excluding

studies, with many studies excluded for insufficient reporting of exposure levels,

and the absence of lung cancer cases in the non-smoking group. The set of studies

for which there was some agreement on their inclusion (See Lee (2002), p.495, Ta-

ble 3.1, Studies 1-12,14-19) are identified and numbered in the right hand column

of Table 3.1.

3.4.2 Methods to assess interaction

Bayesian Meta-Analysis of V and S

We are interested in an overall estimate of the combined effect of asbestos and smoking using estimates from each study. The main advantage of this approach is the ability to ‘borrow strength’ across studies, in order to gain greater precision for the estimates of the quantities of interest, in this case S and V. For each study we estimate the values of S and V using equations 3.4 and 3.7 respectively, and their associated variances as outlined above. The information given by Study 11 was insufficient to calculate an

appropriate estimate for the variance of S, and hence we excluded the estimate from

this study in estimating the overall measure. The influence of this exclusion on the

overall results is presented separately in Section 3.5.

Consider first a hierarchical model for S. We suppose that we have k studies, and

that

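$$Y_i = \text{observed } \log(s_i)$$

$$\theta_i = \text{true } \log(s_i) \text{ for study } i, \quad i = 1, \ldots, k$$

where $s_i$ denotes the synergy index for the $i$th study. Following DuMouchel and Harris (1983) for the univariate analysis, we make the following distributional assumptions

$$Y \mid \theta, \sigma \sim N(\theta, \sigma^2 C)$$

$$\sigma^{-2} \sim \chi^2(df_\sigma)/df_\sigma$$

and

$$\theta \mid \mu, \tau \sim N(X\mu, \tau^2 V)$$

$$\mu \mid \tau \sim N(0, D \to \infty)$$

$$\tau^{-2} \sim \chi^2(df_\tau)/df_\tau$$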

where C and V are k×k observed and prior variance-covariance matrices respectively,

and the degrees of freedom dfσ and dfτ indicate how well C and V, respectively, are

known. Again following DuMouchel, we assume the studies are independent and

take V to be the k × k identity matrix, and we take C to be a diagonal matrix

with the corresponding diagonal entries the variances of the individual observations

Yi. For a general discussion on these assumptions see Tweedie et al. (1996). X is

a vector of 1’s and µ is the mean log synergy index for all studies combined. The

notation D → ∞ indicates that the elements of D are very large and tending to

infinity.

The hierarchical model for V is the same as that for S, with obvious changes of

notation.

We initially conservatively assume that dfσ = 79, to reflect the average number

of jointly exposed cases, and dfτ = 10 to acknowledge that there is little information about between-study behaviour. In Section 3.5.1 we test the sensitivity of the results to these assumptions.

The models were run using a Gibbs sampling algorithm in the software package

WinBUGS (Spiegelhalter et al., 2002). For each analysis, estimates were based on

30,000 iterations, after a burn-in of 20,000 cycles. Convergence was assessed by examining Monte Carlo error estimates and Gelman-Rubin statistics (Brooks and Gelman, 1998a).
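To make the structure of the univariate model concrete, the following is a minimal pure-Python Gibbs sampler for a simplified version of it. It is a sketch under stated assumptions and is not the WinBUGS code used for the thesis: V is the identity, C is diagonal, X is a vector of ones, and the flat prior on µ stands in for D → ∞. The variance of each log(S) is backed out of the width of the 95% confidence interval reported in Table 3.3, which is an assumption made here for illustration (the thesis computes variances as described in its Appendix). The function names and the shorter chain length are also illustrative, so the output should not be expected to reproduce Table 3.3 exactly.

```python
import math
import random

# Observed S with 95% confidence limits, copied from Table 3.3.
# Study 11 is omitted, as in the main analysis. Tuples are (S, lower, upper).
OBSERVED_S = [
    (2.33, 0.90, 6.06), (5.30, 1.23, 22.80), (1.41, 0.64, 3.12),
    (2.03, 0.86, 4.79), (1.49, 1.14, 1.96), (3.24, 1.36, 7.72),
    (1.65, 1.07, 2.55), (1.30, 0.75, 2.24), (1.17, 0.73, 1.87),
    (0.86, 0.56, 1.31), (1.01, 0.45, 2.25), (5.35, 0.63, 45.16),
    (1.31, 0.62, 2.78), (3.73, 1.71, 8.11), (1.31, 0.22, 7.68),
    (1.22, 0.84, 1.77), (5.09, 0.28, 91.59),
]


def gibbs_log_index(data, df_sigma=79, df_tau=10, n_iter=6000, burn_in=1000, seed=1):
    """Return post-burn-in draws of mu, the overall mean of the log index."""
    rng = random.Random(seed)
    y = [math.log(s) for s, _, _ in data]
    # ASSUMPTION: variance of log(S) recovered from the reported CI width.
    c = [((math.log(hi) - math.log(lo)) / (2 * 1.96)) ** 2 for _, lo, hi in data]
    k = len(y)
    theta = y[:]                 # study-specific true log indices
    mu = sum(y) / k              # overall mean
    prec_sigma = 1.0             # sigma^-2
    prec_tau = 1.0               # tau^-2
    mu_draws = []
    for it in range(n_iter):
        # theta_i | rest: precision-weighted compromise between Y_i and mu.
        for i in range(k):
            p_obs = prec_sigma / c[i]
            p_post = p_obs + prec_tau
            mean = (p_obs * y[i] + prec_tau * mu) / p_post
            theta[i] = rng.gauss(mean, math.sqrt(1.0 / p_post))
        # mu | rest under a flat prior.
        mu = rng.gauss(sum(theta) / k, math.sqrt(1.0 / (k * prec_tau)))
        # chi2(df)/df is Gamma(shape=df/2, rate=df/2); gammavariate takes a scale.
        ss_obs = sum((y[i] - theta[i]) ** 2 / c[i] for i in range(k))
        prec_sigma = rng.gammavariate((df_sigma + k) / 2, 2.0 / (df_sigma + ss_obs))
        ss_between = sum((t - mu) ** 2 for t in theta)
        prec_tau = rng.gammavariate((df_tau + k) / 2, 2.0 / (df_tau + ss_between))
        if it >= burn_in:
            mu_draws.append(mu)
    return mu_draws


def summarise(mu_draws):
    """Posterior mean and 95% interval of exp(mu), i.e. of the overall index."""
    s = sorted(math.exp(m) for m in mu_draws)
    n = len(s)
    return sum(s) / n, s[int(0.025 * n)], s[int(0.975 * n)]


mean_s, lower_s, upper_s = summarise(gibbs_log_index(OBSERVED_S))
print(f"overall S: {mean_s:.2f} ({lower_s:.2f}, {upper_s:.2f})")
```

The conditional for each θ is a precision-weighted average of the observed value and the overall mean, which is the mechanism behind the shrinkage of study-specific estimates towards the overall mean.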

Bayesian Multivariate Analysis of Relative Risks

Here $Y_i$ is a vector $(\log(RR_{AS}), \log(RR_A), \log(RR_S))$ observed for each study $i$, $i = 1, \ldots, k$. The multivariate normal distribution is denoted as MVN.

For the multivariate analysis, we make the following distributional assumptions,

$$Y_i \sim MVN(\theta_i, C_i)$$

$$\theta_i \sim MVN(\mu, \Sigma)$$

$$C_i^{-1} \sim \text{Wishart}(R, 3)$$

and

$$\mu \sim MVN(0, D)$$

$$\Sigma^{-1} \sim \text{Wishart}(V, 3)$$

where $\theta_i$ and $\mu$ are the study-specific and overall posterior estimates, respectively. $C_i^{-1}$ and $\Sigma^{-1}$ are precision matrices, and R and V are scale matrices for the prior variance-covariance matrices. R consists of the observed variance-covariance matrix for $Y_i$, and V is taken to be a diagonal matrix suggesting a priori independence between the risk factors and little a priori information about the size of the variances. D is a variance-covariance matrix of diagonal elements approaching ∞. The covariances of observed relative risk estimates for case-control studies were estimated via logistic regression, and for cohort studies from a Poisson model (Breslow and Day, 1987).

3.5 Results

Table 3.3 provides the observed study-specific estimates of S and V and the corre-

sponding posterior estimates based on the univariate Bayesian model described in

Section 3.4.2. Most of the observed study-specific estimates for S are greater than 1,

indicating a more than additive relationship. The study-specific posterior estimates

for S show evidence of shrinkage towards the overall mean. Overall, by ‘borrowing

strength’ across studies the posterior mean of S is 1.70 with a 95% credible interval

(CI) of (1.09, 2.67) indicating (overall) strong evidence in favour of a more than

additive relationship. Inclusion of Study 11 by assuming a relatively small variance

does not change the overall result greatly (1.74 (1.13, 2.70)).

Table 3.3: Results (Univariate): Test of Synergy (S) and Multiplicativity (V)

| Study | Author | S: Observed estimate* | S: Posterior estimate** | V: Observed estimate* | V: Posterior estimate** |
|---|---|---|---|---|---|
| 1 | deKlerk | 2.33 (0.90, 6.06) | 2.08 (0.83, 5.25) | 1.25 (0.19, 8.15) | 1.00 (0.29, 3.51) |
| 2 | Martischnig | 5.30 (1.23, 22.80) | 2.77 (0.86, 9.29) | 2.89 (0.87, 9.61) | 1.87 (0.70, 5.04) |
| 3 | Pastorino, no PAH | 1.41 (0.64, 3.12) | 1.48 (0.66, 3.26) | 0.64 (0.10, 4.08) | 0.75 (0.22, 2.58) |
| 4 | Pastorino, PAH | 2.03 (0.86, 4.79) | 1.92 (0.83, 4.50) | 1.01 (0.13, 7.89) | 0.91 (0.25, 3.40) |
| 5 | Bovenzi | 1.49 (1.14, 1.96) | 1.50 (1.08, 2.09) | 0.86 (0.31, 2.39) | 0.85 (0.36, 2.01) |
| 6 | Kjuus | 3.24 (1.36, 7.72) | 2.63 (1.11, 6.26) | 1.52 (0.39, 5.93) | 1.20 (0.42, 3.42) |
| 7 | Blot, Georgia | 1.65 (1.07, 2.55) | 1.65 (1.01, 2.69) | 1.26 (0.54, 2.93) | 1.15 (0.55, 2.44) |
| 8 | Blot, Virginia | 1.30 (0.75, 2.24) | 1.35 (0.74, 2.45) | 0.84 (0.39, 1.81) | 0.84 (0.42, 1.69) |
| 9 | Blot, Florida | 1.17 (0.73, 1.87) | 1.23 (0.72, 2.07) | 0.72 (0.22, 2.36) | 0.76 (0.29, 2.00) |
| 10 | McDonald | 0.86 (0.56, 1.31) | 0.90 (0.59, 1.38) | 0.61 (0.25, 1.49) | 0.66 (0.31, 1.44) |
| 11 | Zhu | | | 1.60 (0.43, 5.93) | 1.25 (0.45, 3.46) |
| 12 | Meurman | 1.01 (0.45, 2.25) | 1.13 (0.56, 2.33) | 1.19 (0.07, 20.33) | 0.93 (0.22, 4.03) |
| 13 | Selikoff & Hammond | 5.35 (0.63, 45.16) | 2.50 (0.70, 9.18) | 1.22 (0.21, 6.96) | 1.01 (0.30, 3.37) |
| 14 | Selikoff | 1.31 (0.62, 2.78) | 1.38 (0.70, 2.70) | 0.19 (0.06, 0.56) | 0.29 (0.12, 0.71) |
| 15 | Hammond | 3.73 (1.71, 8.11) | 3.15 (1.55, 6.39) | 0.95 (0.44, 2.06) | 0.93 (0.47, 1.84) |
| 16 | Berry, 1971-80 M+F | 1.31 (0.22, 7.68) | 1.52 (0.47, 4.99) | 0.33 (0.10, 1.07) | 0.46 (0.18, 1.18) |
| 17 | Liddell | 1.22 (0.84, 1.77) | 1.25 (0.82, 1.91) | 0.56 (0.20, 1.56) | 0.64 (0.27, 1.51) |
| 18 | Berry, 1960-70 F | 5.09 (0.28, 91.59) | 2.16 (0.54, 8.91) | 1.47 (0.09, 24.74) | 0.98 (0.23, 4.21) |
| | Overall | | 1.70 (1.09, 2.67) | | 0.86 (0.52, 1.41) |
| | Overall (incl. Study 11) | | 1.74 (1.13, 2.70) | | |
| | Attributable Proportion Due to Interaction (API) | | 0.41 (0.08, 0.63) | | |

Note: Estimates quoted are the mean estimate, followed in parentheses by the * 95% Confidence Interval or ** 95% Credible Interval.


The estimated value of API, given an overall posterior estimate for S of 1.70, was

0.41 (0.08,0.63). This suggests that for smokers also exposed to asbestos, approxi-

mately 40% of lung cancer cases can be attributed to the synergistic behaviour of

the two carcinogens, as distinct from their separate effects. Note that this ‘attri-

bution’ is descriptive only and, without other analyses, indicates association rather

than cause.
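The API figures can be checked by hand. The sketch below assumes the attributable proportion takes the form (S − 1)/S, which lies between 0 and 1 whenever S > 1. Applying it to the overall univariate posterior summary of S in Table 3.3, 1.70 (1.09, 2.67), returns the API values reported in the same table. The function name is illustrative.

```python
def attributable_proportion(s):
    """Attributable proportion due to interaction, assumed here to be (S - 1) / S."""
    if s <= 0:
        raise ValueError("S must be positive")
    return (s - 1.0) / s


# Overall univariate posterior mean and 95% credible limits for S (Table 3.3).
for s in (1.70, 1.09, 2.67):
    print(s, round(attributable_proportion(s), 2))
# 1.7 0.41
# 1.09 0.08
# 2.67 0.63
```

Because (S − 1)/S is increasing in S, the interval limits for S map directly onto the interval limits for API.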

The observed study-specific estimates for V in Table 3.3 vary from 0.19 for Study

14 to 2.89 for Study 2. On the basis of the observed 95% confidence intervals, most

of the studies except for Study 14 show evidence consistent with a multiplicative

relationship. Combining the studies, the overall posterior estimate for V is 0.86

(0.52,1.41). This is consistent with Lee’s result of 0.83 (0.63, 1.08) (Lee, 2002). Both

the observed and posterior estimates for V indicate conformity with the hypothesis

of a multiplicative relationship.

Table 3.4 provides the results of the multivariate analysis. The study-specific

posterior estimates of the relative risk of exposure to smoking alone (RRS) range

from 4.07 to 8.13. We find an overall posterior estimate for RRS of 5.51 (3.78,7.89).

For the relative risk of exposure to asbestos alone (RRA), the posterior estimates

range from 1.77 to 6.92. Overall, the posterior estimate for RRA is 3.13 (1.80,5.41).

For the combined exposure of asbestos and smoking, the posterior estimates range

considerably from 5.50 to 50.86. Overall, the posterior estimate for RRAS is 13.69

(8.20,22.76). On the basis of the relative risk estimates, the multivariate estimates

for S and V are 1.94 and 0.83 respectively. For each study, we also calculate the

probability that either S is greater than 1 (indicating more than additive) or V is

less than 1 (indicating less than multiplicative). Overall, the multivariate analysis

indicates overwhelming support for a value of S greater than 1 (P(S>1) ≈ 1), and

a very high probability that V is less than 1 (P(V<1)=0.79). Considering the two

tests together, there is very strong evidence the relationship is more than additive

but less than multiplicative. Table 3.5 provides the overall results for S and V under

the univariate and multivariate models and probability estimates that S and V are

greater than selected thresholds.
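The probabilities P(S>1) and P(V<1) quoted above are posterior tail probabilities. One way to obtain such quantities from MCMC output is to compute S and V for every posterior draw of the three relative risks and report the proportion of draws that satisfy each condition. The sketch below illustrates the idea; the function name and the four draws are hypothetical and are not output from the thesis analysis.

```python
def index_probabilities(draws, s_threshold=1.0, v_threshold=1.0):
    """Estimate P(S > s_threshold) and P(V < v_threshold) from posterior draws.

    draws: iterable of (rr_s, rr_a, rr_as) tuples, one per posterior sample.
    """
    n = 0
    s_above = 0
    v_below = 0
    for rr_s, rr_a, rr_as in draws:
        denominator = rr_s + rr_a - 2.0
        if denominator <= 0:
            raise ValueError("S is undefined when RR_S + RR_A <= 2")
        s = (rr_as - 1.0) / denominator      # equation (3.4)
        v = rr_as / (rr_s * rr_a)            # equation (3.7)
        n += 1
        s_above += s > s_threshold
        v_below += v < v_threshold
    if n == 0:
        raise ValueError("no draws supplied")
    return s_above / n, v_below / n


# Hypothetical draws of (RR_S, RR_A, RR_AS), for illustration only.
toy_draws = [
    (5.0, 3.0, 14.0),
    (6.0, 2.5, 16.0),
    (4.5, 3.5, 6.0),
    (5.5, 3.0, 13.0),
]
p_s, p_v = index_probabilities(toy_draws)
print(p_s, p_v)  # 0.75 0.75
```

Changing `s_threshold` or `v_threshold` gives the probabilities at other cut-offs, such as those tabulated for S and V in Table 3.5.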

Table 3.4: Results for Multivariate RR Analysis

| Study No. | Author | RRS | RRA | RRAS | S | V | P(S>1) | P(V<1) |
|---|---|---|---|---|---|---|---|---|
| 1 | deKlerk | 4.73 (2.16, 10.74) | 2.88 (1.11, 7.81) | 12.30 (5.19, 31.47) | 2.16 (0.98, 4.23) | 1.04 (0.31, 2.51) | 0.97 | 0.58 |
| 2 | Martischnig | 3.43 (1.54, 8.79) | 2.29 (0.72, 8.05) | 9.56 (3.61, 31.66) | 3.05 (0.75, 7.51) | 1.51 (0.31, 4.19) | 0.94 | 0.37 |
| 3 | Pastorino, no PAH | 5.57 (2.34, 13.24) | 2.78 (0.82, 9.16) | 10.75 (3.82, 30.30) | 1.62 (0.69, 3.44) | 0.85 (0.20, 2.46) | 0.87 | 0.73 |
| 4 | Pastorino, PAH | 5.89 (2.38, 14.61) | 2.87 (0.82, 9.87) | 13.45 (4.47, 40.69) | 1.94 (0.77, 4.12) | 0.98 (0.23, 2.82) | 0.93 | 0.65 |
| 5 | Bovenzi | 6.80 (2.70, 16.56) | 2.25 (0.69, 7.20) | 11.00 (4.16, 27.72) | 1.50 (0.80, 2.45) | 0.86 (0.22, 2.37) | 0.91 | 0.72 |
| 6 | Kjuus | 5.51 (2.90, 10.50) | 3.03 (1.16, 8.35) | 17.37 (6.47, 43.08) | 2.67 (0.98, 5.34) | 1.21 (0.33, 2.99) | 0.97 | 0.45 |
| 7 | Blot, Georgia | 5.20 (2.87, 9.87) | 2.02 (0.76, 5.91) | 8.80 (4.28, 20.88) | 1.56 (0.76, 2.83) | 0.96 (0.28, 2.28) | 0.90 | 0.63 |
| 8 | Blot, Virginia | 4.07 (2.16, 8.93) | 2.29 (0.96, 6.23) | 6.99 (3.16, 21.20) | 1.52 (0.62, 3.27) | 0.85 (0.26, 2.05) | 0.81 | 0.74 |
| 9 | Blot, Florida | 5.94 (2.71, 12.53) | 2.25 (0.74, 6.84) | 8.39 (3.56, 21.18) | 1.25 (0.62, 2.37) | 0.74 (0.20, 1.99) | 0.71 | 0.80 |
| 10 | McDonald | 5.08 (2.81, 10.25) | 1.92 (0.87, 4.88) | 5.50 (2.93, 13.72) | 0.94 (0.56, 1.70) | 0.62 (0.23, 1.36) | 0.25 | 0.92 |
| 11 | Zhu | 3.74 (1.50, 10.00) | 4.70 (1.61, 12.94) | 15.93 (5.94, 44.48) | 2.42 (1.10, 4.50) | 1.04 (0.30, 2.45) | 0.98 | 0.56 |
| 12 | Meurman | 6.27 (2.47, 15.17) | 1.77 (0.49, 6.36) | 7.22 (2.67, 20.53) | 1.03 (0.51, 2.28) | 0.80 (0.19, 2.33) | 0.47 | 0.77 |
| 13 | Selikoff & Hammond | 6.25 (2.97, 11.15) | 6.53 (1.82, 20.45) | 50.86 (12.28, 104.27) | 4.91 (1.55, 9.24) | 1.49 (0.36, 3.97) | 0.99 | 0.33 |
| 14 | Selikoff | 6.39 (2.80, 12.76) | 6.92 (1.65, 26.60) | 26.71 (7.80, 64.91) | 2.43 (0.93, 5.40) | 0.78 (0.16, 2.46) | 0.97 | 0.76 |
| 15 | Hammond | 8.13 (3.51, 14.44) | 4.62 (1.67, 11.48) | 36.16 (10.94, 72.31) | 3.38 (1.44, 5.54) | 1.09 (0.34, 2.59) | 0.99 | 0.53 |
| 16 | Berry, 1971-80 M+F | 6.37 (3.46, 10.68) | 4.56 (1.61, 10.99) | 15.83 (7.85, 29.22) | 1.71 (0.94, 3.10) | 0.63 (0.22, 1.64) | 0.96 | 0.88 |
| 17 | Liddell | 5.42 (2.15, 13.41) | 2.93 (0.95, 8.80) | 9.62 (3.40, 27.36) | 1.45 (0.71, 2.90) | 0.72 (0.20, 1.90) | 0.84 | 0.82 |
| 18 | Berry, 1960-70 F | 6.46 (3.42, 10.80) | 5.10 (1.59, 15.99) | 37.08 (10.90, 76.17) | 3.98 (1.34, 7.52) | 1.34 (0.33, 3.51) | 0.99 | 0.41 |
| | Overall | 5.51 (3.78, 7.89) | 3.13 (1.80, 5.41) | 13.69 (8.20, 22.76) | 1.94 (1.29, 2.84) | 0.83 (0.46, 1.40) | 1.00 | 0.79 |

Note: Posterior estimates quoted are the mean estimate, followed in parentheses by the 95% Credible Interval.

A comparison of the overall results for S and V from the univariate and multivariate analyses in Table 3.5 does not reveal a large difference in either the point estimates or the confidence range. The results for V from Studies 2 and 5 (the largest study) can be compared in Figures 3.1 and 3.2. For Study 2 the relative risk and covariance estimates are low compared to higher estimates for Study 5. These figures again reveal relatively minor differences in the point estimates, but the multivariate analysis supports a wider credible interval.

Table 3.5: Combined Results for S and V

| Analysis | S: Overall (95% CI) | P(S>1) | P(S>1.5) | P(S>2) | V: Overall (95% CI) | P(V<0.5) | P(V<1) | P(V<1.5) |
|---|---|---|---|---|---|---|---|---|
| Bayesian Univariate | 1.70 (1.09, 2.67) | 0.99 | 0.70 | 0.23 | 0.86 (0.52, 1.41) | 0.02 | 0.74 | 0.98 |
| Bayesian Multivariate | 1.94 (1.29, 2.84) | 1.00 | 1.00 | 1.00 | 0.83 (0.46, 1.40) | 0.05 | 0.79 | 0.98 |

Figures 3.3 and 3.4 show bivariate density plots for S and V based on the results

of the multivariate analysis from Studies 2 and 5 respectively. Lower relative risk

and covariance estimates for Study 2 result in a bivariate density plot which is

more evenly spread compared to Study 5. Table 3.6 provides a test of the variance-

covariance matrix for the multivariate analysis. Higher covariance estimates for the

relative risk estimates appear to result in a tightening of the 95% credible intervals

for S and V.

Figure 3.1: Box plots of V from Study 2

Figure 3.2: Box plots of V from Study 5

Figure 3.3: Density plot of V and S from multivariate analysis for Study 2

Figure 3.4: Density plot of V and S from multivariate analysis for Study 5

Table 3.6: Sensitivity of estimates from the Variance/Covariance Matrix for the Multivariate Analysis

| Analysis | S (95% CI) | V (95% CI) |
|---|---|---|
| Main | 1.94 (1.29, 2.84) | 0.83 (0.46, 1.40) |
| Low variance (0.20), Low covariance (0.15) | 2.07 (1.38, 2.99) | 0.97 (0.60, 1.50) |
| High variance (1.00), High covariance (0.90) | 2.10 (1.36, 3.16) | 1.00 (0.53, 1.74) |
| High variance (1.00), Low covariance (0.15) | 2.08 (1.07, 3.67) | 0.99 (0.45, 1.92) |

3.5.1 Sensitivity of the Results

A meta-analysis provides an opportunity to investigate subgroups of studies. Table 3.7 provides the results for S and V by type of study, classification for non-smoker, use of external reference, type of asbestos, classification used for no asbestos exposure, and study size. There appears to be strong evidence for a less than multiplicative relationship among the following subsets: prospective studies (P(V<1)=0.86); studies which classify only those who never smoked as non-smokers (P(V<1)=0.86); studies based on exposure to crocidolite or amosite (P(V<1)=0.85); and studies with number of cases less than 150 (P(V<1)=0.78). The difference between types of asbestos is based on only two studies (1 and 14) for crocidolite and amosite, and strongly influenced by a low estimate for V from study 14 (Observed V=0.19 (0.06,0.56)).

The sensitivity of the Bayesian univariate estimates to the distributional as-

sumptions in the model is provided in Table 3.8. Plausible ranges for the degrees

of freedom for σ and τ , a tighter precision estimate for µ and a robust analysis

excluding the smallest and largest studies were tested. The overall results appear to

be robust to these alternative assumptions.

The sensitivity of the Bayesian multivariate estimates is summarised in Table 3.9.

Results by study type, a tighter precision for µ and exclusion of the smallest and

largest study are shown. As previously indicated by the univariate results, there

appears to be more evidence for a multiplicative relation for case-control studies

(P(V<1)=0.37) compared to cohort studies (P(V<1)=0.90), although support for

a simple multiplicative relation for cohort studies (V=0.64 (0.23,1.42)) cannot be

ruled out.

Table 3.7: Results for S and V by Factor

| Factor | Subgroup | S: Posterior estimate (95% CI) | P(S>1) | No. studies (S) | V: Posterior estimate (95% CI) | P(V<1) | No. studies (V) |
|---|---|---|---|---|---|---|---|
| Main | | 1.70 (1.09, 2.67) | 0.99 | 17 | 0.86 (0.52, 1.41) | 0.74 | 18 |
| 1. Study Type | Prospective | 1.6 (0.75, 3.55) | 0.89 | 8 | 0.66 (0.30, 1.46) | 0.86 | 9 |
| | Case-control | 1.82 (0.96, 3.50) | 0.97 | 9 | 1.10 (0.53, 2.33) | 0.40 | 9 |
| 2. Classification for Non-smoker | Never smoked | 1.58 (0.79, 3.26) | 0.90 | 9 | 0.68 (0.33, 1.39) | 0.86 | 10 |
| | Light smoker | 1.88 (0.93, 3.88) | 0.96 | 8 | 1.14 (0.51, 2.56) | 0.37 | 8 |
| 3. By Group (External Reference) | Data compared to external ref. | 2.48 (0.81, 7.74) | 0.95 | 5 | 0.55 (0.18, 1.69) | 0.86 | 5 |
| | Otherwise | 1.54 (0.91, 2.61) | 0.95 | 12 | 1.01 (0.56, 1.85) | 0.49 | 13 |
| 4. Type of Asbestos | Any Type | 2.03 (1.14, 3.68) | 0.99 | 12 | 0.98 (0.53, 1.83) | 0.53 | 12 |
| | Crocidolite or Amosite | 1.69 (0.33, 8.56) | 0.75 | 2 | 0.40 (0.06, 2.56) | 0.85 | 2 |
| | Chrysotile | 1.01 (0.22, 4.54) | 0.51 | 2 | 0.79 (0.20, 3.10) | 0.64 | 3 |
| 5. Classification for No Asbestos Exposure | No Exposure | 1.97 (0.66, 6.12) | 0.89 | 4 | 1.25 (0.42, 3.75) | 0.34 | 5 |
| | Low Exposure | 1.53 (0.68, 3.51) | 0.85 | 7 | 0.76 (0.30, 1.95) | 0.72 | 7 |
| | Population Exp. | 1.81 (0.77, 4.22) | 0.92 | 6 | 0.73 (0.29, 1.78) | 0.76 | 6 |
| 6. Study Size | No. Cases < 150 | 1.76 (0.80, 3.97) | 0.92 | 8 | 0.71 (0.30, 1.75) | 0.78 | 9 |
| | No. Cases > 150 | 1.69 (0.90, 3.26) | 0.95 | 9 | 0.97 (0.49, 1.93) | 0.54 | 9 |


Table 3.8: Sensitivity of the Posterior Estimates (Univariate): Test of S and V

| Analysis | Setting | S* | V* | P(S>1) | P(V<1) |
|---|---|---|---|---|---|
| Main | | 1.70 (1.09, 2.67) | 0.86 (0.52, 1.41) | 0.99 | 0.74 |
| 1. df(σ) | 14 | 1.71 (1.09, 2.70) | 0.86 (0.53, 1.41) | 0.99 | 0.74 |
| | 268 | 1.70 (1.09, 2.67) | 0.85 (0.52, 1.42) | 0.99 | 0.74 |
| 2. df(τ) | 20 | 1.72 (1.07, 2.79) | 0.86 (0.51, 1.44) | 0.99 | 0.73 |
| | 2 | 1.63 (1.15, 2.36) | 0.84 (0.55, 1.31) | 1.00 | 0.79 |
| 3. Precision(µ) | 1/15 | 1.69 (1.09, 2.66) | 0.86 (0.52, 1.41) | 0.99 | 0.74 |
| 4. Robust (excl. smallest and largest studies) | | 1.69 (1.04, 2.75) | 0.84 (0.49, 1.43) | 0.98 | 0.74 |

Note: * Posterior estimates quoted are the mean estimate, followed in parentheses by the 95% Credible Interval.

Table 3.9: Sensitivity of the Posterior Estimates (Multivariate Analysis)

| Analysis | RRS | RRA | RRAS | S | V | P(S>1) | P(V<1) |
|---|---|---|---|---|---|---|---|
| Main | 5.51 (3.78, 7.89) | 3.13 (1.80, 5.41) | 13.69 (8.20, 22.76) | 1.94 (1.29, 2.84) | 0.83 (0.46, 1.40) | 1.00 | 0.79 |
| SA1: CC | 4.22 (2.39, 7.53) | 1.72 (0.85, 3.55) | 8.39 (4.75, 15.18) | 1.99 (0.99, 3.79) | 1.28 (0.48, 2.79) | 0.97 | 0.37 |
| SA1: PP | 7.14 (4.06, 12.39) | 5.63 (2.51, 12.85) | 23.08 (10.33, 48.28) | 2.13 (1.01, 3.98) | 0.64 (0.23, 1.42) | 0.98 | 0.90 |
| SA2: Restrictive prior | 5.22 (3.57, 7.45) | 2.90 (1.67, 4.92) | 12.54 (7.48, 20.59) | 1.92 (1.26, 2.80) | 0.86 (0.48, 1.45) | 1.00 | 0.75 |
| SA3: Robust (excl. smallest and largest studies) | 5.13 (3.39, 7.67) | 3.05 (1.71, 5.54) | 12.33 (7.24, 21.16) | 1.87 (1.20, 2.78) | 0.83 (0.43, 1.44) | 1.00 | 0.79 |

Note: Estimates quoted are the mean estimate, followed in parentheses by the 95% Credible Interval.


3.6 Discussion

We reviewed the literature on the combined effect of exposure to asbestos and smok-

ing on lung cancer, and explored a Bayesian approach to assess evidence of interac-

tion. A Bayesian approach using estimates of S and V indicates that the relation is

closer to multiplicative than additive, a result consistent with recent reviews of the

literature.

The results highlight two issues. First, estimates from the univariate and multi-

variate analysis were similar but with wider credible intervals on the latter. Although

we have more information about our parameters, the effect of incorporating covari-

ance information is to increase the number of parameters of interest, and we only

indirectly estimate these parameters. The wider credible interval of each estimate

may thus be a more accurate reflection of the uncertainty we have in these estimates.

Second, while there was support from a few of the studies for a multiplicative re-

lation, the same studies also supported an additive relation. It is thus important

to both allow for this uncertainty in the modelling and directly assess competing

hypotheses of interest by analysing the evidence jointly.

Several explanations have been postulated for the biological mechanisms under-

lying evidence for a multiplicative relation. One is that cancer may be a multistage

process, with the two carcinogens acting at different stages (Peto et al., 1996). It is

postulated that early stage carcinogenesis by asbestos supplies a population of initi-

ated cells that the powerful late-stage actions of tobacco carcinogens then promote

to overt cancer (Reif and Heeren, 1999). In contrast, when two carcinogens affect

the same stage of carcinogenesis, then the relative risks are additive, and there is no

interaction (Brown and C, 1989). Another explanation is that smoking may impair

clearance of asbestos particles from the lung (Cohen et al., 1979).

The main emphasis of our review is on studies included in two recent reviews

of the literature (Lee, 2001; Liddell, 2001). A search of the MEDLINE reference

database (1998 - May 2004) and cited references for more recent studies revealed a

number of papers with information relating to occupational exposure to asbestos,

smoking habits, and the association of these factors with lung cancer risks. However,


only one study provided quantitative information about relative risks for each expo-

sure category (Gustavsson et al., 2002). The results from one study were based on a

cohort previously included (Liddell and Armstrong, 2002). Some studies provided insufficient information on smoking habits (Rafnsson and Sulem, 2003; Ulvestad et al.,

2002; Goldberg, 1999; Stayner et al., 1997), while other studies were underpowered

to assess evidence of a joint effect (Rosamilia et al., 1999) or were genetically based

(Schabath et al., 2002). The study by Goldberg (1999) found that “the probabil-

ity that a cancer is due to asbestos is the same among smokers and non-smokers”,

implying a multiplicative relation was found.

The study by Gustavsson et al. (2002) investigated the association between low-

dose exposure to asbestos and lung cancer, and in the analysis of the combined

effect of asbestos and smoking, found evidence indicating a less than multiplicative

yet slightly more than additive effect. Relative risk estimates (with 95% confidence

intervals) are reported as RRS=21.8 (14.4, 32.8), RRA=4.2 (1.6, 11.1), RRAS=28.6

(19.9, 48.3). Departure from multiplicativity was investigated in the study by includ-

ing an interaction term in a logistic regression (β12=0.31 (0.11,0.86)), and departure

from additivity was evaluated using the Synergy Index (1.15 (0.77, 1.72)).

There are, of course, limitations associated with combining studies in the form of

a meta-analysis. Meta-analysis is designed to enable a combination of results from

studies which are comparable in outcome and exposure. Here we have combined

studies with variability in, but not limited to: definitions of non-smokers; exposure

times to asbestos; and exposure to different types and size of asbestos particles. In

the first case, approximately half of the studies (ten) defined a non-smoker as ‘never

smoked’, with the rest combining non-smokers and ‘light smokers’. Although, as Lee

points out, it could still be possible to observe a multiplicative relation regardless

of the smoking definition, the magnitude of the effect may be somewhat diminished

(Lee, 2001). Across studies there is also a difference in the duration and level of

exposure to asbestos. A recent study by Gustavsson et al. (2002) examining the

risk of low-dose exposure to asbestos, found evidence for a multiplicative relation

with a magnitude of interaction lower than that previously reported for higher doses.

The studies also differ in the type of asbestos and the size of asbestos particles to


which subjects are likely to be exposed. Another recent study by Hodgson found

the risk differential between chrysotile and crocidolite or amosite for lung cancer to

be between 1:10 and 1:50 (Hodgson and Darnton, 2000). Further, there is evidence

that the size of asbestos particles is important. Landrigan (1998) found evidence

that the risk of lung cancer in the mining and milling industry is 10 to 50 fold lower

than in industries that process and use asbestos, such as textile manufacture and

insulation. In industries that use and process asbestos, bundles of fibres are broken

up into shorter, thinner fibres that are readily inhaled and retained in the alveoli.

There are also limitations to assessing interaction. First, in the case of studies

on exposure to asbestos and smoking, the small number of lung cancer cases for

non-smokers greatly increases the uncertainty of establishing any relation unless

study populations are very large or specifically targeted. For example, in study

9, we found support for a multiplicative relation using our test of multiplicativity

(V=0.72 (0.22, 2.36)). A hypothetical increase in the number of lung cancer cases

occurring in the asbestos-exposed population from only 5 to 10 would be enough

to obtain a result which is not supportive of a multiplicative relation. Greenland

and Rothman (1998) suggest that even with large data sets we may not have enough

information to establish relations among variables while controlling for confounding.

Second, consistent with a multi-stage model of carcinogenesis, the form of interaction

observed may be influenced by the length of follow-up time in studies (Archer, 1988).

An assessment of interaction is also a function of dosage levels for each risk factor,

both in the nature of the functional form assumed for dose-response relationships for

each factor and the dosage levels at which they combine. In the case of continuous

covariates, care must be taken to consider the appropriate dose-response relationship

for each factor individually before an assessment of the combined effect. Here we

have used categorical covariates (exposed versus not exposed) on the dosage levels

for each factor, and the dose-response relationship is difficult to explicitly model.

Our main interest is then the extent to which the risk factors combine at this binary

level. Although the definitions of those exposed and not exposed are subject to

cutoff points we should still be able to see evidence for a multiplicative or additive

relation provided the definitions are consistent across studies. However, a limitation


of such a binary classification is that the power to test interactions is essentially

determined by the size of the smallest category, so few lung cancer cases for non-

smokers suggests that an analysis based on a binary classification is likely to be

weaker than one based on continuous data.

Chapter 4

A Bayesian approach to assess interaction between

known risk factors: the risk of lung cancer from

exposure to asbestos and smoking

4.1 Summary

In Chapter 3, we primarily focussed on separate tests for an additive or multiplicative

relation. In this Chapter, we extend these approaches by exploring the strength

of evidence for either relation using approaches which allow the data to choose

between both models. We then compare the different approaches. As this chapter is

designed to be read independently of Chapter 3, the first three sections (Introduction,

Overview of studies, and Methods to assess interaction) are largely repeated from

Chapter 3.

4.2 Introduction

The assessment of relationships between risk factors has been the subject of much

research (Saracci and Boffetta, 1994; Liddell, 2001; Lee, 2001). Recent studies have

provided an interpretation in the context of classical relative risk models (Roy and


Esteve, 1998), proposed an alternative measure of effect (Berry and Liddell, 2004)

or assessed non-parametric alternatives (van der Linde and Osius, 2001).

Tests for interaction are commonly based on linear additive or multiplicative

relations. Evidence for a multiplicative relation is indicated if the risk attributed to

combined exposure exceeds the risk attributable to each factor alone. In this sense,

we are talking about positive interaction or synergy between risk factors, and not

antagonism. Alternatively, if the risk attributed to combined exposure equals the

sum of the risks attributable to each factor alone the relation is considered to be

additive.

Evidence for a multiplicative relation between exposure to asbestos and smoking

and the incidence of lung cancer was indicated by an early study of US workers

(Selikoff et al., 1968). Subsequent studies and reviews of the literature with an

objective to assess the nature of the relationship further have indicated mixed results,

ranging from mixed evidence for both an additive and multiplicative relation to

strong evidence for a supramultiplicative relation (Saracci and Boffetta, 1994; Erren

et al., 1999; Vainio and Boffetta, 1994; Steenland and Thun, 1986).

The importance of understanding the nature of the combined effect of asbestos

exposure and smoking can be placed in a public health and legal context. From

a public health perspective, evidence for a multiplicative relation between asbestos

exposure and smoking has led to recommendations for asbestos-exposed persons

who currently smoke to stop, since cases of lung cancer induced by both exposures

would be prevented, along with those induced by smoking alone (Waage et al., 1997).

In a legal context, a greater understanding of the nature of the combined effect has

been required in the attribution of damages in cases where there is a history of

exposure to both asbestos and smoking (Guidotti, 2002).

We illustrate a Bayesian approach to assessing interaction using evidence on

the risk of lung cancer from exposure to asbestos and smoking. The strength of this

approach, in this context, is twofold. First, through the hierarchical structure of

likelihoods and priors, informed opinion about variance structures and relationships

between studies and outcomes can be integrated with the observed data. The second

is the ability to make useful probability statements on the basis of all information.


In particular, we will draw on these strengths, and explore approaches which allow a

single inference to be made about the strength of evidence for one relation compared

to another.

The chapter proceeds as follows. Section 2 provides an overview of the case study

and corresponding available literature. Section 3 outlines the proposed Bayesian ap-

proaches to the assessment of interaction. The results of the case study are presented

in Section 4. General conclusions and discussion follow in Section 5.

4.3 Overview of studies

In this chapter we focus on assessing information about the relationship between

asbestos exposure and smoking from studies included in two recent reviews of the

literature by Lee (2001) and Liddell (2001) (as discussed above), as it is here that the

debate over the relationship crystallised. The inclusion of these studies also allows

a comparison of reported meta-analysis results with those obtained in this chapter.

A full search of the MEDLINE reference database (1966 - May 2004) was used to

confirm the information of the studies included in the two reviews and to assess

the influence of results from studies published since then. The details and results

of studies published after Lee and Liddell are provided in the Discussion. Details

of studies up to and including the reviews by Lee and Liddell (1966 - 1998) are

provided in Wraith and Mengersen (2007).

In the earliest reported assessment of the interaction between exposure to as-

bestos and smoking on lung cancer, Doll found some evidence for a multiplicative

hypothesis, although it was “far from convincing” (Doll, 1971). Subsequent reviews

by Saracci (1977, 1987); Saracci and Boffetta (1994) and Erren et al. (1999) in-

dicated evidence in support of the multiplicative hypothesis, while evidence from

Berry et al. (1985) was inconclusive. Consistent with the evidence from a number of

studies, two recent reviews of the literature by Lee (2001) and Liddell (2001) arrived

at slightly different conclusions as to the form of the combined effect. Lee (2002)

found little evidence to reject a multiplicative relation: “The asbestos relative risk

may be somewhat lower in smokers than non-smokers, but the available data do


not clearly reject the simple multiplicative relation. More complex models of joint

action might indeed fit the data better, but in view of the general problems with

the data, it seems doubtful whether more detailed statistical analysis would shed

any greater insight.” (p.496). However, Liddell (2002) highlighted differences in the

results of the case-control versus cohort studies, finding evidence against a simple

multiplicative hypothesis. “Therefore, the multiplicative hypothesis is not generally

satisfactory. Nor, of course, is the additive hypothesis, although it does fit some

data sets very well. Evidently, interaction takes several forms.” (p.495).

Both authors agreed that the form of the combined effect is more than a simple

additive relation, but the strength and nature of the more complex association was

not unanimously determined.

For the statistical assessment in the current chapter it was decided to include

studies for which there was some agreement on their inclusion between Lee (2002)

and Liddell (2002). Studies were mainly excluded for insufficient reporting of expo-

sure levels, and absences of lung cancer cases in the non-smoking group. A further

discussion on the inclusion of these studies can be found in Lee (2002) and Lid-

dell (2002). Details of the studies included for the following statistical analysis are

provided in Table 4.1.

4.4 Methods to assess interaction

A conceptual basis for understanding interaction between risk factors can be found

in Rothman’s (Rothman, 1976) component sufficient-cause paradigm of disease cau-

sation. Under this framework, synergistic or positive interaction occurs if two ex-

posures are component causes in the same sufficient cause. In the context of this

study, a case for synergistic interaction can be made if lung cancer occurs only with

exposure to both asbestos and smoking, as opposed to exposure to only one of these

two factors.

A synergistic interaction effect, in the biological sense, is tested by departure from

additivity of absolute effects, i.e. the relative excess risk among those with combined

exposure should exceed the sum of the relative excess risks for each of the component


Table 4.1: Details of studies for statistical review

| Study Ref.* | Author | Location | Study Type and Population | Period Followed |
|---|---|---|---|---|
| 1 | de Klerk et al. (1991) | Wittenoom, Australia | Nested CC in crocidolite miners and millers | 1979-86 |
| 2 | Martischnig et al. (1977) | Gateshead, England | Hospital CC in shipbuilding area | 1972-73 |
| 3, 4 | Pastorino et al. (1984) | Lombardy, Italy | CC in industrial areas. 3: No PAH, 4: PAH | 1976-79 |
| 5 | Bovenzi et al. (1993) | Trieste, Italy | Decedent CC in industrial and shipbuilding area | 1979-81, 1985-86 |
| 6 | Kjuus et al. (1986) | Telemark and Vestfold, Norway | Hospital CC in industrial and shipbuilding areas | 1979-83 |
| 7 | Blot et al. (1978) | Georgia, USA | Hospital CC in shipbuilding area | 1970-76 |
| 8 | Blot et al. (1980) | Virginia, USA | Hospital CC in shipbuilding area | 1972-76 |
| 9 | Blot et al. (1982) | Florida, USA | Hospital based CC in shipbuilding area | 1970-75 |
| 10 | McDonald et al. (1980) | Quebec, Canada | Cohort. Workers at Thetford mines and Asbestos, Que. | 1891-1920 to 1975 |
| 11 | Zhu & Wang (1993) | 8 factories, China | Cohort. Chrysotile asbestos products workers | 1972-86 |
| 12 | Meurman et al. (1994) | North Savo, Finland | Anthophyllite miners | 1953-91 |
| 13 | Selikoff & Hammond (1975) | New York and Newark NJ, USA and Canada | Two cohorts. Asbestos insulation workers | 1963-1973, 1967-72 |
| 14 | Selikoff et al. (1980) | New Jersey, USA | Cohort. Amosite asbestos factory workers | 1961-77 |
| 15 | Hammond et al. (1979) | USA and Canada | Cohort. Asbestos insulation workers | 1967-76 |
| 16, 18 | Berry et al. (1985) | East London, England | Population based case-referent asbestos factory workers. 16: 1971-80 M+F, 18: 1960-70 F | 1960-70, 1971-80 |
| 17 | Liddell et al. (1984) | Quebec, Canada | Cohort. Chrysotile miners and millers | 1967-75 |

Note: * Study numbering for the purposes of this statistical review (as used by Lee (2002), except Studies 14-19 are referenced here as Studies 13-18). For study references, see Lee (2001).


causes, referenced to those unexposed to both causes. A common test for an additive

relation used in the epidemiological literature is the Synergy Index (S), introduced by

Rothman (Rothman, 1974) as S = (RRAS − 1)/(RRA + RRS − 2), where RRS, RRA

and RRAS are the relative risks from exposure to smoking, asbestos and smoking

and asbestos combined, referenced against no exposure (RR00), respectively. Under

the additive hypothesis S = 1; for a more than additive model S > 1 and less than

additive S < 1.

A test for synergy defined by Lee (Lee, 2001) is based on the rationalisation that

for a multiplicative model to hold, the product of RRAS and RR00 should equal that

for RRS and RRA. Hence the test statistic is V = (RRAS × RR00)/(RRS × RRA). Under

the multiplicative hypothesis V = 1; for a more than multiplicative model V > 1,

and less than multiplicative V < 1.
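
As a quick arithmetic illustration of the two statistics just defined, the sketch below evaluates S and V from point estimates only. It uses the relative risks from Gustavsson et al. (2002) quoted in Section 3.6; the function names are illustrative and are not part of the thesis, and credible or confidence intervals are not computed.

```python
def synergy_index(rr_s, rr_a, rr_as):
    """Rothman's Synergy Index: S = (RR_AS - 1) / (RR_A + RR_S - 2)."""
    return (rr_as - 1.0) / (rr_a + rr_s - 2.0)


def multiplicativity_index(rr_s, rr_a, rr_as, rr_00=1.0):
    """Lee's V = (RR_AS * RR_00) / (RR_S * RR_A); RR_00 = 1 for the reference group."""
    return (rr_as * rr_00) / (rr_s * rr_a)


# Point estimates reported by Gustavsson et al. (2002), as quoted in Section 3.6.
rr_s, rr_a, rr_as = 21.8, 4.2, 28.6

s = synergy_index(rr_s, rr_a, rr_as)
v = multiplicativity_index(rr_s, rr_a, rr_as)
print(f"S = {s:.2f}, V = {v:.2f}")
```

Here S is slightly above 1 (more than additive) while V is well below 1 (less than multiplicative), which agrees with the Synergy Index of 1.15 and the interaction estimate of 0.31 reported for that study.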

In a previous chapter, we explored a Bayesian approach to the problem using the

Synergy Index (S) and Test of Multiplicativity (V) independently, and a bivariate

analysis using both these measures (Wraith and Mengersen, 2007). As outlined

above, S and V are essentially two separate tests, either testing for departure from

an additive or multiplicative relation, respectively. In the present chapter we explore

more synthesised approaches to assessing interaction, again in a Bayesian framework.

Our motivation in moving away from tests based on S and V is to explore approaches

which allow a single inference to be made about the strength of evidence for either

relation.

This section is arranged as follows. First, we adopt the common approach of

including an interaction term in either a logistic or Poisson hierarchical regression

model, for case-control or cohort studies, respectively. Next, we consider relative risk

models developed by Guerrero and Johnson (1982) and Lubin and Gaffey (1988).

Finally, we explore a mixture model where we hypothesise that the relative risk of

lung cancer from combined exposure to asbestos and smoking is drawn either from

an additive or multiplicative relation of exposure to asbestos and smoking alone.


Meta-analysis of logistic and Poisson regression models

A common method to test for a multiplicative model is to include an interaction term

in either a logistic model for case-control data or Poisson model for cohort data. For

the studies in this review we only had access to the summary information provided

by the study results. While some of the case-control studies were matched, the study

information provided precluded an analysis using conditional logistic regression.

For case control studies, let Yij denote the number of cases observed in each ex-

posure category j for study i. For the ith study, πij is the probability of observing

outcome j and nij is the number of cases and controls for outcome j. The vari-

ables XA and XS are binary covariates indicating exposure to asbestos or smoking,

respectively, βik is a vector of coefficients where k=(0,A,S,AS) with 0 representing

no exposure, and $\sigma_i^2$ is a study-specific variance parameter. The following hierarchical

logistic model is assumed,

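$$
\begin{aligned}
Y_{ij} &\sim \mathrm{Bin}(n_{ij}, \pi_{ij}) \\
\log\left(\frac{\pi_{ij}}{1-\pi_{ij}}\right) &= \beta_{i0} + \beta_{iA}X_A + \beta_{iS}X_S + \beta_{iAS}X_AX_S + \varepsilon_{ij}
\end{aligned}
\tag{4.1}
$$

with the following priors,

$$
\begin{aligned}
\beta_{ik} &\sim N(\theta_k, 1) \\
\varepsilon_{ij} &\sim N(0, \sigma_i^2) \\
\theta_k &\sim N(0, 0.1) \\
\sigma_i^{-2} &\sim \mathrm{Gamma}(0.1, 0.1)
\end{aligned}
$$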

Equation (4.1) specifies the full model including the interaction term (XAXS)

and includes allowance for over-dispersion (εij). Our main interest is in θk repre-

senting the effects of the covariates smoking, asbestos exposure and their interaction,

over all the studies. Using (4.1) the exponentiated form of βAS can be expressed

as RRAS/(RRA ∗ RRS), and hence measures the additional risk arising from the

combined exposure.
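
The interpretation of the exponentiated interaction coefficient can be checked for a single study with a saturated, non-hierarchical version of model (4.1). The sketch below is an illustration only: the cell counts are hypothetical (not taken from any study in Table 4.1), the over-dispersion term and the priors are ignored, and odds ratios stand in for relative risks as is usual for case-control data.

```python
import math

# Hypothetical (cases, controls) by exposure category; not data from any study.
counts = {
    "00": (10, 200),  # neither exposure
    "A": (6, 60),     # asbestos only
    "S": (90, 360),   # smoking only
    "AS": (80, 120),  # both exposures
}


def log_odds(cases, controls):
    """Observed log-odds of being a case in one exposure category."""
    return math.log(cases / controls)


# In the saturated model each coefficient is a contrast of observed log-odds.
b0 = log_odds(*counts["00"])
b_a = log_odds(*counts["A"]) - b0
b_s = log_odds(*counts["S"]) - b0
b_as = log_odds(*counts["AS"]) - b0 - b_a - b_s

or_a = math.exp(b_a)                            # odds ratio, asbestos alone
or_s = math.exp(b_s)                            # odds ratio, smoking alone
or_as = math.exp(log_odds(*counts["AS"]) - b0)  # odds ratio, combined exposure
interaction = math.exp(b_as)                    # equals or_as / (or_a * or_s)

print(f"OR_A={or_a:.2f} OR_S={or_s:.2f} OR_AS={or_as:.2f} exp(b_AS)={interaction:.2f}")
```

A value of exp(b_AS) above 1 indicates a combined effect larger than the product of the single-exposure effects, and a value below 1 a combined effect smaller than that product.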


For the cohort studies we assume the following hierarchical Poisson model,

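$$
\begin{aligned}
Y_{ij} &\sim \mathrm{Poisson}(\mu_{ij}) \\
\log(\mu_{ij}) &= \log(E_{ij}) + \beta_{i0} + \beta_{iA}X_A + \beta_{iS}X_S + \beta_{iAS}X_AX_S + \varepsilon_{ij}
\end{aligned}
\tag{4.2}
$$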

where βik and εij are as in (4.1), Yij and Eij are the observed and expected number

of cases by exposure type, respectively.

Another common test for interaction is to assess the homogeneity of relative risks

across strata (in this case exposure to asbestos and smoking) using a Breslow-Day

test (Breslow and Day, 1987). Homogeneity of relative risks across strata implies a

multiplicative joint effect of two exposure categories, and hence the test is assessing

evidence for a multiplicative model. We do not pursue this test here.

Relative Risk Models

A multivariate meta-analysis of the observed relative risks for each study was first

undertaken to utilise information from both within and across studies. The output

of this analysis was then used as input for three relative risk models.

Here Yi is a vector (log(RRAS),log(RRA),log(RRS)) observed for each study i,

i = 1, ..., k. The multivariate normal distribution is denoted as MVN.

For the multivariate analysis, we make the following distributional assumptions,

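$$
\begin{aligned}
Y_i &\sim \mathrm{MVN}(\theta_i, C_i) \\
\theta_i &\sim \mathrm{MVN}(\mu, \Sigma) \\
C_i^{-1} &\sim \mathrm{Wishart}(R, 3)
\end{aligned}
$$

and

$$
\begin{aligned}
\mu &\sim \mathrm{MVN}(0, D) \\
\Sigma^{-1} &\sim \mathrm{Wishart}(V, 3)
\end{aligned}
$$

where $\theta_i$ and $\mu$ are the study-specific and overall posterior estimates, respectively.

$C_i^{-1}$ and $\Sigma^{-1}$ are precision matrices, and R and V are scale matrices for the prior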


variance-covariance matrices. R consists of the observed variance-covariance matrix

for yi, and V is taken to be a diagonal matrix suggesting a priori independence be-

tween the risk factors and little a priori information about the size of the variances.

The prior distribution for µ is taken to be very uninformative, with the diagonal

variance-covariance matrix D comprising elements tending to infinity. The covari-

ances of observed relative risk estimates for case-control studies were estimated via

logistic regression, and for cohort studies from a Poisson model (Breslow and Day,

1987).

This multivariate model was run using the software package WinBUGS (Spiegel-

halter et al., 2002). For each analysis, estimates were based on 30,000 iterations,

after a burn-in of 20,000 iterations. Convergence was assessed by examining Monte

Carlo error estimates and the Gelman-Rubin statistic (Brooks and Gelman, 1998a). The

results of this analysis are provided in Table 4.3. A sample of five hundred (500)

posterior values of θi and µ was then used as input into the relative risk models

described below. A sample of 500 was chosen as a compromise between adequate

representation of the (well-behaved) parameter and computation time.
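
For readers unfamiliar with the Gelman-Rubin diagnostic mentioned above, the following is a minimal sketch of the basic potential scale reduction factor for one scalar parameter. It is not the WinBUGS implementation and omits the refinements of Brooks and Gelman; the simulated chains are purely illustrative stand-ins for MCMC output.

```python
import random
import statistics


def gelman_rubin(chains):
    """Basic potential scale reduction factor for equal-length chains of one parameter."""
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    within = statistics.fmean([statistics.variance(c) for c in chains])  # W
    between = n * statistics.variance(chain_means)                       # B
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5


rng = random.Random(1)
# Three chains sampling the same distribution: the factor should be close to 1.
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(3)]
# Three chains stuck around different values: the factor should be well above 1.
stuck = [[rng.gauss(shift, 1.0) for _ in range(1000)] for shift in (0.0, 2.0, 4.0)]

print(gelman_rubin(mixed), gelman_rubin(stuck))
```

Values near 1 are consistent with the chains having mixed; values clearly above 1 suggest that between-chain variation is still large relative to within-chain variation.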

Several relative risk models have been proposed to assess interaction between

risk factors (Lubin and Gaffey, 1988). A recent review by Roy and Esteve (1998)

highlighted three main relative risk models used in cancer epidemiology, including

those proposed by Thomas (1981), Breslow and Storer (1985), and Guerrero and

Johnson (1982).

We first explored a Box-Cox type transformation applied to the relative risks

developed by Guerrero and Johnson (1982). The attractiveness of this model is that

it is invariant within the family of linear transformations of scale of the covariates,

and as a direct application of the Box-Cox method appears clearly to be searching

for the best link function within a given family (Roy and Esteve, 1998).

Guerrero and Johnson (1982) assumed that a power transformation of the odds

ratio satisfied a linear model,

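$$
\left(\frac{\pi_{ij}}{1-\pi_{ij}}\right)^{(\gamma)} = \beta_0 + \beta_A X_A + \beta_S X_S \tag{4.3}
$$

and

$$
\left(\frac{\pi_{ij}}{1-\pi_{ij}}\right)^{(\gamma)} =
\begin{cases}
\dfrac{\left(\frac{\pi_{ij}}{1-\pi_{ij}}\right)^{\gamma} - 1}{\gamma} & (\gamma \neq 0) \\[1ex]
\log\left(\dfrac{\pi_{ij}}{1-\pi_{ij}}\right) & (\gamma = 0)
\end{cases}
\tag{4.4}
$$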

Although Equation (4.3) includes the logistic model as a special case, we can also

estimate the parameters β in a Poisson regression model (see Equation (4.2)).

Expressing (4.4) in relative risk terms using the sample posterior estimates (study

level),

$$
\theta_{iAS}^{\gamma_i} = \theta_{iA}^{\gamma_i} + \theta_{iS}^{\gamma_i} - 1 \tag{4.5}
$$

From equation (4.3), a multiplicative model is indicated by γ=0, whereas an ad-

ditive model is preferred if γ=1. We use equation (4.5) and estimate γ using Markov

Chain Monte Carlo (MCMC). As outlined earlier, our approach is to estimate γ us-

ing a sample of the posterior estimates from the multivariate analysis, both at the

study level (θi) and overall (µ).

For equation (4.3) our prior for γ is γi ∼ N(0.5, 1), where γi ∈ (−0.2, 3). This re-

striction on the prior distribution for γ was imposed to allow only feasible estimates,

and did not adversely affect the results.
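
To make the role of γ in equation (4.5) concrete, the sketch below finds the value of γ that satisfies the equation exactly for one fixed set of relative risks, searching over the same interval (−0.2, 3) as the prior. This is a plug-in root-finding illustration, not the MCMC estimation used in the thesis: it ignores the prior and all posterior uncertainty. The inputs are the overall posterior means reported in Table 3.9, and the function names are illustrative.

```python
import math


def box_cox(x, g):
    """Box-Cox transform (x**g - 1) / g, with the log limit at g = 0."""
    if g == 0.0:
        return math.log(x)
    return math.expm1(g * math.log(x)) / g


def gj_gap(g, th_a, th_s, th_as):
    """Zero exactly when theta_AS**g = theta_A**g + theta_S**g - 1 (equation 4.5)."""
    return box_cox(th_as, g) - box_cox(th_a, g) - box_cox(th_s, g)


def solve_gamma(th_a, th_s, th_as, lo=-0.2, hi=3.0, tol=1e-10):
    """Bisection for the root of gj_gap on (lo, hi)."""
    f_lo = gj_gap(lo, th_a, th_s, th_as)
    f_hi = gj_gap(hi, th_a, th_s, th_as)
    if f_lo * f_hi > 0:
        raise ValueError("no sign change on the search interval")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = gj_gap(mid, th_a, th_s, th_as)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


# Overall posterior means for theta_A, theta_S, theta_AS (Table 3.9, Main analysis).
gamma = solve_gamma(3.13, 5.51, 13.69)
print(f"gamma = {gamma:.3f}, 1 - gamma = {1 - gamma:.3f}")
```

With these inputs the root lies between 0.1 and 0.2, that is, nearer the multiplicative end (γ = 0) than the additive end (γ = 1) of the scale.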

As an alternative to equation (4.5), a general class of relative risk model proposed

by Lubin and Gaffey (1988) is

$$
\theta_{iAS} = (\theta_{iA}\theta_{iS})^{\gamma_i}\,(\theta_{iA} + \theta_{iS} - 1)^{1-\gamma_i} \tag{4.6}
$$

where γ = 1 or γ = 0 imply a multiplicative or additive relation, respectively.

This representation more easily allows different functional forms for dose-response

relationships in the presence of continuous covariates (Lubin and Gaffey, 1988). In

the present analysis the covariates are categorical and there are only two outcomes

(exposed or not exposed), so we do not exploit this flexibility but rather use this

model as a viable functional form in which to express an additive or multiplicative

relation and to compare results. The prior for γ is γi ∼ N(0, 1).


An alternative to equation (4.6) was also considered. Here,

$$
\theta_{iAS} = \theta_{iA} + \theta_{iS} - 1 + \gamma_i\,\theta_{iA}\theta_{iS} \tag{4.7}
$$

where γ assesses the degree of departure from additivity (γ = 0 implying an additive

relation, γ > 0 more than additive).
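
For a single fixed set of relative risks, equations (4.6) and (4.7) can each be solved for γ in closed form, which gives a feel for the scale of the parameter in each model. The sketch below does this for the overall posterior means reported in Table 3.9. It is a plug-in calculation only, with illustrative function names, and it ignores the priors and the posterior uncertainty that the MCMC analysis accounts for.

```python
import math


def lubin_gaffey_gamma(th_a, th_s, th_as):
    """gamma solving equation (4.6): 1 = multiplicative, 0 = additive."""
    log_add = math.log(th_a + th_s - 1.0)
    log_mult = math.log(th_a * th_s)
    return (math.log(th_as) - log_add) / (log_mult - log_add)


def additive_departure_gamma(th_a, th_s, th_as):
    """gamma solving equation (4.7): 0 = additive, > 0 = more than additive."""
    return (th_as - (th_a + th_s - 1.0)) / (th_a * th_s)


# Overall posterior means for theta_A, theta_S, theta_AS (Table 3.9, Main analysis).
th_a, th_s, th_as = 3.13, 5.51, 13.69

print(f"equation (4.6): gamma = {lubin_gaffey_gamma(th_a, th_s, th_as):.2f}")
print(f"equation (4.7): gamma = {additive_departure_gamma(th_a, th_s, th_as):.2f}")
```

The plug-in values are about 0.72 for equation (4.6) and about 0.35 for equation (4.7), so under both parameterisations the combined relative risk sits between the additive and multiplicative predictions.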

The three relative risk models, equations (4.5), (4.6) and (4.7), were run using the

software package WinBUGS (Spiegelhalter et al., 2002) using three chains each with

6,500 iterations, after a burn-in of 3,000 iterations. This run length was sufficient

to satisfy convergence diagnostics including Monte Carlo error estimates and the

Gelman-Rubin statistic (Brooks and Gelman, 1998a). Although these three relative

risk models are conceptually straightforward, the estimate for γ from both equations

(4.5) and (4.6) is difficult to interpret for two reasons. First, without information

about the distribution of γ, it is difficult to identify a threshold (e.g. 0.5) for model

preference. We discuss this issue later in relation to the results of the mixture model.

Second, the interpretation of γ other than a value of 0 (multiplicative) or 1 (additive)

is difficult as effect parameters from equation (4.5) are measured as differences in

the value of $(\theta_{AS}^{\gamma} - 1)/\gamma$. In light of these issues, the results using equations (4.5)

and (4.6) are best viewed as indicative only of the strength of evidence for either

relation.

Mixture Model

An alternative to the relative risk models described above is a mixture model. The

use of mixture distributions comprising a finite or infinite number of components,

possibly of different distributional types, to describe different features of data has

attracted a great deal of recent research interest (Marin et al., 2005; McLachlan and

Peel, 2000b).

Here we specify that values of θiAS are drawn from a Gaussian distribution with

a mean corresponding to either an additive (θiA +θiS−1) or a multiplicative relation

(θiAθiS). This choice is identified in the variable µ below, which can be drawn from

either µ1j or µ2j for an additive or multiplicative relation respectively. Formally, for


each study i,

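$$
\begin{aligned}
\log(\theta_{ASj}) &\sim N(\mu_{(1,2)j}, \tau_{(T_j)}) \\
\mu_{1j} &= \log(\theta_{Aj} + \theta_{Sj} - 1) \\
\mu_{2j} &= \log(\theta_{Aj}\theta_{Sj}) \\
T_j &\sim \mathrm{Binomial}(P_{1,2})
\end{aligned}
$$

and

$$
\begin{aligned}
P_{1,2} &\sim \mathrm{Dirichlet}(\alpha) \\
\tau_1 &\sim \mathrm{Gamma}(0.01, 0.01) \\
\tau_2 &\sim \mathrm{Gamma}(0.01, 0.01)
\end{aligned}
$$

where $\mu_{(1,2)j}$ and $\tau_{(T_j)}$ are mean and precision ($1/\sigma^2$) variables for the jth observation, respectively. The parameter α reflects any prior information we may have in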

the way in which θASj is allocated between the two components (additive or multi-

plicative relation), and for this analysis we have made it uninformative (α = (1, 1)).
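
The allocation idea behind the mixture model can be illustrated for one observation by holding the component weights and precisions fixed and computing the probability that log(θAS) was generated by the additive component. The sketch below is a simplification under stated assumptions: in the model above the weights and precisions have priors and are estimated by MCMC, whereas here the equal weights and the precision of 10 are arbitrary illustrative values, and the function names are not from the thesis.

```python
import math


def normal_pdf(x, mean, precision):
    """Density of a normal distribution parameterised by mean and precision (1 / variance)."""
    return math.sqrt(precision / (2.0 * math.pi)) * math.exp(-0.5 * precision * (x - mean) ** 2)


def prob_additive(th_a, th_s, th_as, p_add=0.5, tau_add=10.0, tau_mult=10.0):
    """Probability that log(theta_AS) belongs to the additive component, given fixed
    component weight p_add and fixed precisions (illustrative values, not estimates)."""
    y = math.log(th_as)
    mu_add = math.log(th_a + th_s - 1.0)  # mean under an additive relation
    mu_mult = math.log(th_a * th_s)       # mean under a multiplicative relation
    w_add = p_add * normal_pdf(y, mu_add, tau_add)
    w_mult = (1.0 - p_add) * normal_pdf(y, mu_mult, tau_mult)
    return w_add / (w_add + w_mult)


# Overall posterior means for theta_A, theta_S, theta_AS (Table 3.9, Main analysis).
print(prob_additive(3.13, 5.51, 13.69))
```

Because 13.69 lies closer on the log scale to the multiplicative prediction (3.13 × 5.51) than to the additive prediction (3.13 + 5.51 − 1), the additive component receives the smaller share of the probability under these illustrative settings.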

4.5 Results

The meta-analyses using the logistic and Poisson regression models (equations (4.1)

& (4.2)) were based on studies for which all information to calculate relative risk

estimates was provided by the study, thus excluding five cohort studies which relied

on external rates (studies 13,14,15,16 and 18). Table 4.2 provides the results of

these meta-analyses. The reported coefficients (βik and θk) have been exponentiated

in order to interpret their meaning on the relative risk scale.

The results in Table 4.2 confirm mixed evidence in support of the simple mul-

tiplicative model, overall and by study type, with the 95% credible interval for the

relative risk from exposure to asbestos (exp(θA)) excluding 1 for a model without al-

lowance for over dispersion (1.04, 3.22) and including 1 for the model with allowance

for over dispersion (0.86, 3.12). Eleven out of the 13 studies had 95% credible in-


Table 4.2: Results of logistic and Poisson regression models

Estimates are posterior means with 95% credible intervals in parentheses. Columns marked "without" and "with" refer to the models without and with the over-dispersion parameter, respectively.

| Study | θ0 (without) | θ1 (without) | θ2 (without) | θ̂12 (without) | θ0 (with) | θ1 (with) | θ2 (with) | θ̂12 (with) |
|---|---|---|---|---|---|---|---|---|
| Multiplicative model | | | | | | | | |
| Overall | 0.09 (0.05, 0.16) | 1.84 (1.04, 3.22) | 4.53 (2.56, 8.01) | na | 0.16 (0.08, 0.30) | 1.64 (0.86, 3.12) | 4.69 (2.49, 8.95) | na |
| Case Control | 0.14 (0.07, 0.28) | 1.92 (0.98, 3.74) | 4.62 (2.33, 9.10) | na | 0.15 (0.07, 0.33) | 1.83 (0.86, 3.90) | 4.75 (2.24, 10.06) | na |
| Cohort | 0.03 (0.01, 0.09) | 1.67 (0.62, 4.45) | 4.24 (1.50, 12.11) | na | 0.20 (0.05, 0.74) | 1.18 (0.33, 4.27) | 4.14 (1.13, 15.33) | na |
| Multiplicative model with interaction term | | | | | | | | |
| Overall | 0.09 (0.05, 0.16) | 1.76 (0.95, 3.26) | 4.36 (2.45, 7.83) | 1.11 (0.58, 2.11) | 0.16 (0.08, 0.31) | 1.65 (0.79, 3.42) | 4.63 (2.30, 9.36) | 1.00 (0.43, 2.28) |
| Case Control | 0.15 (0.07, 0.29) | 1.65 (0.78, 3.55) | 4.39 (2.20, 8.76) | 1.24 (0.57, 2.70) | 0.16 (0.07, 0.35) | 1.67 (0.71, 3.97) | 4.45 (1.99, 9.94) | 1.16 (0.46, 2.89) |
| Cohort | 0.03 (0.01, 0.09) | 2.19 (0.69, 6.69) | 4.77 (1.57, 14.56) | 0.79 (0.25, 2.55) | 0.17 (0.04, 0.70) | 1.59 (0.32, 6.94) | 4.85 (1.07, 21.71) | 0.67 (0.13, 3.79) |


tervals for exp(βA) excluding 1, thus supporting a simple multiplicative model. For

both models (over-dispersed or otherwise) exposure to smoking appears to dominate

the simple multiplicative model with credible intervals at all levels excluding 1 for

studies overall and by type of study.

Overall, the coefficient for the interaction term (exp(θAS)) in the multiplicative

model has a 95% credible interval in the over-dispersed model of (0.43, 2.28). Inter-

estingly, the posterior mean of exp(θAS) is greater than 1 for case-control studies,

and less than 1 for cohort studies which is in agreement with other evidence that

there is greater support for the multiplicative hypothesis among case-control studies

than cohort studies (Liddell, 2001). Figure 4.1 shows the estimate for θAS in the

over-dispersed model in relation to the study estimates (βAS).

Table 4.3 provides the results of the multivariate analysis. The study-specific

posterior estimates of the relative risk of exposure to smoking alone (θS) range


Figure 4.1: Boxplots of β12 (log scale) by study (horizontal axis and study numbers ordered left to right) and overall (over-dispersed model)

from 4.07 to 8.13. We find an overall posterior estimate for θS of 5.51 (3.78,7.89).

For the relative risk of exposure to asbestos alone (θA), the posterior estimates

range from 1.77 to 6.92. Overall, the posterior estimate for θA is 3.13 (1.80,5.41).

For the combined exposure of asbestos and smoking, the posterior estimates range

considerably from 5.50 to 50.86. Overall, the posterior estimate for θAS is 13.69

(8.20,22.76). As outlined earlier, we used the posterior estimates from this analysis

as input for the relative risk models.

Table 4.4 provides the results of the relative risk models and mixture model anal-

ysis. Rgj refers to the relative risk model by Guerrero & Johnson (Equation (4.5)),

Rlg to the Lubin & Gaffey model (Equation (4.6)), and Ra (Equation (4.7)) which

is assessing the degree of departure from an additive model. For ease of compar-

ison with the other results γ∗ for Rgj is expressed as 1 − γ. Overall, we find the

posterior mean estimate of γ∗ is 0.84 (0.82, 0.86) for Rgj, 0.69 (0.66, 0.71) for Rlg,

and 0.35 (0.33, 0.36) for Ra. These results fail to provide unequivocal support for

either a multiplicative model (implied by γ = 1) or an additive model (γ = 0). The

study-specific estimates (γi) appear to be generally higher for Rgj compared to Rlg.

For the mixture model, the overall posterior probability of an additive model is

0.06 (0.00, 0.12). This result is based on the overall relative risk estimates found

in Table 4.3. However, the study-specific estimates range from a probability of 0 to

0.98. For more than half the studies (11) the posterior probability of an additive

relation is greater than 0.5, and the lower bounds of the 95% credible intervals from

nine studies are greater than 0.5. Thus the study-specific results provide much more

variable support for a clear conclusion of additivity or multiplicativity.

The results from the relative risk and mixture models can be compared with

the analyses of S and V from the same data (Table 4.5). The estimates of S and

V, overall, are 1.94 (1.29, 2.84) and 0.83 (0.46,1.40), respectively, indicating a more

than additive and less than multiplicative relationship. Comparison of these results

with the relative risk and mixture models (Table 4.4) reveals a fairly close inverse

relationship between estimates of V or γ and P (additive). Figure 4.2 shows a star

plot, by study and overall, representing the results from Rlg, S, V and the mixture

model. Studies with strong evidence for a multiplicative relationship have relatively

bigger stars, indicating large values for S, V, γ and P(M).

Table 4.3: Relative Risk Estimates, Observed and Posterior

         Observed RR Estimates*                             Posterior MV Estimates**
Study    RRS              RRA              RRAS             θS               θA               θAS
1        3.44             2.24             9.57             4.73             2.88             12.30
         (0.74, 16.01)    (0.41, 12.28)    (2.25, 40.65)    (2.16, 10.74)    (1.11, 7.81)     (5.19, 31.47)
2        1.78             1.08             5.57             3.43             2.29             9.56
         (0.75, 4.20)     (0.19, 6.05)     (2.04, 15.18)    (1.54, 8.79)     (0.72, 8.05)     (3.61, 31.66)
3        5.47             2.82             9.86             5.57             2.78             10.75
         (0.40, 74.20)    (0.04, 188.22)   (0.69, 140.09)   (2.34, 13.24)    (0.82, 9.16)     (3.82, 30.30)
4        6.93             2.21             15.50            5.89             2.87             13.45
         (0.30, 159.08)   (0.02, 206.42)   (0.63, 380.37)   (2.38, 14.61)    (0.82, 9.87)     (4.47, 40.69)
5        10.13            1.83             15.89            6.80             2.25             11.00
         (1.13, 91.06)    (0.10, 33.82)    (1.77, 142.80)   (2.70, 16.56)    (0.69, 7.20)     (4.16, 27.72)
6        5.41             2.41             19.86            5.51             3.03             17.37
         (2.09, 13.99)    (0.46, 12.50)    (5.57, 70.78)    (2.90, 10.50)    (1.16, 8.35)     (6.47, 43.08)
7        4.71             1.28             7.58             5.20             2.02             8.80
         (2.27, 9.77)     (0.27, 6.01)     (3.31, 17.35)    (2.87, 9.87)     (0.76, 5.91)     (4.28, 20.88)
8        3.09             1.88             4.87             4.07             2.29             6.99
         (1.43, 6.70)     (0.64, 5.50)     (2.04, 11.58)    (2.16, 8.93)     (0.96, 6.23)     (3.16, 21.20)
9        6.01             1.80             7.79             5.94             2.25             8.39
         (1.45, 24.92)    (0.14, 22.85)    (1.77, 34.18)    (2.71, 12.53)    (0.74, 6.84)     (3.56, 21.18)
10       4.46             1.65             4.51             5.08             1.92             5.50
         (2.34, 8.48)     (0.70, 3.88)     (2.38, 8.57)     (2.81, 10.25)    (0.87, 4.88)     (2.93, 13.72)
11       1.83             3.78             11.06            3.74             4.70             15.93
         (0.58, 5.76)     (1.25, 11.37)    (3.87, 31.62)    (1.50, 10.00)    (1.61, 12.94)    (5.94, 44.48)
12       6.27             0.83             6.16             6.27             1.77             7.22
         (0.82, 48.25)    (0.05, 13.22)    (0.85, 44.78)    (2.47, 15.17)    (0.49, 6.36)     (2.67, 20.53)
13       7.13             8.47             73.73            6.25             6.53             50.86
         (4.20, 12.11)    (1.92, 37.25)    (40.47, 134.33)  (2.97, 11.15)    (1.82, 20.45)    (12.28, 104.27)
14       8.67             25.00            40.63            6.39             6.92             26.71
         (5.11, 14.71)    (9.00, 69.41)    (22.30, 74.01)   (2.80, 12.76)    (1.65, 26.60)    (7.80, 64.91)
15       10.85            5.17             53.24            8.13             4.62             36.16
         (6.39, 18.41)    (2.17, 12.32)    (31.11, 91.12)   (3.51, 14.44)    (1.67, 11.48)    (10.94, 72.31)
16       7.13             7.27             17.25            6.37             4.56             15.83
         (4.20, 12.11)    (2.39, 22.09)    (9.75, 30.52)    (3.46, 10.68)    (1.61, 10.99)    (7.85, 29.22)
17       4.94             2.98             8.21             5.42             2.93             9.62
         (0.14, 172.43)   (0.07, 127.27)   (0.24, 279.28)   (2.15, 13.41)    (0.95, 8.80)     (3.40, 27.36)
18       7.13             5.00             52.56            6.46             5.10             37.08
         (4.20, 12.11)    (0.66, 38.02)    (25.06, 110.25)  (3.42, 10.80)    (1.59, 15.99)    (10.90, 76.17)
Overall                                                     5.51             3.13             13.69
                                                            (3.78, 7.89)     (1.80, 5.41)     (8.20, 22.76)

Note: Estimates quoted are the mean estimate, below which is the * 95 per cent confidence interval or ** 95 per cent credible interval.

Table 4.4: Results of relative risk models and mixture model

           Relative risk models                                 Mixture Model
Study No.  Rgj               Rlg               Ra
           γ∗                γ                 γ                 P(Additive)
1          0.83              0.67              0.38              0.24
           (0.80, 0.86)      (0.62, 0.72)      (0.35, 0.41)      (0.13, 0.36)
2          0.89              0.79              0.57              0.12
           (0.84, 0.94)      (0.70, 0.89)      (0.52, 0.62)      (0.03, 0.23)
3          0.54              0.30              0.16              0.85
           (0.49, 0.59)      (0.26, 0.35)      (0.13, 0.18)      (0.76, 0.94)
4          0.71              0.48              0.27              0.53
           (0.67, 0.75)      (0.43, 0.53)      (0.24, 0.30)      (0.42, 0.64)
5          0.43              0.20              0.11              0.98
           (0.37, 0.48)      (0.16, 0.24)      (0.10, 0.13)      (0.93, 1.00)
6          0.93              0.85              0.52              0.08
           (0.90, 0.95)      (0.80, 0.91)      (0.48, 0.56)      (0.04, 0.14)
7          0.54              0.31              0.17              0.71
           (0.48, 0.59)      (0.26, 0.36)      (0.15, 0.19)      (0.60, 0.81)
8          0.54              0.33              0.16              0.75
           (0.48, 0.60)      (0.27, 0.38)      (0.13, 0.18)      (0.65, 0.84)
9          0.25              0.11              0.06              0.93
           (0.18, 0.32)      (0.07, 0.14)      (0.04, 0.07)      (0.87, 0.98)
10         -0.50             -0.14             -0.04             0.98
           (-0.68, -0.34)    (-0.18, -0.11)    (-0.05, -0.03)    (0.96, 0.99)
11         0.85              0.70              0.40              0.12
           (0.83, 0.87)      (0.66, 0.74)      (0.38, 0.43)      (0.06, 0.19)
12         -0.14             -0.05             -0.01             0.98
           (-0.28, -0.01)    (-0.09, -0.01)    (-0.02, 0.01)     (0.95, 0.99)
13         1.02              1.10              0.94              0.00
           (1.00, 1.03)      (1.05, 1.14)      (0.88, 0.10)      (0.00, 0.01)
14         0.75              0.50              0.26              0.56
           (0.73, 0.78)      (0.46, 0.53)      (0.24, 0.29)      (0.47, 0.66)
15         0.95              0.88              0.62              0.04
           (0.93, 0.96)      (0.84, 0.92)      (0.58, 0.65)      (0.00, 0.09)
16         0.59              0.34              0.16              0.80
           (0.56, 0.62)      (0.31, 0.37)      (0.15, 0.18)      (0.70, 0.89)
17         0.46              0.23              0.11              0.89
           (0.41, 0.51)      (0.19, 0.27)      (0.09, 0.12)      (0.82, 0.94)
18         0.98              0.99              0.73              0.00
           (0.97, 1.00)      (0.94, 1.03)      (0.69, 0.78)      (0.00, 0.01)
Overall    0.84              0.69              0.35              0.06
           (0.82, 0.86)      (0.66, 0.71)      (0.33, 0.36)      (0.00, 0.12)
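As a rough check on how S and V relate to the relative risk estimates, the overall posterior means from Table 4.3 can be plugged into the standard definitions of the Synergy Index and the Multiplicativity Index. The definitions below are the usual ones and are assumed here to match those used earlier in the thesis; the arithmetic is only a plug-in illustration, not the posterior calculation itself.

```latex
\begin{align*}
S &= \frac{\theta_{AS} - 1}{(\theta_A - 1) + (\theta_S - 1)}
   = \frac{13.69 - 1}{(3.13 - 1) + (5.51 - 1)}
   = \frac{12.69}{6.64} \approx 1.91, \\
V &= \frac{\theta_{AS}}{\theta_A \, \theta_S}
   = \frac{13.69}{3.13 \times 5.51}
   = \frac{13.69}{17.25} \approx 0.79 .
\end{align*}
```

These plug-in values are close to, but not identical with, the overall posterior means of 1.94 and 0.83 reported in Table 4.5, which is expected because the posterior mean of a ratio is generally not the ratio of the posterior means. Both point in the same direction: S > 1 (more than additive) and V < 1 (less than multiplicative).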

Figure 4.2: Starplots by study (1-18) and Overall. S is the Synergy Index, V the Multiplicativity Index, PM the probability of a multiplicative relation, and gamma is the power transformation estimate from Rlg (gamma=0 (additive), gamma=1 (multiplicative))

Sensitivity Analysis

Table 4.6 provides the results of the sensitivity analysis for both the relative risk

and mixture models. Estimates of γ from the relative risk models are significantly

lower for cohort studies than case-control studies, indicating less evidence of a sim-

ple multiplicative relationship for cohort studies. The mixture model analysis also

appears to show a clearer difference between type of study. Given a choice of ei-

ther an additive or multiplicative model, the probability of an additive model for

case-control studies is 0.02 (0.00, 0.10), and for cohort studies is 0.52 (0.42, 0.62).

This difference in results by type of study is in agreement with previous evidence

(Liddell, 2002) and analysis using S and V.

Table 4.5: Results of Synergy Index (S) and Multiplicativity Index (V)

Study No.  S               V               P(S > 1)   P(V < 1)
1          2.16            1.04            0.97       0.58
           (0.98, 4.23)    (0.31, 2.51)
2          3.05            1.51            0.94       0.37
           (0.75, 7.51)    (0.31, 4.19)
3          1.62            0.85            0.87       0.73
           (0.69, 3.44)    (0.20, 2.46)
4          1.94            0.98            0.93       0.65
           (0.77, 4.12)    (0.23, 2.82)
5          1.50            0.86            0.91       0.72
           (0.80, 2.45)    (0.22, 2.37)
6          2.67            1.21            0.97       0.45
           (0.98, 5.34)    (0.33, 2.99)
7          1.56            0.96            0.90       0.63
           (0.76, 2.83)    (0.28, 2.28)
8          1.52            0.85            0.81       0.74
           (0.62, 3.27)    (0.26, 2.05)
9          1.25            0.74            0.71       0.80
           (0.62, 2.37)    (0.20, 1.99)
10         0.94            0.62            0.25       0.92
           (0.56, 1.70)    (0.23, 1.36)
11         2.42            1.04            0.98       0.56
           (1.10, 4.50)    (0.30, 2.45)
12         1.03            0.80            0.47       0.77
           (0.51, 2.28)    (0.19, 2.33)
13         4.91            1.49            0.99       0.33
           (1.55, 9.24)    (0.36, 3.97)
14         2.43            0.78            0.97       0.76
           (0.93, 5.40)    (0.16, 2.46)
15         3.38            1.09            0.99       0.53
           (1.44, 5.54)    (0.34, 2.59)
16         1.71            0.63            0.96       0.88
           (0.94, 3.10)    (0.22, 1.64)
17         1.45            0.72            0.84       0.82
           (0.71, 2.90)    (0.20, 1.90)
18         3.98            1.34            0.99       0.41
           (1.34, 7.52)    (0.33, 3.51)
Overall    1.94            0.83            1.00       0.79
           (1.29, 2.84)    (0.46, 1.40)

Table 4.6: Sensitivity Analysis

                          Relative risk models                          Mixture Model
Sensitivity analysis      Rgj             Rlg             Ra
                          γ∗              γ               γ               P(Additive)
Type of Study   CC        0.88            0.79            0.41            0.02
                          (0.84, 0.92)    (0.71, 0.87)    (0.38, 0.44)    (0.00, 0.10)
                PP        0.74            0.50            0.25            0.52
                          (0.72, 0.76)    (0.47, 0.52)    (0.23, 0.27)    (0.42, 0.62)


4.6 Discussion

We reviewed the literature for quantitative information on the combined effect of

exposure to asbestos and smoking on lung cancer, and explored a Bayesian approach

to assess evidence of interaction. The overall conclusion is that the relation is more

than additive and less than multiplicative, a result consistent with recent reviews of

the literature.

While a conceptual basis for assessing interaction (i.e. evidence for one relation

against another) is well known (Greenland and Rothman, 1998), in general, tests

for interaction and the interpretation of results remain topics of some debate (UNSCEAR,

1982). Incorrect approaches to test for interaction appear frequently in the

literature (Hallqvist et al., 1996). Much of the difficulty with interpretation of results

is found in the ambiguity between the meaning of interaction from a statistical and

biological perspective. A further difficulty is that, as many studies are underpowered

to assess interaction, assessments of strength of interaction rather than statistical

significance may be important (Saracci and Boffetta, 1994).

Hallqvist et al. (1996) describe some of the problems with approaches which have

been used in the literature. For example, an inappropriate approach is to compare a

higher cumulative incidence of a joint exposure to that observed for either risk factor

separately and infer that one risk factor is exacerbating the effect of the other, since

the relationship of each risk factor to a joint exposure may be less than additive.

Another common approach to assessing interaction is to include a product term in a

logistic or log-linear regression. As both types of regressions assume a multiplicative

form, including an interaction term assesses departure from a simple multiplicative

model but provides no information in support of an additive relation.

As outlined in the introduction, our aim in this chapter was to allow joint infer-

ences to be made about the strength of evidence for an additive or multiplicative

relation. Although the power transformation estimate from the relative risk model

provides an appropriate link function there is uncertainty about the interpretation

of evidence for one relation over another. The mixture model, on the other hand,

directly provides information about the preference of one relation over another in

the form of a probabilistic statement. This approach could easily be extended to

include alternative relations.

As outlined in Section 4.4 for each model, the prior distributions chosen for

our parameters of interest were, in the absence of available information, relatively

uninformative. Where information is available, alternative assumptions can be made,

but we did not explore them here. Informative priors could be used in cases where

studies are not independent or where expert opinion is available about parameter

values.

The meta-analysis considered here included studies reported in two recent reviews

of the literature by Lee (2001) and Liddell (2001). A search of the MEDLINE

reference database (1998 - May 2006) and cited references for more recent studies

revealed a number of papers with information relating to occupational exposure to

asbestos, smoking habits, and the association of these factors with lung cancer risks.

However, only one study provided quantitative information about relative risks for

each exposure category (Gustavsson et al., 2002). The results from one other study

were based on a cohort previously included (Liddell and Armstrong, 2002), and


for another study little detailed information was provided on the joint exposure to

asbestos and smoking (?). Some studies provided insufficient information on smoking

habits (Rafnsson and Sulem, 2003; Ulvestad et al., 2002; Goldberg, 1999; Stayner

et al., 1997; ?), while other studies were underpowered to assess evidence of a joint

effect (Rosamilia et al., 1999) or were genetically based (Schabath et al., 2002). A

study by Goldberg (1999) reported that “the probability that a cancer is due to

asbestos is the same among smokers and non-smokers”, implying a multiplicative

relation was found, but insufficient quantitative information was provided to allow

its incorporation into the meta-analysis considered here.

The case-control study by Gustavsson et al. (2002) investigated the association

between low-dose exposure to asbestos and lung cancer, and in the analysis of the

combined effect of asbestos and smoking, found more evidence for an additive relation

than for a multiplicative one. Relative risk estimates (with 95% confidence

intervals) were reportedly RRS=21.8 (14.4, 32.8), RRA=4.2 (1.6, 11.1), RRAS=28.6

(19.9, 48.3). Departure from multiplicativity was investigated in the study by includ-

ing an interaction term in a logistic regression (βAS=0.31 (0.11,0.86)), and departure

from additivity was evaluated using the Synergy Index (1.15 (0.77, 1.72)).
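To make the additive and multiplicative benchmarks concrete, the following sketch computes the joint relative risk expected under each model from the Gustavsson et al. (2002) point estimates quoted above, together with the Synergy Index and a multiplicativity ratio. The function names are hypothetical and the formulas are the standard definitions, assumed here to agree with those used in the chapter; only point estimates are used, so no uncertainty is propagated.

```python
# Minimal sketch: additive vs multiplicative benchmarks for a joint exposure.
# Function names are illustrative; formulas are the standard definitions.

def expected_joint_rr(rr_a: float, rr_s: float) -> dict:
    """Joint relative risk expected under each benchmark model."""
    return {
        "additive": rr_a + rr_s - 1.0,   # excess relative risks add
        "multiplicative": rr_a * rr_s,   # relative risks multiply
    }

def synergy_index(rr_a: float, rr_s: float, rr_as: float) -> float:
    """S = (RR_AS - 1) / ((RR_A - 1) + (RR_S - 1)); S = 1 under additivity."""
    return (rr_as - 1.0) / ((rr_a - 1.0) + (rr_s - 1.0))

def multiplicativity_index(rr_a: float, rr_s: float, rr_as: float) -> float:
    """V = RR_AS / (RR_A * RR_S); V = 1 under a simple multiplicative model."""
    return rr_as / (rr_a * rr_s)

# Point estimates quoted in the text for Gustavsson et al. (2002).
rr_s, rr_a, rr_as = 21.8, 4.2, 28.6

expected = expected_joint_rr(rr_a, rr_s)
s = synergy_index(rr_a, rr_s, rr_as)
v = multiplicativity_index(rr_a, rr_s, rr_as)

print(f"expected additive RR:       {expected['additive']:.2f}")        # 25.00
print(f"expected multiplicative RR: {expected['multiplicative']:.2f}")  # 91.56
print(f"observed joint RR:          {rr_as:.2f}")
print(f"S = {s:.2f}, V = {v:.2f}")                                       # S = 1.15, V = 0.31
```

The observed joint relative risk (28.6) sits much closer to the additive benchmark (25.0) than to the multiplicative one (91.56), and the computed S of 1.15 agrees with the Synergy Index reported in the study.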

There are limitations associated with combining studies in the form of a meta-

analysis. Meta-analysis is designed to facilitate combination of results from studies

which are comparable in outcome and exposure. The inclusion of studies is inevitably

a choice that directly impacts on the content and applicability of the results. Here

we have combined studies that vary in factors including, but not limited to, type of study,

country of study, size of study, definitions of non-smokers, exposure times to asbestos

and exposure to different types and sizes of asbestos particles. For the first factor

(type of study), we found much greater evidence for a multiplicative relation using

information from case-control than cohort studies, which is consistent with the S and

V analyses and with the findings of Liddell (2001).

There are also limitations to assessing interaction. First, for studies of exposure

to asbestos and smoking, the small number of lung cancer cases for non-smokers

greatly increases the uncertainty of any estimated association. Second, Greenland

and Rothman (1998) suggest that even large data sets may not provide enough


information to establish relationships among variables while controlling confounding.

In the present case, we have relied on summary statistics from studies, and derived

unadjusted relative risk estimates, which is far from the ideal construction of a model

with complete control of covariates such as the extent and duration of smoking,

exposure to other lung carcinogens, etc. Third, and consistent with a multi-stage

model of carcinogenesis, the form of interaction observed may be influenced by the

length of follow-up time in studies (Archer, 1988).

An assessment of interaction is also a function of dosage levels for each risk factor,

both in the nature of the functional form assumed for dose-response relationships for

each factor and the dosage levels at which they combine. In the case of continuous

covariates, care must be taken to consider the appropriate dose-response relationship

for each factor individually before an assessment of the combined effect. Here we

have used categorical covariates (exposed versus not exposed) on the dosage levels

for each factor, and the dose-response relationship is difficult to explicitly model.

Our main interest is then the extent to which the risk factors combine at this binary

level. Although the definitions of those exposed and not exposed are subject to

cutoff points we should still be able to see evidence for a multiplicative or additive

relation provided the definitions are consistent across studies. However, a limitation

of such a binary classification is that the power to test interactions is essentially

determined by the size of the smallest category, so few lung cancer cases for non-

smokers suggests that an analysis based on a binary classification is likely to be

weaker than one based on continuous data.

Chapter 5

Spatial and temporal modelling of Ross River virus in

Queensland

In this chapter, we examine a mixture model approach to characterise the risk of

Ross River virus (RRv) in Queensland from 1984 to 2001. At the time of analysis

(2005), these were the most recent data available for all of Queensland. The mixture model

approach builds on the approach adopted by Gatton et al. (2004), and considers

that the weekly cases of RRv could be attributed to more than the two hypothesised

periods (outbreak or no outbreak period), and also extends the analysis to compare

the number of periods across non-homogeneous spatial regions of Queensland.

5.1 Introduction

Ross River virus (RRv), also known as Epidemic Polyarthritis, is a debilitating

disease and is the most prevalent vector-borne disease in Australia (Lin et al., 2002).

It was first identified in 1958 from mosquitoes collected at Ross River, Townsville, by

the Queensland Institute of Medical Research and since then has become common

in Queensland. The virus can survive and replicate in humans and other vertebrate

hosts, and is transmitted by a variety of mosquito vectors (Russell and Dwyer,

2000). The disease in humans is nonfatal and infections can be either asymptomatic


or symptomatic, with symptoms including polyarthritis, rash, fever, myalgia, and

lethargy (Harley et al., 2001).

There has been much recent research into the spatial and temporal nature of

Ross River virus in Queensland (Gatton et al., 2004; Kelly-Hope et al., 2004; Tong

and Hu, 2002). A recent paper by Gatton et al. (2004) focussed on the spatial

and temporal nature of outbreak periods, where outbreak periods are defined by

comparison against long term incidence rates specific to that area. The spatial and

temporal nature of outbreak periods is of public health importance as increased

understanding will lead to more targeted public health interventions (Tong, 2004).

In this chapter, we use a Bayesian mixture model to characterise outbreaks in

weekly cases of Ross River virus in Queensland from 1984 to 2001. RRv notification

data was obtained from the Communicable Diseases Section of Queensland Health.

An exploratory analysis revealed an association between climate variables and cases

of RRv, so we aggregated the data to fifteen homogeneous climate zones representing

Queensland.

The mixture model allows us to separate the RRv data over time into a number

of states or components, where the number of components is unknown a priori. This

is an extension of previous work on RRv which has focussed on only two components

or states, a non-outbreak state (background) and an outbreak state, with the latter

state associated with a higher mean value of cases than the former. Evidence for

three components may indicate an additional state to these two, and we could call

this a ‘hyper-outbreak’ state. It is less clear how to interpret data best fitted by a

model with four or more components. The method also provides, for each week in the

data set, the probability of belonging to each component (state), and thereby avoids

possibly subjective decision rules.

The choice between competing models with different numbers of components invariably

involves a selection criterion that takes into account both measures of

fit and complexity. In this chapter we use methodology developed in Celeux et al.

(2003) and choose between competing models based on Deviance Information Cri-

terion (DIC) estimates. The parameters for the different models were estimated

by Markov Chain Monte Carlo (MCMC) using the software package WinBUGS

(Spiegelhalter et al., 2002).

We focussed the analysis on two different climate zones which appeared to display

different temporal behaviour, and found much variability in the results, with a higher

number of components preferred for data from the zone which appeared to show a

more distinctive pattern.

We then fitted a mixture model to each of the remaining zones and compared

the variability in the number of components and associated parameter estimates.

5.2 Method

5.2.1 Data

Ross River virus disease notification data from 1984 to 2001 were obtained from the

Communicable Diseases section of Queensland Health. A notification was reported

if serologic testing indicated a four-fold change in antibody titer between paired

acute and convalescent sera, or if IgM and IgG antibody levels against RR virus

were consistent with acute infection. Each complete notification included place

of residence (location and street/road), date of onset, age and sex of the patient.

Place of residence was further geocoded by the Queensland Department of Local

Government and Planning into Statistical Local Areas (SLA) and later grouped to

Local Government Areas (LGA).

An exploratory analysis of the data at the residence level and recent research

indicate a strong relationship between the incidence of RRv and climate-related

variables such as rainfall, temperature, humidity, Southern Oscillation Index,

and sea levels (McFallan, 2001; Tong and Hu, 2002; Kelly-Hope et al., 2004). On

this basis, we decided to aggregate the data to 15 climate zones as identified by the

Australian Bureau of Meteorology (see Figure 5.1).

A summary of the data for the fifteen climate zones is provided in Table 5.1.

As an example of the variability of the data over time between the different

zones, Figures 5.2 & 5.3 show the weekly number of RRv cases over time for Zones

15 and 5 respectively. The data from Zone 15 appears to show a more distinctive

outbreak pattern than from Zone 5, and thus it is of interest to see how the results

of applying a mixture model to these zones separately may differ.

Figure 5.1: Queensland climate zones - Bureau of Meteorology

Table 5.1: Summary results - all zones

Zone  Min     Q1      Median  Mean    Q3      Max
1     0.00    0.00    0.00    0.08    0.00    2.00
2     0.00    0.00    0.00    0.14    0.00    5.00
3     0.00    0.00    0.00    0.33    0.00    7.00
4     0.00    0.00    2.00    4.17    5.00    42.00
5     0.00    0.00    2.00    5.10    5.00    59.00
6     0.00    0.00    1.00    2.07    2.00    37.00
7     0.00    0.00    1.00    3.21    3.00    48.00
8     0.00    0.00    0.00    0.85    1.00    14.00
9     0.00    0.00    0.00    0.29    0.00    9.00
10    0.00    0.00    0.00    0.42    0.00    20.00
11    0.00    0.00    0.00    0.07    0.00    3.00
12    0.00    0.00    0.00    0.73    1.00    15.00
13    0.00    0.00    1.00    2.07    2.00    59.00
14    0.00    0.00    1.00    2.33    2.00    44.00
15    0.00    2.00    5.00    14.42   11.00   307.00

5.2.2 Mixture models

The use of mixture distributions comprising a finite or infinite number of compo-

nents, possibly of different distributional types, to describe different features of data

has attracted a great deal of recent research interest (Marin et al., 2005; McLachlan

and Peel, 2000a).

The mixture model can be formulated as

$$p(y \mid \theta) = \sum_{j=1}^{k} w_j f(y \mid \theta_j), \qquad \sum_{j=1}^{k} w_j = 1, \quad k > 1 \tag{5.1}$$

where k is the number of components, wj is the probability of being allocated to

component j, and where the allocation of each observation yi to one of the components

is represented by a latent variable zi (zi ∈ N (discrete case))

$$p(z_i = j) = w_j, \qquad Z \sim \text{Multinomial}(1, p_1, \ldots, p_k) \tag{5.2}$$

Figure 5.2: Time plot of weekly cases - Zone 15 (Ross River virus cases against time, 1985 to 2000)

Although the data are correlated, and a mixture model assumes that the data (y)

are i.i.d., we adopt this simplifying assumption for now and consider its

implications in the discussion.
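Equations (5.1) and (5.2) can be sketched numerically as follows. This is a minimal illustration that assumes normal component densities for f and uses made-up parameter values (not estimates from this chapter); the function names are hypothetical. It evaluates the mixture density and the posterior probability that a single observation belongs to each component, which is the quantity used to allocate each week to a state.

```python
import math

def normal_pdf(y: float, mean: float, sd: float) -> float:
    """Density of a normal distribution (assumed form for f in Equation 5.1)."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def mixture_density(y, weights, means, sds):
    """Equation (5.1): p(y | theta) = sum_j w_j f(y | theta_j)."""
    return sum(w * normal_pdf(y, m, s) for w, m, s in zip(weights, means, sds))

def allocation_probs(y, weights, means, sds):
    """p(z = j | y), proportional to w_j f(y | theta_j)."""
    joint = [w * normal_pdf(y, m, s) for w, m, s in zip(weights, means, sds)]
    total = sum(joint)
    return [p / total for p in joint]

# Illustrative values only: a low "background" and a high "outbreak" component.
weights = [0.7, 0.3]
means = [0.5, 3.0]
sds = [0.5, 0.8]

density = mixture_density(2.5, weights, means, sds)
probs = allocation_probs(2.5, weights, means, sds)
print(density, probs)
```

An observation of 2.5 on the log scale is allocated almost entirely to the second component here, because the first component places negligible density that far from its mean.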

Figure 5.3: Time plot of weekly cases - Zone 5 (Ross River virus cases against time, 1985 to 2000)

Choice between competing models with different numbers of components invariably

involves a selection criterion that takes into account both measures of fit and

complexity. For example, a mixture model with a large number of components may

fit the data well, but suffer from a lack of interpretability of the parameters. In this

chapter we use methodology developed in Celeux et al. (2003) and choose between

competing models based on Deviance Information Criterion (DIC) estimates. As the

aim of the analysis is to interpret the components as identifying separate groupings

in the data with different levels, a further criterion for model choice in this context

is also to ensure the means of the components are well separated.

Application to RRv data

We first fitted a Poisson distribution to the RRv data for a number of the zones,

and found that due to the large number of zeros, this distribution does not offer a

good fit (see Figure 5.4). There is a range of methods available to handle the case

of a large number of zeros for count data (see, for example, Dalrymple et al. (2003)).

We chose to let log(yt+1) follow a truncated normal distribution.
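The mismatch between a single Poisson distribution and the large number of zeros can be illustrated with the summary statistics in Table 5.1. The sketch below is only a back-of-envelope check (not the fit reported in the chapter): for zones whose first quartile is zero, at least a quarter of weeks have no cases, whereas a Poisson distribution with the observed mean predicts far fewer zero weeks. The helper name is hypothetical.

```python
import math

def poisson_zero_prob(mean: float) -> float:
    """P(Y = 0) for a Poisson distribution with the given mean."""
    return math.exp(-mean)

# Weekly means from Table 5.1 for zones whose first quartile (Q1) is 0,
# i.e. zones in which at least 25% of weeks recorded zero cases.
zone_means = {4: 4.17, 5: 5.10, 7: 3.21}

expected_zero = {zone: poisson_zero_prob(m) for zone, m in zone_means.items()}
for zone, p0 in expected_zero.items():
    print(f"Zone {zone}: Poisson P(0) = {p0:.4f} versus observed proportion >= 0.25")
```

For these zones the Poisson model implies that fewer than about 5% of weeks would have zero cases, well short of the 25% or more implied by the quartiles.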

For RRv data in each zone we specified,

$$\log(y_t + 1) \sim \text{TN}(\mu_{Z_t}, \tau), \qquad Z_t \sim \text{Multinomial}(P_{1,\ldots,k})$$

and priors

$$\mu_{Z_t} \sim N(1, 0.01), \qquad P_{1,\ldots,k} \sim \text{Dirichlet}(\alpha), \qquad \tau \sim \text{Gamma}(2, 3)$$

where yt are the observed cases, τ is a precision parameter and µ is restricted to

µ1 < µ2 < . . . < µk to prevent label switching.
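The data-generating side of this specification can be sketched by forward simulation. The sketch below is not the WinBUGS model used in the chapter: it fixes the parameters rather than drawing them from the priors, assumes the normal distribution is truncated below at zero (the truncation bounds are not stated in the text), and uses illustrative parameter values and hypothetical function names. Simulated values are continuous on the back-transformed scale rather than integer counts.

```python
import math
import random

def sample_truncated_normal(rng, mean, sd, lower=0.0):
    """Rejection sampler for a normal distribution truncated below at `lower`."""
    while True:
        x = rng.gauss(mean, sd)
        if x >= lower:
            return x

def simulate_weekly_cases(rng, n_weeks, probs, mus, tau):
    """Simulate log(y_t + 1) ~ TN(mu_{Z_t}, tau) with Z_t drawn from `probs`."""
    if list(mus) != sorted(mus):
        raise ValueError("component means must be ordered: mu_1 < mu_2 < ...")
    sd = 1.0 / math.sqrt(tau)  # tau is a precision, so sd = tau ** -0.5
    states, cases = [], []
    for _ in range(n_weeks):
        z = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        log_y_plus_1 = sample_truncated_normal(rng, mus[z], sd)
        states.append(z)
        cases.append(math.exp(log_y_plus_1) - 1.0)  # back-transform to case scale
    return states, cases

# Illustrative values only: a background state and an outbreak state.
rng = random.Random(1)
states, cases = simulate_weekly_cases(rng, 500, probs=[0.8, 0.2], mus=[0.5, 3.0], tau=4.0)

background = [c for z, c in zip(states, cases) if z == 0]
outbreak = [c for z, c in zip(states, cases) if z == 1]
print(sum(background) / len(background), sum(outbreak) / len(outbreak))
```

The ordering check mirrors the constraint on µ in the text: without it, the component labels could be swapped without changing the likelihood.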

The parameters for the different models were estimated by Markov Chain Monte

Carlo (MCMC) using the software package WinBUGS (Spiegelhalter et al., 2002).

Estimates are based on runs of 10,000 iterations, or until evidence of convergence.

Convergence was assessed by examining Monte Carlo error estimates and Gelman-

Rubin statistics (Brooks and Gelman, 1998b).
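As an illustration of the kind of diagnostic involved, the sketch below computes the classic variance-based potential scale reduction factor for several chains of a scalar parameter. This is an assumption about form: the diagnostic reported by WinBUGS follows Brooks and Gelman and may differ in detail from this version. The chains here are simulated for illustration and the function name is hypothetical.

```python
import math
import random
from statistics import mean, variance

def gelman_rubin(chains):
    """Potential scale reduction factor for equal-length chains of one parameter."""
    n = len(chains[0])
    chain_means = [mean(c) for c in chains]
    within = mean([variance(c) for c in chains])   # W: average within-chain variance
    between = n * variance(chain_means)            # B: between-chain variance
    var_hat = (n - 1) / n * within + between / n   # pooled estimate of the variance
    return math.sqrt(var_hat / within)

rng = random.Random(0)
# Three chains sampling the same distribution (well mixed).
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(3)]
# Three chains stuck around different values (not converged).
stuck = [[rng.gauss(mu, 1.0) for _ in range(1000)] for mu in (0.0, 3.0, 6.0)]

r_mixed = gelman_rubin(mixed)
r_stuck = gelman_rubin(stuck)
print(r_mixed, r_stuck)
```

Values close to 1 suggest the chains agree; values well above 1, as for the second set of chains, indicate that the chains have not converged to a common distribution.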

5.3 Results

The results for Zones 5 and 15 are provided in Table 5.2. The estimates for µ have

been exponentiated for ease of interpretability with the original data.

Our model choice criteria are the DIC estimates (lowest being preferable), the

effective number of parameters (pd), and the separation of the means. For Zone

15 (Table 5.2), the estimates for four components and above show weak signs of

convergence in the MCMC runs (Brooks and Gelman, 1998b). This is also an indi-

cation that we could be overfitting. On the basis of this, the three component model

appears to be preferable as there is a reduction in the DIC estimate from the two

component model (2,711 (k = 2) to 2,701 (k = 3)), without a large increase in the

number of effective parameters (3.59 (k = 2) to 5.06 (k = 3)).

Figure 5.4: Histograms of data (log(y+1)) for all Zones (as numbered)

Table 5.2: Results for Zones 5 and 15

Zone  No. Components  µ       σ2     λ      pD     DIC
5     1               0.03    3.68   1.00   1.11   2134
      2               0.03    2.03   0.84   3.05   2093
                      17.41   0.97   0.16
      3               -       -      -      -      -
15    1               2.65    3.87   1.00   2.11   2745
      2               0.46    7.20   0.72   3.59   2711
                      6.39    0.86   0.28
      3               0.03    1.08   0.23   5.06   2701
                      6.12    1.08   0.66
                      61.55   1.08   0.11
      4               -       -      -      -      -

Note: - indicates non-convergence and is evidence of overfitting. pD is the effective number of parameters being used in the model from DIC calculations.
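The comparison above can be sketched in a few lines. The first function uses the basic definition of DIC (mean deviance plus effective number of parameters); this is an assumption, since Celeux et al. discuss several DIC constructions for mixture models and the exact variant is not restated here. The second part simply selects the lowest DIC among the converged Zone 15 models using the pD and DIC values reported in Table 5.2. Names are hypothetical.

```python
def dic_from_samples(deviance_samples, deviance_at_posterior_mean):
    """Basic DIC: D_bar + pD, with pD = D_bar - D(theta_bar)."""
    d_bar = sum(deviance_samples) / len(deviance_samples)
    p_d = d_bar - deviance_at_posterior_mean
    return d_bar + p_d, p_d

# Toy deviance draws, only to show the arithmetic of the definition.
toy_dic, toy_pd = dic_from_samples([10.0, 12.0, 14.0], 11.0)

# (pD, DIC) for Zone 15 by number of components k, as reported in Table 5.2.
zone15 = {1: (2.11, 2745), 2: (3.59, 2711), 3: (5.06, 2701)}
best_k = min(zone15, key=lambda k: zone15[k][1])

for k, (p_d, dic) in sorted(zone15.items()):
    print(f"k = {k}: pD = {p_d:.2f}, DIC = {dic}")
print("lowest DIC at k =", best_k)
```

Selecting on DIC alone is only part of the criteria described in this section; the separation of the component means and evidence of convergence are also considered.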

Figure 5.5 illustrates the fitted mixture model with three components for Zone

15 against a time series of the data. In this figure, we can see the three levels of

the time series corresponding to the means of the three components. Figure 5.6

shows a comparison of the fitted mixture model against a histogram of the data. In

comparison, the results from Zone 5 indicate that two components can be fitted to

the data over time.

Figure 5.7 illustrates the fitted mixture model with two components for Zone 5

against a time series of the data, and Figure 5.8 shows a comparison of the fitted

mixture model against a histogram of the data.

The results for all zones are shown in Table 5.3. For Zones 8, 10 and 12 the results

suggest only one component or group in the data over time; the results for Zones 4

to 7, 13 and 14 suggest two components, and the results for Zone 15 suggest three

components. For the other zones, the data were too disparate to apply a mixture

model and draw any substantive conclusions (see Figure 5.4).

Figure 5.5: Plot of fitted mixture model for Zone 15 showing three components against the data over time (log values). Overall fitted density is shown in Black, and components in Red. Blue lines indicate the estimates of µ for the three components.

Figure 5.6: Plot of fitted mixture model for Zone 15 against a histogram of the data. Overall fitted density is shown in Black, and components in Red.

Figure 5.7: Plot of fitted mixture model for Zone 5 showing two components against the data over time (log values). Overall fitted density is shown in Black, and components in Red. Blue lines indicate the estimates of µ for the two components.

Figure 5.8: Plot of fitted mixture model for Zone 5 against a histogram of the data (density scale). Overall fitted density is shown in Black, and components in Red.

Table 5.3: Results for all Zones

Zone  No. Components†  µ       σ2       λ      pD
1     *                *       *        *      *
2     *                *       *        *      *
3     *                *       *        *      *
4     2                0.01    1.06     0.57   2.85
                       6.36    1.06     0.43
5     2                0.03    2.03     0.84   3.05
                       17.41   0.97     0.16
6     2                0.00    0.14     0.37   3.55
                       2.02    1.10     0.63
7     2                0.02    1.82     0.86   2.63
                       8.85    1.21     0.14
8     1                0.00    1.02     1.00   1.86
9     *                *       *        *      *
10    1                0.00    0.63     1      4.78
11    *                *       *        *      *
12    1                0.00    0.92     1      2.12
13    2                0.00    0.15     0.36   4.54
                       1.75    1.17     0.64
14    2                0.00    0.15     0.34   3.94
                       1.89    1.26     0.66
15    3                0.03    1.0765   0.23   5.06
                       6.12    1.08     0.66
                       61.55   1.08     0.11

Note: † indicates the number of components best addressing the model choice criteria. * indicates the data range is too disparate to evaluate a mixture model. pD is the effective number of parameters being used in the model from DIC calculations.

The results from Table 5.3 suggest a spatial pattern to the data, with two or more

components identified for zones located on the coast (Zones 1,4-7,14,15) compared

to only one component for zones located inland. This spatial pattern for RRv

supports previous evidence associating higher incidences of RRv with coastal

regions (Tong, 2004).


5.4 Discussion

We explored a Bayesian mixture model to analyse cases of RRv occurring in 15

climate zones throughout Queensland. We examined two of the zones in detail

and found a higher number of components preferred for data from the zone which

appeared to show a more distinctive pattern (Zone 15). A comparison across all the

zones suggests a higher number of components is identifiable from the data for zones

located along the coast of QLD.

There may be a number of explanations as to why we observe a number of

components or groups in the data over time. Further analysis of Zone 15 suggests

that if we take into account a possible change point in the data around 1991/92 due

to a change in notification practice, the number of components reduces from three to

two. We may also observe two or more components if there has been a substantive

increase in the magnitude of outbreaks over time.

We may also observe differences between the zones in terms of the mean (µ) and weight (λ) associated with components. The means of the components indicate the change in the level of the data between the components, and the weights are indicative of the amount of time spent in each component. Even for zones with the same number of components, the disparity between these parameter estimates may be quite large.

Although we have used the simplifying assumption that the data are i.i.d., this is not without implications. The most likely implication is that the standard errors around our estimates are biased, and are likely to be understated. For this reason no inference for the variance was made. This is in some sense a price to pay for the approach we have adopted. Allowing for the correlated nature of the data is likely to lead us away from the main aim of the analysis, which is to classify the data into groups based on changes to the level of the data, rather than to explain the correlation structure of the data.

Explaining the correlation structure of the data is also likely to be difficult. There does not appear to be a consistent correlation structure to the data, either due to changes in the magnitude of the seasonal cycles or from changes in the correlation of the data from one period to another. In this case, allowing for the correlation structure of the data is likely to disguise any changes in the levels that we may otherwise observe. Alternative approaches, such as a Hidden Markov Model or a Dirichlet Process mixture, have similar inferential difficulties, and their computational issues are substantially more involved.

The analysis could be extended in a number of ways. Other distributional forms could be assumed to take account of the large number of zeros in the data, and investigated to assess the difference in the results. Further analysis is also required

to compare the timing of the components across the zones, and we could further

reduce the number of zones into a grouping based on the timing and number of

components observed.

Chapter 6

Bayesian mixture model estimation of aerosol particle

size distributions

In Chapters 6, 7 and 8 we examine approaches to estimate a mixture model at

both single and multiple time points for aerosol particle size distribution (PSD)

data. In this chapter, for estimation of a mixture model at a single time point, we

use Reversible Jump MCMC to estimate mixture model parameters including the

number of components which is assumed to be unknown. We compare the results

of this approach to a commonly used estimation method in the aerosol physics

literature. As PSD data is often measured over time at small time intervals, we also

examine the use of an informative prior for estimation of the mixture parameters

which takes into account the correlated nature of the parameters.

6.1 Introduction

There has been recent interest in the estimation of particle size distributions of

aerosol particulate data (Makela et al., 2000; Birmili et al., 2001; Xu et al., 2002;

Whitby et al., 2002; Lu and Bowman, 2004; Hussein et al., 2005). In these papers, the interest in estimation is largely directed at better understanding the aerosol dynamic processes (i.e., coagulation, nucleation, condensation, and deposition) that govern aerosol formation, growth, and evolution, which depend on the number, size, and composition of particles. In the atmosphere, these aerosol characteristics determine the influence of particles on health, climate, cloud formation, and visibility (Seinfeld and Pandis, 1998). To examine these impacts, accurate and computationally efficient estimates of the size and composition of the distribution are required.

A number of different mathematical representations of size distributions exist,

including discrete, spline, sectional, modal, or monodisperse (Whitby and McMurry,

1997). Two of the most common approaches for representing size distributions are

sectional and modal methods. First introduced by Whitby (1978), a modal represen-

tation treats the aerosol size distribution as a set of individual, typically lognormal,

distributions or modes. Estimation of the modal representation commonly uses an

iterative least squares method (LSM) subject to certain conditions, such as main-

taining a minimum distance between mean estimates of two adjacent components

(for example, see Hussein et al. (2005)).

An alternative modal representation of the size distribution is a finite mixture

model. Mixture models have been the subject of much recent research (Diebolt

and Robert, 1994; McLachlan and Peel, 2000a; Marin et al., 2005; Richardson and

Green, 1997). The Bayesian paradigm for mixture modelling allows for probability

statements to be made directly about the unknown parameters and (perhaps) an

unknown number of components, prior knowledge and expert opinion to be included

in the analysis, and hierarchical descriptions of both local-scale and global features

of the model.

In this chapter, we analyse a sample of aerosol particulate data using a Bayesian

mixture model, and assess the performance of the method using actual and simu-

lated data. We then outline an approach for describing the evolution of the aerosol

particles over time, using an informative prior on a sample of data collected over

one day.

In Section 6.2, we briefly describe particle size distributions, and provide an illustration with actual data. In Section 6.3 we outline the methodology of mixture models, a Gibbs sampling algorithm to estimate the mixture, and a variation to account for the truncation of the data. In Section 6.4 we present the results of applying the Bayesian mixture model to some simulated and actual datasets and compare the results to those obtained by LSM.

6.2 Particle size distribution data

One of the most important physical properties of aerosol particles is their size; the concentration of particles as a function of their size is referred to as the particle size distribution. Figure 6.1 shows an example of particle size distribution data for one measurement or time period. Because aerosol particles are often charged, their size can be determined from their electrical mobility (McMurry, 2000). A common instrument that utilizes this principle is the Differential Mobility Particle Sizer (DMPS). The DMPS includes three main parts: (1) an aerosol particle charger that produces a steady-state charge distribution for the aerosol particle sample (e.g. Wiedensohler, 1988; Adachi et al., 1985; Hussin et al., 1983), (2) a differential mobility analyzer (DMA) that separates aerosol particles according to their electrical mobility (e.g. Hewitt, 1957; Knutson and Whitby, 1975), and (3) a particle counter to count the number concentrations of the separated aerosol particles after the DMA.

Based on their formation processes, aerosol particles are either primary or sec-

ondary. Primary aerosol particles are directly emitted into the atmosphere or formed

in the atmosphere by condensation or coagulation without chemical reactions. On

the other hand, secondary aerosol particles are formed in the atmosphere by gas-to-

particle conversion processes. Growth of aerosol particles occurs through coagulation

and condensation of hot vapors (e.g. Kulmala et al., 2004). However, the rate of

coagulation depends on the already existing particle number concentration whereas

the rate of condensation depends on the surface area of aerosol particles. Therefore, particles do not normally grow above 1 µm because the condensation and coagulation rates decrease as the particle size increases.

In this study we present, as an example, the aerosol particle evolution before, during, and after a new particle formation event at a Boreal Forest site in Southern Finland (Figure 6.2). This dataset was selected as it provides a wide ranging representation of modes for particle size distributions (Dal Masso et al., 2005). Because aerosol particles are governed by formation and transformation processes, they tend to form well distinguishable modal features. For example, during background conditions in the Boreal Forest the particle number size distribution of fine aerosols (diameter < 2.5 µm) is bi-modal: an Aitken mode (below 0.1 µm) and an accumulation mode (over 0.1 µm). During a new particle formation event a new particle mode, commonly known as the nucleation mode, is formed in the atmosphere with geometric mean diameter below 0.025 µm. However, in the urban atmosphere, aerosol particles are more dynamic because of the different types and properties of sources of aerosol particles, and may show more than 3 lognormal modes. Typically the number concentrations of aerosol particles in the urban background can be as high as $5 \times 10^{4}$ cm$^{-3}$, and very close to a major road they often exceed $10^{5}$ cm$^{-3}$.

Figure 6.1: Histogram of data sampled from Hyytiala, Finland for a single time period. (Axes: Particle Diameter (log(Dp(nm))), horizontal; Density, vertical.)

In general, aerosol particles have direct and indirect impacts on the Earth’s climate. Investigating the modal structure of aerosol particles provides a better understanding of their dynamic behavior in addition to their effect on the climate. New particle formation events in the background atmosphere are among the best case studies for understanding the dynamical changes that take place in aerosol particles, from their earliest stage through their growth to their participation in cloud processes. It has recently been observed that new particle formation can take place anywhere on the globe, further increasing the importance of understanding the processes involved.

6.3 Methods

In this section, we outline an independent approach to estimating a mixture model

at a single time point using RJMCMC, a two stage approach to estimation of a

mixture over multiple time points, and an approach to estimate a mixture of normal

distributions where there is truncation present in the data.

Figure 6.2: An illustration of a new particle formation event at a Boreal Forest site located in Southern Finland. (a) The temporal variation of the particle number size distribution and (b) selected particle number size distributions showing the different stages of the newly formed particle mode from its early stage. Note that this new particle formation occurred on a regional scale over the southern part of Finland.

6.3.1 Mixture model at a single time point

The density of data (y) given by a finite mixture model can be represented by:

$$p(y|\theta) = \sum_{j=1}^{k} \lambda_j f(y|\theta_j) \qquad (6.1)$$

where k is the number of components in the mixture, λj represents the probability of membership of the jth component ($\sum_{j=1}^{k} \lambda_j = 1$), and f(y|θj) is the density function of component j which has parameters θj.
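As a minimal sketch of Equation (6.1) for normal components, the following evaluates the mixture density on a grid. The function names and the two-component parameter values are hypothetical and chosen only for illustration; they are not estimates from this chapter.

```python
import numpy as np

def normal_pdf(y, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at y."""
    z = (y - mu) / sigma
    return np.exp(-0.5 * z**2) / (sigma * np.sqrt(2.0 * np.pi))

def mixture_density(y, lam, mu, sigma):
    """Evaluate p(y | theta) = sum_j lam_j * N(y | mu_j, sigma_j^2)."""
    y = np.atleast_1d(np.asarray(y, dtype=float))
    lam, mu, sigma = (np.asarray(a, dtype=float) for a in (lam, mu, sigma))
    if not np.isclose(lam.sum(), 1.0):
        raise ValueError("component weights must sum to one")
    # Shape (n_obs, k): one column per component, then weight and sum.
    comp = normal_pdf(y[:, None], mu[None, :], sigma[None, :])
    return comp @ lam

# Illustrative (hypothetical) values on the log-diameter scale.
lam = [0.3, 0.7]
mu = [2.0, 4.0]
sigma = [0.4, 0.6]
grid = np.linspace(-3.0, 10.0, 5001)
dens = mixture_density(grid, lam, mu, sigma)
```

Because the weights sum to one and each component is a proper density, the values in `dens` integrate to approximately one over a sufficiently wide grid.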

As component membership of the data is unknown, the usual hierarchical frame-

work for the mixture model involves introducing the latent indicator variable z. In

this model, zi represents the unobserved component membership from which ob-

servation yi is drawn, and is treated as another parameter to be estimated in the

modelling procedure.
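The role of the latent indicator can be sketched as follows, assuming normal components: given current parameter values, each zi is drawn with probability proportional to λj f(yi|θj). This is the standard allocation step for a mixture sampler; the function names and example values are hypothetical and are not taken from the implementation used in this thesis.

```python
import numpy as np

def allocation_probs(y, lam, mu, sigma):
    """P(z_i = j | y_i, theta), proportional to lam_j * N(y_i | mu_j, sigma_j^2)."""
    y = np.asarray(y, dtype=float)[:, None]
    lam, mu, sigma = (np.asarray(a, dtype=float)[None, :] for a in (lam, mu, sigma))
    # Work on the log scale and subtract the row maximum for numerical stability.
    log_w = np.log(lam) - np.log(sigma) - 0.5 * ((y - mu) / sigma) ** 2
    log_w -= log_w.max(axis=1, keepdims=True)
    w = np.exp(log_w)
    return w / w.sum(axis=1, keepdims=True)

def sample_allocations(y, lam, mu, sigma, rng):
    """Draw one latent label z_i per observation from its allocation probabilities."""
    probs = allocation_probs(y, lam, mu, sigma)
    u = rng.random(probs.shape[0])
    # Inverse-CDF draw: index of the first cumulative probability not below u.
    z = (probs.cumsum(axis=1) < u[:, None]).sum(axis=1)
    return np.minimum(z, probs.shape[1] - 1)

rng = np.random.default_rng(0)
y = np.array([1.9, 2.1, 4.2, 3.0])
probs = allocation_probs(y, lam=[0.3, 0.7], mu=[2.0, 4.0], sigma=[0.4, 0.6])
z = sample_allocations(y, [0.3, 0.7], [2.0, 4.0], [0.4, 0.6], rng)
```

Observations close to a component mean receive allocation probabilities near one for that component, while observations between components are allocated with genuine uncertainty.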

In this chapter, we transform the aerosol particle size distribution data using a

natural logarithm prior to fitting the mixture model (Whitby and McMurry, 1997).

In this case, the data (yi) are the natural log of particle diameters (nm), which are

assumed to be normally distributed and the parameters (θj) to be estimated for each

component are therefore the mean (µj) and variance ($\sigma_j^2$), along with the component

weight (λj). The number of normal components, k, is also assumed to be unknown.

Priors were:

$$
\begin{aligned}
p(\mu_j) &\sim N(\xi, \kappa^{-1}) \\
p(\sigma_j^{-2}) &\sim \mathrm{Gamma}(\delta, \beta) \\
p(\beta) &\sim \mathrm{Gamma}(g, h) \\
p(\lambda) &\sim \mathrm{Dirichlet}(\alpha_1, \alpha_2, \ldots, \alpha_k) \\
p(k) &\sim \mathrm{Uniform}(k_{\min}, k_{\max})
\end{aligned}
$$

where ξ, κ, δ, α, η, g, h, kmin and kmax are fixed hyperparameters.
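To make the hierarchical structure of these priors concrete, the sketch below draws one set of parameters from them. It assumes that Gamma(a, b) denotes shape a and rate b, and that the Dirichlet prior is symmetric with a common value α; neither is stated explicitly above. The numerical hyperparameter values in the example call are placeholders only.

```python
import numpy as np

def draw_from_priors(rng, xi, kappa, delta, g, h, alpha, k_min, k_max):
    """Draw (k, lambda, mu, sigma2, beta) once from the priors listed above.

    Assumes Gamma(a, b) means shape a and rate b, so NumPy's scale is 1/b.
    """
    k = int(rng.integers(k_min, k_max + 1))                        # k ~ Uniform{k_min..k_max}
    beta = rng.gamma(shape=g, scale=1.0 / h)                       # beta ~ Gamma(g, h)
    precision = rng.gamma(shape=delta, scale=1.0 / beta, size=k)   # sigma_j^-2 ~ Gamma(delta, beta)
    mu = rng.normal(loc=xi, scale=kappa ** -0.5, size=k)           # mu_j ~ N(xi, 1/kappa)
    lam = rng.dirichlet(np.full(k, alpha))                         # symmetric Dirichlet (assumed)
    return k, lam, mu, 1.0 / precision, beta

# Placeholder hyperparameter values, for illustration only.
rng = np.random.default_rng(0)
k, lam, mu, sigma2, beta = draw_from_priors(
    rng, xi=3.5, kappa=0.04, delta=2.0, g=0.2, h=0.4, alpha=1.0, k_min=1, k_max=10
)
```

Drawing β first and then the precisions conditional on β reflects the hierarchy: the component variances share a common scale that is itself uncertain.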

For estimation of the mixture model, we implemented Richardson & Green’s (1997) Reversible Jump Markov Chain Monte Carlo (RJMCMC) algorithm (for details see Appendix I). This approach is “fully” Bayesian in the sense that a posterior distribution for the unknown number of components (k) in the mixture model is estimated, rather than using a comparative measure, such as the Bayesian Information Criterion, to assess the fit of mixture models of different dimensions. In Section 6.4, we compare the results of applying the RJMCMC algorithm to results obtained using the LSM for measurements at particular time periods.

Mixture model estimation over multiple time periods

To examine aerosol dynamic processes, measurements of aerosol particle size dis-

tribution data are often taken regularly over time with the measurement intervals

typically ranging from 5 minutes to an hour. For the smaller measurement inter-

vals the data and associated parameter estimates are likely to be highly correlated

across time. Most estimation of the particle size distribution data does not take into account the likely correlation between parameters, other than, in most cases, by using the previous parameter estimates as starting values in an optimisation routine for

the current period. Allowing for correlation between the parameter estimates over

time is likely to lead to improvements in estimation, inference and efficiency.

One approach to this problem is to extend the RJMCMC mixture model de-

scribed in Section 6.3.1 to allow for evolution of parameters θjt over time periods t,

t = 1, . . . , T with kt, the number of mixture components at time t, also unknown

and possibly unequal. This single modelling approach requires reversible moves not

only within time periods but also across them. In our experience this was computa-

tionally very costly and required substantial pre-processing to ensure good mixing,

labelling and convergence. Moreover, further post-processing was required to obtain

adequate summary statistics and between component mapping.

As an alternative, we adopted a two-stage approach to estimation of a mixture model over multiple time points. In the first stage, for each time period t, we implemented the RJMCMC algorithm of Section 6.3.1 and estimated kt. We then calculated $k' = \max_{t=1,\ldots,T}(k_t)$. In the second stage, we fixed $k_t = k'$ and estimated $(\theta_{jt} = (\mu_{jt}, \sigma_{jt}, \lambda_{jt});\ j = 1, \ldots, k';\ t = 1, \ldots, T)$. As we do not observe all of the $k'$ components in every time period, we allowed component weights to be “effectively” zero ($\inf(\lambda_t) = 0.001$) if required.
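The first stage can be summarised in a few lines, assuming that kt is taken as the value of k with the highest posterior probability in each period and that the posterior is estimated by relative frequencies in the sampled trace. The traces below are hypothetical and far shorter than a real RJMCMC run.

```python
import numpy as np

def posterior_k(k_trace, k_max):
    """Estimate P(k = j | y), j = 1..k_max, by relative frequency in the trace."""
    counts = np.bincount(np.asarray(k_trace, dtype=int), minlength=k_max + 1)
    counts = counts[1:k_max + 1]
    return counts / counts.sum()

def stage_one_summary(k_traces, k_max):
    """Return the modal k_t for each period and k' = max_t k_t."""
    k_t = np.array([int(np.argmax(posterior_k(tr, k_max))) + 1 for tr in k_traces])
    return k_t, int(k_t.max())

# Hypothetical post-burn-in traces of k for three time periods.
traces = [[3, 3, 4, 3, 3], [4, 4, 4, 5, 4], [5, 5, 4, 5, 5]]
k_t, k_prime = stage_one_summary(traces, k_max=10)
```

The second stage would then fit a mixture with `k_prime` components in every period, with weights allowed to shrink towards the lower bound where a component is absent.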

In the second stage of this algorithm, we considered two sets of priors. The first was the set of independent priors: $p(\lambda) \sim \mathrm{Dirichlet}(\alpha_1, \ldots, \alpha_k)$; $p(\mu_j|\sigma_j^2) \sim N(\xi_j, \sigma_j^2/n_j)$; $p(\sigma_j^2) \sim IG(v_j/2, s_j^2/2)$, where αj, ξj, nj, vj and sj are fixed hyperparameters. The second

allowed for temporal correlation. In the case of a Gaussian mixture model, we have

three parameters (µ, σ and λ) for which we could utilise information from previous

time periods. In this chapter, we adopt one such informative prior for λ as it was

the parameter of most interest in the aerosol study. We note that alternative priors

for λ could be defined and that informative priors can be constructed for µ and σ

instead or on multiple parameters.

Gustafson and Walker (2003) proposed a prior for λ that downweights large changes in probabilities in successive periods. For time period t, t = 2, . . . , T, define

$$p(\lambda_t) \sim \mathrm{Dirichlet}(1, \ldots, 1)\, \exp\left(-\frac{\sum_{t=2}^{T}\sum_{j=1}^{J}(\lambda_{jt}-\lambda_{j,t-1})^{2}}{\phi}\right) \qquad (6.2)$$

where smoothing increases as φ → 0.
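The effect of φ in Equation (6.2) can be seen by evaluating the logarithm of the exponential term for a smooth and a rough sequence of weights. This is a sketch only: the function name and the example weight sequences are hypothetical, and the Dirichlet(1, . . . , 1) factor is omitted because it is constant on the simplex.

```python
import numpy as np

def log_smoothing_kernel(lam, phi):
    """Log of exp(-sum_t sum_j (lam_jt - lam_j,t-1)^2 / phi), up to a constant.

    lam has shape (T, J); each row holds the weights for one time period.
    The Dirichlet(1, ..., 1) factor is constant on the simplex and is omitted.
    """
    lam = np.asarray(lam, dtype=float)
    if phi <= 0:
        raise ValueError("phi must be positive")
    if not np.allclose(lam.sum(axis=1), 1.0) or np.any(lam < 0):
        raise ValueError("each row of lam must lie on the simplex")
    diffs = np.diff(lam, axis=0)          # lam_t - lam_{t-1} for t = 2..T
    return -np.sum(diffs ** 2) / phi

# Hypothetical weight paths over three periods for a two-component mixture.
smooth = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]
rough = [[0.5, 0.5], [0.9, 0.1], [0.5, 0.5]]
penalty_tight = log_smoothing_kernel(rough, phi=0.05)
penalty_loose = log_smoothing_kernel(rough, phi=0.5)
```

A constant weight path incurs no penalty, whereas the same jump in weights is penalised ten times more heavily on the log scale with φ = 0.05 than with φ = 0.5, which is why smaller values of φ give smoother weights.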

A potential advantage of using information about estimates over the whole time

period (t = 1 . . . T ) is the additional information this may provide to guide parameter

estimates in the current period. This may be an advantage at times where large

changes in the parameter estimates are occurring for single time periods.

To sample from the posterior distribution of λ, Gustafson and Walker (2003) propose a rejection sampling algorithm in which the candidate distribution is $\mathrm{Beta}(m_{jt}^{(r)} + 1, m_{kt}^{(r)} + 1)$, where $m_{jt}^{(r)}$ is the number of observations allocated to component j (or k, where $j \neq k$) for time period t at iteration r. A limitation of this rejection sampling scheme is that it becomes problematic for large sample sizes and we discuss this issue later in relation to the results.

6.3.2 Accounting for truncated data

In this section we outline the use of a second latent variable to estimate a mixture

of normal distributions where there is truncation present in the data.


A feature of some of the data used to estimate particle size distributions is a

definitive lower and upper bound for the particle size. For example, particle con-

centrations may be measured with a range of particle size from 3nm to 650nm,

depending on the measurement device used. Preliminary investigation of some sam-

pled data from Hyytiala (Finland), revealed the possibility of there being truncation

of the data on the lower and upper bounds. Figure 6.1 shows a sample of particle

size distribution data which clearly illustrates truncation on the lower bound of the

data.

Measurement of aerosol particles is commonly observed in the form of a number

of distinct particle size ranges, or channels, the size and number of the channels

being governed by the type and setup of the measurement instrument (Hussein

et al., 2004). For example, in the sampled data from Hyytiala (See Section 6.4.2),

we observed 32 distinct size partitions (bins) covering the range from 3nm to 650nm.

For estimation of truncated normal distributions using Gibbs sampling, we took a missing data latent variable approach and introduced a new variable ($\tilde{y} = (y, y^*)$) consisting of the original data (y), whose largest and smallest size bins are denoted yU and yL, and the assumed missing data (y∗), consisting of size measurements smaller than the lower bound and larger than the upper bound of the original data.

To extend the boundary of the range of the data to be included in $\tilde{y}$, we created an additional number of bins for sizes less than yL and greater than yU. For example, for the sampled data used for fitting in the next section, we created four additional bins to the left of the original lower bound, and three to the right of the original upper bound. The space between size bins was evenly spread in proportion to the size between the original bins.

We then estimated the parameters of the mixture model using the augmented data $\tilde{y}$, that is, the original data together with y∗. At the end of each iteration y∗ was reallocated using the current parameter values.
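One way to picture the augmentation step is to ask how many unobserved values the current mixture implies in each extra bin. The sketch below computes expected counts rather than random draws, and scales them so that the observed sample corresponds to the mixture mass inside the observed range. This scaling rule, the bin width of 0.2 on the log scale, and all parameter values are assumptions made for illustration; the text does not specify how y∗ was generated.

```python
import math
import numpy as np

def mixture_cdf(x, lam, mu, sigma):
    """CDF of a normal mixture at a scalar x."""
    return sum(l * 0.5 * (1.0 + math.erf((x - m) / (s * math.sqrt(2.0))))
               for l, m, s in zip(lam, mu, sigma))

def expected_missing_counts(n_obs, lower, upper, edges_low, edges_high,
                            lam, mu, sigma):
    """Expected number of unobserved values in each extra bin outside [lower, upper].

    edges_low: increasing bin edges ending at `lower`; edges_high: increasing
    edges starting at `upper`. Counts are scaled so that the n_obs observed
    values correspond to the mixture mass inside [lower, upper].
    """
    def cdf(x):
        return mixture_cdf(x, lam, mu, sigma)

    def bin_mass(edges):
        return np.array([cdf(b) - cdf(a) for a, b in zip(edges[:-1], edges[1:])])

    p_obs = cdf(upper) - cdf(lower)
    low = n_obs * bin_mass(edges_low) / p_obs
    high = n_obs * bin_mass(edges_high) / p_obs
    return low, high

# Hypothetical set-up on the log-diameter scale: data observed between 3 nm and
# 650 nm, four extra bins to the left and three to the right, each 0.2 wide.
lower, upper = math.log(3.0), math.log(650.0)
edges_low = lower - 0.2 * np.arange(4, -1, -1)    # 5 edges -> 4 bins, ending at lower
edges_high = upper + 0.2 * np.arange(0, 4)        # 4 edges -> 3 bins, starting at upper
low, high = expected_missing_counts(
    1000, lower, upper, edges_low, edges_high,
    lam=[0.3, 0.7], mu=[1.4, 3.7], sigma=[0.3, 0.5],
)
```

In a sampler these counts would be replaced by random draws at each iteration, so that uncertainty about the missing data propagates to the mixture parameters.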

An advantage of this latent variable approach is that the algorithm described in Section 6.3.1 can be readily applied to estimate y∗ and the mixture parameters based on the augmented data (y, y∗). The approach is generalisable to other missing data assumptions we might have about the original data, although in this chapter we confine our approach to the issue of truncation. For the purposes of this chapter we now refer to the analysis allowing for truncated data as ‘truncated Normal’.

6.4 Results

In this section, we present the results of applying the RJMCMC algorithm outlined above to a simulated and an actual dataset. For both datasets analysed we used a uniform prior for k over the range k = 1, . . . , 10 and the following weakly informative hyperparameter values: $\kappa = 1/R^2$; α = 2; g = 0.2; $h = 10/R^2$ and δ = 1, where R equals the range of the data. The hyperparameter values for α and g encourage similar values for σj without being informative about their absolute size, and the value for κ reflects a weak prior belief in ξ. Results were based on 200,000 iterations with a burn-in period of 100,000.

6.4.1 Simulated data: single time point

In this section, we use simulated data to validate estimates from the RJMCMC

algorithm outlined above. The data comprised four components truncated on the

lower bound, with characteristics representative of the aerosol data described in

Section 6.3.2. Figure 6.3 shows the kernel density estimator of simulated data with

fitted results from normal and truncated normal approaches. The corresponding

posterior estimates of the component parameters and 95% credible intervals are

given in Table 6.1.
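Data of this kind can be generated along the following lines. The component parameters are those listed in the "True value" column of Table 6.1; the sample size and the truncation point (the log of a 3 nm lower limit) are assumptions, since neither is reported for the simulation.

```python
import numpy as np

def simulate_truncated_mixture(n, lam, mu, sigma, lower, rng):
    """Draw from a normal mixture and keep only values at or above `lower`.

    Oversamples in batches until n retained values are available.
    """
    lam, mu, sigma = (np.asarray(a, dtype=float) for a in (lam, mu, sigma))
    kept = np.empty(0)
    while kept.size < n:
        z = rng.choice(lam.size, size=2 * n, p=lam)      # component labels
        y = rng.normal(mu[z], sigma[z])                  # draws on the log scale
        kept = np.concatenate([kept, y[y >= lower]])
    return kept[:n]

rng = np.random.default_rng(1)
y = simulate_truncated_mixture(
    n=5000,                               # assumed sample size
    lam=[0.10, 0.10, 0.60, 0.20],
    mu=[1.40, 2.30, 3.70, 5.10],
    sigma=[0.30, 0.30, 0.50, 0.50],
    lower=np.log(3.0),                    # assumed truncation point (3 nm)
    rng=rng,
)
```

Under these assumptions the truncation removes part of the lower tail of the first component only, which is the situation the truncated Normal analysis is designed to handle.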

Due to the clear truncation of the data in Figure 6.3, we would expect the truncated Normal distributions to fit the data better in the area of truncation than a model that ignores this truncation. For both models the posterior probability for the number of components (k) was highest for four components (truncated: P(k = 4) = 0.54, non-truncated: P(k = 4) = 0.93). The point estimates from the truncated Normal model are, for almost all parameters, much closer to the true values than the estimates from the non-truncated version (see Table 6.1). Ignoring truncation appears to result in less weight assigned to the first component, with a mean value lower than the true value, and estimates for standard deviation (σ1) and weight (λ1) smaller than the true values. For the first component, the true value for the weight of the component (λ1) is 0.10, and the mean estimates for the assumed non-truncated and truncated distributions are 0.0671 and 0.1017, respectively. In the non-truncated model, estimates for the second and third components appear then to compensate for smaller estimates from the first component, with standard deviations and weights for these components larger than the true values. For the second component, the true value for σ2 is 0.30, and for λ2 is 0.10. The mean estimates for the non-truncated distribution for σ2 and λ2 are 0.40 and 0.13, respectively. The results thus suggest that accounting for truncation may not only result in a better fit for the associated component but also better fits for neighbouring components.

Figure 6.3: Kernel density estimator of simulated data (black) with fitted results from normal (dark green) and truncated normal (blue) approaches. Simulated data based on parameters: k = 4; µ = (1.40, 2.30, 3.70, 5.10); σ = 0.30; λ = (0.10, 0.10, 0.60, 0.20). (Axes: Diameter (natural log scale), horizontal; Density, vertical.)

Table 6.1: Estimated parameter values from Bayesian mixture model analysis using RJMCMC algorithm with simulated data. Based on 200,000 iterations with a burn-in of 100,000. CI = Credible Interval

                                     Posterior Estimates (Mean (95% CI))
Component  Parameter  True value     Normal               Truncated Normal
1          µ1         1.40           1.32 (1.26, 1.38)    1.44 (1.30, 1.88)
           σ1         0.30           0.18 (0.13, 0.21)    0.34 (0.22, 0.63)
           λ1         0.10           0.07 (0.05, 0.09)    0.10 (0.05, 0.20)
2          µ2         2.30           2.14 (2.03, 2.24)    2.33 (2.09, 3.69)
           σ2         0.30           0.40 (0.32, 0.46)    0.35 (0.21, 0.62)
           λ2         0.10           0.13 (0.10, 0.15)    0.12 (0.05, 0.44)
3          µ3         3.70           3.73 (3.68, 3.76)    3.75 (3.68, 3.94)
           σ3         0.50           0.53 (0.48, 0.58)    0.52 (0.45, 0.58)
           λ3         0.60           0.62 (0.58, 0.66)    0.60 (0.18, 0.66)
4          µ4         5.10           5.14 (5.03, 5.23)    5.13 (5.00, 5.23)
           σ4         0.50           0.44 (0.39, 0.50)    0.45 (0.40, 0.51)
           λ4         0.20           0.18 (0.15, 0.22)    0.19 (0.15, 0.23)

6.4.2 Case study: single time point

Figure 6.1 shows a plot of actual data, taken from measurements at Hyytiala, a boreal forest site in Southern Finland (SMEAR II) (Vesala et al., 1998). This dataset was selected as it provides a wide ranging representation of modes for particle size distributions (Dal Masso et al., 2005). Figure 6.4 shows the results of fitting a normal and truncated normal distribution to a single time period from this dataset.

Figure 6.4: Histograms of data sampled from Hyytiala, Finland with estimated overall fit and components for non-truncated Normal (left, k=4) and truncated normal (right, k=3) overlaid. (Vertical axis: Density.)

The non-truncated mixture model appears to fit two components with small variance, whereas the truncated normal mixture model fits one component with no apparent loss in fit. In practice, a result like this may suggest that there are two sources for the smaller sized particles instead of one or alternatively a different underlying aerosol process.

We can compare these with an iterative least squares method (LSM) commonly

used for estimation of the modal method in the aerosol literature. Different research

groups have developed their own algorithms and most involve some degree of user

input for the number of components (Makela et al., 2000; Birmili et al., 2001; Whitby

et al., 1991). For this reason, we chose the fully automated algorithm outlined by

Hussein et al. (2005), which compares favourably to Makela et al. (2000), and to

previous versions (Hussein et al., 2004). The aim here is not to comment on this

algorithm or on LSM as a methodology but rather to offer a brief comparison of

results in the context of this case study.

Figure 6.5 shows the results of fitting using LSM and our algorithm for a sample

of data from Hyytiala. The solid line is the predicted fit, and the dotted lines display

the components. The LSM appears to underfit both the small and large sizes of the

particle size distribution. Our algorithm identified two more modes to describe these

extremes of the particle size distribution, and the resulting five component model

appears to provide an improved fit.

Figure 6.6 again shows the results of fitting using the two approaches for a sec-

ond sample of data from Hyytiala. The fit of the LSM appears to ignore either a

second component (mean of 3.6nm) or skewness of the main component. From pre-

liminary investigation, this component gradually emerges around this time period.

Our algorithm provides a better fit for this second component and hence overall.

The main difference between the results of the two algorithms in the examples

that we have seen appears to be that our approach is performing a much more

thorough search of the parameter space, including the number of components. By using uninformative priors, our model choice using RJMCMC is also largely driven by the data, and thus avoids subjective influences on model choice.

Figure 6.5: Histograms of data sampled from Hyytiala, Finland with estimated overall fit and components from RJMCMC (left) and LSM (right) overlaid. (Vertical axis: Density.)

Figure 6.6: Histograms of data sampled from Hyytiala, Finland with estimated overall fit and components from RJMCMC (left) and LSM (right) overlaid. (Vertical axis: Density.)

6.4.3 Results for mixture model estimation over multiple time points

In this section, we apply our two stage algorithm to data collected over one day from

Hyytiala, measured at 10 minute intervals (T=144). Figure 6.7 shows the results

of the first stage of the algorithm, with a plot of the posterior mean estimates for

µjt at each time point t, with the size of the circles indicating the corresponding

weight λjt. The average of the number of components estimated with the highest

probability over the day was four, and the largest number of components was five.

Figure 6.7: Plot of posterior mean values for µjt obtained from the first stage using the RJMCMC algorithm for one day (Hyytiala measurement station). The size of the circles indicates the weight (λjt) corresponding to µjt. (Axes: Time, 00:00 to 23:50, horizontal; posterior mean estimates for µ, vertical.)

Based on these results, for the second stage of the analysis we set the number of components to be five ($k' = 5$) and hyperparameters to be: ξ = (1.1, 2.0, 3.0, 4.0, 5.0); $s_{tj}^2 = 2.0225$; $v_{tj} = 0.092025$; and nj = 200. The hyperparameter values for ξ were chosen to adequately represent the parameter space, and hyperparameter values for $s_{tj}^2$ and $v_{tj}$ were chosen to indicate an uninformative prior and allow σjt to range from 0 to 1.5. The value of nj was chosen to be large enough to prevent label switching on the means, and small enough to have negligible influence on the posterior estimates. Over the time period of interest, the average concentration of particles was approximately 100,000 and ranged from 2,000 to 120,000. In light of this, the value for nj is relatively small.

Figure 6.8 shows a plot of the posterior mean estimates for the parameters (µ and λ) for each component over the course of the day, obtained from the second stage of analysis using independent priors. The figure indicates a nucleation event occurring around 08:00. Such events are a common feature of the data from this measurement station (Sogacheva et al., 2005). Characteristic of such an event is a large increase in the number of smaller sized particles (nucleation mode, < 20nm), which typically grow in size over the next few hours to either the Aitken (25-90nm) or accumulation (100+nm) modes. From the parameter estimates (Figure 6.8), we see that most of the weight for the mixture prior to 08:00 is in the third component (bottom panel, µ ≈ 20nm, λ = 0.8). From 08:00 to 12:00, however, we see a large increase in the weight for the first component (µ ≈ 4nm, 08:00 to 10:00), followed by a large increase in the weight for the second component (µ ≈ 10nm, 10:00 to 12:00). Components 4 and 5 appear and disappear during the course of the day (bottom panel). This may have a physical interpretation as the presence of an actual component source, or alternatively it may represent convenient modelling of the skewness of a single component through multiple Gaussian distributions. In either case, the evolving nature of these components is clearly depicted through this model and motivates scientific interest.

Figure 6.8: Plot of parameters (µ and λ) over one day (Hyytiala, Finland) for the independent approach. Stage 2 of the analysis for the evolution of parameters. Measurements taken every 10 minutes. Colours indicate the components to which parameter estimates belong (the parameter estimates for the first component are Black, parameters for the second component are Red, for the third component they are Green, etc.). (Panels: mu, lambda; horizontal axis: time, 00:00 to 23:50.)

Figure 6.9 shows a plot of the posterior mean estimates for the parameters (µ and λ) for each component over the course of the day, obtained from the second stage of analysis using an informative prior. From Figure 6.9, we can see that the parameter estimates from the informed prior approach appear largely to follow the parameter estimates from the independent approach with some degree of smoothing on the weights. Here φ was set equal to 0.05. In analysis not shown, smaller values of φ suggest a smoother pattern to the weights over time. Alternatively φ could be treated as unknown and estimated, although care is required in this case. The results under the informed prior suggest that at times we may be able to better infer patterns in the data or in some cases remove some anomalies.

As indicated in Section 6.3.1, a limitation of the rejection sampling algorithm for the informed prior approach, as outlined by Gustafson and Walker (2003), is the specification of the candidate distribution $\mathrm{Beta}(m_{jt}^{(r)} + 1, m_{kt}^{(r)} + 1)$ for large sample sizes. We found that for our large sample size of particles, and with the volatility in weights for some periods of time, the acceptance rate of proposed parameter values was exceedingly low (< 5%). This appeared to be particularly the case for estimation during the period between 11:00 and 13:00, where the sample size increased markedly (>75,000 particles) and with much variability. This is clearly due to a very narrowly defined distribution if both $m_{jt}^{(r)}$ and $m_{kt}^{(r)}$ are large, and if neighbouring estimates of λjt (λj,t−1 and λj,t+1) are, under the independent approach, some distance away. Further research to investigate alternative forms of the candidate distribution would be beneficial to improve computation time.

6.5 Discussion

We used a Bayesian mixture model to estimate particle size distributions for a sam-

ple of real and simulated datasets. We also proposed a modification to the standard

Gibbs sampler to handle the case of truncated data on both the lower and upper

bound. The results from using the algorithm were promising, and the method pro-

vides considerable flexibility both in estimation and inference.

By estimating the parameters of a mixture model using the RJMCMC algorithm,

we can make probabilistic statements about all unknown parameters, including the

number of components (k). In the case of the number of components, this avoids

the need to use comparative measures for fixed values of k, but also places the

Figure 6.9: Plot of parameters (µ and λ) over one day (Hyytiala, Finland) for the informed prior approach. Stage 2 of the analysis for the evolution of the parameters. Measurements taken every 10 minutes. Colours indicate the components to which parameter estimates belong (the parameter estimates for the first component are Black, those for the second component are Red, those for the third component are Green, etc.).

assessment of model fit on a probabilistic basis which can be used for inference. For

example, we can say with some probability whether two or three modes exist at a

particular point in time. Of further interest may be the concentration of particles for

each mode. To examine this, we can estimate the probability that the concentration

of particles is above or below certain thresholds of interest.
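As a minimal sketch of how such probabilistic statements can be read off sampler output, the following assumes that retained MCMC draws of the number of components and of a component's particle concentration are available as plain Python lists. The function names and the draws shown are hypothetical and serve only as an illustration; they are not output from the analysis in this chapter.

```python
def prob_k(k_draws, k):
    """Estimate P(number of components = k | data) as the fraction of
    retained draws in which the sampler visited a model with k components."""
    return sum(1 for d in k_draws if d == k) / len(k_draws)


def prob_exceeds(draws, threshold):
    """Estimate P(quantity > threshold | data) as the fraction of retained
    draws of that quantity lying above the threshold."""
    return sum(1 for d in draws if d > threshold) / len(draws)


# Hypothetical draws, for illustration only.
k_draws = [2, 3, 3, 3, 2, 3, 4, 3, 3, 2]
concentration_draws = [950.0, 1100.0, 1200.0, 1020.0, 990.0]

p_three_modes = prob_k(k_draws, 3)
p_above = prob_exceeds(concentration_draws, 1000.0)
```

With these illustrative draws, six of the ten values of k equal three and three of the five concentration draws exceed 1000, so both estimated probabilities are 0.6.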

In a Bayesian approach, prior knowledge or expert opinion and hierarchical
descriptions of both local-scale and global features can be included in the model.
Although we have generally used weakly informative priors for parameter estimates,
more informative priors can be used in situations where this information is available.

In the case of frequent measurements of aerosol particle size distribution over time,

including prior information may assist in estimation and inference. For estimation,

the identifiability of mixtures is an important issue (Marin et al., 2005) and we may

be able to better identify individual components in time periods where there appears

to be some degree of overlap. By obtaining smoother parameter estimates over time

we may be able to more clearly establish patterns or identify anomalies from the

data.

While the Bayesian method for mixture models offers both flexibility in estima-

tion and inference, the interpretability of the parameters estimated is an important

question. For example, particle size distribution fits with five components may not

readily have a physical interpretation. From preliminary investigation of various size

distribution data, we generally found that extra components were needed to account

for the skewness of some size distribution data. Depending on their location in the

size distribution neighbouring components may need to be combined. The Gaussian

representation of mixture densities makes this relatively straightforward. Interpreta-

tion of results is also aided by including prior or expert opinion, which may have the

effect of restricting parameter estimates to known domains. Alternatively, further

investigation of the particle source for these components may be needed.

Chapter 7

Bayesian estimation of mixtures over time with

application to aerosol particle size distributions

In this chapter, we examine in some detail the issue of using informative priors for

estimation of mixtures at multiple time points. In this analysis, two
different informative priors and an independent prior are compared using simulated
and actual data. Informative priors may provide useful information
with which to better identify component parameters at each time point and, as an
aid for inference, to more clearly establish patterns in
the parameters over time. As this chapter is designed to be read independently of
Chapter 6, Section 2 (describing PSD data) and the first part of Section 3 (outlining
mixture models) are largely repeated from Chapter 6.

7.1 Introduction

Interest in the estimation of aerosol particle size distributions is largely directed at
better understanding the aerosol dynamic processes (i.e., coagulation, nucleation,
condensation, and deposition) that govern aerosol formation, since growth and evolution
depend on the number, size, and composition of particles (Makela et al., 2000,
Birmili et al., 2001, Xu et al., 2002, Whitby et al., 2002, Lu and Bowman, 2004, Hussein

et al., 2005). In the atmosphere, these aerosol characteristics determine the influence

of particles on health, climate, cloud formation, and visibility (Seinfeld and Pandis,

1998). One of the most common approaches for representing size distributions is
to treat the aerosol size distribution at any time point as a set of individual,
typically log-normal, distributions or modes. In this formulation, the estimation of
particle size distributions is largely analogous to a mixture model problem at each
time point in the statistical setting, for which there is now a growing literature
(Marin et al., 2005).

While interest is in the representation of the particle size distribution as a mix-

ture at each time point, it is also of interest to describe how this distribution evolves

over time. To better understand aerosol dynamic processes, measurements
of particle size distributions are often collected at regular
points in time, and often at quite small time intervals (e.g., every 10 minutes). In

this setting, parameters of the mixture model at each time point are likely to be cor-

related with neighbouring time points and useful information about the parameters

may be gained by incorporating this information in estimation.

The standard setting in which mixture models have been applied has largely been
that of independent random samples (Marin et al., 2005), but a literature
is emerging for situations in which the data are spatially and/or temporally
structured (Fernandez and Green, 2002; Green and Richardson, 2002). While some of
the methods developed for mixture models in the spatial setting can be adapted for
use in a time series setting, the influence or choice of informative priors in a time
series framework and the implications in different data environments have largely not
been examined.

While our motivation is in exploring methodology to be used for estimation

of particle size distributions over time, a more general framework can include any

situation in which a mixture representation exists at a point in time, but for which

further time series data are available. For example, in a disease mapping context

interest may be in both the mixture representation of the spatial surface and also

in any temporal changes to the mixture. In an image analysis context, interest may
be in changes to the composition of the image associated with an intervention or
response (e.g., environmental modelling, neurological examinations, etc.).

In this chapter, we briefly explore two different informative priors for estimation

of mixtures where the data are highly correlated, and all parameters in the mixture

are allowed to vary. Different datasets, with features similar to actual particle size

distribution data, are used to highlight the influence of using informative priors and

identify situations where placing informative priors may not be beneficial.

An outline of the chapter follows. In Section 7.2, we briefly describe particle
size distributions, and provide an illustration with actual data. In Section 7.3, we
outline the standard mixture model setup for a single time point and then outline
two approaches to estimation of a mixture model where we have more than one
time point. Section 7.4 presents results on the performance of the approaches on
several simulated datasets and actual data, and we conclude in Section 7.5 with
some discussion and possibilities for further work.



the concentration of particles in terms of their size is referred to as the particle size

distribution. Figure 7.1 shows an example of particle size distribution data for one

measurement or time period. Because aerosol particles are often charged, their size

can be determined from their electrical mobility (McMurry, 2000).






to form well distinguishable modal feature. For example, during background con-


(diameter < 2.5 µm) is bimodal: an Aitken mode (below 0.1 µm) and an accumu-



Figure 7.1: Estimated overall fit and components from RJMCMC for one time period. Concentration of particles (dN/dlog(Dp) [cm$^{-3}$]) by particle diameter (log(Dp (nm))). Legend: predicted fit; components.

with geometric mean diameter below 0.025 µm. However, in the urban atmosphere,


sources of aerosol particles and may show more than 3 lognormal modes. Typically
the number concentrations of aerosol particles in the urban background can be as
high as $5 \times 10^4$ cm$^{-3}$, and very close to a major road they often exceed $10^5$ cm$^{-3}$.

7.3 Methods

In this section, we briefly describe a mixture model, outline a two stage approach to

estimation of parameters over time, and describe three types of priors for temporal

evolution of the parameters.

7.3.1 Mixture representation

The density of data (y) at a given time period is represented by a finite mixture
model

$$p(y \mid \theta) = \sum_{j=1}^{k} \lambda_j f(y \mid \theta_j) \qquad (7.1)$$

where $k$ is the number of components in the mixture, $\lambda_j$ represents the probability
of membership of the $j$th component ($\sum_{j=1}^{k} \lambda_j = 1$), and $f(y \mid \theta_j)$ is the density
function of component $j$, which has parameters $\theta_j$.
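To illustrate Equation 7.1, the following is a minimal sketch that evaluates a finite mixture density with Gaussian components on the log-diameter scale. It assumes untruncated normal components, and the function names and parameter values are hypothetical, chosen only for illustration.

```python
import math


def normal_pdf(y, mu, sigma):
    """Density of a N(mu, sigma^2) distribution evaluated at y."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_pdf(y, weights, mus, sigmas):
    """Finite mixture density: sum over j of lambda_j * f(y | theta_j)."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("component weights must sum to 1")
    return sum(w * normal_pdf(y, m, s)
               for w, m, s in zip(weights, mus, sigmas))


# Illustrative three-component mixture on the log-diameter scale
# (values are assumptions made for this example only).
weights = [0.3, 0.5, 0.2]
mus = [1.5, 3.5, 5.0]
sigmas = [0.6, 0.6, 0.6]
density_at_3 = mixture_pdf(3.0, weights, mus, sigmas)
```

Because each component integrates to one and the weights sum to one, the mixture density also integrates to one, which gives a quick numerical check on any implementation.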

As component membership of the data is unknown, a computationally convenient

method of estimation for mixture models is to use a hidden allocation process and

introduce a latent indicator variable zi, which is used along the lines of a missing

variable approach to allocate observations yi to each component.
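The allocation step can be sketched as follows. Given current parameter values, each observation is assigned to component j with probability proportional to λj f(yi|θj). This is a minimal illustration that assumes untruncated normal components; the function names are hypothetical.

```python
import math
import random


def allocation_probs(y, weights, mus, sigmas):
    """P(z_i = j | y_i, parameters), proportional to lambda_j * N(y_i; mu_j, sigma_j^2).
    The common factor 1/sqrt(2*pi) cancels in the normalisation and is omitted."""
    dens = [w * math.exp(-0.5 * ((y - m) / s) ** 2) / s
            for w, m, s in zip(weights, mus, sigmas)]
    total = sum(dens)  # may underflow to 0 for observations far from every component
    return [d / total for d in dens]


def sample_allocations(ys, weights, mus, sigmas, rng):
    """Draw one latent indicator z_i for every observation y_i."""
    labels = list(range(len(weights)))
    return [rng.choices(labels, weights=allocation_probs(y, weights, mus, sigmas))[0]
            for y in ys]


rng = random.Random(0)
z = sample_allocations([1.4, 3.6, 5.1], [0.3, 0.5, 0.2],
                       [1.5, 3.5, 5.0], [0.5, 0.5, 0.5], rng)
```

In a Gibbs sampler this step alternates with updates of the component parameters conditional on the sampled allocations.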

In this chapter, we adopt the common assumption of fitting log-normal distri-

butions to aerosol particle size distribution data (Whitby and McMurry, 1997). As

PSD data are often measured with a definite lower and upper bound for the size of

the particles we introduce a slight modification and assume that the data follow a

truncated normal distribution. As is commonly assumed, we take the data (y) to

be the log of particle diameters (nm), and the parameters to be estimated (θj) for

each component are the mean (µ), variance ($\sigma^2$) and weight (λ). The number of
components k was also considered to be unknown. Priors for the first stage of the
analysis were:

$$p(\mu_j) \sim N(\xi, \kappa^{-1})$$






In the first stage of the temporal analysis, for each time period we implemented

Richardson & Green’s (1997) RJMCMC algorithm. Although this algorithm is easily

fit at a single time point, the use of RJMCMC for mixture models with temporal data

requires significant pre-processing with respect to mixing coverage and convergence,

as well as post-processing to provide adequate summary statistics and between time

component mapping. As an alternative, we considered a two-stage approach. In

the first stage, the number of components was estimated at each time point using

RJMCMC. In the second stage, we fixed the number of components (k) to the

maximum observed at any time period and independently estimated the parameters

of the mixture model (µ, σ, and λ) for each time period using a Gibbs sampler
algorithm. As we do not observe all of the components in every time period, we
allow component weights to be effectively zero (inf(λt) = 0.001) if required.

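The Gibbs sampler is iterated until the Markov chains for the parameters have
converged to stationary posterior distributions. For the second stage, priors were

$$p(\lambda) \sim \mathrm{Dirichlet}(\alpha_1, \ldots, \alpha_k)$$

$$p(\mu_j \mid \sigma_j^2) \sim N\left(\xi_j, \frac{\sigma_j^2}{n_j}\right) \qquad (7.2)$$

$$p(\sigma_j^2) \sim IG\left(\frac{v_j}{2}, \frac{s_j^2}{2}\right)$$

where $\alpha_j$, $\xi_j$, $n_j$, $v_j$ and $s_j$ are fixed hyperparameters.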
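Under the priors in Equation 7.2 and given the latent allocations, the conditional updates for λ, µ and σ² take standard conjugate forms. The sketch below shows one such sweep. It assumes an untruncated normal likelihood, so it does not include the modification for truncated data used in this thesis, and the function name and argument layout are hypothetical.

```python
import random


def update_parameters(ys, z, k, alpha, xi, n, v, s2, mu, rng):
    """One sweep of conjugate updates given allocations z (values 0..k-1).
    alpha, xi, n, v, s2 are the hyperparameters of Equation 7.2 (s2 holds s_j^2);
    mu holds the current component means."""
    counts = [0] * k
    sums = [0.0] * k
    for y, j in zip(ys, z):
        counts[j] += 1
        sums[j] += y

    # lambda | z ~ Dirichlet(alpha_j + m_j), drawn via normalised gamma variates.
    g = [rng.gammavariate(alpha[j] + counts[j], 1.0) for j in range(k)]
    g_total = sum(g)
    lam = [x / g_total for x in g]

    new_mu, new_sigma2 = [], []
    for j in range(k):
        # sigma_j^2 | mu_j, y, z ~ IG((v_j + m_j + 1)/2,
        #                            (s_j^2 + n_j (mu_j - xi_j)^2 + sum_i (y_i - mu_j)^2)/2)
        ss = sum((y - mu[j]) ** 2 for y, zj in zip(ys, z) if zj == j)
        shape = 0.5 * (v[j] + counts[j] + 1)
        rate = 0.5 * (s2[j] + n[j] * (mu[j] - xi[j]) ** 2 + ss)
        sigma2 = rate / rng.gammavariate(shape, 1.0)  # inverse-gamma draw

        # mu_j | sigma_j^2, y, z ~ N((n_j xi_j + sum_i y_i)/(n_j + m_j),
        #                           sigma_j^2/(n_j + m_j))
        post_mean = (n[j] * xi[j] + sums[j]) / (n[j] + counts[j])
        post_sd = (sigma2 / (n[j] + counts[j])) ** 0.5
        new_mu.append(rng.gauss(post_mean, post_sd))
        new_sigma2.append(sigma2)
    return lam, new_mu, new_sigma2
```

Alternating this sweep with a draw of the allocations gives a basic Gibbs sampler for a fixed number of components. The hyperparameter $n_j$ enters as a prior sample size for the mean of component j.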


For the independent prior case, we use uninformative priors for µ, σ and λ. Priors

for the dependent prior are discussed below.

7.3.2 Choice of temporal prior

In the second stage, three priors were considered for linking parameter values (µ, σ, λ)
over time. The first of these was the independent prior, in which the correlated
nature of the data was ignored completely and parameters were independently
estimated at each time point. The second and third were termed the ‘informed prior’
and ‘penalised prior’, as described below.

Informed Prior

In this approach we use the information provided from previous and future time

periods as prior information for the current period. For the main results we focus on

a simple case where posterior estimates from the previous period are used as prior

information for the current period. We do this to illustrate the influence of a simple

prior specification on the posterior estimates of parameters (θ).

In the case of a mixture model using Gaussian distributions, we have three pa-

rameters (µ, σ and λ) for which we could utilise available prior information to aid in

estimation. Preliminary investigation indicates that all three parameters are likely to

show strong evidence of autocorrelation, so here we examine the effect of smoothing

on each of these parameters.

For $p(\lambda)$, we allow $\alpha_j$ in Equation 7.2 to vary and reflect prior information about
$\lambda_{t-1,j}$. Thus, we set $\alpha_j = \theta_j m_{t-1,j}$ where $m_{t-1,j}$ is the mean of the number of
observations allocated to component $j$ in the previous time period, and $\theta_j$ is fixed
at some value. An alternative is to impose a distribution on θ, say $\theta_j \sim U(0, 1)$ (or
$N(1, 0.5)$), but we do not present the results for this approach in this chapter.
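The effect of this choice can be seen from the conditional posterior mean of the weights. Given current allocation counts, a Dirichlet prior with parameters αj yields a Dirichlet posterior whose mean blends the previous and current counts, with θ controlling the blend. The sketch below uses hypothetical counts and function names, for illustration only.

```python
def informed_alpha(theta, prev_counts):
    """Dirichlet parameters alpha_j = theta * m_{t-1,j}.
    Each alpha_j must be positive, so components with a zero previous count
    would need separate handling (not shown here)."""
    return [theta * m for m in prev_counts]


def posterior_mean_weights(alpha, counts):
    """Mean of the Dirichlet(alpha_j + m_{t,j}) conditional posterior for lambda_t."""
    total = sum(alpha) + sum(counts)
    return [(a + m) / total for a, m in zip(alpha, counts)]


prev_counts = [100, 600, 300]   # hypothetical allocations at time t-1
counts = [400, 400, 200]        # hypothetical allocations at time t
alpha = informed_alpha(0.5, prev_counts)
weights = posterior_mean_weights(alpha, counts)
```

With these illustrative counts the current-period proportions are (0.4, 0.4, 0.2), the previous-period proportions are (0.1, 0.6, 0.3), and θ = 0.5 gives posterior mean weights of approximately (0.30, 0.47, 0.23). Larger values of θ pull the estimate further towards the previous period.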

For the specification of prior information for µ and σ, we set $\xi_{jt} = \mu_{j,t-1}$,
$v_j = n_j/\sigma^2_{j,t-1}$ and $s_j = n_j$, and increase the value of $n_j$ from the value set for
the independent case to reflect the degree of dependency for these parameters from
the previous period.
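To make the role of $n_j$ concrete, the following sketch gives the conditional posterior mean of the component mean implied by the normal prior in Equation 7.2 when the prior mean is set to the previous period's estimate. It assumes an untruncated normal likelihood, and the symbols $m_{t,j}$ (the number of observations currently allocated to component $j$ at time $t$) and $\bar{y}_{t,j}$ (their mean) are introduced here only for illustration.

```latex
\[
E\left[\mu_{jt} \mid \sigma^2_{jt}, y_t, z_t\right]
  = \frac{n_j\,\mu_{j,t-1} + m_{t,j}\,\bar{y}_{t,j}}{n_j + m_{t,j}}
\]
```

Under these assumptions $n_j$ acts as a prior sample size. For example, with $n_j = 25$ and $m_{t,j} = 75$ the previous period's mean receives a weight of 0.25 and the current data a weight of 0.75.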


Penalised Prior

An alternative to using the informed prior described in Section 7.3.2 is to use a re-

parameterisation of the prior to reflect the degree of dependency between parameters.

Gustafson and Walker (2003) proposed a prior for λ that downweights large changes

in probabilities in successive periods. Thus, at time period $t$ ($t = 2, \ldots, T$), $p(\lambda_t \mid z_t)$
is defined as

$$p(\lambda_t \mid z_t) \sim \mathrm{Dirichlet}(1, \ldots, 1)\, \exp\left(-\frac{\sum_{t=2}^{T}\sum_{j=1}^{J}(\lambda_{t,j} - \lambda_{t-1,j})^2}{\phi}\right) \qquad (7.3)$$

where smaller values of φ imply greater smoothing. The above formulation can
naturally be extended to dependence on more than a single time period.
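As a small illustration of how the exponential term in Equation 7.3 behaves, the sketch below evaluates its logarithm for two hypothetical weight trajectories. The function name and the trajectories are assumptions made for this example.

```python
def log_penalty(lams, phi):
    """Log of the exponential term in Equation 7.3:
    -(1/phi) * sum over t of sum over j of (lambda_{t,j} - lambda_{t-1,j})^2.
    `lams` is a list over time of weight vectors."""
    total = 0.0
    for prev, cur in zip(lams, lams[1:]):
        total += sum((c - p) ** 2 for p, c in zip(prev, cur))
    return -total / phi


# Hypothetical two-component weight trajectories over three periods.
smooth = [[0.20, 0.80], [0.25, 0.75], [0.30, 0.70]]
volatile = [[0.20, 0.80], [0.60, 0.40], [0.20, 0.80]]

log_pen_smooth = log_penalty(smooth, phi=0.05)
log_pen_volatile = log_penalty(volatile, phi=0.05)
```

With φ = 0.05 the smooth trajectory has a log penalty of about −0.2 and the volatile trajectory about −12.8, so the volatile path is downweighted far more heavily. Halving φ doubles both values in magnitude, consistent with smaller φ implying greater smoothing.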

A potential advantage of using both information about estimates forwards and

backwards in time is the additional information this may provide to guide parameter

estimates in the current period. This may be an advantage at times when large
changes in the parameter estimates occur in isolated time periods.
For comparative purposes, in Section 7.4.1 we compare the results of using a similar
formulation for λ in the informed prior approach.

To sample from the posterior distribution of λ, we use a rejection sampling

approach proposed by Gustafson and Walker (2003).
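The general idea of a rejection step for λ under the penalised prior can be sketched as follows. This is a generic rejection sampler written under stated assumptions, not necessarily the algorithm of Gustafson and Walker (2003). Candidates are drawn from the Dirichlet distribution that would be the conditional posterior under the Dirichlet(1, ..., 1) prior alone, and each candidate is accepted with probability equal to the exponential penalty term involving the neighbouring periods, which never exceeds one. All function names are hypothetical.

```python
import math
import random


def dirichlet(alpha, rng):
    """Draw from a Dirichlet distribution via normalised gamma variates."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    s = sum(g)
    return [x / s for x in g]


def sample_weights_penalised(counts, lam_prev, lam_next, phi, rng, max_tries=100000):
    """Rejection sampler for lambda_t given allocation counts and neighbouring weights.
    Candidate: Dirichlet(1 + m_{t,j}). Acceptance probability: exp(-penalty / phi).
    Returns the accepted draw and the number of candidates tried."""
    alpha = [1.0 + m for m in counts]
    for tries in range(1, max_tries + 1):
        cand = dirichlet(alpha, rng)
        penalty = sum((c - p) ** 2 for c, p in zip(cand, lam_prev))
        if lam_next is not None:  # the last period has no successor
            penalty += sum((q - c) ** 2 for c, q in zip(cand, lam_next))
        if rng.random() < math.exp(-penalty / phi):
            return cand, tries
    raise RuntimeError("no candidate accepted within max_tries")


rng = random.Random(1)
draw, tries = sample_weights_penalised([30, 70], [0.3, 0.7], [0.3, 0.7], 0.05, rng)
```

When the counts are large the candidate distribution is tightly concentrated around the current-period proportions. If the neighbouring weights are some distance from those proportions, the acceptance probability becomes very small, which is one way to see why such a sampler can be slow for large samples.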

Prior distributions p(µ) and p(σ) are set as for the independent approach (Equa-

tion 7.2).

7.4 Results

In this section we present and assess the results using simulated data and then

present the results of applying the approaches to particle size distribution data from

Hyytiala, Finland. We use the simulated data to test the impact of the different

prior representations and the degree of smoothing. We first use an informative and

penalised prior only on the weights (λ), and then assess the influence of using an

informative prior on µ and σ.


7.4.1 Simulated Data

Data Setup

We simulated datasets indicative of the type of behaviour of aerosol particle size

distribution data observed at Hyytiala, a boreal forest site in Southern Finland

(SMEAR II) (Vesala et al., 1998). A particular feature of this particle size distribu-

tion data is both a growth in the mean and weight for a component.

We simulated data from three different cases. In the first case (D1), we simulated

data with parameters which can be characterised as having medium correlation

across time. In this case the mixture is well identified and interest is in whether the

results from the informed and penalised approaches are largely the same as for the

independent approach.

In practice it is quite common to observe sudden large changes in the number

of particles measured which may persist for a number of time periods. This is more

often observed when there are relatively few particles for a particular size group,

and more so for the smaller sized particles. Thus, for the second data set (D2) we

simulate data for the first component where the weight at smaller values is quite

volatile. For this dataset the mixture is also well identified.

For the third data set (D3), we simulated data which are highly correlated across

time, a feature of particle size distribution data observed in practice for most time

periods where measurements are commonly taken at small time intervals. This

dataset was also simulated with parameter estimates where at times the mixture is

not well identified. Of interest in this setting is to see the effect of using either the

informed prior or penalised prior approach.

For the results to follow, except as specified otherwise, for the independent,

informed prior and penalised prior approaches, we set the hyperparameters to be:

ξ = (1.5, 3.5, 5.0); $s^2_{t,j} = 10$; $v_{t,j} = 10/0.6^2$; and $n_j = 2$.

Smoothing on λ

Figure 7.3 shows the results of the informed prior, penalised prior and independent

approaches compared to the actual dataset D1. As expected for this well defined


dataset, the results for the informed and penalised prior approaches appear to largely

follow the results obtained from the independent approach.

Figure 7.3: Plot of estimated parameters (µ (top panels), σ (middle panels) and λ (bottom panels)) for approaches using simulated data (D1): Simulated data (Black); Independent (Red); Informed Prior (Green); Penalised Prior (Blue). Panels are shown for Components 1, 2 and 3 over time periods 0 to 100.

Figure 7.4 shows the results of analysis using dataset D2, using the informed

prior approach with θ = 0.5 and penalised prior approach with φ = 0.08 and the

independent prior approach. The values for θ and φ were chosen to allow prior infor-

mation to be influential, but not overwhelm the posterior estimates. Interestingly

it is evident that smoothing on the weights results in compensatory measures by

both µ and σ. The compensatory measures appear to be more pronounced when

λ is volatile over time. In this case, the prior is imposing larger adjustments away


from the data at each time point. We see this most clearly in the results for the

first two components, but not for the third component. A possible explanation for

this observation is that for the first component, µ is able to adjust to a higher value

which is supportive of a greater value for λ, and in some sense borrow support from

the second component. However, for the third component, µ is not able to increase

or decrease in support of a lower value for λ by borrowing support from a nearby

component.

Figure 7.4: Plot of estimated parameters (µ (top panels), σ (middle panels) and λ (bottom panels)) for approaches using simulated data (D2): Simulated data (Black); Independent (Red); Informed Prior (Green); Penalised Prior (Blue). Panels are shown for Components 1, 2 and 3 over time periods 0 to 100.

As shown in Figure 7.5 (black line), for the third data set (D3) we simulated data

for the first component with a mean value increasing from 1.5 to 3.0, and weight


increasing from 0.1 to 0.6 and then decreasing to 0.3, over time. Often a consequence

of the growth in the first component is a decline in size and weight for the larger

sized particles, and this is reflected in the weight for the second component following

an opposite pattern to the first component. For the third component, the weight

increases from 0.1 to 0.3 over time. The parameters µ and λ are simulated with

some noise around the parameter values, and the sample size is 1000.
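A dataset with these broad features could be generated along the following lines. This is a sketch only: the piecewise-linear trajectories, the timing of the peak in the first weight, the noise levels, the fixed means for components 2 and 3 and the common standard deviation are all assumptions made for illustration, and are not the settings used to generate D3.

```python
import random


def simulate_d3_like(T=100, n=1000, seed=1):
    """Simulate T periods of n log-diameters from a three-component normal mixture.
    Component 1: mean rises from 1.5 to 3.0, weight rises from 0.1 to 0.6 and then
    falls to 0.3. Component 3: weight rises from 0.1 to 0.3. Component 2 takes the
    remaining weight. Requires T >= 2."""
    rng = random.Random(seed)
    sigmas = (0.5, 0.5, 0.5)                      # assumed common spread
    series = []
    for t in range(T):
        u = t / (T - 1)                           # position in the day, 0 to 1
        mu1 = 1.5 + 1.5 * u + rng.gauss(0.0, 0.05)
        if u <= 0.6:                              # assumed peak at 60% of the day
            w1 = 0.1 + 0.5 * u / 0.6
        else:
            w1 = 0.6 - 0.3 * (u - 0.6) / 0.4
        # Add noise, then clamp so that the second weight stays positive.
        w1 = min(max(w1 + rng.gauss(0.0, 0.02), 0.01), 0.65)
        w3 = 0.1 + 0.2 * u
        w2 = 1.0 - w1 - w3
        weights = (w1, w2, w3)
        mus = (mu1, 3.5, 5.0)                     # assumed means for components 2 and 3
        labels = rng.choices(range(3), weights=weights, k=n)
        ys = [rng.gauss(mus[j], sigmas[j]) for j in labels]
        series.append({"weights": weights, "mus": mus, "y": ys})
    return series


data = simulate_d3_like(T=20, n=200)
```

Keeping the true weights and means alongside the simulated observations makes it straightforward to compare posterior estimates with the values used for simulation at each time point.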

Figure 7.5: Plot of estimated parameters (µ (top panels), σ (middle panels) and λ (bottom panels)) for approaches using simulated data (D3): Simulated data (Black); Independent (Red). Panels are shown for Components 1, 2 and 3 over time periods 0 to 100.

Figure 7.5 also shows the results of using the independent approach. We see that

at times the parameter estimates for the independent approach deviate away from

the actual data.


Figures 7.6 and 7.7 show the results for the informed prior and penalised prior

compared to the actual data, respectively. In Figure 7.6, the results show the effect

of varying the degree of smoothing on λ for the informed prior using θ=(0.1,0.5,1.3).

For the results of the penalised prior, we vary the degree of smoothing on λ using

φ=(0.04,0.08,0.12).

Figure 7.6: Plot of estimated parameters (µ (top panels), σ (middle panels) and λ (bottom panels)) for Informed Prior approach using simulated data (D3): Simulated data (Black); θ=0.1 (Green); θ=0.8 (Blue); θ=1.3 (Purple). Panels are shown for Components 1, 2 and 3 over time periods 0 to 100.

In Figure 7.6, we can see that the parameter estimates for λ for all three values

of θ appear to closely follow the actual data, with the closest estimates to the actual

data being for θ = 0.5 and 1.3. As we are only using an informed prior on the

weights the parameter estimates for µ and σ appear to be quite variable over time

Figure 7.7: Plot of estimated parameters (µ (top panels), σ (middle panels) and λ (bottom panels)) for Penalised Prior approach using simulated data (D3): Simulated data (Black); φ=0.04 (Brown); φ=0.08 (Light Blue); φ=0.12 (Dark Green). Panels are shown for Components 1, 2 and 3 over time periods 0 to 100.

compared to the actual data. However, the variability appears to be slightly less

for these variables than for the independent approach (Figure 7.5) and closer to the

actual data over time. Of interest is the closeness of the parameter estimates of µ

and σ for components 1 and 2 which more clearly follow the true growth occurring in

component 1 and the stability over time for component 2 compared to that observed

for the independent approach.

In Figure 7.7, the parameter estimates for the penalised prior approach appear
to deviate slightly from the actual data for components 1 and 2. For the third
component, the parameter estimates for the penalised prior approach follow the
actual data with some noise. Overall, the results from the penalised prior approach
are similar to those from the independent approach but with less variability over time.

Smoothing on µ and σ

We turn now to an assessment of the impact of using an informative prior for µ

or σ over time. We present results for the highly correlated data set, since this is

the most sensitive of the simulated data as discussed above. Here we set $n_j = 25$,
$\xi_{jt} = \mu_{j,t-1}$, $v_j = 200/\sigma^2_{j,t-1}$ and $s_j = 200$.

In Figure 7.8, the parameter estimates for the informative prior for µ appear to

more closely follow the actual data than using an informative prior for σ. Although

the parameter estimates for both approaches appear to be further away from the

actual data than using an informative prior for λ, they do appear to be closer than

under the independent approach.

Figure 7.9 shows the results of using an informative prior on both µ and λ. In

this example, the results are similar to using an informative prior only on λ. Thus

depending on the objectives of the analysis, using an informative prior on both

parameters may not be needed.

In general, from the results of smoothing on µ, σ and λ, it appears that large
adjustments to one parameter (e.g., from volatility in some time periods) are not
supported unless compensatory measures can be taken by the other parameters.

In analyses not shown here, we compared the results from using a three-period
centred moving average on the weight for the informed prior to the results of using the

Figure 7.8: Plot of estimated parameters (µ (top panels), σ (middle panels) and λ (bottom panels)) for Informed Prior approach using simulated data (D3): Simulated data (Black); Smoothing on µ (Orange); Smoothing on σ (Dark Green). Panels are shown for Components 1, 2 and 3 over time periods 0 to 100.

Figure 7.9: Plot of estimated parameters (µ (top panels), σ (middle panels) and λ (bottom panels)) for approaches using simulated data (D3): Simulated data (Black); Independent (Red); Smoothing on µ and λ (Green). Panels are shown for Components 1, 2 and 3 over time periods 0 to 100.

penalised prior to assess whether the form specified by the penalised prior had a

different influence on the results. We found the results from using both approaches

to be largely the same.
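For reference, a three-period centred moving average of a weight series can be computed as in the sketch below. The treatment of the end points, where fewer than three periods are available, is an assumption, and how the smoothed values entered the informed prior is not specified in the text. The function name and series are hypothetical.

```python
def centred_moving_average(series, window=3):
    """Centred moving average with an odd window length.
    At the ends of the series the average is taken over the available
    neighbours only (an assumption made for this sketch)."""
    half = window // 2
    out = []
    for t in range(len(series)):
        lo = max(0, t - half)
        hi = min(len(series), t + half + 1)
        out.append(sum(series[lo:hi]) / (hi - lo))
    return out


# Hypothetical weight estimates for one component over four periods.
smoothed = centred_moving_average([0.2, 0.6, 0.2, 0.3])
```

For the illustrative series above, the isolated jump to 0.6 in the second period is reduced to about 0.33 after smoothing.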

7.4.2 Case study

The data set studied here was taken from a measurement site at Hyytiala, Finland; a

plot of the measurements for the day selected is shown in Figure 7.2. This particular

day was selected as it shows a new particle formation event occurring, whereby a
new mode of aerosol particles appears with a significant influx of particles (as high as
$10^6$ cm$^{-3}$) with a geometric mean diameter below 10 nm, growing later into the Aitken
(25-90 nm) or accumulation (100+ nm) modes. In terms of a temporal mixture model

setting, we will be able to assess the performance of the three prior specifications

outlined previously as new components are introduced and both a growth in the

mean and weight for those components are observed.

As outlined in Section 7.3, the first stage of our approach is to apply RJMCMC
to each time period. These results are then used to guide the choice of the number
of components and initial parameter estimates for the second stage analysis, in
which temporally correlated priors are used to model the evolution of the mixture
parameters over time. Figure 7.10 shows the results of the first stage of the algorithm,

with a plot of the posterior mean estimates for µjt at each time point t, with the

size of the circles indicating the corresponding weight λjt. The average number

of components estimated with the highest probability over the day was four; the

minimum number of components was one, and the maximum number of components

was five.

For the second stage, we fixed the number of components to be five. For the
independent approach, we set the hyperparameters to be: ξ = (1.5, 2.2, 3.0, 3.8, 4.2, 5.1);
$s^2_{t,j} = 2.0225$; $v_{t,j} = 0.092025$; and $n_j = 200$. Figure 7.11 shows the results of
estimation using the independent approach.

From the previous results of applying the three prior specifications to simulated data
(Section 7.4.1), we generally found closer parameter estimates to the actual data
over time using an informed prior on µ or λ. For data that are quite noisy (D2), we

Figure 7.10: Plot of posterior mean estimates for µj from RJMCMC algorithm for one day (Hyytiala). Stage 1 of analysis for temporal evolution of parameters. Larger circles indicate greater weight for that component. Horizontal axis: time; vertical axis: posterior mean estimates for µ.

Figure 7.11: Plot of estimated parameters (µ (top panel) and λ (bottom panel)) under an independent prior over time. Stage 2 of analysis for temporal evolution of parameters.

also observed that the informative and penalised prior specifications can cause large

adjustments to other parameters. Thus, caution must be exercised when applying

the approaches to data of this type.

Figure 7.12 shows the results of estimation using the informed prior with smooth-

ing on all of the weights and only the mean for component 3. Comparing the results

to the independent approach in Figure 7.11, the parameter estimates for the informed

prior appear to show smoothly growing estimates for µ over time for components 1

and 2, and smoother parameter estimates for λ.

Figure 7.12: Plot of estimated parameters (µ (top panel) and λ (bottom panel)) under an informed prior over time. Stage 2 of analysis for temporal evolution of parameters. Informed prior specified for λ in all components and µ3.

In analyses not shown here, we also considered the effect of using an informative

prior on other parameters and found the results to vary to a small degree. Our

choice of an informative prior on µ3 and λ was guided partly by interest in
λ and partly by a prior belief that µ for the larger sized particles, which in this case form the
background concentration of particles, is highly correlated over time. We were
also guided in our choice of parameters by the variability of the parameter estimates
from the first stage of the analysis.

7.5 Discussion

In this chapter, we explored the problem of estimating Bayesian mixture models

at multiple time points. Under different situations, approaches that employ infor-

mation about neighbouring time points compared favourably to results based on

an independent approach. By including additional temporal information about pa-

rameters for correlated time periods we may be able to better identify individual

components at each time point. As an aid for inference, we may also be able to ob-

tain smoother parameter estimates over time and from this be able to more clearly

establish patterns or identify anomalies from the data.

The results highlight a number of observations about mixture representations at

multiple time points. First, analysis of the evolution of parameters of a mixture over

multiple time points highlights the large degree of dependency that exists between

component parameters. Changes to a parameter in one component may flow on to

the parameter in a nearby component. Depending on the context of the study, we

can anticipate this dependency to be more readily apparent for the weight parameter

but we found similar dependencies to exist for other parameters. The second is the

need to be mindful that the same parameter in one component may have a different

correlation structure over time to the same parameter in another component. In

the context of particle size distribution data, we often observed greater volatility

in estimates for the smaller particles compared to the larger sized particles and

so at times the correlation structure of the parameters between these respective

components appeared to be quite different.

A possible effect of using informative priors in this context is to impose a prior
not supported by the data, or to impose a temporal correlation structure where no
such structure exists, and thereby cause unnecessary adjustments to other
parameters. We observed this most clearly in the results from the simulated data,
where at times the data were quite noisy. For this dataset, using an informative
prior for a parameter which supported large adjustments away from the actual data
resulted in large compensatory adjustments in other parameters, not only within the
same component but also in neighbouring components. The easy solution may be to use
an appropriate correlation structure for components, but of course this may not
always be known a priori.

A further consequence of the dependency that can exist between the parameters of
different components, and between parameters within a component, is that the
inclusion of correlation information to aid the identifiability of the mixture may
not be required for all parameters or for all components. In the context of a
mixture with a small number of components, we may only need to provide more
information about one parameter for an influential component in order to separate
out the influence of competing components. This result is also valuable if the
correlation structure for one parameter, or for the parameters of one component, is
more readily known. In the context of a mixture of Gaussians, we generally found
that an informative prior was only needed on µ or λ, or possibly both. This result
could well be context specific and influenced by any reliance on the means for
defining (in terms of size) and ordering the components. The choice of parameter on
which to place more information may also be guided by whether it is a parameter of
interest for inference, as demonstrated in the analysis of the case study, where
most interest was in the behaviour of λ over time. In this case, and in general, one
must be careful in the analysis of just one parameter, as it can largely be a
conditional analysis in view of the behaviour of other possibly cross-correlated
parameters within the same component and between components.

While many of the above difficulties may seem to be avoided if smoothing ap-

proaches are applied retrospectively on parameter estimates from an independent

mixture model, this type of analysis may largely ignore the true mapping of compo-

nents or path of parameters over time. From the results of the simulated data, the

large degree of dependency that we observe between the parameters of a mixture

over time suggests that including temporal information to better identify one of the

parameters at a single time point can flow on to affect other parameters. This could

change inference about both the mixture representation at a point in time, and also

the behaviour of mixture parameters over time.

In general, one of the potential difficulties in using an informative prior approach

to smooth parameter estimates over time is the variable degree of influence the prior

may have in the posterior. If the primary objective is to obtain smoothed parameter

estimates over time, larger sample sizes and noisiness of the data at times may

warrant increasingly restrictive priors. In such cases where the objective might be

to downplay the influence of the data, a number of alternative approaches to increase

the influence of prior information can be used (Ibrahim et al., 2003). In all cases, it is

valuable to undertake a sensitivity analysis in order to assess the effect of the prior.

Such an analysis should include the independent prior as a baseline comparison.

Alternative approaches which are less sensitive to the form in which prior infor-

mation is given in the model, and/or include covariate information could also be

used to aid in estimation.

For estimation of aerosol particle size distributions, the dynamics of the aerosol

process and the complexity of the influences on particle concentration and size,

demand the use of approaches which utilise as much information from the data as

possible. To this end, the inclusion of temporal information may be helpful.

Chapter 8

Bayesian hierarchical modelling for a time series of

mixtures

In this chapter, we address some of the issues raised in Chapter 7, and explore a
hierarchical approach to estimation of mixture parameters over time in which an
informative prior is placed at two different levels. Simulated and actual data are
used to assess the performance of the approach. As this chapter is designed to be
read independently of the previous chapters, Section 8.2 describing PSD data and the
first part of Section 8.3 outlining mixture models are largely repeated from
Chapter 7.

8.1 Introduction

In Chapter 7, we explored the problem of estimating a Bayesian mixture model at
multiple time points using an informative or penalised prior which carried
information about the correlation of parameters for neighbouring time points. The
analysis, in general, highlighted a number of observations about mixture
representations at multiple time points. First, analysis of the evolution of
parameters of a mixture over multiple time points highlighted the large degree of
dependency that exists between component parameters. For example, in the case of a
mixture of Gaussians, large changes to the weight parameter over time for one
component were reflected not only in adjustments to the weights of other components
but also in adjustments to the mean parameter of the associated component and
neighbouring components.

The second is the observation that a parameter in one component may have a dif-

ferent correlation structure over time to the same parameter in another component.

In the context of PSD data, we often observe greater volatility in concentration lev-

els for the smaller sized particles compared to the larger sized particles and we can

expect this to be reflected in the corresponding parameter estimates for a mixture

model over time.

In light of the above observations, a possible effect of using informative priors

in this context is to impose a prior not supported by the data or to impose a

temporal correlation structure where such a structure does not exist, and thereby

cause unnecessary adjustments to other parameters.

In this chapter, we explore the problem of estimating a Bayesian mixture model
at multiple time points using a hierarchical model for the parameters in which
an informative prior is placed at two different levels. The aim of exploring this
approach is to address some of the issues raised in the previous chapter and to
develop an alternative approach which is less sensitive to the form in which prior
information is given in the model.

An outline of the chapter follows. In Section 8.2, we briefly describe particle

size distributions, and provide an illustration with actual data. In Section 8.3, we

outline the standard mixture model setup for a single time point and a two stage

approach for estimation of a mixture model at multiple time points. For estimation

of a mixture model at multiple time points we introduce a hierarchical approach

to estimation. Section 8.4 presents results on the performance of the approach on

several simulated datasets and actual data, and we conclude in Section 8.5 with

some discussion and possibilities for further work.



the concentration of particles in terms of their size is referred to as the particle size

distribution. Figure 8.1 shows an example of particle size distribution data for one

measurement or time period. Because aerosol particles are often charged, their size

can be determined from their electrical mobility (McMurry, 2000).

Figure 8.1: Histogram of data sampled from Hyytiala, Finland for a single time period. Vertical axis: density.






to form well distinguished modal features. For example, during background con-


(diameter < 2.5 µm) is bimodal: an Aitken mode (below 0.1 µm) and an accumu-



with geometric mean diameter below 0.025 µm. However, in the urban atmosphere,


sources of aerosol particles and may show more modes. Typically the number
concentrations of aerosol particles in the urban background can be as high as
$5 \times 10^{4}$ cm$^{-3}$ and very close to a major road they often exceed
$10^{5}$ cm$^{-3}$.

8.3 Mixture models

In this section, we briefly describe a mixture model, outline the independent and

informed prior representations, and outline a hierarchical approach to estimation of

parameters.

The density of data ($y$) at a given time period is represented by a finite mixture
model

$$p(y \mid \theta) = \sum_{j=1}^{k} \lambda_j f(y \mid \theta_j) \qquad (8.1)$$

where $k$ is the number of components in the mixture, $\lambda_j$ represents the
probability of membership to the $j$th component ($\sum_{j=1}^{k} \lambda_j = 1$),
and $f(y \mid \theta_j)$ is the density function of component $j$ which has
parameters $\theta_j$.

As component membership of the data is unknown, a computationally convenient

method of estimation for mixture models is to use a hidden allocation process and

introduce a latent indicator variable zij, which is used along the lines of a missing

variable approach to allocate observations yi to each component.
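The allocation step described above can be sketched as follows. This is a minimal illustration only, assuming normal component densities on the log scale; the function names (`normal_pdf`, `allocation_probs`, `sample_allocation`) and the parameter values are hypothetical and are not taken from the implementation used in this thesis.

```python
import math
import random


def normal_pdf(y, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at y."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def allocation_probs(y, weights, mus, sigmas):
    """P(z_ij = 1 | y_i, theta) for each component j, by Bayes' rule."""
    joint = [w * normal_pdf(y, m, s) for w, m, s in zip(weights, mus, sigmas)]
    total = sum(joint)
    return [p / total for p in joint]


def sample_allocation(y, weights, mus, sigmas, rng=random):
    """Draw a component label for y from its allocation probabilities."""
    probs = allocation_probs(y, weights, mus, sigmas)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical three-component mixture on the log-diameter scale.
weights = [0.2, 0.4, 0.4]
mus = [1.5, 3.5, 5.0]
sigmas = [0.55, 0.55, 0.55]

probs = allocation_probs(1.6, weights, mus, sigmas)
label = sample_allocation(1.6, weights, mus, sigmas, rng=random.Random(1))
```

Within a Gibbs sampler, a draw of this kind would be made for every observation at each iteration, after which the component parameters are updated conditional on the allocations.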

In this chapter, we adopt the common assumption of fitting log-normal distri-

butions to aerosol particle size distribution data (Whitby and McMurry, 1997). As

PSD data are often measured with a definite lower and upper bound for the size of

the particles we introduce a slight modification and assume that the (log) data follow

a truncated normal distribution. Thus, we take the data (y) to be the log of particle

diameters (nm), and the parameters to be estimated (θj) for each component are

the mean (µ), variance (σ²) and weight (λ). The number of components k is also

considered to be unknown.
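As an illustration of the truncated normal assumption, the sketch below evaluates the mixture density (8.1) with truncated normal components and checks numerically that it integrates to one over the measurement range. The bounds, weights, means and standard deviations are hypothetical values chosen for the example, and the function names are assumptions rather than part of the thesis code.

```python
import math


def std_normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def truncated_normal_pdf(y, mu, sigma, lower, upper):
    """Density at y of N(mu, sigma^2) truncated to [lower, upper]."""
    if y < lower or y > upper:
        return 0.0
    z = (y - mu) / sigma
    density = math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
    # Probability mass of the untruncated normal inside the bounds.
    mass = std_normal_cdf((upper - mu) / sigma) - std_normal_cdf((lower - mu) / sigma)
    return density / mass


def mixture_pdf(y, weights, mus, sigmas, lower, upper):
    """Finite mixture density with truncated normal components."""
    return sum(
        w * truncated_normal_pdf(y, m, s, lower, upper)
        for w, m, s in zip(weights, mus, sigmas)
    )


# Hypothetical values on the log-diameter scale (not taken from the thesis).
weights = [0.2, 0.3, 0.5]
mus = [1.5, 3.5, 5.0]
sigmas = [0.55, 0.55, 0.55]
lower, upper = 1.0, 6.5

# Midpoint-rule check that the mixture density integrates to about one.
steps = 20000
width = (upper - lower) / steps
area = width * sum(
    mixture_pdf(lower + (i + 0.5) * width, weights, mus, sigmas, lower, upper)
    for i in range(steps)
)
```

The renormalisation by `mass` matters most for components whose mean lies close to a measurement bound, where a sizeable share of the untruncated density would otherwise fall outside the observable range.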

For the independent, informative prior and hierarchical approaches (except where
stated otherwise), priors were:

$$p(\mu_j) \sim N(\xi, \kappa^{-1})$$






In the first stage of the temporal analysis, for each time period we implemented
Richardson & Green's (1997) RJMCMC algorithm to estimate both $\theta_t$ and $k_t$
($t = 1, \ldots, T$). Although this algorithm is easily fit at a single time point,
the use of RJMCMC for mixture models with temporal data, where both θ and k may vary
at each time point, requires significant pre-processing with respect to mixing
coverage and convergence, as well as post-processing to provide adequate summary
statistics and between-time component mapping. As an alternative, we consider a
two-stage approach. In the first stage, the number of components is estimated at
each time point using RJMCMC. In the second stage, we fix the number of components
($k$) to the maximum observed at any time period and then independently estimate the
parameters $\theta_j$ ($j = 1, \ldots, k$) for each time period using a Gibbs
sampler algorithm. As we do not observe all of the components in every time period,
we allow component weights to be 'effectively zero' ($\inf(\lambda_t) = 0.001$) if
required. The Gibbs sampler is iterated until the Markov chains for the parameters
have converged to stationary posterior distributions.

In the second stage, for estimation of parameters of a mixture at multiple time
points, independent estimation of θ at each time period does not allow for any
information about θ to be shared over time. An alternative to this independent
approach is to use an informative prior where information provided from previous
and future time periods is used as prior information for the current period. In
Chapter 7, we explored the use of this type of approach in some detail. There, the
prior was imposed directly on elements of θ and strong sensitivity of the posterior
estimates to the specification of these priors was observed. In this chapter, we
explore an alternative representation as described in Section 8.3.1 and present the
results from the previous approach as a comparison to the results from the new
hierarchical approach.

We focus, in particular, on a simple case where posterior estimates from the
previous period are used as prior information in the current period. As the weight
parameter in a mixture is often of interest in analysing PSD data, we present the
results from using an informative prior for this parameter. Prior information for λ
can be specified by allowing $\delta_{t,j}$ in the Dirichlet prior at time $t$ to
depend on $\lambda_{t-1,j}$.

For the results to follow for the informative prior approach, we specify that
$\delta_{t,j} = \theta_j m_{t-1,j}$, where $m_{t-1,j}$ is the mean number of
observations allocated to component $j$ in the previous time period. The parameter
$\theta_j$ (a scalar here, distinct from the component parameters in (8.1)) reflects
how strongly the information from the previous time period is used as prior
information for the current period. In this chapter, we choose to fix $\theta = 0.5$;
alternatively we could estimate this parameter but we do not pursue this approach
here.
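To make the role of the strength parameter concrete, the sketch below builds the Dirichlet prior parameters from the previous period's mean allocation counts and combines them with the current allocation counts using standard Dirichlet-multinomial conjugacy. The counts are invented for illustration, the function names are hypothetical, and the sketch assumes all previous mean counts are positive; it is not the code used for the results in this chapter.

```python
def informed_dirichlet_params(prev_mean_counts, strength=0.5):
    """delta_{t,j} = strength * m_{t-1,j} for each component j."""
    return [strength * m for m in prev_mean_counts]


def posterior_mean_weights(delta, counts):
    """Posterior mean of the weights under a Dirichlet(delta) prior,
    given the number of observations currently allocated to each component."""
    posterior = [d + n for d, n in zip(delta, counts)]
    total = sum(posterior)
    return [p / total for p in posterior]


# Hypothetical counts for a sample of 1000 observations per time period.
prev_mean_counts = [200.0, 500.0, 300.0]   # m_{t-1,j}
current_counts = [400, 400, 200]           # allocations at time t

delta = informed_dirichlet_params(prev_mean_counts, strength=0.5)
smoothed = posterior_mean_weights(delta, current_counts)
raw = [n / sum(current_counts) for n in current_counts]
```

With a strength of 0.5, the prior carries half the weight of a full sample from the previous period, so the smoothed weight for the first component (about 0.33) sits between the previous proportion (0.2) and the current raw proportion (0.4).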

8.3.1 Hierarchical time series approach for mixture models

In this section, we outline a hierarchical approach for the estimation of parameters

of a mixture model for multiple time points.

Smoothing on µ

The hierarchical approach for µ is specified as

$$
\begin{aligned}
\mu_{jt} &\sim N(\phi_{jt}, V_1) \\
\phi_{jt} &\sim N(\phi_{j,t-1}, V_2)
\end{aligned}
\qquad (8.2)
$$

where $V_1$ and $V_2$ are fixed scalars, reflecting the variability of $\mu_{jt}$
and $\phi_{jt}$ respectively. In this hierarchical formulation, the parameter µ is
used to estimate the mixture distribution at the level of the data, and φ represents
the underlying correlation of µ over time (assuming an AR(1) process). In this
setting, we can interpret the ratio $V_2/V_1$ as reflecting the amount of
information we have about the underlying behaviour (signal) of µ in comparison to
estimates at the level of the data (noise).

For the first time period ($t = 1$), we set $\phi_{jt} = \mu_{jt}$. For estimation of
µ and φ we use a Gibbs sampling scheme. For details see the Appendix.
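As a rough illustration of how the smoothing level could be updated within a Gibbs sampler, the sketch below gives the full conditional of the smoothing parameter φ at one time point for a single component, obtained from standard normal-normal conjugacy applied to (8.2). It treats the current values of µ and of the neighbouring φ as given. The function names are hypothetical, and the scheme described in the Appendix may differ in its details.

```python
import random


def phi_full_conditional(mu_t, phi_prev, phi_next, v1, v2):
    """Mean and variance of phi_t given mu_t and its neighbours.

    mu_t contributes precision 1/v1; each neighbouring phi contributes
    precision 1/v2. Pass phi_next=None for the final time point.
    """
    precision = 1.0 / v1 + 1.0 / v2
    weighted_sum = mu_t / v1 + phi_prev / v2
    if phi_next is not None:
        precision += 1.0 / v2
        weighted_sum += phi_next / v2
    variance = 1.0 / precision
    return weighted_sum * variance, variance


def gibbs_sweep_phi(mu, phi, v1, v2, rng):
    """One in-place sweep over phi for a single component; phi_1 is set to mu_1."""
    n_times = len(mu)
    phi[0] = mu[0]
    for t in range(1, n_times):
        phi_next = phi[t + 1] if t + 1 < n_times else None
        mean, variance = phi_full_conditional(mu[t], phi[t - 1], phi_next, v1, v2)
        phi[t] = rng.gauss(mean, variance ** 0.5)
    return phi


# Interior and final time points, using V1 = 0.36 and V2 = 0.04.
interior_mean, interior_var = phi_full_conditional(2.0, 1.0, 3.0, 0.36, 0.04)
final_mean, final_var = phi_full_conditional(2.0, 1.0, None, 0.36, 0.04)

# One sweep over a short, made-up series of mu values.
mu_series = [1.5, 1.7, 1.4, 1.9, 1.8]
phi_series = gibbs_sweep_phi(mu_series, list(mu_series), 0.36, 0.04, random.Random(7))
```

When $V_2$ is small relative to $V_1$, the neighbouring values of φ dominate the conditional mean, which is why the estimates for φ are smoother than those for µ.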

Smoothing on λ

For independent data, a convenient prior for λ is the Dirichlet distribution, which
can be constructed from independent Gamma variables. If

$$W_j \sim \text{Gamma}(\text{shape} = \alpha_j, \text{scale} = 1) \quad \text{(independently)}$$

then

$$
\begin{aligned}
V = \sum_{j=1}^{k} W_j &\sim \text{Gamma}\Big(\text{shape} = \sum_{j=1}^{k} \alpha_j, \text{scale} = 1\Big), \\
(\lambda_1, \ldots, \lambda_k) = (W_1/V, \ldots, W_k/V) &\sim \text{Dir}(\alpha_1, \ldots, \alpha_k)
\end{aligned}
\qquad (8.3)
$$

However, it is difficult to work with a Dirichlet in a time series or hierarchical
approach, mainly due to the inflexibility of the Gamma distribution. An alternative
formulation of the Dirichlet in terms of the Beta distribution does not appear to
provide greater flexibility.
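The Gamma construction of the Dirichlet in (8.3) can be sketched directly with the Python standard library. The concentration values below are arbitrary and the function name is hypothetical; the Monte Carlo check simply compares sample means with the known Dirichlet mean, which is each concentration value divided by their sum.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw one Dirichlet(alphas) vector by normalising independent Gamma draws."""
    # gammavariate(shape, scale): W_j ~ Gamma(shape = alpha_j, scale = 1)
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)  # V in equation (8.3)
    return [w / total for w in draws]


rng = random.Random(0)
alphas = [2.0, 5.0, 3.0]  # arbitrary illustrative values
one_draw = sample_dirichlet(alphas, rng)

# Monte Carlo estimate of the mean of each weight.
n_draws = 20000
sums = [0.0] * len(alphas)
for _ in range(n_draws):
    for j, value in enumerate(sample_dirichlet(alphas, rng)):
        sums[j] += value
mc_means = [s / n_draws for s in sums]
expected = [a / sum(alphas) for a in alphas]
```

Because every weight shares the single normalising total, the construction gives little room to specify separate temporal correlation for individual components, which is the inflexibility referred to above.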

Another alternative is to use a Logistic-Normal prior for λ,
$LN(\lambda; X_t, \Sigma_d)$, where

$$
\begin{aligned}
W_t &\sim MVN(X_t, \Sigma_d) \\
\lambda_{j,t} &= \frac{\exp(W_{j,t})}{\sum_{l=1}^{k} \exp(W_{l,t})}
\end{aligned}
\qquad (8.4)
$$

Using this functional form, the parameterisation of λ in terms of a multivariate
normal distribution allows for a suitably flexible form in which to explore a
hierarchical structure for this parameter. Such flexibility, in comparison to the
Dirichlet distribution, has recently been investigated in a hierarchical approach
for pooling of estimates across different sampling units (Hoff, 2003).

In a hierarchical setting, and similar to the model used for µ, we can further say
that

$$
\begin{aligned}
X_t &\sim MVN(X_{t-1}, \Sigma_s) \\
\gamma_{j,t} &= \frac{\exp(X_{j,t})}{\sum_{l=1}^{k} \exp(X_{l,t})}
\end{aligned}
\qquad (8.5)
$$

where $\Sigma_d$ and $\Sigma_s$ reflect the variability of $W_t$ and $X_t$
respectively. In this hierarchical formulation, the parameter λ is used to estimate
the mixture model at the level of the data, and γ represents the underlying
correlation of λ over time (assuming an AR(1) process). For the results to follow,
the diagonal entries of $\Sigma_d$ and $\Sigma_s$ are fixed to reflect the noisiness
of the data and the degree of smoothing respectively, and off-diagonal entries are
set to zero. Alternatively, we could estimate $\Sigma_d$ and $\Sigma_s$ but we do
not pursue this approach here.

For estimation of λ and γ we use a Gibbs sampling scheme with a Metropolis-Hastings
step. For details see the Appendix. For identifiability, both $W_t$ and $X_t$ are
$k - 1$ dimensional, and $\lambda_k = 1 - \sum_{j=1}^{k-1} \lambda_j$ (with the same
identification used for γ).
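The two-level logistic-normal structure can be illustrated by simulating from it. The sketch below assumes diagonal covariance matrices (so each element is drawn independently) and treats the last component as a reference category with an implicit value of zero, which is one common way of realising the identification described above but is an assumption here. The starting values and function names are hypothetical.

```python
import math
import random


def to_simplex(x):
    """Map k-1 real values to k weights; the k-th component is the reference
    category with an implicit value of 0 (an assumption of this sketch)."""
    exponentiated = [math.exp(v) for v in x] + [1.0]
    total = sum(exponentiated)
    return [e / total for e in exponentiated]


def simulate_weights(x0, var_smooth, var_data, n_times, rng):
    """Simulate gamma_t (smooth level) and lambda_t (data level) over time.

    X_t follows a random walk with variance var_smooth per element, and
    W_t is drawn around X_t with variance var_data per element.
    """
    sd_smooth = math.sqrt(var_smooth)
    sd_data = math.sqrt(var_data)
    x = list(x0)
    gammas, lambdas = [], []
    for t in range(n_times):
        if t > 0:
            x = [v + rng.gauss(0.0, sd_smooth) for v in x]
        w = [v + rng.gauss(0.0, sd_data) for v in x]
        gammas.append(to_simplex(x))
        lambdas.append(to_simplex(w))
    return gammas, lambdas


# Three components, so X_t and W_t are two-dimensional.
rng = random.Random(42)
gammas, lambdas = simulate_weights([0.0, 0.5], 0.015, 0.2, 100, rng)
equal_weights = to_simplex([0.0, 0.0])
```

Plotting the columns of `lambdas` against those of `gammas` would show the noisier data-level weights scattered around the smoother underlying path, which is the separation of signal from noise that the hierarchical formulation is intended to provide.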

8.4 Results

In this section we present and assess the results using simulated data and then

present the results of applying the approach to particle size distribution data from

Hyytiala, Finland.

8.4.1 Simulated Data

Data Setup

We simulated data indicative of the type of behaviour of aerosol particle size
distribution data observed at Hyytiala, a boreal forest site in Southern Finland
(SMEAR II; Vesala et al., 1998). A particular feature of these particle size
distribution data is growth in both the mean and the weight of a component. Two
datasets are simulated. The first provides an illustration of a particular feature
of PSD data for some time periods and the second is representative of most time
periods.

In practice it is quite common to observe sudden large changes in the number
of particles measured, which may persist for a number of time periods. This is more
often observed when the number of particles in a particular size group is low, and
more so for the smaller sized particles. For the first dataset (D1) we simulate data
for the first component where the weight at smaller values is quite volatile. For
this dataset the mixture is well identified.

For the second dataset (D2), we simulated data which is highly correlated across

time, a feature of particle size distribution data observed in practice for most time

periods where measurements are commonly taken at small time intervals. This

dataset was simulated with parameter estimates where the mixture is not well iden-

tified during the second half of the time period.

Results from simulated dataset D1

As shown in Figure 8.3 (black line), for the first simulated dataset (D1) we simulated

data for the first component with a mean value increasing slowly over time from 1.5,

and weight increasing from 0.2 to 0.5. For the first half of the time period, the

weight for the first component was simulated with a large degree of noise to reflect

the observed volatility of smaller sized particles in practice at relatively low weights.

The parameter µ was simulated with some noise around the parameter values, σ was
kept constant at 0.55, and the sample size was 1000.
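For readers wishing to reproduce data of this general kind, the sketch below draws one time point of 1000 observations from a three-component normal mixture. Only the starting mean (1.5) and weight (0.2) of the first component, the common σ of 0.55 and the sample size come from the description above; the means and weights of the second and third components are hypothetical placeholders, and truncation is ignored for simplicity.

```python
import random


def simulate_mixture(n, weights, mus, sigmas, rng):
    """Draw n observations from a normal mixture; returns (values, labels)."""
    labels = rng.choices(range(len(weights)), weights=weights, k=n)
    values = [rng.gauss(mus[j], sigmas[j]) for j in labels]
    return values, labels


rng = random.Random(2008)
weights = [0.2, 0.3, 0.5]     # first value from the text, others hypothetical
mus = [1.5, 3.5, 5.0]         # first value from the text, others hypothetical
sigmas = [0.55, 0.55, 0.55]   # sigma kept constant at 0.55

y, z = simulate_mixture(1000, weights, mus, sigmas, rng)

# Empirical share and mean of the first component in this draw.
first = [value for value, label in zip(y, z) if label == 0]
share_first = len(first) / len(y)
mean_first = sum(first) / len(first)
```

Repeating the draw over a sequence of time points, with the first component's mean and weight following the trajectories described above plus noise, would give a dataset with the same broad structure as D1.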

Also shown in Figure 8.3 are the results from the independent (red) and informed

prior approaches (green). For the results from the informed prior, we can see that

the effect of using an informative prior on the weights (λ) over time results in

compensatory measures by both µ and σ. We can see this most clearly in the results

for the first component where we see large adjustments to µ1 in compensation for

a smoother estimate of λ1 over time, which is clearly in contrast to the actual data

(black) and results from the independent approach (red line). Of interest is that we

don’t see large compensatory measures for the parameters of the third component,

where for the first half of the time period the actual behaviour of the weight (λ3) is

highly variable. The difference appears to be that in the first component, the mean is

able to adjust to a higher value which is supportive of a greater weight, and in some

sense borrow support from the second component. For the third component, the

mean is not able to increase or decrease in support of a lower weight by borrowing
support from a nearby component. Both the independent and informed prior approaches
overestimate the weight in the second component.

Figure 8.3: Plot of estimated parameters over time for simulated dataset D1, for Components 1, 2 and 3: µ (top panels), σ (middle panels), λ (bottom panels). Actual data (Black), Independent (Red), Informed Prior (Green)

Figure 8.4 provides the results of the hierarchical model for µ (green and blue)

and the actual data (black). For these results, V1 and V2 are fixed at 0.36 and 0.04

respectively. From Figure 8.4, we can see that the estimates for φ are a smooth

version of the much noisier estimates for µ.

Figure 8.4: Plot of estimated parameters over time for simulated dataset D1, for Components 1, 2 and 3: µ (top panels), σ (middle panels), λ (bottom panels). Actual data (Black), Hierarchical approach for µ (Dark Green), φ (Blue)

Figure 8.5 provides the results of the hierarchical model for λ (green and blue)
and the actual data (black). For these results, V1 and V2 are fixed at 0.2 and
0.015 respectively. From Figure 8.5, the estimates for γ roughly follow the more
variable estimates for λ over time. In contrast to the results from the informed
prior (Figure 8.3), we do not see any large compensatory adjustments being made
to parameters when using a hierarchically based informative prior for γ. Apart from
the estimates for γ, the variability of the other parameter estimates is comparable
to the results from the independent approach.

Figure 8.5: Plot of estimated parameters over time for simulated dataset D1, for Components 1, 2 and 3: µ (top panels), σ (middle panels), λ (bottom panels). Actual data (Black), Hierarchical approach for λ (Dark Green), γ (Blue)

Results from simulated dataset D2

As shown in Figure 8.6 (black line), for the second simulated dataset (D2) we sim-

ulated data for the first component with a mean value increasing from 1.5 to 3.0,

and weight increasing from 0.1 to 0.6 and then decreasing to 0.3, over time. Often a

consequence of the growth in the first component is a decline in size and weight for

the larger sized particles, and this is reflected in the weight for the second component

following an opposite pattern to the first component. For the third component, the

weight increases from 0.1 to 0.3 over time. The parameters µ and λ are simulated

with some noise around the parameter values, and the sample size is 1000.

Figure 8.6: Plot of estimated parameters over time for simulated dataset D2, for Components 1, 2 and 3: µ (top panels), σ (middle panels), λ (bottom panels). Actual data (Black), Independent (Red), Informed Prior (Green)

Also shown in Figure 8.6 are the results from the independent (red) and informed

prior approaches (green). For the results of the informed prior, the parameter esti-

mates appear to closely follow the actual data in comparison with the independent

approach. Of interest is the closeness of the parameter estimates of µ and σ for com-

ponents 1 and 2 over the second half of the time period, which more clearly follow

the true growth occurring in component 1 and the stability over time for component 2.

Figure 8.7 provides the results of the hierarchical model for µ (green and blue)


0.0025 respectively. The parameter estimates for the mean of the first and second

components appear to be lower than the actual data for the last quarter of the time

period, which is similar to the results of the independent approach (Figure 8.6).

Figure 8.7: Plot of estimated parameters over time for simulated dataset D2, for Components 1, 2 and 3: µ (top panels), σ (middle panels), λ (bottom panels). Actual data (Black), Hierarchical approach for µ (Dark Green), φ (Blue)

Figure 8.8 provides the results of the hierarchical model for λ (green and blue)


0.0075 respectively. The parameter estimates for the hierarchical approach appear

to more closely follow the actual data than for the independent approach.

Figure 8.8: Plot of estimated parameters over time for simulated dataset D2, for Components 1, 2 and 3: µ (top panels), σ (middle panels), λ (bottom panels). Actual data (Black), Hierarchical approach for λ (Dark Green), γ (Blue)

8.4.2 Case study

The data set studied here was taken from a measurement site at Hyytiala, Finland

and a plot of the measurements for the day selected is shown in Figure 8.2. This

particular day was selected as it shows a new particle formation event occurring,

whereby a new mode of aerosol particles appears with a significant influx of particles

(as high as $10^{6}$ cm$^{-3}$) with a geometric mean diameter below 10 nm, growing later into
the Aitken (25-90 nm) or accumulation (100+ nm) modes. In terms of a mixture

model setting, we will be able to assess the performance of the three approaches

outlined previously as new components are introduced and both a growth in the

mean and weight for those components are observed.

As outlined in Section 8.3, the first stage of our approach is to apply RJMCMC

to each time period. These results are then used to guide the choice of the number

of components and initial parameter estimates for the second stage analysis, in

which temporally correlated priors are used to model the evolution of the mixture

parameters over time. Figure 8.9 shows the results of the first stage of the algorithm,

with a plot of the posterior mean estimates for µjt at each time point t, with the

size of the circles indicating the corresponding weight λjt. The average number

of components estimated with the highest probability over the day was four; the

minimum number of components was one, and the maximum number of components

was five.
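The stage one summary described above can be sketched as follows: for each time period, take the number of components with the highest posterior probability among the RJMCMC draws, then summarise these modal values over the day. The draws below are invented for illustration and the function names are hypothetical; ties are broken in favour of the smaller number of components, which is an arbitrary choice made for this sketch.

```python
from collections import Counter


def modal_k(k_draws):
    """Number of components visited most often; ties go to the smaller k."""
    counts = Counter(k_draws)
    return max(sorted(counts), key=lambda k: counts[k])


def summarise_components(draws_by_time):
    """Modal k for each time period, with its minimum, maximum and average."""
    modes = [modal_k(draws) for draws in draws_by_time]
    return {
        "modes": modes,
        "min": min(modes),
        "max": max(modes),
        "mean": sum(modes) / len(modes),
    }


# Invented posterior draws of k for four time periods.
draws_by_time = [
    [1, 1, 2],
    [4, 4, 3, 4],
    [5, 5, 4],
    [4, 4, 4, 5],
]
summary = summarise_components(draws_by_time)
```

The maximum of the modal values is the quantity used to fix the number of components in the second stage.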

For the second stage, we fixed the number of components to be five. Figure 8.10

shows the results of estimation using the independent approach.

Figure 8.11 shows the results of estimation using the hierarchical model for the

weights (λ). For these results, V1 and V2 are 0.05 and 0.015 respectively. Compared

to the results from the independent approach (Figure 8.10), we see a noticeable

reduction in the noise surrounding λ and a clearer picture emerging of the pattern

of λ over the course of the day.

Alternatively, we could have used a hierarchical approach for µ alone, or for both

µ and λ. In results not shown, we found similar results for both approaches.

Figure 8.9: Plot of posterior mean estimates for µj from RJMCMC algorithm for one day (Hyytiala). Stage 1 of analysis for temporal evolution of parameters. Larger circles indicate greater weight for that component. Horizontal axis: time; vertical axis: posterior mean estimates for µ.

Figure 8.10: Plot of estimated parameters over time for actual data. Independent approach. Posterior mean estimates for µ (top panel), and λ (bottom panel).

Figure 8.11: Plot of estimated parameters over time for actual data. Hierarchical approach for λ. Posterior estimates for µ (top panel), and γ (bottom panel).

8.5 Discussion

In this chapter, we explored the problem of estimating Bayesian mixture models at

multiple time points. In this setting, parameters of the mixture model at each time

point are likely to be correlated with neighbouring time points and useful information

about the parameters may be gained by incorporating this information in estimation.

We found that using a hierarchical approach to the estimation of parameters, where

an informative prior is placed at two different levels, offers considerable flexibility in

estimation for a mixture model setting.

Compared to placing an informative prior at a single level, a hierarchical ap-

proach allows for a separation of the underlying pattern of the parameter over time

(signal) from some of the noise surrounding the parameter at each time point. The

advantage of this is twofold. First, where interest lies in the underlying
pattern of the parameters over time, we may be able to more clearly establish
patterns or identify anomalies from the data. Second, in light of the large degree of

dependency that exists between parameters of a mixture both within and between

components, we can impose an informative prior which may be less sensitive to

changes in the correlation structure of the data, and thereby reduce the influence of

adjustments to neighbouring parameters.

In the hierarchical approach outlined, the influence of the informative prior at
the two levels was specified by parameters V1 (low level) and V2 (high level), and
the values assigned to these parameters are critical in carrying information about
the correlation structure of the parameter of interest. In this chapter, we chose
parameter values based on prior belief in the correlation structure of the data;
alternatively these parameters could be estimated. To this effect, a number of
approaches are available for estimation (West and Harrison, 1997; Fahrmeir et al.,
2004). However, in order to estimate V1 and V2, we still face a choice as to the
degree of penalisation or smoothing of the parameter in light of the apparent
variability in the data. This is a common issue in temporal and spatial modelling
in general.

Although we have focussed on developing a hierarchical approach for the parameters
µ and λ, we could equally apply the same approach to the estimation of σ. One option
may be to consider a half-t distribution, which has previously been used in similar
hierarchical settings (Gelman, 2006). Faced with a number of parameters to consider,
the choice of parameter or parameters may depend on the objectives of the analysis,
the data context and the information available. As most interest for PSD data is in
the size and composition of particles over time, we found it useful to concentrate
on µ and λ over time; in other contexts this will change.

Although we have applied the hierarchical approach to PSD data, the approach

is generalisable to other contexts in which a mixture representation exists at multiple

time points. For example, in a disease mapping context interest may be in both the

mixture representation of the spatial surface and also in any temporal changes to

the mixture.

The hierarchical approach considered here can be readily generalised to include

covariates. Moreover, through the flexibility of assuming a logistic normal distribution

on the weights we can better explore and estimate transitory movements between

components.

There are several limitations to the hierarchical approach considered. First, the

hierarchical approach relies on estimation of parameters under a fixed number of

components. In this chapter, we sought to fix the number of components based on

a first stage analysis in which we used results from the RJ approach as a guide to

the maximum number of components and for establishing hyperparameter values.

In some situations, where reliable prior information is available this first stage may

not be necessary. However, an alternative is to use a single approach and jointly

allow estimation of the parameters and the number of components (e.g. RJMCMC).

This single modelling approach requires reversible moves not only within time pe-

riods but also across them. In our experience this was computationally very costly

and required substantial pre-processing to ensure good mixing, labelling and conver-

gence. Moreover, further post-processing was required to obtain adequate summary

statistics and between-component mapping.

A further limitation of the approach outlined is that it is computationally ex-

pensive. Most of this expense is experienced in the first stage of the analysis. For


estimation of PSD data over one day using 144 time points, the running time of

the RJ approach with 200,000 iterations was about 27 hours. In comparison, the

second stage approach using 50,000 iterations took about an hour. Such computational

expense quickly becomes burdensome if analysis is required for several days

or indeed several weeks. Of course, the use of the first stage for subsequent days

may not be required, considerably reducing the computational time involved.

Chapter 9

Conclusions and further work

This chapter provides a brief overview of the thesis and some suggestions for further

work.

9.1 Conclusions

The primary aim of this thesis was to develop mixture model approaches to char-

acterise complex environmental exposures and outcomes. To address this primary

aim, we focussed on a number of applied problems in characterising complex en-

vironmental exposures and outcomes, including: assessing the interaction between

environmental exposures as risk factors for health outcomes; identifying differing

environmental outcomes across a region; and establishing patterns in the size and

concentration of aerosol particles over time. In this section, we discuss the four main

methodological contributions to address these problems and associated applied con-

tributions which have been made.

First, we explored the use of a mixture model in a meta-analysis setting to provide

for a joint assessment of the evidence for a number of hypothesised relationships in

the data. In Chapter 3, we examined the use of a multivariate meta-analysis to

describe the relationship between exposure to asbestos and smoking on the risk of


lung cancer. In particular, from a statistical perspective, interest was in whether

the risk from exposure to both asbestos and smoking is an additive, multiplicative

or other relation of the risk from exposure to each factor alone. In this analysis, we

considered the evidence for either relation using separate tests.

In Chapter 4, we extended the analysis in Chapter 3 and explored a mixture

model approach to assess the strength of evidence for either relation. In this ap-

proach, we moved away from separate tests for either an additive or multiplicative

relation and allowed the data to choose between both models. The approach allowed

both relations to be considered at the same time, and an advantage for inference is

that we can say with some probability whether the data belongs to one relation or

another. This type of inference may be more informative than information provided

from significance tests on each relation separately.

Second, we developed a simple mixture model approach to classify cases of a

disease over time into a number of groups. In Chapter 5, we examined a mixture

model approach to characterise the risk of Ross River virus (RRv) in Queensland.

This approach built on the approach adopted by Gatton et al. (2004) and considered

that the weekly cases of RRv could be attributed to more than two hypothesised

periods (an outbreak period or no outbreak period), and also extended the analysis

to compare the number and timing of the periods across the spatial region of QLD.

In this approach, we may be able to better identify outbreak periods when they

occur and also provide a more detailed characterisation of the data, which can be

used as a basis for association of explanatory variables.

Third, we developed and examined an informative prior approach for estimation

of mixture model parameters for multiple time points. A mixture model approach

to estimate aerosol particle size distribution (PSD) data over time was introduced

in Chapters 6, 7 and 8. In Chapter 6, we compared the results of using a Bayesian

mixture model approach to estimating PSD data with a commonly used estima-

tion method in the aerosol physics literature. In using a Bayesian mixture model

approach we were able to improve upon previous approaches by providing a better

exploration of the parameter space, and also allow the data to better choose between

alternative representations without the use of subjective decisions. As PSD data is


often measured over time at small time intervals, we also examined the use of an

informative prior for estimation of the mixture parameters which takes into account

the correlated nature of the parameters.

In Chapter 7, we examined in some detail the issue of using informative priors for

estimation of mixtures at multiple time points. In this analysis, the use of two dif-

ferent informative priors, and an independent prior were compared using simulated

and actual data. In general, we found that approaches that employ information

about neighbouring time points compared favourably to results based on an inde-

pendent approach. We found that by using informative priors about parameters for

correlated time periods we may be able to better identify individual components at

each time point. As an aid for inference, we may also be able to obtain smoother pa-

rameter estimates over time and from this be able to more clearly establish patterns

or identify anomalies from the data.

Analysis of the evolution of parameters of a mixture over multiple time points

also highlighted the large degree of dependency that exists between component pa-

rameters. A possible effect of using informative priors in this context is to impose a

prior not supported by the data or to impose a temporal correlation structure where

such a structure does not exist, and thereby cause unnecessary adjustments to other

parameters.

Fourth, we introduced a hierarchical approach to estimate mixture model pa-

rameters for multiple time points. In this approach (Chapter 8), we addressed some

of the issues associated with using an informative prior at a single level found in

Chapter 7, and allowed an informative prior to be placed at two different levels.

Compared to placing an informative prior at a single level, a hierarchical approach

allows for a separation of the underlying pattern of the parameter over time (signal)

from some of the noise surrounding the parameter at each time point. In this case,

we may be able to more clearly establish patterns or identify anomalies in the data.

We can also impose an informative prior which is less sensitive to changes in the

correlation structure of the data, and thereby reduce the influence of adjustments

to neighbouring parameters.

In summary, we have demonstrated that a mixture model approach can be used


to better understand and describe features/relationships within environmental ex-

posure data. The approach is not without significant computational and estimation

issues, and thus considerable care must be taken in using the approach for inference.

These issues, however, are likely to be outweighed by the additional information this

approach can provide to understand complex environmental exposure and outcome

data.

9.2 Future Work

The mixture models and analysis in this thesis could be extended in a number of

ways.

A mixture model approach to provide an assessment of interaction or relation-

ship between risk factors in a meta-analysis context could be extended to include

alternative relations or be used to assign preference to more than two relations. The

number and nature of the hypothesised relations would depend on the context

of the study.

We could extend the mixture model to characterise the risk of RRv over time

to formally include a spatial dimension, where mixture model parameters for each

zone are able to borrow strength from parameter estimates of neighbouring zones.

This is similar to the approach adopted in Fernandez and Green (2002) for a single

time point, in which the weight parameter is spatially related by neighbouring sites.

Further analysis of the RRv data would be needed, however, to investigate which

parameters may be spatially related, including the timing of components.

For estimation of mixture models over time (Chapters 6, 7 and 8), a number of

extensions are possible. First, improvements to parameter estimation may be gained

by reducing the influence of the truncated nature of the size data (i.e. the effect of

binning on the size of the particles). In estimation, we could take into account

the ordering of the size bins. In this case, we recognize that observations within

neighbouring size bins are more likely to be allocated to the same component. A

natural approach would be to then use a spatial prior on the allocation variable (z)

(similar to Alston et al. (2005)), and depending on the strength of prior information,


this could reduce the number of components covering only a small number of size

bins.

To reduce the influence of the truncated nature of the size data, we could also

expand the number of size bins used in estimation. In this approach, a number of

extra size bins are created between the original size bins and handled in estimation

as missing data. This is likely to lead to a smoother mixture representation

of the data. The tradeoff is more computational time by a factor of the

number of additional size bins created, and this would need to be evaluated against

potential improvements in estimation.

Within the MCMC framework, block updating rather than sequential updating

could be used in the hierarchical approach to minimise the effect of correlation

between parameters leading to improved convergence and mixing. This is likely to be

of most benefit for dependencies which are apparent between µ and φ (Equation 8.2)

or λ and γ (Equation 8.5).

Any improvements to computational time are worthy of investigation. Analyses

covering several weeks or months of data will be computationally demanding.

Population Monte Carlo (Celeux et al. (2003)) or perfect sampling (Casella et al.

(2004)) could be investigated and developed to allow for estimation of a mixture

over multiple time points.

The hierarchical approach to estimation could also be extended to include a hi-

erarchical structure for the variance (σ2), and alternative correlation structures. For

the variance, a flexible prior such as a truncated t-distribution could be investigated

(Gelman (2006)). The correlation structure could also be extended to include covari-

ates, which could provide further information to aid in identification of components

at each time point.

Further analysis could also be undertaken to associate the components (modal

structure) of the mixture model with health outcomes. Evidence on the association

of air pollution particles with a number of respiratory related diseases is growing

(Osunsanya et al. (2001); Chen et al. (2006)). Such a detailed characterisation of the

data would enable a more representative association to be obtained with either the

source of the particles or a range of particles of a particular size and concentration.

Appendix A

Appendices

A.1 Calculations for the variance of S and V (Ch.3)

Variance of S

We calculated the variance of S based on Rothman (1976). A large sample interval

estimator for S based on a log-Gaussian sampling distribution would be

\[
S_L = \exp\left(\ln(S) - Z_{1-\alpha/2}\,\mathrm{SE}(\ln(S))\right), \qquad
S_U = \exp\left(\ln(S) + Z_{1-\alpha/2}\,\mathrm{SE}(\ln(S))\right) \tag{A-1}
\]

The evaluation of SE(ln(S)) depends upon the type of study. For case-control

studies,

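\[
\mathrm{SE}(\ln(S)) = \left[
\frac{\widehat{\mathrm{var}}(\widehat{RR}_{AS})}{(\widehat{RR}_{AS} - 1)^2}
+ \frac{\widehat{\mathrm{var}}(\widehat{RR}_{S}) + \widehat{\mathrm{var}}(\widehat{RR}_{A}) + 2\,\widehat{\mathrm{cov}}(\widehat{RR}_{S}, \widehat{RR}_{A})}{(\widehat{RR}_{S} + \widehat{RR}_{A} - 2)^2}
- \frac{2\,\widehat{\mathrm{cov}}(\widehat{RR}_{AS}, \widehat{RR}_{S} + \widehat{RR}_{A})}{(\widehat{RR}_{AS} - 1)(\widehat{RR}_{S} + \widehat{RR}_{A} - 2)}
\right]^{1/2}, \tag{A-2}
\]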

where

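\[
\widehat{\mathrm{var}}(\widehat{RR}_{ij}) = \widehat{RR}_{ij}^{2} \left( \frac{1}{a_{ij}} + \frac{1}{c_{ij}} + \frac{1}{b} + \frac{1}{d} \right) \tag{A-3}
\]

\[
\widehat{\mathrm{cov}}(\widehat{RR}_{S}, \widehat{RR}_{A}) = \widehat{RR}_{S}\,\widehat{RR}_{A} \left( \frac{1}{b} + \frac{1}{d} \right) \tag{A-4}
\]

\[
\widehat{\mathrm{cov}}(\widehat{RR}_{AS}, \widehat{RR}_{S} + \widehat{RR}_{A}) = \widehat{RR}_{AS} (\widehat{RR}_{S} + \widehat{RR}_{A}) \left( \frac{1}{b} + \frac{1}{d} \right), \tag{A-5}
\]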

and b and d denote the frequencies of cases and controls in the low-risk category for

both risk indicators, and aij and cij denote the frequencies of cases and controls in

(non-referent) risk category i, j.
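The case-control calculation in (A-1) to (A-5) can be sketched in a few lines of Python. This is an illustration only: it assumes S is the synergy index S = (RRAS − 1)/(RRS + RRA − 2), which is the form implied by the denominators in (A-2), and that each RR is estimated by the odds ratio against the reference category. The function name, argument names and the counts in the usage note are hypothetical.

```python
import math

def synergy_index_interval(a_A, c_A, a_S, c_S, a_AS, c_AS, b, d, z=1.96):
    """Large-sample interval for S from case-control counts, following (A-1) to (A-5).

    a_*, c_* are cases and controls in each non-referent category; b and d are
    cases and controls in the reference category.
    Assumption: S = (RR_AS - 1) / (RR_S + RR_A - 2).
    """
    def rr(a, c):
        # Odds ratio against the reference category, used as the RR estimate.
        return (a * d) / (c * b)

    def var_rr(r, a, c):
        # Equation (A-3).
        return r ** 2 * (1 / a + 1 / c + 1 / b + 1 / d)

    rr_A, rr_S, rr_AS = rr(a_A, c_A), rr(a_S, c_S), rr(a_AS, c_AS)
    shared = 1 / b + 1 / d
    cov_S_A = rr_S * rr_A * shared                  # Equation (A-4)
    cov_AS_sum = rr_AS * (rr_S + rr_A) * shared     # Equation (A-5)

    num = rr_AS - 1
    den = rr_S + rr_A - 2
    s = num / den
    # Square of Equation (A-2).
    var_ln_s = (var_rr(rr_AS, a_AS, c_AS) / num ** 2
                + (var_rr(rr_S, a_S, c_S) + var_rr(rr_A, a_A, c_A) + 2 * cov_S_A) / den ** 2
                - 2 * cov_AS_sum / (num * den))
    se = math.sqrt(var_ln_s)
    # Equation (A-1): exp(ln S -/+ z * SE) = S * exp(-/+ z * SE).
    return s, se, s * math.exp(-z * se), s * math.exp(z * se)
```

For example, `synergy_index_interval(30, 60, 120, 150, 90, 40, 50, 200)` returns the point estimate, SE(ln(S)) and the two interval limits for a made-up table of counts. The sketch does not guard against a non-positive value under the square root, which can arise with sparse tables.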

For cohort studies (with small effects), using first order Taylor series approximations,

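\[
\mathrm{SE}(\ln(S)) = \left[
\frac{\widehat{\mathrm{var}}(\hat{R}_{AS}) + \widehat{\mathrm{var}}(R_{00})}{(\hat{R}_{AS} - R_{00})^2}
+ \frac{\widehat{\mathrm{var}}(R_{S}) + \widehat{\mathrm{var}}(R_{A}) + 4\,\widehat{\mathrm{var}}(R_{00})}{(R_{S} + R_{A} - 2R_{00})^2}
- \frac{4\,\widehat{\mathrm{var}}(R_{00})}{(\hat{R}_{AS} - R_{00})(R_{S} + R_{A} - 2R_{00})}
\right]^{1/2} \tag{A-6}
\]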

where ˆvar(Rij) can be taken as Rij/Mij with Mij denoting the total number of

observations in the joint risk indicator category i, j.

Variance of V

For case-control studies, V can also be expressed as V = X2/X1, where X2 is RRAS

compared to RRS and X1 is RRA.

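\[
\begin{aligned}
\mathrm{var}(\log(X_1)) &= 1/a + 1/b + 1/c + 1/d \\
\mathrm{var}(\log(X_2)) &= 1/e + 1/f + 1/g + 1/h \\
\mathrm{var}(\log(V)) &= \mathrm{var}(\log(X_1)) + \mathrm{var}(\log(X_2))
\end{aligned} \tag{A-7}
\]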

where a to h denote the frequency of cases and controls for each risk category, and

X1 and X2 are assumed to be independent.
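As a small worked illustration of (A-7), the variance of log(V) and a large-sample interval can be computed directly from the eight cell frequencies. The function below is a sketch: the name `v_interval`, the way the counts are passed and the use of a symmetric interval on the log scale are our assumptions, not something specified above.

```python
import math

def v_interval(x1, x2, counts_x1, counts_x2, z=1.96):
    """Interval for V = X2 / X1 built on the log scale, following (A-7).

    counts_x1 holds the frequencies (a, b, c, d) behind X1 and counts_x2 the
    frequencies (e, f, g, h) behind X2; X1 and X2 are treated as independent.
    """
    var_log_x1 = sum(1 / n for n in counts_x1)   # 1/a + 1/b + 1/c + 1/d
    var_log_x2 = sum(1 / n for n in counts_x2)   # 1/e + 1/f + 1/g + 1/h
    var_log_v = var_log_x1 + var_log_x2
    v = x2 / x1
    half_width = z * math.sqrt(var_log_v)
    return v, var_log_v, v * math.exp(-half_width), v * math.exp(half_width)
```

Calling `v_interval(2.0, 2.8125, (30, 50, 60, 200), (90, 120, 40, 150))` with these invented counts returns V, var(log(V)) and the interval limits.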

For cohort studies with background risk not externally referenced,

\[
\mathrm{var}(\log(V)) = 1/a + 1/b + 1/e + 1/f \tag{A-8}
\]

For cohort studies with background risk externally referenced, we used the variance

for the ratio of two standardised ratios found in Gardner and Altman (1989).


A.2 Reversible Jump Markov Chain Monte Carlo (RJMCMC) (Ch.6)

In this section, we outline details of the RJMCMC algorithm used in this chapter.

An important feature of the algorithm is the variable dimension of the state spaces.

The change of dimension for the mixture model using Reversible Jump Markov Chain

Monte Carlo (RJMCMC) is achieved by either splitting an existing component into

two separate components (increasing the dimension of the model by one component)

or merging two existing components into a single component, commonly known as

the split/merge step of the algorithm.

To split a component, a vector of continuous random variables (u), which are

independent of the current model, is drawn and applied in an invertible deterministic

function to propose a new model. The proposal is designed to be deterministic

in order that the reverse of the split move, the corresponding merge move, can be

obtained through the inverse transformation of the function.

The other dimension changing moves proposed in RJMCMC are the addition of

a new component or the removal of an empty component which is currently in the

model. These proposals are referred to as births and deaths, respectively.

The Normal mixture model is computed iteratively as follows:

1. Given (λ,µ,σ), update allocation vector z,

2. Given (k, µ,σ), update estimates of the weights λ,

3. Given (k, λ), update Normal component parameters µj and σ2j , j ∈ {1, · · · , k}

4. Update hyperparameters as required,

5. Propose a split or merge for the components in the current model, and accept

with the corresponding acceptance probability (Equation (A-14) for a split, and its inverse ratio for a merge).

In this scheme, steps 1-4 do not involve changes in dimension and are updated

using standard Gibbs moves outlined below, with the following conjugate priors:


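\[
p(\mu_j) \sim N(\xi, \kappa^{-1}) \tag{A-9}
\]

\[
p(\sigma_j^{-2}) \sim \mathrm{Gamma}(\delta, \beta) \tag{A-10}
\]

\[
p(\beta) \sim \mathrm{Gamma}(g, h) \tag{A-11}
\]

\[
p(\lambda) \sim \mathrm{Dirichlet}(\alpha_1, \alpha_2, \cdots, \alpha_k) \tag{A-12}
\]

\[
p(k) \sim \mathrm{Uniform}(k_{min}, k_{max}) \tag{A-13}
\]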

where ξ, κ, δ, α, η, g, h, kmin and kmax are fixed hyperparameters. Note, in this

case all µj follow a universal prior.

We can construct these prior distributions to be weakly informative and use their

conjugacy to obtain proper posterior distributions for the unknown mixture model

parameters.
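To make the fixed-dimension part of the scheme concrete, one sweep of steps 1 to 4 can be sketched as follows. This is a minimal illustration rather than the code used for the analyses: it assumes the usual conjugate full conditionals implied by (A-9) to (A-12), a Gamma distribution parameterised by shape and rate, and a symmetric Dirichlet prior with a single value `alpha`. The function name and the `hyper` dictionary are hypothetical.

```python
import math
import random

def gibbs_sweep(y, lam, mu, sig2, beta, hyper, rng=random):
    """One pass of steps 1-4 for a k-component normal mixture (k held fixed).

    mu and sig2 are updated in place and also returned.
    """
    k = len(lam)
    xi, kappa, delta, alpha, g, h = (hyper[name] for name in
                                     ("xi", "kappa", "delta", "alpha", "g", "h"))
    # Step 1: allocate each observation with probability proportional to
    # lam_j * N(y_i; mu_j, sig2_j).
    z = []
    for yi in y:
        w = [lam[j] * math.exp(-0.5 * (yi - mu[j]) ** 2 / sig2[j]) / math.sqrt(sig2[j])
             for j in range(k)]
        z.append(rng.choices(range(k), weights=w)[0])
    counts = [z.count(j) for j in range(k)]
    # Step 2: weights from Dirichlet(alpha + counts), via normalised gamma draws.
    gam = [rng.gammavariate(alpha + counts[j], 1.0) for j in range(k)]
    lam = [gj / sum(gam) for gj in gam]
    # Step 3: component means, then precisions, from their conjugate conditionals.
    for j in range(k):
        yj = [yi for yi, zi in zip(y, z) if zi == j]
        prec = counts[j] / sig2[j] + kappa
        mean = (sum(yj) / sig2[j] + kappa * xi) / prec
        mu[j] = rng.gauss(mean, math.sqrt(1 / prec))
        rate = beta + 0.5 * sum((yi - mu[j]) ** 2 for yi in yj)
        # gammavariate takes (shape, scale), so the scale is 1 / rate.
        sig2[j] = 1 / rng.gammavariate(delta + 0.5 * counts[j], 1 / rate)
    # Step 4: hyperparameter beta.
    beta = rng.gammavariate(g + k * delta, 1 / (h + sum(1 / s for s in sig2)))
    return z, lam, mu, sig2, beta
```

Repeated calls, each fed the output of the previous one, give the Gibbs part of the sampler; step 5 would be attempted between sweeps. The sketch does not protect against all allocation weights underflowing to zero for an extreme observation.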

Step 5 requires the reversible jump mechanism of the algorithm. The choice

between a split or merge move is made with equal probability, with the only exception

being at the extremes of the allowable range for k (if k = kmin, the probability of

proposing a split move is 1, and if k = kmax, the probability of a split move is 0).

To propose a split move, Richardson and Green (1997) generate a 3 dimensional

random vector u using beta distributions:

u1 ∼ beta(2, 2), u2 ∼ beta(2, 2), u3 ∼ beta(1, 1)

and randomly choose one of the current k components to be split. For example, we

will assume component j is chosen for a split move. The proposed transformation

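of variables, $(\theta^{(n)}, u_{\theta_n}) = T_{m \to n}(\theta^{(m)}, u_{\theta_m})$, is:

\[
\lambda_{j1} = u_1 \lambda_j \quad \text{and} \quad \lambda_{j2} = (1 - u_1) \lambda_j
\]

\[
\mu_{j1} = \mu_j - u_2 \sigma_j \sqrt{\frac{\lambda_{j2}}{\lambda_{j1}}} \quad \text{and} \quad \mu_{j2} = \mu_j + u_2 \sigma_j \sqrt{\frac{\lambda_{j1}}{\lambda_{j2}}}
\]

\[
\sigma_{j1}^2 = u_3 (1 - u_2^2) \sigma_j^2 \left( \frac{\lambda_j}{\lambda_{j1}} \right) \quad \text{and} \quad \sigma_{j2}^2 = (1 - u_3)(1 - u_2^2) \sigma_j^2 \left( \frac{\lambda_j}{\lambda_{j2}} \right)
\]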


where dim(n) > dim(m). The allocation vector zi, where zij = 1, is redrawn so that

the data which is currently allocated to component j is now reallocated to either

component j1 or j2.

The split proposal is the reverse of the merge proposal for components j1 and

j2. To propose the merger of 2 components, the parameters of the mixture model

for these components are reassigned by matching the 0th, 1st and 2nd moments for

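the distribution:

\[
\lambda_j = \lambda_{j1} + \lambda_{j2}
\]

\[
\lambda_j \mu_j = \lambda_{j1} \mu_{j1} + \lambda_{j2} \mu_{j2}
\]

\[
\lambda_j \left( \mu_j^2 + \sigma_j^2 \right) = \lambda_{j1} \left( \mu_{j1}^2 + \sigma_{j1}^2 \right) + \lambda_{j2} \left( \mu_{j2}^2 + \sigma_{j2}^2 \right)
\]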

The allocation vector zi, where zij1 = 1 or zij2 = 1 is amalgamated so that the

allocation becomes zij = 1.
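The split transformation and the moment-matching merge are inverses of one another, which can be checked numerically. The sketch below codes both moves for a single component, working with the variance σ2 rather than σ; the function names are ours, and the reallocation of observations is left out.

```python
import math
import random

def split_component(lam, mu, sig2, u):
    """Split (lam, mu, sig2) into two components using u = (u1, u2, u3)."""
    u1, u2, u3 = u
    lam1, lam2 = u1 * lam, (1 - u1) * lam
    sd = math.sqrt(sig2)
    mu1 = mu - u2 * sd * math.sqrt(lam2 / lam1)
    mu2 = mu + u2 * sd * math.sqrt(lam1 / lam2)
    sig2_1 = u3 * (1 - u2 ** 2) * sig2 * lam / lam1
    sig2_2 = (1 - u3) * (1 - u2 ** 2) * sig2 * lam / lam2
    return (lam1, mu1, sig2_1), (lam2, mu2, sig2_2)

def merge_components(c1, c2):
    """Merge two components by matching the 0th, 1st and 2nd moments."""
    (lam1, mu1, s1), (lam2, mu2, s2) = c1, c2
    lam = lam1 + lam2
    mu = (lam1 * mu1 + lam2 * mu2) / lam
    sig2 = (lam1 * (mu1 ** 2 + s1) + lam2 * (mu2 ** 2 + s2)) / lam - mu ** 2
    return lam, mu, sig2

def draw_u(rng=random):
    """Draw u1, u2 ~ beta(2, 2) and u3 ~ beta(1, 1) for a split proposal."""
    return rng.betavariate(2, 2), rng.betavariate(2, 2), rng.betavariate(1, 1)
```

Applying `merge_components` to the output of `split_component` returns the original (λj, µj, σ2j) up to rounding, whatever value of u is drawn, and the first of the two new components always has the smaller mean.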

In the case of a split move, the probability of acceptance for the move from model

Mm to Mn is

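\[
\min\left(
\frac{\pi(n, \theta^{(n)})}{\pi(m, \theta^{(m)})}\,
\frac{\pi_{nm}}{\pi_{mn}}\,
\frac{g(u_{\theta_n})}{g(u_{\theta_m})}
\left| \frac{\partial T_{m \to n}(\theta^{(m)}, u_{\theta_m})}{\partial(\theta^{(m)}, u_{\theta_m})} \right|,\ 1
\right) \tag{A-14}
\]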

involving the Jacobian of the transform Tm→n, the probability πmn of choosing a

jump to Mn while in Mm, and g, the density of u. The acceptance probability for

the merge move is the inverse ratio of a split.
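A log-scale version of (A-14) for a split move is sketched below. Three points are assumptions on our part. First, the Jacobian is taken with respect to (λj, µj, σ2j, u1, u2, u3), i.e. on the variance scale, for which the determinant of the split transformation has the closed form coded in `log_split_jacobian`. Second, the reverse (merge) move draws no auxiliary variables, so g(uθn) = 1. Third, any factor for the proposed reallocation of observations is taken to be absorbed into `log_target_ratio`. All function names are hypothetical.

```python
import math

def beta_logpdf(x, a, b):
    """Log density of a beta(a, b) variable at x in (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x)

def log_g(u):
    """Log density of u = (u1, u2, u3) under beta(2,2), beta(2,2), beta(1,1)."""
    u1, u2, u3 = u
    return beta_logpdf(u1, 2, 2) + beta_logpdf(u2, 2, 2) + beta_logpdf(u3, 1, 1)

def log_split_jacobian(lam, sig2, u):
    """Log |Jacobian| of the split map on the variance scale."""
    u1, u2, u3 = u
    ratio = math.sqrt((1 - u1) / u1) + math.sqrt(u1 / (1 - u1))
    return (math.log(lam) + math.log(ratio) + math.log(1 - u2 ** 2)
            + 1.5 * math.log(sig2) - math.log(u1) - math.log(1 - u1))

def log_accept_split(log_target_ratio, p_merge, p_split, lam, sig2, u):
    """Log of (A-14) for a split from model m to model n.

    log_target_ratio is log pi(n, theta_n) - log pi(m, theta_m); p_split is the
    probability of proposing the split while in m and p_merge the probability
    of proposing the reverse merge while in n.
    """
    log_ratio = (log_target_ratio + math.log(p_merge) - math.log(p_split)
                 - log_g(u) + log_split_jacobian(lam, sig2, u))
    return min(0.0, log_ratio)
```

The split is then accepted when the log of a U(0, 1) draw falls below the returned value; for a merge, the sign of `log_ratio` is reversed before taking the minimum.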


A.3 Penalised Prior (Ch.6)

In this section we outline the rejection sampling algorithm for λ proposed by Gustafson

and Walker (2003) for the penalised prior approach.

Prior

\[
p(\lambda) \propto \mathrm{Dirichlet}(1, \ldots, 1) \exp\left( \sum_{t=2}^{T} \frac{\| \lambda_{t,j} - \lambda_{t-1,j} \|^2}{\phi} \right) \tag{A-15}
\]

Posterior

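\[
p(\lambda \mid \phi, m) \propto \prod_{j=1}^{k} \left\{ \prod_{t=1}^{T} f(\lambda_{jt} \mid m_{jt} + 1)\, I(\lambda_{jt}) \right\} \exp\left( \sum_{t=2}^{T} \frac{\| \lambda_{t,j} - \lambda_{t-1,j} \|^2}{\phi} \right) \tag{A-16}
\]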

Gustafson and Walker (2003) suggest sampling λjt/s from a Beta(mjt+1,mkt+1)

and accepting when U ≤ g1(λjt)/g2(λjt) (U ∼ U(0, 1)), where

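\[
g_1(\lambda_{jt}) = \lambda_{jt}^{m_{jt}} (s - \lambda_{jt})^{m_{kt}} I(\lambda_t)
\exp\left[ -\phi^{-2} \left\{ (\lambda_{t,j} - \lambda_{t-1,j})^2 + (\lambda_{t,j} - (s - \lambda_{t-1,k}))^2
+ (\lambda_{t,j} - \lambda_{t+1,j})^2 + (\lambda_{t,j} - (s - \lambda_{t+1,k}))^2 \right\} \right] \tag{A-17}
\]

and

\[
g_2(\lambda_{jt}) = \lambda_{jt}^{m_{jt}} (s - \lambda_{jt})^{m_{kt}} I(\lambda_t)
\exp\left[ -\phi^{-2} \left\{ (\lambda^{*} - \lambda_{t-1,j})^2 + (\lambda^{*} - (s - \lambda_{t-1,k}))^2
+ (\lambda^{*} - \lambda_{t+1,j})^2 + (\lambda^{*} - (s - \lambda_{t+1,k}))^2 \right\} \right] \tag{A-18}
\]

where

\[
\lambda^{*} = \max\left\{ 0,\ \min\left\{ \tfrac{1}{4} \left( \lambda_{t-1,j} + s - \lambda_{t-1,k} + \lambda_{t+1,j} + s - \lambda_{t+1,k} \right),\ s \right\} \right\} \tag{A-19}
\]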


s = λjt + λkt and g1(λjt) ≤ g2(λjt). I(λt) is an indicator function equal to 1 when

λt ∈ R and 0 otherwise.
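The rejection step can be sketched as below for an interior time point t, with the pair total s held fixed. Because g1 and g2 share the factor in λjt and (s − λjt), their ratio reduces to the exponential of the difference between two sums of squares. The sketch takes the four targets of the squared terms to be λt−1,j, s − λt−1,k, λt+1,j and s − λt+1,k, which is the form implied by (A-19); the function names, the `max_tries` guard and the treatment of the indicator (the proposal already lies in [0, s]) are our assumptions.

```python
import math
import random

def lambda_star(prev_j, prev_k, next_j, next_k, s):
    """Mean of the four targets, clipped to [0, s], as in (A-19)."""
    centre = 0.25 * (prev_j + (s - prev_k) + next_j + (s - next_k))
    return max(0.0, min(centre, s))

def penalty(lam, prev_j, prev_k, next_j, next_k, s):
    """Sum of squared distances appearing in the exponent of g1 and g2."""
    targets = (prev_j, s - prev_k, next_j, s - next_k)
    return sum((lam - t) ** 2 for t in targets)

def sample_lambda_jt(m_jt, m_kt, prev_j, prev_k, next_j, next_k, s, phi,
                     rng=random, max_tries=10000):
    """Rejection sampler for lambda_jt with s = lambda_jt + lambda_kt held fixed."""
    star = lambda_star(prev_j, prev_k, next_j, next_k, s)
    floor = penalty(star, prev_j, prev_k, next_j, next_k, s)
    for _ in range(max_tries):
        # Proposal: lambda_jt / s ~ Beta(m_jt + 1, m_kt + 1).
        lam = s * rng.betavariate(m_jt + 1, m_kt + 1)
        # g1 / g2: the common factor in lambda_jt and (s - lambda_jt) cancels.
        excess = penalty(lam, prev_j, prev_k, next_j, next_k, s) - floor
        if rng.random() <= math.exp(-excess / phi ** 2):
            return lam
    raise RuntimeError("no proposal accepted; check phi and the neighbouring weights")
```

Since λ∗ minimises the sum of squares over [0, s], the ratio never exceeds one. Small values of φ make the penalty sharper and the acceptance rate lower.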


A.4 Details of MH Gibbs sampler for hierarchical model (Ch. 8)

HM for µ

Update z, β, λ, σ as in Independent Model.

After updating λ and before updating σ,

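\[
\phi_{jt} \mid \cdot \sim N\left( \frac{V_1 \phi_{j,t-1} + V_2 \mu_{jt}}{V_1 + V_2},\ \frac{1}{V_1^{-1} + V_2^{-1}} \right)
\]

\[
\mu_{jt} \mid \cdot \sim N\left( \frac{\phi_{jt} + m_j \bar{y}_j V_1 \sigma_{jt}^{-2}}{V_1 m_j \sigma_{jt}^{-2} + 1},\ \frac{V_1}{V_1 m_j \sigma_{jt}^{-2} + 1} \right)
\]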
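The two conditional distributions for φjt and µjt are ordinary normal-normal updates, and a short sketch makes the roles of V1 and V2 easier to see. The code below returns the mean and variance of each conditional and leaves the draw to a helper. The function names are ours, and ȳj is taken to be the mean of the mj observations currently allocated to component j.

```python
import math
import random

def phi_conditional(phi_prev, mu_jt, v1, v2):
    """Mean and variance of phi_jt given phi_{j,t-1} and mu_jt."""
    mean = (v1 * phi_prev + v2 * mu_jt) / (v1 + v2)
    var = 1.0 / (1.0 / v1 + 1.0 / v2)
    return mean, var

def mu_conditional(phi_jt, m_j, ybar_j, sig2_jt, v1):
    """Mean and variance of mu_jt given phi_jt and the m_j allocated observations."""
    denom = v1 * m_j / sig2_jt + 1.0
    mean = (phi_jt + m_j * ybar_j * v1 / sig2_jt) / denom
    return mean, v1 / denom

def draw_normal(mean, var, rng=random):
    """Draw from N(mean, var)."""
    return rng.gauss(mean, math.sqrt(var))
```

With V1 = V2 the conditional mean of φjt is the simple average of φj,t−1 and µjt. With no observations allocated (mj = 0), µjt is drawn from N(φjt, V1), and as mj grows its conditional mean moves towards ȳj.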

HM for λ

Update zi, then update γt,

Sample from conditional,

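\[
X_t \sim MVN\left( (\Sigma_d^{-1} + \Sigma_s^{-1})^{-1} (\Sigma_d^{-1} W_t + \Sigma_s^{-1} X_{t-1}),\ (\Sigma_d^{-1} + \Sigma_s^{-1})^{-1} \right)
\]

\[
\gamma_{jt} = \frac{\exp(X_{j,t})}{\sum_{l=1}^{k} \exp(X_{l,t})}
\]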

where Xkt = 0.

Update λt

Sample from


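\[
W_{i+1,t} \sim MVN(W_{it}, \sigma_p^2 I)
\]

with density $q(W_{i+1,t} \mid W_{it})$, where

\[
\lambda_{i+1,jt} = \frac{\exp(W_{i+1,jt})}{\sum_{l=1}^{k} \exp(W_{i+1,lt})}
\]

and accept this proposal with probability $\min(\alpha, 1)$, where

\[
\alpha = \frac{\pi(\lambda_{i+1,t})\, q(W_{i+1,t} \mid W_{it})}{\pi(\lambda_{it})\, q(W_{it} \mid W_{i+1,t})}
\]

Let $W_{k1} = 0$ and, for $T = 2$, $W_{j2} = \log(m_{j,t-1} / m_{k,t-1})$, where $m_{j,t-1}$ is the mean number

of observations allocated to component j in the previous time period (under the

independent approach).

$\pi(\lambda_{it}) = LN(\lambda_{it}; W_t, \Sigma_d) \times \prod_{j=1}^{k} \lambda_{ijt}^{m_{jt}}$. $\sigma_p^2$ is the variance of the proposal. The proposal is

accepted if $u < \alpha$ where $u \sim U(0, 1)$; otherwise $\lambda_{i+1,t} = \lambda_{it}$.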
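The random-walk update of the weights can be sketched generically as follows. The sketch fixes the logit of the last component at zero, proposes a Gaussian step of standard deviation σp on the remaining logits, and accepts on the log scale. Because the proposal is symmetric, the two q terms in α cancel. The target density is passed in as a function, so the callable `log_target(lam, w)` and both function names are hypothetical placeholders for π(λit) and the surrounding code.

```python
import math
import random

def softmax_with_reference(w):
    """Map k-1 free logits to k weights, fixing the last logit at 0."""
    full = list(w) + [0.0]
    top = max(full)
    e = [math.exp(x - top) for x in full]
    total = sum(e)
    return [x / total for x in e]

def mh_step(w, log_target, sigma_p, rng=random):
    """One random-walk Metropolis step on the free logits w."""
    proposal = [x + rng.gauss(0.0, sigma_p) for x in w]
    # The Gaussian random walk is symmetric, so the q terms cancel in alpha.
    log_alpha = (log_target(softmax_with_reference(proposal), proposal)
                 - log_target(softmax_with_reference(w), w))
    if rng.random() < math.exp(min(0.0, log_alpha)):
        return proposal, True
    return list(w), False
```

In use, `log_target` would return the log of the logistic normal density multiplied by the allocation term, and `sigma_p` would be tuned to give a reasonable acceptance rate.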

Update µ, β and σ as in Independent Model.

Bibliography

Alston, C. L., K. Mengersen, and C. P. Robert (2005). Bayesian mixture models in a

longitudinal setting for analysing sheep CAT scan images. Journal of Agriculture.

Archer, V. (1988). Lung cancer risks of underground miners. The Yale Journal of

Biology and Medicine 61, 183–193.

Ashby, D. (2006). Bayesian statistics in medicine: A 25 year review. Statistics in

Medicine 25, 3589–3631.

Begg, C. B. and M. Mazumdar (1994). Operating characteristics of a rank correlation

test for publication bias. Biometrics 50, 1088–1101.

Berry, D. A. and F. D. K. Liddell (2004). The interaction of asbestos and smoking

in lung cancer: A modified measure of effect. Annals of Occup. Hygiene 48 (5),

459–462.

Berry, G., M. L. Newhouse, and P. Antonis (1985). Combined effect of asbestos

and smoking on mortality from lung cancer and mesothelioma in factory workers.

British Journal of Industrial Medicine 42, 12–18.

Besag, J. (1974). Spatial interaction and the statistical analysis of lattice systems.

Journal of the Royal Statistical Society: Series B 36, 192–236.

Birmili, W., A. Wiedensohler, J. Heintzenberg, and K. Lehmann (2001). Atmo-

spheric particle number size distribution in central Europe: Statistical relations

to air masses and meteorology. J. Geophys. Res. 106, 32005–32018.


Bowden, J., J. R. Thompson, and P. Burton (2006). Using pseudo-data to correct

for publication bias in meta-analysis. Statistics in Medicine 25, 3798–3813.

Breslow, N. E. and N. E. Day (1987). Statistical methods in cancer research. Vol 2:

The design and analysis of cohort studies. IARC.

Breslow, N. E. and B. E. Storer (1985). General relative risk functions for case-

control studies. American Journal of Epidemiology 122, 149–162.

Brooks, S. P. and A. Gelman (1998a). Alternative methods for monitoring conver-

gence of iterative simulations. Journal of Computational and Graphical Statis-

tics 7, 434–455.

Brooks, S. P. and A. Gelman (1998b). Alternative methods for monitoring conver-

gence of iterative simulations. Journal of Computational and Graphical Statis-

tics 7, 434–455.

Brown, C. C. and C. K. C (1989). Additive and multiplicative models and multistage

carcinogenesis theory. Risk analysis 9, 99–105.

Cappe, O., C. P. Robert, and T. Ryden (2002). Reversible jump MCMC converging

to birth-and-death MCMC and more general continuous time samplers. Journal

of the Royal Statistical Society, Series B 65 (3), 679–700.

Carlin, J. B. (1992). Meta-analysis for 2×2 tables: a Bayesian approach. Statistics

in Medicine 11, 141–158.

Casella, G., C. P. Robert, and M. T. Wells (2004). Mixture models, latent variables

and partitioned importance sampling. Statistical Methodology 1, 1–18.

Celeux, G., F. Forbes, C. P. Robert, and M. Titterington (2003, June). Deviance

Information Criteria for missing data models. Technical report, Institut National

De Recherche en Informatique et en Automatique.

Celeux, G., M. Hurn, and C. P. Robert (2000). Computational and inferential dif-

ficulties with mixture posterior distributions. Journal of the American Statistical

Association 95 (451), 957–970.


Celeux, G., J. M. Marin, and C. P. Robert (2003). Iterated importance sampling in

missing data problems. Technical report, Universite Paris Dauphine.

Chen, L., K. Mengersen, and S. Tong (2006). Spatiotemporal relationship between

particle air pollution and respiratory emergency hospital admissions in Brisbane,

Australia. Science of the Total Environment 373 (1), 57–67.

Cohen, D., S. F. Arai, and J. D. Brain (1979). Smoking impairs long-term dust

clearance from the lung. Science 204, 514–516.

Dal Masso, M., M. Kulmala, and I. Riipinen (2005). Formation and growth of fresh

atmospheric aerosols: eight years of aerosol size distribution data from SMEAR II,

Hyytiälä, Finland. Boreal Environment Research 10, 323–336.

Dalrymple, M. L., I. L. Hudson, and R. P. K. Ford (2003). Finite mixture, zero-inflated

Poisson and hurdle models with application to SIDS. Computational Statistics

& Data Analysis 41, 491–504.

Dempster, A. P., M. R. Selwyn, and B. J. Weeks (1983). Combining historical and

randomized controls for assessing trends in proportions. Journal of the American

Statistical Association 78, 221–227.

Denison, D. G. and C. C. Holmes (2001). Bayesian partitioning for estimating disease

risk. Biometrics 57, 143–149.

DerSimonian, R. and N. Laird (1996). Meta-analysis in clinical trials. Controlled

clinical trials 7, 177–188.

Diebolt, J. and C. P. Robert (1994). Estimation of finite mixture distributions

through Bayesian sampling. Journal of the Royal Statistical Society, Series B 56,

363–375.

Do, K. A., P. Muller, and F. Tang (2005). A Bayesian mixture model for differential

gene expression. Journal of the Royal Statistical Society C 54 (3).

Doll, R. (1971). The age distribution of cancer: implications for models of carcino-

genesis. JRSS(A) 134, 133–166.


Dominici, F., M. Daniels, S. L. Zeger, and J. Samet (2002). Air pollution and

mortality: Estimating regional and national dose-response relationships. Journal

of the American Statistical Association 97 (457), 100–111.

Dominici, F., J. Samet, and S. L. Zeger (2000). Combining evidence on air pollution

and daily mortality from the largest 20 U.S cities: a hierarchical modelling strategy

(with discussion). Journal of the Royal Statistical Society, Series A 163, 263–302.

Dominici, F., A. Zanobetti, S. L. Zeger, J. Schwartz, and J. M. Samet (2004).

Hierarchical bivariate time series models: a combined analysis of the effects of

particulate matter on morbidity and mortality. Biostatistics 5, 341–360.

Dumouchel, W. (1990). Bayesian meta-analysis. In D. A. Berry (Ed.), Statistical

methodology in the Pharmaceutical Sciences, pp. 509–529. New York: Dekker.

Dumouchel, W. and J. E. Harris (1983). Bayes methods for combining the results

of cancer studies in humans and other species (with discussion). Journal of the

American Statistical Association 78, 293–315.

Duval, S. J. and R. L. Tweedie (2000). A non-parametric “trim and fill” method

of accounting for publication bias in meta-analysis. Journal of the American

Statistical Association 95, 89–98.

Egger, M., G. D. Smith, M. Schneider, and C. Minder (1997). Bias in meta-analysis

detected by a simple, graphical test. British Medical Journal 315, 629–634.

Erren, T., M. Jacobsen, and C. Piekarski (1999). Synergy between asbestos and

smoking on lung cancer risks. Epidemiology 10 (4), 405–411.

Fahrmeir, L., T. Kneib, and S. Lang (2004). Penalized structured additive regression

for space-time data: A Bayesian perspective. Statistica Sinica 14 (3), 731–761.

Fernandez, C. and P. J. Green (2002). Modelling spatially correlated data via mix-

tures: A Bayesian approach.


Fruhwirth-Schnatter, S. (2001). Markov chain Monte Carlo estimation of classi-

cal and dynamic switching mixture models. Journal of the American Statistical

Association 96, 194–209.

Fruhwirth-Schnatter, S. and S. Kaufmann (2004). Model-based clustering of multiple

time series. Report, Johannes Kepler Universitat Linz.

Gangnon, R. E. and M. K. Clayton (2000). Bayesian detection and modeling of

spatial disease clustering. Biometrics 56, 922–935.

Gardner, M. J. and D. G. Altman (1989). Statistics with confidence. London: BMJ.

Gatton, M. L., L. A. Kelly-Hope, B. H. Kay, and P. A. Ryan (2004). Spatial-

temporal analysis of Ross River virus disease patterns in Queensland, Australia.

American Journal of Tropical Hygiene and Medicine 71 (5), 629–635.

Gelman, A. (2006). Prior distributions for variance parameters in hierarchical mod-

els. Bayesian analysis 1 (3), 515–533.

Geweke, J. (2007). Interpretation and inference in mixture models: Simple MCMC

works. Computational Statistics & Data Analysis 51, 3529–3550.

Goldberg, M. (1999). Asbestos and cancer risk: the exposure-effect relationship for

populations with occupational exposure. Revue Des Maladies Respiratoires 16,

1278–1285.

Green, P. and S. Richardson (2002). Hidden Markov models and disease mapping.

Journal of the American Statistical Association 97 (460), 1055–1070.

Green, P. J. (1995). Reversible jump Markov chain Monte Carlo computation and

Bayesian model determination. Biometrika 82, 711–732.

Greenland, S. and K. J. Rothman (1998). Modern methods in epidemiology, Volume

2nd edition. Philadelphia: Lippincott-Raven.

Griffin, J. and M. Steel (2004). Semiparametric Bayesian inference for stochastic

frontier models. Journal of Econometrics 123 (1), 121–152.


Guerrero, V. M. and R. A. Johnson (1982). Use of the Box-Cox transformation with

binary response models. Biometrika 69 (2), 309–14.

Guidotti, T. (2002). Apportionment in asbestos-related disease for purposes of com-

pensation. Industrial Health 40, 295–311.

Gustafson, P. and L. J. Walker (2003). An extension of the Dirichlet prior for the

analysis of longitudinal multinomial data. Journal of Applied Statistics 30 (3),

293–310.

Gustavsson, P., F. Nyberg, G. Pershagen, and et al (2002). Low-dose exposure to

asbestos and lung cancer: dose-response relations and interaction with smoking in

a population-based case-referent study in Stockholm, Sweden. American Journal

of Epidemiology 155 (11), 1016–1022.

Hallqvist, J., A. Ahlbom, F. Diderichsen, and et al (1996). How to evaluate interac-

tion between causes: a review of practices in cardiovascular epidemiology. Journal

of Internal Medicine 239, 377–382.

Hamilton, J. D. (1994). Time series analysis. New Jersey: Princeton University

Press.

Harley, D., A. Sleigh, and S. Ritchie (2001). Ross River virus transmission, infection,

and disease: a cross-disciplinary review. Clin Microbiol Rev 14, 909–932.

Hasselblad, V. (1995). Meta-analysis of environmental health data. The Science of

the Total Environment 160/16, 545–558.

Hodgson, J. and A. Darnton (2000). The quantitative risks of mesothelioma and lung

cancer in relation to asbestos exposure. Annals of Occupational Hygiene 44 (8),

565–601.

Hoff, P. D. (2003). Nonparametric modelling of hierarchically exchangeable data.

Technical report, Dept. Statistics, University of Washington.


Hussein, T., M. Dal Masso, and P. T (2005). Evaluation of an automatic algorithm

for fitting the particle number size distributions. Boreal Environment Research 10,

337–355.

Hussein, T., A. Puustinen, P. P. Aalto, J. M. Mäkelä, K. Hämeri, and M. Kulmala

(2004). Urban aerosol number size distributions. Atmos. Chem. Phys. 4, 391–411.

Ibrahim, J. G., M. H. Chen, and D. Sinha (2003). On optimality properties of the

power prior. Journal of the American Statistical Association 98, 204–213.

Ishwaran, H., L. F. James, and J. Sun (2001). Bayesian model selection in finite

mixtures by marginal density decompositions. Journal of the American Statistical

Association.

Jasra, A., C. C. Holmes, and D. A. Stephens (2005). Markov Chain Monte Carlo

methods and the label switching problem in Bayesian mixture modelling.

Statistical Science 20 (1), 50–67.

Kass, R. and A. E. Raftery (1995). Bayes factors. Journal of the American Statistical

Association.

Kelly-Hope, L. A., D. M. Purdie, and B. H. Kay (2004). Ross River virus disease

in Australia, 1886–1998, with analysis of risk factors associated with outbreaks. J

Med Entomology 41, 133–150.

Knorr-Held, L. and G. Rasser (1999). Bayesian detection of clusters and disconti-

nuities in disease maps. Biometrics 56 (13), 13–21.

Kvam, P. and J. Miller (2002). Discrete predictive analysis in probabilistic safety

assessment. Journal of Quality Technology 34 (1), 106–117.

Landrigan, P. (1998). Editorial: Asbestos - Still a carcinogen. N Engl J Med 338,

1618–1619.

Lee, J. and J. O. Berger (2003). Space-time modeling of vertical ozone profiles.

Environmetrics 14 (6), 617–639.

Lee, P. (2001). Relation between exposure to asbestos and smoking jointly and the

risk of lung cancer. Occup Environ Med 58, 145–153.

Lee, P. (2002). Author’s Reply: Joint action of smoking and asbestos exposure on

lung cancer. Occup Environ Med 59, 495–496.

Liddell, F. (2001). The interaction of asbestos and smoking in lung cancer. Ann

Occup Hyg 45 (5), 341–356.

Liddell, F. (2002). Letter: Joint action of smoking and asbestos exposure on lung

cancer. Occup Environ Med 59, 494–495.

Liddell, F. D. K. and B. G. Armstrong (2002). The combination of effects on lung

cancer of cigarette smoking and exposure in Quebec chrysotile miners and millers.

Ann Occup Hyg 46 (1), 5–13.

Lin, M., P. Roche, J. Spencer, A. Milton, P. Wright, and D. Witteveen (2002).

Australia’s notifiable disease status, 2000. Annual report of the National Notifiable

Diseases Surveillance System. Commun Dis Intell 26, 118–175.

Lindley, D. V. and A. F. M. Smith (1972). Bayes estimates for the linear model

(with Discussion). Journal of the Royal Statistical Society, Series B 34, 1–41.

Lu, J. and F. N. Bowman (2004). Conversion of multicomponent aerosol size dis-

tributions from sectional to modal representations. Aerosol Science and Technol-

ogy 38, 391–399.

Lubin, J. H. and W. Gaffey (1988). Relative risk models for assessing the joint

effects of multiple factors. American Journal of Industrial Medicine 13, 131–147.

Makela, J. M., I. K. Koponen, P. Aalto, and M. Kulmala (2000). One-year data of

sub-micron size modes of tropospheric background aerosol in southern Finland. J.

Aerosol Sci. 31, 595–611.

Marin, J. M., K. Mengersen, and C. P. Robert (2005). Bayesian modelling and

inference on mixtures of distributions. In D. Dey and C. R. Rao (Eds.), Handbook

of Statistics, Volume 25. Elsevier-Sciences.

McFallan, S. (2001). Climatic change and its impact on Ross River virus. Masters

thesis, School of Mathematical Sciences, Queensland University of Technology.

McLachlan, G. and D. Peel (2000a). Finite Mixture Models. New York: John Wiley

and Sons Ltd.

McLachlan, G. J. and D. Peel (2000b). Finite Mixture Models. New York: John

Wiley & Sons.

Mejia, J. F., D. Wraith, K. Mengersen, and L. Morawska (2007). Trends in size

classified particle number concentration in subtropical Brisbane, Australia, based

on a 5 year study. Atmospheric Environment 41 (5), 1064–1079.

Nam, I., K. Mengersen, and G. Garthwaite (2003). Multivariate meta-analysis.

Statistics in Medicine 22, 2309–2333.

Osunsanya, T., G. Prescott, and A. Seaton (2001). Acute respiratory effects of

ultrafine particles: mass or number? Occup. Environ. Med. 58, 154–159.

Peto, R., A. D. Lopez, J. Boreham, M. Thun, J. C. Heath, and R. Doll (1996).

Mortality from smoking worldwide. British Medical Bulletin 52, 12–21.

Phillips, D. B. and A. F. M. Smith (1996). Bayesian model comparison via jump

diffusions. In W. R. Gilks, S. Richardson, and D. J. Spiegelhalter (Eds.), Markov

chain Monte Carlo in Practice, pp. 215–40. Boca Raton: Chapman and Hall.

Rafnsson, V. and P. Sulem (2003). Cancer incidence among marine engineers, a

population based study (Iceland). Cancer Causes & Control 14 (1), 29–35.

Raftery, A. E. (1996). Hypothesis testing and model selection. In W. R. Gilks,

S. Richardson, and D. J. Spiegelhalter (Eds.), Markov chain Monte Carlo in prac-

tice, pp. 163–188. Boca Raton: Chapman and Hall.

Reif, A. and T. Heeren (1999). Consensus on synergism between cigarette smoke

and other environmental carcinogens in the causation of lung cancer. Advances in

Cancer Research 76, 161–186.

Richardson, S. and P. J. Green (1997). On Bayesian analysis of mixtures with an

unknown number of components (with discussion). Journal of the Royal Statistical

Society, Series B 59, 731–792.

Robert, C. P. and G. Casella (2004). Monte Carlo Statistical Methods. New York:

Springer-Verlag.

Rosamilia, K., O. Wong, and G. K. Raabe (1999). A case-control study of

lung cancer among refinery workers. Journal of Occupational & Environmental

Medicine 41 (12), 1091–1103.

Rothman, K. (1974). Synergy and antagonism in cause-effect relationships. American

Journal of Epidemiology 99, 385–388.

Rothman, K. (1976). The estimation of synergy or antagonism. American Journal

of Epidemiology 103, 506–511.

Roy, P. and J. Esteve (1998). Using relative risk models for estimating synergy

between two risk factors. Statistics in Medicine 17, 1357–1373.

Russell, R. C. and D. E. Dwyer (2000). Arboviruses associated with human disease

in Australia. Microbes Infect 2, 1693–1704.

Salanti, G., J. P. Higgins, and I. R. White (2006). Bayesian synthesis of epidemio-

logical evidence with different combinations of exposure groups: application to a

gene-gene-environmental interaction. Statistics in Medicine 25, 4147–4163.

Saracci, R. (1977). Asbestos and lung cancer: an analysis of the epidemiological

evidence on the asbestos-smoking interaction. Int J Cancer 20, 323–331.

Saracci, R. (1987). The interactions of tobacco smoking and other agents in cancer

etiology. Epidemiologic Reviews 9, 175–193.

Saracci, R. and P. Boffetta (1994). Interactions of tobacco smoking with other causes

of lung cancer. In J. M. Samet (Ed.), Epidemiology of lung cancer: lung biology

in health and disease.

Schabath, M. B., M. R. Spitz, G. L. Delclos, G. B. Gunn, L. W. Whitehead,

and X. Wu (2002). Association between asbestos exposure, cigarette smoking,

myeloperoxidase (MPO) genotypes, and lung cancer risk. American Journal of

Industrial Medicine 42, 29–37.

Scott, S. L. (2002). Bayesian methods for hidden Markov models: Recursive com-

puting in the 21st Century. Journal of the American Statistical Association 97,

337–351.

Scott, S. L., G. M. James, and C. A. Sugar (2004). Hidden Markov models for

longitudinal comparisons.

Seinfeld, J. and S. N. Pandis (1998). Atmospheric chemistry and physics: from air

pollution to climate change. United States of America: John Wiley and Sons.

Selikoff, I., E. Hammond, and J. Churg (1968). Asbestos exposure, smoking and

neoplasia. JAMA 204, 104–110.

Silliman, N. P. (1997). Hierarchical selection models with applications in meta-

analysis. Journal of the American Statistical Association 92, 926–936.

Sogacheva, L., M. Dal Maso, V. Kerminen, and M. Kulmala (2005). Probability

of nucleation events and aerosol particle concentration in different air masses

arriving at Hyytiälä, southern Finland, based on back trajectories analysis. Boreal

Environment Research 10, 479–491.

Spiegelhalter, D. J., K. R. Abrams, and J. P. Myles (2004). Bayesian approaches to

clinical trials and health-care evaluation. Statistics in practice. Chichester: Wiley.

Spiegelhalter, D. J., A. Thomas, and N. G. Best (2002). WinBUGS version 1.4 user

manual. Research report. Cambridge: Medical Research Council Biostatistics.

Stayner, L. T., R. Smith, J. Bailer, et al. (1997). Exposure-response analysis

of risk of respiratory disease associated with occupational exposure to chrysotile

asbestos. Occupational and Environmental Medicine 54, 646–652.

Steenland, K. and M. Thun (1986). Interaction between tobacco smoking and oc-

cupational exposures in the causation of lung cancer. J Occup Med 28, 110–118.

Stephens, M. (2000a). Bayesian analysis of mixture models with an unknown number

of components - an alternative to reversible jump methods. Annals of Statistics 28,

40–74.

Stephens, M. (2000b). Dealing with label switching in mixture models. Journal of

the Royal Statistical Society, Series B 62, 795–809.

Sutton, A. J. and K. R. Abrams (2001). Bayesian methods in meta-analysis and

evidence synthesis. Statistical methods in medical research 10, 277–303.

Sutton, A. J. and J. P. T. Higgins (2007). Recent developments in meta-analysis.

Statistics in Medicine (Online 18 April 2007).

Sutton, A. J., F. Song, S. Gilbody, and K. R. Abrams (2000). Modelling publication

bias in meta-analysis: a review. Statistical methods in medical research 9, 421–445.

Thomas, D. C. (1981). General relative-risk models for survival time and matched

case-control analysis. Biometrics 37, 673–686.

Thompson, S. G. and S. J. Sharp (1999). Explaining heterogeneity in meta-analysis:

A comparison of methods. Statistics in Medicine 18, 2693–2708.

Titterington, D. M., A. F. M. Smith, and U. E. Makov (1985). Statistical Analysis

of Finite Mixture Distributions. Chichester: Wiley.

Tong, S. (2004). Ross River virus disease in Australia: epidemiology, socioecology

and public health response. Internal Medicine Journal 34, 58–60.

Tong, S. and W. Hu (2002). Different responses of Ross River virus to climate

variability between coastline and inland cities in Queensland, Australia. Occup

Environ Med 59, 739–744.

Tritchler, D. (1999). Modelling study quality in meta-analysis. Statistics in

Medicine 18, 2135–2145.

Tweedie, R. L., B. Biggerstaff, D. Scott, and K. Mengersen (1996). Bayesian

meta-analysis with application to studies of ETS and lung cancer. Lung Can-

cer 14 (Suppl 1), S171–S194.

Ulvestad, B., K. Kjaerheim, J. I. Martinsen, et al. (2002). Cancer incidence

among workers in the asbestos-cement producing industry in Norway. Scandina-

vian Journal of Work Environment and Health 28 (6), 411–417.

UNSCEAR (1982). Ionizing radiation: Sources and biological effects. Technical

report, United Nations Scientific Committee on the Effects of Atomic Radiation.

Vainio, H. and P. Boffetta (1994). Mechanisms of the combined effect of asbestos

and smoking in the etiology of lung cancer. Scandinavian Journal of Work Envi-

ronment & Health 20, 235–242.

van der Linde, A. and G. Osius (2001). Estimation of non-parametric multivariate

risk functions in matched case-control studies with application to the assessment

of interactions of risk factors in the study of cancer. Statistics in Medicine 20,

1639–1662.

Vesala, T., J. Haataja, P. Aalto, et al. (1998). Long-term field measurements of

atmospheric-surface interactions in boreal forest ecology, micrometeorology, aerosol

physics, and atmospheric chemistry. Trends in Heat, Mass and Momentum Trans-

fer 4, 17–35.

Waage, H., L. Vatten, and E. Opedal (1997). Smoking intervention in subjects at

risk of asbestos-related lung cancer. Am J Ind Med 31, 705–12.

Walker, A. (1981). Proportion of disease attributable to the combined effect of two

factors. Int J Epidemiol 10, 81–85.

Walshaw, D. (2000). Modelling extreme wind speeds in regions prone to hurricanes.

Applied Statistics 49 (1), 51–62.

Watanabe, T. (2000). A Bayesian analysis of dynamic bivariate mixture models: Can

they explain the behaviour of returns and trading volume? Journal of Business

and Economic Statistics 18 (2), 199–210.

West, M. and P. J. Harrison (1997). Bayesian Forecasting and Dynamic Models (2nd

ed.). New York: Springer-Verlag.

Whitby, E. (1978). The physical characteristics of sulfur aerosols. Atmos. Envi-

ron. 12, 135–159.

Whitby, E. and P. H. McMurry (1997). Modal aerosol dynamics modeling. Aerosol

Sci. Technol. 27, 673–688.

Whitby, E., P. H. McMurry, and U. Shanker (1991). Modal aerosol dynamics mod-

elling. Technical report, U.S. Environment Protection Agency, Atmospheric Re-

search and Exposure Assessment Laboratory.

Whitby, E., F. Stratmann, and M. Wilck (2002). Merging and remapping modes in

modal aerosol dynamics models: a ’dynamic mode manager’. Aerosol Science 33,

623–645.

Wildner, M. and A. Markuzzi (1997). Interaction and model selection. Journal of

Internal Medicine 241 (6), 535–536.

Wolpert, R. L. and K. Mengersen (2004). Adjusted likelihood for synthesising empiri-

cal evidence from studies that differ in quality and design: effects of environmental

tobacco smoke. Statistical Science 19, 450–471.

Wraith, D. and K. Mengersen (2007). Assessing the combined effect of asbestos expo-

sure and smoking on lung cancer: A Bayesian approach. Statistics in Medicine (28

Feb.), 1150–1169.

Xu, Z., M. Gautam, and S. Mehta (2002). Cumulative frequency fit for particle size

distribution. Applied Occupational and Environmental Hygiene 17 (8), 538–42.
