Modelling the spread of influenza in Western Australia

8
Modelling the spread of influenza in Western Australia Aldo F. Saavedra Centre for Data Science The University of Sydney, Sydney, NSW, Australia Aldo.Saavedra@ sydney.edu.au Sally Wood Business School The University of Sydney Sydney, NSW, Australia [email protected] Jemma L. Geoghegan Marie Bashir Institute for Infectious Diseases and Biosecurity, Charles Perkins Centre, The University of Sydney, Sydney, NSW, Australia jemma.geoghegan@ sydney.edu.au Edwards Holmes Marie Bashir Institute for Infectious Diseases and Biosecurity, Charles Perkins Centre, The University of Sydney, Sydney, NSW, Australia [email protected] Hugh Durrant-Whyte Centre for Data Science The University of Sydney, Sydney, NSW, Australia hugh.durrantwhyte @sydney.edu.au ABSTRACT The daily number of confirmed cases of influenza throughout Western Australia, at a particular point in space and time is modelled nonparametrically by assuming that the obser- vation is generated from a mixture of Poisson distributions where the log of the expected value of each component in the mixture has a priori a stationary Gaussian Process pri- ors (GPP). The weights attached to the components in the mixture are parameterised to depend upon factors which are specific to a particular point in space and time, such as the population density, the time of year, the current weather conditions, the age distribution and human movement pat- terns as well as the attributes of the virus. This mixture model serves two purposes. First it seeks to identify the spread of influenza at a point in space-time as belonging to one of a finite but unknown number of possible influenza propagation signatures. Second it addresses the issue that the space-time covariance structure for influenza counts is likely to be nonstationary, by attaching different weights to the mixture components at different points in space and This paper was presented at the First International Work- shop on Population Informatics for Big Data (PopInfo’15), Sydney, 10th of August 2015. Copyright of this work is with the authors Corresponding author Corresponding author time. This allows the covariance to change across space- time. In this first study we focus on the data collected by a surveillance program run by Department of Health of West- ern Australia. Predictive distributions of influenza counts are obtained at a particular point in space and time, after accounting for the certainty surrounding the possible mod- els as well as the uncertainty surrounding the parameters which prescribe these models. The multidimensional inte- gration required to obtain these predictive distributions is performed using reversible jump Markov chain Monte Carlo (RJMCMC). 1. INTRODUCTION The spatial and temporal spread of an infectious diseases such as influenza are problems that readily lend itself to be modeled using non-parametric Bayesian methods[1]. Epidemics of influenza occur every year are a considerable cost to the community [2]. They tend to be seasonal, taking place during the winter in temperate zones [3] while in trop- ical regions the seasonality is less pronounced, and in some locations, non-existent. [4]. The relative isolation of the Australian continent, the di- versity of its climate and the large distances between major urban centres provide a unique opportunity to study drivers behind the spread of influenza. Important insights have been gathered from simulation featuring air, surface transporta- tion and human movement [5, 6, 7, 8, 3]. In developing a model for the spread of influenza wish it to have two prop- erties (a) It is intrepretable and parsimonious and (b) Is flexible enough to give good estimates for a large range of functions and capture complex features. Choosing a para- metric model for example, a linear model - satisfies the first requirement, but not the second, whereas a nonparametric model such as a GPP satisfies both requirements.

Transcript of Modelling the spread of influenza in Western Australia

Page 1: Modelling the spread of influenza in Western Australia

Modelling the spread of influenza in Western Australia∗

Aldo F. Saavedra†

Centre for Data ScienceThe University of Sydney,Sydney, NSW, Australia

[email protected]

Sally Wood‡

Business SchoolThe University of SydneySydney, NSW, Australia

[email protected]

Jemma L. GeogheganMarie Bashir Institute

for InfectiousDiseases and Biosecurity,Charles Perkins Centre,

The University of Sydney,Sydney, NSW, Australia

[email protected]

Edwards HolmesMarie Bashir Institute

for InfectiousDiseases and Biosecurity,Charles Perkins Centre,

The University of Sydney,Sydney, NSW, Australia

[email protected]

Hugh Durrant-WhyteCentre for Data Science

The University of Sydney,Sydney, NSW, [email protected]

ABSTRACTThe daily number of confirmed cases of influenza throughoutWestern Australia, at a particular point in space and timeis modelled nonparametrically by assuming that the obser-vation is generated from a mixture of Poisson distributionswhere the log of the expected value of each component inthe mixture has a priori a stationary Gaussian Process pri-ors (GPP). The weights attached to the components in themixture are parameterised to depend upon factors which arespecific to a particular point in space and time, such as thepopulation density, the time of year, the current weatherconditions, the age distribution and human movement pat-terns as well as the attributes of the virus. This mixturemodel serves two purposes. First it seeks to identify thespread of influenza at a point in space-time as belonging toone of a finite but unknown number of possible influenzapropagation signatures. Second it addresses the issue thatthe space-time covariance structure for influenza counts islikely to be nonstationary, by attaching different weightsto the mixture components at different points in space and

∗This paper was presented at the First International Work-shop on Population Informatics for Big Data (PopInfo’15),Sydney, 10th of August 2015. Copyright of this work is withthe authors†Corresponding author‡Corresponding author

time. This allows the covariance to change across space-time. In this first study we focus on the data collected by asurveillance program run by Department of Health of West-ern Australia. Predictive distributions of influenza countsare obtained at a particular point in space and time, afteraccounting for the certainty surrounding the possible mod-els as well as the uncertainty surrounding the parameterswhich prescribe these models. The multidimensional inte-gration required to obtain these predictive distributions isperformed using reversible jump Markov chain Monte Carlo(RJMCMC).

1. INTRODUCTIONThe spatial and temporal spread of an infectious diseasessuch as influenza are problems that readily lend itself to bemodeled using non-parametric Bayesian methods[1].

Epidemics of influenza occur every year are a considerablecost to the community [2]. They tend to be seasonal, takingplace during the winter in temperate zones [3] while in trop-ical regions the seasonality is less pronounced, and in somelocations, non-existent. [4].

The relative isolation of the Australian continent, the di-versity of its climate and the large distances between majorurban centres provide a unique opportunity to study driversbehind the spread of influenza. Important insights have beengathered from simulation featuring air, surface transporta-tion and human movement [5, 6, 7, 8, 3]. In developing amodel for the spread of influenza wish it to have two prop-erties (a) It is intrepretable and parsimonious and (b) Isflexible enough to give good estimates for a large range offunctions and capture complex features. Choosing a para-metric model for example, a linear model - satisfies the firstrequirement, but not the second, whereas a nonparametricmodel such as a GPP satisfies both requirements.

Page 2: Modelling the spread of influenza in Western Australia

Time (Month/Year)

case

s/day

050

100

150

200

01/0

7

07/0

7

01/0

8

07/0

8

01/0

9

07/0

9

01/1

0

07/1

0

01/1

1

07/1

1

01/1

2

07/1

2

01/1

3

07/1

3

01/1

4

07/1

4

01/1

5

Figure 1: The number of confirmed cases determined by PathWest between 2007 and 2014.

2. DATAThe data was collected by a surveillance program run by De-partment of Health of Western Australia[9]. Medical centresdistributed throughout Western Australia report on patientsexhibiting symptoms of an ‘Influenza like illness’. The dateof the patient reporting the symptoms and their postcodeof residence are recorded. A fraction of the reported casesundergo laboratory tests performed by PathWest[10] to con-firm whether the illness is influenza. Only confirmed casesare included in this study and as such the method only sam-ples a fraction of the total of the infected population at anyone time.

Figure 1 shows the number of confirmed cases collected bythe program between 2007 and 2014. The seasonal trend isthe most prominent feature of the histogram with the peakslocated around late August or early September each year.The severity of the influenza season cannot be determinedfrom the height of peak alone. In figure 1 the highest peakcorresponds to with the emergence of a novel swine-origininfluenza A (H1N1) [11] which prompted more symptomaticpeople to seek a medical diagnosis.

Figure 2 shows the spatial distribution of influenza weeklycounts a function of reported date for 2010. The left panelshows the distribution of the latitude of the postcode of res-idence of each case as a function of the reported date whilethe right panel shows the distribution corresponding to thelongitude. The latitude and longitude associated with eachpostcode corresponds to an average of the positions of allthe localities that share the postcode.

As expected, Figure 2, shows that the highest concentrationof cases is in and around Perth (-31.9,115.9). The figure alsoshows that in additiona to spatial depency there is also atemporal dependence. For example, the number of reportedcases in Albany (-35.0, 117.9), coincides with the peak inPerth although it is ≃ 400 km south of the city while Kar-

ratha (-20.7,116.8), ≃ 1500 km north of Perth, also reportsa small number of cases in time with Perth.

An interesting feature found in Figures 1 and 2 is the numberof cases reported between the seasonal peaks. The numberof these ‘out of season’ cases is small but are persistentlydistributed across the state and not just limited to ”humid-rainy” [12]. To understand the drivers behind such subtlefeatures together with the main seasonal feature, the ulti-mate goal of this study is to build a nonparametric modelthat will include diverse data such as temperature, rela-tive humidity, daylight hours, human movement and terrain.Such a model would provide insight into the drivers behindthe temporal and spatial spread of influenza.

Our first step in our study is to outline the model and priorsassociated with this study.

3. MODEL AND PRIORS

3.1 Model for Mixture ComponentsLet ytuivi , be the observed disease count at time t, latitudeui and longitude vi, for t = 1, . . . , T , i = 1, . . . , n. Weassume that this observation is generated from a mixture ofa finite but unknown number of Poisson probability massfunctions (PMF), with mean λtuivi which is a function ofspace and time. However for now, we limit our discussion toa fixed point in time, t, and develop a model for the spatialdistribution of yyyt = (ytu1v1 , . . . , ytunvn), and in what followsthe subscript t will be dropped for ease of notation. Theextension to a time varying distribution is the subject offuture work.

For a fixed number of components in the mixture, r, eachcomponent has a different mean spatial surfaceλλλj = (λu1v1j , . . . , λunvnj), for j = 1, . . . , r where subscriptj denotes the component in the mixture. To place a prioron these mean surfaces we assume that, ηηηj = log(λλλj) is a

Page 3: Modelling the spread of influenza in Western Australia

Gaussian process (GP), with covariance structure definedby the reproducing kernel for a thinplate smoothing spline[13, 14]. The overall log of the mean surface is modelledas a weighted average of these mixture components, wherethe weights attached to the components are parameterizedto depend on space, as well as on other covariates, suchas population density, current weather conditions, humanmovement patterns and virus type.

This parameterization has two purposes. First it allows theprior on η to be a nonstationary GP. This nonstationaritycan arise because the covariance of η can change across spaceor it can arise in response to changes in other variables suchas climate conditions. Second it allows for an overdispersedmodel, so that at a particular point in space the variance ofthe data need not equal the mean as must be the case if thedata arose from a single Poisson component.

Suppose there are P possible variables which may give riseto nonstationarity, and that the values of these P covari-ates at a point in space are contained in the vector zzzuv =(1, z1uv, . . . , zPuv). For a fixed number of r components, thePMF of y is

Pr(yuv=k|r, ηηη, zzzuv) =

r∑

j=1

Pr(yuv=k|ηjuv, γuv=j, r)×Pr(γuv=j|zzzuv)

(1)where

Pr(yuv=k|ηjuv, γuv=j, r) =exp(ηjuvk) exp(− exp(ηjuv)

k!,

ηηηr = (ηηη1, . . . , ηηηr), and γuv is an indicator variable denotingthe component in the mixture to which the observation be-longs. Note that although all parameters should be indexedby r, to indicate that parameters will be of different dimen-sion and take on different values for different values of r, wehave omitted this subscript for clarity.

3.2 Model for Mixing WeightsThe mixing weights are modelled using a multinomial logis-tic regression so that

Pr(γuv = j) =exp(zuvδδδj)

∑r

l=1 exp(zuvδδδl)

for j = 1, . . . , r and δδδjr = (δ0jr, δ1jr, . . . , δPjr) are the re-gression coefficients with δδδr set to 000 for identifiability.

3.3 The LikelihoodThe likelihood function for the data for a fixed number of rcomponents is

Lr =

n∏

i=1

r∑

j=1

exp(ηjuiviyuivi) exp(− exp(ηjuivi))

yuivi !Pr(γuivi=j)

We note that observations are only recorded if there is atleast one individual who has a confirmed case of influenza,so that the likelihood we work with is

n∏

i=1

r∑

j=1

Pr(yuivi=k|yuivi>0, ηjuivi , γuivi=j, r) Pr(γuivi = j)

(2)

where

Pr(yuv=k|yuv>0, ηjuv, γuv=j) =Pr(yuv=k|ηjuv, γuv=j)

1− exp(exp(ηjuv))

and Pr(yuv=k|ηjuv, γuv=j) is given by (1).

3.4 Model for Number of Mixture ComponentsWe consider the number of components in the mixture r tobe a random variable. The PMF for yuv unconditional onthe number of components is,

Pr(yuv=k|ΘΘΘ, zzzuv) =

R∑

r=1

Pr(yuv=k|r, ˜ηηηruv,∆r, zzzuv) Pr(r)

where ΘΘΘ = (θθθ1, . . . , θθθR), is the set of all parameters whichspecify a mixture an unknown but finite number of com-ponents with θθθr = ( ˜ηηηruv,∆r) and ∆r = (δδδ1r, . . . , δδδrr) forr = 1, . . . , R. The quantity Pr(r) is the prior probabilitythat the mixture has r components.

The likelihood function now becomes

L =R∑

r=1

lr Pr(r)

and Lr is given by (2).

3.5 PriorsFor a given number of components, r, we write the log meanof the jth component, ηj , as

ηj(ui, vi) = α0j + α1jui + α2jvi + fj(ui, vi)

and place a Gaussian Process prior on fj , so that fj ∼N(0, τ2

j Ω) where Ω is the covariance matrix correspondingto a thinplate spline prior, the elements of which are givenin [13], pg 30. The parameter τ2 controls the tradeoff be-tween the goodness of fit and smoothness. If τ2 = 0, thenη is linear in space, and as τ2 → ∞, η interpolates thedata. The prior for η is completed by specifying priors forαααj = (α0j , α1j , α2j) and τ2. We assume that αααj ∼ N(0, cαI)and τ2

j ∼ U [0, cτ ] for some large cα and cτ , for j = 1, . . . , r.

To place a prior on the parameters of the mixing functions,∆r, we followed [15] assume that δδδj ∼ N(0, cδZ

′Z−1).

We consider two priors for the number of components in themixture. The first is Pr(r = k) = 1/R for k = 1, . . . , R andthe second is a truncated Poisson with upper limit R andmean (µr) = 2.

3.6 Predictive DistributionThe predictive distribution of the number of counts, y∗ atlocation (u∗, v∗) given the data is

Pr(y∗

u,v = k|yyy) =

R∑

r=1

Pr(yu,v = k|yyy, r, θθθr)p(θθθr|r, yyy)dθθθr Pr(r|yyy)

and we use reversible jumpMarkov chain Monte Carlo (RJM-CMC) to perform the required multidimensional integration.

The sampling scheme is broken into two steps; a betweenmodel move where the number of components potentiallychanges and a within model move where the parameters fora model of a fixed number of r components, θθθr are updated

Page 4: Modelling the spread of influenza in Western Australia

1. Between Model Move

This is the subject of future work, but the aim is tofollow [16] as described below. Let the value of thecurrent number of components in the chain be denotedby rc, and value of the parameters which prescribe amodel with rc components be denoted by θθθcrc . Wepropose to move the chain from (rc, θθθcrc) to (rp, θθθprp),by

by drawing (rp, θθθprp) from a proposal density q(rp, θθθprp |rc, θθθcrc)

and accepting this draw with probability

α = min

1,p(rp, θθθprp |xxx)× q(rc, θθθcrc |r

p, θθθprp)

p(rc, θθθcrc |xxx)× q(rp, θθθprp |rc, θθθcrc)

,

where p(·) denotes a target density, which is the prod-uct of an approximate likelihood times prior densities.The proposal density q(rp, θθθprp |r

c, θθθcrc).

q(rp, θθθprp |rc, θθθcrc) = q(rp|rc)× q(θθθprp |r

p, rc, θθθcrc)

= q(rp|rc)× q(δδδprp , τττ2prp , ηηη

prp |r

p, rc, θθθcrc)

= q(rp|rc)× q(δδδprp |rp, rc, θθθcrc)

× q(τττ2prp |δδδ

prp , r

p, rc, θθθcrc)

× q(ηηηprp |τττ

2prp , δδδ

prp , r

p, rc, θθθcrc).

Thus, (rp, θθθprp) is drawn by first drawing rp, followedby δδδprp , τττ

2prp and finally ηηηp

rp .

2. Within Model Move

For this type of move, r is fixed, and so the notation in-dicating the dependence on the number of componentsis dropped. Suppose the chain is at ηηηk = (ηηηk

1 , . . . , ηηηkr ),

γγγk = (γku1,v1 , . . . , γ

kun,vn), τττ2k = (τ2k

1 , . . . , τ2kr ), and

δδδk. We move the chain to ηηηk+1, γγγk+1, τττ2(k+1), and∆∆∆k+1, via a kernel which consists of four parts;

(a) Drawing ηηηk+1.

For j = 1, . . . , r a value of ηηηpj is proposed from

q(ηηηj |γγγk, τ2k

j ) where q is a Gaussian distributionwith mean ηηηj and covariance Σj where

ηηηj = argmaxηηη

j

l(ηηηj)

with

l(ηηηj) =n∑

i∈Aj

(yiηij − exp(ηij))− 1/2ηηη′D−1ηηη

and Aj is the set of integers, i, for which γuivi =j, for i = 1, . . . , n, and D the prior variance of

ηηηj . The covariance matrix, Σj , is equal to∂2l(ηηη

j)

∂ηηηj2

evaluated at ηηηj . Then with probability

α = min

1,p(ηηηp

j |γγγc, τττ2c)× q(ηηηc

j |γγγc, τττ2c)

p(ηηηcj |γγγ

c, τττ2c)× q(ηηηpj |γγγ

c, τττ2c)

,

ηηηk+1 = ηηηp otherwise ηηηk+1 = ηηηk

(b) Drawing τττ2(k+1)

Conditional on ηηηk+1j , τ

(k+1)j is drawn from an in-

verse gamma distribution and accepted with prob-ability 1.

(c) Drawing γγγk+1

Conditional on ηηηk+1, and ∆∆∆k we compute the

probability that observation yi is generated fromcomponent j, which is given by

Pr(γi=j|yi,∆∆∆, ηηη) =

=Pr(yi|γi=j, ηηηj) Pr(γi=j|∆∆∆)

∑rk=1 Pr(yi|γi=k, ηηηk) Pr(γi=k|∆∆∆)

= πij

and γk+1i is drawn from a multinomial distribu-

tion with probabilities πππi = (πi1, . . . , πir) for i =1, . . . , n.

(d) Drawing ∆∆∆k+1

Let zzzi be an r × 1 indicator vector with zji = 1if γi = j and zji = 0 otherwise, for j = 1, . . . , r.A value of ∆∆∆p is proposed from q(∆∆∆|γγγk+1) where

q is a Gaussian distribution with mean ∆∆∆ andcovariance Σ where

∆∆∆ = argmax∆∆∆

l(∆∆∆)

with

l(∆∆∆) =n∑

i=1

r∑

j=1

zij log(πij)−1

2cδ

r∑

j=1

δδδ′jδδδj .

The covariance matrix Σ is ∂2l(∆∆∆)

∂∆∆∆2 evaluated at

∆∆∆. Then with probability

α = min

1,p(∆∆∆p|γγγ)× q(∆∆∆k|γγγ)

p(∆∆∆k|γγγ)× q(∆∆∆p|γγγ)

,

∆∆∆k+1 = ∆∆∆p otherwise ∆∆∆k+1 = ∆∆∆k.

4. RESULTS AND DISCUSSIONWe first illustrate the method and evaluate its performanceon simulated data. To study the frequentist property of ourtechnique we will generate 50 realizations from the modelgiven by (1), with n = 1600, r = 1, 2, λi1 = 15+8 sin(6πui)×sin(6πvi), λi2 = 111n,

Pr(γi = 2) =exp(17ui + 17vi − 24)

1 + exp(17ui + 17vi − 24)= πi

. For each realization we will compute the deviance measure

dr =n∑

i=1

(log(λi./λri ) + (λr

i − λ),

where λi, is the true mean function and equals λi1(1− π) +

λi2π and λri is the posterior mean function for a mixture of

r components, for r = 1, 2. Initial results suggest that thereis a considerable reduction in deviance in using a mixture oftwo GPP. The full results will be presented in a later versionof this paper.

Figure 3 illustrates the method on a single simulated ex-ample. Panel (a) is a plot of the function with the data,panels (b) and c are plots of the estimated posterior meansurface for a one and two component respectively. Figure 3shows that an estimate based on a mixture of two compo-nents clearly outperforms the single mixture estimate. Themixture of two component does a better job of capturing thepeak in the mean function while remaining smooth.

Page 5: Modelling the spread of influenza in Western Australia

−40

−38

−36

−34

−32

−30

−28

−26

−24

−22

−20

−18

−16

−14

−12

−10

02/10 04/10 06/10 08/10 10/10 12/10Date of reporting

Lat

itiu

de

1

7

50

N Cases

112

114

116

118

120

122

124

126

128

130

02/10 04/10 06/10 08/10 10/10 12/10Date of reporting

Lon

gitu

de

1

7

50N Cases

Figure 2: The distribution of flu cases as a function of latitude vs time (left) and longitude vs time (right).

1

0.5

(a)

00

0.5

25

20

15

10

5

0

1

1

0.5

(b)

00

0.5

1

0

20

18

16

14

12

10

8

6

4

2

1

0.5

(c)

00

0.5

1

15

5

0

10

(a) (b) (c)

u v u v u v

Figure 3: Illustration of the simulation example. Panel (a) shows the true 2d function as a surface and the data pointsobtained from the function. Panels (b) and (c) are plots of the estimated posterior mean surface obtained for a one and twocomponent respectively.

To motivate the application of the method to disease map-ping we provide preliminary analysis to demonstrate thatinfluenza counts across WA exhibit very local structure. Todo this we analyse influenza count data in four subsets of in-creasing size. The first subset are those counts correspond-ing to patients who reside in a postcode within 100km ofthe CBD in Perth; the second and third subsets correspondto those patients who live within 500km and 1000km fromthe CBD respectively. The fourth subsets is the data for theentire state of WA.

The model for the data is a single GPP with the reproducingkernel given by a thinplate smoothing spline prior. The dataand the estimates of the posterior means for each of thesefour subsets are given in the figures

• distance < 100 km. The data is shown in figure 4aand posterior mean in figure 4c.

• distance < 500 km The data is shown in figure 4b andposterior mean in figure 4d.

• distance < 1000 km The data is shown in figure 5a

and posterior mean in figure 5c.

• No selection. The data is shown in figure 5b and pos-terior mean in figure 5d.

These figures show how sensitive modelling of influenza datarequire a method which can detect highly localized struc-ture. In figure 4c one can clearly see the distribution ofcases, north and south of the Perth and this is reflectedin the contours obtained from the model in figure 4c. Asthe area associated with the data increases, these two peakmerge into one dominant peak, with other main sources ofcases included in the model.

For example in figure 5c small peaks correspond to Gerald-ton (-28.74,114.62) and Kalgoorlie (-30.75,121.47) and in fig-ure 5d, which includes the data from the whole state, peakscorresponds with the area of Broome (-17.96,122.24) andKarratha (-20.74,116.85). The limitation of patients justreporting their postcode of residence becomes apparent insparsely populated areas such as the north of Western Aus-tralia. For example, there are 8 localities in Broome thatshare the same postcode of 6725. This results in any spatial

Page 6: Modelling the spread of influenza in Western Australia

structure that may have existed being artificially erased.

The results presented in this paper only include a small pro-portion of the possible data available for such study. It isexpected that future results will include a greater amountof data that includes all the Australian states. The resultsshow via simulation that the a mixture of GPP provides abetter model for data which exhibits localized structure andthat modelling disease counts needs a technique that is ca-pable to do this. It is the subject of future work to apply themixture model disease data across, where the mixing com-ponents are a function of space and other variables such asdemographic data and climate data, to determine whetherby employing a mixture of them will enable the model to besensitive to both, small and large spatial structure. The laststep in the study will be to develop the time component ofthe model.

References[1] Larry P Ammann. Bayesian nonparametric inference

for quantal response data. The Annals of Statistics,pages 636–645, 1984.

[2] Lone Simonsen. The global impact of influenza on mor-bidity and mortality. Vaccine, 17:S3–S10, 1999.

[3] Cecile Viboud, Ottar N Bjørnstad, David L Smith,Lone Simonsen, Mark A Miller, and Bryan T Grenfell.Synchrony, waves, and spatial hierarchies in the spreadof influenza. science, 312(5772):447–451, 2006.

[4] Wladimir Jimenez Alonso, Julia Guillebaud, Cecile Vi-boud, Norosoa Harline Razanajatovo, Arnaud Orelle,Steven Zhixiang Zhou, Laurence Randrianasolo, andJean-Michel Heraud. Influenza seasonality in madagas-car: the mysterious african free-runner. Influenza and

Other Respiratory Viruses, 9(3):101–109, 2015.

[5] I. M. Longini, P. E. Fine, and S. B. Thacker. Predict-ing the global spread of new infectious agents. Am. J.

Epidemiol., 123(3):383–391, Mar 1986.

[6] R. F. Grais, J. H. Ellis, and G. E. Glass. Assessingthe impact of airline travel on the geographic spread ofpandemic influenza. Eur. J. Epidemiol., 18(11):1065–1072, 2003.

[7] R. F. Grais, J. H. Ellis, A. Kress, and G. E. Glass.Modeling the spread of annual influenza epidemics inthe U.S.: the potential role of air travel. Health Care

Manag Sci, 7(2):127–134, May 2004.

[8] A. Flahault, S. Letrait, P. Blin, S. Hazout, J. MAl’-

narAl’s, and A. J. Valleron. Modelling the 1985 in-fluenza epidemic in france. Statistics in Medicine,7(11):1147–1155, 1988.

[9] The Department of Health of Western Australia, 189Royal Street, East Perth WA 6004, Australia.

[10] PathWest, J Block Hospital Avenue, QEII MedicalCentre, Nedlands WA 6009, Australia.

[11] Emergence of a novel swine-origin influenza a (h1n1)virus in humans. New England Journal of Medicine,360(25):2605–2615, 2009. PMID: 19423869.

[12] James D Tamerius, Jeffrey Shaman, Wladmir J Alonso,Kimberly Bloom-Feshbach, Christopher K Uejio, An-drew Comrie, and Cecile Viboud. Environmental pre-dictors of seasonal influenza epidemics across temper-ate and tropical climates. PLoS Pathog, 9(3):e1003194,2013.

[13] G. Wahba. Spline models for observational data.CBMS-NSF regional conference series in applied math-ematics. Society for Industrial and Applied Mathemat-ics, 1990.

[14] S A. Wood, Jiang Wenxin, and Martin Tanner.Bayesian mixture of splines for spatially adaptive non-parametric regression. Biometrika, 89(3):513 – 528,2002.

[15] E J. Cripps, S A. Wood, R.E Wood, and J. Lau. Mod-elling the impact of personality on individual perfor-mance behavior with a time-varying mixture of mono-tonic random effects, 2014.

[16] S A. Wood, Ori. Rosen, and R.J. Kohn. Bayesian mix-tures of autoregressive models. Journal of Computa-

tional and Graphical Statistics, 20, 2011.

Page 7: Modelling the spread of influenza in Western Australia

Perth

−32.6

−32.4

−32.2

−32.0

−31.8

−31.6

−31.4

115.4 115.6 115.8 116.0 116.2 116.4 116.6 116.8Longitude

Lat

itude

1

2

3

4

5

10

25

50

100

N Cases

(a)

Perth

Albany

Geraldton

Bunbury

−35.5

−34.5

−33.5

−32.5

−31.5

−30.5

−29.5

−28.5

114 115 116 117 118 119Longitude

Lat

itude

1

2

3

4

5

10

25

50

100

250

N Cases

(b)

Perth

−32.6

−32.4

−32.2

−32.0

−31.8

−31.6

−31.4

115.6 116.0 116.4 116.8Longitude

Lat

itude

10

20

30

40

50

Model

(c)

Perth

Albany

Geraldton

Bunbury

−34

−32

−30

−28

114 115 116 117 118 119Longitude

Lat

itude

5

10

15

20

25

30

Model

(d)

Figure 4: The spatial distribution of the number of cases between June and mid October for the year 2009 shown within aradius of 100 km (a) and 500 (b). The origin of the radius is the location of postcode 6000 (-31.92,115.91) which correspondsto the centre of Perth. Figures (c) and (d) show the contour of the model obtained using the data shown in figure (a) and (b)respectively.

Page 8: Modelling the spread of influenza in Western Australia

Perth

Albany

Geraldton

Kalgoorlie

BunburyEsperance

Carnarvon

−35.5

−33.5

−31.5

−29.5

−27.5

−25.5

−23.5

112 114 116 118 120 122 124Longitude

Lat

itude

1

2

3

45

10

25

50

100

400

600

N Cases

(a)

Perth

Albany

Geraldton

Kalgoorlie

Bunbury

Port Hedland

Broome Halls Creek

Esperance

Carnarvon

−36

−32

−28

−24

−20

−16

112.5 114.5 116.5 118.5 120.5 122.5 124.5 126.5 128.5Longitude

Lat

itude

1

2

3

45

10

25

50

100

400

1000

N Cases

(b)

Perth

Albany

Geraldton

Kalgoorlie

BunburyEsperance

Carnarvon

−35.0

−32.5

−30.0

−27.5

−25.0

112.5 115.0 117.5 120.0 122.5 125.0Longitude

Lat

itude

10

15

20

25

30

Model

(c)

Perth

Albany

Geraldton

Kalgoorlie

Bunbury

Port Hedland

Broome Halls Creek

Esperance

Carnarvon

−35

−30

−25

−20

−15

115 120 125 130Longitude

Lat

itude

10

15

20

25

30

Model

(d)

Figure 5: The spatial distribution of the number of cases between June and mid October for the year 2009 shown within aradius of 1000 km (a) and no restriction (b). The origin of the radius is the location of postcode 6000 (-31.92,115.91) whichcorresponds to the centre of Perth. Figures (c) and (d) show the contour of the model obtained using the data shown in figure(a) and (b) respectively.