Hierarchical Bayes GLMs for the analysis of spatial data: An application to disease mapping

Journal of Statistical Planning and Inference 75 (1999) 305–318


Malay Ghosh a,∗, Kannan Natarajan b, Lance A. Waller c, Dalho Kim d

a Department of Statistics, University of Florida, Gainesville, FL, USA
b Bristol-Myers Squibb PRI, Princeton, NJ, USA
c Division of Biostatistics, University of Minnesota, Minneapolis, MN, USA
d Department of Statistics, Kyungpook National University, South Korea

Abstract

This paper considers estimation of cancer incidence rates for local areas. The raw estimates are usually based on small sample sizes, and hence are usually unreliable. A hierarchical Bayes generalized linear model approach is taken which connects the local areas, thereby enabling one to ‘borrow strength’. Random effects with pairwise difference priors model the spatial structure in the data. The methods are applied to cancer incidence estimation for census tracts in a certain region of the state of New York. © 1999 Elsevier Science B.V. All rights reserved.

Keywords: Bayesian methods; Hierarchical model; Leukemia incidence rates; Pairwise difference priors; Small areas

1. Introduction

The prime objective of this paper is to produce estimates of leukemia incidence rates for 281 census tracts (from the 1980 census) in an eight-county area of upstate New York. The counties are Broome, Cayuga, Chenango, Cortland, Madison, Onondaga, Tioga, and Tompkins. These estimates will then be used for the mapping of the said disease across these 281 census tracts. Such maps can be utilized for detecting clusters or aggregations of incident cases of cancer.

The raw estimates are usually unreliable for this purpose, being accompanied by large standard errors and coefficients of variation. The reason behind this is that the local areas contain only a few people at risk. This is particularly a problem for rare diseases where thousands of individuals are needed before even a single case is expected to occur. This makes it a necessity to ‘borrow strength’ or utilize information from the neighboring areas in order to produce smoothed estimates for the individual local areas.

∗ Corresponding author. E-mail: [email protected] .edu.

0378-3758/99/$ – see front matter © 1999 Elsevier Science B.V. All rights reserved. PII: S0378-3758(98)00150-5


The second objective is to examine whether there is any clustering of the cases near prespecified putative sources (‘foci’) of increased risk. In particular, we are interested in finding whether there is any effect of proximity of residence to 11 inactive hazardous waste sites containing the volatile organic compound (VOC) trichloroethylene (TCE), a common contaminant of groundwater.

Hierarchical and empirical Bayes (HB and EB) procedures are particularly well suited to meet these needs, i.e. they allow one to maintain geographic resolution and still obtain stable estimates. These methods enable one to connect the local areas systematically through a model. The similarity between the two Bayesian procedures is that they both recognize the uncertainty due to not knowing the prior parameters (often called hyperparameters). But whereas the EB procedure estimates the hyperparameters from the marginal distributions of the observations, the HB method models the same by assigning second stage (and often diffuse) priors.

The present analysis produces smoothed estimates of leukemia incidence rates for the 281 census tracts using a spatial HB generalized linear model. In addition, we investigate the effect (if any) of the hazardous waste sites. Our model bears similarity to the EB methods proposed by Clayton and Kaldor (1987). We follow the HB formulation of Besag et al. (1991) where one models the ‘log-relative risks’ (to be defined in Section 2) beginning with certain Poisson models as opposed to direct modeling of the Poisson parameters (Ferrándiz et al., 1995). In addition, we follow some recent work (Clayton and Bernardinelli, 1992; Breslow and Clayton, 1993; Besag et al., 1995; Bernardinelli et al., 1997; Carlin and Louis, 1996 (Section 8.3); Waller et al., 1997a, b; Xia and Carlin, 1998) making explicit use of covariates, which in this case is the inverse distance of each census tract centroid from the nearest hazardous waste site.

Regarding the incorporation of random effects at the second stage of the model, our procedure is similar to that of Besag et al. (1991, 1995), where each area effect is expressed as the sum of two random components. One set of error components is independent and identically distributed, reflecting an unstructured pattern of heterogeneity. The other set exhibits spatial correlation among neighboring census tracts (a precise definition of neighbors will be given in Section 3) but no correlation among the other census tracts; this is generally referred to as ‘structured heterogeneity’ (cf. Bernardinelli and Montomoli, 1992).

Section 2 of this paper introduces a HB spatial generalized linear model which includes and unifies most of the work referred to earlier. More important, we provide sufficient conditions ensuring the propriety of the posteriors under noninformative priors in a very general framework. Also, the duality between nonidentifiability and impropriety of posteriors is brought out explicitly in regression models which include the intercept term.

Section 3 considers a special case of the general model introduced in Section 2, and applies it to the estimation of leukemia incidence rates, and detection of disease clusters. Our findings indicate that, after adjustment for extra-Poisson variability, there is little evidence for a consistent risk increase due to proximity to the TCE-contaminated waste sites.


2. A hierarchical Bayes spatial generalized linear model

The following hierarchical Bayes spatial generalized linear model is broad enough to cover a large number of situations where a spatial structure needs to be incorporated.

I. Conditional on $\theta=(\theta_1,\ldots,\theta_m)^T$, $Y_1,\ldots,Y_m$ are mutually independent with

$$f(y_i\mid\theta_i)=\exp\{y_i\theta_i-\psi(\theta_i)\}\,h(y_i).$$

II.

$$\theta_i=q_i+x_i^Tb+u_i+v_i\quad(i=1,\ldots,m),\tag{1}$$

where the $q_i$ are known constants, the $u_i$ and $v_i$ are mutually independent with $v_i\overset{iid}{\sim}N(0,\sigma_v^2)$, and the $u_i$ have joint pdf

$$f(u)\propto(\sigma_u^2)^{-m/2}\exp\Big[-\sum\sum_{i\ne j}(u_i-u_j)^2w_{ij}/(2\sigma_u^2)\Big],\tag{2}$$

where $w_{ij}\ge 0$ for all $1\le i\ne j\le m$.

III. $b$, $\sigma_u^2$, and $\sigma_v^2$ are mutually independent with $b\sim\text{Uniform}(R^p)$ $(p<m)$; $(\sigma_u^2)^{-1}\sim\text{Gamma}(\tfrac12a,\tfrac12g)$; $(\sigma_v^2)^{-1}\sim\text{Gamma}(\tfrac12c,\tfrac12d)$. (A random variable $Z$ is said to have a Gamma$(\alpha,\beta)$ pdf if $f(z)\propto\exp(-\alpha z)z^{\beta-1}$.)

The target is to find the posterior distributions of $\theta$ and of $b$ given $y=(y_1,\ldots,y_m)^T$, and in particular, the posterior means, variances and covariances of these distributions.

Three special cases are of great practical interest. The first is when the $Y_i$ are independent Bin$(n_i,p_i)$. In that case $\theta_i=\log(p_i/(1-p_i))$ and $\psi(\theta_i)=n_i\log(1+\exp(\theta_i))$. Second, $Y_i\sim\text{Poisson}(\lambda_i)$. Then $\theta_i=\log\lambda_i$ and $\psi(\theta_i)=\exp(\theta_i)$. Finally, $Y_i\sim N(\theta_i,1)$, in which case $\psi(\theta_i)=\tfrac12\theta_i^2$.
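As a small illustration of the three special cases, the following Python sketch codes each cumulant function $\psi$ and checks numerically that its first derivative reproduces the mean of $Y_i$, a standard exponential-family property. The function names and the numerical values are hypothetical choices made for this example only.

```python
import math


def psi_binomial(theta: float, n: int) -> float:
    """psi(theta) = n log(1 + exp(theta)) for Y ~ Bin(n, p), theta = logit(p)."""
    return n * math.log1p(math.exp(theta))


def psi_poisson(theta: float) -> float:
    """psi(theta) = exp(theta) for Y ~ Poisson(lambda), theta = log(lambda)."""
    return math.exp(theta)


def psi_normal(theta: float) -> float:
    """psi(theta) = theta^2 / 2 for Y ~ N(theta, 1)."""
    return 0.5 * theta ** 2


def derivative(f, theta: float, eps: float = 1e-5) -> float:
    """Central-difference approximation to f'(theta)."""
    return (f(theta + eps) - f(theta - eps)) / (2.0 * eps)


# Illustrative (made-up) parameter values.
n, p = 10, 0.3
lam = 2.5
mu = 1.7

mean_binomial = derivative(lambda t: psi_binomial(t, n), math.log(p / (1 - p)))
mean_poisson = derivative(psi_poisson, math.log(lam))
mean_normal = derivative(psi_normal, mu)

print(mean_binomial, mean_poisson, mean_normal)  # close to n*p, lam and mu
```

The same finite-difference helper applied twice would approximate $\psi''$, which is the conditional variance used later when establishing log-concavity.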

Part II of this model involves the offset parameters $q_i$, the random effects $u_i$ and $v_i$, and the weights $w_{ij}$. The assumed distribution of $u$ is a special case of one given in Besag et al. (1995), who allow $\phi(u_i-u_j)$ in the place of $(u_i-u_j)^2$, with $\phi$ arbitrary but symmetric in its arguments. The joint distribution of the $u_i$ is called a ‘pairwise difference prior’. Our Bayesian methodology is applicable to an arbitrary $W$ (the matrix of weights $w_{ij}$) as shown in the latter half of this section, but in Section 3, we have taken $w_{ij}=1$ if $i$ and $j$ are neighbors, and $w_{ij}=0$ otherwise. Clearly, there exist other choices of $W$. For example, Devine et al. (1994) have extended the work of Cook and Pocock (1983) to model the covariance matrix directly with zero correlation between nonneighbors. Then $W$, the inverse correlation matrix, need not have $w_{ij}=0$ when $i$ and $j$ are not neighbors.

The simplest prior models are those which build exchangeability among all the local areas, and shrink the individual local area effects towards a global value. The simplest hierarchical Bayes model of Lindley and Smith (1972) is of this type. In contrast, the present model incorporates the geographical structure of the map. More specifically, estimates of the $\theta_i$ are strongly influenced by their neighbors, and only indirectly influenced by estimates of all other areas of the map. As a result, the individual estimates shrink more towards a local than towards a global mean value. Inclusion of the covariates also eliminates exchangeability among the neighboring areas as such.

Part III of the model is standard for HB analysis, where one independently assigns a uniform prior to the regression coefficients and inverse gamma priors to the variance components.

First, one needs to check that under the hierarchical model given in I–III, the posterior distribution of $\theta$ given $y$ is proper. The following theorem provides sufficient conditions to ensure this.

Theorem 1. Assume that $f(y_i\mid\theta_i)$ is bounded for all $i$. Suppose also that there exist $y_{i_1},\ldots,y_{i_n}$ $(1\le i_1<\cdots<i_n\le m;\ p\le n\le m)$ such that

$$\int_{-\infty}^{\infty}\exp[\theta y_{i_j}-\psi(\theta)]\,d\theta<\infty$$

for $j=1,\ldots,n$, and the corresponding design vectors $x_{i_1},\ldots,x_{i_n}$ are such that $X_{i*}=(x_{i_1}-\bar x_{i\cdot},\ldots,x_{i_n}-\bar x_{i\cdot})^T$ $(\bar x_{i\cdot}=n^{-1}\sum_{j=1}^{n}x_{i_j})$ has full rank $p$. Then the joint posterior $p(\theta,b,\sigma_u^2,\sigma_v^2\mid y)$ is proper if $a>0$, $c>0$, $m+g>0$ and $n+d>0$.
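To make the integrability condition of Theorem 1 concrete: substituting $\lambda=e^{\theta}$ in the Poisson case turns the integral into $\Gamma(y)$, and substituting $p=e^{\theta}/(1+e^{\theta})$ in the binomial case turns it into the beta function $B(y,n-y)$, so the integral is finite exactly when $y\ge1$ (Poisson) or $1\le y\le n-1$ (binomial). The Python sketch below checks this numerically with a plain trapezoid rule; the integration limits, step counts and example counts are assumptions chosen for the illustration.

```python
import math


def trapezoid(f, lo: float, hi: float, steps: int) -> float:
    """Composite trapezoid rule for f on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for k in range(1, steps):
        total += f(lo + k * h)
    return total * h


def poisson_kernel(theta: float, y: int) -> float:
    """exp(theta*y - psi(theta)) with psi(theta) = exp(theta)."""
    return math.exp(theta * y - math.exp(theta))


def binomial_kernel(theta: float, y: int, n: int) -> float:
    """exp(theta*y - psi(theta)) with psi(theta) = n log(1 + exp(theta))."""
    if theta > 0:
        log1pexp = theta + math.log1p(math.exp(-theta))  # avoids overflow
    else:
        log1pexp = math.log1p(math.exp(theta))
    return math.exp(theta * y - n * log1pexp)


# y >= 1: finite integrals, matching Gamma(y) and B(y, n - y).
poisson_integral = trapezoid(lambda t: poisson_kernel(t, 3), -40.0, 10.0, 50000)
binomial_integral = trapezoid(lambda t: binomial_kernel(t, 2, 5), -40.0, 40.0, 80000)
beta_2_3 = math.gamma(2) * math.gamma(3) / math.gamma(5)

# y = 0 (Poisson): the integrand tends to 1 as theta -> -infinity, so the
# truncated integral keeps growing with the length of the lower tail.
tail_50 = trapezoid(lambda t: poisson_kernel(t, 0), -50.0, 10.0, 60000)
tail_100 = trapezoid(lambda t: poisson_kernel(t, 0), -100.0, 10.0, 110000)
growth = tail_100 - tail_50

print(poisson_integral, math.gamma(3))
print(binomial_integral, beta_2_3)
print(growth)  # roughly the 50 extra units of lower tail
```

The growth of the truncated integral for $y=0$ is the numerical signature of the divergence that the theorem's hypothesis rules out for at least $n$ of the observations.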

The proof of this theorem is technical, and is deferred to the appendix. The theorem generalizes the one of Ghosh et al. (1998) in that the latter require finiteness of the integral for every $y_1,\ldots,y_m$. This is equivalent to the integrability of the likelihood. Consequently, while the present theorem requires $1\le y_i\le n_i-1$ for at least $p$ of the $m$ suffixes in the binomial case, the general theorem of Ghosh et al. (1998) requires $1\le y_i\le n_i-1$ for every $i=1,\ldots,m$. In the Poisson case, while our theorem requires $y_i\ge1$ for at least $p$ of the $m$ suffixes, Ghosh et al. (1998) require $y_i\ge1$ for every $i=1,\ldots,m$.

The present theorem also generalizes Maiti (1997), who considered the Poisson model with $p=1$ and $u$ degenerate at 0.

We may note at this stage that if instead of the model given in Eq. (1), we model $\theta_i$ as

$$\theta_i=q_i+b_0+x_i^Tb+u_i+v_i\quad(i=1,\ldots,m),\tag{3}$$

then the joint posterior $p(\theta,b_0,b,\sigma_u^2,\sigma_v^2\mid y)$ is improper. The basic reason behind this is that the posterior $p(\theta,b_0,b,\sigma_u^2,\sigma_v^2\mid y)$ then becomes nonidentifiable. This happens due to the inclusion of the intercept term. The following theorem is proved to this effect.

Theorem 2. Consider the hierarchical model given in I–III except that Eq. (1) is now replaced by Eq. (3) in II, and in III $b_0$, $b$, $\sigma_u^2$ and $\sigma_v^2$ are mutually independent with $b_0\sim\text{Uniform}(R)$ in addition to what is already assumed. Then the joint posterior $p(\theta,b_0,b,\sigma_u^2,\sigma_v^2\mid y)$ is improper.

The proof of this theorem is also deferred to the appendix. Briefly, the $u_i$ are translation invariant, and hence confound the overall baseline effect (intercept). The resulting identifiability problems associated with the intercept term are noted by Bernardinelli et al. (1995a, b), Carlin and Louis (1996, p. 264), and Waller et al. (1997a, b), although a formal proof as given in Theorem 2 seems to be new. The previous authors suggest constraining the spatial random effects, e.g. requiring $\sum_iu_i=0$, to allow an identifiable overall intercept.

Next, we discuss implementation of the Bayes procedure via Markov chain Monte Carlo (MCMC). In particular, we use the Gibbs sampler, which requires generating samples from the full conditionals in order to generate samples from the marginal posteriors $p(\theta\mid y)$ or $p(b\mid y)$.

In order to find the full conditionals, write $r_u=\sigma_u^{-2}$ and $r_v=\sigma_v^{-2}$. The joint posterior of $\theta$, $b$, $u$, $r_u$, and $r_v$ is

$$
\begin{aligned}
p(\theta,b,u,r_u,r_v\mid y)\propto{}&\exp\Big[\sum_iy_i\theta_i-\sum_i\psi(\theta_i)\Big]\\
&\times r_v^{m/2}\exp\Big[-\tfrac12r_v\sum_i(\theta_i-q_i-x_i^Tb-u_i)^2\Big]\\
&\times r_u^{m/2}\exp\Big[-\tfrac12r_u\sum\sum_{1\le i\ne j\le m}(u_i-u_j)^2w_{ij}\Big]\\
&\times\exp(-\tfrac12ar_u)\,r_u^{g/2-1}\exp(-\tfrac12cr_v)\,r_v^{d/2-1}.
\end{aligned}
\tag{4}
$$
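Eq. (4) can be transcribed almost term by term into code. The sketch below evaluates the log of the right-hand side of Eq. (4) for the Poisson case, $\psi(\theta)=\exp(\theta)$, up to the additive constant hidden in the proportionality sign. The function name, argument layout and the tiny two-area example are hypothetical and serve only to show how the four factors map to code.

```python
import math


def log_posterior_kernel(y, theta, q, X, b, u, W, r_u, r_v, a, g, c, d):
    """Log of the right-hand side of Eq. (4) with psi(theta) = exp(theta).

    y, theta, q, u are length-m lists; X is a list of m covariate rows of
    length p; b has length p; W is an m x m list of weights w_ij.
    """
    m = len(y)
    # Factor 1: Poisson likelihood kernel, sum_i {y_i theta_i - psi(theta_i)}.
    loglik = sum(y[i] * theta[i] - math.exp(theta[i]) for i in range(m))
    # Factor 2: unstructured effects v_i = theta_i - q_i - x_i'b - u_i.
    resid_ss = sum(
        (theta[i] - q[i] - sum(xk * bk for xk, bk in zip(X[i], b)) - u[i]) ** 2
        for i in range(m)
    )
    # Factor 3: pairwise difference prior, double sum over ordered pairs i != j.
    pair_ss = sum(
        W[i][j] * (u[i] - u[j]) ** 2
        for i in range(m) for j in range(m) if i != j
    )
    # Factor 4: gamma priors on the precisions r_u and r_v.
    return (
        loglik
        + 0.5 * m * math.log(r_v) - 0.5 * r_v * resid_ss
        + 0.5 * m * math.log(r_u) - 0.5 * r_u * pair_ss
        - 0.5 * a * r_u + (0.5 * g - 1.0) * math.log(r_u)
        - 0.5 * c * r_v + (0.5 * d - 1.0) * math.log(r_v)
    )


# Two neighboring areas with made-up values; every term except the
# likelihood and the two exp(-a r_u / 2), exp(-c r_v / 2) factors vanishes.
value = log_posterior_kernel(
    y=[1, 0], theta=[0.0, 0.0], q=[0.0, 0.0], X=[[1.0], [2.0]], b=[0.0],
    u=[0.0, 0.0], W=[[0, 1], [1, 0]], r_u=1.0, r_v=1.0,
    a=2.0, g=2.0, c=2.0, d=2.0,
)
print(value)
```

In this example the likelihood term contributes $-2$ and the two prior factors contribute $-1$ each, so the function returns $-4$ up to rounding.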

Hence, with $r_u=\sigma_u^{-2}$ and $r_v=\sigma_v^{-2}$, the full conditionals are

(i) $r_u\mid\theta,b,u,r_v,y\sim\text{Gamma}\big(\tfrac12\{a+\sum\sum_{i\ne j}(u_i-u_j)^2w_{ij}\},\ \tfrac12(m+g)\big)$;

(ii) $r_v\mid\theta,b,u,r_u,y\sim\text{Gamma}\big(\tfrac12\{c+\sum_i(\theta_i-q_i-x_i^Tb-u_i)^2\},\ \tfrac12(m+d)\big)$;

(iii) $u_i\mid\theta,b,u_j\,(j\ne i),r_u,r_v,y\sim N\big[(r_v+2r_uk_i)^{-1}\{r_v(\theta_i-q_i-x_i^Tb)+2r_u\sum_{j\ne i}u_jw_{ij}\},\ (r_v+2r_uk_i)^{-1}\big]$, where $k_i=\sum_{j\ne i}w_{ij}$ and $W$ is taken to be symmetric; the factor 2 arises because the double sum in the pairwise difference prior runs over ordered pairs $i\ne j$, so each pair of areas is counted twice;

(iv) $b\mid\theta,u,r_u,r_v,y\sim N\big[(X^TX)^{-1}X^T(\theta-q-u),\ r_v^{-1}(X^TX)^{-1}\big]$, where $X$ is the matrix with rows $x_i^T$ and $q=(q_1,\ldots,q_m)^T$;

(v) $p(\theta_i\mid\theta_j\,(j\ne i),b,u,r_u,r_v,y)\propto\exp\{y_i\theta_i-\psi(\theta_i)\}\exp[-\tfrac12r_v(\theta_i-q_i-x_i^Tb-u_i)^2]$.

It is easy to generate samples from the full conditionals given in (i)–(iv). The only difficulty is generating samples from (v), which is known only up to a multiplicative constant. Fortunately, the task becomes simpler due to the log-concavity of $p(\theta_i\mid\theta_j\,(j\ne i),b,u,r_u,r_v,y)$, because then one can use the adaptive rejection sampling of Gilks and Wild (1992). The log-concavity follows from the calculation

$$\frac{\partial^2\log p(\theta_i\mid\cdot)}{\partial\theta_i^2}=-\psi''(\theta_i)-r_v$$

and the facts that $r_v>0$ and $\psi''(\theta_i)=V(Y_i\mid\theta_i)>0$. An example of this approach to the spatial modeling of lip cancer rates in 56 districts of Scotland appears in the BUGS software package (Spiegelhalter et al., 1995a, b).

In disease mapping applications, Clayton and Kaldor (1987) and Besag et al. (1991) considered $Y_i\mid\xi_i\overset{ind}{\sim}\text{Poisson}(E_i\xi_i)$ with $\eta_i\,(=\log\xi_i)=x_i^Tb+u_i+v_i$, where $E_i$ denotes the number of cases ‘expected’ in region $i$, compared to some ‘standard’ population with respect to common risk factors such as age and gender. The $\eta_i$ are referred to as log-relative risks. Then $\theta_i=\log\lambda_i=\log E_i+\log\xi_i=\log E_i+x_i^Tb+u_i+v_i$, which is Eq. (1) with offset $q_i=\log E_i$. This model will be used in the next section.
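The full conditionals translate into one sweep of a Gibbs sampler. The Python sketch below is a minimal illustration for the Poisson model with a single covariate ($p=1$) and symmetric $W$, derived from the joint posterior in Eq. (4) with the double sum taken over ordered pairs. It is not the authors' implementation: the update of $\theta_i$ uses a random-walk Metropolis step as a simple stand-in for adaptive rejection sampling, and all function names, the step size, the hyperparameter values and the four-area data set are assumptions made for this example.

```python
import math
import random


def draw_gamma(rate, shape, rng):
    """Draw from Gamma(rate, shape) in the paper's sense:
    f(z) proportional to exp(-rate * z) * z**(shape - 1)."""
    return rng.gammavariate(shape, 1.0 / rate)  # stdlib takes (shape, scale)


def pair_ss(u, W):
    """Double sum over ordered pairs i != j of w_ij (u_i - u_j)^2."""
    m = len(u)
    return sum(W[i][j] * (u[i] - u[j]) ** 2
               for i in range(m) for j in range(m) if i != j)


def u_conditional(i, theta, q, xb, u, W, r_u, r_v):
    """Mean and variance of u_i given everything else (symmetric W)."""
    k_i = sum(W[i][j] for j in range(len(u)) if j != i)
    nbr = sum(W[i][j] * u[j] for j in range(len(u)) if j != i)
    prec = r_v + 2.0 * r_u * k_i
    return (r_v * (theta[i] - q[i] - xb[i]) + 2.0 * r_u * nbr) / prec, 1.0 / prec


def gibbs_sweep(y, q, x, state, W, hyper, rng, step=0.3):
    """One sweep over (r_u, r_v, u, b, theta); state is updated in place."""
    a, g, c, d = hyper
    m = len(y)
    theta, u = state["theta"], state["u"]
    # (i), (ii): precisions of the structured and unstructured effects.
    state["r_u"] = draw_gamma(0.5 * (a + pair_ss(u, W)), 0.5 * (m + g), rng)
    resid = sum((theta[i] - q[i] - x[i] * state["b"] - u[i]) ** 2 for i in range(m))
    state["r_v"] = draw_gamma(0.5 * (c + resid), 0.5 * (m + d), rng)
    r_u, r_v = state["r_u"], state["r_v"]
    # (iii): spatial effects, one area at a time.
    xb = [xi * state["b"] for xi in x]
    for i in range(m):
        mean, var = u_conditional(i, theta, q, xb, u, W, r_u, r_v)
        u[i] = rng.gauss(mean, math.sqrt(var))
    # (iv): scalar regression coefficient (p = 1).
    sxx = sum(xi * xi for xi in x)
    b_mean = sum(x[i] * (theta[i] - q[i] - u[i]) for i in range(m)) / sxx
    state["b"] = rng.gauss(b_mean, math.sqrt(1.0 / (r_v * sxx)))
    # (v): Metropolis stand-in for adaptive rejection sampling.
    for i in range(m):
        def log_target(t, i=i):
            centre = q[i] + x[i] * state["b"] + u[i]
            return y[i] * t - math.exp(t) - 0.5 * r_v * (t - centre) ** 2
        prop = theta[i] + rng.gauss(0.0, step)
        if math.log(1.0 - rng.random()) < log_target(prop) - log_target(theta[i]):
            theta[i] = prop
    return state


# Made-up data: four areas on a line, expected counts E_i, offset q_i = log E_i.
rng = random.Random(0)
y = [0, 2, 1, 3]
q = [math.log(e) for e in (1.0, 1.5, 0.8, 2.0)]
x = [0.1, 0.5, 0.2, 0.9]
W = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
state = {"theta": list(q), "u": [0.0] * 4, "b": 0.0, "r_u": 1.0, "r_v": 1.0}
for _ in range(50):
    gibbs_sweep(y, q, x, state, W, hyper=(1.0, 1.0, 1.0, 1.0), rng=rng)
print(state["b"], state["r_u"], state["r_v"])
```

Because the conditional of $\theta_i$ is log-concave, adaptive rejection sampling would yield exact draws and would avoid tuning the proposal step; the Metropolis step is used here only to keep the sketch short and dependency-free.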

3. Application to waste sites and leukemia in upstate New York

The study area includes an eight-county region of upstate New York containing 1,057,673 people (1980 census) in 281 census tracts. The New York Department of Health recorded 592 incident cases of leukemia in the years 1978–1982. The incidence data are available in Waller et al. (1994). Cases were assigned to census tracts based on the location of the individual’s residence. In a few instances (less than 10% of all cases), a case residence could only be identified to a collection of possible tracts. Such cases are excluded from the analysis below.

Previous analyses of the New York leukemia data include Turnbull et al. (1990) and Kulldorff and Nagarwalla (1995), who analyze the data for general clustering of cases amongst themselves. Waller et al. (1992, 1994) examined the data for clustering of the cases near prespecified putative sources (‘foci’) of increased risk. The foci of interest were 11 inactive hazardous waste sites containing the volatile organic compound (VOC) trichloroethylene (TCE), a common contaminant of groundwater. The well-known Woburn study in Massachusetts (cf. Lagakos et al., 1986) also concentrated on leukemia incidence in areas around TCE-contaminated wells.

Waller et al. (1992, 1994) found results suggestive of clustering near some of the TCE sites. The results were not adjusted for the observed age and sex distribution of the population at risk within the census tracts. Waller and McMaster (1998) use geographical information systems and externally standardized rates to age-standardize the census tract rates for one of the counties (Broome). In the absence of measured exposure values, Waller et al. (1992, 1994) and Waller and McMaster (1998) use the inverse distance to the nearest site as a surrogate for exposure.

We adopt the HB formulation of Section 2 to model the New York leukemia rates in order to quantify any effect of inverse distance to a TCE site. A model-based approach allows a greater range of analysis than the focused hypothesis tests employed in Waller et al. (1992, 1994). The HB models extend the earlier analysis by allowing spatial correlation between the observed rates (in the second stage of the hierarchy) that is not allowed in most clustering tests. Age and sex adjustments may be included explicitly as covariates, or as an offset in the model (or else their effect will be absorbed by the random effects).

We use the model given in Section 2 where, conditional on $\theta$, $Y_1,\ldots,Y_{281}$ are mutually independent Poisson$(E_i\xi_i)$, $i=1,\ldots,281$, with $\xi_i$ the relative risk in tract $i$ and $\theta_i=\log(E_i\xi_i)$. We model the $\theta_i$ as

$$\theta_i=\log E_i+bx_i+u_i+v_i\quad(i=1,\ldots,281),\tag{5}$$


Fig. 1. Census tract centroids for eight-county region of upstate New York. Inactive hazardous waste sites containing trichloroethylene (TCE) appear as ‘⊕’. Tracts with crude rates of zero are indicated by ‘0’.

where $x_i$ is the inverse distance of the centroid of the $i$th census tract from its nearest hazardous waste site (containing TCE), and $u_i$ and $v_i$ represent, respectively, the spatially structured and the unstructured excess heterogeneity in the population. For comparative purposes, we follow Waller et al. (1994) and define $E_i$ to be the number of cases expected for a constant incidence rate (0.00054) across all 281 tracts. Also, we take $w_{ij}=1$ if $i$ and $j$ are neighbors, while $w_{ij}=0$ otherwise. Finally, in part (III) of the model $p=1$, $a=c=0.001$, and $g=d=1$.

A key step in model specification is the definition of neighbors, i.e. those tracts whose rates are correlated with that of a given tract. A traditional definition of neighbors includes all tracts contiguous to a given tract. This choice seems appropriate for tracts of equal geographic area and population size. In our case, tracts are quite heterogeneous in both aspects. The study area is largely rural with two urban areas (Syracuse and Binghamton) and three large towns (Auburn, Cortland and Ithaca). While census tracts were originally defined to contain roughly 3000–4000 individuals, the tracts in our study area range in population size from 9 to 13,015 people. For each tract, we define its ‘neighbors’ as those tracts whose centroids are within 30 km of the centroid of the central tract. This results in a considerably larger ‘neighborhood’ than the traditional choice of only contiguous tracts. Fig. 1 illustrates the census tract centroids and the 30 km radius defining a neighborhood. We see that rates (after adjustment for covariate effects) will be ‘shrunk’ toward a local mean comprised of a rather large proportion of the study area.

One interpretation of spatial autocorrelation among rates is residual random variation due to unmeasured covariates. The partitioning of a model into deterministic (covariate effects) and stochastic (correlation) components is not unique (cf. Cressie, 1991, pp. 113–114). One goal of the statistical modeller is to accurately measure the effects of the covariates on the outcome while adequately accounting for residual correlation. If prediction is of primary importance, as in mining geostatistics (e.g. kriging) or in disease mapping, the goal is to portray the best estimates available from the data. For these applications, the correct partitioning into components is secondary to correct modelling of each component. If, as in our case, one wishes to interpret parameter estimates as measures of effect, the modeller must take care to adjust effects of one covariate for any other known or suspected covariates. If left unaccounted for, unmeasured covariates may bias or confound inference. Factors that almost always affect disease incidence (e.g. age) may be incorporated either through the standardization proposed by Clayton and Kaldor (1987) or by explicit inclusion in the model. Direct inclusion is perhaps a bit more flexible, allowing interactions, polynomial terms, or other relationships, but requires a more conscientious effort by the modeller to include all possible and suspected confounding factors.

Fig. 2. Surface and contour maps of leukemia incidence rates (crude) by census tract, 1978–1982.

Fig. 2 shows surface and contour plots of the crude incidence rates, while Fig. 3 illustrates the posterior median rates from the model. The contour plots show a similar overall pattern of rates between the two maps, with an area of higher rates in the central portion of the study region. The surface plots indicate the moderation of the highest rates due to incorporating neighboring information. For example, the highest crude rate is due to a single case in a tract with only 143 residents, but nearby rates (based on larger population sizes) are much smaller.

Fig. 4 plots the crude rates versus the posterior median (smoothed) rates. We see the smoothed rates are smaller than all non-zero crude rates. Crude rates of zero appear throughout the study area, and most neighborhoods contain at least one zero crude rate resulting in a lower smoothed rate (see Fig. 1). The zero rates ‘shrink’ towards the (higher) neighborhood mean rate.
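The ingredients of the model in this section, namely the 30 km neighborhood weights $w_{ij}$, the inverse-distance covariate $x_i$ and the expected counts $E_i$, are simple to compute from tract centroids and populations. The Python sketch below shows one way to do so. It assumes planar coordinates already expressed in kilometres and treats ‘within 30 km’ as an inclusive bound; the function names, coordinates and two of the three populations are invented for illustration and are not the New York data.

```python
import math


def distance_km(p, q):
    """Euclidean distance between two points in planar km coordinates."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def neighbor_weights(centroids, radius_km=30.0):
    """w_ij = 1 if centroids i and j (i != j) lie within radius_km, else 0."""
    m = len(centroids)
    return [
        [1 if i != j and distance_km(centroids[i], centroids[j]) <= radius_km else 0
         for j in range(m)]
        for i in range(m)
    ]


def inverse_distance_to_nearest_site(centroids, sites):
    """x_i = 1 / (distance from centroid i to its nearest waste site).

    Raises ZeroDivisionError if a centroid coincides with a site.
    """
    return [1.0 / min(distance_km(c, s) for s in sites) for c in centroids]


def expected_counts(populations, rate=0.00054):
    """E_i under a constant incidence rate across all tracts."""
    return [rate * n for n in populations]


# Hypothetical layout: three tracts and one waste site.
centroids = [(0.0, 0.0), (10.0, 0.0), (50.0, 0.0)]
sites = [(0.0, 5.0)]
populations = [143, 4000, 13015]

W = neighbor_weights(centroids)
x = inverse_distance_to_nearest_site(centroids, sites)
E = expected_counts(populations)
crude_rate_single_case = 1 / populations[0]  # one case among 143 residents

print(W)
print(x)
print(E)
print(crude_rate_single_case)
```

With 143 residents the expected count is below 0.1, so a single case yields a crude rate more than ten times the constant rate, which is the kind of unstable estimate the smoothing is meant to moderate.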


Fig. 3. Surface and contour maps of smoothed leukemia incidence rates 1978–1982 based on model using posterior median parameter values.

Fig. 4. Comparison of crude incidence rates with predicted rates based on fitted model.


The 95% credible set for the waste site effect $b$ is (−0.112, 1.596). Our HB analysis suggests some evidence for a positive effect of proximity to a waste site on leukemia incidence, but the credible set includes zero, so the data are also compatible with no such effect. We find that the inverse distance to TCE sites has little effect on the smoothed disease rates, after we have adjusted for heterogeneities in the denominators of the crude rates.

The results weakly echo the suggestion of clustering seen in Waller et al. (1994) and Waller and McMaster (1998), i.e., there is slight evidence of an effect of proximity to the waste sites on leukemia incidence for the years under study. The tests used in Waller et al. (1994) are most sensitive to deviations from expectation in tracts nearest the TCE site. It is possible that the tests detected stronger local effects that are not consistent throughout the study area, or that the relatively large neighborhood size concealed small local effects. Future analyses will refine the model, compare different neighborhood structures, and study the sensitivity of the results. Our first concern is to obtain data and include possible confounding factors such as age and gender. Also, in the model above, we consider only the effect of the nearest waste site and treat the waste sites equally with respect to their effect on disease risk. In reality, the sites are likely to differ in their potential threat to public health due to differing amounts of contaminants, proximity to wells, etc. As a result, the aggregate effect of proximity to any waste site may well attenuate the effect of proximity to a particular waste site.

As a final comment, we note that by including covariate values measured on tracts rather than individuals, our analysis is an example of ‘ecological modelling’. Due to the well-known ‘ecologic fallacy’ (i.e. attributing effects measured in aggregate to individuals), one must interpret results from such models with caution, even when every pertinent covariate is included. Current research focuses on the proper inference available from such models (cf. Richardson, 1992; Breslow and Clayton, 1993; Bernardinelli et al., 1997; Prentice and Sheppard, 1995). Future work will explore the validity of ecological modelling in this sort of situation.

Acknowledgements

The authors gratefully acknowledge the assistance of Prof. Gauri S. Datta for conversations leading to the current formulation of Theorem 1, and Ms. Erin Conlon for the implementation of the models. This research was supported in part by National Science Foundation Grant Number SBR-9423996, and National Institute of Environmental Health Sciences, NIH, grant number 1 R01 ES07750-01A1. The results reflect those of the authors and not necessarily those of NSF, NIEHS or NIH.

Appendix

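Proof of Theorem 1. Write $r_u=\sigma_u^{-2}$ and $r_v=\sigma_v^{-2}$. With the one-to-one transformation $z_i=u_i-u_m$ $(i=1,\ldots,m-1)$, and writing $z=(z_1,\ldots,z_{m-1})^T$, the joint posterior of $\theta$, $b$, $z$, $u_m$, $r_u$, and $r_v$ given $y$ is

$$
\begin{aligned}
p(\theta,b,z,u_m,r_u,r_v\mid y)\propto{}&\exp\Big[\sum_{i=1}^{m}\{y_i\theta_i-\psi(\theta_i)\}\Big]\\
&\times r_v^{m/2}\exp\Big[-\tfrac12r_v\sum_{i=1}^{m}(\theta_i-q_i-x_i^Tb-z_i-u_m)^2\Big]\\
&\times r_u^{m/2}\exp\Big[-\tfrac12r_u\sum\sum_{1\le i\ne j\le m}(z_i-z_j)^2w_{ij}\Big]\\
&\times\exp(-\tfrac12ar_u)\,r_u^{g/2-1}\exp(-\tfrac12cr_v)\,r_v^{d/2-1},
\end{aligned}
\tag{6}
$$

where $z_m=0$. Without loss of generality, assume $i_j=j$ $(j=1,\ldots,n)$. Let $\theta^*=(\theta_1,\ldots,\theta_n)^T$, and write $X_*=X_{i*}$. Now, integrating with respect to $\theta_{n+1},\ldots,\theta_m$, the joint posterior of $\theta^*$, $b$, $z$, $u_m$, $r_u$, and $r_v$ given $y$ is

$$
\begin{aligned}
p(\theta^*,b,z,u_m,r_u,r_v\mid y)\propto{}&\exp\Big[\sum_{i=1}^{n}\{y_i\theta_i-\psi(\theta_i)\}\Big]\\
&\times r_v^{n/2}\exp\Big[-\tfrac12r_v\sum_{i=1}^{n}(\theta_i-q_i-x_i^Tb-z_i-u_m)^2\Big]\\
&\times r_u^{m/2}\exp\Big[-\tfrac12r_u\sum\sum_{1\le i\ne j\le m}(z_i-z_j)^2w_{ij}\Big]\\
&\times\exp(-\tfrac12ar_u)\,r_u^{g/2-1}\exp(-\tfrac12cr_v)\,r_v^{d/2-1}.
\end{aligned}
\tag{7}
$$

Next, writing $\bar\theta=n^{-1}\sum_{i=1}^{n}\theta_i$, $\bar z=n^{-1}\sum_{i=1}^{n}z_i$, $\bar x=n^{-1}\sum_{i=1}^{n}x_i$, and $\bar q=n^{-1}\sum_{i=1}^{n}q_i$, integration with respect to $u_m$ gives

$$
\begin{aligned}
p(\theta^*,b,z,r_u,r_v\mid y)\propto{}&\exp\Big[\sum_{i=1}^{n}\{y_i\theta_i-\psi(\theta_i)\}\Big]\\
&\times r_v^{\frac12(n+d)-1}\exp\Big[-\tfrac12r_v\Big\{c+\sum_{i=1}^{n}\big((\theta_i-\bar\theta)-(q_i-\bar q)-(x_i-\bar x)^Tb-(z_i-\bar z)\big)^2\Big\}\Big]\\
&\times r_u^{\frac12(m+g)-1}\exp\Big[-\tfrac12r_u\Big\{a+\sum\sum_{1\le i\ne j\le m}(z_i-z_j)^2w_{ij}\Big\}\Big].
\end{aligned}
$$

Now, integration with respect to $b$ gives

$$
\begin{aligned}
p(\theta^*,z,r_u,r_v\mid y)\propto{}&\exp\Big[\sum_{i=1}^{n}\{y_i\theta_i-\psi(\theta_i)\}\Big]\\
&\times r_v^{\frac12(n+d)-1}\exp\Big[-\tfrac12r_v\Big\{c+\sum_{i=1}^{n}(\theta_i-\bar\theta-q_i+\bar q-z_i+\bar z)^2-\delta^T(X_*^TX_*)^{-1}\delta\Big\}\Big]\\
&\times r_u^{\frac12(m+g)-1}\exp\Big[-\tfrac12r_u\Big\{a+\sum\sum_{1\le i\ne j\le m}(z_i-z_j)^2w_{ij}\Big\}\Big],
\end{aligned}
\tag{8}
$$

where $\delta=\sum_{i=1}^{n}(\theta_i-\bar\theta-q_i+\bar q-z_i+\bar z)(x_i-\bar x)$. Now, integrating with respect to $r_u$ and $r_v$, it follows from Eq. (8) that

$$
p(\theta^*,z\mid y)\le K\exp\Big[\sum_{i=1}^{n}\{y_i\theta_i-\psi(\theta_i)\}\Big]\Big[a+\sum\sum_{1\le i<j\le m}(z_i-z_j)^2w_{ij}\Big]^{-\frac12(m+g)},
\tag{9}
$$

where $K\,(>0)$ is a generic constant which does not depend on $\theta^*$ and $z$. Finally, $z_m=0$ and $\sum\sum_{1\le i<j\le m}w_{ij}(z_i-z_j)^2$ involves only the $m-1$ variables $z_1,\ldots,z_{m-1}$. Thus, integrating with respect to $z$, and using the structure of a multivariate t-distribution, it follows that

$$p(\theta^*\mid y)\le K\exp\Big[\sum_{i=1}^{n}\{y_i\theta_i-\psi(\theta_i)\}\Big].$$

Now, from the hypothesis of the theorem, the result follows.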

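Proof of Theorem 2. With $r_u=\sigma_u^{-2}$ and $r_v=\sigma_v^{-2}$, the joint posterior of $\theta$, $b_0$, $b$, $u$, $r_u$, and $r_v$ is given by

$$
\begin{aligned}
p(\theta,b_0,b,u,r_u,r_v\mid y)\propto{}&\exp\Big[\sum_{i=1}^{m}\{\theta_iy_i-\psi(\theta_i)\}\Big]\\
&\times r_v^{m/2}\exp\Big[-\frac{r_v}{2}\sum_{i=1}^{m}(\theta_i-q_i-b_0-x_i^Tb-u_i)^2\Big]\\
&\times r_u^{m/2}\exp\Big[-\frac{r_u}{2}\sum\sum_{1\le i\ne j\le m}w_{ij}(u_i-u_j)^2\Big]\\
&\times\exp\Big(-\frac{a}{2}r_u\Big)r_u^{g/2-1}\exp\Big(-\frac{c}{2}r_v\Big)r_v^{d/2-1}.
\end{aligned}
\tag{10}
$$

With the one-to-one transformation $s_i=b_0+u_i$ $(i=1,\ldots,m)$, and writing $s=(s_1,\ldots,s_m)^T$, the joint posterior of $\theta$, $b_0$, $b$, $s$, $r_u$, and $r_v$ is

$$
\begin{aligned}
p(\theta,b_0,b,s,r_u,r_v\mid y)\propto{}&\exp\Big[\sum_{i=1}^{m}\{\theta_iy_i-\psi(\theta_i)\}\Big]\\
&\times r_v^{m/2}\exp\Big[-\frac{r_v}{2}\sum_{i=1}^{m}(\theta_i-q_i-x_i^Tb-s_i)^2\Big]\\
&\times r_u^{m/2}\exp\Big[-\frac{r_u}{2}\sum\sum_{1\le i\ne j\le m}w_{ij}(s_i-s_j)^2\Big]\\
&\times\exp\Big(-\frac{a}{2}r_u\Big)r_u^{g/2-1}\exp\Big(-\frac{c}{2}r_v\Big)r_v^{d/2-1},
\end{aligned}
\tag{11}
$$

which does not depend on $b_0\in(-\infty,\infty)$. This leads to a nonidentifiable posterior. Integration with respect to $b_0$ now leads to

$$\int_{-\infty}^{\infty}p(\theta,b_0,b,s,r_u,r_v\mid y)\,db_0=+\infty.$$

This proves the theorem.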
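The argument can be illustrated numerically: shifting the intercept by any constant and shifting every $u_i$ by the opposite amount leaves $s_i=b_0+u_i$, and therefore the right-hand side of Eq. (10), unchanged. The Python sketch below evaluates the $b_0$- and $u$-dependent part of the log kernel for the Poisson case at two such parameter values. The function name and all numerical values are hypothetical.

```python
import math


def log_kernel(y, theta, q, xb, b0, u, W, r_u, r_v):
    """Log of the factors of Eq. (10) that involve theta, b0, b and u
    (Poisson case, psi(theta) = exp(theta)); xb[i] stands for x_i'b."""
    m = len(y)
    loglik = sum(y[i] * theta[i] - math.exp(theta[i]) for i in range(m))
    resid_ss = sum((theta[i] - q[i] - b0 - xb[i] - u[i]) ** 2 for i in range(m))
    pair_ss = sum(
        W[i][j] * (u[i] - u[j]) ** 2
        for i in range(m) for j in range(m) if i != j
    )
    return loglik - 0.5 * r_v * resid_ss - 0.5 * r_u * pair_ss


# Made-up values for three mutually neighboring areas.
y = [2, 0, 1]
theta = [0.4, -0.3, 0.1]
q = [0.0, 0.2, -0.1]
xb = [0.05, 0.10, 0.20]
u = [0.3, -0.2, 0.1]
W = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]

shift = 3.7
base = log_kernel(y, theta, q, xb, 0.4, u, W, r_u=0.8, r_v=1.5)
moved = log_kernel(y, theta, q, xb, 0.4 + shift,
                   [ui - shift for ui in u], W, r_u=0.8, r_v=1.5)

print(base, moved)  # the two values agree up to rounding
```

Since the kernel is constant along the whole line $(b_0+t,\ u-t\mathbf{1})$, $t\in R$, a flat prior on $b_0$ cannot yield a finite normalizing constant, which is why a constraint such as $\sum_iu_i=0$ is needed to identify the intercept.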

References

Bernardinelli, L., Clayton, D., Montomoli, C., 1995a. Bayesian estimates of disease maps: how important are priors? Statist. Med. 14, 2411–2431.

Bernardinelli, L., Clayton, D., Pascutto, C., Montomoli, C., Ghistendi, M., Songini, M., 1995b. Bayesian analysis of space-time variation in disease risk. Statist. Med. 14, 2433–2443.

Bernardinelli, L., Montomoli, C., 1992. Empirical Bayes versus fully Bayesian analysis of geographical variation in disease risk. Statist. Med. 11, 983–1007.

Bernardinelli, L., Pascutto, C., Best, N.G., Gilks, W.R., 1997. Disease mapping with errors in covariates. Statist. Med. 16, 741–752.

Besag, J., Green, P., Higdon, D., Mengersen, K., 1995. Bayesian computation and stochastic systems (with discussion). Statist. Sci. 10, 3–66.

Besag, J., York, J., Mollié, A., 1991. Bayesian image restoration with two applications in spatial statistics. Ann. Inst. Statist. Math. 43, 1–59.

Breslow, N., Clayton, D., 1993. Approximate inference in generalized linear mixed models. J. Amer. Statist. Assoc. 88, 9–25.

Carlin, B.P., Louis, T.A., 1996. Bayes and Empirical Bayes Methods for Data Analysis. Chapman & Hall, London.

Clayton, D.G., Bernardinelli, L., 1992. Bayesian methods for mapping disease risk. In: Elliott, P., Cuzick, J., English, D., Stern, R. (Eds.), Geographical and Environmental Epidemiology: Methods for Small-Area Studies, Oxford University Press, London.

Clayton, D., Kaldor, J., 1987. Empirical Bayes estimators of age-standardised relative risks for use in disease mapping. Biometrics 43, 671–681.

Cook, D., Pocock, S., 1983. Multiple regression in geographic mortality studies, with allowance for spatially correlated error. Biometrics 39, 361–371.

Cressie, N.A.C., 1991. Statistics for Spatial Data. Wiley, New York.

Devine, O.J., Louis, T.A., Halloran, M.E., 1994. Empirical Bayes estimators for spatially correlated incidence rates. Environmetrics 5, 381–398.

Ferrándiz, J., López, A., Llopis, A., Morales, M., Tejerizo, M.L., 1995. Spatial interaction between neighbouring counties: cancer mortality data in Valencia (Spain). Biometrics 51, 665–678.

Ghosh, M., Natarajan, K., Stroud, T.W.F., Carlin, B.P., 1998. Generalized linear models for small area estimation. J. Amer. Statist. Assoc. 93.

Gilks, W.R., Wild, P., 1992. Adaptive rejection sampling for Gibbs sampling. J. Roy. Statist. Soc., Ser. C 41, 337–348.

Kulldorff, M., Nagarwalla, N., 1995. Spatial disease clusters: detection and inference. Statist. Med. 14, 799–810.

Lagakos, S.W., Wessen, B.J., Zelen, M., 1986. An analysis of contaminated water and health effects in Woburn, Massachusetts (with discussion). J. Amer. Statist. Assoc. 81, 583–614.

Lindley, D.V., Smith, A.F.M., 1972. Bayes estimates for the linear model (with discussion). J. Roy. Statist. Soc., Ser. B 34, 1–41.

Maiti, T., 1997. Hierarchical Bayes estimation of mortality rates for disease mapping. Technical Report No. 546. Department of Statistics, University of Florida.

Prentice, R.L., Sheppard, L., 1995. Aggregate data studies of disease risk factors. Biometrika 82, 113–125.

Richardson, S., 1992. Statistical methods for geographical correlation studies. In: Elliott, P., Cuzick, J., English, D., Stern, R. (Eds.), Geographical and Environmental Epidemiology: Methods for Small Area Studies. Oxford University Press, Oxford, pp. 181–204.

Spiegelhalter, D.J., Thomas, A., Best, N., Gilks, W.R., 1995a. BUGS: Bayesian inference using Gibbs sampling, Version 0.50. Technical Report, Medical Research Council Biostatistics Unit, Institute of Public Health, Cambridge University.

Spiegelhalter, D.J., Thomas, A., Best, N., Gilks, W.R., 1995b. BUGS examples, Version 0.50. Technical Report, Medical Research Council Biostatistics Unit, Institute of Public Health, Cambridge University.

Turnbull, B.W., Iwano, E.J., Burnett, W.S., Howe, H.L., Clark, L.C., 1990. Monitoring for clustering of disease: application to leukemia incidence in upstate New York. Amer. J. Epidemiol. 132, S136–S143.

Waller, L.A., Turnbull, B.W., Clark, L.C., Nasca, P., 1992. Chronic disease surveillance and testing of clustering of disease and exposure: application to leukaemia incidence and TCE-contaminated dumpsites in upstate New York. Environmetrics 3, 281–300.

Waller, L.A., Turnbull, B.W., Clark, L.C., Nasca, P., 1994. Spatial pattern analyses to detect rare disease clusters. In: Lange, N., Ryan, L., Billard, L., Brillinger, D., Conquest, L., Greenhouse, J. (Eds.), Case Studies in Biometry. Wiley, New York, pp. 3–23.

Waller, L.A., Carlin, B.P., Xia, H., 1997a. Structuring correlation within hierarchical spatio-temporal models for disease rates. In: Gregoire, T.G., Brillinger, D.R., Diggle, P.J., Russek-Cohen, E., Warren, W.G., Wolfinger, R.D. (Eds.), Modelling Longitudinal and Spatially Correlated Data, Lecture Notes in Statistics, vol. 122, Springer, New York, pp. 309–319.

Waller, L.A., Carlin, B.P., Xia, H., Gelfand, A., 1997b. Hierarchical spatio-temporal mapping of disease rates. J. Amer. Statist. Assoc. 92, 607–617.

Waller, L.A., McMaster, R.B., 1998. Incorporating indirect standardization in tests for disease clustering in a GIS environment. Geographical Systems, to appear.

Xia, H., Carlin, B.P., 1998. Spatio-temporal models with errors and covariates: mapping Ohio lung cancer mortality. Statist. Med., to appear.
