Spatial Epidemiology - Rollins School of Public Health Epidemiology LANCE A. WALLER Emory...

Spatial Epidemiology

LANCE A. WALLEREmory University, Atlanta, Georgia

1 SPACE AND DISEASE

The spatial distribution of cases of disease often captures the imagina-tion of health researchers and the general public, based primarily onthe notion that the observed pattern of incidence or prevalence can pro-vide insight in to the underlying mechanisms driving disease incidence,its progression, and the design and implementation of effective pub-lic health responses. Historical examples include John Snow’s famousmaps of cholera incidence in London neighborhoods and early maps ofyellow fever incidence in relation to features of cities and docks (Walter2000, Koch 2005). More recent examples include reports of clustersof cancer cases near hazardous waste sites. However, quantifying suchhypotheses through statistical inference is a difficult task due to often-times subtle signals within multiple layers of noisy, nonindependent,observational data from multiple agencies collected for multiple pur-poses, typically other than the spatial epidemiology question at hand.As a result, most applications do not have the luxury of a research de-sign optimized for the questions of interest, nor an experimental settingwithin which to conduct inference controlling for potential confound-ing factors. For these reasons, spatial epidemiologic studies encountermany complications in addition to those in traditional studies, includ-ing some unique to the geographic setting.

The field of spatial statistics involves the statistical analysis of obser-vations with associated locations in space. Often these observations donot follow a Gaussian distribution and are not independent, two main-stays in the development of statistical methods. In addition, asymp-totic results take on a different flavor depending on whether we consider

1

an increasing number of observations within a fixed study area (infillasymptotics) or an increasing number of observations in an increasingstudy area (increasing domain asymptotics). In response to these is-sues, a wide variety of statistical techniques for spatial epidemiologicinference have developed in recent years, coalescing into a collection ofapproaches addressing specific questions. The field of spatial epidemi-ology is the subject of several lengthy texts (Elliott et al. 1992, Elliottet al. 2000, Lawson 2001, Waller and Gotway 2004) and we present onlyan overview of a few particular issues here, accompanied by brief exam-ples. We encourage the interested reader to follow up in the referencedmaterial for more detailed development and applications.

In the sections below, we provide an overview of statistical methodscommonly applied to gain insight on epidemiologic questions based onspatially referenced data. We begin with an overview of relevant spa-tial questions typically addressed in spatial epidemiology, the typicaldata structures available, then review and illustrate methods based ongeneral topics.

2 BASIC SPATIAL QUESTIONS AND

RELATED DATA

Often, our main goals in the analysis of spatial data mirror thosefrom non-spatial data, namely, we seek to describe data patterns, mea-sure associations, and assess variability. In the specific case of epidemi-ology, this often falls to assessment of associations between exposureand disease. For spatial epidemiology, we typically don’t have preciselythe information we want and we seek to use spatial information to fillin for data we cannot easily measure.

More specifically, consider the following basic inferential epidemio-logic questions, each with a spatial dimension:

Q1) Can we quantify spatial trends and/or patterns in the location ofcases? Does the risk of disease appear to vary over space?

Q2) Can we quantify spatial trends and/or patterns in regional countsof incidence or prevalence (e.g., the number of cases reportedwithin each of a set of census regions)? Again, does the risk ofdisease appear to vary over space?

2

Q3) Can we predict ambient exposure levels at locations where nomeasurement is taken, based on measurements from several pointlocations?

Q4) Can we measure associations between disease risk at particular lo-cations, accounting for residual spatial correlation in model errorterms?

Next, consider the types of data typically accompanying each question.

D1) Residential locations of cases and controls.

D2) Reported counts of incident or prevalent cases and populationsizes from census regions.

D3) Continuous observations of exposure levels at each of a numberof monitoring locations.

D4) Local measures of disease incidence, prevalence, or risk and asso-ciated exposure measures.

Texts on spatial statistics often categorize methods by the data typeavailable but Waller and Gotway (2004) note a close correspondencebetween the data structures D1-D4 and their associated underlying in-ferential questions of interest Q1-Q4. In this chapter we consider recentdevelopments addressing questions Q1 and Q3 (spatial point patternsand spatial prediction, respectively), and comment on relationships toquestions Q2 and Q4.

3 QUANTIFYING PATTERN IN POINT

DATA

We begin with question Q1, how do we quantify the spatial patternof observed events in space? If we consider the locations of eventsas random variables distributed in space, we may use a spatial pointprocess to describe the pattern of events within our study area, denotedD. Basic questions include whether events are clustered or regularlydistributed in space and a spatial Poisson process offers a convenientreference model of spatial randomness. Diggle (1983, p. 50) and Stoyanet al. (1995, p. 33) define a homogeneous spatial Poisson process by:

3

(i) The number of events occurring within a finite region A ⊆ Dis a Poisson random variable with mean λ|A| for some positiveconstant λ and |A| denoting the area of A, and

(ii) Given the total number of events, n, occurring within area A, thelocations of the events represent an independent random sampleof n locations within A, where each point is equally likely to bechosen as an event location.

Cressie (1993, p. 634) lists the following equivalent definition of a ho-mogeneous spatial Poisson process:

(a) the numbers of events in non-overlapping regions are statisticallyindependent,

(b) for any region A within the study area,

lim|A|→0

Pr[exactly one event in A]

|A| = λ > 0

where |A| = the area of region A, and

(c) lim|A|→0Pr[two or more events in A]

|A| = 0.

These definitions rely on a distinction between a point (any locationwithin the study area where an event could occur) and an event (a lo-cation within the study are where an event did occur within a particularrealization of the process). Properties (i) and (a) provide motivationfor extensions for the models of regional counts for question Q2, lead-ing to development of spatial Poisson regression models with residualspatial correlation (Waller and Gotway, 2004, Chapter 9; Banerjee etal. 2004, Chapter 5).

A few features of spatial Poisson processes merit mention. First,the constant λ represents the intensity of the process and representsthe expected number of events occurring per unit area. Second, a Pois-son process assumes event locations are independent of one another,i.e., the occurrence of an event at one location does not influence theprobability of events occurring at any other locations. Third, properties(i) and (ii) provide a recipe for simulating realizations from a spatialPoisson process, enabling Monte Carlo assessments of deviations of ob-served patterns from a null distribution defined by the Poisson process.For instance, if we want to assess evidence of clustering of observations,we may define a test statistic summarizing some aspect of pattern, cal-culate its value in the observed data, then compare this value to a

4

histogram of values obtained under repeated simulations under a Pois-son process (Besag and Diggle 1977, Waller and Gotway 2004). As aresult, Monte Carlo methods are widely used in the analysis of spatialpoint processes, as illustrated in the example below.

If we wish to allow the intensity to vary over space, we may definea heterogeneous spatial Poisson process with intensity λ(s), a functionthat varies (typically smoothly) across locations within the study areas ∈ D. Properties i) and ii) are revised to:

(i∗) The number of events occurring within a finite region A followsa Poisson distribution with mean

∫A λ(s)ds, and

(ii∗) Given the total number of events, n, occurring within an areaA, the events are distributed according to a (spatial) probabilitydistribution proportional to λ(s).

Spatial Poisson processes provide a convenient framework for mod-elling spatial point patterns, but are not the only set of models fordoing so. The Poisson process assumes that all events are independentof one another and models all spatial pattern via the heterogeneousintensity function, i.e., we model all pattern through the first order(mean) properties of the process. Models assuming interdependencebetween event locations (e.g., contagion processes) also allow patterndue to second order properties of the process. It is mathematically im-possible to distinguish first-order patterns from second-order patternsbased on a single realization of a point process as one could describe anaggregation equally well through a locally increased mean number (in-tensity) of independent observations, through a constant mean number(intensity) of dependent observations, or some combination of the two(Bartlett 1964, Waller and Gotway 2004, p. 137).

In practice, one often assumes independence between events associ-ated with chronic outcomes (e.g., cancers) and seeks to identify spatialvariation in the risk of disease by estimating the underlying intensityfunction associated with cases. However, the intensity of cases alonecan be misleading without due consideration of the spatial distributionof the population at risk, since more cases will be observed where morepersons are at risk under the null model of a constant risk of disease.As a result, most modern studies of spatial point patterns of disease in-corporate a set of “controls” or non-diseased individuals sampled fromthe population at risk. The analyst then compares the spatial patternof cases with that of the controls and quantifies differences to identifycase patterns of interest.

5

To illustrate this point, consider the following data set originallypresented in Cuzick and Edwards (1990). Figure 1 represents the resi-dential locations of 62 cases of childhood leukemia diagnosed between1974 and 1982 in the North Humberside region of the United Kingdom.Also shown are the residential locations of 143 controls sampled at ran-dom from the birth register for the same years. Note the concentrationof cases and controls in the southern region representing the higherpopulation density found in the city of Hull. Without the controls, theconcentration of cases might seem suspicious, illustrating the impor-tance of considering a heterogeneous process with intensity adjustedfor spatial patterns in the population at risk. Additional analyses ofthese data appear in Lawson (2001) and Diggle (2003).

4.7 4.8 4.9 5.0 5.1 5.2 5.3

4.2

4.3

4.4

4.5

4.6

4.7

4.8

u

v

Figure 1: Location of 62 leukemia cases (filled circles) and 143 controls(open circles) in the North Humberside region of the United Kingdom(data originally from Cuzick and Edwards 1990). Cases were diagnosedbetween 1974 and 1982 and controls sampled from the birth register forthe same years.

6

Since we wish to compare patterns between cases and controls, webegin by estimating the intensity functions for each, denoted λ1(s) forcases and λ0(s) for controls. Here s = (u, v) denotes any location withinour study area D illustrated by the polygon containing observed caseand control events in Figure 1.

Since an intensity function is simply a non-normalized spatial den-sity function, it is natural to use kernel-based smoothing methods toprovide non-parametric intensity estimates (Wand and Jones 1995).The use of kernel-based intensity function estimates to identify regionsof different patterns between cases and controls have been proposed forsome time (Bithell 1990, Lawson and Williams 1993) and developedin depth by Kelsall and Diggle (1995a, 1995b, 1998). The approachprovides both visual and inferential output addressing the questionsof interest but applications in the literature are somewhat limited bysoftware availability. The analyses below were implemented in the R

language (R Core Development Team, 2005) using libraries splancs

and KernSmooth.To start, suppose we estimate λ1(s) by

λ1(s) =1

|D|bn1∑

i=1

Kern

( ||s− si||b

)(3.1)

where s is a location within D, si, i = 1, . . . , n1 represents the locationsof n1 cases, ||s−si|| denotes the distance between a point location s andan observed event location si, Kern(·) is a kernel function satisfying

∫

DKern(s)ds = 1,

|D| denotes the area of D, and b denotes a smoothing parameter (band-

width). We define λ0(s) similarly.A note on the scaling factor 1/|D| is in order. Scaling by 1/|D| re-

sults in a kernel estimate which integrates to N/|D|, the average numberof events per unit area, and omitting the 1/|D| term generates a kernelestimate integrating to N . Wand and Jones (1995) suggest omittingthe scaling factor 1/|D|, and Diggle (2000) suggests scaling by 1/|D|to provide an estimate expressed as average event counts rather thana probability. Perhaps stricter notation would define the estimate inequation 3.1 as “proportional to” rather than “equal to” the estimatedintensity function, but, in a sense, scaling by 1/N (density), 1 (inten-sity), or 1/|D| (expected events per area) is somewhat irrelevant forvisualization of the local peaks and valleys for a particular process.

7

Kelsall and Diggle (1995a and 1995b) note that, under an assump-tion of independent heterogeneous Poisson processes for cases and con-trols, when we condition on the observed numbers of cases and controls,the data are equivalent to two independent random samples from (spa-tial) density function

f(s) = λ1(s)/∫

Dλ1(s

∗)ds∗,

for cases and

g(s) = λ0(s)/∫

Dλ0(s

∗)ds∗,

for controls, where s∗ represents any location within D. Conditionalon the observed case and control totals (n1 and n0) Kelsall and Diggle(1995a,b) build inference based on the natural logarithm of the ratio ofthe two spatial densities,

r(s) = log{f(s)/g(s)},a quantity related to the logarithm of the relative risk of observing acase rather than a control at location s in D.

To illustrate the approach, Figure 2 gives the kernel density esti-mates for cases and controls in the North Humberside data for a com-mon bandwidth of 0.05 distance units. We note an overall similaritybetween the general patterns in Figure 2. Taking the ratio of the twoelements, we obtain the (log) relative risk surface shown in Figure 3.

We see some spatial variation in the log relative risk surface, mostnotably a generally decreasing west-to-east trend with a marked in-crease on the westernmost edge. For inference, we need to assess thevariability of the estimated log relative risk surface under a null hypoth-esis of constant risk everywhere (i.e., a flat (log) relative risk surface).Kelsall and Diggle (1995b) offer a convenient way to operationalize thisnull hypothesis, conditional on the observed case and control locations,based on a “random labelling” of n1 of the n1 + n0 observed locationsas cases. For a grid of points, we calculate the observed log relative risksurface illustrated in Figure 3. For each of a large number of randomsamples of n1 cases from the set of n1 + n0 observed locations (500such relabellings in our examples below), we recalculate the log relativerisk surface at the same grid of points, then compare the log relativerisk estimate at each location to those simulated under the randomlabellings.

We note that the random labelling inferences represent pointwiserather than simultaneous confidence bounds across all grid locationsdue to the large number of grid points where we make comparisons.

8

4.7 4.8 4.9 5.0 5.1 5.2 5.3

4.2

4.3

4.4

4.5

4.6

4.7

4.8

u

v

Cases, bandwidth = 0.05

4.7 4.8 4.9 5.0 5.1 5.2 5.3

4.2

4.3

4.4

4.5

4.6

4.7

4.8

u

v

Controls, bandwidth = 0.05

Figure 2: Kernel density estimates of the North Humberside leukemiacases (left) and controls (right) in North Humberside with bandwithset to 0.05.

For the North Humberside data, we mark grid locations in the right-hand plot of Figure 3 with the observed log relative risk value fallingabove the 97.5th percentile of random labelling values by a “+” andthose falling below the 2.5th percentile by a “-”. We note that the peakon the western side of the study area and the troughs on the northernand eastern edge each fall outside of the random labelling toleranceintervals. Examination of the data in Figure 1 reveals all three areasrepresent low density regions within the study area. In particular, thepeak covers an area with very few cases and no controls.

To investigate the robustness of the significant departures in Fig-ure 3, we consider broader bandwidths to incorporate more informationin the sparsely represented regions of the study area. Kelsall and Dig-gle (1995b) stress the theoretical and practical importance of a sharedbandwidth value to maintain comparability between the two density es-

9

u

v

Log relative risk

Log relative risk surface

4.7 4.8 4.9 5.0 5.1 5.2 5.3

4.2

4.3

4.4

4.5

4.6

4.7

4.8

u

v

Relative risk, bandwidth = 0.05

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−

Figure 3: Estimated log relative risk surface for the North Humbersideleukemia data with bandwidth set to 0.05 for both cases and controls.

timates and offer a cross-validation algorithm for identifying the band-width minimizing the mean integrated squared error between the es-timate and the true (unobserved) density surface. While identifyinga single, optimal bandwidth has merit, Silverman (1986) also notesthe value of exploring the stability of observed structures across band-width values, an approach we take in our illustrative example here. Toillustrate the point, consider the density estimates and associated logrelative risk surfaces in Figures 4 and 5. As one would expect, we seethe peaks and valleys attenuated, but note that the western peak re-mains outside the range of values obtained under 500 random labellingsimulations, suggesting a “significant” increase in the local relative risk.

The example reveals several items meriting additional comments.First, we note the impact of bandwidth selection on the general ap-pearance and smoothness of the estimated log relative risk surface. Inaddition, we note the impact of sparsely populated (or at least sparsely

10

u

v

Log relative risk


4.7 4.8 4.9 5.0 5.1 5.2 5.3

4.2

4.3

4.4

4.5

4.6

4.7

4.8

u

v


+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

−−−−−−−−−−−−−−−−−−−−−−−−


sampled) areas on the estimated surface, particularly when such areasare near the edge of the study area. Kernel estimates are particularlyvariable near these edges since we do not include information from anycases or controls falling outside of the study region boundary. This issueof “edge effects” has seen some discussion in the spatial analysis litera-ture (Lawson 2001), but adjustments remain largely ad-hoc and rarely(if ever) offer different adjustments for different sorts of edges (e.g., acoastline represents a true edge beyond which no cases can occur, whilea county or country boundary is purely a political distinction with lit-tle causal association with local disease risk). Finally, the illustrationalso reminds us of the epidemiologic distinction between relative andattributable risks since the observed elevated estimate of log relativerisk corresponds to only a small number of actual cases.

The log relative risk surface represents one way of exploring spatially-referenced point data for local “clusters” of increased risk. Waller and

11

u

v

Log relative risk


4.7 4.8 4.9 5.0 5.1 5.2 5.3

4.2

4.3

4.4

4.5

4.6

4.7

4.8

u

v


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


Gotway (2004, Chapters 6 and 7) review a wide variety of additionalapproaches for cluster detection, noting that each approach uses itsown mathematical definition of a “cluster” and, as a result, differentmethods may detect differing evidence of clusters (and, in fact, differentclusters) within the same data set.

To illustrate this point, we briefly describe the popular spatial scanstatistic developed by Kulldorff (1997) and available in the freewarepackage SaTScan (Kulldorff and Information Management Services, Inc.2002), then apply it to the North Humberside leukemia data. Scanstatistics consist of a moving window where one calculates the statis-tic of interest (here the local relative risk) inside and outside of thewindow and seeks to identify the window (or windows) providing themost extreme values. In the particular instance of the spatial scanstatistic developed by Kulldorff (1997), we consider circular windowsof varying sizes (with windows ranging from those containing a single

12

case or control to those containing half of the sampled population),and for each window we calculate a likelihood ratio statistic for thehypothesis of equal risk inside and outside of the window. More specif-ically, let n1,in and nin = (n1,in + n0,in) denote the number of caselocations and persons at risk (number of case and control locations)inside a particular window, respectively, and similarly define n1,out andnout = (n1,out + n0,out) for outside the window. The statistic of interestis the maximum of the local likelihood ratio statistics, for the Poissoncase,

maxall windows

(n1,in

nin

)n1,in(

n1,out

nout

)n1,out

I(

n1,in

nin

>n1,out

nout

), (3.2)

where I(·) denotes the indicator function, so we only maximize overwindows where the observed rate inside the window exceeds that out-side the window.

We obtain a likelihood ratio via equation 3.2 within each windowand identify the window(s) yielding the highest value. Rather than us-ing distributional results for each likelihood ratio statistic (which wouldresult in multiple testing problems), Kulldorff (1997) instead proposesa Monte Carlo test addressing the following question: “How unusualis the highest observed local likelihood ratio statistic?” For each of alarge number of simulated assignments of case locations, we again findthe window (among those considered) and the largest local likelihoodratio statistic. The test statistic obtained from the original observeddata is ranked against those values obtained from the simulated datathereby providing a Monte Carlo p-value associated with the “mostlikely cluster”.

It is important to note two features. First, we note that the ob-served likelihood ratio statistic for the most likely cluster is rankedagainst the maximum statistic from each simulation, regardless of lo-cation. This avoids the multiple comparison problem in a clever way,but requires careful explanation to understand what precisely is meantby significance statements regarding the most likely cluster. Second,the qualifying phrase “among those considered” added above in refer-ence to the spatial window also defines an aspect of the proper contextfor interpretation. Specifically, the set of windows considered defines afamily of potential clusters and our analysis assesses the most unusualcase aggregation among these. As an example, consider a long, linearcluster of increased risk. In order to “capture” the cluster within ourcircular windows, we will either have a small circle containing part ofthe cluster or a large circle containing most of the cluster but also a

13

large area experiencing the null, background risk. The first exampleloses statistical power due to a smaller local sample size, the seconddue to the diluted relative risk averaged over the larger window. Morerecent implementations of SaTScan allow elliptical clusters to generalizethe set of potential clusters. Other recent developments include the up-per level set approach of Patil and Taillie (2004) and an approach basedon minimum spanning trees (Assuncao et al. 2006) which broaden theclass of potential clusters at the expense of increases in computationtime.

We apply the spatial scan statistic (using SaTScan, version 3.0)(Kulldorff and Information Management Services, Inc., 2002) to theNorth Humberside leukemia data presented in Figure 1. Figure 6 in-dicates the most likely clusters by arrows (both having the same localrelative risk value). These clusters are quite small, both contain fourcases out of four individuals at risk (with 1.22 cases expected under anull hypothesis of constant risk), and the Monte Carlo p-value associ-ated with this value is 0.648, based on 999 simulations.

Comparing the results of the spatial scan statistic to those of the logrelative risk surface brings home several conceptual issues. First, notethat the basic question of interest is the same in both cases, namely,we wish to find areas (“clusters”) inconsistent with a null hypothesisof equal risk. In this example, the data are the same, but the differ-ent methods operationalize our question slightly differently. The mostlikely clusters for the scan statistic are much smaller than the suspi-cious areas raised by the log relative risk surface. Both of the mostlikely clusters are found in areas of higher population density. Dueto their small size and the concentration of nearby case and controlevent locations, these aggregations are unlikely to be detected by ker-nel smoothing methods, due to the use of a single, fixed bandwidthacross the entire study area. Regardless of the resulting statistical sig-nificance, the choice of a bandwidth small enough to detect very localexcesses are unlikely to be effective in summarizing patterns in themore sparsely populated sections of the study area. In addition, thearea most suspicious for the log relative risk surface is unlikely to bedetected by a circular scan statistic due to its elongated shape alongthe edge of the study area.

To wrap up our discussion of point patterns in spatial epidemiologicliterature and our North Humberside example, we note that neitherapproach presented above (nor many other proposed methods found inthe literature) are necessarily more “correct” than any other. Rather,each test examines a class of potential deviations from the null set-

14

4.7 4.8 4.9 5.0 5.1 5.2 5.3

4.2

4.3

4.4

4.5

4.6

4.7

4.8

u

v

Most likely clusters: Spatial Scan Statistic

Figure 6: Most likely clusters identified by the spatial scan statistic forthe North Humberside leukemia data. Neither is statistically signifi-cant, based on 999 Monte Carlo simulations.

ting, and draws conclusions based on that examination. Our examplepresents two specific approaches, but the same general principle ap-plies to all methods for detecting clusters: Different methods define(explicitly or implicitly) the sorts of deviations under considerationand different methods may provide different results based the types ofclusters present, hence no single method will provide a comprehensiveassessment of the presence or absence of clusters. Rather, applyingdifferent methods to the same data set may provide insight into thetype of clusters potentially found within the data (Waller and Jacquez1995).

We conclude this section by noting that the field of spatial point pro-cess modelling is far broader than the two methods presented here anddirect interested readers to general surveys in Lawson (2001), Wallerand Gotway (2004, Chapters 5-8), and especially recent texts by Møller

15

and Waagepeterson (2004) and Baddeley et al. (2006). In addition, thesubfield of cluster detection includes many additional approaches. Inparticular, the text by Lawson and Denison (2002); papers by Anselin(1995), Getis and Ord (1992), Ord and Getis (1995), and Tango (1995);and the recent comprehensive review by Kulldorff (2006) provide en-trance to additional families of analytic cluster detection techniques.These publications and the references therein provide a wider exam-ination of different classes of methods and inferential questions andtechniques than is presented here.

4 PREDICTING SPATIAL OBSERVATIONS

We next consider another important component of spatial epidemiol-ogy, namely the prediction of local exposures across the study areaD, based on a finite number of observations taken at point locationswithin D. Many methods central to spatial prediction have their rootsin the geology and mining literature and the field of geostatistics is fo-cused on the mathematics and associated inferential methods of spatialprediction.

The basic elements of spatial prediction follows a very intuitivestructure. First, we assume that observations are spatially correlatedwith observations taken close together more closely related than thosetaken far apart (also known in the geography literature as “Tobler’sFirst Law of Geography” after the eminent geographer Waldo Tobler,Tobler 1970). Second, an accurate estimate of spatial correlation as afunction of distance should allow us to combine information across sitesand predict the outcome at any location relative to the observed mea-surements and their locations relative to the prediction location. Takentogether, spatial prediction is a two-component process: estimate cor-relation (covariance) as a function of space then use this covariancefunction to combine observations and create a set of predictions acrossthe study area. In the case of a linear prediction of Gaussian data (aweighted average of observed measurements with weights dependent onthe covariance function), the process is often referred to as kriging inhonor of D.G. Krige, a South African mining engineer (Krige 1951).

The geostatistical literature is large and varied. Cressie (1993) andChiles and Delfiner (1999) offer comprehensive coverage of classicalstatistical inference in geostatistics, while Stein (1999) expands its the-oretical basis and Wackernagel (1995) focuses on multivariate setting.Webster and Oliver (2001) give an applied introduction to geostatistics

16

and Waller and Gotway (2004, Chapter 8) explore results in the settingof predicting exposure values for public health.

Recent methodologic advances of particular interest to spatial epi-demiology involve the use of hierarchical models for spatial prediction,often in a Bayesian setting using Markov chain Monte Carlo (MCMC)algorithms for inference. We focus attention on this formulation here,drawing primarily from Diggle, Tawn and Moyeed (1998) and the recenttext by Banerjee, Carlin and Gelfand (2004).

We illustrate the model development and application on a data setinvolving soil samples and dioxin contamination originally publishedin Zirschky and Harris (1986) and used as a case study in Waller andGotway (2004). In 1971, a truck carrying dioxin-contaminated wastedumped part of its load along a road in rural Missouri. In 1983, theUnited States’ Environmental Protection Agency collected soil samplesin a systematic manner along the road at varying distances, and at ahigher frequency in the immediate area of the spill. Figure 7 shows thesampling locations with circles having area proportional to the mea-sured log concentration (with concentration measured in µg/kg of soil)of dioxin taken at that location. As we might expect, concentrationsare highest in the vicinity of the spill (near coordinates (15,30)) andalong the roadway (the line Y = 30).

Several basic assumptions underly spatial prediction. Most of thesemay be generalized, but we present a straightforward example here toillustrate the approach. The first assumption is that of stationarity,i.e., the assumption that the spatial correlation structure is the sameacross the entire study area. This assumption provides a sense of repli-cation for estimation of the spatial covariance function, since we oftenonly observe one data set in any particular application. Second, one of-ten assumes isotropy, i.e., the spatial covariance declines with distancein the same manner in all directions. Third, spatial prediction oftenassumes Gaussian measurements taken at each location. Under theGaussian assumption, classical kriging methods provide the best linearunbiased prediction (BLUP) for each location in the study area. Finally,for simplicity, we will assume a constant mean for all observations inour development, i.e., all spatial pattern is due to spatial covarianceamong the observations.

In our dioxin example, we maintain these assumptions for ease ofexposition. However, in order for the assumptions to best apply, wetransform the data, first dividing the original X coordinate by 50 tobetter follow isotropy then considering the log concentrations to bettermeet the Gaussian assumption. The latter transformation results in

17

0 10 20 30 40 50 60 70

010

2030

4050

6070

X

Y

Figure 7: Location of dioxin measurements. The area of each symbol isproportional to the natural logarithm of the concentration as measuredin µg/kg of soil.

the BLUP of the transformed values and is no longer a BLUP of thedioxin concentration itself. Related bias adjustments are provided byCressie (1993), but we focus here on the basic approach and comparethe classical and recent hierarchical formulations of spatial predictionusing the transformed data set for illustration.

Let Z(s) denote the outcome of interest (dioxin in our example),measured at location s within study area D. Let Z(si) denote our nmeasured values taken at locations si, i = 1, . . . , n. In classical geo-statistics we seek a BLUP, hence our goal is to obtain an unbiasedprediction Z(s0) for any location s0 in D, defined by a linear combina-tion of observed values

Z(s0) =n∑

i=1

ηiZ(si),

18

with weights ηi to minimize the prediction error (Z(s0)− Z(s0)), typi-cally summarized in classical geostatistics by the mean square predictionerror (MSPE)

MSPE = E[(

Z(s0)− Z(s0))2

].

In short, we need to find the set of weights {ηi, i = 1, . . . , n} minimizingthe MSPE under the given unbiasedness constraint.

Since we assume a constant mean, it should come as no surprise thatthe optimal weights will depend on the spatial covariance between theobserved locations and any prediction location of interest. The stan-dard classical approach typically considers the semivariogram, denotedγ(h), defined via

2γ(||si − sj||) = V ar(Z(si)− Z(sj)), (4.3)

for locations si and sj within D. In other words, the semivariogram isdefined as one-half of the variance of the contrast between observationstaken at distance h = ||si− sj|| apart. If we assume this function is thesame for distance h regardless of the locations and relative orientationof si and sj then the semivariogram is stationary and isotropic, respec-tively. The semivariogram is related to the spatial covariance function,C(h), specifically

γ(h) = C(0)− C(h),

and, as such, the semivariogram must meet conditions necessary to en-sure a positive-definite variance-covariance matrix for all measurements(observed or not) within the study area. Based on this relationship, andsince we generally assume positive spatial correlation declining withdistance, the semivariogram will typically be an increasing function ofdistance, often rising to a sill value representing the variance betweenobservations taken far enough apart to be effectively independent. Thesemivariogram has been slightly preferred over the covariance functionin classical geostatistics primarily due to ease and accuracy of estima-tion over those for the covariance function, among other, more technicalreasons.

Given the semivariogram γ(h), we obtain the optimal predictionweights η1, . . . , ηn as solutions to the kriging equations

η = Γ−1γ (4.4)

where

η = (η1, . . . , ηn,m)′

19

γ = (γ(s0 − s1), . . . , γ(s0 − sn), 1)′

where the elements of Γ are

Γij =

γ(si − sj) i = 1, . . . , n;j = 1, . . . , n;

1 i = n + 1; j = 1, . . . , n;j = n + 1; i = 1, . . . , n;

0 i = j = n + 1.

So (4.4) becomes

η1

η2...

ηn

m

=

γ(s1 − s1) · · · γ(s1 − sn) 1γ(s2 − s1) · · · γ(s2 − sn) 1

.... . .

......

γ(sn − s1) · · · γ(sn − sn) 11 · · · 1 0

−1

γ(s0 − s1)γ(s0 − s2)

...γ(s0 − sn)

1

.

The equations above derive from minimization of the MSPE with anadditional Lagrangian multiplier m included to guarantee an unbiasedpredictor.

Computationally, note that we must calculate η for each predictionlocation, s0. However, only the vector γ changes with the predictionlocation. Since Γ depends only on the data but not the prediction loca-tions, we need only invert Γ once and then multiply by the associatedγ vector to obtain a prediction for any s0 in D.

The minimized MSPE, also known as the kriging variance, derivesfrom the same elements and is given by

σ2k(s0) = η′γ

=n∑

i=1

λiγ(s0 − si) + m (4.5)

= 2n∑

i=1

λiγ(s0 − si)−n∑

i=1

n∑

j=1

ηiηjγ(si − sj),

From equations 4.4 and 4.5 we see that if we have the semivariogramfunction, we have all we need to provide BLUP predictions at any lo-cation. As a result, there is a considerable literature on variography,

20

or the estimation of the semivariogram from observed data. One typ-ically estimates the semivariogram from the observed data contrasts,often averaging over pairs of observations taken the same (or nearlythe same) distance apart. Such averages often provide an empiricalsemivariogram, to which one fits a theoretical semivariogram defined asa parametric function of distance. Commonly used parametric fami-lies are cataloged throughout the spatial statistical literature, e.g., inCressie (1993), and Waller and Gotway (2004, Chapter 8). Parametricsemivariogram families are often defined in terms of the semivariogram’slimiting value as distance approaches zero (the nugget), the semivari-ogram’s limiting value as distance increases (the sill), and the distancebeyond which observations are effectively independent (the range).

Figure 8 illustrates the empirical semivariogram for the dioxin data,shown with dots representing one half of the average variation betweencontrasts observed at given distances apart. The lines represent the bestfitting theoretical semivariograms from the exponential semivariogrammodel defined by

γ(h; c0, ce, ae) =

{0 h = 0c0 + ce {1− exp(−h/ae)} h > 0,

(4.6)

where c0 ≥ 0 denotes the nugget effect, ce ≥ 0 denotes the partialsill, and ae > 0, where 3ae denotes the “effective range” (tradition-ally defined as the distance at which the autocorrelation is 0.05). Wefit the theoretical semivariograms using both least squares (ordinaryand weighted) and likelihood-based (maximum likelihood and restrictedmaximum likelihood) method in the geoR library for R (Ribeiro and Dig-gle 2001). The observed difference between the two types of estimatorsuggest a skewed likelihood across the parameter space.

At this point, one would select a “best” semivariogram from theestimates, then condition on this estimated function to define the vectorγ in the kriging equations to obtain both the point predictions at a setof prediction locations as well as the associated kriging variances. ForGaussian data, these will allow construction of pointwise predictionintervals at each location.

As noted above, the classical kriging approach is typically a two-stage process: first estimate the semivariogram, then solve the krigingequations. Common criticisms of the classical kriging approach in-clude concern that the uncertainty associated with the estimation ofthe semivariogram is not adequately reflected in the prediction errorsexpressed in the kriging variances. In addition, the typical semivari-ogram estimation procedure outlined above is itself a two step proce-

21

0 10 20 30 40 50

01

23

4

Estimated Semivariograms

distance

sem

ivar

ianc

e

MLREMLOLSWLS

Figure 8: Semivariograms for dioxin measurements. The circles denotethe empirical semivariogram and the lines denote the best fitting expo-nential semivariogram under maximum likelhood (ML), restricted max-imum likelhood (REML), ordinary least squares (OLS), and weightedleast squares (WLS).

dure involving construction of the point estimates defining the empiricalsemivariogram, then the estimation of the parameters in the selectedtheoretical semivariogram family. Finally, the asymptotic propertiesof semivariogram estimation are nontrivial, depending on whether oneuses infill (Stein 1999) or increasing domain (Cressie 1993, pp. 100-101)asymptotics.

Recent attention has turned toward an effort to express spatial pre-diction in a more cohesive manner addressing the uncertainty in all com-ponents in such a way as to accurately reflect the entire process. Thepreceding description of classical spatial prediction followed a ratherutilitarian approach, highlighting each step of the typical analyticalprocess. The basic theoretical construction underlying both classical

22

and Bayesian kriging, on the other hand, is rather elegant when ex-pressed in a hierarchical fashion, but until recently could not be readilyimplemented as such. However, with the advent of Markov chain MonteCarlo (MCMC) techniques, a general computational framework for ad-dressing hierarchical models now exists and recent statistical publica-tions move toward bringing these ideas and algorithms to the field ofspatial prediction. This is not to say that MCMC solves all problemsas its implementation is often slow and much more computationallydemanding than the classical approach outlined above.

Statistical prediction can be thought of in a Bayesian frameworkwhere the optimal predictor is defined by the conditional expecta-tion of Z(s0) given the observed data. In a more formal Bayesianstatement, the point prediction defined by the conditional expectationE[Z(s0)|Z(s1), . . . , Z(sn)] represents the posterior predictive expecta-tion of Z(s0) marginalizing over the posterior distribution of any modelparameters. For Gaussian data, this conditional expectation is a linearfunction of the data (motivating the BLUP formulation above), butfor non-Gaussian data this conditional expectation need not be linearin the data, so the conditional expectation provides a more generalframework for prediction, provided the expectation (and appropriateuncertainty measures) can be calculated.

More specifically, one can briefly contrast classical and hierarchically-specified spatial prediction following exposition found in Diggle et al.(1998), Diggle and Ribeiro (2002), Banerjee et al. (2004) and Waller(2005). Let Z denote the vector of data, S the unobserved spatial ran-dom field of all observations in D, and θ a vector of model parameters,for our purposes the set of covariance parameters. In effect, S is theset of values Z(s0) for all s0 in D. It is helpful to distinguish Z andS in model development, since we model the stochastic structure of Sand seek inference regarding model parameters based on observationsin Z. Our prediction goal is to obtain the conditional distribution of Sgiven Z and θ, expressed in general notation as [S|Z, θ]. Diggle andRibeiro (2002) note Bayes’ Theorem provides

[S|Z,θ] =[Z, S|θ]

[Z|θ]=

[Z|S,θ][S|θ]∫[Z|S,θ][S|θ]dS

. (4.7)

This development conditions on a known value of the set of covarianceparameters θ, and summarizes the classical spatial prediction setting ina general and elegant hierarchical form. As illustrated in the develop-ment above, in practice, one typically estimates θ from the data, thencalculates the conditional expectation above (conditional on both the

23

data Z and the parameter estimates) via the kriging equations.The hierarchical structure in equation 4.7 allows both frequentist

and Bayesian implementation. A frequentist approach builds the like-lihood from the hierarchical components then requires calculation ofthe associated predictive distribution for inference, allowing one to in-corporate the variability associated with the covariance function (or,equivalently, the semivariogram) into the likelihood at the cost of morecomplicated computation. In contrast, a Bayesian view of equation 4.7assumes a prior distribution [θ] for the unknown parameters, thenmarginalizes over the posterior distribution of the parameters giventhe data, [θ|Z] yielding the posterior predictive distribution

[S|Z] =∫

[S|θ,Z][θ|Z]dθ. (4.8)

Bayesian kriging draws point and interval predictions from this poste-rior predictive distribution. While simple in theory, application of thefull Bayesian approach typically encounters complicated or intractableintegrals, and also relies on advanced computing through MCMC sam-pling from the desired posterior distributions for inference. Both thefrequentist and Bayesian implementations require either advanced com-puting or simplifying assumptions (e.g., treating the estimated semi-variogram as fixed, as in the classical approach). We focus here onthe Bayesian implementation of the hierarchical structure due to its in-creasing application in the literature and the advent of more generallyavailable MCMC code for such models, and we apply both Bayesianand classical kriging to the dioxin data to compare results.

For comparability with the classical development above, supposeour data are Gaussian with constant mean, β0, i.e., suppose

Z ∼ MV N (β0, ΣZ) , (4.9)

where β0 represents an n-vector of the constant mean β0 and ΣZ then × n variance-covariance matrix of the Z’s. (We use the interceptnotation β0 to highlight where one might add additional parameters ina regression-type model of the mean). Next suppose we have a para-metric covariance function C(h; θ), defined up to the unknown modelparameters θ. For our example, we will use the isotropic, stationaryexponential covariance function family, corresponding to the exponen-tial semivariogram defined above. The covariance function defines theelements of ΣZ based on the distances between pairs of observationlocations, i.e.,

ΣZij= C(||si − sj||,θ).

24

The latent random field S serves as a data generator for Z, in thesense that the covariance function C(h, θ) is defined for any locations in D. For any set of n locations, we obtain a vector of observationsZ with the multivariate normal distribution defined in equation 4.9,with C(h, θ) and the relative locations of observations defining ΣZ .For simplicity, we assume no measurement error and set Z(si) = S(si).

The next step in the model definition is to define prior parametersfor model parameters β0, and θ = (ce, c0, ae). In most applications,the likelihood structure for mean parameter β0 is quite strong allow-ing very vague prior specifications. Prior specification for covarianceparameters in θ is somewhat more complicated, recently reviewed byWaller (2005) and summarized here. One could ignore θ by specifyinga conjugate inverse-Wishart prior for ΣZ (Box and Tiao 1973, pp. 459-470; Daniels and Kass 1999), but note that the inverse-Wishart does notlimit attention to specifically spatial covariance structures and, similarto discussions of clustering approaches in the preceding section, maynot focus attention on the set of models we are particularly interestedin exploring. A more common practice is to define individual conjugateprior distributions for each parameter within θ. While practical, thisapproach still requires care as noted by Berger et al. (2001, 2003) whoconsider reference and Jeffreys’ priors for variance-covariance parame-ters in a Gaussian random field. This setting provides one of the fewexamples where applying Jeffreys’ prior independently to each elementof θ yields an improper posterior distribution, suggesting a need forfurther work and especially for care in transferring seemingly sensiblepriors from the non-spatial to the spatial setting.

For the dioxin example, we assign a flat prior to β0, assume a nuggeteffect of c0 = 0, and a prior proportional to the reciprocal of the sillce (variance of independent observations). Rather than consider theeffective range parameter ae, directly, we instead model the rate ofexponential decay in the covariance function, denoted φ. We assign adiscrete prior based on 51 equally spaced values between zero and twicethe maximum observed distance between sampling locations. We fit themodel using the R library geoR (Ribeiro and Diggle 2001). Samples fromthe posterior distributions of all three parameters appear in Figure 9.We find a clear posterior signal for β0 (compared to the assumed flatprior). For φ, Figure 9 includes a thin line illustrating the assignedprior distribution and a thick line representing a kernel estimate of theposterior distribution. We see both the posterior moving away fromthe prior and tightening around its (posterior) mean value.

Figure 10 shows the empirical semivariogram values from Figure 8

25

beta0

Den

sity

−10 −5 0 5 10 15

0.0

0.1

0.2

0.3

sill

Den

sity

0 5 10 15 20 25

0.00

0.10

0.20

0.30

φ

Den

sity

0 20 40 60 80 100

0.00

0.02

0.04

0.06

Figure 9: Posterior samples for semivariogram parameters.

and the semivariograms corresponding to the exponential covariancefunction evaluated at the estimated posterior mean, median, and mode.The similarity between the two figures is reassuring and suggests ac-curate implementation of the Bayesian approach. We note that theskewed posterior distributions of the sill and φ result in the differencebetween the curves based on posterior mean and medians and thosebased on the posterior mode (which is, not surprisingly, quite similarto the maximum likelihood estimates shown in Figure 8).

Figure 11 shows the posterior mean predicted levels of log(dioxin)(the posterior mean surface S) as perspective plots (from two differentorientations) in the top row and as a contour plot in the lower left. Theimage plot in the lower right shows the spatial pattern in the varianceof the posterior predictive distribution. Predictive variance is highesetin areas with few observations, in particular, note the band of highpredictive variances along the road (Y = 30), representing the band ofno observations.

26

0 10 20 30 40 50

01

23

45

distance

sem

ivar

ianc

e

variogram posterior meanvariogram posterior medianparameters posterior mode

Figure 10: Bayesian semivariograms for dioxin measurements.

Figure 12 illustrates the posterior predictive mean and associated2.5th and 97.5th percentiles drawn from a sample of 5,000 samples fromthe posterior predictive distribution taken along transects for four dif-ferent values of X. Note the widening of the posterior predictive dis-tribution in the area near the road (Y = 30) corresponding to theincreased predictive variance associated with the lack of data taken inthis area. Also, note the “tightening” of the prediction errors near thesampling rows near Y = 10, 20, 40, 50, and 60.

An interesting feature appears in Figure 12 for X = 15, the predic-tions taken through the area of highest observed dioxin concentrationvalues. Note that the mean posterior prediction dips down betweenthe peak values taken on either side of the road. This is due to ourbasing prediction on a distance-decay correlation function. If we movebeyond the effective range estimated by our covariance (or semivari-ogram) function, observations are (nearly) independent of one anotherand each would receive nearly the same weight, resulting in the sample

27

X

Y

log(dioxin)

Mean of predictive distribution

XY

log(dioxin)


X Coord

Y C

oord

0 10 20 30 40 50 60 70

010

2030

4050

6070


0 10 20 30 40 50 60 70

010

2030

4050

6070

X

Y

Local variance of predictive distribution

Figure 11: Kriged log(dioxin) measurements.

mean in the BLUP case or a posterior estimate of the overall (assumedfixed) mean β0. In short, spatial prediction adjusts toward neighboringvalues based on how correlated we expect those values to be to ourdesired prediction. If no observations are close, the method resorts toan estimate of the background mean.

Results based on classical results (solving the kriging equationsbased on the variogram estimates shown in Figure 8) are quite sim-ilar to those shown here for Bayesian kriging. As a result, one mightwonder what we gain from the extra model specification (setting priordistributions) and the extra computational effort required for MCMCimplementation over the basic matrix calculations required for classicalkriging. For Gaussian data with a covariance structure easily modeledby standard parametric covariance functions and satisfying the basicassumptions of stationarity and isotropy, there may be little gain otherthan a sense of completeness in the definition of the full probabilitymodel incorporating both covariance estimation and outcome predic-

28

0 10 20 30 40 50 60 70

−4

−2

02

4

Y

log(

diox

in)

X = 5

0 10 20 30 40 50 60 70

−4

−2

02

4

Y

log(

diox

in)

X = 10

0 10 20 30 40 50 60 70

−4

−2

02

4

Y

log(

diox

in)

X = 15

0 10 20 30 40 50 60 70

−4

−2

02

4

Y

log(

diox

in)

X = 20

Figure 12: Posterior predictive values of natural logarithm of the localdioxin concentration for selected transects perpendicular to the road.The thick line denotes the posterior predictive mean value and thedashed lines represent the 2.5th and 97.5th percentiles of 5,000 samplesfrom the posterior predictive distribution.

tion.However, the Bayesian hierarchical framework also sets the stage

for broader extensions than does the classical framework. To see this,consider the descriptions above. Both the classical setting and theBayesian setting derive from a basic definition of conditional probabil-ity in equation 4.7. At this point the two descriptions (above and inmost of the spatial statistical literature) diverge in a manner mirroringa contrast seen in the frequentist and Bayesian literature. At the riskof over-simplifying, the classical (frequentist) description often buildsupon the basic framework in the following manner: Given the data areGaussian with a fixed mean we know (or can derive) that the predictorwill be a linear function of the observed data. We would like the weights

29

in this linear predictor to minimize the mean square prediction error,so we define the MSPE as a function of the weights and optimize sub-ject to an unbiasedness constraint, obtaining the MSPE as part of theoptimization. We find the optimal weights are a function of the semivar-iogram, and next seek to estimate the semivariogram in a accurate andconsistent manner in order to provide the best set of prediction weights.The prediction goal has been achieved by a series of theoretical deriva-tions and accompanying calculations, effectively achieving the largertask by a series of focused smaller tasks, each built on appropriate the-oretical results. Again, at the risk of oversimplification, in the Bayesiandescription the initial goal of prediction is again cast in the setting of aconditional probability. This conditional probability is reexpressed asa marginalization of a hierarchical probability model with parametersdefining the overall mean and covariance function (or semivariogramif preferred). The inclusion of prior probabilities places prediction inthe setting of a posterior distribution, the primary inferential tool forBayesian statistics. The details of interest in development relate to thestructure of the full probability model, e.g., defining the likelihood viathe multivariate normal distribution given the covariance structure inequation 4.9, next specifying this covariance structure given its parame-ters, and finally specifying the prior distributions for these parameters.The computational implementation falls to MCMC, perhaps a com-plicated MCMC requiring care in set-up and implementation, but anMCMC algorithm nonetheless.

Here we note a subtle difference in value between the settings, i.e.,a difference in what is regarded as the “cool” part of the derivation.In the classical development, value is placed on a deconstruction ofthe problem at hand into a series of steps motivated and validated bystatistical theory, each step with an accurate and efficient mode of cal-culation. Even in a frequentist evaluation of a hierarchical likelihood,similar steps occur (Lee and Nelder 1996, 2006). In the Bayesian de-velopment however, value is placed on an accurate formulation of theproblem in terms of a joint posterior distribution defined through aninterconnected set of hierarchical components, each justified in its ownright and fitting together in a manner that guarantees a proper poste-rior distribution incorporating all sources of potential information.

With this distinction in mind, an advantage of the Bayesian ap-proach appears when we consider changes to the components of theprobabilistic structure. Suppose we no longer have Gaussian data, butrather observations following some other distribution? In the classi-cal setting, this impacts one of the first steps in the process, namely

30

the equivalence between the desired conditional expectation and a lin-ear combination of observations. One approach is to transform thedata to achieve a distribution closer to a Gaussian distribution (as inthe dioxin example). The problem is not insurmountable, in fact, logGaussian (or more generally trans-Gaussian kriging) is widely used, butit requires reconstruction of one of the key components of the classi-cal approach. For the Bayesian formulation, moving from a Gaussiandistribution requires reformulation of the likelihood function and canintroduce complicated identifiability issues as the mean and covarianceparameters may no longer be orthogonal, yet the basic method is stillin place. This is not to say the Bayesian approach is necessarily easierto adapt, but one can argue that the required adjustments impact lessof the basic structure than they do in classical kriging.

5 CONCLUSIONS

In the discussion above, we focus attention on developing statisticalissues for two of the four general questions of interest in spatial epi-demiology. The resulting examples illustrate important issues under-lying the analysis of spatially-referenced data ranging from the inter-pretation of spatial cluster detection to contrasts between classical andBayesian spatial prediction. While the approaches and issues may seemvery different at first glance, the two areas highlight current convergingdirections of development in spatial epidemiology.

First, note the similarity between the spatial intensity function λ(s)in Section 3 and the latent random field S in Section 4. The hierarchicalstructure outlined regarding spatial prediction suggests extension tothe point process setting, as discussed in Møller and Waagepetersen(2004), and Diggle et al. (2000). In particular, Diggle et al. (2000)note that parametric (often hierarchical) approaches to the analysisof spatial point processes often provide increased accuracy, while non-parametric (e.g., the kernel-based approaches above) are often morerobust to violations of the assumed parametric models.

Second, while the definition of a spatial Poisson process motivatesthe construction of spatial Poisson regression models popular in diseasemapping (Lawson 2001, Waller and Gotway 2004, Banerjee et al. 2004,Wakefield et al. 2000), the hierarchical framework also offers a wayto build inference for aggregated counts of underlying (latent) pointpatterns, even for data collected at differing levels of aggregation. Bestet al. (2000) provide an example of such an approach.

31

The field of spatial epidemiology is much larger than the two de-tailed areas considered here and further classes of analytic methodsappear in the literature addressing additional epidemiologic questionsfor additional forms of available data. A prime example is the use ofremote sensing data in epidemiologic investigations. Robinson (2000)provides a thorough review of statistical techniques for remote sensingdata in public health and Goovaerts et al. (2005) define methods linkingboth of the analytic areas considered above (geostatistical predictionand cluster detection) within the setting of remote sensing data.

In conclusion, spatial epidemiology offers much opportunity for con-tinued methodologic development in order to provide accurate, reliableinference on important public health issues.

ACKNOWLEDGEMENT

This work is supported in part by grant NIEHS R01 ES007750. Theopinions expressed herein are solely those of the author and may notreflect those of the National Institutes of Health or the National Insti-tute of Environmental Health Sciences.

REFERENCES

[1] Anselin, L. (1995) Local indicators of spatial association: LISA. Ge-ographical Analysis 27, 93-116.

[2] Assuncao, R., Costa, M., Tavares, A., and Ferreira, S. (2006) Fastdetection of arbitrarily shaped disease clusters. Statistics in Medicine25, 723-742.

[3] Baddeley, A., Gregori, P., Mateau, J., Stoica, R., and Stoyan D.(2006) Case Studies in Spatial Point Process Modeling. New York:Springer.

[4] Banerjee, S., Carlin, B.P., and Gelfand, A.E. (2004) HierarchicalModeling and Analysis of Spatial Data. Boca Raton FL: Chapman &Hall/CRC.

32

[5] Bartlett, M.S. (1964) The spectral analysis of two-dimensional pointprocesses. Biometrika 51, 299-311.

[6] Berger, J.O., De Oliveira, Sanso, B. (2001) Objective Bayesian anal-ysis of spatially correlated data. Journal of the American StatisticalAssociation 96, 1361-1374.

[7] Berger, J.O., De Oliveira, Sanso, B., Ren, C., Wittig, T.A. (2003)Correction. Journal of the American Statistical Association 98, 779.

[8] Besag, J. and Diggle, P.J. (1977) Simple Monte Carlo tests for spa-tial pattern. Applied Statistics 26, 327-333.

[9] Best, N.G., Ickstadt, K., Wolpert, R.L. (2000) Spatial Poisson re-gression for health and exposure data measured at disparate resolutions.Journal of the American Statistical Association 95, 1076-1088.

[10] Bithell, J. (1990) An application of density estimation to geograph-ical epidemiology. Statistics in Medicine 9, 691-701.

[11] Box, G.E.P. and Tiao, G.C. (1973) Bayesian Inference in Statisti-cal Analysis. New York: John Wiley & Sons.

[12] Chiles, J.-P., and Delfiner, P. (1999) Geostatistics: Modeling Spa-tial Uncertainty. New York: John Wiley & Sons.

[13] Cressie, N.A.C. (1993) Statistics for Spatial Data, Revised Edition.New York: John Wiley & Sons.

[14] Cuzick, J. and Edwards, R. (1990) Spatial clustering for inhomo-geneous populations (with discussion). Journal of the Royal StatisticalSociety, Series B 52, 73-104.

[15] Daniels, M. and Kass, R. (1999) Nonconjugate Bayesian estimationof covariance matrices in hierarchical models. Journal of the AmericanStatistical Association 94, 1254-1263.

[16] Diggle, P.J. (1983) Statistical Analysis of Spatial Point Patterns.London: Academic Press.

33

[17] Diggle, P.J. (2000) Overview of statistical methods for disease map-ping and its relationship to cluster detection. In Spatial Epidemiology:Methods and Applications. P. Elliott, J.C. Wakefield, N.G. Best, andD.J. Briggs (eds). Oxford: Oxford University Press. pp. 87-103.

[18] Diggle, P.J. (2003) Statistical Analysis of Spatial Point Patterns,Second Edition. Oxford: Oxford University Press.

[19] Diggle, P., Tawn, J., and Moyeed, R. (1998) Model-based geostatis-tics (with discussion). Applied Statistics 47, 299-350.

[20] Diggle, P.J., Mateau, J., Clough, H.E. (2000) A comparison be-tween parametric and non-parametric approaches to the analysis ofreplicated spatial point patterns. Advances in Applied Probability 32,331-343.

[21] Diggle, P.J. and Robeiro Jr., P.J. (2002) Bayesian inference inGaussian model-based geostatistics. Geographical & EnvironmentalModelling 6, 129-146.

[22] Elliott, P., Cuzick, J., English, D., and Stern, R. (1992) Geographi-cal and Environmental Epidemiology: Methods for Small-Area Studies.Oxford: Oxford University Press.

[23] Elliott, P., Wakefield, J.C., Best, N.G., and Briggs, D.J. (2000)Spatial Epidemiology: Methods and Applications. Oxford: Oxford Uni-versity Press.

[24] Getis, A. and Ord, K. (1992) The analysis of spatial association bydistance statistics. Geographical Analysis 24, 189-207.

[25] Goovaerts, P., Jacquez, G.M., and Marcus, A. (2005) Geostatisticaland local cluster analysis of high resolution hyperspectral imagery fordetection of anomalies. Remote Sensing of Environment 95, 351-367.

[26] Koch, T. (2005) Cartographies of Disease: Maps, Mapping, andMedicine. Redlands, CA: ESRI Press.

[27] Kelsall, J. and Diggle, P.J. (1995a) Kernel estimation of relativerisk. Bernoulli 1, 3-16.

34

[28] Kelsall, J. and Diggle, P.J. (1995b) Non-parametric estimation ofspatial variation in relative risk. Statistics in Medicine 14, 2335-2342.

[29] Kelsall, J. and Diggle, P.J. (1998) Spatial variation in risk: a non-parametric binary regression approach. Statistics in Medicine 17, 559-573.

[30] Krige, D.G. (1951) A statistical approach to some basic mine eval-uation problems on the Witwaterstrand. Journal of Chemical, Metal-lurgical and Mining Society of South Africa 52, 119-139.

[31] Kulldorff, M. (1997) A spatial scan statistic. Communications inStatistics: Theory and Methods 26, 1487-1496.

[32] Kulldorff, M. and Information Management Services, Inc. (2002)SaTScan v. 3.0: Software for the spatial and space-time scan statistics.Bethesda, MD: National Cancer Institute.

[33] Kulldorff, M. (2006) Tests of spatial randomness adjusted for inho-mogeneity: A general framework. Journal of the American StatisticalAssociation 101, 1289-1305.

[34] Lawson, A.B. (2001) Statistical Methods in Spatial Epidemiology.Chichester: John Wiley & Sons.

[35] Lawson, A.B. and D.G.T. Denison (2002) Spatial Cluster Mod-elling. Boca Raton FL: Chapman & Hall/CRC.

[36] Lawson, A.B. and Williams F.L.R. (1993) Applications of extrac-tion mapping in environmental epidemiology. Statistics in Medicine12, 1249-1258.

[37] Lawson, A.B. and Williams F.L.R. (2001) An Introductory Guideto Disease Mapping. Chichester: John Wiley & Sons.

[38] Lee, Y. and Nelder, J.A. (1996) Hierarchical generalized linearmodels. Journal of the Royal Statistical Society, Series B 58, 619-656.

[39] Lee, Y. and Nelder, J.A. (2006) Double hierarchical generalizedlinear models. Journal of the Royal Statistical Society, Series C 55,139-167.

35

[40] Møller, J. and Waagepetersen, R. (2002) Statistical Inference andSimulation for Spatial Point Patterns. Boca Raton, FL: Chapman &Hall/CRC.

[41] Ord, K. and Getis, A. (1995) Local spatial autocorrelation statis-tics: Distributional issues and an application. Geographical Analysis27, 286-306.

[42] Patil, G.P., and Taillie, C. (2004) Upper level set scan statistic fordetecting arbitrarily shaped hotspots. Environmental and EcologicalStatistics 11, 183-197.

[43] R Development Core Team (2005). R: A language and environmentfor statistical computing. R Foundation for Statistical Computing, Vi-enna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org.

[44] Robinson, T.P. (2000) Spatial statistics and geographical informa-tion systems in epidemiology and public health. Advances in Parasitol-ogy 47, 81-128.

[45] Ribeiro Jr., P.J. and Diggle, P.J. (2001) geoR: A package for geo-statistical analysis. R-NEWS, Vol 1, No 2, 15-18. ISSN 1609-3631.

[46] Silverman, B.W. (1986) Density Estimation for Statistics and DataAnalysis. Boca Raton FL: Chapman & Hall/CRC.

[47] Stein, M.L. (1999) Interpolation of Spatial Data: Some Theory forKriging. New York: Springer.

[48] Stoyan, D., Kendall, W.S., and Mecke, J. (1995) Stochastic Geom-etry and its Applications, Second Edition. New York: John Wiley &Sons.

[49] Tango, T. (1995) A class of tests for detecting “general” and “fo-cused” clustering of rare diseases. Statistics in Medicine 14, 2323-2334.

[50] Tobler, W. (1970) A computer movie simulating urban growth inthe Detroit region. Economic Geography 46, 234-240.

[51] Wackernagel, H. (1995) Multivariate Geostatistics. Berlin: Springer.

36

[52] Wakefield, J., Best, N.G., and Waller, L.A. (2000) Bayesian ap-proaches to disease mapping. In Spatial Epidemiology: Methods andApplications, P. Elliott, J.C. Wakefield, N.G. Best, and D.J. Briggs(eds.) Oxford: Oxford University Press. pp. 106-127.

[53] Waller, L.A. and Jacquez, G.M. (1995) Disease models implicit instatistical tests of disease clustering. Epidemiology 6, 584-590.

[54] Waller, L.A. and Gotway, C.A. (2004) Applied Spatial Analysis ofPublic Health Data. Hoboken NJ: John Wiley & Sons.

[55] Waller, L.A. (2005) Bayesian thinking in spatial statistics. In Hand-book of Statistics, Volume 25: Bayesian Thinking: Modeling and Com-putation, D.K. Dey and C.R. Rao (eds.) Amsterdam: Elsevier-NorthHolland.

[56] Wand, M.P. and Jones, M.C. (1995) Kernel Smoothing. Boca Ra-ton FL: Chapman & Hall/CRC.

[57] Walter, S.D. (2000) Disease mapping: a historical perspective. InSpatial Epidemiology: Methods and Applications. P. Elliott, J.C. Wake-field, N.G. Best, and D.J. Briggs (eds.) Oxford: Oxford UniversityPress. pp. 223-239.

[58] Webster, R. and Oliver, M.A. (2001) Geostatistics for Environmen-tal Scientists. Chichester: John Wiley & Sons.

[59] Zirschky, J.H. and Harris, D.J. (1986) Geostatistical analysis ofhazardous waste site data. Journal of Environmental Engineering,ASCE 112, 770-784.

Addresses for communication:

L. A. WALLER, Department of Biostatistics, Rollins School of PublicHealth, Emory University, 1518 Clifton Road NE, Atlanta, GA 30322,U.S.A.. E-mail: [email protected]

37

Spatial Epidemiology - Rollins School of Public Health Epidemiology LANCE A. WALLER Emory...

Documents

Transcript of Spatial Epidemiology - Rollins School of Public Health Epidemiology LANCE A. WALLER Emory...