MODULE 5: Spatial Statistics in Epidemiology and Public ...

OutlinePreliminaries

Random patternsEstimating intensities

Second order properties

MODULE 5: Spatial Statistics in Epidemiologyand Public Health

Lecture 3: Point Processes

Jon Wakefield and Lance Waller

1 / 37




Preliminaries

Random patternsHeterogeneous Poisson process

Estimating intensities

Second order propertiesK functionsMonte Carlo envelopes

2 / 37




References

I Baddeley, A., Rubak, E., and Turner. R. (2015) Spatial PointPatterns: Methodology and Applications in R. Boca Raton,FL: CRC/Chapman & Hall.

I Diggle, P.J. (1983) Statistical Analysis of Spatial PointPatterns. London: Academic Press.

I Diggle, P.J. (2013) Statistical Analysis of Spatial andSpatio-Temporal Point Patterns, Third EditionCRC/Chapman & Hall.

I Waller and Gotway (2004, Chapter 5) Applied SpatialStatistics for Public Health Data. New York: Wiley.

I Møller, J. and Waagepetersen (2004) Statistical Inference andSimulation for Spatial Point Processes. Boca Raton, FL:CRC/Chapman & Hall.

3 / 37




Goals

I Describe basic types of spatial point patterns.

I Introduce mathematical models for random patterns of events.

I Introduce analytic methods for describing patterns in observedcollections of events.

4 / 37




Heterogeneous Poisson process

Terminology

I Realization: An observed set of event locations (a data set).

I Point: Where an event could occur.

I Event: Where an event did occur.

5 / 37





Complete Spatial Randomness (CSR)

I Start with a model of “lack of pattern”.

I Events equally likely to occur anywhere in the study area(uniform distribution).

I Event locations independent of each other.

6 / 37





Six realizations of CSR

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.2

0.4

0.6

0.8

1.0

u

v

●

●

●●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

0.0 0.2 0.4 0.6 0.8 1.00.

00.

20.

40.

60.

81.

0u

v

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●

0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.2

0.4

0.6

0.8

1.0

u

v

●

●

●

●

●

●

●

●

●●

●

●

●

●●

●

● ●

●

●

●

●

●

●

●

●

●

●

●

●

0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.2

0.4

0.6

0.8

1.0

u

v ●

●●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.2

0.4

0.6

0.8

1.0

u

v

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●

●●

0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.2

0.4

0.6

0.8

1.0

u

v

7 / 37





CSR as a boundary condition

CSR serves as a boundary between:

I Patterns that are more “clustered” than CSR.

I Patterns that are more “regular” than CSR.

8 / 37





Too Clustered (top), Too Regular (bottom)

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.2

0.4

0.6

0.8

1.0

u

v

Clustered

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

0.0 0.2 0.4 0.6 0.8 1.00.

00.

20.

40.

60.

81.

0

u

v

Clustered

●

●

●●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●●

●

●

●

●

●

0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.2

0.4

0.6

0.8

1.0

u

v

Clustered

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.2

0.4

0.6

0.8

1.0

u

v

Regular

●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●●

●

●

●

●

●

●

●

●

●

0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.2

0.4

0.6

0.8

1.0

u

v

Regular

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●

●●

●

0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.2

0.4

0.6

0.8

1.0

u

v

Regular

9 / 37





Spatial Point Processes

I Mathematically, we treat our point patterns as realizations ofa spatial stochastic process.

I A stochastic process is a collection of random variablesX1,X2, . . . ,XN .

I Examples: Number of people in line at grocery store.

I For us, each random variable represents an event location.

10 / 37





CSR as a Stochastic Process

Let N(A) = number of events observed in region A, and λ = apositive constant.

A homogenous spatial Poisson point process is defined by:

(a) N(A) ∼ Pois(λ|A|)(b) given N(A) = n, the locations of the events are uniformly

distributed over A.

λ is the intensity of the process (mean number of events expectedper unit area).

11 / 37





Is this CSR?

I Criteria (a) and (b) give a “recipe” for simulating realizationsof this process:

* Generate a Poisson random number of events.* Distribute that many events uniformly across the study area.runif(n,min(x),max(x))

runif(n,min(y),max(y))

12 / 37





Monte Carlo testing

Let T= a random variable representing a test statistic (somenumerical summary of the observed data).

What is the distribution of T under H0?

1. t1.

2. simulate t2, ..., tm under H0, these values will follow F0.

3. p.value = rank of t1m .

M.C. tests are useful in spatial statistics since we can simulatespatial patterns and calculate the statistics.

Example: e.g., 592 leukemia cases in ∼ 790 regions...

13 / 37





Moving beyond CSR

CSR:

1. is the “white noise” of spatial point processes.

2. characterizes the absence of structure (signal) in data.

3. often the null hypothesis in statistical tests to determine ifthere is clustering in an observed point pattern.

4. not as useful in public health? Why not?

14 / 37





Heterogeneous population density

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●●

●

●●

●

●

0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.2

0.4

0.6

0.8

1.0

u

v

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●●

●

●●

●

●

0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.2

0.4

0.6

0.8

1.0

u

v

15 / 37





Heterogeneous Poisson Process

What if λ, the intensity of the process (mean number of eventsexpected per unit area), varies by location?

1. N(A) = Pois(∫

(s)∈A λ(s)ds)

(|A| =∫(s)∈A ds)

2. Given N(A) = n, events distributed in A as an independentsample from a distribution on A with p.d.f. proportional toλ(s).

We still have counts from areas ∼ Poisson and events aredistributed proportional to the intensity.

16 / 37





Example intensity function

u

v

lambda

u

v

0 5 10 15 20

05

1015

20

17 / 37





Six realizations

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

● ●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

0 5 10 15 20

05

1015

20

u

v

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

● ●

● ●

●

●●

●

●

●

●

●

●

●

●

●●

●

●

●

●●

●

●

●

●

●

0 5 10 15 200

510

1520

u

v ●

●

●

● ●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

● ●

●

●

●

●

●

●

●

●

●

●

●

0 5 10 15 20

05

1015

20

u

v

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

● ●

●

●

●

●

● ●

●

●

●

●

0 5 10 15 20

05

1015

20

u

v

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

● ●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●

●

0 5 10 15 20

05

1015

20

u

v

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

● ●

●

●

●

● ●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

● ●

●

●

●

●

●

●

●●

●

●

● ●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

0 5 10 15 20

05

1015

20

u

v

18 / 37





IMPORTANT FACT!

Without additional information, no analysis can differentiatebetween:

1. independent events in a heterogeneous (non-stationary)environment

2. dependent events in a homogeneous (stationary) environment

19 / 37




How do we estimate intensities?

Kernel estimators provide a natural approach (Silverman (1986)and Wand and Jones (1995, KernSmooth R library)).

Main idea: Put a little “kernel” of density at each data point, thensum to give the estimate of the overall density function.

20 / 37




Kernels and bandwidths

s

0.0 0.2 0.4 0.6 0.8 1.0

Kernel variance = 0.02

s

0.0 0.2 0.4 0.6 0.8 1.0


s

0.0 0.2 0.4 0.6 0.8 1.0


s

0.0 0.2 0.4 0.6 0.8 1.0


21 / 37




Kernel estimation in R

base

I density() one-dimensional kernel

library(KernSmooth)

I bkde2D(x, bandwidth, gridsize=c(51, 51),

range.x=<<see below>>, truncate=TRUE) block kerneldensity estimation

library(splancs)

I kernel2d(pts,poly,h0,nx=20,

ny=20,kernel=’quartic’)

library(spatstat)

I ksmooth.ppp(x, sigma, weights, edge=TRUE)

22 / 37




Data Break: Early Medieval Grave Sites

I Alt and Vach (1991). (Data from Richard Wright EmeritusProfessor, School of Archaeology, University of Sydney.)

I Archeological dig in Neresheim, Baden-Wurttemberg,Germany.

I Question: are graves placed according to family units?

I 143 grave sites, 30 with missing or reduced wisdom teeth.

I Could intensity estimates for grave sites with and withoutwisdom teeth help answer this question?

23 / 37




Plot of the data

*

*

**

**

*

** * *

**

*

*

*

*

*

*

**

*

*

**

**

*

*

*

***

**

*

*

*

*

*

*

*

**

*

*

*

*

*

*

*

**

*

*

*

**

**

*

**

*

*

*

**

*

*

*

**

*

*

*

***

* *

**

*

*

*

*

**

**

*

**

*

*

*

**

*

*

*

*

**

*

*

*

*

*

*

**

** *

*

*

*

*

**

*

*

*

*

****

*

*

*

**

*

*

*

***

*

*

4000 6000 8000 10000

4000

6000

8000

10000

u

v

Grave locations (*=grave, O=affected)

OO

O

O

O

O

O

O

OO

OOO

O

O

OO

O

O

O

O

O

O

O

O

OO

OO

O

24 / 37




Case intensity

u

v

Intensity

Estimated intensity function

**

*

*

*

*

*

*

* *

***

*

*

**

**

*

*

*

**

*** **

*

4000 8000

4000

8000

u

v

Affected grave locations

25 / 37




Control intensity

u

v

Intensity

Estimated intensity function

*

*

**

*

** * *

**

*

*

*

*

*

**

*

*

**

**

*

*

***

***

*

**

**

*

*

*

*

*

*

**

*

*

*

*

*

**

*

*

**

*

*

*

*

*

** *

***

*

*

***

*

*

*

**

*

*

*

*

**

*

*

*

*

*

*

** *

*

*

*

**

*

*

**

****

*

*

**

*

*

*

*

4000 6000 8000 10000

4000

6000

8000

1000

0

u

v

Non−affected grave locations

26 / 37




What we have/don’t have

I Kernel estimates suggest where there might be differences.

I No significance testing (yet!)

27 / 37




K functionsMonte Carlo envelopes

First and Second Order Properties

I The intensity function describes the mean number of eventsper unit area, a first order property of the underlying process.

I What about second order properties relating to thevariance/covariance/correlation between event locations (ifevents non independent...)?

28 / 37





Ripley’s K function

Ripley (1976, 1977 introduced) the reduced second momentmeasure or K function

K (h) =E [# events within h of a randomly chosen event]

λ,

for any positive spatial lag h.

I Under CSR, K (h) = πh2 (area of circle of with radius h).

I Clustered? K (h) > πh2.

I Regular? K (h) < πh2.

29 / 37





Calculating K (h) in R

library(splancs)

I khat(pts,poly,s,newstyle=FALSE)

I poly defines polygon boundary (important!!!).

library(spatstat)

I Kest(X, r, correction=c("border", "isotropic",

"Ripley", "translate"))

I Boundary part of X (point process “object”).

30 / 37





Plots with K (h)

I Plotting (h,K (h)) for CSR is a parabola.

I K (h) = πh2 implies (K (h)

π

)1/2

= h.

I Besag (1977) suggests plotting

h versus L(h)

where

L(h) =

(Kec(h)

π

)1/2

− h

31 / 37





Monte Carlo Variability and Envelopes

I Observe K (h) from data.

I Simulate a realization of events from CSR.

I Find K (h) for the simulated data.

I Repeat simulations many times.

I Create simulation “envelopes” from simulation-based K (h)’s.

32 / 37





Example: Regular clusters and clusters of regularity

0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7

−0.

10.

00.

10.

20.

3

Distance (h)

sqrt

(Kha

t/pi)

− h

Estimated K function, regular pattern of clusters

0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7

−0.

10.

00.

10.

20.

3

Distance (h)

sqrt

(Kha

t/pi)

− h

Estimated K function, cluster of regular patterns

33 / 37





Data break: Medieval graves: K functions with polygonadjustment

0 1000 2000 3000 4000 5000

−40

040

0

Distance

Lhat

− d

ista

nce

L plot for all gravesites, polygon

0 1000 2000 3000 4000 5000

−40

040

0

Distance

Lhat

− d

ista

nce

L plot for affected gravesites, polygon

0 1000 2000 3000 4000 5000

−40

040

0

Distance

Lhat

− d

ista

nce

L plot for non−affected gravesites, polygon

34 / 37





Clustering?

I Clustering of cases at very shortest distances.

I Likely due to two coincident-pair sites (both cases in bothpairs).

I Envelopes based on random samples of 30 “cases” from set of143 locations.

35 / 37





Notes

I First and second moments do not uniquely define adistribution, and λ(s) and K (h) do not uniquely define aspatial point pattern (Baddeley and Silverman 1984, and inSection 5.3.4 ).

I Analyses based on λ(s) typically assume independent events.

I Analyses based on K (h) typically assume a stationary process(with constant λ).

I Remember IMPORTANT FACT! above.

36 / 37





What questions can we answer?

I Are events uniformly distributed in space?I Test CSR.

I If not, where are events more or less likely?I Intensity estimation.

I Do events tend to occur near other events, and, if so, at whatscale?I K functions with Monte Carlo envelopes.

37 / 37

MODULE 5: Spatial Statistics in Epidemiology and Public ...

Documents

Transcript of MODULE 5: Spatial Statistics in Epidemiology and Public ...