MODULE 5: Spatial Statistics in Epidemiology and Public Health
Lecture 3: Point Processes
Jon Wakefield and Lance Waller
Outline
Preliminaries
Random patterns
  Heterogeneous Poisson process
Estimating intensities
Second order properties
  K functions
  Monte Carlo envelopes
References
* Baddeley, A., Rubak, E., and Turner, R. (2015) Spatial Point Patterns: Methodology and Applications in R. Boca Raton, FL: CRC/Chapman & Hall.
* Diggle, P.J. (1983) Statistical Analysis of Spatial Point Patterns. London: Academic Press.
* Diggle, P.J. (2013) Statistical Analysis of Spatial and Spatio-Temporal Point Patterns, Third Edition. CRC/Chapman & Hall.
* Waller and Gotway (2004, Chapter 5) Applied Spatial Statistics for Public Health Data. New York: Wiley.
* Møller, J. and Waagepetersen (2004) Statistical Inference and Simulation for Spatial Point Processes. Boca Raton, FL: CRC/Chapman & Hall.
Goals
* Describe basic types of spatial point patterns.
* Introduce mathematical models for random patterns of events.
* Introduce analytic methods for describing patterns in observed collections of events.
Terminology
* Realization: An observed set of event locations (a data set).
* Point: Where an event could occur.
* Event: Where an event did occur.
Complete Spatial Randomness (CSR)
* Start with a model of “lack of pattern”.
* Events are equally likely to occur anywhere in the study area (uniform distribution).
* Event locations are independent of each other.
Six realizations of CSR
[Figure: six simulated realizations of CSR on the unit square; axes u and v, each from 0.0 to 1.0.]
CSR as a boundary condition
CSR serves as a boundary between:
* Patterns that are more “clustered” than CSR.
* Patterns that are more “regular” than CSR.
Too Clustered (top), Too Regular (bottom)
[Figure: six simulated patterns on the unit square (axes u and v, each from 0.0 to 1.0); three panels labeled "Clustered" (top) and three labeled "Regular" (bottom).]
Spatial Point Processes
* Mathematically, we treat our point patterns as realizations of a spatial stochastic process.
* A stochastic process is a collection of random variables $X_1, X_2, \ldots, X_N$.
* Example: the number of people in line at a grocery store.
* For us, each random variable represents an event location.
CSR as a Stochastic Process
Let N(A) = number of events observed in region A, and λ = a positive constant.
A homogeneous spatial Poisson point process is defined by:
(a) N(A) ∼ Pois(λ|A|), where |A| is the area of A;
(b) given N(A) = n, the locations of the events are uniformly distributed over A.
λ is the intensity of the process (the mean number of events expected per unit area).
Is this CSR?
* Criteria (a) and (b) give a “recipe” for simulating realizations of this process:
  * Generate a Poisson random number of events, n.
  * Distribute that many events uniformly across the study area. For a rectangular area spanning the observed x and y ranges, in R:
    runif(n, min(x), max(x))
    runif(n, min(y), max(y))
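The two-step recipe above can be sketched in base R as follows. This is a minimal illustration, assuming a rectangular study area [0, a] x [0, b]; the function name simulate_csr and the values of lambda, a, and b are illustrative choices, not taken from the slides.

```r
# Simulate one realization of CSR on the rectangle [0, a] x [0, b].
# simulate_csr, lambda, a, and b are hypothetical names/values for illustration.
simulate_csr <- function(lambda, a = 1, b = 1) {
  area <- a * b
  n <- rpois(1, lambda * area)       # (a) Poisson number of events
  u <- runif(n, min = 0, max = a)    # (b) uniform locations, given n
  v <- runif(n, min = 0, max = b)
  data.frame(u = u, v = v)
}

set.seed(1)
pattern <- simulate_csr(lambda = 30)
```

Calling plot(pattern$u, pattern$v) on several such realizations should give pictures similar in spirit to the "Six realizations of CSR" panels, including the apparent clumps and gaps that arise by chance.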
Monte Carlo testing
Let T = a random variable representing a test statistic (some numerical summary of the observed data).
What is the distribution of T under H0?
1. Calculate t1 from the observed data.
2. Simulate t2, ..., tm under H0; these values follow F0, the distribution of T under H0.
3. p-value = (rank of t1 among t1, ..., tm) / m, ranking from the most extreme value.
Monte Carlo tests are useful in spatial statistics since we can simulate spatial patterns and calculate the statistics.
Example: 592 leukemia cases in ∼790 regions...
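The three steps above can be sketched in base R. This is only an illustration under stated assumptions: the test statistic (mean nearest-neighbour distance), the clumped "observed" data, and m = 1000 are all hypothetical choices, and H0 is taken to be CSR on the unit square conditional on the number of events.

```r
# Test statistic (an illustrative choice): mean nearest-neighbour distance.
mean_nn_dist <- function(u, v) {
  d <- as.matrix(dist(cbind(u, v)))
  diag(d) <- Inf                     # ignore each event's distance to itself
  mean(apply(d, 1, min))
}

set.seed(42)
n <- 40
# Hypothetical "observed" data: events clumped near the centre of the unit square.
obs_u <- pmin(pmax(rnorm(n, 0.5, 0.08), 0), 1)
obs_v <- pmin(pmax(rnorm(n, 0.5, 0.08), 0), 1)

m <- 1000
t_all <- numeric(m)
t_all[1] <- mean_nn_dist(obs_u, obs_v)          # step 1: observed t1
for (i in 2:m) {                                # step 2: t2, ..., tm under H0
  t_all[i] <- mean_nn_dist(runif(n), runif(n))
}
# Step 3: small distances are the extreme direction for clustering,
# so t1 is ranked from the smallest value.
p_value <- sum(t_all <= t_all[1]) / m
```

Because t1 is included among the m values, the smallest attainable p-value is 1/m, which is one reason to choose m large enough for the significance level of interest.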
Moving beyond CSR
CSR:
1. is the “white noise” of spatial point processes.
2. characterizes the absence of structure (signal) in data.
3. is often the null hypothesis in statistical tests to determine if there is clustering in an observed point pattern.
4. is not as useful in public health? Why not?
Heterogeneous population density
[Figure: two panels showing a point pattern on the unit square; axes u and v, each from 0.0 to 1.0.]
Heterogeneous Poisson Process
What if λ, the intensity of the process (mean number of events expected per unit area), varies by location?
1. $N(A) \sim \text{Pois}\left(\int_{s \in A} \lambda(s)\, ds\right)$, where $|A| = \int_{s \in A} ds$.
2. Given N(A) = n, events are distributed in A as an independent sample from a distribution on A with p.d.f. proportional to λ(s).
We still have Poisson counts from areas, and events are distributed in proportion to the intensity.
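The two defining properties above again suggest a simulation recipe, sketched here in base R. The intensity lambda_fun(u, v) = 200 * u on the unit square is a hypothetical example, and rejection sampling is one convenient (assumed, not prescribed by the slides) way to draw locations with p.d.f. proportional to λ(s).

```r
# Hypothetical intensity on the unit square: increases from left to right.
lambda_fun <- function(u, v) 200 * u
lambda_max <- 200                      # upper bound on lambda_fun over the square

# Step 1: N(A) ~ Pois(integral of lambda over A), integral via a midpoint rule.
g <- (seq_len(200) - 0.5) / 200
mu <- mean(outer(g, g, lambda_fun))    # area of A is 1, so mean = integral

set.seed(7)
n <- rpois(1, mu)

# Step 2: given n, sample locations with density proportional to lambda
# by rejection sampling from uniform candidates.
u <- numeric(0)
v <- numeric(0)
while (length(u) < n) {
  cand_u <- runif(n)
  cand_v <- runif(n)
  keep <- runif(n) < lambda_fun(cand_u, cand_v) / lambda_max
  u <- c(u, cand_u[keep])
  v <- c(v, cand_v[keep])
}
u <- u[seq_len(n)]
v <- v[seq_len(n)]
```

With this intensity, events should be visibly denser toward u = 1 even though every event location is independent of the others.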
Example intensity function
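[Figure: example intensity function in two panels: a surface plot with axes u, v, and lambda, and a second panel with axes u and v, each from 0 to 20.]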
Six realizations
[Figure: six simulated point patterns; axes u and v, each from 0 to 20.]
IMPORTANT FACT!
Without additional information, no analysis can differentiate between:
1. independent events in a heterogeneous (non-stationary) environment
2. dependent events in a homogeneous (stationary) environment
How do we estimate intensities?
Kernel estimators provide a natural approach (Silverman (1986) and Wand and Jones (1995, KernSmooth R library)).
Main idea: Put a little “kernel” of density at each data point, then sum to give the estimate of the overall density function.
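The main idea can be written out directly in one dimension. The sketch below is a minimal illustration, assuming a Gaussian kernel; the data x, the function name kernel_density, and the two bandwidths are hypothetical. Note that bw here is the kernel standard deviation, and the kernels are averaged (sum divided by n) so that the result is a density.

```r
set.seed(3)
# Hypothetical one-dimensional data on [0, 1] with two groups of events.
x <- c(rnorm(30, 0.3, 0.05), rnorm(20, 0.7, 0.1))
s <- seq(0, 1, length.out = 201)

kernel_density <- function(s, x, bw) {
  # At each location s0, average the Gaussian kernels centred at the data points.
  sapply(s, function(s0) mean(dnorm(s0, mean = x, sd = bw)))
}

f_narrow <- kernel_density(s, x, bw = 0.02)   # small bandwidth: bumpy estimate
f_wide   <- kernel_density(s, x, bw = 0.10)   # large bandwidth: smooth estimate
```

Base R's density(x, bw = 0.1) computes essentially the same Gaussian-kernel estimate on its own grid; the hand-written version is only meant to make the "sum of little kernels" explicit.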
Kernels and bandwidths
[Figure: four panels of one-dimensional kernel estimates over s from 0.0 to 1.0, with kernel variance = 0.02, 0.03, 0.04, and 0.1.]
Kernel estimation in R
base
* density(): one-dimensional kernel density estimation
library(KernSmooth)
* bkde2D(x, bandwidth, gridsize=c(51, 51), range.x=<<see below>>, truncate=TRUE): binned kernel density estimation
library(splancs)
* kernel2d(pts, poly, h0, nx=20, ny=20, kernel='quartic')
library(spatstat)
* ksmooth.ppp(x, sigma, weights, edge=TRUE) (named density.ppp in later spatstat releases)
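For readers without these packages installed, a two-dimensional kernel intensity estimate can be sketched in base R. This is a simplified illustration, not the algorithm used by any of the functions listed above: it assumes a product Gaussian kernel with a single bandwidth bw (a standard deviation), applies no edge correction, and uses hypothetical data and the hypothetical function name kernel_intensity.

```r
kernel_intensity <- function(u, v, grid_u, grid_v, bw) {
  # One-dimensional Gaussian kernel values: rows = grid points, columns = events.
  ku <- outer(grid_u, u, function(g, x) dnorm(g, mean = x, sd = bw))
  kv <- outer(grid_v, v, function(g, x) dnorm(g, mean = x, sd = bw))
  # Entry [i, j] sums the product kernel over events: an estimate of the
  # number of events per unit area at (grid_u[i], grid_v[j]).
  ku %*% t(kv)
}

set.seed(5)
# Hypothetical events, kept away from the edges of the unit square.
u <- runif(50, 0.3, 0.7)
v <- runif(50, 0.3, 0.7)
grid_u <- seq(0, 1, length.out = 101)
grid_v <- seq(0, 1, length.out = 101)

lam_hat <- kernel_intensity(u, v, grid_u, grid_v, bw = 0.05)

# Integrating the estimated intensity should roughly recover the event count.
cell <- (grid_u[2] - grid_u[1]) * (grid_v[2] - grid_v[1])
total <- sum(lam_hat) * cell
```

Summing the kernels (rather than averaging them) is what turns a density estimate into an intensity estimate; image(grid_u, grid_v, lam_hat) gives a quick look at the surface.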
Data Break: Early Medieval Grave Sites
* Alt and Vach (1991). (Data from Richard Wright, Emeritus Professor, School of Archaeology, University of Sydney.)
* Archaeological dig in Neresheim, Baden-Württemberg, Germany.
* Question: are graves placed according to family units?
* 143 grave sites, 30 with missing or reduced wisdom teeth.
* Could intensity estimates for grave sites with and without wisdom teeth help answer this question?
Plot of the data
[Figure: grave locations (* = grave, O = affected); axes u and v, each from 4000 to 10000.]
Case intensity
[Figure: two panels: "Estimated intensity function" (axes u, v, and Intensity) and "Affected grave locations" (axes u and v).]
Control intensity
[Figure: two panels: "Estimated intensity function" (axes u, v, and Intensity) and "Non-affected grave locations" (axes u and v, each from 4000 to 10000).]
What we have/don’t have
* Kernel estimates suggest where there might be differences.
* No significance testing (yet!)
First and Second Order Properties
* The intensity function describes the mean number of events per unit area, a first order property of the underlying process.
* What about second order properties relating to the variance/covariance/correlation between event locations (if events are not independent...)?
Ripley’s K function
Ripley (1976, 1977) introduced the reduced second moment measure, or K function,
$$K(h) = \frac{E[\#\text{ events within } h \text{ of a randomly chosen event}]}{\lambda},$$
for any positive spatial lag h.
* Under CSR, $K(h) = \pi h^2$ (the area of a circle with radius h).
* Clustered? $K(h) > \pi h^2$.
* Regular? $K(h) < \pi h^2$.
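The definition above translates into a simple empirical estimate: count the other events within distance h of each event, average, and divide by the estimated intensity. The base R sketch below is a naive version under stated assumptions: it applies no edge correction (so it tends to underestimate K near the boundary), it normalizes by n^2 rather than n(n - 1), and the function name k_naive and the simulated CSR data are hypothetical.

```r
k_naive <- function(u, v, h, area) {
  n <- length(u)
  d <- as.matrix(dist(cbind(u, v)))
  diag(d) <- Inf                       # an event is not its own neighbour
  lambda_hat <- n / area               # estimated intensity
  sapply(h, function(h0) {
    # Mean number of other events within h0 of an event, divided by intensity.
    mean(rowSums(d <= h0)) / lambda_hat
  })
}

set.seed(11)
n <- 200
u <- runif(n)                          # hypothetical CSR pattern on the unit square
v <- runif(n)
h <- seq(0.02, 0.15, by = 0.01)

k_hat <- k_naive(u, v, h, area = 1)
k_csr <- pi * h^2                      # theoretical value under CSR
```

Comparing k_hat with k_csr for this simulated CSR pattern gives a feel for both sampling variability and the downward bias from ignoring edge effects.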
Calculating K (h) in R
library(splancs)
* khat(pts, poly, s, newstyle=FALSE)
* poly defines the polygon boundary (important!!!).
library(spatstat)
* Kest(X, r, correction=c("border", "isotropic", "Ripley", "translate"))
* The boundary is part of X (the point process “object”).
Plots with K (h)
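* For CSR, the plot of (h, K(h)) is a parabola.
* $K(h) = \pi h^2$ implies $\left(K(h)/\pi\right)^{1/2} = h$.
* Besag (1977) suggests plotting h versus L(h), where
$$L(h) = \left(\frac{\hat{K}_{ec}(h)}{\pi}\right)^{1/2} - h,$$
  which should be near 0 for all h under CSR.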
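A short numerical check of the transformation above, in base R. The function name l_fun and the "clustered" K values (1.5 times the CSR parabola) are hypothetical and serve only to show how the transformation straightens the reference curve.

```r
# L transformation: sqrt(K / pi) - h.
l_fun <- function(k, h) sqrt(k / pi) - h

h <- seq(0, 0.5, by = 0.05)

k_csr <- pi * h^2                 # K under CSR
l_csr <- l_fun(k_csr, h)          # zero for every h, up to rounding

k_clustered <- 1.5 * pi * h^2     # hypothetical K lying above the CSR parabola
l_clustered <- l_fun(k_clustered, h)
```

On the L scale the CSR reference is a horizontal line at zero, so departures (positive for clustering, negative for regularity) are easier to see than departures from a parabola.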
Monte Carlo Variability and Envelopes
* Observe K(h) from data.
* Simulate a realization of events from CSR.
* Find K(h) for the simulated data.
* Repeat simulations many times.
* Create simulation “envelopes” from simulation-based K(h)’s.
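The steps above can be sketched in base R as follows. This is an illustration under stated assumptions: the K estimate is a naive one without edge correction, results are shown on the L scale, simulations are CSR on the unit square conditional on n, the envelopes are pointwise minima and maxima over 99 simulations, and the clustered "observed" pattern is hypothetical.

```r
# Naive L estimate (no edge correction): sqrt(Khat / pi) - h.
l_naive <- function(u, v, h, area = 1) {
  n <- length(u)
  d <- as.matrix(dist(cbind(u, v)))
  diag(d) <- Inf
  k <- sapply(h, function(h0) sum(d <= h0)) * area / n^2
  sqrt(k / pi) - h
}

set.seed(2024)
n <- 100
h <- seq(0.01, 0.2, by = 0.01)

# Hypothetical observed data: events scattered tightly around 5 cluster centres,
# wrapped (modulo 1) so that all events stay inside the unit square.
centre_u <- runif(5)
centre_v <- runif(5)
idx <- sample(5, n, replace = TRUE)
obs_u <- (centre_u[idx] + rnorm(n, 0, 0.03)) %% 1
obs_v <- (centre_v[idx] + rnorm(n, 0, 0.03)) %% 1

l_obs <- l_naive(obs_u, obs_v, h)                            # observed curve
n_sim <- 99
l_sim <- replicate(n_sim, l_naive(runif(n), runif(n), h))    # rows = h, cols = sims

env_lo <- apply(l_sim, 1, min)                               # pointwise envelopes
env_hi <- apply(l_sim, 1, max)
outside <- l_obs > env_hi | l_obs < env_lo
```

Distances h at which the observed curve leaves the envelope (outside is TRUE) are those where the pattern looks inconsistent with CSR; because the simulated patterns are treated the same way as the data, the edge bias of the naive estimate affects both alike.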
Example: Regular clusters and clusters of regularity
[Figure: two panels plotting sqrt(Khat/pi) − h against distance h (0.0 to 0.7): "Estimated K function, regular pattern of clusters" and "Estimated K function, cluster of regular patterns".]
Data break: Medieval graves: K functions with polygon adjustment
[Figure: three panels plotting Lhat − distance against distance (0 to 5000): "L plot for all gravesites, polygon", "L plot for affected gravesites, polygon", and "L plot for non-affected gravesites, polygon".]
Clustering?
* Clustering of cases at very shortest distances.
* Likely due to two coincident-pair sites (both cases in both pairs).
* Envelopes based on random samples of 30 “cases” from set of 143 locations.
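The envelope construction described in the last bullet (random samples of 30 "cases" from the 143 locations) can be sketched in base R. The grave data are not reproduced here, so the locations and case labels below are simulated stand-ins, clearly hypothetical; the L estimate is a naive one without the polygon edge adjustment used on the slides, and the bounding rectangle is used for the area.

```r
# Naive L estimate (no edge correction): sqrt(Khat / pi) - h.
l_naive <- function(u, v, h, area) {
  n <- length(u)
  d <- as.matrix(dist(cbind(u, v)))
  diag(d) <- Inf
  k <- sapply(h, function(h0) sum(d <= h0)) * area / n^2
  sqrt(k / pi) - h
}

set.seed(5)
n_all <- 143
n_case <- 30
# Hypothetical stand-ins for the grave locations and the affected labels.
u <- runif(n_all, 4000, 10000)
v <- runif(n_all, 4000, 10000)
area <- 6000^2
case_idx <- sample(n_all, n_case)

h <- seq(250, 2500, by = 250)
l_case <- l_naive(u[case_idx], v[case_idx], h, area)

# Each simulation relabels 30 of the 143 fixed locations as "cases".
l_sim <- replicate(99, {
  idx <- sample(n_all, n_case)
  l_naive(u[idx], v[idx], h, area)
})
env_lo <- apply(l_sim, 1, min)
env_hi <- apply(l_sim, 1, max)
```

Because the simulated "cases" are drawn from the observed locations rather than from CSR, the envelopes reflect the overall layout of the grave sites, and a case curve outside them points to clustering of cases beyond that of the locations themselves.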
Notes
* First and second moments do not uniquely define a distribution, and λ(s) and K(h) do not uniquely define a spatial point pattern (Baddeley and Silverman 1984, and in Section 5.3.4).
* Analyses based on λ(s) typically assume independent events.
* Analyses based on K(h) typically assume a stationary process (with constant λ).
* Remember IMPORTANT FACT! above.
What questions can we answer?
* Are events uniformly distributed in space?
  * Test CSR.
* If not, where are events more or less likely?
  * Intensity estimation.
* Do events tend to occur near other events, and, if so, at what scale?
  * K functions with Monte Carlo envelopes.