An Introduction to Spatial Statistics
Chunfeng Huang, Department of Statistics, Indiana University
Figure: Microwave Sounding Unit (MSU) anomalies (monthly), 1979-2006.
Figure: iron ore data (Cressie, 1986), raw percent data.
Figure: North Carolina sudden infant deaths, 1974-1978 and 1979-1984 (Cressie, 1993). SIDS (sudden infant death syndrome) in North Carolina; panels 1974 and 1979, longitude 84°W to 76°W, latitude 34°N to 36.5°N.
Figure: spatial point patterns cells, japanesepines and redwoodfull.
Three types of spatial data
- Geostatistics
  - Variogram
  - Kriging
- Lattice or areal data
  - Markov random field
  - Conditional autoregressive (CAR) model
- Spatial point pattern
  - Complete spatial randomness (CSR)
  - K function, L function
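Geostatistics: quantifying spatial dependence, and assumptions
- Second-order stationarity:
  $E(Z(s_i)) = \mu$, $\operatorname{cov}(Z(s_i), Z(s_j)) = C(s_i - s_j)$
- Isotropy: $\operatorname{cov}(Z(s_i), Z(s_j)) = C(\|s_i - s_j\|)$
- Intrinsic stationarity:
  $E(Z(s_i)) = \mu$, $\operatorname{var}(Z(s_i) - Z(s_j)) = 2\gamma(s_i - s_j)$
  $\gamma(\cdot)$: semivariogram; $2\gamma(\cdot)$: variogram.
- Second-order stationarity implies intrinsic stationarity: since $\operatorname{var}(Z(s_i) - Z(s_j)) = 2C(0) - 2C(s_i - s_j)$,
  $\gamma(s_i - s_j) = C(0) - C(s_i - s_j)$
- Intrinsic stationarity does not necessarily imply second-order stationarity. For example, one-dimensional Brownian motion started at 0 has $2\gamma(h) = |h|$, yet $\operatorname{var}(Z(s)) = s$ is not constant.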
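A minimal Python sketch of the identity $\gamma(h) = C(0) - C(h)$, assuming an exponential covariance $C(h) = \sigma^2 \exp(-h/\phi)$ with $\sigma^2 = \phi = 1$ (an illustrative choice; the function names are hypothetical). The semivariogram is 0 at $h = 0$ and rises toward $C(0)$, which plays the role of the sill.

```python
import math

def exp_cov(h, sigma2=1.0, phi=1.0):
    """Exponential covariance C(h) = sigma2 * exp(-h / phi); an illustrative model choice."""
    return sigma2 * math.exp(-h / phi)

def semivariogram_from_cov(cov, h):
    """Under second-order stationarity, gamma(h) = C(0) - C(h)."""
    return cov(0.0) - cov(h)

for h in [0.0, 0.5, 1.0, 3.0, 10.0]:
    g = semivariogram_from_cov(exp_cov, h)
    print(f"h = {h:4.1f}   C(h) = {exp_cov(h):.4f}   gamma(h) = {g:.4f}")
```

The converse direction does not hold in general: a valid semivariogram need not be bounded, so there may be no $C$ to recover.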
Figure: spherical semivariogram vgm(1,"Sph",1.5,nugget=0.2) (partial sill 1, range 1.5, nugget 0.2); semivariance versus distance.
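Variogram (semivariogram)
- Sill, range, nugget
- The variogram must be conditionally negative definite: $\sum_i \sum_j a_i a_j \gamma(s_i - s_j) \le 0$ whenever $\sum_i a_i = 0$. Otherwise the variance of some contrast $\sum_i a_i Z(s_i)$ would be negative.
- Parametric models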
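The sketch below, in plain Python, illustrates sill, range and nugget with a spherical model whose parameters mirror the panel `vgm(1,"Sph",1.5,nugget=0.2)` (partial sill 1, range 1.5, nugget 0.2), plus an exponential model. It also checks conditional negative definiteness through the identity $\operatorname{var}(\sum_i a_i Z(s_i)) = -\sum_i \sum_j a_i a_j \gamma(s_i - s_j)$ for $\sum_i a_i = 0$. The function names are hypothetical, and the formulas are the standard textbook forms rather than code taken from gstat.

```python
import math

def spherical(h, psill=1.0, rng=1.5, nugget=0.2):
    """Spherical semivariogram with a nugget (parameters mirror vgm(1, "Sph", 1.5, nugget = 0.2))."""
    if h == 0:
        return 0.0                       # gamma(0) = 0; the nugget is the jump just above 0
    if h >= rng:
        return nugget + psill            # sill = nugget + partial sill, reached at the range
    r = h / rng
    return nugget + psill * (1.5 * r - 0.5 * r ** 3)

def exponential(h, psill=1.0, rng=1.0, nugget=0.0):
    """Exponential semivariogram: approaches the sill only asymptotically."""
    if h == 0:
        return 0.0
    return nugget + psill * (1.0 - math.exp(-h / rng))

def contrast_variance(gamma, sites, weights):
    """Variance of sum_i a_i Z(s_i) when sum_i a_i = 0: -sum_i sum_j a_i a_j gamma(|s_i - s_j|)."""
    if abs(sum(weights)) > 1e-12:
        raise ValueError("weights must sum to zero")
    total = 0.0
    for si, ai in zip(sites, weights):
        for sj, aj in zip(sites, weights):
            total -= ai * aj * gamma(abs(si - sj))
    return total

sites, weights = [0.0, 1.0, 2.0], [1.0, -2.0, 1.0]
print(contrast_variance(spherical, sites, weights))         # valid model: nonnegative
print(contrast_variance(lambda h: h ** 3, sites, weights))  # prints -8.0: h^3 is not a valid variogram
```

The cubic $\gamma(h) = h^3$ grows too fast to be a semivariogram, and the contrast $Z(0) - 2Z(1) + Z(2)$ exposes this with a negative "variance".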
Figure: parametric semivariogram models from gstat's vgm() (partial sill 1), semivariance versus distance: Nug, Exp, Sph, Gau, Exc, Mat, Ste, Cir, Lin, Bes, Pen, Per, Hol, Log, Pow, Spl, Leg.
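Estimation of variogram
- Empirical variogram (method-of-moments estimator):
  $2\hat{\gamma}(h) = \frac{1}{|N(h)|} \sum_{N(h)} \{Z(s_i) - Z(s_j)\}^2$,
  where $N(h)$ is the set of pairs $(s_i, s_j)$ separated by distance $h$ and $|N(h)|$ is the number of such pairs.
- Tolerance regions: for irregularly spaced data few pairs are exactly $h$ apart, so pairs whose separation falls in a region around $h$ are pooled.
- Fitting parametric models (weighted least squares, MLE, REML)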
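A minimal sketch of the method-of-moments estimator, assuming isotropy and a simple tolerance rule in which a pair at distance $d$ is assigned to the nearest lag $k \cdot$ `bin_width` (the parameter name and binning rule are illustrative assumptions). It returns $\hat{\gamma}(h)$, half of $2\hat{\gamma}(h)$.

```python
import math
from collections import defaultdict

def empirical_variogram(coords, z, bin_width):
    """Method-of-moments semivariogram estimate with distance classes.

    A pair at distance d goes to class k = round(d / bin_width), i.e. pairs within about
    half a bin width of lag k * bin_width are pooled (an assumed tolerance rule).
    Returns {k * bin_width: (gamma_hat, n_pairs)}.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    n = len(z)
    for i in range(n):
        for j in range(i + 1, n):                # each unordered pair once
            d = math.dist(coords[i], coords[j])
            k = int(d / bin_width + 0.5)         # index of the nearest lag class
            sums[k] += (z[i] - z[j]) ** 2
            counts[k] += 1
    # gamma_hat = sum of squared differences / (2 |N(h)|)
    return {k * bin_width: (sums[k] / (2 * counts[k]), counts[k]) for k in sorted(counts)}

coords = [(float(x),) for x in range(5)]         # five equally spaced sites on a line
z = [1.0, 2.0, 4.0, 7.0, 11.0]                   # made-up values
for h, (g, npairs) in empirical_variogram(coords, z, bin_width=1.0).items():
    print(f"h = {h:.1f}  gamma_hat = {g:.3f}  pairs = {npairs}")
```

The pair counts shrink at large lags, which is one reason weighted least squares fits usually weight each lag by its number of pairs.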
Figure: two empirical variograms, semivariance versus distance.
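Kriging
Suppose we observe a spatial process $Z(s_1), Z(s_2), \ldots, Z(s_n)$. The best (in the sense of minimizing mean squared prediction error) linear unbiased predictor of $Z(s_0)$ is (Cressie, 1993)
$\hat{Z}(s_0) = \sum_{i=1}^{n} \lambda_i Z(s_i)$,
where
$(\lambda_1, \ldots, \lambda_n) = \left( \boldsymbol{\gamma} + \mathbf{1} \frac{1 - \mathbf{1}^T \Gamma^{-1} \boldsymbol{\gamma}}{\mathbf{1}^T \Gamma^{-1} \mathbf{1}} \right)^T \Gamma^{-1}$,
$\boldsymbol{\gamma} = (\gamma(s_0 - s_1), \ldots, \gamma(s_0 - s_n))^T$,
$\Gamma = \{\gamma(s_i - s_j)\}_{n \times n}$.
Kriging variance:
$\sigma_K^2(s_0) = \boldsymbol{\gamma}^T \Gamma^{-1} \boldsymbol{\gamma} - \frac{(\mathbf{1}^T \Gamma^{-1} \boldsymbol{\gamma} - 1)^2}{\mathbf{1}^T \Gamma^{-1} \mathbf{1}}$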
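The ordinary kriging equations can be sketched in plain Python with a small linear solver. The sketch assumes an exponential semivariogram $\gamma(h) = 1 - e^{-h}$ with no nugget and three made-up data points; `solve` and `ordinary_kriging` are hypothetical helper names. It computes $\lambda = \Gamma^{-1}(\boldsymbol{\gamma} - m\mathbf{1})$ with Lagrange multiplier $m = (\mathbf{1}^T\Gamma^{-1}\boldsymbol{\gamma} - 1)/(\mathbf{1}^T\Gamma^{-1}\mathbf{1})$, which is the weight formula rearranged, and the kriging variance $\boldsymbol{\gamma}^T\Gamma^{-1}\boldsymbol{\gamma} - (\mathbf{1}^T\Gamma^{-1}\boldsymbol{\gamma} - 1)^2/(\mathbf{1}^T\Gamma^{-1}\mathbf{1})$.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def ordinary_kriging(coords, z, s0, gamma):
    """Ordinary kriging predictor, kriging variance and weights from a semivariogram gamma(h)."""
    n = len(z)
    G = [[gamma(math.dist(coords[i], coords[j])) for j in range(n)] for i in range(n)]  # Gamma
    g = [gamma(math.dist(s0, c)) for c in coords]                                        # gamma vector
    Gi_g = solve(G, g)                 # Gamma^{-1} gamma
    Gi_1 = solve(G, [1.0] * n)         # Gamma^{-1} 1
    a, b = sum(Gi_g), sum(Gi_1)        # 1^T Gamma^{-1} gamma,  1^T Gamma^{-1} 1
    m = (a - 1.0) / b                  # Lagrange multiplier for sum(lambda) = 1
    lam = [u - m * v for u, v in zip(Gi_g, Gi_1)]
    pred = sum(l * zi for l, zi in zip(lam, z))
    var = sum(gi * u for gi, u in zip(g, Gi_g)) - (a - 1.0) ** 2 / b
    return pred, var, lam

def gamma(h):
    """Assumed exponential semivariogram with unit sill, unit range parameter and no nugget."""
    return 1.0 - math.exp(-h)

coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]   # made-up locations
z = [1.0, 3.0, 2.0]                               # made-up observations
pred, var, lam = ordinary_kriging(coords, z, (0.5, 0.5), gamma)
print(pred, var, sum(lam))                        # weights sum to 1 (unbiasedness)
```

Predicting at an observed site returns the observed value with zero kriging variance: without a nugget, kriging is an exact interpolator.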
Figure: kriged iron ore surface $Z$ and prediction MSE over $(x, y)$; iron ore % plotted against northing and against easting (o = median iron ore %, x = mean iron ore %).
Areal Data or Lattice Data
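Simultaneous autoregressive (SAR) model
$z(s_1) = b_{12} z(s_2) + \cdots + b_{1n} z(s_n) + \varepsilon_1$
$z(s_2) = b_{21} z(s_1) + \cdots + b_{2n} z(s_n) + \varepsilon_2$
$\vdots$
$z(s_n) = b_{n1} z(s_1) + \cdots + b_{n,n-1} z(s_{n-1}) + \varepsilon_n$
or, in matrix form, $z = Bz + \varepsilon$, where $B = \{b_{ij}\}$ has zero diagonal.
- Explanatory variables can be included, e.g. $z - X\beta = B(z - X\beta) + \varepsilon$.
- One can show that $\varepsilon$ is not independent of $z$.
- What if $z$ is discrete?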
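To see why $\varepsilon$ is not independent of $z$, write $z = (I - B)^{-1}\varepsilon$; with $\operatorname{var}(\varepsilon) = \sigma^2 I$, $\operatorname{cov}(\varepsilon, z) = \sigma^2 (I - B)^{-T}$, which generally has nonzero off-diagonal entries. The plain-Python sketch below computes this for a hypothetical chain of three areal units with $B = \rho W$, $W$ row-standardized and $\rho = 0.5$ (all assumptions made for illustration).

```python
def mat_inverse(A):
    """Gauss-Jordan inversion with partial pivoting (small dense matrices only)."""
    n = len(A)
    M = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [row[n:] for row in M]

def sar_covariances(W, rho, sigma2=1.0):
    """For z = B z + eps with B = rho * W and eps ~ N(0, sigma2 I): z = A eps, A = (I - B)^{-1}."""
    n = len(W)
    I_minus_B = [[(1.0 if i == j else 0.0) - rho * W[i][j] for j in range(n)] for i in range(n)]
    A = mat_inverse(I_minus_B)
    cov_z = [[sigma2 * sum(A[i][k] * A[j][k] for k in range(n)) for j in range(n)] for i in range(n)]
    cov_eps_z = [[sigma2 * A[j][i] for j in range(n)] for i in range(n)]  # entry (i, j) = cov(eps_i, z_j)
    return cov_z, cov_eps_z

# Three areal units in a row, row-standardized neighbor weights (hypothetical example)
W = [[0.0, 1.0, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 1.0, 0.0]]
cov_z, cov_eps_z = sar_covariances(W, rho=0.5)
print(cov_eps_z[0][1])   # cov(eps_1, z_2) is nonzero, so eps is not independent of z
```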
Conditional autoregressive (CAR) model
$z(s_i) \mid \{z(s_j) : j \ne i\} \sim N\left(\sum_{j=1}^{n} c_{ij} z(s_j), \tau_i^2\right)$,
where $c_{ij} \tau_j^2 = c_{ji} \tau_i^2$ and $c_{ii} = 0$.
If we further assume that the conditional dependence is only through the neighborhood $N(s_i)$ of $s_i$, then
$c_{ik} = 0, \quad k \notin N(s_i)$
(Markov random field (MRF)).
- Response variable in the exponential family: auto spatial models.
  - Normal response: auto-Gaussian model.
  - Binary response: autologistic model.
  - Poisson response: auto-Poisson model.
- A SAR model can be represented as a CAR model; the converse does not necessarily hold (see the example in Cressie, 1993).
- Markov random field theory guides the passage from the conditional distributions to a valid joint distribution.
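A minimal sketch of how the CAR conditionals determine a joint distribution: a Gibbs sampler that only ever draws from $z(s_i) \mid \{z(s_j) : j \ne i\}$. It assumes the zero-mean auto-Gaussian case with two neighboring sites, $c_{12} = c_{21} = 0.5$ and $\tau_1^2 = \tau_2^2 = 1$ (illustrative values), for which the implied joint covariance $(I - C)^{-1}M$ has variances $4/3$ and covariance $2/3$. Function names are hypothetical.

```python
import random

def car_gibbs(C, tau2, n_iter=20000, burn_in=1000, seed=1):
    """Gibbs sampler built only from the CAR full conditionals
    z_i | z_{-i} ~ N(sum_j c_ij z_j, tau2_i)  (zero-mean version)."""
    rng = random.Random(seed)
    n = len(C)
    z = [0.0] * n
    draws = []
    for it in range(n_iter):
        for i in range(n):
            # Only neighbors (c_ij != 0) contribute: the Markov random field property.
            mean = sum(C[i][j] * z[j] for j in range(n) if j != i and C[i][j] != 0.0)
            z[i] = rng.gauss(mean, tau2[i] ** 0.5)
        if it >= burn_in:
            draws.append(list(z))
    return draws

def sample_cov(draws, i, j):
    """Sample covariance of components i and j across the retained draws."""
    mi = sum(d[i] for d in draws) / len(draws)
    mj = sum(d[j] for d in draws) / len(draws)
    return sum((d[i] - mi) * (d[j] - mj) for d in draws) / (len(draws) - 1)

# Two neighboring sites; c_12 = c_21 = 0.5 with equal tau^2 satisfies c_ij tau_j^2 = c_ji tau_i^2
C = [[0.0, 0.5], [0.5, 0.0]]
draws = car_gibbs(C, [1.0, 1.0])
print(sample_cov(draws, 0, 0), sample_cov(draws, 0, 1))
```

The sample moments should land near $4/3$ and $2/3$, which is the joint distribution that MRF theory says these conditionals define.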
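Proximity matrix $W$.
Some choices for $W_{ij}$ ($W_{ii} = 0$):
- $W_{ij} = 1$ if units $i$ and $j$ share a common boundary.
- $W_{ij}$ is the inverse distance between units $i$ and $j$.
- $W_{ij} = 1$ if the distance between $i$ and $j$ is less than some fixed value.
- $W_{ij} = 1$ if $j$ is one of the $m$ nearest neighbors of $i$ (this $W$ need not be symmetric).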
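The distance-based choices of $W$ can be sketched directly from point coordinates (for areal units, e.g. their centroids, which is an assumption here); the shared-boundary rule needs polygon geometry and is omitted. The function names are hypothetical.

```python
import math

def inverse_distance_W(coords):
    """W_ij = 1 / d_ij for i != j, W_ii = 0."""
    n = len(coords)
    return [[0.0 if i == j else 1.0 / math.dist(coords[i], coords[j]) for j in range(n)]
            for i in range(n)]

def threshold_W(coords, d0):
    """W_ij = 1 if d_ij < d0 (and i != j), else 0."""
    n = len(coords)
    return [[1.0 if i != j and math.dist(coords[i], coords[j]) < d0 else 0.0 for j in range(n)]
            for i in range(n)]

def knn_W(coords, m):
    """W_ij = 1 if j is among the m nearest neighbors of i (ties broken by index)."""
    n = len(coords)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = sorted((math.dist(coords[i], coords[j]), j) for j in range(n) if j != i)
        for _, j in others[:m]:
            W[i][j] = 1.0
    return W

coords = [(0.0,), (1.0,), (3.0,)]
print(knn_W(coords, 1))   # the unit at x = 3 points to x = 1, but not vice versa
```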
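CAR model
$z \sim N(X\beta, (I - C)^{-1} M)$,
where $C = \{c_{ij}\}_{n \times n}$, $M = \tau^2 I$, and $C = \rho W$ (with $M = \tau^2 I$, the condition $c_{ij}\tau_j^2 = c_{ji}\tau_i^2$ requires $W$ to be symmetric).
$\rho = 0 \Rightarrow$ spatial independence.
Condition on $\rho$: $I - \rho W$ must be positive definite.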
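A small sketch of the condition on $\rho$: it tests whether $I - \rho W$ is positive definite by attempting a Cholesky factorization. For a symmetric $W$ with smallest and largest eigenvalues $e_{\min} < 0 < e_{\max}$, the condition is equivalent to $1/e_{\min} < \rho < 1/e_{\max}$; for the hypothetical three-site path below the eigenvalues are $\pm\sqrt{2}$ and $0$, so the admissible range is $|\rho| < 1/\sqrt{2} \approx 0.707$. Function names are hypothetical.

```python
import math

def is_positive_definite(A):
    """Return True if a Cholesky factorization of the symmetric matrix A succeeds."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False          # a non-positive pivot: not positive definite
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return True

def car_rho_ok(W, rho):
    """Check the CAR condition that I - rho * W is positive definite."""
    n = len(W)
    Q = [[(1.0 if i == j else 0.0) - rho * W[i][j] for j in range(n)] for i in range(n)]
    return is_positive_definite(Q)

W = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 1.0],
     [0.0, 1.0, 0.0]]
for rho in (0.5, 0.70, 0.71, -0.71):
    print(rho, car_rho_ok(W, rho))
```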
Spatial point process
A spatial point process is said to have the complete spatial randomness (CSR) property if it is a homogeneous Poisson point process:
- If $A_1, \ldots, A_r$ are disjoint, then $N(A_1), \ldots, N(A_r)$ are independent ($N(A)$: number of events in $A$).
- $N(A) \sim \text{Poisson}(\lambda |A|)$, where $|A|$ is the volume (area, in the plane) of $A$.
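A minimal sketch of simulating CSR on a rectangle, following the two properties above: draw $N(A) \sim \text{Poisson}(\lambda|A|)$, then place the events independently and uniformly (the standard conditional property of the homogeneous Poisson process). The Poisson draw uses Knuth's multiplication method, an assumed choice that is adequate for moderate $\lambda|A|$; function names are hypothetical.

```python
import math
import random

def poisson_count(mean, rng):
    """Poisson draw by Knuth's multiplication method (adequate for moderate means)."""
    limit = math.exp(-mean)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def simulate_csr(lam, rng, width=1.0, height=1.0):
    """Homogeneous Poisson process on [0, width] x [0, height]:
    N(A) ~ Poisson(lam * |A|), then the N(A) events are placed independently and uniformly."""
    n = poisson_count(lam * width * height, rng)
    return [(rng.uniform(0.0, width), rng.uniform(0.0, height)) for _ in range(n)]

rng = random.Random(0)
pts = simulate_csr(100.0, rng)
print(len(pts))
```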
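Testing for CSR
- $W$: distance from a randomly chosen event to its nearest neighboring event.
- $X$: distance from a randomly chosen point in the region to the nearest event.
- Compare the observed statistics with patterns simulated under CSR (Monte Carlo test).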
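One way to carry out the Monte Carlo test, sketched under stated assumptions: the statistic is the mean of $W$ over all events (rather than a single randomly chosen event), the study region is the unit square, CSR patterns are simulated conditionally on the observed number of events, and the p-value is one-sided for regularity. An $X$-based test would follow the same pattern with sampling points in place of events. Function names are hypothetical.

```python
import math
import random

def mean_nn_distance(points):
    """Average over events of W, the distance to the nearest other event."""
    total = 0.0
    for i, p in enumerate(points):
        total += min(math.dist(p, q) for j, q in enumerate(points) if j != i)
    return total / len(points)

def monte_carlo_csr_test(points, n_sims=99, seed=0):
    """One-sided Monte Carlo p-value for regularity on the unit square.

    A large mean nearest-neighbor distance suggests a regular (inhibited) pattern.
    CSR patterns are simulated with the same number of events as observed.
    """
    rng = random.Random(seed)
    n = len(points)
    t_obs = mean_nn_distance(points)
    t_sim = [mean_nn_distance([(rng.random(), rng.random()) for _ in range(n)])
             for _ in range(n_sims)]
    # rank of the observed statistic among the observed and simulated values
    return (1 + sum(t >= t_obs for t in t_sim)) / (n_sims + 1)

grid = [((i + 0.5) / 7, (j + 0.5) / 7) for i in range(7) for j in range(7)]
print(monte_carlo_csr_test(grid))   # small p-value: the grid is far more regular than CSR
```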
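First-order intensity function
$\lambda(x) = \lim_{|dx| \to 0} \left\{ \frac{E[N(dx)]}{|dx|} \right\}$
Second-order intensity function
$\lambda_2(x, y) = \lim_{|dx|, |dy| \to 0} \left\{ \frac{E[N(dx) N(dy)]}{|dx| \, |dy|} \right\}$
Stationarity: $\lambda(x) = \lambda$, $\lambda_2(x, y) = \lambda_2(x - y)$.
Isotropy: $\lambda_2(x - y) = \lambda_2(\|x - y\|)$.
Under CSR: $\lambda(x) = \lambda$, $\lambda_2(x, y) = \lambda^2$.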
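K function
$K(t) = \frac{1}{\lambda} E[N_0(t)]$,
where $N_0(t)$ is the number of further events within distance $t$ of an arbitrary event. For a stationary, isotropic process one can show that
$\lambda_2(t) = \frac{\lambda^2 K'(t)}{2\pi t}, \qquad \lambda^2 K(t) = 2\pi \int_0^t \lambda_2(y) \, y \, dy$.
Under CSR, $K(t) = \pi t^2$.
L function: $L(t) = \sqrt{K(t)/\pi} - t$, which equals 0 under CSR.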
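A sketch of estimating $K$ and $L$ on the unit square: $\hat{K}(t) = |A| \, \#\{(i, j) : i \ne j, \, d_{ij} \le t\} / \{n(n-1)\}$. To keep it short it uses toroidal (wrap-around) distances as the edge correction, a substitute for the edge corrections used in practice (e.g. Ripley's isotropic correction), valid for $t \le 0.5$. Under CSR, $\hat{K}(t)$ should be close to $\pi t^2$ and $\hat{L}(t)$ close to 0. Function names are hypothetical.

```python
import math
import random

def torus_dist(p, q):
    """Distance on the unit square with opposite edges identified (toroidal edge correction)."""
    dx = abs(p[0] - q[0])
    dy = abs(p[1] - q[1])
    return math.hypot(min(dx, 1.0 - dx), min(dy, 1.0 - dy))

def k_hat(points, t):
    """K estimate on the unit square (|A| = 1), meaningful for t <= 0.5."""
    n = len(points)
    close = 0
    for i in range(n):
        for j in range(i + 1, n):
            if torus_dist(points[i], points[j]) <= t:
                close += 2                   # ordered pairs (i, j) and (j, i)
    return close / (n * (n - 1))

def l_hat(points, t):
    """L(t) = sqrt(K(t) / pi) - t, which is about 0 under CSR."""
    return math.sqrt(k_hat(points, t) / math.pi) - t

rng = random.Random(42)
pts = [(rng.random(), rng.random()) for _ in range(400)]   # a CSR pattern with n fixed
for t in (0.05, 0.1, 0.2):
    print(f"t = {t}: K_hat = {k_hat(pts, t):.4f}, pi t^2 = {math.pi * t * t:.4f}, L_hat = {l_hat(pts, t):+.4f}")
```

Positive $\hat{L}(t)$ indicates clustering at scale $t$ and negative values indicate regularity; simulation envelopes turn this into a test.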
Figure: estimated K functions with simulation envelopes for the Japanese pines (Kenvjap) and redwood (Kenvred) patterns; K(r) versus r, showing Kobs(r), Ktheo(r), Khi(r) and Klo(r).
References
- Cressie, N. (1993), Statistics for Spatial Data, rev. ed., Wiley.
- Cressie, N. and Wikle, C. (2012), Statistics for Spatio-Temporal Data, Wiley.
- Diggle, P. J. (1983), Statistical Analysis of Spatial Point Patterns, Academic Press.
- Banerjee, S., Carlin, B. P. and Gelfand, A. E. (2004), Hierarchical Modeling and Analysis for Spatial Data, Chapman and Hall.
- Bivand, R., Pebesma, E. and Gómez-Rubio, V. (2008), Applied Spatial Data Analysis with R, Springer-Verlag.
- Stein, M. L. (1999), Interpolation of Spatial Data: Some Theory for Kriging, Springer.
- Cressie, N. (1986), Kriging nonstationary data, Journal of the American Statistical Association, 81, 625-634.