An Introduction to Spatial Statistics · Hierarchical Modeling and Analysis for Spatial Data,...

An Introduction to Spatial Statistics Chunfeng Huang Department of Statistics, Indiana University

An Introduction to Spatial Statistics

Chunfeng Huang, Department of Statistics, Indiana University

Microwave Sounding Unit (MSU) Anomalies (Monthly): 1979-2006.

Iron Ore (Cressie, 1986)

[Figure: Raw percent data.]

North Carolina sudden infant deaths, 1974-1978 and 1979-1984 (Cressie, 1993)

[Figure: SIDS (sudden infant death syndrome) in North Carolina. Map panels labeled 1974 and 1979; latitude 34°N to 36.5°N, longitude 84°W to 76°W; color scale 0 to 60.]

[Figure: three point patterns, with panels cells, japanesepines, and redwoodfull.]

Three types of spatial data

- Geostatistics
  - Variogram
  - Kriging

- Lattice or areal data
  - Markov random field
  - Conditional autoregressive (CAR) model

- Spatial point pattern
  - Complete spatial randomness (CSR)
  - K function, L function

Geostatistics: Quantifying spatial dependence and assumptions

- Second-order stationarity:

$$E(Z(s_i)) = \mu, \qquad \mathrm{cov}(Z(s_i), Z(s_j)) = C(s_i - s_j)$$

- Isotropy:

$$\mathrm{cov}(Z(s_i), Z(s_j)) = C(\|s_i - s_j\|)$$

- Intrinsic stationarity:

$$E(Z(s_i)) = \mu, \qquad \mathrm{var}(Z(s_i) - Z(s_j)) = 2\gamma(s_i - s_j)$$

Here $\gamma(\cdot)$ is the semi-variogram and $2\gamma(\cdot)$ is the variogram.

- Second-order stationarity implies intrinsic stationarity:

$$\gamma(s_i - s_j) = C(0) - C(s_i - s_j)$$

- Intrinsic stationarity does not necessarily imply second-order stationarity.
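The relation between the semi-variogram and the covariance function follows in one step from the definitions. A short derivation, assuming second-order stationarity with covariance function $C$ (so that $\mathrm{var}(Z(s)) = C(0)$ at every location):

```latex
\begin{align*}
2\gamma(s_i - s_j) &= \mathrm{var}\bigl(Z(s_i) - Z(s_j)\bigr) \\
  &= \mathrm{var}(Z(s_i)) + \mathrm{var}(Z(s_j)) - 2\,\mathrm{cov}\bigl(Z(s_i), Z(s_j)\bigr) \\
  &= 2C(0) - 2C(s_i - s_j).
\end{align*}
```

A standard counterexample for the converse is one-dimensional Brownian motion: its increments satisfy $\mathrm{var}(Z(s) - Z(t)) = |s - t|$, so a variogram exists, but $\mathrm{var}(Z(s)) = s$ depends on location, so no stationary covariance function exists.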

[Figure: semivariance (0.0 to 1.2) against distance (0.0 to 3.0) for the model vgm(1,"Sph",1.5,nugget=0.2).]
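The label `vgm(1,"Sph",1.5,nugget=0.2)` is R (gstat) notation for a spherical model with partial sill 1, range 1.5, and nugget 0.2. As a minimal sketch of that curve, the spherical semi-variogram can be written in plain Python. The function name `spherical_semivariogram` and its argument names are illustrative choices, not part of any library.

```python
def spherical_semivariogram(h, psill=1.0, rng=1.5, nugget=0.2):
    """Spherical semi-variogram gamma(h) for a distance h >= 0.

    Assumed parameterization:
      gamma(0) = 0,
      gamma(h) = nugget + psill * (1.5*(h/rng) - 0.5*(h/rng)**3) for 0 < h < rng,
      gamma(h) = nugget + psill (the sill) for h >= rng.
    """
    if h < 0:
        raise ValueError("distance must be non-negative")
    if h == 0:
        return 0.0
    if h >= rng:
        return nugget + psill
    r = h / rng
    return nugget + psill * (1.5 * r - 0.5 * r ** 3)


# Tabulate the curve at a few distances.
for h in (0.0, 0.5, 1.0, 1.5, 3.0):
    print(f"h = {h:3.1f}  gamma = {spherical_semivariogram(h):.4f}")
```

The jump from 0 at `h = 0` to just above the nugget for small positive `h`, and the plateau at the sill beyond the range, are the three features named on the next slide.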

Variogram (Semi-variogram)

- Sill, range, nugget.

- Conditionally negative definite. (If this condition is not satisfied, computed variances may be negative.)

- Parametric models.

[Figure: semivariance (0 to 3) against distance (0.0 to 3.0), one panel per parametric model: vgm(1,"Nug",0), vgm(1,"Exp",1), vgm(1,"Sph",1), vgm(1,"Gau",1), vgm(1,"Exc",1), vgm(1,"Mat",1), vgm(1,"Ste",1), vgm(1,"Cir",1), vgm(1,"Lin",0), vgm(1,"Bes",1), vgm(1,"Pen",1), vgm(1,"Per",1), vgm(1,"Hol",1), vgm(1,"Log",1), vgm(1,"Pow",1), vgm(1,"Spl",1), vgm(1,"Leg",1).]

Estimation of variogram

- Empirical variogram (method-of-moments estimator):

$$2\hat{\gamma}(h) = \frac{1}{|N(h)|} \sum_{N(h)} \{Z(s_i) - Z(s_j)\}^2,$$

where $N(h)$ is the set of pairs of locations separated by distance $h$, and $|N(h)|$ is the number of such pairs.

- Tolerance regions.

- Fitting parametric models (weighted least squares, MLE, REML).
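The method-of-moments estimator with a simple tolerance region can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: the function name `empirical_semivariogram`, the parameter `lag_width`, and the rule of assigning each pair to the nearest lag are all choices made for this example.

```python
import math
from collections import defaultdict


def empirical_semivariogram(sites, values, lag_width):
    """Method-of-moments estimate of gamma at lags k * lag_width.

    A pair (i, j) is assigned to lag k when its distance is nearest to
    k * lag_width (a simple tolerance region of half-width lag_width / 2).
    Returns {k: (mean pair distance, gamma_hat, number of pairs)}.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # [sum of distances, sum of squared differences, count]
    n = len(sites)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(sites[i], sites[j])
            k = int(d / lag_width + 0.5)  # index of the nearest lag
            acc = sums[k]
            acc[0] += d
            acc[1] += (values[i] - values[j]) ** 2
            acc[2] += 1
    # gamma_hat is half the mean squared difference within each lag class.
    return {k: (sd / m, 0.5 * ss / m, m) for k, (sd, ss, m) in sorted(sums.items())}


# Toy data: four sites on a line with a linear trend in the values.
sites = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
values = [0.0, 1.0, 2.0, 3.0]
for k, (dist, gamma_hat, npairs) in empirical_semivariogram(sites, values, 1.0).items():
    print(k, dist, gamma_hat, npairs)
```

On this toy data the estimate keeps growing with distance, which is what an unremoved trend looks like in an empirical variogram.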

[Figure: Empirical variogram; semivariance (0 to 10) against distance (0 to 8).]

[Figure: Empirical variogram; semivariance (0 to 10) against distance (0 to 8).]

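Kriging

Suppose we observe a spatial process $Z(s_1), Z(s_2), \ldots, Z(s_n)$. The best (in terms of minimizing mean squared prediction error) linear unbiased predictor of $Z(s_0)$ is (Cressie, 1993)

$$\hat{Z}(s_0) = \sum_{i=1}^{n} \lambda_i Z(s_i),$$

where

$$(\lambda_1, \ldots, \lambda_n) = \left( \boldsymbol{\gamma} + \mathbf{1}\, \frac{1 - \mathbf{1}^T \Gamma^{-1} \boldsymbol{\gamma}}{\mathbf{1}^T \Gamma^{-1} \mathbf{1}} \right)^T \Gamma^{-1},$$

$$\boldsymbol{\gamma} = (\gamma(s_0 - s_1), \ldots, \gamma(s_0 - s_n))^T, \qquad \Gamma = \{\gamma(s_i - s_j)\}_{n \times n}.$$

Kriging variance:

$$\sigma_K^2(s_0) = \boldsymbol{\gamma}^T \Gamma^{-1} \boldsymbol{\gamma} - \frac{(\mathbf{1}^T \Gamma^{-1} \boldsymbol{\gamma} - 1)^2}{\mathbf{1}^T \Gamma^{-1} \mathbf{1}}$$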
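To make the weight formula concrete, here is a minimal sketch of ordinary kriging in plain Python. Everything named here is an assumption for illustration: the helper `solve` (Gaussian elimination), the spherical semi-variogram with partial sill 1, range 1.5 and nugget 0.2, and the four toy sites. The variance line uses the ordinary kriging variance as given in Cressie (1993), $\boldsymbol{\gamma}^T \Gamma^{-1} \boldsymbol{\gamma} - (\mathbf{1}^T \Gamma^{-1} \boldsymbol{\gamma} - 1)^2 / (\mathbf{1}^T \Gamma^{-1} \mathbf{1})$.

```python
import math


def spherical(h, psill=1.0, rng=1.5, nugget=0.2):
    """Spherical semi-variogram with gamma(0) = 0 (assumed model for this sketch)."""
    if h == 0:
        return 0.0
    if h >= rng:
        return nugget + psill
    r = h / rng
    return nugget + psill * (1.5 * r - 0.5 * r ** 3)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def ordinary_kriging(sites, values, s0, gamma):
    """Return (weights, prediction, kriging variance) at location s0."""
    n = len(sites)
    G = [[gamma(math.dist(si, sj)) for sj in sites] for si in sites]  # Gamma matrix
    g = [gamma(math.dist(s0, si)) for si in sites]                    # gamma vector
    Gi_g = solve(G, g)            # Gamma^{-1} gamma
    Gi_1 = solve(G, [1.0] * n)    # Gamma^{-1} 1
    a = sum(Gi_g)                 # 1' Gamma^{-1} gamma
    b = sum(Gi_1)                 # 1' Gamma^{-1} 1
    m = (1.0 - a) / b
    # Gamma is symmetric, so lambda = Gamma^{-1} (gamma + m * 1).
    lam = [Gi_g[i] + m * Gi_1[i] for i in range(n)]
    pred = sum(l * z for l, z in zip(lam, values))
    var = sum(gi * x for gi, x in zip(g, Gi_g)) - (a - 1.0) ** 2 / b
    return lam, pred, var


# Toy example: four sites on the corners of the unit square.
sites = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
values = [50.0, 52.0, 54.0, 56.0]
weights, prediction, variance = ordinary_kriging(sites, values, (0.5, 0.5), spherical)
print(weights, prediction, variance)
```

Two checks are worth running on any such sketch: the weights should sum to 1 (unbiasedness), and predicting at an observed site should return the observed value with zero kriging variance.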

[Figure: Z plotted over x and y; scale 48 to 58.]

[Figure: Prediction MSE plotted over x and y; scale 2.6 to 3.6.]

[Figure: two panels of Iron Ore % summaries, one against Northing and one against Easting; o = median Iron Ore %, x = mean Iron Ore %.]

Areal Data or Lattice Data

Simultaneous autoregressive (SAR) model

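$$
\begin{aligned}
z(s_1) &= b_{12} z(s_2) + \cdots + b_{1n} z(s_n) + \varepsilon_1 \\
z(s_2) &= b_{21} z(s_1) + \cdots + b_{2n} z(s_n) + \varepsilon_2 \\
&\;\;\vdots \\
z(s_n) &= b_{n1} z(s_1) + b_{n2} z(s_2) + \cdots + b_{n,n-1} z(s_{n-1}) + \varepsilon_n
\end{aligned}
$$

Or, in matrix form with $B = \{b_{ij}\}$ and $b_{ii} = 0$,

$$z = Bz + \varepsilon$$

- Explanatory variables.

- One can show that $\varepsilon$ is not independent of $z$.

- What if $z$ is discrete?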
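The dependence between $\varepsilon$ and $z$ can be seen directly from the matrix form. A short derivation, under the added assumptions that $E(\varepsilon) = 0$, $\mathrm{var}(\varepsilon) = \Lambda$ is diagonal, and $I - B$ is invertible:

```latex
\begin{align*}
z &= (I - B)^{-1} \varepsilon, \\
\mathrm{var}(z) &= (I - B)^{-1} \Lambda \,(I - B^T)^{-1}, \\
\mathrm{cov}(\varepsilon, z) &= E(\varepsilon z^T) = \Lambda \,(I - B^T)^{-1}.
\end{align*}
```

The matrix $\Lambda (I - B^T)^{-1}$ has nonzero off-diagonal entries in general, so $\varepsilon_i$ is correlated with $z(s_j)$ for $j \neq i$.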

Conditional autoregressive (CAR) model

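$$z(s_i) \mid \{z(s_j) : j \neq i\} \sim N\!\left( \sum_{j=1}^{n} c_{ij} z(s_j),\; \tau_i^2 \right)$$

where

$$c_{ij} \tau_j^2 = c_{ji} \tau_i^2, \qquad c_{ii} = 0.$$

If we further assume that the conditional dependence operates only through the neighborhood $N(s_i)$ of $s_i$, then

$$c_{ik} = 0, \qquad k \notin N(s_i)$$

(Markov random field (MRF)).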

- Response variable falls in the exponential family: auto spatial models.
  - Response variable is normal: auto Gaussian model.
  - Response variable is binary: auto logistic model.
  - Response variable is Poisson: auto Poisson model.

- SAR can be represented as a CAR model. Not necessarily vice versa; see the example in Cressie (1993).

- Markov random field theory is used to guide from the conditional distributions to a valid joint distribution.

Proximity matrix $W$.

Some choices for $W_{ij}$ (with $W_{ii} = 0$):

- $W_{ij} = 1$ if $i$ and $j$ share a common boundary.

- $W_{ij}$ is an inverse distance between units.

- $W_{ij} = 1$ if the distance between $i$ and $j$ is less than some fixed value.

- $W_{ij} = 1$ if $j$ is one of the $m$ nearest neighbors of $i$.
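Three of these choices can be computed from unit coordinates alone (for example, centroids). The sketch below is illustrative: the function names and the toy coordinates are assumptions made for this example, and the common-boundary rule is omitted because it needs the polygon geometry of the units.

```python
import math


def threshold_weights(coords, cutoff):
    """W[i][j] = 1 if units i and j are closer than cutoff, else 0; zero diagonal."""
    n = len(coords)
    return [[1.0 if i != j and math.dist(coords[i], coords[j]) < cutoff else 0.0
             for j in range(n)] for i in range(n)]


def inverse_distance_weights(coords):
    """W[i][j] = 1 / distance(i, j); zero diagonal. Assumes distinct coordinates."""
    n = len(coords)
    return [[0.0 if i == j else 1.0 / math.dist(coords[i], coords[j])
             for j in range(n)] for i in range(n)]


def nearest_neighbor_weights(coords, m):
    """W[i][j] = 1 if j is among the m nearest neighbors of i; zero diagonal."""
    n = len(coords)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(coords[i], coords[j]))
        for j in others[:m]:
            W[i][j] = 1.0
    return W


# Three units on a line; the third is farther from the other two.
coords = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
print(threshold_weights(coords, 1.5))
print(inverse_distance_weights(coords))
print(nearest_neighbor_weights(coords, 1))
```

The nearest-neighbor rule need not give a symmetric matrix: here unit 3 has unit 2 as its nearest neighbor, but unit 2 is nearer to unit 1.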

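CAR model

$$z \sim N\!\left( X\beta,\; (I - C)^{-1} M \right)$$

where

$$C = \{c_{ij}\}_{n \times n}, \qquad M = \tau^2 I,$$

and

$$C = \rho W.$$

$\rho = 0 \Rightarrow$ spatial independence.

Condition on $\rho$: $I - \rho W$ is positive definite.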
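The condition on $\rho$ can be checked numerically for a given $W$. The sketch below assumes a symmetric $W$ and tests positive definiteness by attempting a Cholesky factorization; the function names and the four-region example are illustrative assumptions.

```python
import math


def is_positive_definite(A, tol=1e-12):
    """Attempt a Cholesky factorization of a symmetric matrix A.

    Returns True if every pivot is larger than tol, i.e. A is
    (numerically) positive definite, and False otherwise.
    """
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= tol:
                    return False
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return True


def identity_minus_rho_w(W, rho):
    """Build the matrix I - rho * W."""
    n = len(W)
    return [[(1.0 if i == j else 0.0) - rho * W[i][j] for j in range(n)]
            for i in range(n)]


# Four regions in a row; neighbors share a boundary.
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
for rho in (0.0, 0.5, 0.7):
    print(rho, is_positive_definite(identity_minus_rho_w(W, rho)))
```

For this $W$ the eigenvalues are $\pm 1.618$ and $\pm 0.618$ (to three decimals), so $I - \rho W$ is positive definite only for $|\rho| < 1/1.618 \approx 0.618$; the loop shows $\rho = 0.5$ passing and $\rho = 0.7$ failing.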

Spatial point process

A spatial point process is said to have the complete spatial randomness (CSR) property if it is a homogeneous Poisson point process:

- If $A_1, \ldots, A_r$ are disjoint, then $N(A_1), \ldots, N(A_r)$ are independent, where $N(A)$ is the number of events in $A$.

- $N(A) \sim \mathrm{Poisson}(\lambda |A|)$, where $|A|$ is the volume of $A$.
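These two properties suggest a direct way to simulate CSR on a rectangle: draw the number of events from a Poisson distribution with mean $\lambda |A|$, then place that many events independently and uniformly. A minimal sketch, in which the function names and the rectangular window are assumptions made for this example:

```python
import random


def poisson_draw(mean, rng):
    """Draw from Poisson(mean) by counting rate-1 exponential arrivals in [0, mean]."""
    count = 0
    t = rng.expovariate(1.0)
    while t <= mean:
        count += 1
        t += rng.expovariate(1.0)
    return count


def simulate_csr(lam, width=1.0, height=1.0, rng=None):
    """Simulate a homogeneous Poisson process with intensity lam on a rectangle."""
    if rng is None:
        rng = random.Random()
    n = poisson_draw(lam * width * height, rng)
    return [(rng.uniform(0.0, width), rng.uniform(0.0, height)) for _ in range(n)]


rng = random.Random(1)
pattern = simulate_csr(50, rng=rng)
print(len(pattern), "events; first three:", pattern[:3])
```

Repeating the simulation many times and averaging the event counts should give a value close to $\lambda |A|$.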

Testing for CSR

- W: distance from a randomly chosen event to its nearest neighboring event.

- X: distance from a randomly chosen point to its nearest event.

- Compare with CSR through a Monte Carlo test.
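A Monte Carlo test of this kind can be sketched with the mean event-to-nearest-event distance as the test statistic. Several choices here are assumptions for illustration: the statistic itself, the unit-square window, conditioning on the observed number of events, the absence of edge correction, and the function names.

```python
import math
import random


def mean_nn_distance(points):
    """Mean distance from each event to its nearest other event."""
    total = 0.0
    for i, p in enumerate(points):
        total += min(math.dist(p, q) for j, q in enumerate(points) if j != i)
    return total / len(points)


def csr_monte_carlo_test(points, nsim=99, seed=0):
    """Compare the observed statistic with nsim CSR patterns on the unit square.

    Returns (observed statistic, p-value against regularity,
    p-value against clustering), using the rank of the observed
    value among the simulated ones.
    """
    rng = random.Random(seed)
    n = len(points)
    observed = mean_nn_distance(points)
    sims = []
    for _ in range(nsim):
        sim = [(rng.random(), rng.random()) for _ in range(n)]
        sims.append(mean_nn_distance(sim))
    # Large nearest-neighbor distances suggest regularity, small ones clustering.
    p_regular = (1 + sum(s >= observed for s in sims)) / (nsim + 1)
    p_clustered = (1 + sum(s <= observed for s in sims)) / (nsim + 1)
    return observed, p_regular, p_clustered


# A perfectly regular 6 x 6 grid of events in the unit square.
grid = [((i + 0.5) / 6, (j + 0.5) / 6) for i in range(6) for j in range(6)]
print(csr_monte_carlo_test(grid))
```

For the regular grid the observed statistic is far larger than any simulated value, so the test rejects CSR in the direction of regularity.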

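First-order intensity function:

$$\lambda(x) = \lim_{|dx| \to 0} \left\{ \frac{E[N(dx)]}{|dx|} \right\}$$

Second-order intensity function:

$$\lambda_2(x, y) = \lim_{|dx|, |dy| \to 0} \left\{ \frac{E[N(dx) N(dy)]}{|dx|\,|dy|} \right\}$$

Stationarity: $\lambda(x) = \lambda$, $\lambda_2(x, y) = \lambda_2(x - y)$.

Isotropy: $\lambda_2(x - y) = \lambda_2(\|x - y\|)$.

Under CSR: $\lambda(x) = \lambda$, $\lambda_2(x, y) = \lambda^2$.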

K function

$$K(t) = \frac{1}{\lambda} E[N_0(t)],$$

where $N_0(t)$ is the number of further events within distance $t$ of an arbitrary event.

One can show that

$$\lambda_2(t) = \frac{\lambda^2 K'(t)}{2\pi t}, \qquad \lambda^2 K(t) = 2\pi \int_0^t \lambda_2(y)\, y\, dy.$$

For CSR: $K(t) = \pi t^2$.

L function:

$$L(t) = \sqrt{K(t)/\pi} - t$$
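A naive estimate of $K(t)$ replaces $\lambda$ by $n/|A|$ and the expectation by an average over events. The sketch below makes two simplifying assumptions that should be stated plainly: it applies no edge correction, and it divides by $n(n-1)$ rather than $n^2$. The function names are illustrative.

```python
import math


def k_estimate(points, t, area=1.0):
    """Naive K estimate: area * (ordered pairs within distance t) / (n * (n - 1))."""
    n = len(points)
    close = sum(1 for i in range(n) for j in range(n)
                if i != j and math.dist(points[i], points[j]) <= t)
    return area * close / (n * (n - 1))


def l_estimate(points, t, area=1.0):
    """L(t) = sqrt(K(t) / pi) - t, which is 0 for all t under CSR."""
    return math.sqrt(k_estimate(points, t, area) / math.pi) - t


# Four events on the corners of the unit square.
corners = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
for t in (0.5, 1.0, 1.5):
    print(t, k_estimate(corners, t), l_estimate(corners, t))
```

Without edge correction the estimate is biased downward at larger $t$, because events near the boundary have part of their disc of radius $t$ outside the window.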

[Figure: Kenvjap. K(r) against r (0.00 to 0.20), showing Kobs(r), Ktheo(r), Khi(r), and Klo(r).]

[Figure: Kenvred. K(r) against r (0.00 to 0.20), showing Kobs(r), Ktheo(r), Khi(r), and Klo(r).]

References

- Cressie, N. (1993), Statistics for Spatial Data, rev. ed., Wiley.

- Cressie, N. and Wikle, C. (2012), Statistics for Spatio-Temporal Data, Wiley.

- Diggle, P. J. (1983), Statistical Analysis of Spatial Point Patterns, Academic Press.

- Banerjee, S., Carlin, B. P. and Gelfand, A. E. (2004), Hierarchical Modeling and Analysis for Spatial Data, Chapman and Hall.

- Bivand, R., Pebesma, E. and Gomez-Rubio, V. (2008), Applied Spatial Data Analysis with R, Springer-Verlag.

- Stein, M. (1999), Interpolation of Spatial Data.

- Cressie, N. (1986), Kriging nonstationary data, Journal of the American Statistical Association, 81, 625-634.