
Chapter 4 - Fundamentals of spatial processes - Lecture notes

Geir Storvik

January 21, 2013


Spatial processes

Typically correlation between nearby sites

Mostly positive correlation

Negative correlation in the presence of competition

Typically part of a space-time process

Temporal snapshot

Temporal aggregation

Statistical analysis

Incorporate spatial dependence into spatial statistical models

Relatively new in statistics, active research field!


Hierarchical (statistical) models

Data model

Process model

In time series setting - state space models

Example

$$Y_t = \alpha Y_{t-1} + W_t,\quad t = 2, 3, \dots,\qquad W_t \overset{ind}{\sim} (0, \sigma_W^2) \qquad \text{(process model)}$$

$$Z_t = \beta Y_t + \eta_t,\quad t = 1, 2, 3, \dots,\qquad \eta_t \overset{ind}{\sim} (0, \sigma_\eta^2) \qquad \text{(data model)}$$

We will do a similar type of modelling now, separating the process model and the data model:

$$Z(s_i) = Y(s_i) + \varepsilon(s_i),\qquad \varepsilon(s_i) \sim \text{iid}$$
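To make the two-level structure concrete, here is a minimal R sketch that simulates the state-space example above; all parameter values (alpha, beta and the two standard deviations) are illustrative choices, not taken from the notes.

## Minimal sketch: simulate the state-space example (illustrative parameter values)
set.seed(1)
n      <- 200
alpha  <- 0.8    # AR(1) coefficient in the process model
beta   <- 1.0    # regression of the data on the latent state
sigmaW <- 0.5    # process noise standard deviation
sigmaE <- 0.3    # observation noise standard deviation

Y <- numeric(n)
Y[1] <- rnorm(1, 0, sigmaW / sqrt(1 - alpha^2))    # stationary starting value
for (t in 2:n) {
  Y[t] <- alpha * Y[t - 1] + rnorm(1, 0, sigmaW)   # process model
}
Z <- beta * Y + rnorm(n, 0, sigmaE)                # data model

plot(Z, pch = 20, col = "grey", ylab = "value")    # noisy observations
lines(Y, col = "red")                              # latent process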


Prediction

Typical aim:

Observed $\mathbf{Z} = (Z(s_1), \dots, Z(s_m))$

Want to predict $Y(s_0)$

Squared error loss function: the optimal predictor is $E(Y(s_0)\mid\mathbf{Z})$

Empirical methods: $E(Y(s_0)\mid\mathbf{Z}) \approx E(Y(s_0)\mid\mathbf{Z}, \hat{\boldsymbol\theta})$

Bayesian methods: $E(Y(s_0)\mid\mathbf{Z}) = \int_{\boldsymbol\theta} E(Y(s_0)\mid\mathbf{Z},\boldsymbol\theta)\,p(\boldsymbol\theta\mid\mathbf{Z})\,d\boldsymbol\theta$

Both require $E(Y(s_0)\mid\mathbf{Z},\boldsymbol\theta)$

The $Z(s_i)$ are independent given $\mathbf{Y}$, but dependent marginally

Requires a model for $\{Y(s)\}$


Geostatistical models

Assume {Y (s)} is a Gaussian process

$(Y(s_1), \dots, Y(s_m))$ is multivariate Gaussian for all $m$ and $s_1, \dots, s_m$

Need to specify

$\mu(s) = E(Y(s))$

$C_Y(s, s') = \mathrm{cov}(Y(s), Y(s'))$

Usually assume second-order stationarity:

$$\mu(s) = \mu \ \ \forall s, \qquad C_Y(s, s') = C_Y(s - s') \ \ \forall s, s'$$

Possible extension:

$$\mu(s) = x(s)^T\boldsymbol\beta$$

Often:

$$Z(s_i)\mid Y(s_i), \sigma_\varepsilon^2 \overset{ind}{\sim} \mathrm{Gau}(Y(s_i), \sigma_\varepsilon^2)$$


Prediction

$(Z(s_1), \dots, Z(s_m), Y(s_0))$ is multivariate Gaussian

Can use the rules for conditional distributions:

$$\begin{pmatrix} \mathbf{X}_1 \\ \mathbf{X}_2 \end{pmatrix} \sim \mathrm{MVN}\left( \begin{pmatrix} \boldsymbol\mu_1 \\ \boldsymbol\mu_2 \end{pmatrix}, \begin{pmatrix} \boldsymbol\Sigma_{11} & \boldsymbol\Sigma_{12} \\ \boldsymbol\Sigma_{21} & \boldsymbol\Sigma_{22} \end{pmatrix} \right)$$

$$E(\mathbf{X}_2\mid\mathbf{X}_1) = \boldsymbol\mu_2 + \boldsymbol\Sigma_{21}\boldsymbol\Sigma_{11}^{-1}(\mathbf{X}_1 - \boldsymbol\mu_1)$$

$$\mathrm{var}(\mathbf{X}_2\mid\mathbf{X}_1) = \boldsymbol\Sigma_{22} - \boldsymbol\Sigma_{21}\boldsymbol\Sigma_{11}^{-1}\boldsymbol\Sigma_{12}$$

Need

Expectations: As for ordinary linear regression

Covariances: New!
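The conditional-distribution rule is easy to verify numerically; the base-R sketch below partitions a small hand-made MVN (the numbers are arbitrary) and evaluates $E(\mathbf{X}_2\mid\mathbf{X}_1)$ and $\mathrm{var}(\mathbf{X}_2\mid\mathbf{X}_1)$ exactly as in the formulas above.

## Sketch: conditional mean/variance in a partitioned MVN (toy numbers)
mu    <- c(1, 2, 0)                       # (mu_1, mu_2): X1 = components 1:2, X2 = component 3
Sigma <- matrix(c(1.0, 0.5, 0.3,
                  0.5, 1.0, 0.4,
                  0.3, 0.4, 1.0), 3, 3)
i1 <- 1:2; i2 <- 3
x1 <- c(1.5, 1.0)                         # observed value of X1

S11 <- Sigma[i1, i1]; S12 <- Sigma[i1, i2]
S21 <- Sigma[i2, i1]; S22 <- Sigma[i2, i2]

cond_mean <- mu[i2] + S21 %*% solve(S11, x1 - mu[i1])   # mu_2 + Sigma_21 Sigma_11^{-1} (x1 - mu_1)
cond_var  <- S22 - S21 %*% solve(S11, S12)              # Sigma_22 - Sigma_21 Sigma_11^{-1} Sigma_12
cond_mean; cond_var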


Variogram and covariance function

Dependence can be specified through covariance functions

Such functions do not always exist, but prediction can still be carried out

More general concept: the variogram

$$2\gamma_Y(h) \equiv \mathrm{var}[Y(s+h) - Y(s)] = \mathrm{var}[Y(s+h)] + \mathrm{var}[Y(s)] - 2\,\mathrm{cov}[Y(s+h), Y(s)] = 2C_Y(\mathbf{0}) - 2C_Y(h)$$

Note:

The variogram can exist even if $\mathrm{var}[Y(s)] = \infty$!

This is often assumed in spatial statistics, hence the use of variograms

When $2\gamma_Y(h) = 2\gamma_Y(\|h\|)$, we have an isotropic variogram


Isotropic covariance functions/variograms

Matern covariance function

$$C_Y(h;\boldsymbol\theta) = \sigma_1^2\,\{2^{\theta_2-1}\Gamma(\theta_2)\}^{-1}\,(\|h\|/\theta_1)^{\theta_2}\,K_{\theta_2}(\|h\|/\theta_1)$$

Powered exponential

$$C_Y(h;\boldsymbol\theta) = \sigma_1^2 \exp\{-(\|h\|/\theta_1)^{\theta_2}\}$$

Exponential

$$C_Y(h;\boldsymbol\theta) = \sigma_1^2 \exp\{-\|h\|/\theta_1\}$$

Gaussian

$$C_Y(h;\boldsymbol\theta) = \sigma_1^2 \exp\{-(\|h\|/\theta_1)^2\}$$
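These forms are straightforward to code directly; the sketch below implements the exponential, powered-exponential and Matérn covariances in the $(\sigma_1^2, \theta_1, \theta_2)$ parametrization of the notes (the Matérn uses base R's besselK); the default parameter values are illustrative.

## Sketch: isotropic covariance functions in the notes' parametrization
cov_exp <- function(h, sigma2 = 1, theta1 = 1) {
  sigma2 * exp(-h / theta1)
}
cov_powexp <- function(h, sigma2 = 1, theta1 = 1, theta2 = 1) {
  sigma2 * exp(-(h / theta1)^theta2)            # theta2 = 2 gives the Gaussian model
}
cov_matern <- function(h, sigma2 = 1, theta1 = 1, theta2 = 0.5) {
  out <- sigma2 * (h / theta1)^theta2 * besselK(h / theta1, theta2) /
         (2^(theta2 - 1) * gamma(theta2))
  out[h == 0] <- sigma2                         # limit at h = 0
  out
}

h <- seq(0, 5, by = 0.1)
plot(h, cov_matern(h, theta2 = 1.5), type = "l", ylab = "C(h)")
lines(h, cov_exp(h), lty = 2)                   # exponential = Matern with theta2 = 0.5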


Bochner’s theorem

A covariance function needs to be positive definite

Theorem (Bochner, 1955)

If $\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} |C_Y(h)|\,dh < \infty$, then a valid real-valued covariance function can be written as

$$C_Y(h) = \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} \cos(\boldsymbol\omega^T h)\, f_Y(\boldsymbol\omega)\, d\boldsymbol\omega$$

where $f_Y(\boldsymbol\omega) \ge 0$ is symmetric about $\boldsymbol\omega = \mathbf{0}$.

$f_Y(\boldsymbol\omega)$: the spectral density of $C_Y(h)$.


Nugget effect and Sill

We have

$$\begin{aligned} C_Z(h) &= \mathrm{cov}[Z(s), Z(s+h)] \\ &= \mathrm{cov}[Y(s) + \varepsilon(s),\, Y(s+h) + \varepsilon(s+h)] \\ &= \mathrm{cov}[Y(s), Y(s+h)] + \mathrm{cov}[\varepsilon(s), \varepsilon(s+h)] \\ &= C_Y(h) + \sigma_\varepsilon^2\, I(h = \mathbf{0}) \end{aligned}$$

Assume

$$C_Y(\mathbf{0}) = \sigma_Y^2, \qquad \lim_{h\to \mathbf{0}}\,[C_Y(\mathbf{0}) - C_Y(h)] = 0$$

Then

$$C_Z(\mathbf{0}) = \sigma_Y^2 + \sigma_\varepsilon^2 \qquad \text{(sill)}$$

$$\lim_{h\to \mathbf{0}}\,[C_Z(\mathbf{0}) - C_Z(h)] = \sigma_\varepsilon^2 = c_0 \qquad \text{(nugget effect)}$$

It is also possible to include a nugget effect in the $Y$-process.


Nugget/sill


Estimation of variogram/covariance function

$$2\gamma_Z(h) \equiv \mathrm{var}[Z(s+h) - Z(s)] \overset{\text{const. expectation}}{=} E[Z(s+h) - Z(s)]^2$$

Can estimate this from "all" pairs separated (approximately) by $h$. Problem: few or no pairs for every $h$. Simplifications:

Assume isotropy, $\gamma_Z(h) = \gamma_Z^0(\|h\|)$

Use $2\hat\gamma_Z^0(h) = \mathrm{ave}\{(Z(s_i) - Z(s_j))^2 \,;\ \|s_i - s_j\| \in T(h)\}$, where $T(h)$ is a distance bin (tolerance region) around $h$

If there are covariates, use residuals
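A bare-bones base-R version of this binned estimator; the number of bins, the maximum distance and the hypothetical data frame dat with columns x, y, z are illustrative only (packages such as geoR and gstat provide full-featured versions).

## Sketch: binned empirical semivariogram, ave{(Z(si)-Z(sj))^2 ; ||si-sj|| in T(h)} / 2
emp_variogram <- function(coords, z, nbins = 15, maxdist = NULL) {
  d  <- as.matrix(dist(coords))              # all pairwise distances
  dz <- outer(z, z, "-")^2                   # squared differences (Z(si) - Z(sj))^2
  keep <- upper.tri(d)                       # use each pair once
  d <- d[keep]; dz <- dz[keep]
  if (is.null(maxdist)) maxdist <- max(d) / 2
  bins <- cut(d, breaks = seq(0, maxdist, length.out = nbins + 1))
  data.frame(dist  = tapply(d,  bins, mean),
             gamma = tapply(dz, bins, mean) / 2,   # semivariogram estimate
             npair = as.vector(table(bins)))
}

## Usage on a hypothetical data frame dat with coordinates x, y and observations z:
## vg <- emp_variogram(cbind(dat$x, dat$y), dat$z)
## plot(vg$dist, vg$gamma, xlab = "h", ylab = "semivariogram")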


Boreality data

Empirical variogram: Boreality data


Testing for independence

Under independence: $\gamma_Z^0(h) = \sigma_Z^2$ for all $h > 0$

Test statistic $F = \hat\gamma_Z(h_1)/\hat\sigma_Z^2$, where $h_1$ is the smallest observed distance

Reject $H_0$ when $|F - 1|$ is large

Permutation test: recalculate $F$ for permutations of $\mathbf{Z}$ over the sites

If the observed $F$ is above the 97.5% percentile (or below the 2.5% percentile) of the permutation distribution, reject $H_0$

Boreality example:

P-value = 0.0001
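A sketch of how such a permutation test can be coded, assuming coordinates coords and observations z as in the variogram sketch above; treating the closest 5% of pairs as the smallest-distance bin and using 999 random permutations (rather than all permutations) are arbitrary choices for illustration.

## Sketch: permutation test for spatial independence (hypothetical coords and z)
perm_test_indep <- function(coords, z, nperm = 999, tol = NULL) {
  d    <- as.matrix(dist(coords))
  keep <- upper.tri(d)
  if (is.null(tol)) tol <- quantile(d[keep], 0.05)     # "smallest" distances: closest 5% of pairs
  close <- keep & d <= tol

  Fstat <- function(zz) {
    gamma_h1 <- mean(outer(zz, zz, "-")[close]^2) / 2  # semivariogram at the smallest distances
    gamma_h1 / var(zz)                                 # compare with the variance estimate
  }
  F_obs  <- Fstat(z)
  F_perm <- replicate(nperm, Fstat(sample(z)))         # permute Z over the sites
  p_value <- mean(abs(F_perm - 1) >= abs(F_obs - 1))   # two-sided permutation p-value
  list(F = F_obs, p.value = p_value)
}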


Kriging = Prediction

Model

$$\mathbf{Y} = \mathbf{X}\boldsymbol\beta + \boldsymbol\delta, \qquad \mathbf{Z} = \mathbf{Y} + \boldsymbol\varepsilon$$

Prediction of $Y(s_0)$

Linear predictors $\{\mathbf{L}^T\mathbf{Z} + k\}$

The optimal predictor minimizes

$$\mathrm{MSPE}(\mathbf{L}, k) \equiv E[Y(s_0) - \mathbf{L}^T\mathbf{Z} - k]^2 = \mathrm{var}[Y(s_0) - \mathbf{L}^T\mathbf{Z} - k] + \{E[Y(s_0) - \mathbf{L}^T\mathbf{Z} - k]\}^2$$

Note: no distributional assumptions are made here


Kriging

$$\begin{aligned} \mathrm{MSPE}(\mathbf{L}, k) &= \mathrm{var}[Y(s_0) - \mathbf{L}^T\mathbf{Z} - k] + \{E[Y(s_0) - \mathbf{L}^T\mathbf{Z} - k]\}^2 \\ &= \mathrm{var}[Y(s_0) - \mathbf{L}^T\mathbf{Z} - k] + \{\mu_Y(s_0) - \mathbf{L}^T\boldsymbol\mu_Z - k\}^2 \end{aligned}$$

The second term is zero if $k = \mu_Y(s_0) - \mathbf{L}^T\boldsymbol\mu_Z$. For the first term (with $\mathbf{c}(s_0) = \mathrm{cov}[\mathbf{Z}, Y(s_0)]$):

$$\mathrm{var}[Y(s_0) - \mathbf{L}^T\mathbf{Z} - k] = C_Y(s_0, s_0) - 2\mathbf{L}^T\mathbf{c}(s_0) + \mathbf{L}^T\mathbf{C}_Z\mathbf{L}$$

Derivative with respect to $\mathbf{L}$:

$$-2\mathbf{c}(s_0) + 2\mathbf{C}_Z\mathbf{L} = \mathbf{0} \quad\Rightarrow\quad \mathbf{L}^* = \mathbf{C}_Z^{-1}\mathbf{c}(s_0)$$

giving

$$Y^*(s_0) = \mu_Y(s_0) + \mathbf{c}(s_0)^T\mathbf{C}_Z^{-1}[\mathbf{Z} - \boldsymbol\mu_Z]$$

$$\mathrm{MSPE}(\mathbf{L}^*, k^*) = C_Y(s_0, s_0) - \mathbf{c}(s_0)^T\mathbf{C}_Z^{-1}\mathbf{c}(s_0)$$


Kriging

$$\mathbf{c}(s_0) = \mathrm{cov}[\mathbf{Z}, Y(s_0)] = \mathrm{cov}[\mathbf{Y}, Y(s_0)] = (C_Y(s_0, s_1), \dots, C_Y(s_0, s_m))^T = \mathbf{c}_Y(s_0)$$

$$\mathbf{C}_Z = \{C_Z(s_i, s_j)\}, \qquad C_Z(s_i, s_j) = \begin{cases} C_Y(s_i, s_i) + \sigma_\varepsilon^2, & s_i = s_j \\ C_Y(s_i, s_j), & s_i \ne s_j \end{cases}$$

$$Y^*(s_0) = \mu_Y(s_0) + \mathbf{c}_Y(s_0)^T\mathbf{C}_Z^{-1}(\mathbf{Z} - \boldsymbol\mu_Z)$$
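Putting the pieces together, a sketch of simple kriging with a known constant mean and an exponential covariance; the covariance parameters and the toy data at the end are illustrative only.

## Sketch: simple kriging with known constant mean and exponential covariance
simple_krige <- function(coords, z, s0, mu, sigma2_y = 1, theta1 = 1, sigma2_eps = 0.1) {
  covfun <- function(h) sigma2_y * exp(-h / theta1)
  D  <- as.matrix(dist(coords))
  CZ <- covfun(D) + diag(sigma2_eps, nrow(D))          # C_Z = Sigma_Y + sigma_eps^2 I
  c0 <- covfun(sqrt(colSums((t(coords) - s0)^2)))      # c_Y(s0)
  w  <- solve(CZ, c0)                                  # C_Z^{-1} c_Y(s0)
  pred <- mu + sum(w * (z - mu))                       # Y*(s0)
  mspe <- sigma2_y - sum(w * c0)                       # kriging variance
  c(pred = pred, mspe = mspe)
}

## Toy usage:
set.seed(2)
coords <- cbind(runif(50), runif(50))
z      <- rnorm(50, mean = 5)
simple_krige(coords, z, s0 = c(0.5, 0.5), mu = 5)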


Gaussian assumptions

Assume now that $\mathbf{Y}, \mathbf{Z}$ are multivariate Gaussian:

$$\begin{pmatrix} \mathbf{Z} \\ Y(s_0) \end{pmatrix} \sim \mathrm{MVN}\left( \begin{pmatrix} \boldsymbol\mu_Z \\ \mu_Y(s_0) \end{pmatrix}, \begin{pmatrix} \mathbf{C}_Z & \mathbf{c}_Y(s_0) \\ \mathbf{c}_Y(s_0)^T & C_Y(s_0, s_0) \end{pmatrix} \right)$$

This gives

$$Y(s_0)\mid\mathbf{Z} \sim N\big(\mu_Y(s_0) + \mathbf{c}_Y(s_0)^T\mathbf{C}_Z^{-1}[\mathbf{Z} - \boldsymbol\mu_Z],\ C_Y(s_0, s_0) - \mathbf{c}_Y(s_0)^T\mathbf{C}_Z^{-1}\mathbf{c}_Y(s_0)\big)$$

Same as kriging!

The book derives this directly, without using the formula for the conditional distribution.


Kriging (cont)

$$Y^*(s_0) = \mu_Y(s_0) + \mathbf{c}_Y(s_0)^T\mathbf{C}_Z^{-1}(\mathbf{Z} - \boldsymbol\mu_Z)$$

Assuming

$$E[Y(s)] = x(s)^T\boldsymbol\beta, \qquad Z(s_i)\mid Y(s_i), \sigma_\varepsilon^2 \overset{ind}{\sim} \mathrm{Gau}(Y(s_i), \sigma_\varepsilon^2)$$

Then

$$\boldsymbol\mu_Z = \boldsymbol\mu_Y = \mathbf{X}\boldsymbol\beta, \qquad \mathbf{C}_Z = \boldsymbol\Sigma_Y + \sigma_\varepsilon^2\mathbf{I}, \qquad \mathbf{c}_Y(s_0) = (C_Y(s_0, s_1), \dots, C_Y(s_0, s_m))^T$$

and

$$Y^*(s_0) = x(s_0)^T\boldsymbol\beta + \mathbf{c}_Y(s_0)^T[\boldsymbol\Sigma_Y + \sigma_\varepsilon^2\mathbf{I}]^{-1}(\mathbf{Z} - \mathbf{X}\boldsymbol\beta)$$


Summary previous time/plan

Simple kriging

Linear predictor
Assumes parameters known
Equal to the conditional expectation

Unknown parameters

Ordinary kriging
Plug-in estimates
Bayesian approach

Non-Gaussian models


Unknown parameters

So far we have assumed the parameters are known; what if they are unknown?

Plug-in estimate/Empirical Bayes

Bayesian approach

Direct approach - Ordinary kriging


Estimation

Likelihood

$$L(\boldsymbol\theta) = p(\mathbf{z};\boldsymbol\theta) = \int_{\mathbf{y}} p(\mathbf{z}\mid\mathbf{y};\boldsymbol\theta)\,p(\mathbf{y};\boldsymbol\theta)\,d\mathbf{y}$$

Gaussian processes: Can be derived analytically

Optimization can still be problematic

Many (too many?) routines in R available

> library(nlme)   # gls() and corGaus() come from the nlme package
> gls(Bor~Wet,correlation=corGaus(form=~x+y,nugget=TRUE),data=Boreality)

Generalized least squares fit by REML

Model: Bor ~ Wet

Data: Boreality

Log-restricted-likelihood: -1363.146

Coefficients:

(Intercept) Wet

15.06705 77.66940

Correlation Structure: Gaussian spatial correlation

Formula: ~x + y

Parameter estimate(s):

range nugget

460.3941829 0.6112509

Degrees of freedom: 533 total; 531 residual

Residual standard error: 3.636527


Bayesian approach

Hierarchical model

Data model:      $\mathbf{Z}$ with density $p(\mathbf{Z}\mid\mathbf{Y},\boldsymbol\theta)$, book notation $[\mathbf{Z}\mid\mathbf{Y},\boldsymbol\theta]$
Process model:   $\mathbf{Y}$ with density $p(\mathbf{Y}\mid\boldsymbol\theta)$, book notation $[\mathbf{Y}\mid\boldsymbol\theta]$
Parameter:       $\boldsymbol\theta$

Simultaneous model: $p(\mathbf{y}, \mathbf{z}\mid\boldsymbol\theta) = p(\mathbf{z}\mid\mathbf{y},\boldsymbol\theta)\,p(\mathbf{y}\mid\boldsymbol\theta)$
Marginal model: $L(\boldsymbol\theta) = p(\mathbf{z}\mid\boldsymbol\theta) = \int_{\mathbf{y}} p(\mathbf{z}, \mathbf{y}\mid\boldsymbol\theta)\,d\mathbf{y}$
Inference: $\hat{\boldsymbol\theta} = \mathrm{argmax}_{\boldsymbol\theta}\,L(\boldsymbol\theta)$

Bayesian approach: include a model for $\boldsymbol\theta$

Data model:      $\mathbf{Z}$ with density $p(\mathbf{Z}\mid\mathbf{Y},\boldsymbol\theta)$, book notation $[\mathbf{Z}\mid\mathbf{Y},\boldsymbol\theta]$
Process model:   $\mathbf{Y}$ with density $p(\mathbf{Y}\mid\boldsymbol\theta)$, book notation $[\mathbf{Y}\mid\boldsymbol\theta]$
Parameter model: $\boldsymbol\theta$ with density $p(\boldsymbol\theta)$, book notation $[\boldsymbol\theta]$

Simultaneous model: $p(\mathbf{y}, \mathbf{z}, \boldsymbol\theta)$
Marginal model: $p(\mathbf{z}) = \int_{\boldsymbol\theta}\int_{\mathbf{y}} p(\mathbf{z}, \mathbf{y}, \boldsymbol\theta)\,d\mathbf{y}\,d\boldsymbol\theta$
Inference: $\hat{\boldsymbol\theta} = \int_{\boldsymbol\theta} \boldsymbol\theta\, p(\boldsymbol\theta\mid\mathbf{z})\,d\boldsymbol\theta$ (posterior mean)


Bayesian approach - example

Assume $z_1, \dots, z_m$ are iid with $z_i = \mu + \varepsilon_i$, $\varepsilon_i \sim N(0, \sigma^2)$

$\sigma^2$ known, interest in estimating $\mu$

ML: $\hat\mu_{ML} = \bar z$, $\mathrm{Var}[\bar z] = \sigma^2/m$

Bayesian approach: assume $\mu \sim N(\mu_0, k\sigma^2)$

Can show that

$$E[\mu\mid\mathbf{z}] = \frac{k^{-1}\mu_0 + m\bar z}{k^{-1} + m} \;\xrightarrow{\,k\to\infty\,}\; \bar z$$

$$\mathrm{Var}[\mu\mid\mathbf{z}] = \frac{\sigma^2}{k^{-1} + m} \;\xrightarrow{\,k\to\infty\,}\; \frac{\sigma^2}{m}$$
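This conjugate update is easy to check numerically; a sketch with arbitrary values for mu0, k and sigma and simulated data.

## Sketch: posterior mean/variance for mu in the conjugate normal model (toy values)
set.seed(3)
m <- 25; sigma <- 2; mu_true <- 1
z <- rnorm(m, mu_true, sigma)

mu0 <- 0; k <- 10                                    # prior: mu ~ N(mu0, k * sigma^2)
post_mean <- (mu0 / k + m * mean(z)) / (1 / k + m)   # (k^{-1} mu0 + m zbar) / (k^{-1} + m)
post_var  <- sigma^2 / (1 / k + m)                   # sigma^2 / (k^{-1} + m)

c(ML = mean(z), ML_var = sigma^2 / m,
  Bayes = post_mean, Bayes_var = post_var)           # the posterior shrinks toward mu0 for small k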


Prediction - example

Predict a new point $z_0$.

Plug-in rule: $\hat z_0 = \bar z$, $\mathrm{var}[\hat z_0] = \sigma^2$

Bayesian approach:

$$E[z_0\mid\mathbf{z}] = E[E[z_0\mid\mu,\mathbf{z}]] = E[\mu\mid\mathbf{z}] = \frac{k^{-1}\mu_0 + m\bar z}{k^{-1} + m} \;\xrightarrow{\,k\to\infty\,}\; \bar z$$

$$\mathrm{var}[z_0\mid\mathbf{z}] = E[\mathrm{var}[z_0\mid\mu,\mathbf{z}]] + \mathrm{var}[E[z_0\mid\mu,\mathbf{z}]] = \sigma^2 + \mathrm{var}[\mu\mid\mathbf{z}] = \sigma^2 + \frac{\sigma^2}{k^{-1} + m} \;\xrightarrow{\,k\to\infty\,}\; \sigma^2 + \frac{\sigma^2}{m}$$

This takes the uncertainty in $\mu$ into account.


INLA

INLA = Integrated Nested Laplace Approximation (software)
C code with an R interface
Can be installed by

source("http://www.math.ntnu.no/inla/givemeINLA.R")

Both empirical Bayes and Bayes


Example - simulated data

Assume

$$Y(s) = \beta_0 + \beta_1 x(s) + \sigma_\delta\,\delta(s), \qquad \{\delta(s)\} \text{ has Matérn correlation}$$

$$Z(s_i) = Y(s_i) + \sigma_\varepsilon\,\varepsilon(s_i), \qquad \{\varepsilon(s_i)\} \text{ independent}$$

Simulations:

Simulation on a $20\times 30$ grid

$\{x(s)\}$: a sine curve in the horizontal direction

Parameter values: $(\beta_0, \beta_1) = (2, 0.3)$, $\sigma_\delta = 1$, $\sigma_\varepsilon = 0.3$ and $(\theta_1, \theta_2) = (2, 3)$

Show images; show code/results


Matern model - parametrization

Model   Matérn form                                                          Range par.   Smoothness par.
Book    $\propto (\|h\|/\theta_1)^{\theta_2}\,K_{\theta_2}(\|h\|/\theta_1)$   $\theta_1$   $\theta_2$
geoR    $\propto (\|h\|/\phi)^{\kappa}\,K_{\kappa}(\|h\|/\phi)$               $\phi$       $\kappa$
INLA    $\propto (\|h\|\,\kappa)^{\nu}\,K_{\nu}(\|h\|\,\kappa)$               $1/\kappa$   $\nu$

Note: INLA reports an estimate of another "range", defined as

$$r = \frac{\sqrt{8}}{\kappa}$$

corresponding to a distance where the covariance function is approximately zero. An estimate of the range parameter can then be obtained by

$$\frac{1}{\kappa} = \frac{r}{\sqrt{8}}$$

Note: INLA works with precisions $\tau_y = \sigma_y^{-2}$, $\tau_z = \sigma_z^{-2}$.


Output INLA

Empirical Bayes

Fixed effects:
              mean       sd           0.025quant  0.5quant   0.975quant  kld
(Intercept)   2.5757115  0.006218343  2.5635180   2.5757114  2.5879077   0
x             0.3714519  0.040367838  0.2922947   0.3714515  0.4506262   0

Random effects:
Name   Model            Max KLD
delta  Matern2D model

Model hyperparameters:
                                          mean     sd      0.025quant  0.5quant  0.975quant
Precision for the Gaussian observations   28.8979  1.5841  25.9174     28.8546   32.1247
Precision for delta                       2.2549   0.1507  1.9738      2.2498    2.5644
Range for delta                           5.9772   0.2936  5.4222      5.9700    6.5730

Marginal Likelihood: -455.08
Warning: Interpret the marginal likelihood with care if the prior model is improper.
Posterior marginals for linear predictor and fitted values computed


Result - INLA - empirical Bayes

Parameter    True value   Estimate
$\beta_1$    2.0          2.423
$\beta_2$    0.3          0.357
$\sigma_z$   0.3          $1/\sqrt{13.2739} = 0.274$
$\sigma_y$   1.0          $1/\sqrt{0.5802} = 1.313$
$\phi_1$     2.0          $5.977/\sqrt{8} = 2.113$


Ordinary kriging

$E[Y(s)] = \mu$, $\mu$ unknown; observations $\mathbf{Z} = (Z_1, \dots, Z_m)$

$$\mathrm{MSPE}(\boldsymbol\lambda) = E[Y(s_0) - \boldsymbol\lambda^T\mathbf{Z}]^2$$

Constraint: $E[\boldsymbol\lambda^T\mathbf{Z}] = E[Y(s_0)] = \mu$; since $E[\boldsymbol\lambda^T\mathbf{Z}] = \boldsymbol\lambda^T E[\mathbf{Z}] = \boldsymbol\lambda^T\mathbf{1}\mu$, this requires $\boldsymbol\lambda^T\mathbf{1} = 1$

Lagrange multiplier: minimize $\mathrm{MSPE}(\boldsymbol\lambda) - 2\kappa(\boldsymbol\lambda^T\mathbf{1} - 1)$

$$\mathrm{MSPE}(\boldsymbol\lambda) = C_Y(s_0, s_0) - 2\boldsymbol\lambda^T\mathbf{c}_Y(s_0) + \boldsymbol\lambda^T\mathbf{C}_Z\boldsymbol\lambda$$

Differentiation with respect to $\boldsymbol\lambda$:

$$-2\mathbf{c}_Y(s_0) + 2\mathbf{C}_Z\boldsymbol\lambda = 2\kappa\mathbf{1}, \qquad \boldsymbol\lambda^T\mathbf{1} = 1$$

$$\boldsymbol\lambda^* = \mathbf{C}_Z^{-1}[\mathbf{c}_Y(s_0) + \kappa^*\mathbf{1}], \qquad \kappa^* = \frac{1 - \mathbf{c}_Y(s_0)^T\mathbf{C}_Z^{-1}\mathbf{1}}{\mathbf{1}^T\mathbf{C}_Z^{-1}\mathbf{1}}$$


Kriging

Simple kriging, $E[Y(s)] = x(s)^T\boldsymbol\beta$ known:

$$Y^*(s_0) = x(s_0)^T\boldsymbol\beta + \mathbf{c}_Y(s_0)^T\mathbf{C}_Z^{-1}(\mathbf{Z} - \mathbf{X}\boldsymbol\beta)$$

Ordinary kriging, $E[Y(s)] = \mu$ unknown:

$$\hat Y(s_0) = \left\{\mathbf{c}_Y(s_0) + \mathbf{1}\,\frac{1 - \mathbf{1}^T\mathbf{C}_Z^{-1}\mathbf{c}_Y(s_0)}{\mathbf{1}^T\mathbf{C}_Z^{-1}\mathbf{1}}\right\}^T\mathbf{C}_Z^{-1}\mathbf{Z} = \hat\mu_{gls} + \mathbf{c}_Y(s_0)^T\mathbf{C}_Z^{-1}(\mathbf{Z} - \mathbf{1}\hat\mu_{gls}), \qquad \hat\mu_{gls} = [\mathbf{1}^T\mathbf{C}_Z^{-1}\mathbf{1}]^{-1}\mathbf{1}^T\mathbf{C}_Z^{-1}\mathbf{Z}$$

Universal kriging, $E[Y(s)] = x(s)^T\boldsymbol\beta$ with $\boldsymbol\beta$ unknown:

$$\hat Y(s_0) = x(s_0)^T\hat{\boldsymbol\beta}_{gls} + \mathbf{c}_Y(s_0)^T\mathbf{C}_Z^{-1}(\mathbf{Z} - \mathbf{X}\hat{\boldsymbol\beta}_{gls}), \qquad \hat{\boldsymbol\beta}_{gls} = [\mathbf{X}^T\mathbf{C}_Z^{-1}\mathbf{X}]^{-1}\mathbf{X}^T\mathbf{C}_Z^{-1}\mathbf{Z}$$
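A sketch of universal kriging with the GLS estimate of beta plugged in; ordinary kriging is the special case where X is a column of ones. The exponential covariance and its parameter values are illustrative.

## Sketch: universal kriging with GLS-estimated beta (exponential covariance, toy settings)
universal_krige <- function(coords, z, X, x0, s0,
                            sigma2_y = 1, theta1 = 1, sigma2_eps = 0.1) {
  covfun <- function(h) sigma2_y * exp(-h / theta1)
  CZ <- covfun(as.matrix(dist(coords))) + diag(sigma2_eps, length(z))
  c0 <- covfun(sqrt(colSums((t(coords) - s0)^2)))

  CZinv_X  <- solve(CZ, X)
  beta_gls <- solve(t(X) %*% CZinv_X, t(CZinv_X) %*% z)   # [X' C_Z^{-1} X]^{-1} X' C_Z^{-1} z
  resid    <- z - X %*% beta_gls
  drop(x0 %*% beta_gls + c0 %*% solve(CZ, resid))         # predicted Y(s0)
}

## Ordinary kriging: constant mean, so X is a column of ones and x0 = 1
## universal_krige(coords, z, X = matrix(1, length(z)), x0 = 1, s0 = c(0.5, 0.5))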


Non-Gaussian data

Counts, binary data: Gaussian assumption inappropriate

Can still use a Gaussian assumption for the latent process, combined with a non-Gaussian data distribution:

$$Y(s) = x(s)^T\boldsymbol\beta + \varepsilon(s), \qquad \{\varepsilon(s)\} \text{ a Gaussian process}$$

$$Z(s_i)\mid Y(s_i), \theta_1 \overset{ind}{\sim} f(Y(s_i), \theta_1)$$

The best linear predictor is still possible, but not necessarily reasonable (?)

The conditional expectation $E[Y(s_0)\mid\mathbf{Z}]$ is still optimal under squared error loss, but no longer easy to compute

Exponential-family model (EFM)

$$f(z) = \exp\{(z\eta - b(\eta))/a(\theta_1) + c(z, \theta_1)\}, \qquad \eta = x^T\boldsymbol\beta$$

Includes the Binomial, Poisson, Gaussian and Gamma distributions


Estimation

Likelihood

$$L(\boldsymbol\theta) = p(\mathbf{z};\boldsymbol\theta) = \int_{\mathbf{y}} p(\mathbf{z}\mid\mathbf{y};\boldsymbol\theta)\,p(\mathbf{y};\boldsymbol\theta)\,d\mathbf{y}$$

Gaussian processes: can be derived analytically
Non-Gaussian: can be problematic

Monte Carlo

Laplace approximations


Computation of conditional expectation

$$E[Y(s_0)\mid\mathbf{Z}] = E\big[E[Y(s_0)\mid\mathbf{Y},\mathbf{Z}]\big]$$

Monte Carlo:

$$E\big[E[Y(s_0)\mid\mathbf{Y},\mathbf{Z}]\big] \approx \frac{1}{M}\sum_{m=1}^{M} E[Y(s_0)\mid\mathbf{Y}^m,\mathbf{Z}]$$

where $\mathbf{Y}^m \sim [\mathbf{Y}\mid\mathbf{Z}]$ (MCMC)

Laplace approximation: approximate $[\mathbf{Y}\mid\mathbf{Z}]$ by a Gaussian

Computer software is available for both approaches.

Both also give uncertainty measures such as $\mathrm{var}[Y(s_0)\mid\mathbf{Z}]$.


Laplace approximation

$$L(\boldsymbol\theta) = p(\mathbf{z};\boldsymbol\theta) = \int_{\mathbf{y}} p(\mathbf{z}\mid\mathbf{y};\boldsymbol\theta)\,p(\mathbf{y};\boldsymbol\theta)\,d\mathbf{y} = \int_{\mathbf{y}} \exp\{f(\mathbf{y};\boldsymbol\theta)\}\,d\mathbf{y}$$

where

$$f(\mathbf{y};\boldsymbol\theta) = \log p(\mathbf{z}\mid\mathbf{y};\boldsymbol\theta) + \log p(\mathbf{y};\boldsymbol\theta)$$

Taylor approximation around $\mathbf{y}_0$, the maximum point of $f(\mathbf{y};\boldsymbol\theta)$ (which depends on $\boldsymbol\theta$):

$$f(\mathbf{y}) \approx f(\mathbf{y}_0) - \tfrac{1}{2}(\mathbf{y} - \mathbf{y}_0)^T\mathbf{H}(\mathbf{y} - \mathbf{y}_0), \qquad \mathbf{H} = -\left.\frac{\partial^2}{\partial\mathbf{y}\,\partial\mathbf{y}^T} f(\mathbf{y})\right|_{\mathbf{y} = \mathbf{y}_0}$$

$$L(\boldsymbol\theta) \approx \int_{\mathbf{y}} \exp\{f(\mathbf{y}_0) - \tfrac{1}{2}(\mathbf{y} - \mathbf{y}_0)^T\mathbf{H}(\mathbf{y} - \mathbf{y}_0)\}\,d\mathbf{y} = \exp\{f(\mathbf{y}_0)\}\,(2\pi)^{m/2}\,|\mathbf{H}|^{-1/2}$$
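A one-dimensional illustration of the Laplace approximation: a scalar latent y with a N(0,1) prior and a single Poisson observation, so that the exact marginal likelihood can be checked by numerical integration. The toy model is an assumption made for this sketch, not the model in the notes.

## Sketch: Laplace approximation of L = int exp{f(y)} dy for a 1-D toy model
## z | y ~ Poisson(exp(y)),  y ~ N(0, 1),  f(y) = log p(z|y) + log p(y)
z <- 3
f <- function(y) dpois(z, exp(y), log = TRUE) + dnorm(y, 0, 1, log = TRUE)

opt <- optim(0, function(y) -f(y), method = "BFGS", hessian = TRUE)
y0  <- opt$par
H   <- opt$hessian                                      # = -f''(y0), the H of the slide (1x1 here)

L_laplace <- exp(f(y0)) * sqrt(2 * pi) / sqrt(det(H))   # exp{f(y0)} (2 pi)^{m/2} |H|^{-1/2}
L_exact   <- integrate(function(y) exp(f(y)), -10, 10)$value
c(laplace = L_laplace, exact = L_exact)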


Example - simulated data

Assume

$$Y(s) = \beta_0 + \beta_1 x(s) + \sigma_\delta\,\delta(s), \qquad \{\delta(s)\} \text{ has Matérn correlation}$$

$$Z(s_i) \sim \mathrm{Poisson}(\exp(Y(s_i)))$$

Simulations:

Simulation on a $20\times 30$ grid

$\{x(s)\}$: a sine curve in the horizontal direction

Parameter values: $(\beta_0, \beta_1) = (2, 0.3)$, $\sigma_\delta = 1$, $\sigma_\varepsilon = 0.3$ and $(\theta_1, \theta_2) = (2, 3)$

Show images; show code/results
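A sketch of how such a data set can be simulated; an exponential correlation is used in place of the Matérn to keep the code short, a small jitter is added for numerical stability, and sigma_eps plays no role in the Poisson model.

## Sketch: simulate spatial Poisson count data on a 20 x 30 grid
## (exponential correlation used instead of the Matern for brevity)
set.seed(4)
grid  <- expand.grid(x = 1:30, y = 1:20)                   # 20 x 30 grid
n     <- nrow(grid)
beta0 <- 2; beta1 <- 0.3; sigma_delta <- 1; theta1 <- 2

xcov  <- sin(2 * pi * grid$x / 30)                         # sine curve in the horizontal direction
D     <- as.matrix(dist(grid))
Sigma <- sigma_delta^2 * exp(-D / theta1) + diag(1e-8, n)  # covariance of sigma_delta * delta(s)
delta <- drop(t(chol(Sigma)) %*% rnorm(n))                 # one Gaussian field realisation

Y <- beta0 + beta1 * xcov + delta                          # latent process
Z <- rpois(n, exp(Y))                                      # Poisson observations

image(matrix(Z, nrow = 30), main = "Simulated counts Z")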
