Lecture 12

Transcript of Lecture 12

Page 1: Lecture 12


Modeling Spatially Continuous Data

Lecture 12

October 17, 2002

Bailey and Gatrell – Chapter 5

Generalized Least Squares

• First fit the trend surface model to the observed data using ordinary least squares

• Then estimate a variogram model using the residuals from the ordinary least squares estimation

• Then compute a covariogram using the variogram model

• From the covariogram estimate the covariance matrix and refit the model using generalized least squares (see the sketch below)

The validity of the final model depends on:
§ appropriate trend surface model choice
§ appropriate variogram model choice
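These four steps can be sketched with basic linear algebra. The following is a minimal sketch, assuming a first-order (linear) trend surface and an exponential covariogram whose parameters `sill` and `range_` are assumed to have already been fitted to the residual variogram; the function name and arguments are illustrative, not from the lecture.

```python
import numpy as np

def gls_trend_surface(coords, y, sill, range_):
    """Sketch of the OLS -> residual variogram -> GLS refit loop.

    coords : (n, 2) sample locations, y : (n,) observed attribute values.
    sill, range_ : parameters of a covariogram model assumed to have been
    fitted to the variogram of the OLS residuals.
    """
    # Design matrix for a first-order (linear) trend surface: 1, x, y
    X = np.column_stack([np.ones(len(y)), coords[:, 0], coords[:, 1]])

    # Step 1: ordinary least squares fit of the trend
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_ols          # residuals used for variogram estimation

    # Steps 2-3: variogram fitted to `resid` (assumed done), converted to a
    # covariogram; here an exponential form C(h) = sill * exp(-h / range_)
    h = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    Sigma = sill * np.exp(-h / range_)

    # Step 4: generalized least squares refit
    # beta_gls = (X' Sigma^-1 X)^-1 X' Sigma^-1 y
    beta_gls = np.linalg.solve(X.T @ np.linalg.solve(Sigma, X),
                               X.T @ np.linalg.solve(Sigma, y))
    return beta_ols, beta_gls
```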

Objectives

• Understanding and describing the nature of the spatial variation in an attribute: knowledge of the trend and covariance structure is sufficient

• Prediction of the attribute values at locations where the attribute has not been sampled: how to use the derived model for prediction purposes

Generalized Least Squares

Want to predict a value for the attribute at location s

Page 2: Lecture 12


Developing Models for Prediction

Assume we have the model

$Y(s) = x^T(s)\,\beta + U(s)$

where U(s) is a zero-mean process with covariance function C(·).

From this model we can do better than a prediction based on the mean value $\hat{\mu} = x^T(s)\,\hat{\beta}$ alone.

We can add a local component to the mean based on knowledge of the covariance structure and the observed values at sampled points.

This approach is known as kriging.

Kriging

§ Optimal in the sense that it is unbiased and minimizes the mean squared prediction error

§ Prediction weights are based on the spatial dependence between observations modeled by the variogram

§ An optimal spatial linear prediction method

§ Based primarily on the second order properties of the process Y

Matheron coined the term in honor of D. G. Krige, a South African mining engineer

Kriging Models

• Simple Kriging: the mean is known and constant throughout the study region

• Ordinary Kriging: the mean is fixed but unknown and needs to be estimated

• Universal Kriging: the mean varies and is an unknown linear combination of known functions

Simple Kriging

Assumes the mean is known and does not need to be estimated.

• Find an estimate $\hat{u}(s)$ for the value of the random variable U(s) at the location s, given observed values $u_i$ of the random variable $U(s_i)$ at the n sample locations $s_i$

• Subtract the mean from the sample observations to derive a set of residuals $u_i = y(s_i) - \mu$

• With an estimate $\hat{u}(s)$, the predicted value $\hat{y}(s)$ of the random variable is derived by adding $\hat{u}(s)$ to the known trend at the point s

• We assume these residuals have zero mean, a known variance $\sigma^2$, and a known covariance function C(·)

Page 3: Lecture 12


Simple Kriging

Estimates are weighted linear combinations of the observed residuals: a weighted sum of the n random variables at the sample sites $s_i$,

$\hat{U}(s) = \sum_{i=1}^{n} \lambda_i(s)\, U(s_i)$

where $\lambda_i(s)$ are the weights, different for different locations s.

$\hat{U}(s)$, as a sum of random variables, is itself a random variable.

$\hat{U}(s)$ has a mean value of zero for any set of weights, since by assumption the mean of U(s) is zero and the weights are constants.

[Figure: an unmeasured point $s_0$ to be estimated from the observed point locations $s_1, \ldots, s_6$ via $\hat{U}(s) = \sum_{i=1}^{n} \lambda_i(s)\, U(s_i)$.]

Simple Kriging

Assuming $\hat{U}(s)$ and U(s) both have zero mean, the expected mean square error is

$E\{(\hat{U}(s) - U(s))^2\} = E\{\hat{U}^2(s)\} + E\{U^2(s)\} - 2E\{U(s)\hat{U}(s)\}$

$= \sum_{i=1}^{n}\sum_{j=1}^{n} \lambda_i(s)\,\lambda_j(s)\, C(s_i, s_j) + \sigma^2 - 2\sum_{i=1}^{n} \lambda_i(s)\, C(s, s_i)$

$= \lambda^T(s)\, C\, \lambda(s) + \sigma^2 - 2\,\lambda^T(s)\, c(s)$

where C is the (n x n) matrix of covariances between all possible pairs of the n sample points, and c(s) is the (n x 1) column vector of covariances between the prediction point s and each of the n sample points.

Simple Kriging

We want to choose the weights to minimize the mean square error. Differentiating with respect to $\lambda(s)$ and setting the derivative to zero gives

$\lambda(s) = C^{-1} c(s)$

so that

$\hat{U}(s) = \lambda^T(s)\, U = c^T(s)\, C^{-1} U$

The minimized expected mean square error corresponding to this choice of weights is

$E\{(\hat{U}(s) - U(s))^2\} = \sigma^2 - c^T(s)\, C^{-1} c(s)$

which is the mean square prediction error, or kriging variance, $\sigma_e^2$.
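These two results, the weight vector $\lambda(s) = C^{-1}c(s)$ and the kriging variance $\sigma^2 - c^T(s)C^{-1}c(s)$, can be sketched directly in code; the function name and argument layout below are illustrative, not from the lecture.

```python
import numpy as np

def simple_kriging(C, c, u, mu, sigma2):
    """Simple kriging prediction at one location.

    C      : (n, n) covariances between the n sample points
    c      : (n,)   covariances between the prediction point and the samples
    u      : (n,)   residuals u_i = y(s_i) - mu at the sample points
    mu     : known (constant) mean
    sigma2 : variance of the residual process
    """
    lam = np.linalg.solve(C, c)      # lambda(s) = C^-1 c(s)
    u_hat = lam @ u                  # predicted residual at s
    y_hat = mu + u_hat               # add back the known mean
    krig_var = sigma2 - c @ lam      # sigma^2 - c'(s) C^-1 c(s)
    return y_hat, krig_var
```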

Page 4: Lecture 12


Worked Example –Simple Kriging

ID X Y Z u1 29 82 122 -38.372 73 105 183 22.633 82 65 148 -12.374 91 52 160 -0.375 22 22 176 15.63

Distances 1 2 3 4 51 0 49.2 53.9 65.2 60.7952 0 40.3 55.9 97.13 0 15.8 72.094 0 69.65 0

Worked Example – Simple Kriging

Covariances   1       2       3       4       5
1            20     4.571   3.970   2.828   3.228
2                    20     5.970   3.739   1.086
3                            20    12.450   2.300
4                                    20     2.479
5                                           20

$C^{-1}$ =
 0.054914  -0.01004   -0.00646   -0.00095   -0.00746
-0.01004    0.056751  -0.01508    0.000168   0.000252
-0.00646   -0.01508    0.087403  -0.05044   -0.00194
-0.00095    0.000168  -0.05044    0.082023  -0.00422
-0.00746    0.000252  -0.00194   -0.00422    0.051936

Worked Example – Simple Kriging

$\lambda(s) = C^{-1} c(s)$

Multiplying $C^{-1}$ (above) by the vector of covariances between the sample points and the prediction point,

$c(s) = (6.895,\; 5.032,\; 10.15,\; 8.083,\; 4.079)^T$

gives the weights

$\lambda(s) = (0.225,\; 0.066,\; 0.351,\; 0.128,\; 0.107)^T$

Sum of weights = 0.877. Simple kriging weights need not sum to 1.

Worked Example – Simple Kriging

$\hat{u}(s) = \sum_{i=1}^{n}\lambda_i(s)\,u_i = 0.225 \times (-38.37) + 0.066 \times 22.63 + 0.351 \times (-12.37) + 0.128 \times (-0.37) + 0.107 \times 15.63 \approx -9.86$

$\sigma_e^2 = \sigma^2 - c^T(s)\, C^{-1} c(s) = 20.0 - 6.92 = 13.08$

$\hat{y}(s) = 160.37 - 9.86 \approx 150.5$

The 95 percent confidence interval is $\hat{y}(s) \pm 1.96\,\sigma_e = 150.5 \pm 7.09$.
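The arithmetic of this worked example can be checked with the covariance matrix, c(s) vector, and residuals printed on the preceding slides. A sketch, with the values transcribed from the slides, so the results should agree with the figures above up to rounding:

```python
import numpy as np

# Covariance matrix between the five sample points (from the slides)
C = np.array([
    [20.000,  4.571,  3.970,  2.828,  3.228],
    [ 4.571, 20.000,  5.970,  3.739,  1.086],
    [ 3.970,  5.970, 20.000, 12.450,  2.300],
    [ 2.828,  3.739, 12.450, 20.000,  2.479],
    [ 3.228,  1.086,  2.300,  2.479, 20.000],
])
c = np.array([6.895, 5.032, 10.15, 8.083, 4.079])    # covariances to s
u = np.array([-38.37, 22.63, -12.37, -0.37, 15.63])  # residuals z_i - 160.37

lam = np.linalg.solve(C, c)       # approx. (0.225, 0.066, 0.351, 0.128, 0.107)
u_hat = lam @ u                   # approx. -9.86
sigma_e2 = 20.0 - c @ lam         # approx. 13.08
y_hat = 160.37 + u_hat            # approx. 150.5
half_width = 1.96 * np.sqrt(sigma_e2)   # approx. 7.09
```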

Page 5: Lecture 12


[Figure: the simple kriging weights $\lambda_i(s)$ (0.225, 0.066, 0.351, 0.128, 0.107) applied at the observed point locations $s_1, \ldots, s_5$ around the unmeasured point $s_0$; combining them through $\hat{U}(s) = \sum_{i=1}^{n} \lambda_i(s)\, U(s_i)$ gives the predicted value 150.5.]

Ordinary Kriging

Optimal local spatial prediction.

• First order effects are implicitly estimated as part of the prediction process

• Prediction of y(s) occurs in one step, using a weighted linear combination of the observed values $y_i$:

$\hat{Y}(s) = \sum_{i=1}^{n} \omega_i(s)\, Y(s_i)$

In this case we choose the weights so that the mean value of $\hat{Y}(s)$ is constrained to be the mean µ.

Ordinary Kriging

§ The mean of $\hat{Y}(s)$ will also be µ as long as the weights sum to one

§ The mean of Y(s) and each of the $Y(s_i)$ are all µ

§ With this constraint we minimize the mean squared error between the values of Y(s) and $\hat{Y}(s)$

The expected mean square error is

$E\{(\hat{Y}(s) - Y(s))^2\} = \omega^T(s)\, C\, \omega(s) + \sigma^2 - 2\,\omega^T(s)\, c(s)$

Ordinary Kriging

C is the (n x n) matrix of covariances $C(s_i, s_j)$ between all possible pairs of the n sample sites, and c(s) is the (n x 1) column vector of covariances $C(s, s_i)$ between the prediction point s and each of the n sample sites.

To carry out the optimization subject to the constraint, use the method of Lagrange multipliers and minimize

$\omega^T(s)\, C\, \omega(s) + \sigma^2 - 2\,\omega^T(s)\, c(s) + 2v(s)\,(\omega^T \mathbf{1} - 1)$

with respect to $\omega(s)$ and the multiplier v(s).

Page 6: Lecture 12


Ordinary Kriging

$\omega^T \mathbf{1} = 1$

$C\,\omega(s) + v(s)\,\mathbf{1} = c(s)$

These equations are expressed using a modified matrix $C^+$ and vectors $\omega^+(s)$ and $c^+(s)$:

$C^+ = \begin{bmatrix} C(s_1,s_1) & \cdots & C(s_1,s_n) & 1 \\ \vdots & \ddots & \vdots & \vdots \\ C(s_n,s_1) & \cdots & C(s_n,s_n) & 1 \\ 1 & \cdots & 1 & 0 \end{bmatrix}, \quad \omega^+(s) = \begin{bmatrix} \omega_1(s) \\ \vdots \\ \omega_n(s) \\ v(s) \end{bmatrix}, \quad c^+(s) = \begin{bmatrix} C(s,s_1) \\ \vdots \\ C(s,s_n) \\ 1 \end{bmatrix}$

so that $C^+ \omega^+(s) = c^+(s)$.

Ordinary Kriging

This leads to two simultaneous equations,

$\omega^T \mathbf{1} = 1 \qquad C\,\omega(s) + v(s)\,\mathbf{1} = c(s)$

and the solution

$\omega^+(s) = (C^+)^{-1} c^+(s)$

Ordinary Kriging

To obtain the prediction $\hat{y}(s)$:

• Solve the equation for $\omega^+(s)$

• Extract from this the vector $\omega(s)$

• Then $\hat{y}(s) = \omega^T(s)\, y$, where y is the vector of original observations $y_i$

Source: http://www.msu.edu/~ashton/466/2002_notes/3-18-02/ok_ill.html
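A sketch of this procedure for ordinary kriging, building the augmented system, solving for the weights and the Lagrange multiplier, and forming the prediction. The covariance function is passed in as a function of distance, and all names are illustrative.

```python
import numpy as np

def ordinary_kriging(coords, y, cov_fn, s0):
    """Ordinary kriging prediction at location s0.

    coords : (n, 2) sample locations, y : (n,) observed values,
    cov_fn : covariance C(h) as a (vectorised) function of distance h,
    s0     : (2,) coordinates of the prediction point.
    """
    n = len(y)
    h = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)

    # Augmented matrix C+ : covariances bordered by the sum-to-one constraint
    C_plus = np.zeros((n + 1, n + 1))
    C_plus[:n, :n] = cov_fn(h)
    C_plus[:n, n] = 1.0
    C_plus[n, :n] = 1.0

    # Augmented vector c+(s): covariances to s0, plus the constraint entry 1
    c_plus = np.append(cov_fn(np.linalg.norm(coords - s0, axis=1)), 1.0)

    w_plus = np.linalg.solve(C_plus, c_plus)   # omega+(s) = (C+)^-1 c+(s)
    w = w_plus[:n]                             # kriging weights (sum to one)
    # w_plus[n] is the Lagrange multiplier v(s)
    return w @ y, w                            # prediction y_hat(s) and weights
```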

Page 7: Lecture 12


Worked Example – Ordinary Kriging

     x   y   z
1    2   4   3
2    4   7   4
3    8   9   2
4    7   4   4

The distance matrix for the 5 data points is:

     [1]    [2]    [3]    [4]    [5]
[1]  0.000  3.605  7.810  5.000  4.000
[2]  3.605  0.000  4.472  4.243  3.605
[3]  7.810  4.472  0.000  5.099  5.385
[4]  5.000  4.243  5.099  0.000  1.000
[5]  4.000  3.605  5.385  1.000  0.000

The distance vector between the data points and the unknown point is:

[1]  3.162  2.236  5.000  2.236  1.414

The covariance matrix for the data points for the given model is:

     [1]        [2]        [3]        [4]        [5]        [6]
[1]  7.5000000  3.6200650  0.5001733  2.3437500  3.2400000  1
[2]  3.6200650  7.5000000  2.8043796  3.0130760  3.6200650  1
[3]  0.5001733  2.8043796  7.5000000  2.2607737  2.0274579  1
[4]  2.3437500  3.0130760  2.2607737  7.5000000  6.3787500  1
[5]  3.2400000  3.6200650  2.0274579  6.3787500  7.5000000  1
[6]  1.0000000  1.0000000  1.0000000  1.0000000  1.0000000  0

Worked Example – Ordinary Kriging

The variogram model is a spherical model with nugget = 2.5, sill = 7.5, range = 10.0:

$\gamma(h) = \begin{cases} 0 & h = 0 \\ 2.5 + (7.5 - 2.5)\left(\dfrac{3h}{20} - \dfrac{h^3}{2000}\right) & 0 < h \le 10 \\ 7.5 & \text{otherwise} \end{cases}$

The corresponding covariogram, using $\gamma(h) = \sigma^2 - C(h)$, is

$C(h) = 7.5 - 2.5 - (7.5 - 2.5)\left(\dfrac{3h}{20} - \dfrac{h^3}{2000}\right)$

The vector of covariances from the data points to the unknown point is:

4.061023  5.026350  2.343750  5.026350  5.919616  1.000000

Worked Example – Ordinary Kriging

The vector of weights lambda, along with the Lagrange multiplier as the sixth element:

[1]  0.17289193
[2]  0.26523729
[3]  0.05887157
[4]  0.16986833
[5]  0.33313088
[6] -0.13471033

4.375627

Then multiply the weight for each data point by the attribute value of that point to determine the ordinary kriging estimate: 5.135453

Calculate the variance as the transpose of c times the inverse of C times c, subtracted from the sill. The kriging standard error is the square root of this: 2.266154
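If the printed numbers are internally consistent, solving the augmented system with the covariance matrix and covariance vector shown above should reproduce the weights and Lagrange multiplier; a sketch:

```python
import numpy as np

# Augmented covariance matrix from the slide (row/column 6 is the constraint)
C_plus = np.array([
    [7.5000000, 3.6200650, 0.5001733, 2.3437500, 3.2400000, 1.0],
    [3.6200650, 7.5000000, 2.8043796, 3.0130760, 3.6200650, 1.0],
    [0.5001733, 2.8043796, 7.5000000, 2.2607737, 2.0274579, 1.0],
    [2.3437500, 3.0130760, 2.2607737, 7.5000000, 6.3787500, 1.0],
    [3.2400000, 3.6200650, 2.0274579, 6.3787500, 7.5000000, 1.0],
    [1.0000000, 1.0000000, 1.0000000, 1.0000000, 1.0000000, 0.0],
])
# Covariances from the data points to the unknown point, plus the constraint 1
c_plus = np.array([4.061023, 5.026350, 2.343750, 5.026350, 5.919616, 1.000000])

w_plus = np.linalg.solve(C_plus, c_plus)
weights, lagrange = w_plus[:5], w_plus[5]
# weights should be close to (0.1729, 0.2652, 0.0589, 0.1699, 0.3331)
# the ordinary kriging estimate is then weights @ z for the observed z values
```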

Page 8: Lecture 12


Ordinary versus Simple Kriging

There is no need to estimate a first-order trend; instead, the mean is estimated from nearby data only.

Primarily a local neighborhood estimator.

Estimates are not as sensitive to non-stationarity (though the covariance model may be affected, it is not as strongly affected at short ranges as at long ones).

Universal Kriging

Includes a first order trend component:

$Y(s) = \beta_0 + \beta_1 x_1(s) + \beta_2 x_2(s) + \cdots + \beta_p x_p(s) + \varepsilon(s)$

Forms a prediction for y in one step,

$\hat{Y}(s) = \sum_{i=1}^{n} \omega_i(s)\, Y(s_i)$

As for ordinary kriging, the weights are chosen to minimize the mean squared error

$MSE = E\{|\hat{Y}(s) - Y(s)|^2\}$

subject to the constraint that $\hat{Y}(s)$ is unbiased for Y(s), that is, $E\{\hat{Y}(s)\} = E\{Y(s)\}$ for all $\beta_0, \beta_1, \beta_2, \ldots, \beta_p$.

Universal Kriging

$\hat{Y}(s)$ is unbiased if and only if $\sum_{i=1}^{n} \omega_i = 1$

To obtain the weights that minimize the mean square error subject to this constraint, again use the method of Lagrange multipliers.

Taking the derivatives with respect to ω and v, setting the expressions to zero and rearranging terms, we get the kriging equations (in terms of the variogram):

$\sum_{i=1}^{n} \omega_i\, \gamma(s_i, s_k) + v_0 + \sum_{j=1}^{p} v_j\, x_j(s_k) = \gamma(s_0, s_k), \qquad k = 1, 2, \ldots, n$

$\sum_{i=1}^{n} \omega_i = 1$

$\sum_{i=1}^{n} \omega_i\, x_k(s_i) = x_k(s_0), \qquad k = 1, 2, \ldots, p$

Page 9: Lecture 12


Universal Kriging

In matrix notation the kriging equations are

$\begin{bmatrix}
0 & \gamma(s_1,s_2) & \cdots & \gamma(s_1,s_n) & 1 & x_1(s_1) & \cdots & x_p(s_1) \\
\gamma(s_2,s_1) & 0 & \cdots & \gamma(s_2,s_n) & 1 & x_1(s_2) & \cdots & x_p(s_2) \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots & & \vdots \\
\gamma(s_n,s_1) & \gamma(s_n,s_2) & \cdots & 0 & 1 & x_1(s_n) & \cdots & x_p(s_n) \\
1 & 1 & \cdots & 1 & 0 & 0 & \cdots & 0 \\
x_1(s_1) & x_1(s_2) & \cdots & x_1(s_n) & 0 & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\
x_p(s_1) & x_p(s_2) & \cdots & x_p(s_n) & 0 & 0 & \cdots & 0
\end{bmatrix}
\begin{bmatrix} \omega_1 \\ \omega_2 \\ \vdots \\ \omega_n \\ v_0 \\ v_1 \\ \vdots \\ v_p \end{bmatrix}
=
\begin{bmatrix} \gamma(s_1, s_0) \\ \gamma(s_2, s_0) \\ \vdots \\ \gamma(s_n, s_0) \\ 1 \\ x_1(s_0) \\ \vdots \\ x_p(s_0) \end{bmatrix}$

Universal Kriging

Equivalently, in terms of covariances the system is $C^+ \omega^+(s) = c^+(s)$, with

$C^+ = \begin{bmatrix}
C(s_1,s_1) & \cdots & C(s_1,s_n) & x_1(s_1) & \cdots & x_p(s_1) \\
\vdots & \ddots & \vdots & \vdots & & \vdots \\
C(s_n,s_1) & \cdots & C(s_n,s_n) & x_1(s_n) & \cdots & x_p(s_n) \\
x_1(s_1) & \cdots & x_1(s_n) & 0 & \cdots & 0 \\
\vdots & & \vdots & \vdots & \ddots & \vdots \\
x_p(s_1) & \cdots & x_p(s_n) & 0 & \cdots & 0
\end{bmatrix}, \quad
\omega^+(s) = \begin{bmatrix} \omega_1(s) \\ \vdots \\ \omega_n(s) \\ v_1(s) \\ \vdots \\ v_p(s) \end{bmatrix}, \quad
c^+(s) = \begin{bmatrix} C(s,s_1) \\ \vdots \\ C(s,s_n) \\ x_1(s) \\ \vdots \\ x_p(s) \end{bmatrix}$

The solution is

$\omega^+(s) = (C^+)^{-1} c^+(s)$

The mean square error is

$\sigma_e^2 = \sigma^2 - c^{+T}(s)\,(C^+)^{-1} c^+(s)$

and the prediction is

$\hat{y}(s) = \omega^T(s)\, y$
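A sketch of this covariance-form system for universal kriging, assuming the trend covariates $x_k(s_i)$ are supplied as a design matrix X whose first column is the constant 1 (so the sum-to-one condition is part of the system); names and arguments are illustrative.

```python
import numpy as np

def universal_kriging(coords, X, y, cov_fn, s0, x0):
    """Universal kriging prediction at location s0.

    coords : (n, 2) sample locations, y : (n,) observations,
    X      : (n, p) trend covariates x_k(s_i), first column all ones,
    x0     : (p,)   the same covariates evaluated at s0,
    cov_fn : covariance C(h) as a (vectorised) function of distance h.
    """
    n, p = X.shape
    h = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)

    # Augmented matrix C+ : covariances bordered by the trend covariates
    C_plus = np.zeros((n + p, n + p))
    C_plus[:n, :n] = cov_fn(h)
    C_plus[:n, n:] = X
    C_plus[n:, :n] = X.T

    # Augmented vector c+(s): covariances to s0 bordered by x0
    c_plus = np.concatenate([cov_fn(np.linalg.norm(coords - s0, axis=1)), x0])

    w_plus = np.linalg.solve(C_plus, c_plus)   # omega+(s) = (C+)^-1 c+(s)
    w = w_plus[:n]                             # kriging weights; rest are multipliers
    return w @ y                               # y_hat(s) = omega'(s) y
```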

Universal Kriging

• May make more sense to estimate the trend explicitly

• Need to estimate the trend to derive residuals for variogram modeling in any case, since it is only safe to estimate the variogram model from y when the mean is assumed constant

• Does not make sense to use universal kriging for local neighborhoods

Best to use the generalized least squares approach and remove the trend explicitly.