WP5 – Training session: RGeostats Geostatistics …rgeostats.free.fr/doc/Files/INTAROS WP5 -...

WP5 – Training session: RGeostats Geostatistics Course & Exercices INTAROS – General Assembly Haus der Wissenschaft, Sandstrasse 4/5 28195 Bremen, Germany 9.00-16.00 January 11th 2019 RENARD Didier, ARMINES (lead) ORS Fabien, ARMINES (co-lead)

Outline Thursday 10th

1. Creating iAOS Processing Services 2. Geostatistics and RGeostats 3. Ellip Notebooks using RGeostats 4. Ellip Worflow using RGeostats 5. IMR Case Study - RGeostats in Action!

Friday 11th

Geostatistics Course & Exercices

2

Details level

Introduction

3

IMR dataset global overview

• 7 vessels • from 1995 to 2016 • 3 variables measured:

• Temperature • Salinity • Conductivity

• 63 500 positions {long, lat} • 63 500 vertical profiles (in depth) • A few million samples • 84 NetCDF files (~60 Mb each)

4

Getting Ready for Exercices! • Click on the Home tab of your

Jupyter Notebooks (in Chrome)

• Click on the geostats_course.ipynb

5

Getting Ready for Exercices!

• Execute all code cells in the Introduction section: => Hitting Shift + Enter

• Definition of R functions • Loading Data (slow) • Setting global environment

6

From Data to Estimation (Kriging)

7

Data – Measurements

Space: 1-D, 2-D, 3-D or more

Data:

• Number : few to several thousands

• Locations: isolated, along lines, regular

• Time dependency

8

Statistics

• Mean, variance (or standard deviation)

• Histograms: extremes, mode, median, quantiles

9

0.05

0.10

0.15

0.20

0 000

0.025

0.050

0.075

0.100

0 00

0.01

0.02

0.03

0.04

0.05

0.06

0.07

0.08

Bimodal Symmetric Skewed

Statistics • Mean and variance of each variable

• Correlation, covariance, linear regression

10

0.

5.

10.

15.

baZZ += 12 '' 21 bZaZ +=

10

10

20

20

30

30

40

40

50

50

60

60

70

70

0 0

5 5

10 10

15 15

rho=0.453

Z1

Z2

0

0

5

5

10

10

15

15

10 10

20 20

30 30

40 40

50 50

60 60

70 70 rho=0.453

Z2

Z1

Back to Jupyter Notebook!

• We focus on the cells from Basics statistics section: • Global Database Statistics • Display Temperature Variable • Temperature Histogram • Temperature vs Salinity Correlation • Sample Filtering • Statistics per Blocks • Mean and Variances by Years

11

Exercise

• Calculate the experimental mean and variance for Z1

0 1 2 0 2 1 0 1 2 1

• Same calculation for Z2

1 1 1 0 2 1 0 0 2 2

12

Statistics Punctual statistics are not sufficient:

13

0 00

0.01

0.02

0.03

0.04

0.05

0.06

0.07

0.08

0.09

Two images with the same histogram

Hypotheses • Stationary RF : invariance under translation of the spatial law.

• Order-2 stationary RF: the first two moments exist and are invariant under translation.

• Intrinsic RF (IRF0): the increments are order-2 stationary.

14

Stationary

Intrinsic

Hypotheses • An stationary RF is also intrinsic, but an intrinsic RF is not necessarily

stationary.

• No attraction by the mean... but no systematic behavior (as the depth of the bottom of the sea which increases regularly from the beach)

• The purely intrinsic model stands between the stationary and the non-stationary cases. The choice of the degree of non-stationarity depends upon the field of observation.

15

Experimental Variogram • Stationary hypothesis: covariance

• Intrinsic hypothesis

16

[ ])()()( xZhxZEhCov ×+=

[ ] [ ]

( ) ( )[ ]∑=

−+=

−+=−+=

)(

1

2

2

)(21)(

)()(21)()(

21)(

hN

iii xZhxZ

hNh

xZhxZExZhxZVarh

γ

γ

Work with variogram rather than covariance (more general framework)

Exercise: Variogram on regular 1-D sampling

Variable Z defined on a regular grid in 1-D

• Calculate the experimental variogram for the lags: 5m, 10m and 15m.

• Evaluate the experimental variogram of the new variable:

Y(x) = Z(x) + 3.2

17

+ + + + + + + + + + + + + 8 6 4 3 6 5 7 2 8 9 5 6 3

5m

Solution: Variogram on regular 1-D sampling

Consider the distance 5m:



18

+ + + + + + + + + + + + + 8 6 4 3 6 5 7 2 8 9 5 6 3

5m

( ) ( ) ( ) ( )[ ] 625.436...34466821)5( 2222 =−++−+−+−=mγ

( ) ( ) ( ) ( )[ ] 227.535...64364821)10( 2222 =−++−+−+−=mγ

( ) ( ) ( ) ( )[ ] 000.639...54663821)15( 2222 =−++−+−+−=mγ

Variogram Cloud Variable Z defined on a regular grid in 1-D

19

0

0

5

5

10

10

15

15

0 0

5 5

10 10

15 15

20 20

25 25

Distance(x1,x2)

½ [(Z(x1)-Z(x2))2]

Variogram on regular 2-D grid Variable Z defined on a regular grid in 2-D (square mesh = a)

20

1 0 2 -1 1

-1 -2 1 2 0

-2 0 2 1 -1

0 -1 1 0 2

1 0 0 -1 1

a

a

Variogram on 2-D irregular data Calculation of the experimental omni-directional variogram

21

0.

0.

10.

10.

20.

20.

X (km)

X (km)

0. 0.

10. 10.

20. 20.

Y (km)

Y (km)

N

Variogram on 2-D irregular data Establish the Variogram Cloud

22

Distance(x1,x2)

½ [(Z(x1)-Z(x2))2]

Tolerance

Experimental Variogram

23

Lag Number of lags

0

0

10

10

20

20

30

30

40

40

50

50

60

60

0.0

0.5

1.0

Data Variance

Experimental Variogram

Histogram of pairs

Directional Variograms Calculation along two directions: E-W and N-S – Looking for Anisotropy

24

D1D2

0.

0.

50.

50.

100.

100.

150.

150.

(Meter)

(Meter)

0.00 0.00

0.25 0.25

0.50 0.50

0.75 0.75

1.00 1.00

Distance V

ario

gram

N

Directional Variograms

25

Distance classes

Tolerance on Distance

Angular classes

Angular Tolerance

Examples of Variograms

26

From “Geostatistics: modeling spatial uncertainty” J.P. Chilès, P. Delfiner (Wiley 1999)

Hints for Variogram calculation

• Analyze the variogram cloud in order to detect outliers. Pa y attention to presence of different areas or faults, different measurement tools.

• Choose the lag and the tolerance on distance. Check the homogeneity of number of pairs for all lags.

• If possible, calculate variograms in several directions (at least 4 in 2D, 5 in 3D) in order to look for possible anisotropy

27

Fitting a Model Procedure:

• Choose a single variogram Model for all directions and all distances

• Use a valid type of function: authorized Model

• Needed further (for Estimation) as it ensures (conditional) positivity of the variance of any linear combination

• As close as possible to the Experimental Variogram: • Behavior near the origin • Behavior at large distance • Specific features: presence of anisotropy, presence of trend, …

28

Behavior near the origin

• The behavior of the variogram near the origin is directly linked to the continuity of the variable

29

0 00

0.25

0.50

0.75

1.00

1.25

γ(h)

h

0 00

0.25

0.50

0.75

1.00

1.25

γ(h)≈h |h|→0

Continuous variable

0 00

0.25

0.50

0.75

1.00

1.25

γ(h)≈h2

|h|→0

Differentiable variable

0 00

0.25

0.50

0.75

1.00

1.25

Discontinuous variable

Nugget effect

Behavior at large distance

Check for strict stationarity

30

0 00

0.25

0.50

0.75

1.00

1.25

γ(h)

h

Bounded variogram Unbounded variogram

0 00

0.25

0.50

0.75

1.00

1.25

0.5

1.0

1.5

2.0

2.5

Increase rate < h2

Model characteristics

31

0 00

0.25

0.50

0.75

1.00

1.25

γ(h)

h Nugget effect

Sill

Range

Slope

Variogram vs. Covariance

32

0 00

0.25

0.50

0.75

1.00

1.25

γ(h)

h

Sill

Range

0 00

0.25

0.50

0.75

1.00

1.25

C(h)

h

0.50

0.75

1.00

Bounded variogram can be turned into covariance (stationary case)

Usual Models Spherical

33

0 00

0.25

0.50

0.75

1.00

1.25

ahc

ahah

ahch

>=

≤

−= 3

332

)(γ

Usual Models Exponential

34

−=

−ah

ch exp1)(γ

0 00

0.25

0.50

0.75

1.00

1.25

Usual Models Gaussian

35

−=

−

2

exp1)( ah

chγ

0 00

0.25

0.50

0.75

1.00

1.25

Usual Models Cardinal Sine

36

( )ah

ahch sin)( =γ

0.25

0.50

0.75

1.00

1.25

Usual Models Linear

37

ah

ch =)(γ

0.5

1.0

1.5

Usual Models Nugget Effect

38

000)(

>===

hchhγ

Nested Variograms: Definition Nesting variograms = Adding values for each distance

39

0 00

0.25

0.50

0.75

1.00

1.25

0 00

0.25

0.50

0.75

1.00

1.25

Short range 1 ( )hγ

0 00

0.25

0.50

0.75

1.00

1.25

Long range 2 ( )hγ

+ =

)()()( 21 hhh γγγ +=

Nested Variograms: Example Cubic (short range) + Spherical (long range)

40

0 00

0.25

0.50

0.75

1.00

Anisotropy: Definition Two types of Anisotropies:

41

γ(h)

0 00

0.25

0.50

0.75

1.00

1.25

h

sill 2

sill 1

Zonal anisotropy

sill

γ(h)

0 00

0.25

0.50

0.75

1.00

h

Geometrical anisotropy

Anisotropy: Examples

42

5000

10000

15000

Geometrical Anisotropy

5000

10000

15000

Zonal Anisotropy

0 00

0.25

0.50

0.75

1.00

0 00

0.25

0.50

0.75

1.00

1.25

Variances

Calculate the variance of a linear combination:

• Using the Covariance

• Using the Variogram

43

∑i

iiZλ

)(hC

( ) 0≥==

∑∑∑∑∑i j

ijjii j

ijjii

ii ChCZVar λλλλλ

)(hγ

( ) 0≥−=−=

∑∑∑∑∑i j

ijjii j

ijjii

ii hZVar γλλγλλλ

Why using authorized variogram Calculate the Variance of Z1 -1/2 Z2 - 1/2 Z3

44

0.2

0.2 0.4 0.6 0.8 1.0

γ(h)

h

1 Z1

Z2 Z3

0.8

1.0

0.8

Samples

Why using authorized variogram Calculate the Variance of

• Check the sum of weights:

• Variance:

45

021

21)1(321 =

−+

−+=++ λλλ

∑=−−i

iiZZZZ λ321 21

21

( )( ) ( )

−×

−×+

−××+

−××−=

+++++−=

−= ∑∑

21

212

2112

2112

222 233213311221332322

2211

21 γλλγλλγλλγλγλγλ

γλλi j

ijjiVar

1.0−=Var

Hints for Modeling

• The Experimental variogram must be fitted by the Model: • For any distance • For any direction

• The Model (if using authorized basic structures) ensures the positivity of the variance of any linear combination of the data:

• Compulsory for the next step: Estimation by Kriging

46


• We focus on the cells from Variography section: • Temperature 2-D Omni-directional Experimental Variogram • 4 Directions Experimental Variograms • Variogram Model Fitting

47

Estimation Consider an exhaustive data set and extract 100 samples randomly located

48

5000.

10000.

15000.

0.

10000.

20000.

Exhaustive Data Set Sampled Data Set

Section

Estimation Estimation at target site as a linear combination of the sample values

49

samples

target ∑=

iiiZZ λ*

Estimation Moving Average

50

1 2

3

4

5

20% 20%

20%

20%

20%

∑=i

iZZ5

*

5000.

10000.

15000.

Estimation Influence Polygon

51

1 2

3

4

5

100% 0%

0%

0%

0%

1* ZZ =

5000.

10000.

15000.

2. 1. 0. 1. 2. 3.

Estimation Inverse Distance

52

1 2

3

4

5

37% 21%

20%

7%

15%

∑=i i

ii

ddZZ 2

2*

1

5000.

10000.

15000.

Kriging: Principle • Kriging produces the estimation Z* of the variable Z on a target site (point,

block, polygon) as a linear combination of the sample values

• The real unknown value is denoted

• The estimation error: • is a linear combination of the data • authorized • with a zero expectation (non bias) • with minimum variance (optimality)

• The constraints are strictly ordered

53

∑+=i

iiZZ λλ0*0

0Z

00*00 λλε ∑ −−=−=

iiiZZZZ

Simple Kriging • Stationary hypothesis – Simple Kriging - Known mean:

• Non bias:

• Optimality:

• Results:

54

mZE =)(

( )

−=⇒=−×−=

−−= ∑∑∑

ii

ii

iii mmmZZEE λλλλλλε 1 0 0000

( ) minimum 2 00000 ∑∑∑∑ +−=

−−=

i jijji

iii

iii CCCZZVarVar λλλλλε

( ) iCCVari

jijj

i

∀=−=∂

∂⇒ ∑ 00λ

λε

( )

−=

−+=

∑∑ ∑

iii

i iiii

CCVar

mZZ

000

*0 1

λε

λλ

Simple Kriging In algebraic terms

• Simple Kriging System:

• Estimation:

• Variance of Estimation Error:

55

=

×

0

101

1

111

nnnnn

n

C

C

CC

CC

λ

λ

[ ]

−+

•= ∑

ii

n

n mZ

ZZ λλλ 1

1

1*0

( ) [ ]

•−=

0

10

100

n

n

C

CCVar λλε

Simple Kriging • Correlation between estimation and estimation error:

• Re-writing the definition of the estimation error:

• In variances:

• Finally:

1. Kriging gives more accurate results than Statistics 2. Kriging is smoothing reality

56

( ) ( ) ( ) ( ) 0,,, 0*00

*00

*0

*0 =−=−=−= ∑ ∑∑

i iii

jijji CCZZCovZVarZZCovZCov λλλεε

*00

*00 ZZZZ +=⇔−= εε

( ) ( ) ( ) ( ) ( ) ( )εεε VarZVarZCovVarZVarZVar +=++= *0

*0

*00 ,2

( ) ( )( ) ( )

≤≤

0*0

0

ZVarZVarZVarVar ε

Simple Kriging: smoothing Spherical

57

0 00

0.25

0.50

0.75

1.00

1.25

Reality Kriging Sampling St. Dev.

Simple Kriging: smoothing Exponential

58


0 00

0.25

0.50

0.75

1.00

1.25

Simple Kriging: smoothing Gaussian

59


0 00

0.25

0.50

0.75

1.00

1.25

Simple Kriging: smoothing Nugget Effect

60


Simple Kriging: Exercise

• Spherical Model with range 1.25m and sill 2

• 3 Data and Target on a square pattern (mesh = 1m)

• Known mean = 2

61

Z1=3

Z2=4

Z3=1

Target

Simple Kriging: Solution • Simple Kriging System

• Results

62

=

×

30

20

10

3

2

1

333231

232221

131211

CCC

CCCCCCCCC

λλλ

0)2(

112.0)1(2)0(

=

==

C

CC

∑ =

==−=

106.0056.0

006.0

32

1

iλ

λλλ

[ ] [ ]

( ) [ ] [ ]

=•Λ−=

=

−+•Λ=

⇒ ∑41.1

050.21

000

*0

it

ii

t

CCVar

mZZ

ε

λ

=

×

112.0112.0

.0

20112.002112.0112.0112.02

3

2

1

λλλ

Block Kriging Estimate the average value of Z over a block, starting from sample values:

• Simple Block Kriging is (almost) identical to Simple Point Kriging, replacing Point index (◦) by Block index (v):

• Requires calculation of block-data and block-block covariances: discretization

63

∑+=i

iiv ZZ λλ0*

( )

−=

−+=

⇒

∀=

∑∑∑

∑

iii

ii

iiio

ij

ijj

CCVar

mZZ

iCC

0000

*

0

1

λε

λλ

λ

Point Block ( )

−=

−+=

⇒

∀=

∑∑∑

∑

iviivvv

ii

iiiv

vij

ijj

CCVar

mZZ

iCC

λε

λλ

λ

1

*

ixviC vvC

Block Kriging: Example

64

∫=V

v dxxZV

Z )(1 **

Data

Point Estimation

Block Estimation

Simple Kriging: Extensions Kriging can be extended to any linear function of the data: e.g. gradient

65

Data Estimation on a fine grid

Estimated Gradients

Model

Simple Kriging: Filtering

66

Data Kriging

Kriging Filtering

Nugget Effect

Ordinary Kriging • Intrinsic hypothesis – Ordinary Kriging - Unknown mean:

• Non bias:

• Optimality (under constraints):

• Result

67

mZE =)(

( )

=

=⇒∀=−×−=

−−=

∑∑∑0

1 0

0

ii

000λ

λλλλλε mmmZZEE

ii

iii

minimum 1200

−+

−−=Φ ∑∑

ii

iiiZZVar λµλλ

=−=∂Φ∂

∀=++−=∂Φ∂

⇒∑

∑

01

00

ii

ij

ijji

i

λµ

µγγλλ

( )

−=

=

∑∑

µγλε

λ

iii

iii

Var

ZZ

0

*0

Ordinary Kriging In algebraic terms

• Simple Kriging System:

• Estimation:

• Variance of Estimation Error:

68

−

−

=

×

−−

−−

10111

1

0

101

1

111

nnnnn

n

γ

γ

µλ

λ

γγ

γγ

[ ]

•=

0

1

1*0

nn Z

Z

Z

µλλ

( ) [ ]

−

−

•−=

10

10

1n

nVarγ

γ

µλλε

Ordinary Kriging: Exercise

• Spherical Model with range 1.25m and sill 2

• 3 Data and Target on a square pattern (mesh = 1m)

69

Z1=3

Z2=4

Z3=1

Target

Ordinary Kriging: Solution • Simple Kriging System

• Results

70

−−−

=

×

−−−−−−−−−

10111111

30

20

10

3

2

1

333231

232221

131211

γγγ

µλλλ

γγγγλγγγγ

2)2(

888.1)1(2)0(

=

==

γ

γγ

( )

=

•

Λ−=

=

•

Λ=

⇒

6.11

640.20

000

*0

it

t

CCVar

ZZ

OK

µε

µ

6.0360.0

280.0

32

1

−===

=

µλλ

λ

−−−

=

×

−−−−

−−

888.1888.12

02888.120888.1888.1888.10

3

2

1

λλλ

[ ] [ ]

( ) [ ] [ ]

=•Λ−=

=

−+•Λ=

⇒ ∑41.1

050.21

000

*0

it

ii

t

CCVar

mZZSK

ε

λ

Kriging weights Simple Kriging of the central Point

71

L

0 %

0 %

0 % 0 %

Nugget 2 1.σ =

L

31.25%

31.25%

0 % 0 %

Sphe(2L) 2 0.80σ =

L

44.91%

44.91%

2.88% 2.88%

Gaus*(2L) 2 0.57σ =

Kriging weights Ordinary Kriging of the central Point

72

L

25%

25%

25% 25%

Nugget 2 1.25σ =

L

40.6%

40.6%

9.4% 9.4%

Sphe(2L) 2 0.84σ =

L

45.99%

45.99%

4.01% 4.01%

Gaus*(2L) 2 0.57σ =

Kriging weights Simple Kriging of the central Block

73

L

0 %

0 %

0 % 0 %

Nugget

L

31.05%

31.05%

0.35 % 0.35 %

Sphe(2L) 2

0.81

0.61vvC

σ

=

=

L

44.53%

44.53%

3.20% 3.20%

Gaus*(2L) 2

0.94

0.52vvC

σ

=

=

Kriging weights Ordinary Kriging of the central Block

74

L

25 %

25 %

25 % 25 %

Nugget

L

40.35%

40.35%

9.65 % 9.65 %

Sphe(2L) 2

0.81

0.65vvC

σ

=

=

L

45.64%

45.64%

4.36% 4.36%

Gaus*(2L) 2

0.94

0.521vvC

σ

=

=

Kriging weights: Declustering Ordinary Kriging – Model with large range

75

2 0.48σ =37.0% 37.0%

26.0%

33.3% 33.3%

33.3%

2 0.45σ =

50.0%

50.0%

2 0.537σ =

25.7% 25.7%

48.7%

2 0.526σ =

Data

Target +

Kriging Properties • The Kriging system is regular if:

• the model is authorized • there is no redundant information (ex. duplicate points) • the solution is unique

• Kriging is an exact interpolator: at a sample, the punctual estimate is equal to data value and the estimation variance is zero

• Data values are not used for the determination of the Kriging weights, nor for the variance of estimation error

• Kriging weights are not modified when modifying the sill of the Model

• Variance of estimation error is proportional to the sill of the Model

76

Neighborhood The neighborhood defines the subset of samples used in kriging of a target site:

• unique: all samples • moving: only closest samples to the site (used if data set too large ~400)

Display of kriging weights for target site (square)

77

Unique Moving Radius=0.33 2 pts / quadrant

Neighborhood • Kriging weight relative to

blue sample when estimating each grid node

• Moving neighborhood: radius = 0.33 maximum= 2 pts per quadrant

78

Unique

Moving


• We focus on the cell from Interpolation section: • Kriging Temperature for Year 2008-T2 at 25m Depth

• Display Estimation Results • Display Associated Standard Deviation • Change the Neighborhood parameters

79

Cross-validation For each datum:

• Suppress the data • Perform Kriging at that data site:

Evaluate: • The error between data value and its estimation

• Normalized by the st. deviation of estimation error

Calculate statistics (mean and variance) on these errors

80

Mean Variance

Error 0 <<D2

Normalized error 0 ≈1

0iZ

∑≠

=0

0

*

iiiii ZZ λ

00

*ii ZZ −

( )0

00

*

i

ii

VarZZε

−

Cross-validation

81

Base Map of normalized errors

Histogram of normalized errors Correlation Z | Z*

Outliers ( )5.2

*

>−εVarZZ

( )5.2

*

−<−εVarZZ= +


• We focus on the cell from Cross-Validation section: • Kriging Temperature for Year 2008-T2 at 25m Depth

• Display Cross-Validation Results

82

End of presentation

83

Thank you for attention!

WP5 – Training session: RGeostats Geostatistics …rgeostats.free.fr/doc/Files/INTAROS WP5 -...

Documents

Transcript of WP5 – Training session: RGeostats Geostatistics …rgeostats.free.fr/doc/Files/INTAROS WP5 -...