WP5 – Training session: RGeostats Geostatistics …rgeostats.free.fr/doc/Files/INTAROS WP5 -...
Transcript of WP5 – Training session: RGeostats Geostatistics …rgeostats.free.fr/doc/Files/INTAROS WP5 -...
WP5 – Training session: RGeostats Geostatistics Course & Exercices INTAROS – General Assembly Haus der Wissenschaft, Sandstrasse 4/5 28195 Bremen, Germany 9.00-16.00 January 11th 2019 RENARD Didier, ARMINES (lead) ORS Fabien, ARMINES (co-lead)
Outline Thursday 10th
1. Creating iAOS Processing Services 2. Geostatistics and RGeostats 3. Ellip Notebooks using RGeostats 4. Ellip Worflow using RGeostats 5. IMR Case Study - RGeostats in Action!
Friday 11th
Geostatistics Course & Exercices
2
Details level
Introduction
3
IMR dataset global overview
• 7 vessels • from 1995 to 2016 • 3 variables measured:
• Temperature • Salinity • Conductivity
• 63 500 positions {long, lat} • 63 500 vertical profiles (in depth) • A few million samples • 84 NetCDF files (~60 Mb each)
4
Getting Ready for Exercices! • Click on the Home tab of your
Jupyter Notebooks (in Chrome)
• Click on the geostats_course.ipynb
5
Getting Ready for Exercices!
• Execute all code cells in the Introduction section: => Hitting Shift + Enter
• Definition of R functions • Loading Data (slow) • Setting global environment
6
From Data to Estimation (Kriging)
7
Data – Measurements
Space: 1-D, 2-D, 3-D or more
Data:
• Number : few to several thousands
• Locations: isolated, along lines, regular
• Time dependency
8
Statistics
• Mean, variance (or standard deviation)
• Histograms: extremes, mode, median, quantiles
9
0.05
0.10
0.15
0.20
0 000
0.025
0.050
0.075
0.100
0 00
0.01
0.02
0.03
0.04
0.05
0.06
0.07
0.08
Bimodal Symmetric Skewed
Statistics • Mean and variance of each variable
• Correlation, covariance, linear regression
10
0.
5.
10.
15.
baZZ += 12 '' 21 bZaZ +=
10
10
20
20
30
30
40
40
50
50
60
60
70
70
0 0
5 5
10 10
15 15
rho=0.453
Z1
Z2
0
0
5
5
10
10
15
15
10 10
20 20
30 30
40 40
50 50
60 60
70 70 rho=0.453
Z2
Z1
Back to Jupyter Notebook!
• We focus on the cells from Basics statistics section: • Global Database Statistics • Display Temperature Variable • Temperature Histogram • Temperature vs Salinity Correlation • Sample Filtering • Statistics per Blocks • Mean and Variances by Years
11
Exercise
• Calculate the experimental mean and variance for Z1
0 1 2 0 2 1 0 1 2 1
• Same calculation for Z2
1 1 1 0 2 1 0 0 2 2
12
Statistics Punctual statistics are not sufficient:
13
0 00
0.01
0.02
0.03
0.04
0.05
0.06
0.07
0.08
0.09
Two images with the same histogram
Hypotheses • Stationary RF : invariance under translation of the spatial law.
• Order-2 stationary RF: the first two moments exist and are invariant under translation.
• Intrinsic RF (IRF0): the increments are order-2 stationary.
14
Stationary
Intrinsic
Hypotheses • An stationary RF is also intrinsic, but an intrinsic RF is not necessarily
stationary.
• No attraction by the mean... but no systematic behavior (as the depth of the bottom of the sea which increases regularly from the beach)
• The purely intrinsic model stands between the stationary and the non-stationary cases. The choice of the degree of non-stationarity depends upon the field of observation.
15
Experimental Variogram • Stationary hypothesis: covariance
• Intrinsic hypothesis
16
[ ])()()( xZhxZEhCov ×+=
[ ] [ ]
( ) ( )[ ]∑=
−+=
−+=−+=
)(
1
2
2
)(21)(
)()(21)()(
21)(
hN
iii xZhxZ
hNh
xZhxZExZhxZVarh
γ
γ
Work with variogram rather than covariance (more general framework)
Exercise: Variogram on regular 1-D sampling
Variable Z defined on a regular grid in 1-D
• Calculate the experimental variogram for the lags: 5m, 10m and 15m.
• Evaluate the experimental variogram of the new variable:
Y(x) = Z(x) + 3.2
17
+ + + + + + + + + + + + + 8 6 4 3 6 5 7 2 8 9 5 6 3
5m
Solution: Variogram on regular 1-D sampling
Consider the distance 5m:
Consider the distance 10m:
Consider the distance 15m:
18
+ + + + + + + + + + + + + 8 6 4 3 6 5 7 2 8 9 5 6 3
5m
( ) ( ) ( ) ( )[ ] 625.436...34466821)5( 2222 =−++−+−+−=mγ
( ) ( ) ( ) ( )[ ] 227.535...64364821)10( 2222 =−++−+−+−=mγ
( ) ( ) ( ) ( )[ ] 000.639...54663821)15( 2222 =−++−+−+−=mγ
Variogram Cloud Variable Z defined on a regular grid in 1-D
19
0
0
5
5
10
10
15
15
0 0
5 5
10 10
15 15
20 20
25 25
Distance(x1,x2)
½ [(Z(x1)-Z(x2))2]
Variogram on regular 2-D grid Variable Z defined on a regular grid in 2-D (square mesh = a)
20
1 0 2 -1 1
-1 -2 1 2 0
-2 0 2 1 -1
0 -1 1 0 2
1 0 0 -1 1
a
a
Variogram on 2-D irregular data Calculation of the experimental omni-directional variogram
21
0.
0.
10.
10.
20.
20.
X (km)
X (km)
0. 0.
10. 10.
20. 20.
Y (km)
Y (km)
N
Variogram on 2-D irregular data Establish the Variogram Cloud
22
Distance(x1,x2)
½ [(Z(x1)-Z(x2))2]
Tolerance
Experimental Variogram
23
Lag Number of lags
0
0
10
10
20
20
30
30
40
40
50
50
60
60
0.0
0.5
1.0
Data Variance
Experimental Variogram
Histogram of pairs
Directional Variograms Calculation along two directions: E-W and N-S – Looking for Anisotropy
24
D1D2
0.
0.
50.
50.
100.
100.
150.
150.
(Meter)
(Meter)
0.00 0.00
0.25 0.25
0.50 0.50
0.75 0.75
1.00 1.00
Distance V
ario
gram
N
Directional Variograms
25
Distance classes
Tolerance on Distance
Angular classes
Angular Tolerance
Examples of Variograms
26
From “Geostatistics: modeling spatial uncertainty” J.P. Chilès, P. Delfiner (Wiley 1999)
Hints for Variogram calculation
• Analyze the variogram cloud in order to detect outliers. Pa y attention to presence of different areas or faults, different measurement tools.
• Choose the lag and the tolerance on distance. Check the homogeneity of number of pairs for all lags.
• If possible, calculate variograms in several directions (at least 4 in 2D, 5 in 3D) in order to look for possible anisotropy
27
Fitting a Model Procedure:
• Choose a single variogram Model for all directions and all distances
• Use a valid type of function: authorized Model
• Needed further (for Estimation) as it ensures (conditional) positivity of the variance of any linear combination
• As close as possible to the Experimental Variogram: • Behavior near the origin • Behavior at large distance • Specific features: presence of anisotropy, presence of trend, …
28
Behavior near the origin
• The behavior of the variogram near the origin is directly linked to the continuity of the variable
29
0 00
0.25
0.50
0.75
1.00
1.25
γ(h)
h
0 00
0.25
0.50
0.75
1.00
1.25
γ(h)≈h |h|→0
Continuous variable
0 00
0.25
0.50
0.75
1.00
1.25
γ(h)≈h2
|h|→0
Differentiable variable
0 00
0.25
0.50
0.75
1.00
1.25
Discontinuous variable
Nugget effect
Behavior at large distance
Check for strict stationarity
30
0 00
0.25
0.50
0.75
1.00
1.25
γ(h)
h
Bounded variogram Unbounded variogram
0 00
0.25
0.50
0.75
1.00
1.25
0.5
1.0
1.5
2.0
2.5
Increase rate < h2
Model characteristics
31
0 00
0.25
0.50
0.75
1.00
1.25
γ(h)
h Nugget effect
Sill
Range
Slope
Variogram vs. Covariance
32
0 00
0.25
0.50
0.75
1.00
1.25
γ(h)
h
Sill
Range
0 00
0.25
0.50
0.75
1.00
1.25
C(h)
h
0.50
0.75
1.00
Bounded variogram can be turned into covariance (stationary case)
Usual Models Spherical
33
0 00
0.25
0.50
0.75
1.00
1.25
ahc
ahah
ahch
>=
≤
−= 3
332
)(γ
Usual Models Exponential
34
−=
−ah
ch exp1)(γ
0 00
0.25
0.50
0.75
1.00
1.25
Usual Models Gaussian
35
−=
−
2
exp1)( ah
chγ
0 00
0.25
0.50
0.75
1.00
1.25
Usual Models Cardinal Sine
36
( )ah
ahch sin)( =γ
0.25
0.50
0.75
1.00
1.25
Usual Models Linear
37
ah
ch =)(γ
0.5
1.0
1.5
Usual Models Nugget Effect
38
000)(
>===
hchhγ
Nested Variograms: Definition Nesting variograms = Adding values for each distance
39
0 00
0.25
0.50
0.75
1.00
1.25
0 00
0.25
0.50
0.75
1.00
1.25
Short range 1 ( )hγ
0 00
0.25
0.50
0.75
1.00
1.25
Long range 2 ( )hγ
+ =
)()()( 21 hhh γγγ +=
Nested Variograms: Example Cubic (short range) + Spherical (long range)
40
0 00
0.25
0.50
0.75
1.00
Anisotropy: Definition Two types of Anisotropies:
41
γ(h)
0 00
0.25
0.50
0.75
1.00
1.25
h
sill 2
sill 1
Zonal anisotropy
sill
γ(h)
0 00
0.25
0.50
0.75
1.00
h
Geometrical anisotropy
Anisotropy: Examples
42
5000
10000
15000
Geometrical Anisotropy
5000
10000
15000
Zonal Anisotropy
0 00
0.25
0.50
0.75
1.00
0 00
0.25
0.50
0.75
1.00
1.25
Variances
Calculate the variance of a linear combination:
• Using the Covariance
• Using the Variogram
43
∑i
iiZλ
)(hC
( ) 0≥==
∑∑∑∑∑i j
ijjii j
ijjii
ii ChCZVar λλλλλ
)(hγ
( ) 0≥−=−=
∑∑∑∑∑i j
ijjii j
ijjii
ii hZVar γλλγλλλ
Why using authorized variogram Calculate the Variance of Z1 -1/2 Z2 - 1/2 Z3
44
0.2
0.2 0.4 0.6 0.8 1.0
γ(h)
h
1 Z1
Z2 Z3
0.8
1.0
0.8
Samples
Why using authorized variogram Calculate the Variance of
• Check the sum of weights:
• Variance:
45
021
21)1(321 =
−+
−+=++ λλλ
∑=−−i
iiZZZZ λ321 21
21
( )( ) ( )
−×
−×+
−××+
−××−=
+++++−=
−= ∑∑
21
212
2112
2112
222 233213311221332322
2211
21 γλλγλλγλλγλγλγλ
γλλi j
ijjiVar
1.0−=Var
Hints for Modeling
• The Experimental variogram must be fitted by the Model: • For any distance • For any direction
• The Model (if using authorized basic structures) ensures the positivity of the variance of any linear combination of the data:
• Compulsory for the next step: Estimation by Kriging
46
Back to Jupyter Notebook!
• We focus on the cells from Variography section: • Temperature 2-D Omni-directional Experimental Variogram • 4 Directions Experimental Variograms • Variogram Model Fitting
47
Estimation Consider an exhaustive data set and extract 100 samples randomly located
48
5000.
10000.
15000.
0.
10000.
20000.
Exhaustive Data Set Sampled Data Set
Section
Estimation Estimation at target site as a linear combination of the sample values
49
samples
target ∑=
iiiZZ λ*
Estimation Moving Average
50
1 2
3
4
5
20% 20%
20%
20%
20%
∑=i
iZZ5
*
5000.
10000.
15000.
Estimation Influence Polygon
51
1 2
3
4
5
100% 0%
0%
0%
0%
1* ZZ =
5000.
10000.
15000.
2. 1. 0. 1. 2. 3.
Estimation Inverse Distance
52
1 2
3
4
5
37% 21%
20%
7%
15%
∑=i i
ii
ddZZ 2
2*
1
5000.
10000.
15000.
Kriging: Principle • Kriging produces the estimation Z* of the variable Z on a target site (point,
block, polygon) as a linear combination of the sample values
• The real unknown value is denoted
• The estimation error: • is a linear combination of the data • authorized • with a zero expectation (non bias) • with minimum variance (optimality)
• The constraints are strictly ordered
53
∑+=i
iiZZ λλ0*0
0Z
00*00 λλε ∑ −−=−=
iiiZZZZ
Simple Kriging • Stationary hypothesis – Simple Kriging - Known mean:
• Non bias:
• Optimality:
• Results:
54
mZE =)(
( )
−=⇒=−×−=
−−= ∑∑∑
ii
ii
iii mmmZZEE λλλλλλε 1 0 0000
( ) minimum 2 00000 ∑∑∑∑ +−=
−−=
i jijji
iii
iii CCCZZVarVar λλλλλε
( ) iCCVari
jijj
i
∀=−=∂
∂⇒ ∑ 00λ
λε
( )
−=
−+=
∑∑ ∑
iii
i iiii
CCVar
mZZ
000
*0 1
λε
λλ
Simple Kriging In algebraic terms
• Simple Kriging System:
• Estimation:
• Variance of Estimation Error:
55
=
×
0
101
1
111
nnnnn
n
C
C
CC
CC
λ
λ
[ ]
−+
•= ∑
ii
n
n mZ
ZZ λλλ 1
1
1*0
( ) [ ]
•−=
0
10
100
n
n
C
CCVar λλε
Simple Kriging • Correlation between estimation and estimation error:
• Re-writing the definition of the estimation error:
• In variances:
• Finally:
1. Kriging gives more accurate results than Statistics 2. Kriging is smoothing reality
56
( ) ( ) ( ) ( ) 0,,, 0*00
*00
*0
*0 =−=−=−= ∑ ∑∑
i iii
jijji CCZZCovZVarZZCovZCov λλλεε
*00
*00 ZZZZ +=⇔−= εε
( ) ( ) ( ) ( ) ( ) ( )εεε VarZVarZCovVarZVarZVar +=++= *0
*0
*00 ,2
( ) ( )( ) ( )
≤≤
0*0
0
ZVarZVarZVarVar ε
Simple Kriging: smoothing Spherical
57
0 00
0.25
0.50
0.75
1.00
1.25
Reality Kriging Sampling St. Dev.
Simple Kriging: smoothing Exponential
58
Reality Kriging Sampling St. Dev.
0 00
0.25
0.50
0.75
1.00
1.25
Simple Kriging: smoothing Gaussian
59
Reality Kriging Sampling St. Dev.
0 00
0.25
0.50
0.75
1.00
1.25
Simple Kriging: smoothing Nugget Effect
60
Reality Kriging Sampling St. Dev.
Simple Kriging: Exercise
• Spherical Model with range 1.25m and sill 2
• 3 Data and Target on a square pattern (mesh = 1m)
• Known mean = 2
61
Z1=3
Z2=4
Z3=1
Target
Simple Kriging: Solution • Simple Kriging System
• Results
62
=
×
30
20
10
3
2
1
333231
232221
131211
CCC
CCCCCCCCC
λλλ
0)2(
112.0)1(2)0(
=
==
C
CC
∑ =
==−=
106.0056.0
006.0
32
1
iλ
λλλ
[ ] [ ]
( ) [ ] [ ]
=•Λ−=
=
−+•Λ=
⇒ ∑41.1
050.21
000
*0
it
ii
t
CCVar
mZZ
ε
λ
=
×
112.0112.0
.0
20112.002112.0112.0112.02
3
2
1
λλλ
Block Kriging Estimate the average value of Z over a block, starting from sample values:
• Simple Block Kriging is (almost) identical to Simple Point Kriging, replacing Point index (◦) by Block index (v):
• Requires calculation of block-data and block-block covariances: discretization
63
∑+=i
iiv ZZ λλ0*
( )
−=
−+=
⇒
∀=
∑∑∑
∑
iii
ii
iiio
ij
ijj
CCVar
mZZ
iCC
0000
*
0
1
λε
λλ
λ
Point Block ( )
−=
−+=
⇒
∀=
∑∑∑
∑
iviivvv
ii
iiiv
vij
ijj
CCVar
mZZ
iCC
λε
λλ
λ
1
*
ixviC vvC
Block Kriging: Example
64
∫=V
v dxxZV
Z )(1 **
Data
Point Estimation
Block Estimation
Simple Kriging: Extensions Kriging can be extended to any linear function of the data: e.g. gradient
65
Data Estimation on a fine grid
Estimated Gradients
Model
Simple Kriging: Filtering
66
Data Kriging
Kriging Filtering
Nugget Effect
Ordinary Kriging • Intrinsic hypothesis – Ordinary Kriging - Unknown mean:
• Non bias:
• Optimality (under constraints):
• Result
67
mZE =)(
( )
=
=⇒∀=−×−=
−−=
∑∑∑0
1 0
0
ii
000λ
λλλλλε mmmZZEE
ii
iii
minimum 1200
−+
−−=Φ ∑∑
ii
iiiZZVar λµλλ
=−=∂Φ∂
∀=++−=∂Φ∂
⇒∑
∑
01
00
ii
ij
ijji
i
λµ
µγγλλ
( )
−=
=
∑∑
µγλε
λ
iii
iii
Var
ZZ
0
*0
Ordinary Kriging In algebraic terms
• Simple Kriging System:
• Estimation:
• Variance of Estimation Error:
68
−
−
=
×
−−
−−
10111
1
0
101
1
111
nnnnn
n
γ
γ
µλ
λ
γγ
γγ
[ ]
•=
0
1
1*0
nn Z
Z
Z
µλλ
( ) [ ]
−
−
•−=
10
10
1n
nVarγ
γ
µλλε
Ordinary Kriging: Exercise
• Spherical Model with range 1.25m and sill 2
• 3 Data and Target on a square pattern (mesh = 1m)
69
Z1=3
Z2=4
Z3=1
Target
Ordinary Kriging: Solution • Simple Kriging System
• Results
70
−−−
=
×
−−−−−−−−−
10111111
30
20
10
3
2
1
333231
232221
131211
γγγ
µλλλ
γγγγλγγγγ
2)2(
888.1)1(2)0(
=
==
γ
γγ
( )
=
•
Λ−=
=
•
Λ=
⇒
6.11
640.20
000
*0
it
t
CCVar
ZZ
OK
µε
µ
6.0360.0
280.0
32
1
−===
=
µλλ
λ
−−−
=
×
−−−−
−−
888.1888.12
02888.120888.1888.1888.10
3
2
1
λλλ
[ ] [ ]
( ) [ ] [ ]
=•Λ−=
=
−+•Λ=
⇒ ∑41.1
050.21
000
*0
it
ii
t
CCVar
mZZSK
ε
λ
Kriging weights Simple Kriging of the central Point
71
L
0 %
0 %
0 % 0 %
Nugget 2 1.σ =
L
31.25%
31.25%
0 % 0 %
Sphe(2L) 2 0.80σ =
L
44.91%
44.91%
2.88% 2.88%
Gaus*(2L) 2 0.57σ =
Kriging weights Ordinary Kriging of the central Point
72
L
25%
25%
25% 25%
Nugget 2 1.25σ =
L
40.6%
40.6%
9.4% 9.4%
Sphe(2L) 2 0.84σ =
L
45.99%
45.99%
4.01% 4.01%
Gaus*(2L) 2 0.57σ =
Kriging weights Simple Kriging of the central Block
73
L
0 %
0 %
0 % 0 %
Nugget
L
31.05%
31.05%
0.35 % 0.35 %
Sphe(2L) 2
0.81
0.61vvC
σ
=
=
L
44.53%
44.53%
3.20% 3.20%
Gaus*(2L) 2
0.94
0.52vvC
σ
=
=
Kriging weights Ordinary Kriging of the central Block
74
L
25 %
25 %
25 % 25 %
Nugget
L
40.35%
40.35%
9.65 % 9.65 %
Sphe(2L) 2
0.81
0.65vvC
σ
=
=
L
45.64%
45.64%
4.36% 4.36%
Gaus*(2L) 2
0.94
0.521vvC
σ
=
=
Kriging weights: Declustering Ordinary Kriging – Model with large range
75
2 0.48σ =37.0% 37.0%
26.0%
33.3% 33.3%
33.3%
2 0.45σ =
50.0%
50.0%
2 0.537σ =
25.7% 25.7%
48.7%
2 0.526σ =
Data
Target +
Kriging Properties • The Kriging system is regular if:
• the model is authorized • there is no redundant information (ex. duplicate points) • the solution is unique
• Kriging is an exact interpolator: at a sample, the punctual estimate is equal to data value and the estimation variance is zero
• Data values are not used for the determination of the Kriging weights, nor for the variance of estimation error
• Kriging weights are not modified when modifying the sill of the Model
• Variance of estimation error is proportional to the sill of the Model
76
Neighborhood The neighborhood defines the subset of samples used in kriging of a target site:
• unique: all samples • moving: only closest samples to the site (used if data set too large ~400)
Display of kriging weights for target site (square)
77
Unique Moving Radius=0.33 2 pts / quadrant
Neighborhood • Kriging weight relative to
blue sample when estimating each grid node
• Moving neighborhood: radius = 0.33 maximum= 2 pts per quadrant
78
Unique
Moving
Back to Jupyter Notebook!
• We focus on the cell from Interpolation section: • Kriging Temperature for Year 2008-T2 at 25m Depth
• Display Estimation Results • Display Associated Standard Deviation • Change the Neighborhood parameters
79
Cross-validation For each datum:
• Suppress the data • Perform Kriging at that data site:
Evaluate: • The error between data value and its estimation
• Normalized by the st. deviation of estimation error
Calculate statistics (mean and variance) on these errors
80
Mean Variance
Error 0 <<D2
Normalized error 0 ≈1
0iZ
∑≠
=0
0
*
iiiii ZZ λ
00
*ii ZZ −
( )0
00
*
i
ii
VarZZε
−
Cross-validation
81
Base Map of normalized errors
Histogram of normalized errors Correlation Z | Z*
Outliers ( )5.2
*
>−εVarZZ
( )5.2
*
−<−εVarZZ= +
Back to Jupyter Notebook!
• We focus on the cell from Cross-Validation section: • Kriging Temperature for Year 2008-T2 at 25m Depth
• Display Cross-Validation Results
82
End of presentation
83
Thank you for attention!