7. Homogenization Seminar Budapest – 24. – 27. October 2011
What is the correct number of break points hidden in a climate record?
Ralf Lindau, Victor Venema
Bonn University
Defining breaks
Consider the difference series between one station and a reference (a kriged ensemble of the surrounding stations).
Breaks are defined by abrupt changes in the station-reference time series.
Internal variance: within the subperiods
External variance: between the means of the different subperiods
Criterion: maximum external variance attained by a minimum number of breaks
Decomposition of Variance
n: total number of years
N: number of subperiods
n_i: number of years within subperiod i
The sum of external and internal variance is constant.
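For reference, the decomposition can be written out as the standard analysis-of-variance identity. The symbols $x_{ij}$ (value in year j of subperiod i), $\bar{x}_i$ (mean of subperiod i) and $\bar{x}$ (overall mean) are our own notation, assumed here rather than taken from the slides:

```latex
\frac{1}{n}\sum_{i=1}^{N}\sum_{j=1}^{n_i}\left(x_{ij}-\bar{x}\right)^2
= \underbrace{\frac{1}{n}\sum_{i=1}^{N} n_i\left(\bar{x}_i-\bar{x}\right)^2}_{\text{external}}
+ \underbrace{\frac{1}{n}\sum_{i=1}^{N}\sum_{j=1}^{n_i}\left(x_{ij}-\bar{x}_i\right)^2}_{\text{internal}}
```

Because the left-hand side does not depend on where the breaks are placed, maximising the external variance is the same as minimising the internal variance.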
Two questions
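The title of this talk asks: how many breaks?
Where are they situated?
Testing all permutations is not feasible: for n = 100 years and k = 10 breaks there are already
$$\binom{n-1}{k} = \binom{99}{10} = \frac{99\cdot 98\cdots 90}{10\cdot 9\cdots 1} \approx 1.6\cdot 10^{13}$$
combinations.
The best solution for a fixed number of breaks can be found by Dynamical Programming.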
Dynamical Programming (1)
Find the optimum positions for a fixed number of breaks.
Consider not only the complete time series, but all possible truncated variants.
Dynamical Programming (2)
Find the first break by simply testing all permutations.
Dynamical Programming (3)
Fill up all truncated variants: the internal variance now consists of two parts, that of the (already optimised) truncated variant plus that of the rest.
Dynamical Programming (4)
Search for the minimum among the n candidates.
Dynamical Programming (5)
The optimum with 2 breaks for the full length has been found.
To begin the search for 3 breaks we need, as before, the previous solutions for all lengths, including the shorter ones.
Each additional break needs about n²/2 searches, which for larger numbers of breaks k is far fewer than all $\binom{n}{k}$ permutations.
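The procedure on the Dynamical Programming slides can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function names (`optimal_breaks`, `brute_force` and the helpers) are hypothetical, and the sketch minimises the internal sum of squares, which is equivalent to maximising the external variance because their sum is fixed. The brute-force search is included only to check the result on a short 21-year series.

```python
import itertools
import random


def prefix_sums(x):
    """Running sums of x and x**2, so any segment's statistics cost O(1)."""
    s, sq = [0.0], [0.0]
    for value in x:
        s.append(s[-1] + value)
        sq.append(sq[-1] + value * value)
    return s, sq


def segment_cost(s, sq, a, b):
    """Internal sum of squares of x[a:b] around its own mean."""
    total = s[b] - s[a]
    return (sq[b] - sq[a]) - total * total / (b - a)


def optimal_breaks(x, k):
    """Place k breaks so that the internal sum of squares is minimal.

    best[j][t] is the optimum for the truncated series x[:t] with j breaks;
    a break at position p means that a new segment starts at index p.
    """
    n = len(x)
    s, sq = prefix_sums(x)
    inf = float("inf")
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    last = [[0] * (n + 1) for _ in range(k + 1)]
    for t in range(1, n + 1):
        best[0][t] = segment_cost(s, sq, 0, t)
    for j in range(1, k + 1):
        for t in range(j + 1, n + 1):
            # optimal truncated variant with j-1 breaks plus the cost of the rest
            for p in range(j, t):
                cost = best[j - 1][p] + segment_cost(s, sq, p, t)
                if cost < best[j][t]:
                    best[j][t], last[j][t] = cost, p
    # walk back through the stored positions of the last break
    breaks, t = [], n
    for j in range(k, 0, -1):
        t = last[j][t]
        breaks.append(t)
    return best[k][n], sorted(breaks)


def brute_force(x, k):
    """Reference: test every combination of k break positions."""
    n = len(x)
    s, sq = prefix_sums(x)
    best = float("inf")
    for combo in itertools.combinations(range(1, n), k):
        edges = (0,) + combo + (n,)
        cost = sum(segment_cost(s, sq, a, b) for a, b in zip(edges, edges[1:]))
        best = min(best, cost)
    return best


rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(21)]
cost, positions = optimal_breaks(x, 3)
print(cost, positions, brute_force(x, 3))
```

The triple loop costs about k·n²/2 segment evaluations, compared with the binomial number of combinations that the brute-force reference has to visit.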
Position & Number
Solved:
The optimum positions for a fixed number of breaks are found by Dynamical Programming.
Remaining:
Find the optimum number of breaks.
The external variance increases with the number of breaks in any case, even for pure noise.
Use the behaviour of a random time series as a benchmark.
Segment averages
For a random time series with stddev = 1,
the segment averages x_i scatter randomly with
mean: 0
stddev: 1/√n_i
because any deviation from zero can be seen as an inaccuracy due to the limited number of members.
External Variance
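The external variance is equal to the mean square sum of normally distributed random variables, since $\sqrt{n_i}\,\bar{x}_i \sim \mathcal{N}(0,1)$:
$$\mathrm{var}_{ext} = \frac{1}{n}\sum_{i=1}^{N} n_i\left(\bar{x}_i - \bar{x}\right)^{2}$$
It is a weighted measure of the variability of the subperiod means.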
χ²-distribution
n: length of the time series (number of years)
k: number of breaks
N = k + 1: number of subsegments
[ ]: mean over several break position permutations
[var_ext] = (N-1)/n = k/n
On average, the external variance increases linearly with k. However, we consider the best member, as found by DP.
n · var_ext ~ χ²_{N-1} = χ²_k: the external variance is χ²-distributed (one degree of freedom is used up by the overall mean).
Definition: take N values from N(0,1), square them and add them up. Repeating this yields a χ²_N distribution.
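A quick Monte Carlo check of [var_ext] = k/n, as a minimal sketch assuming Gaussian noise with stddev 1 and break positions drawn at random (that is, the average over permutations, not the DP maximum). The function names are hypothetical:

```python
import random


def external_variance(x, breaks):
    """var_ext = (1/n) * sum_i n_i * (segment mean - overall mean)^2."""
    n = len(x)
    mean = sum(x) / n
    edges = [0] + sorted(breaks) + [n]
    total = 0.0
    for a, b in zip(edges, edges[1:]):
        seg_mean = sum(x[a:b]) / (b - a)
        total += (b - a) * (seg_mean - mean) ** 2
    return total / n


def mean_external_variance(n, k, trials=10000, seed=0):
    """Average var_ext over random series (stddev 1) with random break positions."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(trials):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        breaks = rng.sample(range(1, n), k)
        acc += external_variance(x, breaks)
    return acc / trials


for k in (2, 4, 8):
    print(k, mean_external_variance(21, k), k / 21)
```

Each pair should agree to within sampling noise; the DP maximum is, by construction, at least as large as this average.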
21-year random data (1)
1000 random time series are created.
They are only 21 years long, so that all permutations can be tested explicitly.
The mean increases linearly.
However, the maximum is what matters (the best solution, as found by DP).
Can we describe this function?
First guess: $v = 1 - (1-k^*)^4$, with $k^* = k/(n-1)$
21-year random data (2)
Above, we expected the data for a fixed number of breaks to be χ²-distributed.
From χ² to β distribution
The random data do not fit a χ²-distribution exactly.
The reason is that χ² has no upper bound,
whereas var_ext cannot exceed 1.
A kind of bounded χ² is the β-distribution.
[Figure: data for n = 21 years, k = 7 breaks]
$$X \sim \chi^2(a) \text{ and } Y \sim \chi^2(b) \quad\Rightarrow\quad \frac{X}{X+Y} \sim \beta\left(\frac{a}{2},\frac{b}{2}\right)$$
If we normalize a χ²-distributed variable by the sum of itself and another, independent χ²-distributed variable, the result is β-distributed.
The β-distribution fits the data well and is the theoretical distribution of the external variance over all break position permutations.
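The β result can be checked the same way. This sketch assumes that the bounded quantity v is the external sum of squares divided by the total sum of squares of the series (so that v ≤ 1); under that assumption v ~ β(k/2, (n-k-1)/2), whose mean is k/(n-1). The names are hypothetical:

```python
import random


def normalised_external_variance(x, breaks):
    """Share of the total sum of squares explained by the segment means (0 <= v <= 1)."""
    n = len(x)
    mean = sum(x) / n
    total_ss = sum((value - mean) ** 2 for value in x)
    edges = [0] + sorted(breaks) + [n]
    ext_ss = 0.0
    for a, b in zip(edges, edges[1:]):
        seg_mean = sum(x[a:b]) / (b - a)
        ext_ss += (b - a) * (seg_mean - mean) ** 2
    return ext_ss / total_ss


def beta_moments(a, b):
    """Mean and variance of a beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var


def sample_moments(n, k, trials=10000, seed=1):
    """Sample mean and variance of v for random series and random break positions."""
    rng = random.Random(seed)
    vs = []
    for _ in range(trials):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        vs.append(normalised_external_variance(x, rng.sample(range(1, n), k)))
    m = sum(vs) / trials
    return m, sum((v - m) ** 2 for v in vs) / (trials - 1)


n, k = 21, 7
print(sample_moments(n, k), beta_moments(k / 2, (n - k - 1) / 2))
```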
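The density of v = var_ext is
$$p(v) = \frac{1}{B\left(\frac{k}{2},\frac{n-k-1}{2}\right)}\,v^{\frac{k}{2}-1}(1-v)^{\frac{n-k-1}{2}-1}$$
with
$$B(a,b) = \frac{\Gamma(a)\,\Gamma(b)}{\Gamma(a+b)}$$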
We are interested in the best solution, the one with the highest external variance, as provided by DP.
We therefore need the exceeding probability for high var_ext.
Incomplete Beta Function
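The external variance v is β-distributed and depends on n (years) and k (breaks):
$$p(v) = \frac{1}{B\left(\frac{k}{2},\frac{n-k-1}{2}\right)}\,v^{\frac{k}{2}-1}(1-v)^{\frac{n-k-1}{2}-1}$$
The exceeding probability P gives the best (maximum) solution for v:
$$P(v) = 1 - \int_0^v p(v')\,dv'$$
This incomplete beta function is solvable for even k and odd n:
$$P(v) = \sum_{l=0}^{i-1}\binom{m}{l}\,v^{l}(1-v)^{m-l}, \qquad i = \frac{k}{2}, \quad m = \frac{n-3}{2}$$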
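The closed form can be compared with a direct numerical integration of p(v). A minimal sketch using only Python's standard library (`math.comb` needs Python 3.8 or later); the function names and the midpoint-rule integration are our own choices:

```python
import math


def exceed_prob_closed(v, n, k):
    """Slide formula: P(v) = sum_{l=0}^{i-1} C(m, l) v^l (1-v)^(m-l)."""
    if k % 2 or n % 2 == 0:
        raise ValueError("closed form requires even k and odd n")
    i, m = k // 2, (n - 3) // 2
    return sum(math.comb(m, l) * v**l * (1 - v) ** (m - l) for l in range(i))


def exceed_prob_numeric(v, n, k, steps=20000):
    """P(v) = 1 - integral_0^v p(v') dv' for the beta density, midpoint rule."""
    a, b = k / 2, (n - k - 1) / 2
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)  # -ln B(a, b)
    h = v / steps
    integral = 0.0
    for j in range(steps):
        u = (j + 0.5) * h
        integral += math.exp(log_norm + (a - 1) * math.log(u) + (b - 1) * math.log1p(-u))
    return 1.0 - integral * h


for v in (0.3, 0.5, 0.7):
    print(v, exceed_prob_closed(v, 21, 4), exceed_prob_numeric(v, 21, 4))
```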
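Example: 21 years, 4 breaks
$$P(v) = \sum_{l=0}^{i-1}\binom{m}{l}\,v^{l}(1-v)^{m-l}, \qquad i = \frac{k}{2}, \quad m = \frac{n-3}{2}$$
With k = 4, i = 2, n = 21 and m = 9:
$$P(v) = (1-v)^9 + 9\,v\,(1-v)^8$$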
Theory and Data
$$P(v) = (1-v)^9 + 9\,v\,(1-v)^8$$
Theory: curve
Random data: hatched; they fit well.
Nominal Combination Number
For n = 21 and k = 4 there are
$$\binom{n-1}{k} = \binom{20}{4} = 4845$$
break combinations.
If they were all independent, we could read off the maximum external variance at P = (4845)^-1 ≈ 0.0002, giving 0.7350.
However, we suspect that the break combinations are not independent. And we know the correct value of var_ext.
Effective and Nominal
Remember: var_ext = 0.5876 for k = 4.
Reading the curve in reverse at this value gives an exceeding probability 23 times higher than 1/4845.
This shows that the break permutations are strongly dependent and that the effective number of combinations is smaller than the nominal one.
However, the theoretical function is correct.
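The numbers on the last two slides can be recomputed from the closed form. In this sketch (helper names hypothetical), bisection inverts P(v) = 1/4845 and lands at about 0.73, and P(0.5876) · 4845 gives the factor of about 23:

```python
import math


def exceed_prob(v, n, k):
    """Closed form for even k and odd n: sum_{l<k/2} C(m, l) v^l (1-v)^(m-l)."""
    i, m = k // 2, (n - 3) // 2
    return sum(math.comb(m, l) * v**l * (1 - v) ** (m - l) for l in range(i))


def invert(p_target, n, k, iters=60):
    """Bisection for P(v) = p_target; P falls monotonically from 1 at v=0 to 0 at v=1."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if exceed_prob(mid, n, k) > p_target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


n, k = 21, 4
nominal = math.comb(n - 1, k)                # 4845 nominal break combinations
v_indep = invert(1 / nominal, n, k)          # maximum if all combinations were independent
ratio = exceed_prob(0.5876, n, k) * nominal  # how much more probable the observed maximum is
print(nominal, v_indep, ratio)
```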
From 21 years to 101 years
Since we now know the theoretical function, we stop checking it explicitly with random data,
and move from unrealistically short time series (n = 21) to more realistic ones (n = 101).
Again, the numerical values of the external variance are known, so we can infer the effective numbers of combinations.
Can we give a formula for dv/dk in order to derive v(k)?
[Figure: dv/dk versus number of breaks]
dv/dk sketch
Increasing the break number from k to k+1 has two consequences:
1. The probability function changes.
2. The number of combinations increases.
Both increase the external variance.
[Sketch: curves for k breaks and k+1 breaks]
Using the Slope
P(v) is a complicated function and hard to invert into v(P).
Thus, dv is inferred from dP divided by the slope.
P(v) was derived by integrating p(v), so its slope, -p(v), is known.
[Sketch: curves for k breaks and k+1 breaks]
$$P(v) = \sum_{l=0}^{i-1}\binom{m}{l}\,v^{l}(1-v)^{m-l}$$
The Slope
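$$\frac{d}{dv}\ln P(v) = \frac{1}{P(v)}\frac{dP}{dv} = -\frac{p(v)}{P(v)}$$
Insert the known functions:
$$\frac{d}{dv}\ln P(v) = -\frac{m\binom{m-1}{i-1}\,v^{i-1}(1-v)^{m-i}}{\sum_{l=0}^{i-1}\binom{m}{l}\,v^{l}(1-v)^{m-l}}$$
The last summand dominates:
$$\frac{d}{dv}\ln P(v) \approx -\frac{m\binom{m-1}{i-1}\,v^{i-1}(1-v)^{m-i}}{\binom{m}{i-1}\,v^{i-1}(1-v)^{m-i+1}}$$
Reduce and replace m and i:
$$\frac{d}{dv}\ln P(v) \approx -\frac{m-i+1}{1-v} = -\frac{n-1-k}{2\,(1-v)}$$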
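A numerical check of the slope derivation, assuming the closed-form P(v): the exact logarithmic slope -p/P is compared with a finite difference and with the last-summand approximation. Since P is at least as large as its last summand, the approximation overestimates the magnitude of the slope. The parameter triples are arbitrary examples and the function names are hypothetical:

```python
import math


def exceed_prob(v, n, k):
    """Closed-form exceeding probability for even k and odd n."""
    i, m = k // 2, (n - 3) // 2
    return sum(math.comb(m, l) * v**l * (1 - v) ** (m - l) for l in range(i))


def dlnP_exact(v, n, k):
    """-p(v)/P(v) with p(v) = m C(m-1, i-1) v^(i-1) (1-v)^(m-i)."""
    i, m = k // 2, (n - 3) // 2
    p = m * math.comb(m - 1, i - 1) * v ** (i - 1) * (1 - v) ** (m - i)
    return -p / exceed_prob(v, n, k)


def dlnP_numeric(v, n, k, h=1e-6):
    """Central difference of ln P(v), independent of the formula for p."""
    return (math.log(exceed_prob(v + h, n, k)) - math.log(exceed_prob(v - h, n, k))) / (2 * h)


def dlnP_approx(v, n, k):
    """Last-summand approximation: -(n - 1 - k) / (2 (1 - v))."""
    return -(n - 1 - k) / (2 * (1 - v))


for n, k, v in ((21, 4, 0.6), (101, 10, 0.4), (101, 20, 0.6)):
    print(n, k, v, dlnP_numeric(v, n, k), dlnP_exact(v, n, k), dlnP_approx(v, n, k))
```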
Distance between the Curves
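Here P_i denotes the exceeding probability for k = 2i breaks:
$$\ln P_{i+1} - \ln P_i = \ln\sum_{l=0}^{i}\binom{m}{l}\,v^{l}(1-v)^{m-l} - \ln\sum_{l=0}^{i-1}\binom{m}{l}\,v^{l}(1-v)^{m-l}$$
The last summand dominates:
$$\ln P_{i+1} - \ln P_i \approx \ln\left[\binom{m}{i}\,v^{i}(1-v)^{m-i}\right] - \ln\left[\binom{m}{i-1}\,v^{i-1}(1-v)^{m-i+1}\right]$$
Reduce and replace m and i:
$$\ln P_{i+1} - \ln P_i \approx \ln\frac{(m-i+1)\,v}{i\,(1-v)}, \qquad \ln P_{k+2} - \ln P_k \approx \ln\frac{(n-1-k)\,v}{k\,(1-v)}$$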
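The same kind of check for the distance between the curves: the exact difference ln P_{k+2} - ln P_k versus ln[(n-1-k)v/(k(1-v))]. A sketch with arbitrary example parameters and hypothetical function names:

```python
import math


def exceed_prob(v, n, k):
    """Closed-form exceeding probability for even k and odd n."""
    i, m = k // 2, (n - 3) // 2
    return sum(math.comb(m, l) * v**l * (1 - v) ** (m - l) for l in range(i))


def distance_exact(v, n, k):
    """ln P_{k+2}(v) - ln P_k(v), evaluated from the full sums."""
    return math.log(exceed_prob(v, n, k + 2)) - math.log(exceed_prob(v, n, k))


def distance_approx(v, n, k):
    """Last-summand approximation ln[(n - 1 - k) v / (k (1 - v))]."""
    return math.log((n - 1 - k) * v / (k * (1 - v)))


for n, k, v in ((21, 4, 0.6), (101, 10, 0.4), (101, 20, 0.6)):
    print(n, k, v, distance_exact(v, n, k), distance_approx(v, n, k))
```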
Effective combination growth
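Nominal Growth Rate
Adding one break multiplies the number of combinations by
$$\binom{n-1}{k+1}\bigg/\binom{n-1}{k} = \frac{n-1-k}{k+1} \approx \frac{n-1-k}{k}$$
so that, per step of two breaks, ln P changes nominally by
$$-2\ln\frac{n-1-k}{k}$$
ln: the sketch is logarithmic
minus: the number of combinations is reciprocal to the exceeding probability
2: the exceeding probability is only known for even break numbers
However, break combinations are not independent, and we know the effective number of combinations.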
Ratio: nominal / effective
k1 k2 k nominal effective c = nom/eff
2 4 3 -2.552 -7.784 0.328
4 6 5 -2.186 -6.952 0.315
6 8 7 -1.963 -6.356 0.309
8 10 9 -1.765 -5.889 0.300
10 12 11 -1.645 -5.503 0.299
12 14 13 -1.514 -5.173 0.293
14 16 15 -1.435 -4.885 0.294
16 18 17 -1.363 -4.627 0.295
18 20 19 -1.292 -4.394 0.295
The ratio nominal/effective is approximately constant, c ≈ 0.3.
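Approximate Solution
Combining the slope, the distance between the curves and the effective combination growth (c ≈ 0.3) gives
$$\frac{dv}{dk} = \frac{1-v}{n-1-k}\left[2c\,\ln\frac{n-1-k}{k} + \ln\frac{(n-1-k)\,v}{k\,(1-v)}\right]$$
Normalisation: $k^* = \frac{k}{n-1}$
$$\frac{dv}{dk^*} = \frac{1-v}{1-k^*}\left[2c\,\ln\frac{1-k^*}{k^*} + \ln\frac{(1-k^*)\,v}{k^*\,(1-v)}\right]$$
For small k* (where v/k* ≈ 4) and n = 100:
$$\frac{dv}{dk^*} \approx \left[2\cdot 0.3\,\ln(100) + \ln(4)\right]\frac{1-v}{1-k^*} = (2.76 + 1.39)\,\frac{1-v}{1-k^*} \approx 4.15\,\frac{1-v}{1-k^*}$$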
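If the bracket is treated as a constant α, the equation separates, d ln(1-v) = α d ln(1-k*), so that 1 - v = (1-k*)^α; with α ≈ 4 this recovers the first guess on the 21-year slides. The sketch below evaluates the bracket for n = 100 and checks the separated solution against a numerical RK4 integration. Treating the bracket as constant is the simplification being illustrated here, not the exact solution, and the function names are hypothetical:

```python
import math


def small_k_rate(n, c=0.3):
    """Bracket 2c ln(n) + ln 4 from the slide (small k*, v/k* ~ 4)."""
    return 2 * c * math.log(n) + math.log(4)


def integrate_v(alpha, k_end, steps=10000):
    """RK4 solution of dv/dk* = alpha (1 - v) / (1 - k*) with v(0) = 0."""
    def rhs(k, v):
        return alpha * (1 - v) / (1 - k)

    h = k_end / steps
    k, v = 0.0, 0.0
    for _ in range(steps):
        r1 = rhs(k, v)
        r2 = rhs(k + h / 2, v + h / 2 * r1)
        r3 = rhs(k + h / 2, v + h / 2 * r2)
        r4 = rhs(k + h, v + h * r3)
        v += h / 6 * (r1 + 2 * r2 + 2 * r3 + r4)
        k += h
    return v


print(round(small_k_rate(100), 2))  # 4.15
for k_star in (0.05, 0.1, 0.3):
    # with a constant bracket the ODE separates: 1 - v = (1 - k*)^alpha
    print(k_star, integrate_v(4.0, k_star), 1 - (1 - k_star) ** 4)
```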
Exact Solution
5ln21ln
2
1
1
1*
***
k
kk
dk
dv
v
k
***
*
1
5ln21ln
2
1
1
1dk
kk
kdv
v
*
2
1
*
*
2
1)5ln(2* 1
11k
k
kkv
Constancy of the Solution
[Figure: curves for 101 years and 21 years]
The solution for the exponent is the same for different lengths of time series (21 and 101 years).
Conclusion
We have found a general mathematical formulation of how the external variance of a random time series increases as more and more breaks, positioned by Dynamical Programming, are inserted.
This can be used as a benchmark to determine the optimum number of breaks.
Integrated result
What does the function look like after integration?
Crosses: test data
Line: theory
Error bars: 90th and 95th percentiles
Appendix (1)
Consider the individual summands of the sum as defined in
$$P(v) = \sum_{l=0}^{i-1}\binom{m}{l}\,v^{l}(1-v)^{m-l}$$
The factor of change f between a certain summand and its successor is
$$f = \frac{\binom{m}{l_i+1}\,v^{l_i+1}(1-v)^{m-l_i-1}}{\binom{m}{l_i}\,v^{l_i}(1-v)^{m-l_i}}$$
where l_i runs from zero to i. The ratio of consecutive binomial coefficients can be replaced, and it follows:
$$f = \frac{(m-l_i)\,v}{(l_i+1)\,(1-v)}$$
m and i can be replaced by n and k:
$$f \approx \frac{(n-1-l_k)\,v}{l_k\,(1-v)}$$
Inserting k instead of l_k gives a lower limit for f, because (n-1-l_k)/l_k, the rate of change of the binomial coefficients, decreases monotonically with l_k:
$$f \ge \frac{(n-1-k)\,v}{k\,(1-v)}$$
Normalised by 1/(n-1):
$$f \ge \frac{(1-k^*)\,v}{k^*\,(1-v)}$$
Appendix (2)
The approximate solution is known, 1 - v = (1 - k*)^4; inserting it gives
$$f \ge \frac{(1-k^*)\left[1-(1-k^*)^4\right]}{k^*\,(1-k^*)^4} = \frac{1-(1-k^*)^4}{k^*\,(1-k^*)^3} = \frac{4-6k^*+4k^{*2}-k^{*3}}{(1-k^*)^3}$$
with
$$f \to 4 \;\text{ for } k^* \to 0, \qquad f \to \infty \;\text{ for } k^* \to 1$$
We can conclude that each element of the sum given above is larger than the prior element by a factor f. For small k* the factor f is greater than about 4, and it grows to infinity for large k*. Consequently, we can approximate the sum by its last summand:
$$P(v) = \sum_{l=0}^{i-1}\binom{m}{l}\,v^{l}(1-v)^{m-l} \approx \binom{m}{i-1}\,v^{i-1}(1-v)^{m-i+1}$$
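The dominance argument can be checked numerically. This sketch (hypothetical names) evaluates the lower limit of f at the approximate solution and, for n = 101, the ratio of the full sum P(v) to its last summand; with every f ≥ 4 that ratio is bounded by 1/(1 - 1/4) = 4/3:

```python
import math


def f_lower_bound(k_star):
    """Lower limit of f at the approximate solution: (1-(1-k*)^4) / (k* (1-k*)^3)."""
    return (1 - (1 - k_star) ** 4) / (k_star * (1 - k_star) ** 3)


def dominance_ratio(n, k):
    """Full sum P(v) divided by its last summand, at v = 1 - (1 - k*)^4."""
    i, m = k // 2, (n - 3) // 2
    v = 1 - (1 - k / (n - 1)) ** 4
    terms = [math.comb(m, l) * v**l * (1 - v) ** (m - l) for l in range(i)]
    return sum(terms) / terms[-1]


print(min(f_lower_bound(j / 100) for j in range(1, 96)))      # close to 4, at small k*
print(max(dominance_ratio(101, k) for k in range(2, 92, 2)))  # stays below 4/3
```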
Application (1)
Insert 5 breaks of variance 1 into each of 1000 random time series.
For low break numbers (1, 2, 3, up to about 10) the change in external variance increases,
lying above the theoretical function for random time series without any break (arrow).
The variances for break numbers higher than 5 also increase, because the 5 inserted breaks are not always the biggest ones.
Application (2)
Stop the break search when the growth rate of the external variance first drops below the theoretical one for zero breaks.
One example of the 1000 test time series:
Crosses: observations
Thin line: inserted breaks
Thick line: detected breaks
On average over 1000 samples:
Added variance: 86% (theoretically 5/6)
Remaining after correction: 27%
Average detected number of breaks: 5.48
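The stopping rule can be sketched as follows, under clearly stated assumptions: the growth rate is taken as the first difference of the DP maximum v(k), and the zero-break benchmark is the approximate curve 1 - (1-k*)^4 from the slides, standing in for the exact solution. The function names and the example numbers are hypothetical:

```python
def benchmark_v(k, n, exponent=4.0):
    """Assumed zero-break benchmark: v = 1 - (1 - k*)^exponent with k* = k/(n-1)."""
    k_star = k / (n - 1)
    return 1.0 - (1.0 - k_star) ** exponent


def detected_break_number(observed_v, n):
    """Return the number of breaks to keep.

    observed_v[k] is the maximum external variance found by DP with k breaks,
    with observed_v[0] == 0. The search stops at the first k whose growth
    observed_v[k] - observed_v[k-1] falls below the benchmark growth.
    """
    for k in range(1, len(observed_v)):
        observed_growth = observed_v[k] - observed_v[k - 1]
        benchmark_growth = benchmark_v(k, n) - benchmark_v(k - 1, n)
        if observed_growth < benchmark_growth:
            return k - 1
    return len(observed_v) - 1


# hypothetical DP maxima for a 101-year series
print(detected_break_number([0.0, 0.30, 0.45, 0.47], 101))  # 2
```

Here the third break adds only 0.02 to v, less than the roughly 0.037 a random series would gain at that step, so two breaks are kept.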