Break Position Errors in Climate Records Ralf Lindau & Victor Venema University of Bonn Germany.


Break Position Errors in Climate Records

Ralf Lindau & Victor Venema

University of Bonn, Germany

12th International Meeting on Statistical Climatology, Jeju, Korea – 24. June 2013

Internal and External Variance

Consider the differences of one station compared to a neighbor reference.

Breaks are defined by abrupt changes in the station-reference time series.

Internal variance: within the subperiods

External variance: between the means of different subperiods

Break criterion: Maximum external variance


Decomposition of Variance

n: total number of years

N: number of subperiods

n_i: number of years within subperiod i

The sum of external and internal variance is constant.
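The constancy of the sum can be checked numerically. The following is a minimal sketch, not the authors' code: the function name `variance_decomposition`, the test series and the break positions are illustrative assumptions. It computes the internal and external variance of a segmentation and compares their sum with the total variance.

```python
import random

def variance_decomposition(series, breaks):
    """Return (internal, external, total) variance for a segmentation.

    `breaks` lists the start index of every subperiod after the first one.
    All three quantities are normalized by the total length n.
    """
    n = len(series)
    total_mean = sum(series) / n
    bounds = [0] + list(breaks) + [n]
    internal = 0.0
    external = 0.0
    for start, stop in zip(bounds[:-1], bounds[1:]):
        segment = series[start:stop]
        seg_mean = sum(segment) / len(segment)
        # Scatter of the values around their own subperiod mean
        internal += sum((x - seg_mean) ** 2 for x in segment) / n
        # Weighted scatter of the subperiod mean around the overall mean
        external += len(segment) * (seg_mean - total_mean) ** 2 / n
    total = sum((x - total_mean) ** 2 for x in series) / n
    return internal, external, total

# Hypothetical test series: unit noise with a jump of 2 at index 50
rng = random.Random(0)
series = [rng.gauss(0.0, 1.0) + (2.0 if i >= 50 else 0.0) for i in range(100)]
for breaks in ([50], [30, 70], [10, 20, 90]):
    internal, external, total = variance_decomposition(series, breaks)
    print(breaks, round(internal + external - total, 12))
```

Because the total variance does not depend on the segmentation, maximizing the external variance is equivalent to minimizing the internal variance.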

Position errors

Two segments of lengths n1 and n2 with means x1 and x2.

A subsegment of length m with mean x0 is erroneously exchanged from segment 2 to segment 1.

x1 is strongly reduced, x2 differs slightly. x1 and x2 converge.

This reduces the external variance, and the wrong segmentation is rejected.


Change of external variance

$$n\,\Delta v = -f_1 x_1^2 + f_2 x_2^2 + 2 x_0 \left(f_1 x_1 - f_2 x_2\right) + f_0 x_0^2$$

The change of external variance Δv is a function only of the means and lengths of the two segments and of the exchanged subsegment, with weights f_0, f_1 and f_2 that depend only on the lengths n_1, n_2 and m.
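A minimal numerical check of this expression is sketched below. It assumes the weights f_1 = n_1 m / (n_1 + m), f_2 = n_2 m / (n_2 - m) and f_0 = f_2 - f_1. These follow from writing out n times the external variance before and after the exchange, but they are an assumption here, not taken from the slide. All function names and the test series are hypothetical.

```python
import random

def external_variance_sum(series, k):
    """n times the external variance for a single break before index k."""
    n = len(series)
    mean = sum(series) / n
    left, right = series[:k], series[k:]
    m1 = sum(left) / len(left)
    m2 = sum(right) / len(right)
    return len(left) * (m1 - mean) ** 2 + len(right) * (m2 - mean) ** 2

def delta_v_formula(x1, x2, x0, n1, n2, m):
    """n * delta v from the closed-form expression (weights are assumed)."""
    f1 = n1 * m / (n1 + m)
    f2 = n2 * m / (n2 - m)
    f0 = f2 - f1
    return -f1 * x1**2 + f2 * x2**2 + 2 * x0 * (f1 * x1 - f2 * x2) + f0 * x0**2

rng = random.Random(1)
n1, n2, m = 50, 50, 3
series = [rng.gauss(2.0 if i < n1 else 0.0, 1.0) for i in range(n1 + n2)]
x1 = sum(series[:n1]) / n1            # mean of segment 1
x2 = sum(series[n1:]) / n2            # mean of segment 2
x0 = sum(series[n1:n1 + m]) / m       # mean of the exchanged subsegment

# Direct recomputation with the break shifted by m items versus the formula
direct = external_variance_sum(series, n1 + m) - external_variance_sum(series, n1)
formula = delta_v_formula(x1, x2, x0, n1, n2, m)
print(direct, formula)
```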

Express x0 by x2 plus scatter

The mean of the exchanged subsegment x0 is equal to x2, the mean of the segment it stems from, plus a random scatter variable δ:

$$x_0 = x_2 + \delta, \qquad \delta = \frac{\sigma}{m} \sum_{i=1}^{m} \delta_i, \qquad \delta_i \sim \mathcal{N}(0, 1)$$

δ depends on the internal variance σ² and the length m, because it is a mean over m random numbers:

$$\delta \sim \mathcal{N}\!\left(0, \frac{\sigma^2}{m}\right)$$

Quadratic function for Dv

Replace x0 by x2 + δ and normalize by the square of the jump height d.

$$n\,\Delta v = -f_1 (x_1 - x_2)^2 + 2 f_1 (x_1 - x_2)\,\delta + (f_2 - f_1)\,\delta^2$$

$$v^* := \frac{n\,\Delta v}{d^2} = -f_1 + 2 f_1 \varepsilon + f_0 \varepsilon^2$$

The change of the normalized external variance v*, which is the decision criterion for break detection, is a quadratic function of a random variable ε, which depends on the signal-to-noise ratio and the length m of the exchanged subsegment.

$$\varepsilon \sim \mathcal{N}\!\left(0, \frac{1}{4\,m\,SNR^2}\right), \qquad SNR := \frac{|d/2|}{\sigma}$$
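The variance of ε can be traced in one line. The sketch below assumes that ε is the scatter δ expressed in units of the jump height, ε = δ/d with d = x1 - x2, which is what the normalization by d² implies.

```latex
\varepsilon = \frac{\delta}{d}
\quad\Longrightarrow\quad
\operatorname{Var}(\varepsilon)
  = \frac{\operatorname{Var}(\delta)}{d^2}
  = \frac{\sigma^2}{m\,d^2}
  = \frac{1}{4m}\,\frac{\sigma^2}{(d/2)^2}
  = \frac{1}{4\,m\,SNR^2}
```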

Zero points

If the parabola becomes positive, shifting the break position by m items increases the external variance, so that this solution is preferred by mistake.

Zero points at:

$$\varepsilon_{+} \cong +\frac{1}{2}, \qquad \varepsilon_{-} \cong -\frac{2 f_1}{f_0} - \frac{1}{2} \cong -\frac{n_1}{m}$$

Slope at the positive zero point: $f_1 + f_2 \cong 2m$
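These approximations can be retraced from the parabola v* = -f_1 + 2 f_1 ε + f_0 ε². The sketch below assumes f_0 ≪ f_1, which holds when m is small compared with the segment lengths, and uses f_0 = f_2 - f_1.

```latex
f_0\,\varepsilon^2 + 2 f_1\,\varepsilon - f_1 = 0
\quad\Longrightarrow\quad
\varepsilon_{\pm} = \frac{f_1}{f_0}\left(-1 \pm \sqrt{1 + \frac{f_0}{f_1}}\right)

\sqrt{1 + \frac{f_0}{f_1}} \approx 1 + \frac{f_0}{2 f_1}
\quad\Longrightarrow\quad
\varepsilon_{+} \approx \frac{1}{2},
\qquad
\varepsilon_{-} \approx -\frac{2 f_1}{f_0} - \frac{1}{2}

\left.\frac{\mathrm{d}v^*}{\mathrm{d}\varepsilon}\right|_{\varepsilon = 1/2}
  = 2 f_1 + f_0 = f_1 + f_2
```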

Simulated data

10,000 random time series of length 100.

Internal standard deviation σ = 1

Jump height d = 2

The data confirm the existence of different parabolae for different m.

However, the data cover only scatter values near zero and never reach the negative zero point.

[Figure: (nΔv)/4 as a function of the scatter δ for m = 1, m = 2 and m = 3; σ = 1 and a jump height of 2 give SNR = 1.]
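A simulation of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' setup: it assumes a single true break in the middle of the series (index 50) and a search for exactly one break. The names `best_break` and `simulate_hit_rate` are hypothetical, and fewer series are used than on the slide to keep the run short.

```python
import random

def best_break(series):
    """Index k (1..n-1) maximizing the external variance of series[:k] | series[k:]."""
    n = len(series)
    total = sum(series)
    best_k, best_score = 1, float("-inf")
    left_sum = 0.0
    for k in range(1, n):
        left_sum += series[k - 1]
        right_sum = total - left_sum
        # Equals n times the external variance up to the constant total**2 / n,
        # which does not depend on k and so does not affect the maximum.
        score = left_sum**2 / k + right_sum**2 / (n - k)
        if score > best_score:
            best_k, best_score = k, score
    return best_k

def simulate_hit_rate(trials, n=100, true_break=50, jump=2.0, sigma=1.0, seed=0):
    """Fraction of random series in which the break is found at its true position."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        series = [rng.gauss(jump if i >= true_break else 0.0, sigma) for i in range(n)]
        hits += best_break(series) == true_break
    return hits / trials

hit_rate = simulate_hit_rate(trials=2000)
print(f"fraction of exactly located breaks: {hit_rate:.3f}")
```

Recording `best_break(series) - true_break` instead of the hit flag would give the full distribution of position errors rather than only the hit rate.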

The negative solution

Typical situation:

Extremely low SNR.

A drastically disturbed measurement near the break.

Its exchange leads to x1’ < x2 and x2’ > x1. The two means diverge so that the external variance grows.

[Figure: segment means x1 and x2 before the exchange, x1' and x2' after it.]

The positive solution

A subsegment adjacent to the true break is randomly lifted by more than half of the jump height.

Assigning it to the neighboring segment reduces the internal variance.

An erroneous break position is concluded.

Criterion: Maximum hatched area


Mathematical formulation of the criterion:

Brownian motion with drift

Drift = -SNR

[Figure: Brownian motion with drift; annotations δ and σ.]

Theoretical retrace


Parabola equation

linear approximation around the zero point

inserting known slope and (positive) zero point

replacing f1 + f2 by 2m

multiplying by signal-to-noise ratio
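The steps listed above can be written out as follows. This is a sketch of one consistent reading, assuming d > 0 (so that SNR = d / (2σ)), ε = δ/d and δ = (σ/m) Σ δ_i with δ_i ~ N(0, 1).

```latex
v^* = -f_1 + 2 f_1\,\varepsilon + f_0\,\varepsilon^2
% linear approximation around the positive zero point, slope f_1 + f_2
v^* \approx (f_1 + f_2)\left(\varepsilon - \tfrac{1}{2}\right)
% replacing f_1 + f_2 by 2m
v^* \approx 2m\,\varepsilon - m
    = \frac{2\sigma}{d}\sum_{i=1}^{m}\delta_i - m
    = \frac{1}{SNR}\sum_{i=1}^{m}\delta_i - m
% multiplying by the signal-to-noise ratio
SNR \cdot v^* \approx \sum_{i=1}^{m}\delta_i - m \cdot SNR
```

The right-hand side is a random walk with unit-variance steps and a drift of -SNR per step, the discrete counterpart of a Brownian motion with drift.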

Brownian motion with drift

Distribution of the time of the maximum of a Brownian motion with drift (Buffet, 2003, J Appl Math Stoch Anal):

$$f(s) = 2\left[\frac{1}{\sqrt{s}}\,\varphi\!\left(\mu\sqrt{s}\right) + \mu\,\Phi\!\left(\mu\sqrt{s}\right)\right] \times \left[\frac{1}{\sqrt{t-s}}\,\varphi\!\left(\mu\sqrt{t-s}\right) - \mu\,\Phi\!\left(-\mu\sqrt{t-s}\right)\right], \qquad 0 < s < t$$

Strictly valid only for continuous processes.

[Figure: panels for SNR = 0.5, SNR = 1 and SNR = 2. Dashed line: Buffet, 2003. Circles: numerical simulation of a discrete Brownian motion with drift. Crosses: complete break search simulation.]
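The density can be evaluated with the standard library alone. The sketch below is illustrative: the function names are hypothetical, and the substitution s = t sin²θ is only a numerical convenience chosen here to remove the integrable singularities at s = 0 and s = t. For μ = 0 the expression reduces to the arcsine law 1 / (π √(s (t - s))).

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def max_time_density(s, t, mu):
    """Density of the time s of the maximum on (0, t) for drift mu."""
    a = math.sqrt(s)
    b = math.sqrt(t - s)
    left = phi(mu * a) / a + mu * Phi(mu * a)
    right = phi(mu * b) / b - mu * Phi(-mu * b)
    return 2.0 * left * right

def total_probability(t, mu, steps=2000):
    """Midpoint-rule integral of the density over (0, t) with s = t * sin(theta)**2."""
    h = (math.pi / 2.0) / steps
    total = 0.0
    for k in range(steps):
        theta = (k + 0.5) * h
        s = t * math.sin(theta) ** 2
        ds_dtheta = 2.0 * t * math.sin(theta) * math.cos(theta)
        total += max_time_density(s, t, mu) * ds_dtheta * h
    return total

for mu in (-0.5, -1.0, -2.0):  # drift = -SNR
    print(mu, total_probability(t=20.0, mu=mu))
```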

Two more problems


The hit rate is not accurately reproduced.

Break errors are a two-sided symmetric process: both too-early and too-late breaks are possible.

Hit rate

The hit rate h can be estimated for all drifts d by:

$$h = 1 - \Phi_1 - \frac{\Phi_2 + \Phi_1^2}{2}$$

with

$$\Phi_k := 1 - \Phi\!\left(|d|\sqrt{k}\right)$$

[Figure: true hit rate compared with the estimate (+ + +).]
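The estimate is easy to evaluate numerically. The sketch below assumes the exceeding probability Φ_k = 1 - Φ(|d| √k) given on the "Hit rate estimate" slide and takes the drift magnitude equal to the SNR; the function names are hypothetical. It prints the one-sided value h and its square, which the "Two-sided processes" slide uses for the two-sided case.

```python
import math

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def exceeding_probability(drift, k):
    """Phi_k := 1 - Phi(|d| * sqrt(k))."""
    return 1.0 - normal_cdf(abs(drift) * math.sqrt(k))

def hit_rate_estimate(drift):
    """h = 1 - Phi_1 - (Phi_2 + Phi_1**2) / 2."""
    phi1 = exceeding_probability(drift, 1)
    phi2 = exceeding_probability(drift, 2)
    return 1.0 - phi1 - (phi2 + phi1**2) / 2.0

for snr in (0.5, 1.0, 2.0):
    h = hit_rate_estimate(snr)
    print(f"SNR = {snr}: one-sided h = {h:.3f}, two-sided h^2 = {h * h:.3f}")
```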

Two-sided processes

Deviations are caused by random scatter independently on both sides.

The hit rate h is reduced to h².

One-sided deviations have the probability:

with + without competitor

For two-sided deviations the probability is halved if a competitor occurs on the other side:

All other probabilities are reduced by

Practical application

The hit rate drops from 95% for SNR = 2 to 29% for SNR = 0.5.

SNR > 1: break positions quickly become very exact.

SNR < 1: break positions quickly become very inexact.

[Figure: true and estimated (+ + +) values; panels for SNR = 0.5, SNR = 1 and SNR = 2.]


Conclusions

• Break position errors can be described by the distribution of the time of maximum of a Brownian motion with drift.

• The drift parameter is equal to the signal-to-noise ratio, defined as half the jump height between homogeneous subperiods divided by the internal standard deviation within them.

Hit rate simulation

The hit rate is the probability that the initial value is never exceeded.

For realistic drift sizes the value converges after a few steps.
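The convergence can be illustrated with a small Monte Carlo experiment. The sketch below is an assumption-laden illustration, not the authors' simulation: it models the process as a discrete random walk with unit-variance Gaussian steps and a drift of -SNR per step, and the name `simulated_hit_rate` is hypothetical.

```python
import random

def simulated_hit_rate(snr, steps, trials=10000, seed=0):
    """Probability that a random walk with drift -snr never exceeds its start value."""
    rng = random.Random(seed)
    hits = 0
    for _trial in range(trials):
        position = 0.0
        exceeded = False
        for _step in range(steps):
            position += rng.gauss(-snr, 1.0)
            if position > 0.0:
                exceeded = True
                break
        hits += not exceeded
    return hits / trials

# The estimate should stop changing noticeably after a few steps
for steps in (1, 2, 5, 20):
    print(steps, simulated_hit_rate(snr=1.0, steps=steps))
```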


Preliminary maximum

Instead of multiplying by h < 1, we can alternatively stop the summation earlier. k = 2 works well.

p_ik is defined as the probability that the kth member of a Brownian motion is the preliminary maximum after i steps.

The probability that it is also the absolute maximum is lower by a factor of h.

Thus:

Hit rate estimate

Define the drift-dependent exceeding probability:

$$\Phi_k := 1 - \Phi\!\left(|d|\sqrt{k}\right) = \Phi\!\left(d\sqrt{k}\right)$$

The preliminary maxima after 1 and after 2 steps are known:

$$p_{11} = \Phi_1, \qquad p_{21} = \Phi_1 \left(1 - \Phi_1\right), \qquad p_{22} = \frac{\Phi_2 + \Phi_1^2}{2}$$