
Modern methods of statistical learning SF2935

Johan Westerborn

[email protected]

Lecture 6: Bootstrap

16 November 2015

Johan Westerborn Statistical learning (1) Bootstrap


Outline

1 Introduction to bootstrap

2 Non-parametric bootstrap

3 Parametric bootstrap


Introduction to bootstrap

What is the bootstrap?

We are given some data z = (z1, . . . , zn) and wish to calculate a value τ that depends on the distribution of the data. Using this data we calculate some estimator τ̂ = t(z).

- As an example, we can estimate the mean using

  t(z) = (1/n) ∑_{i=1}^{n} z_i

How certain can we be about the value τ̂ that we get? The bootstrap method tries to answer this question.
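As a small illustration, the sample-mean estimator above can be written as follows (a sketch; the simulated data here are my own stand-in for an observed sample z):

```python
import random

# Simulated stand-in for the observed data z (not from the lecture).
random.seed(1)
z = [random.gauss(0.0, 1.0) for _ in range(100)]

def t(z):
    """The estimator t(z) = (1/n) * sum of z_i, i.e. the sample mean."""
    return sum(z) / len(z)

tau_hat = t(z)  # an observation of the random variable t(Z)
```

The bootstrap question is then: how far can this single observation tau_hat be from the true value?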


Introduction to bootstrap

If we knew the distribution of Z:

- t(z) is just an observation of the random variable t(Z).
- The error in our estimator is ∆(z) = t(z) − τ, which is an observation of the random variable ∆(Z) = t(Z) − τ.
- Quantifying the uncertainty of the estimator requires studying the distribution of ∆(Z).

If we would like to calculate a confidence interval for the estimator t(z), we would have to invert the distribution function of ∆(Z):

Iα = ( t(z) − F⁻¹_{∆(Z)}(1 − α/2), t(z) − F⁻¹_{∆(Z)}(α/2) )

The bias of the estimator is E[∆(Z )].


Introduction to bootstrap

Normal distribution example

Normal distribution
Assume that zi, i = 1, . . . , n, are i.i.d. normally distributed random variables with mean µ and variance 1.

Our estimator of µ is µ̂ = t(z) = (1/n) ∑_{i=1}^{n} zi.

With this estimator we have that ∆(Z) = t(Z) − µ ∼ N(0, 1/n).

In this case we can calculate the error distribution exactly.
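This exact result is easy to check by simulation (a sketch using NumPy; the sample size n and the value of µ below are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(0)
n, mu = 50, 2.0

# Draw many datasets of size n and record the error Delta(z) = t(z) - mu.
errors = rng.normal(mu, 1.0, size=(10_000, n)).mean(axis=1) - mu

# Delta(Z) ~ N(0, 1/n): its mean should be near 0 and its variance near 1/n.
print(round(errors.mean(), 3), round(errors.var(), 3))
```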


Introduction to bootstrap

What if we don’t know the distribution?

If we don’t know the distribution, we can use the bootstrap! The main idea is to substitute the distribution of Z with the empirical distribution based on the sample z:

F̂n(x) = (1/n) ∑_{i=1}^{n} 1(zi ≤ x) = fraction of the zi’s less than or equal to x

It is easy to show that

nF̂n(x) ∼ Bin(n, F(x))

lim_{n→∞} F̂n(x) = F(x) a.s.
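The empirical distribution function can be coded directly (a sketch; `ecdf` is my own hypothetical helper name):

```python
import numpy as np

def ecdf(z):
    """Return F_hat_n: x -> fraction of the z_i that are <= x."""
    zs = np.sort(np.asarray(z))
    return lambda x: np.searchsorted(zs, x, side="right") / len(zs)

# For a large standard-normal sample, F_hat_n(0) should be close to 0.5.
F_hat = ecdf(np.random.default_rng(0).normal(size=1000))
```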


Introduction to bootstrap

Empirical versus true distribution

[Figure: empirical distribution function ecdf(ran.vec) of a simulated sample, plotted for x between −3 and 2; F̂n(x) runs from 0.0 to 1.0.]


Introduction to bootstrap

The bootstrap algorithm

Given data z from the distribution of Z, we replace the distribution function with the empirical distribution function. The algorithm goes as follows:

- Calculate τ̂ = t(z).
- Simulate B new datasets zb, b = 1, . . . , B, where each zb has the same size as z and is obtained by drawing from the empirical distribution (that is, resample with replacement from the vector z).
- Compute τ̂b = t(zb), b = 1, . . . , B.
- Calculate ∆b = τ̂b − τ̂. This can be used for uncertainty analysis.
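The steps above can be sketched as follows (in Python rather than the R used for the lecture’s figures; the data here are a simulated placeholder):

```python
import numpy as np

def bootstrap(z, t, B=1000, seed=0):
    """Non-parametric bootstrap: resample z with replacement B times,
    recompute the estimator, and return tau_hat together with the
    errors Delta_b = t(z_b) - t(z)."""
    rng = np.random.default_rng(seed)
    z = np.asarray(z)
    tau_hat = t(z)
    taus = np.array([t(rng.choice(z, size=len(z), replace=True))
                     for _ in range(B)])
    return tau_hat, taus - tau_hat

# Example with the sample mean as the estimator t:
z = np.random.default_rng(1).normal(4.0, 0.2, size=65)
tau_hat, deltas = bootstrap(z, np.mean)
```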


Introduction to bootstrap

Running example

Yearly maximum water height in Port Pirie

We have a dataset of 65 yearly measurements of the highest sea level recorded in the city of Port Pirie in southern Australia. Can we say anything about the 10-year sea level? Or the 100-year sea level?


Introduction to bootstrap

Running example cont.

[Figure: yearly maximum sea level in Port Pirie plotted against Year, 1930–1987; SeaLevel ranges from about 3.6 to 4.6.]


Non-parametric bootstrap

The 10-year sea level

The 10-year sea level is defined as F⁻¹(1 − 1/10), where F is the distribution of the yearly maximum. Since we make no assumptions on the data, given a vector of 65 values we choose the 65 · (1 − 1/10) = 58.5th value as the 10-year return level. Since we can’t choose the 58.5th value, we take the mean of the 58th and 59th values in increasing order. We let the function t(z) return this “58.5th” value of the vector z.
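As a sketch, this “58.5th value” statistic can be written as follows (the only subtlety is the 1-based versus 0-based indexing):

```python
import numpy as np

def ten_year_level(z):
    """Interpolated 90% quantile of a 65-value sample: the mean of the
    58th and 59th values in increasing order (1-based positions)."""
    zs = np.sort(np.asarray(z))
    return 0.5 * (zs[57] + zs[58])  # 0-based indices for positions 58 and 59

# With the values 1..65, positions 58 and 59 hold 58 and 59:
print(ten_year_level(np.arange(1, 66)))  # 58.5
```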


Non-parametric bootstrap

The 10-year sea level cont.

We perform the bootstrap by following the algorithm:

- Calculate τ̂ = t(z).
- For every b ∈ {1, . . . , 1000}, draw a vector zb by resampling from the data z and set τ̂b = t(zb).
- Set ∆b = τ̂b − τ̂.
- We can now estimate the bias as

  Estimated bias = (1/B) ∑_{b=1}^{B} ∆b

- We can estimate the standard deviation of the error in the usual way.
- We can estimate a confidence interval by taking the appropriate quantiles of the ∆b vector and using them together with τ̂.
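These summaries can be computed from the ∆b values like so (a sketch; the interval subtracts the error quantiles from τ̂, following the Iα formula from the introduction):

```python
import numpy as np

def summarize(tau_hat, deltas, alpha=0.05):
    """Bias, standard error, and a basic bootstrap confidence interval
    from the errors Delta_b = tau_b - tau_hat."""
    bias = deltas.mean()
    se = deltas.std(ddof=1)
    lo, hi = np.quantile(deltas, [alpha / 2, 1 - alpha / 2])
    # Subtract the error quantiles from tau_hat, as in the interval I_alpha.
    return bias, se, (tau_hat - hi, tau_hat - lo)

# Illustration with a symmetric grid of errors around an assumed tau_hat:
bias, se, ci = summarize(4.298, np.linspace(-0.1, 0.1, 1001))
```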


Non-parametric bootstrap

The 10-year sea level cont.

We get the estimated 10-year sea level to be τ̂ = 4.298.
Estimated bias = −0.0069.
95% confidence interval: (4.198, 4.370).

[Figure: histogram of the bootstrap estimates non_par.boot$t, ranging from about 4.1 to 4.5.]


Parametric bootstrap

Another way
What if we want to estimate the 100-year sea level?

- Notice how poorly the previous estimator would perform here.
- We need to estimate something that is outside of our data range.

To solve this, we impose on our data a distribution governed by some parameters θ. In our case we use the Gumbel distribution, with distribution function

F(x) = exp(−exp(−(x − µ)/β)), x ∈ ℝ

which has inverse

F⁻¹(y) = µ − β log(− log(y)), y ∈ (0, 1)
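Both formulas are straightforward to code and to round-trip check (a sketch; the parameter values below are arbitrary choices of mine):

```python
import math

def gumbel_cdf(x, mu, beta):
    """F(x) = exp(-exp(-(x - mu)/beta))."""
    return math.exp(-math.exp(-(x - mu) / beta))

def gumbel_inv(y, mu, beta):
    """F^{-1}(y) = mu - beta*log(-log(y)) for 0 < y < 1."""
    return mu - beta * math.log(-math.log(y))

# Round trip: F^{-1}(F(x)) should recover x.
x = gumbel_inv(gumbel_cdf(4.2, 3.87, 0.2), 3.87, 0.2)
```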


Parametric bootstrap

The parametric bootstrap

In the parametric bootstrap, instead of using the empirical distribution, we calculate θ̂ = θ̂(z) as an estimate of θ. The new samples zb are then generated from the distribution determined by θ̂, and we calculate θ̂b = θ̂(zb).

We let the function t depend on the estimated parameters θ̂b instead of the sample.


Parametric bootstrap

The 100-year sea level

We perform the parametric bootstrap to get the 100-year sea level in the following way:

- Estimate the parameters θ̂ using maximum likelihood and let τ̂ = t(θ̂) = F⁻¹(1 − 1/100; θ̂).
- For each b ∈ {1, . . . , 1000}, draw a new sample of size 65 from the Gumbel distribution with parameters θ̂ and calculate θ̂b.
- Let τ̂b = t(θ̂b) and ∆b = τ̂b − τ̂.
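A sketch of these steps (with simulated stand-in data, and with simple moment estimators for the Gumbel parameters substituted for the MLE used in the lecture):

```python
import numpy as np

GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def fit_gumbel(z):
    """Method-of-moments fit for (mu, beta) -- a simple stand-in for MLE:
    beta = s*sqrt(6)/pi, mu = mean - gamma*beta."""
    beta = z.std(ddof=1) * np.sqrt(6) / np.pi
    return z.mean() - GAMMA * beta, beta

def return_level(mu, beta, T):
    """t(theta) = F^{-1}(1 - 1/T) = mu - beta*log(-log(1 - 1/T))."""
    return mu - beta * np.log(-np.log(1 - 1 / T))

rng = np.random.default_rng(0)
z = rng.gumbel(3.87, 0.2, size=65)       # simulated stand-in for the data
mu_hat, beta_hat = fit_gumbel(z)
tau_hat = return_level(mu_hat, beta_hat, 100)

# Parametric bootstrap: sample from the fitted Gumbel, refit, recompute.
deltas = np.array([
    return_level(*fit_gumbel(rng.gumbel(mu_hat, beta_hat, size=65)), 100)
    - tau_hat
    for _ in range(1000)
])
```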


Parametric bootstrap

The 100-year sea level cont.

We get τ̂ = 4.77.
The estimated bias is −0.0059.
The estimated one-sided lower 95% confidence interval is (4.60, ∞).

[Figure: histogram of the bootstrap estimates b1$t, ranging from about 4.4 to 5.0.]
