Bootstrap Confidence Intervals · Bootstrap Confidence Intervals ST552 Lecture 13 Charlotte...

25
Bootstrap Confidence Intervals ST552 Lecture 13 Charlotte Wickham 2019-02-11 1

Transcript of Bootstrap Confidence Intervals · Bootstrap Confidence Intervals ST552 Lecture 13 Charlotte...

Page 1: Bootstrap Confidence Intervals · Bootstrap Confidence Intervals ST552 Lecture 13 Charlotte Wickham 2019-02-11 1. Motivation The inferences we’ve covered so far relied on our

Bootstrap Confidence Intervals

ST552 Lecture 13

Charlotte Wickham

2019-02-11

1

Page 2: Bootstrap Confidence Intervals · Bootstrap Confidence Intervals ST552 Lecture 13 Charlotte Wickham 2019-02-11 1. Motivation The inferences we’ve covered so far relied on our

Motivation

The inferences we’ve covered so far relied on our assumption of

Normal errors:

‘ ≥ N(0, ‡2In◊n)

For example, we’ve seen under this assumption, the least squares

estimates are also Normally distributed:

— ≥ N3

—, ‡21XT X

2≠14

If the errors aren’t truly Normally distributed, what

distribution do the estimates have?

2

Page 3: Bootstrap Confidence Intervals · Bootstrap Confidence Intervals ST552 Lecture 13 Charlotte Wickham 2019-02-11 1. Motivation The inferences we’ve covered so far relied on our

Warm-up: Your Turn

Imagine the errors are in fact t3 distributed?

With your neighbour: design a simulation to understand the

distribution of the least squares estimates.

3

[ aGot ichor B

sampling

Page 4: Bootstrap Confidence Intervals · Bootstrap Confidence Intervals ST552 Lecture 13 Charlotte Wickham 2019-02-11 1. Motivation The inferences we’ve covered so far relied on our

4

Some model & - ¥ ,① Assure some things about model : y=XpptE

Decide o→pX ad B ,make something up .

n

② Compute f 's 100,000 times

a) Simulate y: simulate E

finely y-- X ft E

b) Using y ,fit model : f- ( * X) - '

XTY

③ MakeA Red :

We don't

¥ LA'

Bo

Page 5: Bootstrap Confidence Intervals · Bootstrap Confidence Intervals ST552 Lecture 13 Charlotte Wickham 2019-02-11 1. Motivation The inferences we’ve covered so far relied on our

Example: 1. Fix n, fix X

2

4

6

8

10

5 10 15 20x

Response

5

n = 10

Xnvniffo ,20)

Page 6: Bootstrap Confidence Intervals · Bootstrap Confidence Intervals ST552 Lecture 13 Charlotte Wickham 2019-02-11 1. Motivation The inferences we’ve covered so far relied on our

Example: 1. Fix —, find y

0

4

8

12

5 10 15 20x

Response

6

X Ely )

f- f 1,05)T

trueregressionline

, Ely )

Page 7: Bootstrap Confidence Intervals · Bootstrap Confidence Intervals ST552 Lecture 13 Charlotte Wickham 2019-02-11 1. Motivation The inferences we’ve covered so far relied on our

Example: 2. Simulate errors, find y

0

4

8

12

5 10 15 20x

Response

7

Fsi .÷÷:D

Page 8: Bootstrap Confidence Intervals · Bootstrap Confidence Intervals ST552 Lecture 13 Charlotte Wickham 2019-02-11 1. Motivation The inferences we’ve covered so far relied on our

Example: 3. Find least squares line

0

4

8

12

5 10 15 20x

Response

8

[Fitted

regression line

E. a

Page 9: Bootstrap Confidence Intervals · Bootstrap Confidence Intervals ST552 Lecture 13 Charlotte Wickham 2019-02-11 1. Motivation The inferences we’ve covered so far relied on our

Example: 4. Repeat #2. and #3. many times

0

4

8

12

5 10 15 20x

Response

9

IOne fit fromOne

simulation

Page 10: Bootstrap Confidence Intervals · Bootstrap Confidence Intervals ST552 Lecture 13 Charlotte Wickham 2019-02-11 1. Motivation The inferences we’ve covered so far relied on our

Example: Examine distribution of estimates

0.0

0.1

0.2

0.3

0.4

−5 0 5 10Estimate of intercept

dens

ity

0

1

2

3

4

5

0.00 0.25 0.50 0.75Estimate of slope

dens

ity

10

fo -- I p,

= o - 5

'siege. .

/ /

1000 Go 1000 § ,

Page 11: Bootstrap Confidence Intervals · Bootstrap Confidence Intervals ST552 Lecture 13 Charlotte Wickham 2019-02-11 1. Motivation The inferences we’ve covered so far relied on our

Example: Compared to theory

0.0

0.1

0.2

0.3

0.4

−5 0 5 10Estimate of intercept

dens

ity

0

1

2

3

4

5

0.00 0.25 0.50 0.75Estimate of slope

dens

ity

11

If Eira Nfo ,o

' ) BN Nfp ,

E # D- ' )

E ; I ¥

Based on simulation :

If error are tz distributed I to

Normal,

We assured on n, X

, B .

Page 12: Bootstrap Confidence Intervals · Bootstrap Confidence Intervals ST552 Lecture 13 Charlotte Wickham 2019-02-11 1. Motivation The inferences we’ve covered so far relied on our

When the errors aren’t Normal: CLT

Think of our estimates like linear combinations of the errors. I.e. a

sort of average of i.i.d random variables.

Some version of the Central Limit Theorem will apply.

For large samples, even when the errors aren’t Normal,

—≥N(—, ‡2(XT X )

≠1)

12

§ -

- - + we

-

T

approximately

as h - A that approximationwill improve .

Page 13: Bootstrap Confidence Intervals · Bootstrap Confidence Intervals ST552 Lecture 13 Charlotte Wickham 2019-02-11 1. Motivation The inferences we’ve covered so far relied on our

Summary so far

If we knew the error distribution and true parameters we could use

simulation to understand the sampling distribution the least

squares estimates.

Simulation can also be used to demonstrate the CLT at work in

regression.

13

Page 14: Bootstrap Confidence Intervals · Bootstrap Confidence Intervals ST552 Lecture 13 Charlotte Wickham 2019-02-11 1. Motivation The inferences we’ve covered so far relied on our

Bootstrap confidence intervals

In practice, with data in front of us, we don’t know the distribution

of the errors (nor the true parameter values).

The bootstrap is one approach to estimate the sampling

distribution of —, by using the simulation idea, and substituting in

our best guesses for the things we don’t know.

14

Page 15: Bootstrap Confidence Intervals · Bootstrap Confidence Intervals ST552 Lecture 13 Charlotte Wickham 2019-02-11 1. Motivation The inferences we’ve covered so far relied on our

Bootstrapping regression

(Model based resampling)

0. Fit model and find estimates, —, and residuals, ei

1. Fix X ,

2. For k = 1, . . . , B2.1 Generate errors, ‘ú

i sampled with replacement from ei2.2 Construct y , using the model, y = y + ‘ú

2.3 Use least squares to find —ú(k)

3. Examine the distribution of —úand compare to —

One confidence interval for —j is the 2.5% and 97.5% quantiles of

the distribution of —új .

(Known as the Percentile method, there are other (better?)methods).

15

residualsRepeat many

timesy

= x § + E*

(95T

.

Page 16: Bootstrap Confidence Intervals · Bootstrap Confidence Intervals ST552 Lecture 13 Charlotte Wickham 2019-02-11 1. Motivation The inferences we’ve covered so far relied on our

Example: Faraway Galapagos Islands

(I’ll illustrate with simple linear regression, Faraway does multiplecase in 3.6)

−200

0

200

400

0 500 1000 1500Elevation

Spec

ies

Observed data

16

-

:Eth Elevates

Page 17: Bootstrap Confidence Intervals · Bootstrap Confidence Intervals ST552 Lecture 13 Charlotte Wickham 2019-02-11 1. Motivation The inferences we’ve covered so far relied on our

Bootstrap: 1. Find —, y , and ei .

−200

0

200

400

0 500 1000 1500Elevation

Spec

ies

Observed data

17

Assume E Species ;-- fo Tf, Eteuitei

fo 'RE Levit

e i←

Page 18: Bootstrap Confidence Intervals · Bootstrap Confidence Intervals ST552 Lecture 13 Charlotte Wickham 2019-02-11 1. Motivation The inferences we’ve covered so far relied on our

Bootstrap: Using fixed X , ˆbeta from observed data

−200

0

200

400

0 500 1000 1500Elevation

Spec

ies

Observed data

18

he

*

Page 19: Bootstrap Confidence Intervals · Bootstrap Confidence Intervals ST552 Lecture 13 Charlotte Wickham 2019-02-11 1. Motivation The inferences we’ve covered so far relied on our

Bootstrap: 2. Resample residuals to construct bootstrapped

response

−200

0

200

400

0 500 1000 1500Elevation

Spec

ies

Bootstrapped data

19

appeala booth- r

yerror EEO,

eco sampledat

random

€ are from previouspage

Page 20: Bootstrap Confidence Intervals · Bootstrap Confidence Intervals ST552 Lecture 13 Charlotte Wickham 2019-02-11 1. Motivation The inferences we’ve covered so far relied on our

Bootstrap: 3. Fit regression model to bootstrapped response

−200

0

200

400

0 500 1000 1500Elevation

Spec

ies

Bootstrapped data

20

Page 21: Bootstrap Confidence Intervals · Bootstrap Confidence Intervals ST552 Lecture 13 Charlotte Wickham 2019-02-11 1. Motivation The inferences we’ve covered so far relied on our

Bootstrap: 3. Repeat #2. and #3. many times

−200

0

200

400

0 500 1000 1500Elevation

Species

21

Imay regression

lines fit to many

bootstrappeddatasets

Page 22: Bootstrap Confidence Intervals · Bootstrap Confidence Intervals ST552 Lecture 13 Charlotte Wickham 2019-02-11 1. Motivation The inferences we’ve covered so far relied on our

Examine distribution of estimates

0.000

0.005

0.010

0.015

0.020

−50 0 50Bootstrap estimates of intercept

dens

ity

0

5

10

0.10 0.15 0.20 0.25 0.30 0.35Bootstrap estimates of slope

dens

ity

22

telling us about2- File sapling dist

↳ / of ei

197 . site

(otI e:*° C- 25,50 ) : 95 's .CI for Po

Page 23: Bootstrap Confidence Intervals · Bootstrap Confidence Intervals ST552 Lecture 13 Charlotte Wickham 2019-02-11 1. Motivation The inferences we’ve covered so far relied on our

High level: bootstrap idea

We don’t know the distribution of the errors, but our best guess is

probably the empirical c.d.f on the residuals.

Sampling from a random variable with a c.d.f. defined as the

empirical c.d.f. of the residuals, boils down to sampling with

replacement from residuals.

23

inn " " " i : "

D

Page 24: Bootstrap Confidence Intervals · Bootstrap Confidence Intervals ST552 Lecture 13 Charlotte Wickham 2019-02-11 1. Motivation The inferences we’ve covered so far relied on our

Limitations

We might rely on bootstrap confidence intervals when we are

worried about the assumption of Normal errors. But, there are

limitations.

• We still rely on the assumption that the errors are

independent and identically distributed.

• Generally scaled residuals are used (residuals don’t have the

same variance, more later)

• An alternative bootstrap resamples the (yi , xi1, . . . , xip)

vectors, i.e. resamples the rows of the data, a.k.a resamplingcases bootstrap.

24

↳ choice should duped on

Experiment :sresapaEY.akae.de

' 'SJiang say .

.

T

Page 25: Bootstrap Confidence Intervals · Bootstrap Confidence Intervals ST552 Lecture 13 Charlotte Wickham 2019-02-11 1. Motivation The inferences we’ve covered so far relied on our

Limitations

We might rely on bootstrap confidence intervals when we are

worried about the assumption of Normal errors. But, there are

limitations.

• We still rely on the assumption that the errors are

independent and identically distributed.

• Generally scaled residuals are used (residuals don’t have the

same variance, more later)

• An alternative bootstrap resamples the (yi , xi1, . . . , xip)

vectors, i.e. resamples the rows of the data, a.k.a resamplingcases bootstrap.

24