The Bootstrap and Jackknife Methods for Data Analysis

Rainer W. Schiel

Institut für Theoretische Physik, Universität Regensburg

December 21, 2011



A very simple example

Zero-dimensional data (only one quantity measured)

from an experiment

or from a Monte-Carlo simulation

[Figure: plot of the measured data; horizontal axis 200 to 1000, vertical axis 5 to 25]


Plot the Histogram:

[Figure: histogram of the measured values; horizontal axis 5 to 25, vertical axis 20 to 140]


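Average: $\bar{x} = \frac{1}{N} \sum_i x_i$ (= 14.9043...)

Standard deviation: $\sigma = \sqrt{\frac{1}{N-1} \sum_i (x_i - \bar{x})^2}$ (= 3.2298...)

Standard deviation of the mean: $\sigma_\text{mean} = \frac{1}{\sqrt{N}} \sigma$ (= 0.1021...)

Result: $x = \bar{x} \pm \sigma_\text{mean}$ (= 14.90 ± 0.10)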
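
The three formulas above can be sketched in a few lines of Python. This is only an illustration: the function name `mean_and_error` is made up here, and the data are a hypothetical Gaussian stand-in (mean 14.9, width 3.2), since the original measurements are not part of the slides.

```python
import math
import random


def mean_and_error(data):
    """Return (mean, standard deviation, standard deviation of the mean)."""
    n = len(data)
    mean = sum(data) / n
    # Sample variance with the 1/(N-1) normalisation used on the slide.
    variance = sum((x - mean) ** 2 for x in data) / (n - 1)
    sigma = math.sqrt(variance)
    sigma_mean = sigma / math.sqrt(n)
    return mean, sigma, sigma_mean


# Hypothetical stand-in for the 1000 measurements (assumption, not the real data).
rng = random.Random(0)
data = [rng.gauss(14.9, 3.2) for _ in range(1000)]

mean, sigma, sigma_mean = mean_and_error(data)
print(f"result: {mean:.2f} +/- {sigma_mean:.2f}")
```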


Standard deviation of the mean: an Alternative?

Repeat the whole experiment (i.e. taking 1000 data points) many (say 2500) times

Compute the means of each experiment (→ 2500 means)

[Figure: histogram of the 2500 means; horizontal axis 14.7 to 15.3, vertical axis 50 to 200]

Standard deviation of these means is σmean (compare to 14.90 ± 0.10)

The method works, but it is impractical

⇒ use re-sampling techniques
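
On a computer, the "repeat the whole experiment" idea can at least be simulated. The sketch below assumes, purely for illustration, that each measurement is drawn from a Gaussian with mean 14.9 and width 3.2; the function name `repeat_experiment` and all parameters are hypothetical. In a real experiment the underlying distribution is unknown, which is why this brute-force approach is not available in practice.

```python
import random
import statistics


def repeat_experiment(n_points=1000, n_repeats=2500, mu=14.9, sigma=3.2, seed=1):
    """Simulate n_repeats independent experiments and return their means."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_repeats):
        # One full "experiment": n_points fresh measurements (assumed Gaussian).
        sample = [rng.gauss(mu, sigma) for _ in range(n_points)]
        means.append(statistics.fmean(sample))
    return means


means = repeat_experiment()
# The spread of the means should be roughly sigma / sqrt(n_points).
print(statistics.stdev(means))
```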


The Bootstrap

“A loop sewn at the top rear or sometimes on each side of a boot to facilitate pulling it on.”

It is said that Baron von Münchhausen pulled himself (and his horse) up by his own bootstraps.


The bootstrap re-sampling technique

Do experiment once (here: 1000 measurements)

Use this data to “repeat the experiment” over and over again (here: 2500 times)

“Repeat the experiment” by randomly picking (with replacement) 1000 measurements

Do this 2500 times

Compute the mean of each sample
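
The steps above can be sketched as follows. This is a minimal illustration, not the author's code: `bootstrap_means` is a name invented here, and the data are a hypothetical Gaussian stand-in for the 1000 measurements.

```python
import random
import statistics


def bootstrap_means(data, n_samples=2500, seed=0):
    """Draw n_samples bootstrap samples and return the mean of each."""
    rng = random.Random(seed)
    n = len(data)
    # random.choices picks with replacement, so a sample may repeat or omit points.
    return [statistics.fmean(rng.choices(data, k=n)) for _ in range(n_samples)]


# Hypothetical stand-in for the single experiment (assumption, not the real data).
rng = random.Random(42)
data = [rng.gauss(14.9, 3.2) for _ in range(1000)]

boot = bootstrap_means(data)
print("mean of the means:", statistics.fmean(boot))
print("standard deviation of the means:", statistics.stdev(boot))
```

To bootstrap a different quantity, replace `statistics.fmean` by the function that computes that quantity from one sample.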

[Figure: histogram of the 2500 bootstrap sample means; horizontal axis 14.6 to 15.2, vertical axis 50 to 200]


The mean of the means is approximately the mean of the original data set

The standard deviation of the means of the samples is σmean
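
As a cross-check of the last statement, the bootstrap variance of the mean can be worked out in closed form. The sketch below assumes that each bootstrap sample draws $N$ values independently and uniformly from the original data; the symbols $\bar{x}^*$ (mean of one bootstrap sample) and $\operatorname{Var}^*$ (variance over bootstrap samples) are notation introduced here, not taken from the slides.

```latex
\operatorname{Var}^*\!\left(\bar{x}^*\right)
  = \frac{1}{N}\cdot\frac{1}{N}\sum_i (x_i-\bar{x})^2
  = \frac{N-1}{N}\,\frac{\sigma^2}{N}
  = \frac{N-1}{N}\,\sigma_\text{mean}^2
\quad\Longrightarrow\quad
\sqrt{\operatorname{Var}^*\!\left(\bar{x}^*\right)} \approx \sigma_\text{mean}
\quad (N \gg 1)
```

For $N = 1000$ the factor $\sqrt{(N-1)/N}$ differs from 1 by about 0.05%, which is small compared with the scatter from using a finite number of bootstrap samples.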


Advantages of the Bootstrap method

No need to propagate errors ... never again!
  Just compute the final quantity that you’re interested in on all samples
  The errors are automatically propagated to the final result

Handle data with skewed distributions
  The data of the samples is just as skewed as the original data

Perform more complicated data analysis
  E.g. consider two noisy data sets $f_i(x)$ and $g_i(x)$
  You are interested in a fit to $f(x)/g(x)$
  With traditional methods, this can be difficult (division by zero!)
  No problem with the bootstrap method, though
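
A reduced version of the ratio example might look like the sketch below. It makes several assumptions that are not in the slides: only a single value of $x$ is considered (so no fit is performed), the measurements $f_i$ and $g_i$ are paired and therefore resampled with the same indices, and the data are hypothetical Gaussian numbers. The name `bootstrap_ratio` is invented here. Individual $g_i$ can be close to zero, so $f_i/g_i$ is badly behaved, while the ratio of the sample means is not.

```python
import random
import statistics


def bootstrap_ratio(f, g, n_samples=2500, seed=0):
    """Bootstrap estimate and error of mean(f) / mean(g) for paired data."""
    rng = random.Random(seed)
    n = len(f)
    ratios = []
    for _ in range(n_samples):
        # Resample indices once so that f_i and g_i stay paired (assumption).
        idx = rng.choices(range(n), k=n)
        f_mean = statistics.fmean([f[i] for i in idx])
        g_mean = statistics.fmean([g[i] for i in idx])
        # The final quantity is computed on every sample; no error propagation.
        ratios.append(f_mean / g_mean)
    return statistics.fmean(ratios), statistics.stdev(ratios)


# Hypothetical noisy data sets at one fixed x (assumption, not real data).
rng = random.Random(7)
f = [rng.gauss(2.0, 0.5) for _ in range(500)]
g = [rng.gauss(1.0, 0.5) for _ in range(500)]

ratio, error = bootstrap_ratio(f, g)
print(f"f/g = {ratio:.3f} +/- {error:.3f}")
```

For a fit over many values of $x$, the same loop would perform the fit on each bootstrap sample and collect the fit parameters instead of a single ratio.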


Disadvantages of the Bootstrap method

Computationally intensive
  Though usually not a problem with today’s computers

You might have to write your own code
  But you might have to do that for other data analysis methods, too

A reasonably large original data sample is needed (> O(100))


The Jackknife

According to www.artofmanliness.com, the Jackknife is a pocket knife that “has a simple hinge at one end, and may have more than one blade.”


The single-elimination jackknife re-sampling technique

Somewhat similar to the bootstrap re-sampling technique

Do experiment once (here: 1000 measurements)

Make 1000 samples of 999 measurements each, with the $i$-th measurement eliminated in the $i$-th sample

Compute the mean (or the wanted quantity) of each sample
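
The steps above can be sketched as follows. The function names and the Gaussian stand-in data are hypothetical. The error formula in `jackknife_error` is the standard jackknife estimate, $\sqrt{\frac{N-1}{N}\sum_i(\theta_i-\bar{\theta})^2}$, which is not stated on the slides and is included here as an assumption about how the spread of the sample means is usually turned into an error.

```python
import math
import random
import statistics


def jackknife_means(data):
    """Return the N leave-one-out means (the i-th point removed in sample i)."""
    n = len(data)
    total = sum(data)
    # Subtracting x_i from the total avoids rebuilding each (N-1)-element sample.
    return [(total - x) / (n - 1) for x in data]


def jackknife_error(estimates):
    """Standard jackknife error from N leave-one-out estimates (assumed formula)."""
    n = len(estimates)
    centre = statistics.fmean(estimates)
    return math.sqrt((n - 1) / n * sum((e - centre) ** 2 for e in estimates))


# Hypothetical stand-in for the 1000 measurements (assumption, not the real data).
rng = random.Random(42)
data = [rng.gauss(14.9, 3.2) for _ in range(1000)]

jk = jackknife_means(data)
print("mean of the means:", statistics.fmean(jk))
print("standard deviation of the means:", statistics.stdev(jk))
print("jackknife error:", jackknife_error(jk))
```

For a quantity other than the mean, the leave-one-out samples have to be built explicitly and the quantity evaluated on each of them.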

[Figure: histogram of the 1000 jackknife sample means; horizontal axis 14.895 to 14.915, vertical axis 20 to 120]


The mean of the means is the mean of the original data set

The standard deviation of the means is $\sigma_\text{mean} / \sqrt{N}$ (where $N$ is the number of measurements (here: 1000))
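
The size of this spread can be checked with a short calculation. The notation $\bar{x}_{(i)}$ for the mean of the sample with the $i$-th measurement removed is introduced here and does not appear on the slides; the standard deviation is taken with the same $1/(N-1)$ normalisation as before.

```latex
\bar{x}_{(i)} = \frac{N\bar{x} - x_i}{N-1}
\quad\Longrightarrow\quad
\bar{x}_{(i)} - \bar{x} = \frac{\bar{x} - x_i}{N-1}

\sqrt{\frac{1}{N-1}\sum_i \left(\bar{x}_{(i)} - \bar{x}\right)^2}
  = \frac{\sigma}{N-1}
  = \frac{\sqrt{N}}{N-1}\,\sigma_\text{mean}
  \approx \frac{\sigma_\text{mean}}{\sqrt{N}}
```

So the jackknife means are much more narrowly distributed than the bootstrap means, and the error on the mean is recovered by multiplying their spread by roughly $\sqrt{N}$ (exactly $(N-1)/\sqrt{N}$).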


Bootstrap vs. Jackknife

The bootstrap method handles skewed distributions better

The jackknife method is suitable for smaller original data samples


Conclusions

The bootstrap and jackknife methods are powerful tools for data analysis

They are very well suited to analyzing lattice data
