Math 141 Lecture 17: Bootstrapping and Randomization Tests
Albyn Jones
Library [email protected]
www.people.reed.edu/~jones/courses/141
Albyn Jones Math 141
Non-Normal data
Question: The t-test depends on having normally distributed data. What do we do with data that are clearly non-normal, or when we have nonlinear functions of normally distributed data?
Non-Normal Data: Analysis Options
Options:
Transformations: If the data are positively skewed, try working with $\log(Y)$ or $\sqrt{Y}$.
Classical Non-parametric Methods: Including the sign test and rank tests.
Resampling Methods: Computer intensive: randomization tests and bootstrapping.
Distribution-Specific Methods: Other methods are available if you have a specific probability model for your data: see Math 392.
Large Sample Approximations: Possible for nonlinear functions of well-behaved statistics like $\bar{X}$, for example $\log(\bar{X})$ or a ratio $\bar{X}/\bar{Y}$.
Transformations
Often with positively skewed data (long right tail, short left tail), it is helpful to work with the logarithm (natural logarithm!) of the data, or occasionally with the square root. Examples:
$\mathrm{SD}(Y) \propto Y \implies \log y$
$\mathrm{Var}(Y) \propto Y \implies \sqrt{y}$
In certain disciplines such transformations are standard operating procedure. For example, with economic data such as income or wealth, always consider working in the log scale.
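As a rough illustration of why the log scale helps, here is a minimal sketch in R. It uses simulated log-normal data rather than the CPS wages shown in the slides, and `skewness` is a hypothetical helper defined only for this example.

```r
# Simulated positively skewed data (an assumption for illustration;
# these are not the CPS wage data).
set.seed(141)
y <- rlnorm(500, meanlog = 2, sdlog = 0.6)

# Hypothetical helper: sample skewness (third standardized moment).
skewness <- function(v) {
  z <- (v - mean(v)) / sd(v)
  mean(z^3)
}

skew_raw <- skewness(y)       # positive: long right tail
skew_log <- skewness(log(y))  # close to zero: roughly symmetric

# Q-Q plots of the kind shown in the slides can be drawn with:
# qqnorm(y); qqline(y)
# qqnorm(log(y)); qqline(log(y))
```

For data like these, the skewness of `log(y)` should be much closer to zero than that of `y`, which is what a straighter normal Q-Q plot reflects.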
Log Transformation: Example
[Figure: two normal Q-Q plots (x-axis: Theoretical Quantiles), one of CPS wages and one of CPS log(wages).]
Log Transformation: Example
[Figure: two kernel density plots (y-axis: Density), one of Wages (N = 534, bandwidth = 1.148) and one of log(Wages) (N = 534, bandwidth = 0.1353).]
Sign Test and Rank Test
Wilcoxon Rank Sum and Signed Rank Tests
Performs one- and two-sample Wilcoxon tests on vectors of data; the latter is also known as the Mann-Whitney test.
wilcox.test(x, y = NULL,
            alternative = c("two.sided", "less", "greater"),
            mu = 0, paired = FALSE, exact = NULL, correct = TRUE,
            conf.int = FALSE, conf.level = 0.95, ...)
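A short usage sketch of the call signature above, on two small toy vectors chosen only for illustration. The component names `statistic` and `p.value` are the standard fields of the object that `wilcox.test` returns.

```r
# Toy data for illustration only.
x <- c(1, 2, 3, 4, 7)
y <- c(6, 8, 9, 10)

# Two-sample rank sum (Mann-Whitney) test. With small samples and no
# ties, R computes an exact p-value by default.
res <- wilcox.test(x, y, alternative = "two.sided")

res$statistic  # W: the number of (x, y) pairs with x > y
res$p.value
```

Because the test uses only ranks, replacing 10 by 1000 in `y` would leave both `W` and the p-value unchanged.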
Randomization Tests
Suppose we have randomly assigned subjects to one of two treatments. We record some score for each subject. Under the null hypothesis that there is no difference between the two treatments, the two sets of scores have the same distribution, and thus each subject’s score is equally likely to belong to either group.
Randomization Test: randomly shuffle the scores many times. Each time, record the value of the test statistic (e.g. the difference between the average scores of the two groups). The p-value for a two-sided test is the proportion of the shuffled test statistics that are more extreme in absolute value than the observed statistic.
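The procedure just described can be sketched as a small R function. The name `randomization_test`, its arguments, and the scores below are all hypothetical, made up for this illustration.

```r
# Sketch of a two-sided randomization test for a difference in means.
# The function name and arguments are hypothetical.
randomization_test <- function(g1, g2, n_shuffles = 10000) {
  observed <- mean(g1) - mean(g2)
  pooled <- c(g1, g2)
  n1 <- length(g1)
  shuffled <- replicate(n_shuffles, {
    s <- sample(pooled)                # random relabelling of the scores
    mean(s[1:n1]) - mean(s[-(1:n1)])   # statistic for the shuffled groups
  })
  list(observed = observed,
       p_value = mean(abs(shuffled) > abs(observed)))
}

set.seed(17)
a <- c(12.1, 9.8, 11.4, 13.0, 10.7, 12.6)  # made-up scores
b <- c(9.9, 10.2, 8.7, 11.1, 9.4, 10.0)    # made-up scores
out <- randomization_test(a, b, n_shuffles = 2000)
out$p_value
```

The p-value changes slightly from run to run unless the seed is fixed, because it is estimated from a random subset of all possible shuffles.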
Randomization Tests: Simple Example
> X <- c(1,2,3,4,7)
> Y <- c(6,8,9,10)
> mean(X)-mean(Y)
[1] -4.85
> A <- c(X,Y)
> A
[1] 1 2 3 4 7 6 8 9 10
> B <- sample(A)
> B
[1] 2 7 1 9 8 3 6 4 10
> mean(B[1:5]) - mean(B[6:9])
[1] -0.35
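With only nine scores there are choose(9, 5) = 126 ways to split them into groups of five and four, so the shuffles can be enumerated rather than sampled. The sketch below does this with `combn`. Note that it counts splits at least as extreme as the observed one (using `>=`), which is a common convention but differs slightly from the strict `>` used elsewhere in these slides.

```r
X <- c(1, 2, 3, 4, 7)
Y <- c(6, 8, 9, 10)
A <- c(X, Y)
observed <- mean(X) - mean(Y)

# Each column of `splits` lists the positions assigned to the first group.
splits <- combn(length(A), length(X))
diffs <- apply(splits, 2, function(idx) mean(A[idx]) - mean(A[-idx]))

n_splits <- length(diffs)  # number of distinct splits

# Proportion of splits at least as extreme as the observed one
# (the small tolerance guards against floating-point ties).
p_exact <- mean(abs(diffs) >= abs(observed) - 1e-12)
p_exact
```

Full enumeration stops being practical quickly: for groups of 28 and 26 the number of splits is astronomically large, which is why random shuffling is used for larger data sets.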
Randomization Tests: Example
> CogB
1.7 0.7 -0.1 -0.7 -3.5 14.9 3.5 17.1 -7.6
1.6 11.7 6.1 1.1 -4.0 20.9 -9.1 2.1 -1.4
1.4 -0.3 -3.7 -0.8 12.6 1.9 3.9 0.1 15.4 -0.7
> Control
-0.5 -9.3 -5.4 12.3 -2.0 -10.2 -12.2 11.6 -7.1
6.2 -0.2 -9.2 8.3 3.3 11.3 0.0 -1.0 -10.6
-4.6 -6.7 2.8 0.3 1.8 3.7 15.9 -10.2
> length(CogB)
[1] 28
> length(Control)
[1] 26
Example: One step
> mean(CogB)-mean(Control)
[1] 3.478571
#
# one trial
#
> x <- c(CogB,Control)
> y <- sample(x) # shuffle the data!
> G1 <- y[1:28]
> G2 <- y[29:54]
> mean(G1)-mean(G2)
[1] -2.158791
Details
# the data, in original order
1.7 0.7 -0.1 -0.7 -3.5 14.9 3.5
17.1 -7.6 1.6 11.7 6.1 1.1 -4.0
20.9 -9.1 2.1 -1.4 1.4 -0.3 -3.7
-0.8 12.6 1.9 3.9 0.1 15.4 -0.7
---------------------------------------
-0.5 -9.3 -5.4 12.3 -2.0 -10.2 -12.2
11.6 -7.1 6.2 -0.2 -9.2 8.3 3.3
11.3 0.0 -1.0 -10.6 -4.6 -6.7
2.8 0.3 1.8 3.7 15.9 -10.2
Shuffled Data
> sample(x)
-0.7 20.9 -0.2 14.9 11.3 15.9 -10.2
-9.2 0.3 2.8 -1.0 -10.6 -7.6 1.9
-12.2 12.3 3.5 1.1 15.4 6.2 -0.5
0.0 -5.4 -4.0 -3.7 -0.8 12.6 -0.3
---------------------------------------
-3.5 11.6 -0.7 -1.4 -10.2 11.7 1.7
-7.1 1.6 3.9 2.1 -2.0 3.7 0.7
-4.6 -6.7 0.1 6.1 1.4 17.1 -0.1
8.3 -9.1 3.3 -9.3 1.8
Example: 10000 trials, a p-value!
> diffs <- rep(0,10000)
> for(i in 1:10000){
+ y <- sample(x)
+ diffs[i] <- mean(y[1:28])-mean(y[29:54])
+ }
> sum(abs(diffs)>3.478571)/10000
[1] 0.1014
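Because the p-value above comes from 10000 random shuffles, it is itself an estimate. One standard way to gauge its Monte Carlo error, not part of the original slides, is to treat the count of extreme shuffles as binomial. A minimal sketch:

```r
p_hat <- 0.1014      # randomization p-value reported above
n_shuffles <- 10000  # number of shuffles used

# Binomial standard error of an estimated proportion.
mc_se <- sqrt(p_hat * (1 - p_hat) / n_shuffles)
mc_se  # roughly 0.003
```

So a rerun with a different random seed would typically give a p-value within about 0.006 of 0.1014, and more shuffles would shrink that margin.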
Compare: the t-test
> t.test(CogB,Control,var.equal=TRUE)

        Two Sample t-test

data:  CogB and Control
t = 1.6567, df = 52, p-value = 0.1036
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.7347931  7.6919359

# The randomization p-value was 0.1014
Compare: the rank sum test
> wilcox.test(CogB,Control,conf.int=TRUE)

        Wilcoxon rank sum test with continuity correction

data:  CogB and Control
W = 455, p-value = 0.1172
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 -0.6999481  8.2000132
Randomization test advantages
The randomization test requires only that subjects were randomly assigned to groups. (Independent samples from two populations is a stronger condition!)
It does not assume that the data have any particular distribution.
It applies to any test statistic.
Unlike the sign test and rank tests, it does not discard any information.
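To illustrate the point that the method applies to any test statistic, the sketch below passes the statistic in as a function and uses a difference in medians. The helper names `shuffle_test` and `median_diff`, and the data, are hypothetical.

```r
# Hypothetical helper: randomization test for an arbitrary
# two-group statistic supplied as a function `stat(g1, g2)`.
shuffle_test <- function(g1, g2, stat, n_shuffles = 5000) {
  observed <- stat(g1, g2)
  pooled <- c(g1, g2)
  n1 <- length(g1)
  shuffled <- replicate(n_shuffles, {
    s <- sample(pooled)
    stat(s[1:n1], s[-(1:n1)])
  })
  mean(abs(shuffled) >= abs(observed))  # two-sided p-value
}

median_diff <- function(u, v) median(u) - median(v)

set.seed(141)
g1 <- c(3.1, 4.7, 5.0, 6.2, 8.9, 12.4, 20.3)  # made-up, right-skewed
g2 <- c(2.2, 2.9, 3.5, 4.1, 4.8, 5.6)         # made-up
p_median <- shuffle_test(g1, g2, median_diff)
p_median
```

Swapping in another statistic requires changing only the function passed as `stat`, provided the statistic is centered at zero under the null hypothesis so that the absolute-value comparison makes sense.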
Example: Test $H_0: \sigma_x^2/\sigma_y^2 = 1$
We can test hypotheses about other parameters of interest!
> ratios <- rep(0,10000)
> for(i in 1:10000){
+ y <- sample(x)
+ ratios[i] <- var(y[1:28])/var(y[29:54])
+ }
> var(CogB)/var(Control)
[1] 0.8677363
> r <- var(CogB)/var(Control)
> sum(ratios< r | ratios > 1/r)/10000
[1] 0.7025
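The two-sided rule `ratios < r | ratios > 1/r` works as written because the observed ratio is below 1. A hedged sketch of an equivalent rule on the log scale, which also covers an observed ratio above 1, is given below. It uses simulated stand-ins for the two groups, not the CogB and Control data.

```r
set.seed(392)
g1 <- rnorm(28, sd = 3)  # simulated stand-ins, not the CogB data
g2 <- rnorm(26, sd = 3)  # simulated stand-ins, not the Control data
pooled <- c(g1, g2)
r <- var(g1) / var(g2)

ratios <- replicate(5000, {
  s <- sample(pooled)
  var(s[1:28]) / var(s[29:54])
})

# Log-scale rule: a ratio and its reciprocal are equally extreme.
p_log <- mean(abs(log(ratios)) > abs(log(r)))

# The slide's rule, after forcing the reference ratio below 1.
r_low <- min(r, 1 / r)
p_slide <- mean(ratios < r_low | ratios > 1 / r_low)
```

The two p-values agree because `abs(log(ratio)) > abs(log(r))` holds exactly when the ratio falls below `min(r, 1/r)` or above `max(r, 1/r)`.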
Bootstrapping
Bootstrapping is another resampling method, more general in many respects, slightly less so in others. In particular, it is easy to produce confidence intervals, as well as to test specific hypotheses.
The Idea: The Empirical CDF estimates the population CDF. The behavior of statistics computed on samples taken from the distribution represented by the ECDF approximates the behavior of statistics computed on samples taken from the population.
The Method: Repeatedly resample the dataset with replacement, computing the statistic of interest each time.
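The idea and the method can be sketched in a few lines of R. Drawing from the ECDF is the same as sampling the observed values with replacement. The helper `boot_ci` and the simulated data are assumptions made for this illustration.

```r
# Hypothetical helper: percentile bootstrap CI for a one-sample statistic.
boot_ci <- function(data, stat, n_boot = 5000, level = 0.95) {
  # Sampling with replacement = drawing from the empirical CDF.
  reps <- replicate(n_boot, stat(sample(data, replace = TRUE)))
  alpha <- 1 - level
  quantile(reps, c(alpha / 2, 1 - alpha / 2))
}

set.seed(141)
obs <- rexp(40, rate = 1 / 5)  # simulated skewed sample

Fhat <- ecdf(obs)  # the empirical CDF as an R function
Fhat(5)            # estimated P(X <= 5)

ci_median <- boot_ci(obs, median)
ci_median
```

Passing a different function as `stat`, for example `mean` or `sd`, gives an interval for that quantity with no other changes.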
Example: Simple 95% CI for difference of means
> BootDiffs <- rep(0,20000)
> for(i in 1:20000){
+ x <- sample(CogB,size=28,replace=TRUE)
+ y <- sample(Control,size=26,replace=TRUE)
+ BootDiffs[i] <- mean(x) - mean(y)
+ }
> quantile(BootDiffs,c(.025,.975))
      2.5%      97.5%
-0.5198077  7.5454190
# The 95% CI based on the t test: (-0.735, 7.692)
Another Bootstrap CI
> quantile(BootDiffs,c(.025,.975))
      2.5%      97.5%
-0.5198077  7.5454190
> sd(BootDiffs)
[1] 2.045221
> mean(CogB)-mean(Control)
[1] 3.478571
> 3.478571 + c(-1,1)*2*2.045221
[1] -0.611871 7.569013
# The 95% CI based on the t test: (-0.735, 7.692)
Bootstrapping Paired Samples
Suppose we have paired samples, such as the Crossed and Self pollinated pairs in Darwin’s experiment: treat it as a one-sample problem!
> Darwin
   Pot Crossed   Self
1    1  23.500 17.375
2    1  12.000 20.375
3    1  21.000 20.000
<................>
14   4  23.000 15.500
15   4  12.000 18.000
Example: Darwin’s Data
> attach(Darwin)
> Y <- Crossed - Self
> B <- rep(0,10000)
> for(i in 1:10000){
+ y <- sample(Y,size=15,replace=TRUE)
+ B[i] <- mean(y)
+ }
> quantile(B,c(.025,.975))
     2.5%     97.5%
0.2247917 4.7750000
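The key step with paired data is to resample whole pairs, which is the same as resampling the within-pair differences. The sketch below shows the mechanics using only the five rows visible in the printed excerpt of the data. The other ten pairs are elided in the slide and are not reproduced, so the resulting interval will not match the one reported above.

```r
# Only the five rows visible in the printed excerpt (rows 1-3, 14, 15);
# the remaining ten pairs are not shown in the slide.
crossed <- c(23.500, 12.000, 21.000, 23.000, 12.000)
self    <- c(17.375, 20.375, 20.000, 15.500, 18.000)

d <- crossed - self  # one difference per pair keeps the pairing intact

set.seed(15)
boot_means <- replicate(5000, mean(sample(d, replace = TRUE)))
quantile(boot_means, c(0.025, 0.975))
```

Resampling `crossed` and `self` independently would break the pairing and treat the data as two unrelated samples, which is not the design of the experiment.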
Compare to the t-test
> t.test(Y)

        One Sample t-test

data:  Y
t = 2.148, df = 14, p-value = 0.0497
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 0.003899165 5.229434169

# Bootstrap CI: (0.2247917, 4.7750000)
Comparison: Two Anorexia Therapies
Method          p-value   95% CI
t.test          0.104     (-0.73, 7.69)
wilcox.test     0.117     (-0.70, 8.20)
randomization   0.101
bootstrap                 (-0.52, 7.55)
boot.se         0.095     (-0.61, 7.57)
The R boot package
There is an R package with more sophisticated bootstrapping functions, package boot:
install.packages("boot")
library(boot)
?boot
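A minimal usage sketch of the package, assuming it is installed. The data are simulated and `mean_stat` is a hypothetical name. `boot()` expects a statistic that takes the data and a vector of resampled indices, and `boot.ci()` computes intervals from the result.

```r
library(boot)

set.seed(141)
obs <- rexp(40, rate = 1 / 5)  # simulated data for illustration

# boot() calls the statistic with the data and the resampled indices.
mean_stat <- function(data, idx) mean(data[idx])

b <- boot(obs, statistic = mean_stat, R = 2000)

# Percentile interval, comparable to quantile(..., c(.025, .975)) above.
ci <- boot.ci(b, conf = 0.95, type = "perc")
ci$percent  # the last two entries are the lower and upper limits
```

Other interval types, such as `type = "bca"`, are available through the same `boot.ci` call and are among the more sophisticated options the package offers.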