Revisiting an Old Topic: Probability of Replication
D. Lizotte, E. Laber & S. Murphy
Johns Hopkins Biostatistics
September 23, 2009
Outline
• Scientific Background
• Our Estimand: Probability of Selection
• Estimators
• STAR*D
• Where to go from here?
Scientific Background
First experiment results in– or – ,
– what is the chance that we will replicate this result in a subsequent experiment?
– Prob. of Concurrence or Prob. of Replication
– Killeen (2005) followed by great controversy in psychology (Cumming (2005, 2006, 2008); MacDonald (2005); Doros & Geier (2005); Iverson (2008); Iverson, Wagenmakers & Lee (2008); Ashby & O’Brien (2008); Iverson, Lee & Wagenmakers (2009); ...)
Scientific Background
Similar problem but discredited:
• Post-hoc power / observed power: assuming the observed standardized effect size is the truth, calculate the probability of rejecting the null hypothesis (Hoenig & Heisey, 2001).
Scientific Background
First experiment results in – or – ,
– what is the chance that we will replicate this result in a subsequent experiment?
• Why is this question so attractive?
• Scientists (including statisticians!) often want to answer this question with 1 – p-value
Scientific Background
• First experiment results in – or – ,
– what is the chance that we will replicate this result in a subsequent experiment?
• 1 – p-value does not address this question.
– Goodman (1992), Cumming (2008)
– 1 – p-value is not an estimator.
Scientific Background
• Much confusion about estimand:
– , what is the chance that we will replicate this result in a subsequent experiment?
• Do we want to “estimate”
1) or
2) or
3) or
4) ?
• Good frequentist properties are desired.
Our Estimand
• Probabilities of Selection 2)
• The probability of selection is a composite measure of signal, noise, and sample size
Our Estimand
• Advantages (The Hope) over the concept of p-value
– Close to what many scientists want.
– The intuitive interpretation is correct.
– Does not rely on the correctness of a data generating model for meaning.
– Less ambitious than 3)
• Disadvantages
– We changed the question.
– Some may think that there is no need for a confidence interval; that is wrong.
– Non-regular
Estimators
• Why is this a hard problem?
– The desire for good frequentist properties
– The fact that effect sizes tend to be small relative to the noise.
– This is a non-regular problem: bias is of the same order as variance.
• Back of the envelope calculations:
Estimators
•
• Use plug-in estimator
• Plug-in estimator is 1 – p-value (Goodman, 1992)!
– Nonregular
• Near a uniform distribution if
• If n is large, close to 0 or 1 otherwise
– We can expect to be small.
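The behaviour described on this slide can be checked with a small simulation. The sketch below rests on assumptions that are not in the surviving slide text: a one-sample setting with n i.i.d. normal observations, "selection" meaning a positive sample mean, and the plug-in estimate Φ(√n·x̄/s), which equals one minus a one-sided z-test p-value. Reading the missing condition as "zero true effect" is also an assumption, and all function names are illustrative.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF

def plug_in_selection_prob(x):
    """Plug-in estimate Phi(sqrt(n) * mean / sd), i.e. 1 - one-sided z-test p-value."""
    n = len(x)
    m = sum(x) / n
    s = math.sqrt(sum((v - m) ** 2 for v in x) / (n - 1))
    return PHI(math.sqrt(n) * m / s)

def simulate(mu, n, reps=2000, seed=0):
    """Draw `reps` samples of size n from N(mu, 1) and return the plug-in estimates."""
    rng = random.Random(seed)
    return [plug_in_selection_prob([rng.gauss(mu, 1.0) for _ in range(n)])
            for _ in range(reps)]

null_draws = sorted(simulate(mu=0.0, n=50))   # assumed case: true effect is zero
alt_draws = simulate(mu=0.5, n=400)           # sqrt(n) * mu / sigma = 10

# With a zero effect the estimates spread roughly uniformly over (0, 1) ...
print([round(null_draws[int(q * len(null_draws))], 2) for q in (0.1, 0.5, 0.9)])
# ... while with a large standardized effect they pile up near 1.
print(min(alt_draws))
```

Under these assumptions the estimator never settles down when the effect is zero, however large n is, which is one way to see the non-regularity.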
Estimators
• Try a Bayesian approach.
– Random sample from a ,
– Flat prior on , known
– Use as an estimator of
–
• Bayesian methods do not eliminate non-regularity.
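As a hedged illustration of the flat-prior calculation, suppose (the slide's own symbols did not survive extraction) that the data are n draws from N(μ, σ²) with σ known, and that the quantity of interest is Φ(√n·μ/σ), the probability that a fresh sample mean is positive. Under a flat prior the posterior of μ is N(x̄, σ²/n), and the posterior mean of Φ(√n·μ/σ) has the closed form Φ(z/√2) with z = √n·x̄/σ. The sketch below checks the closed form by Monte Carlo; the function names and the numbers are hypothetical.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF

def flat_prior_estimate(xbar, sigma, n):
    """Closed form: posterior mean of Phi(sqrt(n) * mu / sigma) under a flat prior on mu."""
    z = math.sqrt(n) * xbar / sigma
    return PHI(z / math.sqrt(2.0))

def flat_prior_estimate_mc(xbar, sigma, n, draws=100_000, seed=1):
    """Monte Carlo version: draw mu from its posterior N(xbar, sigma^2 / n) and average."""
    rng = random.Random(seed)
    se = sigma / math.sqrt(n)
    total = 0.0
    for _ in range(draws):
        mu = rng.gauss(xbar, se)
        total += PHI(math.sqrt(n) * mu / sigma)
    return total / draws

closed = flat_prior_estimate(xbar=0.3, sigma=1.0, n=25)      # z = 1.5
approx = flat_prior_estimate_mc(xbar=0.3, sigma=1.0, n=25)
print(closed, approx)
```

In this setting the Bayesian estimate is still a function of z alone; it is the 1 – p-value pulled towards 1/2.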
Estimators
Focus on MSE in formulating estimators for .
1) Assume is approximately normal with mean and variance
   1) Flat prior (e.g. Killeen’s p_rep)
   2) Normal prior:
   3) Prior is a mixture between N(0,1) with probability w and a point mass on with probability 1-w
Estimators
Focus on MSE in formulating estimators for .
2) Single bootstrap (Efron & Tibshirani, 1989).
• This is 1 – p-value. No assumption of approximate normality. If is approximately normal then this is approximately the plug-in estimator:
3) Double bootstrap
• This is a bagged plug-in estimator. This bags the 1 – bootstrap p-value. No assumption of approximate normality.
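A minimal sketch of the two bootstrap estimators, under assumptions chosen for illustration rather than taken from the slides: a one-sample problem, the sample mean as the statistic, and "selection" meaning a positive mean. The single bootstrap counts the proportion of resamples in which the event occurs; the double bootstrap averages that proportion over an outer layer of resamples (bagging). Function names, resample counts, and the data are hypothetical.

```python
import random

def resample(x, rng):
    """Draw a bootstrap resample: same size as x, sampled with replacement."""
    return rng.choices(x, k=len(x))

def single_bootstrap(x, rng, b=200):
    """Proportion of bootstrap resamples whose mean (equivalently, sum) is positive."""
    hits = sum(sum(resample(x, rng)) > 0 for _ in range(b))
    return hits / b

def double_bootstrap(x, rng, b_outer=100, b_inner=100):
    """Bagged version: average the single-bootstrap estimate over outer resamples."""
    inner = [single_bootstrap(resample(x, rng), rng, b_inner) for _ in range(b_outer)]
    return sum(inner) / b_outer

rng = random.Random(2009)
data = [rng.gauss(0.2, 1.0) for _ in range(25)]   # hypothetical sample, n = 25
p_single = single_bootstrap(data, rng, b=2000)
p_double = double_bootstrap(data, rng)
print(p_single, p_double)
```

The nested loop costs b_outer × b_inner resamples, so the counts above are kept small; a real analysis would likely use more.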
Why a double bootstrap?
Double bootstrap estimator for .
• Bagging is used to trade variance for bias when estimators are unstable (Bühlmann & Yu, 2002).
• The bootstrap estimator of is unstable; if it does not converge as the sample size increases.
• Under local alternatives such as the bootstrap estimator is inconsistent as well.
Double Bootstrap
Double bootstrap estimator for .
If has an approximate normal distribution then the double bootstrap estimator is
That is, the double bootstrap reduces to p_rep in this case.
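The reduction can be verified in a simplified setting that is assumed here for illustration, since the slide's own formula did not survive extraction: a statistic θ̂ that is N(θ, σ²/n) with σ treated as known, z = √n·θ̂/σ, and bootstrap replicates that are themselves normal around the value they were resampled from.

```latex
\begin{align*}
\text{single bootstrap:}\quad
  & P^{*}\bigl(\hat\theta^{*} > 0\bigr) \approx \Phi(z),
  \qquad z = \sqrt{n}\,\hat\theta/\sigma, \\
\text{outer resample:}\quad
  & z^{*} = z + Z, \qquad Z \sim N(0,1), \\
\text{double bootstrap:}\quad
  & E^{*}\bigl[\Phi(z + Z)\bigr]
    = P\bigl(Z' \le z + Z\bigr)
    = P\bigl(Z' - Z \le z\bigr)
    = \Phi\!\bigl(z/\sqrt{2}\bigr).
\end{align*}
```

Here Z′ is a standard normal independent of Z, so Z′ − Z is N(0, 2). The final expression Φ(z/√2) has the same form as Killeen's p_rep.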
MSE Plots
• Two groups, each of size 25
• Two distributions (normal, bimodal)
• Two definitions of – –
• Compare
– p_rep, p_norm, p_mix, single bootstrap, double bootstrap
Estimators
Instead of a point estimator, consider a confidence interval for .
Assume has an approximate normal distribution; then
In this case a confidence interval for can be found from a confidence interval for the standardized effect size:
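One way to see how such an interval could be built is sketched below, under assumptions that are not in the surviving slide text: a one-sample normal model with known σ, and the selection probability written as Φ(√n·θ/σ). Because Φ is monotone, a confidence interval for the standardized effect √n·θ/σ maps endpoint by endpoint to an interval for the selection probability with the same coverage. The function name and the numbers are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def selection_prob_ci(xbar, sigma, n, level=0.95):
    """Map a z-interval for sqrt(n) * theta / sigma through Phi (a monotone transform)."""
    z = math.sqrt(n) * xbar / sigma
    crit = STD_NORMAL.inv_cdf(0.5 + level / 2.0)   # e.g. about 1.96 for level = 0.95
    return STD_NORMAL.cdf(z - crit), STD_NORMAL.cdf(z + crit)

lo, hi = selection_prob_ci(xbar=0.3, sigma=1.0, n=25)   # observed z = 1.5
print(lo, hi)
```

In this hypothetical example the 95% interval runs from roughly 0.32 to nearly 1, which illustrates why a point estimate alone can be misleading.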
STAR*D
• Sequenced Treatment Alternatives to Relieve Depression
• Large multi-site study focused on individuals whose depression did not remit with citalopram
• In this trial each individual can proceed through up to 4 stages of treatment. An individual moves to the next stage if not responding to the present treatment.
• Each stage involves a randomization.
STAR*D
• These are data from 683 individuals who did not respond to citalopram and preferred a switch in treatment.
• These individuals were randomized between venlafaxine, bupropion, and sertraline.
• Outcome: Time until remission.
• We model the area under the survival curve from entry into this stage of treatment until 30 months (i.e. the mean of min(T, 30)).
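The link between the area under the survival curve and min(T, 30) can be made concrete with a short sketch. It assumes fully observed remission times, which is a simplification: with dropout or censoring, a censoring-aware estimator would be needed. The data and function names are hypothetical, and the time unit is whatever unit T is measured in.

```python
def restricted_mean(times, tau):
    """Average of min(T, tau) over the sample."""
    return sum(min(t, tau) for t in times) / len(times)

def area_under_survival(times, tau):
    """Same quantity, computed by integrating the empirical survival step function on [0, tau]."""
    n = len(times)
    area, prev, at_risk = 0.0, 0.0, n
    for t in sorted(times):
        if t >= tau:
            break
        area += (at_risk / n) * (t - prev)   # survival is at_risk / n on [prev, t)
        prev, at_risk = t, at_risk - 1
    return area + (at_risk / n) * (tau - prev)

times = [4.0, 9.5, 12.0, 31.0, 45.0]   # hypothetical, fully observed remission times
print(restricted_mean(times, 30), area_under_survival(times, 30))
```

Both functions return the same value on uncensored data, since the expectation of min(T, τ) equals the integral of the survival function from 0 to τ.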
STAR*D
Regression formula at level 2:
STAR*D
• For each s,
• Double Bootstrap
– Inner-most bootstrap counts proportion of “votes” in which
– Outer-most bootstrap averages over the proportion across the bootstrap samples
Discussion
• Definition of the probability of selection when there are more than two treatments.
• Confidence intervals for comparisons between more than two treatments.
• Is there a minimax estimator of the selection probability?
• Is there hope for the replication probability?
Truth in Advertising: STAR*D
Missing Data + Study Drop-Out
• 1200 subjects begin level 2 (i.e. stage 1)
• 42% study dropout during level 2
• 62% study dropout by 30 weeks.
• Approximately 13% item missingness for important variables observed after the start of the study but prior to dropout.
This seminar can be found at: http://www.stat.lsa.umich.edu/~samurphy/seminars/HopkinsBiostat09.23.09.ppt
Email me with questions or if you would like a copy!
Our Estimand
• The probability of selection is a composite measure of signal, noise and sample size
• The p-value is a composite measure of estimated signal, estimated noise and sample size.