
THE REPLICATION PARADOX
Combining Studies Can Decrease Accuracy of Effect Size Estimates

Michèle B. Nuijten, Marcel A. L. M. van Assen, Coosje L. S. Veldkamp, & Jelte M. Wicherts
Tilburg University, [email protected]

SUMMARY

• A replication will increase PRECISION, but…
• A replication adds BIAS if its N is smaller than that of the original study
• But ONLY when there is PUBLICATION BIAS and POWER is not high enough

INTRODUCTION

• PUBLICATION BIAS inflates published effect sizes
• This seems to hold for REPLICATIONS as well [1,2]
• How does this affect effect size estimates when studies are COMBINED?

PROBLEM

• The intuition is: the more information, the better
• If a replication is available, researchers will include it in their effect size estimate
• This seems very logical: statistics teaches us "the higher the N, the higher your precision," and the literature is filled with the advantages of replication
• However, the intuition is false

Imagine you want to estimate a certain effect size. You search the literature and find two identical studies that only differ in size:

[Two study cards: EXPERIMENT A with Nk = 35 and EXPERIMENT B with Nk = 20, with the question "How do I evaluate this information?"]

In a survey, we gave 106 psychology students, 360 social scientists, and 31 quantitative psychologists four study combinations and asked: "Which situation yields the most accurate estimate of the effect size in the population?" The answers are summarized below:

1. A vs. B
2. A vs. A+B
3. A vs. B+B
4. A vs. A+A

(highlighted in the original poster) = majority chose this answer

METHOD

We analytically derived several scenarios in which we investigated the influence of publication bias on the effect size estimate in simple two-independent-samples designs [3,4,5]. We varied N (20 or 35 subjects per group), the population effect size (Cohen's d = 0 to 1), and publication bias (the proportion of non-significant results that gets published: pub = 0, .05, .25, .50, or 1, where pub = 1 means every result is published, i.e., no publication bias). In each scenario we measured the bias in the published effect size (bias = published effect − population effect). We used the N-weighted average of the bias in two single studies to calculate the bias in a replication scenario.
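The poster's derivation is analytical; as a rough illustration of the same setup, here is a Monte Carlo sketch. It is our reconstruction, not the authors' code: the function name `published_bias`, the two-tailed t-test at α = .05, and the example values d = 0.3 and pub = .05 are assumptions.

```python
# Monte Carlo sketch (our illustration, not the authors' analytical derivation):
# bias in the published Cohen's d when non-significant results are published
# only with probability `pub`.
import numpy as np
from scipy import stats

def published_bias(n_per_group, d_true, pub, n_sim=200_000, alpha=0.05, seed=0):
    """Mean published d minus d_true under selective publication."""
    rng = np.random.default_rng(seed)
    g1 = rng.normal(d_true, 1.0, (n_sim, n_per_group))  # treatment group
    g2 = rng.normal(0.0, 1.0, (n_sim, n_per_group))     # control group
    sp = np.sqrt((g1.var(axis=1, ddof=1) + g2.var(axis=1, ddof=1)) / 2)
    d_hat = (g1.mean(axis=1) - g2.mean(axis=1)) / sp    # observed Cohen's d
    t = d_hat * np.sqrt(n_per_group / 2)                # two-sample t statistic
    p = 2 * stats.t.sf(np.abs(t), df=2 * n_per_group - 2)
    # Significant results are always published; non-significant ones
    # only with probability `pub`.
    published = (p < alpha) | (rng.random(n_sim) < pub)
    return d_hat[published].mean() - d_true

# Bias of a replication scenario = N-weighted average of the single-study biases.
bias_A = published_bias(35, d_true=0.3, pub=0.05)  # large study A
bias_B = published_bias(20, d_true=0.3, pub=0.05)  # small study B
bias_AB = (35 * bias_A + 20 * bias_B) / (35 + 20)
print(bias_A, bias_B, bias_AB)  # bias: B > A+B > A — the small study drags bias up
```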

BIAS IN SINGLE STUDIES · BIAS IN REPLICATIONS

[Graphs not transcribed: the bias in the published effect size plotted against the population effect size d, for each level of pub, for single studies and for replication scenarios.]

INTERPRETATION

Researchers preferred two large studies over one. However, because both studies contain the same amount of bias, the weighted average of the two effect sizes contains the same amount of bias as a single large study. Furthermore, adding a small replication to a large study increases bias, because a small study contains more bias to begin with. Finally, the bias in two small studies equals the bias in one small study, which is higher than the bias in a single large study. It turns out that in all scenarios of the questionnaire, a single large study yields the most accurate effect size estimate.
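In symbols (a sketch of the weighted-average argument; the notation is ours):

```latex
\[
\operatorname{bias}(A{+}B)
  = \frac{n_A \operatorname{bias}(A) + n_B \operatorname{bias}(B)}{n_A + n_B},
\qquad n_A = 35,\ n_B = 20.
\]
% Since the smaller study is more biased, bias(B) > bias(A), so
% bias(A) < bias(A+B) < bias(B); likewise bias(A+A) = bias(A)
% and bias(B+B) = bias(B).
```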

Based on our results, the correct answers to the survey question "Which situation yields the most accurate estimate of the effect size in the population?" are as follows. Remember that "A" is a study with Nk = 35 and "B" is a study with Nk = 20.

1. A vs. B
2. A vs. A+B
3. A vs. B+B
4. A vs. A+A

In all four scenarios, the single large study was analytically shown to be the best option.


CONCLUSION

There are several solutions:

• Eliminate publication bias. There are some good initiatives already [6], but this will take time. And what to do with the studies that have already been published?
• Correct for publication bias. However, existing tests have low power or rest on strong assumptions.
• Increase POWER! Only evaluate (and only perform) studies with high power. A quick power check is sketched below.
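As a screening aid (our sketch, not a procedure from the poster: the function `power_two_sample`, the normal approximation, and the example effect d = 0.5 are assumptions):

```python
# Approximate power of a two-sided, two-sample test (normal approximation).
from math import sqrt
from scipy.stats import norm

def power_two_sample(n_per_group, d, alpha=0.05):
    z_crit = norm.ppf(1 - alpha / 2)   # two-sided critical z value
    ncp = d * sqrt(n_per_group / 2)    # noncentrality for two equal groups
    return norm.sf(z_crit - ncp) + norm.cdf(-z_crit - ncp)

for n in (20, 35, 64):
    print(n, round(power_two_sample(n, d=0.5), 2))
# For d = 0.5, 20 per group gives power ~.35 and 35 per group ~.55 — both
# underpowered; roughly 64 per group is needed to reach .80.
```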

So: when you perform a literature search and want the most accurate effect size estimate, discard all underpowered studies.

Scan the QR code with your smartphone to download a copy of this poster, including more detailed information about the calculations and an explanation of the "bumps" in the graphs.

REFERENCES

1. Francis, G. (2012). Publication bias and the failure of replication in experimental psychology. Psychonomic Bulletin & Review, 19(6), 975-991.
2. Ferguson, C. J., & Brannick, M. T. (2012). Publication bias in psychological science: Prevalence, methods for identifying and controlling, and implications for the use of meta-analyses. Psychological Methods, 17, 120-128.
3. Gerber, A. S., Green, D. P., et al. (2001). Testing for publication bias in political science. Political Analysis, 9, 385-392.
4. Kraemer, H. C., Gardner, C., et al. (1998). Advantages of excluding underpowered studies in meta-analysis: Inclusionist versus exclusionist viewpoints. Psychological Methods, 3, 23-31.
5. Button, K. S., Ioannidis, J. P. A., Mokrysz, C., Nosek, B. A., Flint, J., Robinson, E. S. J., & Munafò, M. R. (2013). Power failure: Why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience, 14, 1-12.
6. Open Science Collaboration. (2012). An open, large-scale, collaborative effort to estimate the reproducibility of psychological science. Perspectives on Psychological Science, 7, 657-660.


THE "BUMP" IN THE GRAPH

The bump in the graph can be explained as follows. The total bias is the product of two opposing forces:

1. The bias when pub = 0 decreases as d increases.
2. The relative bias increases as d increases.

Total bias = 1 × 2.

When d is low, the increase in the relative bias (2) overrules the decrease in the bias for pub = 0 (1), so the total bias rises. When d is high, the relative bias increases less than the bias for pub = 0 decreases, so the total bias falls again.
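Written out (our reading of the poster's shorthand "Total bias = 1 × 2"):

```latex
\[
\operatorname{bias}(d,\mathit{pub})
  = \underbrace{\operatorname{bias}(d,0)}_{\text{force 1: decreases in } d}
    \times
    \underbrace{\frac{\operatorname{bias}(d,\mathit{pub})}{\operatorname{bias}(d,0)}}_{\text{force 2: relative bias, increases in } d}
\]
```

The product of a decreasing factor and an increasing factor is what produces the non-monotonic "bump".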

EQUATIONS TO CALCULATE BIAS

Schematic representation of the effect of publication bias on the published effect size estimate. "H0" and "H1" are the regions of accepting and rejecting H0, respectively; 1−β represents power; α is the Type I error rate; cv is the critical value of the test; and d is the true effect size. D0 and D1 are the expected effect sizes conditional on the acceptance or rejection of H0, respectively, and D is the expected value of the published effect size.
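The equations themselves appear only as an image on the poster; the following reconstruction is consistent with the schematic's definitions but is ours, so treat it as a sketch rather than the authors' exact formula:

```latex
% Expected published effect size when significant results are always published
% and non-significant results are published only with probability pub:
\[
D = \frac{(1-\beta)\,D_1 + \mathit{pub}\cdot\beta\,D_0}
         {(1-\beta) + \mathit{pub}\cdot\beta},
\qquad \operatorname{bias} = D - d.
\]
```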