The Meta-Science of Adults Statistical Word Segmentation: Part 1 … · 2018-03-27 · Ongoing:...




The Meta-Science of Adults' Statistical Word Segmentation: Part 1

BACKGROUND & MOTIVATION

Data and analyses available at: https://osf.io/ehu7q/

1. Online replications (Q1, Q2 & Q3)
Part 1: 6 of 130 experiments replicated:
[1] Exp. 1 from Saffran, Newport, & Aslin (1996)
[2-3] Exps. 1 and 2 from Saffran et al. (1999)
[4-5] Exps. 1 and 3 from Finn & Hudson Kam (2008)
[6] Exp. 1 from Frank et al. (2010)
Ongoing: Replicate all adult word segmentation experiments (100+)

2. Three Replications: Online & In Lab (Q1 & Q2)
Part 1: Comparison of in-lab and online replications of [1]

3. Meta-analysis (Q1 & Q4)
Part 1: Preliminary p-curve analyses of main effects and modulators (30 of 148 studies coded)
Ongoing: Complete p-curve analyses and meta-analytic regression analyses

** Help us confirm that we have collected all of the relevant papers by reviewing our list here: https://goo.gl/forms/Kllq0xAC12kzikex1

CONCLUSIONS

OVERVIEW OF METHODOLOGY

META-ANALYSIS

Lauren Skorb¹ and Joshua K. Hartshorne¹
¹Department of Psychology, Boston College

Poster available at l3atbc.org

ONLINE REPLICATIONS

REFERENCES

Aarts, A. A., & LeBel, E. P. (2016). Curate science: A platform to gauge the replicability of psychological science.

Behrend, T. S., Sharek, D. J., Meade, A. W., & Wiebe, E. N. (2011). The viability of crowdsourcing for survey research. Behavior Research Methods, 43(3), 800–813.

Finn, A. S., & Hudson Kam, C. L. (2008). The curse of knowledge: First language knowledge impairs adult learners’ use of novel statistics for word segmentation. Cognition, 108(2), 477–499.

Frank, M. C., Goldwater, S., Griffiths, T. L., & Tenenbaum, J. B. (2010). Modeling human performance in statistical word segmentation. Cognition, 117(2), 107–125.

Gureckis, T. M., Martin, J., McDonnell, J., Rich, A. S., Markant, D., Coenen, A., . . . Chan, P. (2016). psiTurk: An open-source framework for conducting replicable behavioral experiments online. Behavior Research Methods, 48(3), 829–842.

Mahowald, K., James, A., Futrell, R., & Gibson, E. (2016). A meta-analysis of syntactic priming in language production. Journal of Memory and Language, 91, 5–27.

Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349.

Romberg, A. R., & Saffran, J. R. (2010). Statistical learning and language acquisition. Wiley Interdisciplinary Reviews: Cognitive Science, 1(6), 906–914.

Saffran, J. R., Johnson, E. K., Aslin, R. N., & Newport, E. L. (1999). Statistical learning of tone sequences by human infants and adults. Cognition, 70(1), 27–52.

Saffran, J. R., Newport, E. L., & Aslin, R. N. (1996). Word segmentation: The role of distributional cues. Journal of Memory and Language, 35(4), 606–621.

Simonsohn, U., Nelson, L. D., & Simmons, J. P. (2014). P-curve: A key to the file-drawer. Journal of Experimental Psychology: General, 143(2), 534–547.

THREE REPLICATIONS: ONLINE & IN LAB

A1: The main effect (ME) of statistical word segmentation is the most reliable finding; the modulators do not appear to be reliable.
1. Meta-analysis: p-curves show strong evidence for both main effects and modulators.
2. Online replications: we replicate the ME but not the modulators.
3. In-lab replication: the ME replicates with or without an attention screen.

A2: Preliminary observation: The most reliable findings have larger sample sizes, more test items, & open materials/data.

A3: We encountered problems when trying to reproduce some of the original studies, including errors in the papers and obsolete technology.

A4: The p-curve analyses show that the literature is sufficiently powered to detect the main effect and modulators; however, our failure to replicate the modulating effects suggests low power in the selected studies.
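To make the link between sample size and power concrete (see also A2), here is a minimal sketch that uses a normal approximation to a two-sided one-sample test of mean test accuracy against 50% chance. The chance level assumes a two-alternative forced-choice test; the standardized effect size `d` and the function names `power_vs_chance` and `n_for_power` are hypothetical illustrations, not values or code from our studies. An exact t-test would give slightly lower power at small n.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def power_vs_chance(d: float, n: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided one-sample z-test against chance.

    d is a hypothetical standardized effect: (mean accuracy - 0.5) / SD of accuracy.
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    ncp = d * n ** 0.5  # expected z statistic if the assumed effect is real
    # Probability that z lands in either rejection region.
    return (1 - STD_NORMAL.cdf(z_crit - ncp)) + STD_NORMAL.cdf(-z_crit - ncp)


def n_for_power(d: float, target: float = 0.80, alpha: float = 0.05) -> int:
    """Smallest sample size whose approximate power reaches the target."""
    if d == 0:
        raise ValueError("with no effect, power stays at alpha for every n")
    n = 1
    while power_vs_chance(d, n, alpha) < target:
        n += 1
    return n


if __name__ == "__main__":
    for d in (0.2, 0.5, 0.8):
        print(f"d = {d}: about {n_for_power(d)} participants for 80% power")
```

Under these assumptions, halving the effect size roughly quadruples the required sample, which is one way to read A2's observation that the most reliable findings come from larger samples.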

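Exp. 1 of Saffran et al. (1996): pattern of significance by study (✅ = effect found, ❌ = effect not found)
Main effect: ME = word learning
Modulators: M1 = high TP vs. low TP; M2 = change-1 vs. change-last; M3 = non-word vs. part-word; M4 = "bupabi"

Study               Attention screen?   ME   M1   M2   M3   M4
Original (in lab)   no                  ✅   ✅   ✅   ✅   ✅
Online rep. 1       no                  ✅   ❌   ❌   ❌   ❌
Online rep. 2       no                  ✅   ❌   ❌   ❌   ❌
Online rep. 2       yes                 ✅   ❌   ❌   ❌   ❌
In-lab rep.         no                  ✅   ❌   ❌   ❌   ❌
In-lab rep.         yes                 ✅   ❌   ❌   ❌   ❌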

Interpretation of slope:
A1: Main-effect slope = 0.25. Replication effects are smaller than the original effects (a perfect replication would give a slope of one).
A1: Modulator slope = 0.03. There is no relationship between replication and original effects (a complete failure to replicate would give a slope of zero).

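[Figure: replication results (y) plotted against original results (x), with fitted regression lines. Main effects: y = 0.55 + 0.25x, r² = 0.339. Modulators: y = 0.019 + 0.03x, r² = 0.004.]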

Statistical word segmentation:
• Identification of words based solely on the statistical properties of the speech stream (e.g., Romberg & Saffran, 2010)

• Theoretically important: learning language with little explicit instruction

• Extensively studied phenomenon: 100+ published experiments
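In this literature, the statistical property is usually the transitional probability (TP) between adjacent syllables, TP(x→y) = frequency(xy) / frequency(x). It is the quantity behind the "High TP vs. Low TP" modulator. In a continuous stream TPs are high inside words and drop at word boundaries. The sketch below computes forward TPs and posits a boundary wherever a TP falls below a threshold. It is a simple illustrative heuristic, not a model of participants or of any particular study; the four-word lexicon, the 0.5 threshold, and the function names are all hypothetical.

```python
import random
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical four-word lexicon of trisyllabic "words" (12 distinct syllables).
LEXICON = [("tu", "pi", "ro"), ("go", "la", "bu"),
           ("bi", "da", "ku"), ("pa", "do", "ti")]


def make_stream(n_words: int, seed: int = 0) -> List[str]:
    """Concatenate randomly ordered words with no pauses and no immediate repeats."""
    rng = random.Random(seed)
    stream: List[str] = []
    prev = None
    for _ in range(n_words):
        word = rng.choice([w for w in LEXICON if w != prev])
        stream.extend(word)
        prev = word
    return stream


def transitional_probabilities(syllables: List[str]) -> Dict[Tuple[str, str], float]:
    """Forward TP(x -> y) = count(x followed by y) / count(x followed by anything)."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])  # the final syllable has no successor
    return {(x, y): n / first_counts[x] for (x, y), n in pair_counts.items()}


def segment(syllables: List[str], threshold: float = 0.5) -> List[str]:
    """Posit a word boundary wherever the TP between neighbours is below threshold."""
    tps = transitional_probabilities(syllables)
    words: List[str] = []
    current = [syllables[0]]
    for x, y in zip(syllables, syllables[1:]):
        if tps[(x, y)] < threshold:
            words.append("".join(current))
            current = []
        current.append(y)
    words.append("".join(current))
    return words


if __name__ == "__main__":
    stream = make_stream(1200)
    print(Counter(segment(stream)).most_common())
```

Here within-word TPs are exactly 1 and between-word TPs hover around 1/3, so a fixed threshold recovers the lexicon; real experiments use fewer exposures and human listeners, which is exactly where replicability questions arise.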

Meta-Science:
• Gather more information about our literature in order to interpret results and improve methodologies
• Replicability varies by discipline, but little research has examined the robustness of psycholinguistics (Aarts & LeBel, 2016; Open Science Collaboration, 2015)

• Help determine power for specific literatures (Mahowald et al., 2016)

• Better understand and use methods (e.g., Behrend et al., 2011)

Research questions:
Q1: What are the most reliable findings? Where would further investigation be most useful?
Q2: What factors influence replicability?
Q3: How reproducible is the literature?
Q4: Are studies sufficiently powered?

A1 & A2: Main effects are robust. The pattern of significance holds across changes in venue, sample, and attention screening.

A1 & A2: Modulating effects are not robust. We fail to replicate the pattern of significance across changes in venue, sample, and attention screening.

[Figure: p-curves for main effects (top) and modulators (bottom). x-axis: p-value (.01 to .05); y-axis: percentage of test results (0% to 100%). Each panel shows the observed p-curve and two reference curves: null of no effect and null of 33% power. Percentages shown for the observed curves: main effects 11%, 5%, 5%, 11%, 68%; modulators 85%, 8%, 2%, 5%, 0%.]

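P-curve definition: A tool that plots the distribution of statistically significant p-values (p < .05) reported in a literature. True effects produce a right skew (more p-values near .01 than near .05), whereas a left skew suggests selective reporting or p-hacking (Simonsohn et al., 2014).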
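The two core steps of a p-curve can be sketched under simplifying assumptions. Step 1 keeps only significant results (0 < p < .05) and reports the share falling in each .01-wide bin, which is what an observed p-curve plots. Step 2 tests for right skew by rescaling each p-value to pp = p / .05, which is uniform between 0 and 1 if there is no true effect, and combining the pp-values with Stouffer's method. Stouffer's method is an assumption here, since p-curve versions have differed in the combination rule. Published p-curve tools also recompute p-values from the reported test statistics, which this sketch does not do. The function names are hypothetical.

```python
from statistics import NormalDist
from typing import List, Sequence

ALPHA = 0.05
BIN_EDGES = (0.01, 0.02, 0.03, 0.04, 0.05)  # upper edges of the .01-wide bins
STD_NORMAL = NormalDist()


def significant(p_values: Sequence[float]) -> List[float]:
    """A p-curve only uses results with 0 < p < alpha."""
    return [p for p in p_values if 0 < p < ALPHA]


def p_curve_shares(p_values: Sequence[float]) -> List[float]:
    """Share of significant p-values per bin: (0, .01], (.01, .02], ..., (.04, .05)."""
    sig = significant(p_values)
    counts = [0] * len(BIN_EDGES)
    for p in sig:
        for i, edge in enumerate(BIN_EDGES):
            if p <= edge:
                counts[i] += 1
                break
    return [c / len(sig) for c in counts]


def right_skew_test(p_values: Sequence[float]) -> float:
    """One-sided Stouffer test for right skew; small results indicate evidential value."""
    pp = [p / ALPHA for p in significant(p_values)]  # uniform on (0, 1) under no effect
    z = sum(STD_NORMAL.inv_cdf(x) for x in pp) / len(pp) ** 0.5
    return STD_NORMAL.cdf(z)  # right skew pushes pp toward 0, so z toward -infinity
```

Both functions assume at least one significant result; with none, there is no p-curve to draw.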

Our analyses: P-curve analysis of 62 main effects (top) and 19 modulators (bottom).

Results:
A1: P-curve analyses find evidential value for both main effects and modulators (significant right skew).

A4: Estimated power is high for both main effects and modulators (99% and 97%, respectively).
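The two reference lines in the p-curve plots, the null of no effect and the null of 33% power, can be illustrated in the same spirit. With no effect, significant p-values are uniform, so each .01-wide bin holds 20% of them. For the 33%-power line, the sketch below assumes every result is a two-sided z-test. It finds the noncentrality (the expected z statistic) that yields 33% power at α = .05, then computes the expected share of significant p-values in each bin. Setting the power equal to α reproduces the flat no-effect line. Published p-curve tools work from each test's own statistic and degrees of freedom, so their reference lines will differ somewhat. The function names are hypothetical.

```python
from statistics import NormalDist
from typing import List

STD_NORMAL = NormalDist()
BIN_EDGES = (0.01, 0.02, 0.03, 0.04, 0.05)


def prob_p_at_most(b: float, ncp: float) -> float:
    """P(two-sided p <= b) when the z statistic is Normal(ncp, 1)."""
    z_b = STD_NORMAL.inv_cdf(1 - b / 2)
    return (1 - STD_NORMAL.cdf(z_b - ncp)) + STD_NORMAL.cdf(-z_b - ncp)


def ncp_for_power(power: float, alpha: float = 0.05) -> float:
    """Bisection for the noncentrality giving the requested power (alpha <= power < 1)."""
    lo, hi = 0.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if prob_p_at_most(alpha, mid) < power:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def expected_shares(power: float) -> List[float]:
    """Expected share of significant p-values per .01 bin at the given power."""
    ncp = ncp_for_power(power, alpha=BIN_EDGES[-1])
    cumulative = [prob_p_at_most(b, ncp) for b in BIN_EDGES]
    total = cumulative[-1]  # P(p <= .05), i.e. the power itself
    previous = [0.0] + cumulative[:-1]
    return [(c - prev) / total for c, prev in zip(cumulative, previous)]


if __name__ == "__main__":
    print("no effect:", [round(s, 3) for s in expected_shares(0.05)])
    print("33% power:", [round(s, 3) for s in expected_shares(0.33)])
```

An observed curve that is significantly flatter than the 33%-power line would suggest low power; the high power estimates reported above indicate the opposite for the coded studies.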