Open Data + Preprints = Open Science
David Mellor
@EvoMellor
Find this presentation at osf.io/7grp5
Find this presentation at https://osf.io/zshgp/
Everything that happens prior to the final publication
Openness is a core value of the scientific method
“Nullius in verba” ~ “Take nobody's word for it”
• Communalism
• Universalism
• Disinterestedness
• Organized skepticism
(Merton, 1942)
Problem:
The gap between scholarly values and practices
The combination of a strong bias toward statistically
significant findings and flexibility in data analysis
results in irreproducible research
https://simplystatistics.org/2017/07/26/announcing-the-tidypvals-package/
The Garden of Forking Paths
Gelman and Loken, 2013
“Does X affect Y?”
Control for time?
Exclude outliers?
Median or mean?
How do you define X?
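The effect of these forking paths can be illustrated with a small simulation. The sketch below is not from the talk: it assumes two groups drawn from the same distribution (so there is no true effect), two made-up outcome measures, a hypothetical outlier rule (drop points more than 2 SDs from the mean), and a normal approximation for the p-value. It compares the false-positive rate of one planned analysis against the rate obtained when the smallest of four p-values is reported.

```python
import random
from statistics import NormalDist, mean, stdev


def p_value(a, b):
    """Two-sided p-value for a difference in means (normal approximation)."""
    se = (stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b)) ** 0.5
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def drop_outliers(xs, k=2.0):
    """Hypothetical exclusion rule: drop points more than k SDs from the mean."""
    m, s = mean(xs), stdev(xs)
    return [x for x in xs if abs(x - m) <= k * s]


def simulate(n_studies=2000, n=40, alpha=0.05, seed=1):
    """Return (rate for one planned test, rate when the best of four paths is reported)."""
    rng = random.Random(seed)
    planned_hits = flexible_hits = 0
    for _ in range(n_studies):
        p_values = []
        for _outcome in range(2):  # two ways of measuring the outcome
            treated = [rng.gauss(0, 1) for _ in range(n)]
            control = [rng.gauss(0, 1) for _ in range(n)]  # same distribution: no true effect
            p_values.append(p_value(treated, control))  # keep all data
            p_values.append(p_value(drop_outliers(treated), drop_outliers(control)))
        planned_hits += p_values[0] < alpha       # the single analysis chosen in advance
        flexible_hits += min(p_values) < alpha    # whichever path "worked"
    return planned_hits / n_studies, flexible_hits / n_studies


if __name__ == "__main__":
    planned, flexible = simulate()
    print(f"planned analysis: {planned:.3f}  best of four paths: {flexible:.3f}")
```

Because the planned analysis is one of the four paths, the flexible rate can never be lower than the planned rate; how much higher it is depends on the assumptions above.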
“Many analysts, one dataset: Making transparent how variations in analytical choices affect results” https://psyarxiv.com/qkwst/ Image via 538.com
https://fivethirtyeight.com/features/science-isnt-broken/#part1
[Figure: p-values of original studies versus replications]
Original studies: 97% “significant”
Replications: 37% “significant”
Solutions
Option B
Reward actions that are idealized
scientific practice.
The “truth” is versioned.
Pointing out mistakes enhances the
reputations of all parties.
Getting to Option B
I. Allow peers to receive recognition for
ideal behaviors.
II. Identify biases in analysis and
publication, use a process that
addresses those biases.
III. Collective action by key decision
makers.
Transparency and Openness Promotion (TOP)
Guidelines
Eight policy statements for increasing
the transparency and reproducibility
of the published research.
• Agnostic to discipline
• Low barrier to entry
• Modular
See cos.io/top for more detailed language
Three Tiers
Disclose, Require, Verify
Eight Standards
Data citation
Materials transparency
Data transparency
Code transparency
Design transparency
Study Preregistration
Analysis Preregistration
Replication
Ask authors who submit to answer 2
questions:
1) Are the data/code/materials
available in a public repository?
Yes/No
2) If Yes, where: URL: ________
Make answers available in article
metadata, or simply in footnotes.
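As a rough illustration of how the two answers could be captured and surfaced, here is a minimal sketch. The class name, field names, and wording of the footnote are all hypothetical and are not part of the TOP Guidelines; the URL in the usage note is a placeholder.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AvailabilityDisclosure:
    """Answers to the two submission questions (hypothetical field names)."""
    available: bool            # Question 1: Yes/No
    url: Optional[str] = None  # Question 2: where, if Yes

    def __post_init__(self):
        # A "Yes" without a location cannot be checked by readers.
        if self.available and not self.url:
            raise ValueError("A URL is required when the answer to question 1 is Yes.")

    def to_metadata(self):
        """Dictionary that could be stored alongside other article metadata."""
        return {"materials_available": self.available, "materials_url": self.url}

    def to_footnote(self):
        """One-sentence statement suitable for a footnote."""
        if self.available:
            return f"Data, code, and materials are available at {self.url}."
        return "Data, code, and materials are not available in a public repository."
```

For example, `AvailabilityDisclosure(True, "https://example.org/repo").to_footnote()` produces a sentence that can be printed with the article, while `to_metadata()` gives the same answers in machine-readable form.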
Fig 4. Actually available, correct, usable, and complete data.
[Bar chart. Y-axis: % of articles with data reportedly available. Categories: Reportedly Available, Actually Available, Correct Data, Usable Data, Complete Data]
Preregistration increases credibility by
specifying in advance how data will be
analyzed, thus preventing biased
reasoning from affecting data analysis.
cos.io/prereg
What is a preregistration?
A time-stamped, read-only version of your research plan
created before the study.
Study plan:
● Hypothesis
● Data collection procedures
● Manipulated and measured variables
Analysis plan:
● Statistical model
● Inference criteria
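The "time-stamped, read-only" idea can be sketched locally as follows. This is only an illustration under stated assumptions: the class, its field names, and the `register` helper are hypothetical, and an actual preregistration is held by a registry rather than on the author's machine.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class Preregistration:
    # Study plan
    hypothesis: str
    data_collection: str
    variables: tuple
    # Analysis plan
    statistical_model: str
    inference_criteria: str
    # Time stamp (ISO 8601, UTC)
    registered_at: str

    def fingerprint(self):
        """SHA-256 digest of the plan; any later change to the text changes the digest."""
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


def register(**plan):
    """Stamp the plan with the current UTC time and freeze it."""
    stamp = datetime.now(timezone.utc).isoformat()
    return Preregistration(registered_at=stamp, **plan)
```

Attempting to assign to a field of the returned object raises an error, and the digest gives a compact way to show later that the plan cited in a paper is the one that was frozen.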
What problems does preregistration fix?
1) The file drawer
2) P-Hacking: Unreported flexibility in data analysis
3) HARKing: Hypothesizing After Results are Known
Dataset
Hypothesis
Kerr, 1998
Preregistration makes the distinction between
confirmatory (hypothesis testing) and
exploratory (hypothesis generating)
research clearer.
Confirmatory versus exploratory analysis
Context of confirmation
• Traditional hypothesis testing
• Results held to the highest standards of rigor
• Goal is to minimize false positives
• P-values interpretable
Context of discovery
• Pushes knowledge into new areas/data-led discovery
• Finds unexpected relationships
• Goal is to minimize false negatives
• P-values meaningless
Presenting exploratory results as confirmatory
increases publishability at the expense of credibility
Example workflow #1
(Theory driven with specific prediction)
Discovery Phase
Exploratory research
Hypothesis generating
Confirmation Phase
Hypothesis testing
Collect New Data
Example workflow #2
(Few a-priori predictions)
Collect Data
Split Data
Discovery Phase
Exploratory research
Hypothesis generating
Confirmation Phase
Hypothesis testing
Keep these data secret!
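The split in workflow #2 can be sketched in a few lines. This is a minimal example, not a prescribed procedure: the function name, the 50/50 default, and the seed are assumptions chosen for illustration. The exploratory half is used for hypothesis generation; the holdout half stays unopened until the confirmatory analysis has been preregistered.

```python
import random


def split_for_confirmation(records, explore_fraction=0.5, seed=2024):
    """Shuffle a copy of the records and split into (exploratory, holdout) lists."""
    if not 0 < explore_fraction < 1:
        raise ValueError("explore_fraction must be strictly between 0 and 1")
    shuffled = list(records)               # leave the caller's data untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split reproducible
    cut = round(len(shuffled) * explore_fraction)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    participant_ids = list(range(1, 201))  # made-up IDs for illustration
    explore, holdout = split_for_confirmation(participant_ids)
    print(len(explore), "records for exploration;", len(holdout), "held out")
```

Recording the seed alongside the preregistration lets others verify that the holdout set was fixed before any exploratory analysis took place.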
Tips for writing up preregistered work
1. Include a link to your preregistration
2. Report the results of ALL preregistered analyses
3. ANY unregistered analyses must be transparent
What is a Registered Report?
When the research plan undergoes peer review
before results are known, the preregistration
becomes part of a Registered Report
cos.io/rr
Registered Reports
• Are the hypotheses well founded?
• Are the methods and proposed analyses feasible and sufficiently
detailed?
• Is the study well powered? (≥90%)
• Have the authors included sufficient positive controls to confirm that the
study will provide a fair test?
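To give a sense of what the ≥90% power criterion implies for sample size, the sketch below uses the standard normal-approximation formula for a two-sided, two-sample comparison of means, n per group ≈ 2((z₁₋α/₂ + z_power)/d)². The function name and the example effect size are assumptions for illustration; an exact t-based calculation gives slightly larger numbers.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size_d, power=0.90, alpha=0.05):
    """Approximate per-group n for a two-sided, two-sample comparison of means."""
    if effect_size_d <= 0:
        raise ValueError("effect_size_d must be positive")
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_power = z.inv_cdf(power)          # quantile matching the desired power
    return ceil(2 * ((z_alpha + z_power) / effect_size_d) ** 2)


if __name__ == "__main__":
    for d in (0.2, 0.5, 0.8):
        print(f"d = {d}: about {n_per_group(d)} participants per group for 90% power")
```

Under this approximation, a standardized effect of d = 0.5 needs 85 participants per group at 90% power, compared with 63 at the more common 80%.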
Registered Reports
• Did the authors follow the approved protocol?
• Did positive controls succeed?
• Are the conclusions justified by the data?
What problems does preregistration fix?
1) The file drawer
2) P-Hacking: Unreported flexibility in data analysis
3) HARKing: Hypothesizing After Results are Known
4) Registered Reports also address publication bias.
5) Registered Reports also may improve the study design,
by getting peer review into the process sooner.
FAQ: Does preregistration work?
Reported Tests (122)
Median p-value = .02
Median effect size (d) = .29
% p < .05 = 63%
Unreported Tests (147)
Median p-value = .35
Median effect size (d) = .13
% p < .05 = 23%
Underreporting in Political Science Survey Experiments: Comparing Questionnaires to Published
Results. Franco, A., Malhotra, N., & Simonovits, G. (2015).
Managing a research workflow
Planning Execution Reporting Archiving Discovery
Collaboration
Version Control
Hub for Services
Project Management
https://osf.io/preprints/discover
Aggregated
Powered by SHARE (share.osf.io), OSF
Preprints aggregates search across local
and external preprint services
Currently over 2M preprint records
available
Commenting options
Visibility of comments
1. Visible only to other preprint moderators
2. Visible to both preprint moderators AND to
authors
Anonymity
1. Comments from moderators are anonymous
2. Comments from moderators are identified
What’s Next?
Public Roadmap (http://bit.ly/2iUAFGF)
• Improved Analytics
• Improved search and filtering
• Public commenting
• Preprints Advisory Committee is providing ongoing
governance, technical prioritization, best practices, and
education
• A widening community that supports experimentation and
innovation in scholarly communications
Open Science and Citation Impact
1) Articles that appear in preprint servers are more highly cited than
those that don’t.
• The Citation Impact of Digital Preprint Archives for Solar Physics Papers (Metcalfe, 2006;
https://doi.org/10.1007/s11207-006-0262-7)
2) Sharing Detailed Research Data Is Associated with Increased
Citation Rate
• (Piwowar et al., 2007; https://doi.org/10.1371/journal.pone.0000308)
Thank you!
Find this presentation at osf.io/7grp5
Resources for Registered Reports, preregistration, Open Science Badges,
Statistical Consulting, Communities, and more at https://cos.io
Find me online @EvoMellor or email: [email protected]
Our mission is to provide expertise, tools, and training to help researchers create and
promote open science within their teams and institutions. Promoting these practices
within the research funding and publishing communities accelerates scientific progress.