Public data archiving: Who does? Who doesn't? What can we do about it?
Transcript of Public data archiving: Who does? Who doesn't? What can we do about it?
![Page 1: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/1.jpg)
Public data archiving:
Who shares? Who doesn’t?
What can we do about it?
Heather Piwowar
Presented at UBC BLISS, Sept 2010
DataONE postdoc with Dryad and NESCent, @UBC
PhD in Dept of Biomedical Informatics, U of Pittsburgh
![Page 2: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/2.jpg)
http://www.metmuseum.org/toah/ho/09/euwf/ho_24.45.1.htm
![Page 3: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/3.jpg)
http://www.flickr.com/photos/jsmjr/62443357/
![Page 4: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/4.jpg)
http://www.flickr.com/photos/camilleharrington/3587294608/
![Page 5: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/5.jpg)
http://www.flickr.com/photos/rkuhnau/3318245976/
![Page 6: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/6.jpg)
http://www.flickr.com/photos/conformpdx/1796399674/
![Page 7: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/7.jpg)
http://www.flickr.com/photos/rkuhnau/3317418699/
![Page 8: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/8.jpg)
http://www.flickr.com/photos/zemlinki/261617721/
![Page 9: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/9.jpg)
http://www.flickr.com/photos/tracenmatt/3020786491/
![Page 10: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/10.jpg)
http://www.flickr.com/photos/the-o/2078239333/
![Page 11: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/11.jpg)
http://www.flickr.com/photos/ryanr/142455033/
![Page 12: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/12.jpg)
http://www.flickr.com/photos/75166820@N00/5318468/
![Page 13: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/13.jpg)
Find, Organize, Document, Deidentify, Format, Decide, Ask, Submit
• Answer questions
• Worry about mistakes being found
• Worry about data being misinterpreted
• Worry about being scooped
• Forgo money and IP and prestige???
![Page 14: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/14.jpg)
not very motivating.
![Page 15: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/15.jpg)
As a result, policy makers have spent lots of time and money...
http://www.flickr.com/photos/tonivc/2283676770/
http://www.flickr.com/photos/johnnyvulkan/381941233/
![Page 16: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/16.jpg)
building databases, developing standards, articulating best practices
to support public archiving of research datasets
![Page 17: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/17.jpg)
lots of data sharing!
http://www.genome.jp/en/db_growth.html
![Page 18: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/18.jpg)
but how much isn’t shared?
what isn’t shared?
who isn’t sharing it?
why not?
what can we do about it?
how much does it matter?
![Page 19: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/19.jpg)
you cannot manage what you do not measure
quote: Lord Kelvin
http://www.flickr.com/photos/archeon/2941655917/
![Page 20: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/20.jpg)
As we seek to embrace and encourage data sharing,
understanding patterns of adoption will allow us to make informed decisions about tools, policies, and best practices.
Measuring adoption over time will allow us to note progress and identify best practices and opportunities for improvement.
![Page 21: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/21.jpg)
1. Is there benefit for those who share?
2. How can we study data sharing behaviour in a scalable, systematic way?
3. What factors are correlated with sharing and withholding data?
research questions
![Page 22: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/22.jpg)
http://www.flickr.com/photos/paulhami/1020538523//
![Page 23: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/23.jpg)
Which data?
![Page 24: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/24.jpg)
Where?
![Page 25: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/25.jpg)
With whom?
![Page 26: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/26.jpg)
When?
![Page 27: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/27.jpg)
Under what terms?
![Page 28: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/28.jpg)
![Page 29: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/29.jpg)
![Page 30: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/30.jpg)
• gene expression microarray data
• raw intensity data
• upon publication
• publicly on the internet
• (centralized databases)
![Page 31: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/31.jpg)
microarray data
http://en.wikipedia.org/wiki/DNA_microarray
http://en.wikipedia.org/wiki/Image:Heatmap.png
http://commons.wikimedia.org/wiki/File:DNA_double_helix_vertikal.PNG
![Page 32: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/32.jpg)
microarray data
![Page 33: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/33.jpg)
http://www.flickr.com/photos/sunrise/35819369/
1. Is there benefit for those who share?
![Page 34: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/34.jpg)
currency of value?
Citations.
![Page 35: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/35.jpg)
currency of value?
Citations.
$50!
Diamond, Arthur M. What is a Citation Worth? The Journal of Human Resources (1986) vol. 21 (2) pp. 200-215
![Page 36: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/36.jpg)
dataset: 85 cancer microarray trials published in 1999-2003, as identified by Ntzani and Ioannidis (2003)
citations: ISI Web of Science Citation index, citations from 2004-2005
data sharing locations: Publisher and lab websites, microarray databases, WayBack Internet Archive, Oncomine
statistics: Multivariate linear regression
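The slides report a multivariate linear regression, plotted on a log scale, and a figure of about 70%. The sketch below shows how such a percentage can come out of a log-scale model. It assumes that the outcome is the log of the citation count and that one covariate is a 0/1 data-availability indicator. Under that assumption, a coefficient b on the indicator corresponds to a (exp(b) - 1) x 100% change in citations, so b near 0.53 gives roughly 70%. The covariates (a sharing flag and a journal impact factor) and the toy numbers are hypothetical. The normal-equations solver is used only to keep the example dependency-free.

```python
import math


def solve(a, b):
    """Solve the square linear system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) beta = X'y.

    An intercept column of 1s is prepended to each covariate row.
    """
    X = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def percent_change(coef):
    """Percent change in citations implied by a coefficient on log(citations)."""
    return (math.exp(coef) - 1.0) * 100.0


# Hypothetical, noise-free toy data (not the study's data).
# Covariates per article: [data_shared (0/1), journal_impact_factor].
rows = [[0, 2.0], [0, 5.0], [0, 8.0], [0, 4.0], [1, 2.0], [1, 5.0], [1, 8.0], [1, 3.0]]
log_citations = [1.0 + 0.53 * shared + 0.1 * impact for shared, impact in rows]

intercept, b_shared, b_impact = ols(rows, log_citations)
print(round(b_shared, 3), round(percent_change(b_shared), 1))  # 0.53 69.9
```

The back-transformation is the point here. A statistics package would also give standard errors, which this sketch omits.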
![Page 37: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/37.jpg)
Note: log scale
![Page 38: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/38.jpg)
![Page 39: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/39.jpg)
~70%
![Page 40: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/40.jpg)
2. Need automated methods to:
a) Identify studies that create datasets
b) Determine which of these have in fact been shared
c) Extract attributes about the environment
![Page 41: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/41.jpg)
a) Identify studies that create datasets
http://www.flickr.com/photos/lofaesofa/248546821/
![Page 42: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/42.jpg)
Look for wetlab methods in article full text:
http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1522022&tool=pmcentrez
http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1590031&tool=pmcentrez
http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1482311&tool=pmcentrez#id331936
http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2082469&tool=pmcentrez
http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=126870&tool=pmcentrez#id442745
![Page 43: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/43.jpg)
Combined, these full-text portals reach 85% of the articles available through U of Pittsburgh library subscriptions.
![Page 44: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/44.jpg)
But how to generate an effective query?
Use open access articles.
![Page 45: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/45.jpg)
• text analysis: automatically catalogued single words and word pairs from full text
• assessed precision and recall
• combined the high performers:
![Page 46: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/46.jpg)
Derived query:
("gene expression" AND microarray AND cell AND rna)
AND (rneasy OR trizol OR "real-time pcr")
NOT ("tissue microarray*" OR "cpg island*")
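The screening behind this query catalogued single words and word pairs from open-access full text, then scored each one by precision and recall. The sketch below is a minimal version of that process. The tokenizer, the `min_support` cut-off and the toy corpus are assumptions. The slides do not say how terms were tokenized, thresholded or combined into the final query.

```python
import re


def terms(text):
    """Single words plus adjacent word pairs (unigrams and bigrams) in a text."""
    tokens = re.findall(r"[a-z0-9-]+", text.lower())
    return set(tokens) | {f"{a} {b}" for a, b in zip(tokens, tokens[1:])}


def score_terms(docs, labels, min_support=1):
    """Precision and recall of each term used on its own as a classifier.

    docs   -- full-text strings
    labels -- True if the article created microarray data
    Returns {term: (precision, recall)} for terms found in at least
    `min_support` documents.
    """
    positives = sum(labels)
    doc_terms = [terms(d) for d in docs]
    scores = {}
    for term in set().union(*doc_terms):
        hits = [label for found, label in zip(doc_terms, labels) if term in found]
        if len(hits) < min_support:
            continue
        true_pos = sum(hits)
        scores[term] = (true_pos / len(hits), true_pos / positives if positives else 0.0)
    return scores


# Hypothetical toy corpus (not the study's training articles).
docs = [
    "RNA was extracted with TRIzol and hybridized to a gene expression microarray",
    "Total RNA was purified using RNeasy columns before microarray hybridization",
    "We constructed a tissue microarray for immunohistochemistry",
    "A survey of clinicians was conducted",
]
labels = [True, True, False, False]

scores = score_terms(docs, labels)
high = sorted(t for t, (p, r) in scores.items() if p == 1.0 and r == 1.0)
print(high)  # ['rna', 'rna was']
```

Even on this toy corpus, a uninformative pair like "rna was" scores perfectly. This is one reason raw term scores need review before the high performers are combined into a query.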
![Page 47: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/47.jpg)
Evaluation:
Ochsner et al., Nature Methods (2008): 400 studies across 20 journals
Precision: 90% (conf int: 86% to 93%)
Recall: 56% (conf int: 52% to 61%)
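Two steps of this evaluation can be sketched in code: applying the derived query to an article's text, and computing precision and recall with a confidence interval. The sketch makes three assumptions.
- The query uses case-insensitive, whole-word matching, with `*` as a prefix wildcard. Real full-text portals apply their own tokenization and stemming.
- Intervals use the Wilson score method. The slide does not say how its intervals were computed.
- The snippets and the 90-of-100 count are hypothetical.

```python
import math
import re


def has(text, phrase):
    """Case-insensitive whole-word match; a trailing '*' makes the phrase a prefix."""
    if phrase.endswith("*"):
        pattern = r"\b" + re.escape(phrase[:-1])
    else:
        pattern = r"\b" + re.escape(phrase) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def matches_derived_query(text):
    """Hand translation of the derived Boolean query into a Python predicate."""
    required = all(has(text, p) for p in ("gene expression", "microarray", "cell", "rna"))
    wet_lab = any(has(text, p) for p in ("rneasy", "trizol", "real-time pcr"))
    excluded = any(has(text, p) for p in ("tissue microarray*", "cpg island*"))
    return required and wet_lab and not excluded


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (about 95% when z = 1.96)."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def evaluate(texts, gold):
    """Counts behind precision (tp / predicted) and recall (tp / gold positives)."""
    predicted = [matches_derived_query(t) for t in texts]
    tp = sum(p and g for p, g in zip(predicted, gold))
    return tp, sum(predicted), sum(gold)


# Hypothetical snippets and hand labels (not the Ochsner et al. reference set).
texts = [
    "Gene expression microarray analysis of cell lines; RNA isolated with TRIzol.",
    "Gene expression microarray of cell culture; RNA quantified by real-time PCR; "
    "tissue microarrays stained.",
    "A cohort study of cell phone use.",
]
gold = [True, False, False]
tp, n_predicted, n_gold = evaluate(texts, gold)
print(tp, n_predicted, n_gold)  # 1 1 1

# Interval for a hypothetical 90 correct out of 100 retrieved.
low, high = wilson_interval(90, 100)
print(f"precision 90% (95% CI {low:.1%} to {high:.1%})")
```

The second snippet meets every required term, but the NOT clause excludes it because it mentions tissue microarrays.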
![Page 48: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/48.jpg)
a) Identify studies that create datasets
b) Determine which of these have in fact been shared
c) Extract attributes about the environment
![Page 49: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/49.jpg)
b) Determine which datasets have in fact been shared
![Page 50: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/50.jpg)
![Page 51: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/51.jpg)
77%
![Page 52: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/52.jpg)
![Page 53: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/53.jpg)
a) Identify studies that create datasets
b) Determine which of these have in fact been shared
c) Extract attributes about the environment
![Page 54: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/54.jpg)
Is research data shared after publication?
Funder Journal Investigator Institution Study
![Page 55: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/55.jpg)
Funder: funded by NIH?, size of grant, sharing plan req’d?, funded by non-NIH?
Journal: impact factor, strength of policy, open access?, number of microarray studies published
Investigator: years since first paper, # pubs, # citations, previously shared?, previously reused?, gender
Institution: sector, size, impact rank, country
Study: humans?, mice?, plants?, cancer?, clinical trial?, number of authors, year
![Page 56: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/56.jpg)
journal rank
![Page 57: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/57.jpg)
“An inherent principle of publication is that others should be able to replicate and build upon the authors' published claims. Therefore, a condition of publication in a Nature journal is that authors are required to make materials, data and associated protocols available in a publicly accessible database …”
http://www.nature.com/authors/editorial_policies/availability.html
http://www.nature.com/nature/journal/v453/n7197/index.html
journal data sharing policy
![Page 58: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/58.jpg)
institution rank
Yu et al. BMC Medical Informatics and Decision Making (2007) vol. 7 p. 17
![Page 59: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/59.jpg)
study type
![Page 60: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/60.jpg)
Author publication history:
Citation counts:
Author name disambiguation: Author-ity web service
Torvik & Smalheiser (2009). Author Name Disambiguation in MEDLINE. ACM Transactions on Knowledge Discovery from Data, 3(3):11.
author “experience”
![Page 61: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/61.jpg)
author gender
![Page 62: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/62.jpg)
funding level
PubMed grant lists + NIH grant details
![Page 63: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/63.jpg)
funder mandates
Requires a data sharing plan for studies funded after October 2003
that receive more than $500 000 in direct funding per year
![Page 64: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/64.jpg)
Proxy for NIH data sharing policy applicability:
If in any year since 2004,
• funded by an NIH grant number with a “1” or “2” type code
• received more than $750 000 in total funding from the grant
funder mandates
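The proxy above can be written as a predicate over a study's grant records, as in the minimal sketch below. The record layout (`GrantYear`) and the grant numbers are hypothetical. The sketch assumes the full grant number, including its leading type code, is available for each funded year. It also reads "more than $750 000 in total funding" as a per-year amount, which is an interpretation of the slide.

```python
from dataclasses import dataclass


@dataclass
class GrantYear:
    """One fiscal year of one NIH award (hypothetical record layout)."""
    grant_number: str   # full number with type code, e.g. "1R01GM000000-01" (made up)
    fiscal_year: int
    total_funding: int  # dollars awarded that year, direct plus indirect


def application_type(grant_number):
    """Leading type code of a full NIH grant number (1 = new, 2 = competing renewal)."""
    return int(grant_number[0]) if grant_number[:1].isdigit() else None


def sharing_plan_likely_required(grant_years, threshold=750_000, since=2004):
    """Proxy for NIH data sharing policy applicability, per the slide.

    True if, in any fiscal year from `since` onward, the study was funded by a
    grant with a "1" or "2" type code that provided more than `threshold`
    dollars in that year.
    """
    return any(
        g.fiscal_year >= since
        and application_type(g.grant_number) in (1, 2)
        and g.total_funding > threshold
        for g in grant_years
    )


example = [
    GrantYear("1R01GM000000-01", 2005, 800_000),
    GrantYear("5R01GM000000-02", 2006, 900_000),
]
print(sharing_plan_likely_required(example))      # True
print(sharing_plan_likely_required(example[1:]))  # False (type 5: non-competing year)
```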
![Page 65: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/65.jpg)
and so on...
124 variables
![Page 66: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/66.jpg)
Now equipped with automated methods to:
a) Identify studies that create datasets
b) Determine which of these have in fact been shared
c) Extract attributes about the environment
![Page 67: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/67.jpg)
http://www.flickr.com/photos/cogdog/123072/
3. What factors are correlated with sharing and withholding data?
![Page 68: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/68.jpg)
11,603 datapoints
25% had links from datasets in databases
![Page 69: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/69.jpg)
univariate analysis
![Page 70: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/70.jpg)
Figure: Proportion of articles with shared datasets, by year (x-axis: year article published, 2000 to 2009; y-axis: proportion of articles with datasets found in GEO or ArrayExpress, 0.05 to 0.35)
Across time
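Each univariate plot in this part of the talk shows a simple proportion per group. Grouping by journal or institution instead of year gives the per-journal and per-institution plots that follow. The minimal sketch below assumes each article record carries a boolean `shared` flag, for example whether a GEO or ArrayExpress dataset links to it. The field names and records are hypothetical.

```python
from collections import defaultdict


def proportion_shared_by(records, key):
    """Share rate per group: shared articles / all articles in that group.

    records -- dicts with a boolean "shared" field plus grouping fields
    key     -- the field to group on, e.g. "year", "journal", "institution"
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [shared, total]
    for record in records:
        tally = counts[record[key]]
        tally[0] += bool(record["shared"])
        tally[1] += 1
    return {group: shared / total for group, (shared, total) in sorted(counts.items())}


# Hypothetical records, not the study's 11,603 data points.
records = [
    {"year": 2007, "shared": True},
    {"year": 2007, "shared": False},
    {"year": 2008, "shared": True},
    {"year": 2008, "shared": False},
    {"year": 2008, "shared": True},
    {"year": 2008, "shared": True},
]
print(proportion_shared_by(records, "year"))  # {2007: 0.5, 2008: 0.75}
```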
![Page 71: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/71.jpg)
Figure: proportion of datasets shared, by journal (y-axis 0.0 to 1.0). Journals, in plotted order: Physiol Genomics, PLoS Genet, Genome Biol, Microbiology, PLoS One, BMC Genomics, Plant Cell, Genome Res, Eukaryot Cell, Appl Environ Microbiol, BMC Med Genomics, Hum Mol Genet, Proc Natl Acad Sci U S A, Infect Immun, Am J Respir Cell Mol Biol, Dev Biol, J Bacteriol, Mol Endocrinol, BMC Cancer, Plant Physiol, Biol Reprod, Blood, J Immunol, FASEB J, Toxicol Sci, J Exp Bot, Nucleic Acids Res, Diabetes, Mol Cell Biol, Mol Cancer Ther, BMC Bioinformatics, Stem Cells, FEBS Lett, J Neurosci, Am J Pathol, J Biol Chem, J Virol, OTHER, Cancer Res, J Clin Endocrinol Metab, Plant Mol Biol, Clin Cancer Res, Genomics, Invest Ophthalmol Vis Sci, Mol Hum Reprod, Carcinogenesis, Gene, Endocrinology, Oncogene, Cancer Lett, Biochem Biophys Res Commun
Journals (Physiological Genomics)
![Page 72: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/72.jpg)
Figure: proportion of datasets shared, by institution (y-axis 0.0 to 1.0). Institutions, in plotted order: Stanford University; University of Pennsylvania; University of Illinois; University of California, Los Angeles; University of Wisconsin, Madison; University of Washington; University of California, Davis; The University of British Columbia; University of California, San Francisco; University of Florida; University of California, San Diego; University of Minnesota, Twin Cities; Baylor College of Medicine; OTHER; Max Planck Gesellschaft; Harvard University; Duke University Medical Center; Yale University; Johns Hopkins University; University of Pittsburgh; Washington University in Saint Louis; University of Toronto; University of California, Berkeley; University of Michigan, Ann Arbor; Michigan State University; National Cancer Institute; Tokyo Daigaku
Institutions (Stanford)
![Page 73: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/73.jpg)
Figure: proportion of datasets shared (y-axis 0.0 to 1.0) by institution rank (x-axis ticks 1 to 1901)
Institution rank
![Page 74: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/74.jpg)
multivariate analysis
![Page 75: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/75.jpg)
![Page 76: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/76.jpg)
factor analysis
![Page 77: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/77.jpg)
![Page 78: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/78.jpg)
![Page 79: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/79.jpg)
![Page 80: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/80.jpg)
![Page 81: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/81.jpg)
![Page 82: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/82.jpg)
![Page 83: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/83.jpg)
![Page 84: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/84.jpg)
![Page 85: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/85.jpg)
![Page 86: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/86.jpg)
![Page 87: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/87.jpg)
![Page 88: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/88.jpg)
multivariate logistic regression over the first-order factors
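The following charts report odds ratios for the factors. The minimal sketch below shows the basic machinery, assuming a plain logistic model that is linear in its predictors. It fits coefficients by Newton-Raphson and reports the odds ratio exp(b) with a Wald 95% interval, exp(b +/- 1.96 SE). In the study the covariates would be the factor scores. The nonlinear terms and interactions named in the chart captions are not reproduced, and the 2x2 counts are hypothetical. With a single binary covariate, the fitted odds ratio must equal the table's cross-product ratio, which gives a convenient check.

```python
import math


def solve(a, b):
    """Solve the square linear system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fitted_probabilities(X, beta):
    """Logistic function of the linear predictor for each row."""
    return [1.0 / (1.0 + math.exp(-sum(b * x for b, x in zip(beta, row)))) for row in X]


def information(X, p):
    """Fisher information X' W X with W = diag(p * (1 - p))."""
    k = len(X[0])
    return [[sum(pi * (1 - pi) * row[i] * row[j] for row, pi in zip(X, p))
             for j in range(k)] for i in range(k)]


def fit_logistic(rows, y, iterations=25):
    """Logistic regression by Newton-Raphson; returns (coefficients, standard errors).

    rows -- covariate rows (e.g. factor scores); an intercept column is prepended
    y    -- 0/1 outcomes (1 = dataset shared)
    """
    X = [[1.0] + [float(v) for v in row] for row in rows]
    beta = [0.0] * len(X[0])
    for _ in range(iterations):
        p = fitted_probabilities(X, beta)
        score = [sum((yi - pi) * row[j] for row, yi, pi in zip(X, y, p))
                 for j in range(len(beta))]
        beta = [b + s for b, s in zip(beta, solve(information(X, p), score))]
    k = len(beta)
    info = information(X, fitted_probabilities(X, beta))
    # Diagonal of the inverse information matrix gives the coefficient variances.
    se = [math.sqrt(solve(info, [1.0 if i == j else 0.0 for i in range(k)])[j])
          for j in range(k)]
    return beta, se


def odds_ratio_ci(coef, se, z=1.96):
    """Odds ratio exp(coef) with a Wald interval exp(coef +/- z * se)."""
    return math.exp(coef), math.exp(coef - z * se), math.exp(coef + z * se)


# Hypothetical 2x2 data: covariate = published in a journal with a sharing policy.
# Policy journals: 30 shared, 70 not. Other journals: 10 shared, 90 not.
rows = [[1]] * 100 + [[0]] * 100
y = [1] * 30 + [0] * 70 + [1] * 10 + [0] * 90
beta, se = fit_logistic(rows, y)
odds_ratio, low, high = odds_ratio_ci(beta[1], se[1])
print(round(odds_ratio, 3))  # 3.857, i.e. (30 * 90) / (70 * 10)
```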
![Page 89: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/89.jpg)
Odds Ratio chart (axis 0.25 to 8.00; intervals labelled 0.95), factors shown:
• Has journal policy
• Count of R01 & other NIH grants
• Authors prev GEO-AE sharing & OA & microarray creation
• NO K funding or P funding
• Institution high citations & collaboration
• Journal impact
• Journal policy consequences & long half-life
• NOT animals or mice
• Institution is government & NOT higher ed
• Last author num prev pubs & first year pub
• Large NIH grant
• Humans & cancer
• NO GEO reuse + YES high institution output
• First author num prev pubs & first year pub
Multivariate nonlinear regressions with interactions
![Page 90: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/90.jpg)
![Page 91: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/91.jpg)
logistic regression using second-order factors
![Page 92: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/92.jpg)
Odds Ratio chart (axis 0.25 to 4.00; intervals labelled 0.95), factors shown:
• OA journal & previous GEO-AE sharing
• Amount of NIH funding
• Journal impact factor and policy
• Higher Ed in USA
• Cancer & humans
Multivariate nonlinear regression with interactions
![Page 93: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/93.jpg)
![Page 94: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/94.jpg)
Conclusions:
• data sharing rates are increasing, but overall levels are low
Preliminary evidence:
• levels are particularly low in cancer
• levels are highest for those who:
  • publish in a journal with a policy
  • publish in an open access journal
  • have shared data before
![Page 95: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/95.jpg)
• data and filters were imperfect
• many assumptions
• didn’t capture all types of sharing
• don’t know how generalizable across datatypes
• should be considered hypothesis-generating
http://www.flickr.com/photos/vlastula/300102949/
![Page 96: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/96.jpg)
http://www.flickr.com/photos/gatewaystreets/3838452287/
![Page 97: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/97.jpg)
NSF-funded distributed framework and cyberinfrastructure for environmental science.
Dryad is a repository of data underlying scientific publications, with an initial focus on evolution, ecology, and related fields.
The National Evolutionary Synthesis Center, NSF-funded:
• Duke University
• UNC at Chapel Hill
• North Carolina State University
![Page 98: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/98.jpg)
1. new domain
![Page 99: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/99.jpg)
http://www.flickr.com/photos/paulhami/1020538523//
![Page 100: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/100.jpg)
![Page 101: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/101.jpg)
• evolution and ecology datasets
• raw data that support results
• upon publication or short embargo
• publicly on the internet
![Page 102: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/102.jpg)
challenges!
1. No PubMed
2. Diverse data types, norms, repositories
3. Data almost always collected for a specific hypothesis
4. Less public sharing so far
![Page 103: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/103.jpg)
2. new initiatives
![Page 104: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/104.jpg)
![Page 105: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/105.jpg)
JDAP (Joint Data Archiving Policy):
• The American Naturalist
• Evolution
• Journal of Evolutionary Biology
• Molecular Ecology
• Evolutionary Applications
• Genetics
• Heredity
• Molecular Biology and Evolution
• Systematic Biology
• Paleobiology
• BMC Evolutionary Biology
![Page 106: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/106.jpg)
http://www.flickr.com/photos/jima/606588905/
Blumenthal et al. Acad Med. 2006.
Campbell et al. JAMA. 2002.
Kyzas et al. J Natl Cancer Inst. 2005.
Vogeli et al. Acad Med. 2006.
Reidpath et al. Bioethics. 2001.
![Page 107: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/107.jpg)
3. reuse
http://www.flickr.com/photos/boitabulle/3668162701/
![Page 108: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/108.jpg)
who reuses data? when?
why aren’t they?
which datasets are most likely to be reused?
what can we do about it?
how many datasets could be reused but aren’t?
why?
who doesn’t?
does it matter?
![Page 109: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/109.jpg)
http://upload.wikimedia.org/wikipedia/commons/thumb/e/e6/Gamma_distribution_pdf.svg/500px-Gamma_distribution_pdf.svg.png
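The slide links to a plot of gamma probability density functions. The gamma distribution is a right-skewed distribution for positive quantities (waiting times are a common example), with shape k, scale θ, and mean kθ. As a reference for what each curve in such a plot represents, here is a minimal sketch of the density f(x) = x^(k-1) e^(-x/θ) / (Γ(k) θ^k). The function name `gamma_pdf` and the shape/scale pairs are illustrative choices, not values taken from the talk.

```python
import math


def gamma_pdf(x: float, k: float, theta: float) -> float:
    """Gamma density with shape k > 0 and scale theta > 0."""
    if k <= 0 or theta <= 0:
        raise ValueError("shape k and scale theta must be positive")
    if x < 0:
        return 0.0  # no probability mass on negative values
    if x == 0:
        # The density's value at zero depends on the shape parameter.
        if k < 1:
            return math.inf
        return 1.0 / theta if k == 1 else 0.0
    # Work on the log scale (lgamma) so large shapes do not overflow.
    log_density = (k - 1) * math.log(x) - x / theta - math.lgamma(k) - k * math.log(theta)
    return math.exp(log_density)


# Shape/scale pairs chosen only for illustration.
for k, theta in [(1.0, 2.0), (2.0, 2.0), (9.0, 0.5)]:
    print(k, theta, [round(gamma_pdf(x, k, theta), 3) for x in (0.5, 1.0, 2.0, 4.0)])
```

With k = 1 the density reduces to an exponential curve that is highest at zero; larger shapes move the peak away from zero, which produces the family of humped curves in the linked plot.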
![Page 110: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/110.jpg)
![Page 111: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/111.jpg)
![Page 112: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/112.jpg)
I post my data, code, and statistical scripts on GitHub (links from http://researchremix.org)
Share yours too!
http://www.flickr.com/photos/myklroventine/892446624/
![Page 113: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/113.jpg)
“Does anyone want your data?
That’s hard to predict […] After all, no one ever knocked on your door asking to buy those figurines collecting dust in your cabinet before you listed them on eBay.
Your data, too, may simply be awaiting an effective matchmaker.”
Got data? Nature Neuroscience (2007)
![Page 114: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/114.jpg)
thank you
Dept of Biomedical Informatics at U of Pittsburgh
Wendy Chapman for support and feedback
Todd Vision, Mike Whitlock for ongoing discussions
NIH NLM; NSF through DataONE, NESCent, and Dryad.
Open science online community and those who release their articles, datasets and photos openly
![Page 115: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/115.jpg)
![Page 116: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/116.jpg)
http://www.flickr.com/photos/jep42/3017149415/in/set-72157608797298056/
![Page 117: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/117.jpg)
variables
Journal mandates
![Page 118: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/118.jpg)
• readers
• reusers
• authors
• editors
• reviewers
• funders
• database designers, maintainers, curators
• patients, subjects, or populations
perspectives, and also driving towards actionable results for these groups
![Page 119: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/119.jpg)
![Page 120: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/120.jpg)
![Page 121: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/121.jpg)
http://www.flickr.com/photos/sunrise/35819369/
http://www.flickr.com/photos/fboyd/2156630044/
![Page 122: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/122.jpg)
Blumenthal et al. Acad Med. 2006.
[Figure: Correlates with self-reported data withholding: industry involvement; perceived competitiveness of field; male; sharing discouraged in training; human participants; academic productivity. Horizontal axis from 0 to 3.]
![Page 123: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/123.jpg)
Campbell et al. JAMA. 2002.
[Figure: Self-reported reasons for data withholding (horizontal axis 0% to 80%): sharing is too much effort; want student or junior faculty to publish more; they themselves want to publish more; cost; industrial sponsor; confidentiality; commercial value of results.]
![Page 124: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/124.jpg)
Table 2: Second-order factor loadings, by first-order factors

| Second-order factor | Loading | First-order factor |
| --- | --- | --- |
| Amount of NIH funding | 0.88 | Count of R01 & other NIH grants |
| | 0.49 | Large NIH grant |
| | -0.55 | NO K funding or P funding |
| Cancer & humans | 0.83 | Humans & cancer |
| OA journal & previous GEO-AE sharing | 0.59 | Authors prev GEOAE sharing & OA & microarray creation |
| | 0.43 | Institution high citations & collaboration |
| | 0.31 | First author num prev pubs & first year pub |
| | -0.36 | Last author num prev pubs & first year pub |
| Journal impact factor and policy | 0.57 | Journal impact |
| | 0.51 | Last author num prev pubs & first year pub |
| Higher Ed in USA | 0.40 | NO geo reuse + YES high institution output |
| | -0.44 | Institution is government & NOT higher ed |

![Page 125: Public data archiving: Who does? Who doesn't? What can we do about it?](https://reader033.fdocuments.in/reader033/viewer/2022051816/5455ba74b1af9f40378b49ed/html5/thumbnails/125.jpg)
Table 3: Second-order factor loadings, by original variables

| Second-order factor | Loading | Original variable |
| --- | --- | --- |
| Amount of NIH funding | 0.87 | nih.cumulative.years.tr |
| | 0.85 | num.grants.via.nih.tr |
| | 0.84 | max.grant.duration.tr |
| | 0.82 | num.grant.numbers.tr |
| | 0.80 | pubmed.is.funded.nih |
| | 0.79 | nih.max.max.dollars.tr |
| | 0.70 | nih.sum.avg.dollars.tr |
| | 0.70 | nih.sum.sum.dollars.tr |
| | 0.59 | has.R.funding |
| | 0.59 | num.post2003.morethan500k.tr |
| | 0.58 | country.usa |
| | 0.58 | has.U.funding |
| | 0.57 | has.R01.funding |
| | 0.55 | num.post2003.morethan750k.tr |
| | 0.53 | has.T.funding |
| | 0.53 | num.post2003.morethan1000k.tr |
| | 0.49 | num.post2004.morethan500k.tr |
| | 0.45 | num.post2004.morethan750k.tr |
| | 0.44 | has.P.funding |
| | 0.43 | num.post2004.morethan1000k.tr |
| | 0.43 | num.nih.is.nci.tr |
| | 0.35 | num.post2005.morethan500k.tr |
| | 0.32 | num.nih.is.nigms.tr |
| | 0.31 | num.post2005.morethan750k.tr |
| Cancer & humans | 0.60 | pubmed.is.cancer |
| | 0.59 | pubmed.is.humans |
| | 0.52 | pubmed.is.cultured.cells |
| | 0.43 | pubmed.is.core.clinical.journal |
| | 0.39 | institution.is.medical |
| | -0.58 | pubmed.is.plants |
| | -0.50 | pubmed.is.fungi |
| | -0.37 | pubmed.is.shared.other |
| | -0.30 | pubmed.is.bacteria |
| OA journal & previous GEO-AE sharing | 0.40 | first.author.num.prev.geoae.sharing.tr |
| | 0.37 | pubmed.is.open.access |
| | 0.37 | first.author.num.prev.oa.tr |
| | 0.35 | last.author.num.prev.geoae.sharing.tr |
| | 0.32 | pubmed.is.effectiveness |
| | 0.32 | last.author.num.prev.oa.tr |
| | 0.31 | pubmed.is.geo.reuse |
| | -0.38 | country.japan |
| Journal impact factor and policy | 0.48 | journal.impact.factor.log |
| | 0.47 | jour.policy.requires.microarray.accession |
| | 0.46 | jour.policy.mentions.exceptions |
| | 0.46 | pubmed.num.cites.from.pmc.tr |
| | 0.45 | journal.5yr.impact.factor.log |
| | 0.45 | jour.policy.contains.word.miame.mged |
| | 0.42 | last.author.num.prev.pmc.cites.tr |
| | 0.41 | jour.policy.requests.accession |
| | 0.40 | journal.immediacy.index.log |
| | 0.40 | journal.num.articles.2008.tr |
| | 0.39 | years.ago.tr |
| | 0.36 | jour.policy.says.must.deposit |
| | 0.35 | pubmed.num.cites.from.pmc.per.year |
| | 0.33 | institution.mean.norm.citation.score |
| | 0.32 | last.author.year.first.pub.ago.tr |
| | 0.31 | country.usa |
| | 0.31 | last.author.num.prev.pubs.tr |
| | 0.31 | jour.policy.contains.word.microarray |
| | -0.31 | pubmed.is.open.access |
| Higher Ed in USA | 0.36 | institution.stanford |
| | 0.36 | institution.is.higher.ed |
| | 0.35 | country.usa |
| | 0.35 | has.R.funding |
| | 0.33 | has.R01.funding |
| | 0.30 | institution.harvard |
| | -0.37 | institution.is.govnt |

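Every loading listed in Tables 2 and 3 has an absolute value of at least 0.30. That suggests, although the slides do not say so, that only salient loadings above a display cutoff were shown, ordered by strength. Below is a minimal sketch of such a display filter, assuming a cutoff of 0.3. It uses the "Higher Ed in USA" loadings copied from Table 3. The names `salient_loadings` and `HIGHER_ED_IN_USA` are hypothetical.

```python
from typing import List, Tuple

# Loadings on the "Higher Ed in USA" second-order factor, copied from Table 3.
HIGHER_ED_IN_USA: List[Tuple[str, float]] = [
    ("institution.stanford", 0.36),
    ("institution.is.higher.ed", 0.36),
    ("country.usa", 0.35),
    ("has.R.funding", 0.35),
    ("has.R01.funding", 0.33),
    ("institution.harvard", 0.30),
    ("institution.is.govnt", -0.37),
]


def salient_loadings(loadings: List[Tuple[str, float]], cutoff: float = 0.3) -> List[Tuple[str, float]]:
    """Keep variables whose absolute loading is at least `cutoff`,
    strongest first. The sort is stable, so ties keep their input order."""
    kept = [(name, value) for name, value in loadings if abs(value) >= cutoff]
    return sorted(kept, key=lambda item: abs(item[1]), reverse=True)


for name, value in salient_loadings(HIGHER_ED_IN_USA):
    print(f"{value:+.2f}  {name}")
```

Sorting by absolute value puts strong negative loadings, such as institution.is.govnt, next to strong positive ones. A negative sign means the variable tends to be lower when the factor score is higher. It does not mean the loading is weak.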