Thesis defense, Heather Piwowar, Sharing biomedical research data

Foundational studies for measuring the impact, prevalence, and patterns of publicly sharing biomedical research data Heather Piwowar Doctoral Defense March 24, 2010 Department of Biomedical Informatics University of Pittsburgh

description

Presentation by Heather Piwowar as PhD dissertation defense on March 24, 2010 at the Dept of Biomedical Informatics, U of Pittsburgh. "Foundational studies for measuring the impact, prevalence, and patterns of publicly sharing biomedical research data." I passed :)

Transcript of Thesis defense, Heather Piwowar, Sharing biomedical research data

Page 1: Thesis defense, Heather Piwowar, Sharing biomedical research data

Foundational studies for measuring the impact, prevalence, and patterns of publicly sharing biomedical research data

Heather Piwowar
Doctoral Defense
March 24, 2010

Department of Biomedical Informatics
University of Pittsburgh

Page 2: Thesis defense, Heather Piwowar, Sharing biomedical research data

Wendy Chapman, PhD
Brian Butler, PhD
Ellen Detlefsen, DLS
Madhavi Ganapathiraju, PhD
Gunther Eysenbach, MD, MPH

Page 3: Thesis defense, Heather Piwowar, Sharing biomedical research data

http://www.metmuseum.org/toah/ho/09/euwf/ho_24.45.1.htm

Page 4: Thesis defense, Heather Piwowar, Sharing biomedical research data

http://www.flickr.com/photos/jsmjr/62443357/

Page 5: Thesis defense, Heather Piwowar, Sharing biomedical research data

http://www.flickr.com/photos/camilleharrington/3587294608/

Page 6: Thesis defense, Heather Piwowar, Sharing biomedical research data

http://www.flickr.com/photos/rkuhnau/3318245976/

Page 7: Thesis defense, Heather Piwowar, Sharing biomedical research data

http://www.flickr.com/photos/rkuhnau/3317418699/

Page 8: Thesis defense, Heather Piwowar, Sharing biomedical research data

http://www.flickr.com/photos/zemlinki/261617721/

Page 9: Thesis defense, Heather Piwowar, Sharing biomedical research data

http://www.flickr.com/photos/tracenmatt/3020786491/

Page 10: Thesis defense, Heather Piwowar, Sharing biomedical research data

http://www.flickr.com/photos/conformpdx/1796399674/

Page 11: Thesis defense, Heather Piwowar, Sharing biomedical research data

http://www.flickr.com/photos/the-o/2078239333/

Page 12: Thesis defense, Heather Piwowar, Sharing biomedical research data

lots of data sharing!

http://www.genome.jp/en/db_growth.html

Page 13: Thesis defense, Heather Piwowar, Sharing biomedical research data

but how much isn’t shared?

what isn’t shared?

who isn’t sharing it?

why not?

what can we do about it?

how much does it matter?

Page 14: Thesis defense, Heather Piwowar, Sharing biomedical research data

Prior studies: surveys and/or manual audits

http://www.flickr.com/photos/jima/606588905/

Blumenthal et al. Acad Med. 2006.
Campbell et al. JAMA. 2002.
Kyzas et al. J Natl Cancer Inst. 2005.
Vogeli et al. Acad Med. 2006.
Reidpath et al. Bioethics. 2001.

Page 15: Thesis defense, Heather Piwowar, Sharing biomedical research data

Limitations of related work

• small sample sizes
• relatively few variables
• self-reporting bias
• not much focus on measuring demonstrated behavior
• not much focus on rewards
• not much focus on policy
• not much focus on biomedical data other than DNA sequences

Page 16: Thesis defense, Heather Piwowar, Sharing biomedical research data

I believe analysis of the impact, prevalence, and patterns with which researchers share and withhold biomedical data can uncover rewards, best practices, and opportunities for increased adoption of data sharing.

http://www.flickr.com/photos/archeon/2941655917/

Page 17: Thesis defense, Heather Piwowar, Sharing biomedical research data

Goal of this dissertation:

Collect useful evidence on patterns of data sharing behaviour through methods that can be applied broadly, repeatably, and cost-effectively.

Page 18: Thesis defense, Heather Piwowar, Sharing biomedical research data

Aim 1:  Does sharing have benefit for those who share?

Aim 2:  Can sharing and withholding be systematically measured? 

Aim 3:  How often is data shared?  What predicts sharing?  How can we model sharing behavior?

Page 19: Thesis defense, Heather Piwowar, Sharing biomedical research data

Scope:

• raw research data
• upon study publication
• making data publicly available on the Internet
• one datatype

Page 20: Thesis defense, Heather Piwowar, Sharing biomedical research data

microarray data

http://en.wikipedia.org/wiki/DNA_microarray

http://en.wikipedia.org/wiki/Image:Heatmap.png

http://commons.wikimedia.org/wiki/File:DNA_double_helix_vertikal.PNG

Page 21: Thesis defense, Heather Piwowar, Sharing biomedical research data

microarray data

Page 22: Thesis defense, Heather Piwowar, Sharing biomedical research data

Aim 1

Page 23: Thesis defense, Heather Piwowar, Sharing biomedical research data

Aim 1:  Does sharing have benefit for those who share?

http://www.flickr.com/photos/sunrise/35819369/

Page 24: Thesis defense, Heather Piwowar, Sharing biomedical research data

Aim 1:  Does sharing have benefit for those who share?

http://www.flickr.com/photos/sunrise/35819369/

Benefit of value:  Citations.

Page 25: Thesis defense, Heather Piwowar, Sharing biomedical research data

dataset: 85 cancer microarray trials published in 1999-2003, as identified by Ntzani and Ioannidis (2003)

citations: ISI Web of Science Citation index, citations from 2004-2005

data sharing locations: Publisher and lab websites, microarray databases, WayBack Internet Archive, Oncomine

statistics: Multivariate linear regression

Aim 1:  Does sharing have benefit for those who share?
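A minimal sketch of what a multivariate linear regression on citation counts could look like. The slides only name the method; regressing the logarithm of the citation count, the covariate (journal impact factor), the toy data, and the coefficients below are all assumptions made for illustration, not the study's model or results.

```python
import math

def ols(X, y):
    """Ordinary least squares by solving the normal equations (X'X) b = X'y."""
    n, k = len(X), len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]

# Hypothetical studies: (shared_data flag, journal impact factor)
studies = [(0, 1.0), (1, 2.0), (0, 3.0), (1, 4.0), (0, 5.0), (1, 7.0)]

# Synthetic citation counts generated from made-up coefficients
# (intercept, sharing effect, impact-factor effect) on the log scale.
made_up = (1.0, 0.4, 0.1)
citations = [math.exp(made_up[0] + made_up[1] * s + made_up[2] * f)
             for s, f in studies]

X = [[1.0, float(s), f] for s, f in studies]   # intercept column first
log_citations = [math.log(c) for c in citations]
coef = ols(X, log_citations)

# On the log scale, exp(coefficient) is a multiplicative change in citations.
sharing_multiplier = math.exp(coef[1])
```

Because the outcome is on a log scale, the coefficient on the sharing flag is read as a ratio of citation counts between sharing and non-sharing studies, holding the other covariates fixed.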

Page 26: Thesis defense, Heather Piwowar, Sharing biomedical research data

Aim 1:  Does sharing have benefit for those who share?

Page 27: Thesis defense, Heather Piwowar, Sharing biomedical research data

Aim 1:  Does sharing have benefit for those who share?

Note the logarithmic scale

Page 28: Thesis defense, Heather Piwowar, Sharing biomedical research data

Aim 1:  Does sharing have benefit for those who share?

Page 29: Thesis defense, Heather Piwowar, Sharing biomedical research data

Conclusion:  data sharing is associated with an increase in citation rate

Aim 1:  Does sharing have benefit for those who share?

Page 30: Thesis defense, Heather Piwowar, Sharing biomedical research data

Next:

What factors predict sharing?

http://www.flickr.com/photos/ryanr/142455033/

Page 31: Thesis defense, Heather Piwowar, Sharing biomedical research data

Can I use the same methods of Aim 1 to choose studies and determine data sharing status?

Page 32: Thesis defense, Heather Piwowar, Sharing biomedical research data

Can I use the same methods of Aim 1 to choose studies and determine data sharing status?

No, those methods don’t scale to identify or classify enough datapoints

Page 33: Thesis defense, Heather Piwowar, Sharing biomedical research data

Aim 2

Page 34: Thesis defense, Heather Piwowar, Sharing biomedical research data

Need automated methods to:

Aim 2a: Identify studies that create datasets

Aim 2b: Determine which of these have in fact been shared

Page 35: Thesis defense, Heather Piwowar, Sharing biomedical research data

Aim 2a: Identify studies that create gene expression microarray data

http://www.flickr.com/photos/lofaesofa/248546821/

Page 36: Thesis defense, Heather Piwowar, Sharing biomedical research data

Aim 2a: Identify studies that create gene expression microarray data

Easy, via MeSH indexing terms?

gene expression profiling and/or

microarray analysis

Unfortunately, these have neither high recall nor precision.

Page 37: Thesis defense, Heather Piwowar, Sharing biomedical research data

Look for wetlab methods in full text:

http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1522022&tool=pmcentrez
http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1590031&tool=pmcentrez
http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1482311&tool=pmcentrez#id331936
http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2082469&tool=pmcentrez
http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=126870&tool=pmcentrez#id442745

Aim 2a: Identify studies that create gene expression microarray data

Page 38: Thesis defense, Heather Piwowar, Sharing biomedical research data

Query environment:

Full-text portals query 85% of articles available through U of Pittsburgh library digital subscriptions.

Page 39: Thesis defense, Heather Piwowar, Sharing biomedical research data

Development set?

Open access articles.

Page 40: Thesis defense, Heather Piwowar, Sharing biomedical research data

Features? Unigrams and bigrams from full text

Training classifications? Automatic filter for whether publication had an associated dataset deposited in a database

Feature selection and combination:
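The unigram and bigram features named on this slide can be sketched as follows. The tokenizer (lowercase, alphanumeric runs, internal hyphens kept) and the sample sentence are assumptions for illustration; the slides do not say how the text was tokenized.

```python
import re
from collections import Counter

def ngram_features(text):
    """Count unigrams and bigrams in a piece of article text."""
    # Assumed tokenization: lowercase words, keeping internal hyphens
    tokens = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())
    unigrams = Counter(tokens)
    bigrams = Counter(" ".join(pair) for pair in zip(tokens, tokens[1:]))
    return unigrams, bigrams

# Hypothetical methods-section sentence
sample = "Total RNA was extracted with TRIzol and hybridized to the microarray."
uni, bi = ngram_features(sample)
```

Each article then becomes a bag of such counts, which a feature-selection step can rank by how well a term separates articles with a deposited dataset from those without.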

Page 41: Thesis defense, Heather Piwowar, Sharing biomedical research data

Derived query:

("gene expression" AND microarray AND cell AND rna)

AND (rneasy OR trizol OR "real-time pcr")

NOT ("tissue microarray*" OR "cpg island*")
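To make the boolean structure of the derived query concrete, here is a sketch that applies it to a block of text in memory. The matching rules (case-insensitive whole words, a trailing * matching any word ending) and the function names are assumptions; the full-text portals actually queried may match terms differently.

```python
import re

def has(term, text):
    """True if term occurs in text; a trailing * matches any word ending."""
    if term.endswith("*"):
        pattern = r"\b" + re.escape(term[:-1]) + r"\w*"
    else:
        pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None

def matches_derived_query(text):
    # ("gene expression" AND microarray AND cell AND rna)
    required = all(has(t, text)
                   for t in ("gene expression", "microarray", "cell", "rna"))
    # AND (rneasy OR trizol OR "real-time pcr")
    wetlab = any(has(t, text) for t in ("rneasy", "trizol", "real-time pcr"))
    # NOT ("tissue microarray*" OR "cpg island*")
    excluded = any(has(t, text)
                   for t in ("tissue microarray*", "cpg island*"))
    return required and wetlab and not excluded

# Hypothetical snippets of article text
hit = ("Gene expression was profiled by microarray. "
       "Each cell pellet was lysed and RNA isolated with TRIzol.")
excluded_hit = hit + " Tissue microarrays were also stained."
no_wetlab = "Gene expression microarray analysis of cell RNA."
```

The three clauses mirror the query: topic terms that must all appear, at least one wet-lab reagent or method, and an exclusion for articles about other kinds of arrays.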

Page 42: Thesis defense, Heather Piwowar, Sharing biomedical research data

Evaluation:

Ochsner et al. Nature Methods (2008) vol. 5 (12) pp. 991
• 400 studies across 20 journals

Precision: 90% (86% to 93%)
Recall: 56% (52% to 61%)
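For readers less familiar with these measures, a short sketch of how precision and recall follow from counts of query hits against a reference standard. The counts below are made up and are not the study's. The Wilson score interval is one common way to attach a 95% interval to a proportion; the slides do not say which interval method was used, so treat that part as an assumption.

```python
import math

def precision_recall(true_pos, false_pos, false_neg):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return precision, recall

def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 for roughly 95%)."""
    p = successes / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total
                         + z * z / (4 * total * total)) / denom
    return center - half, center + half

# Hypothetical counts: 8 correct hits, 2 wrong hits, 4 missed studies
precision, recall = precision_recall(true_pos=8, false_pos=2, false_neg=4)
precision_low, precision_high = wilson_interval(8, 8 + 2)
```

High precision with moderate recall means that nearly every article the query returns did create microarray data, while a sizable share of such articles is missed.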

Page 43: Thesis defense, Heather Piwowar, Sharing biomedical research data

Conclusion:  We derived a query with high precision and adequate recall to identify studies that created microarray data

Aim 2a: Identify studies that create gene expression microarray data

Page 44: Thesis defense, Heather Piwowar, Sharing biomedical research data

Aim 2b

Page 45: Thesis defense, Heather Piwowar, Sharing biomedical research data

Aim 2b: Identify studies that share their expression microarray data

http://www.flickr.com/photos/dcassaa/422261773/

Page 46: Thesis defense, Heather Piwowar, Sharing biomedical research data

Aim 2b: Identify studies that share their expression microarray data

Page 47: Thesis defense, Heather Piwowar, Sharing biomedical research data

Aim 2b: Identify studies that share their expression microarray data

Page 48: Thesis defense, Heather Piwowar, Sharing biomedical research data

Querying GEO and ArrayExpress for PubMed IDs identified 77% of datasets that were publicly available somewhere on the internet.

Aim 2b: Identify studies that share their expression microarray data

Page 49: Thesis defense, Heather Piwowar, Sharing biomedical research data

Aim 2b: Identify studies that share their expression microarray data

Page 50: Thesis defense, Heather Piwowar, Sharing biomedical research data

Aim 2b: Identify studies that share their expression microarray data

Page 51: Thesis defense, Heather Piwowar, Sharing biomedical research data

Conclusion:  we have a method to find most gene expression microarray datasets shared on the internet, without much bias.

Aim 2b: Identify studies that share their expression microarray data

Page 52: Thesis defense, Heather Piwowar, Sharing biomedical research data

Aim 3

Page 53: Thesis defense, Heather Piwowar, Sharing biomedical research data

Aim 3 – How often is data shared? What predicts sharing? How can we model sharing behavior?

Aim 2a + 

Aim 2b + 

lots of stats

http://www.flickr.com/photos/cogdog/123072/

Page 54: Thesis defense, Heather Piwowar, Sharing biomedical research data

Is research data shared after publication?

Funder Journal Investigator Institution Study

Page 55: Thesis defense, Heather Piwowar, Sharing biomedical research data

funded by NIH?

size of grant

sharing plan req’d?

funded by non-NIH?

impact factor

strength of policy

open access?

number of microarray studies published

years since first paper

# pubs

# citations

previously shared?

previously reused?

gender

sector

size

impact rank

country

humans?

mice?

plants?

cancer?

clinical trial?

number of authors

year

Funder Journal Investigator Institution Study

Page 56: Thesis defense, Heather Piwowar, Sharing biomedical research data

journal rank

Page 57: Thesis defense, Heather Piwowar, Sharing biomedical research data

“An inherent principle of publication is that others should be able to replicate and build upon the authors' published claims. Therefore, a condition of publication in a Nature journal is that authors are required to make materials, data and associated protocols available in a publicly accessible database …”

http://www.nature.com/authors/editorial_policies/availability.html

http://www.nature.com/nature/journal/v453/n7197/index.html

journal data sharing policy

Page 58: Thesis defense, Heather Piwowar, Sharing biomedical research data

institution rank

Yu et al. BMC medical informatics and decision making (2007) vol. 7 pp. 17

Page 59: Thesis defense, Heather Piwowar, Sharing biomedical research data

study type

Page 60: Thesis defense, Heather Piwowar, Sharing biomedical research data

Author publication history:

Citation counts:

Author-ity web serviceTorvik & Smalheiser. (2009). Author Name Disambiguation in MEDLINE. ACM Transactions on Knowledge Discovery from Data, 3(3):11.

Author name disambiguation:

author “experience”

Page 61: Thesis defense, Heather Piwowar, Sharing biomedical research data

author gender

Page 62: Thesis defense, Heather Piwowar, Sharing biomedical research data

funding level

PubMed grant lists + NIH grant details

Page 63: Thesis defense, Heather Piwowar, Sharing biomedical research data

funder mandates

Requires a data sharing plan for studies funded after October 2003 that receive more than $500 000 in direct funding per year

Page 64: Thesis defense, Heather Piwowar, Sharing biomedical research data

Proxy for NIH data sharing policy applicability:

If in any year since 2004,

• funded by an NIH grant number with a “1” or “2” type code

• received more than $750 000 in total funding from the grant

funder mandates
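The proxy rule on this slide can be written as a small predicate. This is a sketch under stated assumptions: the two bullets are combined with AND, "since 2004" includes 2004, the funding threshold is checked per grant per year, and the record layout (`GrantYear`) is hypothetical.

```python
from collections import namedtuple

# Hypothetical record: one row per grant per fiscal year.
# type_code is the leading digit of the NIH grant number ("1", "2", ...).
GrantYear = namedtuple("GrantYear", "type_code fiscal_year total_funding")

def policy_proxy_applies(grant_years):
    """True if any grant-year meets both conditions of the proxy rule."""
    return any(
        g.fiscal_year >= 2004                 # assumed inclusive of 2004
        and g.type_code in ("1", "2")         # "1" or "2" type code
        and g.total_funding > 750_000         # more than $750 000 in total funding
        for g in grant_years
    )

# Hypothetical funding histories
qualifies = [GrantYear("1", 2005, 800_000.0)]
wrong_type = [GrantYear("5", 2005, 900_000.0)]
too_early = [GrantYear("1", 2003, 900_000.0)]
too_small = [GrantYear("2", 2006, 750_000.0)]
```

The proxy uses total funding because that is what the grant records expose, whereas the policy itself is stated in terms of direct funding, which is why the threshold differs from the $500 000 figure on the previous slide.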

Page 65: Thesis defense, Heather Piwowar, Sharing biomedical research data

and so on...

124 variables

Page 66: Thesis defense, Heather Piwowar, Sharing biomedical research data

Univariate proportions

Factor analysis

Logistic regression

Second-order factor analysis

More logistic regression

stats

Page 67: Thesis defense, Heather Piwowar, Sharing biomedical research data

http://www.flickr.com/photos/blatzandchocolate/4281306244/

Page 68: Thesis defense, Heather Piwowar, Sharing biomedical research data

11,603 datapoints

we found shared datasets for 25%

results

Page 69: Thesis defense, Heather Piwowar, Sharing biomedical research data

[Figure: proportion of articles with datasets found in GEO or ArrayExpress (y-axis, 0.05 to 0.35) by year article published (x-axis, 2000 to 2009)]

Proportion of articles with shared datasets, by year

Across time

Page 70: Thesis defense, Heather Piwowar, Sharing biomedical research data

univariate analysis

Page 71: Thesis defense, Heather Piwowar, Sharing biomedical research data

[Figure: proportion of datasets shared (y-axis, 0.0 to 1.0) by journal. Journals along the x-axis, in the order shown: Physiol Genomics, PLoS Genet, Genome Biol, Microbiology, PLoS One, BMC Genomics, Plant Cell, Genome Res, Eukaryot Cell, Appl Environ Microbiol, BMC Med Genomics, Hum Mol Genet, Proc Natl Acad Sci U S A, Infect Immun, Am J Respir Cell Mol Biol, Dev Biol, J Bacteriol, Mol Endocrinol, BMC Cancer, Plant Physiol, Biol Reprod, Blood, J Immunol, FASEB J, Toxicol Sci, J Exp Bot, Nucleic Acids Res, Diabetes, Mol Cell Biol, Mol Cancer Ther, BMC Bioinformatics, Stem Cells, FEBS Lett, J Neurosci, Am J Pathol, J Biol Chem, J Virol, OTHER, Cancer Res, J Clin Endocrinol Metab, Plant Mol Biol, Clin Cancer Res, Genomics, Invest Ophthalmol Vis Sci, Mol Hum Reprod, Carcinogenesis, Gene, Endocrinology, Oncogene, Cancer Lett, Biochem Biophys Res Commun]

Journals

Page 72: Thesis defense, Heather Piwowar, Sharing biomedical research data

[Figure: proportion of datasets shared (y-axis, 0.0 to 1.0) by institution. Institutions along the x-axis, in the order shown: Stanford University; University of Pennsylvania; University of Illinois; University of California, Los Angeles; University of Wisconsin, Madison; University of Washington; University of California, Davis; The University of British Columbia; University of California, San Francisco; University of Florida; University of California, San Diego; University of Minnesota, Twin Cities; Baylor College of Medicine; OTHER; Max Planck Gesellschaft; Harvard University; Duke University Medical Center; Yale University; Johns Hopkins University; University of Pittsburgh; Washington University in Saint Louis; University of Toronto; University of California, Berkeley; University of Michigan, Ann Arbor; Michigan State University; National Cancer Institute; Tokyo Daigaku]

Institutions

Page 73: Thesis defense, Heather Piwowar, Sharing biomedical research data

[Figure: proportion of datasets shared (y-axis, 0.0 to 1.0) by institution rank (x-axis, 1 to 1901)]

Institution rank

Page 74: Thesis defense, Heather Piwowar, Sharing biomedical research data

multivariate analysis

Page 75: Thesis defense, Heather Piwowar, Sharing biomedical research data
Page 76: Thesis defense, Heather Piwowar, Sharing biomedical research data

factor analysis

Page 77: Thesis defense, Heather Piwowar, Sharing biomedical research data
Page 78: Thesis defense, Heather Piwowar, Sharing biomedical research data
Page 79: Thesis defense, Heather Piwowar, Sharing biomedical research data
Page 80: Thesis defense, Heather Piwowar, Sharing biomedical research data
Page 81: Thesis defense, Heather Piwowar, Sharing biomedical research data
Page 82: Thesis defense, Heather Piwowar, Sharing biomedical research data
Page 83: Thesis defense, Heather Piwowar, Sharing biomedical research data
Page 84: Thesis defense, Heather Piwowar, Sharing biomedical research data
Page 85: Thesis defense, Heather Piwowar, Sharing biomedical research data
Page 86: Thesis defense, Heather Piwowar, Sharing biomedical research data
Page 87: Thesis defense, Heather Piwowar, Sharing biomedical research data
Page 88: Thesis defense, Heather Piwowar, Sharing biomedical research data
Page 89: Thesis defense, Heather Piwowar, Sharing biomedical research data

logistic regression

Page 90: Thesis defense, Heather Piwowar, Sharing biomedical research data

[Figure, two panels: Multivariate nonlinear regressions with interactions. X-axis: Odds Ratio (ticks at 0.25, 0.50, 1.00, 2.00, 4.00, 8.00). Plot label: 0.95]

Factors, first panel:
• Has journal policy
• Count of R01 & other NIH grants
• Authors prev GEOAE sharing & OA & microarray creation
• NO K funding or P funding
• Institution high citations & collaboration
• Journal impact
• Journal policy consequences & long halflife
• NOT animals or mice
• Institution is government & NOT higher ed
• Last author num prev pubs & first year pub
• Large NIH grant
• Humans & cancer
• NO geo reuse + YES high institution output
• First author num prev pubs & first year pub

Second panel: the same factors, with "Journal impact" and "Journal policy consequences & long halflife" listed before "Institution high citations & collaboration".

Page 91: Thesis defense, Heather Piwowar, Sharing biomedical research data

[Figure, two panels: Multivariate nonlinear regressions with interactions. X-axis: Odds Ratio (ticks at 0.25, 0.50, 1.00, 2.00, 4.00, 8.00). Plot label: 0.95]

Factors, first panel:
• Has journal policy
• Count of R01 & other NIH grants
• Authors prev GEOAE sharing & OA & microarray creation
• NO K funding or P funding
• Institution high citations & collaboration
• Journal impact
• Journal policy consequences & long halflife
• NOT animals or mice
• Institution is government & NOT higher ed
• Last author num prev pubs & first year pub
• Large NIH grant
• Humans & cancer
• NO geo reuse + YES high institution output
• First author num prev pubs & first year pub

Second panel: the same factors, with "Journal impact" and "Journal policy consequences & long halflife" listed before "Institution high citations & collaboration".

Page 92: Thesis defense, Heather Piwowar, Sharing biomedical research data

second-order factor analysis

Page 93: Thesis defense, Heather Piwowar, Sharing biomedical research data

[Figure: the first-order factors plotted against each other; the same labels appear along both axes]

Factor labels:
• Journal impact
• Last author num prev pubs & first year pub
• NO geo reuse + YES high institution output
• Has journal policy
• Large NIH grant
• Count of R01 & other NIH grants
• Humans & cancer
• First author num prev pubs & first year pub
• NOT animals or mice
• Institution high citations & collaboration
• Authors prev GEOAE sharing & OA & microarray creation
• Journal policy consequences & long halflife
• NO K funding or P funding
• NOT institution NCI or intramural
• Institution is government & NOT higher ed

Page 94: Thesis defense, Heather Piwowar, Sharing biomedical research data
Page 95: Thesis defense, Heather Piwowar, Sharing biomedical research data
Page 96: Thesis defense, Heather Piwowar, Sharing biomedical research data

logistic regression using second-order factors

Page 97: Thesis defense, Heather Piwowar, Sharing biomedical research data

[Figure: Multivariate nonlinear regression with interactions. X-axis: Odds Ratio (ticks at 0.25, 0.50, 1.00, 2.00, 4.00). Plot label: 0.95]

Second-order factors:
• OA journal & previous GEO-AE sharing
• Amount of NIH funding
• Journal impact factor and policy
• Higher Ed in USA
• Cancer & humans

Page 98: Thesis defense, Heather Piwowar, Sharing biomedical research data

[Figure: Multivariate nonlinear regression with interactions. X-axis: Odds Ratio (ticks at 0.25, 0.50, 1.00, 2.00, 4.00). Plot label: 0.95]

Second-order factors:
• OA journal & previous GEO-AE sharing
• Amount of NIH funding
• Journal impact factor and policy
• Higher Ed in USA
• Cancer & humans

Page 99: Thesis defense, Heather Piwowar, Sharing biomedical research data

size of effect: split at the medians of the factors
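A median split of this kind can be sketched as below: studies are divided at the median of a factor score and the sharing rate is computed in each half. The factor scores, the sharing flags, and the choice to put scores equal to the median in the lower half are all assumptions for illustration.

```python
from statistics import median

def sharing_rate(flags):
    """Fraction of studies with a shared dataset (flags are 0 or 1)."""
    return sum(flags) / len(flags) if flags else float("nan")

def median_split_rates(records):
    """records: list of (factor_score, shared) pairs, shared in {0, 1}.

    Returns (rate above the median, rate at or below the median).
    """
    cut = median(score for score, _ in records)
    above = [shared for score, shared in records if score > cut]
    at_or_below = [shared for score, shared in records if score <= cut]
    return sharing_rate(above), sharing_rate(at_or_below)

# Hypothetical factor scores paired with sharing flags
toy = [(-1.2, 0), (-0.5, 0), (-0.1, 1), (0.3, 0), (0.8, 1), (1.5, 1)]
high_rate, low_rate = median_split_rates(toy)
```

Comparing the two rates gives an effect size in plain percentages, which is easier to read on a slide than an odds ratio.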

Page 100: Thesis defense, Heather Piwowar, Sharing biomedical research data

Overall: 25%

Page 101: Thesis defense, Heather Piwowar, Sharing biomedical research data

Open access/previous sharing: 31%

Less OA/prev sharing: 19%

Overall: 25%

Page 102: Thesis defense, Heather Piwowar, Sharing biomedical research data

Open access/previous sharing: 31%

Less OA/prev sharing: 19%

cancer/human: 18%

Not cancer/human: 32%

Overall: 25%

Page 103: Thesis defense, Heather Piwowar, Sharing biomedical research data

                                    cancer/human: 18%    Not cancer/human: 32%
Open access/previous sharing: 31%   24%                  37%
Less OA/prev sharing: 19%           13%                  25%

Overall: 25%
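To relate these percentages to the odds-ratio scale used in the regression plots, the sketch below converts pairs of sharing rates from the slides into crude odds ratios. These are unadjusted figures computed only from the marginal percentages shown here; they are not the adjusted estimates from the regression, and the function names are hypothetical.

```python
def odds(p):
    """Convert a proportion to odds."""
    return p / (1.0 - p)

def odds_ratio(p_group, p_reference):
    """Crude (unadjusted) odds ratio between two sharing rates."""
    return odds(p_group) / odds(p_reference)

# Open access/previous sharing (31%) versus less OA/prev sharing (19%)
or_open_access = odds_ratio(0.31, 0.19)

# cancer/human (18%) versus not cancer/human (32%)
or_cancer_human = odds_ratio(0.18, 0.32)
```

An odds ratio above 1 means the first group shares more often than the reference group, and a value below 1 means it shares less often.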

Page 104: Thesis defense, Heather Piwowar, Sharing biomedical research data

Conclusions:

• data sharing rates are increasing, but overall levels are low

Preliminary evidence:
• levels are particularly low in cancer
• levels are highest for those who are publishing OA, have shared before

Page 105: Thesis defense, Heather Piwowar, Sharing biomedical research data

• data and filters were imperfect
• many assumptions
• didn’t capture all types of sharing
• don’t know how generalizable across datatypes
• should be considered hypothesis-generating

http://www.flickr.com/photos/vlastula/300102949/

Page 106: Thesis defense, Heather Piwowar, Sharing biomedical research data

Goal of this dissertation:

Collect useful evidence on patterns of data sharing behaviour through methods that can be applied broadly, repeatably, and cost-effectively.

Page 107: Thesis defense, Heather Piwowar, Sharing biomedical research data

contribution

• Aim 1 publication cited 45 times in Google Scholar, including by several editorials and books
• Aim 2 methods reused in a neuroethics study at UBC
• Aim 3 revealed evidence suggesting areas with high and low data sharing adoption for future study
• data collection was mostly automated using mostly free and open resources
• dataset, collection code, analysis scripts to be made openly available upon publication of thesis

Page 108: Thesis defense, Heather Piwowar, Sharing biomedical research data

http://www.flickr.com/photos/skrb/2427171774/

what’s next?

Page 109: Thesis defense, Heather Piwowar, Sharing biomedical research data

More data analysis

Including:
• Citation analysis of the 11,603 articles
• Analysis with a focus on policy variables
• Causality through structural equation modeling

doi/10.1371/journal.pone.0008469.g002

Page 110: Thesis defense, Heather Piwowar, Sharing biomedical research data

Begin to investigate reuse

http://www.flickr.com/photos/boitabulle/3668162701/

Page 111: Thesis defense, Heather Piwowar, Sharing biomedical research data

who reuses data?

when?

why aren’t they?

which datasets are most likely to be reused?

what can we do about it?

how many datasets could be reused but aren’t?

why?

who doesn’t?

what should we do about it?

Page 112: Thesis defense, Heather Piwowar, Sharing biomedical research data

Postdoctoral Research Associate in the Sharing, Preservation, and Stewardship of Scientific Data

Potential areas of focus include:
• overcoming social and technological barriers to data deposition among scientists
• the roles and interactions of individual scientists, journals/publishers, institutions, and the variety of disciplinary repositories
• ...

Post‐doc of my dreams

http://www.flickr.com/photos/gatewaystreets/3838452287/

Page 113: Thesis defense, Heather Piwowar, Sharing biomedical research data

Enable new science and knowledge creation through universal access to data about life on earth and the environment that sustains it.

Dryad is a repository of data underlying scientific publications, with an initial focus on evolution, ecology, and related fields.

The National Evolutionary Synthesis Center, NSF-funded:

• Duke University
• UNC at Chapel Hill
• North Carolina State University

Page 114: Thesis defense, Heather Piwowar, Sharing biomedical research data

Data sharing is hard.

I share my code and data at http://www.researchremix.org

It is hard.
Some is better than none.
Be the change you want to see.

http://www.flickr.com/photos/myklroventine/892446624/

Page 115: Thesis defense, Heather Piwowar, Sharing biomedical research data

Thanks to

the Dept of Biomedical Informatics at the U of Pittsburgh,

the NLM for funding through training grant 5 T15 LM007059,

those who openly publish their data, source code, papers, photos,

Dr. Wendy Chapman for her support and feedback,

My family.

Page 116: Thesis defense, Heather Piwowar, Sharing biomedical research data
Page 117: Thesis defense, Heather Piwowar, Sharing biomedical research data

http://www.flickr.com/photos/jep42/3017149415/in/set-72157608797298056/