IASSIST 2011 presentation: Tracking Data Reuse Motivations, Methods, and Obstacles

Tracking Data Reuse Motivations, Methods, and Obstacles Heather Piwowar DataONE postdoc with NESCent and Dryad @researchremix IASSIST2011 #iassist


Transcript of IASSIST 2011 presentation: Tracking Data Reuse Motivations, Methods, and Obstacles

Page 1: IASSIST 2011 presentation: Tracking Data Reuse  Motivations, Methods, and Obstacles

Tracking Data Reuse Motivations, Methods, and Obstacles

Heather Piwowar

DataONE postdoc with NESCent and Dryad

@researchremix

IASSIST2011 #iassist

Page 2: IASSIST 2011 presentation: Tracking Data Reuse  Motivations, Methods, and Obstacles

http://www.metmuseum.org/toah/ho/09/euwf/ho_24.45.1.htm

Page 3: IASSIST 2011 presentation: Tracking Data Reuse  Motivations, Methods, and Obstacles

http://www.flickr.com/photos/jsmjr/62443357/

Page 4: IASSIST 2011 presentation: Tracking Data Reuse  Motivations, Methods, and Obstacles

http://www.flickr.com/photos/camilleharrington/3587294608/

Page 5: IASSIST 2011 presentation: Tracking Data Reuse  Motivations, Methods, and Obstacles

http://www.flickr.com/photos/rkuhnau/3318245976/

Page 6: IASSIST 2011 presentation: Tracking Data Reuse  Motivations, Methods, and Obstacles

http://www.flickr.com/photos/conformpdx/1796399674/

Page 7: IASSIST 2011 presentation: Tracking Data Reuse  Motivations, Methods, and Obstacles

http://www.flickr.com/photos/rkuhnau/3317418699/

Page 8: IASSIST 2011 presentation: Tracking Data Reuse  Motivations, Methods, and Obstacles

http://www.flickr.com/photos/zemlinki/261617721/

Page 9: IASSIST 2011 presentation: Tracking Data Reuse  Motivations, Methods, and Obstacles

http://www.flickr.com/photos/tracenmatt/3020786491/

Page 10: IASSIST 2011 presentation: Tracking Data Reuse  Motivations, Methods, and Obstacles

http://www.flickr.com/photos/the-o/2078239333/

Page 14: IASSIST 2011 presentation: Tracking Data Reuse  Motivations, Methods, and Obstacles
Page 15: IASSIST 2011 presentation: Tracking Data Reuse  Motivations, Methods, and Obstacles
Page 16: IASSIST 2011 presentation: Tracking Data Reuse  Motivations, Methods, and Obstacles

In 2009, 116 articles cited ORNL DAAC data.

Finding these articles took 70-80 hours across at least 12 resources, all chosen from a deep understanding of this specific research domain.

Then the full text of all the hits was manually reviewed.

Valerie Enriquez interview with James Kidder
http://openwetware.org/wiki/DataONE:Notebook/Reuse_of_repository_data

Page 17: IASSIST 2011 presentation: Tracking Data Reuse  Motivations, Methods, and Obstacles

publicly archived dataset

dataset has an identifier? (DOI, url, accession #)

IDs are difficult to unambiguously identify in full text unless they have a unique pattern (DOI) or unusual prefix or suffix.

search in full text of all papers

search in reference sections of all papers

sort hits to disambiguate reuse from submission

dataset submission record mentions data collection article publication?

gather papers that cite the data collection paper

sort hits to disambiguate reuse from other citation contexts

dataset submission record has submitter name or dataset title?

with dataset unique ID

with (submitter surname AND repository name), and also (dataset title AND repository name)

with (first author surname AND repository name)

with dataset unique ID

DOI/ID search not supported by ISI Web of Science or Scopus

DOI/ID search works in Google Scholar, but scope is poorly defined, results are messy.

This citation pattern (dataset DOI/ID in references section) is used almost exclusively for dataset reuse. Manual disambiguation not required: can be automated pending API support.

Disambiguation is time consuming: most citations are not in the context of reuse

Requires access to full text of search hits for sorting

This flow still misses attributions embedded in supplementary information, reuses attributed through a query description, etc.

Disambiguation is time consuming

Requires access to full text of search hits for sorting

Only finds citations indexed by citation databases

DOI/ID reference search possible in full-text portals like PubMed Central and HighWire Press, however portal coverage is limited and search is not restricted to references section.

Citation history export is time consuming: automation not supported.

This citation pattern (citation to data creation paper) is very common in some subdisciplines, so probably finds most reuses.

This citation pattern (accession numbers in full text) is very common in some subdisciplines, so probably finds most reuses.

Requires ability to query full text across all literature that may contain reuse

Link to data collection paper often missing from dataset submission record, especially when dataset submission predates article publication.

Does not require access to full-text

How to identify Dataset Reuse in the published literature

Names and titles are messy identifiers

Heather Piwowar, v1.0, CC-BY

This citation pattern is currently rare

This citation pattern is difficult to track with existing tool limitations

with data collection article's journal, volume, page, etc.
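The flowchart distinguishes a dataset DOI found in a paper's references section (which it says is used almost exclusively for reuse) from one found elsewhere in the full text. A minimal Python sketch of that distinction follows. It is an illustration only, not the method used in the talk: the function names, the heading pattern, the DOI regular expression, and the example DOI are all assumptions made for this sketch.

```python
import re

# Assumption: a loose pattern for DOIs ("10.", a registrant code, "/", a suffix).
DOI_PATTERN = re.compile(r'10\.\d{4,9}/[^\s"<>]+')

# Assumption: the references section starts at a line holding only one of these headings.
REFERENCES_HEADING = re.compile(
    r'^[ \t]*(?:references|bibliography|literature cited)[ \t]*$',
    re.IGNORECASE | re.MULTILINE,
)


def split_body_and_references(full_text):
    """Split text at the last references-style heading; return (body, references)."""
    last_match = None
    for last_match in REFERENCES_HEADING.finditer(full_text):
        pass  # keep only the final heading found
    if last_match is None:
        return full_text, ""
    return full_text[:last_match.start()], full_text[last_match.end():]


def locate_dataset_doi(full_text, dataset_doi):
    """Report whether the dataset DOI appears in the body and/or the references."""
    body, references = split_body_and_references(full_text)
    needle = dataset_doi.lower()

    def contains(section):
        # Strip trailing punctuation that the loose DOI pattern may swallow.
        found = (doi.rstrip('.,;)').lower() for doi in DOI_PATTERN.findall(section))
        return needle in found

    return {"in_body": contains(body), "in_references": contains(references)}


# Hypothetical paper text and hypothetical dataset DOI, for illustration only.
paper = (
    "We reanalysed previously archived measurements.\n"
    "References\n"
    "1. Smith J (2010) Data from: an example study. doi:10.5061/dryad.1234.\n"
)
result = locate_dataset_doi(paper, "10.5061/dryad.1234")
```

A hit with `in_references` set would be a candidate reuse under the flowchart's reasoning; a hit only in the body would still need manual sorting.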

Page 18: IASSIST 2011 presentation: Tracking Data Reuse  Motivations, Methods, and Obstacles
Page 19: IASSIST 2011 presentation: Tracking Data Reuse  Motivations, Methods, and Obstacles

10 * 100 = 1000

Page 20: IASSIST 2011 presentation: Tracking Data Reuse  Motivations, Methods, and Obstacles

publication-based datasets

Page 21: IASSIST 2011 presentation: Tracking Data Reuse  Motivations, Methods, and Obstacles

deposited in 2005

Page 22: IASSIST 2011 presentation: Tracking Data Reuse  Motivations, Methods, and Obstacles
Page 23: IASSIST 2011 presentation: Tracking Data Reuse  Motivations, Methods, and Obstacles
Page 24: IASSIST 2011 presentation: Tracking Data Reuse  Motivations, Methods, and Obstacles

1. following citations to the paper that describes the data collection, then filtering.

Page 25: IASSIST 2011 presentation: Tracking Data Reuse  Motivations, Methods, and Obstacles
Page 26: IASSIST 2011 presentation: Tracking Data Reuse  Motivations, Methods, and Obstacles

2. searching for accession numbers, urls, and DOIs in full text
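As a rough illustration of this second approach, the Python sketch below searches a paper's text for a dataset identifier and returns each hit with surrounding context for manual sorting. It is a sketch under stated assumptions, not the procedure used in the study: the function name, the context width, and the example accession number (written in the style of a GEO series accession) are hypothetical. It also shows why an identifier with an unusual prefix is easier to find unambiguously than a bare number.

```python
import re


def find_identifier_mentions(full_text, identifier, context_chars=40):
    """Return text snippets around each standalone occurrence of the identifier."""
    # Lookarounds stop a match inside a longer token, so "GSE1234" does not
    # match within "GSE12345", and "1234" does not match within "GSE1234".
    pattern = re.compile(
        r'(?<![A-Za-z0-9])' + re.escape(identifier) + r'(?![A-Za-z0-9])'
    )
    snippets = []
    for match in pattern.finditer(full_text):
        start = max(0, match.start() - context_chars)
        end = min(len(full_text), match.end() + context_chars)
        snippets.append(full_text[start:end])
    return snippets


# Hypothetical paper text and accession numbers, for illustration only.
paper = (
    "Expression data were downloaded from GEO (GSE1234). "
    "We sequenced 1234 clones; see also GSE12345."
)

prefixed_hits = find_identifier_mentions(paper, "GSE1234")
bare_hits = find_identifier_mentions(paper, "1234")
```

Here the prefixed identifier yields one hit in a data-access sentence, while the bare number yields one hit that has nothing to do with the dataset, which is the kind of ambiguity the hits must be sorted to remove.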

Page 27: IASSIST 2011 presentation: Tracking Data Reuse  Motivations, Methods, and Obstacles
Page 29: IASSIST 2011 presentation: Tracking Data Reuse  Motivations, Methods, and Obstacles

2005 long time ago

biomedicine familiar, also very dominant

search interfaces not well designed for this task

helpdesks are very helpful

Page 30: IASSIST 2011 presentation: Tracking Data Reuse  Motivations, Methods, and Obstacles

stay tuned for results

poster at ASIS&T, SIGUSE

Page 31: IASSIST 2011 presentation: Tracking Data Reuse  Motivations, Methods, and Obstacles

I post my data, code, and statistical scripts: http://researchremix.org

Share yours too!

-> Open Notebook Science

http://www.flickr.com/photos/myklroventine/892446624/

Page 33: IASSIST 2011 presentation: Tracking Data Reuse  Motivations, Methods, and Obstacles

thank you

Todd Vision, Estephanie Sta Maria, Jonathan Carlson

Dryad and DataONE teams

The open science online community and those who release their articles, datasets and photos openly