ORCID and data publication...Data Citation Principles workshop, Harvard 16-17 May 2011 G. A....

Transcript of ORCID and data publication...Data Citation Principles workshop, Harvard 16-17 May 2011 G. A....

  • Data Citation Principles workshop, Harvard 16-17 May 2011

    http://www.orcid.org G. A. Thorisson, University of Leicester / ORCID

    ORCID and data publication: Identifying knowledge contributors to motivate sharing

    Gudmundur A. Thorisson, Tony Brookes bioinformatics group

    Department of Genetics, University of Leicester

    -- Outline --

    • Pretext: my route to workshop

    • Ongoing & planned data publication projects

    • Disease genetics data

    • Planned integration with ORCID for researcher identification

    • Role of ORCID in data publication ecosystem?

    • [shameless] plug for Sept workshop on researcher identity

    This work can be freely copied, redistributed and adapted, as long as proper attribution is given

    Monday, 16 May 2011

    Pretext

    Prof Anthony J Brookes, GEN2PHEN coordinator; Chair, Bioinformatics and Genomics, Department of Genetics, University of Leicester, UK

    The data sharing problem

    Lack of incentives for sharing

    • Effort required to prepare, package and submit datasets to public repositories

    • Time better spent writing papers & grants

    • All sticks (funders, journals) - no carrots

    • Need incentives - treat data as publications and credit creators

    “[...] Many of the issues regarding data availability can be addressed if the principles of “publication” rather than “sharing” are applied. However, online data publication systems also need to develop mechanisms for data citation and indices of data access comparable to those for citation systems in print journals”

    Costello, M. Motivating Online Publication of Data. BioScience (2009) vol. 59 (5) pp. 418-427

    Name ambiguity => attribution challenges

    Are these authors all the same person?

    – G. Thorisson, University of Leicester

    – G. A. Thorisson, University of Leicester

    – G. A. Thorisson, Cold Spring Harbor Laboratory

    Or these?

    – J. Smith, J. Smith, J. Smith, J. Smith, J. Smith [etc.]

    ∼2/3 of the ∼6 million authors in MEDLINE share a last name and first initial with at least one other author, and an ambiguous name refers to ∼8 persons on average.

    Torvik and Smalheiser. Author name disambiguation in MEDLINE. ACM Transactions on Knowledge Discovery from Data (2009) vol. 3 (3)

    How about these?

    ORCID - tackling the contributor identity problem

    ORCID ID: B-1242-2010

    – G. Thorisson, Univ. Leicester

    – G. A. Thorisson, Univ. Leicester

    – G. A. Thorisson, Cold Spring Harbor Lab.

    ORCID ID: G-1442-2009

    – J. Smith, Univ. North Pole

    ORCID ID: D-2400-2010

    – J. Smith, Luthor Corporation
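
The effect of attaching an identifier to each byline can be illustrated with a minimal sketch. It groups the bylines from this slide once by identifier and once by surname plus first initial. The ID values are the placeholders shown on the slide, and the function names are made up for this illustration.

```python
from collections import defaultdict

# Bylines as they might appear on publications, paired with the
# placeholder identifiers from the slide (not real ORCID IDs).
bylines = [
    ("G. Thorisson, Univ. Leicester", "B-1242-2010"),
    ("G. A. Thorisson, Univ. Leicester", "B-1242-2010"),
    ("G. A. Thorisson, Cold Spring Harbor Lab.", "B-1242-2010"),
    ("J. Smith, Univ. North Pole", "G-1442-2009"),
    ("J. Smith, Luthor Corporation", "D-2400-2010"),
]


def group_by_identifier(records):
    """Map each identifier to the byline strings that carry it."""
    groups = defaultdict(list)
    for byline, orcid_id in records:
        groups[orcid_id].append(byline)
    return dict(groups)


def group_by_name(records):
    """Naive alternative: group on surname plus first initial only."""
    groups = defaultdict(list)
    for byline, _ in records:
        name = byline.split(",")[0]      # e.g. "G. A. Thorisson"
        parts = name.split()
        key = (parts[-1], parts[0][0])   # e.g. ("Thorisson", "G")
        groups[key].append(byline)
    return dict(groups)


by_id = group_by_identifier(bylines)
by_name = group_by_name(bylines)
print(len(by_id), "people by identifier;", len(by_name), "by name")
```

Grouping by name merges the two J. Smith bylines into one person, while grouping by identifier keeps them apart and still collects all three Thorisson variants under one ID.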

    Projects

    Submitting mutations from diagnostic labs using “Café RouGE enabled” so[...]

    1. Diagnostic laboratories

    2. Central ‘clearinghouse’

    3. End-users (e.g. LSDB curators)

    Publish data / Retrieve Atom feeds

    Cafe Variome - facilitating exchange of genetic data

    • dbSNP (coding), UniProt, PhenCode

    • Submission from diag. lab

    • Metadata describing variation data published elsewhere

    • Already stable IDs so no DOI assigned

    • DOI assigned to incoming data upload

    • Data shared with diverse 3rd parties and data usage/citation tracked via DOI

    • Attribution given to data submitters via ORCID unique identifier

    Publication credit for Cafe Variome deposits

    G. Thorisson, Univ. [email protected]

    ORCID ID: A-883-2010

    CV user has linked his user account with his ORCID profile

    4x variants in BRCA2 gene in patient X

    G. A. Thorisson (A-883-2010). 4x variants in BRCA2 gene. Published online via Cafe Variome. 21 January (2011) doi:10.1255/caferouge.BRCA2-2352354

    => http://api.caferouge.org/atomserver/v1/caferouge/mutations/2352354
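
As a rough sketch of how such a citation line could be assembled from a deposit record, the following reproduces the example citation on this slide. The `Deposit` class and its field names are hypothetical and not part of Cafe Variome. The ORCID ID and DOI are the placeholder values from the slide.

```python
from dataclasses import dataclass


@dataclass
class Deposit:
    """Hypothetical record for one data deposit (field names are assumptions)."""
    author: str
    orcid_id: str
    title: str
    publisher: str
    day_month: str
    year: int
    doi: str


def format_citation(d: Deposit) -> str:
    """Build a citation string in the style shown on the slide."""
    return (
        f"{d.author} ({d.orcid_id}). {d.title}. "
        f"Published online via {d.publisher}. "
        f"{d.day_month} ({d.year}) doi:{d.doi}"
    )


deposit = Deposit(
    author="G. A. Thorisson",
    orcid_id="A-883-2010",
    title="4x variants in BRCA2 gene",
    publisher="Cafe Variome",
    day_month="21 January",
    year=2011,
    doi="10.1255/caferouge.BRCA2-2352354",
)
citation = format_citation(deposit)
print(citation)
```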

    GWAS nanopublications

    • Foray into semantic publishing

    – GWAS Central as ‘nano-publisher’

    – variant → disease assertion as nanopub

    rs19243 → Type II diabetes + condition & provenance

    • Provenance part to include:

    – Contributor IDs

    – Contributor roles:

    • Author(s) on original GWAS paper

    • Curator

    • Registrant

    • Citability: register DOI for nanopub?
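
One way to picture the pieces listed on this slide is as a small data structure: an assertion, a condition, and a provenance part holding contributor IDs with roles. This is only a sketch in plain Python, not an actual nanopublication serialization. The class names, field names, condition text, and the assignment of roles to the placeholder IDs are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Contributor:
    identifier: str   # e.g. a researcher ID (placeholder values below)
    role: str         # "author", "curator" or "registrant"


@dataclass
class Nanopublication:
    variant: str                    # assertion subject
    disease: str                    # assertion object
    condition: str                  # context in which the assertion holds
    contributors: List[Contributor] = field(default_factory=list)
    doi: Optional[str] = None       # stays None unless a DOI is registered

    def contributors_with_role(self, role: str) -> List[str]:
        """Return the identifiers of all contributors holding a given role."""
        return [c.identifier for c in self.contributors if c.role == role]


nanopub = Nanopublication(
    variant="rs19243",
    disease="Type II diabetes",
    condition="(hypothetical) study population and significance threshold",
    contributors=[
        Contributor("G-1442-2009", "author"),
        Contributor("A-883-2010", "curator"),
    ],
)
print(nanopub.contributors_with_role("curator"))
```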

    BRIF - measuring bioresource use and impact

    • Biobanks: collections of biomaterials + associated metadata

    – Identification: citing, acknowledging, tracking use of

    – Evaluation: assess impact

    – Attribution: crediting PIs, repository managers, technicians [?]

    • Digital resources, incl. biomedical databases

    – E.g. locus-specific databases (LSDBs), variation archives (e.g. Cafe Variome)

    – How to acknowledge researchers who:

    • Maintain vital community resource (e.g. http://www.wormbase.org )

    • Undertake value-adding curation

    – Micro-attribution: Giardine, B. et al. Systematic documentation and analysis of human genetic variation in hemoglobinopathies using the microattribution approach. Nature Genetics advance on, (2011). http://dx.doi.org/10.1038/ng.785

    • BRIF online group: http://bit.ly/brif-group

    Identifying & citing databases

    • Bio-databases are often cited as a collection

    – E.g. “In our analysis, we used release X of SwissProt”; “Our results were compared with the COL3A1 database as of Jan11”

    – Example: OI variant database:

    Dalgleish, R. (1998) Nucleic Acids Research 26(1), 253 http://dx.doi.org/10.1093/nar/26.1.253

    Dalgleish, R. (1997) Nucleic Acids Research 25(1), 181 http://dx.doi.org/10.1093/nar/25.1.181

    Osteogenesis Imperfecta Variant Database - https://oi.gene.le.ac.uk

    • Are DOIs appropriate? - db’s are not ‘unchanging entities’

    • Minimal information about a database - include DOI name?

    – What does the DOI point to? URL for database site vs. URL for db description

    Attributing contributions to bio-resources

    • Database curation

    – Management: R. Dalgleish A-3523-534-144 10.5335/lsdb.oi.325dff

    – Temporary curator appointment: J. Smith G-1442-2009 10.5335/lsdb.oi.325dff

    – Microattribution: fine-grained tracking of curator activity (insert/update/delete)

    • Biobanking activities

    – Principal Investigator responsible for project (aka ‘corresponding author’)

    – Laboratory personnel?

    – Clinical collaborators?
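
Microattribution as described above amounts to keeping an activity log keyed by curator identifier and tallying it. The following is a minimal sketch under that assumption. The log entries and record names are invented, and the curator IDs are the placeholders from the slide.

```python
from collections import Counter

# Hypothetical activity log: (curator identifier, action, record identifier).
activity_log = [
    ("A-3523-534-144", "insert", "variant-001"),
    ("G-1442-2009", "update", "variant-001"),
    ("G-1442-2009", "insert", "variant-002"),
    ("A-3523-534-144", "delete", "variant-003"),
    ("G-1442-2009", "update", "variant-002"),
]


def tally(log):
    """Count actions per (curator, action) pair."""
    counts = Counter()
    for curator, action, _record in log:
        counts[(curator, action)] += 1
    return counts


counts = tally(activity_log)
for (curator, action), n in sorted(counts.items()):
    print(curator, action, n)
```

A tally like this could feed a per-curator credit statement for a database release, although how such counts should be weighted is left open here.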

    Characterizing citations and contributions

    • What is the nature of the resource citation?

    – acknowledgement / earlier or related work

    – reused data or materials

    – extended methodology

    – ‘..this study is flawed and complete rubbish!!’

    • What is the nature of my contribution to the resource?

    – Paper: authored / undertook analysis / conceived of study / designed experiment

    – Dataset: created / submitted / managed

    – Database: curator / manager / PI responsible

    – Biobank: sample collector / day-to-day manager / ??

    – Temporal aspect:

    • E.g. Mummi contributed in a curator role for SwissProt Jun 2004 to Oct 2009
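
The temporal aspect can be sketched as a contribution record with a role and a start and end date. The class and field names below are hypothetical. The slide gives only months, so the exact days used here (1 June 2004 and 31 October 2009) are assumptions.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Contribution:
    contributor: str
    resource: str
    role: str
    start: date
    end: Optional[date] = None   # None means the role is still held

    def active_on(self, day: date) -> bool:
        """True if the contributor held this role on the given day."""
        return self.start <= day and (self.end is None or day <= self.end)


# Day-of-month values are assumed; the slide states only "Jun 2004 to Oct 2009".
mummi = Contribution("Mummi", "SwissProt", "curator",
                     date(2004, 6, 1), date(2009, 10, 31))

print(mummi.active_on(date(2007, 1, 1)))   # inside the interval
print(mummi.active_on(date(2010, 1, 1)))   # after the role ended
```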

    Semantic frameworks for scientific publishing

    my study Thorisson et al. 2008 doi:10.433/888544jamaX

    my study Biobank X doi:10.424/35xxjapan.5 ??

    G. Thorisson (A-523-44-3423) Biobank X doi:10.424/35xxjapan??

    Shotton, D., 2010. CiTO, the Citation Typing Ontology. Journal of Biomedical Semantics, 1(Suppl 1). doi:10.1186/2041-1480-1-S1-S6
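
A typed citation can be written as a subject-predicate-object triple whose predicate comes from CiTO. The sketch below uses plain tuples rather than an RDF library. The property names (`citesAsRelated`, `usesDataFrom`, `extends`, `disagreesWith`) are CiTO terms, but the mapping from informal citation categories to those terms is an assumption made for this illustration, and the slide does not say which relation each arrow carried. The DOIs are the placeholders shown on the slide.

```python
CITO = "http://purl.org/spar/cito/"

# Assumed mapping from informal citation categories to CiTO properties.
CITATION_TYPES = {
    "related work": "citesAsRelated",
    "reused data": "usesDataFrom",
    "extended methodology": "extends",
    "disagreement": "disagreesWith",
}


def typed_citation(citing, nature, cited):
    """Return a (subject, predicate, object) triple for one typed citation."""
    return (citing, CITO + CITATION_TYPES[nature], cited)


triples = [
    typed_citation("my-study", "related work", "doi:10.433/888544jamaX"),
    typed_citation("my-study", "reused data", "doi:10.424/35xxjapan.5"),
]
for s, p, o in triples:
    print(s, p, o)
```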

    Role of ORCID?

    • Who contributed to dataset 10.4259/psycho.5gtpq-thorisson?

    • All data publications by ORCID A-883-2010 ?

    • Which papers have cited the works of ORCID A-883-2010 ?

    • Total no. citations to datasets by A-883-2010 in the last 2 years?

    • Total no. downloads of datasets by A-883-2010?

    • Which database projects has A-883-2010 contributed to?

    • [...]

    G. Thorisson, Univ. [email protected]

    ORCID ID: A-883-2010

    Why track all this stuff? Enable aggregation of contributions by unique researcher ID
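
Several of the questions on this slide reduce to simple lookups once every record carries contributor identifiers. The following sketch assumes an in-memory list of records. The record layout, counts, and function names are invented for illustration, and the identifiers and DOIs are placeholders taken from the slides. The "last 2 years" filter is not modelled.

```python
# Hypothetical records as a repository might expose them; all numbers are invented.
records = [
    {"doi": "10.4259/psycho.5gtpq-thorisson", "kind": "dataset",
     "contributors": ["A-883-2010", "G-1442-2009"],
     "citations": 3, "downloads": 120},
    {"doi": "10.1255/caferouge.BRCA2-2352354", "kind": "dataset",
     "contributors": ["A-883-2010"],
     "citations": 1, "downloads": 40},
    {"doi": "10.5335/lsdb.oi.325dff", "kind": "database",
     "contributors": ["A-3523-534-144", "G-1442-2009"],
     "citations": 7, "downloads": 0},
]


def contributors_to(doi):
    """Who contributed to the item with this DOI?"""
    for rec in records:
        if rec["doi"] == doi:
            return rec["contributors"]
    return []


def works_by(orcid_id, kind=None):
    """All items (optionally of one kind) that list this identifier."""
    return [r["doi"] for r in records
            if orcid_id in r["contributors"]
            and (kind is None or r["kind"] == kind)]


def total(orcid_id, metric, kind="dataset"):
    """Sum a numeric metric over this identifier's items of one kind."""
    return sum(r[metric] for r in records
               if orcid_id in r["contributors"] and r["kind"] == kind)


print(contributors_to("10.4259/psycho.5gtpq-thorisson"))
print(works_by("A-883-2010", kind="dataset"))
print(total("A-883-2010", "citations"), total("A-883-2010", "downloads"))
```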

    Current ORCID status & timeline

    • Alpha prototype

    – Running on a sandbox website for limited testing

    • partial functionality - based on ResearcherID software

    • Early adopters / collaborators

    • Looking to collaborate with projects

    – Gather use cases => feed requirements for ORCID core system

    – WHERE/HOW might ORCID be used to identify contributors?

    – Joint fund-seeking to do pilot implementations

    • Timeline for live beta system: early 2012

    Example: SageCite?

    • i) dataset published in SageCommons

    – assigned DOI via DataCite

    – attribution link deposited in ORCID

    • ii) derivative datasets published in SageCommons

    – assigned DOI => DataCite

    – attribution link deposited in ORCID

    • iii) analysis workflow published via myExperiment

    – attribution => ORCID (creator/submitter & others who contributed)

    – DOI (or not - not essential?)
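
The three steps above can be mimicked with in-memory stand-ins to show what gets recorded at each stage. Nothing here calls DataCite, ORCID, SageCommons, or myExperiment. The `publish` function, the `10.9999/example` DOI prefix, the role labels, and the list standing in for deposited attribution links are all hypothetical.

```python
from itertools import count

_doi_counter = count(1)
attribution_links = []   # stand-in for attribution links deposited with ORCID


def publish(title, contributors, assign_doi=True, derived_from=None):
    """Record one published item and an attribution link per contributor."""
    # Made-up DOI prefix; a real DOI would come from a registration agency.
    doi = f"10.9999/example.{next(_doi_counter)}" if assign_doi else None
    item = {"title": title, "doi": doi, "derived_from": derived_from}
    for orcid_id, role in contributors:
        attribution_links.append((orcid_id, role, title))
    return item


# i) original dataset, ii) derivative dataset, iii) workflow without a DOI
original = publish("dataset", [("A-883-2010", "creator")])
derived = publish("derivative dataset", [("G-1442-2009", "creator")],
                  derived_from=original["doi"])
workflow = publish("analysis workflow",
                   [("A-883-2010", "submitter"), ("G-1442-2009", "contributor")],
                   assign_doi=False)

print(original["doi"], derived["derived_from"], workflow["doi"])
print(len(attribution_links), "attribution links recorded")
```

The workflow still gets attribution links even though it has no DOI, which mirrors the slide's suggestion that a DOI may not be essential for that step.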

    Acknowledgements

    GEN2PHEN Consortium http://www.gen2phen.org/about-gen2phen/partners

    Prof Anthony J. Brookes Bioinformatics Group

    This work has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement number 200754 - the GEN2PHEN project.

    Contact me! Gudmundur ‘Mummi’ Thorisson

    http://friendfeed.com/mummi

    http://www.linkedin.com/in/mummi

    http://www.twitter.com/gthorisson
