ODIN – ORCID and DATACITE Interoperability Network

Presentation to S&C Open House, January 2013. John Kaye – British Library. Funded by The European Union Seventh Framework Programme. www.slideshare.net/johnkayebl www.odin-project.eu

Page 1: ODIN –  ORCID and DATACITE Interoperability Network

ODIN – ORCID and DATACITE Interoperability Network

Presentation to S&C Open House

January 2013

John Kaye – British Library

Funded by The European Union Seventh Framework Programme

www.slideshare.net/johnkayebl www.odin-project.eu

Page 2: ODIN –  ORCID and DATACITE Interoperability Network

Overview

• People and Resource Identifiers
• ODIN Overview
• Project Structure
• Humanities and Social Science Proof of Concept
• High Energy Physics Proof of Concept
• Results
• Commonalities
• Risks

Page 3: ODIN –  ORCID and DATACITE Interoperability Network

People Identifiers

• Uniquely identify people
• Open Researcher and Contributor ID (ORCID)
• International Standard Name Identifier (ISNI)
• JISC Names
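
An ORCID iD is 16 characters long, and its last character is a check character computed with ISO 7064 MOD 11-2. The following is a minimal sketch of that check, not ORCID's own code; the function names are illustrative, and the two iDs in the usage note are the sample values commonly used in ORCID's documentation.

```python
def orcid_check_digit(base_digits: str) -> str:
    """Return the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the length, digits and check character of an ORCID iD."""
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return orcid_check_digit(compact[:15]) == compact[15]
```

For example, `is_valid_orcid("0000-0002-1825-0097")` and `is_valid_orcid("0000-0002-1694-233X")` both hold, while changing the last character of either makes the check fail. A valid checksum only shows that the string is well formed, not that the iD is registered.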

Page 4: ODIN –  ORCID and DATACITE Interoperability Network

Identifiers

• Uniquely identify Research Objects
• Digital Object Identifiers (DOIs)
• DataCite DOIs
• CrossRef DOIs

Page 5: ODIN –  ORCID and DATACITE Interoperability Network

Identifiers

• Uniquely identify Research Objects
• Archival Resource Keys (ARKs)
• International Standard Book Numbers (ISBNs)
• Uniform Resource Locators (URLs)
• Institutional and other IDs
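
The identifier families listed on these slides can often be told apart by syntax alone. The sketch below is one hedged way to do that: the regular expressions are common approximations rather than official grammars, the function names are illustrative, and anything unrecognised falls into "other" (the slide's institutional and other IDs).

```python
import re

# Approximate patterns; real-world DOIs and ARKs can be more varied.
DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")
ARK_RE = re.compile(r"^ark:/?\d{5,}/\S+$", re.IGNORECASE)
DOI_PREFIXES = ("doi:", "https://doi.org/", "http://dx.doi.org/")


def is_isbn13(text: str) -> bool:
    """Validate an ISBN-13 using its alternating 1/3 weighted checksum."""
    digits = text.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not digits.isdigit():
        return False
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits))
    return total % 10 == 0


def classify_identifier(raw: str) -> str:
    """Return 'doi', 'ark', 'isbn', 'url' or 'other' for a raw identifier."""
    value = raw.strip()
    lowered = value.lower()
    for prefix in DOI_PREFIXES:
        if lowered.startswith(prefix):
            value = value[len(prefix):]
            break
    if DOI_RE.match(value):
        return "doi"
    if ARK_RE.match(value):
        return "ark"
    if is_isbn13(value):
        return "isbn"
    if lowered.startswith(("http://", "https://")):
        return "url"
    return "other"
```

Syntax cannot distinguish a DataCite DOI from a CrossRef DOI; that depends on which registration agency minted it, so a real workflow would need a lookup step in addition to this kind of classification.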

Page 6: ODIN –  ORCID and DATACITE Interoperability Network

ODIN Partners

Page 7: ODIN –  ORCID and DATACITE Interoperability Network

ODIN Overview

• Two-year project funded under EC FP7 Coordination and Action Programme

• Build on ORCID and DataCite initiatives to uniquely identify and connect scientists and datasets

• Project is not limited to these identifiers

• ‘Datasets’ has a broad definition (anything but journals), so it can include grey literature, presentations, code etc.

• Connect information across multiple services and infrastructures for scholarly communications

• www.odin-project.eu

Page 8: ODIN –  ORCID and DATACITE Interoperability Network

Overview

• Infrastructure already exists for researchers to build up an open portfolio of research objects

Page 9: ODIN –  ORCID and DATACITE Interoperability Network

Overview

• Register an ORCID iD at www.orcid.org and link published papers using ORCID’s tools

Page 10: ODIN –  ORCID and DATACITE Interoperability Network

Overview

• Non-published outputs (working papers, datasets) can be deposited in figshare (http://figshare.com/), given a DataCite DOI, then linked back and added to an ORCID profile

• ODIN wants to expand on this principle and engage with data centres and institutional repositories to allow easier, more open discovery of non-traditional research outputs.
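
The deposit-and-link loop described on this slide comes down to one record that carries both a DOI for the output and an ORCID iD for each creator. The sketch below illustrates that idea only: the class and field names are assumptions made for this example, not the figshare, DataCite or ORCID schemas, and the DOIs are made up.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Creator:
    name: str
    orcid: Optional[str] = None  # None when the person has no ORCID iD yet


@dataclass
class ResearchOutput:
    doi: str    # hypothetical DOI assigned at deposit
    title: str
    kind: str   # e.g. "dataset", "working paper"
    creators: List[Creator] = field(default_factory=list)


def build_profiles(outputs):
    """Map each ORCID iD to the DOIs of the outputs it is credited on."""
    profiles = defaultdict(list)
    for output in outputs:
        for creator in output.creators:
            if creator.orcid:
                profiles[creator.orcid].append(output.doi)
    return dict(profiles)


# Illustrative data only.
alice = Creator("A. Researcher", "0000-0002-1825-0097")
bob = Creator("B. Contributor")  # no ORCID iD, so no profile entry
outputs = [
    ResearchOutput("10.1234/example.1", "Survey extract", "dataset", [alice, bob]),
    ResearchOutput("10.1234/example.2", "Working paper", "working paper", [alice]),
]
profiles = build_profiles(outputs)
```

Here `profiles` maps the one ORCID iD to both DOIs, while the contributor without an iD is invisible, which is the gap the slide's engagement with data centres and repositories is meant to close.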

Page 11: ODIN –  ORCID and DATACITE Interoperability Network

Overview

• View the impact of your work using traditional citation metrics and social citations

Page 12: ODIN –  ORCID and DATACITE Interoperability Network

Project Structure

Page 13: ODIN –  ORCID and DATACITE Interoperability Network

Proofs of Concept Objectives

• Develop two disciplinary proofs of the concept of open and interoperable persistent identifiers of data and contributors in scholarly communication, in a variety of current and future scenarios.

Specific goals:

• Prove the ability to navigate across data and contributors in the Humanities and Social Sciences (HSS) where data and contributors are separated in space and time, with curators bridging the gap;

• Prove the ability to navigate across data and contributors in High-Energy Physics (HEP), where multiple versions of articles in preliminary and final form, with several thousand contributors, need to be associated with a corresponding dataset hosted in different systems

• Identify, by a critical analysis of the proofs of concept, common issues in open and interoperable permanent identifiers of data and contributors, by establishing a common cross-disciplinary view on the relevant workflows

Page 14: ODIN –  ORCID and DATACITE Interoperability Network

Deliverables and Time frames

• D3.1 HSS Proof of Concept – Aug 2013
• D3.2 HEP Proof of Concept – Aug 2013
• D3.3 Commonalities – Sept 2014

• Milestone: Commonalities Identified Jan 2014

• D3.1 and D3.2 Validated by the community at 1st year event (Hackathon)

Page 15: ODIN –  ORCID and DATACITE Interoperability Network

Humanities and Social Sciences (HSS)

Page 16: ODIN –  ORCID and DATACITE Interoperability Network

HSS: Birth Cohort Studies

• Why Birth Cohort Studies?
  • Investment
  • Established/Long history
  • Tradition of data curation
  • High Re-use
  • Derived Data
  • Multi-disciplinary
  • BL Involvement in CLOSER (Cohort and Longitudinal Studies Enhancement Resource)

Page 17: ODIN –  ORCID and DATACITE Interoperability Network

HSS: Current Status

• HSS British Birth Cohort characteristics:
  • High re-use of data
  • Data analysed across cohorts (e.g. 1958 questions alongside 2000)
  • Derived data often kept outside original repository
  • Lots of ‘grey literature’ (working papers, pre-prints etc.)
  • Different publication spaces (publishers, institutional repositories)

• Challenges:
  • Uniquely associate articles/datasets with authors/contributors from a range of data sources
  • Authors/creators/researchers go back a long way (could be as early as 1946)
  • How to deal with non-digital research outputs
  • How to deal with cross-cohort analysis (multiple datasets, derived datasets)
  • Associate datasets with articles and track impact of data re-use
  • Survey questions often more important to identify than actual survey (survey contains thousands of variables)
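
Tracking the impact of data re-use across derived and cross-cohort datasets can be pictured as walking "derived from" links back to the source studies. The sketch below is one hypothetical way to do this and is not the project's model; all dataset and article names are invented for illustration.

```python
def source_datasets(dataset, derived_from):
    """Return the dataset itself plus everything it is ultimately derived from."""
    seen = set()
    stack = [dataset]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(derived_from.get(current, ()))
    return seen


def reuse_counts(citations, derived_from):
    """Count, per dataset, the outputs citing it directly or via derived data."""
    outputs_by_dataset = {}
    for output, cited in citations.items():
        for dataset in cited:
            for source in source_datasets(dataset, derived_from):
                outputs_by_dataset.setdefault(source, set()).add(output)
    return {ds: len(outs) for ds, outs in outputs_by_dataset.items()}


# Hypothetical cross-cohort example: one derived dataset built from two cohorts.
derived_from = {"derived-1958-1970": ["cohort-1958", "cohort-1970"]}
citations = {
    "article-A": ["derived-1958-1970"],
    "article-B": ["cohort-1958"],
    "working-paper-C": ["derived-1958-1970", "cohort-1970"],
}
counts = reuse_counts(citations, derived_from)
```

With this data the 1958 cohort is credited with three outputs even though only one cites it directly, which is the kind of indirect re-use that is lost when derived data sits outside the original repository without a link back.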

Page 18: ODIN –  ORCID and DATACITE Interoperability Network

HSS: Objectives

• Identify workflows and develop conceptual model

• Provide technical solutions for identifying and connecting data creators, authors, researchers, contributors and research objects related to British Birth Cohort Studies

• Identify, use and link existing identifiers and data sources where possible

• Identify deficiencies in identification or relationship data and develop or propose solutions

• Work with the research community to develop user case studies and data collection and enhancement

• Create an open and interoperable network linking people and research objects to allow Impact Tracking and Resource Discovery

Page 19: ODIN –  ORCID and DATACITE Interoperability Network

HSS Proof of Concept

[Diagram: HSS proof-of-concept model. Labels: 1958; 1970; External Data (Census, Health etc.); Data Creator, Researcher, Author; Birth Cohort Study dataset; Non-Birth Cohort Study dataset; Derived dataset; Citation; Data Creator; Derived Data Creator; External Data input; Grey Literature; Published article; Author: Grey lit; Author: Article]

Page 23: ODIN –  ORCID and DATACITE Interoperability Network

HSS: Identifiers and Data Sources

Researchers etc.: ORCID, ISNI, JISC Names, SCOPUS, Surveys, Citation DBs, UK Data Service, Catalogue metadata

Source Datasets: DataCite DOIs, ESDS

Derived Data: DataCite DOIs, Institutional IDs, No IDs, ESDS, Surveys, Institutional Repositories

‘External’ Data: DataCite DOIs, Institutional IDs, No IDs, ESDS, Other data centres, NHS, Institutional etc.

Grey Literature: DataCite DOIs, Institutional IDs, No IDs, Surveys, ESDS, Institutions

Published Literature: CrossRef DOIs, Institutional IDs, No IDs, SCOPUS, Surveys, ESDS, Institutions, Citation DBs, Catalogue metadata

Page 24: ODIN –  ORCID and DATACITE Interoperability Network

HSS: Phase 1 (First Year)

• Preliminary conceptual models for connecting data creators, curators, contributors and data sets will be described in the proof of concept report, disseminated at the first year conference and be the object of a first synoptic evaluation by WP6 (Internationalisation), together with all other project outputs.

• During M8, one selected expert from the Australian National Data Service (ANDS) will stay for 3 weeks at the British Library to provide input to the proof of concept, based on their experience.

Page 25: ODIN –  ORCID and DATACITE Interoperability Network

HSS: Immediate Actions

• Technical Lead/Developer in post
• Identify Application Programming Interfaces (APIs) and Data Sources
• Identify work already completed
• Investigate name matches between journals and datasets
• Investigate how to record non-digital research outputs in the model (older reports, books)
• Investigate how to deal with Grey literature and non-standard research objects (images, audio recordings etc.)
• Identify suitable Researcher IDs for different types of researcher/uses
  • 1946 author may be in ISNI
  • 2012 author in JISC Names or ORCID
• Create workflows
• Build conceptual model
• Use cases and visualisations
• Investigate domain specific data citation granularity (Dataset level to question level)
• Investigate additional Impact Data? Grants? ESRC/MRC Institutional IDs?

Page 26: ODIN –  ORCID and DATACITE Interoperability Network

HSS: Phase 2 (Second Year)

• Concrete workflows will be designed and developed
• Particular care will be paid to unique aspects of the data, such as their value in terms of influencing decisions or government policy or practice.
• The implementation of such workflows will be done in strong collaboration with the relevant stakeholders in the community, such as UK Data Service and DataCite members like Gesis and ICPSR
• Final results will be demonstrated in the final event.
• During M20, one selected expert from ANDS will stay for 3 weeks at the British Library to provide input to the commonalities report

Page 27: ODIN –  ORCID and DATACITE Interoperability Network

High Energy Physics

Page 28: ODIN –  ORCID and DATACITE Interoperability Network

Current status (I)

HEP (High-Energy Physics) field specificities:
• Multiversioning: from preprint versions until final publications
• Hyperauthorship: hundreds/thousands of scientists signing the same article
• Data levels of abstraction (CERN, Inspire, HEPData)
• Different publication spaces (arXiv, Inspire, publishers)

Challenges:
• Author identification, improvement of the disambiguation process done in place
• Uniquely associate articles/datasets with authors/contributors
• Version management during the long publication process
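
The version-management challenge can be illustrated by grouping preprint versions and the final publication under one work. The sketch below assumes new-style arXiv identifiers with a `vN` version suffix and is not how Inspire or arXiv handle versions internally; the identifiers and the DOI are hypothetical.

```python
import re
from collections import defaultdict

# New-style arXiv id (YYMM.NNNN or YYMM.NNNNN) with an optional version suffix.
ARXIV_RE = re.compile(r"^(?P<base>\d{4}\.\d{4,5})(?:v(?P<version>\d+))?$")


def group_versions(records):
    """Group (arxiv_id, doi_or_None) records by version-free arXiv id."""
    works = defaultdict(lambda: {"versions": [], "published_doi": None})
    for arxiv_id, doi in records:
        match = ARXIV_RE.match(arxiv_id)
        if not match:
            continue  # unrecognised identifier; ignored in this sketch
        work = works[match.group("base")]
        if match.group("version"):
            work["versions"].append(int(match.group("version")))
        if doi:
            work["published_doi"] = doi
    for work in works.values():
        work["versions"].sort()
    return dict(works)


# Hypothetical records: three preprint versions, the last one published.
records = [
    ("1301.0001v2", None),
    ("1301.0001v1", None),
    ("1301.0001v3", "10.1234/hypothetical.2013.001"),
    ("1301.0002v1", None),
]
works = group_versions(records)
```

Once versions share one key, a dataset or an author list only needs to be attached to the work once rather than to each preprint and the published article separately.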

Page 29: ODIN –  ORCID and DATACITE Interoperability Network

Current status (II)

Current Inspire interface

Page 30: ODIN –  ORCID and DATACITE Interoperability Network

Current status (III)

Current Inspire interface

Disambiguation process among thousands of authors:
• Names and affiliations
• Different ways to write the same information
• Clustering algorithm
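
To make "different ways to write the same information" concrete, the sketch below groups author signatures by a normalised name key (surname plus first initial, accents stripped). It is a deliberately naive stand-in and not Inspire's clustering algorithm; the names and affiliations are invented.

```python
import unicodedata
from collections import defaultdict


def name_key(name: str) -> str:
    """Reduce 'Surname, Given' or 'Given Surname' to 'surname g'."""
    ascii_name = (
        unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    )
    if "," in ascii_name:
        surname, given = (part.strip() for part in ascii_name.split(",", 1))
    else:
        parts = ascii_name.split()
        if not parts:
            return ""
        surname, given = parts[-1], " ".join(parts[:-1])
    return f"{surname} {given[:1]}".lower().strip()


def cluster_signatures(signatures):
    """Group (name, affiliation) signatures that share a name key."""
    clusters = defaultdict(list)
    for name, affiliation in signatures:
        clusters[name_key(name)].append((name, affiliation))
    return dict(clusters)


# Invented signatures showing spelling and formatting variants.
signatures = [
    ("Smith, John", "CERN"),
    ("J. Smith", "CERN"),
    ("SMITH, J.", "Cern, Geneva"),
    ("Müller, Jörg", "DESY"),
    ("Jorg Muller", "DESY"),
]
clusters = cluster_signatures(signatures)
```

A key this coarse will also merge different people who share a surname and initial, which is why affiliations and other evidence are needed on top of name matching, and why a persistent identifier per author is attractive when thousands of scientists sign the same article.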

Page 31: ODIN –  ORCID and DATACITE Interoperability Network

Phase 2: Results and Commonalities

• Results to feed into Hackathon event at CERN in October 2013 and the strategy

• Assessment and validation by research community and international partners

• BL and CERN come together to find commonalities in the disciplines to inform WP4 (interoperability)

• This process will incorporate knowledge from the results of the Hackathon as well as the conceptual model for global interoperability of data and contributor identifiers developed in WP4

• This task will result in a more comprehensive view on disciplinary and interdisciplinary needs, and will produce information, internally transferred to the other work packages

• This information will help shape the discussion of scholarly communication beyond the project’s lifetime.

Page 32: ODIN –  ORCID and DATACITE Interoperability Network

Questions?

John Kaye – Lead Curator Digital Social Sciences

The British Library

96 Euston Road

London NW1 2DB

[email protected]

Twitter: @johnkayebl

Telephone: 020 7412 7450

Project Website http://odin-project.eu/

Blog: http://britishlibrary.typepad.co.uk/socialscience/