FAIRsharing presentation at the Japan Science and Technology Agency
-
Upload
peter-mcquilton -
Category
Science
-
view
95 -
download
2
Transcript of FAIRsharing presentation at the Japan Science and Technology Agency
Interlinking standards, repositories and policies
Peter McQuilton, PhD
@fairsharing_org
NBDC Seminar, Tokyo, Japan 4th Dec., 2017
Credit to: ttps://projects.ac/blog/five-top-reasons-to-protect-your-data-and-practise-safe-science/ 2014
But we don’t handle data well
A set of principles, for those wishing to enhance
the value of their
data holdings
Designed and endorsed by a diverse set of stakeholders - representing academia, industry, funding
agencies, and scholarly publishers
FAIRFindable
Accessible
Interoperable
Reusable
Visible, citable
Trackable
Community standards
Reproducible
These put emphasis on enhancing the
ability of machines to automatically find
and use the data, in addition to supporting
its reuse by individuals
• Not always well cited, stored
o Software, code, workflows are hard to find/access
• Poorly described for third party reuse
o Different levels of detail and annotation
• Curation activities are perceived as time-consuming
o Collection and harmonization of detailed methods and
experimental steps is rushed at the publication stage
Not FAIR – low findability and badly documented
• Available in a public repository
• Findable through some sort of search facility
• Retrievable in a standard format
• Self-described so that third parties can make sense of it
• Intended to outlive the experiment for which they were collected
To do better science, more efficiently, we need data that are…
My database is going offline, where should I
put the data, and in what format?
Before accepting my paper, this journal
wants my data to be in a public repository, but
which one?
My funder says I should deposit the data in a reputable
repository. But which one?
I’m collecting in-vivo animal
testing data –what metadata should I curate?
I’m about to start a set of experiments. In what
format should I record the data?
A web-based, curated, and searchable portal that monitors the
development and evolution of standards*, across all disciplines,
inter-related to databases/repositories and data policies
* A standard is a formal community specification for reporting, sharing and citing data, metadata and other digital assets
Initial focus on metadata (or content) standards
Content standards
Models/Formats = Conceptual
model, conceptual schema,
exchange formats
Terminologies = Controlled
vocabularies, taxonomies,
thesauri, ontologies etc.
Guidelines = Minimum information
reporting requirements, checklists
Formats Terminologies Guidelines
Formats Terminologies Guidelines
FAIRsharing enhances their findability
240+
119+
709+
Source:
Sources:
MIAME
MIRIAM
MIQASMIX
MIGEN
ARRIVEMIAPE
MIASE
MIQE
MISFISHIE….
REMARK
CONSORT
SRAxml
SOFT FASTA
DICOM
MzML
SBRML
SEDML…
GELML
ISA
CML
MITAB
AAO
CHEBIOBI
PATO ENVO
MOD
BTO
IDO…
TEDDY
PRO
XAO
DO
VO
~1500
Source:
Content standards
Data policies by funders, journals and other organizations
Databases/Repositories
Formats Terminologies Guidelines
Mapping a complex and evolving landscape
270
4823
2
97
87 4
204
9 6 8
Paper in preparation, preliminary information as of July 2017
Ready for use, implementation, or recommendation
In development
Status uncertain
Deprecated as subsumed or superseded
All records are manually curated
in-house and verified by the
community behind each resource
Community verified status indicators
My funder’s data policy recommends the use of established standards, but which are widely endorsed and applicable to my crop data?
We need a standard for sharing social science data, what’s out there and who should we talk to?
I have some old rice genomic data in format X, which is now deprecated; what format has replaced X?
Which are the mature standards and standards-compliant databases that we should recommend to our authors?
Collections group together
one or more types of
resource by domain,
project or organization.
Recommendations are a
core-set of resources that
are selected and
recommended by a funder
or journal data policy.
Grouping the data
A pan-European infrastructure for biological information
€19 million 2015 - 2019
Making FAIRsharing FAIR –Findable - ELIXIR
Making FAIRsharing FAIR –Findable - Bioschemas• Web mark-up – Schema.org and Bioschema.org
scscscsc/BioSchemas/specifications/tree/master/DataCatalog
Consortium of 33 pan-European organisations & 15 third parties covering a range of disciplines and organisations working together to develop a European-wide
governance framework for a pan-European “trusted virtual environment with free, open and seamless services for data storage, management, analysis, sharing and re-
use, across disciplines”
European Open Science Cloud (EOSC) Pilot
Wider adoption of FAIRsharing by many biomedical research infrastructure programmes in EU and USA, e.g.
Embeddable Widget
• Recommendation/Collection Widget for embedding in third-party websites• Journal data policies (GigaScience, PLOS, Springer
Nature…)
• Standard Developing Organisations (e.g. TDWG)
• Societies/Organisations (e.g. ELIXIR)
Dr Massimiliano Izzo
FAIR - Interoperability/Accessibility
• Data annotation:• Users/Maintainers – ORCID
• Organisations – FundRef
• Species – NCBI Taxon ontology
• Disciplines and Domains – re3data/EDAM/BRO
• API – swagger (ELIXIR guidelines)
• DOIs for standards (coming soon)
Reaching out to the community - What were the aims of the RDA /Force BioSharing WG?
• To develop guidelines for linking information on databases, content standards and journal and funder data policies in the life sciences
• To develop a curated registry (running since 2011),to access and cross-search this information, suchthat a variety of stakeholders can make decisionson which standards and databases to use orendorse
Standard developing groups, incl:Journal publishers, incl:
Cross-links, data exchange, incl:
Societies and organisations, incl: Institutional RDM services, incl:
Projects, programmes:
Working with and for the community
OBO