Bioschemas presentation at ECCB 2016, The Hague
Transcript of Bioschemas presentation at ECCB 2016, The Hague
Bioschemas.org
Structured data for Life Sciences using Schema.org
Niall Beard
Scientific Web Technologist, University of Manchester
ELIXIR: European infrastructure for biological information
Data infrastructure for Europe’s life-science research:
www.elixir-europe.org
@ELIXIREurope
Data
Interoperability
Tools
Compute
Training
Marine metagenomics
Human data
Crop and forest plants
Rare diseases
• 20 Members • 1 Observer
ELIXIR Hub based alongside EMBL-EBI in Hinxton
FAIR
Findable
Accessible
Interoperable
Reusable
Finding resources – Search engine index
Resource Resource Resource
Finding resources – Catalogues
bio.tools
tess.elixir-uk.org
Discover resources by filtering metadata
Finding resources – Content Integration platforms
Training Resource
Training Resource
Training Resource
Tool Resource
Tool Resource
Tool Resource
bio.tools
tess.elixir-uk.org
Programmatically aggregated
Bio.tools XSD
https://github.com/bio-tools/biotoolsxsd
Metadata model, i.e. Recipe type
<div itemscope itemtype="http://schema.org/Recipe">
  <div itemprop="nutrition" itemscope itemtype="http://schema.org/NutritionInformation">
    Nutrition facts: <span itemprop="calories">144 kcal</span>,
  </div>
  Ingredients:
  - <span itemprop="recipeIngredient">800g small new potato</span>
  - <span itemprop="recipeIngredient">3 shallot</span>
  . . .
</div>
<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "Recipe",
  "name": "Potato Salad",
  "nutrition": {
    "@type": "NutritionInformation",
    "calories": "144 kcal"
  },
  "recipeIngredient": [
    "800g small new potato",
    "3 shallot"
  ]
  . . .
}
</script>
Search engine readable = optimized
Content Content Content
Schema.org Schema.org Schema.org
Search engines favour websites containing schema.org markup in their search results
Content integration aggregation
Training Resource
Training Resource
Training Resource
Schema.org Schema.org Schema.org
tess.elixir-uk.org
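As a rough illustration of how an aggregator might read schema.org markup instead of scraping page layout, the sketch below pulls JSON-LD blocks out of an HTML page using only the Python standard library. It is not how TeSS is implemented; the class name `JsonLdExtractor`, the helper `extract_jsonld`, and the sample page with its example.org URL are all assumptions made for this example.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content can arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.items.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed blocks rather than abort the harvest


def extract_jsonld(html):
    """Return a list of JSON-LD objects found in the given HTML string."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.items


# Hypothetical page standing in for a training resource (not a real site).
SAMPLE_PAGE = """
<html><head>
<script type="application/ld+json">
{"@context": "http://schema.org", "@type": "CreativeWork",
 "name": "Example training material", "url": "https://example.org/material/1"}
</script>
</head><body><p>Human-readable page content.</p></body></html>
"""

if __name__ == "__main__":
    for item in extract_jsonld(SAMPLE_PAGE):
        print(item["@type"], "-", item["name"])
```

The same extractor works on any page that embeds JSON-LD, which is the point of the slide: one generic parser rather than one bespoke script per site.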
Minimum information
Controlled vocabularies
Cardinality
Data model
New properties
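The profile ingredients listed above (minimum information, cardinality, controlled vocabularies, new properties) could be checked mechanically once they are written down. The sketch below is only an illustration under assumptions: the profile contents, the property `difficultyLevel` and its vocabulary are hypothetical and are not taken from any Bioschemas specification.

```python
# Hypothetical profile: property -> (min count, max count or None, allowed values or None).
# "difficultyLevel" is an invented "new property" used purely for illustration.
TRAINING_MATERIAL_PROFILE = {
    "name": (1, 1, None),        # minimum information: exactly one name
    "url": (1, 1, None),         # minimum information: exactly one url
    "keywords": (0, None, None),  # optional, any number of values
    "difficultyLevel": (0, 1, {"beginner", "intermediate", "advanced"}),
}


def as_list(value):
    """Normalise a JSON-LD value (missing, single, or list) to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def validate(item, profile):
    """Return a list of human-readable problems; an empty list means the item conforms."""
    problems = []
    for prop, (min_count, max_count, vocabulary) in profile.items():
        values = as_list(item.get(prop))
        if len(values) < min_count:
            problems.append(
                f"{prop}: expected at least {min_count} value(s), found {len(values)}"
            )
        if max_count is not None and len(values) > max_count:
            problems.append(
                f"{prop}: expected at most {max_count} value(s), found {len(values)}"
            )
        if vocabulary is not None:
            for value in values:
                if value not in vocabulary:
                    problems.append(f"{prop}: {value!r} is not in the controlled vocabulary")
    return problems


if __name__ == "__main__":
    item = {"name": "Intro course", "url": "https://example.org/intro",
            "difficultyLevel": "expert"}
    for problem in validate(item, TRAINING_MATERIAL_PROFILE):
        print(problem)
```

Keeping the profile as plain data means it can be made stricter or extended without touching the checking code.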
BioSchemas.org
minimal, maximal, extensible
Training materials
Events
Organizations
Data
Standards
Software
Minimum information
for one content type
Training materials
Events
Organizations
Data
Software
Standards
Common properties
among content types
More depth to a broad-reach technology
Depth
DATS
Reach
Use case 1: TeSS, ELIXIR Training Portal - Aggregates Life Science Training Materials
Large Training Sites
• Well-formed APIs
• XML Dumps
• RSS feeds
Medium/Small Sites
• No structured data
The long tail, collections sets and small science
Slide courtesy of Todd Vision, Dryad
http://www.france-bioinformatique.fr/en/training_material
https://search.google.com/structured-data/testing-tool
Applied Drupal 7 schema.org extension
Took about 2 hours
Included in TeSS in an hour
Biosamples entry (Diabetic mouse strain)
Diabetes term EFO_0000400
Experimental Factor Ontology
Defined by
isAbout
Courtesy of Tony Burdett and Simon Jupp
Use case 2: Mapping data to ontologies
Organization: name
MedicalEntity: name, description
MedicalCode: codeValue, codingSystem
MedicalCode: name, url, alternateName, description, codeValue, codingSystem, …
CreativeWork: about, name, description, url, datePublished, …
Data Term Ontology
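To make the data, term and ontology mapping more concrete, here is a minimal sketch that builds a JSON-LD description of a data entry whose `about` points at an ontology term. It uses the schema.org types and properties named on the slide, plus the schema.org `code` property linking a MedicalEntity to a MedicalCode. The function name `entry_jsonld`, the entry's name, description and example.org URL are assumptions for illustration, and the exact nesting is one plausible reading of the slide rather than the authors' confirmed model.

```python
import json


def entry_jsonld(name, description, url, term_name, code_value, coding_system):
    """Build a schema.org description of a data entry that is about an ontology term."""
    return {
        "@context": "http://schema.org",
        "@type": "CreativeWork",           # the data entry
        "name": name,
        "description": description,
        "url": url,
        "about": {
            "@type": "MedicalEntity",      # the term the entry is about
            "name": term_name,
            "code": {
                "@type": "MedicalCode",    # the term's identifier and its source ontology
                "codeValue": code_value,
                "codingSystem": coding_system,
            },
        },
    }


if __name__ == "__main__":
    doc = entry_jsonld(
        name="Diabetic mouse strain sample",           # hypothetical entry name
        description="Illustrative sample record",      # hypothetical description
        url="https://example.org/samples/1",           # placeholder URL
        term_name="Diabetes",
        code_value="EFO_0000400",
        coding_system="Experimental Factor Ontology",
    )
    print(json.dumps(doc, indent=2))
```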
Use case 3.1: Dataset Markup, Citation
• Dataset Citation
• Mapping to JATS Journal Article Tag Suite Data extension*
• Metadata for data citation
Google, Bing, Yahoo, Yandex
*Daniel Mietchen et al., Adapting JATS to support data citation, Journal Article Tag Suite Conference (JATS-Con) Proceedings 2015, Bethesda (MD): National Center for Biotechnology Information, 2015.
Use case 3.2: Dataset Markup, Samples
• Biobank Samples
• Limited number of simple key properties
• Disease, gender, age and sample type, data available
• Cross-walk MIABIS: Minimum Information About BIobank data Sharing
Google, Bing, Yahoo, Yandex
Cataloging 400 UK Biobanks
Value for content providers
• More exposure through search engines and portals
  • Favoured in search results
• Low barrier for adoption
  • Embedding schema.org in pages can be done with an off-the-shelf CMS
  • Tools for most frameworks and web scripting languages
• Longevity of Standard
  • Standard is open to the wider community and will survive past funding
  • Less chance of the schema being deprecated after implementation
Value for content integration platforms
• Good benefits to persuade providers to structure their data
• Lots of tooling available for parsing structured data
  • Many open RDFa, JSON-LD, and microdata parsers available on GitHub
• Wider community engaged in construction
  • Schema.org is a public forum so not limited to just the people you know
• Much more scalable than scraping
  • Bespoke scraping scripts accumulate technical debt
Development Process
Acknowledgements
• TeSS: Niall Beard
• BioSharing: SA Sansone, A Gonzalez-Beltran, P McQuilton, P Rocca-Serra
• NIH BD2K bioCADDIE: SA Sansone, A Gonzalez-Beltran, Jeff Grethe
• Community: Premysl Velek
• Event: Martin Cook
• Training materials: Aleksandra Nenadic & Gabriella Rustici
Organization representatives
Group chairs
BioSchemas community
• ELIXIR: Premysl Velek
• Pistoia Alliance: Richard Holland
• GOBLET: Terri Attwood
• BBMRI: Michaela Mayrhofer
• Organization: Richard Holland & Rafael C Jimenez
• Person: Niall Beard
• Standard: A Gonzalez-Beltran & P McQuilton
Contributors
• Aleksandra Nenadic • Adam Hospital • Gabriella Rustici • Carlos Horro • Martin Cook • Niall Beard • Rafael C Jimenez • Andy Jenkinson • Manuel Corpas • Roberto Preste • Richard Holland • Alejandra Gonzalez-Beltran • Andrew Lonie • Carole Goble • Peter McQuilton • Premysl Velek • Ian Dunlop • Jeff Grethe • Milo Thurston • Niklas Blomberg
• Isabelle Perseil • Jaap Heringa • Jon Ison • John Hancock • Simon Jupp • John (Jack) D. Van Horn • Ivana Krenkova • Laura Furlong • Morris Swertz • Mateusz Kuzak • Mario Alberich • Mark Thompson • Maria Martin • Mikael Borg • Montserrat González • Norman Morrison • Núria Queralt-Rosinach • Olivier Sallou • Robert Pergl • Pedro Fernandes
• Yasset Perez-Riverol • Sarala Wimalaratne • Nick Juty • Jose Luis Ambite • Brane Leskošek • Celia van Gelder • Christa Janko • Christine Staiger • Dan Brickley • Daniel Faria • Dmitry Repchevsky • Daniel Sobral • Daniel Vaughan • Ian Fore • Frederik Coppens • Josep Ll. Gelpi • ChuQiao Gong • Hedi Peterson • Hervé Ménager • Nina Hrtonova
• Pierre Larmande • Rob Finn • Renzo Kottmann • Rodrigo Lopez • Sameer Velankar • Sara Light • Carol Shreffler • Silvano Squizzato • Susanna Sansone • Tony Burdett • Terri Attwood • Cath Brooksbank • Hedi Peterson • Luc Deltombe • Michaela Mayrhofer • Philippe Rocca-Serra
Upcoming Bioschemas Activities
• Biosoftware description using bio.tools and schema.org - NETTAB, 24th October
• Bioschemas AGM on 8th-9th November in Rothamsted, UK
  • See: https://goo.gl/hu7uYK
• Implementation study proposal being drafted
• Develop more content types for life sciences:
  • Data repository
  • Dataset
  • Sample
  • Phenotype
  • Protein annotations