smartAPIs: EUDAT Semantic Working Group Presentation @ RDA 9th Plenary
Transcript of smartAPIs: EUDAT Semantic Working Group Presentation @ RDA 9th Plenary
Slide 1
@micheldumontier & @markmoby
The smartAPI project
Discovering interconnected web APIs with semantic metadata
Mark D. Wilkinson, Center for Plant Biotechnology and Genomics UPM-INIA, Madrid
On behalf of Michel Dumontier, Maastricht University
Slide 2
Background
Biomedical data analysis is increasingly being done using cloud-based, web-friendly application programming interfaces (APIs).
But it is pretty much impossible to automatically discover which API to use and how to connect APIs together to create an effective workflow.
Slide 3
API Catalogs
[Figure: API registries and their sizes: 17,202 APIs; 1,187 APIs; 6,206 APIs; 15,128 APIs; SHARE Registry]
Slides 4-7
Variable Metadata
Slide 9
The parameter called sequence can have values that are FASTA-formatted sequences.
Slide 10
The average bioinformatician can traverse these links, read these API documents, and make reasonably good guesses about how to access the service.
But this is limited by the speed and patience of a human.
Slide 11
Meanwhile, in another registry
Slides 12-13
Variable Metadata
Different metadata fields describing roughly the same operation (BLAST)
Slides 14-15
Variable Metadata
In this case, the parameter is called QUERY, and it can consume an Accession (???...), a GI, or a FASTA-formatted sequence.
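The two registry entries describe roughly the same BLAST input under different parameter names (sequence and QUERY) and with different accepted values. The guesswork a human does when reading them can be sketched as below. This is only an illustration: the regular expressions, the alias table, and the function name `classify_sequence_input` are assumptions made for this sketch, not rules taken from either registry.

```python
import re

# Hypothetical, deliberately simplified patterns; real GI and accession
# syntaxes are richer than this.
GI_PATTERN = re.compile(r"^\d+$")
ACCESSION_PATTERN = re.compile(r"^[A-Z]{1,2}_?\d{5,9}(\.\d+)?$")

# Two registries, two names for roughly the same input (names from the slides).
# The canonical name "query_sequence" is invented for this sketch.
PARAMETER_ALIASES = {"sequence": "query_sequence", "QUERY": "query_sequence"}


def classify_sequence_input(value: str) -> str:
    """Guess whether a value is FASTA text, a GI number, or an accession."""
    text = value.strip()
    if text.startswith(">"):
        return "fasta"
    if GI_PATTERN.match(text):
        return "gi"
    if ACCESSION_PATTERN.match(text):
        return "accession"
    return "unknown"


if __name__ == "__main__":
    for example in (">seq1\nACGT", "12345678", "NM_000546.6", "hello world"):
        print(repr(example), classify_sequence_input(example))
```

None of this knowledge is available in the registry metadata itself, which is why it currently has to live in a person's head or in hand-written glue code.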
Slide 16
If you really work and dig around
A human can use Service Registries to find most of the information they need
(though they still need experience and/or guesswork!)
Slide 17
Weak or absent input/output descriptors make pipelining of services difficult based solely on registry metadata.
Slide 18
Weak or absent input/output descriptors
And even with fairly well-described services, pipelining remains troublesome.
Slides 20-21
myGene.info: Input parameters (described using the OpenAPI descriptor standard)
From the OpenAPI description, a bioinformatician can learn that the geneid parameter can be an Entrez or EnsEMBL gene ID.
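To make the point concrete, here is a minimal sketch of what such a parameter description offers a human versus a program. The dictionary is a hypothetical OpenAPI-style parameter object written for this illustration; it is not copied from the real myGene.info description, and `machine_readable_facts` is an invented helper.

```python
# Hypothetical OpenAPI-style parameter object, for illustration only.
GENEID_PARAMETER = {
    "name": "geneid",
    "in": "path",
    "required": True,
    # A human learns the accepted identifier types from this free text.
    "description": "Entrez or Ensembl gene id",
    "schema": {"type": "string"},
}


def machine_readable_facts(parameter: dict) -> dict:
    """Return only the fields a program can act on without reading prose."""
    return {
        "name": parameter["name"],
        "location": parameter["in"],
        "required": parameter.get("required", False),
        "type": parameter.get("schema", {}).get("type", "unknown"),
    }


if __name__ == "__main__":
    # The structured fields say "a required string"; which kind of
    # identifier is expected is stated only in the description text.
    print(machine_readable_facts(GENEID_PARAMETER))
```

In this sketch the structured part only says that geneid is a required string; the information that matters for chaining services sits in prose.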
Slides 22-24
myGene.info: Input parameters (described using the OpenAPI descriptor standard)
[Diagram labels: Gene, myGene.info, then "?" (slide 23), then JSON (slide 24)]
Slide 25
A big block of JSON! (1340 lines)
[Annotations on the JSON: GenBank identifier, Affymetrix identifier, Taxonomy identifier, HGNC symbol?, NCBI Gene Terminology]
What do these symbols refer to? How do we find out more?
Slide 26
Two distinct problems:
1. Discovery of a tool that does what you need
2. Understanding how to use the tool you discovered
   Its inputs and outputs (what kind of information, in what format/syntax, with which parameter names, required or optional?)
   How it can be chained with other tools into more complex analytical workflows
Slide 27
More contemporary registries get us closer
Slides 28-32
Crowdsourced API registry (some curation)
Features ontology-constrained fields
Highlighted fields: GUID (slide 29), EDAM:operation_0346 (slide 30), EDAM:data_2044 (slide 31), EDAM:data_0857 (slide 32)
Slide 33
Crowdsourced API registry (some curation)
Features ontology-constrained fields
No description of I/O parameters (for non-browser-based interaction)
Descriptions of data formats are sometimes available (and also grounded in the EDAM ontology) but inconsistent
Only possible to use this API registry for discovery, not for invocation (i.e. solves problem #1, but not #2)
Also invented a novel Service Descriptor format, which requires de novo tool-building
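A small sketch of why ontology-constrained fields help with problem #1 but not problem #2: entries annotated with ontology terms can be filtered exactly, yet nothing in them says how to call the service. The `RegistryEntry` shape, the service names, and the `EX:` placeholder terms are assumptions for this illustration; only the three EDAM identifiers come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegistryEntry:
    """Hypothetical registry record with ontology-constrained fields only."""
    name: str
    operation: str    # ontology term for what the service does
    input_data: str   # ontology term for what it consumes
    output_data: str  # ontology term for what it produces


ENTRIES = [
    RegistryEntry("example-search-service", "EDAM:operation_0346",
                  "EDAM:data_2044", "EDAM:data_0857"),
    # Placeholder terms, invented for this sketch.
    RegistryEntry("example-other-service", "EX:operation_placeholder",
                  "EDAM:data_2044", "EX:data_placeholder"),
]


def find_by_operation(entries, operation_term):
    """Discovery: return the names of entries annotated with the given term."""
    return [entry.name for entry in entries if entry.operation == operation_term]


if __name__ == "__main__":
    print(find_by_operation(ENTRIES, "EDAM:operation_0346"))
    # Invocation is still out of reach: the record carries no endpoint,
    # parameter names, or required/optional flags.
```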
Slide 34
Semantic Health and Research Environment (SHARE) Registry (synopsis interface)
Slide 35
Semantic Health and Research Environment (SHARE) Registry
Uses the myGrid Service descriptor (same as )
Slide 36
Semantic Health and Research Environment (SHARE) Registry
Uses ontology terms for both data types and service operation types, much as with (but allows/encourages any ontology)
Slide 37
Semantic Health and Research Environment (SHARE) Registry
SADI standardizes service interfaces such that the interface itself is also defined by these ontology terms (i.e. data must be owl:Individuals of the ontological type)
Slide 38
Semantic Health and Research Environment (SHARE) Registry
and therefore...
Slide 39
Semantic Health and Research Environment (SHARE) Registry
Automated synthesis of, and invocation of, complex Service pipelines from independent providers
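Once every service declares the type it consumes and the type it produces, pipeline synthesis becomes a graph search. The sketch below uses exact string matching on invented `ex:` type names and invented service names; SADI itself relies on OWL classes and reasoning, so this is a simplified stand-in rather than a description of how SHARE does it.

```python
from collections import deque

# Hypothetical services: name -> (consumed type, produced type).
SERVICES = {
    "gene_to_protein": ("ex:Gene", "ex:Protein"),
    "protein_to_structure": ("ex:Protein", "ex:Structure"),
    "gene_to_pathway": ("ex:Gene", "ex:Pathway"),
}


def plan_pipeline(services, start_type, goal_type):
    """Breadth-first search for the shortest chain of services that turns
    start_type into goal_type. Returns a list of service names, or None."""
    queue = deque([(start_type, [])])
    seen = {start_type}
    while queue:
        current_type, path = queue.popleft()
        if current_type == goal_type:
            return path
        for name, (consumes, produces) in services.items():
            if consumes == current_type and produces not in seen:
                seen.add(produces)
                queue.append((produces, path + [name]))
    return None


if __name__ == "__main__":
    print(plan_pipeline(SERVICES, "ex:Gene", "ex:Structure"))
```

The search only works because inputs and outputs are typed; with the weak I/O descriptors shown earlier there would be no edges to follow.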
Slide 40
Semantic Health and Research Environment (SHARE) Registry
Automated gap filling for unavailable data
Automated detection of useful data combinations
Slide 41
Semantic Health and Research Environment (SHARE) Registry
SADI assumes a world of 100% OWL/RDF data
(Good) OWL can be quite hard to write!
Slides 42-43
Barely described, no automation, hard to find and use, not FAIR
Richly described, fully automatable, fully FAIR
An incremental path to increasingly rich, semantically controlled metadata that
Does not invent new standards
and
Is easy for our end-users to create
Slide 45
smartAPI
The goal is to lower the barrier to the discovery and reuse of web APIs through richer semantic metadata.
A coordinated facility for the intelligent and facile annotation of smart APIs
A web application to discover smart APIs and how they connect to each other
1-year supplement in collaboration with the HeartBD2K center: Peipei Ping (PI), Andrew Su, and Chunlei Wu
Slide 46
Build on API metadata specification standards
Swagger
Slide 47
Tools for Intelligent API Metadata Authoring
Build on CEDAR technology
Generate the service metadata capture web form from a smartAPI template (CEDAR)
Discover context-appropriate annotation recommendations to enhance harmonization
Validate and give improvement suggestions
Slide 48
Metadata authoring will connect to numerous existing resources
Identifier syntax and link-outs
475 ontologies and terminologies
Slides 50-51
Smart Profiling (not the same as Extreme Vetting ;-) )
Slide 52
Using information from identifiers.org, MIRIAM, and prefix-commons, make some intelligent guesses about what a given data field might be
Enhanced suggestions for the end-user annotator
Slide 53
Use this to automatically map API data to Linked Open Data
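The profiling idea can be sketched as matching sample values from an API response against known identifier patterns and offering the matching namespaces as suggestions. The three patterns below are simplified assumptions written for this sketch, not the patterns published by identifiers.org, MIRIAM, or prefix-commons, and `profile_field` is an invented name.

```python
import re

# Hypothetical, simplified identifier patterns keyed by namespace prefix.
IDENTIFIER_PATTERNS = {
    "ensembl": re.compile(r"^ENSG\d{11}$"),
    "ncbigene": re.compile(r"^\d+$"),
    "taxonomy": re.compile(r"^\d+$"),
}


def profile_field(sample_values):
    """Return every prefix whose pattern matches all sample values.

    More than one prefix can come back; the result is a list of guesses
    for the annotator to confirm, not a decision.
    """
    if not sample_values:
        return []
    return [
        prefix
        for prefix, pattern in IDENTIFIER_PATTERNS.items()
        if all(pattern.match(str(value)) for value in sample_values)
    ]


if __name__ == "__main__":
    print(profile_field(["ENSG00000139618", "ENSG00000141510"]))
    print(profile_field(["675", "7157"]))
```

Purely numeric values match more than one namespace here, which is why the slides frame the output as suggestions for the end-user annotator rather than automatic assignments.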
Slide 54
Steps along the stairway
Slide 55
Metadata Survey
We surveyed 3 repositories (Biocatalogue, Programmable Web, Elixir Tools & Services Registry) and 4 specifications (MIAS, OpenAPI, SADI, schema.org), plus a preliminary smartAPI metadata specification.
Slide 56
Metadata Elements
20 basic, 6 provider, 10 operation, 12 parameter, 6 response
Slide 57
MUST: Name, Access Point
SHOULD: Description, Documentation, Response MIME-Type, Terms of Service, Authentication Mode, Version, SSL Support
MAY: Website, Category, Publications, API Access Restrictions, Access Point Mirrors, API Metadata Format, API Access Mode, API Location, API Implementation Language, API Maturity, Social Media Links
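The MUST and SHOULD levels lend themselves to a simple check that reports missing MUST elements as errors and missing SHOULD elements as suggestions. The sketch below uses the element names from the slide as dictionary keys; the record layout and the function name `check_metadata` are assumptions, and this is not the smartAPI validator itself.

```python
# Element names taken from the slide; using them as dict keys is an assumption.
MUST_ELEMENTS = ["Name", "Access Point"]
SHOULD_ELEMENTS = [
    "Description", "Documentation", "Response MIME-Type", "Terms of Service",
    "Authentication Mode", "Version", "SSL Support",
]


def check_metadata(record):
    """Return (errors, suggestions): missing MUST and missing SHOULD elements."""
    errors = [name for name in MUST_ELEMENTS if not record.get(name)]
    suggestions = [name for name in SHOULD_ELEMENTS if not record.get(name)]
    return errors, suggestions


if __name__ == "__main__":
    # Hypothetical, incomplete record.
    draft = {"Name": "Example API", "Description": "A demonstration record"}
    errors, suggestions = check_metadata(draft)
    print("Missing MUST elements:", errors)
    print("Missing SHOULD elements:", suggestions)
```

MAY elements are left out of the check on purpose, since their absence is neither an error nor something to nag about.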
Slide 58
Metadata authoring made easier. We augmented the Swagger Editor to autocomplete using the smartAPI Repository and enabled validation against the smartAPI specification.
Slide 60
Faceted Search Interface. We implemented a lightweight web-based tool to perform faceted search and filtering over the Elasticsearch repository of smartAPI descriptions.
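Faceted search boils down to two operations: counting how many records fall under each value of a field, and narrowing the result set when a value is selected. The sketch below does both in memory over invented records; in the tool described on the slide this work would presumably be delegated to Elasticsearch, and the field names here are assumptions.

```python
from collections import Counter

# Hypothetical API descriptions; field names and values are invented.
API_RECORDS = [
    {"name": "api-a", "category": "genomics", "authentication": "none"},
    {"name": "api-b", "category": "genomics", "authentication": "apiKey"},
    {"name": "api-c", "category": "proteomics", "authentication": "none"},
]


def facet_counts(records, field):
    """Count how many records carry each value of the given field."""
    return Counter(record[field] for record in records if field in record)


def filter_by(records, **selected):
    """Keep only the records that match every selected facet value."""
    return [
        record for record in records
        if all(record.get(field) == value for field, value in selected.items())
    ]


if __name__ == "__main__":
    print(facet_counts(API_RECORDS, "category"))
    narrowed = filter_by(API_RECORDS, category="genomics", authentication="none")
    print([record["name"] for record in narrowed])
```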
Slide 61
API Interoperability WG People
Michel Dumontier, Amrapali Zaveri, Shima Dastgheib, Chunlei Wu, Ruben Verborgh, Caty Chung, Raymond Terryn, Paul Avillach, Gregg Kellogg, Nolan Nichols, Kevin Osborn, David Steinberg, Mark Wilkinson, Mary Shimoyama, Jeff De Pons, Denise Luna, Kathleen Jagodnik
http://mygene.info/
http://ruben.verborgh.org/blog/2013/11/29/the-lie-of-the-api/
http://smart-api.info/website/
http://www.lincsproject.org/
http://bd2k-picsure.hms.harvard.edu
https://spec-ops.io
http://nidm.nidash.org/
https://cgl.genomics.ucsc.edu/
http://sadiframework.org
https://bd2kccc.org/
http://rgd.mcw.edu/
Slide 62
Take-home Message
Facilitate the discoverability, interoperability, and reuse of web-based APIs.
Eliminate API data silos by providing FAIR (Findable, Accessible, Interoperable, Reusable) Linked Data.
The tools, technologies, and design patterns developed in the pilot and WG should generalize to API development across the BD2K consortium (and beyond).
Slide 63
TEAM
Michel Dumontier
Chunlei Wu
Cyrus Afrasiabi (backend, repository API)
Trish Whetzel (API profiling)
Yash Vyas (recommendation engine)
Amrapali Zaveri (metadata survey, template, web application, evaluation)
Andrew Su (evaluation)
Mark Wilkinson (evaluation)
[email protected]
http://dumontierlab.com
Presentations: http://slideshare.com/micheldumontier