Semantic web at Novartis
Transcript of Semantic web at Novartis
Experiences in Novartis
Andrea Splendiani, Sr Scientific KE Consultant
Geneva, Dec 2nd 2015
Semantic Web @Novartis
Topics
§ Semantic Web @Novartis
• Context (Where in Novartis)
• Semantic Web in production
• Semantic Web in research
• Semantic Web under the hood
§ Semantic Web in “Real Life”: open questions
| Semantic Web technologies: experiences in Novartis| Andrea Splendiani | 2nd December 2015| Technology | Public use
Semantic Web uptake in time
Context
Timeline labels: Metastore/RDF; prep., production; “Semantic Web in pubmed” preparation; prep; Query federation; Visualisation; Other semantic technologies; CTMF p. p.
Semantic Web usage within the organization
Context
Activities of TMS:
§ Text mining
§ Ontology development
§ Ontology provision
§ Data curation
Metastore: a central repository for ontologies
Semantic Web in production: Metastore
§ Consists of a semantic data federation layer based on controlled terminologies extracted from scientific data repositories
§ Organized around scientific concepts (Genes, Proteins, Indications, Anatomy, etc.), some of which are hierarchically organized and classified
§ Complemented by referential knowledge (cross references to internal and external knowledge repositories)
§ Supports different use cases, including text mining, data curation, data integration, search
§ Accessible through a SPARQL endpoint, a dedicated service layer and reusable widgets; a fully integrated application (MS Viewer) has been released to visualize all Metastore content.
§ Based on an RDF data model
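The points above (concepts, hierarchy, cross references, an RDF data model) can be illustrated with a toy sketch. It is not Metastore's schema: the `ex:` and `ext:` identifiers, the `ex:broader` and `ex:xref` predicates and the helper names are all invented for illustration. It only shows how concepts, a hierarchy and cross references fit into subject-predicate-object triples.

```python
from typing import Optional

# Toy data only: every identifier and predicate here is hypothetical.
TRIPLES = {
    ("ex:Heart", "rdfs:label", "Heart"),
    ("ex:Heart", "ex:broader", "ex:Organ"),
    ("ex:Organ", "ex:broader", "ex:AnatomicalStructure"),
    ("ex:Heart", "ex:xref", "ext:anatomy/0001"),  # cross reference to an external source
}


def match(s: Optional[str], p: Optional[str], o: Optional[str]) -> list:
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in TRIPLES
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def ancestors(concept: str) -> list:
    """Follow ex:broader links upwards through the concept hierarchy."""
    found = []
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for _, _, parent in match(current, "ex:broader", None):
            if parent not in found:
                found.append(parent)
                frontier.append(parent)
    return found


print(ancestors("ex:Heart"))
print(match("ex:Heart", "ex:xref", None))
```

In a real deployment the same questions would be asked through the SPARQL endpoint rather than by scanning an in-memory set.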
Metastore: content and usage
More than 2M accesses per month (approximate figure)
March 2013
Metastore data model
Metastore technology I
Metastore technology II
Components shown in the diagram:
• Staging table T_STABLE
• RDF triple store
• Materialized views
• SPARQL endpoint (Joseki)
• Relational tables: pointers, history, versions, logs, reference tables
• Jena
• Query SQL and PL/SQL APIs
• Data services
• RDF/XML files
Metastore Widgets (suggest example)
Metastore applications (Metastore viewer: summary)
Metastore applications (Metastore viewer: links)
Metastore applications (Metastore viewer: explorer)
Semantic Web @Novartis
Topics
§ Semantic Web @Novartis
• Context (Where in Novartis)
• Semantic Web in production
• Semantic Web in research
- Query federation
- Visualization/interaction
- Other projects
• Semantic Web under the hood
§ Semantic Web in “Real Life”: open questions
Query federation: why and how
Semantic Web in Research: query federation
Why?
• Internal and external data already in RDF
• Large datasets in relational systems
• Proprietary datasets with license restrictions (e.g. one server only)
How?
• Relational-to-RDF mapping (materialised and virtualised)
• Bridge ontologies (work in progress)
• Distributed queries (SERVICE)
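The "distributed queries" point refers to the SERVICE keyword of SPARQL 1.1 Federated Query, which delegates part of a query to a remote endpoint. The sketch below only assembles such a query as text; the endpoint URLs, the `ex:` vocabulary and the function name are placeholders, not the endpoints or predicates used at Novartis.

```python
PREFIXES = "PREFIX ex: <http://example.org/vocab#>"  # hypothetical vocabulary


def federated_query(select_vars, patterns_by_endpoint):
    """Build a SPARQL 1.1 query with one SERVICE block per endpoint."""
    blocks = []
    for endpoint, patterns in patterns_by_endpoint.items():
        body = "\n".join(f"    {p}" for p in patterns)
        blocks.append(f"  SERVICE <{endpoint}> {{\n{body}\n  }}")
    variables = " ".join(f"?{v}" for v in select_vars)
    where = "\n".join(blocks)
    return f"{PREFIXES}\nSELECT {variables} WHERE {{\n{where}\n}}"


# Placeholder endpoints standing in for an assay store, a protein
# resource and a terminology store.
QUERY = federated_query(
    ["assay", "protein", "label"],
    {
        "http://assays.example.org/sparql": ["?assay ex:hasTarget ?protein ."],
        "http://proteins.example.org/sparql": ["?protein ex:encodedBy ?gene ."],
        "http://terms.example.org/sparql": ["?gene ex:prefLabel ?label ."],
    },
)
print(QUERY)
```

The shared variables (`?protein`, `?gene`) are what join the results across endpoints; how efficiently that join runs depends on the query engine.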
Data and systems architecture: example
Different arrangements possible (with caveats)
Components and links shown in the diagram:
• Export triples
• SERVICE, dynamic translation
• Persist triples
• Ontop SPARQL endpoint
• NIBR Data Warehouse
• Ontop API
• Assay Repository
• RDBMS
• Allegrograph: triplestore and endpoint
• UNIPROT/EBI SPARQL endpoint
• Oracle Spatial & graphs
• R2RML + reasoning
• Metastore
Federated query example
Data sources shown: Assays, UNIPROT, Metastore
Federated queries: logical model
RDF virtualization via OnTop
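As a rough illustration of what virtualisation means, the sketch below answers a triple pattern by rewriting it into SQL at query time, so no triples are stored. It is not Ontop's API or mapping language: the `MAPPING` table, the `assay` schema, the `ex:` names and both functions are assumptions made for this example.

```python
import sqlite3

# Hypothetical mapping: predicate -> (table, subject column, object column).
MAPPING = {
    "ex:hasTarget": ("assay", "assay_id", "target_id"),
    "ex:hasName": ("assay", "assay_id", "name"),
}


def pattern_to_sql(predicate):
    """Rewrite the triple pattern (?s, predicate, ?o) as a SQL query."""
    table, subject_col, object_col = MAPPING[predicate]
    return f"SELECT {subject_col}, {object_col} FROM {table} ORDER BY {subject_col}"


def virtual_triples(conn, predicate):
    """Answer the pattern from relational data at query time (nothing is materialised)."""
    return [
        (f"ex:assay/{s}", predicate, str(o))
        for s, o in conn.execute(pattern_to_sql(predicate))
    ]


# In-memory database with made-up rows, standing in for a relational source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE assay (assay_id INTEGER, name TEXT, target_id TEXT)")
conn.executemany(
    "INSERT INTO assay VALUES (?, ?, ?)",
    [(1, "binding assay", "T1"), (2, "cell assay", "T2")],
)
triples = virtual_triples(conn, "ex:hasTarget")
print(triples)
```

A materialised mapping would run the same translation once and load the resulting triples into a triple store instead.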
Visualization: why and how
Semantic Web in research: visualization and interaction
Why?
• Accessibility of RDF data by end users
• Complexity of (or unfamiliarity with) SPARQL
• General lack of knowledge of the structure of data at query time
How?
• Visual, interactive environment
• Pre-configuration to optimize interaction styles
• Combination of tools and exploration paradigms
• Data access through SPARQL endpoints
RDF data explorer configuration
§ Visualisation features are tuned to the datasets via a semi-automatic configuration.
§ Structure discovery:
• ontology
• queries
• sampling
• manual specification/overriding
§ Manual tuning of the ontology and other interaction parameters
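Structure discovery by sampling, one of the options listed above, could look roughly like the following. This is a guess at the general idea, not the explorer's implementation: the function name, the `types` lookup and the convention of treating untyped objects as literals are all assumptions.

```python
import random
from collections import Counter


def discover_structure(triples, types, sample_size, seed=0):
    """Count (subject class, predicate, object class) patterns in a random sample.

    `triples` is a list of (s, p, o) tuples and `types` maps a resource to its
    class. Objects without a known class are counted as "literal", which is a
    simplification made for this sketch.
    """
    rng = random.Random(seed)
    sample = rng.sample(triples, min(sample_size, len(triples)))
    counts = Counter()
    for s, p, o in sample:
        counts[(types.get(s, "unknown"), p, types.get(o, "literal"))] += 1
    return counts


# Made-up data for illustration.
TRIPLES = [
    ("a1", "ex:target", "p1"),
    ("a2", "ex:target", "p2"),
    ("p1", "ex:label", "Kinase X"),
]
TYPES = {"a1": "Assay", "a2": "Assay", "p1": "Protein", "p2": "Protein"}

counts = discover_structure(TRIPLES, TYPES, sample_size=10)
for pattern, n in counts.most_common():
    print(pattern, n)
```

The resulting pattern counts are the kind of summary that could then be corrected by manual specification or overriding.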
Data overview
Interaction: query builder + suggest
Interaction: path suggestions
Assisted query formulation
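One plausible way to produce path suggestions is a breadth-first search over a class-level summary of the data, proposing the shortest property paths first. The slides do not say how the tool does it; the `SCHEMA` dictionary, class names and `ex:` properties below are invented for the example.

```python
from collections import deque

# Hypothetical class-level summary: class -> [(property, target class)].
SCHEMA = {
    "Assay": [("ex:target", "Protein")],
    "Protein": [("ex:encodedBy", "Gene"), ("ex:involvedIn", "Indication")],
    "Gene": [("ex:associatedWith", "Indication")],
    "Indication": [],
}


def suggest_paths(start, goal, max_hops=3):
    """Return property paths from start to goal, shortest first."""
    paths = []
    queue = deque([(start, [])])
    while queue:
        node, path = queue.popleft()
        if node == goal and path:
            paths.append(path)
            continue
        if len(path) == max_hops:
            continue  # bound the search so cyclic schemas terminate
        for prop, target in SCHEMA.get(node, []):
            queue.append((target, path + [prop]))
    return paths


suggestions = suggest_paths("Assay", "Indication")
for path in suggestions:
    print(" / ".join(path))
```

Each suggested path maps directly onto a chain of triple patterns, which is what makes it usable for assisted query formulation.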
Visualization and graph navigation
Detail, Augmentation, Filtering, query re-formulation
Exploration, layouts, graphic clues
Detail, Augmentation, Filtering, query re-formulation
Multiple exports, sharing
§ “queries” can be saved and shared as files or links
§ Query history
§ Download of partial or total datasets
Example: provision of “phenotype ontologies”
Semantic Web in Research: other projects
<owl:Class rdf:about="http://purl.obolibrary.org/obo/HP_0001636">
  <rdfs:label rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Tetralogy of Fallot</rdfs:label>
  <owl:equivalentClass>
    <owl:Restriction>
      <owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/BFO_0000051"/>
      <owl:someValuesFrom>
        <owl:Class>
          <owl:intersectionOf rdf:parseType="Collection">
            <rdf:Description rdf:about="http://purl.obolibrary.org/obo/PATO_0000001"/>
            <owl:Restriction>
              <owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/BFO_0000051"/>
              <owl:someValuesFrom rdf:resource="http://purl.obolibrary.org/obo/HP_0001629"/>
            </owl:Restriction>
            <owl:Restriction>
              <owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/BFO_0000051"/>
              <owl:someValuesFrom rdf:resource="http://purl.obolibrary.org/obo/HP_0001642"/>
            </owl:Restriction>
            …
What systems can understand: HP_0001636 hasPart HP_0001629
Imports closure
Classification
Extraction
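The extraction step can be sketched as follows: walk the nested OWL restrictions and emit flat "hasPart" statements of the kind simpler systems can consume. This is a simplified illustration, not the pipeline used in the project. It covers only axioms asserted in the file (no imports closure, no reasoner-based classification), the input is a shortened variant of the fragment above with closing tags added so that it parses, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"
OBO = "http://purl.obolibrary.org/obo/"
HAS_PART = OBO + "BFO_0000051"

# Shortened, closed-off variant of the slide's fragment (illustration only).
OWL_SNIPPET = f"""
<rdf:RDF xmlns:rdf="{RDF}" xmlns:owl="{OWL}">
  <owl:Class rdf:about="{OBO}HP_0001636">
    <owl:equivalentClass>
      <owl:Restriction>
        <owl:onProperty rdf:resource="{HAS_PART}"/>
        <owl:someValuesFrom>
          <owl:Class>
            <owl:intersectionOf rdf:parseType="Collection">
              <rdf:Description rdf:about="{OBO}PATO_0000001"/>
              <owl:Restriction>
                <owl:onProperty rdf:resource="{HAS_PART}"/>
                <owl:someValuesFrom rdf:resource="{OBO}HP_0001629"/>
              </owl:Restriction>
              <owl:Restriction>
                <owl:onProperty rdf:resource="{HAS_PART}"/>
                <owl:someValuesFrom rdf:resource="{OBO}HP_0001642"/>
              </owl:Restriction>
            </owl:intersectionOf>
          </owl:Class>
        </owl:someValuesFrom>
      </owl:Restriction>
    </owl:equivalentClass>
  </owl:Class>
</rdf:RDF>
"""


def extract_has_part(xml_text):
    """Flatten nested 'has part some X' restrictions into (class, hasPart, X)."""
    root = ET.fromstring(xml_text.strip())
    triples = []
    for cls in root.findall(f"{{{OWL}}}Class"):  # top-level named classes only
        subject = cls.get(f"{{{RDF}}}about")
        if subject is None:
            continue
        for restriction in cls.iter(f"{{{OWL}}}Restriction"):
            prop = restriction.find(f"{{{OWL}}}onProperty")
            filler = restriction.find(f"{{{OWL}}}someValuesFrom")
            if prop is None or filler is None:
                continue
            target = filler.get(f"{{{RDF}}}resource")  # None for anonymous fillers
            if prop.get(f"{{{RDF}}}resource") == HAS_PART and target:
                triples.append(
                    (subject.rsplit("/", 1)[-1], "hasPart", target.rsplit("/", 1)[-1])
                )
    return triples


has_part = extract_has_part(OWL_SNIPPET)
print(has_part)
```

Collapsing the nested restriction into a direct hasPart link is a deliberate loss of detail; it trades the full class expression for statements that are easy to load and query.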
CTMF: Collaborative Terminology Management
Semantic web under the hood: CTMF
§ The CTMF is a system designed to allow distributed “editing of ontologies”.
§ Users can request new “terms” via a web interface or within an application.
§ “Content owners” can “assess” whether the requested terms are new concepts or synonyms (or errors!) and update the ontologies.
§ Resolution is asynchronous and the term request is non-blocking for applications.
CTMF web application (new request form)
CTMF: integration in applications
CTMF: term status page and discussion
CTMF: process (use of temporary ID)
Under the hood
§ Basic principle of the Semantic Web: identity comes first.
• What “people can talk about” is given a URI, and information is built around it.
§ The CTMF adopts the same approach:
• A “term” request in itself identifies a concept: what the requestor had in mind at the time of the request. We give this idea a URI (the term status page).
• Information is built around this request (clarification).
• A “content owner” can assess whether the concept is identical to something already in Metastore (most likely what was requested was a synonym), or whether a new concept should be introduced.
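The request-first approach can be sketched as a small registry: a request immediately receives a temporary URI that the application can use, and a content owner later resolves it as a synonym or a new concept. The class names, status values and the `http://ctmf.example.org/` base URI are assumptions for illustration and do not describe the actual CTMF implementation.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Optional


@dataclass
class TermRequest:
    label: str
    temp_uri: str
    status: str = "pending"  # pending | synonym | new_concept
    resolved_uri: Optional[str] = None
    comments: list = field(default_factory=list)


class TermRegistry:
    BASE = "http://ctmf.example.org/request/"  # hypothetical base URI

    def __init__(self):
        self._ids = count(1)
        self.requests = {}

    def request_term(self, label):
        """Non-blocking: the caller gets a temporary URI straight away."""
        uri = f"{self.BASE}{next(self._ids)}"
        self.requests[uri] = TermRequest(label, uri)
        return uri

    def comment(self, temp_uri, text):
        """Clarifications accumulate around the request's URI."""
        self.requests[temp_uri].comments.append(text)

    def resolve(self, temp_uri, target_uri, is_synonym):
        """Content owner's decision, taken asynchronously."""
        req = self.requests[temp_uri]
        req.status = "synonym" if is_synonym else "new_concept"
        req.resolved_uri = target_uri

    def current_uri(self, temp_uri):
        """URI an application should use for this request right now."""
        req = self.requests[temp_uri]
        return req.resolved_uri or req.temp_uri


registry = TermRegistry()
uri = registry.request_term("heart attack")
before = registry.current_uri(uri)  # still the temporary URI
registry.comment(uri, "Requested while curating a clinical data set.")
registry.resolve(uri, "ex:MyocardialInfarction", is_synonym=True)
after = registry.current_uri(uri)
print(before, "->", after)
```

Because the temporary URI stays resolvable after the decision, data annotated before resolution can be mapped to the final concept afterwards.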
Semantic Web in Real Life: Open questions
Data trumps everything
§ If there is a choice between better technology to access data and better data, the latter prevails.
• Corollary: interest is often where there is little data, especially in the public domain.
Industry (or real life) is big
§ Areas that look nearby on paper may be very distant organization-wise.
• Bench-to-bedside data integration
You don’t know the semantics of your data
§ The semantic expressiveness of RDF may be too much for what is represented in your data.
• You don’t always make your data
Is data integration really a shared goal ?
§ Not all stakeholders have an interest in “opening” their data.
• When does a data producer gain from making its data more accessible?
Many people are doing SemWeb without knowing it
§ “My project is not based on RDF, it is based on a graph with properties from controlled vocabularies.”
• Why not RDF?
- Too academic
- Need something that works
- URIs are too long
§ Therese Vachon
§ Pierre Parisot
§ Katia Vella
§ Frederic Sutter
§ Daniel Cronenberger
§ Fatma Oezdemir-Zaech
§ Anosha Siripala
§ Olivier Kreim
§ Gilles Hubert
§ Laurentiu Stanculescu
§ Marc Lieber
§ Martin Rezk (OnTop)
§ Andrea Splendiani
Semantic Web technologies experiences in Novartis