Practical semantics in the pharmaceutical industry - the Open PHACTS project
Uploaded by: orcid-0000-0002-2668-4821
Transcript of Practical semantics in the pharmaceutical industry - the Open PHACTS project
![Page 1: Practical semantics in the pharmaceutical industry - the Open PHACTS project](https://reader035.fdocuments.in/reader035/viewer/2022070315/554e7d83b4c9054a698b5292/html5/thumbnails/1.jpg)
Practical semantics in the pharmaceutical industry - the Open PHACTS project
Antony Williams
On behalf of the Open PHACTS Team
(and with a focus on Chemistry!)
![Page 2: Practical semantics in the pharmaceutical industry - the Open PHACTS project](https://reader035.fdocuments.in/reader035/viewer/2022070315/554e7d83b4c9054a698b5292/html5/thumbnails/2.jpg)
Fundamental issue:
There is a LOT of science online!
Chaotic, varying quality and very valuable!
Scientists want to find information quickly and easily
Often they just “can’t get there” (or don’t even know where “there” is)
And you have to manage it all (or not)
![Page 3: Practical semantics in the pharmaceutical industry - the Open PHACTS project](https://reader035.fdocuments.in/reader035/viewer/2022070315/554e7d83b4c9054a698b5292/html5/thumbnails/3.jpg)
Pre-competitive informatics: pharma companies are all accessing, processing, storing and re-processing external research data
[Diagram: Literature, PubChem, GenBank, Patents, Databases and Downloads feed data integration, data analysis and firewalled databases; this is repeated at each company]
Lowering industry firewalls: pre-competitive informatics in drug discovery Nature Reviews Drug Discovery (2009) 8, 701-708 doi:10.1038/nrd2944
![Page 4: Practical semantics in the pharmaceutical industry - the Open PHACTS project](https://reader035.fdocuments.in/reader035/viewer/2022070315/554e7d83b4c9054a698b5292/html5/thumbnails/4.jpg)
The Project
Innovative Medicines Initiative
• EC-funded public-private partnership for pharmaceutical research
• Focus on key problems
– Efficacy, safety, education & training, knowledge management
The Open PHACTS Project
• Create a semantic integration hub (“Open Pharmacological Space”)…
• Deliver services to support ongoing drug discovery programs in pharma and the public domain
• Not just another project: leading academics in semantics, pharmacology and informatics, driven by solid industry business requirements
• Initially 23 academic partners, 8 pharmaceutical companies and 3 biotechs
• Work split into clusters:
– Technical Build
– Scientific Drive
– Community & Sustainability
![Page 5: Practical semantics in the pharmaceutical industry - the Open PHACTS project](https://reader035.fdocuments.in/reader035/viewer/2022070315/554e7d83b4c9054a698b5292/html5/thumbnails/5.jpg)
Major Work Streams
Build: OPS service layer and resource integration
Drive: Development of exemplar work packages & Applications
Sustain: Community engagement and long-term sustainability
[Diagram labels: OPS Service Layer (Assertion & Metadata Mgmt, Transform / Translate, Integrator); ‘Consumer’ firewall; Supplier firewall; Corpus 1, Db 2, Db 3, Db 4, Corpus 5; Std public vocabularies; Target Dossier; Compound Dossier; Pharmacological Networks; Business Rules]
Work Stream 1: Open Pharmacological Space (OPS) Service Layer
Standardised software layer to allow public DD resource integration
– Define standards and construct OPS service layer
– Develop interface (API) for data access, integration and analysis
– Develop secure access models
Existing Drug Discovery (DD) Resource Integration
Work Stream 2: Exemplar Drug Discovery Informatics tools
Develop exemplar services to test OPS Service Layer
– Target Dossier (Data Integration)
– Pharmacological Network Navigator (Data Visualisation)
– Compound Dossier (Data Analysis)
![Page 6: Practical semantics in the pharmaceutical industry - the Open PHACTS project](https://reader035.fdocuments.in/reader035/viewer/2022070315/554e7d83b4c9054a698b5292/html5/thumbnails/6.jpg)
ChEMBL, DrugBank, Gene Ontology, WikiPathways, UniProt, ChemSpider, UMLS, ConceptWiki, ChEBI, TrialTrove, GVKBio, GeneGo, TR Integrity
“Find me compounds that inhibit targets in NFkB pathway assayed in only functional assays with a potency <1 μM”
“What is the selectivity profile of known p38 inhibitors?”
“Let me compare MW, logP and PSA for known oxidoreductase inhibitors”
![Page 7: Practical semantics in the pharmaceutical industry - the Open PHACTS project](https://reader035.fdocuments.in/reader035/viewer/2022070315/554e7d83b4c9054a698b5292/html5/thumbnails/7.jpg)

| Number | Sum | Nr of 1 | Question |
|---|---|---|---|
| 15 | 12 | 9 | All oxidoreductase inhibitors active < 100 nM in both human and mouse |
| 18 | 14 | 8 | Given compound X, what is its predicted secondary pharmacology? What are the on- and off-target safety concerns for a compound? What is the evidence, and how reliable is that evidence (journal impact factor, KOL), for findings associated with a compound? |
| 24 | 13 | 8 | Given a target, find me all actives against that target. Find/predict polypharmacology of actives. Determine the ADMET profile of actives. |
| 32 | 13 | 8 | For a given interaction profile, give me compounds similar to it. |
| 37 | 13 | 8 | The current Factor Xa lead series is characterised by substructure X. Retrieve all bioactivity data in serine protease assays for molecules that contain substructure X. |
| 38 | 13 | 8 | Retrieve all experimental and clinical data for a given list of compounds defined by their chemical structure (with options to match stereochemistry or not). |
| 41 | 13 | 8 | A project is considering Protein Kinase C Alpha (PRKCA) as a target. What are all the compounds known to modulate the target directly? What are the compounds that may modulate the target directly? That is, return all compounds active in assays where the resolution is at least at the level of the target family (i.e. PKC), both from structured assay databases and the literature. |
| 44 | 13 | 8 | Give me all active compounds on a given target with the relevant assay data |
| 46 | 13 | 8 | Give me the compound(s) which hit most specifically the multiple targets in a given pathway (disease) |
| 59 | 14 | 8 | Identify all known protein-protein interaction inhibitors |

Business Question Driven Approach
![Page 8: Practical semantics in the pharmaceutical industry - the Open PHACTS project](https://reader035.fdocuments.in/reader035/viewer/2022070315/554e7d83b4c9054a698b5292/html5/thumbnails/8.jpg)
![Page 9: Practical semantics in the pharmaceutical industry - the Open PHACTS project](https://reader035.fdocuments.in/reader035/viewer/2022070315/554e7d83b4c9054a698b5292/html5/thumbnails/9.jpg)
Open PHACTS Scientific Services
Platform Explorer
Standards
Apps
API
“Provenance Everywhere”
![Page 10: Practical semantics in the pharmaceutical industry - the Open PHACTS project](https://reader035.fdocuments.in/reader035/viewer/2022070315/554e7d83b4c9054a698b5292/html5/thumbnails/10.jpg)
Core Platform (architecture diagram)
Data sources: Public content, Commercial content, Public ontologies, User annotations (labels per source: RDF, Nanopub, Db, VoID)
Platform components: Data Cache (Virtuoso Triple Store); Semantic Workflow Engine; Linked Data API (RDF/XML, TTL, JSON); Domain-Specific Services; Identity Resolution Service; Chemistry Registration, Normalisation & Q/C; Identifier Management Service; Indexing
Example inputs: P12374, EC2.43.4, CS4532, “Adenosine receptor 2a”
Consumers: Apps
![Page 11: Practical semantics in the pharmaceutical industry - the Open PHACTS project](https://reader035.fdocuments.in/reader035/viewer/2022070315/554e7d83b4c9054a698b5292/html5/thumbnails/11.jpg)
![Page 12: Practical semantics in the pharmaceutical industry - the Open PHACTS project](https://reader035.fdocuments.in/reader035/viewer/2022070315/554e7d83b4c9054a698b5292/html5/thumbnails/12.jpg)
RDF/VoID
RDF (Resource Description Framework)
VoID (Vocabulary of Interlinked Datasets)
– Metadata describing the RDF
– Describes how datasets are linked using Linksets
• skos:exactMatch (SKOS, Simple Knowledge Organisation System), e.g. to link compounds in OPS with compounds in ChEBI
• skos:closeMatch, e.g. to link stereo-insensitive parents to their children within OPS
• skos:relatedMatch, e.g. to link parent compounds that contain others as fragments
• dul:expresses (DOLCE+DnS Ultralite) describes what links the datasets. We use CHEMINF to express the links, e.g. http://semanticscience.org/resource/CHEMINF_000059 represents an InChIKey.
– Recommendations on how to create the VoID have been specified by Manchester here: http://www.cs.man.ac.uk/~graya/ops/2012/ED-datadesc/
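A linkset of this kind can be sketched in a few lines of Python. The sketch below is a minimal illustration, not the Open PHACTS implementation: it links two datasets whenever their records share an InChIKey, emits `skos:exactMatch` pairs, and records CHEMINF_000059 as the justification in the spirit of `dul:expresses`. The record IDs and placeholder keys are hypothetical, and the metadata is a plain dictionary rather than real VoID/RDF.

```python
from __future__ import annotations

# Predicate and justification URIs named on the slide.
SKOS_EXACT_MATCH = "http://www.w3.org/2004/02/skos/core#exactMatch"
CHEMINF_INCHIKEY = "http://semanticscience.org/resource/CHEMINF_000059"


def build_linkset(source: dict[str, str], target: dict[str, str]) -> dict:
    """Link records from two datasets whose InChIKeys are identical.

    `source` and `target` map record IDs to InChIKeys.
    """
    # Index the target dataset by InChIKey for constant-time lookups.
    by_key: dict[str, list[str]] = {}
    for record_id, key in target.items():
        by_key.setdefault(key, []).append(record_id)

    links = [
        (src_id, SKOS_EXACT_MATCH, tgt_id)
        for src_id, key in sorted(source.items())
        for tgt_id in by_key.get(key, [])
    ]
    # Linkset metadata, loosely modelled on void:linkPredicate and dul:expresses.
    return {"linkPredicate": SKOS_EXACT_MATCH, "expresses": CHEMINF_INCHIKEY, "links": links}


# Hypothetical records with placeholder InChIKeys.
ops = {"ops:OPS1": "KEY-A", "ops:OPS2": "KEY-B"}
chebi = {"chebi:100": "KEY-A", "chebi:200": "KEY-C"}
print(build_linkset(ops, chebi)["links"])
# [('ops:OPS1', 'http://www.w3.org/2004/02/skos/core#exactMatch', 'chebi:100')]
```

Keeping the justification next to the links is what lets a consumer later decide which linksets to trust for a given task.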
![Page 13: Practical semantics in the pharmaceutical industry - the Open PHACTS project](https://reader035.fdocuments.in/reader035/viewer/2022070315/554e7d83b4c9054a698b5292/html5/thumbnails/13.jpg)
Chemistry Registration, Normalisation & Q/C
Chemistry Registration
• The old chemistry registration system uses the standard ChemSpider deposition system, which includes low-level structure validation and a manual curation service by RSC staff.
• New registration system:
– Utilizes the ChemSpider Validation and Standardization platform, including collapsing tautomers
– Utilizes the FDA rule set as the basis for standardizations
– Generates an Open PHACTS identifier (OPS ID)
![Page 14: Practical semantics in the pharmaceutical industry - the Open PHACTS project](https://reader035.fdocuments.in/reader035/viewer/2022070315/554e7d83b4c9054a698b5292/html5/thumbnails/14.jpg)

| STANDARD_TYPE | UNIT_COUNT |
|---|---|
| AC50 | 7 |
| Activity | 421 |
| EC50 | 39 |
| IC50 | 46 |
| ID50 | 42 |
| Ki | 23 |
| Log IC50 | 4 |
| Log Ki | 7 |
| Potency | 11 |
| log IC50 | 0 |

| STANDARD_TYPE | STANDARD_UNITS | COUNT(*) |
|---|---|---|
| IC50 | nM | 829448 |
| IC50 | ug.mL-1 | 41000 |
| IC50 | (empty) | 38521 |
| IC50 | ug/ml | 2038 |
| IC50 | ug ml-1 | 509 |
| IC50 | mg kg-1 | 295 |
| IC50 | molar ratio | 178 |
| IC50 | ug | 117 |
| IC50 | % | 113 |
| IC50 | uM well-1 | 52 |

~100 units
>5000 types
Implemented using the Quantities, Units, Dimensions and Types (QUDT) ontology (http://www.qudt.org/)
Quantitative Data Challenges
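The second query shows a single standard type (IC50) spread over many unit spellings. A minimal normalisation sketch follows; it is an illustration under stated assumptions, not the QUDT-based implementation. It knows only a handful of molar units plus the three microgram-per-millilitre spellings from the table, needs a molecular weight for mass concentrations, and rejects anything that is not a concentration (such as `mg kg-1`, `molar ratio` or `%`). The function and constant names are hypothetical.

```python
from typing import Optional

# Factors converting a value in each molar unit to nM (assumed subset of units).
MOLAR_TO_NM = {"nM": 1.0, "uM": 1e3, "mM": 1e6, "M": 1e9}

# Spellings that all mean micrograms per millilitre (taken from the IC50 query).
UG_PER_ML_ALIASES = {"ug.mL-1", "ug/ml", "ug ml-1"}


def ic50_to_nm(value: float, unit: str, mol_weight: Optional[float] = None) -> float:
    """Convert a concentration to nM; raise ValueError for anything else."""
    unit = unit.strip()
    if unit in MOLAR_TO_NM:
        return value * MOLAR_TO_NM[unit]
    if unit in UG_PER_ML_ALIASES:
        if mol_weight is None:
            raise ValueError(f"molecular weight (g/mol) needed to convert {unit!r}")
        # 1 ug/mL = 1e-3 g/L = 1e-3 / MW mol/L = 1e6 / MW nM.
        return value * 1e6 / mol_weight
    # Dose units (mg kg-1), ratios, percentages and empty units are not concentrations.
    raise ValueError(f"not a convertible concentration unit: {unit!r}")


print(ic50_to_nm(0.5, "uM"))                      # 500.0
print(ic50_to_nm(1.0, "ug/ml", mol_weight=500.0))  # 2000.0
```

Rejecting unknown units loudly, rather than passing values through unconverted, keeps mixed-unit values out of any downstream comparison.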
![Page 15: Practical semantics in the pharmaceutical industry - the Open PHACTS project](https://reader035.fdocuments.in/reader035/viewer/2022070315/554e7d83b4c9054a698b5292/html5/thumbnails/15.jpg)
Content Changes Regularly! POINT IN TIME

| Source | Initial records | Triples | Properties |
|---|---|---|---|
| ChEMBL | 1,149,792 (~1,091,462 cmpds, ~8845 targets) | 146,079,194 | 17 cmpds, 13 targets |
| DrugBank | 19,628 (~14,000 drugs, ~5000 targets) | 517,584 | 74 |
| UniProt | 536,789 | 156,569,764 | 78 |
| ENZYME | 6,187 | 73,838 | 2 |
| ChEBI | 35,584 | 905,189 | 2 |
| GO/GOA | 38,137 | 24,574,774 | 42 |
| ChemSpider/ACD | 1,194,437 | 161,336,857 | 22 ACD, 4 CS |
| ConceptWiki | 2,828,966 | 3,739,884 | 1 |

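Totalling the point-in-time triple counts above is simple arithmetic, sketched here in Python with the figures copied from the slide. The total comes to 493,797,084 triples (roughly half a billion), and ChemSpider/ACD, UniProt and ChEMBL together account for about 94% of them.

```python
# Triple counts copied from the point-in-time table on the slide.
TRIPLES = {
    "ChEMBL": 146_079_194,
    "DrugBank": 517_584,
    "UniProt": 156_569_764,
    "ENZYME": 73_838,
    "ChEBI": 905_189,
    "GO/GOA": 24_574_774,
    "ChemSpider/ACD": 161_336_857,
    "ConceptWiki": 3_739_884,
}

total = sum(TRIPLES.values())
print(f"total triples: {total:,}")  # total triples: 493,797,084

# Share of the total contributed by each source, largest first.
for source, count in sorted(TRIPLES.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{source:<15} {count / total:6.1%}")
```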
![Page 16: Practical semantics in the pharmaceutical industry - the Open PHACTS project](https://reader035.fdocuments.in/reader035/viewer/2022070315/554e7d83b4c9054a698b5292/html5/thumbnails/16.jpg)
Data Cache (Virtuoso Triple Store)
Semantic Workflow Engine
Infrastructure
Hardware (development)
- 2 x Intel Xeon E5-2640
- 384 GB DDR3 1333 MHz RAM
- 1.5 TB SSD
- 3 TB 7200 rpm
Triple Store
- Virtuoso 7 column store
- Shown to scale to > 100 billion triples
Network
- AMX-IS
- Extensive memcache
![Page 17: Practical semantics in the pharmaceutical industry - the Open PHACTS project](https://reader035.fdocuments.in/reader035/viewer/2022070315/554e7d83b4c9054a698b5292/html5/thumbnails/17.jpg)
Antony Williams vs Identifiers
Passport ID
Dad, Tony, others
SSN
Green Card
License
5 email addresses
ChemConnector (blog, Twitter account, Facebook, FriendFeed)
OpenID, ORCID…
![Page 18: Practical semantics in the pharmaceutical industry - the Open PHACTS project](https://reader035.fdocuments.in/reader035/viewer/2022070315/554e7d83b4c9054a698b5292/html5/thumbnails/18.jpg)
![Page 19: Practical semantics in the pharmaceutical industry - the Open PHACTS project](https://reader035.fdocuments.in/reader035/viewer/2022070315/554e7d83b4c9054a698b5292/html5/thumbnails/19.jpg)
![Page 20: Practical semantics in the pharmaceutical industry - the Open PHACTS project](https://reader035.fdocuments.in/reader035/viewer/2022070315/554e7d83b4c9054a698b5292/html5/thumbnails/20.jpg)
![Page 21: Practical semantics in the pharmaceutical industry - the Open PHACTS project](https://reader035.fdocuments.in/reader035/viewer/2022070315/554e7d83b4c9054a698b5292/html5/thumbnails/21.jpg)
![Page 22: Practical semantics in the pharmaceutical industry - the Open PHACTS project](https://reader035.fdocuments.in/reader035/viewer/2022070315/554e7d83b4c9054a698b5292/html5/thumbnails/22.jpg)
[Diagram: the same entity under several identifiers: P12047, X31045, GB:29384, RS_2353]
Let a Mapping Service take the strain….
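The mapping-service idea can be illustrated with a union-find (disjoint-set) structure: each asserted mapping merges two identifiers into one equivalence class, and a lookup returns the whole class. This is a minimal sketch, not the Open PHACTS Identifier Management Service; the class and method names are hypothetical, and so are the particular mappings between the identifiers taken from the slide.

```python
from __future__ import annotations


class MappingService:
    """Hypothetical identifier mapping service backed by union-find."""

    def __init__(self) -> None:
        self._parent: dict[str, str] = {}

    def _find(self, ident: str) -> str:
        # Unknown identifiers start as their own singleton class.
        self._parent.setdefault(ident, ident)
        while self._parent[ident] != ident:
            # Path halving: point each visited node at its grandparent.
            self._parent[ident] = self._parent[self._parent[ident]]
            ident = self._parent[ident]
        return ident

    def add_mapping(self, a: str, b: str) -> None:
        """Assert that identifiers a and b denote the same entity."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def equivalents(self, ident: str) -> set[str]:
        """Return all identifiers in the same class as `ident`, itself included."""
        root = self._find(ident)
        return {other for other in list(self._parent) if self._find(other) == root}


svc = MappingService()
# Hypothetical mappings between the identifiers shown on the slide.
svc.add_mapping("P12047", "X31045")
svc.add_mapping("X31045", "GB:29384")
svc.add_mapping("RS_2353", "GB:29384")
print(sorted(svc.equivalents("RS_2353")))  # ['GB:29384', 'P12047', 'RS_2353', 'X31045']
```

Because equivalence is transitive here, a client asking about any one identifier gets all the others without knowing which database minted them.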
![Page 23: Practical semantics in the pharmaceutical industry - the Open PHACTS project](https://reader035.fdocuments.in/reader035/viewer/2022070315/554e7d83b4c9054a698b5292/html5/thumbnails/23.jpg)
PubChem, DrugBank, ChemSpider
Imatinib
Mesylate
What Is Gleevec?
![Page 24: Practical semantics in the pharmaceutical industry - the Open PHACTS project](https://reader035.fdocuments.in/reader035/viewer/2022070315/554e7d83b4c9054a698b5292/html5/thumbnails/24.jpg)
Strict (Analysing) / Relaxed (Browsing)
Dynamic Equality
```
LinkSet#1 {
  chemspider:gleevec hasParent  imatinib
  ...
  drugbank:gleevec   exactMatch imatinib
  ...
}
```
chemspider:gleevec, drugbank:gleevec
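One way to model dynamic equality is to let the mode decide which link predicates count as identity, then collect everything reachable through those links. In the minimal sketch below, the mode names and the predicates allowed in each mode are assumptions, and only the two links from LinkSet#1 are used. The strict mode follows only `exactMatch`, so `chemspider:gleevec` stays separate from `drugbank:gleevec`. The relaxed mode also follows `hasParent`, so both records collapse onto imatinib.

```python
from __future__ import annotations

from collections import deque

# The two links restated from LinkSet#1 on the slide.
LINKSET_1 = [
    ("chemspider:gleevec", "hasParent", "imatinib"),
    ("drugbank:gleevec", "exactMatch", "imatinib"),
]

# Hypothetical modes: strict (analysing) vs relaxed (browsing).
MODES = {
    "strict": {"exactMatch"},
    "relaxed": {"exactMatch", "hasParent"},
}


def equivalents(start: str, links, mode: str) -> set[str]:
    allowed = MODES[mode]
    # Undirected adjacency built only from the predicates allowed in this mode.
    graph: dict[str, set[str]] = {}
    for subj, pred, obj in links:
        if pred in allowed:
            graph.setdefault(subj, set()).add(obj)
            graph.setdefault(obj, set()).add(subj)
    # Breadth-first search collects everything reachable from `start`.
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


print(sorted(equivalents("chemspider:gleevec", LINKSET_1, "strict")))   # ['chemspider:gleevec']
print(sorted(equivalents("chemspider:gleevec", LINKSET_1, "relaxed")))  # ['chemspider:gleevec', 'drugbank:gleevec', 'imatinib']
```

The links themselves never change; only the choice of which ones to follow does, which is what makes the equality "dynamic".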
![Page 25: Practical semantics in the pharmaceutical industry - the Open PHACTS project](https://reader035.fdocuments.in/reader035/viewer/2022070315/554e7d83b4c9054a698b5292/html5/thumbnails/25.jpg)
ChemSpider Validation & Standardization Platform
Quality Assurance
![Page 26: Practical semantics in the pharmaceutical industry - the Open PHACTS project](https://reader035.fdocuments.in/reader035/viewer/2022070315/554e7d83b4c9054a698b5292/html5/thumbnails/26.jpg)
Chemistry Validation and Standardization Platform (CVSP) at cvsp.chemspider.com
• Validation
• Standardization
• Parent generation
RDF Export
Data
![Page 27: Practical semantics in the pharmaceutical industry - the Open PHACTS project](https://reader035.fdocuments.in/reader035/viewer/2022070315/554e7d83b4c9054a698b5292/html5/thumbnails/27.jpg)
Deposited SDF record (CTAB, REGID1, DataSource, Synonym1, Synonym2, XRef1, etc.) → Standardized entity (OPS_ID1)
Parents
Charge Parent (OPS_ID7)
Isotope Parent (OPS_ID5)
Stereo Parent (OPS_ID4)
Tautomer Parent (OPS_ID6)
Super Parent (OPS_ID8)
Fragment (OPS_ID3)
Fragment (OPS_ID2)
![Page 28: Practical semantics in the pharmaceutical industry - the Open PHACTS project](https://reader035.fdocuments.in/reader035/viewer/2022070315/554e7d83b4c9054a698b5292/html5/thumbnails/28.jpg)
For each compound (CSID), parent generation is attempted (see “Tautomerism in large databases”, Sitzmann et al., J. Comput. Aided Mol. Des. (2010)):

| Parent | Description |
|---|---|
| Charge-insensitive | An attempt is made to neutralize ionized acids and bases. Envisioned as an ongoing improvement as new cases appear. |
| Isotope-insensitive | Isotopes are replaced by the common atomic weight |
| Stereo-insensitive | Stereo is stripped |
| Tautomer-insensitive | Tautomer canonicalization attempts to generate a “reasonable” tautomer |
| Super-insensitive | This parent combines all of the above |

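The parent types above can be illustrated, very crudely, at the level of SMILES strings. The sketch below is only a toy under stated assumptions: it is not how CVSP works (real standardisation needs a cheminformatics toolkit), it handles only `@`/`@@` chirality marks, `/` and `\` bond directions, isotope mass numbers inside bracket atoms, and `.`-separated fragments, and it does not attempt charge or tautomer parents. The function names are hypothetical.

```python
from __future__ import annotations

import re


def stereo_parent(smiles: str) -> str:
    """Strip '@'/'@@' chirality marks and '/' '\\' double-bond direction marks."""
    return re.sub(r"@{1,2}|[/\\]", "", smiles)


def isotope_parent(smiles: str) -> str:
    """Drop the mass number that opens a bracket atom, e.g. [13CH3] -> [CH3]."""
    return re.sub(r"\[\d+", "[", smiles)


def fragments(smiles: str) -> list[str]:
    """Split disconnected components (e.g. a counter-ion) on '.'."""
    return smiles.split(".")


print(stereo_parent("C[C@@H](N)C(=O)O"))     # C[CH](N)C(=O)O
print(isotope_parent("[13CH3]C(=O)O"))       # [CH3]C(=O)O
print(fragments("[Na+].[O-]C(=O)c1ccccc1"))  # ['[Na+]', '[O-]C(=O)c1ccccc1']
```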
![Page 29: Practical semantics in the pharmaceutical industry - the Open PHACTS project](https://reader035.fdocuments.in/reader035/viewer/2022070315/554e7d83b4c9054a698b5292/html5/thumbnails/29.jpg)
[Figure: structure drawings (neutral, ionized and sodium-salt forms) for DrugBank ID DB07241 and the related records OPS1 to OPS6]
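```turtle
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
# Hypothetical namespace: the slide does not give the URI behind the ops: prefix.
@prefix ops: <http://example.org/ops/> .

ops:OPS1 skos:exactMatch <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugs/DB07241> .
ops:OPS2 skos:relatedMatch ops:OPS1 .
ops:OPS3 skos:relatedMatch ops:OPS1 .
ops:OPS3 skos:closeMatch ops:OPS4 .
ops:OPS3 skos:closeMatch ops:OPS5 .
ops:OPS4 skos:closeMatch ops:OPS6 .
ops:OPS5 skos:closeMatch ops:OPS6 .
```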
![Page 30: Practical semantics in the pharmaceutical industry - the Open PHACTS project](https://reader035.fdocuments.in/reader035/viewer/2022070315/554e7d83b4c9054a698b5292/html5/thumbnails/30.jpg)
![Page 31: Practical semantics in the pharmaceutical industry - the Open PHACTS project](https://reader035.fdocuments.in/reader035/viewer/2022070315/554e7d83b4c9054a698b5292/html5/thumbnails/31.jpg)
A Precompetitive Knowledge Framework
[Diagram labels: Integration; Pharma Needs; Inputs; Sustainability; Stability; Security; Management / Governance; Data Mining; Services/Algorithms; Mapping & Populating; Architecture; Interfaces & Services; Content (Structured & Unstructured); Vocabularies & Identifiers (URIs); Community; KD; Innovation]
![Page 32: Practical semantics in the pharmaceutical industry - the Open PHACTS project](https://reader035.fdocuments.in/reader035/viewer/2022070315/554e7d83b4c9054a698b5292/html5/thumbnails/32.jpg)
The Ecosystem is …
[Diagram labels: API; Approach; Community; Industry; Academia; Data Provider; Software Provider]
![Page 33: Practical semantics in the pharmaceutical industry - the Open PHACTS project](https://reader035.fdocuments.in/reader035/viewer/2022070315/554e7d83b4c9054a698b5292/html5/thumbnails/33.jpg)
Kick-Starting Sustainability
[Diagram labels: Collaboration; Grants; Industry; Open PHACTS; API Users; Apps; API]
![Page 34: Practical semantics in the pharmaceutical industry - the Open PHACTS project](https://reader035.fdocuments.in/reader035/viewer/2022070315/554e7d83b4c9054a698b5292/html5/thumbnails/34.jpg)
explorer.openphacts.org
![Page 35: Practical semantics in the pharmaceutical industry - the Open PHACTS project](https://reader035.fdocuments.in/reader035/viewer/2022070315/554e7d83b4c9054a698b5292/html5/thumbnails/35.jpg)
Example applications
Advanced analytics
ChemBioNavigator: navigating at the interface of chemical and biological data, with sorting and plotting options
TargetDossier: interconnecting Open PHACTS with multiple target-centric services; exploring target similarity using diverse criteria
PharmaTrek: interactive polypharmacology space of experimental annotations
UTOPIA: semantic enrichment of scientific PDFs
Predictions
GARFIELD: prediction of target pharmacology based on the Similarity Ensemble Approach
eTOX connector: automatic extraction of data for building predictive toxicology models in the eTOX project
![Page 36: Practical semantics in the pharmaceutical industry - the Open PHACTS project](https://reader035.fdocuments.in/reader035/viewer/2022070315/554e7d83b4c9054a698b5292/html5/thumbnails/36.jpg)
Front-end framework to visualize biological data
Target dossier (CNIO)
![Page 37: Practical semantics in the pharmaceutical industry - the Open PHACTS project](https://reader035.fdocuments.in/reader035/viewer/2022070315/554e7d83b4c9054a698b5292/html5/thumbnails/37.jpg)
![Page 38: Practical semantics in the pharmaceutical industry - the Open PHACTS project](https://reader035.fdocuments.in/reader035/viewer/2022070315/554e7d83b4c9054a698b5292/html5/thumbnails/38.jpg)
![Page 39: Practical semantics in the pharmaceutical industry - the Open PHACTS project](https://reader035.fdocuments.in/reader035/viewer/2022070315/554e7d83b4c9054a698b5292/html5/thumbnails/39.jpg)
![Page 40: Practical semantics in the pharmaceutical industry - the Open PHACTS project](https://reader035.fdocuments.in/reader035/viewer/2022070315/554e7d83b4c9054a698b5292/html5/thumbnails/40.jpg)
![Page 41: Practical semantics in the pharmaceutical industry - the Open PHACTS project](https://reader035.fdocuments.in/reader035/viewer/2022070315/554e7d83b4c9054a698b5292/html5/thumbnails/41.jpg)
![Page 42: Practical semantics in the pharmaceutical industry - the Open PHACTS project](https://reader035.fdocuments.in/reader035/viewer/2022070315/554e7d83b4c9054a698b5292/html5/thumbnails/42.jpg)
The Open PHACTS community ecosystem
![Page 43: Practical semantics in the pharmaceutical industry - the Open PHACTS project](https://reader035.fdocuments.in/reader035/viewer/2022070315/554e7d83b4c9054a698b5292/html5/thumbnails/43.jpg)
Becoming part of the Open PHACTS Foundation
Members
membership offers early access to platform updates and releases
the opportunity to steer research and development directions
receive technical support
work with the ecosystem of developers and semantic data integrators around Open PHACTS
tiered membership
familiar business and governance model
A UK-based not-for-profit member owned company
![Page 44: Practical semantics in the pharmaceutical industry - the Open PHACTS project](https://reader035.fdocuments.in/reader035/viewer/2022070315/554e7d83b4c9054a698b5292/html5/thumbnails/44.jpg)
What are the problems with licensing we had to address?
– To make data and software generated by the project usable/reusable
– Multiplicity of unclear or non-standard licenses on original data sources
• ‘Public’ can mean use but not redistribute, use in a commercial environment, …
• Legal position on use and reuse extremely unclear
• Different issues than just linking to data
– Legal status of integrated collections of the above, and of derived knowledge?
– Appropriate software license selection
– Legal clarity for EFPIA (European Federation of Pharmaceutical Industries and Associations) and end users
– Approaches for commercial data integration, EFPIA in-house data
AIM: enable maximum possible dissemination and usability of integrated data and architecture with approaches that will be applicable in other data integration projects
Licensing Challenges
![Page 45: Practical semantics in the pharmaceutical industry - the Open PHACTS project](https://reader035.fdocuments.in/reader035/viewer/2022070315/554e7d83b4c9054a698b5292/html5/thumbnails/45.jpg)
Chose John Wilbanks as consultant
A framework built around STANDARD well-understood Creative Commons licences – and how they interoperate
Deal with the problems by:
Interoperable licences
Appropriate terms
Declare expectations to users and data publishers
One size won’t fit all requirements
Data Licensing Solution
![Page 46: Practical semantics in the pharmaceutical industry - the Open PHACTS project](https://reader035.fdocuments.in/reader035/viewer/2022070315/554e7d83b4c9054a698b5292/html5/thumbnails/46.jpg)
Open PHACTS Project Partners
Pfizer Limited – Coordinator
Universität Wien – Managing entity
Technical University of Denmark
University of Hamburg, Center for Bioinformatics
BioSolveIT GmbH
Consorci Mar Parc de Salut de Barcelona
Leiden University Medical Centre
Royal Society of Chemistry
Vrije Universiteit Amsterdam
Spanish National Cancer Research Centre
University of Manchester
Maastricht University
Aqnowledge
University of Santiago de Compostela
Rheinische Friedrich-Wilhelms-Universität Bonn
AstraZeneca
GlaxoSmithKline
Esteve
Novartis
Merck Serono
H. Lundbeck A/S
Eli Lilly
Netherlands Bioinformatics Centre
Swiss Institute of Bioinformatics
ConnectedDiscovery
EMBL-European Bioinformatics Institute
Janssen
OpenLink