Post on 09-Aug-2020
ChEMBL resources and KNIME
George Papadatos
georgep@ebi.ac.uk
Outline
• ChEMBL data
• ChEMBL nodes
• Web services v2.0
• UniChem
• Cheminformatics utilities
• myChEMBL
• SureChEMBL and Open PHACTS
Bioactivity data
Compound
Assay/Target
>Thrombin MAHVRGLQLPGCLALAALCSLVHSQHVFLAPQQARSLLQRVRRANTFLEEVRKGNLERECVEETCSYEEAFEALESSTATDVFWAKYTACETARTPRDKLAACLEGNCAEGLGTNYRGHVNITRSGIECQLWRSRYPHKPEINSTTHPGADLQENFCRNPDSSTTGPWCYTTDPTVRRQECSIPVCGQDQVTVAMTPRSEGSSVNLSPPLEQCVPDRGQQYQGRLAVTTHGLPCLAWASAQAKALSKHQDFNSAVQLVENFCRNPDGDEEGVWCYVAGKPGDFGYCDLNYCEEAVEEETGDGLDEDSDRAIEGRTATSEYQTFFNPRTFGSGEADCGLRPLFEKKSLEDKTERELLESYIDGRIVEGSDAEIGMSPWQVMLFRKSPQELLCGASLISDRWVLTAAHCLLYPPWDKNFTENDLLVRIGKHSRTRYERNIEKISMLEKIYIHPRYNWRENLDRDIALMKLKKPVAFSDYIHPVCLPDRETAASLLQAGYKGRVTGWGNLKETWTANVGKGQPSVLQVVNLPIVERPVCKDSTRIRITDNMFCAGYKPDEGKRGDACEGDSGGPFVMKSPFNNRWYQMGIVSWGEGCDRDGKYGFYTHVFRLKKWIQKVIDQFGE
3. Insight, tools and resources for translational drug discovery
2. Organization, integration, curation and standardization of pharmacology data
1. Scientific facts
Ki = 4.5 nM
APTT = 11 min.
ChEMBL: Data for drug discovery
KNIME at the EBI
• Access ChEBI and ChEMBL databases via KNIME nodes
• Trusted community nodes
• Algorithms development
• Document classification
• Share example workflows and use cases
• Provide KNIME training to scientists and researchers
• Wellcome Trust drug discovery courses, EMBL courses
• CDK community nodes development
http://tech.knime.org/book/embl-ebi-nodes
ChEMBL nodes
ChEMBL KNIME nodes
Example: All bioactivities for hERG
All bioactivities for hERG
Activity value, assay description, compound, reference
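Outside KNIME, the same retrieval can be sketched against the ChEMBL web services. This is a minimal sketch, not the node's implementation: the base URL, the target ID CHEMBL240 for hERG, and the JSON field names are assumptions about the public data API and should be checked against the current service documentation.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE = "https://www.ebi.ac.uk/chembl/api/data"  # assumed base URL of the data API
HERG = "CHEMBL240"  # assumed ChEMBL target ID for hERG


def activity_url(target_chembl_id, limit=100, offset=0):
    """Build the URL for one page of activities recorded against a target."""
    query = urlencode(
        {"target_chembl_id": target_chembl_id, "limit": limit, "offset": offset}
    )
    return f"{BASE}/activity.json?{query}"


def summarise(page):
    """Keep the four columns named on the slide from one page of results."""
    rows = []
    for act in page.get("activities", []):
        rows.append(
            {
                "type": act.get("standard_type"),
                "value": act.get("standard_value"),
                "units": act.get("standard_units"),
                "assay": act.get("assay_description"),
                "compound": act.get("molecule_chembl_id"),
                "reference": act.get("document_chembl_id"),
            }
        )
    return rows


def fetch_page(url):
    """Download and decode one JSON page (needs network access)."""
    with urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `summarise(fetch_page(activity_url(HERG)))` would return the first page only; larger result sets need paging.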
Example: Compound searching in ChEMBL
Query
List of NNs
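A nearest-neighbour (NN) query of this kind can be sketched as a similarity search over the web services. The endpoint pattern (`similarity/<SMILES>/<threshold>`), the `molecules` key and the `similarity` field are assumptions about the public API, not something the slide states.

```python
from urllib.parse import quote

BASE = "https://www.ebi.ac.uk/chembl/api/data"  # assumed base URL of the data API


def similarity_url(smiles, threshold=80):
    """Build a similarity-search URL for a query SMILES and a percent cutoff."""
    if not 0 < threshold <= 100:
        raise ValueError("threshold is a percentage in (0, 100]")
    # Percent-encode everything so characters such as '=', '(' and '#'
    # in the SMILES cannot be mistaken for URL syntax.
    return f"{BASE}/similarity/{quote(smiles, safe='')}/{int(threshold)}.json"


def nearest_neighbours(page):
    """Return (ChEMBL ID, similarity) pairs, most similar first."""
    hits = [
        (mol.get("molecule_chembl_id"), float(mol.get("similarity", 0)))
        for mol in page.get("molecules", [])
    ]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```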
Example: Polypharmacology profile
Compounds
Query
Find NNs
Retrieve bioactivities
Filter, summarise & pivot
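The last step (filter, summarise and pivot) can be sketched in plain Python. The pChEMBL cutoff of 5 and the use of the median are illustrative choices, not taken from the workflow, and the record keys are assumed to match the activity fields returned by the web services.

```python
from collections import defaultdict
from statistics import median


def polypharmacology_profile(activities, min_pchembl=5.0):
    """Pivot activity records into {compound: {target: median pChEMBL}}."""
    grouped = defaultdict(list)
    for act in activities:
        raw = act.get("pchembl_value")
        if raw is None:
            continue  # filter: no comparable potency value
        value = float(raw)
        if value < min_pchembl:
            continue  # filter: below the (illustrative) potency cutoff
        key = (act["molecule_chembl_id"], act["target_chembl_id"])
        grouped[key].append(value)

    # Summarise replicate measurements, then pivot compounds against targets.
    table = defaultdict(dict)
    for (compound, target), values in grouped.items():
        table[compound][target] = median(values)
    return {compound: dict(targets) for compound, targets in table.items()}
```

Each row of the result is one compound and each column one target, so sparse rows indicate selective compounds and dense rows indicate promiscuous ones.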
Web services v2.0
• Many more entities → granularity
• Pagination, filtering, ordering
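A rough sketch of how filtering, ordering and pagination might combine in one client. The Django-style `field__gte` filters, the `order_by`, `limit` and `offset` parameters and the `page_meta.next` link are assumptions about the v2 API; `fetch` is a hypothetical injected function so the paging logic can be exercised offline.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

HOST = "https://www.ebi.ac.uk"


def build_url(resource, filters=None, order_by=None, limit=20, offset=0):
    """Compose filters, ordering and paging into one query string."""
    params = dict(filters or {})
    if order_by:
        params["order_by"] = order_by  # a leading '-' is assumed to mean descending
    params["limit"] = limit
    params["offset"] = offset
    return f"{HOST}/chembl/api/data/{resource}.json?{urlencode(params)}"


def fetch_json(url):
    """Default fetcher: download and decode one page (needs network access)."""
    with urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def iter_records(first_url, resource_key, fetch=fetch_json):
    """Yield every record, following 'next' links until none is left."""
    url = first_url
    while url:
        page = fetch(url)
        yield from page[resource_key]
        nxt = page.get("page_meta", {}).get("next")
        # 'next' is assumed to be a site-relative path; absolute URLs pass through.
        url = HOST + nxt if nxt and nxt.startswith("/") else nxt
```

For example, `build_url("activity", {"target_chembl_id": "CHEMBL240", "pchembl_value__gte": 6}, order_by="-pchembl_value")` would request the most potent records first, under the assumptions above.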
UniChem integration
EMBL-EBI chemistry resources
RDF and REST API interfaces
REST API interface - https://www.ebi.ac.uk/unichem/
• Atlas: ligand-induced transcript response (750)
• PDBe: ligand structures from structurally defined protein complexes (15K)
• ChEBI: nomenclature of primary and secondary metabolites; chemical ontology (24K)
• SureChEMBL: chemical structures from patent literature (~17M)
• ChEMBL: bioactivity data from literature and depositions (1.5M)
• UniChem: InChI-based chemical resolver, full + relaxed ‘lenses’ (>90M)
• 3rd party data: ZINC, PubChem, ThomsonPharma DOTF, IUPHAR, DrugBank, KEGG, NIH NCC, eMolecules, FDA SRS, PharmGKB, Selleck, …. (~70M)
Novelty checking with UniChem
https://www.ebi.ac.uk/unichem/
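One way to read "novelty checking": look up a structure's InChIKey in UniChem and call it novel if no source knows it. The sketch below assumes the legacy REST layout (`/rest/inchikey/<key>` returning a list of `src_id` / `src_compound_id` records); the service may since have moved to a different API, so treat the path and field names as illustrative.

```python
import json
from urllib.request import urlopen

UNICHEM_REST = "https://www.ebi.ac.uk/unichem/rest"  # assumed legacy REST root


def lookup_url(inchikey):
    """URL that lists the sources holding a given standard InChIKey."""
    return f"{UNICHEM_REST}/inchikey/{inchikey}"


def is_novel(records, ignore_sources=()):
    """True if no source (other than the ignored ones) contains the structure."""
    if not isinstance(records, list):
        return True  # assumption: a non-list reply means "no match"
    ignored = {str(src) for src in ignore_sources}
    known = [rec for rec in records if str(rec.get("src_id")) not in ignored]
    return not known


def check(inchikey):
    """Fetch and evaluate in one step (needs network access)."""
    with urlopen(lookup_url(inchikey)) as response:
        return is_novel(json.loads(response.read().decode("utf-8")))
```

`ignore_sources` lets a caller exclude, say, an in-house source from the comparison.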
Cheminformatics utilities
Cheminformatics utilities (aka ‘Beaker’)
• Chemical format conversions
• Dynamic image generation
• Image processing (via OSRA)
• Descriptors and property calculations
• Chemical modifications and standardization
https://www.ebi.ac.uk/chembl/api/utils/docs
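As a hedged sketch of calling one of these utilities over HTTP GET: the input is assumed to travel as URL-safe base64 in the path, and the method name `smiles2ctab` is an assumption; the documentation page above is the authority on both.

```python
import base64
from urllib.request import urlopen

BEAKER = "https://www.ebi.ac.uk/chembl/api/utils"  # service root from the slide


def beaker_url(method, payload):
    """Build a GET URL carrying the payload as URL-safe base64 (assumed scheme)."""
    token = base64.urlsafe_b64encode(payload.encode("utf-8")).decode("ascii")
    return f"{BEAKER}/{method}/{token}"


def call_beaker(method, payload):
    """Run one utility and return the raw response body (needs network access)."""
    with urlopen(beaker_url(method, payload)) as response:
        return response.read()
```

For example, `call_beaker("smiles2ctab", "CCO")` would ask the service to convert ethanol from SMILES to a molfile, if the method exists under that name.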
Example: Image to Structure
image URL
myChEMBL integration
Accessing local data with myChEMBL
Using KNIME to connect to myChEMBL
SELECT mr.*, md.chembl_id, cp.full_mwt, cp.alogp
FROM mols_rdkit mr, molecule_dictionary md, compound_properties cp
WHERE mr.m @> '$${SMolecule}$$'::qmol
  AND mr.molregno = md.molregno
  AND md.molregno = cp.molregno;
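The same substructure query can be run from a script. In this sketch the KNIME flow variable is replaced by a bound parameter, and `conn` is any DB-API connection that uses `%s` placeholders (for instance one opened with psycopg2); connection details are deliberately left out because they depend on the local myChEMBL installation.

```python
SUBSTRUCTURE_SQL = """
SELECT mr.*, md.chembl_id, cp.full_mwt, cp.alogp
FROM mols_rdkit mr, molecule_dictionary md, compound_properties cp
WHERE mr.m @> %s::qmol
  AND mr.molregno = md.molregno
  AND md.molregno = cp.molregno
"""


def substructure_search(conn, smarts, limit=None):
    """Return all rows whose structure contains the SMARTS query."""
    sql = SUBSTRUCTURE_SQL
    params = [smarts]
    if limit is not None:
        sql += "LIMIT %s"
        params.append(int(limit))
    cursor = conn.cursor()
    try:
        # The driver quotes the parameter, so the query text is never
        # assembled by string concatenation.
        cursor.execute(sql, params)
        return cursor.fetchall()
    finally:
        cursor.close()
```

Binding the query structure as a parameter avoids quoting problems with SMARTS strings that contain characters such as `'` or `$`.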
SureChEMBL and Open PHACTS
SureChEMBL
SciBite Termite
Open PHACTS API
https://dev.openphacts.org/docs/develop
https://github.com/openphacts/OPS-Knime/
http://rdf.ebi.ac.uk/resource/surechembl/patent/US-8877786-B2
Substituted carbamoylmethylamino acetic acid derivatives as novel NEP inhibitors
US-8877786-B2
Most relevant targets and diseases
MCS scaffold
Most relevant diseases
Most relevant targets
Patent publication date histogram
http://rdf.ebi.ac.uk/resource/surechembl/molecule/SCHEMBL371804
Foretinib, a kinase inhibitor in clinical phase II
Found in 89 EP, WO and US patents
Summary
• KNIME: democratizes access to data and tools
• Access public domain structure and bioactivity data and services with KNIME
• ChEMBL KNIME Nodes
• UniChem
• Cheminformatics services
• myChEMBL
• SureChEMBL
Publications
Acknowledgements
• Francis Atkinson
• Louisa Bellis
• Jon Chambers
• Michał Nowotka
• Anne Hersey
• Stefan Beisken
• Edmund Duesbury
• Daniela Digles
• Thorsten Meinl
• KNIME
• KNIME community
All workflow examples are available on request.