2011 ebi industry workshop
Transcript of 2011 ebi industry workshop
1 2011-EBI-Industry-SW::Dumontier
Predicting Druglikeness and Toxicity from Integrated Data and Services on
the Life Science Semantic Web
Michel Dumontier, Ph.D.
Associate Professor of Bioinformatics, Department of Biology, School of Computer Science, Institute of Biochemistry, Carleton University
Adjunct Professor, Department of Computer Science and Software Engineering, Université Laval
Ottawa Institute of Systems Biology
Ottawa-Carleton Institute of Biomedical Engineering
2 2011-EBI-Industry-SW::Dumontier
Is caffeine a drug-like molecule?
Is acetaminophen toxic?
3 2011-EBI-Industry-SW::Dumontier
Finding the right information to answer a question is hard and sometimes requires a sophisticated workflow
4 2011-EBI-Industry-SW::Dumontier
5 2011-EBI-Industry-SW::Dumontier
What if we could answer a question by automatically building a knowledge base
using both data and services?
6 2011-EBI-Industry-SW::Dumontier
The Semantic Web is a web of knowledge.
It is about standards for publishing, sharing and querying knowledge drawn from diverse sources
It enables the answering of sophisticated questions
7 2011-EBI-Industry-SW::Dumontier
To answer this question we need to know:
• what ‘drug-like molecule’ really means
• caffeine’s molecular structure
• how to compute the relevant attributes
• whether caffeine satisfies the requirements of being ‘drug-like’
Is caffeine a drug-like molecule?
8 2011-EBI-Industry-SW::Dumontier
Lipinski Rule of Five
• Rule of thumb for druglikeness (orally active in humans)
(4 rules with multiples of 5)
– mass of less than 500 Daltons
– fewer than 5 hydrogen bond donors
– fewer than 10 hydrogen bond acceptors
– a partition coefficient value between -5 and 5
We need a more formal (machine understandable) description of a ‘drug-like molecule’ which specifies values for chemical descriptors
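As a minimal sketch of what a machine-checkable version of these four rules might look like, the thresholds can be written as a predicate over precomputed descriptors. The `Descriptors` class and `is_druglike` function are hypothetical names introduced for illustration, not part of any ontology or service in this talk, and the inequalities follow the slide's wording. The caffeine values are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Chemical descriptors needed by the rule set (hypothetical container)."""
    mass_daltons: float
    h_bond_donors: int
    h_bond_acceptors: int
    log_p: float


def is_druglike(d: Descriptors) -> bool:
    """Apply the four thresholds as worded on the slide (strict inequalities)."""
    return (
        d.mass_daltons < 500
        and d.h_bond_donors < 5
        and d.h_bond_acceptors < 10
        and -5 < d.log_p < 5
    )


# Illustrative descriptor values for caffeine (C8H10N4O2): molar mass 194.19,
# no N-H or O-H donors, six N/O acceptors, and an assumed logP of about -0.43.
caffeine = Descriptors(194.19, 0, 6, -0.4311)
print(is_druglike(caffeine))
```

A plain predicate like this captures the thresholds but not their meaning; the point of the ontology-based approach in the talk is that the same constraints become shareable class definitions a reasoner can use.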
9 2011-EBI-Industry-SW::Dumontier
ontology as a strategy to
formally represent knowledge
10 2011-EBI-Industry-SW::Dumontier
The Web Ontology Language (OWL) Has Explicit Semantics
Can therefore be used to capture knowledge in a machine understandable way
11 2011-EBI-Industry-SW::Dumontier
Semanticscience Integrated Ontology (SIO)
• OWL2 ontology
• 900+ classes covering basic types (physical, processual, abstract, informational) with an emphasis on biological entities
• 169 basic relations (mereological, participatory, attribute/quality, spatial, temporal and representational)
• axioms can be used by reasoners to generate inferences for consistency checking, classification and answering questions about life science knowledge
• embodies emerging ontology design patterns
– specifies the representation of knowledge
• dereferenceable URIs
• searchable in the NCBO BioPortal
• Available at http://semanticscience.org/ontology/sio.owl
12 2011-EBI-Industry-SW::Dumontier
2011-EBI-Industry-SW::Dumontier
The Chemical Information Ontology (CHEMINF)
• 100+ chemical descriptors
• 50+ chemical qualities
• Relates descriptors to their specifications, the software that generated them (along with the running parameters), and the algorithms that they implement
• Contributors: Nico Adams, Leonid Chepelev, Michel Dumontier, Janna Hastings, Egon Willighagen, Peter Murray-Rust, Christoph Steinbeck
13
http://semanticchemistry.googlecode.com
2011-EBI-Industry-SW::Dumontier
Molecular structure can be represented using a SMILES string, which is a common representation
of the chemical graph
14
ball & stick model for caffeine
SMILES string for caffeine
Cn1cnc2n(C)c(=O)n(C)c(=O)c12
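To illustrate that the SMILES string encodes the chemical graph's atoms directly, here is a small sketch that counts the heavy (non-hydrogen) atoms in it. This is a simplified tokenizer written for this example, not a full SMILES parser: it assumes only organic-subset symbols and ignores bracket atoms, charges, and isotopes. The function name `heavy_atom_counts` is hypothetical.

```python
import re
from collections import Counter

# Organic-subset atom symbols only; bracket atoms such as [NH4+] are not handled.
ATOM = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]")


def heavy_atom_counts(smiles: str) -> Counter:
    """Count non-hydrogen atoms, folding aromatic (lowercase) symbols into their element."""
    return Counter(symbol.capitalize() for symbol in ATOM.findall(smiles))


counts = heavy_atom_counts("Cn1cnc2n(C)c(=O)n(C)c(=O)c12")
print(dict(counts))  # {'C': 8, 'N': 4, 'O': 2}
```

The counts agree with caffeine's molecular formula, C8H10N4O2; hydrogens are implicit in SMILES and so do not appear as tokens.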
15 2011-EBI-Industry-SW::Dumontier
Lipinski Rule of Five
• Empirically derived ruleset for druglikeness (4 rules with multiples of 5)
– mass of less than 500 Daltons
– fewer than 5 hydrogen bond donors
– fewer than 10 hydrogen bond acceptors
– a partition coefficient value between -5 and 5
• A formal description using OWL:
2011-EBI-Industry-SW::Dumontier
What we then need are services that will consume SMILES strings and annotate the molecule with the required chemical
descriptors
16
then we can reason about whether it satisfies the drug-likeness definition
2011-EBI-Industry-SW::Dumontier
Semantic Automated Discovery and Integration
http://sadiframework.org
Mark Wilkinson, UBC
Michel Dumontier, Carleton University
Christopher Baker, UNB
SADI is a framework to create Semantic Web services using OWL classes as service inputs and outputs
17
2011-EBI-Industry-SW::Dumontier
Create code stubs using the ontology
• Publish the ontology to a web-accessible location
http://semanticscience.org/sadi/ontology/lipinskiserviceontology.owl
• Make sure that the class names are resolvable (easy when using the hash notation)
http://semanticscience.org/sadi/ontology/lipinskiserviceontology.owl#smiles-molecule
http://semanticscience.org/sadi/ontology/lipinskiserviceontology.owl#logp-molecule
http://semanticscience.org/sadi/ontology/lipinskiserviceontology.owl#hbdc-molecule
http://semanticscience.org/sadi/ontology/lipinskiserviceontology.owl#hdba-molecule
http://semanticscience.org/sadi/ontology/lipinskiserviceontology.owl#lipinksi-druglike-molecule
• Download/checkout the code
http://sadiframework.org
• Run the code generator (Java, Perl, Python)
– specify the URIs that correspond to input and output types
• Implement the functionality
– We used the Chemistry Development Kit (CDK) to implement 4 services
18
2011-EBI-Industry-SW::Dumontier
Responds to a GET operation by providing the service description in RDF
conforms to Feta (BioMoby, myGrid)
19
curl http://cbrass.biordf.net/logpdc/logpc
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:j.0="http://www.mygrid.org.uk/mygrid-moby-service#">
  <rdf:Description rdf:about="">
    <j.0:hasServiceDescriptionText>no description</j.0:hasServiceDescriptionText>
    <j.0:hasServiceNameText rdf:datatype="http://www.w3.org/2001/XMLSchema#string">logpc</j.0:hasServiceNameText>
    <j.0:hasOperation rdf:resource="#operation"/>
    <rdf:type rdf:resource="http://www.mygrid.org.uk/mygrid-moby-service#serviceDescription"/>
  </rdf:Description>
  <rdf:Description rdf:about="#input">
    <j.0:objectType rdf:resource="http://semanticscience.org/sadi/ontology/lipinskiserviceontology.owl#smilesmolecule"/>
    <rdf:type rdf:resource="http://www.mygrid.org.uk/mygrid-moby-service#parameter"/>
  </rdf:Description>
  <rdf:Description rdf:about="#operation">
    <j.0:outputParameter rdf:resource="#output"/>
    <j.0:inputParameter rdf:resource="#input"/>
    <rdf:type rdf:resource="http://www.mygrid.org.uk/mygrid-moby-service#operation"/>
  </rdf:Description>
  <rdf:Description rdf:about="#output">
    <j.0:objectType rdf:resource="http://semanticscience.org/sadi/ontology/lipinskiserviceontology.owl#alogpsmilesmolecule"/>
    <rdf:type rdf:resource="http://www.mygrid.org.uk/mygrid-moby-service#parameter"/>
  </rdf:Description>
</rdf:RDF>
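A client can read the input and output classes out of a description like this one. The following is a rough sketch using only the Python standard library on a trimmed copy of the document above; a real client would more likely use an RDF library and fetch the description over HTTP. The function name `parameter_types` is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MOBY = "http://www.mygrid.org.uk/mygrid-moby-service#"
SO = "http://semanticscience.org/sadi/ontology/lipinskiserviceontology.owl#"

# Trimmed copy of the description returned by the logpc service.
DESCRIPTION = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:j.0="{MOBY}">
  <rdf:Description rdf:about="#input">
    <j.0:objectType rdf:resource="{SO}smilesmolecule"/>
  </rdf:Description>
  <rdf:Description rdf:about="#output">
    <j.0:objectType rdf:resource="{SO}alogpsmilesmolecule"/>
  </rdf:Description>
</rdf:RDF>"""


def parameter_types(rdf_xml: str) -> dict:
    """Map each described node (e.g. '#input') to its objectType class URI."""
    root = ET.fromstring(rdf_xml)
    types = {}
    for node in root.findall(f"{{{RDF}}}Description"):
        object_type = node.find(f"{{{MOBY}}}objectType")
        if object_type is not None:
            types[node.get(f"{{{RDF}}}about")] = object_type.get(f"{{{RDF}}}resource")
    return types


types = parameter_types(DESCRIPTION)
print(types["#input"])
print(types["#output"])
```

Parsing RDF/XML as plain XML only works when the serialization happens to have this flat shape, so this should be read as an illustration of what the description contains rather than a robust client.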
2011-EBI-Industry-SW::Dumontier
Responds to a POST containing service input with a service output in RDF
20
The query is in RDF:
<rdf:RDF
    xmlns="http://semanticscience.org/sadi/ontology/caffeine.rdf#"
    xmlns:so="http://semanticscience.org/sadi/ontology/lipinskiserviceontology.owl#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:sio="http://semanticscience.org/resource/"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#">
  <so:smilesmolecule rdf:about="http://semanticscience.org/sadi/ontology/caffeine.rdf#m">
    <sio:SIO_000008 rdf:resource="http://semanticscience.org/sadi/ontology/caffeine.rdf#msmiles"/>
  </so:smilesmolecule>
  <sio:CHEMINF_000018 rdf:about="http://semanticscience.org/sadi/ontology/caffeine.rdf#msmiles">
    <sio:SIO_000300 rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Cn1cnc2n(C)c(=O)n(C)c(=O)c12</sio:SIO_000300>
  </sio:CHEMINF_000018>
</rdf:RDF>
The response is in RDF:
<rdf:Description rdf:about="http://semanticscience.org/sadi/ontology/caffeine.rdf#mdalogp">
  <rdf:type rdf:resource="http://semanticscience.org/resource/CHEMINF_000251"/>
  <j.0:SIO_000300 rdf:datatype="http://www.w3.org/2001/XMLSchema#double">-0.4311000000000006</j.0:SIO_000300>
</rdf:Description>
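The round trip can be sketched from the client side as follows: prepare a POST carrying the RDF input, then pull the numeric value out of the RDF output. This is a sketch under assumptions, not the SADI client API. The `application/rdf+xml` content type is assumed, the response fragment is wrapped in an `rdf:RDF` root with the `j.0` prefix assumed to bind to the SIO namespace, and the helper names are hypothetical. The request is built but not sent.

```python
import urllib.request
import xml.etree.ElementTree as ET

SIO = "http://semanticscience.org/resource/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SERVICE_URL = "http://cbrass.biordf.net/logpdc/logpc"  # from the curl example


def build_request(input_rdf: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST carrying the RDF/XML input."""
    return urllib.request.Request(
        SERVICE_URL,
        data=input_rdf.encode("utf-8"),
        headers={"Content-Type": "application/rdf+xml"},  # assumed media type
        method="POST",
    )


def extract_values(output_rdf: str) -> dict:
    """Map each node URI to the float carried by its SIO_000300 ('has value') property."""
    root = ET.fromstring(output_rdf)
    values = {}
    for node in root:
        value = node.find(f"{{{SIO}}}SIO_000300")
        if value is not None and value.text:
            values[node.get(f"{{{RDF}}}about")] = float(value.text)
    return values


# The response fragment from the slide, wrapped in an rdf:RDF root for parsing.
# Binding j.0 to the SIO namespace is an assumption.
RESPONSE = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:j.0="{SIO}">
  <rdf:Description rdf:about="http://semanticscience.org/sadi/ontology/caffeine.rdf#mdalogp">
    <rdf:type rdf:resource="{SIO}CHEMINF_000251"/>
    <j.0:SIO_000300 rdf:datatype="http://www.w3.org/2001/XMLSchema#double">-0.4311000000000006</j.0:SIO_000300>
  </rdf:Description>
</rdf:RDF>"""

values = extract_values(RESPONSE)
print(values)
```

Sending the request would be a call to `urllib.request.urlopen(build_request(...))`; it is left out here because the service's availability cannot be assumed.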
21 2011-EBI-Industry-SW::Dumontier
61 Chemical Semantic Web Services
• these and an increasing number of semantic web services are registered at http://sadiframework.org/registry/services/
2011-EBI-Industry-SW::Dumontier
Now what?
22
23 2011-EBI-Industry-SW::Dumontier
Semantic Health and Research Environment
SHARE is an application that executes SPARQL queries as workflows over SADI services
2011-EBI-Industry-SW::Dumontier
“Reckoning”
dynamic discovery of instances of OWL classes through synthesis and invocation of a Web Service workflow capable of generating data described by the OWL class restrictions, followed by reasoning to classify the data
into that ontology
24
2011-EBI-Industry-SW::Dumontier
ChEBI publishes (non-SW) data!
25
2011-EBI-Industry-SW::Dumontier
Bio2RDF provides ChEBI in RDF
26
27 2011-EBI-Industry-SW::Dumontier
Bio2RDF covers the major biological databases
28
Bio2RDF’s RDFized data fits together
29
Resource Description Framework (RDF)
Uniform Resource Identifier (URI) can be used as entity names
Bio2RDF specifies the naming convention
http://bio2rdf.org/uniprot:P05067
is a name for Amyloid precursor protein
http://bio2rdf.org/omim:104300
is a name for Alzheimer disease
uniprot:P05067
omim:104300
Allows one to talk about anything
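The naming convention shown here is simple enough to express as a one-line expansion from a `namespace:identifier` name to a URI. A minimal sketch, with `bio2rdf_uri` as a hypothetical helper name and the pattern taken only from the two examples on this slide:

```python
BIO2RDF_BASE = "http://bio2rdf.org/"


def bio2rdf_uri(name: str) -> str:
    """Expand a 'namespace:identifier' name into a Bio2RDF-style URI."""
    namespace, sep, identifier = name.partition(":")
    if not (namespace and sep and identifier):
        raise ValueError(f"expected 'namespace:identifier', got {name!r}")
    return f"{BIO2RDF_BASE}{namespace}:{identifier}"


print(bio2rdf_uri("uniprot:P05067"))  # http://bio2rdf.org/uniprot:P05067
print(bio2rdf_uri("omim:104300"))     # http://bio2rdf.org/omim:104300
```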
30
Life Science Dataset Registry Coordinates Naming
• Provides stable URI patterns for records and the entities they describe.
Directory Service
• ~1500 datasets & dozens of resolvers.
Discovery Service
• Registry links entities to records and their representations (RDF/XML, HTML, etc.) and providers (Bio2RDF, UniProt)
Redirection Service
• Automatic redirection to data provider document
Stanford : 22-04-2010
31 2011-EBI-Industry-SW::Dumontier
Bio2RDF is now serving over 40 billion triples of linked biological data
32
Bio2RDF is a framework to create and provision linked data networks
Francois Belleau, Laval University
Marc-Alexandre Nolin, Laval University
Peter Ansell, Queensland University of Technology
Michel Dumontier, Carleton University
33 2011-EBI-Industry-SW::Dumontier
Bio2RDF is part of a growing web of linked data
“Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”
34 2011-EBI-Industry-SW::Dumontier
something you can look up or search for, with rich descriptions
35 2011-EBI-Industry-SW::Dumontier
SPARQL is the new cool kid on the query block
SQL SPARQL
2011-EBI-Industry-SW::Dumontier
Query for log p
36
37 2011-EBI-Industry-SW::Dumontier
2011-EBI-Industry-SW::Dumontier
Query: Is caffeine a drug-like molecule?
38
39 2011-EBI-Industry-SW::Dumontier
2011-EBI-Industry-SW::Dumontier
Benefits
• Data remains distributed – as the internet was meant to be!
• Data is not “exposed” as a SPARQL endpoint
– greater provider control over computational resources
• Service invocation is straightforward, and matchmaking is done by reasoning about ontology-based input/output descriptions
40
41 2011-EBI-Industry-SW::Dumontier
Is acetaminophen toxic?
• Classical approaches involve decision trees or machine learning over validated data.
• Algorithms are often proprietary, even those used by the regulatory agencies.
• Open issues: which data was used, what the informative parameters are, and how easily new information can affect the outcomes.
42 2011-EBI-Industry-SW::Dumontier
OWLED2011 : Large-Scale Boolean Feature Based Trees as OWL ontologies
43 2011-EBI-Industry-SW::Dumontier
DL Reasoners give Explanations
44 2011-EBI-Industry-SW::Dumontier
Summary
• Semantic Web technologies offer a tantalizing ability to create and share data and services for drug discovery
– Bio2RDF provides linked life science data
– SADI provides a framework to provide semantic web services
– SHARE allows us to simultaneously query and reason about data and services represented using RDF/OWL
– Expressive ontologies can be used to make toxicity decisions transparent
45 2011-EBI-Industry-SW::Dumontier
Acknowledgements
Bio2RDF: Peter Ansell, Francois Belleau, Allison Callahan, Jacques Corbeil, Jose Cruz-Toledo, Alex De Leon, Steve Etlinger, James Hogan, Nichealla Keath, Jean Morissette, Marc-Alexandre Nolin, Nicole Tourigny, Philippe Rigault, and Paul Roe
SADI: Christopher Baker, Melanie Courtot, Jose Cruz-Toledo, Steve Etlinger, Nichealla Keath, Artjom Klein, Luke McCarthy, Silvane Paixao, Ben Vandervalk, Natalia Villanueva-Rosales, Mark Wilkinson
CHEMINF Group: Leo Chepelev, Janna Hastings, Egon Willighagen, Nico Adams
Toxicity Group: Leo Chepelev, Dana Klassen
46 2011-EBI-Industry-SW::Dumontier
Website: http://dumontierlab.com
Presentations: http://slideshare.com/micheldumontier