Knowledge Discovery using an Integrated Semantic Web
Michel Dumontier
Department of Biology, School of Computer Science, Institute of Biochemistry, Ottawa Institute for Systems Biology
Ottawa-Carleton Institute for Biomedical Engineering, Carleton University
Ottawa, Canada
Chair, W3C Semantic Web for Health Care and Life Sciences Interest Group
BH2012 (BioHackathon 2012)
uncovering a sufficient amount of evidence to support/refute a hypothesis is becoming increasingly difficult
it requires a lot of digging around
continuous growth in research literature
Source: http://www.nlm.nih.gov/bsd/stats/cit_added.html
growing amount of biomedical data
increasingly complex software & interfaces to predict, compare and evaluate
ultimately, we answer questions by building sophisticated workflows
What if we could just pose a hypothesis and have a system automatically use available data, ontologies and services?
HyQue
HyQue is the Hypothesis query and evaluation system:
• A platform for knowledge discovery
• Facilitates hypothesis formulation and evaluation
• Leverages Semantic Web technologies to provide access to facts, expert knowledge and web services
• Conforms to a simplified event-based model
• Supports evaluation against positive and negative findings
• Transparent and reproducible evidence prioritization
• Provenance across all elements of hypothesis testing
  – trace a hypothesis to its evaluation, including the data and rules used
HyQue: evaluating hypotheses using Semantic Web technologies. J Biomed Semantics. 2011 May 17;2 Suppl 2:S3.
Evaluating scientific hypotheses using the SPARQL Inferencing Notation. Extended Semantic Web Conference (ESWC 2012). Heraklion, Crete. May 27-31, 2012.
HyQue Architecture
[Figure: HyQue architecture diagram; labelled components include Services and Ontologies]
Event-based data model
HyQue events denote a phenomenon involving two objects: ‘agent’ and ‘target’. In addition, we can specify the location of this event (e.g. located in the nucleus, or under some genetic background).
An Event has the following properties:
• ‘has agent’ agent
• ‘has target’ target
• ‘is located in’ location
• ‘is negated’ boolean
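A minimal sketch of this event model as a Python data class, assuming events are flattened into simple (event, property, value) statements. The class name `Event`, the `to_triples` helper, the extra "type" statement and the plain-string type labels are illustrative assumptions, not HyQue's actual schema; the seven type names are taken from HyQue's list of supported events.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union

# Event types HyQue lists as supported (plain strings, for illustration only).
SUPPORTED_EVENT_TYPES = {
    "protein-protein binding",
    "protein-nucleic acid binding",
    "molecular activation",
    "molecular inhibition",
    "gene induction",
    "gene repression",
    "transport",
}

Triple = Tuple[str, str, Union[str, bool]]


@dataclass(frozen=True)
class Event:
    """Hypothetical encoding of a HyQue-style event (not HyQue's own schema)."""

    event_type: str                 # one of SUPPORTED_EVENT_TYPES
    agent: str                      # value of 'has agent'
    target: str                     # value of 'has target'
    location: Optional[str] = None  # value of 'is located in' (compartment or genetic background)
    negated: bool = False           # value of 'is negated'

    def __post_init__(self) -> None:
        # Reject event types outside the supported list.
        if self.event_type not in SUPPORTED_EVENT_TYPES:
            raise ValueError(f"unsupported event type: {self.event_type}")

    def to_triples(self, event_id: str) -> List[Triple]:
        """Flatten the event into (event, property, value) statements."""
        triples: List[Triple] = [
            (event_id, "type", self.event_type),  # assumed typing statement, not shown on the slide
            (event_id, "has agent", self.agent),
            (event_id, "has target", self.target),
            (event_id, "is negated", self.negated),
        ]
        if self.location is not None:
            triples.append((event_id, "is located in", self.location))
        return triples


# Illustrative event: Gal4p inducing GAL1 in the nucleus.
e1 = Event("gene induction", agent="Gal4p", target="GAL1", location="nucleus")
for triple in e1.to_triples("event:1"):
    print(triple)
```

Keeping `negated` as an explicit boolean, rather than dropping negative events, is what lets the same structure carry both positive and negative findings into evaluation.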
supported events
1. protein-protein binding
2. protein-nucleic acid binding
3. molecular activation
4. molecular inhibition
5. gene induction
6. gene repression
7. transport
HyQue domain rules CALCULATE a quantitative measure of evidence for an event
‘induce’ rule (maximum score: 5):
– Is the event negated?
  • If yes, subtract 2
– Is the event of type ‘induce’?
  • If yes, add 1; if no, subtract 1
– Is the agent of type ‘protein’ or ‘RNA’?
  • If yes, add 1; if of type ‘gene’, subtract 1
– Is the target of type ‘gene’?
  • If yes, add 1; if no, subtract 1
– Does the agent have known ‘transcription factor activity’?
  • If yes, add 1
– Is the event located in the ‘nucleus’?
  • If yes, add 1; if no, subtract 1
Ontology terms shown with the rule: GO:0010628, CHEBI:36080, SO:0000236, GO:0003700, GO:0005634
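The ‘induce’ rule reads as a plain additive score. The sketch below assumes each check has already been answered (for example by looking up the ontology terms listed with the rule) and simply adds and subtracts as the rule describes. The function name and arguments are hypothetical, and an agent type other than protein, RNA or gene is left unscored because the rule does not say what to do with it.

```python
def induce_rule_score(
    negated: bool,
    event_type: str,
    agent_type: str,
    target_type: str,
    agent_has_tf_activity: bool,
    location: str,
) -> int:
    """Score one event with the 'induce' rule; the best possible score is 5."""
    score = 0
    if negated:                          # negated event: subtract 2
        score -= 2
    score += 1 if event_type == "induce" else -1
    if agent_type in ("protein", "RNA"):
        score += 1
    elif agent_type == "gene":           # a gene as agent counts against the event
        score -= 1                       # any other agent type is left unscored
    score += 1 if target_type == "gene" else -1
    if agent_has_tf_activity:            # known 'transcription factor activity'
        score += 1
    score += 1 if location == "nucleus" else -1
    return score


print(induce_rule_score(False, "induce", "protein", "gene", True, "nucleus"))  # 5
print(induce_rule_score(True, "induce", "protein", "gene", True, "nucleus"))   # 3
```

The best case adds 1 on each of the five non-negation checks, which is where the maximum of 5 comes from; swapping or reweighting checks is exactly the kind of rule customization that yields different evidence-based evaluations.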
Customization of rules/data sources will generate different evidence-based evaluations
The Semantic Web is the new global web of knowledge
It involves standards for publishing, sharing and querying facts, expert knowledge and services
It is a scalable approach to the discovery of independently formulated and distributed knowledge
something you can search, look up, link to, check the consistency of, and query
An ever-expanding web of linked data
Source: “Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”
Bio2RDF provides a simple convention and infrastructure for publishing linked data for the life sciences
Bio2RDF: linked data for the life sciences
An Open Source Project for the Provision of Scalable, Decentralized Data with Global Mirroring and Customizable Query Resolution
Laval University, Carleton University, Queensland University of Technology
http://bio2rdf.org/ns:id
provides billions of interconnections
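Because every Bio2RDF resource follows the http://bio2rdf.org/ns:id pattern, building and splitting URIs is simple string work. The sketch below assumes the namespace never contains a colon (the identifier may); the helper names, the predicate label and the namespace names in the example are hypothetical. The last function illustrates what an interconnection means under this convention: a statement whose subject and object come from different namespaces.

```python
from typing import Iterable, Tuple

BIO2RDF_BASE = "http://bio2rdf.org/"


def bio2rdf_uri(namespace: str, identifier: str) -> str:
    """Build a URI following the http://bio2rdf.org/ns:id convention."""
    if not namespace or ":" in namespace:
        raise ValueError(f"invalid namespace: {namespace!r}")
    return f"{BIO2RDF_BASE}{namespace}:{identifier}"


def parse_bio2rdf_uri(uri: str) -> Tuple[str, str]:
    """Split a Bio2RDF-style URI back into (namespace, identifier)."""
    if not uri.startswith(BIO2RDF_BASE):
        raise ValueError(f"not a Bio2RDF URI: {uri}")
    # Split on the first colon only, so identifiers may themselves contain colons.
    namespace, sep, identifier = uri[len(BIO2RDF_BASE):].partition(":")
    if not sep or not namespace or not identifier:
        raise ValueError(f"expected ns:id after the base, got: {uri}")
    return namespace, identifier


def cross_namespace_links(triples: Iterable[Tuple[str, str, str]]) -> int:
    """Count statements linking resources from two different Bio2RDF namespaces."""
    count = 0
    for subject, _predicate, obj in triples:
        if subject.startswith(BIO2RDF_BASE) and obj.startswith(BIO2RDF_BASE):
            if parse_bio2rdf_uri(subject)[0] != parse_bio2rdf_uri(obj)[0]:
                count += 1
    return count


print(bio2rdf_uri("taxonomy", "9606"))  # http://bio2rdf.org/taxonomy:9606
```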
Towards universally accepted identifiers
http://identifiers.org/taxonomy/$id
(coming soon)
engaging the BioPAX community to adopt identifiers.org
Pathway Commons (level 2; download):
<bp:unificationXref rdf:ID="CPATH-LOCAL-653">
  <bp:ID rdf:datatype="xsd:string">9606</bp:ID>
  <bp:DB rdf:datatype="xsd:string">NCBI_TAXONOMY</bp:DB>
</bp:unificationXref>
Pathway Commons (level 3; web service):
<bp:UnificationXref rdf:about="urn:biopax:UnificationXref:REACTOME+DATABASE+ID_109276">
  <bp:id rdf:datatype="http://www.w3.org/2001/XMLSchema#string">109276</bp:id>
  <bp:db rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Reactome Database ID</bp:db>
</bp:UnificationXref>
BioModels (level 3):
<bp:UnificationXref rdf:about="http://identifiers.org/obo.go/GO:0004889">
  <bp:id rdf:datatype="http://www.w3.org/2001/XMLSchema#string">GO:0004889</bp:id>
  <bp:db rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Gene Ontology</bp:db>
</bp:UnificationXref>
Andrea Splendiani
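The three BioPAX snippets identify resources in three different ways. As a sketch of what adopting identifiers.org buys, the code below parses the two Level 3 cross-references with Python's standard XML parser and rewrites them with `string.Template`, whose `$id` placeholder syntax happens to match the identifiers.org pattern. Assumptions: the snippets are wrapped in an `rdf:RDF` element with the BioPAX Level 3 namespace, the `rdf:datatype` attributes are trimmed, the `$collection` placeholder generalizes the taxonomy-only pattern, and the `DB_TO_COLLECTION` table is hypothetical (its `obo.go` and `taxonomy` entries come from the URIs shown here). Reactome is deliberately left out of the table to show that free-text `db` values need a curated mapping before they can be normalized.

```python
import xml.etree.ElementTree as ET
from string import Template
from typing import List, Optional, Tuple

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BP_NS = "http://www.biopax.org/release/biopax-level3.owl#"

# The two Level 3 cross-references, wrapped in rdf:RDF (datatypes trimmed).
SNIPPET = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:bp="{BP_NS}">
  <bp:UnificationXref rdf:about="urn:biopax:UnificationXref:REACTOME+DATABASE+ID_109276">
    <bp:id>109276</bp:id>
    <bp:db>Reactome Database ID</bp:db>
  </bp:UnificationXref>
  <bp:UnificationXref rdf:about="http://identifiers.org/obo.go/GO:0004889">
    <bp:id>GO:0004889</bp:id>
    <bp:db>Gene Ontology</bp:db>
  </bp:UnificationXref>
</rdf:RDF>"""

# Assumed generalization of the http://identifiers.org/taxonomy/$id pattern.
IDENTIFIERS_URI = Template("http://identifiers.org/$collection/$id")

# Hypothetical curated mapping from free-text BioPAX db names to collections.
DB_TO_COLLECTION = {"Gene Ontology": "obo.go", "NCBI_TAXONOMY": "taxonomy"}

Row = Tuple[Optional[str], Optional[str], Optional[str], Optional[str]]


def normalize_xrefs(xml_text: str) -> List[Row]:
    """Return (rdf:about, db, id, identifiers.org URI or None) per UnificationXref."""
    root = ET.fromstring(xml_text)
    rows: List[Row] = []
    for xref in root.iter(f"{{{BP_NS}}}UnificationXref"):
        db = xref.findtext(f"{{{BP_NS}}}db")
        acc = xref.findtext(f"{{{BP_NS}}}id")
        collection = DB_TO_COLLECTION.get(db) if db else None
        uri = None
        if collection and acc:
            uri = IDENTIFIERS_URI.substitute(collection=collection, id=acc)
        rows.append((xref.get(f"{{{RDF_NS}}}about"), db, acc, uri))
    return rows


for row in normalize_xrefs(SNIPPET):
    print(row)
```

For the Gene Ontology cross-reference the rebuilt URI is identical to the `rdf:about` that BioModels already uses, which is the point of a shared identifier scheme.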
More sophisticated OWL-based Data Integration, Consistency Checking and Discovery
• Checking the consistency of semantic annotations [1]
  – Formalized semantic annotations in SBML models as OWL axioms. Automated reasoning uncovered inconsistencies in 16 models.
    • e.g. alpha-D-glucose phosphate is not the required ATP in an ATP-dependent reaction (GO + ChEBI + disjoint + closure axioms)
• Finding significant biomedical associations [2] (initiated at BH11)
  – Found significant associations between genes, drugs, diseases and pathways using DrugBank, PharmGKB, CTD and PID, across categories of drugs (ChEBI, ATC, MeSH) and diseases (DO, MeSH)
  – 22,653 pathway-disease type associations (6,304 over; 16,349 under)
    • carcinosarcoma (DOID:4236) and (HIV RT) Zidovudine Pathway (PharmGKB:PA165859361)
  – 13,826 pathway-chemical type associations (12,564 over; 1,262 under)
    • drug clopidogrel (CHEBI:37941) with Endothelin signaling pathway (PharmGKB:PA164728163) -> (smooth muscle mitogenesis)
1. Integrating systems biology models and biomedical ontologies. BMC Systems Biology. 2011. 5:124
2. Identifying aberrant pathways through integrated analysis of knowledge in pharmacogenomics. Bioinformatics. 2012. In press.
http://pharmgkb-owl.googlecode.com
Robert Hoehndorf
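The consistency check in the first bullet relies on an OWL reasoner combining subclass, disjointness and closure axioms. The toy sketch below collapses that into a direct test: a participant annotated with one class cannot fill a slot that requires a class disjoint with it. The hierarchy and the disjointness pair are hand-written stand-ins, not taken from ChEBI or GO, the function names are hypothetical, and the closure axiom is approximated by assuming the annotated entity is the only candidate for the slot.

```python
from typing import List

# Toy is-a hierarchy (child -> parent); hand-written for illustration, NOT taken from ChEBI.
SUBCLASS_OF = {
    "alpha-D-glucose phosphate": "glucose phosphate",
    "glucose phosphate": "hexose phosphate",
    "ATP": "ribonucleoside triphosphate",
}

# Toy disjointness axiom: nothing is both a hexose phosphate and a ribonucleoside triphosphate.
DISJOINT = {frozenset({"hexose phosphate", "ribonucleoside triphosphate"})}


def ancestors(cls: str) -> List[str]:
    """Return the class itself followed by all of its superclasses."""
    chain = [cls]
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain


def is_disjoint(a: str, b: str) -> bool:
    """Two classes are disjoint if any pair of their ancestors is declared disjoint."""
    return any(
        frozenset({x, y}) in DISJOINT for x in ancestors(a) for y in ancestors(b)
    )


def check_participant(annotated: str, required: str) -> str:
    """Compare an annotated participant with the class a reaction requires."""
    if required in ancestors(annotated):
        return "consistent"    # the annotation is the required class or a subclass of it
    if is_disjoint(annotated, required):
        return "inconsistent"  # no entity can belong to both classes
    return "unknown"           # open world: nothing entails a conflict


print(check_participant("alpha-D-glucose phosphate", "ATP"))  # inconsistent
```

The "unknown" outcome matters: under OWL's open-world semantics, a missing disjointness axiom means no contradiction can be derived, which is why the slide lists disjoint and closure axioms explicitly.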
Personal Health Lens
Observation: Patients often look up new/alternative drugs to treat their condition or alleviate side effects.
Opportunity: A patient-centric health care application that identifies contraindications for drugs mentioned on web pages, using the patient’s own health data.
Components:
• RDFized patient data
• Bio2RDF semantically annotated data
• SADI semantic web services to process the page and retrieve data
• SHARE automatic workflow composition
Mark Wilkinson, Chris Baker
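The core check of the Personal Health Lens can be pictured as a join between the drugs found on a page and the patient's own conditions. The sketch below assumes a plain in-memory contraindication table with placeholder drug and condition names and a naive word-matching step; in the system described, page processing and data retrieval are done by SADI services over RDFized patient data and Bio2RDF, with SHARE composing the workflow, and none of that is modelled here.

```python
import re
from typing import Dict, Iterable, List, Set

# Hypothetical contraindication table (drug -> conditions); placeholder names only.
CONTRAINDICATIONS: Dict[str, Set[str]] = {
    "drug_a": {"condition_x"},
    "drug_b": {"condition_y", "condition_z"},
}


def drugs_mentioned(page_text: str, known_drugs: Iterable[str]) -> Set[str]:
    """Naive mention detection: case-insensitive whole-word matches on known drug names."""
    words = set(re.findall(r"[a-z0-9_]+", page_text.lower()))
    return {drug for drug in known_drugs if drug in words}


def flag_contraindications(
    page_text: str, patient_conditions: Set[str]
) -> Dict[str, List[str]]:
    """Map each mentioned drug to the patient's conditions it is contraindicated for."""
    flags: Dict[str, List[str]] = {}
    for drug in drugs_mentioned(page_text, CONTRAINDICATIONS):
        hits = CONTRAINDICATIONS[drug] & patient_conditions
        if hits:
            flags[drug] = sorted(hits)
    return flags


print(flag_contraindications("Some readers switched from drug_a to drug_b.", {"condition_y"}))
# {'drug_b': ['condition_y']}
```

Only drugs that are both mentioned and contraindicated for this particular patient are flagged, which is what makes the application patient-centric rather than a generic drug lookup.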
W3C Task Force: Clinical Decision Support for Personalized Medicine
Curated and unified set of 385+ essential markers, 50+ pharmacogenes and a rule system, unified under one standardized model: The Medicine Safety Code
We are developing simple, cheap and ubiquitous solutions for anchoring pharmacogenomics in medical practice
Matthias Samwald
Unified OWL Ontology (inferencing, consistency checking, mapping)
At this Biohackathon
• refine Bio2RDF RDFization Guide
• complete Dataset Description (BH11)
• dataspace statistics & visualization
• SPARQL-based Enrichment Analysis
• ontology-based Similarity Networks
  – see Rob’s email
Website: http://dumontierlab.com
Presentations: http://slideshare.com/micheldumontier