Knowledge Discovery using an Integrated Semantic Web

Knowledge Discovery using an Integrated Semantic Web. Michel Dumontier. Department of Biology, School of Computer Science, Institute of Biochemistry; Ottawa Institute for Systems Biology; Ottawa-Carleton Institute for Biomedical Engineering; Carleton University, Ottawa, Canada. Chair, W3C Semantic Web for Health Care and Life Sciences Interest Group. BH2012

Description: Biohackathon 2012 keynote

Transcript of Knowledge Discovery using an Integrated Semantic Web

Page 1: Knowledge Discovery using an Integrated Semantic Web

Knowledge Discovery using an Integrated Semantic Web

Michel Dumontier

Department of Biology, School of Computer Science, Institute of Biochemistry

Ottawa Institute for Systems Biology

Ottawa-Carleton Institute for Biomedical Engineering

Carleton University

Ottawa, Canada

Chair, W3C Semantic Web for Health Care and Life Sciences Interest Group

BH2012

Page 2: Knowledge Discovery using an Integrated Semantic Web

Page 3: Knowledge Discovery using an Integrated Semantic Web

Page 4: Knowledge Discovery using an Integrated Semantic Web

uncovering a sufficient amount of evidence to support/refute a hypothesis is becoming increasingly difficult

it requires a lot of digging around

Page 5: Knowledge Discovery using an Integrated Semantic Web

continuous growth in research literature

Source: http://www.nlm.nih.gov/bsd/stats/cit_added.html

Page 6: Knowledge Discovery using an Integrated Semantic Web

growing amount of biomedical data

Page 7: Knowledge Discovery using an Integrated Semantic Web

increasingly complex software & interfaces to predict, compare and evaluate

Page 8: Knowledge Discovery using an Integrated Semantic Web

ultimately, we answer questions by building sophisticated workflows

Page 9: Knowledge Discovery using an Integrated Semantic Web

What if we could just pose a hypothesis and have a system automatically use available data, ontologies and services?

Page 10: Knowledge Discovery using an Integrated Semantic Web

HyQue

HyQue is the Hypothesis query and evaluation system
• A platform for knowledge discovery
• Facilitates hypothesis formulation and evaluation
• Leverages Semantic Web technologies to provide access to facts, expert knowledge and web services
• Conforms to a simplified event-based model
• Supports evaluation against positive and negative findings
• Transparent and reproducible evidence prioritization
• Provenance across all elements of hypothesis testing
  – trace a hypothesis to its evaluation, including the data and rules used

References:
HyQue: evaluating hypotheses using Semantic Web technologies. J Biomed Semantics. 2011 May 17;2 Suppl 2:S3.
Evaluating scientific hypotheses using the SPARQL Inferencing Notation. Extended Semantic Web Conference (ESWC 2012). Heraklion, Crete. May 27-31, 2012.

Page 11: Knowledge Discovery using an Integrated Semantic Web

HyQue Architecture

[Figure: HyQue architecture diagram; recoverable labels: Services, Ontologies]

Page 12: Knowledge Discovery using an Integrated Semantic Web

Event-based data model

HyQue events denote a phenomenon involving two objects: ‘agent’ and ‘target’. In addition, we can specify the location of this event (e.g. located in the nucleus, or under some genetic background).

Event
‘has agent’ agent
‘has target’ target
‘is located in’ location
‘is negated’ boolean

supported events
1. protein-protein binding
2. protein-nucleic acid binding
3. molecular activation
4. molecular inhibition
5. gene induction
6. gene repression
7. transport
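The event model above can be sketched as a small data structure. This is only an illustration, not HyQue's actual implementation: the class name `Event`, the helper `to_statements`, the `"type"` predicate, and the example entity names (`Gal4p`, `GAL1`) are assumptions made for the example; the property labels and the list of supported event types are taken from the slide.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Event types listed on the slide.
SUPPORTED_EVENT_TYPES = (
    "protein-protein binding",
    "protein-nucleic acid binding",
    "molecular activation",
    "molecular inhibition",
    "gene induction",
    "gene repression",
    "transport",
)


@dataclass(frozen=True)
class Event:
    """Hypothetical in-memory form of a HyQue-style event."""
    event_type: str
    agent: str
    target: str
    location: Optional[str] = None  # 'is located in' is optional
    negated: bool = False           # 'is negated'

    def __post_init__(self) -> None:
        if self.event_type not in SUPPORTED_EVENT_TYPES:
            raise ValueError(f"unsupported event type: {self.event_type!r}")


def to_statements(event_id: str, event: Event) -> List[Tuple[str, str, object]]:
    """Flatten an event into (subject, property, value) statements."""
    statements: List[Tuple[str, str, object]] = [
        (event_id, "type", event.event_type),  # "type" is an assumed label
        (event_id, "has agent", event.agent),
        (event_id, "has target", event.target),
        (event_id, "is negated", event.negated),
    ]
    if event.location is not None:
        statements.append((event_id, "is located in", event.location))
    return statements


# Illustrative entity names, not taken from the slides.
example = Event("gene induction", agent="Gal4p", target="GAL1", location="nucleus")
for statement in to_statements("event:1", example):
    print(statement)
```

Keeping the location optional and the negation flag explicit mirrors the slide: every event has an agent and a target, while location and negation qualify it.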

Page 13: Knowledge Discovery using an Integrated Semantic Web

HyQue domain rules CALCULATE a quantitative measure of evidence for an event

‘induce’ rule (maximum score: 5):
– Is event negated?
  • If yes, subtract 2
– Is event of type ‘induce’?
  • If yes, add 1; if no, subtract 1
– Is agent of type ‘protein’ or ‘RNA’?
  • If yes, add 1; if type ‘gene’, subtract 1
– Is target of type ‘gene’?
  • If yes, add 1; if no, subtract 1
– Does agent have known ‘transcription factor activity’?
  • If yes, add 1
– Is event located in the ‘nucleus’?
  • If yes, add 1; if no, subtract 1

Ontology terms shown on the slide: GO:0010628, CHEBI:36080, SO:0000236, GO:0003700, GO:0005634
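The ‘induce’ rule can be written out as a short scoring function. This is a minimal sketch, not the rule as implemented in HyQue (which the cited ESWC 2012 paper expresses with the SPARQL Inferencing Notation). The names `InduceEvent` and `score_induce` are hypothetical, types are compared as plain strings rather than ontology terms, and an unspecified location is assumed to count as "not in the nucleus".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InduceEvent:
    """Hypothetical, simplified event record used only for scoring."""
    event_type: str
    agent_type: str
    target_type: str
    agent_has_tf_activity: bool = False
    location: Optional[str] = None
    negated: bool = False


def score_induce(event: InduceEvent) -> int:
    """Apply the 'induce' rule from the slide; the best possible score is 5."""
    score = 0
    # Negated events are penalised.
    if event.negated:
        score -= 2
    # Event type must be 'induce'.
    score += 1 if event.event_type == "induce" else -1
    # Agent: protein or RNA scores, gene is penalised, anything else is neutral.
    if event.agent_type in ("protein", "RNA"):
        score += 1
    elif event.agent_type == "gene":
        score -= 1
    # Target must be a gene.
    score += 1 if event.target_type == "gene" else -1
    # Known transcription factor activity only ever adds to the score.
    if event.agent_has_tf_activity:
        score += 1
    # Assumption: a missing location is treated like a non-nuclear one.
    score += 1 if event.location == "nucleus" else -1
    return score


best = InduceEvent("induce", "protein", "gene", True, "nucleus")
print(score_induce(best))  # 5, the maximum stated on the slide
```

Because each check contributes independently, the score can be traced back to the individual checks that produced it, which is the kind of transparency the HyQue slides emphasise.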

Page 14: Knowledge Discovery using an Integrated Semantic Web

Customization of rules/data sources will generate different evidence-based evaluations

Page 15: Knowledge Discovery using an Integrated Semantic Web

The Semantic Web is the new global web of knowledge

It involves standards for publishing, sharing and querying facts, expert knowledge and services

It is a scalable approach to the discovery of independently formulated and distributed knowledge

Page 16: Knowledge Discovery using an Integrated Semantic Web

something you can search, look up, link to, check the consistency of, and query

Page 17: Knowledge Discovery using an Integrated Semantic Web

An ever expanding web of linked data

“Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”

Page 18: Knowledge Discovery using an Integrated Semantic Web

Bio2RDF provides a simple convention and infrastructure to provide linked data for the life sciences

Page 19: Knowledge Discovery using an Integrated Semantic Web

linked data for the life sciences

An Open Source Project for the Provision of Scalable, Decentralized Data with Global Mirroring and Customizable Query Resolution

Laval University, Carleton University, Queensland University of Technology

http://bio2rdf.org/ns:id
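The `ns:id` pattern above reads as "namespace, colon, identifier" appended to the Bio2RDF base URL. A minimal sketch of building and splitting such URIs follows; the function names are hypothetical, and the example namespace and identifier (`taxonomy`, `9606`) are borrowed from other slides in this deck purely for illustration.

```python
from typing import Tuple

BIO2RDF_BASE = "http://bio2rdf.org/"


def bio2rdf_uri(namespace: str, identifier: str) -> str:
    """Build a URI following the http://bio2rdf.org/ns:id convention."""
    if not namespace or ":" in namespace:
        raise ValueError("namespace must be non-empty and contain no ':'")
    if not identifier:
        raise ValueError("identifier must be non-empty")
    return f"{BIO2RDF_BASE}{namespace}:{identifier}"


def split_bio2rdf_uri(uri: str) -> Tuple[str, str]:
    """Recover (namespace, identifier) from a URI built as above."""
    if not uri.startswith(BIO2RDF_BASE):
        raise ValueError(f"not a Bio2RDF URI: {uri!r}")
    # Split at the first ':' after the base, so identifiers may contain ':'.
    namespace, sep, identifier = uri[len(BIO2RDF_BASE):].partition(":")
    if not sep or not namespace or not identifier:
        raise ValueError(f"URI does not follow ns:id: {uri!r}")
    return namespace, identifier


uri = bio2rdf_uri("taxonomy", "9606")
print(uri)
print(split_bio2rdf_uri(uri))
```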

Page 20: Knowledge Discovery using an Integrated Semantic Web

provides billions of interconnections

Page 21: Knowledge Discovery using an Integrated Semantic Web

Towards universally-accepted identifiers

Page 22: Knowledge Discovery using an Integrated Semantic Web

http://identifiers.org/taxonomy/$id
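The `$id` in the URL above is a placeholder for a taxonomy identifier. As a small sketch, Python's `string.Template` happens to use the same `$name` syntax, so the pattern can be filled in directly. The function name `taxonomy_uri` is hypothetical, and the digits-only check reflects the fact that NCBI Taxonomy identifiers are numeric.

```python
from string import Template

# The URL pattern exactly as shown on the slide.
TAXONOMY_URI = Template("http://identifiers.org/taxonomy/$id")


def taxonomy_uri(taxon_id: str) -> str:
    """Fill the $id placeholder with an NCBI Taxonomy identifier."""
    taxon_id = str(taxon_id).strip()
    if not taxon_id.isdigit():
        raise ValueError(f"expected a numeric taxonomy id, got {taxon_id!r}")
    return TAXONOMY_URI.substitute(id=taxon_id)


# 9606 is the taxonomy id that appears in the BioPAX example in this deck.
print(taxonomy_uri("9606"))
```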

Page 23: Knowledge Discovery using an Integrated Semantic Web

(coming soon)

Page 24: Knowledge Discovery using an Integrated Semantic Web

engaging the BioPAX community to adopt identifiers.org

Pathway Commons (level 2; download)
<bp:unificationXref rdf:ID="CPATH-LOCAL-653">
  <bp:ID rdf:datatype="xsd:string">9606</bp:ID>
  <bp:DB rdf:datatype="xsd:string">NCBI_TAXONOMY</bp:DB>
</bp:unificationXref>

Pathway Commons (level 3; web service)
<bp:UnificationXref rdf:about="urn:biopax:UnificationXref:REACTOME+DATABASE+ID_109276">
  <bp:id rdf:datatype="http://www.w3.org/2001/XMLSchema#string">109276</bp:id>
  <bp:db rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Reactome Database ID</bp:db>
</bp:UnificationXref>

Biomodels (level 3)
<bp:UnificationXref rdf:about="http://identifiers.org/obo.go/GO:0004889">
  <bp:id rdf:datatype="http://www.w3.org/2001/XMLSchema#string">GO:0004889</bp:id>
  <bp:db rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Gene Ontology</bp:db>
</bp:UnificationXref>

Andrea Splendiani
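The three snippets show the same kind of cross-reference identified in three different ways. A rough sketch of how level 3 xrefs could be normalised to identifiers.org URIs is given below. It is an illustration under assumptions: the function names and the `DB_TO_COLLECTION` table are hypothetical, the table contains only the two collections that appear in these slides (`taxonomy` and `obo.go`), and applying the `NCBI_TAXONOMY` label (seen here only in level 2 data) to level 3 is a guess. Databases without a known mapping are left unresolved rather than guessed.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# BioPAX level 3 namespace.
BP = "http://www.biopax.org/release/biopax-level3.owl#"

# Hypothetical lookup table: db label -> identifiers.org collection.
# Only collections that appear in the slides are listed.
DB_TO_COLLECTION = {
    "NCBI_TAXONOMY": "taxonomy",
    "Gene Ontology": "obo.go",
}


def identifiers_org_uri(db: Optional[str], identifier: Optional[str]) -> Optional[str]:
    """Return an identifiers.org URI, or None when the db is not mapped."""
    collection = DB_TO_COLLECTION.get(db)
    if collection is None or not identifier:
        return None
    return f"http://identifiers.org/{collection}/{identifier}"


def xref_uris(rdf_xml: str) -> List[Tuple[Optional[str], Optional[str], Optional[str]]]:
    """List (db, id, identifiers.org URI) for each level 3 UnificationXref."""
    root = ET.fromstring(rdf_xml)
    results = []
    for xref in root.iter(f"{{{BP}}}UnificationXref"):
        db = xref.findtext(f"{{{BP}}}db")
        identifier = xref.findtext(f"{{{BP}}}id")
        results.append((db, identifier, identifiers_org_uri(db, identifier)))
    return results


# The two level 3 snippets from the slide, wrapped in an rdf:RDF root
# so that the namespace prefixes are declared.
EXAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:bp="http://www.biopax.org/release/biopax-level3.owl#">
  <bp:UnificationXref rdf:about="urn:biopax:UnificationXref:REACTOME+DATABASE+ID_109276">
    <bp:id rdf:datatype="http://www.w3.org/2001/XMLSchema#string">109276</bp:id>
    <bp:db rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Reactome Database ID</bp:db>
  </bp:UnificationXref>
  <bp:UnificationXref rdf:about="http://identifiers.org/obo.go/GO:0004889">
    <bp:id rdf:datatype="http://www.w3.org/2001/XMLSchema#string">GO:0004889</bp:id>
    <bp:db rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Gene Ontology</bp:db>
  </bp:UnificationXref>
</rdf:RDF>"""

for row in xref_uris(EXAMPLE):
    print(row)
```

For the Biomodels entry the derived URI matches the `rdf:about` value already used on the slide, which is the outcome the adoption effort is aiming for.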

Page 25: Knowledge Discovery using an Integrated Semantic Web

More sophisticated OWL-based Data Integration, Consistency Checking and Discovery

• Checking the consistency of semantic annotations [1]
  – Formalized semantic annotations in SBML models as OWL axioms. Automated reasoning uncovered inconsistencies in 16 models.
    • e.g. alpha-D-glucose phosphate is not the required ATP in an ATP-dependent reaction (GO + ChEBI + disjoint + closure axioms)
• Finding significant biomedical associations [2] (initiated at BH11)
  – found significant associations between genes, drugs, diseases and pathways using Drugbank, PharmGKB, CTD, PID across categories of drugs (ChEBI, ATC, MeSH) and diseases (DO, MeSH)
  – 22,653 pathway-disease type associations (6,304 over; 16,349 under)
    • carcinosarcoma (DOID:4236) and (HIV RT) Zidovudine Pathway (PharmGKB:PA165859361)
  – 13,826 pathway-chemical type associations (12,564 over; 1,262 under)
    • drug clopidogrel (CHEBI:37941) with Endothelin signaling pathway (PharmGKB:PA164728163) -> (smooth muscle mitogenesis)

1. Integrating systems biology models and biomedical ontologies. BMC Systems Biology. 2011. 5:124
2. Identifying aberrant pathways through integrated analysis of knowledge in pharmacogenomics. Bioinformatics. 2012. in press

http://pharmgkb-owl.googlecode.com

Robert Hoehndorf
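The "over" and "under" counts above refer to associations that occur more or less often than expected. The slides do not say which statistical test was used; the sketch below uses a hypergeometric test, a common choice for this kind of enrichment question, purely as an illustration. All function and parameter names are hypothetical, and the numbers in the usage line are made up.

```python
from math import comb
from typing import Tuple


def hypergeom_pmf(k: int, population: int, annotated: int, sample: int) -> float:
    """P(X = k) when drawing `sample` items without replacement from a
    population in which `annotated` items carry the property of interest."""
    return (comb(annotated, k) * comb(population - annotated, sample - k)
            / comb(population, sample))


def over_under_pvalues(k: int, population: int, annotated: int,
                       sample: int) -> Tuple[float, float]:
    """Return (p_over, p_under): P(X >= k) and P(X <= k)."""
    lowest = max(0, sample - (population - annotated))
    highest = min(annotated, sample)
    if not lowest <= k <= highest:
        raise ValueError("k is outside the possible range for these counts")
    p_over = sum(hypergeom_pmf(i, population, annotated, sample)
                 for i in range(k, highest + 1))
    p_under = sum(hypergeom_pmf(i, population, annotated, sample)
                  for i in range(lowest, k + 1))
    return p_over, p_under


# Made-up counts: 200 genes in total, 20 linked to a disease category,
# a pathway of 15 genes, 6 of which are linked to that category.
p_over, p_under = over_under_pvalues(6, population=200, annotated=20, sample=15)
print(f"over-representation p = {p_over:.4g}, under-representation p = {p_under:.4g}")
```

A small `p_over` would flag a pathway-category pair as over-represented and a small `p_under` as under-represented; in practice the p-values would also need correction for multiple testing.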

Page 26: Knowledge Discovery using an Integrated Semantic Web

Personal Health Lens

Observation: Patients often look up new/alternative drugs to treat their condition or alleviate side effects.

Opportunity: A patient-centric health care application that identifies contraindications for drugs mentioned on web pages using the patient’s own health data

Components:
• RDFized patient data
• Bio2RDF semantically annotated data
• SADI semantic web services to process the page and retrieve data
• SHARE automatic workflow composition

Mark Wilkinson, Chris Baker

Page 27: Knowledge Discovery using an Integrated Semantic Web

Page 28: Knowledge Discovery using an Integrated Semantic Web

W3C Task Force: Clinical Decision Support for Personalized Medicine

Curated set of 385+ essential markers, 50+ pharmacogenes and a rule system, unified under one standardized model: the Medicine Safety Code

We are developing a simple, cheap and ubiquitous solution for anchoring pharmacogenomics in medical practice

Matthias Samwald

Page 29: Knowledge Discovery using an Integrated Semantic Web

Unified OWL Ontology (inferencing, consistency checking, mapping)

Page 30: Knowledge Discovery using an Integrated Semantic Web

At this Biohackathon

• refine Bio2RDF RDFization Guide
• complete Dataset Description (BH11)
• dataspace statistics & visualization
• SPARQL-based Enrichment Analysis
• ontology-based Similarity Networks
  – see Rob’s email

Page 31: Knowledge Discovery using an Integrated Semantic Web

dumontierlab.com [email protected]

Website: http://dumontierlab.com
Presentations: http://slideshare.com/micheldumontier