Ontologies for representing, integrating and analyzing phenotypes

Post on 18-May-2015

The development and application of high-throughput technologies in biology leads to a rapid increase in data and knowledge and opens the possibility of a paradigm shift towards the personalized treatment of disease based on an individual patient's genetic makeup. Major challenges that biology faces today are to integrate data across different databases, domains, levels of granularity and species, and to make the information resulting from high-throughput experiments amenable to scientific analysis and to the discovery of mechanisms underlying disease. In my talk, I will demonstrate how formal ontologies combined with recent progress in automated reasoning can be used to represent, integrate and analyze data resulting from high-throughput phenotyping experiments. I will show how an expressive formal representation of phenotype ontologies can lead to interoperability with biomedical ontologies of other domains, illustrate an ontology modularization approach that enables the use of automated reasoning over these ontologies, and show how to integrate phenotype data across multiple species. Finally, I will demonstrate how measures of semantic similarity can be applied to analyze high-throughput phenotype data and reveal novel gene-disease associations, and discuss how an ontology-based approach to the semantic integration of data in biomedicine can facilitate translational research and personalized medicine.

Transcript of Ontologies for representing, integrating and analyzing phenotypes

Ontologies for representing, integrating and analyzing phenotypes

Robert Hoehndorf

Department of Genetics, University of Cambridge

21 June 2011

Robert Hoehndorf (University of Cambridge) Phenotype ontologies 21 June 2011 1 / 40

Introduction Motivation

Motivation

Introduction Ontology

Open Biomedical Ontologies (OBO)

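[Figure: diagram of OBO ontologies. Entity labels: Population, Individual, Body, Organ, Tissue, Cell, Organelle, Molecule, Gene, Transcript. Category labels: Physical object, Quality, Function, Process. Ontology labels: Gene Ontology, Celltype, Sequence Ontology, GO-CC, ChEBI Ontology, Anatomy Ontology, Phenotype Ontology.]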

Ontology: Phenotype and anatomy ontologies

anatomy ontologies: > 100,000 classes

FMA, MA, WA, ZFA, FA, GO-CC, ...

phenotype ontologies: > 20,000 classes

HPO, MP, WBPhenotype, FBcv, APO, ...

quality ontology: > 2,000 classes

PATO

process and function ontologies: > 25,000 classes

Gene Ontology, ...

alignments between anatomy ontologies

UBERON, various mappings

Ontology: Challenges for interoperability

“merely using ontologies [...] does not reduce heterogeneity: it just raises heterogeneity problems to a higher level” [Euzenat, 2007]

implicit knowledge

implicit semantics

weakly formalized

very large

Ontology: Example query

Find all regions in the human and mouse genome sequences that are associated with Tetralogy of Fallot.

Phenotype ontology Tetralogy of Fallot

Tetralogy of Fallot

Tetralogy of Fallot: Human phenotypes

Overriding aorta (HP:0002623)

Ventricular septal defect (HP:0001629)

Pulmonic stenosis (HP:0001642)

Right ventricular hypertrophy (HP:0001667)

Tetralogy of Fallot: Phenotype description syntax

Overriding aorta (HP:0002623):

Q: overlap with (PATO:0001590)

E1: Aorta (FMA:3734)

E2: Membranous part of interventricular septum (FMA:7135)

HP:0002623 EquivalentTo:

phene-of some (has-part some (FMA:3734 and has-quality some (PATO:0001590 and towards some FMA:7135)))

HP:0002623 EquivalentTo:

phene-of some (has-part some (FMA:3734 and overlaps-with some FMA:7135))
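The step from an entity-quality description (Q, E1, E2) to a class expression can be sketched in a few lines of Python. This is only an illustration of the has-quality pattern shown on the slide; the function name `eq_to_class_expression` is hypothetical, and the pattern used when no second entity is given is an assumption, not something the slides state.

```python
from typing import Optional


def eq_to_class_expression(quality: str, entity: str,
                           related_entity: Optional[str] = None) -> str:
    """Build a Manchester-style class expression from an EQ description.

    quality        -- PATO quality identifier (Q)
    entity         -- the entity bearing the quality (E1)
    related_entity -- optional second entity the quality is directed towards (E2)
    """
    quality_part = quality
    if related_entity is not None:
        # Relational qualities point at a second entity via "towards".
        quality_part = f"({quality} and towards some {related_entity})"
    return (f"phene-of some (has-part some "
            f"({entity} and has-quality some {quality_part}))")


# Overriding aorta (HP:0002623): Q = PATO:0001590, E1 = FMA:3734, E2 = FMA:7135
overriding_aorta = eq_to_class_expression("PATO:0001590", "FMA:3734", "FMA:7135")
print(overriding_aorta)
```

For the overriding aorta example, the generated string matches the has-quality form of the definition given on the slide.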

Tetralogy of Fallot: UBERON human-mouse anatomy equivalences

Overriding aorta (HP:0002623):

Q: overlap with (PATO:0001590)

E1: Aorta (FMA:3734)

FMA:3734 EquivalentTo: MA:0000062

E2: Membranous part of interventricular septum (FMA:7135)

FMA:7135 EquivalentTo: MA:0002939

Tetralogy of Fallot: Phenotype equivalence

Overriding aorta (MP:0000273):

Q: overlap with (PATO:0001590)

E1: Aorta (MA:0000062)

E2: Membranous interventricular septum (MA:0002939)

MP:0000273 EquivalentTo:

phene-of some (has-part some (MA:0000062 and has-quality some (PATO:0001590 and towards some MA:0002939)))

Consequence: MP:0000273 EquivalentTo: HP:0002623
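Why the two phenotype classes end up equivalent can be illustrated with a purely syntactic stand-in for the reasoner: rewrite each definition using the anatomy equivalences, then compare the results. The sketch below is not OWL reasoning; the helper names `canonicalize` and `same_definition` are hypothetical, and the definitions are copied from the slides as plain strings.

```python
import re

# Human-mouse anatomy equivalences as listed on the slides (FMA -> MA).
ANATOMY_EQUIVALENCES = {
    "FMA:3734": "MA:0000062",   # aorta
    "FMA:7135": "MA:0002939",   # membranous (part of) interventricular septum
}

# Class definitions as given on the slides.
DEFINITIONS = {
    "HP:0002623": ("phene-of some (has-part some (FMA:3734 and has-quality some "
                   "(PATO:0001590 and towards some FMA:7135)))"),
    "MP:0000273": ("phene-of some (has-part some (MA:0000062 and has-quality some "
                   "(PATO:0001590 and towards some MA:0002939)))"),
}


def canonicalize(expression: str, equivalences: dict) -> str:
    """Replace every mapped identifier by its equivalent; leave others as-is."""
    return re.sub(r"[A-Za-z]+:\d+",
                  lambda match: equivalences.get(match.group(0), match.group(0)),
                  expression)


def same_definition(class_a: str, class_b: str) -> bool:
    """True if both definitions coincide after applying the equivalences."""
    return (canonicalize(DEFINITIONS[class_a], ANATOMY_EQUIVALENCES)
            == canonicalize(DEFINITIONS[class_b], ANATOMY_EQUIVALENCES))


print(same_definition("HP:0002623", "MP:0000273"))
```

A real reasoner derives the equivalence semantically and would also find it when the two definitions are written differently; string comparison only covers the case shown here.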

Phenotype ontology Absence

Absence: Absent appendix

Absent appendix:

Q: lacks all parts of type (PATO:0002000)

E1: Human body (FMA:20394)

E2: Appendix (FMA:14542)

AbsentAppendix ≡ LacksParts ⊓ ∃towards.Appendix ⊓ ∃inheresIn.HumanBody (Horrocks, 2007)

AbsentAppendix ≡ LacksParts ⊓ ∃towards.{Appendix} ⊓ ∃inheresIn.HumanBody (Mungall, 2007)

AbsentAppendix ⊑ ∃pheneOf.(HumanBody ⊓ ¬∃hasPart.Appendix) (H et al., 2007, 2011)

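AbsentAppendix ⊑ ∃pheneOf.(HumanBody ⊓ ¬∃hasPart.Appendix)

FMA: HumanBody ⊑ ∃hasPart.Appendix

HumanBody(John), AbsentAppendix(x), hasPhene(John, x)

Consequence: together with the FMA axiom, AbsentAppendix is unsatisfiable, so asserting AbsentAppendix(x) makes the knowledge base inconsistent.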

Removal of conflicting axioms (has-part/part-of in anatomy)

Contextualize anatomy:

Normal ⊓ HumanBody ⊑ ∃hasPart.(Normal ⊓ Appendix)

Use of non-monotonic reasoning:

Normally: HumanBody ⊑ ∃hasPart.Appendix

Circumscription of ¬Normal

Implementation in dlvhex:

IC-has-part(X,Y) :- ind(X), class(Y), inst(X,Z), CC-normally-has-part(Z,Y), not IC-lacks-has-part(X,Y), class(Z).
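The effect of the default rule above can be imitated in plain Python: an individual is assumed to have every part its class normally has, unless the part is explicitly recorded as lacking. This is a minimal sketch of the rule's behaviour, not dlvhex itself; the function name `infer_has_part` and the second individual "Mary" are hypothetical additions for illustration.

```python
def infer_has_part(instances, normally_has_part, lacks_part):
    """Apply the default: has-part(X, Y) holds if X is an instance of Z,
    Z normally has part Y, and X is not known to lack Y."""
    inferred = set()
    for individual, cls in instances:
        for part in normally_has_part.get(cls, ()):
            # Negation as failure: only block the default on explicit evidence.
            if (individual, part) not in lacks_part:
                inferred.add((individual, part))
    return inferred


instances = {("John", "HumanBody"), ("Mary", "HumanBody")}
normally_has_part = {"HumanBody": {"Appendix"}}
lacks_part = {("John", "Appendix")}          # John has an absent appendix

result = infer_has_part(instances, normally_has_part, lacks_part)
print(result)
```

The default applies to Mary but is withdrawn for John, so the absent-appendix phenotype no longer contradicts the anatomical knowledge.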

Ontology of phenotypes

Different formal expressions for phenotypes based on

qualities,

anatomical parts,

functions,

processes

enable cross-species integration of phenotypes.

Phenotype ontology Discovering mouse models

Tetralogy of Fallot

Phenotype alignments: Mouse model: Phc1

Phenotype alignments: Tetralogy of Fallot: Phc1

Knowledge representation Modularization

Complexity of automated reasoning

ontologies based on OWL

OWL 2 is based on description logic (SROIQ)

satisfiability in SROIQ is 2NEXPTIME-complete

Modularization

tractable subsets of OWL 2: EL, QL, RL

problem: identify a large (EL, QL, RL)-module of an OWL ontology

AbnormalityOfAppendix ≡ ∃pheneOf.(¬∃hasPart.(Normal ⊓ Appendix)) (not in EL)

AbsentAppendix ≡ ∃pheneOf.(¬∃hasPart.Appendix) (not in EL)

Inference: AbsentAppendix ⊑ AbnormalityOfAppendix (in EL)
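Whether an axiom falls inside EL can be checked syntactically. The sketch below encodes class expressions as nested tuples (a hypothetical encoding chosen for this example) and accepts only named classes, conjunction and existential restriction, which are the core EL constructors; the full OWL 2 EL profile permits more than this.

```python
def is_el(expression) -> bool:
    """True if the expression uses only named classes, 'and' and 'some'."""
    if isinstance(expression, str):          # named class
        return True
    operator = expression[0]
    if operator == "and":                    # ("and", C, D, ...)
        return all(is_el(argument) for argument in expression[1:])
    if operator == "some":                   # ("some", role, C)
        return is_el(expression[2])
    return False                             # "not" and anything else


def axiom_in_el(left, right) -> bool:
    """A subclass or equivalence axiom is in EL if both sides are."""
    return is_el(left) and is_el(right)


absent_appendix_definition = (
    "some", "pheneOf", ("not", ("some", "hasPart", "Appendix")))

# The defining axiom uses negation, so it is outside EL ...
print(axiom_in_el("AbsentAppendix", absent_appendix_definition))
# ... while the inferred subclass axiom between named classes is inside EL.
print(axiom_in_el("AbsentAppendix", "AbnormalityOfAppendix"))
```

A check of this kind only classifies axioms; deriving the inferred axiom in the first place still requires a reasoner for the more expressive language.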

Modularization: EL Vira

http://el-vira.googlecode.com

ontology modularization

retain signature of ontology

identify EL, QL, RL axioms in deductive closure

completeness is an open problem

Modularization: EL Module

AbnormalityOfAppendix ≡ ∃pheneOf.(¬∃hasPart.(Normal ⊓ Appendix))

AbsentAppendix ≡ ∃pheneOf.(¬∃hasPart.Appendix)

AbsentAppendix ⊑ AbnormalityOfAppendix

H et al., 2011. A common layer of interoperability for biomedical ontologies based on OWL EL. Bioinformatics, 27(7), 1001–1008.

Knowledge representation Applications and evaluation

Phenotype alignments: PhenomeBLAST

apply to yeast, fly, worm, fish, mouse and human phenotypes

phenotype alignment through OWL reasoning

more than 300,000 classes and 1,000,000 axioms

combination of the HermiT (for modularization), CB and CEL reasoners

classification time: 7 minutes

http://phenomeblast.googlecode.org

Application: Comparison of phenotypes

direct comparison of phenotypes:

disease phenotypes, e.g., tetralogy of Fallot

phenotypes associated with genetic mutations (genotypes in mouse, fish, etc.)

a mutation phenotype being a subclass of a disease phenotype allows inference of a gene-disease association if

disease phenotypes sufficient for having the disease

mutation phenotypes necessary for having a specific genotype

Application: Similarity-based comparison

pairwise comparison of phenotypes

semantic similarity: weighted Jaccard index

result: similarity matrix between phenotypes

(quantitative) evaluation based on predicting orthology, pathway, disease

identify novel gene-disease associations
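A weighted Jaccard index between two sets of phenotype classes can be sketched as follows. The slides name the measure but not the weighting scheme, so the per-class weights here are an assumption (something like information content would be a typical choice), and the numeric weights are invented for the example.

```python
def weighted_jaccard(classes_a: set, classes_b: set, weight: dict) -> float:
    """Sum of weights of shared classes divided by sum of weights of all classes."""
    union = classes_a | classes_b
    total = sum(weight[c] for c in union)
    if total == 0:
        return 0.0          # no classes, or only zero-weight classes
    shared = classes_a & classes_b
    return sum(weight[c] for c in shared) / total


# Hypothetical weights for the four Tetralogy of Fallot phenotypes.
weight = {
    "HP:0002623": 1.0,
    "HP:0001629": 2.0,
    "HP:0001642": 3.0,
    "HP:0001667": 4.0,
}
profile_a = {"HP:0002623", "HP:0001629", "HP:0001642"}
profile_b = {"HP:0001629", "HP:0001642", "HP:0001667"}

similarity = weighted_jaccard(profile_a, profile_b, weight)
print(similarity)   # shared weight 5.0 over total weight 10.0
```

Computing this value for every pair of phenotype profiles yields the similarity matrix mentioned above. In an ontology-based setting each profile would usually first be extended with all inferred superclasses, which is also an assumption here.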

Application: Similarity-based comparison: ROC

[Figure: ROC curves, true positive rate against false positive rate (both axes from 0 to 1), for three evaluations: Disease, Orthology, Pathway]
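An ROC curve of this kind is obtained by ranking candidate pairs by similarity score and sweeping a threshold down the ranking. The following is a minimal sketch with invented scores and labels; it assumes distinct scores and at least one positive and one negative example, and it is not the evaluation code behind the talk.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    true_positives = false_positives = 0
    for _, label in ranked:
        if label:
            true_positives += 1
        else:
            false_positives += 1
        points.append((false_positives / negatives, true_positives / positives))
    return points


def area_under_curve(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical similarity scores; label 1 marks a known association.
scores = [0.9, 0.8, 0.7, 0.6]
labels = [1, 0, 1, 0]

points = roc_points(scores, labels)
print(points)
print(area_under_curve(points))
```

An area of 0.5 corresponds to random ranking and 1.0 to a perfect one, which gives a single number for comparing the disease, orthology and pathway evaluations.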

Application: Similarity-based comparison: gene-disease associations

Adam19 and Fgf15 genes in mice may be involved in Tetralogy of Fallot

Aberrant pathways

Cytokine-cytokine receptor interaction pathway (ko04060) is significantly correlated with Tetralogy of Fallot (p = 5 · 10⁻⁷, Wilcoxon signed-rank test)

Gene-disease associations for orphan diseases

Slc34a1 (MGI:1345284) and Fanconi renotubular syndrome 1 (OMIM:134600)

Application: PhenomeBrowser

Conclusions

Summary: Aspects of ontology-based information systems in biology

knowledge representation language: expressiveness; non-monotonicity; complexity of inferences

ontological decisions: anatomy (parthood, connectedness); physiology (function); pathology, disease (normality, abnormality)

statistical/similarity-based framework: semantic similarity; account for incomplete information; account for noisy data

Challenges and future research: Knowledge representation

establish reasoning infrastructure (OWLlink, ...)

improve reasoning performance (OWL profiles, modularity, approximate reasoning)

OWL reasoning with prototypes, non-monotonic reasoning, abduction

explore alternatives to OWL

Challenges and future research: Ontology

[Figure: diagram of OBO ontologies. Entity labels: Population, Individual, Body, Organ, Tissue, Cell, Organelle, Molecule, Gene, Transcript. Category labels: Physical object, Quality, Function, Process. Ontology labels: Gene Ontology, Celltype, Sequence Ontology, GO-CC, ChEBI Ontology, Anatomy Ontology, Phenotype Ontology.]

Challenges and future research: Biology

add phenotype information: 20,000 knockout mice; dog, rat, slime mold, ...

define disease phenotypes

extension to other domains: functional genomics; pharmacology, drug discovery; systems biology; clinical research, decision support

quantifiable evaluation

Acknowledgements

George Gkoutos

Heinrich Herre

Janet Kelso

Michel Dumontier

Dietrich Rebholz-Schuhmann

Nico Adams

Dan Cook

Bernard de Bono

John Gennari

Pierre Grenon

Pascal Hitzler

Frank Loebe

Anika Oellrich

Kay Pruefer

Paul Schofield

Stefan Schulz

Robert Stevens

Sarala Wimalaratne

...

Thank you!
