Ontologies for representing, integrating and analyzing phenotypes

Ontologies for representing, integrating and analyzing phenotypes Robert Hoehndorf Department of Genetics University of Cambridge 21 June 2011 Robert Hoehndorf (University of Cambridge) Phenotype ontologies 21 June 2011 1 / 40

description

The development and application of high-throughput technologies in biology lead to a rapid increase of data and knowledge and open the possibility of a paradigm shift towards the personalized treatment of disease based on an individual patient’s genetic makeup. Major challenges that biology faces today are to integrate data across different databases, domains, levels of granularity and species, and to make the information resulting from high-throughput experiments amenable to scientific analyses and the discovery of mechanisms underlying disease. In my talk, I will demonstrate how formal ontologies combined with recent progress in automated reasoning can be used to represent, integrate and analyze data resulting from high-throughput phenotyping experiments. I will show how an expressive formal representation of phenotype ontologies can lead to interoperability with biomedical ontologies of other domains, illustrate an ontology modularization approach that enables the use of automated reasoning over these ontologies, and show how to integrate phenotype data across multiple species. Finally, I will demonstrate how measures of semantic similarity can be applied to analyze high-throughput phenotype data and reveal novel gene-disease associations. I will also discuss how an ontology-based approach to the semantic integration of data in biomedicine can facilitate translational research and personalized medicine.

Transcript of Ontologies for representing, integrating and analyzing phenotypes

Page 1: Ontologies for representing, integrating and analyzing phenotypes

Ontologies for representing, integrating and analyzing phenotypes

Robert Hoehndorf

Department of Genetics, University of Cambridge

21 June 2011

Page 2: Ontologies for representing, integrating and analyzing phenotypes

Introduction Motivation

Motivation

Page 3: Ontologies for representing, integrating and analyzing phenotypes

Introduction Motivation

Motivation

Page 4: Ontologies for representing, integrating and analyzing phenotypes

Introduction Ontology

Open Biomedical Ontologies (OBO)

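[Figure: diagram of OBO ontologies. Labels for biological entities: Population, Individual, Body, Organ, Tissue, Cell, Organelle, Molecule, Gene, Transcript. Labels for categories: Physical object, Quality, Function, Process. Ontologies shown: Gene Ontology, Celltype, Sequence Ontology, GO-CC, ChEBI Ontology, Anatomy Ontology, Phenotype Ontology.]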

Page 5: Ontologies for representing, integrating and analyzing phenotypes

Introduction Ontology

Ontology: Phenotype and anatomy ontologies

anatomy ontologies: > 100,000 classes

FMA, MA, WA, ZFA, FA, GO-CC, ...

phenotype ontologies: > 20,000 classes

HPO, MP, WBPhenotype, FBcv, APO, ...

quality ontology: > 2,000 classes

PATO

process and function ontologies: > 25,000 classes

Gene Ontology, ...

alignments between anatomy ontologies

UBERON, various mappings

Page 6: Ontologies for representing, integrating and analyzing phenotypes

Introduction Ontology

Ontology: Challenges for interoperability

“merely using ontologies [...] does not reduce heterogeneity: it just raises heterogeneity problems to a higher level” [Euzenat, 2007]

implicit knowledge

implicit semantics

weakly formalized

very large

Page 7: Ontologies for representing, integrating and analyzing phenotypes

Introduction Ontology

Ontology: Example query

Find all regions in the human and mouse genome sequences that are associated with Tetralogy of Fallot.

Page 8: Ontologies for representing, integrating and analyzing phenotypes

Phenotype ontology Tetralogy of Fallot

Tetralogy of Fallot

Page 9: Ontologies for representing, integrating and analyzing phenotypes

Phenotype ontology Tetralogy of Fallot

Tetralogy of Fallot: Human phenotypes

Overriding aorta (HP:0002623)

Ventricular septal defect (HP:0001629)

Pulmonic stenosis (HP:0001642)

Right ventricular hypertrophy (HP:0001667)

Page 10: Ontologies for representing, integrating and analyzing phenotypes

Phenotype ontology Tetralogy of Fallot

Tetralogy of Fallot: Phenotype description syntax

Overriding aorta (HP:0002623):

Q: overlap with (PATO:0001590)

E1: Aorta (FMA:3734)

E2: Membranous part of interventricular septum (FMA:7135)

Page 11: Ontologies for representing, integrating and analyzing phenotypes

Phenotype ontology Tetralogy of Fallot

Tetralogy of Fallot: Phenotype description syntax

Overriding aorta (HP:0002623):

Q: overlap with (PATO:0001590)

E1: Aorta (FMA:3734)

E2: Membranous part of interventricular septum (FMA:7135)

HP:0002623 EquivalentTo:

phene-of some (has-part some (FMA:3734 and has-quality some (PATO:0001590 and towards some FMA:7135)))

Page 12: Ontologies for representing, integrating and analyzing phenotypes

Phenotype ontology Tetralogy of Fallot

Tetralogy of Fallot: Phenotype description syntax

Overriding aorta (HP:0002623):

Q: overlap with (PATO:0001590)

E1: Aorta (FMA:3734)

E2: Membranous part of interventricular septum (FMA:7135)

HP:0002623 EquivalentTo:

phene-of some (has-part some (FMA:3734 and overlaps-with some FMA:7135))
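
The translation from the Q/E1/E2 description syntax to the longer class expression (the one using has-quality and towards) can be sketched as a small string template. This is a minimal illustration only: the function name `eq_to_class_expression` is hypothetical, and the pattern is simply the one shown on these slides, not a general converter for every phenotype description.

```python
def eq_to_class_expression(quality, entity, related_entity=None):
    """Build a Manchester-style class expression from a Q/E1/E2 description.

    quality        -- PATO quality identifier (Q)
    entity         -- anatomical entity identifier (E1)
    related_entity -- optional second entity (E2) for relational qualities
    """
    if related_entity is None:
        quality_part = quality
    else:
        # Relational quality: the quality is directed "towards" E2.
        quality_part = f"({quality} and towards some {related_entity})"
    return f"phene-of some (has-part some ({entity} and has-quality some {quality_part}))"


# Overriding aorta (HP:0002623), using the identifiers from the slide.
overriding_aorta = eq_to_class_expression("PATO:0001590", "FMA:3734", "FMA:7135")
print("HP:0002623 EquivalentTo:", overriding_aorta)
```

Keeping the pattern in one place means that a change to the pattern, such as the shorter overlaps-with form, only has to be made once.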

Page 13: Ontologies for representing, integrating and analyzing phenotypes

Phenotype ontology Tetralogy of Fallot

Tetralogy of Fallot: UBERON human-mouse anatomy equivalences

Overriding aorta (HP:0002623):

Q: overlap with (PATO:0001590)

E1: Aorta (FMA:3734)

FMA:3734 EquivalentTo: MA:0000062

E2: Membranous part of interventricular septum (FMA:7135)

FMA:7135 EquivalentTo: MA:0002939

Page 14: Ontologies for representing, integrating and analyzing phenotypes

Phenotype ontology Tetralogy of Fallot

Tetralogy of Fallot: Phenotype equivalence

Overriding aorta (MP:0000273):

Q: overlap with (PATO:0001590)

E1: Aorta (MA:0000062)

E2: Membranous interventricular septum (MA:0002939)

MP:0000273 EquivalentTo:

phene-of some (has-part some (MA:0000062 and has-quality some (PATO:0001590 and towards some MA:0002939)))

Consequence: MP:0000273 EquivalentTo: HP:0002623
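
Why the equivalence follows can be illustrated with a purely syntactic stand-in for what an OWL reasoner infers: rewrite the human definition with the anatomy equivalences and compare it with the mouse definition. The sketch below is an assumption-laden simplification; the dictionaries and the function names `normalize` and `equivalent_phenotypes` are hypothetical, and only the identifiers are taken from the slides.

```python
# Anatomy equivalences between FMA (human) and MA (mouse), as listed on the slides.
ANATOMY_EQUIVALENCES = {"FMA:3734": "MA:0000062", "FMA:7135": "MA:0002939"}

# Phenotype definitions as (quality, entity 1, entity 2) triples.
HP_DEFINITIONS = {"HP:0002623": ("PATO:0001590", "FMA:3734", "FMA:7135")}
MP_DEFINITIONS = {"MP:0000273": ("PATO:0001590", "MA:0000062", "MA:0002939")}


def normalize(definition, equivalences):
    """Replace every identifier that has a known equivalent; keep the rest."""
    return tuple(equivalences.get(term, term) for term in definition)


def equivalent_phenotypes(hp_defs, mp_defs, equivalences):
    """Return (human, mouse) pairs whose definitions coincide after normalization."""
    pairs = []
    for hp_id, hp_def in hp_defs.items():
        for mp_id, mp_def in mp_defs.items():
            if normalize(hp_def, equivalences) == mp_def:
                pairs.append((hp_id, mp_id))
    return pairs


matches = equivalent_phenotypes(HP_DEFINITIONS, MP_DEFINITIONS, ANATOMY_EQUIVALENCES)
print(matches)
```

A reasoner goes further than this comparison, since it also derives subclass relations between definitions that are not syntactically identical.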

Page 16: Ontologies for representing, integrating and analyzing phenotypes

Phenotype ontology Absence

Absence: Absent appendix

Absent appendix:

Q: lacks all parts of type (PATO:0002000)

E1: Human body (FMA:20394)

E2: Appendix (FMA:14542)

AbsentAppendix ≡ LacksParts ⊓ ∃towards.Appendix ⊓ ∃inheresIn.HumanBody (Horrocks, 2007)

AbsentAppendix ≡ LacksParts ⊓ ∃towards.{Appendix} ⊓ ∃inheresIn.HumanBody (Mungall, 2007)

AbsentAppendix ⊑ ∃pheneOf.(HumanBody ⊓ ¬∃hasPart.Appendix) (H et al., 2007, 2011)

Page 20: Ontologies for representing, integrating and analyzing phenotypes

Phenotype ontology Absence

Absence: Absent appendix

AbsentAppendix ⊑ ∃pheneOf.(HumanBody ⊓ ¬∃hasPart.Appendix)

FMA: HumanBody ⊑ ∃hasPart.Appendix

HumanBody(John), AbsentAppendix(x), hasPhene(John, x)
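
The conflict between these statements can be made explicit with a short derivation. This is a sketch of the reasoning, written under the assumption that the three statements above are the only relevant ones; it is not taken from the slides.

```latex
\begin{align*}
\mathit{AbsentAppendix}
  &\sqsubseteq \exists \mathit{pheneOf}.(\mathit{HumanBody} \sqcap \neg\exists \mathit{hasPart}.\mathit{Appendix}) \\
  &\sqsubseteq \exists \mathit{pheneOf}.(\exists \mathit{hasPart}.\mathit{Appendix} \sqcap \neg\exists \mathit{hasPart}.\mathit{Appendix})
     && \text{since } \mathit{HumanBody} \sqsubseteq \exists \mathit{hasPart}.\mathit{Appendix} \\
  &\equiv \exists \mathit{pheneOf}.\bot \;\equiv\; \bot
\end{align*}
```

AbsentAppendix is therefore unsatisfiable once the FMA axiom is added, and asserting AbsentAppendix(x) for any individual x makes the whole knowledge base inconsistent.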

Page 21: Ontologies for representing, integrating and analyzing phenotypes

Phenotype ontology Absence

Absence: Absent appendix

Removal of conflicting axioms (has-part/part-of in anatomy)

Contextualize anatomy:

Normal ⊓ HumanBody ⊑ ∃hasPart.(Normal ⊓ Appendix)

Use of non-monotonic reasoning:

Normally: HumanBody ⊑ ∃hasPart.Appendix

Circumscription of ¬Normal

Implementation in dlvhex:

IC-has-part(X,Y) :- ind(X), class(Y), inst(X,Z), CC-normally-has-part(Z,Y), not IC-lacks-has-part(X,Y), class(Z).
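
The dlvhex rule reads as a default: an individual X has a part of class Y if X is an instance of a class Z that normally has such a part, unless X is known to lack it. A minimal Python sketch of that reading follows. It only mimics the rule's negation-as-failure behaviour on explicit facts; the function name and data layout are hypothetical, and it is not a substitute for an answer-set solver.

```python
def infer_has_part(instance_of, normally_has_part, lacks_has_part):
    """Apply the default rule to explicit facts.

    instance_of       -- dict: individual -> set of classes it instantiates
    normally_has_part -- set of (class, part_class) pairs
    lacks_has_part    -- set of (individual, part_class) pairs (known exceptions)
    Returns the set of inferred (individual, part_class) pairs.
    """
    inferred = set()
    for individual, classes in instance_of.items():
        for cls, part_class in normally_has_part:
            # The default applies unless an exception is recorded.
            if cls in classes and (individual, part_class) not in lacks_has_part:
                inferred.add((individual, part_class))
    return inferred


facts = infer_has_part(
    instance_of={"John": {"HumanBody"}, "Mary": {"HumanBody"}},
    normally_has_part={("HumanBody", "Appendix")},
    lacks_has_part={("John", "Appendix")},
)
print(facts)
```

Adding the exception for John withdraws a conclusion that would otherwise be drawn, which is exactly the non-monotonic behaviour that classical OWL reasoning does not provide.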

Page 22: Ontologies for representing, integrating and analyzing phenotypes

Phenotype ontology Absence

Ontology of phenotypes

Different formal expressions for phenotypes based on

qualities,

anatomical parts,

functions,

processes

enable cross-species integration of phenotypes.

Page 23: Ontologies for representing, integrating and analyzing phenotypes

Phenotype ontology Discovering mouse models

Tetralogy of Fallot

Page 24: Ontologies for representing, integrating and analyzing phenotypes

Phenotype ontology Discovering mouse models

Phenotype alignments: Mouse model: Phc1

Page 25: Ontologies for representing, integrating and analyzing phenotypes

Phenotype ontology Discovering mouse models

Phenotype alignments: Tetralogy of Fallot: Phc1

Page 26: Ontologies for representing, integrating and analyzing phenotypes

Knowledge representation Modularization

Complexity of automated reasoning

ontologies based on OWL

OWL 2 is based on description logic (SROIQ)

satisfiability in SROIQ is 2NEXPTIME-complete

Page 27: Ontologies for representing, integrating and analyzing phenotypes

Knowledge representation Modularization

Modularization

tractable subsets of OWL 2: EL, QL, RL

problem: identify a large (EL, QL, RL)-module of an OWL ontology

AbnormalityOfAppendix ≡ ∃pheneOf.(¬∃hasPart.(Normal ⊓ Appendix)) (not in EL)

AbsentAppendix ≡ ∃pheneOf.(¬∃hasPart.Appendix) (not in EL)

Inference: AbsentAppendix ⊑ AbnormalityOfAppendix (EL)
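
The inferred subclass axiom can be checked by hand. The following derivation is a sketch of the reasoning steps that a full OWL reasoner would perform on the two definitions; it is added here for illustration.

```latex
\begin{align*}
\mathit{Normal} \sqcap \mathit{Appendix} &\sqsubseteq \mathit{Appendix} \\
\exists \mathit{hasPart}.(\mathit{Normal} \sqcap \mathit{Appendix}) &\sqsubseteq \exists \mathit{hasPart}.\mathit{Appendix} \\
\neg\exists \mathit{hasPart}.\mathit{Appendix} &\sqsubseteq \neg\exists \mathit{hasPart}.(\mathit{Normal} \sqcap \mathit{Appendix}) \\
\exists \mathit{pheneOf}.(\neg\exists \mathit{hasPart}.\mathit{Appendix}) &\sqsubseteq \exists \mathit{pheneOf}.(\neg\exists \mathit{hasPart}.(\mathit{Normal} \sqcap \mathit{Appendix})) \\
\mathit{AbsentAppendix} &\sqsubseteq \mathit{AbnormalityOfAppendix}
\end{align*}
```

The intermediate steps use negation, so they lie outside EL, but the final axiom relates two named classes only and is therefore expressible in EL.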

Page 30: Ontologies for representing, integrating and analyzing phenotypes

Knowledge representation Modularization

Modularization: EL Vira

http://el-vira.googlecode.com

ontology modularization

retain signature of ontology

identify EL, QL, RL axioms in deductive closure

completeness is open problem
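
The filtering step, keeping only those axioms of the deductive closure that fall into a tractable profile, can be sketched for EL as follows. This is a deliberately simplified illustration and not EL Vira's implementation: the tuple encoding of class expressions is hypothetical, the EL check covers only named classes, intersection and existential restriction, and the "closure" is written out by hand rather than computed by a reasoner.

```python
# Hypothetical encoding of class expressions as nested tuples:
#   "Appendix"                  named class
#   ("and", C, D, ...)          intersection
#   ("some", role, C)           existential restriction
#   ("not", C)                  complement (not allowed in EL)
# Axioms are ("sub", C, D) or ("equiv", C, D).

def is_el_expression(expr):
    """True if expr uses only named classes, intersection and existentials."""
    if isinstance(expr, str):
        return True
    constructor = expr[0]
    if constructor == "and":
        return all(is_el_expression(arg) for arg in expr[1:])
    if constructor == "some":
        _, _role, filler = expr
        return is_el_expression(filler)
    return False


def el_module(axioms):
    """Keep the axioms whose two sides are both EL class expressions."""
    return [axiom for axiom in axioms
            if is_el_expression(axiom[1]) and is_el_expression(axiom[2])]


# Hand-written stand-in for (part of) a deductive closure.
closure = [
    ("equiv", "AbnormalityOfAppendix",
     ("some", "pheneOf", ("not", ("some", "hasPart", ("and", "Normal", "Appendix"))))),
    ("equiv", "AbsentAppendix",
     ("some", "pheneOf", ("not", ("some", "hasPart", "Appendix")))),
    ("sub", "AbsentAppendix", "AbnormalityOfAppendix"),
]
module = el_module(closure)
print(module)
```

Only the inferred subclass axiom survives, while both class names stay in the signature of the resulting module.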

Page 31: Ontologies for representing, integrating and analyzing phenotypes

Knowledge representation Modularization

Modularization: EL Module

AbnormalityOfAppendix ≡ ∃pheneOf.(¬∃hasPart.(Normal ⊓ Appendix))

AbsentAppendix ≡ ∃pheneOf.(¬∃hasPart.Appendix)

AbsentAppendix ⊑ AbnormalityOfAppendix

H et al., 2011. A common layer of interoperability for biomedical ontologies based on OWL EL. Bioinformatics, 27(7), 1001–1008.

Page 32: Ontologies for representing, integrating and analyzing phenotypes

Knowledge representation Applications and evaluation

Phenotype alignments: PhenomeBLAST

apply to yeast, fly, worm, fish, mouse and human phenotypes

phenotype alignment through OWL reasoning

more than 300,000 classes and 1,000,000 axioms

combination of HermiT (for modularization), CB and CEL reasoner

classification time: 7 minutes

http://phenomeblast.googlecode.org

Page 33: Ontologies for representing, integrating and analyzing phenotypes

Knowledge representation Applications and evaluation

Phenotype alignments: PhenomeBLAST

Page 34: Ontologies for representing, integrating and analyzing phenotypes

Knowledge representation Applications and evaluation

Phenotype alignments: PhenomeBLAST

Page 35: Ontologies for representing, integrating and analyzing phenotypes

Knowledge representation Applications and evaluation

Application: Comparison of phenotypes

direct comparison of phenotypes:

disease phenotypes, e.g., tetralogy of Fallot

phenotypes associated with genetic mutations (genotypes in mouse, fish, etc.)

Page 36: Ontologies for representing, integrating and analyzing phenotypes

Knowledge representation Applications and evaluation

Application: Comparison of phenotypes

if the phenotype of a mutation is a subclass of a disease phenotype, a gene-disease association can be inferred, provided that

disease phenotypes sufficient for having the disease

mutation phenotypes necessary for having a specific genotype

Page 37: Ontologies for representing, integrating and analyzing phenotypes

Knowledge representation Applications and evaluation

Application: Similarity-based comparison

pairwise comparison of phenotypes

semantic similarity: weighted Jaccard index

result: similarity matrix between phenotypes

(quantitative) evaluation based on predicting orthology, pathway, disease

identify novel gene-disease associations
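
A weighted Jaccard index between two phenotype annotation sets can be sketched as the total weight of the shared classes divided by the total weight of all classes in either set. The slides do not state the weighting scheme, so the per-class weights below (which could, for example, be information content values) are an assumption, as are the function name and the toy data.

```python
def weighted_jaccard(classes_a, classes_b, weight):
    """Weighted Jaccard index of two sets of ontology classes.

    classes_a, classes_b -- sets of class identifiers (ideally closed under
                            superclasses, so related phenotypes share ancestors)
    weight               -- dict: class identifier -> non-negative weight
    """
    union = classes_a | classes_b
    total = sum(weight[c] for c in union)
    if total == 0:
        return 0.0  # both sets empty, or only zero-weight classes
    shared = sum(weight[c] for c in classes_a & classes_b)
    return shared / total


# Toy example with made-up weights.
weights = {"heart phenotype": 1.0, "overriding aorta": 2.0, "septal defect": 1.0}
disease = {"heart phenotype", "overriding aorta"}
mouse_model = {"overriding aorta", "septal defect"}
similarity = weighted_jaccard(disease, mouse_model, weights)
print(similarity)  # 2.0 / 4.0 = 0.5
```

Computing this value for every pair of phenotype profiles yields the similarity matrix mentioned above.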

Page 38: Ontologies for representing, integrating and analyzing phenotypes

Knowledge representation Applications and evaluation

Application: Similarity-based comparison: ROC

[Figure: ROC curves (true positive rate against false positive rate, both axes from 0 to 1) for three evaluation tasks: Disease, Orthology, Pathway]
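
How an ROC curve of this kind is obtained from a ranked list of similarity scores can be sketched in a few lines. The sketch assumes each candidate pair has a similarity score and a known true or false label; the function names and the toy data are hypothetical, and ties between scores are not treated specially.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) points, best score first."""
    positives = sum(1 for label in labels if label)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative example")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    true_pos = false_pos = 0
    for _score, label in ranked:
        if label:
            true_pos += 1
        else:
            false_pos += 1
        points.append((false_pos / negatives, true_pos / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under a list of (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


curve = roc_points([0.9, 0.8, 0.3, 0.1], [True, False, True, False])
print(curve, area_under_curve(curve))
```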

Page 39: Ontologies for representing, integrating and analyzing phenotypes

Knowledge representation Applications and evaluation

Application: Similarity-based comparison: gene-disease associations

Adam19 and Fgf15 genes in mice may be involved in Tetralogy of Fallot

Aberrant pathways

Cytokine-cytokine receptor interaction pathway (ko04060) is significantly correlated with Tetralogy of Fallot (p = 5 · 10^-7, Wilcoxon signed-rank test)

Gene disease associations for orphan diseases

Slc34a1 (MGI:1345284) and Fanconi renotubular syndrome 1 (OMIM:134600)

Page 40: Ontologies for representing, integrating and analyzing phenotypes

Knowledge representation Applications and evaluation

Application: PhenomeBrowser

Page 41: Ontologies for representing, integrating and analyzing phenotypes

Conclusions

Summary: Aspects of ontology-based information systems in biology

knowledge representation language

expressiveness

non-monotonicity

complexity of inferences

ontological decisions

anatomy (parthood, connectedness)

physiology (function)

pathology, disease (normality, abnormality)

statistical/similarity-based framework

semantic similarity

account for incomplete information

account for noisy data

Page 42: Ontologies for representing, integrating and analyzing phenotypes

Conclusions

Challenges and future research: Knowledge representation

establish reasoning infrastructure (OWLlink, ...)

improve reasoning performance (OWL profiles, modularity, approximate reasoning)

OWL reasoning with prototypes, non-monotonic reasoning, abduction

explore alternatives to OWL

Page 43: Ontologies for representing, integrating and analyzing phenotypes

Conclusions

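Challenges and future research: Ontology

[Figure: diagram of OBO ontologies. Labels for biological entities: Population, Individual, Body, Organ, Tissue, Cell, Organelle, Molecule, Gene, Transcript. Labels for categories: Physical object, Quality, Function, Process. Ontologies shown: Gene Ontology, Celltype, Sequence Ontology, GO-CC, ChEBI Ontology, Anatomy Ontology, Phenotype Ontology.]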

Page 44: Ontologies for representing, integrating and analyzing phenotypes

Conclusions

Challenges and future research: Biology

add phenotype information

20,000 knockout mice

dog, rat, slime mold, ...

define disease phenotypes

extension to other domains

functional genomics

pharmacology, drug discovery

systems biology

clinical research, decision support

quantifiable evaluation

Page 45: Ontologies for representing, integrating and analyzing phenotypes

Conclusions

Acknowledgements

George Gkoutos

Heinrich Herre

Janet Kelso

Michel Dumontier

Dietrich Rebholz-Schuhmann

Nico Adams

Dan Cook

Bernard de Bono

John Gennari

Pierre Grenon

Pascal Hitzler

Frank Loebe

Anika Oellrich

Kay Pruefer

Paul Schofield

Stefan Schulz

Robert Stevens

Sarala Wimalaratne

...

Page 46: Ontologies for representing, integrating and analyzing phenotypes

Conclusions

Thank you!
