My ontology is better than yours! Building and evaluating ontologies for integrative research

Post on 07-May-2015


Introduction Biomedical ontology Use case: pharmacogenomics Outlook


Robert Hoehndorf

Department of Genetics, University of Cambridge

Bio-Ontology SIG


Translational research

National Cancer Institute:

Translational research transforms scientific discoveries arising from laboratory, clinical, or population studies into clinical applications to reduce [disease] incidence, morbidity, and mortality.

slide by Robert Stevens


Biomedical ontologies

Gruber (1993):

An ontology is the explicit specification of a conceptualization of a domain.

controlled vocabularies

hierarchically organized

facilitate data integration


Biomedical ontologies

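[Figure: coverage of biomedical ontologies. Entities: Population, Individual, Body, Organ, Tissue, Cell, Organelle, Molecule, Gene, Transcript. Categories: Physical object, Quality, Function, Process. Ontologies shown: Gene Ontology, Celltype, Sequence Ontology, GO-CC, ChEBI Ontology, Anatomy Ontology, Phenotype Ontology.]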


Biomedical ontologies

How can we find the “best” ontology?

How can we develop the “best” ontology?


Biomedical ontologies: Ontology evaluation

Biomedical ontologies: Evaluation criteria

ontology design principles rooted in

best practices
philosophy
logic
ontology engineering
linguistics

community agreement

community requests

peer review

Ontology: Ontology evaluation

definitions

singular nouns

common relations

single is-a hierarchy

orthogonality

realism

...


Biomedical ontologies

Most ontology evaluation criteria are intrinsic criteria and evaluate what ontologies are.

How can we evaluate what ontologies do?


Biomedical ontologies: A functional perspective

Biomedical ontologies: Evaluation criteria

criteria from software engineering, etc.

user study
unit tests
complexity
...

Biomedical ontologies: Evaluation criteria

criteria from biology

experiments
statistics (p-values)
comparison to gold/silver standard
...

Pharmacogenomics: Pharmacogenomics databases

Research questions

drug discovery

drug repurposing

drug response

drug pathways

disease pathways

causal mutations


Traditional approaches to drug repurposing

drug target identification

models of drug binding

experiment design and execution (e.g., binding assays)

analysis and interpretation of experiment results


Integrative approaches to drug repurposing

SIDER

text mining of drug labels
side-effect similarity
UMLS

PREDICT

disease–disease similarity
drug–drug similarity
disease phenotypes, gene functions, side effects, chemical structure, protein interactions, text mining
HPO, MeSH, GO

OFFSIDES

adverse event reports
ATC, UMLS


Pharmacogenomics

Can we get some novel information about drug indications (and causal mutations) by analyzing experimental data from animal models?


Approach


Relevant ontologies

Mammalian Phenotype Ontology

9,161 classes
manually developed
annotation of animal models
formal (EQ) definitions

Human Phenotype Ontology

9,796 classes
manually developed
annotation of diseases
formal (EQ) definitions

Challenges

1 comparison of human and mouse phenotypes

cross-species integration
how do we represent phenotypes?

2 computation of similarity

semantic similarity based on ontology taxonomy
which ontology do we use for computing similarity?

Cross-species phenotype integration

representation of MP and HPO phenotypes

PATO-based formal definitions
GO
homologous and analogous anatomical structures (UBERON)

aim: cross-species integration of phenotypes

What are phenotypes and how do we represent them (for cross-species integration)?

Abnormal appendix: E=Appendix, Q=Abnormal

representation:

appendix with quality Abnormal
quality Abnormal of some appendix
organism with appendix that has quality Abnormal
...

inheritance of phenotypes across parthood

Abnormality of tip of appendix subclass of Abnormality of appendix?

absence of appendix


Semantic similarity

Semantic similarity results depend on

the number of distinctions made by ontology developers

the kind of distinctions made by ontology developers

the data that is analyzed

the similarity measure

Semantic similarity

Should we compute phenotypic similarity based on the Human or the Mammalian Phenotype Ontology (or both)? How can we compare the results?

Ontology design decisions can be resolved empirically!

no a priori “right” way to represent phenotypes

focus on scientific results, not representation

evaluation:

empirical
objective
quantitative
external

Ontology design decisions can be resolved empirically!

finish the analysis

use known gene–disease associations as gold standard

use FDA-approved drug indications as gold standard

compare analysis results against gold standard

Semantic similarity over phenotype ontologies measures phenotypic similarity

semantic similarity

pairwise comparison of disease and animal phenotypes

\[
\mathrm{sim}(P, D) = \frac{\sum_{x \in Cl(P) \cap Cl(D)} IC(x)}{\sum_{y \in Cl(P) \cup Cl(D)} IC(y)}
\]
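The measure above can be sketched in a few lines of Python. This is a minimal illustration, not the PhenomeNET implementation. It assumes that Cl(X) is the set of annotated classes plus all their ancestors, and that IC(x) = -log p(x), where p(x) is the fraction of annotated entities whose closure contains x. The slides do not define IC, so that definition is an assumption, and the toy hierarchy, class names and profiles are all hypothetical.

```python
import math

# Toy is-a hierarchy (child -> parents). All class names are hypothetical.
PARENTS = {
    "abnormal appendix size": ["abnormal appendix"],
    "absent appendix": ["abnormal appendix"],
    "abnormal appendix": ["abnormal gut"],
    "abnormal gut": ["phenotype"],
    "abnormal heart": ["phenotype"],
    "phenotype": [],
}


def closure(classes):
    """Cl(X): the given classes plus all of their ancestors."""
    seen = set()
    stack = list(classes)
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(PARENTS[c])
    return seen


def information_content(profiles):
    """Assumed definition: IC(x) = -log(fraction of profiles whose closure contains x)."""
    closures = [closure(p) for p in profiles.values()]
    counts = {}
    for cl in closures:
        for c in cl:
            counts[c] = counts.get(c, 0) + 1
    return {c: -math.log(k / len(closures)) for c, k in counts.items()}


def sim(p, d, ic):
    """Sum of IC over shared classes divided by sum of IC over all classes."""
    cl_p, cl_d = closure(p), closure(d)
    shared = sum(ic[x] for x in cl_p & cl_d)
    total = sum(ic[y] for y in cl_p | cl_d)
    return shared / total if total > 0 else 0.0


# Hypothetical annotation profiles; IC is estimated from this small corpus.
profiles = {
    "mouse model A": ["abnormal appendix size"],
    "mouse model B": ["abnormal heart"],
    "disease D": ["absent appendix"],
}
ic = information_content(profiles)

# Rank the mouse models by phenotypic similarity to the disease.
ranking = sorted(
    ["mouse model A", "mouse model B"],
    key=lambda m: sim(profiles[m], profiles["disease D"], ic),
    reverse=True,
)
print(ranking)
```

Because the root class is shared by every profile, its IC is zero and it contributes nothing to the score. This is one way in which the result depends on the distinctions an ontology makes.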


PhenomeNET compares phenotypes across species

ranking of genes for each disease

candidate genes for disease


Statistical testing to rank drug–disease pairs

one-sided Wilcoxon signed rank test

result: ranking of drugs for each disease based on p-value

low p-value: mutations in mouse genes associated with a drug result in phenotypes that are very similar to a disease phenotype
high p-value: genes uniformly distributed across ranks
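A rough sketch of how such a ranking could be computed follows. The slide names the test but not its exact inputs. This sketch assumes each drug comes with the normalized ranks (0 = top, 1 = bottom) of its associated genes in the disease's gene ranking, and tests whether those ranks tend to lie below 0.5. The function name `wilcoxon_less`, the normal approximation without continuity or tie correction, and the example data are all assumptions made for illustration.

```python
import math


def wilcoxon_less(values, center=0.5):
    """One-sided Wilcoxon signed-rank test via a normal approximation.

    Alternative hypothesis: the values tend to lie below `center`.
    Returns an approximate p-value (no continuity or tie correction).
    """
    diffs = [v - center for v in values if v != center]  # drop zero differences
    n = len(diffs)
    if n == 0:
        return 1.0
    # Rank the absolute differences; tied values share their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    # Test statistic: sum of ranks of the positive differences.
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))  # P(Z <= z)


# Hypothetical normalized ranks of each drug's genes for one disease.
drug_gene_ranks = {
    "drug A": [0.01, 0.02, 0.03, 0.05, 0.08, 0.10, 0.12, 0.20],  # genes near the top
    "drug B": [0.05, 0.20, 0.35, 0.45, 0.55, 0.70, 0.80, 0.95],  # spread over all ranks
}
p_values = {drug: wilcoxon_less(r) for drug, r in drug_gene_ranks.items()}
drug_ranking = sorted(p_values, key=p_values.get)  # lowest p-value first
print(drug_ranking)
```

For small gene sets an exact test would be more appropriate than the normal approximation used here.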


Receiver Operating Characteristic

Source: Wikipedia
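As a minimal sketch of how a ranking can be compared against a gold standard with ROC analysis, the following computes ROC points and the area under the curve (AUC) by the trapezoidal rule. The function names, scores and gold-standard set are hypothetical and only illustrate the technique.

```python
def roc_points(scores, positives):
    """ROC points (false positive rate, true positive rate) for a scored list.

    `scores` maps items to scores (higher = ranked earlier); `positives` is
    the gold-standard set. Items with equal scores are processed together.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    n_pos = sum(1 for item in ranked if item in positives)
    n_neg = len(ranked) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative item")
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        j = i
        while j < len(ranked) and scores[ranked[j]] == scores[ranked[i]]:
            if ranked[j] in positives:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / n_neg, tp / n_pos))
        i = j
    return points


def auc(points):
    """Area under the ROC curve using the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Hypothetical similarity scores of genes for one disease, and the known genes.
scores = {"g1": 0.9, "g2": 0.8, "g3": 0.7, "g4": 0.4, "g5": 0.2}
gold_standard = {"g1", "g3"}
area = auc(roc_points(scores, gold_standard))
print(round(area, 3))
```

An AUC of 0.5 corresponds to a random ranking and 1.0 to a ranking that places every gold-standard item first.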

Gene–disease associations

[Figure: ROC plot "PhenomeNet initial"; axes: False Positive Rate (x), True Positive Rate (y); curves: x, original. AUC (original): 0.68]

Gene–disease associations

[Figure: ROC plot "PhenomeNet improved"; axes: False Positive Rate (x), True Positive Rate (y); curves: x, original, latest. AUC (original): 0.68; AUC (latest): 0.89]

Gene–drug associations

[Figure: ROC plot "PhenomeDrug initial"; axes: False Positive Rate (x), True Positive Rate (y); curves: x, original. AUC (original): 0.61]

Gene–drug associations

[Figure: ROC plot "PhenomeDrug improved"; axes: False Positive Rate (x), True Positive Rate (y); curves: x, original, latest. AUC (original): 0.61; AUC (latest): 0.67]

Representation of phenotypes for cross-species integration

'Abnormality of appendix' EquivalentTo: has-part some (part-of some (Appendix and has-quality some Quality))

organism-centric approach (has-part some)

transitivity over parthood (part-of some)

Quality used as indicator of abnormality

use of OWL EL
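The intended effect of using transitive parthood in this pattern can be imitated with a toy rule: an abnormality of a part counts as an abnormality of every structure the part belongs to. The sketch below is not OWL reasoning and not how an EL reasoner works. It only follows a hypothetical part-of chain, and all names in it are assumptions made for illustration.

```python
# Hypothetical part-of edges (part -> whole); not taken from a real anatomy ontology.
PART_OF = {
    "tip of appendix": "appendix",
    "appendix": "intestine",
    "intestine": "digestive system",
}


def wholes(entity):
    """The entity itself plus everything it is transitively part of."""
    chain = [entity]
    while chain[-1] in PART_OF:
        chain.append(PART_OF[chain[-1]])
    return chain


def abnormality_is_subclass(specific_entity, general_entity):
    """Toy rule: 'Abnormality of specific' is below 'Abnormality of general'
    when the specific entity is the general one or a transitive part of it."""
    return general_entity in wholes(specific_entity)


print(abnormality_is_subclass("tip of appendix", "appendix"))
print(abnormality_is_subclass("appendix", "tip of appendix"))
```

Under this rule, an annotation to the tip of the appendix is also retrieved by a query for appendix abnormalities, but not the other way round.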


Representation of phenotypes for cross-species integration

'Large appendix' EquivalentTo: has-part some (Appendix and has-quality some 'Increased size')

organism-centric approach (has-part some)

no transitivity over parthood

use of OWL EL


Absence

'Absence of appendix' EquivalentTo: has-part some (Appendix and has-quality some Absent)

subclass of Abnormality of appendix

use of OWL EL

Semantic similarity

Should we compute phenotypic similarity based on the Human or the Mammalian Phenotype Ontology (or both)? How can we compare the results?

Computation of semantic similarity using the Mammalian Phenotype Ontology improves the analysis results.

problem specific

depending on mouse data

depending on the approach

depending on similarity measure

depending on gold standard dataset

Conclusion

Quantitative, external evaluation can improve ontologies and ontology-based analysis methods.

Annotation

Definitions:

intrinsic:

having definitions
Aristotelian definitions

external:

having definitions that are easily understandable
having definitions that improve annotation consistency

criteria:

measure annotation consistency
user study

Dolan, M. E., et al. A procedure for assessing GO annotation consistency. Bioinformatics 21, i136–i143 (2005).

Annotation

Labels:

intrinsic:

singular nouns
reference to universals

external:

use of common, widely used terms
use of unambiguous terms

criteria:

measure annotation consistency
user study
recall in text

Yao, L., et al. Benchmarking Ontologies: Bigger or Better? PLoS Comput Biol 7, e1001055 (Jan. 2011).

Knowledge bases and querying

Queries:

intrinsic:

use of OWL
use of specific relations
use of upper level ontology
consistency

external:

retrieve correct answers
retrieve relevant answers

criteria:

user study (to evaluate query answers)
test set
comparison to gold standard

Boeker, M., et al. Unintended consequences of existential quantifications in biomedical ontologies. BMC Bioinformatics 12, 456 (2011).

Conclusions

My ontology is better than yours.


My ontology can do some things better than your ontology.

Conclusions: Quantitative criteria

Empirical, objective, quantitative, application-based evaluation will allow us to systematically improve ontologies for science.

Thank you for your attention


Semantic similarity


Semantic similarity: 112


Semantic similarity: 412